[proofplan]
The goal is to replace every terminal symbol $a$ appearing on the left-hand side of a rule with a fresh variable $X_a$, then restore the terminals via new rules $X_a \to a$. Concretely, we define a renaming map $X$ that sends terminals to their fresh counterparts and leaves variables fixed, extend it functorially to strings as $\hat{X}$, and build a new grammar $G'$ whose rule set is $\hat{X}(P) \cup \{X_a \to a : a \in \Sigma\}$. The resulting $G'$ is variable-based by construction. Equivalence $\mathcal{L}(G) = \mathcal{L}(G')$ follows by translating each $G$-derivation of a word $w$ through $\hat{X}$ and then firing $|w|$ rules from $P''$ to recover $w$; conversely, any $G'$-derivation of $w$ uses exactly $|w|$ rules from $P''$ (one per terminal in $w$), and removing these yields a $G$-derivation after applying the inverse renaming.
[/proofplan]
[step:Set up the alphabets, the renaming map, and its functorial extension]
Let $G = (\Sigma, V, P, S)$ be a grammar, with $\Omega := \Sigma \cup V$ the joint alphabet. For each terminal $a \in \Sigma$ introduce a fresh symbol $X_a$ distinct from every element of $\Omega$. Set
\begin{align*}
V_X &:= \{X_a : a \in \Sigma\}, & V' &:= V \cup V_X, & \Omega' &:= \Sigma \cup V'.
\end{align*}
By construction $V_X \cap \Omega = \varnothing$, so $V \cap V_X = \varnothing$ and the union $V \cup V_X$ is disjoint.
Define the renaming map
\begin{align*}
X: \Omega &\to V' \\
x &\mapsto \begin{cases} X_a & \text{if } x = a \in \Sigma, \\ A & \text{if } x = A \in V. \end{cases}
\end{align*}
Extend $X$ to strings by the functorial extension
\begin{align*}
\hat{X}: \Omega^\star &\to (V')^\star \\
x_1 x_2 \cdots x_n &\mapsto X(x_1)\, X(x_2) \cdots X(x_n),
\end{align*}
with $\hat{X}(\varepsilon) = \varepsilon$. The extension $\hat{X}$ is a monoid homomorphism: $\hat{X}(\alpha \beta) = \hat{X}(\alpha)\hat{X}(\beta)$ for all $\alpha, \beta \in \Omega^\star$, directly from the definition of concatenation and the pointwise definition of $\hat{X}$.
The map $X$ is injective on $\Omega$ (distinct inputs land in distinct cases, and within each case $x \mapsto X_x$ or $x \mapsto x$ is injective), so $\hat{X}$ is injective on $\Omega^\star$.
[/step]
[step:Define $G'$ and verify it is variable-based]
Set
\begin{align*}
P' &:= \{\hat{X}(\alpha) \to \hat{X}(\beta) : (\alpha \to \beta) \in P\}, \\
P'' &:= \{X_a \to a : a \in \Sigma\}, \\
G' &:= (\Sigma, V', P' \cup P'', S).
\end{align*}
We verify $G'$ is a grammar with terminal alphabet $\Sigma$, variable alphabet $V'$, start symbol $S \in V' $ (since $S \in V \subseteq V'$), and rules $P' \cup P''$. Each rule in $P'$ has the form $\hat{X}(\alpha) \to \hat{X}(\beta)$ with $\hat{X}(\alpha), \hat{X}(\beta) \in (V')^\star$: indeed every symbol of $\hat{X}(\alpha)$ lies in $V'$, because $X(a) = X_a \in V_X \subseteq V'$ for $a \in \Sigma$ and $X(A) = A \in V \subseteq V'$ for $A \in V$. Each rule in $P''$ has left-hand side $X_a \in V_X \subseteq V'$ and right-hand side $a \in \Sigma \subseteq (\Omega')^\star$.
A grammar is **variable-based** when every rule has left-hand side and right-hand side consisting solely of variables, except possibly for right-hand sides that are pure terminals in rules whose left-hand side is a single variable. More precisely, we require: for every rule $\gamma \to \delta$, either (i) both $\gamma, \delta \in (V')^\star$, or (ii) $\gamma \in V'$ is a single variable and $\delta \in \Sigma$ is a single terminal. Rules in $P'$ satisfy (i) by the previous paragraph. Rules in $P''$ satisfy (ii). Hence $G'$ is variable-based.
[/step]
[step:Show every $G$-derivation lifts to a $G'$-derivation of the same word]
We show: if $S \xrightarrow{G} \alpha$ with $\alpha \in \Omega^\star$, then $S \xrightarrow{G'} \hat{X}(\alpha)$.
The proof is by induction on the length $n$ of the $G$-derivation $S = \sigma_0 \to_G \sigma_1 \to_G \cdots \to_G \sigma_n = \alpha$.
*Base case* $n = 0$: here $\alpha = S$ and $\hat{X}(S) = X(S) = S$ since $S \in V$. The empty derivation $S \xrightarrow{G'} S = \hat{X}(\alpha)$ suffices.
*Inductive step.* Assume $S \xrightarrow{G'} \hat{X}(\sigma_{n-1})$, and consider the last $G$-step $\sigma_{n-1} \to_G \sigma_n$. By the definition of a single derivation step, there exist a rule $(\gamma \to \delta) \in P$ and strings $\mu, \nu \in \Omega^\star$ such that
\begin{align*}
\sigma_{n-1} &= \mu\gamma\nu, & \sigma_n &= \mu\delta\nu.
\end{align*}
Applying $\hat{X}$ and using that $\hat{X}$ is a monoid homomorphism,
\begin{align*}
\hat{X}(\sigma_{n-1}) &= \hat{X}(\mu)\,\hat{X}(\gamma)\,\hat{X}(\nu), & \hat{X}(\sigma_n) &= \hat{X}(\mu)\,\hat{X}(\delta)\,\hat{X}(\nu).
\end{align*}
The rule $\hat{X}(\gamma) \to \hat{X}(\delta)$ belongs to $P'$ by definition, so $\hat{X}(\sigma_{n-1}) \to_{G'} \hat{X}(\sigma_n)$. Concatenating with the derivation $S \xrightarrow{G'} \hat{X}(\sigma_{n-1})$ obtained from the inductive hypothesis gives $S \xrightarrow{G'} \hat{X}(\sigma_n)$, completing the induction.
[/step]
[step:Conclude $\mathcal{L}(G) \subseteq \mathcal{L}(G')$ by firing the $P''$-rules]
Let $w \in \mathcal{L}(G)$ and write $w = a_1 a_2 \cdots a_m \in \Sigma^\star$ with $m = |w|$. Then $S \xrightarrow{G} w$, and by the previous step $S \xrightarrow{G'} \hat{X}(w) = X_{a_1} X_{a_2} \cdots X_{a_m}$.
For each index $i \in \{1, \dots, m\}$ apply the $P''$-rule $X_{a_i} \to a_i$ to the $i$-th symbol of the current string: this gives
\begin{align*}
X_{a_1} \cdots X_{a_m} \to_{G'} a_1 X_{a_2} \cdots X_{a_m} \to_{G'} \cdots \to_{G'} a_1 a_2 \cdots a_m = w.
\end{align*}
Concatenating these $m$ single steps onto the derivation $S \xrightarrow{G'} \hat{X}(w)$ yields $S \xrightarrow{G'} w$. Since $w \in \Sigma^\star$, we conclude $w \in \mathcal{L}(G')$. Hence $\mathcal{L}(G) \subseteq \mathcal{L}(G')$.
[guided]
The previous step gave a derivation that reaches $\hat{X}(w)$, which is a string of *variables* (the $X_{a_i}$), not the word $w$ itself. The only way to eliminate these variables is through the rules in $P''$, each of which replaces exactly one $X_a$ by $a$. Since $\hat{X}(w) = X_{a_1} \cdots X_{a_m}$ has $m$ variables and each $P''$-rule removes one, we need exactly $m$ applications.
Concretely, for $w = a_1 a_2 \cdots a_m \in \mathcal{L}(G)$ we have $S \xrightarrow{G'} \hat{X}(w) = X_{a_1} \cdots X_{a_m}$ by the previous step. For each $i \in \{1, \dots, m\}$ the $P''$-rule $X_{a_i} \to a_i$ replaces the $i$-th symbol, yielding the chain
\begin{align*}
X_{a_1} X_{a_2} \cdots X_{a_m} &\to_{G'} a_1 X_{a_2} \cdots X_{a_m} \to_{G'} a_1 a_2 X_{a_3} \cdots X_{a_m} \to_{G'} \cdots \to_{G'} a_1 a_2 \cdots a_m = w.
\end{align*}
Each individual step is a valid $G'$-derivation step: the rule $X_{a_i} \to a_i$ is in $P'' \subseteq P' \cup P''$, and the surrounding context $a_1 \cdots a_{i-1}$ (left) and $X_{a_{i+1}} \cdots X_{a_m}$ (right) makes the step a legal rewriting. Since derivations are closed under concatenation, $S \xrightarrow{G'} w$, and since $w \in \Sigma^\star$ it belongs to $\mathcal{L}(G')$.
Note a subtlety: the order in which we fire the $P''$-rules does not matter (they are positioned at disjoint locations), but the count is forced to be exactly $m = |w|$. This observation becomes essential in the converse direction.
[/guided]
[/step]
[step:Count the $P''$-steps in any $G'$-derivation of a terminal word]
We now argue the converse. Fix $w \in \mathcal{L}(G')$ with $|w| = m$, and let
\begin{align*}
S = \tau_0 \to_{G'} \tau_1 \to_{G'} \cdots \to_{G'} \tau_N = w
\end{align*}
be a $G'$-derivation of $w$. Classify each step as a **$P'$-step** (the rule used belongs to $P'$) or a **$P''$-step** (the rule used belongs to $P''$), and let $k$ denote the number of $P''$-steps.
[claim:Every $G'$-derivation of $w$ uses exactly $m$ $P''$-steps]
We count the number of terminal symbols (i.e., elements of $\Sigma$) in each $\tau_i$. Let $t_i := |\tau_i|_\Sigma$ denote this count.
- A $P'$-step replaces $\hat{X}(\gamma)$ by $\hat{X}(\delta)$ in the current string. Since $\hat{X}$ maps into $(V')^\star$, both $\hat{X}(\gamma)$ and $\hat{X}(\delta)$ contain no terminals. The surrounding context is unchanged. Hence $t_{i+1} = t_i$: $P'$-steps preserve the terminal count.
- A $P''$-step replaces a single $X_a \in V'$ by $a \in \Sigma$. Hence $t_{i+1} = t_i + 1$: each $P''$-step increments the terminal count by one.
Starting from $\tau_0 = S$ with $t_0 = 0$ (since $S \in V$ is a variable) and ending at $\tau_N = w$ with $t_N = m$, the net terminal-count increment is $m$. Since only $P''$-steps change the count and each increments by one, we conclude $k = m$.
[/claim]
[proof]
See the computation inside the claim body above: the invariant that $P'$-steps preserve $t_i$ and $P''$-steps increment $t_i$ by one, combined with $t_0 = 0$ and $t_N = m$, forces the number $k$ of $P''$-steps to equal $m$.
[/proof]
[/step]
[step:Contract the $P''$-steps to recover a $G$-derivation]
Continue with the $G'$-derivation $S = \tau_0 \to_{G'} \cdots \to_{G'} \tau_N = w$. Let $i_1 < i_2 < \cdots < i_m$ be the indices of the $P''$-steps; the remaining $N - m$ steps are $P'$-steps.
Define, for $0 \leq j \leq N$, the "devariablised" string
\begin{align*}
\psi_j &:= \hat{X}^{-1}\!\left(\text{the result of reapplying $X$ to each terminal of $\tau_j$}\right).
\end{align*}
More precisely, let $Y: \Omega' \to \Omega$ be the inverse renaming
\begin{align*}
Y(x) &:= \begin{cases} a & \text{if } x = X_a \in V_X, \\ a & \text{if } x = a \in \Sigma, \\ A & \text{if } x = A \in V, \end{cases}
\end{align*}
and $\hat{Y}: (\Omega')^\star \to \Omega^\star$ its functorial extension. Note $Y$ sends both $a$ and $X_a$ to $a$, collapsing the two copies of $\Sigma$ inside $\Omega'$ into one; it is not an inverse to $X$ in the strict sense but rather a two-to-one retraction. We set $\psi_j := \hat{Y}(\tau_j)$.
We show: if the step $\tau_j \to_{G'} \tau_{j+1}$ is a $P'$-step, then $\psi_j \to_G \psi_{j+1}$ or $\psi_j = \psi_{j+1}$; if it is a $P''$-step, then $\psi_j = \psi_{j+1}$.
- *$P'$-step.* The step uses a rule $\hat{X}(\gamma) \to \hat{X}(\delta) \in P'$ at positions $\mu, \nu \in (\Omega')^\star$:
\begin{align*}
\tau_j &= \mu\, \hat{X}(\gamma)\, \nu, & \tau_{j+1} &= \mu\, \hat{X}(\delta)\, \nu.
\end{align*}
Applying $\hat{Y}$ and using it is a monoid homomorphism,
\begin{align*}
\psi_j &= \hat{Y}(\mu)\, \hat{Y}(\hat{X}(\gamma))\, \hat{Y}(\nu), & \psi_{j+1} &= \hat{Y}(\mu)\, \hat{Y}(\hat{X}(\delta))\, \hat{Y}(\nu).
\end{align*}
We claim $\hat{Y}(\hat{X}(\gamma)) = \gamma$ for every $\gamma \in \Omega^\star$: on a terminal $a$ we have $\hat{Y}(\hat{X}(a)) = Y(X_a) = a$, and on a variable $A$ we have $\hat{Y}(\hat{X}(A)) = Y(A) = A$, so $\hat{Y} \circ \hat{X} = \mathrm{id}_{\Omega^\star}$ by homomorphic extension. Therefore
\begin{align*}
\psi_j &= \hat{Y}(\mu)\, \gamma\, \hat{Y}(\nu), & \psi_{j+1} &= \hat{Y}(\mu)\, \delta\, \hat{Y}(\nu),
\end{align*}
and since $(\gamma \to \delta) \in P$ by definition of $P'$, this is a valid $G$-step: $\psi_j \to_G \psi_{j+1}$.
- *$P''$-step.* The step uses a rule $X_a \to a$:
\begin{align*}
\tau_j &= \mu X_a \nu, & \tau_{j+1} &= \mu a \nu.
\end{align*}
Applying $\hat{Y}$,
\begin{align*}
\psi_j &= \hat{Y}(\mu)\, Y(X_a)\, \hat{Y}(\nu) = \hat{Y}(\mu)\, a\, \hat{Y}(\nu), \\
\psi_{j+1} &= \hat{Y}(\mu)\, Y(a)\, \hat{Y}(\nu) = \hat{Y}(\mu)\, a\, \hat{Y}(\nu).
\end{align*}
Hence $\psi_j = \psi_{j+1}$: $P''$-steps are contracted to no-ops.
Assembling the chain $(\psi_0, \psi_1, \dots, \psi_N)$ and removing consecutive duplicates produces a sequence of genuine $G$-derivation steps from $\psi_0 = \hat{Y}(S) = S$ to $\psi_N = \hat{Y}(w) = w$ (the latter holds because $w \in \Sigma^\star$ and $\hat{Y}$ fixes terminals of $\Sigma$). This is a $G$-derivation of $w$ from $S$, so $w \in \mathcal{L}(G)$.
[guided]
The strategy is to "undo" the renaming in reverse. The map $\hat{Y}$ is designed so that every symbol in the extended alphabet $\Omega' = \Sigma \cup V \cup V_X$ is sent back to its corresponding symbol in the original alphabet $\Omega = \Sigma \cup V$: both $a$ and $X_a$ go to $a$, and each $A \in V$ goes to itself. The key computation is $\hat{Y} \circ \hat{X} = \mathrm{id}_{\Omega^\star}$, which says: renaming via $X$ and then contracting via $Y$ recovers the original.
Why does a $P''$-step become a no-op under $\hat{Y}$? Because the only thing a $P''$-step does is replace $X_a$ by $a$, and $\hat{Y}$ identifies these two symbols. So after applying $\hat{Y}$ to both sides, the strings are indistinguishable.
Why does a $P'$-step translate back to a $G$-step? Because the rule $\hat{X}(\gamma) \to \hat{X}(\delta) \in P'$ was built as the $\hat{X}$-image of a rule $\gamma \to \delta \in P$. Applying $\hat{Y}$ unwinds $\hat{X}$ precisely, so the transformed step is $\gamma \to \delta$ applied at the (translated) positions $\hat{Y}(\mu), \hat{Y}(\nu)$.
After the translation, the sequence $(\psi_0, \dots, \psi_N)$ may contain consecutive duplicates corresponding to the $P''$-steps. A derivation does not permit idle steps (steps where the string is unchanged), but removing consecutive duplicates preserves the fact that consecutive *distinct* strings are related by a valid rule from $P$. This shortened sequence is the desired $G$-derivation.
What happens at the endpoints? Since $S \in V$, $\hat{Y}(S) = Y(S) = S$. Since $w \in \Sigma^\star$, each symbol of $w$ is fixed by $Y$, so $\hat{Y}(w) = w$. Hence the compressed derivation starts at $S$ and ends at $w$, witnessing $w \in \mathcal{L}(G)$.
[/guided]
[/step]
[step:Combine both directions to conclude equivalence]
From the fourth step, $\mathcal{L}(G) \subseteq \mathcal{L}(G')$. From the previous step, $\mathcal{L}(G') \subseteq \mathcal{L}(G)$. Hence $\mathcal{L}(G) = \mathcal{L}(G')$, and since $G'$ is variable-based (second step), $G'$ is a variable-based grammar equivalent to $G$. This completes the proof.
[/step]