[proofplan]
We transform an arbitrary context-free grammar $G$ into an equivalent grammar $G'$ in [Chomsky normal form](/pages/???) through three successive rewriting passes, each of which is a terminating algorithm producing a new grammar with the same language. *Pass 1 (terminal extraction)* replaces every terminal symbol $a$ appearing in a rule of length $\ge 2$ by a fresh variable $T_a$, after installing the rule $T_a \to a$; this purges terminal-variable mixing from long rules. *Pass 2 (unit elimination)* forms the unit closure of the grammar and then deletes all unit rules; by the [Unit-Rule Removal Theorem](/theorems/1804), this preserves the language, and the result has no rules of the form $A \to B$ with $B$ a single variable. *Pass 3 (binarisation)* replaces every remaining long rule $A \to X_1 X_2 \cdots X_k$ ($k \ge 3$, $X_i \in V$) by the chain $A \to X_1 Y_1$, $Y_1 \to X_2 Y_2$, …, $Y_{k-2} \to X_{k-1} X_k$ using fresh variables. At the end, every production has one of the CNF shapes $A \to BC$ or $A \to a$ (we assume $\varepsilon \notin \mathcal{L}(G)$; the standard epsilon-elimination refinement handles the general case separately). Effectiveness is immediate because each pass is a finite, fully specified rewriting procedure operating on a finite grammar.
[/proofplan]
[step:Fix terminology, the CNF definition, and the standing assumption on $\varepsilon$]
Let $G = (V, \Sigma, P, S)$ be a [context-free grammar](/pages/???) with variables $V$, terminals $\Sigma$, productions $P$, and start symbol $S$. A rule is written $A \to \gamma$ with $A \in V$, $\gamma \in (V \cup \Sigma)^*$.
A grammar $G'$ is in *Chomsky normal form* iff every production has one of the two shapes
\begin{align*}
A &\to B\, C, \qquad A, B, C \in V, \\
A &\to a, \qquad\quad A \in V,\ a \in \Sigma.
\end{align*}
In particular no rule has right-hand side $\varepsilon$, and no right-hand side mixes terminals and variables. We assume throughout that $\varepsilon \notin \mathcal{L}(G)$; the reduction $\varepsilon \in \mathcal{L}(G) \Rightarrow \varepsilon \in \mathcal{L}(G')$ is handled by a separate pre-processing step (epsilon elimination) that we cite as [Elimination of $\varepsilon$-Rules](/theorems/???) and do not repeat here. Our goal is to produce, from $G$, a CNF grammar $G'$ with $\mathcal{L}(G') = \mathcal{L}(G)$.
Two grammars $G$, $G''$ are *equivalent* iff $\mathcal{L}(G) = \mathcal{L}(G'')$. We check equivalence after each pass; composition of equivalences gives $\mathcal{L}(G') = \mathcal{L}(G)$.
[/step]
[step:Pass 1 — extract terminals occurring inside long right-hand sides]
Define the set of *long rules* as
\begin{align*}
P_{\ge 2} &:= \{A \to \gamma \in P : |\gamma| \ge 2\}.
\end{align*}
For each terminal $a \in \Sigma$ that appears in *some* $\gamma$ with $A \to \gamma \in P_{\ge 2}$, introduce a fresh variable $T_a \notin V$. Let $V_1 := V \cup \{T_a : a \in \Sigma_1\}$, where $\Sigma_1 \subseteq \Sigma$ is the set of such terminals. Build $P_1$ by:
- for every rule $A \to \gamma \in P$ with $|\gamma| = 1$, keep it unchanged;
- for every rule $A \to X_1 \cdots X_k \in P_{\ge 2}$ (with $X_i \in V \cup \Sigma$), replace each $X_i \in \Sigma_1$ by $T_{X_i}$, obtaining a new rule whose right-hand side lies in $V_1^*$;
- add the rules $T_a \to a$ for each $a \in \Sigma_1$.
Let $G_1 := (V_1, \Sigma, P_1, S)$.
[claim:$\mathcal{L}(G_1) = \mathcal{L}(G)$]
*Forward ($\mathcal{L}(G) \subseteq \mathcal{L}(G_1)$).* Given a $G$-derivation $S \xrightarrow{*}_G w$, imitate each step in $G_1$: if the step uses a length-$1$ rule, use the identical rule in $P_1$; if it uses a long rule $A \to X_1 \cdots X_k$, use the corresponding $P_1$-rule $A \to X_1' \cdots X_k'$ (with $X_i' = T_{X_i}$ for $X_i \in \Sigma_1$ and $X_i' = X_i$ otherwise), then, for each substituted $T_{X_i}$, apply the rule $T_{X_i} \to X_i \in P_1$ at the new variable occurrence. The resulting sequence is a valid $G_1$-derivation producing $w$.
*Reverse ($\mathcal{L}(G_1) \subseteq \mathcal{L}(G)$).* Given a $G_1$-derivation $S \xrightarrow{*}_{G_1} w$ (with $w \in \Sigma^*$), note that every fresh variable $T_a$ has exactly one $G_1$-rule, namely $T_a \to a$. Collapse each pair (use of long rule in $P_1$, subsequent rewriting of each introduced $T_a$) into a single use of the corresponding $G$-rule: since $w \in \Sigma^*$ and $T_a$ is not a terminal, every $T_a$-occurrence must eventually be rewritten, and the only available rewrite is $T_a \to a$. The resulting sequence is a valid $G$-derivation.
[/claim]
After Pass 1, every rule of $G_1$ is either a length-$1$ rule (of the form $A \to a$ with $a \in \Sigma$, or of the form $A \to B$ with $B \in V$; note that $T_a \to a$ is length-$1$) or a rule whose right-hand side lies in $V_1^*$ (all-variable, length $\ge 2$).
[/step]
[step:Pass 2 — form the unit closure and delete unit rules]
A rule $A \to B$ with $A, B \in V_1$ is a *unit rule*. We first form the unit closure $\widehat{G}_1 := (V_1, \Sigma, \widehat{P}_1, S)$, where $\widehat{P}_1$ contains all rules of $P_1$ together with, for every chain $A = A_0 \to A_1 \to \cdots \to A_k$ of unit rules in $P_1$ and every non-unit rule $A_k \to \gamma \in P_1$, the added rule $A \to \gamma$. The closure $\widehat{P}_1$ is obtained by iterating this saturation until no new rule is added; since the number of potential rules $V_1 \times (V_1 \cup \Sigma)^*$ involved is bounded by $|V_1| \cdot |P_1|$ (every added rule has the form $A \to \gamma$ with $A \in V_1$ and $\gamma$ the right-hand side of some pre-existing rule of $P_1$), saturation terminates in at most $|V_1| \cdot |P_1|$ steps.
The grammar $\widehat{G}_1$ is *unit-closed* by construction, and adding rules to a grammar cannot decrease its language; that additions also cannot increase the language is because each added rule $A \to \gamma$ is derivable in $G_1$ as $A \Rightarrow_{G_1} A_1 \Rightarrow_{G_1} \cdots \Rightarrow_{G_1} A_k \Rightarrow_{G_1} \gamma$. Therefore $\mathcal{L}(\widehat{G}_1) = \mathcal{L}(G_1)$.
Now let $G_2 := (V_1, \Sigma, P_2, S)$ where $P_2$ is obtained from $\widehat{P}_1$ by deleting all unit rules. Applying the [Unit-Rule Removal Theorem](/theorems/1804) to the unit-closed grammar $\widehat{G}_1$:
[claim:$\mathcal{L}(G_2) = \mathcal{L}(\widehat{G}_1)$]
The grammar $\widehat{G}_1$ is unit-closed, and $G_2$ is obtained by deleting all unit rules. The [Unit-Rule Removal Theorem](/theorems/1804) asserts that for any unit-closed grammar, deleting all unit rules preserves the language. Applying it gives $\mathcal{L}(G_2) = \mathcal{L}(\widehat{G}_1)$.
[/claim]
Chaining, $\mathcal{L}(G_2) = \mathcal{L}(\widehat{G}_1) = \mathcal{L}(G_1) = \mathcal{L}(G)$. After Pass 2 every production of $G_2$ has one of the shapes
\begin{align*}
&A \to a \quad (A \in V_1,\ a \in \Sigma), \\
&A \to X_1 X_2 \cdots X_k \quad (A, X_i \in V_1,\ k \ge 2),
\end{align*}
because unit rules are the only length-$1$ rules with variable right-hand side, and they have been deleted, while length-$1$ rules with terminal right-hand side remain. The number of added rules during unit closure is bounded by $|V_1| \cdot |P_1|$, as asserted.
[/step]
[step:Pass 3 — binarise right-hand sides of length $\ge 3$]
Consider a rule $A \to X_1 X_2 \cdots X_k \in P_2$ with $k \ge 3$ and $X_1, \dots, X_k \in V_1$. Introduce fresh variables $Y_1^{(A, X_1 \cdots X_k)}, \dots, Y_{k - 2}^{(A, X_1 \cdots X_k)}$ (one set of fresh variables per rule, so distinct rules do not share auxiliary variables), abbreviated $Y_1, \dots, Y_{k-2}$ when the rule is clear from context. Replace the rule by the chain of binary rules
\begin{align*}
A &\to X_1\, Y_1, \\
Y_i &\to X_{i+1}\, Y_{i+1}, \qquad 1 \le i \le k - 3, \\
Y_{k-2} &\to X_{k-1}\, X_k.
\end{align*}
Rules with $k = 2$ (binary) and $k = 1$ (necessarily $A \to a$ with $a \in \Sigma$, since unit rules have been eliminated) are left unchanged.
Let $V'$ be $V_1$ augmented by all fresh auxiliaries introduced, $P'$ the resulting production set, and $G' := (V', \Sigma, P', S)$.
[claim:$\mathcal{L}(G') = \mathcal{L}(G_2)$]
*Forward ($\mathcal{L}(G_2) \subseteq \mathcal{L}(G')$).* Given a $G_2$-derivation, imitate each step in $G'$: binary and unary steps reuse the identical rule, while a use of a long rule $A \to X_1 \cdots X_k$ is simulated by the chain of $k - 1$ binary steps $A \Rightarrow_{G'} X_1 Y_1 \Rightarrow_{G'} X_1 X_2 Y_2 \Rightarrow_{G'} \cdots \Rightarrow_{G'} X_1 X_2 \cdots X_k$.
*Reverse ($\mathcal{L}(G') \subseteq \mathcal{L}(G_2)$).* Given a $G'$-derivation producing $w \in \Sigma^*$, each auxiliary $Y_i^{(A, X_1 \cdots X_k)}$ appears exclusively on the left-hand side of exactly one $G'$-rule (the rule introduced for it in Pass 3) and on the right-hand side of the preceding chain-rule; so once $Y_i$ appears, the only way to reach a terminal string is to apply the unique $G'$-rule $Y_i \to X_{i+1}\, Y_{i+1}$ (or $Y_{k-2} \to X_{k-1} X_k$ at the end of the chain). Collapsing each full chain of $k - 1$ binary rewrites into a single use of the original rule $A \to X_1 \cdots X_k$ in $P_2$ produces a valid $G_2$-derivation of $w$.
[/claim]
Chaining with the previous equivalences, $\mathcal{L}(G') = \mathcal{L}(G_2) = \mathcal{L}(G)$.
Each rule $A \to X_1 \cdots X_k$ contributes $k - 2$ fresh auxiliaries and yields $k - 1$ binary rules in place of one $k$-rule, so Pass 3 adds at most $\sum_{A \to \alpha \in P_2} (|\alpha| - 2) \le \sum_{A \to \alpha \in P} (|\alpha| - 2)$ fresh variables and rules, as asserted.
[/step]
[step:Verify $G'$ is in Chomsky normal form and conclude effectiveness]
Inspect the productions of $G'$:
- Terminal-valued length-$1$ rules: these are the rules $A \to a$ (from $G_2$) and $T_a \to a$ (introduced in Pass 1). Both have the CNF shape $A \to a$ with $A \in V'$, $a \in \Sigma$.
- Binary rules: these are either the unchanged binary rules of $G_2$ (with both symbols in $V_1 \subseteq V'$ because terminal extraction in Pass 1 ensured no long rule contains a terminal) or the rules produced by the binarisation chain $A \to X_1 Y_1$, $Y_i \to X_{i+1} Y_{i+1}$, $Y_{k-2} \to X_{k-1} X_k$ (all entries in $V_1 \cup \{\text{auxiliaries}\} = V'$). All have the CNF shape $A \to B\, C$.
- There are no remaining length-$1$ rules with variable right-hand side, because all unit rules were deleted in Pass 2.
- There are no remaining rules with $|\gamma| \ge 3$, because Pass 3 replaced each such rule.
- There are no rules with $\gamma = \varepsilon$: these were ruled out by the standing assumption $\varepsilon \notin \mathcal{L}(G)$ and the pre-processing step that eliminates $\varepsilon$-rules from $G$ before Pass 1.
Therefore every production of $G'$ has CNF shape, and $G'$ is in Chomsky normal form.
*Effectiveness.* Pass 1 is performed by a single scan of $P$: for each $a \in \Sigma$ appearing in any long rule, add $T_a$ and the rule $T_a \to a$, and rewrite long rules. Pass 2 is a saturation procedure: repeatedly adjoin rules obtained by unit composition until a fixpoint is reached, then delete unit rules; the number of iterations is bounded by $|V_1| \cdot |P_1|$ because each new rule has the form $A \to \gamma$ with $\gamma$ the right-hand side of a pre-existing rule and $A$ a variable, giving a finite search space. Pass 3 is a finite rewrite of every long rule. Each pass halts on every finite input, so the composition is a terminating algorithm. Hence $G'$ is computable from $G$.
Combining $\mathcal{L}(G') = \mathcal{L}(G_2) = \mathcal{L}(\widehat{G}_1) = \mathcal{L}(G_1) = \mathcal{L}(G)$ with the verification that $G'$ is in CNF, every context-free grammar has an algorithmically computable CNF equivalent, completing the proof.
[/step]