[proofplan]
The easy inclusion $\mathcal{L}(G') \subseteq \mathcal{L}(G)$ follows because $G'$ is obtained from $G$ by deleting rules, so every $G'$-derivation is already a $G$-derivation. The reverse inclusion is the content of the theorem: given $w \in \mathcal{L}(G)$, we must produce a derivation of $w$ that uses no unit rules. We argue by minimality. Pick a $G$-derivation of $w$ of minimum length and suppose, for contradiction, that it uses a unit rule. We select a unit step $A \to B$ whose created occurrence of $B$ is next rewritten by a non-unit rule $B \to \zeta$, and we trace that occurrence through the derivation up to the step where $B \to \zeta$ is applied. Unit closure of $G$ then provides the rule $A \to \zeta$, so we can splice out the unit step and replace the later $B \to \zeta$ step by a single direct step $A \to \zeta$. This saves one step and contradicts minimality. Hence a shortest $G$-derivation uses no unit rules, so it is a valid $G'$-derivation and $w \in \mathcal{L}(G')$.
[/proofplan]
[step:Fix notation for the grammars, derivations, and the hypothesis of unit closure]
Let $G = (V, \Sigma, P, S)$ be a [context-free grammar](/pages/???) with variables $V$, terminals $\Sigma$, productions $P \subseteq V \times (V \cup \Sigma)^*$, and start symbol $S \in V$. A production $A \to B$ with $A, B \in V$ is called a *unit rule*. Define the set of non-unit rules
\begin{align*}
P' &:= \{A \to \gamma \in P : \gamma \notin V\}
\end{align*}
and let $G' := (V, \Sigma, P', S)$ be the grammar obtained from $G$ by deleting all unit rules. The grammar $G$ is *unit-closed* iff for every chain of unit rules $A = A_0 \to A_1 \to \cdots \to A_k$ in $P$ and every non-unit rule $A_k \to \gamma \in P$, the rule $A \to \gamma$ also lies in $P$. In particular, whenever $A \xrightarrow{*}_G B$ via unit rules only and $B \to \zeta$ is a non-unit rule of $P$, the rule $A \to \zeta$ is also in $P$.
Write $\alpha \Rightarrow_G \beta$ for a one-step $G$-derivation and $S \xrightarrow{*}_G w$ for a multi-step derivation; the *length* of a derivation is its number of one-step transitions. The language is $\mathcal{L}(G) := \{w \in \Sigma^* : S \xrightarrow{*}_G w\}$, and analogously $\mathcal{L}(G') := \{w \in \Sigma^* : S \xrightarrow{*}_{G'} w\}$.
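The definitions above can be made concrete with a small Python sketch. It is illustrative only: encoding a production as a pair `(lhs, rhs)`, the function names `is_unit`, `non_unit_rules`, `unit_reachable`, and `is_unit_closed`, and the example grammar are all assumptions introduced here, not notation from the proof.

```python
from typing import Set, Tuple

# Hypothetical encoding: a production A -> X1 ... Xm is ("A", ("X1", ..., "Xm")).
Rule = Tuple[str, Tuple[str, ...]]


def is_unit(rule: Rule, variables: Set[str]) -> bool:
    """A rule A -> gamma is a unit rule when gamma is a single variable."""
    _, rhs = rule
    return len(rhs) == 1 and rhs[0] in variables


def non_unit_rules(rules: Set[Rule], variables: Set[str]) -> Set[Rule]:
    """The set P' of the text: every rule of P that is not a unit rule."""
    return {r for r in rules if not is_unit(r, variables)}


def unit_reachable(start: str, rules: Set[Rule], variables: Set[str]) -> Set[str]:
    """Variables reachable from `start` by a chain of zero or more unit rules."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for lhs, rhs in rules:
            if lhs == current and is_unit((lhs, rhs), variables) and rhs[0] not in seen:
                seen.add(rhs[0])
                stack.append(rhs[0])
    return seen


def is_unit_closed(rules: Set[Rule], variables: Set[str]) -> bool:
    """Check: for each unit chain A -> ... -> B and non-unit B -> gamma, A -> gamma is in P."""
    p_prime = non_unit_rules(rules, variables)
    for a in variables:
        for b in unit_reachable(a, rules, variables):
            for lhs, rhs in p_prime:
                if lhs == b and (a, rhs) not in rules:
                    return False
    return True


# Example grammar (an assumption for illustration): S -> AC, A -> B, A -> a, B -> a, C -> b.
EXAMPLE_V = {"S", "A", "B", "C"}
EXAMPLE_P = {
    ("S", ("A", "C")),
    ("A", ("B",)),
    ("A", ("a",)),
    ("B", ("a",)),
    ("C", ("b",)),
}
```

On this example, `is_unit_closed(EXAMPLE_P, EXAMPLE_V)` holds, while removing the rule `("A", ("a",))` breaks unit closure, because the chain $A \to B$ followed by $B \to a$ is then no longer matched by a direct rule $A \to a$.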
[/step]
[step:Establish the easy inclusion $\mathcal{L}(G') \subseteq \mathcal{L}(G)$]
Let $w \in \mathcal{L}(G')$. By definition there is a derivation $S = \alpha_0 \Rightarrow_{G'} \alpha_1 \Rightarrow_{G'} \cdots \Rightarrow_{G'} \alpha_n = w$. Each one-step transition $\alpha_i \Rightarrow_{G'} \alpha_{i+1}$ uses a rule $A \to \gamma \in P' \subseteq P$, hence is also a one-step transition $\alpha_i \Rightarrow_G \alpha_{i+1}$. Concatenating gives $S \xrightarrow{*}_G w$, so $w \in \mathcal{L}(G)$.
[/step]
[step:Set up the reverse inclusion via a shortest $G$-derivation]
Fix $w \in \mathcal{L}(G)$. The set of $G$-derivations of $w$ is nonempty, so by well-ordering of $\mathbb{N}$ we may choose one of *minimum length* $n \ge 0$:
\begin{align*}
S = \alpha_0 \Rightarrow_G \alpha_1 \Rightarrow_G \cdots \Rightarrow_G \alpha_n = w.
\end{align*}
We claim this derivation uses no unit rules; then every one-step transition uses a rule in $P' = P \setminus \{\text{unit rules}\}$, witnessing $w \in \mathcal{L}(G')$ and completing the proof. The remaining steps establish this claim by contradiction.
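The claim can be checked experimentally on small grammars. The Python sketch below finds a minimum-length derivation by breadth-first search over sentential forms. The function name `shortest_derivation`, the `(lhs, rhs)` encoding of rules, the search cap `max_steps`, and the example grammar are assumptions made for illustration; the search is not part of the proof and is only practical for tiny inputs.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # hypothetical encoding: (lhs, rhs)
Form = Tuple[str, ...]              # a sentential form as a tuple of symbols


def shortest_derivation(start: str, rules: Set[Rule], target: Form,
                        max_steps: int = 10) -> Optional[List[Rule]]:
    """Return the rules of a minimum-length derivation start =>* target,
    or None if no derivation of at most `max_steps` steps exists."""
    # parent[form] records how `form` was first reached; BFS reaches each
    # form first along a path with the fewest one-step transitions.
    parent: Dict[Form, Optional[Tuple[Form, Rule]]] = {(start,): None}
    queue = deque([((start,), 0)])
    ordered = sorted(rules)  # fixed iteration order for reproducibility
    while queue:
        form, depth = queue.popleft()
        if form == target:
            used: List[Rule] = []
            cur = form
            while parent[cur] is not None:
                cur, rule = parent[cur]
                used.append(rule)
            used.reverse()
            return used
        if depth == max_steps:
            continue
        # Try every production at every occurrence of its left-hand side.
        for pos, symbol in enumerate(form):
            for lhs, rhs in ordered:
                if lhs == symbol:
                    nxt = form[:pos] + rhs + form[pos + 1:]
                    if nxt not in parent:
                        parent[nxt] = (form, (lhs, rhs))
                        queue.append((nxt, depth + 1))
    return None


# Example unit-closed grammar (an assumption): S -> AC, A -> B, A -> a, B -> a, C -> b.
V = {"S", "A", "B", "C"}
P = {("S", ("A", "C")), ("A", ("B",)), ("A", ("a",)), ("B", ("a",)), ("C", ("b",))}
derivation = shortest_derivation("S", P, ("a", "b"))
uses_unit_rule = any(len(rhs) == 1 and rhs[0] in V for _, rhs in derivation)
```

For this grammar the search returns a derivation of length 3 and `uses_unit_rule` is `False`, in line with the claim; any derivation of $ab$ that passes through the unit rule $A \to B$ has length 4.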
[/step]
[step:Isolate the last unit-rule step and trace the induced variable]
Suppose toward a contradiction that the minimal derivation uses at least one unit rule. Let $i_0$ be the largest index at which a unit rule is applied, so that every later step uses a non-unit rule. The step $\alpha_{i_0} \Rightarrow_G \alpha_{i_0 + 1}$ rewrites some occurrence of a variable $A$, say at position $k$, using a unit rule $A \to B \in P$, so
\begin{align*}
\alpha_{i_0} &= \mu\, A\, \nu, & \alpha_{i_0 + 1} &= \mu\, B\, \nu,
\end{align*}
for some $\mu, \nu \in (V \cup \Sigma)^*$. To match the notation used in the splicing argument, set $\alpha := \mu$ and $\beta := \nu$, so that $\alpha_{i_0} = \alpha\, A\, \beta$ and $\alpha_{i_0 + 1} = \alpha\, B\, \beta$.
Since $w \in \Sigma^*$ contains no variables, the distinguished occurrence of $B$ created at step $i_0$ must be rewritten at some later step. Let $j \ge i_0 + 1$ be the smallest index at which this distinguished $B$-occurrence is the symbol being rewritten, and let the rule applied there be $B \to \zeta$ for some $\zeta \in (V \cup \Sigma)^*$.
We claim $B \to \zeta$ is a non-unit rule, i.e., $\zeta \notin V$. Indeed, the step $\alpha_j \Rightarrow_G \alpha_{j + 1}$ comes after step $i_0$, and by the choice of $i_0$ as the last unit-rule step, no later step applies a unit rule. Choosing the last unit step is what makes this work: unit closure as defined above only supplies non-unit rules $A \to \gamma$, so it could not be used to bypass a further unit rule $B \to C$.
Each step $\alpha_i \Rightarrow_G \alpha_{i + 1}$ with $i_0 + 1 \le i < j$ (there are none if $j = i_0 + 1$) rewrites a symbol *other* than the distinguished $B$. The sentential forms $\alpha_{i_0 + 1}, \alpha_{i_0 + 2}, \dots, \alpha_j$ therefore all contain the distinguished $B$-occurrence untouched, although its position may shift from $k$ as symbols to its left are rewritten. Hence there exist $\gamma, \delta \in (V \cup \Sigma)^*$ with
\begin{align*}
\alpha_j &= \gamma\, B\, \delta, & \alpha_{j + 1} &= \gamma\, \zeta\, \delta,
\end{align*}
and the sub-derivation
\begin{align*}
\alpha\, B\, \beta = \alpha_{i_0 + 1} \xrightarrow{*}_G \alpha_j = \gamma\, B\, \delta
\end{align*}
never rewrites the distinguished occurrence of $B$ (by minimality of $j$).
[/step]
[step:Construct a strictly shorter derivation using unit closure]
[claim:The rule $A \to \zeta$ belongs to $P$]
By hypothesis $A \to B \in P$ and $B \to \zeta \in P$ with $\zeta \notin V$. The length-$1$ unit chain $A \to B$ together with the non-unit rule $B \to \zeta$ satisfies the hypothesis of unit closure of $G$. Therefore $A \to \zeta \in P$.
[/claim]
We now splice the derivation. Starting from $\alpha_{i_0} = \alpha\, A\, \beta$, we construct a $G$-derivation that reaches the same sentential form $\alpha_{j + 1} = \gamma\, \zeta\, \delta$ in *strictly fewer* one-step transitions than the segment $\alpha_{i_0} \Rightarrow_G \cdots \Rightarrow_G \alpha_{j + 1}$.
*Step 1 (replay the $\alpha \to \gamma$, $\beta \to \delta$ transformations with $A$ in place of $B$).* Consider the sub-derivation $\alpha\, B\, \beta \xrightarrow{*}_G \gamma\, B\, \delta$, which has length $j - i_0 - 1$. By minimality of $j$, every one-step transition in it rewrites a variable occurrence to the left or to the right of the distinguished $B$, never the distinguished $B$ itself. Because $G$ is context-free, whether a production applies depends only on the variable being rewritten and not on its neighbouring symbols. Applying the *same sequence* of productions at the same occurrences, with the distinguished $B$ replaced by $A$ throughout, therefore gives a valid $G$-derivation
\begin{align*}
\alpha\, A\, \beta \xrightarrow{*}_G \gamma\, A\, \delta
\end{align*}
of length $j - i_0 - 1$.
*Step 2 (apply $A \to \zeta$ directly).* Using the rule $A \to \zeta \in P$ established in the claim, perform the single one-step transition
\begin{align*}
\gamma\, A\, \delta \Rightarrow_G \gamma\, \zeta\, \delta.
\end{align*}
Concatenating Steps 1 and 2 gives a $G$-derivation
\begin{align*}
\alpha_{i_0} = \alpha\, A\, \beta \xrightarrow{*}_G \gamma\, \zeta\, \delta = \alpha_{j + 1}
\end{align*}
of length $(j - i_0 - 1) + 1 = j - i_0$. The original segment had length $j + 1 - i_0$, one greater. Replacing the original segment by this shorter one (prepending $\alpha_0 \Rightarrow_G \cdots \Rightarrow_G \alpha_{i_0}$ and appending $\alpha_{j + 1} \Rightarrow_G \cdots \Rightarrow_G \alpha_n = w$) yields a $G$-derivation of $w$ of length $n - 1 < n$.
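As an illustration of the splice, consider a small grammar chosen here purely as an example (it is an assumption, not part of the theorem): $V = \{S, A, B, C\}$, $\Sigma = \{a, b\}$, and $P = \{S \to AC,\ A \to B,\ A \to a,\ B \to a,\ C \to b\}$. Its only unit rule is $A \to B$, and it is unit-closed because the non-unit rule $B \to a$ is matched by $A \to a \in P$. The derivation
\begin{align*}
S \Rightarrow_G A\, C \Rightarrow_G B\, C \Rightarrow_G B\, b \Rightarrow_G a\, b
\end{align*}
has length $n = 4$, with $i_0 = 1$, $j = 3$, $\zeta = a$, $\alpha$ and $\gamma$ both the empty string, $\beta = C$, and $\delta = b$. Step 1 replays the rule $C \to b$ with $A$ in place of $B$, and Step 2 applies $A \to a$ directly:
\begin{align*}
S \Rightarrow_G A\, C \Rightarrow_G A\, b \Rightarrow_G a\, b.
\end{align*}
The result is a derivation of length $3 = n - 1$, so the original length-$4$ derivation was not of minimum length.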
[/step]
[step:Conclude by contradiction and deduce $\mathcal{L}(G) \subseteq \mathcal{L}(G')$]
The derivation we constructed has length $n - 1$, contradicting the choice of the original derivation as one of minimum length $n$. Therefore our supposition fails: a shortest $G$-derivation of $w$ uses no unit rules.
Hence every one-step transition $\alpha_i \Rightarrow_G \alpha_{i + 1}$ in this shortest derivation uses a non-unit rule, i.e., a rule in $P'$. Reading the same sequence of productions as transitions of $G'$ gives a $G'$-derivation $S \xrightarrow{*}_{G'} w$, so $w \in \mathcal{L}(G')$.
Combining with the easy inclusion from earlier, $\mathcal{L}(G) = \mathcal{L}(G')$, as claimed.
[/step]