[proofplan]
We prove by induction on the length of a $G$-derivation that every sentential form $\alpha$ produced by a regular grammar from its start symbol satisfies $\alpha \in \mathbb{W} \cup \mathbb{W}V$ — either a pure terminal string or a terminal string followed by a single variable. The base case is immediate: the zero-length derivation gives $\alpha = S = \varepsilon S \in \mathbb{W} V$. For the inductive step, a regular grammar has only rules of the form $A \to a$ and $A \to aB$; the form of the current string (in $\mathbb{W}$ or $\mathbb{W}V$) restricts which rules can fire, and we verify that whichever case applies produces a string of the same shape.
[/proofplan]
[step:Fix the convention for regular grammar rules]
A **regular grammar** $G = (\Sigma, V, P, S)$ has the property that every rule $r \in P$ has one of the two forms
\begin{align*}
A &\to a, & A &\to a B,
\end{align*}
where $A, B \in V$ and $a \in \Sigma$. Write $\mathbb{W} := \Sigma^\star$ and $\mathbb{W} V := \{wA : w \in \mathbb{W},\; A \in V\}$; these are disjoint subsets of $(\Sigma \cup V)^\star$ because elements of $\mathbb{W}$ contain no variables and elements of $\mathbb{W} V$ contain exactly one variable, namely the final symbol.
The rules $A \to a$ are called **terminal rules** and the rules $A \to aB$ are called **nonterminal rules**. A terminal rule applied to a sentential form replaces a variable by a terminal; a nonterminal rule applied to a sentential form replaces a variable by a terminal followed by a variable.
[/step]
[step:State the invariant and set up the induction]
We prove: for every $n \geq 0$ and every $\alpha \in (\Sigma \cup V)^\star$ with $S \xrightarrow{G,\, n} \alpha$ (meaning $\alpha$ is derived from $S$ in $n$ steps),
\begin{align*}
\alpha &\in \mathbb{W} \cup \mathbb{W} V.
\end{align*}
This is the "shape invariant" $\mathcal{P}(\alpha) := (\alpha \in \mathbb{W}) \lor (\alpha \in \mathbb{W} V)$.
The proof is by induction on $n$.
[/step]
[step:Verify the base case $n = 0$]
A zero-step derivation from $S$ ends at $S$ itself, so $\alpha = S \in V$. Writing $S = \varepsilon S$ with $\varepsilon \in \mathbb{W}$ (the empty string) and $S \in V$ exhibits $S \in \mathbb{W} V$. Hence $\mathcal{P}(S)$ holds.
[/step]
[step:Show $\mathbb{W}$ is a sink for any derivation]
We record a simple observation for use in the inductive step:
[claim:No rule can be applied to a string in $\mathbb{W}$]
If $\beta \in \mathbb{W} = \Sigma^\star$, then $\beta$ contains no variable. Every rule in $P$ has a single variable $A \in V$ as its left-hand side. A derivation step $\beta \to_G \beta'$ requires a substring of $\beta$ equal to the left-hand side of some rule; since $A \in V$ does not occur in $\beta$, no rule applies. Hence $\beta$ admits no $G$-derivation step.
[/claim]
[proof]
See the claim statement. The argument uses only that the left-hand side of every regular rule is a single variable and that $\beta \in \mathbb{W}$ contains no variable.
[/proof]
[/step]
[step:Verify the inductive step]
Fix $n \geq 0$ and assume the invariant holds for all $n$-step derivations: if $S \xrightarrow{G,\, n} \beta$ then $\beta \in \mathbb{W} \cup \mathbb{W} V$.
Consider an $(n+1)$-step derivation $S \xrightarrow{G,\, n} \beta \to_G \alpha$. By the inductive hypothesis $\beta \in \mathbb{W} \cup \mathbb{W} V$. We split cases on which subset contains $\beta$.
*Case 1: $\beta \in \mathbb{W}$.* The previous step's claim gives that no $G$-rule applies to $\beta$, so there is no $\alpha$ with $\beta \to_G \alpha$. This contradicts the assumption that a step $\beta \to_G \alpha$ exists. Hence this case does not occur.
*Case 2: $\beta \in \mathbb{W} V$.* Write $\beta = w A$ with $w \in \mathbb{W}$ and $A \in V$. The single variable $A$ occurs at the last position of $\beta$; it is the only position in $\beta$ where a rule can fire (since every rule has a single-variable left-hand side, and $A$ is the only variable in $\beta$). So the derivation step $\beta \to_G \alpha$ uses some rule with left-hand side $A$. Because $G$ is regular, this rule is one of:
- $A \to a$ for some $a \in \Sigma$. Then
\begin{align*}
\alpha &= w \cdot a \in \mathbb{W},
\end{align*}
because $wa \in \Sigma^\star$. Hence $\alpha \in \mathbb{W}$.
- $A \to aB$ for some $a \in \Sigma$, $B \in V$. Then
\begin{align*}
\alpha &= w \cdot a B = (wa) B \in \mathbb{W} V,
\end{align*}
because $wa \in \Sigma^\star$ and $B \in V$. Hence $\alpha \in \mathbb{W} V$.
In either subcase $\alpha \in \mathbb{W} \cup \mathbb{W} V$, closing the induction.
[guided]
The strategy is to use induction on the derivation length together with a careful analysis of the two admissible regular-rule shapes. The inductive hypothesis assumes the invariant for the predecessor string $\beta$, and the inductive step examines which rules can actually be applied.
Why does Case 1 ($\beta \in \mathbb{W}$) lead to a contradiction rather than simply being vacuous? Because the derivation step $\beta \to_G \alpha$ is a *given* datum — we are analysing an $(n+1)$-step derivation, so a step from $\beta$ to $\alpha$ must exist. The claim from the previous step proves that if $\beta \in \mathbb{W}$, no such step can exist. So under the inductive hypothesis, the case $\beta \in \mathbb{W}$ is incompatible with the existence of an $(n+1)$-th step. We conclude that whenever a derivation continues past an $n$-step prefix ending at $\beta$, that $\beta$ cannot have been in $\mathbb{W}$.
In Case 2, the key observation is that the single variable in $\beta = wA$ occupies the last position. Where else could the rule fire? Nowhere: rules have a single-variable left-hand side, and the only variable in $\beta$ is the trailing $A$. So we know exactly which position is rewritten.
What determines the shape of $\alpha$? The rule. Regular rules come in two shapes, and each preserves the invariant:
- $A \to a$ removes the variable and appends a terminal, producing a pure terminal string: $\alpha = wa \in \mathbb{W}$.
- $A \to aB$ removes $A$ and substitutes a terminal followed by a variable, keeping the shape "terminal string followed by a single variable": $\alpha = waB \in \mathbb{W}V$.
In both cases the invariant holds. This is the heart of the argument: the two-part structure of regular grammars ($\mathbb{W}$ terminus vs $\mathbb{W}V$ intermediate) is closed under the two rule shapes.
Notice that had we allowed any rule of shape other than the two regular shapes — e.g., $A \to Ba$, $A \to BC$, or $A \to \varepsilon$ — the invariant would fail. The left-prefixed placement of the terminal and the limitation to at most one trailing variable are what keep the sentential forms in $\mathbb{W} \cup \mathbb{W} V$.
[/guided]
[/step]
[step:Conclude the theorem]
By induction on derivation length (base case in the third step, inductive step in the fifth step), every $\alpha$ with $S \xrightarrow{G} \alpha$ satisfies $\alpha \in \mathbb{W} \cup \mathbb{W} V$. This completes the proof.
[/step]