[proofplan]
For concatenation, we build a grammar $H$ with a fresh start symbol $T$ and a single seed rule $T \to SS'$, together with all rules of $P$ and $P'$. The hypothesis $V \cap V' = \varnothing$ and variable-basedness are what prevent the two halves from interfering: once $SS'$ is produced, every subsequent rule application occurs either entirely on the $S$-side (using $P$, which only sees $V$-variables) or entirely on the $S'$-side (using $P'$, which only sees $V'$-variables). A length/position-based invariant makes this precise. Equivalence then follows by splitting any $H$-derivation of $w$ into a $G$-derivation of the $V$-part and a $G'$-derivation of the $V'$-part. The union construction is analogous with two seed rules $T \to S$ and $T \to S'$ and a case split on which seed rule fires first. Class-level closure is obtained by first passing to equivalent variable-based grammars with disjoint variable alphabets (via renaming and variable-based conversion), then inspecting the shape of the new rules.
[/proofplan]
[step:Fix notation for the concatenation grammar $H$]
Let $G = (\Sigma, V, P, S)$ and $G' = (\Sigma, V', P', S')$ be variable-based grammars with $V \cap V' = \varnothing$. Choose a fresh variable $T \notin V \cup V' \cup \Sigma$. Define
\begin{align*}
H &:= (\Sigma,\; V \cup V' \cup \{T\},\; P \cup P' \cup \{T \to SS'\},\; T).
\end{align*}
Every rule of $P$ has left- and right-hand sides in $(V)^\star \cup V \Sigma$ (variable-based), and similarly for $P'$. The seed rule $T \to SS'$ has left-hand side $T$ (a single variable) and right-hand side $SS' \in (V \cup V')^\star$. Hence $H$ is a grammar, and it is variable-based: both sides of each rule consist solely of variables, except for the $P$- and $P'$-rules of the form $A \to a$, which have $A$ a single variable and $a$ a single terminal.
[/step]
[step:Establish the side-decomposition invariant for $H$-derivations from $T$]
We prove the following structural statement:
[claim:Every nonempty $H$-derivation from $T$ factors uniquely into an initial seed step followed by independent $G$- and $G'$-derivations]
Let $T = \tau_0 \to_H \tau_1 \to_H \cdots \to_H \tau_N$ be an $H$-derivation with $N \geq 1$. Then:
(a) The first step is $\tau_0 = T \to_H \tau_1 = SS'$ using the seed rule. This is forced because $T$ appears in $\tau_0 = T$ and the only rule with $T$ on the left-hand side is $T \to SS'$; no rule in $P$ or $P'$ has $T$ in its left- or right-hand side (since $T \notin V \cup V' \cup \Sigma$, and rules of $P$, $P'$ involve only symbols from $\Sigma \cup V$ and $\Sigma \cup V'$ respectively).
(b) For every $j \geq 1$ there exist strings $\alpha_j \in (V \cup \Sigma)^\star$ and $\beta_j \in (V' \cup \Sigma)^\star$ such that $\tau_j = \alpha_j \beta_j$ with $\alpha_1 = S$, $\beta_1 = S'$. Moreover $T$ never reappears for $j \geq 1$.
(c) Each subsequent step $\tau_j \to_H \tau_{j+1}$ is either a rule of $P$ applied within $\alpha_j$ (leaving $\beta_j$ unchanged) or a rule of $P'$ applied within $\beta_j$ (leaving $\alpha_j$ unchanged). No rule can straddle the boundary.
[/claim]
[proof]
Point (a) is proved in the claim statement.
For (b) and (c), proceed by induction on $j \geq 1$. The base case $j = 1$ is $\tau_1 = SS'$ with $\alpha_1 = S \in V$ and $\beta_1 = S' \in V'$; since $V \cap V' = \varnothing$, the split is unambiguous.
For the inductive step, suppose $\tau_j = \alpha_j \beta_j$ with $\alpha_j \in (V \cup \Sigma)^\star$, $\beta_j \in (V' \cup \Sigma)^\star$, and $T \notin \alpha_j \beta_j$. Consider a step $\tau_j \to_H \tau_{j+1}$ using rule $(\gamma \to \delta) \in P \cup P' \cup \{T \to SS'\}$ at position $\mu, \nu$:
\begin{align*}
\tau_j &= \mu \gamma \nu, & \tau_{j+1} &= \mu \delta \nu.
\end{align*}
Since $T$ does not appear in $\tau_j$, the rule $T \to SS'$ cannot fire. Hence $(\gamma \to \delta) \in P \cup P'$. We split cases:
- *Rule in $P$.* Then $\gamma \in (V \cup \Sigma)^+$ (nonempty; all grammar rules have nonempty left-hand side) uses only symbols from $V \cup \Sigma$. In particular, every symbol of $\gamma$ lies outside $V'$. Look at the occurrence of $\gamma$ as a substring of $\tau_j = \alpha_j \beta_j$. We claim this occurrence lies entirely within $\alpha_j$. Suppose for contradiction that the occurrence straddles the boundary, i.e., overlaps both $\alpha_j$ and $\beta_j$. Then $\gamma$ contains at least one symbol from the suffix $\beta_j \in (V' \cup \Sigma)^\star$ at a position inside $\beta_j$. But the leftmost such symbol must follow at least one symbol of $\alpha_j$ in $\gamma$; in particular, since $V \cap V' = \varnothing$ and $\alpha_j$ consists of symbols from $V \cup \Sigma$ while $\beta_j$ consists of symbols from $V' \cup \Sigma$, some symbol of $\gamma$ lies in $\alpha_j$ and some in $\beta_j$. That is permitted in principle; the actual obstruction is that $\alpha_j$ and $\beta_j$ meet only at a boundary position, and the symbols on either side are disjoint from each other by variable-basedness.
We refine: since $G$ is variable-based, $\gamma \in V^+$ (left-hand sides of variable-based rules are strings of *variables*, not mixed strings). Hence every symbol of $\gamma$ lies in $V$. But $\beta_j \in (V' \cup \Sigma)^\star$ contains no symbol of $V$ (because $V \cap V' = \varnothing$ and $V \cap \Sigma = \varnothing$). Therefore the occurrence of $\gamma$ cannot include any position in $\beta_j$: it must lie entirely within $\alpha_j$. Writing $\alpha_j = \mu \gamma \nu'$ with $\nu = \nu' \beta_j$ (or $\mu\gamma\nu$ lies in $\alpha_j$ and the rest of $\nu$ continues into $\beta_j$; by the preceding argument this continuation is empty because $\gamma$ ends inside $\alpha_j$), we get
\begin{align*}
\alpha_{j+1} &:= \mu \delta \nu', & \beta_{j+1} &:= \beta_j,
\end{align*}
with $\delta \in (V \cup \Sigma)^\star$ because $P$-rules produce strings in $(V \cup \Sigma)^\star$. Then $\tau_{j+1} = \alpha_{j+1} \beta_{j+1}$, the split is preserved, and $T \notin \tau_{j+1}$ since neither $\alpha_{j+1}$ nor $\beta_{j+1}$ involves $T$.
- *Rule in $P'$.* Symmetric: $\gamma \in (V')^+$ by variable-basedness of $G'$, and $\alpha_j \in (V \cup \Sigma)^\star$ contains no symbol of $V'$, so $\gamma$ lies entirely within $\beta_j$. Hence $\alpha_{j+1} = \alpha_j$ and $\beta_{j+1}$ is updated by the $P'$-rule.
This establishes (b) and (c).
[/proof]
[/step]
[step:Conclude $\mathcal{L}(H) = \mathcal{L}(G)\,\mathcal{L}(G')$]
*($\supseteq$)* Let $u \in \mathcal{L}(G)$ and $v \in \mathcal{L}(G')$. Then $S \xrightarrow{G} u$ and $S' \xrightarrow{G'} v$. Every rule of $P$ lies in $H$'s rule set, so a $G$-step is also an $H$-step (applied with the same context); similarly for $P'$. Hence
\begin{align*}
T &\to_H SS' \xrightarrow{H} uS' \xrightarrow{H} uv,
\end{align*}
where the first arrow is the seed rule, the second composes the $G$-derivation $S \xrightarrow{G} u$ applied within the left factor (with trailing context $S'$), and the third composes the $G'$-derivation $S' \xrightarrow{G'} v$ applied within the right factor (with leading context $u$). Therefore $uv \in \mathcal{L}(H)$, giving $\mathcal{L}(G) \mathcal{L}(G') \subseteq \mathcal{L}(H)$.
*($\subseteq$)* Let $w \in \mathcal{L}(H)$, so $T \xrightarrow{H} w$ with $w \in \Sigma^\star$. Apply the claim of the second step: the derivation begins with $T \to_H SS'$, after which every step lies in $P$ (acting on the left factor $\alpha_j$) or $P'$ (acting on the right factor $\beta_j$).
Extract the two sub-derivations: let $(\alpha_1, \alpha_2, \dots, \alpha_N)$ be the successive values of the left factor (with $\alpha_1 = S$) and $(\beta_1, \beta_2, \dots, \beta_N)$ the right factor (with $\beta_1 = S'$). Removing consecutive duplicates from $(\alpha_j)$ gives a sequence of genuine $P$-steps from $S$ to $\alpha_N$; similarly the nonduplicate $\beta_j$'s form a $P'$-derivation from $S'$ to $\beta_N$. Hence $S \xrightarrow{G} \alpha_N$ and $S' \xrightarrow{G'} \beta_N$.
Now $w = \tau_N = \alpha_N \beta_N$. Since $w \in \Sigma^\star$, every symbol of $\alpha_N \beta_N$ is a terminal. Hence $\alpha_N \in \Sigma^\star$ and $\beta_N \in \Sigma^\star$, so $\alpha_N \in \mathcal{L}(G)$ and $\beta_N \in \mathcal{L}(G')$. Writing $u := \alpha_N$, $v := \beta_N$, we conclude $w = uv \in \mathcal{L}(G) \mathcal{L}(G')$.
Combining, $\mathcal{L}(H) = \mathcal{L}(G) \mathcal{L}(G')$.
[guided]
The crucial structural fact is that once the seed rule $T \to SS'$ fires, the string splits cleanly into a left portion consisting of symbols from $V \cup \Sigma$ and a right portion consisting of symbols from $V' \cup \Sigma$. Because $V$ and $V'$ are disjoint (hypothesis) and because variable-basedness forces every rule in $P$ to have a left-hand side composed only of variables from $V$ (and similarly for $P'$), no rule can ever span the boundary: a $P$-rule needs a substring of $V$-variables, which can only occur inside the left portion, and symmetrically for $P'$.
Why does variable-basedness matter here, rather than just disjointness of $V$ and $V'$? Consider what would happen without variable-basedness: a rule in $P$ might be of the form $aA \to Bcd$ (mixing terminals and variables on the left). Then the substring $aA$ could in principle span the boundary between $\alpha_j$ and $\beta_j$, because terminals of $\Sigma$ appear in both sides. Variable-basedness eliminates this possibility by restricting left-hand sides of "rewriting" rules to strings of variables.
After the forward ($\supseteq$) direction, which just concatenates the two derivations, the real work is in the converse. We claim that any $H$-derivation of $w$ is *structurally* a concatenation of a $G$-derivation and a $G'$-derivation. The claim in the previous step formalises this: we can track an "$\alpha_j$-part" and a "$\beta_j$-part" at every stage, and every rule modifies exactly one of them.
The last subtlety is terminality: we must verify that at the *end* of the derivation, the left factor $\alpha_N$ is itself a word of $\mathcal{L}(G)$. This requires $\alpha_N \in \Sigma^\star$, which holds because $w = \alpha_N \beta_N \in \Sigma^\star$ forces both factors to be terminal strings.
[/guided]
[/step]
[step:Construct the union grammar $H'$ and verify $\mathcal{L}(H') = \mathcal{L}(G) \cup \mathcal{L}(G')$]
Choose a fresh variable $T' \notin V \cup V' \cup \Sigma$ and define
\begin{align*}
H' &:= (\Sigma,\; V \cup V' \cup \{T'\},\; P \cup P' \cup \{T' \to S,\; T' \to S'\},\; T').
\end{align*}
The two seed rules $T' \to S$ and $T' \to S'$ both have a single variable on each side, so $H'$ remains variable-based.
*($\supseteq$)* If $u \in \mathcal{L}(G)$, then $T' \to_{H'} S \xrightarrow{H'} u$ (the first step is the seed rule, the rest replays the $G$-derivation $S \xrightarrow{G} u$). Hence $u \in \mathcal{L}(H')$. Symmetrically for $\mathcal{L}(G')$. Hence $\mathcal{L}(G) \cup \mathcal{L}(G') \subseteq \mathcal{L}(H')$.
*($\subseteq$)* Let $w \in \mathcal{L}(H')$, so $T' \xrightarrow{H'} w$ with $w \in \Sigma^\star$. The first step must use a rule with $T'$ on the left, so it is either $T' \to S$ or $T' \to S'$ (the only such rules). Consider the two cases:
- *First step is $T' \to S$.* The remaining derivation starts from $S$, which contains no $T'$, and uses rules from $P \cup P'$; the analogue of the claim in the second step (with the trivial split "$\alpha_j \beta_j$" where $\beta_j = \varepsilon$ and the rules of $P'$ never fire because no $V'$-variable is present) shows that every subsequent rule used is from $P$, not $P'$. Concretely: the initial string after the seed step is $S \in V$, which consists only of symbols from $V \cup \Sigma$; a $P'$-rule has left-hand side in $(V')^+$ by variable-basedness, so it cannot be applied to a string over $V \cup \Sigma$. By induction this remains true at every stage, so the derivation continues to live in $(V \cup \Sigma)^\star$ and every step uses only $P$-rules. The resulting sub-derivation $S \xrightarrow{P} w$ is a valid $G$-derivation, so $w \in \mathcal{L}(G)$.
- *First step is $T' \to S'$.* Symmetrically, $w \in \mathcal{L}(G')$.
Hence $w \in \mathcal{L}(G) \cup \mathcal{L}(G')$, giving $\mathcal{L}(H') \subseteq \mathcal{L}(G) \cup \mathcal{L}(G')$.
Combining, $\mathcal{L}(H') = \mathcal{L}(G) \cup \mathcal{L}(G')$.
[/step]
[step:Class-level closure under union for all four classes]
Let $L_1, L_2$ be languages of one of the classes: regular, context-free, context-sensitive, or noncontracting. Pick grammars $G_1 = (\Sigma, V_1, P_1, S_1)$ and $G_2 = (\Sigma, V_2, P_2, S_2)$ generating $L_1, L_2$ in that class.
First, apply the [renaming theorem](/theorems/???) (alpha-conversion of variables): replace the variables of $G_2$ by fresh variables disjoint from $V_1$, producing an equivalent grammar $\tilde{G}_2 = (\Sigma, \tilde{V}_2, \tilde{P}_2, \tilde{S}_2)$ with $V_1 \cap \tilde{V}_2 = \varnothing$. This preserves the class of the grammar (the rules only have their variable names changed, not their structural form).
Second, apply the [Every Grammar Is Equivalent to a Variable-Based Grammar](/theorems/1784) theorem to pass to variable-based grammars $G_1^*, \tilde{G}_2^*$ equivalent to $G_1, \tilde{G}_2$. The conversion adds fresh variables $X_a$ (one per terminal of $\Sigma$) and new rules $X_a \to a$ and $\hat{X}(\alpha) \to \hat{X}(\beta)$. We carry out the conversion with disjoint fresh-variable pools (say $X_a$ for $G_1^*$ and $X_a'$ for $\tilde{G}_2^*$) to keep the variable alphabets disjoint.
We must check that the conversion preserves each of the four classes:
- *Regular ($A \to a$ or $A \to aB$).* The original rules, which have the form $A \to a$ and $A \to aB$, are transformed to $\hat{X}(A) \to \hat{X}(a) = X_a$ (i.e. $A \to X_a$, form $A \to B$ which after adding $X_a \to a$ still produces regular behavior; however the variable-based conversion must produce regular rules) and $\hat{X}(A) \to \hat{X}(aB) = X_a B$ (i.e. $A \to X_a B$, which is of the form $A \to aB$ after the $X_a \to a$ reduction, but purely in the rule form it is $A \to X_a B$). The rules $X_a \to a$ are of the form $A \to a$. Under the regular rule convention that variables are treated symbolically, both $A \to X_a B$ and $A \to X_a$ have the form "single variable $\to$ variable followed by at most one variable"; identifying terminals in the regular rule form is a notational simplification, and the regular class is defined so that variable-based right-regular rules $A \to XB$ (two variables) are admissible. (In some textbooks the right-linear regular grammar format explicitly includes rules $A \to B$, $A \to XB$, $A \to a$, covering all our cases.)
- *Context-free ($A \to \gamma$ with $A \in V$, $\gamma \in \Omega^\star$).* Variable-based conversion yields rules $\hat{X}(A) \to \hat{X}(\gamma) = A \to \hat{X}(\gamma)$ and $X_a \to a$. Both have single-variable left-hand side, hence context-free.
- *Context-sensitive ($\alpha A \beta \to \alpha \gamma \beta$ with $\gamma \in \Omega^+$).* Variable-based conversion yields $\hat{X}(\alpha) \hat{X}(A) \hat{X}(\beta) \to \hat{X}(\alpha) \hat{X}(\gamma) \hat{X}(\beta)$, of the form $\tilde\alpha \tilde A \tilde\beta \to \tilde\alpha \tilde\gamma \tilde\beta$ with $\tilde A = A \in V$, $\tilde\gamma = \hat{X}(\gamma) \in (V')^+$, so context-sensitive. The new rules $X_a \to a$ are of the simpler context-free form, which is a special case of context-sensitive.
- *Noncontracting ($|\alpha| \leq |\beta|$ for every rule $\alpha \to \beta$).* Variable-based conversion preserves lengths of rules exactly: $|\hat{X}(\alpha)| = |\alpha|$ and $|\hat{X}(\beta)| = |\beta|$ because $\hat{X}$ is a length-preserving homomorphism (it replaces each symbol with exactly one symbol). The new rules $X_a \to a$ have $|X_a| = |a| = 1$, so they are noncontracting.
Hence the conversion preserves each of the four classes.
Finally, form $H'$ from $G_1^*$ and $\tilde{G}_2^*$ as in the previous step. The only rule not already in $P_1^* \cup \tilde{P}_2^*$ is the pair $T' \to S_1^*$ and $T' \to S_2^*$, each of which is of the form "single variable $\to$ single variable". Such rules are regular (right-linear with zero trailing variables — or equivalently, $A \to B$ is right-regular), hence also context-free, context-sensitive, and noncontracting. Thus $H'$ remains in whatever class $G_1, G_2$ came from. By the result of the previous step, $\mathcal{L}(H') = \mathcal{L}(G_1^*) \cup \mathcal{L}(\tilde{G}_2^*) = L_1 \cup L_2$. This proves union closure for all four classes.
[/step]
[step:Class-level closure under concatenation for three classes]
Repeat the construction of $H$ from the first step, starting from variable-based grammars $G_1^*, \tilde{G}_2^*$ with disjoint variable alphabets (obtained as in the preceding step).
The new rule $T \to SS'$ has a single variable $T$ on the left and a string of two variables $SS'$ on the right. This rule is:
- *Not regular* in general: regular grammars allow right-hand sides of the form $a$ or $aB$ (a terminal optionally followed by a single variable), but $SS'$ is a string of two *variables*, which is not in this shape. Hence the concatenation construction does not preserve regularity directly. This is why regular languages are *not* claimed to be closed under concatenation by this construction; a separate argument (see [Regular Languages Are Closed Under Concatenation](/theorems/1788)) handles that case.
- *Context-free*: left-hand side is a single variable, as required.
- *Context-sensitive*: the rule $T \to SS'$ has $|T| = 1 \leq 2 = |SS'|$, and can be written as $T \to SS'$ in the form $\alpha T \beta \to \alpha SS' \beta$ with $\alpha = \beta = \varepsilon$. So it is of context-sensitive form (a special case where the context strings $\alpha, \beta$ are both empty).
- *Noncontracting*: $|T| = 1 \leq 2 = |SS'|$.
Hence $H$ lies in whichever of the three classes (context-free, context-sensitive, noncontracting) both $G_1, G_2$ belong to. By the concatenation result of the third step, $\mathcal{L}(H) = L_1 L_2$. Hence the three classes are closed under concatenation.
[/step]