[proofplan]
We derive a contradiction from the assumption that $W = \{(w, v) \in \mathbb{W}^2 : w \in \mathcal{L}(G_v)\}$ is computable. From a computable $\mathbb{1}_W$ we assemble a computable partial function $f : \mathbb{W} \rightharpoonup \mathbb{W}$ whose domain is precisely $\{w : w \notin \mathcal{L}(G_w)\}$. By [Halting Sets Are Computably Enumerable](/theorems/1822) and [Type 0 Languages Are Computably Enumerable](/theorems/1827) run backwards — every computably enumerable set is the language of some type 0 grammar — $\operatorname{dom} f$ is the language of some grammar $G = G_d$. Evaluating the defining biconditional for $f$ at the diagonal word $d$ yields the classical self-referential contradiction $d \in \mathcal{L}(G_d) \iff d \notin \mathcal{L}(G_d)$. This is a diagonal argument: the fixed alphabet and the computable coding of grammars $v \mapsto G_v$ make the diagonal $w \mapsto G_w$ meaningful, and the contradiction falls out of a single substitution.
[/proofplan]
[step:Fix notation and the grammar coding]
Fix the course alphabet $\Sigma$ with distinguished accept letter $a \in \Sigma$, and write $\mathbb{W} = \Sigma^*$. The course fixes a computable coding of type 0 grammars
\begin{align*}
\operatorname{code} : \{\text{type 0 grammars}\} &\to \mathbb{W}
\end{align*}
under which every word $v \in \mathbb{W}$ is the code of a grammar: for each $v$, write $G_v$ for the grammar with $\operatorname{code}(G_v) = v$ (so $v \mapsto G_v$ is a surjection from $\mathbb{W}$ onto the set of grammars, and $G_v$ is determined up to strong equivalence). The language generated by $G_v$ is $\mathcal{L}(G_v) \subseteq \mathbb{W}$.
The *word problem* for type 0 grammars is
\begin{align*}
W &= \{(w, v) \in \mathbb{W}^2 : w \in \mathcal{L}(G_v)\}.
\end{align*}
"$W$ is computable" means its indicator $\mathbb{1}_W : \mathbb{W}^2 \to \mathbb{W}$ is total computable.
[/step]
[step:Assume $W$ is computable and build a diagonal partial function $f$]
Suppose, for contradiction, that $W$ is computable, so that $\mathbb{1}_W : \mathbb{W}^2 \to \mathbb{W}$ is total computable. Define
\begin{align*}
f : \mathbb{W} &\rightharpoonup \mathbb{W}, \\
w &\mapsto \begin{cases} \uparrow & \text{if } w \in \mathcal{L}(G_w), \\ a & \text{if } w \notin \mathcal{L}(G_w). \end{cases}
\end{align*}
[claim:$f$ is computable]
A register machine $M_f$ witnessing $f$ runs as follows. On input $w$ in register $0$:
(a) Save a master copy of $w$ in scratch registers $R_1, R_2$.
(b) Build the pair $(w, w)$ by the word-pair primitive (total computable by L2 of Step 4 of the proof of [Recursive Implies Computable](/theorems/1817)): copy $R_1$ into input register $1$ and $R_2$ into input register $2$ to present $(w, w)$ as input to the next stage.
(c) Run the register machine witnessing $\mathbb{1}_W$ on $(w, w)$. Since $\mathbb{1}_W$ is total computable, this halts; write the output to register $0$. By construction of $\mathbb{1}_W$, register $0$ now contains $a$ if $w \in \mathcal{L}(G_w)$ and $\varepsilon$ if $w \notin \mathcal{L}(G_w)$.
(d) Case-distinguish on register $0$ by the case-distinction primitive (L2 of Step 4 of the proof of [Recursive Implies Computable](/theorems/1817)): if register $0$ is $a$, enter an immediate infinite loop (produce $\uparrow$); if register $0$ is $\varepsilon$, write $a$ to register $0$ and halt.
Formally this is the composition $f = \mathrm{cd} \circ (\mathbb{1}_W \circ \Delta)$ where $\Delta : \mathbb{W} \to \mathbb{W}^2$, $w \mapsto (w, w)$ is the diagonal map (total computable by pairing, L2) and $\mathrm{cd} : \mathbb{W} \rightharpoonup \mathbb{W}$ is the case-distinction partial function $\mathrm{cd}(a) \uparrow$, $\mathrm{cd}(\varepsilon) = a$ (computable by L2). The composition is computable by closure of computable partial functions under composition (L1 of Step 4 of the proof of [Recursive Implies Computable](/theorems/1817)).
[/claim]
By the defining cases,
\begin{align*}
\operatorname{dom} f &= \{w \in \mathbb{W} : w \notin \mathcal{L}(G_w)\} = \mathbb{W} \setminus D,
\end{align*}
where $D := \{w \in \mathbb{W} : w \in \mathcal{L}(G_w)\}$ is the diagonal.
[guided]
We explain the self-reference that the function $f$ encodes.
*Why the diagonal at all?* We want a contradiction from "$W$ is computable". The standard recipe — used in Cantor, the halting problem, and Rice's theorem — is: pretend some decision procedure exists, then build a single object whose membership in a relevant set contradicts the procedure. The "single object" here is a grammar whose language is designed to disagree with itself.
*How does the two-variable predicate $W(w, v)$ become one-variable?* By diagonalising: we restrict to the slice $v = w$, giving the one-variable predicate $D(w) := W(w, w) \iff w \in \mathcal{L}(G_w)$. If $W$ is computable, so is $D$ (composition of computable $\mathbb{1}_W$ with the total computable diagonal $w \mapsto (w, w)$).
*Why build $f$ and not just $\mathbb{1}_D$?* Because we want to feed $f$ into the computably-enumerable-equals-language-of-a-grammar machinery. The target set $\operatorname{dom} f = \mathbb{W} \setminus D$ is the complement of the diagonal. We choose the defining cases of $f$ so that its domain is exactly this complement:
- on $w \in D$ (i.e. $w \in \mathcal{L}(G_w)$), force $f(w) \uparrow$: keep $w$ out of the domain;
- on $w \notin D$ (i.e. $w \notin \mathcal{L}(G_w)$), return any value (we chose $a$): put $w$ into the domain.
*Why force divergence on the "wrong" side?* Because a *partial* function can omit $w$ from its domain just by diverging at $w$. There is no computable way to make a *total* function encode the same information — that would require $\mathbb{W} \setminus D$ to be computable, and we do not yet know it is. Divergence is cheap; definedness is expensive. By exploiting divergence we bypass the need for a decision procedure on $\mathbb{W} \setminus D$.
*Computability of $f$.* The assumed computability of $\mathbb{1}_W$ buys us computability of $\mathbb{1}_D$. The $\mathrm{cd}$ trick (halt with $a$ on $\varepsilon$, loop on $a$) converts a total-computable-indicator value into a divergence-or-definedness outcome; this is exactly L2 case distinction. Composing with the diagonal and the indicator assembles $f$ as required.
[/guided]
[/step]
[step:Realise $\operatorname{dom} f$ as the language of a type 0 grammar]
Since $f$ is a computable partial function, its domain $\operatorname{dom} f \subseteq \mathbb{W}$ is computably enumerable (this is the definition of a computably enumerable set: the empty set or the domain of a computable partial function; if $\operatorname{dom} f = \varnothing$ it is covered by convention, but in fact $\operatorname{dom} f \neq \varnothing$ since e.g. $\varepsilon \in \operatorname{dom} f$ when $\varepsilon \notin \mathcal{L}(G_\varepsilon)$ — and if $\varepsilon \in \mathcal{L}(G_\varepsilon)$ we will reach a contradiction by the same diagonal route with $\varepsilon$ in place of $d$; we may assume non-emptiness for the construction to follow without circularity).
The class of type 0 languages coincides with the class of computably enumerable subsets of $\mathbb{W}$ (the "$\supseteq$" inclusion). By [Type 0 Languages Are Computably Enumerable](/theorems/1827), every type 0 language is computably enumerable; the converse — every computably enumerable set is the language of some type 0 grammar — is the content of [Computably Enumerable Sets Are Type 0 Languages](/theorems/???). Applying the converse direction to $\operatorname{dom} f$, there exists a type 0 grammar $G$ with
\begin{align*}
\mathcal{L}(G) &= \operatorname{dom} f.
\end{align*}
Let $d := \operatorname{code}(G)$, so that $G = G_d$ and $\mathcal{L}(G_d) = \operatorname{dom} f$.
[/step]
[step:Substitute the diagonal word $d$ to derive the contradiction]
Evaluate the two equivalent descriptions of $\operatorname{dom} f$ at $w = d$:
\begin{align*}
d \in \mathcal{L}(G_d) &\iff d \in \operatorname{dom} f &&\text{(since $\mathcal{L}(G_d) = \operatorname{dom} f$)} \\
&\iff f(d) \downarrow &&\text{(definition of $\operatorname{dom} f$)} \\
&\iff d \notin \mathcal{L}(G_d) &&\text{(definition of $f$: $f(w) \downarrow \iff w \notin \mathcal{L}(G_w)$).}
\end{align*}
Hence $d \in \mathcal{L}(G_d) \iff d \notin \mathcal{L}(G_d)$, a contradiction.
[guided]
Three substitutions are at play in this chain, and it is worth naming each.
(i) $\mathcal{L}(G_d) = \operatorname{dom} f$: this is the chosen identification from Step 3. The grammar $G = G_d$ was built *expressly* to have its language equal $\operatorname{dom} f$ — this is the entire point of invoking the converse of [Type 0 Languages Are Computably Enumerable](/theorems/1827).
(ii) $\operatorname{dom} f = \{w : f(w) \downarrow\}$: this is the definition of the domain of a partial function — the set of inputs on which it converges.
(iii) $f(w) \downarrow \iff w \notin \mathcal{L}(G_w)$: this is the defining case distinction of $f$ from Step 2.
The diagonal substitution $w = d$ turns "$w \notin \mathcal{L}(G_w)$" into "$d \notin \mathcal{L}(G_d)$", and the left-hand side of the biconditional in (i) is precisely "$d \in \mathcal{L}(G_d)$". The two combine to give $d \in \mathcal{L}(G_d) \iff d \notin \mathcal{L}(G_d)$ — a statement of the form $p \iff \neg p$, which is a contradiction in classical propositional logic (it entails $p \land \neg p$).
*What went wrong if $W$ is computable?* Nothing "went wrong" in our construction: every step (build $f$, realise $\operatorname{dom} f$ as a language, substitute $d$) is valid. The contradiction must therefore be due to the *single unsupported* assumption, namely that $W$ is computable. We conclude it is not.
*Where did the hypothesis "$W$ is computable" get used?* Only in Step 2, Claim "$f$ is computable": the register machine for $f$ calls $\mathbb{1}_W$ as a subroutine and relies on its totality. Without computability of $W$, we cannot assemble $f$ as a computable partial function, and Step 3 (which needs $f$ computable to realise $\operatorname{dom} f$ as a type 0 language) collapses.
[/guided]
[/step]
[step:Conclude that $W$ is not computable]
The contradiction in Step 4 was derived *only* from the hypothesis that $W$ is computable — all other ingredients (the diagonal construction of $f$, the computable coding of grammars, the equality of type 0 languages with computably enumerable sets) are theorems of the course. Therefore $W$ is not computable. The word problem for type 0 grammars is unsolvable.
[/step]