The Emptiness Problem for Regular Grammars Is Solvable

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We reduce the emptiness problem to a finite membership search. Given a regular grammar $G$, convert it to a deterministic automaton $D$ recognising $\mathcal{L}(G)$; set $n := |Q|$, the pumping number supplied by the pumping lemma. By the short-word theorem, $\mathcal{L}(G)$ is nonempty if and only if it contains a word of length strictly less than $n$. The set of candidate words is $\bigcup_{i=0}^{n-1} \Sigma^i$, which is finite and effectively enumerable. For each candidate $w$, membership $w \in \mathcal{L}(G)$ is decided by simulating $D$ on $w$ — a finite computation. The decision procedure accepts (outputs "nonempty") if any candidate is accepted and rejects (outputs "empty") if none is. [/proofplan] [step:Fix the model of computability and the input encoding] A problem is *solvable* (decidable) when there is an algorithm that, on every input of the problem, halts in finitely many steps and outputs the correct yes/no answer. In the setting of regular grammars, an input is a finite object — a 4-tuple $G = (\Sigma, V, P, S)$ with $\Sigma, V$ finite sets, $P$ a finite set of production rules, and $S \in V$ — so the input is specified by a finite bitstring under any standard encoding. We construct an algorithm that, given such an encoding of $G$, decides whether $\mathcal{L}(G) = \varnothing$. [/step] [step:Convert the grammar to a deterministic automaton and read off the pumping number] From $G = (\Sigma, V, P, S)$, apply the effective construction [Regular Grammars Produce Nondeterministic Automata](/theorems/1792) to obtain a nondeterministic automaton $N$ with $\mathcal{L}(N) = \mathcal{L}(G)$. Apply the effective construction [Nondeterministic and Deterministic Automata Are Equivalent](/theorems/1791) (the subset construction) to $N$ to obtain a deterministic automaton \begin{align*} D &= (\Sigma, Q, \delta, q_0, F) \end{align*} with $\mathcal{L}(D) = \mathcal{L}(N) = \mathcal{L}(G)$. Both constructions are algorithmic: they produce the tuple data of $D$ explicitly from the tuple data of $G$, and the size of $D$ is bounded computably in the size of $G$ (at worst $|Q| \leq 2^{|V|+1}$ from the subset construction on $N$, whose state set is $V \cup \{q_\text{accept}\}$ or similar). Set \begin{align*} n &:= |Q|, \end{align*} which is a finite positive integer computable from the description of $D$ (at minimum $|Q| \geq 1$ because $q_0 \in Q$). By the [Pumping Lemma for Regular Languages](/theorems/1793), $n = |Q|$ is a pumping number for $\mathcal{L}(D) = \mathcal{L}(G)$. [/step] [step:Reduce emptiness to checking words of length less than $n$] By [Nonempty Regular Languages Contain Short Words](/theorems/1794) applied to $\mathcal{L}(G)$ with pumping number $n$, \begin{align*} \mathcal{L}(G) \neq \varnothing \quad &\iff \quad \exists\, w \in \mathcal{L}(G) \text{ with } |w| < n. \end{align*} The forward direction is the content of Theorem 1794; the reverse direction is immediate because a word in $\mathcal{L}(G)$ of any length witnesses nonemptiness. Define the set of candidate short words \begin{align*} S_{<n} &:= \bigcup_{i=0}^{n-1} \Sigma^i. \end{align*} Here $\Sigma^0 = \{\varepsilon\}$ and $\Sigma^i$ is the set of words over $\Sigma$ of length exactly $i$. The set $S_{<n}$ is finite, with cardinality \begin{align*} |S_{<n}| &= \sum_{i=0}^{n-1} |\Sigma|^i. \end{align*} (This is finite because $|\Sigma| < \infty$ and the sum is over $n$ terms, each a finite integer.) The equivalence above rewrites as \begin{align*} \mathcal{L}(G) \neq \varnothing \quad &\iff \quad S_{<n} \cap \mathcal{L}(G) \neq \varnothing. \end{align*} [/step] [step:Decide membership in $\mathcal{L}(G)$ algorithmically for each short word] For each $w \in S_{<n}$, we decide membership $w \in \mathcal{L}(G)$ by simulating $D$ on $w$. The simulation algorithm is standard: initialise a state variable $q \leftarrow q_0$, and for each letter $a_1, a_2, \dots, a_{|w|}$ of $w$ in order, update $q \leftarrow \delta(q, a_i)$ using the transition table of $D$. After $|w|$ updates, accept iff $q \in F$ and reject otherwise. The algorithm halts in exactly $|w|$ steps (each a table lookup), so it halts in finite time, and it outputs the correct answer because, by the definition of $\mathcal{L}(D)$, \begin{align*} w \in \mathcal{L}(D) \quad &\iff \quad \hat{\delta}(q_0, w) \in F, \end{align*} and the simulation computes $\hat{\delta}(q_0, w)$ step by step by the recursive clause $\hat{\delta}(q_0, a_1 \cdots a_i) = \delta(\hat{\delta}(q_0, a_1 \cdots a_{i-1}), a_i)$. Hence the membership problem for $\mathcal{L}(D)$ — equivalently for $\mathcal{L}(G)$, since $\mathcal{L}(D) = \mathcal{L}(G)$ — is decidable by this simulation procedure. [/step] [step:Assemble the decision procedure for emptiness and verify its correctness and termination] The full algorithm on input $G$ is: 1. Construct $D$ from $G$ via the effective constructions above. 2. Compute $n := |Q|$ and enumerate $S_{<n} = \bigcup_{i=0}^{n-1} \Sigma^i$. 3. For each $w \in S_{<n}$ (processed in any fixed order, e.g., length-lexicographic): - Simulate $D$ on $w$ and record whether $w \in \mathcal{L}(G)$. - If $w \in \mathcal{L}(G)$, output "$\mathcal{L}(G) \neq \varnothing$" and halt. 4. If no $w \in S_{<n}$ was accepted, output "$\mathcal{L}(G) = \varnothing$" and halt. *Termination.* Step 1 halts because the constructions [Regular Grammars Produce Nondeterministic Automata](/theorems/1792) and [Nondeterministic and Deterministic Automata Are Equivalent](/theorems/1791) are algorithmic. Step 2 halts because $|S_{<n}| = \sum_{i=0}^{n-1} |\Sigma|^i$ is finite. Step 3 iterates at most $|S_{<n}|$ times, and each iteration is a simulation of $D$ on a word of length $< n$, which takes at most $n - 1$ transition steps. Step 4 is reached only after step 3 completes. Hence the algorithm halts in finite time, with total running time bounded by $|S_{<n}| \cdot (n - 1)$ transitions (plus the fixed cost of constructing $D$). *Correctness.* By the equivalence established in the reduction step, \begin{align*} \mathcal{L}(G) \neq \varnothing \quad &\iff \quad S_{<n} \cap \mathcal{L}(G) \neq \varnothing \quad \iff \quad \exists\, w \in S_{<n} : w \in \mathcal{L}(G), \end{align*} which is exactly the condition the algorithm tests in step 3. If step 3 finds such a $w$, the algorithm outputs "nonempty", which is correct. If step 3 exhausts $S_{<n}$ without finding an accepted word, then $S_{<n} \cap \mathcal{L}(G) = \varnothing$ and hence $\mathcal{L}(G) = \varnothing$, so the output in step 4 is correct. [guided] The structure of this decidability proof is the standard pattern for reducing a question about an infinite set (*is there any word in this possibly-infinite language?*) to a question about a finite, explicitly enumerable set (*is there a word of bounded length in this language?*). Three ingredients are doing the work. *First ingredient: an effective representation.* The regular grammar $G$ has a finite description, and that description can be mechanically transformed into a deterministic automaton $D$ with the same language. This is where [Regular Grammars Produce Nondeterministic Automata](/theorems/1792) and the subset construction [Nondeterministic and Deterministic Automata Are Equivalent](/theorems/1791) matter: they are not just existence statements but *constructions* — given the data of $G$, we write down the data of $D$ explicitly. Without effective constructions, we could not run the algorithm on specific inputs. *Second ingredient: a length bound.* The short-word theorem [Nonempty Regular Languages Contain Short Words](/theorems/1794) is what makes the search finite. Absent this theorem, emptiness might still be decidable (indeed it is for regular languages by other routes), but we would have no obvious finite search space. The pumping lemma provides a *witness shrinker*: any word in $\mathcal{L}(G)$ can be pumped down until its length drops below $n$. So the existence of *some* element is equivalent to the existence of a *short* element, and short elements come from a finite alphabet lying in a bounded-length universe. *Third ingredient: a membership decider.* Even with a finite search space, we must be able to check each candidate. For regular languages this is trivial — simulate the DFA — but the step is not a priori free; it requires a decidable membership problem. For context-free languages, say, the search space analogue exists (context-free pumping), and membership is decidable (the CYK algorithm), so emptiness is likewise decidable. For context-sensitive or Type-0 languages the analogue breaks down: membership is undecidable for Type-0 grammars and PSPACE-hard for context-sensitive, and emptiness is undecidable for Type-0. *Why $n = |Q|$ suffices as a pumping number.* The pumping lemma proved in [Pumping Lemma for Regular Languages](/theorems/1793) specifically uses $n = |Q|$ because any run of $D$ on a word of length $\geq n$ visits $n + 1$ states in a set of size $n$, forcing a repetition (pigeonhole). So the pumping number is tight to the number of states of the deterministic automaton, not a loose constant. This gives us a concrete, computable bound. *Why not search over $\Sigma^\star$ naively?* If one tried to decide emptiness by enumerating all words of $\Sigma^\star$ and checking each, the procedure would fail when $\mathcal{L}(G) = \varnothing$: it would enumerate forever without finding a witness, never halting. The pumping-lemma bound converts this semi-decision procedure into a full decision procedure by cutting off the search at length $n - 1$. If *any* word in $\mathcal{L}(G)$ exists, there exists one of length $< n$; if no word of length $< n$ is in $\mathcal{L}(G)$, we can be confident that no word of any length is, and output "empty" safely. *Complexity observation.* The algorithm runs in time roughly $|S_{<n}| \cdot n \leq n \cdot |\Sigma|^{n-1} \cdot n = n^2 \cdot |\Sigma|^{n-1}$, which is exponential in $n$. This is not the best bound — one can decide emptiness of a DFA in linear time by computing the set of states reachable from $q_0$ and checking whether it meets $F$. But the exponential bound suffices for decidability, which is the point here. [/guided] [/step] [step:Conclude decidability of the emptiness problem] The algorithm constructed above takes a regular grammar $G$ as input, halts in finite time, and outputs "$\mathcal{L}(G) = \varnothing$" or "$\mathcal{L}(G) \neq \varnothing$" correctly. By definition, this shows the emptiness problem for regular grammars is decidable, completing the proof. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.