[proofplan]
Both directions factor through Kraft's inequality as a common criterion. The forward direction uses McMillan's theorem to pass from decipherability to Kraft's inequality, then Kraft's theorem to build a prefix-free code with the same lengths. The reverse direction is immediate because every prefix-free code is decipherable.
[/proofplan]
[step:Reduce the reverse direction to the fact that prefix-free implies decipherable]
Suppose a prefix-free $a$-ary code $c : A \to B^*$ with word lengths $(l_1, \ldots, l_m)$ exists. Every prefix-free code is decipherable: a concatenation $c(x_1) c(x_2) \cdots c(x_k)$ can be parsed from left to right by reading the shortest prefix of the unread suffix that coincides with a codeword, which is uniquely determined by the prefix-free property. Hence $c$ is itself a decipherable code with the same word lengths $(l_1, \ldots, l_m)$.
[/step]
[step:Use McMillan's theorem to extract Kraft's inequality from decipherability]
Conversely, suppose a decipherable $a$-ary code $\tilde{c} : A \to B^*$ with word lengths $(l_1, \ldots, l_m)$ exists. By [McMillan's Theorem](/theorems/1627), whose hypothesis "the code is decipherable" is precisely our assumption on $\tilde{c}$,
\begin{align*}
\sum_{i=1}^m a^{-l_i} \le 1.
\end{align*}
[guided]
McMillan's theorem requires only decipherability of the code, nothing more. Its conclusion is Kraft's inequality, a purely numerical statement about the length profile $(l_1, \ldots, l_m)$ — it does not record any structural information about $\tilde{c}$ itself. That is the engine of this equivalence: two different notions of "good code" (prefix-free, decipherable) are mediated by the same scalar inequality on their length profiles.
[/guided]
[/step]
[step:Use Kraft's theorem to construct a prefix-free code with the same lengths]
By [Kraft's Inequality Characterises Prefix-Free Codes](/theorems/1626), whose hypothesis — Kraft's inequality for the length profile $(l_1, \ldots, l_m)$ — was just verified in Step 2, there exists a prefix-free $a$-ary code $c : A \to B^*$ with word lengths $(l_1, \ldots, l_m)$.
Combining with Step 1, we have shown: a decipherable code with word lengths $(l_1, \ldots, l_m)$ exists if and only if a prefix-free code with the same word lengths exists.
[guided]
The construction in Kraft's theorem does not reuse the decipherable code $\tilde{c}$ — it builds a new prefix-free code from scratch, using only the numerical constraint on lengths. The message is operational: if you are given a decipherable code, you can replace it by a prefix-free code at no cost in word lengths (and hence no cost in expected codeword length, given any source distribution).
Prefix-free codes are strictly easier to work with than general decipherable codes, and this theorem says you lose nothing by restricting to them. Three distinct advantages follow. First, they can be decoded online, letter by letter, without look-ahead: a decoder reads the incoming stream one symbol at a time and emits a source symbol as soon as the current prefix matches a codeword. Second, they correspond to leaves of a rooted $a$-ary tree, which makes their combinatorial analysis geometric rather than algebraic. Third, the natural algorithms for constructing optimal codes (Huffman's algorithm, Shannon–Fano coding) produce prefix-free codes directly, so the class is closed under the optimisation procedures we actually use.
The theorem therefore provides the foundation for the entire remaining development of source coding: every subsequent construction or bound can be stated for prefix-free codes without loss of generality.
The logical structure of this equivalence is worth pausing on. Two apparently different "goodness" criteria for a code — decipherability (semantic) and prefix-freeness (syntactic) — collapse to the same constraint on the length profile, namely Kraft's inequality. The semantic-to-syntactic passage goes through a scalar inequality that remembers only the lengths, not the codewords themselves. This is a structural pattern that recurs throughout coding theory: a combinatorial classification question reduces to an inequality on a finite numerical invariant.
[/guided]
[/step]