[proofplan]
We prove both directions by a counting argument on strings of a common length $s := \max_i l_i$. The forward direction observes that in a prefix-free code, each codeword of length $l$ "blocks" exactly $a^{s-l}$ strings of length $s$ from being reached by any other codeword, and these blocked sets are pairwise disjoint, hence their sizes sum to at most $a^s$. The converse is proved by induction on $s$: at each stage we assign the codewords of length $l$ greedily, using Kraft's inequality to show that enough length-$l$ strings remain unblocked by previously chosen codewords. The common reformulation $\sum_{l=1}^s n_l \, a^{s-l} \le a^s$, where $n_l$ is the number of codewords of length $l$, drives both directions.
[/proofplan]
[step:Normalise the problem by grouping codewords by length]
Fix the alphabet $B$ of size $a \ge 1$ and set
\begin{align*}
s := \max_{1 \le i \le m} l_i, \qquad n_l := \#\{i \in \{1, \ldots, m\} : l_i = l\} \text{ for } l \in \{1, \ldots, s\}.
\end{align*}
Then $m = \sum_{l=1}^s n_l$. Multiplying Kraft's inequality through by $a^s$ gives the equivalent form
\begin{align*}
\sum_{i=1}^m a^{-l_i} \le 1 \iff \sum_{l=1}^s n_l \, a^{-l} \le 1 \iff \sum_{l=1}^s n_l \, a^{s-l} \le a^s.
\end{align*}
We will work with the last form throughout.
[/step]
[step:Prove the forward direction by counting length-$s$ extensions]
Assume a prefix-free $a$-ary code $c$ with word lengths $l_1, \ldots, l_m$ exists. For each codeword $w \in B^l$ of length $l \le s$, define the map
\begin{align*}
E_w : B^{s-l} &\to B^s \\
u &\mapsto wu,
\end{align*}
and let $E(w) := E_w(B^{s-l}) \subset B^s$ be its image — the set of length-$s$ strings having $w$ as a prefix. The map $E_w$ is injective (concatenation with a fixed string on the left is cancellative), so $\#E(w) = a^{s-l}$.
[guided]
We count the length-$s$ strings that start with a given codeword. Fix a codeword $w$ of length $l \le s$. A length-$s$ string begins with $w$ if and only if it is of the form $wu$ for some $u \in B^{s-l}$. The number of choices of $u$ is $\#B^{s-l} = a^{s-l}$, and distinct $u$ produce distinct concatenations $wu$, so the number of length-$s$ strings prefixed by $w$ is exactly $a^{s-l}$.
[/guided]
We now show the sets $\{E(w) : w \text{ a codeword}\}$ are pairwise disjoint. Suppose $w, w'$ are distinct codewords of lengths $l, l' \le s$ and some $v \in E(w) \cap E(w')$. Then $v$ begins with both $w$ and $w'$; since $|w| \le |w'|$ or $|w'| \le |w|$, one of $w, w'$ is a prefix of the other, contradicting the prefix-free hypothesis.
Since the sets $E(w)$ are pairwise disjoint subsets of $B^s$ and $\#B^s = a^s$,
\begin{align*}
\sum_{l=1}^s n_l \, a^{s-l} = \sum_{w \text{ codeword}} \#E(w) = \#\!\left(\bigcup_{w} E(w)\right) \le a^s.
\end{align*}
By Step 1, this is Kraft's inequality.
[/step]
[step:Prove the converse by strong induction on $s$]
Assume $\sum_{l=1}^s n_l \, a^{s-l} \le a^s$ with nonnegative integers $n_1, \ldots, n_s$. We construct a prefix-free code with $n_l$ codewords of each length $l \in \{1, \ldots, s\}$ by induction on $s$.
**Base case** ($s = 1$). Kraft's inequality reads $n_1 \le a$, so there are enough letters in $B$ to choose $n_1$ distinct length-$1$ codewords. The code is prefix-free because all codewords have the same length, ruling out any non-trivial prefix relation.
**Inductive step.** Suppose the result holds for parameter $s - 1$, and let $(n_1, \ldots, n_s)$ satisfy $\sum_{l=1}^s n_l \, a^{s-l} \le a^s$. The truncated sequence $(n_1, \ldots, n_{s-1})$ satisfies
\begin{align*}
\sum_{l=1}^{s-1} n_l \, a^{(s-1)-l} = \frac{1}{a}\sum_{l=1}^{s-1} n_l \, a^{s-l} \le \frac{1}{a}\left(a^s - n_s\right) \le \frac{a^s}{a} = a^{s-1}.
\end{align*}
By the inductive hypothesis, there exists a prefix-free code $\hat{c}$ with $n_l$ codewords of length $l$ for each $l \in \{1, \ldots, s-1\}$.
[guided]
The inductive hypothesis is about codes whose maximum word length is $s-1$. To apply it, we must check that Kraft's inequality (in its equivalent form, Step 1) holds for the truncated length profile. Starting from $\sum_{l=1}^s n_l\, a^{s-l} \le a^s$, isolate the $l = s$ term: $\sum_{l=1}^{s-1} n_l \, a^{s-l} \le a^s - n_s$. Dividing by $a > 0$ gives precisely Kraft's inequality for the parameter $s-1$, so the inductive hypothesis applies and produces a prefix-free code $\hat{c}$ realising the length profile $(n_1, \ldots, n_{s-1})$.
[/guided]
[/step]
[step:Extend $\hat{c}$ to length $s$ by choosing unblocked length-$s$ strings]
We now append $n_s$ codewords of length $s$ to $\hat{c}$. Let $\mathrm{Cov} \subset B^s$ be the set of length-$s$ strings that have some codeword of $\hat{c}$ as a prefix:
\begin{align*}
\mathrm{Cov} := \bigcup_{w \text{ codeword of } \hat{c}} E(w), \qquad E(w) = \{v \in B^s : w \text{ is a prefix of } v\}.
\end{align*}
As in Step 2, the sets $E(w)$ are pairwise disjoint because $\hat{c}$ is prefix-free, and $\#E(w) = a^{s - |w|}$, so
\begin{align*}
\#\mathrm{Cov} = \sum_{l=1}^{s-1} n_l \, a^{s-l} \le a^s - n_s,
\end{align*}
using Kraft's inequality in its equivalent form. Hence $\#(B^s \setminus \mathrm{Cov}) \ge n_s$, and we may choose $n_s$ distinct length-$s$ strings $v_1, \ldots, v_{n_s} \in B^s \setminus \mathrm{Cov}$.
Let $c$ be the code obtained by adjoining $v_1, \ldots, v_{n_s}$ to $\hat{c}$. We verify that $c$ is prefix-free:
- No codeword of $\hat{c}$ is a prefix of any $v_j$: by construction, $v_j \notin \mathrm{Cov}$, so no codeword of $\hat{c}$ is a prefix of $v_j$.
- No $v_j$ is a prefix of a codeword of $\hat{c}$: every codeword of $\hat{c}$ has length at most $s-1 < s = |v_j|$, so $v_j$ cannot be a prefix of a strictly shorter string.
- No $v_j$ is a prefix of another $v_k$ ($j \ne k$): the $v_j$ all have length $s$ and are distinct, ruling out prefix relations.
- Inside $\hat{c}$: $\hat{c}$ is prefix-free by the inductive hypothesis.
So $c$ is prefix-free with the prescribed word lengths $l_1, \ldots, l_m$. By induction, the converse holds for all $s \in \mathbb{N}$, completing the proof.
[guided]
The only non-mechanical verification is that no codeword of $\hat{c}$ is a prefix of any freshly chosen $v_j$. This is exactly the statement $v_j \notin \mathrm{Cov}$, which holds by construction. The other three cases reduce to length comparisons: a prefix must be no longer than its containing string, and strings of equal length are prefixes of each other only when equal.
[/guided]
[/step]