[proofplan]
The strategy is to bound a high power of the Kraft sum. Expanding $\left(\sum_i a^{-l_i}\right)^R$ and grouping terms by total length yields a sum over $l$ of $b_l\, a^{-l}$, where $b_l$ counts tuples of $R$ codewords with lengths summing to $l$. Decipherability means that distinct tuples of codewords produce distinct concatenations in $B^*$, so the map from such tuples to length-$l$ strings is injective; hence $b_l \le a^l$. This collapses the sum to at most $Rs$, giving $(\sum_i a^{-l_i})^R \le Rs$. Taking $R$-th roots and sending $R \to \infty$ yields $\sum_i a^{-l_i} \le 1$.
[/proofplan]
[step:Set up notation and the combinatorial identity for the $R$-th power]
Let $c : A \to B^*$ be a decipherable $a$-ary code with $\#A = m$, word lengths $l_1, \ldots, l_m$, and $s := \max_{1 \le i \le m} l_i$. Fix $R \in \mathbb{N}$ and define, for $l \in \{R, R+1, \ldots, Rs\}$,
\begin{align*}
b_l := \#\!\left\{(i_1, \ldots, i_R) \in \{1, \ldots, m\}^R : l_{i_1} + l_{i_2} + \cdots + l_{i_R} = l\right\}.
\end{align*}
Note $b_l \ge 0$ and $b_l = 0$ for $l < R$ and $l > Rs$ (since each $l_{i_j}$ is between $1$ and $s$).
Expanding the power and grouping by the total length of the selected tuple,
\begin{align*}
\left(\sum_{i=1}^m a^{-l_i}\right)^{\!R} = \sum_{i_1=1}^m \cdots \sum_{i_R=1}^m a^{-(l_{i_1} + \cdots + l_{i_R})} = \sum_{l=R}^{Rs} b_l \, a^{-l}.
\end{align*}
[guided]
The $R$-th power of a sum is the sum over all $R$-tuples of indices. Each tuple $(i_1, \ldots, i_R)$ contributes the product of the corresponding terms, which is $a^{-l_{i_1}} \cdots a^{-l_{i_R}} = a^{-(l_{i_1} + \cdots + l_{i_R})}$. Grouping the tuples by the common value of the length sum gives a sum indexed by that total length $l$, with the number of tuples having sum $l$ recorded as $b_l$. The range $R \le l \le Rs$ is forced by the bounds $1 \le l_{i_j} \le s$.
[/guided]
[/step]
[step:Bound $b_l$ by $a^l$ using decipherability]
Define the concatenation map
\begin{align*}
\Phi : \{1, \ldots, m\}^R &\to B^*, \\
(i_1, \ldots, i_R) &\mapsto c(x_{i_1})\, c(x_{i_2}) \cdots c(x_{i_R}),
\end{align*}
where $A = \{x_1, \ldots, x_m\}$ enumerates the source alphabet. Decipherability of $c$ (extended coordinatewise to $c : A^* \to B^*$, $a_1 \ldots a_R \mapsto c(a_1) \cdots c(a_R)$) means precisely that this extension is injective on $A^R$; hence $\Phi$ is injective on $\{1, \ldots, m\}^R$.
If $(i_1, \ldots, i_R)$ contributes to $b_l$, then $|\Phi(i_1, \ldots, i_R)| = l_{i_1} + \cdots + l_{i_R} = l$, so $\Phi$ restricts to an injection
\begin{align*}
\Phi\big|_{\{\text{tuples with length sum } l\}} : \{\text{tuples with length sum } l\} \hookrightarrow B^l.
\end{align*}
Taking cardinalities,
\begin{align*}
b_l \le \#B^l = a^l.
\end{align*}
[guided]
Decipherability is exactly the statement that the concatenation of codewords has an unambiguous inverse — the map "source string $\mapsto$ concatenated codeword string" is injective. Via the bijection $A \leftrightarrow \{1, \ldots, m\}$, this translates to $\Phi$ being injective on index tuples. Now, when the length sum equals $l$, the output lives in $B^l$, so an injection into $B^l$ gives the bound $b_l \le \#B^l = a^l$. This is the only place where decipherability is used.
[/guided]
[/step]
[step:Collapse the sum to a polynomial bound in $R$]
Substituting $b_l \le a^l$ into the identity of Step 1,
\begin{align*}
\left(\sum_{i=1}^m a^{-l_i}\right)^{\!R} = \sum_{l=R}^{Rs} b_l \, a^{-l} \le \sum_{l=R}^{Rs} a^l \cdot a^{-l} = \sum_{l=R}^{Rs} 1 = Rs - R + 1 \le Rs.
\end{align*}
[/step]
[step:Take $R$-th roots and pass to the limit $R \to \infty$]
Let $K := \sum_{i=1}^m a^{-l_i} \ge 0$. The previous step gives $K^R \le Rs$ for every $R \in \mathbb{N}$, i.e.,
\begin{align*}
K \le (Rs)^{1/R} \quad \text{for all } R \in \mathbb{N}.
\end{align*}
Since $s$ is a fixed positive integer, $(Rs)^{1/R} = \exp\!\left(\tfrac{\log(Rs)}{R}\right)$ and
\begin{align*}
\lim_{R \to \infty} \frac{\log(Rs)}{R} = \lim_{R \to \infty} \frac{\log R + \log s}{R} = 0.
\end{align*}
By continuity of $\exp$ at $0$, $\lim_{R \to \infty} (Rs)^{1/R} = e^0 = 1$. Hence $K \le 1$, which is Kraft's inequality.
[guided]
The point of taking the $R$-th power was to extract polynomial-in-$R$ information from an $R$-fold product. After taking the $R$-th root, the right-hand side $(Rs)^{1/R}$ converges to $1$ — this is the standard limit $\lim_{R \to \infty} R^{1/R} = 1$, adjusted by the bounded factor $s^{1/R} \to 1$. Since $K \le (Rs)^{1/R}$ for every $R$, passing to the infimum over $R$ gives $K \le 1$. No bound weaker than $K \le 1$ would have sufficed: Kraft's inequality is exactly the assertion $K \le 1$.
[/guided]
[/step]