[proofplan]
For any $\varepsilon > 0$ we exhibit sets $A_n \subseteq A^n$ whose size grows at rate $\mathbb{E}[l_1] + \varepsilon$ and whose probability tends to $1$. The sets are the preimages under $c^*$ of short codewords: the cumulative length of $c^*(X_1, \ldots, X_n) = c(X_1) \cdots c(X_n)$ is $\sum_{i=1}^{n} l_i$, which by the weak law of large numbers is less than $n(\mathbb{E}[l_1] + \varepsilon)$ with probability approaching $1$. Injectivity of $c^*$ (from decipherability of $c$) then bounds $|A_n|$ by the number of binary strings of length less than $n(\mathbb{E}[l_1] + \varepsilon)$, giving reliable encoding at rate $\mathbb{E}[l_1] + \varepsilon$.
[/proofplan]
[step:Set up the codeword-length random variables and their i.i.d. structure]
Let $l: A \to \mathbb{N}$ be the codeword-length function of the code $c$, i.e., $l(x) := |c(x)|$ is the binary length of $c(x) \in \{0, 1\}^*$. For each $i \ge 1$ define
\begin{align*}
l_i := l(X_i),
\end{align*}
the length of the codeword emitted when encoding $X_i$. Since $X_1, X_2, \ldots$ are i.i.d. (by the Bernoulli hypothesis) and each $l_i$ is a fixed measurable function of $X_i$, the sequence $(l_i)_{i \ge 1}$ is i.i.d. with common distribution equal to that of $l_1 = l(X_1)$. The alphabet $A$ is finite and $l$ is a well-defined function on $A$, so $l_1$ takes only finitely many values; in particular $\mathbb{E}[l_1] < \infty$.
[/step]
[step:Define $A_n$ as the set of source sequences whose concatenated encoding is short]
Let $c^*: \bigcup_{n \ge 1} A^n \to \{0, 1\}^*$, $(x_1, \ldots, x_n) \mapsto c(x_1) c(x_2) \cdots c(x_n)$ denote the concatenation extension of $c$. For $(x_1, \ldots, x_n) \in A^n$, the length of $c^*(x_1, \ldots, x_n)$ is $\sum_{i=1}^{n} l(x_i)$. Fix $\varepsilon > 0$ and set
\begin{align*}
A_n := \left\{ (x_1, \ldots, x_n) \in A^n : |c^*(x_1, \ldots, x_n)| < n(\mathbb{E}[l_1] + \varepsilon) \right\}.
\end{align*}
[guided]
The membership condition $(x_1, \ldots, x_n) \in A_n$ can be rewritten in terms of the lengths: $|c^*(x_1, \ldots, x_n)| = \sum_{i=1}^{n} l(x_i)$, so $(x_1, \ldots, x_n) \in A_n$ iff the empirical average $\frac{1}{n}\sum_{i=1}^{n} l(x_i)$ is strictly less than $\mathbb{E}[l_1] + \varepsilon$. The strategy is to use this: the empirical average concentrates at $\mathbb{E}[l_1]$ by the WLLN, which lies strictly below $\mathbb{E}[l_1] + \varepsilon$, so the event becomes almost certain. Meanwhile, the cardinality $|A_n|$ is controlled by the number of short binary strings because $c^*$ is injective and its image on $A_n$ lies in a set of size $2^{n(\mathbb{E}[l_1] + \varepsilon)}$.
[/guided]
[/step]
[step:Apply the WLLN to show $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$]
In terms of the random variables $l_i$,
\begin{align*}
\mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n\bigr) = \mathbb{P}\!\left(\sum_{i=1}^{n} l_i < n(\mathbb{E}[l_1] + \varepsilon)\right) = \mathbb{P}\!\left(\frac{1}{n} \sum_{i=1}^{n} l_i < \mathbb{E}[l_1] + \varepsilon\right).
\end{align*}
Since $(l_i)$ is i.i.d. with finite mean $\mathbb{E}[l_1]$, the [Weak Law of Large Numbers](/theorems/???) applies and gives $\frac{1}{n}\sum_{i=1}^{n} l_i \xrightarrow{\mathbb{P}} \mathbb{E}[l_1]$. In particular, for every fixed $\varepsilon > 0$,
\begin{align*}
\mathbb{P}\!\left(\left|\frac{1}{n}\sum_{i=1}^{n} l_i - \mathbb{E}[l_1]\right| < \varepsilon\right) \to 1,
\end{align*}
and the event $\bigl\{\frac{1}{n}\sum_{i=1}^{n} l_i - \mathbb{E}[l_1] < \varepsilon\bigr\}$ contains the two-sided event, so its probability also tends to $1$. Therefore
\begin{align*}
\mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n\bigr) \to 1 \quad \text{as } n \to \infty.
\end{align*}
[/step]
[step:Bound $|A_n|$ by counting short binary strings via injectivity of $c^*$]
Because the code $c$ is decipherable, its concatenation extension $c^*$ is [injective on $\bigcup_{n} A^n$](/theorems/???). Restricted to $A_n$, the map $c^*$ is in particular injective, and
\begin{align*}
c^*(A_n) \subseteq \{w \in \{0, 1\}^* : |w| < n(\mathbb{E}[l_1] + \varepsilon)\} = \bigcup_{k = 0}^{\lceil n(\mathbb{E}[l_1] + \varepsilon) \rceil - 1} \{0, 1\}^k.
\end{align*}
Hence
\begin{align*}
|A_n| = |c^*(A_n)| \le \sum_{k = 0}^{\lceil n(\mathbb{E}[l_1] + \varepsilon) \rceil - 1} 2^k = 2^{\lceil n(\mathbb{E}[l_1] + \varepsilon) \rceil} - 1 \le 2^{n(\mathbb{E}[l_1] + \varepsilon) + 1}.
\end{align*}
Taking $\log_2$ and dividing by $n$:
\begin{align*}
\frac{\log_2 |A_n|}{n} \le \mathbb{E}[l_1] + \varepsilon + \frac{1}{n},
\end{align*}
so $\limsup_{n \to \infty} \frac{\log_2 |A_n|}{n} \le \mathbb{E}[l_1] + \varepsilon$.
[guided]
The counting argument rests on two ingredients. First, injectivity of $c^*$ turns a probability question about source sequences into a cardinality question about binary strings: distinct source sequences in $A_n$ map to distinct binary strings in $c^*(A_n)$, so $|A_n| = |c^*(A_n)|$. Second, every binary string of length less than $L := \lceil n(\mathbb{E}[l_1] + \varepsilon)\rceil$ lies in $\bigcup_{k=0}^{L-1}\{0,1\}^k$, a set of size $\sum_{k=0}^{L-1} 2^k = 2^L - 1 \le 2^L$. Combining gives $|A_n| \le 2^L \le 2^{n(\mathbb{E}[l_1] + \varepsilon) + 1}$, since $L = \lceil n(\mathbb{E}[l_1] + \varepsilon)\rceil \le n(\mathbb{E}[l_1] + \varepsilon) + 1$. The additive $+1$ in the exponent produces the $\frac{1}{n}$ correction, which vanishes in the limit and so does not affect the asymptotic rate.
Where is decipherability used? Without it, $c^*$ could be non-injective — two distinct source sequences could produce the same concatenated codeword — and then the short-string counting bound would not control $|A_n|$. Decipherability is precisely the condition that ensures unique decoding and hence injectivity of $c^*$.
[/guided]
[/step]
[step:Combine to conclude reliable encoding at rate $\mathbb{E}[l_1] + \varepsilon$]
From the two previous steps, $(A_n)$ satisfies $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$ and $\limsup_{n \to \infty} \frac{\log_2 |A_n|}{n} \le \mathbb{E}[l_1] + \varepsilon$. By the [definition of reliable encodability](/theorems/???), the source is reliably encodable at rate $\mathbb{E}[l_1] + \varepsilon$.
[/step]
[step:Take $\varepsilon \downarrow 0$ and conclude the information rate bound]
Because the source is reliably encodable at rate $\mathbb{E}[l_1] + \varepsilon$ for every $\varepsilon > 0$, the information rate — defined as the infimum of rates at which reliable encoding is possible — satisfies
\begin{align*}
\text{information rate} \le \inf_{\varepsilon > 0} (\mathbb{E}[l_1] + \varepsilon) = \mathbb{E}[l_1].
\end{align*}
This completes the proof.
[/step]