[proofplan]
We establish the information rate by proving two matching bounds: the source is reliably encodable at every rate $H + \varepsilon$ (upper bound), and fails to be reliably encodable at any rate $H - 2\varepsilon$ (lower bound), for every $\varepsilon > 0$. The upper bound uses the typical-set lower probability bound $p(x) \ge 2^{-n(H + \varepsilon)}$ on the typical set $T_n$, together with $\mathbb{P}(T_n) \le 1$, to control $|T_n|$. The lower bound argues by contradiction: any encoding at rate $H - 2\varepsilon$ intersects $T_n$ in a set of vanishing probability, contradicting $\mathbb{P}(T_n) \to 1$ given by the AEP.
[/proofplan]
[step:Invoke the AEP to fix the typical sets and their defining bounds]
By the [Asymptotic Equipartition Property](/theorems/???) with constant $H$, for every $\varepsilon > 0$ there exists a sequence of sets $T_n \subseteq A^n$ (the typical sets at tolerance $\varepsilon$) such that:
\begin{align*}
&\text{(a)} \quad \mathbb{P}\bigl((X_1, \ldots, X_n) \in T_n\bigr) \to 1 \text{ as } n \to \infty,\\
&\text{(b)} \quad 2^{-n(H + \varepsilon)} \le p(x_1, \ldots, x_n) \le 2^{-n(H - \varepsilon)} \text{ for all } (x_1, \ldots, x_n) \in T_n,
\end{align*}
where $p(x_1, \ldots, x_n) = \mathbb{P}(X_1 = x_1, \ldots, X_n = x_n)$ is the joint mass function. We use these $T_n$ throughout.
[/step]
[step:Upper bound — show rate $H + \varepsilon$ suffices by taking $A_n = T_n$]
Fix $\varepsilon > 0$. Summing the lower probability bound $p(x) \ge 2^{-n(H + \varepsilon)}$ from (b) over $T_n$:
\begin{align*}
\mathbb{P}(T_n) = \sum_{x \in T_n} p(x) \ge |T_n| \cdot 2^{-n(H + \varepsilon)}.
\end{align*}
Since $\mathbb{P}(T_n) \le 1$, rearranging gives
\begin{align*}
|T_n| \le 2^{n(H + \varepsilon)}, \qquad \text{hence } \frac{\log_2 |T_n|}{n} \le H + \varepsilon.
\end{align*}
Set $A_n := T_n$. By (a), $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$, and the size bound shows the encoding rate is at most $H + \varepsilon$. Therefore the source is [reliably encodable](/theorems/???) at rate $H + \varepsilon$ for every $\varepsilon > 0$, so the information rate is at most $H$.
[guided]
The key inequality $|T_n| \le 2^{n(H + \varepsilon)}$ is really the pigeonhole principle in disguise. The total probability mass on $T_n$ is at most $1$, and every element of $T_n$ carries at least $2^{-n(H+\varepsilon)}$ mass, so there can be at most $2^{n(H+\varepsilon)}$ elements. This is a purely counting argument from the lower probability bound.
Why does $A_n = T_n$ work as a reliable encoding? We need two things from the [definition of reliable encodability at rate $R$](/theorems/???): a sequence $(A_n)$ with $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$ (so the probability of correct decoding tends to one) and $\limsup_n \frac{\log_2 |A_n|}{n} \le R$ (so the rate is at most $R$). Both hold for $A_n = T_n$ at rate $R = H + \varepsilon$, by (a) and the counting bound above respectively. Since $\varepsilon > 0$ is arbitrary, the infimum of reliable rates — the information rate — is at most $H$.
[/guided]
[/step]
[step:Lower bound — suppose for contradiction rate $H - 2\varepsilon$ is achievable]
If $H = 0$ there is nothing to prove, since rates are non-negative and the upper bound already gives information rate at most $0$. Assume $H > 0$ and fix $\varepsilon$ with $0 < \varepsilon < H/2$. Suppose for contradiction that the source is reliably encodable at rate $H - 2\varepsilon$; then there exists a sequence $A_n \subseteq A^n$ with
\begin{align*}
\mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n\bigr) \to 1 \quad \text{and} \quad |A_n| \le 2^{n(H - 2\varepsilon)}
\end{align*}
along some subsequence. By passing to a subsequence, we may assume both conditions hold for all $n$.
[/step]
[step:Bound the typical-set mass captured by $A_n$ using the upper probability bound]
On $T_n$, the AEP gives $p(x) \le 2^{-n(H - \varepsilon)}$ by (b). Hence
\begin{align*}
\mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n \cap T_n\bigr) = \sum_{x \in A_n \cap T_n} p(x) \le |A_n \cap T_n| \cdot 2^{-n(H - \varepsilon)} \le |A_n| \cdot 2^{-n(H - \varepsilon)}.
\end{align*}
Substituting the assumed size bound $|A_n| \le 2^{n(H - 2\varepsilon)}$:
\begin{align*}
\mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n \cap T_n\bigr) \le 2^{n(H - 2\varepsilon)} \cdot 2^{-n(H - \varepsilon)} = 2^{-n\varepsilon}.
\end{align*}
Since $\varepsilon > 0$, the right-hand side tends to $0$ as $n \to \infty$.
[guided]
The bound here is the dual of the bound in the upper-bound step: previously we used the lower probability bound to control the size of $T_n$; now we use the upper probability bound to control the mass of $A_n \cap T_n$. Elements of $T_n$ have probability at most $2^{-n(H-\varepsilon)}$, and $A_n \cap T_n$ has at most $|A_n| \le 2^{n(H-2\varepsilon)}$ elements, so its total mass is at most $2^{n(H-2\varepsilon)} \cdot 2^{-n(H-\varepsilon)} = 2^{-n\varepsilon}$. The shortfall of $2\varepsilon$ in the rate versus the $\varepsilon$ tolerance of the AEP produces a factor $2^{-n\varepsilon}$, which decays geometrically. This is where the specific choice of $H - 2\varepsilon$ (rather than $H - \varepsilon$) is used: we need a strict gap between the encoding rate and the AEP's upper bound exponent.
[/guided]
[/step]
[step:Derive a contradiction from $\mathbb{P}(T_n) \to 1$]
The typical set decomposes as $T_n = (A_n \cap T_n) \cup (T_n \setminus A_n)$, with the two parts disjoint. Since $T_n \setminus A_n \subseteq A_n^c$, additivity of probability gives
\begin{align*}
\mathbb{P}(T_n) = \mathbb{P}(A_n \cap T_n) + \mathbb{P}(T_n \setminus A_n) \le \mathbb{P}(A_n \cap T_n) + \mathbb{P}(A_n^c).
\end{align*}
By the assumption $\mathbb{P}(A_n) \to 1$, we have $\mathbb{P}(A_n^c) \to 0$, and the previous step gives $\mathbb{P}(A_n \cap T_n) \to 0$. Therefore $\mathbb{P}(T_n) \to 0$, contradicting (a) from the AEP, which gives $\mathbb{P}(T_n) \to 1$.
The contradiction shows no such $A_n$ exists, so the source is not reliably encodable at rate $H - 2\varepsilon$ for any $0 < \varepsilon < H/2$. Therefore the information rate is at least $H$.
[/step]
[step:Combine the two bounds]
Combining the upper and lower bounds, the information rate of the source equals $H$.
[/step]