Shannon's First Coding Theorem — Statement & Proof

Shannon's First Coding Theorem (Theorem # 1644)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We establish the information rate by proving two matching bounds: the source is reliably encodable at every rate $H + \varepsilon$ (upper bound), and fails to be reliably encodable at any rate $H - 2\varepsilon$ (lower bound), for every $\varepsilon > 0$. The upper bound uses the typical-set lower probability bound $p(x) \ge 2^{-n(H + \varepsilon)}$ on the typical set $T_n$, together with $\mathbb{P}(T_n) \le 1$, to control $|T_n|$. The lower bound argues by contradiction: any encoding at rate $H - 2\varepsilon$ intersects $T_n$ in a set of vanishing probability, contradicting $\mathbb{P}(T_n) \to 1$ given by the AEP. [/proofplan] [step:Invoke the AEP to fix the typical sets and their defining bounds] By the [Asymptotic Equipartition Property](/theorems/???) with constant $H$, for every $\varepsilon > 0$ there exists a sequence of sets $T_n \subseteq A^n$ (the typical sets at tolerance $\varepsilon$) such that: \begin{align*} &\text{(a)} \quad \mathbb{P}\bigl((X_1, \ldots, X_n) \in T_n\bigr) \to 1 \text{ as } n \to \infty,\\ &\text{(b)} \quad 2^{-n(H + \varepsilon)} \le p(x_1, \ldots, x_n) \le 2^{-n(H - \varepsilon)} \text{ for all } (x_1, \ldots, x_n) \in T_n, \end{align*} where $p(x_1, \ldots, x_n) = \mathbb{P}(X_1 = x_1, \ldots, X_n = x_n)$ is the joint mass function. We use these $T_n$ throughout. [/step] [step:Upper bound — show rate $H + \varepsilon$ suffices by taking $A_n = T_n$] Fix $\varepsilon > 0$. Summing the lower probability bound $p(x) \ge 2^{-n(H + \varepsilon)}$ from (b) over $T_n$: \begin{align*} \mathbb{P}(T_n) = \sum_{x \in T_n} p(x) \ge |T_n| \cdot 2^{-n(H + \varepsilon)}. \end{align*} Since $\mathbb{P}(T_n) \le 1$, rearranging gives \begin{align*} |T_n| \le 2^{n(H + \varepsilon)}, \qquad \text{hence } \frac{\log_2 |T_n|}{n} \le H + \varepsilon. \end{align*} Set $A_n := T_n$. By (a), $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$, and the size bound shows the encoding rate is at most $H + \varepsilon$. Therefore the source is [reliably encodable](/theorems/???) at rate $H + \varepsilon$ for every $\varepsilon > 0$, so the information rate is at most $H$. [guided] The key inequality $|T_n| \le 2^{n(H + \varepsilon)}$ is really the pigeonhole principle in disguise. The total probability mass on $T_n$ is at most $1$, and every element of $T_n$ carries at least $2^{-n(H+\varepsilon)}$ mass, so there can be at most $2^{n(H+\varepsilon)}$ elements. This is a purely counting argument from the lower probability bound. Why does $A_n = T_n$ work as a reliable encoding? We need two things from the [definition of reliable encodability at rate $R$](/theorems/???): a sequence $(A_n)$ with $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$ (so the probability of correct decoding tends to one) and $\limsup_n \frac{\log_2 |A_n|}{n} \le R$ (so the rate is at most $R$). Both hold for $A_n = T_n$ at rate $R = H + \varepsilon$, by (a) and the counting bound above respectively. Since $\varepsilon > 0$ is arbitrary, the infimum of reliable rates — the information rate — is at most $H$. [/guided] [/step] [step:Lower bound — suppose for contradiction rate $H - 2\varepsilon$ is achievable] If $H = 0$ there is nothing to prove, since rates are non-negative and the upper bound already gives information rate at most $0$. Assume $H > 0$ and fix $\varepsilon$ with $0 < \varepsilon < H/2$. Suppose for contradiction that the source is reliably encodable at rate $H - 2\varepsilon$; then there exists a sequence $A_n \subseteq A^n$ with \begin{align*} \mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n\bigr) \to 1 \quad \text{and} \quad |A_n| \le 2^{n(H - 2\varepsilon)} \end{align*} along some subsequence. By passing to a subsequence, we may assume both conditions hold for all $n$. [/step] [step:Bound the typical-set mass captured by $A_n$ using the upper probability bound] On $T_n$, the AEP gives $p(x) \le 2^{-n(H - \varepsilon)}$ by (b). Hence \begin{align*} \mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n \cap T_n\bigr) = \sum_{x \in A_n \cap T_n} p(x) \le |A_n \cap T_n| \cdot 2^{-n(H - \varepsilon)} \le |A_n| \cdot 2^{-n(H - \varepsilon)}. \end{align*} Substituting the assumed size bound $|A_n| \le 2^{n(H - 2\varepsilon)}$: \begin{align*} \mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n \cap T_n\bigr) \le 2^{n(H - 2\varepsilon)} \cdot 2^{-n(H - \varepsilon)} = 2^{-n\varepsilon}. \end{align*} Since $\varepsilon > 0$, the right-hand side tends to $0$ as $n \to \infty$. [guided] The bound here is the dual of the bound in the upper-bound step: previously we used the lower probability bound to control the size of $T_n$; now we use the upper probability bound to control the mass of $A_n \cap T_n$. Elements of $T_n$ have probability at most $2^{-n(H-\varepsilon)}$, and $A_n \cap T_n$ has at most $|A_n| \le 2^{n(H-2\varepsilon)}$ elements, so its total mass is at most $2^{n(H-2\varepsilon)} \cdot 2^{-n(H-\varepsilon)} = 2^{-n\varepsilon}$. The shortfall of $2\varepsilon$ in the rate versus the $\varepsilon$ tolerance of the AEP produces a factor $2^{-n\varepsilon}$, which decays geometrically. This is where the specific choice of $H - 2\varepsilon$ (rather than $H - \varepsilon$) is used: we need a strict gap between the encoding rate and the AEP's upper bound exponent. [/guided] [/step] [step:Derive a contradiction from $\mathbb{P}(T_n) \to 1$] The typical set decomposes as $T_n = (A_n \cap T_n) \cup (T_n \setminus A_n)$, with the two parts disjoint. Since $T_n \setminus A_n \subseteq A_n^c$, additivity of probability gives \begin{align*} \mathbb{P}(T_n) = \mathbb{P}(A_n \cap T_n) + \mathbb{P}(T_n \setminus A_n) \le \mathbb{P}(A_n \cap T_n) + \mathbb{P}(A_n^c). \end{align*} By the assumption $\mathbb{P}(A_n) \to 1$, we have $\mathbb{P}(A_n^c) \to 0$, and the previous step gives $\mathbb{P}(A_n \cap T_n) \to 0$. Therefore $\mathbb{P}(T_n) \to 0$, contradicting (a) from the AEP, which gives $\mathbb{P}(T_n) \to 1$. The contradiction shows no such $A_n$ exists, so the source is not reliably encodable at rate $H - 2\varepsilon$ for any $0 < \varepsilon < H/2$. Therefore the information rate is at least $H$. [/step] [step:Combine the two bounds] Combining the upper and lower bounds, the information rate of the source equals $H$. [/step]

Explore Further

Achievability for the BSC Information Theory Bernoulli Sources Satisfy AEP Information Theory Shannon's Second Coding Theorem Information Theory Perfect Secrecy Requires |K| ≥ |M| Information Theory Hamming Bound Information Theory Gibbs' Inequality Information Theory Joint Conditioning Inequality Information Theory Prefix-Free Codes Are No Worse Than Decipherable Codes Information Theory

What brings you to Androma?

Start with a route through the knowledge graph.

Shannon's First Coding Theorem (Theorem # 1644)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Shannon's First Coding Theorem (Theorem # 1644)

Discussion

Proof

Explore Further