Information Rate Bounded by Expected Code Length

Information Rate Bounded by Expected Code Length (Theorem # 1647)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] For any $\varepsilon > 0$ we exhibit sets $A_n \subseteq A^n$ whose size grows at rate $\mathbb{E}[l_1] + \varepsilon$ and whose probability tends to $1$. The sets are the preimages under $c^*$ of short codewords: the cumulative length of $c^*(X_1, \ldots, X_n) = c(X_1) \cdots c(X_n)$ is $\sum_{i=1}^{n} l_i$, which by the weak law of large numbers is less than $n(\mathbb{E}[l_1] + \varepsilon)$ with probability approaching $1$. Injectivity of $c^*$ (from decipherability of $c$) then bounds $|A_n|$ by the number of binary strings of length less than $n(\mathbb{E}[l_1] + \varepsilon)$, giving reliable encoding at rate $\mathbb{E}[l_1] + \varepsilon$. [/proofplan] [step:Set up the codeword-length random variables and their i.i.d. structure] Let $l: A \to \mathbb{N}$ be the codeword-length function of the code $c$, i.e., $l(x) := |c(x)|$ is the binary length of $c(x) \in \{0, 1\}^*$. For each $i \ge 1$ define \begin{align*} l_i := l(X_i), \end{align*} the length of the codeword emitted when encoding $X_i$. Since $X_1, X_2, \ldots$ are i.i.d. (by the Bernoulli hypothesis) and each $l_i$ is a fixed measurable function of $X_i$, the sequence $(l_i)_{i \ge 1}$ is i.i.d. with common distribution equal to that of $l_1 = l(X_1)$. The alphabet $A$ is finite and $l$ is a well-defined function on $A$, so $l_1$ takes only finitely many values; in particular $\mathbb{E}[l_1] < \infty$. [/step] [step:Define $A_n$ as the set of source sequences whose concatenated encoding is short] Let $c^*: \bigcup_{n \ge 1} A^n \to \{0, 1\}^*$, $(x_1, \ldots, x_n) \mapsto c(x_1) c(x_2) \cdots c(x_n)$ denote the concatenation extension of $c$. For $(x_1, \ldots, x_n) \in A^n$, the length of $c^*(x_1, \ldots, x_n)$ is $\sum_{i=1}^{n} l(x_i)$. Fix $\varepsilon > 0$ and set \begin{align*} A_n := \left\{ (x_1, \ldots, x_n) \in A^n : |c^*(x_1, \ldots, x_n)| < n(\mathbb{E}[l_1] + \varepsilon) \right\}. \end{align*} [guided] The membership condition $(x_1, \ldots, x_n) \in A_n$ can be rewritten in terms of the lengths: $|c^*(x_1, \ldots, x_n)| = \sum_{i=1}^{n} l(x_i)$, so $(x_1, \ldots, x_n) \in A_n$ iff the empirical average $\frac{1}{n}\sum_{i=1}^{n} l(x_i)$ is strictly less than $\mathbb{E}[l_1] + \varepsilon$. The strategy is to use this: the empirical average concentrates at $\mathbb{E}[l_1]$ by the WLLN, which lies strictly below $\mathbb{E}[l_1] + \varepsilon$, so the event becomes almost certain. Meanwhile, the cardinality $|A_n|$ is controlled by the number of short binary strings because $c^*$ is injective and its image on $A_n$ lies in a set of size $2^{n(\mathbb{E}[l_1] + \varepsilon)}$. [/guided] [/step] [step:Apply the WLLN to show $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$] In terms of the random variables $l_i$, \begin{align*} \mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n\bigr) = \mathbb{P}\!\left(\sum_{i=1}^{n} l_i < n(\mathbb{E}[l_1] + \varepsilon)\right) = \mathbb{P}\!\left(\frac{1}{n} \sum_{i=1}^{n} l_i < \mathbb{E}[l_1] + \varepsilon\right). \end{align*} Since $(l_i)$ is i.i.d. with finite mean $\mathbb{E}[l_1]$, the [Weak Law of Large Numbers](/theorems/???) applies and gives $\frac{1}{n}\sum_{i=1}^{n} l_i \xrightarrow{\mathbb{P}} \mathbb{E}[l_1]$. In particular, for every fixed $\varepsilon > 0$, \begin{align*} \mathbb{P}\!\left(\left|\frac{1}{n}\sum_{i=1}^{n} l_i - \mathbb{E}[l_1]\right| < \varepsilon\right) \to 1, \end{align*} and the event $\bigl\{\frac{1}{n}\sum_{i=1}^{n} l_i - \mathbb{E}[l_1] < \varepsilon\bigr\}$ contains the two-sided event, so its probability also tends to $1$. Therefore \begin{align*} \mathbb{P}\bigl((X_1, \ldots, X_n) \in A_n\bigr) \to 1 \quad \text{as } n \to \infty. \end{align*} [/step] [step:Bound $|A_n|$ by counting short binary strings via injectivity of $c^*$] Because the code $c$ is decipherable, its concatenation extension $c^*$ is [injective on $\bigcup_{n} A^n$](/theorems/???). Restricted to $A_n$, the map $c^*$ is in particular injective, and \begin{align*} c^*(A_n) \subseteq \{w \in \{0, 1\}^* : |w| < n(\mathbb{E}[l_1] + \varepsilon)\} = \bigcup_{k = 0}^{\lceil n(\mathbb{E}[l_1] + \varepsilon) \rceil - 1} \{0, 1\}^k. \end{align*} Hence \begin{align*} |A_n| = |c^*(A_n)| \le \sum_{k = 0}^{\lceil n(\mathbb{E}[l_1] + \varepsilon) \rceil - 1} 2^k = 2^{\lceil n(\mathbb{E}[l_1] + \varepsilon) \rceil} - 1 \le 2^{n(\mathbb{E}[l_1] + \varepsilon) + 1}. \end{align*} Taking $\log_2$ and dividing by $n$: \begin{align*} \frac{\log_2 |A_n|}{n} \le \mathbb{E}[l_1] + \varepsilon + \frac{1}{n}, \end{align*} so $\limsup_{n \to \infty} \frac{\log_2 |A_n|}{n} \le \mathbb{E}[l_1] + \varepsilon$. [guided] The counting argument rests on two ingredients. First, injectivity of $c^*$ turns a probability question about source sequences into a cardinality question about binary strings: distinct source sequences in $A_n$ map to distinct binary strings in $c^*(A_n)$, so $|A_n| = |c^*(A_n)|$. Second, every binary string of length less than $L := \lceil n(\mathbb{E}[l_1] + \varepsilon)\rceil$ lies in $\bigcup_{k=0}^{L-1}\{0,1\}^k$, a set of size $\sum_{k=0}^{L-1} 2^k = 2^L - 1 \le 2^L$. Combining gives $|A_n| \le 2^L \le 2^{n(\mathbb{E}[l_1] + \varepsilon) + 1}$, since $L = \lceil n(\mathbb{E}[l_1] + \varepsilon)\rceil \le n(\mathbb{E}[l_1] + \varepsilon) + 1$. The additive $+1$ in the exponent produces the $\frac{1}{n}$ correction, which vanishes in the limit and so does not affect the asymptotic rate. Where is decipherability used? Without it, $c^*$ could be non-injective — two distinct source sequences could produce the same concatenated codeword — and then the short-string counting bound would not control $|A_n|$. Decipherability is precisely the condition that ensures unique decoding and hence injectivity of $c^*$. [/guided] [/step] [step:Combine to conclude reliable encoding at rate $\mathbb{E}[l_1] + \varepsilon$] From the two previous steps, $(A_n)$ satisfies $\mathbb{P}((X_1, \ldots, X_n) \in A_n) \to 1$ and $\limsup_{n \to \infty} \frac{\log_2 |A_n|}{n} \le \mathbb{E}[l_1] + \varepsilon$. By the [definition of reliable encodability](/theorems/???), the source is reliably encodable at rate $\mathbb{E}[l_1] + \varepsilon$. [/step] [step:Take $\varepsilon \downarrow 0$ and conclude the information rate bound] Because the source is reliably encodable at rate $\mathbb{E}[l_1] + \varepsilon$ for every $\varepsilon > 0$, the information rate — defined as the infimum of rates at which reliable encoding is possible — satisfies \begin{align*} \text{information rate} \le \inf_{\varepsilon > 0} (\mathbb{E}[l_1] + \varepsilon) = \mathbb{E}[l_1]. \end{align*} This completes the proof. [/step]

Explore Further

Shannon's Noiseless Coding Theorem Information Theory Converse of Shannon's Second Coding Theorem Information Theory Hamming and Gilbert–Varshamov Bounds Information Theory Optimality of Huffman Codes Information Theory Rational Form of LFSR Streams Information Theory Basis from Generator Polynomial Information Theory Chain Rule for Entropy Information Theory Minimum Distance Determines Error Capability Information Theory

What brings you to Androma?

Start with a route through the knowledge graph.