Achievability with Worst-Case Error for the BSC — Statement & Proof

Achievability with Worst-Case Error for the BSC (Theorem # 1659)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We bootstrap off the average-error version (the [Achievability Theorem for the BSC](/theorems/1658)). Pick a rate $R'$ strictly between $R$ and $1 - H(p)$, produce a sequence of codes $(C'_n)$ of size $\lfloor 2^{nR'} \rfloor$ with average error $e(C'_n) \to 0$, then expurgate: delete the worst half of each $C'_n$, keeping the half with the lowest conditional error probabilities. A Markov-style argument shows the surviving codewords each have error at most $2 e(C'_n)$, so the maximum error goes to zero. The expurgated code has size $\lceil \lfloor 2^{nR'} \rfloor / 2 \rceil$, which is eventually at least $\lfloor 2^{nR} \rfloor$ because $R' > R$; we trim down to exactly that size by discarding extra codewords, which only lowers the maximum error. [/proofplan] [step:Choose an intermediate rate $R'$ with $R < R' < 1 - H(p)$] By the hypothesis $R < 1 - H(p)$, the open interval $(R, 1 - H(p))$ is non-empty; pick any \begin{align*} R' \in (R, 1 - H(p)). \end{align*} By the [Achievability Theorem for the BSC](/theorems/1658) applied at rate $R'$, there exists a sequence $(C'_n)_{n \ge 1}$ of codes with $C'_n \subseteq \{0,1\}^n$, $|C'_n| = m'_n := \lfloor 2^{nR'} \rfloor$, and average error probability $e(C'_n) \to 0$. [guided] Why introduce $R'$ rather than working directly at rate $R$? The expurgation halves the code size, which costs one bit of rate. Starting the construction at rate $R$ would leave us with a code of rate $R - 1/n$, eventually below $R$. We need a cushion: a rate $R'$ strictly bigger than $R$. The constraint $R' < 1 - H(p)$ is needed so that the average-error theorem applies. Since $R < 1 - H(p)$ strictly, there is room in between; any $R'$ in the open interval $(R, 1 - H(p))$ will do. [/guided] [/step] [step:Order each $C'_n$ by conditional error probability and delete the worst half] Fix $n$ and abbreviate $C' = C'_n$ with $m' = m'_n$. For each codeword $c \in C'$, let \begin{align*} e(c; C') := \mathbb{P}(\text{decoder output} \ne c \mid c \text{ sent using } C'), \end{align*} so that $e(C') = \frac{1}{m'} \sum_{c \in C'} e(c; C')$. Order the codewords as $C' = \{c_{(1)}, c_{(2)}, \ldots, c_{(m')}\}$ with \begin{align*} e(c_{(1)}; C') \le e(c_{(2)}; C') \le \cdots \le e(c_{(m')}; C'). \end{align*} Let $k := \lceil m'/2 \rceil$ and define the expurgated code \begin{align*} C'' := \{c_{(1)}, c_{(2)}, \ldots, c_{(k)}\}. \end{align*} Then $|C''| = k = \lceil m'/2 \rceil$. [/step] [step:Bound the worst-case error of $C''$ by twice the average error of $C'$] We show \begin{align*} \hat{e}(C''; \text{decoder of } C') \le 2\, e(C'). \end{align*} Suppose for contradiction that $e(c_{(j)}; C') > 2 e(C')$ for some $j \le k = \lceil m'/2 \rceil$. By the ordering, \begin{align*} e(c_{(i)}; C') \ge e(c_{(j)}; C') > 2 e(C') \qquad \text{for all } i \ge j. \end{align*} The number of such indices $i$ is $m' - j + 1 \ge m' - k + 1 \ge m'/2$. Hence \begin{align*} e(C') = \frac{1}{m'} \sum_{i=1}^{m'} e(c_{(i)}; C') \ge \frac{1}{m'} \sum_{i=j}^{m'} e(c_{(i)}; C') > \frac{1}{m'} \cdot \frac{m'}{2} \cdot 2 e(C') = e(C'), \end{align*} a contradiction. So every codeword of $C''$ has conditional error at most $2 e(C')$. Crucially, the decoder used for $C''$ is still the decoder originally specified for $C'$ (minimum-distance on $C'$), restricted to the event that a codeword of $C''$ was sent. This decoder may output an element of $C' \setminus C''$ — if so, it counts as a decoding error for $C''$. Thus the bound $e(c; C') \le 2 e(C')$ directly controls the error probability on $C''$, giving $\hat{e}(C'') \le 2 e(C')$. Since $e(C'_n) \to 0$ by Step 1, we have $\hat{e}(C''_n) \le 2 e(C'_n) \to 0$. [guided] The argument is a Markov-type tail bound turned deterministic by the ordering. We want: at least half of the codewords have error $\le 2 e(C')$. Suppose instead that strictly more than half had error $> 2 e(C')$. Then the average over all codewords — which is a mean of $m'$ numbers — would be strictly greater than $(m'/2) \cdot 2 e(C') / m' = e(C')$, contradicting that the average is exactly $e(C')$. Equivalently, if we sort the errors in ascending order, the median-or-below indices $i \le \lceil m'/2 \rceil$ are forced to have error $\le 2 e(C')$, so the first $k = \lceil m'/2 \rceil$ of them do. Why is the decoder question slightly delicate? When we delete codewords, the decoder is not automatically redefined. The natural choice is to keep using the decoder inherited from $C'$, which maps $\{0,1\}^n \to C' \cup \{\bot\}$. Restricted to the sender of a kept codeword $c \in C''$, the event "decoder outputs $c$" is exactly the event "no error under the original decoder". So the conditional error probability $e(c; C')$ is unchanged when we restrict attention to $C''$-senders. A decoding into some element of $C' \setminus C''$ is still an error for the $C''$-code. We do not need to replace the decoder with a minimum-distance decoder over $C''$: any such replacement can only **decrease** the error, so the bound $\hat{e}(C'') \le 2 e(C')$ still holds. [/guided] [/step] [step:Verify that $C''$ has size at least $\lfloor 2^{nR} \rfloor$ for large $n$] We have $|C''_n| = \lceil m'_n / 2 \rceil = \lceil \lfloor 2^{nR'} \rfloor / 2 \rceil$. For $n$ large enough, \begin{align*} \left\lceil \frac{\lfloor 2^{nR'} \rfloor}{2} \right\rceil \ge \frac{2^{nR'} - 1}{2} \ge 2^{nR' - 1} - 1. \end{align*} We need this to be at least $\lfloor 2^{nR} \rfloor$. Since $\lfloor 2^{nR} \rfloor \le 2^{nR}$, it suffices to show $2^{nR' - 1} - 1 \ge 2^{nR}$ for all large $n$, or equivalently \begin{align*} 2^{nR'-1} \ge 2^{nR} + 1. \end{align*} Because $R' > R$, we have $2^{n(R' - R) - 1} \to \infty$ as $n \to \infty$. Concretely, pick $N$ large enough that $2^{n(R' - R) - 1} \ge 2$ for all $n \ge N$; then $2^{nR'-1} = 2^{nR} \cdot 2^{n(R' - R) - 1} \ge 2 \cdot 2^{nR} \ge 2^{nR} + 1$. So for all $n \ge N$, $|C''_n| \ge \lfloor 2^{nR} \rfloor$. [/step] [step:Pass to a subcode of exact size $\lfloor 2^{nR} \rfloor$] For $n \ge N$, let $C_n$ be an arbitrary subset of $C''_n$ of size exactly $\lfloor 2^{nR} \rfloor$; this is possible by Step 4. For $n < N$, set $C_n$ to be any $\lfloor 2^{nR} \rfloor$-element subset of $\{0,1\}^n$ (finitely many such $n$ do not affect the limit). We verify that $\hat{e}(C_n) \to 0$. For $n \ge N$, $C_n \subseteq C''_n$ and the decoder on $C_n$ is minimum-distance on $C_n$. For any $c \in C_n$, \begin{align*} \mathbb{P}(\text{decoder on } C_n \text{ outputs } \ne c \mid c \text{ sent}) \le \mathbb{P}(\text{decoder on } C' \text{ outputs } \ne c \mid c \text{ sent}) \le \hat{e}(C''_n), \end{align*} because shrinking the codebook from $C'$ to $C_n$ removes candidate decoder outputs, so the event "nearest codeword (in $C_n$) equals $c$" is a superset of "nearest codeword (in $C'$) equals $c$". Hence $\hat{e}(C_n) \le \hat{e}(C''_n) \le 2 e(C'_n) \to 0$. [guided] Removing codewords from a code can only help the worst-case error under minimum-distance decoding. The minimum-distance decoder on $C_n$ outputs the nearest element of $C_n$; if $c$ was the nearest in $C'$ among all the competitors, it is still the nearest in $C_n \subset C'$. So every "correct decoding in $C'$" remains a correct decoding in $C_n$. Erroneous decodings may become more common (because new elements of $C_n$ may tie or beat the true codeword, but now nothing new is added), or rather, they may shift — but on the event where $C'$-decoding succeeded, $C_n$-decoding also succeeds. Hence the worst-case error probability of $C_n$ is at most that of $C''_n$, and passes to zero. (A clean way to see this: the event "minimum-distance decoder on $C_n$ outputs $c$" contains the event "minimum-distance decoder on $C'$ outputs $c$" when $c \in C_n \subseteq C'$ and we have a unique-minimum decoder, because fewer competitors means fewer ways for another codeword to be closer.) [/guided] [/step] [step:Combine the estimates to conclude] By Steps 4 and 5, $(C_n)_{n \ge 1}$ is a sequence of codes with $C_n \subseteq \{0,1\}^n$, $|C_n| = \lfloor 2^{nR} \rfloor$, and \begin{align*} \hat{e}(C_n) \le 2\, e(C'_n) \to 0 \qquad \text{as } n \to \infty. \end{align*} This is the required sequence with maximum error probability tending to zero, completing the proof. [/step]

Explore Further

Shannon's First Coding Theorem Information Theory Rabin Security Reduction Information Theory Optimality of Huffman Codes Information Theory RSA Key Recovery Reduces to Factoring Information Theory Shannon's Noiseless Coding Theorem Information Theory GV Asymptotic Lower Bound Information Theory Joint Conditioning Inequality Information Theory Generator Polynomial Information Theory

What brings you to Androma?

Start with a route through the knowledge graph.

Achievability with Worst-Case Error for the BSC (Theorem # 1659)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Achievability with Worst-Case Error for the BSC (Theorem # 1659)

Discussion

Proof

Explore Further