Joint Conditioning Inequality — Statement & Proof

Joint Conditioning Inequality (Theorem # 1653)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We expand the joint entropy $H(X, Y, Z)$ in two ways using the [Chain Rule for Entropy](/theorems/???): once peeling off $Z$ last, and once peeling off $X$ last. Equating the two expansions produces an identity relating $H(X \mid Y)$, $H(X \mid Y, Z)$, and the conditional entropies of $Z$. Using the non-negativity of conditional entropy and the fact that conditioning reduces entropy, both $H(Z \mid X, Y) \geq 0$ and $H(Z \mid Y) \leq H(Z)$, we bound the difference by $H(Z)$. [/proofplan] [step:Expand $H(X, Y, Z)$ peeling off $Z$ first] Applying the [Chain Rule for Entropy](/theorems/???) repeatedly, grouping $(X, Y)$ first and then peeling off $Y$, \begin{align*} H(X, Y, Z) &= H(X, Y) + H(Z \mid X, Y) \\ &= H(Y) + H(X \mid Y) + H(Z \mid X, Y). \end{align*} Each application of the chain rule requires only that the relevant joint distributions be well defined, which holds since $X$, $Y$, $Z$ are defined on a common probability space with finite alphabets. [/step] [step:Expand $H(X, Y, Z)$ peeling off $X$ first] Again by the [Chain Rule for Entropy](/theorems/???), this time grouping $(Y, Z)$ first and then peeling off $Y$, \begin{align*} H(X, Y, Z) &= H(Y, Z) + H(X \mid Y, Z) \\ &= H(Y) + H(Z \mid Y) + H(X \mid Y, Z). \end{align*} [/step] [step:Equate the two expansions and cancel $H(Y)$] Setting the two expressions for $H(X, Y, Z)$ equal, \begin{align*} H(Y) + H(X \mid Y) + H(Z \mid X, Y) = H(Y) + H(Z \mid Y) + H(X \mid Y, Z). \end{align*} Subtracting $H(Y)$ (a finite real number, since the alphabets are finite) from both sides, \begin{align*} H(X \mid Y) + H(Z \mid X, Y) = H(Z \mid Y) + H(X \mid Y, Z). \end{align*} Rearranging, \begin{align*} H(X \mid Y) - H(X \mid Y, Z) = H(Z \mid Y) - H(Z \mid X, Y). \end{align*} [guided] The two chain-rule expansions both decompose $H(X, Y, Z)$, so equating them is a bookkeeping identity that must hold. The value of this manoeuvre is that it produces, on the two sides, exactly the quantities we need to compare: $H(X \mid Y)$ and $H(X \mid Y, Z)$ on the left, and $H(Z \mid Y)$ and $H(Z \mid X, Y)$ on the right. The symmetry between the two triples is the key idea — the identity converts a statement about the $X$-side into a statement about the $Z$-side, which turns out to be easier to bound. Cancelling $H(Y)$ from both sides is valid because $H(Y)$ is a finite non-negative real number (finite because $\mathcal{Y}$ is finite and the Shannon entropy of a random variable with finite support is finite). Rearranging so that both "$X$-quantities" are on the left and both "$Z$-quantities" are on the right yields \begin{align*} H(X \mid Y) - H(X \mid Y, Z) = H(Z \mid Y) - H(Z \mid X, Y). \end{align*} This identity is the core of the proof: it reduces the desired inequality to bounding the right-hand side. [/guided] [/step] [step:Bound the right-hand side by $H(Z)$] By [non-negativity of conditional entropy](/theorems/???), $H(Z \mid X, Y) \geq 0$. Hence \begin{align*} H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z \mid Y). \end{align*} By [Conditioning Reduces Entropy](/theorems/1652) applied to the pair $(Z, Y)$, \begin{align*} H(Z \mid Y) \leq H(Z). \end{align*} Chaining these bounds, \begin{align*} H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z \mid Y) \leq H(Z). \end{align*} [guided] The right-hand side of the identity established above is $H(Z \mid Y) - H(Z \mid X, Y)$. Two facts deliver the bound we need. First, conditional entropy is non-negative: for any random variables $U, V$ on finite alphabets, \begin{align*} H(U \mid V) = \sum_v p_V(v) \, H(U \mid V = v) \geq 0, \end{align*} since each $H(U \mid V = v) \geq 0$ (entropy of a probability distribution on a finite set is non-negative) and $p_V(v) \geq 0$. Applying this to $(U, V) = (Z, (X, Y))$, \begin{align*} H(Z \mid X, Y) \geq 0, \end{align*} so dropping this term can only increase the right-hand side: \begin{align*} H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z \mid Y). \end{align*} Second, [Conditioning Reduces Entropy](/theorems/1652) states: for any discrete random variables $U, V$ on finite alphabets, $H(U \mid V) \leq H(U)$. Applying this with $(U, V) = (Z, Y)$, \begin{align*} H(Z \mid Y) \leq H(Z). \end{align*} Both hypotheses of theorem 1652 are satisfied: $Z$ and $Y$ are discrete random variables on finite alphabets defined on the same probability space, by the hypotheses of the theorem being proved. Chaining, \begin{align*} H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z \mid Y) \leq H(Z). \end{align*} [/guided] [/step] [step:Combine the identity and the bound to conclude] Combining the identity from the third step, \begin{align*} H(X \mid Y) - H(X \mid Y, Z) = H(Z \mid Y) - H(Z \mid X, Y), \end{align*} with the bound from the fourth step, \begin{align*} H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z), \end{align*} we obtain \begin{align*} H(X \mid Y) - H(X \mid Y, Z) \leq H(Z), \end{align*} equivalently \begin{align*} H(X \mid Y) \leq H(X \mid Y, Z) + H(Z), \end{align*} which is the statement of the theorem. [/step]

Explore Further

Kraft's Inequality Characterises Prefix-Free Codes Information Theory Standard Form of Generator Matrix Information Theory Double Dual for Linear Codes Information Theory Minimum Distance and Column Independence Information Theory Optimality of Huffman Codes Information Theory Information Rate of Bernoulli Sources Information Theory DSA Correctness Information Theory Hamming and Gilbert–Varshamov Bounds Information Theory

What brings you to Androma?

Start with a route through the knowledge graph.

Joint Conditioning Inequality (Theorem # 1653)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Joint Conditioning Inequality (Theorem # 1653)

Discussion

Proof

Explore Further