Attributions & Verification

Track contributions and verify content correctness

Proof

custom_env Unknown

[guided]The strategy is to exhibit the deficit $H(X) + H(Y) - H(X, Y)$ as a Kullback–Leibler divergence and invoke Gibbs' inequality. The natural candidate for a reference distribution is $q(x, y) := p_X(x) p_Y(y)$: the product of the marginals, which is the joint distribution that $(X, Y)$ would have if they were independent. First we verify the prerequisites for defining $D(p_{X,Y} \,\|\, q)$. Both are probability mass functions on the finite set $\mathcal{X} \times \mathcal{Y}$: $q$ sums to $\sum_{x,y} p_X(x) p_Y(y) = (\sum_x p_X(x))(\sum_y p_Y(y)) = 1$. We need absolute continuity, $p_{X,Y} \ll q$ — that is, whenever $q(x, y) = 0$ we must have $p_{X,Y}(x, y) = 0$. Suppose $q(x_0, y_0) = 0$. Then $p_X(x_0) = 0$ or $p_Y(y_0) = 0$. If $p_X(x_0) = 0$, then by marginalisation $\sum_y p_{X,Y}(x_0, y) = 0$, and non-negativity of the summands forces $p_{X,Y}(x_0, y) = 0$ for every $y$, in particular $p_{X,Y}(x_0, y_0) = 0$. The case $p_Y(y_0) = 0$ is symmetric. Hence $p_{X,Y} \ll q$. Now we compute: \begin{align*} D(p_{X,Y} \,\|\, q) &= \sum_{x, y} p_{X,Y}(x, y) \log \frac{p_{X,Y}(x, y)}{p_X(x) p_Y(y)} \\ &= \sum_{x, y} p_{X,Y}(x, y) \log p_{X,Y}(x, y) - \sum_{x, y} p_{X,Y}(x, y) \log p_X(x) - \sum_{x, y} p_{X,Y}(x, y) \log p_Y(y) \\ &= -H(X, Y) - \sum_x \log p_X(x) \sum_y p_{X,Y}(x, y) - \sum_y \log p_Y(y) \sum_x p_{X,Y}(x, y) \\ &= -H(X, Y) - \sum_x p_X(x) \log p_X(x) - \sum_y p_Y(y) \log p_Y(y) \\ &= -H(X, Y) + H(X) + H(Y). \end{align*} The term-by-term splitting of the logarithm, the switch in summation order, and the marginalisations are all legitimate because the sums are finite. (Where $p_{X,Y}(x, y) = 0$ we use the convention $0 \log 0 := 0$, consistent with $\lim_{t \downarrow 0} t \log t = 0$; where $p_X(x) = 0$ or $p_Y(y) = 0$ but $p_{X,Y}(x, y) = 0$, the entire term vanishes so no $\log 0$ issue arises.) [Gibbs' Inequality](/theorems/???) states: if $p$ and $q$ are probability mass functions on a countable set with $p \ll q$, then $D(p \,\|\, q) \geq 0$, with equality if and only if $p = q$. Its hypotheses are met ($p_{X,Y} \ll q$ was just verified), so \begin{align*} H(X) + H(Y) - H(X, Y) = D(p_{X,Y} \,\|\, q) \geq 0, \end{align*} i.e. $H(X, Y) \leq H(X) + H(Y)$. Why start from KL divergence rather than directly convexifying with Jensen? Both work; KL packages the computation more cleanly and simultaneously delivers the equality case for free via the Gibbs equality condition, which is exactly what we need next.[/guided]

custom_env Unknown

Verification Progress

6 Total Blocks

0 Verified

0% verified

Contributors

Unknown 6 blocks (0 verified)

Who Can Verify

Areas: Probability & Statistics
Subareas: Probability Theory

Viktor Miykov Admin

Max Vassiliev Global Reviewer

Horia Neagu Global Reviewer

강현욱 Global Reviewer

Demo Testing Global Reviewer

Archie Pennycook Global Reviewer

Quick Actions

Edit Theorem

What brings you to Androma?

Start with a route through the knowledge graph.

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Sign in to Androma

Check your inbox

One last step

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Raw Attribution Data