[proofplan]
Compare the joint distribution $(p_{xy})$ with the product of marginals $(p_x q_y)$ — both are probability distributions on the same finite set $A \times B$, where $A, B$ denote the (finite) ranges of $X, Y$. [Gibbs' inequality](/theorems/???) applied to these two distributions upper-bounds the joint entropy $H(X, Y) = -\sum_{x,y} p_{xy} \log_2 p_{xy}$ by the cross-entropy $-\sum_{x,y} p_{xy} \log_2 (p_x q_y)$; factoring the logarithm of a product and collapsing the resulting marginal sums yields $H(X) + H(Y)$. The equality case of Gibbs gives the independence characterisation.
[/proofplan]
[step:Set up the two probability distributions on $A \times B$]
Let $A$ and $B$ denote the finite ranges of $X$ and $Y$, and write $p_{xy} := \mathbb{P}(X = x, Y = y)$ for the joint distribution. We may assume WLOG that $p_x > 0$ for all $x \in A$ and $q_y > 0$ for all $y \in B$: an element $x \in A$ with $p_x = 0$ has $p_{xy} = 0$ for every $y$, contributes $0$ to all entropies by the convention $0 \log_2 0 := 0$, and may be dropped from $A$; similarly for $B$.
Define two mass functions on $A \times B$:
\begin{align*}
P: A \times B &\to [0, 1], & P(x, y) &:= p_{xy}, \\
Q: A \times B &\to (0, 1], & Q(x, y) &:= p_x q_y.
\end{align*}
Then $P$ is a probability distribution on $A \times B$ (it is the joint distribution of $(X, Y)$) and $Q$ is also a probability distribution:
\begin{align*}
\sum_{x \in A} \sum_{y \in B} p_x q_y = \left(\sum_{x \in A} p_x\right)\left(\sum_{y \in B} q_y\right) = 1 \cdot 1 = 1.
\end{align*}
Moreover $Q(x, y) = p_x q_y > 0$ for every $(x, y) \in A \times B$ since both $p_x > 0$ and $q_y > 0$.
[/step]
[step:Apply Gibbs' inequality to bound $H(X, Y)$ by the cross-entropy]
We apply [Gibbs' inequality](/theorems/???) to the pair $(P, Q)$ on the finite set $A \times B$. Gibbs' inequality states: if $(P(z))_{z \in Z}$ and $(Q(z))_{z \in Z}$ are probability distributions on a finite (or countable) set $Z$ with $Q(z) > 0$ for all $z$ in the support of $P$, then
\begin{align*}
-\sum_{z \in Z} P(z) \log_2 P(z) \leq -\sum_{z \in Z} P(z) \log_2 Q(z),
\end{align*}
with equality if and only if $P(z) = Q(z)$ for all $z$ in the support of $P$.
Hypothesis verification: $P$ and $Q$ are probability distributions on the finite set $Z = A \times B$ (shown in the previous step); $Q(x, y) = p_x q_y > 0$ for all $(x, y) \in A \times B$, so in particular at every point of the support of $P$. Gibbs applies, giving
\begin{align*}
H(X, Y) = -\sum_{(x,y) \in A \times B} p_{xy} \log_2 p_{xy}
\leq -\sum_{(x,y) \in A \times B} p_{xy} \log_2 (p_x q_y). \tag{$\ast$}
\end{align*}
[/step]
[step:Factor the cross-entropy into $H(X) + H(Y)$ by splitting the logarithm]
Expand the right-hand side of ($\ast$) using $\log_2(p_x q_y) = \log_2 p_x + \log_2 q_y$:
\begin{align*}
-\sum_{(x,y) \in A \times B} p_{xy} \log_2 (p_x q_y)
&= -\sum_{(x,y)} p_{xy} \log_2 p_x - \sum_{(x,y)} p_{xy} \log_2 q_y \\
&= -\sum_{x \in A} \log_2 p_x \sum_{y \in B} p_{xy} - \sum_{y \in B} \log_2 q_y \sum_{x \in A} p_{xy} \\
&= -\sum_{x \in A} p_x \log_2 p_x - \sum_{y \in B} q_y \log_2 q_y \\
&= H(X) + H(Y),
\end{align*}
where the third equality uses the marginalisation identities $\sum_{y \in B} p_{xy} = p_x$ and $\sum_{x \in A} p_{xy} = q_y$ (both are immediate from the definition of marginal distribution). Combining with ($\ast$):
\begin{align*}
H(X, Y) \leq H(X) + H(Y),
\end{align*}
which is the subadditivity inequality.
[guided]
We want to turn the cross-entropy $-\sum_{x,y} p_{xy} \log_2 (p_x q_y)$ into $H(X) + H(Y)$. The mechanism is: split the logarithm, exchange the order of summation, and use that $(p_{xy})$ marginalises to $(p_x)$ and $(q_y)$.
Step 1: Split the log of a product. Since $p_x, q_y > 0$, we have $\log_2(p_x q_y) = \log_2 p_x + \log_2 q_y$. So
\begin{align*}
-\sum_{(x,y)} p_{xy} \log_2 (p_x q_y)
= -\sum_{(x,y)} p_{xy} \log_2 p_x - \sum_{(x,y)} p_{xy} \log_2 q_y.
\end{align*}
Step 2: In each sum, the log factor depends on only one variable, so we can pull it outside the inner sum. For the first sum, $\log_2 p_x$ depends only on $x$:
\begin{align*}
-\sum_{(x,y) \in A \times B} p_{xy} \log_2 p_x
= -\sum_{x \in A} \log_2 p_x \left(\sum_{y \in B} p_{xy}\right).
\end{align*}
By definition of marginal, $\sum_{y \in B} p_{xy} = \mathbb{P}(X = x) = p_x$. Hence
\begin{align*}
-\sum_{(x,y)} p_{xy} \log_2 p_x = -\sum_{x \in A} p_x \log_2 p_x = H(X).
\end{align*}
Symmetrically for the second sum, using $\sum_{x \in A} p_{xy} = q_y$:
\begin{align*}
-\sum_{(x,y)} p_{xy} \log_2 q_y = -\sum_{y \in B} q_y \log_2 q_y = H(Y).
\end{align*}
Combining these with ($\ast$):
\begin{align*}
H(X, Y) \leq H(X) + H(Y).
\end{align*}
[/guided]
[/step]
[step:Characterise equality via the Gibbs equality condition]
We now examine when equality holds. By the equality condition of [Gibbs' inequality](/theorems/???), the inequality ($\ast$) is an equality iff $P(x, y) = Q(x, y)$ for all $(x, y)$ in the support of $P$, i.e., iff
\begin{align*}
p_{xy} = p_x q_y \quad \text{for all } (x, y) \in A \times B \text{ with } p_{xy} > 0.
\end{align*}
In fact the factorisation extends to all of $A \times B$: if $p_{xy} = 0$ for some $(x, y)$, then either we have already dropped $x$ or $y$ (if the corresponding marginal is zero), or else both marginals are positive and $p_{xy} = p_x q_y$ would force $p_x q_y = 0$ — but $p_x, q_y > 0$ gives $p_x q_y > 0$, contradicting $p_{xy} = 0$. So equality in ($\ast$) is equivalent to $p_{xy} = p_x q_y$ for **every** $(x, y) \in A \times B$, which is precisely the definition of $X$ and $Y$ being independent.
Since the manipulation in the third step is an equality regardless, the full chain $H(X, Y) \leq H(X) + H(Y)$ is an equality iff ($\ast$) is an equality iff $X$ and $Y$ are independent. This completes the proof.
[/step]