[proofplan]
Start from the multiplicative decomposition of the joint mass function, $\mathbb{P}(X=x, Y=y) = \mathbb{P}(X=x)\,\mathbb{P}(Y=y \mid X=x)$, valid wherever $\mathbb{P}(X=x) > 0$. Taking logarithms turns this product into a sum. Multiplying by $-\mathbb{P}(X=x, Y=y)$ and summing over all pairs $(x, y)$ with $\mathbb{P}(X=x, Y=y) > 0$ then yields the joint entropy on the left and two sums on the right. The first sum collapses via marginalisation to $H(X)$; the second is, by definition, the conditional entropy $H(Y \mid X)$.
[/proofplan]
[step:Reduce to the support of the joint distribution]
Let $A, B$ denote the finite ranges of $X, Y$ and write $p_{xy} := \mathbb{P}(X = x, Y = y)$, $p_x := \mathbb{P}(X = x) = \sum_{y \in B} p_{xy}$, and, for $p_x > 0$, $p_{y \mid x} := \mathbb{P}(Y = y \mid X = x) = p_{xy}/p_x$.
All three entropies are defined using the convention $0 \log_2 0 := 0$. Let
\begin{align*}
S := \{(x, y) \in A \times B : p_{xy} > 0\}.
\end{align*}
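Summands with $(x, y) \notin S$ have $p_{xy} = 0$ and therefore contribute $0$ to $H(X, Y) = -\sum_{(x,y)} p_{xy} \log_2 p_{xy}$ by the convention. Likewise, values $x$ with $p_x = 0$ contribute $0$ to $H(X) = -\sum_x p_x \log_2 p_x$, and the defining sum $H(Y \mid X) = -\sum_{x : p_x > 0} p_x \sum_{y : p_{y \mid x} > 0} p_{y \mid x} \log_2 p_{y \mid x}$ already runs only over pairs with $p_x > 0$ and $p_{y \mid x} > 0$, which are exactly the pairs in $S$. Since $p_x > 0$ holds precisely when $(x, y) \in S$ for some $y$, we may restrict every sum over pairs to $S$, and every sum over $x$ alone to $\{x : p_x > 0\}$, without changing any quantity.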
Note that $(x, y) \in S$ implies $p_x \geq p_{xy} > 0$, so the conditional probability $p_{y \mid x}$ is defined and positive on $S$.
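The reduction can be checked numerically. The Python sketch below is illustrative only: it assumes the joint mass function is stored as a dictionary `joint` mapping pairs `(x, y)` to $p_{xy}$, and the helper names `plogp`, `support` and `marginal_x`, as well as the example numbers, are hypothetical. It evaluates $H(X, Y)$ once over all of $A \times B$ using $0 \log_2 0 := 0$ and once over $S$ alone, and confirms that $p_x \geq p_{xy} > 0$ on $S$.

```python
import math

# Hypothetical joint pmf on A x B with A = {0, 1, 2} and B = {0, 1}.
# Three cells are zero, so the support S is a strict subset of A x B.
joint = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.5,  (1, 1): 0.0,
    (2, 0): 0.0,  (2, 1): 0.0,
}


def plogp(p):
    """Return -p * log2(p), using the convention 0 * log2(0) := 0."""
    return 0.0 if p == 0 else -p * math.log2(p)


def support(joint):
    """Return S, the set of pairs (x, y) with p_xy > 0."""
    return {xy for xy, p in joint.items() if p > 0}


def marginal_x(joint):
    """Return the marginal p_x = sum over y of p_xy, as a dict keyed by x."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px


S = support(joint)
px = marginal_x(joint)

# H(X, Y) over all of A x B (zero cells contribute 0) and over S only.
h_full = sum(plogp(p) for p in joint.values())
h_on_S = sum(plogp(joint[xy]) for xy in S)
assert math.isclose(h_full, h_on_S)

# On S we have p_x >= p_xy > 0, so p_{y|x} = p_xy / p_x is defined and positive.
assert all(px[x] >= joint[(x, y)] > 0 for (x, y) in S)
```

For this example both sums equal $1.5$ bits; the three zero cells of `joint` simply drop out, and $x = 2$ has $p_x = 0$ without ever appearing in $S$.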
[/step]
[step:Express $\log_2 p_{xy}$ as a sum via the multiplicative law of conditional probability]
On $S$, the definition $p_{y \mid x} = p_{xy}/p_x$ rearranges to
\begin{align*}
p_{xy} = p_x \cdot p_{y \mid x}.
\end{align*}
Taking $-\log_2$ of both sides (both sides are positive on $S$):
\begin{align*}
-\log_2 p_{xy} = -\log_2 p_x - \log_2 p_{y \mid x}. \tag{$\ast$}
\end{align*}
Multiplying ($\ast$) by $p_{xy} \geq 0$ and summing over $(x, y) \in S$:
\begin{align*}
-\sum_{(x,y) \in S} p_{xy} \log_2 p_{xy}
= -\sum_{(x,y) \in S} p_{xy} \log_2 p_x
- \sum_{(x,y) \in S} p_{xy} \log_2 p_{y \mid x}. \tag{$\ast\ast$}
\end{align*}
The left-hand side of ($\ast\ast$) is $H(X, Y)$ (by the reduction in the previous step).
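A minimal sketch, under the same kind of hypothetical dictionary representation (keys `(x, y)`, values $p_{xy}$, example numbers made up), checks ($\ast$) at every point of $S$ and then ($\ast\ast$) after weighting by $p_{xy}$ and summing. Only pairs in $S$ are passed to `math.log2`, mirroring the fact that every logarithm in ($\ast$) is taken of a strictly positive number.

```python
import math

# Hypothetical joint pmf, keys (x, y); the cell (1, 1) lies outside S.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.0}

# Marginal p_x = sum over y of p_xy.
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

# Support S = {(x, y) : p_xy > 0}.
S = [xy for xy, p in joint.items() if p > 0]

# (*): -log2 p_xy = -log2 p_x - log2 p_{y|x} at every (x, y) in S.
for x, y in S:
    p_cond = joint[(x, y)] / px[x]
    lhs = -math.log2(joint[(x, y)])
    rhs = -math.log2(px[x]) - math.log2(p_cond)
    assert math.isclose(lhs, rhs)

# (**): weight (*) by p_xy and sum over S.
lhs_sum = -sum(joint[(x, y)] * math.log2(joint[(x, y)]) for x, y in S)
first = -sum(joint[(x, y)] * math.log2(px[x]) for x, y in S)
second = -sum(joint[(x, y)] * math.log2(joint[(x, y)] / px[x]) for x, y in S)
assert math.isclose(lhs_sum, first + second)
```

The variables `first` and `second` correspond to the two sums on the right of ($\ast\ast$), which the following steps identify as $H(X)$ and $H(Y \mid X)$.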
[guided]
The identity that drives the chain rule is the multiplicative law for joint probability: wherever $p_x > 0$,
\begin{align*}
p_{xy} = \mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x) \cdot \mathbb{P}(Y = y \mid X = x) = p_x \cdot p_{y \mid x}.
\end{align*}
This is just the definition of conditional probability rearranged.
Because entropy is a sum of $-p \log_2 p$ terms, we want to turn the product $p_x \cdot p_{y \mid x}$ into a sum by taking a logarithm. Since both factors are strictly positive on $S$:
\begin{align*}
\log_2 p_{xy} = \log_2 p_x + \log_2 p_{y \mid x},
\end{align*}
or equivalently, after negating:
\begin{align*}
-\log_2 p_{xy} = -\log_2 p_x - \log_2 p_{y \mid x}. \tag{$\ast$}
\end{align*}
We now weight ($\ast$) by the joint probability $p_{xy}$ (the natural weighting for entropy) and sum over all $(x, y) \in S$. Multiplying both sides of an equality by the same number preserves it, and so does summing over a finite set:
\begin{align*}
-\sum_{(x,y) \in S} p_{xy} \log_2 p_{xy}
= -\sum_{(x,y) \in S} p_{xy} \log_2 p_x
- \sum_{(x,y) \in S} p_{xy} \log_2 p_{y \mid x}. \tag{$\ast\ast$}
\end{align*}
The left-hand side is $-\sum_{(x,y) \in A \times B} p_{xy} \log_2 p_{xy} = H(X, Y)$: the summands with $(x, y) \notin S$ are zero by the $0 \log_2 0 := 0$ convention, so restricting to $S$ loses nothing.
The proof now reduces to identifying the two sums on the right as $H(X)$ and $H(Y \mid X)$ respectively.
[/guided]
[/step]
[step:Collapse the first sum to $H(X)$ via marginalisation]
In the first sum on the right of ($\ast\ast$), the factor $\log_2 p_x$ depends only on $x$, so we write the sum over $S$ as an iterated sum (outer over $x$, inner over $y$) and pull $\log_2 p_x$ out of the inner sum:
\begin{align*}
-\sum_{(x,y) \in S} p_{xy} \log_2 p_x
&= -\sum_{x \in A : p_x > 0} \log_2 p_x \sum_{y \in B : p_{xy} > 0} p_{xy} \\
&= -\sum_{x \in A : p_x > 0} \log_2 p_x \sum_{y \in B} p_{xy},
\end{align*}
where the last equality adds back the zero summands $p_{xy} = 0$ (which contribute $0$). By the marginalisation identity $\sum_{y \in B} p_{xy} = p_x$:
\begin{align*}
-\sum_{(x,y) \in S} p_{xy} \log_2 p_x
= -\sum_{x \in A : p_x > 0} p_x \log_2 p_x
= -\sum_{x \in A} p_x \log_2 p_x
= H(X),
\end{align*}
where in the second-to-last equality we again add back zero summands ($p_x \log_2 p_x = 0$ when $p_x = 0$).
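The marginalisation step can be sketched the same way. Here the joint mass function is assumed to be stored row by row as `rows[x][y]` $= p_{xy}$ (a hypothetical layout chosen so that $p_x$ is just a row sum), with one all-zero row so that some $x$ has $p_x = 0$. The sketch compares the first sum of ($\ast\ast$), taken over $S$, with $H(X)$ computed from the marginal alone.

```python
import math

# Hypothetical joint pmf stored row by row: rows[x][y] = p_xy.
# Row x = 2 is all zeros, so p_2 = 0 and x = 2 never appears in S.
rows = {
    0: {0: 0.2, 1: 0.2, 2: 0.0},
    1: {0: 0.1, 1: 0.0, 2: 0.5},
    2: {0: 0.0, 1: 0.0, 2: 0.0},
}

# Marginalisation: p_x = sum over y of p_xy.
px = {x: sum(row.values()) for x, row in rows.items()}

# First sum on the right of (**), taken over S = {(x, y) : p_xy > 0}.
# p_xy > 0 forces p_x > 0, so log2(px[x]) is always defined here.
first_sum = -sum(
    p * math.log2(px[x])
    for x, row in rows.items()
    for p in row.values()
    if p > 0
)

# H(X) computed from the marginal alone, with 0 * log2(0) := 0.
h_x = -sum(p * math.log2(p) for p in px.values() if p > 0)

assert math.isclose(first_sum, h_x)
```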
[/step]
[step:Recognise the second sum as $H(Y \mid X)$ by definition]
The conditional entropy of $Y$ given $X$ is defined by
\begin{align*}
H(Y \mid X) := \sum_{x \in A : p_x > 0} p_x \cdot H(Y \mid X = x)
= -\sum_{x \in A : p_x > 0} p_x \sum_{y \in B : p_{y \mid x} > 0} p_{y \mid x} \log_2 p_{y \mid x}.
\end{align*}
Using $p_x \cdot p_{y \mid x} = p_{xy}$ and noting that $p_{y \mid x} > 0$ iff $p_{xy} > 0$ (for $p_x > 0$):
\begin{align*}
H(Y \mid X)
&= -\sum_{x \in A : p_x > 0} \sum_{y \in B : p_{xy} > 0} p_x \cdot p_{y \mid x} \log_2 p_{y \mid x} \\
&= -\sum_{(x, y) \in S} p_{xy} \log_2 p_{y \mid x}.
\end{align*}
This is exactly the second sum on the right of ($\ast\ast$).
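To illustrate this identification, the sketch below (same hypothetical `rows[x][y]` layout, made-up numbers, with a deterministic row and an all-zero row included) computes $H(Y \mid X)$ directly from its definition as the $p_x$-weighted average of the conditional entropies $H(Y \mid X = x)$, and compares it with the second sum of ($\ast\ast$) over $S$. The helper `entropy` is an illustrative name, not a library function.

```python
import math

# Hypothetical joint pmf stored row by row: rows[x][y] = p_xy.
rows = {
    0: {0: 0.3, 1: 0.1},
    1: {0: 0.0, 1: 0.4},  # Y is determined when X = 1, so H(Y | X = 1) = 0
    2: {0: 0.0, 1: 0.0},  # p_x = 0: excluded from H(Y | X) and from S
    3: {0: 0.1, 1: 0.1},
}


def entropy(probs):
    """Shannon entropy in bits, with the convention 0 * log2(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


px = {x: sum(row.values()) for x, row in rows.items()}

# Definition: H(Y | X) = sum over x with p_x > 0 of p_x * H(Y | X = x),
# where the conditional pmf is p_{y|x} = p_xy / p_x.
h_def = sum(
    px[x] * entropy(p / px[x] for p in row.values())
    for x, row in rows.items()
    if px[x] > 0
)

# Second sum on the right of (**): -sum over S of p_xy * log2 p_{y|x}.
h_S = -sum(
    p * math.log2(p / px[x])
    for x, row in rows.items()
    for p in row.values()
    if p > 0
)

assert math.isclose(h_def, h_S)
```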
[guided]
We recall the definition of the conditional entropy $H(Y \mid X)$. For each $x$ with $p_x > 0$, the conditional distribution of $Y$ given $X = x$ is $(p_{y \mid x})_{y \in B}$, and its entropy is
\begin{align*}
H(Y \mid X = x) = -\sum_{y \in B : p_{y \mid x} > 0} p_{y \mid x} \log_2 p_{y \mid x}.
\end{align*}
Then $H(Y \mid X)$ is the average of these conditional entropies, weighted by $p_x$:
\begin{align*}
H(Y \mid X) := \sum_{x \in A : p_x > 0} p_x \cdot H(Y \mid X = x).
\end{align*}
Expanding and using $p_x p_{y \mid x} = p_{xy}$ (the rearranged multiplicative law):
\begin{align*}
H(Y \mid X)
&= -\sum_{x : p_x > 0} p_x \sum_{y : p_{y \mid x} > 0} p_{y \mid x} \log_2 p_{y \mid x} \\
&= -\sum_{x : p_x > 0} \sum_{y : p_{y \mid x} > 0} (p_x p_{y \mid x}) \log_2 p_{y \mid x} \\
&= -\sum_{(x, y) : p_{xy} > 0} p_{xy} \log_2 p_{y \mid x} \\
&= -\sum_{(x, y) \in S} p_{xy} \log_2 p_{y \mid x},
\end{align*}
where we used: for $x$ with $p_x > 0$, the condition $p_{y \mid x} > 0$ is equivalent to $p_{xy} > 0$ (since $p_{y \mid x} = p_{xy}/p_x$ and $p_x > 0$); and for $x$ with $p_x = 0$, all $(x, y)$ satisfy $p_{xy} = 0$ so are excluded from $S$ regardless.
The resulting expression is exactly the second sum on the right of ($\ast\ast$).
[/guided]
[/step]
[step:Combine the identifications to conclude]
Substituting $H(X)$ and $H(Y \mid X)$ for the two right-hand-side sums in ($\ast\ast$):
\begin{align*}
H(X, Y) = H(X) + H(Y \mid X),
\end{align*}
which is the chain rule. This completes the proof.
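As an end-to-end sanity check (not a substitute for the proof), the following sketch draws random joint mass functions with randomly zeroed cells and confirms that $H(X, Y) - H(X) - H(Y \mid X)$ vanishes up to floating-point rounding. The generator `random_joint`, its `zero_prob` parameter, the seed and the tolerance $10^{-9}$ are all illustrative assumptions.

```python
import math
import random


def entropy(probs):
    """Shannon entropy in bits, with the convention 0 * log2(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def chain_rule_gap(rows):
    """Return H(X, Y) - (H(X) + H(Y | X)) for a joint pmf rows[x][y] = p_xy."""
    px = [sum(row) for row in rows]
    h_xy = entropy(p for row in rows for p in row)
    h_x = entropy(px)
    # H(Y | X) = sum over x with p_x > 0 of p_x * H(Y | X = x).
    h_y_given_x = sum(
        px[x] * entropy(p / px[x] for p in row)
        for x, row in enumerate(rows)
        if px[x] > 0
    )
    return h_xy - (h_x + h_y_given_x)


def random_joint(n_x, n_y, rng, zero_prob=0.3):
    """Hypothetical generator: a random pmf on an n_x-by-n_y grid in which
    each cell is set to zero independently with probability zero_prob."""
    w = [[0.0 if rng.random() < zero_prob else rng.random() for _ in range(n_y)]
         for _ in range(n_x)]
    total = sum(map(sum, w))
    if total == 0:  # every cell was zeroed: fall back to a point mass
        w[0][0], total = 1.0, 1.0
    return [[v / total for v in row] for row in w]


rng = random.Random(0)
for _ in range(1000):
    rows = random_joint(rng.randint(1, 5), rng.randint(1, 5), rng)
    assert abs(chain_rule_gap(rows)) < 1e-9
```

Zeroing cells at random exercises both the $0 \log_2 0 := 0$ convention and the case $p_x = 0$ handled in the first step; the fixed seed makes the check reproducible.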
[/step]