[guided]The right-hand side of the identity established above is $H(Z \mid Y) - H(Z \mid X, Y)$. Two facts deliver the bound we need.
First, conditional entropy is non-negative: for any random variables $U, V$ on finite alphabets,
\begin{align*}
H(U \mid V) = \sum_v p_V(v) \, H(U \mid V = v) \geq 0,
\end{align*}
since each $H(U \mid V = v) \geq 0$ (entropy of a probability distribution on a finite set is non-negative) and $p_V(v) \geq 0$. Applying this to $(U, V) = (Z, (X, Y))$,
\begin{align*}
H(Z \mid X, Y) \geq 0,
\end{align*}
so dropping this term can only increase the right-hand side:
\begin{align*}
H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z \mid Y).
\end{align*}
Second, [Conditioning Reduces Entropy](/theorems/1652) states: for any discrete random variables $U, V$ on finite alphabets, $H(U \mid V) \leq H(U)$. Applying this with $(U, V) = (Z, Y)$,
\begin{align*}
H(Z \mid Y) \leq H(Z).
\end{align*}
Both hypotheses of theorem 1652 are satisfied: $Z$ and $Y$ are discrete random variables on finite alphabets defined on the same probability space, by the hypotheses of the theorem being proved.
Chaining,
\begin{align*}
H(Z \mid Y) - H(Z \mid X, Y) \leq H(Z \mid Y) \leq H(Z).
\end{align*}[/guided]