[proofplan]
We expand the definition of conditional entropy, $H_\mu(\alpha \mid \beta)=H_\mu(\alpha\vee\beta)-H_\mu(\beta)$, directly over the atoms $A\cap B$ of the join $\alpha\vee\beta$. Let $\beta_+=\{B\in\beta: \mu(B)>0\}$, and for $B\in\beta_+$ let $\mu_B$ denote the [conditional probability measure](/theorems/4972) on $(X,\mathcal B)$ defined by $\mu_B(E)=\mu(E\cap B)/\mu(B)$ for $E\in\mathcal B$. For each such $B$, the numbers $\mu_B(A)$ with $A\in\alpha$ form the conditional distribution of the partition $\alpha$ inside $B$. Subtracting the entropy of $\beta$ from the entropy of the joined partition cancels the $\log\mu(B)$ parts and leaves exactly the weighted average $\sum_{B\in\beta_+}\mu(B)H_{\mu_B}(\alpha)$ of these conditional entropies; null atoms of $\beta$ contribute no terms.
[/proofplan]
[step:Expand the conditional entropy over atoms of the joined partition]
Because $\alpha$ and $\beta$ are finite measurable partitions, every set $A\cap B$ with $A\in\alpha$ and $B\in\beta$ is measurable. By definition, the join of partitions is $\alpha\vee\beta:=\{A\cap B:A\in\alpha, B\in\beta, A\cap B\neq\varnothing\}$, a finite measurable partition; distinct pairs $(A,B)$ with $A\cap B\neq\varnothing$ give distinct atoms, since the atoms of $\alpha$ (and of $\beta$) are pairwise disjoint. The entropy of a finite measurable partition $\gamma$ is $H_\mu(\gamma):=-\sum_{C\in\gamma}\mu(C)\log\mu(C)$, with the convention $0\log 0=0$. Pairs with $A\cap B=\varnothing$ have $\mu(A\cap B)=0$ and therefore contribute $0$, so we may sum over all pairs $(A,B)\in\alpha\times\beta$, including every pair with $\mu(A\cap B)=0$. Hence
\begin{align*}
H_\mu(\alpha\vee\beta) = -\sum_{A\in\alpha}\sum_{B\in\beta}\mu(A\cap B)\log\mu(A\cap B).
\end{align*}
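The claim that the sum may run over all pairs, null and empty intersections included, can be checked numerically. The following Python sketch is purely illustrative: it assumes a finite probability space given by point masses, represents partitions as lists of `frozenset`s, uses the natural logarithm, and introduces hypothetical helper names (`mu`, `xlogx`, `entropy`, `join`) that do not appear in the text.

```python
import math

def mu(mass, E):
    """Measure of E under a finite measure given by point masses (hypothetical helper)."""
    return sum(mass[x] for x in E)

def xlogx(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if p == 0 else p * math.log(p)

def entropy(mass, gamma):
    """H_mu(gamma) = -sum over atoms C of mu(C) log mu(C)."""
    return -sum(xlogx(mu(mass, C)) for C in gamma)

def join(alpha, beta):
    """The join alpha v beta: all nonempty intersections A & B."""
    return [A & B for A in alpha for B in beta if A & B]

# Illustrative data (assumed): four points, point 4 is null.
mass = {1: 0.5, 2: 0.25, 3: 0.25, 4: 0.0}
alpha = [frozenset({1, 2}), frozenset({3, 4})]
beta = [frozenset({1, 3}), frozenset({2}), frozenset({4})]

# Sum over every pair (A, B), including empty and null intersections.
pair_sum = -sum(xlogx(mu(mass, A & B)) for A in alpha for B in beta)
# Sum over the nonempty atoms of the join only.
joined = entropy(mass, join(alpha, beta))
print(abs(pair_sum - joined) < 1e-12)  # True
```

Empty and null intersections enter the pair sum only through $0\log 0=0$, so both sums agree.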
The same finite-partition entropy definition gives
\begin{align*}
H_\mu(\beta) = -\sum_{B\in\beta}\mu(B)\log\mu(B).
\end{align*}
By the definition of conditional entropy, $H_\mu(\alpha\mid\beta):=H_\mu(\alpha\vee\beta)-H_\mu(\beta)$, so subtracting the two displays gives
\begin{align*}
H_\mu(\alpha\mid\beta) = -\sum_{A\in\alpha}\sum_{B\in\beta}\mu(A\cap B)\log\mu(A\cap B)+\sum_{B\in\beta}\mu(B)\log\mu(B).
\end{align*}
[/step]
[step:Rewrite the entropy difference atom by atom inside each atom of $\beta$]
Fix $B\in\beta$. Since $\alpha$ is a partition of $X$, the family $\{A\cap B:A\in\alpha\}$ is a finite measurable partition of $B$. Hence finite additivity of $\mu$ gives
\begin{align*}
\sum_{A\in\alpha}\mu(A\cap B)=\mu(B).
\end{align*}
If $\mu(B)=0$, then $\mu(A\cap B)=0$ for every $A\in\alpha$, so all terms involving this atom vanish.
Define the finite set of positive-measure atoms of $\beta$ by
\begin{align*}
\beta_+=\{B\in\beta: \mu(B)>0\}.
\end{align*}
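A minimal Python sketch of the two facts just used, under illustrative assumptions (a finite measure given by point masses, partitions as lists of `frozenset`s, and a hypothetical helper `mu`): inside each atom $B$ the masses $\mu(A\cap B)$ add up to $\mu(B)$, and a null atom of $\beta$ has only null intersections, so excluding it from $\beta_+$ loses no nonzero term.

```python
def mu(mass, E):
    """Measure of E under a finite measure given by point masses (hypothetical helper)."""
    return sum(mass[x] for x in E)

# Illustrative data (assumed): the atom {3, 4} of beta is null.
mass = {1: 0.5, 2: 0.5, 3: 0.0, 4: 0.0}
alpha = [frozenset({1, 3}), frozenset({2, 4})]
beta = [frozenset({1, 2}), frozenset({3, 4})]

for B in beta:
    row = [mu(mass, A & B) for A in alpha]
    # Finite additivity: the sets A & B, A in alpha, partition B.
    assert abs(sum(row) - mu(mass, B)) < 1e-12
    # A null atom of beta has only null intersections.
    if mu(mass, B) == 0:
        assert all(m == 0 for m in row)

beta_plus = [B for B in beta if mu(mass, B) > 0]
print(beta_plus)  # [frozenset({1, 2})]
```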
Now restrict to atoms $B\in\beta_+$. For such $B$ and each $A\in\alpha$ with $\mu(A\cap B)>0$, use the factorisation
\begin{align*}
\mu(A\cap B)=\mu(B)\frac{\mu(A\cap B)}{\mu(B)}
\end{align*}
inside the logarithm. Then
\begin{align*}
\log\mu(A\cap B)=\log\mu(B)+\log\frac{\mu(A\cap B)}{\mu(B)}
\end{align*}
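whenever $\mu(A\cap B)>0$; terms with $\mu(A\cap B)=0$ are $0$ by the convention $0\log 0=0$. Multiplying by $-\mu(A\cap B)$, summing over $A\in\alpha$, and using $\sum_{A\in\alpha}\mu(A\cap B)=\mu(B)$ to rewrite $-\sum_{A\in\alpha}\mu(A\cap B)\log\mu(B)=-\mu(B)\log\mu(B)$, we obtain, for each $B\in\beta_+$,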
\begin{align*}
-\sum_{A\in\alpha}\mu(A\cap B)\log\mu(A\cap B)+\mu(B)\log\mu(B) = -\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
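The per-atom identity can be spot-checked numerically. This Python sketch is illustrative only: the point masses and partitions are assumed data, `mu` and `xlogx` are hypothetical helpers, and the natural logarithm is used (the identity holds for any fixed base). For each $B\in\beta_+$ it compares the left side, built from the joined-entropy terms plus $\mu(B)\log\mu(B)$, with the right side, built from the ratios $\mu(A\cap B)/\mu(B)$.

```python
import math

def mu(mass, E):
    """Measure of E under a finite measure given by point masses (hypothetical helper)."""
    return sum(mass[x] for x in E)

def xlogx(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if p == 0 else p * math.log(p)

# Illustrative data (assumed); point 5 is null.
mass = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4, 5: 0.0}
alpha = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
beta = [frozenset({1, 3}), frozenset({2, 4, 5})]

gaps = []
for B in beta:
    mB = mu(mass, B)
    if mB == 0:
        continue  # only atoms in beta_+ are treated
    masses = [mu(mass, A & B) for A in alpha]
    # Left side: joined-entropy terms for B plus mu(B) log mu(B).
    lhs = -sum(xlogx(m) for m in masses) + xlogx(mB)
    # Right side: terms with mu(A & B) = 0 are taken to be 0.
    rhs = -sum(m * math.log(m / mB) for m in masses if m > 0)
    gaps.append(abs(lhs - rhs))
print(len(gaps), max(gaps) < 1e-12)  # 2 True
```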
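Atoms $B\in\beta$ with $\mu(B)=0$ contribute nothing to either sum in the expression for $H_\mu(\alpha\mid\beta)$ obtained in the previous step, so summing this identity over all atoms $B\in\beta_+$ yields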
\begin{align*}
H_\mu(\alpha\mid\beta) = -\sum_{A\in\alpha}\sum_{B\in\beta_+}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
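As a sanity check on the summed formula, the following Python sketch compares the double sum over $\alpha\times\beta_+$ with the defining difference $H_\mu(\alpha\vee\beta)-H_\mu(\beta)$ on randomly generated finite examples. The random generators, the fixed seed, and all helper names are assumptions made for illustration; roughly 30% of points are given zero mass so that null atoms and null intersections actually occur.

```python
import math
import random

def mu(mass, E):
    """Measure of E under a finite measure given by point masses (hypothetical helper)."""
    return sum(mass[x] for x in E)

def xlogx(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if p == 0 else p * math.log(p)

def entropy(mass, gamma):
    """H_mu(gamma) for a finite partition gamma."""
    return -sum(xlogx(mu(mass, C)) for C in gamma)

def join(alpha, beta):
    """The join alpha v beta: all nonempty intersections A & B."""
    return [A & B for A in alpha for B in beta if A & B]

def random_partition(points, k, rng):
    """Group points by a random label in range(k); empty groups are dropped."""
    groups = {}
    for x in points:
        groups.setdefault(rng.randrange(k), set()).add(x)
    return [frozenset(g) for g in groups.values()]

def random_mass(points, rng):
    """Random probability masses with roughly 30% null points (assumed test data)."""
    w = {x: (rng.random() if rng.random() > 0.3 else 0.0) for x in points}
    w[points[0]] += 1.0  # guarantees positive total mass
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def cond_entropy_atomwise(mass, alpha, beta):
    """Double sum over A in alpha and B in beta_+, skipping null intersections."""
    total = 0.0
    for B in beta:
        mB = mu(mass, B)
        if mB == 0:
            continue
        for A in alpha:
            m = mu(mass, A & B)
            if m > 0:
                total -= m * math.log(m / mB)
    return total

rng = random.Random(0)
points = list(range(12))
worst = 0.0
for _ in range(200):
    mass = random_mass(points, rng)
    alpha = random_partition(points, 4, rng)
    beta = random_partition(points, 5, rng)
    by_definition = entropy(mass, join(alpha, beta)) - entropy(mass, beta)
    worst = max(worst, abs(by_definition - cond_entropy_atomwise(mass, alpha, beta)))
print(worst < 1e-10)  # True
```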
[guided]
The only algebraic point in the proof is that the subtraction of $H_\mu(\beta)$ cancels exactly the $\log\mu(B)$ part of the joined entropy. Fix one atom $B\in\beta$. Since the sets $A\cap B$ for $A\in\alpha$ are disjoint and their union is $B$, finite additivity gives
\begin{align*}
\sum_{A\in\alpha}\mu(A\cap B)=\mu(B).
\end{align*}
If $\mu(B)=0$, then each subset $A\cap B$ has measure $0$, so this atom contributes neither to $H_\mu(\alpha\vee\beta)$ nor to $H_\mu(\beta)$. This is why the final sums may be restricted to positive-measure atoms of $\beta$.
Assume now that $\mu(B)>0$. For each $A\in\alpha$ with $\mu(A\cap B)>0$, we factor the measure of the intersection as
\begin{align*}
\mu(A\cap B)=\mu(B)\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Taking logarithms gives
\begin{align*}
\log\mu(A\cap B)=\log\mu(B)+\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Multiplying by $-\mu(A\cap B)$ and summing over $A\in\alpha$ gives
\begin{align*}
-\sum_{A\in\alpha}\mu(A\cap B)\log\mu(A\cap B) = -\sum_{A\in\alpha}\mu(A\cap B)\log\mu(B)-\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
The first sum on the right simplifies using $\sum_{A\in\alpha}\mu(A\cap B)=\mu(B)$:
\begin{align*}
-\sum_{A\in\alpha}\mu(A\cap B)\log\mu(B) = -\mu(B)\log\mu(B).
\end{align*}
Adding the $+\mu(B)\log\mu(B)$ term coming from subtracting $H_\mu(\beta)$ cancels this contribution. Therefore the net contribution of the fixed atom $B$ to $H_\mu(\alpha\mid\beta)$ is
\begin{align*}
-\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Summing over all atoms $B\in\beta_+$ gives
\begin{align*}
H_\mu(\alpha\mid\beta) = -\sum_{A\in\alpha}\sum_{B\in\beta_+}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Terms with $\mu(A\cap B)=0$ are assigned the value $0$, matching the convention $0\log 0=0$ and avoiding any undefined logarithm.
To connect this cancellation to the weighted conditional entropy formula, define the [conditional probability](/page/Conditional%20Probability) measure on the positive-measure atom $B$ by
\begin{align*}
\mu_B(E)=\frac{\mu(E\cap B)}{\mu(B)}
\end{align*}
for every $E\in\mathcal B$. Then $\mu_B(A)=\mu(A\cap B)/\mu(B)$ for each $A\in\alpha$, so the finite-partition entropy of $\alpha$ with respect to $\mu_B$ is
\begin{align*}
H_{\mu_B}(\alpha)=-\sum_{A\in\alpha}\frac{\mu(A\cap B)}{\mu(B)}\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Multiplying by $\mu(B)$ gives exactly the fixed-atom contribution
\begin{align*}
\mu(B)H_{\mu_B}(\alpha)=-\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Therefore summing over $B\in\beta_+$ proves both displayed identities in the theorem statement.
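A small worked instance of the cancellation may help; the numbers are chosen purely for illustration and are not part of the theorem. Suppose $\mu(B)=\tfrac12$ and $\alpha=\{A_1,A_2\}$ with $\mu(A_1\cap B)=\mu(A_2\cap B)=\tfrac14$, so that $\mu_B(A_1)=\mu_B(A_2)=\tfrac12$. Then
\begin{align*}
-\sum_{A\in\alpha}\mu(A\cap B)\log\mu(A\cap B)+\mu(B)\log\mu(B) &= -2\cdot\tfrac14\log\tfrac14+\tfrac12\log\tfrac12 = \log 2-\tfrac12\log 2 = \tfrac12\log 2,\\
-\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)} &= -2\cdot\tfrac14\log\tfrac12 = \tfrac12\log 2,\\
\mu(B)H_{\mu_B}(\alpha) &= \tfrac12\left(-2\cdot\tfrac12\log\tfrac12\right) = \tfrac12\log 2,
\end{align*}
so all three expressions for the contribution of $B$ agree.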
[/guided]
[/step]
[step:Identify the atomwise expression with the weighted conditional entropy]
Let $B\in\beta$ satisfy $\mu(B)>0$. By the definition of conditioning on a positive-measure atom, the conditional probability measure $\mu_B$ on $(X,\mathcal B)$ is the map
\begin{align*}
\mu_B: \mathcal B \to [0,1], \qquad E \mapsto \frac{\mu(E\cap B)}{\mu(B)}.
\end{align*}
Thus, for each $A\in\alpha$,
\begin{align*}
\mu_B(A)=\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
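A minimal Python sketch of $\mu_B$ as a measure on the whole space, using exact rational arithmetic via `fractions.Fraction` so that equalities can be tested exactly. The point masses, the atom $B$, and the helpers `mu` and `conditional` are hypothetical illustrations: $\mu_B$ is stored as point masses $\mu(\{x\})/\mu(B)$ for $x\in B$ and $0$ elsewhere. The sketch checks that $\mu_B(X)=1$ and that $\mu_B(A)=\mu(A\cap B)/\mu(B)$ for each $A\in\alpha$.

```python
from fractions import Fraction

def mu(mass, E):
    """Measure of E under a finite measure given by point masses (hypothetical helper)."""
    return sum((mass[x] for x in E), Fraction(0))

def conditional(mass, B):
    """Point masses of mu_B, so that mu_B(E) = mu(E & B) / mu(B); requires mu(B) > 0."""
    mB = mu(mass, B)
    return {x: (mass[x] / mB if x in B else Fraction(0)) for x in mass}

# Illustrative data (assumed).
mass = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
alpha = [frozenset({1, 2}), frozenset({3, 4})]
B = frozenset({2, 3})

mass_B = conditional(mass, B)
assert mu(mass_B, frozenset(mass)) == 1  # mu_B is a probability measure on X
assert all(mu(mass_B, A) == mu(mass, A & B) / mu(mass, B) for A in alpha)
print([mu(mass_B, A) for A in alpha])  # [Fraction(2, 5), Fraction(3, 5)]
```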
The entropy of the finite partition $\alpha$ with respect to the probability measure $\mu_B$ is therefore
\begin{align*}
H_{\mu_B}(\alpha)=-\sum_{A\in\alpha}\frac{\mu(A\cap B)}{\mu(B)}\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Multiplying by $\mu(B)$ gives
\begin{align*}
\mu(B)H_{\mu_B}(\alpha)=-\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}.
\end{align*}
Summing over all atoms $B\in\beta_+$ and comparing with the formula obtained in the previous step gives
\begin{align*}
H_\mu(\alpha\mid\beta)=\sum_{B\in\beta_+}\mu(B)H_{\mu_B}(\alpha).
\end{align*}
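The final identity can also be tested end to end. The Python sketch below, offered only as an illustration under assumed random data and a fixed seed, builds each $\mu_B$ as an actual measure on the whole finite space, computes $H_{\mu_B}(\alpha)$ with the same entropy routine used for $\mu$, and compares $\sum_{B\in\beta_+}\mu(B)H_{\mu_B}(\alpha)$ with $H_\mu(\alpha\vee\beta)-H_\mu(\beta)$. All helper names are hypothetical, and the natural logarithm is used.

```python
import math
import random

def mu(mass, E):
    """Measure of E under a finite measure given by point masses (hypothetical helper)."""
    return sum(mass[x] for x in E)

def xlogx(p):
    """p * log(p), with the convention 0 * log(0) = 0."""
    return 0.0 if p == 0 else p * math.log(p)

def entropy(mass, gamma):
    """H_mu(gamma) for a finite partition gamma."""
    return -sum(xlogx(mu(mass, C)) for C in gamma)

def join(alpha, beta):
    """The join alpha v beta: all nonempty intersections A & B."""
    return [A & B for A in alpha for B in beta if A & B]

def conditional(mass, B):
    """Point masses of mu_B, so that mu_B(E) = mu(E & B) / mu(B); assumes mu(B) > 0."""
    mB = mu(mass, B)
    return {x: (mass[x] / mB if x in B else 0.0) for x in mass}

def weighted_cond_entropy(mass, alpha, beta):
    """Sum over B in beta_+ of mu(B) * H_{mu_B}(alpha)."""
    return sum(mu(mass, B) * entropy(conditional(mass, B), alpha)
               for B in beta if mu(mass, B) > 0)

def random_partition(points, k, rng):
    """Group points by a random label in range(k); empty groups are dropped."""
    groups = {}
    for x in points:
        groups.setdefault(rng.randrange(k), set()).add(x)
    return [frozenset(g) for g in groups.values()]

rng = random.Random(1)
points = list(range(10))
worst = 0.0
for _ in range(200):
    # Random probability masses with some null points (assumed test data).
    w = {x: (rng.random() if rng.random() > 0.3 else 0.0) for x in points}
    w[0] += 1.0  # guarantees positive total mass
    total = sum(w.values())
    mass = {x: v / total for x, v in w.items()}
    alpha = random_partition(points, 4, rng)
    beta = random_partition(points, 5, rng)
    by_definition = entropy(mass, join(alpha, beta)) - entropy(mass, beta)
    worst = max(worst, abs(by_definition - weighted_cond_entropy(mass, alpha, beta)))
print(worst < 1e-10)  # True
```

Building $\mu_B$ as a measure on all of $X$, rather than only on $B$, mirrors the definition $\mu_B:\mathcal B\to[0,1]$ used in the text.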
Together with the double-sum identity already proved, this is the desired conditional entropy formula.
[/step]