[proofplan]
The proof splits the integral over $X$ along the finite measurable refinement whose atoms are the sets $A \cap B$ with $A \in \alpha$ and $B \in \beta$. On every atom with $\mu(B)>0$, the conditional information function is constant, so its integral over that atom is the measure of the atom times the constant value. The atoms lying inside $\mu$-null atoms of $\beta$ contribute nothing, and the resulting finite sum is exactly the definition of $H_\mu(\alpha \mid \beta)$.
[/proofplan]
[step:Partition $X$ into the measurable atoms on which conditional information is constant]
For each pair $(A,B) \in \alpha \times \beta$, the set $A \cap B$ belongs to $\mathcal B$ because both $A$ and $B$ are $\mathcal B$-measurable. Define $\gamma := \{A \cap B : A \in \alpha,\ B \in \beta,\ A \cap B \neq \varnothing\}$. Since $\alpha$ and $\beta$ are finite partitions of $X$, $\gamma$ is a finite $\mathcal B$-measurable partition of $X$.
Let $N_\beta := \bigcup\{B \in \beta : \mu(B)=0\}$. As a finite union of atoms of $\beta$, the set $N_\beta$ is $\mathcal B$-measurable. Because these atoms are pairwise disjoint and each has $\mu$-measure zero, finite additivity gives $\mu(N_\beta)=0$.
If $x \in A \cap B$ with $\mu(B)>0$, then by definition $I_\mu(\alpha \mid \beta)(x)$ is the quantity
\begin{align*}
-\log \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
Thus $I_\mu(\alpha \mid \beta)$ is constant on every set $A \cap B$ with $\mu(B)>0$. On $N_\beta$ it is defined to be $0$.
Because $\gamma$ is a finite $\mathcal B$-measurable partition of $X$ and $I_\mu(\alpha \mid \beta)$ is constant on each atom of $\gamma$ (with value $0$ on the atoms contained in $N_\beta$), the function $I_\mu(\alpha \mid \beta): X \to [0,\infty]$ is $\mathcal B$-measurable. This includes the case where the constant value on a refined atom is $+\infty$, which occurs exactly when $\mu(A \cap B)=0$ while $\mu(B)>0$.
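To make the measurability claim concrete, the conditional information function can be written as a finite combination of indicator functions. The coefficient symbol $c_{A,B}$ is notation introduced here for illustration only:
\begin{align*}
I_\mu(\alpha \mid \beta) = \sum_{\{B \in \beta : \mu(B)>0\}} \sum_{A \in \alpha} c_{A,B}\,\mathbf 1_{A \cap B}, \qquad c_{A,B} := -\log \frac{\mu(A \cap B)}{\mu(B)} \in [0,\infty].
\end{align*}
Here the convention $(+\infty)\cdot 0 = 0$ is applied pointwise, so terms with $A \cap B = \varnothing$ vanish identically. No indicator of an atom inside $N_\beta$ appears, which matches the value $0$ assigned there.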
[/step]
[step:Integrate the constant value over each refined atom]
Since $\gamma$ is a finite measurable partition of $X$, the indicator functions of its atoms sum pointwise to $1$ on $X$. Multiplying $I_\mu(\alpha \mid \beta)$ by this sum of indicators and using the additivity of the [Lebesgue integral](/page/Lebesgue%20Integral) of non-negative extended-valued measurable functions over finite sums gives
\begin{align*}
\int_X I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = \sum_{\{B \in \beta : \mu(B)>0\}} \sum_{A \in \alpha} \int_{A \cap B} I_\mu(\alpha \mid \beta)(x)\,d\mu(x).
\end{align*}
The atoms contained in $N_\beta$ are omitted from this sum because the integral over each of them is zero: $\mu(N_\beta)=0$, and moreover $I_\mu(\alpha \mid \beta)$ vanishes on $N_\beta$.
For fixed $A \in \alpha$ and $B \in \beta$ with $\mu(B)>0$, the function $I_\mu(\alpha \mid \beta)$ has the extended constant value $-\log(\mu(A \cap B)/\mu(B))$ on $A \cap B$. Therefore
\begin{align*}
\int_{A \cap B} I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = -\mu(A \cap B)\log \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
If $\mu(A \cap B)=0$, then $I_\mu(\alpha \mid \beta)=+\infty$ on the null set $A \cap B$, so the extended Lebesgue integral over that set is $0$; this agrees with the right-hand side under the convention $0\log 0:=0$.
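The claim that an infinite value on a null set integrates to zero can be checked directly. The following is a short sketch using monotone convergence applied to the truncations $n\,\mathbf 1_N$, where $N$ denotes any $\mu$-null set in $\mathcal B$ (the symbol $N$ is introduced here only for illustration):
\begin{align*}
\int_N (+\infty)\,d\mu = \int_X \lim_{n\to\infty} n\,\mathbf 1_N\,d\mu = \lim_{n\to\infty} n\,\mu(N) = \lim_{n\to\infty} n \cdot 0 = 0.
\end{align*}
Taking $N = A \cap B$ with $\mu(A \cap B)=0$ recovers the case treated above.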
[guided]
The hypotheses give that $\alpha$ and $\beta$ are finite measurable partitions of $(X,\mathcal B,\mu)$. Define the finite measurable refinement $\gamma$ by
\begin{align*}
\gamma := \{A \cap B : A \in \alpha,\ B \in \beta,\ A \cap B \neq \varnothing\}.
\end{align*}
The key point is that the conditional information function is a non-negative extended-valued function that is constant on each atom of $\gamma$; the constant may be $+\infty$, but only on refined atoms of $\mu$-measure zero. The refinement $\gamma$ is the right partition because knowing $x \in A \cap B$ determines both the atom $A$ of $\alpha$ and the atom $B$ of $\beta$.
Before integrating, we check measurability. Each set $A \cap B$ is $\mathcal B$-measurable, and because $\alpha$ and $\beta$ are finite, the refinement $\gamma$ is finite. Since $I_\mu(\alpha \mid \beta)$ is constant, possibly with extended value $+\infty$, on each refined atom with $\mu(B)>0$ and is defined to be $0$ on the union of the null atoms of $\beta$, it is an extended-valued simple function subordinate to the finite partition $\gamma$. Therefore $I_\mu(\alpha \mid \beta): X \to [0,\infty]$ is $\mathcal B$-measurable.
For every $A \in \alpha$ and every $B \in \beta$ with $\mu(B)>0$, the [conditional probability](/page/Conditional%20Probability) is
\begin{align*}
\mu(A \mid B) = \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
Hence every point $x \in A \cap B$ has the same conditional information value:
\begin{align*}
I_\mu(\alpha \mid \beta)(x) = -\log \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
If $\mu(A \cap B)>0$, this value is finite and constant on $A \cap B$, so integrating over $A \cap B$ multiplies the constant by the measure of the set. If $\mu(A \cap B)=0$, the value is $+\infty$ on a null set and its extended Lebesgue integral over that null set is $0$. In both cases, with the convention $0\log 0:=0$, we have
\begin{align*}
\int_{A \cap B} I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = -\mu(A \cap B)\log \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
Now we add these contributions over all atoms of the finite refinement. The atoms $B \in \beta$ with $\mu(B)=0$ have as their union the measurable set
\begin{align*}
N_\beta = \bigcup\{B \in \beta : \mu(B)=0\},
\end{align*}
and since the union is finite, $\mu(N_\beta)=0$. Therefore those atoms contribute zero to the integral. Thus
\begin{align*}
\int_X I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = \sum_{\{B \in \beta : \mu(B)>0\}} \sum_{A \in \alpha} \int_{A \cap B} I_\mu(\alpha \mid \beta)(x)\,d\mu(x).
\end{align*}
Substituting the constant-value integral on each atom gives
\begin{align*}
\int_X I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = -\sum_{\{B \in \beta : \mu(B)>0\}} \sum_{A \in \alpha} \mu(A \cap B)\log \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
This is exactly the finite double sum defining conditional entropy.
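As a sanity check, consider a hypothetical four-point example, chosen here purely for illustration and not taken from the theorem. Let $X=\{1,2,3,4\}$ with $\mu(\{1\})=\tfrac12$, $\mu(\{2\})=\mu(\{3\})=\tfrac14$, $\mu(\{4\})=0$, and let $\alpha=\{\{1,3\},\{2,4\}\}$ and $\beta=\{\{1,2\},\{3,4\}\}$. Both atoms of $\beta$ have positive measure, namely $\tfrac34$ and $\tfrac14$, and the refined atoms are the four singletons. The atomwise contributions are
\begin{align*}
\int_{\{1\}} I_\mu(\alpha \mid \beta)\,d\mu &= -\tfrac12 \log \frac{1/2}{3/4} = \tfrac12 \log \tfrac32, \\
\int_{\{2\}} I_\mu(\alpha \mid \beta)\,d\mu &= -\tfrac14 \log \frac{1/4}{3/4} = \tfrac14 \log 3, \\
\int_{\{3\}} I_\mu(\alpha \mid \beta)\,d\mu &= -\tfrac14 \log \frac{1/4}{1/4} = 0, \\
\int_{\{4\}} I_\mu(\alpha \mid \beta)\,d\mu &= 0 \quad \text{(the value is $+\infty$ on a $\mu$-null atom)}.
\end{align*}
Adding the four contributions gives
\begin{align*}
\int_X I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = \tfrac12 \log \tfrac32 + \tfrac14 \log 3 = \tfrac34 \log 3 - \tfrac12 \log 2,
\end{align*}
which is the value the double sum assigns to $H_\mu(\alpha \mid \beta)$ in this example, with the term for $\{4\}$ read as $0\log 0 = 0$.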
[/guided]
[/step]
[step:Identify the resulting finite sum with conditional entropy]
Combining the atomwise integral formula over all $A \in \alpha$ and all $B \in \beta$ with $\mu(B)>0$, we obtain
\begin{align*}
\int_X I_\mu(\alpha \mid \beta)(x)\,d\mu(x) = -\sum_{\{B \in \beta : \mu(B)>0\}} \sum_{A \in \alpha} \mu(A \cap B)\log \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
By the definition of $H_\mu(\alpha \mid \beta)$, the right-hand side is $H_\mu(\alpha \mid \beta)$. Hence
\begin{align*}
H_\mu(\alpha \mid \beta) = \int_X I_\mu(\alpha \mid \beta)(x)\,d\mu(x).
\end{align*}
This proves the theorem.
[/step]