[proofplan]
We work directly from the definition of conditional entropy for finite measurable partitions. Fix an atom $B \in \beta$ and compare the conditional distribution of the $\alpha$-atoms given $B$ with the conditional distributions given the finer $\gamma$-atoms lying inside $B$. The coarse conditional distribution is a weighted average of the finer conditional distributions. Since finite Shannon entropy is concave on the probability simplex, the entropy of the coarse distribution is at least the weighted average of the entropies of the finer distributions. Multiplying by $\mu(B)$ and summing this inequality over all atoms of $\beta$ gives the desired monotonicity.
[/proofplan]
custom_env
admin
[step:Group the atoms of $\gamma$ according to the atom of $\beta$ that contains them]
Since $\gamma$ refines $\beta$ modulo $\mu$-null sets, for each $G \in \gamma$ we can choose an atom $r(G) \in \beta$ such that $\mu(G \setminus r(G))=0$. This defines a map $r: \gamma \to \beta$.
For each $B \in \beta$, define
\begin{align*}
\gamma_B := \{G \in \gamma : r(G)=B\}.
\end{align*}
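If $G \in \gamma$ has $\mu(G)>0$, then the atom $r(G) \in \beta$ is unique. Indeed, if distinct atoms $B_1,B_2 \in \beta$ both satisfied $\mu(G \setminus B_i)=0$, then $\mu(G \cap B_i)=\mu(G)$ for $i=1,2$, and the disjointness of $B_1$ and $B_2$ would give $\mu(G) \geq \mu(G \cap B_1)+\mu(G \cap B_2)=2\mu(G)$, which is impossible since $\mu(G)>0$. Zero-measure atoms of $\gamma$ may be assigned arbitrarily; they do not affect any of the finite sums below. Now fix $B \in \beta$ and $G \in \gamma$. If $G \in \gamma_B$, then $\mu(G \setminus B)=0$, so $\mu(A \cap G \cap B)=\mu(A \cap G)$ for every measurable set $A$. If $G \notin \gamma_B$, then $B$ and $r(G)$ are distinct, hence disjoint, atoms of $\beta$, so $G \cap B \subseteq G \setminus r(G)$ and $\mu(A \cap G \cap B)=0$. Since the atoms of $\gamma$ are disjoint and cover $X$, writing $\mu(A \cap B)=\sum_{G \in \gamma}\mu(A \cap G \cap B)$ and applying these two observations with $A=X$ gives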
\begin{align*}
\mu(B) = \sum_{G \in \gamma_B} \mu(G)
\end{align*}
for every $B \in \beta$. Also, for every $A \in \alpha$,
\begin{align*}
\mu(A \cap B) = \sum_{G \in \gamma_B} \mu(A \cap G).
\end{align*}
Atoms of measure zero contribute nothing to conditional entropy, so from now on we form ratios only when the denominator is positive.
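As a minimal sketch, assuming a hypothetical finite space $X=\{0,\dots,7\}$ with point masses (point $7$ is $\mu$-null) and partitions stored as Python frozensets, the following code builds the map $r$ and the families $\gamma_B$ and checks both additivity identities exactly with rational arithmetic. The helper names `mu` and `containing_atom` and the specific partitions are illustrative assumptions, not part of the proof.

```python
from fractions import Fraction

# Hypothetical finite model: point masses on X = {0, ..., 7}; point 7 is mu-null.
mu_point = {x: Fraction(m, 16) for x, m in enumerate([3, 1, 1, 3, 2, 2, 4, 0])}

def mu(S):
    """Measure of a finite set S: the sum of its point masses."""
    return sum((mu_point[x] for x in S), Fraction(0))

alpha = [frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})]
beta = [frozenset({0, 1, 2, 3}), frozenset({4, 5, 6, 7})]
gamma = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5}),
         frozenset({6}), frozenset({7})]

def containing_atom(G):
    """Return the first atom B of beta with mu(G - B) == 0, i.e. one choice of r(G)."""
    for B in beta:
        if mu(G - B) == 0:
            return B
    raise ValueError("gamma does not refine beta modulo null sets")

r = {G: containing_atom(G) for G in gamma}
gamma_B = {B: [G for G in gamma if r[G] == B] for B in beta}

# The null atom {7} is sent to the first beta-atom, although it lies in the second.
for B in beta:
    assert mu(B) == sum((mu(G) for G in gamma_B[B]), Fraction(0))
    for A in alpha:
        assert mu(A & B) == sum((mu(A & G) for G in gamma_B[B]), Fraction(0))
```

Because a null atom is assigned by whichever test succeeds first, reordering `beta` would change $r(\{7\})$ but none of the sums.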
[/step]
custom_env
admin
[step:Write the coarse conditional distribution as an average of the refined distributions]Fix $B \in \beta$ with $\mu(B)>0$. Define the probability vector
\begin{align*}
p^B : \alpha \to [0,1]
\end{align*}
by
\begin{align*}
p^B(A) := \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
For each $G \in \gamma_B$ with $\mu(G)>0$, define the probability vector
\begin{align*}
p^G : \alpha \to [0,1]
\end{align*}
by
\begin{align*}
p^G(A) := \frac{\mu(A \cap G)}{\mu(G)}.
\end{align*}
Finally define the weights
\begin{align*}
\lambda_G := \frac{\mu(G)}{\mu(B)}
\end{align*}
for $G \in \gamma_B$ with $\mu(G)>0$. These weights satisfy $\lambda_G \geq 0$, and since $\mu(B)=\sum_{G \in \gamma_B}\mu(G)$, where null atoms contribute nothing, they sum to one:
\begin{align*}
\sum_{\{G \in \gamma_B : \mu(G)>0\}} \lambda_G = 1.
\end{align*}
For each $A \in \alpha$, the identity $\mu(A \cap B)=\sum_{G \in \gamma_B}\mu(A \cap G)$ from the previous step holds, and its terms with $\mu(G)=0$ vanish because $\mu(A \cap G) \leq \mu(G)$. Dividing by $\mu(B)$ and omitting these null terms gives
\begin{align*}
p^B(A) = \frac{\mu(A \cap B)}{\mu(B)} = \sum_{G \in \gamma_B,\, \mu(G)>0} \frac{\mu(G)}{\mu(B)}\frac{\mu(A \cap G)}{\mu(G)}.
\end{align*}
By the definition of $\lambda_G$ and $p^G$, this becomes
\begin{align*}
p^B(A) = \sum_{G \in \gamma_B,\, \mu(G)>0} \lambda_G p^G(A).
\end{align*}
Thus $p^B$ is the weighted average of the probability vectors $p^G$ over the non-null refined atoms inside $B$.[/step]
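The averaging identity can be checked exactly in a small hypothetical finite model (point masses on $\{0,\dots,7\}$, with point $7$ null). The sketch below hard-codes one admissible choice of the families $\gamma_B$, computes $p^B$, the weights $\lambda_G$ and the mixture $\sum_G \lambda_G p^G$ with `fractions.Fraction`, and asserts equality. All names and data are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical finite model: point masses on X = {0, ..., 7}; point 7 is mu-null.
mu_point = {x: Fraction(m, 16) for x, m in enumerate([3, 1, 1, 3, 2, 2, 4, 0])}
alpha = [frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})]
# One admissible choice of gamma_B for beta = {{0,1,2,3}, {4,5,6,7}}.
gamma_B = {
    frozenset({0, 1, 2, 3}): [frozenset({0, 1}), frozenset({2, 3})],
    frozenset({4, 5, 6, 7}): [frozenset({4, 5}), frozenset({6}), frozenset({7})],
}

def mu(S):
    return sum((mu_point[x] for x in S), Fraction(0))

def cond_dist(C):
    """The probability vector A -> mu(A & C) / mu(C) on alpha (requires mu(C) > 0)."""
    return {A: mu(A & C) / mu(C) for A in alpha}

p_coarse, weights = {}, {}
for B, atoms in gamma_B.items():
    non_null = [G for G in atoms if mu(G) > 0]        # drop null refined atoms
    lam = {G: mu(G) / mu(B) for G in non_null}        # the weights lambda_G
    assert sum(lam.values()) == 1
    mixture = {A: sum(lam[G] * cond_dist(G)[A] for G in non_null) for A in alpha}
    assert mixture == cond_dist(B)                    # p^B = sum_G lambda_G p^G
    p_coarse[B], weights[B] = cond_dist(B), lam
```

Exact rational arithmetic is used so that the identity is tested as an equality rather than up to rounding.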
custom_env
admin
[guided]Fix an atom $B \in \beta$ with $\mu(B)>0$. The goal is to compare the uncertainty of the $\alpha$-name after conditioning on $B$ with the uncertainty after conditioning on the smaller atoms $G \in \gamma_B$.
Define
\begin{align*}
p^B : \alpha \to [0,1]
\end{align*}
by
\begin{align*}
p^B(A) := \frac{\mu(A \cap B)}{\mu(B)}.
\end{align*}
This is the conditional distribution of the atom of $\alpha$ given that the point lies in $B$. For each $G \in \gamma_B$ with $\mu(G)>0$, define
\begin{align*}
p^G : \alpha \to [0,1]
\end{align*}
by
\begin{align*}
p^G(A) := \frac{\mu(A \cap G)}{\mu(G)}.
\end{align*}
This is the corresponding conditional distribution once the finer information that the point lies in $G$ is known.
The weights with which the refined pieces appear inside $B$ are
\begin{align*}
\lambda_G := \frac{\mu(G)}{\mu(B)}.
\end{align*}
They are non-negative. Because the atoms in $\gamma_B$ cover $B$ up to a $\mu$-null set, their total weight is one:
\begin{align*}
\sum_{\{G \in \gamma_B : \mu(G)>0\}} \lambda_G = \frac{1}{\mu(B)}\sum_{\{G \in \gamma_B : \mu(G)>0\}} \mu(G) = 1.
\end{align*}
Now fix $A \in \alpha$. Since $B$ is the union, modulo a null set, of the atoms $G \in \gamma_B$, finite additivity of $\mu$ over disjoint measurable atoms gives
\begin{align*}
\mu(A \cap B)
=
\sum_{G \in \gamma_B} \mu(A \cap G).
\end{align*}
Dividing by $\mu(B)$ and omitting the terms with $\mu(G)=0$, which vanish because $\mu(A \cap G) \leq \mu(G)=0$, yields
\begin{align*}
p^B(A) = \frac{\mu(A \cap B)}{\mu(B)} = \sum_{G \in \gamma_B,\, \mu(G)>0} \frac{\mu(G)}{\mu(B)}\frac{\mu(A \cap G)}{\mu(G)}.
\end{align*}
Substituting the definitions of $\lambda_G$ and $p^G$ gives
\begin{align*}
p^B(A) = \sum_{G \in \gamma_B,\, \mu(G)>0} \lambda_G p^G(A).
\end{align*}
Thus the conditional distribution on the coarse atom $B$ is exactly the weighted average of the conditional distributions on the finer atoms contained in $B$.[/guided]
custom_env
admin
[step:Use concavity of finite entropy on each coarse atom]
Define the finite entropy function
\begin{align*}
h_\alpha : [0,1]^\alpha \to [0,\infty)
\end{align*}
on probability vectors indexed by $\alpha$ by
\begin{align*}
h_\alpha(p) := -\sum_{A \in \alpha} p(A)\log p(A),
\end{align*}
again with $0\log 0=0$. The scalar function $\phi:[0,1]\to[0,\infty)$ given by $\phi(t)=-t\log t$ for $t>0$ and $\phi(0)=0$ is concave, since $\phi''(t)=-1/t<0$ on $(0,1]$ and $\phi$ is continuous at $0$. Therefore $h_\alpha$, being a finite sum of concave coordinate functions, is concave on the probability simplex.
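A quick numerical sanity check of this concavity, not a proof, is sketched below under the assumption of natural logarithms. It samples random pairs of probability vectors on four outcomes together with a random mixing parameter $t$, and counts violations of $h(tp+(1-t)q) \geq t\,h(p)+(1-t)\,h(q)$ beyond a small floating-point tolerance. The function names, the seed and the tolerance `1e-12` are illustrative choices.

```python
import math
import random

def phi(t):
    """phi(t) = -t log t, with the convention phi(0) = 0."""
    return 0.0 if t == 0 else -t * math.log(t)

def entropy(p):
    """Finite Shannon entropy h(p) = sum_i phi(p_i), natural logarithm."""
    return sum(phi(x) for x in p)

def random_prob_vector(k, rng):
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(1000):
    p, q = random_prob_vector(4, rng), random_prob_vector(4, rng)
    t = rng.random()
    mix = [t * a + (1 - t) * b for a, b in zip(p, q)]
    # Concavity: h(t p + (1 - t) q) >= t h(p) + (1 - t) h(q), up to rounding error.
    if entropy(mix) < t * entropy(p) + (1 - t) * entropy(q) - 1e-12:
        violations += 1
print(violations)
```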
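Applying this concavity, in the form of the finite Jensen inequality, to the convex combination $p^B=\sum_{G \in \gamma_B,\, \mu(G)>0}\lambda_G p^G$ from the previous step gives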
\begin{align*}
h_\alpha(p^B) \geq \sum_{G \in \gamma_B,\, \mu(G)>0} \lambda_G h_\alpha(p^G).
\end{align*}
Multiplying by $\mu(B)>0$ and using $\mu(B)\lambda_G=\mu(G)$ gives
\begin{align*}
\mu(B)h_\alpha(p^B) \geq \sum_{G \in \gamma_B,\, \mu(G)>0} \mu(G)h_\alpha(p^G).
\end{align*}
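In a hypothetical finite model (point masses on $\{0,\dots,7\}$, natural logarithms), the per-atom inequality $\mu(B)h_\alpha(p^B) \geq \sum_{G} \mu(G)h_\alpha(p^G)$ can be evaluated directly. The sketch below records the gap between the two sides for each atom $B$; the model and the helper `h_cond` are assumptions made for the example.

```python
import math
from fractions import Fraction

# Hypothetical finite model: point masses on X = {0, ..., 7}; point 7 is mu-null.
mu_point = {x: Fraction(m, 16) for x, m in enumerate([3, 1, 1, 3, 2, 2, 4, 0])}
alpha = [frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})]
gamma_B = {
    frozenset({0, 1, 2, 3}): [frozenset({0, 1}), frozenset({2, 3})],
    frozenset({4, 5, 6, 7}): [frozenset({4, 5}), frozenset({6}), frozenset({7})],
}

def mu(S):
    return sum((mu_point[x] for x in S), Fraction(0))

def h_cond(C):
    """h_alpha(p^C) = -sum_A p^C(A) log p^C(A), with 0 log 0 = 0 (requires mu(C) > 0)."""
    probs = [float(mu(A & C) / mu(C)) for A in alpha]
    return -sum(p * math.log(p) for p in probs if p > 0)

gaps = {}
for B, atoms in gamma_B.items():
    lhs = float(mu(B)) * h_cond(B)
    rhs = sum(float(mu(G)) * h_cond(G) for G in atoms if mu(G) > 0)
    gaps[B] = lhs - rhs  # non-negative by concavity of h_alpha
    assert gaps[B] >= -1e-12
```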
[/step]
custom_env
admin
[step:Sum over the atoms of $\beta$ to obtain the conditional entropy inequality]
Summing the preceding inequality over all atoms $B \in \beta$ with $\mu(B)>0$ gives
\begin{align*}
\sum_{B \in \beta,\, \mu(B)>0} \mu(B)h_\alpha(p^B) \geq \sum_{B \in \beta,\, \mu(B)>0} \sum_{G \in \gamma_B,\, \mu(G)>0} \mu(G)h_\alpha(p^G).
\end{align*}
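By the definition of conditional entropy,
\begin{align*}
H_\mu(\alpha \mid \beta) = -\sum_{B \in \beta,\, \mu(B)>0} \sum_{A \in \alpha} \mu(A \cap B)\log\frac{\mu(A \cap B)}{\mu(B)} = \sum_{B \in \beta,\, \mu(B)>0} \mu(B)h_\alpha(p^B),
\end{align*}
so the left-hand side is exactly $H_\mu(\alpha \mid \beta)$. In the same way, $H_\mu(\alpha \mid \gamma) = \sum_{G \in \gamma,\, \mu(G)>0} \mu(G)h_\alpha(p^G)$, and this equals the double sum on the right-hand side: every non-null atom $G \in \gamma$ belongs to exactly one family $\gamma_B$, namely $\gamma_{r(G)}$, and $\mu(r(G)) \geq \mu(G \cap r(G)) = \mu(G) > 0$, so $B=r(G)$ is included in the outer sum. Hence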
\begin{align*}
H_\mu(\alpha \mid \beta) \geq H_\mu(\alpha \mid \gamma).
\end{align*}
Equivalently,
\begin{align*}
H_\mu(\alpha \mid \gamma) \leq H_\mu(\alpha \mid \beta).
\end{align*}
This proves monotonicity of conditional entropy under refinement.
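As a hedged illustration of the final inequality, the sketch below computes $H_\mu(\alpha \mid \xi)$ directly from the defining double sum for the trivial partition $\{X\}$, for $\beta$ and for $\gamma$ in a hypothetical finite model (point masses on $\{0,\dots,7\}$, natural logarithms), and checks that each refinement can only lower the conditional entropy. The partitions and the helper `cond_entropy` are assumptions made for the example.

```python
import math
from fractions import Fraction

# Hypothetical finite model: point masses on X = {0, ..., 7}; point 7 is mu-null.
mu_point = {x: Fraction(m, 16) for x, m in enumerate([3, 1, 1, 3, 2, 2, 4, 0])}
X = frozenset(mu_point)
alpha = [frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})]
beta = [frozenset({0, 1, 2, 3}), frozenset({4, 5, 6, 7})]
gamma = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5}),
         frozenset({6}), frozenset({7})]

def mu(S):
    return sum((mu_point[x] for x in S), Fraction(0))

def cond_entropy(a, xi):
    """H_mu(a | xi) = -sum_{C, A} mu(A & C) log(mu(A & C) / mu(C)) over non-null C."""
    total = 0.0
    for C in xi:
        if mu(C) == 0:
            continue  # null atoms contribute nothing
        for A in a:
            m = mu(A & C)
            if m > 0:  # convention 0 log 0 = 0
                total -= float(m) * math.log(float(m / mu(C)))
    return total

H_trivial = cond_entropy(alpha, [X])
H_beta = cond_entropy(alpha, beta)
H_gamma = cond_entropy(alpha, gamma)
assert H_gamma <= H_beta <= H_trivial  # {X} <= beta <= gamma as partitions
print(round(H_gamma, 4), round(H_beta, 4), round(H_trivial, 4))
```

The trivial partition is included only to show the chain of refinements; the proof itself concerns the single comparison between $\beta$ and $\gamma$.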
[/step]