[proofplan]
We compare the finite-time joined partitions used to define partition entropy. Refinement is preserved by taking inverse images under $T$ and by finite joins, so $\mathcal{Q}_0^{n-1}$ refines $\mathcal{P}_0^{n-1}$ for every $n\in\mathbb{N}$. We then prove directly that Shannon entropy of a finite measurable partition is monotone under refinement. Applying this finite-time inequality, dividing by $n$, and passing to the defining limits gives the desired inequality.
[/proofplan]
[step:Define the finite joins and entropy convention]
Let $\mathbb{N}:=\{1,2,3,\dots\}$ denote the set of positive integers. Let $\operatorname{Part}_{\mathrm{fin}}(X)$ denote the set of finite measurable partitions of $X$. For a finite measurable partition $\mathcal{R}$ of $X$ and $n\in\mathbb{N}$, define the finite join of partitions
\begin{align*}
\mathcal{R}_0^{n-1}:=\bigvee_{j=0}^{n-1}T^{-j}\mathcal{R},
\end{align*}
whose atoms are the nonempty sets of the form
\begin{align*}
\bigcap_{j=0}^{n-1}T^{-j}R_j
\end{align*}
with $R_j\in\mathcal{R}$ for each $j\in\{0,\dots,n-1\}$.
Define the finite-partition Shannon entropy map
\begin{align*}
H_\mu: \operatorname{Part}_{\mathrm{fin}}(X)\to[0,\infty)
\end{align*}
by
\begin{align*}
H_\mu(\mathcal{R}):=-\sum_{R\in\mathcal{R}}\mu(R)\log \mu(R)
\end{align*}
for each finite measurable partition $\mathcal{R}\in\operatorname{Part}_{\mathrm{fin}}(X)$, with the convention $0\log 0:=0$. Define the partition entropy map associated to $T$ by
\begin{align*}
h_\mu(T,\cdot): \operatorname{Part}_{\mathrm{fin}}(X)\to[0,\infty]
\end{align*}
and, for each $\mathcal{R}\in\operatorname{Part}_{\mathrm{fin}}(X)$, set
\begin{align*}
h_\mu(T,\mathcal{R}):=\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal{R}_0^{n-1}).
\end{align*}
Because $T:X\to X$ is measure-preserving, the standard entropy subadditivity argument for finite joined partitions implies that, for each $\mathcal{R}\in\operatorname{Part}_{\mathrm{fin}}(X)$, the sequence $a_n(\mathcal{R}):=H_\mu(\mathcal{R}_0^{n-1})$ is subadditive. Fekete's lemma therefore gives the existence of the entropy-rate limit defining $h_\mu(T,\mathcal{R})$. The theorem statement uses this entropy-rate notation; in the present argument we use the existence of the defining limits for $h_\mu(T,\mathcal{P})$ and $h_\mu(T,\mathcal{Q})$.
[/step]
[step:Show that refinement is preserved by inverse images and finite joins]
Fix $j\in\{0,\dots,n-1\}$. Since $\mathcal{Q}$ refines $\mathcal{P}$, for every $Q\in\mathcal{Q}$ there exists $P\in\mathcal{P}$ such that $Q\subset P$. Taking inverse images under the measurable map $T^j:X\to X$ gives
\begin{align*}
T^{-j}Q\subset T^{-j}P.
\end{align*}
Thus $T^{-j}\mathcal{Q}$ refines $T^{-j}\mathcal{P}$.
Now let $A$ be an atom of $\mathcal{Q}_0^{n-1}$. Then there exist atoms $Q_0,\dots,Q_{n-1}\in\mathcal{Q}$ such that
\begin{align*}
A=\bigcap_{j=0}^{n-1}T^{-j}Q_j.
\end{align*}
For each $j$, choose $P_j\in\mathcal{P}$ with $Q_j\subset P_j$. Then
\begin{align*}
A\subset \bigcap_{j=0}^{n-1}T^{-j}P_j.
\end{align*}
The set on the right is contained in an atom of $\mathcal{P}_0^{n-1}$. Hence $\mathcal{Q}_0^{n-1}$ refines $\mathcal{P}_0^{n-1}$.
[/step]
[step:Prove entropy increases when a finite partition is refined]
We prove the finite refinement inequality needed below. Let $\mathcal{A}$ and $\mathcal{B}$ be finite measurable partitions of $X$, and suppose $\mathcal{B}$ refines $\mathcal{A}$. For each $A\in\mathcal{A}$, define
\begin{align*}
\mathcal{B}_A:=\{B\in\mathcal{B}:B\subset A\}.
\end{align*}
Because $\mathcal{B}$ refines $\mathcal{A}$ and both families are partitions,
\begin{align*}
\mu(A)=\sum_{B\in\mathcal{B}_A}\mu(B).
\end{align*}
If $\mu(A)=0$, then $\mu(B)=0$ for every $B\in\mathcal{B}_A$, so the contribution of $A$ and of all $B\in\mathcal{B}_A$ to the [entropy inequality](/theorems/6729) is $0$. If $\mu(A)>0$, define $r_B:=\mu(B)/\mu(A)$ for $B\in\mathcal{B}_A$. Then $0\leq r_B\leq 1$ and
\begin{align*}
\sum_{B\in\mathcal{B}_A}r_B=1.
\end{align*}
Therefore
\begin{align*}
-\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B)
=
-\sum_{B\in\mathcal{B}_A}\mu(A)r_B\log\bigl(\mu(A)r_B\bigr).
\end{align*}
Expanding the logarithm gives
\begin{align*}
-\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B)
=
-\mu(A)\log\mu(A)-\mu(A)\sum_{B\in\mathcal{B}_A}r_B\log r_B.
\end{align*}
Since $0\leq r_B\leq 1$, we have $\log r_B\leq 0$ whenever $r_B>0$, while the convention $0\log 0=0$ handles the zero terms. Hence
\begin{align*}
-\mu(A)\sum_{B\in\mathcal{B}_A}r_B\log r_B\geq 0.
\end{align*}
Thus
\begin{align*}
-\mu(A)\log\mu(A)\leq -\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B).
\end{align*}
Summing this inequality over all $A\in\mathcal{A}$ yields
\begin{align*}
H_\mu(\mathcal{A})\leq H_\mu(\mathcal{B}).
\end{align*}
[guided]
We need one basic finite fact: if a partition is made finer, its Shannon entropy cannot decrease. Let $\mathcal{A}$ be the coarser finite measurable partition and $\mathcal{B}$ the finer one. Refinement means every atom $B\in\mathcal{B}$ lies inside a unique atom $A\in\mathcal{A}$, because the atoms of $\mathcal{A}$ are pairwise disjoint and cover $X$.
For each atom $A\in\mathcal{A}$, define the collection of finer atoms lying inside it by
\begin{align*}
\mathcal{B}_A:=\{B\in\mathcal{B}:B\subset A\}.
\end{align*}
Since $\mathcal{B}$ is a partition and the atoms in $\mathcal{B}_A$ exactly decompose $A$, finite additivity of the probability measure $\mu$ gives
\begin{align*}
\mu(A)=\sum_{B\in\mathcal{B}_A}\mu(B).
\end{align*}
Now compare the entropy contribution of $A$ with the total entropy contribution of the smaller pieces inside $A$. If $\mu(A)=0$, then every $B\in\mathcal{B}_A$ also has $\mu(B)=0$, so all corresponding entropy terms are $0$ by the convention $0\log 0=0$.
Assume next that $\mu(A)>0$. Define normalized weights $r_B:=\mu(B)/\mu(A)$ for $B\in\mathcal{B}_A$. These numbers form a finite probability vector:
\begin{align*}
0\leq r_B\leq 1
\end{align*}
and
\begin{align*}
\sum_{B\in\mathcal{B}_A}r_B=1.
\end{align*}
Using $\mu(B)=\mu(A)r_B$, we compute
\begin{align*}
-\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B)
=
-\sum_{B\in\mathcal{B}_A}\mu(A)r_B\log\bigl(\mu(A)r_B\bigr).
\end{align*}
The logarithm splits into $\log\mu(A)+\log r_B$, so
\begin{align*}
-\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B)
=
-\mu(A)\log\mu(A)\sum_{B\in\mathcal{B}_A}r_B-\mu(A)\sum_{B\in\mathcal{B}_A}r_B\log r_B.
\end{align*}
Since the $r_B$ sum to $1$, this becomes
\begin{align*}
-\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B)
=
-\mu(A)\log\mu(A)-\mu(A)\sum_{B\in\mathcal{B}_A}r_B\log r_B.
\end{align*}
The final term is nonnegative, because $0\leq r_B\leq 1$ implies $\log r_B\leq 0$ for $r_B>0$, and the convention $0\log 0=0$ covers zero weights. Therefore
\begin{align*}
-\mu(A)\log\mu(A)\leq -\sum_{B\in\mathcal{B}_A}\mu(B)\log\mu(B).
\end{align*}
This proves that replacing one atom $A$ by the finer atoms inside it cannot decrease entropy. Summing over all atoms $A\in\mathcal{A}$ gives
\begin{align*}
H_\mu(\mathcal{A})\leq H_\mu(\mathcal{B}).
\end{align*}
[/guided]
[/step]
[step:Apply the finite refinement inequality to the entropy rates]
From the previous refinement step, for every $n\in\mathbb{N}$ the partition $\mathcal{Q}_0^{n-1}$ refines $\mathcal{P}_0^{n-1}$. Applying the finite refinement inequality with $\mathcal{A}:=\mathcal{P}_0^{n-1}$ and $\mathcal{B}:=\mathcal{Q}_0^{n-1}$ gives
\begin{align*}
H_\mu(\mathcal{P}_0^{n-1})\leq H_\mu(\mathcal{Q}_0^{n-1}).
\end{align*}
Dividing by the positive integer $n$ preserves the inequality:
\begin{align*}
\frac{1}{n}H_\mu(\mathcal{P}_0^{n-1})\leq \frac{1}{n}H_\mu(\mathcal{Q}_0^{n-1}).
\end{align*}
The defining limits exist by the measure-preserving subadditivity argument stated at the beginning of the proof. Passing to those limits in the preceding inequality gives
\begin{align*}
h_\mu(T,\mathcal{P})\leq h_\mu(T,\mathcal{Q}).
\end{align*}
This is the desired monotonicity under refinement.
[/step]