[proofplan]
Fix a finite measurable partition $\mathcal P$ and write the entropy of its first $n$ iterates as a sequence $a_n$. The [entropy inequality](/theorems/6729) for joins, together with measure-preservation of $T$, makes $(a_n)_{n\geq 1}$ subadditive. A direct Fekete-type argument then proves that $a_n/n$ has a limit and identifies it with $\inf_{n\geq 1}a_n/n$. Finally, the stated formula is exactly the definition of Kolmogorov–Sinai entropy as the supremum of these partition entropy rates.
[/proofplan]
[step:Prove the entropy inequality for joins of finite partitions]
Let $\mathcal A=\{A_i:i\in I\}$ and $\mathcal B=\{B_k:k\in K\}$ be finite measurable partitions of $X$, where $I$ and $K$ are finite index sets. Define [real numbers](/page/Real%20Numbers)
\begin{align*}
p_{ik}:=\mu(A_i\cap B_k), \qquad r_i:=\mu(A_i), \qquad s_k:=\mu(B_k).
\end{align*}
The join $\mathcal A\vee\mathcal B$ is the finite measurable partition whose atoms are the nonempty sets $A_i\cap B_k$. We claim that
\begin{align*}
H_\mu(\mathcal A\vee\mathcal B)\leq H_\mu(\mathcal A)+H_\mu(\mathcal B).
\end{align*}
For indices with $p_{ik}>0$, we also have $r_i>0$ and $s_k>0$. We justify the scalar inequality used below: define $\phi:(0,\infty)\to\mathbb R$ by $\phi(u)=u\log u-u+1$. Then $\phi'(u)=\log u$ and $\phi''(u)=1/u>0$, so $\phi$ is convex and attains its minimum at $u=1$, where $\phi(1)=0$. Hence $u\log u-u+1\geq 0$ for every $u>0$. Applying this with $u=p_{ik}/(r_i s_k)$ and weighting by $r_i s_k$, gives
\begin{align*}
p_{ik}\log\frac{p_{ik}}{r_i s_k}-p_{ik}+r_i s_k\geq 0.
\end{align*}
Summing the inequality over all pairs $(i,k)\in I\times K$ for which $p_{ik}>0$ gives
\begin{align*}
\sum_{i,k:p_{ik}>0}p_{ik}\log\frac{p_{ik}}{r_i s_k}\geq \sum_{i,k:p_{ik}>0}p_{ik}-\sum_{i,k:p_{ik}>0}r_i s_k.
\end{align*}
Because $\sum_{i,k:p_{ik}>0}p_{ik}=1$ and $\sum_{i,k:p_{ik}>0}r_i s_k\leq \sum_{i,k}r_i s_k=1$, the right-hand side is nonnegative. Therefore
\begin{align*}
-\sum_{i,k:p_{ik}>0}p_{ik}\log p_{ik}\leq -\sum_{i,k:p_{ik}>0}p_{ik}\log r_i-\sum_{i,k:p_{ik}>0}p_{ik}\log s_k.
\end{align*}
Since $\sum_k p_{ik}=r_i$ and $\sum_i p_{ik}=s_k$, the right-hand side is
\begin{align*}
-\sum_i r_i\log r_i-\sum_k s_k\log s_k=H_\mu(\mathcal A)+H_\mu(\mathcal B).
\end{align*}
Thus $H_\mu(\mathcal A\vee\mathcal B)\leq H_\mu(\mathcal A)+H_\mu(\mathcal B)$.
[/step]
[step:Show the iterate entropy sequence is subadditive]
Fix a finite measurable partition $\mathcal P$ of $X$. For each $n\in\mathbb N$, define
\begin{align*}
\mathcal P_n:=\bigvee_{j=0}^{n-1}T^{-j}\mathcal P, \qquad a_n:=H_\mu(\mathcal P_n).
\end{align*}
Let $m,n\in\mathbb N$. Since
\begin{align*}
\mathcal P_{n+m}=\mathcal P_n\vee T^{-n}\mathcal P_m,
\end{align*}
the entropy inequality for joins gives
\begin{align*}
a_{n+m}=H_\mu(\mathcal P_n\vee T^{-n}\mathcal P_m)\leq H_\mu(\mathcal P_n)+H_\mu(T^{-n}\mathcal P_m).
\end{align*}
Since $T$ is measure-preserving, induction on $n$ shows that $T^n$ is measure-preserving, meaning $\mu((T^n)^{-1}E)=\mu(E)$ for every measurable set $E\subset X$. Hence every atom of $T^{-n}\mathcal P_m$ has the same measure as the corresponding atom of $\mathcal P_m$, so
\begin{align*}
H_\mu(T^{-n}\mathcal P_m)=H_\mu(\mathcal P_m)=a_m.
\end{align*}
Hence
\begin{align*}
a_{n+m}\leq a_n+a_m.
\end{align*}
Thus $(a_n)_{n\geq 1}$ is a subadditive sequence of nonnegative real numbers.
[guided]
The goal is to turn the dynamical expression into a purely numerical subadditivity statement. Fix the finite measurable partition $\mathcal P$ and define, for each $n\in\mathbb N$,
\begin{align*}
\mathcal P_n:=\bigvee_{j=0}^{n-1}T^{-j}\mathcal P, \qquad a_n:=H_\mu(\mathcal P_n).
\end{align*}
The partition $\mathcal P_n$ records which atom of $\mathcal P$ a point visits at times $0,1,\dots,n-1$. To compare an orbit segment of length $n+m$ with two shorter orbit segments, split the join at time $n$:
\begin{align*}
\mathcal P_{n+m}=\left(\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\right)\vee\left(\bigvee_{j=n}^{n+m-1}T^{-j}\mathcal P\right).
\end{align*}
The first factor is $\mathcal P_n$. In the second factor, write $j=n+\ell$ with $0\leq \ell\leq m-1$, so it is $T^{-n}\mathcal P_m$. Hence
\begin{align*}
\mathcal P_{n+m}=\mathcal P_n\vee T^{-n}\mathcal P_m.
\end{align*}
Now apply the entropy inequality for joins to the finite partitions $\mathcal P_n$ and $T^{-n}\mathcal P_m$:
\begin{align*}
H_\mu(\mathcal P_n\vee T^{-n}\mathcal P_m)\leq H_\mu(\mathcal P_n)+H_\mu(T^{-n}\mathcal P_m).
\end{align*}
The measure-preserving hypothesis is used exactly here. Since $T$ is measure-preserving, induction gives that $T^n$ is measure-preserving: for every measurable set $E\subset X$, one has $\mu((T^n)^{-1}E)=\mu(E)$. If $C$ is an atom of $\mathcal P_m$, then $T^{-n}C=(T^n)^{-1}C$ is the corresponding atom of $T^{-n}\mathcal P_m$, and $\mu(T^{-n}C)=\mu(C)$. Therefore the list of atom measures for $T^{-n}\mathcal P_m$ agrees with the list of atom measures for $\mathcal P_m$, so their entropies agree:
\begin{align*}
H_\mu(T^{-n}\mathcal P_m)=H_\mu(\mathcal P_m).
\end{align*}
Combining these identities gives
\begin{align*}
a_{n+m}=H_\mu(\mathcal P_{n+m})\leq H_\mu(\mathcal P_n)+H_\mu(\mathcal P_m)=a_n+a_m.
\end{align*}
Thus the numerical sequence $(a_n)_{n\geq 1}$ is subadditive.
[/guided]
[/step]
[step:Use direct subadditivity estimates to obtain the entropy rate]
Define
\begin{align*}
L:=\inf_{n\geq 1}\frac{a_n}{n}.
\end{align*}
Since $a_n\geq 0$, we have $L\geq 0$. Also $L\leq a_n/n$ for every $n$, so
\begin{align*}
L\leq \liminf_{n\to\infty}\frac{a_n}{n}.
\end{align*}
Fix $m\in\mathbb N$. For each $n\in\mathbb N$, write $n=qm+r$, where $q\in\mathbb N\cup\{0\}$ and $r\in\{0,1,\dots,m-1\}$. If $r=0$, subadditivity gives $a_n\leq q a_m$. If $r>0$, subadditivity gives $a_n\leq q a_m+a_r$. Define
\begin{align*}
C_m:=\max\{a_r:1\leq r\leq m-1\},
\end{align*}
with $C_1:=0$. Then, for all $n\geq 1$,
\begin{align*}
a_n\leq q a_m+C_m.
\end{align*}
Dividing by $n$ and using $q/n\leq 1/m$ gives
\begin{align*}
\frac{a_n}{n}\leq \frac{a_m}{m}+\frac{C_m}{n}.
\end{align*}
Taking the limit superior as $n\to\infty$ yields
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\leq \frac{a_m}{m}.
\end{align*}
Since $m$ was arbitrary,
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\leq L.
\end{align*}
Together with the lower bound by $L$, this proves that
\begin{align*}
\lim_{n\to\infty}\frac{a_n}{n}=L=\inf_{n\geq 1}\frac{a_n}{n}.
\end{align*}
Therefore the limit defining $h_\mu(T,\mathcal P)$ exists.
[/step]
[step:Take the supremum over finite measurable partitions]
For the fixed finite measurable partition $\mathcal P$, define its entropy rate under $T$ by
\begin{align*}
h_\mu(T,\mathcal P):=\lim_{n\to\infty}\frac{1}{n}H_\mu\left(\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\right).
\end{align*}
The previous step proves that this limit exists for every finite measurable partition $\mathcal P$ of $X$. By the definition of Kolmogorov–Sinai entropy as the supremum over finite measurable partition entropy rates, the Kolmogorov–Sinai entropy of $T$ is
\begin{align*}
h_\mu(T):=\sup_{\mathcal P}h_\mu(T,\mathcal P),
\end{align*}
where the supremum is over all finite measurable partitions of $X$. This definition applies because $T$ is assumed measure-preserving and the previous step establishes the existence of each displayed entropy rate. Substituting the expression for $h_\mu(T,\mathcal P)$ gives
\begin{align*}
h_\mu(T)=\sup_{\mathcal P}\lim_{n\to\infty}\frac{1}{n}H_\mu\left(\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\right).
\end{align*}
This is the claimed Kolmogorov–Sinai entropy formula.
[/step]