Product Formula for Kolmogorov-Sinai Entropy (Theorem # 6754)
Theorem
Let $(X,\mathcal B_X,\mu,T)$ and $(Y,\mathcal B_Y,\nu,S)$ be measure-preserving systems. Then
\begin{align*}
h_{\mu\otimes\nu}(T\times S)=h_\mu(T)+h_\nu(S).
\end{align*}
Knowledge Status
Analysis
Discussion
States and proves Product Formula for Kolmogorov-Sinai Entropy, a result in advanced ergodic theory focused on entropy, dynamical structure, and related invariants.
Proof
[proofplan]
We first prove exact additivity for rectangle partitions: the Shannon entropy of a product partition is the sum of the two factor entropies, and the same remains true after taking all finite dynamical joins. This gives the lower bound after taking suprema over finite partitions of the two factors. For the upper bound, we show that every finite partition of $X \times Y$ can be approximated in conditional entropy by a finite rectangle partition, using countable generation modulo null sets. The entropy rate of an arbitrary partition is then bounded by the entropy rate of a nearby rectangle partition plus an arbitrarily small error.
[/proofplan]
[step:Define the partition entropy notation used throughout]
All logarithms are natural. If $\mathcal A=\{A_1,\dots,A_m\}$ is a finite measurable partition of a probability space $(Z,\mathcal B_Z,\rho)$, define its Shannon entropy by
\begin{align*}
H_\rho(\mathcal A) := -\sum_{i=1}^m \rho(A_i)\log \rho(A_i),
\end{align*}
with the convention $0\log 0=0$.
If $\mathcal A$ and $\mathcal C$ are finite measurable partitions of $Z$, write $\mathcal A \vee \mathcal C$ for their common refinement and define the conditional entropy
\begin{align*}
H_\rho(\mathcal A \mid \mathcal C) := H_\rho(\mathcal A \vee \mathcal C)-H_\rho(\mathcal C).
\end{align*}
For a measure-preserving map $R: Z \to Z$ and a finite measurable partition $\mathcal A$ of $Z$, define the $n$-block partition
\begin{align*}
\mathcal A_0^{n-1} := \bigvee_{k=0}^{n-1} R^{-k}\mathcal A.
\end{align*}
The entropy rate of $\mathcal A$ is
\begin{align*}
h_\rho(R,\mathcal A) := \lim_{n\to\infty}\frac{1}{n}H_\rho(\mathcal A_0^{n-1}).
\end{align*}
The limit exists by the elementary subadditivity argument. Define $a_n:=H_\rho(\mathcal A_0^{n-1})$ for $n\in\mathbb N$. Measure preservation gives $H_\rho(R^{-n}\mathcal A_0^{m-1})=H_\rho(\mathcal A_0^{m-1})$, and the [entropy inequality](/theorems/6729) $H_\rho(\mathcal E\vee\mathcal F)\le H_\rho(\mathcal E)+H_\rho(\mathcal F)$ gives
\begin{align*}
a_{n+m}=H_\rho(\mathcal A_0^{n+m-1})\le H_\rho(\mathcal A_0^{n-1})+H_\rho(\mathcal A_0^{m-1})=a_n+a_m.
\end{align*}
Let $\alpha:=\inf_{n\in\mathbb N}a_n/n$. For every $N\in\mathbb N$ and every $n=qN+r$ with $q\in\mathbb N\cup\{0\}$ and $0\le r<N$, repeated subadditivity gives $a_n\le q a_N+a_r$, where $a_0:=0$. Hence
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\le \frac{a_N}{N}.
\end{align*}
Taking the infimum over $N$ gives $\limsup_{n\to\infty}a_n/n\le\alpha$, while $\alpha\le a_n/n$ for every $n$ gives $\alpha\le\liminf_{n\to\infty}a_n/n$. Thus the limit exists and equals $\inf_{n\in\mathbb N}a_n/n$. The Kolmogorov-Sinai entropy is
\begin{align*}
h_\rho(R) := \sup_{\mathcal A} h_\rho(R,\mathcal A),
\end{align*}
where the supremum is over all finite measurable partitions $\mathcal A$ of $Z$.
[/step]
[step:Compute the entropy rate of a rectangle partition]
Let $\mathcal P=\{P_1,\dots,P_r\}$ be a finite measurable partition of $X$, and let $\mathcal Q=\{Q_1,\dots,Q_s\}$ be a finite measurable partition of $Y$. Define the rectangle partition
\begin{align*}
\mathcal P \boxtimes \mathcal Q := \{P_i \times Q_j : 1 \le i \le r,\ 1 \le j \le s\}.
\end{align*}
For every $n \in \mathbb N$, the $n$-block join of $\mathcal P \boxtimes \mathcal Q$ under $T \times S$ is
\begin{align*}
\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q) = \left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right)\boxtimes\left(\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q\right).
\end{align*}
Indeed, an atom on the left is determined by choosing, for each $0 \le k \le n-1$, an atom $P_{i_k}$ of $\mathcal P$ and an atom $Q_{j_k}$ of $\mathcal Q$, giving the set
\begin{align*}
\bigcap_{k=0}^{n-1}(T\times S)^{-k}(P_{i_k}\times Q_{j_k}) = \left(\bigcap_{k=0}^{n-1}T^{-k}P_{i_k}\right)\times\left(\bigcap_{k=0}^{n-1}S^{-k}Q_{j_k}\right).
\end{align*}
For finite partitions $\mathcal A=\{A_1,\dots,A_a\}$ of $X$ and $\mathcal C=\{C_1,\dots,C_c\}$ of $Y$, product measure gives
\begin{align*}
(\mu\otimes\nu)(A_i\times C_j)=\mu(A_i)\nu(C_j).
\end{align*}
Therefore
\begin{align*}
H_{\mu\otimes\nu}(\mathcal A\boxtimes\mathcal C)=H_\mu(\mathcal A)+H_\nu(\mathcal C).
\end{align*}
Applying this with $\mathcal A=\bigvee_{k=0}^{n-1}T^{-k}\mathcal P$ and $\mathcal C=\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q$ gives
\begin{align*}
H_{\mu\otimes\nu}\left(\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)\right)=H_\mu\left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right)+H_\nu\left(\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q\right).
\end{align*}
Dividing by $n$ and passing to the limit yields
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)=h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q).
\end{align*}
[guided]
The purpose of this step is to check that no entropy is lost when we pass from two finite partitions in the factors to their rectangle partition in the product.
Let $\mathcal P=\{P_1,\dots,P_r\}$ be a finite measurable partition of $X$, and let $\mathcal Q=\{Q_1,\dots,Q_s\}$ be a finite measurable partition of $Y$. We define the finite partition of $X\times Y$
\begin{align*}
\mathcal P \boxtimes \mathcal Q := \{P_i \times Q_j : 1 \le i \le r,\ 1 \le j \le s\}.
\end{align*}
The first point is that dynamical joins commute with forming rectangles. For $n \in \mathbb N$, an atom of the join
\begin{align*}
\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)
\end{align*}
is obtained by choosing, at each time $k$, one rectangle $P_{i_k}\times Q_{j_k}$. The corresponding atom is
\begin{align*}
\bigcap_{k=0}^{n-1}(T\times S)^{-k}(P_{i_k}\times Q_{j_k}).
\end{align*}
Since $(T\times S)(x,y)=(T x,S y)$, this preimage separates into the $X$-condition and the $Y$-condition:
\begin{align*}
\bigcap_{k=0}^{n-1}(T\times S)^{-k}(P_{i_k}\times Q_{j_k}) = \left(\bigcap_{k=0}^{n-1}T^{-k}P_{i_k}\right)\times\left(\bigcap_{k=0}^{n-1}S^{-k}Q_{j_k}\right).
\end{align*}
Thus the whole $n$-block partition is exactly
\begin{align*}
\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q) = \left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right)\boxtimes\left(\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q\right).
\end{align*}
The second point is additivity of Shannon entropy for product partitions. If $\mathcal A=\{A_1,\dots,A_a\}$ is a finite partition of $X$ and $\mathcal C=\{C_1,\dots,C_c\}$ is a finite partition of $Y$, then product measure gives
\begin{align*}
(\mu\otimes\nu)(A_i\times C_j)=\mu(A_i)\nu(C_j).
\end{align*}
Substituting these probabilities into the definition of entropy gives
\begin{align*}
H_{\mu\otimes\nu}(\mathcal A\boxtimes\mathcal C)= -\sum_{i=1}^a\sum_{j=1}^c \mu(A_i)\nu(C_j)\log(\mu(A_i)\nu(C_j)).
\end{align*}
Using $\log(ab)=\log a+\log b$ when $a,b>0$, and the convention $0\log 0=0$ for zero-measure atoms, this becomes
\begin{align*}
H_{\mu\otimes\nu}(\mathcal A\boxtimes\mathcal C)=H_\mu(\mathcal A)+H_\nu(\mathcal C).
\end{align*}
Now take $\mathcal A=\bigvee_{k=0}^{n-1}T^{-k}\mathcal P$ and $\mathcal C=\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q$. The preceding two identities give
\begin{align*}
H_{\mu\otimes\nu}\left(\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)\right)=H_\mu\left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right)+H_\nu\left(\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q\right).
\end{align*}
After dividing by $n$ and letting $n\to\infty$, the definitions of entropy rate give
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)=h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q).
\end{align*}
[/guided]
[/step]
[step:Take suprema over rectangle partitions to get the lower bound]
For every finite partition $\mathcal P$ of $X$ and every finite partition $\mathcal Q$ of $Y$, the partition $\mathcal P\boxtimes\mathcal Q$ is a finite measurable partition of $X\times Y$. Hence
\begin{align*}
h_{\mu\otimes\nu}(T\times S)\ge h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q).
\end{align*}
Using the rectangle computation,
\begin{align*}
h_{\mu\otimes\nu}(T\times S)\ge h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q).
\end{align*}
Taking the supremum first over $\mathcal P$ and then over $\mathcal Q$ gives
\begin{align*}
h_{\mu\otimes\nu}(T\times S)\ge h_\mu(T)+h_\nu(S).
\end{align*}
[/step]
[step:Approximate arbitrary product-space partitions by rectangle partitions]
Let $\rho:=\mu\otimes\nu$ and let $\mathcal R=\{R_1,\dots,R_m\}$ be a finite measurable partition of $X\times Y$. The approximation below uses the definition of the product $\sigma$-algebra: finite algebras generated by rectangles with arbitrary measurable sides are dense in the product measure algebra. We claim that for every $\varepsilon>0$ there exist finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that
\begin{align*}
H_\rho(\mathcal R\mid \mathcal P\boxtimes\mathcal Q)<\varepsilon.
\end{align*}
[claim:Finite rectangle algebras are dense in the product measure algebra]
For every $E\in\mathcal B_X\otimes\mathcal B_Y$ and every $\delta>0$, there is a set $F$ belonging to a finite algebra generated by measurable rectangles such that
\begin{align*}
\rho(E\triangle F)<\delta.
\end{align*}
[/claim]
[proof]
Let $\mathscr A$ denote the collection of all $E\in\mathcal B_X\otimes\mathcal B_Y$ with the stated approximation property. We prove that $\mathscr A$ is a $\sigma$-algebra containing every measurable rectangle $B\times C$ with $B\in\mathcal B_X$ and $C\in\mathcal B_Y$.
First, $\mathscr A$ is closed under complements because $(E^c)\triangle(F^c)=E\triangle F$. Next let $(E_j)_{j\in\mathbb N}$ be a sequence of pairwise disjoint sets in $\mathscr A$, and let $E:=\bigcup_{j=1}^\infty E_j$. Since $\rho$ is a probability measure, continuity from below gives an integer $N\in\mathbb N$ such that
\begin{align*}
\rho\left(E\setminus\bigcup_{j=1}^N E_j\right)<\frac{\delta}{2}.
\end{align*}
For each $1\le j\le N$, choose a set $F_j$ in a finite rectangle algebra with
\begin{align*}
\rho(E_j\triangle F_j)<\frac{\delta}{2N}.
\end{align*}
The set $F:=\bigcup_{j=1}^N F_j$ belongs to the finite algebra generated by the finitely many rectangles used to construct $F_1,\dots,F_N$, and subadditivity of measure gives
\begin{align*}
\rho(E\triangle F)\le \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)+\sum_{j=1}^N\rho(E_j\triangle F_j)<\delta.
\end{align*}
Thus $\mathscr A$ is closed under countable disjoint unions, and together with closure under complements this makes $\mathscr A$ a $\sigma$-algebra.
Every measurable rectangle $B\times C$ belongs to $\mathscr A$, because it is itself an atom-union in the finite algebra generated by the single rectangle $B\times C$. Since measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, it follows that $\mathscr A=\mathcal B_X\otimes\mathcal B_Y$.
[/proof]
If $m=1$, then $\mathcal R=\{X\times Y\}$ up to null sets, so $H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=0$ for any finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$. Thus the desired approximation is immediate in this case. Assume from now on that $m\ge 2$.
Apply the claim to each atom $R_i$ with accuracy $\eta/m^2$. Taking the common refinement of the finitely many rectangle algebras obtained for $R_1,\dots,R_m$, we get sets $A_1,\dots,A_m$ in one finite rectangle algebra such that
\begin{align*}
\sum_{i=1}^m \rho(R_i\triangle A_i)<\frac{\eta}{m}.
\end{align*}
Refining the finitely many $X$-side and $Y$-side generators of this rectangle algebra into atoms gives finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that every $A_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$.
Define a finite partition $\mathcal C=\{C_1,\dots,C_m\}$ by
\begin{align*}
C_1:=A_1
\end{align*}
and, for $2\le i\le m-1$,
\begin{align*}
C_i:=A_i\setminus\bigcup_{\ell=1}^{i-1}A_\ell,
\end{align*}
with
\begin{align*}
C_m:=(X\times Y)\setminus\bigcup_{\ell=1}^{m-1}C_\ell.
\end{align*}
Each $C_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$, so $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$. We now trace the error explicitly. For $1\le i\le m-1$, the inclusion $C_i\subset A_i$ gives
\begin{align*}
R_i\triangle C_i\subset (R_i\triangle A_i)\cup(A_i\setminus C_i).
\end{align*}
Since $A_i\setminus C_i\subset \bigcup_{\ell<i}A_i\cap A_\ell$ and the sets $R_1,\dots,R_m$ are disjoint, every point of $A_i\cap A_\ell$ lies in $(A_i\triangle R_i)\cup(A_\ell\triangle R_\ell)$. Hence
\begin{align*}
\sum_{i=1}^{m-1}\rho(R_i\triangle C_i)\le m\sum_{i=1}^m\rho(R_i\triangle A_i)<\eta.
\end{align*}
Because both $\mathcal R$ and $\mathcal C$ are partitions, $R_m\triangle C_m\subset\bigcup_{i=1}^{m-1}(R_i\triangle C_i)$, so
\begin{align*}
\sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta.
\end{align*}
Let $L_{\mathcal R}:X\times Y\to\{1,\dots,m\}$ and $L_{\mathcal C}:X\times Y\to\{1,\dots,m\}$ be the label maps defined by
\begin{align*}
L_{\mathcal R}(z)=i \quad \text{if } z\in R_i
\end{align*}
and
\begin{align*}
L_{\mathcal C}(z)=i \quad \text{if } z\in C_i.
\end{align*}
Let $E:=\{z\in X\times Y: L_{\mathcal R}(z)\ne L_{\mathcal C}(z)\}$. Then
\begin{align*}
\rho(E)\le \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta.
\end{align*}
The following elementary finite-label estimate proves the needed entropy bound. Let $I:X\times Y\to\{0,1\}$ be the error-indicator map defined by $I(z)=1$ for $z\in E$ and $I(z)=0$ for $z\notin E$. Given $L_{\mathcal C}$ and $I$, the label $L_{\mathcal R}$ is determined on $E^c$ and has at most $m$ possible values on $E$. Expanding the joint entropy of the finite labels $(L_{\mathcal R},L_{\mathcal C},I)$ by the identity $H(A,B)=H(B)+H(A\mid B)$ gives
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C)\le H_\rho(\{E,E^c\})+\rho(E)\log m.
\end{align*}
Writing the two-atom entropy explicitly gives
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C)\le -\rho(E)\log\rho(E)-(1-\rho(E))\log(1-\rho(E))+\rho(E)\log m.
\end{align*}
Since the right-hand side tends to $0$ as $\rho(E)\to 0$, choose $\eta>0$ small enough that $H_\rho(\mathcal R\mid\mathcal C)<\varepsilon$. Because $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$, there is a finite partition $\mathcal D$ such that $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$. The chain rule gives both identities
\begin{align*}
H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal D\mid\mathcal C)+H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)
\end{align*}
and
\begin{align*}
H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal R\mid\mathcal C)+H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C).
\end{align*}
The finite-sum formula for conditional entropy, applied atom by atom over $\mathcal C$ and using the log-sum inequality, gives $H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C)\le H_\rho(\mathcal D\mid\mathcal C)$. Subtracting this inequality from the two displayed chain-rule identities yields the monotonicity under refinement in the present case:
\begin{align*}
H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C)<\varepsilon.
\end{align*}
[guided]
We want to compare an arbitrary finite partition $\mathcal R$ of $X\times Y$ with a rectangle partition. This approximation uses only the product $\sigma$-algebra: measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, and finite rectangle algebras approximate product-measurable sets in measure.
Let $\rho:=\mu\otimes\nu$ and let $\mathcal R=\{R_1,\dots,R_m\}$ be a finite measurable partition of $X\times Y$. Fix $\varepsilon>0$. If $m=1$, then $\mathcal R=\{X\times Y\}$ up to null sets, and $H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=0$ for any finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$. Hence the claim is immediate in that case, so we assume $m\ge 2$. We first justify the approximation fact used here. Let $\mathscr A$ be the class of all sets $E\in\mathcal B_X\otimes\mathcal B_Y$ such that, for every $\delta>0$, there is a set $F$ in a finite algebra generated by measurable rectangles with
\begin{align*}
\rho(E\triangle F)<\delta.
\end{align*}
Why is $\mathscr A$ a $\sigma$-algebra? Complements are immediate because symmetric difference commutes with complement. For countable disjoint unions, let $(E_j)_{j\in\mathbb N}$ be pairwise disjoint sets in $\mathscr A$, define $E:=\bigcup_{j=1}^\infty E_j$, and fix $\delta>0$. Since $\rho$ is finite, continuity from below gives $N\in\mathbb N$ such that
\begin{align*}
\rho\left(E\setminus\bigcup_{j=1}^N E_j\right)<\frac{\delta}{2}.
\end{align*}
For $1\le j\le N$, choose $F_j$ in a finite rectangle algebra with
\begin{align*}
\rho(E_j\triangle F_j)<\frac{\delta}{2N}.
\end{align*}
Then $F:=\bigcup_{j=1}^N F_j$ lies in the finite algebra generated by all rectangles used for $F_1,\dots,F_N$, and
\begin{align*}
\rho(E\triangle F)\le \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)+\sum_{j=1}^N\rho(E_j\triangle F_j)<\delta.
\end{align*}
Thus $\mathscr A$ is closed under countable disjoint unions, hence is a $\sigma$-algebra. Every measurable rectangle $B\times C$ belongs to $\mathscr A$, because it is already in the finite algebra generated by that rectangle. Since measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, we get $\mathscr A=\mathcal B_X\otimes\mathcal B_Y$.
Now apply this density result to each atom $R_i$ with tolerance $\eta/m^2$, where $\eta>0$ will be chosen later. Taking the common refinement of the finitely many rectangle algebras obtained for $R_1,\dots,R_m$, we may choose sets $A_1,\dots,A_m$ belonging to one finite rectangle algebra such that
\begin{align*}
\sum_{i=1}^m \rho(R_i\triangle A_i)<\frac{\eta}{m}.
\end{align*}
A finite rectangle algebra is generated by finitely many sets in $X$ and finitely many sets in $Y$. Refining those generators into atoms gives finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that each $A_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$.
The sets $A_i$ need not themselves form a partition. We turn them into one without leaving the same rectangle algebra. Define
\begin{align*}
C_1:=A_1.
\end{align*}
For $2\le i\le m-1$, define
\begin{align*}
C_i:=A_i\setminus\bigcup_{\ell=1}^{i-1}A_\ell.
\end{align*}
Finally define
\begin{align*}
C_m:=(X\times Y)\setminus\bigcup_{\ell=1}^{m-1}C_\ell.
\end{align*}
Then $\mathcal C=\{C_1,\dots,C_m\}$ is a finite measurable partition, and every $C_i$ is still a union of atoms of $\mathcal P\boxtimes\mathcal Q$. Hence $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$. We also keep explicit control of the approximation error. For $1\le i\le m-1$, because $C_i\subset A_i$,
\begin{align*}
R_i\triangle C_i\subset (R_i\triangle A_i)\cup(A_i\setminus C_i).
\end{align*}
The only points removed from $A_i$ are those already lying in some earlier $A_\ell$. If $z\in A_i\cap A_\ell$ with $\ell<i$, then $z$ cannot lie in both disjoint atoms $R_i$ and $R_\ell$, so $z\in(A_i\triangle R_i)\cup(A_\ell\triangle R_\ell)$. Thus
\begin{align*}
\sum_{i=1}^{m-1}\rho(R_i\triangle C_i)\le m\sum_{i=1}^m\rho(R_i\triangle A_i)<\eta.
\end{align*}
Finally, since both $\mathcal R$ and $\mathcal C$ are partitions, any point in $R_m\triangle C_m$ must already lie in one of the symmetric differences $R_i\triangle C_i$ for $1\le i\le m-1$. Therefore
\begin{align*}
\sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta.
\end{align*}
Now we convert closeness in measure into small conditional entropy. Define the label maps
\begin{align*}
L_{\mathcal R}:X\times Y\to\{1,\dots,m\}
\end{align*}
and
\begin{align*}
L_{\mathcal C}:X\times Y\to\{1,\dots,m\}
\end{align*}
by $L_{\mathcal R}(z)=i$ for $z\in R_i$ and $L_{\mathcal C}(z)=i$ for $z\in C_i$. Let
\begin{align*}
E:=\{z\in X\times Y: L_{\mathcal R}(z)\ne L_{\mathcal C}(z)\}.
\end{align*}
If the two labels disagree at $z$, then $z$ belongs to at least one symmetric difference $R_i\triangle C_i$, so
\begin{align*}
\rho(E)\le \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta.
\end{align*}
Define the error-indicator map $I:X\times Y\to\{0,1\}$ by $I(z)=1$ for $z\in E$ and $I(z)=0$ for $z\notin E$. Given the $\mathcal C$-label and $I$, the $\mathcal R$-label is determined on $E^c$ and has at most $m$ possible values on $E$. Expanding the joint entropy of the finite labels $(L_{\mathcal R},L_{\mathcal C},I)$ by $H(A,B)=H(B)+H(A\mid B)$ gives the elementary bound
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C)\le H_\rho(\{E,E^c\})+\rho(E)\log m.
\end{align*}
Writing out the two-atom entropy gives
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C)\le -\rho(E)\log\rho(E)-(1-\rho(E))\log(1-\rho(E))+\rho(E)\log m.
\end{align*}
The right-hand side tends to $0$ as $\rho(E)\to 0$. Choose $\eta>0$ small enough that this upper bound is below $\varepsilon$. Since $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$, there is a finite partition $\mathcal D$ such that $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$. The chain rule gives
\begin{align*}
H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal D\mid\mathcal C)+H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)
\end{align*}
and also
\begin{align*}
H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal R\mid\mathcal C)+H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C).
\end{align*}
Why does this imply the desired direction of monotonicity? The finite-sum formula for conditional entropy says that, on each atom of $\mathcal C$, splitting further according to $\mathcal R$ cannot increase the average uncertainty of the $\mathcal D$-label; this is exactly the log-sum inequality. Hence
\begin{align*}
H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C)\le H_\rho(\mathcal D\mid\mathcal C).
\end{align*}
Comparing the two chain-rule identities gives
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C).
\end{align*}
Since $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$, we conclude
\begin{align*}
H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C)<\varepsilon.
\end{align*}
This proves the desired approximation in conditional entropy.
[/guided]
[/step]
[step:Bound the entropy rate of an arbitrary partition by a nearby rectangle partition]
Let $\mathcal R$ be a finite measurable partition of $X\times Y$, and fix $\varepsilon>0$. By the approximation step, choose finite partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)<\varepsilon.
\end{align*}
For each $n\in\mathbb N$, define
\begin{align*}
\mathcal R_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}\mathcal R
\end{align*}
and
\begin{align*}
\mathcal U_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q).
\end{align*}
Using the identity $H(\mathcal A\vee\mathcal C)=H(\mathcal C)+H(\mathcal A\mid\mathcal C)$ with $\mathcal A=\mathcal R_0^{n-1}$ and $\mathcal C=\mathcal U_0^{n-1}$, we obtain
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1}).
\end{align*}
The finite identity $H(\mathcal A\vee\mathcal B\mid\mathcal C)=H(\mathcal A\mid\mathcal C)+H(\mathcal B\mid\mathcal A\vee\mathcal C)$, obtained by expanding the definition of conditional entropy, and the non-negativity of finite conditional entropy give the subadditivity estimate for joins
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1})\le \sum_{k=0}^{n-1}H_{\mu\otimes\nu}((T\times S)^{-k}\mathcal R\mid (T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)).
\end{align*}
Indeed, applying this finite identity to the ordered family $((T\times S)^{-k}\mathcal R)_{k=0}^{n-1}$ and conditioning on $\mathcal U_0^{n-1}$ gives a sum of conditional entropies; in the $k$-th summand, dropping all conditioning information except $(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)$ can only increase conditional entropy, because the difference is another non-negative finite conditional entropy.
Since $T\times S$ preserves $\mu\otimes\nu$, each summand equals
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q).
\end{align*}
Therefore
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+n\varepsilon.
\end{align*}
Divide by $n$ and let $n\to\infty$:
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)+\varepsilon.
\end{align*}
By the rectangle partition computation,
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q)+\varepsilon.
\end{align*}
Since $h_\mu(T,\mathcal P)\le h_\mu(T)$ and $h_\nu(S,\mathcal Q)\le h_\nu(S)$, we get
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S)+\varepsilon.
\end{align*}
[guided]
Fix a finite measurable partition $\mathcal R$ of $X\times Y$ and fix $\varepsilon>0$. The approximation step supplies finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)<\varepsilon.
\end{align*}
For $n\in\mathbb N$, define
\begin{align*}
\mathcal R_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}\mathcal R
\end{align*}
and
\begin{align*}
\mathcal U_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q).
\end{align*}
The identity $H(\mathcal A\vee\mathcal C)=H(\mathcal C)+H(\mathcal A\mid\mathcal C)$ gives
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1}).
\end{align*}
We now bound the conditional term by adding the errors from the $n$ time levels. The finite chain rule and monotonicity of conditional entropy under additional conditioning give
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1})\le \sum_{k=0}^{n-1}H_{\mu\otimes\nu}((T\times S)^{-k}\mathcal R\mid (T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)).
\end{align*}
Since $T\times S$ preserves $\mu\otimes\nu$, each summand equals $H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)$, which is less than $\varepsilon$. Therefore
\begin{align*}
H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+n\varepsilon.
\end{align*}
Dividing by $n$ and letting $n\to\infty$ gives
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)+\varepsilon.
\end{align*}
The rectangle computation gives
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)=h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q).
\end{align*}
Finally, by the definitions of $h_\mu(T)$ and $h_\nu(S)$ as suprema over finite partitions,
\begin{align*}
h_\mu(T,\mathcal P)\le h_\mu(T)
\end{align*}
and
\begin{align*}
h_\nu(S,\mathcal Q)\le h_\nu(S).
\end{align*}
Combining these inequalities yields
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S)+\varepsilon.
\end{align*}
[/guided]
[/step]
[step:Take the remaining supremum to get the upper bound and conclude]
The preceding inequality holds for every finite measurable partition $\mathcal R$ of $X\times Y$ and every $\varepsilon>0$. Letting $\varepsilon\downarrow 0$ gives
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S).
\end{align*}
Taking the supremum over all finite measurable partitions $\mathcal R$ of $X\times Y$ yields
\begin{align*}
h_{\mu\otimes\nu}(T\times S)\le h_\mu(T)+h_\nu(S).
\end{align*}
Together with the lower bound, this proves
\begin{align*}
h_{\mu\otimes\nu}(T\times S)=h_\mu(T)+h_\nu(S).
\end{align*}
[guided]
The preceding step proved that, for every finite measurable partition $\mathcal R$ of $X\times Y$ and every $\varepsilon>0$,
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S)+\varepsilon.
\end{align*}
Because the left-hand side does not depend on $\varepsilon$, letting $\varepsilon\downarrow 0$ gives
\begin{align*}
h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S).
\end{align*}
Now take the supremum over all finite measurable partitions $\mathcal R$ of $X\times Y$. By the definition of Kolmogorov-Sinai entropy,
\begin{align*}
h_{\mu\otimes\nu}(T\times S)=\sup_{\mathcal R}h_{\mu\otimes\nu}(T\times S,\mathcal R),
\end{align*}
so
\begin{align*}
h_{\mu\otimes\nu}(T\times S)\le h_\mu(T)+h_\nu(S).
\end{align*}
The lower-bound step already proved the reverse inequality
\begin{align*}
h_{\mu\otimes\nu}(T\times S)\ge h_\mu(T)+h_\nu(S).
\end{align*}
Combining the two inequalities gives
\begin{align*}
h_{\mu\otimes\nu}(T\times S)=h_\mu(T)+h_\nu(S).
\end{align*}
[/guided]
[/step]
Prerequisites (0/1 completed)
Prerequisites Graph
Interactive dependency map showing how this theorem builds on foundational concepts
Loading dependency graph...
Theorem
Definition
Current
Requires
Definitions & Concepts
Explore Further
Continuity
Definition
Scaling Law for Riemannian Curvature Under Constant Metric Rescaling
Analysis
Dobrushin Lanford Ruelle Existence
Analysis
MERGED — See Theorem 58
Sobolev Spaces
Parabolic Scaling Invariance of Ricci Flow
Analysis
Continuous Functions Integrable
Integration
Extrinsic Euler-Lagrange Equation for Weak Harmonic Maps
Analysis
Periodic Decimals are Rational
Real Numbers
Hilbert-Space Approximation by Finite Rank
Functional Analysis
Analysis
Area