Androma — The Home of Mathematics on the Internet

custom_env admin

[step:Define the partition entropy notation used throughout] All logarithms are natural. If $\mathcal A=\{A_1,\dots,A_m\}$ is a finite measurable partition of a probability space $(Z,\mathcal B_Z,\rho)$, define its Shannon entropy by \begin{align*} H_\rho(\mathcal A) := -\sum_{i=1}^m \rho(A_i)\log \rho(A_i), \end{align*} with the convention $0\log 0=0$. If $\mathcal A$ and $\mathcal C$ are finite measurable partitions of $Z$, write $\mathcal A \vee \mathcal C$ for their common refinement and define the conditional entropy \begin{align*} H_\rho(\mathcal A \mid \mathcal C) := H_\rho(\mathcal A \vee \mathcal C)-H_\rho(\mathcal C). \end{align*} For a measure-preserving map $R: Z \to Z$ and a finite measurable partition $\mathcal A$ of $Z$, define the $n$-block partition \begin{align*} \mathcal A_0^{n-1} := \bigvee_{k=0}^{n-1} R^{-k}\mathcal A. \end{align*} The entropy rate of $\mathcal A$ is \begin{align*} h_\rho(R,\mathcal A) := \lim_{n\to\infty}\frac{1}{n}H_\rho(\mathcal A_0^{n-1}). \end{align*} The limit exists by the elementary subadditivity argument. Define $a_n:=H_\rho(\mathcal A_0^{n-1})$ for $n\in\mathbb N$. Measure preservation gives $H_\rho(R^{-n}\mathcal A_0^{m-1})=H_\rho(\mathcal A_0^{m-1})$, and the [entropy inequality](/theorems/6729) $H_\rho(\mathcal E\vee\mathcal F)\le H_\rho(\mathcal E)+H_\rho(\mathcal F)$ gives \begin{align*} a_{n+m}=H_\rho(\mathcal A_0^{n+m-1})\le H_\rho(\mathcal A_0^{n-1})+H_\rho(\mathcal A_0^{m-1})=a_n+a_m. \end{align*} Let $\alpha:=\inf_{n\in\mathbb N}a_n/n$. For every $N\in\mathbb N$ and every $n=qN+r$ with $q\in\mathbb N\cup\{0\}$ and $0\le r<N$, repeated subadditivity gives $a_n\le q a_N+a_r$, where $a_0:=0$. Hence \begin{align*} \limsup_{n\to\infty}\frac{a_n}{n}\le \frac{a_N}{N}. \end{align*} Taking the infimum over $N$ gives $\limsup_{n\to\infty}a_n/n\le\alpha$, while $\alpha\le a_n/n$ for every $n$ gives $\alpha\le\liminf_{n\to\infty}a_n/n$. Thus the limit exists and equals $\inf_{n\in\mathbb N}a_n/n$. The Kolmogorov-Sinai entropy is \begin{align*} h_\rho(R) := \sup_{\mathcal A} h_\rho(R,\mathcal A), \end{align*} where the supremum is over all finite measurable partitions $\mathcal A$ of $Z$. [/step]

custom_env admin

[guided]The purpose of this step is to check that no entropy is lost when we pass from two finite partitions in the factors to their rectangle partition in the product. Let $\mathcal P=\{P_1,\dots,P_r\}$ be a finite measurable partition of $X$, and let $\mathcal Q=\{Q_1,\dots,Q_s\}$ be a finite measurable partition of $Y$. We define the finite partition of $X\times Y$ \begin{align*} \mathcal P \boxtimes \mathcal Q := \{P_i \times Q_j : 1 \le i \le r,\ 1 \le j \le s\}. \end{align*} The first point is that dynamical joins commute with forming rectangles. For $n \in \mathbb N$, an atom of the join \begin{align*} \bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q) \end{align*} is obtained by choosing, at each time $k$, one rectangle $P_{i_k}\times Q_{j_k}$. The corresponding atom is \begin{align*} \bigcap_{k=0}^{n-1}(T\times S)^{-k}(P_{i_k}\times Q_{j_k}). \end{align*} Since $(T\times S)(x,y)=(T x,S y)$, this preimage separates into the $X$-condition and the $Y$-condition: \begin{align*} \bigcap_{k=0}^{n-1}(T\times S)^{-k}(P_{i_k}\times Q_{j_k}) = \left(\bigcap_{k=0}^{n-1}T^{-k}P_{i_k}\right)\times\left(\bigcap_{k=0}^{n-1}S^{-k}Q_{j_k}\right). \end{align*} Thus the whole $n$-block partition is exactly \begin{align*} \bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q) = \left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right)\boxtimes\left(\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q\right). \end{align*} The second point is additivity of Shannon entropy for product partitions. If $\mathcal A=\{A_1,\dots,A_a\}$ is a finite partition of $X$ and $\mathcal C=\{C_1,\dots,C_c\}$ is a finite partition of $Y$, then product measure gives \begin{align*} (\mu\otimes\nu)(A_i\times C_j)=\mu(A_i)\nu(C_j). \end{align*} Substituting these probabilities into the definition of entropy gives \begin{align*} H_{\mu\otimes\nu}(\mathcal A\boxtimes\mathcal C)= -\sum_{i=1}^a\sum_{j=1}^c \mu(A_i)\nu(C_j)\log(\mu(A_i)\nu(C_j)). \end{align*} Using $\log(ab)=\log a+\log b$ when $a,b>0$, and the convention $0\log 0=0$ for zero-measure atoms, this becomes \begin{align*} H_{\mu\otimes\nu}(\mathcal A\boxtimes\mathcal C)=H_\mu(\mathcal A)+H_\nu(\mathcal C). \end{align*} Now take $\mathcal A=\bigvee_{k=0}^{n-1}T^{-k}\mathcal P$ and $\mathcal C=\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q$. The preceding two identities give \begin{align*} H_{\mu\otimes\nu}\left(\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)\right)=H_\mu\left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right)+H_\nu\left(\bigvee_{k=0}^{n-1}S^{-k}\mathcal Q\right). \end{align*} After dividing by $n$ and letting $n\to\infty$, the definitions of entropy rate give \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)=h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q). \end{align*}[/guided]

custom_env admin

[step:Approximate arbitrary product-space partitions by rectangle partitions]Let $\rho:=\mu\otimes\nu$ and let $\mathcal R=\{R_1,\dots,R_m\}$ be a finite measurable partition of $X\times Y$. The approximation below uses the definition of the product $\sigma$-algebra: finite algebras generated by rectangles with arbitrary measurable sides are dense in the product measure algebra. We claim that for every $\varepsilon>0$ there exist finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that \begin{align*} H_\rho(\mathcal R\mid \mathcal P\boxtimes\mathcal Q)<\varepsilon. \end{align*} [claim:Finite rectangle algebras are dense in the product measure algebra] For every $E\in\mathcal B_X\otimes\mathcal B_Y$ and every $\delta>0$, there is a set $F$ belonging to a finite algebra generated by measurable rectangles such that \begin{align*} \rho(E\triangle F)<\delta. \end{align*} [/claim] [proof] Let $\mathscr A$ denote the collection of all $E\in\mathcal B_X\otimes\mathcal B_Y$ with the stated approximation property. We prove that $\mathscr A$ is a $\sigma$-algebra containing every measurable rectangle $B\times C$ with $B\in\mathcal B_X$ and $C\in\mathcal B_Y$. First, $\mathscr A$ is closed under complements because $(E^c)\triangle(F^c)=E\triangle F$. Next let $(E_j)_{j\in\mathbb N}$ be a sequence of pairwise disjoint sets in $\mathscr A$, and let $E:=\bigcup_{j=1}^\infty E_j$. Since $\rho$ is a probability measure, continuity from below gives an integer $N\in\mathbb N$ such that \begin{align*} \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)<\frac{\delta}{2}. \end{align*} For each $1\le j\le N$, choose a set $F_j$ in a finite rectangle algebra with \begin{align*} \rho(E_j\triangle F_j)<\frac{\delta}{2N}. \end{align*} The set $F:=\bigcup_{j=1}^N F_j$ belongs to the finite algebra generated by the finitely many rectangles used to construct $F_1,\dots,F_N$, and subadditivity of measure gives \begin{align*} \rho(E\triangle F)\le \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)+\sum_{j=1}^N\rho(E_j\triangle F_j)<\delta. \end{align*} Thus $\mathscr A$ is closed under countable disjoint unions, and together with closure under complements this makes $\mathscr A$ a $\sigma$-algebra. Every measurable rectangle $B\times C$ belongs to $\mathscr A$, because it is itself an atom-union in the finite algebra generated by the single rectangle $B\times C$. Since measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, it follows that $\mathscr A=\mathcal B_X\otimes\mathcal B_Y$. [/proof] If $m=1$, then $\mathcal R=\{X\times Y\}$ up to null sets, so $H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=0$ for any finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$. Thus the desired approximation is immediate in this case. Assume from now on that $m\ge 2$. Apply the claim to each atom $R_i$ with accuracy $\eta/m^2$. Taking the common refinement of the finitely many rectangle algebras obtained for $R_1,\dots,R_m$, we get sets $A_1,\dots,A_m$ in one finite rectangle algebra such that \begin{align*} \sum_{i=1}^m \rho(R_i\triangle A_i)<\frac{\eta}{m}. \end{align*} Refining the finitely many $X$-side and $Y$-side generators of this rectangle algebra into atoms gives finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that every $A_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$. Define a finite partition $\mathcal C=\{C_1,\dots,C_m\}$ by \begin{align*} C_1:=A_1 \end{align*} and, for $2\le i\le m-1$, \begin{align*} C_i:=A_i\setminus\bigcup_{\ell=1}^{i-1}A_\ell, \end{align*} with \begin{align*} C_m:=(X\times Y)\setminus\bigcup_{\ell=1}^{m-1}C_\ell. \end{align*} Each $C_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$, so $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$. We now trace the error explicitly. For $1\le i\le m-1$, the inclusion $C_i\subset A_i$ gives \begin{align*} R_i\triangle C_i\subset (R_i\triangle A_i)\cup(A_i\setminus C_i). \end{align*} Since $A_i\setminus C_i\subset \bigcup_{\ell<i}A_i\cap A_\ell$ and the sets $R_1,\dots,R_m$ are disjoint, every point of $A_i\cap A_\ell$ lies in $(A_i\triangle R_i)\cup(A_\ell\triangle R_\ell)$. Hence \begin{align*} \sum_{i=1}^{m-1}\rho(R_i\triangle C_i)\le m\sum_{i=1}^m\rho(R_i\triangle A_i)<\eta. \end{align*} Because both $\mathcal R$ and $\mathcal C$ are partitions, $R_m\triangle C_m\subset\bigcup_{i=1}^{m-1}(R_i\triangle C_i)$, so \begin{align*} \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta. \end{align*} Let $L_{\mathcal R}:X\times Y\to\{1,\dots,m\}$ and $L_{\mathcal C}:X\times Y\to\{1,\dots,m\}$ be the label maps defined by \begin{align*} L_{\mathcal R}(z)=i \quad \text{if } z\in R_i \end{align*} and \begin{align*} L_{\mathcal C}(z)=i \quad \text{if } z\in C_i. \end{align*} Let $E:=\{z\in X\times Y: L_{\mathcal R}(z)\ne L_{\mathcal C}(z)\}$. Then \begin{align*} \rho(E)\le \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta. \end{align*} The following elementary finite-label estimate proves the needed entropy bound. Let $I:X\times Y\to\{0,1\}$ be the error-indicator map defined by $I(z)=1$ for $z\in E$ and $I(z)=0$ for $z\notin E$. Given $L_{\mathcal C}$ and $I$, the label $L_{\mathcal R}$ is determined on $E^c$ and has at most $m$ possible values on $E$. Expanding the joint entropy of the finite labels $(L_{\mathcal R},L_{\mathcal C},I)$ by the identity $H(A,B)=H(B)+H(A\mid B)$ gives \begin{align*} H_\rho(\mathcal R\mid\mathcal C)\le H_\rho(\{E,E^c\})+\rho(E)\log m. \end{align*} Writing the two-atom entropy explicitly gives \begin{align*} H_\rho(\mathcal R\mid\mathcal C)\le -\rho(E)\log\rho(E)-(1-\rho(E))\log(1-\rho(E))+\rho(E)\log m. \end{align*} Since the right-hand side tends to $0$ as $\rho(E)\to 0$, choose $\eta>0$ small enough that $H_\rho(\mathcal R\mid\mathcal C)<\varepsilon$. Because $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$, there is a finite partition $\mathcal D$ such that $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$. The chain rule gives both identities \begin{align*} H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal D\mid\mathcal C)+H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D) \end{align*} and \begin{align*} H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal R\mid\mathcal C)+H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C). \end{align*} The finite-sum formula for conditional entropy, applied atom by atom over $\mathcal C$ and using the log-sum inequality, gives $H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C)\le H_\rho(\mathcal D\mid\mathcal C)$. Subtracting this inequality from the two displayed chain-rule identities yields the monotonicity under refinement in the present case: \begin{align*} H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C)<\varepsilon. \end{align*}[/step]

custom_env admin

[guided]We want to compare an arbitrary finite partition $\mathcal R$ of $X\times Y$ with a rectangle partition. This approximation uses only the product $\sigma$-algebra: measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, and finite rectangle algebras approximate product-measurable sets in measure. Let $\rho:=\mu\otimes\nu$ and let $\mathcal R=\{R_1,\dots,R_m\}$ be a finite measurable partition of $X\times Y$. Fix $\varepsilon>0$. If $m=1$, then $\mathcal R=\{X\times Y\}$ up to null sets, and $H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=0$ for any finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$. Hence the claim is immediate in that case, so we assume $m\ge 2$. We first justify the approximation fact used here. Let $\mathscr A$ be the class of all sets $E\in\mathcal B_X\otimes\mathcal B_Y$ such that, for every $\delta>0$, there is a set $F$ in a finite algebra generated by measurable rectangles with \begin{align*} \rho(E\triangle F)<\delta. \end{align*} Why is $\mathscr A$ a $\sigma$-algebra? Complements are immediate because symmetric difference commutes with complement. For countable disjoint unions, let $(E_j)_{j\in\mathbb N}$ be pairwise disjoint sets in $\mathscr A$, define $E:=\bigcup_{j=1}^\infty E_j$, and fix $\delta>0$. Since $\rho$ is finite, continuity from below gives $N\in\mathbb N$ such that \begin{align*} \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)<\frac{\delta}{2}. \end{align*} For $1\le j\le N$, choose $F_j$ in a finite rectangle algebra with \begin{align*} \rho(E_j\triangle F_j)<\frac{\delta}{2N}. \end{align*} Then $F:=\bigcup_{j=1}^N F_j$ lies in the finite algebra generated by all rectangles used for $F_1,\dots,F_N$, and \begin{align*} \rho(E\triangle F)\le \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)+\sum_{j=1}^N\rho(E_j\triangle F_j)<\delta. \end{align*} Thus $\mathscr A$ is closed under countable disjoint unions, hence is a $\sigma$-algebra. Every measurable rectangle $B\times C$ belongs to $\mathscr A$, because it is already in the finite algebra generated by that rectangle. Since measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, we get $\mathscr A=\mathcal B_X\otimes\mathcal B_Y$. Now apply this density result to each atom $R_i$ with tolerance $\eta/m^2$, where $\eta>0$ will be chosen later. Taking the common refinement of the finitely many rectangle algebras obtained for $R_1,\dots,R_m$, we may choose sets $A_1,\dots,A_m$ belonging to one finite rectangle algebra such that \begin{align*} \sum_{i=1}^m \rho(R_i\triangle A_i)<\frac{\eta}{m}. \end{align*} A finite rectangle algebra is generated by finitely many sets in $X$ and finitely many sets in $Y$. Refining those generators into atoms gives finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that each $A_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$. The sets $A_i$ need not themselves form a partition. We turn them into one without leaving the same rectangle algebra. Define \begin{align*} C_1:=A_1. \end{align*} For $2\le i\le m-1$, define \begin{align*} C_i:=A_i\setminus\bigcup_{\ell=1}^{i-1}A_\ell. \end{align*} Finally define \begin{align*} C_m:=(X\times Y)\setminus\bigcup_{\ell=1}^{m-1}C_\ell. \end{align*} Then $\mathcal C=\{C_1,\dots,C_m\}$ is a finite measurable partition, and every $C_i$ is still a union of atoms of $\mathcal P\boxtimes\mathcal Q$. Hence $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$. We also keep explicit control of the approximation error. For $1\le i\le m-1$, because $C_i\subset A_i$, \begin{align*} R_i\triangle C_i\subset (R_i\triangle A_i)\cup(A_i\setminus C_i). \end{align*} The only points removed from $A_i$ are those already lying in some earlier $A_\ell$. If $z\in A_i\cap A_\ell$ with $\ell<i$, then $z$ cannot lie in both disjoint atoms $R_i$ and $R_\ell$, so $z\in(A_i\triangle R_i)\cup(A_\ell\triangle R_\ell)$. Thus \begin{align*} \sum_{i=1}^{m-1}\rho(R_i\triangle C_i)\le m\sum_{i=1}^m\rho(R_i\triangle A_i)<\eta. \end{align*} Finally, since both $\mathcal R$ and $\mathcal C$ are partitions, any point in $R_m\triangle C_m$ must already lie in one of the symmetric differences $R_i\triangle C_i$ for $1\le i\le m-1$. Therefore \begin{align*} \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta. \end{align*} Now we convert closeness in measure into small conditional entropy. Define the label maps \begin{align*} L_{\mathcal R}:X\times Y\to\{1,\dots,m\} \end{align*} and \begin{align*} L_{\mathcal C}:X\times Y\to\{1,\dots,m\} \end{align*} by $L_{\mathcal R}(z)=i$ for $z\in R_i$ and $L_{\mathcal C}(z)=i$ for $z\in C_i$. Let \begin{align*} E:=\{z\in X\times Y: L_{\mathcal R}(z)\ne L_{\mathcal C}(z)\}. \end{align*} If the two labels disagree at $z$, then $z$ belongs to at least one symmetric difference $R_i\triangle C_i$, so \begin{align*} \rho(E)\le \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta. \end{align*} Define the error-indicator map $I:X\times Y\to\{0,1\}$ by $I(z)=1$ for $z\in E$ and $I(z)=0$ for $z\notin E$. Given the $\mathcal C$-label and $I$, the $\mathcal R$-label is determined on $E^c$ and has at most $m$ possible values on $E$. Expanding the joint entropy of the finite labels $(L_{\mathcal R},L_{\mathcal C},I)$ by $H(A,B)=H(B)+H(A\mid B)$ gives the elementary bound \begin{align*} H_\rho(\mathcal R\mid\mathcal C)\le H_\rho(\{E,E^c\})+\rho(E)\log m. \end{align*} Writing out the two-atom entropy gives \begin{align*} H_\rho(\mathcal R\mid\mathcal C)\le -\rho(E)\log\rho(E)-(1-\rho(E))\log(1-\rho(E))+\rho(E)\log m. \end{align*} The right-hand side tends to $0$ as $\rho(E)\to 0$. Choose $\eta>0$ small enough that this upper bound is below $\varepsilon$. Since $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$, there is a finite partition $\mathcal D$ such that $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$. The chain rule gives \begin{align*} H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal D\mid\mathcal C)+H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D) \end{align*} and also \begin{align*} H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal R\mid\mathcal C)+H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C). \end{align*} Why does this imply the desired direction of monotonicity? The finite-sum formula for conditional entropy says that, on each atom of $\mathcal C$, splitting further according to $\mathcal R$ cannot increase the average uncertainty of the $\mathcal D$-label; this is exactly the log-sum inequality. Hence \begin{align*} H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C)\le H_\rho(\mathcal D\mid\mathcal C). \end{align*} Comparing the two chain-rule identities gives \begin{align*} H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C). \end{align*} Since $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$, we conclude \begin{align*} H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C)<\varepsilon. \end{align*} This proves the desired approximation in conditional entropy.[/guided]

custom_env admin

[step:Bound the entropy rate of an arbitrary partition by a nearby rectangle partition]Let $\mathcal R$ be a finite measurable partition of $X\times Y$, and fix $\varepsilon>0$. By the approximation step, choose finite partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that \begin{align*} H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)<\varepsilon. \end{align*} For each $n\in\mathbb N$, define \begin{align*} \mathcal R_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}\mathcal R \end{align*} and \begin{align*} \mathcal U_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q). \end{align*} Using the identity $H(\mathcal A\vee\mathcal C)=H(\mathcal C)+H(\mathcal A\mid\mathcal C)$ with $\mathcal A=\mathcal R_0^{n-1}$ and $\mathcal C=\mathcal U_0^{n-1}$, we obtain \begin{align*} H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1}). \end{align*} The finite identity $H(\mathcal A\vee\mathcal B\mid\mathcal C)=H(\mathcal A\mid\mathcal C)+H(\mathcal B\mid\mathcal A\vee\mathcal C)$, obtained by expanding the definition of conditional entropy, and the non-negativity of finite conditional entropy give the subadditivity estimate for joins \begin{align*} H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1})\le \sum_{k=0}^{n-1}H_{\mu\otimes\nu}((T\times S)^{-k}\mathcal R\mid (T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)). \end{align*} Indeed, applying this finite identity to the ordered family $((T\times S)^{-k}\mathcal R)_{k=0}^{n-1}$ and conditioning on $\mathcal U_0^{n-1}$ gives a sum of conditional entropies; in the $k$-th summand, dropping all conditioning information except $(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)$ can only increase conditional entropy, because the difference is another non-negative finite conditional entropy. Since $T\times S$ preserves $\mu\otimes\nu$, each summand equals \begin{align*} H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q). \end{align*} Therefore \begin{align*} H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+n\varepsilon. \end{align*} Divide by $n$ and let $n\to\infty$: \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)+\varepsilon. \end{align*} By the rectangle partition computation, \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q)+\varepsilon. \end{align*} Since $h_\mu(T,\mathcal P)\le h_\mu(T)$ and $h_\nu(S,\mathcal Q)\le h_\nu(S)$, we get \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S)+\varepsilon. \end{align*}[/step]

custom_env admin

[guided]Fix a finite measurable partition $\mathcal R$ of $X\times Y$ and fix $\varepsilon>0$. The approximation step supplies finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that \begin{align*} H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)<\varepsilon. \end{align*} For $n\in\mathbb N$, define \begin{align*} \mathcal R_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}\mathcal R \end{align*} and \begin{align*} \mathcal U_0^{n-1}:=\bigvee_{k=0}^{n-1}(T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q). \end{align*} The identity $H(\mathcal A\vee\mathcal C)=H(\mathcal C)+H(\mathcal A\mid\mathcal C)$ gives \begin{align*} H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1}). \end{align*} We now bound the conditional term by adding the errors from the $n$ time levels. The finite chain rule and monotonicity of conditional entropy under additional conditioning give \begin{align*} H_{\mu\otimes\nu}(\mathcal R_0^{n-1}\mid\mathcal U_0^{n-1})\le \sum_{k=0}^{n-1}H_{\mu\otimes\nu}((T\times S)^{-k}\mathcal R\mid (T\times S)^{-k}(\mathcal P\boxtimes\mathcal Q)). \end{align*} Since $T\times S$ preserves $\mu\otimes\nu$, each summand equals $H_{\mu\otimes\nu}(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)$, which is less than $\varepsilon$. Therefore \begin{align*} H_{\mu\otimes\nu}(\mathcal R_0^{n-1})\le H_{\mu\otimes\nu}(\mathcal U_0^{n-1})+n\varepsilon. \end{align*} Dividing by $n$ and letting $n\to\infty$ gives \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)+\varepsilon. \end{align*} The rectangle computation gives \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal P\boxtimes\mathcal Q)=h_\mu(T,\mathcal P)+h_\nu(S,\mathcal Q). \end{align*} Finally, by the definitions of $h_\mu(T)$ and $h_\nu(S)$ as suprema over finite partitions, \begin{align*} h_\mu(T,\mathcal P)\le h_\mu(T) \end{align*} and \begin{align*} h_\nu(S,\mathcal Q)\le h_\nu(S). \end{align*} Combining these inequalities yields \begin{align*} h_{\mu\otimes\nu}(T\times S,\mathcal R)\le h_\mu(T)+h_\nu(S)+\varepsilon. \end{align*}[/guided]

custom_env admin

What brings you to Androma?

Start with a route through the knowledge graph.

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Sign in to Androma

Check your inbox

One last step

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Raw Attribution Data