Androma — The Home of Mathematics on the Internet

[guided]The purpose of this step is to convert one large probability, namely the probability of an entire length-$n$ name, into a sum of one-step conditional probabilities. This is the entropy analogue of writing the probability of a word as the probability of its first symbol times the probability of the second symbol given the first, and so on. For $n\in\mathbb N$, the partition \begin{align*} \mathcal Q_{[0,n-1]}:=\bigvee_{j=0}^{n-1}S^{-j}\mathcal Q \end{align*} records the symbols seen at times $0,1,\dots,n-1$. Its atom at $y$ is the set of all points whose first $n$ symbols agree with those of $y$. The information of this atom is the function \begin{align*} I_{n,\mathcal Q}:Y\to[0,\infty] \end{align*} defined by \begin{align*} I_{n,\mathcal Q}(y):=-\log \nu(\mathcal Q_{[0,n-1]}(y)). \end{align*} Fix an atom of $\mathcal Q_{[0,n-1]}$. Thus choose atoms $Q_{i_0},\dots,Q_{i_{n-1}}\in\mathcal Q$ and define \begin{align*} A_k:=\bigcap_{\ell=0}^{k}S^{-\ell}Q_{i_\ell} \end{align*} for $0\leq k\leq n-1$, with $A_{-1}:=Y$. The set $A_k$ is the cylinder determined by the first $k+1$ symbols. The elementary chain rule for conditional probabilities gives \begin{align*} \nu(A_{n-1})=\prod_{k=0}^{n-1}\nu(S^{-k}Q_{i_k}\mid A_{k-1}). \end{align*} Now we rewrite the $k$th factor from the viewpoint of the point $S^k y$. Because $S$ is invertible and $\nu$ is $S$-invariant, applying $S^k$ transports the previous-symbol cylinder $A_{k-1}$ to the atom of the past partition $\mathcal Q_{[-k,-1]}$ containing $S^k y$, while $S^{-k}Q_{i_k}$ is transported to the present atom $Q_{i_k}$. Therefore \begin{align*} \nu(S^{-k}Q_{i_k}\mid A_{k-1})=\nu(\mathcal Q(S^k y)\mid\mathcal Q_{[-k,-1]})(S^k y). \end{align*} By the definition of $J_k$, \begin{align*} J_k(S^k y)=-\log \nu(\mathcal Q(S^k y)\mid\mathcal Q_{[-k,-1]})(S^k y). 
\end{align*} Taking $-\log$ of the chain-rule product converts the product into a sum, so for $\nu$-almost every $y\in Y$, \begin{align*} I_{n,\mathcal Q}(y)=\sum_{k=0}^{n-1}J_k(S^k y). \end{align*} To make the null-set issue precise, discard every atom $A_{n-1}$ with $\nu(A_{n-1})=0$; the union of these atoms has $\nu$-measure $0$ because $\mathcal Q_{[0,n-1]}$ is finite. On every remaining atom $A_{n-1}$, each preceding cylinder $A_{k-1}$ has positive measure and the elementary conditional probabilities $\nu(S^{-k}Q_{i_k}\mid A_{k-1})$ are well-defined. The conditional-expectation representatives defining $J_k$ may disagree with these finite-atom conditional probabilities only on a $\nu$-null subset of each finite-past atom. Since there are only finitely many such atoms for the fixed $n$, their union is null. Therefore the displayed identity holds outside a $\nu$-null set depending on $n$, and intersecting over $n\in\mathbb N$ gives a single full-measure set on which the identity holds for every $n$.[/guided]
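The telescoping behind the chain rule can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-state stationary Markov measure in the role of $\nu$, with $\mathcal Q$ the time-zero partition; the matrix `P`, the vector `PI`, the sample word, and the helper names are illustrative choices and not part of the proof.

```python
import math

# Hypothetical two-state stationary Markov measure (illustrative numbers only).
P = [[0.9, 0.1], [0.4, 0.6]]   # transition probabilities
PI = [0.8, 0.2]                # stationary vector, chosen so that PI * P = PI

def cylinder_prob(word):
    """nu(A_k) for the cylinder fixed by the symbols in `word`."""
    if not word:
        return 1.0  # A_{-1} = Y has full measure
    p = PI[word[0]]
    for a, b in zip(word, word[1:]):
        p *= P[a][b]
    return p

def one_step_information(word):
    """Terms -log nu(S^{-k} Q_{i_k} | A_{k-1}) for k = 0, ..., n-1."""
    return [
        -math.log(cylinder_prob(word[:k + 1]) / cylinder_prob(word[:k]))
        for k in range(len(word))
    ]

word = [0, 0, 1, 1, 0, 1]
total_information = -math.log(cylinder_prob(word))   # I_{n,Q} on this atom
summed_information = sum(one_step_information(word)) # sum of one-step terms
```

In this Markov example every term with $k\geq1$ reduces to minus the logarithm of a single transition probability, so the one-step informations depend only on the previous symbol.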

[step:Pass from finite pasts to the full past]The $\sigma$-algebras $(\mathcal G_m)_{m\geq1}$ increase to $\mathcal G_\infty$. We use the finite-partition information convergence theorem: for a finite partition $\mathcal Q$ and an increasing sequence of sub-$\sigma$-algebras $\mathcal G_m$ with join $\mathcal G_\infty$, the conditional information functions satisfy \begin{align*} I_\nu(\mathcal Q\mid\mathcal G_m)\to I_\nu(\mathcal Q\mid\mathcal G_\infty) \end{align*} $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. Its hypotheses hold here because $\mathcal Q$ is finite, each $\mathcal G_m$ is a sub-$\sigma$-algebra of $\mathcal C$, and $\mathcal G_\infty=\bigvee_{m=1}^{\infty}\mathcal G_m$ by definition. Hence \begin{align*} J_m\to J_\infty \end{align*} $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. For completeness, this finite-partition convergence is the direct finite-alphabet consequence of martingale convergence. For each atom $Q_i$ of $\mathcal Q$, the conditional probabilities $\mathbb E_\nu[\mathbb 1_{Q_i}\mid\mathcal G_m]$ converge almost everywhere and in $L^1$ to $\mathbb E_\nu[\mathbb 1_{Q_i}\mid\mathcal G_\infty]$. Since there are finitely many atoms, the corresponding conditional information sums converge almost everywhere; the standard truncation of $-\log t$ on $[\varepsilon,1]$ and the finite entropy bound for $\mathcal Q$ give the $L^1$ convergence. We also invoke Breiman's maximal lemma for finite-alphabet conditional information in the exact form needed here. 
If $\mathcal Q$ is finite, $(\mathcal G_m)_{m\geq1}$ is increasing, $\mathcal G_\infty=\bigvee_{m=1}^{\infty}\mathcal G_m$, and \begin{align*} F_m:=I_\nu(\mathcal Q\mid\mathcal G_m),\qquad F_\infty:=I_\nu(\mathcal Q\mid\mathcal G_\infty), \end{align*} then the maximal tails \begin{align*} R_{r,\mathcal Q,\mathcal G}(y):=\sup_{m\geq r}|F_m(y)-F_\infty(y)| \end{align*} belong to $L^1(Y,\mathcal C,\nu)$ and satisfy \begin{align*} \int_Y R_{r,\mathcal Q,\mathcal G}(y)\,d\nu(y)\to0 \end{align*} as $r\to\infty$. This maximal lemma is the standard strengthening of martingale information convergence used in proofs of the [Shannon-McMillan-Breiman theorem](/theorems/6766); it is stronger than mere $L^1$ convergence and is the input that makes the non-stationary information average legitimate. Applying it to the present finite partition $\mathcal Q$ and increasing filtration $(\mathcal G_m)_{m\geq1}$ gives \begin{align*} R_r(y):=\sup_{m\geq r}|J_m(y)-J_\infty(y)| \end{align*} with $R_r\in L^1(Y,\mathcal C,\nu)$ and \begin{align*} \int_Y R_r(y)\,d\nu(y)\to0. \end{align*} The [conditional Birkhoff ergodic theorem](/theorems/518) applied to the integrable function $J_\infty:Y\to[0,\infty]$ gives \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}J_\infty(S^k y)\to \mathbb E_\nu[J_\infty\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y\in Y$, where $\mathcal I_S$ is the $S$-invariant $\sigma$-algebra. It remains to replace $J_\infty(S^k y)$ by $J_k(S^k y)$. We record the exact diagonal averaging input needed here. [claim:Breiman-Maker information averaging] Let $(Y,\mathcal C,\nu,S)$ be a probability-preserving system, let $\mathcal Q$ be a finite measurable partition, and let $(\mathcal G_m)_{m\geq0}$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal C$. 
Define the maps \begin{align*} f_m:Y&\to[0,\infty] \end{align*} by \begin{align*} f_m:=I_\nu(\mathcal Q\mid\mathcal G_m) \end{align*} for $m\in\mathbb N\cup\{0\}$, and let $f_\infty:Y\to[0,\infty]$ be an integrable function such that $f_m\to f_\infty$ $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. Suppose moreover that the Breiman tail condition holds: \begin{align*} R_r:Y&\to[0,\infty] \end{align*} where \begin{align*} R_r(y):=\sup_{m\geq r}|f_m(y)-f_\infty(y)| \end{align*} satisfies $R_r\in L^1(Y,\mathcal C,\nu)$ and $\int_Y R_r(y)\,d\nu(y)\to0$ as $r\to\infty$. Then \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}\bigl(f_k-f_\infty\bigr)(S^k y)\to0 \end{align*} for $\nu$-almost every $y\in Y$. [/claim] [proof] Define $g_m:Y\to\mathbb R$ by $g_m:=f_m-f_\infty$. Fix $r\in\mathbb N$. For every $n>r$ and every $y\in Y$, \begin{align*} \left|\frac{1}{n}\sum_{k=0}^{n-1}g_k(S^k y)\right|\leq \frac{1}{n}\sum_{k=0}^{r-1}|g_k(S^k y)|+\frac{1}{n}\sum_{k=r}^{n-1}R_r(S^k y). \end{align*} The first term tends to $0$ for every $y$ for which the finite numbers $|g_0(y)|,\dots,|g_{r-1}(S^{r-1}y)|$ are defined, because the numerator is fixed while $n\to\infty$. The function $R_r$ is integrable by the Breiman tail condition, so the [conditional Birkhoff ergodic theorem](/theorems/518) applied to $R_r$ gives \begin{align*} \limsup_{n\to\infty}\frac{1}{n}\sum_{k=r}^{n-1}R_r(S^k y)\leq \mathbb E_\nu[R_r\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y$. Hence \begin{align*} \limsup_{n\to\infty}\left|\frac{1}{n}\sum_{k=0}^{n-1}g_k(S^k y)\right|\leq \mathbb E_\nu[R_r\mid\mathcal I_S](y) \end{align*} for every fixed $r$ and for $\nu$-almost every $y$. It remains to let $r\to\infty$. Since $R_r\downarrow0$ $\nu$-almost everywhere and $\int_Y R_r(y)\,d\nu(y)\to0$, preservation of integrals under [conditional expectation](/page/Conditional%20Expectation) gives \begin{align*} \int_Y \mathbb E_\nu[R_r\mid\mathcal I_S](y)\,d\nu(y)=\int_Y R_r(y)\,d\nu(y)\to0. 
\end{align*} The functions $\mathbb E_\nu[R_r\mid\mathcal I_S]$ decrease to $0$ in $L^1(Y,\mathcal C,\nu)$ and hence, after passing through the monotone limit, converge to $0$ $\nu$-almost everywhere. Therefore the limsup above is $0$ for $\nu$-almost every $y$, which proves the claimed diagonal convergence. [/proof] Its hypotheses apply after adjoining $\mathcal G_0:=\{\varnothing,Y\}$ to the increasing filtration. The partition $\mathcal Q$ is finite, $J_m=I_\nu(\mathcal Q\mid\mathcal G_m)$ for $m\geq0$, $J_m\to J_\infty$ in $L^1(Y,\mathcal C,\nu)$ and almost everywhere by the finite-partition information convergence theorem, and the preceding Breiman tail estimate gives $R_r\in L^1(Y,\mathcal C,\nu)$ with $\int_Y R_r(y)\,d\nu(y)\to0$ for $R_r(y):=\sup_{m\geq r}|J_m(y)-J_\infty(y)|$. Therefore \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}\bigl(J_k-J_\infty\bigr)(S^k y)\to0 \end{align*} for $\nu$-almost every $y\in Y$. Combining this proved approximation with the chain-rule identity and the conditional Birkhoff limit yields \begin{align*} \frac{1}{n}I_{n,\mathcal Q}(y)\to \mathbb E_\nu[J_\infty\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y\in Y$.[/step]
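The limit can be observed numerically in the simplest ergodic case. The sketch below assumes a hypothetical irreducible two-state Markov measure with the time-zero partition, so that $J_k(S^k y)=-\log P(y_{k-1},y_k)$ for every $k\geq1$ and $\mathbb E_\nu[J_\infty\mid\mathcal I_S]$ is the constant entropy rate; the numbers, the seed, and the function names are illustrative assumptions.

```python
import math
import random

# Hypothetical irreducible two-state Markov measure (illustrative numbers only).
P = [[0.9, 0.1], [0.4, 0.6]]
PI = [0.8, 0.2]  # stationary vector for P

def entropy_rate():
    """Constant value of E[J_infty | I_S] for this ergodic Markov example."""
    return -sum(
        PI[i] * P[i][j] * math.log(P[i][j]) for i in range(2) for j in range(2)
    )

def sample_information_rate(n, seed=0):
    """Draw one length-n name and return (1/n) * I_{n,Q} along it."""
    rng = random.Random(seed)
    state = 0 if rng.random() < PI[0] else 1
    info = -math.log(PI[state])            # the k = 0 term J_0(y)
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        info += -math.log(P[state][nxt])   # the term J_k(S^k y) for k >= 1
        state = nxt
    return info / n

h = entropy_rate()
a_n = sample_information_rate(50_000)
```

In this example only the $k=0$ term of the diagonal sum differs from its full-past counterpart, which is why the correction disappears after division by $n$.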

[guided]We first pass from finite pasts to the full past. The $\sigma$-algebras $(\mathcal G_m)_{m\geq1}$ increase to \begin{align*} \mathcal G_\infty=\bigvee_{m=1}^{\infty}\mathcal G_m. \end{align*} The finite-partition information convergence theorem applies because $\mathcal Q$ is finite and each $\mathcal G_m$ is a sub-$\sigma$-algebra of $\mathcal C$. It gives \begin{align*} I_\nu(\mathcal Q\mid\mathcal G_m)\to I_\nu(\mathcal Q\mid\mathcal G_\infty) \end{align*} $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. Since \begin{align*} J_m=I_\nu(\mathcal Q\mid\mathcal G_m) \end{align*} and \begin{align*} J_\infty=I_\nu(\mathcal Q\mid\mathcal G_\infty), \end{align*} we have \begin{align*} J_m\to J_\infty \end{align*} $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. The missing hypothesis for the diagonal average is supplied by Breiman's maximal convergence theorem for conditional information of a finite partition. In the exact form used here, if $\mathcal Q$ is finite, if $(\mathcal G_m)_{m\geq1}$ is increasing, if \begin{align*} \mathcal G_\infty=\bigvee_{m=1}^{\infty}\mathcal G_m, \end{align*} and if \begin{align*} F_m:=I_\nu(\mathcal Q\mid\mathcal G_m),\qquad F_\infty:=I_\nu(\mathcal Q\mid\mathcal G_\infty), \end{align*} then the maximal tail functions \begin{align*} R_{r,\mathcal Q,\mathcal G}(y):=\sup_{m\geq r}|F_m(y)-F_\infty(y)| \end{align*} are integrable and satisfy \begin{align*} \int_Y R_{r,\mathcal Q,\mathcal G}(y)\,d\nu(y)\to0. \end{align*} With $F_m=J_m$ and $F_\infty=J_\infty$, this gives \begin{align*} R_r(y):=\sup_{m\geq r}|J_m(y)-J_\infty(y)| \end{align*} with $R_r\in L^1(Y,\mathcal C,\nu)$ and \begin{align*} \int_Y R_r(y)\,d\nu(y)\to0. \end{align*} The [conditional Birkhoff ergodic theorem](/theorems/518) applies to the integrable function \begin{align*} J_\infty:Y\to[0,\infty]. 
\end{align*} Therefore \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}J_\infty(S^k y)\to \mathbb E_\nu[J_\infty\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y\in Y$, where $\mathcal I_S$ is the $S$-invariant $\sigma$-algebra. It remains to justify the diagonal replacement of $J_k(S^k y)$ by $J_\infty(S^k y)$. The exact input is the Breiman-Maker information averaging lemma. Let $(Y,\mathcal C,\nu,S)$ be a probability-preserving system, let $\mathcal Q$ be a finite measurable partition, and let $(\mathcal G_m)_{m\geq0}$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal C$. Define \begin{align*} f_m:Y\to[0,\infty] \end{align*} by \begin{align*} f_m:=I_\nu(\mathcal Q\mid\mathcal G_m), \end{align*} and let $f_\infty:Y\to[0,\infty]$ be an integrable function with $f_m\to f_\infty$ $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. Suppose the tail functions \begin{align*} R_r:Y\to[0,\infty] \end{align*} defined by \begin{align*} R_r(y):=\sup_{m\geq r}|f_m(y)-f_\infty(y)| \end{align*} satisfy $R_r\in L^1(Y,\mathcal C,\nu)$ and \begin{align*} \int_Y R_r(y)\,d\nu(y)\to0 \end{align*} as $r\to\infty$. Then the lemma says \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}\bigl(f_k-f_\infty\bigr)(S^k y)\to0 \end{align*} for $\nu$-almost every $y\in Y$. Here is the proof of that diagonal lemma in the present notation. Put \begin{align*} g_m:Y\to\mathbb R \end{align*} with \begin{align*} g_m:=f_m-f_\infty. \end{align*} Fix $r\in\mathbb N$. For every $n>r$ and every $y\in Y$, \begin{align*} \left|\frac{1}{n}\sum_{k=0}^{n-1}g_k(S^k y)\right|\leq \frac{1}{n}\sum_{k=0}^{r-1}|g_k(S^k y)|+\frac{1}{n}\sum_{k=r}^{n-1}R_r(S^k y). \end{align*} The first term tends to $0$ for every $y$ for which the finitely many values $|g_0(y)|,\dots,|g_{r-1}(S^{r-1}y)|$ are defined. 
Since $R_r$ is integrable, the [conditional Birkhoff ergodic theorem](/theorems/518) gives \begin{align*} \limsup_{n\to\infty}\frac{1}{n}\sum_{k=r}^{n-1}R_r(S^k y)\leq \mathbb E_\nu[R_r\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y\in Y$. Thus, for fixed $r$, \begin{align*} \limsup_{n\to\infty}\left|\frac{1}{n}\sum_{k=0}^{n-1}g_k(S^k y)\right|\leq \mathbb E_\nu[R_r\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y$. Now let $r\to\infty$. Since $R_r\downarrow0$ $\nu$-almost everywhere and \begin{align*} \int_Y R_r(y)\,d\nu(y)\to0, \end{align*} conditional expectation preserves integrals: \begin{align*} \int_Y \mathbb E_\nu[R_r\mid\mathcal I_S](y)\,d\nu(y)=\int_Y R_r(y)\,d\nu(y)\to0. \end{align*} The functions $\mathbb E_\nu[R_r\mid\mathcal I_S]$ decrease to $0$ in $L^1(Y,\mathcal C,\nu)$ and hence converge to $0$ $\nu$-almost everywhere. Therefore \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}\bigl(f_k-f_\infty\bigr)(S^k y)\to0 \end{align*} for $\nu$-almost every $y\in Y$. We now apply this with $f_m=J_m$ and $f_\infty=J_\infty$. The hypotheses verified above give \begin{align*} J_m=I_\nu(\mathcal Q\mid\mathcal G_m)\to J_\infty=I_\nu(\mathcal Q\mid\mathcal G_\infty) \end{align*} $\nu$-almost everywhere and in $L^1(Y,\mathcal C,\nu)$. The Breiman tail estimate for conditional information of a finite partition gives, for \begin{align*} R_r(y):=\sup_{m\geq r}|J_m(y)-J_\infty(y)|, \end{align*} that $R_r\in L^1(Y,\mathcal C,\nu)$ and \begin{align*} \int_Y R_r(y)\,d\nu(y)\to0. \end{align*} Hence \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}\bigl(J_k-J_\infty\bigr)(S^k y)\to0 \end{align*} for $\nu$-almost every $y\in Y$. Finally, combine this diagonal convergence with the Birkhoff limit: \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}J_\infty(S^k y)\to \mathbb E_\nu[J_\infty\mid\mathcal I_S](y). \end{align*} Adding the two limits gives \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1}J_k(S^k y)\to \mathbb E_\nu[J_\infty\mid\mathcal I_S](y). 
\end{align*} Using the chain-rule identity \begin{align*} I_{n,\mathcal Q}(y)=\sum_{k=0}^{n-1}J_k(S^k y) \end{align*} then yields \begin{align*} \frac{1}{n}I_{n,\mathcal Q}(y)\to \mathbb E_\nu[J_\infty\mid\mathcal I_S](y) \end{align*} for $\nu$-almost every $y\in Y$.[/guided]

[step:Lift the non-invertible system to its natural extension] Return to the original standard probability-preserving system $(X,\mathcal B,\mu,T)$, where $T$ need not be invertible. The standardness assumption means that $(X,\mathcal B)$ is a standard Borel space up to completion and a $\mu$-null set, so that countable products and inverse limits carry the usual projective-limit probability measures. Let $\widehat X$ be the inverse-limit space of one-sided histories \begin{align*} \widehat X:=\{(x_0,x_{-1},x_{-2},\dots)\in X^{\mathbb N_0}:T x_{-j}=x_{-j+1}\text{ for every }j\geq1\}, \end{align*} and let $\widehat{\mathcal B}$ be the trace of the product $\sigma$-algebra on $\widehat X$. For indices $0\leq j_1<\cdots<j_r$ and sets $B_1,\dots,B_r\in\mathcal B$, define the cylinder distribution by \begin{align*} \widehat\mu\{\widehat x:x_{-j_1}\in B_1,\dots,x_{-j_r}\in B_r\}:=\mu\left(B_r\cap T^{-(j_r-j_{r-1})}B_{r-1}\cap\cdots\cap T^{-(j_r-j_1)}B_1\right). \end{align*} The consistency of these finite-dimensional distributions follows from $\mu(T^{-1}A)=\mu(A)$ for every $A\in\mathcal B$: deleting the oldest coordinate or inserting $X$ in any coordinate leaves the displayed value unchanged by invariance of $\mu$. Since $(X,\mathcal B)$ is standard Borel, the countable projective-limit construction for standard Borel probability spaces gives a probability measure $\widehat\mu$ on $(\widehat X,\widehat{\mathcal B})$ with these cylinder values. This is the natural extension of $(X,\mathcal B,\mu,T)$. Let $\widehat T:\widehat X\to\widehat X$ be the shift \begin{align*} \widehat T(x_0,x_{-1},x_{-2},\dots):=(T x_0,x_0,x_{-1},\dots). \end{align*} Its inverse is the measurable map \begin{align*} \widehat T^{-1}(x_0,x_{-1},x_{-2},\dots):=(x_{-1},x_{-2},x_{-3},\dots), \end{align*} and the displayed cylinder formula shows that $\widehat\mu$ is $\widehat T$-invariant. 
Let \begin{align*} \pi_0:\widehat X\to X \end{align*} be the time-zero factor map $\pi_0(x_0,x_{-1},x_{-2},\dots)=x_0$. Thus $\pi_0$ is measurable, $\widehat\mu\circ\pi_0^{-1}=\mu$, and \begin{align*} \pi_0\circ\widehat T=T\circ\pi_0. \end{align*} Let \begin{align*} \widehat{\mathcal P}:=\pi_0^{-1}\mathcal P \end{align*} be the lifted finite partition of $\widehat X$. For every $n\in\mathbb N$ and every $\widehat x\in\widehat X$ outside a fixed $\widehat\mu$-null set, \begin{align*} \widehat{\mathcal P}_{[0,n-1]}(\widehat x)=\pi_0^{-1}\bigl(\mathcal P_{[0,n-1]}(\pi_0\widehat x)\bigr). \end{align*} Because $\widehat\mu\circ\pi_0^{-1}=\mu$, this implies \begin{align*} \widehat\mu(\widehat{\mathcal P}_{[0,n-1]}(\widehat x))=\mu(\mathcal P_{[0,n-1]}(\pi_0\widehat x)). \end{align*} Let $\mathcal I_{\widehat T}:=\{A\in\widehat{\mathcal B}:\widehat T^{-1}A=A\text{ modulo }\widehat\mu\}$ denote the $\widehat T$-invariant $\sigma$-algebra. Applying the invertible case to $(\widehat X,\widehat{\mathcal B},\widehat\mu,\widehat T)$ and $\widehat{\mathcal P}$ gives an $\mathcal I_{\widehat T}$-measurable function \begin{align*} \widehat h:\widehat X\to[0,\infty] \end{align*} such that \begin{align*} -\frac{1}{n}\log\mu(\mathcal P_{[0,n-1]}(\pi_0\widehat x))\to \widehat h(\widehat x) \end{align*} for $\widehat\mu$-almost every $\widehat x\in\widehat X$. [/step]
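The consistency of the cylinder values used to build $\widehat\mu$ can be tested exactly on a concrete non-invertible system. The sketch below assumes, purely for illustration, the doubling map on $[0,1)$ with Lebesgue measure and sets that are unions of dyadic intervals; `DEPTH`, the example sets, and the helper names are hypothetical and do not come from the proof.

```python
from fractions import Fraction

DEPTH = 10  # every set tested below is a union of dyadic intervals of depth <= DEPTH

def T(x):
    """Doubling map on [0, 1), a standard non-invertible example."""
    return (2 * x) % 1

def iterate(x, k):
    """Apply T to x exactly k times."""
    for _ in range(k):
        x = T(x)
    return x

def measure(pred):
    """Lebesgue measure of a dyadic set, counted exactly on interval midpoints."""
    midpoints = (Fraction(2 * i + 1, 2 ** (DEPTH + 1)) for i in range(2 ** DEPTH))
    return Fraction(sum(1 for x in midpoints if pred(x)), 2 ** DEPTH)

def cylinder_measure(constraints):
    """Cylinder value for {x_{-j} in B for each (j, B)}.

    The oldest coordinate x_{-j_max} plays the role of x, and the newer
    coordinate x_{-j} equals T^(j_max - j) x.
    """
    j_max = max(j for j, _ in constraints)
    return measure(lambda x: all(B(iterate(x, j_max - j)) for j, B in constraints))

def lower_half(x):
    return x < Fraction(1, 2)

def middle_half(x):
    return Fraction(1, 4) <= x < Fraction(3, 4)

def whole_space(x):
    return True

base = cylinder_measure([(0, lower_half), (2, middle_half)])
# Adding an older coordinate constrained only by X must not change the value.
extended = cylinder_measure([(0, lower_half), (2, middle_half), (5, whole_space)])
```

Here `extended` agrees with `base` because prepending an unconstrained older coordinate amounts to replacing a set by its preimage under a power of $T$, which is exactly where invariance of $\mu$ enters.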

[step:Descend the limiting function and recover the entropy rate] For each $n\in\mathbb N$, define \begin{align*} a_n:X\to[0,\infty] \end{align*} by \begin{align*} a_n(x):=-\frac{1}{n}\log\mu(\mathcal P_{[0,n-1]}(x)). \end{align*} The lifted convergence says that $a_n\circ\pi_0$ converges $\widehat\mu$-almost everywhere to $\widehat h$. Since every $a_n\circ\pi_0$ is $\pi_0^{-1}\mathcal B$-measurable, the almost-sure limit $\widehat h$ is also $\pi_0^{-1}\mathcal B$-measurable after modifying it on a $\widehat\mu$-null set. [claim:Factor-measurability descent] If $\pi_0:(\widehat X,\widehat{\mathcal B},\widehat\mu)\to(X,\mathcal B,\mu)$ is a measure-preserving factor map between completed standard probability spaces and $g:\widehat X\to[0,\infty]$ is $\pi_0^{-1}\mathcal B$-measurable, then there exists a $\mathcal B$-measurable function $f:X\to[0,\infty]$ with $g=f\circ\pi_0$ $\widehat\mu$-almost everywhere. [/claim] [proof] Because $g$ is $\pi_0^{-1}\mathcal B$-measurable on the completed space, there exists a $\pi_0^{-1}\mathcal B$-measurable function $g_0:\widehat X\to[0,\infty]$ such that $g=g_0$ $\widehat\mu$-almost everywhere. For each $q\in\mathbb Q\cap[0,\infty)$, choose $B_q\in\mathcal B$ such that \begin{align*} \{\widehat x:g_0(\widehat x)>q\}=\pi_0^{-1}B_q. \end{align*} We now replace these representatives by a monotone family. For each $q\in\mathbb Q\cap[0,\infty)$, define \begin{align*} C_q:=\bigcup_{r\in\mathbb Q\cap[0,\infty),\ r>q}B_r. \end{align*} Then $C_r\subseteq C_q$ whenever $q<r$, each $C_q$ belongs to $\mathcal B$, and \begin{align*} \pi_0^{-1}C_q=\{\widehat x:g_0(\widehat x)>q\} \end{align*} because the strict superlevel sets of $g_0$ satisfy $\{g_0>q\}=\bigcup_{r\in\mathbb Q,\ r>q}\{g_0>r\}$. Define the measurable function \begin{align*} f:X\to[0,\infty] \end{align*} by \begin{align*} f(x):=\sup\{q\in\mathbb Q\cap[0,\infty):x\in C_q\}, \end{align*} with the convention that the supremum of the empty set is $0$. 
The monotonicity of the family $(C_q)_{q\in\mathbb Q\cap[0,\infty)}$ implies that, for every rational $q\geq0$, \begin{align*} \{x:f(x)>q\}=\bigcup_{r\in\mathbb Q\cap[0,\infty),\ r>q}C_r=C_q. \end{align*} Pulling back by $\pi_0$ gives \begin{align*} \{\widehat x:f(\pi_0\widehat x)>q\}=\pi_0^{-1}C_q=\{\widehat x:g_0(\widehat x)>q\} \end{align*} for every rational $q\geq0$. Equality of all rational superlevel sets implies $f\circ\pi_0=g_0$ everywhere, and hence $g=f\circ\pi_0$ $\widehat\mu$-almost everywhere. [/proof] Applying the claim to $g=\widehat h$, there exists a measurable function \begin{align*} \bar h_\mu(T,\mathcal P\mid\mathcal I_T):X\to[0,\infty] \end{align*} such that \begin{align*} \widehat h=\bar h_\mu(T,\mathcal P\mid\mathcal I_T)\circ\pi_0 \end{align*} for $\widehat\mu$-almost every point. We first push the almost-sure convergence down to $X$. Let \begin{align*} E:=\left\{x\in X:a_n(x)\text{ does not converge to }\bar h_\mu(T,\mathcal P\mid\mathcal I_T)(x)\right\}. \end{align*} For every $\widehat x$ outside the union of the null set where $a_n\circ\pi_0$ fails to converge to $\widehat h$ and the null set where $\widehat h\neq\bar h_\mu(T,\mathcal P\mid\mathcal I_T)\circ\pi_0$, the point $\pi_0\widehat x$ does not belong to $E$. Hence $\pi_0^{-1}E$ is contained in a $\widehat\mu$-null set. Since $\widehat\mu\circ\pi_0^{-1}=\mu$, \begin{align*} \mu(E)=\widehat\mu(\pi_0^{-1}E)=0. \end{align*} Therefore \begin{align*} a_n(x)\to\bar h_\mu(T,\mathcal P\mid\mathcal I_T)(x) \end{align*} for $\mu$-almost every $x\in X$. We now verify invariant measurability after descent. 
Since $\widehat h=\bar h_\mu(T,\mathcal P\mid\mathcal I_T)\circ\pi_0$ and $\widehat h\circ\widehat T=\widehat h$ $\widehat\mu$-almost everywhere, the factor relation $\pi_0\circ\widehat T=T\circ\pi_0$ gives \begin{align*} \bar h_\mu(T,\mathcal P\mid\mathcal I_T)(T\pi_0\widehat x)=\bar h_\mu(T,\mathcal P\mid\mathcal I_T)(\pi_0\widehat x) \end{align*} for $\widehat\mu$-almost every $\widehat x\in\widehat X$. Because $\widehat\mu\circ\pi_0^{-1}=\mu$, this is exactly \begin{align*} \bar h_\mu(T,\mathcal P\mid\mathcal I_T)\circ T=\bar h_\mu(T,\mathcal P\mid\mathcal I_T) \end{align*} $\mu$-almost everywhere. Thus the descended function is $\mathcal I_T$-measurable modulo $\mu$. Hence \begin{align*} \lim_{n\to\infty}-\frac{1}{n}\log\mu(\mathcal P_{[0,n-1]}(x))=\bar h_\mu(T,\mathcal P\mid\mathcal I_T)(x) \end{align*} for $\mu$-almost every $x\in X$. Since $\mathcal P$ is finite, the entropy rate $h_\mu(T,\mathcal P)$ is at most $\log |\mathcal P|$, and the integral identity proved below implies that $\bar h_\mu(T,\mathcal P\mid\mathcal I_T)$ is finite $\mu$-almost everywhere; after changing it on a null set we regard it as $[0,\infty)$-valued. Finally, using $\widehat\mu\circ\pi_0^{-1}=\mu$ and $\widehat h=\bar h_\mu(T,\mathcal P\mid\mathcal I_T)\circ\pi_0$ $\widehat\mu$-almost everywhere, the change-of-variables identity for pushforward measures gives \begin{align*} \int_X \bar h_\mu(T,\mathcal P\mid\mathcal I_T)(x)\,d\mu(x)=\int_{\widehat X}\widehat h(\widehat x)\,d\widehat\mu(\widehat x). \end{align*} Define \begin{align*} h_{\widehat\mu}(\widehat T,\widehat{\mathcal P}):=\lim_{n\to\infty}\frac{1}{n}H_{\widehat\mu}(\widehat{\mathcal P}_{[0,n-1]}) \end{align*} for the entropy rate of the lifted finite partition. By the invertible case, \begin{align*} \int_{\widehat X}\widehat h(\widehat x)\,d\widehat\mu(\widehat x)=h_{\widehat\mu}(\widehat T,\widehat{\mathcal P}). 
\end{align*} Since $\widehat{\mathcal P}_{[0,n-1]}$ is the pullback of $\mathcal P_{[0,n-1]}$ under $\pi_0$ and $\widehat\mu\circ\pi_0^{-1}=\mu$, the entropies of the corresponding finite partitions agree for every $n$: \begin{align*} H_{\widehat\mu}(\widehat{\mathcal P}_{[0,n-1]})=H_\mu(\mathcal P_{[0,n-1]}). \end{align*} Taking entropy-rate limits gives \begin{align*} h_{\widehat\mu}(\widehat T,\widehat{\mathcal P})=h_\mu(T,\mathcal P). \end{align*} Combining the change-of-variables identity, the invertible case, and this equality of entropy rates gives \begin{align*} \int_X \bar h_\mu(T,\mathcal P\mid\mathcal I_T)(x)\,d\mu(x)=h_\mu(T,\mathcal P). \end{align*} This proves both the almost-sure convergence statement and the asserted integral identity. [/step]
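The integral identity can be illustrated through the elementary relation $\int_X a_n\,d\mu=\frac{1}{n}H_\mu(\mathcal P_{[0,n-1]})$. The sketch below assumes a hypothetical two-state stationary Markov measure with the time-zero partition and computes the block entropies by brute force; the numbers and names are illustrative assumptions only.

```python
import itertools
import math

# Hypothetical two-state stationary Markov measure (illustrative numbers only).
P = [[0.9, 0.1], [0.4, 0.6]]
PI = [0.8, 0.2]  # stationary vector for P

def cylinder_prob(word):
    """mu of the atom of P_[0,n-1] determined by `word`."""
    p = PI[word[0]]
    for a, b in zip(word, word[1:]):
        p *= P[a][b]
    return p

def block_entropy(n):
    """H_mu(P_[0,n-1]), summed over all 2**n words of length n."""
    probs = (cylinder_prob(w) for w in itertools.product(range(2), repeat=n))
    return -sum(q * math.log(q) for q in probs)

entropy_rate = -sum(
    PI[i] * P[i][j] * math.log(P[i][j]) for i in range(2) for j in range(2)
)
initial_entropy = -sum(p * math.log(p) for p in PI)

# Integral of a_n, i.e. normalized block entropy, for a few block lengths.
rates = {n: block_entropy(n) / n for n in (1, 4, 8, 12)}
```

For a Markov measure the block entropy is exactly the entropy of the stationary vector plus $(n-1)$ times the entropy rate, so the normalized values in `rates` approach `entropy_rate` at speed $1/n$.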
