Shannon-McMillan-Breiman Theorem — Statement & Proof

Shannon-McMillan-Breiman Theorem (Theorem # 6766)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first prove the theorem when $T$ is invertible. In that case the information of a forward block can be written exactly as a sum of conditional information increments: the symbol at time $k$ is conditioned on the symbols at times $0,\dots,k-1$, which becomes the present symbol conditioned on a finite past after applying $T^k$. The martingale convergence theorem identifies these finite-past increments with the conditional information given the entire past, and an ergodic averaging lemma plus Birkhoff's theorem turns the resulting time average into the entropy rate. For a non-invertible system we pass to the natural extension, where forward names and their measures are preserved under the factor map, and then descend the almost sure statement back to $X$. [/proofplan] [step:Introduce conditional information for a finite partition] Let $\log$ denote the natural logarithm. For $n\in\mathbb N$, define the joined partition $\mathcal P_0^{n-1}$ by $\mathcal P_0^{n-1}:=\bigvee_{k=0}^{n-1}T^{-k}\mathcal P$, and let $\mathcal P_0^{n-1}(x)$ denote the atom of this joined partition containing $x$. Define the block information function $I_n^{\mathcal P}:X\to[0,\infty]$ by $I_n^{\mathcal P}(x):=-\log\mu(\mathcal P_0^{n-1}(x))$, with the convention $-\log 0=\infty$. If $\mathcal Q$ is a finite measurable partition of $X$, define the information function $I_\mu(\mathcal Q):X\to[0,\infty]$ by \begin{align*} I_\mu(\mathcal Q)(x):=-\log\mu(\mathcal Q(x)), \end{align*} where $\mathcal Q(x)$ is the atom of $\mathcal Q$ containing $x$. If $\mathcal G\subseteq\mathcal B$ is a sub-$\sigma$-algebra, define the conditional information function $I_\mu(\mathcal Q\mid\mathcal G):X\to[0,\infty]$ by the finite-atom formula \begin{align*} I_\mu(\mathcal Q\mid\mathcal G)(x):=-\sum_{Q\in\mathcal Q}\mathbb 1_Q(x)\log\mathbb E[\mathbb 1_Q\mid\mathcal G](x), \end{align*} with the convention $-\log 0=\infty$. This avoids treating $\mathcal Q(x)$ as a fixed set inside the [conditional expectation](/page/Conditional%20Expectation): for each atom $Q\in\mathcal Q$, $\mathbb E[\mathbb 1_Q\mid\mathcal G]:X\to[0,1]$ is a chosen $\mathcal G$-measurable version of the [conditional probability](/page/Conditional%20Probability) of $Q$, and the finite sum selects the atom containing $x$. The corresponding conditional entropy is \begin{align*} H_\mu(\mathcal Q\mid\mathcal G):=\int_X I_\mu(\mathcal Q\mid\mathcal G)(x)\,d\mu(x). \end{align*} [/step] [step:Prove the theorem when $T$ is invertible] Assume in this step that $T:X\to X$ is invertible modulo null sets and that $T^{-1}:X\to X$ is measurable and measure-preserving. For each integer $j\in\mathbb Z$, define the shifted partition \begin{align*} \mathcal P_j:=T^{-j}\mathcal P. \end{align*} For $m\in\mathbb N$, define the finite-past $\sigma$-algebra \begin{align*} \mathcal F_m:=\sigma\left(\bigvee_{j=1}^{m}\mathcal P_{-j}\right)=\sigma\left(\bigvee_{j=1}^{m}T^j\mathcal P\right), \end{align*} and define the full-past $\sigma$-algebra \begin{align*} \mathcal F_\infty:=\sigma\left(\bigcup_{m=1}^{\infty}\mathcal F_m\right). \end{align*} For $m\geq 0$, with $\mathcal F_0:=\{\varnothing,X\}$, define $f_m:X\to[0,\infty]$ by \begin{align*} f_m(x):=I_\mu(\mathcal P\mid\mathcal F_m)(x). \end{align*} Also define $f:X\to[0,\infty]$ by \begin{align*} f(x):=I_\mu(\mathcal P\mid\mathcal F_\infty)(x). \end{align*} For each $n\in\mathbb N$, the chain rule for conditional probabilities applied to the finite partition $\mathcal P_0^{n-1}$ gives, outside the null union of zero-measure atoms, \begin{align*} I_n^{\mathcal P}(x)=\sum_{k=0}^{n-1} f_k(T^k x). \end{align*} Indeed, the conditional probability of the symbol at time $k$ given the previous symbols at times $0,\dots,k-1$ is, after applying $T^k$, exactly the conditional probability of the present $\mathcal P$-atom given the finite past $\mathcal F_k$. We use the conditional information martingale convergence theorem for finite partitions: if $\mathcal R$ is a finite measurable partition with $H_\mu(\mathcal R)<\infty$ and $(\mathcal G_m)_{m\geq0}$ is an increasing sequence of sub-$\sigma$-algebras with $\mathcal G_\infty=\sigma(\bigcup_{m\geq0}\mathcal G_m)$, then $I_\mu(\mathcal R\mid\mathcal G_m)\to I_\mu(\mathcal R\mid\mathcal G_\infty)$ in $L^1$ and almost everywhere. Its hypotheses hold here because $\mathcal P$ is finite, so $H_\mu(\mathcal P)<\infty$, and because $\mathcal F_\infty$ is defined as $\sigma(\bigcup_{m\geq0}\mathcal F_m)$. Therefore $(f_m)_{m\geq0}$ converges to $f$ in $L^1(X,\mathcal B,\mu)$ and $\mu$-almost everywhere. We now use the Breiman conditional-information averaging lemma in its finite-partition form. This auxiliary result is the Breiman-Maker estimate for the specific sequence of conditional information functions: if $(Y,\mathcal A,\nu,S)$ is measure-preserving, $\mathcal R$ is a finite measurable partition with $H_\nu(\mathcal R)<\infty$, $(\mathcal G_m)_{m\geq0}$ is an increasing sequence of sub-$\sigma$-algebras with $\mathcal G_\infty=\sigma(\bigcup_{m\geq0}\mathcal G_m)$, and $g_m:=I_\nu(\mathcal R\mid\mathcal G_m)$ and $g:=I_\nu(\mathcal R\mid\mathcal G_\infty)$, then \begin{align*} \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}(g_k-g)(S^k y)=0 \end{align*} for $\nu$-almost every $y\in Y$. This is the standard Breiman-Maker step specialized to conditional information; its proof supplies the required truncation and maximal-tail estimates for finite partitions, so no separate domination hypothesis such as $\sup_m|g_m-g|\in L^1$ is being assumed here. The hypotheses are satisfied with $Y=X$, $\mathcal A=\mathcal B$, $\nu=\mu$, $S=T$, $\mathcal R=\mathcal P$, and $\mathcal G_m=\mathcal F_m$: the transformation $T$ is measure-preserving, the partition $\mathcal P$ is finite and hence has finite entropy, the sequence $(\mathcal F_m)_{m\geq0}$ is increasing, and $\mathcal F_\infty=\sigma(\bigcup_{m\geq0}\mathcal F_m)$ by definition. Hence \begin{align*} \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}(f_k-f)(T^k x)=0 \end{align*} for $\mu$-almost every $x\in X$. Therefore \begin{align*} \lim_{n\to\infty}\left(\frac{1}{n}I_n^{\mathcal P}(x)-\frac{1}{n}\sum_{k=0}^{n-1}f(T^k x)\right)=0 \end{align*} for $\mu$-almost every $x\in X$. Since $f\in L^1(X,\mathcal B,\mu)$, $T$ is measure-preserving, and $T$ is ergodic, Birkhoff's ergodic theorem for integrable observables applies and gives \begin{align*} \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k x)=\int_X f\,d\mu \end{align*} for $\mu$-almost every $x\in X$. Finally, the fixed-partition entropy formula for invertible systems says that, for a finite partition $\mathcal P$ of an invertible measure-preserving system, the fixed-partition entropy equals the conditional entropy of the present atom given the full past. Applying that formula identifies \begin{align*} \int_X f\,d\mu=H_\mu(\mathcal P\mid\mathcal F_\infty)=h_\mu(T,\mathcal P). \end{align*} Combining the preceding limits proves \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\mathcal P}(x)=h_\mu(T,\mathcal P) \end{align*} for $\mu$-almost every $x\in X$ in the invertible case. [guided] Assume first that the transformation is invertible. The reason for making this assumption is that it lets us speak about the past of a point. The forward name of $x$ records the atoms visited by $x,T x,\dots,T^{n-1}x$; after shifting by $T^k$, the previous symbols become symbols in the past of $T^k x$. For each integer $j\in\mathbb Z$, define \begin{align*} \mathcal P_j:=T^{-j}\mathcal P. \end{align*} Thus $\mathcal P_j(x)$ records the $\mathcal P$-symbol of $T^j x$. For $m\in\mathbb N$, define the finite-past $\sigma$-algebra \begin{align*} \mathcal F_m:=\sigma\left(\bigvee_{j=1}^{m}\mathcal P_{-j}\right). \end{align*} This is the information contained in the symbols at times $-m,\dots,-1$. Equivalently, since $\mathcal P_{-j}=T^j\mathcal P$ for $j\geq1$, we may write $\mathcal F_m=\sigma(\bigvee_{j=1}^{m}T^j\mathcal P)$. Define also \begin{align*} \mathcal F_\infty:=\sigma\left(\bigcup_{m=1}^{\infty}\mathcal F_m\right), \end{align*} the $\sigma$-algebra generated by the entire past. For $m\geq0$, with $\mathcal F_0=\{\varnothing,X\}$, define $f_m:X\to[0,\infty]$ by \begin{align*} f_m(x):=I_\mu(\mathcal P\mid\mathcal F_m)(x). \end{align*} Thus $f_m(x)$ is the information in the present $\mathcal P$-symbol of $x$ once the previous $m$ symbols are known. Define the limiting conditional information $f:X\to[0,\infty]$ by \begin{align*} f(x):=I_\mu(\mathcal P\mid\mathcal F_\infty)(x). \end{align*} Now fix $n\in\mathbb N$. The ordinary chain rule for probabilities factors the measure of an $n$-block atom into successive conditional probabilities. For a point $x$ outside the null union of atoms of $\mathcal P_0^{n-1}$ with measure $0$, this gives \begin{align*} \mu(\mathcal P_0^{n-1}(x))=\prod_{k=0}^{n-1}\mu(\mathcal P_k(x)\mid\mathcal P_0^{k-1})(x), \end{align*} where the factor for $k=0$ is interpreted as the unconditional probability of the atom $\mathcal P(x)$. Taking $-\log$ turns this product into a sum: \begin{align*} I_n^{\mathcal P}(x)=\sum_{k=0}^{n-1} f_k(T^k x). \end{align*} The equality holds because conditioning the $k$-th symbol of $x$ on the symbols at times $0,\dots,k-1$ is the same, after applying $T^k$, as conditioning the present symbol of $T^k x$ on its symbols at times $-k,\dots,-1$. The next question is what happens to $f_k$ as $k$ grows. The conditional information martingale convergence theorem for finite partitions applies to a finite partition with finite entropy and to an increasing family of conditioning $\sigma$-algebras. Those hypotheses are satisfied here: $\mathcal P$ has finitely many atoms, hence $H_\mu(\mathcal P)<\infty$, and the $\sigma$-algebras $\mathcal F_k$ increase to $\mathcal F_\infty=\sigma(\bigcup_{k\geq0}\mathcal F_k)$. Therefore \begin{align*} f_k\to f \end{align*} in $L^1(X,\mathcal B,\mu)$ and $\mu$-almost everywhere. This convergence alone is not quite enough, because the summand in the block decomposition is $f_k(T^k x)$ rather than $f_k(x)$. We therefore invoke Breiman's conditional-information averaging lemma in the finite-partition form. The statement needed here is: if $(Y,\mathcal A,\nu,S)$ is measure-preserving, $\mathcal R$ is a finite measurable partition with $H_\nu(\mathcal R)<\infty$, $(\mathcal G_m)_{m\geq0}$ is an increasing sequence of sub-$\sigma$-algebras, $\mathcal G_\infty=\sigma(\bigcup_{m\geq0}\mathcal G_m)$, and $g_m=I_\nu(\mathcal R\mid\mathcal G_m)$ and $g=I_\nu(\mathcal R\mid\mathcal G_\infty)$, then \begin{align*} \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}(g_k-g)(S^k y)=0 \end{align*} for $\nu$-almost every $y\in Y$. The important point is that this is not the bare assertion that arbitrary $L^1$ convergence may be sampled along the orbit with a moving index. The lemma is specialized to conditional information of a finite partition. Its proof truncates the functions $g_m-g$ at a height $M$, applies Maker's averaging argument to the bounded truncated part, and then uses the finite entropy of $\mathcal R$ together with conditional-information tail estimates to make the contribution of the set where conditional probabilities are smaller than $e^{-M}$ uniformly small after averaging. Thus no separate domination hypothesis such as $\sup_m|g_m-g|\in L^1$ is being assumed. We check the hypotheses in our setting. The system $(X,\mathcal B,\mu,T)$ is measure-preserving by assumption. The partition $\mathcal P$ is finite. The $\sigma$-algebras $(\mathcal F_m)_{m\geq0}$ are increasing because adding one more past symbol refines the generated information, and $\mathcal F_\infty=\sigma(\bigcup_{m\geq0}\mathcal F_m)$ by definition. With $Y=X$, $\mathcal A=\mathcal B$, $\nu=\mu$, $S=T$, $\mathcal R=\mathcal P$, and $\mathcal G_m=\mathcal F_m$, the functions in the lemma are exactly $g_m=f_m$ and $g=f$. Applying the lemma gives \begin{align*} \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}(f_k-f)(T^k x)=0 \end{align*} for $\mu$-almost every $x\in X$. Hence \begin{align*} \lim_{n\to\infty}\left(\frac{1}{n}I_n^{\mathcal P}(x)-\frac{1}{n}\sum_{k=0}^{n-1}f(T^k x)\right)=0. \end{align*} It remains to identify the limit of the ordinary time average of $f$. Since $f\in L^1(X,\mathcal B,\mu)$ and $T$ is measure-preserving and ergodic, Birkhoff's ergodic theorem gives \begin{align*} \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}f(T^k x)=\int_X f\,d\mu \end{align*} for $\mu$-almost every $x\in X$. The fixed-partition entropy formula for invertible systems identifies this integral with the entropy rate of the partition: \begin{align*} \int_X f\,d\mu=H_\mu(\mathcal P\mid\mathcal F_\infty)=h_\mu(T,\mathcal P). \end{align*} Combining the chain-rule decomposition, Maker's lemma, Birkhoff's theorem, and the entropy formula proves \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\mathcal P}(x)=h_\mu(T,\mathcal P) \end{align*} for $\mu$-almost every $x\in X$ when $T$ is invertible. [/guided] [/step] [step:Pass from a non-invertible system to its natural extension] Now let $T$ be an arbitrary measure-preserving transformation. Let \begin{align*} \widehat X:=\{(x_j)_{j\in\mathbb Z}\in X^{\mathbb Z}:T x_j=x_{j+1}\text{ for every }j\in\mathbb Z\} \end{align*} be the natural extension space. Let $\widehat{\mathcal B}$ be the trace of the product $\sigma$-algebra on $\widehat X$, let $\widehat T:\widehat X\to\widehat X$ be the shift map given by \begin{align*} \widehat T((x_j)_{j\in\mathbb Z}):=(x_{j+1})_{j\in\mathbb Z}, \end{align*} and let $\pi_0:\widehat X\to X$ be the coordinate projection given by \begin{align*} \pi_0((x_j)_{j\in\mathbb Z}):=x_0. \end{align*} By the natural [extension theorem](/theorems/59) for standard probability measure-preserving systems, applied to the probability-preserving endomorphism $T:(X,\mathcal B,\mu)\to(X,\mathcal B,\mu)$, there is a probability measure $\widehat\mu$ on $(\widehat X,\widehat{\mathcal B})$ such that $(\widehat X,\widehat{\mathcal B},\widehat\mu,\widehat T)$ is an invertible measure-preserving system, $\pi_0\circ\widehat T=T\circ\pi_0$, and $\widehat\mu\circ\pi_0^{-1}=\mu$. In particular, the inverse is the measurable shift $\widehat T^{-1}((x_j)_{j\in\mathbb Z})=(x_{j-1})_{j\in\mathbb Z}$, which is well-defined on $\widehat X$ because $T x_{j-1}=x_j$ for every $j\in\mathbb Z$, and the theorem includes preservation of $\widehat\mu$ by both $\widehat T$ and $\widehat T^{-1}$. The theorem also states that ergodicity is preserved by the natural extension: since $(X,\mathcal B,\mu,T)$ is ergodic, $(\widehat X,\widehat{\mathcal B},\widehat\mu,\widehat T)$ is ergodic. Define the lifted partition \begin{align*} \widehat{\mathcal P}:=\pi_0^{-1}\mathcal P=\{\pi_0^{-1}(P):P\in\mathcal P\}. \end{align*} For every $n\in\mathbb N$ and every $\widehat x\in\widehat X$ with $x:=\pi_0(\widehat x)$, the factor relation $\pi_0\circ\widehat T^k=T^k\circ\pi_0$ gives \begin{align*} \widehat{\mathcal P}_0^{n-1}(\widehat x)=\pi_0^{-1}(\mathcal P_0^{n-1}(x))\cap\widehat X. \end{align*} Since $\widehat\mu\circ\pi_0^{-1}=\mu$, it follows that \begin{align*} \widehat\mu(\widehat{\mathcal P}_0^{n-1}(\widehat x))=\mu(\mathcal P_0^{n-1}(x)). \end{align*} Therefore \begin{align*} I_n^{\widehat{\mathcal P}}(\widehat x)=I_n^{\mathcal P}(\pi_0(\widehat x)). \end{align*} Applying the invertible case to $(\widehat X,\widehat{\mathcal B},\widehat\mu,\widehat T)$ and $\widehat{\mathcal P}$ gives \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\widehat{\mathcal P}}(\widehat x)=h_{\widehat\mu}(\widehat T,\widehat{\mathcal P}) \end{align*} for $\widehat\mu$-almost every $\widehat x\in\widehat X$. We verify the fixed-partition entropy equality directly from the lifted block measures. For each $n\in\mathbb N$, the atoms of $\bigvee_{k=0}^{n-1}\widehat T^{-k}\widehat{\mathcal P}$ are precisely the nonempty sets of the form $\pi_0^{-1}(A)\cap\widehat X$, where $A$ is an atom of $\bigvee_{k=0}^{n-1}T^{-k}\mathcal P$. Since $\widehat\mu\circ\pi_0^{-1}=\mu$, corresponding atoms have the same measure. Therefore \begin{align*} H_{\widehat\mu}\left(\bigvee_{k=0}^{n-1}\widehat T^{-k}\widehat{\mathcal P}\right)=H_\mu\left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right). \end{align*} Dividing by $n$ and taking the defining limit of fixed-partition entropy gives \begin{align*} h_{\widehat\mu}(\widehat T,\widehat{\mathcal P})=h_\mu(T,\mathcal P). \end{align*} Thus \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\mathcal P}(\pi_0(\widehat x))=h_\mu(T,\mathcal P) \end{align*} for $\widehat\mu$-almost every $\widehat x\in\widehat X$. [guided] When $T$ is not invertible, a point $x\in X$ may have many possible histories, so the previous step cannot be applied directly. The natural extension replaces $x$ by a full two-sided orbit. Define \begin{align*} \widehat X:=\{(x_j)_{j\in\mathbb Z}\in X^{\mathbb Z}:T x_j=x_{j+1}\text{ for every }j\in\mathbb Z\}. \end{align*} Let $\widehat{\mathcal B}$ be the trace of the product $\sigma$-algebra on $\widehat X$, define the shift map $\widehat T:\widehat X\to\widehat X$ by \begin{align*} \widehat T((x_j)_{j\in\mathbb Z}):=(x_{j+1})_{j\in\mathbb Z}, \end{align*} and define the coordinate projection $\pi_0:\widehat X\to X$ by \begin{align*} \pi_0((x_j)_{j\in\mathbb Z}):=x_0. \end{align*} The natural extension theorem applies because $T$ is a measure-preserving map on the standard probability space $(X,\mathcal B,\mu)$. It gives a probability measure $\widehat\mu$ on $(\widehat X,\widehat{\mathcal B})$ such that $\widehat T$ is invertible and measure-preserving. With our convention $\widehat T((x_j)_{j\in\mathbb Z})=(x_{j+1})_{j\in\mathbb Z}$, the inverse is the measurable shift $\widehat T^{-1}((x_j)_{j\in\mathbb Z})=(x_{j-1})_{j\in\mathbb Z}$; this is well-defined on $\widehat X$ because each sequence in $\widehat X$ satisfies $T x_{j-1}=x_j$. The same theorem gives preservation of $\widehat\mu$ by this invertible shift, the factor identity \begin{align*} \pi_0\circ\widehat T=T\circ\pi_0 \end{align*} holds, and the projection has the original law: \begin{align*} \widehat\mu\circ\pi_0^{-1}=\mu. \end{align*} The same natural-extension theorem preserves ergodicity, so the lifted system is ergodic because the original system is ergodic. Define the lifted partition \begin{align*} \widehat{\mathcal P}:=\pi_0^{-1}\mathcal P=\{\pi_0^{-1}(P):P\in\mathcal P\}. \end{align*} Why does this partition preserve the information blocks? Fix $n\in\mathbb N$ and $\widehat x\in\widehat X$, and write $x:=\pi_0(\widehat x)$. For every $0\leq k\leq n-1$, the factor identity gives \begin{align*} \pi_0(\widehat T^k\widehat x)=T^k(\pi_0(\widehat x))=T^k x. \end{align*} Thus the length-$n$ lifted name of $\widehat x$ with respect to $\widehat{\mathcal P}$ is exactly the pullback of the length-$n$ name of $x$ with respect to $\mathcal P$: \begin{align*} \widehat{\mathcal P}_0^{n-1}(\widehat x)=\pi_0^{-1}(\mathcal P_0^{n-1}(x))\cap\widehat X. \end{align*} Since $\widehat\mu\circ\pi_0^{-1}=\mu$, the two atoms have the same measure: \begin{align*} \widehat\mu(\widehat{\mathcal P}_0^{n-1}(\widehat x))=\mu(\mathcal P_0^{n-1}(x)). \end{align*} Therefore their information functions agree pointwise: \begin{align*} I_n^{\widehat{\mathcal P}}(\widehat x)=I_n^{\mathcal P}(\pi_0(\widehat x)). \end{align*} The lifted system is now invertible and ergodic, so the invertible case applies to $(\widehat X,\widehat{\mathcal B},\widehat\mu,\widehat T)$ and $\widehat{\mathcal P}$. Hence \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\widehat{\mathcal P}}(\widehat x)=h_{\widehat\mu}(\widehat T,\widehat{\mathcal P}) \end{align*} for $\widehat\mu$-almost every $\widehat x\in\widehat X$. It remains to check that this entropy is the original entropy. For each $n\in\mathbb N$, the atoms of $\bigvee_{k=0}^{n-1}\widehat T^{-k}\widehat{\mathcal P}$ are exactly pullbacks under $\pi_0$ of atoms of $\bigvee_{k=0}^{n-1}T^{-k}\mathcal P$, up to empty sets. Corresponding atoms have equal measure because $\widehat\mu\circ\pi_0^{-1}=\mu$. Therefore the finite-block entropies agree: \begin{align*} H_{\widehat\mu}\left(\bigvee_{k=0}^{n-1}\widehat T^{-k}\widehat{\mathcal P}\right)=H_\mu\left(\bigvee_{k=0}^{n-1}T^{-k}\mathcal P\right). \end{align*} Dividing by $n$ and passing to the defining entropy limits gives \begin{align*} h_{\widehat\mu}(\widehat T,\widehat{\mathcal P})=h_\mu(T,\mathcal P). \end{align*} Combining the information equality with the lifted almost sure convergence yields \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\mathcal P}(\pi_0(\widehat x))=h_\mu(T,\mathcal P) \end{align*} for $\widehat\mu$-almost every $\widehat x\in\widehat X$. [/guided] [/step] [step:Descend the almost sure conclusion to the original system] Define the exceptional set \begin{align*} E:=\left\{x\in X:\lim_{n\to\infty}\frac{1}{n}I_n^{\mathcal P}(x)\neq h_\mu(T,\mathcal P)\right\}. \end{align*} The preceding step proves that \begin{align*} \widehat\mu(\pi_0^{-1}(E))=0. \end{align*} Because $\widehat\mu\circ\pi_0^{-1}=\mu$, we have \begin{align*} \mu(E)=\widehat\mu(\pi_0^{-1}(E))=0. \end{align*} Therefore \begin{align*} \lim_{n\to\infty}\frac{1}{n}I_n^{\mathcal P}(x)=h_\mu(T,\mathcal P) \end{align*} for $\mu$-almost every $x\in X$. Since $I_n^{\mathcal P}(x)=-\log\mu(\mathcal P_0^{n-1}(x))$ by definition, this is exactly the equivalent logarithmic formulation. This completes the proof. [/step]

Prerequisites (0/9 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

Martingale Definition Conditional Probability Definition Extension Theorem Theorem #59 Ergodic Theorem for Markov Chains Theorem #2226 Entropy Formula for Bernoulli Shifts Theorem #6776 Entropy Formula for Topological Markov Chains Theorem #6789 Entropy Formula for a Stationary Markov Shift Theorem #6791 Entropy Formula for Right-Resolving Sofic Shifts Theorem #6798 Entropy Formula for the Gauss Map Theorem #6826 Continuous Image of a Connected Space is Connected Topology Integrability of Continuous Composition Real Analysis Center Manifold Reduction Principle Dynamical Systems Type I Ricci Flow Singularity Model Theorem Analysis Fundamental Theorem Of Calculus Real Analysis Vanishing Average at Large Scales Analysis Leibniz Rule for Distributional Products Distribution Theory Exterior Schwarz–Christoffel Formula Complex Analysis Analysis Area

What brings you to Androma?

Start with a route through the knowledge graph.

Shannon-McMillan-Breiman Theorem (Theorem # 6766)

Discussion

Proof

Prerequisites (0/9 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Shannon-McMillan-Breiman Theorem (Theorem # 6766)

Discussion

Proof

Prerequisites (0/9 completed)

Prerequisites Graph

Explore Further