Kolmogorov-Sinai Generator Theorem — Statement & Proof

Theorem


Discussion

Proof

[proofplan] The inequality $h_\mu(T)\ge h_\mu(T,\mathcal P)$ is immediate once the entropy of $T$ is defined as a supremum over finite partitions, because finite coarsenings of $\mathcal P$ approximate $\mathcal P$ in entropy. For the reverse inequality, we fix an arbitrary finite partition $\mathcal Q$ and approximate it, in conditional entropy, by a finite time-window partition generated by $\mathcal P$. The entropy rate of that finite window is controlled by the entropy rate of $\mathcal P$, because the extra time indices form only a bounded boundary window whose contribution vanishes after division by $n$. Taking the approximation error to zero gives $h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)$ for every finite $\mathcal Q$, hence the supremum is no larger than $h_\mu(T,\mathcal P)$. [/proofplan] [step:Set the notation for time-window partitions and entropy rates] For integers $a\le b$, define the countable time-window partition \begin{align*} \mathcal P_{[a,b]}:=\bigvee_{k=a}^{b}T^{-k}\mathcal P. \end{align*} For a finite measurable partition $\mathcal Q$, define similarly \begin{align*} \mathcal Q_{[0,n-1]}:=\bigvee_{k=0}^{n-1}T^{-k}\mathcal Q. \end{align*} The entropy rate of the partition $\mathcal Q$ is \begin{align*} h_\mu(T,\mathcal Q):=\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal Q_{[0,n-1]}), \end{align*} and the same definition applies to the countable finite-entropy partition $\mathcal P$. The limit exists because the sequence $a_n:=H_\mu(\mathcal P_{[0,n-1]})$ is finite and subadditive. Finiteness follows from $H_\mu(\mathcal P)<\infty$ and the entropy subadditivity bound \begin{align*} H_\mu(\mathcal P_{[0,n-1]})\le \sum_{k=0}^{n-1}H_\mu(T^{-k}\mathcal P)=nH_\mu(\mathcal P)<\infty. 
\end{align*} Subadditivity follows from invariance of $\mu$ and the [Chain Rule for Entropy](/theorems/1635): \begin{align*} H_\mu(\mathcal P_{[0,m+n-1]})\le H_\mu(\mathcal P_{[0,m-1]})+H_\mu(T^{-m}\mathcal P_{[0,n-1]})=H_\mu(\mathcal P_{[0,m-1]})+H_\mu(\mathcal P_{[0,n-1]}). \end{align*} Thus the finite-valued subadditive form of Fekete's lemma applies to $(a_n)_{n\ge1}$. [guided] The notation $\mathcal P_{[a,b]}$ records the atoms of $\mathcal P$ seen from time $a$ through time $b$. This is the partition whose atoms carry the finite names used in entropy computations. The entropy rate $h_\mu(T,\mathcal P)$ is defined from the averages $H_\mu(\mathcal P_{[0,n-1]})/n$, so we must first know that those averages have a limit. For every $n$, the finite entropy assumption and [subadditivity of entropy](/theorems/1634) give \begin{align*} H_\mu(\mathcal P_{[0,n-1]})\le nH_\mu(\mathcal P)<\infty. \end{align*} If a block has length $m+n$, we split it into its first $m$ coordinates and its last $n$ coordinates: \begin{align*} \mathcal P_{[0,m+n-1]}=\mathcal P_{[0,m-1]}\vee T^{-m}\mathcal P_{[0,n-1]}. \end{align*} The chain rule for entropy bounds the entropy of the join by the sum of the two entropies, and $T$-invariance of $\mu$ preserves the atom measures of the shifted partition. Hence \begin{align*} H_\mu(\mathcal P_{[0,m+n-1]})\le H_\mu(\mathcal P_{[0,m-1]})+H_\mu(\mathcal P_{[0,n-1]}). \end{align*} Thus the sequence $a_n=H_\mu(\mathcal P_{[0,n-1]})$ is finite and subadditive. The finite-valued subadditive sequence argument then gives the existence of $\lim_{n\to\infty}a_n/n$, which is the entropy rate $h_\mu(T,\mathcal P)$. [/guided] [/step] [step:Approximate an arbitrary finite partition by a finite generator window] Let $\mathcal Q=\{Q_1,\dots,Q_r\}$ be a finite measurable partition of $X$. For each $N\in\mathbb N$, define the finite-window $\sigma$-algebra \begin{align*} \mathcal F_N:=\sigma(\mathcal P_{[-N,N]}). 
\end{align*} For a sub-$\sigma$-algebra $\mathcal F\subseteq\mathcal A$, define the conditional entropy of the finite partition $\mathcal Q$ relative to $\mathcal F$ by \begin{align*} H_\mu(\mathcal Q\mid\mathcal F):=\int_X -\sum_{i=1}^{r}\mu(Q_i\mid\mathcal F)(x)\log\mu(Q_i\mid\mathcal F)(x)\,d\mu(x), \end{align*} with the convention $0\log0=0$. Because $\mathcal P$ is a two-sided generator, the increasing family $(\mathcal F_N)_{N\ge1}$, after completion, generates the completion of $\mathcal A$ modulo $\mu$. We take conditional expectations with respect to the completed $\sigma$-algebras, which changes the conditional expectations only on $\mu$-null sets. The martingale convergence theorem for conditional expectations gives $\mu(Q_i\mid\mathcal F_N)\to\mathbb 1_{Q_i}$ in $L^1(X,\mathcal A,\mu)$ for each $i\in\{1,\dots,r\}$. These $L^1$ convergences imply that the [conditional probability](/page/Conditional%20Probability) vectors converge in measure to the vertex-valued function $(\mathbb 1_{Q_1},\dots,\mathbb 1_{Q_r})$. Since the entropy function on the probability simplex in $\mathbb R^r$ is continuous, vanishes at the vertices, and is bounded by $\log r$, the corresponding entropy integrands converge in measure to $0$, and bounded convergence along almost-everywhere convergent subsequences gives \begin{align*} \lim_{N\to\infty}H_\mu(\mathcal Q\mid\mathcal F_N)=0. \end{align*} Fix an arbitrary $\varepsilon>0$ and choose $N\in\mathbb N$ such that \begin{align*} H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon. \end{align*} [guided] We fix the finite partition $\mathcal Q=\{Q_1,\dots,Q_r\}$ because $h_\mu(T)$ is obtained by taking the supremum over such partitions. The generator hypothesis says that observing all translates $T^{-k}\mathcal P$ for $k\in\mathbb Z$ recovers the whole $\sigma$-algebra $\mathcal A$ modulo null sets. Hence, as the symmetric window $[-N,N]$ grows, the finite-window information \begin{align*} \mathcal F_N:=\sigma(\mathcal P_{[-N,N]}) \end{align*} recovers more and more of the information needed to decide which atom of $\mathcal Q$ contains a point. 
The quantitative form of this approximation is conditional entropy convergence. For any sub-$\sigma$-algebra $\mathcal F\subseteq\mathcal A$, the conditional entropy of $\mathcal Q$ relative to $\mathcal F$ is \begin{align*} H_\mu(\mathcal Q\mid\mathcal F):=\int_X -\sum_{i=1}^{r}\mu(Q_i\mid\mathcal F)(x)\log\mu(Q_i\mid\mathcal F)(x)\,d\mu(x), \end{align*} where $0\log0=0$. Each atom $Q_i$ is measurable modulo $\mu$ with respect to the completed $\sigma$-algebra generated by $\bigcup_{N\ge1}\mathcal F_N$. We therefore take conditional expectations relative to the completed $\mathcal F_N$; this only changes conditional expectations on $\mu$-null sets. The martingale convergence theorem for conditional expectations applies to the integrable function $\mathbb 1_{Q_i}:X\to\mathbb R$ and gives \begin{align*} \mu(Q_i\mid\mathcal F_N)\to\mathbb 1_{Q_i} \end{align*} in $L^1(X,\mathcal A,\mu)$ as $N\to\infty$. Since $\mathcal Q$ has finitely many atoms, these $L^1$ convergences imply [convergence in measure](/page/Convergence%20in%20Measure) of the conditional probability vector to a vertex of the probability simplex. The entropy function on that simplex is continuous and bounded, so the entropy integrands converge in measure to $0$ and are bounded above by $\log r$. Every subsequence has an almost-everywhere convergent further subsequence, and the [dominated convergence theorem](/theorems/4) applied to that further subsequence forces the integrals to converge to $0$; hence the whole sequence satisfies \begin{align*} H_\mu(\mathcal Q\mid\mathcal F_N)\to0. \end{align*} Thus, for an arbitrary error tolerance $\varepsilon>0$, we may choose $N$ such that \begin{align*} H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon. \end{align*} [/guided] [/step] [step:Bound the entropy rate of $\mathcal Q$ by the entropy rate of the finite window] Let $\mathcal R$ denote the countable partition $\mathcal P_{[-N,N]}$. 
Since $\mathcal F_N=\sigma(\mathcal R)$, the preceding step gives \begin{align*} H_\mu(\mathcal Q\mid\mathcal R)<\varepsilon. \end{align*} The countable partition $\mathcal R$ has finite entropy because it is a finite join of translates of $\mathcal P$, so every entropy below is finite. For every $n\in\mathbb N$, the chain rule for conditional entropy and monotonicity of conditional entropy give \begin{align*} H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})\le\sum_{j=0}^{n-1}H_\mu(T^{-j}\mathcal Q\mid T^{-j}\mathcal R). \end{align*} Since $T$ preserves $\mu$, taking preimages under $T^{j}$ preserves all joint atom measures of $\mathcal Q\vee\mathcal R$. Therefore each summand equals $H_\mu(\mathcal Q\mid\mathcal R)$, hence \begin{align*} H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})<n\varepsilon. \end{align*} Using $H_\mu(\mathcal C)\le H_\mu(\mathcal C\vee\mathcal D)=H_\mu(\mathcal D)+H_\mu(\mathcal C\mid\mathcal D)$ for partitions, with $\mathcal C=\mathcal Q_{[0,n-1]}$ and $\mathcal D=\mathcal R_{[0,n-1]}$, we obtain \begin{align*} H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon. \end{align*} [guided] The partition $\mathcal R=\mathcal P_{[-N,N]}$ is the finite generator window chosen so that $\mathcal Q$ has small conditional entropy relative to $\mathcal R$. We compare the $n$-block name of $\mathcal Q$ with the $n$-block name of this window. The chain rule says that the uncertainty in $\mathcal Q_{[0,n-1]}$ is at most the uncertainty in $\mathcal R_{[0,n-1]}$ plus the remaining uncertainty in $\mathcal Q_{[0,n-1]}$ after $\mathcal R_{[0,n-1]}$ is known: \begin{align*} H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]}). \end{align*} The conditional entropy of a join is bounded by the sum of the conditional entropies of its coordinates. Therefore \begin{align*} H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})\le\sum_{j=0}^{n-1}H_\mu(T^{-j}\mathcal Q\mid T^{-j}\mathcal R). 
\end{align*} Because $T$ preserves $\mu$, each summand equals $H_\mu(\mathcal Q\mid\mathcal R)$, which is less than $\varepsilon$. Hence the total extra uncertainty is less than $n\varepsilon$, and \begin{align*} H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon. \end{align*} This is the entropy-rate comparison: after division by $n$, the cost of replacing $\mathcal Q$ by the generator window is at most $\varepsilon$. [/guided] [/step] [step:Replace the finite window by a bounded enlargement of the original generator block] By the definition of $\mathcal R=\mathcal P_{[-N,N]}$, \begin{align*} \mathcal R_{[0,n-1]}=\bigvee_{j=0}^{n-1}T^{-j}\mathcal P_{[-N,N]}=\mathcal P_{[-N,n-1+N]}. \end{align*} Since $T$ is invertible and preserves $\mu$, pulling back by $T^N$ (that is, replacing each atom $A$ by $T^{-N}A$) turns this partition into $\mathcal P_{[0,n-1+2N]}$ without changing entropy: the induced bijection between atoms, after discarding null atoms, preserves their $\mu$-measures because $\mu(T^{-N}A)=\mu(A)$ for every $A\in\mathcal A$. Hence \begin{align*} H_\mu(\mathcal P_{[-N,n-1+N]})=H_\mu(\mathcal P_{[0,n-1+2N]}). \end{align*} Therefore \begin{align*} \frac{1}{n}H_\mu(\mathcal Q_{[0,n-1]})\le\frac{1}{n}H_\mu(\mathcal P_{[0,n-1+2N]})+\varepsilon. \end{align*} Taking the limit superior as $n\to\infty$ gives \begin{align*} h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon, \end{align*} because the block $\mathcal P_{[0,n-1+2N]}$ has length $n+2N$, so that $\frac{1}{n}H_\mu(\mathcal P_{[0,n-1+2N]})=\frac{n+2N}{n}\cdot\frac{1}{n+2N}H_\mu(\mathcal P_{[0,n-1+2N]})$, where $(n+2N)/n\to1$ and the second factor converges to the entropy rate of $\mathcal P$. [guided] The partition $\mathcal R$ is not a new source of entropy; it is only a bounded time-window of the original generator. Its $n$-block is \begin{align*} \mathcal R_{[0,n-1]}=\mathcal P_{[-N,n-1+N]}. \end{align*} In the invertible case, pulling back by $T^N$ shifts this two-sided block to the forward block $\mathcal P_{[0,n-1+2N]}$ without changing atom measures, so it does not change entropy. Thus the previous step gives \begin{align*} \frac{1}{n}H_\mu(\mathcal Q_{[0,n-1]})\le\frac{1}{n}H_\mu(\mathcal P_{[0,n-1+2N]})+\varepsilon. 
\end{align*} The numerator on the right is the entropy of a block of $\mathcal P$ whose length is $n+2N$. Since $N$ is fixed while $n\to\infty$, the ratio $(n+2N)/n$ tends to $1$, and the existing entropy rate of $\mathcal P$ controls these averages. Taking the limit superior yields \begin{align*} h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon. \end{align*} This proves the desired upper bound for the single finite partition $\mathcal Q$, up to the arbitrary error $\varepsilon$. [/guided] [/step] [step:Take the supremum over finite partitions and recover the countable generator by finite coarsenings] The number $\varepsilon>0$ was arbitrary, so the previous step gives \begin{align*} h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P) \end{align*} for every finite measurable partition $\mathcal Q$ of $X$. Taking the supremum over all finite $\mathcal Q$ yields \begin{align*} h_\mu(T)\le h_\mu(T,\mathcal P). \end{align*} It remains to justify the reverse inequality for the countable partition $\mathcal P$. Enumerate the atoms of $\mathcal P$ as $(P_i)_{i\ge1}$, allowing empty atoms if necessary, and define the finite coarsening $\mathcal P_m$ by \begin{align*} \mathcal P_m:=\{P_1,\dots,P_m,X\setminus\bigcup_{i=1}^{m}P_i\}. \end{align*} For each fixed $m$, the partition $\mathcal P_m$ is finite, so \begin{align*} h_\mu(T,\mathcal P_m)\le h_\mu(T). \end{align*} Since $\mathcal P_m$ is coarser than $\mathcal P$, the chain rule for conditional entropy and monotonicity of conditional entropy give, for every $n\in\mathbb N$, \begin{align*} H_\mu(\mathcal P_{[0,n-1]})\le H_\mu((\mathcal P_m)_{[0,n-1]})+H_\mu(\mathcal P_{[0,n-1]}\mid(\mathcal P_m)_{[0,n-1]}). 
\end{align*} The conditional term is bounded by the sum of the one-time conditional entropies: \begin{align*} H_\mu(\mathcal P_{[0,n-1]}\mid(\mathcal P_m)_{[0,n-1]})\le\sum_{k=0}^{n-1}H_\mu(T^{-k}\mathcal P\mid T^{-k}\mathcal P_m)=nH_\mu(\mathcal P\mid\mathcal P_m), \end{align*} where the equality uses $T$-invariance of $\mu$. Dividing by $n$ and passing to the limit gives \begin{align*} h_\mu(T,\mathcal P)\le h_\mu(T,\mathcal P_m)+H_\mu(\mathcal P\mid\mathcal P_m)\le h_\mu(T)+H_\mu(\mathcal P\mid\mathcal P_m). \end{align*} Because $\mathcal P_m$ is a coarsening of $\mathcal P$, the chain rule gives \begin{align*} H_\mu(\mathcal P\mid\mathcal P_m)=H_\mu(\mathcal P)-H_\mu(\mathcal P_m). \end{align*} The finite entropy assumption $H_\mu(\mathcal P)<\infty$ implies $H_\mu(\mathcal P_m)\uparrow H_\mu(\mathcal P)$ as $m\to\infty$, since the partial entropy sums converge to the countable entropy. Hence $H_\mu(\mathcal P\mid\mathcal P_m)\to0$, and letting $m\to\infty$ yields \begin{align*} h_\mu(T,\mathcal P)\le h_\mu(T). \end{align*} Combining the two inequalities proves \begin{align*} h_\mu(T)=h_\mu(T,\mathcal P)=\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal P_{[0,n-1]}). \end{align*} [guided] The previous step applies to every finite partition $\mathcal Q$. Since $\varepsilon>0$ was arbitrary, it gives $h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)$ for each finite $\mathcal Q$. Taking the supremum over finite partitions in the definition of $h_\mu(T)$ gives \begin{align*} h_\mu(T)\le h_\mu(T,\mathcal P). \end{align*} For the reverse inequality, approximate the countable partition $\mathcal P$ by finite coarsenings. Enumerate its atoms as $(P_i)_{i\ge1}$ and define \begin{align*} \mathcal P_m=\{P_1,\dots,P_m,X\setminus\bigcup_{i=1}^{m}P_i\}. \end{align*} Each $\mathcal P_m$ is finite, so its entropy rate is bounded by the supremum: \begin{align*} h_\mu(T,\mathcal P_m)\le h_\mu(T). 
\end{align*} The chain rule compares the $n$-block of $\mathcal P$ with the $n$-block of $\mathcal P_m$ and gives \begin{align*} H_\mu(\mathcal P_{[0,n-1]})\le H_\mu((\mathcal P_m)_{[0,n-1]})+nH_\mu(\mathcal P\mid\mathcal P_m). \end{align*} After division by $n$ and passage to the entropy-rate limit, \begin{align*} h_\mu(T,\mathcal P)\le h_\mu(T)+H_\mu(\mathcal P\mid\mathcal P_m). \end{align*} Because $H_\mu(\mathcal P)<\infty$, the finite coarsenings exhaust the countable entropy, so $H_\mu(\mathcal P\mid\mathcal P_m)=H_\mu(\mathcal P)-H_\mu(\mathcal P_m)\to0$. Letting $m\to\infty$ gives $h_\mu(T,\mathcal P)\le h_\mu(T)$. Combining the two inequalities proves the theorem in the invertible two-sided case. [/guided] [/step] [step:Adapt the argument to one-sided non-invertible generators] Assume now that $T$ is not necessarily invertible and that $\mathcal P$ is a one-sided generator of finite entropy. For a finite partition $\mathcal Q$, define \begin{align*} \mathcal F_N:=\sigma(\mathcal P_{[0,N]}). \end{align*} The one-sided generator hypothesis gives \begin{align*} H_\mu(\mathcal Q\mid\mathcal F_N)\to0. \end{align*} Choosing $N$ with $H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon$ and setting $\mathcal R:=\mathcal P_{[0,N]}$, the same conditional-entropy argument gives \begin{align*} H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon. \end{align*} Here no inverse of $T$ is needed, because \begin{align*} \mathcal R_{[0,n-1]}=\mathcal P_{[0,n-1+N]}. \end{align*} Dividing by $n$ and passing to the limit superior gives \begin{align*} h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon. \end{align*} Letting $\varepsilon\downarrow0$ and then taking the supremum over finite $\mathcal Q$ proves $h_\mu(T)\le h_\mu(T,\mathcal P)$ in the one-sided setting. 
The reverse inequality $h_\mu(T,\mathcal P)\le h_\mu(T)$ follows from the finite-coarsening argument in the previous step, since that argument used only $T$-invariance of $\mu$ and the forward blocks $\mathcal P_{[0,n-1]}$, not invertibility. This proves the one-sided statement. [guided] In the non-invertible setting we cannot shift negative time indices forward, so we use only forward windows. The one-sided generator hypothesis says that the increasing $\sigma$-algebras \begin{align*} \mathcal F_N=\sigma(\mathcal P_{[0,N]}) \end{align*} generate $\mathcal A$ modulo $\mu$. Repeating the conditional-entropy approximation from the two-sided case gives, for every finite partition $\mathcal Q$ and every $\varepsilon>0$, an $N$ such that $H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon$. Set $\mathcal R=\mathcal P_{[0,N]}$. The same chain-rule argument gives \begin{align*} H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon. \end{align*} Now the $n$-block of $\mathcal R$ is already a forward block: \begin{align*} \mathcal R_{[0,n-1]}=\mathcal P_{[0,n-1+N]}. \end{align*} Dividing by $n$ and taking the limit superior gives $h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon$. Letting $\varepsilon$ tend to $0$ and taking the supremum over finite $\mathcal Q$ gives $h_\mu(T)\le h_\mu(T,\mathcal P)$. The finite-coarsening argument proving $h_\mu(T,\mathcal P)\le h_\mu(T)$ used only forward blocks and $T$-invariance, so it remains valid without invertibility. This proves the one-sided generator theorem. [/guided] [/step]
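As a numerical sanity check on the block-entropy estimates used above (not part of the proof), the following minimal sketch computes $H_\mu(\mathcal P_{[0,n-1]})$ by brute force for a stationary two-state Markov shift, where $\mathcal P$ is the time-zero partition. The transition matrix `trans`, the window size `N = 2`, and all function names are illustrative assumptions, not taken from the text. For such a chain the classical closed-form entropy rate is $-\sum_{i,j}\pi_i p_{ij}\log p_{ij}$, so the averages $H_\mu(\mathcal P_{[0,n-1+2N]})/n$ should approach that value as $n$ grows.

```python
import itertools
import math


def stationary_distribution(trans):
    """Stationary distribution of a two-state chain (hypothetical helper)."""
    a = trans[0][1]  # probability of moving 0 -> 1
    b = trans[1][0]  # probability of moving 1 -> 0
    return [b / (a + b), a / (a + b)]


def block_entropy(trans, pi, n):
    """Entropy (in nats) of the partition into length-n cylinder sets,
    i.e. H_mu(P_[0,n-1]) for the time-zero partition P of the shift."""
    total = 0.0
    for word in itertools.product(range(len(pi)), repeat=n):
        p = pi[word[0]]
        for s, t in zip(word, word[1:]):
            p *= trans[s][t]
        if p > 0.0:
            total -= p * math.log(p)
    return total


def markov_entropy_rate(trans, pi):
    """Closed-form entropy rate of a stationary Markov chain."""
    return -sum(
        pi[i] * trans[i][j] * math.log(trans[i][j])
        for i in range(len(pi))
        for j in range(len(pi))
        if trans[i][j] > 0.0
    )


# Illustrative parameters (assumptions chosen for this example only).
trans = [[0.9, 0.1], [0.4, 0.6]]
pi = stationary_distribution(trans)
h = markov_entropy_rate(trans, pi)
N = 2  # half-width of the generator window P_[-N,N]

# Block entropies H[n] = H_mu(P_[0,n-1]) for n = 1..14.
H = {n: block_entropy(trans, pi, n) for n in range(1, 15)}

for n in (2, 5, 10):
    # Compare the plain average with the enlarged-window average.
    print(n, H[n] / n, H[n + 2 * N] / n, h)
```

In this sketch the gap between `H[n + 2 * N] / n` and `h` shrinks like a constant divided by `n`, which is the vanishing boundary contribution of the window described in the proof plan.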

Prerequisites


Theorems

Chain Rule for Entropy



Kolmogorov-Sinai Generator Theorem (Theorem # 6726)
