Birkhoff Erodic Theorem for Stationary Processes

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We realize the stationary process as a single measure-preserving dynamical system on path space. Strict stationarity says exactly that the path law is invariant under the left shift. Applying the Birkhoff pointwise ergodic theorem to the coordinate function $x\mapsto x_1$ gives almost sure convergence of the time averages to a shift-invariant limit. In the ergodic case, the invariant limit is constant and equals the space average, which is $\mathbb E[X_0]$. [/proofplan] [step:Pass from the stationary process to a shift-invariant path measure] Define the path map \begin{align*} Y:\Omega &\to \mathbb R^{\mathbb Z} \\ \omega &\mapsto (X_t(\omega))_{t\in\mathbb Z}. \end{align*} Let $\mu:=\mathbb P\circ Y^{-1}$ be the law of the process on $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}))$. Define the left shift \begin{align*} S:\mathbb R^{\mathbb Z} &\to \mathbb R^{\mathbb Z} \\ x &\mapsto Sx, \end{align*} where $(Sx)_t=x_{t+1}$ for each $t\in\mathbb Z$. We verify that $S$ preserves $\mu$. Let $m\in\mathbb N$, let $t_1,\dots,t_m\in\mathbb Z$, and let $A_1,\dots,A_m\in\mathcal B(\mathbb R)$. For the cylinder set \begin{align*} C:=\{x\in\mathbb R^{\mathbb Z}:x_{t_j}\in A_j\text{ for }1\le j\le m\}, \end{align*} we have \begin{align*} \mu(S^{-1}C) &=\mathbb P\bigl((Y(\omega))_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)\\ &=\mathbb P\bigl(X_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)\\ &=\mathbb P\bigl(X_{t_j}\in A_j\text{ for }1\le j\le m\bigr)\\ &=\mu(C), \end{align*} where the third equality is strict stationarity. Since cylinder sets form a $\pi$-system generating $\mathcal B(\mathbb R^{\mathbb Z})$, the $\pi$-$\lambda$ theorem gives $\mu(S^{-1}A)=\mu(A)$ for every $A\in\mathcal B(\mathbb R^{\mathbb Z})$. Thus $S$ is measure-preserving on $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu)$. [guided] The goal is to turn the stochastic-process statement into a dynamical-systems statement. The object on which the shift acts is the full path space $\mathbb R^{\mathbb Z}$, whose points are two-sided sequences $x=(x_t)_{t\in\mathbb Z}$. The random map sending an outcome to its whole sample path is \begin{align*} Y:\Omega &\to \mathbb R^{\mathbb Z} \\ \omega &\mapsto (X_t(\omega))_{t\in\mathbb Z}. \end{align*} The probability measure transported to path space is $\mu:=\mathbb P\circ Y^{-1}$. Now define the left shift \begin{align*} S:\mathbb R^{\mathbb Z} &\to \mathbb R^{\mathbb Z} \\ x &\mapsto Sx, \end{align*} with $(Sx)_t=x_{t+1}$. We must check that this transformation preserves $\mu$, because the pointwise ergodic theorem applies to measure-preserving transformations. It is enough to verify invariance on finite-dimensional cylinder sets. Let \begin{align*} C:=\{x\in\mathbb R^{\mathbb Z}:x_{t_j}\in A_j\text{ for }1\le j\le m\}, \end{align*} where $m\in\mathbb N$, $t_1,\dots,t_m\in\mathbb Z$, and $A_1,\dots,A_m\in\mathcal B(\mathbb R)$. Then $S^{-1}C$ is the event that the shifted path has coordinates in the same sets, so \begin{align*} \mu(S^{-1}C) &=\mathbb P\bigl((Y(\omega))_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)\\ &=\mathbb P\bigl(X_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr). \end{align*} Strict stationarity means that shifting all time indices by the same integer does not change any finite-dimensional distribution. Therefore \begin{align*} \mathbb P\bigl(X_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr) = \mathbb P\bigl(X_{t_j}\in A_j\text{ for }1\le j\le m\bigr) = \mu(C). \end{align*} Cylinder sets generate the product Borel $\sigma$-algebra, so the $\pi$-$\lambda$ theorem extends this equality from cylinders to all Borel sets. Hence $\mu(S^{-1}A)=\mu(A)$ for every $A\in\mathcal B(\mathbb R^{\mathbb Z})$. [/guided] [/step] [step:Apply the pointwise ergodic theorem to the first-coordinate function] Define the first-coordinate function \begin{align*} f:\mathbb R^{\mathbb Z} &\to \mathbb R \\ x &\mapsto x_1. \end{align*} The function $f$ is $\mathcal B(\mathbb R^{\mathbb Z})$-measurable. Moreover, \begin{align*} \int_{\mathbb R^{\mathbb Z}} |f(x)|\,d\mu(x) = \mathbb E[|X_1|] = \mathbb E[|X_0|] <\infty, \end{align*} where the second equality follows from strict stationarity. Thus $f\in L^1(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu)$. We use the following almost-everywhere invariant form of the Birkhoff Pointwise Ergodic Theorem: for a probability space, a measure-preserving transformation, and an $L^1$ function, the time averages converge almost everywhere to an integrable measurable function $Z^*$ satisfying $Z^*\circ S=Z^*$ $\mu$-almost everywhere. Applying this theorem to $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu,S)$ and $f$, there exists an integrable measurable function \begin{align*} Z^*:\mathbb R^{\mathbb Z}\to\mathbb R \end{align*} such that $Z^*\circ S=Z^*$ $\mu$-almost everywhere and \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1} f(S^k x)\to Z^*(x) \end{align*} for $\mu$-almost every $x\in\mathbb R^{\mathbb Z}$. Let $E_0\in\mathcal B(\mathbb R^{\mathbb Z})$ be the set on which this convergence holds. Since $S$ is invertible and measure-preserving, the set \begin{align*} E:=\bigcap_{j\in\mathbb Z} S^{-j}E_0 \end{align*} satisfies $\mu(E)=1$ and $S^{-1}E=E$. Define the everywhere $S$-invariant representative \begin{align*} \widehat Z:\mathbb R^{\mathbb Z}&\to\mathbb R \\ x&\mapsto \begin{cases} \displaystyle\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x),& x\in E,\\ 0,& x\notin E. \end{cases} \end{align*} The function $\widehat Z$ is measurable as a pointwise limit on the measurable set $E$, extended by $0$ on $E^c$. If $x\in E$, then $Sx\in E$ and \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1} f(S^k(Sx)) = \frac{n+1}{n}\frac{1}{n+1}\sum_{k=0}^{n} f(S^k x)-\frac{1}{n}f(x) \to \widehat Z(x), \end{align*} so $\widehat Z(Sx)=\widehat Z(x)$. If $x\notin E$, then $Sx\notin E$, hence $\widehat Z(Sx)=0=\widehat Z(x)$. Thus $\widehat Z\circ S=\widehat Z$ everywhere and $\widehat Z=Z^*$ $\mu$-almost everywhere. For each $x\in\mathbb R^{\mathbb Z}$ and each $k\in\{0,\dots,n-1\}$, the definition of $S$ gives \begin{align*} f(S^k x)=(S^k x)_1=x_{k+1}. \end{align*} Therefore \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1} f(S^k x) = \frac{1}{n}\sum_{t=1}^{n}x_t. \end{align*} [guided] The coordinate function to which we apply the pointwise ergodic theorem is \begin{align*} f:\mathbb R^{\mathbb Z} &\to \mathbb R \\ x &\mapsto x_1. \end{align*} It is measurable because it is a coordinate projection on the product measurable space. Its integrability is exactly the integrability hypothesis on the stationary process: \begin{align*} \int_{\mathbb R^{\mathbb Z}} |f(x)|\,d\mu(x) = \mathbb E[|X_1|]. \end{align*} Strict stationarity gives $X_1$ and $X_0$ the same distribution, so \begin{align*} \mathbb E[|X_1|]=\mathbb E[|X_0|]<\infty. \end{align*} Thus $f\in L^1(\mu)$. We now apply the almost-everywhere invariant form of the Birkhoff Pointwise Ergodic Theorem. Its hypotheses are satisfied: $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu)$ is a probability space, the map $S$ is measure-preserving by the previous step, and $f$ is integrable. Hence there exists an integrable measurable function \begin{align*} Z^*:\mathbb R^{\mathbb Z}\to\mathbb R \end{align*} such that $Z^*\circ S=Z^*$ $\mu$-almost everywhere and \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1} f(S^k x)\to Z^*(x) \end{align*} for $\mu$-almost every $x$. Because this conclusion gives invariance only outside a null set, we now choose an everywhere invariant representative. Let $E_0$ be the set where the displayed convergence holds. Since the shift $S$ is invertible and measure-preserving, the set \begin{align*} E:=\bigcap_{j\in\mathbb Z} S^{-j}E_0 \end{align*} has $\mu(E)=1$ and satisfies $S^{-1}E=E$. Define \begin{align*} \widehat Z:\mathbb R^{\mathbb Z}&\to\mathbb R \\ x&\mapsto \begin{cases} \displaystyle\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x),& x\in E,\\ 0,& x\notin E. \end{cases} \end{align*} This function is measurable because it is the pointwise limit of measurable averages on $E$ and is extended by a constant on $E^c$. It agrees with $Z^*$ $\mu$-almost everywhere. Moreover, if $x\in E$, then $Sx\in E$, and shifting the average changes only the first term and the normalization: \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1} f(S^k(Sx)) = \frac{n+1}{n}\frac{1}{n+1}\sum_{k=0}^{n} f(S^k x)-\frac{1}{n}f(x). \end{align*} Taking limits gives $\widehat Z(Sx)=\widehat Z(x)$. If $x\notin E$, then $Sx\notin E$, so both values are $0$. Thus $\widehat Z\circ S=\widehat Z$ everywhere. Finally we translate this back into the coordinates of the sequence. Since $(S^k x)_1=x_{k+1}$, we have \begin{align*} f(S^k x)=(S^k x)_1=x_{k+1}. \end{align*} Consequently \begin{align*} \frac{1}{n}\sum_{k=0}^{n-1} f(S^k x) = \frac{1}{n}\sum_{t=1}^{n}x_t. \end{align*} This is exactly the sample average appearing in the theorem. [/guided] [/step] [step:Pull the everywhere invariant representative back to the original probability space] Define \begin{align*} X^*:\Omega &\to \mathbb R \\ \omega &\mapsto \widehat Z(Y(\omega)). \end{align*} Since $Y$ and $\widehat Z$ are measurable, $X^*$ is an $\mathcal F$-measurable real-valued random variable. Since $\widehat Z\circ S=\widehat Z$ everywhere, $X^*$ is shift-invariant in the stated pathwise sense. The convergence obtained on path space transfers through the path map. Indeed, since $\mu=\mathbb P\circ Y^{-1}$ and $\mu(E)=1$, we have $\mathbb P(Y^{-1}(E))=1$. For every $\omega\in Y^{-1}(E)$, \begin{align*} \frac{1}{n}\sum_{t=1}^{n}X_t(\omega) = \frac{1}{n}\sum_{t=1}^{n}(Y(\omega))_t \to \widehat Z(Y(\omega)) = X^*(\omega). \end{align*} This proves the almost sure convergence assertion. [guided] We now return from the canonical path space to the original probability space. The invariant limit on path space is the measurable function $\widehat Z:\mathbb R^{\mathbb Z}\to\mathbb R$ constructed in the previous step, so we define the random variable \begin{align*} X^*:\Omega &\to \mathbb R \\ \omega &\mapsto \widehat Z(Y(\omega)). \end{align*} This map is measurable because $Y$ is measurable and $\widehat Z$ is measurable. The reason for replacing $Z^*$ by $\widehat Z$ is precisely invariance. The Birkhoff theorem only produced $Z^*\circ S=Z^*$ $\mu$-almost everywhere, while the representative $\widehat Z$ satisfies $\widehat Z\circ S=\widehat Z$ everywhere. Therefore the pulled-back random variable is shift-invariant in the pathwise sense encoded by the sample-path map. It remains to transfer convergence. Since $\mu=\mathbb P\circ Y^{-1}$ and $\mu(E)=1$, the event $Y^{-1}(E)$ has probability $1$. For every $\omega\in Y^{-1}(E)$, the path $Y(\omega)$ lies in the set where the averages converge to $\widehat Z$, and hence \begin{align*} \frac{1}{n}\sum_{t=1}^{n}X_t(\omega) = \frac{1}{n}\sum_{t=1}^{n}(Y(\omega))_t \to \widehat Z(Y(\omega)) = X^*(\omega). \end{align*} Thus the convergence holds $\mathbb P$-almost surely on the original probability space. [/guided] [/step] [step:Identify the limit under ergodicity] Assume the stationary process is ergodic, meaning that every $S$-invariant Borel set in $\mathbb R^{\mathbb Z}$ has $\mu$-measure $0$ or $1$. The Birkhoff Pointwise Ergodic Theorem in the ergodic case gives \begin{align*} Z^*(x)=\int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y) \end{align*} for $\mu$-almost every $x\in\mathbb R^{\mathbb Z}$. Since $\widehat Z=Z^*$ $\mu$-almost everywhere, the same identity holds with $\widehat Z$ in place of $Z^*$. By the definition of $f$ and $\mu$, \begin{align*} \int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y) = \mathbb E[X_1] = \mathbb E[X_0], \end{align*} where the final equality follows from strict stationarity. Pulling this identity back through $Y$ gives \begin{align*} X^*(\omega)=\widehat Z(Y(\omega))=\mathbb E[X_0] \end{align*} for $\mathbb P$-almost every $\omega\in\Omega$. This proves the ergodic conclusion and completes the proof. [guided] In the non-ergodic case, Birkhoff gives a limit that may depend on the invariant part of the trajectory. Ergodicity removes that dependence: every invariant measurable function is almost surely constant. The ergodic form of the Birkhoff Pointwise Ergodic Theorem states that the limit equals the space average: \begin{align*} Z^*(x)=\int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y) \end{align*} for $\mu$-almost every $x\in\mathbb R^{\mathbb Z}$. We compute this space average using the definition of the path law. Since $f(y)=y_1$, \begin{align*} \int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y) = \mathbb E[X_1]. \end{align*} Strict stationarity gives $X_1$ and $X_0$ the same distribution. Since $X_0$ is integrable, both expectations are finite and equal: \begin{align*} \mathbb E[X_1]=\mathbb E[X_0]. \end{align*} Therefore \begin{align*} \widehat Z(x)=\mathbb E[X_0] \end{align*} for $\mu$-almost every $x$. Because $X^*=\widehat Z\circ Y$ and $\mu=\mathbb P\circ Y^{-1}$, the same identity holds on the original probability space: \begin{align*} X^*(\omega)=\mathbb E[X_0] \end{align*} for $\mathbb P$-almost every $\omega\in\Omega$. [/guided] [/step]

What brings you to Androma?

Start with a route through the knowledge graph.