Birkhoff Ergodic Theorem for Sample Means — Statement & Proof

Birkhoff Ergodic Theorem for Sample Means (Theorem # 3623)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We apply the Birkhoff-Khinchin ergodic theorem to the integrable stationary process $(X_t)_{t\in\mathbb Z}$, which identifies the almost sure limit of the time averages as the conditional expectation of $X_0$ with respect to the invariant $\sigma$-algebra. Ergodicity forces every invariant event to have probability $0$ or $1$, so every invariant random variable is almost surely constant. Finally, the defining property of conditional expectation identifies that constant with $\mathbb E[X_0]$. [/proofplan] [step:Apply the ergodic theorem to identify the almost sure limit] Let $(\Omega,\mathcal F,\mathbb P)$ be the underlying probability space. Define the path map \begin{align*} X:\Omega&\to\mathbb R^{\mathbb Z}\\ \omega&\mapsto (X_t(\omega))_{t\in\mathbb Z}, \end{align*} and let $\mu:=\mathbb P\circ X^{-1}$ be the law of the two-sided process on the product measurable space $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R)^{\otimes\mathbb Z})$. Let $\theta:\mathbb R^{\mathbb Z}\to\mathbb R^{\mathbb Z}$ be the left shift map, defined by $(\theta x)_t=x_{t+1}$ for $x=(x_t)_{t\in\mathbb Z}$ and $t\in\mathbb Z$. Define the invariant $\sigma$-algebra \begin{align*} \mathcal I:=\{X^{-1}(B):B\in\mathcal B(\mathbb R)^{\otimes\mathbb Z}\text{ and }\theta^{-1}B=B\text{ modulo }\mu\}\subseteq\mathcal F. \end{align*} Since $\mathbb E[|X_0|]<\infty$, the [Birkhoff-Khinchin Ergodic Theorem for Stationary Processes](/theorems/???) applies to $(X_t)_{t\in\mathbb Z}$ and gives an integrable $\mathcal I$-measurable random variable $Y:\Omega\to\mathbb R$ such that \begin{align*} Y=\mathbb E[X_0\mid\mathcal I] \end{align*} and \begin{align*} \frac{1}{n}\sum_{t=1}^{n}X_t \to Y \end{align*} $\mathbb P$-a.s. as $n\to\infty$. [guided] Let $(\Omega,\mathcal F,\mathbb P)$ be the underlying probability space. To make the invariant $\sigma$-algebra precise, define the path map \begin{align*} X:\Omega&\to\mathbb R^{\mathbb Z}\\ \omega&\mapsto (X_t(\omega))_{t\in\mathbb Z}, \end{align*} and let $\mu:=\mathbb P\circ X^{-1}$ be the law of the two-sided process on $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R)^{\otimes\mathbb Z})$. Let $\theta:\mathbb R^{\mathbb Z}\to\mathbb R^{\mathbb Z}$ be the left shift map, defined by $(\theta x)_t=x_{t+1}$ for $x=(x_t)_{t\in\mathbb Z}$ and $t\in\mathbb Z$. We define \begin{align*} \mathcal I:=\{X^{-1}(B):B\in\mathcal B(\mathbb R)^{\otimes\mathbb Z}\text{ and }\theta^{-1}B=B\text{ modulo }\mu\}\subseteq\mathcal F. \end{align*} Thus $\mathcal I$ consists of events determined by the two-sided trajectory whose occurrence is unchanged, up to null sets, by shifting the trajectory. The hypothesis $\mathbb E[|X_0|]<\infty$ is exactly the integrability hypothesis needed to apply the [Birkhoff-Khinchin Ergodic Theorem for Stationary Processes](/theorems/???). That theorem states that the time averages of an integrable stationary process converge almost surely to the conditional expectation of one coordinate with respect to the invariant $\sigma$-algebra. Therefore there exists an integrable $\mathcal I$-measurable random variable $Y:\Omega\to\mathbb R$ satisfying \begin{align*} Y=\mathbb E[X_0\mid\mathcal I] \end{align*} and \begin{align*} \frac{1}{n}\sum_{t=1}^{n}X_t \to Y \end{align*} $\mathbb P$-a.s. as $n\to\infty$. This is the only place where the convergence theorem is used. The remaining work is to show that ergodicity turns the conditional expectation $Y$ into the constant $\mathbb E[X_0]$. [/guided] [/step] [step:Use ergodicity to show the invariant conditional expectation is constant] Because the process is ergodic, every event $A\in\mathcal I$ satisfies $\mathbb P(A)\in\{0,1\}$. Since $Y$ is $\mathcal I$-measurable, for every $a\in\mathbb R$ the event \begin{align*} A_a:=\{\omega\in\Omega:Y(\omega)\le a\} \end{align*} belongs to $\mathcal I$, and hence $\mathbb P(A_a)\in\{0,1\}$. Let $\mathbb Q\subset\mathbb R$ denote the set of rational numbers. Define \begin{align*} c:=\inf\{q\in\mathbb Q:\mathbb P(Y\le q)=1\}. \end{align*} The defining set is nonempty: since $Y$ is finite $\mathbb P$-a.s., the events $\{Y\le m\}$ for $m\in\mathbb N$ increase to an event of probability $1$, so continuity from below gives \begin{align*} \lim_{m\to\infty}\mathbb P(Y\le m)=1. \end{align*} Each event $\{Y\le m\}$ lies in $\mathcal I$, hence each probability $\mathbb P(Y\le m)$ belongs to $\{0,1\}$. Therefore $\mathbb P(Y\le m_0)=1$ for some $m_0\in\mathbb N$. The defining set is not all of $\mathbb Q$: since $Y$ is finite $\mathbb P$-a.s., the events $\{Y\le -m\}$ for $m\in\mathbb N$ decrease to an event of probability $0$, so continuity from above gives \begin{align*} \lim_{m\to\infty}\mathbb P(Y\le -m)=0. \end{align*} Again $\mathbb P(Y\le -m)\in\{0,1\}$ for every $m\in\mathbb N$, so $\mathbb P(Y\le -m_1)=0$ for some $m_1\in\mathbb N$. Thus $c\in\mathbb R$ is well-defined. If $q\in\mathbb Q$ and $q>c$, then by the definition of the infimum there exists $r\in\mathbb Q$ with $c<r<q$ and $\mathbb P(Y\le r)=1$, so $\mathbb P(Y\le q)=1$. If $q\in\mathbb Q$ and $q<c$, then $\mathbb P(Y\le q)\ne1$, and therefore $\mathbb P(Y\le q)=0$. Thus \begin{align*} \mathbb P(Y\le q)=0 \quad \text{for all } q\in\mathbb Q,\ q<c, \end{align*} and \begin{align*} \mathbb P(Y\le q)=1 \quad \text{for all } q\in\mathbb Q,\ q>c. \end{align*} Taking a countable union over rational $q<c$ and a countable intersection over rational $q>c$ gives \begin{align*} \mathbb P(Y<c)=0, \end{align*} and \begin{align*} \mathbb P(Y\le c)=1. \end{align*} Therefore $Y=c$ $\mathbb P$-a.s. [guided] The key point is that an $\mathcal I$-measurable random variable can only depend on invariant information. Under ergodicity, invariant information is probabilistically trivial. For each $a\in\mathbb R$, define the event \begin{align*} A_a:=\{\omega\in\Omega:Y(\omega)\le a\}. \end{align*} Because $Y$ is $\mathcal I$-measurable, $A_a\in\mathcal I$. Since the process is ergodic, every event in $\mathcal I$ has probability either $0$ or $1$, so \begin{align*} \mathbb P(A_a)\in\{0,1\}. \end{align*} We now prove that this forces $Y$ to be almost surely constant. Let $\mathbb Q\subset\mathbb R$ denote the set of rational numbers, and define \begin{align*} c:=\inf\{q\in\mathbb Q:\mathbb P(Y\le q)=1\}. \end{align*} The rational numbers are used so that the later unions and intersections are countable. We must justify that this infimum is finite. Because $Y:\Omega\to\mathbb R$ is finite $\mathbb P$-a.s., the events $\{Y\le m\}$ for $m\in\mathbb N$ increase to an event of probability $1$. Continuity from below for the probability measure $\mathbb P$ gives \begin{align*} \lim_{m\to\infty}\mathbb P(Y\le m)=1. \end{align*} For every $m\in\mathbb N$, the threshold event $\{Y\le m\}$ lies in $\mathcal I$, so ergodicity gives $\mathbb P(Y\le m)\in\{0,1\}$. A sequence taking only the values $0$ and $1$ and converging to $1$ must equal $1$ for at least one index; hence there exists $m_0\in\mathbb N$ such that $\mathbb P(Y\le m_0)=1$. Thus the defining set is nonempty. We also need to know that the defining set is not all of $\mathbb Q$. Since $Y$ is real-valued $\mathbb P$-a.s., the events $\{Y\le -m\}$ for $m\in\mathbb N$ decrease to an event of probability $0$. Continuity from above for $\mathbb P$ gives \begin{align*} \lim_{m\to\infty}\mathbb P(Y\le -m)=0. \end{align*} Each probability $\mathbb P(Y\le -m)$ is again either $0$ or $1$, so there exists $m_1\in\mathbb N$ such that $\mathbb P(Y\le -m_1)=0$. Therefore $-m_1$ does not belong to the defining set, and $c\in\mathbb R$. If $q\in\mathbb Q$ and $q>c$, then by the definition of infimum there exists $r\in\mathbb Q$ such that $c<r<q$ and $\mathbb P(Y\le r)=1$. Since $\{Y\le r\}\subseteq\{Y\le q\}$, we get \begin{align*} \mathbb P(Y\le q)=1. \end{align*} If $q\in\mathbb Q$ and $q<c$, then $q$ cannot belong to the defining set for $c$, so $\mathbb P(Y\le q)\ne1$. But this probability is either $0$ or $1$, hence \begin{align*} \mathbb P(Y\le q)=0. \end{align*} Now countability of $\mathbb Q$ lets us pass from rational thresholds to the random variable itself: \begin{align*} \{Y<c\}=\bigcup_{\substack{q\in\mathbb Q\\ q<c}}\{Y\le q\}, \end{align*} so $\mathbb P(Y<c)=0$. Also, \begin{align*} \{Y>c\}=\bigcup_{\substack{q\in\mathbb Q\\ q>c}}\{Y>q\}, \end{align*} and each event $\{Y>q\}$ has probability $0$, because $\mathbb P(Y\le q)=1$ for every rational $q>c$. Therefore $\mathbb P(Y>c)=0$. Combining the two estimates gives $Y=c$ $\mathbb P$-a.s. [/guided] [/step] [step:Identify the constant as the expectation of $X_0$] Since $Y=\mathbb E[X_0\mid\mathcal I]$, the defining property of conditional expectation applied to the event $\Omega\in\mathcal I$ gives \begin{align*} \mathbb E[Y]=\mathbb E[X_0]. \end{align*} Because $Y=c$ $\mathbb P$-a.s., we also have $\mathbb E[Y]=c$. Hence \begin{align*} c=\mathbb E[X_0]. \end{align*} Combining this identity with the almost sure convergence from the first step gives \begin{align*} \bar X_n=\frac{1}{n}\sum_{t=1}^{n}X_t \to \mathbb E[X_0] \end{align*} $\mathbb P$-a.s. as $n\to\infty$. [/step]

Explore Further

Fisher–Neyman Sufficiency of the Sample Mean and Sample Covariance for the Multivariate Normal Family probability Maximum Likelihood Estimators for the Multivariate Normal Distribution probability Rayleigh–Ritz Variance Maximisation Characterisation probability Bartlett Decomposition probability Principal Component Variances probability Bonferroni Simultaneous Confidence Intervals for Linear Contrasts of a Multivariate Normal Mean probability Bartlett Chi-Squared Approximation for Wilks' Lambda Statistic probability Eckart–Young–Mirsky Theorem probability

What brings you to Androma?

Start with a route through the knowledge graph.

Birkhoff Ergodic Theorem for Sample Means (Theorem # 3623)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Birkhoff Ergodic Theorem for Sample Means (Theorem # 3623)

Discussion

Proof

Explore Further