[proofplan]
We view composition with $T$ as the Koopman isometry $U$ on $L^2(X,\mathcal B,\mu)$ and apply the mean ergodic theorem to identify the $L^2$ limit of the ergodic averages of $f$ as the orthogonal projection of $f$ onto the fixed-vector subspace of $U$. We then identify that subspace with $L^2(X,\mathcal I,\mu)$, the square-integrable functions measurable with respect to the invariant $\sigma$-algebra $\mathcal I$. Testing the projection against indicators of invariant sets gives the defining integral identity for conditional expectation onto $\mathcal I$, and uniqueness of conditional expectation then completes the identification.
[/proofplan]
[step:Apply the mean ergodic theorem to the Koopman isometry]
Let $\mathcal H := L^2(X,\mathcal B,\mu)$, equipped with the inner product
\begin{align*}
\langle g,h\rangle_{\mathcal H}
:=
\int_X g(x)\overline{h(x)}\,d\mu(x).
\end{align*}
Define the Koopman operator
\begin{align*}
U:\mathcal H &\to \mathcal H \\
g &\mapsto g\circ T .
\end{align*}
Since $T$ is measure-preserving, $U$ is well-defined on $L^2$ equivalence classes and
\begin{align*}
\|Ug\|_{\mathcal H}^2
&=
\int_X |g(Tx)|^2\,d\mu(x) \\
&=
\int_X |g(y)|^2\,d\mu(y)
=
\|g\|_{\mathcal H}^2 .
\end{align*}
Thus $U$ is a linear isometry.
For each $N\in \mathbb N$, define the ergodic averaging operator
\begin{align*}
M_N:\mathcal H &\to \mathcal H \\
g &\mapsto \frac{1}{N}\sum_{n=0}^{N-1}U^n g .
\end{align*}
Let
\begin{align*}
\mathcal H_1:=\{g\in \mathcal H: Ug=g\}.
\end{align*}
By the [Von Neumann Mean Ergodic Theorem](/theorems/3448), applied to the Hilbert space $\mathcal H$ and the isometry $U$, there is an orthogonal projection
\begin{align*}
P:\mathcal H &\to \mathcal H_1
\end{align*}
such that
\begin{align*}
\lim_{N\to\infty}\|M_N f-Pf\|_{\mathcal H}=0.
\end{align*}
[guided]
The ergodic averages are averages of iterates of the composition operator induced by $T$, so we first put them into Hilbert-space form. Let $\mathcal H:=L^2(X,\mathcal B,\mu)$, with inner product
\begin{align*}
\langle g,h\rangle_{\mathcal H}
:=
\int_X g(x)\overline{h(x)}\,d\mu(x).
\end{align*}
Define
\begin{align*}
U:\mathcal H &\to \mathcal H \\
g &\mapsto g\circ T .
\end{align*}
This operator is well-defined on $L^2$ classes: if two representatives agree outside a $\mu$-null set $Z\in\mathcal B$, then their compositions with $T$ agree outside $T^{-1}Z$, and $\mu(T^{-1}Z)=\mu(Z)=0$ because $T$ is measure-preserving.
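The same measure-preserving property gives $\int_X \varphi\circ T\,d\mu=\int_X\varphi\,d\mu$ for every nonnegative measurable $\varphi:X\to[0,\infty]$. Applying this with $\varphi=|g|^2$ gives the isometry identity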
\begin{align*}
\|Ug\|_{\mathcal H}^2
&=
\int_X |g(Tx)|^2\,d\mu(x) \\
&=
\int_X |g(y)|^2\,d\mu(y)
=
\|g\|_{\mathcal H}^2 .
\end{align*}
Thus $U$ is a linear isometry of the Hilbert space $\mathcal H$.
For $N\in\mathbb N$, define
\begin{align*}
M_N:\mathcal H &\to \mathcal H \\
g &\mapsto \frac{1}{N}\sum_{n=0}^{N-1}U^n g .
\end{align*}
The fixed-vector subspace of $U$ is
\begin{align*}
\mathcal H_1:=\{g\in \mathcal H: Ug=g\}.
\end{align*}
The [Von Neumann Mean Ergodic Theorem](/theorems/3448) applies because $\mathcal H$ is a Hilbert space and $U$ is an isometry, hence a contraction. It gives the orthogonal projection
\begin{align*}
P:\mathcal H &\to \mathcal H_1
\end{align*}
and the convergence
\begin{align*}
\lim_{N\to\infty}\|M_N f-Pf\|_{\mathcal H}=0.
\end{align*}
So the problem is reduced to identifying the projection $Pf$ with the conditional expectation onto $\mathcal I$.
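As an illustration only, and not as part of the proof, consider the following two-point system, which is an assumption made for this example. Take $X=\{0,1\}$ with $\mu(\{0\})=\mu(\{1\})=\tfrac12$, and let $T$ swap the two points. A function $f$ is determined by the pair $(a,b)=(f(0),f(1))$, and $Uf$ corresponds to $(b,a)$, so $\mathcal H_1$ consists of the constant functions. Counting how often each value occurs along the orbit of $0$ gives
\begin{align*}
M_N f(0)
=
\begin{cases}
\dfrac{a+b}{2}, & N \text{ even},\\[2mm]
\dfrac{a+b}{2}+\dfrac{a-b}{2N}, & N \text{ odd},
\end{cases}
\end{align*}
and the same formula with $a$ and $b$ interchanged holds for $M_N f(1)$. Consequently
\begin{align*}
\left\|M_N f-\frac{a+b}{2}\right\|_{\mathcal H}
\leq
\frac{|a-b|}{2N}
\longrightarrow 0,
\end{align*}
so in this example $Pf$ is the constant function $\tfrac{a+b}{2}$, the mean of $f$.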
[/guided]
[/step]
[step:Identify fixed vectors with functions measurable over $\mathcal I$]
Let
\begin{align*}
\mathcal K:=L^2(X,\mathcal I,\mu)
\end{align*}
be the closed subspace of $\mathcal H$ consisting of $L^2$ classes admitting an $\mathcal I$-measurable representative.
[claim:The fixed-vector subspace equals $L^2(X,\mathcal I,\mu)$]
One has $\mathcal H_1=\mathcal K$.
[/claim]
[proof]
Let $\mathscr Q$ be the countable basis of $\mathbb C$ consisting of open balls with centers in $\mathbb Q+i\mathbb Q$ and positive rational radii.
First let $h:X\to\mathbb C$ be an $\mathcal I$-measurable representative of an element of $\mathcal K$. For every $Q\in\mathscr Q$, the set $h^{-1}(Q)$ belongs to $\mathcal I$, so
\begin{align*}
T^{-1}(h^{-1}(Q))=h^{-1}(Q).
\end{align*}
Equivalently,
\begin{align*}
(h\circ T)^{-1}(Q)=h^{-1}(Q)
\end{align*}
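for every $Q\in\mathscr Q$. If $h(Tx)\neq h(x)$ for some $x\in X$, then, since $\mathscr Q$ separates points of $\mathbb C$, some $Q\in\mathscr Q$ contains exactly one of $h(Tx)$ and $h(x)$, so $x$ lies in exactly one of $(h\circ T)^{-1}(Q)$ and $h^{-1}(Q)$, contradicting the displayed equality. Hence $h\circ T=h$ pointwise, and therefore $\mathcal K\subseteq \mathcal H_1$.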
Conversely, let $g:X\to\mathbb C$ be an everywhere finite, $\mathcal B$-measurable representative of an element of $\mathcal H_1$. Since $Ug=g$ in $L^2$, the set
\begin{align*}
N:=\{x\in X:g(Tx)\neq g(x)\}
\end{align*}
satisfies $\mu(N)=0$. For $Q\in\mathscr Q$, define
\begin{align*}
E_Q:=g^{-1}(Q).
\end{align*}
Then $T^{-1}E_Q\triangle E_Q\subseteq N$, where $\triangle$ denotes symmetric difference, and hence
\begin{align*}
\mu(T^{-1}E_Q\triangle E_Q)=0.
\end{align*}
For $n\geq 0$, write $T^n:X\to X$ for the $n$-fold iterate, with $T^0=\operatorname{id}_X$, and write $T^{-n}E=(T^n)^{-1}(E)$. Since $T^n$ is measure-preserving for every $n\geq 0$, induction using
\begin{align*}
T^{-(n+1)}E_Q\triangle E_Q
\subseteq
T^{-n}(T^{-1}E_Q\triangle E_Q)\cup (T^{-n}E_Q\triangle E_Q)
\end{align*}
gives
\begin{align*}
\mu(T^{-n}E_Q\triangle E_Q)=0
\end{align*}
for every $n\geq 0$.
Define
\begin{align*}
A_Q:=\bigcap_{m=0}^{\infty}\bigcup_{n=m}^{\infty}T^{-n}E_Q .
\end{align*}
Then $A_Q\in\mathcal B$ and
\begin{align*}
T^{-1}A_Q
&=
\bigcap_{m=0}^{\infty}\bigcup_{n=m}^{\infty}T^{-(n+1)}E_Q \\
&=
\bigcap_{m=1}^{\infty}\bigcup_{n=m}^{\infty}T^{-n}E_Q
=
A_Q.
\end{align*}
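Here the last equality holds because the sets $\bigcup_{n\geq m}T^{-n}E_Q$ decrease as $m$ increases, so omitting the $m=0$ term does not change the intersection. Thus $A_Q\in\mathcal I$. Moreover, with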
\begin{align*}
N_Q:=\bigcup_{n=0}^{\infty}(T^{-n}E_Q\triangle E_Q),
\end{align*}
we have $\mu(N_Q)=0$ and $A_Q\triangle E_Q\subseteq N_Q$. Hence $g^{-1}(Q)$ differs by a null set from an element of $\mathcal I$ for every $Q\in\mathscr Q$.
Let $\overline{\mathcal I}^{\mu}$ denote the $\mu$-completion of $\mathcal I$. The collection
\begin{align*}
\mathscr S:=\{C\in\mathcal B(\mathbb C):g^{-1}(C)\in \overline{\mathcal I}^{\mu}\}
\end{align*}
is a $\sigma$-algebra containing $\mathscr Q$, and therefore $\mathscr S=\mathcal B(\mathbb C)$. Thus $g$ is $\overline{\mathcal I}^{\mu}$-measurable. Since $\mathbb C$ with its Borel $\sigma$-algebra is a standard Borel space, the measurable modification lemma for completed sub-$\sigma$-algebras gives an $\mathcal I$-measurable map
\begin{align*}
g_{\mathcal I}:X&\to\mathbb C
\end{align*}
such that $g_{\mathcal I}=g$ $\mu$-a.e. Hence the $L^2$ class of $g$ belongs to $\mathcal K$, proving $\mathcal H_1\subseteq\mathcal K$.
[/proof]
Since $Pf\in\mathcal H_1$ and $\mathcal H_1=\mathcal K$, it follows that $Pf\in L^2(X,\mathcal I,\mu)$.
[guided]
We must connect two notions of invariance. The Hilbert-space limit from the mean ergodic theorem lands in
\begin{align*}
\mathcal H_1=\{g\in L^2(X,\mathcal B,\mu):g\circ T=g \text{ in } L^2\},
\end{align*}
whereas conditional expectation onto $\mathcal I$ is characterized among $\mathcal I$-measurable functions. Let
\begin{align*}
\mathcal K:=L^2(X,\mathcal I,\mu).
\end{align*}
We prove that $\mathcal H_1=\mathcal K$.
[claim:The fixed-vector subspace equals $L^2(X,\mathcal I,\mu)$]
One has $\mathcal H_1=\mathcal K$.
[/claim]
[proof]
Let $\mathscr Q$ be the countable basis of $\mathbb C$ consisting of open balls with centers in $\mathbb Q+i\mathbb Q$ and positive rational radii. This basis is used because it is countable and separates points: if $z\neq w$ in $\mathbb C$, then some $Q\in\mathscr Q$ contains one of $z,w$ and not the other.
First take an $\mathcal I$-measurable representative
\begin{align*}
h:X&\to\mathbb C
\end{align*}
of an element of $\mathcal K$. For each $Q\in\mathscr Q$, the preimage $h^{-1}(Q)$ belongs to $\mathcal I$. By the definition of $\mathcal I$,
\begin{align*}
T^{-1}(h^{-1}(Q))=h^{-1}(Q).
\end{align*}
Since $T^{-1}(h^{-1}(Q))=(h\circ T)^{-1}(Q)$, we get
\begin{align*}
(h\circ T)^{-1}(Q)=h^{-1}(Q)
\end{align*}
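for every $Q\in\mathscr Q$. If $h(Tx)\neq h(x)$ for some $x\in X$, then a separating set $Q\in\mathscr Q$ would contain exactly one of $h(Tx)$ and $h(x)$, so $x$ would lie in exactly one of $(h\circ T)^{-1}(Q)$ and $h^{-1}(Q)$, contradicting this equality. Hence $h(Tx)=h(x)$ for every $x\in X$. Thus the $L^2$ class of $h$ lies in $\mathcal H_1$, proving $\mathcal K\subseteq\mathcal H_1$.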
Now take an element of $\mathcal H_1$ and choose an everywhere finite, $\mathcal B$-measurable representative
\begin{align*}
g:X&\to\mathbb C .
\end{align*}
The equality $Ug=g$ in $L^2$ means that
\begin{align*}
N:=\{x\in X:g(Tx)\neq g(x)\}
\end{align*}
has $\mu(N)=0$. For $Q\in\mathscr Q$, set
\begin{align*}
E_Q:=g^{-1}(Q).
\end{align*}
If $x\notin N$, then $g(Tx)=g(x)$, so $x\in T^{-1}E_Q$ exactly when $x\in E_Q$. Therefore
\begin{align*}
T^{-1}E_Q\triangle E_Q\subseteq N,
\end{align*}
and hence
\begin{align*}
\mu(T^{-1}E_Q\triangle E_Q)=0.
\end{align*}
For $n\geq 0$, let $T^n:X\to X$ be the $n$-fold iterate, with $T^0=\operatorname{id}_X$, and write $T^{-n}E=(T^n)^{-1}(E)$. Since $T$ is measure-preserving, every iterate $T^n$ is measure-preserving. The induction step is the inclusion
\begin{align*}
T^{-(n+1)}E_Q\triangle E_Q
\subseteq
T^{-n}(T^{-1}E_Q\triangle E_Q)\cup (T^{-n}E_Q\triangle E_Q).
\end{align*}
The first set on the right has measure zero because $T^n$ is measure-preserving and $T^{-1}E_Q\triangle E_Q$ has measure zero; the second has measure zero by the induction hypothesis. Hence
\begin{align*}
\mu(T^{-n}E_Q\triangle E_Q)=0
\end{align*}
for every $n\geq 0$.
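For completeness, here is a sketch of why the inclusion in the induction step holds. It uses only two elementary set facts, stated with auxiliary sets $F_1,F_2,F_3\subseteq X$ that are notation introduced for this remark: preimages commute with symmetric differences, and $F_1\triangle F_3\subseteq (F_1\triangle F_2)\cup(F_2\triangle F_3)$. Taking $F_1=T^{-(n+1)}E_Q$, $F_2=T^{-n}E_Q$, and $F_3=E_Q$ gives
\begin{align*}
F_1\triangle F_2
&=
T^{-n}(T^{-1}E_Q)\triangle T^{-n}E_Q
=
T^{-n}(T^{-1}E_Q\triangle E_Q), \\
T^{-(n+1)}E_Q\triangle E_Q
&\subseteq
T^{-n}(T^{-1}E_Q\triangle E_Q)\cup (T^{-n}E_Q\triangle E_Q).
\end{align*}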
We now replace the almost-invariant set $E_Q$ by an exactly invariant set. Define
\begin{align*}
A_Q:=\bigcap_{m=0}^{\infty}\bigcup_{n=m}^{\infty}T^{-n}E_Q .
\end{align*}
This is the set of points whose forward orbit visits $E_Q$ infinitely often. It is exactly invariant because
\begin{align*}
T^{-1}A_Q
&=
\bigcap_{m=0}^{\infty}\bigcup_{n=m}^{\infty}T^{-(n+1)}E_Q \\
&=
\bigcap_{m=1}^{\infty}\bigcup_{n=m}^{\infty}T^{-n}E_Q \\
&=
\bigcap_{m=0}^{\infty}\bigcup_{n=m}^{\infty}T^{-n}E_Q
=
A_Q.
\end{align*}
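The third equality holds because the sets $\bigcup_{n\geq m}T^{-n}E_Q$ decrease as $m$ increases, so omitting the $m=0$ term does not change the intersection. Since $A_Q$ is a countable intersection of countable unions of sets in $\mathcal B$, it belongs to $\mathcal B$, and thus $A_Q\in\mathcal I$.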
It remains to check that $A_Q$ represents the same measurable set as $E_Q$ modulo null sets. Define
\begin{align*}
N_Q:=\bigcup_{n=0}^{\infty}(T^{-n}E_Q\triangle E_Q).
\end{align*}
The preceding paragraph gives $\mu(N_Q)=0$. If $x\notin N_Q$, then membership in every $T^{-n}E_Q$ agrees with membership in $E_Q$. Therefore $x\in A_Q$ exactly when $x\in E_Q$, so $A_Q\triangle E_Q\subseteq N_Q$. Hence $E_Q$ differs by a null set from the invariant set $A_Q$.
Let $\overline{\mathcal I}^{\mu}$ be the $\mu$-completion of $\mathcal I$. The collection
\begin{align*}
\mathscr S:=\{C\in\mathcal B(\mathbb C):g^{-1}(C)\in \overline{\mathcal I}^{\mu}\}
\end{align*}
is a $\sigma$-algebra. Since every basis set $Q\in\mathscr Q$ belongs to $\mathscr S$, and $\mathscr Q$ generates $\mathcal B(\mathbb C)$, we have $\mathscr S=\mathcal B(\mathbb C)$. Thus $g$ is measurable with respect to the completed invariant $\sigma$-algebra. The measurable modification lemma for completed sub-$\sigma$-algebras applies because $\mathbb C$ is a standard Borel space; it gives an $\mathcal I$-measurable map
\begin{align*}
g_{\mathcal I}:X&\to\mathbb C
\end{align*}
with $g_{\mathcal I}=g$ $\mu$-a.e. Therefore the $L^2$ class of $g$ belongs to $\mathcal K$, proving $\mathcal H_1\subseteq\mathcal K$.
[/proof]
Since $Pf\in\mathcal H_1$ by construction of the orthogonal projection, the equality $\mathcal H_1=\mathcal K$ gives
\begin{align*}
Pf\in L^2(X,\mathcal I,\mu).
\end{align*}
[/guided]
[/step]
[step:Test the projection against indicators of invariant sets]
Let $A\in\mathcal I$ be arbitrary, and define the indicator map
\begin{align*}
\mathbb 1_A:X&\to\{0,1\} \\
x&\mapsto
\begin{cases}
1,&x\in A,\\
0,&x\notin A.
\end{cases}
\end{align*}
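Since the underlying measure space of a measure-preserving system is a probability space, $\mu(X)=1$; as $\mathbb 1_A$ is bounded and $\mathcal B$-measurable, it follows that $\mathbb 1_A\in\mathcal H$. Since $A\in\mathcal I$,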
\begin{align*}
\mathbb 1_A\circ T
=
\mathbb 1_{T^{-1}A}
=
\mathbb 1_A,
\end{align*}
and hence $\mathbb 1_A\in\mathcal H_1$.
Because $P$ is the orthogonal projection onto $\mathcal H_1$, one has $f-Pf\in\mathcal H_1^\perp$. Therefore
\begin{align*}
0
&=
\langle f-Pf,\mathbb 1_A\rangle_{\mathcal H} \\
&=
\int_X (f(x)-Pf(x))\overline{\mathbb 1_A(x)}\,d\mu(x) \\
&=
\int_A (f(x)-Pf(x))\,d\mu(x).
\end{align*}
Thus
\begin{align*}
\int_A Pf(x)\,d\mu(x)
=
\int_A f(x)\,d\mu(x)
\end{align*}
for every $A\in\mathcal I$.
[guided]
To prove that $Pf$ is the conditional expectation, we must verify the defining integral identity on every invariant set. Fix $A\in\mathcal I$, and define
\begin{align*}
\mathbb 1_A:X&\to\{0,1\} \\
x&\mapsto
\begin{cases}
1,&x\in A,\\
0,&x\notin A.
\end{cases}
\end{align*}
Since the measure space of a measure-preserving system is a probability space, $\mu(X)=1$, so
\begin{align*}
\int_X |\mathbb 1_A(x)|^2\,d\mu(x)
=
\mu(A)
\leq 1.
\end{align*}
Hence $\mathbb 1_A\in\mathcal H$.
The reason indicators of invariant sets are the right test functions is that they are fixed by $U$. Indeed, $A\in\mathcal I$ means $T^{-1}A=A$, and therefore
\begin{align*}
\mathbb 1_A\circ T
=
\mathbb 1_{T^{-1}A}
=
\mathbb 1_A.
\end{align*}
So $\mathbb 1_A\in\mathcal H_1$.
Since $P$ is the orthogonal projection onto $\mathcal H_1$, the error $f-Pf$ is orthogonal to every vector in $\mathcal H_1$. Applying this to $\mathbb 1_A$ gives
\begin{align*}
0
&=
\langle f-Pf,\mathbb 1_A\rangle_{\mathcal H} \\
&=
\int_X (f(x)-Pf(x))\overline{\mathbb 1_A(x)}\,d\mu(x).
\end{align*}
Because $\mathbb 1_A$ is real-valued and equals $1$ on $A$ and $0$ on $X\setminus A$, this becomes
\begin{align*}
0
=
\int_A (f(x)-Pf(x))\,d\mu(x).
\end{align*}
Therefore
\begin{align*}
\int_A Pf(x)\,d\mu(x)
=
\int_A f(x)\,d\mu(x)
\end{align*}
for every invariant set $A\in\mathcal I$.
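As a quick sanity check of this identity, offered as an illustration and not needed for the proof, note that $X\in\mathcal I$ because $T^{-1}X=X$. Taking $A=X$ gives
\begin{align*}
\int_X Pf(x)\,d\mu(x)
=
\int_X f(x)\,d\mu(x),
\end{align*}
so the projection preserves the mean of $f$, as any candidate for a conditional expectation of $f$ must.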
[/guided]
[/step]
[step:Use the uniqueness of conditional expectation to identify the limit]
Since $\mu(X)=1$ and $f\in L^2(X,\mathcal B,\mu)$, the Cauchy-Schwarz inequality gives
\begin{align*}
\int_X |f(x)|\,d\mu(x)
&\leq
\left(\int_X |f(x)|^2\,d\mu(x)\right)^{1/2}
\left(\int_X 1^2\,d\mu(x)\right)^{1/2}
<\infty.
\end{align*}
Thus $f\in L^1(X,\mathcal B,\mu)$, so $\mathbb E[f\mid\mathcal I]$ is defined. Let
\begin{align*}
e:X&\to\mathbb C
\end{align*}
be an $\mathcal I$-measurable representative of $\mathbb E[f\mid\mathcal I]$. By the defining characterization of conditional expectation,
\begin{align*}
\int_A e(x)\,d\mu(x)
=
\int_A f(x)\,d\mu(x)
\end{align*}
for every $A\in\mathcal I$.
The previous two steps show that $Pf$ has an $\mathcal I$-measurable representative and satisfies the same integral identity. Since $Pf\in L^2(X,\mu)$ and $\mu(X)=1$, another application of Cauchy-Schwarz gives $Pf\in L^1(X,\mu)$. By the [Existence and Uniqueness of Conditional Expectation](/theorems/1147), $Pf=e$ $\mu$-a.e. Hence
\begin{align*}
Pf=\mathbb E[f\mid\mathcal I]
\end{align*}
as elements of $L^2(X,\mathcal B,\mu)$. Since $U^n f=f\circ T^n$ for every $n\geq 0$, combining this identity with the mean ergodic convergence $\|M_N f-Pf\|_{\mathcal H}\to 0$ gives
\begin{align*}
\lim_{N\to\infty}
\left\|
\frac{1}{N}\sum_{n=0}^{N-1} f\circ T^n
-
\mathbb E[f\mid\mathcal I]
\right\|_{L^2(X,\mu)}
=
0.
\end{align*}
[guided]
The final step is to identify the abstract Hilbert-space projection $Pf$ with the measure-theoretic conditional expectation. Since $\mu(X)=1$ and $f\in L^2(X,\mathcal B,\mu)$, Cauchy-Schwarz gives
\begin{align*}
\int_X |f(x)|\,d\mu(x)
&\leq
\left(\int_X |f(x)|^2\,d\mu(x)\right)^{1/2}
\left(\int_X 1^2\,d\mu(x)\right)^{1/2}
<\infty,
\end{align*}
so $f\in L^1(X,\mu)$ and $\mathbb E[f\mid\mathcal I]$ is defined. Let
\begin{align*}
e:X&\to\mathbb C
\end{align*}
be an $\mathcal I$-measurable representative of this conditional expectation. Its defining property is
\begin{align*}
\int_A e(x)\,d\mu(x)=\int_A f(x)\,d\mu(x)
\end{align*}
for every $A\in\mathcal I$.
The previous step proved exactly the same identity for $Pf$, and the step before it proved that $Pf$ has an $\mathcal I$-measurable representative. Moreover $Pf\in L^2(X,\mu)$, so $\mu(X)=1$ and Cauchy-Schwarz imply $Pf\in L^1(X,\mu)$. The [Existence and Uniqueness of Conditional Expectation](/theorems/1147) therefore identifies the two:
\begin{align*}
Pf=\mathbb E[f\mid\mathcal I]
\end{align*}
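in $L^2(X,\mathcal B,\mu)$. Since $U^n f=f\circ T^n$ for every $n\geq 0$, the operator $M_N$ applied to $f$ is exactly the ergodic average, and substituting this identity into the mean ergodic convergence gives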
\begin{align*}
\lim_{N\to\infty}
\left\|
\frac{1}{N}\sum_{n=0}^{N-1} f\circ T^n
-
\mathbb E[f\mid\mathcal I]
\right\|_{L^2(X,\mu)}
=0.
\end{align*}
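As an illustration under an extra assumption that the theorem does not make, suppose that $T$ is ergodic, meaning that every $A\in\mathcal I$ satisfies $\mu(A)\in\{0,1\}$. Write $c:=\int_X f\,d\mu$, a constant introduced for this remark. The constant function $c$ is $\mathcal I$-measurable, and for every $A\in\mathcal I$ one has $\int_A c\,d\mu=c\,\mu(A)=\int_A f\,d\mu$: both sides vanish when $\mu(A)=0$, and both equal $c$ when $\mu(A)=1$. Hence $\mathbb E[f\mid\mathcal I]=c$ $\mu$-a.e., and the convergence above specializes to
\begin{align*}
\lim_{N\to\infty}
\left\|
\frac{1}{N}\sum_{n=0}^{N-1} f\circ T^n
-
\int_X f\,d\mu
\right\|_{L^2(X,\mu)}
=0.
\end{align*}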
[/guided]
[/step]