[proofplan]
For bounded functions, the Birkhoff ergodic theorem gives almost-everywhere convergence, while the $L^2$ mean ergodic theorem identifies the limit as the conditional expectation onto the invariant sigma-algebra. Bounded convergence upgrades this to $L^p$ convergence. For a general $L^p$ function, approximate it in $L^p$ by bounded truncations and use that both ergodic averaging and conditional expectation are contractions on $L^p$.
[/proofplan]
[step:Prove the result for bounded functions]
Assume first that $f\in L^\infty(X,\mu)$. Then $f\in L^2(X,\mu)$ because $\mu(X)=1$. By the theorem identifying the mean-ergodic limit with conditional expectation,
\begin{align*}
A_Nf\to \mathbb{E}[f\mid\mathcal{I}]
\end{align*}
in $L^2$; this is the [Limit Is Conditional Expectation onto Invariant Sigma-Algebra](/theorems/3449). By the [Birkhoff Ergodic Theorem](/theorems/518), the same averages converge almost everywhere to an invariant function $f^*$. Since $L^2$ convergence yields a subsequence $A_{N_k}f$ converging almost everywhere to $\mathbb{E}[f\mid\mathcal{I}]$, and almost-everywhere limits are unique, $f^*=\mathbb{E}[f\mid\mathcal{I}]$ almost everywhere.
Since
\begin{align*}
|A_Nf-\mathbb{E}[f\mid\mathcal{I}]|\leq 2\|f\|_\infty
\end{align*}
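almost everywhere, and the constant $(2\|f\|_\infty)^p$ is integrable because $\mu(X)=1$, the [Dominated Convergence Theorem](/theorems/4) applied to $|A_Nf-\mathbb{E}[f\mid\mathcal{I}]|^p$ gives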
\begin{align*}
\|A_Nf-\mathbb{E}[f\mid\mathcal{I}]\|_p\to0
\end{align*}
for every $1\leq p<\infty$.
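The pointwise bound used above can be unpacked as follows. This is a sketch assuming the usual convention $A_Nf=\frac1N\sum_{n=0}^{N-1}f\circ T^n$, which this section does not restate. Since $T$ preserves $\mu$, each $f\circ T^n$ satisfies $|f\circ T^n|\leq\|f\|_\infty$ almost everywhere, and conditional expectation preserves almost-sure bounds, so
\begin{align*}
|A_Nf-\mathbb{E}[f\mid\mathcal{I}]|
&\leq
\frac1N\sum_{n=0}^{N-1}|f\circ T^n|+|\mathbb{E}[f\mid\mathcal{I}]|\\
&\leq
\|f\|_\infty+\|f\|_\infty
=2\|f\|_\infty
\end{align*}
almost everywhere.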
[/step]
[step:Use contraction estimates for truncation]
For $g\in L^p(X,\mu)$, measure preservation gives
\begin{align*}
\|g\circ T^n\|_p=\|g\|_p
\end{align*}
for every $n\geq0$. Since $A_Ng$ is an average of the functions $g\circ T^n$, the triangle inequality in $L^p$ gives
\begin{align*}
\|A_Ng\|_p\leq \|g\|_p
\end{align*}
for every $N$. Conditional expectation is also a contraction on $L^p$, so
\begin{align*}
\|\mathbb{E}[g\mid\mathcal{I}]\|_p\leq \|g\|_p.
\end{align*}
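Both estimates can be spelled out in a short computation. The following is a sketch under the assumptions that $A_Ng=\frac1N\sum_{n=0}^{N-1}g\circ T^n$ (the usual convention, not restated in this section) and that $1\leq p<\infty$. The first line uses Minkowski's inequality, the second the conditional Jensen inequality $|\mathbb{E}[g\mid\mathcal{I}]|^p\leq\mathbb{E}[|g|^p\mid\mathcal{I}]$:
\begin{align*}
\|A_Ng\|_p
&\leq
\frac1N\sum_{n=0}^{N-1}\|g\circ T^n\|_p
=\|g\|_p,\\
\|\mathbb{E}[g\mid\mathcal{I}]\|_p^p
&=\int_X|\mathbb{E}[g\mid\mathcal{I}]|^p\,d\mu
\leq\int_X\mathbb{E}[|g|^p\mid\mathcal{I}]\,d\mu
=\int_X|g|^p\,d\mu
=\|g\|_p^p.
\end{align*}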
[/step]
[step:Approximate a general $L^p$ function by bounded functions]
Let $f\in L^p(X,\mu)$ and choose bounded functions $f_m\in L^\infty(X,\mu)$ such that
\begin{align*}
\|f-f_m\|_p\to0.
\end{align*}
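One concrete choice, given here as an illustration rather than as the only option, is the truncation $f_m=f\,\mathbf{1}_{\{|f|\leq m\}}$. Then $|f_m|\leq m$, and
\begin{align*}
\|f-f_m\|_p^p
=\int_{\{|f|>m\}}|f|^p\,d\mu\to0
\end{align*}
as $m\to\infty$ by dominated convergence, since $|f|^p\mathbf{1}_{\{|f|>m\}}\leq|f|^p\in L^1(X,\mu)$ and $\mathbf{1}_{\{|f|>m\}}\to0$ almost everywhere because $f$ is finite almost everywhere.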
For every $N$ and $m$,
\begin{align*}
\|A_Nf-\mathbb{E}[f\mid\mathcal{I}]\|_p
&\leq
\|A_N(f-f_m)\|_p\\
&\quad+
\|A_Nf_m-\mathbb{E}[f_m\mid\mathcal{I}]\|_p\\
&\quad+
\|\mathbb{E}[f_m-f\mid\mathcal{I}]\|_p\\
&\leq
2\|f-f_m\|_p+
\|A_Nf_m-\mathbb{E}[f_m\mid\mathcal{I}]\|_p.
\end{align*}
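Fix $\varepsilon>0$ and first choose $m$ so large that $2\|f-f_m\|_p<\varepsilon$. Then let $N\to\infty$ with this $m$ fixed: by the bounded case applied to $f_m$, the second term tends to $0$, so $\limsup_{N\to\infty}\|A_Nf-\mathbb{E}[f\mid\mathcal{I}]\|_p\leq\varepsilon$. Since $\varepsilon$ was arbitrary, it follows that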
\begin{align*}
\|A_Nf-\mathbb{E}[f\mid\mathcal{I}]\|_p\to0.
\end{align*}
[/step]