[proofplan]
We realize the stationary process as a single measure-preserving dynamical system on path space. Strict stationarity says exactly that the path law is invariant under the left shift. Applying the Birkhoff pointwise ergodic theorem to the coordinate function $x\mapsto x_1$ gives almost sure convergence of the time averages to a shift-invariant limit. In the ergodic case, the invariant limit is constant and equals the space average, which is $\mathbb E[X_0]$.
[/proofplan]
[step:Pass from the stationary process to a shift-invariant path measure]
Define the path map
\begin{align*}
Y:\Omega &\to \mathbb R^{\mathbb Z} \\
\omega &\mapsto (X_t(\omega))_{t\in\mathbb Z}.
\end{align*}
Let $\mu:=\mathbb P\circ Y^{-1}$ be the law of the process on $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}))$. Define the left shift
\begin{align*}
S:\mathbb R^{\mathbb Z} &\to \mathbb R^{\mathbb Z} \\
x &\mapsto Sx,
\end{align*}
where $(Sx)_t=x_{t+1}$ for each $t\in\mathbb Z$.
We verify that $S$ preserves $\mu$. Let $m\in\mathbb N$, let $t_1,\dots,t_m\in\mathbb Z$, and let $A_1,\dots,A_m\in\mathcal B(\mathbb R)$. For the cylinder set
\begin{align*}
C:=\{x\in\mathbb R^{\mathbb Z}:x_{t_j}\in A_j\text{ for }1\le j\le m\},
\end{align*}
we have
\begin{align*}
\mu(S^{-1}C)
&=\mathbb P\bigl((Y(\omega))_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)\\
&=\mathbb P\bigl(X_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)\\
&=\mathbb P\bigl(X_{t_j}\in A_j\text{ for }1\le j\le m\bigr)\\
&=\mu(C),
\end{align*}
where the third equality is strict stationarity. Since cylinder sets form a $\pi$-system generating $\mathcal B(\mathbb R^{\mathbb Z})$, the $\pi$-$\lambda$ theorem gives $\mu(S^{-1}A)=\mu(A)$ for every $A\in\mathcal B(\mathbb R^{\mathbb Z})$. Thus $S$ is measure-preserving on $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu)$.
[guided]
The goal is to turn the stochastic-process statement into a dynamical-systems statement. The object on which the shift acts is the full path space $\mathbb R^{\mathbb Z}$, whose points are two-sided sequences $x=(x_t)_{t\in\mathbb Z}$. The random map sending an outcome to its whole sample path is
\begin{align*}
Y:\Omega &\to \mathbb R^{\mathbb Z} \\
\omega &\mapsto (X_t(\omega))_{t\in\mathbb Z}.
\end{align*}
The probability measure transported to path space is $\mu:=\mathbb P\circ Y^{-1}$.
Now define the left shift
\begin{align*}
S:\mathbb R^{\mathbb Z} &\to \mathbb R^{\mathbb Z} \\
x &\mapsto Sx,
\end{align*}
with $(Sx)_t=x_{t+1}$. We must check that this transformation preserves $\mu$, because the pointwise ergodic theorem applies to measure-preserving transformations.
It is enough to verify invariance on finite-dimensional cylinder sets. Let
\begin{align*}
C:=\{x\in\mathbb R^{\mathbb Z}:x_{t_j}\in A_j\text{ for }1\le j\le m\},
\end{align*}
where $m\in\mathbb N$, $t_1,\dots,t_m\in\mathbb Z$, and $A_1,\dots,A_m\in\mathcal B(\mathbb R)$. Then $S^{-1}C$ is the event that the shifted path has coordinates in the same sets, so
\begin{align*}
\mu(S^{-1}C)
&=\mathbb P\bigl((Y(\omega))_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)\\
&=\mathbb P\bigl(X_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr).
\end{align*}
Strict stationarity means that shifting all time indices by the same integer does not change any finite-dimensional distribution. Therefore
\begin{align*}
\mathbb P\bigl(X_{t_j+1}\in A_j\text{ for }1\le j\le m\bigr)
=
\mathbb P\bigl(X_{t_j}\in A_j\text{ for }1\le j\le m\bigr)
=
\mu(C).
\end{align*}
Cylinder sets generate the product Borel $\sigma$-algebra, so the $\pi$-$\lambda$ theorem extends this equality from cylinders to all Borel sets. Hence $\mu(S^{-1}A)=\mu(A)$ for every $A\in\mathcal B(\mathbb R^{\mathbb Z})$.
[/guided]
[/step]
[step:Apply the pointwise ergodic theorem to the first-coordinate function]
Define the first-coordinate function
\begin{align*}
f:\mathbb R^{\mathbb Z} &\to \mathbb R \\
x &\mapsto x_1.
\end{align*}
The function $f$ is $\mathcal B(\mathbb R^{\mathbb Z})$-measurable. Moreover,
\begin{align*}
\int_{\mathbb R^{\mathbb Z}} |f(x)|\,d\mu(x)
=
\mathbb E[|X_1|]
=
\mathbb E[|X_0|]
<\infty,
\end{align*}
where the second equality follows from strict stationarity. Thus $f\in L^1(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu)$.
We use the following almost-everywhere invariant form of the Birkhoff Pointwise Ergodic Theorem: for a probability space, a measure-preserving transformation, and an $L^1$ function, the time averages converge almost everywhere to an integrable measurable function $Z^*$ satisfying $Z^*\circ S=Z^*$ $\mu$-almost everywhere. Applying this theorem to $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu,S)$ and $f$, there exists an integrable measurable function
\begin{align*}
Z^*:\mathbb R^{\mathbb Z}\to\mathbb R
\end{align*}
such that $Z^*\circ S=Z^*$ $\mu$-almost everywhere and
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x)\to Z^*(x)
\end{align*}
for $\mu$-almost every $x\in\mathbb R^{\mathbb Z}$.
Let $E_0\in\mathcal B(\mathbb R^{\mathbb Z})$ be the set on which this convergence holds. Since $S$ is invertible and measure-preserving, the set
\begin{align*}
E:=\bigcap_{j\in\mathbb Z} S^{-j}E_0
\end{align*}
satisfies $\mu(E)=1$ and $S^{-1}E=E$. Define the everywhere $S$-invariant representative
\begin{align*}
\widehat Z:\mathbb R^{\mathbb Z}&\to\mathbb R \\
x&\mapsto
\begin{cases}
\displaystyle\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x),& x\in E,\\
0,& x\notin E.
\end{cases}
\end{align*}
The function $\widehat Z$ is measurable as a pointwise limit on the measurable set $E$, extended by $0$ on $E^c$. If $x\in E$, then $Sx\in E$ and
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1} f(S^k(Sx))
=
\frac{n+1}{n}\frac{1}{n+1}\sum_{k=0}^{n} f(S^k x)-\frac{1}{n}f(x)
\to
\widehat Z(x),
\end{align*}
so $\widehat Z(Sx)=\widehat Z(x)$. If $x\notin E$, then $Sx\notin E$, hence $\widehat Z(Sx)=0=\widehat Z(x)$. Thus $\widehat Z\circ S=\widehat Z$ everywhere and $\widehat Z=Z^*$ $\mu$-almost everywhere.
For each $x\in\mathbb R^{\mathbb Z}$ and each $k\in\{0,\dots,n-1\}$, the definition of $S$ gives
\begin{align*}
f(S^k x)=(S^k x)_1=x_{k+1}.
\end{align*}
Therefore
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x)
=
\frac{1}{n}\sum_{t=1}^{n}x_t.
\end{align*}
[guided]
The coordinate function to which we apply the pointwise ergodic theorem is
\begin{align*}
f:\mathbb R^{\mathbb Z} &\to \mathbb R \\
x &\mapsto x_1.
\end{align*}
It is measurable because it is a coordinate projection on the product measurable space. Its integrability is exactly the integrability hypothesis on the stationary process:
\begin{align*}
\int_{\mathbb R^{\mathbb Z}} |f(x)|\,d\mu(x)
=
\mathbb E[|X_1|].
\end{align*}
Strict stationarity gives $X_1$ and $X_0$ the same distribution, so
\begin{align*}
\mathbb E[|X_1|]=\mathbb E[|X_0|]<\infty.
\end{align*}
Thus $f\in L^1(\mu)$.
We now apply the almost-everywhere invariant form of the Birkhoff Pointwise Ergodic Theorem. Its hypotheses are satisfied: $(\mathbb R^{\mathbb Z},\mathcal B(\mathbb R^{\mathbb Z}),\mu)$ is a probability space, the map $S$ is measure-preserving by the previous step, and $f$ is integrable. Hence there exists an integrable measurable function
\begin{align*}
Z^*:\mathbb R^{\mathbb Z}\to\mathbb R
\end{align*}
such that $Z^*\circ S=Z^*$ $\mu$-almost everywhere and
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x)\to Z^*(x)
\end{align*}
for $\mu$-almost every $x$.
Because this conclusion gives invariance only outside a null set, we now choose an everywhere invariant representative. Let $E_0$ be the set where the displayed convergence holds. Since the shift $S$ is invertible and measure-preserving, the set
\begin{align*}
E:=\bigcap_{j\in\mathbb Z} S^{-j}E_0
\end{align*}
has $\mu(E)=1$ and satisfies $S^{-1}E=E$. Define
\begin{align*}
\widehat Z:\mathbb R^{\mathbb Z}&\to\mathbb R \\
x&\mapsto
\begin{cases}
\displaystyle\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x),& x\in E,\\
0,& x\notin E.
\end{cases}
\end{align*}
This function is measurable because it is the pointwise limit of measurable averages on $E$ and is extended by a constant on $E^c$. It agrees with $Z^*$ $\mu$-almost everywhere. Moreover, if $x\in E$, then $Sx\in E$, and shifting the average changes only the first term and the normalization:
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1} f(S^k(Sx))
=
\frac{n+1}{n}\frac{1}{n+1}\sum_{k=0}^{n} f(S^k x)-\frac{1}{n}f(x).
\end{align*}
Taking limits gives $\widehat Z(Sx)=\widehat Z(x)$. If $x\notin E$, then $Sx\notin E$, so both values are $0$. Thus $\widehat Z\circ S=\widehat Z$ everywhere.
Finally we translate this back into the coordinates of the sequence. Since $(S^k x)_1=x_{k+1}$, we have
\begin{align*}
f(S^k x)=(S^k x)_1=x_{k+1}.
\end{align*}
Consequently
\begin{align*}
\frac{1}{n}\sum_{k=0}^{n-1} f(S^k x)
=
\frac{1}{n}\sum_{t=1}^{n}x_t.
\end{align*}
This is exactly the sample average appearing in the theorem.
[/guided]
[/step]
[step:Pull the everywhere invariant representative back to the original probability space]
Define
\begin{align*}
X^*:\Omega &\to \mathbb R \\
\omega &\mapsto \widehat Z(Y(\omega)).
\end{align*}
Since $Y$ and $\widehat Z$ are measurable, $X^*$ is an $\mathcal F$-measurable real-valued random variable. Since $\widehat Z\circ S=\widehat Z$ everywhere, $X^*$ is shift-invariant in the stated pathwise sense.
The convergence obtained on path space transfers through the path map. Indeed, since $\mu=\mathbb P\circ Y^{-1}$ and $\mu(E)=1$, we have $\mathbb P(Y^{-1}(E))=1$. For every $\omega\in Y^{-1}(E)$,
\begin{align*}
\frac{1}{n}\sum_{t=1}^{n}X_t(\omega)
=
\frac{1}{n}\sum_{t=1}^{n}(Y(\omega))_t
\to
\widehat Z(Y(\omega))
=
X^*(\omega).
\end{align*}
This proves the almost sure convergence assertion.
[guided]
We now return from the canonical path space to the original probability space. The invariant limit on path space is the measurable function $\widehat Z:\mathbb R^{\mathbb Z}\to\mathbb R$ constructed in the previous step, so we define the random variable
\begin{align*}
X^*:\Omega &\to \mathbb R \\
\omega &\mapsto \widehat Z(Y(\omega)).
\end{align*}
This map is measurable because $Y$ is measurable and $\widehat Z$ is measurable.
The reason for replacing $Z^*$ by $\widehat Z$ is precisely invariance. The Birkhoff theorem only produced $Z^*\circ S=Z^*$ $\mu$-almost everywhere, while the representative $\widehat Z$ satisfies $\widehat Z\circ S=\widehat Z$ everywhere. Therefore the pulled-back random variable is shift-invariant in the pathwise sense encoded by the sample-path map.
It remains to transfer convergence. Since $\mu=\mathbb P\circ Y^{-1}$ and $\mu(E)=1$, the event $Y^{-1}(E)$ has probability $1$. For every $\omega\in Y^{-1}(E)$, the path $Y(\omega)$ lies in the set where the averages converge to $\widehat Z$, and hence
\begin{align*}
\frac{1}{n}\sum_{t=1}^{n}X_t(\omega)
=
\frac{1}{n}\sum_{t=1}^{n}(Y(\omega))_t
\to
\widehat Z(Y(\omega))
=
X^*(\omega).
\end{align*}
Thus the convergence holds $\mathbb P$-almost surely on the original probability space.
[/guided]
[/step]
[step:Identify the limit under ergodicity]
Assume the stationary process is ergodic, meaning that every $S$-invariant Borel set in $\mathbb R^{\mathbb Z}$ has $\mu$-measure $0$ or $1$. The Birkhoff Pointwise Ergodic Theorem in the ergodic case gives
\begin{align*}
Z^*(x)=\int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y)
\end{align*}
for $\mu$-almost every $x\in\mathbb R^{\mathbb Z}$. Since $\widehat Z=Z^*$ $\mu$-almost everywhere, the same identity holds with $\widehat Z$ in place of $Z^*$. By the definition of $f$ and $\mu$,
\begin{align*}
\int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y)
=
\mathbb E[X_1]
=
\mathbb E[X_0],
\end{align*}
where the final equality follows from strict stationarity. Pulling this identity back through $Y$ gives
\begin{align*}
X^*(\omega)=\widehat Z(Y(\omega))=\mathbb E[X_0]
\end{align*}
for $\mathbb P$-almost every $\omega\in\Omega$. This proves the ergodic conclusion and completes the proof.
[guided]
In the non-ergodic case, Birkhoff gives a limit that may depend on the invariant part of the trajectory. Ergodicity removes that dependence: every invariant measurable function is almost surely constant. The ergodic form of the Birkhoff Pointwise Ergodic Theorem states that the limit equals the space average:
\begin{align*}
Z^*(x)=\int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y)
\end{align*}
for $\mu$-almost every $x\in\mathbb R^{\mathbb Z}$.
We compute this space average using the definition of the path law. Since $f(y)=y_1$,
\begin{align*}
\int_{\mathbb R^{\mathbb Z}} f(y)\,d\mu(y)
=
\mathbb E[X_1].
\end{align*}
Strict stationarity gives $X_1$ and $X_0$ the same distribution. Since $X_0$ is integrable, both expectations are finite and equal:
\begin{align*}
\mathbb E[X_1]=\mathbb E[X_0].
\end{align*}
Therefore
\begin{align*}
\widehat Z(x)=\mathbb E[X_0]
\end{align*}
for $\mu$-almost every $x$. Because $X^*=\widehat Z\circ Y$ and $\mu=\mathbb P\circ Y^{-1}$, the same identity holds on the original probability space:
\begin{align*}
X^*(\omega)=\mathbb E[X_0]
\end{align*}
for $\mathbb P$-almost every $\omega\in\Omega$.
[/guided]
[/step]