[proofplan]
The finite past spaces $M_p(t)$ form an increasing sequence of subspaces whose union is dense in the closed past space $\mathcal{H}_{t-1}^X$. We prove directly, inside the Hilbert space $L^2(\Omega,\mathcal{F},\mathbb{P})$, that the orthogonal projections of a fixed element onto such an increasing family converge in norm to its projection onto the closure of their union. The key point is that the approximation errors decrease to the optimal error over the limiting closed space, and the Pythagorean identity converts convergence of errors into convergence of the projected random variables.
[/proofplan]
[step:Place the prediction problem inside one Hilbert space]
Fix $t \in \mathbb{Z}$. Let
\begin{align*}
H := L^2(\Omega,\mathcal{F},\mathbb{P})
\end{align*}
with inner product
\begin{align*}
(Y,Z)_H := \mathbb{E}[YZ]
\end{align*}
and norm $\|Y\|_H := (\mathbb{E}[Y^2])^{1/2}$ for real-valued random variables $Y,Z \in H$. Since $(X_t)_{t \in \mathbb{Z}}$ is second-order stationary, $X_t \in H$.
For each $p \in \mathbb{N}$, define
\begin{align*}
M_p := M_p(t) = \operatorname{span}\{X_{t-1},X_{t-2},\dots,X_{t-p}\}.
\end{align*}
The spaces are increasing: if $p \leq q$, then $M_p \subset M_q$. Define
\begin{align*}
M := \overline{\bigcup_{p=1}^{\infty} M_p}^{\,H}.
\end{align*}
Because
\begin{align*}
\bigcup_{p=1}^{\infty} M_p
=
\operatorname{span}\{X_s : s \leq t-1\},
\end{align*}
we have $M = \mathcal{H}_{t-1}^X$.
Thus it remains to prove that, for $Y := X_t$,
\begin{align*}
P_{M_p}Y \to P_MY
\end{align*}
in $H$.
[guided]
We first translate the stochastic notation into Hilbert-space notation. The ambient Hilbert space is
\begin{align*}
H := L^2(\Omega,\mathcal{F},\mathbb{P}),
\end{align*}
equipped with the inner product
\begin{align*}
(Y,Z)_H := \mathbb{E}[YZ]
\end{align*}
and norm $\|Y\|_H := (\mathbb{E}[Y^2])^{1/2}$. The hypothesis that the process is second-order stationary ensures, in particular, that each $X_t$ is square-integrable, so the random variable being predicted belongs to $H$.
For the fixed time $t$, the finite predictor space with $p$ lags is
\begin{align*}
M_p := \operatorname{span}\{X_{t-1},X_{t-2},\dots,X_{t-p}\}.
\end{align*}
These spaces increase with $p$: adding more past variables can only enlarge the span, so $M_p \subset M_q$ whenever $p \le q$. Their union consists exactly of all finite linear combinations of past variables $X_s$ with $s \le t-1$:
\begin{align*}
\bigcup_{p=1}^{\infty} M_p
=
\operatorname{span}\{X_s : s \leq t-1\}.
\end{align*}
Taking the closure in $L^2(\Omega,\mathcal{F},\mathbb{P})$ gives
\begin{align*}
\overline{\bigcup_{p=1}^{\infty} M_p}^{\,H}
=
\mathcal{H}_{t-1}^X.
\end{align*}
So the theorem is now a Hilbert-space assertion. Each $M_p$ is finite-dimensional, hence closed in $H$, and $M := \mathcal{H}_{t-1}^X$ is closed by construction, so the orthogonal projections $P_{M_p}$ and $P_M$ are well defined. With $Y := X_t$, the claim is that $P_{M_p}Y \to P_MY$ in $H$.
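As a purely illustrative sketch (the model and the symbols $\varepsilon_t$, $\theta$, $\sigma^2$ are assumptions made for this example and are not part of the theorem), suppose $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$, where $(\varepsilon_t)_{t \in \mathbb{Z}}$ is white noise with variance $\sigma^2 > 0$ and $0 < |\theta| < 1$. The partial sums satisfy
\begin{align*}
\sum_{j=0}^{n}(-\theta)^j X_{t-1-j}
=
\varepsilon_{t-1} - (-\theta)^{n+1}\varepsilon_{t-2-n},
\end{align*}
and the remainder has norm $|\theta|^{n+1}\sigma \to 0$, so $\varepsilon_{t-1} \in \mathcal{H}_{t-1}^X$. Since $\varepsilon_t$ is orthogonal to every $X_s$ with $s \le t-1$, it is orthogonal to $\mathcal{H}_{t-1}^X$, and therefore
\begin{align*}
P_M X_t
=
\theta\varepsilon_{t-1}
=
-\sum_{j=1}^{\infty}(-\theta)^j X_{t-j}.
\end{align*}
Comparing coefficients of the uncorrelated variables $\varepsilon_s$ shows that $\theta\varepsilon_{t-1}$ lies in no single $M_p$. In this example the limiting predictor is reached only after taking the closure, which is why the statement concerns $\overline{\bigcup_p M_p}^{\,H}$ rather than the union itself.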
[/guided]
[/step]
[step:Show that the finite projection errors converge to the limiting projection error]
Let
\begin{align*}
Q_pY := P_{M_p}Y,
\qquad
QY := P_MY.
\end{align*}
Define the finite prediction error numbers $a_p \in [0,\infty)$ by
\begin{align*}
a_p := \|Y - Q_pY\|_H.
\end{align*}
Since $M_p \subset M_{p+1}$, the minimization property of orthogonal projection gives
\begin{align*}
a_{p+1}
=
\inf_{V \in M_{p+1}}\|Y - V\|_H
\leq
\inf_{V \in M_p}\|Y - V\|_H
=
a_p.
\end{align*}
Thus $(a_p)_{p=1}^{\infty}$ is nonincreasing and bounded below by $0$, so it converges to a limit $a \geq 0$.
Because $M_p \subset M$ for every $p$,
\begin{align*}
\|Y-QY\|_H
=
\inf_{V \in M}\|Y-V\|_H
\leq
\inf_{V \in M_p}\|Y-V\|_H
=
a_p.
\end{align*}
Letting $p \to \infty$ gives $\|Y-QY\|_H \leq a$.
Conversely, let $\varepsilon > 0$. Since $\bigcup_{p=1}^{\infty}M_p$ is dense in $M$, there exists $W \in \bigcup_{p=1}^{\infty}M_p$ such that
\begin{align*}
\|QY-W\|_H < \varepsilon.
\end{align*}
Choose $p_0 \in \mathbb{N}$ such that $W \in M_{p_0}$. For every $p \geq p_0$, we have $W \in M_p$, and therefore
\begin{align*}
a_p
=
\inf_{V \in M_p}\|Y-V\|_H
\leq
\|Y-W\|_H
\leq
\|Y-QY\|_H + \|QY-W\|_H
<
\|Y-QY\|_H + \varepsilon.
\end{align*}
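Taking $p \to \infty$ gives $a \leq \|Y-QY\|_H + \varepsilon$. Since $\varepsilon > 0$ was arbitrary, $a \leq \|Y-QY\|_H$. Combined with the reverse inequality $\|Y-QY\|_H \leq a$ established above, this yields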
\begin{align*}
a = \|Y-QY\|_H.
\end{align*}
[guided]
The goal of this step is to compare two approximation errors: the best error using only $p$ past variables, and the best error using the entire closed past space.
Let
\begin{align*}
Q_pY := P_{M_p}Y,
\qquad
QY := P_MY.
\end{align*}
For each $p \in \mathbb{N}$, define
\begin{align*}
a_p := \|Y-Q_pY\|_H.
\end{align*}
The orthogonal projection $Q_pY$ is the best approximation to $Y$ from $M_p$, so
\begin{align*}
a_p = \inf_{V \in M_p}\|Y-V\|_H.
\end{align*}
Because the spaces increase, $M_p \subset M_{p+1}$. Minimizing over a larger set cannot increase the infimum, and therefore
\begin{align*}
a_{p+1}
=
\inf_{V \in M_{p+1}}\|Y-V\|_H
\leq
\inf_{V \in M_p}\|Y-V\|_H
=
a_p.
\end{align*}
Thus $(a_p)$ is nonincreasing and nonnegative, so there exists $a \geq 0$ with $a_p \to a$.
Now compare $a$ with the limiting projection error. Since every $M_p$ is contained in $M$, the limiting space allows at least as many approximants as each finite space:
\begin{align*}
\|Y-QY\|_H
=
\inf_{V \in M}\|Y-V\|_H
\leq
\inf_{V \in M_p}\|Y-V\|_H
=
a_p.
\end{align*}
Letting $p \to \infty$ gives $\|Y-QY\|_H \leq a$.
For the reverse inequality, we use density. The space $M$ is the closure of $\bigcup_p M_p$, so the limiting projection $QY \in M$ can be approximated by some vector from a finite past space. Given $\varepsilon > 0$, choose
\begin{align*}
W \in \bigcup_{p=1}^{\infty}M_p
\end{align*}
such that
\begin{align*}
\|QY-W\|_H < \varepsilon.
\end{align*}
There is some $p_0 \in \mathbb{N}$ with $W \in M_{p_0}$. Since the spaces are increasing, $W \in M_p$ for all $p \ge p_0$. Hence, for $p \ge p_0$,
\begin{align*}
a_p
=
\inf_{V \in M_p}\|Y-V\|_H
\leq
\|Y-W\|_H
\leq
\|Y-QY\|_H+\|QY-W\|_H
<
\|Y-QY\|_H+\varepsilon.
\end{align*}
Passing to the limit gives $a \leq \|Y-QY\|_H+\varepsilon$. Since $\varepsilon > 0$ was arbitrary, $a \leq \|Y-QY\|_H$. Together with the inequality $\|Y-QY\|_H \leq a$ obtained above, this gives
\begin{align*}
a = \|Y-QY\|_H.
\end{align*}
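To see the monotone convergence of the errors in a concrete case, here is a sketch under assumptions that are not part of the theorem: take the moving-average model $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$, with $(\varepsilon_t)$ white noise of variance $\sigma^2 > 0$ and $0 < |\theta| < 1$. The standard determinant-ratio formula for the $p$-lag linear prediction error then gives
\begin{align*}
a_p^2
=
\sigma^2\,\frac{1-\theta^{2p+4}}{1-\theta^{2p+2}}
=
\sigma^2\left(1+\frac{\theta^{2p+2}(1-\theta^2)}{1-\theta^{2p+2}}\right).
\end{align*}
For $p=1$ this reads $a_1^2 = \sigma^2(1+\theta^2+\theta^4)/(1+\theta^2)$, which agrees with the direct computation $\gamma(0)-\gamma(1)^2/\gamma(0)$ using $\gamma(0)=\sigma^2(1+\theta^2)$ and $\gamma(1)=\sigma^2\theta$. The sequence $(a_p^2)$ is strictly decreasing with limit $\sigma^2$. In this model $\theta\varepsilon_{t-1} \in M$ (by invertibility, since $|\theta|<1$) and $X_t-\theta\varepsilon_{t-1}=\varepsilon_t$ is orthogonal to $M$, so $\|Y-QY\|_H^2=\sigma^2$, in agreement with $a=\|Y-QY\|_H$.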
[/guided]
[/step]
[step:Convert convergence of errors into convergence of projections]
Since $QY \in M$ and $Q_pY \in M_p \subset M$, we have $QY-Q_pY \in M$. Also, $Y-QY$ is orthogonal to $M$ by the defining property of the orthogonal projection onto $M$. Therefore
\begin{align*}
(Y-QY,\,QY-Q_pY)_H = 0.
\end{align*}
Using the orthogonal decomposition
\begin{align*}
Y-Q_pY = (Y-QY) + (QY-Q_pY),
\end{align*}
the Pythagorean identity gives
\begin{align*}
\|Y-Q_pY\|_H^2
=
\|Y-QY\|_H^2+\|QY-Q_pY\|_H^2.
\end{align*}
Hence
\begin{align*}
\|Q_pY-QY\|_H^2
=
a_p^2-\|Y-QY\|_H^2.
\end{align*}
By the previous step, $a_p \to \|Y-QY\|_H$, so the right-hand side tends to $0$. Therefore
\begin{align*}
\|Q_pY-QY\|_H \to 0.
\end{align*}
Substituting $Y=X_t$ and $QY=P_{\mathcal{H}_{t-1}^X}X_t$, and writing $P_pX_t$ for $Q_pY=P_{M_p}X_t$, we obtain
\begin{align*}
\|P_pX_t-P_{\mathcal{H}_{t-1}^X}X_t\|_{L^2(\Omega,\mathcal{F},\mathbb{P})}\to 0.
\end{align*}
This is the desired convergence in $L^2$.
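As a sanity check of the Pythagorean step, consider a toy setting that is assumed only for illustration and is not the setting of the theorem. Let $(e_k)_{k \ge 0}$ be an orthonormal sequence in $H$, let $M_p = \operatorname{span}\{e_1,\dots,e_p\}$, and let $Y = \sum_{k=0}^{\infty} y_k e_k$ with $\sum_{k} y_k^2 < \infty$. Then $M$ is the closed span of $\{e_k : k \ge 1\}$, and
\begin{align*}
Q_pY = \sum_{k=1}^{p} y_k e_k,
\qquad
QY = \sum_{k=1}^{\infty} y_k e_k.
\end{align*}
Consequently
\begin{align*}
a_p^2 = y_0^2 + \sum_{k>p} y_k^2,
\qquad
\|Y-QY\|_H^2 = y_0^2,
\qquad
\|Q_pY-QY\|_H^2 = \sum_{k>p} y_k^2 = a_p^2 - \|Y-QY\|_H^2.
\end{align*}
The last quantity is the tail of a convergent series, so it tends to $0$, exactly as the general argument predicts.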
[/step]