[proofplan]
Expand $X_{t+h}$ by its Wold representation and split the series into the terms involving innovations dated at most $t$ and the terms involving innovations dated after $t$. The first part lies in the closed linear span $\mathcal H_t^X$, while the second part is orthogonal to $\mathcal H_t^X$ because each innovation dated after $t$ is orthogonal to the past space at every time before its own date, and in particular to $\mathcal H_t^X$. The Hilbert-space projection characterization then identifies the first part as the best linear predictor, and the mean-square error formula follows from the pairwise orthogonality and common variance of the innovations.
[/proofplan]
[step:Split the Wold expansion into past and future innovation terms]
Fix $t \in \mathbb Z$ and an integer $h \geq 1$. By the Wold representation applied at time $t+h$, the series
\begin{align*}
X_{t+h}
=
\sum_{j=0}^{\infty}\psi_j\varepsilon_{t+h-j}
\end{align*}
converges in $L^2(\Omega,\mathcal F,\mathbb P)$.
Define the finite future-innovation term $R_{t,h} \in L^2(\Omega,\mathcal F,\mathbb P)$ by
\begin{align*}
R_{t,h}
:=
\sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j},
\end{align*}
and define the past-innovation term $S_{t,h} \in L^2(\Omega,\mathcal F,\mathbb P)$ by
\begin{align*}
S_{t,h}
:=
\sum_{j=h}^{\infty}\psi_j\varepsilon_{t+h-j}.
\end{align*}
The series defining $S_{t,h}$ converges in $L^2(\Omega,\mathcal F,\mathbb P)$ because it is a tail of the $L^2$-convergent Wold series, obtained by removing the finitely many terms with $j<h$. Hence
\begin{align*}
X_{t+h}=R_{t,h}+S_{t,h}
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$.
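As a concrete illustration of the split, take $h=2$. The two terms then read
\begin{align*}
R_{t,2}
&=
\psi_0\varepsilon_{t+2}+\psi_1\varepsilon_{t+1}, \\
S_{t,2}
&=
\sum_{j=2}^{\infty}\psi_j\varepsilon_{t+2-j}
=
\psi_2\varepsilon_t+\psi_3\varepsilon_{t-1}+\cdots,
\end{align*}
so $R_{t,2}$ collects exactly the innovations dated $t+1$ and $t+2$, and $S_{t,2}$ collects those dated $t$ or earlier.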
[/step]
[step:Show that the past-innovation term belongs to $\mathcal H_t^X$]
For each integer $j \geq h$, set $s_j := t+h-j$. Then $s_j \leq t$. Since $\varepsilon_{s_j} \in \mathcal H_{s_j}^X$ by the innovation hypothesis and $\mathcal H_{s_j}^X \subset \mathcal H_t^X$ because $s_j \leq t$, we have
\begin{align*}
\varepsilon_{t+h-j} \in \mathcal H_t^X
\end{align*}
for every $j \geq h$.
Each finite partial sum
\begin{align*}
S_{t,h,N}
:=
\sum_{j=h}^{N}\psi_j\varepsilon_{t+h-j},
\qquad N \geq h,
\end{align*}
therefore belongs to $\mathcal H_t^X$. Since $\mathcal H_t^X$ is closed in $L^2(\Omega,\mathcal F,\mathbb P)$ and $S_{t,h,N} \to S_{t,h}$ in $L^2(\Omega,\mathcal F,\mathbb P)$, it follows that
\begin{align*}
S_{t,h} \in \mathcal H_t^X.
\end{align*}
[guided]
We need to prove that the candidate predictor $S_{t,h}$ is an admissible linear predictor from the past, that is, an element of $\mathcal H_t^X$. For $j \geq h$, define the integer $s_j := t+h-j$. The inequality $j \geq h$ gives $s_j \leq t$, so the innovation $\varepsilon_{s_j}$ is dated no later than time $t$.
By hypothesis, every innovation satisfies $\varepsilon_s \in \mathcal H_s^X$. Since the past spaces are increasing in time, $s_j \leq t$ implies $\mathcal H_{s_j}^X \subset \mathcal H_t^X$. Therefore
\begin{align*}
\varepsilon_{t+h-j}=\varepsilon_{s_j}\in \mathcal H_t^X
\end{align*}
for every $j \geq h$.
Now define, for each integer $N \geq h$,
\begin{align*}
S_{t,h,N}
:=
\sum_{j=h}^{N}\psi_j\varepsilon_{t+h-j}.
\end{align*}
This is a finite linear combination of elements of $\mathcal H_t^X$, so $S_{t,h,N} \in \mathcal H_t^X$. The infinite series defining $S_{t,h}$ converges in $L^2(\Omega,\mathcal F,\mathbb P)$ because it is a tail of the Wold series. Since $\mathcal H_t^X$ is a closed subspace of $L^2(\Omega,\mathcal F,\mathbb P)$, the $L^2$-limit of the sequence $(S_{t,h,N})_{N \geq h}$ still belongs to $\mathcal H_t^X$. Hence
\begin{align*}
S_{t,h} \in \mathcal H_t^X.
\end{align*}
[/guided]
[/step]
[step:Show that the future-innovation error is orthogonal to $\mathcal H_t^X$]
For each $0 \leq j < h$, set $r_j := t+h-j$. Then $r_j > t$. Since $\varepsilon_{r_j} \in \mathcal H_{r_j}^X \ominus \mathcal H_{r_j-1}^X$, we have
\begin{align*}
\mathbb E[\varepsilon_{r_j}Y]=0
\end{align*}
for every $Y \in \mathcal H_{r_j-1}^X$. Because $t \leq r_j-1$, we have $\mathcal H_t^X \subset \mathcal H_{r_j-1}^X$. Therefore
\begin{align*}
\mathbb E[\varepsilon_{t+h-j}Y]=0
\end{align*}
for every $Y \in \mathcal H_t^X$ and every $0 \leq j < h$.
By finite linearity of the $L^2$ inner product, for every $Y \in \mathcal H_t^X$,
\begin{align*}
\mathbb E[R_{t,h}Y]
=
\sum_{j=0}^{h-1}\psi_j\mathbb E[\varepsilon_{t+h-j}Y]
=
0.
\end{align*}
Thus
\begin{align*}
R_{t,h} \perp \mathcal H_t^X.
\end{align*}
[guided]
The terms in $R_{t,h}$ involve innovations after time $t$, and we must show that their whole linear combination is orthogonal to every admissible predictor from $\mathcal H_t^X$.
Fix $0 \leq j < h$ and define $r_j := t+h-j$. Since $j<h$, we have $r_j>t$. The innovation hypothesis says
\begin{align*}
\varepsilon_{r_j}\in \mathcal H_{r_j}^X\ominus \mathcal H_{r_j-1}^X.
\end{align*}
The meaning of the orthogonal difference is that $\varepsilon_{r_j}$ is orthogonal to every element of $\mathcal H_{r_j-1}^X$. Since $t \leq r_j-1$, the past space at time $t$ is contained in the past space at time $r_j-1$:
\begin{align*}
\mathcal H_t^X \subset \mathcal H_{r_j-1}^X.
\end{align*}
Therefore, for every $Y \in \mathcal H_t^X$,
\begin{align*}
\mathbb E[\varepsilon_{t+h-j}Y]
=
\mathbb E[\varepsilon_{r_j}Y]
=
0.
\end{align*}
Now use that $R_{t,h}$ is a finite sum:
\begin{align*}
R_{t,h}
=
\sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}.
\end{align*}
For every $Y \in \mathcal H_t^X$, finite linearity of the $L^2$ inner product gives
\begin{align*}
\mathbb E[R_{t,h}Y]
=
\sum_{j=0}^{h-1}\psi_j\mathbb E[\varepsilon_{t+h-j}Y]
=
0.
\end{align*}
This proves that the candidate forecast error $R_{t,h}$ is orthogonal to the whole past space:
\begin{align*}
R_{t,h}\perp \mathcal H_t^X.
\end{align*}
[/guided]
[/step]
[step:Identify the orthogonal projection onto the past]
We have written
\begin{align*}
X_{t+h}=S_{t,h}+R_{t,h},
\end{align*}
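where $S_{t,h}\in \mathcal H_t^X$ and $R_{t,h}\perp \mathcal H_t^X$. The orthogonal projection onto the closed subspace $\mathcal H_t^X$ of the Hilbert space $L^2(\Omega,\mathcal F,\mathbb P)$ is characterized as the unique element of $\mathcal H_t^X$ whose difference from $X_{t+h}$ is orthogonal to $\mathcal H_t^X$. Since $S_{t,h}$ has both properties, this implies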
\begin{align*}
P_{\mathcal H_t^X}X_{t+h}=S_{t,h}.
\end{align*}
Therefore
\begin{align*}
P_{\mathcal H_t^X}X_{t+h}
=
\sum_{j=h}^{\infty}\psi_j\varepsilon_{t+h-j},
\end{align*}
and subtracting this identity from the Wold expansion gives
\begin{align*}
X_{t+h}-P_{\mathcal H_t^X}X_{t+h}
=
R_{t,h}
=
\sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}.
\end{align*}
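As an illustration only, and under assumptions not made in the statement, suppose $X$ is a causal AR(1) process $X_t=\phi X_{t-1}+\varepsilon_t$ with real $|\phi|<1$, driven by the innovation sequence $(\varepsilon_t)$, so that $\psi_j=\phi^j$. Reindexing with $i=j-h$ then gives
\begin{align*}
P_{\mathcal H_t^X}X_{t+h}
=
\sum_{j=h}^{\infty}\phi^{j}\varepsilon_{t+h-j}
=
\phi^{h}\sum_{i=0}^{\infty}\phi^{i}\varepsilon_{t-i}
=
\phi^{h}X_t.
\end{align*}
In this special case the predictor depends on the past only through $X_t$.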
[/step]
[step:Compute the mean-square forecast error using innovation orthogonality]
Using the forecast error representation and expanding the square in $L^2(\Omega,\mathcal F,\mathbb P)$, we obtain
\begin{align*}
\mathbb E\left[\left|X_{t+h}-P_{\mathcal H_t^X}X_{t+h}\right|^2\right]
&=
\mathbb E\left[\left|\sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}\right|^2\right] \\
&=
\sum_{j=0}^{h-1}\sum_{k=0}^{h-1}\psi_j\psi_k\,
\mathbb E[\varepsilon_{t+h-j}\varepsilon_{t+h-k}].
\end{align*}
Since the innovations are pairwise orthogonal with common variance $\sigma^2$, we have
\begin{align*}
\mathbb E[\varepsilon_{t+h-j}\varepsilon_{t+h-k}]
=
\begin{cases}
\sigma^2, & j=k,\\
0, & j\neq k.
\end{cases}
\end{align*}
Hence all off-diagonal terms vanish, and the diagonal terms give
\begin{align*}
\mathbb E\left[\left|X_{t+h}-P_{\mathcal H_t^X}X_{t+h}\right|^2\right]
=
\sigma^2\sum_{j=0}^{h-1}|\psi_j|^2.
\end{align*}
This is the stated mean-square forecast error formula.
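As a sanity check under an assumption not made in the statement, take the causal AR(1) coefficients $\psi_j=\phi^j$ with real $|\phi|<1$. The formula then becomes a finite geometric sum:
\begin{align*}
\sigma^2\sum_{j=0}^{h-1}\phi^{2j}
=
\sigma^2\,\frac{1-\phi^{2h}}{1-\phi^{2}}.
\end{align*}
For $h=1$ this equals $\sigma^2$, and as $h\to\infty$ it increases to $\sigma^2/(1-\phi^2)$, which is the variance of $X_t$ in this example.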
[/step]