[proofplan]
The [Wold representation](/page/Wold%20Representation) expresses every $X_t$ as an $L^2$ limit of finite linear combinations of innovations, so $\mathcal H^X\subset \mathcal H^\varepsilon$. The reverse inclusion uses the Wold innovation identity $\varepsilon_t=X_t-P_{\mathcal M_{t-1}}X_t$ from the theorem statement, which exhibits each innovation as an element of the closed linear span of the process. Once the closed spans are equal, the inclusion map is the identity on a single Hilbert subspace, so it is unitary and intertwines the two shifts. The final step records this equality in the orthogonal innovation basis: the coordinate map $J$ scales norms by the factor $\sigma$, equivalently $\sigma^{-1}J$ is unitary, and the Wold representation becomes the one-sided filter $\psi(B)$.
[/proofplan]
[step:Show that the process span is contained in the innovation span]
Let $\mathcal H$ denote the real Hilbert space $L^2(\Omega,\mathcal F,\mathbb P)$ with inner product
\begin{align*}
(Y,Z)_{\mathcal H}:=\mathbb E[YZ].
\end{align*}
For each $t\in\mathbb Z$ and each $m\in\{0,1,2,\dots\}$, define
\begin{align*}
X_{t,m}:=\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}.
\end{align*}
Each $X_{t,m}$ belongs to $\operatorname{span}\{\varepsilon_s:s\in\mathbb Z\}$. By the assumed $L^2$ convergence of the Wold representation,
\begin{align*}
\lim_{m\to\infty}\|X_{t,m}-X_t\|_{\mathcal H}=0.
\end{align*}
Since $\mathcal H^\varepsilon$ is closed, $X_t\in\mathcal H^\varepsilon$ for every $t\in\mathbb Z$. Taking closed linear spans gives
\begin{align*}
\mathcal H^X\subset \mathcal H^\varepsilon.
\end{align*}
[guided]
We first use only the analytic content of the Wold representation: it is an $L^2$ limit of finite innovation sums. Let $\mathcal H=L^2(\Omega,\mathcal F,\mathbb P)$, equipped with
\begin{align*}
(Y,Z)_{\mathcal H}:=\mathbb E[YZ].
\end{align*}
For fixed $t\in\mathbb Z$, define the finite partial sums
\begin{align*}
X_{t,m}:=\sum_{j=0}^{m}\psi_j\varepsilon_{t-j},
\qquad m\in\{0,1,2,\dots\}.
\end{align*}
The random variable $X_{t,m}$ is a finite linear combination of elements of the set $\{\varepsilon_s:s\in\mathbb Z\}$, hence
\begin{align*}
X_{t,m}\in \operatorname{span}\{\varepsilon_s:s\in\mathbb Z\}\subset \mathcal H^\varepsilon.
\end{align*}
The Wold representation states precisely that these partial sums converge to $X_t$ in $L^2(\Omega,\mathcal F,\mathbb P)$:
\begin{align*}
\lim_{m\to\infty}\|X_{t,m}-X_t\|_{\mathcal H}=0.
\end{align*}
Because $\mathcal H^\varepsilon$ is defined as the closed linear span of the innovations, it contains all $L^2$ limits of such finite linear combinations. Therefore $X_t\in\mathcal H^\varepsilon$ for every $t\in\mathbb Z$. Since $\mathcal H^X$ is the closed linear span of all the $X_t$, this proves
\begin{align*}
\mathcal H^X\subset \mathcal H^\varepsilon.
\end{align*}
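As an illustration of the convergence used above, consider the hypothetical geometric coefficients $\psi_j=\phi^j$ with $|\phi|<1$; this choice and the symbol $\phi$ are assumptions made only for the example and are not part of the theorem. Orthogonality of the innovations, $\mathbb E[\varepsilon_r\varepsilon_q]=\sigma^2\mathbb 1_{\{r=q\}}$, turns the truncation error into an explicit geometric tail:
\begin{align*}
\|X_t-X_{t,m}\|_{\mathcal H}^2
&=\sigma^2\sum_{j=m+1}^{\infty}\phi^{2j}\\
&=\frac{\sigma^2\phi^{2(m+1)}}{1-\phi^2},
\end{align*}
which tends to $0$ as $m\to\infty$. For general square-summable coefficients the same computation gives $\|X_t-X_{t,m}\|_{\mathcal H}^2=\sigma^2\sum_{j>m}\psi_j^2$.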
[/guided]
[/step]
[step:Recover every innovation from the process span]
By the innovation identity in the stated [Wold representation](/page/Wold%20Representation), the innovation $\varepsilon_t$ is the one-step prediction error at time $t$:
\begin{align*}
\varepsilon_t=X_t-P_{\mathcal M_{t-1}}X_t,
\end{align*}
where $\mathcal M_{t-1}:=\overline{\operatorname{span}}\{X_s:s\leq t-1\}$ and $P_{\mathcal M_{t-1}}:\mathcal H\to\mathcal M_{t-1}$ is the orthogonal projection. Since $\mathcal M_{t-1}\subset\mathcal H^X$, both $X_t$ and $P_{\mathcal M_{t-1}}X_t$ belong to $\mathcal H^X$. Hence $\varepsilon_t\in\mathcal H^X$ for every $t\in\mathbb Z$. Taking closed linear spans gives
\begin{align*}
\mathcal H^\varepsilon\subset \mathcal H^X.
\end{align*}
[guided]
The reverse inclusion uses the structural meaning of the word "innovation" in the [Wold representation](/page/Wold%20Representation), namely the prediction-error identity stated as a hypothesis of the theorem. For each $t\in\mathbb Z$, define the past closed subspace
\begin{align*}
\mathcal M_{t-1}:=\overline{\operatorname{span}}\{X_s:s\leq t-1\}.
\end{align*}
This is a closed Hilbert subspace of $\mathcal H$, so the orthogonal projection
\begin{align*}
P_{\mathcal M_{t-1}}:\mathcal H\to\mathcal M_{t-1}
\end{align*}
is defined. In the Wold construction, the innovation at time $t$ is the one-step prediction error
\begin{align*}
\varepsilon_t=X_t-P_{\mathcal M_{t-1}}X_t.
\end{align*}
Now $\mathcal M_{t-1}\subset\mathcal H^X$ because $\mathcal M_{t-1}$ is generated by a subset of the variables used to generate $\mathcal H^X$. Therefore
\begin{align*}
P_{\mathcal M_{t-1}}X_t\in \mathcal M_{t-1}\subset\mathcal H^X.
\end{align*}
Also $X_t\in\mathcal H^X$ by the definition of $\mathcal H^X$. Since $\mathcal H^X$ is a linear subspace, the difference
\begin{align*}
X_t-P_{\mathcal M_{t-1}}X_t
\end{align*}
belongs to $\mathcal H^X$. Thus $\varepsilon_t\in\mathcal H^X$ for every $t\in\mathbb Z$. Taking the closed linear span of the innovations yields
\begin{align*}
\mathcal H^\varepsilon\subset \mathcal H^X.
\end{align*}
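To see the inclusion concretely, consider the hypothetical first-order moving average $X_t=\varepsilon_t+\theta\varepsilon_{t-1}$ with $|\theta|<1$; the model and the symbol $\theta$ are assumptions introduced only for this sketch. Substituting and telescoping gives, for every $n\geq0$,
\begin{align*}
\sum_{k=0}^{n}(-\theta)^kX_{t-k}
&=\sum_{k=0}^{n}\left((-\theta)^k\varepsilon_{t-k}-(-\theta)^{k+1}\varepsilon_{t-k-1}\right)\\
&=\varepsilon_t-(-\theta)^{n+1}\varepsilon_{t-n-1}.
\end{align*}
The remainder has norm $|\theta|^{n+1}\sigma$, which tends to $0$ as $n\to\infty$. Hence, in this example, $\varepsilon_t$ is an $L^2$ limit of finite linear combinations of $X_t,X_{t-1},\dots$ and lies in $\mathcal H^X$, in agreement with the general argument.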
[/guided]
[/step]
[step:Identify the two Hilbert spaces by the inclusion map]
Combining the two inclusions gives
\begin{align*}
\mathcal H^X=\mathcal H^\varepsilon.
\end{align*}
The map
\begin{align*}
U:\mathcal H^\varepsilon&\to\mathcal H^X\\
Y&\mapsto Y
\end{align*}
is therefore the identity map on one closed Hilbert subspace of $L^2(\Omega,\mathcal F,\mathbb P)$. For all $Y,Z\in\mathcal H^\varepsilon$,
\begin{align*}
(UY,UZ)_{\mathcal H}=(Y,Z)_{\mathcal H}.
\end{align*}
Thus $U$ is linear, bijective, and inner-product preserving, so $U$ is a unitary Hilbert-space isomorphism.
[/step]
[step:Verify that the identification intertwines the shifts]
First define $S_\varepsilon$ on $\operatorname{span}\{\varepsilon_t:t\in\mathbb Z\}$ by linear extension of $S_\varepsilon\varepsilon_t=\varepsilon_{t+1}$; this is well defined because the innovations are orthogonal with common variance $\sigma^2>0$ and hence linearly independent. For every finitely supported scalar family $(a_t)_{t\in\mathbb Z}$, innovation orthogonality gives
\begin{align*}
\left\|\sum_{t\in\mathbb Z}a_t\varepsilon_{t+1}\right\|_{\mathcal H}^2
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}a_ta_s\mathbb E[\varepsilon_{t+1}\varepsilon_{s+1}]\\
&=\sigma^2\sum_{t\in\mathbb Z}|a_t|^2\\
&=\left\|\sum_{t\in\mathbb Z}a_t\varepsilon_t\right\|_{\mathcal H}^2.
\end{align*}
Thus $S_\varepsilon$ extends uniquely to an isometry $S_\varepsilon:\mathcal H^\varepsilon\to\mathcal H^\varepsilon$.
Similarly, define $S_X$ first on formal finite linear combinations by shifting each generator according to $S_XX_t=X_{t+1}$. For every finitely supported scalar family $(b_t)_{t\in\mathbb Z}$, second-order stationarity gives
\begin{align*}
\left\|\sum_{t\in\mathbb Z}b_tX_{t+1}\right\|_{\mathcal H}^2
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}b_tb_s\mathbb E[X_{t+1}X_{s+1}]\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}b_tb_s\mathbb E[X_tX_s]\\
&=\left\|\sum_{t\in\mathbb Z}b_tX_t\right\|_{\mathcal H}^2.
\end{align*}
If $\sum_{t\in\mathbb Z}b_tX_t=0$ in $\mathcal H$, the displayed identity gives $\|\sum_{t\in\mathbb Z}b_tX_{t+1}\|_{\mathcal H}=0$, so the shifted value is independent of the chosen finite representation. Hence $S_X$ is a well-defined isometry on $\operatorname{span}\{X_t:t\in\mathbb Z\}$ and extends uniquely to an isometry $S_X:\mathcal H^X\to\mathcal H^X$.
For fixed $t\in\mathbb Z$, the partial sums $X_{t,m}=\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}$ satisfy $X_{t,m}\to X_t$ in $\mathcal H$. Since $S_\varepsilon$ is continuous,
\begin{align*}
S_\varepsilon X_t
&=\lim_{m\to\infty}S_\varepsilon X_{t,m}\\
&=\lim_{m\to\infty}\sum_{j=0}^{m}\psi_j\varepsilon_{t+1-j}\\
&=X_{t+1}\\
&=S_XX_t,
\end{align*}
where the limits are in $L^2(\Omega,\mathcal F,\mathbb P)$. Thus $S_\varepsilon$ and $S_X$ agree on $\operatorname{span}\{X_t:t\in\mathbb Z\}$. This span is dense in $\mathcal H^X=\mathcal H^\varepsilon$, and both shifts are continuous, so they agree on all of $\mathcal H^\varepsilon$. Since $U$ is the identity map under this identification,
\begin{align*}
US_\varepsilon=S_XU.
\end{align*}
[guided]
We must first justify that both shift rules define bounded operators before applying a shift to an infinite Wold series. Define $S_\varepsilon$ on $\operatorname{span}\{\varepsilon_t:t\in\mathbb Z\}$, the space of finite linear combinations of innovations, by linear extension of the rule
\begin{align*}
S_\varepsilon\varepsilon_t=\varepsilon_{t+1}.
\end{align*}
Let $(a_t)_{t\in\mathbb Z}$ be a finitely supported scalar family. Using bilinearity of the $L^2$ inner product and the innovation covariance relation $\mathbb E[\varepsilon_r\varepsilon_q]=\sigma^2\mathbb 1_{\{r=q\}}$, we compute
\begin{align*}
\left\|S_\varepsilon\left(\sum_{t\in\mathbb Z}a_t\varepsilon_t\right)\right\|_{\mathcal H}^2
&=\left\|\sum_{t\in\mathbb Z}a_t\varepsilon_{t+1}\right\|_{\mathcal H}^2\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}a_ta_s\mathbb E[\varepsilon_{t+1}\varepsilon_{s+1}]\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}a_ta_s\sigma^2\mathbb 1_{\{t=s\}}\\
&=\sigma^2\sum_{t\in\mathbb Z}|a_t|^2\\
&=\left\|\sum_{t\in\mathbb Z}a_t\varepsilon_t\right\|_{\mathcal H}^2.
\end{align*}
Therefore $S_\varepsilon$ is an isometry on a dense subspace of $\mathcal H^\varepsilon$, so it extends uniquely and continuously to an isometry $S_\varepsilon:\mathcal H^\varepsilon\to\mathcal H^\varepsilon$.
We also verify the same boundedness property for the process shift, including well-definedness on the span where the generators $X_t$ need not be linearly independent. Start with the rule on formal finite sums,
\begin{align*}
\sum_{t\in\mathbb Z}b_tX_t\mapsto \sum_{t\in\mathbb Z}b_tX_{t+1}.
\end{align*}
Let $(b_t)_{t\in\mathbb Z}$ be finitely supported. Since $(X_t)_{t\in\mathbb Z}$ is stationary and mean-zero, its second moments agree after a common time shift; equivalently, $\mathbb E[X_{t+1}X_{s+1}]=\mathbb E[X_tX_s]$ for all $s,t\in\mathbb Z$. The mean-zero hypothesis is what lets these second moments be read as the usual covariance kernel of the process. Hence
\begin{align*}
\left\|S_X\left(\sum_{t\in\mathbb Z}b_tX_t\right)\right\|_{\mathcal H}^2
&=\left\|\sum_{t\in\mathbb Z}b_tX_{t+1}\right\|_{\mathcal H}^2\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}b_tb_s\mathbb E[X_{t+1}X_{s+1}]\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}b_tb_s\mathbb E[X_tX_s]\\
&=\left\|\sum_{t\in\mathbb Z}b_tX_t\right\|_{\mathcal H}^2.
\end{align*}
If a formal finite sum represents the zero vector, meaning $\sum_{t\in\mathbb Z}b_tX_t=0$ in $\mathcal H$, the displayed norm identity implies
\begin{align*}
\left\|\sum_{t\in\mathbb Z}b_tX_{t+1}\right\|_{\mathcal H}=0.
\end{align*}
Therefore the shifted formal sum also represents the zero vector, so the rule descends to a well-defined linear map on $\operatorname{span}\{X_t:t\in\mathbb Z\}$. The same identity shows this map is an isometry, and hence it extends uniquely and continuously to an isometry $S_X:\mathcal H^X\to\mathcal H^X$.
Now it is legitimate to pass the innovation shift through the Wold limit. Fix $t\in\mathbb Z$ and recall the partial sums
\begin{align*}
X_{t,m}=\sum_{j=0}^{m}\psi_j\varepsilon_{t-j},
\qquad m\in\{0,1,2,\dots\}.
\end{align*}
We know $X_{t,m}\to X_t$ in $L^2(\Omega,\mathcal F,\mathbb P)$. Since $S_\varepsilon$ is continuous,
\begin{align*}
S_\varepsilon X_t
&=\lim_{m\to\infty}S_\varepsilon X_{t,m}\\
&=\lim_{m\to\infty}\sum_{j=0}^{m}\psi_jS_\varepsilon\varepsilon_{t-j}\\
&=\lim_{m\to\infty}\sum_{j=0}^{m}\psi_j\varepsilon_{t+1-j}\\
&=X_{t+1}.
\end{align*}
The final equality is the Wold representation at time $t+1$. Since $S_XX_t=X_{t+1}$ by definition on generators, we have $S_\varepsilon X_t=S_XX_t$ for every $t\in\mathbb Z$. By linearity the two shifts agree on $\operatorname{span}\{X_t:t\in\mathbb Z\}$. This span is dense in $\mathcal H^X$, and we already proved $\mathcal H^X=\mathcal H^\varepsilon$. Because both shifts are continuous isometries, equality on the dense subspace extends to equality on the whole Hilbert space. Under the identity map $U$, this equality is exactly
\begin{align*}
US_\varepsilon=S_XU.
\end{align*}
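As a quick check, take the hypothetical first-order moving average $X_t=\varepsilon_t+\theta\varepsilon_{t-1}$, where the model and the symbol $\theta$ are assumptions used only for illustration. Linearity of $S_\varepsilon$ gives the intertwining relation on generators directly:
\begin{align*}
S_\varepsilon X_t
&=S_\varepsilon\varepsilon_t+\theta S_\varepsilon\varepsilon_{t-1}\\
&=\varepsilon_{t+1}+\theta\varepsilon_t\\
&=X_{t+1}.
\end{align*}
The second moments used in the isometry argument for $S_X$ are also explicit in this example:
\begin{align*}
\mathbb E[X_tX_s]
=\sigma^2\left((1+\theta^2)\mathbb 1_{\{t=s\}}+\theta\,\mathbb 1_{\{|t-s|=1\}}\right),
\end{align*}
which depends only on $t-s$ and is therefore unchanged when $t$ and $s$ are both replaced by $t+1$ and $s+1$.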
[/guided]
[/step]
[step:Write the isomorphism in innovation coordinates as a one-sided filter]
Let $\sigma:=\sqrt{\sigma^2}>0$ denote the positive square root of the innovation variance. Let $(e_t)_{t\in\mathbb Z}$ denote the standard coordinate vectors in $\ell^2(\mathbb Z)$, so $(e_t)_s=\mathbb 1_{\{s=t\}}$. Define
\begin{align*}
J:\ell^2(\mathbb Z)&\to\mathcal H^\varepsilon\\
a&\mapsto\sum_{t\in\mathbb Z}a_t\varepsilon_t.
\end{align*}
For finitely supported $a=(a_t)_{t\in\mathbb Z}$, orthogonality of the innovations gives
\begin{align*}
\|Ja\|_{\mathcal H}^2
=\mathbb E\left[\left(\sum_{t\in\mathbb Z}a_t\varepsilon_t\right)^2\right]
=\sigma^2\sum_{t\in\mathbb Z}|a_t|^2
=\sigma^2\|a\|_{\ell^2(\mathbb Z)}^2.
\end{align*}
Thus $J$ extends continuously to $\ell^2(\mathbb Z)$. Its range is closed, because $\sigma^{-1}J$ is an isometry, and contains every $\varepsilon_t$, so $J$ is an isomorphism onto $\mathcal H^\varepsilon$ that scales norms by the factor $\sigma$. Equivalently, the rescaled map $\sigma^{-1}J:\ell^2(\mathbb Z)\to\mathcal H^\varepsilon$ is unitary for the standard $\ell^2$ inner product.
Since $J e_t=\varepsilon_t$, the Wold representation gives
\begin{align*}
X_t
=\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}
=J\left(\sum_{j=0}^{\infty}\psi_j e_{t-j}\right),
\end{align*}
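where the inner series converges in $\ell^2(\mathbb Z)$ because $(\psi_j)_{j\geq0}$ is square-summable. Hence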
\begin{align*}
J^{-1}X_t=\sum_{j=0}^{\infty}\psi_j e_{t-j}.
\end{align*}
If $B:\ell^2(\mathbb Z)\to\ell^2(\mathbb Z)$ is defined by $(Ba)_t=a_{t+1}$, then $Be_t=e_{t-1}$, and therefore
\begin{align*}
\sum_{j=0}^{\infty}\psi_j e_{t-j}
=\sum_{j=0}^{\infty}\psi_j B^j e_t
=\psi(B)e_t.
\end{align*}
Thus $J^{-1}X_t=\psi(B)e_t$: in innovation coordinates, $X_t$ is obtained by applying the one-sided filter $\psi(B)$ to the coordinate vector $e_t$ of the innovation $\varepsilon_t$.
[guided]
The last assertion translates the Hilbert-space identity into coordinates. Define $\sigma:=\sqrt{\sigma^2}>0$ to be the positive square root of the innovation variance stated in the theorem. Let $(e_t)_{t\in\mathbb Z}$ be the standard basis of $\ell^2(\mathbb Z)$, meaning
\begin{align*}
(e_t)_s=\mathbb 1_{\{s=t\}}.
\end{align*}
Define the innovation-coordinate map
\begin{align*}
J:\ell^2(\mathbb Z)&\to\mathcal H^\varepsilon\\
a&\mapsto\sum_{t\in\mathbb Z}a_t\varepsilon_t.
\end{align*}
To see that this map is well-defined, first compute on finitely supported sequences. If $a=(a_t)_{t\in\mathbb Z}$ has finite support, then orthogonality of the innovation sequence gives
\begin{align*}
\|Ja\|_{\mathcal H}^2
&=\mathbb E\left[\left(\sum_{t\in\mathbb Z}a_t\varepsilon_t\right)
\left(\sum_{s\in\mathbb Z}a_s\varepsilon_s\right)\right]\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}a_ta_s\mathbb E[\varepsilon_t\varepsilon_s]\\
&=\sum_{t\in\mathbb Z}\sum_{s\in\mathbb Z}a_ta_s\sigma^2\mathbb 1_{\{t=s\}}\\
&=\sigma^2\sum_{t\in\mathbb Z}|a_t|^2.
\end{align*}
Thus $J$ scales norms by the fixed factor $\sigma>0$. It therefore extends continuously from finitely supported sequences to all of $\ell^2(\mathbb Z)$. The range of the extension is closed, since $\sigma^{-1}J$ is an isometry on a complete space, and it contains the dense subspace $\operatorname{span}\{\varepsilon_t:t\in\mathbb Z\}$ of $\mathcal H^\varepsilon$, so the range is precisely $\mathcal H^\varepsilon$. In particular, $J$ is a Hilbert-space isomorphism up to this constant scale, and the normalized map $\sigma^{-1}J$ is unitary for the standard $\ell^2$ inner product.
Now $J e_t=\varepsilon_t$. Substituting this into the Wold representation gives
\begin{align*}
X_t
&=\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}\\
&=\sum_{j=0}^{\infty}\psi_jJ e_{t-j}\\
&=J\left(\sum_{j=0}^{\infty}\psi_j e_{t-j}\right),
\end{align*}
where the sequence $\sum_{j=0}^{\infty}\psi_j e_{t-j}$ belongs to $\ell^2(\mathbb Z)$ because $(\psi_j)_{j\geq0}\in\ell^2(\{0,1,2,\dots\})$. Hence
\begin{align*}
J^{-1}X_t=\sum_{j=0}^{\infty}\psi_j e_{t-j}.
\end{align*}
Finally, the backshift $B:\ell^2(\mathbb Z)\to\ell^2(\mathbb Z)$ is defined by
\begin{align*}
(Ba)_t=a_{t+1}.
\end{align*}
For the standard basis vector $e_t$, this gives $Be_t=e_{t-1}$, and therefore $B^j e_t=e_{t-j}$ for every $j\geq0$. Consequently
\begin{align*}
J^{-1}X_t
=\sum_{j=0}^{\infty}\psi_jB^j e_t
=\psi(B)e_t.
\end{align*}
This is exactly the statement that, in innovation coordinates, the process is obtained from the innovation sequence by the one-sided filter $\psi(B)$.
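For a concrete instance, suppose the coefficients are $\psi_0=1$, $\psi_1=\theta$, and $\psi_j=0$ for $j\geq2$, so that $X_t=\varepsilon_t+\theta\varepsilon_{t-1}$; this hypothetical moving-average model and the symbol $\theta$ are assumptions made only for the example. The filter reduces to $\psi(B)=I+\theta B$, and
\begin{align*}
J^{-1}X_t
&=(I+\theta B)e_t\\
&=e_t+\theta e_{t-1}.
\end{align*}
Comparing norms on the two sides shows the scaling by $\sigma$ explicitly:
\begin{align*}
\|X_t\|_{\mathcal H}^2
=\sigma^2(1+\theta^2)
=\sigma^2\|J^{-1}X_t\|_{\ell^2(\mathbb Z)}^2.
\end{align*}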
[/guided]
[/step]