Wold Decomposition Theorem (Theorem # 3641)
Theorem
Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $(X_t)_{t\in\mathbb Z}$ be a real-valued stochastic process such that each $X_t:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ belongs to $L^2(\Omega,\mathcal F,\mathbb P)$, satisfies $\mathbb E[X_t]=0$, and is second-order stationary:
\begin{align*}
\mathbb E[X_{t+h}X_{s+h}] = \mathbb E[X_tX_s]
\end{align*}
for all $s,t,h\in\mathbb Z$.
For each $t\in\mathbb Z$, define the closed past linear span
\begin{align*}
\mathcal H_t^X := \overline{\operatorname{span}}\{X_s:s\leq t\}\subset L^2(\Omega,\mathcal F,\mathbb P),
\end{align*}
where the closure is taken in the $L^2$ norm, and define the remote past
\begin{align*}
\mathcal H_{-\infty}^X := \bigcap_{t\in\mathbb Z}\mathcal H_t^X.
\end{align*}
Then there exist a mean-zero second-order stationary deterministic process $(D_t)_{t\in\mathbb Z}$, a mean-zero second-order stationary purely nondeterministic process $(Y_t)_{t\in\mathbb Z}$, an orthogonal innovation process $(\varepsilon_t)_{t\in\mathbb Z}$, a number $\sigma^2\geq 0$, and coefficients $(\psi_j)_{j\geq 0}\subset\mathbb R$ such that, for every $t\in\mathbb Z$,
\begin{align*}
X_t = D_t + Y_t
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$, the two components are orthogonal in the sense that
\begin{align*}
\mathbb E[D_tY_s]=0
\end{align*}
for all $s,t\in\mathbb Z$, and
\begin{align*}
Y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}
\end{align*}
with convergence in $L^2(\Omega,\mathcal F,\mathbb P)$.
The innovation process satisfies
\begin{align*}
\mathbb E[\varepsilon_t]=0,\qquad
\mathbb E[\varepsilon_t^2]=\sigma^2,\qquad
\mathbb E[\varepsilon_t\varepsilon_s]=0\quad\text{for }s\neq t,
\end{align*}
and each deterministic variable $D_t$ is orthogonal to every innovation:
\begin{align*}
\mathbb E[D_t\varepsilon_s]=0
\end{align*}
for all $s,t\in\mathbb Z$. If $\sigma^2>0$, then $\psi_0=1$ and
\begin{align*}
\sum_{j=0}^{\infty}|\psi_j|^2<\infty.
\end{align*}
If $\sigma^2=0$, then $Y_t=0$ in $L^2(\Omega,\mathcal F,\mathbb P)$ for every $t\in\mathbb Z$, the moving-average term is identically zero, and the normalization $\psi_0=1$ is only a convention.
Here deterministic means
\begin{align*}
\bigcap_{t\in\mathbb Z}\overline{\operatorname{span}}\{D_s:s\leq t\}
=
\overline{\operatorname{span}}\{D_s:s\in\mathbb Z\},
\end{align*}
and purely nondeterministic means
\begin{align*}
\bigcap_{t\in\mathbb Z}\overline{\operatorname{span}}\{Y_s:s\leq t\}=\{0\}.
\end{align*}
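Two standard examples, added here as illustrations under stated assumptions rather than as part of the theorem, show the two extreme cases. The symbols $w_t$, $\tau^2$, $\phi$, $A$, $B$, and $\lambda$ are introduced only for these examples. First, assume that $(w_t)_{t\in\mathbb Z}$ is a white-noise sequence (mean zero, variance $\tau^2>0$, pairwise uncorrelated) and that $|\phi|<1$. The causal AR(1) process
\begin{align*}
X_t=\sum_{j=0}^{\infty}\phi^jw_{t-j},\qquad X_t=\phi X_{t-1}+w_t,
\end{align*}
has $D_t=0$, $\varepsilon_t=w_t$, $\sigma^2=\tau^2$, and $\psi_j=\phi^j$. At the other extreme, assume that $A,B\in L^2(\Omega,\mathcal F,\mathbb P)$ are mean-zero and uncorrelated with $\mathbb E[A^2]=\mathbb E[B^2]$, and let $\lambda\in\mathbb R$. The harmonic process
\begin{align*}
X_t=A\cos(\lambda t)+B\sin(\lambda t),\qquad X_t=2\cos(\lambda)X_{t-1}-X_{t-2},
\end{align*}
is second-order stationary with $\mathbb E[X_tX_s]=\mathbb E[A^2]\cos(\lambda(t-s))$. The recursion shows $X_t\in\mathcal H_{t-1}^X$, so all past spaces coincide with the remote past, and therefore $D_t=X_t$, $Y_t=0$, and $\sigma^2=0$.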
Proof
[proofplan]
We work inside the Hilbert space generated by the process and use the unitary shift induced by stationarity. The deterministic component is the orthogonal projection of $X_t$ onto the remote past $\mathcal H_{-\infty}^X$, and the remaining component $Y_t$ has zero remote past. For the purely nondeterministic component, each one-step increment $\mathcal H_t^Y\ominus\mathcal H_{t-1}^Y$ is spanned by the innovation $\varepsilon_t$, and iterating these orthogonal decompositions gives finite moving-average approximations. The projections of $Y_t$ onto the decreasing past spaces $\mathcal H_{t-m-1}^Y$ converge to zero as $m\to\infty$, so the finite expansions converge in $L^2$ to the desired Wold expansion.
[/proofplan]
[step:Construct the stationary shift on the Hilbert space generated by the process]
Let
\begin{align*}
\mathcal H^X:=\overline{\operatorname{span}}\{X_t:t\in\mathbb Z\}\subset L^2(\Omega,\mathcal F,\mathbb P),
\end{align*}
with inner product $(U,V)_{L^2}:=\mathbb E[UV]$. Define first on finite linear combinations the map
\begin{align*}
S_0:\operatorname{span}\{X_t:t\in\mathbb Z\}&\to \operatorname{span}\{X_t:t\in\mathbb Z\}\\
\sum_{k=1}^n a_kX_{t_k}&\mapsto \sum_{k=1}^n a_kX_{t_k+1}.
\end{align*}
Second-order stationarity gives, for every finite family $(a_k)_{k=1}^n\subset\mathbb R$ and $(t_k)_{k=1}^n\subset\mathbb Z$,
\begin{align*}
\left\|\sum_{k=1}^n a_kX_{t_k+1}\right\|_{L^2}^2
&=
\sum_{k=1}^n\sum_{\ell=1}^n a_ka_\ell\,\mathbb E[X_{t_k+1}X_{t_\ell+1}]\\
&=
\sum_{k=1}^n\sum_{\ell=1}^n a_ka_\ell\,\mathbb E[X_{t_k}X_{t_\ell}]\\
&=
\left\|\sum_{k=1}^n a_kX_{t_k}\right\|_{L^2}^2.
\end{align*}
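Thus $S_0$ is a well-defined linear isometry on a dense subspace of $\mathcal H^X$: if two finite combinations represent the same element of $L^2$, their difference has norm zero, and so does the difference of their shifts. Its range is the same algebraic span, because the backward shift on finite linear combinations sends $X_t$ to $X_{t-1}$. Hence $S_0$ extends uniquely by continuity to an isometry of $\mathcal H^X$ whose range is both closed and dense, that is, to a unitary operator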
\begin{align*}
S:\mathcal H^X\to\mathcal H^X.
\end{align*}
For every $t\in\mathbb Z$, this operator satisfies
\begin{align*}
S\mathcal H_t^X=\mathcal H_{t+1}^X.
\end{align*}
Consequently $S\mathcal H_{-\infty}^X=\mathcal H_{-\infty}^X$.
[guided]
The point of stationarity is that it lets us represent time translation as a unitary operator. We define
\begin{align*}
\mathcal H^X:=\overline{\operatorname{span}}\{X_t:t\in\mathbb Z\}\subset L^2(\Omega,\mathcal F,\mathbb P),
\end{align*}
and use the Hilbert-space inner product $(U,V)_{L^2}:=\mathbb E[UV]$. On a finite linear combination of process variables, define the one-step shift by
\begin{align*}
S_0:\operatorname{span}\{X_t:t\in\mathbb Z\}&\to \operatorname{span}\{X_t:t\in\mathbb Z\}\\
\sum_{k=1}^n a_kX_{t_k}&\mapsto \sum_{k=1}^n a_kX_{t_k+1}.
\end{align*}
This is well-defined because it preserves the $L^2$ norm. Indeed, second-order stationarity gives
\begin{align*}
\left\|\sum_{k=1}^n a_kX_{t_k+1}\right\|_{L^2}^2
&=
\sum_{k=1}^n\sum_{\ell=1}^n a_ka_\ell\,\mathbb E[X_{t_k+1}X_{t_\ell+1}]\\
&=
\sum_{k=1}^n\sum_{\ell=1}^n a_ka_\ell\,\mathbb E[X_{t_k}X_{t_\ell}]\\
&=
\left\|\sum_{k=1}^n a_kX_{t_k}\right\|_{L^2}^2.
\end{align*}
Hence if a finite linear combination represents the zero element of $L^2$, its shifted combination also represents the zero element. The map is not merely an isometry into $\mathcal H^X$; it is onto the algebraic span, since the backward shift sends each generator $X_t$ to $X_{t-1}$. Therefore $S_0$ extends uniquely by continuity to an isometry of $\mathcal H^X$; this extension has dense range and, being an isometry, closed range, so its range is all of $\mathcal H^X$. The extension is thus a unitary map
\begin{align*}
S:\mathcal H^X\to\mathcal H^X.
\end{align*}
The identity $SX_t=X_{t+1}$ implies
\begin{align*}
S\mathcal H_t^X=\mathcal H_{t+1}^X
\end{align*}
for every $t\in\mathbb Z$. Applying this to the intersection over all $t$ gives
\begin{align*}
S\mathcal H_{-\infty}^X
=
S\left(\bigcap_{t\in\mathbb Z}\mathcal H_t^X\right)
=
\bigcap_{t\in\mathbb Z}S\mathcal H_t^X
=
\bigcap_{t\in\mathbb Z}\mathcal H_{t+1}^X
=
\mathcal H_{-\infty}^X.
\end{align*}
Thus the remote past is invariant under the stationary time shift.
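As a concrete check of the isometry property, write $\gamma(h):=\mathbb E[X_{t+h}X_t]$ for the autocovariance; this notation is not used elsewhere in the proof and is introduced only for this illustration (by stationarity it does not depend on $t$). For the finite combination $U:=2X_0-3X_{-1}$ we have $S_0U=2X_1-3X_0$, and
\begin{align*}
\|U\|_{L^2}^2&=4\,\mathbb E[X_0^2]-12\,\mathbb E[X_0X_{-1}]+9\,\mathbb E[X_{-1}^2]=13\gamma(0)-12\gamma(1),\\
\|S_0U\|_{L^2}^2&=4\,\mathbb E[X_1^2]-12\,\mathbb E[X_1X_0]+9\,\mathbb E[X_0^2]=13\gamma(0)-12\gamma(1).
\end{align*}
The two norms agree because each depends only on the coefficients and on the time differences, which the shift leaves unchanged.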
[/guided]
[/step]
[step:Project onto the remote past to obtain the deterministic component]
Let
\begin{align*}
P_{-\infty}:\mathcal H^X\to\mathcal H_{-\infty}^X
\end{align*}
be the orthogonal projection onto the closed subspace $\mathcal H_{-\infty}^X$. Define
\begin{align*}
D:\mathbb Z&\to L^2(\Omega,\mathcal F,\mathbb P)\\
t&\mapsto D_t:=P_{-\infty}X_t
\end{align*}
and
\begin{align*}
Y:\mathbb Z&\to L^2(\Omega,\mathcal F,\mathbb P)\\
t&\mapsto Y_t:=X_t-D_t.
\end{align*}
Let
\begin{align*}
L^2_0(\Omega,\mathcal F,\mathbb P):=\{Z\in L^2(\Omega,\mathcal F,\mathbb P):\mathbb E[Z]=0\}
\end{align*}
denote the closed mean-zero subspace. The expectation functional $Z\mapsto \mathbb E[Z]$ is continuous on $L^2(\Omega,\mathcal F,\mathbb P)$ by Cauchy-Schwarz, and each $X_t$ belongs to $L^2_0(\Omega,\mathcal F,\mathbb P)$; hence $\mathcal H^X\subset L^2_0(\Omega,\mathcal F,\mathbb P)$. Therefore $D_t,Y_t\in L^2_0(\Omega,\mathcal F,\mathbb P)$ for every $t\in\mathbb Z$.
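Since $S$ is unitary and $S\mathcal H_{-\infty}^X=\mathcal H_{-\infty}^X$, we also have $S^{-1}\mathcal H_{-\infty}^X=\mathcal H_{-\infty}^X$, so $S$ maps the orthogonal complement of $\mathcal H_{-\infty}^X$ onto itself and the projection $P_{-\infty}$ commutes with $S$. Because $X_t=S^tX_0$, this gives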
\begin{align*}
D_t=S^tD_0,\qquad Y_t=S^tY_0.
\end{align*}
The unitarity of $S$ gives second-order stationarity of both processes, and the preceding paragraph gives their mean-zero property.
For all $s,t\in\mathbb Z$, $D_s\in\mathcal H_{-\infty}^X$ and $Y_t=X_t-P_{-\infty}X_t$ is orthogonal to $\mathcal H_{-\infty}^X$, so
\begin{align*}
\mathbb E[D_sY_t]=0.
\end{align*}
It remains to verify determinism of $(D_t)$. For each $t\in\mathbb Z$, define
\begin{align*}
\mathcal H_t^D:=\overline{\operatorname{span}}\{D_s:s\leq t\}.
\end{align*}
Since every $D_s$ belongs to $\mathcal H_{-\infty}^X$, we have $\mathcal H_t^D\subset\mathcal H_{-\infty}^X$. Conversely, if $Z\in\mathcal H_{-\infty}^X$, then $Z\in\mathcal H_t^X$, so there are finite linear combinations
\begin{align*}
Z_n=\sum_{k=1}^{N_n}a_{n,k}X_{s_{n,k}},\qquad s_{n,k}\leq t,
\end{align*}
such that $Z_n\to Z$ in $L^2$. Applying the continuous projection $P_{-\infty}$ gives
\begin{align*}
P_{-\infty}Z_n=\sum_{k=1}^{N_n}a_{n,k}D_{s_{n,k}}\to P_{-\infty}Z=Z
\end{align*}
in $L^2$. Hence $Z\in\mathcal H_t^D$, and therefore
\begin{align*}
\mathcal H_t^D=\mathcal H_{-\infty}^X
\end{align*}
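for every $t\in\mathbb Z$. Consequently both $\bigcap_{t\in\mathbb Z}\mathcal H_t^D$ and $\overline{\operatorname{span}}\{D_s:s\in\mathbb Z\}$ equal $\mathcal H_{-\infty}^X$, so $(D_t)$ is deterministic.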
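The projection onto the remote past can be seen in a simple case. The following sketch is an illustration only; the process and the symbols $Z$, $w_t$, and $\tau^2$ are assumptions, not part of the proof. Let $(w_t)_{t\in\mathbb Z}$ be white noise with variance $\tau^2>0$, let $Z\in L^2(\Omega,\mathcal F,\mathbb P)$ be mean-zero and uncorrelated with every $w_t$, and set $X_t:=Z+w_t$. Then $\mathbb E[X_tX_s]=\mathbb E[Z^2]+\tau^2\mathbf 1_{\{s=t\}}$, so the process is second-order stationary. For every $t\in\mathbb Z$ and $N\geq 1$,
\begin{align*}
\left\|\frac1N\sum_{k=0}^{N-1}X_{t-k}-Z\right\|_{L^2}^2
=
\left\|\frac1N\sum_{k=0}^{N-1}w_{t-k}\right\|_{L^2}^2
=
\frac{\tau^2}{N}\to 0,
\end{align*}
so $Z\in\mathcal H_t^X$ for every $t$, hence $Z\in\mathcal H_{-\infty}^X$. Since then $w_s=X_s-Z\in\mathcal H_t^X$ for $s\leq t$, we get $\mathcal H_t^X=\operatorname{span}\{Z\}\oplus\overline{\operatorname{span}}\{w_s:s\leq t\}$, and the white-noise past spaces have trivial intersection. In this example $\mathcal H_{-\infty}^X=\operatorname{span}\{Z\}$, $D_t=P_{-\infty}X_t=Z$ for all $t$, and $Y_t=w_t$.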
[/step]
[step:Show that the residual process has zero remote past]
For every $t\in\mathbb Z$, define
\begin{align*}
\mathcal H_t^Y:=\overline{\operatorname{span}}\{Y_s:s\leq t\}.
\end{align*}
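Because $Y_s=X_s-P_{-\infty}X_s$, each $Y_s$ is orthogonal to $\mathcal H_{-\infty}^X$, so $\mathcal H_t^Y\perp\mathcal H_{-\infty}^X$. For $s\leq t$ we have $X_s=D_s+Y_s$ with $D_s\in\mathcal H_{-\infty}^X$, and conversely $\mathcal H_{-\infty}^X\subset\mathcal H_t^X$ and $Y_s=X_s-D_s\in\mathcal H_t^X$. Hence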
\begin{align*}
\mathcal H_t^X=\mathcal H_{-\infty}^X\oplus \mathcal H_t^Y
\end{align*}
as an orthogonal direct sum. If
\begin{align*}
Z\in\bigcap_{t\in\mathbb Z}\mathcal H_t^Y,
\end{align*}
then $Z\in\bigcap_{t\in\mathbb Z}\mathcal H_t^X=\mathcal H_{-\infty}^X$, while also $Z\perp\mathcal H_{-\infty}^X$. Therefore $\|Z\|_{L^2}^2=(Z,Z)_{L^2}=0$, so $Z=0$ in $L^2$. Thus
\begin{align*}
\bigcap_{t\in\mathbb Z}\mathcal H_t^Y=\{0\},
\end{align*}
and $(Y_t)$ is purely nondeterministic.
[/step]
[step:Define the innovations and identify each one-step increment]
For each $t\in\mathbb Z$, let
\begin{align*}
P_{t-1}^Y:\mathcal H^Y\to\mathcal H_{t-1}^Y
\end{align*}
denote the orthogonal projection, where
\begin{align*}
\mathcal H^Y:=\overline{\operatorname{span}}\{Y_t:t\in\mathbb Z\}.
\end{align*}
Define the innovation process
\begin{align*}
\varepsilon:\mathbb Z&\to L^2(\Omega,\mathcal F,\mathbb P)\\
t&\mapsto \varepsilon_t:=Y_t-P_{t-1}^YY_t.
\end{align*}
Because $\mathcal H^Y\subset L^2_0(\Omega,\mathcal F,\mathbb P)$ and $\varepsilon_t\in\mathcal H^Y$, we have
\begin{align*}
\mathbb E[\varepsilon_t]=0
\end{align*}
for every $t\in\mathbb Z$. Also $\varepsilon_t\perp\mathcal H_{t-1}^Y$ and
\begin{align*}
\mathcal H_t^Y=\mathcal H_{t-1}^Y\oplus \operatorname{span}\{\varepsilon_t\}.
\end{align*}
Since $Y_t=S^tY_0$, the unitary $S$ maps $\mathcal H_r^Y$ onto $\mathcal H_{r+1}^Y$ for every $r\in\mathbb Z$. Thus $S^tP_{-1}^YS^{-t}$ is the orthogonal projection from $\mathcal H^Y$ onto $\mathcal H_{t-1}^Y$. By uniqueness of orthogonal projection,
\begin{align*}
P_{t-1}^Y=S^tP_{-1}^YS^{-t}.
\end{align*}
Therefore
\begin{align*}
\varepsilon_t
=Y_t-P_{t-1}^YY_t
=S^tY_0-S^tP_{-1}^YY_0
=S^t\varepsilon_0.
\end{align*}
Define
\begin{align*}
\sigma^2:=\mathbb E[\varepsilon_0^2]\geq 0.
\end{align*}
Since $S$ is unitary and $\varepsilon_t=S^t\varepsilon_0$, we have $\mathbb E[\varepsilon_t^2]=\sigma^2$ for all $t\in\mathbb Z$.
If $s<t$, then $\varepsilon_s\in\mathcal H_s^Y\subset\mathcal H_{t-1}^Y$, so $\varepsilon_t\perp\varepsilon_s$; the case $s>t$ follows by symmetry of the inner product. Thus
\begin{align*}
\mathbb E[\varepsilon_t\varepsilon_s]=0
\end{align*}
whenever $s\neq t$. Since $\mathcal H_{-\infty}^X\perp\mathcal H^Y$, every $D_s$ is orthogonal to every $\varepsilon_t$.
If $\sigma^2=0$, then $\varepsilon_t=0$ in $L^2$ for every $t$, and therefore $\mathcal H_t^Y=\mathcal H_{t-1}^Y$ for every $t$. Hence all spaces $\mathcal H_t^Y$ are equal, and their intersection equals each one of them. Since $(Y_t)$ is purely nondeterministic, this common space is $\{0\}$, so $Y_t=0$ for all $t$. This is precisely the zero moving-average case.
[guided]
The innovation at time $t$ is the part of $Y_t$ that cannot be predicted from its closed linear past. We define
\begin{align*}
\mathcal H^Y:=\overline{\operatorname{span}}\{Y_t:t\in\mathbb Z\}
\end{align*}
and let
\begin{align*}
P_{t-1}^Y:\mathcal H^Y\to\mathcal H_{t-1}^Y
\end{align*}
be the orthogonal projection. Then set
\begin{align*}
\varepsilon:\mathbb Z&\to L^2(\Omega,\mathcal F,\mathbb P)\\
t&\mapsto \varepsilon_t:=Y_t-P_{t-1}^YY_t.
\end{align*}
By the defining property of orthogonal projection, $\varepsilon_t$ is orthogonal to $\mathcal H_{t-1}^Y$. Since $\mathcal H^Y$ is contained in the closed mean-zero subspace $L^2_0(\Omega,\mathcal F,\mathbb P)$, and since both $Y_t$ and $P_{t-1}^YY_t$ belong to $\mathcal H^Y$, the innovation also belongs to $L^2_0(\Omega,\mathcal F,\mathbb P)$. Hence
\begin{align*}
\mathbb E[\varepsilon_t]=0
\end{align*}
for every $t\in\mathbb Z$.
Why does this single vector span the whole new information at time $t$? Since
\begin{align*}
\mathcal H_t^Y=\overline{\operatorname{span}}\bigl(\mathcal H_{t-1}^Y\cup\{Y_t\}\bigr)
\end{align*}
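and $Y_t=P_{t-1}^YY_t+\varepsilon_t$ with $P_{t-1}^YY_t\in\mathcal H_{t-1}^Y$, the subspace $\mathcal H_{t-1}^Y+\operatorname{span}\{\varepsilon_t\}$ contains $\mathcal H_{t-1}^Y\cup\{Y_t\}$ and is contained in $\mathcal H_t^Y$. It is closed, being the sum of a closed subspace and a finite-dimensional one. Hence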
\begin{align*}
\mathcal H_t^Y=\mathcal H_{t-1}^Y\oplus \operatorname{span}\{\varepsilon_t\}.
\end{align*}
The sum is orthogonal because $\varepsilon_t\perp\mathcal H_{t-1}^Y$.
We also need the innovations to move correctly under the time shift. From $Y_t=S^tY_0$ it follows that $S\mathcal H_r^Y=\mathcal H_{r+1}^Y$ for every $r\in\mathbb Z$. Therefore the operator $S^tP_{-1}^YS^{-t}$ is an orthogonal projection onto $\mathcal H_{t-1}^Y$: the conjugation by the unitary $S^t$ transports the target space $\mathcal H_{-1}^Y$ to $\mathcal H_{t-1}^Y$ and preserves orthogonality. Orthogonal projections onto a fixed closed subspace are unique, so
\begin{align*}
P_{t-1}^Y=S^tP_{-1}^YS^{-t}.
\end{align*}
Consequently
\begin{align*}
\varepsilon_t
&=Y_t-P_{t-1}^YY_t\\
&=S^tY_0-S^tP_{-1}^YS^{-t}S^tY_0\\
&=S^t(Y_0-P_{-1}^YY_0)\\
&=S^t\varepsilon_0.
\end{align*}
Define
\begin{align*}
\sigma^2:=\mathbb E[\varepsilon_0^2]\geq 0.
\end{align*}
Because $S$ is unitary and $\varepsilon_t=S^t\varepsilon_0$, we obtain
\begin{align*}
\mathbb E[\varepsilon_t^2]=\|\varepsilon_t\|_{L^2}^2=\|S^t\varepsilon_0\|_{L^2}^2=\|\varepsilon_0\|_{L^2}^2=\sigma^2
\end{align*}
for every $t\in\mathbb Z$.
The innovations are orthogonal at distinct times. If $s<t$, then
\begin{align*}
\varepsilon_s\in\mathcal H_s^Y\subset\mathcal H_{t-1}^Y.
\end{align*}
Since $\varepsilon_t\perp\mathcal H_{t-1}^Y$, we get
\begin{align*}
\mathbb E[\varepsilon_t\varepsilon_s]=0.
\end{align*}
Symmetry of the inner product gives the same conclusion for $t<s$. Also, every $D_s$ belongs to $\mathcal H_{-\infty}^X$, while every $\varepsilon_t$ belongs to $\mathcal H^Y$, and we already proved $\mathcal H_{-\infty}^X\perp\mathcal H^Y$. Thus
\begin{align*}
\mathbb E[D_s\varepsilon_t]=0
\end{align*}
for all $s,t\in\mathbb Z$.
Finally suppose $\sigma^2=0$. Then each $\varepsilon_t$ has zero $L^2$ norm, so $\varepsilon_t=0$ in $L^2$. The decomposition
\begin{align*}
\mathcal H_t^Y=\mathcal H_{t-1}^Y\oplus \operatorname{span}\{\varepsilon_t\}
\end{align*}
therefore reduces to $\mathcal H_t^Y=\mathcal H_{t-1}^Y$ for all $t$. Thus all the past spaces $\mathcal H_t^Y$ are the same space. Their intersection is that common space, but pure nondeterminism says the intersection is $\{0\}$. Hence $\mathcal H_t^Y=\{0\}$ for every $t$, and in particular $Y_t=0$ in $L^2$ for every $t$. This is the exceptional zero-variance case.
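The construction of the innovations can be illustrated on a causal AR(1) process. This is a sketch under assumptions that are not part of the proof; the symbols $w_t$, $\tau^2$, and $\phi$ are hypothetical. Suppose $Y_t=\sum_{j=0}^{\infty}\phi^jw_{t-j}$ with $|\phi|<1$ and $(w_t)_{t\in\mathbb Z}$ a white noise of variance $\tau^2>0$. Then $Y_t=\phi Y_{t-1}+w_t$, where $\phi Y_{t-1}\in\mathcal H_{t-1}^Y$ and $w_t$ is orthogonal to $\overline{\operatorname{span}}\{w_s:s\leq t-1\}\supset\mathcal H_{t-1}^Y$. Hence
\begin{align*}
P_{t-1}^YY_t=\phi Y_{t-1},\qquad
\varepsilon_t=Y_t-\phi Y_{t-1}=w_t,\qquad
\sigma^2=\tau^2.
\end{align*}
In this example the best linear one-step predictor uses only the most recent value, and the innovation coincides with the driving noise.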
[/guided]
[/step]
[step:Expand the purely nondeterministic component into orthogonal innovations]
Assume now that $\sigma^2>0$. For each $j\geq 0$, define
\begin{align*}
\psi_j:=\frac{\mathbb E[Y_j\varepsilon_0]}{\sigma^2}.
\end{align*}
Using $Y_t=S^{t-j}Y_j$, $\varepsilon_{t-j}=S^{t-j}\varepsilon_0$, and unitarity of $S$, we obtain
\begin{align*}
\frac{\mathbb E[Y_t\varepsilon_{t-j}]}{\sigma^2}
=
\frac{(S^{t-j}Y_j,S^{t-j}\varepsilon_0)_{L^2}}{\sigma^2}
=
\frac{(Y_j,\varepsilon_0)_{L^2}}{\sigma^2}
=\psi_j
\end{align*}
for all $t\in\mathbb Z$ and $j\geq 0$. Since
\begin{align*}
Y_t=P_{t-1}^YY_t+\varepsilon_t
\end{align*}
and $\varepsilon_t\perp\mathcal H_{t-1}^Y$, we have
\begin{align*}
\psi_0=\frac{\mathbb E[Y_t\varepsilon_t]}{\sigma^2}
=
\frac{\mathbb E[\varepsilon_t^2]}{\sigma^2}
=1.
\end{align*}
For $m\geq 0$, iterating
\begin{align*}
\mathcal H_r^Y=\mathcal H_{r-1}^Y\oplus\operatorname{span}\{\varepsilon_r\}
\end{align*}
from $r=t$ down to $r=t-m$ gives
\begin{align*}
\mathcal H_t^Y
=
\mathcal H_{t-m-1}^Y
\oplus
\bigoplus_{j=0}^{m}\operatorname{span}\{\varepsilon_{t-j}\}.
\end{align*}
Projecting $Y_t\in\mathcal H_t^Y$ onto this orthogonal direct sum yields
\begin{align*}
Y_t
=
P_{t-m-1}^YY_t
+
\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}.
\end{align*}
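The finite sum consists of pairwise orthogonal terms and is the orthogonal projection of $Y_t$ onto $\bigoplus_{j=0}^{m}\operatorname{span}\{\varepsilon_{t-j}\}$, so the Pythagorean theorem and Bessel's inequality give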
\begin{align*}
\sum_{j=0}^{m}\psi_j^2\sigma^2
=
\left\|\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}\right\|_{L^2}^2
\leq
\|Y_t\|_{L^2}^2.
\end{align*}
Dividing by $\sigma^2>0$ and letting $m\to\infty$ gives
\begin{align*}
\sum_{j=0}^{\infty}|\psi_j|^2<\infty.
\end{align*}
[guided]
Assume $\sigma^2>0$; the case $\sigma^2=0$ was already identified as the zero moving-average case. For each $j\geq 0$, define the coefficient
\begin{align*}
\psi_j:=\frac{\mathbb E[Y_j\varepsilon_0]}{\sigma^2}.
\end{align*}
This is the coefficient obtained by projecting $Y_j$ onto the one-dimensional innovation space $\operatorname{span}\{\varepsilon_0\}$. The denominator is non-zero by the present assumption.
The same coefficient appears at every time because the innovations are transported by the unitary shift. Indeed, for every $t\in\mathbb Z$ and $j\geq 0$, we have $Y_t=S^{t-j}Y_j$ and $\varepsilon_{t-j}=S^{t-j}\varepsilon_0$. Since $S$ preserves the $L^2$ inner product,
\begin{align*}
\frac{\mathbb E[Y_t\varepsilon_{t-j}]}{\sigma^2}
=
\frac{(S^{t-j}Y_j,S^{t-j}\varepsilon_0)_{L^2}}{\sigma^2}
=
\frac{(Y_j,\varepsilon_0)_{L^2}}{\sigma^2}
=\psi_j.
\end{align*}
For $j=0$, the decomposition $Y_t=P_{t-1}^YY_t+\varepsilon_t$ and the orthogonality $\varepsilon_t\perp\mathcal H_{t-1}^Y$ give
\begin{align*}
\psi_0
=
\frac{\mathbb E[Y_t\varepsilon_t]}{\sigma^2}
=
\frac{\mathbb E[(P_{t-1}^YY_t+\varepsilon_t)\varepsilon_t]}{\sigma^2}
=
\frac{\mathbb E[\varepsilon_t^2]}{\sigma^2}
=1.
\end{align*}
Now fix $m\geq 0$. Iterating the orthogonal decompositions
\begin{align*}
\mathcal H_r^Y=\mathcal H_{r-1}^Y\oplus\operatorname{span}\{\varepsilon_r\}
\end{align*}
for $r=t,t-1,\dots,t-m$ yields
\begin{align*}
\mathcal H_t^Y
=
\mathcal H_{t-m-1}^Y
\oplus
\bigoplus_{j=0}^{m}\operatorname{span}\{\varepsilon_{t-j}\}.
\end{align*}
Projecting $Y_t\in\mathcal H_t^Y$ onto this orthogonal direct sum gives
\begin{align*}
Y_t
=
P_{t-m-1}^YY_t
+
\sum_{j=0}^{m}\psi_j\varepsilon_{t-j},
\end{align*}
because the coefficient of $\varepsilon_{t-j}$ in an orthogonal projection onto $\operatorname{span}\{\varepsilon_{t-j}\}$ is
\begin{align*}
\frac{\mathbb E[Y_t\varepsilon_{t-j}]}{\mathbb E[\varepsilon_{t-j}^2]}=\frac{\mathbb E[Y_t\varepsilon_{t-j}]}{\sigma^2}=\psi_j.
\end{align*}
The finite innovation sum is orthogonal, so the Pythagorean theorem gives
\begin{align*}
\left\|\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}\right\|_{L^2}^2
=
\sum_{j=0}^{m}\psi_j^2\|\varepsilon_{t-j}\|_{L^2}^2
=
\sum_{j=0}^{m}\psi_j^2\sigma^2.
\end{align*}
Since this finite sum is the orthogonal projection of $Y_t$ onto a closed subspace of $\mathcal H_t^Y$, its norm is at most $\|Y_t\|_{L^2}$. Therefore
\begin{align*}
\sum_{j=0}^{m}\psi_j^2\sigma^2
\leq
\|Y_t\|_{L^2}^2.
\end{align*}
Dividing by $\sigma^2>0$ and letting $m\to\infty$ gives
\begin{align*}
\sum_{j=0}^{\infty}|\psi_j|^2<\infty.
\end{align*}
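As a worked instance of the coefficient formula, assume (for illustration only, with hypothetical symbols $w_t$, $\tau^2$, $\phi$) that $Y_t=\sum_{k=0}^{\infty}\phi^kw_{t-k}$ with $|\phi|<1$ and $(w_t)_{t\in\mathbb Z}$ a white noise of variance $\tau^2>0$. In this case $Y_t-\phi Y_{t-1}=w_t$ is orthogonal to $\mathcal H_{t-1}^Y$, so $\varepsilon_t=w_t$ and $\sigma^2=\tau^2$. In $\mathbb E[Y_jw_0]$ only the term $k=j$ of the series contributes, which gives
\begin{align*}
\psi_j=\frac{\mathbb E[Y_jw_0]}{\tau^2}=\frac{\phi^j\tau^2}{\tau^2}=\phi^j,\qquad
\sum_{j=0}^{\infty}\psi_j^2=\frac{1}{1-\phi^2}=\frac{\|Y_t\|_{L^2}^2}{\tau^2}.
\end{align*}
Here $\psi_0=1$ as required, and the bound $\sum_{j=0}^{m}\psi_j^2\sigma^2\leq\|Y_t\|_{L^2}^2$ becomes an equality in the limit $m\to\infty$.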
[/guided]
[/step]
[step:Let the remote projections vanish to obtain the infinite moving average]
We use the following Hilbert-space fact. If $(K_m)_{m\geq 0}$ is a decreasing sequence of closed subspaces of a Hilbert space $H$, if $P_m:H\to K_m$ is the orthogonal projection, and if
\begin{align*}
K_\infty:=\bigcap_{m=0}^{\infty}K_m,
\end{align*}
then $P_mx\to P_\infty x$ in $H$, where $P_\infty:H\to K_\infty$ is the orthogonal projection.
Indeed, for $n>m$, the inclusion $K_n\subset K_m$ implies $P_nx\in K_m$ and $P_n=P_nP_m$, so $P_mx-P_nx\perp P_nx$ and the Pythagorean theorem gives
\begin{align*}
\|P_mx-P_nx\|_H^2=\|P_mx\|_H^2-\|P_nx\|_H^2.
\end{align*}
Since $(\|P_mx\|_H^2)_{m\geq 0}$ is nonincreasing and bounded below, it converges; thus $(P_mx)_{m\geq 0}$ is Cauchy and converges to some $z\in H$. Since $P_mx\in K_r$ for every $m\geq r$ and $K_r$ is closed, $z\in K_r$ for every $r$, hence $z\in K_\infty$. For every $w\in K_\infty$, we have $w\in K_m$ for all $m$, so
\begin{align*}
(x-P_mx,w)_H=0.
\end{align*}
Passing to the limit gives $(x-z,w)_H=0$ for all $w\in K_\infty$, hence $z=P_\infty x$.
Apply this fact with
\begin{align*}
H=\mathcal H^Y,\qquad K_m=\mathcal H_{t-m-1}^Y,\qquad x=Y_t.
\end{align*}
Then
\begin{align*}
\bigcap_{m=0}^{\infty}\mathcal H_{t-m-1}^Y
=
\bigcap_{r\in\mathbb Z}\mathcal H_r^Y
=
\{0\},
\end{align*}
so
\begin{align*}
P_{t-m-1}^YY_t\to 0
\end{align*}
in $L^2$. Taking the limit in
\begin{align*}
Y_t
=
P_{t-m-1}^YY_t
+
\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}
\end{align*}
gives
\begin{align*}
Y_t=\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$.
Combining this identity with $X_t=D_t+Y_t$ gives
\begin{align*}
X_t=D_t+\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$ for every $t\in\mathbb Z$. The deterministic component is orthogonal to all innovations, and the moving-average component is purely nondeterministic by the construction of $(Y_t)$. This completes the proof.
[guided]
It remains to justify that the finite expansions converge to the infinite moving average. We use the following Hilbert-space fact. Let $(K_m)_{m\geq 0}$ be a decreasing sequence of closed subspaces of a Hilbert space $H$, let $P_m:H\to K_m$ be the orthogonal projection, and define
\begin{align*}
K_\infty:=\bigcap_{m=0}^{\infty}K_m.
\end{align*}
If $P_\infty:H\to K_\infty$ is the orthogonal projection, then $P_mx\to P_\infty x$ in $H$ for every $x\in H$.
We prove the fact because it is exactly the mechanism by which the remote-past term disappears. If $n>m$, then $K_n\subset K_m$, so $P_nx\in K_m$ and $P_n=P_nP_m$. Writing $P_mx-P_nx=(I-P_n)P_mx$, where $I$ is the identity on $H$, gives the orthogonal decomposition
\begin{align*}
P_mx=P_nx+(P_mx-P_nx),
\end{align*}
where $P_mx-P_nx\perp K_n$ and $P_nx\in K_n$. Hence
\begin{align*}
\|P_mx-P_nx\|_H^2=\|P_mx\|_H^2-\|P_nx\|_H^2.
\end{align*}
The sequence $(\|P_mx\|_H^2)_{m\geq 0}$ is nonincreasing and bounded below by $0$, so it converges and is in particular Cauchy. The displayed identity then shows that $(P_mx)_{m\geq 0}$ is Cauchy in $H$. Let its limit be $z\in H$.
For each fixed $r\geq 0$, all terms $P_mx$ with $m\geq r$ lie in $K_r$. Since $K_r$ is closed, the limit $z$ also lies in $K_r$. Therefore $z\in K_\infty$. If $w\in K_\infty$, then $w\in K_m$ for every $m$, and the defining property of the projection $P_m$ gives
\begin{align*}
(x-P_mx,w)_H=0.
\end{align*}
Passing to the limit in the inner product yields $(x-z,w)_H=0$ for every $w\in K_\infty$. Thus $z$ is the orthogonal projection of $x$ onto $K_\infty$, namely $z=P_\infty x$.
Apply this fact with
\begin{align*}
H=\mathcal H^Y,\qquad K_m=\mathcal H_{t-m-1}^Y,\qquad x=Y_t.
\end{align*}
The subspaces are decreasing because earlier past spaces are contained in later past spaces. Their intersection is
\begin{align*}
\bigcap_{m=0}^{\infty}\mathcal H_{t-m-1}^Y
=
\bigcap_{r\in\mathbb Z}\mathcal H_r^Y
=
\{0\},
\end{align*}
where the last equality is pure nondeterminism of $(Y_t)$. Therefore the corresponding projections satisfy
\begin{align*}
P_{t-m-1}^YY_t\to 0
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$.
Taking the $L^2$ limit in the finite orthogonal expansion
\begin{align*}
Y_t
=
P_{t-m-1}^YY_t
+
\sum_{j=0}^{m}\psi_j\varepsilon_{t-j}
\end{align*}
gives
\begin{align*}
Y_t=\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$. Since $X_t=D_t+Y_t$, we conclude
\begin{align*}
X_t=D_t+
\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}
\end{align*}
in $L^2(\Omega,\mathcal F,\mathbb P)$ for every $t\in\mathbb Z$. The orthogonality of $D_s$ to every innovation was proved when the innovations were constructed, and the moving-average component is purely nondeterministic because it is exactly the residual process $(Y_t)$. This proves the asserted Wold decomposition.
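The vanishing of the remote projections can be made explicit in a causal AR(1) example. This is a sketch under assumptions that are not part of the proof; the symbols $w_t$, $\tau^2$, and $\phi$ are hypothetical. Suppose $Y_t=\sum_{j=0}^{\infty}\phi^jw_{t-j}$ with $|\phi|<1$ and $(w_t)_{t\in\mathbb Z}$ a white noise of variance $\tau^2>0$. Iterating $Y_t=\phi Y_{t-1}+w_t$ gives
\begin{align*}
Y_t=\phi^{m+1}Y_{t-m-1}+\sum_{j=0}^{m}\phi^jw_{t-j},
\end{align*}
where the first term lies in $\mathcal H_{t-m-1}^Y$ and the sum is orthogonal to it. Hence $P_{t-m-1}^YY_t=\phi^{m+1}Y_{t-m-1}$ and
\begin{align*}
\|P_{t-m-1}^YY_t\|_{L^2}^2=\phi^{2(m+1)}\,\frac{\tau^2}{1-\phi^2}\to 0
\end{align*}
as $m\to\infty$. In this example the remainder decays geometrically; the general proof only guarantees convergence to zero, without a rate.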
[/guided]
[/step]