Linear Forecast from the Wold Representation — Statement & Proof

Linear Forecast from the Wold Representation (Theorem # 3643)

Theorem


Proof

[proofplan] Expand $X_{t+h}$ by its Wold representation and split the series into the terms involving innovations dated at most $t$ and the terms involving innovations dated after $t$. The first part lies in the closed linear span $\mathcal H_t^X$, while the second part is orthogonal to $\mathcal H_t^X$ because each future innovation is orthogonal to every earlier past space. The Hilbert-space projection characterization then identifies the first part as the best linear predictor, and the mean-square error formula follows from pairwise orthogonality of the innovation terms. [/proofplan]

[step:Split the Wold expansion into past and future innovation terms] Fix $t \in \mathbb Z$ and an integer $h \geq 1$. By the Wold representation applied at time $t+h$, the series \begin{align*} X_{t+h} = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t+h-j} \end{align*} converges in $L^2(\Omega,\mathcal F,\mathbb P)$. Define the finite future-innovation term $R_{t,h} \in L^2(\Omega,\mathcal F,\mathbb P)$ by \begin{align*} R_{t,h} := \sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}, \end{align*} and define the past-innovation term $S_{t,h} \in L^2(\Omega,\mathcal F,\mathbb P)$ by \begin{align*} S_{t,h} := \sum_{j=h}^{\infty}\psi_j\varepsilon_{t+h-j}. \end{align*} The series defining $S_{t,h}$ converges in $L^2(\Omega,\mathcal F,\mathbb P)$ because it is a tail of the $L^2$-convergent Wold series. Hence \begin{align*} X_{t+h}=R_{t,h}+S_{t,h} \end{align*} in $L^2(\Omega,\mathcal F,\mathbb P)$. [/step]

[step:Show that the past-innovation term belongs to $\mathcal H_t^X$] For each integer $j \geq h$, set $s_j := t+h-j$. Then $s_j \leq t$. Since $\varepsilon_{s_j} \in \mathcal H_{s_j}^X$ by the innovation hypothesis, and $\mathcal H_{s_j}^X \subset \mathcal H_t^X$ because $s_j \leq t$ and the past spaces increase with time, we have \begin{align*} \varepsilon_{t+h-j} \in \mathcal H_t^X \end{align*} for every $j \geq h$.
Each finite partial sum \begin{align*} S_{t,h,N} := \sum_{j=h}^{N}\psi_j\varepsilon_{t+h-j}, \qquad N \geq h, \end{align*} is a finite linear combination of elements of $\mathcal H_t^X$ and therefore belongs to $\mathcal H_t^X$. Since $\mathcal H_t^X$ is closed in $L^2(\Omega,\mathcal F,\mathbb P)$ and $S_{t,h,N} \to S_{t,h}$ in $L^2(\Omega,\mathcal F,\mathbb P)$, it follows that \begin{align*} S_{t,h} \in \mathcal H_t^X. \end{align*}

[guided] We must show that the candidate predictor $S_{t,h}$ is an admissible linear predictor, that is, an element of the past space $\mathcal H_t^X$. For $j \geq h$, define the integer $s_j := t+h-j$. The inequality $j \geq h$ gives $s_j \leq t$, so the innovation $\varepsilon_{s_j}$ is dated no later than time $t$. By hypothesis, every innovation satisfies $\varepsilon_s \in \mathcal H_s^X$. Since the past spaces are increasing in time, $s_j \leq t$ implies $\mathcal H_{s_j}^X \subset \mathcal H_t^X$. Therefore \begin{align*} \varepsilon_{t+h-j}=\varepsilon_{s_j}\in \mathcal H_t^X \end{align*} for every $j \geq h$. Now define, for each integer $N \geq h$, \begin{align*} S_{t,h,N} := \sum_{j=h}^{N}\psi_j\varepsilon_{t+h-j}. \end{align*} This is a finite linear combination of elements of $\mathcal H_t^X$, so $S_{t,h,N} \in \mathcal H_t^X$. The infinite series defining $S_{t,h}$ converges in $L^2(\Omega,\mathcal F,\mathbb P)$ because it is a tail of the Wold series. Since $\mathcal H_t^X$ is a closed subspace of $L^2(\Omega,\mathcal F,\mathbb P)$, the $L^2$-limit of the sequence $(S_{t,h,N})_{N \geq h}$ belongs to $\mathcal H_t^X$. Hence \begin{align*} S_{t,h} \in \mathcal H_t^X. \end{align*} [/guided] [/step]

[step:Show that the future-innovation error is orthogonal to $\mathcal H_t^X$] For each $0 \leq j < h$, set $r_j := t+h-j$. Then $r_j > t$. Since $\varepsilon_{r_j} \in \mathcal H_{r_j}^X \ominus \mathcal H_{r_j-1}^X$, we have \begin{align*} \mathbb E[\varepsilon_{r_j}Y]=0 \end{align*} for every $Y \in \mathcal H_{r_j-1}^X$. Because $t \leq r_j-1$ and the past spaces increase with time, we have $\mathcal H_t^X \subset \mathcal H_{r_j-1}^X$.
Therefore \begin{align*} \mathbb E[\varepsilon_{t+h-j}Y]=0 \end{align*} for every $Y \in \mathcal H_t^X$ and every $0 \leq j < h$. By finite linearity of the $L^2$ inner product, for every $Y \in \mathcal H_t^X$, \begin{align*} \mathbb E[R_{t,h}Y] = \sum_{j=0}^{h-1}\psi_j\mathbb E[\varepsilon_{t+h-j}Y] = 0. \end{align*} Thus \begin{align*} R_{t,h} \perp \mathcal H_t^X. \end{align*}

[guided] The terms in $R_{t,h}$ involve innovations dated after time $t$, and we must show that their entire linear combination is orthogonal to every admissible predictor in $\mathcal H_t^X$. Fix $0 \leq j < h$ and define $r_j := t+h-j$. Since $j<h$, we have $r_j>t$. The innovation hypothesis says \begin{align*} \varepsilon_{r_j}\in \mathcal H_{r_j}^X\ominus \mathcal H_{r_j-1}^X. \end{align*} By definition of the orthogonal difference, $\varepsilon_{r_j}$ is orthogonal to every element of $\mathcal H_{r_j-1}^X$. Since $t \leq r_j-1$ and the past spaces increase with time, the past space at time $t$ is contained in the past space at time $r_j-1$: \begin{align*} \mathcal H_t^X \subset \mathcal H_{r_j-1}^X. \end{align*} Therefore, for every $Y \in \mathcal H_t^X$, \begin{align*} \mathbb E[\varepsilon_{t+h-j}Y] = \mathbb E[\varepsilon_{r_j}Y] = 0. \end{align*} Now use that $R_{t,h}$ is a finite sum: \begin{align*} R_{t,h} = \sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}. \end{align*} For every $Y \in \mathcal H_t^X$, finite linearity of the $L^2$ inner product gives \begin{align*} \mathbb E[R_{t,h}Y] = \sum_{j=0}^{h-1}\psi_j\mathbb E[\varepsilon_{t+h-j}Y] = 0. \end{align*} This proves that the entire forecast-error candidate is orthogonal to the past space: \begin{align*} R_{t,h}\perp \mathcal H_t^X. \end{align*} [/guided] [/step]

[step:Identify the orthogonal projection onto the past] We have written \begin{align*} X_{t+h}=S_{t,h}+R_{t,h}, \end{align*} where $S_{t,h}\in \mathcal H_t^X$ and $R_{t,h}\perp \mathcal H_t^X$.
By the defining characterization of orthogonal projection in the Hilbert space $L^2(\Omega,\mathcal F,\mathbb P)$ (for a closed subspace $\mathcal M$ and $Z \in L^2(\Omega,\mathcal F,\mathbb P)$, the projection $P_{\mathcal M}Z$ is the unique $W \in \mathcal M$ with $Z-W \perp \mathcal M$), this implies \begin{align*} P_{\mathcal H_t^X}X_{t+h}=S_{t,h}. \end{align*} Therefore \begin{align*} P_{\mathcal H_t^X}X_{t+h} = \sum_{j=h}^{\infty}\psi_j\varepsilon_{t+h-j}, \end{align*} and subtracting this identity from the Wold expansion gives \begin{align*} X_{t+h}-P_{\mathcal H_t^X}X_{t+h} = R_{t,h} = \sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}. \end{align*} [/step]

[step:Compute the mean-square forecast error using innovation orthogonality] Using the forecast error representation and expanding the square in $L^2(\Omega,\mathcal F,\mathbb P)$, we obtain \begin{align*} \mathbb E\left[\left|X_{t+h}-P_{\mathcal H_t^X}X_{t+h}\right|^2\right] &= \mathbb E\left[\left|\sum_{j=0}^{h-1}\psi_j\varepsilon_{t+h-j}\right|^2\right] \\ &= \sum_{j=0}^{h-1}\sum_{k=0}^{h-1}\psi_j\psi_k\, \mathbb E[\varepsilon_{t+h-j}\varepsilon_{t+h-k}]. \end{align*} The white-noise assumption on the innovations (common variance $\sigma^2$ and pairwise orthogonality), together with the fact that $t+h-j \neq t+h-k$ whenever $j \neq k$, gives \begin{align*} \mathbb E[\varepsilon_{t+h-j}\varepsilon_{t+h-k}] = \begin{cases} \sigma^2, & j=k,\\ 0, & j\neq k. \end{cases} \end{align*} Hence all off-diagonal terms vanish, and the diagonal terms give \begin{align*} \mathbb E\left[\left|X_{t+h}-P_{\mathcal H_t^X}X_{t+h}\right|^2\right] = \sigma^2\sum_{j=0}^{h-1}|\psi_j|^2. \end{align*} This is the stated mean-square forecast error formula. [/step]
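The predictor and error identities proved above can be checked numerically. The following is a minimal Python sketch, not part of the original proof. It makes several illustrative assumptions: the Wold series is truncated to finitely many coefficients, the real-valued innovations are drawn i.i.d. $N(0,\sigma^2)$ (one instance of the white-noise hypothesis), and the coefficients are those of an AR(1), $\psi_j=\phi^j$. The sketch forms the predictor $\sum_{j\geq h}\psi_j\varepsilon_{t+h-j}$ from past innovations and compares a Monte Carlo estimate of the squared forecast error with $\sigma^2\sum_{j=0}^{h-1}|\psi_j|^2$. The function names `h_step_forecast`, `forecast_mse`, and `monte_carlo_mse` are hypothetical.

```python
import random


def h_step_forecast(psi, past_eps, h):
    """Truncated Wold predictor: sum_{j=h}^{J-1} psi[j] * eps_{t+h-j}.

    Assumed layout (hypothetical): past_eps[k] holds eps_{t-k}, so k = 0 is
    the newest past innovation, and J = len(psi) is the truncation length.
    The term with index j uses eps_{t+h-j} = past_eps[j - h].
    """
    return sum(psi[j] * past_eps[j - h] for j in range(h, len(psi)))


def forecast_mse(psi, sigma2, h):
    """Theoretical mean-square error sigma^2 * sum_{j=0}^{h-1} psi_j^2."""
    return sigma2 * sum(p * p for p in psi[:h])


def monte_carlo_mse(psi, sigma, h, n_rep=20000, seed=0):
    """Monte Carlo estimate of E|X_{t+h} - predictor|^2.

    Innovations are drawn i.i.d. N(0, sigma^2): one convenient instance of
    the white-noise assumption, not something the theorem requires.
    """
    rng = random.Random(seed)
    n_coef = len(psi)
    total = 0.0
    for _ in range(n_rep):
        future = [rng.gauss(0.0, sigma) for _ in range(h)]     # future[i] = eps_{t+h-i}
        past = [rng.gauss(0.0, sigma) for _ in range(n_coef)]  # past[k] = eps_{t-k}
        # Truncated Wold sum for X_{t+h}: j < h uses future innovations (R_{t,h}),
        # j >= h uses past innovations (S_{t,h}).
        x_next = sum(
            psi[j] * (future[j] if j < h else past[j - h]) for j in range(n_coef)
        )
        err = x_next - h_step_forecast(psi, past, h)
        total += err * err
    return total / n_rep


if __name__ == "__main__":
    phi, sigma, h = 0.6, 1.0, 3
    psi = [phi ** j for j in range(40)]  # AR(1) Wold coefficients, truncated
    exact = forecast_mse(psi, sigma ** 2, h)
    closed_form = sigma ** 2 * (1 - phi ** (2 * h)) / (1 - phi ** 2)
    estimate = monte_carlo_mse(psi, sigma, h)
    print(f"formula: {exact:.4f}  AR(1) closed form: {closed_form:.4f}  "
          f"Monte Carlo: {estimate:.4f}")
```

For $\psi_j=\phi^j$ with $|\phi|<1$, the formula reduces to $\sigma^2(1-\phi^{2h})/(1-\phi^2)$, which equals $1.4896$ for $\phi=0.6$, $\sigma=1$, $h=3$. The predictor also reduces to $\phi^h X_t$ up to truncation error, because $\sum_{j\geq h}\phi^j\varepsilon_{t+h-j}=\phi^h\sum_{k\geq 0}\phi^k\varepsilon_{t-k}$. Up to floating-point rounding, the simulated error depends only on the future draws, which mirrors the identity $X_{t+h}-P_{\mathcal H_t^X}X_{t+h}=R_{t,h}$.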

Explore Further

Sklar's Theorem
Conditional Expectation as the Mean Square Forecast
Rauch–Tung–Striebel Smoothing Recursion
Beveridge–Nelson Decomposition
Positive Definiteness Criterion for Autocovariance Functions
Wold Decomposition Theorem
Gaussian Innovations Likelihood Factorization
Ljung–Box Portmanteau Test
