Forecast Error Variance of a Causal ARMA Process

Forecast Error Variance of a Causal ARMA Process (Theorem # 3663)

Theorem

Proof

[proofplan] We work in the real Hilbert space $H = L^2(\Omega, \mathcal{F}, \mathbb{P})$, where the white-noise condition makes the rescaled innovations $(Z_t/\sigma)$ an orthonormal family. After confirming that the causal series converges in $H$, we expand $Y_{t+h}$ and split the sum at the index $j = h$: the terms with $j < h$ involve only the future innovations $Z_{t+1}, \dots, Z_{t+h}$, while the terms with $j \ge h$ involve only innovations dated at or before $t$. The first block is orthogonal to $\mathcal{H}_t$ and the second block lies in $\mathcal{H}_t$, so by the [Orthogonal Decomposition Theorem](/theorems/241) the second block is exactly the projection $\hat{Y}_t(h)$ and the first block is the forecast error. Pairwise orthogonality of the innovations then collapses the variance of the error to $\sigma^2 \sum_{j=0}^{h-1} \psi_j^2$. [/proofplan]

[step:Realize the process in $L^2$ and confirm the causal series converges there] Let $H := L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathbb{R})$, the space of square-integrable real random variables, equipped with the inner product \begin{align*} (\cdot, \cdot)_H : H \times H &\to \mathbb{R}, \\ (X, Y)_H &\mapsto \mathbb{E}[XY], \end{align*} and norm $\|X\|_H = (\mathbb{E}[X^2])^{1/2}$. Since $H$ is complete (by the [Completeness of $L^p$ Spaces](/theorems/892) with $p = 2$), it is a Hilbert space.

The white-noise hypothesis $\mathbb{E}[Z_s Z_t] = \sigma^2 \mathbb{1}_{\{s = t\}}$ states precisely that \begin{align*} (Z_s, Z_t)_H = \sigma^2\, \mathbb{1}_{\{s = t\}}, \qquad s, t \in \mathbb{Z}, \end{align*} so each $Z_t \in H$ with $\|Z_t\|_H = \sigma$, and distinct innovations are orthogonal.

For $N \in \mathbb{N}$ define the partial sum $S_N := \sum_{j=0}^{N} \psi_j Z_{t-j} \in H$.
For $M < N$, the orthogonality relation gives \begin{align*} \|S_N - S_M\|_H^2 = \Big\| \sum_{j=M+1}^{N} \psi_j Z_{t-j} \Big\|_H^2 = \sum_{j=M+1}^{N} \sum_{k=M+1}^{N} \psi_j \psi_k (Z_{t-j}, Z_{t-k})_H = \sigma^2 \sum_{j=M+1}^{N} \psi_j^2. \end{align*} Because $\sum_{j=0}^{\infty} |\psi_j| < \infty$ we have $\psi_j \to 0$, so there is $J$ with $|\psi_j| \le 1$, hence $\psi_j^2 \le |\psi_j|$, for all $j \ge J$; therefore $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. The tail $\sigma^2 \sum_{j=M+1}^{N} \psi_j^2$ thus tends to $0$ as $M, N \to \infty$, so $(S_N)$ is Cauchy in $H$ and converges there. This is the asserted $L^2$-convergence, and it identifies $Y_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ as a well-defined element of $H$ with finite variance.

[guided] Why pass to $L^2$ at all? Forecasting is an $L^2$ (least-squares) problem: the "best linear predictor" is by definition the element of a linear subspace minimizing mean-squared error, and that is exactly the orthogonal projection in the Hilbert space $H = L^2(\Omega, \mathcal{F}, \mathbb{P})$. So our first task is to install the right Hilbert structure and check that every object in the statement actually lives in it.

The inner product is $(X, Y)_H = \mathbb{E}[XY]$, and $H$ is complete by the [Completeness of $L^p$ Spaces](/theorems/892) with $p = 2$, so $H$ is a Hilbert space. Now read the white-noise hypothesis geometrically: $\mathbb{E}[Z_s Z_t] = \sigma^2 \mathbb{1}_{\{s=t\}}$ says \begin{align*} (Z_s, Z_t)_H = \sigma^2\, \mathbb{1}_{\{s=t\}}. \end{align*} Hence $\|Z_t\|_H = \sigma$ and $Z_s \perp Z_t$ whenever $s \ne t$: the family $(Z_t/\sigma)_{t \in \mathbb{Z}}$ is orthonormal. This single fact is the engine of the entire proof.

Next we must make sure the defining series for $Y_t$ converges in $H$, since the statement asserts this and the rest of the argument manipulates it. Set $S_N = \sum_{j=0}^N \psi_j Z_{t-j}$. To prove convergence we show $(S_N)$ is Cauchy and invoke completeness.
For $M < N$, orthogonality of the innovations turns the squared norm of the increment into a bare sum of squares (all cross terms $(Z_{t-j}, Z_{t-k})_H$ with $j \ne k$ vanish): \begin{align*} \|S_N - S_M\|_H^2 = \sum_{j=M+1}^{N} \sum_{k=M+1}^{N} \psi_j \psi_k (Z_{t-j}, Z_{t-k})_H = \sigma^2 \sum_{j=M+1}^{N} \psi_j^2. \end{align*} It remains to show that $\sum_j \psi_j^2 < \infty$. The hypothesis is the $\ell^1$ bound $\sum_j |\psi_j| < \infty$, which is stronger than what we need because $\ell^1 \subseteq \ell^2$: since $\psi_j \to 0$, eventually $|\psi_j| \le 1$, so $\psi_j^2 \le |\psi_j|$, and summability of $|\psi_j|$ forces summability of $\psi_j^2$. Therefore the tail $\sigma^2\sum_{j=M+1}^N \psi_j^2 \to 0$, the sequence $(S_N)$ is Cauchy, and completeness delivers a limit $Y_t \in H$. In particular $\operatorname{Var}(Y_t) = \|Y_t\|_H^2 < \infty$, so the variance in the conclusion is meaningful. [/guided] [/step]

[step:Split the causal expansion of $Y_{t+h}$ at the forecast horizon] Applying the causal representation at time $t + h$, \begin{align*} Y_{t+h} = \sum_{j=0}^{\infty} \psi_j\, Z_{t+h-j}, \end{align*} with convergence in $H$ by Step 1. Split the index set $\{0, 1, 2, \dots\}$ at $j = h$ and use continuity of addition in $H$ to write $Y_{t+h} = A + B$, where \begin{align*} A := \sum_{j=0}^{h-1} \psi_j\, Z_{t+h-j}, \qquad B := \sum_{j=h}^{\infty} \psi_j\, Z_{t+h-j}. \end{align*} Here $A$ is a finite sum and $B$ is the $H$-limit of its partial sums (a convergent tail of a convergent series).

As $j$ ranges over $\{0, \dots, h-1\}$, the index $t + h - j$ ranges over $\{t+1, \dots, t+h\}$, so $A$ involves only innovations strictly after time $t$. Reindexing $B$ by $i := j - h \ge 0$ gives \begin{align*} B = \sum_{i=0}^{\infty} \psi_{i+h}\, Z_{t-i}, \end{align*} so $B$ involves only innovations $Z_{t-i}$ with $t - i \le t$, i.e. dated at or before time $t$.
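As a concrete illustration of this split (an example chosen here, not part of the argument), assume the coefficients of a causal AR(1) process, $\psi_j = \phi^j$ with $|\phi| < 1$, so that $\sum_j |\psi_j| < \infty$ holds, and take $h = 2$. The two blocks then read \begin{align*} A = Z_{t+2} + \phi\, Z_{t+1}, \qquad B = \sum_{i=0}^{\infty} \phi^{i+2}\, Z_{t-i} = \phi^2 \sum_{i=0}^{\infty} \phi^{i}\, Z_{t-i} = \phi^2\, Y_t. \end{align*} Under this assumption $A$ contains only the two innovations dated after $t$, while $B$ is a multiple of $Y_t$ and so is built entirely from innovations dated at or before $t$.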
[/step]

[step:Identify the projection and the forecast error via orthogonal decomposition] Recall $\mathcal{H}_t = \overline{\operatorname{sp}}\{Z_s : s \le t\}$, a closed subspace of the Hilbert space $H$.

First, $B \in \mathcal{H}_t$. Each partial sum $\sum_{i=0}^{N} \psi_{i+h} Z_{t-i}$ is a finite linear combination of the vectors $Z_{t-i}$ with $t - i \le t$, hence lies in $\operatorname{sp}\{Z_s : s \le t\} \subseteq \mathcal{H}_t$. By Step 2 these partial sums converge in $H$ to $B$, and $\mathcal{H}_t$ is closed, so $B \in \mathcal{H}_t$.

Second, $A \in \mathcal{H}_t^{\perp}$. The vector $A$ is a finite linear combination of $Z_{t+1}, \dots, Z_{t+h}$. For any $s \le t$ and any $r \in \{t+1, \dots, t+h\}$ we have $r \ne s$, so $(Z_r, Z_s)_H = \sigma^2 \mathbb{1}_{\{r = s\}} = 0$; thus $(A, Z_s)_H = 0$ for every $s \le t$. By bilinearity and continuity of the inner product, $A$ is orthogonal to every element of $\operatorname{sp}\{Z_s : s \le t\}$ and hence, taking limits, to every element of its closure $\mathcal{H}_t$. Therefore $A \in \mathcal{H}_t^{\perp}$.

We have produced a decomposition $Y_{t+h} = B + A$ with $B \in \mathcal{H}_t$ and $A \in \mathcal{H}_t^{\perp}$. By the [Orthogonal Decomposition Theorem](/theorems/241), applied to the closed subspace $\mathcal{H}_t \subseteq H$, every element of $H$ has a *unique* such decomposition, and its $\mathcal{H}_t$-component is the orthogonal projection $P_{\mathcal{H}_t}$. Matching components yields \begin{align*} \hat{Y}_t(h) = P_{\mathcal{H}_t} Y_{t+h} = B = \sum_{j=h}^{\infty} \psi_j\, Z_{t+h-j}, \end{align*} and therefore the forecast error is \begin{align*} Y_{t+h} - \hat{Y}_t(h) = Y_{t+h} - B = A = \sum_{j=0}^{h-1} \psi_j\, Z_{t+h-j}, \end{align*} which is the first asserted identity.

[guided] This is the conceptual heart of the proof. We have written $Y_{t+h} = A + B$, and we want to argue that $B$ *is* the forecast $\hat{Y}_t(h)$ and $A$ *is* the error.
The forecast is defined as the orthogonal projection $P_{\mathcal{H}_t} Y_{t+h}$ onto the innovation history $\mathcal{H}_t = \overline{\operatorname{sp}}\{Z_s : s \le t\}$. The strategy is to show that the split $A + B$ already *is* the orthogonal decomposition of $Y_{t+h}$ relative to $\mathcal{H}_t$, and then quote the uniqueness of that decomposition to read off the projection. So we verify the two membership facts.

Why is $B \in \mathcal{H}_t$? After reindexing in Step 2, $B = \sum_{i=0}^{\infty} \psi_{i+h} Z_{t-i}$. Every finite partial sum is a linear combination of innovations $Z_{t-i}$ with time index $t - i \le t$, so it lies in $\operatorname{sp}\{Z_s : s \le t\}$. The infinite sum is the $H$-limit of these partial sums; since $\mathcal{H}_t$ is the *closed* span, it contains all such limits. This is exactly why we take the closure in the definition of $\mathcal{H}_t$: without it, the infinite-order moving average $B$ might escape the subspace.

Why is $A \in \mathcal{H}_t^{\perp}$? The point of splitting at $j = h$ is that $A$ collects precisely the innovations $Z_{t+1}, \dots, Z_{t+h}$ dated *after* $t$. These are the parts of $Y_{t+h}$ that have not yet been "observed" at time $t$. Concretely, for any spanning vector $Z_s$ with $s \le t$, the index $s$ differs from each of $t+1, \dots, t+h$, so $(Z_r, Z_s)_H = \sigma^2 \mathbb{1}_{\{r=s\}} = 0$. Hence $A \perp Z_s$ for all $s \le t$. Orthogonality to a spanning set extends to the whole subspace: it is preserved under linear combinations (bilinearity of $(\cdot,\cdot)_H$) and under limits (continuity of $(\cdot,\cdot)_H$), so $A \perp \mathcal{H}_t$, i.e. $A \in \mathcal{H}_t^{\perp}$.

Now we invoke the [Orthogonal Decomposition Theorem](/theorems/241). Its hypothesis is exactly what we have arranged: $\mathcal{H}_t$ is a closed subspace of the Hilbert space $H$.
Its conclusion is that every $x \in H$ (here $x = Y_{t+h}$) has a *unique* decomposition $x = m + m^{\perp}$ with $m \in \mathcal{H}_t$, $m^{\perp} \in \mathcal{H}_t^{\perp}$, and that the $\mathcal{H}_t$-component is the orthogonal projection $m = P_{\mathcal{H}_t} x$. We have exhibited one such decomposition, $Y_{t+h} = B + A$; by uniqueness it must be *the* decomposition. Reading off the components gives $\hat{Y}_t(h) = P_{\mathcal{H}_t} Y_{t+h} = B$ and the error $Y_{t+h} - \hat{Y}_t(h) = A = \sum_{j=0}^{h-1} \psi_j Z_{t+h-j}$. Notice that the error depends only on the future innovations and the first $h$ coefficients $\psi_0, \dots, \psi_{h-1}$, that is, on the unpredictable part of the process over the next $h$ steps. [/guided] [/step]

[step:Compute the error variance by orthogonality of the innovations] The error $Y_{t+h} - \hat{Y}_t(h) = A = \sum_{j=0}^{h-1} \psi_j Z_{t+h-j}$ is a finite linear combination of mean-zero random variables, so $\mathbb{E}[A] = \sum_{j=0}^{h-1} \psi_j \mathbb{E}[Z_{t+h-j}] = 0$. Hence its variance equals its second moment, i.e. its squared $H$-norm: \begin{align*} \operatorname{Var}\big(Y_{t+h} - \hat{Y}_t(h)\big) = \mathbb{E}[A^2] = (A, A)_H. \end{align*}

Expanding the finite double sum and using $(Z_{t+h-j}, Z_{t+h-k})_H = \sigma^2 \mathbb{1}_{\{j = k\}}$ (distinct innovations are orthogonal, equal ones have squared norm $\sigma^2$), \begin{align*} (A, A)_H = \sum_{j=0}^{h-1} \sum_{k=0}^{h-1} \psi_j \psi_k\, (Z_{t+h-j}, Z_{t+h-k})_H = \sum_{j=0}^{h-1} \sum_{k=0}^{h-1} \psi_j \psi_k\, \sigma^2 \mathbb{1}_{\{j=k\}} = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2. \end{align*}

Combining the two displays gives \begin{align*} \operatorname{Var}\big(Y_{t+h} - \hat{Y}_t(h)\big) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2, \end{align*} which is the second asserted identity. Together with the error representation from Step 3, this completes the proof. [/step]
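The variance formula $\sigma^2 \sum_{j=0}^{h-1} \psi_j^2$ is easy to evaluate numerically once the coefficients $\psi_j$ are known. The following is a minimal sketch, not taken from the source: the function names `psi_weights` and `forecast_error_variance` are hypothetical, and the ARMA sign convention stated in the docstring is an assumption, since the proof above works only with the $\psi_j$ and does not fix one. The code does not check causality; it assumes the supplied coefficients describe a causal process.

```python
def psi_weights(phi, theta, n):
    """Return [psi_0, ..., psi_{n-1}] for an ARMA(p, q) process.

    Assumed convention (an assumption of this sketch):
        Y_t = phi_1 Y_{t-1} + ... + phi_p Y_{t-p}
              + Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q},
    which gives the recursion
        psi_j = theta_j + sum_{k=1}^{min(j, p)} phi_k * psi_{j-k},
    with theta_0 = 1 and theta_j = 0 for j > q.
    Causality is assumed, not verified.
    """
    psi = []
    for j in range(n):
        if j == 0:
            acc = 1.0                 # theta_0 = 1
        elif j <= len(theta):
            acc = float(theta[j - 1])  # theta_j for 1 <= j <= q
        else:
            acc = 0.0                 # theta_j = 0 beyond order q
        for k in range(1, min(j, len(phi)) + 1):
            acc += phi[k - 1] * psi[j - k]
        psi.append(acc)
    return psi


def forecast_error_variance(psi, sigma2, h):
    """Evaluate sigma^2 * sum_{j=0}^{h-1} psi_j^2 for horizon h >= 1."""
    if h < 1 or h > len(psi):
        raise ValueError("need 1 <= h <= len(psi)")
    return sigma2 * sum(c * c for c in psi[:h])


# Illustrative AR(1) case: phi = 0.5, sigma^2 = 2, so psi_j = 0.5**j.
psi = psi_weights([0.5], [], 10)
for h in (1, 2, 3):
    # Geometric-sum closed form for AR(1): sigma^2 (1 - phi^(2h)) / (1 - phi^2).
    closed_form = 2.0 * (1 - 0.5 ** (2 * h)) / (1 - 0.5 ** 2)
    print(h, forecast_error_variance(psi, 2.0, h), closed_form)
```

Because every term $\psi_j^2$ is nonnegative, the computed variance is nondecreasing in $h$, and it is bounded above by $\operatorname{Var}(Y_t) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2$, which is finite by Step 1.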
