Kalman Filter Recursion Theorem — Statement & Proof

Kalman Filter Recursion Theorem (Theorem # 3655)

Theorem

Edit Issues Pull Requests Attributions Admin

Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $\mathcal G_t := \sigma(Y_1,\dots,Y_t)$ for each $t \in \mathbb N$. Fix $t \in \mathbb N$. Let $\alpha_t: (\Omega,\mathcal F) \to (\mathbb R^m,\mathcal B(\mathbb R^m))$ be the state vector and $Y_t: (\Omega,\mathcal F) \to (\mathbb R^p,\mathcal B(\mathbb R^p))$ be the observation vector. Suppose that, conditionally on $\mathcal G_{t-1}$, \begin{align*} \alpha_t \sim \mathcal N(a_{t|t-1}, P_{t|t-1}), \end{align*} where $a_{t|t-1}: \Omega \to \mathbb R^m$ is $\mathcal G_{t-1}$-measurable and $P_{t|t-1}: \Omega \to \mathbb R^{m \times m}$ is a symmetric positive semidefinite $\mathcal G_{t-1}$-measurable matrix-valued random variable. Assume the observation equation is \begin{align*} Y_t = Z_t \alpha_t + d_t + \varepsilon_t, \end{align*} where $Z_t \in \mathbb R^{p \times m}$, $d_t \in \mathbb R^p$, and, conditionally on $\mathcal G_{t-1}$, the noise vector $\varepsilon_t: \Omega \to \mathbb R^p$ is Gaussian with law $\mathcal N(0,H_t)$ and is independent of $\alpha_t$, with $H_t \in \mathbb R^{p \times p}$ symmetric positive semidefinite. Define the innovation $v_t: \Omega \to \mathbb R^p$, innovation covariance $F_t: \Omega \to \mathbb R^{p \times p}$, and Kalman gain $K_t: \Omega \to \mathbb R^{m \times p}$ by \begin{align*} v_t &= Y_t - Z_t a_{t|t-1} - d_t, \\ F_t &= Z_t P_{t|t-1} Z_t^\top + H_t, \\ K_t &= P_{t|t-1} Z_t^\top F_t^{-1}. \end{align*} If $F_t$ is invertible, then the conditional law of $\alpha_t$ given $\mathcal G_t$ is Gaussian with mean $a_{t|t}: \Omega \to \mathbb R^m$ and covariance $P_{t|t}: \Omega \to \mathbb R^{m \times m}$ given by \begin{align*} a_{t|t} &= a_{t|t-1} + K_t v_t, \\ P_{t|t} &= P_{t|t-1} - K_t F_t K_t^\top. \end{align*} If, in addition, the state transition equation is \begin{align*} \alpha_{t+1} = T_t \alpha_t + c_t + R_t \eta_{t+1}, \end{align*} where $T_t \in \mathbb R^{m \times m}$, $c_t \in \mathbb R^m$, $R_t \in \mathbb R^{m \times r}$, and, conditionally on $\mathcal G_t$, the noise vector $\eta_{t+1}: \Omega \to \mathbb R^r$ is Gaussian with law $\mathcal N(0,Q_t)$ and is independent of $\alpha_t$, with $Q_t \in \mathbb R^{r \times r}$ symmetric positive semidefinite, then the one-step prediction law of $\alpha_{t+1}$ given $\mathcal G_t$ is Gaussian with mean $a_{t+1|t}: \Omega \to \mathbb R^m$ and covariance $P_{t+1|t}: \Omega \to \mathbb R^{m \times m}$ given by \begin{align*} a_{t+1|t} &= T_t a_{t|t} + c_t, \\ P_{t+1|t} &= T_t P_{t|t} T_t^\top + R_t Q_t R_t^\top. \end{align*}

Discussion

No discussion available for this theorem.

Proof

[proofplan] Condition on the past observation sigma-algebra $\mathcal G_{t-1}$ and regard the forecast mean and covariance as fixed quantities. Under the linear Gaussian observation equation, the pair $(\alpha_t,Y_t)$ is jointly Gaussian with covariance blocks that can be computed directly from independence of $\alpha_t$ and the observation noise. The Gaussian conditioning formula then gives the filtering mean and covariance, and the algebraic identity \begin{align*} K_t F_t K_t^\top = P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1} \end{align*} rewrites the covariance in Kalman-gain form. Finally, the state transition is an affine image of the filtered Gaussian state plus independent Gaussian noise, so its conditional mean and covariance are obtained by the standard affine Gaussian calculation. [/proofplan] [step:Compute the conditional joint Gaussian law of the state and observation] Fix the sigma-algebra $\mathcal G_{t-1} := \sigma(Y_1,\dots,Y_{t-1})$. Conditionally on $\mathcal G_{t-1}$, define the affine map \begin{align*} A_t: \mathbb R^m \times \mathbb R^p &\to \mathbb R^m \times \mathbb R^p \\ (x,e) &\mapsto (x, Z_t x + d_t + e). \end{align*} Since $\alpha_t$ and $\varepsilon_t$ are conditionally independent Gaussian vectors given $\mathcal G_{t-1}$, the vector $(\alpha_t,\varepsilon_t)$ is conditionally Gaussian, and therefore $(\alpha_t,Y_t) = A_t(\alpha_t,\varepsilon_t)$ is conditionally Gaussian given $\mathcal G_{t-1}$. Its conditional mean is \begin{align*} \mathbb E[\alpha_t \mid \mathcal G_{t-1}] &= a_{t|t-1}, \\ \mathbb E[Y_t \mid \mathcal G_{t-1}] &= Z_t a_{t|t-1} + d_t. \end{align*} Using the observation equation $Y_t = Z_t\alpha_t + d_t + \varepsilon_t$ and the conditional independence of $\alpha_t$ and $\varepsilon_t$ given $\mathcal G_{t-1}$, the covariance blocks are \begin{align*} \operatorname{Cov}(\alpha_t,\alpha_t \mid \mathcal G_{t-1}) &= P_{t|t-1}, \\ \operatorname{Cov}(\alpha_t,Y_t \mid \mathcal G_{t-1}) &= P_{t|t-1}Z_t^\top, \\ \operatorname{Cov}(Y_t,Y_t \mid \mathcal G_{t-1}) &= Z_tP_{t|t-1}Z_t^\top + H_t = F_t. \end{align*} Thus, conditionally on $\mathcal G_{t-1}$, the pair $(\alpha_t,Y_t)$ is jointly Gaussian with the displayed mean vector and covariance blocks. [guided] We freeze the information in the past observations by conditioning on $\mathcal G_{t-1}$. After this conditioning, the quantities $a_{t|t-1}$ and $P_{t|t-1}$ are known with respect to the conditional probability law. The observation equation is linear in the state and in the observation noise, so we package it as an affine map \begin{align*} A_t: \mathbb R^m \times \mathbb R^p &\to \mathbb R^m \times \mathbb R^p \\ (x,e) &\mapsto (x, Z_t x + d_t + e). \end{align*} The input vector $(\alpha_t,\varepsilon_t)$ is conditionally Gaussian because its two components are conditionally independent Gaussian vectors. Affine images of Gaussian vectors are Gaussian, so $(\alpha_t,Y_t) = A_t(\alpha_t,\varepsilon_t)$ is conditionally jointly Gaussian given $\mathcal G_{t-1}$. The conditional mean of the first component is the forecast mean: \begin{align*} \mathbb E[\alpha_t \mid \mathcal G_{t-1}] = a_{t|t-1}. \end{align*} For the observation component, linearity of conditional expectation gives \begin{align*} \mathbb E[Y_t \mid \mathcal G_{t-1}] &= \mathbb E[Z_t\alpha_t + d_t + \varepsilon_t \mid \mathcal G_{t-1}] \\ &= Z_t\mathbb E[\alpha_t \mid \mathcal G_{t-1}] + d_t + \mathbb E[\varepsilon_t \mid \mathcal G_{t-1}] \\ &= Z_t a_{t|t-1} + d_t. \end{align*} The covariance of the state with itself is, by hypothesis, \begin{align*} \operatorname{Cov}(\alpha_t,\alpha_t \mid \mathcal G_{t-1}) = P_{t|t-1}. \end{align*} For the cross-covariance, the deterministic shift $d_t$ contributes nothing and the conditional independence of $\alpha_t$ and $\varepsilon_t$ gives zero cross-covariance: \begin{align*} \operatorname{Cov}(\alpha_t,Y_t \mid \mathcal G_{t-1}) &= \operatorname{Cov}(\alpha_t,Z_t\alpha_t + d_t + \varepsilon_t \mid \mathcal G_{t-1}) \\ &= \operatorname{Cov}(\alpha_t,\alpha_t \mid \mathcal G_{t-1})Z_t^\top + \operatorname{Cov}(\alpha_t,\varepsilon_t \mid \mathcal G_{t-1}) \\ &= P_{t|t-1}Z_t^\top. \end{align*} Similarly, the conditional covariance of the observation is \begin{align*} \operatorname{Cov}(Y_t,Y_t \mid \mathcal G_{t-1}) &= \operatorname{Cov}(Z_t\alpha_t + d_t + \varepsilon_t,Z_t\alpha_t + d_t + \varepsilon_t \mid \mathcal G_{t-1}) \\ &= Z_tP_{t|t-1}Z_t^\top + H_t \\ &= F_t. \end{align*} This identifies the exact jointly Gaussian law needed for the conditioning step. [/guided] [/step] [step:Apply the Gaussian conditioning formula to update the filtered state] We apply the conditional version of the [Conditional Distribution Formula for Jointly Gaussian Random Vectors](/theorems/???) under a regular conditional law given $\mathcal G_{t-1}$, with $X = \alpha_t$ and $W = Y_t$. Under this conditional law, the $\mathcal G_{t-1}$-measurable mean and covariance blocks computed above are fixed parameters. The required non-degeneracy hypothesis is precisely that the covariance block of $W$ is invertible; here that block is $F_t$, which is invertible by hypothesis. Therefore, conditionally on $\mathcal G_t = \sigma(\mathcal G_{t-1},Y_t)$, \begin{align*} \alpha_t \mid \mathcal G_t \sim \mathcal N\left( a_{t|t-1} + P_{t|t-1}Z_t^\top F_t^{-1} \bigl(Y_t - Z_ta_{t|t-1} - d_t\bigr), \, P_{t|t-1} - P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1} \right). \end{align*} Since $v_t = Y_t - Z_ta_{t|t-1} - d_t$ and $K_t = P_{t|t-1}Z_t^\top F_t^{-1}$, the conditional mean is \begin{align*} a_{t|t} = a_{t|t-1} + K_t v_t. \end{align*} [guided] The previous step put us exactly in the setting of the conditional version of the [Conditional Distribution Formula for Jointly Gaussian Random Vectors](/theorems/???). We work under a regular conditional law given $\mathcal G_{t-1}$; under that conditional law, the $\mathcal G_{t-1}$-measurable forecast quantities are treated as fixed mean and covariance parameters. In the formula, the random vector being updated is $X = \alpha_t$, and the random vector being observed is $W = Y_t$. The conditional mean blocks are \begin{align*} \mu_X &= a_{t|t-1}, \\ \mu_W &= Z_ta_{t|t-1} + d_t, \end{align*} and the covariance blocks are \begin{align*} \Sigma_{XX} &= P_{t|t-1}, \\ \Sigma_{XW} &= P_{t|t-1}Z_t^\top, \\ \Sigma_{WW} &= F_t. \end{align*} The Gaussian conditioning formula requires the observed covariance block $\Sigma_{WW}$ to be invertible. In this theorem, $\Sigma_{WW}=F_t$, and invertibility of $F_t$ is assumed. Applying the formula gives the conditional law of $\alpha_t$ after observing $Y_t$: \begin{align*} \alpha_t \mid \mathcal G_t \sim \mathcal N\left( \mu_X + \Sigma_{XW}\Sigma_{WW}^{-1}(Y_t-\mu_W), \, \Sigma_{XX}-\Sigma_{XW}\Sigma_{WW}^{-1}\Sigma_{WX} \right). \end{align*} Substituting the blocks computed above, \begin{align*} \alpha_t \mid \mathcal G_t \sim \mathcal N\left( a_{t|t-1} + P_{t|t-1}Z_t^\top F_t^{-1} \bigl(Y_t - Z_ta_{t|t-1} - d_t\bigr), \, P_{t|t-1} - P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1} \right). \end{align*} The innovation is exactly \begin{align*} v_t = Y_t - Z_ta_{t|t-1} - d_t, \end{align*} and the Kalman gain is exactly \begin{align*} K_t = P_{t|t-1}Z_t^\top F_t^{-1}. \end{align*} Therefore the filtered conditional mean is \begin{align*} a_{t|t} = a_{t|t-1} + K_t v_t. \end{align*} [/guided] [/step] [step:Rewrite the filtered covariance in Kalman gain form] The covariance obtained from Gaussian conditioning is \begin{align*} P_{t|t} = P_{t|t-1} - P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}. \end{align*} Since $P_{t|t-1}$ and $H_t$ are symmetric, $F_t = Z_tP_{t|t-1}Z_t^\top + H_t$ is symmetric. Because $F_t$ is invertible, $F_t^{-1}$ is symmetric. Hence \begin{align*} K_tF_tK_t^\top &= P_{t|t-1}Z_t^\top F_t^{-1} F_t (F_t^{-1})^\top Z_tP_{t|t-1} \\ &= P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}. \end{align*} Substituting this identity into the covariance formula gives \begin{align*} P_{t|t} = P_{t|t-1} - K_tF_tK_t^\top. \end{align*} [guided] The Gaussian conditioning formula gives the covariance in the form \begin{align*} P_{t|t} = P_{t|t-1} - P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}. \end{align*} We now rewrite the second term using the Kalman gain. Since $P_{t|t-1}$ and $H_t$ are symmetric covariance matrices, the matrix \begin{align*} F_t = Z_tP_{t|t-1}Z_t^\top + H_t \end{align*} is symmetric. The hypothesis says $F_t$ is invertible, and the inverse of an invertible symmetric matrix is symmetric, so $(F_t^{-1})^\top = F_t^{-1}$. Using the definition \begin{align*} K_t = P_{t|t-1}Z_t^\top F_t^{-1}, \end{align*} we compute \begin{align*} K_tF_tK_t^\top &= P_{t|t-1}Z_t^\top F_t^{-1} F_t (F_t^{-1})^\top Z_tP_{t|t-1} \\ &= P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}. \end{align*} Substituting this identity into the covariance formula yields \begin{align*} P_{t|t} = P_{t|t-1} - K_tF_tK_t^\top. \end{align*} [/guided] [/step] [step:Propagate the filtered Gaussian law through the state transition] Condition on $\mathcal G_t$. By the filtering step, \begin{align*} \alpha_t \mid \mathcal G_t \sim \mathcal N(a_{t|t},P_{t|t}). \end{align*} The transition noise $\eta_{t+1}$ is conditionally independent of $\alpha_t$ given $\mathcal G_t$ and satisfies \begin{align*} \eta_{t+1} \mid \mathcal G_t \sim \mathcal N(0,Q_t). \end{align*} Define the affine transition map \begin{align*} B_t: \mathbb R^m \times \mathbb R^r &\to \mathbb R^m \\ (x,u) &\mapsto T_t x + c_t + R_tu. \end{align*} Then $\alpha_{t+1}=B_t(\alpha_t,\eta_{t+1})$ is conditionally Gaussian given $\mathcal G_t$. Its conditional mean is \begin{align*} \mathbb E[\alpha_{t+1}\mid \mathcal G_t] &= T_t\mathbb E[\alpha_t\mid \mathcal G_t]+c_t+R_t\mathbb E[\eta_{t+1}\mid \mathcal G_t] \\ &= T_ta_{t|t}+c_t. \end{align*} Its conditional covariance is \begin{align*} \operatorname{Cov}(\alpha_{t+1},\alpha_{t+1}\mid \mathcal G_t) &= \operatorname{Cov}(T_t\alpha_t+c_t+R_t\eta_{t+1},T_t\alpha_t+c_t+R_t\eta_{t+1}\mid \mathcal G_t) \\ &= T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top, \end{align*} because the deterministic shift $c_t$ contributes no covariance and conditional independence gives zero cross-covariance between $\alpha_t$ and $\eta_{t+1}$. Therefore \begin{align*} a_{t+1|t} &= T_ta_{t|t}+c_t, \\ P_{t+1|t} &= T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top. \end{align*} [guided] After observing $Y_t$, the filtered state is Gaussian: \begin{align*} \alpha_t \mid \mathcal G_t \sim \mathcal N(a_{t|t},P_{t|t}). \end{align*} The next state is obtained from the filtered state by the affine transition equation \begin{align*} \alpha_{t+1} = T_t\alpha_t + c_t + R_t\eta_{t+1}. \end{align*} The transition noise satisfies \begin{align*} \eta_{t+1}\mid \mathcal G_t \sim \mathcal N(0,Q_t), \end{align*} and is conditionally independent of $\alpha_t$ given $\mathcal G_t$. Thus the pair $(\alpha_t,\eta_{t+1})$ is conditionally jointly Gaussian, and the affine map \begin{align*} B_t: \mathbb R^m \times \mathbb R^r &\to \mathbb R^m \\ (x,u) &\mapsto T_t x + c_t + R_tu \end{align*} sends it to the conditionally Gaussian vector $\alpha_{t+1}$. We now compute its conditional mean. Linearity of conditional expectation gives \begin{align*} \mathbb E[\alpha_{t+1}\mid \mathcal G_t] &= \mathbb E[T_t\alpha_t+c_t+R_t\eta_{t+1}\mid \mathcal G_t] \\ &= T_t\mathbb E[\alpha_t\mid \mathcal G_t]+c_t+R_t\mathbb E[\eta_{t+1}\mid \mathcal G_t] \\ &= T_ta_{t|t}+c_t. \end{align*} We next compute the conditional covariance. The deterministic vector $c_t$ does not affect covariance. Conditional independence of $\alpha_t$ and $\eta_{t+1}$ gives \begin{align*} \operatorname{Cov}(\alpha_t,\eta_{t+1}\mid \mathcal G_t)=0. \end{align*} Therefore \begin{align*} \operatorname{Cov}(\alpha_{t+1},\alpha_{t+1}\mid \mathcal G_t) &= \operatorname{Cov}(T_t\alpha_t+R_t\eta_{t+1},T_t\alpha_t+R_t\eta_{t+1}\mid \mathcal G_t) \\ &= T_t\operatorname{Cov}(\alpha_t,\alpha_t\mid \mathcal G_t)T_t^\top + R_t\operatorname{Cov}(\eta_{t+1},\eta_{t+1}\mid \mathcal G_t)R_t^\top \\ &= T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top. \end{align*} Thus the one-step prediction mean and covariance are \begin{align*} a_{t+1|t} &= T_ta_{t|t}+c_t, \\ P_{t+1|t} &= T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top. \end{align*} [/guided] [/step]

Explore Further

Conditional Variance Forecast Formula for a Stationary GARCH(1,1) Process probability Root Expressions for the Four Classical MANOVA Test Statistics probability Maximum Likelihood Estimator of the Coefficient Matrix in the Multivariate Linear Model probability Existence and Uniqueness of Population Canonical Directions probability Differencing Removes a Simple Unit Root probability Quadratic Discriminant Analysis Bayes Rule probability Wold Decomposition Theorem probability Gaussian ARMA Prediction Interval Theorem probability

What brings you to Androma?

Start with a route through the knowledge graph.

Kalman Filter Recursion Theorem (Theorem # 3655)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Kalman Filter Recursion Theorem (Theorem # 3655)

Discussion

Proof

Explore Further