Kalman Filter Step with a Missing Observation — Statement & Proof

Kalman Filter Step with a Missing Observation (Theorem # 3658)

Theorem

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and consider the linear Gaussian state space model on the time index $t \in \{1, \dots, n\}$, with latent state $\alpha_t \in \mathbb{R}^m$ and observation $Y_t \in \mathbb{R}^p$ governed by the state (transition) equation and the observation equation \begin{align*} \alpha_{t+1} &= T_t \alpha_t + c_t + R_t \eta_t, & \eta_t &\sim \mathcal{N}(0, Q_t), \\ Y_t &= Z_t \alpha_t + d_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, H_t), \end{align*} where $T_t \in \mathbb{R}^{m \times m}$, $R_t \in \mathbb{R}^{m \times r}$, $Z_t \in \mathbb{R}^{p \times m}$ are deterministic system matrices, $c_t \in \mathbb{R}^m$, $d_t \in \mathbb{R}^p$ are deterministic offsets, and $Q_t \in \mathbb{R}^{r \times r}$, $H_t \in \mathbb{R}^{p \times p}$ are deterministic covariance matrices. Assume the initial state satisfies $\alpha_1 \sim \mathcal{N}(a_1, P_1)$ and that the collection $\{\alpha_1\} \cup \{\eta_s\}_{s \ge 1} \cup \{\varepsilon_s\}_{s \ge 1}$ is mutually independent. Let $\mathcal{O} \subseteq \{1, \dots, n\}$ denote the (deterministic) set of times at which $Y_t$ is actually recorded, and define the available-information filtration \begin{align*} \mathcal{Y}_t &:= \sigma\big(\, Y_s : s \in \mathcal{O},\ s \le t \,\big), \qquad \mathcal{Y}_0 := \{\varnothing, \Omega\}. 
\end{align*} For each $t$ define the filtered and predicted conditional moments \begin{align*} a_{t\mid t} &:= \mathbb{E}[\alpha_t \mid \mathcal{Y}_t], & P_{t\mid t} &:= \operatorname{Var}(\alpha_t \mid \mathcal{Y}_t), \\ a_{t\mid t-1} &:= \mathbb{E}[\alpha_t \mid \mathcal{Y}_{t-1}], & P_{t\mid t-1} &:= \operatorname{Var}(\alpha_t \mid \mathcal{Y}_{t-1}), \end{align*} together with $a_{t+1\mid t} := \mathbb{E}[\alpha_{t+1} \mid \mathcal{Y}_t]$ and $P_{t+1\mid t} := \operatorname{Var}(\alpha_{t+1} \mid \mathcal{Y}_t)$, where for a random vector $X$ and a sub-$\sigma$-algebra $\mathcal{G}$ the conditional covariance matrix is \begin{align*} \operatorname{Var}(X \mid \mathcal{G}) := \mathbb{E}\big[(X - \mathbb{E}[X \mid \mathcal{G}])(X - \mathbb{E}[X \mid \mathcal{G}])^\top \,\big|\, \mathcal{G}\big]. \end{align*} Suppose $Y_t$ is missing, i.e. $t \notin \mathcal{O}$. Then the absent measurement adds no information: the conditional law of $\alpha_t$ given $\mathcal{Y}_t$ coincides with its law given $\mathcal{Y}_{t-1}$. In particular the filtering step collapses to the identity \begin{align*} a_{t\mid t} &= a_{t\mid t-1}, & P_{t\mid t} &= P_{t\mid t-1}, \end{align*} and the subsequent prediction step retains its usual form \begin{align*} a_{t+1\mid t} &= T_t\, a_{t\mid t} + c_t, & P_{t+1\mid t} &= T_t\, P_{t\mid t}\, T_t^\top + R_t\, Q_t\, R_t^\top. \end{align*}
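The statement translates directly into a single filter-then-predict routine. The Python sketch below is a minimal illustration assuming NumPy; the function name `kalman_step`, the convention that `y=None` marks a missing $Y_t$, and all numerical values are hypothetical. For a recorded $Y_t$ it uses the standard Kalman update (gain $K_t = P_{t\mid t-1} Z_t^\top F_t^{-1}$ with $F_t = Z_t P_{t\mid t-1} Z_t^\top + H_t$), which lies outside this theorem and is included only for contrast; the missing branch is exactly the identity map of the statement, followed by the unchanged prediction step.

```python
import numpy as np


def kalman_step(a_pred, P_pred, y, Z, d, H, T, c, R, Q):
    """Filter-then-predict at time t for the linear Gaussian model (a sketch).

    a_pred, P_pred are a_{t|t-1}, P_{t|t-1}. Passing y=None encodes "Y_t is
    missing" (t not in O); this calling convention is an assumption.
    Returns a_{t|t}, P_{t|t}, a_{t+1|t}, P_{t+1|t}.
    """
    if y is None:
        # Missing Y_t: the information set does not grow, so filtering is the identity.
        a_filt, P_filt = a_pred.copy(), P_pred.copy()
    else:
        # Recorded Y_t: standard Kalman update, shown only for contrast.
        v = y - Z @ a_pred - d                  # innovation Y_t - E[Y_t | Y_{t-1}]
        F = Z @ P_pred @ Z.T + H                # innovation covariance F_t
        K = np.linalg.solve(F, Z @ P_pred).T    # gain P Z^T F^{-1} (P, F symmetric)
        a_filt = a_pred + K @ v
        P_filt = P_pred - K @ Z @ P_pred
    # Prediction step: the same formulas whether or not Y_t was recorded.
    a_next = T @ a_filt + c
    P_next = T @ P_filt @ T.T + R @ Q @ R.T
    return a_filt, P_filt, a_next, P_next


# Hypothetical two-dimensional example (a local linear trend).
T = np.array([[1.0, 1.0], [0.0, 1.0]])
c = np.zeros(2)
R = np.eye(2)
Q = np.diag([0.1, 0.01])
Z = np.array([[1.0, 0.0]])
d = np.zeros(1)
H = np.array([[0.5]])
a_pred, P_pred = np.array([1.0, 0.2]), np.eye(2)

# Y_t missing: expect a_{t|t} = a_{t|t-1} and P_{t|t} = P_{t|t-1}.
a_f, P_f, a_n, P_n = kalman_step(a_pred, P_pred, None, Z, d, H, T, c, R, Q)
```

Handling the missing case by skipping the update, rather than by inflating $H_t$ towards infinity, is exact by the theorem; an inflated $H_t$ would only approximate the identity map through a gain that tends to zero.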

Proof

[proofplan] The entire argument rests on a single structural observation: the available-information filtration $\mathcal{Y}_t$ is generated by the *observed* measurements only, so a missing $Y_t$ adds nothing new and $\mathcal{Y}_t = \mathcal{Y}_{t-1}$. The filtering identities then follow immediately, because the filtered and one-step-predicted moments are conditional expectations and variances with respect to *the same* $\sigma$-algebra. For the prediction step we substitute the transition equation and use that the state noise $\eta_t$ is independent of $\sigma(\alpha_t, \mathcal{Y}_t)$, a consequence of the mutual independence of the model primitives. This independence makes the conditional mean of $\eta_t$ and the conditional cross-covariance between $\alpha_t$ and $\eta_t$ vanish, leaving exactly the standard one-step mean and variance recursions with no innovation correction. [/proofplan] [step:Fix the model primitives and express the conditioning sets through the driving noise] By recursive substitution of the state equation, each state is a deterministic affine function of $\alpha_1$ and of the state noise with strictly smaller time index: \begin{align*} \alpha_t = T_{t-1} T_{t-2} \cdots T_1\, \alpha_1 + (\text{affine function of } \eta_1, \dots, \eta_{t-1}), \end{align*} where the empty product is read as $I_m$ when $t = 1$. Hence $\alpha_t$ is $\sigma(\alpha_1, \eta_1, \dots, \eta_{t-1})$-measurable. For each recorded time $s$, the observation $Y_s = Z_s \alpha_s + d_s + \varepsilon_s$ is therefore a measurable function of $(\alpha_1, \eta_1, \dots, \eta_{s-1}, \varepsilon_s)$.
Consequently, for every $t$, \begin{align*} \mathcal{Y}_{t-1} = \sigma\big(Y_s : s \in \mathcal{O},\ s \le t-1\big) \subseteq \sigma\big(\alpha_1, \eta_1, \dots, \eta_{t-2}, \varepsilon_1, \dots, \varepsilon_{t-1}\big), \end{align*} and combining this with the measurability of $\alpha_t$ above, \begin{align*} \sigma\big(\alpha_t, \mathcal{Y}_{t-1}\big) \subseteq \sigma\big(\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1}\big) =: \mathcal{G}_{t-1}. \end{align*} The system matrices and offsets are deterministic and add nothing to these $\sigma$-algebras. This bookkeeping supplies the independence of $\eta_t$ used in the prediction step (Step 4); the filtering identity itself needs only the equality of information sets established in Step 2. [/step] [step:Show the missing observation leaves the information set unchanged] Since $t \notin \mathcal{O}$, the index set $\{s \in \mathcal{O} : s \le t\}$ contains no element equal to $t$, hence coincides with $\{s \in \mathcal{O} : s \le t-1\}$. The generating families of $\mathcal{Y}_t$ and $\mathcal{Y}_{t-1}$ are therefore identical, so \begin{align*} \mathcal{Y}_t = \sigma\big(Y_s : s \in \mathcal{O},\ s \le t\big) = \sigma\big(Y_s : s \in \mathcal{O},\ s \le t-1\big) = \mathcal{Y}_{t-1}. \end{align*} [guided] The Kalman filter is built on a filtration that grows only when a genuine measurement arrives. The defining feature of a *missing* observation is that no measurement is appended at time $t$: the recorded times up to $t$ are exactly the recorded times up to $t-1$. Formally, write the index set of recorded times not exceeding $t$ as $A_t := \{s \in \mathcal{O} : s \le t\}$. How does $A_t$ differ from $A_{t-1} = \{s \in \mathcal{O} : s \le t-1\}$? The only candidate element of $A_t \setminus A_{t-1}$ is $s = t$ itself, and that element belongs to $A_t$ precisely when $t \in \mathcal{O}$. Because $Y_t$ is missing, $t \notin \mathcal{O}$, so $t \notin A_t$ and therefore $A_t = A_{t-1}$.
Two $\sigma$-algebras generated by the same family of random vectors are equal, so \begin{align*} \mathcal{Y}_t = \sigma\big(Y_s : s \in A_t\big) = \sigma\big(Y_s : s \in A_{t-1}\big) = \mathcal{Y}_{t-1}. \end{align*} This is an equality of $\sigma$-algebras, not merely an inclusion or an almost-sure statement — the two conditioning sets are literally the same object. That is the crux of the whole theorem: every subsequent identity is a corollary of $\mathcal{Y}_t = \mathcal{Y}_{t-1}$. [/guided] [/step] [step:Collapse the filtering step to the identity map] Conditional expectation and conditional variance depend on the conditioning information only through the $\sigma$-algebra. Since Step 2 gives $\mathcal{Y}_t = \mathcal{Y}_{t-1}$, the defining expressions for the filtered and one-step-predicted moments are evaluated against the same $\sigma$-algebra, hence represent the same objects: \begin{align*} a_{t\mid t} = \mathbb{E}[\alpha_t \mid \mathcal{Y}_t] = \mathbb{E}[\alpha_t \mid \mathcal{Y}_{t-1}] = a_{t\mid t-1}, \end{align*} and, using the definition $\operatorname{Var}(\alpha_t \mid \mathcal{G}) = \mathbb{E}\big[(\alpha_t - \mathbb{E}[\alpha_t \mid \mathcal{G}])(\alpha_t - \mathbb{E}[\alpha_t \mid \mathcal{G}])^\top \mid \mathcal{G}\big]$ with $\mathcal{G} = \mathcal{Y}_t = \mathcal{Y}_{t-1}$, \begin{align*} P_{t\mid t} = \operatorname{Var}(\alpha_t \mid \mathcal{Y}_t) = \operatorname{Var}(\alpha_t \mid \mathcal{Y}_{t-1}) = P_{t\mid t-1}. \end{align*} More strongly, the entire conditional law of $\alpha_t$ given $\mathcal{Y}_t$ equals its law given $\mathcal{Y}_{t-1}$, which is the precise sense in which the absent measurement carries no information. No innovation term $Y_t - \mathbb{E}[Y_t \mid \mathcal{Y}_{t-1}]$ can appear, because there is no observed $Y_t$ to form a prediction error. 
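Steps 2 and 3 say that the filtered and predicted moments are obtained by conditioning on the recorded index sets $A_t$ and $A_{t-1}$, so when $t \notin \mathcal{O}$ they are literally the same computation. The Python sketch below illustrates this without any recursion, assuming NumPy and a scalar model; the helper names `joint_moments`, `conditional_moments` and `recorded_upto`, the parameter values and the recorded values `y` are all hypothetical. As in Step 1, it writes every state and observation as an affine image of the independent Gaussian primitives, builds their joint mean and covariance, and then conditions on the recorded observations only.

```python
import numpy as np


def joint_moments(n, T, c, R, Q, Z, d, H, a1, P1):
    """Mean and covariance of x = (alpha_1..alpha_n, Y_1..Y_n) in a scalar model.

    x = A w + b, where w = (alpha_1, eta_1..eta_{n-1}, eps_1..eps_n) are the
    independent Gaussian primitives.
    """
    w_mean = np.zeros(2 * n)
    w_mean[0] = a1
    w_var = np.diag([P1] + [Q] * (n - 1) + [H] * n)
    A = np.zeros((2 * n, 2 * n))
    b = np.zeros(2 * n)
    A[0, 0] = 1.0                          # alpha_1 is the first primitive
    for t in range(1, n):                  # row t: alpha_{t+1} = T alpha_t + c + R eta_t
        A[t] = T * A[t - 1]
        A[t, t] += R                       # eta_t sits at primitive index t
        b[t] = T * b[t - 1] + c
    for t in range(n):                     # row n+t: Y_{t+1} = Z alpha_{t+1} + d + eps_{t+1}
        A[n + t] = Z * A[t]
        A[n + t, n + t] += 1.0             # eps_{t+1} sits at primitive index n+t
        b[n + t] = Z * b[t] + d
    return A @ w_mean + b, A @ w_var @ A.T


def conditional_moments(t, idx, y, mean, cov, n):
    """Mean and variance of alpha_t given {Y_s : s in idx} (1-based times)."""
    i = t - 1
    times = sorted(idx)
    rows = [n + s - 1 for s in times]
    if not rows:                           # trivial sigma-algebra: unconditional moments
        return mean[i], cov[i, i]
    S_yy = cov[np.ix_(rows, rows)]
    S_ay = cov[i, rows]
    dev = np.array([y[s] for s in times]) - mean[rows]
    gain = np.linalg.solve(S_yy, S_ay)     # S_yy^{-1} S_ya, with S_yy symmetric
    return mean[i] + gain @ dev, cov[i, i] - gain @ S_ay


def recorded_upto(O, t):
    """The index set A_t = {s in O : s <= t}."""
    return {s for s in O if s <= t}


n, O = 4, {1, 2, 4}                        # Y_3 is missing
T, c, R, Q = 0.9, 0.5, 1.0, 0.25
Z, d, H = 1.0, 0.0, 1.0
a1, P1 = 0.0, 2.0
y = {1: 0.3, 2: -0.1, 4: 0.7}              # hypothetical recorded values
mean, cov = joint_moments(n, T, c, R, Q, Z, d, H, a1, P1)

A3, A2 = recorded_upto(O, 3), recorded_upto(O, 2)
a33, P33 = conditional_moments(3, A3, y, mean, cov, n)   # (a_{3|3}, P_{3|3})
a32, P32 = conditional_moments(3, A2, y, mean, cov, n)   # (a_{3|2}, P_{3|2})
a43, P43 = conditional_moments(4, A3, y, mean, cov, n)   # (a_{4|3}, P_{4|3})
print(A3 == A2, np.isclose(a33, a32), np.isclose(P33, P32))
```

Because every moment is read off a single joint Gaussian, the same helper should also reproduce the prediction step: conditioning $\alpha_4$ on $A_3$ ought to return $T a_{3\mid 3} + c$ and $T^2 P_{3\mid 3} + R^2 Q$ in this scalar setting.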
[/step] [step:Establish that the state noise $\eta_t$ is independent of $\sigma(\alpha_t, \mathcal{Y}_t)$] By Step 1, $\sigma(\alpha_t, \mathcal{Y}_t) = \sigma(\alpha_t, \mathcal{Y}_{t-1}) \subseteq \mathcal{G}_{t-1} = \sigma(\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1})$, where the first equality uses $\mathcal{Y}_t = \mathcal{Y}_{t-1}$ from Step 2. The hypothesis that the family $\{\alpha_1\} \cup \{\eta_s\}_{s \ge 1} \cup \{\varepsilon_s\}_{s \ge 1}$ is mutually independent implies that $\eta_t$ is independent of the sub-collection $\{\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1}\}$, hence independent of the $\sigma$-algebra $\mathcal{G}_{t-1}$ it generates. Since $\sigma(\alpha_t, \mathcal{Y}_t) \subseteq \mathcal{G}_{t-1}$, we conclude \begin{align*} \eta_t \ \text{ is independent of } \ \sigma(\alpha_t, \mathcal{Y}_t). \end{align*} In particular $\eta_t$ is independent of $\mathcal{Y}_t$. By [Conditioning and Independence](/theorems/1152), independence of $\eta_t$ from $\mathcal{Y}_t$ gives \begin{align*} \mathbb{E}[\eta_t \mid \mathcal{Y}_t] = \mathbb{E}[\eta_t] = 0, \end{align*} the last equality because $\eta_t \sim \mathcal{N}(0, Q_t)$ is centred. [/step] [step:Derive the predicted mean $a_{t+1\mid t} = T_t a_{t\mid t} + c_t$] Substitute the transition equation $\alpha_{t+1} = T_t \alpha_t + c_t + R_t \eta_t$ and take conditional expectation given $\mathcal{Y}_t$. By the linearity of conditional expectation and the fact that the deterministic quantities $T_t, c_t, R_t$ may be taken outside (see [Basic Properties of Conditional Expectation](/theorems/1148)), \begin{align*} a_{t+1\mid t} = \mathbb{E}[T_t \alpha_t + c_t + R_t \eta_t \mid \mathcal{Y}_t] = T_t\, \mathbb{E}[\alpha_t \mid \mathcal{Y}_t] + c_t + R_t\, \mathbb{E}[\eta_t \mid \mathcal{Y}_t]. 
\end{align*} The first term is $T_t a_{t\mid t}$ by definition of $a_{t\mid t}$, and the last term vanishes because $\mathbb{E}[\eta_t \mid \mathcal{Y}_t] = 0$ by Step 4. Hence \begin{align*} a_{t+1\mid t} = T_t\, a_{t\mid t} + c_t. \end{align*} [/step] [step:Derive the predicted variance $P_{t+1\mid t} = T_t P_{t\mid t} T_t^\top + R_t Q_t R_t^\top$] Form the conditional prediction error. Using $a_{t+1\mid t} = T_t a_{t\mid t} + c_t$ from Step 5 and the transition equation, \begin{align*} \alpha_{t+1} - a_{t+1\mid t} = T_t(\alpha_t - a_{t\mid t}) + R_t \eta_t =: U + V, \qquad U := T_t(\alpha_t - a_{t\mid t}), \quad V := R_t \eta_t, \end{align*} the deterministic offset $c_t$ cancelling. By the definition of the conditional covariance matrix, \begin{align*} P_{t+1\mid t} = \mathbb{E}\big[(U+V)(U+V)^\top \mid \mathcal{Y}_t\big] = \mathbb{E}[UU^\top \mid \mathcal{Y}_t] + \mathbb{E}[UV^\top \mid \mathcal{Y}_t] + \mathbb{E}[VU^\top \mid \mathcal{Y}_t] + \mathbb{E}[VV^\top \mid \mathcal{Y}_t]. \end{align*} We evaluate the four terms. For the $UU^\top$ term, $T_t$ is deterministic, so by [Basic Properties of Conditional Expectation](/theorems/1148), \begin{align*} \mathbb{E}[UU^\top \mid \mathcal{Y}_t] = T_t\, \mathbb{E}\big[(\alpha_t - a_{t\mid t})(\alpha_t - a_{t\mid t})^\top \mid \mathcal{Y}_t\big]\, T_t^\top = T_t\, P_{t\mid t}\, T_t^\top. \end{align*} For the $VV^\top$ term, $\eta_t$ is independent of $\mathcal{Y}_t$ (Step 4), so by [Conditioning and Independence](/theorems/1152), $\mathbb{E}[\eta_t \eta_t^\top \mid \mathcal{Y}_t] = \mathbb{E}[\eta_t \eta_t^\top] = Q_t$ (the covariance of the centred $\eta_t \sim \mathcal{N}(0, Q_t)$), whence \begin{align*} \mathbb{E}[VV^\top \mid \mathcal{Y}_t] = R_t\, \mathbb{E}[\eta_t \eta_t^\top \mid \mathcal{Y}_t]\, R_t^\top = R_t\, Q_t\, R_t^\top. \end{align*} The cross terms vanish, as shown in the claim below. 
Combining the four evaluations gives \begin{align*} P_{t+1\mid t} = T_t\, P_{t\mid t}\, T_t^\top + R_t\, Q_t\, R_t^\top. \end{align*} [claim:The conditional cross-covariance $\mathbb{E}[U V^\top \mid \mathcal{Y}_t]$ is zero] [proof] Write $\mathcal{G} := \sigma(\alpha_t, \mathcal{Y}_t)$, which contains $\mathcal{Y}_t$. The factor $\alpha_t - a_{t\mid t}$ is $\mathcal{G}$-measurable, since $\alpha_t$ is $\mathcal{G}$-measurable and $a_{t\mid t} = \mathbb{E}[\alpha_t \mid \mathcal{Y}_t]$ is $\mathcal{Y}_t \subseteq \mathcal{G}$-measurable. By Step 4, $\eta_t$ is independent of $\mathcal{G}$. Apply the [Tower Property of Conditional Expectation](/theorems/1150) with $\mathcal{Y}_t \subseteq \mathcal{G}$: \begin{align*} \mathbb{E}\big[(\alpha_t - a_{t\mid t})\, \eta_t^\top \,\big|\, \mathcal{Y}_t\big] = \mathbb{E}\Big[\, \mathbb{E}\big[(\alpha_t - a_{t\mid t})\, \eta_t^\top \,\big|\, \mathcal{G}\big] \,\Big|\, \mathcal{Y}_t \Big]. \end{align*} Inside, the $\mathcal{G}$-measurable factor $(\alpha_t - a_{t\mid t})$ is taken outside the conditional expectation by [Basic Properties of Conditional Expectation](/theorems/1148), and then independence of $\eta_t$ from $\mathcal{G}$ gives $\mathbb{E}[\eta_t^\top \mid \mathcal{G}] = \mathbb{E}[\eta_t^\top] = 0$ via [Conditioning and Independence](/theorems/1152): \begin{align*} \mathbb{E}\big[(\alpha_t - a_{t\mid t})\, \eta_t^\top \,\big|\, \mathcal{G}\big] = (\alpha_t - a_{t\mid t})\, \mathbb{E}[\eta_t^\top \mid \mathcal{G}] = (\alpha_t - a_{t\mid t})\cdot 0 = 0. \end{align*} The outer conditional expectation of $0$ is $0$. Therefore $\mathbb{E}[(\alpha_t - a_{t\mid t})\eta_t^\top \mid \mathcal{Y}_t] = 0$, and since $T_t, R_t$ are deterministic, \begin{align*} \mathbb{E}[U V^\top \mid \mathcal{Y}_t] = T_t\, \mathbb{E}\big[(\alpha_t - a_{t\mid t})\eta_t^\top \mid \mathcal{Y}_t\big]\, R_t^\top = 0. \end{align*} Transposing the same identity gives $\mathbb{E}[V U^\top \mid \mathcal{Y}_t] = 0$ as well. 
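The claim concerns conditional moments; taking expectations (tower property) it implies the unconditional identity $\mathbb{E}[UV^\top] = 0$, which can be sanity-checked by simulation. The Monte Carlo sketch below assumes NumPy, a scalar model with illustrative parameter values, recorded times $\mathcal{O} = \{1, 2\}$ and a missing $Y_3$; `monte_carlo_check` is a hypothetical helper. It runs the filter on every simulated path (the standard Kalman update at recorded times, the identity at $t = 3$), forms $U$ and $V$ at $t = 3$, and also compares the mean square of $\alpha_4 - a_{4\mid 3}$ with $P_{4\mid 3}$. Such a simulation corroborates the argument but does not replace it.

```python
import numpy as np


def monte_carlo_check(n_paths=200_000, seed=0):
    """Simulate a scalar model with recorded times O = {1, 2} and Y_3 missing.

    Parameter values are illustrative assumptions. Returns the sample mean of
    U V at t = 3, the sample mean and mean square of alpha_4 - a_{4|3}, and P_{4|3}.
    """
    rng = np.random.default_rng(seed)
    T, c, R, Q = 0.9, 0.5, 1.0, 0.25       # transition parameters
    Z, d, H = 1.0, 0.0, 1.0                # observation parameters
    a1, P1 = 0.0, 2.0                      # alpha_1 ~ N(a1, P1)
    observed = {1, 2}                      # the set O; time 3 is missing

    alpha = a1 + np.sqrt(P1) * rng.standard_normal(n_paths)   # alpha_1 on each path
    a = np.full(n_paths, a1)               # a_{1|0}, one value per path
    P = P1                                 # P_{1|0}, deterministic in this model
    for t in (1, 2, 3):
        if t in observed:
            # Recorded Y_t: standard scalar Kalman update on every path.
            y = Z * alpha + d + np.sqrt(H) * rng.standard_normal(n_paths)
            K = P * Z / (Z * P * Z + H)
            a = a + K * (y - Z * a - d)
            P = P - K * Z * P
        # Missing Y_t: no update, so a_{t|t} = a_{t|t-1} and P_{t|t} = P_{t|t-1}.
        eta = np.sqrt(Q) * rng.standard_normal(n_paths)
        U = T * (alpha - a)                # T_t (alpha_t - a_{t|t})
        V = R * eta                        # R_t eta_t
        alpha = T * alpha + c + V          # transition to alpha_{t+1}
        a = T * a + c                      # a_{t+1|t}
        P = T * P * T + R * Q * R          # P_{t+1|t}
    err = alpha - a                        # alpha_4 - a_{4|3}; U, V now refer to t = 3
    return np.mean(U * V), np.mean(err), np.mean(err ** 2), P


cross, mean_err, mean_sq_err, P43 = monte_carlo_check()
```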
[/proof] [/claim] [/step] [step:Assemble the recursions and confirm Gaussianity is preserved] Steps 2–3 give the filtering identity $a_{t\mid t} = a_{t\mid t-1}$ and $P_{t\mid t} = P_{t\mid t-1}$, while Steps 5–6 give the prediction recursion $a_{t+1\mid t} = T_t a_{t\mid t} + c_t$ and $P_{t+1\mid t} = T_t P_{t\mid t} T_t^\top + R_t Q_t R_t^\top$. These are exactly the four displayed equations of the statement. Finally, the qualitative claim that the conditional *law* of $\alpha_t$ is unchanged follows from $\mathcal{Y}_t = \mathcal{Y}_{t-1}$ together with the linear-Gaussian structure: $\alpha_t$ and the recorded observations are jointly Gaussian (each is an affine image of the primitive vector $(\alpha_1, \eta_1, \dots, \varepsilon_1, \dots)$, which is Gaussian because its components are independent Gaussian vectors, so [Affine Transformations of Multivariate Normals](/theorems/1853) applies), so the conditional distribution $\alpha_t \mid \mathcal{Y}_t$ is Gaussian and is completely determined by its first two moments $(a_{t\mid t}, P_{t\mid t})$. Because these moments coincide with $(a_{t\mid t-1}, P_{t\mid t-1})$, the filtered distribution equals the one-step-ahead predictive distribution, and the prediction step then proceeds from the unchanged filtered moments exactly as in the fully observed case. This completes the proof. [/step]
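As a hypothetical worked example (values chosen only for easy arithmetic), take the scalar case $m = p = r = 1$ with $T_t = 0.9$, $c_t = 0.5$, $R_t = 1$, $Q_t = 0.25$, and suppose the previous step delivered $a_{t\mid t-1} = 2$ and $P_{t\mid t-1} = 4$ with $t \notin \mathcal{O}$. The theorem gives \begin{align*} a_{t\mid t} &= 2, & P_{t\mid t} &= 4, \\ a_{t+1\mid t} &= 0.9 \cdot 2 + 0.5 = 2.3, & P_{t+1\mid t} &= 0.9^2 \cdot 4 + 1^2 \cdot 0.25 = 3.49. \end{align*} Here $P_{t+1\mid t} = T_t^2 P_{t\mid t-1} + R_t^2 Q_t$, so a run of consecutive missing observations simply iterates the prediction step, with no data-driven reduction of the variance at any of those times.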
