[proofplan]
The entire argument rests on a single structural observation: the available-information filtration $\mathcal{Y}_t$ is generated by the *observed* measurements only, so a missing $Y_t$ adds nothing new and $\mathcal{Y}_t = \mathcal{Y}_{t-1}$. The filtering identities then follow immediately, because the filtered and one-step-predicted moments are conditional expectations and variances with respect to *the same* $\sigma$-algebra. For the prediction step we substitute the transition equation and use that the state noise $\eta_t$ is independent of $\sigma(\alpha_t, \mathcal{Y}_t)$, a consequence of the mutual independence of the model primitives. This independence makes the conditional mean of $\eta_t$ and the conditional cross-covariance between $\alpha_t$ and $\eta_t$ vanish, leaving exactly the standard one-step mean and variance recursions with no innovation correction.
[/proofplan]
[step:Fix the model primitives and express the conditioning sets through the driving noise]
By recursive substitution of the state equation, each state is a deterministic affine function of the initial state and of the strictly earlier state noise:
\begin{align*}
\alpha_t = \Big(\textstyle\prod_{j=1}^{t-1} T_{t-j}\Big)\alpha_1 + (\text{affine function of } \eta_1, \dots, \eta_{t-1}),
\end{align*}
so $\alpha_t$ is $\sigma(\alpha_1, \eta_1, \dots, \eta_{t-1})$-measurable. For each recorded time $s$, the observation $Y_s = Z_s \alpha_s + d_s + \varepsilon_s$ is therefore a measurable function of $(\alpha_1, \eta_1, \dots, \eta_{s-1}, \varepsilon_s)$. Consequently, for every $t$,
\begin{align*}
\mathcal{Y}_{t-1} = \sigma\big(Y_s : s \in \mathcal{O},\ s \le t-1\big) \subseteq \sigma\big(\alpha_1, \eta_1, \dots, \eta_{t-2}, \varepsilon_1, \dots, \varepsilon_{t-1}\big),
\end{align*}
and combining this with the measurability of $\alpha_t$ above,
\begin{align*}
\sigma\big(\alpha_t, \mathcal{Y}_{t-1}\big) \subseteq \sigma\big(\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1}\big) =: \mathcal{G}_{t-1}.
\end{align*}
The system matrices $T_t, R_t, Z_t$ and the offsets $c_t, d_t$ are deterministic and play no role in these $\sigma$-algebras. This bookkeeping is what the independence argument of Step 4 rests on, and through it the prediction recursion; the filtering identity itself needs only the definition of $\mathcal{Y}_t$.
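The recursive substitution can be checked numerically. The following is a minimal sketch for a scalar state, assuming the transition $\alpha_{s+1} = T_s \alpha_s + c_s + R_s \eta_s$; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def state_by_recursion(alpha1, T, c, R, eta, t):
    """Iterate alpha_{s+1} = T_s*alpha_s + c_s + R_s*eta_s up to time t (1-indexed)."""
    alpha = alpha1
    for s in range(1, t):              # s = 1, ..., t-1
        alpha = T[s - 1] * alpha + c[s - 1] + R[s - 1] * eta[s - 1]
    return alpha


def state_unrolled(alpha1, T, c, R, eta, t):
    """Closed form: (T_{t-1}...T_1)*alpha1 plus an affine function of eta_1..eta_{t-1}."""
    lead = 1.0
    for s in range(1, t):
        lead *= T[s - 1]               # coefficient of alpha_1
    affine = 0.0
    for k in range(1, t):              # contribution of c_k + R_k*eta_k
        weight = 1.0
        for j in range(k + 1, t):      # T_{t-1}...T_{k+1}
            weight *= T[j - 1]
        affine += weight * (c[k - 1] + R[k - 1] * eta[k - 1])
    return lead * alpha1 + affine


# Hypothetical values; list position s-1 holds the time-s quantity.
T = [0.9, 0.5, 1.1, 0.7]
c = [0.1, -0.2, 0.0, 0.3]
R = [1.0, 2.0, 0.5, 1.5]
eta = [0.4, -1.3, 0.8, 2.0]

t = 4
a = state_by_recursion(2.0, T, c, R, eta, t)
b = state_unrolled(2.0, T, c, R, eta, t)

# Changing eta_t (here eta[3]) cannot change alpha_t: only eta_1..eta_{t-1} enter.
eta_changed = eta[:3] + [99.0]
a_changed = state_by_recursion(2.0, T, c, R, eta_changed, t)
```

Both routes give the same $\alpha_t$, and perturbing $\eta_t$ leaves $\alpha_t$ untouched, which is the measurability statement in computational form.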
[/step]
[step:Show the missing observation leaves the information set unchanged]
Since $t \notin \mathcal{O}$, the index set $\{s \in \mathcal{O} : s \le t\}$ contains no element equal to $t$, hence coincides with $\{s \in \mathcal{O} : s \le t-1\}$. The generating families of $\mathcal{Y}_t$ and $\mathcal{Y}_{t-1}$ are therefore identical, so
\begin{align*}
\mathcal{Y}_t = \sigma\big(Y_s : s \in \mathcal{O},\ s \le t\big) = \sigma\big(Y_s : s \in \mathcal{O},\ s \le t-1\big) = \mathcal{Y}_{t-1}.
\end{align*}
[guided]
The Kalman filter is built on a filtration that grows only when a genuine measurement arrives. The defining feature of a *missing* observation is that no measurement is appended at time $t$: the recorded times up to $t$ are exactly the recorded times up to $t-1$.
Formally, write the index set of recorded times not exceeding $t$ as $A_t := \{s \in \mathcal{O} : s \le t\}$. We ask: how does $A_t$ differ from $A_{t-1} = \{s \in \mathcal{O} : s \le t-1\}$? The only candidate element of $A_t \setminus A_{t-1}$ is $s = t$ itself, and that element belongs to $A_t$ precisely when $t \in \mathcal{O}$. Because $Y_t$ is missing, $t \notin \mathcal{O}$, so $t \notin A_t$ and therefore $A_t = A_{t-1}$.
Two $\sigma$-algebras generated by the same family of random vectors are equal, so
\begin{align*}
\mathcal{Y}_t = \sigma\big(Y_s : s \in A_t\big) = \sigma\big(Y_s : s \in A_{t-1}\big) = \mathcal{Y}_{t-1}.
\end{align*}
This is an equality of $\sigma$-algebras, not merely an inclusion or an almost-sure statement: the two conditioning sets are literally the same object. That is the crux of the whole theorem. The filtering identities are immediate corollaries of $\mathcal{Y}_t = \mathcal{Y}_{t-1}$, and the prediction recursions combine it with the independence of $\eta_t$.
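The index-set comparison can be illustrated with a short sketch. The set `observed` below is a hypothetical choice of $\mathcal{O}$ in which the measurement at time $3$ is missing; the helper name is likewise an assumption.

```python
def recorded_times_up_to(observed, t):
    """Return A_t = {s in O : s <= t} for a set O of recorded times."""
    return {s for s in observed if s <= t}


observed = {1, 2, 4, 5}    # hypothetical O; time 3 is missing
A_2 = recorded_times_up_to(observed, 2)
A_3 = recorded_times_up_to(observed, 3)
A_4 = recorded_times_up_to(observed, 4)
```

Here `A_3` equals `A_2` because $3 \notin \mathcal{O}$, whereas `A_4` gains the element $4$.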
[/guided]
[/step]
[step:Collapse the filtering step to the identity map]
Conditional expectation and conditional variance depend on the conditioning information only through the $\sigma$-algebra. Since Step 2 gives $\mathcal{Y}_t = \mathcal{Y}_{t-1}$, the defining expressions for the filtered and one-step-predicted moments are evaluated against the same $\sigma$-algebra, hence represent the same objects:
\begin{align*}
a_{t\mid t} = \mathbb{E}[\alpha_t \mid \mathcal{Y}_t] = \mathbb{E}[\alpha_t \mid \mathcal{Y}_{t-1}] = a_{t\mid t-1},
\end{align*}
and, using the definition $\operatorname{Var}(\alpha_t \mid \mathcal{G}) = \mathbb{E}\big[(\alpha_t - \mathbb{E}[\alpha_t \mid \mathcal{G}])(\alpha_t - \mathbb{E}[\alpha_t \mid \mathcal{G}])^\top \mid \mathcal{G}\big]$ with $\mathcal{G} = \mathcal{Y}_t = \mathcal{Y}_{t-1}$,
\begin{align*}
P_{t\mid t} = \operatorname{Var}(\alpha_t \mid \mathcal{Y}_t) = \operatorname{Var}(\alpha_t \mid \mathcal{Y}_{t-1}) = P_{t\mid t-1}.
\end{align*}
More strongly, the entire conditional law of $\alpha_t$ given $\mathcal{Y}_t$ equals its law given $\mathcal{Y}_{t-1}$, which is the precise sense in which the absent measurement carries no information. No innovation term $Y_t - \mathbb{E}[Y_t \mid \mathcal{Y}_{t-1}]$ can appear, because there is no observed $Y_t$ to form a prediction error.
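In an implementation, the filtering identity amounts to passing the predicted moments through unchanged. The following scalar sketch makes this concrete. The observed branch is the standard Kalman update, included only for contrast and not derived in this proof, and the symbol `H` for the measurement-noise variance is an assumption, since the statement does not name it. One common way to read the missing case is as the limit of an observation with very large noise variance, which the last line probes numerically.

```python
def filter_step(a_pred, P_pred, y, Z, d, H):
    """Scalar filtering step; y is None when the observation is missing.

    H is the measurement-noise variance (a symbol assumed for this sketch).
    """
    if y is None:
        # No measurement: the conditioning sigma-algebra is unchanged.
        return a_pred, P_pred
    v = y - (Z * a_pred + d)           # innovation
    F = Z * P_pred * Z + H             # innovation variance
    K = P_pred * Z / F                 # gain
    return a_pred + K * v, P_pred - K * Z * P_pred


a_miss, P_miss = filter_step(1.0, 2.0, None, Z=1.0, d=0.0, H=0.5)
# An observed value with enormous noise variance is almost uninformative.
a_vague, P_vague = filter_step(1.0, 2.0, 1.8, Z=1.0, d=0.0, H=1e12)
```

With `y=None` the function returns `(a_pred, P_pred)` exactly; with `H=1e12` the gain is of order $10^{-12}$, so the updated moments differ from the predicted ones only negligibly.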
[/step]
[step:Establish that the state noise $\eta_t$ is independent of $\sigma(\alpha_t, \mathcal{Y}_t)$]
By Step 2, $\mathcal{Y}_t = \mathcal{Y}_{t-1}$, so $\sigma(\alpha_t, \mathcal{Y}_t) = \sigma(\alpha_t, \mathcal{Y}_{t-1})$, and by Step 1 this $\sigma$-algebra is contained in $\mathcal{G}_{t-1} = \sigma(\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1})$. The hypothesis that the family $\{\alpha_1\} \cup \{\eta_s\}_{s \ge 1} \cup \{\varepsilon_s\}_{s \ge 1}$ is mutually independent implies that $\eta_t$ is independent of the sub-collection $\{\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1}\}$, hence independent of the $\sigma$-algebra $\mathcal{G}_{t-1}$ it generates. Since $\sigma(\alpha_t, \mathcal{Y}_t) \subseteq \mathcal{G}_{t-1}$, we conclude
\begin{align*}
\eta_t \ \text{ is independent of } \ \sigma(\alpha_t, \mathcal{Y}_t).
\end{align*}
In particular $\eta_t$ is independent of $\mathcal{Y}_t$. By [Conditioning and Independence](/theorems/1152), independence of $\eta_t$ from $\mathcal{Y}_t$ gives
\begin{align*}
\mathbb{E}[\eta_t \mid \mathcal{Y}_t] = \mathbb{E}[\eta_t] = 0,
\end{align*}
the last equality because $\eta_t \sim \mathcal{N}(0, Q_t)$ is centred.
[/step]
[step:Derive the predicted mean $a_{t+1\mid t} = T_t a_{t\mid t} + c_t$]
Substitute the transition equation $\alpha_{t+1} = T_t \alpha_t + c_t + R_t \eta_t$ and take conditional expectation given $\mathcal{Y}_t$. By the linearity of conditional expectation and the fact that the deterministic quantities $T_t, c_t, R_t$ may be taken outside (see [Basic Properties of Conditional Expectation](/theorems/1148)),
\begin{align*}
a_{t+1\mid t} = \mathbb{E}[T_t \alpha_t + c_t + R_t \eta_t \mid \mathcal{Y}_t] = T_t\, \mathbb{E}[\alpha_t \mid \mathcal{Y}_t] + c_t + R_t\, \mathbb{E}[\eta_t \mid \mathcal{Y}_t].
\end{align*}
The first term is $T_t a_{t\mid t}$ by definition of $a_{t\mid t}$, and the last term vanishes because $\mathbb{E}[\eta_t \mid \mathcal{Y}_t] = 0$ by Step 4. Hence
\begin{align*}
a_{t+1\mid t} = T_t\, a_{t\mid t} + c_t.
\end{align*}
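When several consecutive observations are missing, the filtered mean equals the predicted mean at each missing time, so the recursion is simply iterated. A minimal sketch, assuming a scalar state with time-invariant $T$ and $c$ (a simplification not made in the statement) and hypothetical numerical values:

```python
def predicted_mean_after_gap(a_filt, T, c, k):
    """Apply a -> T*a + c a total of k times (k consecutive prediction steps)."""
    a = a_filt
    for _ in range(k):
        a = T * a + c
    return a


a0, T, c, k = 1.0, 0.8, 0.5, 3     # hypothetical values
iterated = predicted_mean_after_gap(a0, T, c, k)
# Geometric-sum closed form, valid for T != 1.
closed_form = T**k * a0 + c * (1 - T**k) / (1 - T)
```

Starting from $a_{t\mid t}$, $k$ applications yield $a_{t+k\mid t+k-1}$ when the observations at times $t+1, \dots, t+k-1$ are all missing; no innovation term enters at any stage.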
[/step]
[step:Derive the predicted variance $P_{t+1\mid t} = T_t P_{t\mid t} T_t^\top + R_t Q_t R_t^\top$]
Form the conditional prediction error. Using $a_{t+1\mid t} = T_t a_{t\mid t} + c_t$ from Step 5 and the transition equation,
\begin{align*}
\alpha_{t+1} - a_{t+1\mid t} = T_t(\alpha_t - a_{t\mid t}) + R_t \eta_t =: U + V, \qquad U := T_t(\alpha_t - a_{t\mid t}), \quad V := R_t \eta_t,
\end{align*}
the deterministic offset $c_t$ cancelling. By the definition of the conditional covariance matrix,
\begin{align*}
P_{t+1\mid t} = \mathbb{E}\big[(U+V)(U+V)^\top \mid \mathcal{Y}_t\big] = \mathbb{E}[UU^\top \mid \mathcal{Y}_t] + \mathbb{E}[UV^\top \mid \mathcal{Y}_t] + \mathbb{E}[VU^\top \mid \mathcal{Y}_t] + \mathbb{E}[VV^\top \mid \mathcal{Y}_t].
\end{align*}
We evaluate the four terms. For the $UU^\top$ term, $T_t$ is deterministic, so by [Basic Properties of Conditional Expectation](/theorems/1148),
\begin{align*}
\mathbb{E}[UU^\top \mid \mathcal{Y}_t] = T_t\, \mathbb{E}\big[(\alpha_t - a_{t\mid t})(\alpha_t - a_{t\mid t})^\top \mid \mathcal{Y}_t\big]\, T_t^\top = T_t\, P_{t\mid t}\, T_t^\top.
\end{align*}
For the $VV^\top$ term, $\eta_t$ is independent of $\mathcal{Y}_t$ (Step 4), so by [Conditioning and Independence](/theorems/1152), $\mathbb{E}[\eta_t \eta_t^\top \mid \mathcal{Y}_t] = \mathbb{E}[\eta_t \eta_t^\top] = Q_t$ (the covariance of the centred $\eta_t \sim \mathcal{N}(0, Q_t)$), whence
\begin{align*}
\mathbb{E}[VV^\top \mid \mathcal{Y}_t] = R_t\, \mathbb{E}[\eta_t \eta_t^\top \mid \mathcal{Y}_t]\, R_t^\top = R_t\, Q_t\, R_t^\top.
\end{align*}
The cross terms vanish, as shown in the claim below. Combining the four evaluations gives
\begin{align*}
P_{t+1\mid t} = T_t\, P_{t\mid t}\, T_t^\top + R_t\, Q_t\, R_t^\top.
\end{align*}
[claim:The conditional cross-covariance $\mathbb{E}[U V^\top \mid \mathcal{Y}_t]$ is zero]
[proof]
Write $\mathcal{G} := \sigma(\alpha_t, \mathcal{Y}_t)$, which contains $\mathcal{Y}_t$. The factor $\alpha_t - a_{t\mid t}$ is $\mathcal{G}$-measurable: $\alpha_t$ is $\mathcal{G}$-measurable, and $a_{t\mid t} = \mathbb{E}[\alpha_t \mid \mathcal{Y}_t]$ is $\mathcal{Y}_t$-measurable, hence $\mathcal{G}$-measurable because $\mathcal{Y}_t \subseteq \mathcal{G}$. By Step 4, $\eta_t$ is independent of $\mathcal{G}$. Apply the [Tower Property of Conditional Expectation](/theorems/1150) with $\mathcal{Y}_t \subseteq \mathcal{G}$:
\begin{align*}
\mathbb{E}\big[(\alpha_t - a_{t\mid t})\, \eta_t^\top \,\big|\, \mathcal{Y}_t\big] = \mathbb{E}\Big[\, \mathbb{E}\big[(\alpha_t - a_{t\mid t})\, \eta_t^\top \,\big|\, \mathcal{G}\big] \,\Big|\, \mathcal{Y}_t \Big].
\end{align*}
Inside the inner conditional expectation, the $\mathcal{G}$-measurable factor $(\alpha_t - a_{t\mid t})$ is taken outside by [Basic Properties of Conditional Expectation](/theorems/1148); this is legitimate because both factors are square-integrable, so their product is integrable. Independence of $\eta_t$ from $\mathcal{G}$ then gives $\mathbb{E}[\eta_t^\top \mid \mathcal{G}] = \mathbb{E}[\eta_t^\top] = 0$ via [Conditioning and Independence](/theorems/1152):
\begin{align*}
\mathbb{E}\big[(\alpha_t - a_{t\mid t})\, \eta_t^\top \,\big|\, \mathcal{G}\big] = (\alpha_t - a_{t\mid t})\, \mathbb{E}[\eta_t^\top \mid \mathcal{G}] = (\alpha_t - a_{t\mid t})\cdot 0 = 0.
\end{align*}
The outer conditional expectation of $0$ is $0$. Therefore $\mathbb{E}[(\alpha_t - a_{t\mid t})\eta_t^\top \mid \mathcal{Y}_t] = 0$, and since $T_t, R_t$ are deterministic,
\begin{align*}
\mathbb{E}[U V^\top \mid \mathcal{Y}_t] = T_t\, \mathbb{E}\big[(\alpha_t - a_{t\mid t})\eta_t^\top \mid \mathcal{Y}_t\big]\, R_t^\top = 0.
\end{align*}
Transposing the same identity gives $\mathbb{E}[V U^\top \mid \mathcal{Y}_t] = 0$ as well.
[/proof]
[/claim]
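The variance recursion and the vanishing cross term can be probed by simulation. The sketch below is a Monte Carlo check for a scalar state under stated assumptions: conditioning on $\mathcal{Y}_t$ is represented by fixing the filtered moments $(a, P)$ and drawing $\alpha_t$ from $\mathcal{N}(a, P)$, with $\eta_t$ drawn independently from $\mathcal{N}(0, Q)$. All numerical values, the sample size, and the seed are hypothetical.

```python
import random

random.seed(0)

# Hypothetical scalar values: filtered moments (a, P) and transition (T, c, R, Q).
a, P = 1.0, 2.0
T, c, R, Q = 0.8, 0.5, 1.5, 0.3
n = 100_000

next_states, cross = [], 0.0
for _ in range(n):
    alpha = random.gauss(a, P ** 0.5)       # draw from N(a, P)
    eta = random.gauss(0.0, Q ** 0.5)       # independent draw from N(0, Q)
    next_states.append(T * alpha + c + R * eta)
    cross += (T * (alpha - a)) * (R * eta)  # one sample of U * V

mean_mc = sum(next_states) / n
var_mc = sum((x - mean_mc) ** 2 for x in next_states) / n
cross_mc = cross / n

mean_formula = T * a + c                    # T a + c
var_formula = T * P * T + R * Q * R         # T P T' + R Q R'
```

Up to sampling error, `mean_mc` and `var_mc` should agree with `mean_formula` and `var_formula`, and `cross_mc` should be close to zero. This is a sanity check of the algebra, not a substitute for the proof.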
[/step]
[step:Assemble the recursions and confirm Gaussianity is preserved]
Steps 2–3 give the filtering identity $a_{t\mid t} = a_{t\mid t-1}$ and $P_{t\mid t} = P_{t\mid t-1}$, while Steps 5–6 give the prediction recursion $a_{t+1\mid t} = T_t a_{t\mid t} + c_t$ and $P_{t+1\mid t} = T_t P_{t\mid t} T_t^\top + R_t Q_t R_t^\top$. These are exactly the four displayed equations of the statement.
Finally, the qualitative claim that the conditional *law* of $\alpha_t$ is unchanged follows directly from $\mathcal{Y}_t = \mathcal{Y}_{t-1}$, as noted in Step 3. The linear-Gaussian structure identifies this law explicitly: $\alpha_t$ and the recorded observations are jointly Gaussian, since each is an affine image of the Gaussian primitive vector $(\alpha_1, \eta_1, \dots, \eta_{t-1}, \varepsilon_1, \dots, \varepsilon_{t-1})$ by [Affine Transformations of Multivariate Normals](/theorems/1853). Hence the conditional distribution of $\alpha_t$ given $\mathcal{Y}_t$ is Gaussian and is completely determined by its first two moments $(a_{t\mid t}, P_{t\mid t})$. Because these moments coincide with $(a_{t\mid t-1}, P_{t\mid t-1})$, the filtered distribution equals the one-step-ahead predictive distribution, and the prediction step then proceeds from the unchanged filtered moments exactly as in the fully observed case. This completes the proof.
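The assembled recursions translate into a filter loop in which a missing observation skips the update and goes straight to prediction. The following is a minimal sketch for a scalar, time-invariant model. The function name, the use of `None` to mark missing values, the symbol `H` for the measurement-noise variance, and all numbers are assumptions made for illustration; the update in the observed branch is the standard Kalman update, which this proof does not derive.

```python
def kalman_filter(ys, a1, P1, T, c, R, Q, Z, d, H):
    """Scalar, time-invariant Kalman filter; entries of ys equal to None are missing.

    a1, P1 are the initial predicted moments. Returns one tuple
    (a_pred, P_pred, a_filt, P_filt) per time step.
    """
    a_pred, P_pred = a1, P1
    out = []
    for y in ys:
        if y is None:
            # Missing: filtered moments equal the one-step-predicted moments.
            a_filt, P_filt = a_pred, P_pred
        else:
            v = y - (Z * a_pred + d)       # innovation
            F = Z * P_pred * Z + H         # innovation variance
            K = P_pred * Z / F             # gain
            a_filt = a_pred + K * v
            P_filt = P_pred - K * Z * P_pred
        out.append((a_pred, P_pred, a_filt, P_filt))
        # Prediction step, identical in the observed and missing cases.
        a_pred = T * a_filt + c
        P_pred = T * P_filt * T + R * Q * R
    return out


ys = [1.2, None, None, 0.7]            # hypothetical series with a two-step gap
steps = kalman_filter(ys, a1=0.0, P1=1.0, T=0.9, c=0.1, R=1.0, Q=0.2,
                      Z=1.0, d=0.0, H=0.5)
```

At the two missing times the filtered and predicted entries of `steps` coincide, and the predicted variance grows across the gap because no measurement arrives to reduce it.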
[/step]