[proofplan]
Condition on the past observation sigma-algebra $\mathcal G_{t-1}$ and regard the forecast mean and covariance as fixed quantities. Under the linear Gaussian observation equation, the pair $(\alpha_t,Y_t)$ is jointly Gaussian with covariance blocks that can be computed directly from independence of $\alpha_t$ and the observation noise. The Gaussian conditioning formula then gives the filtering mean and covariance, and the algebraic identity
\begin{align*}
K_t F_t K_t^\top = P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}
\end{align*}
rewrites the covariance in Kalman-gain form. Finally, the state transition is an affine image of the filtered Gaussian state plus independent Gaussian noise, so its conditional mean and covariance are obtained by the standard affine Gaussian calculation.
[/proofplan]
[step:Compute the conditional joint Gaussian law of the state and observation]
Fix the sigma-algebra $\mathcal G_{t-1} := \sigma(Y_1,\dots,Y_{t-1})$. Conditionally on $\mathcal G_{t-1}$, define the affine map
\begin{align*}
A_t: \mathbb R^m \times \mathbb R^p &\to \mathbb R^m \times \mathbb R^p \\
(x,e) &\mapsto (x, Z_t x + d_t + e).
\end{align*}
Since $\alpha_t$ and $\varepsilon_t$ are conditionally independent Gaussian vectors given $\mathcal G_{t-1}$, the vector $(\alpha_t,\varepsilon_t)$ is conditionally Gaussian, and therefore $(\alpha_t,Y_t) = A_t(\alpha_t,\varepsilon_t)$ is conditionally Gaussian given $\mathcal G_{t-1}$.
Its conditional mean is
\begin{align*}
\mathbb E[\alpha_t \mid \mathcal G_{t-1}] &= a_{t|t-1}, \\
\mathbb E[Y_t \mid \mathcal G_{t-1}] &= Z_t a_{t|t-1} + d_t.
\end{align*}
Using the observation equation $Y_t = Z_t\alpha_t + d_t + \varepsilon_t$ and the conditional independence of $\alpha_t$ and $\varepsilon_t$ given $\mathcal G_{t-1}$, the covariance blocks are
\begin{align*}
\operatorname{Cov}(\alpha_t,\alpha_t \mid \mathcal G_{t-1}) &= P_{t|t-1}, \\
\operatorname{Cov}(\alpha_t,Y_t \mid \mathcal G_{t-1}) &= P_{t|t-1}Z_t^\top, \\
\operatorname{Cov}(Y_t,Y_t \mid \mathcal G_{t-1}) &= Z_tP_{t|t-1}Z_t^\top + H_t = F_t.
\end{align*}
Thus, conditionally on $\mathcal G_{t-1}$, the pair $(\alpha_t,Y_t)$ is jointly Gaussian with the displayed mean vector and covariance blocks.
[guided]
We freeze the information in the past observations by conditioning on $\mathcal G_{t-1}$. After this conditioning, the quantities $a_{t|t-1}$ and $P_{t|t-1}$ are known with respect to the conditional probability law. The observation equation is linear in the state and in the observation noise, so we package it as an affine map
\begin{align*}
A_t: \mathbb R^m \times \mathbb R^p &\to \mathbb R^m \times \mathbb R^p \\
(x,e) &\mapsto (x, Z_t x + d_t + e).
\end{align*}
The input vector $(\alpha_t,\varepsilon_t)$ is conditionally Gaussian because its two components are conditionally independent Gaussian vectors. Affine images of Gaussian vectors are Gaussian, so $(\alpha_t,Y_t) = A_t(\alpha_t,\varepsilon_t)$ is conditionally jointly Gaussian given $\mathcal G_{t-1}$.
The conditional mean of the first component is the forecast mean:
\begin{align*}
\mathbb E[\alpha_t \mid \mathcal G_{t-1}] = a_{t|t-1}.
\end{align*}
For the observation component, linearity of conditional expectation gives
\begin{align*}
\mathbb E[Y_t \mid \mathcal G_{t-1}]
&= \mathbb E[Z_t\alpha_t + d_t + \varepsilon_t \mid \mathcal G_{t-1}] \\
&= Z_t\mathbb E[\alpha_t \mid \mathcal G_{t-1}] + d_t + \mathbb E[\varepsilon_t \mid \mathcal G_{t-1}] \\
&= Z_t a_{t|t-1} + d_t.
\end{align*}
The covariance of the state with itself is, by hypothesis,
\begin{align*}
\operatorname{Cov}(\alpha_t,\alpha_t \mid \mathcal G_{t-1}) = P_{t|t-1}.
\end{align*}
For the cross-covariance, the deterministic shift $d_t$ contributes nothing and the conditional independence of $\alpha_t$ and $\varepsilon_t$ gives zero cross-covariance:
\begin{align*}
\operatorname{Cov}(\alpha_t,Y_t \mid \mathcal G_{t-1})
&= \operatorname{Cov}(\alpha_t,Z_t\alpha_t + d_t + \varepsilon_t \mid \mathcal G_{t-1}) \\
&= \operatorname{Cov}(\alpha_t,\alpha_t \mid \mathcal G_{t-1})Z_t^\top
+ \operatorname{Cov}(\alpha_t,\varepsilon_t \mid \mathcal G_{t-1}) \\
&= P_{t|t-1}Z_t^\top.
\end{align*}
Similarly, the conditional covariance of the observation is
\begin{align*}
\operatorname{Cov}(Y_t,Y_t \mid \mathcal G_{t-1})
&= \operatorname{Cov}(Z_t\alpha_t + d_t + \varepsilon_t,Z_t\alpha_t + d_t + \varepsilon_t \mid \mathcal G_{t-1}) \\
&= Z_tP_{t|t-1}Z_t^\top + H_t \\
&= F_t.
\end{align*}
This identifies the exact jointly Gaussian law needed for the conditioning step.
[/guided]
[/step]
[step:Apply the Gaussian conditioning formula to update the filtered state]
We apply the conditional version of the [Conditional Distribution Formula for Jointly Gaussian Random Vectors](/theorems/???) under a regular conditional law given $\mathcal G_{t-1}$, with $X = \alpha_t$ and $W = Y_t$. Under this conditional law, the $\mathcal G_{t-1}$-measurable mean and covariance blocks computed above are fixed parameters. The required non-degeneracy hypothesis is precisely that the covariance block of $W$ is invertible; here that block is $F_t$, which is invertible by hypothesis. Therefore, conditionally on $\mathcal G_t = \sigma(\mathcal G_{t-1},Y_t)$,
\begin{align*}
\alpha_t \mid \mathcal G_t
\sim
\mathcal N\left(
a_{t|t-1}
+
P_{t|t-1}Z_t^\top F_t^{-1}
\bigl(Y_t - Z_ta_{t|t-1} - d_t\bigr),
\,
P_{t|t-1}
-
P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}
\right).
\end{align*}
Since $v_t = Y_t - Z_ta_{t|t-1} - d_t$ and $K_t = P_{t|t-1}Z_t^\top F_t^{-1}$, the conditional mean is
\begin{align*}
a_{t|t} = a_{t|t-1} + K_t v_t.
\end{align*}
[guided]
The previous step put us exactly in the setting of the conditional version of the [Conditional Distribution Formula for Jointly Gaussian Random Vectors](/theorems/???). We work under a regular conditional law given $\mathcal G_{t-1}$; under that conditional law, the $\mathcal G_{t-1}$-measurable forecast quantities are treated as fixed mean and covariance parameters. In the formula, the random vector being updated is $X = \alpha_t$, and the random vector being observed is $W = Y_t$. The conditional mean blocks are
\begin{align*}
\mu_X &= a_{t|t-1}, \\
\mu_W &= Z_ta_{t|t-1} + d_t,
\end{align*}
and the covariance blocks are
\begin{align*}
\Sigma_{XX} &= P_{t|t-1}, \\
\Sigma_{XW} &= P_{t|t-1}Z_t^\top, \\
\Sigma_{WW} &= F_t.
\end{align*}
The Gaussian conditioning formula requires the observed covariance block $\Sigma_{WW}$ to be invertible. In this theorem, $\Sigma_{WW}=F_t$, and invertibility of $F_t$ is assumed.
Applying the formula gives the conditional law of $\alpha_t$ after observing $Y_t$:
\begin{align*}
\alpha_t \mid \mathcal G_t
\sim
\mathcal N\left(
\mu_X + \Sigma_{XW}\Sigma_{WW}^{-1}(Y_t-\mu_W),
\,
\Sigma_{XX}-\Sigma_{XW}\Sigma_{WW}^{-1}\Sigma_{WX}
\right).
\end{align*}
Substituting the blocks computed above,
\begin{align*}
\alpha_t \mid \mathcal G_t
\sim
\mathcal N\left(
a_{t|t-1}
+
P_{t|t-1}Z_t^\top F_t^{-1}
\bigl(Y_t - Z_ta_{t|t-1} - d_t\bigr),
\,
P_{t|t-1}
-
P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}
\right).
\end{align*}
The innovation is exactly
\begin{align*}
v_t = Y_t - Z_ta_{t|t-1} - d_t,
\end{align*}
and the Kalman gain is exactly
\begin{align*}
K_t = P_{t|t-1}Z_t^\top F_t^{-1}.
\end{align*}
Therefore the filtered conditional mean is
\begin{align*}
a_{t|t} = a_{t|t-1} + K_t v_t.
\end{align*}
[/guided]
[/step]
[step:Rewrite the filtered covariance in Kalman gain form]
The covariance obtained from Gaussian conditioning is
\begin{align*}
P_{t|t}
=
P_{t|t-1}
-
P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}.
\end{align*}
Since $P_{t|t-1}$ and $H_t$ are symmetric, $F_t = Z_tP_{t|t-1}Z_t^\top + H_t$ is symmetric. Because $F_t$ is invertible, $F_t^{-1}$ is symmetric. Hence
\begin{align*}
K_tF_tK_t^\top
&=
P_{t|t-1}Z_t^\top F_t^{-1}
F_t
(F_t^{-1})^\top Z_tP_{t|t-1} \\
&=
P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}.
\end{align*}
Substituting this identity into the covariance formula gives
\begin{align*}
P_{t|t} = P_{t|t-1} - K_tF_tK_t^\top.
\end{align*}
[guided]
The Gaussian conditioning formula gives the covariance in the form
\begin{align*}
P_{t|t}
=
P_{t|t-1}
-
P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}.
\end{align*}
We now rewrite the second term using the Kalman gain. Since $P_{t|t-1}$ and $H_t$ are symmetric covariance matrices, the matrix
\begin{align*}
F_t = Z_tP_{t|t-1}Z_t^\top + H_t
\end{align*}
is symmetric. The hypothesis says $F_t$ is invertible, and the inverse of an invertible symmetric matrix is symmetric, so $(F_t^{-1})^\top = F_t^{-1}$. Using the definition
\begin{align*}
K_t = P_{t|t-1}Z_t^\top F_t^{-1},
\end{align*}
we compute
\begin{align*}
K_tF_tK_t^\top
&=
P_{t|t-1}Z_t^\top F_t^{-1}
F_t
(F_t^{-1})^\top Z_tP_{t|t-1} \\
&=
P_{t|t-1}Z_t^\top F_t^{-1}Z_tP_{t|t-1}.
\end{align*}
Substituting this identity into the covariance formula yields
\begin{align*}
P_{t|t} = P_{t|t-1} - K_tF_tK_t^\top.
\end{align*}
[/guided]
[/step]
[step:Propagate the filtered Gaussian law through the state transition]
Condition on $\mathcal G_t$. By the filtering step,
\begin{align*}
\alpha_t \mid \mathcal G_t \sim \mathcal N(a_{t|t},P_{t|t}).
\end{align*}
The transition noise $\eta_{t+1}$ is conditionally independent of $\alpha_t$ given $\mathcal G_t$ and satisfies
\begin{align*}
\eta_{t+1} \mid \mathcal G_t \sim \mathcal N(0,Q_t).
\end{align*}
Define the affine transition map
\begin{align*}
B_t: \mathbb R^m \times \mathbb R^r &\to \mathbb R^m \\
(x,u) &\mapsto T_t x + c_t + R_tu.
\end{align*}
Then $\alpha_{t+1}=B_t(\alpha_t,\eta_{t+1})$ is conditionally Gaussian given $\mathcal G_t$. Its conditional mean is
\begin{align*}
\mathbb E[\alpha_{t+1}\mid \mathcal G_t]
&=
T_t\mathbb E[\alpha_t\mid \mathcal G_t]+c_t+R_t\mathbb E[\eta_{t+1}\mid \mathcal G_t] \\
&=
T_ta_{t|t}+c_t.
\end{align*}
Its conditional covariance is
\begin{align*}
\operatorname{Cov}(\alpha_{t+1},\alpha_{t+1}\mid \mathcal G_t)
&=
\operatorname{Cov}(T_t\alpha_t+c_t+R_t\eta_{t+1},T_t\alpha_t+c_t+R_t\eta_{t+1}\mid \mathcal G_t) \\
&=
T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top,
\end{align*}
because the deterministic shift $c_t$ contributes no covariance and conditional independence gives zero cross-covariance between $\alpha_t$ and $\eta_{t+1}$. Therefore
\begin{align*}
a_{t+1|t} &= T_ta_{t|t}+c_t, \\
P_{t+1|t} &= T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top.
\end{align*}
[guided]
After observing $Y_t$, the filtered state is Gaussian:
\begin{align*}
\alpha_t \mid \mathcal G_t \sim \mathcal N(a_{t|t},P_{t|t}).
\end{align*}
The next state is obtained from the filtered state by the affine transition equation
\begin{align*}
\alpha_{t+1} = T_t\alpha_t + c_t + R_t\eta_{t+1}.
\end{align*}
The transition noise satisfies
\begin{align*}
\eta_{t+1}\mid \mathcal G_t \sim \mathcal N(0,Q_t),
\end{align*}
and is conditionally independent of $\alpha_t$ given $\mathcal G_t$. Thus the pair $(\alpha_t,\eta_{t+1})$ is conditionally jointly Gaussian, and the affine map
\begin{align*}
B_t: \mathbb R^m \times \mathbb R^r &\to \mathbb R^m \\
(x,u) &\mapsto T_t x + c_t + R_tu
\end{align*}
sends it to the conditionally Gaussian vector $\alpha_{t+1}$.
We now compute its conditional mean. Linearity of conditional expectation gives
\begin{align*}
\mathbb E[\alpha_{t+1}\mid \mathcal G_t]
&=
\mathbb E[T_t\alpha_t+c_t+R_t\eta_{t+1}\mid \mathcal G_t] \\
&=
T_t\mathbb E[\alpha_t\mid \mathcal G_t]+c_t+R_t\mathbb E[\eta_{t+1}\mid \mathcal G_t] \\
&=
T_ta_{t|t}+c_t.
\end{align*}
We next compute the conditional covariance. The deterministic vector $c_t$ does not affect covariance. Conditional independence of $\alpha_t$ and $\eta_{t+1}$ gives
\begin{align*}
\operatorname{Cov}(\alpha_t,\eta_{t+1}\mid \mathcal G_t)=0.
\end{align*}
Therefore
\begin{align*}
\operatorname{Cov}(\alpha_{t+1},\alpha_{t+1}\mid \mathcal G_t)
&=
\operatorname{Cov}(T_t\alpha_t+R_t\eta_{t+1},T_t\alpha_t+R_t\eta_{t+1}\mid \mathcal G_t) \\
&=
T_t\operatorname{Cov}(\alpha_t,\alpha_t\mid \mathcal G_t)T_t^\top
+
R_t\operatorname{Cov}(\eta_{t+1},\eta_{t+1}\mid \mathcal G_t)R_t^\top \\
&=
T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top.
\end{align*}
Thus the one-step prediction mean and covariance are
\begin{align*}
a_{t+1|t} &= T_ta_{t|t}+c_t, \\
P_{t+1|t} &= T_tP_{t|t}T_t^\top + R_tQ_tR_t^\top.
\end{align*}
[/guided]
[/step]