[proofplan]
The likelihood is obtained by decomposing the joint density of the observations into one-step predictive conditional densities. The Kalman filter identifies each conditional distribution $Y_t \mid (Y_1=y_1,\dots,Y_{t-1}=y_{t-1})$ as a Gaussian distribution with mean $Z_t a_{t \mid t-1}+d_t$ and covariance $F_t$. Substituting the multivariate Gaussian density into the product decomposition and then taking logarithms gives the stated prediction error likelihood.
[/proofplan]
[step:Write the joint likelihood as a product of one-step predictive densities]
Let $p_{1:n}: (\mathbb{R}^m)^n \to [0,\infty)$ denote the joint density of $(Y_1,\dots,Y_n)$ with respect to $\mathcal{L}^{mn}$. For each $t \in \{1,\dots,n\}$, let
\begin{align*}
p_t(\,\cdot \mid y_1,\dots,y_{t-1}): \mathbb{R}^m \to [0,\infty)
\end{align*}
denote the conditional density of $Y_t$ given $Y_1=y_1,\dots,Y_{t-1}=y_{t-1}$ with respect to $\mathcal{L}^m$, where for $t=1$ this means the marginal density of $Y_1$ conditional only on the fixed initial quantities $a_1$ and $P_1$.
The chain rule for conditional densities gives
\begin{align*}
p_{1:n}(y_1,\dots,y_n)
=
\prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}).
\end{align*}
Therefore the likelihood function, conditional on $a_1$ and $P_1$, is
\begin{align*}
L(y_1,\dots,y_n \mid a_1,P_1)
=
\prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}).
\end{align*}
[guided]
The likelihood is a joint density evaluated at the observed data. We denote this joint density by
\begin{align*}
p_{1:n}: (\mathbb{R}^m)^n \to [0,\infty),
\end{align*}
where the reference measure is the $mn$-dimensional Lebesgue measure $\mathcal{L}^{mn}$ on $(\mathbb{R}^m)^n$. For each time $t$, we also introduce the one-step conditional density
\begin{align*}
p_t(\,\cdot \mid y_1,\dots,y_{t-1}): \mathbb{R}^m \to [0,\infty)
\end{align*}
with respect to $\mathcal{L}^m$. When $t=1$, there are no earlier observations, so $p_1(\,\cdot\,)$ is the marginal predictive density determined by the fixed initial quantities $a_1$ and $P_1$.
The conditional density factorization states that a joint density may be built by multiplying successive conditional densities:
\begin{align*}
p_{1:n}(y_1,\dots,y_n)
=
p_1(y_1)
p_2(y_2 \mid y_1)
\cdots
p_n(y_n \mid y_1,\dots,y_{n-1}).
\end{align*}
Equivalently,
\begin{align*}
p_{1:n}(y_1,\dots,y_n)
=
\prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}).
\end{align*}
Thus the conditional likelihood of the observed data, with $a_1$ and $P_1$ fixed, is exactly this product:
\begin{align*}
L(y_1,\dots,y_n \mid a_1,P_1)
=
\prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}).
\end{align*}
This is the point at which the likelihood becomes a prediction-error likelihood: each factor is the one-step-ahead predictive density of $Y_t$, which the next step expresses in terms of the prediction error at time $t$ and its covariance matrix.
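As a brief check on the factorization above, the product telescopes. The sketch below assumes that the conditional densities are defined as ratios of joint densities and that every denominator is positive; the symbol $p_{1:t}$ for the joint density of $(Y_1,\dots,Y_t)$, together with the convention $p_{1:0}=1$, is notation introduced only for this illustration.
\begin{align*}
p_t(y_t \mid y_1,\dots,y_{t-1})
&=
\frac{p_{1:t}(y_1,\dots,y_t)}{p_{1:t-1}(y_1,\dots,y_{t-1})}, \\
\prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1})
&=
\prod_{t=1}^{n} \frac{p_{1:t}(y_1,\dots,y_t)}{p_{1:t-1}(y_1,\dots,y_{t-1})}
=
\frac{p_{1:n}(y_1,\dots,y_n)}{p_{1:0}}
=
p_{1:n}(y_1,\dots,y_n).
\end{align*}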
[/guided]
[/step]
[step:Identify the one-step predictive Gaussian density from the Kalman filter]
For each $t \in \{1,\dots,n\}$, the Kalman filter gives the conditional Gaussian law
\begin{align*}
Y_t \mid (Y_1=y_1,\dots,Y_{t-1}=y_{t-1})
\sim
\mathcal{N}(Z_t a_{t \mid t-1}+d_t, F_t).
\end{align*}
Define the predictive mean $\mu_t \in \mathbb{R}^m$ by
\begin{align*}
\mu_t = Z_t a_{t \mid t-1}+d_t.
\end{align*}
Since $F_t$ is a covariance matrix, it is symmetric and positive semidefinite; since it is also invertible by hypothesis, it is positive definite, so $\det F_t>0$. Hence the conditional density at $y_t$ is
\begin{align*}
p_t(y_t \mid y_1,\dots,y_{t-1})
=
(2\pi)^{-m/2}(\det F_t)^{-1/2}
\exp\left(
-\frac{1}{2}(y_t-\mu_t)^\top F_t^{-1}(y_t-\mu_t)
\right).
\end{align*}
By the definition of the innovation vector $v_t=y_t-\mu_t$, this becomes
\begin{align*}
p_t(y_t \mid y_1,\dots,y_{t-1})
=
(2\pi)^{-m/2}(\det F_t)^{-1/2}
\exp\left(
-\frac{1}{2}v_t^\top F_t^{-1}v_t
\right).
\end{align*}
[guided]
The Kalman filter supplies the one-step predictive distribution. At time $t$, after processing the observations $y_1,\dots,y_{t-1}$, it produces a predicted state mean $a_{t \mid t-1} \in \mathbb{R}^r$ and predicted state covariance $P_{t \mid t-1} \in \mathbb{R}^{r \times r}$. Passing this predicted state through the linear observation equation gives the predictive observation mean
\begin{align*}
\mu_t = Z_t a_{t \mid t-1}+d_t \in \mathbb{R}^m.
\end{align*}
The corresponding predictive covariance is the innovation covariance matrix $F_t \in \mathbb{R}^{m \times m}$.
Thus the conditional law is
\begin{align*}
Y_t \mid (Y_1=y_1,\dots,Y_{t-1}=y_{t-1})
\sim
\mathcal{N}(\mu_t,F_t).
\end{align*}
The hypothesis that $F_t$ is invertible is exactly the non-degeneracy condition needed to write an ordinary density on $\mathbb{R}^m$. As a covariance matrix, $F_t$ is symmetric and positive semidefinite; being invertible as well, it is positive definite, so its determinant is positive, and the multivariate Gaussian density with mean $\mu_t$ and covariance $F_t$ is
\begin{align*}
p_t(y_t \mid y_1,\dots,y_{t-1})
=
(2\pi)^{-m/2}(\det F_t)^{-1/2}
\exp\left(
-\frac{1}{2}(y_t-\mu_t)^\top F_t^{-1}(y_t-\mu_t)
\right).
\end{align*}
The innovation vector is precisely the prediction error
\begin{align*}
v_t = y_t-\mu_t = y_t-Z_t a_{t \mid t-1}-d_t.
\end{align*}
Substituting this definition into the Gaussian density gives
\begin{align*}
p_t(y_t \mid y_1,\dots,y_{t-1})
=
(2\pi)^{-m/2}(\det F_t)^{-1/2}
\exp\left(
-\frac{1}{2}v_t^\top F_t^{-1}v_t
\right).
\end{align*}
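To make the formula concrete, consider the scalar case $m=1$, in which $F_t$ is a positive real number and $v_t$ is a real number. The values $F_t=4$ and $v_t=2$ used in the second line are hypothetical and chosen only for illustration.
\begin{align*}
p_t(y_t \mid y_1,\dots,y_{t-1})
&=
(2\pi F_t)^{-1/2}
\exp\left(
-\frac{v_t^2}{2F_t}
\right) \\
&=
(8\pi)^{-1/2} e^{-1/2}
\approx 0.121
\qquad \text{when } F_t=4,\ v_t=2.
\end{align*}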
[/guided]
[/step]
[step:Take logarithms of the product of predictive densities]
Substituting the preceding density formula into the likelihood product gives
\begin{align*}
L(y_1,\dots,y_n \mid a_1,P_1)
=
\prod_{t=1}^{n}
(2\pi)^{-m/2}(\det F_t)^{-1/2}
\exp\left(
-\frac{1}{2}v_t^\top F_t^{-1}v_t
\right).
\end{align*}
Taking logarithms and using $\log(ab)=\log a+\log b$ for positive factors yields
\begin{align*}
\ell(y_1,\dots,y_n \mid a_1,P_1)
&=
\sum_{t=1}^{n}
\left[
-\frac{m}{2}\log(2\pi)
-\frac{1}{2}\log\det F_t
-\frac{1}{2}v_t^\top F_t^{-1}v_t
\right] \\
&=
-\frac{1}{2}\sum_{t=1}^{n}
\left(
m\log(2\pi)
+
\log\det F_t
+
v_t^\top F_t^{-1}v_t
\right).
\end{align*}
This is the asserted log-likelihood formula. The dependence of each term on earlier observations is already encoded in the recursively computed Kalman filter quantities $a_{t \mid t-1}$ and $P_{t \mid t-1}$, which determine $v_t$ and $F_t$, so the expression is evaluated sequentially, in a single forward pass of the filter, at the fixed model parameters.
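For numerical evaluation, one possible arrangement of the summand, offered as a sketch rather than as part of the proof, uses a Cholesky factorization. Here $C_t$ denotes a lower-triangular matrix with positive diagonal entries satisfying $F_t=C_tC_t^\top$, which exists because $F_t$ is symmetric positive definite, and $w_t$ denotes the solution of $C_t w_t=v_t$; both symbols are introduced only for this illustration.
\begin{align*}
\log\det F_t
&=
2\sum_{i=1}^{m}\log (C_t)_{ii}, \\
v_t^\top F_t^{-1}v_t
&=
(C_t^{-1}v_t)^\top (C_t^{-1}v_t)
=
w_t^\top w_t, \\
\ell(y_1,\dots,y_n \mid a_1,P_1)
&=
-\frac{nm}{2}\log(2\pi)
-\sum_{t=1}^{n}\sum_{i=1}^{m}\log (C_t)_{ii}
-\frac{1}{2}\sum_{t=1}^{n} w_t^\top w_t.
\end{align*}
Under these assumptions, the first term does not depend on the model parameters, and neither $F_t^{-1}$ nor $\det F_t$ has to be formed explicitly.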
[/step]