[proofplan]
The proof uses the standard continuous-time Gaussian innovations theorem as the external analytic input: it turns the observation filtration into the innovation filtration and gives the martingale representation of the conditional mean. We then identify the predictable representation coefficient by testing the orthogonality of the estimation error against arbitrary stochastic integrals with respect to the innovations process; this gives the gain identity in integrated form and hence $\mathcal L^1\otimes\mathbb P$-almost everywhere. Finally, subtracting the filter from the state equation and applying Itô's product rule to the error covariance gives the Riccati equation, and the almost-everywhere gain identity is enough because stochastic integrals are unchanged by modifying predictable integrands on null sets.
[/proofplan]
[step:Fix the observation and innovation filtrations]
Let $(\mathcal F_t)_{t\ge 0}$ be the completed filtration generated by $x_0$, $\{w(s):0\le s\le t\}$, and $\{v(s):0\le s\le t\}$. The local boundedness and continuity of $A,B,G,C,D,u$ imply existence and uniqueness of the linear Itô equations on each finite interval, and the resulting processes are square-integrable on each finite interval; in particular the deterministic input term $B(t)u(t)$ is locally square-integrable. Define $e:[0,\infty)\times\Omega\to\mathbb R^n$ by $e(t):=x(t)-\hat x(t)$.
For each $T>0$, continuity of $D$ and positive definiteness of $R(t)=D(t)D(t)^\top$ imply that the smallest eigenvalue of $R(t)$ has a positive minimum on $[0,T]$. Hence $R^{-1}:[0,\infty)\to\mathbb R^{m\times m}$ is locally bounded and continuous.
Because the system is linear with Gaussian initial condition and Gaussian noises, every finite collection of state and observation variables is jointly Gaussian. By the Gaussian Hilbert-space [projection theorem](/theorems/1985), the [conditional expectation](/page/Conditional%20Expectation) $\hat x(t)=\mathbb E[x(t)\mid\mathcal Y_t]$ is the $L^2$-[orthogonal projection](/theorems/437) of $x(t)$ onto the closed subspace of $L^2(\Omega;\mathbb R^n)$ generated by square-integrable $\mathcal Y_t$-measurable random variables. Therefore $e(t)$ is orthogonal to every square-integrable $\mathcal Y_t$-measurable random vector. In the jointly Gaussian case this orthogonality also implies that $e(t)$ is independent of $\mathcal Y_t$, so the conditional covariance $\mathbb E[e(t)e(t)^\top\mid\mathcal Y_t]$ is deterministic and equals
\begin{align*}
P(t):=\mathbb E[e(t)e(t)^\top].
\end{align*}
Define the innovation process $\nu:[0,\infty)\times\Omega\to\mathbb R^m$ by
\begin{align*}
\nu(t):=y(t)-\int_0^t C(s)\hat x(s)\,d\mathcal L^1(s).
\end{align*}
Then
\begin{align*}
d\nu(t)=C(t)e(t)\,dt+D(t)\,dv(t).
\end{align*}
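As a brief check of this differential, assume the observation equation has the form $dy(t)=C(t)x(t)\,dt+D(t)\,dv(t)$, which is the form consistent with the identity just stated. Differentiating the definition of $\nu$ then gives
\begin{align*}
d\nu(t)&=dy(t)-C(t)\hat x(t)\,dt\\
&=C(t)x(t)\,dt+D(t)\,dv(t)-C(t)\hat x(t)\,dt\\
&=C(t)\bigl(x(t)-\hat x(t)\bigr)\,dt+D(t)\,dv(t),
\end{align*}
and the last line is $C(t)e(t)\,dt+D(t)\,dv(t)$ by the definition of $e$.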
We use the external standard Continuous-Time Gaussian Innovations Theorem in the following form. For a linear Gaussian signal-observation model adapted to the completed driving filtration, with deterministic locally bounded coefficients, progressively measurable square-integrable signal and observation processes on finite intervals, and positive definite locally bounded observation covariance $R(t)$ with locally bounded inverse, the completed observation filtration equals the completed natural filtration of $\nu$; the process $\nu$ is a continuous square-integrable $(\mathcal Y_t)$-martingale; its quadratic covariation satisfies
\begin{align*}
d\langle \nu\rangle_t=R(t)\,dt;
\end{align*}
and every square-integrable continuous $(\mathcal Y_t)$-martingale has a predictable stochastic-integral representation with respect to $\nu$. Moreover, if the signal has finite-variation drift $a(t)$ and martingale noise independent of the observation noise except through the stated observation equation, then the conditional mean has drift $\mathbb E[a(t)\mid\mathcal Y_t]$ plus such a predictable integral with respect to $\nu$. In the present model the state and observation processes are adapted and continuous, hence progressive; their local square-integrability was checked above. The covariance hypotheses hold because $R$ and $R^{-1}$ are locally bounded, so the theorem applies.
[/step]
[step:Represent the conditional mean using the innovations process]
Apply the conditional-mean part of the continuous-time Gaussian innovations theorem to the state drift
\begin{align*}
a(t):=A(t)x(t)+B(t)u(t).
\end{align*}
Because $A$, $B$, and $u$ are deterministic and $\hat x(t)=\mathbb E[x(t)\mid\mathcal Y_t]$,
\begin{align*}
\mathbb E[a(t)\mid\mathcal Y_t]=A(t)\hat x(t)+B(t)u(t).
\end{align*}
Therefore, for every $T>0$, there exists a predictable process $H:[0,T]\times\Omega\to\mathbb R^{n\times m}$ with
\begin{align*}
\mathbb E\left[\int_0^{\!T}\operatorname{tr}\bigl(H(s)R(s)H(s)^\top\bigr)\,d\mathcal L^1(s)\right]<\infty
\end{align*}
such that, for $0\le t\le T$,
\begin{align*}
d\hat x(t)=A(t)\hat x(t)\,dt+B(t)u(t)\,dt+H(t)\,d\nu(t).
\end{align*}
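Since $T$ is arbitrary and the integrand in the martingale representation is unique up to $\mathcal L^1\otimes\mathbb P$-null sets, the representations obtained for different horizons agree on overlaps up to such null sets.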
[/step]
[step:Identify the gain by testing orthogonality against innovation integrals]
Fix $T>0$. Let $Z:[0,T]\times\Omega\to\mathbb R^{m\times n}$ be any bounded predictable process. Define the square-integrable $\mathcal Y_T$-measurable random vector in $\mathbb R^n$
\begin{align*}
N_Z(T):=\int_0^{\!T}Z(s)^\top\,d\nu(s).
\end{align*}
Orthogonality of $e(T)$ to all square-integrable $\mathcal Y_T$-measurable random vectors gives
\begin{align*}
\mathbb E[e(T)N_Z(T)^\top]=0.
\end{align*}
Subtracting the represented equation for $\hat x$ from the state equation and using $d\nu(t)=C(t)e(t)\,dt+D(t)\,dv(t)$ gives
\begin{align*}
de(t)=(A(t)-H(t)C(t))e(t)\,dt+G(t)\,dw(t)-H(t)D(t)\,dv(t).
\end{align*}
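The subtraction can be written out step by step, assuming the state equation has the form $dx(t)=\bigl(A(t)x(t)+B(t)u(t)\bigr)\,dt+G(t)\,dw(t)$, which is the form consistent with the drift $a(t)$ used above:
\begin{align*}
de(t)&=dx(t)-d\hat x(t)\\
&=A(t)e(t)\,dt+G(t)\,dw(t)-H(t)\,d\nu(t)\\
&=A(t)e(t)\,dt+G(t)\,dw(t)-H(t)\bigl(C(t)e(t)\,dt+D(t)\,dv(t)\bigr).
\end{align*}
The input term $B(t)u(t)\,dt$ cancels because it appears in both equations.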
Apply Itô's product rule to $e(t)N_Z(t)^\top$, where $N_Z(t):=\int_0^t Z(s)^\top\,d\nu(s)$. Taking expectations eliminates the stochastic-integral terms by square-integrability and predictability. The drift term $(A(t)-H(t)C(t))e(t)N_Z(t)^\top\,d\mathcal L^1(t)$ also has expectation zero, because $H(t)$ and $N_Z(t)$ are $\mathcal Y_t$-measurable and $\mathbb E[e(t)\mid\mathcal Y_t]=0$. The finite-variation part of $e(t)\,dN_Z(t)^\top$ is $e(t)e(t)^\top C(t)^\top Z(t)\,d\mathcal L^1(t)$; conditioning on $\mathcal Y_t$ and using $\mathbb E[e(t)e(t)^\top\mid\mathcal Y_t]=P(t)$ shows that its expectation equals that of $P(t)C(t)^\top Z(t)\,d\mathcal L^1(t)$. The quadratic-covariation contribution from $de(t)\,dN_Z(t)^\top$ is $-H(t)R(t)Z(t)\,d\mathcal L^1(t)$. Therefore
\begin{align*}
\mathbb E\left[\int_0^{\!T}\bigl(P(t)C(t)^\top-H(t)R(t)\bigr)Z(t)\,d\mathcal L^1(t)\right]=0.
\end{align*}
Since bounded predictable $Z$ was arbitrary, the fundamental lemma for predictable processes gives
\begin{align*}
H(t)R(t)=P(t)C(t)^\top
\end{align*}
for $\mathcal L^1\otimes\mathbb P$-almost every $(t,\omega)\in[0,T]\times\Omega$. Since $R(t)$ is invertible, define the deterministic measurable gain
\begin{align*}
K(t):=P(t)C(t)^\top R(t)^{-1}.
\end{align*}
Then $H=K$ $\mathcal L^1\otimes\mathbb P$-almost everywhere on $[0,T]\times\Omega$. Stochastic integrals with respect to $\nu$ are unchanged up to indistinguishability when predictable integrands are modified on such a null set. Hence, on $[0,T]$,
\begin{align*}
d\hat x(t)=A(t)\hat x(t)\,dt+B(t)u(t)\,dt+K(t)\bigl(dy(t)-C(t)\hat x(t)\,dt\bigr)
\end{align*}
as a semimartingale identity up to indistinguishability. Since $T>0$ was arbitrary, the identity holds locally on $[0,\infty)$.
[/step]
[step:Compute the error covariance equation]
Using the filter equation and the observation equation, the error process satisfies
\begin{align*}
de(t)=(A(t)-K(t)C(t))e(t)\,dt+G(t)\,dw(t)-K(t)D(t)\,dv(t)
\end{align*}
with equality up to indistinguishability. Apply Itô's product rule to $e(t)e(t)^\top$:
\begin{align*}
d(e(t)e(t)^\top)=de(t)e(t)^\top+e(t)de(t)^\top+de(t)de(t)^\top.
\end{align*}
Taking expectations eliminates the stochastic-integral terms. Since $w$ and $v$ are independent standard Brownian motions, their cross-variation is zero, and
\begin{align*}
\mathbb E[de(t)de(t)^\top]=(G(t)G(t)^\top+K(t)R(t)K(t)^\top)\,dt.
\end{align*}
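This covariation term can be checked directly from the martingale part of $de(t)$. The sketch below uses the formal rules $dw\,dw^\top=I\,dt$, $dv\,dv^\top=I\,dt$, and $dw\,dv^\top=0$ for independent standard Brownian motions, together with $R(t)=D(t)D(t)^\top$; the drift part of $de(t)$ contributes nothing at order $dt$:
\begin{align*}
de(t)\,de(t)^\top&=\bigl(G(t)\,dw(t)-K(t)D(t)\,dv(t)\bigr)\bigl(G(t)\,dw(t)-K(t)D(t)\,dv(t)\bigr)^\top\\
&=G(t)\,dw(t)\,dw(t)^\top G(t)^\top+K(t)D(t)\,dv(t)\,dv(t)^\top D(t)^\top K(t)^\top\\
&=\bigl(G(t)G(t)^\top+K(t)R(t)K(t)^\top\bigr)\,dt.
\end{align*}
The two cross terms vanish because $w$ and $v$ have zero cross-variation.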
Therefore $P$ is locally absolutely continuous and, for $\mathcal L^1$-almost every $t\ge 0$,
\begin{align*}
\dot P(t)=(A(t)-K(t)C(t))P(t)+P(t)(A(t)-K(t)C(t))^\top+G(t)G(t)^\top+K(t)R(t)K(t)^\top.
\end{align*}
[/step]
[step:Substitute the gain and obtain the Riccati equation]
Since $P(t)$ is a covariance matrix, $P(t)=P(t)^\top$. Since $R(t)$ is positive definite, $R(t)=R(t)^\top$ and $R(t)^{-1}=(R(t)^{-1})^\top$. With
\begin{align*}
K(t)=P(t)C(t)^\top R(t)^{-1},
\end{align*}
we have
\begin{align*}
K(t)C(t)P(t)=P(t)C(t)^\top R(t)^{-1}C(t)P(t)
\end{align*}
and
\begin{align*}
K(t)R(t)K(t)^\top=P(t)C(t)^\top R(t)^{-1}C(t)P(t).
\end{align*}
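Transposing the first identity and using the symmetry of $P(t)$ and $R(t)^{-1}$ gives $P(t)C(t)^\top K(t)^\top=P(t)C(t)^\top R(t)^{-1}C(t)P(t)$ as well. In the covariance equation the two terms $-K(t)C(t)P(t)$ and $-P(t)C(t)^\top K(t)^\top$ therefore contribute twice the negative correction, while $K(t)R(t)K(t)^\top$ restores one copy, so for $\mathcal L^1$-almost every $t\ge 0$,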
\begin{align*}
\dot P(t)=A(t)P(t)+P(t)A(t)^\top+G(t)G(t)^\top-P(t)C(t)^\top R(t)^{-1}C(t)P(t).
\end{align*}
Together with the indistinguishable semimartingale filter identity proved above, this is the asserted Kalman-Bucy filter and Riccati covariance equation.
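As an informal numerical sanity check, not part of the proof, the scalar case $n=m=1$ with constant coefficients can be sketched in a few lines. The constants `a`, `g`, `c`, `r`, and `p0` and all function names below are hypothetical illustration values, not quantities from the theorem. The sketch compares the covariance equation before and after substituting the gain, and integrates the scalar Riccati equation with an explicit Euler scheme.

```python
import math

def riccati_rhs(p, a, g, c, r):
    """Scalar Riccati right-hand side: 2*a*p + g^2 - (c*p)^2 / r."""
    return 2.0 * a * p + g * g - (c * p) ** 2 / r

def covariance_rhs(p, a, g, c, r):
    """Scalar covariance equation before the gain is substituted."""
    k = p * c / r  # scalar version of K = P C^T R^{-1}
    return 2.0 * (a - k * c) * p + g * g + k * r * k

def integrate_riccati(p0, a, g, c, r, t_end, dt=1e-3):
    """Explicit Euler integration of the scalar Riccati equation (illustrative only)."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * riccati_rhs(p, a, g, c, r)
    return p

def stationary_variance(a, g, c, r):
    """Positive root of 2*a*p + g^2 - c^2 * p^2 / r = 0."""
    return r * (a + math.sqrt(a * a + c * c * g * g / r)) / (c * c)
```

For a stable choice such as `a = -1.0`, `g = 1.0`, `c = 1.0`, `r = 0.5`, the two right-hand sides should agree up to rounding, and `integrate_riccati` over a long horizon should approach `stationary_variance`.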
[guided]
The proof starts by converting the observation equation into an innovation equation. Define $e(t)=x(t)-\hat x(t)$ and
\begin{align*}
\nu(t)=y(t)-\int_0^t C(s)\hat x(s)\,d\mathcal L^1(s).
\end{align*}
Then $d\nu(t)=C(t)e(t)\,dt+D(t)\,dv(t)$. The continuous-time Gaussian innovations theorem applies because the model is linear Gaussian, the coefficients are deterministic and locally bounded, the state and observation processes are progressive and square-integrable on compact intervals, and $R(t)=D(t)D(t)^\top$ is positive definite with locally bounded inverse. It gives that $\nu$ generates the observation filtration, that $d\langle\nu\rangle_t=R(t)\,dt$, and that the conditional mean has the representation
\begin{align*}
d\hat x(t)=A(t)\hat x(t)\,dt+B(t)u(t)\,dt+H(t)d\nu(t)
\end{align*}
for some predictable matrix process $H$.
It remains to identify $H$. For every bounded predictable $Z:[0,T]\times\Omega\to\mathbb R^{m\times n}$, set
\begin{align*}
N_Z(t)=\int_0^t Z(s)^\top\,d\nu(s).
\end{align*}
The random vector $N_Z(T)$ is $\mathcal Y_T$-measurable, while $e(T)$ is orthogonal to all square-integrable $\mathcal Y_T$-measurable random vectors. Hence $\mathbb E[e(T)N_Z(T)^\top]=0$. Applying Itô's product rule to $e(t)N_Z(t)^\top$, using the represented filter equation and $d\nu(t)=C(t)e(t)\,dt+D(t)\,dv(t)$, gives
\begin{align*}
\mathbb E\left[\int_0^{\!T}\bigl(P(t)C(t)^\top-H(t)R(t)\bigr)Z(t)\,d\mathcal L^1(t)\right]=0.
\end{align*}
Since $Z$ was arbitrary, $H(t)R(t)=P(t)C(t)^\top$ almost everywhere. Thus $H$ may be replaced in the stochastic integral by
\begin{align*}
K(t)=P(t)C(t)^\top R(t)^{-1}.
\end{align*}
Substituting $d\nu(t)=dy(t)-C(t)\hat x(t)\,dt$ gives the Kalman-Bucy filter equation.
Finally, subtract the filter equation from the state equation:
\begin{align*}
de(t)=(A(t)-K(t)C(t))e(t)\,dt+G(t)\,dw(t)-K(t)D(t)\,dv(t).
\end{align*}
Itô's product rule for $e(t)e(t)^\top$, independence of $w$ and $v$, and $R(t)=D(t)D(t)^\top$ give
\begin{align*}
\dot P(t)=(A(t)-K(t)C(t))P(t)+P(t)(A(t)-K(t)C(t))^\top+G(t)G(t)^\top+K(t)R(t)K(t)^\top.
\end{align*}
Using $K(t)=P(t)C(t)^\top R(t)^{-1}$ and the symmetry of $P(t)$ and $R(t)^{-1}$ simplifies this equation to the stated Riccati equation.
[/guided]
[/step]