[proofplan]
The strategy is to express the adjoint $a_t = \partial_{y_t} L(y_T)$ in closed form using the Jacobian of the flow, and then differentiate. By the chain rule for the composition $y_t \mapsto y_T \mapsto L(y_T)$, the adjoint factors as $a_t^\top = \nabla L(y_T)^\top \cdot J_T^t$, where $J_T^t = \partial_{y_t} y_T$ is the forward Jacobian from time $t$ to time $T$. The cocycle relation $J_T^0 = J_T^t \cdot J_t^0$, together with the inverse $M_t^0 := (J_t^0)^{-1}$ supplied by the [Invertibility of the CDE Jacobian](/theorems/2542), gives $J_T^t = J_T^0 \cdot M_t^0$ and hence $a_t^\top = \nabla L(y_T)^\top J_T^0 \cdot M_t^0$. The vector $u^\top := \nabla L(y_T)^\top J_T^0$ is constant in $t$, so $a_t^\top = u^\top M_t^0$ inherits its dynamics directly from the right-acting equation $dM_t^0 = -M_t^0\, dz_t$, yielding $da_t^\top = -a_t^\top dz_t$. Substituting $dz_t = \sum_i \nabla f_\theta^i(y_t)\, dx_t^i$ and reading off the terminal condition completes the proof.
[/proofplan]
[step:Express the adjoint in terms of the forward Jacobian via the chain rule]
Let $J_t^s := \partial_{y_s} y_t$ for $s \le t$ denote the Jacobian of the flow of the CDE from time $s$ to time $t$, evaluated along the trajectory. Concretely, $J_t^s \in \mathbb{R}^{e \times e}$ is the unique solution to the linearised CDE
\begin{align*}
dJ_t^s = \sum_{i=1}^d \nabla f_\theta^i(y_t)\, J_t^s\, dx_t^i = dz_t \cdot J_t^s, \qquad J_s^s = I_e,
\end{align*}
where we have set $dz_t := \sum_{i=1}^d \nabla f_\theta^i(y_t)\, dx_t^i \in \mathbb{R}^{e \times e}$, the auxiliary matrix-valued bounded-variation driver.
The map $y_t \mapsto L(y_T)$ is the composition of the flow map $y_t \mapsto y_T$ with $L$, so by the chain rule,
\begin{align*}
\big(\partial_{y_t} L(y_T)\big)^\top = \nabla L(y_T)^\top \cdot \partial_{y_t} y_T = \nabla L(y_T)^\top \cdot J_T^t.
\end{align*}
Therefore
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^t.
\end{align*}
[guided]
The chain rule application is delicate because the gradient $\partial_{y_t} L(y_T)$ refers to differentiating the *terminal value* $L(y_T)$ with respect to the *intermediate state* $y_t$. Concretely: imagine perturbing the trajectory's value at time $t$ from $y_t$ to $y_t + \delta$, then evolving the unperturbed CDE forward from $y_t + \delta$ to obtain a new terminal value $\tilde{y}_T(\delta)$. The Jacobian $J_T^t = \partial_{y_t} y_T$ measures how $\tilde{y}_T$ depends on $\delta$:
\begin{align*}
\tilde{y}_T(\delta) = y_T + J_T^t \cdot \delta + o(|\delta|) \qquad \text{as } \delta \to 0.
\end{align*}
The chain rule for $L$ at $y_T$ then gives $L(\tilde{y}_T(\delta)) = L(y_T) + \nabla L(y_T)^\top J_T^t \delta + o(|\delta|)$, so $a_t = \partial_{y_t} L(y_T) = (J_T^t)^\top \nabla L(y_T)$, equivalently $a_t^\top = \nabla L(y_T)^\top J_T^t$.
The Jacobian $J_T^t$ exists and is continuous in $t$ because $f_\theta \in C^1$ (so $\nabla f_\theta$ is continuous), $x$ has bounded variation, and the linearised CDE has a unique solution by the standard [Existence and Uniqueness of Linear CDEs](/theorems/???).
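The perturbation picture above can be checked numerically. The following is a minimal sketch, not part of the proof. It assumes an Euler discretisation of the CDE, an illustrative vector field $f_\theta(y) = \tanh(Wy)$ with $e = 2$ and $d = 1$, the driver $x_t = \sin(3t)$, and the loss $L(y) = \tfrac{1}{2}|y|^2$. All of these choices, and the helper names `f`, `grad_f`, `flow`, `loss`, are hypothetical. It compares $(J_T^t)^\top \nabla L(y_T)$ with a central finite difference in $\delta$.

```python
import math

# Hypothetical parameters theta for the illustrative field f(y) = tanh(W y).
W = [[0.3, -0.8], [0.9, 0.2]]

def f(y):
    return [math.tanh(W[i][0] * y[0] + W[i][1] * y[1]) for i in range(2)]

def grad_f(y):
    # Jacobian of f at y: entry (i, j) is (1 - tanh(s_i)^2) * W[i][j].
    rows = []
    for i in range(2):
        s = math.tanh(W[i][0] * y[0] + W[i][1] * y[1])
        rows.append([(1.0 - s * s) * W[i][j] for j in range(2)])
    return rows

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def flow(y, dxs):
    """Euler scheme for dy = f(y) dx; returns the end state and its Jacobian."""
    J = [[1.0, 0.0], [0.0, 1.0]]
    for dx in dxs:
        G = grad_f(y)
        step = [[(1.0 if i == j else 0.0) + G[i][j] * dx for j in range(2)]
                for i in range(2)]
        J = matmul(step, J)  # left-acting update: J <- (I + dz) J
        fy = f(y)
        y = [y[0] + fy[0] * dx, y[1] + fy[1] * dx]
    return y, J

def loss(y):
    return 0.5 * (y[0] ** 2 + y[1] ** 2)  # assumed loss, so grad L(y) = y

N, T, k = 400, 1.0, 150  # k indexes the intermediate time t = k T / N
xs = [math.sin(3.0 * T * j / N) for j in range(N + 1)]
dxs = [xs[j + 1] - xs[j] for j in range(N)]

y_t, _ = flow([0.4, -0.2], dxs[:k])  # state at time t
y_T, J_Tt = flow(y_t, dxs[k:])       # terminal state and discrete J_T^t

# a_t = (J_T^t)^T grad L(y_T), with grad L(y_T) = y_T for this loss.
a_t = [sum(J_Tt[i][j] * y_T[i] for i in range(2)) for j in range(2)]

# Central finite difference of delta -> L(y_T(delta)) at delta = 0.
h = 1e-5
fd = []
for j in range(2):
    plus = [y_t[i] + (h if i == j else 0.0) for i in range(2)]
    minus = [y_t[i] - (h if i == j else 0.0) for i in range(2)]
    fd.append((loss(flow(plus, dxs[k:])[0]) - loss(flow(minus, dxs[k:])[0]))
              / (2 * h))

max_err = max(abs(a_t[j] - fd[j]) for j in range(2))
print("a_t =", a_t, "finite difference =", fd, "max error =", max_err)
```

For the discretised flow the Jacobian recursion is the exact chain rule, so the two gradients should agree up to finite-difference and rounding error.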
[/guided]
[/step]
[step:Factor $J_T^t$ via the cocycle relation $J_T^t = J_T^0 \cdot M_t^0$]
The forward Jacobians satisfy the cocycle property $J_T^0 = J_T^t \cdot J_t^0$, valid because the flow $y_0 \mapsto y_T$ factors as $y_0 \mapsto y_t \mapsto y_T$ and the chain rule composes Jacobians. Hence
\begin{align*}
J_T^t = J_T^0 \cdot (J_t^0)^{-1}.
\end{align*}
By the [Invertibility of the CDE Jacobian](/theorems/2542), the inverse $M_t^0 := (J_t^0)^{-1}$ exists for every $t \in [0, T]$ and satisfies the right-acting linear CDE
\begin{align*}
dM_t^0 = -M_t^0 \cdot dz_t, \qquad M_0^0 = I_e.
\end{align*}
We verify the hypotheses of the cited theorem: the linearised CDE driver $z$ defined above has bounded variation (since $x$ has bounded variation and $\nabla f_\theta(y_\cdot)$ is bounded continuous on the compact trajectory image $\{y_t : t \in [0,T]\}$), and the matrix CDE for $J_t^0$ is the linear forward equation in the form required.
Substituting,
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^0 \cdot M_t^0.
\end{align*}
[guided]
The expression from Step 1, $a_t^\top = \nabla L(y_T)^\top J_T^t$, has the awkward feature that $t$ enters $J_T^t$ as the *initial* time of the flow: as $t$ varies, both the starting time and the starting point $y_t$ change, while the terminal time $T$ stays fixed. The linearised CDE from Step 1 describes how $J_t^s$ evolves in its upper index, so it says nothing directly about how $J_T^t$ depends on $t$. To extract a clean dynamical equation for $a_t$ we factor $J_T^t$ into a piece that is constant in $t$ times a piece whose $t$-dependence is governed by a linear CDE.
The factorisation is provided by the **cocycle property** of the flow: the map $y_0 \mapsto y_T$ is the composition $y_0 \mapsto y_t \mapsto y_T$, and Jacobians compose under chain rule. Concretely,
\begin{align*}
J_T^0 = J_T^t \cdot J_t^0.
\end{align*}
This is the matrix-level statement of the chain rule for the composed flow. Solving for $J_T^t$ — which requires $J_t^0$ to be invertible — gives
\begin{align*}
J_T^t = J_T^0 \cdot (J_t^0)^{-1}.
\end{align*}
The factor $J_T^0$ is constant in $t$ (it is the Jacobian of the full flow from $0$ to $T$), and the entire $t$-dependence is now isolated in $(J_t^0)^{-1}$.
For this factorisation to make sense, we must verify that $J_t^0$ is invertible for every $t \in [0,T]$. This is exactly the [Invertibility of the CDE Jacobian](/theorems/2542). We verify its hypotheses: (i) the linearised CDE driver is $z$ defined above, with bounded variation because $\nabla f_\theta(y_\cdot)$ is bounded continuous on the compact image $\{y_t\}_{t \in [0,T]} \subset \mathbb{R}^e$ (since $f_\theta \in C^1$ and $y$ is continuous on a compact interval, hence so is $\nabla f_\theta(y_\cdot)$) and $x$ has bounded variation, so $z$ is the indefinite Riemann-Stieltjes integral of a bounded continuous function against a bounded-variation path — itself of bounded variation; (ii) the matrix CDE for $J_t^0$ is the linear forward equation $dJ = dz \cdot J$ with $J_0 = I_e$, exactly the form of the cited theorem. The cited theorem then gives the inverse $M_t^0 := (J_t^0)^{-1}$, well-defined for every $t \in [0,T]$, satisfying the right-acting linear CDE $dM_t^0 = -M_t^0 \cdot dz_t$ with $M_0^0 = I_e$.
Substituting the factorisation into the formula from Step 1,
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^0 \cdot M_t^0.
\end{align*}
The advantage of this rewriting is that the $t$-dependence is now packaged into a single factor $M_t^0$, which solves a clean linear CDE. In the next step we differentiate this expression to read off the adjoint dynamics.
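The cocycle property and the role of $M_t^0$ can be illustrated with a small numerical experiment. This is a sketch under stated assumptions, not part of the argument: it uses an Euler discretisation, a hypothetical field $f_\theta(y) = \tanh(Wy)$ with $e = 2$, $d = 1$, and the driver $x_t = \sin(3t)$. In this discrete setting $J_T^0 = J_T^t J_t^0$ holds up to rounding, while the Euler scheme for $dM = -M\,dz$ only approximates $(J_t^0)^{-1}$, with an error that should shrink as the grid is refined.

```python
import math

W = [[0.3, -0.8], [0.9, 0.2]]  # hypothetical parameters for f(y) = tanh(W y)
I2 = [[1.0, 0.0], [0.0, 1.0]]

def f(y):
    return [math.tanh(W[i][0] * y[0] + W[i][1] * y[1]) for i in range(2)]

def grad_f(y):
    # Jacobian of f at y: entry (i, j) is (1 - tanh(s_i)^2) * W[i][j].
    rows = []
    for i in range(2):
        s = math.tanh(W[i][0] * y[0] + W[i][1] * y[1])
        rows.append([(1.0 - s * s) * W[i][j] for j in range(2)])
    return rows

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

def errors(N, k, T=1.0):
    """Euler scheme with N steps, split at step k (time t = k T / N)."""
    xs = [math.sin(3.0 * T * j / N) for j in range(N + 1)]
    y = [0.4, -0.2]
    J_t0, J_Tt, J_T0, M_t0 = I2, I2, I2, I2
    for j in range(N):
        dx = xs[j + 1] - xs[j]
        G = grad_f(y)
        plus = [[I2[a][b] + G[a][b] * dx for b in range(2)] for a in range(2)]
        minus = [[I2[a][b] - G[a][b] * dx for b in range(2)] for a in range(2)]
        J_T0 = matmul(plus, J_T0)
        if j < k:
            J_t0 = matmul(plus, J_t0)   # dJ = dz J   (left-acting)
            M_t0 = matmul(M_t0, minus)  # dM = -M dz  (right-acting)
        else:
            J_Tt = matmul(plus, J_Tt)
        fy = f(y)
        y = [y[0] + fy[0] * dx, y[1] + fy[1] * dx]
    cocycle = max_abs_diff(J_T0, matmul(J_Tt, J_t0))  # J_T^0 vs J_T^t J_t^0
    inverse = max_abs_diff(matmul(M_t0, J_t0), I2)    # M_t^0 J_t^0 vs I
    factor = max_abs_diff(J_Tt, matmul(J_T0, M_t0))   # J_T^t vs J_T^0 M_t^0
    return cocycle, inverse, factor

coarse = errors(250, 100)   # t = 0.4 on a coarse grid
fine = errors(4000, 1600)   # t = 0.4 on a finer grid
print("coarse (cocycle, inverse, factor):", coarse)
print("fine   (cocycle, inverse, factor):", fine)
```

The order of multiplication in the two updates mirrors the left-acting equation for $J_t^0$ and the right-acting equation for $M_t^0$; swapping either order would break the approximate identity $M_t^0 J_t^0 \approx I_e$ whenever the matrices $\nabla f_\theta(y_t)$ at different times fail to commute.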
[/guided]
[/step]
[step:Differentiate $a_t^\top = u^\top M_t^0$ to obtain the backward CDE]
Define the constant vector $u := (J_T^0)^\top \nabla L(y_T) \in \mathbb{R}^e$. This vector does not depend on $t$. Then
\begin{align*}
a_t^\top = u^\top \cdot M_t^0.
\end{align*}
Differentiating in $t$ via the right-acting equation $dM_t^0 = -M_t^0 \cdot dz_t$ from the previous step (and using that the map $A \mapsto u^\top A$ is linear with $u$ constant, so it passes through the Riemann-Stieltjes integral defining the CDE),
\begin{align*}
da_t^\top = u^\top \cdot dM_t^0 = u^\top \cdot (-M_t^0 \cdot dz_t) = -(u^\top M_t^0) \cdot dz_t = -a_t^\top \cdot dz_t.
\end{align*}
Substituting $dz_t = \sum_{i=1}^d \nabla f_\theta^i(y_t)\, dx_t^i$, and writing $da_t$ for the row-vector increment $da_t^\top$:
\begin{align*}
da_t = -\sum_{i=1}^d a_t^\top \nabla f_\theta^i(y_t)\, dx_t^i.
\end{align*}
[guided]
The cleanness of this differentiation rests on writing $a_t^\top = u^\top M_t^0$ with $u$ *constant in $t$*. If we had stopped at $a_t^\top = \nabla L(y_T)^\top J_T^t$, we would have to differentiate $J_T^t$ with respect to its lower index $t$, and the linearised CDE gives no equation for that dependence directly, since $t$ enters $J_T^t$ as the initial time (not the terminal time). The cocycle factorisation $J_T^t = J_T^0 M_t^0$ separates the two times: $J_T^0$ is a constant matrix (no $t$-dependence) and the $t$-dependence is entirely carried by $M_t^0$, which solves a clean right-acting linear CDE.
Once we have $a_t^\top = u^\top M_t^0$ with $u$ constant, differentiation is direct. In integral form, $M_t^0 = I_e - \int_0^t M_s^0\, dz_s$; multiplying on the left by the constant row vector $u^\top$ and using linearity of the Riemann-Stieltjes integral gives $u^\top M_t^0 = u^\top - \int_0^t (u^\top M_s^0)\, dz_s$, that is, $d(u^\top M_t^0) = u^\top \cdot dM_t^0$. This is the same justification as in the ODE case: if $M_t$ solves $\dot{M} = -M F(t)$ with $M(0) = I$, then for any constant vector $u$ the path $u^\top M$ solves $\frac{d}{dt}(u^\top M) = u^\top \dot{M} = -(u^\top M) F(t)$.
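The identity $a_t^\top = u^\top M_t^0$ is equivalent to saying that $a_t^\top J_t^0$ equals the constant row vector $u^\top$ for every $t$, which is easy to test numerically without forming any inverse. The sketch below is illustrative only. It assumes an Euler discretisation, a hypothetical field $f_\theta(y) = \tanh(Wy)$ with $e = 2$, $d = 1$, the driver $x_t = \sin(3t)$, and the loss $L(y) = \tfrac{1}{2}|y|^2$. It runs the backward recursion $a_k^\top = a_{k+1}^\top (I + \Delta z_k)$, an Euler step of $da_t^\top = -a_t^\top dz_t$ taken backward in time, and records how far $a_k^\top J_k^0$ drifts from $u^\top$.

```python
import math

W = [[0.3, -0.8], [0.9, 0.2]]  # hypothetical parameters for f(y) = tanh(W y)

def f(y):
    return [math.tanh(W[i][0] * y[0] + W[i][1] * y[1]) for i in range(2)]

def grad_f(y):
    # Jacobian of f at y: entry (i, j) is (1 - tanh(s_i)^2) * W[i][j].
    rows = []
    for i in range(2):
        s = math.tanh(W[i][0] * y[0] + W[i][1] * y[1])
        rows.append([(1.0 - s * s) * W[i][j] for j in range(2)])
    return rows

N, T = 400, 1.0
xs = [math.sin(3.0 * T * j / N) for j in range(N + 1)]
dxs = [xs[j + 1] - xs[j] for j in range(N)]

# Forward pass: store the states y_k and the Jacobians J_k^0 of the Euler flow.
ys = [[0.4, -0.2]]
Js = [[[1.0, 0.0], [0.0, 1.0]]]
for dx in dxs:
    y, J = ys[-1], Js[-1]
    G = grad_f(y)
    # J_{k+1} = (I + dz_k) J_k, written out entrywise.
    Js.append([[J[i][c] + dx * sum(G[i][m] * J[m][c] for m in range(2))
                for c in range(2)] for i in range(2)])
    fy = f(y)
    ys.append([y[0] + fy[0] * dx, y[1] + fy[1] * dx])

grad_L = ys[N]  # assumed loss L(y) = |y|^2 / 2, so grad L(y_T) = y_T

# u^T = grad L(y_T)^T J_T^0, a row vector that does not depend on t.
u = [sum(grad_L[i] * Js[N][i][c] for i in range(2)) for c in range(2)]

a = list(grad_L)  # terminal condition a_T = grad L(y_T)
drift = 0.0
for k in reversed(range(N)):
    G = grad_f(ys[k])
    # Backward Euler-type step: a_k^T = a_{k+1}^T (I + dz_k).
    a = [a[c] + dxs[k] * sum(a[i] * G[i][c] for i in range(2))
         for c in range(2)]
    # a_k^T J_k^0 should reproduce u^T at every step.
    aJ = [sum(a[i] * Js[k][i][c] for i in range(2)) for c in range(2)]
    drift = max(drift, abs(aJ[0] - u[0]), abs(aJ[1] - u[1]))

print("u =", u)
print("a_0 =", a)
print("max drift of a_t^T J_t^0 from u^T:", drift)
```

In this discrete setting the invariance is exact up to rounding, because each backward step multiplies by the same factor $I + \Delta z_k$ that advanced $J_k^0$ in the forward pass. At $k = 0$ the Jacobian is the identity, so the final `a` coincides with `u`.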
[/guided]
[/step]
[step:Verify the terminal condition $a_T = \nabla L(y_T)$]
At $t = T$, we have $J_T^T = I_e$ (the flow from $T$ to $T$ is the identity), so by the formula from Step 1,
\begin{align*}
a_T^\top = \nabla L(y_T)^\top \cdot J_T^T = \nabla L(y_T)^\top \cdot I_e = \nabla L(y_T)^\top.
\end{align*}
Equivalently $a_T = \nabla L(y_T)$, which is the claimed terminal condition.
Combining the dynamics from Step 3 with the terminal condition, the adjoint process $a_t = \partial_{y_t} L(y_T)$ satisfies
\begin{align*}
da_t = -\sum_{i=1}^d a_t^\top \nabla f_\theta^i(y_t)\, dx_t^i, \qquad a_T = \nabla L(y_T),
\end{align*}
which completes the proof.
[/step]