Adjoint Equation for a Neural CDE — Statement & Proof

Adjoint Equation for a Neural CDE (Theorem # 2543)

Theorem

Proof

[proofplan] The strategy is to express the adjoint $a_t = \partial_{y_t} L(y_T)$ in closed form using the Jacobian of the flow, and then differentiate. By the chain rule for the composition $y_0 \mapsto y_t \mapsto y_T \mapsto L(y_T)$, the derivative with respect to $y_t$ factors as $a_t^\top = \nabla L(y_T)^\top \cdot J_T^t$, where $J_T^t = \partial_{y_t} y_T$ is the forward Jacobian from time $t$ to time $T$. Using the cocycle relation $J_T^t = J_T^0 \cdot (J_t^0)^{-1} = J_T^0 \cdot M_t^0$, with the inverse supplied by the [Invertibility of the CDE Jacobian](/theorems/2542), we get $a_t^\top = \nabla L(y_T)^\top J_T^0 \cdot M_t^0$. The row vector $u^\top := \nabla L(y_T)^\top J_T^0$ is constant in $t$, so $a_t^\top = u^\top M_t^0$ inherits its dynamics directly from the right-acting equation $dM_t^0 = -M_t^0\, dz_t$, yielding $da_t^\top = -a_t^\top dz_t$. Substituting $dz_t = \sum_i \nabla f_\theta^i(y_t)\, dx_t^i$ and reading off the terminal condition completes the proof. [/proofplan]

[step:Express the adjoint in terms of the forward Jacobian via the chain rule] Let $J_t^s := \partial_{y_s} y_t$ for $s \le t$ denote the Jacobian of the flow of the CDE from time $s$ to time $t$, evaluated along the trajectory. Concretely, $J_t^s \in \mathbb{R}^{e \times e}$ is the unique solution to the linearised CDE
\begin{align*}
dJ_t^s = \sum_{i=1}^d \nabla f_\theta^i(y_t)\, J_t^s\, dx_t^i = dz_t \cdot J_t^s, \qquad J_s^s = I_e,
\end{align*}
where we have set $dz_t := \sum_{i=1}^d \nabla f_\theta^i(y_t)\, dx_t^i \in \mathbb{R}^{e \times e}$, the auxiliary matrix-valued bounded-variation driver. The map $y_t \mapsto L(y_T)$ is the composition $y_t \mapsto y_T \mapsto L(y_T)$, so by the chain rule,
\begin{align*}
\bigl(\partial_{y_t} L(y_T)\bigr)^\top = \nabla L(y_T)^\top \cdot \partial_{y_t} y_T = \nabla L(y_T)^\top \cdot J_T^t.
\end{align*}
Therefore
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^t.
\end{align*}

[guided] The chain rule requires care here because $\partial_{y_t} L(y_T)$ differentiates the *terminal value* $L(y_T)$ with respect to the *intermediate state* $y_t$. Concretely: perturb the trajectory's value at time $t$ from $y_t$ to $y_t + \delta$, then evolve the same CDE forward from $y_t + \delta$ to obtain a new terminal value $\tilde{y}_T(\delta)$. The Jacobian $J_T^t = \partial_{y_t} y_T$ measures how $\tilde{y}_T$ depends on $\delta$:
\begin{align*}
\tilde{y}_T(\delta) = y_T + J_T^t \cdot \delta + o(|\delta|) \qquad \text{as } \delta \to 0.
\end{align*}
The chain rule for $L$ at $y_T$ then gives $L(\tilde{y}_T(\delta)) = L(y_T) + \nabla L(y_T)^\top J_T^t \delta + o(|\delta|)$, so $a_t = \partial_{y_t} L(y_T) = (J_T^t)^\top \nabla L(y_T)$, equivalently $a_t^\top = \nabla L(y_T)^\top J_T^t$. The Jacobian $J_T^t$ exists and is continuous in $t$ because $f_\theta \in C^1$ (so $\nabla f_\theta$ is continuous), $x$ has bounded variation, and the linearised CDE has a unique solution by the standard [Existence and Uniqueness of Linear CDEs](/theorems/???). [/guided] [/step]

[step:Factor $J_T^t$ via the cocycle relation $J_T^t = J_T^0 \cdot M_t^0$] The forward Jacobians satisfy the cocycle property $J_T^0 = J_T^t \cdot J_t^0$, valid because the flow $y_0 \mapsto y_T$ factors as $y_0 \mapsto y_t \mapsto y_T$ and the chain rule composes Jacobians. Hence, provided $J_t^0$ is invertible,
\begin{align*}
J_T^t = J_T^0 \cdot (J_t^0)^{-1}.
\end{align*}
By the [Invertibility of the CDE Jacobian](/theorems/2542), the inverse $M_t^0 := (J_t^0)^{-1}$ exists for every $t \in [0, T]$ and satisfies the right-acting linear CDE
\begin{align*}
dM_t^0 = -M_t^0 \cdot dz_t, \qquad M_0^0 = I_e.
\end{align*}
We verify the hypotheses of the cited theorem: the driver $z$ of the linearised CDE, defined above, has bounded variation (since $x$ has bounded variation and $\nabla f_\theta(y_\cdot)$ is continuous, hence bounded, on the compact trajectory image $\{y_t : t \in [0,T]\}$), and the matrix CDE for $J_t^0$ is the linear forward equation in the form required. Substituting,
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^0 \cdot M_t^0.
\end{align*}

[guided] The expression from Step 1, $a_t^\top = \nabla L(y_T)^\top J_T^t$, has the awkward feature that $t$ enters $J_T^t$ as the *initial* time of the flow: as $t$ varies, the starting point of the flow moves while the terminal time $T$ stays fixed. The linearised CDE from Step 1 describes how a Jacobian changes in its upper (terminal) index, so it does not directly give an equation for $J_T^t$ in its lower index. To extract a clean dynamical equation for $a_t$ we need to factor $J_T^t$ into a piece that is constant in $t$ times a piece whose $t$-dependence is governed by a linear CDE.

The factorisation is provided by the **cocycle property** of the flow: the map $y_0 \mapsto y_T$ is the composition $y_0 \mapsto y_t \mapsto y_T$, and Jacobians compose under the chain rule. Concretely,
\begin{align*}
J_T^0 = J_T^t \cdot J_t^0.
\end{align*}
This is the matrix-level statement of the chain rule for the composed flow. Solving for $J_T^t$, which requires $J_t^0$ to be invertible, gives
\begin{align*}
J_T^t = J_T^0 \cdot (J_t^0)^{-1}.
\end{align*}
The factor $J_T^0$ is constant in $t$ (it is the Jacobian of the full flow from $0$ to $T$), and the entire $t$-dependence is now isolated in $(J_t^0)^{-1}$. For this factorisation to make sense, we must verify that $J_t^0$ is invertible for every $t \in [0,T]$. This is exactly the content of the [Invertibility of the CDE Jacobian](/theorems/2542).
We verify its hypotheses. (i) The driver of the linearised CDE is $z$, defined above. It has bounded variation: since $f_\theta \in C^1$ and $y$ is continuous on the compact interval $[0,T]$, the integrand $\nabla f_\theta(y_\cdot)$ is continuous and bounded on the compact image $\{y_t\}_{t \in [0,T]} \subset \mathbb{R}^e$, and $x$ has bounded variation, so $z$ is the indefinite Riemann-Stieltjes integral of a bounded continuous function against a bounded-variation path, which is itself of bounded variation. (ii) The matrix CDE for $J_t^0$ is the linear forward equation $dJ = dz \cdot J$ with $J_0 = I_e$, exactly the form of the cited theorem. The cited theorem then gives the inverse $M_t^0 := (J_t^0)^{-1}$, well-defined for every $t \in [0,T]$, satisfying the right-acting linear CDE $dM_t^0 = -M_t^0 \cdot dz_t$ with $M_0^0 = I_e$. Substituting the factorisation into the formula from Step 1,
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^0 \cdot M_t^0.
\end{align*}
The advantage of this rewriting is that the $t$-dependence is now packaged into a single factor $M_t^0$, which solves a clean linear CDE. In the next step we differentiate this expression to read off the adjoint dynamics. [/guided] [/step]

[step:Differentiate $a_t^\top = u^\top M_t^0$ to obtain the backward CDE] Define the constant vector $u := (J_T^0)^\top \nabla L(y_T) \in \mathbb{R}^e$. This vector does not depend on $t$. Then
\begin{align*}
a_t^\top = u^\top \cdot M_t^0.
\end{align*}
Differentiating in $t$ via the right-acting equation $dM_t^0 = -M_t^0 \cdot dz_t$ from the previous step (left-multiplication by the constant row vector $u^\top$ is linear, so it passes through the Riemann-Stieltjes integral that defines the CDE),
\begin{align*}
da_t^\top = u^\top \cdot dM_t^0 = u^\top \cdot (-M_t^0 \cdot dz_t) = -(u^\top M_t^0) \cdot dz_t = -a_t^\top \cdot dz_t.
\end{align*}
Substituting $dz_t = \sum_{i=1}^d \nabla f_\theta^i(y_t)\, dx_t^i$:
\begin{align*}
da_t^\top = -\sum_{i=1}^d a_t^\top \nabla f_\theta^i(y_t)\, dx_t^i.
\end{align*}
Both sides are row vectors; transposing gives the column form $da_t = -\sum_{i=1}^d \nabla f_\theta^i(y_t)^\top a_t\, dx_t^i$.

[guided] The cleanness of this differentiation rests on writing $a_t^\top = u^\top M_t^0$ with $u$ *constant in $t$*. If we had stopped at $a_t^\top = \nabla L(y_T)^\top J_T^t$, we would have to differentiate $J_T^t$ in its lower index $t$, and the forward linearised CDE does not apply directly, since it governs the upper (terminal) index rather than the lower (initial) one. The cocycle factorisation $J_T^t = J_T^0 M_t^0$ separates the two times: $J_T^0$ is a constant matrix (no $t$-dependence) and the $t$-dependence is entirely carried by $M_t^0$, which solves a clean right-acting linear CDE. Once we have $a_t^\top = u^\top M_t^0$ with $u$ constant, differentiation is direct: the Riemann-Stieltjes integral is linear and $u^\top$ is constant, so left-multiplying the integral form of $dM_t^0 = -M_t^0\, dz_t$ by $u^\top$ gives $d(u^\top M_t^0) = u^\top \cdot dM_t^0$. This is the same justification as in the ODE case: if $M_t$ solves $\dot{M} = -M F(t)$ with $M(0) = I$, then for any constant vector $u$ the path $u^\top M$ solves $\frac{d}{dt}(u^\top M) = u^\top \dot{M} = -(u^\top M) F(t)$. [/guided] [/step]

[step:Verify the terminal condition $a_T = \nabla L(y_T)$] At $t = T$, we have $J_T^T = I_e$ (the flow from $T$ to $T$ is the identity), so by the formula from Step 1,
\begin{align*}
a_T^\top = \nabla L(y_T)^\top \cdot J_T^T = \nabla L(y_T)^\top \cdot I_e = \nabla L(y_T)^\top.
\end{align*}
Equivalently $a_T = \nabla L(y_T)$, which is the claimed terminal condition. Combining the dynamics from Step 3 with the terminal condition, the adjoint process $a_t = \partial_{y_t} L(y_T)$ satisfies
\begin{align*}
da_t^\top = -\sum_{i=1}^d a_t^\top \nabla f_\theta^i(y_t)\, dx_t^i, \qquad a_T = \nabla L(y_T),
\end{align*}
which completes the proof. [/step]
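
The backward equation can be sanity-checked numerically. The following is a minimal sketch, not part of the proof: it assumes a hypothetical toy problem with $e = d = 2$, hand-picked smooth vector fields `fields`, the driver $x_t = (t, \sin 2t)$, and the loss $L(y) = \tfrac12 |y|^2$, none of which come from the theorem. The forward CDE is discretised with Euler steps, and the adjoint is propagated from $T$ down to $0$ by $a_k^\top = a_{k+1}^\top (I + \Delta z_k)$, an explicit discretisation of $da_t^\top = -a_t^\top dz_t$ with $\Delta z_k = \sum_i \nabla f_\theta^i(y_k)\, \Delta x_k^i$. The result $a_0$ is then compared with a finite-difference estimate of $\partial_{y_0} L(y_T)$.

```python
import math

def fields(y):
    """Toy vector fields f^1(y), f^2(y) (hypothetical, chosen only to be smooth)."""
    y1, y2 = y
    return [(math.sin(y2), 0.5 * y1), (0.3 * y1 * y2, math.cos(y1))]

def jacobians(y):
    """Jacobians of f^1 and f^2 at y; entry [i][r][c] is d f^i_r / d y_c."""
    y1, y2 = y
    return [((0.0, math.cos(y2)), (0.5, 0.0)),
            ((0.3 * y2, 0.3 * y1), (-math.sin(y1), 0.0))]

def driver_increments(n_steps, T=1.0):
    """Increments dx_k of the smooth (hence bounded-variation) path x_t = (t, sin 2t)."""
    xs = [(T * k / n_steps, math.sin(2.0 * T * k / n_steps)) for k in range(n_steps + 1)]
    return [(xs[k + 1][0] - xs[k][0], xs[k + 1][1] - xs[k][1]) for k in range(n_steps)]

def solve_forward(y0, dxs):
    """Euler scheme y_{k+1} = y_k + sum_i f^i(y_k) dx_k^i; returns the whole trajectory."""
    ys = [tuple(y0)]
    for dx in dxs:
        y, fs = ys[-1], fields(ys[-1])
        ys.append(tuple(y[r] + sum(fs[i][r] * dx[i] for i in range(2)) for r in range(2)))
    return ys

def loss(y):
    """L(y) = |y|^2 / 2, so grad L(y) = y."""
    return 0.5 * (y[0] ** 2 + y[1] ** 2)

def solve_adjoint(ys, dxs):
    """Backward sweep a_k^T = a_{k+1}^T (I + dz_k), starting from a_T = grad L(y_T)."""
    a = ys[-1]  # terminal condition: grad L(y_T) = y_T for this loss
    for k in reversed(range(len(dxs))):
        js = jacobians(ys[k])
        dz = [[sum(js[i][r][c] * dxs[k][i] for i in range(2)) for c in range(2)]
              for r in range(2)]
        # Row vector times matrix: (a^T dz)_c = sum_r a_r * dz[r][c].
        a = tuple(a[c] + sum(a[r] * dz[r][c] for r in range(2)) for c in range(2))
    return a

def finite_difference_gradient(y0, dxs, eps=1e-6):
    """Central differences of L(y_T) with respect to each component of y_0."""
    grad = []
    for j in range(2):
        y_plus, y_minus = list(y0), list(y0)
        y_plus[j] += eps
        y_minus[j] -= eps
        grad.append((loss(solve_forward(y_plus, dxs)[-1])
                     - loss(solve_forward(y_minus, dxs)[-1])) / (2 * eps))
    return tuple(grad)

dxs = driver_increments(200)
y0 = (0.4, -0.7)
ys = solve_forward(y0, dxs)
a0 = solve_adjoint(ys, dxs)
fd = finite_difference_gradient(y0, dxs)
print("adjoint a_0:       ", a0)
print("finite differences:", fd)
```

With this particular discretisation the backward sweep is exactly the chain rule applied to the Euler steps, so the two printed vectors should agree up to finite-difference error on any grid. Note that the row vector multiplies $\Delta z_k$ from the left, matching the right-acting form of the equation for $M_t^0$.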

Explore Further

MMD is a Metric under Characteristicness (Stochastic Analysis)
Signature as Solution of a CDE (Stochastic Analysis)
Sufficient Condition for Signature Membership (Stochastic Analysis)
Factorial Decay (Stochastic Analysis)
Polishness Fails for Any Reasonable Topology (Stochastic Analysis)
Signature Universal Approximation (Stochastic Analysis)
Adjoint Equation for a Neural RDE (Stochastic Analysis)
Stratonovich SDEs as RDEs (Stochastic Analysis)
