[proofplan]
The strategy is approximation by smooth paths followed by a passage to the limit using the [Universal Limit Theorem](/theorems/2540). The standing hypothesis $\gamma > p + 1$ is essential: it gives $\gamma - 1 > p$, which is the regularity threshold required to apply the Universal Limit Theorem to the *linearised* CDE, whose vector field involves $\nabla f_\theta(y)$ and which governs the Jacobian and hence the adjoint. We choose a sequence $(x^N)$ of bounded-variation paths whose truncated signatures converge to $x$ in $p$-variation. For each $N$, the smooth-path adjoint $a^N$ satisfies the CDE adjoint equation by the [Adjoint Equation for a Neural CDE](/theorems/2543). We then identify two convergent objects: (i) the smooth-path Jacobians $J^{0,N}, M^{0,N}$ converge to their rough-path counterparts $J^0, M^0$ via the Universal Limit Theorem applied to the linearised CDE; (ii) the auxiliary drivers $z^N := \int \nabla f_\theta(y^N)\, dx^N$ converge to a geometric rough path $z$, and the adjoint CDEs $da^N = -(a^N)^\top dz^N$ converge to the backward RDE $da = -a^\top dz$. The closed-form expression $a_t^\top = \nabla L(y_T)^\top J_T^0 M_t^0$ is preserved under the limit, which identifies the limit as the adjoint of the rough-path system.
[/proofplan]
[step:Approximate the rough path by smooth paths]
Since $x$ is a *geometric* $p$-rough path, by the [definition of a geometric rough path](/page/Geometric%20Rough%20Path) there exists a sequence $(x^N)_{N \in \mathbb{N}}$ of continuous paths $x^N : [0,T] \to \mathbb{R}^d$ of bounded variation such that
\begin{align*}
\pi_{\le \lfloor p \rfloor} \circ S(x^N) \xrightarrow{p\text{-var}} x \qquad \text{as } N \to \infty,
\end{align*}
in the $p$-variation rough-path topology on $\Omega^{0,p}_{\lfloor p\rfloor}([0,T], \mathbb{R}^d)$.
Let $y^N : [0,T] \to \mathbb{R}^e$ denote the unique solution of the CDE
\begin{align*}
dy_t^N = f_\theta(y_t^N)\, dx_t^N, \qquad y_0^N = y_0,
\end{align*}
which exists by the Existence and Uniqueness of CDE Solutions (Friz-Victoir, *Multidimensional Stochastic Processes as Rough Paths*, 2010, Theorem 10.14). We verify the hypotheses: $f_\theta \in \mathrm{Lip}^\gamma$ with $\gamma > p + 1 \ge 3$ implies in particular that $f_\theta \in C^1$ with bounded, Lipschitz first derivative, and $x^N$ has bounded $1$-variation, so the cited existence theorem, which covers bounded-variation drivers, applies.
We now apply the [Universal Limit Theorem](/theorems/2540) to the primal equation $dy = f_\theta(y)\, dx$ along the approximating drivers $x^N$. The hypothesis to verify is that the vector field $f_\theta$ is $\mathrm{Lip}^\gamma$ with $\gamma > p$; this holds because the standing hypothesis $\gamma > p + 1$ implies $\gamma > p$. The Universal Limit Theorem then gives that the solution map of the RDE is continuous from rough-path $p$-variation to $p$-variation, hence
\begin{align*}
y^N \xrightarrow{p\text{-var}} y \qquad \text{as } N \to \infty,
\end{align*}
where $y$ is the solution to the RDE $dy_t = f_\theta(y_t)\, dx_t$.
[guided]
The strategy of the proof is *approximation by smooth paths*: for each $N$ we have a bounded-variation driver $x^N$ and a corresponding CDE solution $y^N$, on which we already have the smooth-path adjoint formula from the [Adjoint Equation for a Neural CDE](/theorems/2543). Passing to the rough-path limit then transfers the result to the actual neural RDE.
The key existence statement we use is the definition of a geometric $p$-rough path: $x$ is geometric iff it is the limit (in $p$-variation) of canonical truncated-signature lifts of bounded-variation paths. So such a sequence $(x^N)$ exists by definition of the path space.
We verify the hypotheses for solving the smooth-path CDE that defines $y^N$. The vector field $f_\theta$ is $\mathrm{Lip}^\gamma$ with $\gamma > p + 1 \ge 3$ (since $p \ge 2$ for non-trivial rough paths). $\mathrm{Lip}^\gamma$ regularity with $\gamma \ge 2$ implies that $f_\theta$ is $C^1$ with bounded, Lipschitz first derivative (for $1 < \gamma < 2$ the derivative would only be $(\gamma - 1)$-Hölder); in particular, the standard CDE existence theorem (Friz-Victoir, *Multidimensional Stochastic Processes as Rough Paths*, 2010, Theorem 10.14) applies to $f_\theta$ paired with any bounded-variation driver. Since $x^N$ is bounded-variation by construction, the CDE
\begin{align*}
dy_t^N = f_\theta(y_t^N)\,dx_t^N, \qquad y_0^N = y_0,
\end{align*}
has a unique solution $y^N$ on $[0,T]$.
We then apply the [Universal Limit Theorem](/theorems/2540) to pass from the smooth-path CDE solutions $y^N$ to the solution of the rough-path RDE. The cited theorem requires the vector field to be in $\mathrm{Lip}^{\gamma'}$ with $\gamma' > p$. Here $f_\theta \in \mathrm{Lip}^\gamma$, and the standing hypothesis gives $\gamma > p + 1 > p$, so we may take $\gamma' = \gamma$. The hypothesis is met, but note that we are *not* yet consuming the strengthened threshold $\gamma > p + 1$; that threshold becomes essential only when we apply the Universal Limit Theorem to the *augmented* system $(y, z)$ in Step 3, where the vector field involves $\nabla f_\theta \in \mathrm{Lip}^{\gamma - 1}$.
The Universal Limit Theorem then gives that the RDE solution map is continuous from rough-path $p$-variation to $p$-variation, hence $y^N \xrightarrow{p\text{-var}} y$ where $y$ solves the rough-path RDE driven by $x$. This is the foundation we will build the smooth-path adjoint argument on.
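The continuity statement can be illustrated numerically. The following is a minimal sketch, not part of the proof: the vector field `f_theta`, the driver `driver`, the initial value and the grid sizes are all hypothetical choices made for illustration, and the limiting driver here is smooth, so the sketch only shows the mechanism (polygonal approximations $x^N$ of a fixed driver produce CDE solutions that approach the solution for that driver). It does not exercise the genuinely rough regime, where the iterated-integral lift matters.

```python
import numpy as np

def f_theta(y):
    # Hypothetical smooth, bounded vector field R^2 -> R^{2x2} (illustration only).
    return np.array([[np.tanh(y[1]), 1.0],
                     [-1.0, np.sin(y[0])]])

def driver(t):
    # Smooth stand-in for the limiting driver x on [0, 1].
    return np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=-1)

def euler_cde(increments, y0):
    # Explicit Euler scheme for dy = f_theta(y) dx along the given increments.
    y = np.array(y0, dtype=float)
    for dx in increments:
        y = y + f_theta(y) @ dx
    return y

def polygonal_increments(n_knots, n_fine):
    # Increments of the piecewise-linear path x^N through n_knots + 1 knots,
    # sampled on a common fine grid (n_fine must be a multiple of n_knots).
    t_fine = np.linspace(0.0, 1.0, n_fine + 1)
    t_knots = np.linspace(0.0, 1.0, n_knots + 1)
    x_knots = driver(t_knots)
    x_fine = np.stack(
        [np.interp(t_fine, t_knots, x_knots[:, k]) for k in range(2)], axis=-1
    )
    return np.diff(x_fine, axis=0)

n_fine = 4096
y0 = [0.5, -0.3]
t_fine = np.linspace(0.0, 1.0, n_fine + 1)

# Reference solution: same Euler grid, driven by the limiting driver itself.
y_ref = euler_cde(np.diff(driver(t_fine), axis=0), y0)

# Terminal-value error |y^N_T - y_T| for increasingly fine polygonal drivers.
errors = []
for n_knots in (4, 16, 64):
    y_N = euler_cde(polygonal_increments(n_knots, n_fine), y0)
    errors.append(float(np.linalg.norm(y_N - y_ref)))
    print(n_knots, errors[-1])
```

The printed errors should shrink as the number of knots grows, which is the finite-dimensional shadow of $y^N \to y$.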
[/guided]
[/step]
[step:Apply the smooth-path adjoint formula]
For each $N$, the smooth driver $x^N$ has bounded variation, $f_\theta \in C^1$ (since $\mathrm{Lip}^\gamma \subseteq C^1$ for $\gamma > 1$, and we have $\gamma > p + 1 \ge 3$), and $L \in C^1$ by hypothesis. These are exactly the hypotheses of the [Adjoint Equation for a Neural CDE](/theorems/2543), so the adjoint $a_t^N := \partial_{y_t^N} L(y_T^N)$ satisfies
\begin{align*}
da_t^N = -\sum_{i=1}^d (a^N)_t^\top \nabla f_\theta^i(y_t^N)\, dx_t^{N,i}, \qquad a_T^N = \nabla L(y_T^N).
\end{align*}
Setting
\begin{align*}
z_t^N := \int_0^t \sum_{i=1}^d \nabla f_\theta^i(y_s^N)\, dx_s^{N,i} \in \mathbb{R}^{e \times e},
\end{align*}
the backward CDE simplifies to $da_t^N = -(a^N)_t^\top\, dz_t^N$ with $a_T^N = \nabla L(y_T^N)$.
Moreover, by the [Adjoint Equation for a Neural CDE](/theorems/2543) (the closed-form expression derived in its proof) combined with the [Invertibility of the CDE Jacobian](/theorems/2542),
\begin{align*}
(a^N)_t^\top = \nabla L(y_T^N)^\top \cdot J_T^{0,N} \cdot M_t^{0,N},
\end{align*}
where $J_t^{s,N}$ is the Jacobian of $y^N$ from time $s$ to time $t$ and $M_t^{s,N} = (J_t^{s,N})^{-1}$.
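A discrete analogue of these two descriptions of $a^N$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the cited theorem: the vector field (columns $\tanh(W_i y + b_i)$ with random `W`, `b`), the driver, the loss $L(y) = \tfrac12 |y|^2$ and the step count are all hypothetical. It works with the explicit Euler discretisation of the CDE, for which the backward recursion and the closed form $\nabla L^\top J_n M_k$ agree exactly up to rounding, and it compares $a_0$ with a finite-difference gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
e, d, n = 3, 2, 200

# Hypothetical vector field: column i of f(y) is tanh(W[i] @ y + b[i]).
W = 0.5 * rng.normal(size=(d, e, e))
b = rng.normal(size=(d, e))

def f(y):
    return np.tanh(W @ y + b).T               # shape (e, d)

def grad_f(y):
    s = 1.0 - np.tanh(W @ y + b) ** 2          # shape (d, e)
    return s[:, :, None] * W                   # slice i is the Jacobian of column i

def forward(y0, dxs):
    # Euler scheme y_{k+1} = y_k + f(y_k) dx_k, recording
    # dz_k = sum_i grad f^i(y_k) dx_k^i (the increment of the auxiliary driver).
    ys, dzs = [np.asarray(y0, dtype=float)], []
    for dx in dxs:
        y = ys[-1]
        dzs.append(np.tensordot(dx, grad_f(y), axes=1))
        ys.append(y + f(y) @ dx)
    return ys, dzs

t = np.linspace(0.0, 1.0, n + 1)
x = np.stack([np.sin(3 * t), np.cos(2 * t)], axis=-1)   # hypothetical smooth driver
dxs = np.diff(x, axis=0)
y0 = np.array([0.2, -0.1, 0.4])
ys, dzs = forward(y0, dxs)
I = np.eye(e)

# Discrete Jacobians J_k = (I + dz_{k-1}) ... (I + dz_0).
Js = [I]
for dz in dzs:
    Js.append((I + dz) @ Js[-1])

# Backward recursion a_k^T = a_{k+1}^T (I + dz_k), with a_n = grad L(y_n) = y_n.
a_rec = [None] * (n + 1)
a_rec[n] = ys[-1]
for k in range(n - 1, -1, -1):
    a_rec[k] = (I + dzs[k]).T @ a_rec[k + 1]

# Closed form a_k^T = grad L(y_n)^T J_n M_k with M_k = J_k^{-1}.
a_closed = [np.linalg.solve(Js[k].T, Js[n].T @ ys[-1]) for k in range(n + 1)]
max_gap = max(np.max(np.abs(a_rec[k] - a_closed[k])) for k in range(n + 1))

def loss(y_init):
    yT = forward(y_init, dxs)[0][-1]
    return 0.5 * float(yT @ yT)

# Central finite differences for dL/dy_0, to compare with a_0.
eps = 1e-6
fd = np.array([(loss(y0 + eps * v) - loss(y0 - eps * v)) / (2 * eps) for v in I])
print(max_gap, np.max(np.abs(fd - a_rec[0])))
```

Both printed gaps should be at the level of rounding and finite-difference error. The agreement is exact for the discretised system; the statement for the continuous CDE is the content of the cited theorems.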
[/step]
[step:Pass to the limit on the auxiliary driver $z^N$ and the Jacobians $J^N, M^N$]
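We apply the [Universal Limit Theorem](/theorems/2540) to the augmented CDE system that simultaneously evolves $y^N$ and $z^N$. Concretely, define $G : \mathbb{R}^e \to \mathcal{L}(\mathbb{R}^d, \mathbb{R}^e \oplus \mathbb{R}^{e \times e})$ by $G(y)\, a := (f_\theta(y) a,\; \nabla f_\theta(y) a)$, where $a \in \mathbb{R}^d$ here denotes a generic direction of the driver and not the adjoint. The pair $(y^N, z^N)$ then solves the CDE with vector field $G$ driven by $x^N$.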
We verify the regularity hypothesis of the [Universal Limit Theorem](/theorems/2540) explicitly. The vector field $G$ has two components: $f_\theta \in \mathrm{Lip}^\gamma$ and $\nabla f_\theta$. Since $f_\theta \in \mathrm{Lip}^\gamma$, by the definition of the $\mathrm{Lip}^\gamma$ scale we have $\nabla f_\theta \in \mathrm{Lip}^{\gamma - 1}$. The augmented vector field $G$ therefore satisfies $G \in \mathrm{Lip}^{\gamma - 1}$. The Universal Limit Theorem requires the vector field to be in $\mathrm{Lip}^{\gamma'}$ with $\gamma' > p$. Setting $\gamma' = \gamma - 1$, we need $\gamma - 1 > p$, i.e. $\gamma > p + 1$ — which is *exactly* the standing hypothesis of the theorem. The hypothesis $\gamma > p + 1$ is therefore consumed precisely here.
By the [Universal Limit Theorem](/theorems/2540) applied to this augmented system, the truncated signatures $\pi_{\le \lfloor p \rfloor} S(z^N)$ converge in $p$-variation to a geometric $p$-rough path $z$ on $\mathbb{R}^{e \times e}$, and
\begin{align*}
(y^N, z^N) \xrightarrow{p\text{-var}} (y, z).
\end{align*}
For the Jacobians: $J_t^{s,N}$ satisfies the linear CDE $dJ^{s,N} = \nabla f_\theta(y^N)\, J^{s,N}\, dx^N$, equivalently $dJ^{s,N} = dz^N\, J^{s,N}$ in terms of the auxiliary driver (the increment $dz^N$ acts on the left, which matters because the matrices need not commute). We apply the [Universal Limit Theorem](/theorems/2540) to this *linear* CDE driven by $z^N$. We verify the hypotheses: the linear vector field $J \mapsto (\cdot)\, J$ on $\mathbb{R}^{e \times e}$ is $\mathrm{Lip}^{\gamma'}$ on every bounded set for every $\gamma' \ge 1$ (it is linear, hence $C^\infty$ with constant first derivative and vanishing higher derivatives, and the field itself is bounded on every bounded set). In particular it is $\mathrm{Lip}^{\gamma'}$ with $\gamma' = \gamma > p$, so the regularity threshold is met. The Universal Limit Theorem yields
\begin{align*}
J^{s,N} \xrightarrow{p\text{-var}} J^s \qquad \text{as } N \to \infty,
\end{align*}
where $J^s$ solves the corresponding linear RDE $dJ_t^s = dz_t\, J_t^s$ with $J_s^s = I_e$.
By the [Invertibility of the CDE Jacobian](/theorems/2542) applied at each finite $N$, the inverses $M^{s,N} = (J^{s,N})^{-1}$ satisfy the dual right-acting linear CDE $dM^{s,N} = -M^{s,N}\, dz^N$. Applying the [Universal Limit Theorem](/theorems/2540) to this dual linear equation — again with the linear vector field, hence $\mathrm{Lip}^{\gamma'}$ for every $\gamma' \ge 1$, so the threshold $\gamma' > p$ is met — gives
\begin{align*}
M^{s,N} \xrightarrow{p\text{-var}} M^s \qquad \text{as } N \to \infty,
\end{align*}
where $M^s$ solves $dM_t^s = -M_t^s\, dz_t$ with $M_s^s = I_e$. Since $M_t^{s,N} J_t^{s,N} = I_e$ for every $N$ and matrix multiplication is continuous, passing to the limit gives $M_t^s J_t^s = I_e$, hence $M_t^s = (J_t^s)^{-1}$.
[guided]
The key technical ingredient is the [Universal Limit Theorem](/theorems/2540) (also called the *continuity theorem* for rough differential equations). It says: the solution map of an RDE is locally Lipschitz in the rough-path $p$-variation topology, provided the vector field is in $\mathrm{Lip}^{\gamma'}$ with $\gamma' > p$.
Why does this require the strengthened hypothesis $\gamma > p + 1$? The augmented vector field $G(y) a = (f_\theta(y) a, \nabla f_\theta(y) a)$ has regularity $\min(\gamma, \gamma - 1) = \gamma - 1$, because differentiating once costs one degree on the $\mathrm{Lip}^\gamma$ scale. To run the Universal Limit Theorem on the augmented system we need $\gamma - 1 > p$, i.e. $\gamma > p + 1$. This is the regularity gap that makes adjoint equations harder than primal ones — you lose one degree of regularity to differentiate the vector field, and the strengthened hypothesis $\gamma > p + 1$ pays for this loss. In particular, the strengthened hypothesis is *not* a convenience: it is forced by the structure of the proof, and the bare hypothesis $\gamma > p$ would not suffice for the augmented Universal Limit Theorem step.
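As a concrete instance of this bookkeeping, take the illustrative value $p = \tfrac52$ (an assumption made only for this example, chosen in the range $2 < p < 3$ where two levels of iterated integrals are needed, so $\lfloor p \rfloor = 2$). The two thresholds then read
\begin{align*}
\text{primal RDE for } y: \qquad & f_\theta \in \mathrm{Lip}^{\gamma}, \quad \gamma > p = \tfrac52, \\
\text{augmented system } (y, z): \qquad & \nabla f_\theta \in \mathrm{Lip}^{\gamma - 1}, \quad \gamma - 1 > p = \tfrac52 \iff \gamma > \tfrac72.
\end{align*}
So $f_\theta \in \mathrm{Lip}^3$ would be enough to solve the primal RDE but not to run the argument for the augmented system, whereas $f_\theta \in \mathrm{Lip}^4$ covers both.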
Once we have $(y^N, z^N) \to (y, z)$ as a joint rough-path limit, the Jacobian and its inverse are then *additional* solutions of further linear RDEs driven by $z$. For these linear RDEs, the regularity demand is much weaker: the linear vector field $J \mapsto (\cdot)\, J$ has constant first derivative and vanishing higher derivatives, and is itself bounded on every bounded set, so it is locally $\mathrm{Lip}^{\gamma'}$ for every $\gamma' \ge 1$, in particular for some $\gamma' > p$, regardless of the original $\gamma$. This is why the linear-RDE step does not consume any additional regularity.
Why does $M_t^s = (J_t^s)^{-1}$ in the limit? The product $J_t^{s,N} M_t^{s,N} = I_e$ holds for all $N$ (by step 4 of the [Invertibility of the CDE Jacobian](/theorems/2542) proof). Passing to the $p$-variation limit, $J_t^s M_t^s = I_e$, so $M_t^s$ is the inverse of $J_t^s$. Equivalently, the [Invertibility of the CDE Jacobian](/theorems/2542) extends to the rough-path setting via this approximation argument.
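The inverse relation can also be observed numerically for a smooth matrix-valued driver. The sketch below is illustrative only: the matrices `A`, `B` and the path `z_path` are hypothetical, and the Jacobian equation is taken in the form $dJ = dz\, J$ (that is, $dJ = \nabla f_\theta(y)\, J\, dx$) with the dual equation $dM = -M\, dz$. One Euler step gives $M_{k+1} J_{k+1} = M_k (I - dz_k^2) J_k$, so the defect $M J - I$ is of second order in the increments and should vanish under refinement.

```python
import numpy as np

# Hypothetical non-commuting generators for a 2x2 matrix-valued driver.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.5, 0.0], [0.0, -0.5]])
I2 = np.eye(2)

def z_path(t):
    # Smooth stand-in for the auxiliary driver z on [0, 1]; shape (len(t), 2, 2).
    alpha = np.sin(2 * np.pi * t)[:, None, None]
    beta = (np.cos(2 * np.pi * t) - 1.0)[:, None, None]
    return alpha * A + beta * B

def inverse_defect(n):
    # Euler schemes for dJ = dz J (increment on the left) and
    # dM = -M dz (increment on the right), both started at the identity.
    t = np.linspace(0.0, 1.0, n + 1)
    dzs = np.diff(z_path(t), axis=0)
    J, M = I2.copy(), I2.copy()
    for dz in dzs:
        J = (I2 + dz) @ J
        M = M @ (I2 - dz)
    # Frobenius norm of M_n J_n - I.
    return float(np.linalg.norm(M @ J - I2))

defects = [inverse_defect(n) for n in (64, 256, 1024, 4096)]
print(defects)
```

The defects should decrease roughly in proportion to the step size, consistent with $M_t^s J_t^s = I_e$ in the limit.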
[/guided]
[/step]
[step:Identify the limit of $a^N$ as the rough-path adjoint]
We now combine the convergences from the previous step. The closed-form expression
\begin{align*}
(a^N)_t^\top = \nabla L(y_T^N)^\top \cdot J_T^{0,N} \cdot M_t^{0,N}
\end{align*}
is a continuous function (matrix product, evaluation of $\nabla L$) of $(y_T^N, J_T^{0,N}, M_t^{0,N})$. Each factor converges:
- $y_T^N \to y_T$ in $\mathbb{R}^e$ (uniform convergence of $y^N \to y$ implies pointwise at $t = T$),
- $\nabla L(y_T^N) \to \nabla L(y_T)$ since $\nabla L$ is continuous (because $L \in C^1$),
- $J_T^{0,N} \to J_T^0$ in $\mathbb{R}^{e \times e}$ (pointwise at $t = T$ from $p$-variation convergence),
- $M_t^{0,N} \to M_t^0$ uniformly in $t \in [0,T]$ ($p$-variation convergence, together with the common initial value $M_0^{0,N} = M_0^0 = I_e$, implies uniform convergence).
Therefore, uniformly in $t \in [0,T]$,
\begin{align*}
(a^N)_t^\top \to \nabla L(y_T)^\top \cdot J_T^0 \cdot M_t^0 =: a_t^\top \qquad \text{as } N \to \infty.
\end{align*}
Now we read off both the dynamics and the identification of the limit as the actual rough-path adjoint:
**Dynamics.** Each $a^N$ satisfies $da^N = -(a^N)^\top\, dz^N$ with $a_T^N = \nabla L(y_T^N)$. We invoke the [Universal Limit Theorem](/theorems/2540) one more time, applied to the *linear* backward RDE $da = -a^\top\, dz$. Hypotheses: the vector field $a \mapsto -a^\top(\cdot)$ on $\mathbb{R}^e$ is linear, hence $\mathrm{Lip}^{\gamma'}$ for every $\gamma' \ge 1$, so the threshold $\gamma' > p$ is met without consuming additional regularity. Since $a^N \to a$ uniformly and $z^N \to z$ in $p$-variation, by continuity of the linear-RDE solution map in both the driver and the terminal condition, the limit $a$ satisfies the linear RDE
\begin{align*}
da_t = -a_t^\top\, dz_t = -a_t^\top \nabla f_\theta(y_t)\, dx_t, \qquad a_T = \nabla L(y_T).
\end{align*}
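As a consistency check at the level of formal calculus (valid for bounded-variation drivers, and for geometric rough paths because they obey the first-order chain rule), one can differentiate the closed form $a_t^\top = \nabla L(y_T)^\top J_T^0 M_t^0$ in $t$, with $\nabla L(y_T)^\top J_T^0$ held fixed and $dM_t^0 = -M_t^0\, dz_t$:
\begin{align*}
d(a_t^\top) &= \nabla L(y_T)^\top J_T^0\, dM_t^0 = -\nabla L(y_T)^\top J_T^0 M_t^0\, dz_t = -a_t^\top\, dz_t, \\
a_T^\top &= \nabla L(y_T)^\top J_T^0 M_T^0 = \nabla L(y_T)^\top,
\end{align*}
where the terminal condition uses $M_T^0 = (J_T^0)^{-1}$. This reproduces the backward equation and its terminal value.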
**Identification with the partial derivative.** It remains to show that the limit $a_t$ is in fact the partial derivative $\partial_{y_t} L(y_T)$ for the *rough-path* system. By the smooth-path identity at finite $N$,
\begin{align*}
(a^N)_t^\top = \nabla L(y_T^N)^\top \cdot J_T^{t,N},
\end{align*}
where $J_T^{t,N} = J_T^{0,N} M_t^{0,N}$ is the forward Jacobian from $t$ to $T$ for the smooth-path system. Two ingredients allow the passage to the limit. First, the [Universal Limit Theorem](/theorems/2540) applied to the linearised RDE, with regularity $\gamma - 1 > p$ verified above, gives continuous dependence of the Jacobian on the driver. Second, the Differentiability of RDE Flows (Friz-Victoir, *Multidimensional Stochastic Processes as Rough Paths*, 2010, §11) ensures that the rough-path forward Jacobian $J_T^t$ exists and is the limit of $J_T^{t,N}$. We therefore obtain
\begin{align*}
a_t^\top = \nabla L(y_T)^\top \cdot J_T^t = \nabla L(y_T)^\top \cdot \partial_{y_t} y_T = \partial_{y_t} L(y_T).
\end{align*}
This identifies the limit as the rough-path adjoint, completing the proof.
[guided]
The two-step structure of this last step — first establish the dynamics via continuity of the linear RDE, then identify the limit as the actual adjoint via continuity of the rough-path Jacobian — is essential. We could not directly *define* $a_t = \partial_{y_t} L(y_T)$ in the rough-path setting and then verify the equation by direct computation, because the partial-derivative interpretation requires the differentiability of RDE flows (Friz-Victoir, *Multidimensional Stochastic Processes as Rough Paths*, 2010, §11), a substantial theorem in its own right; we cite it externally rather than re-proving it.
Conversely, we could not skip the identification step and merely note that the limit of $a^N$ satisfies the backward linear RDE — that would only show the *limit* satisfies the equation, not that the limit equals the partial derivative $\partial_{y_t} L(y_T)$. The cited differentiability theorem is precisely what reconciles the two: it says the rough-path forward flow is differentiable in the initial condition, with derivative given by the linearised RDE — exactly what we obtain in the limit. So both the dynamics and the partial-derivative interpretation match up at the rough-path level.
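The factorisation of the forward Jacobian used in the identification step is the chain rule for the composition of flow maps $y_0 \mapsto y_t \mapsto y_T$, written here as a short worked equation:
\begin{align*}
J_T^0 = J_T^t\, J_t^0 \quad \Longrightarrow \quad J_T^t = J_T^0\, (J_t^0)^{-1} = J_T^0\, M_t^0.
\end{align*}
This is why the product $J_T^0 M_t^0$ appearing in the closed form for $a_t$ can be read as the derivative of $y_T$ with respect to $y_t$.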
The hypothesis $\gamma > p + 1$ is exactly what makes both interpretations work simultaneously. This strengthened threshold (equivalently $\gamma - 1 > p$) is forced by the [Universal Limit Theorem](/theorems/2540) applied to the augmented system $(y, z)$ in the previous step: differentiating the vector field once costs one degree on the $\mathrm{Lip}$ scale, so $\nabla f_\theta \in \mathrm{Lip}^{\gamma - 1}$, and the Universal Limit Theorem demands that the $\mathrm{Lip}$ exponent strictly exceed $p$. This is the regularity bookkeeping for backward equations: primal RDEs need $\gamma > p$, but adjoint RDEs need $\gamma > p + 1$.
[/guided]
[/step]