[proofplan]
The strategy is to absorb the time component into the driving signal, so that drift and diffusion are treated uniformly by the rough-path machinery, and then to apply the Universal Limit Theorem to the time-augmented signal. We define the time-augmented path $\widetilde{W}^N_t = (t, W^N_t)$ and the combined vector field $V(t, y) = (\mu(t, y), \sigma(t, y))$, recasting each CDE as an RDE driven by (the canonical lift of) $\widetilde{W}^N$. By the *Brownian Motion as a Rough Path* theorem, combined with continuity of the Young pairing with the deterministic time component, the enhanced time-augmented signals $\widetilde{W}^N$ converge in $p$-variation to the Stratonovich lift $\widetilde{W}$ for any $p \in (2,3)$. The $\mathrm{Lip}^{2+\varepsilon}$ regularity of $\sigma$ exceeds the threshold $\gamma > p$ required by the Universal Limit Theorem once $p \in (2, \min\{3, 2+\varepsilon\})$, while the $\mathrm{Lip}^1$ drift $\mu$ acts only against the bounded-variation time component, where no rough correction arises. The Universal Limit Theorem yields uniform convergence of the RDE solutions, and the limit $y$ solves the RDE driven by $\widetilde{W}$. Finally, $y$ is identified with the strong Stratonovich SDE solution: $W$ is the Stratonovich lift, whose second-level increments are the $L^2$-limits of the Riemann-Stieltjes iterated integrals of $W^N$, and the classical Wong-Zakai theorem shows that the same approximations $y^N$ converge to the Stratonovich solution.
[/proofplan]
[step:Augment with time and rewrite each CDE as a single RDE]
Define the time-augmented path
\begin{align*}
\widetilde{W}^N : [0,T] &\to \mathbb{R}^{1+d} \\
t &\mapsto (t,\, W^N_t),
\end{align*}
and the combined vector field
\begin{align*}
V : \mathbb{R}^{e+1} &\to \mathcal{L}(\mathbb{R}^{1+d}, \mathbb{R}^e) \\
(t, y) &\mapsto \big(\mu(t, y),\, \sigma_1(t, y),\, \ldots,\, \sigma_d(t, y)\big),
\end{align*}
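viewing the column vector $\mu(t,y)$ as the action on the time-component basis vector and $\sigma_i(t,y)$ as the action on the $i$-th $\mathbb{R}^d$-basis vector. (Rough-path theory is usually stated for autonomous vector fields; the explicit $t$-dependence is absorbed by also augmenting the state to $(t, y_t)$, whose first component obeys $dt = (1, 0, \ldots, 0)\,d\widetilde{W}^N_t$ and is therefore driven by a constant vector field.) With this notation, the CDE in the hypothesis takes the unified form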
\begin{align*}
dy^N_t = V(t, y^N_t)\,d\widetilde{W}^N_t, \qquad y^N_0 = a.
\end{align*}
Equivalently, this is the RDE driven by the canonical bounded-variation rough lift of $\widetilde{W}^N$, whose second level consists of the Riemann-Stieltjes iterated integrals of $\widetilde{W}^N$; these are well-defined because $\widetilde{W}^N$ is piecewise linear and hence has finite $1$-variation.
[guided]
The technical obstruction to applying rough-path tools directly to the CDE
\begin{align*}
dy^N_t = \mu(t, y^N_t)\,dt + \sigma(t, y^N_t)\,dW^N_t
\end{align*}
is that the drift $\mu$ and the diffusion $\sigma$ are integrated against different driving signals: $dt$ for the drift and $dW^N_t$ for the diffusion. Rough-path theory is cleanest when there is a single driving signal. The standard fix is to **augment with time**: treat $t$ as an additional component of the driving path.
Define $\widetilde{W}^N_t = (t, W^N_t) \in \mathbb{R}^{1+d}$. The first component is just the time variable as a deterministic linear path, and the next $d$ components are the piecewise-linear approximation of Brownian motion. We then consolidate $\mu$ and $\sigma$ into a single vector field
\begin{align*}
V(t, y) = (\mu(t, y),\, \sigma_1(t, y),\, \ldots,\, \sigma_d(t, y))
\end{align*}
which acts as a linear map $\mathbb{R}^{1+d} \to \mathbb{R}^e$ on the time-augmented increment vector: applied to $d\widetilde{W}^N_t = (dt, dW^{N,1}_t, \ldots, dW^{N,d}_t)$, it produces $\mu(t, y^N_t)\,dt + \sum_i \sigma_i(t, y^N_t)\,dW^{N,i}_t$, which is exactly the original right-hand side. So
\begin{align*}
dy^N_t = V(t, y^N_t)\,d\widetilde{W}^N_t, \qquad y^N_0 = a.
\end{align*}
Why is this the same equation? Because the time-augmentation does nothing more than label the components of the driving signal — the dynamics are unchanged, and we have not assumed any new condition. The benefit is that $\widetilde{W}^N$ is now a single $(1+d)$-dimensional path, against which we can run rough-path machinery.
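As a quick sanity check of the relabelling, the sketch below takes one Euler step per linear segment in two ways: once with drift and diffusion written separately, and once by applying the combined field $V$ to the augmented increment $(dt, dW^{N,1}, \ldots, dW^{N,d})$. The scalar state ($e = 1$), the choice $d = 2$, and the functions `mu` and `sigma` are hypothetical choices for illustration only; the two update rules agree up to floating-point rounding.

```python
import math
import random

# Hypothetical coefficients with e = 1 (scalar state) and d = 2.
def mu(t, y):
    return -y

def sigma(t, y):
    # Returns [sigma_1(t, y), sigma_2(t, y)].
    return [0.5 * math.cos(y), math.sin(t)]

def V(t, y):
    # Combined field: first entry acts on the time basis vector,
    # the remaining entries act on the R^d basis vectors.
    return [mu(t, y)] + sigma(t, y)

def euler_separate(t, y, dt, dW):
    """y + mu dt + sum_i sigma_i dW^i."""
    return y + mu(t, y) * dt + sum(s * w for s, w in zip(sigma(t, y), dW))

def euler_combined(t, y, dt, dW):
    """y + V(t, y) applied to the augmented increment (dt, dW)."""
    d_aug = [dt] + list(dW)
    return y + sum(v * x for v, x in zip(V(t, y), d_aug))

def run(step, n=200, T=1.0, y0=1.0, seed=0):
    rng = random.Random(seed)  # same seed -> same Brownian increments for both schemes
    dt = T / n
    t, y = 0.0, y0
    for _ in range(n):
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(2)]
        y = step(t, y, dt, dW)
        t += dt
    return y

y_sep = run(euler_separate)
y_comb = run(euler_combined)
```

Only the bookkeeping differs: the combined form treats $dt$ as one more coordinate of the driving increment.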
For each finite $N$, the path $\widetilde{W}^N$ is piecewise linear with finitely many pieces and hence has finite $1$-variation, so its canonical bounded-variation lift is well-defined: the second level is the iterated Riemann-Stieltjes integral $\widetilde{W}^{N,2}_{s,t} = \int_s^t (\widetilde{W}^N_r - \widetilde{W}^N_s) \otimes d\widetilde{W}^N_r$. Hence the CDE coincides with the RDE driven by this lift.
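For a piecewise-linear path the canonical lift can be computed exactly with Chen's identity: a linear segment with increment $\Delta$ has second level $\tfrac12 \Delta \otimes \Delta$, and concatenation adds (first level so far) $\otimes\,\Delta$. The sketch below is a minimal illustration, not a library routine; the name `level2_signature` and the sample points are hypothetical. It checks two facts used in this proof: the time-time entry equals $T^2/2$, and the symmetric part of the second level is $\tfrac12 X^1 \otimes X^1$, the geometric property of canonical lifts.

```python
def level2_signature(points):
    """First and second signature levels of the piecewise-linear path through `points`.

    X2[i][j] is exactly int_0^T (X^i_r - X^i_0) dX^j_r for the piecewise-linear path,
    accumulated segment by segment via Chen's identity.
    """
    m = len(points[0])
    X1 = [0.0] * m
    X2 = [[0.0] * m for _ in range(m)]
    for prev, cur in zip(points, points[1:]):
        D = [c - q for c, q in zip(cur, prev)]
        for i in range(m):
            for j in range(m):
                # old level 2 + (level 1 so far) x D + the segment's own D x D / 2
                X2[i][j] += X1[i] * D[j] + 0.5 * D[i] * D[j]
        for i in range(m):
            X1[i] += D[i]
    return X1, X2

# Time-augmented samples (t_k, W^N_{t_k}) on [0, 1]; the values are made up.
points = [(0.0, 0.0), (0.25, 0.4), (0.5, -0.1), (0.75, 0.2), (1.0, 0.3)]
X1, X2 = level2_signature(points)
```

The entry `X2[0][1]` is the cross integral $\int_0^1 r\,dW^N_r$ that reappears in Step 2.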
[/guided]
[/step]
[step:Convergence of the time-augmented enhanced signal in $p$-variation]
By the *Brownian Motion as a Rough Path* theorem, for any $p \in (2, 3)$ the canonical lifts $W^N$ of the piecewise-linear interpolations converge in $p$-variation to the Stratonovich enhanced Brownian motion $W$ almost surely. The deterministic time component $t \mapsto t$ has $1$-variation $T$ on $[0,T]$, hence finite $p$-variation for every $p \geq 1$, and is identical for every $N$. Pairing this deterministic component with the random Brownian component preserves $p$-variation convergence; the only new second-level entries are the cross integrals
\begin{align*}
\int_s^t (r - s)\,dW^{N,i}_r \xrightarrow{N \to \infty} \int_s^t (r - s) \circ\,dW^i_r
\end{align*}
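converge almost surely, uniformly in $(s,t)$ and in fact in $p$-variation: integrating by parts against the bounded-variation integrand $r - s$ expresses them through $W^N$ and $\int_s^t (W^N_r - W^N_s)\,dr$, and continuity of the Young integral (the integrand has finite $1$-variation and $1 + 1/p > 1$) transfers the convergence $W^N \to W$. The same holds for the other cross integral $\int_s^t (W^{N,i}_r - W^{N,i}_s)\,dr$. Therefore the full $\mathbb{R}^{1+d}$-valued enhanced signal satisfies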
\begin{align*}
\widetilde{W}^N \xrightarrow{p\text{-var}} \widetilde{W} \qquad \text{almost surely, for every } p \in (2,3),
\end{align*}
where $\widetilde{W}$ is the Stratonovich lift of $(t, W_t)_{t \in [0,T]}$.
[guided]
We need a single rough-path convergence statement for the time-augmented signal $\widetilde{W}^N = (t, W^N_t)$. The strategy is to handle the time and Brownian components separately and then assemble the joint enhancement.
For the Brownian component: the *Brownian Motion as a Rough Path* theorem says that the Stratonovich rough lift $W$, defined by $W^{2,ij}_{s,t} = \int_s^t (W^i_r - W^i_s)\circ\,dW^j_r$, is the almost-sure $p$-variation limit, for any $p \in (2,3)$, of the canonical bounded-variation lifts of the piecewise-linear interpolations $W^N$. Two first-level facts underlie it: the interpolations $W^N$ converge uniformly to $W$ on $[0,T]$ for every sequence of partitions whose mesh tends to zero (by uniform continuity of the Brownian path), and Brownian paths are almost surely $\alpha$-Hölder for every $\alpha < 1/2$, hence of finite $p$-variation for every $p > 2$. The genuinely probabilistic content of the theorem is the convergence of the second-level areas, which these first-level facts alone do not provide.
For the time component: the path $t \mapsto t$ is the same for every $N$; it is a deterministic affine path of finite $1$-variation, with $\|t\|_{1\text{-var};[0,T]} = T$. Its canonical second-level enhancement is the elementary iterated integral $\int_s^t (r-s)\,dr = (t-s)^2/2$, again deterministic and identical across $N$. So in $p$-variation the time-component contribution does not depend on $N$ and trivially equals its limit.
For the joint enhancement: an enhancement of $\widetilde{W}^N = (t, W^N_t)$ in $\mathbb{R}^{1+d}$ has three kinds of second-level entries: the time-time block ($\int(r-s)\,dr$), the Brownian-Brownian block (the second level of the canonical lift of $W^N$), and the cross blocks $\int_s^t (r - s)\,dW^{N,i}_r$ together with $\int_s^t (W^{N,i}_r - W^{N,i}_s)\,dr$. The diagonal blocks are handled above. For the cross blocks, integration by parts gives $\int_s^t (r-s)\,dW^{N,i}_r = (t-s)(W^{N,i}_t - W^{N,i}_s) - \int_s^t (W^{N,i}_r - W^{N,i}_s)\,dr$, so both cross blocks are expressed through $W^N$ and its integral against $dr$; they converge uniformly in $(s,t)$ almost surely because $W^N \to W$ uniformly, and continuity of the Young integral against the bounded-variation time component upgrades this to convergence in $p$-variation. Concretely,
\begin{align*}
\int_s^t (r - s)\,dW^{N,i}_r \xrightarrow{N \to \infty} \int_s^t (r - s) \circ\,dW^i_r
\end{align*}
uniformly in $(s,t) \in [0,T]^2$ almost surely, and likewise for $\int_s^t (W^{N,i}_r - W^{N,i}_s)\,dr$. Combining the three blocks, the full $\mathbb{R}^{1+d}$-valued enhanced signal satisfies
\begin{align*}
\widetilde{W}^N \xrightarrow{p\text{-var}} \widetilde{W} \qquad \text{almost surely, for every } p \in (2,3),
\end{align*}
where $\widetilde{W}$ is the Stratonovich lift of $(t, W_t)_{t \in [0,T]}$. This is the convergence statement we need to feed into the Universal Limit Theorem in the next step.
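The integration-by-parts identity $\int_s^t (r-s)\,dW^N_r = (t-s)(W^N_t - W^N_s) - \int_s^t (W^N_r - W^N_s)\,dr$ behind the cross blocks can be checked exactly on a piecewise-linear path, because on each linear segment both integrals are computed exactly by the midpoint and trapezoid rules. The grid, the path values, and the helper names below are hypothetical, and the check takes $s$ and $t$ to be grid points.

```python
def cross_integral_time_dW(times, w, k0, k1):
    """Exact int_s^t (r - s) dW^N_r for piecewise-linear W^N, with s = times[k0], t = times[k1]."""
    s = times[k0]
    total = 0.0
    for k in range(k0, k1):
        dw = w[k + 1] - w[k]
        # On a linear segment dW^N_r = (dw / dr) dr, so the integral is dw times the mean of (r - s).
        total += dw * 0.5 * ((times[k] - s) + (times[k + 1] - s))
    return total

def cross_integral_W_dt(times, w, k0, k1):
    """Exact int_s^t (W^N_r - W^N_s) dr (the trapezoid rule is exact for linear integrands)."""
    ws = w[k0]
    total = 0.0
    for k in range(k0, k1):
        dr = times[k + 1] - times[k]
        total += dr * 0.5 * ((w[k] - ws) + (w[k + 1] - ws))
    return total

times = [0.0, 0.1, 0.3, 0.45, 0.7, 1.0]
w = [0.0, 0.2, -0.35, 0.1, 0.05, -0.4]
k0, k1 = 1, 5
lhs = cross_integral_time_dW(times, w, k0, k1)
rhs = (times[k1] - times[k0]) * (w[k1] - w[k0]) - cross_integral_W_dt(times, w, k0, k1)
```

Since the right-hand side involves only $W^N$ and a Riemann integral, uniform convergence of $W^N$ passes directly to the cross blocks.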
[/guided]
[/step]
[step:Verify the Universal Limit Theorem hypotheses]
The Universal Limit Theorem requires the vector field driving the RDE to have $\mathrm{Lip}^\gamma$ regularity for some $\gamma > p$. Examine $V$:
\begin{align*}
V(t, y) = (\mu(t, y),\, \sigma_1(t, y),\, \ldots,\, \sigma_d(t, y)).
\end{align*}
The $\sigma$-block has regularity $\mathrm{Lip}^{2+\varepsilon}$ by hypothesis. The $\mu$-block is only $\mathrm{Lip}^1$ by hypothesis, but it acts only against the time component $dt$, which is a bounded-variation ($1$-variation) signal. Results on RDEs with drift, where the driving signal is the Young pairing of a rough path with a bounded-variation path, allow vector fields acting against the bounded-variation component to have lower regularity; $\mathrm{Lip}^1$ suffices for the drift. The regularity that must exceed $p$ is therefore that of the $\sigma$-block, namely $\mathrm{Lip}^{2+\varepsilon}$.
Choose $p \in (2, \min\{3,\, 2+\varepsilon\})$; this interval is non-empty because $\varepsilon > 0$, and the cap at $3$ keeps $p$ in the range $(2,3)$ used in Step 2. Then $\gamma = 2 + \varepsilon > p$, and the Universal Limit Theorem applies to the RDE driven by $\widetilde{W}$.
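Selecting $p$ is simple arithmetic; the sketch below (with the hypothetical helper `choose_p`) returns the midpoint of $(2, \min\{3, 2+\varepsilon\})$. Capping the upper end at $3$ keeps $p$ inside the range $(2,3)$ for which the convergence of Step 2 was stated, which matters when $\varepsilon \geq 1$.

```python
def choose_p(eps):
    """Midpoint of the admissible interval (2, min(3, 2 + eps)) for the p-variation exponent."""
    if eps <= 0:
        raise ValueError("need eps > 0 for the interval to be non-empty")
    lo, hi = 2.0, min(3.0, 2.0 + eps)
    return 0.5 * (lo + hi)

for eps in (0.1, 0.5, 2.0):
    p = choose_p(eps)
    gamma = 2.0 + eps  # regularity exponent of sigma
    assert 2.0 < p < 3.0 and gamma > p
```

Any point of the open interval would do; the midpoint is just a convenient deterministic choice.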
[guided]
The Universal Limit Theorem says: if a sequence of geometric $p$-rough paths converges in $p$-variation, then the corresponding RDE solutions also converge, **provided** the driving vector field is $\mathrm{Lip}^\gamma$ with $\gamma > p$. We have to check this regularity condition for our combined vector field $V$.
There is a subtlety: $V$ has two blocks of regularity. The diffusion block $\sigma$ acts against the genuine rough Brownian component; for that block the Universal Limit Theorem demands $\gamma > p > 2$. The hypothesis $\sigma \in \mathrm{Lip}^{2+\varepsilon}$ delivers $\gamma = 2 + \varepsilon$. By picking $p$ close enough to $2$, specifically $p \in (2,\, \min\{3, 2+\varepsilon\})$, which is non-empty because $\varepsilon > 0$, we get $\gamma > p$ as required while staying in the range $(2,3)$ of Step 2.
The drift block $\mu$ acts only against the time component, which has $1$-variation. Vector fields acting against a $1$-variation signal need only be $\mathrm{Lip}^1$ for Young-style integration to work — there is no rough correction term to worry about for the drift, since $t \mapsto t$ is smooth. The hypothesis $\mu \in \mathrm{Lip}^1$ is exactly the threshold needed.
Why is the threshold $\gamma > p$ in the Universal Limit Theorem? Heuristically, a $p$-rough path encodes information up to level $\lfloor p \rfloor$ of the signature, and for $p \in (2, 3)$ that means the second level (Lévy area). The RDE solution is built from local expansions of order $\lfloor p \rfloor$, and for a $\mathrm{Lip}^\gamma$ vector field the local error over $[s,t]$ is of size $\omega(s,t)^{\gamma/p}$, where $\omega$ is the control of the rough path. These local errors sum to something that vanishes as the partition is refined only when $\gamma/p > 1$, that is, $\gamma > p$. For $p$ close to $2$ this is barely more than $C^2$ regularity, which is what $\mathrm{Lip}^{2+\varepsilon}$ gives.
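One way to see why $\gamma > p$ is the natural threshold: if each local error of an order-$\lfloor p \rfloor$ expansion is of size $\omega(s,t)^{\gamma/p}$, then with the control $\omega(s,t) = t - s$ on $[0,1]$ and a uniform partition into $n$ pieces, the errors add up to $n^{1 - \gamma/p}$, which tends to $0$ exactly when $\gamma > p$. This toy calculation (with the hypothetical helper `summed_local_error`) is only a caricature of the actual estimates behind the Universal Limit Theorem.

```python
def summed_local_error(n, gamma, p):
    """Total of n local errors omega(s,t)**(gamma/p) over a uniform partition of [0,1]."""
    theta = gamma / p
    mesh = 1.0 / n
    return n * mesh ** theta

# p fixed at 2.2: gamma below, equal to, and above p.
for gamma in (2.0, 2.2, 2.5):
    print(gamma, [summed_local_error(n, gamma, 2.2) for n in (10, 1000, 100000)])
```

For $\gamma < p$ the totals grow with $n$, for $\gamma = p$ they stay at $1$, and for $\gamma > p$ they decay.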
[/guided]
[/step]
[step:Apply the Universal Limit Theorem to obtain uniform convergence]
By Steps 1-3, all hypotheses of the Universal Limit Theorem are met:
(i) $\widetilde{W}^N$ are geometric $p$-rough paths (canonical lifts of bounded-variation paths are geometric);
(ii) $\widetilde{W}^N \xrightarrow{p\text{-var}} \widetilde{W}$ almost surely, with $\widetilde{W}$ a geometric $p$-rough path;
(iii) the vector field $V$ satisfies the required Lip-regularity for this $p$ (Step 3);
(iv) the initial condition $y_0^N = a$ is constant in $N$.
Therefore the RDE solutions $y^N$ driven by $\widetilde{W}^N$ converge uniformly on $[0,T]$ almost surely to the unique RDE solution $y$ driven by $\widetilde{W}$:
\begin{align*}
\sup_{t \in [0,T]} \|y^N_t - y_t\| \xrightarrow{N \to \infty} 0 \qquad \text{almost surely}.
\end{align*}
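By Step 1, $y^N$ is exactly the solution of the original CDE, so this is the uniform almost-sure convergence claimed in the theorem, once the limit $y$ is identified with the strong Stratonovich solution in the final step.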
[guided]
We now combine the ingredients of Steps 1-3 to feed the Universal Limit Theorem and extract the convergence claim.
The Universal Limit Theorem (Lyons' continuity theorem for RDEs) states: for a sequence of geometric $p$-rough paths $x^N$ converging in $p$-variation to a geometric $p$-rough path $x$, and a vector field $V$ with $\mathrm{Lip}^\gamma$ regularity ($\gamma > p$), the corresponding RDE solutions converge in $p$-variation, and in particular uniformly. We verify each hypothesis explicitly.
(i) *Geometric rough paths.* Each $\widetilde{W}^N$ has finite $1$-variation, so its canonical bounded-variation lift is a geometric $p$-rough path; such lifts are the prototypical geometric rough paths. The limit $\widetilde{W}$ is the Stratonovich lift of Brownian motion with appended deterministic time, which is geometric because it is the $p$-variation limit of canonical lifts of bounded-variation paths.
(ii) *$p$-variation convergence.* This is exactly Step 2: $\widetilde{W}^N \xrightarrow{p\text{-var}} \widetilde{W}$ almost surely.
(iii) *Vector field regularity.* By Step 3, choosing $p \in (2, \min\{3, 2+\varepsilon\})$ ensures that the diffusion block is $\mathrm{Lip}^{\gamma}$ with $\gamma = 2 + \varepsilon > p$. The drift block, acting only against the bounded-variation time component, requires only $\mathrm{Lip}^1$, which the hypothesis provides.
(iv) *Initial condition.* The constant initial condition $y_0^N = a$ is a constant sequence in $\mathbb{R}^e$ converging to $a$.
The Universal Limit Theorem then gives that the RDE solution map is continuous in $p$-variation, hence
\begin{align*}
y^N \xrightarrow{p\text{-var}} y \qquad \text{almost surely on } [0,T],
\end{align*}
where $y$ solves $dy_t = V(t, y_t)\,d\widetilde{W}_t$, $y_0 = a$. Since $y^N_0 = y_0 = a$, the two-point partition $\{0, t\}$ gives $\|y^N_t - y_t\| \leq \|y^N - y\|_{p\text{-var};[0,T]}$ for every $t \in [0,T]$, so $p$-variation convergence implies uniform convergence and we obtain
\begin{align*}
\sup_{t \in [0,T]} \|y^N_t - y_t\| \xrightarrow{N \to \infty} 0 \qquad \text{almost surely}.
\end{align*}
By Step 1, the RDE for $y^N$ is exactly the original CDE, so this is the uniform almost-sure convergence claimed in the theorem statement — modulo identifying $y$ with the strong Stratonovich SDE solution, which is the content of the next step.
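The fact used above, that the $p$-variation controls the supremum distance when the initial values agree, holds because the two-point partition $\{0, t\}$ is one of the partitions in the supremum defining the $p$-variation. The sketch below computes the $p$-variation of a finite scalar sequence exactly by dynamic programming over all sub-partitions, in $O(n^2)$ time; `p_variation` is a hypothetical illustrative helper, not code from any rough-path package, and the sample values stand in for a discretised difference path $z = y^N - y$.

```python
def p_variation(z, p):
    """Exact p-variation of the finite sequence z (floats), for p >= 1.

    best[j] is the largest sum of |z[t_{k+1}] - z[t_k]|**p over index chains ending at j.
    """
    n = len(z)
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + abs(z[j] - z[i]) ** p for i in range(j))
    return max(best) ** (1.0 / p)

# Hypothetical samples of a difference path z = y^N - y with z[0] = 0.
z = [0.0, 0.5, -1.2, 0.3, 0.9, -0.1]
sup_norm = max(abs(v - z[0]) for v in z)
pvar = p_variation(z, 2.5)
```

Every two-point chain $(0, j)$ is among the chains considered, so `pvar` is always at least `sup_norm`.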
[/guided]
[/step]
[step:Identify the RDE limit $y$ with the strong Stratonovich SDE solution]
It remains to show that $y$, the RDE solution driven by $\widetilde{W}$, coincides with the strong solution $\bar{y}$ of the Stratonovich SDE
\begin{align*}
d\bar{y}_t = \mu(t, \bar{y}_t)\,dt + \sum_{i=1}^d \sigma_i(t, \bar{y}_t) \circ\,dW^i_t, \qquad \bar{y}_0 = a.
\end{align*}
By construction of the Stratonovich enhanced Brownian motion $W$, the second-level component of $\widetilde{W}$ between any $0 \leq s \leq t \leq T$ is the Stratonovich iterated integral
\begin{align*}
\widetilde{W}^{2,ij}_{s,t} = \int_s^t (\widetilde{W}^i_r - \widetilde{W}^i_s) \circ\,d\widetilde{W}^j_r,
\end{align*}
which is the $L^2$-limit (and the almost-sure limit, along a subsequence) of the Riemann-Stieltjes iterated integrals of $\widetilde{W}^N$. Now recall how the RDE solution map is characterized: by universality, the solution map $x \mapsto y(x)$ is the unique continuous extension, to the space of geometric $p$-rough paths, of the solution map for canonical lifts of bounded-variation paths. Applied to the canonical lift of $\widetilde{W}^N$, it returns the classical CDE solution $y^N$, which is precisely the Wong-Zakai approximation to the Stratonovich SDE.
The classical Wong-Zakai theorem states that, for piecewise-linear interpolations $W^N$ of $W$, the solutions to the random ODE $dy^N_t = \mu(t, y^N_t)\,dt + \sigma(t, y^N_t)\,dW^N_t$ converge uniformly on $[0,T]$ to the strong Stratonovich solution $\bar{y}$; in the multidimensional case the classical statements give convergence in probability, and passing to an almost surely convergent subsequence suffices here. Hence $y^N \to \bar{y}$ almost surely along a subsequence. By Step 4, $y^N \to y$ almost surely, and therefore also along that subsequence. Uniqueness of limits in the topology of uniform convergence forces
\begin{align*}
y = \bar{y} \qquad \text{almost surely on } [0,T].
\end{align*}
This identifies the RDE solution with the strong Stratonovich solution, completing the proof.
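A one-dimensional linear example makes the Wong-Zakai identification concrete. Take $d = e = 1$, $\mu = 0$ and $\sigma(t, y) = s\,y$ for a constant $s$ (a hypothetical choice; this $\sigma$ is unbounded, so it lies outside the $\mathrm{Lip}^{2+\varepsilon}$ hypothesis and serves only as an illustration). The Stratonovich solution is $\bar{y}_t = a\,e^{s W_t}$, whereas the Itô equation $dy = s\,y\,dW$ has solution $a\,e^{s W_t - s^2 t/2}$. Solving the ODE driven by the piecewise-linear $W^N$ with a classical Runge-Kutta scheme reproduces the Stratonovich value at $t = T$ (where $W^N_T = W_T$), not the Itô one.

```python
import math
import random

def solve_piecewise_linear_ode(a, s, increments, T, substeps=20):
    """RK4 for dy/dt = s * y * dW^N/dt along a piecewise-linear path with the given increments."""
    dt = T / len(increments)
    h = dt / substeps
    y = a
    for dw in increments:
        slope = dw / dt  # W^N is linear on each segment

        def rhs(v):
            return s * v * slope

        for _ in range(substeps):
            k1 = rhs(y)
            k2 = rhs(y + 0.5 * h * k1)
            k3 = rhs(y + 0.5 * h * k2)
            k4 = rhs(y + h * k3)
            y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return y

T, a, s, n = 1.0, 1.0, 1.0, 1000
rng = random.Random(0)
increments = [rng.gauss(0.0, math.sqrt(T / n)) for _ in range(n)]
W_T = sum(increments)

y_N = solve_piecewise_linear_ode(a, s, increments, T)
stratonovich = a * math.exp(s * W_T)
ito = a * math.exp(s * W_T - 0.5 * s * s * T)
```

The ratio `y_N / ito` is close to $e^{s^2 T/2}$, which is exactly the Stratonovich correction in this example.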
[guided]
We have produced two limits of the same sequence $(y^N)$:
- Step 4 (rough-path side): $y^N \to y$ uniformly on $[0,T]$ a.s., where $y$ is the RDE solution driven by $\widetilde{W}$.
- Wong-Zakai (classical side): $y^N \to \bar{y}$ uniformly on $[0,T]$, almost surely along a subsequence, where $\bar{y}$ is the strong Stratonovich SDE solution.
These are limits of the same random functions $y^N$, so by uniqueness of limits in the metric space $C([0,T]; \mathbb{R}^e)$ with the uniform topology, $y = \bar{y}$ almost surely.
Why is the rough lift of $W$ specifically the **Stratonovich** lift, and not the Itô lift? The Stratonovich lift coincides with the limit (in $L^2$, and almost surely along a subsequence) of the iterated Riemann-Stieltjes integrals of the piecewise-linear approximations:
\begin{align*}
W^{2,ij}_{s,t} = \lim_{N \to \infty} \int_s^t (W^{N,i}_r - W^{N,i}_s)\,dW^{N,j}_r.
\end{align*}
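The Itô lift would instead be the limit of left-endpoint Riemann sums against $dW$, and its second level differs from the Stratonovich one by the correction $\frac{1}{2}\delta_{ij}(t-s)$: the Itô integral $\int_s^t (W^i_r - W^i_s)\,dW^j_r$ equals $W^{2,ij}_{s,t} - \frac{1}{2}\delta_{ij}(t-s)$. In particular the Itô lift is not geometric, since its symmetric part is not $\frac{1}{2}(W_t - W_s)^{\otimes 2}$. Because we approximate by piecewise-linear paths and take Riemann-Stieltjes integrals, we get Stratonovich, not Itô, and this matches the classical Wong-Zakai theorem, which also produces Stratonovich limits from piecewise-linear approximations.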
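The $\frac{1}{2}\delta_{ij}(t-s)$ correction can be seen numerically in one dimension, where only the diagonal case $i = j$ occurs. On a fine grid, the midpoint Riemann sum of $\int_0^T (W_r - W_0)\,dW_r$ equals the exact Riemann-Stieltjes integral along the piecewise-linear interpolation, namely $\frac12 W_T^2$, while the left-endpoint (Itô) sum differs from it by half the sum of squared increments, which is close to $T/2$. The grid size and seed below are arbitrary choices for illustration.

```python
import math
import random

def iterated_integral_sums(n, T, seed):
    """Left-endpoint (Ito) and midpoint (piecewise-linear / Stratonovich) sums for int W dW."""
    rng = random.Random(seed)
    dt = T / n
    W = 0.0
    ito = strat = quad_var = 0.0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))
        ito += W * dW                 # integrand frozen at the left endpoint
        strat += (W + 0.5 * dW) * dW  # exact integral along the linear segment
        quad_var += dW * dW
        W += dW
    return ito, strat, quad_var, W

T = 1.0
ito, strat, qv, W_T = iterated_integral_sums(200_000, T, seed=1)
```

Algebraically the difference `strat - ito` is exactly `qv / 2`; the randomness enters only through how close the realised quadratic variation `qv` is to $T$.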
So the chain of identifications is: piecewise-linear-driven CDE $\Leftrightarrow$ RDE driven by canonical lift $\Rightarrow$ (taking limits in rough-path topology) RDE driven by Stratonovich-enhanced Brownian motion $=$ (by Wong-Zakai) strong Stratonovich SDE. The rough-path framework gives a deterministic, pathwise meaning to the SDE, with the stochastic content concentrated in defining $W$ once and for all.
[/guided]
[/step]