Universality of Linear Neural CDEs — Statement & Proof

Theorem

Proof

[proofplan] The strategy is to realise every truncated-signature linear functional as a linear neural CDE. Fix $N \in \mathbb{N} \cup \{0\}$ and a covector $g \in T^N(\mathbb{R}^d)^*$, and consider the map $\Phi_g : x \mapsto g(\pi_{\le N} S(x)_{0,T})$. The truncated signature itself solves a linear CDE on the truncated tensor algebra with the linear vector field $f(y) a := \pi_{\le N}(\mathfrak{i}_{\le N}(y) \cdot \mathfrak{i}_1(a))$ from $y_0 = (1, 0, \ldots, 0)$, so the identification $T^N(\mathbb{R}^d) \cong \mathbb{R}^M$ with $M = \sum_{k=0}^{N} d^k$ (which equals $(d^{N+1}-1)/(d-1)$ for $d \ge 2$) exhibits $\Phi_g\big|_K$ as an element of $\mathcal{B}_K$. The class $\mathcal{A}_K := \{\Phi_g\big|_K : N, g\}$ is a point-separating subalgebra of $C(K)$ that contains the constants, so the Stone-Weierstrass Theorem (see Rudin, *Real and Complex Analysis*) gives $\overline{\mathcal{A}_K} = C(K)$. Since $\mathcal{A}_K \subseteq \mathcal{B}_K \subseteq C(K)$, density transfers to $\mathcal{B}_K$. [/proofplan]

[step:Realise the truncated signature as the solution of a linear CDE on the truncated tensor algebra] Fix $N \in \mathbb{N} \cup \{0\}$. Let $T^N(\mathbb{R}^d) := \bigoplus_{k=0}^{N} (\mathbb{R}^d)^{\otimes k}$ be the truncated tensor algebra, equipped with the truncated tensor product $\cdot_N$ that discards components of degree $> N$. As a vector space $T^N(\mathbb{R}^d) \cong \mathbb{R}^M$ with $M = (d^{N+1}-1)/(d-1)$. Let $\mathfrak{i}_{\le N} : T^N(\mathbb{R}^d) \to T^N(\mathbb{R}^d)$ be the identity inclusion and $\mathfrak{i}_1 : \mathbb{R}^d \to T^N(\mathbb{R}^d)$ the canonical embedding sending $a \in \mathbb{R}^d$ to its degree-one component. Define the linear map
\begin{align*} f : T^N(\mathbb{R}^d) &\to \mathcal{L}(\mathbb{R}^d, T^N(\mathbb{R}^d)) \\ y &\mapsto \bigl(a \mapsto \pi_{\le N}(\mathfrak{i}_{\le N}(y) \cdot \mathfrak{i}_1(a))\bigr).
\end{align*}
This $f$ is linear in $y$ (the truncated tensor product is bilinear) and the inner expression $a \mapsto y \cdot a$ is linear in $a$, so $f \in \mathcal{L}(T^N(\mathbb{R}^d), \mathcal{L}(\mathbb{R}^d, T^N(\mathbb{R}^d)))$. Under the identification $T^N(\mathbb{R}^d) \cong \mathbb{R}^M$, this is exactly an element of $\mathcal{L}(\mathbb{R}^M, \mathcal{L}(\mathbb{R}^d, \mathbb{R}^M))$, which is the type required by the definition of $\mathcal{B}_K$ in the theorem statement.

Set $y_0 := (1, 0, \ldots, 0) \in T^N(\mathbb{R}^d)$ (the unit of the tensor algebra). For $x \in C_{1,0,t}$, the [Truncated Signature](/page/Truncated%20Signature) $Y_t := \pi_{\le N} S(x)_{0,t}$ satisfies the linear differential equation
\begin{align*} dY_t = Y_t \cdot dx_t = f(Y_t)\, dx_t, \qquad Y_0 = y_0, \end{align*}
which is exactly the linear CDE with vector field $f$ started at $y_0$. We verify the hypotheses of the Existence and Uniqueness of CDE Solutions (Friz-Victoir, *Multidimensional Stochastic Processes as Rough Paths*, 2010, Theorem 10.14) for Lipschitz vector fields driven by bounded-variation paths: $f$ is linear, hence globally Lipschitz (its derivative is constant), and $x \in C_{1,0,t}$ has bounded $1$-variation. The result therefore applies, and the solution is unique. In particular, writing $y$ for this unique solution and evaluating at $t = T$,
\begin{align*} y_T = \pi_{\le N} S(x)_{0,T}. \end{align*}

[guided] The strategy of the universality proof is to embed every truncated-signature linear functional into the class $\mathcal{B}_K$ of linear neural CDEs. The first step is to realise the truncated signature itself as the solution of a linear CDE; this is the algebraic content that makes the embedding possible. The truncated signature $Y_t = \pi_{\le N} S(x)_{0,t}$ takes values in the truncated tensor algebra $T^N(\mathbb{R}^d)$.
It is built from iterated Riemann-Stieltjes integrals of the path $x$, and it satisfies a fundamental recursion, the infinitesimal form of **Chen's relation** $S(x)_{0,t} = S(x)_{0,s} \otimes S(x)_{s,t}$:
\begin{align*} S(x)_{0,t} = 1 + \int_0^t S(x)_{0,s} \otimes dx_s. \end{align*}
Truncating at level $N$, and observing that $\pi_{\le N}(\xi \otimes a) = \pi_{\le N}(\pi_{\le N}(\xi) \cdot_N a)$ for $a \in \mathbb{R}^d$ embedded in degree one, we obtain the recursion
\begin{align*} Y_t = (1, 0, \ldots, 0) + \int_0^t \pi_{\le N}(\mathfrak{i}_{\le N}(Y_s) \cdot \mathfrak{i}_1(dx_s)), \end{align*}
which is the integral form of a linear CDE on $T^N(\mathbb{R}^d)$. In differential form,
\begin{align*} dY_t = f(Y_t)\,dx_t, \qquad Y_0 = (1, 0, \ldots, 0), \end{align*}
with $f: T^N(\mathbb{R}^d) \to \mathcal{L}(\mathbb{R}^d, T^N(\mathbb{R}^d))$ given by $f(y)\,a := \pi_{\le N}(\mathfrak{i}_{\le N}(y) \cdot \mathfrak{i}_1(a))$.

We verify that $f$ is a linear vector field of the type required by the definition of $\mathcal{B}_K$. Linearity in $y$: the truncated tensor product is bilinear, so for fixed $a$, $y \mapsto \pi_{\le N}(y \cdot a)$ is linear in $y$. Linearity in $a$: $\mathfrak{i}_1$ is linear by construction, and the truncated tensor product is again bilinear, so for fixed $y$, $a \mapsto \pi_{\le N}(y \cdot a)$ is linear in $a$. Hence $(y, a) \mapsto f(y)\,a$ is *bilinear*, or equivalently $f$ is an element of $\mathcal{L}(T^N(\mathbb{R}^d), \mathcal{L}(\mathbb{R}^d, T^N(\mathbb{R}^d)))$.

We verify the existence-and-uniqueness hypotheses of the [Existence and Uniqueness of CDE Solutions](/theorems/???): (i) the vector field $f$ is linear, hence $C^\infty$ with constant first derivative, so it is globally Lipschitz; (ii) the driver $x \in C_{1,0,t}$ has bounded $1$-variation by the definition of the path space. Both hypotheses are met. Hence the linear CDE has a unique global solution $Y$, and by the recursion above $Y_t = \pi_{\le N} S(x)_{0,t}$ is precisely this solution.
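The identification of the truncated signature with the CDE solution can be checked numerically. The following is a minimal sketch and not part of the proof: it assumes a piecewise-linear driver given by its segment increments, stores elements of $T^N(\mathbb{R}^d)$ as dictionaries indexed by words (a representation chosen here purely for illustration), and integrates $dY_t = f(Y_t)\,dx_t$ with an Euler scheme. The helper names `words`, `f`, `solve_linear_cde` and `signature_level_two` are hypothetical.

```python
from itertools import product


def words(d, N):
    """All words over {0, ..., d-1} of length <= N; they index a basis of T^N(R^d)."""
    return [w for k in range(N + 1) for w in product(range(d), repeat=k)]


def f(y, a, N):
    """Vector field f(y)a = pi_{<=N}(y (x) a): append one letter, drop degree > N."""
    out = {w: 0.0 for w in y}
    for w, coeff in y.items():
        if len(w) < N:
            for i, a_i in enumerate(a):
                out[w + (i,)] += coeff * a_i
    return out


def solve_linear_cde(increments, d, N, substeps=2000):
    """Euler scheme for dY = f(Y) dx, Y_0 = (1, 0, ..., 0), on a piecewise-linear path."""
    y = {w: 0.0 for w in words(d, N)}
    y[()] = 1.0  # unit of the tensor algebra
    for delta in increments:
        step = [c / substeps for c in delta]
        for _ in range(substeps):
            dy = f(y, step, N)
            y = {w: y[w] + dy[w] for w in y}
    return y


def signature_level_two(increments, d):
    """Closed-form levels 1 and 2 of the signature of a piecewise-linear path."""
    lvl1 = [0.0] * d
    lvl2 = {(i, j): 0.0 for i in range(d) for j in range(d)}
    for delta in increments:
        for i in range(d):
            for j in range(d):
                # cross term with all earlier segments, plus the within-segment half
                lvl2[(i, j)] += lvl1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        lvl1 = [s + c for s, c in zip(lvl1, delta)]
    return lvl1, lvl2
```

For example, with `increments = [(1.0, 0.5), (1.0, -0.3), (1.0, 0.8)]`, `d = 2` and `N = 2`, the entries of `solve_linear_cde(increments, 2, 2)` agree with the output of `signature_level_two(increments, 2)` up to the Euler discretisation error, which shrinks as `substeps` grows.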
Evaluating at $t = T$ identifies the truncated signature as the terminal value of the linear neural CDE. This is the key reduction: every truncated-signature linear functional is "$g$ applied to the terminal value of a linear neural CDE". In the next step we make the parameter dimension $M = (d^{N+1}-1)/(d-1)$ match the requirement of $\mathcal{B}_K$ via the linear isomorphism $T^N(\mathbb{R}^d) \cong \mathbb{R}^M$. [/guided] [/step]

[step:Identify the parameterised class $\mathcal{B}_K$ with bounded linear vector fields on a compact set] We pause to clarify the identification of $\mathcal{B}_K$ in the theorem statement. The set $\mathcal{B}_K$ is the class of functionals on $K$ parameterised by linear vector fields on $\mathbb{R}^M$ (these are *bounded* as linear operators, since $\mathbb{R}^M$ is finite-dimensional): each element is determined by a triple $(f, g, y_0)$ where $f \in \mathcal{L}(\mathbb{R}^M, \mathcal{L}(\mathbb{R}^d, \mathbb{R}^M))$ ranges over the space of linear vector fields on $\mathbb{R}^M$ driven by $\mathbb{R}^d$-valued increments, $g \in (\mathbb{R}^M)^*$ is a linear readout, and $y_0 \in \mathbb{R}^M$ is the initial condition. When restricted to a compact set $K \subseteq C_{1,0,t}$, the resulting functionals $\Psi_{f,g,y_0}|_K$ are continuous (by the [Universal Limit Theorem](/theorems/2540) applied to the linear vector field $f$, which is smooth and hence locally $\mathrm{Lip}^\gamma$ for every $\gamma$; solutions driven by paths in $K$ stay in a bounded region because the $1$-variation is bounded on the compact set $K$). Hence $\mathcal{B}_K \subseteq C(K)$. No compactness or norm constraint is imposed on the parameter triples $(f, g, y_0)$: for the universality statement we range over *all* admissible triples. In particular, the construction of step 1 produces a triple $(f, g, y_0)$ with $f \in \mathcal{L}(T^N(\mathbb{R}^d), \mathcal{L}(\mathbb{R}^d, T^N(\mathbb{R}^d)))$, $g \in T^N(\mathbb{R}^d)^*$, $y_0 \in T^N(\mathbb{R}^d)$.
Under the identification $T^N(\mathbb{R}^d) \cong \mathbb{R}^M$ with $M = (d^{N+1}-1)/(d-1)$, this triple is exactly an element of the parameter space defining $\mathcal{B}_K$. No extension or modification of the vector field is required: the driver $x$ is $\mathbb{R}^d$-valued in both the construction and the definition of $\mathcal{B}_K$. [/step]

[step:Express every truncated-signature linear functional as some $\Psi_{f,g,y_0}$] For $g \in T^N(\mathbb{R}^d)^* \cong (\mathbb{R}^M)^*$, define
\begin{align*} \Phi_g : C_{1,0,t} &\to \mathbb{R} \\ x &\mapsto g(\pi_{\le N} S(x)_{0,T}). \end{align*}
By step 1, with $f$ and $y_0$ as constructed there, $\pi_{\le N} S(x)_{0,T} = y_T$ where $y$ is the unique solution to $dy_t = f(y_t)\, dx_t$, $y_0 = (1, 0, \ldots, 0)$. Hence
\begin{align*} \Phi_g(x) = g(y_T) = \Psi_{f, g, y_0}(x). \end{align*}
The triple $(f, g, y_0)$ is admissible in the definition of $\mathcal{B}_K$, as established in step 2: $M = (d^{N+1}-1)/(d-1) \in \mathbb{N}$, $g \in (\mathbb{R}^M)^*$, $f \in \mathcal{L}(\mathbb{R}^M, \mathcal{L}(\mathbb{R}^d, \mathbb{R}^M))$, and $y_0 \in \mathbb{R}^M$. Therefore
\begin{align*} \mathcal{A}_K := \bigl\{\Phi_g\big|_K : N \in \mathbb{N} \cup \{0\},\; g \in T^N(\mathbb{R}^d)^*\bigr\} \subseteq \mathcal{B}_K. \end{align*}
[/step]

[step:Apply signature universality on $K$] We invoke the [Universality of the Signature](/theorems/2503) on a compact set $K \subseteq C_{1,0,t}$ with respect to the $1$-variation topology. We verify the hypotheses: (i) $K$ is compact in the $1$-variation topology (theorem hypothesis), and (ii) paths in $C_{1,0,t}$ are time-augmented (they start at $0$ and have a strictly monotone time component), so the [Signature Determines Tree-Like Class (Hambly-Lyons)](/theorems/2509) implies that signatures separate points of $C_{1,0,t}$. Therefore the conditions of [Universality of the Signature](/theorems/2503) are met, and we conclude that $\mathcal{A}_K$ is dense in $C(K)$ in the uniform topology.
[guided] We invoke the [Universality of the Signature](/theorems/2503). The theorem requires (i) $K$ to be compact in the $1$-variation topology and (ii) paths in $K$ to have a time-augmentation that ensures their signatures uniquely determine them. Both hypotheses are verified explicitly: (i) is a standing hypothesis on $K$ in the theorem statement, and (ii) is built into the space $C_{1,0,t}$: these are continuous, bounded-variation paths starting at $0$ with a strictly monotone time channel. The strictly monotone time channel is precisely the augmentation condition required by the [Signature Determines Tree-Like Class (Hambly-Lyons)](/theorems/2509), which states that the signature map is injective on tree-reduced paths and that augmentation by a monotone channel kills the tree-like equivalence classes.

The proof of signature universality is a Stone-Weierstrass argument; we sketch why each ingredient is satisfied:
- $\mathcal{A}_K$ is a linear subspace of $C(K)$ because $\Phi_{\alpha g_1 + \beta g_2} = \alpha \Phi_{g_1} + \beta \Phi_{g_2}$ when $g_1, g_2$ live in the same $T^{N}(\mathbb{R}^d)^*$ (and we can always lift to a common $N$ by extending a covector by zero on the higher degrees).
- $\mathcal{A}_K$ is closed under products: by the shuffle-product identity for signatures, $\Phi_{g_1}(x) \cdot \Phi_{g_2}(x) = \Phi_{g_1 \shuffle g_2}(x)$, where for $g_1 \in T^{N_1}(\mathbb{R}^d)^*$ and $g_2 \in T^{N_2}(\mathbb{R}^d)^*$ the shuffle product $g_1 \shuffle g_2$ is again a linear functional, now on the signature truncated at level $N_1 + N_2$.
- $\mathcal{A}_K$ contains the constants: take $N = 0$ and $g = 1 \in (\mathbb{R})^*$.
- $\mathcal{A}_K$ separates points of $K$: if $x \neq x'$ in $K \subseteq C_{1,0,t}$, then $S(x) \neq S(x')$ by [Hambly-Lyons](/theorems/2509), so $\pi_{\le N} S(x)_{0,T} \neq \pi_{\le N} S(x')_{0,T}$ for some $N$, and a suitable $g \in T^N(\mathbb{R}^d)^*$ separates them.

The Stone-Weierstrass Theorem (see Rudin, *Real and Complex Analysis*, Theorem 4.51) on the compact Hausdorff space $K$ then gives $\overline{\mathcal{A}_K} = C(K)$ in the uniform topology. This is the content of the cited universality result.
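The closure of $\mathcal{A}_K$ under products rests on the shuffle identity, which can be tested on a concrete path. The sketch below is illustrative only and assumes a piecewise-linear path given by its increments, whose truncated signature is computed exactly from Chen's relation and the tensor exponential of each segment. All function names (`tensor_mul`, `tensor_exp`, `signature`, `shuffle`, `shuffle_gap`) are hypothetical helpers introduced here, and words are written as tuples of channel indices.

```python
def tensor_mul(a, b, N):
    """Truncated tensor product of two {word: coefficient} elements of T^N(R^d)."""
    out = {}
    for u, cu in a.items():
        for v, cv in b.items():
            if len(u) + len(v) <= N:
                out[u + v] = out.get(u + v, 0.0) + cu * cv
    return out


def tensor_exp(delta, N):
    """Truncated tensor exponential: the signature of one linear segment."""
    result = {(): 1.0}
    term = {(): 1.0}
    for k in range(1, N + 1):
        letter = {(i,): c / k for i, c in enumerate(delta)}
        term = tensor_mul(term, letter, N)  # now equals delta^{(x)k} / k!
        for w, c in term.items():
            result[w] = result.get(w, 0.0) + c
    return result


def signature(increments, N):
    """Truncated signature of a piecewise-linear path via Chen's relation."""
    sig = {(): 1.0}
    for delta in increments:
        sig = tensor_mul(sig, tensor_exp(delta, N), N)
    return sig


def shuffle(u, v):
    """Shuffle product of two words, returned as {word: multiplicity}."""
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    for w, c in shuffle(u[:-1], v).items():
        out[w + (u[-1],)] = out.get(w + (u[-1],), 0) + c
    for w, c in shuffle(u, v[:-1]).items():
        out[w + (v[-1],)] = out.get(w + (v[-1],), 0) + c
    return out


def shuffle_gap(sig, u, v):
    """S^u * S^v minus the shuffle-expanded sum; requires len(u) + len(v) <= N."""
    return sig[u] * sig[v] - sum(c * sig[w] for w, c in shuffle(u, v).items())
```

With `sig = signature([(1.0, 0.5), (1.0, -0.3), (1.0, 0.8)], 3)`, the value of `shuffle_gap(sig, (0,), (1, 0))` is zero up to floating-point rounding, which is the identity $\Phi_{g_1} \Phi_{g_2} = \Phi_{g_1 \shuffle g_2}$ for the coordinate functionals of the words $0$ and $10$.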
[/guided] [/step]

[step:Transfer density from $\mathcal{A}_K$ to $\mathcal{B}_K$] By the previous step, $\mathcal{A}_K$ is dense in $C(K)$ in the uniform topology. By the inclusions $\mathcal{A}_K \subseteq \mathcal{B}_K \subseteq C(K)$ established in steps 2 and 3, every uniform neighbourhood of an element $\varphi \in C(K)$ contains an element of $\mathcal{A}_K$, hence an element of $\mathcal{B}_K$. Therefore $\mathcal{B}_K$ is dense in $C(K)$.

[guided] The transfer of density is purely set-theoretic. Suppose $\varphi \in C(K)$ and $\varepsilon > 0$. By density of $\mathcal{A}_K$, there exists $\Phi_g\big|_K \in \mathcal{A}_K$ with $\sup_{x \in K} |\varphi(x) - \Phi_g(x)| < \varepsilon$. Since $\mathcal{A}_K \subseteq \mathcal{B}_K$, the same element $\Phi_g\big|_K \in \mathcal{B}_K$ witnesses density of $\mathcal{B}_K$ at $\varphi$. The point of formulating the theorem in terms of the larger class $\mathcal{B}_K$ is practical: $\mathcal{B}_K$ also contains every linear neural CDE that is *not* literally a truncated signature, but the density already follows from the signature subclass alone. [/guided]

This completes the proof that $\mathcal{B}_K$ is dense in $C(K)$ with respect to uniform convergence. [/step]
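
As a concrete instance of the construction used in the proof, consider the smallest non-trivial case. This is a worked sketch under the assumptions $d = 2$ and $N = 2$, with the basis of $T^2(\mathbb{R}^2)$ ordered by the words $\emptyset, 1, 2, 11, 12, 21, 22$; the ordering and the coordinate names $y^{w}$ are choices made here for illustration and are not fixed by the theorem. The hidden dimension is $M = 1 + 2 + 4 = 7$, and the vector field and initial condition of step 1 read
\begin{align*} f(y)\,a = \bigl(0,\; y^{\emptyset} a^1,\; y^{\emptyset} a^2,\; y^{1} a^1,\; y^{1} a^2,\; y^{2} a^1,\; y^{2} a^2\bigr), \qquad y_0 = (1, 0, 0, 0, 0, 0, 0). \end{align*}
Taking the linear readout $g(y) := y^{12} - y^{21}$ gives
\begin{align*} \Phi_g(x) = S(x)^{12}_{0,T} - S(x)^{21}_{0,T}, \end{align*}
which is twice the Lévy area of the two channels. This functional therefore lies in $\mathcal{A}_K \subseteq \mathcal{B}_K$ and is realised by a linear neural CDE with a seven-dimensional hidden state.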

Universality of Linear Neural CDEs (Theorem # 2541)
