[proofplan]
We invoke the [Christmann-Steinwart Universality Criterion](/theorems/???) (Christmann and Steinwart, 2010, Theorem 2.2): for any compact metric space $\hat{\mathcal{K}}$, separable Hilbert space $\mathcal{H}$, continuous injective map $\rho : \hat{\mathcal{K}} \to \mathcal{H}$, and entire function $f : \mathbb{R} \to \mathbb{R}$ (subject to the positivity condition on its Taylor coefficients made precise in Step 1), the kernel $k(x, y) = f(\langle \rho(x), \rho(y)\rangle_\mathcal{H})$ is universal on $\hat{\mathcal{K}}$. Fix a compact $\mathcal{K} \subset \tilde{\mathcal{C}_p}$. We instantiate the criterion with $\hat{\mathcal{K}} = \mathcal{P}(\mathcal{K})$, $\mathcal{H} = \mathcal{H}_\phi$, $\rho = M^\phi$, and the given entire $f$. There are four hypotheses to verify: (i) $\hat{\mathcal{K}}$ is compact metric, (ii) $\mathcal{H}_\phi$ is separable, (iii) $M^\phi$ is continuous, (iv) $M^\phi$ is injective. They follow, respectively, from Prokhorov's theorem together with metrizability by the Lévy-Prokhorov metric, separability of RKHSs of continuous kernels on separable spaces, the theorem that MMD metrizes weak convergence, and characteristicness of the signature kernel.
[/proofplan]
[step:State the Christmann-Steinwart universality criterion]
We invoke [Christmann-Steinwart Universality Criterion](/theorems/???) (Theorem 2.2 of Christmann and Steinwart, 2010): let $\hat{\mathcal{K}}$ be a compact metric space, $\mathcal{H}$ a separable Hilbert space, and $\rho : \hat{\mathcal{K}} \to \mathcal{H}$ a continuous injective map. Let $f : \mathbb{R} \to \mathbb{R}$ be an entire function whose Taylor expansion at the origin, $f(t) = \sum_{n \ge 0} a_n t^n$, has $a_n > 0$ for every $n \ge 0$ (a strengthening of "entire" that is standard in this corner of the literature; the present hypothesis "entire" is to be understood in this sense). Then the kernel
\begin{align*}
k : \hat{\mathcal{K}} \times \hat{\mathcal{K}} &\to \mathbb{R}, & (x, y) &\mapsto f(\langle \rho(x), \rho(y) \rangle_\mathcal{H})
\end{align*}
is universal on $\hat{\mathcal{K}}$: its RKHS is dense in $C(\hat{\mathcal{K}})$ in the uniform norm.
We must verify the four hypotheses for the choice $\hat{\mathcal{K}} = \mathcal{P}(\mathcal{K})$, $\mathcal{H} = \mathcal{H}_\phi$, $\rho = M^\phi$, where $\mathcal{K} \subset \tilde{\mathcal{C}_p}$ is a fixed compact subset.
[/step]
[step:Verify $\mathcal{P}(\mathcal{K})$ is a compact metric space in the weak topology]
By hypothesis $\mathcal{K} \subset \tilde{\mathcal{C}_p}$ is compact; since $\tilde{\mathcal{C}_p}$ is a metric space (the quotient of $\mathcal{C}_p$ by reparameterisation, equipped with the standard quotient metric), $\mathcal{K}$ is itself a compact metric space. By [Prokhorov's Theorem on Compact Metric Spaces](/theorems/???), $\mathcal{P}(\mathcal{K})$ is compact in the topology of weak convergence. Moreover, since $\mathcal{K}$ is a separable metric space, $\mathcal{P}(\mathcal{K})$ is metrizable in the weak topology — for example, by the [Lévy-Prokhorov Metric](/theorems/???). Hence $\mathcal{P}(\mathcal{K})$ is a compact metric space.
[guided]
The Christmann-Steinwart criterion takes its input space $\hat{\mathcal{K}}$ to be a **compact metric space**. Our chosen input space is $\mathcal{P}(\mathcal{K})$, the set of Borel probability measures on $\mathcal{K}$ equipped with the topology of weak convergence. We must check that this is (i) compact and (ii) metrizable.
*Compactness.* The hypothesis is that $\mathcal{K}$ is a compact subset of $\tilde{\mathcal{C}_p}$. Since $\tilde{\mathcal{C}_p}$ is a metric space (the space of unparameterised $p$-rough paths, obtained as the quotient of $\mathcal{C}_p$ by reparameterisation and equipped with the standard quotient metric inherited from $\mathcal{C}_p$), $\mathcal{K}$ inherits a metric structure and is thus a compact metric space.
We invoke [Prokhorov's Theorem](/theorems/???), which states: in a Polish space $X$ (a complete separable metric space), a family $\Gamma \subset \mathcal{P}(X)$ is weakly relatively compact iff it is **tight**, i.e. for every $\varepsilon > 0$ there is a compact $K_\varepsilon \subset X$ with $\mu(K_\varepsilon) \ge 1 - \varepsilon$ for all $\mu \in \Gamma$. When $X$ itself is compact, every family is tight: take $K_\varepsilon = X$, so that $\mu(K_\varepsilon) = 1 \ge 1 - \varepsilon$. So **all** of $\mathcal{P}(\mathcal{K})$ is tight, hence weakly relatively compact. Since $\mathcal{P}(\mathcal{K})$ is the whole space, it coincides with its own closure, and relative compactness is compactness in the weak topology.
We verify Prokhorov's hypotheses: $\mathcal{K}$ is a compact metric space, hence complete (compact metric spaces are complete) and separable (compact metric spaces are separable). So $\mathcal{K}$ is Polish, Prokhorov applies, and $\mathcal{P}(\mathcal{K})$ is weakly compact.
*Metrizability.* For $\mathcal{P}(\mathcal{K})$ to be a *metric* space (not just a topological one), the weak topology must be metrizable. The standard result: when $\mathcal{K}$ is a separable metric space, the weak topology on $\mathcal{P}(\mathcal{K})$ is metrized by the [Lévy-Prokhorov Metric](/theorems/???)
\begin{align*}
d_{LP}(\mu, \nu) := \inf\{\varepsilon > 0 : \mu(A) \le \nu(A^\varepsilon) + \varepsilon \text{ and } \nu(A) \le \mu(A^\varepsilon) + \varepsilon \text{ for all Borel } A\},
\end{align*}
where $A^\varepsilon$ is the open $\varepsilon$-neighbourhood of $A$ in $\mathcal{K}$. The hypothesis we consume here is separability of $\mathcal{K}$, which we already verified above. So $\mathcal{P}(\mathcal{K})$ is metrizable.
Combining: $\mathcal{P}(\mathcal{K})$ is a compact metric space in the weak topology. The Christmann-Steinwart hypothesis (i) holds.
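To make the Lévy-Prokhorov definition concrete, here is a minimal sketch (not part of the proof) that evaluates $d_{LP}$ by brute force on a tiny finite metric space, where every subset is Borel. The function name `levy_prokhorov`, the helper `abs_dist`, and the bisection tolerance are illustrative assumptions; the enumeration is exponential in the number of points and is meant only for toy inputs.

```python
from itertools import combinations

def levy_prokhorov(points, dist, mu, nu, iters=60):
    """Brute-force Levy-Prokhorov distance between two discrete measures.

    points: list of points of a small finite metric space
    dist:   metric on those points
    mu, nu: probability weights, mu[i] = mass at points[i]
    """
    n = len(points)
    # All non-empty subsets A, as index tuples (toy sizes only).
    subsets = [s for r in range(1, n + 1) for s in combinations(range(n), r)]

    def mass(weights, idx):
        return sum(weights[i] for i in idx)

    def feasible(eps):
        for a in subsets:
            # Open eps-neighbourhood of A inside the finite space.
            a_eps = [j for j in range(n)
                     if any(dist(points[i], points[j]) < eps for i in a)]
            if mass(mu, a) > mass(nu, a_eps) + eps:
                return False
            if mass(nu, a) > mass(mu, a_eps) + eps:
                return False
        return True

    # eps = 1 is always feasible and feasibility is monotone in eps,
    # so bisection on [0, 1] approximates the infimum from above.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

def abs_dist(x, y):
    return abs(x - y)

# Two Dirac masses at distance 0.3, and a measure compared with itself.
d_far = levy_prokhorov([0.0, 0.3], abs_dist, [1.0, 0.0], [0.0, 1.0])
d_same = levy_prokhorov([0.0, 0.3], abs_dist, [0.5, 0.5], [0.5, 0.5])
```

In this toy case the distance between the two Dirac masses is governed by the distance between their support points, while a measure is at (numerically) zero distance from itself, as a metric requires.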
[/guided]
[/step]
[step:Verify $\mathcal{H}_\phi$ is separable]
The signature kernel $k_\phi : \mathcal{K} \times \mathcal{K} \to \mathbb{R}$ is continuous (by hypothesis on the signature kernel) and $\mathcal{K}$ is a separable metric space (compact metric implies separable). By the [Separability of RKHS for Continuous Kernels on Separable Spaces](/theorems/???) (e.g. Steinwart and Christmann, *Support Vector Machines*, Lemma 4.33): if the input space is separable and the kernel is continuous, the associated RKHS is separable.
Hence $\mathcal{H}_\phi$ is separable.
[guided]
The Christmann-Steinwart criterion takes its codomain $\mathcal{H}$ to be a **separable Hilbert space**. We use $\mathcal{H} = \mathcal{H}_\phi$, the RKHS of the signature kernel $k_\phi$, and must verify separability.
That $\mathcal{H}_\phi$ is a Hilbert space is built into the definition of an RKHS — the Moore-Aronszajn construction guarantees this. So the only non-trivial check is **separability**: $\mathcal{H}_\phi$ admits a countable dense subset.
We invoke [Separability of RKHS for Continuous Kernels on Separable Spaces](/theorems/???) (Steinwart-Christmann, *Support Vector Machines*, Lemma 4.33). The theorem states: if $X$ is a separable topological space and $k : X \times X \to \mathbb{R}$ is a continuous positive-definite kernel, then the associated RKHS $\mathcal{H}_k$ is separable. We verify both hypotheses.
- **Separability of $\mathcal{K}$.** $\mathcal{K}$ is compact (hypothesis) and metric (Step 2), hence separable: for each $n \ge 1$ the cover of $\mathcal{K}$ by open balls of radius $1/n$ admits a finite subcover, and the union over all $n$ of the resulting finite sets of centres is a countable dense set.
- **Continuity of $k_\phi$.** By hypothesis, $k_\phi$ satisfies the conditions of the *Sufficient Condition for Signature Membership*, which include continuity of $k_\phi : \mathcal{K} \times \mathcal{K} \to \mathbb{R}$.
Both hypotheses hold, so the theorem applies: $\mathcal{H}_\phi$ is separable. The Christmann-Steinwart hypothesis (ii) holds.
The mechanics of the proof of Lemma 4.33 (which we do not reproduce): pick a countable dense $\{x_n\}_n \subset \mathcal{K}$; the kernel sections $\{k_\phi(\cdot, x_n)\}_n$ span a dense subspace of $\mathcal{H}_\phi$ (by continuity of $k_\phi$, every $k_\phi(\cdot, x)$ is approximated in norm by some $k_\phi(\cdot, x_n)$); rational linear combinations of $\{k_\phi(\cdot, x_n)\}_n$ form a countable dense subset.
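The norm approximation in this sketch can be made explicit with the reproducing property. The identity below is standard; it is written for a generic $x \in \mathcal{K}$ and a point $x_n$ of the dense sequence:
\begin{align*}
\|k_\phi(\cdot, x) - k_\phi(\cdot, x_n)\|_{\mathcal{H}_\phi}^2 = k_\phi(x, x) - 2 k_\phi(x, x_n) + k_\phi(x_n, x_n).
\end{align*}
Letting $x_n \to x$ along the dense sequence, continuity of $k_\phi$ sends the right-hand side to $k_\phi(x, x) - 2 k_\phi(x, x) + k_\phi(x, x) = 0$.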
[/guided]
[/step]
[step:Verify $M^\phi : \mathcal{P}(\mathcal{K}) \to \mathcal{H}_\phi$ is continuous]
This is the substance of [MMD Metrizes Weak Convergence](/theorems/2523), specifically the first half of that theorem: $M^\phi$ is continuous from the weak topology on $\mathcal{P}(\mathcal{K})$ to the Hilbert-norm topology on $\mathcal{H}_\phi$. The hypothesis required there — that $k_\phi$ is continuous and bounded on $\mathcal{K} \times \mathcal{K}$ — holds because $k_\phi$ is continuous (by assumption) and $\mathcal{K} \times \mathcal{K}$ is compact.
[guided]
Continuity of the kernel mean embedding is a non-trivial fact that requires the specific structure of $\mathcal{P}(\mathcal{K})$ and the continuity of $k_\phi$. We do not re-prove it here; we cite [MMD Metrizes Weak Convergence](/theorems/2523), whose hypotheses we verify:
- $\mathcal{K}$ is compact: yes, by assumption $\mathcal{K} \subset \tilde{\mathcal{C}_p}$ is compact.
- $k_\phi$ is continuous on $\mathcal{K} \times \mathcal{K}$: yes, by the hypothesis that the signature kernel satisfies the conditions of *Sufficient Condition for Signature Membership*, which include continuity.
- $k_\phi$ is bounded on $\mathcal{K} \times \mathcal{K}$: a continuous function on a compact set is bounded.
The conclusion of that theorem is exactly: the map $M^\phi : \mathcal{P}(\mathcal{K}) \to \mathcal{H}_\phi$ is continuous from the weak topology to the Hilbert-norm topology. Equivalently, $\mu_n \rightharpoonup \mu \implies \|M^\phi_{\mu_n} - M^\phi_\mu\|_{\mathcal{H}_\phi} \to 0$. Since $\mathcal{P}(\mathcal{K})$ is metrizable (Step 2), continuity is equivalent to sequential continuity, which is what the metrization theorem provides.
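As a hedged numerical illustration of this implication (not part of the proof), the sketch below replaces the signature kernel with a Gaussian kernel on $\mathbb{R}$, an assumption made purely so that the computation is elementary, and takes $\mu_n = \delta_{1/n}$, $\mu = \delta_0$. The names `gauss_kernel` and `mmd` are hypothetical helpers.

```python
import math

def gauss_kernel(x, y):
    """Stand-in for k_phi: a bounded continuous kernel on the real line."""
    return math.exp(-0.5 * (x - y) ** 2)

def mmd(mu, nu, kernel=gauss_kernel):
    """MMD between discrete measures given as lists of (point, weight)."""
    def pair(p, q):
        # <M_p, M_q> = sum_i sum_j a_i b_j k(x_i, y_j)
        return sum(a * b * kernel(x, y) for x, a in p for y, b in q)
    squared = pair(mu, mu) - 2.0 * pair(mu, nu) + pair(nu, nu)
    return math.sqrt(max(squared, 0.0))  # clip tiny negative rounding error

# delta_{1/n} converges weakly to delta_0; the MMD should shrink with n.
distances = [mmd([(1.0 / n, 1.0)], [(0.0, 1.0)]) for n in (1, 10, 100, 1000)]
```

The list `distances` decreases towards zero as $n$ grows, which is the behaviour the continuity statement predicts for this stand-in kernel.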
[/guided]
[/step]
[step:Verify $M^\phi$ is injective via characteristicness of $k_\phi$]
Suppose $\mu, \nu \in \mathcal{P}(\mathcal{K})$ satisfy $M^\phi_\mu = M^\phi_\nu$. Then
\begin{align*}
d_\phi(\mu, \nu) = \|M^\phi_\mu - M^\phi_\nu\|_{\mathcal{H}_\phi} = 0.
\end{align*}
By [MMD is a Metric under Characteristicness](/theorems/2522), since $k_\phi$ is characteristic on $\mathcal{P}(\mathcal{K})$ (which follows from the hypothesis that $k_\phi$ satisfies the conditions of *Sufficient Condition for Signature Membership*, in particular universality, which implies characteristicness), $d_\phi(\mu, \nu) = 0$ forces $\mu = \nu$. Hence $M^\phi$ is injective.
[guided]
Injectivity of the kernel mean embedding is exactly the **characteristicness** of $k_\phi$ — the two notions coincide by definition. We unwrap the chain.
Suppose $\mu, \nu \in \mathcal{P}(\mathcal{K})$ satisfy $M^\phi_\mu = M^\phi_\nu$ in $\mathcal{H}_\phi$. Then their difference is the zero element of $\mathcal{H}_\phi$, so its norm vanishes:
\begin{align*}
d_\phi(\mu, \nu) := \|M^\phi_\mu - M^\phi_\nu\|_{\mathcal{H}_\phi} = 0,
\end{align*}
where we used the definition of the maximum mean discrepancy as the $\mathcal{H}_\phi$-distance between mean embeddings.
We now invoke [MMD is a Metric under Characteristicness](/theorems/2522). Its hypothesis is that $k_\phi$ is characteristic on $\mathcal{P}(\mathcal{K})$, i.e. the map $\mu \mapsto M^\phi_\mu$ is injective. Why is $k_\phi$ characteristic? The hypothesis of the theorem we are proving (that $k_\phi$ satisfies the conditions of the *Sufficient Condition for Signature Membership*) implies that $k_\phi$ is universal on $\mathcal{K}$, and universal kernels are characteristic (see [Universal Implies Characteristic](/theorems/???)). So $k_\phi$ is characteristic on $\mathcal{P}(\mathcal{K})$, and Theorem 2522 applies: $d_\phi$ is a genuine metric on $\mathcal{P}(\mathcal{K})$; in particular $d_\phi(\mu, \nu) = 0 \implies \mu = \nu$.
Substituting back: $M^\phi_\mu = M^\phi_\nu \implies d_\phi(\mu, \nu) = 0 \implies \mu = \nu$. Hence $M^\phi$ is injective on $\mathcal{P}(\mathcal{K})$.
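To see why characteristicness cannot be dropped, the following sketch (an illustration on $\mathbb{R}$, not on path space) compares a linear kernel, which is not characteristic, with a Gaussian kernel, which is. Both kernels are stand-ins chosen as assumptions for the example; `mmd_squared` is a hypothetical helper.

```python
import math

def linear_kernel(x, y):
    """Not characteristic: its mean embedding only sees the mean."""
    return x * y

def gauss_kernel(x, y):
    """Characteristic on the real line."""
    return math.exp(-0.5 * (x - y) ** 2)

def mmd_squared(mu, nu, kernel):
    """Squared MMD between discrete measures given as (point, weight) lists."""
    def pair(p, q):
        return sum(a * b * kernel(x, y) for x, a in p for y, b in q)
    return pair(mu, mu) - 2.0 * pair(mu, nu) + pair(nu, nu)

# Two different measures with the same mean.
mu = [(-1.0, 0.5), (1.0, 0.5)]
nu = [(0.0, 1.0)]

linear_gap = mmd_squared(mu, nu, linear_kernel)  # embeddings coincide
gauss_gap = mmd_squared(mu, nu, gauss_kernel)    # embeddings differ
```

Under the linear kernel the two distinct measures have the same mean embedding, so the embedding is not injective; under the Gaussian kernel the squared MMD is strictly positive and the measures are separated.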
A subtle point: we are using "universality $\Rightarrow$ characteristicness" of $k_\phi$ to conclude injectivity of $M^\phi$, which then enables Christmann-Steinwart to conclude universality of the *outer* kernel $K_\phi$ on $\mathcal{P}(\mathcal{K})$. This is not circular — universality of $k_\phi$ on $\mathcal{K}$ (paths) is the input; universality of $K_\phi$ on $\mathcal{P}(\mathcal{K})$ (probability measures on paths) is the output. The criterion is bootstrapping universality from one space to the next via the mean-embedding map.
[/guided]
[/step]
[step:Apply the Christmann-Steinwart criterion to conclude]
We have verified all four hypotheses of [Christmann-Steinwart Universality Criterion](/theorems/???):
\begin{align*}
\hat{\mathcal{K}} = \mathcal{P}(\mathcal{K}) &\text{ is a compact metric space} && \text{(Step 2)}, \\
\mathcal{H} = \mathcal{H}_\phi &\text{ is a separable Hilbert space} && \text{(Step 3)}, \\
\rho = M^\phi &\text{ is continuous} && \text{(Step 4)}, \\
\rho = M^\phi &\text{ is injective} && \text{(Step 5)},
\end{align*}
and $f : \mathbb{R} \to \mathbb{R}$ is entire, in the strengthened sense of Step 1, by hypothesis. By the criterion, the kernel
\begin{align*}
K_\phi : \mathcal{P}(\mathcal{K}) \times \mathcal{P}(\mathcal{K}) &\to \mathbb{R}, & (\mu, \nu) &\mapsto f(\langle M^\phi_\mu, M^\phi_\nu \rangle_{\mathcal{H}_\phi})
\end{align*}
is universal on $\mathcal{P}(\mathcal{K})$: its RKHS is dense in $C(\mathcal{P}(\mathcal{K}))$ in the uniform norm.
Since $\mathcal{K} \subset \tilde{\mathcal{C}_p}$ was an arbitrary compact subset, $K_\phi$ is universal on $\mathcal{P}(\tilde{\mathcal{C}_p})$ in the sense stated: for every compact $\mathcal{K}$, the RKHS of its restriction to $\mathcal{P}(\mathcal{K}) \times \mathcal{P}(\mathcal{K})$ is dense in $C(\mathcal{P}(\mathcal{K}))$. (Note: $C(\mathcal{P}(\mathcal{K}))$ consists, by definition, of the functions on $\mathcal{P}(\mathcal{K})$ that are continuous for the weak topology; this matches the conclusion of the theorem.) This completes the proof.
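For readers who want to see the shape of the kernel just constructed, here is a minimal sketch for discrete measures. It assumes a Gaussian kernel on $\mathbb{R}$ as a stand-in for $k_\phi$ and takes $f = \exp$, whose Taylor coefficients $1/n!$ are all strictly positive; the names `base_kernel`, `embedding_inner` and `outer_kernel` are hypothetical. The inner product of mean embeddings is computed through the standard identity $\langle M_\mu, M_\nu \rangle = \iint k \, d\mu \, d\nu$.

```python
import math

def base_kernel(x, y):
    """Hypothetical stand-in for the signature kernel k_phi."""
    return math.exp(-0.5 * (x - y) ** 2)

def embedding_inner(mu, nu, kernel=base_kernel):
    """<M_mu, M_nu> = sum_i sum_j a_i b_j k(x_i, y_j) for discrete measures."""
    return sum(a * b * kernel(x, y) for x, a in mu for y, b in nu)

def outer_kernel(mu, nu, f=math.exp):
    """K(mu, nu) = f(<M_mu, M_nu>), with f = exp chosen for illustration."""
    return f(embedding_inner(mu, nu))

# Three discrete probability measures, each a list of (point, weight).
measures = [
    [(0.0, 1.0)],
    [(-1.0, 0.5), (1.0, 0.5)],
    [(0.0, 0.25), (2.0, 0.75)],
]

# Gram matrix of the outer kernel on these measures.
gram = [[outer_kernel(p, q) for q in measures] for p in measures]
```

The Gram matrix is symmetric because the inner product of embeddings is, and its first diagonal entry equals $\exp(1)$ since a Dirac mass has $\langle M_\mu, M_\mu \rangle = k(x, x) = 1$ under this stand-in kernel.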
[/step]