[proofplan]
Universality of $k_\phi$ is, by definition, the statement that for every compact $\mathcal{K} \subset \mathcal{C}_p$ the family $\mathcal{V} := \operatorname{span}\{k_\phi(\gamma, \cdot)|_\mathcal{K} : \gamma \in \mathcal{K}\}$ of finite linear combinations of kernel sections is uniformly dense in $C(\mathcal{K})$. The proof rests on the auxiliary fact that $\mathcal{V}$ is also $\|\cdot\|_\infty$-dense in the larger family $\mathcal{H}_\phi|_\mathcal{K}$: a standard reproducing-kernel computation upgrades $\mathcal{H}_\phi$-norm density to uniform density via the pointwise bound $|f(\gamma)| \leq \sqrt{k_\phi(\gamma,\gamma)}\,\|f\|_{\mathcal{H}_\phi}$ together with the boundedness of $\gamma \mapsto k_\phi(\gamma,\gamma)$ on $\mathcal{K}$. With this lemma, the forward direction follows immediately from the inclusion $\mathcal{V} \subseteq \mathcal{H}_\phi|_\mathcal{K}$, and the reverse direction is a triangle-inequality argument: an arbitrary $f \in C(\mathcal{K})$ is first $\varepsilon/2$-approximated by an element of $\mathcal{H}_\phi|_\mathcal{K}$, which is then $\varepsilon/2$-approximated by an element of $\mathcal{V}$.
[/proofplan]
[step:Establish the lemma that the kernel-section span is uniformly dense in the restricted RKHS]
Fix a compact $\mathcal{K} \subset \mathcal{C}_p$. We show:
\begin{align*}
\mathcal{V} := \operatorname{span}\{k_\phi(\gamma, \cdot)|_\mathcal{K} : \gamma \in \mathcal{K}\} \quad \text{is uniformly dense in} \quad \mathcal{H}_\phi|_\mathcal{K}.
\end{align*}
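Let $f \in \mathcal{H}_\phi|_\mathcal{K}$ and $\varepsilon > 0$. Then $f = g|_\mathcal{K}$ for some $g \in \mathcal{H}_\phi$. The [definition of the RKHS as the closure of the linear span of kernel sections in the $\mathcal{H}_\phi$-norm](/theorems/???) uses centres ranging over all of $\mathcal{C}_p$, whereas $\mathcal{V}$ admits only centres in $\mathcal{K}$, so we first reduce to centres in $\mathcal{K}$. Let $\mathcal{H}_\mathcal{K}$ be the closure in $\mathcal{H}_\phi$ of $\operatorname{span}\{k_\phi(\gamma, \cdot) : \gamma \in \mathcal{K}\}$. Every $g_\perp \in \mathcal{H}_\mathcal{K}^\perp$ satisfies $g_\perp(\gamma) = \langle g_\perp, k_\phi(\gamma, \cdot)\rangle_{\mathcal{H}_\phi} = 0$ for all $\gamma \in \mathcal{K}$, so replacing $g$ by its orthogonal projection onto $\mathcal{H}_\mathcal{K}$ leaves $g|_\mathcal{K}$ unchanged, and we may assume $g \in \mathcal{H}_\mathcal{K}$. Since $\mathcal{H}_\mathcal{K}$ is a closure, for any prescribed $\delta > 0$ there exist $n \in \mathbb{N}$, points $\gamma_1, \ldots, \gamma_n \in \mathcal{K}$, and coefficients $c_1, \ldots, c_n \in \mathbb{R}$ such that the finite combination $g_n := \sum_{i=1}^n c_i k_\phi(\gamma_i, \cdot)$ satisfies $\|g - g_n\|_{\mathcal{H}_\phi} < \delta$.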
By the [reproducing property](/theorems/???), for each $\gamma \in \mathcal{K}$,
\begin{align*}
|g(\gamma) - g_n(\gamma)| = |\langle g - g_n, k_\phi(\gamma, \cdot)\rangle_{\mathcal{H}_\phi}| \leq \|g - g_n\|_{\mathcal{H}_\phi} \cdot \|k_\phi(\gamma, \cdot)\|_{\mathcal{H}_\phi} = \|g - g_n\|_{\mathcal{H}_\phi} \cdot \sqrt{k_\phi(\gamma, \gamma)},
\end{align*}
where we applied Cauchy--Schwarz in $\mathcal{H}_\phi$ and used $\|k_\phi(\gamma,\cdot)\|_{\mathcal{H}_\phi}^2 = \langle k_\phi(\gamma,\cdot), k_\phi(\gamma,\cdot)\rangle_{\mathcal{H}_\phi} = k_\phi(\gamma,\gamma)$.
The function $\gamma \mapsto k_\phi(\gamma, \gamma) = \|S(\gamma)\|_\phi^2$ is continuous on $\mathcal{C}_p$ because $S$ is continuous (by hypothesis) and $\|\cdot\|_\phi^2$ is continuous on $T_\phi((V))$. By compactness of $\mathcal{K}$, the [extreme value theorem](/theorems/???) gives
\begin{align*}
M_\mathcal{K} := \sup_{\gamma \in \mathcal{K}} \sqrt{k_\phi(\gamma, \gamma)} < \infty.
\end{align*}
Hence
\begin{align*}
\sup_{\gamma \in \mathcal{K}} |g(\gamma) - g_n(\gamma)| \leq M_\mathcal{K} \, \|g - g_n\|_{\mathcal{H}_\phi}.
\end{align*}
Choosing $\delta := \varepsilon / M_\mathcal{K}$ (or $\delta = \varepsilon$ if $M_\mathcal{K} = 0$, in which case $\mathcal{H}_\phi|_\mathcal{K} = \{0\}$ directly), we obtain $\|f - g_n|_\mathcal{K}\|_\infty < \varepsilon$. Since $g_n|_\mathcal{K} \in \mathcal{V}$, this shows $\mathcal{V}$ is $\|\cdot\|_\infty$-dense in $\mathcal{H}_\phi|_\mathcal{K}$.
[guided]
The lemma is the analytic backbone of both directions of the equivalence. It asserts that even though $\mathcal{H}_\phi|_\mathcal{K}$ is *a priori* larger than $\mathcal{V}$, the two have the same uniform closure inside $C(\mathcal{K})$. After writing $f \in \mathcal{H}_\phi|_\mathcal{K}$ as the restriction of an element $g$ of the abstract RKHS, we proceed in three steps: (a) approximate $g$ in $\mathcal{H}_\phi$-norm by a finite kernel-section combination $g_n$; (b) derive a pointwise evaluation bound from the reproducing property; (c) use compactness of $\mathcal{K}$ to turn the pointwise bound into a uniform one, so that RKHS-norm closeness implies uniform closeness on $\mathcal{K}$.
Fix $f \in \mathcal{H}_\phi|_\mathcal{K}$ and $\varepsilon > 0$. By definition of the restriction, there exists $g \in \mathcal{H}_\phi$ with $f = g|_\mathcal{K}$.
**(a) Density of kernel-section combinations in $\mathcal{H}_\phi$.** By the [definition of the RKHS as the closure of the linear span of kernel sections in the $\mathcal{H}_\phi$-norm](/theorems/???), for every $\delta > 0$ there exist $n \in \mathbb{N}$, points $\gamma_1, \ldots, \gamma_n \in \mathcal{C}_p$, and coefficients $c_1, \ldots, c_n \in \mathbb{R}$ with
\begin{align*}
g_n := \sum_{i=1}^n c_i k_\phi(\gamma_i, \cdot) \in \mathcal{H}_\phi, \qquad \|g - g_n\|_{\mathcal{H}_\phi} < \delta.
\end{align*}
We will fix $\delta$ at the end of the argument once we know how RKHS-norm error converts to uniform-norm error.
**(b) Pointwise evaluation bound from the reproducing property.** For any $\gamma \in \mathcal{K}$ the [reproducing property](/theorems/???) gives $g(\gamma) - g_n(\gamma) = \langle g - g_n, k_\phi(\gamma, \cdot)\rangle_{\mathcal{H}_\phi}$. Applying Cauchy--Schwarz in the Hilbert space $\mathcal{H}_\phi$,
\begin{align*}
|g(\gamma) - g_n(\gamma)| \leq \|g - g_n\|_{\mathcal{H}_\phi} \cdot \|k_\phi(\gamma, \cdot)\|_{\mathcal{H}_\phi}.
\end{align*}
The reproducing property applied to $k_\phi(\gamma, \cdot)$ itself yields $\|k_\phi(\gamma, \cdot)\|_{\mathcal{H}_\phi}^2 = \langle k_\phi(\gamma, \cdot), k_\phi(\gamma, \cdot)\rangle_{\mathcal{H}_\phi} = k_\phi(\gamma, \gamma)$, so
\begin{align*}
|g(\gamma) - g_n(\gamma)| \leq \|g - g_n\|_{\mathcal{H}_\phi} \cdot \sqrt{k_\phi(\gamma, \gamma)}.
\end{align*}
This says: small in RKHS-norm forces pointwise smallness, with the conversion factor $\sqrt{k_\phi(\gamma, \gamma)}$ depending on the evaluation point $\gamma$.
**(c) Uniform bound via compactness.** To upgrade the pointwise bound to a uniform bound on $\mathcal{K}$ we control $\sqrt{k_\phi(\gamma, \gamma)}$ uniformly over $\gamma \in \mathcal{K}$. The map
\begin{align*}
\gamma \mapsto k_\phi(\gamma, \gamma) = \langle S(\gamma), S(\gamma)\rangle_\phi = \|S(\gamma)\|_\phi^2
\end{align*}
is continuous on $\mathcal{C}_p$ because $S : \mathcal{C}_p \to T_\phi((V))$ is continuous (a hypothesis of the theorem) and the squared norm $\|\cdot\|_\phi^2 : T_\phi((V)) \to \mathbb{R}$ is continuous on the Hilbert space $T_\phi((V))$. Composition of continuous maps is continuous, and $x \mapsto \sqrt{x}$ is continuous on $[0, \infty)$, so $\gamma \mapsto \sqrt{k_\phi(\gamma, \gamma)}$ is continuous. By the [extreme value theorem](/theorems/???) applied to this continuous function on the compact set $\mathcal{K}$,
\begin{align*}
M_\mathcal{K} := \sup_{\gamma \in \mathcal{K}} \sqrt{k_\phi(\gamma, \gamma)} < \infty.
\end{align*}
Taking the supremum of the bound from (b) over $\gamma \in \mathcal{K}$,
\begin{align*}
\sup_{\gamma \in \mathcal{K}} |g(\gamma) - g_n(\gamma)| \leq M_\mathcal{K} \cdot \|g - g_n\|_{\mathcal{H}_\phi}.
\end{align*}
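The conversion in (b) and (c) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the feature map `S(x) = (1, x, x^2)` with the Euclidean inner product is a hypothetical stand-in for the signature map $S$ and $\langle\cdot,\cdot\rangle_\phi$, a finite grid stands in for the compact set $\mathcal{K}$, and the vectors `h`, `h_n` are arbitrary illustrative choices. Because these toy features span $\mathbb{R}^3$, the RKHS norm of $g - g_n$ coincides with the Euclidean norm of `h - h_n`.

```python
import math

# Hypothetical toy feature map standing in for the signature map S:
# S(x) = (1, x, x^2) in R^3, with the Euclidean inner product as <.,.>_phi.
def S(x):
    return (1.0, x, x * x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Finite grid standing in for the compact set K = [0, 1] (an assumption).
K = [i / 50 for i in range(51)]

# g = <h, S(.)> and an approximant g_n = <h_n, S(.)> (illustrative values).
h = (0.5, -1.0, 2.0)
h_n = (0.49, -0.98, 2.03)
diff = tuple(a - b for a, b in zip(h, h_n))

# The features S(x), x in K, span R^3, so ||g - g_n||_H equals ||h - h_n||.
rkhs_err = norm(diff)

# M_K = sup over K of sqrt(k(x, x)) = sup over K of ||S(x)||.
M_K = max(norm(S(x)) for x in K)

# Pointwise bound (b): |g(x) - g_n(x)| <= ||g - g_n||_H * sqrt(k(x, x)).
pointwise_ok = all(
    abs(dot(diff, S(x))) <= rkhs_err * norm(S(x)) + 1e-12 for x in K
)

# Uniform bound (c): sup_K |g - g_n| <= M_K * ||g - g_n||_H.
sup_err = max(abs(dot(diff, S(x))) for x in K)
uniform_ok = sup_err <= M_K * rkhs_err + 1e-12

print(pointwise_ok, uniform_ok)
```

In this toy setting $M_\mathcal{K} = \|S(1)\| = \sqrt{3}$, so driving the norm error below $\varepsilon/\sqrt{3}$ forces the uniform error below $\varepsilon$, which mirrors the choice of $\delta$ made in the closing step.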
**Closing.** If $M_\mathcal{K} > 0$, choose $\delta := \varepsilon / M_\mathcal{K}$ in (a) and set $v := g_n|_\mathcal{K}$; then $\|f - v\|_\infty = \sup_{\gamma \in \mathcal{K}}|g(\gamma) - g_n(\gamma)| \le M_\mathcal{K} \cdot \|g - g_n\|_{\mathcal{H}_\phi} < M_\mathcal{K} \cdot \delta = \varepsilon$. If $M_\mathcal{K} = 0$, then $k_\phi(\gamma, \gamma) = 0$ for all $\gamma \in \mathcal{K}$; the pointwise bound applied to $g$ itself gives $|f(\gamma)| = |g(\gamma)| \le \sqrt{k_\phi(\gamma,\gamma)}\,\|g\|_{\mathcal{H}_\phi} = 0$, so $\mathcal{H}_\phi|_\mathcal{K} = \{0\}$ and the zero function, which lies in $\mathcal{V}$, approximates $f$ exactly. It remains to check that $v \in \mathcal{V}$, which requires the centres $\gamma_i$ to lie in $\mathcal{K}$.
Step (a) as stated produces centres $\gamma_i \in \mathcal{C}_p$, because the defining density statement for the RKHS uses $\operatorname{span}\{k_\phi(\gamma, \cdot) : \gamma \in \mathcal{C}_p\}$, and the restriction to $\mathcal{K}$ of a kernel section centred outside $\mathcal{K}$ need not belong to $\mathcal{V}$. The centres can nevertheless be taken in $\mathcal{K}$. Let $\mathcal{H}_\mathcal{K}$ denote the closure in $\mathcal{H}_\phi$ of $\operatorname{span}\{k_\phi(\gamma, \cdot) : \gamma \in \mathcal{K}\}$, and decompose $g = g_\parallel + g_\perp$ with $g_\parallel \in \mathcal{H}_\mathcal{K}$ and $g_\perp \in \mathcal{H}_\mathcal{K}^\perp$. By the reproducing property, $g_\perp(\gamma) = \langle g_\perp, k_\phi(\gamma, \cdot)\rangle_{\mathcal{H}_\phi} = 0$ for every $\gamma \in \mathcal{K}$, so $f = g|_\mathcal{K} = g_\parallel|_\mathcal{K}$. Running steps (a) through (c) with $g_\parallel$ in place of $g$, and with the closure defining $\mathcal{H}_\mathcal{K}$ in place of the closure defining $\mathcal{H}_\phi$, yields a combination $g_n$ whose centres all lie in $\mathcal{K}$, so that $v = g_n|_\mathcal{K} \in \mathcal{V}$. Since $f \in \mathcal{H}_\phi|_\mathcal{K}$ and $\varepsilon > 0$ were arbitrary, $\mathcal{V}$ is uniformly dense in $\mathcal{H}_\phi|_\mathcal{K}$.
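The question of centres inside versus outside $\mathcal{K}$ can be made concrete on a toy example. The sketch below rests on explicit assumptions: `S(x) = (1, x, x^2)` with the Euclidean inner product is a hypothetical stand-in for the signature map, the two-point set `K = [0.0, 1.0]` stands in for the compact set, and the centre `outside = 2.0` is an arbitrary point not in `K`. Because this `K` is finite, the orthogonal projection onto the span of sections centred in `K` is an exact finite combination; for a general compact $\mathcal{K}$ one would only obtain approximation in $\mathcal{H}_\phi$-norm.

```python
# Hypothetical toy feature map standing in for S; Euclidean inner product.
def S(x):
    return (1.0, x, x * x)

def k(x, y):
    return sum(a * b for a, b in zip(S(x), S(y)))

# Two-point stand-in for the compact set K, and a centre outside K.
K = [0.0, 1.0]
outside = 2.0

# Orthogonal projection of k(outside, .) onto span{k(x, .) : x in K}:
# solve the 2x2 Gram system G c = b by Cramer's rule, where
# G[i][j] = k(K[i], K[j]) and b[i] = k(K[i], outside).
G = [[k(x, y) for y in K] for x in K]
b = [k(x, outside) for x in K]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
c = [
    (b[0] * G[1][1] - G[0][1] * b[1]) / det,
    (G[0][0] * b[1] - b[0] * G[1][0]) / det,
]

def v(y):
    # Combination of kernel sections with centres in K only.
    return sum(ci * k(xi, y) for ci, xi in zip(c, K))

# On K the projection agrees with the section centred outside K ...
on_K = [abs(v(y) - k(outside, y)) for y in K]
# ... while off K the two functions differ (the discarded orthogonal part).
off_K = abs(v(outside) - k(outside, outside))

print(c, on_K, off_K)
```

The restriction to `K` of the section centred at `outside` is reproduced by the combination with coefficients `c` and centres in `K`, even though the two functions disagree away from `K`. This is the finite-dimensional picture of discarding the component orthogonal to the sections centred in $\mathcal{K}$.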
[/guided]
[/step]
[step:Forward direction — universal kernel implies the restricted RKHS is uniformly dense in $C(\mathcal{K})$]
Suppose $k_\phi$ is universal, that is, for every compact $\mathcal{K} \subset \mathcal{C}_p$ the family $\mathcal{V} = \operatorname{span}\{k_\phi(\gamma, \cdot)|_\mathcal{K} : \gamma \in \mathcal{K}\}$ is dense in $C(\mathcal{K})$ in the uniform norm.
We show $\mathcal{H}_\phi|_\mathcal{K}$ is dense in $C(\mathcal{K})$. Every kernel section $k_\phi(\gamma, \cdot) = \langle S(\gamma), S(\cdot) \rangle_\phi = k_\phi^{S(\gamma)}$ has the form $k_\phi^h$ with $h = S(\gamma) \in T_\phi((V))$, hence $\mathcal{V} \subseteq \mathcal{H}_\phi|_\mathcal{K}$. Therefore for any $f \in C(\mathcal{K})$ and $\varepsilon > 0$, the universality hypothesis yields $v \in \mathcal{V} \subseteq \mathcal{H}_\phi|_\mathcal{K}$ with $\|f - v\|_\infty < \varepsilon$. Since $\varepsilon$ and $f$ were arbitrary, $\mathcal{H}_\phi|_\mathcal{K}$ is dense in $C(\mathcal{K})$ in the uniform norm.
[guided]
The forward direction is short because the inclusion $\mathcal{V} \subseteq \mathcal{H}_\phi|_\mathcal{K}$ does the work for free. Why is this inclusion true?
The family $\mathcal{H}_\phi|_\mathcal{K}$ consists of restrictions to $\mathcal{K}$ of functions $k_\phi^h : \gamma \mapsto \langle h, S(\gamma)\rangle_\phi$ where $h$ ranges over the Hilbert space $T_\phi((V))$. The family $\mathcal{V}$ consists of finite linear combinations of kernel sections $k_\phi(\gamma_0, \cdot)|_\mathcal{K}$ with centres $\gamma_0 \in \mathcal{K}$. We claim each kernel section is a special case of $k_\phi^h$. Indeed, by the standing definition of the signature kernel,
\begin{align*}
k_\phi(\gamma_0, \gamma) = \langle S(\gamma_0), S(\gamma)\rangle_\phi = k_\phi^{S(\gamma_0)}(\gamma),
\end{align*}
so the kernel section centred at $\gamma_0$ equals $k_\phi^h$ with $h := S(\gamma_0) \in T_\phi((V))$. Linear combinations of kernel sections are then linear combinations of $k_\phi^h$ functions, and since $h \mapsto k_\phi^h$ is linear (the inner product is linear in the first slot), $\sum_i c_i k_\phi^{h_i} = k_\phi^{\sum_i c_i h_i}$ with $\sum_i c_i h_i \in T_\phi((V))$. Hence $\mathcal{V} \subseteq \mathcal{H}_\phi|_\mathcal{K}$.
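As a worked instance of this linearity, with the centres $\gamma_1, \gamma_2 \in \mathcal{K}$ and the coefficients $2$ and $-3$ chosen purely for illustration, one has for every $\gamma \in \mathcal{K}$
\begin{align*}
2\,k_\phi(\gamma_1, \gamma) - 3\,k_\phi(\gamma_2, \gamma) = 2\langle S(\gamma_1), S(\gamma)\rangle_\phi - 3\langle S(\gamma_2), S(\gamma)\rangle_\phi = \langle 2S(\gamma_1) - 3S(\gamma_2), S(\gamma)\rangle_\phi = k_\phi^{h}(\gamma), \qquad h := 2S(\gamma_1) - 3S(\gamma_2) \in T_\phi((V)).
\end{align*}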
With the inclusion in hand, fix $f \in C(\mathcal{K})$ and $\varepsilon > 0$. Universality of $k_\phi$ on $\mathcal{K}$ gives $v \in \mathcal{V}$ with $\|f - v\|_\infty < \varepsilon$. Since $v \in \mathcal{V} \subseteq \mathcal{H}_\phi|_\mathcal{K}$, the same $v$ witnesses the density of the larger family $\mathcal{H}_\phi|_\mathcal{K}$ in $C(\mathcal{K})$ at tolerance $\varepsilon$. Letting $\varepsilon$ and $f$ vary, $\mathcal{H}_\phi|_\mathcal{K}$ is uniformly dense in $C(\mathcal{K})$. The compact $\mathcal{K}$ was arbitrary, so this density holds for every compact subset of $\mathcal{C}_p$.
The asymmetry between the two directions is now apparent: the forward direction goes through because every element of $\mathcal{V}$ is *automatically* an element of $\mathcal{H}_\phi|_\mathcal{K}$. The reverse direction is harder because not every element of $\mathcal{H}_\phi|_\mathcal{K}$ is in $\mathcal{V}$ — a generic $h \in T_\phi((V))$ need not be a finite combination of $S(\gamma_0)$'s — and this is exactly what the first-step lemma overcomes.
[/guided]
[/step]
[step:Reverse direction — the restricted RKHS is uniformly dense in $C(\mathcal{K})$ implies universality]
Suppose $\mathcal{H}_\phi|_\mathcal{K}$ is dense in $C(\mathcal{K})$ in the uniform norm, for every compact $\mathcal{K} \subset \mathcal{C}_p$. Fix such a $\mathcal{K}$, and fix $f \in C(\mathcal{K})$ and $\varepsilon > 0$. We show $\mathcal{V}$ approximates $f$ to within $\varepsilon$ uniformly on $\mathcal{K}$.
By the density hypothesis, there exists $h \in T_\phi((V))$ such that
\begin{align*}
\sup_{\gamma \in \mathcal{K}} |f(\gamma) - k_\phi^h(\gamma)| < \frac{\varepsilon}{2}.
\end{align*}
By the lemma in the first step, $\mathcal{V}$ is uniformly dense in $\mathcal{H}_\phi|_\mathcal{K}$, and $k_\phi^h|_\mathcal{K} \in \mathcal{H}_\phi|_\mathcal{K}$. Hence there exist $n \in \mathbb{N}$, $\gamma_1, \ldots, \gamma_n \in \mathcal{K}$, and $c_1, \ldots, c_n \in \mathbb{R}$ such that
\begin{align*}
\sup_{\gamma \in \mathcal{K}} \Bigl| k_\phi^h(\gamma) - \sum_{i=1}^n c_i k_\phi(\gamma_i, \gamma) \Bigr| < \frac{\varepsilon}{2}.
\end{align*}
By the triangle inequality on the supremum norm,
\begin{align*}
\sup_{\gamma \in \mathcal{K}} \Bigl|f(\gamma) - \sum_{i=1}^n c_i k_\phi(\gamma_i, \gamma)\Bigr| \leq \sup_{\gamma \in \mathcal{K}} |f(\gamma) - k_\phi^h(\gamma)| + \sup_{\gamma \in \mathcal{K}} \Bigl|k_\phi^h(\gamma) - \sum_{i=1}^n c_i k_\phi(\gamma_i, \gamma)\Bigr| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{align*}
The combination $\sum_i c_i k_\phi(\gamma_i, \cdot)|_\mathcal{K}$ lies in $\mathcal{V}$, so $\mathcal{V}$ is uniformly dense in $C(\mathcal{K})$. Since $\mathcal{K}$ was an arbitrary compact subset of $\mathcal{C}_p$, $k_\phi$ is universal.
[guided]
The reverse direction is a two-stage approximation glued by the triangle inequality. The hypothesis says $\mathcal{H}_\phi|_\mathcal{K}$ is uniformly dense in $C(\mathcal{K})$; the first-step lemma says $\mathcal{V}$ is uniformly dense in $\mathcal{H}_\phi|_\mathcal{K}$. We chain these to conclude $\mathcal{V}$ is uniformly dense in $C(\mathcal{K})$, which is the definition of universality on the compact set $\mathcal{K}$.
Suppose $\mathcal{H}_\phi|_\mathcal{K}$ is uniformly dense in $C(\mathcal{K})$ for every compact $\mathcal{K} \subset \mathcal{C}_p$. Fix one such compact $\mathcal{K}$, fix $f \in C(\mathcal{K})$, and fix $\varepsilon > 0$. We will produce a finite combination $\sum_i c_i k_\phi(\gamma_i, \cdot)|_\mathcal{K}$ with $\gamma_i \in \mathcal{K}$ and $c_i \in \mathbb{R}$ that approximates $f$ to within $\varepsilon$ in uniform norm.
**Stage (a) — approximate $f$ by an RKHS element.** By the density hypothesis applied with tolerance $\varepsilon/2$ in place of $\varepsilon$, there exists $h \in T_\phi((V))$ such that
\begin{align*}
\sup_{\gamma \in \mathcal{K}} |f(\gamma) - k_\phi^h(\gamma)| < \frac{\varepsilon}{2}.
\end{align*}
The function $k_\phi^h|_\mathcal{K}$ belongs to $\mathcal{H}_\phi|_\mathcal{K}$ because $h \in T_\phi((V))$.
**Stage (b) — approximate the RKHS element by a kernel-section combination.** Apply the first-step lemma to the function $k_\phi^h|_\mathcal{K} \in \mathcal{H}_\phi|_\mathcal{K}$ with tolerance $\varepsilon/2$. The lemma gives $n \in \mathbb{N}$, points $\gamma_1, \ldots, \gamma_n \in \mathcal{K}$, and $c_1, \ldots, c_n \in \mathbb{R}$ such that
\begin{align*}
\sup_{\gamma \in \mathcal{K}} \Bigl| k_\phi^h(\gamma) - \sum_{i=1}^n c_i k_\phi(\gamma_i, \gamma) \Bigr| < \frac{\varepsilon}{2}.
\end{align*}
The combination $v := \sum_{i=1}^n c_i k_\phi(\gamma_i, \cdot)|_\mathcal{K}$ belongs to $\mathcal{V}$ because the centres $\gamma_i$ lie in $\mathcal{K}$.
**Combining the two stages.** By the triangle inequality applied pointwise and then taking a supremum over $\gamma \in \mathcal{K}$,
\begin{align*}
\sup_{\gamma \in \mathcal{K}} \Bigl|f(\gamma) - \sum_{i=1}^n c_i k_\phi(\gamma_i, \gamma)\Bigr| &\leq \sup_{\gamma \in \mathcal{K}} |f(\gamma) - k_\phi^h(\gamma)| + \sup_{\gamma \in \mathcal{K}} \Bigl|k_\phi^h(\gamma) - \sum_{i=1}^n c_i k_\phi(\gamma_i, \gamma)\Bigr| \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{align*}
Why split the budget as $\varepsilon/2 + \varepsilon/2$? Each stage of the approximation contributes one term to the triangle inequality, so we need each contribution to be less than half of $\varepsilon$ for the sum to stay below $\varepsilon$. Any positive split (e.g. $\varepsilon/3 + 2\varepsilon/3$) would also work; halves are the symmetric choice.
Why are the centres $\gamma_i$ allowed to be restricted to $\mathcal{K}$? Because the first-step lemma was proved precisely for centres in $\mathcal{K}$ — the family $\mathcal{V}$ in the theorem statement uses only centres in $\mathcal{K}$, and the lemma showed $\mathcal{V}$ is uniformly dense in $\mathcal{H}_\phi|_\mathcal{K}$.
Since $f \in C(\mathcal{K})$ and $\varepsilon > 0$ were arbitrary, this shows $\mathcal{V}$ is uniformly dense in $C(\mathcal{K})$. Since $\mathcal{K} \subset \mathcal{C}_p$ was an arbitrary compact subset, $k_\phi$ is universal by definition. This completes the reverse direction.
[/guided]
[/step]