[guided]The bound $|\mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]| < 2\varepsilon$ from Step 4 holds for *every* $\varepsilon > 0$. The proxy $g \in \mathcal{H}_\phi$ depends on the choice of $\varepsilon$, but the left-hand side involves neither $g$ nor $\varepsilon$. Letting $\varepsilon \to 0$,
\begin{align*}
|\mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]| \le \lim_{\varepsilon \to 0^+} 2\varepsilon = 0,
\end{align*}
so $\mathbb{E}_\mu[f] = \mathbb{E}_\nu[f]$. Since the function $f \in C(\mathcal{K})$ was arbitrary, this identity holds for *every* continuous function on $\mathcal{K}$.
**Translating to $C_b(\mathcal{K})$.** Continuous functions on a compact metric space are automatically bounded — for $f \in C(\mathcal{K})$, the [Extreme Value Theorem](/theorems/???) gives $\sup_{x \in \mathcal{K}} |f(x)| < \infty$. Hence $C(\mathcal{K}) = C_b(\mathcal{K})$ in our setting, and we conclude
\begin{align*}
\int_\mathcal{K} f\, d\mu = \int_\mathcal{K} f\, d\nu \qquad \text{for every } f \in C_b(\mathcal{K}).
\end{align*}
**Applying the measure-determination theorem.** We invoke the theorem [Bounded Continuous Functions Determine Borel Probability Measures](/theorems/???) (Dudley, *Real Analysis and Probability*, Lemma 9.3.2):
\begin{quote}
Let $X$ be a metric space and $\mu, \nu$ be Borel probability measures on $X$. If $\int f\, d\mu = \int f\, d\nu$ for every $f \in C_b(X)$, then $\mu = \nu$.
\end{quote}
We verify each hypothesis:
\begin{itemize}
\item *$X$ is a metric space.* Take $X = \mathcal{K}$. As a subspace of $\mathcal{C}_p$, which carries a metric topology by hypothesis, $\mathcal{K}$ is itself a metric space (and compact by assumption).
\item *$\mu, \nu$ are Borel probability measures on $X$.* By assumption $\mu, \nu \in \mathcal{P}(\mathcal{K})$, the space of Borel probability measures on $\mathcal{K}$.
\item *$\mathbb{E}_\mu[f] = \mathbb{E}_\nu[f]$ for every $f \in C_b(\mathcal{K})$.* Just established.
\end{itemize}
The conclusion is $\mu = \nu$.
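For orientation, the standard argument behind this lemma can be sketched as follows. This is a sketch of the usual proof rather than Dudley's exact wording, and the symbols $\rho$ (the metric on $X$), $F$ (a nonempty closed subset of $X$) and $f_n$ are notation introduced here. The idea is to approximate the indicator of $F$ from above by bounded continuous functions and pass to the limit by dominated convergence:
\begin{align*}
f_n(x) &= \max\{0,\, 1 - n\,\rho(x, F)\} \in C_b(X), \qquad f_n \downarrow \mathbf{1}_F \text{ pointwise}, \\
\mu(F) &= \lim_{n \to \infty} \int f_n\, d\mu = \lim_{n \to \infty} \int f_n\, d\nu = \nu(F).
\end{align*}
Closed sets form a $\pi$-system that generates the Borel $\sigma$-algebra, so agreement on closed sets extends to all Borel sets by the $\pi$-$\lambda$ theorem.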
**Why does the chain of implications need universality?** The argument boils down to: $d_\phi(\mu, \nu) = 0 \Rightarrow \mathbb{E}_\mu = \mathbb{E}_\nu$ on $\mathcal{H}_\phi \Rightarrow \mathbb{E}_\mu = \mathbb{E}_\nu$ on $C_b(\mathcal{K}) \Rightarrow \mu = \nu$. The first arrow is just the reproducing property; the third is Dudley's lemma. The middle arrow, which extends the integral identities from the dense subspace $\mathcal{H}_\phi$ to all of $C(\mathcal{K})$, is the *only* place where universality enters. Without universality, $\mathcal{H}_\phi$ might be dense only in a proper closed subspace of $C(\mathcal{K})$ (e.g. polynomials of bounded degree, or functions vanishing on a fixed set), and Dudley's lemma would not apply, because the integral identities would be known only on a non-dense subspace of $C_b(\mathcal{K})$.
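The failure of the middle arrow can be illustrated by a toy example. It is not part of the proof: the domain $[-1,1]$, the linear kernel $k(x,y) = xy$, and the symbols $m_\mu, m_\nu$ for the kernel mean embeddings are assumptions made here for illustration only. The RKHS of this kernel consists of the linear functions $x \mapsto ax$, which are not dense in $C([-1,1])$. Take $\mu = \delta_0$ and $\nu = \tfrac12(\delta_{-1} + \delta_{1})$:
\begin{align*}
m_\mu(y) &= \int k(x,y)\, d\mu(x) = 0 \cdot y = 0, \\
m_\nu(y) &= \tfrac12(-1)\,y + \tfrac12(1)\,y = 0, \\
\|m_\mu - m_\nu\| &= 0 \quad \text{although} \quad \int x^2\, d\mu = 0 \ne 1 = \int x^2\, d\nu.
\end{align*}
Both measures have the same embedding, so the kernel distance vanishes, yet the continuous function $x \mapsto x^2$ separates them. The first arrow still holds here (expectations agree on every linear function), but the extension to all of $C_b$ fails.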
**Closing the equivalence.** The forward implication $\mu = \nu \Rightarrow d_\phi(\mu, \nu) = 0$ from Step 1 is trivial: equal measures have equal kernel mean embeddings, so the difference has zero norm. The converse, just established, is the substantive content. Together,
\begin{align*}
d_\phi(\mu, \nu) = 0 \iff \mu = \nu \qquad \text{(equivalently, $d_\phi$ is a metric on $\mathcal{P}(\mathcal{K})$).}
\end{align*}
**The role of compactness of $\mathcal{K}$.** Compactness is used twice: once in Step 3 (universality is defined on compact subsets), and once here to identify $C(\mathcal{K})$ with $C_b(\mathcal{K})$. On a non-compact $\mathcal{K}$, continuous functions need not be bounded, and the universality-based argument would need a separate truncation step. The standard MMD framework therefore restricts attention to compact $\mathcal{K}$ for simplicity, although extensions to non-compact settings exist (using e.g. characteristic kernels in place of universal ones; Dudley's lemma as quoted above already covers arbitrary metric spaces).[/guided]