Androma — The Home of Mathematics on the Internet


[guided]The hypothesis $d_\phi(\mu, \nu) = 0$ is, by the [Definition of MMD](/theorems/???), the statement \begin{align*} \|M^\phi_\mu - M^\phi_\nu\|_{\mathcal{H}_\phi} = 0. \end{align*} Norms are positive-definite, so this forces equality in the Hilbert space $\mathcal{H}_\phi$: \begin{align*} M^\phi_\mu = M^\phi_\nu \quad \text{in } \mathcal{H}_\phi. \end{align*}
**The reproducing property of the kernel mean embedding.** The kernel mean embedding $M^\phi_\rho \in \mathcal{H}_\phi$ of a probability measure $\rho \in \mathcal{P}(\mathcal{K})$ is the unique element of $\mathcal{H}_\phi$ characterised by the [Reproducing Property of the Kernel Mean Embedding](/theorems/???): \begin{align*} \langle g, M^\phi_\rho \rangle_{\mathcal{H}_\phi} = \mathbb{E}_{x \sim \rho}[g(x)] \qquad \text{for every } g \in \mathcal{H}_\phi. \end{align*} This identity is the kernel mean embedding's *reason for existing*. It follows from $M^\phi_\rho = \int k_\phi(x, \cdot)\, d\rho(x)$ (a Bochner integral in the Hilbert space $\mathcal{H}_\phi$, well-defined because $\mathcal{K}$ is compact and $k_\phi$ is continuous, so that $\|k_\phi(x, \cdot)\|_{\mathcal{H}_\phi} = \sqrt{k_\phi(x, x)}$ is bounded uniformly in $x \in \mathcal{K}$) combined with the reproducing kernel property $g(x) = \langle g, k_\phi(x, \cdot)\rangle_{\mathcal{H}_\phi}$: \begin{align*} \langle g, M^\phi_\rho\rangle_{\mathcal{H}_\phi} = \Bigl\langle g, \int k_\phi(x, \cdot)\, d\rho(x)\Bigr\rangle_{\mathcal{H}_\phi} = \int \langle g, k_\phi(x, \cdot)\rangle_{\mathcal{H}_\phi}\, d\rho(x) = \int g(x)\, d\rho(x) = \mathbb{E}_\rho[g], \end{align*} where the second equality holds because the continuous linear functional $\langle g, \cdot\rangle_{\mathcal{H}_\phi}$ commutes with the Bochner integral.
**Applying the reproducing property to both measures.** Apply the identity to $\rho = \mu$ and to $\rho = \nu$ for an arbitrary $g \in \mathcal{H}_\phi$: \begin{align*} \mathbb{E}_{x \sim \mu}[g(x)] &= \langle g, M^\phi_\mu\rangle_{\mathcal{H}_\phi}, \\ \mathbb{E}_{y \sim \nu}[g(y)] &= \langle g, M^\phi_\nu\rangle_{\mathcal{H}_\phi}. \end{align*} Since $M^\phi_\mu = M^\phi_\nu$ in $\mathcal{H}_\phi$, the two right-hand sides agree: \begin{align*} \langle g, M^\phi_\mu\rangle_{\mathcal{H}_\phi} = \langle g, M^\phi_\nu\rangle_{\mathcal{H}_\phi}. \end{align*} Therefore \begin{align*} \mathbb{E}_{x \sim \mu}[g(x)] = \mathbb{E}_{y \sim \nu}[g(y)] \qquad \text{for every } g \in \mathcal{H}_\phi, \end{align*} or equivalently \begin{align*} \mathbb{E}_{x \sim \mu}[g(x)] - \mathbb{E}_{y \sim \nu}[g(y)] = 0 \qquad \text{for every } g \in \mathcal{H}_\phi. \end{align*}
**Strategic significance.** This identity is the foothold for the rest of the proof. We know that $\mu$ and $\nu$ agree on integrals against every element of $\mathcal{H}_\phi$, and we want to upgrade this to agreement on integrals against every $f \in C(\mathcal{K})$. The gap is that $\mathcal{H}_\phi$ is a *strict* subset of $C(\mathcal{K})$ in general (RKHS elements satisfy additional smoothness or summability constraints), and the bridge from "agree on $\mathcal{H}_\phi$" to "agree on $C(\mathcal{K})$" is *uniform density* of $\mathcal{H}_\phi$ in $C(\mathcal{K})$. Without universality of $k_\phi$, the conclusion can fail: there are kernels for which $\mathcal{H}_\phi$ is "too small" to determine the measure (e.g. polynomial kernels of fixed degree). Universality is precisely the property that fills the gap.[/guided]
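The remark that a non-universal kernel can be "too small" to determine the measure can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes $\mathcal{K} = [-1, 1] \subset \mathbb{R}$, finitely supported measures, and two illustrative kernels (a linear kernel, i.e. a polynomial kernel of degree 1, and a Gaussian kernel) that stand in for $k_\phi$. The function names and the bandwidth are hypothetical choices made for this example.

```python
import math


def mmd_squared(xs, ws, ys, vs, kernel):
    """Squared MMD between the discrete measures
    sum_i ws[i] * delta_{xs[i]} and sum_j vs[j] * delta_{ys[j]}.

    Expands ||M_mu - M_nu||^2 = <M_mu, M_mu> - 2 <M_mu, M_nu> + <M_nu, M_nu>,
    where each inner product is a weighted double sum of kernel values.
    """
    def cross(ps, a, qs, b):
        return sum(ai * bj * kernel(p, q)
                   for p, ai in zip(ps, a)
                   for q, bj in zip(qs, b))

    return cross(xs, ws, xs, ws) - 2.0 * cross(xs, ws, ys, vs) + cross(ys, vs, ys, vs)


def linear_kernel(x, y):
    # Polynomial kernel of degree 1 (illustrative non-universal kernel).
    return x * y


def gaussian_kernel(x, y, bandwidth=1.0):
    # Gaussian kernel; the bandwidth value is an arbitrary choice.
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


# mu = delta_0 and nu = (delta_{-1} + delta_{1}) / 2: different measures, same mean.
mu_points, mu_weights = [0.0], [1.0]
nu_points, nu_weights = [-1.0, 1.0], [0.5, 0.5]

mmd2_linear = mmd_squared(mu_points, mu_weights, nu_points, nu_weights, linear_kernel)
mmd2_gauss = mmd_squared(mu_points, mu_weights, nu_points, nu_weights, gaussian_kernel)

print("linear kernel MMD^2:  ", mmd2_linear)
print("Gaussian kernel MMD^2:", mmd2_gauss)
```

With the linear kernel the two embeddings coincide (both measures have mean zero), so the squared distance vanishes although $\mu \neq \nu$. The Gaussian kernel, which is universal on compact subsets of $\mathbb{R}$, returns a strictly positive value for the same pair.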


[guided]The bound $|\mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]| < 2\varepsilon$ from Step 4 holds for *every* $\varepsilon > 0$: the proxy $g \in \mathcal{H}_\phi$ depends on the choice of $\varepsilon$, but the left-hand side does not. Letting $\varepsilon \to 0$, \begin{align*} |\mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]| \le \lim_{\varepsilon \to 0^+} 2\varepsilon = 0, \end{align*} so $\mathbb{E}_\mu[f] = \mathbb{E}_\nu[f]$. Since the function $f \in C(\mathcal{K})$ was arbitrary, this identity holds for *every* continuous function on $\mathcal{K}$.
**Translating to $C_b(\mathcal{K})$.** Continuous functions on a compact metric space are automatically bounded: for $f \in C(\mathcal{K})$, the [Extreme Value Theorem](/theorems/???) gives $\sup_{x \in \mathcal{K}} |f(x)| < \infty$. Hence $C(\mathcal{K}) = C_b(\mathcal{K})$ in our setting, and we conclude \begin{align*} \int_\mathcal{K} f\, d\mu = \int_\mathcal{K} f\, d\nu \qquad \text{for every } f \in C_b(\mathcal{K}). \end{align*}
**Applying the measure-determination theorem.** We invoke the theorem [Bounded Continuous Functions Determine Borel Probability Measures](/theorems/???) (Dudley, *Real Analysis and Probability*, Lemma 9.3.2): \begin{quote} Let $X$ be a metric space and $\mu, \nu$ be Borel probability measures on $X$. If $\int f\, d\mu = \int f\, d\nu$ for every $f \in C_b(X)$, then $\mu = \nu$. \end{quote} We verify each hypothesis with $X = \mathcal{K}$: \begin{itemize} \item *$X$ is a metric space.* The set $\mathcal{K}$ is a compact metric space (as a subspace of $\mathcal{C}_p$, which carries a metric topology by hypothesis). \item *$\mu, \nu$ are Borel probability measures on $X$.* By assumption $\mu, \nu \in \mathcal{P}(\mathcal{K})$, the space of Borel probability measures on $\mathcal{K}$. \item *$\mathbb{E}_\mu[f] = \mathbb{E}_\nu[f]$ for every $f \in C_b(\mathcal{K})$.* Just established. \end{itemize} The conclusion is $\mu = \nu$.
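**A sketch of the $2\varepsilon$ estimate.** Step 4 is not reproduced here, so the following is a reconstruction under the assumption that it uses the usual three-term split. Suppose $g \in \mathcal{H}_\phi$ is chosen with $\|f - g\|_\infty < \varepsilon$, where $\|\cdot\|_\infty$ denotes the sup-norm on $\mathcal{K}$ (notation introduced for this sketch). Then \begin{align*} |\mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]| &\le |\mathbb{E}_\mu[f - g]| + |\mathbb{E}_\mu[g] - \mathbb{E}_\nu[g]| + |\mathbb{E}_\nu[g - f]| \\ &\le \|f - g\|_\infty + 0 + \|f - g\|_\infty \\ &< 2\varepsilon. \end{align*} The middle term vanishes because $\mu$ and $\nu$ agree on every element of $\mathcal{H}_\phi$, and the two outer terms are bounded by $\|f - g\|_\infty$ because $\mu$ and $\nu$ are probability measures.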
**Why does the chain of implications need universality?** The argument boils down to: $d_\phi(\mu, \nu) = 0 \Rightarrow \mathbb{E}_\mu = \mathbb{E}_\nu$ on $\mathcal{H}_\phi \Rightarrow \mathbb{E}_\mu = \mathbb{E}_\nu$ on $C_b(\mathcal{K}) \Rightarrow \mu = \nu$. The first arrow is just the reproducing property; the third is Dudley's lemma. The middle arrow, which extends integral identities from the dense subspace $\mathcal{H}_\phi$ of $C(\mathcal{K})$ to the whole space, is the *only* place where universality enters. Without universality, $\mathcal{H}_\phi$ might be dense only in some smaller space (e.g. polynomials of bounded degree, smooth functions vanishing on a fixed set), and Dudley's lemma would not apply because we would only have integral identities on a non-dense subset of $C_b(\mathcal{K})$.
**Closing the equivalence.** The forward implication $\mu = \nu \Rightarrow d_\phi(\mu, \nu) = 0$ from Step 1 is trivial: equal measures have equal kernel mean embeddings, so the difference has zero norm. The converse, just established, is the substantive content. Together, \begin{align*} d_\phi(\mu, \nu) = 0 \iff \mu = \nu \qquad \text{(so $d_\phi$, which is always a pseudometric, is a metric on $\mathcal{P}(\mathcal{K})$).} \end{align*}
**The role of compactness of $\mathcal{K}$.** Beyond guaranteeing that the kernel mean embedding is well-defined, compactness is used twice: once in Step 3 (universality is defined on compact subsets), and once here to identify $C(\mathcal{K})$ with $C_b(\mathcal{K})$. On a non-compact $\mathcal{K}$, continuous functions need not be bounded, and the universality-based argument would need a separate truncation step. The standard MMD framework therefore restricts attention to compact $\mathcal{K}$ for cleanliness, although extensions to non-compact settings exist (using e.g. Polish-space versions of Dudley's lemma and characteristic kernels of integrable type).[/guided]
