Androma — The Home of Mathematics on the Internet

[guided]By the [Definition of MMD](/theorems/???), the squared MMD is a squared Hilbert-space norm: $d_\phi(\rho_1, \rho_2)^2 = \|M^\phi_{\rho_1} - M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2$, where $M^\phi_\rho \in \mathcal{H}_\phi$ is the kernel mean embedding of $\rho$. To compare this with a weak-convergence statement, we rewrite each term of the squared norm as an *integral against a product measure*, the form to which the definition of weak convergence applies directly.
**Expanding the squared norm.** For any two vectors $a, b$ in a real Hilbert space, bilinearity and symmetry of the inner product give \begin{align*} \|a - b\|^2 = \langle a - b, a - b\rangle = \langle a, a\rangle - 2\langle a, b\rangle + \langle b, b\rangle = \|a\|^2 - 2\langle a, b\rangle + \|b\|^2. \end{align*} Apply this with $a = M^\phi_{\rho_1}$, $b = M^\phi_{\rho_2}$ in $\mathcal{H}_\phi$: \begin{align*} \|M^\phi_{\rho_1} - M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2 = \|M^\phi_{\rho_1}\|_{\mathcal{H}_\phi}^2 - 2\langle M^\phi_{\rho_1}, M^\phi_{\rho_2}\rangle_{\mathcal{H}_\phi} + \|M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2. \end{align*}
**Inner products as integrals against product measures.** By the [Inner Product Formula for Kernel Mean Embeddings](/theorems/???), \begin{align*} \langle M^\phi_\rho, M^\phi_\tau\rangle_{\mathcal{H}_\phi} = \int_{\mathcal{K} \times \mathcal{K}} k_\phi(x, y)\, d(\rho \otimes \tau)(x, y) \qquad \text{for any } \rho, \tau \in \mathcal{P}(\mathcal{K}). \end{align*} This identity comes from the reproducing property. Write $M^\phi_\rho = \int k_\phi(x, \cdot)\, d\rho(x)$ and $M^\phi_\tau = \int k_\phi(y, \cdot)\, d\tau(y)$ as Bochner integrals in $\mathcal{H}_\phi$. The inner product is continuous and bilinear, so it passes through both Bochner integrals: \begin{align*} \langle M^\phi_\rho, M^\phi_\tau\rangle_{\mathcal{H}_\phi} = \int\int \langle k_\phi(x, \cdot), k_\phi(y, \cdot)\rangle_{\mathcal{H}_\phi}\, d\rho(x)\, d\tau(y) = \int\int k_\phi(x, y)\, d\rho(x)\, d\tau(y), \end{align*} using $\langle k_\phi(x, \cdot), k_\phi(y, \cdot)\rangle_{\mathcal{H}_\phi} = k_\phi(x, y)$ (the reproducing property of the kernel). The iterated integral equals the integral against $\rho \otimes \tau$ by Fubini's theorem, which applies because $k_\phi$ is bounded (see the finiteness paragraph below). Setting $\rho = \tau$ gives $\|M^\phi_\rho\|_{\mathcal{H}_\phi}^2 = \int k_\phi\, d(\rho \otimes \rho)$.
**The full expansion.** Substituting these identities into the expansion of the squared norm, \begin{align*} \|M^\phi_{\rho_1} - M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2 = \int_{\mathcal{K} \times \mathcal{K}} k_\phi\, d(\rho_1 \otimes \rho_1) - 2\int_{\mathcal{K} \times \mathcal{K}} k_\phi\, d(\rho_1 \otimes \rho_2) + \int_{\mathcal{K} \times \mathcal{K}} k_\phi\, d(\rho_2 \otimes \rho_2). \end{align*}
**Finiteness of each integral.** The kernel $k_\phi : \mathcal{K} \times \mathcal{K} \to \mathbb{R}$ is continuous by hypothesis. The product $\mathcal{K} \times \mathcal{K}$ of compact spaces is compact, and a continuous real-valued function on a compact space is bounded (extreme value theorem), so there exists $M < \infty$ with $\sup_{(x, y) \in \mathcal{K} \times \mathcal{K}} |k_\phi(x, y)| \le M$. For any product probability measure $\rho \otimes \tau$ on $\mathcal{K} \times \mathcal{K}$, \begin{align*} \Bigl|\int k_\phi\, d(\rho \otimes \tau)\Bigr| \le \int |k_\phi|\, d(\rho \otimes \tau) \le M \cdot (\rho \otimes \tau)(\mathcal{K} \times \mathcal{K}) = M < \infty. \end{align*} All three integrals are therefore finite, so the expansion never involves an $\infty - \infty$ expression.
**Strategic significance of this expansion.** Taking $\rho_1 = \mu_n$ and $\rho_2 = \mu$, we have re-expressed $d_\phi(\mu_n, \mu)^2$ entirely in terms of three integrals against the product measures $\mu_n \otimes \mu_n$, $\mu_n \otimes \mu$, and $\mu \otimes \mu$. The rest of the forward direction lifts weak convergence $\mu_n \rightharpoonup \mu$ to weak convergence of these product measures (Step 2), then uses $k_\phi \in C_b(\mathcal{K} \times \mathcal{K})$ as a test function in the definition of weak convergence to obtain convergence of each integral (Step 3). All three integrals then converge to the same limit $\int k_\phi\, d(\mu \otimes \mu)$, so the coefficients $+1, -2, +1$ cancel exactly in the limit and $d_\phi(\mu_n, \mu)^2 \to 0$.[/guided]
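The three-term expansion can be checked numerically for finitely supported measures, where each integral against a product measure reduces to a weighted double sum. The following is a minimal sketch under stated assumptions: the Gaussian kernel, its bandwidth, and the helper names (`gauss_kernel`, `product_integral`, `mmd_squared`) are illustrative choices, not part of the proof.

```python
import math

def gauss_kernel(x, y, bandwidth=0.5):
    """Gaussian kernel on the real line (assumed example); continuous, bounded by 1."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def product_integral(kernel, rho, tau):
    """Integral of the kernel against the product measure of rho and tau.

    Each measure is a list of (point, weight) pairs whose weights sum to 1,
    so the integral is a weighted double sum.
    """
    return sum(w * v * kernel(x, y) for x, w in rho for y, v in tau)

def mmd_squared(kernel, rho1, rho2):
    """Three-term expansion with coefficients +1, -2, +1."""
    return (product_integral(kernel, rho1, rho1)
            - 2.0 * product_integral(kernel, rho1, rho2)
            + product_integral(kernel, rho2, rho2))

# mu is the point mass at 0; mu_n is the point mass at 1/n,
# which converges weakly to mu as n grows.
mu = [(0.0, 1.0)]
values = [mmd_squared(gauss_kernel, [(1.0 / n, 1.0)], mu)
          for n in (1, 10, 100, 1000)]
print(values)
```

For point masses the expansion collapses to $2 - 2k_\phi(1/n, 0)$, so the printed values decrease towards zero as $n$ grows, in line with the forward direction.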

[guided]The converse direction is the heart of the metrization claim. We are given $d_\phi(\mu_n, \mu) \to 0$, that is, convergence of the *kernel mean embeddings* in the norm of $\mathcal{H}_\phi$, and must conclude $\mu_n \rightharpoonup \mu$ in the *weak topology* on $\mathcal{P}(\mathcal{K})$, i.e. $\int f\, d\mu_n \to \int f\, d\mu$ for every $f \in C_b(\mathcal{K})$. The key idea is to use universality of $k_\phi$ to bridge between $C_b(\mathcal{K})$ and $\mathcal{H}_\phi$, where convergence in $d_\phi$ acts directly through the reproducing property.
**(a) Setup.** Fix $f \in C_b(\mathcal{K})$ and $\varepsilon > 0$. Since $\mathcal{K}$ is compact, $C(\mathcal{K}) = C_b(\mathcal{K})$ (every continuous function on a compact space is bounded, by the extreme value theorem). By universality of $k_\phi$, the RKHS $\mathcal{H}_\phi$ is dense in $C(\mathcal{K})$ with respect to the uniform norm, so there exists $g_\varepsilon \in \mathcal{H}_\phi$, depending on the choice of $\varepsilon$, with \begin{align*} \|f - g_\varepsilon\|_\infty := \sup_{x \in \mathcal{K}} |f(x) - g_\varepsilon(x)| < \varepsilon. \end{align*} The element $g_\varepsilon$ is fixed for the remainder of the argument; in particular, $\|g_\varepsilon\|_{\mathcal{H}_\phi}$ is a finite constant depending on $\varepsilon$ and $f$ but not on $n$.
**(b) Triangle decomposition.** Insert $g_\varepsilon$ as a proxy and apply the triangle inequality: \begin{align*} |\mu_n(f) - \mu(f)| \le \underbrace{|\mu_n(f) - \mu_n(g_\varepsilon)|}_{\text{(I)}} + \underbrace{|\mu_n(g_\varepsilon) - \mu(g_\varepsilon)|}_{\text{(II)}} + \underbrace{|\mu(g_\varepsilon) - \mu(f)|}_{\text{(III)}}. \end{align*}
**(c) Bounding terms (I) and (III) by uniform closeness.** For any probability measure $\rho \in \mathcal{P}(\mathcal{K})$, \begin{align*} |\rho(f) - \rho(g_\varepsilon)| = \Bigl|\int_\mathcal{K} (f - g_\varepsilon)\, d\rho\Bigr| \le \int_\mathcal{K} |f - g_\varepsilon|\, d\rho \le \|f - g_\varepsilon\|_\infty \cdot \rho(\mathcal{K}) = \|f - g_\varepsilon\|_\infty < \varepsilon, \end{align*} using the pointwise bound $|f(x) - g_\varepsilon(x)| \le \|f - g_\varepsilon\|_\infty$ on $\mathcal{K}$, monotonicity of the integral, and $\rho(\mathcal{K}) = 1$ since $\rho$ is a probability measure. Apply this with $\rho = \mu_n$ for term (I) and $\rho = \mu$ for term (III); each is bounded by $\varepsilon$, uniformly in $n$.
**(d) Bounding term (II) via the reproducing property.** This is the bridge from "weak topology" to "MMD topology". Since $g_\varepsilon \in \mathcal{H}_\phi$, the [Reproducing Property of the Kernel Mean Embedding](/theorems/???) gives, for any $\rho \in \mathcal{P}(\mathcal{K})$, \begin{align*} \rho(g_\varepsilon) = \mathbb{E}_{x \sim \rho}[g_\varepsilon(x)] = \langle g_\varepsilon, M^\phi_\rho\rangle_{\mathcal{H}_\phi}. \end{align*} Subtracting the identity for $\rho = \mu$ from the identity for $\rho = \mu_n$, \begin{align*} \mu_n(g_\varepsilon) - \mu(g_\varepsilon) = \langle g_\varepsilon, M^\phi_{\mu_n} - M^\phi_\mu\rangle_{\mathcal{H}_\phi}. \end{align*} Apply Cauchy--Schwarz in the Hilbert space $\mathcal{H}_\phi$: \begin{align*} |\mu_n(g_\varepsilon) - \mu(g_\varepsilon)| \le \|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot \|M^\phi_{\mu_n} - M^\phi_\mu\|_{\mathcal{H}_\phi} = \|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot d_\phi(\mu_n, \mu). \end{align*} On the right-hand side, $\|g_\varepsilon\|_{\mathcal{H}_\phi}$ is fixed (independent of $n$) and $d_\phi(\mu_n, \mu) \to 0$ by hypothesis, so term (II) tends to zero as $n \to \infty$.
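The Cauchy--Schwarz bound on term (II) can be checked numerically for a function that lies in the RKHS by construction. This is a sketch under assumptions: it uses a Gaussian kernel, finitely supported measures, and a hypothetical $g = \sum_i \alpha_i k_\phi(c_i, \cdot)$, whose RKHS norm is $\sqrt{\sum_{i,j} \alpha_i \alpha_j k_\phi(c_i, c_j)}$ by the reproducing property. All names and numbers are illustrative.

```python
import math

def k(x, y, bandwidth=0.5):
    """Gaussian kernel (assumed example kernel)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def pair_integral(rho, tau):
    """Integral of k against the product of two finitely supported measures."""
    return sum(w * v * k(x, y) for x, w in rho for y, v in tau)

def mmd(rho, tau):
    """MMD via the three-term expansion; clamp tiny negative rounding errors."""
    sq = (pair_integral(rho, rho) - 2.0 * pair_integral(rho, tau)
          + pair_integral(tau, tau))
    return math.sqrt(max(sq, 0.0))

# Hypothetical RKHS element g = sum_i alphas[i] * k(centers[i], .)
centers = [0.1, 0.5, 0.9]
alphas = [1.0, -2.0, 1.5]

def g(x):
    return sum(a * k(c, x) for a, c in zip(alphas, centers))

# RKHS norm of g: square root of the quadratic form alpha^T K alpha.
g_norm = math.sqrt(sum(a * b * k(c, d)
                       for a, c in zip(alphas, centers)
                       for b, d in zip(alphas, centers)))

def integrate(func, rho):
    """Integral of func against a finitely supported measure."""
    return sum(w * func(x) for x, w in rho)

mu = [(0.2, 0.5), (0.8, 0.5)]
checks = []
for n in (1, 10, 100):
    shift = 0.1 / n
    mu_n = [(0.2 + shift, 0.5), (0.8 - shift, 0.5)]
    lhs = abs(integrate(g, mu_n) - integrate(g, mu))   # term (II)
    rhs = g_norm * mmd(mu_n, mu)                       # Cauchy-Schwarz bound
    checks.append((lhs, rhs))
    print(n, lhs, rhs)
```

In each printed row the first number (term (II)) should not exceed the second (the bound), and the bound shrinks as the atoms of $\mu_n$ move towards those of $\mu$.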
**(e) Combining the bounds.** From (b)--(d), \begin{align*} |\mu_n(f) - \mu(f)| \le \varepsilon + \|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot d_\phi(\mu_n, \mu) + \varepsilon = 2\varepsilon + \|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot d_\phi(\mu_n, \mu). \end{align*} Take the limsup over $n$, treating $\varepsilon$ and $g_\varepsilon$ as fixed: \begin{align*} \limsup_{n \to \infty} |\mu_n(f) - \mu(f)| \le 2\varepsilon + \|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot \limsup_{n \to \infty} d_\phi(\mu_n, \mu) = 2\varepsilon + \|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot 0 = 2\varepsilon. \end{align*} The left-hand side does not depend on $\varepsilon$, and the bound $\limsup |\mu_n(f) - \mu(f)| \le 2\varepsilon$ holds for *every* $\varepsilon > 0$, so letting $\varepsilon \to 0^+$ gives \begin{align*} \limsup_{n \to \infty} |\mu_n(f) - \mu(f)| = 0, \end{align*} which means $\mu_n(f) \to \mu(f)$.
**(f) Conclusion.** The function $f \in C_b(\mathcal{K})$ was arbitrary, so $\mu_n(f) \to \mu(f)$ for every $f \in C_b(\mathcal{K})$. By the [Definition of Weak Convergence](/theorems/???) on a metric space (which uses $C_b$ test functions), $\mu_n \rightharpoonup \mu$.
**Why the order of limits matters.** Notice we send $n \to \infty$ first (with $\varepsilon$ fixed) and then $\varepsilon \to 0$, not the reverse. This is essential because $g_\varepsilon$ depends on $\varepsilon$: as $\varepsilon \to 0$, the approximant changes and $\|g_\varepsilon\|_{\mathcal{H}_\phi}$ may *blow up* (better RKHS approximations of an arbitrary continuous function typically have larger RKHS norm). If we sent $\varepsilon \to 0$ first with $n$ held fixed, the term $\|g_\varepsilon\|_{\mathcal{H}_\phi} \cdot d_\phi(\mu_n, \mu)$ might diverge. Sending $n \to \infty$ first drives $d_\phi(\mu_n, \mu)$ to zero while $\|g_\varepsilon\|_{\mathcal{H}_\phi}$ is a fixed constant for *that specific* $g_\varepsilon$, so any growth of the norm as $\varepsilon \to 0$ never enters the estimate. This is the usual pattern of a density argument: fix the approximant, pass to the limit in $n$, and only then let the approximation error shrink.
**Why universality is essential.** Without universality, we could not produce $g_\varepsilon \in \mathcal{H}_\phi$ uniformly close to a generic $f \in C_b(\mathcal{K})$. The argument would then only conclude $\mu_n(f) \to \mu(f)$ for $f$ in the uniform closure of $\mathcal{H}_\phi$, which might be a strict subset of $C_b(\mathcal{K})$. Convergence on such a smaller class is in general weaker than weak convergence: two distinct measures could have the same integral against every function in $\mathcal{H}_\phi$ (and hence the same mean embedding) yet differ on other continuous functions. In that case $d_\phi$ would not separate them, and this argument would no longer show that $d_\phi$ metrizes the weak topology.[/guided]
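A concrete instance of this failure, offered as an illustrative sketch rather than as part of the proof: on $\mathcal{K} = [-1, 1]$ the linear kernel $k(x, y) = xy$ is continuous and positive semi-definite but not universal, since its RKHS contains only the linear functions $x \mapsto cx$. The measures, kernels, and helper names below are assumptions chosen for the example. The Gaussian kernel is included for contrast because it is known to be universal on compact subsets of $\mathbb{R}^d$.

```python
import math

def linear_kernel(x, y):
    """Non-universal kernel: its RKHS is the span of the identity map."""
    return x * y

def gauss_kernel(x, y, bandwidth=0.5):
    """Gaussian kernel, universal on compact subsets of the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def pair_integral(kernel, rho, tau):
    """Integral of the kernel against the product of two discrete measures."""
    return sum(w * v * kernel(x, y) for x, w in rho for y, v in tau)

def mmd_squared(kernel, rho, tau):
    """Three-term expansion of the squared MMD."""
    return (pair_integral(kernel, rho, rho)
            - 2.0 * pair_integral(kernel, rho, tau)
            + pair_integral(kernel, tau, tau))

def integrate(func, rho):
    """Integral of func against a finitely supported measure."""
    return sum(w * func(x) for x, w in rho)

# Two distinct probability measures on [-1, 1] with the same mean.
mu = [(0.0, 1.0)]                 # point mass at 0
nu = [(-1.0, 0.5), (1.0, 0.5)]    # equal masses at -1 and +1

linear_gap = mmd_squared(linear_kernel, mu, nu)
gauss_gap = mmd_squared(gauss_kernel, mu, nu)
f_gap = abs(integrate(lambda x: x * x, mu) - integrate(lambda x: x * x, nu))

print(linear_gap, gauss_gap, f_gap)
```

With the linear kernel the squared MMD is the squared difference of the means, so it vanishes here even though the continuous test function $f(x) = x^2$ separates the two measures. The constant sequence $\mu_n = \nu$ therefore has linear-kernel distance zero to $\mu$ without converging weakly to it, whereas the Gaussian-kernel MMD is strictly positive.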
