[guided]By the [Definition of MMD](/theorems/???), the squared MMD is a squared Hilbert-space norm: $d_\phi(\rho_1, \rho_2)^2 = \|M^\phi_{\rho_1} - M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2$, where $M^\phi_\rho \in \mathcal{H}_\phi$ is the kernel mean embedding of $\rho$. To compare this with a weak-convergence statement, we rewrite each term of the norm as an *integral of $k_\phi$ against a product measure*. Weak convergence is defined through integrals of bounded continuous functions, so this is exactly the form on which it acts.
**The polarisation identity.** For any two vectors $a, b$ in a real Hilbert space, bilinearity and symmetry of the inner product give
\begin{align*}
\|a - b\|^2 = \langle a - b, a - b\rangle = \langle a, a\rangle - 2\langle a, b\rangle + \langle b, b\rangle = \|a\|^2 - 2\langle a, b\rangle + \|b\|^2.
\end{align*}
Apply this with $a = M^\phi_{\rho_1}$, $b = M^\phi_{\rho_2}$ in $\mathcal{H}_\phi$:
\begin{align*}
\|M^\phi_{\rho_1} - M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2 = \|M^\phi_{\rho_1}\|_{\mathcal{H}_\phi}^2 - 2\langle M^\phi_{\rho_1}, M^\phi_{\rho_2}\rangle_{\mathcal{H}_\phi} + \|M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2.
\end{align*}
**Inner products as integrals against product measures.** By the [Inner Product Formula for Kernel Mean Embeddings](/theorems/???),
\begin{align*}
\langle M^\phi_\rho, M^\phi_\tau\rangle_{\mathcal{H}_\phi} = \int_{\mathcal{K} \times \mathcal{K}} k_\phi(x, y)\, d(\rho \otimes \tau)(x, y) \qquad \text{for any } \rho, \tau \in \mathcal{P}(\mathcal{K}).
\end{align*}
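This identity comes from the reproducing property. Write $M^\phi_\rho = \int k_\phi(x, \cdot)\, d\rho(x)$ and $M^\phi_\tau = \int k_\phi(y, \cdot)\, d\tau(y)$ as Bochner integrals in $\mathcal{H}_\phi$; these exist because $\|k_\phi(x, \cdot)\|_{\mathcal{H}_\phi}^2 = k_\phi(x, x)$ and $k_\phi$ is bounded on the compact set $\mathcal{K} \times \mathcal{K}$. The inner product is continuous and linear in each argument, so it commutes with Bochner integration in each slot, and therefore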
\begin{align*}
\langle M^\phi_\rho, M^\phi_\tau\rangle_{\mathcal{H}_\phi} = \int\int \langle k_\phi(x, \cdot), k_\phi(y, \cdot)\rangle_{\mathcal{H}_\phi}\, d\rho(x)\, d\tau(y) = \int\int k_\phi(x, y)\, d\rho(x)\, d\tau(y),
\end{align*}
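using $\langle k_\phi(x, \cdot), k_\phi(y, \cdot)\rangle_{\mathcal{H}_\phi} = k_\phi(x, y)$ (the reproducing property of the kernel). Since $k_\phi$ is continuous and bounded on the compact set $\mathcal{K} \times \mathcal{K}$, Fubini's theorem identifies this iterated integral with $\int k_\phi\, d(\rho \otimes \tau)$. Setting $\rho = \tau$ gives $\|M^\phi_\rho\|_{\mathcal{H}_\phi}^2 = \int k_\phi\, d(\rho \otimes \rho)$.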
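As a sanity check of the inner product formula, one can pick a kernel whose feature map is finite-dimensional and measures with finite support, so that every Bochner integral becomes a finite weighted sum. The following is a minimal sketch under assumptions that are ours, not the text's: $\mathcal{K} = [0, 1]$, the hypothetical kernel $k(x, y) = (1 + xy)^2$ with explicit feature map $\Phi(x) = (1, \sqrt{2}\,x, x^2)$ standing in for $k_\phi$, and two made-up discrete probability measures. The helper names (`feature`, `mean_embedding`, `kernel_integral`) are illustrative only.

```python
import math


def feature(x):
    # Explicit feature map for the hypothetical kernel k(x, y) = (1 + x*y)**2:
    # (1 + x*y)**2 = 1 + 2*x*y + x**2 * y**2 = <feature(x), feature(y)>.
    return [1.0, math.sqrt(2.0) * x, x * x]


def kernel(x, y):
    return (1.0 + x * y) ** 2


def inner(a, b):
    # Euclidean inner product, playing the role of <., .>_H in this example.
    return sum(ai * bi for ai, bi in zip(a, b))


def mean_embedding(points, weights):
    # For a discrete measure sum_i w_i * delta_{x_i}, the Bochner integral of
    # feature(x) reduces to the weighted sum of the feature vectors.
    emb = [0.0, 0.0, 0.0]
    for x, w in zip(points, weights):
        emb = [e + w * f for e, f in zip(emb, feature(x))]
    return emb


def kernel_integral(points_a, weights_a, points_b, weights_b):
    # Integral of k against the product measure rho (x) tau: a double weighted sum.
    return sum(wa * wb * kernel(x, y)
               for x, wa in zip(points_a, weights_a)
               for y, wb in zip(points_b, weights_b))


# Hypothetical discrete probability measures on K = [0, 1].
rho_pts, rho_w = [0.1, 0.5, 0.9], [0.2, 0.3, 0.5]
tau_pts, tau_w = [0.0, 0.7], [0.6, 0.4]

lhs = inner(mean_embedding(rho_pts, rho_w), mean_embedding(tau_pts, tau_w))
rhs = kernel_integral(rho_pts, rho_w, tau_pts, tau_w)

# Special case rho = tau: the squared norm of the mean embedding.
m_rho = mean_embedding(rho_pts, rho_w)
self_lhs = inner(m_rho, m_rho)
self_rhs = kernel_integral(rho_pts, rho_w, rho_pts, rho_w)

print(lhs, rhs)            # equal up to floating-point rounding
print(self_lhs, self_rhs)  # equal up to floating-point rounding
```

For finitely supported measures the product measure is itself finitely supported, which is why the right-hand side is a plain double sum.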
**The full expansion.** Substituting these identities into the polarisation expansion,
\begin{align*}
\|M^\phi_{\rho_1} - M^\phi_{\rho_2}\|_{\mathcal{H}_\phi}^2 = \int_{\mathcal{K} \times \mathcal{K}} k_\phi\, d(\rho_1 \otimes \rho_1) - 2\int_{\mathcal{K} \times \mathcal{K}} k_\phi\, d(\rho_1 \otimes \rho_2) + \int_{\mathcal{K} \times \mathcal{K}} k_\phi\, d(\rho_2 \otimes \rho_2).
\end{align*}
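For two discrete measures the full expansion is a finite computation, and when the feature map is explicit the left-hand side can be computed independently and compared. A minimal sketch, assuming (hypothetically, not from the text) $\mathcal{K} = [0, 1]$, the illustrative kernel $k(x, y) = (1 + xy)^2$ with feature map $\Phi(x) = (1, \sqrt{2}\,x, x^2)$ in place of $k_\phi$, and uniform empirical measures on two invented point sets.

```python
import math


def kernel(x, y):
    # Hypothetical kernel with explicit feature map (1, sqrt(2)*x, x**2).
    return (1.0 + x * y) ** 2


def feature(x):
    return [1.0, math.sqrt(2.0) * x, x * x]


def cross_term(xs, ys):
    # Integral of k against (uniform empirical measure on xs) (x) (uniform on ys).
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared_expansion(xs, ys):
    # Right-hand side: the three product-measure integrals with weights +1, -2, +1.
    return cross_term(xs, xs) - 2.0 * cross_term(xs, ys) + cross_term(ys, ys)


def mmd_squared_feature_norm(xs, ys):
    # Left-hand side: squared norm of the difference of the two mean embeddings,
    # computed coordinatewise in the explicit feature space.
    mean_x = [sum(c) / len(xs) for c in zip(*(feature(x) for x in xs))]
    mean_y = [sum(c) / len(ys) for c in zip(*(feature(y) for y in ys))]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


# Hypothetical samples in K = [0, 1] defining two uniform empirical measures.
xs = [0.1, 0.4, 0.8]
ys = [0.2, 0.3, 0.9, 1.0]

print(mmd_squared_expansion(xs, ys))     # right-hand side of the expansion
print(mmd_squared_feature_norm(xs, ys))  # left-hand side; agrees up to rounding
```

For kernels without a finite feature map, such as the Gaussian kernel, only the right-hand side can be evaluated this way, which is one practical reason the expansion is useful.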
**Finiteness of each integral.** The kernel $k_\phi : \mathcal{K} \times \mathcal{K} \to \mathbb{R}$ is continuous by hypothesis, and $\mathcal{K} \times \mathcal{K}$ is compact as a product of compact spaces. The continuous function $|k_\phi|$ therefore attains its supremum there (extreme value theorem), so $M := \sup_{(x, y) \in \mathcal{K} \times \mathcal{K}} |k_\phi(x, y)| < \infty$. For any product probability measure $\rho \otimes \tau$ on $\mathcal{K} \times \mathcal{K}$,
\begin{align*}
\Bigl|\int k_\phi\, d(\rho \otimes \tau)\Bigr| \le \int |k_\phi|\, d(\rho \otimes \tau) \le M \cdot (\rho \otimes \tau)(\mathcal{K} \times \mathcal{K}) = M < \infty.
\end{align*}
All three integrals are therefore finite real numbers, so the expansion above is an identity between finite quantities and involves no indeterminate form $\infty - \infty$.
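To see the three integrals at work along a weakly convergent sequence, here is a minimal sketch under assumptions that are ours, not the text's: $\mathcal{K} = [0, 1]$, a Gaussian kernel $k(x, y) = \exp(-(x - y)^2 / (2\sigma^2))$ with a hypothetical bandwidth $\sigma = 0.5$ standing in for $k_\phi$, and the sequence $\mu_n = \delta_{1/n}$, which converges weakly to $\mu = \delta_0$. The Gaussian kernel satisfies $0 < k \le 1$, so $M = 1$ in the bound above, and a product of Dirac masses is a Dirac mass, so each integral is a single kernel evaluation. In closed form, $d_\phi(\mu_n, \mu)^2 = 1 - 2e^{-1/(2n^2\sigma^2)} + 1$.

```python
import math

SIGMA = 0.5  # hypothetical bandwidth, chosen only for illustration
NS = (1, 10, 100, 1000)


def gaussian_kernel(x, y, sigma=SIGMA):
    # Bounded continuous kernel on [0, 1] with 0 < k <= 1, so M = 1.
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd_squared_diracs(a, b):
    # For rho1 = delta_a and rho2 = delta_b each product measure is a Dirac mass,
    # so each integral in the expansion is one kernel evaluation.
    term_11 = gaussian_kernel(a, a)  # integral against delta_a (x) delta_a
    term_12 = gaussian_kernel(a, b)  # integral against delta_a (x) delta_b
    term_22 = gaussian_kernel(b, b)  # integral against delta_b (x) delta_b
    return term_11 - 2.0 * term_12 + term_22


# mu_n = delta_{1/n} converges weakly to mu = delta_0.
values = [mmd_squared_diracs(1.0 / n, 0.0) for n in NS]
for n, v in zip(NS, values):
    print(n, v)
```

The two diagonal terms equal $1$ for every $n$, and the cross term $e^{-1/(2n^2\sigma^2)}$ tends to $1$; with the coefficients $+1, -2, +1$ the limit is $1 - 2 + 1 = 0$.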
**Strategic significance of this expansion.** Taking $\rho_1 = \mu_n$ and $\rho_2 = \mu$, we have re-expressed $d_\phi(\mu_n, \mu)^2$ entirely in terms of three integrals of $k_\phi$ against the product measures $\mu_n \otimes \mu_n$, $\mu_n \otimes \mu$, and $\mu \otimes \mu$. The rest of the forward direction lifts weak convergence $\mu_n \rightharpoonup \mu$ to weak convergence of each of these product measures to $\mu \otimes \mu$ (Step 2), then applies the definition of weak convergence to the test function $k_\phi \in C_b(\mathcal{K} \times \mathcal{K})$, so that each integral converges to $\int k_\phi\, d(\mu \otimes \mu)$ (Step 3). Because the coefficients $+1, -2, +1$ sum to zero, the limit is $(1 - 2 + 1)\int k_\phi\, d(\mu \otimes \mu) = 0$, and hence $d_\phi(\mu_n, \mu)^2 \to 0$.[/guided]