Level-alpha MMD Test — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The strategy is a bounded-differences concentration argument applied to $\hat{d}_\phi(\mu,\nu)^2$ viewed as a function of the $2m$ independent samples. Under the null $H_0 : \mu = \nu$ the estimator has mean zero (by [Unbiased MMD Estimator](/theorems/2521) combined with [MMD is a Metric under Characteristicness](/theorems/2522)). We bound the maximum change of the estimator when a single sample is replaced — this gives explicit bounded-differences constants $c_k = 8M/m$ for each of the $2m$ coordinates. McDiarmid's inequality then converts the per-coordinate bounds into an exponential tail bound, and inverting that bound at level $\alpha$ produces the rejection threshold $(8M/\sqrt{m}) \sqrt{\log(2/\alpha)}$. [/proofplan] [step:Reformulate $\hat{d}_\phi(\mu,\nu)^2$ as a function of $2m$ independent samples] Define \begin{align*} F : \mathcal{K}^{2m} &\to \mathbb{R}, \\ (x_1, \dots, x_m, y_1, \dots, y_m) &\mapsto \frac{1}{m(m-1)} \sum_{i \neq j} k_\phi(x_i, x_j) - \frac{2}{m^2} \sum_{i,j=1}^m k_\phi(x_i, y_j) + \frac{1}{m(m-1)} \sum_{i \neq j} k_\phi(y_i, y_j). \end{align*} Then $\hat{d}_\phi(\mu, \nu)^2 = F(X_1, \dots, X_m, Y_1, \dots, Y_m)$, and the random vector $Z = (X_1, \dots, X_m, Y_1, \dots, Y_m)$ has independent coordinates: the $X$-block is i.i.d. $\mu$, the $Y$-block is i.i.d. $\nu$, and the two blocks are independent of each other by hypothesis. [/step] [step:Verify $\mathbb{E}_{H_0}[\hat{d}_\phi(\mu, \nu)^2] = 0$ under the null] Under $H_0$, $\mu = \nu$. By [Unbiased MMD Estimator](/theorems/2521), $\mathbb{E}[\hat{d}_\phi(\mu, \nu)^2] = d_\phi(\mu, \nu)^2$. By [MMD is a Metric under Characteristicness](/theorems/2522), $d_\phi(\mu, \nu) = 0$ when $\mu = \nu$. Hence \begin{align*} \mathbb{E}_{H_0}[\hat{d}_\phi(\mu, \nu)^2] = 0. \end{align*} [/step] [step:Compute the bounded-differences constant for replacing a single $X$-coordinate] Fix $k \in \{1, \dots, m\}$ and let $Z' = (X_1, \dots, X_{k-1}, X_k', X_{k+1}, \dots, X_m, Y_1, \dots, Y_m)$ differ from $Z$ only in the $k$-th $X$-coordinate. We bound \begin{align*} |F(Z) - F(Z')| \le c_k. \end{align*} Replacing $X_k$ affects only those terms in $F$ that involve $X_k$ — namely the terms $k_\phi(X_k, X_j)$ and $k_\phi(X_j, X_k)$ in the first sum, and $k_\phi(X_k, Y_j)$ in the cross sum. **First sum (within-$X$ off-diagonal):** The pairs containing index $k$ are $\{(k, j) : j \neq k\} \cup \{(j, k) : j \neq k\}$, totalling $2(m-1)$ ordered pairs. Each $k_\phi$ value lies in $[-M, M]$, so the change in any single term is at most $2M$. Hence \begin{align*} \left| \frac{1}{m(m-1)} \sum_{i \neq j} k_\phi(X_i, X_j) - \frac{1}{m(m-1)} \sum_{i \neq j} k_\phi(X_i', X_j') \right| \le \frac{2(m-1) \cdot 2M}{m(m-1)} = \frac{4M}{m}, \end{align*} where we use the convention $X_i' = X_i$ for $i \neq k$. **Cross sum:** The terms containing $X_k$ are $\{(k, j) : 1 \le j \le m\}$, totalling $m$ pairs, each shifted by at most $2M$. The cross term carries a coefficient of $-2/m^2$, so \begin{align*} \left| -\frac{2}{m^2} \sum_{i, j} k_\phi(X_i, Y_j) + \frac{2}{m^2} \sum_{i, j} k_\phi(X_i', Y_j) \right| \le \frac{2 \cdot m \cdot 2M}{m^2} = \frac{4M}{m}. \end{align*} **Within-$Y$ sum:** Unchanged. By the triangle inequality, \begin{align*} |F(Z) - F(Z')| \le \frac{4M}{m} + \frac{4M}{m} = \frac{8M}{m}. \end{align*} Thus $c_k = 8M/m$ for $k \in \{1, \dots, m\}$ (the $X$-coordinates). [guided] McDiarmid's inequality requires a coordinate-wise bound: for each $k$, we need a constant $c_k$ such that flipping $X_k \to X_k'$ (holding all other coordinates fixed) changes $F$ by at most $c_k$. This number controls the variance proxy in the resulting Gaussian tail; getting it wrong by even a constant factor changes the threshold. So we do the bookkeeping carefully. Fix $k \in \{1, \dots, m\}$ and define $Z' := (X_1, \dots, X_{k-1}, X_k', X_{k+1}, \dots, X_m, Y_1, \dots, Y_m)$ — same as $Z$ except in position $k$. Recall \begin{align*} F(Z) = \underbrace{\frac{1}{m(m-1)} \sum_{i \neq j} k_\phi(X_i, X_j)}_{\text{within-}X} - \underbrace{\frac{2}{m^2} \sum_{i,j=1}^m k_\phi(X_i, Y_j)}_{\text{cross}} + \underbrace{\frac{1}{m(m-1)} \sum_{i \neq j} k_\phi(Y_i, Y_j)}_{\text{within-}Y}. \end{align*} The within-$Y$ term does not depend on $X_k$, so it contributes $0$ to $|F(Z) - F(Z')|$. We bound the other two. **Within-$X$ term.** It is $\frac{1}{m(m-1)}$ times a sum over the $m(m-1)$ ordered off-diagonal pairs. The pairs that involve index $k$ are $(k, j)$ for $j \neq k$ ($m-1$ pairs) and $(j, k)$ for $j \neq k$ (another $m-1$ pairs), giving $2(m-1)$ pairs. Each $k_\phi$ value lies in $[-M, M]$ (by hypothesis $|k_\phi| \le M$), so each term shifts by at most $|k_\phi(\cdot,\cdot) - k_\phi(\cdot,\cdot)| \le 2M$. By the triangle inequality on these $2(m-1)$ terms, \begin{align*} \Delta_{\text{within-}X} \le \frac{1}{m(m-1)} \cdot 2(m-1) \cdot 2M = \frac{4M}{m}. \end{align*} **Cross term.** It is $-\frac{2}{m^2}$ times a sum over the $m^2$ ordered pairs $(i, j)$. The pairs involving $X_k$ are $(k, j)$ for $1 \le j \le m$ — exactly $m$ pairs. Each shifts by at most $2M$, so \begin{align*} \Delta_{\text{cross}} \le \frac{2}{m^2} \cdot m \cdot 2M = \frac{4M}{m}. \end{align*} Adding via the triangle inequality, \begin{align*} |F(Z) - F(Z')| \le 0 + \frac{4M}{m} + \frac{4M}{m} = \frac{8M}{m}. \end{align*} Hence $c_k = 8M/m$ for the $X$-coordinates. A useful sanity check: the within-$X$ and cross contributions come out equal precisely because the $1/(m(m-1))$ within-coefficient compensates for the smaller pair count $2(m-1)$, while the $2/m^2$ cross-coefficient compensates for the larger pair count $m$. This balance is by design — the unbiased MMD estimator's coefficients are chosen to make these cancellations clean. [/guided] [/step] [step:Compute the bounded-differences constant for replacing a single $Y$-coordinate] By symmetry of $F$ in the $X$ and $Y$ blocks (with the cross term being symmetric in $X$ and $Y$, since changing $Y_\ell$ shifts $m$ pairs $\{(i, \ell)\}$ in the cross sum and $2(m-1)$ pairs $\{(\ell, j), (j, \ell)\}$ in the within-$Y$ sum, by the same argument), the bounded-differences constant for the $\ell$-th $Y$-coordinate is also \begin{align*} c_{m + \ell} = \frac{8M}{m} \qquad \text{for } \ell \in \{1, \dots, m\}. \end{align*} [guided] We do the bookkeeping carefully because off-by-constant errors in the bounded-differences constants propagate to the threshold. The crucial observation is the structure of $F$ as a sum of three terms, and the role of each term's coefficient. For replacing $Y_\ell$ by $Y_\ell'$: the only terms in $F$ that change are those in the cross sum (mixed $X$, $Y$) and the within-$Y$ off-diagonal sum. **Cross sum.** It is $-\frac{2}{m^2}$ times a sum of $m^2$ pairs $(i, j)$. Replacing $Y_\ell$ affects the $m$ pairs $\{(i, \ell) : 1 \le i \le m\}$, each shifted by at most $|k_\phi(X_i, Y_\ell) - k_\phi(X_i, Y_\ell')| \le 2M$. By the triangle inequality, \begin{align*} \Delta_{\text{cross}} \le \frac{2}{m^2} \cdot m \cdot 2M = \frac{4M}{m}. \end{align*} **Within-$Y$ sum.** It is $\frac{1}{m(m-1)}$ times a sum of $m(m-1)$ ordered pairs $(i, j)$ with $i \neq j$. The pairs containing index $\ell$ are $(\ell, j)$ for $j \neq \ell$ ($m-1$ pairs) and $(j, \ell)$ for $j \neq \ell$ (another $m-1$ pairs), giving $2(m-1)$ total. Each shifted by at most $2M$: \begin{align*} \Delta_{\text{within-}Y} \le \frac{1}{m(m-1)} \cdot 2(m-1) \cdot 2M = \frac{4M}{m}. \end{align*} The first term (within-$X$) does not depend on $Y_\ell$, so it contributes $0$. Adding, \begin{align*} |F(Z) - F(Z')| \le 0 + \frac{4M}{m} + \frac{4M}{m} = \frac{8M}{m}. \end{align*} This matches the constant for the $X$-coordinates by the symmetric structure of $F$. So all $2m$ bounded-differences constants are equal: $c_k = 8M/m$. [/guided] [/step] [step:Apply McDiarmid's inequality to obtain a Gaussian tail bound] The vector $Z$ has $2m$ independent coordinates, and we have established the bounded-differences condition $|F(Z) - F(Z')| \le c_k = 8M/m$ for every coordinate replacement. We invoke [McDiarmid's Inequality](/theorems/???) (the bounded-differences concentration inequality) in its two-sided form: if $F$ satisfies the bounded-differences condition with constants $(c_k)_{k=1}^{N}$ on $N$ independent coordinates, then for every $t > 0$, \begin{align*} \mathbb{P}\!\left( \left| F(Z) - \mathbb{E}[F(Z)] \right| \ge t \right) \le 2 \exp\!\left( -\frac{2 t^2}{\sum_{k=1}^{N} c_k^2} \right). \end{align*} We apply this with $N = 2m$ and $c_k = 8M/m$ for all $k$, which gives \begin{align*} \sum_{k=1}^{2m} c_k^2 = 2m \cdot \left( \frac{8M}{m} \right)^2 = \frac{128 M^2}{m}. \end{align*} Substituting, \begin{align*} \mathbb{P}\!\left( \left| F(Z) - \mathbb{E}[F(Z)] \right| \ge t \right) \le 2 \exp\!\left( -\frac{2 t^2}{128 M^2 / m} \right) = 2 \exp\!\left( -\frac{m t^2}{64 M^2} \right). \end{align*} Under $H_0$, Step 2 gives $\mathbb{E}[F(Z)] = 0$, so dropping the lower tail (which can only weaken the bound) yields \begin{align*} \mathbb{P}_{H_0}\!\left( \hat{d}_\phi(\mu, \nu)^2 \ge t \right) \le \mathbb{P}_{H_0}\!\left( |\hat{d}_\phi(\mu, \nu)^2| \ge t \right) \le 2 \exp\!\left( -\frac{m t^2}{64 M^2} \right). \end{align*} [guided] The point of Steps 1-4 was to show that $F$ has the bounded-differences property: each of its $2m$ coordinates can shift the value by at most $8M/m$. We now cash this in for a Gaussian tail. We invoke [McDiarmid's Inequality](/theorems/???) (also called the bounded-differences inequality). Its hypotheses, which we verify: - **Independent coordinates.** $Z = (X_1, \dots, X_m, Y_1, \dots, Y_m)$ has $2m$ coordinates. The $X$-block is i.i.d. $\mu$, the $Y$-block is i.i.d. $\nu$, and the two blocks are independent of each other (Step 1). So all $2m$ coordinates of $Z$ are mutually independent. - **Bounded differences.** For each $k \in \{1, \dots, 2m\}$, replacing the $k$-th coordinate while holding the rest fixed changes $F$ by at most $c_k$. Steps 3 and 4 established $c_k = 8M/m$ uniformly. Under these hypotheses, McDiarmid's inequality (two-sided form) gives, for every $t > 0$, \begin{align*} \mathbb{P}\!\left( \left| F(Z) - \mathbb{E}[F(Z)] \right| \ge t \right) \le 2 \exp\!\left( -\frac{2 t^2}{\sum_{k=1}^{2m} c_k^2} \right). \end{align*} Plugging in $c_k = 8M/m$ for all $k$ and computing the sum: \begin{align*} \sum_{k=1}^{2m} c_k^2 = 2m \cdot \left( \frac{8M}{m} \right)^2 = 2m \cdot \frac{64 M^2}{m^2} = \frac{128 M^2}{m}. \end{align*} The exponent becomes \begin{align*} -\frac{2 t^2}{128 M^2 / m} = -\frac{m t^2}{64 M^2}, \end{align*} giving \begin{align*} \mathbb{P}\!\left( \left| F(Z) - \mathbb{E}[F(Z)] \right| \ge t \right) \le 2 \exp\!\left( -\frac{m t^2}{64 M^2} \right). \end{align*} Now we specialise to the null. Step 2 established that under $H_0$ (i.e. $\mu = \nu$), $\mathbb{E}_{H_0}[F(Z)] = \mathbb{E}_{H_0}[\hat{d}_\phi(\mu,\nu)^2] = 0$. Substituting, \begin{align*} \mathbb{P}_{H_0}\!\left( \left| \hat{d}_\phi(\mu, \nu)^2 \right| \ge t \right) \le 2 \exp\!\left( -\frac{m t^2}{64 M^2} \right). \end{align*} For deriving an upper-tail threshold we use $\{\hat{d}_\phi^2 \ge t\} \subseteq \{|\hat{d}_\phi^2| \ge t\}$, which only weakens the inequality: \begin{align*} \mathbb{P}_{H_0}\!\left( \hat{d}_\phi(\mu, \nu)^2 \ge t \right) \le 2 \exp\!\left( -\frac{m t^2}{64 M^2} \right). \end{align*} The factor of $1/m$ in the exponent is the source of the $1/\sqrt{m}$ rate in the threshold below — concentration improves with sample size at the parametric rate. [/guided] [/step] [step:Invert the tail bound at level $\alpha$ to derive the threshold] Set the right-hand side equal to $\alpha$ and solve for $t$: \begin{align*} 2 \exp\!\left( -\frac{m t^2}{64 M^2} \right) = \alpha \quad\Longleftrightarrow\quad \exp\!\left( -\frac{m t^2}{64 M^2} \right) = \frac{\alpha}{2} \quad\Longleftrightarrow\quad \frac{m t^2}{64 M^2} = \log\!\left( \frac{2}{\alpha} \right). \end{align*} Solving for $t$ and taking the positive root, \begin{align*} t^* = \frac{8 M}{\sqrt{m}} \sqrt{\log(2/\alpha)}. \end{align*} This is the level-$\alpha$ threshold appearing in the theorem statement. [guided] We pause to verify the algebra by direct substitution, since the threshold formula is the load-bearing computation of the theorem. With $t^* = (8M/\sqrt{m}) \sqrt{\log(2/\alpha)}$, \begin{align*} (t^*)^2 = \frac{64 M^2}{m} \log(2/\alpha), \end{align*} so \begin{align*} \frac{m (t^*)^2}{64 M^2} = \log(2/\alpha). \end{align*} Therefore \begin{align*} 2 \exp\!\left( -\frac{m (t^*)^2}{64 M^2} \right) = 2 \exp(-\log(2/\alpha)) = 2 \cdot \frac{\alpha}{2} = \alpha. \end{align*} The substitution recovers $\alpha$ exactly, confirming the threshold formula. [/guided] [/step] [step:Verify that the test attains level at most $\alpha$] The test rejects $H_0$ when $\hat{d}_\phi(\mu, \nu)^2 \ge t^* = (8M/\sqrt{m}) \sqrt{\log(2/\alpha)}$. Combining the tail bound from Step 5 with the choice of $t^*$ from Step 6, \begin{align*} \mathbb{P}_{H_0}\!\left( \text{reject } H_0 \right) = \mathbb{P}_{H_0}\!\left( \hat{d}_\phi(\mu, \nu)^2 \ge t^* \right) \le 2 \exp\!\left( -\frac{m (t^*)^2}{64 M^2} \right) = \alpha. \end{align*} This is the level of the test, so the test has level at most $\alpha$, completing the proof. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Level-alpha MMD Test (Theorem # 2524)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Level-alpha MMD Test (Theorem # 2524)

Discussion

Proof

Explore Further