Wasserstein Convergence Characterization — Statement & Proof

Theorem

Proof

[proofplan] We prove the two implications separately. If $W_p(\mu_n,\mu) \to 0$, choose couplings with vanishing $p$-transport cost; these couplings force bounded continuous test functions to have asymptotically equal expectations, and the [reverse triangle inequality](/theorems/2300) transfers the same vanishing cost to the radial variables $d(\cdot,x_0)$. Conversely, [weak convergence](/page/Weak%20Convergence) plus convergence of the $p$-th moments gives uniform integrability of the moment functions. The Skorokhod representation theorem then realizes the weak convergence almost surely, and Vitali convergence converts almost sure convergence into convergence in $L^p$, producing admissible couplings with vanishing cost. [/proofplan] [step:Choose couplings with vanishing transport cost] Assume first that $W_p(\mu_n,\mu) \to 0$. For each $n \in \mathbb{N}$, choose a coupling $\pi_n \in \Pi(\mu_n,\mu)$ such that \begin{align*} \int_{X \times X} d(x,y)^p\, d\pi_n(x,y) \le W_p(\mu_n,\mu)^p + \frac{1}{n}. \end{align*} Then \begin{align*} \int_{X \times X} d(x,y)^p\, d\pi_n(x,y) \to 0. \end{align*} Here $\Pi(\mu_n,\mu)$ denotes the set of probability measures on $X \times X$ whose first marginal is $\mu_n$ and whose second marginal is $\mu$. [/step] [step:Show that vanishing transport cost implies weak convergence] Let $f: X \to \mathbb{R}$ be a bounded [continuous function](/page/Continuous%20Function), and define $M_f := \sup_{z \in X} |f(z)|$. We prove \begin{align*} \int_X f(x)\, d\mu_n(x) \to \int_X f(y)\, d\mu(y). \end{align*} Fix $\varepsilon > 0$. Since $\mu$ is a probability measure on the Polish space $X$, it is tight. Choose a compact set $K \subset X$ such that \begin{align*} \mu(X \setminus K) < \varepsilon. \end{align*} Because $f$ is continuous at each point of $K$, for every $y \in K$ there exists $\rho_y > 0$ such that $d(z,y) < \rho_y$ implies $|f(z)-f(y)| < \varepsilon/2$. 
The balls $B(y,\rho_y/2)$ with $y \in K$ cover $K$, so compactness gives points $y_1,\dots,y_m \in K$ such that \begin{align*} K \subset \bigcup_{j=1}^{m} B(y_j,\rho_{y_j}/2). \end{align*} Define $\delta := \frac{1}{2}\min_{1 \le j \le m}\rho_{y_j} > 0$. If $y \in K$ and $x \in X$ satisfy $d(x,y) < \delta$, choose $j$ with $d(y,y_j) < \rho_{y_j}/2$. Then $d(x,y_j) \le d(x,y)+d(y,y_j) < \delta+\rho_{y_j}/2 \le \rho_{y_j}$ and $d(y,y_j) < \rho_{y_j}$, so each of $|f(x)-f(y_j)|$ and $|f(y_j)-f(y)|$ is less than $\varepsilon/2$, hence \begin{align*} |f(x)-f(y)| \le |f(x)-f(y_j)|+|f(y_j)-f(y)| < \varepsilon. \end{align*} Define the measurable set \begin{align*} A_\delta := \{(x,y) \in X \times X : d(x,y) < \delta\}. \end{align*} Markov's inequality applied to the non-negative function $(x,y) \mapsto d(x,y)^p$ gives \begin{align*} \pi_n((X \times X) \setminus A_\delta) \le \delta^{-p}\int_{X \times X} d(x,y)^p\, d\pi_n(x,y). \end{align*} Since the marginals of $\pi_n$ are $\mu_n$ and $\mu$, we also have \begin{align*} \left|\int_X f(x)\, d\mu_n(x)-\int_X f(y)\, d\mu(y)\right| = \left|\int_{X \times X} \bigl(f(x)-f(y)\bigr)\, d\pi_n(x,y)\right| \le \int_{X \times X} |f(x)-f(y)|\, d\pi_n(x,y). \end{align*} Splitting the integral over $(X \times K) \cap A_\delta$, $X \times (X \setminus K)$, and $(X \times K) \setminus A_\delta$, bounding the integrand by $\varepsilon$ on the first set and by $2M_f$ on the other two, and using $\pi_n(X \times (X \setminus K)) = \mu(X \setminus K)$ because the second marginal of $\pi_n$ is $\mu$, we get \begin{align*} \int_{X \times X} |f(x)-f(y)|\, d\pi_n(x,y) \le \varepsilon + 2M_f\,\mu(X \setminus K) + 2M_f\,\delta^{-p}\int_{X \times X} d(x,y)^p\, d\pi_n(x,y). \end{align*} Taking $\limsup_{n \to \infty}$ and using the vanishing transport cost gives \begin{align*} \limsup_{n \to \infty}\left|\int_X f(x)\, d\mu_n(x)-\int_X f(y)\, d\mu(y)\right| \le \varepsilon + 2M_f\varepsilon. \end{align*} Since $\varepsilon > 0$ was arbitrary, the desired convergence holds for every bounded continuous $f: X \to \mathbb{R}$. Hence $\mu_n$ converges weakly to $\mu$. [/step] [step:Transfer the vanishing cost to convergence of the moments] Define the radial function \begin{align*} r: X \to [0,\infty), \qquad r(x) := d(x,x_0). \end{align*} For all $x,y \in X$, the reverse triangle inequality gives \begin{align*} |r(x)-r(y)| \le d(x,y). 
\end{align*} Hence, for the couplings $\pi_n$ chosen above, \begin{align*} \int_{X \times X} |r(x)-r(y)|^p\, d\pi_n(x,y) \le \int_{X \times X} d(x,y)^p\, d\pi_n(x,y) \to 0. \end{align*} Define [measurable functions](/page/Measurable%20Functions) $a_n,b_n: X \times X \to [0,\infty)$ by $a_n(x,y) := r(x)$ and $b_n(x,y) := r(y)$. On the finite [measure space](/page/Measure%20Space) $(X \times X,\mathcal{B}(X \times X),\pi_n)$, the $p$-norm of a measurable function $g: X \times X \to \mathbb{R}$ is \begin{align*} \|g\|_{p,\pi_n} := \left(\int_{X \times X} |g(x,y)|^p\, d\pi_n(x,y)\right)^{1/p}. \end{align*} Minkowski's inequality implies the reverse triangle inequality for this norm, and therefore \begin{align*} \left|\left(\int_{X \times X} r(x)^p\, d\pi_n(x,y)\right)^{1/p}-\left(\int_{X \times X} r(y)^p\, d\pi_n(x,y)\right)^{1/p}\right| \le \left(\int_{X \times X} |r(x)-r(y)|^p\, d\pi_n(x,y)\right)^{1/p}. \end{align*} Using the marginal identities for $\pi_n$, this becomes \begin{align*} \left|\left(\int_X d(x,x_0)^p\, d\mu_n(x)\right)^{1/p}-\left(\int_X d(y,x_0)^p\, d\mu(y)\right)^{1/p}\right| \to 0. \end{align*} Both quantities are finite because $\mu_n,\mu \in \mathcal{P}_p(X)$. Therefore their $p$-th powers converge, and \begin{align*} \int_X d(x,x_0)^p\, d\mu_n(x) \to \int_X d(x,x_0)^p\, d\mu(x). \end{align*} [/step] [step:Derive uniform integrability from weak convergence and moment convergence] Conversely, assume that $\mu_n$ converges weakly to $\mu$ and that \begin{align*} \int_X r(x)^p\, d\mu_n(x) \to \int_X r(x)^p\, d\mu(x), \end{align*} where $r: X \to [0,\infty)$ is the function $r(x) := d(x,x_0)$. We prove that the family of non-negative functions $r^p$ under the measures $\mu_n$ is uniformly integrable, in the following precise sense: \begin{align*} \lim_{R \to \infty}\sup_{n \in \mathbb{N}}\int_{\{x \in X : r(x)^p > R\}} r(x)^p\, d\mu_n(x) = 0. 
\end{align*} For $R > 0$, define the bounded continuous truncation \begin{align*} \varphi_R: X \to [0,R], \qquad \varphi_R(x) := \min\{r(x)^p,R\}. \end{align*} Since $\varphi_R$ is bounded and continuous, weak convergence gives \begin{align*} \int_X \varphi_R(x)\, d\mu_n(x) \to \int_X \varphi_R(x)\, d\mu(x) \end{align*} for each fixed $R > 0$. Hence \begin{align*} \limsup_{n \to \infty}\int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu_n(x) = \int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu(x). \end{align*} As $R \to \infty$, the [dominated convergence theorem](/theorems/4) applied with respect to the finite measure $\mu$ gives \begin{align*} \int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu(x) \to 0. \end{align*} Indeed, $0 \le r^p-\varphi_R \le r^p$, the function $r^p$ is $\mu$-integrable, and $r^p-\varphi_R \to 0$ pointwise. For every $R > 0$, \begin{align*} r(x)^p\mathbb{1}_{\{r(x)^p>2R\}} \le 2\bigl(r(x)^p-\varphi_R(x)\bigr). \end{align*} This proves the desired uniform integrability after also observing that finitely many initial measures have vanishing tails because each belongs to $\mathcal{P}_p(X)$. [guided] For the converse implication, assume that $\mu_n$ converges weakly to $\mu$ and that the $p$-th radial moments about $x_0$ converge. The point of this step is to turn convergence of these numerical moments into a tail estimate that is uniform in $n$. Define \begin{align*} r: X \to [0,\infty), \qquad r(x) := d(x,x_0). \end{align*} The function $r^p$ may be unbounded, so weak convergence cannot be applied directly to it. We therefore truncate it. For each $R > 0$, define \begin{align*} \varphi_R: X \to [0,R], \qquad \varphi_R(x) := \min\{r(x)^p,R\}. \end{align*} This function is bounded and continuous because $r$ is continuous and $t \mapsto \min\{t^p,R\}$ is continuous on $[0,\infty)$. Therefore weak convergence gives \begin{align*} \int_X \varphi_R(x)\, d\mu_n(x) \to \int_X \varphi_R(x)\, d\mu(x). 
\end{align*} Now compare the unbounded moment with its bounded truncation. Since by hypothesis \begin{align*} \int_X r(x)^p\, d\mu_n(x) \to \int_X r(x)^p\, d\mu(x), \end{align*} subtracting the limit of the truncated integrals gives \begin{align*} \limsup_{n \to \infty}\int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu_n(x) = \int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu(x). \end{align*} The right-hand side tends to $0$ as $R \to \infty$ because $0 \le r^p-\varphi_R \le r^p$, the function $r^p$ is $\mu$-integrable, and $r^p-\varphi_R \to 0$ pointwise. This is the dominated convergence theorem on the finite measure space $(X,\mathcal{B}(X),\mu)$ with integrable dominator $r^p$; the bounded convergence theorem does not apply here, since $r^p-\varphi_R$ need not be bounded. Finally, the tail $\{r^p>2R\}$ is controlled by the truncation error. If $r(x)^p>2R$, then $\varphi_R(x)=R$ and \begin{align*} r(x)^p \le 2\bigl(r(x)^p-R\bigr) = 2\bigl(r(x)^p-\varphi_R(x)\bigr). \end{align*} If $r(x)^p \le 2R$, then the left-hand side with the indicator is $0$, while the right-hand side is non-negative. Thus for every $x \in X$, \begin{align*} r(x)^p\mathbb{1}_{\{r(x)^p>2R\}} \le 2\bigl(r(x)^p-\varphi_R(x)\bigr). \end{align*} Integrating against $\mu_n$, taking $\limsup_{n \to \infty}$, and then sending $R \to \infty$ shows that for every $\varepsilon > 0$ there exist $R_0 > 0$ and $N \in \mathbb{N}$ such that the integral of $r^p$ over $\{r^p > 2R_0\}$ with respect to $\mu_n$ is less than $\varepsilon$ for all $n > N$. The finitely many remaining indices also have small tails for large $R$, because each fixed measure $\mu_1,\dots,\mu_N$ lies in $\mathcal{P}_p(X)$; since tail integrals are non-increasing in $R$, enlarging $R_0$ handles all indices at once. Hence \begin{align*} \lim_{R \to \infty}\sup_{n \in \mathbb{N}}\int_{\{x \in X : r(x)^p > R\}} r(x)^p\, d\mu_n(x) = 0. 
\end{align*} [/guided] [/step] [step:Realize the weak convergence almost surely] By the Skorokhod representation theorem for probability measures on Polish spaces, there exist a probability space $(\Omega,\mathcal{F},\mathbb{P})$, measurable maps \begin{align*} Y_n: \Omega \to X \end{align*} for $n \in \mathbb{N}$, and a measurable map \begin{align*} Y: \Omega \to X \end{align*} such that the law of $Y_n$ is $\mu_n$, the law of $Y$ is $\mu$, and \begin{align*} Y_n \to Y \quad \mathbb{P}\text{-a.s.} \end{align*} Define real-valued random variables \begin{align*} Z_n: \Omega \to [0,\infty), \qquad Z_n(\omega) := d(Y_n(\omega),Y(\omega))^p. \end{align*} The almost sure convergence $Y_n \to Y$ means $d(Y_n,Y) \to 0$ almost surely, and therefore \begin{align*} Z_n \to 0 \quad \mathbb{P}\text{-a.s.} \end{align*} [/step] [step:Upgrade almost sure convergence to convergence of transport costs] For every $\omega \in \Omega$, the triangle inequality together with the convexity of $t \mapsto t^p$ on $[0,\infty)$ (valid since $p \ge 1$) gives \begin{align*} d(Y_n(\omega),Y(\omega))^p \le 2^{p-1}\bigl(d(Y_n(\omega),x_0)^p+d(Y(\omega),x_0)^p\bigr). \end{align*} The family $\{d(Y_n,x_0)^p : n \in \mathbb{N}\}$ is uniformly integrable by the previous step, since the law of $Y_n$ is $\mu_n$. The single integrable [random variable](/page/Random%20Variable) $d(Y,x_0)^p$ is uniformly integrable as a one-element family, since $\mu \in \mathcal{P}_p(X)$. To justify the closure property used here, let $A_n := d(Y_n,x_0)^p$ and $B := d(Y,x_0)^p$. For every $M > 0$, \begin{align*} (A_n+B)\mathbb{1}_{\{A_n+B>M\}} \le A_n\mathbb{1}_{\{A_n>M/2\}} + B\mathbb{1}_{\{B>M/2\}} + A_n\mathbb{1}_{\{B>M/2\}} + B\mathbb{1}_{\{A_n>M/2\}}. \end{align*} The first two terms have uniformly small expectations as $M \to \infty$ by uniform integrability of $\{A_n\}$ and integrability of $B$. For the mixed terms, note that $A_n\mathbb{1}_{\{B>M/2\}} \le A_n\mathbb{1}_{\{A_n>M/2\}} + B\mathbb{1}_{\{B>M/2\}}$ pointwise: on $\{B>M/2\}$ either $A_n > M/2$, or $A_n \le M/2 < B$. By symmetry the same bound holds for $B\mathbb{1}_{\{A_n>M/2\}}$, so the mixed terms are dominated by the first two. This is the standard finite-sum closure of uniform integrability. 
Multiplication by the fixed scalar $2^{p-1}$ preserves uniform integrability. Applying these closure properties to the preceding pointwise bound shows that the family $\{Z_n : n \in \mathbb{N}\}$ is uniformly integrable. Let $\mathbb{E}[G] := \int_\Omega G(\omega)\, d\mathbb{P}(\omega)$ denote expectation of any non-negative or integrable real-valued random variable $G: \Omega \to \mathbb{R}$. By Vitali's convergence theorem, the almost sure convergence $Z_n \to 0$ together with uniform integrability implies \begin{align*} \mathbb{E}[Z_n] \to 0. \end{align*} Equivalently, \begin{align*} \mathbb{E}[d(Y_n,Y)^p] \to 0. \end{align*} [/step] [step:Use the joint laws as admissible couplings] For each $n \in \mathbb{N}$, define $\gamma_n$ to be the joint law of $(Y_n,Y)$ on $X \times X$. Then $\gamma_n \in \Pi(\mu_n,\mu)$, because the first marginal of $\gamma_n$ is the law of $Y_n$, namely $\mu_n$, and the second marginal of $\gamma_n$ is the law of $Y$, namely $\mu$. Hence, by the definition of $W_p$, \begin{align*} W_p(\mu_n,\mu)^p \le \int_{X \times X} d(x,y)^p\, d\gamma_n(x,y). \end{align*} Since $\gamma_n$ is the joint law of $(Y_n,Y)$, the right-hand side equals \begin{align*} \mathbb{E}[d(Y_n,Y)^p]. \end{align*} The previous step shows that this expectation tends to $0$, so \begin{align*} W_p(\mu_n,\mu)^p \to 0. \end{align*} Because $W_p(\mu_n,\mu) \ge 0$, we conclude that \begin{align*} W_p(\mu_n,\mu) \to 0. \end{align*} This proves the converse implication and completes the proof. [/step]
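As a numerical illustration of why the moment condition cannot be dropped, the following sketch works on $X = \mathbb{R}$ with $x_0 = 0$. It assumes the hypothetical family $\mu_n = (1-1/n)\delta_0 + (1/n)\delta_{a_n}$ with limit $\mu = \delta_0$; these measures and all helper names are illustrative choices, not part of the proof. Because the only coupling of a measure with a Dirac mass is the product measure, $W_p(\mu_n,\delta_0)^p$ equals the $p$-th absolute moment of $\mu_n$ and can be evaluated exactly.

```python
import math


def wp_to_dirac_at_zero(atoms, weights, p):
    """W_p between sum_i weights[i] * delta_{atoms[i]} and delta_0 on the real line.

    The only coupling with a Dirac mass is the product measure, so
    W_p^p is simply the p-th absolute moment of the discrete measure.
    """
    return sum(w * abs(x) ** p for x, w in zip(atoms, weights)) ** (1.0 / p)


def integrate(f, atoms, weights):
    """Integral of f against the discrete measure sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * f(x) for x, w in zip(atoms, weights))


def mu_n(n, escaping, p):
    """Hypothetical example: mass 1 - 1/n at 0 and mass 1/n at a_n.

    escaping=True  uses a_n = n^(1/p), so the p-th moment stays equal to 1.
    escaping=False uses a_n = 1,       so the p-th moment is 1/n.
    """
    a_n = n ** (1.0 / p) if escaping else 1.0
    return [0.0, a_n], [1.0 - 1.0 / n, 1.0 / n]


def bounded_test(x):
    """A bounded continuous test function vanishing at 0."""
    return min(abs(x), 1.0)


if __name__ == "__main__":
    p = 2
    for n in (10, 100, 1000, 10000):
        for escaping in (False, True):
            atoms, weights = mu_n(n, escaping, p)
            print(
                f"n={n:6d} escaping={escaping!s:5} "
                f"test integral={integrate(bounded_test, atoms, weights):.5f} "
                f"W_p={wp_to_dirac_at_zero(atoms, weights, p):.5f}"
            )
```

In both cases the integral of the bounded test function tends to its value under $\delta_0$, namely $0$, which is consistent with weak convergence. Only in the non-escaping case does the $p$-th moment converge to that of $\delta_0$, and only there does $W_p$ tend to $0$.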

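The Skorokhod step and the coupling step of the proof can be made concrete on $X = \mathbb{R}$, where the quantile transform gives an explicit almost sure representation. The sketch below assumes the hypothetical example $\mu_n = \mathrm{Uniform}(0, 1+1/n)$ and $\mu = \mathrm{Uniform}(0,1)$, takes $\Omega = (0,1)$ with Lebesgue measure, and sets $Y_n(u) = (1+1/n)u$ and $Y(u) = u$. A midpoint rule stands in for the expectation, and the function names are illustrative.

```python
def quantile_mu_n(u, n):
    """Quantile function of Uniform(0, 1 + 1/n); plays the role of Y_n(u)."""
    return (1.0 + 1.0 / n) * u


def quantile_mu(u):
    """Quantile function of Uniform(0, 1); plays the role of Y(u)."""
    return u


def coupling_cost(n, p, grid_size=10_000):
    """Approximate E[|Y_n - Y|^p] with the midpoint rule over u in (0, 1).

    The joint law of (Y_n, Y) is a coupling of mu_n and mu, so this value
    is an upper bound for W_p(mu_n, mu)^p up to quadrature error.
    """
    total = 0.0
    for k in range(grid_size):
        u = (k + 0.5) / grid_size
        total += abs(quantile_mu_n(u, n) - quantile_mu(u)) ** p
    return total / grid_size


def wp_upper_bound(n, p):
    """Upper bound for W_p(mu_n, mu) obtained from the quantile coupling."""
    return coupling_cost(n, p) ** (1.0 / p)


if __name__ == "__main__":
    p = 2
    for n in (1, 10, 100, 1000):
        exact = n ** (-p) / (p + 1)  # closed form of E[|u / n|^p]
        print(f"n={n:5d} cost={coupling_cost(n, p):.3e} exact={exact:.3e}")
```

Here $Y_n(u) \to Y(u)$ for every $u$, and the cost $\mathbb{E}[|Y_n - Y|^p] = n^{-p}/(p+1)$ tends to $0$, mirroring the conclusion $W_p(\mu_n,\mu) \to 0$. On the real line with $p \ge 1$ the quantile coupling is in fact optimal, so in this example the bound coincides with $W_p$ itself.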
Wasserstein Convergence Characterization (Theorem # 7484)
