One-Dimensional Wasserstein Quantile Formula — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We construct the increasing coupling by pushing forward [Lebesgue measure](/page/Lebesgue%20Measure) on $(0,1)$ through the two quantile functions. The quantile transform gives the prescribed marginals, so this coupling gives the upper bound for $W_p(\mu,\nu)^p$. For the reverse inequality, we prove that any coupling with a crossing can be uncrossed without increasing the cost $|x-y|^p$, first for finitely supported couplings and then by approximation for general couplings. The uncrossed limiting coupling is precisely the quantile coupling, giving optimality. [/proofplan] [step:Define the quantile coupling and verify its marginals] Let $\lambda$ denote the restriction of one-dimensional Lebesgue measure $\mathcal{L}^1$ to the Borel space $((0,1),\mathcal{B}((0,1)))$. Define the quantile maps \begin{align*} Q_\mu:(0,1)&\to\mathbb{R}, \qquad s\mapsto F_\mu^{-1}(s), \end{align*} and \begin{align*} Q_\nu:(0,1)&\to\mathbb{R}, \qquad s\mapsto F_\nu^{-1}(s). \end{align*} Both maps are nondecreasing and Borel measurable. Define the probability measure $\gamma$ on $\mathbb{R}^2$ by \begin{align*} \gamma=(Q_\mu,Q_\nu)_\#\lambda. \end{align*} We show that $\gamma$ has first marginal $\mu$ and second marginal $\nu$. For $x\in\mathbb{R}$, the defining property of the generalized inverse gives \begin{align*} Q_\mu(s)\leq x \iff s\leq F_\mu(x) \end{align*} for $\lambda$-almost every $s\in(0,1)$. Therefore \begin{align*} \gamma((-\infty,x]\times\mathbb{R})=\lambda(\{s\in(0,1):Q_\mu(s)\leq x\})=F_\mu(x). \end{align*} Thus the first marginal of $\gamma$ is $\mu$. The same argument with $\nu$ gives \begin{align*} \gamma(\mathbb{R}\times(-\infty,y])=F_\nu(y) \end{align*} for every $y\in\mathbb{R}$, so the second marginal is $\nu$. Hence $\gamma\in\Pi(\mu,\nu)$, where $\Pi(\mu,\nu)$ denotes the set of Borel probability measures on $\mathbb{R}^2$ with marginals $\mu$ and $\nu$. [guided] We want to build a coupling of $\mu$ and $\nu$ from one common parameter $s\in(0,1)$. The correct maps are the generalized inverse functions, so we define \begin{align*} Q_\mu:(0,1)&\to\mathbb{R}, \qquad s\mapsto F_\mu^{-1}(s), \end{align*} and \begin{align*} Q_\nu:(0,1)&\to\mathbb{R}, \qquad s\mapsto F_\nu^{-1}(s). \end{align*} Because each distribution function is nondecreasing and right-continuous, each generalized inverse is nondecreasing and Borel measurable. Let $\lambda=\mathcal{L}^1|_{(0,1)}$ be Lebesgue probability measure on $(0,1)$, and define \begin{align*} \gamma=(Q_\mu,Q_\nu)_\#\lambda. \end{align*} This means that for every Borel set $A\subset\mathbb{R}^2$, \begin{align*} \gamma(A)=\lambda(\{s\in(0,1):(Q_\mu(s),Q_\nu(s))\in A\}). \end{align*} We now verify the marginals directly from distribution functions. Fix $x\in\mathbb{R}$. By the definition \begin{align*} Q_\mu(s)=\inf\{a\in\mathbb{R}:F_\mu(a)\geq s\}, \end{align*} the inequality $Q_\mu(s)\leq x$ is equivalent, up to the harmless endpoint ambiguity of a monotone function, to $s\leq F_\mu(x)$. Hence \begin{align*} \lambda(\{s\in(0,1):Q_\mu(s)\leq x\})=F_\mu(x). \end{align*} Therefore \begin{align*} \gamma((-\infty,x]\times\mathbb{R})=F_\mu(x), \end{align*} which says exactly that the first marginal of $\gamma$ has distribution function $F_\mu$, hence is $\mu$. Repeating the same argument for $Q_\nu$ gives \begin{align*} \gamma(\mathbb{R}\times(-\infty,y])=F_\nu(y) \end{align*} for every $y\in\mathbb{R}$, so the second marginal is $\nu$. Thus $\gamma$ is an admissible transport plan from $\mu$ to $\nu$. [/guided] [/step] [step:Use the quantile coupling to obtain the upper bound] Since $\gamma\in\Pi(\mu,\nu)$ and $W_p(\mu,\nu)^p$ is the infimum of $\int_{\mathbb{R}^2}|x-y|^p\,d\pi(x,y)$ over all $\pi\in\Pi(\mu,\nu)$, the definition of the Wasserstein distance gives \begin{align*} W_p(\mu,\nu)^p\leq \int_{\mathbb{R}^2}|x-y|^p\,d\gamma(x,y). \end{align*} By the definition of pushforward measure under $(Q_\mu,Q_\nu)$, this integral equals \begin{align*} \int_{\mathbb{R}^2}|x-y|^p\,d\gamma(x,y)=\int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s). \end{align*} Therefore \begin{align*} W_p(\mu,\nu)^p\leq \int_0^1 |F_\mu^{-1}(s)-F_\nu^{-1}(s)|^p\,d\mathcal{L}^1(s). \end{align*} The integral is finite because $\mu,\nu\in\mathcal{P}_p(\mathbb{R})$ imply $Q_\mu,Q_\nu\in L^p((0,1),\lambda)$, and the inequality $|a-b|^p\leq 2^{p-1}(|a|^p+|b|^p)$ applies for all $a,b\in\mathbb{R}$. [/step] [step:Prove the crossing inequality for the cost $|x-y|^p$] Let $x_1,x_2,y_1,y_2\in\mathbb{R}$ satisfy $x_1<x_2$ and $y_1<y_2$. We claim that \begin{align*} |x_1-y_1|^p+|x_2-y_2|^p\leq |x_1-y_2|^p+|x_2-y_1|^p. \end{align*} Define the convex function \begin{align*} \phi:\mathbb{R}\to\mathbb{R}, \qquad \phi(t)=|t|^p. \end{align*} For fixed $x_1<x_2$, define \begin{align*} h:\mathbb{R}\to\mathbb{R}, \qquad h(y)=\phi(x_2-y)-\phi(x_1-y). \end{align*} Convexity of $\phi$ implies that the difference quotient of $\phi$ over intervals of the same length is nondecreasing in the base point. Since \begin{align*} h(y)=\phi(x_2-y)-\phi(x_1-y), \end{align*} the function $h$ is nonincreasing in $y$. Hence $h(y_2)\leq h(y_1)$, which is precisely \begin{align*} |x_2-y_2|^p-|x_1-y_2|^p\leq |x_2-y_1|^p-|x_1-y_1|^p. \end{align*} Rearranging gives the desired crossing inequality. [guided] The crossing inequality is the algebraic reason monotone transport is optimal on the line. Suppose $x_1<x_2$ but the larger target $y_2$ is paired with the smaller source $x_1$, while the smaller target $y_1$ is paired with the larger source $x_2$. We want to show that uncrossing these two assignments cannot increase the total cost. Define \begin{align*} \phi:\mathbb{R}\to\mathbb{R}, \qquad \phi(t)=|t|^p. \end{align*} Since $p\geq1$, the function $\phi$ is convex. Fix $x_1<x_2$, and define \begin{align*} h:\mathbb{R}\to\mathbb{R}, \qquad h(y)=\phi(x_2-y)-\phi(x_1-y). \end{align*} The quantity $h(y)$ measures how much more expensive it is to send $x_2$ to $y$ instead of sending $x_1$ to $y$. Convexity says that slopes of secant lines are monotone. Applied to the interval endpoints $x_1-y$ and $x_2-y$, this means that as $y$ increases, both endpoints shift left, and the increment \begin{align*} \phi(x_2-y)-\phi(x_1-y) \end{align*} cannot increase. Therefore $h$ is nonincreasing. Since $y_1<y_2$, we get \begin{align*} h(y_2)\leq h(y_1). \end{align*} Writing this out gives \begin{align*} |x_2-y_2|^p-|x_1-y_2|^p\leq |x_2-y_1|^p-|x_1-y_1|^p. \end{align*} After moving the negative terms to the other side, this becomes \begin{align*} |x_1-y_1|^p+|x_2-y_2|^p\leq |x_1-y_2|^p+|x_2-y_1|^p. \end{align*} Thus matching the smaller source with the smaller target and the larger source with the larger target is no more expensive than the crossed matching. [/guided] [/step] [step:Minimize the cost among finitely supported couplings by uncrossing] Assume first that $\mu$ and $\nu$ are finitely supported probability measures on $\mathbb{R}$. Write \begin{align*} \mu=\sum_{i=1}^{m} r_i\delta_{x_i}. \end{align*} \begin{align*} \nu=\sum_{j=1}^{n} c_j\delta_{y_j}. \end{align*} Here $x_1<\cdots<x_m$, $y_1<\cdots<y_n$, $r_i>0$, $c_j>0$, $\sum_{i=1}^{m}r_i=1$, $\sum_{j=1}^{n}c_j=1$, $\delta_z$ denotes the Dirac probability measure at $z\in\mathbb{R}$, and $\delta_{(x_i,y_j)}$ denotes the Dirac probability measure at $(x_i,y_j)\in\mathbb{R}^2$. Let \begin{align*} \pi=\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\delta_{(x_i,y_j)} \end{align*} be a coupling of $\mu$ and $\nu$, where $a_{ij}\geq0$, $\sum_{j=1}^{n}a_{ij}=r_i$, and $\sum_{i=1}^{m}a_{ij}=c_j$. Construct the monotone filling matrix $b=(b_{ij})$ by the finite north-west rule: starting at $(i,j)=(1,1)$, put $b_{ij}=\min\{r_i^{\mathrm{rem}},c_j^{\mathrm{rem}}\}$, subtract this amount from the remaining row mass $r_i^{\mathrm{rem}}$ and column mass $c_j^{\mathrm{rem}}$, then increase $i$ when the row is exhausted and increase $j$ when the column is exhausted. Since at least one row or column is exhausted at each move, the construction terminates after at most $m+n-1$ moves. It produces a coupling, and its support is monotone: if $b_{ij}>0$ and $b_{i'j'}>0$ with $i<i'$, then $j\leq j'$. Let $a=(a_{ij})$ be any cost-minimizing coupling matrix among all matrices with these row and column sums. Such a minimizer exists because the transportation polytope is a closed and bounded subset of $\mathbb{R}^{mn}$ and the cost is a linear function of $a$. Among all cost minimizers, choose one for which the lexicographic vector of cumulative southwest masses \begin{align*} M_{ij}(a)=\sum_{u=1}^{i}\sum_{v=1}^{j}a_{uv} \end{align*} is maximal in lexicographic order over the finite list of pairs $(i,j)$. We claim that this chosen minimizer has no crossing. Suppose instead that $i_1<i_2$ and $j_1<j_2$ satisfy $a_{i_1j_2}>0$ and $a_{i_2j_1}>0$. Define \begin{align*} \varepsilon=\min\{a_{i_1j_2},a_{i_2j_1}\}. \end{align*} Move mass $\varepsilon$ from $(i_1,j_2)$ and $(i_2,j_1)$ to $(i_1,j_1)$ and $(i_2,j_2)$. This preserves every row sum and column sum. By the crossing inequality, it does not increase the cost, so the new matrix is again a cost minimizer. For every lower-left rectangle, its mass changes by $0$ except for rectangles whose row cutoff lies in $\{i_1,\dots,i_2-1\}$ and whose column cutoff lies in $\{j_1,\dots,j_2-1\}$, where the mass increases by $\varepsilon$. Hence the lexicographic vector $(M_{ij})$ strictly increases, contradicting the choice of $a$. Thus a cost minimizer is monotone. It remains to identify the monotone coupling. For $R_i=\sum_{u=1}^{i}r_u$ and $C_j=\sum_{v=1}^{j}c_v$, monotonicity forces \begin{align*} \sum_{u=1}^{i}\sum_{v=1}^{j}a_{uv}=\min\{R_i,C_j\}. \end{align*} Indeed, if $R_i\leq C_j$, then all mass in rows $u\leq i$ must lie in columns $v\leq j$, since any mass with $u\leq i$ and $v>j$ would require, by the column sums up to $j$, positive mass with $u'>i$ and $v'\leq j$, creating a crossing. The case $C_j\leq R_i$ is symmetric. These rectangle values determine every entry by finite inclusion-exclusion, so the monotone coupling is unique. The north-west filling matrix has exactly these rectangle values; hence it is the unique monotone coupling. It is also exactly $(Q_\mu,Q_\nu)_\#\lambda$, because $Q_\mu=x_i$ on the interval $(R_{i-1},R_i]$ and $Q_\nu=y_j$ on the interval $(C_{j-1},C_j]$, with $R_0=C_0=0$. Therefore, in the finitely supported case, \begin{align*} \int_{\mathbb{R}^2}|x-y|^p\,d\pi(x,y)\geq \int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s) \end{align*} for every $\pi\in\Pi(\mu,\nu)$. [/step] [step:Pass from finitely supported measures to arbitrary measures in $\mathcal{P}_p(\mathbb{R})$] Let $\pi\in\Pi(\mu,\nu)$ be arbitrary. For each $k\in\mathbb{N}$, define the truncation-and-quantization map \begin{align*} T_k:\mathbb{R}&\to 2^{-k}\mathbb{Z}\cap[-k,k], \qquad x\mapsto \max\{-k,\min\{k,2^{-k}\lfloor 2^k x\rfloor\}\}. \end{align*} Define finitely supported probability measures \begin{align*} \mu_k=(T_k)_\#\mu, \qquad \nu_k=(T_k)_\#\nu, \qquad \pi_k=(T_k,T_k)_\#\pi. \end{align*} Then $\pi_k\in\Pi(\mu_k,\nu_k)$. Since $T_k(x)\to x$ for every $x\in\mathbb{R}$ and $|T_k(x)|\leq |x|+1$, dominated convergence with respect to $\pi$ gives \begin{align*} \lim_{k\to\infty}\int_{\mathbb{R}^2}|T_k(x)-T_k(y)|^p\,d\pi(x,y)=\int_{\mathbb{R}^2}|x-y|^p\,d\pi(x,y), \end{align*} because $|T_k(x)-T_k(y)|^p\leq 2^{p-1}(|x|+1)^p+2^{p-1}(|y|+1)^p$ and the right-hand side is $\pi$-integrable by the marginal moment hypotheses. Applying the finitely supported case to $\mu_k$ and $\nu_k$ gives \begin{align*} \int_{\mathbb{R}^2}|T_k(x)-T_k(y)|^p\,d\pi(x,y)\geq \int_0^1 |Q_{\mu_k}(s)-Q_{\nu_k}(s)|^p\,d\mathcal{L}^1(s). \end{align*} For the same monotone truncation-and-quantization map $T_k$, the generalized inverse of $(T_k)_\#\mu$ agrees with $T_k\circ Q_\mu$ for $\mathcal{L}^1$-almost every $s\in(0,1)$. To verify this, let $z\in 2^{-k}\mathbb{Z}\cap[-k,k]$ and use monotonicity of $T_k$ to write \begin{align*} \{s\in(0,1):T_k(Q_\mu(s))\leq z\}=\{s\in(0,1):Q_\mu(s)\leq \sup T_k^{-1}((-∞,z])\} \end{align*} up to endpoints. By the sublevel identity for $Q_\mu$, the $\lambda$-measure of this set equals $\mu(T_k^{-1}((-∞,z]))$, which is exactly the distribution function of $(T_k)_\#\mu$ at $z$. Since both $Q_{\mu_k}$ and $T_k\circ Q_\mu$ take values in the same finite ordered set, equality of all sublevel sets up to $\lambda$-null endpoints gives $Q_{\mu_k}=T_k\circ Q_\mu$ $\lambda$-almost everywhere. The same argument gives $Q_{\nu_k}=T_k\circ Q_\nu$ $\lambda$-almost everywhere. Here $L^p((0,1),\lambda)$ denotes the space of $p$-integrable real-valued functions on the probability space $((0,1),\mathcal{B}((0,1)),\lambda)$, modulo equality $\lambda$-almost everywhere. Since $T_k(t) o t$ for every $t\in\mathbb{R}$ and $|T_k(t)|\leq |t|+1$, dominated convergence on $((0,1),\mathcal{B}((0,1)),\lambda)$ gives \begin{align*} Q_{\mu_k}\to Q_\mu \text{ in } L^p((0,1),\lambda). \end{align*} \begin{align*} Q_{\nu_k}\to Q_\nu \text{ in } L^p((0,1),\lambda). \end{align*} The map $(f,g)\mapsto \|f-g\|_{L^p((0,1),\lambda)}^p$ is continuous under $L^p$ convergence, so \begin{align*} \lim_{k\to\infty}\int_0^1 |Q_{\mu_k}(s)-Q_{\nu_k}(s)|^p\,d\mathcal{L}^1(s)=\int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s). \end{align*} Passing to the limit yields \begin{align*} \int_{\mathbb{R}^2}|x-y|^p\,d\pi(x,y)\geq \int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s). \end{align*} Since $\pi\in\Pi(\mu,\nu)$ was arbitrary, taking the infimum over all couplings gives \begin{align*} W_p(\mu,\nu)^p\geq \int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s). \end{align*} [/step] [step:Combine the two inequalities] The upper bound from the quantile coupling gives \begin{align*} W_p(\mu,\nu)^p\leq \int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s), \end{align*} while the optimality argument for arbitrary couplings gives \begin{align*} W_p(\mu,\nu)^p\geq \int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s). \end{align*} Therefore \begin{align*} W_p(\mu,\nu)^p=\int_0^1 |Q_\mu(s)-Q_\nu(s)|^p\,d\mathcal{L}^1(s). \end{align*} Substituting $Q_\mu=F_\mu^{-1}$ and $Q_\nu=F_\nu^{-1}$ gives \begin{align*} W_p(\mu,\nu)^p=\int_0^1 |F_\mu^{-1}(s)-F_\nu^{-1}(s)|^p\,d\mathcal{L}^1(s). \end{align*} This is the desired formula. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Distribution

What brings you to Androma?

Start with a route through the knowledge graph.

One-Dimensional Wasserstein Quantile Formula (Theorem # 7485)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

One-Dimensional Wasserstein Quantile Formula (Theorem # 7485)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further