Brenier's Theorem — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first pass from the Monge problem to the Kantorovich relaxation and choose an optimal coupling for the quadratic cost. Quadratic [Kantorovich duality](/theorems/6799) produces dual potentials; after the standard change of variables, their equality set is contained in the subdifferential of a proper lower semicontinuous convex function $\varphi$. Since the first marginal is absolutely continuous, convex differentiability holds at $\mu$-almost every point where the optimal plan lives, so the subdifferential relation becomes a graph relation $y=\nabla\varphi(x)$. This identifies the unique optimal plan as $(\operatorname{id}_{\mathbb{R}^n},\nabla\varphi)_\#\mu$, proves the Monge optimality of $\nabla\varphi$, and gives uniqueness. [/proofplan] [step:Choose an optimal Kantorovich plan for the quadratic cost] Define the quadratic cost function $c: \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ by $c(x,y)=|x-y|^2$. Since $\mu,\nu \in \mathcal{P}_2(\mathbb{R}^n)$, the product plan $\mu \otimes \nu$ has finite $c$-cost: \begin{align*} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2\, d(\mu \otimes \nu)(x,y) \leq 2\int_{\mathbb{R}^n} |x|^2\, d\mu(x) + 2\int_{\mathbb{R}^n} |y|^2\, d\nu(y) < \infty. \end{align*} The cost $c$ is continuous, hence lower semicontinuous. By the existence theorem for optimal Kantorovich plans for lower semicontinuous costs with finite admissible cost, there exists a coupling $\pi \in \Pi(\mu,\nu)$ such that \begin{align*} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2\, d\pi(x,y) = \inf_{\gamma \in \Pi(\mu,\nu)} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2\, d\gamma(x,y). \end{align*} Here $\Pi(\mu,\nu)$ denotes the set of Borel probability measures on $\mathbb{R}^n \times \mathbb{R}^n$ whose first marginal is $\mu$ and whose second marginal is $\nu$. This use cites a result not yet in the wiki: [existence of optimal Kantorovich plans](/theorems/7464) for lower semicontinuous costs. [/step] [step:Construct a convex potential from quadratic duality] Define the half-quadratic cost $c_0: \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ by \begin{align*} c_0(x,y)=\frac{1}{2}|x-y|^2. \end{align*} Apply the following precise form of quadratic Kantorovich duality with attainment to $c_0$: if the source and target are Polish spaces, the cost is continuous and finite-valued, and some admissible coupling has finite cost, then the primal infimum equals the dual supremum, the dual supremum is attained by Borel potentials $a: \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ and $b: \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ satisfying $a(x)+b(y)\leq c_0(x,y)$ for all $x,y$, the marginal integrals $\int_{\mathbb{R}^n} a\,d\mu$ and $\int_{\mathbb{R}^n} b\,d\nu$ are well-defined in the extended-real sense with finite sum equal to the primal value, and every primal optimizer is concentrated on the equality set $a(x)+b(y)=c_0(x,y)$. The spaces $\mathbb{R}^n$ and $\mathbb{R}^n$ are Polish spaces, the cost $c_0$ is continuous and finite-valued, and the product plan $\mu\otimes\nu$ has finite $c_0$-cost because $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^n)$. Hence this theorem applies. We use the standard normalization of the attained potentials, which gives at least one point $x_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and at least one point $y_0\in\mathbb{R}^n$ with $b(y_0)>-\infty$. Since $c=2c_0$, the plan $\pi$ is also optimal for $c_0$. Thus \begin{align*} a(x)+b(y) \leq \frac{1}{2}|x-y|^2 \end{align*} for all $x,y \in \mathbb{R}^n$, equality holds for $\pi$-almost every $(x,y) \in \mathbb{R}^n \times \mathbb{R}^n$, and the dual value is finite. This use cites a result not yet in the wiki: Kantorovich duality for the quadratic cost with dual attainment on Polish spaces under finite second moments. Define $h: \mathbb{R}^n \to \mathbb{R}\cup\{\infty\}$ by \begin{align*} h(y)=\frac{1}{2}|y|^2-b(y). \end{align*} Define $\varphi: \mathbb{R}^n \to (-\infty,\infty]$ to be the Legendre-Fenchel transform of $h$: \begin{align*} \varphi(x)=\sup_{y\in\mathbb{R}^n}\{x\cdot y-h(y)\}. \end{align*} As a supremum of affine functions of $x$, the function $\varphi$ is convex and lower semicontinuous. By the finiteness conclusion in the dual attainment theorem, choose points $x_0,y_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and $b(y_0)>-\infty$. The inequality above then gives \begin{align*} \varphi(x_0) \leq \frac{1}{2}|x_0|^2-a(x_0)<\infty. \end{align*} Also $h(y_0)=\frac{1}{2}|y_0|^2-b(y_0)<\infty$, so for every $x\in\mathbb{R}^n$, \begin{align*} \varphi(x)\geq x\cdot y_0-h(y_0)>-\infty. \end{align*} Thus $\varphi$ is not identically $+\infty$ and never takes the value $-\infty$, so $\varphi$ is proper. For every $x,y \in \mathbb{R}^n$, the dual inequality gives \begin{align*} x\cdot y-h(y) \leq \frac{1}{2}|x|^2-a(x). \end{align*} Taking the supremum over $y$ gives \begin{align*} \varphi(x) \leq \frac{1}{2}|x|^2-a(x). \end{align*} On the set where $a(x)+b(y)=\frac{1}{2}|x-y|^2$, this inequality is an equality at that particular $y$: \begin{align*} \varphi(x)=x\cdot y-h(y). \end{align*} [guided] We apply the precise quadratic Kantorovich duality theorem with dual attainment to the half-quadratic cost $c_0: \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ defined by \begin{align*} c_0(x,y)=\frac{1}{2}|x-y|^2. \end{align*} The theorem requires Polish source and target spaces, a continuous finite-valued cost, and at least one admissible plan with finite cost. These hypotheses hold because $\mathbb{R}^n$ is Polish, $c_0$ is continuous and finite valued, and $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^n)$ imply \begin{align*} \int_{\mathbb{R}^n\times\mathbb{R}^n} c_0(x,y)\,d(\mu\otimes\nu)(x,y)<\infty. \end{align*} Therefore there exist Borel dual maximizers $a,b:\mathbb{R}^n\to\mathbb{R}\cup\{-\infty\}$ such that $a(x)+b(y)\leq\frac{1}{2}|x-y|^2$ for all $x,y$, the extended-real marginal integrals against $\mu$ and $\nu$ are well-defined with finite sum equal to the primal value, and every optimal plan, in particular $\pi$, is concentrated on the equality set $a(x)+b(y)=\frac{1}{2}|x-y|^2$. We use the standard normalization of the attained potentials, so there are points $x_0,y_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and $b(y_0)>-\infty$. The reason for introducing $h$ and $\varphi=h^*$ is that the quadratic cost hides the linear pairing $x\cdot y$. Expanding the cost $c_0(x,y)=\frac{1}{2}|x-y|^2$ gives \begin{align*} \frac{1}{2}|x-y|^2=\frac{1}{2}|x|^2+\frac{1}{2}|y|^2-x\cdot y. \end{align*} The dual inequality says \begin{align*} a(x)+b(y) \leq \frac{1}{2}|x|^2+\frac{1}{2}|y|^2-x\cdot y. \end{align*} After rearranging, this becomes \begin{align*} x\cdot y-\left(\frac{1}{2}|y|^2-b(y)\right) \leq \frac{1}{2}|x|^2-a(x). \end{align*} This motivates the definition $h(y)=\frac{1}{2}|y|^2-b(y)$ and the convex conjugate \begin{align*} \varphi(x)=\sup_{z\in\mathbb{R}^n}\{x\cdot z-h(z)\}. \end{align*} Because $\varphi$ is a supremum of affine functions $x\mapsto x\cdot z-h(z)$, it is convex and lower semicontinuous. The duality theorem also gives finite points for the two attained potentials after normalization. Hence there are points $x_0,y_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and $b(y_0)>-\infty$. The rearranged dual inequality gives \begin{align*} \varphi(x_0) \leq \frac{1}{2}|x_0|^2-a(x_0)<\infty. \end{align*} Meanwhile $h(y_0)=\frac{1}{2}|y_0|^2-b(y_0)<\infty$, so for every $x\in\mathbb{R}^n$ the single competitor $y_0$ in the supremum gives \begin{align*} \varphi(x)\geq x\cdot y_0-h(y_0)>-\infty. \end{align*} Thus $\varphi$ is finite at least at $x_0$ and never equals $-\infty$, which is exactly the properness condition for an extended-real convex function. Now fix a pair $(x,y)$ in the equality set of the dual problem, meaning \begin{align*} a(x)+b(y)=\frac{1}{2}|x-y|^2. \end{align*} For every $z \in \mathbb{R}^n$, the dual inequality gives \begin{align*} x\cdot z-h(z) \leq \frac{1}{2}|x|^2-a(x). \end{align*} Taking the supremum over $z$ gives \begin{align*} \varphi(x) \leq \frac{1}{2}|x|^2-a(x). \end{align*} For the particular point $y$ satisfying equality, the rearranged equality gives \begin{align*} x\cdot y-h(y)=\frac{1}{2}|x|^2-a(x). \end{align*} Therefore the supremum defining $\varphi(x)$ is attained at $y$, and \begin{align*} \varphi(x)=x\cdot y-h(y). \end{align*} This equality is the bridge from optimal transport duality to convex analysis: it says that the point $y$ supports the convex function $\varphi$ at $x$. [/guided] [/step] [step:Show that the optimal plan is concentrated on the subdifferential of the potential] Let $\partial\varphi(x)$ denote the convex subdifferential of $\varphi$ at $x$, that is, \begin{align*} \partial\varphi(x)=\{p\in\mathbb{R}^n : \varphi(z)\geq \varphi(x)+p\cdot(z-x) \text{ for all } z\in\mathbb{R}^n\}. \end{align*} Let $(x,y)$ be a point where the dual equality holds. From the preceding step, \begin{align*} \varphi(x)=x\cdot y-h(y). \end{align*} For every $z\in\mathbb{R}^n$, the definition of $\varphi$ as a supremum gives \begin{align*} \varphi(z)\geq z\cdot y-h(y). \end{align*} Subtracting $\varphi(x)=x\cdot y-h(y)$ yields \begin{align*} \varphi(z)\geq \varphi(x)+y\cdot(z-x). \end{align*} Thus $y\in\partial\varphi(x)$. Since dual equality holds $\pi$-almost everywhere, we have \begin{align*} \pi(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n : y\in\partial\varphi(x)\})=1. \end{align*} [/step] [step:Use absolute continuity to make the subdifferential single-valued almost everywhere] Let $D_\varphi \subset \mathbb{R}^n$ be the set of points at which $\varphi$ is finite and differentiable. Recall from the theorem statement that $\mathcal{L}^n$ denotes $n$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}^n$. Since $\pi$ is concentrated on the set $\{(x,y):y\in\partial\varphi(x)\}$, the first marginal condition gives \begin{align*} \mu(\{x\in\mathbb{R}^n:\partial\varphi(x)\neq\varnothing\})=1. \end{align*} The inclusion $\partial\varphi(x)\neq\varnothing$ implies $x\in\operatorname{dom}\varphi$, where $\operatorname{dom}\varphi=\{x\in\mathbb{R}^n:\varphi(x)<\infty\}$. Here $\operatorname{int}(A)$ denotes the topological interior of a set $A\subset \mathbb{R}^n$. The effective domain $\operatorname{dom}\varphi$ is convex. If it had empty interior, then its affine hull would have dimension at most $n-1$; hence $\operatorname{dom}\varphi$ would be contained in a proper affine subspace of $\mathbb{R}^n$. Every proper affine subspace is contained, after a rigid motion, in a coordinate hyperplane and therefore has $\mathcal{L}^n$-measure zero. Since $\mu\ll\mathcal{L}^n$, this would imply $\mu(\operatorname{dom}\varphi)=0$, contradicting $\mu(\operatorname{dom}\varphi)=1$. Thus $\operatorname{int}(\operatorname{dom}\varphi)$ is nonempty. The boundary of a convex subset of $\mathbb{R}^n$ with nonempty interior has $\mathcal{L}^n$-measure zero, so \begin{align*} \mu(\operatorname{dom}\varphi\setminus \operatorname{int}(\operatorname{dom}\varphi))=0. \end{align*} By the almost everywhere differentiability theorem for finite convex functions on open convex sets, $\varphi$ is differentiable at $\mathcal{L}^n$-almost every point of $\operatorname{int}(\operatorname{dom}\varphi)$. This use cites a result not yet in the wiki: almost everywhere differentiability of convex functions. Since $\mu\ll\mathcal{L}^n$, it follows that \begin{align*} \mu(D_\varphi)=1. \end{align*} At every $x\in D_\varphi$, the convex subdifferential is the singleton \begin{align*} \partial\varphi(x)=\{\nabla\varphi(x)\}. \end{align*} [/step] [step:Identify the optimal plan as the graph of the gradient] Define $T: \mathbb{R}^n \to \mathbb{R}^n$ on $D_\varphi$ by $T(x)=\nabla\varphi(x)$. For a finite convex function on the open convex set $\operatorname{int}(\operatorname{dom}\varphi)$, the differentiability set $D_\varphi$ is Borel and the gradient on $D_\varphi$ is Borel measurable; this follows, for example, by expressing differentiability through convergence of rational difference quotients and writing each component of $\nabla\varphi$ as the pointwise limit of one-sided difference quotients along coordinate directions on $D_\varphi$. Extend $T$ to $\mathbb{R}^n\setminus D_\varphi$ by setting $T(x)=0$. Because $D_\varphi$ is Borel and the extension is constant on its Borel complement, this gives a Borel map $T: \mathbb{R}^n\to\mathbb{R}^n$. Since $\mu(D_\varphi)=1$, the extension does not affect $T_\#\mu$. The plan $\pi$ is concentrated on $\{(x,y):y\in\partial\varphi(x)\}$, and for $\mu$-almost every $x$ this subdifferential equals $\{\nabla\varphi(x)\}$. Therefore \begin{align*} \pi(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:y=T(x)\})=1. \end{align*} Because the first marginal of $\pi$ is $\mu$, this graph concentration implies \begin{align*} \pi=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu. \end{align*} Taking second marginals gives \begin{align*} T_\#\mu=\nu. \end{align*} Since $\pi$ is optimal for the Kantorovich problem and is induced by $T$, we obtain \begin{align*} \int_{\mathbb{R}^n}|x-T(x)|^2\,d\mu(x)=\int_{\mathbb{R}^n\times\mathbb{R}^n}|x-y|^2\,d\pi(x,y). \end{align*} For every Borel map $S:\mathbb{R}^n\to\mathbb{R}^n$ with $S_\#\mu=\nu$, the measure $(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu$ belongs to $\Pi(\mu,\nu)$, so Kantorovich optimality of $\pi$ gives \begin{align*} \int_{\mathbb{R}^n}|x-T(x)|^2\,d\mu(x)\leq \int_{\mathbb{R}^n}|x-S(x)|^2\,d\mu(x). \end{align*} Thus $T$ attains the Monge infimum. [/step] [step:Prove uniqueness of the optimal plan and the optimal map] Let $\sigma\in\Pi(\mu,\nu)$ be any optimal Kantorovich plan for the cost $c_0(x,y)=\frac{1}{2}|x-y|^2$. Since $\sigma$ has marginals $\mu$ and $\nu$, the marginal integrals of the dual potentials against $\sigma$ are the same as those against $\pi$ and are well-defined by the duality theorem used above. Since the dual pair $(a,b)$ attains the dual value and $\sigma$ attains the primal value, the slack function $q: \mathbb{R}^n\times\mathbb{R}^n\to[0,\infty]$ defined by \begin{align*} q(x,y)=\frac{1}{2}|x-y|^2-a(x)-b(y) \end{align*} is a nonnegative Borel function. The precise duality theorem used above ensures that the extended-real marginal integrals of $a$ and $b$ are well-defined for every coupling with marginals $\mu$ and $\nu$, so \begin{align*} \int_{\mathbb{R}^n\times\mathbb{R}^n} q(x,y)\,d\sigma(x,y)=0. \end{align*} A nonnegative measurable function with integral zero is zero almost everywhere; hence $q=0$ for $\sigma$-almost every $(x,y)$. Repeating the subdifferential argument above gives \begin{align*} \sigma(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:y\in\partial\varphi(x)\})=1. \end{align*} Since the first marginal of $\sigma$ is $\mu$ and $\partial\varphi(x)=\{\nabla\varphi(x)\}$ for $\mu$-almost every $x$, we obtain \begin{align*} \sigma(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:y=T(x)\})=1. \end{align*} Therefore \begin{align*} \sigma=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu. \end{align*} This proves uniqueness of the optimal Kantorovich plan. Now let $S:\mathbb{R}^n\to\mathbb{R}^n$ be a Borel transport map with $S_\#\mu=\nu$ that attains the Monge infimum. Then $(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu$ is an optimal Kantorovich plan, so by the uniqueness just proved, \begin{align*} (\operatorname{id}_{\mathbb{R}^n},S)_\#\mu=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu. \end{align*} The set $\{(x,y):y=T(x)\}$ has full measure for the right-hand plan, hence also for the left-hand plan. By the definition of pushforward under $(\operatorname{id}_{\mathbb{R}^n},S)$, this means \begin{align*} \mu(\{x\in\mathbb{R}^n:S(x)=T(x)\})=1. \end{align*} Thus $T=\nabla\varphi$ is unique $\mu$-almost everywhere among all optimal transport maps, and the unique optimal Kantorovich plan is $(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu$. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Kantorovich Duality

What brings you to Androma?

Start with a route through the knowledge graph.

Brenier's Theorem (Theorem # 7477)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Brenier's Theorem (Theorem # 7477)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further