[proofplan]
[Brenier's theorem](/theorems/7477) gives existence of at least one optimal quadratic transport plan induced by the gradient of a convex potential. To prove uniqueness, take two optimal plans and average them; the average is still optimal because the cost functional is affine in the plan. Quadratic [Kantorovich duality](/theorems/6799) places the averaged optimal plan on the subdifferential graph of a convex potential, and disintegration then shows that almost every conditional measure lives inside one subdifferential. Since convex functions are differentiable Lebesgue-a.e. and $\mu \ll \mathcal{L}^n$, those subdifferentials are singletons $\mu$-a.e.; hence the averaged plan is graph-induced, forcing both original optimal plans to be the same graph plan.
[/proofplan]
custom_env
admin
[step:Use Brenier's theorem to obtain an optimal graph plan]
Let $\operatorname{id}_{\mathbb{R}^n}:\mathbb{R}^n \to \mathbb{R}^n$ denote the identity map. By Brenier's theorem for quadratic cost, applied to $\mu,\nu \in \mathcal{P}_2(\mathbb{R}^n)$ with $\mu \ll \mathcal{L}^n$, there exists a proper lower semicontinuous convex function $\varphi_0:\mathbb{R}^n \to (-\infty,\infty]$, finite and differentiable $\mu$-a.e., such that the Borel map $T_0:\mathbb{R}^n \to \mathbb{R}^n$, defined by $T_0(x)=\nabla\varphi_0(x)$ on the differentiability set of $\varphi_0$ and arbitrarily on its $\mu$-null complement, satisfies
\begin{align*}
\gamma_0 := (\operatorname{id}_{\mathbb{R}^n},T_0)_\#\mu \in \Pi(\mu,\nu).
\end{align*}
Moreover, $\gamma_0$ is optimal for the quadratic Kantorovich problem. Hence the infimum is attained.
[/step]
custom_env
admin
[step:Average two optimal plans and keep optimality]
Let $\pi_1,\pi_2 \in \Pi(\mu,\nu)$ be optimal plans. Define the averaged plan
\begin{align*}
\pi := \frac{1}{2}\pi_1+\frac{1}{2}\pi_2.
\end{align*}
Since marginals are linear under convex combinations, $\pi \in \Pi(\mu,\nu)$. Let $m$ be the optimal value:
\begin{align*}
m := \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2 \, d\pi_1(x,y) = \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2 \, d\pi_2(x,y).
\end{align*}
By linearity of integration with respect to finite measures,
\begin{align*}
\int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2 \, d\pi(x,y) = \frac{1}{2}m+\frac{1}{2}m = m.
\end{align*}
Thus $\pi$ is also optimal.
[/step]
custom_env
admin
[step:Place the averaged optimal plan on a convex subdifferential graph]Since $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^n)$, every admissible coupling has finite quadratic cost bounded by the second moments of $\mu$ and $\nu$. Therefore quadratic Kantorovich duality and [complementary slackness](/theorems/2559) apply to the optimal plan $\pi$. In Brenier form, this yields a proper lower semicontinuous convex function
\begin{align*}
\varphi:\mathbb{R}^n \to (-\infty,\infty]
\end{align*}
such that the averaged optimal plan $\pi$ is concentrated on the subdifferential graph
\begin{align*}
\Gamma_\varphi := \{(x,y)\in \mathbb{R}^n \times \mathbb{R}^n : y \in \partial\varphi(x)\}.
\end{align*}
That is,
\begin{align*}
\pi(\Gamma_\varphi)=1.
\end{align*}[/step]
custom_env
admin
[guided]The reason for passing to the averaged plan is that optimality is preserved by convex combinations, while duality gives a rigid geometric description of every optimal plan. We first check that the duality theorem is applicable. The hypotheses $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^n)$ imply that the quadratic cost has finite integral on admissible couplings; for example, $|x-y|^2\leq 2|x|^2+2|y|^2$, and the two right-hand moments are finite under any $\rho\in\Pi(\mu,\nu)$. Thus the quadratic Kantorovich problem is finite, and optimality of $\pi$ permits applying quadratic Kantorovich duality with complementary slackness.
In the quadratic-cost Brenier formulation of duality, complementary slackness provides a proper lower semicontinuous convex function
\begin{align*}
\varphi:\mathbb{R}^n \to (-\infty,\infty]
\end{align*}
such that $\pi$ is concentrated on the dual equality set. For the cost $|x-y|^2$, the usual conversion from the quadratic dual potentials to a convex Brenier potential rewrites that equality set exactly as the subdifferential relation $y\in\partial\varphi(x)$. Therefore, if we define
\begin{align*}
\Gamma_\varphi := \{(x,y)\in \mathbb{R}^n \times \mathbb{R}^n : y \in \partial\varphi(x)\},
\end{align*}
then complementary slackness gives
\begin{align*}
\pi(\Gamma_\varphi)=1.
\end{align*}
This is the point at which optimality becomes a pointwise geometric constraint: the second coordinate $y$ is not arbitrary once the first coordinate $x$ is fixed; it must lie in the convex set $\partial\varphi(x)$.[/guided]
custom_env
admin
[step:Disintegrate the averaged plan over its first marginal]
Since $\mathbb{R}^n$ is a standard Borel space and the first marginal of $\pi$ is $\mu$, the disintegration theorem gives a $\mu$-a.e. uniquely determined family of Borel probability measures $(\pi_x)_{x\in\mathbb{R}^n}$ on $\mathbb{R}^n$ such that, for every nonnegative Borel function
\begin{align*}
F:\mathbb{R}^n \times \mathbb{R}^n \to [0,\infty],
\end{align*}
one has
\begin{align*}
\int_{\mathbb{R}^n \times \mathbb{R}^n} F(x,y)\,d\pi(x,y) = \int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^n} F(x,y)\,d\pi_x(y)\right)d\mu(x).
\end{align*}
Apply this identity to the indicator function $\mathbb{1}_{\Gamma_\varphi^c}:\mathbb{R}^n \times \mathbb{R}^n \to \{0,1\}$. Since $\pi(\Gamma_\varphi^c)=0$,
\begin{align*}
0 = \int_{\mathbb{R}^n} \pi_x(\mathbb{R}^n \setminus \partial\varphi(x))\,d\mu(x).
\end{align*}
The integrand is nonnegative, so
\begin{align*}
\pi_x(\partial\varphi(x))=1
\end{align*}
for $\mu$-a.e. $x\in\mathbb{R}^n$.
[/step]
custom_env
admin
[step:Use differentiability of convex functions to make the conditional measures Dirac masses]From the previous step, $\pi_x(\partial\varphi(x))=1$ for $\mu$-a.e. $x$, so $\partial\varphi(x)\neq\varnothing$ for $\mu$-a.e. $x$. In particular, $\varphi(x)<\infty$ for $\mu$-a.e. $x$. Convex functions are differentiable $\mathcal{L}^n$-a.e. on their effective domain, and $\mu \ll \mathcal{L}^n$ transfers this null exceptional set to a $\mu$-null set. Let $D_\varphi \subset \mathbb{R}^n$ denote the Borel set of points at which $\varphi$ is finite and differentiable. Then $\mu(D_\varphi)=1$, and for every $x\in D_\varphi$,
\begin{align*}
\partial\varphi(x)=\{\nabla\varphi(x)\}.
\end{align*}
Combining this singleton identity with $\pi_x(\partial\varphi(x))=1$, we obtain
\begin{align*}
\pi_x = \delta_{\nabla\varphi(x)}
\end{align*}
for $\mu$-a.e. $x\in\mathbb{R}^n$.[/step]
custom_env
admin
[guided]The disintegration step reduced uniqueness to the size of the set $\partial\varphi(x)$. If $\partial\varphi(x)$ contains many points, then the conditional measure $\pi_x$ could split mass among them. The absolute continuity hypothesis prevents this for almost every $x$.
The previous disintegration step tells us that $\pi_x(\partial\varphi(x))=1$ for $\mu$-a.e. $x$. Since $\pi_x$ is a probability measure, this forces $\partial\varphi(x)\neq\varnothing$ for $\mu$-a.e. $x$. A point with nonempty convex subdifferential lies in the effective domain of $\varphi$, so $\varphi(x)<\infty$ for $\mu$-a.e. $x$.
Now we apply the differentiability theorem for convex functions: a convex function is differentiable $\mathcal{L}^n$-a.e. on its effective domain. Because $\mu \ll \mathcal{L}^n$, every $\mathcal{L}^n$-null set is also $\mu$-null. Therefore the set
\begin{align*}
D_\varphi := \{x\in\mathbb{R}^n : \varphi \text{ is finite and differentiable at } x\}
\end{align*}
satisfies $\mu(D_\varphi)=1$.
At each $x\in D_\varphi$, convex analysis gives the exact identification of the subdifferential:
\begin{align*}
\partial\varphi(x)=\{\nabla\varphi(x)\}.
\end{align*}
From the previous step we already know that $\pi_x(\partial\varphi(x))=1$ for $\mu$-a.e. $x$. Hence, for $\mu$-a.e. $x\in D_\varphi$,
\begin{align*}
\pi_x(\{\nabla\varphi(x)\})=1.
\end{align*}
A probability measure assigning mass $1$ to a singleton is the Dirac mass at that point, so
\begin{align*}
\pi_x = \delta_{\nabla\varphi(x)}
\end{align*}
for $\mu$-a.e. $x\in\mathbb{R}^n$.[/guided]
custom_env
admin
[step:Identify the averaged plan as a graph pushforward]
Since the gradient of a convex function is Borel on its differentiability set, choose a Borel extension
\begin{align*}
T:\mathbb{R}^n \to \mathbb{R}^n
\end{align*}
such that $T(x)=\nabla\varphi(x)$ on $D_\varphi$ and define it arbitrarily on $\mathbb{R}^n\setminus D_\varphi$. The disintegration formula and the identity $\pi_x=\delta_{T(x)}$ for $\mu$-a.e. $x$ give, for every nonnegative Borel function
\begin{align*}
F:\mathbb{R}^n \times \mathbb{R}^n \to [0,\infty],
\end{align*}
the equality
\begin{align*}
\int_{\mathbb{R}^n \times \mathbb{R}^n} F(x,y)\,d\pi(x,y) = \int_{\mathbb{R}^n} F(x,T(x))\,d\mu(x).
\end{align*}
This is precisely
\begin{align*}
\pi = (\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
[/step]
custom_env
admin
[step:Force both original optimal plans to equal the same graph plan]
Let
\begin{align*}
G_T := \{(x,y)\in \mathbb{R}^n \times \mathbb{R}^n : y=T(x)\}
\end{align*}
be the graph of $T$. Since $\pi=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu$, we have $\pi(G_T)=1$. From
\begin{align*}
\pi=\frac{1}{2}\pi_1+\frac{1}{2}\pi_2,
\end{align*}
we get, for every Borel set $A\subset \mathbb{R}^n \times \mathbb{R}^n$,
\begin{align*}
\pi_i(A)\leq 2\pi(A)
\end{align*}
for $i\in\{1,2\}$. Applying this to $A=G_T^c$ gives
\begin{align*}
\pi_i(G_T^c)=0
\end{align*}
for $i\in\{1,2\}$.
Now let $\rho\in\Pi(\mu,\nu)$ be any plan concentrated on $G_T$. For Borel sets $B,C\subset\mathbb{R}^n$,
\begin{align*}
\rho(B\times C)=\rho((B\times C)\cap G_T)=\mu(B\cap T^{-1}(C)).
\end{align*}
The right-hand side is exactly the value of $(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu$ on $B\times C$. Since Borel rectangles generate the product Borel $\sigma$-algebra, $\rho=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu$. Applying this to $\rho=\pi_1$ and $\rho=\pi_2$ yields
\begin{align*}
\pi_1=\pi_2=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
Thus the optimal plan is unique.
[/step]
custom_env
admin
[step:Conclude uniqueness of the Brenier gradient]
The first step produced at least one optimal Brenier graph plan, and the previous step shows that every optimal plan equals the unique plan
\begin{align*}
\pi_*=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
Therefore $\pi_*=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu$, where $T$ is a Borel representative of $\nabla\varphi$ and $\varphi$ is a convex Brenier potential. Equivalently, $\pi_*=(\operatorname{id}_{\mathbb{R}^n},\nabla\varphi)_\#\mu$ with this representative understood.
Finally, let $\psi:\mathbb{R}^n \to (-\infty,\infty]$ be another convex Brenier potential, and let $S:\mathbb{R}^n \to \mathbb{R}^n$ be a Borel representative of $\nabla\psi$ defined $\mu$-a.e. Define
\begin{align*}
G_S := \{(x,y)\in \mathbb{R}^n \times \mathbb{R}^n : y=S(x)\}
\end{align*}
to be the graph of $S$. If $(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu$ is optimal, then uniqueness of the optimal plan gives
\begin{align*}
(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
The common measure is concentrated on both graphs $G_S$ and $G_T$. Hence
\begin{align*}
\mu(\{x\in\mathbb{R}^n:S(x)=T(x)\})=1.
\end{align*}
Since $S=\nabla\psi$ $\mu$-a.e. and $T=\nabla\varphi$ $\mu$-a.e., this proves
\begin{align*}
\nabla\psi=\nabla\varphi
\end{align*}
$\mu$-a.e. This completes the proof.
[/step]