[proofplan]
We first pass from the Monge problem to the Kantorovich relaxation and choose an optimal coupling for the quadratic cost. Quadratic [Kantorovich duality](/theorems/6799) produces dual potentials; after the standard change of variables, their equality set is contained in the subdifferential of a proper lower semicontinuous convex function $\varphi$. Since the first marginal is absolutely continuous, convex differentiability holds at $\mu$-almost every point where the optimal plan lives, so the subdifferential relation becomes a graph relation $y=\nabla\varphi(x)$. This identifies the unique optimal plan as $(\operatorname{id}_{\mathbb{R}^n},\nabla\varphi)_\#\mu$, proves the Monge optimality of $\nabla\varphi$, and gives uniqueness.
[/proofplan]
[step:Choose an optimal Kantorovich plan for the quadratic cost]
Define the quadratic cost function $c: \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ by $c(x,y)=|x-y|^2$. Since $\mu,\nu \in \mathcal{P}_2(\mathbb{R}^n)$, the product plan $\mu \otimes \nu$ has finite $c$-cost:
\begin{align*}
\int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2\, d(\mu \otimes \nu)(x,y) \leq 2\int_{\mathbb{R}^n} |x|^2\, d\mu(x) + 2\int_{\mathbb{R}^n} |y|^2\, d\nu(y) < \infty.
\end{align*}
The cost $c$ is continuous, hence lower semicontinuous. By the existence theorem for optimal Kantorovich plans for lower semicontinuous costs with finite admissible cost, there exists a coupling $\pi \in \Pi(\mu,\nu)$ such that
\begin{align*}
\int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2\, d\pi(x,y) = \inf_{\gamma \in \Pi(\mu,\nu)} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x-y|^2\, d\gamma(x,y).
\end{align*}
Here $\Pi(\mu,\nu)$ denotes the set of Borel probability measures on $\mathbb{R}^n \times \mathbb{R}^n$ whose first marginal is $\mu$ and whose second marginal is $\nu$. This use cites a result not yet in the wiki: [existence of optimal Kantorovich plans](/theorems/7464) for lower semicontinuous costs.
[/step]
[step:Construct a convex potential from quadratic duality]
Define the half-quadratic cost $c_0: \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ by
\begin{align*}
c_0(x,y)=\frac{1}{2}|x-y|^2.
\end{align*}
Apply the following precise form of quadratic Kantorovich duality with attainment to $c_0$: if the source and target are Polish spaces, the cost is continuous and finite-valued, and some admissible coupling has finite cost, then the primal infimum equals the dual supremum, the dual supremum is attained by Borel potentials $a: \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ and $b: \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ satisfying $a(x)+b(y)\leq c_0(x,y)$ for all $x,y$, the marginal integrals $\int_{\mathbb{R}^n} a\,d\mu$ and $\int_{\mathbb{R}^n} b\,d\nu$ are well-defined in the extended-real sense with finite sum equal to the primal value, and every primal optimizer is concentrated on the equality set $a(x)+b(y)=c_0(x,y)$. The spaces $\mathbb{R}^n$ and $\mathbb{R}^n$ are Polish spaces, the cost $c_0$ is continuous and finite-valued, and the product plan $\mu\otimes\nu$ has finite $c_0$-cost because $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^n)$. Hence this theorem applies. We use the standard normalization of the attained potentials, which gives at least one point $x_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and at least one point $y_0\in\mathbb{R}^n$ with $b(y_0)>-\infty$. Since $c=2c_0$, the plan $\pi$ is also optimal for $c_0$. Thus
\begin{align*}
a(x)+b(y) \leq \frac{1}{2}|x-y|^2
\end{align*}
for all $x,y \in \mathbb{R}^n$, equality holds for $\pi$-almost every $(x,y) \in \mathbb{R}^n \times \mathbb{R}^n$, and the dual value is finite. This use cites a result not yet in the wiki: Kantorovich duality for the quadratic cost with dual attainment on Polish spaces under finite second moments.
Define $h: \mathbb{R}^n \to \mathbb{R}\cup\{\infty\}$ by
\begin{align*}
h(y)=\frac{1}{2}|y|^2-b(y).
\end{align*}
Define $\varphi: \mathbb{R}^n \to (-\infty,\infty]$ to be the Legendre-Fenchel transform of $h$:
\begin{align*}
\varphi(x)=\sup_{y\in\mathbb{R}^n}\{x\cdot y-h(y)\}.
\end{align*}
As a supremum of affine functions of $x$, the function $\varphi$ is convex and lower semicontinuous. By the finiteness conclusion in the dual attainment theorem, choose points $x_0,y_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and $b(y_0)>-\infty$. The inequality above then gives
\begin{align*}
\varphi(x_0) \leq \frac{1}{2}|x_0|^2-a(x_0)<\infty.
\end{align*}
Also $h(y_0)=\frac{1}{2}|y_0|^2-b(y_0)<\infty$, so for every $x\in\mathbb{R}^n$,
\begin{align*}
\varphi(x)\geq x\cdot y_0-h(y_0)>-\infty.
\end{align*}
Thus $\varphi$ is not identically $+\infty$ and never takes the value $-\infty$, so $\varphi$ is proper.
For every $x,y \in \mathbb{R}^n$, the dual inequality gives
\begin{align*}
x\cdot y-h(y) \leq \frac{1}{2}|x|^2-a(x).
\end{align*}
Taking the supremum over $y$ gives
\begin{align*}
\varphi(x) \leq \frac{1}{2}|x|^2-a(x).
\end{align*}
On the set where $a(x)+b(y)=\frac{1}{2}|x-y|^2$, this inequality is an equality at that particular $y$:
\begin{align*}
\varphi(x)=x\cdot y-h(y).
\end{align*}
[guided]
We apply the precise quadratic Kantorovich duality theorem with dual attainment to the half-quadratic cost $c_0: \mathbb{R}^n \times \mathbb{R}^n \to [0,\infty)$ defined by
\begin{align*}
c_0(x,y)=\frac{1}{2}|x-y|^2.
\end{align*}
The theorem requires Polish source and target spaces, a continuous finite-valued cost, and at least one admissible plan with finite cost. These hypotheses hold because $\mathbb{R}^n$ is Polish, $c_0$ is continuous and finite valued, and $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^n)$ imply
\begin{align*}
\int_{\mathbb{R}^n\times\mathbb{R}^n} c_0(x,y)\,d(\mu\otimes\nu)(x,y)<\infty.
\end{align*}
Therefore there exist Borel dual maximizers $a,b:\mathbb{R}^n\to\mathbb{R}\cup\{-\infty\}$ such that $a(x)+b(y)\leq\frac{1}{2}|x-y|^2$ for all $x,y$, the extended-real marginal integrals against $\mu$ and $\nu$ are well-defined with finite sum equal to the primal value, and every optimal plan, in particular $\pi$, is concentrated on the equality set $a(x)+b(y)=\frac{1}{2}|x-y|^2$. We use the standard normalization of the attained potentials, so there are points $x_0,y_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and $b(y_0)>-\infty$. The reason for introducing $h$ and $\varphi=h^*$ is that the quadratic cost hides the linear pairing $x\cdot y$. Expanding the cost $c_0(x,y)=\frac{1}{2}|x-y|^2$ gives
\begin{align*}
\frac{1}{2}|x-y|^2=\frac{1}{2}|x|^2+\frac{1}{2}|y|^2-x\cdot y.
\end{align*}
The dual inequality says
\begin{align*}
a(x)+b(y) \leq \frac{1}{2}|x|^2+\frac{1}{2}|y|^2-x\cdot y.
\end{align*}
After rearranging, this becomes
\begin{align*}
x\cdot y-\left(\frac{1}{2}|y|^2-b(y)\right) \leq \frac{1}{2}|x|^2-a(x).
\end{align*}
This motivates the definition $h(y)=\frac{1}{2}|y|^2-b(y)$ and the convex conjugate
\begin{align*}
\varphi(x)=\sup_{z\in\mathbb{R}^n}\{x\cdot z-h(z)\}.
\end{align*}
Because $\varphi$ is a supremum of affine functions $x\mapsto x\cdot z-h(z)$, it is convex and lower semicontinuous. The duality theorem also gives finite points for the two attained potentials after normalization. Hence there are points $x_0,y_0\in\mathbb{R}^n$ with $a(x_0)>-\infty$ and $b(y_0)>-\infty$. The rearranged dual inequality gives
\begin{align*}
\varphi(x_0) \leq \frac{1}{2}|x_0|^2-a(x_0)<\infty.
\end{align*}
Meanwhile $h(y_0)=\frac{1}{2}|y_0|^2-b(y_0)<\infty$, so for every $x\in\mathbb{R}^n$ the single competitor $y_0$ in the supremum gives
\begin{align*}
\varphi(x)\geq x\cdot y_0-h(y_0)>-\infty.
\end{align*}
Thus $\varphi$ is finite at least at $x_0$ and never equals $-\infty$, which is exactly the properness condition for an extended-real convex function.
Now fix a pair $(x,y)$ in the equality set of the dual problem, meaning
\begin{align*}
a(x)+b(y)=\frac{1}{2}|x-y|^2.
\end{align*}
For every $z \in \mathbb{R}^n$, the dual inequality gives
\begin{align*}
x\cdot z-h(z) \leq \frac{1}{2}|x|^2-a(x).
\end{align*}
Taking the supremum over $z$ gives
\begin{align*}
\varphi(x) \leq \frac{1}{2}|x|^2-a(x).
\end{align*}
For the particular point $y$ satisfying equality, the rearranged equality gives
\begin{align*}
x\cdot y-h(y)=\frac{1}{2}|x|^2-a(x).
\end{align*}
Therefore the supremum defining $\varphi(x)$ is attained at $y$, and
\begin{align*}
\varphi(x)=x\cdot y-h(y).
\end{align*}
This equality is the bridge from optimal transport duality to convex analysis: it says that the point $y$ supports the convex function $\varphi$ at $x$.
[/guided]
[/step]
[step:Show that the optimal plan is concentrated on the subdifferential of the potential]
Let $\partial\varphi(x)$ denote the convex subdifferential of $\varphi$ at $x$, that is,
\begin{align*}
\partial\varphi(x)=\{p\in\mathbb{R}^n : \varphi(z)\geq \varphi(x)+p\cdot(z-x) \text{ for all } z\in\mathbb{R}^n\}.
\end{align*}
Let $(x,y)$ be a point where the dual equality holds. From the preceding step,
\begin{align*}
\varphi(x)=x\cdot y-h(y).
\end{align*}
For every $z\in\mathbb{R}^n$, the definition of $\varphi$ as a supremum gives
\begin{align*}
\varphi(z)\geq z\cdot y-h(y).
\end{align*}
Subtracting $\varphi(x)=x\cdot y-h(y)$ yields
\begin{align*}
\varphi(z)\geq \varphi(x)+y\cdot(z-x).
\end{align*}
Thus $y\in\partial\varphi(x)$. Since dual equality holds $\pi$-almost everywhere, we have
\begin{align*}
\pi(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n : y\in\partial\varphi(x)\})=1.
\end{align*}
[/step]
[step:Use absolute continuity to make the subdifferential single-valued almost everywhere]
Let $D_\varphi \subset \mathbb{R}^n$ be the set of points at which $\varphi$ is finite and differentiable. Recall from the theorem statement that $\mathcal{L}^n$ denotes $n$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}^n$. Since $\pi$ is concentrated on the set $\{(x,y):y\in\partial\varphi(x)\}$, the first marginal condition gives
\begin{align*}
\mu(\{x\in\mathbb{R}^n:\partial\varphi(x)\neq\varnothing\})=1.
\end{align*}
The inclusion $\partial\varphi(x)\neq\varnothing$ implies $x\in\operatorname{dom}\varphi$, where $\operatorname{dom}\varphi=\{x\in\mathbb{R}^n:\varphi(x)<\infty\}$.
Here $\operatorname{int}(A)$ denotes the topological interior of a set $A\subset \mathbb{R}^n$. The effective domain $\operatorname{dom}\varphi$ is convex. If it had empty interior, then its affine hull would have dimension at most $n-1$; hence $\operatorname{dom}\varphi$ would be contained in a proper affine subspace of $\mathbb{R}^n$. Every proper affine subspace is contained, after a rigid motion, in a coordinate hyperplane and therefore has $\mathcal{L}^n$-measure zero. Since $\mu\ll\mathcal{L}^n$, this would imply $\mu(\operatorname{dom}\varphi)=0$, contradicting $\mu(\operatorname{dom}\varphi)=1$. Thus $\operatorname{int}(\operatorname{dom}\varphi)$ is nonempty. The boundary of a convex subset of $\mathbb{R}^n$ with nonempty interior has $\mathcal{L}^n$-measure zero, so
\begin{align*}
\mu(\operatorname{dom}\varphi\setminus \operatorname{int}(\operatorname{dom}\varphi))=0.
\end{align*}
By the almost everywhere differentiability theorem for finite convex functions on open convex sets, $\varphi$ is differentiable at $\mathcal{L}^n$-almost every point of $\operatorname{int}(\operatorname{dom}\varphi)$. This use cites a result not yet in the wiki: almost everywhere differentiability of convex functions. Since $\mu\ll\mathcal{L}^n$, it follows that
\begin{align*}
\mu(D_\varphi)=1.
\end{align*}
At every $x\in D_\varphi$, the convex subdifferential is the singleton
\begin{align*}
\partial\varphi(x)=\{\nabla\varphi(x)\}.
\end{align*}
[/step]
[step:Identify the optimal plan as the graph of the gradient]
Define $T: \mathbb{R}^n \to \mathbb{R}^n$ on $D_\varphi$ by $T(x)=\nabla\varphi(x)$. For a finite convex function on the open convex set $\operatorname{int}(\operatorname{dom}\varphi)$, the differentiability set $D_\varphi$ is Borel and the gradient on $D_\varphi$ is Borel measurable; this follows, for example, by expressing differentiability through convergence of rational difference quotients and writing each component of $\nabla\varphi$ as the pointwise limit of one-sided difference quotients along coordinate directions on $D_\varphi$. Extend $T$ to $\mathbb{R}^n\setminus D_\varphi$ by setting $T(x)=0$. Because $D_\varphi$ is Borel and the extension is constant on its Borel complement, this gives a Borel map $T: \mathbb{R}^n\to\mathbb{R}^n$. Since $\mu(D_\varphi)=1$, the extension does not affect $T_\#\mu$.
The plan $\pi$ is concentrated on $\{(x,y):y\in\partial\varphi(x)\}$, and for $\mu$-almost every $x$ this subdifferential equals $\{\nabla\varphi(x)\}$. Therefore
\begin{align*}
\pi(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:y=T(x)\})=1.
\end{align*}
Because the first marginal of $\pi$ is $\mu$, this graph concentration implies
\begin{align*}
\pi=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
Taking second marginals gives
\begin{align*}
T_\#\mu=\nu.
\end{align*}
Since $\pi$ is optimal for the Kantorovich problem and is induced by $T$, we obtain
\begin{align*}
\int_{\mathbb{R}^n}|x-T(x)|^2\,d\mu(x)=\int_{\mathbb{R}^n\times\mathbb{R}^n}|x-y|^2\,d\pi(x,y).
\end{align*}
For every Borel map $S:\mathbb{R}^n\to\mathbb{R}^n$ with $S_\#\mu=\nu$, the measure $(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu$ belongs to $\Pi(\mu,\nu)$, so Kantorovich optimality of $\pi$ gives
\begin{align*}
\int_{\mathbb{R}^n}|x-T(x)|^2\,d\mu(x)\leq \int_{\mathbb{R}^n}|x-S(x)|^2\,d\mu(x).
\end{align*}
Thus $T$ attains the Monge infimum.
[/step]
[step:Prove uniqueness of the optimal plan and the optimal map]
Let $\sigma\in\Pi(\mu,\nu)$ be any optimal Kantorovich plan for the cost $c_0(x,y)=\frac{1}{2}|x-y|^2$. Since $\sigma$ has marginals $\mu$ and $\nu$, the marginal integrals of the dual potentials against $\sigma$ are the same as those against $\pi$ and are well-defined by the duality theorem used above. Since the dual pair $(a,b)$ attains the dual value and $\sigma$ attains the primal value, the slack function $q: \mathbb{R}^n\times\mathbb{R}^n\to[0,\infty]$ defined by
\begin{align*}
q(x,y)=\frac{1}{2}|x-y|^2-a(x)-b(y)
\end{align*}
is a nonnegative Borel function. The precise duality theorem used above ensures that the extended-real marginal integrals of $a$ and $b$ are well-defined for every coupling with marginals $\mu$ and $\nu$, so
\begin{align*}
\int_{\mathbb{R}^n\times\mathbb{R}^n} q(x,y)\,d\sigma(x,y)=0.
\end{align*}
A nonnegative measurable function with integral zero is zero almost everywhere; hence $q=0$ for $\sigma$-almost every $(x,y)$. Repeating the subdifferential argument above gives
\begin{align*}
\sigma(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:y\in\partial\varphi(x)\})=1.
\end{align*}
Since the first marginal of $\sigma$ is $\mu$ and $\partial\varphi(x)=\{\nabla\varphi(x)\}$ for $\mu$-almost every $x$, we obtain
\begin{align*}
\sigma(\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:y=T(x)\})=1.
\end{align*}
Therefore
\begin{align*}
\sigma=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
This proves uniqueness of the optimal Kantorovich plan.
Now let $S:\mathbb{R}^n\to\mathbb{R}^n$ be a Borel transport map with $S_\#\mu=\nu$ that attains the Monge infimum. Then $(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu$ is an optimal Kantorovich plan, so by the uniqueness just proved,
\begin{align*}
(\operatorname{id}_{\mathbb{R}^n},S)_\#\mu=(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu.
\end{align*}
The set $\{(x,y):y=T(x)\}$ has full measure for the right-hand plan, hence also for the left-hand plan. By the definition of pushforward under $(\operatorname{id}_{\mathbb{R}^n},S)$, this means
\begin{align*}
\mu(\{x\in\mathbb{R}^n:S(x)=T(x)\})=1.
\end{align*}
Thus $T=\nabla\varphi$ is unique $\mu$-almost everywhere among all optimal transport maps, and the unique optimal Kantorovich plan is $(\operatorname{id}_{\mathbb{R}^n},T)_\#\mu$.
[/step]