[proofplan]
The quadratic identity follows by expanding $|x-y|^2$ and observing that the marginal constraints make the two second-moment terms independent of the transport plan. For the structural statement, we use [Kantorovich duality](/theorems/6799) for the quadratic cost to obtain optimal dual potentials and then rewrite those potentials as a dual pair for maximizing the correlation $x \cdot y$. Convex conjugation replaces this correlation-dual pair by a Fenchel pair $(\varphi,\varphi^*)$ without increasing its integral value. [Complementary slackness](/theorems/2559) then forces equality in Fenchel's inequality on the optimal plan, and Fenchel-Young equality identifies the equality set with the graph of the convex subdifferential.
[/proofplan]
[step:Expand the quadratic cost and separate the fixed marginal terms]
For every $\pi \in \Pi(\mu,\nu)$, the function
\begin{align*}
(x,y) \mapsto x \cdot y
\end{align*}
is $\pi$-integrable because [Young's inequality](/theorems/244) gives
\begin{align*}
|x \cdot y| \leq \frac{1}{2}|x|^2 + \frac{1}{2}|y|^2
\end{align*}
and $\mu,\nu \in \mathcal P_2(\mathbb R^n)$. Since $\pi$ has marginals $\mu$ and $\nu$, we have
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} |x|^2 \, d\pi(x,y)=\int_{\mathbb R^n} |x|^2 \, d\mu(x)
\end{align*}
and
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} |y|^2 \, d\pi(x,y)=\int_{\mathbb R^n} |y|^2 \, d\nu(y).
\end{align*}
Using the pointwise identity
\begin{align*}
|x-y|^2=|x|^2+|y|^2-2x \cdot y,
\end{align*}
we obtain, for every $\pi \in \Pi(\mu,\nu)$,
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi(x,y)=A-2\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y),
\end{align*}
where the finite constant $A$ is defined by
\begin{align*}
A:=\int_{\mathbb R^n} |x|^2 \, d\mu(x)+\int_{\mathbb R^n} |y|^2 \, d\nu(y).
\end{align*}
Taking the infimum over $\pi \in \Pi(\mu,\nu)$ on the left side is therefore the same as taking the supremum of the correlation term on the right side:
\begin{align*}
\inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi(x,y)=A-2\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y).
\end{align*}
[/step]
[step:Obtain quadratic dual potentials and complementary slackness]
By the existence theorem for optimal transport plans with lower semicontinuous costs bounded from below by an integrable function, applied to the continuous nonnegative cost
\begin{align*}
c: \mathbb R^n \times \mathbb R^n \to [0,\infty), \qquad c(x,y):=|x-y|^2,
\end{align*}
there exists a plan $\pi_0 \in \Pi(\mu,\nu)$ attaining the quadratic-cost infimum. We now fix any such optimal plan $\pi_0$.
We now use the Kantorovich duality hypothesis stated in the theorem statement. Applied to $\pi_0$, it gives Borel functions
\begin{align*}
f: \mathbb R^n \to \mathbb R, \qquad g: \mathbb R^n \to \mathbb R
\end{align*}
such that $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $\int_{\mathbb R^n} |g| \, d\nu < \infty$,
\begin{align*}
f(x)+g(y)\leq |x-y|^2 \quad \text{for all } x,y \in \mathbb R^n,
\end{align*}
and
\begin{align*}
\int_{\mathbb R^n} f \, d\mu+\int_{\mathbb R^n} g \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y).
\end{align*}
Since the nonnegative function $(x,y)\mapsto |x-y|^2-f(x)-g(y)$ has integral zero with respect to $\pi_0$, it vanishes $\pi_0$-almost everywhere:
\begin{align*}
f(x)+g(y)=|x-y|^2 \quad \text{for } \pi_0\text{-almost every }(x,y).
\end{align*}
[guided]
The role of Kantorovich duality is to replace the global minimization over all couplings by pointwise inequalities involving two one-variable functions. The cost function is
\begin{align*}
c: \mathbb R^n \times \mathbb R^n \to [0,\infty), \qquad c(x,y):=|x-y|^2.
\end{align*}
It is continuous, hence lower semicontinuous, and nonnegative. Since $\mu,\nu \in \mathcal P_2(\mathbb R^n)$, the coupling set is nonempty: define the product coupling $\pi_{\mathrm{prod}} \in \Pi(\mu,\nu)$ by
\begin{align*}
\pi_{\mathrm{prod}}(A \times B):=\mu(A)\nu(B)
\end{align*}
for all Borel sets $A,B \subset \mathbb R^n$, and extend this prescription uniquely to the Borel product $\sigma$-algebra on $\mathbb R^n \times \mathbb R^n$. This coupling has finite quadratic cost because
\begin{align*}
|x-y|^2 \leq 2|x|^2+2|y|^2.
\end{align*}
Thus the usual existence theorem for lower semicontinuous transport costs applies and gives an optimal plan $\pi_0 \in \Pi(\mu,\nu)$.
Next we use the Kantorovich duality hypothesis stated in the theorem statement. This is the strong finite-second-moment quadratic-cost version needed here: it gives real-valued Borel representatives of the dual potentials, integrability with respect to the two marginals, the pointwise dual inequality on all of $\mathbb R^n \times \mathbb R^n$, dual attainment, and complementary slackness for the fixed optimal plan $\pi_0$. Thus we obtain Borel functions
\begin{align*}
f: \mathbb R^n \to \mathbb R, \qquad g: \mathbb R^n \to \mathbb R
\end{align*}
such that $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $\int_{\mathbb R^n} |g| \, d\nu < \infty$, satisfying the dual feasibility inequality
\begin{align*}
f(x)+g(y)\leq |x-y|^2 \quad \text{for all } x,y \in \mathbb R^n.
\end{align*}
Dual optimality says that the dual value equals the primal value:
\begin{align*}
\int_{\mathbb R^n} f \, d\mu+\int_{\mathbb R^n} g \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y).
\end{align*}
Now define the nonnegative slack function
\begin{align*}
s: \mathbb R^n \times \mathbb R^n \to [0,\infty], \qquad s(x,y):=|x-y|^2-f(x)-g(y).
\end{align*}
Dual feasibility gives $s \geq 0$. Integrating $s$ with respect to $\pi_0$ and using the marginal identities for $\pi_0$ gives
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} s \, d\pi_0=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y)-\int_{\mathbb R^n} f \, d\mu-\int_{\mathbb R^n} g \, d\nu=0.
\end{align*}
A nonnegative measurable function with integral zero is zero almost everywhere. Therefore
\begin{align*}
f(x)+g(y)=|x-y|^2 \quad \text{for } \pi_0\text{-almost every }(x,y).
\end{align*}
This is the complementary slackness relation that will later become equality in Fenchel's inequality.
[/guided]
[/step]
[step:Transform the quadratic dual pair into a correlation dual pair]
Define the Borel functions
\begin{align*}
\alpha: \mathbb R^n \to \mathbb R \cup \{\infty\}, \qquad \alpha(x):=\frac{|x|^2-f(x)}{2}
\end{align*}
and
\begin{align*}
\beta: \mathbb R^n \to \mathbb R \cup \{\infty\}, \qquad \beta(y):=\frac{|y|^2-g(y)}{2}.
\end{align*}
The quadratic dual inequality is equivalent to
\begin{align*}
x \cdot y \leq \alpha(x)+\beta(y) \quad \text{for all } x,y \in \mathbb R^n.
\end{align*}
Also,
\begin{align*}
\int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\frac{1}{2}\left(A-\int_{\mathbb R^n} f \, d\mu-\int_{\mathbb R^n} g \, d\nu\right).
\end{align*}
Using the identity proved in the first step and the optimality of $\pi_0$, this becomes
\begin{align*}
\int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y).
\end{align*}
Moreover, complementary slackness for $f,g$ is equivalent to
\begin{align*}
x \cdot y=\alpha(x)+\beta(y) \quad \text{for } \pi_0\text{-almost every }(x,y).
\end{align*}
[/step]
[step:Replace the correlation dual pair by a Fenchel dual pair]
Since $f$ is finite $\mu$-almost everywhere with $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $g$ is finite $\nu$-almost everywhere with $\int_{\mathbb R^n} |g| \, d\nu < \infty$, the functions $\alpha$ and $\beta$ are finite $\mu$-almost everywhere and $\nu$-almost everywhere, respectively. Choose $x_0 \in \mathbb R^n$ with $\alpha(x_0)<\infty$ and choose $y_0 \in \mathbb R^n$ with $\beta(y_0)<\infty$. The correlation dual inequality gives the affine lower bound
\begin{align*}
\beta(y) \geq x_0 \cdot y-\alpha(x_0) \quad \text{for all } y \in \mathbb R^n.
\end{align*}
Define the convex conjugate of $\beta$ by the function
\begin{align*}
\varphi: \mathbb R^n \to (-\infty,\infty], \qquad \varphi(x):=\sup_{y \in \mathbb R^n}\{x \cdot y-\beta(y)\}.
\end{align*}
Because $x \cdot y \leq \alpha(x)+\beta(y)$ for all $x,y \in \mathbb R^n$, we have
\begin{align*}
\varphi(x)\leq \alpha(x) \quad \text{for all } x \in \mathbb R^n.
\end{align*}
The function $\varphi$ is convex and lower semicontinuous as a supremum of affine functions. The function $\beta$ is proper because it is finite at $y_0$, never takes the value $-\infty$, and has the affine minorant $y \mapsto x_0 \cdot y-\alpha(x_0)$. Hence its convex conjugate is a well-defined extended-real convex function. The function $\varphi$ is proper: the inequality above gives $\varphi(x_0)\leq \alpha(x_0)<\infty$, while $\beta(y_0)<\infty$ gives
\begin{align*}
\varphi(x)\geq x \cdot y_0-\beta(y_0)>-\infty \quad \text{for all } x \in \mathbb R^n.
\end{align*}
Moreover, $\varphi$ is bounded below by the affine function $x \mapsto x \cdot y_0-\beta(y_0)$, and $\varphi^*$ will be bounded below by the affine function $y \mapsto x_0 \cdot y-\varphi(x_0)$. Since $\mu,\nu \in \mathcal P_2(\mathbb R^n)$, both affine lower bounds are integrable with respect to $\mu$ and $\nu$. Thus the extended integrals of $\varphi$ and $\varphi^*$ are well-defined from below and cannot contain an ambiguous $\infty-\infty$ cancellation.
Let $\varphi^*: \mathbb R^n \to (-\infty,\infty]$ denote the convex conjugate of $\varphi$:
\begin{align*}
\varphi^*(y):=\sup_{x \in \mathbb R^n}\{x \cdot y-\varphi(x)\}.
\end{align*}
Since $\beta$ is proper and bounded below by an affine function, the biconjugate $\beta^{**}$ is well-defined and the Fenchel-Moreau biconjugation inequality gives $\beta^{**}\leq \beta$ pointwise. Since $\varphi=\beta^*$, we have $\varphi^*=\beta^{**}$, and therefore $\varphi^*\leq \beta$ pointwise. Hence
\begin{align*}
\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu\leq \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu.
\end{align*}
On the other hand, Fenchel's inequality gives
\begin{align*}
x \cdot y \leq \varphi(x)+\varphi^*(y) \quad \text{for all } x,y \in \mathbb R^n.
\end{align*}
Integrating this inequality against any $\pi \in \Pi(\mu,\nu)$ gives
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y)\leq \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu.
\end{align*}
Taking the supremum over $\pi \in \Pi(\mu,\nu)$ and using the optimality of $(\alpha,\beta)$ for the correlation dual problem yields
\begin{align*}
\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu=\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y).
\end{align*}
[guided]
The delicate point is that $\alpha$, $\beta$, $\varphi$, and $\varphi^*$ may be extended-valued, so we first put affine lower bounds in place before comparing integrals. Since $f$ is finite $\mu$-almost everywhere with $\int_{\mathbb R^n} |f| \, d\mu < \infty$ and $g$ is finite $\nu$-almost everywhere with $\int_{\mathbb R^n} |g| \, d\nu < \infty$, the functions $\alpha$ and $\beta$ are finite almost everywhere for their respective measures. Choose $x_0 \in \mathbb R^n$ with $\alpha(x_0)<\infty$ and $y_0 \in \mathbb R^n$ with $\beta(y_0)<\infty$. From
\begin{align*}
x \cdot y \leq \alpha(x)+\beta(y) \quad \text{for all } x,y \in \mathbb R^n
\end{align*}
we get, by fixing $x=x_0$,
\begin{align*}
\beta(y) \geq x_0 \cdot y-\alpha(x_0) \quad \text{for all } y \in \mathbb R^n.
\end{align*}
This affine lower bound is exactly what prevents the conjugate from degenerating at $x_0$.
Define
\begin{align*}
\varphi: \mathbb R^n \to (-\infty,\infty], \qquad \varphi(x):=\sup_{y \in \mathbb R^n}\{x \cdot y-\beta(y)\}.
\end{align*}
For every $x,y \in \mathbb R^n$, the inequality $x \cdot y \leq \alpha(x)+\beta(y)$ gives $x \cdot y-\beta(y)\leq \alpha(x)$, and taking the supremum over $y$ gives
\begin{align*}
\varphi(x)\leq \alpha(x) \quad \text{for all } x \in \mathbb R^n.
\end{align*}
In particular, $\varphi(x_0)\leq \alpha(x_0)<\infty$. Also, because $\beta(y_0)<\infty$, the single competitor $y_0$ in the supremum gives
\begin{align*}
\varphi(x)\geq x \cdot y_0-\beta(y_0)>-\infty \quad \text{for all } x \in \mathbb R^n.
\end{align*}
Thus $\varphi$ is proper. It is convex and lower semicontinuous because it is the supremum of affine functions of $x$.
Now define
\begin{align*}
\varphi^*: \mathbb R^n \to (-\infty,\infty], \qquad \varphi^*(y):=\sup_{x \in \mathbb R^n}\{x \cdot y-\varphi(x)\}.
\end{align*}
The same competitor $x_0$ gives the affine lower bound
\begin{align*}
\varphi^*(y)\geq x_0 \cdot y-\varphi(x_0) \quad \text{for all } y \in \mathbb R^n.
\end{align*}
Since $\mu$ and $\nu$ have finite second moments, every affine function on $\mathbb R^n$ is integrable with respect to $\mu$ and $\nu$. Therefore the extended integrals of $\varphi$ and $\varphi^*$ are defined from below; no expression of the form $\infty-\infty$ is being used.
Because $\beta$ is proper and bounded below by an affine function, the biconjugate $\beta^{**}$ is well-defined and the Fenchel-Moreau biconjugation inequality gives $\beta^{**}\leq \beta$ pointwise. Since $\varphi=\beta^*$, we have $\varphi^*=\beta^{**}$, so $\varphi^*\leq \beta$ pointwise. Together with $\varphi\leq \alpha$, this yields
\begin{align*}
\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu\leq \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu.
\end{align*}
Fenchel's inequality for the proper lower semicontinuous convex function $\varphi$ gives
\begin{align*}
x \cdot y \leq \varphi(x)+\varphi^*(y) \quad \text{for all } x,y \in \mathbb R^n.
\end{align*}
Integrating against any coupling $\pi \in \Pi(\mu,\nu)$ gives
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y)\leq \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu.
\end{align*}
Taking the supremum over all couplings gives
\begin{align*}
\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y)\leq \int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu.
\end{align*}
On the other hand, the previous comparison with $\alpha$ and $\beta$ gives
\begin{align*}
\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu\leq \int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu.
\end{align*}
It remains to identify the right-hand side. From the definitions of $\alpha$ and $\beta$,
\begin{align*}
\int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\frac{1}{2}\left(A-\int_{\mathbb R^n} f \, d\mu-\int_{\mathbb R^n} g \, d\nu\right),
\end{align*}
where
\begin{align*}
A:=\int_{\mathbb R^n} |x|^2 \, d\mu(x)+\int_{\mathbb R^n} |y|^2 \, d\nu(y).
\end{align*}
Dual optimality for $f,g$ says
\begin{align*}
\int_{\mathbb R^n} f \, d\mu+\int_{\mathbb R^n} g \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} |x-y|^2 \, d\pi_0(x,y).
\end{align*}
Expanding $|x-y|^2=|x|^2+|y|^2-2x\cdot y$ and using that $\pi_0$ has marginals $\mu$ and $\nu$, we get
\begin{align*}
\int_{\mathbb R^n} \alpha \, d\mu+\int_{\mathbb R^n} \beta \, d\nu=\int_{\mathbb R^n \times \mathbb R^n} x\cdot y \, d\pi_0(x,y).
\end{align*}
Since $\pi_0$ is a coupling, this last quantity is at most the correlation supremum. Combining the three inequalities forces equality:
\begin{align*}
\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu=\sup_{\pi \in \Pi(\mu,\nu)}\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi(x,y).
\end{align*}
[/guided]
[/step]
[step:Use complementary slackness for the Fenchel pair]
Define the nonnegative measurable function
\begin{align*}
h: \mathbb R^n \times \mathbb R^n \to [0,\infty], \qquad h(x,y):=\varphi(x)+\varphi^*(y)-x \cdot y.
\end{align*}
Fenchel's inequality gives $h \geq 0$. The affine lower bounds obtained in the preceding step imply that the extended integrals of $\varphi$ and $\varphi^*$ are defined from below. Since their sum has already been proved equal to the finite correlation supremum, no undefined extended-real subtraction occurs below. Since $\pi_0$ maximizes the correlation functional by the identity in the first step, and since $(\varphi,\varphi^*)$ has the same dual value, we have
\begin{align*}
\int_{\mathbb R^n \times \mathbb R^n} h \, d\pi_0=\int_{\mathbb R^n} \varphi \, d\mu+\int_{\mathbb R^n} \varphi^* \, d\nu-\int_{\mathbb R^n \times \mathbb R^n} x \cdot y \, d\pi_0(x,y)=0.
\end{align*}
Therefore $h=0$ for $\pi_0$-almost every $(x,y)$, meaning
\begin{align*}
\varphi(x)+\varphi^*(y)=x \cdot y \quad \text{for } \pi_0\text{-almost every }(x,y).
\end{align*}
[/step]
[step:Identify the Fenchel equality set with the subdifferential graph]
By the Fenchel-Young equality criterion for proper lower semicontinuous convex functions, for $x,y \in \mathbb R^n$ one has
\begin{align*}
\varphi(x)+\varphi^*(y)=x \cdot y
\end{align*}
if and only if $y \in \partial \varphi(x)$. The graph of $\partial \varphi$ is closed in $\mathbb R^n \times \mathbb R^n$, hence Borel measurable. Applying this equivalence to the $\pi_0$-almost everywhere equality obtained in the previous step gives
\begin{align*}
\pi_0\bigl(\{(x,y) \in \mathbb R^n \times \mathbb R^n : y \in \partial \varphi(x)\}\bigr)=1.
\end{align*}
Thus the optimal plan $\pi_0$ is concentrated on the graph of the convex subdifferential of the proper lower semicontinuous convex function $\varphi$, as claimed.
[/step]