Benamou-Brenier Formula and Vanishing Entropic Regularization for Quadratic Optimal Transport (Theorem # 9604)
Theorem
Let $n\in\mathbb N$, and let $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$. Write $\Pi(\mu_0,\mu_1)$ for the set of Borel probability measures on $\mathbb R^n\times\mathbb R^n$ whose first marginal is $\mu_0$ and whose second marginal is $\mu_1$. Define
\begin{align*}
W_2^2(\mu_0,\mu_1):=\inf_{\pi\in\Pi(\mu_0,\mu_1)}\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y).
\end{align*}
Then
\begin{align*}
W_2^2(\mu_0,\mu_1)=\inf_{(\rho,v)}\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t),
\end{align*}
where the infimum is taken over all pairs $(\rho,v)$ such that $\rho:[0,1]\to\mathcal P_2(\mathbb R^n)$, $t\mapsto\rho_t$, is narrowly continuous, $\rho_{t=0}=\mu_0$, $\rho_{t=1}=\mu_1$, $v:(0,1)\times\mathbb R^n\to\mathbb R^n$ is Borel, the kinetic action is finite, and the continuity equation $\partial_t\rho_t+\operatorname{div}(v_t\rho_t)=0$ holds in the distributional sense: for every $\phi\in C_c^\infty((0,1)\times\mathbb R^n)$,
\begin{align*}
\int_0^1\int_{\mathbb R^n}\left(\partial_t\phi(t,x)+\nabla_x\phi(t,x)\cdot v_t(x)\right)\,d\rho_t(x)\,d\mathcal L^1(t)=0.
\end{align*}
Moreover, let $m:=\mu_0\otimes\mu_1$. For every $\varepsilon>0$, define
\begin{align*}
E_\varepsilon(\mu_0,\mu_1):=\inf_{\pi\in\Pi(\mu_0,\mu_1)}\left\{\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y)+\varepsilon H(\pi\mid m)\right\}.
\end{align*}
Here $H(\pi\mid m):=\int_{\mathbb R^n\times\mathbb R^n}\log(d\pi/dm)\,d\pi(x,y)$ if $\pi\ll m$, and $H(\pi\mid m):=+\infty$ otherwise. For each $\varepsilon>0$, the entropic problem defining $E_\varepsilon(\mu_0,\mu_1)$ has a unique minimizer. If there exists a sequence $(\pi_k)_{k\in\mathbb N}\subset\Pi(\mu_0,\mu_1)$ such that $\pi_k\ll m$, $H(\pi_k\mid m)<\infty$ for every $k\in\mathbb N$, and
\begin{align*}
\lim_{k\to\infty}\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_k(x,y)=W_2^2(\mu_0,\mu_1),
\end{align*}
then
\begin{align*}
\lim_{\varepsilon\downarrow0}E_\varepsilon(\mu_0,\mu_1)=W_2^2(\mu_0,\mu_1).
\end{align*}
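As a sanity check on the first identity, here is a minimal worked example under stated assumptions; the shift vector $h$ and the translation map $\tau_h$ are notation introduced only for this illustration and do not appear in the theorem. Fix $h\in\mathbb R^n$, let $\tau_h(x):=x+h$, and suppose $\mu_1=(\tau_h)_\#\mu_0$. For every $\pi\in\Pi(\mu_0,\mu_1)$, Jensen's inequality and the marginal conditions give
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y)\ge\left|\int_{\mathbb R^n\times\mathbb R^n}(y-x)\,d\pi(x,y)\right|^2=\left|\int_{\mathbb R^n}y\,d\mu_1(y)-\int_{\mathbb R^n}x\,d\mu_0(x)\right|^2=|h|^2,
\end{align*}
and the coupling $(\mathrm{id},\tau_h)_\#\mu_0$ attains this bound, so $W_2^2(\mu_0,\mu_1)=|h|^2$. On the dynamic side, the pair $\rho_t:=(\tau_{th})_\#\mu_0$, $v_t(x):=h$ satisfies the continuity equation, since for every $\phi\in C_c^\infty((0,1)\times\mathbb R^n)$,
\begin{align*}
\int_0^1\int_{\mathbb R^n}\left(\partial_t\phi(t,x)+\nabla_x\phi(t,x)\cdot h\right)\,d\rho_t(x)\,d\mathcal L^1(t)=\int_{\mathbb R^n}\int_0^1\frac{d}{dt}\bigl[\phi(t,x+th)\bigr]\,d\mathcal L^1(t)\,d\mu_0(x)=0.
\end{align*}
Its kinetic action is $\int_0^1\int_{\mathbb R^n}|h|^2\,d\rho_t(x)\,d\mathcal L^1(t)=|h|^2$, which agrees with the static value, as the formula predicts.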
Proof
[proofplan]
We prove the Benamou-Brenier identity by comparing the static optimal-transport problem with the dynamic continuity-equation problem. An optimal static plan gives a dynamic competitor by moving each pair $(x,y)$ along the straight segment $t\mapsto(1-t)x+ty$; disintegration gives the correct Eulerian velocity when many particles meet at the same space-time point. Conversely, the superposition principle represents any admissible continuity-equation solution as a probability measure on absolutely continuous particle paths, and Cauchy-Schwarz bounds the endpoint transport cost by the kinetic action. Finally, the entropic problem is handled by the direct method, strict convexity of relative entropy, and a recovery argument in which the assumed finite-entropy approximating couplings serve as competitors for all sufficiently small $\varepsilon$.
[/proofplan]
[step:Construct a dynamic competitor from an optimal static plan]
By the existence theorem for optimal transport plans with lower semicontinuous costs, applied to the lower semicontinuous cost
\begin{align*}
c:\mathbb R^n\times\mathbb R^n&\to[0,\infty),\\
(x,y)&\mapsto |x-y|^2,
\end{align*}
there exists $\pi\in\Pi(\mu_0,\mu_1)$ such that
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y)=W_2^2(\mu_0,\mu_1).
\end{align*}
Here the hypotheses are satisfied because $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$ imply
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d(\mu_0\otimes\mu_1)(x,y)<\infty.
\end{align*}
The cited existence result is not yet resolved to a wiki theorem ID: Existence of optimal transport plans for lower semicontinuous costs.
For each $t\in[0,1]$, define the interpolation map
\begin{align*}
S_t:\mathbb R^n\times\mathbb R^n \to \mathbb R^n,\qquad (x,y)\mapsto (1-t)x+ty
\end{align*}
and define
\begin{align*}
\rho_t:=(S_t)_\#\pi.
\end{align*}
For every $f\in C_b(\mathbb R^n)$, consider the map
\begin{align*}
[0,1]\to\mathbb R,\qquad t\mapsto \int_{\mathbb R^n}f(z)\,d\rho_t(z)=\int_{\mathbb R^n\times\mathbb R^n}f((1-t)x+ty)\,d\pi(x,y).
\end{align*}
For each fixed $(x,y)$ the integrand is continuous in $t$ and bounded by $\|f\|_\infty$, so dominated convergence shows that this map is continuous. Hence $t\mapsto\rho_t$ is narrowly continuous. Also $\rho_{t=0}=\mu_0$ and $\rho_{t=1}=\mu_1$, because $S_0(x,y)=x$, $S_1(x,y)=y$, and $\pi$ has marginals $\mu_0$ and $\mu_1$.
For every $t\in[0,1]$, the measure $\rho_t$ has finite second moment. Indeed, by the definition of pushforward and the inequality $|(1-t)x+ty|^2\le 2|x|^2+2|y|^2$,
\begin{align*}
\int_{\mathbb R^n}|z|^2\,d\rho_t(z)\le 2\int_{\mathbb R^n}|x|^2\,d\mu_0(x)+2\int_{\mathbb R^n}|y|^2\,d\mu_1(y)<\infty.
\end{align*}
Thus $\rho_t\in\mathcal P_2(\mathbb R^n)$ for every $t\in[0,1]$.
Define the Borel map
\begin{align*}
T:(0,1)\times\mathbb R^n\times\mathbb R^n \to (0,1)\times\mathbb R^n,\qquad (t,x,y)\mapsto (t,(1-t)x+ty).
\end{align*}
Let
\begin{align*}
\eta:=\mathcal L^1\!\restriction_{(0,1)}\otimes\pi
\end{align*}
be the product measure on $(0,1)\times\mathbb R^n\times\mathbb R^n$, and let
\begin{align*}
\sigma:=T_\#\eta.
\end{align*}
Then $\sigma$ is the measure on $(0,1)\times\mathbb R^n$ given by
\begin{align*}
d\sigma(t,z)=d\rho_t(z)\,d\mathcal L^1(t).
\end{align*}
By the disintegration theorem, applied to the Borel map $T$ between standard Borel spaces, there exists a $\sigma$-a.e. uniquely determined family $(\eta_{t,z})_{(t,z)\in(0,1)\times\mathbb R^n}$ of probability measures on $(0,1)\times\mathbb R^n\times\mathbb R^n$ concentrated on $T^{-1}(\{(t,z)\})$ such that, for every nonnegative Borel function $F:(0,1)\times\mathbb R^n\times\mathbb R^n\to[0,\infty]$,
\begin{align*}
\int F\,d\eta=\int_{(0,1)\times\mathbb R^n}\int F(s,x,y)\,d\eta_{t,z}(s,x,y)\,d\sigma(t,z).
\end{align*}
The cited disintegration result is not yet resolved to a wiki theorem ID: Disintegration theorem.
Define a Borel vector field
\begin{align*}
v:(0,1)\times\mathbb R^n \to \mathbb R^n,\qquad (t,z)\mapsto \int_{(0,1)\times\mathbb R^n\times\mathbb R^n}(y-x)\,d\eta_{t,z}(s,x,y)
\end{align*}
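on a full $\sigma$-measure set, and define it arbitrarily on the remaining $\sigma$-null set. The integral is finite for $\sigma$-a.e. $(t,z)$ because $|y-x|\in L^1(\eta)$. Jensen's inequality for the convex map $a\mapsto |a|^2$, applied to each probability measure $\eta_{t,z}$ and then integrated against $\sigma$ by means of the disintegration identity, gives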
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\rho_t(z)\,d\mathcal L^1(t)\le \int_0^1\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)\,d\mathcal L^1(t).
\end{align*}
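Since $\mathcal L^1((0,1))=1$ and $\pi$ is optimal, the right-hand side equals $W_2^2(\mu_0,\mu_1)$. Therefore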
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\rho_t(z)\,d\mathcal L^1(t)\le W_2^2(\mu_0,\mu_1).
\end{align*}
[guided]
The straight-line interpolation is simple at the level of particles: a particle starting at $x$ and ending at $y$ follows the path $t\mapsto (1-t)x+ty$. The only subtlety is that the theorem asks for an Eulerian velocity $v_t(z)$ depending only on the current position $z$, not on the hidden pair $(x,y)$. If several pairs $(x,y)$ arrive at the same point $z$ at the same time $t$, the correct Eulerian velocity is the conditional average of $y-x$ over those pairs.
By the existence theorem for optimal transport plans with lower semicontinuous costs, there exists $\pi\in\Pi(\mu_0,\mu_1)$ minimizing the quadratic cost. Its hypotheses apply because the cost
\begin{align*}
c:\mathbb R^n\times\mathbb R^n&\to[0,\infty),\\
(x,y)&\mapsto |x-y|^2
\end{align*}
is continuous and hence lower semicontinuous, while $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$ imply finite admissible cost:
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d(\mu_0\otimes\mu_1)(x,y)<\infty.
\end{align*}
Thus we choose $\pi\in\Pi(\mu_0,\mu_1)$ such that
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y)=W_2^2(\mu_0,\mu_1).
\end{align*}
The cited existence result is not yet resolved to a wiki theorem ID: Existence of optimal transport plans for lower semicontinuous costs.
For each $t\in[0,1]$, define
\begin{align*}
S_t:\mathbb R^n\times\mathbb R^n&\to\mathbb R^n,\\
(x,y)&\mapsto (1-t)x+ty,
\end{align*}
and set $\rho_t:=(S_t)_\#\pi$. This means that, for every Borel set $A\in\mathcal B(\mathbb R^n)$,
\begin{align*}
\rho_t(A)=\pi(S_t^{-1}(A)).
\end{align*}
The endpoint conditions follow from $S_0(x,y)=x$ and $S_1(x,y)=y$, together with the marginal conditions on $\pi$:
\begin{align*}
\rho_{t=0}=\mu_0,\qquad \rho_{t=1}=\mu_1.
\end{align*}
For every $t\in[0,1]$, the measure $\rho_t$ has finite second moment because
\begin{align*}
\int_{\mathbb R^n}|z|^2\,d\rho_t(z)=\int_{\mathbb R^n\times\mathbb R^n}|(1-t)x+ty|^2\,d\pi(x,y)
\end{align*}
and $|(1-t)x+ty|^2\le 2|x|^2+2|y|^2$. Since the marginals of $\pi$ are $\mu_0$ and $\mu_1$, this gives
\begin{align*}
\int_{\mathbb R^n}|z|^2\,d\rho_t(z)\le 2\int_{\mathbb R^n}|x|^2\,d\mu_0(x)+2\int_{\mathbb R^n}|y|^2\,d\mu_1(y)<\infty.
\end{align*}
Thus $\rho_t\in\mathcal P_2(\mathbb R^n)$, as required for admissibility.
To prove narrow continuity, let $f\in C_b(\mathbb R^n)$. Then
\begin{align*}
\int_{\mathbb R^n}f(z)\,d\rho_t(z)=\int_{\mathbb R^n\times\mathbb R^n}f((1-t)x+ty)\,d\pi(x,y).
\end{align*}
For fixed $(x,y)$, the integrand is continuous in $t$, and it is bounded by $\|f\|_\infty$. Dominated convergence with respect to the probability measure $\pi$ gives continuity of $t\mapsto\int f\,d\rho_t$, which is exactly narrow continuity.
Now define the map that records both time and current position:
\begin{align*}
T:(0,1)\times\mathbb R^n\times\mathbb R^n&\to(0,1)\times\mathbb R^n,\\
(t,x,y)&\mapsto (t,(1-t)x+ty).
\end{align*}
Let
\begin{align*}
\eta:=\mathcal L^1\!\restriction_{(0,1)}\otimes\pi.
\end{align*}
The pushforward $\sigma:=T_\#\eta$ is precisely the space-time measure $d\rho_t(z)\,d\mathcal L^1(t)$, because for every nonnegative Borel function $\Psi:(0,1)\times\mathbb R^n\to[0,\infty]$,
\begin{align*}
\int\Psi(t,z)\,d\sigma(t,z)=\int_0^1\int_{\mathbb R^n\times\mathbb R^n}\Psi(t,(1-t)x+ty)\,d\pi(x,y)\,d\mathcal L^1(t).
\end{align*}
By the definition of $\rho_t$, the last expression equals
\begin{align*}
\int_0^1\int_{\mathbb R^n}\Psi(t,z)\,d\rho_t(z)\,d\mathcal L^1(t).
\end{align*}
The disintegration theorem now supplies conditional probability measures $\eta_{t,z}$ over each fiber $T^{-1}(\{(t,z)\})$. We use them to average the microscopic velocity $y-x$. Define
\begin{align*}
v:(0,1)\times\mathbb R^n&\to\mathbb R^n,\\
(t,z)&\mapsto \int_{(0,1)\times\mathbb R^n\times\mathbb R^n}(y-x)\,d\eta_{t,z}(s,x,y)
\end{align*}
on a full $\sigma$-measure set, and define it arbitrarily outside that set. This is the barycentric projection of the particle velocities; the variable $s$ is the time coordinate in the fiber measure, and the fiber condition forces $s=t$ for $\eta_{t,z}$-a.e. triple.
The action estimate is exactly Jensen's inequality. Since $a\mapsto |a|^2$ is convex,
\begin{align*}
|v_t(z)|^2\le \int_{(0,1)\times\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\eta_{t,z}(s,x,y)
\end{align*}
for $\sigma$-a.e. $(t,z)$. Integrating with respect to $d\sigma(t,z)=d\rho_t(z)\,d\mathcal L^1(t)$ gives
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\rho_t(z)\,d\mathcal L^1(t)\le \int_0^1\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)\,d\mathcal L^1(t).
\end{align*}
Since $\mathcal L^1((0,1))=1$ and $\pi$ is optimal, the right-hand side equals $W_2^2(\mu_0,\mu_1)$. Hence the constructed dynamic competitor has action at most the static optimal cost.
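The barycentric projection and the Jensen step do not use optimality of the plan until the last line, so they can be illustrated with a non-optimal plan where averaging genuinely matters. The following is a sketch under assumptions that are not part of the proof: $n=1$, both $\mu_0$ and $\mu_1$ are the standard Gaussian $N(0,1)$, and the product plan $m=\mu_0\otimes\mu_1$ is used in place of $\pi$. Under $m$ the coordinates $x$ and $y$ are independent standard Gaussians, so $(1-t)x+ty$ is centered Gaussian with variance
\begin{align*}
D(t):=(1-t)^2+t^2,\qquad\text{that is,}\qquad \rho_t=N(0,D(t)).
\end{align*}
For jointly Gaussian variables the conditional mean is linear, and $\operatorname{Cov}(y-x,(1-t)x+ty)=2t-1$, so the conditional average of $y-x$ given the position $z$ at time $t$ is
\begin{align*}
v_t(z)=\frac{2t-1}{D(t)}\,z.
\end{align*}
Using $(2t-1)^2=2D(t)-1$ and the substitution $u=2t-1$, the action of this Eulerian field is
\begin{align*}
\int_0^1\int_{\mathbb R}|v_t(z)|^2\,d\rho_t(z)\,d\mathcal L^1(t)=\int_0^1\frac{(2t-1)^2}{D(t)}\,d\mathcal L^1(t)=2-\int_{-1}^{1}\frac{du}{1+u^2}=2-2\arctan(1)\approx 0.43,
\end{align*}
whereas the particle-level quantity is $\int_0^1\int_{\mathbb R\times\mathbb R}|y-x|^2\,dm(x,y)\,d\mathcal L^1(t)=2$. In this example Jensen's inequality is strict, and the resulting action still lies above $W_2^2(\mu_0,\mu_1)=0$, as it must.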
[/guided]
[/step]
[step:Verify the continuity equation for the straight-line interpolation]
Let $\phi\in C_c^\infty((0,1)\times\mathbb R^n)$. For every $(x,y)\in\mathbb R^n\times\mathbb R^n$, define
\begin{align*}
g_{x,y}:(0,1) \to \mathbb R,\qquad t\mapsto \phi(t,(1-t)x+ty).
\end{align*}
By the chain rule,
\begin{align*}
g_{x,y}'(t)=\partial_t\phi(t,(1-t)x+ty)+\nabla_x\phi(t,(1-t)x+ty)\cdot(y-x).
\end{align*}
Since $\phi$ has compact support in $(0,1)\times\mathbb R^n$, the function $g_{x,y}$ has compact support in $(0,1)$ for every $(x,y)$, and therefore
\begin{align*}
\int_0^1 g_{x,y}'(t)\,d\mathcal L^1(t)=0.
\end{align*}
To justify the use of Fubini's theorem below, let $M_0:=\|\partial_t\phi\|_\infty$ and $M_1:=\|\nabla_x\phi\|_\infty$. The function $(t,x,y)\mapsto g_{x,y}'(t)$ is bounded in absolute value by $M_0+M_1|y-x|$, and $|y-x|$ is $\pi$-integrable because $\pi$ is a probability measure with finite quadratic cost. Hence this function belongs to $L^1((0,1)\times\mathbb R^n\times\mathbb R^n,\eta)$.
Integrating the identity $\int_0^1 g_{x,y}'(t)\,d\mathcal L^1(t)=0$ with respect to $\pi$ and using Fubini's theorem gives
\begin{align*}
\int_0^1\int_{\mathbb R^n\times\mathbb R^n}\left(\partial_t\phi(t,(1-t)x+ty)+\nabla_x\phi(t,(1-t)x+ty)\cdot(y-x)\right)\,d\pi(x,y)\,d\mathcal L^1(t)=0.
\end{align*}
Using the definition of $\rho_t$ and the definition of $v$ as the conditional expectation of $y-x$ given $(t,(1-t)x+ty)$, this becomes
\begin{align*}
\int_0^1\int_{\mathbb R^n}\left(\partial_t\phi(t,z)+\nabla_x\phi(t,z)\cdot v_t(z)\right)\,d\rho_t(z)\,d\mathcal L^1(t)=0.
\end{align*}
Thus $(\rho,v)$ is admissible, and consequently
\begin{align*}
\inf_{(\rho,v)}\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t)\le W_2^2(\mu_0,\mu_1).
\end{align*}
[/step]
[step:Represent an arbitrary dynamic competitor by particle paths]
Let $(\rho,v)$ be any admissible pair. Since $\rho_t$ is a probability measure for each $t\in[0,1]$ and the kinetic action is finite, Cauchy-Schwarz gives
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(x)|\,d\rho_t(x)\,d\mathcal L^1(t)\le \left(\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t)\right)^{1/2}<\infty.
\end{align*}
Moreover,
\begin{align*}
\int_0^1\int_{\mathbb R^n}\frac{|v_t(x)|}{1+|x|}\,d\rho_t(x)\,d\mathcal L^1(t)\le \int_0^1\int_{\mathbb R^n}|v_t(x)|\,d\rho_t(x)\,d\mathcal L^1(t)<\infty.
\end{align*}
Together with the admissibility assumptions that $v$ is Borel and that $(\rho,v)$ solves the continuity equation in the distributional sense, this verifies the hypotheses of the superposition principle for continuity equations in the finite-mass Euclidean setting. By that principle, there exists a Borel probability measure $\Theta$ on $C([0,1];\mathbb R^n)$ concentrated on absolutely continuous curves $\gamma:[0,1]\to\mathbb R^n$ such that:
\begin{align*}
(e_t)_\#\Theta=\rho_t
\end{align*}
for every $t\in[0,1]$, where
\begin{align*}
e_t:C([0,1];\mathbb R^n)&\to\mathbb R^n,\\
\gamma&\mapsto \gamma(t),
\end{align*}
and such that, for $\Theta$-a.e. curve $\gamma$,
\begin{align*}
\dot\gamma(t)=v_t(\gamma(t))
\end{align*}
for $\mathcal L^1$-a.e. $t\in(0,1)$. The cited result is not yet resolved to a wiki theorem ID: Superposition principle for continuity equations.
Define the endpoint map
\begin{align*}
E:C([0,1];\mathbb R^n)&\to\mathbb R^n\times\mathbb R^n,\\
\gamma&\mapsto (\gamma(0),\gamma(1)),
\end{align*}
and set
\begin{align*}
\pi_\Theta:=E_\#\Theta.
\end{align*}
Since $(e_0)_\#\Theta=\rho_{t=0}=\mu_0$ and $(e_1)_\#\Theta=\rho_{t=1}=\mu_1$, the measure $\pi_\Theta$ belongs to $\Pi(\mu_0,\mu_1)$.
[/step]
[step:Bound the endpoint cost by the kinetic action]
For $\Theta$-a.e. absolutely continuous curve $\gamma:[0,1]\to\mathbb R^n$, the fundamental theorem of calculus for absolutely continuous curves gives
\begin{align*}
\gamma(1)-\gamma(0)=\int_0^1 \dot\gamma(t)\,d\mathcal L^1(t).
\end{align*}
Taking Euclidean norms, using the triangle inequality for integrals, and then applying Cauchy-Schwarz on the probability measure space $((0,1),\mathcal B((0,1)),\mathcal L^1\!\restriction_{(0,1)})$, we obtain
\begin{align*}
|\gamma(1)-\gamma(0)|^2\le \left(\int_0^1|\dot\gamma(t)|\,d\mathcal L^1(t)\right)^2\le \int_0^1|\dot\gamma(t)|^2\,d\mathcal L^1(t).
\end{align*}
Integrating with respect to $\Theta$, using $\pi_\Theta=E_\#\Theta$ on the left-hand side and the superposition identity $\dot\gamma(t)=v_t(\gamma(t))$ on the right-hand side, gives
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_\Theta(x,y)\le \int_{C([0,1];\mathbb R^n)}\int_0^1|v_t(\gamma(t))|^2\,d\mathcal L^1(t)\,d\Theta(\gamma).
\end{align*}
By Fubini's theorem for nonnegative integrands (Tonelli) and $(e_t)_\#\Theta=\rho_t$,
\begin{align*}
\int_{C([0,1];\mathbb R^n)}\int_0^1|v_t(\gamma(t))|^2\,d\mathcal L^1(t)\,d\Theta(\gamma)=\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t).
\end{align*}
Since $\pi_\Theta\in\Pi(\mu_0,\mu_1)$, the definition of $W_2^2$ gives
\begin{align*}
W_2^2(\mu_0,\mu_1)\le \int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_\Theta(x,y)\le \int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t).
\end{align*}
Taking the infimum over all admissible pairs $(\rho,v)$ yields
\begin{align*}
W_2^2(\mu_0,\mu_1)\le \inf_{(\rho,v)}\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t).
\end{align*}
Together with the previous inequality, this proves the Benamou-Brenier formula.
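The Cauchy-Schwarz step can be seen at work in a minimal example; the points $a,b$ and the time change $\theta$ below are assumptions introduced only for this illustration. Let $\mu_0=\delta_a$ and $\mu_1=\delta_b$ with $a,b\in\mathbb R^n$, so that the only coupling is $\delta_{(a,b)}$ and $W_2^2(\mu_0,\mu_1)=|a-b|^2$. Take $\theta\in C^1([0,1])$ with $\theta(0)=0$ and $\theta(1)=1$, and set
\begin{align*}
\gamma(t):=a+\theta(t)(b-a),\qquad \rho_t:=\delta_{\gamma(t)},\qquad v_t(x):=\theta'(t)(b-a).
\end{align*}
This pair is admissible: for $\phi\in C_c^\infty((0,1)\times\mathbb R^n)$ the weak form reduces to $\int_0^1\frac{d}{dt}\bigl[\phi(t,\gamma(t))\bigr]\,d\mathcal L^1(t)=0$. Its action satisfies
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\rho_t(x)\,d\mathcal L^1(t)=|b-a|^2\int_0^1\theta'(t)^2\,d\mathcal L^1(t)\ge |b-a|^2\left(\int_0^1\theta'(t)\,d\mathcal L^1(t)\right)^2=|b-a|^2,
\end{align*}
with equality exactly when $\theta'$ is constant, that is, $\theta(t)=t$. For instance, $\theta(t)=t^2$ gives action $\tfrac43|b-a|^2$, so a non-uniform parametrization of the same segment is strictly more expensive than the constant-speed one.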
[/step]
[step:Prove existence and uniqueness of the entropic minimizer]
Fix $\varepsilon>0$. Define
\begin{align*}
\mathcal J_\varepsilon:\Pi(\mu_0,\mu_1)&\to(-\infty,+\infty],\\
\pi&\mapsto \int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y)+\varepsilon H(\pi\mid m).
\end{align*}
The coupling $m=\mu_0\otimes\mu_1$ belongs to $\Pi(\mu_0,\mu_1)$, satisfies $H(m\mid m)=0$, and has finite quadratic cost because $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$. Hence $\inf_{\Pi(\mu_0,\mu_1)}\mathcal J_\varepsilon<\infty$.
The set $\Pi(\mu_0,\mu_1)$ is narrowly compact. Indeed, fixed tight marginals imply tightness of the family of couplings, and the marginal constraints are closed under narrow convergence because bounded continuous test functions depending only on one coordinate pass to the limit. The quadratic cost term is narrowly lower semicontinuous because $(x,y)\mapsto |x-y|^2$ is nonnegative and lower semicontinuous. The relative entropy $\pi\mapsto H(\pi\mid m)$ is narrowly lower semicontinuous on probability measures with fixed probability reference measure $m$; this cited result is not yet resolved to a wiki theorem ID: lower semicontinuity of relative entropy. Therefore $\mathcal J_\varepsilon$ is narrowly lower semicontinuous on the narrowly compact set $\Pi(\mu_0,\mu_1)$, and the direct method gives a minimizer $\pi_\varepsilon\in\Pi(\mu_0,\mu_1)$.
For uniqueness, let $\pi_a,\pi_b\in\Pi(\mu_0,\mu_1)$ be two minimizers. Since $\mathcal J_\varepsilon(\pi_a)$ and $\mathcal J_\varepsilon(\pi_b)$ are finite, both measures are absolutely continuous with respect to $m$. Let
\begin{align*}
f_a:=\frac{d\pi_a}{dm},\qquad f_b:=\frac{d\pi_b}{dm}.
\end{align*}
For $\theta\in(0,1)$, define
\begin{align*}
\pi_\theta:=\theta\pi_a+(1-\theta)\pi_b.
\end{align*}
Then $\pi_\theta\in\Pi(\mu_0,\mu_1)$ and $d\pi_\theta/dm=\theta f_a+(1-\theta)f_b$. The quadratic cost is affine in $\pi$, while the function
\begin{align*}
r:[0,\infty) \to \mathbb R,\qquad s\mapsto s\log s
\end{align*}
with $0\log0:=0$ is strictly convex on $[0,\infty)$. Hence, if $f_a\ne f_b$ on a set of positive $m$-measure,
\begin{align*}
H(\pi_\theta\mid m)<\theta H(\pi_a\mid m)+(1-\theta)H(\pi_b\mid m),
\end{align*}
and therefore
\begin{align*}
\mathcal J_\varepsilon(\pi_\theta)<\theta\mathcal J_\varepsilon(\pi_a)+(1-\theta)\mathcal J_\varepsilon(\pi_b).
\end{align*}
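Since $\pi_a$ and $\pi_b$ are both minimizers, the right-hand side equals the minimum value $E_\varepsilon(\mu_0,\mu_1)$, so $\pi_\theta\in\Pi(\mu_0,\mu_1)$ would attain a strictly smaller value, which is impossible. Thus $f_a=f_b$ $m$-a.e., so $\pi_a=\pi_b$. The entropic minimizer is unique.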
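A two-point example, offered as an illustrative sketch with notation ($q$, $\pi_q$, $q_\varepsilon$) that is not part of the proof, makes the strict convexity explicit. Take $n=1$ and $\mu_0=\mu_1=\tfrac12\delta_0+\tfrac12\delta_1$. Every coupling has the form
\begin{align*}
\pi_q:=\tfrac q2\left(\delta_{(0,0)}+\delta_{(1,1)}\right)+\tfrac{1-q}2\left(\delta_{(0,1)}+\delta_{(1,0)}\right),\qquad q\in[0,1],
\end{align*}
and $m=\pi_{1/2}$. The density $d\pi_q/dm$ equals $2q$ on the two diagonal atoms and $2(1-q)$ on the two off-diagonal atoms, so
\begin{align*}
\mathcal J_\varepsilon(\pi_q)=(1-q)+\varepsilon\bigl(\log 2+q\log q+(1-q)\log(1-q)\bigr).
\end{align*}
The second derivative in $q$ is $\varepsilon\bigl(\tfrac1q+\tfrac1{1-q}\bigr)>0$ on $(0,1)$, and the first-order condition $\log\tfrac{q}{1-q}=\tfrac1\varepsilon$ has the single solution
\begin{align*}
q_\varepsilon=\frac{1}{1+e^{-1/\varepsilon}},\qquad\text{with}\qquad \mathcal J_\varepsilon(\pi_{q_\varepsilon})=\varepsilon\log\frac{2}{1+e^{-1/\varepsilon}}.
\end{align*}
Thus the minimizer is unique for each $\varepsilon>0$, and $q_\varepsilon\to1$ as $\varepsilon\downarrow0$, which is the diagonal coupling of zero cost.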
[/step]
[step:Pass to the zero-temperature limit using finite-entropy recovery couplings]
For every $\varepsilon>0$ and every $\pi\in\Pi(\mu_0,\mu_1)$, the relative entropy satisfies $H(\pi\mid m)\ge 0$. Therefore
\begin{align*}
\mathcal J_\varepsilon(\pi)\ge \int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y)\ge W_2^2(\mu_0,\mu_1).
\end{align*}
Taking the infimum over $\pi\in\Pi(\mu_0,\mu_1)$ gives
\begin{align*}
E_\varepsilon(\mu_0,\mu_1)\ge W_2^2(\mu_0,\mu_1)
\end{align*}
for every $\varepsilon>0$, and hence
\begin{align*}
\liminf_{\varepsilon\downarrow0}E_\varepsilon(\mu_0,\mu_1)\ge W_2^2(\mu_0,\mu_1).
\end{align*}
It remains to prove the upper bound. Let $\delta>0$. By the assumed finite-entropy approximation property, choose $k\in\mathbb N$ such that
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_k(x,y)\le W_2^2(\mu_0,\mu_1)+\delta.
\end{align*}
Since $H(\pi_k\mid m)<\infty$, choose $\varepsilon_\delta>0$ such that, whenever $0<\varepsilon<\varepsilon_\delta$,
\begin{align*}
\varepsilon H(\pi_k\mid m)\le \delta.
\end{align*}
Then, for every $0<\varepsilon<\varepsilon_\delta$, using $\pi_k$ as an admissible competitor in the entropic problem gives
\begin{align*}
E_\varepsilon(\mu_0,\mu_1)\le \int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi_k(x,y)+\varepsilon H(\pi_k\mid m)\le W_2^2(\mu_0,\mu_1)+2\delta.
\end{align*}
Taking $\limsup_{\varepsilon\downarrow0}$ and then letting $\delta\downarrow0$ gives
\begin{align*}
\limsup_{\varepsilon\downarrow0}E_\varepsilon(\mu_0,\mu_1)\le W_2^2(\mu_0,\mu_1).
\end{align*}
Combining the lower and upper bounds proves
\begin{align*}
\lim_{\varepsilon\downarrow0}E_\varepsilon(\mu_0,\mu_1)=W_2^2(\mu_0,\mu_1).
\end{align*}
This completes the proof.
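To see the recovery argument in a case where the optimal plan itself has infinite entropy, consider the following sketch; the block couplings below are an assumption made for illustration and are only one possible choice of approximating sequence. Let $n=1$ and let $\mu_0=\mu_1$ be Lebesgue measure on $[0,1]$. Then $W_2^2(\mu_0,\mu_1)=0$, and any coupling of zero cost is concentrated on the diagonal, which is $m$-null, so it has infinite relative entropy. For $k\in\mathbb N$ let $I_j:=[(j-1)/k,j/k)$ and define $\pi_k\ll m$ by
\begin{align*}
\frac{d\pi_k}{dm}(x,y):=k\sum_{j=1}^k\mathbf 1_{I_j}(x)\,\mathbf 1_{I_j}(y).
\end{align*}
Each $\pi_k$ has both marginals equal to Lebesgue measure on $[0,1]$, and
\begin{align*}
H(\pi_k\mid m)=\log k,\qquad \int_{\mathbb R\times\mathbb R}|x-y|^2\,d\pi_k(x,y)=\frac{1}{6k^2},
\end{align*}
the second identity being twice the variance of a uniform variable on an interval of length $1/k$. Using $\pi_k$ as a competitor gives, for every $k\in\mathbb N$ and every $\varepsilon>0$,
\begin{align*}
0\le E_\varepsilon(\mu_0,\mu_1)\le \frac{1}{6k^2}+\varepsilon\log k,
\end{align*}
so $\limsup_{\varepsilon\downarrow0}E_\varepsilon(\mu_0,\mu_1)\le\frac{1}{6k^2}$ for every $k$, and letting $k\to\infty$ recovers $\lim_{\varepsilon\downarrow0}E_\varepsilon(\mu_0,\mu_1)=0=W_2^2(\mu_0,\mu_1)$.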
[/step]