[proofplan]
We prove the two inequalities separately. For the upper bound, choose an optimal coupling of $\mu_0$ and $\mu_1$, move each transported particle along the constant-speed segment joining its initial and final positions, and then pass from this Lagrangian description to an Eulerian velocity by conditional expectation. For the lower bound, apply the superposition principle for solutions of the continuity equation to represent any admissible Eulerian pair by a probability measure on absolutely continuous paths, then compare endpoint displacement with path energy using Cauchy-Schwarz. The constructed upper-bound curve and the lower bound together give equality and attainment.
[/proofplan]
[step:Build a constant-speed path measure from an optimal coupling]
By the existence theorem for optimal couplings in $\mathcal P_2(\mathbb R^n)$, there is a coupling $\pi\in\mathcal P(\mathbb R^n\times\mathbb R^n)$ of $\mu_0$ and $\mu_1$ such that
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)=W_2^2(\mu_0,\mu_1).
\end{align*}
This cited existence theorem is a standard prerequisite not yet verified here by an internal theorem link.
Define the path map
\begin{align*}
\Gamma:\mathbb R^n\times\mathbb R^n&\to C([0,1];\mathbb R^n)
\end{align*}
\begin{align*}
(x,y)&\mapsto \Gamma(x,y),
\end{align*}
where
\begin{align*}
\Gamma(x,y)(t)=(1-t)x+ty
\end{align*}
for $t\in[0,1]$. Let $\eta:=\Gamma_\#\pi\in\mathcal P(C([0,1];\mathbb R^n))$. For each $t\in[0,1]$, let
\begin{align*}
e_t:C([0,1];\mathbb R^n)&\to\mathbb R^n
\end{align*}
\begin{align*}
\gamma&\mapsto \gamma(t)
\end{align*}
be the evaluation map, and define
\begin{align*}
\mu_t:=(e_t)_\#\eta.
\end{align*}
Since $e_0(\Gamma(x,y))=x$ and $e_1(\Gamma(x,y))=y$, the endpoint measures are $\mu_{t=0}=\mu_0$ and $\mu_{t=1}=\mu_1$.
For every bounded continuous function $f:\mathbb R^n\to\mathbb R$, the map
\begin{align*}
t\mapsto \int_{\mathbb R^n}f(z)\,d\mu_t(z)
=
\int_{\mathbb R^n\times\mathbb R^n}f((1-t)x+ty)\,d\pi(x,y)
\end{align*}
is continuous by dominated convergence, because $f$ is bounded and $(t,x,y)\mapsto f((1-t)x+ty)$ is continuous in $t$ for every $(x,y)$. Hence $t\mapsto\mu_t$ is narrowly continuous.
[/step]
[step:Define the Eulerian velocity by conditional expectation]
Define the displacement map
\begin{align*}
\xi:\mathbb R^n\times\mathbb R^n&\to\mathbb R^n
\end{align*}
\begin{align*}
(x,y)&\mapsto y-x.
\end{align*}
Define the space-time interpolation map
\begin{align*}
F:[0,1]\times\mathbb R^n\times\mathbb R^n&\to[0,1]\times\mathbb R^n
\end{align*}
\begin{align*}
(t,x,y)&\mapsto (t,(1-t)x+ty).
\end{align*}
Let $\Lambda:=\mathcal L^1\big|_{[0,1]}\otimes\pi$. Since
\begin{align*}
\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|\xi(x,y)|^2\,d\Lambda(t,x,y)
=
W_2^2(\mu_0,\mu_1)<\infty,
\end{align*}
the conditional expectation theorem applied to the square-integrable random vector $\xi$ with respect to the $\sigma$-algebra generated by $F$ gives a Borel map
\begin{align*}
v:[0,1]\times\mathbb R^n&\to\mathbb R^n
\end{align*}
\begin{align*}
(t,z)&\mapsto v_t(z)
\end{align*}
such that
\begin{align*}
v(F(t,x,y))=\mathbb E_\Lambda[\xi\mid F](t,x,y)
\end{align*}
for $\Lambda$-a.e. $(t,x,y)$. The equality is understood up to the measure $F_\#\Lambda$, which is precisely $d\mu_t(z)\,d\mathcal L^1(t)$.
By Jensen's inequality for conditional expectation,
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y).
\end{align*}
Thus
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
W_2^2(\mu_0,\mu_1).
\end{align*}
[guided]
The Lagrangian velocity along the segment from $x$ to $y$ is the vector $y-x$. The Eulerian velocity, however, must be a function of the current space-time point $(t,z)$, not of the hidden labels $(x,y)$. Therefore we average the Lagrangian velocity over all particles that occupy the same point $z$ at time $t$.
Formally, define
\begin{align*}
\xi:\mathbb R^n\times\mathbb R^n&\to\mathbb R^n
\end{align*}
\begin{align*}
(x,y)&\mapsto y-x
\end{align*}
and
\begin{align*}
F:[0,1]\times\mathbb R^n\times\mathbb R^n&\to[0,1]\times\mathbb R^n
\end{align*}
\begin{align*}
(t,x,y)&\mapsto (t,(1-t)x+ty).
\end{align*}
Let $\Lambda:=\mathcal L^1\big|_{[0,1]}\otimes\pi$. The map $F$ records the observable space-time position of a particle, while $\xi$ records its actual constant velocity. Since $\pi$ is an optimal coupling of two measures in $\mathcal P_2(\mathbb R^n)$, the quadratic transport cost is finite, and hence
\begin{align*}
\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|\xi(x,y)|^2\,d\Lambda(t,x,y)
=
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)
<\infty.
\end{align*}
Thus $\xi$ is square-integrable with respect to $\Lambda$.
We now use the disintegration and conditional expectation theorem for the measurable map $F$. It gives a Borel map
\begin{align*}
v:[0,1]\times\mathbb R^n&\to\mathbb R^n
\end{align*}
\begin{align*}
(t,z)&\mapsto v_t(z)
\end{align*}
such that $v(F(t,x,y))$ is the conditional expectation of $\xi(x,y)$ given the value of $F(t,x,y)$. Equivalently, $v_t(z)$ is the average displacement $y-x$ among all particles that are at $z$ at time $t$. This definition is meaningful only up to the measure $F_\#\Lambda=d\mu_t(z)\,d\mathcal L^1(t)$, which is exactly the null-set convention needed for the action integral and the weak continuity equation.
The crucial estimate is Jensen's inequality for conditional expectation. Since the function $a\mapsto |a|^2$ is convex on $\mathbb R^n$,
\begin{align*}
|v(F(t,x,y))|^2
=
|\mathbb E_\Lambda[\xi\mid F](t,x,y)|^2
\le
\mathbb E_\Lambda[|\xi|^2\mid F](t,x,y)
\end{align*}
in the conditional-expectation sense. Integrating this inequality with respect to $\Lambda$ and pushing forward by $F$ gives
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
\int_{[0,1]\times\mathbb R^n\times\mathbb R^n}|\xi(x,y)|^2\,d\Lambda(t,x,y).
\end{align*}
Substituting the definition of $\Lambda$ and $\xi$, we obtain
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)
=
W_2^2(\mu_0,\mu_1).
\end{align*}
This is the action bound required for the upper inequality.
[/guided]
[/step]
[step:Verify the continuity equation for the constructed pair]
Let $\phi\in C_c^\infty((0,1)\times\mathbb R^n)$. For every $(x,y)\in\mathbb R^n\times\mathbb R^n$, the map
\begin{align*}
h_{x,y}:[0,1]&\to\mathbb R
\end{align*}
\begin{align*}
t&\mapsto \phi(t,(1-t)x+ty)
\end{align*}
is continuously differentiable and compactly supported in $(0,1)$. Therefore the fundamental theorem of calculus gives
\begin{align*}
\int_0^1
\left[
\partial_t\phi(t,(1-t)x+ty)
+
\nabla_x\phi(t,(1-t)x+ty)\cdot(y-x)
\right]
\,d\mathcal L^1(t)=0.
\end{align*}
Integrating this identity with respect to $\pi$ and using Fubini's theorem, whose integrability hypothesis holds because $\phi$ and $\nabla_x\phi$ are bounded with compact support and $\xi\in L^2(\pi;\mathbb R^n)$, yields
\begin{align*}
\int_0^1
\int_{\mathbb R^n\times\mathbb R^n}
\left[
\partial_t\phi(t,(1-t)x+ty)
+
\nabla_x\phi(t,(1-t)x+ty)\cdot(y-x)
\right]
\,d\pi(x,y)\,d\mathcal L^1(t)=0.
\end{align*}
The first term is the pushforward identity for $\mu_t$. The second term is the defining property of the conditional expectation $v_t(z)$ with test vector $\nabla_x\phi(t,z)$. Hence
\begin{align*}
\int_0^1\int_{\mathbb R^n}
\left(\partial_t\phi(t,z)+\nabla_x\phi(t,z)\cdot v_t(z)\right)
\,d\mu_t(z)\,d\mathcal L^1(t)=0.
\end{align*}
Thus $(\mu,v)$ is admissible, and the preceding action estimate gives
\begin{align*}
\inf_{(\mu,v)}
\int_0^1\int_{\mathbb R^n}|v_t(z)|^2\,d\mu_t(z)\,d\mathcal L^1(t)
\le
W_2^2(\mu_0,\mu_1).
\end{align*}
[/step]
[step:Represent an arbitrary admissible pair by absolutely continuous paths]
Let $(\mu,v)$ be any admissible pair. The superposition principle for the continuity equation applies because $t\mapsto\mu_t$ is narrowly continuous, $v$ is Borel measurable, and
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\mu_t(x)\,d\mathcal L^1(t)<\infty.
\end{align*}
This superposition principle is a standard prerequisite not yet verified here by an internal theorem link. It yields a probability measure $\eta\in\mathcal P(C([0,1];\mathbb R^n))$ concentrated on absolutely continuous paths $\gamma:[0,1]\to\mathbb R^n$ such that
\begin{align*}
(e_t)_\#\eta=\mu_t
\end{align*}
for every $t\in[0,1]$, and such that the metric derivative of $\gamma$ is represented by the vector field $v$ along the path in the integrated sense
\begin{align*}
\int_{C([0,1];\mathbb R^n)}
\int_0^1|\dot\gamma(t)|^2\,d\mathcal L^1(t)\,d\eta(\gamma)
\le
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\mu_t(x)\,d\mathcal L^1(t).
\end{align*}
Here $\dot\gamma(t)$ denotes the classical derivative, defined for $\mathcal L^1$-a.e. $t$ along each absolutely continuous path.
Define the endpoint map
\begin{align*}
E:C([0,1];\mathbb R^n)&\to\mathbb R^n\times\mathbb R^n
\end{align*}
\begin{align*}
\gamma&\mapsto(\gamma(0),\gamma(1)).
\end{align*}
Let $\pi:=E_\#\eta$. Since $(e_0)_\#\eta=\mu_0$ and $(e_1)_\#\eta=\mu_1$, the measure $\pi$ is a coupling of $\mu_0$ and $\mu_1$.
[/step]
[step:Bound endpoint cost by the action of the admissible pair]
For $\eta$-a.e. absolutely continuous path $\gamma:[0,1]\to\mathbb R^n$, the fundamental theorem of calculus for absolutely continuous curves gives
\begin{align*}
\gamma(1)-\gamma(0)=\int_0^1\dot\gamma(t)\,d\mathcal L^1(t).
\end{align*}
Applying the Cauchy-Schwarz inequality in the Hilbert space $L^2((0,1),\mathcal L^1;\mathbb R^n)$ to the functions $t\mapsto\dot\gamma(t)$ and $t\mapsto 1$, we obtain
\begin{align*}
|\gamma(1)-\gamma(0)|^2
\le
\int_0^1|\dot\gamma(t)|^2\,d\mathcal L^1(t).
\end{align*}
Integrating with respect to $\eta$ gives
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)
=
\int_{C([0,1];\mathbb R^n)}|\gamma(1)-\gamma(0)|^2\,d\eta(\gamma)
\le
\int_{C([0,1];\mathbb R^n)}
\int_0^1|\dot\gamma(t)|^2\,d\mathcal L^1(t)\,d\eta(\gamma).
\end{align*}
Using the superposition action estimate from the previous step,
\begin{align*}
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y)
\le
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\mu_t(x)\,d\mathcal L^1(t).
\end{align*}
Since $\pi$ is a coupling of $\mu_0$ and $\mu_1$, the definition of $W_2$ gives
\begin{align*}
W_2^2(\mu_0,\mu_1)
\le
\int_{\mathbb R^n\times\mathbb R^n}|y-x|^2\,d\pi(x,y).
\end{align*}
Therefore every admissible pair satisfies
\begin{align*}
W_2^2(\mu_0,\mu_1)
\le
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\mu_t(x)\,d\mathcal L^1(t).
\end{align*}
Taking the infimum over admissible pairs gives the lower bound.
[/step]
[step:Combine the two inequalities and identify an attaining pair]
The constructed pair from the optimal coupling is admissible and satisfies
\begin{align*}
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\mu_t(x)\,d\mathcal L^1(t)
\le
W_2^2(\mu_0,\mu_1).
\end{align*}
The lower bound just proved applies to this same pair, so
\begin{align*}
W_2^2(\mu_0,\mu_1)
\le
\int_0^1\int_{\mathbb R^n}|v_t(x)|^2\,d\mu_t(x)\,d\mathcal L^1(t)
\le
W_2^2(\mu_0,\mu_1).
\end{align*}
Hence equality holds for this admissible pair. Consequently the infimum equals $W_2^2(\mu_0,\mu_1)$ and is attained.
[/step]