[proofplan]
The proof has three parts. First, [Brenier's theorem](/theorems/7477) identifies the gradient map $T=\nabla\psi$ as the optimal quadratic transport map from $\mu_0$ to $\mu_1$, and the affine interpolation $T_t(x)=(1-t)x+tT(x)$ gives the displacement interpolation; the constant-speed identity follows from the triangle inequality once the endpoint cost is known to be optimal. Second, the particle formula $y(t)=T_t(x)$ gives the velocity $T(x)-x$, and differentiating test-function integrals proves the continuity equation. Third, along each particle path the velocity is constant, so differentiating $\nabla\phi_t(y(t))$ gives the gradient form of the Hamilton-Jacobi equation; the remaining scalar time-dependent constant is removed by the allowed additive normalization of $\phi_t$.
[/proofplan]
[step:Use Brenier optimality to identify the displacement interpolation]
Since $\mu_0$ is absolutely continuous with respect to $\mathcal L^n$, $\psi$ is convex, $T=\nabla\psi$, and $T_\#\mu_0=\mu_1$, Brenier's theorem for the quadratic cost implies that the transport plan induced by $T$ is optimal for the quadratic cost from $\mu_0$ to $\mu_1$. Therefore
\begin{align*}
W_2(\mu_0,\mu_1)^2=\int_U |T(x)-x|^2\,d\mu_0(x).
\end{align*}
Define
\begin{align*}
L:=W_2(\mu_0,\mu_1).
\end{align*}
Because $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$ and $T_\#\mu_0=\mu_1$, the function $x\mapsto |T(x)-x|$ belongs to $L^2(\mu_0)$, and hence $L<\infty$.
For $0\le s<t\le1$, the map
\begin{align*}
\Gamma_{s,t}:U&\to\mathbb R^n\times\mathbb R^n
\end{align*}
\begin{align*}
x&\mapsto (T_s(x),T_t(x))
\end{align*}
pushes $\mu_0$ forward to a coupling of $\mu_s$ and $\mu_t$. Hence, by the definition of $W_2$,
\begin{align*}
W_2(\mu_s,\mu_t)^2\le \int_U |T_t(x)-T_s(x)|^2\,d\mu_0(x).
\end{align*}
Since
\begin{align*}
T_t(x)-T_s(x)=(t-s)(T(x)-x),
\end{align*}
we obtain
\begin{align*}
W_2(\mu_s,\mu_t)\le (t-s)L.
\end{align*}
Applying this bound to the intervals $[0,s]$, $[s,t]$, and $[t,1]$, and then using the triangle inequality for $W_2$, gives
\begin{align*}
L=W_2(\mu_0,\mu_1)\le W_2(\mu_0,\mu_s)+W_2(\mu_s,\mu_t)+W_2(\mu_t,\mu_1)\le sL+(t-s)L+(1-t)L=L.
\end{align*}
All inequalities are therefore equalities. In particular,
\begin{align*}
W_2(\mu_s,\mu_t)=(t-s)L
\end{align*}
for every $0\le s<t\le1$. Thus $(\mu_t)_{t\in[0,1]}$ is a constant-speed $W_2$-geodesic.
[guided]
The key point is that the affine maps $T_t$ move every initial point $x$ along the straight line from $x$ to $T(x)$ at constant speed. To turn this particle picture into a Wasserstein geodesic, we first need to know that the endpoint map $T$ is optimal for the quadratic transport problem.
Because $\mu_0$ is absolutely continuous with respect to $\mathcal L^n$, because $\psi$ is convex, because $T=\nabla\psi$, and because $T_\#\mu_0=\mu_1$, Brenier's theorem applies to the pair $(\mu_0,\mu_1)$ for the quadratic cost. Its conclusion is exactly that the transport plan induced by $T$ is optimal, so the transport cost induced by $T$ realizes the Wasserstein distance:
\begin{align*}
W_2(\mu_0,\mu_1)^2=\int_U |T(x)-x|^2\,d\mu_0(x).
\end{align*}
We define the endpoint length
\begin{align*}
L:=W_2(\mu_0,\mu_1).
\end{align*}
The integral above is finite because $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$ and $T_\#\mu_0=\mu_1$, so $x\mapsto |T(x)-x|$ is square-integrable with respect to $\mu_0$.
Now fix $0\le s<t\le1$. We need to estimate the Wasserstein distance between the intermediate measures $\mu_s$ and $\mu_t$. The natural coupling is obtained by using the same starting point $x$ for both times. Define
\begin{align*}
\Gamma_{s,t}:U&\to\mathbb R^n\times\mathbb R^n
\end{align*}
\begin{align*}
x&\mapsto (T_s(x),T_t(x)).
\end{align*}
Since $\mu_s=(T_s)_\#\mu_0$ and $\mu_t=(T_t)_\#\mu_0$, the measure $(\Gamma_{s,t})_\#\mu_0$ is a coupling of $\mu_s$ and $\mu_t$. Therefore the definition of $W_2$ gives
\begin{align*}
W_2(\mu_s,\mu_t)^2\le \int_U |T_t(x)-T_s(x)|^2\,d\mu_0(x).
\end{align*}
The difference of the two affine interpolants is
\begin{align*}
T_t(x)-T_s(x)=(t-s)(T(x)-x).
\end{align*}
Substituting this identity into the previous estimate yields
\begin{align*}
W_2(\mu_s,\mu_t)^2\le (t-s)^2\int_U |T(x)-x|^2\,d\mu_0(x)=(t-s)^2L^2.
\end{align*}
Taking square roots gives
\begin{align*}
W_2(\mu_s,\mu_t)\le (t-s)L.
\end{align*}
This upper bound alone says that the curve is no faster than constant speed. To prove it is exactly a geodesic, we use the triangle inequality between the three intermediate pieces:
\begin{align*}
L=W_2(\mu_0,\mu_1)\le W_2(\mu_0,\mu_s)+W_2(\mu_s,\mu_t)+W_2(\mu_t,\mu_1).
\end{align*}
Applying the same upper bound to each term gives
\begin{align*}
W_2(\mu_0,\mu_s)+W_2(\mu_s,\mu_t)+W_2(\mu_t,\mu_1)\le sL+(t-s)L+(1-t)L=L.
\end{align*}
Thus the left and right sides are both $L$, so every inequality in the chain is an equality. In particular,
\begin{align*}
W_2(\mu_s,\mu_t)=(t-s)L.
\end{align*}
This is precisely the constant-speed geodesic identity.
[/guided]
[/step]
[step:Differentiate the particle formula to obtain the continuity equation]
Let $\zeta\in C_c^\infty(\mathbb R^n)$ be a [test function](/page/Test%20Function). Define
\begin{align*}
F_\zeta:[0,1]&\to\mathbb R
\end{align*}
\begin{align*}
t&\mapsto \int_{\mathbb R^n}\zeta(y)\,d\mu_t(y).
\end{align*}
Since $\mu_t=(T_t)_\#\mu_0$ and $\mu_0(\mathbb R^n\setminus U)=0$,
\begin{align*}
F_\zeta(t)=\int_U \zeta(T_t(x))\,d\mu_0(x).
\end{align*}
For every $x\in U$, the map $t\mapsto \zeta(T_t(x))$ is differentiable and
\begin{align*}
\frac{d}{dt}\zeta(T_t(x))=\nabla\zeta(T_t(x))\cdot (T(x)-x).
\end{align*}
Moreover,
\begin{align*}
\left|\nabla\zeta(T_t(x))\cdot (T(x)-x)\right|\le \|\nabla\zeta\|_\infty |T(x)-x|,
\end{align*}
and the right-hand side belongs to $L^1(\mu_0)$ because $|T-\operatorname{id}|\in L^2(\mu_0)$ and $\mu_0$ is a [probability measure](/page/Probability%20Measure). Differentiation under the integral sign gives
\begin{align*}
F_\zeta'(t)=\int_U \nabla\zeta(T_t(x))\cdot (T(x)-x)\,d\mu_0(x).
\end{align*}
Using the defining identity $\nabla\phi_t(T_t(x))=T(x)-x$ and the pushforward relation $\mu_t=(T_t)_\#\mu_0$, this becomes
\begin{align*}
F_\zeta'(t)=\int_{\mathbb R^n}\nabla\zeta(y)\cdot v_t(y)\,d\mu_t(y),
\end{align*}
where $v_t$ denotes any Borel extension to $\mathbb R^n$ of the smooth map $\nabla\phi_t:U_t\to\mathbb R^n$. This extension does not change the integral because $\mu_t(\mathbb R^n\setminus U_t)=0$.
This is the weak formulation of
\begin{align*}
\partial_t\mu_t+\operatorname{div}(\mu_t\nabla\phi_t)=0
\end{align*}
on $(0,1)\times\mathbb R^n$.
[/step]
[step:Differentiate the constant velocity identity along trajectories]
For $t\in(0,1)$, let
\begin{align*}
S_t:U_t&\to U
\end{align*}
denote the inverse diffeomorphism of $T_t:U\to U_t$. Define the velocity field
\begin{align*}
v_t:U_t&\to\mathbb R^n
\end{align*}
\begin{align*}
y&\mapsto \nabla\phi_t(y).
\end{align*}
The defining relation for $\phi_t$ says that
\begin{align*}
v_t(T_t(x))=T(x)-x
\end{align*}
for every $x\in U$. Fix $x\in U$ and define the trajectory
\begin{align*}
y_x:(0,1)&\to\mathbb R^n
\end{align*}
\begin{align*}
t&\mapsto T_t(x).
\end{align*}
Then
\begin{align*}
y_x'(t)=T(x)-x=v_t(y_x(t)).
\end{align*}
Since $\phi:Q\to\mathbb R$ is smooth and $v_t=\nabla\phi_t$, the map $(t,y)\mapsto v_t(y)$ is continuously differentiable on $Q$. Since $v_t(y_x(t))=T(x)-x$ is independent of $t$, differentiating this identity with respect to $t$ gives, for each component $i\in\{1,\dots,n\}$,
\begin{align*}
\partial_t (v_t)_i(y_x(t))+\sum_{j=1}^n \partial_{y_j}(v_t)_i(y_x(t))(y_x')_j(t)=0.
\end{align*}
Substituting $y_x'(t)=v_t(y_x(t))$ yields
\begin{align*}
\partial_t (v_t)_i(y_x(t))+\sum_{j=1}^n \partial_{y_j}(v_t)_i(y_x(t))(v_t)_j(y_x(t))=0.
\end{align*}
Because $x\in U$ was arbitrary and $T_t:U\to U_t$ is onto, for every $y\in U_t$ and every $i\in\{1,\dots,n\}$ we have
\begin{align*}
\partial_t (v_t)_i(y)+\sum_{j=1}^n \partial_{y_j}(v_t)_i(y)(v_t)_j(y)=0.
\end{align*}
Since $(v_t)_i=\partial_{y_i}\phi_t$, this is
\begin{align*}
\partial_{y_i}\partial_t\phi_t(y)+\sum_{j=1}^n \partial_{y_j}\partial_{y_i}\phi_t(y)\partial_{y_j}\phi_t(y)=0.
\end{align*}
The chain rule for the function $y\mapsto \frac{1}{2}|\nabla\phi_t(y)|^2$ gives, for each $i\in\{1,\dots,n\}$,
\begin{align*}
\partial_{y_i}\left(\frac{1}{2}|\nabla\phi_t|^2\right)(y)=\sum_{j=1}^n \partial_{y_i}\partial_{y_j}\phi_t(y)\partial_{y_j}\phi_t(y).
\end{align*}
Since mixed partial derivatives agree for the smooth function $\phi$, the last two displayed identities imply
Therefore
\begin{align*}
\nabla\left(\partial_t\phi_t+\frac{1}{2}|\nabla\phi_t|^2\right)(y)=0
\end{align*}
for every $y\in U_t$.
[/step]
[step:Choose the additive normalization to obtain the scalar Hamilton-Jacobi equation]
The previous step shows that, for each fixed $t\in(0,1)$, the smooth function
\begin{align*}
H_t:U_t&\to\mathbb R
\end{align*}
\begin{align*}
y&\mapsto \partial_t\phi_t(y)+\frac{1}{2}|\nabla\phi_t(y)|^2
\end{align*}
has zero gradient on $U_t$. Since $U$ is connected and $T_t:U\to U_t$ is a diffeomorphism, $U_t$ is connected. Hence there is a scalar $c(t)\in\mathbb R$ such that $H_t(y)=c(t)$ for every $y\in U_t$. Choose a point $x_0\in U$ and define $y_0:(0,1)\to\mathbb R^n$ by $y_0(t)=T_t(x_0)$. Since $\phi$ and $y_0$ are smooth, the function $c:(0,1)\to\mathbb R$ is smooth and is given by
\begin{align*}
c(t)=H_t(y_0(t)).
\end{align*}
Choose a smooth function $a:(0,1)\to\mathbb R$ satisfying $a'(t)=-c(t)$, and replace $\phi_t$ by $\phi_t+a(t)$. This changes $\partial_t\phi_t$ by $a'(t)$ and leaves $\nabla\phi_t$ unchanged. With this normalization,
\begin{align*}
\partial_t\phi_t(y)+\frac{1}{2}|\nabla\phi_t(y)|^2=0
\end{align*}
for every $t\in(0,1)$ and every $y\in U_t$. Together with the constant-speed geodesic identity and the weak continuity equation proved above, this completes the proof.
[/step]