[proofplan]
The proof first carries out McCann's determinant computation in the regular case, where the endpoint densities and the Brenier map are smooth enough for the change-of-variables formula to apply pointwise. The determinant inequality compares the Jacobian scale of the interpolated map with the affine interpolation of the endpoint density scales. Convexity and monotonicity of $h(r)=r^nU(r^{-n})$ then give the displacement-convexity inequality after integration. Finally, the passage to arbitrary endpoints is made by the standard McCann relaxation theorem, and the separately stated lower-semicontinuity hypothesis supplies the limiting inequality.
[/proofplan]
[step:Reduce the argument to finite absolutely continuous endpoints]
If $\mathcal U[\mu_0]=+\infty$ or $\mathcal U[\mu_1]=+\infty$, then the required inequality is immediate for every $t\in(0,1)$ and for any $W_2$-geodesic, because the right-hand side is $+\infty$. At $t=0$ and $t=1$ the inequality is equality whenever the corresponding endpoint coefficient is nonzero. Hence it remains to prove the assertion when
\begin{align*}
\mu_0=\rho_0\,d\mathcal L^n
\end{align*}
and
\begin{align*}
\mu_1=\rho_1\,d\mathcal L^n,
\end{align*}
where $\rho_0:\mathbb R^n\to[0,\infty)$ and $\rho_1:\mathbb R^n\to[0,\infty)$ are Borel probability densities and both endpoint energies are finite.
We first assume the following regularity. The densities $\rho_0$ and $\rho_1$ are smooth and positive on compact convex supports, and the optimal transport map $T:\operatorname{supp}\rho_0\to\operatorname{supp}\rho_1$ is a smooth Brenier map of the form $T=\nabla\varphi$, where $\varphi:\mathbb R^n\to\mathbb R$ is convex and smooth on $\operatorname{supp}\rho_0$. For $t\in[0,1]$, define the interpolation map $T_t:\operatorname{supp}\rho_0\to\mathbb R^n$ by
\begin{align*}
T_t(x)=(1-t)x+tT(x).
\end{align*}
Since $T_t$ is the gradient of the [convex function](/page/Convex%20Function) $x\mapsto (1-t)|x|^2/2+t\varphi(x)$, [Brenier's theorem](/theorems/7477) gives that
\begin{align*}
\mu_t=(T_t)_\#\mu_0
\end{align*}
is a constant-speed $W_2$-geodesic between $\mu_0$ and $\mu_1$.
[/step]
[step:Rewrite the energy along the smooth Brenier interpolation]
Define $A:\operatorname{supp}\rho_0\to\mathbb R^{n\times n}$ by
\begin{align*}
A(x)=JT_x,
\end{align*}
where $JT_x$ is the [Jacobian matrix](/page/Jacobian%20Matrix) of $T$ at $x$. For each $t\in[0,1]$, define $A_t:\operatorname{supp}\rho_0\to\mathbb R^{n\times n}$ by
\begin{align*}
A_t(x)=(1-t)I_n+tA(x).
\end{align*}
The matrix $A(x)$ is symmetric positive semidefinite for $\mathcal L^n$-a.e. $x\in\operatorname{supp}\rho_0$, because $T=\nabla\varphi$ and $\varphi$ is convex. Thus $A_t(x)$ is positive semidefinite for such $x$.
The smooth change-of-variables formula applied to $T_t$ gives, for $\mathcal L^n$-a.e. $x$ with $\rho_0(x)>0$,
\begin{align*}
\rho_t(T_t(x))\det A_t(x)=\rho_0(x).
\end{align*}
Consequently,
\begin{align*}
\mathcal U[\mu_t]=\int_{\operatorname{supp}\rho_0} U\left(\frac{\rho_0(x)}{\det A_t(x)}\right)\det A_t(x)\,d\mathcal L^n(x).
\end{align*}
On the set where $\rho_0=0$, the contribution is zero because $U(0)=0$. On $\{x\in\operatorname{supp}\rho_0:\rho_0(x)>0\}$, define $r_t:\{\rho_0>0\}\to(0,\infty)$ by
\begin{align*}
r_t(x)=\left(\frac{\det A_t(x)}{\rho_0(x)}\right)^{1/n}.
\end{align*}
Then the identity above becomes
\begin{align*}
\mathcal U[\mu_t]=\int_{\{\rho_0>0\}}\rho_0(x)h(r_t(x))\,d\mathcal L^n(x).
\end{align*}
At $t=0$ one has
\begin{align*}
r_0(x)=\rho_0(x)^{-1/n}.
\end{align*}
At $t=1$, the change-of-variables formula for $T$ gives
\begin{align*}
\rho_1(T(x))\det A(x)=\rho_0(x),
\end{align*}
and hence
\begin{align*}
r_1(x)=\rho_1(T(x))^{-1/n}
\end{align*}
for $\mathcal L^n$-a.e. $x$ with $\rho_0(x)>0$.
[/step]
[step:Compare the interpolated Jacobian scale with the endpoint scales]
Fix $x\in\operatorname{supp}\rho_0$ such that $\rho_0(x)>0$ and $A(x)$ is symmetric positive semidefinite. Let $\lambda_1(x),\dots,\lambda_n(x)\in[0,\infty)$ denote the eigenvalues of $A(x)$, counted with multiplicity. The eigenvalues of $A_t(x)$ are $(1-t)+t\lambda_i(x)$ for $i\in\{1,\dots,n\}$.
We prove
\begin{align*}
\det A_t(x)^{1/n}\ge (1-t)+t\det A(x)^{1/n}.
\end{align*}
Indeed, expanding the determinant gives
\begin{align*}
\det A_t(x)=\prod_{i=1}^n((1-t)+t\lambda_i(x)).
\end{align*}
For $m\in\{0,\dots,n\}$, let $e_m(\lambda_1(x),\dots,\lambda_n(x))$ denote the $m$th elementary symmetric polynomial, with $e_0=1$. Maclaurin's inequalities for nonnegative numbers imply, for $m\in\{1,\dots,n\}$,
\begin{align*}
e_m(\lambda_1(x),\dots,\lambda_n(x))\ge \binom{n}{m}(\lambda_1(x)\cdots\lambda_n(x))^{m/n}.
\end{align*}
Comparing the coefficients in the expansion of the preceding product with those of
\begin{align*}
\left((1-t)+t(\lambda_1(x)\cdots\lambda_n(x))^{1/n}\right)^n
\end{align*}
gives
\begin{align*}
\det A_t(x)\ge \left((1-t)+t\det A(x)^{1/n}\right)^n.
\end{align*}
Taking the nonnegative $n$th root proves the determinant comparison.
Dividing by $\rho_0(x)^{1/n}$ and using $\det A(x)=\rho_0(x)/\rho_1(T(x))$, we obtain
\begin{align*}
r_t(x)\ge (1-t)r_0(x)+tr_1(x).
\end{align*}
[guided]
The point of this step is to compare density scales rather than densities. Fix $x\in\operatorname{supp}\rho_0$ such that $\rho_0(x)>0$ and $A(x)=JT_x$ is symmetric positive semidefinite. Since $A(x)$ is positive semidefinite, its eigenvalues $\lambda_1(x),\dots,\lambda_n(x)$ are nonnegative. The matrix
\begin{align*}
A_t(x)=(1-t)I_n+tA(x)
\end{align*}
has eigenvalues $(1-t)+t\lambda_i(x)$, because the identity matrix and $A(x)$ are diagonal in the same orthonormal eigenbasis.
Therefore
\begin{align*}
\det A_t(x)=\prod_{i=1}^n((1-t)+t\lambda_i(x)).
\end{align*}
For $m\in\{0,\dots,n\}$, define $e_m(\lambda_1(x),\dots,\lambda_n(x))$ to be the $m$th elementary symmetric polynomial of these eigenvalues, with $e_0=1$. The coefficient of $t^m(1-t)^{n-m}$ in the product is exactly $e_m(\lambda_1(x),\dots,\lambda_n(x))$.
Maclaurin's inequalities state that the normalized elementary symmetric means of nonnegative numbers decrease with $m$. Applying this to $\lambda_1(x),\dots,\lambda_n(x)$ gives
\begin{align*}
e_m(\lambda_1(x),\dots,\lambda_n(x))\ge \binom{n}{m}(\lambda_1(x)\cdots\lambda_n(x))^{m/n}
\end{align*}
for every $m\in\{1,\dots,n\}$. Thus every coefficient in the expansion of $\det A_t(x)$ dominates the corresponding coefficient in
\begin{align*}
\left((1-t)+t(\lambda_1(x)\cdots\lambda_n(x))^{1/n}\right)^n.
\end{align*}
Since $t\in[0,1]$, the monomials $t^m(1-t)^{n-m}$ are nonnegative, and coefficientwise comparison gives
\begin{align*}
\det A_t(x)\ge \left((1-t)+t\det A(x)^{1/n}\right)^n.
\end{align*}
Both sides are nonnegative, so taking the nonnegative $n$th root is order-preserving and yields
\begin{align*}
\det A_t(x)^{1/n}\ge (1-t)+t\det A(x)^{1/n}.
\end{align*}
Now divide this inequality by $\rho_0(x)^{1/n}$. The change-of-variables formula at time $1$ is
\begin{align*}
\rho_1(T(x))\det A(x)=\rho_0(x).
\end{align*}
Therefore
\begin{align*}
\frac{\det A(x)^{1/n}}{\rho_0(x)^{1/n}}=\rho_1(T(x))^{-1/n}.
\end{align*}
By the definitions of $r_t$, $r_0$, and $r_1$, the divided inequality is precisely
\begin{align*}
r_t(x)\ge (1-t)r_0(x)+tr_1(x).
\end{align*}
This is the correct comparison because the McCann hypothesis is imposed on $h$ as a function of the scale variable $r=\rho^{-1/n}$.
[/guided]
[/step]
[step:Apply convexity and monotonicity of the McCann transform]
Since $h$ is nonincreasing and
\begin{align*}
r_t(x)\ge (1-t)r_0(x)+tr_1(x),
\end{align*}
we have
\begin{align*}
h(r_t(x))\le h((1-t)r_0(x)+tr_1(x)).
\end{align*}
Since $h$ is convex on $(0,\infty)$, it follows that
\begin{align*}
h(r_t(x))\le (1-t)h(r_0(x))+th(r_1(x)).
\end{align*}
Multiplying by $\rho_0(x)$ and integrating with respect to $\mathcal L^n$ gives
\begin{align*}
\mathcal U[\mu_t]\le (1-t)\int_{\{\rho_0>0\}}\rho_0(x)h(r_0(x))\,d\mathcal L^n(x)+t\int_{\{\rho_0>0\}}\rho_0(x)h(r_1(x))\,d\mathcal L^n(x).
\end{align*}
The first integral equals $\mathcal U[\mu_0]$, because
\begin{align*}
\rho_0(x)h(\rho_0(x)^{-1/n})=U(\rho_0(x)).
\end{align*}
For the second integral, using $r_1(x)=\rho_1(T(x))^{-1/n}$ and $\rho_0(x)=\rho_1(T(x))\det A(x)$ gives
\begin{align*}
\rho_0(x)h(r_1(x))=U(\rho_1(T(x)))\det A(x).
\end{align*}
The change-of-variables formula under $y=T(x)$ therefore gives
\begin{align*}
\int_{\{\rho_0>0\}}\rho_0(x)h(r_1(x))\,d\mathcal L^n(x)=\int_{\mathbb R^n}U(\rho_1(y))\,d\mathcal L^n(y).
\end{align*}
Thus
\begin{align*}
\mathcal U[\mu_t]\le (1-t)\mathcal U[\mu_0]+t\mathcal U[\mu_1]
\end{align*}
in the regular case.
[/step]
[step:Pass from regular endpoints to finite-energy endpoints by McCann relaxation]
We now invoke the standard McCann relaxation theorem for internal energies. The precise form needed is the following external theorem: if $U:[0,\infty)\to\mathbb R$ is convex, continuous at $0$, satisfies $U(0)=0$, and its McCann transform $r\mapsto r^nU(r^{-n})$ is convex and nonincreasing, then every pair of absolutely continuous finite-energy endpoints in $\mathcal P_2(\mathbb R^n)$ admits approximating smooth positive compactly supported endpoint pairs for which the regular Brenier computation applies, the endpoint energies converge, and a subsequence of the induced optimal dynamical couplings converges narrowly to an optimal coupling between the original endpoints.
The hypotheses of this relaxation theorem are exactly the hypotheses already verified in the statement: convexity of $U$, continuity at $0$, the normalization $U(0)=0$, and convexity and monotonicity of $h$. Applying it to the finite-energy absolutely continuous pair $\mu_0,\mu_1$, choose smooth approximants $\mu_{i,k}=\rho_{i,k}\,d\mathcal L^n$ for $i\in\{0,1\}$ such that
\begin{align*}
W_2(\mu_{i,k},\mu_i)\to0
\end{align*}
and
\begin{align*}
\mathcal U[\mu_{i,k}]\to\mathcal U[\mu_i].
\end{align*}
Let $\gamma_k\in\mathcal P(\mathbb R^n\times\mathbb R^n)$ be the optimal coupling induced by the regular Brenier map from $\mu_{0,k}$ to $\mu_{1,k}$. For $t\in[0,1]$, define $S_t:\mathbb R^n\times\mathbb R^n\to\mathbb R^n$ by
\begin{align*}
S_t(x,y)=(1-t)x+ty
\end{align*}
and define
\begin{align*}
\mu_{t,k}=(S_t)_\#\gamma_k.
\end{align*}
The regular computation gives
\begin{align*}
\mathcal U[\mu_{t,k}]\le (1-t)\mathcal U[\mu_{0,k}]+t\mathcal U[\mu_{1,k}].
\end{align*}
By the compactness clause of the relaxation theorem, after passing to a subsequence the couplings $\gamma_k$ converge narrowly to an optimal coupling $\gamma$ between $\mu_0$ and $\mu_1$. Define $\mu_t=(S_t)_\#\gamma$. Then $(\mu_t)_{t\in[0,1]}$ is a constant-speed $W_2$-geodesic. Since $S_t$ has at most linear growth and the endpoint second moments converge, the sequence $(\mu_{t,k})_{k\in\mathbb N}$ converges narrowly to $\mu_t$ and has uniformly bounded second moments. The lower-semicontinuity hypothesis in the theorem statement gives
\begin{align*}
\mathcal U[\mu_t]\le \liminf_{k\to\infty}\mathcal U[\mu_{t,k}].
\end{align*}
Combining the last two inequalities and using endpoint energy recovery yields
\begin{align*}
\mathcal U[\mu_t]\le (1-t)\mathcal U[\mu_0]+t\mathcal U[\mu_1].
\end{align*}
This proves displacement convexity for all finite-energy absolutely continuous endpoints.
[/step]
[step:Record the lower-semicontinuity input and close the proof]
The theorem statement includes lower semicontinuity as a separate hypothesis for the extended functional. When this hypothesis is obtained from the standard convex integral-functional lower-semicontinuity theorem, the exact external input is as follows: for a convex density $U$ with suitable affine lower control of its negative part and superlinear growth at infinity, the functional
\begin{align*}
\nu\mapsto \int_{\mathbb R^n}U\left(\frac{d\nu}{d\mathcal L^n}(x)\right)\,d\mathcal L^n(x)
\end{align*}
extended by $+\infty$ on nonzero singular parts is lower semicontinuous under narrow convergence of probability measures with uniformly integrable moments sufficient to prevent loss of mass at infinity. In the present statement, this is precisely the assumed lower-semicontinuity condition, and the uniformly bounded second moments supply the required tightness at infinity.
The infinite endpoint case was settled at the beginning, the regular finite-energy case was proved by the determinant computation, and the general finite-energy absolutely continuous case follows by relaxation and lower semicontinuity. Hence $\mathcal U$ is displacement convex in the stated sense. The lower-semicontinuity assertion is exactly the additional hypothesis recorded in the statement, with the standard convex superlinear integral-functional theorem giving the listed sufficient conditions. This completes the proof.
[/step]