[proofplan]
The proof is an application of the global von Renesse-Sturm characterization of Ricci lower bounds by entropy convexity. That theorem converts the pointwise tensor inequality $\operatorname{Ric}\ge Kg$ into $K$-displacement convexity of the relative entropy on the Wasserstein space. We then unpack the weak displacement-convexity conclusion to obtain an optimal dynamical plan, its induced $W_2$-geodesic, and the stated entropy inequality.
[/proofplan]
[step:Check that the entropy assumptions place both endpoints in the effective domain]
By hypothesis, $\operatorname{Ent}_{\operatorname{vol}_g}(\mu_0)$ and $\operatorname{Ent}_{\operatorname{vol}_g}(\mu_1)$ are finite real numbers. By the definition of $\operatorname{Ent}_{\operatorname{vol}_g}$, there exist Borel functions $\rho_0,\rho_1:M\to[0,\infty)$ such that
\begin{align*}
\mu_0=\rho_0\,\operatorname{vol}_g,\qquad \mu_1=\rho_1\,\operatorname{vol}_g,
\end{align*}
and $\rho_0\log\rho_0$ and $\rho_1\log\rho_1$ are integrable with respect to $\operatorname{vol}_g$. Thus both endpoint measures lie in the effective domain of the entropy functional. Under this convention, finite real entropy already forces absolute continuity with respect to $\operatorname{vol}_g$, so no separate regularity assumption on $\mu_0$ and $\mu_1$ is needed.
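To make the finiteness condition concrete, here is a minimal numerical sketch in the simplest model case. It assumes $M=\mathbb R$ with Lebesgue measure in place of $\operatorname{vol}_g$ and the usual convention $\operatorname{Ent}(\rho\,dx)=\int\rho\log\rho\,dx$. It compares a trapezoid-rule value of the entropy of the Gaussian $N(0,\sigma^2)$ with the closed form $-\tfrac12\log(2\pi e\sigma^2)$. The function names are hypothetical, and the truncation of the integral to $[-12\sigma,12\sigma]$ is an illustrative choice.

```python
import math

# Assumption (illustration only): M = R with Lebesgue measure,
# Ent(rho dx) = integral of rho * log(rho) dx.


def gaussian_density(x: float, sigma: float) -> float:
    """Density of N(0, sigma^2) with respect to Lebesgue measure."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def entropy_numeric(sigma: float, half_width: float = 12.0, n: int = 20_001) -> float:
    """Trapezoid-rule approximation of the integral of rho log rho over
    [-half_width * sigma, half_width * sigma]; the Gaussian tails beyond
    12 standard deviations contribute a negligible amount."""
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        rho = gaussian_density(a + i * h, sigma)  # strictly positive on this grid
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * rho * math.log(rho)
    return total * h


def entropy_closed_form(sigma: float) -> float:
    """Ent(N(0, sigma^2)) = -(1/2) log(2 pi e sigma^2), minus the differential entropy."""
    return -0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)


for s in (0.5, 1.0, 2.0):
    print(f"sigma={s}: numeric={entropy_numeric(s):.8f}, closed form={entropy_closed_form(s):.8f}")
```

The two columns agree to many digits, so a Gaussian endpoint lies in the effective domain. A measure with an atom has no density at all and is therefore assigned $+\infty$.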
[/step]
[step:Apply the von Renesse-Sturm characterization to obtain weak displacement convexity]The manifold $(M,g)$ is complete and connected, and the hypothesis gives
\begin{align*}
\operatorname{Ric}\ge Kg
\end{align*}
as quadratic forms on the tangent bundle $TM$. Therefore the hypotheses of the von Renesse-Sturm characterization [citetheorem:9588] are satisfied. Applying the Ricci-lower-bound-to-entropy-convexity direction of that theorem to the metric-measure space $(M,d_g,\operatorname{vol}_g)$, we obtain that the relative entropy $\operatorname{Ent}_{\operatorname{vol}_g}$ is $K$-displacement convex on $\mathcal P_2(M)$ in the weak sense: for each pair of finite-entropy endpoints in $\mathcal P_2(M)$, at least one constant-speed $W_2$-geodesic joining them satisfies the entropy convexity inequality.
In particular, since $\mu_0,\mu_1\in\mathcal P_2(M)$ have finite real entropy, there exists a constant-speed $W_2$-geodesic
\begin{align*}
\mu:[0,1]\to\mathcal P_2(M)
\end{align*}
with $\mu(0)=\mu_0$ and $\mu(1)=\mu_1$ such that
\begin{align*}
\operatorname{Ent}_{\operatorname{vol}_g}(\mu(t))\le (1-t)\operatorname{Ent}_{\operatorname{vol}_g}(\mu_0)+t\operatorname{Ent}_{\operatorname{vol}_g}(\mu_1)-\frac{K}{2}t(1-t)W_2(\mu_0,\mu_1)^2
\end{align*}
for every $t\in[0,1]$.[/step]
[guided]The only deep input is the global theorem connecting Ricci curvature and optimal transport convexity. We must verify its hypotheses before using it. The theorem applies to a complete connected Riemannian manifold equipped with its Riemannian distance and volume measure. These are exactly the objects in the present statement: $(M,g)$ is complete and connected, $d_g$ is its Riemannian distance, and $\operatorname{vol}_g$ is its Riemannian volume measure.
The curvature hypothesis required by the theorem is the tensor inequality
\begin{align*}
\operatorname{Ric}\ge Kg.
\end{align*}
In the present statement this means that, for every $p\in M$ and every tangent vector $v\in T_pM$,
\begin{align*}
\operatorname{Ric}_p(v,v)\ge K g_p(v,v).
\end{align*}
This is precisely the assumed Ricci lower bound. Hence the Ricci-lower-bound-to-entropy-convexity direction of the von Renesse-Sturm characterization [citetheorem:9588] gives that $\operatorname{Ent}_{\operatorname{vol}_g}$ is $K$-displacement convex on $\mathcal P_2(M)$ in the weak sense that at least one entropy-convex constant-speed $W_2$-geodesic exists between any two finite-entropy endpoints.
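For orientation, the classical model spaces show which constants $K$ the hypothesis allows. Here $S^n(r)$ denotes the round sphere of radius $r$ and $\mathbb H^n$ the hyperbolic space of constant sectional curvature $-1$. These are standard curvature computations, included only as illustrations. In each case $\operatorname{Ric}-Kg=(c-K)g\ge0$ whenever $K\le c$, where $c$ is the Ricci constant:
\begin{align*}
M=\mathbb R^n&:\quad \operatorname{Ric}=0, &&\text{so }\operatorname{Ric}\ge Kg\text{ for every }K\le 0,\\
M=S^n(r),\ n\ge 2&:\quad \operatorname{Ric}=\tfrac{n-1}{r^2}\,g, &&\text{so }\operatorname{Ric}\ge Kg\text{ for every }K\le \tfrac{n-1}{r^2},\\
M=\mathbb H^n,\ n\ge 2&:\quad \operatorname{Ric}=-(n-1)\,g, &&\text{so }\operatorname{Ric}\ge Kg\text{ for every }K\le -(n-1).
\end{align*}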
Now we unpack what this conclusion says for the fixed measures $\mu_0$ and $\mu_1$. Weak $K$-displacement convexity means that there is at least one constant-speed Wasserstein geodesic
\begin{align*}
\mu:[0,1]\to\mathcal P_2(M)
\end{align*}
with endpoints $\mu(0)=\mu_0$ and $\mu(1)=\mu_1$ along which the entropy satisfies the convexity inequality with the quadratic correction determined by $K$ and the squared endpoint distance. Thus, for every $t\in[0,1]$,
\begin{align*}
\operatorname{Ent}_{\operatorname{vol}_g}(\mu(t))\le (1-t)\operatorname{Ent}_{\operatorname{vol}_g}(\mu_0)+t\operatorname{Ent}_{\operatorname{vol}_g}(\mu_1)-\frac{K}{2}t(1-t)W_2(\mu_0,\mu_1)^2.
\end{align*}
This is exactly the entropy estimate required in the theorem statement.[/guided]
[step:Represent the selected Wasserstein geodesic by an optimal dynamical plan]Define $\operatorname{Geo}(M)$ to be the subset of $C([0,1];M)$ consisting of constant-speed minimizing geodesics $\gamma:[0,1]\to M$, equipped with the Borel structure induced by the compact-open topology. Since $M$ is second-countable, the metric space $(M,d_g)$ is separable. Since $g$ is complete, the Hopf-Rinow theorem implies that $(M,d_g)$ is complete, proper, and geodesic. Hence the standard dynamical optimal-plan representation theorem for Wasserstein geodesics on proper geodesic Polish spaces applies to the constant-speed $W_2$-geodesic
\begin{align*}
\mu:[0,1]\to\mathcal P_2(M)
\end{align*}
obtained above.
Thus there exists a Borel probability measure $\Pi$ on $\operatorname{Geo}(M)$ such that, for each $t\in[0,1]$, the evaluation map
\begin{align*}
e_t:\operatorname{Geo}(M)\to M
\end{align*}
defined by $e_t(\gamma)=\gamma(t)$ is Borel and satisfies
\begin{align*}
(e_t)_{\#}\Pi=\mu(t).
\end{align*}
The same representation theorem gives that the endpoint coupling
\begin{align*}
(e_0,e_1)_{\#}\Pi
\end{align*}
is $W_2$-optimal between $\mu_0$ and $\mu_1$. Writing $\mu_t:=\mu(t)=(e_t)_{\#}\Pi$ for $t\in[0,1]$, the entropy inequality obtained in the preceding step holds along this dynamically represented geodesic.[/step]
[guided]We now justify the passage from an abstract Wasserstein geodesic to a probability measure on actual geodesic paths in $M$. Define $\operatorname{Geo}(M)$ to be the set of constant-speed minimizing geodesics $\gamma:[0,1]\to M$, viewed as a Borel subset of $C([0,1];M)$ with the compact-open topology. For each $t\in[0,1]$, the evaluation map
\begin{align*}
e_t:\operatorname{Geo}(M)\to M
\end{align*}
is defined by $e_t(\gamma)=\gamma(t)$; because evaluation is continuous on $C([0,1];M)$, its restriction to $\operatorname{Geo}(M)$ is Borel.
The representation input is the standard dynamical optimal-plan theorem for geodesics in Wasserstein space over a geodesic Polish space. Its hypotheses are met here. The Riemannian manifold $M$ is second-countable, so $M$ is separable as a topological space. Completeness of the Riemannian metric and the Hopf-Rinow theorem imply that the metric space $(M,d_g)$ is complete, proper, and geodesic. Therefore $(M,d_g)$ is a geodesic Polish space, and the theorem applies to the constant-speed $W_2$-geodesic
\begin{align*}
\mu:[0,1]\to\mathcal P_2(M)
\end{align*}
constructed from weak displacement convexity.
The conclusion of the representation theorem is exactly the missing dynamical object: there is a Borel probability measure $\Pi$ on $\operatorname{Geo}(M)$ such that
\begin{align*}
(e_t)_{\#}\Pi=\mu(t)
\end{align*}
for every $t\in[0,1]$, and such that the endpoint plan
\begin{align*}
(e_0,e_1)_{\#}\Pi
\end{align*}
is optimal for the quadratic transport cost induced by $d_g$. This endpoint optimality means precisely that $(e_0,e_1)_{\#}\Pi$ is $W_2$-optimal between $\mu_0$ and $\mu_1$. Finally, setting $\mu_t:=\mu(t)$ only changes notation, so the entropy inequality already proved for $\mu(t)$ holds along the dynamically represented curve $t\mapsto\mu_t$.[/guided]
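The representation step can be illustrated in a finite toy case. The sketch below assumes $M=\mathbb R^2$ with the Euclidean metric and takes $\mu_0,\mu_1$ to be uniform empirical measures on $n$ points each. These measures have infinite entropy, so the example illustrates only the dynamical plan $\Pi$, not the entropy inequality. The sketch works in three steps:

1. It finds an optimal matching by brute force over permutations.
2. It lets $\Pi$ put mass $1/n$ on each of the $n$ straight segments.
3. It checks that the curve $t\mapsto(e_t)_{\#}\Pi$ has constant speed, $W_2(\mu_s,\mu_t)=|t-s|\,W_2(\mu_0,\mu_1)$.

All helper names are hypothetical.

```python
import itertools
import math

# Assumptions (illustration only): M = R^2 with the Euclidean metric (Ric = 0),
# mu_0 and mu_1 uniform empirical measures with the same number n of atoms.
# The helper names below are hypothetical, not part of any library.

Point = tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def optimal_matching(xs: list[Point], ys: list[Point]) -> tuple[float, tuple[int, ...]]:
    """Exact W_2 between uniform empirical measures on xs and ys, by brute force.

    With weights 1/n on both sides, some optimal coupling is induced by a
    permutation (Birkhoff-von Neumann), so minimizing over permutations suffices.
    """
    n = len(xs)
    best_cost, best_perm = math.inf, tuple(range(n))
    for perm in itertools.permutations(range(n)):
        cost = sum(sq_dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return math.sqrt(best_cost), best_perm


def evaluate(xs: list[Point], ys: list[Point], perm: tuple[int, ...], t: float) -> list[Point]:
    """Atoms of (e_t)_# Pi, where Pi puts mass 1/n on the segment xs[i] -> ys[perm[i]].

    In R^2 each segment traversed at constant speed is the minimizing geodesic.
    """
    return [
        ((1 - t) * x[0] + t * ys[j][0], (1 - t) * x[1] + t * ys[j][1])
        for x, j in zip(xs, perm)
    ]


xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [(3.0, 1.0), (4.0, 2.0), (2.0, 3.0), (5.0, 0.0)]
w, perm = optimal_matching(xs, ys)

# Constant speed along the induced curve: W_2(mu_s, mu_t) = (t - s) * W_2(mu_0, mu_1).
for s, t in [(0.0, 0.25), (0.25, 0.75), (0.5, 1.0)]:
    w_st, _ = optimal_matching(evaluate(xs, ys, perm, s), evaluate(xs, ys, perm, t))
    print(f"W2(mu_{s}, mu_{t}) = {w_st:.6f}, (t - s) * W2(mu_0, mu_1) = {(t - s) * w:.6f}")
```

Brute force is used only because it is exact and dependency-free. Its cost grows factorially in $n$, so it is not a practical transport solver.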
[step:Conclude the asserted weak displacement convexity]
Combining the previous steps, for every pair $\mu_0,\mu_1\in\mathcal P_2(M)$ with finite real entropy, there is an optimal dynamical plan $\Pi$ whose induced curve $t\mapsto\mu_t=(e_t)_{\#}\Pi$ is a constant-speed $W_2$-geodesic from $\mu_0$ to $\mu_1$ and satisfies
\begin{align*}
\operatorname{Ent}_{\operatorname{vol}_g}(\mu_t)\le (1-t)\operatorname{Ent}_{\operatorname{vol}_g}(\mu_0)+t\operatorname{Ent}_{\operatorname{vol}_g}(\mu_1)-\frac{K}{2}t(1-t)W_2(\mu_0,\mu_1)^2
\end{align*}
for every $t\in[0,1]$. This is precisely $K$-displacement convexity of $\operatorname{Ent}_{\operatorname{vol}_g}$ on $\mathcal P_2(M)$ in the weak sense specified in the theorem statement.
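As a sanity check of the final inequality in the flat case, the sketch below assumes $M=\mathbb R$ with Lebesgue measure. There $\operatorname{Ric}=0$, so the conclusion should hold with $K=0$. It relies on two standard one-dimensional facts:

- $W_2\big(N(m_0,\sigma_0^2),N(m_1,\sigma_1^2)\big)^2=(m_0-m_1)^2+(\sigma_0-\sigma_1)^2$;
- the $W_2$-geodesic between these Gaussians is $N\big((1-t)m_0+tm_1,((1-t)\sigma_0+t\sigma_1)^2\big)$.

The function names are hypothetical.

```python
import math

# Assumptions (illustration only): M = R with Lebesgue measure (Ric = 0),
# Gaussian endpoints N(m0, s0^2) and N(m1, s1^2). Helper names are hypothetical.


def gaussian_entropy(sigma: float) -> float:
    """Ent(N(m, sigma^2)) = integral of rho log rho dx = -(1/2) log(2 pi e sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def gaussian_w2(m0: float, s0: float, m1: float, s1: float) -> float:
    """W_2 distance between N(m0, s0^2) and N(m1, s1^2) on the real line."""
    return math.hypot(m0 - m1, s0 - s1)


def geodesic_std(s0: float, s1: float, t: float) -> float:
    """Standard deviation of mu_t on the W_2-geodesic between the two Gaussians."""
    return (1 - t) * s0 + t * s1


def convexity_gap(m0: float, s0: float, m1: float, s1: float, t: float, K: float) -> float:
    """Right-hand side minus left-hand side of the K-convexity inequality at time t."""
    lhs = gaussian_entropy(geodesic_std(s0, s1, t))
    rhs = ((1 - t) * gaussian_entropy(s0) + t * gaussian_entropy(s1)
           - K / 2 * t * (1 - t) * gaussian_w2(m0, s0, m1, s1) ** 2)
    return rhs - lhs


ts = [i / 20 for i in range(21)]
print(min(convexity_gap(0.0, 0.5, 3.0, 2.0, t, K=0.0) for t in ts))  # >= 0
print(convexity_gap(0.0, 1.0, 3.0, 1.0, 0.5, K=1.0))  # < 0: K = 1 is not a lower bound on R
```

With $K=0$ the gap reduces to $\log\sigma_t-(1-t)\log\sigma_0-t\log\sigma_1\ge0$, which is concavity of $\log$. The second call uses a pure translation, along which the entropy is constant, so any $K>0$ makes the inequality fail. This is consistent with $\operatorname{Ric}=0$ on $\mathbb R$.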
[/step]