[proofplan]
We first prove the estimate for smooth strictly positive densities with enough decay to justify differentiating the entropy along the Wasserstein geodesic. The Hessian lower bound on $V$ gives $\lambda$-displacement convexity of the relative entropy, and the first variation formula identifies the initial entropy derivative with the $L^2(\mu)$ pairing of $\nabla\log f$ and the initial velocity. Cauchy-Schwarz then bounds this pairing by the product of the Wasserstein distance and the square root of the Fisher information. Finally, the general finite-entropy finite-information case follows by regularisation: the recovery property assumed in the statement supplies smooth approximations along which the quadratic Wasserstein distance and the entropy converge and the Fisher information satisfies a limit-superior bound.
[/proofplan]
[step:Prove the estimate first for smooth positive densities]
Assume first that $f\in C^\infty(\mathbb R^n;(0,\infty))$, that $\mu=f\nu$, and that $f$ and $V$ are sufficiently well behaved at infinity to justify every differentiation under the integral sign and every integration by parts below. We also assume in this preliminary subcase that the constant-speed optimal geodesic from $\mu$ to $\nu$ is a smooth Benamou-Brenier geodesic with smooth initial velocity; this extra regularity is only used for the classical first-variation computation and will be removed by the regularisation step. Since $\nu=e^{-V}\mathcal L^n$, the relative entropy can be written as the free energy
\begin{align*}
\operatorname{Ent}_\nu(\rho\mathcal L^n)=\int_{\mathbb R^n}\rho\log\rho\,d\mathcal L^n+\int_{\mathbb R^n}V\rho\,d\mathcal L^n
\end{align*}
whenever $\rho\mathcal L^n\ll\nu$ and the two integrals are finite, using the normalisation $\int_{\mathbb R^n}e^{-V}\,d\mathcal L^n=1$.
Because $\mu,\nu\in\mathcal P_2(\mathbb R^n)$ by hypothesis, both measures have finite second moment, so $W_2(\mu,\nu)$ is finite and there exists a constant-speed $W_2$-geodesic joining them. We apply [citetheorem:9568], the entropy-plus-$\lambda$-convex-potential displacement convexity theorem: its hypotheses are $V\in C^2(\mathbb R^n;\mathbb R)$ and $D^2V(x)\ge\lambda I_n$ as quadratic forms for every $x\in\mathbb R^n$, exactly as assumed here. Hence the functional $\operatorname{Ent}_\nu$ is $\lambda$-displacement convex along quadratic Wasserstein geodesics. Let
\begin{align*}
(\mu_t)_{t\in[0,1]}:[0,1]\to\mathcal P_2(\mathbb R^n)
\end{align*}
be such a constant-speed $W_2$-geodesic from $\mu_0=\mu$ to $\mu_1=\nu$. Let
\begin{align*}
h:[0,1]\to\mathbb R,\qquad t\mapsto \operatorname{Ent}_\nu(\mu_t).
\end{align*}
Then $\lambda$-convexity gives, for every $t\in[0,1]$,
\begin{align*}
h(t)\le (1-t)h(0)+t h(1)-\frac{\lambda}{2}t(1-t)W_2(\mu,\nu)^2.
\end{align*}
Since $\operatorname{Ent}_\nu(\nu)=0$, subtracting $h(0)$, dividing by $t>0$, and letting $t\downarrow0$ gives
\begin{align*}
h'(0+)\le -\operatorname{Ent}_\nu(\mu)-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
Equivalently,
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le -h'(0+)-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
[guided]
The first reduction is to a class where all differentiations and integrations by parts are classical. We assume that $f:\mathbb R^n\to(0,\infty)$ is smooth, that $\mu=f\nu$, and that the decay at infinity is strong enough to justify every integral identity used below. This temporary assumption will be removed in the last step.
Because $\nu=e^{-V}\mathcal L^n$, a measure $\sigma=\rho\mathcal L^n$ has density $\rho e^V$ with respect to $\nu$. Thus its relative entropy with respect to $\nu$ is the same free energy
\begin{align*}
\operatorname{Ent}_\nu(\sigma)=\int_{\mathbb R^n}\rho\log\rho\,d\mathcal L^n+\int_{\mathbb R^n}V\rho\,d\mathcal L^n
\end{align*}
with no additive constant, since the normalising constant of $\nu$ is already absorbed into $V$. The Hessian assumption
\begin{align*}
\xi^\top D^2V(x)\xi\ge \lambda |\xi|^2
\end{align*}
for every $x\in\mathbb R^n$ and $\xi\in\mathbb R^n$ is precisely the quadratic-form hypothesis $D^2V\ge\lambda I_n$ in [citetheorem:9568], the entropy-plus-$\lambda$-convex-potential displacement convexity theorem. Since $\mu,\nu\in\mathcal P_2(\mathbb R^n)$, the distance $W_2(\mu,\nu)$ is finite and a constant-speed geodesic exists.
Let
\begin{align*}
(\mu_t)_{t\in[0,1]}:[0,1]\to\mathcal P_2(\mathbb R^n)
\end{align*}
be the constant-speed $W_2$-geodesic joining $\mu$ to $\nu$, and define
\begin{align*}
h:[0,1]\to\mathbb R,\qquad t\mapsto \operatorname{Ent}_\nu(\mu_t).
\end{align*}
The $\lambda$-convexity inequality says that, for each $t\in[0,1]$,
\begin{align*}
h(t)\le (1-t)h(0)+t h(1)-\frac{\lambda}{2}t(1-t)W_2(\mu,\nu)^2.
\end{align*}
Here $h(0)=\operatorname{Ent}_\nu(\mu)$ and $h(1)=\operatorname{Ent}_\nu(\nu)=0$, because $\nu$ has density $1$ with respect to itself and therefore
\begin{align*}
\operatorname{Ent}_\nu(\nu)=\int_{\mathbb R^n}1\log1\,d\nu=0.
\end{align*}
Subtract $h(0)$ from the convexity estimate and divide by $t>0$:
\begin{align*}
\frac{h(t)-h(0)}{t}\le h(1)-h(0)-\frac{\lambda}{2}(1-t)W_2(\mu,\nu)^2.
\end{align*}
Letting $t\downarrow0$ gives
\begin{align*}
h'(0+)\le -\operatorname{Ent}_\nu(\mu)-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
This is the entropy drop estimate. Rearranging it yields
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le -h'(0+)-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
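As an illustrative check (the specific measures here are an assumption made only for this example), take $n=1$ and $V(x)=\tfrac12x^2+\tfrac12\log(2\pi)$, so that $\nu=N(0,1)$ and $\lambda=1$, and let $\mu=N(m,1)$ for some $m\in\mathbb R$. The optimal map is the translation $x\mapsto x-m$, so $\mu_t=N((1-t)m,1)$ and $W_2(\mu,\nu)=|m|$. Using the standard formula $\operatorname{Ent}_\nu(N(a,1))=a^2/2$, one finds
\begin{align*}
h(t)&=\frac{(1-t)^2m^2}{2},\\
(1-t)h(0)+t h(1)-\frac{1}{2}t(1-t)m^2&=\frac{(1-t)m^2}{2}-\frac{t(1-t)m^2}{2}=\frac{(1-t)^2m^2}{2},\\
h'(0+)&=-m^2=-\operatorname{Ent}_\nu(\mu)-\frac{1}{2}W_2(\mu,\nu)^2.
\end{align*}
In this example both the $\lambda$-convexity inequality and the entropy drop estimate hold with equality.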
The rest of the proof is devoted to estimating the initial derivative $-h'(0+)$ by the Wasserstein distance times the square root of Fisher information.
[/guided]
[/step]
[step:Compute the first entropy variation along the initial velocity]
We continue to work in the smooth Benamou-Brenier regime specified in the preceding step. Let
\begin{align*}
v_0:\mathbb R^n\to\mathbb R^n
\end{align*}
be the initial tangent velocity field of this constant-speed geodesic $(\mu_t)_{t\in[0,1]}$, meaning that $v_0\in L^2(\mu;\mathbb R^n)$ represents the derivative of $t\mapsto\mu_t$ at $t=0$ through the continuity equation. Thus the continuity equation at $t=0$ is
\begin{align*}
\partial_t\mu_t\big|_{t=0}+\nabla\cdot(v_0\mu)=0
\end{align*}
in the distributional sense, and the Benamou-Brenier characterization of the metric speed gives
\begin{align*}
\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2.
\end{align*}
In the smooth setting, differentiating the entropy along the continuity equation, equivalently the Otto first-variation formula of [citetheorem:9562], gives
\begin{align*}
h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x).
\end{align*}
Indeed, if $\mu_t=\rho_t\mathcal L^n$, then
\begin{align*}
\operatorname{Ent}_\nu(\mu_t)=\int_{\mathbb R^n}\rho_t\log\rho_t\,d\mathcal L^n+\int_{\mathbb R^n}V\rho_t\,d\mathcal L^n.
\end{align*}
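Differentiating under the integral sign at $t=0$ gives
\begin{align*}
h'(0+)=\int_{\mathbb R^n}(\log\rho_0(x)+1+V(x))\partial_t\rho_t(x)\big|_{t=0}\,d\mathcal L^n(x).
\end{align*}
Substituting $\partial_t\rho_t|_{t=0}=-\nabla\cdot(\rho_0 v_0)$ from the continuity equation and integrating by parts, where the constant $1$ contributes nothing because its gradient vanishes, gives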
\begin{align*}
h'(0+)=\int_{\mathbb R^n}\nabla(\log\rho_0+V)(x)\cdot v_0(x)\rho_0(x)\,d\mathcal L^n(x).
\end{align*}
Since $\mu=f\nu=f e^{-V}\mathcal L^n$, we have $\rho_0=f e^{-V}$ and hence $\log\rho_0+V=\log f$. Therefore
\begin{align*}
h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x).
\end{align*}
[guided]
In this preliminary regime, the geodesic is smooth enough that its initial tangent can be represented by a vector field
\begin{align*}
v_0:\mathbb R^n\to\mathbb R^n.
\end{align*}
The statement that $v_0$ is the initial velocity means that the curve satisfies the continuity equation at $t=0$:
\begin{align*}
\partial_t\mu_t\big|_{t=0}+\nabla\cdot(v_0\mu)=0
\end{align*}
in the distributional sense. Since this geodesic has constant speed, the Benamou-Brenier metric-speed identity gives
\begin{align*}
\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2.
\end{align*}
Write $\mu_t=\rho_t\mathcal L^n$. The free-energy expression for the relative entropy is
\begin{align*}
\operatorname{Ent}_\nu(\mu_t)=\int_{\mathbb R^n}\rho_t(x)\log\rho_t(x)\,d\mathcal L^n(x)+\int_{\mathbb R^n}V(x)\rho_t(x)\,d\mathcal L^n(x).
\end{align*}
Differentiating at $t=0$ is justified by the smoothness and decay assumptions in this subcase. The derivative is
\begin{align*}
h'(0+)=\int_{\mathbb R^n}(\log\rho_0(x)+1+V(x))\partial_t\rho_t(x)\big|_{t=0}\,d\mathcal L^n(x).
\end{align*}
The continuity equation gives $\partial_t\rho_t|_{t=0}=-\nabla\cdot(\rho_0v_0)$. Integrating by parts with respect to $\mathcal L^n$ gives
\begin{align*}
h'(0+)=\int_{\mathbb R^n}\nabla(\log\rho_0+V)(x)\cdot v_0(x)\rho_0(x)\,d\mathcal L^n(x).
\end{align*}
Because $\mu=f\nu$ and $d\nu(x)=e^{-V(x)}\,d\mathcal L^n(x)$, one has $\rho_0=f e^{-V}$ and hence $\log\rho_0+V=\log f$. Therefore
\begin{align*}
h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x).
\end{align*}
This identity is the bridge from displacement convexity to Fisher information: it converts the entropy slope into an $L^2(\mu)$ pairing.
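For a concrete instance (an illustrative choice, not part of the statement), let $n=1$, $\nu=N(0,1)$, and $\mu=N(m,1)$ with $m\in\mathbb R$. Then $f(x)=\exp(mx-\tfrac12m^2)$, the optimal map is $x\mapsto x-m$, and the initial velocity is the constant field $v_0\equiv-m$. Hence
\begin{align*}
\nabla\log f(x)&=m,\\
\int_{\mathbb R}\nabla\log f(x)\cdot v_0(x)\,d\mu(x)&=-m^2,\\
\int_{\mathbb R}|v_0(x)|^2\,d\mu(x)&=m^2=W_2(\mu,\nu)^2.
\end{align*}
This agrees with differentiating $t\mapsto\operatorname{Ent}_\nu(N((1-t)m,1))=\tfrac12(1-t)^2m^2$ directly at $t=0$.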
[/guided]
[/step]
[step:Bound the first variation by Fisher information and geodesic length]
Apply the Cauchy-Schwarz inequality in the Hilbert space $L^2(\mu;\mathbb R^n)$ to the vector fields $\nabla\log f$ and $v_0$. This gives
\begin{align*}
-h'(0+)\le |h'(0+)|\le \left(\int_{\mathbb R^n}|\nabla\log f(x)|^2\,d\mu(x)\right)^{1/2}\left(\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)\right)^{1/2}.
\end{align*}
By the definition of $I_\nu(\mu)$ and the metric-speed identity $\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2$ from the previous step,
\begin{align*}
-h'(0+)\le \sqrt{I_\nu(\mu)}W_2(\mu,\nu).
\end{align*}
Combining this with the entropy drop estimate gives
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
Thus the desired inequality holds in the smooth positive case.
[guided]
The first-variation identity from the previous step expresses the initial entropy derivative as an $L^2(\mu)$ inner product:
\begin{align*}
h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x).
\end{align*}
We now apply the Cauchy-Schwarz inequality in the Hilbert space $L^2(\mu;\mathbb R^n)$ to the two vector fields $\nabla\log f$ and $v_0$. This gives
\begin{align*}
-h'(0+)\le |h'(0+)|\le \left(\int_{\mathbb R^n}|\nabla\log f(x)|^2\,d\mu(x)\right)^{1/2}\left(\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)\right)^{1/2}.
\end{align*}
By definition of Fisher information in the smooth positive case,
\begin{align*}
\int_{\mathbb R^n}|\nabla\log f(x)|^2\,d\mu(x)=I_\nu(\mu).
\end{align*}
By the constant-speed geodesic identity,
\begin{align*}
\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2.
\end{align*}
Therefore
\begin{align*}
-h'(0+)\le \sqrt{I_\nu(\mu)}W_2(\mu,\nu).
\end{align*}
Substituting this estimate into the entropy drop inequality
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le -h'(0+)-\frac{\lambda}{2}W_2(\mu,\nu)^2
\end{align*}
gives
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
This proves the HWI inequality in the smooth positive regime.
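To see that the estimate can be sharp, consider the illustrative case (an assumption for this example only) $n=1$, $\nu=N(0,1)$, $\lambda=1$, and $\mu=N(m,1)$ with $m\in\mathbb R$, for which $f(x)=\exp(mx-\tfrac12m^2)$. Then
\begin{align*}
\operatorname{Ent}_\nu(\mu)&=\frac{m^2}{2},\qquad I_\nu(\mu)=\int_{\mathbb R}m^2\,d\mu=m^2,\qquad W_2(\mu,\nu)=|m|,\\
W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2&=m^2-\frac{m^2}{2}=\frac{m^2}{2}.
\end{align*}
Both sides equal $m^2/2$, so equality holds. This reflects that $\nabla\log f\equiv m$ and $v_0\equiv-m$ are linearly dependent here, which is the equality case of Cauchy-Schwarz.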
[/guided]
[/step]
[step:Pass to finite entropy and finite Fisher information by regularisation]
For a general measure $\mu=f\nu$ satisfying the hypotheses, use the weighted Fisher-information recovery property assumed in the statement. It provides a sequence of smooth strictly positive probability densities
\begin{align*}
f_k:\mathbb R^n\to(0,\infty)
\end{align*}
such that the smooth HWI argument applies to each $\mu_k=f_k\nu$ and
\begin{align*}
W_2(\mu_k,\mu)\to0,
\end{align*}
\begin{align*}
\operatorname{Ent}_\nu(\mu_k)\to\operatorname{Ent}_\nu(\mu),
\end{align*}
and
\begin{align*}
\limsup_{k\to\infty}I_\nu(\mu_k)\le I_\nu(\mu).
\end{align*}
Applying the smooth inequality to $\mu_k$ gives
\begin{align*}
\operatorname{Ent}_\nu(\mu_k)\le W_2(\mu_k,\nu)\sqrt{I_\nu(\mu_k)}-\frac{\lambda}{2}W_2(\mu_k,\nu)^2.
\end{align*}
Since $W_2(\mu_k,\mu)\to0$, the triangle inequality for $W_2$ gives
\begin{align*}
|W_2(\mu_k,\nu)-W_2(\mu,\nu)|\le W_2(\mu_k,\mu)\to0.
\end{align*}
Thus $W_2(\mu_k,\nu)\to W_2(\mu,\nu)$. Since the square-root map is increasing and continuous on $[0,\infty)$, the Fisher-information limsup bound gives
\begin{align*}
\limsup_{k\to\infty}\sqrt{I_\nu(\mu_k)}\le \sqrt{I_\nu(\mu)}.
\end{align*}
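Because $W_2(\mu_k,\nu)$ converges to a finite nonnegative limit and $\sqrt{I_\nu(\mu_k)}$ is eventually bounded, the limit superior of their product is at most $W_2(\mu,\nu)\sqrt{I_\nu(\mu)}$. Taking the limit on the entropy side and the limit superior on the right-hand side therefore yields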
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
This is the claimed Otto-Villani HWI inequality for every admissible $\mu$.
[guided]
The smooth proof used extra regularity only to justify the first-variation computation along a smooth geodesic. To remove that extra hypothesis, we use the weighted Fisher-information recovery property included in the statement. This assumption supplies smooth strictly positive probability densities
\begin{align*}
f_k:\mathbb R^n\to(0,\infty)
\end{align*}
for which the smooth HWI argument applies. With $\mu_k=f_k\nu$, it gives
\begin{align*}
W_2(\mu_k,\mu)\to0,
\end{align*}
\begin{align*}
\operatorname{Ent}_\nu(\mu_k)\to\operatorname{Ent}_\nu(\mu),
\end{align*}
and
\begin{align*}
\limsup_{k\to\infty}I_\nu(\mu_k)\le I_\nu(\mu).
\end{align*}
For each $k$, the smooth HWI inequality gives
\begin{align*}
\operatorname{Ent}_\nu(\mu_k)\le W_2(\mu_k,\nu)\sqrt{I_\nu(\mu_k)}-\frac{\lambda}{2}W_2(\mu_k,\nu)^2.
\end{align*}
Since $W_2(\mu_k,\mu)\to0$, the triangle inequality gives
\begin{align*}
|W_2(\mu_k,\nu)-W_2(\mu,\nu)|\le W_2(\mu_k,\mu)\to0,
\end{align*}
so $W_2(\mu_k,\nu)\to W_2(\mu,\nu)$. The square-root function is increasing and continuous on $[0,\infty)$, hence
\begin{align*}
\limsup_{k\to\infty}\sqrt{I_\nu(\mu_k)}\le \sqrt{I_\nu(\mu)}.
\end{align*}
Passing to the limit on the entropy side and taking the limit superior on the right gives
\begin{align*}
\operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2.
\end{align*}
This proves the inequality for the full finite-entropy finite-Fisher-information class in the statement.
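The limit passage can be spelled out as follows; the abbreviations $a_k$, $b_k$, $a$, $b$ are introduced only for this sketch. Write $a_k=W_2(\mu_k,\nu)$, $b_k=\sqrt{I_\nu(\mu_k)}$, $a=W_2(\mu,\nu)$, and $b=\sqrt{I_\nu(\mu)}$, so that $a_k\to a$ with $a\ge0$ and $\limsup_{k\to\infty}b_k\le b<\infty$. Then
\begin{align*}
\limsup_{k\to\infty}a_kb_k&\le\Big(\lim_{k\to\infty}a_k\Big)\Big(\limsup_{k\to\infty}b_k\Big)\le ab,\\
\operatorname{Ent}_\nu(\mu)=\lim_{k\to\infty}\operatorname{Ent}_\nu(\mu_k)&\le\limsup_{k\to\infty}\Big(a_kb_k-\frac{\lambda}{2}a_k^2\Big)\le ab-\frac{\lambda}{2}a^2.
\end{align*}
The first line uses that $a_k\ge0$ converges and that $(b_k)$ is eventually bounded. The second uses $a_k^2\to a^2$, so the sign of $\lambda$ plays no role in the limit.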
[/guided]
[/step]