Otto-Villani HWI Inequality — Statement & Proof

Theorem

Proof

[proofplan] We first prove the estimate for smooth strictly positive densities with enough decay to justify differentiating the entropy along the Wasserstein geodesic. The Hessian lower bound on $V$ gives $\lambda$-displacement convexity of the relative entropy, and the first variation formula identifies the initial entropy derivative with the $L^2(\mu)$ pairing of $\nabla\log f$ and the initial velocity. Cauchy-Schwarz then bounds this pairing by the product of the Wasserstein distance and the square root of the Fisher information. Finally, the general finite-entropy finite-information case follows by approximation: the Fisher-information recovery property assumed in the statement supplies smooth positive densities along which the entropy and the quadratic Wasserstein distance converge, while the Fisher information does not increase in the limit. [/proofplan] [step:Prove the estimate first for smooth positive densities] Assume first that $f\in C^\infty(\mathbb R^n;(0,\infty))$, that $\mu=f\nu$, and that $f$ and $V$ behave well enough at infinity to justify every integration by parts below. We also assume in this preliminary subcase that the constant-speed optimal geodesic from $\mu$ to $\nu$ is a smooth Benamou-Brenier geodesic with smooth initial velocity; this extra regularity is only used for the classical first-variation computation and will be removed by the regularisation step. Since $\nu=e^{-V}\mathcal L^n$, the relative entropy can be written as the free energy \begin{align*} \operatorname{Ent}_\nu(\rho\mathcal L^n)=\int_{\mathbb R^n}\rho\log\rho\,d\mathcal L^n+\int_{\mathbb R^n}V\rho\,d\mathcal L^n \end{align*} whenever $\rho\mathcal L^n\ll\nu$ and the two integrals are finite, using the normalisation $\int_{\mathbb R^n}e^{-V}\,d\mathcal L^n=1$. Because $\mu\in\mathcal P_2(\mathbb R^n)$ and $\nu\in\mathcal P_2(\mathbb R^n)$ by hypothesis, both measures have finite second moment, the quadratic Wasserstein distance between them is finite, and there exists a constant-speed $W_2$-geodesic joining them. 
We apply [citetheorem:9568], the entropy-plus-$\lambda$-convex-potential displacement convexity theorem: its hypotheses are $V\in C^2(\mathbb R^n;\mathbb R)$ and $D^2V(x)\ge\lambda I_n$ as quadratic forms for every $x\in\mathbb R^n$, exactly as assumed here. Hence the functional $\operatorname{Ent}_\nu$ is $\lambda$-displacement convex along quadratic Wasserstein geodesics. Let \begin{align*} (\mu_t)_{t\in[0,1]}:[0,1]\to\mathcal P_2(\mathbb R^n) \end{align*} be such a constant-speed $W_2$-geodesic from $\mu_0=\mu$ to $\mu_1=\nu$. Let \begin{align*} h:[0,1]\to\mathbb R,\qquad t\mapsto \operatorname{Ent}_\nu(\mu_t). \end{align*} Then $\lambda$-convexity gives, for every $t\in[0,1]$, \begin{align*} h(t)\le (1-t)h(0)+t h(1)-\frac{\lambda}{2}t(1-t)W_2(\mu,\nu)^2. \end{align*} Since $\operatorname{Ent}_\nu(\nu)=0$, subtracting $h(0)$, dividing by $t>0$, and letting $t\downarrow0$ gives \begin{align*} h'(0+)\le -\operatorname{Ent}_\nu(\mu)-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} Equivalently, \begin{align*} \operatorname{Ent}_\nu(\mu)\le -h'(0+)-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} [guided] The first reduction is to a class where all differentiations and integrations by parts are classical. We assume that $f:\mathbb R^n\to(0,\infty)$ is smooth, that $\mu=f\nu$, and that the decay at infinity is strong enough to justify every integral identity used below. This temporary assumption will be removed in the last step. Because $\nu=e^{-V}\mathcal L^n$, a measure $\sigma=\rho\mathcal L^n$ has density $\rho e^V$ with respect to $\nu$. Since $\log(\rho e^V)=\log\rho+V$, its relative entropy with respect to $\nu$ is exactly the free energy \begin{align*} \operatorname{Ent}_\nu(\sigma)=\int_{\mathbb R^n}\rho\log\rho\,d\mathcal L^n+\int_{\mathbb R^n}V\rho\,d\mathcal L^n \end{align*} with no additive constant, because the normalisation is already encoded in $\nu=e^{-V}\mathcal L^n$. 
The Hessian assumption \begin{align*} \xi^\top D^2V(x)\xi\ge \lambda |\xi|^2 \end{align*} for every $x\in\mathbb R^n$ and $\xi\in\mathbb R^n$ is precisely the quadratic-form hypothesis $D^2V\ge\lambda I_n$ in [citetheorem:9568], the entropy-plus-$\lambda$-convex-potential displacement convexity theorem. Since $\mu,\nu\in\mathcal P_2(\mathbb R^n)$, the distance $W_2(\mu,\nu)$ is finite and a constant-speed geodesic exists. Let \begin{align*} (\mu_t)_{t\in[0,1]}:[0,1]\to\mathcal P_2(\mathbb R^n) \end{align*} be the constant-speed $W_2$-geodesic joining $\mu$ to $\nu$, and define \begin{align*} h:[0,1]\to\mathbb R,\qquad t\mapsto \operatorname{Ent}_\nu(\mu_t). \end{align*} The $\lambda$-convexity inequality says that, for each $t\in[0,1]$, \begin{align*} h(t)\le (1-t)h(0)+t h(1)-\frac{\lambda}{2}t(1-t)W_2(\mu,\nu)^2. \end{align*} Here $h(0)=\operatorname{Ent}_\nu(\mu)$ and $h(1)=\operatorname{Ent}_\nu(\nu)=0$, because $\nu$ has density $1$ with respect to itself and therefore \begin{align*} \operatorname{Ent}_\nu(\nu)=\int_{\mathbb R^n}1\log1\,d\nu=0. \end{align*} Subtract $h(0)$ from the convexity estimate and divide by $t>0$: \begin{align*} \frac{h(t)-h(0)}{t}\le h(1)-h(0)-\frac{\lambda}{2}(1-t)W_2(\mu,\nu)^2. \end{align*} Letting $t\downarrow0$ gives \begin{align*} h'(0+)\le -\operatorname{Ent}_\nu(\mu)-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} This is the entropy drop estimate. Rearranging it yields \begin{align*} \operatorname{Ent}_\nu(\mu)\le -h'(0+)-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} The rest of the proof is devoted to estimating the initial derivative $-h'(0+)$ by the Wasserstein distance times the square root of Fisher information. [/guided] [/step] [step:Compute the first entropy variation along the initial velocity] In this smooth subcase, we work in the smooth Benamou-Brenier regime specified in the preceding step. 
Let \begin{align*} v_0:\mathbb R^n\to\mathbb R^n \end{align*} be the initial tangent velocity field of this constant-speed geodesic $(\mu_t)_{t\in[0,1]}$, meaning that $v_0\in L^2(\mu;\mathbb R^n)$ represents the derivative of $t\mapsto\mu_t$ at $t=0$ through the continuity equation. Thus the continuity equation at $t=0$ is \begin{align*} \partial_t\mu_t\big|_{t=0}+\nabla\cdot(v_0\mu)=0 \end{align*} in the distributional sense, and the Benamou-Brenier characterization of the metric speed gives \begin{align*} \int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2. \end{align*} In the smooth setting, differentiating the entropy along the continuity equation, equivalently the Otto first-variation formula of [citetheorem:9562], gives \begin{align*} h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x). \end{align*} Indeed, if $\mu_t=\rho_t\mathcal L^n$, then \begin{align*} \operatorname{Ent}_\nu(\mu_t)=\int_{\mathbb R^n}\rho_t\log\rho_t\,d\mathcal L^n+\int_{\mathbb R^n}V\rho_t\,d\mathcal L^n. \end{align*} Differentiating at $t=0$ and using $\partial_t\rho_t|_{t=0}=-\nabla\cdot(\rho_0 v_0)$ gives \begin{align*} h'(0+)=\int_{\mathbb R^n}(\log\rho_0(x)+1+V(x))\partial_t\rho_t(x)\big|_{t=0}\,d\mathcal L^n(x). \end{align*} Integration by parts gives \begin{align*} h'(0+)=\int_{\mathbb R^n}\nabla(\log\rho_0+V)(x)\cdot v_0(x)\rho_0(x)\,d\mathcal L^n(x). \end{align*} Since $\mu=f\nu=f e^{-V}\mathcal L^n$, we have $\rho_0=f e^{-V}$ and hence $\log\rho_0+V=\log f$. Therefore \begin{align*} h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x). \end{align*} [guided] In this preliminary regime, the geodesic is smooth enough that its initial tangent can be represented by a vector field \begin{align*} v_0:\mathbb R^n\to\mathbb R^n. \end{align*} The statement that $v_0$ is the initial velocity means that the curve satisfies the continuity equation at $t=0$: \begin{align*} \partial_t\mu_t\big|_{t=0}+\nabla\cdot(v_0\mu)=0 \end{align*} in the distributional sense. 
Since this geodesic has constant speed, the Benamou-Brenier metric-speed identity gives \begin{align*} \int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2. \end{align*} Write $\mu_t=\rho_t\mathcal L^n$. The free-energy expression for the relative entropy is \begin{align*} \operatorname{Ent}_\nu(\mu_t)=\int_{\mathbb R^n}\rho_t(x)\log\rho_t(x)\,d\mathcal L^n(x)+\int_{\mathbb R^n}V(x)\rho_t(x)\,d\mathcal L^n(x). \end{align*} Differentiating at $t=0$ is justified by the smoothness and decay assumptions in this subcase. The derivative is \begin{align*} h'(0+)=\int_{\mathbb R^n}(\log\rho_0(x)+1+V(x))\partial_t\rho_t(x)\big|_{t=0}\,d\mathcal L^n(x). \end{align*} The continuity equation gives $\partial_t\rho_t|_{t=0}=-\nabla\cdot(\rho_0v_0)$. Integrating by parts with respect to $\mathcal L^n$ gives \begin{align*} h'(0+)=\int_{\mathbb R^n}\nabla(\log\rho_0+V)(x)\cdot v_0(x)\rho_0(x)\,d\mathcal L^n(x). \end{align*} Because $\mu=f\nu$ and $d\nu(x)=e^{-V(x)}\,d\mathcal L^n(x)$, one has $\rho_0=f e^{-V}$ and hence $\log\rho_0+V=\log f$. Therefore \begin{align*} h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x). \end{align*} This identity is the bridge from displacement convexity to Fisher information: it converts the entropy slope into an $L^2(\mu)$ pairing. [/guided] [/step] [step:Bound the first variation by Fisher information and geodesic length] Apply the Cauchy-Schwarz inequality in the Hilbert space $L^2(\mu;\mathbb R^n)$ to the vector fields $\nabla\log f$ and $v_0$. This gives \begin{align*} -h'(0+)\le |h'(0+)|\le \left(\int_{\mathbb R^n}|\nabla\log f(x)|^2\,d\mu(x)\right)^{1/2}\left(\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)\right)^{1/2}. \end{align*} By the definition of $I_\nu(\mu)$ in the smooth positive case and the metric-speed identity $\int_{\mathbb R^n}|v_0|^2\,d\mu=W_2(\mu,\nu)^2$, \begin{align*} -h'(0+)\le \sqrt{I_\nu(\mu)}W_2(\mu,\nu). \end{align*} Combining this with the entropy drop estimate gives \begin{align*} \operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2. 
\end{align*} Thus the desired inequality holds in the smooth positive case. [guided] The first-variation identity from the previous step expresses the initial entropy derivative as an $L^2(\mu)$ inner product: \begin{align*} h'(0+)=\int_{\mathbb R^n}\nabla\log f(x)\cdot v_0(x)\,d\mu(x). \end{align*} We now apply the Cauchy-Schwarz inequality in the Hilbert space $L^2(\mu;\mathbb R^n)$ to the two vector fields $\nabla\log f$ and $v_0$. This gives \begin{align*} -h'(0+)\le |h'(0+)|\le \left(\int_{\mathbb R^n}|\nabla\log f(x)|^2\,d\mu(x)\right)^{1/2}\left(\int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)\right)^{1/2}. \end{align*} By definition of Fisher information in the smooth positive case, \begin{align*} \int_{\mathbb R^n}|\nabla\log f(x)|^2\,d\mu(x)=I_\nu(\mu). \end{align*} By the constant-speed geodesic identity, \begin{align*} \int_{\mathbb R^n}|v_0(x)|^2\,d\mu(x)=W_2(\mu,\nu)^2. \end{align*} Therefore \begin{align*} -h'(0+)\le \sqrt{I_\nu(\mu)}W_2(\mu,\nu). \end{align*} Substituting this estimate into the entropy drop inequality \begin{align*} \operatorname{Ent}_\nu(\mu)\le -h'(0+)-\frac{\lambda}{2}W_2(\mu,\nu)^2 \end{align*} gives \begin{align*} \operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} This proves the HWI inequality in the smooth positive regime. [/guided] [/step] [step:Pass to finite entropy and finite Fisher information by regularisation] For a general measure $\mu=f\nu$ satisfying the hypotheses, use the weighted Fisher-information recovery property assumed in the statement. It provides a sequence of smooth strictly positive probability densities \begin{align*} f_k:\mathbb R^n\to(0,\infty) \end{align*} such that the smooth HWI argument applies to each $\mu_k=f_k\nu$ and \begin{align*} W_2(\mu_k,\mu)\to0, \end{align*} \begin{align*} \operatorname{Ent}_\nu(\mu_k)\to\operatorname{Ent}_\nu(\mu), \end{align*} and \begin{align*} \limsup_{k\to\infty}I_\nu(\mu_k)\le I_\nu(\mu). 
\end{align*} Applying the smooth inequality to $\mu_k$ gives \begin{align*} \operatorname{Ent}_\nu(\mu_k)\le W_2(\mu_k,\nu)\sqrt{I_\nu(\mu_k)}-\frac{\lambda}{2}W_2(\mu_k,\nu)^2. \end{align*} Since $W_2(\mu_k,\mu)\to0$, the triangle inequality for $W_2$ gives \begin{align*} |W_2(\mu_k,\nu)-W_2(\mu,\nu)|\le W_2(\mu_k,\mu)\to0. \end{align*} Thus $W_2(\mu_k,\nu)\to W_2(\mu,\nu)$. Since the square-root map is increasing and continuous on $[0,\infty)$, the Fisher-information limsup bound gives \begin{align*} \limsup_{k\to\infty}\sqrt{I_\nu(\mu_k)}\le \sqrt{I_\nu(\mu)}. \end{align*} Because the nonnegative factors $W_2(\mu_k,\nu)$ converge and $\sqrt{I_\nu(\mu_k)}$ stays bounded, the limit superior of the product $W_2(\mu_k,\nu)\sqrt{I_\nu(\mu_k)}$ is at most $W_2(\mu,\nu)\sqrt{I_\nu(\mu)}$. Taking the limit on the entropy side and the limit superior on the right-hand side therefore yields \begin{align*} \operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} This is the claimed Otto-Villani HWI inequality for every admissible $\mu$. [guided] The smooth proof used extra regularity only to justify the first-variation computation along a smooth geodesic. To remove that extra hypothesis, we use the weighted Fisher-information recovery property included in the statement. This assumption supplies smooth strictly positive probability densities \begin{align*} f_k:\mathbb R^n\to(0,\infty) \end{align*} for which the smooth HWI argument applies. With $\mu_k=f_k\nu$, it gives \begin{align*} W_2(\mu_k,\mu)\to0, \end{align*} \begin{align*} \operatorname{Ent}_\nu(\mu_k)\to\operatorname{Ent}_\nu(\mu), \end{align*} and \begin{align*} \limsup_{k\to\infty}I_\nu(\mu_k)\le I_\nu(\mu). \end{align*} For each $k$, the smooth HWI inequality gives \begin{align*} \operatorname{Ent}_\nu(\mu_k)\le W_2(\mu_k,\nu)\sqrt{I_\nu(\mu_k)}-\frac{\lambda}{2}W_2(\mu_k,\nu)^2. \end{align*} Since $W_2(\mu_k,\mu)\to0$, the triangle inequality gives \begin{align*} |W_2(\mu_k,\nu)-W_2(\mu,\nu)|\le W_2(\mu_k,\mu)\to0, \end{align*} so $W_2(\mu_k,\nu)\to W_2(\mu,\nu)$. 
The square-root function is increasing and continuous on $[0,\infty)$, hence \begin{align*} \limsup_{k\to\infty}\sqrt{I_\nu(\mu_k)}\le \sqrt{I_\nu(\mu)}. \end{align*} Since the nonnegative factors $W_2(\mu_k,\nu)$ converge and $\sqrt{I_\nu(\mu_k)}$ stays bounded, the limit superior of the product $W_2(\mu_k,\nu)\sqrt{I_\nu(\mu_k)}$ is at most $W_2(\mu,\nu)\sqrt{I_\nu(\mu)}$. Passing to the limit on the entropy side and taking the limit superior on the right therefore gives \begin{align*} \operatorname{Ent}_\nu(\mu)\le W_2(\mu,\nu)\sqrt{I_\nu(\mu)}-\frac{\lambda}{2}W_2(\mu,\nu)^2. \end{align*} This proves the inequality for the full finite-entropy finite-Fisher-information class in the statement. [/guided] [/step]
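
As a sanity check on the final inequality, the following Python sketch evaluates both sides in a one-dimensional Gaussian example that is not part of the proof above. It assumes $\nu=N(0,1)$, so that $V(x)=x^2/2+\tfrac12\log(2\pi)$ and $\lambda=1$, and $\mu=N(m,s^2)$. Under these assumptions the standard closed forms are $\operatorname{Ent}_\nu(\mu)=\tfrac12(s^2+m^2-1)-\log s$, $W_2(\mu,\nu)^2=m^2+(s-1)^2$, and $I_\nu(\mu)=m^2+(s-1/s)^2$. The function names `gaussian_hwi_terms`, `hwi_gap`, and `entropy_by_quadrature` are hypothetical helpers introduced only for this illustration.

```python
import math


def gaussian_hwi_terms(m, s):
    """Closed forms for mu = N(m, s^2) relative to nu = N(0, 1) (assumed example)."""
    entropy = 0.5 * (s * s + m * m - 1.0) - math.log(s)  # Ent_nu(mu)
    w2 = math.sqrt(m * m + (s - 1.0) ** 2)               # W_2(mu, nu)
    fisher = m * m + (s - 1.0 / s) ** 2                  # I_nu(mu)
    return entropy, w2, fisher


def hwi_gap(m, s, lam=1.0):
    """Right-hand side minus left-hand side of HWI; the theorem predicts >= 0."""
    entropy, w2, fisher = gaussian_hwi_terms(m, s)
    return w2 * math.sqrt(fisher) - 0.5 * lam * w2 ** 2 - entropy


def entropy_by_quadrature(m, s, half_width=12.0, n=20001):
    """Trapezoid rule for the free energy: integral of rho * (log rho + V) dx."""
    a, b = m - half_width * s, m + half_width * s
    dx = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * dx
        # log-density of N(m, s^2) with respect to Lebesgue measure
        log_rho = -((x - m) ** 2) / (2 * s * s) - math.log(s * math.sqrt(2 * math.pi))
        # potential V of nu = N(0, 1), including the normalising constant
        v = 0.5 * x * x + 0.5 * math.log(2 * math.pi)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(log_rho) * (log_rho + v) * dx
    return total


if __name__ == "__main__":
    for m in (-2.0, 0.0, 0.5, 1.0):
        for s in (0.25, 0.5, 1.0, 2.0, 4.0):
            print(f"m={m:5.2f} s={s:4.2f} gap={hwi_gap(m, s):.6f}")
```

In this example the gap vanishes when $s=1$, that is, when $\mu$ is a translate of $\nu$, which shows that the constant in front of $W_2(\mu,\nu)\sqrt{I_\nu(\mu)}$ cannot be improved in general. The quadrature helper only cross-checks the closed-form entropy against the free-energy formula used in the first step.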

Otto-Villani HWI Inequality (Theorem # 9570)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Otto-Villani HWI Inequality (Theorem # 9570)

Discussion

Proof

Explore Further