Otto-Villani Theorem: Talagrand $T_2$ Inequality from Logarithmic Sobolev Inequality

Theorem

Let $n\in\mathbb N$, let $\lambda>0$, and let $\nu$ be a Borel probability measure on $\mathbb R^n$ with finite second moment. For a locally Lipschitz function $h:\mathbb R^n\to\mathbb R$, define its Euclidean local slope by \begin{align*} |\nabla h|(x):=\limsup_{y\to x,\,y\ne x}\frac{|h(y)-h(x)|}{|y-x|}. \end{align*} When $h$ is differentiable at $x$, this local slope agrees with the Euclidean norm of the classical gradient. Assume that $\nu$ satisfies the logarithmic Sobolev inequality with constant $\lambda$ in the following Euclidean local-slope form: for every bounded locally Lipschitz function $g:\mathbb R^n\to\mathbb R$ satisfying \begin{align*} \int_{\mathbb R^n} g^2\,d\nu(x)=1, \end{align*} one has \begin{align*} \operatorname{Ent}_\nu(g^2\nu)\le \frac{2}{\lambda}\int_{\mathbb R^n}|\nabla g|^2\,d\nu(x). \end{align*} Here, if $f:\mathbb R^n\to[0,\infty]$ is a $\nu$-density with $\int_{\mathbb R^n}f\,d\nu(x)=1$, then \begin{align*} \operatorname{Ent}_\nu(f\nu):=\int_{\mathbb R^n} f\log f\,d\nu(x) \end{align*} whenever the positive part of $f\log f$ is $\nu$-integrable. For a bounded locally Lipschitz probability density $f$, the Fisher information is \begin{align*} I_\nu(f\nu):=\int_{\mathbb R^n}\frac{|\nabla f|^2}{f}\,d\nu(x), \end{align*} where $|\nabla f|$ is the Euclidean local slope. Equivalently, writing $g=f^{1/2}$, so that $|\nabla g|^2=|\nabla f|^2/(4f)$ at points where $f>0$, the hypothesis says \begin{align*} \operatorname{Ent}_\nu(f\nu)\le \frac{1}{2\lambda}I_\nu(f\nu) \end{align*} for such densities $f$ with finite Fisher information. Let $W_2$ denote the quadratic Wasserstein distance on Borel probability measures on $\mathbb R^n$ with finite second moments, defined by \begin{align*} W_2(\mu,\nu)^2:=\inf_{\pi\in\Pi(\mu,\nu)}\int_{\mathbb R^n\times\mathbb R^n}|x-y|^2\,d\pi(x,y), \end{align*} where $\Pi(\mu,\nu)$ is the set of Borel couplings of $\mu$ and $\nu$.
Then every Borel probability measure $\mu$ on $\mathbb R^n$ such that $\mu\ll\nu$, $\mu$ has finite second moment, and $\operatorname{Ent}_\nu(\mu)<\infty$ satisfies \begin{align*} W_2(\mu,\nu)^2\le \frac{2}{\lambda}\operatorname{Ent}_\nu(\mu). \end{align*}
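As a sanity check, the inequality can be verified in closed form for one-dimensional Gaussians. The sketch below assumes, as a standard fact not stated above, that $\nu=N(0,1/\lambda)$ satisfies the logarithmic Sobolev inequality with constant $\lambda$ in the present normalization, and it uses the classical closed formulas for $W_2$ and for the relative entropy between $\mu=N(m,s^2)$ and $\nu$. All function names are illustrative and not part of the theorem.

```python
import math


def w2_squared(m, s, sigma):
    """Squared W_2 distance between N(m, s^2) and N(0, sigma^2) on the real line."""
    return m * m + (s - sigma) ** 2


def relative_entropy(m, s, sigma):
    """Ent_nu(mu) for mu = N(m, s^2) and nu = N(0, sigma^2)."""
    return math.log(sigma / s) + (s * s + m * m) / (2.0 * sigma * sigma) - 0.5


def t2_gap(m, s, lam):
    """Right side minus left side of W_2^2 <= (2/lam) Ent, with nu = N(0, 1/lam).

    Assumption: N(0, 1/lam) has log-Sobolev constant lam in the
    normalization Ent(g^2 nu) <= (2/lam) * int |g'|^2 dnu.
    """
    sigma = 1.0 / math.sqrt(lam)
    return (2.0 / lam) * relative_entropy(m, s, sigma) - w2_squared(m, s, sigma)


if __name__ == "__main__":
    for m, s in [(0.0, 0.5), (1.0, 1.0), (2.0, 3.0)]:
        print(m, s, t2_gap(m, s, lam=1.0))
```

Expanding the two formulas shows that the gap equals $2\sigma^2(r-1-\log r)$ with $\sigma^2=1/\lambda$ and $r=s/\sigma$, so it is nonnegative and vanishes exactly when $\mu$ is a translate of $\nu$.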

Proof

[proofplan] We prove the transport inequality through the Hopf-Lax semigroup and the Kantorovich dual formulation for the quadratic cost. The logarithmic Sobolev inequality controls the derivative of a normalized exponential moment along the Hopf-Lax flow, while the Hamilton-Jacobi inequality gives the sign needed for monotonicity. This monotonicity yields an infimum-convolution inequality, and the entropy variational formula converts it into a dual transport bound. Taking the supremum in the quadratic Kantorovich dual gives the asserted $T_2$ inequality. [/proofplan] [step:Define the Hopf-Lax flow and the exponential moment functional] Let $\varphi:\mathbb R^n\to\mathbb R$ be a bounded Lipschitz function. For $t>0$, define the Hopf-Lax transform \begin{align*} Q_t\varphi:\mathbb R^n\to\mathbb R,\qquad Q_t\varphi(x):=\inf_{y\in\mathbb R^n}\left\{\varphi(y)+\frac{|x-y|^2}{2t}\right\}. \end{align*} For every locally Lipschitz map $w:\mathbb R^n\to\mathbb R$, the notation $|\nabla w|(x)$ means the Euclidean local slope defined in the theorem statement. Set \begin{align*} u:(0,1]\times\mathbb R^n\to\mathbb R,\qquad u(t,x):=Q_t\varphi(x). \end{align*} Let $L:=\operatorname{Lip}(\varphi)$ denote the global Lipschitz constant of $\varphi$. We use the standard Hopf-Lax Hamilton-Jacobi theorem for bounded Lipschitz data on Euclidean space: $u$ is locally Lipschitz on $(0,1]\times\mathbb R^n$, $Q_t\varphi$ is $L$-Lipschitz for every $t>0$, $|\partial_tu(t,x)|\le L^2/2$ for a.e. $(t,x)$, and the local-slope Hamilton-Jacobi inequality \begin{align*} \partial_t u(t,x)+\frac{1}{2}|\nabla u(t,\cdot)|^2(x)\le0 \end{align*} holds for a.e. $t\in(0,1]$ and every $x\in\mathbb R^n$. At Lebesgue-a.e. differentiability point $(t,x)$ the equality version holds with the classical gradient. In particular, \begin{align*} |\nabla u(t,x)|\le L,\qquad |\partial_t u(t,x)|\le \frac{L^2}{2} \end{align*} for a.e. $(t,x)\in(0,1]\times\mathbb R^n$. 
Moreover $Q_t\varphi(x)\to\varphi(x)$ for every $x\in\mathbb R^n$ as $t\downarrow0$, and $|Q_t\varphi(x)|\le \|\varphi\|_\infty$. Define the exponential moment map \begin{align*} Z:(0,1]\to(0,\infty),\qquad Z(t):=\int_{\mathbb R^n}e^{\lambda t u(t,x)}\,d\nu(x) \end{align*} and \begin{align*} \Phi:(0,1]\to\mathbb R,\qquad \Phi(t):=\frac{1}{\lambda t}\log Z(t). \end{align*} Because $u$ is bounded, $Z(t)$ is finite and strictly positive for every $t\in(0,1]$. [/step] [step:Differentiate the normalized exponential moment] For a.e. $t\in(0,1]$, define the probability density \begin{align*} h_t:\mathbb R^n\to(0,\infty),\qquad h_t(x):=\frac{e^{\lambda t u(t,x)}}{Z(t)}. \end{align*} Then $\int_{\mathbb R^n}h_t\,d\nu=1$. Fix $0<a<1$. On $[a,1]\times\mathbb R^n$, the function $u$ is bounded by $\|\varphi\|_\infty$ and the a.e. bound $|\partial_tu|\le L^2/2$ gives \begin{align*} \left|\lambda\bigl(u(t,x)+t\partial_tu(t,x)\bigr)e^{\lambda t u(t,x)}\right|\le \lambda\left(\|\varphi\|_\infty+\frac{L^2}{2}\right)e^{\lambda\|\varphi\|_\infty} \end{align*} for a.e. $(t,x)\in[a,1]\times\mathbb R^n$. Since $\nu$ is a probability measure, this constant is integrable with respect to $\nu$. Dominated differentiation on $[a,1]$ therefore gives, for a.e. $t\in[a,1]$, \begin{align*} Z'(t)=\lambda\int_{\mathbb R^n}\bigl(u(t,x)+t\partial_tu(t,x)\bigr)e^{\lambda t u(t,x)}\,d\nu(x). \end{align*} As $a\in(0,1)$ was arbitrary, the formula holds for a.e. $t\in(0,1]$. Therefore \begin{align*} \Phi'(t)=\int_{\mathbb R^n}\partial_tu(t,x)h_t(x)\,d\nu(x)+\frac{1}{\lambda t^2}\operatorname{Ent}_\nu(h_t\nu). \end{align*} Indeed, \begin{align*} \operatorname{Ent}_\nu(h_t\nu)=\int_{\mathbb R^n}h_t(x)\log h_t(x)\,d\nu(x)=\lambda t\int_{\mathbb R^n}u(t,x)h_t(x)\,d\nu(x)-\log Z(t), \end{align*} which is exactly the entropy term appearing after differentiating $\Phi$. 
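The quantities $Z(t)$ and $\Phi(t)$ can be evaluated numerically in a simple case. The following sketch assumes $n=1$, $\lambda=1$, $\nu=N(0,1)$ (which satisfies the logarithmic Sobolev hypothesis with constant $1$, a standard fact not proved here) and the test function $\varphi(x)=\sin(2x)$. The Hopf-Lax infimum and the integrals are only approximated on a grid; the grid step, the truncation radius, and all function names are illustrative choices.

```python
import math

LAM = 1.0  # assumed log-Sobolev constant of nu = N(0, 1)
H = 0.02   # grid step used for the infimum and for the integrals


def varphi(x):
    """Bounded Lipschitz test function (an illustrative choice)."""
    return math.sin(2.0 * x)


def hopf_lax(t, x, reach=2.0):
    """Grid approximation of Q_t varphi(x) = inf_y { varphi(y) + |x-y|^2 / (2t) }.

    Since |varphi| <= 1, any minimiser y satisfies |x - y|^2 <= 4t,
    so for t <= 1 a search window of half-width 2 is enough.
    """
    steps = int(round(reach / H))
    return min(
        varphi(x + k * H) + (k * H) ** 2 / (2.0 * t)
        for k in range(-steps, steps + 1)
    )


def gaussian_mean(f, radius=6.0):
    """Riemann-sum approximation of the integral of f against N(0, 1)."""
    steps = int(round(radius / H))
    total = 0.0
    for i in range(-steps, steps + 1):
        x = i * H
        total += f(x) * math.exp(-0.5 * x * x)
    return total * H / math.sqrt(2.0 * math.pi)


def normalized_log_moment(t):
    """Phi(t) = log Z(t) / (LAM * t), with Z(t) the exponential moment."""
    z = gaussian_mean(lambda x: math.exp(LAM * t * hopf_lax(t, x)))
    return math.log(z) / (LAM * t)


times = (0.25, 0.5, 0.75, 1.0)
values = [normalized_log_moment(t) for t in times]
mean_varphi = gaussian_mean(varphi)
```

The proof goes on to show that $\Phi$ is nonincreasing on $(0,1]$ with limit $\int\varphi\,d\nu$ as $t\downarrow0$, so, up to discretization error, the entries of `values` should decrease and stay below `mean_varphi`, which is zero here because $\varphi$ is odd.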
[guided] The point of introducing $h_t$ is that it turns the logarithmic derivative of $Z(t)$ into an expectation with respect to a probability density. Since \begin{align*} h_t(x)=\frac{e^{\lambda t u(t,x)}}{Z(t)} \end{align*} and \begin{align*} Z(t)=\int_{\mathbb R^n}e^{\lambda t u(t,x)}\,d\nu(x), \end{align*} we have \begin{align*} \int_{\mathbb R^n}h_t\,d\nu=1. \end{align*} The boundedness of $u$ controls the exponential factor, but we also need a bound on the time derivative. Let $L:=\operatorname{Lip}(\varphi)$. The Hopf-Lax regularity statement used above gives $|\partial_tu(t,x)|\le L^2/2$ for a.e. $(t,x)$. Hence, on every compact time interval $[a,1]\subset(0,1]$, \begin{align*} \left|\lambda\bigl(u(t,x)+t\partial_tu(t,x)\bigr)e^{\lambda t u(t,x)}\right|\le \lambda\left(\|\varphi\|_\infty+\frac{L^2}{2}\right)e^{\lambda\|\varphi\|_\infty}. \end{align*} The right-hand side is a finite constant and is integrable with respect to the probability measure $\nu$. Dominated differentiation on $[a,1]$ is therefore justified, and since $a\in(0,1)$ is arbitrary the derivative formula holds for a.e. $t\in(0,1]$. Differentiate $Z(t)$: \begin{align*} Z'(t)=\lambda\int_{\mathbb R^n}\bigl(u(t,x)+t\partial_tu(t,x)\bigr)e^{\lambda t u(t,x)}\,d\nu(x). \end{align*} Dividing by $Z(t)$ rewrites this as \begin{align*} \frac{Z'(t)}{Z(t)}=\lambda\int_{\mathbb R^n}\bigl(u(t,x)+t\partial_tu(t,x)\bigr)h_t(x)\,d\nu(x). \end{align*} Since \begin{align*} \Phi(t)=\frac{1}{\lambda t}\log Z(t), \end{align*} we obtain \begin{align*} \Phi'(t)=-\frac{1}{\lambda t^2}\log Z(t)+\frac{1}{\lambda t}\frac{Z'(t)}{Z(t)}. \end{align*} Substituting the formula for $Z'(t)/Z(t)$ gives \begin{align*} \Phi'(t)=\int_{\mathbb R^n}\partial_tu(t,x)h_t(x)\,d\nu(x)+\frac{1}{t}\int_{\mathbb R^n}u(t,x)h_t(x)\,d\nu(x)-\frac{1}{\lambda t^2}\log Z(t). \end{align*} The last two terms are exactly the entropy of $h_t\nu$ divided by $\lambda t^2$, because \begin{align*} \log h_t(x)=\lambda t u(t,x)-\log Z(t). 
\end{align*} Thus \begin{align*} \operatorname{Ent}_\nu(h_t\nu)=\lambda t\int_{\mathbb R^n}u(t,x)h_t(x)\,d\nu(x)-\log Z(t), \end{align*} and hence \begin{align*} \Phi'(t)=\int_{\mathbb R^n}\partial_tu(t,x)h_t(x)\,d\nu(x)+\frac{1}{\lambda t^2}\operatorname{Ent}_\nu(h_t\nu). \end{align*} [/guided] [/step] [step:Use logarithmic Sobolev and Hamilton-Jacobi to prove monotonicity] For a.e. $t\in(0,1]$, define \begin{align*} g_t:\mathbb R^n\to(0,\infty),\qquad g_t(x):=h_t(x)^{1/2}. \end{align*} The function $g_t$ is bounded and locally Lipschitz because $u(t,\cdot)$ is locally Lipschitz and bounded. Also \begin{align*} \int_{\mathbb R^n}g_t^2\,d\nu=\int_{\mathbb R^n}h_t\,d\nu=1. \end{align*} Applying the logarithmic Sobolev inequality to $g_t$ gives \begin{align*} \operatorname{Ent}_\nu(h_t\nu)\le \frac{2}{\lambda}\int_{\mathbb R^n}|\nabla g_t(x)|^2\,d\nu(x). \end{align*} Because $u(t,\cdot)$ is bounded, the range of $h_t$ is contained in a compact subinterval of $(0,\infty)$. The local-slope chain rule for the $C^1$ map $r\mapsto r^{1/2}$ on that interval, combined with the bound $|\nabla h_t|(x)\le\lambda t\,h_t(x)|\nabla u(t,\cdot)|(x)$ obtained from the chain rule for the exponential, gives \begin{align*} |\nabla g_t|^2(x)\le \frac{\lambda^2t^2}{4}g_t(x)^2|\nabla u(t,\cdot)|^2(x) \end{align*} for every $x\in\mathbb R^n$. At points of classical differentiability this is the usual chain-rule identity; the inequality is all that the local-slope logarithmic Sobolev hypothesis requires. Thus the logarithmic Sobolev estimate becomes \begin{align*} \operatorname{Ent}_\nu(h_t\nu)\le \frac{\lambda t^2}{2}\int_{\mathbb R^n}|\nabla u(t,x)|^2h_t(x)\,d\nu(x). \end{align*} Substituting this estimate into the derivative identity for $\Phi$ yields \begin{align*} \Phi'(t)\le \int_{\mathbb R^n}\left(\partial_tu(t,x)+\frac{1}{2}|\nabla u(t,x)|^2\right)h_t(x)\,d\nu(x). \end{align*} By the local-slope Hopf-Lax Hamilton-Jacobi inequality, for a.e. $t$ the bracketed term is nonpositive for every $x\in\mathbb R^n$; since $h_t\ge0$, the integral is nonpositive. The preceding dominated differentiation argument also shows that $Z$ is absolutely continuous on every compact interval $[a,1]\subset(0,1]$.
Since $Z(t)>0$ there, $\log Z$ is absolutely continuous on every such interval, and multiplying by the smooth factor $t\mapsto (\lambda t)^{-1}$ shows that $\Phi$ is absolutely continuous on every such interval as well. Therefore \begin{align*} \Phi'(t)\le0 \end{align*} for a.e. $t\in(0,1]$, and the absolute-continuity criterion for monotonicity gives that $\Phi$ is nonincreasing on $(0,1]$. [guided] The derivative inequality alone is not enough unless the function being differentiated has the right one-dimensional regularity. Here that regularity comes from the previous step. For every $0<a<1$, the dominated differentiation estimate proves that $Z$ is absolutely continuous on $[a,1]$. Since $Z(t)>0$ and the map $t\mapsto (\lambda t)^{-1}\log Z(t)$ is obtained from $Z$ by smooth operations on $[a,1]$, the function $\Phi$ is absolutely continuous on $[a,1]$. For a.e. $t\in(0,1]$, define $g_t=h_t^{1/2}$. The density $h_t$ is positive, bounded, and locally Lipschitz because $u(t,\cdot)$ is bounded and locally Lipschitz. Hence $g_t:\mathbb R^n\to(0,\infty)$ is bounded and locally Lipschitz, and \begin{align*} \int_{\mathbb R^n}g_t^2\,d\nu=\int_{\mathbb R^n}h_t\,d\nu=1. \end{align*} The logarithmic Sobolev inequality therefore applies to $g_t$ and gives \begin{align*} \operatorname{Ent}_\nu(h_t\nu)\le \frac{2}{\lambda}\int_{\mathbb R^n}|\nabla g_t(x)|^2\,d\nu(x). \end{align*} Since $u(t,\cdot)$ is bounded, the range of $h_t$ lies in a compact subinterval of $(0,\infty)$, on which $r\mapsto r^{1/2}$ is smooth. The local-slope chain rule applied to this function, together with $|\nabla h_t|(x)\le\lambda t\,h_t(x)|\nabla u(t,\cdot)|(x)$, gives \begin{align*} |\nabla g_t|^2(x)\le \frac{\lambda^2t^2}{4}g_t(x)^2|\nabla u(t,\cdot)|^2(x). \end{align*} Substituting this bound into the logarithmic Sobolev estimate yields \begin{align*} \operatorname{Ent}_\nu(h_t\nu)\le \frac{\lambda t^2}{2}\int_{\mathbb R^n}|\nabla u(t,x)|^2h_t(x)\,d\nu(x). \end{align*} Combining this with the derivative identity for $\Phi$ gives \begin{align*} \Phi'(t)\le \int_{\mathbb R^n}\left(\partial_tu(t,x)+\frac{1}{2}|\nabla u(t,x)|^2\right)h_t(x)\,d\nu(x). 
\end{align*} The Hopf-Lax Hamilton-Jacobi theorem gives the local-slope inequality \begin{align*} \partial_t u(t,x)+\frac{1}{2}|\nabla u(t,\cdot)|^2(x)\le0 \end{align*} for a.e. $t$ and every $x\in\mathbb R^n$. Since $h_t\ge0$, the integral is nonpositive. Thus $\Phi'(t)\le0$ for a.e. $t\in(0,1]$. Absolute continuity on each $[a,1]$ then implies that $\Phi$ is nonincreasing on each $[a,1]$, and since $a>0$ is arbitrary, $\Phi$ is nonincreasing on $(0,1]$. [/guided] [/step] [step:Let the time parameter tend to zero to obtain the infimum-convolution inequality] Define the map \begin{align*} A:(0,1]\to\mathbb R,\qquad A(t):=\int_{\mathbb R^n}\frac{e^{\lambda t Q_t\varphi(x)}-1}{\lambda t}\,d\nu(x). \end{align*} Since $Q_t\varphi(x)\to\varphi(x)$ for every $x\in\mathbb R^n$ and $|Q_t\varphi(x)|\le\|\varphi\|_\infty$, the elementary bound \begin{align*} \left|\frac{e^{\lambda t Q_t\varphi(x)}-1}{\lambda t}\right|\le e^{\lambda\|\varphi\|_\infty}\|\varphi\|_\infty \end{align*} holds for $0<t\le1$. The dominated convergence theorem applied with respect to the probability measure $\nu$ gives \begin{align*} \lim_{t\downarrow0}A(t)=\int_{\mathbb R^n}\varphi\,d\nu. \end{align*} Moreover \begin{align*} \int_{\mathbb R^n}e^{\lambda t Q_t\varphi(x)}\,d\nu(x)=1+\lambda t A(t), \end{align*} and therefore \begin{align*} \frac{1}{\lambda t}\log\int_{\mathbb R^n}e^{\lambda t Q_t\varphi(x)}\,d\nu(x)=\frac{\log(1+\lambda t A(t))}{\lambda t A(t)}A(t)\to\int_{\mathbb R^n}\varphi\,d\nu \end{align*} as $t\downarrow0$, with the harmless convention that the displayed product is $A(t)$ when $A(t)=0$. Since $\Phi$ is nonincreasing, $\Phi(1)\le\lim_{t\downarrow0}\Phi(t)$. Thus \begin{align*} \frac{1}{\lambda}\log\int_{\mathbb R^n}e^{\lambda Q_1\varphi(x)}\,d\nu(x)\le \int_{\mathbb R^n}\varphi\,d\nu. \end{align*} Equivalently, \begin{align*} \int_{\mathbb R^n}e^{\lambda Q_1\varphi(x)}\,d\nu(x)\le \exp\left(\lambda\int_{\mathbb R^n}\varphi\,d\nu\right). 
\end{align*} [/step] [step:Apply the entropy variational formula to compare integrals against $\mu$ and $\nu$] Let $\mu\ll\nu$ be a Borel probability measure with finite second moment and finite entropy. Let \begin{align*} f:\mathbb R^n\to[0,\infty] \end{align*} denote the Radon-Nikodym density of $\mu$ with respect to $\nu$, so that $d\mu=f\,d\nu$ and \begin{align*} \operatorname{Ent}_\nu(\mu)=\int_{\mathbb R^n}f\log f\,d\nu. \end{align*} The Donsker-Varadhan entropy variational formula states that, for every bounded measurable function $\psi:\mathbb R^n\to\mathbb R$, \begin{align*} \int_{\mathbb R^n}\psi\,d\mu\le \operatorname{Ent}_\nu(\mu)+\log\int_{\mathbb R^n}e^\psi\,d\nu. \end{align*} Its hypotheses are satisfied because $\mu\ll\nu$ and $\operatorname{Ent}_\nu(\mu)<\infty$. Apply this with \begin{align*} \psi:\mathbb R^n\to\mathbb R,\qquad \psi(x):=\lambda Q_1\varphi(x). \end{align*} Using the infimum-convolution inequality from the previous step gives \begin{align*} \lambda\int_{\mathbb R^n}Q_1\varphi\,d\mu\le \operatorname{Ent}_\nu(\mu)+\lambda\int_{\mathbb R^n}\varphi\,d\nu. \end{align*} Therefore \begin{align*} \int_{\mathbb R^n}Q_1\varphi\,d\mu-\int_{\mathbb R^n}\varphi\,d\nu\le \frac{1}{\lambda}\operatorname{Ent}_\nu(\mu). \end{align*} [/step] [step:Take the quadratic Kantorovich dual supremum] Let $\operatorname{Lip}_b(\mathbb R^n)$ denote the set of bounded Lipschitz functions from $\mathbb R^n$ to $\mathbb R$. For the cost \begin{align*} c:\mathbb R^n\times\mathbb R^n\to[0,\infty),\qquad c(x,y):=\frac{1}{2}|x-y|^2, \end{align*} the Hopf-Lax transform at time $1$ is the $c$-transform \begin{align*} Q_1\varphi(x)=\inf_{y\in\mathbb R^n}\left\{\varphi(y)+c(x,y)\right\}. \end{align*} Because $\mu$ and $\nu$ have finite second moments, the quadratic Kantorovich duality theorem applies to the lower semicontinuous cost $c$. 
We use the standard bounded-Lipschitz $c$-transform form of quadratic Kantorovich duality, which states that \begin{align*} \frac{1}{2}W_2(\mu,\nu)^2=\sup_{\varphi\in\operatorname{Lip}_b(\mathbb R^n)}\left\{\int_{\mathbb R^n}Q_1\varphi\,d\mu-\int_{\mathbb R^n}\varphi\,d\nu\right\}. \end{align*} This formula is the reduction of the usual Kantorovich duality for the quadratic cost to bounded Lipschitz potentials; its hypotheses are exactly the finite second moment assumptions on $\mu$ and $\nu$ together with the lower semicontinuity and quadratic growth of $c$. The duality formula is the only point where the finite second moment assumptions on both measures are used. For every bounded Lipschitz function $\varphi:\mathbb R^n\to\mathbb R$, the pair \begin{align*} \alpha:=Q_1\varphi,\qquad \beta:=-\varphi \end{align*} is admissible for the dual problem, that is, $\alpha(x)+\beta(y)\le c(x,y)$, because $Q_1\varphi(x)\le \varphi(y)+c(x,y)$ for all $x,y\in\mathbb R^n$. Since the previous step proves that every such $\varphi$ satisfies \begin{align*} \int_{\mathbb R^n}Q_1\varphi\,d\mu-\int_{\mathbb R^n}\varphi\,d\nu\le \frac{1}{\lambda}\operatorname{Ent}_\nu(\mu), \end{align*} taking the supremum over $\varphi$ yields \begin{align*} \frac{1}{2}W_2(\mu,\nu)^2\le \frac{1}{\lambda}\operatorname{Ent}_\nu(\mu). \end{align*} Multiplying by $2$ gives \begin{align*} W_2(\mu,\nu)^2\le \frac{2}{\lambda}\operatorname{Ent}_\nu(\mu), \end{align*} which is the desired Talagrand $T_2$ inequality. [/step]
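The chain of inequalities in this last step can be observed numerically in a case where it is nearly sharp. The sketch below assumes $n=1$, $\lambda=1$, $\nu=N(0,1)$ (with log-Sobolev constant $1$, a standard fact not proved here), the translate $\mu=N(m,1)$ with $m=1$, and the clipped linear potential $\varphi(y)=m\max(-R,\min(R,y))$ with $R=4$, a bounded Lipschitz stand-in for the unbounded optimal potential $y\mapsto my$. For this pair of measures the classical formulas give $W_2(\mu,\nu)^2=m^2$ and $\operatorname{Ent}_\nu(\mu)=m^2/2$. Grid parameters and names are illustrative.

```python
import math

LAM = 1.0  # assumed log-Sobolev constant of nu = N(0, 1)
M = 1.0    # mu = N(M, 1), a translate of nu
R = 4.0    # clipping level that makes the potential bounded
H = 0.02   # grid step used for the infimum and for the integrals


def varphi(y):
    """Clipped linear potential: bounded and |M|-Lipschitz."""
    return M * max(-R, min(R, y))


def hopf_lax_1(x):
    """Grid approximation of Q_1 varphi(x) = inf_y { varphi(y) + |x-y|^2 / 2 }.

    varphi is |M|-Lipschitz, so any minimiser lies within 2|M| of x.
    """
    steps = int(round(2.0 * abs(M) / H)) + 1
    return min(
        varphi(x + k * H) + 0.5 * (k * H) ** 2
        for k in range(-steps, steps + 1)
    )


def gaussian_mean(f, mean, radius=6.0):
    """Riemann-sum approximation of the integral of f against N(mean, 1)."""
    steps = int(round(radius / H))
    total = sum(
        f(mean + i * H) * math.exp(-0.5 * (i * H) ** 2)
        for i in range(-steps, steps + 1)
    )
    return total * H / math.sqrt(2.0 * math.pi)


# Dual functional evaluated at this particular potential.
dual_value = gaussian_mean(hopf_lax_1, M) - gaussian_mean(varphi, 0.0)
# Closed-form values for the Gaussian pair (classical formulas).
half_w2_squared = 0.5 * M ** 2
entropy_bound = (0.5 * M ** 2) / LAM
```

Here `dual_value` should come out slightly below `half_w2_squared`, which in turn coincides with `entropy_bound`: translates of a Gaussian are an equality case of the $T_2$ inequality, and the small deficit in the dual value reflects the clipping of the potential.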
