Entropy Plus Convex Potential Is Displacement Convex

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The proof separates the free energy into its entropy part and potential part. The entropy term is displacement convex by [citetheorem:9567] applied to the internal energy $U(s)=s\log s$. For the potential term, Brenier's optimal transport theorem represents the unique quadratic Wasserstein geodesic from the absolutely continuous source as the pushforward by the linear interpolation between the identity map and the optimal map; the Hessian lower bound on $V$ gives a pointwise $\lambda$-convexity inequality along each interpolation segment. Integrating that pointwise inequality and adding it to the entropy estimate gives the desired displacement-convexity bound. [/proofplan] [step:Represent the Wasserstein geodesic by the Brenier interpolation] Since $\mu_0\ll\mathcal L^n$ and $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$, Brenier's optimal transport theorem gives a Borel map \begin{align*} T:\mathbb R^n\to\mathbb R^n \end{align*} such that $T_{\#}\mu_0=\mu_1$ and the transport plan $(\operatorname{id}_{\mathbb R^n},T)_{\#}\mu_0$ is optimal for the quadratic cost. In particular, \begin{align*} W_2(\mu_0,\mu_1)^2=\int_{\mathbb R^n}|T(x)-x|^2\,d\mu_0(x). \end{align*} For each $t\in[0,1]$, define the interpolation map \begin{align*} T_t:\mathbb R^n&\to\mathbb R^n \end{align*} by \begin{align*} T_t(x):=(1-t)x+tT(x). \end{align*} The displacement interpolation uniqueness theorem for quadratic optimal transport says that, when the optimal quadratic transport plan from $\mu_0$ to $\mu_1$ is unique and induced by a map, the unique constant-speed quadratic Wasserstein geodesic from $\mu_0$ to $\mu_1$ is given by \begin{align*} \mu_t=(T_t)_{\#}\mu_0. \end{align*} Brenier's theorem gives exactly this uniqueness and map-induced optimal plan under the hypothesis $\mu_0\ll\mathcal L^n$. Thus the geodesic appearing in the statement is this curve. [guided] The role of absolute continuity of $\mu_0$ is to make the quadratic optimal transport problem map-induced rather than merely plan-induced. Brenier's optimal transport theorem applies because $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^n)$ and $\mu_0$ is absolutely continuous with respect to $\mathcal L^n$. It gives a Borel map \begin{align*} T:\mathbb R^n\to\mathbb R^n \end{align*} with $T_{\#}\mu_0=\mu_1$ such that the plan $(\operatorname{id}_{\mathbb R^n},T)_{\#}\mu_0$ minimizes the quadratic transportation cost. Therefore the optimal cost is \begin{align*} W_2(\mu_0,\mu_1)^2=\int_{\mathbb R^n}|T(x)-x|^2\,d\mu_0(x). \end{align*} We now turn the optimal map into a geodesic. For each $t\in[0,1]$, define \begin{align*} T_t:\mathbb R^n&\to\mathbb R^n \end{align*} by \begin{align*} T_t(x):=(1-t)x+tT(x). \end{align*} This is the straight-line interpolation in the ambient Euclidean space between $x$ and its optimal destination $T(x)$. The displacement interpolation theorem for Brenier maps says that \begin{align*} \mu_t=(T_t)_{\#}\mu_0 \end{align*} is a constant-speed $W_2$-geodesic from $\mu_0$ to $\mu_1$. We also need uniqueness, because the statement quantifies over every constant-speed geodesic. Brenier's theorem gives uniqueness of the optimal quadratic transport plan and says that this plan is induced by the map $T$. The displacement interpolation uniqueness theorem then implies that any constant-speed geodesic between $\mu_0$ and $\mu_1$ must be the interpolation obtained from this unique optimal plan. This justifies treating the arbitrary geodesic in the statement as the Brenier interpolation above. [/guided] [/step] [step:Apply McCann displacement convexity to the entropy term] Let $U:(0,\infty)\to\mathbb R$ be the smooth function \begin{align*} U(s):=s\log s. \end{align*} This internal energy is the Boltzmann entropy: \begin{align*} \mathcal U[\mu]=\int_{\mathbb R^n}U(\rho(x))\,d\mathcal L^n(x)=\operatorname{Ent}_{\mathcal L^n}(\mu) \end{align*} whenever $\mu=\rho\,\mathcal L^n$ and the integral is finite. For this theorem, $U\in\mathcal{DC}_n$ means that the map $G:(0,\infty)\to\mathbb R$ defined by \begin{align*} G(r):=r^nU(r^{-n}) \end{align*} is convex and nonincreasing. For $U(s)=s\log s$, this map is \begin{align*} G(r)=r^n r^{-n}\log(r^{-n})=-n\log r. \end{align*} Its first and second derivatives are \begin{align*} G'(r)=-\frac{n}{r} \end{align*} and \begin{align*} G''(r)=\frac{n}{r^2}. \end{align*} Thus $G$ is nonincreasing and convex on $(0,\infty)$, so $U\in\mathcal{DC}_n$. The endpoint hypotheses of [citetheorem:9567] hold because $\mu_0$ and $\mu_1$ are absolutely continuous probability measures in $\mathcal P_2(\mathbb R^n)$ and $\mathcal F[\mu_0],\mathcal F[\mu_1]\in\mathbb R$ imply $\operatorname{Ent}_{\mathcal L^n}(\mu_0),\operatorname{Ent}_{\mathcal L^n}(\mu_1)\in\mathbb R$ by the definition of $\mathcal F$. The first step identified the given geodesic with the Brenier interpolation, so [citetheorem:9567] gives \begin{align*} \operatorname{Ent}_{\mathcal L^n}(\mu_t)\le (1-t)\operatorname{Ent}_{\mathcal L^n}(\mu_0)+t\operatorname{Ent}_{\mathcal L^n}(\mu_1) \end{align*} for every $t\in[0,1]$. [guided] We apply McCann displacement convexity in exactly the entropy case needed here. Define \begin{align*} U:(0,\infty)&\to\mathbb R \end{align*} by \begin{align*} U(s):=s\log s. \end{align*} For an absolutely continuous measure $\mu=\rho\,\mathcal L^n$, the associated internal energy is \begin{align*} \mathcal U[\mu]=\int_{\mathbb R^n}U(\rho(x))\,d\mathcal L^n(x)=\operatorname{Ent}_{\mathcal L^n}(\mu), \end{align*} provided this integral is finite. For this theorem, $U\in\mathcal{DC}_n$ means that the map $G:(0,\infty)\to\mathbb R$ defined by \begin{align*} G(r):=r^nU(r^{-n}) \end{align*} is convex and nonincreasing. For $U(s)=s\log s$, one has \begin{align*} G(r)=r^n r^{-n}\log(r^{-n})=-n\log r. \end{align*} Since \begin{align*} G'(r)=-\frac{n}{r} \end{align*} and \begin{align*} G''(r)=\frac{n}{r^2}, \end{align*} the map $G$ is nonincreasing and convex on $(0,\infty)$. Hence $U\in\mathcal{DC}_n$. The theorem [citetheorem:9567] applies to absolutely continuous probability measures in $\mathcal P_2(\mathbb R^n)$ with finite internal energy and to the constant-speed geodesic induced by the Brenier map. We verify these hypotheses. The measures $\mu_0$ and $\mu_1$ are absolutely continuous by assumption and belong to $\mathcal P_2(\mathbb R^n)$ by assumption. Since $\mathcal F[\mu_0]$ and $\mathcal F[\mu_1]$ are finite real numbers, the definition of $\mathcal F$ gives \begin{align*} \operatorname{Ent}_{\mathcal L^n}(\mu_0),\operatorname{Ent}_{\mathcal L^n}(\mu_1)\in\mathbb R. \end{align*} The first step showed that the geodesic in the statement is the Brenier interpolation from $\mu_0$ to $\mu_1$. Therefore [citetheorem:9567], in the Boltzmann entropy case $U(s)=s\log s$, yields \begin{align*} \operatorname{Ent}_{\mathcal L^n}(\mu_t)\le (1-t)\operatorname{Ent}_{\mathcal L^n}(\mu_0)+t\operatorname{Ent}_{\mathcal L^n}(\mu_1) \end{align*} for every $t\in[0,1]$. This is the entropy half of the free-energy inequality. [/guided] [/step] [step:Derive the pointwise $\lambda$-convexity estimate for the potential] Fix $x\in\mathbb R^n$ such that $T(x)$ is defined, and define \begin{align*} \gamma_x:[0,1]&\to\mathbb R^n \end{align*} by \begin{align*} \gamma_x(t):=(1-t)x+tT(x). \end{align*} Define also \begin{align*} \phi_x:[0,1]&\to\mathbb R \end{align*} by \begin{align*} \phi_x(t):=V(\gamma_x(t)). \end{align*} Since $V\in C^2(\mathbb R^n;\mathbb R)$ and $\gamma_x$ is affine, the one-dimensional chain rule gives \begin{align*} \phi_x''(t)=(T(x)-x)^\top J(\nabla V)_{\gamma_x(t)}(T(x)-x). \end{align*} The Hessian lower bound therefore implies \begin{align*} \phi_x''(t)\ge \lambda |T(x)-x|^2. \end{align*} Define \begin{align*} \psi_x:[0,1]&\to\mathbb R \end{align*} by \begin{align*} \psi_x(t):=\phi_x(t)-\frac{\lambda}{2}t^2|T(x)-x|^2. \end{align*} Then $\psi_x''(t)\ge0$, so $\psi_x$ is convex on $[0,1]$. Hence \begin{align*} \psi_x(t)\le (1-t)\psi_x(0)+t\psi_x(1). \end{align*} Substituting the definition of $\psi_x$ and using $\phi_x(0)=V(x)$ and $\phi_x(1)=V(T(x))$ yields \begin{align*} V((1-t)x+tT(x))\le (1-t)V(x)+tV(T(x))-\frac{\lambda}{2}t(1-t)|T(x)-x|^2. \end{align*} [guided] The goal is to prove a one-dimensional convexity inequality along each transport ray. Fix a point $x\in\mathbb R^n$ at which the Borel optimal map $T$ is defined. The ray from $x$ to $T(x)$ is encoded by the affine map \begin{align*} \gamma_x:[0,1]&\to\mathbb R^n \end{align*} given by \begin{align*} \gamma_x(t):=(1-t)x+tT(x). \end{align*} Composing $V$ with this ray gives the one-dimensional function \begin{align*} \phi_x:[0,1]&\to\mathbb R \end{align*} defined by \begin{align*} \phi_x(t):=V(\gamma_x(t)). \end{align*} Because $V\in C^2(\mathbb R^n;\mathbb R)$ and $\gamma_x$ is affine, $\phi_x$ is twice continuously differentiable. The chain rule gives first \begin{align*} \phi_x'(t)=\nabla V(\gamma_x(t))\cdot (T(x)-x). \end{align*} Differentiating once more gives \begin{align*} \phi_x''(t)=(T(x)-x)^\top J(\nabla V)_{\gamma_x(t)}(T(x)-x). \end{align*} Now we use the hypothesis on $V$. It says that for every point $z\in\mathbb R^n$ and every vector $\xi\in\mathbb R^n$, \begin{align*} \xi^\top J(\nabla V)_z\xi\ge \lambda |\xi|^2. \end{align*} Apply this with $z=\gamma_x(t)$ and $\xi=T(x)-x$. We obtain \begin{align*} \phi_x''(t)\ge \lambda |T(x)-x|^2. \end{align*} To convert this lower bound on the second derivative into the endpoint inequality, subtract the quadratic function whose second derivative is exactly the lower bound. Define \begin{align*} \psi_x:[0,1]&\to\mathbb R \end{align*} by \begin{align*} \psi_x(t):=\phi_x(t)-\frac{\lambda}{2}t^2|T(x)-x|^2. \end{align*} Then \begin{align*} \psi_x''(t)=\phi_x''(t)-\lambda |T(x)-x|^2\ge0. \end{align*} Thus $\psi_x$ is convex on $[0,1]$. Convexity on the interval gives \begin{align*} \psi_x(t)\le (1-t)\psi_x(0)+t\psi_x(1). \end{align*} Substituting the definition of $\psi_x$ gives \begin{align*} \phi_x(t)-\frac{\lambda}{2}t^2|T(x)-x|^2\le (1-t)\phi_x(0)+t\phi_x(1)-\frac{\lambda}{2}t|T(x)-x|^2. \end{align*} Rearranging and using $\phi_x(0)=V(x)$, $\phi_x(1)=V(T(x))$, and $\phi_x(t)=V((1-t)x+tT(x))$, we obtain \begin{align*} V((1-t)x+tT(x))\le (1-t)V(x)+tV(T(x))-\frac{\lambda}{2}t(1-t)|T(x)-x|^2. \end{align*} This is exactly the pointwise potential-energy estimate needed before integration. [/guided] [/step] [step:Integrate the potential estimate along the optimal map] First record the lower integrability needed to integrate $V$. Define \begin{align*} a:=V(0),\qquad b:=\nabla V(0)\in\mathbb R^n. \end{align*} Fix $z\in\mathbb R^n$, and define the one-dimensional map $h_z:[0,1]\to\mathbb R$ by \begin{align*} h_z(s):=V(sz). \end{align*} Taylor's formula with integral remainder gives \begin{align*} V(z)=a+b\cdot z+\int_0^1(1-s)z^\top J(\nabla V)_{sz}z\,d\mathcal L^1(s). \end{align*} The Hessian lower bound yields \begin{align*} V(z)\ge a+b\cdot z+\frac{\lambda}{2}|z|^2. \end{align*} Set \begin{align*} C_V:=1+|a|+|b|+|\lambda|. \end{align*} Using $-|b||z|\ge -|b|(1+|z|^2)$, the lower bound implies \begin{align*} V^-(z)\le C_V(1+|z|^2) \end{align*} for every $z\in\mathbb R^n$. Since $\mu_0,\mu_1,\mu_t\in\mathcal P_2(\mathbb R^n)$, the negative part $V^-$ is integrable with respect to each of these measures. The endpoint assumptions $\mathcal F[\mu_0],\mathcal F[\mu_1]\in\mathbb R$ give $V\in L^1(\mu_0)$ and $V\in L^1(\mu_1)$ by the definition of $\mathcal F$. Since $\mu_t=(T_t)_{\#}\mu_0$, the change-of-variables formula for pushforward measures gives \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)=\int_{\mathbb R^n}V(T_t(x))\,d\mu_0(x) \end{align*} whenever the left-hand side is finite or equal to $+\infty$. Integrating the pointwise estimate from the previous step in the extended-real sense is legitimate because all negative parts are integrable and the right-hand side is integrable with respect to $\mu_0$. Thus \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)\le (1-t)\int_{\mathbb R^n}V(x)\,d\mu_0(x)+t\int_{\mathbb R^n}V(T(x))\,d\mu_0(x)-\frac{\lambda}{2}t(1-t)\int_{\mathbb R^n}|T(x)-x|^2\,d\mu_0(x). \end{align*} Because $T_{\#}\mu_0=\mu_1$, another use of the pushforward formula gives \begin{align*} \int_{\mathbb R^n}V(T(x))\,d\mu_0(x)=\int_{\mathbb R^n}V(y)\,d\mu_1(y). \end{align*} Using the optimality identity for $T$, we obtain \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)\le (1-t)\int_{\mathbb R^n}V(y)\,d\mu_0(y)+t\int_{\mathbb R^n}V(y)\,d\mu_1(y)-\frac{\lambda}{2}t(1-t)W_2(\mu_0,\mu_1)^2. \end{align*} The right-hand side is finite, so $V\in L^1(\mu_t)$ and the potential term of $\mu_t$ is finite. [guided] Before integrating the pointwise inequality, we must know that the negative part of $V$ is integrable. The Hessian lower bound gives exactly the needed quadratic lower bound. Define \begin{align*} a:=V(0),\qquad b:=\nabla V(0)\in\mathbb R^n. \end{align*} Fix an arbitrary point $z\in\mathbb R^n$, and define the one-dimensional map $h_z:[0,1]\to\mathbb R$ by \begin{align*} h_z(s):=V(sz). \end{align*} Taylor's formula with integral remainder gives \begin{align*} V(z)=a+b\cdot z+\int_0^1(1-s)z^\top J(\nabla V)_{sz}z\,d\mathcal L^1(s). \end{align*} The hypothesis on $V$ says that \begin{align*} z^\top J(\nabla V)_{sz}z\ge \lambda |z|^2 \end{align*} for every $s\in[0,1]$. Therefore \begin{align*} V(z)\ge a+b\cdot z+\frac{\lambda}{2}|z|^2. \end{align*} Set \begin{align*} C_V:=1+|a|+|b|+|\lambda|. \end{align*} Since $-|b||z|\ge -|b|(1+|z|^2)$, this lower bound implies \begin{align*} V^-(z)\le C_V(1+|z|^2) \end{align*} for every $z\in\mathbb R^n$. Since every measure on the geodesic belongs to $\mathcal P_2(\mathbb R^n)$, this bound implies that $V^-$ is integrable with respect to $\mu_0$, $\mu_1$, and $\mu_t$. The assumptions $\mathcal F[\mu_0],\mathcal F[\mu_1]\in\mathbb R$ also imply $V\in L^1(\mu_0)$ and $V\in L^1(\mu_1)$ by the definition of $\mathcal F$. Now we use the pushforward identity. Since $\mu_t=(T_t)_{\#}\mu_0$, for the Borel function $V:\mathbb R^n\to\mathbb R$ we have \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)=\int_{\mathbb R^n}V(T_t(x))\,d\mu_0(x) \end{align*} whenever the integral is finite or equal to $+\infty$. The pointwise estimate from the previous step says \begin{align*} V(T_t(x))\le (1-t)V(x)+tV(T(x))-\frac{\lambda}{2}t(1-t)|T(x)-x|^2. \end{align*} The negative parts are integrable, and the endpoint potential energies are finite, so this inequality can be integrated in the extended-real sense. We obtain \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)\le (1-t)\int_{\mathbb R^n}V(x)\,d\mu_0(x)+t\int_{\mathbb R^n}V(T(x))\,d\mu_0(x)-\frac{\lambda}{2}t(1-t)\int_{\mathbb R^n}|T(x)-x|^2\,d\mu_0(x). \end{align*} Because $T_{\#}\mu_0=\mu_1$, the pushforward formula gives \begin{align*} \int_{\mathbb R^n}V(T(x))\,d\mu_0(x)=\int_{\mathbb R^n}V(y)\,d\mu_1(y). \end{align*} Because $(\operatorname{id}_{\mathbb R^n},T)_{\#}\mu_0$ is optimal for the quadratic cost, \begin{align*} \int_{\mathbb R^n}|T(x)-x|^2\,d\mu_0(x)=W_2(\mu_0,\mu_1)^2. \end{align*} Substituting these two identities gives \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)\le (1-t)\int_{\mathbb R^n}V(y)\,d\mu_0(y)+t\int_{\mathbb R^n}V(y)\,d\mu_1(y)-\frac{\lambda}{2}t(1-t)W_2(\mu_0,\mu_1)^2. \end{align*} The right-hand side is finite, so the potential energy at time $t$ is finite. [/guided] [/step] [step:Add the entropy and potential inequalities] For every $t\in[0,1]$, the entropy estimate gives \begin{align*} \operatorname{Ent}_{\mathcal L^n}(\mu_t)\le (1-t)\operatorname{Ent}_{\mathcal L^n}(\mu_0)+t\operatorname{Ent}_{\mathcal L^n}(\mu_1). \end{align*} The potential estimate gives \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)\le (1-t)\int_{\mathbb R^n}V(y)\,d\mu_0(y)+t\int_{\mathbb R^n}V(y)\,d\mu_1(y)-\frac{\lambda}{2}t(1-t)W_2(\mu_0,\mu_1)^2. \end{align*} The right-hand sides are finite, so both terms defining $\mathcal F[\mu_t]$ are finite. Adding these two inequalities and using the definition \begin{align*} \mathcal F[\nu]=\operatorname{Ent}_{\mathcal L^n}(\nu)+\int_{\mathbb R^n}V(y)\,d\nu(y) \end{align*} for $\nu\in\{\mu_0,\mu_t,\mu_1\}$ yields \begin{align*} \mathcal F[\mu_t]\le (1-t)\mathcal F[\mu_0]+t\mathcal F[\mu_1]-\frac{\lambda}{2}t(1-t)W_2(\mu_0,\mu_1)^2. \end{align*} This is the asserted displacement $\lambda$-convexity of the entropy-plus-potential functional along the constant-speed quadratic Wasserstein geodesic. [guided] At this point the proof has produced two inequalities with the same endpoints and the same interpolation parameter $t$. The entropy part is \begin{align*} \operatorname{Ent}_{\mathcal L^n}(\mu_t)\le (1-t)\operatorname{Ent}_{\mathcal L^n}(\mu_0)+t\operatorname{Ent}_{\mathcal L^n}(\mu_1). \end{align*} The potential part is \begin{align*} \int_{\mathbb R^n}V(y)\,d\mu_t(y)\le (1-t)\int_{\mathbb R^n}V(y)\,d\mu_0(y)+t\int_{\mathbb R^n}V(y)\,d\mu_1(y)-\frac{\lambda}{2}t(1-t)W_2(\mu_0,\mu_1)^2. \end{align*} The previous step showed that the potential term at $\mu_t$ is finite, and the entropy estimate has a finite right-hand side because the endpoint entropies are finite. Therefore the two inequalities may be added as inequalities between finite real numbers. Using \begin{align*} \mathcal F[\nu]=\operatorname{Ent}_{\mathcal L^n}(\nu)+\int_{\mathbb R^n}V(y)\,d\nu(y) \end{align*} for $\nu\in\{\mu_0,\mu_t,\mu_1\}$ gives \begin{align*} \mathcal F[\mu_t]\le (1-t)\mathcal F[\mu_0]+t\mathcal F[\mu_1]-\frac{\lambda}{2}t(1-t)W_2(\mu_0,\mu_1)^2. \end{align*} This is precisely the claimed displacement $\lambda$-convexity inequality. [/guided] [/step]

What brings you to Androma?

Start with a route through the knowledge graph.