Displacement Convexity of Relative Entropy for Uniformly Convex Potentials

Theorem



Proof

[proofplan] The proof splits the relative entropy into the Lebesgue entropy plus the potential energy. Since finite relative entropy with respect to $\mu$ implies absolute continuity with respect to [Lebesgue measure](/page/Lebesgue%20Measure), Brenier's theorem gives a unique optimal map $T$ from $\nu_0$ to $\nu_1$, and the given geodesic is the displacement interpolation generated by $T$. The Lebesgue entropy is displacement convex by McCann's theorem. The uniform convexity of $V$ gives a strong convexity estimate for the potential energy along each transport segment, with a gain proportional to the squared transport distance. Adding the two inequalities and using $(1-t)\log Z+t\log Z=\log Z$ gives the claimed $\rho$-convexity inequality. [/proofplan] [step:Identify the unique displacement interpolation generated by the Brenier map] Because $e^{-V(x)} > 0$ for every $x \in \mathbb R^n$, the measures $\mu$ and $\mathcal L^n$ have the same null sets. Since $H(\nu_0\mid\mu) < \infty$ and $H(\nu_1\mid\mu) < \infty$, there exist Borel functions $f_0,f_1:\mathbb R^n \to [0,\infty)$ such that $\nu_i=f_i\mu$ for $i \in \{0,1\}$. Hence $\nu_0$ and $\nu_1$ are absolutely continuous with respect to $\mathcal L^n$. Apply the Brenier optimal transport theorem for the quadratic cost to the absolutely continuous source measure $\nu_0 \in \mathcal P_2(\mathbb R^n)$ and the target measure $\nu_1 \in \mathcal P_2(\mathbb R^n)$. It gives a convex function $\varphi:\mathbb R^n \to (-\infty,\infty]$ and a Borel map \begin{align*} T:\mathbb R^n \to \mathbb R^n \end{align*} such that $T(x)=\nabla \varphi(x)$ for $\nu_0$-a.e. $x$. The same theorem shows that $T$ pushes $\nu_0$ forward to $\nu_1$. It also shows that $T$ is the unique optimal transport map from $\nu_0$ to $\nu_1$ for the cost $|x-y|^2$, up to $\nu_0$-null sets. Thus \begin{align*} T_{\#}\nu_0=\nu_1 \end{align*} and \begin{align*} \int_{\mathbb R^n}|T(x)-x|^2\, d\nu_0(x)=W_2(\nu_0,\nu_1)^2. 
\end{align*} For each $t \in [0,1]$, define the interpolation map \begin{align*} T_t:\mathbb R^n \to \mathbb R^n \end{align*} by \begin{align*} T_t(x):=(1-t)x+tT(x). \end{align*} Since $\nu_0$ is absolutely continuous with respect to $\mathcal L^n$, the optimal plan from $\nu_0$ to $\nu_1$ for the quadratic cost is unique, namely $\pi=(\operatorname{id}_{\mathbb R^n},T)_{\#}\nu_0$. For $t\in[0,1]$, let $e_t:\mathbb R^n\times\mathbb R^n\to\mathbb R^n$ denote the map $e_t(x,y):=(1-t)x+ty$. The standard representation of constant-speed $W_2$-geodesics by optimal plans says that every constant-speed $W_2$-geodesic from $\nu_0$ to $\nu_1$ has the form $t\mapsto(e_t)_{\#}\pi$ for some optimal plan $\pi$ between $\nu_0$ and $\nu_1$. Since the optimal plan is unique, the constant-speed $W_2$-geodesic from $\nu_0$ to $\nu_1$ is unique. Because $e_t\circ(\operatorname{id}_{\mathbb R^n},T)=T_t$, the prescribed geodesic satisfies \begin{align*} \nu_t=(T_t)_{\#}\nu_0 \end{align*} for every $t \in [0,1]$. [/step] [step:Split relative entropy into Lebesgue entropy and potential energy] Let $\nu \in \mathcal P_2(\mathbb R^n)$ be absolutely continuous with respect to $\mathcal L^n$; this holds in particular for any $\nu$ with $H(\nu\mid\mu)<\infty$. Write $\nu=\rho_\nu \mathcal L^n$, where $\rho_\nu:\mathbb R^n\to[0,\infty)$ is the Lebesgue density of $\nu$. Define the Lebesgue entropy functional $E_{\mathcal L}:\mathcal P_2(\mathbb R^n)\to(-\infty,\infty]$ by \begin{align*} E_{\mathcal L}(\nu):=\int_{\mathbb R^n}\rho_\nu(x)\log \rho_\nu(x)\, d\mathcal L^n(x) \end{align*} when $\nu=\rho_\nu\mathcal L^n$, and $E_{\mathcal L}(\nu):=+\infty$ otherwise. Define the potential energy functional $P_V:\mathcal P_2(\mathbb R^n)\to(-\infty,\infty]$ by \begin{align*} P_V(\nu):=\int_{\mathbb R^n}V(x)\, d\nu(x). \end{align*} Define the reference density $p:\mathbb R^n\to(0,\infty)$ by \begin{align*} p(x):=Z^{-1}e^{-V(x)}. \end{align*} Then $\mu=p\mathcal L^n$. Since $\mu$ and $\mathcal L^n$ have the same null sets, $\nu$ is also absolutely continuous with respect to $\mu$. Define the Radon-Nikodym derivative as the measurable map \begin{align*} r_\nu:\mathbb R^n\to[0,\infty) \end{align*} given $\mu$-a.e. by \begin{align*} r_\nu=\frac{d\nu}{d\mu}. 
\end{align*} Since $\nu=\rho_\nu\mathcal L^n$, this derivative satisfies \begin{align*} r_\nu(x)=\frac{\rho_\nu(x)}{p(x)}=Z e^{V(x)}\rho_\nu(x) \end{align*} for $\mathcal L^n$-a.e. $x$ (equivalently, for $\mu$-a.e. $x$). Write $H(\nu\mid\mu)=\int_{\mathbb R^n}r_\nu\log r_\nu\,d\mu$ and use $d\mu=p\,d\mathcal L^n$. This gives \begin{align*} H(\nu\mid\mu)=\int_{\mathbb R^n}\rho_\nu(x)\log\frac{\rho_\nu(x)}{p(x)}\, d\mathcal L^n(x) \end{align*} as an extended integral, with the convention $0\log 0=0$. On the set where $\rho_\nu>0$, we have \begin{align*} \log\frac{\rho_\nu(x)}{p(x)}=\log\rho_\nu(x)+V(x)+\log Z. \end{align*} The set where $\rho_\nu=0$ contributes zero to the density-weighted integrals. Therefore, whenever $E_{\mathcal L}(\nu)$ and $P_V(\nu)$ are not combined in the indeterminate form $\infty-\infty$, the definition of the signed [Lebesgue integral](/page/Lebesgue%20Integral) gives \begin{align*} H(\nu\mid\mu)=E_{\mathcal L}(\nu)+P_V(\nu)+\log Z. \end{align*} For the endpoint measures, the argument below proves that $E_{\mathcal L}(\nu_i)$ and $P_V(\nu_i)$ are finite [real numbers](/page/Real%20Numbers). Hence the displayed identity is an ordinary equality for $i\in\{0,1\}$. For a real-valued measurable function $F:\mathbb R^n\to\mathbb R$, define its positive and negative parts by \begin{align*} F^+(x):=\max\{F(x),0\}, \qquad F^-(x):=\max\{-F(x),0\}. \end{align*} We first bound $V$ from below. Taylor's formula with integral remainder and the Hessian lower bound give \begin{align*} V(x)\ge V(0)+\nabla V(0)\cdot x+\frac{\rho}{2}|x|^2. \end{align*} The Cauchy-Schwarz inequality followed by [Young's inequality](/theorems/244), applied to the Euclidean [inner product](/page/Inner%20Product) $\nabla V(0)\cdot x$, gives \begin{align*} \nabla V(0)\cdot x\ge -\frac{\rho}{4}|x|^2-\frac{|\nabla V(0)|^2}{\rho}. \end{align*} Define the constant \begin{align*} m:=V(0)-\frac{|\nabla V(0)|^2}{\rho}. \end{align*} Then \begin{align*} V(x)\ge \frac{\rho}{4}|x|^2+m \end{align*} for every $x\in\mathbb R^n$; in particular, $V$ is bounded below by $m$.
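The lower bound $V(x)\ge\frac{\rho}{4}|x|^2+m$ can be sanity-checked numerically. The sketch below is a hypothetical one-dimensional example that is not part of the source. It assumes $n=1$ and the potential $V(x)=\tfrac12x^2+3x+\log\cosh x$. Its second derivative is $1+\operatorname{sech}^2x\ge1$, so $\rho=1$. Here $m=V(0)-V'(0)^2/\rho=0-9=-9$. In fact $V(x)-\tfrac14x^2+9=(x/2+3)^2+\log\cosh x\ge0$, so the grid check only illustrates an inequality that holds exactly.

```python
import math

# Hypothetical example (not from the source): n = 1 and
# V(x) = x^2/2 + 3x + log(cosh x), with V''(x) = 1 + sech(x)^2 >= 1, so rho = 1.
RHO = 1.0


def V(x: float) -> float:
    return 0.5 * x * x + 3.0 * x + math.log(math.cosh(x))


def dV(x: float) -> float:
    return x + 3.0 + math.tanh(x)


def lower_bound_constant(v0: float, g0: float, rho: float) -> float:
    """m := V(0) - |grad V(0)|^2 / rho, as defined in the proof."""
    return v0 - g0 * g0 / rho


def min_gap(xs, m: float, rho: float = RHO) -> float:
    """Smallest value of V(x) - (rho/4 * x^2 + m) over the sample points xs."""
    return min(V(x) - (0.25 * rho * x * x + m) for x in xs)


m = lower_bound_constant(V(0.0), dV(0.0), RHO)  # 0 - 3^2 / 1 = -9
grid = [-20.0 + 0.01 * k for k in range(4001)]  # sample points in [-20, 20]
print(m, min_gap(grid, m) >= 0.0)  # prints: -9.0 True
```

A non-negative minimum gap on the grid is consistent with the Taylor and Young argument. A negative value would signal that $\rho$ overstates the Hessian bound.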
Defining \begin{align*} m^-:=\max\{-m,0\}, \end{align*} we have $V^-\le m^-$. It remains to show that $\int_{\mathbb R^n}V^+\,d\nu_i<\infty$ and $E_{\mathcal L}(\nu_i)\in\mathbb R$ for $i\in\{0,1\}$. Fix $i\in\{0,1\}$, and define \begin{align*} M_i:=\int_{\mathbb R^n}|x|^2\,d\nu_i(x)<\infty. \end{align*} Fix any $a>0$ and define the Gaussian probability density $q_a:\mathbb R^n\to(0,\infty)$ by \begin{align*} q_a(x):=c_a e^{-a|x|^2}. \end{align*} Here $c_a>0$ is the normalising constant satisfying \begin{align*} \int_{\mathbb R^n}q_a(x)\,d\mathcal L^n(x)=1. \end{align*} By the non-negativity of relative entropy, also called [Gibbs' inequality](/theorems/1629), the relative entropy of $\nu_i$ with respect to the probability measure $q_a\mathcal L^n$ is non-negative in the extended sense. Therefore \begin{align*} 0\le \int_{\mathbb R^n}\rho_{\nu_i}(x)\log\frac{\rho_{\nu_i}(x)}{q_a(x)}\,d\mathcal L^n(x). \end{align*} Since $\log q_a(x)=\log c_a-a|x|^2$, the function $\rho_{\nu_i}\log q_a$ is $\mathcal L^n$-integrable with integral $\log c_a-aM_i$. So the right-hand side splits as $E_{\mathcal L}(\nu_i)-(\log c_a-aM_i)$. Thus \begin{align*} E_{\mathcal L}(\nu_i)\ge \int_{\mathbb R^n}\rho_{\nu_i}(x)\log q_a(x)\,d\mathcal L^n(x)=\log c_a-aM_i> -\infty. \end{align*} The lower bound $V\ge m$ gives $P_V(\nu_i)>-\infty$. Hence neither term equals $-\infty$, and the entropy splitting for $\nu_i$ involves no indeterminate form. Using \begin{align*} H(\nu_i\mid\mu)=E_{\mathcal L}(\nu_i)+P_V(\nu_i)+\log Z \end{align*} and the hypothesis $H(\nu_i\mid\mu)<\infty$, the lower bound $E_{\mathcal L}(\nu_i)>-\infty$ forces $P_V(\nu_i)<\infty$. Since $V^-$ is bounded, this implies \begin{align*} \int_{\mathbb R^n}V^+(x)\,d\nu_i(x)<\infty, \end{align*} and hence $P_V(\nu_i)\in\mathbb R$. The same identity then gives $E_{\mathcal L}(\nu_i)\in\mathbb R$. [/step] [step:Apply displacement convexity to the Lebesgue entropy] By the preceding step, $E_{\mathcal L}(\nu_0)$ and $E_{\mathcal L}(\nu_1)$ are finite real numbers.
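The Gaussian comparison bound $E_{\mathcal L}(\nu_i)\ge\log c_a-aM_i$ from the preceding step can be illustrated in dimension $n=1$, where $c_a=\sqrt{a/\pi}$. The sketch below uses two illustrative test laws that do not appear in the source. The first is a Gaussian $N(b,s^2)$ with $E_{\mathcal L}=-\tfrac12\log(2\pi e s^2)$. The second is the uniform law on $[\alpha,\beta]$ with $E_{\mathcal L}=-\log(\beta-\alpha)$. The sketch checks the bound for a range of widths $a$.

```python
import math


def gaussian_entropy_1d(s: float) -> float:
    """E_L of N(b, s^2): -1/2 log(2 pi e s^2), independent of the mean b."""
    return -0.5 * math.log(2.0 * math.pi * math.e * s * s)


def uniform_entropy_1d(alpha: float, beta: float) -> float:
    """E_L of the uniform law on [alpha, beta]: -log(beta - alpha)."""
    return -math.log(beta - alpha)


def comparison_bound_1d(a: float, second_moment: float) -> float:
    """log c_a - a*M for q_a(x) = c_a exp(-a x^2); in dimension 1, c_a = sqrt(a / pi)."""
    return 0.5 * math.log(a / math.pi) - a * second_moment


def bound_holds(entropy: float, second_moment: float, a_values, tol: float = 1e-12) -> bool:
    """Check E_L >= log c_a - a*M for every a in a_values (up to rounding)."""
    return all(entropy >= comparison_bound_1d(a, second_moment) - tol for a in a_values)


a_values = [10.0 ** k for k in range(-4, 5)]  # widths from 1e-4 to 1e4

# Illustrative Gaussian N(1.5, 0.7^2): second moment = mean^2 + variance.
print(bound_holds(gaussian_entropy_1d(0.7), 1.5 ** 2 + 0.7 ** 2, a_values))
# Illustrative uniform law on [-1, 3]: second moment = (beta^3 - alpha^3) / (3 (beta - alpha)).
print(bound_holds(uniform_entropy_1d(-1.0, 3.0), (27.0 + 1.0) / 12.0, a_values))
```

For a centred Gaussian with variance $s^2$, the choice $a=1/(2s^2)$ makes $q_a$ equal to the density itself, so the bound is attained. This is why the check allows a small rounding tolerance instead of requiring a strict inequality.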
We use McCann's displacement convexity theorem in the following precise form: for the extended Boltzmann entropy functional $E_{\mathcal L}$, defined as above by the density formula on measures absolutely continuous with respect to $\mathcal L^n$ and as $+\infty$ otherwise, $E_{\mathcal L}$ is convex along every quadratic displacement interpolation in $\mathcal P_2(\mathbb R^n)$. The hypotheses apply because $\nu_0$ and $\nu_1$ are absolutely continuous with respect to $\mathcal L^n$, have finite second moments, have finite Lebesgue entropy, and $\nu_t=(T_t)_{\#}\nu_0$ is their quadratic displacement interpolation. Hence, for every $t\in[0,1]$, \begin{align*} E_{\mathcal L}(\nu_t)\le (1-t)E_{\mathcal L}(\nu_0)+tE_{\mathcal L}(\nu_1). \end{align*} [/step] [step:Use uniform convexity of $V$ along each transport segment] Fix $x\in\mathbb R^n$ and define the one-dimensional function \begin{align*} g_x:[0,1]\to\mathbb R,\qquad g_x(s)=V((1-s)x+sT(x)). \end{align*} For $\nu_0$-a.e. $x$, the vector $T(x)$ is defined and finite. Since $V\in C^2(\mathbb R^n)$, the function $g_x$ belongs to $C^2([0,1])$, and \begin{align*} g_x''(s)=(T(x)-x)^\top D^2V((1-s)x+sT(x))(T(x)-x). \end{align*} The Hessian lower bound gives \begin{align*} g_x''(s)\ge \rho |T(x)-x|^2 \end{align*} for every $s\in[0,1]$. Applying the elementary one-dimensional strong convexity estimate to $g_x$ gives \begin{align*} V(T_t(x))\le (1-t)V(x)+tV(T(x))-\frac{\rho}{2}t(1-t)|T(x)-x|^2 \end{align*} for $\nu_0$-a.e. $x$. [guided] This guided expansion proves the pointwise strong convexity estimate used in this step; the remaining steps integrate this estimate and combine it with the entropy convexity estimate. We want to extract the curvature of $V$ along the exact straight line used by the Wasserstein geodesic. For a fixed starting point $x\in\mathbb R^n$ for which $T(x)$ is defined, the path followed by the transported particle is \begin{align*} s\mapsto (1-s)x+sT(x). 
\end{align*} This qualification is needed because the Brenier map is only specified $\nu_0$-a.e.; all pointwise estimates below are therefore asserted on that full $\nu_0$-measure set. We encode the value of the potential along this segment by the function \begin{align*} g_x:[0,1]\to\mathbb R \end{align*} defined by \begin{align*} g_x(s):=V((1-s)x+sT(x)). \end{align*} Since $V\in C^2(\mathbb R^n)$ and the map $s\mapsto (1-s)x+sT(x)$ is affine, the chain rule gives $g_x\in C^2([0,1])$. Differentiating twice in the one-dimensional variable $s$ gives \begin{align*} g_x''(s)=(T(x)-x)^\top D^2V((1-s)x+sT(x))(T(x)-x). \end{align*} Now the hypothesis on $D^2V$ applies with the vector $\xi=T(x)-x$ and the point $(1-s)x+sT(x)$. Therefore \begin{align*} g_x''(s)\ge \rho |T(x)-x|^2 \end{align*} for every $s\in[0,1]$. This means that the function \begin{align*} h_x:[0,1]\to\mathbb R \end{align*} defined by \begin{align*} h_x(s):=g_x(s)-\frac{\rho}{2}|T(x)-x|^2s^2 \end{align*} is convex, because $h_x''(s)=g_x''(s)-\rho |T(x)-x|^2\ge0$. Convexity of $h_x$ between $0$ and $1$ gives \begin{align*} h_x(t)\le (1-t)h_x(0)+t h_x(1). \end{align*} Substituting the definition of $h_x$ into this inequality yields \begin{align*} g_x(t)-\frac{\rho}{2}|T(x)-x|^2t^2\le (1-t)g_x(0)+t\left(g_x(1)-\frac{\rho}{2}|T(x)-x|^2\right). \end{align*} Rearranging gives \begin{align*} g_x(t)\le (1-t)g_x(0)+t g_x(1)-\frac{\rho}{2}t(1-t)|T(x)-x|^2. \end{align*} Finally, $g_x(0)=V(x)$, $g_x(1)=V(T(x))$, and $g_x(t)=V(T_t(x))$, so \begin{align*} V(T_t(x))\le (1-t)V(x)+tV(T(x))-\frac{\rho}{2}t(1-t)|T(x)-x|^2. \end{align*} This is the pointwise gain coming from uniform convexity. It is stronger than ordinary convexity exactly by the quadratic correction involving the transport distance $|T(x)-x|^2$. 
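The pointwise estimate $V(T_t(x))\le(1-t)V(x)+tV(T(x))-\frac{\rho}{2}t(1-t)|T(x)-x|^2$ holds for any two points joined by a segment. It can therefore be checked on random segments without computing a Brenier map. The sketch below is a hypothetical two-dimensional example, not taken from the source. It assumes $V(x,y)=x^2+y^2+\log(1+e^{x+y})$, whose Hessian is $2I$ plus a positive semidefinite matrix, so $\rho=2$. The function `segment_gap` (a name introduced here for illustration) returns the slack in the estimate, which should be non-negative up to rounding error.

```python
import math
import random

# Hypothetical example (not from the source): n = 2 and
# V(x, y) = x^2 + y^2 + log(1 + exp(x + y)).
# Its Hessian is 2*I plus a positive semidefinite matrix, so rho = 2.
RHO = 2.0


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def V(p):
    x, y = p
    return x * x + y * y + softplus(x + y)


def segment_gap(p, q, t: float, rho: float = RHO) -> float:
    """Slack (1-t)V(p) + tV(q) - rho/2 t(1-t)|q-p|^2 - V((1-t)p + tq).

    With p = x and q = T(x) this is the pointwise estimate of the proof;
    it should be >= 0 up to rounding whenever rho is a valid Hessian bound.
    """
    pt = ((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])
    dist2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return (1 - t) * V(p) + t * V(q) - 0.5 * rho * t * (1 - t) * dist2 - V(pt)


def random_point(rng: random.Random):
    return (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))


rng = random.Random(0)  # fixed seed for reproducibility
worst = min(
    segment_gap(random_point(rng), random_point(rng), rng.random())
    for _ in range(10_000)
)
print(worst >= -1e-9)  # prints: True
```

For the quadratic part $x^2+y^2$ the estimate is an identity. It is therefore sharp along segments on which $x+y$ is constant. With `rho=2.5` the slack becomes negative, for example on the segment from $(1,-1)$ to $(-1,1)$ at $t=1/2$.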
[/guided] [/step] [step:Integrate the potential estimate and identify the transport cost] Let $m:=V(0)-|\nabla V(0)|^2/\rho$ be the lower bound constant obtained above. Define the potential $\widetilde V:\mathbb R^n\to[0,\infty)$, which is non-negative because $V\ge m$, by \begin{align*} \widetilde V(x):=V(x)-m. \end{align*} Since the weights $1-t$ and $t$ sum to $1$, subtracting $m$ from both sides of the pointwise strong convexity estimate gives, for $\nu_0$-a.e. $x$, \begin{align*} \widetilde V(T_t(x))\le (1-t)\widetilde V(x)+t\widetilde V(T(x))-\frac{\rho}{2}t(1-t)|T(x)-x|^2. \end{align*} The functions $\widetilde V$ and $\widetilde V\circ T$ are $\nu_0$-integrable because $P_V(\nu_0)$ and $P_V(\nu_1)$ are finite and $T_{\#}\nu_0=\nu_1$. The function $x\mapsto |T(x)-x|^2$ is $\nu_0$-integrable, with integral $W_2(\nu_0,\nu_1)^2<\infty$, because $T$ is an optimal transport map between measures in $\mathcal P_2(\mathbb R^n)$. Hence the right-hand side is integrable. Since $\widetilde V\circ T_t$ is non-negative and bounded above by this integrable function, it is also $\nu_0$-integrable. Integrating with respect to $\nu_0$ is therefore justified and gives \begin{align*} \int_{\mathbb R^n}\widetilde V(T_t(x))\,d\nu_0(x)\le (1-t)\int_{\mathbb R^n}\widetilde V(x)\,d\nu_0(x)+t\int_{\mathbb R^n}\widetilde V(T(x))\,d\nu_0(x)-\frac{\rho}{2}t(1-t)\int_{\mathbb R^n}|T(x)-x|^2\,d\nu_0(x). \end{align*} Because $\nu_t=(T_t)_{\#}\nu_0$ and $T_{\#}\nu_0=\nu_1$, the pushforward identity gives \begin{align*} \int_{\mathbb R^n}\widetilde V(T_t(x))\,d\nu_0(x)=P_V(\nu_t)-m \end{align*} and \begin{align*} \int_{\mathbb R^n}\widetilde V(T(x))\,d\nu_0(x)=P_V(\nu_1)-m. \end{align*} Also, \begin{align*} \int_{\mathbb R^n}\widetilde V(x)\,d\nu_0(x)=P_V(\nu_0)-m. \end{align*} Substituting these three identities into the integrated inequality cancels the constants because $(1-t)m+tm=m$. Thus \begin{align*} P_V(\nu_t)\le (1-t)P_V(\nu_0)+tP_V(\nu_1)-\frac{\rho}{2}t(1-t)\int_{\mathbb R^n}|T(x)-x|^2\,d\nu_0(x). 
\end{align*} Since $T$ is the optimal transport map from $\nu_0$ to $\nu_1$ for the quadratic cost, \begin{align*} \int_{\mathbb R^n}|T(x)-x|^2\,d\nu_0(x)=W_2(\nu_0,\nu_1)^2. \end{align*} Therefore \begin{align*} P_V(\nu_t)\le (1-t)P_V(\nu_0)+tP_V(\nu_1)-\frac{\rho}{2}t(1-t)W_2(\nu_0,\nu_1)^2. \end{align*} [guided] The pointwise estimate from the previous step becomes useful only after integrating it against the starting measure $\nu_0$. Because $V$ may take negative values, we first subtract the lower bound constant $m$ and work with the non-negative function $\widetilde V=V-m$. This avoids any ambiguity in the integral while preserving the same convexity correction term. For $\nu_0$-a.e. $x$, the previous step gives \begin{align*} \widetilde V(T_t(x))\le (1-t)\widetilde V(x)+t\widetilde V(T(x))-\frac{\rho}{2}t(1-t)|T(x)-x|^2. \end{align*} The functions on the right are integrable with respect to $\nu_0$: the endpoint potential energies are finite, $T_{\#}\nu_0=\nu_1$, and the quadratic transport cost is finite because $T$ is optimal between measures in $\mathcal P_2(\mathbb R^n)$. Thus integration with respect to $\nu_0$ is justified and gives \begin{align*} \int_{\mathbb R^n}\widetilde V(T_t(x))\,d\nu_0(x)\le (1-t)\int_{\mathbb R^n}\widetilde V(x)\,d\nu_0(x)+t\int_{\mathbb R^n}\widetilde V(T(x))\,d\nu_0(x)-\frac{\rho}{2}t(1-t)\int_{\mathbb R^n}|T(x)-x|^2\,d\nu_0(x). \end{align*} The pushforward identities identify these three potential terms as $P_V(\nu_t)-m$, $P_V(\nu_0)-m$, and $P_V(\nu_1)-m$. The constants cancel because $(1-t)m+tm=m$. Finally, the optimality of $T$ identifies the last integral with $W_2(\nu_0,\nu_1)^2$. Therefore \begin{align*} P_V(\nu_t)\le (1-t)P_V(\nu_0)+tP_V(\nu_1)-\frac{\rho}{2}t(1-t)W_2(\nu_0,\nu_1)^2. \end{align*} [/guided] [/step] [step:Add the entropy and potential inequalities] Before using the entropy splitting at time $t$, we verify that it is an ordinary equality. The preceding step proves $P_V(\nu_t)\in\mathbb R$. 
The displacement convexity estimate gives $E_{\mathcal L}(\nu_t)<\infty$ because $E_{\mathcal L}(\nu_0)$ and $E_{\mathcal L}(\nu_1)$ are finite. In particular, $\nu_t$ is absolutely continuous with respect to $\mathcal L^n$, since otherwise $E_{\mathcal L}(\nu_t)=+\infty$ by definition. Since also $\nu_t\in\mathcal P_2(\mathbb R^n)$, the same Gaussian comparison used for the endpoints gives $E_{\mathcal L}(\nu_t)>-\infty$. Hence $E_{\mathcal L}(\nu_t)\in\mathbb R$. The entropy splitting therefore applies to $\nu_t$ as an ordinary equality of real numbers, exactly as it does for $\nu_0$ and $\nu_1$. Thus \begin{align*} H(\nu_t\mid\mu)=E_{\mathcal L}(\nu_t)+P_V(\nu_t)+\log Z. \end{align*} Combining the displacement convexity estimate for $E_{\mathcal L}$ with the strong convexity estimate for $P_V$ gives \begin{align*} H(\nu_t\mid\mu)\le (1-t)E_{\mathcal L}(\nu_0)+tE_{\mathcal L}(\nu_1)+(1-t)P_V(\nu_0)+tP_V(\nu_1)+\log Z-\frac{\rho}{2}t(1-t)W_2(\nu_0,\nu_1)^2. \end{align*} Write $\log Z=(1-t)\log Z+t\log Z$ and use the entropy splitting at the endpoints. Then the right-hand side equals \begin{align*} (1-t)H(\nu_0\mid\mu)+tH(\nu_1\mid\mu)-\frac{\rho}{2}t(1-t)W_2(\nu_0,\nu_1)^2. \end{align*} Thus, for every $t\in[0,1]$, \begin{align*} H(\nu_t\mid\mu) \le (1-t)H(\nu_0\mid\mu)+tH(\nu_1\mid\mu)-\frac{\rho}{2}t(1-t)W_2(\nu_0,\nu_1)^2. \end{align*} This is the desired displacement $\rho$-convexity inequality for the relative entropy. [/step]
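The full inequality can be checked in closed form in a special case. The sketch below assumes $n=1$ and $V(x)=\rho x^2/2$, so that $\mu$ is the centred Gaussian with variance $1/\rho$ and $Z=\sqrt{2\pi/\rho}$. It also assumes Gaussian endpoints $\nu_i=N(b_i,s_i^2)$. These choices are illustrative and not part of the source. The sketch relies on three standard one-dimensional facts:

- the monotone map $T(x)=b_1+(s_1/s_0)(x-b_0)$ is the optimal map;
- $\nu_t=N\bigl((1-t)b_0+tb_1,((1-t)s_0+ts_1)^2\bigr)$;
- $W_2(\nu_0,\nu_1)^2=(b_1-b_0)^2+(s_1-s_0)^2$.

The relative entropy is assembled as $E_{\mathcal L}+P_V+\log Z$, mirroring the splitting in the proof.

```python
import math

# Illustrative special case (not from the source): n = 1, V(x) = rho x^2 / 2,
# so mu = N(0, 1/rho), Z = sqrt(2 pi / rho), and Gaussian endpoints N(b_i, s_i^2).


def lebesgue_entropy(s: float) -> float:
    """E_L(N(b, s^2)) = -1/2 log(2 pi e s^2)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * s * s)


def potential_energy(b: float, s: float, rho: float) -> float:
    """P_V(N(b, s^2)) = rho/2 (b^2 + s^2) for V(x) = rho x^2 / 2."""
    return 0.5 * rho * (b * b + s * s)


def log_Z(rho: float) -> float:
    """log of Z = integral of exp(-rho x^2 / 2) dx = sqrt(2 pi / rho)."""
    return 0.5 * math.log(2.0 * math.pi / rho)


def rel_entropy(b: float, s: float, rho: float) -> float:
    """H(N(b, s^2) | mu) assembled as E_L + P_V + log Z, as in the proof."""
    return lebesgue_entropy(s) + potential_energy(b, s, rho) + log_Z(rho)


def geodesic_point(b0, s0, b1, s1, t):
    """nu_t = (T_t)_# nu_0 for the monotone map T(x) = b1 + (s1 / s0)(x - b0)."""
    return (1 - t) * b0 + t * b1, (1 - t) * s0 + t * s1


def convexity_slack(b0, s0, b1, s1, t, rho):
    """Right-hand side minus left-hand side of the rho-convexity inequality."""
    bt, st = geodesic_point(b0, s0, b1, s1, t)
    w2_sq = (b1 - b0) ** 2 + (s1 - s0) ** 2
    rhs = ((1 - t) * rel_entropy(b0, s0, rho) + t * rel_entropy(b1, s1, rho)
           - 0.5 * rho * t * (1 - t) * w2_sq)
    return rhs - rel_entropy(bt, st, rho)


endpoints = [(0.0, 1.0, 3.0, 0.2), (-2.0, 0.5, 1.0, 4.0), (1.0, 2.0, 1.0, 2.0)]
worst = min(
    convexity_slack(b0, s0, b1, s1, k / 20, rho)
    for rho in (0.5, 1.0, 3.0)
    for (b0, s0, b1, s1) in endpoints
    for k in range(21)
)
print(worst >= -1e-9)  # prints: True
```

In this example the mean and variance contributions cancel exactly. The remaining slack is $\log s_t-(1-t)\log s_0-t\log s_1\ge0$, which comes from the Lebesgue entropy. When $s_0=s_1$ the inequality is an equality, so the constant $\frac{\rho}{2}$ cannot be improved for this choice of $V$.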
