Euler-Lagrange Equation for an Entropy-Potential JKO Step

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We perturb the minimising measure by compactly supported smooth flows and use minimality to force the first variation of the JKO functional to vanish. The entropy and potential terms are differentiated by change of variables and the chain rule, while the Wasserstein term is controlled using the optimal map from the minimiser to the previous step. Testing with both positive and negative perturbation parameters gives an integral identity against every compactly supported smooth vector field. The fundamental lemma then identifies the integrand and gives the Euler-Lagrange equation. [/proofplan] [step:Introduce compactly supported flow perturbations of the minimiser] Set \begin{align*} \rho:=\rho_{\tau,k+1}. \end{align*} Define \begin{align*} \mu:=\rho\,\mathcal L^n. \end{align*} Let \begin{align*} \xi:\mathbb R^n\to\mathbb R^n \end{align*} be a compactly supported $C^\infty$ vector field. For $\varepsilon\in\mathbb R$ with $|\varepsilon|$ sufficiently small, define the map $\Phi_\varepsilon:\mathbb R^n\to\mathbb R^n$ by \begin{align*} \Phi_\varepsilon(x):=x+\varepsilon \xi(x). \end{align*} Since $\xi$ is smooth and compactly supported, $\Phi_\varepsilon$ is a $C^1$ diffeomorphism for sufficiently small $|\varepsilon|$, and it equals the identity outside a compact set. Define the perturbed probability measure \begin{align*} \mu_\varepsilon:=(\Phi_\varepsilon)_{\#}\mu. \end{align*} The compact support of $\xi$ implies that $\mu_\varepsilon\in\mathcal P_2(\mathbb R^n)$ and that the potential term remains finite whenever it is finite for $\mu$. The bounded-below and linear-gradient-growth assumptions on $V$ are used here to make the entropy-plus-potential functional well defined on the finite-second-moment class and to keep these compactly supported competitors admissible; the local Euler-Lagrange variation below uses only the stated $C^1$ regularity and local boundedness of $\nabla V$. [/step] [step:Use minimality to obtain a two-sided first-variation inequality] Define the functional $J:\mathcal P_2(\mathbb R^n)\to(-\infty,+\infty]$ by \begin{align*} J[\nu]:=\mathcal F[\nu]+\frac{1}{2\tau}W_2^2(\nu,\mu_{\tau,k}). \end{align*} Since $\mu$ minimises $J$, for all sufficiently small $\varepsilon$ one has \begin{align*} 0\le J[\mu_\varepsilon]-J[\mu]. \end{align*} We will compute the first variations of the three terms in $J[\mu_\varepsilon]$ at $\varepsilon=0$. [guided] The perturbation $\mu_\varepsilon=(\Phi_\varepsilon)_{\#}\mu$ is an admissible competitor in the minimisation problem because it is still a probability measure with finite second moment. Thus the variational principle is not an equality at first; it is the inequality \begin{align*} 0\le J[\mu_\varepsilon]-J[\mu]. \end{align*} The point of using both signs of $\varepsilon$ is that, after we compute an upper first-order expansion of the right-hand side, the inequality for $\varepsilon>0$ gives one sign and the inequality for $\varepsilon<0$ gives the opposite sign. This is the standard way to extract the Euler-Lagrange equation from a minimisation problem when the Wasserstein term is estimated using a competitor coupling rather than differentiated by an independent differentiability theorem. [/guided] [/step] [step:Differentiate the entropy by the Jacobian change of variables] Let $\rho_\varepsilon:\mathbb R^n\to[0,\infty)$ denote the density of $\mu_\varepsilon$ with respect to $\mathcal L^n$. Let $I_n\in\mathbb R^{n\times n}$ be the identity matrix, and let $J\xi_x\in\mathbb R^{n\times n}$ denote the Jacobian matrix of $\xi$ at $x$, with entries $(J\xi_x)_{ij}=\partial_{x_j}\xi_i(x)$. By the change-of-variables formula for the diffeomorphism $\Phi_\varepsilon$, \begin{align*} \rho_\varepsilon(\Phi_\varepsilon(x))\det J(\Phi_\varepsilon)_x=\rho(x) \end{align*} for $\mathcal L^n$-a.e. $x\in\mathbb R^n$, where $J(\Phi_\varepsilon)_x=I_n+\varepsilon J\xi_x$ is the Jacobian matrix of $\Phi_\varepsilon$ at $x$. Hence \begin{align*} \int_{\mathbb R^n}\rho_\varepsilon\log\rho_\varepsilon\,d\mathcal L^n = \int_{\mathbb R^n}\rho(x)\log\left(\frac{\rho(x)}{\det(I_n+\varepsilon J\xi_x)}\right)\,d\mathcal L^n(x). \end{align*} Let $K:=\operatorname{supp}\xi$. Since $J\xi$ is bounded on $K$, there are $\varepsilon_0>0$ and $m>0$ such that $\det(I_n+\varepsilon J\xi_x)\ge m$ for every $x\in\mathbb R^n$ and $|\varepsilon|<\varepsilon_0$. The difference quotients of $\log\det(I_n+\varepsilon J\xi_x)$ are bounded on $K$ by a constant depending only on $\xi$ and vanish off $K$. Since $\rho\in C^1(\mathbb R^n)$ and $K$ is compact, $\rho\mathbb 1_K\in L^1(\mathbb R^n,\mathcal L^n)$. Dominated convergence therefore justifies differentiating under the integral. Using \begin{align*} \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0}\log\det(I_n+\varepsilon J\xi_x)=\operatorname{tr}(J\xi_x)=\nabla\cdot\xi(x), \end{align*} we obtain \begin{align*} \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} \int_{\mathbb R^n}\rho_\varepsilon\log\rho_\varepsilon\,d\mathcal L^n = -\int_{\mathbb R^n}\rho(x)\nabla\cdot\xi(x)\,d\mathcal L^n(x). \end{align*} Because $\rho\in C^1(\mathbb R^n)$ and $\xi$ is compactly supported, integration by parts on a compact set gives \begin{align*} -\int_{\mathbb R^n}\rho(x)\nabla\cdot\xi(x)\,d\mathcal L^n(x) = \int_{\mathbb R^n}\nabla\rho(x)\cdot\xi(x)\,d\mathcal L^n(x). \end{align*} Since $\rho>0$, this is \begin{align*} \int_{\mathbb R^n}\nabla\log\rho(x)\cdot\xi(x)\rho(x)\,d\mathcal L^n(x). \end{align*} [/step] [step:Differentiate the potential term by the chain rule] Using the definition of pushforward, \begin{align*} \int_{\mathbb R^n}V\,d\mu_\varepsilon = \int_{\mathbb R^n}V(\Phi_\varepsilon(x))\rho(x)\,d\mathcal L^n(x). \end{align*} The chain rule gives \begin{align*} \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0}V(\Phi_\varepsilon(x)) = \nabla V(x)\cdot\xi(x). \end{align*} Since $\xi$ is compactly supported and $\nabla V$ is continuous, the derivative is dominated on the support of $\xi$ by an integrable multiple of $\rho$. Therefore differentiation under the integral gives \begin{align*} \frac{d}{d\varepsilon}\bigg|_{\varepsilon=0} \int_{\mathbb R^n}V\,d\mu_\varepsilon = \int_{\mathbb R^n}\nabla V(x)\cdot\xi(x)\rho(x)\,d\mathcal L^n(x). \end{align*} [/step] [step:Estimate the Wasserstein variation using the optimal map] Since $T_{\#}\mu=\mu_{\tau,k}$, the measure \begin{align*} (\Phi_\varepsilon,T)_{\#}\mu \end{align*} is a coupling of $\mu_\varepsilon$ and $\mu_{\tau,k}$. Therefore \begin{align*} W_2^2(\mu_\varepsilon,\mu_{\tau,k}) \le \int_{\mathbb R^n}|\Phi_\varepsilon(x)-T(x)|^2\rho(x)\,d\mathcal L^n(x). \end{align*} At $\varepsilon=0$, the map $T$ is optimal, so \begin{align*} W_2^2(\mu,\mu_{\tau,k}) = \int_{\mathbb R^n}|x-T(x)|^2\rho(x)\,d\mathcal L^n(x). \end{align*} Subtracting the two displays and expanding the square gives \begin{align*} W_2^2(\mu_\varepsilon,\mu_{\tau,k})-W_2^2(\mu,\mu_{\tau,k}) \le 2\varepsilon\int_{\mathbb R^n}(x-T(x))\cdot\xi(x)\rho(x)\,d\mathcal L^n(x) + \varepsilon^2\int_{\mathbb R^n}|\xi(x)|^2\rho(x)\,d\mathcal L^n(x). \end{align*} Thus the Wasserstein contribution to the first variation is bounded above by \begin{align*} \frac{1}{\tau}\int_{\mathbb R^n}(x-T(x))\cdot\xi(x)\rho(x)\,d\mathcal L^n(x). \end{align*} [guided] We need a first-order formula for the squared Wasserstein term. Instead of assuming differentiability of the map $\nu\mapsto W_2^2(\nu,\mu_{\tau,k})$, we use the optimal map already present in the theorem statement. Since $T_{\#}\mu=\mu_{\tau,k}$, transporting each point $x$ first to $\Phi_\varepsilon(x)$ and pairing it with $T(x)$ gives a coupling of $\mu_\varepsilon$ and $\mu_{\tau,k}$. This coupling may not be optimal after perturbation, but it gives the upper bound \begin{align*} W_2^2(\mu_\varepsilon,\mu_{\tau,k}) \le \int_{\mathbb R^n}|\Phi_\varepsilon(x)-T(x)|^2\rho(x)\,d\mathcal L^n(x). \end{align*} At $\varepsilon=0$, the coupling induced by $T$ is optimal by hypothesis, so \begin{align*} W_2^2(\mu,\mu_{\tau,k}) = \int_{\mathbb R^n}|x-T(x)|^2\rho(x)\,d\mathcal L^n(x). \end{align*} Now substitute $\Phi_\varepsilon(x)=x+\varepsilon\xi(x)$ and expand the Euclidean square: \begin{align*} |\Phi_\varepsilon(x)-T(x)|^2=|x-T(x)|^2+2\varepsilon (x-T(x))\cdot\xi(x)+\varepsilon^2|\xi(x)|^2. \end{align*} After integrating with respect to $\rho\,d\mathcal L^n$, the zeroth-order terms cancel. Dividing by $2\tau$ shows that the first-order upper variation of the Wasserstein penalty is \begin{align*} \frac{1}{\tau}\int_{\mathbb R^n}(x-T(x))\cdot\xi(x)\rho(x)\,d\mathcal L^n(x). \end{align*} The remaining $\varepsilon^2$ term disappears after division by $\varepsilon$ and passage to $\varepsilon\to0$. [/guided] [/step] [step:Combine the variations and use both signs of the perturbation] For $\varepsilon>0$, minimality gives $0\le J[\mu_\varepsilon]-J[\mu]$. Insert the entropy and potential first-order expansions and the Wasserstein upper expansion proved above, divide the resulting upper bound by $\varepsilon$, and let $\varepsilon\downarrow0$. The entropy and potential quotients converge by the differentiability already established, and the Wasserstein remainder contributes \begin{align*} \frac{\varepsilon}{2\tau}\int_{\mathbb R^n}|\xi(x)|^2\rho(x)\,d\mathcal L^n(x), \end{align*} which tends to $0$ because $\xi$ is compactly supported and $\rho\in L^1_{\mathrm{loc}}(\mathbb R^n)$. Therefore \begin{align*} 0 \le \int_{\mathbb R^n} \left( \nabla\log\rho(x)+\nabla V(x)+\frac{x-T(x)}{\tau} \right)\cdot\xi(x)\rho(x)\,d\mathcal L^n(x). \end{align*} Repeating the same argument with $-\xi$ in place of $\xi$ gives \begin{align*} 0 \le -\int_{\mathbb R^n} \left( \nabla\log\rho(x)+\nabla V(x)+\frac{x-T(x)}{\tau} \right)\cdot\xi(x)\rho(x)\,d\mathcal L^n(x), \end{align*} which is the reverse inequality. Hence \begin{align*} \int_{\mathbb R^n} \left( \nabla\log\rho(x)+\nabla V(x)+\frac{x-T(x)}{\tau} \right)\cdot\xi(x)\rho(x)\,d\mathcal L^n(x)=0 \end{align*} for every $\xi\in C_c^\infty(\mathbb R^n;\mathbb R^n)$. [/step] [step:Apply the fundamental lemma to identify the vector field] Define the Borel vector field $G:\mathbb R^n\to\mathbb R^n$ by \begin{align*} G(x):=\nabla\log\rho(x)+\nabla V(x)+\frac{x-T(x)}{\tau}. \end{align*} The preceding step says that \begin{align*} \int_{\mathbb R^n}G(x)\cdot\xi(x)\rho(x)\,d\mathcal L^n(x)=0 \end{align*} for every compactly supported smooth vector field $\xi:\mathbb R^n\to\mathbb R^n$. Each component of $G$ is locally integrable with respect to $\rho\,d\mathcal L^n$: the terms $\nabla\log\rho$ and $\nabla V$ are continuous, and the term $x-T(x)$ is locally $\rho\,d\mathcal L^n$-integrable because $T\in L^2(\mu;\mathbb R^n)$ and $\mu\in\mathcal P_2(\mathbb R^n)$. Define $H:\mathbb R^n\to\mathbb R^n$ by $H(x):=G(x)\rho(x)$. The preceding local integrability check shows that each component of $H$ belongs to $L^1_{\mathrm{loc}}(\mathbb R^n,\mathcal L^n)$. The integral identity is equivalently \begin{align*} \int_{\mathbb R^n}H(x)\cdot\xi(x)\,d\mathcal L^n(x)=0 \end{align*} for every compactly supported smooth vector field $\xi:\mathbb R^n\to\mathbb R^n$. By the fundamental lemma of the calculus of variations for vector-valued locally integrable functions, applied componentwise with respect to $\mathcal L^n$, we get $H(x)=0$ for $\mathcal L^n$-a.e. $x\in\mathbb R^n$. Since $\rho$ is continuous and strictly positive, this implies \begin{align*} G(x)=0 \end{align*} for $\rho\,\mathcal L^n$-a.e. $x\in\mathbb R^n$. Therefore \begin{align*} \frac{T(x)-x}{\tau}=\nabla\log\rho(x)+\nabla V(x) \end{align*} for $\rho\,\mathcal L^n$-a.e. $x\in\mathbb R^n$. Recalling that $\rho=\rho_{\tau,k+1}$ and $\mu=\rho_{\tau,k+1}\mathcal L^n$, this is exactly the claimed Euler-Lagrange equation. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.