Gibbs Variational Principle — Statement & Proof

Theorem

Proof

[proofplan] We first prove the variational identity for [measurable functions](/page/Measurable%20Functions) that are bounded above; this includes the bounded case and supplies the entropy comparison. The key algebra is to compare each competitor $\nu$ with the tilted measure whose density is proportional to $e^f$ and to identify the gap as a relative entropy, whose non-negativity follows from the elementary inequality $a\log a-a+1\geq 0$. We then obtain the upper bound for a general $g$ by applying the bounded formula to two-sided truncations and passing to the limit. Finally, the reverse inequality is obtained from tilted probability measures supported on the sets $\{|g|\leq m\}$, on which $g$ is bounded, so each competitor has finite $g^+$-integral under the theorem's extended-value convention. [/proofplan] [step:Record the entropy inequality used in the variational comparison] Let $(X,\mathcal A,\lambda)$ be a probability space, and let $\rho$ be a probability measure on $(X,\mathcal A)$ with $\rho\ll\lambda$. Let \begin{align*} r:X\to[0,\infty] \end{align*} denote a [Radon-Nikodym density](/page/Absolutely%20Continuous%20Measures) of $\rho$ with respect to $\lambda$, so that $\rho(A)=\int_A r\,d\lambda$ for every $A\in\mathcal A$ and $\int_X r\,d\lambda=1$. Define $\psi:[0,\infty]\to[0,\infty]$ by $\psi(a)=a\log a-a+1$ for $a\in(0,\infty)$, with $\psi(0)=1$ and $\psi(+\infty)=+\infty$. We use the standard entropy convention $0\log 0=0$ when writing $r\log r$. On $(0,\infty)$, the function $\psi$ is convex with $\psi'(a)=\log a$, so $\psi'(1)=0$ and $a=1$ is a global minimizer with $\psi(1)=0$. Together with $\psi(0)=1$ and $\psi(+\infty)=+\infty$, this gives $\psi(a)\geq 0$ for every $a\in[0,\infty]$, with equality only at $a=1$. Since $r<\infty$ $\lambda$-a.e. and $\int_X r\,d\lambda=\int_X 1\,d\lambda=1$, we obtain \begin{align*} \int_X r\log r\,d\lambda=\int_X \psi(r)\,d\lambda+\int_X r\,d\lambda-\int_X 1\,d\lambda\geq 0. \end{align*} Thus $D(\rho\|\lambda)=\int_X r\log r\,d\lambda\geq 0$. Moreover, equality can occur only when $\psi(r)=0$ $\lambda$-a.e., hence only when $r=1$ $\lambda$-a.e., that is, $\rho=\lambda$. 
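As a numerical illustration of this entropy inequality, the following minimal sketch works on a three-point probability space. The helper names `relative_entropy` and `psi`, and the particular weights `lam` and `rho`, are assumptions chosen for the example and do not come from the theorem. It checks that computing $D(\rho\|\lambda)$ directly agrees with integrating $\psi(r)$ against $\lambda$, and that the value is $0$ when $\rho=\lambda$.

```python
import math

def relative_entropy(rho, lam):
    """D(rho || lam) on a finite space, using 0 log 0 = 0.

    Returns +inf when rho is not absolutely continuous w.r.t. lam.
    """
    total = 0.0
    for p, q in zip(rho, lam):
        if p == 0.0:
            continue          # convention 0 log 0 = 0
        if q == 0.0:
            return math.inf   # rho charges a lam-null point
        total += p * math.log(p / q)
    return total

def psi(a):
    """psi(a) = a log a - a + 1, extended by psi(0) = 1."""
    return 1.0 if a == 0.0 else a * math.log(a) - a + 1.0

# Hypothetical reference measure lam and competitor rho on three points.
lam = [0.5, 0.3, 0.2]
rho = [0.2, 0.2, 0.6]

# Density r = d(rho)/d(lam) at each point.
r = [p / q for p, q in zip(rho, lam)]

d_direct = relative_entropy(rho, lam)
# Integral of psi(r) against lam; equals D because both r and 1 integrate to 1.
d_via_psi = sum(psi(a) * q for a, q in zip(r, lam))

print(d_direct, d_via_psi)
print(relative_entropy(lam, lam))
```

Because every summand $\psi(r)\,\lambda(\{x\})$ is non-negative, the second computation makes the sign of the relative entropy visible term by term, which the direct sum of $r\log r$ does not.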
[/step] [step:Prove the formula when the potential is bounded above] Let \begin{align*} f:E\to\mathbb R \end{align*} be an $\mathcal E$-measurable function bounded above. Define $f^+:E\to[0,\infty)$ by $f^+(x)=\max\{f(x),0\}$ and $f^-:E\to[0,\infty)$ by $f^-(x)=\max\{-f(x),0\}$. Assume \begin{align*} 0<Z_f:=\int_E e^f\,d\mu<\infty. \end{align*} Define the tilted probability measure $\mu_f$ by \begin{align*} \frac{d\mu_f}{d\mu}=\frac{e^f}{Z_f}. \end{align*} This is a probability measure because the density \begin{align*} E\to[0,\infty),\qquad x\mapsto \frac{e^{f(x)}}{Z_f} \end{align*} is non-negative and integrates to $1$ with respect to $\mu$. Let $\nu\in\mathcal P(E)$. If $\nu\not\ll\mu$, then $D(\nu\|\mu)=+\infty$, and the variational expression is $-\infty$. Suppose $\nu\ll\mu$, and let \begin{align*} h:E\to[0,\infty] \end{align*} be a [Radon-Nikodym density](/page/Absolutely%20Continuous%20Measures) of $\nu$ with respect to $\mu$. If $D(\nu\|\mu)=+\infty$, then the variational expression is $-\infty$ under the same extended-value convention, so the upper bound is immediate. Hence assume $D(\nu\|\mu)<\infty$. Since the density \begin{align*} E\to(0,\infty),\qquad x\mapsto \frac{e^{f(x)}}{Z_f} \end{align*} is strictly positive, $\nu\ll\mu_f$, and the [Radon-Nikodym density](/page/Absolutely%20Continuous%20Measures) of $\nu$ with respect to $\mu_f$ is \begin{align*} \frac{d\nu}{d\mu_f}=\frac{h}{e^f/Z_f}. \end{align*} If \begin{align*} \int_E f^-\,d\nu=\infty, \end{align*} then \begin{align*} \int_E f\,d\nu=-\infty, \end{align*} so the desired upper bound is immediate. Otherwise all terms below are well-defined in $(-\infty,\infty]$. Since $D(\nu\|\mu)<\infty$, the positive part of $h\log h$ is $\mu$-integrable, and the negative part of $h\log h$ is integrable with respect to $\mu$ because $a\log a\geq -e^{-1}$ for $a\geq 0$ and $\mu(E)=1$. 
Since $f$ is bounded above, $f^+$ is bounded, so $\int_E f^+\,d\nu<\infty$; by the present case assumption, $\int_E f^-\,d\nu<\infty$. Hence $\int_E fh\,d\mu=\int_E f\,d\nu$ is finite, and the relative entropy expansion below is not an undefined extended-real subtraction. Direct expansion gives \begin{align*} D(\nu\|\mu_f)=\int_E h\log\left(\frac{h}{e^f/Z_f}\right)\,d\mu. \end{align*} Therefore \begin{align*} D(\nu\|\mu_f)=\int_E h\log h\,d\mu-\int_E fh\,d\mu+\log Z_f\int_E h\,d\mu. \end{align*} Since $\int_E h\,d\mu=1$, this rearranges to \begin{align*} \int_E f\,d\nu-D(\nu\|\mu)=\log Z_f-D(\nu\|\mu_f). \end{align*} The [entropy inequality](/theorems/6729) from the previous step gives $D(\nu\|\mu_f)\geq 0$, hence \begin{align*} \int_E f\,d\nu-D(\nu\|\mu)\leq \log Z_f. \end{align*} It remains to check that equality is achieved at $\mu_f$. Because $f$ is bounded above, say $f\leq M$, the negative part of $f e^f$ is integrable with respect to $\mu$: on $\{f<0\}$, the function $-f e^f$ is bounded above by $e^{-1}$, and on $\{f\geq 0\}$ there is no negative part. Hence $\int_E f\,d\mu_f$ is finite from below. Also \begin{align*} D(\mu_f\|\mu)=\int_E \frac{e^f}{Z_f}\left(f-\log Z_f\right)\,d\mu \end{align*} is finite-valued. Substituting $\nu=\mu_f$ in the identity above gives \begin{align*} \int_E f\,d\mu_f-D(\mu_f\|\mu)=\log Z_f. \end{align*} Thus \begin{align*} \log\int_E e^f\,d\mu=\sup_{\nu\in\mathcal P(E)}\left\{\int_E f\,d\nu-D(\nu\|\mu)\right\} \end{align*} whenever $f$ is measurable, bounded above, and has $0<\int_E e^f\,d\mu<\infty$. [guided] The central idea is to compare every competitor $\nu$ with the probability measure suggested by the formula. Let \begin{align*} f:E\to\mathbb R \end{align*} be an $\mathcal E$-measurable function bounded above. Define $f^+:E\to[0,\infty)$ by $f^+(x)=\max\{f(x),0\}$ and $f^-:E\to[0,\infty)$ by $f^-(x)=\max\{-f(x),0\}$. Define \begin{align*} Z_f:=\int_E e^f\,d\mu. 
\end{align*} The assumptions say $0<Z_f<\infty$, so the formula \begin{align*} \frac{d\mu_f}{d\mu}=\frac{e^f}{Z_f} \end{align*} defines a probability measure $\mu_f$ on $(E,\mathcal E)$. Now fix an arbitrary probability measure $\nu$ on $(E,\mathcal E)$. If $\nu$ is not absolutely continuous with respect to $\mu$, then $D(\nu\|\mu)=+\infty$ by definition, and this competitor contributes $-\infty$ to the supremum. Such a measure cannot improve the variational value. Assume therefore that $\nu\ll\mu$, and let \begin{align*} h:E\to[0,\infty] \end{align*} be a [Radon-Nikodym density](/page/Absolutely%20Continuous%20Measures) of $\nu$ with respect to $\mu$. If $D(\nu\|\mu)=+\infty$, then this competitor has variational value $-\infty$ under the extended-value convention, so it cannot improve the upper bound. We may therefore assume $D(\nu\|\mu)<\infty$ before performing the entropy algebra. Since the density \begin{align*} E\to(0,\infty),\qquad x\mapsto \frac{e^{f(x)}}{Z_f} \end{align*} is strictly positive, $\nu$ is also absolutely continuous with respect to $\mu_f$, and its density relative to $\mu_f$ is \begin{align*} \frac{d\nu}{d\mu_f}=\frac{h}{e^f/Z_f}. \end{align*} This is the quantity whose entropy measures the loss from not choosing the tilted measure. If \begin{align*} \int_E f^-\,d\nu=\infty, \end{align*} then \begin{align*} \int_E f\,d\nu=-\infty \end{align*} because $f^+$ is bounded by the assumed upper bound on $f$. The variational expression is then $-\infty$, and the upper bound follows. Otherwise $\int_E f\,d\nu$ is well-defined. We also check that the entropy computation is legitimate in the extended-real sense. Since $D(\nu\|\mu)<\infty$, the positive part of $h\log h$ is $\mu$-integrable, while the inequality $a\log a\geq -e^{-1}$ for $a\geq 0$ and the identity $\mu(E)=1$ show that the negative part of $h\log h$ is $\mu$-integrable. 
Since $f$ is bounded above and we are in the case $\int_E f^-\,d\nu<\infty$, the integral $\int_E fh\,d\mu=\int_E f\,d\nu$ is finite. Thus the following expansion is not an undefined $\infty-\infty$ subtraction. We compute the relative entropy of $\nu$ with respect to $\mu_f$: \begin{align*} D(\nu\|\mu_f)=\int_E h\log\left(\frac{h}{e^f/Z_f}\right)\,d\mu. \end{align*} Expanding the logarithm gives \begin{align*} D(\nu\|\mu_f)=\int_E h\log h\,d\mu-\int_E fh\,d\mu+\log Z_f\int_E h\,d\mu. \end{align*} Since $h$ is a probability density, $\int_E h\,d\mu=1$. Also $\int_E fh\,d\mu=\int_E f\,d\nu$. Therefore \begin{align*} \int_E f\,d\nu-D(\nu\|\mu)=\log Z_f-D(\nu\|\mu_f). \end{align*} The previous step proves $D(\nu\|\mu_f)\geq 0$, so every competitor satisfies \begin{align*} \int_E f\,d\nu-D(\nu\|\mu)\leq \log Z_f. \end{align*} Finally we verify that the proposed competitor actually reaches the upper bound. For $\nu=\mu_f$, the density $d\nu/d\mu_f$ is $1$, so the loss term $D(\mu_f\|\mu_f)$ is $0$. We also need the original expression to be finite. Since $f$ is bounded above and $-t e^t\leq e^{-1}$ for $t<0$, the negative part of $f e^f$ is $\mu$-integrable. Hence $\int_E f\,d\mu_f$ is finite from below, and \begin{align*} D(\mu_f\|\mu)=\int_E \frac{e^f}{Z_f}\left(f-\log Z_f\right)\,d\mu \end{align*} is finite-valued. Therefore \begin{align*} \int_E f\,d\mu_f-D(\mu_f\|\mu)=\log Z_f. \end{align*} This proves the bounded-above variational formula. [/guided] [/step] [step:Pass two-sided truncations to obtain the upper bound for $g$] For each $m\in\mathbb N$, define the bounded measurable truncation \begin{align*} g_m:E\to\mathbb R,\qquad x\mapsto \max\{-m,\min\{g(x),m\}\}. \end{align*} The bounded-above formula gives, for every $\nu\in\mathcal P(E)$, \begin{align*} \int_E g_m\,d\nu-D(\nu\|\mu)\leq \log\int_E e^{g_m}\,d\mu. \end{align*} The functions $e^{g_m}$ converge pointwise to $e^g$. 
Moreover $e^{g_m}\leq e^g+1$ for every $m$, so the [Dominated Convergence Theorem](/theorems/4) gives \begin{align*} \int_E e^{g_m}\,d\mu\to\int_E e^g\,d\mu=Z. \end{align*} Now fix $\nu\in\mathcal P(E)$. If \begin{align*} \int_E g^+\,d\nu=\infty \end{align*} or \begin{align*} D(\nu\|\mu)=+\infty, \end{align*} then the variational expression for $g$ is $-\infty$ by convention and is bounded above by $\log Z$. Otherwise \begin{align*} \int_E g^+\,d\nu<\infty \end{align*} and \begin{align*} D(\nu\|\mu)<\infty. \end{align*} Since \begin{align*} g_m=(g^+\wedge m)-(g^-\wedge m), \end{align*} the [Monotone Convergence Theorem](/theorems/509) applied separately to $g^+\wedge m$ and $g^-\wedge m$ gives \begin{align*} \int_E g_m\,d\nu\to\int_E g\,d\nu \end{align*} in the extended real sense, with value $-\infty$ if $\int_E g^-\,d\nu=\infty$. Passing to the limit in the inequality for $g_m$ yields \begin{align*} \int_E g\,d\nu-D(\nu\|\mu)\leq \log Z. \end{align*} Taking the supremum over all $\nu\in\mathcal P(E)$ gives \begin{align*} \sup_{\nu\in\mathcal P(E)}\left\{\int_E g\,d\nu-D(\nu\|\mu)\right\}\leq \log Z. \end{align*} [guided] The bounded-above formula is not applied directly to $g$, because $g$ may be unbounded above. Instead, for each $m\in\mathbb N$, we define the bounded measurable truncation \begin{align*} g_m:E\to\mathbb R,\qquad x\mapsto \max\{-m,\min\{g(x),m\}\}. \end{align*} The previous step applies to $g_m$, so every probability measure $\nu$ on $(E,\mathcal E)$ satisfies \begin{align*} \int_E g_m\,d\nu-D(\nu\|\mu)\leq \log\int_E e^{g_m}\,d\mu. \end{align*} We first pass to the limit on the right-hand side. The functions $e^{g_m}$ converge pointwise to $e^g$. Also $e^{g_m}\leq e^g+1$ for every $m$, and $e^g+1$ is $\mu$-integrable because $\mu$ is a probability measure and $\int_E e^g\,d\mu<\infty$. The [Dominated Convergence Theorem](/theorems/4) therefore gives \begin{align*} \int_E e^{g_m}\,d\mu\to\int_E e^g\,d\mu=Z. 
\end{align*} Now fix $\nu\in\mathcal P(E)$. If \begin{align*} \int_E g^+\,d\nu=\infty \end{align*} or \begin{align*} D(\nu\|\mu)=+\infty, \end{align*} then the convention in the theorem makes the variational expression for $g$ equal to $-\infty$, so the desired upper bound is immediate. Otherwise \begin{align*} \int_E g^+\,d\nu<\infty \end{align*} and \begin{align*} D(\nu\|\mu)<\infty. \end{align*} Since \begin{align*} g_m=(g^+\wedge m)-(g^-\wedge m), \end{align*} the [Monotone Convergence Theorem](/theorems/509) applied separately to $g^+\wedge m$ and $g^-\wedge m$ gives \begin{align*} \int_E g_m\,d\nu\to\int_E g\,d\nu \end{align*} in the extended real sense, with value $-\infty$ if $\int_E g^-\,d\nu=\infty$. Passing to the limit in the bounded-truncation inequality yields \begin{align*} \int_E g\,d\nu-D(\nu\|\mu)\leq \log Z. \end{align*} Because $\nu$ was arbitrary, taking the supremum over all probability measures on $(E,\mathcal E)$ gives \begin{align*} \sup_{\nu\in\mathcal P(E)}\left\{\int_E g\,d\nu-D(\nu\|\mu)\right\}\leq \log Z. \end{align*} [/guided] [/step] [step:Use bounded-support tilted measures to obtain the reverse inequality] For each $m\in\mathbb N$, define the measurable set \begin{align*} A_m:=\{x\in E: |g(x)|\leq m\}. \end{align*} Since $g$ is real-valued, the sets $A_m$ increase to $E$. For each $m\in\mathbb N$, let $\mathbb{1}_{A_m}:E\to\{0,1\}$ denote the indicator function of $A_m$, defined by $\mathbb{1}_{A_m}(x)=1$ for $x\in A_m$ and $\mathbb{1}_{A_m}(x)=0$ for $x\notin A_m$. Define \begin{align*} Z_m:=\int_{A_m} e^g\,d\mu. \end{align*} By the [Monotone Convergence Theorem](/theorems/509) applied to the non-negative functions $\mathbb{1}_{A_m}e^g$, we have $Z_m\uparrow Z$. Since $Z>0$, choose $m$ large enough that $Z_m>0$, and define the probability measure $\nu_m$ on $(E,\mathcal E)$ by \begin{align*} \frac{d\nu_m}{d\mu}=\frac{\mathbb{1}_{A_m}e^g}{Z_m}. 
\end{align*} On $A_m$ one has $|g|\leq m$, so $\int_E g^+\,d\nu_m\leq m$ and all terms below are finite. Moreover, \begin{align*} D(\nu_m\|\mu)=\int_E \frac{\mathbb{1}_{A_m}e^g}{Z_m}\left(g-\log Z_m\right)\,d\mu. \end{align*} Using the same density in the integral of $g$ gives \begin{align*} \int_E g\,d\nu_m-D(\nu_m\|\mu)=\log Z_m. \end{align*} Therefore \begin{align*} \sup_{\nu\in\mathcal P(E)}\left\{\int_E g\,d\nu-D(\nu\|\mu)\right\}\geq \log Z_m \end{align*} for every sufficiently large $m$. Since $Z_m\uparrow Z$ and the logarithm is increasing and continuous on $(0,\infty)$, passing $m\to\infty$ yields \begin{align*} \log Z\leq \sup_{\nu\in\mathcal P(E)}\left\{\int_E g\,d\nu-D(\nu\|\mu)\right\}. \end{align*} [/step] [step:Identify the maximizer when the tilted expression is finite] [guided] We now assemble the identity from the two estimates already proved. For the upper bound, two-sided truncations $g_m=\max\{-m,\min\{g,m\}\}$ satisfy the bounded variational formula, the [Dominated Convergence Theorem](/theorems/4) gives $\int_E e^{g_m}\,d\mu\to Z$, and the [Monotone Convergence Theorem](/theorems/509) passes $\int_E g_m\,d\nu$ to $\int_E g\,d\nu$ for every finite competitor. For the lower bound, the probability measures with density $\mathbb{1}_{\{|g|\leq m\}}e^g/Z_m$ have admissible variational value $\log Z_m$, and $Z_m\uparrow Z$. Hence \begin{align*} \log\int_E e^g\,d\mu=\sup_{\nu\in\mathcal P(E)}\left\{\int_E g\,d\nu-D(\nu\|\mu)\right\}. \end{align*} Now assume that the variational expression at $\nu_g$ is finite-valued, where \begin{align*} \frac{d\nu_g}{d\mu}=\frac{e^g}{Z}. \end{align*} Equivalently, $\nu_g$ has density \begin{align*} E\to(0,\infty),\qquad x\mapsto \frac{e^{g(x)}}{Z} \end{align*} with respect to $\mu$. Let $\nu\in\mathcal P(E)$. 
Competitors outside the finite regime cannot beat $\log Z$: if $\nu\not\ll\mu$, if $D(\nu\|\mu)=+\infty$, or if \begin{align*} \int_E g^+\,d\nu=\infty, \end{align*} then the theorem's convention assigns value $-\infty$. If instead $\nu\ll\mu$, $D(\nu\|\mu)<\infty$, and $\int_E g^+\,d\nu<\infty$, but \begin{align*} \int_E g^-\,d\nu=\infty, \end{align*} then $\int_E g\,d\nu=-\infty$, so this competitor also cannot exceed $\log Z$. It remains to handle the case where $\int_E g^+\,d\nu<\infty$, $\int_E g^-\,d\nu<\infty$, and $D(\nu\|\mu)<\infty$. Let $h:E\to[0,\infty]$ be a Radon-Nikodym density of $\nu$ with respect to $\mu$. Since the density of $\nu_g$ with respect to $\mu$ is strictly positive, $\nu\ll\nu_g$, and the density of $\nu$ with respect to $\nu_g$ is \begin{align*} \frac{d\nu}{d\nu_g}=\frac{hZ}{e^g}. \end{align*} The assumed integrability prevents any undefined $\infty-\infty$ subtraction, so expanding the relative entropy gives \begin{align*} D(\nu\|\nu_g)=D(\nu\|\mu)-\int_E g\,d\nu+\log Z. \end{align*} Equivalently, \begin{align*} \int_E g\,d\nu-D(\nu\|\mu)=\log Z-D(\nu\|\nu_g). \end{align*} The entropy inequality gives $D(\nu\|\nu_g)\geq 0$, so no admissible competitor exceeds $\log Z$. For $\nu=\nu_g$, the loss term is $D(\nu_g\|\nu_g)=0$, and the assumed finiteness makes the original expression well-defined. Thus the supremum is attained by $\nu_g$. If the tilted expression is not finite-valued, the preceding limiting construction still proves the identity. The two-sided truncations give the upper bound, and the measures supported on $A_m=\{|g|\leq m\}$ give admissible values equal to $\log Z_m$, with $Z_m\uparrow Z$. Hence the variational values converge upward to $\log Z$, which is the asserted extended-value case. [/guided] Combining the upper and lower bounds proves \begin{align*} \log\int_E e^g\,d\mu=\sup_{\nu\in\mathcal P(E)}\left\{\int_E g\,d\nu-D(\nu\|\mu)\right\}. 
\end{align*} Assume now that the variational expression at $\nu_g$ is finite-valued, where \begin{align*} \frac{d\nu_g}{d\mu}=\frac{e^g}{Z}. \end{align*} Equivalently, $\nu_g$ has density \begin{align*} E\to(0,\infty),\qquad x\mapsto \frac{e^{g(x)}}{Z} \end{align*} with respect to $\mu$. Let $\nu\in\mathcal P(E)$. If $\nu\not\ll\mu$, or $D(\nu\|\mu)=+\infty$, or $\int_E g^+\,d\nu=\infty$, then the convention in the statement makes the variational expression equal to $-\infty$, so this competitor cannot exceed $\log Z$. If $\nu\ll\mu$, $D(\nu\|\mu)<\infty$, and $\int_E g^+\,d\nu<\infty$, but $\int_E g^-\,d\nu=\infty$, then $\int_E g\,d\nu=-\infty$, and again the competitor cannot exceed $\log Z$. It remains only to consider the case where $\int_E g^+\,d\nu<\infty$, $\int_E g^-\,d\nu<\infty$, and $D(\nu\|\mu)<\infty$. Let $h:E\to[0,\infty]$ be a Radon-Nikodym density of $\nu$ with respect to $\mu$. Since the density of $\nu_g$ with respect to $\mu$ is strictly positive, $\nu\ll\nu_g$, and the density of $\nu$ with respect to $\nu_g$ is \begin{align*} \frac{d\nu}{d\nu_g}=\frac{hZ}{e^g}. \end{align*} The integrability assumptions make the following expansion an equality in $(-\infty,\infty]$ without any undefined $\infty-\infty$ subtraction: \begin{align*} D(\nu\|\nu_g)=D(\nu\|\mu)-\int_E g\,d\nu+\log Z. \end{align*} Equivalently, \begin{align*} \int_E g\,d\nu-D(\nu\|\mu)=\log Z-D(\nu\|\nu_g). \end{align*} The entropy inequality gives $D(\nu\|\nu_g)\geq 0$, so no competitor exceeds $\log Z$. For $\nu=\nu_g$, the loss term is $D(\nu_g\|\nu_g)=0$, and the assumed finiteness ensures the expression is well-defined. Therefore the supremum is attained by $\nu_g$. If that tilted expression is not finite-valued, the identity has already been proved by the limiting argument: the two-sided truncations give the upper bound, and the bounded-support tilted measures $\nu_m$ give admissible variational values increasing to $\log Z$. 
This is precisely the asserted extended-value interpretation. [/step]
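The identity and the maximizer can be checked numerically on a finite space, where all integrals are finite sums and no extended-value cases arise. The sketch below is only an illustration under that assumption: the four-point space, the weights `mu`, the potential `g`, and the helper names are all hypothetical. It verifies that the tilted measure $\nu_g$ attains $\log Z$, that randomly sampled competitors never exceed it, and that the gap equals $D(\nu\|\nu_g)$.

```python
import math
import random

def relative_entropy(nu, mu):
    """D(nu || mu) on a finite space, with 0 log 0 = 0 and +inf if nu is not << mu."""
    total = 0.0
    for p, q in zip(nu, mu):
        if p == 0.0:
            continue
        if q == 0.0:
            return math.inf
        total += p * math.log(p / q)
    return total

def variational_value(g, nu, mu):
    """The functional nu -> integral of g d(nu) minus D(nu || mu)."""
    return sum(gi * ni for gi, ni in zip(g, nu)) - relative_entropy(nu, mu)

# Hypothetical reference probability measure and potential on four points.
mu = [0.1, 0.2, 0.3, 0.4]
g = [1.5, -0.7, 0.0, 2.0]

# Partition function Z and the tilted measure nu_g with density e^g / Z.
Z = sum(math.exp(gi) * mi for gi, mi in zip(g, mu))
log_Z = math.log(Z)
nu_g = [math.exp(gi) * mi / Z for gi, mi in zip(g, mu)]

rng = random.Random(0)  # fixed seed so the run is reproducible

def random_probability(n):
    """A random probability vector obtained by normalizing uniform weights."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

competitors = [random_probability(len(mu)) for _ in range(1000)]

# Largest variational value among the sampled competitors.
best_random = max(variational_value(g, nu, mu) for nu in competitors)

# Largest deviation from the identity: value = log Z - D(nu || nu_g).
gap_identity_error = max(
    abs(variational_value(g, nu, mu) - (log_Z - relative_entropy(nu, nu_g)))
    for nu in competitors
)

print(log_Z, variational_value(g, nu_g, mu), best_random, gap_identity_error)
```

Random sampling only probes the supremum from below, so the check that the sampled values stay under $\log Z$ illustrates the upper bound without proving it. The exact attainment is the separate evaluation at `nu_g`.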

Prerequisites

Entropy Inequality

Gibbs Variational Principle (Theorem # 6723)
