Derivatives of the Log-Laplace Transform — Statement & Proof

Derivatives of the Log-Laplace Transform (Theorem # 6736)

Theorem

Proof

[proofplan] We first prove a local exponential domination estimate near an arbitrary point $\beta \in I$. This estimate implies that $F e^{tF}$ and $F^2 e^{tF}$ are locally dominated by integrable functions, so the partition function $Z_F(t)=\mathbb E[e^{tF}]$ may be differentiated twice under the expectation. The first derivative of $\log Z_F$ then gives the tilted expectation, and its second derivative gives the tilted variance. Since variances are non-negative, the second derivative is non-negative everywhere, which implies convexity. [/proofplan] [step:Dominate the first two exponential moments locally around $\beta$] Let $(\Omega, \mathcal A, \mathbb P)$ be the probability space on which the [random variable](/page/Random%20Variable) $F: (\Omega, \mathcal A, \mathbb P) \to \mathbb R$ is defined. Fix $\beta \in I$. Since $I$ is open, choose $\varepsilon > 0$ such that $\beta-\varepsilon \in I$ and $\beta+\varepsilon \in I$. Define the open interval $J \subset I$ by \begin{align*} J = \left(\beta-\frac{\varepsilon}{2}, \beta+\frac{\varepsilon}{2}\right). \end{align*} For $k \in \{0,1,2\}$ define the constant $C_k > 0$ by $C_0=1$ and, for $k \in \{1,2\}$, \begin{align*} C_k = \left(\frac{2k}{e\varepsilon}\right)^k. \end{align*} For $k \in \{1,2\}$ this is the maximum of $y \mapsto y^k e^{-\varepsilon y/2}$ on $[0,\infty)$, attained at $y=2k/\varepsilon$; for $k=0$ the same function is bounded by $C_0=1$. We claim that for every $t \in J$ and every $x \in \mathbb R$, \begin{align*} |x|^k e^{tx} \le C_k\left(e^{(\beta+\varepsilon)x}+e^{(\beta-\varepsilon)x}\right). \end{align*} Indeed, if $x \ge 0$, then the bound $t \le \beta+\varepsilon/2$ gives \begin{align*} |x|^k e^{tx} \le x^k e^{(\beta+\varepsilon/2)x} = x^k e^{-\varepsilon x/2}e^{(\beta+\varepsilon)x} \le C_k e^{(\beta+\varepsilon)x}. \end{align*} If $x < 0$, set $y=-x>0$. Since $t \ge \beta-\varepsilon/2$, \begin{align*} |x|^k e^{tx} = y^k e^{-ty} \le y^k e^{-(\beta-\varepsilon/2)y} = y^k e^{-\varepsilon y/2}e^{-(\beta-\varepsilon)y} \le C_k e^{(\beta-\varepsilon)x}. 
\end{align*} Applying this pointwise estimate with $x=F(\omega)$ gives, for every $t \in J$ and $k \in \{0,1,2\}$, \begin{align*} |F|^k e^{tF} \le C_k\left(e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}\right). \end{align*} The right-hand side is integrable because $\beta+\varepsilon,\beta-\varepsilon \in I$ and the exponential moment hypothesis holds at every point of $I$. [guided] We need one domination estimate that works uniformly for all parameters $t$ close to $\beta$. The point is that polynomial factors such as $|F|$ and $F^2$ can be absorbed into a small change of exponential rate. Because $I$ is open and $\beta \in I$, choose $\varepsilon>0$ such that the two endpoints $\beta-\varepsilon$ and $\beta+\varepsilon$ still lie in $I$. Define \begin{align*} J = \left(\beta-\frac{\varepsilon}{2}, \beta+\frac{\varepsilon}{2}\right). \end{align*} For $k=0$ set $C_0=1$. For $k \in \{1,2\}$ define \begin{align*} C_k = \left(\frac{2k}{e\varepsilon}\right)^k. \end{align*} This is the maximum of the function $y \mapsto y^k e^{-\varepsilon y/2}$ on $[0,\infty)$, obtained at $y=2k/\varepsilon$. We prove that for every $t \in J$ and $x \in \mathbb R$, \begin{align*} |x|^k e^{tx} \le C_k\left(e^{(\beta+\varepsilon)x}+e^{(\beta-\varepsilon)x}\right). \end{align*} There are two cases, because the larger exponential depends on the sign of $x$. If $x \ge 0$, then increasing the exponent increases $e^{tx}$. Since $t \le \beta+\varepsilon/2$, we have \begin{align*} |x|^k e^{tx} \le x^k e^{(\beta+\varepsilon/2)x}. \end{align*} Now separate the extra half of exponential growth: \begin{align*} x^k e^{(\beta+\varepsilon/2)x} = x^k e^{-\varepsilon x/2}e^{(\beta+\varepsilon)x}. \end{align*} By the definition of $C_k$, the factor $x^k e^{-\varepsilon x/2}$ is at most $C_k$, hence \begin{align*} |x|^k e^{tx} \le C_k e^{(\beta+\varepsilon)x}. \end{align*} If $x<0$, write $y=-x>0$. Now increasing the exponent decreases $e^{tx}$ because $x$ is negative. 
Since $t \ge \beta-\varepsilon/2$, we get \begin{align*} |x|^k e^{tx} = y^k e^{-ty} \le y^k e^{-(\beta-\varepsilon/2)y}. \end{align*} Again isolate the spare exponential decay: \begin{align*} y^k e^{-(\beta-\varepsilon/2)y} = y^k e^{-\varepsilon y/2}e^{-(\beta-\varepsilon)y}. \end{align*} The first factor is at most $C_k$, and because $y=-x$ the second factor is $e^{(\beta-\varepsilon)x}$. Therefore \begin{align*} |x|^k e^{tx} \le C_k e^{(\beta-\varepsilon)x}. \end{align*} Combining the two sign cases gives the uniform estimate. Substituting $x=F(\omega)$ gives \begin{align*} |F(\omega)|^k e^{tF(\omega)} \le C_k\left(e^{(\beta+\varepsilon)F(\omega)}+e^{(\beta-\varepsilon)F(\omega)}\right) \end{align*} for every $\omega \in \Omega$ and every $t \in J$. The right-hand side is integrable because $\beta+\varepsilon$ and $\beta-\varepsilon$ lie in $I$, and the theorem assumes finite exponential moments at every point of $I$. [/guided] [/step] [step:Differentiate the partition function under the expectation] Define the partition function $Z_F: I \to (0,\infty)$ by \begin{align*} Z_F(t)=\mathbb E[e^{tF}]. \end{align*} The preceding domination estimate with $k=0$ shows that $e^{tF}$ is integrable for each $t \in J$. For fixed $t \in J$ and sufficiently small $h \in \mathbb R$ with $t+h \in J$, the [mean value theorem](/theorems/186) applied to the map $s \mapsto e^{sF(\omega)}$ gives \begin{align*} \left|\frac{e^{(t+h)F(\omega)}-e^{tF(\omega)}}{h}\right| \le |F(\omega)|\sup_{s \in J} e^{sF(\omega)}. \end{align*} The right-hand side is bounded by an integrable multiple of $e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}$ by the domination estimate with $k=1$. Therefore the [Dominated Convergence Theorem](/theorems/4) gives \begin{align*} Z_F'(t)=\mathbb E[F e^{tF}] \end{align*} for every $t \in J$. Apply the same argument to the map $t \mapsto F e^{tF}$. 
For fixed $t \in J$ and small $h$ with $t+h \in J$, \begin{align*} \left|\frac{F e^{(t+h)F}-F e^{tF}}{h}\right| \le F^2 \sup_{s \in J} e^{sF}. \end{align*} The domination estimate with $k=2$ gives an integrable dominating function, so the [Dominated Convergence Theorem](/theorems/4) yields \begin{align*} Z_F''(t)=\mathbb E[F^2 e^{tF}] \end{align*} for every $t \in J$. The same domination argument, applied to $F e^{tF}$ and $F^2 e^{tF}$ as functions of $t$, also shows that $Z_F'$ and $Z_F''$ are continuous on $J$. Since $\beta \in I$ was arbitrary, $Z_F$ is twice differentiable on $I$, with continuous first derivative on each compact subinterval of $I$. [guided] We now justify differentiating under the expectation, which is the analytic step where the local domination estimate is used. Define the partition function $Z_F: I \to (0,\infty)$ by \begin{align*} Z_F(t)=\mathbb E[e^{tF}]. \end{align*} The codomain is $(0,\infty)$ because $e^{tF}>0$ almost surely and the hypothesis gives $e^{tF} \in L^1(\Omega,\mathcal A,\mathbb P)$ for each $t \in I$. Fix $t \in J$. For all sufficiently small $h \in \mathbb R$ with $t+h \in J$, apply the ordinary mean value theorem to the differentiable map $s \mapsto e^{sF(\omega)}$ on the interval with endpoints $t$ and $t+h$. For each $\omega \in \Omega$, this gives \begin{align*} \left|\frac{e^{(t+h)F(\omega)}-e^{tF(\omega)}}{h}\right| \le |F(\omega)|\sup_{s \in J} e^{sF(\omega)}. \end{align*} The first-step domination estimate with $k=1$ bounds the right-hand side by an integrable function depending only on $\beta$ and $\varepsilon$, namely a constant multiple of $e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}$. The exponential moment hypothesis makes this function integrable because $\beta+\varepsilon$ and $\beta-\varepsilon$ belong to $I$. For each $\omega \in \Omega$, the difference quotient converges pointwise to $F(\omega)e^{tF(\omega)}$ as $h \to 0$. 
The [Dominated Convergence Theorem](/theorems/4) applies to the family of difference quotients, so \begin{align*} Z_F'(t)=\mathbb E[F e^{tF}]. \end{align*} To differentiate once more, repeat the same argument for the map $t \mapsto F e^{tF}$. For fixed $t \in J$ and sufficiently small $h$ with $t+h \in J$, the mean value theorem applied to $s \mapsto F(\omega)e^{sF(\omega)}$ gives \begin{align*} \left|\frac{F(\omega)e^{(t+h)F(\omega)}-F(\omega)e^{tF(\omega)}}{h}\right| \le F(\omega)^2\sup_{s \in J} e^{sF(\omega)}. \end{align*} The domination estimate with $k=2$ bounds this by an integrable multiple of $e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}$. The pointwise limit of the difference quotient is $F(\omega)^2e^{tF(\omega)}$, so the [Dominated Convergence Theorem](/theorems/4) gives \begin{align*} Z_F''(t)=\mathbb E[F^2 e^{tF}]. \end{align*} The same dominated convergence argument also gives continuity of the derivative formulas on $J$. Indeed, if $t_n \to t$ in $J$, then $F e^{t_nF} \to F e^{tF}$ and $F^2e^{t_nF} \to F^2e^{tF}$ pointwise, and the domination estimates with $k=1$ and $k=2$ give integrable dominating functions independent of $n$ once $n$ is large. Hence $Z_F'$ and $Z_F''$ are continuous on $J$. Because $t \in J$ was arbitrary, $Z_F$ is twice differentiable on $J$ with continuous first derivative. Because the original point $\beta \in I$ was arbitrary and every point of $I$ has such a neighbourhood $J$, $Z_F$ is twice differentiable on all of $I$, with continuous first derivative on each compact subinterval of $I$. [/guided] [/step] [step:Compute the first derivative as the tilted expectation] For each $\beta \in I$, $Z_F(\beta)>0$ because $e^{\beta F}>0$ almost surely. Define the probability measure $\mathbb P_\beta$ on $(\Omega,\mathcal A)$ by \begin{align*} \mathbb P_\beta(A)=\frac{\mathbb E[\mathbb 1_A e^{\beta F}]}{Z_F(\beta)} \end{align*} for every event $A \in \mathcal A$. 
The previous step with $t=\beta$ gives $F e^{\beta F} \in L^1(\Omega,\mathcal A,\mathbb P)$, so $F \in L^1(\Omega,\mathcal A,\mathbb P_\beta)$ and \begin{align*} \mathbb E_\beta[F]=\frac{\mathbb E[F e^{\beta F}]}{Z_F(\beta)}. \end{align*} Since $\Lambda_F(\beta)=\log Z_F(\beta)$ and $Z_F(\beta)>0$, compute the derivative directly. For $h \ne 0$ with $\beta+h \in I$ and $Z_F(\beta+h) \ne Z_F(\beta)$, the [Mean Value Theorem](/theorems/632) applied to $r \mapsto \log r$ on the interval with endpoints $Z_F(\beta)$ and $Z_F(\beta+h)$ gives a number $\xi_h$ between these two positive values such that \begin{align*} \frac{\log Z_F(\beta+h)-\log Z_F(\beta)}{h}=\frac{1}{\xi_h}\frac{Z_F(\beta+h)-Z_F(\beta)}{h}. \end{align*} If $Z_F(\beta+h)=Z_F(\beta)$, the same displayed identity holds with $\xi_h=Z_F(\beta)$, since the left-hand side and the second factor on the right are both equal to $0$. In either case $\xi_h$ lies between $Z_F(\beta)$ and $Z_F(\beta+h)$, so the continuity of $Z_F$ at $\beta$ gives $\xi_h \to Z_F(\beta)$ as $h \to 0$. Since $Z_F$ is also differentiable at $\beta$, letting $h \to 0$ gives \begin{align*} \Lambda_F'(\beta)=\frac{Z_F'(\beta)}{Z_F(\beta)}. \end{align*} Substituting $Z_F'(\beta)=\mathbb E[F e^{\beta F}]$ gives \begin{align*} \Lambda_F'(\beta)=\mathbb E_\beta[F]. \end{align*} [/step] [step:Compute the second derivative as the tilted variance] The previous differentiation step gives $F^2 e^{\beta F} \in L^1(\Omega,\mathcal A,\mathbb P)$, so $F \in L^2(\Omega,\mathcal A,\mathbb P_\beta)$. Define $A: I \to \mathbb R$ by $A(t)=Z_F'(t)$ and $B: I \to (0,\infty)$ by $B(t)=Z_F(t)$, so that $\Lambda_F'=A/B$ on $I$ by the previous step. Since $A$ and $B$ are differentiable at $\beta$ and $B$ is continuous with $B(\beta)>0$, the quotient derivative follows from the identity \begin{align*} \frac{A(\beta+h)}{B(\beta+h)}-\frac{A(\beta)}{B(\beta)}=\frac{(A(\beta+h)-A(\beta))B(\beta)-A(\beta)(B(\beta+h)-B(\beta))}{B(\beta+h)B(\beta)} \end{align*} valid for all $h$ with $\beta+h \in I$. Dividing by $h \ne 0$ and letting $h \to 0$ gives \begin{align*} \Lambda_F''(\beta)=\frac{Z_F''(\beta)Z_F(\beta)-(Z_F'(\beta))^2}{(Z_F(\beta))^2}. 
\end{align*} Equivalently, \begin{align*} \Lambda_F''(\beta)=\frac{Z_F''(\beta)}{Z_F(\beta)}-\left(\frac{Z_F'(\beta)}{Z_F(\beta)}\right)^2. \end{align*} Using the identities \begin{align*} Z_F'(\beta)=\mathbb E[F e^{\beta F}] \end{align*} and \begin{align*} Z_F''(\beta)=\mathbb E[F^2 e^{\beta F}], \end{align*} we obtain \begin{align*} \Lambda_F''(\beta)=\mathbb E_\beta[F^2]-(\mathbb E_\beta[F])^2. \end{align*} By the definition of variance under $\mathbb P_\beta$, \begin{align*} \operatorname{Var}_{\mathbb P_\beta}(F)=\mathbb E_\beta[F^2]-(\mathbb E_\beta[F])^2. \end{align*} Thus \begin{align*} \Lambda_F''(\beta)=\operatorname{Var}_{\mathbb P_\beta}(F). \end{align*} [/step] [step:Conclude convexity from non-negativity of the variance] For every $\beta \in I$, \begin{align*} \Lambda_F''(\beta)=\operatorname{Var}_{\mathbb P_\beta}(F)\ge 0. \end{align*} Let $a,b \in I$ with $a<b$. The preceding steps show that $\Lambda_F'$ is continuous on $[a,b]$ and differentiable on $(a,b)$. Applying the [Mean Value Theorem](/theorems/632) to $\Lambda_F'$ on $[a,b]$, there exists $c \in (a,b)$ such that \begin{align*} \Lambda_F'(b)-\Lambda_F'(a)=(b-a)\Lambda_F''(c). \end{align*} Since $b-a>0$ and $\Lambda_F''(c)\ge 0$, this proves $\Lambda_F'(a)\le \Lambda_F'(b)$. Therefore $\Lambda_F'$ is nondecreasing on $I$. It remains only to translate this monotonicity into convexity. Let $x,y,z \in I$ with $x<y<z$. Applying the [Mean Value Theorem](/theorems/632) to $\Lambda_F$ on $[x,y]$ and on $[y,z]$, there exist $p \in (x,y)$ and $q \in (y,z)$ such that \begin{align*} \frac{\Lambda_F(y)-\Lambda_F(x)}{y-x}=\Lambda_F'(p) \end{align*} and \begin{align*} \frac{\Lambda_F(z)-\Lambda_F(y)}{z-y}=\Lambda_F'(q). \end{align*} Since $p<q$ and $\Lambda_F'$ is nondecreasing, the left secant slope is at most the right secant slope. 
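To rearrange this secant-slope inequality, write $y=(1-\theta)x+\theta z$ with $\theta \in (0,1)$, so that $y-x=\theta(z-x)$ and $z-y=(1-\theta)(z-x)$. Multiplying the inequality by $\theta(1-\theta)(z-x)>0$ gives $(1-\theta)(\Lambda_F(y)-\Lambda_F(x)) \le \theta(\Lambda_F(z)-\Lambda_F(y))$, that is, \begin{align*} \Lambda_F(y)\le (1-\theta)\Lambda_F(x)+\theta\Lambda_F(z). \end{align*} The cases $\theta=0$ and $\theta=1$ are equalities, so $\Lambda_F$ is convex on $I$. Combining the first- and second-derivative formulas proves the theorem. [/step]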
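
As a sanity check on the two derivative formulas, here is a worked example under an assumption that is not part of the theorem: take $F$ to be a standard Gaussian random variable, so that the exponential moments are finite for every $t$ and one may take $I=\mathbb R$. Completing the square in the exponent gives \begin{align*} Z_F(t)=\frac{1}{\sqrt{2\pi}}\int_{\mathbb R} e^{tx-x^2/2}\,dx = e^{t^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{\mathbb R} e^{-(x-t)^2/2}\,dx = e^{t^2/2}, \end{align*} hence \begin{align*} \Lambda_F(t)=\frac{t^2}{2}, \qquad \Lambda_F'(\beta)=\beta, \qquad \Lambda_F''(\beta)=1. \end{align*} On the other hand, under the tilted measure $\mathbb P_\beta$ the law of $F$ has Lebesgue density \begin{align*} \frac{1}{Z_F(\beta)}\,e^{\beta x}\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2} = \frac{1}{\sqrt{2\pi}}e^{-(x-\beta)^2/2}, \end{align*} which is the Gaussian density with mean $\beta$ and variance $1$. Thus $\mathbb E_\beta[F]=\beta=\Lambda_F'(\beta)$ and $\operatorname{Var}_{\mathbb P_\beta}(F)=1=\Lambda_F''(\beta)$, in agreement with the theorem, and $\Lambda_F$ is visibly convex.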

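The identities $\Lambda_F'(\beta)=\mathbb E_\beta[F]$ and $\Lambda_F''(\beta)=\operatorname{Var}_{\mathbb P_\beta}(F)$ can also be checked numerically. The following is a minimal sketch, assuming a finite-valued $F$ (so every exponential moment is finite and $I=\mathbb R$); the function names, the example distribution, the point `beta`, and the step size `h` are illustrative choices, not part of the original statement. It compares central finite differences of $\Lambda_F$ with the mean and variance under the tilted measure.

```python
import math

def log_laplace(values, probs, t):
    """Lambda_F(t) = log E[exp(t F)] for a finite-valued F.

    Uses the log-sum-exp trick for numerical stability.
    Assumes every probability is strictly positive.
    """
    exponents = [t * x + math.log(p) for x, p in zip(values, probs)]
    m = max(exponents)
    return m + math.log(sum(math.exp(e - m) for e in exponents))

def tilted_moments(values, probs, beta):
    """Mean and variance of F under the tilted measure P_beta."""
    log_z = log_laplace(values, probs, beta)
    # P_beta(F = x) = P(F = x) * exp(beta * x) / Z_F(beta)
    weights = [p * math.exp(beta * x - log_z) for x, p in zip(values, probs)]
    mean = sum(w * x for w, x in zip(weights, values))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))
    return mean, var

def finite_difference_derivatives(values, probs, beta, h=1e-4):
    """Central finite-difference approximations of Lambda_F' and Lambda_F''."""
    lo = log_laplace(values, probs, beta - h)
    mid = log_laplace(values, probs, beta)
    hi = log_laplace(values, probs, beta + h)
    first = (hi - lo) / (2.0 * h)
    second = (hi - 2.0 * mid + lo) / (h * h)
    return first, second

# Hypothetical example distribution: F takes three values.
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]
beta = 0.7

mean, var = tilted_moments(values, probs, beta)
d1, d2 = finite_difference_derivatives(values, probs, beta)

print("tilted mean     :", mean, " finite-difference Lambda' :", d1)
print("tilted variance :", var, " finite-difference Lambda'':", d2)
```

The finite differences only approximate the derivatives, so agreement is expected up to discretisation and rounding error rather than exactly.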