[proofplan]
We first prove a local exponential domination estimate near an arbitrary point $\beta \in I$. This estimate implies that $F e^{tF}$ and $F^2 e^{tF}$ are locally dominated by integrable functions, so the partition function $Z_F(t)=\mathbb E[e^{tF}]$ may be differentiated twice under the expectation. The logarithmic derivative then gives the tilted expectation, and the second logarithmic derivative gives the tilted variance. Since variances are non-negative, the second derivative is non-negative everywhere, which implies convexity.
[/proofplan]
[step:Dominate the first two exponential moments locally around $\beta$]
Let $(\Omega, \mathcal A, \mathbb P)$ be the probability space on which the [random variable](/page/Random%20Variable) $F: (\Omega, \mathcal A, \mathbb P) \to \mathbb R$ is defined. Fix $\beta \in I$. Since $I$ is open, choose $\varepsilon > 0$ such that $\beta-\varepsilon \in I$ and $\beta+\varepsilon \in I$. Define the open interval $J \subset I$ by
\begin{align*}
J = \left(\beta-\frac{\varepsilon}{2}, \beta+\frac{\varepsilon}{2}\right).
\end{align*}
For $k \in \{0,1,2\}$ define the constant $C_k > 0$ by $C_0=1$ and, for $k \in \{1,2\}$,
\begin{align*}
C_k = \left(\frac{2k}{e\varepsilon}\right)^k.
\end{align*}
For every $t \in J$ and every $x \in \mathbb R$,
\begin{align*}
|x|^k e^{tx} \le C_k\left(e^{(\beta+\varepsilon)x}+e^{(\beta-\varepsilon)x}\right).
\end{align*}
Indeed, if $x \ge 0$, then, since $t \le \beta+\varepsilon/2$,
\begin{align*}
|x|^k e^{tx} \le x^k e^{(\beta+\varepsilon/2)x} = x^k e^{-\varepsilon x/2}e^{(\beta+\varepsilon)x} \le C_k e^{(\beta+\varepsilon)x}.
\end{align*}
If $x < 0$, set $y=-x>0$. Since $t \ge \beta-\varepsilon/2$,
\begin{align*}
|x|^k e^{tx} = y^k e^{-ty} \le y^k e^{-(\beta-\varepsilon/2)y} = y^k e^{-\varepsilon y/2}e^{-(\beta-\varepsilon)y} \le C_k e^{(\beta-\varepsilon)x}.
\end{align*}
Applying this pointwise estimate with $x=F(\omega)$ gives, for every $t \in J$ and $k \in \{0,1,2\}$,
\begin{align*}
|F|^k e^{tF} \le C_k\left(e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}\right).
\end{align*}
The right-hand side is integrable because $\beta+\varepsilon,\beta-\varepsilon \in I$ and the exponential moment hypothesis holds at every point of $I$.
[guided]
We need one domination estimate that works uniformly for all parameters $t$ close to $\beta$. The point is that polynomial factors such as $|F|$ and $F^2$ can be absorbed into a small change of exponential rate.
Because $I$ is open and $\beta \in I$, choose $\varepsilon>0$ such that the two endpoints $\beta-\varepsilon$ and $\beta+\varepsilon$ still lie in $I$. Define
\begin{align*}
J = \left(\beta-\frac{\varepsilon}{2}, \beta+\frac{\varepsilon}{2}\right).
\end{align*}
For $k=0$ set $C_0=1$. For $k \in \{1,2\}$ define
\begin{align*}
C_k = \left(\frac{2k}{e\varepsilon}\right)^k.
\end{align*}
This is the maximum of the function $y \mapsto y^k e^{-\varepsilon y/2}$ on $[0,\infty)$, obtained at $y=2k/\varepsilon$.
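As a sketch of why this value is the maximum (the auxiliary name $g$ is introduced here only for illustration), write $g(y)=y^k e^{-\varepsilon y/2}$ for $y \ge 0$ and $k \in \{1,2\}$. Differentiating gives
\begin{align*}
g'(y) = y^{k-1}e^{-\varepsilon y/2}\left(k-\frac{\varepsilon y}{2}\right),
\end{align*}
which is positive for $0<y<2k/\varepsilon$ and negative for $y>2k/\varepsilon$. Hence
\begin{align*}
\max_{y \ge 0} g(y) = g\left(\frac{2k}{\varepsilon}\right) = \left(\frac{2k}{\varepsilon}\right)^k e^{-k} = C_k, \qquad C_1=\frac{2}{e\varepsilon}, \quad C_2=\frac{16}{e^2\varepsilon^2}.
\end{align*}
For $k=0$ the function is $e^{-\varepsilon y/2} \le 1 = C_0$.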
We prove that for every $t \in J$ and $x \in \mathbb R$,
\begin{align*}
|x|^k e^{tx} \le C_k\left(e^{(\beta+\varepsilon)x}+e^{(\beta-\varepsilon)x}\right).
\end{align*}
There are two cases, because the larger exponential depends on the sign of $x$.
If $x \ge 0$, then increasing the exponent increases $e^{tx}$. Since $t \le \beta+\varepsilon/2$, we have
\begin{align*}
|x|^k e^{tx} \le x^k e^{(\beta+\varepsilon/2)x}.
\end{align*}
Now split off the spare half of the exponential rate, using $\beta+\varepsilon/2=(\beta+\varepsilon)-\varepsilon/2$:
\begin{align*}
x^k e^{(\beta+\varepsilon/2)x} = x^k e^{-\varepsilon x/2}e^{(\beta+\varepsilon)x}.
\end{align*}
By the definition of $C_k$, the factor $x^k e^{-\varepsilon x/2}$ is at most $C_k$, hence
\begin{align*}
|x|^k e^{tx} \le C_k e^{(\beta+\varepsilon)x}.
\end{align*}
If $x<0$, write $y=-x>0$. Now increasing the exponent decreases $e^{tx}$ because $x$ is negative. Since $t \ge \beta-\varepsilon/2$, we get
\begin{align*}
|x|^k e^{tx} = y^k e^{-ty} \le y^k e^{-(\beta-\varepsilon/2)y}.
\end{align*}
Again isolate the spare exponential decay:
\begin{align*}
y^k e^{-(\beta-\varepsilon/2)y} = y^k e^{-\varepsilon y/2}e^{-(\beta-\varepsilon)y}.
\end{align*}
The first factor is at most $C_k$, and because $y=-x$ the second factor is $e^{(\beta-\varepsilon)x}$. Therefore
\begin{align*}
|x|^k e^{tx} \le C_k e^{(\beta-\varepsilon)x}.
\end{align*}
Combining the two sign cases gives the uniform estimate. Substituting $x=F(\omega)$ gives
\begin{align*}
|F(\omega)|^k e^{tF(\omega)} \le C_k\left(e^{(\beta+\varepsilon)F(\omega)}+e^{(\beta-\varepsilon)F(\omega)}\right)
\end{align*}
for every $\omega \in \Omega$ and every $t \in J$. The right-hand side is integrable because $\beta+\varepsilon$ and $\beta-\varepsilon$ lie in $I$, and the theorem assumes finite exponential moments at every point of $I$.
[/guided]
[/step]
[step:Differentiate the partition function under the expectation]
Define the partition function $Z_F: I \to (0,\infty)$ by
\begin{align*}
Z_F(t)=\mathbb E[e^{tF}].
\end{align*}
The preceding domination estimate with $k=0$ shows that $e^{tF}$ is integrable for each $t \in J$. For fixed $t \in J$ and sufficiently small $h \in \mathbb R$ with $t+h \in J$, the [mean value theorem](/theorems/186) applied to the map $s \mapsto e^{sF(\omega)}$ gives
\begin{align*}
\left|\frac{e^{(t+h)F(\omega)}-e^{tF(\omega)}}{h}\right| \le |F(\omega)|\sup_{s \in J} e^{sF(\omega)}.
\end{align*}
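By the domination estimate with $k=1$, which holds uniformly for $s \in J$, the right-hand side is bounded by the integrable function $C_1\left(e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}\right)$. Since the difference quotient converges pointwise to $F e^{tF}$ as $h \to 0$, the [Dominated Convergence Theorem](/theorems/4) gives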
\begin{align*}
Z_F'(t)=\mathbb E[F e^{tF}]
\end{align*}
for every $t \in J$.
Apply the same argument to the map $t \mapsto F e^{tF}$. For fixed $t \in J$ and small $h$ with $t+h \in J$,
\begin{align*}
\left|\frac{F e^{(t+h)F}-F e^{tF}}{h}\right| \le F^2 \sup_{s \in J} e^{sF}.
\end{align*}
The domination estimate with $k=2$ gives an integrable dominating function, so the [Dominated Convergence Theorem](/theorems/4) yields
\begin{align*}
Z_F''(t)=\mathbb E[F^2 e^{tF}]
\end{align*}
for every $t \in J$. The same domination argument, applied to $F e^{tF}$ and $F^2 e^{tF}$ as functions of $t$, also shows that $Z_F'$ and $Z_F''$ are continuous on $J$. Since $\beta \in I$ was arbitrary and continuity is a local property, $Z_F$ is twice differentiable on $I$, and $Z_F'$ and $Z_F''$ are continuous on $I$.
[guided]
We now justify differentiating under the expectation, which is the analytic step where the local domination estimate is used. Define the partition function $Z_F: I \to (0,\infty)$ by
\begin{align*}
Z_F(t)=\mathbb E[e^{tF}].
\end{align*}
The codomain is $(0,\infty)$ because $e^{tF}>0$ almost surely and the hypothesis gives $e^{tF} \in L^1(\Omega,\mathcal A,\mathbb P)$ for each $t \in I$.
Fix $t \in J$. For all sufficiently small $h \in \mathbb R$ with $t+h \in J$, apply the ordinary mean value theorem to the differentiable map $s \mapsto e^{sF(\omega)}$ on the interval with endpoints $t$ and $t+h$. For each $\omega \in \Omega$, this gives
\begin{align*}
\left|\frac{e^{(t+h)F(\omega)}-e^{tF(\omega)}}{h}\right| \le |F(\omega)|\sup_{s \in J} e^{sF(\omega)}.
\end{align*}
The first-step domination estimate with $k=1$ bounds the right-hand side by an integrable function depending only on $\beta$ and $\varepsilon$, namely a constant multiple of $e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}$. The exponential moment hypothesis makes this function integrable because $\beta+\varepsilon$ and $\beta-\varepsilon$ belong to $I$.
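Spelled out, and with the intermediate point $s_h(\omega)$ named here only for illustration, the mean value theorem provides, for each $h \ne 0$ with $t+h \in J$, some $s_h(\omega)$ between $t$ and $t+h$ such that
\begin{align*}
\frac{e^{(t+h)F(\omega)}-e^{tF(\omega)}}{h} = F(\omega)e^{s_h(\omega)F(\omega)}.
\end{align*}
Since $s_h(\omega) \in J$ and the first-step estimate holds for every parameter in $J$, one possible way to record the resulting bound is
\begin{align*}
|F(\omega)|\sup_{s \in J} e^{sF(\omega)} = \sup_{s \in J}|F(\omega)|e^{sF(\omega)} \le \frac{2}{e\varepsilon}\left(e^{(\beta+\varepsilon)F(\omega)}+e^{(\beta-\varepsilon)F(\omega)}\right),
\end{align*}
where $2/(e\varepsilon)$ is the constant $C_1$.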
For each $\omega \in \Omega$, the difference quotient converges pointwise to $F(\omega)e^{tF(\omega)}$ as $h \to 0$. The [Dominated Convergence Theorem](/theorems/4) applies to the family of difference quotients, so
\begin{align*}
Z_F'(t)=\mathbb E[F e^{tF}].
\end{align*}
To differentiate once more, repeat the same argument for the map $t \mapsto F e^{tF}$. For fixed $t \in J$ and sufficiently small $h$ with $t+h \in J$, the mean value theorem applied to $s \mapsto F(\omega)e^{sF(\omega)}$ gives
\begin{align*}
\left|\frac{F(\omega)e^{(t+h)F(\omega)}-F(\omega)e^{tF(\omega)}}{h}\right| \le F(\omega)^2\sup_{s \in J} e^{sF(\omega)}.
\end{align*}
The domination estimate with $k=2$ bounds this by the integrable function $C_2\left(e^{(\beta+\varepsilon)F}+e^{(\beta-\varepsilon)F}\right)$. The pointwise limit of the difference quotient as $h \to 0$ is $F(\omega)^2e^{tF(\omega)}$, so the [Dominated Convergence Theorem](/theorems/4) gives
\begin{align*}
Z_F''(t)=\mathbb E[F^2 e^{tF}].
\end{align*}
The same dominated convergence argument also gives continuity of the derivative formulas on $J$. Indeed, if $t_n \to t$ with $t_n \in J$, then $F e^{t_nF} \to F e^{tF}$ and $F^2e^{t_nF} \to F^2e^{tF}$ pointwise, and the domination estimates with $k=1$ and $k=2$ give integrable dominating functions independent of $n$. Hence $Z_F'$ and $Z_F''$ are continuous on $J$. Because $t \in J$ was arbitrary, $Z_F$ is twice differentiable on $J$ with continuous first and second derivatives. Because the original point $\beta \in I$ was arbitrary and every point of $I$ has such a neighbourhood $J$, $Z_F$ is twice continuously differentiable on all of $I$.
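As an illustrative check, under the additional assumption (not part of the theorem) that $F$ is standard normal under $\mathbb P$, so that $I=\mathbb R$ is admissible, the formulas can be verified by hand. Completing the square in the Gaussian density gives
\begin{align*}
Z_F(t) = e^{t^2/2}, \qquad \mathbb E[F e^{tF}] = t e^{t^2/2}, \qquad \mathbb E[F^2 e^{tF}] = (1+t^2)e^{t^2/2},
\end{align*}
and differentiating $e^{t^2/2}$ directly gives
\begin{align*}
Z_F'(t) = t e^{t^2/2}, \qquad Z_F''(t) = (1+t^2)e^{t^2/2},
\end{align*}
in agreement with the two derivative formulas.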
[/guided]
[/step]
[step:Compute the first derivative as the tilted expectation]
For each $\beta \in I$, $Z_F(\beta)>0$ because $e^{\beta F}>0$ almost surely. Define the probability measure $\mathbb P_\beta$ on $(\Omega,\mathcal A)$ by
\begin{align*}
\mathbb P_\beta(A)=\frac{\mathbb E[\mathbb 1_A e^{\beta F}]}{Z_F(\beta)}
\end{align*}
for every event $A \in \mathcal A$. The previous step with $t=\beta$ gives $F e^{\beta F} \in L^1(\Omega,\mathcal A,\mathbb P)$, so $F \in L^1(\Omega,\mathcal A,\mathbb P_\beta)$ and
\begin{align*}
\mathbb E_\beta[F]=\frac{\mathbb E[F e^{\beta F}]}{Z_F(\beta)}.
\end{align*}
Since $\Lambda_F(\beta)=\log Z_F(\beta)$ and $Z_F(\beta)>0$, compute the derivative directly. For $h \ne 0$ with $\beta+h \in I$ and $Z_F(\beta+h) \ne Z_F(\beta)$, the [Mean Value Theorem](/theorems/632) applied to $r \mapsto \log r$ on the interval with endpoints $Z_F(\beta)$ and $Z_F(\beta+h)$ gives a number $\xi_h$ between these two positive values such that
\begin{align*}
\frac{\log Z_F(\beta+h)-\log Z_F(\beta)}{h}=\frac{1}{\xi_h}\frac{Z_F(\beta+h)-Z_F(\beta)}{h}.
\end{align*}
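If $Z_F(\beta+h)=Z_F(\beta)$, the displayed identity holds with $\xi_h=Z_F(\beta)$, since the left-hand side and the second factor on the right are both $0$. In either case $\xi_h$ lies between $Z_F(\beta)$ and $Z_F(\beta+h)$, so continuity of $Z_F$ at $\beta$ gives $\xi_h \to Z_F(\beta)>0$ as $h \to 0$. Since $Z_F$ is differentiable at $\beta$, letting $h \to 0$ gives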
\begin{align*}
\Lambda_F'(\beta)=\frac{Z_F'(\beta)}{Z_F(\beta)}.
\end{align*}
Substituting $Z_F'(\beta)=\mathbb E[F e^{\beta F}]$ gives
\begin{align*}
\Lambda_F'(\beta)=\mathbb E_\beta[F].
\end{align*}
[/step]
[step:Compute the second derivative as the tilted variance]
The previous differentiation step gives $F^2 e^{\beta F} \in L^1(\Omega,\mathcal A,\mathbb P)$, so $F \in L^2(\Omega,\mathcal A,\mathbb P_\beta)$. Define $A: I \to \mathbb R$ by $A(t)=Z_F'(t)$ and $B: I \to (0,\infty)$ by $B(t)=Z_F(t)$, so that $\Lambda_F'=A/B$ on $I$ by the previous step. Since $A$ and $B$ are differentiable at $\beta$ and $B$ is continuous with $B(\beta)>0$, the quotient derivative follows from the identity
\begin{align*}
\frac{A(\beta+h)}{B(\beta+h)}-\frac{A(\beta)}{B(\beta)}=\frac{(A(\beta+h)-A(\beta))B(\beta)-A(\beta)(B(\beta+h)-B(\beta))}{B(\beta+h)B(\beta)}
\end{align*}
valid for all sufficiently small $h$ with $\beta+h \in I$. Dividing by $h$ and letting $h \to 0$ gives
\begin{align*}
\Lambda_F''(\beta)=\frac{Z_F''(\beta)Z_F(\beta)-(Z_F'(\beta))^2}{(Z_F(\beta))^2}.
\end{align*}
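For readers who want the limit written out, here is a sketch. Dividing the identity by $h \ne 0$ gives
\begin{align*}
\frac{1}{h}\left(\frac{A(\beta+h)}{B(\beta+h)}-\frac{A(\beta)}{B(\beta)}\right)=\frac{\frac{A(\beta+h)-A(\beta)}{h}B(\beta)-A(\beta)\frac{B(\beta+h)-B(\beta)}{h}}{B(\beta+h)B(\beta)}.
\end{align*}
As $h \to 0$, the two difference quotients converge to $A'(\beta)=Z_F''(\beta)$ and $B'(\beta)=Z_F'(\beta)$, and $B(\beta+h) \to B(\beta)=Z_F(\beta)$ by continuity, so the right-hand side converges to
\begin{align*}
\frac{A'(\beta)B(\beta)-A(\beta)B'(\beta)}{B(\beta)^2}=\frac{Z_F''(\beta)Z_F(\beta)-(Z_F'(\beta))^2}{(Z_F(\beta))^2}.
\end{align*}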
Equivalently,
\begin{align*}
\Lambda_F''(\beta)=\frac{Z_F''(\beta)}{Z_F(\beta)}-\left(\frac{Z_F'(\beta)}{Z_F(\beta)}\right)^2.
\end{align*}
Using the identities
\begin{align*}
Z_F'(\beta)=\mathbb E[F e^{\beta F}]
\end{align*}
and
\begin{align*}
Z_F''(\beta)=\mathbb E[F^2 e^{\beta F}],
\end{align*}
we obtain
\begin{align*}
\Lambda_F''(\beta)=\mathbb E_\beta[F^2]-(\mathbb E_\beta[F])^2.
\end{align*}
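The substitution rests on the change-of-measure identity for $\mathbb P_\beta$, sketched here with a generic test function $g$ that is introduced only for illustration. Since $\mathbb P_\beta$ has density $e^{\beta F}/Z_F(\beta)$ with respect to $\mathbb P$, for measurable $g:\mathbb R \to \mathbb R$ with $g(F)e^{\beta F} \in L^1(\Omega,\mathcal A,\mathbb P)$,
\begin{align*}
\mathbb E_\beta[g(F)] = \frac{\mathbb E[g(F)e^{\beta F}]}{Z_F(\beta)}.
\end{align*}
Taking $g(x)=x$ and $g(x)=x^2$ gives
\begin{align*}
\mathbb E_\beta[F] = \frac{Z_F'(\beta)}{Z_F(\beta)}, \qquad \mathbb E_\beta[F^2] = \frac{Z_F''(\beta)}{Z_F(\beta)}.
\end{align*}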
By the definition of variance under $\mathbb P_\beta$,
\begin{align*}
\operatorname{Var}_{\mathbb P_\beta}(F)=\mathbb E_\beta[F^2]-(\mathbb E_\beta[F])^2.
\end{align*}
Thus
\begin{align*}
\Lambda_F''(\beta)=\operatorname{Var}_{\mathbb P_\beta}(F).
\end{align*}
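As a small worked example, under the illustrative assumption (not part of the theorem) that $F$ takes the value $1$ with probability $p \in (0,1)$ and the value $0$ otherwise, one can take $I=\mathbb R$ and compute everything explicitly. The symbol $p_\beta$ below is notation introduced for this example only.
\begin{align*}
Z_F(\beta) = 1-p+pe^{\beta}, \qquad p_\beta = \mathbb P_\beta(F=1) = \frac{pe^{\beta}}{1-p+pe^{\beta}}.
\end{align*}
Differentiating $\Lambda_F(\beta)=\log(1-p+pe^{\beta})$ twice gives
\begin{align*}
\Lambda_F'(\beta) = p_\beta = \mathbb E_\beta[F], \qquad \Lambda_F''(\beta) = \frac{p(1-p)e^{\beta}}{(1-p+pe^{\beta})^2} = p_\beta(1-p_\beta) = \operatorname{Var}_{\mathbb P_\beta}(F),
\end{align*}
so under $\mathbb P_\beta$ the variable $F$ is again Bernoulli, with success probability $p_\beta$.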
[/step]
[step:Conclude convexity from non-negativity of the variance]
For every $\beta \in I$,
\begin{align*}
\Lambda_F''(\beta)=\operatorname{Var}_{\mathbb P_\beta}(F)\ge 0.
\end{align*}
Let $a,b \in I$ with $a<b$. The preceding steps show that $\Lambda_F'$ is continuous on $[a,b]$ and differentiable on $(a,b)$. Applying the [Mean Value Theorem](/theorems/632) to $\Lambda_F'$ on $[a,b]$, there exists $c \in (a,b)$ such that
\begin{align*}
\Lambda_F'(b)-\Lambda_F'(a)=(b-a)\Lambda_F''(c).
\end{align*}
Since $b-a>0$ and $\Lambda_F''(c)\ge 0$, this proves $\Lambda_F'(a)\le \Lambda_F'(b)$. Therefore $\Lambda_F'$ is nondecreasing on $I$. It remains only to translate this monotonicity into convexity. Let $x,y,z \in I$ with $x<y<z$. Applying the [Mean Value Theorem](/theorems/632) to $\Lambda_F$ on $[x,y]$ and on $[y,z]$, there exist $p \in (x,y)$ and $q \in (y,z)$ such that
\begin{align*}
\frac{\Lambda_F(y)-\Lambda_F(x)}{y-x}=\Lambda_F'(p)
\end{align*}
and
\begin{align*}
\frac{\Lambda_F(z)-\Lambda_F(y)}{z-y}=\Lambda_F'(q).
\end{align*}
Since $p<q$ and $\Lambda_F'$ is nondecreasing, the left secant slope is at most the right secant slope. Rearranging this secant-slope inequality gives, for $y=(1-\theta)x+\theta z$ with $\theta \in (0,1)$,
\begin{align*}
\Lambda_F(y)\le (1-\theta)\Lambda_F(x)+\theta\Lambda_F(z).
\end{align*}
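The rearrangement can be made explicit as follows (a sketch using only the relation $y=(1-\theta)x+\theta z$). Since $y-x=\theta(z-x)$ and $z-y=(1-\theta)(z-x)$, the secant-slope inequality reads
\begin{align*}
\frac{\Lambda_F(y)-\Lambda_F(x)}{\theta(z-x)} \le \frac{\Lambda_F(z)-\Lambda_F(y)}{(1-\theta)(z-x)}.
\end{align*}
Multiplying by $\theta(1-\theta)(z-x)>0$ and collecting the terms in $\Lambda_F(y)$ gives
\begin{align*}
(1-\theta)\left(\Lambda_F(y)-\Lambda_F(x)\right) \le \theta\left(\Lambda_F(z)-\Lambda_F(y)\right), \qquad\text{that is,}\qquad \Lambda_F(y) \le (1-\theta)\Lambda_F(x)+\theta\Lambda_F(z).
\end{align*}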
The cases $\theta=0$ and $\theta=1$ are equalities, so $\Lambda_F$ is convex on $I$. Combining the first- and second-derivative formulas proves the theorem.
[/step]