[proofplan]
We prove the upper-tail estimate by comparing the indicator of the event $\{X\ge t\}$ with the positive [random variable](/page/Random%20Variable) $e^{\lambda(X-t)}$ for each admissible $\lambda>0$. Taking expectations gives the usual Chernoff estimate with a free parameter $\lambda$. Since the exponential function is increasing and continuous, optimizing over $\lambda$ converts the infimum form into the Legendre-transform supremum form. The lower-tail estimate is the same argument with $\lambda<0$, where the negative sign of $\lambda$ turns $X\le t$ into $\lambda X\ge \lambda t$.
[/proofplan]
[step:Bound the upper tail for one admissible positive parameter]
Fix $t\in\mathbb R$ and fix $\lambda\in D_X\cap(0,\infty)$. Let
\begin{align*}
A_t:=\{\omega\in\Omega:X(\omega)\ge t\}
\end{align*}
denote the upper-tail event. Since $X$ is measurable and $[t,\infty)\in\mathcal B(\mathbb R)$, the set $A_t=X^{-1}([t,\infty))$ belongs to $\mathcal F$.
For every $\omega\in A_t$, the inequality $\lambda>0$ gives $\lambda X(\omega)\ge \lambda t$, hence
\begin{align*}
1\le \exp(\lambda X(\omega)-\lambda t).
\end{align*}
For every $\omega\notin A_t$, the indicator $\mathbb 1_{A_t}(\omega)$ is $0$, while the exponential on the right-hand side is strictly positive, so the bound holds trivially there. Thus, pointwise on $\Omega$,
\begin{align*}
\mathbb 1_{A_t}\le \exp(-\lambda t)e^{\lambda X}.
\end{align*}
Taking expectations of both sides, using monotonicity of the [Lebesgue integral](/page/Lebesgue%20Integral) with respect to $\mathbb P$, and using $\lambda\in D_X$, we obtain
\begin{align*}
\mathbb P(X\ge t)
=\mathbb E[\mathbb 1_{A_t}]
\le \exp(-\lambda t)\mathbb E[e^{\lambda X}]
=\exp(\Lambda_X(\lambda)-\lambda t).
\end{align*}
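As an illustration, assume (this is not part of the statement) that $X$ is standard Gaussian, so that $D_X=\mathbb R$ and $\Lambda_X(\lambda)=\lambda^2/2$. The bound of this step then reads, for every $\lambda>0$,
\begin{align*}
\mathbb P(X\ge t)\le \exp\left(\tfrac{\lambda^2}{2}-\lambda t\right).
\end{align*}
For $t=2$, the choices $\lambda=1$ and $\lambda=2$ give the bounds $e^{-3/2}$ and $e^{-2}$ respectively, so different admissible parameters can give bounds of different quality.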
[/step]
[step:Optimize the upper-tail estimate over positive admissible parameters]The previous step gives, for every $\lambda\in D_X\cap(0,\infty)$,
\begin{align*}
\mathbb P(X\ge t)\le \exp(\Lambda_X(\lambda)-\lambda t).
\end{align*}
Taking the infimum over $\lambda\in D_X\cap(0,\infty)$ yields
\begin{align*}
\mathbb P(X\ge t)
\le \inf_{\lambda\in D_X\cap(0,\infty)}\exp(\Lambda_X(\lambda)-\lambda t).
\end{align*}
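Because $\exp:\mathbb R\to(0,\infty)$ is increasing and continuous, the infimum commutes with the exponential (with the convention $\exp(-\infty)=0$ if the inner infimum is $-\infty$):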
\begin{align*}
\inf_{\lambda\in D_X\cap(0,\infty)}\exp(\Lambda_X(\lambda)-\lambda t)
=
\exp\left(\inf_{\lambda\in D_X\cap(0,\infty)}\{\Lambda_X(\lambda)-\lambda t\}\right).
\end{align*}
Finally,
\begin{align*}
\inf_{\lambda\in D_X\cap(0,\infty)}\{\Lambda_X(\lambda)-\lambda t\}
=
-\sup_{\lambda\in D_X\cap(0,\infty)}\{\lambda t-\Lambda_X(\lambda)\}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(X\ge t)\le
\exp\left(-\sup_{\lambda\in D_X\cap(0,\infty)}\{\lambda t-\Lambda_X(\lambda)\}\right).
\end{align*}[/step]
[guided]We now turn the one-parameter estimate into the advertised Legendre-transform form. From the preceding step, every admissible positive parameter $\lambda\in D_X\cap(0,\infty)$ gives the valid bound
\begin{align*}
\mathbb P(X\ge t)\le \exp(\Lambda_X(\lambda)-\lambda t).
\end{align*}
Since the left-hand side does not depend on $\lambda$, it is bounded by the best of these right-hand sides:
\begin{align*}
\mathbb P(X\ge t)
\le \inf_{\lambda\in D_X\cap(0,\infty)}\exp(\Lambda_X(\lambda)-\lambda t).
\end{align*}
The remaining point is purely order-theoretic. The exponential function $\exp:\mathbb R\to(0,\infty)$ is increasing and continuous, so taking the infimum after exponentiating is the same as exponentiating the infimum (with the convention $\exp(-\infty)=0$ when the infimum is $-\infty$):
\begin{align*}
\inf_{\lambda\in D_X\cap(0,\infty)}\exp(\Lambda_X(\lambda)-\lambda t)
=
\exp\left(\inf_{\lambda\in D_X\cap(0,\infty)}\{\Lambda_X(\lambda)-\lambda t\}\right).
\end{align*}
The expression inside the exponential is then rewritten by factoring out a minus sign:
\begin{align*}
\inf_{\lambda\in D_X\cap(0,\infty)}\{\Lambda_X(\lambda)-\lambda t\}
=
-\sup_{\lambda\in D_X\cap(0,\infty)}\{\lambda t-\Lambda_X(\lambda)\}.
\end{align*}
Substituting this identity gives
\begin{align*}
\mathbb P(X\ge t)\le
\exp\left(-\sup_{\lambda\in D_X\cap(0,\infty)}\{\lambda t-\Lambda_X(\lambda)\}\right).
\end{align*}
This is exactly the upper-tail [Chernoff bound](/theorems/6038) in Legendre transform form.[/guided]
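For a concrete instance of the optimization, assume for illustration that $X$ is standard Gaussian, so that $D_X=\mathbb R$ and $\Lambda_X(\lambda)=\lambda^2/2$. For $t>0$ the map $\lambda\mapsto \lambda t-\lambda^2/2$ is a concave quadratic maximized at $\lambda=t$, which is positive and hence admissible, so
\begin{align*}
\sup_{\lambda\in(0,\infty)}\left\{\lambda t-\tfrac{\lambda^2}{2}\right\}=\tfrac{t^2}{2},
\qquad
\mathbb P(X\ge t)\le \exp\left(-\tfrac{t^2}{2}\right).
\end{align*}
For $t\le 0$ the supremum equals $0$, approached as $\lambda\downarrow 0$ but not attained, and the bound reduces to the trivial estimate $\mathbb P(X\ge t)\le 1$.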
[step:Repeat the pointwise comparison for the lower tail with negative parameters]
With $t\in\mathbb R$ fixed as before, fix $\lambda\in D_X\cap(-\infty,0)$. Let
\begin{align*}
B_t:=\{\omega\in\Omega:X(\omega)\le t\}
\end{align*}
denote the lower-tail event. Since $X$ is measurable and $(-\infty,t]\in\mathcal B(\mathbb R)$, the set $B_t=X^{-1}((-\infty,t])$ belongs to $\mathcal F$.
For every $\omega\in B_t$, the inequality $\lambda<0$ reverses order and gives $\lambda X(\omega)\ge \lambda t$. Therefore
\begin{align*}
1\le \exp(\lambda X(\omega)-\lambda t).
\end{align*}
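Since the right-hand side is strictly positive, the bound also holds trivially for $\omega\notin B_t$, where the indicator vanishes. As before, this gives the pointwise inequality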
\begin{align*}
\mathbb 1_{B_t}\le \exp(-\lambda t)e^{\lambda X}.
\end{align*}
Taking expectations and using $\lambda\in D_X$ gives
\begin{align*}
\mathbb P(X\le t)
=\mathbb E[\mathbb 1_{B_t}]
\le \exp(-\lambda t)\mathbb E[e^{\lambda X}]
=\exp(\Lambda_X(\lambda)-\lambda t).
\end{align*}
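Taking the infimum over $\lambda\in D_X\cap(-\infty,0)$ and using, exactly as in the upper-tail case, that $\exp$ is increasing and continuous and that an infimum of $\Lambda_X(\lambda)-\lambda t$ is minus the supremum of $\lambda t-\Lambda_X(\lambda)$, we obtain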
\begin{align*}
\mathbb P(X\le t)\le
\exp\left(-\sup_{\lambda\in D_X\cap(-\infty,0)}\{\lambda t-\Lambda_X(\lambda)\}\right).
\end{align*}
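As a check on the sign conventions, assume for illustration that $X$ is standard Gaussian, so that $D_X=\mathbb R$ and $\Lambda_X(\lambda)=\lambda^2/2$. For $t<0$ the concave quadratic $\lambda\mapsto\lambda t-\lambda^2/2$ is maximized at $\lambda=t$, which is negative and hence admissible, so
\begin{align*}
\sup_{\lambda\in(-\infty,0)}\left\{\lambda t-\tfrac{\lambda^2}{2}\right\}=\tfrac{t^2}{2},
\qquad
\mathbb P(X\le t)\le \exp\left(-\tfrac{t^2}{2}\right).
\end{align*}
For $t\ge 0$ the supremum over negative $\lambda$ equals $0$, and the bound is the trivial estimate $\mathbb P(X\le t)\le 1$.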
[/step]