[proofplan]
Write the martingale increments as $X_k := M_k - M_{k-1}$. The bounded-difference assumption gives a conditional Hoeffding bound for each increment, and iterating [conditional expectation](/page/Conditional%20Expectation) yields a Gaussian-type bound for the moment generating function of $\sum_{k=1}^{n} X_k$. Exponential Markov's inequality then converts this moment generating function estimate into a one-sided tail bound, and optimizing over the exponential parameter gives the stated exponent. The two-sided estimate follows by applying the same one-sided bound to the martingale $(-M_k)_{k=0}^{n}$ and using the union bound.
[/proofplan]
[step:Reduce the martingale to bounded mean-zero increments]
For each $1 \le k \le n$, define the real-valued [random variable](/page/Random%20Variable) $X_k: \Omega \to \mathbb R$ by
\begin{align*}
X_k(\omega) = M_k(\omega) - M_{k-1}(\omega).
\end{align*}
Since $M_k$ and $M_{k-1}$ are integrable and $\mathcal F_k$-measurable, $X_k$ is integrable and $\mathcal F_k$-measurable. Since $(M_k)_{k=0}^{n}$ is a martingale with respect to $(\mathcal F_k)_{k=0}^{n}$, for every $1 \le k \le n$ we first use linearity of conditional expectation and the fact that $M_{k-1}$ is $\mathcal F_{k-1}$-measurable to get
\begin{align*}
\mathbb E[X_k \mid \mathcal F_{k-1}] = \mathbb E[M_k \mid \mathcal F_{k-1}] - M_{k-1}.
\end{align*}
The martingale property gives $\mathbb E[M_k \mid \mathcal F_{k-1}] = M_{k-1}$ $\mathbb P$-a.s., hence
\begin{align*}
\mathbb E[X_k \mid \mathcal F_{k-1}] = 0
\end{align*}
$\mathbb P$-a.s. Also,
\begin{align*}
\sum_{k=1}^{n} X_k = M_n - M_0
\end{align*}
by telescoping. The hypothesis gives $|X_k| \le c_k$ $\mathbb P$-a.s. for every $1 \le k \le n$.
[/step]
[step:Prove the conditional Hoeffding bound for one increment]
[claim:Conditional Hoeffding bound]
Let $\mathcal G \subset \mathcal F$ be a sub-$\sigma$-algebra, let $Y: \Omega \to \mathbb R$ be an integrable random variable, and let $c \ge 0$ be deterministic. Suppose $|Y| \le c$ $\mathbb P$-a.s. and $\mathbb E[Y \mid \mathcal G] = 0$ $\mathbb P$-a.s. Then, for every $\lambda \in \mathbb R$,
\begin{align*}
\mathbb E[e^{\lambda Y} \mid \mathcal G] \le \exp\left(\frac{\lambda^2 c^2}{2}\right)
\end{align*}
$\mathbb P$-a.s.
[/claim]
[proof]
If $c=0$, then $Y=0$ $\mathbb P$-a.s., and therefore
\begin{align*}
\mathbb E[e^{\lambda Y} \mid \mathcal G] = 1 = \exp(0)
\end{align*}
$\mathbb P$-a.s. Hence assume $c>0$.
For every $y \in [-c,c]$, convexity of the map $r \mapsto e^{\lambda r}$ on $\mathbb R$ gives the chord bound
\begin{align*}
e^{\lambda y}
\le \frac{c-y}{2c}e^{-\lambda c}+\frac{c+y}{2c}e^{\lambda c}.
\end{align*}
Applying this pointwise with $y=Y$ and taking conditional expectation with respect to $\mathcal G$ gives
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G] \le \mathbb E\left[\frac{c-Y}{2c}e^{-\lambda c}+\frac{c+Y}{2c}e^{\lambda c}\,\middle|\,\mathcal G\right].
\end{align*}
By linearity of conditional expectation and the hypothesis $\mathbb E[Y\mid \mathcal G]=0$ $\mathbb P$-a.s.,
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G] \le \frac{e^{-\lambda c}+e^{\lambda c}}{2}.
\end{align*}
Define the function $h: \mathbb R \to \mathbb R$ by
\begin{align*}
h(u)=\frac{e^u+e^{-u}}{2}.
\end{align*}
Using the [power series](/page/Power%20Series) expansions of $e^u$ and $e^{-u}$,
\begin{align*}
h(u)=\sum_{m=0}^{\infty}\frac{u^{2m}}{(2m)!}.
\end{align*}
Since $(2m)! \ge 2^m m!$ for every integer $m \ge 0$,
\begin{align*}
h(u) \le \sum_{m=0}^{\infty}\frac{(u^2/2)^m}{m!}=e^{u^2/2},
\end{align*}
because $(2m)! \ge 2^m m!$ for every integer $m \ge 0$. Taking $u=\lambda c$ yields
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G]
\le \exp\left(\frac{\lambda^2 c^2}{2}\right)
\end{align*}
$\mathbb P$-a.s.
[/proof]
[guided]
The point of this claim is to replace a bounded mean-zero random variable by the worst possible exponential moment allowed by its range. The proof is conditional, because later we will apply it to $X_k$ after conditioning on the past $\mathcal F_{k-1}$.
If $c=0$, the bound $|Y|\le 0$ implies $Y=0$ $\mathbb P$-a.s. Therefore $e^{\lambda Y}=1$ $\mathbb P$-a.s., and
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G] = 1 = \exp(0).
\end{align*}
So assume $c>0$.
For each number $y\in[-c,c]$, write $y$ as a convex combination of the two endpoints $-c$ and $c$:
\begin{align*}
y = \frac{c-y}{2c}(-c)+\frac{c+y}{2c}c,
\end{align*}
where both coefficients are non-negative and their sum is $1$. Since the map $r\mapsto e^{\lambda r}$ is convex on $\mathbb R$, its value at this convex combination is at most the corresponding convex combination of its endpoint values:
\begin{align*}
e^{\lambda y}
\le \frac{c-y}{2c}e^{-\lambda c}+\frac{c+y}{2c}e^{\lambda c}.
\end{align*}
Applying this inequality with $y=Y(\omega)$ gives a pointwise inequality outside a $\mathbb P$-null set, because $|Y|\le c$ $\mathbb P$-a.s. Taking conditional expectation with respect to $\mathcal G$ preserves the inequality:
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G] \le \mathbb E\left[\frac{c-Y}{2c}e^{-\lambda c}+\frac{c+Y}{2c}e^{\lambda c}\,\middle|\,\mathcal G\right].
\end{align*}
By linearity of conditional expectation,
\begin{align*}
\mathbb E\left[\frac{c-Y}{2c}e^{-\lambda c}+\frac{c+Y}{2c}e^{\lambda c}\,\middle|\,\mathcal G\right]=\frac{c-\mathbb E[Y\mid \mathcal G]}{2c}e^{-\lambda c}+\frac{c+\mathbb E[Y\mid \mathcal G]}{2c}e^{\lambda c}.
\end{align*}
The conditional mean-zero hypothesis is now used exactly once: $\mathbb E[Y\mid\mathcal G]=0$ $\mathbb P$-a.s. Hence
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G]
\le \frac{e^{-\lambda c}+e^{\lambda c}}{2}.
\end{align*}
It remains to bound the hyperbolic cosine term by a Gaussian exponential. Define the function $h: \mathbb R \to \mathbb R$ by
\begin{align*}
h(u)=\frac{e^u+e^{-u}}{2}.
\end{align*}
Using the power series for the exponential functions, the odd powers cancel and give
\begin{align*}
h(u)
= \sum_{m=0}^{\infty}\frac{u^{2m}}{(2m)!}.
\end{align*}
For every integer $m\ge 0$, the factorial inequality $(2m)!\ge 2^m m!$ gives
\begin{align*}
\frac{u^{2m}}{(2m)!}\le \frac{(u^2/2)^m}{m!}.
\end{align*}
Summing over $m$ yields
\begin{align*}
h(u)\le \sum_{m=0}^{\infty}\frac{(u^2/2)^m}{m!}=e^{u^2/2}.
\end{align*}
With $u=\lambda c$, this proves
\begin{align*}
\mathbb E[e^{\lambda Y}\mid \mathcal G]
\le \exp\left(\frac{\lambda^2 c^2}{2}\right)
\end{align*}
$\mathbb P$-a.s.
[/guided]
[/step]
[step:Iterate conditional expectations to bound the full moment generating function]
Fix $\lambda \in \mathbb R$. For each $0 \le j \le n$, define the partial-sum random variable $S_j: \Omega \to \mathbb R$ by
\begin{align*}
S_j(\omega)=\sum_{k=1}^{j} X_k(\omega),
\end{align*}
with the convention $S_0=0$. Since $|X_k|\le c_k$ $\mathbb P$-a.s. and $n<\infty$, each $e^{\lambda S_j}$ is integrable.
For $1\le k\le n$, the random variable $S_{k-1}$ is $\mathcal F_{k-1}$-measurable. Applying the conditional Hoeffding bound with $\mathcal G=\mathcal F_{k-1}$, $Y=X_k$, and $c=c_k$ gives
\begin{align*}
\mathbb E[e^{\lambda X_k}\mid \mathcal F_{k-1}]
\le \exp\left(\frac{\lambda^2 c_k^2}{2}\right)
\end{align*}
$\mathbb P$-a.s. Therefore, using the tower property and then pulling the $\mathcal F_{k-1}$-measurable factor $e^{\lambda S_{k-1}}$ out of the inner conditional expectation,
\begin{align*}
\mathbb E[e^{\lambda S_k}]=\mathbb E\left[e^{\lambda S_{k-1}}\mathbb E[e^{\lambda X_k}\mid \mathcal F_{k-1}]\right].
\end{align*}
The conditional Hoeffding bound gives
\begin{align*}
\mathbb E[e^{\lambda S_k}]\le \exp\left(\frac{\lambda^2 c_k^2}{2}\right)\mathbb E[e^{\lambda S_{k-1}}].
\end{align*}
Iterating this inequality from $k=n$ down to $k=1$ gives
\begin{align*}
\mathbb E[e^{\lambda(M_n-M_0)}]=\mathbb E[e^{\lambda S_n}]\le \prod_{k=1}^{n}\exp\left(\frac{\lambda^2 c_k^2}{2}\right)=\exp\left(\frac{\lambda^2}{2}\sum_{k=1}^{n}c_k^2\right).
\end{align*}
[/step]
[step:Apply exponential Markov and optimize the parameter]
Assume first that $S=\sum_{k=1}^{n}c_k^2>0$. Fix $t\ge 0$ and $\lambda>0$. Define the non-negative random variable $Z_\lambda: \Omega \to [0,\infty)$ by
\begin{align*}
Z_\lambda(\omega)=\exp(\lambda(M_n(\omega)-M_0(\omega))).
\end{align*}
On the event $\{M_n-M_0\ge t\}$, one has $Z_\lambda\ge e^{\lambda t}$. Hence, by monotonicity of expectation and the moment generating function estimate,
\begin{align*}
e^{\lambda t}\mathbb P(M_n-M_0\ge t)\le \mathbb E[Z_\lambda]\le \exp\left(\frac{\lambda^2 S}{2}\right).
\end{align*}
Thus
\begin{align*}
\mathbb P(M_n-M_0\ge t)
\le \exp\left(-\lambda t+\frac{\lambda^2S}{2}\right).
\end{align*}
Choose
\begin{align*}
\lambda := \frac{t}{S}.
\end{align*}
If $t=0$, this choice gives $\lambda=0$, and the desired inequality is simply $\mathbb P(M_n-M_0\ge 0)\le 1$. If $t>0$, the chosen $\lambda$ is positive and gives
\begin{align*}
-\lambda t+\frac{\lambda^2S}{2}
= -\frac{t^2}{S}+\frac{t^2}{2S}
= -\frac{t^2}{2S}.
\end{align*}
Therefore, for every $t\ge 0$,
\begin{align*}
\mathbb P(M_n-M_0\ge t)
\le \exp\left(-\frac{t^2}{2S}\right).
\end{align*}
[guided]
The moment generating function estimate becomes a tail estimate by exponentiating the event. Fix $t\ge 0$ and choose a parameter $\lambda>0$. Define the random variable $Z_\lambda: \Omega \to [0,\infty)$ by
\begin{align*}
Z_\lambda(\omega)=\exp(\lambda(M_n(\omega)-M_0(\omega))).
\end{align*}
This random variable is non-negative. On the event $\{M_n-M_0\ge t\}$, the monotonicity of the exponential function gives
\begin{align*}
Z_\lambda = \exp(\lambda(M_n-M_0))\ge e^{\lambda t}.
\end{align*}
Equivalently,
\begin{align*}
e^{\lambda t}\mathbb 1_{\{M_n-M_0\ge t\}}
\le Z_\lambda.
\end{align*}
Taking expectation and using monotonicity of expectation gives
\begin{align*}
e^{\lambda t}\mathbb P(M_n-M_0\ge t)
\le \mathbb E[Z_\lambda].
\end{align*}
The previous step bounds the right-hand side:
\begin{align*}
\mathbb E[Z_\lambda]=\mathbb E[e^{\lambda(M_n-M_0)}]\le \exp\left(\frac{\lambda^2}{2}\sum_{k=1}^{n}c_k^2\right).
\end{align*}
Writing
\begin{align*}
S:=\sum_{k=1}^{n}c_k^2,
\end{align*}
we obtain
\begin{align*}
\mathbb P(M_n-M_0\ge t)
\le \exp\left(-\lambda t+\frac{\lambda^2S}{2}\right).
\end{align*}
Now we choose $\lambda$ to make the exponent as small as possible. Define the function $q:(0,\infty)\to\mathbb R$ by
\begin{align*}
q(\lambda)=-\lambda t+\frac{\lambda^2S}{2}.
\end{align*}
This is a quadratic polynomial with derivative $q'(\lambda)=-t+\lambda S$. When $S>0$ and $t>0$, the minimum over $\lambda>0$ occurs at
\begin{align*}
\lambda=\frac{t}{S}.
\end{align*}
Substituting this value gives
\begin{align*}
-\lambda t+\frac{\lambda^2S}{2}
= -\frac{t^2}{S}+\frac{t^2}{2S}
= -\frac{t^2}{2S}.
\end{align*}
Therefore, for $t>0$,
\begin{align*}
\mathbb P(M_n-M_0\ge t)
\le \exp\left(-\frac{t^2}{2S}\right).
\end{align*}
For $t=0$, the same displayed inequality reads $\mathbb P(M_n-M_0\ge 0)\le 1$, which is true because probabilities are at most $1$. Thus the one-sided estimate holds for every $t\ge 0$ when $S>0$.
[/guided]
[/step]
[step:Apply the one-sided estimate to the negative martingale for the two-sided bound]
Assume $S>0$. Define the process $(N_k)_{k=0}^{n}$ by declaring, for each $0\le k\le n$, the random variable $N_k: \Omega\to\mathbb R$ as
\begin{align*}
N_k(\omega)=-M_k(\omega).
\end{align*}
Since $(M_k)_{k=0}^{n}$ is a martingale with respect to $(\mathcal F_k)_{k=0}^{n}$, the process $(N_k)_{k=0}^{n}$ is also a martingale with respect to the same filtration. Moreover,
\begin{align*}
|N_k-N_{k-1}|=|M_k-M_{k-1}|\le c_k
\end{align*}
$\mathbb P$-a.s. for every $1\le k\le n$. Applying the one-sided estimate to $(N_k)_{k=0}^{n}$ gives
\begin{align*}
\mathbb P(M_n-M_0\le -t)
= \mathbb P(N_n-N_0\ge t)
\le \exp\left(-\frac{t^2}{2S}\right).
\end{align*}
Since
\begin{align*}
\{|M_n-M_0|\ge t\}
\subset \{M_n-M_0\ge t\}\cup\{M_n-M_0\le -t\},
\end{align*}
the union bound gives
\begin{align*}
\mathbb P(|M_n-M_0|\ge t)\le \mathbb P(M_n-M_0\ge t)+\mathbb P(M_n-M_0\le -t)\le 2\exp\left(-\frac{t^2}{2S}\right).
\end{align*}
[/step]
[step:Handle the degenerate variance proxy]
If $S=0$, then $c_k=0$ for every $1\le k\le n$. Hence $|M_k-M_{k-1}|\le 0$ $\mathbb P$-a.s. for every $k$, so $M_k=M_{k-1}$ $\mathbb P$-a.s. for every $k$. Since there are finitely many indices, intersecting the corresponding probability-one events gives
\begin{align*}
M_n=M_0
\end{align*}
$\mathbb P$-a.s. Therefore, for every $t>0$,
\begin{align*}
\mathbb P(M_n-M_0\ge t)=0
\end{align*}
and
\begin{align*}
\mathbb P(|M_n-M_0|\ge t)=0.
\end{align*}
Together with the estimates proved for $S>0$, this proves the theorem.
[/step]