[proofplan]
The proof first uses independence to factor the moment generating function of the sum into the product of the individual moment generating functions. The assumed sub-Gaussian bounds then combine multiplicatively into a single exponential with variance proxy $V=\sum_{i=1}^n\sigma_i^2$. The tail estimate follows from Chernoff's method: apply Markov's inequality to $e^{\lambda S_n}$ and $e^{-\lambda S_n}$, optimize over $\lambda>0$, and combine the two one-sided estimates with a union bound. The degenerate case $V=0$ is treated separately.
[/proofplan]
[step:Factor the moment generating function using independence]
Let $(\Omega,\mathcal F,\mathbb P)$ denote the underlying probability space on which the random variables $X_1,\dots,X_n$ are defined. Fix $\lambda\in\mathbb R$. For each $i\in\{1,\dots,n\}$, define the non-negative random variable $Y_{\lambda,i}:\Omega\to[0,\infty)$ by
\begin{align*}
Y_{\lambda,i}(\omega)=e^{\lambda X_i(\omega)}.
\end{align*}
Since $X_1,\dots,X_n$ are independent and each $Y_{\lambda,i}$ is a Borel measurable function of $X_i$ alone, the random variables $Y_{\lambda,1},\dots,Y_{\lambda,n}$ are independent. The sub-Gaussian hypothesis gives
\begin{align*}
\mathbb E[Y_{\lambda,i}]=\mathbb E[e^{\lambda X_i}]\le \exp\left(\frac{\sigma_i^2\lambda^2}{2}\right)<\infty.
\end{align*}
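As a concrete instance of the hypothesis (the example is ours, not part of the theorem), a Rademacher variable $X$, uniform on $\{-1,+1\}$, has $\mathbb E[e^{\lambda X}]=\cosh\lambda$, and $\cosh\lambda\le e^{\lambda^2/2}$, so it satisfies the bound with $\sigma=1$. The Python sketch below only checks this on a finite grid of $\lambda$; it is a numerical sanity check, not a proof.

```python
import math

def rademacher_mgf(lam: float) -> float:
    """E[exp(lam * X)] for X uniform on {-1, +1}; this equals cosh(lam)."""
    return 0.5 * math.exp(lam) + 0.5 * math.exp(-lam)

def subgaussian_bound(lam: float, sigma: float) -> float:
    """Right-hand side exp(sigma^2 * lam^2 / 2) of the sub-Gaussian hypothesis."""
    return math.exp(sigma**2 * lam**2 / 2)

# Check the hypothesis with sigma = 1 on the grid lambda = -10.0, -9.9, ..., 10.0.
grid = [k / 10 for k in range(-100, 101)]
assert all(rademacher_mgf(lam) <= subgaussian_bound(lam, 1.0) for lam in grid)
```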
Since $S_n=\sum_{i=1}^n X_i$, the identity $e^{a+b}=e^ae^b$ gives
\begin{align*}
\mathbb E[e^{\lambda S_n}]=\mathbb E\left[\prod_{i=1}^n e^{\lambda X_i}\right].
\end{align*}
We use the product rule for expectations of finitely many independent non-negative random variables and include its verification for the present variables.
[claim:Factor expectations for the independent exponential variables]
For every $k\in\{1,\dots,n\}$,
\begin{align*}
\mathbb E\left[\prod_{i=1}^k Y_{\lambda,i}\right]=\prod_{i=1}^k\mathbb E[Y_{\lambda,i}].
\end{align*}
[/claim]
[proof]
The case $k=1$ is trivial. For $k=2$, we prove the identity for arbitrary independent non-negative random variables $A$ and $B$. Independence gives $\mathbb E[\mathbb{1}_{\{A\in E\}}\mathbb{1}_{\{B\in F\}}]=\mathbb P(A\in E)\mathbb P(B\in F)$ for Borel sets $E,F\subseteq[0,\infty)$, and linearity extends this to $\mathbb E[UV]=\mathbb E[U]\mathbb E[V]$ whenever $U$ is a non-negative simple Borel function of $A$ and $V$ is a non-negative simple Borel function of $B$. Choose such simple functions with $U_m\uparrow A$ and $V_m\uparrow B$ pointwise, for instance $U_m=\min(2^{-m}\lfloor 2^mA\rfloor,m)$ and $V_m=\min(2^{-m}\lfloor 2^mB\rfloor,m)$. Then $U_mV_m\uparrow AB$ pointwise, and the monotone convergence theorem applied to $U_mV_m$, $U_m$ and $V_m$ gives $\mathbb E[AB]=\mathbb E[A]\mathbb E[B]$. In particular, this holds for $A=Y_{\lambda,1}$ and $B=Y_{\lambda,2}$. For the induction step, let $k\ge3$ and assume the identity holds for $k-1$. Because $Y_{\lambda,1},\dots,Y_{\lambda,k}$ are independent, the random vector $(Y_{\lambda,1},\dots,Y_{\lambda,k-1})$ is independent of $Y_{\lambda,k}$; hence the non-negative product $\prod_{i=1}^{k-1}Y_{\lambda,i}$, which is a Borel function of that vector, is independent of $Y_{\lambda,k}$. Applying the two-variable identity with $A=\prod_{i=1}^{k-1}Y_{\lambda,i}$ and $B=Y_{\lambda,k}$ gives
\begin{align*}
\mathbb E\left[\prod_{i=1}^k Y_{\lambda,i}\right]
=\mathbb E\left[\prod_{i=1}^{k-1}Y_{\lambda,i}\right]\mathbb E[Y_{\lambda,k}].
\end{align*}
Using the induction hypothesis on the first factor gives
\begin{align*}
\mathbb E\left[\prod_{i=1}^k Y_{\lambda,i}\right]
=\prod_{i=1}^k\mathbb E[Y_{\lambda,i}].
\end{align*}
[/proof]
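The monotone approximation in the proof of the claim can be illustrated numerically. The Python sketch below assumes two hypothetical finite laws for $X_1$ and $X_2$ and a fixed $\lambda=0.7$ (all our choices), uses the standard dyadic simple functions $\min(2^{-m}\lfloor 2^m y\rfloor,m)$, and checks that $\mathbb E[U_mV_m]=\mathbb E[U_m]\mathbb E[V_m]$ for each $m$, that these expectations increase in $m$, and that they approach $\mathbb E[Y_{\lambda,1}]\mathbb E[Y_{\lambda,2}]$. Independence is built in by computing joint expectations under the product law.

```python
import math
from itertools import product

# Hypothetical independent discrete laws for X_1 and X_2: (value, probability) pairs.
LAW1 = [(-1.0, 0.5), (1.0, 0.5)]
LAW2 = [(-1.0, 2 / 3), (2.0, 1 / 3)]
LAM = 0.7  # hypothetical fixed lambda

def y(x):
    """Y = exp(LAM * x), the non-negative variable from the claim."""
    return math.exp(LAM * x)

def dyadic(v, m):
    """Simple approximation min(floor(2^m v) / 2^m, m); non-decreasing in m for v >= 0."""
    return min(math.floor(2**m * v) / 2**m, m)

def expect(law, f):
    """E[f(X)] for a finite law."""
    return sum(p * f(x) for x, p in law)

def expect_joint(f, g):
    """E[f(X_1) g(X_2)] under the product law, i.e. assuming independence."""
    return sum(p * q * f(x1) * g(x2) for (x1, p), (x2, q) in product(LAW1, LAW2))

previous = 0.0
for m in range(1, 12):
    def u(x, m=m):
        return dyadic(y(x), m)
    joint = expect_joint(u, u)
    assert math.isclose(joint, expect(LAW1, u) * expect(LAW2, u))  # simple-function identity
    assert joint >= previous - 1e-12                                 # monotone in m
    previous = joint

limit = expect(LAW1, y) * expect(LAW2, y)
assert abs(previous - limit) < 1e-2  # E[U_m V_m] approaches E[Y_1] E[Y_2]
```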
Applying the claim with $k=n$ gives
\begin{align*}
\mathbb E[e^{\lambda S_n}]=\prod_{i=1}^n \mathbb E[e^{\lambda X_i}].
\end{align*}
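The factorization can be sanity-checked exactly for finitely supported summands. The Python sketch below assumes three hypothetical independent discrete laws (our choice), computes $\mathbb E[e^{\lambda S_n}]$ by summing over the joint law, and compares it with the product of the individual moment generating functions for several values of $\lambda$.

```python
import math
from itertools import product

# Hypothetical independent discrete summands, each a list of (value, probability).
dists = [
    [(-1.0, 0.5), (1.0, 0.5)],                 # Rademacher
    [(-1.0, 2 / 3), (2.0, 1 / 3)],             # centred two-point law
    [(-0.5, 0.25), (0.0, 0.5), (0.5, 0.25)],   # centred three-point law
]

def mgf(dist, lam):
    """E[exp(lam * X)] for a finite distribution."""
    return sum(p * math.exp(lam * x) for x, p in dist)

def mgf_of_sum(dists, lam):
    """E[exp(lam * S_n)] computed from the joint (product) law of the summands."""
    total = 0.0
    for outcome in product(*dists):
        prob = math.prod(p for _, p in outcome)   # independence: joint prob is a product
        s = sum(x for x, _ in outcome)
        total += prob * math.exp(lam * s)
    return total

for lam in (-2.0, -0.3, 0.0, 0.8, 1.5):
    lhs = mgf_of_sum(dists, lam)
    rhs = math.prod(mgf(d, lam) for d in dists)
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
```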
[/step]
[step:Combine the individual sub-Gaussian bounds]
Using the hypothesis for each $i\in\{1,\dots,n\}$ in the product obtained above,
\begin{align*}
\mathbb E[e^{\lambda S_n}]\le \prod_{i=1}^n \exp\left(\frac{\sigma_i^2\lambda^2}{2}\right).
\end{align*}
Combining the product of exponentials into one exponential gives
\begin{align*}
\mathbb E[e^{\lambda S_n}]\le \exp\left(\frac{\lambda^2}{2}\sum_{i=1}^n\sigma_i^2\right).
\end{align*}
This proves the asserted moment generating function bound.
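A quick numerical check of the combined bound, under the assumption (ours) that $X_i=c_i\varepsilon_i$ with independent Rademacher signs $\varepsilon_i$ and hypothetical weights $c_i$: then $\sigma_i=|c_i|$, independence gives $\mathbb E[e^{\lambda S_n}]=\prod_i\cosh(\lambda c_i)$, and the sketch verifies that this never exceeds $\exp(\lambda^2V/2)$ on a grid of $\lambda$.

```python
import math

coeffs = [0.5, 1.0, 2.0, 0.25]   # hypothetical weights c_i; sigma_i = |c_i|
V = sum(c * c for c in coeffs)   # variance proxy V = sum of sigma_i^2

def mgf_weighted_rademacher(lam):
    """E[exp(lam * sum_i c_i eps_i)] = prod_i cosh(lam * c_i) by independence."""
    return math.prod(math.cosh(lam * c) for c in coeffs)

# Check the combined bound on the grid lambda = -4.0, -3.9, ..., 4.0.
for k in range(-40, 41):
    lam = k / 10
    assert mgf_weighted_rademacher(lam) <= math.exp(lam**2 * V / 2)
```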
[/step]
[step:Derive the upper tail estimate by optimizing Chernoff's bound]
Define
\begin{align*}
V:=\sum_{i=1}^n\sigma_i^2.
\end{align*}
Assume first that $V>0$. Let $t\ge0$ and let $\lambda>0$. Define the non-negative random variable $Z_\lambda:\Omega\to[0,\infty)$ by
\begin{align*}
Z_\lambda(\omega)=e^{\lambda S_n(\omega)}.
\end{align*}
Since $\lambda>0$, the map $s\mapsto e^{\lambda s}$ is increasing, so
\begin{align*}
\{S_n\ge t\}\subseteq \{Z_\lambda\ge e^{\lambda t}\}.
\end{align*}
For any non-negative random variable $Z:\Omega\to[0,\infty)$ and any $a>0$, the pointwise inequality
\begin{align*}
a\mathbb{1}_{\{Z\ge a\}}\le Z
\end{align*}
gives the Markov inequality estimate
\begin{align*}
\mathbb P(Z\ge a)\le a^{-1}\mathbb E[Z]
\end{align*}
after taking expectations. Applying this with $Z=Z_\lambda$ and $a=e^{\lambda t}$ gives
\begin{align*}
\mathbb P(S_n\ge t)\le e^{-\lambda t}\mathbb E[e^{\lambda S_n}].
\end{align*}
Using the moment generating function bound with variance proxy $V$ gives
\begin{align*}
\mathbb P(S_n\ge t)\le \exp\left(-\lambda t+\frac{\lambda^2V}{2}\right).
\end{align*}
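If $t>0$, choose $\lambda=t/V$, which is positive and minimizes the quadratic exponent over $\lambda>0$. Since $-\frac{t}{V}t+\frac{t^2}{2V^2}V=-\frac{t^2}{2V}$, we obtain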
\begin{align*}
\mathbb P(S_n\ge t)
\le
\exp\left(-\frac{t^2}{2V}\right).
\end{align*}
If $t=0$, the same bound holds because $\mathbb P(S_n\ge0)\le1=\exp(0)$.
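Both stages of the upper-tail argument can be compared with an exact tail. The Python sketch below assumes (our choice) that $S_n$ is a sum of $n=10$ independent Rademacher signs, so $V=n$ and $S_n=2K-n$ with $K\sim\mathrm{Binomial}(n,1/2)$. It checks the fixed-$\lambda$ Markov bound $e^{-\lambda t}\mathbb E[e^{\lambda S_n}]$, using the exact moment generating function $\cosh(\lambda)^n$, and the optimized bound $\exp(-t^2/(2V))$.

```python
import math

n = 10           # hypothetical: S_n is a sum of n Rademacher signs
V = float(n)     # each sign is sub-Gaussian with sigma_i = 1

def upper_tail(t):
    """Exact P(S_n >= t), where S_n = 2K - n and K ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, k) for k in range(n + 1) if 2 * k - n >= t) / 2**n

def chernoff(t, lam):
    """Markov's inequality for exp(lam * S_n): exp(-lam * t) * cosh(lam)^n."""
    return math.exp(-lam * t) * math.cosh(lam) ** n

for t in (0, 2, 4, 6, 8, 10):
    for lam in (0.1, 0.5, 1.0, 2.0):
        assert upper_tail(t) <= chernoff(t, lam)          # fixed-lambda bound
    assert upper_tail(t) <= math.exp(-t**2 / (2 * V))     # optimized bound
```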
[guided]
We want to convert the moment generating function estimate into a probability estimate for the event $\{S_n\ge t\}$. The standard device is to exponentiate the inequality $S_n\ge t$: the exponential is increasing, and we already control expectations of exponentials.
Define
\begin{align*}
V:=\sum_{i=1}^n\sigma_i^2.
\end{align*}
Assume in this step that $V>0$. Fix $t\ge0$ and choose an auxiliary parameter $\lambda>0$. Define the non-negative random variable $Z_\lambda:\Omega\to[0,\infty)$ by
\begin{align*}
Z_\lambda(\omega)=e^{\lambda S_n(\omega)}.
\end{align*}
The parameter $\lambda$ is required to be positive so that the map $s\mapsto e^{\lambda s}$ is increasing. Therefore, whenever $S_n(\omega)\ge t$, we have $e^{\lambda S_n(\omega)}\ge e^{\lambda t}$, so
\begin{align*}
\{S_n\ge t\}\subseteq \{Z_\lambda\ge e^{\lambda t}\}.
\end{align*}
The required one-sided probability bound is the elementary estimate behind Markov's inequality. Since $Z_\lambda$ is non-negative and has finite expectation by the moment generating function estimate, and since $e^{\lambda t}>0$, the pointwise inequality
\begin{align*}
e^{\lambda t}\mathbb{1}_{\{Z_\lambda\ge e^{\lambda t}\}}\le Z_\lambda
\end{align*}
gives, after taking expectations,
\begin{align*}
\mathbb P(S_n\ge t)\le \mathbb P(Z_\lambda\ge e^{\lambda t})\le e^{-\lambda t}\mathbb E[Z_\lambda].
\end{align*}
Since $Z_\lambda=e^{\lambda S_n}$ and the moment generating function bound has variance proxy $V$,
\begin{align*}
\mathbb P(S_n\ge t)\le \exp\left(-\lambda t+\frac{\lambda^2V}{2}\right).
\end{align*}
Now we choose $\lambda$ to make the exponent as small as possible. Define the function $\varphi:(0,\infty)\to\mathbb R$ by
\begin{align*}
\varphi(\lambda)=-\lambda t+\frac{\lambda^2V}{2}.
\end{align*}
Since $V>0$, $\varphi$ is a strictly convex quadratic with derivative $\varphi'(\lambda)=-t+\lambda V$, which vanishes only at $\lambda=t/V$. If $t>0$, this point lies in $(0,\infty)$, so it is the minimizer of $\varphi$ there. Substituting this value gives
\begin{align*}
-\frac{t}{V}t+\frac{1}{2}\frac{t^2}{V^2}V=-\frac{t^2}{2V}.
\end{align*}
Thus, for $t>0$,
\begin{align*}
\mathbb P(S_n\ge t)
\le
\exp\left(-\frac{t^2}{2V}\right).
\end{align*}
For $t=0$, the same displayed estimate reads $\mathbb P(S_n\ge0)\le1$, which holds because every probability is at most $1$.
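The minimization of $\varphi$ can be confirmed numerically. The Python sketch below uses sample values $t=3$ and $V=5.3125$ (our choice), searches a grid of $\lambda\in(0,5]$ with step $10^{-3}$, and checks that the grid minimizer is within one step of $t/V$ and that $\varphi(t/V)=-t^2/(2V)$.

```python
import math

def phi(lam, t, V):
    """Chernoff exponent phi(lambda) = -lambda * t + lambda^2 * V / 2."""
    return -lam * t + lam**2 * V / 2

t, V = 3.0, 5.3125                          # sample values with t > 0 and V > 0
grid = [k / 1000 for k in range(1, 5001)]   # lambda in (0, 5], step 0.001
best_lam = min(grid, key=lambda lam: phi(lam, t, V))

assert abs(best_lam - t / V) < 1e-3                       # minimizer is close to t / V
assert math.isclose(phi(t / V, t, V), -t**2 / (2 * V))    # closed-form minimum
```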
[/guided]
[/step]
[step:Derive the lower tail estimate with the same exponential method]
Assume $V>0$. Let $t\ge0$ and let $\lambda>0$. Define the non-negative random variable $W_\lambda:\Omega\to[0,\infty)$ by
\begin{align*}
W_\lambda(\omega)=e^{-\lambda S_n(\omega)}.
\end{align*}
Since $\lambda>0$, the map $s\mapsto e^{-\lambda s}$ is decreasing, so $S_n(\omega)\le -t$ implies $e^{-\lambda S_n(\omega)}\ge e^{\lambda t}$. Hence
\begin{align*}
\{S_n\le -t\}\subseteq\{W_\lambda\ge e^{\lambda t}\}.
\end{align*}
Markov's inequality, applied to $W_\lambda$ with threshold $e^{\lambda t}$ and combined with the inclusion above, gives
\begin{align*}
\mathbb P(S_n\le -t)
\le e^{-\lambda t}\mathbb E[e^{-\lambda S_n}].
\end{align*}
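Applying the moment generating function bound with $-\lambda$ in place of $\lambda$, which is allowed because that bound holds for every $\lambda\in\mathbb R$ and $(-\lambda)^2=\lambda^2$, gives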
\begin{align*}
\mathbb E[e^{-\lambda S_n}]
\le
\exp\left(\frac{\lambda^2V}{2}\right),
\end{align*}
and therefore
\begin{align*}
\mathbb P(S_n\le -t)
\le
\exp\left(-\lambda t+\frac{\lambda^2V}{2}\right).
\end{align*}
If $t>0$, choose $\lambda=t/V$ to obtain
\begin{align*}
\mathbb P(S_n\le -t)
\le \exp\left(-\frac{t^2}{2V}\right).
\end{align*}
If $t=0$, the same bound holds because $\mathbb P(S_n\le0)\le1=\exp(0)$.
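Because the lower tail of a symmetric sum looks the same as its upper tail, the following Python sketch uses asymmetric summands instead (our choice): each $X_i$ equals $2$ with probability $1/3$ and $-1$ with probability $2/3$, so $\mathbb E[X_i]=0$. It assumes the variance proxy $\sigma_i=(b-a)/2=3/2$ supplied by Hoeffding's lemma for a centred variable in $[a,b]=[-1,2]$, checks that hypothesis on a grid, and then compares the exact lower tail of $S_n=3K-n$, with $K\sim\mathrm{Binomial}(n,1/3)$, against $\exp(-t^2/(2V))$.

```python
import math

n = 12                  # hypothetical number of summands
p_high = 1 / 3          # X_i = 2 w.p. 1/3 and X_i = -1 w.p. 2/3, so E[X_i] = 0
sigma2 = (3 / 2) ** 2   # Hoeffding's lemma on [a, b] = [-1, 2]: sigma_i = (b - a) / 2
V = n * sigma2

def mgf(lam):
    """E[exp(lam * X_i)] for one summand."""
    return (1 - p_high) * math.exp(-lam) + p_high * math.exp(2 * lam)

def lower_tail(t):
    """Exact P(S_n <= -t), where S_n = 3K - n and K ~ Binomial(n, 1/3)."""
    return sum(
        math.comb(n, k) * p_high**k * (1 - p_high) ** (n - k)
        for k in range(n + 1)
        if 3 * k - n <= -t
    )

# The sub-Gaussian hypothesis holds for each summand (small slack for rounding).
for k in range(-30, 31):
    lam = k / 10
    assert mgf(lam) <= math.exp(sigma2 * lam**2 / 2) + 1e-12

# The one-sided lower-tail bound exp(-t^2 / (2V)) holds for the exact law.
for t in (0.0, 1.0, 3.0, 6.0, 9.0, 12.0):
    assert lower_tail(t) <= math.exp(-t**2 / (2 * V))
```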
[/step]
[step:Combine the two one-sided estimates and handle the degenerate case]
Assume $V>0$ and let $t\ge0$. Since
\begin{align*}
\{|S_n|\ge t\}
=
\{S_n\ge t\}\cup\{S_n\le -t\},
\end{align*}
the union bound gives
\begin{align*}
\mathbb P(|S_n|\ge t)\le \mathbb P(S_n\ge t)+\mathbb P(S_n\le -t).
\end{align*}
Applying the two one-sided estimates gives
\begin{align*}
\mathbb P(|S_n|\ge t)\le 2\exp\left(-\frac{t^2}{2V}\right).
\end{align*}
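The union bound and the two-sided estimate can be checked on the same hypothetical asymmetric summands as before ($2$ with probability $1/3$, $-1$ with probability $2/3$, variance proxy $\sigma_i^2=9/4$ assumed from Hoeffding's lemma). The Python sketch below builds the exact law of $S_n$ and compares $\mathbb P(|S_n|\ge t)$ with the sum of the one-sided probabilities and with $2\exp(-t^2/(2V))$.

```python
import math

n = 12
p_high = 1 / 3               # hypothetical summands: 2 w.p. 1/3, -1 w.p. 2/3
V = n * (3 / 2) ** 2         # variance proxy assumed from Hoeffding's lemma

# Exact law of S_n = 3K - n with K ~ Binomial(n, 1/3), as (value, probability) pairs.
law = [(3 * k - n, math.comb(n, k) * p_high**k * (1 - p_high) ** (n - k))
       for k in range(n + 1)]

def prob(event):
    """P(event(S_n)) under the exact law."""
    return sum(p for s, p in law if event(s))

for t in (0.5, 2.0, 5.0, 8.0, 11.0):
    two_sided = prob(lambda s: abs(s) >= t)
    union = prob(lambda s: s >= t) + prob(lambda s: s <= -t)
    assert two_sided <= union + 1e-12                   # union bound
    assert two_sided <= 2 * math.exp(-t**2 / (2 * V))   # two-sided estimate
```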
It remains to consider $V=0$. Since each $\sigma_i^2\ge0$ and $\sum_{i=1}^n\sigma_i^2=0$, we have $\sigma_i^2=0$ for every $i\in\{1,\dots,n\}$. The sub-Gaussian hypothesis then gives
\begin{align*}
\mathbb E[e^{\lambda X_i}]\le 1
\end{align*}
for every $\lambda\in\mathbb R$. Markov's inequality applied to $e^{\lambda X_i}$, exactly as in the upper tail estimate, yields, for every $a>0$ and every $\lambda>0$,
\begin{align*}
\mathbb P(X_i\ge a)\le e^{-\lambda a}.
\end{align*}
Letting $\lambda\to\infty$ gives $\mathbb P(X_i\ge a)=0$. Since $\mathbb E[e^{\lambda(-X_i)}]=\mathbb E[e^{(-\lambda)X_i}]\le1$ as well, the same argument applied to $-X_i$ gives $\mathbb P(X_i\le -a)=0$. Because $\{X_i\ne0\}=\bigcup_{m\in\mathbb N}\{|X_i|\ge1/m\}$, countable subadditivity gives $\mathbb P(X_i\ne0)=0$. Thus each $X_i=0$ $\mathbb P$-a.s., so $S_n=0$ $\mathbb P$-a.s. Consequently, for every $t>0$,
\begin{align*}
\mathbb P(|S_n|\ge t)=0.
\end{align*}
This proves the stated degenerate alternative when $V=0$, a case in which the expression $t^2/(2V)$ is undefined. This completes the proof.
[/step]