[proofplan]
We first combine the individual moment-generating-function bounds using independence, obtaining one sub-exponential MGF bound for the sum. We then prove the one-sided tail bound by the exponential Markov argument and choose the exponential parameter in two regimes: a quadratic regime for $0\le t<\nu^2/b$ and a linear regime for $t\ge\nu^2/b$. Finally, we apply the same one-sided estimate to the variables $-X_i$ and use the elementary union bound to obtain the two-sided estimate.
[/proofplan]
[step:Combine the individual moment-generating-function bounds for the sum]
Define the sum [random variable](/page/Random%20Variable) $S:(\Omega,\mathcal{F}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ by
\begin{align*}
S(\omega)=\sum_{i=1}^n X_i(\omega).
\end{align*}
Let $\lambda \in [0,1/b)$. Since $b=\max_{1\le i\le n} b_i$, we have $\lambda<1/b_i$ for every $1\le i\le n$. By independence of $X_1,\dots,X_n$, the random variables $e^{\lambda X_1},\dots,e^{\lambda X_n}$ are independent, and hence
\begin{align*}
\mathbb{E}\left[e^{\lambda S}\right]=\mathbb{E}\left[\prod_{i=1}^n e^{\lambda X_i}\right]=\prod_{i=1}^n \mathbb{E}\left[e^{\lambda X_i}\right]\le \prod_{i=1}^n \exp\left(\frac{\nu_i^2\lambda^2}{2}\right)=\exp\left(\frac{\lambda^2}{2}\sum_{i=1}^n \nu_i^2\right)=\exp\left(\frac{\nu^2\lambda^2}{2}\right).
\end{align*}
Thus
\begin{align*}
\mathbb{E}\left[e^{\lambda S}\right]\le \exp\left(\frac{\nu^2\lambda^2}{2}\right)
\end{align*}
for every $\lambda\in[0,1/b)$.
[guided]
The goal of this step is to pass from bounds for the separate variables $X_i$ to a bound for their sum. Define the measurable map $S:(\Omega,\mathcal{F}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ by
\begin{align*}
S(\omega)=\sum_{i=1}^n X_i(\omega).
\end{align*}
Fix $\lambda\in[0,1/b)$. Because $b=\max_{1\le i\le n} b_i$, we have $b_i\le b$ for every $i$, and therefore $1/b\le 1/b_i$. Hence $\lambda<1/b_i$ for every $1\le i\le n$, so each individual MGF hypothesis is available at this same value of $\lambda$.
Since $X_1,\dots,X_n$ are independent, the [measurable functions](/page/Measurable%20Functions) $e^{\lambda X_1},\dots,e^{\lambda X_n}$ are also independent. Therefore the expectation of their product factors:
\begin{align*}
\mathbb{E}\left[e^{\lambda S}\right]=\mathbb{E}\left[e^{\lambda\sum_{i=1}^n X_i}\right]=\mathbb{E}\left[\prod_{i=1}^n e^{\lambda X_i}\right]=\prod_{i=1}^n \mathbb{E}\left[e^{\lambda X_i}\right].
\end{align*}
Now apply the assumed MGF bound to each factor:
\begin{align*}
\prod_{i=1}^n \mathbb{E}\left[e^{\lambda X_i}\right]\le \prod_{i=1}^n \exp\left(\frac{\nu_i^2\lambda^2}{2}\right)=\exp\left(\sum_{i=1}^n \frac{\nu_i^2\lambda^2}{2}\right)=\exp\left(\frac{\lambda^2}{2}\sum_{i=1}^n \nu_i^2\right)=\exp\left(\frac{\nu^2\lambda^2}{2}\right).
\end{align*}
Thus the sum $S$ obeys the aggregate MGF estimate
\begin{align*}
\mathbb{E}\left[e^{\lambda S}\right]\le \exp\left(\frac{\nu^2\lambda^2}{2}\right)
\end{align*}
for every $\lambda\in[0,1/b)$. This is the only point where independence is used.
[/guided]
[/step]
[step:Convert the aggregate MGF bound into a Chernoff estimate]
Let $t\ge0$ and let $\lambda\in[0,1/b)$. Since $\lambda\ge0$, on the event $\{S\ge t\}$ we have $e^{\lambda S}\ge e^{\lambda t}$, while off this event the left-hand side below is $0$ and $e^{\lambda S}>0$. (For $\lambda=0$ the resulting estimate is only the trivial bound $\mathbb{P}(S\ge t)\le1$.) Therefore
\begin{align*}
e^{\lambda t}\mathbb{1}_{\{S\ge t\}} \le e^{\lambda S}.
\end{align*}
Taking expectations and using the aggregate MGF bound gives
\begin{align*}
e^{\lambda t}\mathbb{P}(S\ge t)=\mathbb{E}\left[e^{\lambda t}\mathbb{1}_{\{S\ge t\}}\right]\le \mathbb{E}\left[e^{\lambda S}\right]\le \exp\left(\frac{\nu^2\lambda^2}{2}\right).
\end{align*}
Dividing by $e^{\lambda t}>0$, we obtain
\begin{align*}
\mathbb{P}(S\ge t)
\le \exp\left(-\lambda t+\frac{\nu^2\lambda^2}{2}\right)
\end{align*}
for every $\lambda\in[0,1/b)$.
[/step]
[step:Choose the optimizing parameter in the quadratic regime]
Assume $0\le t<\nu^2/b$; in particular $\nu^2>0$. Define
\begin{align*}
\lambda_t:=\frac{t}{\nu^2}.
\end{align*}
Dividing $0\le t<\nu^2/b$ by $\nu^2>0$ gives $0\le\lambda_t<1/b$, so the Chernoff estimate applies with $\lambda=\lambda_t$. Substituting this value gives
\begin{align*}
\mathbb{P}(S\ge t)\le \exp\left(-\frac{t^2}{\nu^2}+\frac{\nu^2}{2}\frac{t^2}{\nu^4}\right)=\exp\left(-\frac{t^2}{2\nu^2}\right)\le \exp\left(-\frac{1}{4}\frac{t^2}{\nu^2}\right).
\end{align*}
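As a check on the choice of $\lambda_t$, completing the square in the Chernoff exponent suggests that $\lambda_t$ is its unconstrained minimizer over $\lambda\in\mathbb{R}$. This computation is an added illustration and uses only $\nu^2>0$, which holds whenever the regime $0\le t<\nu^2/b$ is nonempty:
\begin{align*}
-\lambda t+\frac{\nu^2\lambda^2}{2}
=\frac{\nu^2}{2}\left(\lambda-\frac{t}{\nu^2}\right)^2-\frac{t^2}{2\nu^2}
\ge -\frac{t^2}{2\nu^2},
\end{align*}
with equality exactly when $\lambda=t/\nu^2$.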
[/step]
[step:Choose a fixed admissible parameter in the linear regime]
Assume $t\ge\nu^2/b$. Define
\begin{align*}
\lambda_*:=\frac{1}{2b}.
\end{align*}
Since $b>0$, we have $\lambda_*\in[0,1/b)$, so the Chernoff estimate applies with $\lambda=\lambda_*$. We obtain
\begin{align*}
\mathbb{P}(S\ge t)\le \exp\left(-\frac{t}{2b}+\frac{\nu^2}{8b^2}\right).
\end{align*}
The assumption $t\ge\nu^2/b$ implies
\begin{align*}
\frac{\nu^2}{8b^2}\le \frac{t}{8b}.
\end{align*}
Therefore
\begin{align*}
-\frac{t}{2b}+\frac{\nu^2}{8b^2}
\le -\frac{t}{2b}+\frac{t}{8b}
= -\frac{3t}{8b}
\le -\frac{t}{4b},
\end{align*}
and hence
\begin{align*}
\mathbb{P}(S\ge t)\le \exp\left(-\frac{t}{4b}\right).
\end{align*}
[guided]
Assume $t\ge\nu^2/b$. In this regime the unconstrained minimizer $\lambda=t/\nu^2$ of the Chernoff exponent satisfies $t/\nu^2\ge1/b$, so it lies outside the permitted range $\lambda<1/b$. The standard remedy is to use a fixed value safely inside the allowed interval. Define
\begin{align*}
\lambda_*:=\frac{1}{2b}.
\end{align*}
Because $b>0$, this number is non-negative and satisfies $\lambda_*<1/b$, so it is an admissible Chernoff parameter.
Applying the Chernoff estimate with this choice gives
\begin{align*}
\mathbb{P}(S\ge t)\le \exp\left(-\lambda_* t+\frac{\nu^2\lambda_*^2}{2}\right)=\exp\left(-\frac{t}{2b}+\frac{\nu^2}{8b^2}\right).
\end{align*}
Now the large-deviation assumption $t\ge\nu^2/b$ is used to compare the variance term with the linear term:
\begin{align*}
t\ge \frac{\nu^2}{b}
\quad\Longrightarrow\quad
\frac{t}{8b}\ge \frac{\nu^2}{8b^2}.
\end{align*}
Substituting this upper bound for $\nu^2/(8b^2)$ into the exponent yields
\begin{align*}
-\frac{t}{2b}+\frac{\nu^2}{8b^2}
\le -\frac{t}{2b}+\frac{t}{8b}
= -\frac{3t}{8b}
\le -\frac{t}{4b}.
\end{align*}
Exponentiating preserves the inequality because the exponential function is increasing, so
\begin{align*}
\mathbb{P}(S\ge t)\le \exp\left(-\frac{t}{4b}\right).
\end{align*}
This is the linear tail regime: once $t$ reaches the scale $\nu^2/b$, the constraint $\lambda<1/b$ keeps the Chernoff parameter from growing with $t$, so the exponent decays only linearly in $t$ rather than quadratically.
[/guided]
[/step]
[step:Combine the two regimes into the one-sided Bernstein bound]
Let $t\ge0$. If $0\le t<\nu^2/b$, then
\begin{align*}
\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\le \frac{t^2}{\nu^2},
\end{align*}
and the quadratic-regime estimate gives
\begin{align*}
\mathbb{P}(S\ge t)
\le \exp\left(-\frac{1}{4}\frac{t^2}{\nu^2}\right)
\le \exp\left(-\frac14\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\right).
\end{align*}
If $t\ge\nu^2/b$, then
\begin{align*}
\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\le \frac{t}{b},
\end{align*}
and the linear-regime estimate gives
\begin{align*}
\mathbb{P}(S\ge t)
\le \exp\left(-\frac{1}{4}\frac{t}{b}\right)
\le \exp\left(-\frac14\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\right).
\end{align*}
Thus, for every $t\ge0$,
\begin{align*}
\mathbb{P}\left(\sum_{i=1}^n X_i\ge t\right)
\le \exp\left(-\frac14\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\right).
\end{align*}
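The case split can also be read off directly from the minimum. As an added remark, assuming $t>0$, $b>0$, and $\nu^2>0$, multiplying both sides by $\nu^2/t>0$ gives
\begin{align*}
\frac{t^2}{\nu^2}\le\frac{t}{b}
\quad\Longleftrightarrow\quad
t\le\frac{\nu^2}{b},
\end{align*}
so the minimum equals $t^2/\nu^2$ in the quadratic regime and $t/b$ in the linear regime. The two expressions agree at the crossover $t=\nu^2/b$, where both equal $\nu^2/b^2$.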
[/step]
[step:Apply the one-sided bound to the negated variables and use the union bound]
For each $1\le i\le n$, define the measurable map $Y_i:(\Omega,\mathcal{F}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ by
\begin{align*}
Y_i(\omega)=-X_i(\omega).
\end{align*}
The random variables $Y_1,\dots,Y_n$ are independent and satisfy $\mathbb{E}[Y_i]=0$. Moreover, for every $\lambda\in\mathbb{R}$ with $|\lambda|<1/b_i$,
\begin{align*}
\mathbb{E}\left[e^{\lambda Y_i}\right]
= \mathbb{E}\left[e^{-\lambda X_i}\right]
\le \exp\left(\frac{\nu_i^2\lambda^2}{2}\right),
\end{align*}
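because $|-\lambda|=|\lambda|<1/b_i$ and $(-\lambda)^2=\lambda^2$, so the MGF hypothesis for $X_i$ applies at $-\lambda$. Thus $Y_1,\dots,Y_n$ satisfy the same hypotheses as $X_1,\dots,X_n$ with the same parameters $\nu_i$ and $b_i$, and the one-sided estimate already proved gives, for every $t\ge0$,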
\begin{align*}
\mathbb{P}\left(-S\ge t\right)
=
\mathbb{P}\left(\sum_{i=1}^n Y_i\ge t\right)
\le \exp\left(-\frac14\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\right).
\end{align*}
For the events
\begin{align*}
A_t:=\{S\ge t\}, \qquad B_t:=\{-S\ge t\},
\end{align*}
we have $\{|S|\ge t\}=A_t\cup B_t$ for $t\ge0$. Since $\mathbb{1}_{A_t\cup B_t}\le \mathbb{1}_{A_t}+\mathbb{1}_{B_t}$, taking expectations yields
\begin{align*}
\mathbb{P}(|S|\ge t)\le \mathbb{P}(A_t)+\mathbb{P}(B_t).
\end{align*}
Combining the two one-sided bounds gives
\begin{align*}
\mathbb{P}\left(\left|\sum_{i=1}^n X_i\right|\ge t\right)
\le 2\exp\left(-\frac14\min\left\{\frac{t^2}{\nu^2},\frac{t}{b}\right\}\right).
\end{align*}
This proves the two-sided Bernstein inequality.
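To illustrate how the bound is typically read, consider the following special case, which is an added example and not part of the statement. Suppose $\nu_i=\sigma$ and $b_i=b$ for every $1\le i\le n$, where $\sigma>0$ is a hypothetical common parameter. Then $\nu^2=\sum_{i=1}^n\nu_i^2=n\sigma^2$ and $\max_{1\le i\le n}b_i=b$. Taking $t=n\varepsilon$ with $\varepsilon\ge0$ gives
\begin{align*}
\mathbb{P}\left(\left|\frac1n\sum_{i=1}^n X_i\right|\ge\varepsilon\right)
\le 2\exp\left(-\frac14\min\left\{\frac{n^2\varepsilon^2}{n\sigma^2},\frac{n\varepsilon}{b}\right\}\right)
=2\exp\left(-\frac{n}{4}\min\left\{\frac{\varepsilon^2}{\sigma^2},\frac{\varepsilon}{b}\right\}\right).
\end{align*}
Under these assumptions the sample mean therefore concentrates at a rate exponential in $n$, with a Gaussian-type exponent for $\varepsilon\le\sigma^2/b$ and an exponential-type exponent for larger $\varepsilon$.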
[/step]