[proofplan]
We first prove a one-sided Bernstein-type tail estimate directly from the moment-generating-function assumption, using Chernoff's method and independence. The same calculation with a negative exponential parameter gives the lower tail, because the hypothesis holds for both positive and negative $\lambda$. We then check that the stated radius makes each one-sided tail probability at most $\beta/2$, and combine the two estimates by the union bound.
[/proofplan]
[step:Derive the upper tail estimate by optimizing the Chernoff bound]
Define the centered sum $S_n: \Omega \to \mathbb R$ by
\begin{align*}
S_n(\omega) := \sum_{i=1}^n (X_i(\omega)-\mu).
\end{align*}
For $r > 0$ and $\lambda \in (0,1/b)$, the event $\{\bar X_n-\mu \ge r\}$ equals $\{S_n \ge nr\}$, which equals $\{e^{\lambda S_n} \ge e^{\lambda nr}\}$ because $\lambda>0$ and the exponential map is increasing. Hence
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r) = \mathbb P(e^{\lambda S_n} \ge e^{\lambda nr}).
\end{align*}
The [Markov inequality](/theorems/514) applied to the non-negative [random variable](/page/Random%20Variable) $e^{\lambda S_n}$ therefore gives
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r) \le e^{-\lambda nr}\mathbb E[e^{\lambda S_n}].
\end{align*}
Since $X_1,\dots,X_n$ are independent, the centered variables $X_i-\mu$ are independent, and therefore
\begin{align*}
\mathbb E[e^{\lambda S_n}] = \prod_{i=1}^n \mathbb E[e^{\lambda(X_i-\mu)}].
\end{align*}
The moment-generating-function hypothesis applies because $\lambda \in (0,1/b)$, so
\begin{align*}
\mathbb E[e^{\lambda S_n}] \le \prod_{i=1}^n \exp\left(\frac{\nu^2\lambda^2}{2}\right) = \exp\left(\frac{n\nu^2\lambda^2}{2}\right).
\end{align*}
Hence
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-n\lambda r+\frac{n\nu^2\lambda^2}{2}\right).
\end{align*}
If $0<r<\nu^2/b$, choose $\lambda := r/\nu^2 \in (0,1/b)$. Then
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{nr^2}{2\nu^2}\right).
\end{align*}
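If $r \ge \nu^2/b$, the preceding exponential bound holds for every $\lambda \in (0,1/b)$. Its left-hand side does not depend on $\lambda$ and its right-hand side is continuous in $\lambda$, so letting $\lambda \uparrow 1/b$ yields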
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr}{b}+\frac{n\nu^2}{2b^2}\right).
\end{align*}
Because $r \ge \nu^2/b$, we have $\nu^2/b^2 \le r/b$, so the exponent satisfies
\begin{align*}
-\frac{nr}{b}+\frac{n\nu^2}{2b^2} \le -\frac{nr}{2b}.
\end{align*}
Hence
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r) \le \exp\left(-\frac{nr}{2b}\right).
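\end{align*}
In the first case $r^2/\nu^2 < r/b$, and in the second case $r/b \le r^2/\nu^2$, so each bound has the smaller of the two quantities in its exponent. Combining the two cases gives, for every $r>0$,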
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{n}{2}\min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\}\right).
\end{align*}
[guided]
The goal is to convert the exponential-moment hypothesis into a tail bound for the average. The standard method is Chernoff's bound: exponentiate the event, use Markov's inequality, and then optimize the exponential parameter.
Define the centered sum $S_n: \Omega \to \mathbb R$ by
\begin{align*}
S_n(\omega) := \sum_{i=1}^n (X_i(\omega)-\mu).
\end{align*}
Fix $r>0$ and a parameter $\lambda \in (0,1/b)$. The event $\{\bar X_n-\mu \ge r\}$ is exactly the event $\{S_n \ge nr\}$. Since $\lambda>0$, the map $t \mapsto e^{\lambda t}$ is increasing, so this event is also $\{e^{\lambda S_n} \ge e^{\lambda nr}\}$. We obtain
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r) = \mathbb P(e^{\lambda S_n} \ge e^{\lambda nr}).
\end{align*}
The map $\omega \mapsto e^{\lambda S_n(\omega)}$ is a non-negative random variable, so the [Markov inequality](/theorems/514) applies and gives
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r) \le e^{-\lambda nr}\mathbb E[e^{\lambda S_n}].
\end{align*}
Now we estimate the expectation. Since $X_1,\dots,X_n$ are independent, the centered variables $X_1-\mu,\dots,X_n-\mu$ are also independent. Therefore the exponential of the sum factors. First,
\begin{align*}
\mathbb E[e^{\lambda S_n}] = \mathbb E\left[\prod_{i=1}^n e^{\lambda(X_i-\mu)}\right].
\end{align*}
By independence of the centered variables, the expectation of the product is the product of the expectations:
\begin{align*}
\mathbb E[e^{\lambda S_n}] = \prod_{i=1}^n \mathbb E[e^{\lambda(X_i-\mu)}].
\end{align*}
The hypothesis applies to this positive $\lambda$ because $\lambda \in (0,1/b)$. Hence
\begin{align*}
\mathbb E[e^{\lambda S_n}] \le \prod_{i=1}^n \exp\left(\frac{\nu^2\lambda^2}{2}\right) = \exp\left(\frac{n\nu^2\lambda^2}{2}\right).
\end{align*}
Substituting this into the Markov estimate gives
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-n\lambda r+\frac{n\nu^2\lambda^2}{2}\right).
\end{align*}
We now choose $\lambda$ to make the exponent as negative as possible while respecting $\lambda<1/b$. If $0<r<\nu^2/b$, the unconstrained minimizer $\lambda=r/\nu^2$ lies in $(0,1/b)$. Substituting this value of $\lambda$ gives
\begin{align*}
-n\lambda r+\frac{n\nu^2\lambda^2}{2} = -\frac{nr^2}{2\nu^2}.
\end{align*}
Thus
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{nr^2}{2\nu^2}\right).
\end{align*}
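If $r \ge \nu^2/b$, the unconstrained minimizer satisfies $r/\nu^2 \ge 1/b$ and therefore lies outside the admissible interval, so we push $\lambda$ up to the boundary instead. The hypothesis is stated only for $\lambda<1/b$, so we use the bound for each $\lambda \in (0,1/b)$ and then take the limit $\lambda \uparrow 1/b$. This is legitimate because the left-hand side does not depend on $\lambda$ and the right-hand side is continuous in $\lambda$. This gives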
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{nr}{b}+\frac{n\nu^2}{2b^2}\right).
\end{align*}
Because $r \ge \nu^2/b$, we have $\nu^2/b^2 \le r/b$, and hence
\begin{align*}
-\frac{nr}{b}+\frac{n\nu^2}{2b^2} \le -\frac{nr}{2b}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{nr}{2b}\right).
\end{align*}
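In the small-deviation case $r^2/\nu^2 < r/b$, and in the large-deviation case $r/b \le r^2/\nu^2$, so in each case the exponent contains the smaller of the two quantities. Combining the two cases gives the single bound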
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{n}{2}\min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\}\right).
\end{align*}
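As a purely illustrative instance of the two regimes, take the hypothetical values $\nu=1$, $b=1$ and $n=100$ (assumed here only for illustration; they are not part of the theorem), so that the threshold is $\nu^2/b=1$. For $r=\tfrac12<1$ the quadratic term is the smaller one, while for $r=2\ge 1$ the linear term is:
\begin{align*}
r=\tfrac12 &: \quad \min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\} = \min\left\{\tfrac14,\tfrac12\right\} = \tfrac14, \qquad \mathbb P(\bar X_n-\mu \ge \tfrac12) \le \exp\left(-\frac{100}{2}\cdot\frac14\right) = e^{-12.5},\\
r=2 &: \quad \min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\} = \min\{4,2\} = 2, \qquad \mathbb P(\bar X_n-\mu \ge 2) \le \exp\left(-\frac{100}{2}\cdot 2\right) = e^{-100}.
\end{align*}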
[/guided]
[/step]
[step:Apply the same Chernoff argument to the lower tail]
For $r>0$ and $\lambda \in (0,1/b)$, the event $\{\bar X_n-\mu \le -r\}$ equals $\{-S_n \ge nr\}$, which equals $\{e^{-\lambda S_n} \ge e^{\lambda nr}\}$ because the exponential map is increasing. Applying the [Markov inequality](/theorems/514) to the non-negative random variable $e^{-\lambda S_n}$ gives
\begin{align*}
\mathbb P(\bar X_n-\mu \le -r) \le e^{-\lambda nr}\mathbb E[e^{-\lambda S_n}].
\end{align*}
Independence gives
\begin{align*}
\mathbb E[e^{-\lambda S_n}] = \prod_{i=1}^n \mathbb E[e^{-\lambda(X_i-\mu)}].
\end{align*}
The moment-generating-function hypothesis applies with parameter $-\lambda \in (-1/b,0)$, and therefore
\begin{align*}
\mathbb E[e^{-\lambda S_n}] \le \exp\left(\frac{n\nu^2\lambda^2}{2}\right).
\end{align*}
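Substituting this into the Markov estimate bounds the lower-tail probability by $\exp\left(-n\lambda r+\frac{n\nu^2\lambda^2}{2}\right)$, exactly as for the upper tail. The same optimization in $\lambda$ as in the upper-tail estimate therefore yields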
\begin{align*}
\mathbb P(\bar X_n-\mu \le -r)
\le \exp\left(-\frac{n}{2}\min\left\{\frac{r^2}{\nu^2},\frac{r}{b}\right\}\right).
\end{align*}
[/step]
[step:Choose the radius so each one-sided tail has probability at most $\beta/2$]
Fix $\beta \in (0,1)$ and define
\begin{align*}
\rho := \max\left\{\nu\sqrt{\frac{2\log(2/\beta)}{n}},\,\frac{2b\log(2/\beta)}{n}\right\}.
\end{align*}
Since $\rho$ is at least the first term in the maximum,
\begin{align*}
\frac{n\rho^2}{2\nu^2} \ge \log(2/\beta).
\end{align*}
Since $\rho$ is at least the second term in the maximum,
\begin{align*}
\frac{n\rho}{2b} \ge \log(2/\beta).
\end{align*}
Therefore
\begin{align*}
\frac{n}{2}\min\left\{\frac{\rho^2}{\nu^2},\frac{\rho}{b}\right\}
\ge \log(2/\beta).
\end{align*}
Note that $\rho>0$ because $\beta<1$. Applying the upper-tail and lower-tail estimates with $r=\rho$, and using $\exp(-\log(2/\beta))=\beta/2$, gives
\begin{align*}
\mathbb P(\bar X_n-\mu \ge \rho) \le \frac{\beta}{2},
\qquad
\mathbb P(\bar X_n-\mu \le -\rho) \le \frac{\beta}{2}.
\end{align*}
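As a side computation that is not needed for the proof, one can check which term of the maximum determines $\rho$. Write $L := \log(2/\beta)>0$, a shorthand introduced only for this remark, and assume $\nu,b>0$. Both terms are then positive, so squaring preserves the inequality and
\begin{align*}
\nu\sqrt{\frac{2L}{n}} \ge \frac{2bL}{n}
\iff \frac{2\nu^2L}{n} \ge \frac{4b^2L^2}{n^2}
\iff n \ge \frac{2b^2L}{\nu^2}.
\end{align*}
Thus the square-root term is the active one once $n$ is large enough. For example, with the hypothetical values $\nu=b=1$, $n=100$ and $\beta=0.05$ we have $L=\log 40\approx 3.689$, so
\begin{align*}
\nu\sqrt{\frac{2L}{n}} \approx 0.272,
\qquad
\frac{2bL}{n} \approx 0.074,
\qquad
\rho \approx 0.272.
\end{align*}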
[/step]
[step:Combine the two one-sided estimates by the union bound]
The failure event of the desired confidence statement is
\begin{align*}
\{|\bar X_n-\mu|>\rho\}
= \{\bar X_n-\mu>\rho\}\cup \{\bar X_n-\mu<-\rho\}.
\end{align*}
Using monotonicity of probability to replace strict inequalities by non-strict tail events and then applying the [union bound](/theorems/6078),
\begin{align*}
\mathbb P(|\bar X_n-\mu|>\rho) \le \mathbb P(\bar X_n-\mu\ge \rho)+\mathbb P(\bar X_n-\mu\le -\rho).
\end{align*}
Using the two one-sided estimates gives
\begin{align*}
\mathbb P(|\bar X_n-\mu|>\rho) \le \frac{\beta}{2}+\frac{\beta}{2} = \beta.
\end{align*}
Taking complements gives
\begin{align*}
\mathbb P(|\bar X_n-\mu|\le \rho) \ge 1-\beta.
\end{align*}
After substituting the definition of $\rho$, this reads
\begin{align*}
\mathbb P\left(|\bar X_n-\mu| \le \max\left\{\nu\sqrt{\frac{2\log(2/\beta)}{n}},\,\frac{2b\log(2/\beta)}{n}\right\}\right) \ge 1-\beta,
\end{align*}
which proves the theorem.
[/step]