[proofplan]
We first define the centered sum $S_n$ and show that its moment generating function has variance proxy $n\sigma^2$, using independence to factor the expectation. We then apply the exponential Markov argument and optimize over the exponential parameter to obtain one-sided upper and lower tail estimates for $\bar X_n-\mu$. Finally, we combine the two one-sided estimates by subadditivity of probability and choose the radius that makes the total failure probability equal to $\beta$.
[/proofplan]
[step:Factor the moment generating function of the centered sum]
Define the centered sum [random variable](/page/Random%20Variable) $S_n: \Omega \to \mathbb R$ by
\begin{align*}
S_n(\omega) := \sum_{i=1}^n (X_i(\omega)-\mu).
\end{align*}
For every $\lambda \in \mathbb R$, independence of $X_1,\dots,X_n$ gives independence of the random variables $e^{\lambda(X_1-\mu)},\dots,e^{\lambda(X_n-\mu)}$. Hence
First, the exponential of the sum factors pointwise:
\begin{align*}
e^{\lambda S_n}=\prod_{i=1}^n e^{\lambda(X_i-\mu)}.
\end{align*}
Independence gives
\begin{align*}
\mathbb E\left[e^{\lambda S_n}\right]=\prod_{i=1}^n \mathbb E\left[e^{\lambda(X_i-\mu)}\right].
\end{align*}
Applying the sub-Gaussian hypothesis to each factor gives
\begin{align*}
\mathbb E\left[e^{\lambda S_n}\right]\le \prod_{i=1}^n \exp\left(\frac{\sigma^2\lambda^2}{2}\right)=\exp\left(\frac{n\sigma^2\lambda^2}{2}\right).
\end{align*}
Thus $S_n$ is sub-Gaussian with variance proxy $n\sigma^2$ in the moment-generating-function sense.
[/step]
[step:Optimize the exponential Markov bound for the upper tail]
Let $r>0$ be fixed. Since
\begin{align*}
\bar X_n-\mu = \frac{S_n}{n},
\end{align*}
the event $\{\bar X_n-\mu \ge r\}$ equals $\{S_n \ge nr\}$. For every $\lambda>0$, the random variable $e^{\lambda S_n}: \Omega \to [0,\infty)$ is non-negative, and on the event $\{S_n \ge nr\}$ it satisfies $e^{\lambda S_n}\ge e^{\lambda nr}$. Therefore
\begin{align*}
e^{\lambda nr}\,\mathbb P(S_n \ge nr)\le \mathbb E\left[e^{\lambda S_n}\right].
\end{align*}
Using the moment-generating-function bound for $S_n$ gives
\begin{align*}
e^{\lambda nr}\,\mathbb P(S_n \ge nr)\le \exp\left(\frac{n\sigma^2\lambda^2}{2}\right).
\end{align*}
Dividing by $e^{\lambda nr}>0$ gives
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)\le \exp\left(\frac{n\sigma^2\lambda^2}{2}-\lambda nr\right).
\end{align*}
Choose
\begin{align*}
\lambda_* := \frac{r}{\sigma^2}.
\end{align*}
Since $r>0$ and $\sigma>0$, we have $\lambda_*>0$. Substituting $\lambda_*$ yields
\begin{align*}
\frac{n\sigma^2\lambda_*^2}{2}-\lambda_*nr=\frac{n\sigma^2}{2}\frac{r^2}{\sigma^4}-n\frac{r}{\sigma^2}r=-\frac{nr^2}{2\sigma^2}.
\end{align*}
Consequently,
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{nr^2}{2\sigma^2}\right).
\end{align*}
[guided]
We want to convert the moment-generating-function estimate into a probability estimate. The reason for introducing an exponential is that the event $\{\bar X_n-\mu \ge r\}$ is the same as $\{S_n \ge nr\}$, and exponentials turn this threshold event into an inequality between non-negative random variables.
Fix $r>0$ and choose an arbitrary parameter $\lambda>0$. Since
\begin{align*}
\bar X_n-\mu = \frac{1}{n}\sum_{i=1}^n(X_i-\mu)=\frac{S_n}{n},
\end{align*}
we have
\begin{align*}
\{\bar X_n-\mu \ge r\}=\{S_n\ge nr\}.
\end{align*}
On this event, monotonicity of the exponential function gives
\begin{align*}
e^{\lambda S_n}\ge e^{\lambda nr}.
\end{align*}
Let $\mathbb{1}_{\{S_n\ge nr\}}:\Omega\to\{0,1\}$ denote the indicator function of the event $\{S_n\ge nr\}$. Equivalently,
\begin{align*}
e^{\lambda nr}\mathbb{1}_{\{S_n\ge nr\}} \le e^{\lambda S_n}.
\end{align*}
Taking expectations with respect to $\mathbb P$ gives
\begin{align*}
e^{\lambda nr}\mathbb P(S_n\ge nr)= \mathbb E\left[e^{\lambda nr}\mathbb{1}_{\{S_n\ge nr\}}\right]\le \mathbb E\left[e^{\lambda S_n}\right].
\end{align*}
The moment-generating-function estimate from the previous step applies to $S_n$, so
\begin{align*}
e^{\lambda nr}\mathbb P(S_n\ge nr)
\le \exp\left(\frac{n\sigma^2\lambda^2}{2}\right).
\end{align*}
Dividing by $e^{\lambda nr}>0$ gives
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(\frac{n\sigma^2\lambda^2}{2}-\lambda nr\right).
\end{align*}
Now we choose $\lambda$ to make the exponent as small as possible. The exponent is the quadratic function
\begin{align*}
q:(0,\infty)\to\mathbb R,\qquad q(\lambda)=\frac{n\sigma^2\lambda^2}{2}-\lambda nr.
\end{align*}
Its derivative is $q'(\lambda)=n\sigma^2\lambda-nr$, so the critical point is
\begin{align*}
\lambda_*=\frac{r}{\sigma^2}.
\end{align*}
Because $r>0$ and $\sigma>0$, this value lies in the allowed range $(0,\infty)$. Substitution gives
\begin{align*}
q(\lambda_*)=\frac{n\sigma^2}{2}\frac{r^2}{\sigma^4}-n\frac{r}{\sigma^2}r=\frac{nr^2}{2\sigma^2}-\frac{nr^2}{\sigma^2}=-\frac{nr^2}{2\sigma^2}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(\bar X_n-\mu \ge r)
\le \exp\left(-\frac{nr^2}{2\sigma^2}\right).
\end{align*}
[/guided]
[/step]
[step:Apply the same argument to the lower tail]
For each $i \in \{1,\dots,n\}$ and every $\lambda \in \mathbb R$,
\begin{align*}
\mathbb E\left[e^{\lambda(\mu-X_i)}\right]
= \mathbb E\left[e^{(-\lambda)(X_i-\mu)}\right]
\le \exp\left(\frac{\sigma^2\lambda^2}{2}\right).
\end{align*}
Thus the centered variables $\mu-X_i$ satisfy the same sub-Gaussian moment-generating-function bound. Applying the previous upper-tail estimate to the independent random variables $\mu-X_1,\dots,\mu-X_n$ gives, for every $r>0$,
\begin{align*}
\mathbb P(\mu-\bar X_n \ge r)
\le \exp\left(-\frac{nr^2}{2\sigma^2}\right).
\end{align*}
Equivalently,
\begin{align*}
\mathbb P(\bar X_n-\mu \le -r)
\le \exp\left(-\frac{nr^2}{2\sigma^2}\right).
\end{align*}
[/step]
[step:Combine the two tails and choose the confidence radius]
For $r>0$, define the upper-tail event $A_+(r)\in\mathcal F$ and lower-tail event $A_-(r)\in\mathcal F$ by
\begin{align*}
A_+(r) := \{\omega\in\Omega:\bar X_n(\omega)-\mu \ge r\}.
\end{align*}
Also define
\begin{align*}
A_-(r) := \{\omega\in\Omega:\bar X_n(\omega)-\mu \le -r\}.
\end{align*}
The failure event for the two-sided bound is
\begin{align*}
\{\omega\in\Omega:|\bar X_n(\omega)-\mu|>r\}\subset A_+(r)\cup A_-(r).
\end{align*}
By subadditivity of the probability measure $\mathbb P$ and the two one-sided estimates,
\begin{align*}
\mathbb P(|\bar X_n-\mu|>r)\le \mathbb P(A_+(r))+\mathbb P(A_-(r)).
\end{align*}
Applying the upper-tail and lower-tail estimates gives
\begin{align*}
\mathbb P(|\bar X_n-\mu|>r)\le 2\exp\left(-\frac{nr^2}{2\sigma^2}\right).
\end{align*}
Now fix $\beta\in(0,1)$ and define
\begin{align*}
r_\beta := \sigma\sqrt{\frac{2\log(2/\beta)}{n}}.
\end{align*}
Since $\beta\in(0,1)$, we have $2/\beta>1$, hence $\log(2/\beta)>0$ and $r_\beta>0$. Substituting $r=r_\beta$ gives
\begin{align*}
2\exp\left(-\frac{nr_\beta^2}{2\sigma^2}\right)=2\exp\left(-\frac{n}{2\sigma^2}\sigma^2\frac{2\log(2/\beta)}{n}\right).
\end{align*}
The exponent simplifies to $-\log(2/\beta)$, and therefore
\begin{align*}
2\exp\left(-\frac{nr_\beta^2}{2\sigma^2}\right)=2\exp\left(-\log(2/\beta)\right)=2\cdot\frac{\beta}{2}=\beta.
\end{align*}
Therefore
\begin{align*}
\mathbb P(|\bar X_n-\mu|>r_\beta)\le \beta.
\end{align*}
Taking complements in $\Omega$ yields
\begin{align*}
\mathbb P\left(|\bar X_n-\mu|\le r_\beta\right)\ge 1-\beta,
\end{align*}
which is exactly
\begin{align*}
\mathbb P\left(\left|\bar X_n-\mu\right| \le \sigma\sqrt{\frac{2\log(2/\beta)}{n}}\right)\ge 1-\beta.
\end{align*}
This proves the theorem.
[/step]