[proofplan]
We derive the tail bound via the Chernoff method: for any $\alpha > 0$, Markov's inequality applied to $e^{\alpha(W - \mathbb{E}[W])}$ gives $\mathbb{P}(W - \mathbb{E}[W] \geq t) \leq e^{-\alpha t}\,\mathbb{E}[e^{\alpha(W - \mathbb{E}[W])}]$. The sub-Gaussian condition bounds the moment generating function $\mathbb{E}[e^{\alpha(W - \mathbb{E}[W])}]$ by $e^{\alpha^2 \sigma^2/2}$. Optimising the resulting bound over $\alpha > 0$ yields the tightest exponential decay, which occurs at $\alpha^* = t/\sigma^2$.
[/proofplan]
[step:Apply the Chernoff bound to the centred variable $W - \mathbb{E}[W]$]
Let $\bar{W} := W - \mathbb{E}[W]$ denote the centred version of $W$. Fix $t \geq 0$ and let $\alpha > 0$ be an arbitrary positive parameter (to be optimised later). Since $x \mapsto e^{\alpha x}$ is strictly increasing, the events $\{\bar{W} \geq t\}$ and $\{e^{\alpha \bar{W}} \geq e^{\alpha t}\}$ coincide. Applying Markov's inequality to the non-negative random variable $e^{\alpha \bar{W}}$ at the level $e^{\alpha t} > 0$ gives
\begin{align*}
\mathbb{P}(\bar{W} \geq t) = \mathbb{P}(e^{\alpha \bar{W}} \geq e^{\alpha t}) \leq \frac{\mathbb{E}[e^{\alpha \bar{W}}]}{e^{\alpha t}} = e^{-\alpha t}\,\mathbb{E}[e^{\alpha \bar{W}}].
\end{align*}
Markov's inequality requires only that $e^{\alpha \bar{W}}$ be non-negative, which always holds. The bound is non-trivial because $\mathbb{E}[e^{\alpha \bar{W}}] \leq e^{\alpha^2 \sigma^2 / 2} < \infty$ by the sub-Gaussian condition, which is invoked in the next step.
[guided]
This is the Chernoff bounding technique: instead of bounding the probability of $\{\bar{W} \geq t\}$ directly, we exponentiate both sides with a free parameter $\alpha > 0$ and apply Markov's inequality to the resulting non-negative random variable.
Why introduce the exponential? Markov's inequality $\mathbb{P}(Z \geq a) \leq \mathbb{E}[Z]/a$ for $Z \geq 0$ and $a > 0$ is a weak bound in general, but when applied to $Z = e^{\alpha \bar{W}}$, it converts information about the moment generating function (MGF) into a tail bound. The sub-Gaussian condition gives us precise control over the MGF, so the Chernoff method is the natural approach.
Applying Markov's inequality:
\begin{align*}
\mathbb{P}(\bar{W} \geq t) = \mathbb{P}(e^{\alpha \bar{W}} \geq e^{\alpha t}) \leq \frac{\mathbb{E}[e^{\alpha \bar{W}}]}{e^{\alpha t}} = e^{-\alpha t}\,\mathbb{E}[e^{\alpha \bar{W}}].
\end{align*}
This holds for every $\alpha > 0$, so we are free to choose $\alpha$ to make the bound as tight as possible. The optimal $\alpha$ will depend on the structure of $\mathbb{E}[e^{\alpha \bar{W}}]$, which is controlled by the sub-Gaussian hypothesis.
[/guided]
[/step]
[step:Bound the moment generating function using the sub-Gaussian condition]
By assumption, $W$ is sub-Gaussian with parameter $\sigma > 0$, which means
\begin{align*}
\mathbb{E}[e^{\alpha(W - \mathbb{E}[W])}] \leq e^{\alpha^2 \sigma^2 / 2} \quad \text{for all } \alpha \in \mathbb{R}.
\end{align*}
Substituting into the Chernoff bound from the previous step:
\begin{align*}
\mathbb{P}(W - \mathbb{E}[W] \geq t) \leq e^{-\alpha t} \cdot e^{\alpha^2 \sigma^2/2} = \exp\!\left(-\alpha t + \frac{\alpha^2 \sigma^2}{2}\right).
\end{align*}
This bound holds for every $\alpha > 0$.
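As a concrete instance of this hypothesis (an illustration added here, not part of the statement being proved), suppose $W$ is a Rademacher variable, taking the values $\pm 1$ with probability $1/2$ each, so that $\mathbb{E}[W] = 0$. Comparing Taylor coefficients via $(2k)! \geq 2^k\, k!$ gives
\begin{align*}
\mathbb{E}[e^{\alpha W}] = \cosh \alpha = \sum_{k \geq 0} \frac{\alpha^{2k}}{(2k)!} \leq \sum_{k \geq 0} \frac{(\alpha^2/2)^k}{k!} = e^{\alpha^2/2} \quad \text{for all } \alpha \in \mathbb{R},
\end{align*}
so such a $W$ satisfies the condition with $\sigma = 1$, and the bound above reads $\mathbb{P}(W \geq t) \leq \exp(-\alpha t + \alpha^2/2)$ for every $\alpha > 0$.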
[/step]
[step:Optimise over $\alpha > 0$ to obtain the tightest bound]
Define
\begin{align*}
\varphi : (0, \infty) &\to \mathbb{R} \\
\alpha &\mapsto -\alpha t + \frac{\alpha^2 \sigma^2}{2}.
\end{align*}
To minimise $\varphi$, compute its derivative: $\varphi'(\alpha) = -t + \alpha \sigma^2$. Setting $\varphi'(\alpha) = 0$ gives $\alpha^* = t / \sigma^2$. Since $\varphi''(\alpha) = \sigma^2 > 0$, the function $\varphi$ is strictly convex, so for $t > 0$ the point $\alpha^* \in (0, \infty)$ is its unique global minimiser. When $t = 0$, the critical point $\alpha^* = 0$ lies outside $(0, \infty)$ and the infimum $0$ of $\varphi$ is only approached as $\alpha \downarrow 0$; the resulting bound $\mathbb{P}(\bar{W} \geq 0) \leq e^0 = 1$ is trivially true.
Evaluating at $\alpha^* = t/\sigma^2$:
\begin{align*}
\varphi(\alpha^*) = -\frac{t}{\sigma^2} \cdot t + \frac{1}{2} \cdot \frac{t^2}{\sigma^4} \cdot \sigma^2 = -\frac{t^2}{\sigma^2} + \frac{t^2}{2\sigma^2} = -\frac{t^2}{2\sigma^2}.
\end{align*}
Therefore
\begin{align*}
\mathbb{P}(W - \mathbb{E}[W] \geq t) \leq \exp\!\left(-\frac{t^2}{2\sigma^2}\right),
\end{align*}
which is the stated sub-Gaussian tail bound.
[guided]
The exponent $\varphi(\alpha) = -\alpha t + \alpha^2 \sigma^2/2$ is a quadratic in $\alpha$ (opening upward since the leading coefficient $\sigma^2/2 > 0$). Its minimum is at $\alpha^* = t/\sigma^2$, found by setting the first derivative equal to zero.
Why is this the right $\alpha$? The Chernoff bound gives $\mathbb{P}(\bar{W} \geq t) \leq e^{\varphi(\alpha)}$ for all $\alpha > 0$, so the tightest bound is $\inf_{\alpha > 0} e^{\varphi(\alpha)} = e^{\inf_{\alpha > 0} \varphi(\alpha)}$. Because $\varphi$ is strictly convex and, for $t > 0$, its critical point $\alpha^* = t/\sigma^2$ lies in $(0, \infty)$, the infimum is attained there and nowhere else.
Substituting $\alpha^* = t/\sigma^2$:
\begin{align*}
\varphi\!\left(\frac{t}{\sigma^2}\right) = -\frac{t^2}{\sigma^2} + \frac{t^2}{2\sigma^2} = -\frac{t^2}{2\sigma^2}.
\end{align*}
This yields the Gaussian-type tail bound $e^{-t^2/(2\sigma^2)}$. The exponent matches that of a $\mathcal{N}(0, \sigma^2)$ random variable: its moment generating function equals $e^{\alpha^2 \sigma^2/2}$ exactly, so the same argument produces the same bound for it. This is the origin of the name "sub-Gaussian": on the exponential scale, the tails decay at least as fast as those of a Gaussian with variance $\sigma^2$.
For the boundary case $t = 0$: the bound gives $\mathbb{P}(\bar{W} \geq 0) \leq e^0 = 1$, which is trivially true and consistent with the general statement for all $t \geq 0$.
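To get a feel for how much the optimisation gains, consider the purely illustrative values $\sigma = 1$ and $t = 2$ (chosen here for convenience, not taken from the statement). Then $\alpha^* = t/\sigma^2 = 2$, and evaluating $e^{\varphi(\alpha)}$ at $\alpha^*$ and at one unit on either side gives
\begin{align*}
e^{\varphi(1)} = e^{-3/2} \approx 0.223, \qquad e^{\varphi(2)} = e^{-2} \approx 0.135, \qquad e^{\varphi(3)} = e^{-3/2} \approx 0.223.
\end{align*}
Moving one unit away from $\alpha^*$ in either direction therefore weakens the bound by the factor $e^{1/2} \approx 1.65$, reflecting the symmetry of the quadratic $\varphi$ about its minimiser.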
[/guided]
[/step]