[proofplan]
We use the exponential moment method. For any $\alpha > 0$, Jensen's inequality applied to the convex function $x \mapsto e^{\alpha x}$ lifts the expected maximum into a log-moment generating function. Bounding the maximum of exponentials by a sum, then applying the sub-Gaussian MGF bound to each term, yields an upper bound with a free parameter $\alpha$. Optimising over $\alpha$ produces the $\sigma\sqrt{2\log d}$ bound.
[/proofplan]
[step:Lift the expected maximum into a logarithm of an exponential moment via Jensen's inequality]
Let $\alpha > 0$ be a free parameter to be chosen later. The function $x \mapsto e^{\alpha x}$ is convex on $\mathbb{R}$. By [Jensen's Inequality](/theorems/1977), for any real-valued random variable $Z$ with $\mathbb{E}[|Z|] < \infty$, we have $e^{\alpha \mathbb{E}[Z]} \leq \mathbb{E}[e^{\alpha Z}]$.
We apply this with $Z = \max_{j=1,\ldots,d} W_j$. The random variable $Z$ is integrable because each $W_j$ is sub-Gaussian (hence integrable) and the maximum of finitely many integrable random variables is integrable. Jensen's inequality gives
\begin{align*}
\exp\!\left(\alpha\, \mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right]\right) \leq \mathbb{E}\!\left[\exp\!\left(\alpha \max_{j=1,\ldots,d} W_j\right)\right].
\end{align*}
Taking logarithms (which is monotone) and dividing by $\alpha > 0$:
\begin{align*}
\mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \frac{1}{\alpha}\log \mathbb{E}\!\left[\exp\!\left(\alpha \max_{j=1,\ldots,d} W_j\right)\right].
\end{align*}
[guided]
Why the exponential moment method? We want to bound $\mathbb{E}[\max_j W_j]$, but the maximum is not additive, making direct estimation difficult. The exponential is a natural tool because $e^{\alpha \max_j W_j} = \max_j e^{\alpha W_j}$ (since $x \mapsto e^{\alpha x}$ is monotone), and the maximum of non-negative quantities is bounded by their sum. This converts the problem into bounding individual MGFs, which is exactly what the sub-Gaussian condition controls.
Jensen's inequality is applied to the convex function $\varphi(x) = e^{\alpha x}$ (convex because $\varphi''(x) = \alpha^2 e^{\alpha x} > 0$). This gives $\varphi(\mathbb{E}[Z]) \leq \mathbb{E}[\varphi(Z)]$, i.e.,
\begin{align*}
\exp\!\left(\alpha\, \mathbb{E}\!\left[\max_j W_j\right]\right) \leq \mathbb{E}\!\left[\exp\!\left(\alpha \max_j W_j\right)\right].
\end{align*}
Since $\alpha > 0$, dividing by $\alpha$ after taking the logarithm preserves the inequality direction.
[/guided]
[/step]
[step:Bound $\max_j e^{\alpha W_j}$ by the sum $\sum_j e^{\alpha W_j}$]
Since the exponential function is non-negative, we have the pointwise inequality
\begin{align*}
\max_{j=1,\ldots,d} e^{\alpha W_j} \leq \sum_{j=1}^d e^{\alpha W_j}.
\end{align*}
This holds because the maximum of a finite collection of non-negative numbers is at most their sum. Taking expectations and using monotonicity of expectation:
\begin{align*}
\mathbb{E}\!\left[\max_{j=1,\ldots,d} e^{\alpha W_j}\right] \leq \sum_{j=1}^d \mathbb{E}\!\left[e^{\alpha W_j}\right].
\end{align*}
Note that this step does not require independence of $W_1, \ldots, W_d$.
[/step]
[step:Apply the sub-Gaussian MGF bound to each summand]
Each $W_j$ is sub-Gaussian with parameter $\sigma > 0$ and $\mathbb{E}[W_j] = 0$. By definition of sub-Gaussianity, for all $\alpha \in \mathbb{R}$:
\begin{align*}
\mathbb{E}\!\left[e^{\alpha(W_j - \mathbb{E}[W_j])}\right] = \mathbb{E}\!\left[e^{\alpha W_j}\right] \leq e^{\alpha^2 \sigma^2 / 2},
\end{align*}
where the first equality uses $\mathbb{E}[W_j] = 0$. Since all $d$ random variables share the same sub-Gaussian parameter $\sigma$, each summand satisfies the same bound:
\begin{align*}
\sum_{j=1}^d \mathbb{E}\!\left[e^{\alpha W_j}\right] \leq d \cdot e^{\alpha^2 \sigma^2/2}.
\end{align*}
[/step]
[step:Combine the bounds and take logarithms]
Substituting the result of the previous two steps into the Jensen bound from the first step:
\begin{align*}
\mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \frac{1}{\alpha}\log\!\left(d \cdot e^{\alpha^2 \sigma^2/2}\right) = \frac{\log d}{\alpha} + \frac{\alpha \sigma^2}{2}.
\end{align*}
This inequality holds for all $\alpha > 0$.
[/step]
[step:Optimise over $\alpha > 0$ to obtain the tightest bound]
Define
\begin{align*}
\psi : (0, \infty) &\to \mathbb{R} \\
\alpha &\mapsto \frac{\log d}{\alpha} + \frac{\alpha \sigma^2}{2}.
\end{align*}
To minimise $\psi$, compute $\psi'(\alpha) = -\frac{\log d}{\alpha^2} + \frac{\sigma^2}{2}$. Setting $\psi'(\alpha) = 0$:
\begin{align*}
\frac{\sigma^2}{2} = \frac{\log d}{\alpha^2} \quad \Longrightarrow \quad \alpha^* = \sqrt{\frac{2\log d}{\sigma^2}} = \frac{\sqrt{2\log d}}{\sigma}.
\end{align*}
Since $\psi''(\alpha) = \frac{2\log d}{\alpha^3} > 0$ for $\alpha > 0$, the function $\psi$ is strictly convex and $\alpha^*$ is the unique global minimiser. Evaluating at $\alpha^*$:
\begin{align*}
\psi(\alpha^*) = \frac{\log d}{\sqrt{2\log d}/\sigma} + \frac{\sigma^2}{2}\cdot\frac{\sqrt{2\log d}}{\sigma} = \frac{\sigma\log d}{\sqrt{2\log d}} + \frac{\sigma\sqrt{2\log d}}{2} = \frac{\sigma\sqrt{2\log d}}{2} + \frac{\sigma\sqrt{2\log d}}{2} = \sigma\sqrt{2\log d}.
\end{align*}
Therefore
\begin{align*}
\mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \sigma\sqrt{2\log d},
\end{align*}
as claimed.
[guided]
The optimisation is a standard exercise in balancing two competing terms. The bound $\frac{\log d}{\alpha} + \frac{\alpha \sigma^2}{2}$ consists of a term decreasing in $\alpha$ (which controls the "union bound cost" of having $d$ variables) and a term increasing in $\alpha$ (which controls the MGF growth). The optimal $\alpha^*$ balances these two effects.
Computing $\psi(\alpha^*)$ explicitly: substituting $\alpha^* = \sqrt{2\log d}/\sigma$,
\begin{align*}
\frac{\log d}{\alpha^*} = \frac{\log d \cdot \sigma}{\sqrt{2\log d}} = \frac{\sigma\sqrt{\log d}}{\sqrt{2}} = \frac{\sigma\sqrt{2\log d}}{2},
\end{align*}
and
\begin{align*}
\frac{\alpha^* \sigma^2}{2} = \frac{\sigma^2}{2}\cdot\frac{\sqrt{2\log d}}{\sigma} = \frac{\sigma\sqrt{2\log d}}{2}.
\end{align*}
Adding these two equal terms gives $\psi(\alpha^*) = \sigma\sqrt{2\log d}$.
The key feature of this bound is its logarithmic dependence on $d$: the expected maximum of $d$ sub-Gaussian random variables grows only as $\sqrt{\log d}$, not as $\sqrt{d}$. This is the reason sub-Gaussian concentration is so powerful in high-dimensional settings -- the "price" of taking a maximum over an exponentially large class grows only polynomially in the relevant parameters.
[/guided]
[/step]