Expected Maximum of Sub-Gaussians — Statement & Proof

Expected Maximum of Sub-Gaussians (Theorem # 1963)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We use the exponential moment method. For any $\alpha > 0$, Jensen's inequality applied to the convex function $x \mapsto e^{\alpha x}$ lifts the expected maximum into a log-moment generating function. Bounding the maximum of exponentials by a sum, then applying the sub-Gaussian MGF bound to each term, yields an upper bound with a free parameter $\alpha$. Optimising over $\alpha$ produces the $\sigma\sqrt{2\log d}$ bound. [/proofplan] [step:Lift the expected maximum into a logarithm of an exponential moment via Jensen's inequality] Let $\alpha > 0$ be a free parameter to be chosen later. The function $x \mapsto e^{\alpha x}$ is convex on $\mathbb{R}$. By [Jensen's Inequality](/theorems/1977), for any real-valued random variable $Z$ with $\mathbb{E}[|Z|] < \infty$, we have $e^{\alpha \mathbb{E}[Z]} \leq \mathbb{E}[e^{\alpha Z}]$. We apply this with $Z = \max_{j=1,\ldots,d} W_j$. The random variable $Z$ is integrable because each $W_j$ is sub-Gaussian (hence integrable) and the maximum of finitely many integrable random variables is integrable. Jensen's inequality gives \begin{align*} \exp\!\left(\alpha\, \mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right]\right) \leq \mathbb{E}\!\left[\exp\!\left(\alpha \max_{j=1,\ldots,d} W_j\right)\right]. \end{align*} Taking logarithms (which is monotone) and dividing by $\alpha > 0$: \begin{align*} \mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \frac{1}{\alpha}\log \mathbb{E}\!\left[\exp\!\left(\alpha \max_{j=1,\ldots,d} W_j\right)\right]. \end{align*} [guided] Why the exponential moment method? We want to bound $\mathbb{E}[\max_j W_j]$, but the maximum is not additive, making direct estimation difficult. The exponential is a natural tool because $e^{\alpha \max_j W_j} = \max_j e^{\alpha W_j}$ (since $x \mapsto e^{\alpha x}$ is monotone), and the maximum of non-negative quantities is bounded by their sum. This converts the problem into bounding individual MGFs, which is exactly what the sub-Gaussian condition controls. Jensen's inequality is applied to the convex function $\varphi(x) = e^{\alpha x}$ (convex because $\varphi''(x) = \alpha^2 e^{\alpha x} > 0$). This gives $\varphi(\mathbb{E}[Z]) \leq \mathbb{E}[\varphi(Z)]$, i.e., \begin{align*} \exp\!\left(\alpha\, \mathbb{E}\!\left[\max_j W_j\right]\right) \leq \mathbb{E}\!\left[\exp\!\left(\alpha \max_j W_j\right)\right]. \end{align*} Since $\alpha > 0$, dividing by $\alpha$ after taking the logarithm preserves the inequality direction. [/guided] [/step] [step:Bound $\max_j e^{\alpha W_j}$ by the sum $\sum_j e^{\alpha W_j}$] Since the exponential function is non-negative, we have the pointwise inequality \begin{align*} \max_{j=1,\ldots,d} e^{\alpha W_j} \leq \sum_{j=1}^d e^{\alpha W_j}. \end{align*} This holds because the maximum of a finite collection of non-negative numbers is at most their sum. Taking expectations and using monotonicity of expectation: \begin{align*} \mathbb{E}\!\left[\max_{j=1,\ldots,d} e^{\alpha W_j}\right] \leq \sum_{j=1}^d \mathbb{E}\!\left[e^{\alpha W_j}\right]. \end{align*} Note that this step does not require independence of $W_1, \ldots, W_d$. [/step] [step:Apply the sub-Gaussian MGF bound to each summand] Each $W_j$ is sub-Gaussian with parameter $\sigma > 0$ and $\mathbb{E}[W_j] = 0$. By definition of sub-Gaussianity, for all $\alpha \in \mathbb{R}$: \begin{align*} \mathbb{E}\!\left[e^{\alpha(W_j - \mathbb{E}[W_j])}\right] = \mathbb{E}\!\left[e^{\alpha W_j}\right] \leq e^{\alpha^2 \sigma^2 / 2}, \end{align*} where the first equality uses $\mathbb{E}[W_j] = 0$. Since all $d$ random variables share the same sub-Gaussian parameter $\sigma$, each summand satisfies the same bound: \begin{align*} \sum_{j=1}^d \mathbb{E}\!\left[e^{\alpha W_j}\right] \leq d \cdot e^{\alpha^2 \sigma^2/2}. \end{align*} [/step] [step:Combine the bounds and take logarithms] Substituting the result of the previous two steps into the Jensen bound from the first step: \begin{align*} \mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \frac{1}{\alpha}\log\!\left(d \cdot e^{\alpha^2 \sigma^2/2}\right) = \frac{\log d}{\alpha} + \frac{\alpha \sigma^2}{2}. \end{align*} This inequality holds for all $\alpha > 0$. [/step] [step:Optimise over $\alpha > 0$ to obtain the tightest bound] Define \begin{align*} \psi : (0, \infty) &\to \mathbb{R} \\ \alpha &\mapsto \frac{\log d}{\alpha} + \frac{\alpha \sigma^2}{2}. \end{align*} To minimise $\psi$, compute $\psi'(\alpha) = -\frac{\log d}{\alpha^2} + \frac{\sigma^2}{2}$. Setting $\psi'(\alpha) = 0$: \begin{align*} \frac{\sigma^2}{2} = \frac{\log d}{\alpha^2} \quad \Longrightarrow \quad \alpha^* = \sqrt{\frac{2\log d}{\sigma^2}} = \frac{\sqrt{2\log d}}{\sigma}. \end{align*} Since $\psi''(\alpha) = \frac{2\log d}{\alpha^3} > 0$ for $\alpha > 0$, the function $\psi$ is strictly convex and $\alpha^*$ is the unique global minimiser. Evaluating at $\alpha^*$: \begin{align*} \psi(\alpha^*) = \frac{\log d}{\sqrt{2\log d}/\sigma} + \frac{\sigma^2}{2}\cdot\frac{\sqrt{2\log d}}{\sigma} = \frac{\sigma\log d}{\sqrt{2\log d}} + \frac{\sigma\sqrt{2\log d}}{2} = \frac{\sigma\sqrt{2\log d}}{2} + \frac{\sigma\sqrt{2\log d}}{2} = \sigma\sqrt{2\log d}. \end{align*} Therefore \begin{align*} \mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \sigma\sqrt{2\log d}, \end{align*} as claimed. [guided] The optimisation is a standard exercise in balancing two competing terms. The bound $\frac{\log d}{\alpha} + \frac{\alpha \sigma^2}{2}$ consists of a term decreasing in $\alpha$ (which controls the "union bound cost" of having $d$ variables) and a term increasing in $\alpha$ (which controls the MGF growth). The optimal $\alpha^*$ balances these two effects. Computing $\psi(\alpha^*)$ explicitly: substituting $\alpha^* = \sqrt{2\log d}/\sigma$, \begin{align*} \frac{\log d}{\alpha^*} = \frac{\log d \cdot \sigma}{\sqrt{2\log d}} = \frac{\sigma\sqrt{\log d}}{\sqrt{2}} = \frac{\sigma\sqrt{2\log d}}{2}, \end{align*} and \begin{align*} \frac{\alpha^* \sigma^2}{2} = \frac{\sigma^2}{2}\cdot\frac{\sqrt{2\log d}}{\sigma} = \frac{\sigma\sqrt{2\log d}}{2}. \end{align*} Adding these two equal terms gives $\psi(\alpha^*) = \sigma\sqrt{2\log d}$. The key feature of this bound is its logarithmic dependence on $d$: the expected maximum of $d$ sub-Gaussian random variables grows only as $\sqrt{\log d}$, not as $\sqrt{d}$. This is the reason sub-Gaussian concentration is so powerful in high-dimensional settings -- the "price" of taking a maximum over an exponentially large class grows only polynomially in the relevant parameters. [/guided] [/step]

Explore Further

Convergence of Projected Gradient Descent Machine Learning AdaBoost Training Error Bound Machine Learning Universal Approximation Machine Learning Expected Risk Bound via Rademacher Complexity Machine Learning Contraction Lemma Machine Learning Calibration via Differentiability at Zero Machine Learning Subdifferential at Points of Differentiability Machine Learning Regression Function Minimises Squared-Error Risk Machine Learning

What brings you to Androma?

Start with a route through the knowledge graph.

Expected Maximum of Sub-Gaussians (Theorem # 1963)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Expected Maximum of Sub-Gaussians (Theorem # 1963)

Discussion

Proof

Explore Further