Maximal Inequality for Finitely Many Sub-Gaussian Random Variables

Maximal Inequality for Finitely Many Sub-Gaussian Random Variables (Theorem # 6058)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We first translate the $\psi_2$ bound into a uniform sub-Gaussian tail estimate for each [random variable](/page/Random%20Variable). A union bound then gives a tail estimate for the maximum $M := \max_{1 \le i \le N}|X_i|$. The expectation bound follows by integrating this tail estimate and splitting the integral at the natural scale $K\sqrt{\log(N+1)}$. The deviation estimate is obtained by applying the same union-bound tail estimate at the shifted threshold and absorbing the finite maximum penalty into the logarithmic part of the threshold. [/proofplan] [step:Reduce the $\psi_2$ hypothesis to individual tail bounds] Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which the random variables $X_1,\dots,X_N$ are defined. Since the statement has been formulated with $K>0$, all expressions involving $K^{-2}$ are defined. Fix $i \in \{1,\dots,N\}$. By definition, \begin{align*} \|X_i\|_{\psi_2}=\inf\left\{a>0: \mathbb E\left[\exp\left(\frac{X_i^2}{a^2}\right)\right]\le 2\right\}. \end{align*} Since $\|X_i\|_{\psi_2}\le K$, every $K+\varepsilon$ with $\varepsilon>0$ belongs to this admissible set, and hence \begin{align*} \mathbb E\left[\exp\left(\frac{X_i^2}{(K+\varepsilon)^2}\right)\right] \le 2. \end{align*} As $\varepsilon \downarrow 0$, the non-negative random variables $Y_{i,\varepsilon}: \Omega \to [0,\infty)$ defined by \begin{align*} Y_{i,\varepsilon}(\omega) := \exp\left(\frac{X_i(\omega)^2}{(K+\varepsilon)^2}\right) \end{align*} increase pointwise to the random variable $Y_i: \Omega \to [0,\infty)$ defined by \begin{align*} Y_i(\omega) := \exp\left(\frac{X_i(\omega)^2}{K^2}\right). \end{align*} By [Fatou's lemma](/theorems/510), \begin{align*} \mathbb E[Y_i] \le \liminf_{\varepsilon \downarrow 0}\mathbb E[Y_{i,\varepsilon}] \le 2. \end{align*} For each $t \ge 0$, applying [Markov's inequality](/theorems/514) to the non-negative random variable $Y_i$ at the level $\exp(t^2/K^2)$ gives \begin{align*} \mathbb P(|X_i| \ge t)=\mathbb P\left(\exp\left(\frac{X_i^2}{K^2}\right) \ge \exp\left(\frac{t^2}{K^2}\right)\right) \le 2\exp\left(-\frac{t^2}{K^2}\right). \end{align*} [/step] [step:Apply the union bound to the maximum] Define the random variable $M: \Omega \to [0,\infty)$ by \begin{align*} M(\omega) := \max_{1 \le i \le N}|X_i(\omega)|. \end{align*} For every $t \ge 0$, the event $\{M \ge t\}$ is contained in the union $\bigcup_{i=1}^N \{|X_i| \ge t\}$. Therefore the [union bound](/theorems/6078) and the tail estimate from the previous step give \begin{align*} \mathbb P(M \ge t) \le \sum_{i=1}^N \mathbb P(|X_i| \ge t) \le 2N\exp\left(-\frac{t^2}{K^2}\right). \end{align*} Also $\mathbb P(M \ge t) \le 1$, so \begin{align*} \mathbb P(M \ge t) \le \min\left\{1,2N\exp\left(-\frac{t^2}{K^2}\right)\right\}. \end{align*} [guided] The maximum is large only if at least one of the variables is large. Formally, define the random variable $M: \Omega \to [0,\infty)$ by \begin{align*} M(\omega) := \max_{1 \le i \le N}|X_i(\omega)|. \end{align*} For a fixed threshold $t \ge 0$, if $M(\omega) \ge t$, then by the definition of the maximum there exists an index $i \in \{1,\dots,N\}$ such that $|X_i(\omega)| \ge t$. Hence \begin{align*} \{M \ge t\} \subset \bigcup_{i=1}^N \{|X_i| \ge t\}. \end{align*} Taking probabilities and using the [union bound](/theorems/6078), equivalently finite subadditivity of $\mathbb P$ over the finite family of events $\{|X_i| \ge t\}$, gives \begin{align*} \mathbb P(M \ge t) \le \sum_{i=1}^N \mathbb P(|X_i| \ge t). \end{align*} For each fixed index $i \in \{1,\dots,N\}$, the preceding tail argument applies Markov's inequality to the non-negative random variable $Y_i=\exp(X_i^2/K^2)$ at the level $\exp(t^2/K^2)$. Since the $\psi_2$ hypothesis gave $\mathbb E[Y_i]\le 2$, we obtain \begin{align*} \mathbb P(|X_i| \ge t) = \mathbb P\left(Y_i \ge \exp\left(\frac{t^2}{K^2}\right)\right) \le 2\exp\left(-\frac{t^2}{K^2}\right). \end{align*} Substituting this bound into the finite sum yields \begin{align*} \mathbb P(M \ge t) \le 2N\exp\left(-\frac{t^2}{K^2}\right). \end{align*} Since probabilities are always at most $1$, we may combine this with the elementary bound $\mathbb P(M \ge t) \le 1$ and obtain \begin{align*} \mathbb P(M \ge t) \le \min\left\{1,2N\exp\left(-\frac{t^2}{K^2}\right)\right\}. \end{align*} This is the key estimate: the price of taking the maximum over $N$ variables is the factor $N$, and the logarithm in the final bound will be exactly what cancels that factor. [/guided] [/step] [step:Integrate the tail bound to estimate the expectation] Let $\mathcal L^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $[0,\infty)$. Since $M$ is the finite maximum of measurable real-valued random variables, $M$ is measurable, and $M \ge 0$. The [layer-cake identity](/theorems/2956) for non-negative measurable random variables gives, with both sides allowed initially to take the value $+\infty$, \begin{align*} \mathbb E[M] = \int_0^\infty \mathbb P(M \ge t)\,d\mathcal L^1(t). \end{align*} Let \begin{align*} L := \log(N+1), \qquad a := K\sqrt{\log(2N)}. \end{align*} Using the tail estimate from the previous step and splitting the integral at $a$, we obtain \begin{align*} \mathbb E[M] \le \int_0^a 1\,d\mathcal L^1(t)+ \int_a^\infty 2N\exp\left(-\frac{t^2}{K^2}\right)\,d\mathcal L^1(t). \end{align*} For the second integral, the change of variables $t=Ks$ transforms $d\mathcal L^1(t)$ into $K\,d\mathcal L^1(s)$ and transforms the domain $[a,\infty)$ into $[\sqrt{\log(2N)},\infty)$. Hence \begin{align*} \mathbb E[M] \le a + K\int_{\sqrt{\log(2N)}}^\infty 2N e^{-s^2}\,d\mathcal L^1(s), \end{align*} where we used the change of variables $t=Ks$, so $d\mathcal L^1(t)=K\,d\mathcal L^1(s)$. For $s \ge \sqrt{\log(2N)}$, the Gaussian tail estimate \begin{align*} \int_r^\infty e^{-s^2}\,d\mathcal L^1(s) \le \frac{e^{-r^2}}{2r} \end{align*} with $r=\sqrt{\log(2N)}$ gives \begin{align*} K\int_{\sqrt{\log(2N)}}^\infty 2N e^{-s^2}\,d\mathcal L^1(s) \le K \cdot 2N \cdot \frac{e^{-\log(2N)}}{2\sqrt{\log(2N)}} = \frac{K}{2\sqrt{\log(2N)}}. \end{align*} Since $N \ge 1$, we have $\log(2N) \ge \log 2$, $\log(2N) \le 2\log(N+1)$, and $\log(N+1) \ge \log 2$. Therefore \begin{align*} \frac{K}{2\sqrt{\log 2}} \le \frac{K\sqrt{\log(N+1)}}{2\log 2}, \end{align*} and hence \begin{align*} \mathbb E[M] \le K\sqrt{2\log(N+1)} + \frac{K\sqrt{\log(N+1)}}{2\log 2} \le C_1 K\sqrt{\log(N+1)} \end{align*} for the universal constant \begin{align*} C_1 := \sqrt{2} + \frac{1}{2\log 2}. \end{align*} [/step] [step:Shift the threshold to obtain the deviation estimate] Let $C := 2\sqrt{2}$ and define \begin{align*} b := C K\sqrt{\log(N+1)} + u. \end{align*} For every $u \ge 0$, the [union bound](/theorems/6078) tail estimate gives \begin{align*} \mathbb P(M \ge b) \le 2N\exp\left(-\frac{b^2}{K^2}\right). \end{align*} Since $b \ge C K\sqrt{\log(N+1)}$, we have \begin{align*} \frac{b^2}{K^2} \ge \frac{C^2}{2}\log(N+1) + \frac{u^2}{2K^2}, \end{align*} using $(r+s)^2 \ge r^2+s^2$ for $r,s \ge 0$ and weakening the bound by a factor of $2$. Therefore \begin{align*} \mathbb P(M \ge b) \le 2N\exp\left(-\frac{C^2}{2}\log(N+1)\right)\exp\left(-\frac{u^2}{2K^2}\right). \end{align*} With $C=2\sqrt{2}$, we have $C^2/2=4$, and since $N \le (N+1)^4$ for $N \ge 1$, \begin{align*} N\exp\left(-4\log(N+1)\right) = \frac{N}{(N+1)^4} \le 1. \end{align*} Thus \begin{align*} \mathbb P\left(M \ge C K\sqrt{\log(N+1)}+u\right) \le 2\exp\left(-\frac{u^2}{2K^2}\right). \end{align*} Taking $c := 1/2$ proves the claimed deviation estimate. [/step] [step:Combine the estimates] The expectation estimate follows from the third step with the universal constant $C_1$. The deviation estimate follows from the fourth step with universal constants $C=2\sqrt{2}$ and $c=1/2$. Enlarging $C$ if necessary so that the same universal constant may be used in both displayed conclusions gives the theorem. [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Random Variable Definition Expectation Definition Cook's Distance Formula for Internally Studentized Residuals Probability & Statistics Sub-Gaussian Sum Bound for Independent Random Variables Probability & Statistics Minimax Euclidean Risk for Sparse Linear Regression with Gaussian Design Probability & Statistics Equivalent Characterizations of Sub-Gaussian Random Variables Probability & Statistics Tail Sum Formula for Expectation Probability Theory Basic Identities for the Hat Matrix Probability & Statistics Planted-Clique-Based Sparse PCA Lower Bound Probability & Statistics PGF of a Random Sum Probability Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.