Gaussian Score Concentration for Column-Normalized Fixed Designs

Gaussian Score Concentration for Column-Normalized Fixed Designs (Theorem # 5550)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We write each coordinate of the score vector $X^\top\varepsilon/n$ as a scalar Gaussian [random variable](/page/Random%20Variable). Column normalization makes every coordinate have variance exactly $\sigma^2/n$. A one-dimensional Chernoff bound gives the two-sided Gaussian tail estimate for each coordinate, and finite subadditivity of probability gives the union bound over the $p$ coordinates. The high-probability form is obtained by choosing $t$ so that the tail bound equals $\delta$. [/proofplan] [step:Identify each coordinate score and compute its Gaussian variance] For each $j \in \{1,\dots,p\}$, define the score coordinate random variable as the map $S_j: \Omega \to \mathbb{R}$ given by \begin{align*} S_j(\omega) := \frac{1}{n}\sum_{i=1}^{n}X_{ij}\varepsilon_i(\omega). \end{align*} Then \begin{align*} \frac{X^\top\varepsilon}{n} = (S_1,\dots,S_p). \end{align*} Since $S_j$ is a deterministic linear combination of independent centered Gaussian random variables, $S_j$ is Gaussian. Linearity of expectation gives \begin{align*} \mathbb{E}[S_j] = \frac{1}{n}\sum_{i=1}^{n}X_{ij}\mathbb{E}[\varepsilon_i] = 0. \end{align*} Independence gives additivity of variance for the summands $X_{ij}\varepsilon_i/n$, and column normalization gives \begin{align*} \operatorname{Var}(S_j) = \frac{1}{n^2}\sum_{i=1}^{n}X_{ij}^{2}\operatorname{Var}(\varepsilon_i) = \frac{\sigma^2}{n^2}\sum_{i=1}^{n}X_{ij}^{2} = \frac{\sigma^2}{n}. \end{align*} Thus $S_j \sim \mathcal{N}(0,\sigma^2/n)$ for every $j \in \{1,\dots,p\}$. [guided] Fix $j \in \{1,\dots,p\}$. The $j$-th coordinate of $X^\top\varepsilon/n$ is the scalar random variable $S_j: \Omega \to \mathbb{R}$ defined by \begin{align*} S_j(\omega) := \frac{1}{n}\sum_{i=1}^{n}X_{ij}\varepsilon_i(\omega). \end{align*} This definition is exactly the coordinate expansion of the matrix-vector product, so \begin{align*} \frac{X^\top\varepsilon}{n} = (S_1,\dots,S_p). \end{align*} Because the entries $X_{ij}$ are fixed [real numbers](/page/Real%20Numbers), $S_j$ is a deterministic linear combination of the independent Gaussian coordinates $\varepsilon_1,\dots,\varepsilon_n$. Hence $S_j$ is Gaussian. Its mean is computed using linearity of expectation: \begin{align*} \mathbb{E}[S_j] &= \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}X_{ij}\varepsilon_i\right] = \frac{1}{n}\sum_{i=1}^{n}X_{ij}\mathbb{E}[\varepsilon_i] = 0, \end{align*} since each $\varepsilon_i$ has mean $0$. For the variance, independence removes the cross-covariance terms. Therefore \begin{align*} \operatorname{Var}(S_j) &= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_{ij}\varepsilon_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}X_{ij}^{2}\operatorname{Var}(\varepsilon_i). \end{align*} Since $\operatorname{Var}(\varepsilon_i)=\sigma^2$ for every $i$, this becomes \begin{align*} \operatorname{Var}(S_j) &= \frac{\sigma^2}{n^2}\sum_{i=1}^{n}X_{ij}^{2}. \end{align*} The column normalization hypothesis says precisely that \begin{align*} \frac{1}{n}\sum_{i=1}^{n}X_{ij}^{2}=1, \end{align*} or equivalently $\sum_{i=1}^{n}X_{ij}^{2}=n$. Substituting this into the variance computation gives \begin{align*} \operatorname{Var}(S_j) &= \frac{\sigma^2}{n^2}\,n = \frac{\sigma^2}{n}. \end{align*} Thus $S_j \sim \mathcal{N}(0,\sigma^2/n)$. [/guided] [/step] [step:Apply a Chernoff bound to one score coordinate] Let $v := \sigma^2/n$. If $Z: \Omega \to \mathbb{R}$ satisfies $Z \sim \mathcal{N}(0,v)$, then for every $t>0$, \begin{align*} \mathbb{P}(|Z|>t) \leq 2\exp\left(-\frac{t^2}{2v}\right). \end{align*} Indeed, for every $\lambda>0$, applying Markov's inequality to the nonnegative random variable $e^{\lambda Z}$ gives \begin{align*} \mathbb{P}(Z>t) &= \mathbb{P}(e^{\lambda Z}>e^{\lambda t}) \leq e^{-\lambda t}\mathbb{E}[e^{\lambda Z}] = \exp\left(-\lambda t+\frac{v\lambda^2}{2}\right). \end{align*} Choosing $\lambda=t/v$ yields \begin{align*} \mathbb{P}(Z>t) \leq \exp\left(-\frac{t^2}{2v}\right). \end{align*} Applying the same estimate to $-Z \sim \mathcal{N}(0,v)$ gives \begin{align*} \mathbb{P}(Z<-t) \leq \exp\left(-\frac{t^2}{2v}\right). \end{align*} Since $\{|Z|>t\}=\{Z>t\}\cup\{Z<-t\}$, finite subadditivity gives the two-sided bound. Applying this with $Z=S_j$ and $v=\sigma^2/n$ gives \begin{align*} \mathbb{P}(|S_j|>t) \leq 2\exp\left(-\frac{nt^2}{2\sigma^2}\right) \end{align*} for every $j \in \{1,\dots,p\}$. [/step] [step:Take the union bound over all coordinates] For each $j \in \{1,\dots,p\}$, define the event \begin{align*} A_j := \{\omega \in \Omega : |S_j(\omega)| > t\}. \end{align*} Because the infinity norm of a vector in $\mathbb{R}^p$ is the maximum absolute coordinate, \begin{align*} \left\{\left\|\frac{X^\top\varepsilon}{n}\right\|_\infty > t\right\} = \bigcup_{j=1}^{p} A_j. \end{align*} Using finite subadditivity of probability and the coordinate tail bound, first \begin{align*} \mathbb{P}\left(\left\|\frac{X^\top\varepsilon}{n}\right\|_\infty > t\right) = \mathbb{P}\left(\bigcup_{j=1}^{p} A_j\right) \leq \sum_{j=1}^{p}\mathbb{P}(A_j). \end{align*} The coordinate tail bound then gives \begin{align*} \sum_{j=1}^{p}\mathbb{P}(A_j) \leq \sum_{j=1}^{p}2\exp\left(-\frac{nt^2}{2\sigma^2}\right) = 2p\exp\left(-\frac{nt^2}{2\sigma^2}\right). \end{align*} This proves the stated concentration inequality. [/step] [step:Choose the deviation level that makes the failure probability equal to $\delta$] Let $\delta \in (0,1)$, and define \begin{align*} t_\delta := \sigma\sqrt{\frac{2\log(2p/\delta)}{n}}. \end{align*} Since $p \in \mathbb{N}$ and $\delta \in (0,1)$, we have $2p/\delta>1$, so $t_\delta>0$. Substituting $t=t_\delta$ into the concentration inequality gives \begin{align*} 2p\exp\left(-\frac{n t_\delta^2}{2\sigma^2}\right) = 2p\exp\left(-\log(2p/\delta)\right) = 2p\frac{\delta}{2p} = \delta. \end{align*} Therefore \begin{align*} \mathbb{P}\left(\left\|\frac{X^\top\varepsilon}{n}\right\|_\infty > t_\delta\right) \leq \delta. \end{align*} Taking complements gives \begin{align*} \mathbb{P}\left(\left\|\frac{X^\top\varepsilon}{n}\right\|_\infty \leq \sigma\sqrt{\frac{2\log(2p/\delta)}{n}}\right) \geq 1-\delta. \end{align*} This is the asserted high-probability bound. [/step]

Prerequisites (0/4 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Random Variable Definition Expectation Definition Variance Definition Event Definition Half-Line Test for Real Measurability Probability & Statistics Coordinatewise Asymptotic Normality of the Debiased Lasso Probability & Statistics Addition Property of the Poisson Distribution Poisson Processes Feasibility of the True Parameter for the Dantzig Selector Probability & Statistics Gaussian Score Bound for Fixed-Design Linear Regression Probability & Statistics Unbiasedness of the Holdout Risk Estimator Probability & Statistics Coordinate Remainder Bound for Debiased Estimators Probability & Statistics Tower Property of Conditional Expectation Conditional Expectation Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.