Gaussian Score Bound for Fixed-Design Linear Regression

Gaussian Score Bound for Fixed-Design Linear Regression (Theorem # 5585)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the score bound coordinate by coordinate. Each coordinate $X_j^\top\varepsilon/n$ is a centered Gaussian [random variable](/page/Random%20Variable), and the column normalization bounds its variance by $\sigma^2/n$. We then apply a two-sided Gaussian tail estimate with threshold $\lambda/2$ and combine the $p$ coordinate bounds by the union bound. [/proofplan] [step:Compute the variance of each score coordinate] For each $j \in \{1,\dots,p\}$, define the real-valued random variable $S_j: \Omega \to \mathbb{R}$ by \begin{align*} S_j(\omega) := \frac{X_j^\top \varepsilon(\omega)}{n}. \end{align*} Since $\varepsilon \sim \mathcal{N}(0,\sigma^2 I_n)$ and $S_j$ is a deterministic linear functional of $\varepsilon$, the random variable $S_j$ is Gaussian. Its expectation is \begin{align*} \mathbb{E}[S_j] = \frac{1}{n}X_j^\top \mathbb{E}[\varepsilon] = 0, \end{align*} and its variance is \begin{align*} \operatorname{Var}(S_j) = \operatorname{Var}\left(\frac{X_j^\top \varepsilon}{n}\right) = \frac{1}{n^2}X_j^\top (\sigma^2 I_n)X_j = \frac{\sigma^2 |X_j|^2}{n^2} \leq \frac{\sigma^2}{n}. \end{align*} Thus \begin{align*} S_j \sim \mathcal{N}(0,v_j^2) \end{align*} for some variance parameter $v_j^2 \in [0,\sigma^2/n]$. [/step] [step:Apply the Gaussian tail estimate at the chosen threshold] Define the threshold \begin{align*} t := \frac{\lambda}{2} = \sigma\sqrt{\frac{2\log(2p/\alpha)}{n}}. \end{align*} We first record the elementary bound used below: if $Z \sim \mathcal{N}(0,v^2)$ with $0 \leq v^2 \leq \sigma^2/n$, then \begin{align*} \mathbb{P}(|Z| > t) \leq 2\exp\left(-\frac{nt^2}{2\sigma^2}\right). \end{align*} Indeed, when $v=0$, $Z=0$ almost surely and the bound holds. When $v>0$, write $Z=vG$ with $G \sim \mathcal{N}(0,1)$. For any $a>0$ and any $s>0$, Markov's inequality applied to $\exp(sG)$ gives \begin{align*} \mathbb{P}(G>a) = \mathbb{P}(\exp(sG)>\exp(sa)) \leq \exp(-sa)\mathbb{E}[\exp(sG)] = \exp\left(-sa+\frac{s^2}{2}\right). \end{align*} Choosing $s=a$ gives $\mathbb{P}(G>a)\leq \exp(-a^2/2)$. Applying the same argument to $-G$ gives $\mathbb{P}(G<-a)\leq \exp(-a^2/2)$, hence \begin{align*} \mathbb{P}(|G|>a)\leq 2\exp\left(-\frac{a^2}{2}\right). \end{align*} With $a=t/v$, this yields \begin{align*} \mathbb{P}(|Z|>t) \leq 2\exp\left(-\frac{t^2}{2v^2}\right) \leq 2\exp\left(-\frac{nt^2}{2\sigma^2}\right), \end{align*} because $v^2\leq \sigma^2/n$. Applying this estimate to $Z=S_j$ gives, for every $j \in \{1,\dots,p\}$, \begin{align*} \mathbb{P}(|S_j|>t) \leq 2\exp\left(-\frac{nt^2}{2\sigma^2}\right). \end{align*} Substituting the definition of $t$, \begin{align*} \frac{nt^2}{2\sigma^2} = \frac{n}{2\sigma^2}\cdot \sigma^2\frac{2\log(2p/\alpha)}{n} = \log(2p/\alpha), \end{align*} and therefore \begin{align*} \mathbb{P}(|S_j|>t) \leq 2\exp(-\log(2p/\alpha)) = \frac{\alpha}{p}. \end{align*} [guided] The role of the chosen value of $\lambda$ is to make the Gaussian tail probability exactly small enough to survive a union bound over $p$ coordinates. We set \begin{align*} t := \frac{\lambda}{2} = \sigma\sqrt{\frac{2\log(2p/\alpha)}{n}}, \end{align*} because the event in the theorem is the same as $\max_{1\leq j\leq p}|S_j|\leq t$. We need a two-sided tail bound for a centered Gaussian whose variance may be smaller than $\sigma^2/n$. Let $Z \sim \mathcal{N}(0,v^2)$ with $0 \leq v^2 \leq \sigma^2/n$. If $v=0$, then $Z=0$ almost surely, so $\mathbb{P}(|Z|>t)=0$. Assume now that $v>0$, and write $Z=vG$ where $G \sim \mathcal{N}(0,1)$. For $a>0$ and $s>0$, Markov's inequality applied to the non-negative random variable $\exp(sG)$ gives \begin{align*} \mathbb{P}(G>a) = \mathbb{P}(\exp(sG)>\exp(sa)) \leq \exp(-sa)\mathbb{E}[\exp(sG)]. \end{align*} The moment generating function of the standard Gaussian is $\mathbb{E}[\exp(sG)]=\exp(s^2/2)$, so \begin{align*} \mathbb{P}(G>a) \leq \exp\left(-sa+\frac{s^2}{2}\right). \end{align*} Choosing $s=a$ minimizes the displayed quadratic expression and gives \begin{align*} \mathbb{P}(G>a)\leq \exp\left(-\frac{a^2}{2}\right). \end{align*} Applying the same estimate to $-G$, which also has distribution $\mathcal{N}(0,1)$, gives \begin{align*} \mathbb{P}(|G|>a)\leq 2\exp\left(-\frac{a^2}{2}\right). \end{align*} Now set $a=t/v$. Since $Z=vG$, \begin{align*} \mathbb{P}(|Z|>t) = \mathbb{P}\left(|G|>\frac{t}{v}\right) \leq 2\exp\left(-\frac{t^2}{2v^2}\right). \end{align*} The variance bound $v^2\leq \sigma^2/n$ implies \begin{align*} \frac{1}{v^2}\geq \frac{n}{\sigma^2}, \end{align*} and hence \begin{align*} \mathbb{P}(|Z|>t) \leq 2\exp\left(-\frac{nt^2}{2\sigma^2}\right). \end{align*} We apply this with $Z=S_j$. The previous step proved that each $S_j$ is centered Gaussian with variance at most $\sigma^2/n$, so the hypotheses of the tail estimate are satisfied. Therefore \begin{align*} \mathbb{P}(|S_j|>t) \leq 2\exp\left(-\frac{nt^2}{2\sigma^2}\right). \end{align*} Finally, the definition of $t$ gives \begin{align*} \frac{nt^2}{2\sigma^2} = \frac{n}{2\sigma^2}\cdot \sigma^2\frac{2\log(2p/\alpha)}{n} = \log(2p/\alpha), \end{align*} so \begin{align*} \mathbb{P}(|S_j|>t) \leq 2\exp(-\log(2p/\alpha)) = \frac{\alpha}{p}. \end{align*} This is the coordinatewise error probability needed for the final union bound. [/guided] [/step] [step:Union bound over all score coordinates] Define the bad event \begin{align*} A := \left\{\omega \in \Omega : \left\|\frac{X^\top\varepsilon(\omega)}{n}\right\|_\infty > t\right\}. \end{align*} Since \begin{align*} \left\|\frac{X^\top\varepsilon}{n}\right\|_\infty = \max_{1\leq j\leq p}|S_j|, \end{align*} we have \begin{align*} A = \bigcup_{j=1}^p \{\omega \in \Omega : |S_j(\omega)|>t\}. \end{align*} By the union bound and the coordinate estimate from the previous step, \begin{align*} \mathbb{P}(A) \leq \sum_{j=1}^p \mathbb{P}(|S_j|>t) \leq \sum_{j=1}^p \frac{\alpha}{p} = \alpha. \end{align*} Taking complements gives \begin{align*} \mathbb{P}\left(\left\|\frac{X^\top\varepsilon}{n}\right\|_\infty \leq t\right) = 1-\mathbb{P}(A) \geq 1-\alpha. \end{align*} Since $t=\lambda/2$, this is precisely \begin{align*} \mathbb{P}\left(\left\|\frac{X^\top\varepsilon}{n}\right\|_\infty \leq \frac{\lambda}{2}\right) \geq 1-\alpha. \end{align*} [/step]

Prerequisites (0/5 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Distribution Definition Random Variable Definition Expectation Definition Variance Definition Event Definition Fast Lasso Prediction Bound under the Compatibility Condition Probability & Statistics Independence from the Germ Field Brownian Motion Conditional Expectation from a Conditional Density Probability Theory Bias-Variance Decomposition for Squared Loss Probability & Statistics Tail Sum Formula for Expectation Probability Theory Uniqueness of the PGF Probability Theory Fast-Rate Lasso Estimation Under a Restricted Eigenvalue Condition Probability & Statistics Debiased Lasso Decomposition Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.