Marchenko-Pastur Upper Edge for Gaussian Null Sample Covariance Matrices

Marchenko-Pastur Upper Edge for Gaussian Null Sample Covariance Matrices (Theorem # 5941)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We encode the data vectors as the rows of a Gaussian noise matrix $Z_n \in \mathbb{R}^{n \times p_n}$. The sample covariance matrix is exactly $n^{-1}Z_n^\top Z_n$, so its largest eigenvalue is the square of the largest singular value of $n^{-1/2}Z_n$. The Bai-Yin extreme singular value theorem gives the limiting top singular value $1+\sqrt{\gamma}$ in probability, and an elementary squaring argument transfers this convergence to the upper edge of the sample covariance spectrum. [/proofplan] [step:Represent the sample covariance matrix as a rescaled Gram matrix] For each $n \geq 1$ and each $i \in \{1,\dots,n\}$, write $X_{n,i}:\Omega \to \mathbb{R}^{p_n}$ for the $i$th data vector in the $n$th row of the triangular array, and write $(X_{n,i})_j:\Omega \to \mathbb{R}$ for its $j$th coordinate. Define the random matrix $Z_n:\Omega \to \mathbb{R}^{n \times p_n}$ by declaring that, for every $\omega \in \Omega$, every $i \in \{1,\dots,n\}$, and every $j \in \{1,\dots,p_n\}$, the $(i,j)$ entry is $[Z_n(\omega)]_{ij}:=(X_{n,i}(\omega))_j$. Since $X_{n,1},\dots,X_{n,n}$ are independent with common distribution $\mathcal{N}(0,I_{p_n})$, the entries of $Z_n$ are independent standard Gaussian random variables. For every $\omega \in \Omega$, \begin{align*} Z_n(\omega)^\top Z_n(\omega) = \sum_{i=1}^{n} X_{n,i}(\omega) X_{n,i}(\omega)^\top. \end{align*} Therefore \begin{align*} \hat{\Sigma}_n = \frac{1}{n} Z_n^\top Z_n. \end{align*} [guided] We first turn the probabilistic data into a matrix whose singular values can be studied. For each $n \geq 1$ and each $i \in \{1,\dots,n\}$, let $X_{n,i}:\Omega \to \mathbb{R}^{p_n}$ denote the $i$th data vector in the $n$th row of the triangular array. For each coordinate index $j \in \{1,\dots,p_n\}$, let $(X_{n,i})_j:\Omega \to \mathbb{R}$ denote the $j$th coordinate map. Define the random matrix $Z_n:\Omega \to \mathbb{R}^{n \times p_n}$ by declaring that, for every $\omega \in \Omega$, every $i \in \{1,\dots,n\}$, and every $j \in \{1,\dots,p_n\}$, \begin{align*} [Z_n(\omega)]_{ij}:=(X_{n,i}(\omega))_j. \end{align*} Because $X_{n,1},\dots,X_{n,n}$ are independent and each has distribution $\mathcal{N}(0,I_{p_n})$, their coordinates are independent standard Gaussian random variables. Hence the entries of $Z_n$ are independent standard Gaussian random variables. Now compute the Gram matrix. For every $\omega \in \Omega$, the $(j,k)$ entry of $Z_n(\omega)^\top Z_n(\omega)$ is the sum over the row index $i$ of $(X_{n,i}(\omega))_j(X_{n,i}(\omega))_k$. This is exactly the $(j,k)$ entry of the outer-product sum \begin{align*} \sum_{i=1}^{n} X_{n,i}(\omega) X_{n,i}(\omega)^\top. \end{align*} Therefore \begin{align*} Z_n(\omega)^\top Z_n(\omega) = \sum_{i=1}^{n} X_{n,i}(\omega) X_{n,i}(\omega)^\top. \end{align*} By the definition of the sample covariance matrix in the null model, this gives the identity \begin{align*} \hat{\Sigma}_n = \frac{1}{n} Z_n^\top Z_n. \end{align*} This exact identity is the bridge from sample covariance eigenvalues to singular values of the Gaussian data matrix. [/guided] [/step] [step:Identify the largest sample covariance eigenvalue with the squared top singular value] Define the normalized data matrix $A_n:\Omega \to \mathbb{R}^{n \times p_n}$ by $A_n(\omega):=n^{-1/2}Z_n(\omega)$ for every $\omega \in \Omega$. Define the largest-singular-value map $s_{\max}:\mathbb{R}^{n \times p_n} \to [0,\infty)$ by \begin{align*} s_{\max}(A) := \sup\{ |Av| : v \in \mathbb{R}^{p_n},\ |v|=1 \} \end{align*} for each deterministic matrix $A \in \mathbb{R}^{n \times p_n}$. Since $A_n^\top A_n = n^{-1}Z_n^\top Z_n = \hat{\Sigma}_n$, the variational characterization of the largest eigenvalue of a real symmetric positive semidefinite matrix gives \begin{align*} \lambda_{\max}(\hat{\Sigma}_n)=\sup\{ v^\top \hat{\Sigma}_n v : v \in \mathbb{R}^{p_n},\ |v|=1 \}. \end{align*} Substituting $\hat{\Sigma}_n=A_n^\top A_n$ gives \begin{align*} \lambda_{\max}(\hat{\Sigma}_n)=\sup\{ v^\top A_n^\top A_n v : v \in \mathbb{R}^{p_n},\ |v|=1 \}. \end{align*} For every $v \in \mathbb{R}^{p_n}$, the Euclidean [inner product](/page/Inner%20Product) identity gives $v^\top A_n^\top A_n v=|A_nv|^2$, so \begin{align*} \lambda_{\max}(\hat{\Sigma}_n)=\sup\{ |A_n v|^2 : v \in \mathbb{R}^{p_n},\ |v|=1 \}. \end{align*} Because the square function is increasing on $[0,\infty)$, this supremum is $s_{\max}(A_n)^2$. [guided] The point of introducing $A_n = n^{-1/2}Z_n$ is that the standard extreme singular value theorem is stated for the normalized noise matrix, not directly for $Z_n^\top Z_n$. We first connect the two spectral quantities exactly. Define the largest-singular-value map $s_{\max}:\mathbb{R}^{n \times p_n} \to [0,\infty)$ by \begin{align*} s_{\max}(A) := \sup\{ |Av| : v \in \mathbb{R}^{p_n},\ |v|=1 \} \end{align*} for each deterministic matrix $A \in \mathbb{R}^{n \times p_n}$. Applying this to $A_n = n^{-1/2}Z_n$, we have $A_n^\top A_n = n^{-1}Z_n^\top Z_n = \hat{\Sigma}_n$. The matrix $\hat{\Sigma}_n$ is real symmetric and positive semidefinite because, for every $v \in \mathbb{R}^{p_n}$, \begin{align*} v^\top \hat{\Sigma}_n v = v^\top A_n^\top A_n v = |A_n v|^2 \geq 0. \end{align*} Hence the variational characterization of the largest eigenvalue gives \begin{align*} \lambda_{\max}(\hat{\Sigma}_n)=\sup\{ v^\top \hat{\Sigma}_n v : v \in \mathbb{R}^{p_n},\ |v|=1 \}. \end{align*} Using $\hat{\Sigma}_n=A_n^\top A_n$ and $v^\top A_n^\top A_n v=|A_nv|^2$, this becomes \begin{align*} \lambda_{\max}(\hat{\Sigma}_n)=\sup\{ |A_n v|^2 : v \in \mathbb{R}^{p_n},\ |v|=1 \}. \end{align*} Since $|A_nv|\geq 0$ and the square function is increasing on $[0,\infty)$, we may square the supremum: \begin{align*} \lambda_{\max}(\hat{\Sigma}_n)=\left(\sup\{ |A_n v| : v \in \mathbb{R}^{p_n},\ |v|=1 \}\right)^2=s_{\max}(A_n)^2. \end{align*} Thus the largest eigenvalue of the sample covariance matrix is not merely asymptotic to a squared singular value; it is exactly the square of the top singular value of the normalized data matrix. [/guided] [/step] [step:Apply the Bai-Yin upper singular value limit] By the [Bai-Yin theorem for extreme singular values](/theorems/73768), applied to $Z_n \in \mathbb{R}^{n \times p_n}$ with $p_n/n \to \gamma \in (0,\infty)$, we have \begin{align*} s_{\max}(n^{-1/2}Z_n) \xrightarrow{\mathbb{P}} 1+\sqrt{\gamma}. \end{align*} The hypotheses are satisfied because the entries of $Z_n$ are independent, centered, variance-one Gaussian random variables, and the aspect ratio satisfies $p_n/n \to \gamma$ by assumption. [guided] This is the only probabilistic input in the proof. The [Bai-Yin theorem for extreme singular values](/theorems/73768) applies to rectangular random matrices whose entries are independent, centered, variance-one random variables with the required moment assumptions. In the present Gaussian null model, the entries of $Z_n$ are independent standard Gaussian random variables, hence they are centered, have variance one, and have moments of every finite order. The rectangular aspect ratio condition in the theorem is also satisfied because $p_n/n \to \gamma$ with $\gamma \in (0,\infty)$. Therefore the theorem gives \begin{align*} s_{\max}(n^{-1/2}Z_n) \xrightarrow{\mathbb{P}} 1+\sqrt{\gamma}. \end{align*} This converts the high-dimensional random matrix input into the single scalar convergence needed for the final step. [/guided] [/step] [step:Square the singular value convergence to obtain the Marchenko-Pastur edge] Define the real-valued [random variable](/page/Random%20Variable) $S_n:\Omega \to [0,\infty)$ by $S_n(\omega):=s_{\max}(n^{-1/2}Z_n(\omega))$ for every $\omega \in \Omega$. The previous step gives $S_n \xrightarrow{\mathbb{P}} 1+\sqrt{\gamma}$. Let $a := 1+\sqrt{\gamma}$. Since $\gamma \in (0,\infty)$, we have $a>0$. We prove that $S_n^2 \xrightarrow{\mathbb{P}} a^2$. Fix $\varepsilon>0$. Choose \begin{align*} \delta := \min\left\{1,\frac{\varepsilon}{2a+1}\right\}. \end{align*} On the event $\{|S_n-a|<\delta\}$, we have $S_n < a+1$, and therefore \begin{align*} |S_n^2-a^2|=|S_n-a|(S_n+a)<\delta(2a+1)\leq \varepsilon. \end{align*} Hence \begin{align*} \mathbb{P}\left(|S_n^2-a^2|\geq \varepsilon\right)\leq\mathbb{P}\left(|S_n-a|\geq \delta\right)\to 0. \end{align*} Thus \begin{align*} S_n^2 \xrightarrow{\mathbb{P}} (1+\sqrt{\gamma})^2. \end{align*} Using $\lambda_{\max}(\hat{\Sigma}_n)=S_n^2$ from the preceding identification, we obtain \begin{align*} \lambda_{\max}(\hat{\Sigma}_n) \xrightarrow{\mathbb{P}} (1+\sqrt{\gamma})^2. \end{align*} This is the claimed Marchenko-Pastur upper edge limit. [guided] The Bai-Yin step gives convergence in probability of the singular values, but the theorem is stated for the squared quantity. Define $S_n:\Omega \to [0,\infty)$ by $S_n(\omega):=s_{\max}(n^{-1/2}Z_n(\omega))$. We know that $S_n \xrightarrow{\mathbb{P}} a$, where $a:=1+\sqrt{\gamma}$. Since $\gamma \in (0,\infty)$, $a$ is a finite positive real number. We now check directly that squaring preserves this convergence in probability. Fix $\varepsilon>0$ and set \begin{align*} \delta := \min\left\{1,\frac{\varepsilon}{2a+1}\right\}. \end{align*} If $|S_n-a|<\delta$, then $S_n<a+1$ because $\delta\leq 1$. Hence \begin{align*} |S_n^2-a^2|=|S_n-a|(S_n+a)<\delta(2a+1)\leq\varepsilon. \end{align*} Therefore the event $\{|S_n^2-a^2|\geq\varepsilon\}$ is contained in the event $\{|S_n-a|\geq\delta\}$. Taking probabilities gives \begin{align*} \mathbb{P}\left(|S_n^2-a^2|\geq \varepsilon\right)\leq\mathbb{P}\left(|S_n-a|\geq \delta\right). \end{align*} The right-hand side tends to $0$ because $S_n \xrightarrow{\mathbb{P}} a$ and $\delta>0$ is fixed. Hence $S_n^2 \xrightarrow{\mathbb{P}} a^2=(1+\sqrt{\gamma})^2$. Finally, the spectral identification proved earlier gives $\lambda_{\max}(\hat{\Sigma}_n)=S_n^2$, so \begin{align*} \lambda_{\max}(\hat{\Sigma}_n) \xrightarrow{\mathbb{P}} (1+\sqrt{\gamma})^2. \end{align*} [/guided] [/step]

Prerequisites (0/8 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Distribution Definition Variance Definition Event Definition Function Definition Sequence Definition Matrix Definition Limit Definition Set Definition Strong Markov Property of Brownian Motion Brownian Motion $L^2$ Integrability Implies $L^1$ Integrability Probability Theory Total Variation Bound by Chi-Square Divergence Probability & Statistics Joint Distribution of Brownian Motion and its Maximum Brownian Motion Fano Testing-to-Estimation Reduction Probability & Statistics Integration with Respect to a Poisson Random Measure Poisson Processes Chi-Squared Distribution of the Residual Sum of Squares Probability & Statistics Gaussian Score Bound for Fixed-Design Linear Regression Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.