Tracy-Widom Limit for the Largest Eigenvalue of a Real Wishart Matrix

Tracy-Widom Limit for the Largest Eigenvalue of a Real Wishart Matrix (Theorem # 4073)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We reduce the statement to the standard real Wishart soft-edge theorem of Johnstone, which gives Tracy-Widom $\beta = 1$ fluctuations for the largest eigenvalue of a real Gaussian sample covariance matrix with the precise centering and scaling used here. The only substantive verification is that the present matrix model is exactly the real Wishart/Laguerre orthogonal ensemble covered by that theorem and that the normalization is transferred from $X_n^\top X_n$ to $n^{-1}X_n^\top X_n$. The aspect-ratio hypothesis $p_n/n \to c \in (0,\infty)$ ensures that the Johnstone hypotheses apply and that the scale is of order $n^{-2/3}$ for the eigenvalues of $S_n$. [/proofplan] [step:Name the largest eigenvalue and the Wishart normalization] For each $n$, let $p_n \in \mathbb{N}$ be the column dimension of $X_n$, and let \begin{align*} W_n := X_n^\top X_n \in \mathbb{R}^{p_n \times p_n} \end{align*} denote the unnormalised real Wishart matrix. Let $\Lambda_{1,n}$ denote the largest eigenvalue of $W_n$, and let $\lambda_{1,n}$ denote the largest eigenvalue of $S_n = n^{-1}W_n$. Since scalar multiplication by $n^{-1}$ multiplies every eigenvalue by $n^{-1}$, we have \begin{align*} \lambda_{1,n} = n^{-1}\Lambda_{1,n}. \end{align*} The entries of $X_n$ are independent real standard Gaussian random variables, so $W_n$ is the real Wishart matrix with identity covariance and parameters $(n,p_n)$. If $p_n > n$, then $W_n = X_n^\top X_n$ has $p_n - n$ additional zero eigenvalues, while $X_nX_n^\top$ and $X_n^\top X_n$ have the same non-zero eigenvalues with the same multiplicities by the singular-value decomposition. Hence the largest eigenvalue is unaffected by passing between $X_n^\top X_n$ and $X_nX_n^\top$. [/step] [step:Apply the real Wishart soft-edge theorem] Define \begin{align*} \widetilde{\mu}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})^2, \\ \widetilde{\sigma}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})(1/\sqrt{n - 1/2} + 1/\sqrt{p_n - 1/2})^{1/3}. \end{align*} By the [Johnstone Tracy-Widom theorem for real Wishart matrices](https://doi.org/10.1214/aos/1009210544), applied to the real Gaussian matrix $X_n$ with identity covariance and aspect ratio $p_n/n \to c \in (0,\infty)$, the largest eigenvalue $\Lambda_{1,n}$ of $W_n = X_n^\top X_n$ satisfies the asserted soft-edge limit. If the theorem is stated with the smaller matrix dimension first, then in the case $p_n>n$ we apply it to $X_nX_n^\top$ instead; the preceding singular-value argument identifies its largest eigenvalue with $\Lambda_{1,n}$, and the displayed centering and scaling are symmetric in $n$ and $p_n$. Thus \begin{align*} \frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n} \xrightarrow{d} \xi, \end{align*} where $\xi$ has the Tracy-Widom distribution $F_1$. [guided] The external theorem we need is the real Wishart soft-edge limit, often called Johnstone's Tracy-Widom theorem for the largest eigenvalue of a real Wishart matrix. Its hypotheses are: the matrix entries are independent real standard Gaussian random variables, the population covariance is the identity matrix, and the dimensions satisfy $p_n/n \to c$ for some $c \in (0,\infty)$. These are exactly the hypotheses in the theorem statement: $X_n$ has independent $\mathcal N(0,1)$ entries, so $X_n^\top X_n$ is a real identity-covariance Wishart matrix, and the aspect ratio assumption is $p_n/n \to c \in (0,\infty)$. There is one dimensional convention to check. Some formulations state the real Wishart soft-edge theorem for the non-zero eigenvalues of the sample covariance matrix with the smaller dimension listed first. If $p_n \le n$, this is directly $X_n^\top X_n$. If $p_n > n$, then $X_n^\top X_n$ has $p_n-n$ zero eigenvalues, but the singular-value decomposition gives the same positive eigenvalues for $X_n^\top X_n$ and $X_nX_n^\top$, with the same multiplicities. Therefore the largest eigenvalue is the same in the two matrices. Since the centering $\widetilde{\mu}_n$ and scaling $\widetilde{\sigma}_n$ are symmetric in $n$ and $p_n$, applying Johnstone's theorem to $X_nX_n^\top$ in the case $p_n>n$ gives exactly the same displayed normalization. The theorem gives the following convergence for the largest eigenvalue $\Lambda_{1,n}$ of the unnormalised Wishart matrix $W_n = X_n^\top X_n$: \begin{align*} \frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n} \xrightarrow{d} \xi, \end{align*} where $\xi$ has distribution function $F_1$, the Tracy-Widom law with parameter $\beta = 1$. The centering and scaling are \begin{align*} \widetilde{\mu}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})^2, \\ \widetilde{\sigma}_n &:= (\sqrt{n - 1/2} + \sqrt{p_n - 1/2})(1/\sqrt{n - 1/2} + 1/\sqrt{p_n - 1/2})^{1/3}. \end{align*} This is the only deep input in the proof: the Pfaffian kernel formula, the Laguerre asymptotics at the soft edge, and the Airy kernel identification are precisely the content of the cited soft-edge theorem, not additional unproved steps here. [/guided] [/step] [step:Transfer the convergence from $W_n$ to $S_n$] Define the normalized centering and scaling constants \begin{align*} \mu_n &:= n^{-1}\widetilde{\mu}_n, \\ \sigma_n &:= n^{-1}\widetilde{\sigma}_n. \end{align*} Using $\lambda_{1,n} = n^{-1}\Lambda_{1,n}$, we compute \begin{align*} \frac{\lambda_{1,n} - \mu_n}{\sigma_n} &= \frac{n^{-1}\Lambda_{1,n} - n^{-1}\widetilde{\mu}_n}{n^{-1}\widetilde{\sigma}_n} \\ &= \frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n}. \end{align*} The right-hand side converges in distribution to $F_1$ by the previous step, so \begin{align*} \frac{\lambda_{1,n} - \mu_n}{\sigma_n} \xrightarrow{d} \xi, \end{align*} where $\xi$ has distribution function $F_1$. [guided] The cited soft-edge theorem is stated for the unnormalised Wishart matrix $W_n=X_n^\top X_n$, while the theorem statement uses the normalized sample covariance matrix $S_n=n^{-1}W_n$. We therefore transfer the normalization explicitly. Define \begin{align*} \mu_n &:= n^{-1}\widetilde{\mu}_n, \\ \sigma_n &:= n^{-1}\widetilde{\sigma}_n. \end{align*} Since multiplying a matrix by the scalar $n^{-1}$ multiplies each of its eigenvalues by $n^{-1}$, the largest eigenvalues satisfy $\lambda_{1,n}=n^{-1}\Lambda_{1,n}$. Substituting this identity and the definitions of $\mu_n$ and $\sigma_n$ gives \begin{align*} \frac{\lambda_{1,n} - \mu_n}{\sigma_n} &= \frac{n^{-1}\Lambda_{1,n} - n^{-1}\widetilde{\mu}_n}{n^{-1}\widetilde{\sigma}_n} \\ &= \frac{\Lambda_{1,n} - \widetilde{\mu}_n}{\widetilde{\sigma}_n}. \end{align*} The right-hand side converges in distribution to $\xi$ by Johnstone's theorem as applied in the previous step. Hence \begin{align*} \frac{\lambda_{1,n} - \mu_n}{\sigma_n} \xrightarrow{d} \xi, \end{align*} where $\xi$ has distribution function $F_1$. [/guided] [/step] [step:Identify the Marchenko-Pastur edge scale] The upper Marchenko-Pastur edge for the normalized covariance matrix $S_n = n^{-1}X_n^\top X_n$ is \begin{align*} b_c := (1 + \sqrt{c})^2. \end{align*} Since $p_n/n \to c$, we have \begin{align*} \mu_n = \frac{(\sqrt{n - 1/2} + \sqrt{p_n - 1/2})^2}{n} \longrightarrow (1 + \sqrt{c})^2 = b_c. \end{align*} Moreover \begin{align*} \sigma_n = \frac{(\sqrt{n - 1/2} + \sqrt{p_n - 1/2})(1/\sqrt{n - 1/2} + 1/\sqrt{p_n - 1/2})^{1/3}}{n}, \end{align*} and the first factor is of order $n^{1/2}$ while the second factor is of order $n^{-1/6}$. Hence $\sigma_n$ is of order $n^{-2/3}$. This is exactly the centering at the upper Marchenko-Pastur edge and the soft-edge scaling asserted in the theorem. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Bartlett's Chi-Squared Approximation for Wilks' Lambda in One-Way MANOVA probability Wiener–Khinchin Theorem probability Covariance Formula for the Gaussian Factor Model probability Population Canonical Correlations via Singular Values probability Two-Sample Hotelling $T^2$ Distribution Theorem probability Best Linear Predictor and Residual Covariance Formula probability Root-Pattern Sensitivity of Classical One-Way MANOVA Criteria probability Independence of the Sample Mean and Sample Covariance for Multivariate Normal Samples probability

What brings you to Androma?

Start with a route through the knowledge graph.