Anderson's Asymptotic Normality Theorem for Sample Covariance Eigenvalues (Theorem # 4035)
Theorem
Let $p \in \mathbb{N}$ be fixed. Let $X_1, X_2, \dots$ be independent identically distributed random vectors with $X_i \sim \mathcal{N}_p(\mu,\Sigma)$, where $\mu \in \mathbb{R}^p$ and $\Sigma \in \mathbb{R}^{p \times p}$ is symmetric positive definite. Let the eigenvalues of $\Sigma$ be distinct and ordered
\begin{align*}
\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0.
\end{align*}
For each $n \geq 2$, define the unbiased sample covariance matrix
\begin{align*}
S_n := \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\overline{X}_n)(X_i-\overline{X}_n)^\top,
\qquad
\overline{X}_n := \frac{1}{n}\sum_{i=1}^{n}X_i.
\end{align*}
Let $\hat{\lambda}_{1,n} \geq \cdots \geq \hat{\lambda}_{p,n}$ denote the eigenvalues of $S_n$, repeated according to multiplicity and listed in decreasing order. For $n>p$ these eigenvalues are almost surely distinct, so the inequalities are then almost surely strict.
Then, as $n \to \infty$,
\begin{align*}
\left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right)
\xrightarrow{d}
(Z_1,\dots,Z_p),
\end{align*}
where $Z_1,\dots,Z_p$ are independent real-valued Gaussian random variables satisfying
\begin{align*}
Z_k \sim \mathcal{N}(0,2\lambda_k^2)
\end{align*}
for each $k \in \{1,\dots,p\}$. In particular, for each fixed $k$,
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2).
\end{align*}
Discussion
This theorem gives the asymptotic normal distribution of the ordered sample covariance eigenvalues under a multivariate normal model with distinct population eigenvalues. It provides the large-sample fluctuation theory for principal component eigenvalues.
Proof
[proofplan]
We rotate the data into an orthonormal eigenbasis of $\Sigma$, where the population covariance matrix is diagonal. In that basis, the diagonal entries of the unbiased sample covariance matrix are independent sample variances of one-dimensional normal samples, so their scaled fluctuations converge jointly to independent Gaussian limits. We then prove that, at a symmetric matrix with distinct eigenvalues, the first-order perturbation of the $k$-th eigenvalue is exactly the $k$-th diagonal entry of the perturbation in the eigenbasis, with a remainder that is quadratic in the size of the perturbation. Since the whole sample covariance perturbation is of order $n^{-1/2}$ in probability, this remainder is negligible after scaling by $\sqrt{n}$, and Slutsky's theorem gives the joint limit for the sample eigenvalues.
[/proofplan]
[step:Rotate the observations into the population eigenbasis]
Let $(\Omega,\mathcal{F},\mathbb{P})$ denote the probability space on which the random vectors $X_1,X_2,\dots$ are defined, and let $I_p\in\mathbb{R}^{p\times p}$ denote the identity matrix. Let $S_n$ denote the unbiased [sample covariance](/page/Sample%20Covariance) matrix formed from $X_1,\dots,X_n$, and let $\hat\lambda_{1,n}\ge\cdots\ge\hat\lambda_{p,n}$ denote its ordered sample [eigenvalues](/page/Eigenvalue). Since $\Sigma$ is symmetric positive definite, the [Spectral Theorem](/page/Spectral%20Theorem) gives an orthogonal matrix $\Gamma \in \mathbb{R}^{p \times p}$, whose $k$-th column is a unit eigenvector of $\Sigma$ for $\lambda_k$, such that
\begin{align*}
\Gamma^\top \Sigma \Gamma = \Lambda,
\qquad
\Lambda := \operatorname{diag}(\lambda_1,\dots,\lambda_p).
\end{align*}
For each $i \in \mathbb{N}$, define the rotated random vector
\begin{align*}
Y_i: \Omega &\to \mathbb{R}^p \\
\omega &\mapsto \Gamma^\top(X_i(\omega)-\mu).
\end{align*}
Affine transformations preserve [multivariate normal](/page/Multivariate%20Normal%20Distribution) distributions, and $\operatorname{Cov}(Y_i)=\Gamma^\top\Sigma\Gamma=\Lambda$. Each $Y_i$ is also a function of $X_i$ alone. Hence the random vectors $Y_1,Y_2,\dots$ are independent identically distributed with
\begin{align*}
Y_i \sim \mathcal{N}_p(0,\Lambda).
\end{align*}
Since $\Lambda$ is diagonal and $Y_i$ is multivariate normal, the coordinate random variables $Y_{i1},\dots,Y_{ip}$ are independent and satisfy
\begin{align*}
Y_{ik} \sim \mathcal{N}(0,\lambda_k)
\end{align*}
for each $k \in \{1,\dots,p\}$.
Define the rotated sample covariance matrix
\begin{align*}
T_n := \Gamma^\top S_n \Gamma.
\end{align*}
Since orthogonal conjugation preserves [eigenvalues](/page/Eigenvalue), $T_n$ and $S_n$ have the same ordered eigenvalues. Writing
\begin{align*}
\overline{Y}_n := \frac{1}{n}\sum_{i=1}^{n}Y_i,
\end{align*}
we have $Y_i-\overline{Y}_n=\Gamma^\top(X_i-\overline{X}_n)$ for each $i$, and hence
\begin{align*}
T_n
= \frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\overline{Y}_n)(Y_i-\overline{Y}_n)^\top.
\end{align*}
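The rotation step can be checked numerically. The following minimal plain-Python sketch treats the illustrative case $p=2$. The helper names (`sample_cov`, `rotation_check` and the others) and the parameters `theta`, `lam` and `mu` are hypothetical choices made only for this example.

The sketch proceeds as follows:
- It builds $\Gamma$ as a rotation matrix.
- It draws $X_i=\mu+\Gamma z_i$ with $z_i\sim\mathcal{N}_2(0,\Lambda)$, so that $\Sigma=\Gamma\Lambda\Gamma^\top$.
- It forms $Y_i=\Gamma^\top(X_i-\mu)$.
- It compares $T_n$, computed from the $Y_i$, with $\Gamma^\top S_n\Gamma$ and with the eigenvalues of $S_n$.

```python
import math
import random

# Hypothetical helpers for the illustrative case p = 2.


def sample_cov(points):
    """Unbiased 2x2 sample covariance matrix of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def eigvals_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first (closed form)."""
    mid = (m[0][0] + m[1][1]) / 2
    rad = math.hypot((m[0][0] - m[1][1]) / 2, m[0][1])
    return mid + rad, mid - rad


def rotation_check(n=50, theta=0.6, lam=(4.0, 1.0), mu=(1.0, -2.0), seed=0):
    """Return (S_n, T_n, Gamma^T S_n Gamma) for one simulated sample.

    theta, lam and mu are illustrative assumptions, not part of the theorem.
    """
    rng = random.Random(seed)
    c, s = math.cos(theta), math.sin(theta)
    gamma = [[c, -s], [s, c]]  # orthogonal; column k is an eigenvector for lam[k]
    xs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, math.sqrt(lam[0]))
        z2 = rng.gauss(0.0, math.sqrt(lam[1]))
        # X_i = mu + Gamma z_i with z_i ~ N(0, Lambda), so Sigma = Gamma Lambda Gamma^T
        xs.append((mu[0] + c * z1 - s * z2, mu[1] + s * z1 + c * z2))
    # Rotated data Y_i = Gamma^T (X_i - mu)
    ys = [(c * (x - mu[0]) + s * (y - mu[1]), -s * (x - mu[0]) + c * (y - mu[1]))
          for x, y in xs]
    s_n = sample_cov(xs)
    t_n = sample_cov(ys)
    t_conj = matmul(transpose(gamma), matmul(s_n, gamma))
    return s_n, t_n, t_conj
```

Calling `rotation_check()` returns three matrices. Up to floating-point rounding, `t_n` and `t_conj` should agree entrywise, and `eigvals_sym2(s_n)` should agree with `eigvals_sym2(t_n)`. The closed-form $2\times 2$ eigenvalue formula is used only to avoid third-party dependencies.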
[guided]
Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space on which $X_1,X_2,\dots$ are defined, and let $I_p\in\mathbb{R}^{p\times p}$ be the identity matrix. The purpose of the rotation is to replace the general covariance matrix $\Sigma$ by a diagonal covariance matrix without changing the eigenvalues of the sample covariance matrix. Since $\Sigma$ is symmetric positive definite, the [Spectral Theorem](/page/Spectral%20Theorem) gives an orthogonal matrix $\Gamma \in \mathbb{R}^{p \times p}$, whose $k$-th column is a unit eigenvector of $\Sigma$ for $\lambda_k$, with
\begin{align*}
\Gamma^\top \Sigma \Gamma = \Lambda,
\qquad
\Lambda := \operatorname{diag}(\lambda_1,\dots,\lambda_p).
\end{align*}
For each observation we define the centered and rotated random vector
\begin{align*}
Y_i: \Omega &\to \mathbb{R}^p \\
\omega &\mapsto \Gamma^\top(X_i(\omega)-\mu).
\end{align*}
Because $X_i \sim \mathcal{N}_p(\mu,\Sigma)$, the vector $X_i-\mu$ has [multivariate normal](/page/Multivariate%20Normal%20Distribution) distribution $\mathcal{N}_p(0,\Sigma)$. Applying the [linear map](/page/Linear%20Map) $\Gamma^\top$ gives
\begin{align*}
Y_i \sim \mathcal{N}_p(0,\Gamma^\top\Sigma\Gamma)=\mathcal{N}_p(0,\Lambda).
\end{align*}
The matrix $\Lambda$ is diagonal, so the coordinates $Y_{i1},\dots,Y_{ip}$ are independent normal random variables, with $Y_{ik}\sim\mathcal{N}(0,\lambda_k)$. Independence across the index $i$ is preserved because each $Y_i$ is a measurable function of $X_i$ alone.
Now define
\begin{align*}
T_n := \Gamma^\top S_n \Gamma.
\end{align*}
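This is the sample covariance matrix in the rotated coordinates. Indeed, writing $\overline{Y}_n:=\frac{1}{n}\sum_{i=1}^{n}Y_i$, we have $Y_i-\overline{Y}_n=\Gamma^\top(X_i-\overline{X}_n)$, so
\begin{align*}
T_n=\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\overline{Y}_n)(Y_i-\overline{Y}_n)^\top.
\end{align*}
Orthogonal conjugation preserves characteristic polynomials, since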
\begin{align*}
\det(T_n-tI_p)
= \det(\Gamma^\top(S_n-tI_p)\Gamma)
= \det(S_n-tI_p),
\end{align*}
so $T_n$ and $S_n$ have the same eigenvalues. Therefore it is enough to prove the theorem for $T_n$.
[/guided]
[/step]
[step:Identify the diagonal sample variance fluctuations]
For each $k \in \{1,\dots,p\}$, define the one-dimensional sample mean and sample variance
\begin{align*}
\overline{Y}_{k,n} := \frac{1}{n}\sum_{i=1}^{n}Y_{ik},
\qquad
V_{k,n} := \frac{1}{n-1}\sum_{i=1}^{n}(Y_{ik}-\overline{Y}_{k,n})^2.
\end{align*}
Then $(T_n)_{kk}=V_{k,n}$.
Since $Y_{1k},\dots,Y_{nk}$ is an independent sample from $\mathcal{N}(0,\lambda_k)$, the scaled sample variance satisfies
\begin{align*}
\frac{(n-1)V_{k,n}}{\lambda_k}\sim \chi^2_{n-1},
\end{align*}
and the random variables $V_{1,n},\dots,V_{p,n}$ are independent because they are functions of the independent coordinate samples $(Y_{1k},\dots,Y_{nk})$.
Let $G_{k,n}$ be defined by
\begin{align*}
G_{k,n}:=\sqrt{n}(V_{k,n}-\lambda_k).
\end{align*}
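Let $Q_{k,n}:=(n-1)V_{k,n}/\lambda_k$, and let $N_{1,k},N_{2,k},\dots$ be independent standard normal random variables, possibly defined on an auxiliary probability space. Since $Q_{k,n}\sim\chi^2_{n-1}$, for every $n$ we have
\begin{align*}
Q_{k,n}\overset{d}{=}\sum_{m=1}^{n-1}N_{m,k}^2.
\end{align*}
Convergence in distribution depends only on the laws of the random variables involved, so it suffices to study the right-hand side.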
The variables $N_{m,k}^2-1$ are independent, centered, and have variance $2$, so the [Central Limit Theorem](/page/Central%20Limit%20Theorem) gives
\begin{align*}
\frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}\xrightarrow{d}\mathcal{N}(0,1).
\end{align*}
Since
\begin{align*}
G_{k,n}
=\lambda_k\sqrt{n}\left(\frac{Q_{k,n}}{n-1}-1\right)
=\lambda_k\sqrt{\frac{2n}{n-1}}\,\frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}},
\end{align*}
and $\sqrt{2n/(n-1)}\to\sqrt{2}$, [Slutsky's Theorem](/page/Slutsky%27s%20Theorem) gives
\begin{align*}
G_{k,n}\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2).
\end{align*}
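Since $G_{1,n},\dots,G_{p,n}$ are independent for each $n$, their joint characteristic function is the product of the marginal characteristic functions. The marginal limits therefore yield the joint convergence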
\begin{align*}
(G_{1,n},\dots,G_{p,n})
\xrightarrow{d}
(Z_1,\dots,Z_p),
\end{align*}
where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$.
This is the chi-square form of the [Central Limit Theorem](/page/Central%20Limit%20Theorem), equivalently the [central limit theorem](/theorems/1848) applied to standardized squared normal variables.
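The chi-square computation can be illustrated by simulation. This sketch rests on illustrative assumptions: $\lambda_k=3$, $n=200$, 2000 replications and fixed seeds. The function names are hypothetical.

The sketch has two parts:
- `identity_gap` checks the algebraic identity expressing $G_{k,n}$ in terms of $Q_{k,n}$.
- `mc_variance_of_g` estimates $\operatorname{Var}(G_{k,n})$. Its exact value is $2\lambda_k^2 n/(n-1)$, because $\operatorname{Var}(\chi^2_{n-1})=2(n-1)$.

```python
import math
import random

# Hypothetical helpers; lam (= lambda_k) and n are illustrative choices.


def g_stat(n, lam, rng):
    """Return (G, V) for one sample: V is the unbiased sample variance of n draws
    from N(0, lam) and G = sqrt(n) * (V - lam)."""
    ys = [rng.gauss(0.0, math.sqrt(lam)) for _ in range(n)]
    m = sum(ys) / n
    v = sum((y - m) ** 2 for y in ys) / (n - 1)
    return math.sqrt(n) * (v - lam), v


def identity_gap(n, lam, v):
    """|sqrt(n)(V - lam) - lam*sqrt(2n/(n-1))*(Q-(n-1))/sqrt(2(n-1))| with Q = (n-1)V/lam."""
    q = (n - 1) * v / lam
    lhs = math.sqrt(n) * (v - lam)
    rhs = lam * math.sqrt(2 * n / (n - 1)) * (q - (n - 1)) / math.sqrt(2 * (n - 1))
    return abs(lhs - rhs)


def mc_variance_of_g(n=200, lam=3.0, reps=2000, seed=0):
    """Monte Carlo estimate of Var(G); the exact value is 2 * lam**2 * n / (n - 1)."""
    rng = random.Random(seed)
    gs = [g_stat(n, lam, rng)[0] for _ in range(reps)]
    mean = sum(gs) / reps
    return sum((g - mean) ** 2 for g in gs) / (reps - 1)
```

With `lam=3.0` the estimate should be close to $2\lambda_k^2=18$, up to a Monte Carlo error of a few percent.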
[guided]
We now compute the first-order fluctuations of the diagonal entries of $T_n$. For each coordinate $k$, define
\begin{align*}
\overline{Y}_{k,n} := \frac{1}{n}\sum_{i=1}^{n}Y_{ik},
\qquad
V_{k,n} := \frac{1}{n-1}\sum_{i=1}^{n}(Y_{ik}-\overline{Y}_{k,n})^2.
\end{align*}
By the formula for $T_n$, its $k$-th diagonal entry is exactly
\begin{align*}
(T_n)_{kk}
= \frac{1}{n-1}\sum_{i=1}^{n}(Y_{ik}-\overline{Y}_{k,n})^2
= V_{k,n}.
\end{align*}
The variables $Y_{1k},\dots,Y_{nk}$ form an independent sample from $\mathcal{N}(0,\lambda_k)$. For a one-dimensional normal sample, the unbiased sample variance has the chi-square representation
\begin{align*}
\frac{(n-1)V_{k,n}}{\lambda_k}\sim \chi^2_{n-1}.
\end{align*}
Moreover, for different values of $k$, the coordinate samples $(Y_{1k},\dots,Y_{nk})$ are independent because the covariance matrix $\Lambda$ is diagonal and the vectors $Y_i$ are jointly normal. Hence $V_{1,n},\dots,V_{p,n}$ are independent.
Define
\begin{align*}
G_{k,n}:=\sqrt{n}(V_{k,n}-\lambda_k).
\end{align*}
Writing $Q_{k,n}:=(n-1)V_{k,n}/\lambda_k$, we have $Q_{k,n}\sim \chi^2_{n-1}$ and
\begin{align*}
G_{k,n}
= \lambda_k\sqrt{n}\left(\frac{Q_{k,n}}{n-1}-1\right)
= \lambda_k\sqrt{\frac{2n}{n-1}}\,\frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}.
\end{align*}
The [Central Limit Theorem](/page/Central%20Limit%20Theorem) applies to the representation of $Q_{k,n}$ in distribution as a sum of $n-1$ independent squared standard normal variables, each with mean $1$ and variance $2$. It says that
\begin{align*}
\frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}\xrightarrow{d}\mathcal{N}(0,1).
\end{align*}
Since $\sqrt{2n/(n-1)}\to\sqrt{2}$, [Slutsky's Theorem](/page/Slutsky%27s%20Theorem) then gives $G_{k,n}\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2)$ in the sense of [convergence in distribution](/page/Convergence%20in%20Distribution). Because the random variables $G_{1,n},\dots,G_{p,n}$ are independent for each $n$, their joint characteristic function factors into the product of the marginal characteristic functions. Passing to the limit and applying Lévy's continuity theorem gives the joint convergence
\begin{align*}
(G_{1,n},\dots,G_{p,n})
\xrightarrow{d}
(Z_1,\dots,Z_p),
\end{align*}
where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$.
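This step uses the chi-square form of the [Central Limit Theorem](/page/Central%20Limit%20Theorem). Equivalently, one can apply the classical [central limit theorem](/theorems/521) to the centered variables $(Y_{ik}^2-\lambda_k)/\lambda_k$. The two routes agree because $(n-1)V_{k,n}=\sum_{i=1}^{n}Y_{ik}^2-n\overline{Y}_{k,n}^2$ and $\sqrt{n}\,\overline{Y}_{k,n}^2\xrightarrow{\mathbb{P}}0$. Consequently, replacing $V_{k,n}$ by $\frac{1}{n}\sum_{i=1}^{n}Y_{ik}^2$ changes $G_{k,n}$ only by a term tending to $0$ in probability.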
[/guided]
[/step]
[step:Control the size of the whole covariance perturbation]
Define the symmetric perturbation matrix
\begin{align*}
E_n := T_n-\Lambda.
\end{align*}
For random matrices $A_n$ and positive deterministic numbers $a_n$, the notation $A_n=O_{\mathbb{P}}(a_n)$ means that the family $\{\|A_n\|_{\mathrm{op}}/a_n:n\in\mathbb{N}\}$ is tight, where $\|\cdot\|_{\mathrm{op}}$ is the [operator norm](/page/Operator%20Norm). The notation $A_n=o_{\mathbb{P}}(a_n)$ means that $\|A_n\|_{\mathrm{op}}/a_n\xrightarrow{\mathbb{P}}0$ in the sense of [convergence in probability](/page/Convergence%20in%20Probability).
For $a,b \in \{1,\dots,p\}$, define the coordinate sample mean
\begin{align*}
\overline{Y}_{a,n}:=\frac{1}{n}\sum_{i=1}^{n}Y_{ia}.
\end{align*}
The $(a,b)$ entry of $T_n$ is
\begin{align*}
(T_n)_{ab}
&=\frac{1}{n-1}\sum_{i=1}^{n}(Y_{ia}-\overline{Y}_{a,n})(Y_{ib}-\overline{Y}_{b,n}) \\
&=\frac{1}{n-1}\sum_{i=1}^{n}Y_{ia}Y_{ib}-\frac{n}{n-1}\overline{Y}_{a,n}\overline{Y}_{b,n}.
\end{align*}
Since $\mathbb{E}[Y_{ia}Y_{ib}]=\lambda_a\mathbb{1}_{\{a=b\}}$, the [Central Limit Theorem](/page/Central%20Limit%20Theorem) applied to the centered variables $Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]$ gives
\begin{align*}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]\right)=O_{\mathbb{P}}(1),
\end{align*}
This holds because normal random variables have finite fourth moments, so $Y_{ia}Y_{ib}$ has finite variance, and because convergence in distribution implies tightness. Also $\sqrt{n}\,\overline{Y}_{a,n}=O_{\mathbb{P}}(1)$ and $\sqrt{n}\,\overline{Y}_{b,n}=O_{\mathbb{P}}(1)$ by the same theorem applied to $Y_{ia}$ and $Y_{ib}$, so
\begin{align*}
\sqrt{n}\,\overline{Y}_{a,n}\overline{Y}_{b,n}=O_{\mathbb{P}}(n^{-1/2})=O_{\mathbb{P}}(1).
\end{align*}
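Since $(E_n)_{ab}=(T_n)_{ab}-\lambda_a\mathbb{1}_{\{a=b\}}$, the expansion of $(T_n)_{ab}$ above gives
\begin{align*}
\sqrt{n}(E_n)_{ab}
&=\frac{n}{n-1}\cdot\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]\right)
+\frac{\sqrt{n}}{n-1}\lambda_a\mathbb{1}_{\{a=b\}}
-\frac{n}{n-1}\sqrt{n}\,\overline{Y}_{a,n}\overline{Y}_{b,n}.
\end{align*}
The three terms on the right behave as follows:
- The first term is $O_{\mathbb{P}}(1)$.
- The second term is deterministic and tends to $0$.
- The third term is $O_{\mathbb{P}}(n^{-1/2})$.

Thus
\begin{align*}
\sqrt{n}(E_n)_{ab}=O_{\mathbb{P}}(1)
\end{align*}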
for every $a,b \in \{1,\dots,p\}$. Since $p$ is fixed and $\|A\|_{\mathrm{op}}\le p\max_{a,b}|A_{ab}|$ for every $A\in\mathbb{R}^{p\times p}$, entrywise tightness of the $p^2$ entries implies tightness of the operator norm. In particular,
\begin{align*}
\|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2}).
\end{align*}
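The $n^{-1/2}$ order can be seen empirically. The sketch below is an illustration for $p=2$ and $\Lambda=\operatorname{diag}(4,1)$; both choices are assumptions, and the helper names are hypothetical. It simulates directly in the rotated coordinates and reports the median of $\sqrt{n}\,\|E_n\|_{\mathrm{op}}$ for several values of $n$. For a symmetric $2\times 2$ matrix, the operator norm is the largest absolute eigenvalue, which has a closed form.

```python
import math
import random
import statistics

# Hypothetical helpers for p = 2; lam = (lambda_1, lambda_2) is an illustrative choice.


def op_norm_sym2(a, b, c):
    """Operator norm of the symmetric matrix [[a, b], [b, c]] (largest |eigenvalue|)."""
    return abs((a + c) / 2) + math.hypot((a - c) / 2, b)


def scaled_error_norm(n, lam, rng):
    """sqrt(n) * ||T_n - Lambda||_op for one sample Y_1, ..., Y_n ~ N_2(0, diag(lam))."""
    y1 = [rng.gauss(0.0, math.sqrt(lam[0])) for _ in range(n)]
    y2 = [rng.gauss(0.0, math.sqrt(lam[1])) for _ in range(n)]
    m1, m2 = sum(y1) / n, sum(y2) / n
    t11 = sum((u - m1) ** 2 for u in y1) / (n - 1)
    t22 = sum((v - m2) ** 2 for v in y2) / (n - 1)
    t12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2)) / (n - 1)
    return math.sqrt(n) * op_norm_sym2(t11 - lam[0], t12, t22 - lam[1])


def median_scaled_norms(ns=(50, 200, 800), lam=(4.0, 1.0), reps=300, seed=0):
    """Median of sqrt(n) * ||E_n||_op over reps simulated samples, for each n in ns."""
    rng = random.Random(seed)
    return {
        n: statistics.median(scaled_error_norm(n, lam, rng) for _ in range(reps))
        for n in ns
    }
```

If $\|E_n\|_{\mathrm{op}}$ were of larger order than $n^{-1/2}$, these medians would grow with $n$. Under the bound just proved, they should stay roughly constant as $n$ increases from 50 to 800.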
[guided]
The point of this step is to show that the whole covariance perturbation is of order $n^{-1/2}$, not just its diagonal entries. We first fix the asymptotic notation. For random matrices $A_n$ and positive deterministic numbers $a_n$, $A_n=O_{\mathbb{P}}(a_n)$ means that the family $\{\|A_n\|_{\mathrm{op}}/a_n:n\in\mathbb{N}\}$ is tight. Also $A_n=o_{\mathbb{P}}(a_n)$ means that $\|A_n\|_{\mathrm{op}}/a_n\xrightarrow{\mathbb{P}}0$, where this is [convergence in probability](/page/Convergence%20in%20Probability). The norm here is the [operator norm](/page/Operator%20Norm), although any fixed matrix norm would give the same probabilistic order because the dimension $p$ is fixed.
Define
\begin{align*}
E_n:=T_n-\Lambda.
\end{align*}
For $a,b\in\{1,\dots,p\}$, define
\begin{align*}
\overline{Y}_{a,n}:=\frac{1}{n}\sum_{i=1}^{n}Y_{ia}.
\end{align*}
Expanding the centered products gives
\begin{align*}
(T_n)_{ab}
&=\frac{1}{n-1}\sum_{i=1}^{n}(Y_{ia}-\overline{Y}_{a,n})(Y_{ib}-\overline{Y}_{b,n}) \\
&=\frac{1}{n-1}\sum_{i=1}^{n}Y_{ia}Y_{ib}-\frac{n}{n-1}\overline{Y}_{a,n}\overline{Y}_{b,n}.
\end{align*}
This formula isolates the ordinary average of products from the sample-mean correction.
Since $Y_i\sim\mathcal{N}_p(0,\Lambda)$, we have $\mathbb{E}[Y_{ia}Y_{ib}]=\lambda_a\mathbb{1}_{\{a=b\}}$. The variables $Y_{ia}Y_{ib}$ have finite variance because the normal coordinates have finite fourth moments. Therefore the [Central Limit Theorem](/page/Central%20Limit%20Theorem) gives
\begin{align*}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]\right)=O_{\mathbb{P}}(1).
\end{align*}
The sample-mean term is smaller. Applying the same theorem to the one-dimensional samples $Y_{1a},\dots,Y_{na}$ and $Y_{1b},\dots,Y_{nb}$ gives $\sqrt{n}\,\overline{Y}_{a,n}=O_{\mathbb{P}}(1)$ and $\sqrt{n}\,\overline{Y}_{b,n}=O_{\mathbb{P}}(1)$. Hence
\begin{align*}
\sqrt{n}\,\overline{Y}_{a,n}\overline{Y}_{b,n}=O_{\mathbb{P}}(n^{-1/2})=O_{\mathbb{P}}(1).
\end{align*}
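Subtracting $\Lambda_{ab}=\lambda_a\mathbb{1}_{\{a=b\}}$ from the expansion of $(T_n)_{ab}$ and multiplying by $\sqrt{n}$ gives
\begin{align*}
\sqrt{n}(E_n)_{ab}
&=\frac{n}{n-1}\cdot\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]\right)
+\frac{\sqrt{n}}{n-1}\lambda_a\mathbb{1}_{\{a=b\}}
-\frac{n}{n-1}\sqrt{n}\,\overline{Y}_{a,n}\overline{Y}_{b,n}.
\end{align*}
The three terms behave as follows:
- The product-average term is $O_{\mathbb{P}}(1)$.
- The deterministic term tends to $0$.
- The sample-mean correction is $O_{\mathbb{P}}(n^{-1/2})$.

Combining them yields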
\begin{align*}
\sqrt{n}(E_n)_{ab}=O_{\mathbb{P}}(1)
\end{align*}
for every pair $a,b$.
Finally, there are only $p^2$ entries because $p$ is fixed. Tightness of finitely many entries implies tightness of any fixed matrix norm. Therefore
\begin{align*}
\|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2}).
\end{align*}
[/guided]
[/step]
[step:Compute the first-order perturbation of a simple eigenvalue]
For each $k \in \{1,\dots,p\}$ and each symmetric matrix $B \in \mathbb{R}^{p \times p}$, let $\rho_k(B)$ denote the $k$-th largest eigenvalue of $B$, counted with multiplicity. Define the spectral gap
\begin{align*}
\delta_k := \min_{j\neq k}|\lambda_k-\lambda_j|>0.
\end{align*}
[claim:Simple diagonal eigenvalues have diagonal first variation]
There exist constants $r_k>0$ and $C_k>0$, depending only on $\Lambda$ and $k$, such that every symmetric matrix $H \in \mathbb{R}^{p \times p}$ with $\|H\|_{\mathrm{op}}\le r_k$ satisfies
\begin{align*}
\left|\rho_k(\Lambda+H)-\lambda_k-H_{kk}\right|
\le C_k\|H\|_{\mathrm{op}}^2.
\end{align*}
[/claim]
[proof]
Let $e_k \in \mathbb{R}^p$ denote the $k$-th standard basis vector, and let $P_k:\mathbb{R}^p\to\mathbb{R}^p$ be the [orthogonal projection](/theorems/437) onto $\operatorname{span}\{e_k\}^\perp$. Choose
\begin{align*}
r_k := \frac{\delta_k}{4}.
\end{align*}
If $\|H\|_{\mathrm{op}}\le r_k$, [Weyl's eigenvalue inequality](/page/Weyl%27s%20Inequality) gives
\begin{align*}
|\rho_k(\Lambda+H)-\lambda_k|\le \|H\|_{\mathrm{op}}\le \frac{\delta_k}{4},
\end{align*}
so the eigenvalue $\rho_k(\Lambda+H)$ remains separated from every $\lambda_j$ with $j\neq k$ by at least $\delta_k/2$.
Let $\ell:=\rho_k(\Lambda+H)$, and choose a unit eigenvector $u \in \mathbb{R}^p$ of $\Lambda+H$ for $\ell$ with nonnegative $k$-th coordinate. Write
\begin{align*}
u = \alpha e_k+w,
\end{align*}
where $\alpha \in \mathbb{R}$, $w \in \operatorname{span}\{e_k\}^\perp$, and $\alpha^2+|w|^2=1$.
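Let $I_{k}^{\perp}:\operatorname{span}\{e_k\}^\perp\to\operatorname{span}\{e_k\}^\perp$ denote the identity map on $\operatorname{span}\{e_k\}^\perp$. Since $P_k$ and $\Lambda$ are both diagonal, they commute, and $P_k\Lambda e_k=0$. Hence projecting the eigenvalue equation $(\Lambda+H)u=\ell u$ onto $\operatorname{span}\{e_k\}^\perp$ gives
\begin{align*}
(P_k\Lambda P_k-\ell I_k^{\perp})w = -P_kH(\alpha e_k+w).
\end{align*}
On $\operatorname{span}\{e_k\}^\perp$, the operator $P_k\Lambda P_k-\ell I_k^{\perp}$ is diagonal in the basis $\{e_j:j\neq k\}$ with entries $\lambda_j-\ell$. Moreover, $|\lambda_j-\ell|\ge|\lambda_j-\lambda_k|-|\lambda_k-\ell|\ge\delta_k/2$. Its inverse on $\operatorname{span}\{e_k\}^\perp$ therefore has operator norm at most $2/\delta_k$, hence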
\begin{align*}
|w|
\le \frac{2}{\delta_k}\|H\|_{\mathrm{op}}|\alpha e_k+w|
= \frac{2}{\delta_k}\|H\|_{\mathrm{op}}.
\end{align*}
Taking the inner product of $(\Lambda+H)u=\ell u$ with $u$ gives
\begin{align*}
\ell
= u^\top\Lambda u+u^\top H u.
\end{align*}
Subtracting $\lambda_k+H_{kk}$ yields
\begin{align*}
\ell-\lambda_k-H_{kk}
&= (\alpha^2-1)\lambda_k+\sum_{j\neq k}\lambda_j w_j^2
+(\alpha^2-1)H_{kk}+2\alpha\sum_{j\neq k}H_{kj}w_j+\sum_{i,j\neq k}H_{ij}w_iw_j.
\end{align*}
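We use the following bounds:
- $\alpha^2-1=-|w|^2$ and $|\alpha|\le 1$;
- $|w|\le 2\|H\|_{\mathrm{op}}/\delta_k$;
- $|e_k^\top Hw|\le\|H\|_{\mathrm{op}}|w|$ and $|w^\top Hw|\le\|H\|_{\mathrm{op}}|w|^2$.

With these, every term on the right is bounded by a constant depending only on $\Lambda$ and $k$ times $\|H\|_{\mathrm{op}}^2$. The terms that are cubic in $\|H\|_{\mathrm{op}}$ are absorbed using $\|H\|_{\mathrm{op}}\le r_k$. Thus there is $C_k>0$ such that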
\begin{align*}
|\ell-\lambda_k-H_{kk}|\le C_k\|H\|_{\mathrm{op}}^2.
\end{align*}
[/proof]
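The claim can be checked numerically in the simplest case $p=2$, where the top eigenvalue of a symmetric $2\times 2$ matrix has a closed form. This sketch makes the hypothetical choices $\Lambda=\operatorname{diag}(4,1)$ and a fixed direction $H_0$; the function names are illustrative.

The sketch shrinks $H=tH_0$ and tracks the ratio $|\rho_1(\Lambda+H)-\lambda_1-H_{11}|/\|H\|_{\mathrm{op}}^2$. For $p=2$, expanding the square root in the closed form shows that the remainder is approximately $H_{12}^2/(\lambda_1-\lambda_2)$. The ratio should therefore stay bounded and approach $H_{12}^2/\bigl((\lambda_1-\lambda_2)\|H\|_{\mathrm{op}}^2\bigr)$.

```python
import math

# Hypothetical helpers; Lambda = diag(lam1, lam2) and the direction H0 are illustrative.


def top_eig_sym2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 + math.hypot((a - c) / 2, b)


def op_norm_sym2(a, b, c):
    """Operator norm (largest |eigenvalue|) of [[a, b], [b, c]]."""
    return abs((a + c) / 2) + math.hypot((a - c) / 2, b)


def remainder_ratios(lam1=4.0, lam2=1.0, h=(0.3, -0.5, 0.2),
                     scales=(1.0, 0.1, 0.01, 0.001)):
    """For H = t * H0 with H0 = [[h11, h12], [h12, h22]], return pairs
    (t, |rho_1(Lambda + H) - lam1 - H_11| / ||H||_op**2)."""
    h11, h12, h22 = h
    h0_norm = op_norm_sym2(h11, h12, h22)
    out = []
    for t in scales:
        rho1 = top_eig_sym2(lam1 + t * h11, t * h12, lam2 + t * h22)
        remainder = rho1 - lam1 - t * h11  # first-order prediction removed
        out.append((t, abs(remainder) / (t * h0_norm) ** 2))
    return out
```

Bounded ratios are exactly what the claim asserts. For $p=2$, the remainder for $\rho_2$ is the negative of the remainder for $\rho_1$, because the two eigenvalues sum to the trace.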
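Define $R_{k,n}:=\hat{\lambda}_{k,n}-\lambda_k-(E_n)_{kk}$. Since $T_n=\Lambda+E_n$ has the same ordered eigenvalues as $S_n$, we have $\hat{\lambda}_{k,n}=\rho_k(\Lambda+E_n)$. Because $\|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2})$, the event $\{\|E_n\|_{\mathrm{op}}\le r_k\}$ has probability tending to $1$. On this event the claim with $H=E_n$ gives
\begin{align*}
\hat{\lambda}_{k,n}-\lambda_k
=
(E_n)_{kk}+R_{k,n},
\qquad
|R_{k,n}|\le C_k\|E_n\|_{\mathrm{op}}^2.
\end{align*}
Consequently
\begin{align*}
\sqrt{n}R_{k,n}=O_{\mathbb{P}}(n^{-1/2})\xrightarrow{\mathbb{P}}0,
\end{align*}
where the last arrow denotes [convergence in probability](/page/Convergence%20in%20Probability).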
Therefore
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)
=
\sqrt{n}(T_n)_{kk}-\sqrt{n}\lambda_k+o_{\mathbb{P}}(1)
=
G_{k,n}+o_{\mathbb{P}}(1).
\end{align*}
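The negligibility of $\sqrt{n}R_{k,n}$ can also be seen by simulation. This sketch rests on illustrative assumptions: $p=2$, $k=1$, $\Lambda=\operatorname{diag}(4,1)$ and 200 replications. The helper names are hypothetical.

The sketch computes $|\sqrt{n}(\hat\lambda_{1,n}-\lambda_1)-G_{1,n}|=\sqrt{n}\,|R_{1,n}|$ directly from data in the rotated coordinates. By the bound on $R_{k,n}$, its average should shrink as $n$ grows, roughly like $n^{-1/2}$.

```python
import math
import random

# Hypothetical helpers for p = 2, k = 1; lam is an illustrative choice.


def top_eig_sym2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 + math.hypot((a - c) / 2, b)


def linearization_gap(n, lam, rng):
    """|sqrt(n)(hat_lambda_1 - lam1) - G_{1,n}| for one sample from N_2(0, diag(lam))."""
    y1 = [rng.gauss(0.0, math.sqrt(lam[0])) for _ in range(n)]
    y2 = [rng.gauss(0.0, math.sqrt(lam[1])) for _ in range(n)]
    m1, m2 = sum(y1) / n, sum(y2) / n
    t11 = sum((u - m1) ** 2 for u in y1) / (n - 1)
    t22 = sum((v - m2) ** 2 for v in y2) / (n - 1)
    t12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2)) / (n - 1)
    hat_l1 = top_eig_sym2(t11, t12, t22)
    g1 = math.sqrt(n) * (t11 - lam[0])  # G_{1,n}, since (T_n)_{11} = V_{1,n}
    return abs(math.sqrt(n) * (hat_l1 - lam[0]) - g1)


def mean_gaps(ns=(100, 1600), lam=(4.0, 1.0), reps=200, seed=0):
    """Average of sqrt(n) |R_{1,n}| over reps samples, for each n in ns."""
    rng = random.Random(seed)
    return {n: sum(linearization_gap(n, lam, rng) for _ in range(reps)) / reps for n in ns}
```

Heuristically, the gap is close to $\sqrt{n}\,(E_n)_{12}^2/(\lambda_1-\lambda_2)$. Its mean should therefore drop by a factor of about $4$ when $n$ grows by a factor of $16$.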
[guided]
The diagonal entries give the first-order eigenvalue fluctuations, but we still need to justify that the off-diagonal entries do not contribute at order $n^{-1/2}$. This is a perturbation statement for a symmetric matrix with simple eigenvalues.
For a symmetric matrix $B \in \mathbb{R}^{p\times p}$, let $\rho_k(B)$ denote its $k$-th largest eigenvalue, counted with multiplicity. Define
\begin{align*}
\delta_k := \min_{j\neq k}|\lambda_k-\lambda_j|>0.
\end{align*}
This number is positive because the population eigenvalues are distinct.
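We prove the following perturbation estimate. There are constants $r_k>0$ and $C_k>0$, depending only on $\Lambda$ and $k$, such that every symmetric $H\in\mathbb{R}^{p\times p}$ with $\|H\|_{\mathrm{op}}\le r_k$ satisfies $|\rho_k(\Lambda+H)-\lambda_k-H_{kk}|\le C_k\|H\|_{\mathrm{op}}^2$. Let $e_k \in \mathbb{R}^p$ be the $k$-th standard basis vector, and let $P_k:\mathbb{R}^p\to\mathbb{R}^p$ be the orthogonal projection onto $\operatorname{span}\{e_k\}^\perp$. Choose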
\begin{align*}
r_k:=\frac{\delta_k}{4}.
\end{align*}
If $H$ is symmetric and $\|H\|_{\mathrm{op}}\le r_k$, [Weyl's eigenvalue inequality](/page/Weyl%27s%20Inequality) gives
\begin{align*}
|\rho_k(\Lambda+H)-\lambda_k|\le \|H\|_{\mathrm{op}}\le \frac{\delta_k}{4}.
\end{align*}
Thus the perturbed $k$-th eigenvalue stays separated from all the other unperturbed eigenvalues. Put
\begin{align*}
\ell:=\rho_k(\Lambda+H).
\end{align*}
Choose a unit eigenvector $u \in \mathbb{R}^p$ of $\Lambda+H$ for $\ell$, and choose its sign so that its $k$-th coordinate is nonnegative. Decompose it as
\begin{align*}
u=\alpha e_k+w,
\end{align*}
where $\alpha\in\mathbb{R}$, $w\in\operatorname{span}\{e_k\}^\perp$, and $\alpha^2+|w|^2=1$.
Project the eigenvalue equation
\begin{align*}
(\Lambda+H)u=\ell u
\end{align*}
onto $\operatorname{span}\{e_k\}^\perp$. Let $I_k^{\perp}:\operatorname{span}\{e_k\}^\perp\to\operatorname{span}\{e_k\}^\perp$ denote the identity map on $\operatorname{span}\{e_k\}^\perp$. Since $P_k\Lambda e_k=0$ and $P_k\Lambda=\Lambda P_k$ (both matrices are diagonal), this gives
\begin{align*}
(P_k\Lambda P_k-\ell I_k^{\perp})w=-P_kH(\alpha e_k+w).
\end{align*}
On $\operatorname{span}\{e_k\}^\perp$, the operator $P_k\Lambda P_k-\ell I_k^{\perp}$ is diagonal in the basis $\{e_j:j\neq k\}$ with eigenvalues $\lambda_j-\ell$. For $j\neq k$, the separation estimate above gives
\begin{align*}
|\lambda_j-\ell|\ge |\lambda_j-\lambda_k|-|\lambda_k-\ell|\ge \delta_k-\frac{\delta_k}{4}\ge \frac{\delta_k}{2}.
\end{align*}
Hence the inverse of $P_k\Lambda P_k-\ell I_k^{\perp}$ on $\operatorname{span}\{e_k\}^\perp$ has operator norm at most $2/\delta_k$, and therefore
\begin{align*}
|w|
\le \frac{2}{\delta_k}\|H\|_{\mathrm{op}}|\alpha e_k+w|
= \frac{2}{\delta_k}\|H\|_{\mathrm{op}},
\end{align*}
because $u=\alpha e_k+w$ is a unit vector.
Now compute the eigenvalue through the Rayleigh quotient:
\begin{align*}
\ell
= u^\top(\Lambda+H)u
= u^\top\Lambda u+u^\top H u.
\end{align*}
Expanding with $u=\alpha e_k+w$ gives
\begin{align*}
\ell-\lambda_k-H_{kk}
&= (\alpha^2-1)\lambda_k+\sum_{j\neq k}\lambda_j w_j^2
+(\alpha^2-1)H_{kk}
+2\alpha\sum_{j\neq k}H_{kj}w_j
+\sum_{i,j\neq k}H_{ij}w_iw_j.
\end{align*}
The key point is that $\alpha^2-1=-|w|^2$, and we already proved $|w|=O(\|H\|_{\mathrm{op}})$. Thus each term on the right is $O(\|H\|_{\mathrm{op}}^2)$, with a constant depending only on the fixed matrix $\Lambda$ and the index $k$. Hence there is $C_k>0$ such that
\begin{align*}
|\rho_k(\Lambda+H)-\lambda_k-H_{kk}|
\le C_k\|H\|_{\mathrm{op}}^2.
\end{align*}
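We apply this with $H=E_n=T_n-\Lambda$, noting that $\rho_k(T_n)=\hat{\lambda}_{k,n}$. Since $\|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2})$, the event $\{\|E_n\|_{\mathrm{op}}\le r_k\}$, on which the estimate applies, has probability tending to $1$. Moreover, the quadratic remainder satisfies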
\begin{align*}
\sqrt{n}\|E_n\|_{\mathrm{op}}^2=O_{\mathbb{P}}(n^{-1/2})\xrightarrow{\mathbb{P}}0.
\end{align*}
Therefore
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)
=
\sqrt{n}(E_n)_{kk}+o_{\mathbb{P}}(1)
=
\sqrt{n}((T_n)_{kk}-\lambda_k)+o_{\mathbb{P}}(1)
=
G_{k,n}+o_{\mathbb{P}}(1).
\end{align*}
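Thus only the diagonal entry $(E_n)_{kk}$ affects the $k$-th eigenvalue at first order. The off-diagonal entries do move the eigenvector, by at most $O(\|E_n\|_{\mathrm{op}})$ according to the bound on $|w|$, but they enter the eigenvalue only through the quadratic remainder.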
[/guided]
[/step]
[step:Pass from diagonal variance fluctuations to eigenvalue fluctuations]
From the previous step, for each $k \in \{1,\dots,p\}$,
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)=G_{k,n}+o_{\mathbb{P}}(1).
\end{align*}
Equivalently,
\begin{align*}
\left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right)
=
(G_{1,n},\dots,G_{p,n})+o_{\mathbb{P}}(1)
\end{align*}
in $\mathbb{R}^p$. By [Slutsky's Theorem](/page/Slutsky%27s%20Theorem), applied in $\mathbb{R}^p$, and by the joint [convergence in distribution](/page/Convergence%20in%20Distribution) of $(G_{1,n},\dots,G_{p,n})$,
\begin{align*}
\left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right)
\xrightarrow{d}
(Z_1,\dots,Z_p),
\end{align*}
where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$. Taking the $k$-th coordinate gives
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2).
\end{align*}
The joint limit has diagonal covariance matrix, hence the limiting fluctuations of distinct sample eigenvalues are independent. This proves the theorem.
[guided]
From the perturbation step, for each $k\in\{1,\dots,p\}$ we have
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)=G_{k,n}+o_{\mathbb{P}}(1).
\end{align*}
Because there are only finitely many coordinates, these coordinatewise probability errors combine into the vector statement
\begin{align*}
\left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right)
=
(G_{1,n},\dots,G_{p,n})+o_{\mathbb{P}}(1)
\end{align*}
in $\mathbb{R}^p$. The diagonal variance step proved the joint [convergence in distribution](/page/Convergence%20in%20Distribution)
\begin{align*}
(G_{1,n},\dots,G_{p,n})\xrightarrow{d}(Z_1,\dots,Z_p),
\end{align*}
where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$.
We now apply [Slutsky's Theorem](/page/Slutsky%27s%20Theorem) in the finite-dimensional space $\mathbb{R}^p$. Its hypotheses are satisfied because the first vector converges in distribution and the error vector converges to $0$ in probability. Therefore
\begin{align*}
\left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right)
\xrightarrow{d}
(Z_1,\dots,Z_p).
\end{align*}
Taking the $k$-th coordinate gives
\begin{align*}
\sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2).
\end{align*}
Since the limiting vector has independent coordinates, the limiting fluctuations of distinct sample eigenvalues are independent. This proves the theorem.
[/guided]
[/step]