Anderson's Asymptotic Normality Theorem for Sample Covariance Eigenvalues

Theorem



Proof

[proofplan] We rotate the data into an orthonormal eigenbasis of $\Sigma$, where the population covariance matrix is diagonal. In that basis, the diagonal entries of the unbiased sample covariance matrix are independent sample variances of one-dimensional normal samples, so their scaled fluctuations converge jointly to independent Gaussian limits. We then prove that, at a symmetric matrix with distinct eigenvalues, the first-order perturbation of the $k$-th eigenvalue is exactly the $k$-th diagonal entry of the perturbation in the eigenbasis. Combining this perturbation expansion with the entrywise order of the sample covariance fluctuation gives the joint limit for the sample eigenvalues. [/proofplan] [step:Rotate the observations into the population eigenbasis] Let $(\Omega,\mathcal{F},\mathbb{P})$ denote the probability space on which the random vectors $X_1,\dots,X_n$ are defined, and let $I_p\in\mathbb{R}^{p\times p}$ denote the identity matrix. Let $S_n$ denote the unbiased [sample covariance](/page/Sample%20Covariance) matrix formed from $X_1,\dots,X_n$, and let $\hat\lambda_{1,n}>\cdots>\hat\lambda_{p,n}$ denote its ordered sample [eigenvalues](/page/Eigenvalue). Since $\Sigma$ is symmetric positive definite, the [Spectral Theorem](/page/Spectral%20Theorem) gives an orthogonal matrix $\Gamma \in \mathbb{R}^{p \times p}$ such that \begin{align*} \Gamma^\top \Sigma \Gamma = \Lambda, \qquad \Lambda := \operatorname{diag}(\lambda_1,\dots,\lambda_p). \end{align*} For each $i \in \mathbb{N}$, define the rotated random vector \begin{align*} Y_i: \Omega &\to \mathbb{R}^p \\ \omega &\mapsto \Gamma^\top(X_i(\omega)-\mu). \end{align*} Because affine orthogonal transformations preserve [multivariate normal](/page/Multivariate%20Normal%20Distribution) distributions, the random vectors $Y_1,Y_2,\dots$ are independent identically distributed with \begin{align*} Y_i \sim \mathcal{N}_p(0,\Lambda). 
\end{align*} Thus the coordinate random variables $Y_{i1},\dots,Y_{ip}$ are independent and satisfy \begin{align*} Y_{ik} \sim \mathcal{N}(0,\lambda_k) \end{align*} for each $k \in \{1,\dots,p\}$. Define the rotated sample covariance matrix \begin{align*} T_n := \Gamma^\top S_n \Gamma. \end{align*} Since orthogonal conjugation preserves [eigenvalues](/page/Eigenvalue), $T_n$ and $S_n$ have the same ordered eigenvalues. Writing \begin{align*} \overline{Y}_n := \frac{1}{n}\sum_{i=1}^{n}Y_i, \end{align*} we have \begin{align*} T_n = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\overline{Y}_n)(Y_i-\overline{Y}_n)^\top. \end{align*} [guided] Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space on which $X_1,\dots,X_n$ are defined, and let $I_p\in\mathbb{R}^{p\times p}$ be the identity matrix. The purpose of the rotation is to replace the general covariance matrix $\Sigma$ by a diagonal covariance matrix without changing eigenvalues of the sample covariance matrix. Since $\Sigma$ is symmetric positive definite, the [Spectral Theorem](/page/Spectral%20Theorem) gives an orthogonal matrix $\Gamma \in \mathbb{R}^{p \times p}$ with \begin{align*} \Gamma^\top \Sigma \Gamma = \Lambda, \qquad \Lambda := \operatorname{diag}(\lambda_1,\dots,\lambda_p). \end{align*} For each observation we define the centered and rotated random vector \begin{align*} Y_i: \Omega &\to \mathbb{R}^p \\ \omega &\mapsto \Gamma^\top(X_i(\omega)-\mu). \end{align*} Because $X_i \sim \mathcal{N}_p(\mu,\Sigma)$, the vector $X_i-\mu$ has [multivariate normal](/page/Multivariate%20Normal%20Distribution) distribution $\mathcal{N}_p(0,\Sigma)$. Applying the [linear map](/page/Linear%20Map) $\Gamma^\top$ gives \begin{align*} Y_i \sim \mathcal{N}_p(0,\Gamma^\top\Sigma\Gamma)=\mathcal{N}_p(0,\Lambda). \end{align*} The matrix $\Lambda$ is diagonal, so the coordinates $Y_{i1},\dots,Y_{ip}$ are independent normal random variables, with $Y_{ik}\sim\mathcal{N}(0,\lambda_k)$. 
Independence across the index $i$ is preserved because each $Y_i$ is a measurable function of $X_i$ alone. Now define \begin{align*} T_n := \Gamma^\top S_n \Gamma. \end{align*} This is the sample covariance matrix in the rotated coordinates. Orthogonal conjugation preserves characteristic polynomials, since \begin{align*} \det(T_n-tI_p) = \det(\Gamma^\top(S_n-tI_p)\Gamma) = \det(S_n-tI_p), \end{align*} so $T_n$ and $S_n$ have the same eigenvalues. Therefore it is enough to prove the theorem for $T_n$. [/guided] [/step] [step:Identify the diagonal sample variance fluctuations] For each $k \in \{1,\dots,p\}$, define the one-dimensional sample mean and sample variance \begin{align*} \overline{Y}_{k,n} := \frac{1}{n}\sum_{i=1}^{n}Y_{ik}, \qquad V_{k,n} := \frac{1}{n-1}\sum_{i=1}^{n}(Y_{ik}-\overline{Y}_{k,n})^2. \end{align*} Then $(T_n)_{kk}=V_{k,n}$. For normal samples, the centered sample variance satisfies \begin{align*} \frac{(n-1)V_{k,n}}{\lambda_k}\sim \chi^2_{n-1}, \end{align*} and the random variables $V_{1,n},\dots,V_{p,n}$ are independent because they are functions of the independent coordinate samples $(Y_{1k},\dots,Y_{nk})$. Let $G_{k,n}$ be defined by \begin{align*} G_{k,n}:=\sqrt{n}(V_{k,n}-\lambda_k). \end{align*} Let $Q_{k,n}:=(n-1)V_{k,n}/\lambda_k$. There exist independent standard normal random variables $N_{1,k},\dots,N_{n-1,k}$ such that \begin{align*} Q_{k,n}=\sum_{m=1}^{n-1}N_{m,k}^2. \end{align*} The variables $N_{m,k}^2-1$ are independent, centered, and have variance $2$, so the [Central Limit Theorem](/page/Central%20Limit%20Theorem) gives \begin{align*} \frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}\xrightarrow{d}\mathcal{N}(0,1). \end{align*} Since \begin{align*} G_{k,n} =\lambda_k\sqrt{n}\left(\frac{Q_{k,n}}{n-1}-1\right) =\lambda_k\sqrt{\frac{2n}{n-1}}\,\frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}, \end{align*} it follows that \begin{align*} G_{k,n}\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2). 
\end{align*} Since $G_{1,n},\dots,G_{p,n}$ are independent for each $n$, the marginal limits combine into the joint convergence \begin{align*} (G_{1,n},\dots,G_{p,n}) \xrightarrow{d} (Z_1,\dots,Z_p), \end{align*} where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$. This is the chi-square form of the [Central Limit Theorem](/page/Central%20Limit%20Theorem), equivalently the [central limit theorem](/theorems/1848) applied to standardized squared normal variables. [guided] We now compute the first-order fluctuations of the diagonal entries of $T_n$. For each coordinate $k$, define \begin{align*} \overline{Y}_{k,n} := \frac{1}{n}\sum_{i=1}^{n}Y_{ik}, \qquad V_{k,n} := \frac{1}{n-1}\sum_{i=1}^{n}(Y_{ik}-\overline{Y}_{k,n})^2. \end{align*} By the formula for $T_n$, its $k$-th diagonal entry is exactly \begin{align*} (T_n)_{kk} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_{ik}-\overline{Y}_{k,n})^2 = V_{k,n}. \end{align*} The variables $Y_{1k},\dots,Y_{nk}$ form an independent sample from $\mathcal{N}(0,\lambda_k)$. For a one-dimensional normal sample, the unbiased sample variance has the chi-square representation \begin{align*} \frac{(n-1)V_{k,n}}{\lambda_k}\sim \chi^2_{n-1}. \end{align*} Moreover, for different values of $k$, the coordinate samples $(Y_{1k},\dots,Y_{nk})$ are independent because the covariance matrix $\Lambda$ is diagonal and the vectors $Y_i$ are jointly normal. Hence $V_{1,n},\dots,V_{p,n}$ are independent. Define \begin{align*} G_{k,n}:=\sqrt{n}(V_{k,n}-\lambda_k). \end{align*} Writing $Q_{k,n}:=(n-1)V_{k,n}/\lambda_k$, we have $Q_{k,n}\sim \chi^2_{n-1}$ and \begin{align*} G_{k,n} = \lambda_k\sqrt{n}\left(\frac{Q_{k,n}}{n-1}-1\right) = \lambda_k\sqrt{\frac{2n}{n-1}}\,\frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}. \end{align*} Here the deterministic prefactor $\lambda_k\sqrt{2n/(n-1)}$ converges to $\sqrt{2}\,\lambda_k$. A $\chi^2_{n-1}$ variable is a sum of $n-1$ independent squared standard normal variables, each with mean $1$ and variance $2$. The [Central Limit Theorem](/page/Central%20Limit%20Theorem), applied to this sum representation of $Q_{k,n}$, says that \begin{align*} \frac{Q_{k,n}-(n-1)}{\sqrt{2(n-1)}}\xrightarrow{d}\mathcal{N}(0,1). 
\end{align*} Equivalently, $G_{k,n}\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2)$ in the sense of [convergence in distribution](/page/Convergence%20in%20Distribution). Because the random variables $G_{1,n},\dots,G_{p,n}$ are independent for each $n$, their joint characteristic function factors into the product of the marginal characteristic functions. Passing to the limit gives the joint convergence \begin{align*} (G_{1,n},\dots,G_{p,n}) \xrightarrow{d} (Z_1,\dots,Z_p), \end{align*} where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$. This step uses the chi-square form of the [Central Limit Theorem](/page/Central%20Limit%20Theorem), equivalently the classical [central limit theorem](/theorems/521) applied to the centered variables $(Y_{ik}^2-\lambda_k)/\lambda_k$. [/guided] [/step] [step:Control the size of the whole covariance perturbation] Define the symmetric perturbation matrix \begin{align*} E_n := T_n-\Lambda. \end{align*} For random matrices $A_n$ and positive deterministic numbers $a_n$, the notation $A_n=O_{\mathbb{P}}(a_n)$ means that the family $\{\|A_n\|_{\mathrm{op}}/a_n:n\in\mathbb{N}\}$ is tight, where $\|\cdot\|_{\mathrm{op}}$ is the [operator norm](/page/Operator%20Norm). The notation $A_n=o_{\mathbb{P}}(a_n)$ means that $\|A_n\|_{\mathrm{op}}/a_n\xrightarrow{\mathbb{P}}0$ in the sense of [convergence in probability](/page/Convergence%20in%20Probability). For $a,b \in \{1,\dots,p\}$, define the coordinate sample mean \begin{align*} \overline{Y}_{a,n}:=\frac{1}{n}\sum_{i=1}^{n}Y_{ia}. \end{align*} The $(a,b)$ entry of $T_n$ is \begin{align*} (T_n)_{ab} &=\frac{1}{n-1}\sum_{i=1}^{n}(Y_{ia}-\overline{Y}_{a,n})(Y_{ib}-\overline{Y}_{b,n}) \\ &=\frac{1}{n-1}\sum_{i=1}^{n}Y_{ia}Y_{ib}-\frac{n}{n-1}\overline{Y}_{a,n}\overline{Y}_{b,n}. 
\end{align*} Since $\mathbb{E}[Y_{ia}Y_{ib}]=\lambda_a\mathbb{1}_{\{a=b\}}$, the [Central Limit Theorem](/page/Central%20Limit%20Theorem) applied to the centered variables $Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]$ gives \begin{align*} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]\right)=O_{\mathbb{P}}(1), \end{align*} because [normal coordinates](/theorems/2713) have finite fourth moments. Also $\sqrt{n}\,\overline{Y}_{a,n}=O_{\mathbb{P}}(1)$ and $\sqrt{n}\,\overline{Y}_{b,n}=O_{\mathbb{P}}(1)$ by the same theorem applied to $Y_{ia}$ and $Y_{ib}$, so \begin{align*} \sqrt{n}\,\overline{Y}_{a,n}\overline{Y}_{b,n}=O_{\mathbb{P}}(n^{-1/2})=o_{\mathbb{P}}(1). \end{align*} Since $n/(n-1)\to 1$ and $\sqrt{n}\left(\frac{n}{n-1}-1\right)=\frac{\sqrt{n}}{n-1}\to 0$, using the normalization $1/(n-1)$ instead of $1/n$ does not change these orders. Thus \begin{align*} \sqrt{n}(E_n)_{ab}=O_{\mathbb{P}}(1) \end{align*} for every $a,b \in \{1,\dots,p\}$. Since $p$ is fixed, entrywise tightness of the $p^2$ entries of $\sqrt{n}E_n$ implies tightness in every matrix norm; in particular, \begin{align*} \|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2}). \end{align*} [guided] The point of this step is to show that the whole covariance perturbation is of order $n^{-1/2}$, not just its diagonal entries. We first fix the asymptotic notation. For random matrices $A_n$ and positive deterministic numbers $a_n$, $A_n=O_{\mathbb{P}}(a_n)$ means that the family $\{\|A_n\|_{\mathrm{op}}/a_n:n\in\mathbb{N}\}$ is tight. Also $A_n=o_{\mathbb{P}}(a_n)$ means that $\|A_n\|_{\mathrm{op}}/a_n\xrightarrow{\mathbb{P}}0$, where this is [convergence in probability](/page/Convergence%20in%20Probability). The norm here is the [operator norm](/page/Operator%20Norm), although any fixed matrix norm would give the same probabilistic order because the dimension $p$ is fixed. Define \begin{align*} E_n:=T_n-\Lambda. \end{align*} For $a,b\in\{1,\dots,p\}$, define \begin{align*} \overline{Y}_{a,n}:=\frac{1}{n}\sum_{i=1}^{n}Y_{ia}. 
\end{align*} Expanding the centered products gives \begin{align*} (T_n)_{ab} &=\frac{1}{n-1}\sum_{i=1}^{n}(Y_{ia}-\overline{Y}_{a,n})(Y_{ib}-\overline{Y}_{b,n}) \\ &=\frac{1}{n-1}\sum_{i=1}^{n}Y_{ia}Y_{ib}-\frac{n}{n-1}\overline{Y}_{a,n}\overline{Y}_{b,n}. \end{align*} This formula isolates the ordinary average of products from the sample-mean correction. Since $Y_i\sim\mathcal{N}_p(0,\Lambda)$, we have $\mathbb{E}[Y_{ia}Y_{ib}]=\lambda_a\mathbb{1}_{\{a=b\}}$. The variables $Y_{ia}Y_{ib}$ have finite variance because the normal coordinates have finite fourth moments. Therefore the [Central Limit Theorem](/page/Central%20Limit%20Theorem) gives \begin{align*} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(Y_{ia}Y_{ib}-\mathbb{E}[Y_{ia}Y_{ib}]\right)=O_{\mathbb{P}}(1). \end{align*} The sample-mean term is smaller. Applying the same theorem to the one-dimensional samples $Y_{1a},\dots,Y_{na}$ and $Y_{1b},\dots,Y_{nb}$ gives $\sqrt{n}\,\overline{Y}_{a,n}=O_{\mathbb{P}}(1)$ and $\sqrt{n}\,\overline{Y}_{b,n}=O_{\mathbb{P}}(1)$. Hence \begin{align*} \sqrt{n}\,\overline{Y}_{a,n}\overline{Y}_{b,n}=O_{\mathbb{P}}(n^{-1/2})=o_{\mathbb{P}}(1). \end{align*} The normalization $1/(n-1)$ instead of $1/n$ does not change these orders, because $n/(n-1)\to 1$ and $\sqrt{n}\left(\frac{n}{n-1}-1\right)=\frac{\sqrt{n}}{n-1}\to 0$. Combining the product-average term and the sample-mean correction therefore yields \begin{align*} \sqrt{n}(E_n)_{ab}=O_{\mathbb{P}}(1) \end{align*} for every pair $a,b$. Finally, there are only $p^2$ entries because $p$ is fixed. Tightness of finitely many entries of $\sqrt{n}E_n$ implies tightness in any fixed matrix norm. Therefore \begin{align*} \|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2}). \end{align*} [/guided] [/step] [step:Compute the first-order perturbation of a simple eigenvalue] Let $A:=\Lambda$. For each $k \in \{1,\dots,p\}$, let $\rho_k(B)$ denote the $k$-th largest eigenvalue of a symmetric matrix $B \in \mathbb{R}^{p \times p}$. Define the spectral gap \begin{align*} \delta_k := \min_{j\neq k}|\lambda_k-\lambda_j|>0. 
\end{align*} [claim:Simple diagonal eigenvalues have diagonal first variation] There exist constants $r_k>0$ and $C_k>0$, depending only on $\Lambda$ and $k$, such that every symmetric matrix $H \in \mathbb{R}^{p \times p}$ with $\|H\|_{\mathrm{op}}\le r_k$ satisfies \begin{align*} \left|\rho_k(\Lambda+H)-\lambda_k-H_{kk}\right| \le C_k\|H\|_{\mathrm{op}}^2. \end{align*} [/claim] [proof] Let $e_k \in \mathbb{R}^p$ denote the $k$-th standard basis vector, and let $P_k:\mathbb{R}^p\to\mathbb{R}^p$ be the [orthogonal projection](/theorems/437) onto $\operatorname{span}\{e_k\}^\perp$. Choose \begin{align*} r_k := \frac{\delta_k}{4}. \end{align*} If $\|H\|_{\mathrm{op}}\le r_k$, [Weyl's eigenvalue inequality](/page/Weyl%27s%20Inequality) gives \begin{align*} |\rho_k(\Lambda+H)-\lambda_k|\le \|H\|_{\mathrm{op}}\le \frac{\delta_k}{4}, \end{align*} so the eigenvalue $\rho_k(\Lambda+H)$ remains separated from every $\lambda_j$ with $j\neq k$ by at least $\delta_k/2$. Let $\ell:=\rho_k(\Lambda+H)$, and choose a unit eigenvector $u \in \mathbb{R}^p$ of $\Lambda+H$ for $\ell$ with positive $k$-th coordinate. Write \begin{align*} u = \alpha e_k+w, \end{align*} where $\alpha \in \mathbb{R}$, $w \in \operatorname{span}\{e_k\}^\perp$, and $\alpha^2+|w|^2=1$. Projecting the eigenvalue equation $(\Lambda+H)u=\ell u$ onto $\operatorname{span}\{e_k\}^\perp$ gives \begin{align*} (P_k\Lambda P_k-\ell I)w = -P_kH(\alpha e_k+w). \end{align*} Let $I_{k}^{\perp}:\operatorname{span}\{e_k\}^\perp\to\operatorname{span}\{e_k\}^\perp$ denote the identity map on $\operatorname{span}\{e_k\}^\perp$. The inverse of $P_k\Lambda P_k-\ell I_k^{\perp}$ on $\operatorname{span}\{e_k\}^\perp$ has operator norm at most $2/\delta_k$, hence \begin{align*} |w| \le \frac{2}{\delta_k}\|H\|_{\mathrm{op}}|\alpha e_k+w| = \frac{2}{\delta_k}\|H\|_{\mathrm{op}}. \end{align*} Taking the inner product of $(\Lambda+H)u=\ell u$ with $u$ gives \begin{align*} \ell = u^\top\Lambda u+u^\top H u. 
\end{align*} Subtracting $\lambda_k+H_{kk}$ yields \begin{align*} \ell-\lambda_k-H_{kk} &= (\alpha^2-1)\lambda_k+\sum_{j\neq k}\lambda_j w_j^2 +(\alpha^2-1)H_{kk}+2\alpha\sum_{j\neq k}H_{kj}w_j+\sum_{i,j\neq k}H_{ij}w_iw_j. \end{align*} Since $\alpha^2-1=-|w|^2$, $|\alpha|\le 1$, and $|w|\le 2\|H\|_{\mathrm{op}}/\delta_k$, every term on the right is bounded by a constant depending only on $\Lambda$ and $k$ times $\|H\|_{\mathrm{op}}^2$. Thus there is $C_k>0$ such that \begin{align*} |\ell-\lambda_k-H_{kk}|\le C_k\|H\|_{\mathrm{op}}^2. \end{align*} [/proof] Applying the claim with $H=E_n$ gives \begin{align*} \hat{\lambda}_{k,n}-\lambda_k = (E_n)_{kk}+R_{k,n}, \qquad |R_{k,n}|\le C_k\|E_n\|_{\mathrm{op}}^2 \end{align*} with probability tending to $1$. Since $\|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2})$, we have \begin{align*} \sqrt{n}R_{k,n}=O_{\mathbb{P}}(n^{-1/2})\xrightarrow{\mathbb{P}}0. \end{align*} The last convergence is [convergence in probability](/page/Convergence%20in%20Probability). Therefore \begin{align*} \sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k) = \sqrt{n}(T_n)_{kk}-\sqrt{n}\lambda_k+o_{\mathbb{P}}(1) = G_{k,n}+o_{\mathbb{P}}(1). \end{align*} [guided] The diagonal entries give the first-order eigenvalue fluctuations, but we still need to justify that the off-diagonal entries do not contribute at order $n^{-1/2}$. This is a perturbation statement for a symmetric matrix with simple eigenvalues. Let $A:=\Lambda$. For a symmetric matrix $B \in \mathbb{R}^{p\times p}$, let $\rho_k(B)$ denote its $k$-th largest eigenvalue. Define \begin{align*} \delta_k := \min_{j\neq k}|\lambda_k-\lambda_j|>0. \end{align*} This number is positive because the population eigenvalues are distinct. We prove the perturbation estimate. Let $e_k \in \mathbb{R}^p$ be the $k$-th standard basis vector, and let $P_k:\mathbb{R}^p\to\mathbb{R}^p$ be the orthogonal projection onto $\operatorname{span}\{e_k\}^\perp$. Choose \begin{align*} r_k:=\frac{\delta_k}{4}. 
\end{align*} If $H$ is symmetric and $\|H\|_{\mathrm{op}}\le r_k$, [Weyl's eigenvalue inequality](/page/Weyl%27s%20Inequality) gives \begin{align*} |\rho_k(\Lambda+H)-\lambda_k|\le \|H\|_{\mathrm{op}}\le \frac{\delta_k}{4}. \end{align*} Thus the perturbed $k$-th eigenvalue stays separated from all the other unperturbed eigenvalues. Put \begin{align*} \ell:=\rho_k(\Lambda+H). \end{align*} Choose a unit eigenvector $u \in \mathbb{R}^p$ of $\Lambda+H$ for $\ell$, and choose its sign so that its $k$-th coordinate is nonnegative. Decompose it as \begin{align*} u=\alpha e_k+w, \end{align*} where $\alpha\in\mathbb{R}$, $w\in\operatorname{span}\{e_k\}^\perp$, and $\alpha^2+|w|^2=1$. Project the eigenvalue equation \begin{align*} (\Lambda+H)u=\ell u \end{align*} onto $\operatorname{span}\{e_k\}^\perp$. Since $P_k\Lambda e_k=0$, this gives \begin{align*} (P_k\Lambda P_k-\ell I)w=-P_kH(\alpha e_k+w). \end{align*} For $j\neq k$, the eigenvalues of $P_k\Lambda P_k-\ell I$ are $\lambda_j-\ell$. The separation estimate above gives \begin{align*} |\lambda_j-\ell|\ge \frac{\delta_k}{2}. \end{align*} Let $I_k^{\perp}:\operatorname{span}\{e_k\}^\perp\to\operatorname{span}\{e_k\}^\perp$ denote the identity map on $\operatorname{span}\{e_k\}^\perp$. Hence the inverse of $P_k\Lambda P_k-\ell I_k^{\perp}$ on $\operatorname{span}\{e_k\}^\perp$ has operator norm at most $2/\delta_k$, and therefore \begin{align*} |w| \le \frac{2}{\delta_k}\|H\|_{\mathrm{op}}|\alpha e_k+w| = \frac{2}{\delta_k}\|H\|_{\mathrm{op}}, \end{align*} because $u=\alpha e_k+w$ is a unit vector. Now compute the eigenvalue through the Rayleigh quotient: \begin{align*} \ell = u^\top(\Lambda+H)u = u^\top\Lambda u+u^\top H u. \end{align*} Expanding with $u=\alpha e_k+w$ gives \begin{align*} \ell-\lambda_k-H_{kk} &= (\alpha^2-1)\lambda_k+\sum_{j\neq k}\lambda_j w_j^2 +(\alpha^2-1)H_{kk} +2\alpha\sum_{j\neq k}H_{kj}w_j +\sum_{i,j\neq k}H_{ij}w_iw_j. 
\end{align*} The key point is that $\alpha^2-1=-|w|^2$, and we already proved $|w|=O(\|H\|_{\mathrm{op}})$. Thus each term on the right is $O(\|H\|_{\mathrm{op}}^2)$, with a constant depending only on the fixed matrix $\Lambda$ and the index $k$. Hence there is $C_k>0$ such that \begin{align*} |\rho_k(\Lambda+H)-\lambda_k-H_{kk}| \le C_k\|H\|_{\mathrm{op}}^2. \end{align*} We apply this with $H=E_n=T_n-\Lambda$. Since $\|E_n\|_{\mathrm{op}}=O_{\mathbb{P}}(n^{-1/2})$, the quadratic remainder satisfies \begin{align*} \sqrt{n}\|E_n\|_{\mathrm{op}}^2=O_{\mathbb{P}}(n^{-1/2})\xrightarrow{\mathbb{P}}0. \end{align*} Therefore \begin{align*} \sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k) = \sqrt{n}(E_n)_{kk}+o_{\mathbb{P}}(1) = \sqrt{n}((T_n)_{kk}-\lambda_k)+o_{\mathbb{P}}(1) = G_{k,n}+o_{\mathbb{P}}(1). \end{align*} This proves that the off-diagonal entries affect eigenvectors at first order but affect eigenvalues only at second order. [/guided] [/step] [step:Pass from diagonal variance fluctuations to eigenvalue fluctuations] From the previous step, for each $k \in \{1,\dots,p\}$, \begin{align*} \sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)=G_{k,n}+o_{\mathbb{P}}(1). \end{align*} Equivalently, \begin{align*} \left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right) = (G_{1,n},\dots,G_{p,n})+o_{\mathbb{P}}(1) \end{align*} in $\mathbb{R}^p$. By [Slutsky's Theorem](/page/Slutsky%27s%20Theorem), applied in $\mathbb{R}^p$, and by the joint [convergence in distribution](/page/Convergence%20in%20Distribution) of $(G_{1,n},\dots,G_{p,n})$, \begin{align*} \left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right) \xrightarrow{d} (Z_1,\dots,Z_p), \end{align*} where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$. Taking the $k$-th coordinate gives \begin{align*} \sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2). 
\end{align*} The joint limit has diagonal covariance matrix, hence the limiting fluctuations of distinct sample eigenvalues are independent. This proves the theorem. [guided] From the perturbation step, for each $k\in\{1,\dots,p\}$ we have \begin{align*} \sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)=G_{k,n}+o_{\mathbb{P}}(1). \end{align*} Because there are only finitely many coordinates, these coordinatewise probability errors combine into the vector statement \begin{align*} \left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right) = (G_{1,n},\dots,G_{p,n})+o_{\mathbb{P}}(1) \end{align*} in $\mathbb{R}^p$. The diagonal variance step proved the joint [convergence in distribution](/page/Convergence%20in%20Distribution) \begin{align*} (G_{1,n},\dots,G_{p,n})\xrightarrow{d}(Z_1,\dots,Z_p), \end{align*} where $Z_1,\dots,Z_p$ are independent and $Z_k\sim\mathcal{N}(0,2\lambda_k^2)$. We now apply [Slutsky's Theorem](/page/Slutsky%27s%20Theorem) in the finite-dimensional space $\mathbb{R}^p$. Its hypotheses are satisfied because the first vector converges in distribution and the error vector converges to $0$ in probability. Therefore \begin{align*} \left(\sqrt{n}(\hat{\lambda}_{1,n}-\lambda_1),\dots,\sqrt{n}(\hat{\lambda}_{p,n}-\lambda_p)\right) \xrightarrow{d} (Z_1,\dots,Z_p). \end{align*} Taking the $k$-th coordinate gives \begin{align*} \sqrt{n}(\hat{\lambda}_{k,n}-\lambda_k)\xrightarrow{d}\mathcal{N}(0,2\lambda_k^2). \end{align*} Since the limiting vector has independent coordinates, the limiting fluctuations of distinct sample eigenvalues are independent. This proves the theorem. [/guided] [/step]
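
As an informal numerical sanity check of the limit just proved (not part of the proof), the following sketch simulates the case $p=2$ using only the Python standard library. The population eigenvalues $\lambda_1=4$, $\lambda_2=1$, the rotation angle, the sample size, the number of repetitions, and the function names `sample_eigenvalues` and `simulate` are all illustrative assumptions, not taken from the theorem. It estimates the variances of $\sqrt{n}(\hat\lambda_{k,n}-\lambda_k)$ and the correlation between the two fluctuations.

```python
import math
import random
import statistics


def sample_eigenvalues(n, lam1, lam2, theta, rng):
    """Draw n points from N(0, Sigma) in the plane, where
    Sigma = R diag(lam1, lam2) R^T and R is the rotation by theta,
    and return the eigenvalues of the unbiased sample covariance
    matrix, larger one first."""
    c, s = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for _ in range(n):
        # Independent coordinates in the population eigenbasis.
        y1 = rng.gauss(0.0, math.sqrt(lam1))
        y2 = rng.gauss(0.0, math.sqrt(lam2))
        # Rotate back to the observation coordinates.
        xs.append(c * y1 - s * y2)
        ys.append(s * y1 + c * y2)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    d = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]].
    mid = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    return mid + rad, mid - rad


def simulate(n=400, reps=1000, lam1=4.0, lam2=1.0, theta=0.6, seed=0):
    """Return (var1, var2, corr) for the scaled eigenvalue fluctuations
    sqrt(n) * (sample eigenvalue - population eigenvalue)."""
    rng = random.Random(seed)
    f1, f2 = [], []
    for _ in range(reps):
        l1, l2 = sample_eigenvalues(n, lam1, lam2, theta, rng)
        f1.append(math.sqrt(n) * (l1 - lam1))
        f2.append(math.sqrt(n) * (l2 - lam2))
    var1 = statistics.variance(f1)
    var2 = statistics.variance(f2)
    m1, m2 = statistics.fmean(f1), statistics.fmean(f2)
    cov = sum((u - m1) * (v - m2) for u, v in zip(f1, f2)) / (reps - 1)
    return var1, var2, cov / math.sqrt(var1 * var2)


if __name__ == "__main__":
    var1, var2, corr = simulate()
    print(f"variance of first fluctuation:  {var1:.3f} (limit 2*lam1^2 = 32)")
    print(f"variance of second fluctuation: {var2:.3f} (limit 2*lam2^2 = 2)")
    print(f"correlation between them:       {corr:.3f} (limit 0)")
```

Under these assumed parameters the theorem predicts variances near $2\lambda_1^2=32$ and $2\lambda_2^2=2$ and a correlation near $0$. With finite $n$ and finitely many repetitions the printed values only approximate these limits, and the agreement should worsen as the gap $\lambda_1-\lambda_2$ shrinks, in line with the role of $\delta_k$ in the perturbation step.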
