Consistency of Sample Principal Component Analysis

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first rewrite each entry of the sample covariance matrix in terms of empirical first and second moments. The [weak law of large numbers](/theorems/1851) gives convergence in probability of these finitely many empirical moments, so each entry of $S_n$ converges to the corresponding entry of $\Sigma$. Since the dimension $p$ is fixed, entrywise convergence implies operator-norm convergence. Weyl's eigenvalue perturbation bound then gives convergence of ordered eigenvalues, and a direct spectral-gap estimate proves convergence of a signed eigenvector when the population eigenvalue is simple. [/proofplan] [step:Reduce the sample covariance entries to empirical moments] For $j,\ell \in \{1,\dots,p\}$, let $X_{r,j}$ denote the $j$-th coordinate of $X_r$. Define the empirical coordinate mean $M_{j,n}: \Omega \to \mathbb{R}$ and the empirical second moment $Q_{j\ell,n}: \Omega \to \mathbb{R}$ by \begin{align*} M_{j,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j},\\ Q_{j\ell,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j}X_{r,\ell}. \end{align*} The $(j,\ell)$ entry of $S_n$ is \begin{align*} (S_n)_{j\ell} &= \frac{1}{n}\sum_{r=1}^n (X_{r,j}-M_{j,n})(X_{r,\ell}-M_{\ell,n})\\ &= Q_{j\ell,n}-M_{j,n}M_{\ell,n}. \end{align*} Also, writing $\mu_j = \mathbb{E}[X_{1,j}]$ for the $j$-th coordinate of $\mu$, \begin{align*} \Sigma_{j\ell} &= \mathbb{E}\big[(X_{1,j}-\mu_j)(X_{1,\ell}-\mu_\ell)\big]\\ &= \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell. \end{align*} [guided] The sample covariance is built from centered observations, but convergence is easiest to prove before centering. For each coordinate $j$, define \begin{align*} M_{j,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j}, \end{align*} which is exactly the $j$-th coordinate of $\bar{X}_n$. For each coordinate pair $(j,\ell)$, define \begin{align*} Q_{j\ell,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j}X_{r,\ell}. \end{align*} Expanding the product in the definition of $S_n$ gives \begin{align*} (S_n)_{j\ell} &= \frac{1}{n}\sum_{r=1}^n (X_{r,j}-M_{j,n})(X_{r,\ell}-M_{\ell,n})\\ &= \frac{1}{n}\sum_{r=1}^n X_{r,j}X_{r,\ell} - M_{\ell,n}\frac{1}{n}\sum_{r=1}^n X_{r,j} - M_{j,n}\frac{1}{n}\sum_{r=1}^n X_{r,\ell} + M_{j,n}M_{\ell,n}\\ &= Q_{j\ell,n}-M_{j,n}M_{\ell,n}. \end{align*} The same algebra holds at the population level. Since $\mu_j=\mathbb{E}[X_{1,j}]$ and $\mu_\ell=\mathbb{E}[X_{1,\ell}]$, the $(j,\ell)$ entry of the covariance matrix is \begin{align*} \Sigma_{j\ell} &= \mathbb{E}\big[(X_{1,j}-\mu_j)(X_{1,\ell}-\mu_\ell)\big]\\ &= \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_\ell\mathbb{E}[X_{1,j}] - \mu_j\mathbb{E}[X_{1,\ell}] + \mu_j\mu_\ell\\ &= \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell. \end{align*} Thus it remains to prove convergence of the finitely many quantities $M_{j,n}$ and $Q_{j\ell,n}$. [/guided] [/step] [step:Apply the weak law of large numbers to the coordinate moments] Fix $j,\ell \in \{1,\dots,p\}$. Since $|X_{1,j}| \leq |X_1|$ and $\mathbb{E}[|X_1|^2]<\infty$, we have $\mathbb{E}[|X_{1,j}|]<\infty$. Since \begin{align*} |X_{1,j}X_{1,\ell}| \leq \frac{1}{2}\bigl(X_{1,j}^2+X_{1,\ell}^2\bigr) \leq |X_1|^2, \end{align*} we also have $\mathbb{E}[|X_{1,j}X_{1,\ell}|]<\infty$. By the [Weak Law of Large Numbers](/theorems/1127) (citing a result not yet in the wiki: Weak Law of Large Numbers), \begin{align*} M_{j,n} &\xrightarrow{\mathbb{P}} \mu_j,\\ Q_{j\ell,n} &\xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}X_{1,\ell}]. \end{align*} Products are continuous maps on $\mathbb{R}^2$, so the [continuous mapping theorem](/theorems/1847) gives \begin{align*} M_{j,n}M_{\ell,n} \xrightarrow{\mathbb{P}} \mu_j\mu_\ell. \end{align*} Therefore \begin{align*} (S_n)_{j\ell} = Q_{j\ell,n}-M_{j,n}M_{\ell,n} \xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell = \Sigma_{j\ell}. \end{align*} Because $j$ and $\ell$ were arbitrary, $S_n \to \Sigma$ entrywise in probability. [guided] We now check the hypotheses of the law of large numbers for every scalar random variable that appears in the empirical moments. The weak law requires integrability. For the first moments, $|X_{1,j}| \leq |X_1|$, and the [Cauchy-Schwarz inequality](/theorems/432) gives $\mathbb{E}[|X_1|] \leq \mathbb{E}[|X_1|^2]^{1/2}<\infty$. Hence $X_{1,j}$ is integrable. For the second moments, we use the elementary inequality $2|ab|\leq a^2+b^2$ with $a=X_{1,j}$ and $b=X_{1,\ell}$: \begin{align*} |X_{1,j}X_{1,\ell}| \leq \frac{1}{2}\bigl(X_{1,j}^2+X_{1,\ell}^2\bigr) \leq |X_1|^2. \end{align*} Since $\mathbb{E}[|X_1|^2]<\infty$, this proves $\mathbb{E}[|X_{1,j}X_{1,\ell}|]<\infty$. The scalar random variables $X_{1,j},X_{2,j},\dots$ are independent and identically distributed because the vectors $X_1,X_2,\dots$ are independent and identically distributed. Likewise, the scalar random variables $X_{1,j}X_{1,\ell},X_{2,j}X_{2,\ell},\dots$ are independent and identically distributed. Therefore the Weak Law of Large Numbers (citing a result not yet in the wiki: Weak Law of Large Numbers) gives \begin{align*} M_{j,n} &\xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}] = \mu_j,\\ Q_{j\ell,n} &\xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}X_{1,\ell}]. \end{align*} Since multiplication is continuous on $\mathbb{R}^2$, the continuous mapping theorem yields \begin{align*} M_{j,n}M_{\ell,n} \xrightarrow{\mathbb{P}} \mu_j\mu_\ell. \end{align*} Combining this with the covariance identity from the previous step, \begin{align*} (S_n)_{j\ell} = Q_{j\ell,n}-M_{j,n}M_{\ell,n} \xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell = \Sigma_{j\ell}. \end{align*} This proves entrywise convergence for every one of the finitely many entries. [/guided] [/step] [step:Upgrade entrywise convergence to operator norm convergence] Define the random matrix $A_n:\Omega\to\mathbb{R}^{p\times p}$ by $A_n=S_n-\Sigma$. Its Frobenius norm is \begin{align*} \|A_n\|_F = \left(\sum_{j=1}^p\sum_{\ell=1}^p |(A_n)_{j\ell}|^2\right)^{1/2}. \end{align*} Since the number of entries is finite and each entry converges to $0$ in probability, the continuous mapping theorem gives $\|A_n\|_F\xrightarrow{\mathbb{P}}0$. For every deterministic matrix $A\in\mathbb{R}^{p\times p}$, \begin{align*} \|A\|_{\mathrm{op}} \leq \|A\|_F. \end{align*} Applying this inequality to $A_n$ gives \begin{align*} \|S_n-\Sigma\|_{\mathrm{op}} = \|A_n\|_{\mathrm{op}} \leq \|A_n\|_F \xrightarrow{\mathbb{P}}0. \end{align*} [guided] Entrywise convergence is enough because $p$ is fixed. Define \begin{align*} A_n = S_n-\Sigma. \end{align*} The Frobenius norm collects all entries into a single scalar: \begin{align*} \|A_n\|_F = \left(\sum_{j=1}^p\sum_{\ell=1}^p |(A_n)_{j\ell}|^2\right)^{1/2}. \end{align*} There are only $p^2$ entries, and $p$ does not depend on $n$. Since each $(A_n)_{j\ell}$ converges to $0$ in probability, the vector of entries of $A_n$ converges to the zero vector in $\mathbb{R}^{p^2}$ in probability. The map from this vector to its Euclidean norm is continuous, so the continuous mapping theorem gives \begin{align*} \|A_n\|_F \xrightarrow{\mathbb{P}}0. \end{align*} Now we compare norms. For any deterministic matrix $A\in\mathbb{R}^{p\times p}$ and any vector $v\in\mathbb{R}^p$ with $|v|=1$, \begin{align*} |Av|^2 &= \sum_{j=1}^p \left|\sum_{\ell=1}^p A_{j\ell}v_\ell\right|^2\\ &\leq \sum_{j=1}^p \left(\sum_{\ell=1}^p |A_{j\ell}|^2\right)\left(\sum_{\ell=1}^p |v_\ell|^2\right)\\ &= \sum_{j=1}^p\sum_{\ell=1}^p |A_{j\ell}|^2\\ &= \|A\|_F^2. \end{align*} The inequality is the Cauchy-Schwarz inequality applied to each row of $A$ and the vector $v$. Taking the supremum over all $v$ with $|v|=1$ gives $\|A\|_{\mathrm{op}}\leq \|A\|_F$. Therefore \begin{align*} \|S_n-\Sigma\|_{\mathrm{op}} = \|A_n\|_{\mathrm{op}} \leq \|A_n\|_F \xrightarrow{\mathbb{P}}0. \end{align*} [/guided] [/step] [step:Use Weyl's perturbation bound to prove eigenvalue convergence] Both $S_n$ and $\Sigma$ are real symmetric matrices. By Weyl's eigenvalue perturbation inequality for Hermitian matrices (citing a result not yet in the wiki: Weyl eigenvalue perturbation inequality), \begin{align*} \max_{1\leq k\leq p} |\hat{\lambda}_{k,n}-\lambda_k| \leq \|S_n-\Sigma\|_{\mathrm{op}}. \end{align*} Since $\|S_n-\Sigma\|_{\mathrm{op}}\xrightarrow{\mathbb{P}}0$, it follows that, for each $k\in\{1,\dots,p\}$, \begin{align*} \hat{\lambda}_{k,n}\xrightarrow{\mathbb{P}}\lambda_k. \end{align*} [/step] [step:Exploit the spectral gap around a simple population eigenvalue] Assume $\lambda_k$ is simple, and let $\gamma_k\in\mathbb{R}^p$ be a unit eigenvector of $\Sigma$ associated to $\lambda_k$. Since $\Sigma$ is real symmetric, choose an orthonormal eigenbasis $\gamma_1,\dots,\gamma_p$ of $\mathbb{R}^p$ such that $\Sigma\gamma_j=\lambda_j\gamma_j$ for each $j$. Define the spectral gap \begin{align*} \delta_k = \min_{j\neq k} |\lambda_j-\lambda_k|. \end{align*} Simplicity of $\lambda_k$ gives $\delta_k>0$. Let $E_n=S_n-\Sigma$, and let $\eta_n=\|E_n\|_{\mathrm{op}}$. On the event $\eta_n<\delta_k/2$, Weyl's inequality gives \begin{align*} |\hat{\lambda}_{k,n}-\lambda_k| \leq \eta_n < \frac{\delta_k}{2}. \end{align*} Let $\hat{\gamma}_{k,n}$ be the signed unit eigenvector from the statement, so \begin{align*} S_n\hat{\gamma}_{k,n}=\hat{\lambda}_{k,n}\hat{\gamma}_{k,n}, \qquad |\hat{\gamma}_{k,n}|=1, \qquad \hat{\gamma}_{k,n}^\top\gamma_k\geq0. \end{align*} Write the orthogonal expansion \begin{align*} \hat{\gamma}_{k,n}=a_n\gamma_k+w_n, \end{align*} where $a_n=\hat{\gamma}_{k,n}^\top\gamma_k\in[0,1]$ and $w_n\in\operatorname{span}\{\gamma_j:j\neq k\}$. Since \begin{align*} (\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n} = -(S_n-\Sigma)\hat{\gamma}_{k,n} = -E_n\hat{\gamma}_{k,n}, \end{align*} we have \begin{align*} \|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p} \leq \eta_n. \end{align*} Expanding in the eigenbasis gives \begin{align*} (\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n} = a_n(\lambda_k-\hat{\lambda}_{k,n})\gamma_k + \sum_{j\neq k} c_{j,n}(\lambda_j-\hat{\lambda}_{k,n})\gamma_j, \end{align*} where $w_n=\sum_{j\neq k}c_{j,n}\gamma_j$. For $j\neq k$, \begin{align*} |\lambda_j-\hat{\lambda}_{k,n}| \geq |\lambda_j-\lambda_k|-|\lambda_k-\hat{\lambda}_{k,n}| \geq \delta_k-\eta_n > \frac{\delta_k}{2}. \end{align*} Therefore \begin{align*} \eta_n^2 &\geq \|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p}^2\\ &\geq \sum_{j\neq k}|c_{j,n}|^2|\lambda_j-\hat{\lambda}_{k,n}|^2\\ &\geq \frac{\delta_k^2}{4}\sum_{j\neq k}|c_{j,n}|^2\\ &= \frac{\delta_k^2}{4}|w_n|^2. \end{align*} Hence \begin{align*} |w_n| \leq \frac{2\eta_n}{\delta_k} \end{align*} on the event $\eta_n<\delta_k/2$. [guided] The eigenvector statement needs a gap. Since $\lambda_k$ is simple, no other eigenvalue of $\Sigma$ equals $\lambda_k$. We record the smallest separation from $\lambda_k$ to the rest of the spectrum: \begin{align*} \delta_k = \min_{j\neq k}|\lambda_j-\lambda_k|. \end{align*} This is positive because the set $\{j:j\neq k\}$ is finite and every term in the minimum is positive. Let \begin{align*} E_n=S_n-\Sigma, \qquad \eta_n=\|E_n\|_{\mathrm{op}}. \end{align*} We work on the event $\eta_n<\delta_k/2$. Weyl's eigenvalue perturbation inequality gives \begin{align*} |\hat{\lambda}_{k,n}-\lambda_k|\leq \eta_n<\frac{\delta_k}{2}. \end{align*} This says the sample eigenvalue associated to the $k$-th component stays inside the spectral gap around $\lambda_k$. Now take the signed unit eigenvector $\hat{\gamma}_{k,n}$ from the statement: \begin{align*} S_n\hat{\gamma}_{k,n}=\hat{\lambda}_{k,n}\hat{\gamma}_{k,n}, \qquad |\hat{\gamma}_{k,n}|=1, \qquad \hat{\gamma}_{k,n}^\top\gamma_k\geq0. \end{align*} Because $\Sigma$ is real symmetric, the spectral theorem gives an orthonormal eigenbasis $\gamma_1,\dots,\gamma_p$ with $\Sigma\gamma_j=\lambda_j\gamma_j$. Decompose $\hat{\gamma}_{k,n}$ into the component in the target eigendirection and the orthogonal remainder: \begin{align*} \hat{\gamma}_{k,n}=a_n\gamma_k+w_n, \end{align*} where \begin{align*} a_n=\hat{\gamma}_{k,n}^\top\gamma_k\in[0,1], \qquad w_n\in\operatorname{span}\{\gamma_j:j\neq k\}. \end{align*} The sign choice is exactly what guarantees $a_n\geq0$. The residual of $\hat{\gamma}_{k,n}$ under $\Sigma-\hat{\lambda}_{k,n}I$ is controlled by the perturbation: \begin{align*} (\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n} &= (\Sigma-S_n)\hat{\gamma}_{k,n} + (S_n-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\\ &= -E_n\hat{\gamma}_{k,n}, \end{align*} because $(S_n-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}=0$. Since $|\hat{\gamma}_{k,n}|=1$, \begin{align*} \|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p} \leq \|E_n\|_{\mathrm{op}}|\hat{\gamma}_{k,n}| = \eta_n. \end{align*} Write \begin{align*} w_n=\sum_{j\neq k}c_{j,n}\gamma_j. \end{align*} Then \begin{align*} (\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n} = a_n(\lambda_k-\hat{\lambda}_{k,n})\gamma_k + \sum_{j\neq k} c_{j,n}(\lambda_j-\hat{\lambda}_{k,n})\gamma_j. \end{align*} For every $j\neq k$, the triangle inequality gives \begin{align*} |\lambda_j-\hat{\lambda}_{k,n}| &\geq |\lambda_j-\lambda_k|-|\lambda_k-\hat{\lambda}_{k,n}|\\ &\geq \delta_k-\eta_n\\ &> \frac{\delta_k}{2}. \end{align*} Since the basis is orthonormal, the squared norm is the sum of squared coefficients. Therefore \begin{align*} \eta_n^2 &\geq \|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p}^2\\ &\geq \sum_{j\neq k}|c_{j,n}|^2|\lambda_j-\hat{\lambda}_{k,n}|^2\\ &\geq \frac{\delta_k^2}{4}\sum_{j\neq k}|c_{j,n}|^2\\ &= \frac{\delta_k^2}{4}|w_n|^2. \end{align*} Thus \begin{align*} |w_n|\leq \frac{2\eta_n}{\delta_k}. \end{align*} The meaning of this estimate is that a small perturbation cannot move the eigenvector significantly into the orthogonal spectral subspace when the target eigenvalue is separated from the rest of the spectrum. [/guided] [/step] [step:Convert the orthogonal component estimate into eigenvector convergence] On the event $\eta_n<\delta_k/2$, the previous step gives \begin{align*} |w_n|\leq \frac{2\eta_n}{\delta_k}. \end{align*} Since $|\hat{\gamma}_{k,n}|=1$, $|\gamma_k|=1$, and $\hat{\gamma}_{k,n}=a_n\gamma_k+w_n$ with $w_n\perp\gamma_k$, we have \begin{align*} a_n^2+|w_n|^2=1. \end{align*} Because $a_n\geq0$, \begin{align*} |\hat{\gamma}_{k,n}-\gamma_k|^2 &= |(a_n-1)\gamma_k+w_n|^2\\ &= (1-a_n)^2+|w_n|^2\\ &\leq 2|w_n|^2. \end{align*} Thus, on $\{\eta_n<\delta_k/2\}$, \begin{align*} |\hat{\gamma}_{k,n}-\gamma_k| \leq \sqrt{2}|w_n| \leq \frac{2\sqrt{2}}{\delta_k}\eta_n. \end{align*} Since $\eta_n=\|S_n-\Sigma\|_{\mathrm{op}}\xrightarrow{\mathbb{P}}0$, the right-hand side converges to $0$ in probability, and \begin{align*} \mathbb{P}\left(\eta_n\geq\frac{\delta_k}{2}\right)\to0. \end{align*} It follows that \begin{align*} \hat{\gamma}_{k,n}\xrightarrow{\mathbb{P}}\gamma_k. \end{align*} This proves the eigenvector consistency claim and completes the proof. [/step]

Prerequisites (0/4 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

What brings you to Androma?

Start with a route through the knowledge graph.