[proofplan]
We first rewrite each entry of the sample covariance matrix in terms of empirical first and second moments. The [weak law of large numbers](/theorems/1851) gives convergence in probability of these finitely many empirical moments, so each entry of $S_n$ converges to the corresponding entry of $\Sigma$. Since the dimension $p$ is fixed, entrywise convergence implies operator-norm convergence. Weyl's eigenvalue perturbation bound then gives convergence of ordered eigenvalues, and a direct spectral-gap estimate proves convergence of a signed eigenvector when the population eigenvalue is simple.
[/proofplan]
[step:Reduce the sample covariance entries to empirical moments]
For $j,\ell \in \{1,\dots,p\}$, let $X_{r,j}$ denote the $j$-th coordinate of $X_r$. Define the empirical coordinate mean $M_{j,n}: \Omega \to \mathbb{R}$ and the empirical second moment $Q_{j\ell,n}: \Omega \to \mathbb{R}$ by
\begin{align*}
M_{j,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j},\\
Q_{j\ell,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j}X_{r,\ell}.
\end{align*}
The $(j,\ell)$ entry of $S_n$ is
\begin{align*}
(S_n)_{j\ell}
&= \frac{1}{n}\sum_{r=1}^n (X_{r,j}-M_{j,n})(X_{r,\ell}-M_{\ell,n})\\
&= Q_{j\ell,n}-M_{j,n}M_{\ell,n}.
\end{align*}
Also, writing $\mu_j = \mathbb{E}[X_{1,j}]$ for the $j$-th coordinate of $\mu$,
\begin{align*}
\Sigma_{j\ell}
&= \mathbb{E}\big[(X_{1,j}-\mu_j)(X_{1,\ell}-\mu_\ell)\big]\\
&= \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell.
\end{align*}
[guided]
The sample covariance is built from centered observations, but convergence is easiest to prove before centering. For each coordinate $j$, define
\begin{align*}
M_{j,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j},
\end{align*}
which is exactly the $j$-th coordinate of $\bar{X}_n$. For each coordinate pair $(j,\ell)$, define
\begin{align*}
Q_{j\ell,n} &= \frac{1}{n}\sum_{r=1}^n X_{r,j}X_{r,\ell}.
\end{align*}
Expanding the product in the definition of $S_n$ gives
\begin{align*}
(S_n)_{j\ell}
&= \frac{1}{n}\sum_{r=1}^n (X_{r,j}-M_{j,n})(X_{r,\ell}-M_{\ell,n})\\
&= \frac{1}{n}\sum_{r=1}^n X_{r,j}X_{r,\ell}
- M_{\ell,n}\frac{1}{n}\sum_{r=1}^n X_{r,j}
- M_{j,n}\frac{1}{n}\sum_{r=1}^n X_{r,\ell}
+ M_{j,n}M_{\ell,n}\\
&= Q_{j\ell,n}-M_{j,n}M_{\ell,n}.
\end{align*}
The same algebra holds at the population level. Since $\mu_j=\mathbb{E}[X_{1,j}]$ and $\mu_\ell=\mathbb{E}[X_{1,\ell}]$, the $(j,\ell)$ entry of the covariance matrix is
\begin{align*}
\Sigma_{j\ell}
&= \mathbb{E}\big[(X_{1,j}-\mu_j)(X_{1,\ell}-\mu_\ell)\big]\\
&= \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_\ell\mathbb{E}[X_{1,j}]
- \mu_j\mathbb{E}[X_{1,\ell}]
+ \mu_j\mu_\ell\\
&= \mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell.
\end{align*}
Thus it remains to prove convergence of the finitely many quantities $M_{j,n}$ and $Q_{j\ell,n}$.
[/guided]
[/step]
[step:Apply the weak law of large numbers to the coordinate moments]
Fix $j,\ell \in \{1,\dots,p\}$. Since $|X_{1,j}| \leq |X_1|$ and $\mathbb{E}[|X_1|^2]<\infty$, we have $\mathbb{E}[|X_{1,j}|]<\infty$. Since
\begin{align*}
|X_{1,j}X_{1,\ell}| \leq \frac{1}{2}\bigl(X_{1,j}^2+X_{1,\ell}^2\bigr) \leq |X_1|^2,
\end{align*}
we also have $\mathbb{E}[|X_{1,j}X_{1,\ell}|]<\infty$. By the [Weak Law of Large Numbers](/theorems/1127) (citing a result not yet in the wiki: Weak Law of Large Numbers),
\begin{align*}
M_{j,n} &\xrightarrow{\mathbb{P}} \mu_j,\\
Q_{j\ell,n} &\xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}X_{1,\ell}].
\end{align*}
Products are continuous maps on $\mathbb{R}^2$, so the [continuous mapping theorem](/theorems/1847) gives
\begin{align*}
M_{j,n}M_{\ell,n} \xrightarrow{\mathbb{P}} \mu_j\mu_\ell.
\end{align*}
Therefore
\begin{align*}
(S_n)_{j\ell}
= Q_{j\ell,n}-M_{j,n}M_{\ell,n}
\xrightarrow{\mathbb{P}}
\mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell
= \Sigma_{j\ell}.
\end{align*}
Because $j$ and $\ell$ were arbitrary, $S_n \to \Sigma$ entrywise in probability.
[guided]
We now check the hypotheses of the law of large numbers for every scalar random variable that appears in the empirical moments. The weak law requires integrability. For the first moments, $|X_{1,j}| \leq |X_1|$, and the [Cauchy-Schwarz inequality](/theorems/432) gives $\mathbb{E}[|X_1|] \leq \mathbb{E}[|X_1|^2]^{1/2}<\infty$. Hence $X_{1,j}$ is integrable.
For the second moments, we use the elementary inequality $2|ab|\leq a^2+b^2$ with $a=X_{1,j}$ and $b=X_{1,\ell}$:
\begin{align*}
|X_{1,j}X_{1,\ell}| \leq \frac{1}{2}\bigl(X_{1,j}^2+X_{1,\ell}^2\bigr) \leq |X_1|^2.
\end{align*}
Since $\mathbb{E}[|X_1|^2]<\infty$, this proves $\mathbb{E}[|X_{1,j}X_{1,\ell}|]<\infty$.
The scalar random variables $X_{1,j},X_{2,j},\dots$ are independent and identically distributed because the vectors $X_1,X_2,\dots$ are independent and identically distributed. Likewise, the scalar random variables $X_{1,j}X_{1,\ell},X_{2,j}X_{2,\ell},\dots$ are independent and identically distributed. Therefore the Weak Law of Large Numbers (citing a result not yet in the wiki: Weak Law of Large Numbers) gives
\begin{align*}
M_{j,n} &\xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}] = \mu_j,\\
Q_{j\ell,n} &\xrightarrow{\mathbb{P}} \mathbb{E}[X_{1,j}X_{1,\ell}].
\end{align*}
Since multiplication is continuous on $\mathbb{R}^2$, the continuous mapping theorem yields
\begin{align*}
M_{j,n}M_{\ell,n} \xrightarrow{\mathbb{P}} \mu_j\mu_\ell.
\end{align*}
Combining this with the covariance identity from the previous step,
\begin{align*}
(S_n)_{j\ell}
= Q_{j\ell,n}-M_{j,n}M_{\ell,n}
\xrightarrow{\mathbb{P}}
\mathbb{E}[X_{1,j}X_{1,\ell}] - \mu_j\mu_\ell
= \Sigma_{j\ell}.
\end{align*}
This proves entrywise convergence for every one of the finitely many entries.
[/guided]
[/step]
[step:Upgrade entrywise convergence to operator norm convergence]
Define the random matrix $A_n:\Omega\to\mathbb{R}^{p\times p}$ by $A_n=S_n-\Sigma$. Its Frobenius norm is
\begin{align*}
\|A_n\|_F
=
\left(\sum_{j=1}^p\sum_{\ell=1}^p |(A_n)_{j\ell}|^2\right)^{1/2}.
\end{align*}
Since the number of entries is finite and each entry converges to $0$ in probability, the continuous mapping theorem gives $\|A_n\|_F\xrightarrow{\mathbb{P}}0$. For every deterministic matrix $A\in\mathbb{R}^{p\times p}$,
\begin{align*}
\|A\|_{\mathrm{op}} \leq \|A\|_F.
\end{align*}
Applying this inequality to $A_n$ gives
\begin{align*}
\|S_n-\Sigma\|_{\mathrm{op}} = \|A_n\|_{\mathrm{op}}
\leq \|A_n\|_F
\xrightarrow{\mathbb{P}}0.
\end{align*}
[guided]
Entrywise convergence is enough because $p$ is fixed. Define
\begin{align*}
A_n = S_n-\Sigma.
\end{align*}
The Frobenius norm collects all entries into a single scalar:
\begin{align*}
\|A_n\|_F
=
\left(\sum_{j=1}^p\sum_{\ell=1}^p |(A_n)_{j\ell}|^2\right)^{1/2}.
\end{align*}
There are only $p^2$ entries, and $p$ does not depend on $n$. Since each $(A_n)_{j\ell}$ converges to $0$ in probability, the vector of entries of $A_n$ converges to the zero vector in $\mathbb{R}^{p^2}$ in probability. The map from this vector to its Euclidean norm is continuous, so the continuous mapping theorem gives
\begin{align*}
\|A_n\|_F \xrightarrow{\mathbb{P}}0.
\end{align*}
Now we compare norms. For any deterministic matrix $A\in\mathbb{R}^{p\times p}$ and any vector $v\in\mathbb{R}^p$ with $|v|=1$,
\begin{align*}
|Av|^2
&= \sum_{j=1}^p \left|\sum_{\ell=1}^p A_{j\ell}v_\ell\right|^2\\
&\leq \sum_{j=1}^p \left(\sum_{\ell=1}^p |A_{j\ell}|^2\right)\left(\sum_{\ell=1}^p |v_\ell|^2\right)\\
&= \sum_{j=1}^p\sum_{\ell=1}^p |A_{j\ell}|^2\\
&= \|A\|_F^2.
\end{align*}
The inequality is the Cauchy-Schwarz inequality applied to each row of $A$ and the vector $v$. Taking the supremum over all $v$ with $|v|=1$ gives $\|A\|_{\mathrm{op}}\leq \|A\|_F$. Therefore
\begin{align*}
\|S_n-\Sigma\|_{\mathrm{op}}
=
\|A_n\|_{\mathrm{op}}
\leq
\|A_n\|_F
\xrightarrow{\mathbb{P}}0.
\end{align*}
[/guided]
[/step]
[step:Use Weyl's perturbation bound to prove eigenvalue convergence]
Both $S_n$ and $\Sigma$ are real symmetric matrices. By Weyl's eigenvalue perturbation inequality for Hermitian matrices (citing a result not yet in the wiki: Weyl eigenvalue perturbation inequality),
\begin{align*}
\max_{1\leq k\leq p} |\hat{\lambda}_{k,n}-\lambda_k|
\leq
\|S_n-\Sigma\|_{\mathrm{op}}.
\end{align*}
Since $\|S_n-\Sigma\|_{\mathrm{op}}\xrightarrow{\mathbb{P}}0$, it follows that, for each $k\in\{1,\dots,p\}$,
\begin{align*}
\hat{\lambda}_{k,n}\xrightarrow{\mathbb{P}}\lambda_k.
\end{align*}
[/step]
[step:Exploit the spectral gap around a simple population eigenvalue]
Assume $\lambda_k$ is simple, and let $\gamma_k\in\mathbb{R}^p$ be a unit eigenvector of $\Sigma$ associated to $\lambda_k$. Since $\Sigma$ is real symmetric, choose an orthonormal eigenbasis $\gamma_1,\dots,\gamma_p$ of $\mathbb{R}^p$ such that $\Sigma\gamma_j=\lambda_j\gamma_j$ for each $j$. Define the spectral gap
\begin{align*}
\delta_k = \min_{j\neq k} |\lambda_j-\lambda_k|.
\end{align*}
Simplicity of $\lambda_k$ gives $\delta_k>0$.
Let $E_n=S_n-\Sigma$, and let $\eta_n=\|E_n\|_{\mathrm{op}}$. On the event $\eta_n<\delta_k/2$, Weyl's inequality gives
\begin{align*}
|\hat{\lambda}_{k,n}-\lambda_k| \leq \eta_n < \frac{\delta_k}{2}.
\end{align*}
Let $\hat{\gamma}_{k,n}$ be the signed unit eigenvector from the statement, so
\begin{align*}
S_n\hat{\gamma}_{k,n}=\hat{\lambda}_{k,n}\hat{\gamma}_{k,n},
\qquad
|\hat{\gamma}_{k,n}|=1,
\qquad
\hat{\gamma}_{k,n}^\top\gamma_k\geq0.
\end{align*}
Write the orthogonal expansion
\begin{align*}
\hat{\gamma}_{k,n}=a_n\gamma_k+w_n,
\end{align*}
where $a_n=\hat{\gamma}_{k,n}^\top\gamma_k\in[0,1]$ and $w_n\in\operatorname{span}\{\gamma_j:j\neq k\}$. Since
\begin{align*}
(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}
=
-(S_n-\Sigma)\hat{\gamma}_{k,n}
=
-E_n\hat{\gamma}_{k,n},
\end{align*}
we have
\begin{align*}
\|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p}
\leq
\eta_n.
\end{align*}
Expanding in the eigenbasis gives
\begin{align*}
(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}
=
a_n(\lambda_k-\hat{\lambda}_{k,n})\gamma_k
+
\sum_{j\neq k} c_{j,n}(\lambda_j-\hat{\lambda}_{k,n})\gamma_j,
\end{align*}
where $w_n=\sum_{j\neq k}c_{j,n}\gamma_j$. For $j\neq k$,
\begin{align*}
|\lambda_j-\hat{\lambda}_{k,n}|
\geq
|\lambda_j-\lambda_k|-|\lambda_k-\hat{\lambda}_{k,n}|
\geq
\delta_k-\eta_n
>
\frac{\delta_k}{2}.
\end{align*}
Therefore
\begin{align*}
\eta_n^2
&\geq
\|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p}^2\\
&\geq
\sum_{j\neq k}|c_{j,n}|^2|\lambda_j-\hat{\lambda}_{k,n}|^2\\
&\geq
\frac{\delta_k^2}{4}\sum_{j\neq k}|c_{j,n}|^2\\
&=
\frac{\delta_k^2}{4}|w_n|^2.
\end{align*}
Hence
\begin{align*}
|w_n| \leq \frac{2\eta_n}{\delta_k}
\end{align*}
on the event $\eta_n<\delta_k/2$.
[guided]
The eigenvector statement needs a gap. Since $\lambda_k$ is simple, no other eigenvalue of $\Sigma$ equals $\lambda_k$. We record the smallest separation from $\lambda_k$ to the rest of the spectrum:
\begin{align*}
\delta_k = \min_{j\neq k}|\lambda_j-\lambda_k|.
\end{align*}
This is positive because the set $\{j:j\neq k\}$ is finite and every term in the minimum is positive.
Let
\begin{align*}
E_n=S_n-\Sigma,
\qquad
\eta_n=\|E_n\|_{\mathrm{op}}.
\end{align*}
We work on the event $\eta_n<\delta_k/2$. Weyl's eigenvalue perturbation inequality gives
\begin{align*}
|\hat{\lambda}_{k,n}-\lambda_k|\leq \eta_n<\frac{\delta_k}{2}.
\end{align*}
This says the sample eigenvalue associated to the $k$-th component stays inside the spectral gap around $\lambda_k$.
Now take the signed unit eigenvector $\hat{\gamma}_{k,n}$ from the statement:
\begin{align*}
S_n\hat{\gamma}_{k,n}=\hat{\lambda}_{k,n}\hat{\gamma}_{k,n},
\qquad
|\hat{\gamma}_{k,n}|=1,
\qquad
\hat{\gamma}_{k,n}^\top\gamma_k\geq0.
\end{align*}
Because $\Sigma$ is real symmetric, the spectral theorem gives an orthonormal eigenbasis $\gamma_1,\dots,\gamma_p$ with $\Sigma\gamma_j=\lambda_j\gamma_j$. Decompose $\hat{\gamma}_{k,n}$ into the component in the target eigendirection and the orthogonal remainder:
\begin{align*}
\hat{\gamma}_{k,n}=a_n\gamma_k+w_n,
\end{align*}
where
\begin{align*}
a_n=\hat{\gamma}_{k,n}^\top\gamma_k\in[0,1],
\qquad
w_n\in\operatorname{span}\{\gamma_j:j\neq k\}.
\end{align*}
The sign choice is exactly what guarantees $a_n\geq0$.
The residual of $\hat{\gamma}_{k,n}$ under $\Sigma-\hat{\lambda}_{k,n}I$ is controlled by the perturbation:
\begin{align*}
(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}
&=
(\Sigma-S_n)\hat{\gamma}_{k,n}
+
(S_n-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\\
&=
-E_n\hat{\gamma}_{k,n},
\end{align*}
because $(S_n-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}=0$. Since $|\hat{\gamma}_{k,n}|=1$,
\begin{align*}
\|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p}
\leq
\|E_n\|_{\mathrm{op}}|\hat{\gamma}_{k,n}|
=
\eta_n.
\end{align*}
Write
\begin{align*}
w_n=\sum_{j\neq k}c_{j,n}\gamma_j.
\end{align*}
Then
\begin{align*}
(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}
=
a_n(\lambda_k-\hat{\lambda}_{k,n})\gamma_k
+
\sum_{j\neq k} c_{j,n}(\lambda_j-\hat{\lambda}_{k,n})\gamma_j.
\end{align*}
For every $j\neq k$, the triangle inequality gives
\begin{align*}
|\lambda_j-\hat{\lambda}_{k,n}|
&\geq
|\lambda_j-\lambda_k|-|\lambda_k-\hat{\lambda}_{k,n}|\\
&\geq
\delta_k-\eta_n\\
&>
\frac{\delta_k}{2}.
\end{align*}
Since the basis is orthonormal, the squared norm is the sum of squared coefficients. Therefore
\begin{align*}
\eta_n^2
&\geq
\|(\Sigma-\hat{\lambda}_{k,n}I)\hat{\gamma}_{k,n}\|_{\mathbb{R}^p}^2\\
&\geq
\sum_{j\neq k}|c_{j,n}|^2|\lambda_j-\hat{\lambda}_{k,n}|^2\\
&\geq
\frac{\delta_k^2}{4}\sum_{j\neq k}|c_{j,n}|^2\\
&=
\frac{\delta_k^2}{4}|w_n|^2.
\end{align*}
Thus
\begin{align*}
|w_n|\leq \frac{2\eta_n}{\delta_k}.
\end{align*}
The meaning of this estimate is that a small perturbation cannot move the eigenvector significantly into the orthogonal spectral subspace when the target eigenvalue is separated from the rest of the spectrum.
[/guided]
[/step]
[step:Convert the orthogonal component estimate into eigenvector convergence]
On the event $\eta_n<\delta_k/2$, the previous step gives
\begin{align*}
|w_n|\leq \frac{2\eta_n}{\delta_k}.
\end{align*}
Since $|\hat{\gamma}_{k,n}|=1$, $|\gamma_k|=1$, and $\hat{\gamma}_{k,n}=a_n\gamma_k+w_n$ with $w_n\perp\gamma_k$, we have
\begin{align*}
a_n^2+|w_n|^2=1.
\end{align*}
Because $a_n\geq0$,
\begin{align*}
|\hat{\gamma}_{k,n}-\gamma_k|^2
&=
|(a_n-1)\gamma_k+w_n|^2\\
&=
(1-a_n)^2+|w_n|^2\\
&\leq
2|w_n|^2.
\end{align*}
Thus, on $\{\eta_n<\delta_k/2\}$,
\begin{align*}
|\hat{\gamma}_{k,n}-\gamma_k|
\leq
\sqrt{2}|w_n|
\leq
\frac{2\sqrt{2}}{\delta_k}\eta_n.
\end{align*}
Since $\eta_n=\|S_n-\Sigma\|_{\mathrm{op}}\xrightarrow{\mathbb{P}}0$, the right-hand side converges to $0$ in probability, and
\begin{align*}
\mathbb{P}\left(\eta_n\geq\frac{\delta_k}{2}\right)\to0.
\end{align*}
It follows that
\begin{align*}
\hat{\gamma}_{k,n}\xrightarrow{\mathbb{P}}\gamma_k.
\end{align*}
This proves the eigenvector consistency claim and completes the proof.
[/step]