[proofplan]
The proof first removes the degeneracy of $\Sigma$ by working on its range and whitening the random vectors there. This reduces the problem to a uniform bound on the quadratic process of an isotropic sub-Gaussian vector, indexed by the ellipsoid $T=\{\Sigma^{1/2}u:u\in S,\ |u|\leq 1\}$ in the range $S$ of $\Sigma$. The Gaussian width of this ellipsoid is at most $\operatorname{tr}(\Sigma)^{1/2}$, and its radius is at most $\|\Sigma\|_{\mathrm{op}}^{1/2}$. Substituting these two bounds into the sub-Gaussian quadratic-process inequality gives the stated effective-rank estimate.
[/proofplan]
custom_env
admin
[step:Dispose of the zero covariance case]
Assume first that $\Sigma=0$. Since $X_1$ is mean-zero and has covariance $0$, we have
\begin{align*}
\mathbb{E}[|X_1|^2]
=
\operatorname{tr}\bigl(\mathbb{E}[X_1\otimes X_1]\bigr)
=
\operatorname{tr}(\Sigma)
=
0.
\end{align*}
Because $|X_1|^2 \geq 0$, this implies $X_1=0$ almost surely. The same holds for each $X_i$ because the random vectors are identically distributed. Hence
\begin{align*}
\widehat{\Sigma}
=
\frac{1}{n}\sum_{i=1}^n X_i\otimes X_i
=
0
=
\Sigma
\end{align*}
almost surely. The asserted estimate is therefore immediate in this case.
[/step]
[step:Whiten the sample on the range of the covariance]Assume now that $\Sigma \neq 0$. Let $S:=\operatorname{Range}(\Sigma) \subseteq \mathbb{R}^p$, equipped with the Euclidean [inner product](/page/Inner%20Product) inherited from $\mathbb{R}^p$. Since $\Sigma$ is symmetric positive semidefinite, it maps $S$ into $S$ and $\Sigma|_S$ is positive definite. Let $\Sigma^{1/2}:S\to S$ be the positive square root of $\Sigma|_S$, and let $\Sigma^{-1/2}:S\to S$ be its inverse.
For every $v\in S^\perp=\ker(\Sigma)$,
\begin{align*}
\mathbb{E}\bigl[(v\cdot X_1)^2\bigr]
=
v^\top \Sigma v
=
0,
\end{align*}
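so $v\cdot X_1=0$ almost surely, because $(v\cdot X_1)^2$ is non-negative with zero mean. Applying this to the vectors of a basis of the finite-dimensional space $S^\perp$ and discarding finitely many null events, we obtain $X_1\in S$ almost surely; since the $X_i$ are identically distributed, $X_i\in S$ almost surely for every $i$.
Define whitened random vectors $Z_i:\Omega\to S$ by $Z_i(\omega):=\Sigma^{-1/2}X_i(\omega)$ on the event $\{X_i\in S\}$ and $Z_i(\omega):=0$ on its complement, which is a null event. Since $\Sigma^{-1/2}$ is symmetric on $S$, for every $a\in S$ the identity $a\cdot Z_i=(\Sigma^{-1/2}a)\cdot X_i$ holds almost surely. Therefore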
\begin{align*}
\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]
=
\mathbb{E}\bigl[(\Sigma^{-1/2}a\cdot X_i)^2\bigr].
\end{align*}
By the covariance identity for $X_i$,
\begin{align*}
\mathbb{E}\bigl[(\Sigma^{-1/2}a\cdot X_i)^2\bigr]
=
(\Sigma^{-1/2}a)^\top \Sigma(\Sigma^{-1/2}a).
\end{align*}
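Since $\Sigma^{-1/2}$ is the symmetric inverse of $\Sigma^{1/2}$ on $S$, this last quantity equals $a^\top\Sigma^{-1/2}\Sigma\Sigma^{-1/2}a=|a|^2$. Thus $Z_i$ is isotropic on $S$. Moreover, the relative sub-Gaussian hypothesis $\|v\cdot X_i\|_{\psi_2}\leq K(v^\top\Sigma v)^{1/2}$, applied to $v=\Sigma^{-1/2}a$ together with the identity $v^\top\Sigma v=|a|^2$ just proved, gives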
\begin{align*}
\|a\cdot Z_i\|_{\psi_2}
=
\|\Sigma^{-1/2}a\cdot X_i\|_{\psi_2}
\leq
K|a|.
\end{align*}
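Since each $Z_i$ is a measurable function of $X_i$ alone, the $Z_i$ inherit independence from the $X_i$. Hence $Z_1,\dots,Z_n$ are independent isotropic $K$-sub-Gaussian random vectors in the Euclidean space $S$.[/step]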
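A minimal numerical sketch of the whitening step, assuming NumPy and Gaussian samples (a special case of the sub-Gaussian hypothesis). The helper `whiten_on_range` is hypothetical: it builds an orthonormal basis `U` of $S=\operatorname{Range}(\Sigma)$ from an eigendecomposition together with the matrix of $\Sigma^{-1/2}$ in those coordinates. The sketch then checks that the samples have no component in $\ker(\Sigma)$ and that the whitened covariance is the identity on $S$. The relative tolerance `rel_tol` used to decide which eigenvalues count as zero is an assumption.

```python
import numpy as np

def whiten_on_range(sigma, rel_tol=1e-10):
    """Hypothetical helper: return (U, W) where the columns of U are an
    orthonormal basis of S = Range(sigma) and W @ x gives the coordinates of
    Sigma^{-1/2} x in that basis, for x in S."""
    eigvals, eigvecs = np.linalg.eigh(sigma)          # sigma is symmetric PSD
    keep = eigvals > rel_tol * eigvals.max()          # assumes sigma != 0
    U = eigvecs[:, keep]                              # p x k, columns span S
    W = np.diag(eigvals[keep] ** -0.5) @ U.T          # k x p whitening map
    return U, W

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))
sigma = B @ B.T                                       # singular (rank-2) covariance on R^5
X = rng.standard_normal((1000, 2)) @ B.T              # rows X_i = B g_i, so Cov(X_i) = sigma

U, W = whiten_on_range(sigma)
Z = X @ W.T                                           # rows: whitened Z_i in S-coordinates

# Each X_i lies in S: its component in ker(sigma) = S^perp is round-off only.
kernel_part = X - (X @ U) @ U.T
# Population isotropy on S: W sigma W^T is the k x k identity.
population_cov = W @ sigma @ W.T
# Empirical covariance of the whitened sample, close to the identity for large n.
empirical_cov = Z.T @ Z / len(Z)
```

Working in the coordinates of `U` keeps the whitened vectors in a $k$-dimensional space, which mirrors the choice of $S$ rather than $\mathbb{R}^p$ as the ambient Euclidean space.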
[guided]The covariance matrix may be singular, so $\Sigma^{-1/2}$ need not exist on all of $\mathbb{R}^p$, and whitening there is impossible in general. The correct space is the range
$S:=\operatorname{Range}(\Sigma)$. On this space $\Sigma|_S$ is positive definite, so its positive square root
$\Sigma^{1/2}:S\to S$ is invertible and $\Sigma^{-1/2}:S\to S$ is well-defined.
First we check that no part of the random vector lives outside $S$. Since $\Sigma$ is symmetric positive semidefinite, $S^\perp=\ker(\Sigma)$. If $v\in S^\perp$, then
\begin{align*}
\mathbb{E}\bigl[(v\cdot X_1)^2\bigr]
=
v^\top\Sigma v
=
0.
\end{align*}
The [random variable](/page/Random%20Variable) $(v\cdot X_1)^2$ is non-negative with zero mean, so it vanishes almost surely. Applying this to each vector of a basis of the finite-dimensional space $S^\perp$, we obtain that, almost surely, $v\cdot X_1=0$ simultaneously for every $v\in S^\perp$; hence $X_1\in S$ almost surely. The same argument applies to each $X_i$.
Now define the map $Z_i:\Omega\to S$ by $Z_i(\omega):=\Sigma^{-1/2}X_i(\omega)$, which is defined almost surely. For $a\in S$, symmetry of $\Sigma^{-1/2}$ on $S$ shows that the scalar random variable $a\cdot Z_i$ agrees almost surely with $(\Sigma^{-1/2}a)\cdot X_i$. Therefore
\begin{align*}
\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]
=
\mathbb{E}\bigl[(\Sigma^{-1/2}a\cdot X_i)^2\bigr].
\end{align*}
By the definition of covariance for $X_i$,
\begin{align*}
\mathbb{E}\bigl[(\Sigma^{-1/2}a\cdot X_i)^2\bigr]
=
(\Sigma^{-1/2}a)^\top\Sigma(\Sigma^{-1/2}a).
\end{align*}
Because $\Sigma^{-1/2}$ is the inverse of $\Sigma^{1/2}$ on $S$, the final expression is $|a|^2$. This proves that the whitened vector is isotropic on $S$. The same substitution in the sub-Gaussian assumption gives
\begin{align*}
\|a\cdot Z_i\|_{\psi_2}
=
\|\Sigma^{-1/2}a\cdot X_i\|_{\psi_2}
\leq
K\bigl((\Sigma^{-1/2}a)^\top\Sigma(\Sigma^{-1/2}a)\bigr)^{1/2}
=
K|a|.
\end{align*}
Thus the whitening step has converted the original covariance structure into an isotropic sub-Gaussian problem on the smaller Euclidean space $S$.[/guided]
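The two identities behind this substitution, $a\cdot Z_i=(\Sigma^{-1/2}a)\cdot X_i$ and $(\Sigma^{-1/2}a)^\top\Sigma(\Sigma^{-1/2}a)=|a|^2$ for $a\in S$, can be checked numerically. The sketch below assumes NumPy and represents $\Sigma^{-1/2}$ on $S$ as a symmetric $p\times p$ matrix that vanishes on $\ker(\Sigma)$ (the Moore-Penrose pseudo-inverse of $\Sigma^{1/2}$). The rank-3 matrix and the test vectors are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 3))
sigma = B @ B.T                                    # rank-3 PSD matrix on R^6

# Sigma^{-1/2} on S as a symmetric 6 x 6 matrix that is zero on ker(sigma)
eigvals, eigvecs = np.linalg.eigh(sigma)
keep = eigvals > 1e-10 * eigvals.max()
U = eigvecs[:, keep]
inv_sqrt = U @ np.diag(eigvals[keep] ** -0.5) @ U.T

a = U @ rng.standard_normal(U.shape[1])           # an arbitrary vector in S
x = B @ rng.standard_normal(3)                    # a sample value, which lies in S
v = inv_sqrt @ a                                  # the test direction v = Sigma^{-1/2} a

a_dot_z = a @ (inv_sqrt @ x)                      # a . Z with Z = Sigma^{-1/2} x
v_dot_x = v @ x                                   # (Sigma^{-1/2} a) . x
v_sigma_v = v @ sigma @ v                         # should equal |a|^2
```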
[step:Rewrite the operator norm as a quadratic process over an ellipsoid]Let
\begin{align*}
T:=\{\Sigma^{1/2}u:u\in S,\ |u|\leq 1\}\subseteq S.
\end{align*}
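For $u\in S$ with $|u|\leq 1$, set $a:=\Sigma^{1/2}u\in T$. By the definition of $\widehat{\Sigma}$,
\begin{align*}
u^\top(\widehat{\Sigma}-\Sigma)u
=
\frac{1}{n}\sum_{i=1}^n (u\cdot X_i)^2-u^\top\Sigma u.
\end{align*}
Since $X_i=\Sigma^{1/2}Z_i$ almost surely and $\Sigma^{1/2}$ is symmetric on $S$, we have $u\cdot X_i=a\cdot Z_i$ almost surely, while isotropy of $Z_i$ gives $\mathbb{E}[(a\cdot Z_i)^2]=|a|^2=u^\top\Sigma u$. Hence
\begin{align*}
u^\top(\widehat{\Sigma}-\Sigma)u
=
\frac{1}{n}\sum_{i=1}^n\left((a\cdot Z_i)^2-\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]\right).
\end{align*}
Almost surely, $\widehat{\Sigma}-\Sigma$ is symmetric, maps $S$ into $S$, and vanishes on $S^\perp$. The variational formula for the operator norm of a symmetric matrix therefore gives $\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}=\sup_{u\in S,\ |u|\leq 1}|u^\top(\widehat{\Sigma}-\Sigma)u|$, and since $u\mapsto\Sigma^{1/2}u$ maps the unit ball of $S$ onto $T$, we obtain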
\begin{align*}
\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}
=
\sup_{a\in T}
\left|
\frac{1}{n}\sum_{i=1}^n\left((a\cdot Z_i)^2-\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]\right)
\right|.
\end{align*}[/step]
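A numerical sanity check of the identity $u^\top(\widehat{\Sigma}-\Sigma)u=\frac{1}{n}\sum_i\bigl((a\cdot Z_i)^2-|a|^2\bigr)$ with $a=\Sigma^{1/2}u$, assuming NumPy and Gaussian samples with a singular covariance. The function `quadratic_process` is a hypothetical name for the process evaluated at a single point $a\in T$; it uses isotropy to replace $\mathbb{E}[(a\cdot Z_i)^2]$ by $|a|^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, n = 6, 3, 500
B = rng.standard_normal((p, k))
sigma = B @ B.T                                      # rank-k covariance on R^p
X = rng.standard_normal((n, k)) @ B.T                # rows are samples X_i in S
sigma_hat = X.T @ X / n                              # (1/n) sum X_i X_i^T

eigvals, eigvecs = np.linalg.eigh(sigma)
keep = eigvals > 1e-10 * eigvals.max()
U, lam = eigvecs[:, keep], eigvals[keep]
sqrt_S = U @ np.diag(np.sqrt(lam)) @ U.T            # Sigma^{1/2} on S (zero on S^perp)
inv_sqrt_S = U @ np.diag(lam ** -0.5) @ U.T         # Sigma^{-1/2} on S (zero on S^perp)
Z = X @ inv_sqrt_S                                   # rows Z_i = Sigma^{-1/2} X_i

def quadratic_process(a):
    """(1/n) sum_i ((a . Z_i)^2 - E[(a . Z_i)^2]), using isotropy: E[...] = |a|^2."""
    return np.mean((Z @ a) ** 2) - a @ a

u = U @ rng.standard_normal(k)
u /= np.linalg.norm(u)                               # a unit vector in S
a = sqrt_S @ u                                       # the corresponding point of T
direct = u @ (sigma_hat - sigma) @ u
```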
[guided]The goal is to turn the matrix norm into a supremum of scalar random variables, because the concentration input applies to quadratic processes. Define
\begin{align*}
T:=\{\Sigma^{1/2}u:u\in S,\ |u|\leq 1\}\subseteq S.
\end{align*}
For each $u\in S$ with $|u|\leq 1$, set $a:=\Sigma^{1/2}u\in T$. Since $X_i=\Sigma^{1/2}Z_i$ almost surely, we have $u\cdot X_i=(\Sigma^{1/2}u)\cdot Z_i=a\cdot Z_i$. Also, isotropy of $Z_i$ on $S$ gives
\begin{align*}
\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]=|a|^2=u^\top\Sigma u.
\end{align*}
Therefore
\begin{align*}
u^\top(\widehat{\Sigma}-\Sigma)u
=
\frac{1}{n}\sum_{i=1}^n\left((a\cdot Z_i)^2-\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]\right).
\end{align*}
Why is this enough for the operator norm? Almost surely, the matrix $\widehat{\Sigma}-\Sigma$ is symmetric, maps $S$ into $S$, and vanishes on $S^\perp$, because every $X_i$ lies in $S$ and $\Sigma$ vanishes on $S^\perp$. Hence the variational formula for the operator norm of a symmetric matrix reduces the supremum to the unit ball of $S$:
\begin{align*}
\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}
=
\sup_{u\in S,\ |u|\leq 1}|u^\top(\widehat{\Sigma}-\Sigma)u|.
\end{align*}
The map $u\mapsto a=\Sigma^{1/2}u$ sends this unit ball onto the ellipsoid $T$, so the last display becomes
\begin{align*}
\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}
=
\sup_{a\in T}
\left|
\frac{1}{n}\sum_{i=1}^n\left((a\cdot Z_i)^2-\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]\right)
\right|.
\end{align*}
This is the desired quadratic-process form.[/guided]
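The reduction to the unit ball of $S$ can be seen numerically. For the symmetric matrix $\widehat{\Sigma}-\Sigma$, the spectral norm equals the largest absolute eigenvalue, an eigenvector attaining it realizes $\sup_{|u|\leq 1}|u^\top(\widehat{\Sigma}-\Sigma)u|$, and that eigenvector has no component in $\ker(\Sigma)$. This is a sketch assuming NumPy and Gaussian samples; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
p, k, n = 6, 3, 200
B = rng.standard_normal((p, k))
sigma = B @ B.T                                     # rank-k covariance on R^p
X = rng.standard_normal((n, k)) @ B.T               # samples lie in S = Range(sigma)
delta = X.T @ X / n - sigma                         # symmetric; maps S into S

evals, evecs = np.linalg.eigh(delta)
j = np.argmax(np.abs(evals))
u_star = evecs[:, j]                                # unit vector attaining the sup
op_norm = np.linalg.norm(delta, 2)                  # largest singular value

# Orthonormal basis of S, then the part of u_star lying in ker(sigma)
s_vals, s_vecs = np.linalg.eigh(sigma)
U = s_vecs[:, s_vals > 1e-10 * s_vals.max()]
kernel_component = u_star - U @ (U.T @ u_star)
```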
[step:Estimate the radius and Gaussian width of the covariance ellipsoid]
Define the Euclidean radius of $T$ by
\begin{align*}
R(T):=\sup_{a\in T}|a|.
\end{align*}
Every $a\in T$ has the form $a=\Sigma^{1/2}u$ with $u\in S$ and $|u|\leq 1$, so
\begin{align*}
R(T)\leq \|\Sigma^{1/2}\|_{\mathrm{op}}=\|\Sigma\|_{\mathrm{op}}^{1/2}.
\end{align*}
Let $g:\Omega\to S$ be a standard Gaussian random vector on $S$, and define the Gaussian width
\begin{align*}
w(T):=\mathbb{E}\left[\sup_{a\in T} g\cdot a\right].
\end{align*}
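Since $\Sigma^{1/2}$ is symmetric on $S$, we have $g\cdot\Sigma^{1/2}u=(\Sigma^{1/2}g)\cdot u$, and the duality formula $\sup_{u\in S,\ |u|\leq 1}y\cdot u=|y|$ for $y\in S$ gives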
\begin{align*}
w(T)
=
\mathbb{E}\left[|\Sigma^{1/2}g|\right].
\end{align*}
Applying [Jensen's inequality](/theorems/1977) to the concave square-root function yields
\begin{align*}
w(T)
\leq
\left(\mathbb{E}\bigl[|\Sigma^{1/2}g|^2\bigr]\right)^{1/2}.
\end{align*}
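Since $g$ is standard Gaussian on $S$, $\mathbb{E}\bigl[|\Sigma^{1/2}g|^2\bigr]=\mathbb{E}[g^\top\Sigma g]=\operatorname{tr}(\Sigma|_S)=\operatorname{tr}(\Sigma)$, where the last equality holds because $\Sigma$ vanishes on $S^\perp$. Hence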
\begin{align*}
w(T)
\leq
\operatorname{tr}(\Sigma)^{1/2}.
\end{align*}
[/step]
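A Monte Carlo sketch of the two geometric estimates, assuming NumPy and a diagonal $\Sigma$ (no loss of generality after an orthogonal change of basis). With $g$ standard Gaussian on $S$, it estimates $w(T)=\mathbb{E}|\Sigma^{1/2}g|$ and the second moment $\mathbb{E}|\Sigma^{1/2}g|^2=\operatorname{tr}(\Sigma)$. The empirical mean of the norms never exceeds the square root of the empirical second moment, which is the sample version of the Jensen step. The spectrum and the number of draws are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = np.array([4.0, 1.0, 0.25, 0.0, 0.0])          # eigenvalues of a rank-3 Sigma = diag(lam)
nonzero = lam[lam > 0]                               # Sigma restricted to S = span(e1, e2, e3)

R_T = np.sqrt(lam.max())                             # R(T) = ||Sigma||_op^{1/2}

m = 200_000
g = rng.standard_normal((m, nonzero.size))           # standard Gaussian g on S, in coordinates
norms = np.linalg.norm(g * np.sqrt(nonzero), axis=1) # |Sigma^{1/2} g| for each draw
w_hat = norms.mean()                                 # Monte Carlo estimate of w(T)
second_moment_hat = np.mean(norms ** 2)              # estimate of tr(Sigma) = 5.25
```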
[step:Apply the sub-Gaussian quadratic-process bound]We use the following standard uniform concentration inequality for quadratic processes: Dirksen's generic-chaining bound for sub-Gaussian quadratic processes, specialized to isotropic linear functionals, where the $\gamma_2$-functional of $T$ with respect to the $\psi_2$-metric is bounded by a constant multiple of $K\,w(T)$ through Talagrand's majorizing-measure theorem. For independent isotropic $K$-sub-Gaussian random vectors $Z_1,\dots,Z_n$ in a finite-dimensional Euclidean space and every bounded set $T$ in that space, there is a constant $A_K>0$, depending only on $K$, such that for every $t\geq 1$, with probability at least $1-e^{-t}$,
\begin{align*}
\sup_{a\in T}
\left|
\frac{1}{n}\sum_{i=1}^n\left((a\cdot Z_i)^2-\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]\right)
\right|
\leq
A_K\left[
\frac{R(T)w(T)}{\sqrt{n}}
+
\frac{R(T)^2t^{1/2}}{\sqrt{n}}
+
\frac{w(T)^2}{n}
+
\frac{R(T)^2t}{n}
\right].
\end{align*}
This inequality is the only external concentration input in the proof; the remaining work is to verify its hypotheses and to substitute the geometric parameters of $T$.
The hypotheses of this inequality hold by the whitening step: the vectors $Z_i$ are independent, isotropic, and $K$-sub-Gaussian in the Euclidean space $S$, and the set $T$ is bounded because $R(T)\leq\|\Sigma\|_{\mathrm{op}}^{1/2}$. By the previous step, the left-hand side equals $\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}$. Since the right-hand side is nondecreasing in $R(T)$ and $w(T)$, we may substitute the bounds
\begin{align*}
R(T)\leq \|\Sigma\|_{\mathrm{op}}^{1/2},
\qquad
w(T)\leq \operatorname{tr}(\Sigma)^{1/2}
\end{align*}
to obtain, for every $t\geq 1$, with probability at least $1-e^{-t}$,
\begin{align*}
\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}
\leq
A_K\left[
\frac{\|\Sigma\|_{\mathrm{op}}^{1/2}\operatorname{tr}(\Sigma)^{1/2}}{\sqrt{n}}
+
\frac{\|\Sigma\|_{\mathrm{op}}t^{1/2}}{\sqrt{n}}
+
\frac{\operatorname{tr}(\Sigma)}{n}
+
\frac{\|\Sigma\|_{\mathrm{op}}t}{n}
\right].
\end{align*}[/step]
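A small simulation illustrating the shape of this bound, assuming NumPy and Gaussian data with $\Sigma=\operatorname{diag}(1,1/2,\dots,1/p)$. The constant $A_K$ is not specified in the text, so the sketch only reports the ratio of the observed error $\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}$ to the four-term bracket multiplying $A_K$; a ratio of moderate size is consistent with the bound, but it is not a measurement of $A_K$. The values of $p$, $n$ and $t$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, t = 200, 2000, 1.0
lam = 1.0 / np.arange(1, p + 1)                      # spectrum of Sigma = diag(lam)
sigma = np.diag(lam)
X = rng.standard_normal((n, p)) * np.sqrt(lam)       # Gaussian rows with covariance sigma
err = np.linalg.norm(X.T @ X / n - sigma, 2)         # ||Sigma_hat - Sigma||_op

op, tr = lam.max(), lam.sum()
bracket = (np.sqrt(op * tr / n) + op * np.sqrt(t / n)  # the four terms multiplying A_K
           + tr / n + op * t / n)
ratio = err / bracket
```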
[guided]At this stage the random-matrix problem has already been reduced to a standard concentration theorem. We apply Dirksen's generic-chaining sub-Gaussian quadratic-process bound, specialized to isotropic linear functionals, to the independent random vectors $Z_1,\dots,Z_n$. The whitening step proved that these vectors are isotropic and $K$-sub-Gaussian in the finite-dimensional Euclidean space $S$, and the preceding geometric estimate proved that $T$ is bounded with
\begin{align*}
R(T)\leq \|\Sigma\|_{\mathrm{op}}^{1/2},
\qquad
w(T)\leq \operatorname{tr}(\Sigma)^{1/2}.
\end{align*}
Thus the theorem applies for every $t\geq 1$ and gives, with probability at least $1-e^{-t}$,
\begin{align*}
\sup_{a\in T}
\left|
\frac{1}{n}\sum_{i=1}^n\left((a\cdot Z_i)^2-\mathbb{E}\bigl[(a\cdot Z_i)^2\bigr]\right)
\right|
\leq
A_K\left[
\frac{R(T)w(T)}{\sqrt{n}}
+
\frac{R(T)^2t^{1/2}}{\sqrt{n}}
+
\frac{w(T)^2}{n}
+
\frac{R(T)^2t}{n}
\right].
\end{align*}
The previous step identified the left-hand side with $\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}$. Since the right-hand side is nondecreasing in $R(T)$ and $w(T)$, substituting the displayed bounds yields
\begin{align*}
\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}
\leq
A_K\left[
\frac{\|\Sigma\|_{\mathrm{op}}^{1/2}\operatorname{tr}(\Sigma)^{1/2}}{\sqrt{n}}
+
\frac{\|\Sigma\|_{\mathrm{op}}t^{1/2}}{\sqrt{n}}
+
\frac{\operatorname{tr}(\Sigma)}{n}
+
\frac{\|\Sigma\|_{\mathrm{op}}t}{n}
\right].
\end{align*}
This is the point where the dimension-free geometry of the ellipsoid enters: the bound depends on $\operatorname{tr}(\Sigma)$ and $\|\Sigma\|_{\mathrm{op}}$, not on the ambient dimension $p$ directly.[/guided]
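To illustrate that the bound sees $p$ only through $\operatorname{tr}(\Sigma)$ and $\|\Sigma\|_{\mathrm{op}}$: for the spectrum $\lambda_k=1/k^2$, the ratio $\operatorname{tr}(\Sigma)/\|\Sigma\|_{\mathrm{op}}$ (the effective rank used in the next step) stays below $\pi^2/6$ however large $p$ is, while for $\Sigma=I_p$ it equals $p$. A short sketch assuming NumPy; the helper name `trace_to_op_ratio` is hypothetical.

```python
import numpy as np

def trace_to_op_ratio(lam):
    """tr(Sigma) / ||Sigma||_op for a PSD Sigma with eigenvalues lam (not all zero)."""
    lam = np.asarray(lam, dtype=float)
    return lam.sum() / lam.max()

# Fast-decaying spectrum 1, 1/4, 1/9, ...: the ratio stays bounded as p grows.
decaying = {p: trace_to_op_ratio(1.0 / np.arange(1, p + 1) ** 2)
            for p in (10, 1_000, 100_000)}

# Flat spectrum (Sigma = I_p): the ratio equals the ambient dimension p.
flat = {p: trace_to_op_ratio(np.ones(p)) for p in (10, 1_000, 100_000)}
```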
[step:Rewrite the estimate in effective-rank form]
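Since $\Sigma\neq 0$, we have $\|\Sigma\|_{\mathrm{op}}>0$, so the effective rank $r_{\mathrm{eff}}(\Sigma)=\operatorname{tr}(\Sigma)/\|\Sigma\|_{\mathrm{op}}$ is well defined and
\begin{align*}
\operatorname{tr}(\Sigma)=r_{\mathrm{eff}}(\Sigma)\|\Sigma\|_{\mathrm{op}}.
\end{align*}
For the first two terms in the bracket,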
\begin{align*}
\frac{\|\Sigma\|_{\mathrm{op}}^{1/2}\operatorname{tr}(\Sigma)^{1/2}}{\sqrt{n}}
+
\frac{\|\Sigma\|_{\mathrm{op}}t^{1/2}}{\sqrt{n}}
=
\|\Sigma\|_{\mathrm{op}}
\left[
\sqrt{\frac{r_{\mathrm{eff}}(\Sigma)}{n}}
+
\sqrt{\frac{t}{n}}
\right].
\end{align*}
Using $\sqrt{x}+\sqrt{y}\leq 2\sqrt{x+y}$ for $x,y\geq 0$, this implies
\begin{align*}
\frac{\|\Sigma\|_{\mathrm{op}}^{1/2}\operatorname{tr}(\Sigma)^{1/2}}{\sqrt{n}}
+
\frac{\|\Sigma\|_{\mathrm{op}}t^{1/2}}{\sqrt{n}}
\leq
2\|\Sigma\|_{\mathrm{op}}
\sqrt{\frac{r_{\mathrm{eff}}(\Sigma)+t}{n}}.
\end{align*}
For the remaining two terms,
\begin{align*}
\frac{\operatorname{tr}(\Sigma)}{n}
+
\frac{\|\Sigma\|_{\mathrm{op}}t}{n}
=
\|\Sigma\|_{\mathrm{op}}
\frac{r_{\mathrm{eff}}(\Sigma)+t}{n}.
\end{align*}
Inserting these two estimates into the bound of the previous step and absorbing the numerical factor $2$ into the constant, fix any $C_K\geq 3A_K$; it depends only on $K$. Then, for every $t\geq 1$, with probability at least $1-e^{-t}$,
\begin{align*}
\|\widehat{\Sigma}-\Sigma\|_{\mathrm{op}}
\leq
C_K\|\Sigma\|_{\mathrm{op}}
\left[
\sqrt{\frac{r_{\mathrm{eff}}(\Sigma)+t}{n}}
+
\frac{r_{\mathrm{eff}}(\Sigma)+t}{n}
\right].
\end{align*}
This is the asserted sample covariance concentration bound.
[/step]
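A quick numerical check of the algebra in this step, assuming NumPy. Over random admissible inputs with $\operatorname{tr}(\Sigma)\geq\|\Sigma\|_{\mathrm{op}}>0$ and $t\geq 1$, the ratio of the four-term bracket to $\|\Sigma\|_{\mathrm{op}}\bigl[\sqrt{(r_{\mathrm{eff}}(\Sigma)+t)/n}+(r_{\mathrm{eff}}(\Sigma)+t)/n\bigr]$ stays between $1$ and $\sqrt{2}$, so any $C_K\geq 3A_K$ is indeed admissible. The function names and sampling ranges are hypothetical choices.

```python
import numpy as np

def four_term_bracket(op, tr, n, t):
    """Bracket from the quadratic-process step, i.e. the bound divided by A_K."""
    return np.sqrt(op * tr / n) + op * np.sqrt(t / n) + tr / n + op * t / n

def effective_rank_form(op, tr, n, t):
    """||Sigma||_op * [sqrt((r+t)/n) + (r+t)/n] with r = tr(Sigma)/||Sigma||_op."""
    s = (tr / op + t) / n
    return op * (np.sqrt(s) + s)

rng = np.random.default_rng(6)
ratios = []
for _ in range(10_000):
    op = rng.uniform(0.1, 10.0)                     # ||Sigma||_op > 0
    tr = op * rng.uniform(1.0, 100.0)               # tr(Sigma) >= ||Sigma||_op
    n = int(rng.integers(1, 10_000))
    t = rng.uniform(1.0, 50.0)
    ratios.append(four_term_bracket(op, tr, n, t) / effective_rank_form(op, tr, n, t))
```

The lower bound $1$ reflects $\sqrt{x}+\sqrt{y}\geq\sqrt{x+y}$, and the upper bound $\sqrt{2}$ is the sharp form of the inequality $\sqrt{x}+\sqrt{y}\leq 2\sqrt{x+y}$ used above.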