[proofplan]
The proof first identifies the Ledoit-Wolf oracle shrinkage intensity for the population-target risk by expanding the expected Frobenius risk as a quadratic polynomial in the shrinkage parameter. The minimizer depends on two population quantities: the expected squared sampling error $\beta_n^2$ of the sample covariance and the expected squared distance $\delta_n^2$ of the sample covariance from the scalar shrinkage target. The Ledoit-Wolf estimator replaces these quantities by sample estimates that are consistent after normalization by $p_n$. The assumed high-dimensional laws of large numbers then imply that the resulting data-dependent shrinkage intensity converges in probability to the oracle one. The final step proves asymptotic equivalence of the data-driven shrinkage estimator and the oracle shrinkage estimator. This avoids the stronger and generally false claim that the deterministic oracle minimizes each realized sample loss.
[/proofplan]
[step:Expand the loss as a quadratic function of the shrinkage intensity]
Fix $n \ge 1$. For $\alpha \in [0,1]$, define the linear shrinkage estimator
\begin{align*}
\hat{\Sigma}_n(\alpha)
=
\alpha\hat{\mu}_n I_{p_n}
+
(1-\alpha)S_n.
\end{align*}
Define also the centered sample direction
\begin{align*}
A_n := S_n-\hat{\mu}_n I_{p_n}
\end{align*}
and the sample covariance error
\begin{align*}
E_n := S_n-\Sigma_n.
\end{align*}
Then
\begin{align*}
\hat{\Sigma}_n(\alpha)-\Sigma_n
=
E_n-\alpha A_n.
\end{align*}
Define the realized normalized Frobenius loss map $L_n:[0,1]\to[0,\infty)$ by
\begin{align*}
L_n(\alpha)
&:=
\frac{1}{p_n}\|E_n-\alpha A_n\|_F^2\\
&=
\frac{1}{p_n}\|E_n\|_F^2
-
\frac{2\alpha}{p_n}\operatorname{tr}(E_nA_n)
+
\frac{\alpha^2}{p_n}\|A_n\|_F^2.
\end{align*}
Thus, for the realized sample, $L_n$ is a polynomial of degree at most two in $\alpha$. It is convex because its leading coefficient $p_n^{-1}\|A_n\|_F^2$ is nonnegative.
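The expansion is an exact algebraic identity, so it can be sanity-checked numerically. The Python sketch below is illustrative only. It assumes mean-zero Gaussian observations, a diagonal $\Sigma_n$, and $\hat{\mu}_n=\operatorname{tr}(S_n)/p_n$, which is the usual Ledoit-Wolf scalar target but is not restated in this excerpt. Its helper names are hypothetical. The code evaluates $L_n(\alpha)$ directly and through the three-term quadratic.

```python
import random

# Illustrative sketch: helper names are hypothetical, not from any library.
# Assumptions: mean-zero Gaussian rows, diagonal Sigma_n, mu_hat = tr(S_n) / p_n.

def sample_cov(xs):
    """S_n = (1/n) sum_k x_k x_k^T for mean-zero rows (no centering)."""
    n, p = len(xs), len(xs[0])
    return [[sum(x[i] * x[j] for x in xs) / n for j in range(p)] for i in range(p)]

def frob2(a):
    """Squared Frobenius norm ||A||_F^2."""
    return sum(v * v for row in a for v in row)

def trace_prod(a, b):
    """tr(AB) = sum_{i,j} A_ij B_ji."""
    p = len(a)
    return sum(a[i][j] * b[j][i] for i in range(p) for j in range(p))

def loss_two_ways(xs, sigma, alpha):
    """Return L_n(alpha) computed directly and via the quadratic expansion."""
    p = len(sigma)
    s = sample_cov(xs)
    mu_hat = sum(s[i][i] for i in range(p)) / p
    a = [[s[i][j] - (mu_hat if i == j else 0.0) for j in range(p)] for i in range(p)]  # A_n
    e = [[s[i][j] - sigma[i][j] for j in range(p)] for i in range(p)]                   # E_n
    direct = frob2([[e[i][j] - alpha * a[i][j] for j in range(p)] for i in range(p)]) / p
    expanded = (frob2(e) - 2 * alpha * trace_prod(e, a) + alpha ** 2 * frob2(a)) / p
    return direct, expanded

rng = random.Random(0)
variances = [1.0, 2.0, 3.0]
sigma = [[variances[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
xs = [[rng.gauss(0.0, v ** 0.5) for v in variances] for _ in range(20)]
for alpha in (0.0, 0.25, 0.5, 1.0):
    direct, expanded = loss_two_ways(xs, sigma, alpha)
    print(f"alpha={alpha:.2f}  direct={direct:.6f}  expanded={expanded:.6f}")
```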
[guided]
We first isolate exactly how the shrinkage parameter enters the loss. The estimator is a point on the line segment between $S_n$ and the scalar matrix $\hat{\mu}_n I_{p_n}$. Writing
\begin{align*}
A_n := S_n-\hat{\mu}_n I_{p_n}
\end{align*}
means that increasing $\alpha$ moves the estimator away from $S_n$ in the direction $-A_n$. Since
\begin{align*}
\hat{\Sigma}_n(\alpha)
=
S_n-\alpha(S_n-\hat{\mu}_n I_{p_n})
=
S_n-\alpha A_n,
\end{align*}
subtracting $\Sigma_n$ gives
\begin{align*}
\hat{\Sigma}_n(\alpha)-\Sigma_n
=
S_n-\Sigma_n-\alpha A_n.
\end{align*}
With
\begin{align*}
E_n := S_n-\Sigma_n,
\end{align*}
this becomes $E_n-\alpha A_n$. Expanding the squared Frobenius norm with the identity
\begin{align*}
\|B-\alpha C\|_F^2
=
\|B\|_F^2
-
2\alpha\operatorname{tr}(BC)
+
\alpha^2\|C\|_F^2
\end{align*}
which holds for symmetric matrices $B,C \in \mathbb{R}^{p_n\times p_n}$ and applies here because $E_n$ and $A_n$ are both symmetric, gives
\begin{align*}
L_n(\alpha)
&=
\frac{1}{p_n}\|E_n-\alpha A_n\|_F^2\\
&=
\frac{1}{p_n}\|E_n\|_F^2
-
\frac{2\alpha}{p_n}\operatorname{tr}(E_nA_n)
+
\frac{\alpha^2}{p_n}\|A_n\|_F^2.
\end{align*}
This is the key reduction: optimal shrinkage is now a one-dimensional quadratic minimization problem.
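Because $L_n$ is a quadratic with nonnegative leading coefficient, its minimizer over $[0,1]$ is the clipped vertex $\min\{1,\max\{0,\operatorname{tr}(E_nA_n)/\|A_n\|_F^2\}\}$ whenever $A_n\neq0$. The short sketch below compares this formula with a grid search on made-up coefficients. Its function names are hypothetical. This realized minimizer involves $E_n=S_n-\Sigma_n$ and therefore the unknown $\Sigma_n$, so it illustrates the one-dimensional reduction but is not an estimator.

```python
# Illustrative sketch: function names are hypothetical.

def clip01(t):
    """Projection of a real number onto [0, 1]."""
    return min(1.0, max(0.0, t))

def quad_loss(alpha, e_sq, cross, a_sq):
    """q(alpha) = e_sq - 2*alpha*cross + alpha^2*a_sq, the shape of L_n(alpha)."""
    return e_sq - 2 * alpha * cross + alpha ** 2 * a_sq

def realized_minimizer(e_sq, cross, a_sq):
    """Minimize quad_loss over [0, 1].

    e_sq, cross, a_sq stand for ||E_n||_F^2/p_n, tr(E_n A_n)/p_n, ||A_n||_F^2/p_n.
    """
    if a_sq == 0.0:
        return 0.0  # A_n = 0 forces cross = 0, so the loss is constant in alpha
    return clip01(cross / a_sq)

coeffs = (2.0, 0.3, 1.2)  # made-up values with cross^2 <= e_sq * a_sq (Cauchy-Schwarz)
alpha_vertex = realized_minimizer(*coeffs)
grid = [k / 10000 for k in range(10001)]
alpha_grid = min(grid, key=lambda t: quad_loss(t, *coeffs))
print(alpha_vertex, alpha_grid)
```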
[/guided]
[/step]
[step:Identify the deterministic oracle shrinkage intensity]
Define the deterministic oracle quantities
\begin{align*}
\beta_n^2 &= \mathbb{E}\|S_n-\Sigma_n\|_F^2,\\
\delta_n^2 &= \mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2.
\end{align*}
Since $\mathbb{E}S_n=\Sigma_n$, we have
\begin{align*}
\mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr)=0.
\end{align*}
Therefore
\begin{align*}
\delta_n^2
&=
\mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2\\
&=
\mathbb{E}\|S_n-\Sigma_n\|_F^2
+
\|\Sigma_n-\mu_n I_{p_n}\|_F^2\\
&=
\beta_n^2+\|\Sigma_n-\mu_n I_{p_n}\|_F^2.
\end{align*}
Now define the population-target shrinkage family
\begin{align*}
\widetilde{\Sigma}_n(\alpha)
=
\alpha\mu_n I_{p_n}+(1-\alpha)S_n,
\qquad 0\le \alpha\le1,
\end{align*}
and define its expected normalized Frobenius risk $R_n:[0,1]\to[0,\infty)$ by
\begin{align*}
R_n(\alpha)
=
\frac{1}{p_n}\mathbb{E}\left\|\widetilde{\Sigma}_n(\alpha)-\Sigma_n\right\|_F^2.
\end{align*}
Since
\begin{align*}
\widetilde{\Sigma}_n(\alpha)-\Sigma_n
=
(S_n-\Sigma_n)-\alpha(S_n-\mu_n I_{p_n}),
\end{align*}
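expanding the square and using
\begin{align*}
\mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(S_n-\mu_n I_{p_n})\bigr)
&=
\mathbb{E}\|S_n-\Sigma_n\|_F^2
+
\mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr)
=
\beta_n^2,
\end{align*}
we obtain
\begin{align*}
R_n(\alpha)
&=
\frac{1}{p_n}\left(\beta_n^2-2\alpha\beta_n^2+\alpha^2\delta_n^2\right).
\end{align*}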
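When $\delta_n^2>0$ this quadratic is strictly convex. Setting its derivative $2(\alpha\delta_n^2-\beta_n^2)/p_n$ equal to zero gives the unconstrained minimizer $\beta_n^2/\delta_n^2$. Therefore the constrained minimizer on $[0,1]$ is
\begin{align*}
\alpha_n^*
=
\min\left\{1,\max\left\{0,\frac{\beta_n^2}{\delta_n^2}\right\}\right\}.
\end{align*}
The hypothesis $\liminf_{n\to\infty}p_n^{-1}\delta_n^2>0$ gives $\delta_n^2>0$ for all sufficiently large $n$, so the ratio is well defined. Moreover, the decomposition $\delta_n^2=\beta_n^2+\|\Sigma_n-\mu_n I_{p_n}\|_F^2$ gives $0\le\beta_n^2\le\delta_n^2$. The clipping is therefore inactive and $\alpha_n^*=\beta_n^2/\delta_n^2$. This is the Ledoit-Wolf oracle intensity for the population-target risk $R_n$.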
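To see the oracle in a concrete case, one can add an assumption that the proof does not make: i.i.d. mean-zero Gaussian observations. Under that assumption, Isserlis' theorem gives $\operatorname{Var}(X_iX_j)=\Sigma_{ii}\Sigma_{jj}+\Sigma_{ij}^2$, hence $\beta_n^2=\{(\operatorname{tr}\Sigma_n)^2+\operatorname{tr}(\Sigma_n^2)\}/n$. The sketch below combines this with the decomposition of $\delta_n^2$. It also assumes $\mu_n=\operatorname{tr}(\Sigma_n)/p_n$, which is not restated in this excerpt. The function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.
# Assumptions: i.i.d. mean-zero Gaussian rows and mu_n = tr(Sigma_n) / p_n.

def gaussian_oracle_intensity(sigma, n):
    """Return (beta^2, delta^2, alpha_n^*) for a p x p covariance given as lists."""
    p = len(sigma)
    tr = sum(sigma[i][i] for i in range(p))
    tr_sq = sum(sigma[i][j] * sigma[j][i] for i in range(p) for j in range(p))  # tr(Sigma^2)
    mu = tr / p
    beta2 = (tr ** 2 + tr_sq) / n           # E||S_n - Sigma_n||_F^2 under Gaussianity
    dist2 = tr_sq - p * mu ** 2             # ||Sigma_n - mu_n I||_F^2
    delta2 = beta2 + dist2                  # decomposition of delta_n^2
    return beta2, delta2, min(1.0, max(0.0, beta2 / delta2))

diag = [1.0, 2.0, 3.0, 4.0]
sigma = [[diag[i] if i == j else 0.0 for j in range(4)] for i in range(4)]
beta2, delta2, alpha_star = gaussian_oracle_intensity(sigma, n=2000)
print(beta2, delta2, alpha_star)  # beta2 = 0.065 and delta2 = 5.065 up to rounding
```

With $\Sigma_n=I_{p_n}$ the distance term vanishes and the sketch returns $\alpha_n^*=1$. That is full shrinkage toward the scalar target, which is then exact.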
[guided]
The oracle parameter is the value that would be used if the population quantities were known. We define
\begin{align*}
\beta_n^2 = \mathbb{E}\|S_n-\Sigma_n\|_F^2
\end{align*}
as the total sampling fluctuation of the sample covariance matrix, and
\begin{align*}
\delta_n^2 = \mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2
\end{align*}
as the total expected squared distance from the scalar shrinkage target.
The cross-term between the sampling error and the deterministic population deviation vanishes. Indeed, since
\begin{align*}
S_n=\frac{1}{n}\sum_{k=1}^{n}X_{n,k}X_{n,k}^{\top}
\end{align*}
and $\mathbb{E}[X_{n,k}X_{n,k}^{\top}]=\Sigma_n$, linearity of expectation gives $\mathbb{E}S_n=\Sigma_n$. Hence
\begin{align*}
\mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr)
=
\operatorname{tr}\bigl((\mathbb{E}S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr)
=
0.
\end{align*}
Expanding $S_n-\mu_n I_{p_n}$ as
\begin{align*}
S_n-\mu_n I_{p_n}
=
(S_n-\Sigma_n)+(\Sigma_n-\mu_n I_{p_n})
\end{align*}
therefore yields
\begin{align*}
\delta_n^2
=
\beta_n^2+\|\Sigma_n-\mu_n I_{p_n}\|_F^2.
\end{align*}
For the population-target estimator
\begin{align*}
\widetilde{\Sigma}_n(\alpha)
=
\alpha\mu_n I_{p_n}+(1-\alpha)S_n,
\end{align*}
the expected normalized Frobenius risk is
\begin{align*}
R_n(\alpha)
=
\frac{1}{p_n}\mathbb{E}\left\|\widetilde{\Sigma}_n(\alpha)-\Sigma_n\right\|_F^2.
\end{align*}
The identity
\begin{align*}
\widetilde{\Sigma}_n(\alpha)-\Sigma_n
=
(S_n-\Sigma_n)-\alpha(S_n-\mu_n I_{p_n})
\end{align*}
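together with $\mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(S_n-\mu_n I_{p_n})\bigr)=\beta_n^2$, which follows from the same splitting of $S_n-\mu_n I_{p_n}$ and the vanishing cross-term, gives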
\begin{align*}
R_n(\alpha)
=
\frac{1}{p_n}\left(\beta_n^2-2\alpha\beta_n^2+\alpha^2\delta_n^2\right).
\end{align*}
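Differentiating gives $R_n'(\alpha)=2(\alpha\delta_n^2-\beta_n^2)/p_n$. So whenever $\delta_n^2>0$, the fraction $\beta_n^2/\delta_n^2$ is the unique unconstrained minimizer of this quadratic risk. Because $\delta_n^2=\beta_n^2+\|\Sigma_n-\mu_n I_{p_n}\|_F^2\ge\beta_n^2$, this fraction already lies in $[0,1]$, so clipping does not actually bind. Formally, since the shrinkage family restricts $\alpha$ to $[0,1]$, the oracle intensity is the clipped value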
\begin{align*}
\alpha_n^*
=
\min\left\{1,\max\left\{0,\frac{\beta_n^2}{\delta_n^2}\right\}\right\}.
\end{align*}
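A seeded Monte Carlo sketch can make the decomposition of $\delta_n^2$ and the risk formula concrete. It assumes i.i.d. mean-zero Gaussian rows, a diagonal $\Sigma_n$ and $\mu_n=\operatorname{tr}(\Sigma_n)/p_n$. These assumptions are not part of the proof. For $\Sigma_n=\operatorname{diag}(1,2,3)$ and $n=10$, the Gaussian closed form gives $\beta_n^2=(6^2+14)/10=5$ and $\|\Sigma_n-2I_3\|_F^2=2$, so $\delta_n^2=7$. At $\alpha=1$ the risk is the deterministic value $\|\Sigma_n-\mu_nI_3\|_F^2/3=2/3$. The function name is hypothetical.

```python
import random

# Illustrative Monte Carlo sketch; the function name is hypothetical.
# Assumes i.i.d. mean-zero Gaussian rows, diagonal Sigma_n, mu_n = tr(Sigma_n) / p_n.
def mc_oracle_quantities(variances, n, reps, alphas, seed=0):
    rng = random.Random(seed)
    p = len(variances)
    mu = sum(variances) / p
    sigma = [[variances[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
    beta_sum = delta_sum = 0.0
    risk_sums = [0.0] * len(alphas)
    for _ in range(reps):
        xs = [[rng.gauss(0.0, v ** 0.5) for v in variances] for _ in range(n)]
        s = [[sum(x[i] * x[j] for x in xs) / n for j in range(p)] for i in range(p)]
        err = [[s[i][j] - sigma[i][j] for j in range(p)] for i in range(p)]              # S - Sigma
        dev = [[s[i][j] - (mu if i == j else 0.0) for j in range(p)] for i in range(p)]  # S - mu I
        beta_sum += sum(v * v for row in err for v in row)
        delta_sum += sum(v * v for row in dev for v in row)
        for k, a in enumerate(alphas):
            # tilde Sigma(alpha) - Sigma = (S - Sigma) - alpha (S - mu I)
            risk_sums[k] += sum((err[i][j] - a * dev[i][j]) ** 2
                                for i in range(p) for j in range(p)) / p
    return beta_sum / reps, delta_sum / reps, [r / reps for r in risk_sums]

variances, n, alphas = [1.0, 2.0, 3.0], 10, [0.0, 0.5, 1.0]
beta2_mc, delta2_mc, risks_mc = mc_oracle_quantities(variances, n, reps=4000, alphas=alphas)
# Gaussian closed form for this Sigma_n: beta^2 = 5, delta^2 = 7, p_n = 3.
risks_formula = [(5.0 - 2 * a * 5.0 + a * a * 7.0) / 3 for a in alphas]
print(beta2_mc, delta2_mc)
print(risks_mc, risks_formula)
```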
[/guided]
[/step]
[step:Use the Ledoit-Wolf laws of large numbers to estimate the oracle ratio]
By the assumed Ledoit-Wolf high-dimensional moment conditions,
\begin{align*}
\frac{1}{p_n}\left|\hat{\beta}_n^2-\beta_n^2\right|
\xrightarrow{\mathbb{P}}0,
\qquad
\frac{1}{p_n}\left|\hat{\delta}_n^2-\delta_n^2\right|
\xrightarrow{\mathbb{P}}0.
\end{align*}
Set
\begin{align*}
B_n=\frac{\beta_n^2}{p_n},
\qquad
D_n=\frac{\delta_n^2}{p_n},
\qquad
\widehat B_n=\frac{\hat{\beta}_n^2}{p_n},
\qquad
\widehat D_n=\frac{\hat{\delta}_n^2}{p_n}.
\end{align*}
By assumption, $\widehat B_n-B_n\xrightarrow{\mathbb P}0$ and $\widehat D_n-D_n\xrightarrow{\mathbb P}0$. Also, there are constants $c>0$ and $K<\infty$ such that $D_n\ge c$ for all sufficiently large $n$ and $0\le B_n\le K$ for all $n$. Consider such $n$ and the event $\{|\widehat D_n-D_n|\le c/2\}$, whose probability tends to $1$. On this event $\widehat D_n\ge c/2$. The identity
\begin{align*}
\frac{\widehat B_n}{\widehat D_n}
-
\frac{B_n}{D_n}
=
\frac{\widehat B_n-B_n}{\widehat D_n}
+
\frac{B_n(D_n-\widehat D_n)}{\widehat D_nD_n}
\end{align*}
therefore gives, on the same event,
\begin{align*}
\left|
\frac{\widehat B_n}{\widehat D_n}
-
\frac{B_n}{D_n}
\right|
\le
\frac{|\widehat B_n-B_n|}{\widehat D_n}
+
\frac{|B_n|\,|\widehat D_n-D_n|}{\widehat D_nD_n}
\le
\frac{2}{c}|\widehat B_n-B_n|
+
\frac{2K}{c^2}|\widehat D_n-D_n|.
\end{align*}
The right-hand side converges to $0$ in probability. Because the centering $B_n/D_n$ depends on $n$, this direct bound, rather than the continuous-mapping theorem, yields
\begin{align*}
\frac{\hat{\beta}_n^2}{\hat{\delta}_n^2}
-
\frac{\beta_n^2}{\delta_n^2}
=
\frac{\widehat B_n}{\widehat D_n}
-
\frac{B_n}{D_n}
\xrightarrow{\mathbb{P}}0.
\end{align*}
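On the event $\{\hat{\delta}_n^2>0\}$, whose probability tends to $1$, the intensities $\hat{\alpha}_n$ and $\alpha_n^*$ are the images of $\hat{\beta}_n^2/\hat{\delta}_n^2$ and $\beta_n^2/\delta_n^2$ under the clipping map $t\mapsto \min\{1,\max\{0,t\}\}$. This map is Lipschitz with constant $1$. Hence, on that event,
\begin{align*}
|\hat{\alpha}_n-\alpha_n^*|
\le
\left|
\frac{\hat{\beta}_n^2}{\hat{\delta}_n^2}
-
\frac{\beta_n^2}{\delta_n^2}
\right|,
\end{align*}
and therefore
\begin{align*}
\hat{\alpha}_n-\alpha_n^*
\xrightarrow{\mathbb{P}}0.
\end{align*}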
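The excerpt does not restate the sample quantities $\hat{\beta}_n^2$ and $\hat{\delta}_n^2$. The sketch below assumes the standard Ledoit-Wolf choices, written with unnormalized Frobenius norms: $\hat{\mu}_n=\operatorname{tr}(S_n)/p_n$, $\hat{\delta}_n^2=\|S_n-\hat{\mu}_nI_{p_n}\|_F^2$, $\bar{\beta}_n^2=n^{-2}\sum_{k}\|X_{n,k}X_{n,k}^\top-S_n\|_F^2$, $\hat{\beta}_n^2=\min\{\bar{\beta}_n^2,\hat{\delta}_n^2\}$ and $\hat{\alpha}_n=\hat{\beta}_n^2/\hat{\delta}_n^2$. It compares $\hat{\alpha}_n$ on simulated Gaussian data with the Gaussian closed-form oracle $\alpha_n^*=0.065/5.065$ for $\Sigma_n=\operatorname{diag}(1,2,3,4)$ and $n=2000$. The function name is hypothetical.

```python
import random

# Illustrative sketch; the function name is hypothetical and the formulas are the
# assumed standard Ledoit-Wolf plug-in estimates for mean-zero rows.
def ledoit_wolf_intensity(xs):
    """Plug-in intensity beta_hat^2 / delta_hat^2 for mean-zero rows xs."""
    n, p = len(xs), len(xs[0])
    s = [[sum(x[i] * x[j] for x in xs) / n for j in range(p)] for i in range(p)]
    mu_hat = sum(s[i][i] for i in range(p)) / p
    delta2_hat = sum((s[i][j] - (mu_hat if i == j else 0.0)) ** 2
                     for i in range(p) for j in range(p))
    if delta2_hat == 0.0:
        return 1.0  # S_n is already scalar, so every alpha gives the same estimator
    # bar beta^2 = n^{-2} sum_k ||x_k x_k^T - S_n||_F^2
    beta2_bar = sum(sum((x[i] * x[j] - s[i][j]) ** 2 for i in range(p) for j in range(p))
                    for x in xs) / n ** 2
    beta2_hat = min(beta2_bar, delta2_hat)  # truncation keeps the ratio in [0, 1]
    return beta2_hat / delta2_hat

rng = random.Random(1)
variances, n = [1.0, 2.0, 3.0, 4.0], 2000
xs = [[rng.gauss(0.0, v ** 0.5) for v in variances] for _ in range(n)]
alpha_hat = ledoit_wolf_intensity(xs)
alpha_star = 0.065 / 5.065  # Gaussian closed-form oracle for this Sigma_n and n
print(alpha_hat, alpha_star)
```

Because of the truncation $\min\{\bar{\beta}_n^2,\hat{\delta}_n^2\}$, the plug-in ratio in this sketch always lies in $[0,1]$, so the clipping step never binds.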
[guided]
The two sample quantities $\hat{\beta}_n^2$ and $\hat{\delta}_n^2$ are designed to estimate the two population quantities entering the oracle ratio. The Ledoit-Wolf moment assumptions give the normalized consistency statements
\begin{align*}
\frac{1}{p_n}\left|\hat{\beta}_n^2-\beta_n^2\right|
\xrightarrow{\mathbb{P}}0
\end{align*}
and
\begin{align*}
\frac{1}{p_n}\left|\hat{\delta}_n^2-\delta_n^2\right|
\xrightarrow{\mathbb{P}}0.
\end{align*}
Normalizing by $p_n$ is the natural scale because squared Frobenius norms of $p_n\times p_n$ covariance-type matrices grow linearly in $p_n$ under the bounded trace assumptions. For example, $\|I_{p_n}\|_F^2=p_n$.
The denominator is not allowed to degenerate: the hypothesis
\begin{align*}
\liminf_{n\to\infty}p_n^{-1}\delta_n^2>0
\end{align*}
says that $\delta_n^2$ remains of order at least $p_n$. The hypotheses also give uniform upper bounds for $p_n^{-1}\beta_n^2$ and $p_n^{-1}\delta_n^2$. With
\begin{align*}
B_n=\frac{\beta_n^2}{p_n},
\quad
D_n=\frac{\delta_n^2}{p_n},
\quad
\widehat B_n=\frac{\hat{\beta}_n^2}{p_n},
\quad
\widehat D_n=\frac{\hat{\delta}_n^2}{p_n},
\end{align*}
the consistency assumptions say $\widehat B_n-B_n\to0$ and $\widehat D_n-D_n\to0$ in probability. Let $c>0$ be a lower bound for $D_n$ for all large $n$. Then $\widehat D_n\ge c/2$ on the event $\{|\widehat D_n-D_n|\le c/2\}$, whose probability tends to $1$. Hence the ratio is stable:
\begin{align*}
\frac{\hat{\beta}_n^2}{\hat{\delta}_n^2}
-
\frac{\beta_n^2}{\delta_n^2}
=
\frac{\widehat B_n}{\widehat D_n}
-
\frac{B_n}{D_n}
\xrightarrow{\mathbb{P}}0.
\end{align*}
The elementary bound behind this convergence is
\begin{align*}
\left|
\frac{\widehat B_n}{\widehat D_n}
-
\frac{B_n}{D_n}
\right|
\le
\frac{|\widehat B_n-B_n|}{\widehat D_n}
+
\frac{|B_n|\,|\widehat D_n-D_n|}{\widehat D_nD_n},
\end{align*}
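This bound is valid on the events, of probability tending to $1$, where $\widehat D_n\ge c/2$, with $c>0$ an eventual lower bound for $D_n$. On those events the right-hand side is at most $(2/c)|\widehat B_n-B_n|+(2K/c^2)|\widehat D_n-D_n|$, where $K$ is a uniform upper bound for $B_n$, and this converges to $0$ in probability.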
Finally, the clipping map
\begin{align*}
t \mapsto \min\{1,\max\{0,t\}\}
\end{align*}
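cannot enlarge distances: projecting two real numbers onto the closed interval $[0,1]$ never increases their absolute difference. The clipped values of $\hat{\beta}_n^2/\hat{\delta}_n^2$ and $\beta_n^2/\delta_n^2$ are $\hat{\alpha}_n$ and $\alpha_n^*$, so the convergence of the ratios gives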
\begin{align*}
\hat{\alpha}_n-\alpha_n^*
\xrightarrow{\mathbb{P}}0.
\end{align*}
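The two elementary facts used in this step, the ratio bound and the $1$-Lipschitz property of the clipping map, can be spot-checked on random inputs. In the sketch below the names and sampling ranges are hypothetical. The denominators stay at least $0.1$, which mirrors the requirement that $\widehat D_n$ remain bounded away from $0$.

```python
import random

# Illustrative random spot-check; function names are hypothetical.

def clip01(t):
    """The clipping map t -> min{1, max{0, t}}."""
    return min(1.0, max(0.0, t))

def ratio_gap_and_bound(b_hat, d_hat, b, d):
    """Return |b_hat/d_hat - b/d| and |b_hat-b|/d_hat + |b||d_hat-d|/(d_hat*d)."""
    gap = abs(b_hat / d_hat - b / d)
    bound = abs(b_hat - b) / d_hat + abs(b) * abs(d_hat - d) / (d_hat * d)
    return gap, bound

rng = random.Random(2)
min_ratio_slack = min_clip_slack = float("inf")
for _ in range(10000):
    b, d = rng.uniform(0.0, 5.0), rng.uniform(0.5, 5.0)
    b_hat, d_hat = b + rng.uniform(-1.0, 1.0), d + rng.uniform(-0.4, 0.4)  # d_hat >= 0.1
    gap, bound = ratio_gap_and_bound(b_hat, d_hat, b, d)
    min_ratio_slack = min(min_ratio_slack, bound - gap)
    s, t = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
    min_clip_slack = min(min_clip_slack, abs(s - t) - abs(clip01(s) - clip01(t)))
print(min_ratio_slack, min_clip_slack)  # both are >= 0 up to floating-point rounding
```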
[/guided]
[/step]
[step:Convert convergence of shrinkage intensities into oracle estimator equivalence]
The difference between the data-driven estimator and the oracle estimator is
\begin{align*}
\hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*)
=
(\alpha_n^*-\hat{\alpha}_n)(S_n-\hat{\mu}_n I_{p_n}).
\end{align*}
Therefore
\begin{align*}
\frac{1}{p_n}
\left\|
\hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*)
\right\|_F^2
=
|\hat{\alpha}_n-\alpha_n^*|^2
\frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2.
\end{align*}
Because $\hat{\delta}_n^2=\|S_n-\hat{\mu}_n I_{p_n}\|_F^2$, the consistency law for $\hat{\delta}_n^2$ and the uniform bound on $p_n^{-1}\delta_n^2$ imply
\begin{align*}
\forall \varepsilon>0\ \exists M<\infty\ \exists N\in\mathbb N\ \forall n\ge N:\quad
\mathbb{P}\left(
\frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2>M
\right)<\varepsilon.
\end{align*}
Together with $\hat{\alpha}_n-\alpha_n^*\xrightarrow{\mathbb{P}}0$, this gives
\begin{align*}
\frac{1}{p_n}
\left\|
\hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*)
\right\|_F^2
\xrightarrow{\mathbb{P}}0.
\end{align*}
Since $\hat{\Sigma}_{LW,n}=\hat{\Sigma}_n(\hat{\alpha}_n)$, this shows that $\hat{\Sigma}_{LW,n}$ is asymptotically equivalent in normalized Frobenius loss to the oracle Ledoit-Wolf linear shrinkage estimator $\hat{\Sigma}_n(\alpha_n^*)$. This proves the stated oracle-equivalence form of the optimality claim.
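The exact identity behind this step is $p_n^{-1}\|\hat{\Sigma}_n(\hat{\alpha}_n)-\hat{\Sigma}_n(\alpha_n^*)\|_F^2=|\hat{\alpha}_n-\alpha_n^*|^2\,p_n^{-1}\|S_n-\hat{\mu}_nI_{p_n}\|_F^2$. It can be checked numerically with any symmetric matrix standing in for $S_n$. The short sketch below assumes $\hat{\mu}_n=\operatorname{tr}(S_n)/p_n$. The function names and the two intensity values are hypothetical and serve only as illustration.

```python
# Illustrative sketch; function names and intensity values are hypothetical.

def shrink(s, alpha):
    """alpha * mu_hat * I + (1 - alpha) * S_n with mu_hat = tr(S_n) / p (assumed)."""
    p = len(s)
    mu_hat = sum(s[i][i] for i in range(p)) / p
    return [[alpha * (mu_hat if i == j else 0.0) + (1 - alpha) * s[i][j]
             for j in range(p)] for i in range(p)]

def normalized_frob2(a):
    """p^{-1} ||A||_F^2."""
    return sum(v * v for row in a for v in row) / len(a)

s = [[2.0, 0.5, 0.1], [0.5, 1.0, -0.3], [0.1, -0.3, 3.0]]  # any symmetric matrix
alpha_hat, alpha_star = 0.31, 0.27
lhs = normalized_frob2([[u - v for u, v in zip(r1, r2)]
                        for r1, r2 in zip(shrink(s, alpha_hat), shrink(s, alpha_star))])
p = len(s)
mu_hat = sum(s[i][i] for i in range(p)) / p
rhs = (alpha_hat - alpha_star) ** 2 * normalized_frob2(
    [[s[i][j] - (mu_hat if i == j else 0.0) for j in range(p)] for i in range(p)])
print(lhs, rhs)
```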
[guided]
The estimator itself depends affinely on $\alpha$, since $\hat{\Sigma}_n(\alpha)=S_n-\alpha(S_n-\hat{\mu}_n I_{p_n})$. Hence the difference between using the estimated intensity $\hat{\alpha}_n$ and the oracle intensity $\alpha_n^*$ is exactly
\begin{align*}
\hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*)
=
(\alpha_n^*-\hat{\alpha}_n)(S_n-\hat{\mu}_n I_{p_n}).
\end{align*}
Taking normalized Frobenius norms gives
\begin{align*}
\frac{1}{p_n}
\left\|
\hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*)
\right\|_F^2
=
|\hat{\alpha}_n-\alpha_n^*|^2
\frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2.
\end{align*}
It remains to check that the matrix factor does not diverge in probability. Since $\hat{\delta}_n^2=\|S_n-\hat{\mu}_n I_{p_n}\|_F^2$, the consistency law for $\hat{\delta}_n^2$ and the uniform bound on $p_n^{-1}\delta_n^2$ imply that for every $\varepsilon>0$ there are $M<\infty$ and $N\in\mathbb N$ such that, for every $n\ge N$,
\begin{align*}
\mathbb{P}\left(
\frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2>M
\right)<\varepsilon.
\end{align*}
Since the previous step proved
\begin{align*}
\hat{\alpha}_n-\alpha_n^*
\xrightarrow{\mathbb{P}}0,
\end{align*}
the square $|\hat{\alpha}_n-\alpha_n^*|^2$ also converges to $0$ in probability. The product of a tight sequence and a sequence converging to $0$ in probability converges to $0$ in probability. Hence
\begin{align*}
\frac{1}{p_n}
\left\|
\hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*)
\right\|_F^2
\xrightarrow{\mathbb{P}}0.
\end{align*}
This is precisely the asserted asymptotic equivalence of
\begin{align*}
\hat{\Sigma}_{LW,n}
=
\hat{\alpha}_n\hat{\mu}_n I_{p_n}
+
(1-\hat{\alpha}_n)S_n
\end{align*}
to the oracle linear shrinkage estimator $\hat{\Sigma}_n(\alpha_n^*)$ in the stated class.
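For completeness, the tightness-times-vanishing step can be written out as a short sketch of the standard argument. We use the shorthand $\Delta_n:=\hat{\alpha}_n-\alpha_n^*$ and $\widehat D_n:=p_n^{-1}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2$, which equals $\hat{\delta}_n^2/p_n$; both are introduced only here. Fix $\eta>0$ and $\varepsilon>0$, and take $M$ and $N$ as in the tightness bound above, with $M>0$ without loss of generality. For $n\ge N$, the event $\{\Delta_n^2\widehat D_n>\eta\}$ is contained in $\{\widehat D_n>M\}\cup\{\Delta_n^2>\eta/M\}$, so
\begin{align*}
\mathbb{P}\bigl(\Delta_n^2\widehat D_n>\eta\bigr)
\le
\mathbb{P}\bigl(\widehat D_n>M\bigr)
+
\mathbb{P}\bigl(|\Delta_n|>\sqrt{\eta/M}\bigr)
<
\varepsilon
+
\mathbb{P}\bigl(|\Delta_n|>\sqrt{\eta/M}\bigr).
\end{align*}
The last probability tends to $0$, so $\limsup_{n\to\infty}\mathbb{P}(\Delta_n^2\widehat D_n>\eta)\le\varepsilon$. Since $\varepsilon$ is arbitrary, $\Delta_n^2\widehat D_n\xrightarrow{\mathbb{P}}0$.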
[/guided]
[/step]