Ledoit-Wolf Linear Shrinkage Optimality Theorem — Statement & Proof

Theorem

Let $(p_n)_{n \ge 1}$ be a sequence of positive integers such that $p_n/n \to c \in (0,\infty)$. For each $n$, let $X_{n,1},\dots,X_{n,n}$ be independent $\mathbb{R}^{p_n}$-valued random vectors with \begin{align*} \mathbb{E}[X_{n,k}] = 0, \qquad \mathbb{E}[X_{n,k}X_{n,k}^{\top}] = \Sigma_n, \qquad 1 \le k \le n, \end{align*} where $\Sigma_n \in \mathbb{R}^{p_n \times p_n}$ is symmetric positive semidefinite. Let $I_{p_n}$ denote the $p_n \times p_n$ identity matrix and define \begin{align*} S_n &= \frac{1}{n}\sum_{k=1}^{n} X_{n,k}X_{n,k}^{\top},\\ \mu_n &= \frac{1}{p_n}\operatorname{tr}(\Sigma_n),\\ \hat{\mu}_n &= \frac{1}{p_n}\operatorname{tr}(S_n). \end{align*} Assume the Ledoit-Wolf high-dimensional moment conditions hold: the normalized traces $p_n^{-1}\operatorname{tr}(\Sigma_n)$ and $p_n^{-1}\operatorname{tr}(\Sigma_n^2)$ are uniformly bounded, the entries after whitening have uniformly bounded eighth moments, and the quadratic covariance functionals below are consistently estimable on the normalized Frobenius scale. Define \begin{align*} \beta_n^2 &= \mathbb{E}\|S_n-\Sigma_n\|_F^2,\\ \delta_n^2 &= \mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2, \end{align*} and assume \begin{align*} \liminf_{n\to\infty} p_n^{-1}\delta_n^2 > 0, \qquad \sup_{n\ge1}p_n^{-1}\beta_n^2<\infty, \qquad \sup_{n\ge1}p_n^{-1}\delta_n^2<\infty. \end{align*} Let \begin{align*} \hat{\beta}_n^2 &= \frac{1}{n^2}\sum_{k=1}^{n} \|X_{n,k}X_{n,k}^{\top}-S_n\|_F^2,\\ \hat{\delta}_n^2 &= \|S_n-\hat{\mu}_n I_{p_n}\|_F^2, \end{align*} and assume the corresponding consistency laws \begin{align*} \frac{1}{p_n}\left|\hat{\beta}_n^2-\beta_n^2\right| \xrightarrow{\mathbb{P}}0, \qquad \frac{1}{p_n}\left|\hat{\delta}_n^2-\delta_n^2\right| \xrightarrow{\mathbb{P}}0. \end{align*} Set \begin{align*} \hat{\alpha}_n = \min\left\{1,\max\left\{0,\frac{\hat{\beta}_n^2}{\hat{\delta}_n^2}\right\}\right\}. 
\end{align*} Then the Ledoit-Wolf linear shrinkage estimator \begin{align*} \hat{\Sigma}_{LW,n} = \hat{\alpha}_n\hat{\mu}_n I_{p_n} + (1-\hat{\alpha}_n)S_n \end{align*} is asymptotically equivalent, in normalized Frobenius loss, to the Ledoit-Wolf oracle member of the linear shrinkage family \begin{align*} \hat{\Sigma}_{n}(\alpha) = \alpha\hat{\mu}_n I_{p_n} + (1-\alpha)S_n, \qquad 0 \le \alpha \le 1. \end{align*} More precisely, if \begin{align*} \alpha_n^* = \min\left\{1,\max\left\{0,\frac{\beta_n^2}{\delta_n^2}\right\}\right\}, \end{align*} then \begin{align*} \hat{\alpha}_n-\alpha_n^* \xrightarrow{\mathbb{P}}0 \end{align*} and \begin{align*} \frac{1}{p_n}\left\| \hat{\Sigma}_{n}(\hat{\alpha}_n) - \hat{\Sigma}_{n}(\alpha_n^*) \right\|_F^2 \xrightarrow{\mathbb{P}} 0. \end{align*} The deterministic value $\alpha_n^*$ is the Ledoit-Wolf oracle intensity for the corresponding population-target quadratic Frobenius risk; the conclusion here is oracle equivalence of the data-driven estimator on the normalized Frobenius scale.
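
The quantities in the statement can be computed directly from a sample. The following is a minimal pure-Python sketch, not a reference implementation: the function name `ledoit_wolf_shrinkage` and the helpers `outer` and `frob_sq` are hypothetical, the samples are assumed to be mean-zero as in the theorem, and returning full shrinkage when $\hat{\delta}_n^2 = 0$ is an assumption made here because the statement does not cover that case.

```python
import random


def outer(x):
    """Return the rank-one matrix x x^T as a list of rows."""
    return [[xi * xj for xj in x] for xi in x]


def frob_sq(a, b):
    """Squared Frobenius norm of the difference a - b."""
    return sum((aij - bij) ** 2
               for ra, rb in zip(a, b)
               for aij, bij in zip(ra, rb))


def ledoit_wolf_shrinkage(samples):
    """Return (alpha_hat, mu_hat, sigma_lw) for mean-zero samples.

    `samples` is a list of n vectors of length p (hypothetical input layout).
    """
    n, p = len(samples), len(samples[0])
    outers = [outer(x) for x in samples]
    # S_n = (1/n) * sum_k X_k X_k^T
    s = [[sum(o[i][j] for o in outers) / n for j in range(p)]
         for i in range(p)]
    # mu_hat = tr(S_n) / p, and the scalar target mu_hat * I
    mu_hat = sum(s[i][i] for i in range(p)) / p
    target = [[mu_hat if i == j else 0.0 for j in range(p)]
              for i in range(p)]
    # beta_hat^2 = (1/n^2) * sum_k ||X_k X_k^T - S_n||_F^2
    beta_hat_sq = sum(frob_sq(o, s) for o in outers) / n ** 2
    # delta_hat^2 = ||S_n - mu_hat I||_F^2
    delta_hat_sq = frob_sq(s, target)
    # Assumption: if S_n already equals the target, use full shrinkage.
    ratio = beta_hat_sq / delta_hat_sq if delta_hat_sq > 0 else 1.0
    alpha_hat = min(1.0, max(0.0, ratio))
    sigma_lw = [[alpha_hat * target[i][j] + (1 - alpha_hat) * s[i][j]
                 for j in range(p)] for i in range(p)]
    return alpha_hat, mu_hat, sigma_lw


# Usage on synthetic standard-normal data (n = 20, p = 5).
random.seed(0)
samples = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
alpha_hat, mu_hat, sigma_lw = ledoit_wolf_shrinkage(samples)
```

Both endpoints of the shrinkage segment have trace $p_n\hat{\mu}_n$, so the returned matrix has the same trace as $S_n$ for every intensity; this gives a quick sanity check on any implementation.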

Proof

[proofplan] The proof first identifies the Ledoit-Wolf oracle shrinkage intensity for the population-target risk by expanding the expected Frobenius risk as a quadratic polynomial in the shrinkage parameter. The minimizer depends on two population quantities: the fluctuation size $\beta_n^2$ of the sample covariance and the total distance $\delta_n^2$ from the shrinkage target. The Ledoit-Wolf estimator replaces these quantities by consistent sample estimates, and the assumed high-dimensional laws of large numbers imply that the resulting data-dependent shrinkage intensity converges to the oracle one. The final step proves asymptotic equivalence of the data-driven shrinkage estimator and the oracle shrinkage estimator, avoiding the stronger and generally false claim that the deterministic oracle minimizes each realized sample loss. [/proofplan] [step:Expand the loss as a quadratic function of the shrinkage intensity] Fix $n \ge 1$. For $\alpha \in [0,1]$, define the linear shrinkage estimator \begin{align*} \hat{\Sigma}_n(\alpha) = \alpha\hat{\mu}_n I_{p_n} + (1-\alpha)S_n. \end{align*} Define also the centered sample direction \begin{align*} A_n := S_n-\hat{\mu}_n I_{p_n} \end{align*} and the sample covariance error \begin{align*} E_n := S_n-\Sigma_n. \end{align*} Then \begin{align*} \hat{\Sigma}_n(\alpha)-\Sigma_n = E_n-\alpha A_n. \end{align*} Define the realized normalized Frobenius loss map $L_n:[0,1]\to[0,\infty)$ by \begin{align*} L_n(\alpha) &:= \frac{1}{p_n}\|E_n-\alpha A_n\|_F^2\\ &= \frac{1}{p_n}\|E_n\|_F^2 - \frac{2\alpha}{p_n}\operatorname{tr}(E_nA_n) + \frac{\alpha^2}{p_n}\|A_n\|_F^2. \end{align*} Thus, for the realized sample, $L_n$ is a convex quadratic polynomial in $\alpha$. [guided] We first isolate exactly how the shrinkage parameter enters the loss. The estimator is a point on the line segment between $S_n$ and the scalar matrix $\hat{\mu}_n I_{p_n}$. 
Writing \begin{align*} A_n := S_n-\hat{\mu}_n I_{p_n} \end{align*} means that increasing $\alpha$ moves the estimator away from $S_n$ in the direction $-A_n$. Since \begin{align*} \hat{\Sigma}_n(\alpha) = S_n-\alpha(S_n-\hat{\mu}_n I_{p_n}) = S_n-\alpha A_n, \end{align*} subtracting $\Sigma_n$ gives \begin{align*} \hat{\Sigma}_n(\alpha)-\Sigma_n = S_n-\Sigma_n-\alpha A_n. \end{align*} With \begin{align*} E_n := S_n-\Sigma_n, \end{align*} this becomes $E_n-\alpha A_n$. Expanding the Frobenius norm by the identity \begin{align*} \|B-\alpha C\|_F^2 = \|B\|_F^2 - 2\alpha\operatorname{tr}(BC) + \alpha^2\|C\|_F^2 \end{align*} for symmetric matrices $B,C \in \mathbb{R}^{p_n\times p_n}$ gives \begin{align*} L_n(\alpha) &= \frac{1}{p_n}\|E_n-\alpha A_n\|_F^2\\ &= \frac{1}{p_n}\|E_n\|_F^2 - \frac{2\alpha}{p_n}\operatorname{tr}(E_nA_n) + \frac{\alpha^2}{p_n}\|A_n\|_F^2. \end{align*} This is the key reduction: optimal shrinkage is now a one-dimensional quadratic minimization problem. [/guided] [/step] [step:Identify the deterministic oracle shrinkage intensity] Define the deterministic oracle quantities \begin{align*} \beta_n^2 &= \mathbb{E}\|S_n-\Sigma_n\|_F^2,\\ \delta_n^2 &= \mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2. \end{align*} Since $\mathbb{E}S_n=\Sigma_n$, we have \begin{align*} \mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr)=0. \end{align*} Therefore \begin{align*} \delta_n^2 &= \mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2\\ &= \mathbb{E}\|S_n-\Sigma_n\|_F^2 + \|\Sigma_n-\mu_n I_{p_n}\|_F^2\\ &= \beta_n^2+\|\Sigma_n-\mu_n I_{p_n}\|_F^2. \end{align*} Now define the population-target shrinkage family \begin{align*} \widetilde{\Sigma}_n(\alpha) = \alpha\mu_n I_{p_n}+(1-\alpha)S_n, \qquad 0\le \alpha\le1, \end{align*} and define its expected normalized Frobenius risk $R_n:[0,1]\to[0,\infty)$ by \begin{align*} R_n(\alpha) = \frac{1}{p_n}\mathbb{E}\left\|\widetilde{\Sigma}_n(\alpha)-\Sigma_n\right\|_F^2. 
\end{align*} Since \begin{align*} \widetilde{\Sigma}_n(\alpha)-\Sigma_n = (S_n-\Sigma_n)-\alpha(S_n-\mu_n I_{p_n}), \end{align*} we obtain \begin{align*} R_n(\alpha) &= \frac{1}{p_n}\left(\beta_n^2-2\alpha\beta_n^2+\alpha^2\delta_n^2\right). \end{align*} This quadratic has unconstrained minimizer $\beta_n^2/\delta_n^2$. Therefore the constrained minimizer on $[0,1]$ is \begin{align*} \alpha_n^* = \min\left\{1,\max\left\{0,\frac{\beta_n^2}{\delta_n^2}\right\}\right\}. \end{align*} Because $\beta_n^2 \ge 0$ and $\delta_n^2>0$ for all sufficiently large $n$, this is the Ledoit-Wolf oracle intensity for the population-target risk $R_n$. [guided] The oracle parameter is the value that would be used if the population quantities were known. We define \begin{align*} \beta_n^2 = \mathbb{E}\|S_n-\Sigma_n\|_F^2 \end{align*} as the total sampling fluctuation of the sample covariance matrix, and \begin{align*} \delta_n^2 = \mathbb{E}\|S_n-\mu_n I_{p_n}\|_F^2 \end{align*} as the total expected squared distance from the scalar shrinkage target. The cross-term between the sampling error and the deterministic population deviation vanishes. Indeed, since \begin{align*} S_n=\frac{1}{n}\sum_{k=1}^{n}X_{n,k}X_{n,k}^{\top} \end{align*} and $\mathbb{E}[X_{n,k}X_{n,k}^{\top}]=\Sigma_n$, linearity of expectation gives $\mathbb{E}S_n=\Sigma_n$. Hence \begin{align*} \mathbb{E}\operatorname{tr}\bigl((S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr) = \operatorname{tr}\bigl((\mathbb{E}S_n-\Sigma_n)(\Sigma_n-\mu_n I_{p_n})\bigr) = 0. \end{align*} Expanding $S_n-\mu_n I_{p_n}$ as \begin{align*} S_n-\mu_n I_{p_n} = (S_n-\Sigma_n)+(\Sigma_n-\mu_n I_{p_n}) \end{align*} therefore yields \begin{align*} \delta_n^2 = \beta_n^2+\|\Sigma_n-\mu_n I_{p_n}\|_F^2. 
\end{align*} For the population-target estimator \begin{align*} \widetilde{\Sigma}_n(\alpha) = \alpha\mu_n I_{p_n}+(1-\alpha)S_n, \end{align*} the expected normalized Frobenius risk is \begin{align*} R_n(\alpha) = \frac{1}{p_n}\mathbb{E}\left\|\widetilde{\Sigma}_n(\alpha)-\Sigma_n\right\|_F^2. \end{align*} The identity \begin{align*} \widetilde{\Sigma}_n(\alpha)-\Sigma_n = (S_n-\Sigma_n)-\alpha(S_n-\mu_n I_{p_n}) \end{align*} gives \begin{align*} R_n(\alpha) = \frac{1}{p_n}\left(\beta_n^2-2\alpha\beta_n^2+\alpha^2\delta_n^2\right). \end{align*} Thus the fraction $\beta_n^2/\delta_n^2$ is the unconstrained minimizer of this quadratic risk. Since the shrinkage family restricts $\alpha$ to $[0,1]$, the oracle intensity is the clipped value \begin{align*} \alpha_n^* = \min\left\{1,\max\left\{0,\frac{\beta_n^2}{\delta_n^2}\right\}\right\}. \end{align*} [/guided] [/step] [step:Use the Ledoit-Wolf laws of large numbers to estimate the oracle ratio] By the assumed Ledoit-Wolf high-dimensional moment conditions, \begin{align*} \frac{1}{p_n}\left|\hat{\beta}_n^2-\beta_n^2\right| \xrightarrow{\mathbb{P}}0, \qquad \frac{1}{p_n}\left|\hat{\delta}_n^2-\delta_n^2\right| \xrightarrow{\mathbb{P}}0. \end{align*} Set \begin{align*} B_n=\frac{\beta_n^2}{p_n}, \qquad D_n=\frac{\delta_n^2}{p_n}, \qquad \widehat B_n=\frac{\hat{\beta}_n^2}{p_n}, \qquad \widehat D_n=\frac{\hat{\delta}_n^2}{p_n}. \end{align*} By assumption, $\widehat B_n-B_n\xrightarrow{\mathbb P}0$ and $\widehat D_n-D_n\xrightarrow{\mathbb P}0$. Also $D_n$ is bounded below away from $0$ eventually, while $B_n$ and $D_n$ are bounded above. Hence $\widehat D_n$ is bounded away from $0$ with probability tending to $1$, and the continuous-mapping theorem gives \begin{align*} \frac{\hat{\beta}_n^2}{\hat{\delta}_n^2} = \frac{\widehat B_n}{\widehat D_n} \xrightarrow{\mathbb P} \frac{B_n}{D_n} = \frac{\beta_n^2}{\delta_n^2}. 
\end{align*} Because the target $B_n/D_n$ varies with $n$, this convergence is to be read as convergence of the difference to $0$. Indeed, on events where $\widehat D_n$ is bounded below by a fixed positive constant, \begin{align*} \left| \frac{\widehat B_n}{\widehat D_n} - \frac{B_n}{D_n} \right| \le \frac{|\widehat B_n-B_n|}{\widehat D_n} + \frac{|B_n|\,|\widehat D_n-D_n|}{\widehat D_nD_n}, \end{align*} and the right-hand side converges to $0$ in probability by the boundedness and lower-bound assumptions. Hence \begin{align*} \frac{\hat{\beta}_n^2}{\hat{\delta}_n^2} - \frac{\beta_n^2}{\delta_n^2} \xrightarrow{\mathbb{P}}0. \end{align*} The clipping map $t\mapsto \min\{1,\max\{0,t\}\}$ is Lipschitz with constant $1$, so \begin{align*} \hat{\alpha}_n-\alpha_n^* \xrightarrow{\mathbb{P}}0. \end{align*} [guided] The two sample quantities $\hat{\beta}_n^2$ and $\hat{\delta}_n^2$ are designed to estimate the two population quantities entering the oracle ratio. The Ledoit-Wolf moment assumptions give the normalized consistency statements \begin{align*} \frac{1}{p_n}\left|\hat{\beta}_n^2-\beta_n^2\right| \xrightarrow{\mathbb{P}}0 \end{align*} and \begin{align*} \frac{1}{p_n}\left|\hat{\delta}_n^2-\delta_n^2\right| \xrightarrow{\mathbb{P}}0. \end{align*} The normalization by $p_n$ is the correct scale because Frobenius losses for $p_n\times p_n$ covariance matrices grow linearly in $p_n$ under the bounded trace assumptions. The denominator is not allowed to degenerate: the hypothesis \begin{align*} \liminf_{n\to\infty}p_n^{-1}\delta_n^2>0 \end{align*} says that $\delta_n^2$ remains of order at least $p_n$. The hypotheses also give uniform upper bounds for $p_n^{-1}\beta_n^2$ and $p_n^{-1}\delta_n^2$. With \begin{align*} B_n=\frac{\beta_n^2}{p_n}, \quad D_n=\frac{\delta_n^2}{p_n}, \quad \widehat B_n=\frac{\hat{\beta}_n^2}{p_n}, \quad \widehat D_n=\frac{\hat{\delta}_n^2}{p_n}, \end{align*} the consistency assumptions say $\widehat B_n-B_n\to0$ and $\widehat D_n-D_n\to0$ in probability. 
Since $D_n$ stays bounded away from $0$, $\widehat D_n$ also stays bounded away from $0$ with probability tending to $1$. Hence the ratio is stable: \begin{align*} \frac{\hat{\beta}_n^2}{\hat{\delta}_n^2} - \frac{\beta_n^2}{\delta_n^2} = \frac{\widehat B_n}{\widehat D_n} - \frac{B_n}{D_n} \xrightarrow{\mathbb{P}}0. \end{align*} The elementary bound behind this convergence is \begin{align*} \left| \frac{\widehat B_n}{\widehat D_n} - \frac{B_n}{D_n} \right| \le \frac{|\widehat B_n-B_n|}{\widehat D_n} + \frac{|B_n|\,|\widehat D_n-D_n|}{\widehat D_nD_n}, \end{align*} valid on the high-probability events where $\widehat D_n$ is bounded below. Finally, the clipping map \begin{align*} t \mapsto \min\{1,\max\{0,t\}\} \end{align*} cannot enlarge distances, because projecting two real numbers onto the closed interval $[0,1]$ never increases their absolute difference. Thus \begin{align*} \hat{\alpha}_n-\alpha_n^* \xrightarrow{\mathbb{P}}0. \end{align*} [/guided] [/step] [step:Convert convergence of shrinkage intensities into oracle estimator equivalence] The difference between the data-driven estimator and the oracle estimator is \begin{align*} \hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*) = (\alpha_n^*-\hat{\alpha}_n)(S_n-\hat{\mu}_n I_{p_n}). \end{align*} Therefore \begin{align*} \frac{1}{p_n} \left\| \hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*) \right\|_F^2 = |\hat{\alpha}_n-\alpha_n^*|^2 \frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2. \end{align*} Because $\hat{\delta}_n^2=\|S_n-\hat{\mu}_n I_{p_n}\|_F^2$, the consistency law for $\hat{\delta}_n^2$ and the uniform bound on $p_n^{-1}\delta_n^2$ imply \begin{align*} \forall \varepsilon>0\ \exists M<\infty\ \exists N\in\mathbb N\ \forall n\ge N:\quad \mathbb{P}\left( \frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2>M \right)<\varepsilon. 
\end{align*} Together with $\hat{\alpha}_n-\alpha_n^*\xrightarrow{\mathbb{P}}0$, this gives \begin{align*} \frac{1}{p_n} \left\| \hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*) \right\|_F^2 \xrightarrow{\mathbb{P}}0. \end{align*} Therefore $\hat{\Sigma}_{LW,n}$ is asymptotically equivalent in normalized Frobenius loss to the oracle Ledoit-Wolf linear shrinkage estimator. This proves the stated oracle-equivalence form of the optimality claim. [guided] The estimator itself depends linearly on $\alpha$. Hence the difference between using the estimated intensity $\hat{\alpha}_n$ and the oracle intensity $\alpha_n^*$ is exactly \begin{align*} \hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*) = (\alpha_n^*-\hat{\alpha}_n)(S_n-\hat{\mu}_n I_{p_n}). \end{align*} Taking normalized Frobenius norms gives \begin{align*} \frac{1}{p_n} \left\| \hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*) \right\|_F^2 = |\hat{\alpha}_n-\alpha_n^*|^2 \frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2. \end{align*} It remains to check that the matrix factor does not diverge in probability. Since $\hat{\delta}_n^2=\|S_n-\hat{\mu}_n I_{p_n}\|_F^2$, the consistency law for $\hat{\delta}_n^2$ and the uniform bound on $p_n^{-1}\delta_n^2$ imply that for every $\varepsilon>0$ there are $M<\infty$ and $N\in\mathbb N$ such that, for every $n\ge N$, \begin{align*} \mathbb{P}\left( \frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2>M \right)<\varepsilon. \end{align*} Since the previous step proved \begin{align*} \hat{\alpha}_n-\alpha_n^* \xrightarrow{\mathbb{P}}0, \end{align*} the product of a tight sequence and a sequence converging to $0$ in probability also converges to $0$ in probability. Hence \begin{align*} \frac{1}{p_n} \left\| \hat{\Sigma}_{n}(\hat{\alpha}_n)-\hat{\Sigma}_{n}(\alpha_n^*) \right\|_F^2 \xrightarrow{\mathbb{P}}0. 
\end{align*} This is precisely the asserted asymptotic equivalence of \begin{align*} \hat{\Sigma}_{LW,n} = \hat{\alpha}_n\hat{\mu}_n I_{p_n} + (1-\hat{\alpha}_n)S_n \end{align*} to the oracle linear shrinkage estimator in the stated class. [/guided] [/step]
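
Two algebraic facts used in the proof can be checked numerically: the exact identity $\frac{1}{p_n}\|\hat{\Sigma}_n(a)-\hat{\Sigma}_n(b)\|_F^2 = |a-b|^2\,\frac{1}{p_n}\|S_n-\hat{\mu}_n I_{p_n}\|_F^2$ from the final step, and the fact that the clipped ratio minimizes the quadratic risk on $[0,1]$. The sketch below is illustrative only: the matrix `s`, the intensities `a_hat` and `a_star`, and the values of `beta_sq` and `delta_sq` are hypothetical numbers chosen for the check, and the helper names are not taken from any library.

```python
def shrink(s, mu_hat, alpha):
    """Linear shrinkage alpha * mu_hat * I + (1 - alpha) * s."""
    p = len(s)
    return [[alpha * (mu_hat if i == j else 0.0) + (1 - alpha) * s[i][j]
             for j in range(p)] for i in range(p)]


def frob_sq_diff(a, b):
    """Squared Frobenius norm of a - b."""
    return sum((x - y) ** 2
               for ra, rb in zip(a, b)
               for x, y in zip(ra, rb))


def risk(alpha, beta_sq, delta_sq):
    """Quadratic risk beta^2 - 2*alpha*beta^2 + alpha^2*delta^2 (factor 1/p omitted)."""
    return beta_sq - 2 * alpha * beta_sq + alpha ** 2 * delta_sq


# Hypothetical symmetric "sample covariance" with p = 3.
s = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 3.0]]
p = len(s)
mu_hat = sum(s[i][i] for i in range(p)) / p

# Identity from the final step, for two hypothetical intensities.
a_hat, a_star = 0.35, 0.25
lhs = frob_sq_diff(shrink(s, mu_hat, a_hat), shrink(s, mu_hat, a_star)) / p
rhs = (a_hat - a_star) ** 2 * frob_sq_diff(s, shrink(s, mu_hat, 1.0)) / p

# Clipped ratio versus a grid on [0, 1], for hypothetical beta^2 and delta^2.
beta_sq, delta_sq = 1.2, 3.0
alpha_star = min(1.0, max(0.0, beta_sq / delta_sq))
grid = [k / 100 for k in range(101)]
is_minimizer = all(
    risk(alpha_star, beta_sq, delta_sq) <= risk(g, beta_sq, delta_sq) + 1e-12
    for g in grid
)
```

The identity holds for any pair of intensities because the estimator is affine in $\alpha$; only the consistency of $\hat{\alpha}_n$ and the tightness of $p_n^{-1}\hat{\delta}_n^2$ are probabilistic.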

Ledoit-Wolf Linear Shrinkage Optimality Theorem (Theorem # 4072)