[proofplan]
We verify each displayed rate by invoking the precise course minimax or recovery result for the relevant Gaussian model, and in each case we state the form used before matching the theorem's hypotheses. The lower bounds are the testing, packing, or stable-embedding halves of those results, while the upper bounds are realised by the corresponding standard estimators: thresholding for sparse sequence estimation, restricted-eigenvalue estimators for sparse regression, the sample covariance for covariance estimation, principal eigenspace estimators for spiked covariance, and restricted-isometry recovery for compressed sensing. In the spiked covariance case the sharp minimax theorem is invoked with $\lambda$ restricted to fixed signal-strength bounds, so the covariance-scale factor is absorbed into constants and the displayed dependence is $d/(n\lambda^2)$. The truncations by $d$, $1$, or the maximal sine loss are obtained by comparing the local statistical rate with the bounded diameter of the parameter class under the stated loss.
[/proofplan]
custom_env
admin
[step:Apply the sparse Gaussian sequence minimax theorem]For the Gaussian sequence model, define the parameter class
\begin{align*}
\Theta_k := \{\theta \in \mathbb{R}^d : \|\theta\|_0 \leq k\}
\end{align*}
and let an estimator be any measurable map
\begin{align*}
\hat{\theta}: \mathbb{R}^d &\to \mathbb{R}^d.
\end{align*}
The stated model is $Y = \theta + \sigma Z$, where $\sigma>0$ is the noise standard deviation, $Z \sim \mathcal{N}(0,I_d)$, $1 \leq k \leq d$, and the loss is squared Euclidean loss $|\hat{\theta}(Y)-\theta|^2$. We use the [Sparse Normal Means Minimax Rate](/page/Sparse%20Normal%20Means%20Minimax%20Rate), in the following precise form: for $Y=\theta+\sigma Z$ with $Z\sim\mathcal N(0,I_d)$ and $1\leq k\leq d$, there are universal constants $0<c_1\leq c_2<\infty$ such that
\begin{align*}
c_1\sigma^2 k\log\left(\frac{ed}{k}\right)
\leq
\inf_{\hat{\theta}}\sup_{\theta\in\Theta_k}\mathbb{E}_\theta[|\hat{\theta}(Y)-\theta|^2]
\leq
c_2\sigma^2 k\log\left(\frac{ed}{k}\right).
\end{align*}
The theorem's hypotheses are exactly verified by the displayed model, the measurable estimator class above, the sparsity range $1\leq k\leq d$, and squared Euclidean loss.
This is precisely the asserted equivalence.[/step]
custom_env
admin
[guided]The first statement is the standard sparse Gaussian sequence problem. We must match the theorem's inputs carefully: the observation space is $\mathbb{R}^d$, the estimator is a measurable map $\hat{\theta}:\mathbb{R}^d\to\mathbb{R}^d$, the parameter set is $\Theta_k=\{\theta\in\mathbb{R}^d:\|\theta\|_0\leq k\}$, and the loss is $|\hat{\theta}(Y)-\theta|^2$. The condition $1\leq k\leq d$ is exactly the admissible sparsity range.
The precise result used here is the [Sparse Normal Means Minimax Rate](/page/Sparse%20Normal%20Means%20Minimax%20Rate). It states that for this Gaussian shift experiment, this sparsity class, and this squared Euclidean loss, there are universal constants $0<c_1\leq c_2<\infty$ satisfying
\begin{align*}
c_1\sigma^2 k\log\left(\frac{ed}{k}\right)
\leq
\inf_{\hat{\theta}}\sup_{\theta\in\Theta_k}\mathbb{E}_\theta[|\hat{\theta}(Y)-\theta|^2]
\leq
c_2\sigma^2 k\log\left(\frac{ed}{k}\right).
\end{align*}
Its assumptions are not additional assumptions: the theorem requires Gaussian noise with covariance $\sigma^2I_d$, the class $\Theta_k$, $1\leq k\leq d$, and squared Euclidean loss, all of which have just been identified.
The lower bound is the packing/testing half of the theorem, and the upper bound is attained by a thresholding estimator at the usual noise level. Since the constants do not depend on $d$ or $k$, this is exactly the meaning of
\begin{align*}
\inf_{\hat{\theta}}\sup_{\theta\in\Theta_k}\mathbb{E}_\theta[|\hat{\theta}-\theta|^2]
\asymp \sigma^2 k\log\left(\frac{ed}{k}\right).
\end{align*}[/guided]
custom_env
admin
[step:Apply the sparse regression theorem under random or restricted-eigenvalue design]Let an estimator in the regression model be a measurable map
\begin{align*}
\hat{\beta}: \mathbb{R}^n \times \mathbb{R}^{n\times d} &\to \mathbb{R}^d.
\end{align*}
In the random-design formulation, the rows of $X$ are independent Gaussian random vectors with population covariance comparable to $I_d$, and the risk is taken over both $X$ and $w$. We use the [Sparse Linear Regression Minimax Rate under Restricted Eigenvalues](/page/Sparse%20Linear%20Regression%20Minimax%20Rate%20under%20Restricted%20Eigenvalues). In the random-design form, it states that if $1\leq k\leq d$, $n\geq C_0 k\log(ed/k)$, and the Gaussian design covariance has eigenvalues in $[a,b]$ for fixed constants $0<a\leq b<\infty$, then
\begin{align*}
\inf_{\hat{\beta}}\sup_{\|\beta\|_0\leq k}\mathbb{E}_\beta[|\hat{\beta}(Y,X)-\beta|^2]
\asymp
\frac{\sigma^2 k\log(ed/k)}{n},
\end{align*}
with constants depending only on $a$, $b$, and $C_0$. These are exactly the random-design assumptions stated here. In the fixed-design formulation, conditioning on a realised matrix $X$ whose columns have Euclidean norm comparable to $n^{1/2}$ and whose empirical Gram matrix $X^\top X/n$ satisfies restricted eigenvalue upper and lower bounds on sparsity levels comparable to $k$ verifies the fixed-design hypotheses of the same linked theorem; the constants then depend on those restricted-eigenvalue and normalisation constants. Therefore the conditional minimax risk over $w\sim\mathcal{N}(0,\sigma^2 I_n)$ has the same order. If these restricted-eigenvalue assumptions fail, Euclidean parameter loss can be non-identifiable along sparse directions in the kernel or near-kernel of $X$, which is why the statement separates Euclidean estimation from prediction loss.[/step]
custom_env
admin
[guided]The regression assertion has two versions because the design matrix controls whether Euclidean parameter error is statistically identifiable. In the random-design version, the estimator is a measurable map
\begin{align*}
\hat{\beta}: \mathbb{R}^n \times \mathbb{R}^{n\times d} &\to \mathbb{R}^d,
\end{align*}
and the risk averages over the pair $(X,w)$, where the rows of $X$ are independent Gaussian vectors and $w\sim\mathcal N(0,\sigma^2I_n)$. The linked [Sparse Linear Regression Minimax Rate under Restricted Eigenvalues](/page/Sparse%20Linear%20Regression%20Minimax%20Rate%20under%20Restricted%20Eigenvalues) requires $1\le k\le d$, a sample size lower bound $n\ge C_0k\log(ed/k)$, and population covariance eigenvalues in a fixed interval $[a,b]$ with $0<a\le b<\infty$. These are exactly the random-design hypotheses in the statement, so the theorem gives
\begin{align*}
\inf_{\hat{\beta}}\sup_{\|\beta\|_0\leq k}\mathbb{E}_\beta[|\hat{\beta}(Y,X)-\beta|^2]
\asymp
\frac{\sigma^2 k\log(ed/k)}{n},
\end{align*}
with constants depending only on $a$, $b$, and $C_0$.
For the fixed-design version, the relevant quantitative condition is that there are fixed constants $0<\kappa_-\le \kappa_+<\infty$ and a sparsity multiplier $s_0$ comparable to $k$ such that
\begin{align*}
\kappa_-|u|^2\leq u^\top\frac{X^\top X}{n}u\leq \kappa_+|u|^2
\end{align*}
for every $u\in\mathbb R^d$ with $\|u\|_0\le s_0$, together with column norms comparable to $n^{1/2}$. This is the restricted-eigenvalue input used by the same linked theorem. Under those deterministic conditions on the realised matrix $X$, the only remaining randomness is $w$, and the conditional minimax risk over the noise has the same order. If the lower restricted eigenvalue fails, a nonzero sparse vector can lie in, or nearly lie in, the kernel of $X$; then two different sparse parameters can produce identical or nearly identical means $X\beta$, so Euclidean parameter loss is not controlled by the observations even though prediction loss may be.[/guided]
custom_env
admin
[step:Apply covariance estimation bounds and truncate by the bounded-spectrum diameter]
Let an estimator of the covariance matrix be a measurable map
\begin{align*}
\hat{\Sigma}: (\mathbb{R}^d)^n &\to \mathbb{R}^{d\times d}.
\end{align*}
The parameter class consists of symmetric positive-definite matrices $\Sigma\in\mathbb{R}^{d\times d}$ whose eigenvalues lie in $[m,M]$, where $0<m<M<\infty$ are fixed. We use the [Gaussian Covariance Estimation Minimax Bounds](/page/Gaussian%20Covariance%20Estimation%20Minimax%20Bounds). In Frobenius loss, this result states that for i.i.d. samples from $\mathcal N(0,\Sigma)$ over the class $mI_d\preceq \Sigma\preceq MI_d$, the minimax risk is the local rate $d^2/n$ truncated by the squared Frobenius diameter of the class, with constants depending only on $m$ and $M$. The present samples, parameter class, and loss match those hypotheses. Since
\begin{align*}
\|MI_d-mI_d\|_F^2=(M-m)^2d,
\end{align*}
the squared Frobenius diameter is of order $d$, with constants depending only on $m$ and $M$. Hence
\begin{align*}
\inf_{\hat{\Sigma}}\sup_\Sigma\mathbb{E}_\Sigma[\|\hat{\Sigma}-\Sigma\|_F^2]
\asymp
\left(\frac{d^2}{n}\right)\wedge d.
\end{align*}
The operator-norm part of the [Gaussian Covariance Estimation Minimax Bounds](/page/Gaussian%20Covariance%20Estimation%20Minimax%20Bounds) applies to the same bounded-spectrum Gaussian class and states
\begin{align*}
\inf_{\hat{\Sigma}}\sup_\Sigma\mathbb{E}_\Sigma[\|\hat{\Sigma}-\Sigma\|_{\mathrm{op}}^2]
\asymp
\left(\sqrt{\frac{d}{n}}+\frac{d}{n}\right)^2\wedge 1,
\end{align*}
with constants depending only on $m$ and $M$, because the operator-norm diameter of the bounded-spectrum class is at most $M-m$.
[/step]
custom_env
admin
[step:Apply eigenspace perturbation and sparse PCA minimax rates]Let an estimator of the spike direction be a measurable map
\begin{align*}
\hat v: (\mathbb{R}^d)^n &\to \{u\in\mathbb{R}^d: |u|=1\}.
\end{align*}
For the rank-one spiked covariance model $\Sigma=I_d+\lambda vv^\top$, the unique leading eigenspace is $\operatorname{span}(v)$ and the population eigengap equals $\lambda$. Assume, as in the statement, that $\lambda\in[\lambda_{\min},\lambda_{\max}]$ for fixed constants $0<\lambda_{\min}\leq\lambda_{\max}<\infty$. We use the [Spiked Covariance Eigenspace Minimax Rate](/page/Spiked%20Covariance%20Eigenspace%20Minimax%20Rate), whose dense rank-one form states that, in the separated regime where the eigengap $\lambda$ is larger than the sample covariance spectral fluctuation scale by a fixed separation factor, there are constants depending only on $\lambda_{\min}$, $\lambda_{\max}$, and that separation factor such that
\begin{align*}
\inf_{\hat v}\sup_{|v|=1}\mathbb{E}[\sin^2\angle(\hat v,v)]
\asymp
\left(\frac{d}{n\lambda^2}\right)\wedge 1.
\end{align*}
This linked theorem supplies both halves of the minimax equivalence: the lower bound comes from a packing of the unit sphere in projective distance and the upper bound from the principal eigenspace estimator, with perturbation controlled through the [Davis-Kahan Sine Theorem](/page/Davis-Kahan%20Sine%20Theorem). The sample covariance fluctuation has covariance scale $1+\lambda$, but under the fixed bounds on $\lambda$ this factor is absorbed into the constants; it is not asserted as an additional lower-bound factor in the displayed rate. The truncation by $1$ is forced by the range of squared sine loss. Restricting $v$ to $k$-sparse unit vectors replaces the ambient metric entropy $d$ by the sparse support complexity $k\log(ed/k)$ in the information-theoretic packing bound and in the matching sparse estimator rate, with constants depending on the same fixed signal-strength and separation constants. Computationally constrained estimators are not asserted to attain this information-theoretic rate, so the final sentence about polynomial-time relaxations is consistent with, but not part of, the minimax equality.[/step]
custom_env
admin
[guided]The point of this step is to use a minimax eigenspace theorem, not merely a perturbation inequality. The parameter is the unit vector $v\in\mathbb R^d$ with $|v|=1$, but the loss depends only on the one-dimensional subspace $\operatorname{span}(v)$ through $\sin^2\angle(\hat v,v)$. The estimator is a measurable map
\begin{align*}
\hat v: (\mathbb{R}^d)^n &\to \{u\in\mathbb{R}^d: |u|=1\}.
\end{align*}
For $\Sigma=I_d+\lambda vv^\top$, the leading eigenvalue is $1+\lambda$, the remaining eigenvalues are $1$, and the eigengap is therefore $\lambda$.
We apply the [Spiked Covariance Eigenspace Minimax Rate](/page/Spiked%20Covariance%20Eigenspace%20Minimax%20Rate). Its hypotheses require fixed signal-strength bounds $\lambda\in[\lambda_{\min},\lambda_{\max}]$ with $0<\lambda_{\min}\le\lambda_{\max}<\infty$ and a separated regime in which the eigengap is larger than the sample covariance fluctuation scale by a fixed factor. These are precisely the assumptions stated here. The theorem gives constants depending only on $\lambda_{\min}$, $\lambda_{\max}$, and the separation factor such that
\begin{align*}
\inf_{\hat v}\sup_{|v|=1}\mathbb{E}[\sin^2\angle(\hat v,v)]
\asymp
\left(\frac{d}{n\lambda^2}\right)\wedge 1.
\end{align*}
The lower bound is part of the minimax theorem and comes from a packing of one-dimensional subspaces; the upper bound is achieved by estimating the leading eigenspace of the sample covariance and using the [Davis-Kahan Sine Theorem](/page/Davis-Kahan%20Sine%20Theorem) to convert operator perturbation into sine-angle error. The covariance scale $1+\lambda$ appears in the perturbation analysis, but because $\lambda$ is restricted to fixed bounds, $1+\lambda$ is controlled by constants depending only on $\lambda_{\max}$ and is absorbed into the equivalence constants. This is why the displayed rate keeps the explicit dependence on the eigengap as $1/\lambda^2$.
For sparse PCA, the same minimax mechanism is applied to the smaller parameter class of $k$-sparse unit vectors. The metric entropy of that class is governed by support selection and angular packing on each support, producing the effective complexity $k\log(ed/k)$ in place of $d$. The squared sine loss is always between $0$ and $1$, so every local rate is truncated by $1$.[/guided]
custom_env
admin
[step:Apply Gaussian restricted-isometry bounds for compressed sensing]For compressed sensing, let $A\in\mathbb{R}^{n\times d}$ have independent entries $A_{ij}\sim\mathcal N(0,1/n)$, let the signal class be
\begin{align*}
\Theta_k:=\{\theta\in\mathbb{R}^d:\|\theta\|_0\leq k\},
\end{align*}
and let the noise vector be an arbitrary vector $w\in\mathbb{R}^n$. For a fixed restricted-isometry level $\delta\in(0,1)$, the restricted isometry property on $2k$-sparse vectors means that
\begin{align*}
(1-\delta)|u|^2\leq |Au|^2\leq (1+\delta)|u|^2
\end{align*}
for every $u\in\mathbb R^d$ with $\|u\|_0\leq 2k$. We use the [Gaussian Restricted Isometry Theorem](/page/Gaussian%20Restricted%20Isometry%20Theorem), which states that for each fixed restricted-isometry level $\delta\in(0,1)$ there are constants $c(\delta),C(\delta)>0$ such that $A$ satisfies this property with probability at least $1-2e^{-c(\delta)n}$ whenever
\begin{align*}
n\geq C(\delta)k\log\left(\frac{ed}{k}\right).
\end{align*}
On this event, the [RIP Stable Recovery Theorem for Basis Pursuit Denoising](/page/RIP%20Stable%20Recovery%20Theorem%20for%20Basis%20Pursuit%20Denoising) gives a reconstruction map $\Delta:\mathbb{R}^n\to\mathbb{R}^d$ and a constant $C_{\mathrm{rec}}=C_{\mathrm{rec}}(\delta)>0$ such that
\begin{align*}
|\Delta(A\theta+w)-\theta|\leq C_{\mathrm{rec}}|w|
\end{align*}
for every $\theta\in\Theta_k$ and every $w\in\mathbb{R}^n$. This inequality is the definition of uniform stable recovery at stability level $C_{\mathrm{rec}}$ with optimal noise sensitivity under the isotropic normalisation $A_{ij}\sim\mathcal N(0,1/n)$. Conversely, the [Stable Embedding Lower Bound for Sparse Recovery](/page/Stable%20Embedding%20Lower%20Bound%20for%20Sparse%20Recovery) shows that any uniformly stable recovery scheme over all $k$-sparse vectors requires
\begin{align*}
n\gtrsim k\log\left(\frac{ed}{k}\right)
\end{align*}
measurements up to constants depending only on the required stability level. Combining the sufficient restricted-isometry bound with the necessary stable-embedding lower bound proves the compressed sensing assertion.[/step]
custom_env
admin
[guided]The compressed sensing assertion is about uniform recovery over the whole sparse class, not about a single fixed signal. We define
\begin{align*}
\Theta_k:=\{\theta\in\mathbb{R}^d:\|\theta\|_0\leq k\}
\end{align*}
and take $A\in\mathbb R^{n\times d}$ with independent entries $A_{ij}\sim\mathcal N(0,1/n)$. Fix a number $\delta\in(0,1)$. Saying that $A$ has the restricted isometry property on $2k$-sparse vectors at level $\delta$ means
\begin{align*}
(1-\delta)|u|^2\leq |Au|^2\leq (1+\delta)|u|^2
\end{align*}
for every $u\in\mathbb R^d$ with $\|u\|_0\leq 2k$.
The [Gaussian Restricted Isometry Theorem](/page/Gaussian%20Restricted%20Isometry%20Theorem) applies to this isotropically normalised Gaussian matrix. It gives constants $c(\delta),C(\delta)>0$ such that the restricted isometry property above holds with probability at least $1-2e^{-c(\delta)n}$ whenever
\begin{align*}
n\geq C(\delta)k\log\left(\frac{ed}{k}\right).
\end{align*}
On that event, the [RIP Stable Recovery Theorem for Basis Pursuit Denoising](/page/RIP%20Stable%20Recovery%20Theorem%20for%20Basis%20Pursuit%20Denoising) provides a reconstruction map $\Delta:\mathbb R^n\to\mathbb R^d$ and a constant $C_{\mathrm{rec}}=C_{\mathrm{rec}}(\delta)>0$ satisfying
\begin{align*}
|\Delta(A\theta+w)-\theta|\leq C_{\mathrm{rec}}|w|
\end{align*}
for every $\theta\in\Theta_k$ and every noise vector $w\in\mathbb R^n$. This is uniform because the same map $\Delta$ and the same constant work for all sparse signals simultaneously. It has optimal noise sensitivity because the reconstruction error is controlled linearly by the Euclidean noise size.
For necessity, the [Stable Embedding Lower Bound for Sparse Recovery](/page/Stable%20Embedding%20Lower%20Bound%20for%20Sparse%20Recovery) states that any recovery scheme with a fixed uniform stability level over all $k$-sparse vectors must use
\begin{align*}
n\gtrsim k\log\left(\frac{ed}{k}\right)
\end{align*}
measurements, with the implicit constant depending only on that stability level. Thus the Gaussian restricted-isometry upper bound and the stable-embedding lower bound match up to fixed constants.[/guided]
custom_env
admin
[step:Combine the five canonical inputs]
Each displayed assertion follows from the corresponding linked theorem after the model-specific hypotheses verified above. The constants are uniform in $n,d,k$ and depend only on the fixed quantities explicitly attached to those inputs: spectrum bounds $m,M$, Gaussian design covariance bounds $a,b$, the sample-size constant $C_0$, fixed-design restricted-eigenvalue constants $\kappa_-,\kappa_+$ and column-normalisation constants, spiked-covariance signal bounds $\lambda_{\min},\lambda_{\max}$ and the separation factor, and the chosen restricted-isometry level $\delta$, probability constants $c(\delta),C(\delta)$, and recovery stability constant $C_{\mathrm{rec}}(\delta)$. The expectations are taken over exactly the randomness specified in each model: noise only for fixed-design risks, joint design and noise for random-design risks, and the sample distribution for covariance and spiked covariance models. Therefore all five rates and recovery thresholds hold with the uniformity claimed in the corrected theorem statement.
[/step]