Androma — The Home of Mathematics on the Internet

[guided]The regression assertion has two versions because the design matrix determines whether Euclidean parameter error can be controlled from the observations. In the random-design version, the estimator is a measurable map \begin{align*} \hat{\beta}: \mathbb{R}^n \times \mathbb{R}^{n\times d} &\to \mathbb{R}^d, \end{align*} and the risk averages over the pair $(X,w)$, where the rows of $X$ are independent Gaussian vectors and $w\sim\mathcal N(0,\sigma^2I_n)$. The linked [Sparse Linear Regression Minimax Rate under Restricted Eigenvalues](/page/Sparse%20Linear%20Regression%20Minimax%20Rate%20under%20Restricted%20Eigenvalues) requires $1\le k\le d$, the sample-size lower bound $n\ge C_0k\log(ed/k)$, and population covariance eigenvalues in a fixed interval $[a,b]$ with $0<a\le b<\infty$. These are exactly the random-design hypotheses in the statement, so the theorem gives \begin{align*} \inf_{\hat{\beta}}\sup_{\|\beta\|_0\leq k}\mathbb{E}_\beta[|\hat{\beta}(Y,X)-\beta|^2] \asymp \frac{\sigma^2 k\log(ed/k)}{n}, \end{align*} with constants depending only on $a$, $b$, and $C_0$.
For the fixed-design version, the relevant quantitative condition is that there are fixed constants $0<\kappa_-\le \kappa_+<\infty$ and a sparsity level $s_0$ comparable to $k$ such that \begin{align*} \kappa_-|u|^2\leq u^\top\frac{X^\top X}{n}u\leq \kappa_+|u|^2 \end{align*} for every $u\in\mathbb R^d$ with $\|u\|_0\le s_0$, together with column norms comparable to $n^{1/2}$. This is the restricted-eigenvalue input used by the same linked theorem. Under those deterministic conditions on the realised matrix $X$, the only remaining randomness is $w$, and the minimax risk conditional on $X$, taken over the noise alone, has the same order $\sigma^2k\log(ed/k)/n$.
If the lower restricted eigenvalue fails, a nonzero sparse vector can lie in, or nearly lie in, the kernel of $X$; then two different sparse parameters can produce identical or nearly identical means $X\beta$, so Euclidean parameter loss is not controlled by the observations even though prediction loss may be.[/guided]
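The kernel obstruction described above can be seen in a toy case. The following is a minimal sketch, assuming a hypothetical $4\times 3$ design that is not taken from the statement: its first two columns coincide, so the $2$-sparse vector $e_0-e_1$ lies in the kernel of $X$. The helpers `matvec` and `sq_dist` and all numerical values are illustrative assumptions.

```python
# Toy illustration (hypothetical data): a design whose lower restricted
# eigenvalue is zero on 2-sparse vectors, because columns 0 and 1 coincide.

def matvec(X, b):
    """Return the matrix-vector product X b for a list-of-rows matrix X."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

X = [
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
    [0.0, 0.0, 3.0],
    [1.0, 1.0, -1.0],
]

# Two different 1-sparse parameters; their difference e_0 - e_1 is 2-sparse.
beta_a = [1.0, 0.0, 0.0]
beta_b = [0.0, 1.0, 0.0]

mean_a = matvec(X, beta_a)
mean_b = matvec(X, beta_b)

param_gap_sq = sq_dist(beta_a, beta_b)  # squared parameter distance
pred_gap_sq = sq_dist(mean_a, mean_b)   # squared distance between the means

print(param_gap_sq, pred_gap_sq)
```

In this toy case the two parameters are at squared distance $2$ while their means $X\beta$ coincide, so no estimator based on $(Y,X)$ can tell them apart, whereas the prediction loss between them is zero.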

[step:Apply eigenspace perturbation and sparse PCA minimax rates]An estimator of the spike direction is a measurable map \begin{align*} \hat v: (\mathbb{R}^d)^n &\to \{u\in\mathbb{R}^d: |u|=1\}. \end{align*} For the rank-one spiked covariance model $\Sigma=I_d+\lambda vv^\top$, the unique leading eigenspace is $\operatorname{span}(v)$ and the population eigengap equals $\lambda$. Assume, as in the statement, that $\lambda\in[\lambda_{\min},\lambda_{\max}]$ for fixed constants $0<\lambda_{\min}\leq\lambda_{\max}<\infty$. We use the [Spiked Covariance Eigenspace Minimax Rate](/page/Spiked%20Covariance%20Eigenspace%20Minimax%20Rate), whose dense rank-one form states that, in the separated regime where the eigengap $\lambda$ is larger than the sample covariance spectral fluctuation scale by a fixed separation factor, there are constants depending only on $\lambda_{\min}$, $\lambda_{\max}$, and that separation factor such that \begin{align*} \inf_{\hat v}\sup_{|v|=1}\mathbb{E}[\sin^2\angle(\hat v,v)] \asymp \left(\frac{d}{n\lambda^2}\right)\wedge 1. \end{align*} This linked theorem supplies both halves of the minimax equivalence: the lower bound comes from a packing of the unit sphere in projective distance and the upper bound from the principal eigenspace estimator, with perturbation controlled through the [Davis-Kahan Sine Theorem](/page/Davis-Kahan%20Sine%20Theorem). The fluctuation of the sample covariance scales with the covariance level $1+\lambda$, but under the fixed bounds on $\lambda$ this factor is absorbed into the constants; it is not asserted as an additional lower-bound factor in the displayed rate. The truncation by $1$ is forced by the range of squared sine loss. Restricting $v$ to $k$-sparse unit vectors replaces the ambient complexity $d$, which is the metric entropy scale of the unit sphere, by the sparse support complexity $k\log(ed/k)$ in the information-theoretic packing bound and in the matching sparse estimator rate, with constants depending on the same fixed signal-strength and separation constants.
Computationally constrained estimators are not asserted to attain this information-theoretic rate, so the final sentence about polynomial-time relaxations is consistent with, but not part of, the minimax equality.[/step]
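The eigenstructure of the spiked model used in this step can be checked directly. As a short verification, using only $|v|=1$, $\lambda>0$, and an arbitrary vector $u$ orthogonal to $v$ (here $u$ is introduced only for this remark): \begin{align*} \Sigma v &= v+\lambda v(v^\top v)=(1+\lambda)v, \\ \Sigma u &= u+\lambda v(v^\top u)=u. \end{align*} Hence $\Sigma$ has eigenvalue $1+\lambda$ on $\operatorname{span}(v)$ and eigenvalue $1$ on the orthogonal complement, so the eigengap is $(1+\lambda)-1=\lambda$.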

[guided]The point of this step is to use a minimax eigenspace theorem, not merely a perturbation inequality. The parameter is the unit vector $v\in\mathbb R^d$ with $|v|=1$, but the loss depends only on the one-dimensional subspace $\operatorname{span}(v)$ through $\sin^2\angle(\hat v,v)$. The estimator is a measurable map \begin{align*} \hat v: (\mathbb{R}^d)^n &\to \{u\in\mathbb{R}^d: |u|=1\}. \end{align*} For $\Sigma=I_d+\lambda vv^\top$, the leading eigenvalue is $1+\lambda$, the remaining eigenvalues are $1$, and the eigengap is therefore $\lambda$. We apply the [Spiked Covariance Eigenspace Minimax Rate](/page/Spiked%20Covariance%20Eigenspace%20Minimax%20Rate). Its hypotheses require fixed signal-strength bounds $\lambda\in[\lambda_{\min},\lambda_{\max}]$ with $0<\lambda_{\min}\le\lambda_{\max}<\infty$ and a separated regime in which the eigengap is larger than the sample covariance fluctuation scale by a fixed factor. These are precisely the assumptions stated here. The theorem gives constants depending only on $\lambda_{\min}$, $\lambda_{\max}$, and the separation factor such that \begin{align*} \inf_{\hat v}\sup_{|v|=1}\mathbb{E}[\sin^2\angle(\hat v,v)] \asymp \left(\frac{d}{n\lambda^2}\right)\wedge 1. \end{align*} The lower bound is part of the minimax theorem and comes from a packing of one-dimensional subspaces; the upper bound is achieved by estimating the leading eigenspace of the sample covariance and using the [Davis-Kahan Sine Theorem](/page/Davis-Kahan%20Sine%20Theorem) to convert operator perturbation into sine-angle error. The covariance scale $1+\lambda$ appears in the perturbation analysis, but because $\lambda$ is restricted to fixed bounds, $1+\lambda$ is controlled by constants depending only on $\lambda_{\max}$ and is absorbed into the equivalence constants. This is why the displayed rate keeps the explicit dependence on the eigengap as $1/\lambda^2$. 
For sparse PCA, the same minimax mechanism is applied to the smaller parameter class of $k$-sparse unit vectors. The metric entropy of that class is governed by support selection and angular packing on each support, producing the effective complexity $k\log(ed/k)$ in place of $d$. The squared sine loss is always between $0$ and $1$, so every local rate is truncated by $1$.[/guided]
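To get a feel for how replacing $d$ by $k\log(ed/k)$ changes the rate, the following minimal sketch evaluates both truncated expressions with all unspecified constants set to $1$. The function names `sparse_complexity` and `sin2_rate` and the values of $d$, $n$, $\lambda$, $k$ are hypothetical choices for illustration, not quantities from the statement.

```python
import math

def sparse_complexity(d, k):
    """Support complexity k * log(e * d / k) for k-sparse vectors in R^d."""
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    return k * math.log(math.e * d / k)

def sin2_rate(complexity, n, lam):
    """Rate complexity / (n * lam**2), truncated at 1, constants omitted."""
    return min(1.0, complexity / (n * lam ** 2))

# Hypothetical problem sizes, chosen only to illustrate the two regimes.
d, n, lam, k = 1000, 500, 1.0, 10

dense_rate = sin2_rate(d, n, lam)                         # d / (n lam^2) = 2, truncated to 1
sparse_rate = sin2_rate(sparse_complexity(d, k), n, lam)  # roughly 0.112

print(dense_rate, sparse_rate)
```

With these illustrative values the dense expression is truncated at $1$, so the dense bound is uninformative, while the sparse expression is about $0.11$. At $k=d$ the complexity $k\log(ed/k)$ reduces to $d$, so the two expressions agree there.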

[step:Apply Gaussian restricted-isometry bounds for compressed sensing]For compressed sensing, let $A\in\mathbb{R}^{n\times d}$ have independent entries $A_{ij}\sim\mathcal N(0,1/n)$, let the signal class be \begin{align*} \Theta_k:=\{\theta\in\mathbb{R}^d:\|\theta\|_0\leq k\}, \end{align*} and let the noise vector be an arbitrary vector $w\in\mathbb{R}^n$. For a fixed restricted-isometry level $\delta\in(0,1)$, the restricted isometry property on $2k$-sparse vectors means that \begin{align*} (1-\delta)|u|^2\leq |Au|^2\leq (1+\delta)|u|^2 \end{align*} for every $u\in\mathbb R^d$ with $\|u\|_0\leq 2k$. We use the [Gaussian Restricted Isometry Theorem](/page/Gaussian%20Restricted%20Isometry%20Theorem), which states that for each fixed restricted-isometry level $\delta\in(0,1)$ there are constants $c(\delta),C(\delta)>0$ such that $A$ satisfies this property with probability at least $1-2e^{-c(\delta)n}$ whenever \begin{align*} n\geq C(\delta)k\log\left(\frac{ed}{k}\right). \end{align*} On this event, the [RIP Stable Recovery Theorem for Basis Pursuit Denoising](/page/RIP%20Stable%20Recovery%20Theorem%20for%20Basis%20Pursuit%20Denoising) gives a reconstruction map $\Delta:\mathbb{R}^n\to\mathbb{R}^d$ and a constant $C_{\mathrm{rec}}=C_{\mathrm{rec}}(\delta)>0$ such that \begin{align*} |\Delta(A\theta+w)-\theta|\leq C_{\mathrm{rec}}|w| \end{align*} for every $\theta\in\Theta_k$ and every $w\in\mathbb{R}^n$. This inequality is the definition of uniform stable recovery at stability level $C_{\mathrm{rec}}$; the linear dependence on $|w|$ is what is meant by optimal noise sensitivity under the isotropic normalisation $A_{ij}\sim\mathcal N(0,1/n)$. Conversely, the [Stable Embedding Lower Bound for Sparse Recovery](/page/Stable%20Embedding%20Lower%20Bound%20for%20Sparse%20Recovery) shows that any uniformly stable recovery scheme over all $k$-sparse vectors requires \begin{align*} n\gtrsim k\log\left(\frac{ed}{k}\right) \end{align*} measurements, up to constants depending only on the required stability level.
Combining the sufficient restricted-isometry bound with the necessary stable-embedding lower bound proves the compressed sensing assertion.[/step]
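One way to see why the property is imposed on $2k$-sparse rather than $k$-sparse vectors, sketched here for any pair $\theta,\theta'\in\Theta_k$ (the symbol $\theta'$ is introduced only for this remark): the difference $\theta-\theta'$ has at most $2k$ nonzero entries, so the lower restricted-isometry bound applies to it and gives \begin{align*} |\theta-\theta'|^2\leq \frac{1}{1-\delta}|A(\theta-\theta')|^2=\frac{1}{1-\delta}|A\theta-A\theta'|^2. \end{align*} In particular $A\theta=A\theta'$ forces $\theta=\theta'$, so distinct $k$-sparse signals have distinct noiseless measurements, which is a prerequisite for any bound of the form $|\Delta(A\theta+w)-\theta|\leq C_{\mathrm{rec}}|w|$.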

[guided]The compressed sensing assertion is about uniform recovery over the whole sparse class, not about a single fixed signal. We define \begin{align*} \Theta_k:=\{\theta\in\mathbb{R}^d:\|\theta\|_0\leq k\} \end{align*} and take $A\in\mathbb R^{n\times d}$ with independent entries $A_{ij}\sim\mathcal N(0,1/n)$. Fix a number $\delta\in(0,1)$. Saying that $A$ has the restricted isometry property on $2k$-sparse vectors at level $\delta$ means \begin{align*} (1-\delta)|u|^2\leq |Au|^2\leq (1+\delta)|u|^2 \end{align*} for every $u\in\mathbb R^d$ with $\|u\|_0\leq 2k$. The [Gaussian Restricted Isometry Theorem](/page/Gaussian%20Restricted%20Isometry%20Theorem) applies to this isotropically normalised Gaussian matrix. It gives constants $c(\delta),C(\delta)>0$ such that the restricted isometry property above holds with probability at least $1-2e^{-c(\delta)n}$ whenever \begin{align*} n\geq C(\delta)k\log\left(\frac{ed}{k}\right). \end{align*} On that event, the [RIP Stable Recovery Theorem for Basis Pursuit Denoising](/page/RIP%20Stable%20Recovery%20Theorem%20for%20Basis%20Pursuit%20Denoising) provides a reconstruction map $\Delta:\mathbb R^n\to\mathbb R^d$ and a constant $C_{\mathrm{rec}}=C_{\mathrm{rec}}(\delta)>0$ satisfying \begin{align*} |\Delta(A\theta+w)-\theta|\leq C_{\mathrm{rec}}|w| \end{align*} for every $\theta\in\Theta_k$ and every noise vector $w\in\mathbb R^n$. This is uniform because the same map $\Delta$ and the same constant work for all sparse signals simultaneously. It has optimal noise sensitivity because the reconstruction error is controlled linearly by the Euclidean noise size. 
For necessity, the [Stable Embedding Lower Bound for Sparse Recovery](/page/Stable%20Embedding%20Lower%20Bound%20for%20Sparse%20Recovery) states that any recovery scheme with a fixed uniform stability level over all $k$-sparse vectors must use \begin{align*} n\gtrsim k\log\left(\frac{ed}{k}\right) \end{align*} measurements, with the implicit constant depending only on that stability level. Thus the Gaussian restricted-isometry upper bound and the stable-embedding lower bound match up to fixed constants.[/guided]
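The restricted isometry property can be checked by brute force in the smallest case. The following is a minimal sketch, assuming $k=1$ so that the property concerns $2$-sparse vectors: for each pair of columns, the extreme values of $|Au|^2/|u|^2$ are the eigenvalues of a $2\times 2$ Gram matrix, which have a closed form. The function name `rip_constant_pairs`, the test matrices, and the sizes $n=200$, $d=8$ are illustrative assumptions; enumeration of supports does not scale to large $k$ and $d$.

```python
import itertools
import math
import random

def rip_constant_pairs(A):
    """Smallest delta with (1-delta)|u|^2 <= |Au|^2 <= (1+delta)|u|^2
    for every 2-sparse u, i.e. the case 2k = 2. Requires at least 2 columns."""
    n, d = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(d)]
    delta = 0.0
    for p, q in itertools.combinations(range(d), 2):
        a = sum(x * x for x in cols[p])                   # |A e_p|^2
        b = sum(x * x for x in cols[q])                   # |A e_q|^2
        c = sum(x * y for x, y in zip(cols[p], cols[q]))  # <A e_p, A e_q>
        # Eigenvalues of the Gram matrix [[a, c], [c, b]] are mid +/- rad.
        mid = (a + b) / 2.0
        rad = math.hypot((a - b) / 2.0, c)
        delta = max(delta, abs(mid + rad - 1.0), abs(mid - rad - 1.0))
    return delta

# Deterministic check: three unit columns in R^2, the third at 45 degrees.
s = 1.0 / math.sqrt(2.0)
delta_toy = rip_constant_pairs([[1.0, 0.0, s], [0.0, 1.0, s]])

# Gaussian matrix with the normalisation A_ij ~ N(0, 1/n) used in the text.
random.seed(0)
n, d = 200, 8
A = [[random.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(d)] for _ in range(n)]
delta_gauss = rip_constant_pairs(A)

print(delta_toy, delta_gauss)
```

In the deterministic example the worst pair consists of two unit columns with inner product $1/\sqrt2$, so the computed level is $1/\sqrt2$. The Gaussian value depends on the seed and is shown only to indicate how the definition would be evaluated in practice.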
