Noisy Sparse Recovery Rate under Restricted Isometry

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The proof uses the standard cone-and-curvature mechanism for the constrained least-squares $\ell^1$ estimator in the statement. The $\ell^1$ constraint forces the error $h=\hat{x}-x$ into the descent cone determined by the support of $x$. The theorem states the exact deterministic cone inputs needed: lower curvature, upper Lipschitz control, and Gaussian width control on that cone. Optimality compares the residual at $\hat{x}$ to the residual at $x$, Gaussian concentration controls the noise process, and the cone curvature converts this stochastic bound into the stated recovery rate with the logarithmic factor $\log(ed/k)$. [/proofplan] [step:Fix the deterministic cone constants assumed in the theorem] Work on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which the noise vector $\varepsilon:(\Omega,\mathcal{F})\to(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ is defined and satisfies $\varepsilon\sim\mathcal{N}(0,\sigma^2 I_n)$. All probabilities and expectations below are taken with respect to $\mathbb{P}$. For every set $T\subset\{1,\dots,d\}$ with $|T|\leq k$, define the $\ell^1$ descent cone over $T$ by \begin{align*} \mathcal{C}(T):=\{v\in\mathbb{R}^d:\|v_{T^c}\|_1\leq \|v_T\|_1\}. \end{align*} The theorem statement assumes constants $\kappa>0$, $\Lambda>0$, and $\Gamma>0$ with the following three properties. First, for every $T\subset\{1,\dots,d\}$ with $|T|\leq k$ and every $v\in\mathcal{C}(T)$, \begin{align*} |Av|\geq \kappa |v|. \end{align*} Second, for every $T\subset\{1,\dots,d\}$ with $|T|\leq k$ and every $v\in\mathcal{C}(T)$ satisfying $|v|=1$, \begin{align*} |Av|\leq \Lambda. \end{align*} Third, if $g:(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}^n$ is a standard Gaussian random vector and \begin{align*} K_T:=\mathcal{C}(T)\cap\{v\in\mathbb{R}^d:|v|=1\}, \end{align*} then \begin{align*} \mathbb{E}\left[\sup_{v\in K_T}g\cdot Av\right] \leq \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}. \end{align*} These are the only deterministic assumptions on $A$ used below. If they are obtained from a standard RIP theorem, that theorem is used only before this proof, to verify the hypotheses of the present statement. [/step] [step:Place the estimation error in the $\ell^1$ descent cone] Let $S := \operatorname{supp} x := \{j\in\{1,\dots,d\}:x_j\neq 0\}$ be the support of $x$. Since $x \in \Sigma_k$, we have $|S| \leq k$. Define the estimation error \begin{align*} h := \hat{x}-x \in \mathbb{R}^d. \end{align*} By the definition of the constrained least-squares estimator in the theorem statement, $\hat{x}$ is feasible for the constraint $\|z\|_1\leq \|x\|_1$, so $\|\hat{x}\|_1 \leq \|x\|_1$. Since $x_{S^c}=0$, we have \begin{align*} \|x+h\|_1 = \|x_S+h_S\|_1+\|h_{S^c}\|_1. \end{align*} By the [reverse triangle inequality](/theorems/2300) on $\mathbb{R}^S$, \begin{align*} \|x_S+h_S\|_1 \geq \|x_S\|_1-\|h_S\|_1. \end{align*} Therefore \begin{align*} \|x\|_1 \geq \|x+h\|_1 \geq \|x_S\|_1-\|h_S\|_1+\|h_{S^c}\|_1. \end{align*} Since $\|x\|_1=\|x_S\|_1$, subtracting $\|x_S\|_1$ gives \begin{align*} \|h_{S^c}\|_1 \leq \|h_S\|_1. \end{align*} Thus $h \in \mathcal{C}(S)$. [guided] The first task is to identify the deterministic set in which the error must live. Let \begin{align*} S := \operatorname{supp} x := \{j\in\{1,\dots,d\}:x_j\neq 0\}. \end{align*} Thus $S$ is the support of $x$. Because $x$ is $k$-sparse, $|S|\leq k$. Define \begin{align*} h := \hat{x}-x. \end{align*} We want to prove that $h$ belongs to the cone \begin{align*} \mathcal{C}(S)=\{v\in\mathbb{R}^d:\|v_{S^c}\|_1\leq \|v_S\|_1\}. \end{align*} The only input needed here is feasibility for the constrained least-squares estimator stated in the theorem. Since $\hat{x}$ is chosen under the constraint $\|z\|_1\leq \|x\|_1$, it satisfies \begin{align*} \|\hat{x}\|_1 \leq \|x\|_1. \end{align*} Substituting $\hat{x}=x+h$ gives \begin{align*} \|x+h\|_1 \leq \|x\|_1. \end{align*} Because $x$ vanishes outside $S$, we split the $\ell^1$ norm over $S$ and $S^c$: \begin{align*} \|x+h\|_1 = \|x_S+h_S\|_1+\|h_{S^c}\|_1. \end{align*} The term on $S$ is controlled from below by the reverse triangle inequality: \begin{align*} \|x_S+h_S\|_1 \geq \|x_S\|_1-\|h_S\|_1. \end{align*} Combining these inequalities gives \begin{align*} \|x\|_1 \geq \|x+h\|_1 \geq \|x_S\|_1-\|h_S\|_1+\|h_{S^c}\|_1. \end{align*} Since $x_{S^c}=0$, $\|x\|_1=\|x_S\|_1$. Cancelling this common term yields \begin{align*} \|h_{S^c}\|_1 \leq \|h_S\|_1. \end{align*} This is exactly the statement that $h\in\mathcal{C}(S)$. [/guided] [/step] [step:Use optimality to compare the signal error with the noise] Since $x$ is feasible for the constrained problem, the optimality of $\hat{x}$ gives \begin{align*} |y-A\hat{x}|^2 \leq |y-Ax|^2. \end{align*} Using $y=Ax+\varepsilon$ and $h=\hat{x}-x$, this becomes \begin{align*} |\varepsilon-Ah|^2 \leq |\varepsilon|^2. \end{align*} Expanding the Euclidean square in $\mathbb{R}^n$ gives \begin{align*} |\varepsilon|^2 -2\varepsilon\cdot Ah+|Ah|^2 \leq |\varepsilon|^2. \end{align*} After cancelling $|\varepsilon|^2$, we obtain \begin{align*} |Ah|^2 \leq 2\varepsilon\cdot Ah. \end{align*} [guided] We now use the fact that $\hat{x}$ minimizes the residual over the constrained feasible set. The vector $x$ is feasible because $\|x\|_1\leq \|x\|_1$, so optimality gives \begin{align*} |y-A\hat{x}|^2 \leq |y-Ax|^2. \end{align*} Substitute the observation model $y=Ax+\varepsilon$ and the error definition $h=\hat{x}-x$. Then $A\hat{x}=A(x+h)=Ax+Ah$, so \begin{align*} y-A\hat{x}=Ax+\varepsilon-Ax-Ah=\varepsilon-Ah, \end{align*} and \begin{align*} y-Ax=\varepsilon. \end{align*} Therefore optimality becomes \begin{align*} |\varepsilon-Ah|^2 \leq |\varepsilon|^2. \end{align*} Expanding the Euclidean square in $\mathbb{R}^n$ yields \begin{align*} |\varepsilon-Ah|^2 = |\varepsilon|^2 -2\varepsilon\cdot Ah+|Ah|^2. \end{align*} Combining the last two displays and cancelling the common term $|\varepsilon|^2$ gives \begin{align*} |Ah|^2 \leq 2\varepsilon\cdot Ah. \end{align*} This is the basic inequality: the deterministic signal error measured through $A$ is controlled by the noise paired with the same error direction. [/guided] [/step] [step:Convert the basic inequality into an error bound over the cone] If $h=0$, the desired estimate is immediate. Assume $h\neq 0$ and define the unit error direction $u\in\mathbb{R}^d$ by \begin{align*} u := \frac{h}{|h|}. \end{align*} Since $\mathcal{C}(S)$ is a cone and $h\in\mathcal{C}(S)$, we have \begin{align*} u\in \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}. \end{align*} The cone curvature estimate from the first step applies to $h\in\mathcal{C}(S)$ and gives \begin{align*} |Ah|^2 \geq \kappa^2 |h|^2. \end{align*} The basic inequality gives \begin{align*} \kappa^2 |h|^2 \leq |Ah|^2 \leq 2\varepsilon\cdot Ah = 2|h|\,\varepsilon\cdot Au. \end{align*} Dividing by $|h|>0$ yields \begin{align*} |h| \leq \frac{2}{\kappa^2} \sup_{v\in \mathcal{C}(S)\cap \{w\in\mathbb{R}^d: |w|=1\}} \varepsilon\cdot Av. \end{align*} [guided] The purpose of this step is to turn the basic inequality into a bound on the Euclidean error $|h|$. If $h=0$, then $|\hat{x}-x|=0$ and there is nothing to prove. Assume $h\neq 0$ and define the unit error direction $u\in\mathbb{R}^d$ by \begin{align*} u := \frac{h}{|h|}. \end{align*} Because $\mathcal{C}(S)$ is closed under multiplication by positive scalars and $h\in\mathcal{C}(S)$, the vector $u$ also belongs to the cone. Its Euclidean norm is $1$, so \begin{align*} u\in \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}. \end{align*} The strengthened RIP hypothesis was recorded earlier as a cone curvature estimate: for every $v\in\mathcal{C}(S)$, \begin{align*} |Av|\geq \kappa |v|. \end{align*} Applying this estimate to $v=h$ gives \begin{align*} |Ah|^2 \geq \kappa^2 |h|^2. \end{align*} Now combine this lower bound with the basic inequality from the previous step: \begin{align*} \kappa^2 |h|^2 \leq |Ah|^2 \leq 2\varepsilon\cdot Ah. \end{align*} Since $h=|h|u$, linearity of $A$ gives $Ah=|h|Au$, and therefore \begin{align*} 2\varepsilon\cdot Ah=2|h|\,\varepsilon\cdot Au. \end{align*} Dividing by the positive number $|h|$ yields \begin{align*} |h| \leq \frac{2}{\kappa^2}\varepsilon\cdot Au. \end{align*} Finally, $u$ is one admissible element of $\mathcal{C}(S)\cap\{w\in\mathbb{R}^d:|w|=1\}$, so replacing this single direction by the supremum over all admissible directions gives \begin{align*} |h| \leq \frac{2}{\kappa^2} \sup_{v\in \mathcal{C}(S)\cap \{w\in\mathbb{R}^d: |w|=1\}} \varepsilon\cdot Av. \end{align*} [/guided] [/step] [step:Control the Gaussian noise process on the sparse cone] Let \begin{align*} K_S := \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}. \end{align*} If $\sigma=0$, then $\varepsilon=0$ almost surely and the final estimate follows from the deterministic reduction. Hence assume $\sigma>0$. Define the standard Gaussian random vector $g:(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}^n$ by \begin{align*} g(\omega):=\sigma^{-1}\varepsilon(\omega). \end{align*} Then $\varepsilon=\sigma g$ on the same probability space. Define the Gaussian supremum \begin{align*} Z_S := \sup_{v\in K_S} g\cdot Av. \end{align*} The set $K_S$ is a closed subset of the Euclidean unit sphere in $\mathbb{R}^d$, hence compact. For each $w\in\mathbb{R}^n$, the map $v\mapsto w\cdot Av$ is continuous on $K_S$, so the supremum defining $\Phi_S(w)$ below is finite and attained; moreover $Z_S=\Phi_S(g)$ is measurable. The upper cone bound gives $Z_S\leq \Lambda |g|$, so $\mathbb{E}[Z_S]$ is finite. By the Gaussian complexity estimate from the first step, applied to the support set $S$ with $|S|\leq k$, \begin{align*} \mathbb{E}[Z_S]\leq \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}. \end{align*} Define the map $\Phi_S:\mathbb{R}^n\to\mathbb{R}$ by \begin{align*} \Phi_S(w):=\sup_{v\in K_S} w\cdot Av. \end{align*} This map is $\Lambda$-Lipschitz: for $w_1,w_2\in\mathbb{R}^n$, \begin{align*} |\Phi_S(w_1)-\Phi_S(w_2)| \leq \sup_{v\in K_S}|(w_1-w_2)\cdot Av| \leq |w_1-w_2|\sup_{v\in K_S}|Av| \leq \Lambda |w_1-w_2|. \end{align*} By the [Gaussian concentration inequality for Lipschitz functions](/page/Gaussian%20Concentration%20Inequality) applied to the $\Lambda$-Lipschitz map $\Phi_S:\mathbb{R}^n\to\mathbb{R}$ and the standard Gaussian random vector $g$, for every $r\geq 0$, \begin{align*} \mathbb{P}\left(Z_S\leq \mathbb{E}[Z_S]+\Lambda r\right) \geq 1-e^{-r^2/2}. \end{align*} Therefore, with probability at least $1-e^{-r^2/2}$, \begin{align*} \sup_{v\in K_S}\varepsilon\cdot Av = \sigma Z_S \leq \sigma \left( \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)} + \Lambda r \right). \end{align*} [guided] The previous step reduced the recovery error to a stochastic supremum over the cone. We now estimate that supremum. Define \begin{align*} K_S := \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}. \end{align*} This is the set of unit directions in which the error can point. If $\sigma=0$, then $\varepsilon=0$ almost surely, so the stochastic term is zero and the final recovery estimate follows immediately from the deterministic reduction. Thus the only nontrivial case is $\sigma>0$. In that case define the map $g:(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}^n$ by \begin{align*} g(\omega):=\sigma^{-1}\varepsilon(\omega). \end{align*} Because $\varepsilon\sim\mathcal{N}(0,\sigma^2 I_n)$, this $g$ is a standard Gaussian random vector and $\varepsilon=\sigma g$ on the same probability space. Define \begin{align*} Z_S := \sup_{v\in K_S} g\cdot Av. \end{align*} Before using expectations or concentration, we check that this supremum is a legitimate [random variable](/page/Random%20Variable) with finite expectation. The cone $\mathcal{C}(S)$ is closed because it is defined by the continuous inequality $\|v_{S^c}\|_1\leq\|v_S\|_1$, and therefore $K_S$ is a closed subset of the Euclidean unit sphere in $\mathbb{R}^d$. Hence $K_S$ is compact. For each fixed $w\in\mathbb{R}^n$, the map $v\mapsto w\cdot Av$ is continuous on $K_S$, so the supremum is finite and attained. With $\Phi_S(w):=\sup_{v\in K_S}w\cdot Av$, this gives $Z_S=\Phi_S(g)$, and the Lipschitz estimate proved below implies that $\Phi_S$ is continuous, hence Borel measurable. Finally, since $|Av|\leq\Lambda$ for $v\in K_S$, we have $Z_S\leq\Lambda |g|$, and $\mathbb{E}[|g|]<\infty$ for a standard Gaussian vector in $\mathbb{R}^n$. Thus $\mathbb{E}[Z_S]$ is finite. The Gaussian complexity hypothesis from the theorem, recorded in the first step, applies because $S$ has cardinality at most $k$ and $K_S$ is the corresponding unit cone section. It gives \begin{align*} \mathbb{E}[Z_S]\leq \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}. \end{align*} To pass from an expectation estimate to a high-probability estimate, we verify the Lipschitz constant of the supremum functional. Define \begin{align*} \Phi_S:\mathbb{R}^n\to\mathbb{R} \end{align*} by \begin{align*} \Phi_S(w):=\sup_{v\in K_S} w\cdot Av. \end{align*} For two vectors $w_1,w_2\in\mathbb{R}^n$, \begin{align*} |\Phi_S(w_1)-\Phi_S(w_2)| \leq \sup_{v\in K_S}|(w_1-w_2)\cdot Av|. \end{align*} The [Cauchy-Schwarz inequality](/theorems/432) in the Euclidean space $\mathbb{R}^n$ gives \begin{align*} |(w_1-w_2)\cdot Av|\leq |w_1-w_2|\,|Av|. \end{align*} Since $v\in K_S\subset \mathcal{C}(S)$ and $|v|=1$, the upper restricted isometry bound gives \begin{align*} |Av|\leq \Lambda. \end{align*} Thus \begin{align*} |\Phi_S(w_1)-\Phi_S(w_2)|\leq \Lambda |w_1-w_2|. \end{align*} So $\Phi_S$ is $\Lambda$-Lipschitz. We now apply the [Gaussian concentration inequality for Lipschitz functions](/page/Gaussian%20Concentration%20Inequality). Its hypotheses are satisfied because $g$ is a standard Gaussian random vector in $\mathbb{R}^n$, the functional $\Phi_S:\mathbb{R}^n\to\mathbb{R}$ is Borel measurable with finite expectation as verified above, and $\Phi_S$ is $\Lambda$-Lipschitz. Applied to $\Phi_S$, it gives, for every $r\geq 0$, \begin{align*} \mathbb{P}\left(\Phi_S(g)\leq \mathbb{E}[\Phi_S(g)]+\Lambda r\right)\geq 1-e^{-r^2/2}. \end{align*} Since $\Phi_S(g)=Z_S$, this becomes \begin{align*} \mathbb{P}\left(Z_S\leq \mathbb{E}[Z_S]+\Lambda r\right)\geq 1-e^{-r^2/2}. \end{align*} Using the expectation bound for $Z_S$, we conclude that with probability at least $1-e^{-r^2/2}$, \begin{align*} Z_S \leq \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)} + \Lambda r. \end{align*} Multiplying by $\sigma$ gives the corresponding bound for the original noise vector: \begin{align*} \sup_{v\in K_S}\varepsilon\cdot Av = \sigma Z_S \leq \sigma \left( \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)} + \Lambda r \right). \end{align*} [/guided] [/step] [step:Combine cone curvature and noise concentration to obtain the recovery rate] On the event from the previous step, the deterministic reduction gives \begin{align*} |\hat{x}-x| = |h| \leq \frac{2}{\kappa^2} \sup_{v\in K_S}\varepsilon\cdot Av. \end{align*} Substituting the stochastic bound yields \begin{align*} |\hat{x}-x| \leq \frac{2\sigma}{\kappa^2} \left( \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)} + \Lambda r \right) \end{align*} with probability at least $1-e^{-r^2/2}$. Equivalently, for every $r\geq0$, \begin{align*} |\hat{x}-x| \leq C_0\sigma \left( \sqrt{k\log\left(\frac{ed}{k}\right)} + r \right), \qquad C_0:=\frac{2\max\{\Gamma,\Lambda\}}{\kappa^2}, \end{align*} on the same event. Taking \begin{align*} r:=\sqrt{k\log\left(\frac{ed}{k}\right)} \end{align*} gives \begin{align*} |\hat{x}-x| \leq \frac{2\sigma(\Gamma+\Lambda)}{\kappa^2} \sqrt{k\log\left(\frac{ed}{k}\right)} \end{align*} with probability at least \begin{align*} 1-\exp\left(-\frac12 k\log\left(\frac{ed}{k}\right)\right). \end{align*} Thus the asserted rate holds with \begin{align*} C := \frac{2(\Gamma+\Lambda)}{\kappa^2}, \end{align*} which depends only on the deterministic cone constants $\kappa$, $\Lambda$, and $\Gamma$. In the RIP corollary mentioned in the statement, these cone constants depend only on the RIP constants, so $C$ also depends only on the RIP constants. [guided] We now combine the deterministic and probabilistic estimates. On the event supplied by Gaussian concentration, the previous stochastic step gives \begin{align*} \sup_{v\in K_S}\varepsilon\cdot Av \leq \sigma \left( \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)} + \Lambda r \right). \end{align*} The deterministic cone reduction gives \begin{align*} |\hat{x}-x| = |h| \leq \frac{2}{\kappa^2} \sup_{v\in K_S}\varepsilon\cdot Av. \end{align*} Substituting the stochastic estimate into this deterministic inequality yields \begin{align*} |\hat{x}-x| \leq \frac{2\sigma}{\kappa^2} \left( \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)} + \Lambda r \right) \end{align*} with probability at least $1-e^{-r^2/2}$. Choose \begin{align*} r:=\sqrt{k\log\left(\frac{ed}{k}\right)}. \end{align*} Then the two terms inside the parentheses have the same scale, and we obtain \begin{align*} |\hat{x}-x| \leq \frac{2\sigma(\Gamma+\Lambda)}{\kappa^2} \sqrt{k\log\left(\frac{ed}{k}\right)}. \end{align*} For this choice of $r$, the probability of the event is at least \begin{align*} 1-\exp\left(-\frac12 k\log\left(\frac{ed}{k}\right)\right). \end{align*} Define \begin{align*} C := \frac{2(\Gamma+\Lambda)}{\kappa^2}. \end{align*} Since $\kappa$, $\Lambda$, and $\Gamma$ are the deterministic cone constants assumed in the theorem, $C$ depends only on those constants. In any RIP setting where a separate deterministic result supplies these cone constants with dependence only on the RIP constants, the same is true of $C$. Therefore \begin{align*} |\hat{x}-x|\leq C\sigma\sqrt{k\log\left(\frac{ed}{k}\right)} \end{align*} with probability at least $1-\exp(-k\log(ed/k)/2)$, which is the asserted recovery rate in the updated statement. [/guided] [/step]

Prerequisites (0/6 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.