[proofplan]
The proof uses the standard cone-and-curvature mechanism for the constrained least-squares $\ell^1$ estimator in the statement. The $\ell^1$ constraint forces the error $h=\hat{x}-x$ into the descent cone determined by the support of $x$. The theorem states the exact deterministic cone inputs needed: lower curvature, upper Lipschitz control, and Gaussian width control on that cone. Optimality compares the residual at $\hat{x}$ to the residual at $x$, Gaussian concentration controls the noise process, and the cone curvature converts this stochastic bound into the stated recovery rate with the logarithmic factor $\log(ed/k)$.
[/proofplan]
[step:Fix the deterministic cone constants assumed in the theorem]
Work on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which the noise vector $\varepsilon:(\Omega,\mathcal{F})\to(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ is defined and satisfies $\varepsilon\sim\mathcal{N}(0,\sigma^2 I_n)$. All probabilities and expectations below are taken with respect to $\mathbb{P}$.
For every set $T\subset\{1,\dots,d\}$ with $|T|\leq k$, define the $\ell^1$ descent cone over $T$ by
\begin{align*}
\mathcal{C}(T):=\{v\in\mathbb{R}^d:\|v_{T^c}\|_1\leq \|v_T\|_1\}.
\end{align*}
The theorem statement assumes constants $\kappa>0$, $\Lambda>0$, and $\Gamma>0$ with the following three properties. First, for every $T\subset\{1,\dots,d\}$ with $|T|\leq k$ and every $v\in\mathcal{C}(T)$,
\begin{align*}
|Av|\geq \kappa |v|.
\end{align*}
Second, for every $T\subset\{1,\dots,d\}$ with $|T|\leq k$ and every $v\in\mathcal{C}(T)$ satisfying $|v|=1$,
\begin{align*}
|Av|\leq \Lambda.
\end{align*}
Third, if $g:(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}^n$ is a standard Gaussian random vector and
\begin{align*}
K_T:=\mathcal{C}(T)\cap\{v\in\mathbb{R}^d:|v|=1\},
\end{align*}
then
\begin{align*}
\mathbb{E}\left[\sup_{v\in K_T}g\cdot Av\right]
\leq
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}.
\end{align*}
These are the only deterministic assumptions on $A$ used below. If they are obtained from a standard RIP theorem, that theorem is used only before this proof, to verify the hypotheses of the present statement.
[/step]
[step:Place the estimation error in the $\ell^1$ descent cone]
Let $S := \operatorname{supp} x := \{j\in\{1,\dots,d\}:x_j\neq 0\}$ be the support of $x$. Since $x \in \Sigma_k$, we have $|S| \leq k$. Define the estimation error
\begin{align*}
h := \hat{x}-x \in \mathbb{R}^d.
\end{align*}
By the definition of the constrained least-squares estimator in the theorem statement, $\hat{x}$ is feasible for the constraint $\|z\|_1\leq \|x\|_1$, so $\|\hat{x}\|_1 \leq \|x\|_1$. Since $x_{S^c}=0$, we have
\begin{align*}
\|x+h\|_1
=
\|x_S+h_S\|_1+\|h_{S^c}\|_1.
\end{align*}
By the [reverse triangle inequality](/theorems/2300) on $\mathbb{R}^S$,
\begin{align*}
\|x_S+h_S\|_1 \geq \|x_S\|_1-\|h_S\|_1.
\end{align*}
Therefore
\begin{align*}
\|x\|_1
\geq
\|x+h\|_1
\geq
\|x_S\|_1-\|h_S\|_1+\|h_{S^c}\|_1.
\end{align*}
Since $\|x\|_1=\|x_S\|_1$, subtracting $\|x_S\|_1$ gives
\begin{align*}
\|h_{S^c}\|_1 \leq \|h_S\|_1.
\end{align*}
Thus $h \in \mathcal{C}(S)$.
[guided]
The first task is to identify the deterministic set in which the error must live. Let
\begin{align*}
S := \operatorname{supp} x := \{j\in\{1,\dots,d\}:x_j\neq 0\}.
\end{align*}
Thus $S$ is the support of $x$. Because $x$ is $k$-sparse, $|S|\leq k$. Define
\begin{align*}
h := \hat{x}-x.
\end{align*}
We want to prove that $h$ belongs to the cone
\begin{align*}
\mathcal{C}(S)=\{v\in\mathbb{R}^d:\|v_{S^c}\|_1\leq \|v_S\|_1\}.
\end{align*}
The only input needed here is feasibility for the constrained least-squares estimator stated in the theorem. Since $\hat{x}$ is chosen under the constraint $\|z\|_1\leq \|x\|_1$, it satisfies
\begin{align*}
\|\hat{x}\|_1 \leq \|x\|_1.
\end{align*}
Substituting $\hat{x}=x+h$ gives
\begin{align*}
\|x+h\|_1 \leq \|x\|_1.
\end{align*}
Because $x$ vanishes outside $S$, we split the $\ell^1$ norm over $S$ and $S^c$:
\begin{align*}
\|x+h\|_1
=
\|x_S+h_S\|_1+\|h_{S^c}\|_1.
\end{align*}
The term on $S$ is controlled from below by the reverse triangle inequality:
\begin{align*}
\|x_S+h_S\|_1 \geq \|x_S\|_1-\|h_S\|_1.
\end{align*}
Combining these inequalities gives
\begin{align*}
\|x\|_1
\geq
\|x+h\|_1
\geq
\|x_S\|_1-\|h_S\|_1+\|h_{S^c}\|_1.
\end{align*}
Since $x_{S^c}=0$, $\|x\|_1=\|x_S\|_1$. Cancelling this common term yields
\begin{align*}
\|h_{S^c}\|_1 \leq \|h_S\|_1.
\end{align*}
This is exactly the statement that $h\in\mathcal{C}(S)$.
[/guided]
[/step]
[step:Use optimality to compare the signal error with the noise]
Since $x$ is feasible for the constrained problem, the optimality of $\hat{x}$ gives
\begin{align*}
|y-A\hat{x}|^2 \leq |y-Ax|^2.
\end{align*}
Using $y=Ax+\varepsilon$ and $h=\hat{x}-x$, this becomes
\begin{align*}
|\varepsilon-Ah|^2 \leq |\varepsilon|^2.
\end{align*}
Expanding the Euclidean square in $\mathbb{R}^n$ gives
\begin{align*}
|\varepsilon|^2 -2\varepsilon\cdot Ah+|Ah|^2 \leq |\varepsilon|^2.
\end{align*}
After cancelling $|\varepsilon|^2$, we obtain
\begin{align*}
|Ah|^2 \leq 2\varepsilon\cdot Ah.
\end{align*}
[guided]
We now use the fact that $\hat{x}$ minimizes the residual over the constrained feasible set. The vector $x$ is feasible because $\|x\|_1\leq \|x\|_1$, so optimality gives
\begin{align*}
|y-A\hat{x}|^2 \leq |y-Ax|^2.
\end{align*}
Substitute the observation model $y=Ax+\varepsilon$ and the error definition $h=\hat{x}-x$. Then $A\hat{x}=A(x+h)=Ax+Ah$, so
\begin{align*}
y-A\hat{x}=Ax+\varepsilon-Ax-Ah=\varepsilon-Ah,
\end{align*}
and
\begin{align*}
y-Ax=\varepsilon.
\end{align*}
Therefore optimality becomes
\begin{align*}
|\varepsilon-Ah|^2 \leq |\varepsilon|^2.
\end{align*}
Expanding the Euclidean square in $\mathbb{R}^n$ yields
\begin{align*}
|\varepsilon-Ah|^2
=
|\varepsilon|^2 -2\varepsilon\cdot Ah+|Ah|^2.
\end{align*}
Combining the last two displays and cancelling the common term $|\varepsilon|^2$ gives
\begin{align*}
|Ah|^2 \leq 2\varepsilon\cdot Ah.
\end{align*}
This is the basic inequality: the deterministic signal error measured through $A$ is controlled by the noise paired with the same error direction.
[/guided]
[/step]
[step:Convert the basic inequality into an error bound over the cone]
If $h=0$, the desired estimate is immediate. Assume $h\neq 0$ and define the unit error direction $u\in\mathbb{R}^d$ by
\begin{align*}
u := \frac{h}{|h|}.
\end{align*}
Since $\mathcal{C}(S)$ is a cone and $h\in\mathcal{C}(S)$, we have
\begin{align*}
u\in \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}.
\end{align*}
The cone curvature estimate from the first step applies to $h\in\mathcal{C}(S)$ and gives
\begin{align*}
|Ah|^2 \geq \kappa^2 |h|^2.
\end{align*}
The basic inequality gives
\begin{align*}
\kappa^2 |h|^2
\leq
|Ah|^2
\leq
2\varepsilon\cdot Ah
=
2|h|\,\varepsilon\cdot Au.
\end{align*}
Dividing by $|h|>0$ yields
\begin{align*}
|h|
\leq
\frac{2}{\kappa^2}
\sup_{v\in \mathcal{C}(S)\cap \{w\in\mathbb{R}^d: |w|=1\}}
\varepsilon\cdot Av.
\end{align*}
[guided]
The purpose of this step is to turn the basic inequality into a bound on the Euclidean error $|h|$. If $h=0$, then $|\hat{x}-x|=0$ and there is nothing to prove. Assume $h\neq 0$ and define the unit error direction $u\in\mathbb{R}^d$ by
\begin{align*}
u := \frac{h}{|h|}.
\end{align*}
Because $\mathcal{C}(S)$ is closed under multiplication by positive scalars and $h\in\mathcal{C}(S)$, the vector $u$ also belongs to the cone. Its Euclidean norm is $1$, so
\begin{align*}
u\in \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}.
\end{align*}
The strengthened RIP hypothesis was recorded earlier as a cone curvature estimate: for every $v\in\mathcal{C}(S)$,
\begin{align*}
|Av|\geq \kappa |v|.
\end{align*}
Applying this estimate to $v=h$ gives
\begin{align*}
|Ah|^2 \geq \kappa^2 |h|^2.
\end{align*}
Now combine this lower bound with the basic inequality from the previous step:
\begin{align*}
\kappa^2 |h|^2
\leq
|Ah|^2
\leq
2\varepsilon\cdot Ah.
\end{align*}
Since $h=|h|u$, linearity of $A$ gives $Ah=|h|Au$, and therefore
\begin{align*}
2\varepsilon\cdot Ah=2|h|\,\varepsilon\cdot Au.
\end{align*}
Dividing by the positive number $|h|$ yields
\begin{align*}
|h|
\leq
\frac{2}{\kappa^2}\varepsilon\cdot Au.
\end{align*}
Finally, $u$ is one admissible element of $\mathcal{C}(S)\cap\{w\in\mathbb{R}^d:|w|=1\}$, so replacing this single direction by the supremum over all admissible directions gives
\begin{align*}
|h|
\leq
\frac{2}{\kappa^2}
\sup_{v\in \mathcal{C}(S)\cap \{w\in\mathbb{R}^d: |w|=1\}}
\varepsilon\cdot Av.
\end{align*}
[/guided]
[/step]
[step:Control the Gaussian noise process on the sparse cone]
Let
\begin{align*}
K_S := \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}.
\end{align*}
If $\sigma=0$, then $\varepsilon=0$ almost surely and the final estimate follows from the deterministic reduction. Hence assume $\sigma>0$. Define the standard Gaussian random vector $g:(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}^n$ by
\begin{align*}
g(\omega):=\sigma^{-1}\varepsilon(\omega).
\end{align*}
Then $\varepsilon=\sigma g$ on the same probability space. Define the Gaussian supremum
\begin{align*}
Z_S := \sup_{v\in K_S} g\cdot Av.
\end{align*}
The set $K_S$ is a closed subset of the Euclidean unit sphere in $\mathbb{R}^d$, hence compact. For each $w\in\mathbb{R}^n$, the map $v\mapsto w\cdot Av$ is continuous on $K_S$, so the supremum defining $\Phi_S(w)$ below is finite and attained; moreover $Z_S=\Phi_S(g)$ is measurable. The upper cone bound gives $Z_S\leq \Lambda |g|$, so $\mathbb{E}[Z_S]$ is finite.
By the Gaussian complexity estimate from the first step, applied to the support set $S$ with $|S|\leq k$,
\begin{align*}
\mathbb{E}[Z_S]\leq \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}.
\end{align*}
Define the map $\Phi_S:\mathbb{R}^n\to\mathbb{R}$ by
\begin{align*}
\Phi_S(w):=\sup_{v\in K_S} w\cdot Av.
\end{align*}
This map is $\Lambda$-Lipschitz: for $w_1,w_2\in\mathbb{R}^n$,
\begin{align*}
|\Phi_S(w_1)-\Phi_S(w_2)|
\leq
\sup_{v\in K_S}|(w_1-w_2)\cdot Av|
\leq
|w_1-w_2|\sup_{v\in K_S}|Av|
\leq
\Lambda |w_1-w_2|.
\end{align*}
By the [Gaussian concentration inequality for Lipschitz functions](/page/Gaussian%20Concentration%20Inequality) applied to the $\Lambda$-Lipschitz map $\Phi_S:\mathbb{R}^n\to\mathbb{R}$ and the standard Gaussian random vector $g$, for every $r\geq 0$,
\begin{align*}
\mathbb{P}\left(Z_S\leq \mathbb{E}[Z_S]+\Lambda r\right)
\geq
1-e^{-r^2/2}.
\end{align*}
Therefore, with probability at least $1-e^{-r^2/2}$,
\begin{align*}
\sup_{v\in K_S}\varepsilon\cdot Av
=
\sigma Z_S
\leq
\sigma
\left(
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}
+
\Lambda r
\right).
\end{align*}
[guided]
The previous step reduced the recovery error to a stochastic supremum over the cone. We now estimate that supremum.
Define
\begin{align*}
K_S := \mathcal{C}(S)\cap \{v\in\mathbb{R}^d: |v|=1\}.
\end{align*}
This is the set of unit directions in which the error can point. If $\sigma=0$, then $\varepsilon=0$ almost surely, so the stochastic term is zero and the final recovery estimate follows immediately from the deterministic reduction. Thus the only nontrivial case is $\sigma>0$. In that case define the map $g:(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}^n$ by
\begin{align*}
g(\omega):=\sigma^{-1}\varepsilon(\omega).
\end{align*}
Because $\varepsilon\sim\mathcal{N}(0,\sigma^2 I_n)$, this $g$ is a standard Gaussian random vector and $\varepsilon=\sigma g$ on the same probability space. Define
\begin{align*}
Z_S := \sup_{v\in K_S} g\cdot Av.
\end{align*}
Before using expectations or concentration, we check that this supremum is a legitimate [random variable](/page/Random%20Variable) with finite expectation. The cone $\mathcal{C}(S)$ is closed because it is defined by the continuous inequality $\|v_{S^c}\|_1\leq\|v_S\|_1$, and therefore $K_S$ is a closed subset of the Euclidean unit sphere in $\mathbb{R}^d$. Hence $K_S$ is compact. For each fixed $w\in\mathbb{R}^n$, the map $v\mapsto w\cdot Av$ is continuous on $K_S$, so the supremum is finite and attained. With $\Phi_S(w):=\sup_{v\in K_S}w\cdot Av$, this gives $Z_S=\Phi_S(g)$, and the Lipschitz estimate proved below implies that $\Phi_S$ is continuous, hence Borel measurable. Finally, since $|Av|\leq\Lambda$ for $v\in K_S$, we have $Z_S\leq\Lambda |g|$, and $\mathbb{E}[|g|]<\infty$ for a standard Gaussian vector in $\mathbb{R}^n$. Thus $\mathbb{E}[Z_S]$ is finite.
The Gaussian complexity hypothesis from the theorem, recorded in the first step, applies because $S$ has cardinality at most $k$ and $K_S$ is the corresponding unit cone section. It gives
\begin{align*}
\mathbb{E}[Z_S]\leq \Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}.
\end{align*}
To pass from an expectation estimate to a high-probability estimate, we verify the Lipschitz constant of the supremum functional. Define
\begin{align*}
\Phi_S:\mathbb{R}^n\to\mathbb{R}
\end{align*}
by
\begin{align*}
\Phi_S(w):=\sup_{v\in K_S} w\cdot Av.
\end{align*}
For two vectors $w_1,w_2\in\mathbb{R}^n$,
\begin{align*}
|\Phi_S(w_1)-\Phi_S(w_2)|
\leq
\sup_{v\in K_S}|(w_1-w_2)\cdot Av|.
\end{align*}
The [Cauchy-Schwarz inequality](/theorems/432) in the Euclidean space $\mathbb{R}^n$ gives
\begin{align*}
|(w_1-w_2)\cdot Av|\leq |w_1-w_2|\,|Av|.
\end{align*}
Since $v\in K_S\subset \mathcal{C}(S)$ and $|v|=1$, the upper restricted isometry bound gives
\begin{align*}
|Av|\leq \Lambda.
\end{align*}
Thus
\begin{align*}
|\Phi_S(w_1)-\Phi_S(w_2)|\leq \Lambda |w_1-w_2|.
\end{align*}
So $\Phi_S$ is $\Lambda$-Lipschitz.
We now apply the [Gaussian concentration inequality for Lipschitz functions](/page/Gaussian%20Concentration%20Inequality). Its hypotheses are satisfied because $g$ is a standard Gaussian random vector in $\mathbb{R}^n$, the functional $\Phi_S:\mathbb{R}^n\to\mathbb{R}$ is Borel measurable with finite expectation as verified above, and $\Phi_S$ is $\Lambda$-Lipschitz. Applied to $\Phi_S$, it gives, for every $r\geq 0$,
\begin{align*}
\mathbb{P}\left(\Phi_S(g)\leq \mathbb{E}[\Phi_S(g)]+\Lambda r\right)\geq 1-e^{-r^2/2}.
\end{align*}
Since $\Phi_S(g)=Z_S$, this becomes
\begin{align*}
\mathbb{P}\left(Z_S\leq \mathbb{E}[Z_S]+\Lambda r\right)\geq 1-e^{-r^2/2}.
\end{align*}
Using the expectation bound for $Z_S$, we conclude that with probability at least $1-e^{-r^2/2}$,
\begin{align*}
Z_S
\leq
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}
+
\Lambda r.
\end{align*}
Multiplying by $\sigma$ gives the corresponding bound for the original noise vector:
\begin{align*}
\sup_{v\in K_S}\varepsilon\cdot Av
=
\sigma Z_S
\leq
\sigma
\left(
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}
+
\Lambda r
\right).
\end{align*}
[/guided]
[/step]
[step:Combine cone curvature and noise concentration to obtain the recovery rate]
On the event from the previous step, the deterministic reduction gives
\begin{align*}
|\hat{x}-x|
=
|h|
\leq
\frac{2}{\kappa^2}
\sup_{v\in K_S}\varepsilon\cdot Av.
\end{align*}
Substituting the stochastic bound yields
\begin{align*}
|\hat{x}-x|
\leq
\frac{2\sigma}{\kappa^2}
\left(
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}
+
\Lambda r
\right)
\end{align*}
with probability at least $1-e^{-r^2/2}$. Equivalently, for every $r\geq0$,
\begin{align*}
|\hat{x}-x|
\leq
C_0\sigma
\left(
\sqrt{k\log\left(\frac{ed}{k}\right)}
+
r
\right),
\qquad
C_0:=\frac{2\max\{\Gamma,\Lambda\}}{\kappa^2},
\end{align*}
on the same event. Taking
\begin{align*}
r:=\sqrt{k\log\left(\frac{ed}{k}\right)}
\end{align*}
gives
\begin{align*}
|\hat{x}-x|
\leq
\frac{2\sigma(\Gamma+\Lambda)}{\kappa^2}
\sqrt{k\log\left(\frac{ed}{k}\right)}
\end{align*}
with probability at least
\begin{align*}
1-\exp\left(-\frac12 k\log\left(\frac{ed}{k}\right)\right).
\end{align*}
Thus the asserted rate holds with
\begin{align*}
C := \frac{2(\Gamma+\Lambda)}{\kappa^2},
\end{align*}
which depends only on the deterministic cone constants $\kappa$, $\Lambda$, and $\Gamma$. In the RIP corollary mentioned in the statement, these cone constants depend only on the RIP constants, so $C$ also depends only on the RIP constants.
[guided]
We now combine the deterministic and probabilistic estimates. On the event supplied by Gaussian concentration, the previous stochastic step gives
\begin{align*}
\sup_{v\in K_S}\varepsilon\cdot Av
\leq
\sigma
\left(
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}
+
\Lambda r
\right).
\end{align*}
The deterministic cone reduction gives
\begin{align*}
|\hat{x}-x|
=
|h|
\leq
\frac{2}{\kappa^2}
\sup_{v\in K_S}\varepsilon\cdot Av.
\end{align*}
Substituting the stochastic estimate into this deterministic inequality yields
\begin{align*}
|\hat{x}-x|
\leq
\frac{2\sigma}{\kappa^2}
\left(
\Gamma\sqrt{k\log\left(\frac{ed}{k}\right)}
+
\Lambda r
\right)
\end{align*}
with probability at least $1-e^{-r^2/2}$. Choose
\begin{align*}
r:=\sqrt{k\log\left(\frac{ed}{k}\right)}.
\end{align*}
Then the two terms inside the parentheses have the same scale, and we obtain
\begin{align*}
|\hat{x}-x|
\leq
\frac{2\sigma(\Gamma+\Lambda)}{\kappa^2}
\sqrt{k\log\left(\frac{ed}{k}\right)}.
\end{align*}
For this choice of $r$, the probability of the event is at least
\begin{align*}
1-\exp\left(-\frac12 k\log\left(\frac{ed}{k}\right)\right).
\end{align*}
Define
\begin{align*}
C := \frac{2(\Gamma+\Lambda)}{\kappa^2}.
\end{align*}
Since $\kappa$, $\Lambda$, and $\Gamma$ are the deterministic cone constants assumed in the theorem, $C$ depends only on those constants. In any RIP setting where a separate deterministic result supplies these cone constants with dependence only on the RIP constants, the same is true of $C$. Therefore
\begin{align*}
|\hat{x}-x|\leq C\sigma\sqrt{k\log\left(\frac{ed}{k}\right)}
\end{align*}
with probability at least $1-\exp(-k\log(ed/k)/2)$, which is the asserted recovery rate in the updated statement.
[/guided]
[/step]