[proofplan]
We compare the minimizer $\hat{x}$ with the true vector $x$ through the error vector $h:=\hat{x}-x$. The optimization minimality gives a cone inequality: the error outside the largest $k$ coordinates of $x$ is controlled by the error on those coordinates and the tail $\sigma_k(x)_1$. Feasibility of both $x$ and $\hat{x}$ gives the measurement estimate $|Ah|\leq 2\eta$. Finally, the RIP hypothesis is used through the standard robust null-space property, which first controls $|h_S|$ and then controls the full Euclidean error by adding the off-support part.
[/proofplan]
custom_env
admin
[step:Choose the largest coordinate set and define the error vector]
Throughout the proof, $|\cdot|$ denotes the Euclidean norm on whichever finite-dimensional Euclidean space is in scope, either $\mathbb{R}^N$ or $\mathbb{R}^m$, and $\|\cdot\|_1$ denotes the $\ell^1$ norm on $\mathbb{R}^N$. Let $h \in \mathbb{R}^N$ denote the reconstruction error
\begin{align*}
h := \hat{x}-x.
\end{align*}
Let $S \subset \{1,\dots,N\}$ be a set of cardinality at most $k$ indexing $k$ largest coordinates of $x$ in absolute value, with arbitrary tie-breaking. For a vector $v \in \mathbb{R}^N$ and a set $T \subset \{1,\dots,N\}$, define $v_T \in \mathbb{R}^N$ coordinatewise by setting $(v_T)_i := v_i$ for $i \in T$ and $(v_T)_i := 0$ for $i \notin T$.
By the choice of $S$, the vector $x_S$ has at most $k$ non-zero coordinates and
\begin{align*}
\|x_{S^c}\|_1 = \sigma_k(x)_1.
\end{align*}
[/step]
custom_env
admin
[step:Derive the cone inequality from minimality]Since $y=Ax+e$ and $|e|\leq \eta$, the vector $x$ is feasible for the basis pursuit denoising problem:
\begin{align*}
|Ax-y| = |e| \leq \eta.
\end{align*}
Since $\hat{x}$ minimizes the $\ell^1$ norm over the feasible set, we have
\begin{align*}
\|\hat{x}\|_1 \leq \|x\|_1.
\end{align*}
Using $\hat{x}=x+h$ and splitting over $S$ and $S^c$ gives the following chain of inequalities. First,
\begin{align*}
\|x_S\|_1+\|x_{S^c}\|_1 = \|x\|_1.
\end{align*}
Minimality gives
\begin{align*}
\|x\|_1 \geq \|x+h\|_1.
\end{align*}
The disjoint support decomposition gives
\begin{align*}
\|x+h\|_1 = \|x_S+h_S\|_1+\|x_{S^c}+h_{S^c}\|_1.
\end{align*}
Applying the [reverse triangle inequality](/theorems/2300) on each block gives
\begin{align*}
\|x_S+h_S\|_1+\|x_{S^c}+h_{S^c}\|_1 \geq \|x_S\|_1-\|h_S\|_1+\|h_{S^c}\|_1-\|x_{S^c}\|_1.
\end{align*}
Canceling $\|x_S\|_1$ and rearranging yields
\begin{align*}
\|h_{S^c}\|_1 \leq \|h_S\|_1 + 2\|x_{S^c}\|_1.
\end{align*}[/step]
custom_env
admin
[guided]The minimization problem tells us that $\hat{x}$ cannot have larger $\ell^1$ norm than any feasible competitor. The true vector $x$ is feasible because the data have the form $y=Ax+e$ and the noise satisfies $|e|\leq \eta$:
\begin{align*}
|Ax-y| = |Ax-(Ax+e)| = |-e| = |e| \leq \eta.
\end{align*}
Therefore the defining property of $\hat{x}$ gives
\begin{align*}
\|\hat{x}\|_1 \leq \|x\|_1.
\end{align*}
We now insert the error vector $h=\hat{x}-x$, so that $\hat{x}=x+h$. The point of splitting into $S$ and $S^c$ is that $x_{S^c}$ is exactly the compressibility error of $x$. Since the supports $S$ and $S^c$ are disjoint, the $\ell^1$ norm splits additively:
\begin{align*}
\|x\|_1 = \|x_S\|_1+\|x_{S^c}\|_1
\end{align*}
and
\begin{align*}
\|x+h\|_1
= \|x_S+h_S\|_1+\|x_{S^c}+h_{S^c}\|_1.
\end{align*}
Applying the reverse triangle inequality separately on the two coordinate blocks gives
\begin{align*}
\|x_S+h_S\|_1 \geq \|x_S\|_1-\|h_S\|_1.
\end{align*}
It also gives
\begin{align*}
\|x_{S^c}+h_{S^c}\|_1 \geq \|h_{S^c}\|_1-\|x_{S^c}\|_1.
\end{align*}
Combining these estimates with $\|x+h\|_1 \leq \|x\|_1$ gives the following sequence. First,
\begin{align*}
\|x_S\|_1+\|x_{S^c}\|_1 \geq \|x+h\|_1.
\end{align*}
Next, the disjoint support decomposition gives
\begin{align*}
\|x+h\|_1 = \|x_S+h_S\|_1+\|x_{S^c}+h_{S^c}\|_1.
\end{align*}
Finally, the two reverse triangle inequalities give
\begin{align*}
\|x_S+h_S\|_1+\|x_{S^c}+h_{S^c}\|_1 \geq \|x_S\|_1-\|h_S\|_1+\|h_{S^c}\|_1-\|x_{S^c}\|_1.
\end{align*}
After canceling $\|x_S\|_1$ and moving the remaining terms to the opposite sides, we obtain the cone inequality
\begin{align*}
\|h_{S^c}\|_1 \leq \|h_S\|_1 + 2\|x_{S^c}\|_1.
\end{align*}
This inequality is the place where the $\ell^1$ minimization structure enters the proof.[/guided]
custom_env
admin
[step:Use feasibility to bound the measurement error]Since $\hat{x}$ is feasible and $x$ is feasible, the triangle inequality in $\mathbb{R}^m$ gives the following computation. Because $h=\hat{x}-x$,
\begin{align*}
|Ah| = |A\hat{x}-Ax|.
\end{align*}
Inserting and subtracting $y$ gives
\begin{align*}
|A\hat{x}-Ax| = |(A\hat{x}-y)+(y-Ax)|.
\end{align*}
The triangle inequality gives
\begin{align*}
|(A\hat{x}-y)+(y-Ax)| \leq |A\hat{x}-y|+|y-Ax|.
\end{align*}
Feasibility of $\hat{x}$ and $x$ gives
\begin{align*}
|A\hat{x}-y|+|y-Ax| \leq \eta+\eta = 2\eta.
\end{align*}[/step]
custom_env
admin
[guided]The quantity needed by the RIP-based recovery estimate is $|Ah|$, the size of the measurement residual of the error vector. Since $h=\hat{x}-x$, linearity of $A$ gives
\begin{align*}
Ah=A\hat{x}-Ax.
\end{align*}
We insert and subtract the observed data vector $y$:
\begin{align*}
A\hat{x}-Ax=(A\hat{x}-y)+(y-Ax).
\end{align*}
Taking the Euclidean norm in $\mathbb{R}^m$ and applying the triangle inequality gives
\begin{align*}
|Ah|=|A\hat{x}-Ax|\leq |A\hat{x}-y|+|y-Ax|.
\end{align*}
The vector $\hat{x}$ is feasible by definition of a basis pursuit denoising solution, so $|A\hat{x}-y|\leq\eta$. The true vector $x$ is feasible because $y=Ax+e$ and $|e|\leq\eta$, hence
\begin{align*}
|y-Ax|=|e|\leq\eta.
\end{align*}
Therefore
\begin{align*}
|Ah|\leq 2\eta.
\end{align*}[/guided]
custom_env
admin
[step:Apply the robust sparse recovery estimate supplied by RIP]We use the standard compressed sensing theorem that RIP of order $2k$ with sufficiently small restricted isometry constant implies robust sparse recovery. Precisely, there exist universal constants $\delta_*\in(0,1)$, $D_0>0$, and $D_1>0$ such that whenever $\delta_{2k}(A)\leq \delta_*$, every vector $z\in\mathbb{R}^N$, every vector $w\in\mathbb{R}^N$, and every set $T\subset\{1,\dots,N\}$ with $|T|\leq k$ satisfying the cone inequality
\begin{align*}
\|w_{T^c}\|_1 \leq \|w_T\|_1+2\|z_{T^c}\|_1
\end{align*}
also satisfy
\begin{align*}
|w| \leq D_0\frac{\|z_{T^c}\|_1}{\sqrt{k}}+D_1|Aw|.
\end{align*}
This is the standard robust recovery consequence of the [Restricted Isometry Property](/page/Restricted%20Isometry%20Property), proved by decomposing $T^c$ into blocks of size $k$ and applying RIP to the resulting sparse blocks.
The hypotheses of this result are satisfied with $z=x$, $w=h$, and $T=S$: the theorem assumes $\delta_{2k}(A)\leq\delta_*$, the set $S$ has cardinality at most $k$, and the preceding cone inequality gives
\begin{align*}
\|h_{S^c}\|_1 \leq \|h_S\|_1+2\|x_{S^c}\|_1.
\end{align*}
Therefore
\begin{align*}
|h| \leq D_0\frac{\|x_{S^c}\|_1}{\sqrt{k}}+D_1|Ah|.
\end{align*}[/step]
custom_env
admin
[guided]The RIP hypothesis is not used directly in the cone argument. Its role is to supply a robust sparse recovery estimate. The standard theorem says that there are universal constants $\delta_*\in(0,1)$, $D_0>0$, and $D_1>0$ such that, if $\delta_{2k}(A)\leq\delta_*$ and a vector $w\in\mathbb{R}^N$ satisfies
\begin{align*}
\|w_{T^c}\|_1 \leq \|w_T\|_1+2\|z_{T^c}\|_1
\end{align*}
for some vector $z\in\mathbb{R}^N$ and some set $T\subset\{1,\dots,N\}$ with $|T|\leq k$, then
\begin{align*}
|w| \leq D_0\frac{\|z_{T^c}\|_1}{\sqrt{k}}+D_1|Aw|.
\end{align*}
This is the usual robust recovery consequence of the [Restricted Isometry Property](/page/Restricted%20Isometry%20Property). The proof of that theorem is the block-decomposition argument: one partitions $T^c$ into successive blocks of size $k$, applies RIP to the sparse blocks, and sums the tail using the cone inequality. That block step is exactly what prevents a non-universal factor such as $\sqrt{k}$ from appearing in the final Euclidean estimate.
We now verify the hypotheses in the present setting. The theorem assumes $\delta_{2k}(A)\leq\delta_*$. The vector $h=\hat{x}-x$ belongs to $\mathbb{R}^N$, the vector $x$ belongs to $\mathbb{R}^N$, and the set $S$ was chosen with $|S|\leq k$. The cone inequality proved from $\ell^1$ minimality is
\begin{align*}
\|h_{S^c}\|_1 \leq \|h_S\|_1+2\|x_{S^c}\|_1.
\end{align*}
Thus the robust sparse recovery estimate applies with $z=x$, $w=h$, and $T=S$, yielding
\begin{align*}
|h| \leq D_0\frac{\|x_{S^c}\|_1}{\sqrt{k}}+D_1|Ah|.
\end{align*}
The important point is that this estimate already controls the full Euclidean error $|h|$; no separate estimate of $|h_{S^c}|$ is needed.[/guided]
custom_env
admin
[step:Insert the tail and noise bounds into the full-error estimate]The robust sparse recovery estimate gives
\begin{align*}
|h| \leq D_0\frac{\|x_{S^c}\|_1}{\sqrt{k}}+D_1|Ah|.
\end{align*}
By the choice of $S$,
\begin{align*}
\|x_{S^c}\|_1 = \sigma_k(x)_1.
\end{align*}
By feasibility,
\begin{align*}
|Ah|\leq 2\eta.
\end{align*}
Therefore
\begin{align*}
|\hat{x}-x| = |h|.
\end{align*}
Combining these three facts gives
\begin{align*}
|h| \leq D_0\frac{\sigma_k(x)_1}{\sqrt{k}}+2D_1\eta.
\end{align*}
Define the universal constant $C_0$ by
\begin{align*}
C_0 := D_0.
\end{align*}
Define the universal constant $C_1$ by
\begin{align*}
C_1 := 2D_1.
\end{align*}
Since $D_0$ and $D_1$ are universal constants depending only on the chosen RIP threshold $\delta_*$, the constants $C_0$ and $C_1$ are universal. This proves
\begin{align*}
|\hat{x}-x|\leq C_0\frac{\sigma_k(x)_1}{\sqrt{k}}+C_1\eta,
\end{align*}
as required.[/step]
custom_env
admin
[guided]At this point the full Euclidean error estimate has already been obtained:
\begin{align*}
|h| \leq D_0\frac{\|x_{S^c}\|_1}{\sqrt{k}}+D_1|Ah|.
\end{align*}
We now replace the two quantities on the right-hand side by the quantities appearing in the theorem statement. First, the set $S$ was chosen to index $k$ largest coordinates of $x$ in absolute value, so the off-support $\ell^1$ mass is exactly the best $k$-term approximation error:
\begin{align*}
\|x_{S^c}\|_1 = \sigma_k(x)_1.
\end{align*}
Second, feasibility of $x$ and $\hat{x}$ was proved above and gives
\begin{align*}
|Ah|\leq 2\eta.
\end{align*}
Substituting these two estimates into the robust recovery bound yields
\begin{align*}
|h| \leq D_0\frac{\sigma_k(x)_1}{\sqrt{k}}+2D_1\eta.
\end{align*}
Finally $h=\hat{x}-x$, so
\begin{align*}
|\hat{x}-x|=|h|.
\end{align*}
Define
\begin{align*}
C_0 := D_0
\end{align*}
and
\begin{align*}
C_1 := 2D_1.
\end{align*}
The constants $D_0$ and $D_1$ are universal because they come from the RIP-to-recovery theorem for the fixed threshold $\delta_*$. Hence $C_0$ and $C_1$ are universal, and the desired estimate follows:
\begin{align*}
|\hat{x}-x|\leq C_0\frac{\sigma_k(x)_1}{\sqrt{k}}+C_1\eta.
\end{align*}[/guided]