[proofplan]
The proof has three ingredients. First, optimality of the Dantzig selector and the support condition on $\beta^*$ force the error vector $\Delta$ into the cone $\mathcal{C}(S)$. Second, feasibility of both $\hat{\beta}_{\mathrm{DS}}$ and $\beta^*$ controls the empirical score $\|X^\top X\Delta/n\|_\infty$ by $2\lambda$. Finally, the cone condition and the restricted eigenvalue condition convert this score bound into a prediction estimate, and substituting that estimate back gives the $\ell^1$ estimate.
[/proofplan]
[step:Use Dantzig optimality to place the error in the cone]
Since $\beta^* \in \mathcal{D}_\lambda$ and $\hat{\beta}_{\mathrm{DS}}$ minimizes the $\ell^1$ norm over $\mathcal{D}_\lambda$, we have
\begin{align*}
\|\hat{\beta}_{\mathrm{DS}}\|_1 \leq \|\beta^*\|_1.
\end{align*}
Using $\hat{\beta}_{\mathrm{DS}} = \beta^* + \Delta$ and $\beta^*_{S^c}=0$, this gives
\begin{align*}
\|\beta^*_S + \Delta_S\|_1 + \|\Delta_{S^c}\|_1
= \|\beta^* + \Delta\|_1
\leq \|\beta^*\|_1
= \|\beta^*_S\|_1.
\end{align*}
By the [Reverse Triangle Inequality](/page/Reverse%20Triangle%20Inequality) for the $\ell^1$ norm on $\mathbb{R}^p$,
\begin{align*}
\|\beta^*_S + \Delta_S\|_1 \geq \|\beta^*_S\|_1 - \|\Delta_S\|_1.
\end{align*}
Combining the last two displays yields
\begin{align*}
\|\Delta_{S^c}\|_1 \leq \|\Delta_S\|_1.
\end{align*}
Thus $\Delta \in \mathcal{C}(S)$. If $s=0$, then $S=\varnothing$, so the cone condition gives
\begin{align*}
\|\Delta\|_1=\|\Delta_{S^c}\|_1 \leq \|\Delta_S\|_1=0.
\end{align*}
Hence $\Delta=0$, and both asserted inequalities hold because $\kappa(S,1)>0$ makes the displayed denominators positive and the right-hand sides equal $0$. For the remaining steps, assume $s\geq 1$.
[guided]
The first point is that the Dantzig selector is not merely feasible; it is an $\ell^1$-minimizer among all feasible vectors. Since $\beta^*$ is feasible by hypothesis, the minimizing property gives
\begin{align*}
\|\hat{\beta}_{\mathrm{DS}}\|_1 \leq \|\beta^*\|_1.
\end{align*}
We now rewrite this inequality in terms of the error vector $\Delta := \hat{\beta}_{\mathrm{DS}}-\beta^*$. Because $S=\operatorname{supp}(\beta^*)$, the vector $\beta^*$ has no coordinates outside $S$, so $\beta^*_{S^c}=0$. Therefore
\begin{align*}
\|\hat{\beta}_{\mathrm{DS}}\|_1
= \|\beta^*+\Delta\|_1
= \|\beta^*_S+\Delta_S\|_1+\|\Delta_{S^c}\|_1.
\end{align*}
The right-hand side of the optimality inequality is
\begin{align*}
\|\beta^*\|_1=\|\beta^*_S\|_1.
\end{align*}
Hence
\begin{align*}
\|\beta^*_S+\Delta_S\|_1+\|\Delta_{S^c}\|_1 \leq \|\beta^*_S\|_1.
\end{align*}
The only remaining issue is to isolate $\|\Delta_{S^c}\|_1$. The [Reverse Triangle Inequality](/page/Reverse%20Triangle%20Inequality) for the $\ell^1$ norm on $\mathbb{R}^p$ gives
\begin{align*}
\|\beta^*_S+\Delta_S\|_1 \geq \|\beta^*_S\|_1-\|\Delta_S\|_1.
\end{align*}
Substituting this lower bound into the previous inequality gives
\begin{align*}
\|\beta^*_S\|_1-\|\Delta_S\|_1+\|\Delta_{S^c}\|_1 \leq \|\beta^*_S\|_1.
\end{align*}
After subtracting $\|\beta^*_S\|_1$ from both sides, we obtain
\begin{align*}
\|\Delta_{S^c}\|_1 \leq \|\Delta_S\|_1.
\end{align*}
This is exactly the cone condition $\Delta \in \mathcal{C}(S)$. We also handle the edge case in which the support is empty. If $s=0$, then $S=\varnothing$, and the cone condition reads
\begin{align*}
\|\Delta\|_1=\|\Delta_{S^c}\|_1 \leq \|\Delta_S\|_1=0.
\end{align*}
Therefore $\Delta=0$. Since $\kappa(S,1)>0$, the denominators in the two claimed right-hand sides are positive; since $s=0$, both right-hand sides are $0$. Thus both desired inequalities hold in this case. From now on, we may assume $s\geq 1$.
[/guided]
[/step]
[step:Use feasibility of both vectors to bound the empirical score]
Define the residual map $R: \mathbb{R}^p \to \mathbb{R}^n$ by declaring that, for each $\beta \in \mathbb{R}^p$, the value $R(\beta) \in \mathbb{R}^n$ is
\begin{align*}
R(\beta)=y-X\beta.
\end{align*}
Since $\hat{\beta}_{\mathrm{DS}},\beta^* \in \mathcal{D}_\lambda$, we have
\begin{align*}
\left\|\frac{X^\top R(\hat{\beta}_{\mathrm{DS}})}{n}\right\|_\infty \leq \lambda,
\qquad
\left\|\frac{X^\top R(\beta^*)}{n}\right\|_\infty \leq \lambda.
\end{align*}
Because
\begin{align*}
R(\beta^*)-R(\hat{\beta}_{\mathrm{DS}})
= y-X\beta^* - y + X\hat{\beta}_{\mathrm{DS}}
= X\Delta,
\end{align*}
we first obtain
\begin{align*}
\left\|\frac{X^\top X\Delta}{n}\right\|_\infty
=
\left\|\frac{X^\top R(\beta^*)}{n}-\frac{X^\top R(\hat{\beta}_{\mathrm{DS}})}{n}\right\|_\infty.
\end{align*}
Applying the triangle inequality for the maximum norm $\|\cdot\|_\infty$ on $\mathbb{R}^p$ gives
\begin{align*}
\left\|\frac{X^\top X\Delta}{n}\right\|_\infty
\leq
\left\|\frac{X^\top R(\beta^*)}{n}\right\|_\infty
+
\left\|\frac{X^\top R(\hat{\beta}_{\mathrm{DS}})}{n}\right\|_\infty.
\end{align*}
Using the two feasibility bounds yields
\begin{align*}
\left\|\frac{X^\top X\Delta}{n}\right\|_\infty \leq 2\lambda.
\end{align*}
[guided]
We next translate feasibility into a bound on the empirical score of the error vector. Define the residual map $R: \mathbb{R}^p \to \mathbb{R}^n$ by declaring that, for each $\beta \in \mathbb{R}^p$, the value $R(\beta) \in \mathbb{R}^n$ is
\begin{align*}
R(\beta)=y-X\beta.
\end{align*}
The definition of feasibility in the Dantzig set $\mathcal{D}_\lambda$ gives the two bounds
\begin{align*}
\left\|\frac{X^\top R(\hat{\beta}_{\mathrm{DS}})}{n}\right\|_\infty \leq \lambda,
\qquad
\left\|\frac{X^\top R(\beta^*)}{n}\right\|_\infty \leq \lambda.
\end{align*}
The key point is that the difference of these residuals is exactly the fitted-value error. Since $\Delta=\hat{\beta}_{\mathrm{DS}}-\beta^*$, expanding the two residuals gives
\begin{align*}
R(\beta^*)-R(\hat{\beta}_{\mathrm{DS}})=y-X\beta^* - y + X\hat{\beta}_{\mathrm{DS}}.
\end{align*}
Cancelling the two occurrences of $y$ and factoring out $X$ gives
\begin{align*}
R(\beta^*)-R(\hat{\beta}_{\mathrm{DS}})=X(\hat{\beta}_{\mathrm{DS}}-\beta^*)=X\Delta.
\end{align*}
Multiplying by $X^\top/n$ gives
\begin{align*}
\frac{X^\top X\Delta}{n}
=
\frac{X^\top R(\beta^*)}{n}-\frac{X^\top R(\hat{\beta}_{\mathrm{DS}})}{n}.
\end{align*}
Applying the triangle inequality for the maximum norm $\|\cdot\|_\infty$ on $\mathbb{R}^p$ gives
\begin{align*}
\left\|\frac{X^\top X\Delta}{n}\right\|_\infty
\leq
\left\|\frac{X^\top R(\beta^*)}{n}\right\|_\infty
+
\left\|\frac{X^\top R(\hat{\beta}_{\mathrm{DS}})}{n}\right\|_\infty.
\end{align*}
Substituting the two feasibility estimates yields
\begin{align*}
\left\|\frac{X^\top X\Delta}{n}\right\|_\infty \leq \lambda+\lambda=2\lambda.
\end{align*}
[/guided]
[/step]
[step:Convert the score bound into a prediction bound]
The Euclidean identity for the matrix $X$ and vector $\Delta$ gives
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
=
\frac{\Delta^\top X^\top X\Delta}{n}.
\end{align*}
Applying [Hölder's Inequality](/page/Holder%20Inequality) for the conjugate pair $(1,\infty)$ to the vectors $\Delta$ and $X^\top X\Delta/n$ in $\mathbb{R}^p$, and then using the score bound, yields
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq
\|\Delta\|_1\left\|\frac{X^\top X\Delta}{n}\right\|_\infty
\leq
2\lambda\|\Delta\|_1.
\end{align*}
Since $\Delta \in \mathcal{C}(S)$,
\begin{align*}
\|\Delta\|_1
=
\|\Delta_S\|_1+\|\Delta_{S^c}\|_1
\leq
2\|\Delta_S\|_1.
\end{align*}
Enumerate the finite set $S$ as $S=\{j_1,\dots,j_s\}$, and regard $\Delta_S$ as the vector $(\Delta_{j_1},\dots,\Delta_{j_s}) \in \mathbb{R}^s$. By the [Cauchy-Schwarz Inequality](/page/Cauchy-Schwarz%20Inequality) in $\mathbb{R}^s$,
\begin{align*}
\|\Delta_S\|_1 \leq \sqrt{s}\,\|\Delta_S\|_2.
\end{align*}
Thus
\begin{align*}
\|\Delta\|_1 \leq 2\sqrt{s}\,\|\Delta_S\|_2.
\end{align*}
If $\Delta_S=0$, the cone condition gives $\|\Delta_{S^c}\|_1=0$, hence $\Delta=0$, and both claimed inequalities follow. Assume therefore that $\Delta_S \neq 0$. Since $\Delta \in \mathcal{C}(S)$, the definition of $\kappa(S,1)$ gives
\begin{align*}
\kappa(S,1)
\leq
\frac{\|X\Delta\|_2}{\sqrt{n}\,\|\Delta_S\|_2},
\end{align*}
or equivalently
\begin{align*}
\|\Delta_S\|_2 \leq \frac{\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
Combining the preceding estimates gives
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq
2\lambda\|\Delta\|_1
\leq
4\lambda\sqrt{s}\,\|\Delta_S\|_2
\leq
\frac{4\lambda\sqrt{s}\,\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
If $\|X\Delta\|_2=0$, the desired prediction bound is immediate. If $\|X\Delta\|_2>0$, divide by $\|X\Delta\|_2/\sqrt{n}$ to obtain
\begin{align*}
\frac{\|X\Delta\|_2}{\sqrt{n}}
\leq
\frac{4\lambda\sqrt{s}}{\kappa(S,1)}.
\end{align*}
Squaring both sides gives
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq
\frac{16\lambda^2 s}{\kappa(S,1)^2}.
\end{align*}
[guided]
We now combine the score control with the restricted eigenvalue condition. First, the Euclidean identity for the matrix $X$ and vector $\Delta$ gives
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
=
\frac{\Delta^\top X^\top X\Delta}{n}.
\end{align*}
Apply [Hölder's Inequality](/page/Holder%20Inequality) with conjugate exponents $(1,\infty)$ to the two vectors $\Delta \in \mathbb{R}^p$ and $X^\top X\Delta/n \in \mathbb{R}^p$. This gives
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq
\|\Delta\|_1\left\|\frac{X^\top X\Delta}{n}\right\|_\infty.
\end{align*}
Using the score bound from the previous step, we obtain
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq 2\lambda\|\Delta\|_1.
\end{align*}
The cone condition now reduces the full $\ell^1$ norm to the active coordinates:
\begin{align*}
\|\Delta\|_1
=
\|\Delta_S\|_1+\|\Delta_{S^c}\|_1
\leq
2\|\Delta_S\|_1.
\end{align*}
Enumerate the finite set $S$ as $S=\{j_1,\dots,j_s\}$, and regard $\Delta_S$ as the vector $(\Delta_{j_1},\dots,\Delta_{j_s}) \in \mathbb{R}^s$. Applying the [Cauchy-Schwarz Inequality](/page/Cauchy-Schwarz%20Inequality) in $\mathbb{R}^s$ gives
\begin{align*}
\|\Delta_S\|_1 \leq \sqrt{s}\,\|\Delta_S\|_2.
\end{align*}
Therefore
\begin{align*}
\|\Delta\|_1 \leq 2\sqrt{s}\,\|\Delta_S\|_2.
\end{align*}
If $\Delta_S=0$, the cone condition gives $\|\Delta_{S^c}\|_1=0$, hence $\Delta=0$, and the prediction inequality follows. Assume $\Delta_S\neq 0$. Since $\Delta\in\mathcal{C}(S)$, the definition of the restricted eigenvalue constant $\kappa(S,1)$ gives
\begin{align*}
\kappa(S,1)
\leq
\frac{\|X\Delta\|_2}{\sqrt{n}\,\|\Delta_S\|_2},
\end{align*}
so
\begin{align*}
\|\Delta_S\|_2 \leq \frac{\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
Combining the estimates yields
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq
2\lambda\|\Delta\|_1
\leq
4\lambda\sqrt{s}\,\|\Delta_S\|_2
\leq
\frac{4\lambda\sqrt{s}\,\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
If $\|X\Delta\|_2=0$, the desired prediction bound follows. If $\|X\Delta\|_2>0$, divide by $\|X\Delta\|_2/\sqrt{n}$ to obtain
\begin{align*}
\frac{\|X\Delta\|_2}{\sqrt{n}}
\leq
\frac{4\lambda\sqrt{s}}{\kappa(S,1)}.
\end{align*}
Squaring both sides gives
\begin{align*}
\frac{\|X\Delta\|_2^2}{n}
\leq
\frac{16\lambda^2 s}{\kappa(S,1)^2}.
\end{align*}
[/guided]
[/step]
[step:Substitute the prediction estimate to obtain the $\ell^1$ estimate]
From the cone condition and the finite-dimensional Cauchy-Schwarz estimate on the support coordinates,
\begin{align*}
\|\Delta\|_1
\leq
2\sqrt{s}\,\|\Delta_S\|_2.
\end{align*}
If $\Delta_S=0$, then the cone condition gives $\|\Delta_{S^c}\|_1=0$, hence $\Delta=0$, and the claimed $\ell^1$ estimate follows. Assume therefore that $\Delta_S\neq 0$. Since $\Delta\in\mathcal{C}(S)$, the definition of $\kappa(S,1)$ gives
\begin{align*}
\|\Delta_S\|_2
\leq
\frac{\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
Combining the last two displays gives
\begin{align*}
\|\Delta\|_1
\leq
\frac{2\sqrt{s}\,\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
Using the prediction estimate
\begin{align*}
\frac{\|X\Delta\|_2}{\sqrt{n}}
\leq
\frac{4\lambda\sqrt{s}}{\kappa(S,1)},
\end{align*}
we obtain
\begin{align*}
\|\Delta\|_1
\leq
\frac{2\sqrt{s}}{\kappa(S,1)}
\cdot
\frac{4\lambda\sqrt{s}}{\kappa(S,1)}
=
\frac{8\lambda s}{\kappa(S,1)^2}.
\end{align*}
Together with the prediction bound, this proves both asserted inequalities.
[guided]
The prediction estimate already proved can now be fed back into the cone and restricted eigenvalue estimates to control $\|\Delta\|_1$. From the cone condition and the finite-dimensional Cauchy-Schwarz estimate on the support coordinates, we have
\begin{align*}
\|\Delta\|_1
\leq
2\sqrt{s}\,\|\Delta_S\|_2.
\end{align*}
If $\Delta_S=0$, then this already gives $\|\Delta\|_1=0$, so the desired $\ell^1$ bound holds. If $\Delta_S\neq 0$, the restricted eigenvalue estimate gives
\begin{align*}
\|\Delta_S\|_2
\leq
\frac{\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
Therefore
\begin{align*}
\|\Delta\|_1
\leq
\frac{2\sqrt{s}\,\|X\Delta\|_2}{\sqrt{n}\,\kappa(S,1)}.
\end{align*}
The prediction bound gives
\begin{align*}
\frac{\|X\Delta\|_2}{\sqrt{n}}
\leq
\frac{4\lambda\sqrt{s}}{\kappa(S,1)}.
\end{align*}
Substituting this into the preceding display yields
\begin{align*}
\|\Delta\|_1
\leq
\frac{2\sqrt{s}}{\kappa(S,1)}
\cdot
\frac{4\lambda\sqrt{s}}{\kappa(S,1)}
=
\frac{8\lambda s}{\kappa(S,1)^2}.
\end{align*}
Together with the prediction bound, this proves both asserted inequalities.
[/guided]
[/step]