[proofplan]
We compare the Lasso objective at $\hat{\beta}$ and at the true vector $\beta^*$ to obtain the basic inequality. The score bound controls the stochastic [inner product](/page/Inner%20Product), while the support condition $\operatorname{supp}(\beta^*) \subset S$ separates the $\ell^1$ penalty over $S$ and $S^c$. This yields the cone condition $|\Delta_{S^c}|_1 \leq 3|\Delta_S|_1$ for the error $\Delta := \hat{\beta}-\beta^*$, allowing the restricted eigenvalue condition to convert prediction error into control of $|\Delta_S|_2$. The prediction bound follows by rearrangement, and the $\ell^1$ bound follows by combining the cone condition with the same restricted eigenvalue estimate.
[/proofplan]
[step:Derive the basic inequality from optimality of the Lasso estimator]
Define the estimation error $\Delta \in \mathbb{R}^p$ by
\begin{align*}
\Delta := \hat{\beta} - \beta^*.
\end{align*}
Since $\hat{\beta}$ minimizes the Lasso objective over $\mathbb{R}^p$, evaluating the objective at $\hat{\beta}$ and at $\beta^*$ gives
\begin{align*}
\frac{1}{2n}|Y - X\hat{\beta}|^2 + \lambda |\hat{\beta}|_1
\leq
\frac{1}{2n}|Y - X\beta^*|^2 + \lambda |\beta^*|_1.
\end{align*}
Using $Y = X\beta^* + \varepsilon$ and $\hat{\beta} = \beta^* + \Delta$, we have
\begin{align*}
Y - X\hat{\beta}
=
\varepsilon - X\Delta,
\qquad
Y - X\beta^*
=
\varepsilon.
\end{align*}
Substitution gives
\begin{align*}
\frac{1}{2n}|\varepsilon - X\Delta|^2 + \lambda |\beta^*+\Delta|_1
\leq
\frac{1}{2n}|\varepsilon|^2 + \lambda |\beta^*|_1.
\end{align*}
Expanding the square,
\begin{align*}
|\varepsilon - X\Delta|^2
=
|\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2.
\end{align*}
After cancellation of $|\varepsilon|^2/(2n)$, this becomes
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{\varepsilon^\top X\Delta}{n}
+
\lambda\bigl(|\beta^*|_1 - |\beta^*+\Delta|_1\bigr).
\end{align*}
[/step]
[step:Use the score bound and support decomposition to place the error in the cone]
The score bound gives
\begin{align*}
\frac{\varepsilon^\top X\Delta}{n}
=
\left(\frac{X^\top \varepsilon}{n}\right)^\top \Delta
\leq
\left|\frac{X^\top \varepsilon}{n}\right|_\infty |\Delta|_1
\leq
\frac{\lambda}{2}|\Delta|_1.
\end{align*}
Since $\operatorname{supp}(\beta^*) \subset S$, one has $\beta^*_{S^c}=0$. Therefore,
\begin{align*}
|\beta^*|_1 - |\beta^*+\Delta|_1
=
|\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 - |\Delta_{S^c}|_1.
\end{align*}
The [reverse triangle inequality](/theorems/2300) on $\mathbb{R}^{|S|}$ applied to $\beta^*_S$ and $\beta^*_S+\Delta_S$ gives
\begin{align*}
|\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 \leq |\Delta_S|_1.
\end{align*}
Hence
\begin{align*}
|\beta^*|_1 - |\beta^*+\Delta|_1
\leq
|\Delta_S|_1 - |\Delta_{S^c}|_1.
\end{align*}
Combining these estimates with the basic inequality gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{\lambda}{2}\bigl(|\Delta_S|_1 + |\Delta_{S^c}|_1\bigr)
+
\lambda\bigl(|\Delta_S|_1 - |\Delta_{S^c}|_1\bigr).
\end{align*}
Collecting the coefficients of $|\Delta_S|_1$ and $|\Delta_{S^c}|_1$ yields
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}|\Delta_S|_1 - \frac{\lambda}{2}|\Delta_{S^c}|_1.
\end{align*}
Since the left-hand side is non-negative and $\lambda > 0$, it follows that
\begin{align*}
|\Delta_{S^c}|_1 \leq 3|\Delta_S|_1.
\end{align*}
[guided]
The goal of this step is to prove that the Lasso error $\Delta$ lies in the cone required by the restricted eigenvalue hypothesis. We begin with the stochastic term from the basic inequality. Because
\begin{align*}
\frac{\varepsilon^\top X\Delta}{n}
=
\left(\frac{X^\top \varepsilon}{n}\right)^\top \Delta,
\end{align*}
we can apply the finite-dimensional duality inequality between $|\cdot|_\infty$ and $|\cdot|_1$:
\begin{align*}
\left(\frac{X^\top \varepsilon}{n}\right)^\top \Delta
\leq
\left|\frac{X^\top \varepsilon}{n}\right|_\infty |\Delta|_1.
\end{align*}
The score assumption then gives
\begin{align*}
\frac{\varepsilon^\top X\Delta}{n}
\leq
\frac{\lambda}{2}|\Delta|_1
=
\frac{\lambda}{2}\bigl(|\Delta_S|_1 + |\Delta_{S^c}|_1\bigr).
\end{align*}
Next we use sparsity. Since $\operatorname{supp}(\beta^*) \subset S$, the vector $\beta^*$ vanishes on $S^c$, so $\beta^*_{S^c}=0$. Hence the $\ell^1$ penalty separates as
\begin{align*}
|\beta^*+\Delta|_1
=
|\beta^*_S+\Delta_S|_1 + |\Delta_{S^c}|_1.
\end{align*}
Also $|\beta^*|_1 = |\beta^*_S|_1$. Therefore
\begin{align*}
|\beta^*|_1 - |\beta^*+\Delta|_1
=
|\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 - |\Delta_{S^c}|_1.
\end{align*}
By the reverse triangle inequality on the coordinates in $S$,
\begin{align*}
|\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 \leq |\Delta_S|_1.
\end{align*}
Thus
\begin{align*}
|\beta^*|_1 - |\beta^*+\Delta|_1
\leq
|\Delta_S|_1 - |\Delta_{S^c}|_1.
\end{align*}
Substituting the stochastic estimate and the penalty estimate into the basic inequality yields
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{\lambda}{2}\bigl(|\Delta_S|_1 + |\Delta_{S^c}|_1\bigr)
+
\lambda\bigl(|\Delta_S|_1 - |\Delta_{S^c}|_1\bigr).
\end{align*}
Collecting terms gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}|\Delta_S|_1 - \frac{\lambda}{2}|\Delta_{S^c}|_1.
\end{align*}
The left-hand side is non-negative because it is a squared Euclidean norm divided by $2n$. Hence the right-hand side must be non-negative:
\begin{align*}
0
\leq
\frac{3\lambda}{2}|\Delta_S|_1 - \frac{\lambda}{2}|\Delta_{S^c}|_1.
\end{align*}
Since $\lambda > 0$, multiplying by $2/\lambda$ gives the cone condition
\begin{align*}
|\Delta_{S^c}|_1 \leq 3|\Delta_S|_1.
\end{align*}
This is exactly the structural condition needed to invoke the restricted eigenvalue hypothesis.
[/guided]
[/step]
[step:Apply the restricted eigenvalue condition to bound the prediction error]
The preceding step proves that $\Delta$ satisfies
\begin{align*}
|\Delta_{S^c}|_1 \leq 3|\Delta_S|_1.
\end{align*}
By the restricted eigenvalue condition on $(S,3)$,
\begin{align*}
\frac{1}{n}|X\Delta|^2 \geq \kappa^2 |\Delta_S|_2^2.
\end{align*}
Also, since $\Delta_S$ is supported on at most $s$ coordinates, the [Cauchy-Schwarz inequality](/theorems/432) in $\mathbb{R}^s$ gives
\begin{align*}
|\Delta_S|_1 \leq \sqrt{s}\,|\Delta_S|_2.
\end{align*}
From the sharpened basic inequality in the previous step,
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
3\lambda |\Delta_S|_1.
\end{align*}
Combining the last three estimates gives
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
3\lambda \sqrt{s}\,|\Delta_S|_2
\leq
3\lambda \sqrt{s}\,\frac{|X\Delta|}{\sqrt{n}\,\kappa}.
\end{align*}
If $|X\Delta|=0$, the desired prediction bound holds. If $|X\Delta|>0$, divide by $|X\Delta|/\sqrt{n}$ to obtain
\begin{align*}
\frac{|X\Delta|}{\sqrt{n}}
\leq
\frac{3\lambda\sqrt{s}}{\kappa}.
\end{align*}
Squaring both sides yields
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
9\frac{s\lambda^2}{\kappa^2}.
\end{align*}
[/step]
[step:Convert the prediction bound into the $\ell^1$ estimation bound]
The cone condition gives
\begin{align*}
|\Delta|_1
=
|\Delta_S|_1 + |\Delta_{S^c}|_1
\leq
4|\Delta_S|_1.
\end{align*}
Using Cauchy-Schwarz on the coordinates in $S$,
\begin{align*}
|\Delta|_1
\leq
4\sqrt{s}\,|\Delta_S|_2.
\end{align*}
Since $\Delta$ satisfies the restricted eigenvalue cone condition,
\begin{align*}
|\Delta_S|_2
\leq
\frac{|X\Delta|}{\sqrt{n}\,\kappa}.
\end{align*}
The prediction estimate from the previous step gives
\begin{align*}
\frac{|X\Delta|}{\sqrt{n}}
\leq
\frac{3\lambda\sqrt{s}}{\kappa}.
\end{align*}
Therefore,
\begin{align*}
|\Delta|_1
\leq
4\sqrt{s}\,\frac{1}{\kappa}\frac{|X\Delta|}{\sqrt{n}}
\leq
4\sqrt{s}\,\frac{1}{\kappa}\frac{3\lambda\sqrt{s}}{\kappa}
=
12\frac{s\lambda}{\kappa^2}.
\end{align*}
Since $\Delta = \hat{\beta}-\beta^*$, the two displayed estimates are precisely
\begin{align*}
\frac{1}{n}|X(\hat{\beta}-\beta^*)|^2 \leq 9\frac{s\lambda^2}{\kappa^2}
\end{align*}
and
\begin{align*}
|\hat{\beta}-\beta^*|_1 \leq 12\frac{s\lambda}{\kappa^2}.
\end{align*}
[/step]