[proofplan]
We write the estimation error as $\Delta := \hat{\beta}-\beta^*$ and use the Lasso optimality inequality at $\hat{\beta}$ against the competitor $\beta^*$. On the event $\mathcal{E}_\lambda$, the stochastic [inner product](/page/Inner%20Product) is controlled by the $\ell_1$ norm of $\Delta$, which yields both the cone condition and the sharpened prediction bound. The restricted eigenvalue assumption then converts the prediction bound into an Euclidean error bound. Finally, the cone condition converts the Euclidean bound into the $\ell_1$ bound.
[/proofplan]
custom_env
admin
[step:Derive the basic inequality from Lasso optimality]Define the estimation error $\Delta \in \mathbb{R}^p$ by
\begin{align*}
\Delta := \hat{\beta}-\beta^*.
\end{align*}
Since $\hat{\beta}$ minimizes the Lasso objective over $\mathbb{R}^p$, comparison with $\beta^*$ gives
\begin{align*}
\frac{1}{n}|Y-X\hat{\beta}|^2 + 2\lambda\|\hat{\beta}\|_1
\leq
\frac{1}{n}|Y-X\beta^*|^2 + 2\lambda\|\beta^*\|_1.
\end{align*}
Using $Y=X\beta^*+\varepsilon$ and $\hat{\beta}=\beta^*+\Delta$, we have
\begin{align*}
Y-X\hat{\beta}=\varepsilon-X\Delta.
\end{align*}
Substituting this identity and expanding the Euclidean square gives
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
\frac{2}{n}\varepsilon^\top X\Delta
+
2\lambda\bigl(\|\beta^*\|_1-\|\beta^*+\Delta\|_1\bigr).
\end{align*}[/step]
custom_env
admin
[guided]The first task is to turn the variational definition of $\hat{\beta}$ into an inequality for the error vector. Define
\begin{align*}
\Delta := \hat{\beta}-\beta^*.
\end{align*}
The Lasso solution $\hat{\beta}$ minimizes the function $Q:\mathbb{R}^p \to \mathbb{R}$ defined, for each $\beta \in \mathbb{R}^p$, by
\begin{align*}
Q(\beta):=\frac{1}{n}|Y-X\beta|^2 + 2\lambda\|\beta\|_1.
\end{align*}
Therefore $Q(\hat{\beta}) \leq Q(\beta^*)$, meaning
\begin{align*}
\frac{1}{n}|Y-X\hat{\beta}|^2 + 2\lambda\|\hat{\beta}\|_1
\leq
\frac{1}{n}|Y-X\beta^*|^2 + 2\lambda\|\beta^*\|_1.
\end{align*}
Since $Y=X\beta^*+\varepsilon$ and $\hat{\beta}=\beta^*+\Delta$, the residual at $\hat{\beta}$ is
\begin{align*}
Y-X\hat{\beta}
=
X\beta^*+\varepsilon-X(\beta^*+\Delta)
=
\varepsilon-X\Delta.
\end{align*}
Thus
\begin{align*}
\frac{1}{n}|\varepsilon-X\Delta|^2 + 2\lambda\|\beta^*+\Delta\|_1
\leq
\frac{1}{n}|\varepsilon|^2 + 2\lambda\|\beta^*\|_1.
\end{align*}
Expanding
\begin{align*}
|\varepsilon-X\Delta|^2
=
|\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2
\end{align*}
and cancelling $|\varepsilon|^2/n$ from both sides gives
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
\frac{2}{n}\varepsilon^\top X\Delta
+
2\lambda\bigl(\|\beta^*\|_1-\|\beta^*+\Delta\|_1\bigr).
\end{align*}
This is the basic inequality. It is the only place where the minimizing property of $\hat{\beta}$ is used.[/guided]
custom_env
admin
[step:Control the stochastic term and force the error into the restricted eigenvalue cone]By the definition of $\mathcal{E}_\lambda$, we have $(1/n)|X^\top\varepsilon|_\infty \leq \lambda/2$. On this event, [Hölder's inequality](/page/Holder%20Inequality) for the dual pair $(\ell_\infty^p,\ell_1^p)$ gives
\begin{align*}
\frac{2}{n}\varepsilon^\top X\Delta
\leq
\frac{2}{n}|X^\top\varepsilon|_\infty\|\Delta\|_1
\leq
\lambda\|\Delta\|_1.
\end{align*}
Because $\operatorname{supp}(\beta^*)=S$, we have $\beta^*_{S^c}=0$. Decomposing the $\ell_1$ norm over $S$ and $S^c$ gives
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
=
\|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1-\|\Delta_{S^c}\|_1.
\end{align*}
The [reverse triangle inequality](/theorems/2300) on the coordinates in $S$ therefore yields
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
\leq
\|\Delta_S\|_1-\|\Delta_{S^c}\|_1.
\end{align*}
Substituting both estimates into the basic inequality yields
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
\lambda\|\Delta\|_1
+
2\lambda\|\Delta_S\|_1
-
2\lambda\|\Delta_{S^c}\|_1.
\end{align*}
Since $\|\Delta\|_1=\|\Delta_S\|_1+\|\Delta_{S^c}\|_1$, this becomes
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
3\lambda\|\Delta_S\|_1-\lambda\|\Delta_{S^c}\|_1.
\end{align*}
The left-hand side is non-negative, so
\begin{align*}
\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1.
\end{align*}
Thus $\Delta$ lies in the restricted eigenvalue cone. Also,
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
3\lambda\|\Delta_S\|_1.
\end{align*}[/step]
custom_env
admin
[guided]The event $\mathcal{E}_\lambda$ is designed exactly to control the random correlation between the noise vector and each column of the design matrix. By definition,
\begin{align*}
\mathcal{E}_\lambda = \left\{\frac{1}{n}|X^\top\varepsilon|_\infty \leq \frac{\lambda}{2}\right\}.
\end{align*}
Therefore, on $\mathcal{E}_\lambda$, the dual pairing between $X^\top\varepsilon$ and $\Delta$ is bounded by [Hölder's inequality](/page/Holder%20Inequality) for $(\ell_\infty^p,\ell_1^p)$:
\begin{align*}
\frac{2}{n}\varepsilon^\top X\Delta
=
\frac{2}{n}(X^\top\varepsilon)^\top\Delta
\leq
\frac{2}{n}|X^\top\varepsilon|_\infty\|\Delta\|_1
\leq
\lambda\|\Delta\|_1.
\end{align*}
The next point is to make the sparsity of $\beta^*$ visible in the penalty difference. Since $S=\operatorname{supp}(\beta^*)$, the vector $\beta^*$ vanishes on $S^c$, so $\beta^*_{S^c}=0$. Splitting the $\ell_1$ norm across the two coordinate sets gives
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
=
\|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1-\|\Delta_{S^c}\|_1.
\end{align*}
On $S$, the reverse triangle inequality gives
\begin{align*}
\|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1
\leq
\|\Delta_S\|_1.
\end{align*}
Hence
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
\leq
\|\Delta_S\|_1-\|\Delta_{S^c}\|_1.
\end{align*}
Substituting this penalty estimate and the stochastic estimate into the basic inequality gives
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
\lambda\|\Delta\|_1+2\lambda\|\Delta_S\|_1-2\lambda\|\Delta_{S^c}\|_1.
\end{align*}
Because the supports $S$ and $S^c$ are disjoint,
\begin{align*}
\|\Delta\|_1=\|\Delta_S\|_1+\|\Delta_{S^c}\|_1,
\end{align*}
so the right-hand side becomes
\begin{align*}
3\lambda\|\Delta_S\|_1-\lambda\|\Delta_{S^c}\|_1.
\end{align*}
Thus
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
3\lambda\|\Delta_S\|_1-\lambda\|\Delta_{S^c}\|_1.
\end{align*}
The left-hand side is a squared Euclidean norm divided by $n$, and is therefore non-negative. Consequently the right-hand side must be non-negative, which forces
\begin{align*}
\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1.
\end{align*}
This is the restricted eigenvalue cone condition. Dropping the non-positive term $-\lambda\|\Delta_{S^c}\|_1$ from the preceding upper bound also gives the sharpened prediction estimate
\begin{align*}
\frac{1}{n}|X\Delta|^2
\leq
3\lambda\|\Delta_S\|_1.
\end{align*}[/guided]
custom_env
admin
[step:Apply the restricted eigenvalue condition to obtain the Euclidean rate]
Since $\Delta$ satisfies
\begin{align*}
\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1,
\end{align*}
the restricted eigenvalue hypothesis applies to $\Delta$ and gives
\begin{align*}
\kappa_{\mathrm{full}}(S,3)^2|\Delta|^2
\leq
\frac{1}{n}|X\Delta|^2.
\end{align*}
Combining this lower bound with the sharpened prediction bound gives
\begin{align*}
\kappa_{\mathrm{full}}(S,3)^2|\Delta|^2
\leq
3\lambda\|\Delta_S\|_1.
\end{align*}
By the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) in $\mathbb{R}^s$ applied to the vector $\Delta_S$, we have
\begin{align*}
\|\Delta_S\|_1 \leq \sqrt{s}\,|\Delta_S| \leq \sqrt{s}\,|\Delta|.
\end{align*}
Therefore
\begin{align*}
\kappa_{\mathrm{full}}(S,3)^2|\Delta|^2
\leq
3\lambda\sqrt{s}\,|\Delta|.
\end{align*}
If $\Delta=0$, the desired Euclidean bound is immediate. If $\Delta\neq 0$, dividing by $|\Delta|$ gives
\begin{align*}
|\Delta|
\leq
\frac{3\lambda\sqrt{s}}{\kappa_{\mathrm{full}}(S,3)^2}.
\end{align*}
Squaring yields
\begin{align*}
|\hat{\beta}-\beta^*|^2
=
|\Delta|^2
\leq
\frac{9\lambda^2s}{\kappa_{\mathrm{full}}(S,3)^4}
\leq
\frac{12\lambda^2s}{\kappa_{\mathrm{full}}(S,3)^4}.
\end{align*}
[/step]
custom_env
admin
[step:Use the cone condition to convert the Euclidean rate into the $\ell_1$ rate]
From the cone condition,
\begin{align*}
\|\Delta\|_1
=
\|\Delta_S\|_1+\|\Delta_{S^c}\|_1
\leq
4\|\Delta_S\|_1.
\end{align*}
Applying the [Cauchy-Schwarz inequality](/page/Cauchy-Schwarz%20Inequality) in $\mathbb{R}^s$ to $\Delta_S$ gives
\begin{align*}
\|\Delta_S\|_1 \leq \sqrt{s}\,|\Delta_S| \leq \sqrt{s}\,|\Delta|.
\end{align*}
Using the Euclidean estimate from the previous step,
\begin{align*}
\|\Delta\|_1
\leq
4\sqrt{s}\,|\Delta|.
\end{align*}
Substituting $|\Delta| \leq 3\lambda\sqrt{s}/\kappa_{\mathrm{full}}(S,3)^2$ gives
\begin{align*}
\|\Delta\|_1
\leq
\frac{12\lambda s}{\kappa_{\mathrm{full}}(S,3)^2}.
\end{align*}
Since $\Delta=\hat{\beta}-\beta^*$, this proves
\begin{align*}
\|\hat{\beta}-\beta^*\|_1
\leq
\frac{12\lambda s}{\kappa_{\mathrm{full}}(S,3)^2}.
\end{align*}
Together with the Euclidean estimate $|\hat{\beta}-\beta^*|^2 \leq 9\lambda^2s/\kappa_{\mathrm{full}}(S,3)^4$, this proves both displayed estimates after choosing the single universal constant $C=12$ for the two inequalities.
[/step]