[proofplan]
Set $\Delta = \hat{\beta}-\beta^*$ and use the optimality of $\hat{\beta}$ in the Lasso objective to obtain the basic inequality. The score event controls the stochastic [inner product](/page/Inner%20Product) by $\lambda\|\Delta\|_1/2$, while the support condition $\beta^*_{S^c}=0$ separates the $\ell^1$ penalty into $S$ and $S^c$. This simultaneously proves the cone condition $\Delta \in \mathcal{C}(S,3)$ and gives an upper bound for $|X\Delta|^2/n$ in terms of $\|\Delta_S\|_1$. The restricted eigenvalue lower bound then converts the prediction norm bound into a Euclidean bound for $\Delta$.
[/proofplan]
[step:Derive the sharpened basic inequality from Lasso optimality]
Define the estimation error vector
\begin{align*}
\Delta = \hat{\beta}-\beta^* \in \mathbb{R}^p .
\end{align*}
For any subset $A \subset \{1,\dots,p\}$, define $\Delta_A \in \mathbb{R}^A$ to be the coordinate restriction $(\Delta_j)_{j\in A}$, where $\mathbb{R}^A$ denotes the finite-dimensional real [vector space](/page/Vector%20Space) of functions from $A$ to $\mathbb{R}$.
Since $\hat{\beta}$ minimizes $Q_\lambda$ over $\mathbb{R}^p$, we have
\begin{align*}
\frac{1}{2n}|Y-X\hat{\beta}|^2 + \lambda\|\hat{\beta}\|_1
\leq
\frac{1}{2n}|Y-X\beta^*|^2 + \lambda\|\beta^*\|_1 .
\end{align*}
Using $Y=X\beta^*+\varepsilon$ and $\hat{\beta}=\beta^*+\Delta$, this becomes
\begin{align*}
\frac{1}{2n}|\varepsilon-X\Delta|^2 + \lambda\|\beta^*+\Delta\|_1
\leq
\frac{1}{2n}|\varepsilon|^2 + \lambda\|\beta^*\|_1 .
\end{align*}
Expanding the squared Euclidean norm gives
\begin{align*}
|\varepsilon-X\Delta|^2
=
|\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2 .
\end{align*}
Substituting this identity and cancelling $|\varepsilon|^2/(2n)$ yields
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{1}{n}\varepsilon^\top X\Delta
+
\lambda\bigl(\|\beta^*\|_1-\|\beta^*+\Delta\|_1\bigr).
\end{align*}
On the score event $\mathcal{E}_\lambda$, the vector $X^\top\varepsilon/n$ and the vector $\Delta$ both belong to the finite-dimensional space $\mathbb{R}^p$, so the finite-dimensional [Hölder inequality](/page/Holder%20Inequality) with conjugate exponents $\infty$ and $1$ applies:
\begin{align*}
\frac{1}{n}\varepsilon^\top X\Delta
=
\left(\frac{1}{n}X^\top\varepsilon\right)^\top \Delta
\leq
\left\|\frac{1}{n}X^\top\varepsilon\right\|_\infty \|\Delta\|_1
\leq
\frac{\lambda}{2}\|\Delta\|_1 .
\end{align*}
Because $\beta^*_{S^c}=0$, the support decomposition gives
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
=
\|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1-\|\Delta_{S^c}\|_1 .
\end{align*}
The vectors $\beta^*_S$ and $\Delta_S$ belong to the finite-dimensional coordinate space $\mathbb{R}^S$, so the reverse form of the [Triangle Inequality](/page/Triangle%20Inequality) for the $\ell^1$ norm on $\mathbb{R}^S$ gives
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
\leq
\|\Delta_S\|_1-\|\Delta_{S^c}\|_1 .
\end{align*}
Since $\|\Delta\|_1=\|\Delta_S\|_1+\|\Delta_{S^c}\|_1$, we obtain
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{\lambda}{2}\bigl(\|\Delta_S\|_1+\|\Delta_{S^c}\|_1\bigr)
+
\lambda\bigl(\|\Delta_S\|_1-\|\Delta_{S^c}\|_1\bigr).
\end{align*}
Collecting coefficients gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1
-
\frac{\lambda}{2}\|\Delta_{S^c}\|_1 .
\end{align*}
[guided]
We first convert the fact that $\hat{\beta}$ minimizes the Lasso objective into an inequality involving the error vector. Define
\begin{align*}
\Delta = \hat{\beta}-\beta^* \in \mathbb{R}^p .
\end{align*}
For any subset $A \subset \{1,\dots,p\}$, define $\Delta_A \in \mathbb{R}^A$ to be the coordinate restriction $(\Delta_j)_{j\in A}$, where $\mathbb{R}^A$ denotes the finite-dimensional real vector space of functions from $A$ to $\mathbb{R}$. In particular, $\Delta_S$ keeps the coordinates indexed by $S$, while $\Delta_{S^c}$ keeps the coordinates indexed by the complement $S^c$ in $\{1,\dots,p\}$.
The objective being minimized is the map $Q_\lambda: \mathbb{R}^p \to \mathbb{R}$ defined by sending each $\beta \in \mathbb{R}^p$ to the real number $Q_\lambda(\beta)$, where
\begin{align*}
Q_\lambda(\beta)
=
\frac{1}{2n}|Y-X\beta|^2+\lambda\|\beta\|_1 .
\end{align*}
Since $\hat{\beta}$ is a minimizer, comparing it with the admissible vector $\beta^*$ gives
\begin{align*}
\frac{1}{2n}|Y-X\hat{\beta}|^2 + \lambda\|\hat{\beta}\|_1
\leq
\frac{1}{2n}|Y-X\beta^*|^2 + \lambda\|\beta^*\|_1 .
\end{align*}
Now use the model equation $Y=X\beta^*+\varepsilon$ and the identity $\hat{\beta}=\beta^*+\Delta$. Then
\begin{align*}
Y-X\hat{\beta}
=
X\beta^*+\varepsilon-X(\beta^*+\Delta)
=
\varepsilon-X\Delta,
\end{align*}
while $Y-X\beta^*=\varepsilon$. Therefore
\begin{align*}
\frac{1}{2n}|\varepsilon-X\Delta|^2 + \lambda\|\beta^*+\Delta\|_1
\leq
\frac{1}{2n}|\varepsilon|^2 + \lambda\|\beta^*\|_1 .
\end{align*}
Expanding the square in the Euclidean inner product on $\mathbb{R}^n$ gives
\begin{align*}
|\varepsilon-X\Delta|^2
=
|\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2 .
\end{align*}
After substituting this expansion and cancelling $|\varepsilon|^2/(2n)$ from both sides, we get the basic inequality
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{1}{n}\varepsilon^\top X\Delta
+
\lambda\bigl(\|\beta^*\|_1-\|\beta^*+\Delta\|_1\bigr).
\end{align*}
The next task is to control the two terms on the right-hand side. The score event was designed exactly to control the stochastic inner product. Since
\begin{align*}
\frac{1}{n}\varepsilon^\top X\Delta
=
\left(\frac{1}{n}X^\top\varepsilon\right)^\top\Delta,
\end{align*}
the vector $X^\top\varepsilon/n$ and the vector $\Delta$ lie in the same finite-dimensional space $\mathbb{R}^p$. Therefore the finite-dimensional [Hölder inequality](/page/Holder%20Inequality) with conjugate exponents $\infty$ and $1$ gives
\begin{align*}
\frac{1}{n}\varepsilon^\top X\Delta
\leq
\left\|\frac{1}{n}X^\top\varepsilon\right\|_\infty\|\Delta\|_1 .
\end{align*}
On $\mathcal{E}_\lambda$, the first factor is at most $\lambda/2$, hence
\begin{align*}
\frac{1}{n}\varepsilon^\top X\Delta
\leq
\frac{\lambda}{2}\|\Delta\|_1 .
\end{align*}
It remains to exploit sparsity. Since $S=\operatorname{supp}(\beta^*)$, the vector $\beta^*$ has no coordinates outside $S$, so $\beta^*_{S^c}=0$. Therefore
\begin{align*}
\|\beta^*+\Delta\|_1
=
\|\beta^*_S+\Delta_S\|_1+\|\Delta_{S^c}\|_1 .
\end{align*}
The vectors $\beta^*_S$ and $\Delta_S$ lie in the finite-dimensional coordinate space $\mathbb{R}^S$. Applying the reverse form of the [Triangle Inequality](/page/Triangle%20Inequality) for the $\ell^1$ norm on this space gives
\begin{align*}
\|\beta^*_S+\Delta_S\|_1
\geq
\|\beta^*_S\|_1-\|\Delta_S\|_1 .
\end{align*}
Thus the support decomposition gives
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
=
\|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1-\|\Delta_{S^c}\|_1 .
\end{align*}
Combining this identity with the [reverse triangle inequality](/theorems/2300) gives
\begin{align*}
\|\beta^*\|_1-\|\beta^*+\Delta\|_1
\leq
\|\Delta_S\|_1-\|\Delta_{S^c}\|_1 .
\end{align*}
Finally, because $\Delta_S$ and $\Delta_{S^c}$ split the coordinates of $\Delta$,
\begin{align*}
\|\Delta\|_1=\|\Delta_S\|_1+\|\Delta_{S^c}\|_1 .
\end{align*}
Substituting the score bound and the penalty bound into the basic inequality gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{\lambda}{2}\bigl(\|\Delta_S\|_1+\|\Delta_{S^c}\|_1\bigr)
+
\lambda\bigl(\|\Delta_S\|_1-\|\Delta_{S^c}\|_1\bigr).
\end{align*}
Collecting the coefficients of $\|\Delta_S\|_1$ and $\|\Delta_{S^c}\|_1$ gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1
-
\frac{\lambda}{2}\|\Delta_{S^c}\|_1 .
\end{align*}
This is the sharpened basic inequality: the term on $S^c$ has a negative sign, and that sign is what forces the error vector into the restricted eigenvalue cone.
[/guided]
[/step]
[step:Use the sharpened inequality to place the error in the restricted eigenvalue cone]
From the sharpened basic inequality,
\begin{align*}
0
\leq
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1
-
\frac{\lambda}{2}\|\Delta_{S^c}\|_1 .
\end{align*}
Since $\lambda>0$, this implies
\begin{align*}
\|\Delta_{S^c}\|_1
\leq
3\|\Delta_S\|_1 .
\end{align*}
Hence $\Delta \in \mathcal{C}(S,3)$. Returning to the same sharpened inequality and discarding the non-positive term $-(\lambda/2)\|\Delta_{S^c}\|_1$, we also obtain
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1 .
\end{align*}
[guided]
The sharpened basic inequality contains both the prediction error and the two coordinate parts of the estimation error:
\begin{align*}
0
\leq
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1
-
\frac{\lambda}{2}\|\Delta_{S^c}\|_1 .
\end{align*}
The [first inequality](/theorems/2897) holds because $|X\Delta|^2$ is a squared Euclidean norm. Since $\lambda>0$, the right-hand side must be non-negative. Multiplying the resulting inequality by $2/\lambda$ gives
\begin{align*}
\|\Delta_{S^c}\|_1
\leq
3\|\Delta_S\|_1 .
\end{align*}
This is exactly the defining cone condition for $\mathcal{C}(S,3)$, so $\Delta \in \mathcal{C}(S,3)$.
We also need an upper bound for the prediction norm that only involves the coordinates on $S$. In the same sharpened inequality, the term $-(\lambda/2)\|\Delta_{S^c}\|_1$ is non-positive because $\lambda>0$ and $\|\Delta_{S^c}\|_1\geq 0$. Removing this non-positive term enlarges the right-hand side, so
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1 .
\end{align*}
This is the prediction bound that will be paired with the restricted eigenvalue lower bound.
[/guided]
[/step]
[step:Convert the prediction bound into a Euclidean bound using the restricted eigenvalue]
If $s=0$, then $S=\varnothing$. The cone condition proved above gives
\begin{align*}
\|\Delta\|_1=\|\Delta_{S^c}\|_1
\leq
3\|\Delta_S\|_1=0,
\end{align*}
so $\Delta=0$ and the desired estimate follows. Hence assume $s\geq 1$. If $\Delta=0$, then
\begin{align*}
|\hat{\beta}-\beta^*|=|\Delta|=0
\leq
\frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2},
\end{align*}
so assume $\Delta\neq 0$. Since $\Delta \in \mathcal{C}(S,3)$ and $\kappa_{\mathrm{RE}}(S,3)$ is the infimum of $|Xv|/(\sqrt{n}|v|)$ over nonzero vectors $v \in \mathcal{C}(S,3)$, we have
\begin{align*}
\frac{|X\Delta|}{\sqrt{n}|\Delta|}
\geq
\kappa_{\mathrm{RE}}(S,3),
\end{align*}
and therefore
\begin{align*}
\frac{1}{n}|X\Delta|^2
\geq
\kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2 .
\end{align*}
The vector $\Delta_S$ belongs to the finite-dimensional coordinate space $\mathbb{R}^S$, which has $s$ coordinates. Applying the finite-dimensional [Cauchy-Schwarz Inequality](/page/Cauchy-Schwarz%20Inequality) to the vectors $(|\Delta_j|)_{j\in S}$ and $(1)_{j\in S}$ gives
\begin{align*}
\|\Delta_S\|_1
\leq
\sqrt{s}\,|\Delta_S|
\leq
\sqrt{s}\,|\Delta|.
\end{align*}
Combining this with the upper bound from the preceding step gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\sqrt{s}\,|\Delta|.
\end{align*}
Using the restricted eigenvalue lower bound on the left-hand side,
\begin{align*}
\frac{1}{2}\kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2
\leq
\frac{3\lambda}{2}\sqrt{s}\,|\Delta|.
\end{align*}
Since $\Delta\neq 0$, divide by $|\Delta|/2$ to obtain
\begin{align*}
|\Delta|
\leq
\frac{3\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}.
\end{align*}
Because $3\leq 4$ and $\Delta=\hat{\beta}-\beta^*$, it follows that
\begin{align*}
|\hat{\beta}-\beta^*|
\leq
\frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}.
\end{align*}
This is the claimed Euclidean estimation bound.
[guided]
There is one endpoint case to remove before using the restricted eigenvalue definition. If $s=0$, then $S=\varnothing$. The cone condition from the preceding step gives
\begin{align*}
\|\Delta\|_1=\|\Delta_{S^c}\|_1
\leq
3\|\Delta_S\|_1=0.
\end{align*}
Thus $\Delta=0$, and the desired estimate follows. We may therefore assume $s\geq 1$.
If $\Delta=0$, then the desired estimate is immediate because
\begin{align*}
|\hat{\beta}-\beta^*|=|\Delta|=0
\leq
\frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}.
\end{align*}
Assume therefore that $\Delta\neq 0$. The preceding step proved that $\Delta \in \mathcal{C}(S,3)$, and the hypothesis $\kappa_{\mathrm{RE}}(S,3)>0$ says that the restricted eigenvalue is positive on this cone. By the definition of $\kappa_{\mathrm{RE}}(S,3)$ as the infimum of $|Xv|/(\sqrt{n}|v|)$ over nonzero vectors $v\in\mathcal{C}(S,3)$, applying the definition to $v=\Delta$ gives
\begin{align*}
\frac{|X\Delta|}{\sqrt{n}|\Delta|}
\geq
\kappa_{\mathrm{RE}}(S,3).
\end{align*}
Squaring both sides is valid because both sides are non-negative, and hence
\begin{align*}
\frac{1}{n}|X\Delta|^2
\geq
\kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2 .
\end{align*}
Next we convert the $\ell^1$ term from the prediction upper bound into a Euclidean norm. The vector $\Delta_S$ lies in the finite-dimensional coordinate space $\mathbb{R}^S$, which has $s$ coordinates. Applying the finite-dimensional [Cauchy-Schwarz Inequality](/page/Cauchy-Schwarz%20Inequality) to $(|\Delta_j|)_{j\in S}$ and $(1)_{j\in S}$ gives
\begin{align*}
\|\Delta_S\|_1
=
\sum_{j\in S}|\Delta_j|
\leq
\left(\sum_{j\in S}|\Delta_j|^2\right)^{1/2}
\left(\sum_{j\in S}1^2\right)^{1/2}
=
\sqrt{s}\,|\Delta_S|.
\end{align*}
Since restricting to the coordinates in $S$ cannot increase the Euclidean norm,
\begin{align*}
|\Delta_S|\leq |\Delta|,
\end{align*}
and therefore
\begin{align*}
\|\Delta_S\|_1
\leq
\sqrt{s}\,|\Delta|.
\end{align*}
The prediction upper bound from the previous step was
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\|\Delta_S\|_1 .
\end{align*}
Substituting the Cauchy-Schwarz estimate gives
\begin{align*}
\frac{1}{2n}|X\Delta|^2
\leq
\frac{3\lambda}{2}\sqrt{s}\,|\Delta|.
\end{align*}
Using the restricted eigenvalue lower bound on the left-hand side yields
\begin{align*}
\frac{1}{2}\kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2
\leq
\frac{3\lambda}{2}\sqrt{s}\,|\Delta|.
\end{align*}
Because $\Delta\neq 0$, we have $|\Delta|>0$, so division by $|\Delta|/2$ is valid. This gives
\begin{align*}
|\Delta|
\leq
\frac{3\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}.
\end{align*}
Finally, $3\leq 4$ and $\Delta=\hat{\beta}-\beta^*$, so
\begin{align*}
|\hat{\beta}-\beta^*|
\leq
\frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}.
\end{align*}
This is the claimed Euclidean estimation bound.
[/guided]
[/step]