[proofplan]
We compare the Lasso objective at the minimizer $\hat{\beta}$ with its value at the true coefficient vector $\beta^*$. After substituting $Y=X\beta^*+\varepsilon$, the comparison gives a deterministic basic inequality involving the prediction error, the score vector $X^\top\varepsilon/n$, and the difference of $\ell^1$ penalties. On the score event, the stochastic term is controlled by the $\ell^\infty$-$\ell^1$ duality estimate, and the remaining penalty terms are rearranged using the triangle inequality.
[/proofplan]
[step:Use optimality of $\hat{\beta}$ to derive the basic inequality]
Define the estimation error vector $\delta \in \mathbb{R}^p$ by
\begin{align*}
\delta := \hat{\beta}-\beta^* .
\end{align*}
Since $\hat{\beta}$ minimizes the Lasso objective over $\mathbb{R}^p$, evaluating the objective at the comparison point $\beta^*$ gives
\begin{align*}
\frac{1}{2n}|Y-X\hat{\beta}|^2+\lambda\|\hat{\beta}\|_1
\leq
\frac{1}{2n}|Y-X\beta^*|^2+\lambda\|\beta^*\|_1 .
\end{align*}
Using $Y=X\beta^*+\varepsilon$ and $\delta=\hat{\beta}-\beta^*$, we have
\begin{align*}
Y-X\hat{\beta}
=
\varepsilon-X\delta,
\qquad
Y-X\beta^*
=
\varepsilon .
\end{align*}
Substituting these identities gives
\begin{align*}
\frac{1}{2n}|\varepsilon-X\delta|^2+\lambda\|\hat{\beta}\|_1
\leq
\frac{1}{2n}|\varepsilon|^2+\lambda\|\beta^*\|_1 .
\end{align*}
Expanding the Euclidean square,
\begin{align*}
|\varepsilon-X\delta|^2
=
|\varepsilon|^2
-2\varepsilon^\top X\delta
+
|X\delta|^2 .
\end{align*}
After cancelling $|\varepsilon|^2/(2n)$ from both sides, we obtain
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{1}{n}\varepsilon^\top X\delta
+
\lambda\|\beta^*\|_1
-
\lambda\|\hat{\beta}\|_1 .
\end{align*}
[guided]
The point of starting from optimality is that it gives an inequality without differentiability assumptions or uniqueness of the minimizer. Since $\hat{\beta}$ is a minimizer of
\begin{align*}
\beta \mapsto \frac{1}{2n}|Y-X\beta|^2+\lambda\|\beta\|_1 ,
\end{align*}
its objective value is no larger than the value at any competitor. We choose the specific competitor $\beta^*$, because the model equation gives an exact simplification there:
\begin{align*}
\frac{1}{2n}|Y-X\hat{\beta}|^2+\lambda\|\hat{\beta}\|_1
\leq
\frac{1}{2n}|Y-X\beta^*|^2+\lambda\|\beta^*\|_1 .
\end{align*}
Define $\delta := \hat{\beta}-\beta^*$. The model equation $Y=X\beta^*+\varepsilon$ gives
\begin{align*}
Y-X\hat{\beta}
=
X\beta^*+\varepsilon-X\hat{\beta}
=
\varepsilon-X(\hat{\beta}-\beta^*)
=
\varepsilon-X\delta,
\end{align*}
and also
\begin{align*}
Y-X\beta^*=\varepsilon .
\end{align*}
Therefore
\begin{align*}
\frac{1}{2n}|\varepsilon-X\delta|^2+\lambda\|\hat{\beta}\|_1
\leq
\frac{1}{2n}|\varepsilon|^2+\lambda\|\beta^*\|_1 .
\end{align*}
Now expand the squared Euclidean norm. For vectors $a,b \in \mathbb{R}^n$,
\begin{align*}
|a-b|^2=|a|^2-2a^\top b+|b|^2 .
\end{align*}
With $a=\varepsilon$ and $b=X\delta$, this gives
\begin{align*}
|\varepsilon-X\delta|^2
=
|\varepsilon|^2-2\varepsilon^\top X\delta+|X\delta|^2 .
\end{align*}
Substituting this expansion and cancelling the common term $|\varepsilon|^2/(2n)$ yields
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{1}{n}\varepsilon^\top X\delta
+
\lambda\|\beta^*\|_1
-
\lambda\|\hat{\beta}\|_1 .
\end{align*}
This is the basic inequality: the left-hand side is the prediction error, while the right-hand side contains one stochastic score term and one deterministic penalty difference.
[/guided]
[/step]
[step:Control the score term on $\mathcal{E}_{\lambda}$]
Define the score vector $s \in \mathbb{R}^p$ by
\begin{align*}
s := \frac{1}{n}X^\top\varepsilon .
\end{align*}
Then
\begin{align*}
\frac{1}{n}\varepsilon^\top X\delta
=
s^\top\delta .
\end{align*}
Using the finite-dimensional [$\ell^\infty$-$\ell^1$ duality estimate](/page/Dual%20Norm),
\begin{align*}
s^\top\delta
\leq
|s^\top\delta|
\leq
\sum_{j=1}^p |s_j|\,|\delta_j|
\leq
\|s\|_\infty \|\delta\|_1 .
\end{align*}
On $\mathcal{E}_{\lambda}$, $\|s\|_\infty \leq \lambda/2$, so
\begin{align*}
\frac{1}{n}\varepsilon^\top X\delta
\leq
\frac{\lambda}{2}\|\delta\|_1 .
\end{align*}
Substituting this into the basic inequality gives
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{\lambda}{2}\|\delta\|_1
+
\lambda\|\beta^*\|_1
-
\lambda\|\hat{\beta}\|_1 .
\end{align*}
[guided]
We isolate the stochastic term by naming the score vector. Define $s \in \mathbb{R}^p$ by
\begin{align*}
s := \frac{1}{n}X^\top\varepsilon .
\end{align*}
Then associativity of matrix multiplication and the definition of $s$ give
\begin{align*}
\frac{1}{n}\varepsilon^\top X\delta
=
\left(\frac{1}{n}X^\top\varepsilon\right)^\top\delta
=
s^\top\delta .
\end{align*}
We now apply the finite-dimensional [$\ell^\infty$-$\ell^1$ duality estimate](/page/Dual%20Norm) to the vectors $s,\delta \in \mathbb{R}^p$. This estimate is the coordinate inequality obtained by bounding each coefficient of $s$ by its maximum absolute value:
\begin{align*}
s^\top\delta
\leq
|s^\top\delta|
\leq
\sum_{j=1}^p |s_j|\,|\delta_j|
\leq
\|s\|_\infty \sum_{j=1}^p |\delta_j|
=
\|s\|_\infty \|\delta\|_1 .
\end{align*}
The score event $\mathcal{E}_{\lambda}$ is precisely the event on which $\|s\|_\infty \leq \lambda/2$. Therefore, on $\mathcal{E}_{\lambda}$,
\begin{align*}
\frac{1}{n}\varepsilon^\top X\delta
\leq
\frac{\lambda}{2}\|\delta\|_1 .
\end{align*}
Substituting this bound into the basic inequality from the previous step gives
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{\lambda}{2}\|\delta\|_1
+
\lambda\|\beta^*\|_1
-
\lambda\|\hat{\beta}\|_1 .
\end{align*}
This is the point where randomness leaves the argument: after restricting to $\mathcal{E}_{\lambda}$, the remaining estimate is deterministic.
[/guided]
[/step]
[step:Absorb the penalty terms using the triangle inequality]
The [triangle inequality](/page/Triangle%20Inequality) in $\mathbb{R}^p$ gives
\begin{align*}
\|\delta\|_1
=
\|\hat{\beta}-\beta^*\|_1
\leq
\|\hat{\beta}\|_1+\|\beta^*\|_1 .
\end{align*}
Therefore substituting the triangle inequality estimate into the previous bound gives
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{\lambda}{2}\left(\|\hat{\beta}\|_1+\|\beta^*\|_1\right)
+
\lambda\|\beta^*\|_1
-
\lambda\|\hat{\beta}\|_1 .
\end{align*}
Collecting the coefficients of $\|\hat{\beta}\|_1$ and $\|\beta^*\|_1$, this becomes
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{3\lambda}{2}\|\beta^*\|_1
-
\frac{\lambda}{2}\|\hat{\beta}\|_1 .
\end{align*}
Since $\lambda>0$ and $\|\hat{\beta}\|_1 \geq 0$, the last term is non-positive, hence
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{3\lambda}{2}\|\beta^*\|_1 .
\end{align*}
Multiplying by $2$ yields the sharper bound
\begin{align*}
\frac{1}{n}|X(\hat{\beta}-\beta^*)|^2
\leq
3\lambda\|\beta^*\|_1 .
\end{align*}
Since $3\lambda\|\beta^*\|_1 \leq 4\lambda\|\beta^*\|_1$, the stated inequality follows:
\begin{align*}
\frac{1}{n}|X(\hat{\beta}-\beta^*)|^2
\leq
4\lambda\|\beta^*\|_1 .
\end{align*}
[guided]
We now remove the unknown error norm $\|\delta\|_1$ from the bound by comparing it to the two coefficient vectors already present. The [triangle inequality](/page/Triangle%20Inequality) in $\mathbb{R}^p$ gives
\begin{align*}
\|\delta\|_1
=
\|\hat{\beta}-\beta^*\|_1
\leq
\|\hat{\beta}\|_1+\|\beta^*\|_1 .
\end{align*}
Substituting this into the estimate obtained after controlling the score term yields
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{\lambda}{2}\left(\|\hat{\beta}\|_1+\|\beta^*\|_1\right)
+
\lambda\|\beta^*\|_1
-
\lambda\|\hat{\beta}\|_1 .
\end{align*}
Now collect the coefficients of the two penalty norms. The coefficient of $\|\hat{\beta}\|_1$ is $\lambda/2-\lambda=-\lambda/2$, and the coefficient of $\|\beta^*\|_1$ is $\lambda/2+\lambda=3\lambda/2$. Hence
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{3\lambda}{2}\|\beta^*\|_1
-
\frac{\lambda}{2}\|\hat{\beta}\|_1 .
\end{align*}
Because $\lambda>0$ and $\|\hat{\beta}\|_1 \geq 0$, the term $-(\lambda/2)\|\hat{\beta}\|_1$ is non-positive. Dropping a non-positive term from the right-hand side preserves the upper bound:
\begin{align*}
\frac{1}{2n}|X\delta|^2
\leq
\frac{3\lambda}{2}\|\beta^*\|_1 .
\end{align*}
Multiplying both sides by $2$ gives
\begin{align*}
\frac{1}{n}|X\delta|^2
\leq
3\lambda\|\beta^*\|_1 .
\end{align*}
Finally, since $\delta=\hat{\beta}-\beta^*$ and $3\lambda\|\beta^*\|_1 \leq 4\lambda\|\beta^*\|_1$, we obtain the stated slow-rate prediction inequality:
\begin{align*}
\frac{1}{n}|X(\hat{\beta}-\beta^*)|^2
\leq
4\lambda\|\beta^*\|_1 .
\end{align*}
[/guided]
[/step]