[proofplan]
We use the fact that a global minimizer of $\Phi$ remains a minimizer when either variable is held fixed at its optimal value. First, fixing $\sigma=\hat{\sigma}$ and multiplying by the positive constant $\hat{\sigma}$ converts the $\beta$-subproblem into an ordinary Lasso objective with penalty level $\hat{\sigma}\lambda_0$. This step uses only $\hat\sigma>0$ and does not require $\lambda_0\ge 0$. Second, fixing $\beta=\hat{\beta}$ reduces the $\sigma$-subproblem to a convex minimization problem in one variable. Setting the derivative to zero gives the fixed point relation for $\hat{\sigma}$. The zero-residual case is excluded for any attained minimizer with positive scale, because the one-variable objective then decreases strictly to its unattained infimum $0$ as $\sigma \downarrow 0$.
[/proofplan]
[step:Declare the scaled Lasso objective and norm conventions]
Let $\Phi: \mathbb R^p\times(0,\infty)\to\mathbb R$ denote the scaled Lasso objective defined by
\begin{align*}
\Phi(\beta,\sigma)=\frac{\|Y-X\beta\|_2^2}{2n\sigma}+\frac{\sigma}{2}+\lambda_0\|\beta\|_1.
\end{align*}
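Here $Y\in\mathbb R^n$, $X\in\mathbb R^{n\times p}$, and $\lambda_0\in\mathbb R$ is the penalty level. The symbol $\|\cdot\|_2$ denotes the Euclidean norm on $\mathbb R^n$, and $\|\cdot\|_1$ denotes the $\ell^1$ norm on $\mathbb R^p$.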
[/step]
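A direct evaluation of $\Phi$ is useful for checking the steps below numerically. The following is a minimal sketch, assuming dense data stored as Python lists of floats. The function names `residual` and `scaled_lasso_objective` are hypothetical and are not part of the source.

```python
# Hypothetical helpers (not from the source): evaluate the scaled Lasso
# objective Phi(beta, sigma) for small dense problems stored as lists.

def residual(X, Y, beta):
    """Return Y - X beta, where X is a list of n rows, each of length p."""
    return [y - sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
            for row, y in zip(X, Y)]

def scaled_lasso_objective(X, Y, beta, sigma, lam0):
    """Phi(beta, sigma) = ||Y - X beta||_2^2 / (2 n sigma) + sigma / 2 + lam0 ||beta||_1."""
    if sigma <= 0:
        raise ValueError("Phi is only defined for sigma > 0")
    n = len(Y)
    r = residual(X, Y, beta)
    sq_norm = sum(v * v for v in r)          # squared Euclidean norm on R^n
    l1_norm = sum(abs(b) for b in beta)      # ell^1 norm on R^p
    return sq_norm / (2 * n * sigma) + sigma / 2 + lam0 * l1_norm

# Usage: X = I_2, Y = (1, 2), beta = 0, sigma = 1, lam0 = 0.5.
print(scaled_lasso_objective([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [0.0, 0.0], 1.0, 0.5))  # prints 1.75
```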
[step:Fix $\hat\sigma$ and reduce the $\beta$-minimization to an ordinary Lasso problem]
Since $(\hat{\beta},\hat{\sigma})$ attains the minimum of $\Phi$ on $\mathbb{R}^p \times (0,\infty)$, for every $\beta \in \mathbb{R}^p$ we have
\begin{align*}
\Phi(\hat{\beta},\hat{\sigma}) \leq \Phi(\beta,\hat{\sigma}).
\end{align*}
Expanding the definition of $\Phi$ gives
\begin{align*}
\frac{\|Y-X\hat{\beta}\|_2^2}{2n\hat{\sigma}}+\frac{\hat{\sigma}}{2}+\lambda_0\|\hat{\beta}\|_1
\leq
\frac{\|Y-X\beta\|_2^2}{2n\hat{\sigma}}+\frac{\hat{\sigma}}{2}+\lambda_0\|\beta\|_1 .
\end{align*}
Subtracting $\hat{\sigma}/2$ from both sides and multiplying by the positive scalar $\hat{\sigma}$ preserves the inequality:
\begin{align*}
\frac{\|Y-X\hat{\beta}\|_2^2}{2n}+\hat{\sigma}\lambda_0\|\hat{\beta}\|_1
\leq
\frac{\|Y-X\beta\|_2^2}{2n}+\hat{\sigma}\lambda_0\|\beta\|_1 .
\end{align*}
Since this holds for every $\beta \in \mathbb{R}^p$, it follows that
\begin{align*}
\hat{\beta} \in \operatorname*{argmin}_{\beta\in\mathbb R^p}\left\{\frac{\|Y-X\beta\|_2^2}{2n}+\hat\sigma\lambda_0\|\beta\|_1\right\}.
\end{align*}
[/step]
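The reduction rests on the identity $\hat\sigma\,(\Phi(\beta,\hat\sigma)-\hat\sigma/2)=\|Y-X\beta\|_2^2/(2n)+\hat\sigma\lambda_0\|\beta\|_1$. This is an increasing affine map of $\Phi(\cdot,\hat\sigma)$, so both objectives rank candidate vectors $\beta$ identically. The sketch below checks this on random candidates. It is an illustration under assumptions (small dense inputs as lists, hypothetical helper names, a fixed random seed), not a proof. Because the check uses only $\hat\sigma>0$, it also passes for a negative $\lambda_0$.

```python
import random

# Hypothetical helpers (not from the source) for a numerical sanity check.

def fit_and_l1(X, Y, beta):
    """Return (||Y - X beta||_2^2 / (2n), ||beta||_1)."""
    n = len(Y)
    r = [y - sum(x * b for x, b in zip(row, beta)) for row, y in zip(X, Y)]
    return sum(v * v for v in r) / (2 * n), sum(abs(b) for b in beta)

def phi(X, Y, beta, sigma, lam0):
    """Scaled Lasso objective Phi(beta, sigma)."""
    fit, l1 = fit_and_l1(X, Y, beta)
    return fit / sigma + sigma / 2 + lam0 * l1

def lasso_objective(X, Y, beta, lam):
    """Ordinary Lasso objective ||Y - X beta||_2^2 / (2n) + lam ||beta||_1."""
    fit, l1 = fit_and_l1(X, Y, beta)
    return fit + lam * l1

def rescaling_preserves_order(X, Y, sigma_hat, lam0, trials=200, seed=0):
    """Check sigma_hat * (Phi - sigma_hat/2) equals the Lasso objective at penalty
    sigma_hat * lam0 on random betas, and that both rank the betas identically."""
    rng = random.Random(seed)
    p = len(X[0])
    betas = [[rng.uniform(-2.0, 2.0) for _ in range(p)] for _ in range(trials)]
    for beta in betas:
        lhs = sigma_hat * (phi(X, Y, beta, sigma_hat, lam0) - sigma_hat / 2)
        rhs = lasso_objective(X, Y, beta, sigma_hat * lam0)
        if abs(lhs - rhs) > 1e-9 * max(1.0, abs(rhs)):
            return False
    # A positive affine transformation cannot change which candidate is smaller.
    by_phi = sorted(range(trials), key=lambda k: phi(X, Y, betas[k], sigma_hat, lam0))
    by_lasso = sorted(range(trials),
                      key=lambda k: lasso_objective(X, Y, betas[k], sigma_hat * lam0))
    return by_phi == by_lasso
```

For example, `rescaling_preserves_order([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]], [1.0, -2.0, 0.5], 0.7, 0.3)` is expected to return `True`.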
[step:Minimize the one-variable scale objective at fixed $\hat\beta$]Define the residual vector $\hat r \in \mathbb{R}^n$ by
\begin{align*}
\hat r := Y-X\hat{\beta}.
\end{align*}
Since $(\hat{\beta},\hat{\sigma})$ is a global minimizer, for every $s>0$ we have
\begin{align*}
\Phi(\hat{\beta},\hat{\sigma}) \leq \Phi(\hat{\beta},s).
\end{align*}
The term $\lambda_0\|\hat{\beta}\|_1$ is independent of $s$, so $\hat{\sigma}$ minimizes the function $\psi: (0,\infty) \to \mathbb{R}$ defined by
\begin{align*}
\psi(s)=\frac{\|\hat r\|_2^2}{2ns}+\frac{s}{2}.
\end{align*}
If $\|\hat r\|_2=0$, then $\psi(s)=s/2$. This function has no minimizer on $(0,\infty)$: it is strictly positive there, while $\psi(s)\downarrow 0$ as $s\downarrow 0$, so its infimum $0$ is not attained. This contradicts the fact that $\hat{\sigma}$ minimizes $\psi$. Hence $\|\hat r\|_2>0$.
For $\|\hat r\|_2>0$, the function $\psi$ is differentiable on $(0,\infty)$ and
\begin{align*}
\psi'(s)=-\frac{\|\hat r\|_2^2}{2ns^2}+\frac{1}{2}.
\end{align*}
Since $\hat{\sigma}$ lies in the open interval $(0,\infty)$ and minimizes the differentiable function $\psi$ there, the first-order necessary condition for an interior minimum (Fermat's theorem) gives $\psi'(\hat{\sigma})=0$. Therefore
\begin{align*}
-\frac{\|\hat r\|_2^2}{2n\hat{\sigma}^2}+\frac{1}{2}=0,
\end{align*}
so
\begin{align*}
\hat{\sigma}^2=\frac{\|\hat r\|_2^2}{n}.
\end{align*}
Because $\hat{\sigma}>0$ and $\|\hat r\|_2>0$, taking positive square roots gives
\begin{align*}
\hat{\sigma}=\frac{\|\hat r\|_2}{\sqrt n}
=\frac{\|Y-X\hat{\beta}\|_2}{\sqrt n}.
\end{align*}[/step]
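The scale step has a closed form. The sketch below evaluates $\psi$, its derivative, and the root $\|\hat r\|_2/\sqrt n$, and it refuses the zero-residual case that the step rules out. The helper names are hypothetical, and the function takes $\|\hat r\|_2^2$ and $n$ directly as inputs, an assumption made to keep it self-contained.

```python
import math

# Hypothetical helpers (not from the source) for the one-variable scale problem.

def psi(s, r_norm_sq, n):
    """psi(s) = ||r||_2^2 / (2 n s) + s / 2 for s > 0."""
    return r_norm_sq / (2 * n * s) + s / 2

def psi_prime(s, r_norm_sq, n):
    """Derivative psi'(s) = -||r||_2^2 / (2 n s^2) + 1/2."""
    return -r_norm_sq / (2 * n * s * s) + 0.5

def sigma_hat(r_norm_sq, n):
    """Positive root of psi'(s) = 0, namely ||r||_2 / sqrt(n)."""
    if r_norm_sq <= 0:
        # psi(s) = s/2 is strictly positive with unattained infimum 0.
        raise ValueError("zero residual: psi has no minimizer on (0, inf)")
    return math.sqrt(r_norm_sq / n)

# Usage with ||r||_2^2 = 8 and n = 2.
s = sigma_hat(8.0, 2)
print(s, psi(s, 8.0, 2))  # prints 2.0 2.0
```

The usage line also illustrates that $\psi(\hat\sigma)=\hat\sigma/2+\hat\sigma/2=\hat\sigma$, so the minimal value of $\psi$ equals the minimizing scale.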
[guided]The only variable in this step is the scale parameter. Define the residual vector $\hat r \in \mathbb{R}^n$ by
\begin{align*}
\hat r := Y-X\hat{\beta}.
\end{align*}
Keeping $\beta=\hat{\beta}$ fixed, the scaled Lasso objective becomes
\begin{align*}
\Phi(\hat{\beta},s)
=
\frac{\|\hat r\|_2^2}{2ns}+\frac{s}{2}+\lambda_0\|\hat{\beta}\|_1,
\qquad s>0.
\end{align*}
Since $\lambda_0\|\hat{\beta}\|_1$ does not depend on $s$, minimizing $\Phi(\hat{\beta},s)$ over $s>0$ is equivalent to minimizing the function $\psi: (0,\infty) \to \mathbb{R}$ defined by
\begin{align*}
\psi(s)=\frac{\|\hat r\|_2^2}{2ns}+\frac{s}{2}.
\end{align*}
First consider the possible failure mode $\|\hat r\|_2=0$. Then the singular term vanishes and
\begin{align*}
\psi(s)=\frac{s}{2}.
\end{align*}
This function has no minimizer on $(0,\infty)$: for every $s>0$, the smaller positive scale $s/2$ gives
\begin{align*}
\psi(s/2)=\frac{s}{4}<\frac{s}{2}=\psi(s).
\end{align*}
Thus a positive minimizing scale cannot occur when the residual is zero. Since $\hat{\sigma}$ is assumed to be part of an attained minimizer, we must have $\|\hat r\|_2>0$.
Now $\psi$ is differentiable on $(0,\infty)$, and its derivative is
\begin{align*}
\psi'(s)
=
-\frac{\|\hat r\|_2^2}{2ns^2}+\frac{1}{2}.
\end{align*}
Because $\hat{\sigma}\in(0,\infty)$ is an interior minimizer of this differentiable one-variable function, the first-order necessary condition for an interior minimum applies and gives $\psi'(\hat{\sigma})=0$. Hence
\begin{align*}
-\frac{\|\hat r\|_2^2}{2n\hat{\sigma}^2}+\frac{1}{2}=0.
\end{align*}
Solving this equation yields
\begin{align*}
\hat{\sigma}^2=\frac{\|\hat r\|_2^2}{n}.
\end{align*}
The scale parameter is positive by hypothesis, and the residual norm is positive by the preceding paragraph. Therefore the positive square root gives
\begin{align*}
\hat{\sigma}=\frac{\|\hat r\|_2}{\sqrt n}
=\frac{\|Y-X\hat{\beta}\|_2}{\sqrt n}.
\end{align*}
This is the fixed point relation: the scale $\hat\sigma$ that multiplies $\lambda_0$ in the Lasso penalty must equal the normalized residual norm $\|Y-X\hat\beta\|_2/\sqrt n$ of the corresponding Lasso solution $\hat\beta$.[/guided]
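The fixed point relation suggests an alternating scheme: solve the Lasso at penalty $\sigma\lambda_0$, then reset $\sigma$ to $\|Y-X\beta\|_2/\sqrt n$. The sketch below is a hypothetical pure-Python implementation and is not presented as the source's method. It assumes $\lambda_0>0$, small dense inputs as lists, and plain cyclic coordinate descent as the inner Lasso solver (zero columns are skipped). When the inner solver converges, each pass minimizes $\Phi$ exactly in one block, first in $\beta$ by the reduction above and then in $\sigma$ by the closed form, so $\Phi$ does not increase.

```python
import math

def soft_threshold(z, t):
    """Scalar soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, Y, lam, max_sweeps=1000, tol=1e-12):
    """Cyclic coordinate descent for ||Y - X beta||_2^2 / (2n) + lam ||beta||_1.
    Returns (beta, residual Y - X beta)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(Y)                                        # residual at beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                               # zero column: beta_j stays 0
            # (1/n) X_j^T (residual with coordinate j's contribution added back)
            z = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(z, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta, r

def scaled_lasso(X, Y, lam0, max_iter=500, tol=1e-10):
    """Alternate a Lasso fit at penalty sigma * lam0 with sigma <- ||Y - X beta||_2 / sqrt(n)."""
    n = len(Y)
    sigma = math.sqrt(sum(y * y for y in Y) / n)       # residual scale at beta = 0
    if sigma == 0.0:
        raise ValueError("Y = 0: zero residual, no minimizer at positive scale")
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        beta, r = lasso_cd(X, Y, sigma * lam0)
        new_sigma = math.sqrt(sum(v * v for v in r) / n)
        if new_sigma == 0.0:
            raise ValueError("zero residual: the infimum is approached as sigma -> 0")
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return beta, sigma
```

If $n>p$ and $Y$ lies outside the column space of $X$, every residual is nonzero, so the scale stays positive throughout the iteration.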
[step:Show that exact interpolation gives a boundary infimum along the interpolating ray]
Let $\beta_0 \in \mathbb{R}^p$ satisfy $Y=X\beta_0$. Then for every $\sigma>0$,
\begin{align*}
\Phi(\beta_0,\sigma)
=
\frac{\|Y-X\beta_0\|_2^2}{2n\sigma}+\frac{\sigma}{2}+\lambda_0\|\beta_0\|_1
=
\frac{\sigma}{2}+\lambda_0\|\beta_0\|_1.
\end{align*}
Therefore
\begin{align*}
\inf_{\sigma>0}\Phi(\beta_0,\sigma)
=
\lambda_0\|\beta_0\|_1,
\end{align*}
because $\sigma/2 \downarrow 0$ as $\sigma \downarrow 0$. This fixed-$\beta_0$ infimum is not attained by any $\sigma>0$, since $\sigma/2>0$ for every positive $\sigma$.
Thus, along any sequence $(\sigma_k)_{k=1}^{\infty}$ in $(0,\infty)$ with $\sigma_k\downarrow 0$, the values $\Phi(\beta_0,\sigma_k)=\sigma_k/2+\lambda_0\|\beta_0\|_1$ decrease to $\lambda_0\|\beta_0\|_1$ without attaining it at any positive scale. If, in addition, this value equals the infimum of $\Phi$ over $\mathbb{R}^p\times(0,\infty)$, then that infimum is approached along the boundary sequence $(\beta_0,\sigma_k)$. This is consistent with the previous step, which shows that no attained minimizer at positive scale can have zero residual. Together with the fixed point relation proved above, this completes the characterization.
[/step]
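The boundary behaviour can be observed numerically. This sketch uses a toy interpolating pair with $X=I_2$, $Y=\beta_0=(3,-4)$ and $\lambda_0=0.25$, together with hypothetical helper names. Under these assumptions the values $\Phi(\beta_0,2^{-k})=2^{-k-1}+1.75$ decrease strictly toward $\lambda_0\|\beta_0\|_1=1.75$ but never reach it.

```python
# Hypothetical illustration (not from the source) of the unattained boundary infimum.

def scaled_lasso_objective(X, Y, beta, sigma, lam0):
    """Phi(beta, sigma) = ||Y - X beta||_2^2 / (2 n sigma) + sigma / 2 + lam0 ||beta||_1."""
    n = len(Y)
    r = [y - sum(x * b for x, b in zip(row, beta)) for row, y in zip(X, Y)]
    return sum(v * v for v in r) / (2 * n * sigma) + sigma / 2 + lam0 * sum(abs(b) for b in beta)

def boundary_values(X, Y, beta0, lam0, k_max=30):
    """Phi(beta0, 2**-k) for k = 1..k_max; beta0 is assumed to satisfy Y = X beta0."""
    return [scaled_lasso_objective(X, Y, beta0, 2.0 ** -k, lam0) for k in range(1, k_max + 1)]

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [3.0, -4.0]
beta0 = [3.0, -4.0]                             # exact interpolation: Y = X beta0
lam0 = 0.25
vals = boundary_values(X, Y, beta0, lam0)
target = lam0 * sum(abs(b) for b in beta0)      # lambda_0 ||beta_0||_1
print(vals[0], target)                          # prints 2.0 1.75
```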