[proofplan]
We construct a primal-dual witness. First we solve the Lasso restricted to the active coordinates $S$ and set the inactive coordinates equal to zero. The first deterministic inequality forces the restricted solution to have the same coordinatewise signs as $\beta_S^*$. We then define the inactive dual variables from the full KKT equations and use the irrepresentable condition together with the stochastic residual bound to show that they lie in the subdifferential of the $\ell^1$ norm at zero. The full KKT system then proves optimality, and strict inactive dual feasibility gives uniqueness.
[/proofplan]
custom_env
admin
[step:Construct the restricted active solution and its active dual vector]Define the full Lasso objective $L: \mathbb{R}^p \to \mathbb{R}$ by
\begin{align*}
L(b) := \frac{1}{2n}\|Y-X b\|_2^2+\lambda\|b\|_1.
\end{align*}
For every index set $A\subset\{1,\dots,p\}$, let $X_A\in\mathbb{R}^{n\times A}$ denote the submatrix of $X$ with columns indexed by $A$, and for every vector $b\in\mathbb{R}^p$, let $b_A\in\mathbb{R}^A$ denote its coordinate subvector indexed by $A$. In particular, $S^c:=\{1,\dots,p\}\setminus S$.
Define the restricted objective $L_S: \mathbb{R}^{S} \to \mathbb{R}$ by
\begin{align*}
L_S(b) := \frac{1}{2n}\|Y-X_S b\|_2^2+\lambda\|b\|_1.
\end{align*}
The active Gram matrix satisfies
\begin{align*}
\hat{\Sigma}_{SS}=\frac{X_S^\top X_S}{n}.
\end{align*}
It is symmetric and positive semidefinite because, for every $v\in\mathbb{R}^{S}$,
\begin{align*}
v^\top\hat{\Sigma}_{SS}v=\frac{|X_Sv|^2}{n}\ge 0.
\end{align*}
Since $\hat{\Sigma}_{SS}$ is also invertible by hypothesis, it is positive definite on $\mathbb{R}^{S}$. Let $c_S>0$ denote its smallest eigenvalue. The Hessian of the quadratic part of $L_S$ is $\hat{\Sigma}_{SS}$, and positive definiteness gives the lower bound $b^\top\hat{\Sigma}_{SS}b\ge c_S|b|^2$ for every $b\in\mathbb{R}^S$. Hence the quadratic part of $L_S$ is coercive and strictly convex, while $\lambda\|b\|_1$ is continuous and convex. Therefore $L_S$ has a minimiser by coercivity and continuity, and this minimiser is unique by strict convexity. Denote it by $\hat{\beta}_S \in \mathbb{R}^{S}$.
Define
\begin{align*}
\hat{z}_S := \operatorname{sign}(\beta_S^*) \in \mathbb{R}^{S}.
\end{align*}
We claim that the vector
\begin{align*}
\hat{\beta}_S
:=
\beta_S^*
+
\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}
-
\lambda \hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)
\end{align*}
is the restricted minimiser and has sign $\operatorname{sign}(\beta_S^*)$.
Indeed, using $Y=X_S\beta_S^*+\varepsilon$, we compute
\begin{align*}
\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)
=
\frac{1}{n}X_S^\top\varepsilon
-
\hat{\Sigma}_{SS}(\hat{\beta}_S-\beta_S^*)
=
\lambda \operatorname{sign}(\beta_S^*).
\end{align*}
Thus
\begin{align*}
-\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)+\lambda \hat{z}_S=0.
\end{align*}
Once we prove $\operatorname{sign}(\hat{\beta}_S)=\operatorname{sign}(\beta_S^*)$, the vector $\hat{z}_S$ belongs to $\partial\|\hat{\beta}_S\|_1$, so this is exactly the restricted [KKT condition](/page/Karush-Kuhn-Tucker%20Conditions), equivalently the [subgradient](/page/Subgradient) optimality equation for the restricted convex objective.[/step]
custom_env
admin
[guided]The full Lasso objective used later in the proof is the map $L: \mathbb{R}^p \to \mathbb{R}$ defined by
\begin{align*}
L(b) := \frac{1}{2n}\|Y-X b\|_2^2+\lambda\|b\|_1.
\end{align*}
For every index set $A\subset\{1,\dots,p\}$, let $X_A\in\mathbb{R}^{n\times A}$ denote the submatrix of $X$ with columns indexed by $A$, and for every vector $b\in\mathbb{R}^p$, let $b_A\in\mathbb{R}^A$ denote its coordinate subvector indexed by $A$. In particular, $S^c:=\{1,\dots,p\}\setminus S$. The restricted problem keeps only the active variables and temporarily forces all coefficients outside $S$ to be zero. We define the restricted objective $L_S: \mathbb{R}^{S} \to \mathbb{R}$ by
\begin{align*}
L_S(b) := \frac{1}{2n}\|Y-X_S b\|_2^2+\lambda\|b\|_1.
\end{align*}
The Hessian of the quadratic part is the active Gram matrix
\begin{align*}
\hat{\Sigma}_{SS}=\frac{X_S^\top X_S}{n}.
\end{align*}
For every $v\in\mathbb{R}^{S}$,
\begin{align*}
v^\top\hat{\Sigma}_{SS}v=\frac{|X_Sv|^2}{n}\ge 0,
\end{align*}
so $\hat{\Sigma}_{SS}$ is symmetric and positive semidefinite. By hypothesis this matrix is invertible, and an invertible symmetric positive semidefinite matrix is positive definite on $\mathbb{R}^{S}$. Let $c_S>0$ denote the smallest eigenvalue of $\hat{\Sigma}_{SS}$. The Hessian of the quadratic part of $L_S$ is $\hat{\Sigma}_{SS}$, and positive definiteness gives
\begin{align*}
b^\top\hat{\Sigma}_{SS}b\ge c_S|b|^2
\end{align*}
for every $b\in\mathbb{R}^S$. This lower quadratic bound is the reason the quadratic part is coercive and strictly convex. Since $\lambda\|b\|_1$ is continuous and convex, $L_S$ is continuous, coercive, and strictly convex. Coercivity and continuity give existence of a minimiser, and strict convexity gives uniqueness.
The candidate restricted solution is chosen by solving the active KKT equation with the intended sign vector. Define
\begin{align*}
\hat{z}_S := \operatorname{sign}(\beta_S^*) \in \mathbb{R}^{S}
\end{align*}
and
\begin{align*}
\hat{\beta}_S
:=
\beta_S^*
+
\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}
-
\lambda \hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}
Because $Y=X\beta^*+\varepsilon=X_S\beta_S^*+\varepsilon$, the residual for this active candidate is
\begin{align*}
Y-X_S\hat{\beta}_S
=
\varepsilon-X_S(\hat{\beta}_S-\beta_S^*).
\end{align*}
Multiplying by $X_S^\top/n$ gives
\begin{align*}
\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)
=
\frac{1}{n}X_S^\top\varepsilon
-
\hat{\Sigma}_{SS}(\hat{\beta}_S-\beta_S^*).
\end{align*}
Substituting the definition of $\hat{\beta}_S-\beta_S^*$ gives
\begin{align*}
\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)
=
\frac{1}{n}X_S^\top\varepsilon
-
\left(
\frac{1}{n}X_S^\top\varepsilon
-
\lambda\operatorname{sign}(\beta_S^*)
\right)
=
\lambda\operatorname{sign}(\beta_S^*).
\end{align*}
Therefore
\begin{align*}
-\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)+\lambda\hat{z}_S=0.
\end{align*}
This is the restricted [KKT condition](/page/Karush-Kuhn-Tucker%20Conditions), equivalently the [subgradient](/page/Subgradient) optimality equation for the restricted Lasso, provided $\hat{z}_S$ is a valid subgradient of $\|\cdot\|_1$ at $\hat{\beta}_S$. That validity is exactly the sign preservation statement proved in the next step.[/guided]
custom_env
admin
[step:Use the active error bound to preserve the signs on $S$]For every $j \in S$,
\begin{align*}
|\hat{\beta}_j-\beta_j^*|
\le
\left\|\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}\right\|_\infty
+
\lambda
\left\|\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)\right\|_\infty
<
\min_{k\in S}|\beta_k^*|
\le |\beta_j^*|.
\end{align*}
Hence $\hat{\beta}_j$ cannot cross zero from $\beta_j^*$, and therefore
\begin{align*}
\operatorname{sign}(\hat{\beta}_S)=\operatorname{sign}(\beta_S^*).
\end{align*}
Thus $\hat{z}_S=\operatorname{sign}(\hat{\beta}_S)$ and the restricted KKT equation from the previous step proves that $\hat{\beta}_S$ is the restricted Lasso minimiser.[/step]
custom_env
admin
[guided]The active error estimate compares each active coordinate of the candidate $\hat{\beta}_S$ to the corresponding true coefficient. For every $j\in S$, the formula for $\hat{\beta}_S-\beta_S^*$ gives
\begin{align*}
|\hat{\beta}_j-\beta_j^*|
\le
\left\|\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}
-
\lambda\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)\right\|_\infty.
\end{align*}
The triangle inequality for the $\ell^\infty$ norm then gives
\begin{align*}
|\hat{\beta}_j-\beta_j^*|
\le
\left\|\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}\right\|_\infty
+
\lambda
\left\|\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)\right\|_\infty.
\end{align*}
The displayed active hypothesis in the theorem statement says that this last quantity is strictly smaller than $\min_{k\in S}|\beta_k^*|$. Hence
\begin{align*}
|\hat{\beta}_j-\beta_j^*|<\min_{k\in S}|\beta_k^*|\le |\beta_j^*|.
\end{align*}
This strict inequality is exactly what prevents a sign change: the perturbation from $\beta_j^*$ to $\hat{\beta}_j$ is smaller than the distance from $\beta_j^*$ to zero. Therefore $\hat{\beta}_j$ has the same sign as $\beta_j^*$ for every $j\in S$, and so
\begin{align*}
\operatorname{sign}(\hat{\beta}_S)=\operatorname{sign}(\beta_S^*).
\end{align*}
Consequently $\hat{z}_S=\operatorname{sign}(\hat{\beta}_S)$ is a valid subgradient of $\|\cdot\|_1$ at $\hat{\beta}_S$, and the restricted KKT equation already verified proves that $\hat{\beta}_S$ is the restricted Lasso minimiser.[/guided]
custom_env
admin
[step:Define the inactive coefficients and solve for the inactive dual variables]Define the full primal vector $\hat{\beta}\in\mathbb{R}^p$ by taking its active block to be the vector $\hat{\beta}_S$ constructed above and its inactive block to satisfy
\begin{align*}
\hat{\beta}_{S^c}:=0.
\end{align*}
Let $I_n\in\mathbb{R}^{n\times n}$ denote the identity matrix. Define the inactive-active empirical covariance block $\hat{\Sigma}_{S^cS}\in\mathbb{R}^{S^c\times S}$ by
\begin{align*}
\hat{\Sigma}_{S^cS}:=\frac{X_{S^c}^\top X_S}{n}.
\end{align*}
Since $\lambda>0$ by hypothesis, define $\hat{z}_{S^c}\in\mathbb{R}^{S^c}$ by the inactive KKT equation
\begin{align*}
\lambda \hat{z}_{S^c}
:=
\frac{1}{n}X_{S^c}^\top(Y-X\hat{\beta}).
\end{align*}
Since $\hat{\beta}_{S^c}=0$, this residual is $Y-X\hat{\beta}=Y-X_S\hat{\beta}_S$. Substituting $Y=X_S\beta_S^*+\varepsilon$ and the active error formula gives
\begin{align*}
Y-X_S\hat{\beta}_S
=
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
+
\lambda X_S\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}
Therefore
\begin{align*}
\hat{z}_{S^c}
=
\frac{1}{\lambda}
\frac{X_{S^c}^\top}{n}
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
+
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}[/step]
custom_env
admin
[guided]We now extend the active restricted solution to a full vector by setting every inactive coefficient to zero:
\begin{align*}
\hat{\beta}_{S^c}:=0.
\end{align*}
Let $I_n\in\mathbb{R}^{n\times n}$ denote the identity matrix, and define the inactive-active empirical covariance block $\hat{\Sigma}_{S^cS}\in\mathbb{R}^{S^c\times S}$ by
\begin{align*}
\hat{\Sigma}_{S^cS}:=\frac{X_{S^c}^\top X_S}{n}.
\end{align*}
The inactive dual vector is not chosen freely; it is forced by the inactive block of the KKT equation. Since $\lambda>0$ by hypothesis, define $\hat{z}_{S^c}\in\mathbb{R}^{S^c}$ by
\begin{align*}
\lambda\hat{z}_{S^c}:=\frac{1}{n}X_{S^c}^\top(Y-X\hat{\beta}).
\end{align*}
Since $\hat{\beta}_{S^c}=0$, the residual is $Y-X\hat{\beta}=Y-X_S\hat{\beta}_S$. Using $Y=X_S\beta_S^*+\varepsilon$ and the active error formula
\begin{align*}
\hat{\beta}_S-\beta_S^*
=
\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}
-
\lambda\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*),
\end{align*}
we first obtain
\begin{align*}
Y-X_S\hat{\beta}_S=
\varepsilon-X_S(\hat{\beta}_S-\beta_S^*).
\end{align*}
Substituting the active error formula gives
\begin{align*}
Y-X_S\hat{\beta}_S=
\varepsilon
-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n}
+
\lambda X_S\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}
Factoring the two terms containing $\varepsilon$ gives
\begin{align*}
Y-X_S\hat{\beta}_S=
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
+
\lambda X_S\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}
Multiplying by $X_{S^c}^\top/n$ and dividing by $\lambda$ gives
\begin{align*}
\hat{z}_{S^c}
=
\frac{1}{\lambda}
\frac{X_{S^c}^\top}{n}
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
+
\frac{X_{S^c}^\top X_S}{n}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}
By the definition of $\hat{\Sigma}_{S^cS}$, this is
\begin{align*}
\hat{z}_{S^c}
=
\frac{1}{\lambda}
\frac{X_{S^c}^\top}{n}
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
+
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}[/guided]
custom_env
admin
[step:Control the inactive dual variables by the irrepresentable condition]Taking the $\ell^\infty$ norm and using the triangle inequality,
\begin{align*}
\|\hat{z}_{S^c}\|_\infty
\le
\frac{1}{\lambda}
\left\|
\frac{X_{S^c}^\top}{n}
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
\right\|_\infty
+
\left\|
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)
\right\|_\infty.
\end{align*}
The stochastic inactive bound gives the first term at most $\eta$, using $\lambda>0$ to divide the inequality by $\lambda$. The irrepresentable condition with margin $\eta$, namely
\begin{align*}
\left\|
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)
\right\|_\infty
\le 1-\eta,
\end{align*}
gives the second term at most $1-\eta$. Hence
\begin{align*}
\|\hat{z}_{S^c}\|_\infty \le \eta+(1-\eta)=1.
\end{align*}
Because $\hat{\beta}_{S^c}=0$, the condition $\|\hat{z}_{S^c}\|_\infty\le 1$ says exactly that $\hat{z}_{S^c}\in \partial\|\hat{\beta}_{S^c}\|_1$.[/step]
custom_env
admin
[guided]The inactive dual vector has two contributions: a random residual contribution and a deterministic design contribution. From the previous step,
\begin{align*}
\hat{z}_{S^c}
=
\frac{1}{\lambda}
\frac{X_{S^c}^\top}{n}
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
+
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*).
\end{align*}
Taking the $\ell^\infty$ norm and applying the triangle inequality to these two terms gives
\begin{align*}
\|\hat{z}_{S^c}\|_\infty
\le
\frac{1}{\lambda}
\left\|
\frac{X_{S^c}^\top}{n}
\left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon
\right\|_\infty
+
\left\|
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)
\right\|_\infty.
\end{align*}
The stochastic inactive bound in the theorem statement controls the residual term: since $\lambda>0$, dividing by $\lambda$ shows that it is at most $\eta$. The irrepresentable condition with margin $\eta$ is the design inequality
\begin{align*}
\left\|
\hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*)
\right\|_\infty
\le 1-\eta.
\end{align*}
This controls the deterministic design term by $1-\eta$.
Therefore
\begin{align*}
\|\hat{z}_{S^c}\|_\infty \le \eta+(1-\eta)=1.
\end{align*}
This is exactly the subgradient condition for the inactive coordinates: since $\hat{\beta}_{S^c}=0$, a vector belongs to $\partial\|\hat{\beta}_{S^c}\|_1$ precisely when all of its coordinates have absolute value at most $1$.[/guided]
custom_env
admin
[step:Verify the full KKT system and conclude sign consistency]Define $\hat{z}\in\mathbb{R}^p$ by combining $\hat{z}_S$ and $\hat{z}_{S^c}$. From the active KKT equation,
\begin{align*}
-\frac{1}{n}X_S^\top(Y-X\hat{\beta})+\lambda \hat{z}_S=0.
\end{align*}
From the definition of $\hat{z}_{S^c}$,
\begin{align*}
-\frac{1}{n}X_{S^c}^\top(Y-X\hat{\beta})+\lambda \hat{z}_{S^c}=0.
\end{align*}
Combining the two block equations gives
\begin{align*}
-\frac{1}{n}X^\top(Y-X\hat{\beta})+\lambda \hat{z}=0.
\end{align*}
We have already shown $\hat{z}_S=\operatorname{sign}(\hat{\beta}_S)$ and $\|\hat{z}_{S^c}\|_\infty\le 1$ with $\hat{\beta}_{S^c}=0$, so $\hat{z}\in\partial\|\hat{\beta}\|_1$. The objective $L$ is convex because it is the sum of a convex quadratic function and the convex function $\lambda\|\cdot\|_1$. By the defining [subgradient](/page/Subgradient) inequality, for every $b\in\mathbb{R}^p$,
\begin{align*}
L(b)\ge L(\hat{\beta})+0^\top(b-\hat{\beta})=L(\hat{\beta}).
\end{align*}
The displayed [KKT condition](/page/Karush-Kuhn-Tucker%20Conditions) is exactly the inclusion $0\in\partial L(\hat{\beta})$, so $\hat{\beta}$ is a Lasso minimiser.
Since $\operatorname{sign}(\hat{\beta}_S)=\operatorname{sign}(\beta_S^*)$ and $\hat{\beta}_{S^c}=0=\beta^*_{S^c}$, we conclude
\begin{align*}
\operatorname{sign}(\hat{\beta})=\operatorname{sign}(\beta^*).
\end{align*}[/step]
custom_env
admin
[guided]We combine the active and inactive KKT equations into one vector equation. Define $\hat{z}\in\mathbb{R}^p$ by taking its active block to be $\hat{z}_S$ and its inactive block to be $\hat{z}_{S^c}$. On the active block, because $\hat{\beta}_{S^c}=0$, the residual $Y-X\hat{\beta}$ equals $Y-X_S\hat{\beta}_S$, and the restricted KKT equation gives
\begin{align*}
-\frac{1}{n}X_S^\top(Y-X\hat{\beta})+\lambda\hat{z}_S=0.
\end{align*}
On the inactive block, the definition of $\hat{z}_{S^c}$ gives
\begin{align*}
-\frac{1}{n}X_{S^c}^\top(Y-X\hat{\beta})+\lambda\hat{z}_{S^c}=0.
\end{align*}
Putting these two block equations together yields
\begin{align*}
-\frac{1}{n}X^\top(Y-X\hat{\beta})+\lambda\hat{z}=0.
\end{align*}
We also know that $\hat{z}$ is a valid subgradient of the $\ell^1$ norm at $\hat{\beta}$. Indeed, on $S$ we proved $\hat{z}_S=\operatorname{sign}(\hat{\beta}_S)$, while on $S^c$ we have $\hat{\beta}_{S^c}=0$ and $\|\hat{z}_{S^c}\|_\infty\le 1$. Therefore $\hat{z}\in\partial\|\hat{\beta}\|_1$.
Now use the defining [subgradient](/page/Subgradient) inequality directly. The Lasso objective $L$ is convex because the squared-error term is a convex quadratic function and $\lambda\|\cdot\|_1$ is convex. The previous displayed [KKT condition](/page/Karush-Kuhn-Tucker%20Conditions), together with $\hat{z}\in\partial\|\hat{\beta}\|_1$, is exactly the inclusion $0\in\partial L(\hat{\beta})$. Therefore, for every $b\in\mathbb{R}^p$,
\begin{align*}
L(b)\ge L(\hat{\beta})+0^\top(b-\hat{\beta})=L(\hat{\beta}).
\end{align*}
Hence $\hat{\beta}$ is a Lasso minimiser.
Finally, the active sign preservation gives $\operatorname{sign}(\hat{\beta}_S)=\operatorname{sign}(\beta_S^*)$, and the construction gives $\hat{\beta}_{S^c}=0=\beta^*_{S^c}$. Thus
\begin{align*}
\operatorname{sign}(\hat{\beta})=\operatorname{sign}(\beta^*).
\end{align*}[/guided]
custom_env
admin
[step:Use strict inactive dual feasibility to prove uniqueness]Assume now that the inactive KKT inequalities for the primal-dual witness are strict. For the inactive dual vector constructed above, this means
\begin{align*}
\|\hat{z}_{S^c}\|_\infty<1.
\end{align*}
Let $\tilde{\beta}\in\mathbb{R}^p$ be any Lasso minimiser, and define $\delta:=\tilde{\beta}-\hat{\beta}$. Using the full Lasso objective $L: \mathbb{R}^p \to \mathbb{R}$ defined above, convexity of the quadratic loss and the KKT equation at $\hat{\beta}$ give
\begin{align*}
L(\tilde{\beta})-L(\hat{\beta})
\ge
\lambda\left(\|\tilde{\beta}\|_1-\|\hat{\beta}\|_1-\hat{z}^\top\delta\right).
\end{align*}
Since both vectors are minimisers, the left-hand side is zero. For each $j\in S$, the identity $\hat{z}_j=\operatorname{sign}(\hat{\beta}_j)$ and convexity of the absolute value give
\begin{align*}
|\tilde{\beta}_j|-|\hat{\beta}_j|-\hat{z}_j(\tilde{\beta}_j-\hat{\beta}_j)\ge 0.
\end{align*}
For each $j\in S^c$, the identities $\hat{\beta}_j=0$ and $|\hat{z}_j|<1$ give
\begin{align*}
|\tilde{\beta}_j|-\hat{z}_j\tilde{\beta}_j
\ge
(1-|\hat{z}_j|)|\tilde{\beta}_j|.
\end{align*}
The sum of these nonnegative terms is at most zero, hence each inactive term vanishes. Therefore $\tilde{\beta}_{S^c}=0$.
Now $\tilde{\beta}_{S^c}=0$, so the full objective value at $\tilde{\beta}$ equals $L_S(\tilde{\beta}_S)$. If there were some $b\in\mathbb{R}^S$ with $L_S(b)<L_S(\tilde{\beta}_S)$, then the full vector $u\in\mathbb{R}^p$ defined by $u_S=b$ and $u_{S^c}=0$ would satisfy $L(u)<L(\tilde{\beta})$, contradicting that $\tilde{\beta}$ is a full Lasso minimiser. Thus $\tilde{\beta}_S$ minimises the restricted objective. The vector $\hat{\beta}_S$ also minimises the restricted objective by construction. Since $\hat{\Sigma}_{SS}$ is invertible, the restricted objective is [strictly convex](/page/Strict%20Convexity). A strictly convex function has at most one minimiser: if $u\neq v$ were two minimisers, then the midpoint inequality would give a strictly smaller value at $(u+v)/2$, contradicting minimality. Hence $\tilde{\beta}_S=\hat{\beta}_S$ and also $\tilde{\beta}_{S^c}=\hat{\beta}_{S^c}=0$. Thus $\tilde{\beta}=\hat{\beta}$, proving uniqueness.[/step]
custom_env
admin
[guided]We now explain why strict inactive dual feasibility rules out every other minimiser. In the language of the theorem, the inactive KKT inequalities are strict exactly when the inactive dual vector constructed above satisfies
\begin{align*}
\|\hat{z}_{S^c}\|_\infty<1.
\end{align*}
Let $\tilde{\beta}\in\mathbb{R}^p$ be any minimiser of the full Lasso objective $L: \mathbb{R}^p \to \mathbb{R}$, and define the difference vector $\delta:=\tilde{\beta}-\hat{\beta}$. The KKT equation for $\hat{\beta}$ says
\begin{align*}
-\frac{1}{n}X^\top(Y-X\hat{\beta})+\lambda\hat{z}=0,
\end{align*}
so
\begin{align*}
\frac{1}{n}(Y-X\hat{\beta})^\top X\delta=\lambda\hat{z}^\top\delta.
\end{align*}
The quadratic loss is convex, and its first-order expansion at $\hat{\beta}$ gives
\begin{align*}
\frac{1}{2n}\|Y-X\tilde{\beta}\|_2^2-\frac{1}{2n}\|Y-X\hat{\beta}\|_2^2
\ge
-\frac{1}{n}(Y-X\hat{\beta})^\top X\delta.
\end{align*}
Combining this inequality with the KKT identity gives
\begin{align*}
L(\tilde{\beta})-L(\hat{\beta})
\ge
\lambda\left(\|\tilde{\beta}\|_1-\|\hat{\beta}\|_1-\hat{z}^\top\delta\right).
\end{align*}
Since both $\tilde{\beta}$ and $\hat{\beta}$ are minimisers, the left-hand side equals $0$. The right-hand side is a sum of coordinatewise subgradient gaps. For each active coordinate $j\in S$, the identity $\hat{z}_j=\operatorname{sign}(\hat{\beta}_j)$ and convexity of the absolute value give
\begin{align*}
|\tilde{\beta}_j|-|\hat{\beta}_j|-\hat{z}_j(\tilde{\beta}_j-\hat{\beta}_j)\ge 0.
\end{align*}
For each inactive coordinate $j\in S^c$, the identities $\hat{\beta}_j=0$ and $|\hat{z}_j|<1$ give
\begin{align*}
|\tilde{\beta}_j|-\hat{z}_j\tilde{\beta}_j
\ge
|\tilde{\beta}_j|-|\hat{z}_j|\,|\tilde{\beta}_j|
=
(1-|\hat{z}_j|)|\tilde{\beta}_j|.
\end{align*}
Every coordinate contribution is nonnegative, while their sum is at most $0$. Hence every inactive contribution must vanish. Because $1-|\hat{z}_j|>0$ for each $j\in S^c$, this forces $|\tilde{\beta}_j|=0$ for each inactive coordinate, so $\tilde{\beta}_{S^c}=0$.
With the inactive coordinates zero, the full objective value at $\tilde{\beta}$ equals $L_S(\tilde{\beta}_S)$. We must justify that $\tilde{\beta}_S$ minimises the restricted objective. Suppose instead that there were some $b\in\mathbb{R}^S$ with $L_S(b)<L_S(\tilde{\beta}_S)$. Define $u\in\mathbb{R}^p$ by $u_S=b$ and $u_{S^c}=0$. Because $u$ is supported on $S$, the full objective at $u$ equals $L_S(b)$, while the full objective at $\tilde{\beta}$ equals $L_S(\tilde{\beta}_S)$. This would give $L(u)<L(\tilde{\beta})$, contradicting that $\tilde{\beta}$ is a full Lasso minimiser. Hence $\tilde{\beta}_S$ minimises $L_S$.
The vector $\hat{\beta}_S$ also minimises the restricted objective $L_S$ by construction. The matrix $\hat{\Sigma}_{SS}$ is invertible, so $L_S$ is [strictly convex](/page/Strict%20Convexity). Strict convexity gives uniqueness of the restricted minimiser: if two different restricted minimisers existed, evaluating $L_S$ at their midpoint would give a strictly smaller value than the common minimum, contradicting minimality. Therefore $\tilde{\beta}_S=\hat{\beta}_S$. Combining this with $\tilde{\beta}_{S^c}=\hat{\beta}_{S^c}=0$ yields $\tilde{\beta}=\hat{\beta}$, proving uniqueness.[/guided]