Androma — The Home of Mathematics on the Internet

custom_env admin

[step:Construct the restricted active solution and its active dual vector]Define the full Lasso objective $L: \mathbb{R}^p \to \mathbb{R}$ by \begin{align*} L(b) := \frac{1}{2n}\|Y-X b\|_2^2+\lambda\|b\|_1. \end{align*} For every index set $A\subset\{1,\dots,p\}$, let $X_A\in\mathbb{R}^{n\times A}$ denote the submatrix of $X$ with columns indexed by $A$, and for every vector $b\in\mathbb{R}^p$, let $b_A\in\mathbb{R}^A$ denote its coordinate subvector indexed by $A$. In particular, $S^c:=\{1,\dots,p\}\setminus S$. Define the restricted objective $L_S: \mathbb{R}^{S} \to \mathbb{R}$ by \begin{align*} L_S(b) := \frac{1}{2n}\|Y-X_S b\|_2^2+\lambda\|b\|_1. \end{align*} The active Gram matrix satisfies \begin{align*} \hat{\Sigma}_{SS}=\frac{X_S^\top X_S}{n}. \end{align*} It is symmetric and positive semidefinite because, for every $v\in\mathbb{R}^{S}$, \begin{align*} v^\top\hat{\Sigma}_{SS}v=\frac{|X_Sv|^2}{n}\ge 0. \end{align*} Since $\hat{\Sigma}_{SS}$ is also invertible by hypothesis, it is positive definite on $\mathbb{R}^{S}$. Let $c_S>0$ denote its smallest eigenvalue. The Hessian of the quadratic part of $L_S$ is $\hat{\Sigma}_{SS}$, and positive definiteness gives the lower bound $b^\top\hat{\Sigma}_{SS}b\ge c_S|b|^2$ for every $b\in\mathbb{R}^S$. Hence the quadratic part of $L_S$ is coercive and strictly convex, while $\lambda\|b\|_1$ is continuous and convex. Therefore $L_S$ has a minimiser by coercivity and continuity, and this minimiser is unique by strict convexity. Denote it by $\hat{\beta}_S \in \mathbb{R}^{S}$. Define \begin{align*} \hat{z}_S := \operatorname{sign}(\beta_S^*) \in \mathbb{R}^{S}. \end{align*} We claim that the vector \begin{align*} \hat{\beta}_S := \beta_S^* + \hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n} - \lambda \hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*) \end{align*} is the restricted minimiser and has sign $\operatorname{sign}(\beta_S^*)$. Indeed, using $Y=X_S\beta_S^*+\varepsilon$, we compute \begin{align*} \frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S) = \frac{1}{n}X_S^\top\varepsilon - \hat{\Sigma}_{SS}(\hat{\beta}_S-\beta_S^*) = \lambda \operatorname{sign}(\beta_S^*). \end{align*} Thus \begin{align*} -\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)+\lambda \hat{z}_S=0. \end{align*} Once we prove $\operatorname{sign}(\hat{\beta}_S)=\operatorname{sign}(\beta_S^*)$, the vector $\hat{z}_S$ belongs to $\partial\|\hat{\beta}_S\|_1$, so this is exactly the restricted [KKT condition](/page/Karush-Kuhn-Tucker%20Conditions), equivalently the [subgradient](/page/Subgradient) optimality equation for the restricted convex objective.[/step]

custom_env admin

[guided]The full Lasso objective used later in the proof is the map $L: \mathbb{R}^p \to \mathbb{R}$ defined by \begin{align*} L(b) := \frac{1}{2n}\|Y-X b\|_2^2+\lambda\|b\|_1. \end{align*} For every index set $A\subset\{1,\dots,p\}$, let $X_A\in\mathbb{R}^{n\times A}$ denote the submatrix of $X$ with columns indexed by $A$, and for every vector $b\in\mathbb{R}^p$, let $b_A\in\mathbb{R}^A$ denote its coordinate subvector indexed by $A$. In particular, $S^c:=\{1,\dots,p\}\setminus S$. The restricted problem keeps only the active variables and temporarily forces all coefficients outside $S$ to be zero. We define the restricted objective $L_S: \mathbb{R}^{S} \to \mathbb{R}$ by \begin{align*} L_S(b) := \frac{1}{2n}\|Y-X_S b\|_2^2+\lambda\|b\|_1. \end{align*} The Hessian of the quadratic part is the active Gram matrix \begin{align*} \hat{\Sigma}_{SS}=\frac{X_S^\top X_S}{n}. \end{align*} For every $v\in\mathbb{R}^{S}$, \begin{align*} v^\top\hat{\Sigma}_{SS}v=\frac{|X_Sv|^2}{n}\ge 0, \end{align*} so $\hat{\Sigma}_{SS}$ is symmetric and positive semidefinite. By hypothesis this matrix is invertible, and an invertible symmetric positive semidefinite matrix is positive definite on $\mathbb{R}^{S}$. Let $c_S>0$ denote the smallest eigenvalue of $\hat{\Sigma}_{SS}$. The Hessian of the quadratic part of $L_S$ is $\hat{\Sigma}_{SS}$, and positive definiteness gives \begin{align*} b^\top\hat{\Sigma}_{SS}b\ge c_S|b|^2 \end{align*} for every $b\in\mathbb{R}^S$. This lower quadratic bound is the reason the quadratic part is coercive and strictly convex. Since $\lambda\|b\|_1$ is continuous and convex, $L_S$ is continuous, coercive, and strictly convex. Coercivity and continuity give existence of a minimiser, and strict convexity gives uniqueness. The candidate restricted solution is chosen by solving the active KKT equation with the intended sign vector. Define \begin{align*} \hat{z}_S := \operatorname{sign}(\beta_S^*) \in \mathbb{R}^{S} \end{align*} and \begin{align*} \hat{\beta}_S := \beta_S^* + \hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n} - \lambda \hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*). \end{align*} Because $Y=X\beta^*+\varepsilon=X_S\beta_S^*+\varepsilon$, the residual for this active candidate is \begin{align*} Y-X_S\hat{\beta}_S = \varepsilon-X_S(\hat{\beta}_S-\beta_S^*). \end{align*} Multiplying by $X_S^\top/n$ gives \begin{align*} \frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S) = \frac{1}{n}X_S^\top\varepsilon - \hat{\Sigma}_{SS}(\hat{\beta}_S-\beta_S^*). \end{align*} Substituting the definition of $\hat{\beta}_S-\beta_S^*$ gives \begin{align*} \frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S) = \frac{1}{n}X_S^\top\varepsilon - \left( \frac{1}{n}X_S^\top\varepsilon - \lambda\operatorname{sign}(\beta_S^*) \right) = \lambda\operatorname{sign}(\beta_S^*). \end{align*} Therefore \begin{align*} -\frac{1}{n}X_S^\top(Y-X_S\hat{\beta}_S)+\lambda\hat{z}_S=0. \end{align*} This is the restricted [KKT condition](/page/Karush-Kuhn-Tucker%20Conditions), equivalently the [subgradient](/page/Subgradient) optimality equation for the restricted Lasso, provided $\hat{z}_S$ is a valid subgradient of $\|\cdot\|_1$ at $\hat{\beta}_S$. That validity is exactly the sign preservation statement proved in the next step.[/guided]

custom_env admin

[guided]We now extend the active restricted solution to a full vector by setting every inactive coefficient to zero: \begin{align*} \hat{\beta}_{S^c}:=0. \end{align*} Let $I_n\in\mathbb{R}^{n\times n}$ denote the identity matrix, and define the inactive-active empirical covariance block $\hat{\Sigma}_{S^cS}\in\mathbb{R}^{S^c\times S}$ by \begin{align*} \hat{\Sigma}_{S^cS}:=\frac{X_{S^c}^\top X_S}{n}. \end{align*} The inactive dual vector is not chosen freely; it is forced by the inactive block of the KKT equation. Since $\lambda>0$ by hypothesis, define $\hat{z}_{S^c}\in\mathbb{R}^{S^c}$ by \begin{align*} \lambda\hat{z}_{S^c}:=\frac{1}{n}X_{S^c}^\top(Y-X\hat{\beta}). \end{align*} Since $\hat{\beta}_{S^c}=0$, the residual is $Y-X\hat{\beta}=Y-X_S\hat{\beta}_S$. Using $Y=X_S\beta_S^*+\varepsilon$ and the active error formula \begin{align*} \hat{\beta}_S-\beta_S^* = \hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n} - \lambda\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*), \end{align*} we first obtain \begin{align*} Y-X_S\hat{\beta}_S= \varepsilon-X_S(\hat{\beta}_S-\beta_S^*). \end{align*} Substituting the active error formula gives \begin{align*} Y-X_S\hat{\beta}_S= \varepsilon -X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top\varepsilon}{n} + \lambda X_S\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*). \end{align*} Factoring the two terms containing $\varepsilon$ gives \begin{align*} Y-X_S\hat{\beta}_S= \left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon + \lambda X_S\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*). \end{align*} Multiplying by $X_{S^c}^\top/n$ and dividing by $\lambda$ gives \begin{align*} \hat{z}_{S^c} = \frac{1}{\lambda} \frac{X_{S^c}^\top}{n} \left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon + \frac{X_{S^c}^\top X_S}{n}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*). \end{align*} By the definition of $\hat{\Sigma}_{S^cS}$, this is \begin{align*} \hat{z}_{S^c} = \frac{1}{\lambda} \frac{X_{S^c}^\top}{n} \left(I_n-X_S\hat{\Sigma}_{SS}^{-1}\frac{X_S^\top}{n}\right)\varepsilon + \hat{\Sigma}_{S^cS}\hat{\Sigma}_{SS}^{-1}\operatorname{sign}(\beta_S^*). \end{align*}[/guided]

custom_env admin

[step:Use strict inactive dual feasibility to prove uniqueness]Assume now that the inactive KKT inequalities for the primal-dual witness are strict. For the inactive dual vector constructed above, this means \begin{align*} \|\hat{z}_{S^c}\|_\infty<1. \end{align*} Let $\tilde{\beta}\in\mathbb{R}^p$ be any Lasso minimiser, and define $\delta:=\tilde{\beta}-\hat{\beta}$. Using the full Lasso objective $L: \mathbb{R}^p \to \mathbb{R}$ defined above, convexity of the quadratic loss and the KKT equation at $\hat{\beta}$ give \begin{align*} L(\tilde{\beta})-L(\hat{\beta}) \ge \lambda\left(\|\tilde{\beta}\|_1-\|\hat{\beta}\|_1-\hat{z}^\top\delta\right). \end{align*} Since both vectors are minimisers, the left-hand side is zero. For each $j\in S$, the identity $\hat{z}_j=\operatorname{sign}(\hat{\beta}_j)$ and convexity of the absolute value give \begin{align*} |\tilde{\beta}_j|-|\hat{\beta}_j|-\hat{z}_j(\tilde{\beta}_j-\hat{\beta}_j)\ge 0. \end{align*} For each $j\in S^c$, the identities $\hat{\beta}_j=0$ and $|\hat{z}_j|<1$ give \begin{align*} |\tilde{\beta}_j|-\hat{z}_j\tilde{\beta}_j \ge (1-|\hat{z}_j|)|\tilde{\beta}_j|. \end{align*} The sum of these nonnegative terms is at most zero, hence each inactive term vanishes. Therefore $\tilde{\beta}_{S^c}=0$. Now $\tilde{\beta}_{S^c}=0$, so the full objective value at $\tilde{\beta}$ equals $L_S(\tilde{\beta}_S)$. If there were some $b\in\mathbb{R}^S$ with $L_S(b)<L_S(\tilde{\beta}_S)$, then the full vector $u\in\mathbb{R}^p$ defined by $u_S=b$ and $u_{S^c}=0$ would satisfy $L(u)<L(\tilde{\beta})$, contradicting that $\tilde{\beta}$ is a full Lasso minimiser. Thus $\tilde{\beta}_S$ minimises the restricted objective. The vector $\hat{\beta}_S$ also minimises the restricted objective by construction. Since $\hat{\Sigma}_{SS}$ is invertible, the restricted objective is [strictly convex](/page/Strict%20Convexity). A strictly convex function has at most one minimiser: if $u\neq v$ were two minimisers, then the midpoint inequality would give a strictly smaller value at $(u+v)/2$, contradicting minimality. Hence $\tilde{\beta}_S=\hat{\beta}_S$ and also $\tilde{\beta}_{S^c}=\hat{\beta}_{S^c}=0$. Thus $\tilde{\beta}=\hat{\beta}$, proving uniqueness.[/step]

custom_env admin

[guided]We now explain why strict inactive dual feasibility rules out every other minimiser. In the language of the theorem, the inactive KKT inequalities are strict exactly when the inactive dual vector constructed above satisfies \begin{align*} \|\hat{z}_{S^c}\|_\infty<1. \end{align*} Let $\tilde{\beta}\in\mathbb{R}^p$ be any minimiser of the full Lasso objective $L: \mathbb{R}^p \to \mathbb{R}$, and define the difference vector $\delta:=\tilde{\beta}-\hat{\beta}$. The KKT equation for $\hat{\beta}$ says \begin{align*} -\frac{1}{n}X^\top(Y-X\hat{\beta})+\lambda\hat{z}=0, \end{align*} so \begin{align*} \frac{1}{n}(Y-X\hat{\beta})^\top X\delta=\lambda\hat{z}^\top\delta. \end{align*} The quadratic loss is convex, and its first-order expansion at $\hat{\beta}$ gives \begin{align*} \frac{1}{2n}\|Y-X\tilde{\beta}\|_2^2-\frac{1}{2n}\|Y-X\hat{\beta}\|_2^2 \ge -\frac{1}{n}(Y-X\hat{\beta})^\top X\delta. \end{align*} Combining this inequality with the KKT identity gives \begin{align*} L(\tilde{\beta})-L(\hat{\beta}) \ge \lambda\left(\|\tilde{\beta}\|_1-\|\hat{\beta}\|_1-\hat{z}^\top\delta\right). \end{align*} Since both $\tilde{\beta}$ and $\hat{\beta}$ are minimisers, the left-hand side equals $0$. The right-hand side is a sum of coordinatewise subgradient gaps. For each active coordinate $j\in S$, the identity $\hat{z}_j=\operatorname{sign}(\hat{\beta}_j)$ and convexity of the absolute value give \begin{align*} |\tilde{\beta}_j|-|\hat{\beta}_j|-\hat{z}_j(\tilde{\beta}_j-\hat{\beta}_j)\ge 0. \end{align*} For each inactive coordinate $j\in S^c$, the identities $\hat{\beta}_j=0$ and $|\hat{z}_j|<1$ give \begin{align*} |\tilde{\beta}_j|-\hat{z}_j\tilde{\beta}_j \ge |\tilde{\beta}_j|-|\hat{z}_j|\,|\tilde{\beta}_j| = (1-|\hat{z}_j|)|\tilde{\beta}_j|. \end{align*} Every coordinate contribution is nonnegative, while their sum is at most $0$. Hence every inactive contribution must vanish. Because $1-|\hat{z}_j|>0$ for each $j\in S^c$, this forces $|\tilde{\beta}_j|=0$ for each inactive coordinate, so $\tilde{\beta}_{S^c}=0$. With the inactive coordinates zero, the full objective value at $\tilde{\beta}$ equals $L_S(\tilde{\beta}_S)$. We must justify that $\tilde{\beta}_S$ minimises the restricted objective. Suppose instead that there were some $b\in\mathbb{R}^S$ with $L_S(b)<L_S(\tilde{\beta}_S)$. Define $u\in\mathbb{R}^p$ by $u_S=b$ and $u_{S^c}=0$. Because $u$ is supported on $S$, the full objective at $u$ equals $L_S(b)$, while the full objective at $\tilde{\beta}$ equals $L_S(\tilde{\beta}_S)$. This would give $L(u)<L(\tilde{\beta})$, contradicting that $\tilde{\beta}$ is a full Lasso minimiser. Hence $\tilde{\beta}_S$ minimises $L_S$. The vector $\hat{\beta}_S$ also minimises the restricted objective $L_S$ by construction. The matrix $\hat{\Sigma}_{SS}$ is invertible, so $L_S$ is [strictly convex](/page/Strict%20Convexity). Strict convexity gives uniqueness of the restricted minimiser: if two different restricted minimisers existed, evaluating $L_S$ at their midpoint would give a strictly smaller value than the common minimum, contradicting minimality. Therefore $\tilde{\beta}_S=\hat{\beta}_S$. Combining this with $\tilde{\beta}_{S^c}=\hat{\beta}_{S^c}=0$ yields $\tilde{\beta}=\hat{\beta}$, proving uniqueness.[/guided]

custom_env admin

What brings you to Androma?

Start with a route through the knowledge graph.

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Sign in to Androma

Check your inbox

One last step

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Raw Attribution Data