Lasso Score Feasibility — Statement & Proof

Lasso Score Feasibility (Theorem # 5576)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the score bound directly from the defining minimality of the Lasso solution. The argument compares the Lasso objective at the minimizer with coordinatewise positive and negative perturbations. The positive perturbations give an upper bound on each score coordinate, the negative perturbations give the corresponding lower bound, and the coordinatewise bounds combine to give the stated $\ell_\infty$ feasibility estimate. [/proofplan] [step:Write the one-coordinate optimality inequality] For each $j \in \{1,\dots,p\}$, let $e_j \in \mathbb{R}^p$ denote the $j$-th standard basis vector. Define the residual vector $r \in \mathbb{R}^n$ by \begin{align*} r := Y-X\hat\beta_{\mathrm{L}}. \end{align*} Define the score vector $s \in \mathbb{R}^p$ by \begin{align*} s := \frac{X^\top r}{n}. \end{align*} Since $\hat\beta_{\mathrm{L}}$ minimizes the Lasso objective over $\mathbb{R}^p$, for every $t \in \mathbb{R}$ we have \begin{align*} 0 &\leq \frac{1}{2n}|Y-X(\hat\beta_{\mathrm{L}}+t e_j)|^2+\lambda_L\|\hat\beta_{\mathrm{L}}+t e_j\|_1 -\frac{1}{2n}|Y-X\hat\beta_{\mathrm{L}}|^2-\lambda_L\|\hat\beta_{\mathrm{L}}\|_1. \end{align*} Because $Y-X(\hat\beta_{\mathrm{L}}+t e_j)=r-tXe_j$, expanding the squared Euclidean norm gives \begin{align*} |r-tXe_j|^2-|r|^2 &= -2t(Xe_j)\cdot r+t^2|Xe_j|^2. \end{align*} Also $(Xe_j)\cdot r=e_j^\top X^\top r=n s_j$. Therefore \begin{align*} 0 \leq -t s_j+\frac{t^2}{2n}|Xe_j|^2 +\lambda_L\left(\|\hat\beta_{\mathrm{L}}+t e_j\|_1-\|\hat\beta_{\mathrm{L}}\|_1\right). \end{align*} [/step] [step:Use positive perturbations to prove $s_j \leq \lambda_L$] Let $t>0$. The ordinary triangle inequality for the $\ell_1$ norm, rearranged, gives \begin{align*} \|\hat\beta_{\mathrm{L}}+t e_j\|_1-\|\hat\beta_{\mathrm{L}}\|_1 \leq \|t e_j\|_1 = t. \end{align*} Using this in the one-coordinate optimality inequality yields \begin{align*} 0 &\leq -t s_j+\frac{t^2}{2n}|Xe_j|^2+\lambda_L t. \end{align*} Dividing by $t>0$ gives \begin{align*} s_j \leq \frac{t}{2n}|Xe_j|^2+\lambda_L. \end{align*} Letting $t \to 0^+$ gives $s_j \leq \lambda_L$. [guided] Fix a coordinate $j \in \{1,\dots,p\}$. Let $e_j \in \mathbb{R}^p$ denote the $j$-th standard basis vector. Define the residual vector $r \in \mathbb{R}^n$ by \begin{align*} r := Y-X\hat\beta_{\mathrm{L}}. \end{align*} Define the score vector $s \in \mathbb{R}^p$ by \begin{align*} s := \frac{X^\top r}{n}. \end{align*} We perturb the minimizer only in the positive $e_j$ direction. Let $t>0$. Since $\hat\beta_{\mathrm{L}}$ is a global minimizer of the Lasso objective over $\mathbb{R}^p$, the objective value at $\hat\beta_{\mathrm{L}}+t e_j$ is at least the objective value at $\hat\beta_{\mathrm{L}}$. Therefore \begin{align*} 0 \leq \frac{1}{2n}|Y-X(\hat\beta_{\mathrm{L}}+t e_j)|^2+\lambda_L\|\hat\beta_{\mathrm{L}}+t e_j\|_1 -\frac{1}{2n}|Y-X\hat\beta_{\mathrm{L}}|^2-\lambda_L\|\hat\beta_{\mathrm{L}}\|_1. \end{align*} Because $Y-X(\hat\beta_{\mathrm{L}}+t e_j)=r-tXe_j$, expanding the squared Euclidean norm gives \begin{align*} |r-tXe_j|^2-|r|^2 = -2t(Xe_j)\cdot r+t^2|Xe_j|^2. \end{align*} Also $(Xe_j)\cdot r=e_j^\top X^\top r=n s_j$, by the definition of $s$ and the definition of the $j$-th coordinate. Substituting these identities into the objective comparison gives \begin{align*} 0 \leq -t s_j+\frac{t^2}{2n}|Xe_j|^2 +\lambda_L\left(\|\hat\beta_{\mathrm{L}}+t e_j\|_1-\|\hat\beta_{\mathrm{L}}\|_1\right). \end{align*} The remaining term is the change in the $\ell_1$ norm. We control it using the triangle inequality for the $\ell_1$ norm in the form \begin{align*} \|a+b\|_1\leq \|a\|_1+\|b\|_1, \end{align*} which implies \begin{align*} \|a+b\|_1-\|a\|_1 \leq \|b\|_1. \end{align*} Apply this with $a=\hat\beta_{\mathrm{L}}$ and $b=t e_j$. Since $t>0$ and $e_j$ has exactly one nonzero coordinate equal to $1$, \begin{align*} \|t e_j\|_1=t. \end{align*} Thus \begin{align*} \|\hat\beta_{\mathrm{L}}+t e_j\|_1-\|\hat\beta_{\mathrm{L}}\|_1 \leq t. \end{align*} Substituting this estimate into the objective comparison gives \begin{align*} 0 &\leq -t s_j+\frac{t^2}{2n}|Xe_j|^2+\lambda_L t. \end{align*} After dividing by $t>0$, we obtain \begin{align*} s_j \leq \frac{t}{2n}|Xe_j|^2+\lambda_L. \end{align*} The term $\frac{t}{2n}|Xe_j|^2$ tends to $0$ as $t \to 0^+$ because $|Xe_j|^2/(2n)$ is a fixed finite scalar. Hence \begin{align*} s_j \leq \lambda_L. \end{align*} This is the upper coordinate bound for the score. [/guided] [/step] [step:Use negative perturbations to prove $-s_j \leq \lambda_L$] Let $t<0$. Again the ordinary triangle inequality for the $\ell_1$ norm, rearranged, gives \begin{align*} \|\hat\beta_{\mathrm{L}}+t e_j\|_1-\|\hat\beta_{\mathrm{L}}\|_1 \leq \|t e_j\|_1 = |t|. \end{align*} Since $t<0$, $|t|=-t$. The one-coordinate optimality inequality becomes \begin{align*} 0 &\leq -t s_j+\frac{t^2}{2n}|Xe_j|^2-\lambda_L t. \end{align*} Dividing by $-t>0$ gives \begin{align*} s_j \geq \frac{t}{2n}|Xe_j|^2-\lambda_L. \end{align*} Letting $t \to 0^-$ gives $s_j \geq -\lambda_L$, equivalently $-s_j \leq \lambda_L$. [/step] [step:Take the maximum over coordinates] The previous two steps show that for every $j \in \{1,\dots,p\}$, \begin{align*} -\lambda_L \leq s_j \leq \lambda_L. \end{align*} Hence $|s_j|\leq \lambda_L$ for every coordinate $j$. By the definition of the $\ell_\infty$ norm on $\mathbb{R}^p$, \begin{align*} \|s\|_\infty = \max_{1\leq j\leq p}|s_j| \leq \lambda_L. \end{align*} Substituting $s=X^\top(Y-X\hat\beta_{\mathrm{L}})/n$ gives \begin{align*} \left\|\frac{X^\top(Y-X\hat\beta_{\mathrm{L}})}{n}\right\|_\infty \leq \lambda_L, \end{align*} which is the desired score feasibility bound. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Triangle Inequality For Inner Product Spaces

Explore Further

Triangle Inequality For Inner Product Spaces Theorem #433 Conditional Expectation from a Conditional Density Probability Theory Frisch-Waugh-Lovell Theorem Probability & Statistics Conditional Variance Formula for the Ordinary Least Squares Estimator Probability & Statistics Elementary Closure Properties Probability & Statistics Pushforward Formula Probability Theory Subdifferential of the $\ell^1$ Norm Probability & Statistics Immediate Return to Zero for Brownian Motion Brownian Motion Cook's Distance Formula for Internally Studentized Residuals Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Lasso Score Feasibility (Theorem # 5576)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Lasso Score Feasibility (Theorem # 5576)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further