Basic Lasso Inequality — Statement & Proof

Basic Lasso Inequality (Theorem # 5549)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The proof uses only the defining optimality of the Lasso estimator. We compare the Lasso objective at the minimizer $\hat{\beta}$ with its value at the feasible point $\beta^*$, substitute the model identity $Y = X\beta^* + \varepsilon$, and expand the resulting squared Euclidean norm. After canceling the common noise term $|\varepsilon|^2$, the remaining terms rearrange to the desired deterministic inequality. [/proofplan] [step:Compare the Lasso objective at $\hat{\beta}$ and $\beta^*$] Define the Lasso objective $Q: \mathbb{R}^p \to \mathbb{R}$ by \begin{align*} Q(b) := \frac{1}{2n}|Y-Xb|^2 + \lambda |b|_1. \end{align*} Since $\hat{\beta}$ minimizes $Q$ over $\mathbb{R}^p$ and $\beta^* \in \mathbb{R}^p$ is an admissible comparison point, we have \begin{align*} \frac{1}{2n}|Y-X\hat{\beta}|^2 + \lambda |\hat{\beta}|_1 \leq \frac{1}{2n}|Y-X\beta^*|^2 + \lambda |\beta^*|_1. \end{align*} [/step] [step:Substitute the linear model and expand the squared norm] Define the prediction-error vector $r \in \mathbb{R}^n$ by \begin{align*} r := X(\hat{\beta}-\beta^*). \end{align*} Using $Y = X\beta^* + \varepsilon$, we first compute \begin{align*} Y-X\hat{\beta}=X\beta^*+\varepsilon-X\hat{\beta}. \end{align*} Since $X\beta^*-X\hat{\beta}=-X(\hat{\beta}-\beta^*)$, this gives \begin{align*} Y-X\hat{\beta}=\varepsilon-X(\hat{\beta}-\beta^*). \end{align*} By the definition of $r$, we therefore have \begin{align*} Y-X\hat{\beta}=\varepsilon-r. \end{align*} At the comparison point $\beta^*$, the residual is \begin{align*} Y-X\beta^*=\varepsilon. \end{align*} Therefore the optimality inequality becomes \begin{align*} \frac{1}{2n}|\varepsilon-r|^2+\lambda|\hat{\beta}|_1 \leq \frac{1}{2n}|\varepsilon|^2+\lambda|\beta^*|_1. \end{align*} Expanding the Euclidean square first gives \begin{align*} |\varepsilon-r|^2=(\varepsilon-r)^\top(\varepsilon-r). \end{align*} Multiplying out the Euclidean [inner product](/page/Inner%20Product) yields \begin{align*} (\varepsilon-r)^\top(\varepsilon-r)=|\varepsilon|^2-2\varepsilon^\top r+|r|^2. \end{align*} Thus \begin{align*} |\varepsilon-r|^2=|\varepsilon|^2-2\varepsilon^\top r+|r|^2. \end{align*} [guided] The comparison inequality contains the residual $Y-X\hat{\beta}$, but the theorem is stated in terms of the prediction error $X(\hat{\beta}-\beta^*)$. We therefore name that vector. Define $r \in \mathbb{R}^n$ by \begin{align*} r := X(\hat{\beta}-\beta^*). \end{align*} The model identity $Y=X\beta^*+\varepsilon$ converts the Lasso residual into noise minus prediction error. First, \begin{align*} Y-X\hat{\beta}=X\beta^*+\varepsilon-X\hat{\beta}. \end{align*} Since $X\beta^*-X\hat{\beta}=-X(\hat{\beta}-\beta^*)$, this becomes \begin{align*} Y-X\hat{\beta}=\varepsilon-X(\hat{\beta}-\beta^*). \end{align*} By the definition of $r$, this is \begin{align*} Y-X\hat{\beta}=\varepsilon-r. \end{align*} At the comparison point $\beta^*$, the residual is just the noise: \begin{align*} Y-X\beta^*=\varepsilon. \end{align*} Substituting these two identities into the objective comparison gives \begin{align*} \frac{1}{2n}|\varepsilon-r|^2+\lambda|\hat{\beta}|_1 \leq \frac{1}{2n}|\varepsilon|^2+\lambda|\beta^*|_1. \end{align*} Now expand the square using the Euclidean inner product on $\mathbb{R}^n$: \begin{align*} |\varepsilon-r|^2=(\varepsilon-r)^\top(\varepsilon-r). \end{align*} The inner product expansion gives \begin{align*} (\varepsilon-r)^\top(\varepsilon-r)=\varepsilon^\top\varepsilon-2\varepsilon^\top r+r^\top r. \end{align*} Using $\varepsilon^\top\varepsilon=|\varepsilon|^2$ and $r^\top r=|r|^2$, we get \begin{align*} |\varepsilon-r|^2=|\varepsilon|^2-2\varepsilon^\top r+|r|^2. \end{align*} This expansion is the only algebraic point in the proof: it separates the residual norm into the pure noise term, the stochastic inner-product term, and the deterministic prediction-error term. [/guided] [/step] [step:Cancel the common noise term and rearrange] Substituting the expansion into the preceding inequality yields \begin{align*} \frac{1}{2n}\bigl(|\varepsilon|^2-2\varepsilon^\top r+|r|^2\bigr)+\lambda|\hat{\beta}|_1 \leq \frac{1}{2n}|\varepsilon|^2+\lambda|\beta^*|_1. \end{align*} Subtracting $\frac{1}{2n}|\varepsilon|^2$ from both sides gives \begin{align*} -\frac{1}{n}\varepsilon^\top r+\frac{1}{2n}|r|^2+\lambda|\hat{\beta}|_1 \leq \lambda|\beta^*|_1. \end{align*} Moving the inner-product term and the penalty term to the right-hand side gives \begin{align*} \frac{1}{2n}|r|^2 \leq \frac{1}{n}\varepsilon^\top r+\lambda\bigl(|\beta^*|_1-|\hat{\beta}|_1\bigr). \end{align*} Finally substituting $r=X(\hat{\beta}-\beta^*)$ gives \begin{align*} \frac{1}{2n}|X(\hat{\beta}-\beta^*)|^2 \leq \frac{1}{n}\varepsilon^\top X(\hat{\beta}-\beta^*)+\lambda\bigl(|\beta^*|_1-|\hat{\beta}|_1\bigr), \end{align*} which is the desired inequality. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Inner Product

Explore Further

Inner Product Definition Coordinate Projections on a Product Space Are Independent Probability Theory Uniform Integrability and $L^1$ Convergence Martingale Theory Basic Properties of the Ordinary Least Squares Hat Matrix Probability & Statistics Independence Through Rectangles Probability & Statistics Orthogonality of Ordinary Least Squares Residuals Probability & Statistics Variance Inflation Factor Formula Probability & Statistics PGF of a Sum Probability Theory Asymptotic Normality of Random Design Ordinary Least Squares Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Basic Lasso Inequality (Theorem # 5549)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Basic Lasso Inequality (Theorem # 5549)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further