Lasso Oracle Inequality under a Restricted Eigenvalue Condition

Lasso Oracle Inequality under a Restricted Eigenvalue Condition (Theorem # 5551)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We compare the Lasso objective at $\hat{\beta}$ and at the true vector $\beta^*$ to obtain the basic inequality. The score bound controls the stochastic [inner product](/page/Inner%20Product), while the support condition $\operatorname{supp}(\beta^*) \subset S$ separates the $\ell^1$ penalty over $S$ and $S^c$. This yields the cone condition $|\Delta_{S^c}|_1 \leq 3|\Delta_S|_1$ for the error $\Delta := \hat{\beta}-\beta^*$, allowing the restricted eigenvalue condition to convert prediction error into control of $|\Delta_S|_2$. The prediction bound follows by rearrangement, and the $\ell^1$ bound follows by combining the cone condition with the same restricted eigenvalue estimate. [/proofplan] [step:Derive the basic inequality from optimality of the Lasso estimator] Define the estimation error $\Delta \in \mathbb{R}^p$ by \begin{align*} \Delta := \hat{\beta} - \beta^*. \end{align*} Since $\hat{\beta}$ minimizes the Lasso objective over $\mathbb{R}^p$, evaluating the objective at $\hat{\beta}$ and at $\beta^*$ gives \begin{align*} \frac{1}{2n}|Y - X\hat{\beta}|^2 + \lambda |\hat{\beta}|_1 \leq \frac{1}{2n}|Y - X\beta^*|^2 + \lambda |\beta^*|_1. \end{align*} Using $Y = X\beta^* + \varepsilon$ and $\hat{\beta} = \beta^* + \Delta$, we have \begin{align*} Y - X\hat{\beta} = \varepsilon - X\Delta, \qquad Y - X\beta^* = \varepsilon. \end{align*} Substitution gives \begin{align*} \frac{1}{2n}|\varepsilon - X\Delta|^2 + \lambda |\beta^*+\Delta|_1 \leq \frac{1}{2n}|\varepsilon|^2 + \lambda |\beta^*|_1. \end{align*} Expanding the square, \begin{align*} |\varepsilon - X\Delta|^2 = |\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2. \end{align*} After cancellation of $|\varepsilon|^2/(2n)$, this becomes \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{\varepsilon^\top X\Delta}{n} + \lambda\bigl(|\beta^*|_1 - |\beta^*+\Delta|_1\bigr). \end{align*} [/step] [step:Use the score bound and support decomposition to place the error in the cone] The score bound gives \begin{align*} \frac{\varepsilon^\top X\Delta}{n} = \left(\frac{X^\top \varepsilon}{n}\right)^\top \Delta \leq \left|\frac{X^\top \varepsilon}{n}\right|_\infty |\Delta|_1 \leq \frac{\lambda}{2}|\Delta|_1. \end{align*} Since $\operatorname{supp}(\beta^*) \subset S$, one has $\beta^*_{S^c}=0$. Therefore, \begin{align*} |\beta^*|_1 - |\beta^*+\Delta|_1 = |\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 - |\Delta_{S^c}|_1. \end{align*} The [reverse triangle inequality](/theorems/2300) on $\mathbb{R}^{|S|}$ applied to $\beta^*_S$ and $\beta^*_S+\Delta_S$ gives \begin{align*} |\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 \leq |\Delta_S|_1. \end{align*} Hence \begin{align*} |\beta^*|_1 - |\beta^*+\Delta|_1 \leq |\Delta_S|_1 - |\Delta_{S^c}|_1. \end{align*} Combining these estimates with the basic inequality gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{\lambda}{2}\bigl(|\Delta_S|_1 + |\Delta_{S^c}|_1\bigr) + \lambda\bigl(|\Delta_S|_1 - |\Delta_{S^c}|_1\bigr). \end{align*} Collecting the coefficients of $|\Delta_S|_1$ and $|\Delta_{S^c}|_1$ yields \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}|\Delta_S|_1 - \frac{\lambda}{2}|\Delta_{S^c}|_1. \end{align*} Since the left-hand side is non-negative and $\lambda > 0$, it follows that \begin{align*} |\Delta_{S^c}|_1 \leq 3|\Delta_S|_1. \end{align*} [guided] The goal of this step is to prove that the Lasso error $\Delta$ lies in the cone required by the restricted eigenvalue hypothesis. We begin with the stochastic term from the basic inequality. Because \begin{align*} \frac{\varepsilon^\top X\Delta}{n} = \left(\frac{X^\top \varepsilon}{n}\right)^\top \Delta, \end{align*} we can apply the finite-dimensional duality inequality between $|\cdot|_\infty$ and $|\cdot|_1$: \begin{align*} \left(\frac{X^\top \varepsilon}{n}\right)^\top \Delta \leq \left|\frac{X^\top \varepsilon}{n}\right|_\infty |\Delta|_1. \end{align*} The score assumption then gives \begin{align*} \frac{\varepsilon^\top X\Delta}{n} \leq \frac{\lambda}{2}|\Delta|_1 = \frac{\lambda}{2}\bigl(|\Delta_S|_1 + |\Delta_{S^c}|_1\bigr). \end{align*} Next we use sparsity. Since $\operatorname{supp}(\beta^*) \subset S$, the vector $\beta^*$ vanishes on $S^c$, so $\beta^*_{S^c}=0$. Hence the $\ell^1$ penalty separates as \begin{align*} |\beta^*+\Delta|_1 = |\beta^*_S+\Delta_S|_1 + |\Delta_{S^c}|_1. \end{align*} Also $|\beta^*|_1 = |\beta^*_S|_1$. Therefore \begin{align*} |\beta^*|_1 - |\beta^*+\Delta|_1 = |\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 - |\Delta_{S^c}|_1. \end{align*} By the reverse triangle inequality on the coordinates in $S$, \begin{align*} |\beta^*_S|_1 - |\beta^*_S+\Delta_S|_1 \leq |\Delta_S|_1. \end{align*} Thus \begin{align*} |\beta^*|_1 - |\beta^*+\Delta|_1 \leq |\Delta_S|_1 - |\Delta_{S^c}|_1. \end{align*} Substituting the stochastic estimate and the penalty estimate into the basic inequality yields \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{\lambda}{2}\bigl(|\Delta_S|_1 + |\Delta_{S^c}|_1\bigr) + \lambda\bigl(|\Delta_S|_1 - |\Delta_{S^c}|_1\bigr). \end{align*} Collecting terms gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}|\Delta_S|_1 - \frac{\lambda}{2}|\Delta_{S^c}|_1. \end{align*} The left-hand side is non-negative because it is a squared Euclidean norm divided by $2n$. Hence the right-hand side must be non-negative: \begin{align*} 0 \leq \frac{3\lambda}{2}|\Delta_S|_1 - \frac{\lambda}{2}|\Delta_{S^c}|_1. \end{align*} Since $\lambda > 0$, multiplying by $2/\lambda$ gives the cone condition \begin{align*} |\Delta_{S^c}|_1 \leq 3|\Delta_S|_1. \end{align*} This is exactly the structural condition needed to invoke the restricted eigenvalue hypothesis. [/guided] [/step] [step:Apply the restricted eigenvalue condition to bound the prediction error] The preceding step proves that $\Delta$ satisfies \begin{align*} |\Delta_{S^c}|_1 \leq 3|\Delta_S|_1. \end{align*} By the restricted eigenvalue condition on $(S,3)$, \begin{align*} \frac{1}{n}|X\Delta|^2 \geq \kappa^2 |\Delta_S|_2^2. \end{align*} Also, since $\Delta_S$ is supported on at most $s$ coordinates, the [Cauchy-Schwarz inequality](/theorems/432) in $\mathbb{R}^s$ gives \begin{align*} |\Delta_S|_1 \leq \sqrt{s}\,|\Delta_S|_2. \end{align*} From the sharpened basic inequality in the previous step, \begin{align*} \frac{1}{n}|X\Delta|^2 \leq 3\lambda |\Delta_S|_1. \end{align*} Combining the last three estimates gives \begin{align*} \frac{1}{n}|X\Delta|^2 \leq 3\lambda \sqrt{s}\,|\Delta_S|_2 \leq 3\lambda \sqrt{s}\,\frac{|X\Delta|}{\sqrt{n}\,\kappa}. \end{align*} If $|X\Delta|=0$, the desired prediction bound holds. If $|X\Delta|>0$, divide by $|X\Delta|/\sqrt{n}$ to obtain \begin{align*} \frac{|X\Delta|}{\sqrt{n}} \leq \frac{3\lambda\sqrt{s}}{\kappa}. \end{align*} Squaring both sides yields \begin{align*} \frac{1}{n}|X\Delta|^2 \leq 9\frac{s\lambda^2}{\kappa^2}. \end{align*} [/step] [step:Convert the prediction bound into the $\ell^1$ estimation bound] The cone condition gives \begin{align*} |\Delta|_1 = |\Delta_S|_1 + |\Delta_{S^c}|_1 \leq 4|\Delta_S|_1. \end{align*} Using Cauchy-Schwarz on the coordinates in $S$, \begin{align*} |\Delta|_1 \leq 4\sqrt{s}\,|\Delta_S|_2. \end{align*} Since $\Delta$ satisfies the restricted eigenvalue cone condition, \begin{align*} |\Delta_S|_2 \leq \frac{|X\Delta|}{\sqrt{n}\,\kappa}. \end{align*} The prediction estimate from the previous step gives \begin{align*} \frac{|X\Delta|}{\sqrt{n}} \leq \frac{3\lambda\sqrt{s}}{\kappa}. \end{align*} Therefore, \begin{align*} |\Delta|_1 \leq 4\sqrt{s}\,\frac{1}{\kappa}\frac{|X\Delta|}{\sqrt{n}} \leq 4\sqrt{s}\,\frac{1}{\kappa}\frac{3\lambda\sqrt{s}}{\kappa} = 12\frac{s\lambda}{\kappa^2}. \end{align*} Since $\Delta = \hat{\beta}-\beta^*$, the two displayed estimates are precisely \begin{align*} \frac{1}{n}|X(\hat{\beta}-\beta^*)|^2 \leq 9\frac{s\lambda^2}{\kappa^2} \end{align*} and \begin{align*} |\hat{\beta}-\beta^*|_1 \leq 12\frac{s\lambda}{\kappa^2}. \end{align*} [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Explore Further

Triangle Inequality For Inner Product Spaces Theorem #433 Reverse Triangle Inequality Theorem #2300 Chi-Squared Distribution of the Residual Sum of Squares Probability & Statistics Bias-Variance Decomposition for Squared Loss Probability & Statistics Fast-Rate Lasso Prediction Bound Under the Compatibility Condition Probability & Statistics Blumenthal's Zero-One Law Brownian Motion Basic Properties of Conditional Expectation Conditional Expectation Mean Response and Prediction Intervals in the Gaussian Linear Model Probability & Statistics Jensen's Inequality Probability Theory Cramer's Theorem Large Deviations Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.