Restricted Eigenvalue Euclidean Error Bound for the Lasso

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] Set $\Delta = \hat{\beta}-\beta^*$ and use the optimality of $\hat{\beta}$ in the Lasso objective to obtain the basic inequality. The score event controls the stochastic [inner product](/page/Inner%20Product) by $\lambda\|\Delta\|_1/2$, while the support condition $\beta^*_{S^c}=0$ separates the $\ell^1$ penalty into $S$ and $S^c$. This simultaneously proves the cone condition $\Delta \in \mathcal{C}(S,3)$ and gives an upper bound for $|X\Delta|^2/n$ in terms of $\|\Delta_S\|_1$. The restricted eigenvalue lower bound then converts the prediction norm bound into a Euclidean bound for $\Delta$. [/proofplan] [step:Derive the sharpened basic inequality from Lasso optimality] Define the estimation error vector \begin{align*} \Delta = \hat{\beta}-\beta^* \in \mathbb{R}^p . \end{align*} For any subset $A \subset \{1,\dots,p\}$, define $\Delta_A \in \mathbb{R}^A$ to be the coordinate restriction $(\Delta_j)_{j\in A}$, where $\mathbb{R}^A$ denotes the finite-dimensional real [vector space](/page/Vector%20Space) of functions from $A$ to $\mathbb{R}$. Since $\hat{\beta}$ minimizes $Q_\lambda$ over $\mathbb{R}^p$, we have \begin{align*} \frac{1}{2n}|Y-X\hat{\beta}|^2 + \lambda\|\hat{\beta}\|_1 \leq \frac{1}{2n}|Y-X\beta^*|^2 + \lambda\|\beta^*\|_1 . \end{align*} Using $Y=X\beta^*+\varepsilon$ and $\hat{\beta}=\beta^*+\Delta$, this becomes \begin{align*} \frac{1}{2n}|\varepsilon-X\Delta|^2 + \lambda\|\beta^*+\Delta\|_1 \leq \frac{1}{2n}|\varepsilon|^2 + \lambda\|\beta^*\|_1 . \end{align*} Expanding the squared Euclidean norm gives \begin{align*} |\varepsilon-X\Delta|^2 = |\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2 . \end{align*} Substituting this identity and cancelling $|\varepsilon|^2/(2n)$ yields \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{1}{n}\varepsilon^\top X\Delta + \lambda\bigl(\|\beta^*\|_1-\|\beta^*+\Delta\|_1\bigr). \end{align*} On the score event $\mathcal{E}_\lambda$, the vector $X^\top\varepsilon/n$ and the vector $\Delta$ both belong to the finite-dimensional space $\mathbb{R}^p$, so the finite-dimensional [Hölder inequality](/page/Holder%20Inequality) with conjugate exponents $\infty$ and $1$ applies: \begin{align*} \frac{1}{n}\varepsilon^\top X\Delta = \left(\frac{1}{n}X^\top\varepsilon\right)^\top \Delta \leq \left\|\frac{1}{n}X^\top\varepsilon\right\|_\infty \|\Delta\|_1 \leq \frac{\lambda}{2}\|\Delta\|_1 . \end{align*} Because $\beta^*_{S^c}=0$, the support decomposition gives \begin{align*} \|\beta^*\|_1-\|\beta^*+\Delta\|_1 = \|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1-\|\Delta_{S^c}\|_1 . \end{align*} The vectors $\beta^*_S$ and $\Delta_S$ belong to the finite-dimensional coordinate space $\mathbb{R}^S$, so the reverse form of the [Triangle Inequality](/page/Triangle%20Inequality) for the $\ell^1$ norm on $\mathbb{R}^S$ gives \begin{align*} \|\beta^*\|_1-\|\beta^*+\Delta\|_1 \leq \|\Delta_S\|_1-\|\Delta_{S^c}\|_1 . \end{align*} Since $\|\Delta\|_1=\|\Delta_S\|_1+\|\Delta_{S^c}\|_1$, we obtain \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{\lambda}{2}\bigl(\|\Delta_S\|_1+\|\Delta_{S^c}\|_1\bigr) + \lambda\bigl(\|\Delta_S\|_1-\|\Delta_{S^c}\|_1\bigr). \end{align*} Collecting coefficients gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 - \frac{\lambda}{2}\|\Delta_{S^c}\|_1 . \end{align*} [guided] We first convert the fact that $\hat{\beta}$ minimizes the Lasso objective into an inequality involving the error vector. Define \begin{align*} \Delta = \hat{\beta}-\beta^* \in \mathbb{R}^p . \end{align*} For any subset $A \subset \{1,\dots,p\}$, define $\Delta_A \in \mathbb{R}^A$ to be the coordinate restriction $(\Delta_j)_{j\in A}$, where $\mathbb{R}^A$ denotes the finite-dimensional real vector space of functions from $A$ to $\mathbb{R}$. In particular, $\Delta_S$ keeps the coordinates indexed by $S$, while $\Delta_{S^c}$ keeps the coordinates indexed by the complement $S^c$ in $\{1,\dots,p\}$. The objective being minimized is the map $Q_\lambda: \mathbb{R}^p \to \mathbb{R}$ defined by sending each $\beta \in \mathbb{R}^p$ to the real number $Q_\lambda(\beta)$, where \begin{align*} Q_\lambda(\beta) = \frac{1}{2n}|Y-X\beta|^2+\lambda\|\beta\|_1 . \end{align*} Since $\hat{\beta}$ is a minimizer, comparing it with the admissible vector $\beta^*$ gives \begin{align*} \frac{1}{2n}|Y-X\hat{\beta}|^2 + \lambda\|\hat{\beta}\|_1 \leq \frac{1}{2n}|Y-X\beta^*|^2 + \lambda\|\beta^*\|_1 . \end{align*} Now use the model equation $Y=X\beta^*+\varepsilon$ and the identity $\hat{\beta}=\beta^*+\Delta$. Then \begin{align*} Y-X\hat{\beta} = X\beta^*+\varepsilon-X(\beta^*+\Delta) = \varepsilon-X\Delta, \end{align*} while $Y-X\beta^*=\varepsilon$. Therefore \begin{align*} \frac{1}{2n}|\varepsilon-X\Delta|^2 + \lambda\|\beta^*+\Delta\|_1 \leq \frac{1}{2n}|\varepsilon|^2 + \lambda\|\beta^*\|_1 . \end{align*} Expanding the square in the Euclidean inner product on $\mathbb{R}^n$ gives \begin{align*} |\varepsilon-X\Delta|^2 = |\varepsilon|^2 - 2\varepsilon^\top X\Delta + |X\Delta|^2 . \end{align*} After substituting this expansion and cancelling $|\varepsilon|^2/(2n)$ from both sides, we get the basic inequality \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{1}{n}\varepsilon^\top X\Delta + \lambda\bigl(\|\beta^*\|_1-\|\beta^*+\Delta\|_1\bigr). \end{align*} The next task is to control the two terms on the right-hand side. The score event was designed exactly to control the stochastic inner product. Since \begin{align*} \frac{1}{n}\varepsilon^\top X\Delta = \left(\frac{1}{n}X^\top\varepsilon\right)^\top\Delta, \end{align*} the vector $X^\top\varepsilon/n$ and the vector $\Delta$ lie in the same finite-dimensional space $\mathbb{R}^p$. Therefore the finite-dimensional [Hölder inequality](/page/Holder%20Inequality) with conjugate exponents $\infty$ and $1$ gives \begin{align*} \frac{1}{n}\varepsilon^\top X\Delta \leq \left\|\frac{1}{n}X^\top\varepsilon\right\|_\infty\|\Delta\|_1 . \end{align*} On $\mathcal{E}_\lambda$, the first factor is at most $\lambda/2$, hence \begin{align*} \frac{1}{n}\varepsilon^\top X\Delta \leq \frac{\lambda}{2}\|\Delta\|_1 . \end{align*} It remains to exploit sparsity. Since $S=\operatorname{supp}(\beta^*)$, the vector $\beta^*$ has no coordinates outside $S$, so $\beta^*_{S^c}=0$. Therefore \begin{align*} \|\beta^*+\Delta\|_1 = \|\beta^*_S+\Delta_S\|_1+\|\Delta_{S^c}\|_1 . \end{align*} The vectors $\beta^*_S$ and $\Delta_S$ lie in the finite-dimensional coordinate space $\mathbb{R}^S$. Applying the reverse form of the [Triangle Inequality](/page/Triangle%20Inequality) for the $\ell^1$ norm on this space gives \begin{align*} \|\beta^*_S+\Delta_S\|_1 \geq \|\beta^*_S\|_1-\|\Delta_S\|_1 . \end{align*} Thus the support decomposition gives \begin{align*} \|\beta^*\|_1-\|\beta^*+\Delta\|_1 = \|\beta^*_S\|_1-\|\beta^*_S+\Delta_S\|_1-\|\Delta_{S^c}\|_1 . \end{align*} Combining this identity with the [reverse triangle inequality](/theorems/2300) gives \begin{align*} \|\beta^*\|_1-\|\beta^*+\Delta\|_1 \leq \|\Delta_S\|_1-\|\Delta_{S^c}\|_1 . \end{align*} Finally, because $\Delta_S$ and $\Delta_{S^c}$ split the coordinates of $\Delta$, \begin{align*} \|\Delta\|_1=\|\Delta_S\|_1+\|\Delta_{S^c}\|_1 . \end{align*} Substituting the score bound and the penalty bound into the basic inequality gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{\lambda}{2}\bigl(\|\Delta_S\|_1+\|\Delta_{S^c}\|_1\bigr) + \lambda\bigl(\|\Delta_S\|_1-\|\Delta_{S^c}\|_1\bigr). \end{align*} Collecting the coefficients of $\|\Delta_S\|_1$ and $\|\Delta_{S^c}\|_1$ gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 - \frac{\lambda}{2}\|\Delta_{S^c}\|_1 . \end{align*} This is the sharpened basic inequality: the term on $S^c$ has a negative sign, and that sign is what forces the error vector into the restricted eigenvalue cone. [/guided] [/step] [step:Use the sharpened inequality to place the error in the restricted eigenvalue cone] From the sharpened basic inequality, \begin{align*} 0 \leq \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 - \frac{\lambda}{2}\|\Delta_{S^c}\|_1 . \end{align*} Since $\lambda>0$, this implies \begin{align*} \|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1 . \end{align*} Hence $\Delta \in \mathcal{C}(S,3)$. Returning to the same sharpened inequality and discarding the non-positive term $-(\lambda/2)\|\Delta_{S^c}\|_1$, we also obtain \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 . \end{align*} [guided] The sharpened basic inequality contains both the prediction error and the two coordinate parts of the estimation error: \begin{align*} 0 \leq \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 - \frac{\lambda}{2}\|\Delta_{S^c}\|_1 . \end{align*} The [first inequality](/theorems/2897) holds because $|X\Delta|^2$ is a squared Euclidean norm. Since $\lambda>0$, the right-hand side must be non-negative. Multiplying the resulting inequality by $2/\lambda$ gives \begin{align*} \|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1 . \end{align*} This is exactly the defining cone condition for $\mathcal{C}(S,3)$, so $\Delta \in \mathcal{C}(S,3)$. We also need an upper bound for the prediction norm that only involves the coordinates on $S$. In the same sharpened inequality, the term $-(\lambda/2)\|\Delta_{S^c}\|_1$ is non-positive because $\lambda>0$ and $\|\Delta_{S^c}\|_1\geq 0$. Removing this non-positive term enlarges the right-hand side, so \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 . \end{align*} This is the prediction bound that will be paired with the restricted eigenvalue lower bound. [/guided] [/step] [step:Convert the prediction bound into a Euclidean bound using the restricted eigenvalue] If $s=0$, then $S=\varnothing$. The cone condition proved above gives \begin{align*} \|\Delta\|_1=\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1=0, \end{align*} so $\Delta=0$ and the desired estimate follows. Hence assume $s\geq 1$. If $\Delta=0$, then \begin{align*} |\hat{\beta}-\beta^*|=|\Delta|=0 \leq \frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}, \end{align*} so assume $\Delta\neq 0$. Since $\Delta \in \mathcal{C}(S,3)$ and $\kappa_{\mathrm{RE}}(S,3)$ is the infimum of $|Xv|/(\sqrt{n}|v|)$ over nonzero vectors $v \in \mathcal{C}(S,3)$, we have \begin{align*} \frac{|X\Delta|}{\sqrt{n}|\Delta|} \geq \kappa_{\mathrm{RE}}(S,3), \end{align*} and therefore \begin{align*} \frac{1}{n}|X\Delta|^2 \geq \kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2 . \end{align*} The vector $\Delta_S$ belongs to the finite-dimensional coordinate space $\mathbb{R}^S$, which has $s$ coordinates. Applying the finite-dimensional [Cauchy-Schwarz Inequality](/page/Cauchy-Schwarz%20Inequality) to the vectors $(|\Delta_j|)_{j\in S}$ and $(1)_{j\in S}$ gives \begin{align*} \|\Delta_S\|_1 \leq \sqrt{s}\,|\Delta_S| \leq \sqrt{s}\,|\Delta|. \end{align*} Combining this with the upper bound from the preceding step gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\sqrt{s}\,|\Delta|. \end{align*} Using the restricted eigenvalue lower bound on the left-hand side, \begin{align*} \frac{1}{2}\kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2 \leq \frac{3\lambda}{2}\sqrt{s}\,|\Delta|. \end{align*} Since $\Delta\neq 0$, divide by $|\Delta|/2$ to obtain \begin{align*} |\Delta| \leq \frac{3\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}. \end{align*} Because $3\leq 4$ and $\Delta=\hat{\beta}-\beta^*$, it follows that \begin{align*} |\hat{\beta}-\beta^*| \leq \frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}. \end{align*} This is the claimed Euclidean estimation bound. [guided] There is one endpoint case to remove before using the restricted eigenvalue definition. If $s=0$, then $S=\varnothing$. The cone condition from the preceding step gives \begin{align*} \|\Delta\|_1=\|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1=0. \end{align*} Thus $\Delta=0$, and the desired estimate follows. We may therefore assume $s\geq 1$. If $\Delta=0$, then the desired estimate is immediate because \begin{align*} |\hat{\beta}-\beta^*|=|\Delta|=0 \leq \frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}. \end{align*} Assume therefore that $\Delta\neq 0$. The preceding step proved that $\Delta \in \mathcal{C}(S,3)$, and the hypothesis $\kappa_{\mathrm{RE}}(S,3)>0$ says that the restricted eigenvalue is positive on this cone. By the definition of $\kappa_{\mathrm{RE}}(S,3)$ as the infimum of $|Xv|/(\sqrt{n}|v|)$ over nonzero vectors $v\in\mathcal{C}(S,3)$, applying the definition to $v=\Delta$ gives \begin{align*} \frac{|X\Delta|}{\sqrt{n}|\Delta|} \geq \kappa_{\mathrm{RE}}(S,3). \end{align*} Squaring both sides is valid because both sides are non-negative, and hence \begin{align*} \frac{1}{n}|X\Delta|^2 \geq \kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2 . \end{align*} Next we convert the $\ell^1$ term from the prediction upper bound into a Euclidean norm. The vector $\Delta_S$ lies in the finite-dimensional coordinate space $\mathbb{R}^S$, which has $s$ coordinates. Applying the finite-dimensional [Cauchy-Schwarz Inequality](/page/Cauchy-Schwarz%20Inequality) to $(|\Delta_j|)_{j\in S}$ and $(1)_{j\in S}$ gives \begin{align*} \|\Delta_S\|_1 = \sum_{j\in S}|\Delta_j| \leq \left(\sum_{j\in S}|\Delta_j|^2\right)^{1/2} \left(\sum_{j\in S}1^2\right)^{1/2} = \sqrt{s}\,|\Delta_S|. \end{align*} Since restricting to the coordinates in $S$ cannot increase the Euclidean norm, \begin{align*} |\Delta_S|\leq |\Delta|, \end{align*} and therefore \begin{align*} \|\Delta_S\|_1 \leq \sqrt{s}\,|\Delta|. \end{align*} The prediction upper bound from the previous step was \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\|\Delta_S\|_1 . \end{align*} Substituting the Cauchy-Schwarz estimate gives \begin{align*} \frac{1}{2n}|X\Delta|^2 \leq \frac{3\lambda}{2}\sqrt{s}\,|\Delta|. \end{align*} Using the restricted eigenvalue lower bound on the left-hand side yields \begin{align*} \frac{1}{2}\kappa_{\mathrm{RE}}(S,3)^2|\Delta|^2 \leq \frac{3\lambda}{2}\sqrt{s}\,|\Delta|. \end{align*} Because $\Delta\neq 0$, we have $|\Delta|>0$, so division by $|\Delta|/2$ is valid. This gives \begin{align*} |\Delta| \leq \frac{3\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}. \end{align*} Finally, $3\leq 4$ and $\Delta=\hat{\beta}-\beta^*$, so \begin{align*} |\hat{\beta}-\beta^*| \leq \frac{4\lambda\sqrt{s}}{\kappa_{\mathrm{RE}}(S,3)^2}. \end{align*} This is the claimed Euclidean estimation bound. [/guided] [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.