Oracle Property for the SCAD Penalized Least-Squares Estimator

Theorem

Proof

[proofplan] We work on the event $\mathcal E_n$. First we use the oracle estimation bound and the beta-min condition to show that every active oracle coordinate lies in the flat region of the SCAD penalty, so the penalty contributes no first-order term on the active coordinates at the oracle estimator. Next we compute the oracle residual, verify the active normal equations, and use the score and inactive-active correlation bounds to verify the zero-coordinate stationarity inequality on $S^c$. Finally we compare the SCAD objective at $\hat\beta^{\mathrm{or}}+h$ and at $\hat\beta^{\mathrm{or}}$ for every sufficiently small sparse perturbation $h$, using the sparse curvature lower bound to dominate the maximum concavity of SCAD. [/proofplan] [step:Define the SCAD objective and the oracle estimator] Fix an outcome in $\mathcal E_n$. We first justify that $\Sigma_S\in\mathbb R^{s\times s}$ is invertible from the sparse curvature hypothesis. If $v\in\mathbb R^s$ is nonzero and $\delta\in\mathbb R^p$ is the vector with $\delta_S=v$ and $\delta_{S^c}=0$, then $\delta$ is supported on $S$, hence on a set of size at most $Ms$. Therefore \begin{align*} v^\top\Sigma_S v =\frac{1}{n}|X_Sv|^2 =\frac{1}{n}|X\delta|^2 \ge \kappa |\delta|^2 =\kappa |v|^2 >0. \end{align*} Thus $\Sigma_S$ is positive definite and hence invertible. Define the SCAD penalty $p_{\lambda,a}:[0,\infty)\to[0,\infty)$ by $p_{\lambda,a}(0)=0$ and by the one-sided derivative formula \begin{align*} p'_{\lambda,a}(t)=\lambda \quad\text{for }0<t\le \lambda. \end{align*} For $\lambda<t\le a\lambda$, define \begin{align*} p'_{\lambda,a}(t)=\frac{a\lambda-t}{a-1}. \end{align*} For $t>a\lambda$, define \begin{align*} p'_{\lambda,a}(t)=0. \end{align*} Define the SCAD objective $Q_n:\mathbb R^p\to\mathbb R$ by \begin{align*} Q_n(\beta):=\frac{1}{2n}|Y-X\beta|^2+\sum_{j=1}^p p_{\lambda,a}(|\beta_j|). 
\end{align*} Define the oracle estimator $\hat\beta^{\mathrm{or}}\in\mathbb R^p$ by setting its active coordinates to \begin{align*} \hat\beta^{\mathrm{or}}_S :=\beta^*_S+\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} Set its inactive coordinates to \begin{align*} \hat\beta^{\mathrm{or}}_{S^c}:=0. \end{align*} [/step] [step:Place the oracle active coordinates in the flat SCAD region] Define the active oracle error vector $e_S\in\mathbb R^s$ by \begin{align*} e_S:=\hat\beta^{\mathrm{or}}_S-\beta^*_S = \Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} By the oracle estimation bound, \begin{align*} \|e_S\|_\infty\le r_n. \end{align*} For every $j\in S$, the beta-min condition gives \begin{align*} |\hat\beta^{\mathrm{or}}_j| \ge |\beta_j^*|-|\hat\beta^{\mathrm{or}}_j-\beta_j^*| \ge a\lambda+r_n-r_n = a\lambda. \end{align*} Thus every active oracle coordinate lies at or beyond the SCAD flat threshold. At the endpoint $t=a\lambda$, the middle branch gives $p'_{\lambda,a}(a\lambda)=(a\lambda-a\lambda)/(a-1)=0$, while for $t>a\lambda$ the final branch gives $p'_{\lambda,a}(t)=0$. Hence $p'_{\lambda,a}(t)=0$ for every $t\ge a\lambda$. Choose \begin{align*} \rho_0:=\frac{\lambda}{2}. \end{align*} Then $\rho_0>0$. The argument below will not require the active-coordinate penalty to remain constant throughout the whole neighbourhood; it only uses that each active oracle coordinate itself lies in the flat SCAD region and then controls any possible movement into the concave region by the global one-dimensional SCAD curvature bound. [/step] [step:Verify active normal equations and inactive stationarity] Define the residual vector $r^{\mathrm{or}}\in\mathbb R^n$ by \begin{align*} r^{\mathrm{or}}:=Y-X\hat\beta^{\mathrm{or}}. \end{align*} Since $Y=X_S\beta^*_S+\varepsilon$ and $\hat\beta^{\mathrm{or}}_{S^c}=0$, \begin{align*} r^{\mathrm{or}} = \varepsilon-X_S(\hat\beta^{\mathrm{or}}_S-\beta^*_S) = \varepsilon-X_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. 
\end{align*} The active normal equations follow from the definition of $\Sigma_S$. First, \begin{align*} \frac{1}{n}X_S^\top r^{\mathrm{or}} = \frac{1}{n}X_S^\top\varepsilon - \frac{1}{n}X_S^\top X_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} Since $X_S^\top X_S/n=\Sigma_S$, this becomes \begin{align*} \frac{1}{n}X_S^\top r^{\mathrm{or}} = \frac{1}{n}X_S^\top\varepsilon - \Sigma_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon = 0. \end{align*} For inactive coordinates, define $g_{S^c}\in\mathbb R^{p-s}$ by \begin{align*} g_{S^c}:=\frac{1}{n}X_{S^c}^\top r^{\mathrm{or}}. \end{align*} Using the residual identity, \begin{align*} g_{S^c} = \frac{1}{n}X_{S^c}^\top\varepsilon - \frac{1}{n}X_{S^c}^\top X_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} Taking the maximum norm and using the score bound and the inactive-active correlation bound gives \begin{align*} \|g_{S^c}\|_\infty \le \left\|\frac{1}{n}X_{S^c}^\top\varepsilon\right\|_\infty + \left\|\frac{1}{n}X_{S^c}^\top X_S\Sigma_S^{-1}\right\|_\infty \left\|\frac{1}{n}X_S^\top\varepsilon\right\|_\infty. \end{align*} Therefore \begin{align*} \|g_{S^c}\|_\infty \le \frac{\eta\lambda}{4} + \frac{1-\eta}{2}\cdot\frac{\eta\lambda}{4} \le \lambda. \end{align*} Thus \begin{align*} \left|\frac{1}{n}X_j^\top r^{\mathrm{or}}\right|\le \lambda \end{align*} for every $j\in S^c$, which is the SCAD zero-coordinate stationarity inequality. [guided] The oracle estimator is obtained by fitting least squares on the true support $S$ and setting all inactive coordinates to zero. The first thing to check is therefore that the least-squares residual is orthogonal to the active columns. We define \begin{align*} r^{\mathrm{or}}:=Y-X\hat\beta^{\mathrm{or}}. \end{align*} Using $Y=X_S\beta^*_S+\varepsilon$ and $\hat\beta^{\mathrm{or}}_{S^c}=0$, we get \begin{align*} r^{\mathrm{or}} = \varepsilon-X_S(\hat\beta^{\mathrm{or}}_S-\beta^*_S). 
\end{align*} The oracle definition gives \begin{align*} \hat\beta^{\mathrm{or}}_S-\beta^*_S = \Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon, \end{align*} so \begin{align*} r^{\mathrm{or}} = \varepsilon-X_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} Multiplying by $X_S^\top/n$ gives \begin{align*} \frac{1}{n}X_S^\top r^{\mathrm{or}}=\frac{1}{n}X_S^\top\varepsilon-\frac{1}{n}X_S^\top X_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} Since $X_S^\top X_S/n=\Sigma_S$, this becomes \begin{align*} \frac{1}{n}X_S^\top r^{\mathrm{or}}=\frac{1}{n}X_S^\top\varepsilon-\Sigma_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon=0. \end{align*} This proves that the least-squares score vanishes on the active coordinates. Because the preceding step placed the oracle active coordinates themselves in the flat SCAD region, the penalty derivative at those coordinates is also zero, so the full objective is stationary along active coordinates. For inactive coordinates, the issue is different: $\hat\beta^{\mathrm{or}}_j=0$, so stationarity means the least-squares score must lie inside the subdifferential interval $[-\lambda,\lambda]$ of $p_{\lambda,a}(|\cdot|)$ at zero. Define \begin{align*} g_{S^c}:=\frac{1}{n}X_{S^c}^\top r^{\mathrm{or}}. \end{align*} Substituting the residual expression gives \begin{align*} g_{S^c} = \frac{1}{n}X_{S^c}^\top\varepsilon - \frac{1}{n}X_{S^c}^\top X_S\Sigma_S^{-1}\frac{1}{n}X_S^\top\varepsilon. \end{align*} The first term is controlled directly by the score bound. The second term is controlled by multiplying the inactive-active correlation matrix against the active score vector. Using the maximum row-sum matrix norm, \begin{align*} \|g_{S^c}\|_\infty \le \left\|\frac{1}{n}X_{S^c}^\top\varepsilon\right\|_\infty + \left\|\frac{1}{n}X_{S^c}^\top X_S\Sigma_S^{-1}\right\|_\infty \left\|\frac{1}{n}X_S^\top\varepsilon\right\|_\infty. 
\end{align*} The score bound controls both active and inactive subvectors of $X^\top\varepsilon/n$, and the inactive-active correlation bound controls the matrix factor, so \begin{align*} \|g_{S^c}\|_\infty \le \frac{\eta\lambda}{4} + \frac{1-\eta}{2}\cdot\frac{\eta\lambda}{4} \le \lambda. \end{align*} Hence every inactive coordinate satisfies \begin{align*} \left|\frac{1}{n}X_j^\top r^{\mathrm{or}}\right|\le \lambda. \end{align*} This is exactly the condition that the linear term in an inactive perturbation can be dominated by the SCAD penalty near zero. [/guided] [/step] [step:Use sparse curvature to dominate SCAD concavity] Let $h\in\mathbb R^p$ satisfy \begin{align*} |\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S|\le Ms, \qquad |h|\le \rho, \end{align*} where \begin{align*} 0<\rho\le \rho_0. \end{align*} Set \begin{align*} T:=\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S. \end{align*} Then $\operatorname{supp}(h)\subseteq T$ and $|T|\le Ms$, so the sparse curvature bound gives \begin{align*} \frac{1}{n}|Xh|^2\ge \kappa |h|^2. \end{align*} We use two one-dimensional SCAD lower bounds. First, for every $t\ge 0$, the derivative satisfies \begin{align*} p'_{\lambda,a}(t)\ge \lambda-\frac{t}{a-1}. \end{align*} Indeed, on $0\le t\le\lambda$ this says $\lambda\ge\lambda-t/(a-1)$. On $\lambda<t\le a\lambda$, we compute \begin{align*} \frac{a\lambda-t}{a-1}-\left(\lambda-\frac{t}{a-1}\right)=\frac{\lambda}{a-1}>0, \end{align*} so the inequality holds on the middle branch. On $t>a\lambda$, it says $0\ge\lambda-t/(a-1)$, which follows from $t>a\lambda$ and $a>2$. Hence, for every $j\in S^c$, since $\hat\beta^{\mathrm{or}}_j=0$ and $p_{\lambda,a}(0)=0$, integration over $[0,|h_j|]$ with respect to one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) gives \begin{align*} p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)=p_{\lambda,a}(|h_j|)\ge \lambda |h_j|-\frac{h_j^2}{2(a-1)}. 
\end{align*} Second, define the even SCAD penalty map $q_{\lambda,a}:\mathbb R\to\mathbb R$ by \begin{align*} q_{\lambda,a}(u):=p_{\lambda,a}(|u|). \end{align*} Define also $F_{\lambda,a}:\mathbb R\to\mathbb R$ by \begin{align*} F_{\lambda,a}(u):=q_{\lambda,a}(u)+\frac{u^2}{2(a-1)}. \end{align*} We verify convexity of $F_{\lambda,a}$ directly from its one-sided derivatives. We use the elementary one-dimensional fact that a continuous piecewise $C^1$ function is convex if its one-sided derivative is nondecreasing across the open smooth pieces and the left derivative at each breakpoint is at most the right derivative. On the intervals $(-\infty,-a\lambda)$, $(-a\lambda,-\lambda)$, $(-\lambda,0)$, $(0,\lambda)$, $(\lambda,a\lambda)$, and $(a\lambda,\infty)$, the derivative of $q_{\lambda,a}$ is respectively \begin{align*} 0,\quad -\frac{a\lambda+u}{a-1},\quad -\lambda,\quad \lambda,\quad \frac{a\lambda-u}{a-1},\quad 0. \end{align*} Therefore the derivative of $F_{\lambda,a}$ is respectively \begin{align*} \frac{u}{a-1},\quad -\frac{a\lambda}{a-1},\quad -\lambda+\frac{u}{a-1},\quad \lambda+\frac{u}{a-1},\quad \frac{a\lambda}{a-1},\quad \frac{u}{a-1}. \end{align*} Each displayed formula is nondecreasing on its interval. At the breakpoints $-a\lambda$, $-\lambda$, $0$, $\lambda$, and $a\lambda$, the left one-sided derivative is at most the right one-sided derivative; at $0$ the jump is from $-\lambda$ to $\lambda$. Hence $F'_{\lambda,a}$ is nondecreasing in the one-sided sense on $\mathbb R$, which proves that $F_{\lambda,a}$ is convex. For $j\in S$, the bound $|\hat\beta^{\mathrm{or}}_j|\ge a\lambda$ places $\hat\beta^{\mathrm{or}}_j$ in the flat SCAD region, so $q'_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)=0$ and therefore \begin{align*} F'_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)=\frac{\hat\beta^{\mathrm{or}}_j}{a-1}. 
\end{align*} Convexity of $F_{\lambda,a}$ at the point $\hat\beta^{\mathrm{or}}_j$ gives \begin{align*} F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j+h_j)-F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j) \ge \frac{\hat\beta^{\mathrm{or}}_j}{a-1}h_j. \end{align*} Expanding the definition of $F_{\lambda,a}$, using $(\hat\beta^{\mathrm{or}}_j+h_j)^2-(\hat\beta^{\mathrm{or}}_j)^2=2\hat\beta^{\mathrm{or}}_jh_j+h_j^2$, and cancelling the linear term $\hat\beta^{\mathrm{or}}_jh_j/(a-1)$ from both sides yields \begin{align*} p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)\ge -\frac{h_j^2}{2(a-1)}. \end{align*} Using $Y-X(\hat\beta^{\mathrm{or}}+h)=r^{\mathrm{or}}-Xh$, the objective difference is \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})=\frac{1}{2n}|Xh|^2-\frac{1}{n}(r^{\mathrm{or}})^\top Xh+\sum_{j=1}^p\left[p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)\right]. \end{align*} Define $g_j:=X_j^\top r^{\mathrm{or}}/n$ for each $j\in\{1,\dots,p\}$. The active linear term vanishes because $X_S^\top r^{\mathrm{or}}/n=0$, while the inactive linear term is bounded below by $-\sum_{j\in S^c}|g_j||h_j|$. Combining the active and inactive penalty lower bounds gives \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})\ge\frac{1}{2n}|Xh|^2-\sum_{j\in S^c}|g_j||h_j|+\sum_{j\in S^c}\lambda |h_j|-\frac{1}{2(a-1)}\sum_{j=1}^p h_j^2. \end{align*} Since $|g_j|\le\lambda$ for every $j\in S^c$, the inactive linear loss is dominated by the inactive SCAD linear gain, and hence \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})\ge\frac{1}{2n}|Xh|^2-\frac{1}{2(a-1)}|h|^2. \end{align*} The sparse curvature bound then gives \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})\ge\frac{1}{2}\left(\kappa-\frac{1}{a-1}\right)|h|^2\ge 0. \end{align*} The last inequality uses $\kappa>1/(a-1)$. Therefore $Q_n(\hat\beta^{\mathrm{or}}+h)\ge Q_n(\hat\beta^{\mathrm{or}})$ for every $h$ such that $\hat\beta^{\mathrm{or}}+h\in\mathcal N_S(\rho)$. 
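The two one-dimensional SCAD facts used in this step can be sanity-checked numerically. The sketch below is illustrative only and is not part of the proof: the function names `scad`, `scad_deriv`, and `F`, the grid, and the values $\lambda=0.5$, $a=3.7$ are assumptions chosen for the example. The closed form of `scad` is obtained by integrating the three derivative branches with $p_{\lambda,a}(0)=0$. The script checks the derivative lower bound $p'_{\lambda,a}(t)\ge\lambda-t/(a-1)$, its integrated form, and discrete convexity of $F_{\lambda,a}$ through nonnegative second differences.

```python
def scad_deriv(t, lam, a):
    """One-sided derivative p'_{lam,a}(t) for t > 0, following the three branches."""
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0


def scad(t, lam, a):
    """SCAD penalty p_{lam,a}(t) for t >= 0 (integral of scad_deriv from 0 to t)."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def F(u, lam, a):
    """F_{lam,a}(u) = p_{lam,a}(|u|) + u^2 / (2(a-1))."""
    return scad(abs(u), lam, a) + u * u / (2 * (a - 1))


# Illustrative parameter values (assumptions, not taken from the theorem).
lam, a = 0.5, 3.7
grid = [i / 100 for i in range(-400, 401)]

# Derivative lower bound p'(t) >= lam - t/(a-1) on the positive grid points.
deriv_gap = min(
    scad_deriv(t, lam, a) - (lam - t / (a - 1)) for t in grid if t > 0
)

# Integrated bound p(t) >= lam*t - t^2/(2(a-1)) on the nonnegative grid points.
penalty_gap = min(
    scad(t, lam, a) - (lam * t - t * t / (2 * (a - 1))) for t in grid if t >= 0
)

# Discrete convexity of F: second differences on a uniform grid are >= 0
# up to floating-point rounding.
convex_gap = min(
    F(grid[i - 1], lam, a) - 2 * F(grid[i], lam, a) + F(grid[i + 1], lam, a)
    for i in range(1, len(grid) - 1)
)

print(deriv_gap, penalty_gap, convex_gap)
```

A grid check of this kind cannot replace the breakpoint argument above; it only guards against sign or branch mistakes when the formulas are transcribed.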
[guided] The point of this step is to compare the positive quadratic curvature of the least-squares loss with the possible negative curvature of the SCAD penalty. Let $h\in\mathbb R^p$ satisfy \begin{align*} |\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S|\le Ms, \qquad |h|\le \rho. \end{align*} Define \begin{align*} T:=\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S. \end{align*} Then $\operatorname{supp}(h)\subseteq T$ and $|T|\le Ms$, so the sparse curvature hypothesis applies to $h$ and gives \begin{align*} \frac{1}{n}|Xh|^2\ge \kappa |h|^2. \end{align*} For inactive coordinates $j\in S^c$, we have $\hat\beta^{\mathrm{or}}_j=0$. Let $\mathcal L^1$ denote one-dimensional Lebesgue measure on $\mathbb R$. The SCAD derivative lower bound $p'_{\lambda,a}(t)\ge \lambda-t/(a-1)$ for $t\ge0$ gives, after integrating over $[0,|h_j|]$ with respect to $\mathcal L^1$, \begin{align*} p_{\lambda,a}(|h_j|)-p_{\lambda,a}(0) \ge \lambda |h_j|-\frac{h_j^2}{2(a-1)}. \end{align*} For active coordinates, define $q_{\lambda,a}:\mathbb R\to\mathbb R$ by $q_{\lambda,a}(u):=p_{\lambda,a}(|u|)$ and define $F_{\lambda,a}:\mathbb R\to\mathbb R$ by \begin{align*} F_{\lambda,a}(u):=q_{\lambda,a}(u)+\frac{u^2}{2(a-1)}. \end{align*} We now justify the convexity assertion rather than treating it as a black box. The elementary criterion we use is this: a continuous piecewise $C^1$ function on $\mathbb R$ is convex if its one-sided derivative is nondecreasing on each smooth interval and the left derivative at every breakpoint is at most the right derivative. On the intervals $(-\infty,-a\lambda)$, $(-a\lambda,-\lambda)$, $(-\lambda,0)$, $(0,\lambda)$, $(\lambda,a\lambda)$, and $(a\lambda,\infty)$, the derivative of $q_{\lambda,a}$ is respectively \begin{align*} 0,\quad -\frac{a\lambda+u}{a-1},\quad -\lambda,\quad \lambda,\quad \frac{a\lambda-u}{a-1},\quad 0. 
\end{align*} Adding the derivative of $u^2/(2(a-1))$ gives the corresponding derivative values of $F_{\lambda,a}$: \begin{align*} \frac{u}{a-1},\quad -\frac{a\lambda}{a-1},\quad -\lambda+\frac{u}{a-1},\quad \lambda+\frac{u}{a-1},\quad \frac{a\lambda}{a-1},\quad \frac{u}{a-1}. \end{align*} Each expression is nondecreasing on its own interval. Checking the junctions $-a\lambda$, $-\lambda$, $0$, $\lambda$, and $a\lambda$, the left derivative never exceeds the right derivative; the only jump at $0$ goes upward from $-\lambda$ to $\lambda$. Thus $F'_{\lambda,a}$ is nondecreasing in the one-sided derivative sense, which is the elementary one-dimensional convexity criterion used here. Hence $F_{\lambda,a}$ is convex. Since $|\hat\beta^{\mathrm{or}}_j|\ge a\lambda$ for $j\in S$, the SCAD part is flat at $\hat\beta^{\mathrm{or}}_j$, hence $q'_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)=0$ and \begin{align*} F'_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)=\frac{\hat\beta^{\mathrm{or}}_j}{a-1}. \end{align*} Convexity gives \begin{align*} F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j+h_j)-F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j) \ge \frac{\hat\beta^{\mathrm{or}}_j}{a-1}h_j. \end{align*} After expanding $F_{\lambda,a}$, this is exactly \begin{align*} p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|) \ge -\frac{h_j^2}{2(a-1)}. \end{align*} Now expand the objective. Since $Y-X(\hat\beta^{\mathrm{or}}+h)=r^{\mathrm{or}}-Xh$, \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})=\frac{1}{2n}|Xh|^2-\frac{1}{n}(r^{\mathrm{or}})^\top Xh+\sum_{j=1}^p\left[p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)\right]. \end{align*} Define $g_j:=X_j^\top r^{\mathrm{or}}/n$ for $j\in\{1,\dots,p\}$. The active normal equations give $g_j=0$ for $j\in S$, while the previous stationarity step gives $|g_j|\le\lambda$ for $j\in S^c$. 
Combining the penalty bounds with the objective expansion gives \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}}) \ge \frac{1}{2n}|Xh|^2-\sum_{j\in S^c}|g_j||h_j|+\sum_{j\in S^c}\lambda |h_j|-\frac{1}{2(a-1)}|h|^2. \end{align*} Because $|g_j|\le\lambda$ on $S^c$, the inactive linear loss is dominated by the inactive SCAD linear gain. Hence \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}}) \ge \frac{1}{2n}|Xh|^2-\frac{1}{2(a-1)}|h|^2. \end{align*} Finally the sparse curvature bound gives \begin{align*} Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}}) \ge \frac{1}{2}\left(\kappa-\frac{1}{a-1}\right)|h|^2 \ge 0, \end{align*} because $\kappa>1/(a-1)$. Thus every admissible sparse perturbation has nonnegative objective increase. [/guided] [/step] [step:Conclude sparse local minimality and the probability statement] Choose any \begin{align*} \rho\in(0,\rho_0]. \end{align*} The preceding step proves that, on $\mathcal E_n$, \begin{align*} Q_n(\beta)\ge Q_n(\hat\beta^{\mathrm{or}}) \end{align*} for every $\beta\in\mathcal N_S(\rho)$. Hence $\hat\beta^{\mathrm{or}}$ is a local minimizer of the SCAD criterion relative to $\mathcal N_S(\rho)$. Since this conclusion holds for every outcome in $\mathcal E_n$ and $\mathbb P(\mathcal E_n)\ge 1-\delta_n$, the event that there exists a sparse local minimizer of $Q_n$ equal to $\hat\beta^{\mathrm{or}}$ has probability at least $1-\delta_n$. This completes the proof. [/step]
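The whole argument can be exercised on a toy example. The following is a minimal sketch under strong simplifying assumptions that are not part of the theorem: the design is built from a Sylvester Hadamard matrix so that $X^\top X/n=I$ (hence $\Sigma_S=I$ and $\kappa=1$), the noise vector is fixed by hand, $Ms\ge p$ so every perturbation is admissible, and all names (`hadamard`, `score`, `Q`, `beta_star`, `eps`) and numerical values are hypothetical. It computes the oracle estimator, the scores $g_j$, and the objective increase over random perturbations with $|h|\le\rho_0=\lambda/2$.

```python
import random


def scad(t, lam, a):
    """SCAD penalty p_{lam,a}(t) for t >= 0 (integral of the derivative branches)."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def hadamard(k):
    """Sylvester Hadamard matrix of order 2**k with entries +1/-1."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


# Toy setup (all values are illustrative assumptions).
lam, a = 0.5, 3.7
n, p = 8, 4
S = [0, 1]                                   # true support
X = [row[:p] for row in hadamard(3)]         # n x p design with X^T X / n = I
beta_star = [3.0, -3.0, 0.0, 0.0]            # |beta*_j| well above a*lam on S
eps = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05]   # fixed noise vector
Y = [sum(X[i][j] * beta_star[j] for j in range(p)) + eps[i] for i in range(n)]


def score(vec, j):
    """Return (1/n) X_j^T vec."""
    return sum(X[i][j] * vec[i] for i in range(n)) / n


# Oracle estimator: Sigma_S = I here, so beta_or_S = beta*_S + (1/n) X_S^T eps.
beta_or = [beta_star[j] + score(eps, j) if j in S else 0.0 for j in range(p)]


def Q(beta):
    """SCAD objective: least-squares loss plus coordinatewise SCAD penalty."""
    resid = [Y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    return sum(r * r for r in resid) / (2 * n) + sum(
        scad(abs(b), lam, a) for b in beta
    )


# Oracle residual and scores g_j = (1/n) X_j^T r_or.
r_or = [Y[i] - sum(X[i][j] * beta_or[j] for j in range(p)) for i in range(n)]
g = [score(r_or, j) for j in range(p)]

# Objective increase over random perturbations h with |h| <= rho_0 = lam / 2.
rng = random.Random(0)
rho = lam / 2
min_increase = float("inf")
for _ in range(2000):
    h = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    norm = sum(x * x for x in h) ** 0.5 or 1.0
    scale = rho * rng.random() / norm
    candidate = [b + scale * x for b, x in zip(beta_or, h)]
    min_increase = min(min_increase, Q(candidate) - Q(beta_or))

print(g, min_increase)
```

In this setup the active scores vanish up to rounding, the inactive scores are bounded by $\max_i|\varepsilon_i|=0.2\le\lambda$, and the smallest observed objective increase should be nonnegative up to floating-point error, matching the bound $\frac12\left(\kappa-\frac{1}{a-1}\right)|h|^2\ge0$.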
