[step:Use sparse curvature to dominate SCAD concavity]Let $h\in\mathbb R^p$ satisfy
\begin{align*}
|\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S|\le Ms,
\qquad
|h|\le \rho,
\end{align*}
where
\begin{align*}
0<\rho\le \rho_0.
\end{align*}
Set
\begin{align*}
T:=\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S.
\end{align*}
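Since $\hat\beta^{\mathrm{or}}$ is supported on $S$, we have $\operatorname{supp}(h)\subseteq\operatorname{supp}(\hat\beta^{\mathrm{or}}+h)\cup S=T$, and $|T|\le Ms$, so the sparse curvature bound gives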
\begin{align*}
\frac{1}{n}|Xh|^2\ge \kappa |h|^2.
\end{align*}
We use two one-dimensional SCAD lower bounds. First, for every $t\ge 0$, the derivative satisfies
\begin{align*}
p'_{\lambda,a}(t)\ge \lambda-\frac{t}{a-1}.
\end{align*}
Indeed, on $0\le t\le\lambda$ this says $\lambda\ge\lambda-t/(a-1)$. On $\lambda<t\le a\lambda$, we compute
\begin{align*}
\frac{a\lambda-t}{a-1}-\left(\lambda-\frac{t}{a-1}\right)=\frac{\lambda}{a-1}>0,
\end{align*}
so the inequality holds on the middle branch. For $t>a\lambda$, it says $0\ge\lambda-t/(a-1)$, which follows from $t>a\lambda$ and $a>2$, since then $t/(a-1)>a\lambda/(a-1)>\lambda$. Hence, for every $j\in S^c$, since $\hat\beta^{\mathrm{or}}_j=0$ and $p_{\lambda,a}(0)=0$, integration over $[0,|h_j|]$ with respect to one-dimensional Lebesgue measure gives
\begin{align*}
p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)=p_{\lambda,a}(|h_j|)\ge \lambda |h_j|-\frac{h_j^2}{2(a-1)}.
\end{align*}
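Written out, using $p_{\lambda,a}(0)=0$ and the derivative bound, the integration step is
\begin{align*}
p_{\lambda,a}(|h_j|)=\int_0^{|h_j|}p'_{\lambda,a}(t)\,dt\ge\int_0^{|h_j|}\left(\lambda-\frac{t}{a-1}\right)dt=\lambda|h_j|-\frac{h_j^2}{2(a-1)}.
\end{align*}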
Second, define the even SCAD penalty map $q_{\lambda,a}:\mathbb R\to\mathbb R$ by
\begin{align*}
q_{\lambda,a}(u):=p_{\lambda,a}(|u|).
\end{align*}
Define also $F_{\lambda,a}:\mathbb R\to\mathbb R$ by
\begin{align*}
F_{\lambda,a}(u):=q_{\lambda,a}(u)+\frac{u^2}{2(a-1)}.
\end{align*}
We verify convexity of $F_{\lambda,a}$ directly from its one-sided derivatives. We use the elementary one-dimensional fact that a continuous piecewise $C^1$ function is convex if its derivative is nondecreasing on each open smooth piece and, at each breakpoint, the left derivative is at most the right derivative. On the intervals $(-\infty,-a\lambda)$, $(-a\lambda,-\lambda)$, $(-\lambda,0)$, $(0,\lambda)$, $(\lambda,a\lambda)$, and $(a\lambda,\infty)$, the derivative of $q_{\lambda,a}$ is respectively
\begin{align*}
0,\quad -\frac{a\lambda+u}{a-1},\quad -\lambda,\quad \lambda,\quad \frac{a\lambda-u}{a-1},\quad 0.
\end{align*}
Therefore the derivative of $F_{\lambda,a}$ is respectively
\begin{align*}
\frac{u}{a-1},\quad -\frac{a\lambda}{a-1},\quad -\lambda+\frac{u}{a-1},\quad \lambda+\frac{u}{a-1},\quad \frac{a\lambda}{a-1},\quad \frac{u}{a-1}.
\end{align*}
Each displayed formula is nondecreasing on its interval. At the breakpoints $-a\lambda$, $-\lambda$, $0$, $\lambda$, and $a\lambda$, the left one-sided derivative is at most the right one-sided derivative; at $0$ the jump is from $-\lambda$ to $\lambda$. Hence $F'_{\lambda,a}$ is nondecreasing in the one-sided sense on $\mathbb R$, which proves that $F_{\lambda,a}$ is convex. For $j\in S$, the bound $|\hat\beta^{\mathrm{or}}_j|\ge a\lambda$ places $\hat\beta^{\mathrm{or}}_j$ in the flat SCAD region, so $q'_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)=0$ and therefore
\begin{align*}
F'_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)=\frac{\hat\beta^{\mathrm{or}}_j}{a-1}.
\end{align*}
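For reference, the breakpoint comparisons behind the convexity of $F_{\lambda,a}$ can be written out from the piecewise derivative formulas. Here $F'_{\lambda,a}(u^-)$ and $F'_{\lambda,a}(u^+)$ denote the left and right derivatives at $u$, a notation introduced only for this display:
\begin{align*}
F'_{\lambda,a}((-a\lambda)^-)&=-\frac{a\lambda}{a-1}=F'_{\lambda,a}((-a\lambda)^+),\\
F'_{\lambda,a}((-\lambda)^-)&=-\frac{a\lambda}{a-1}=-\lambda-\frac{\lambda}{a-1}=F'_{\lambda,a}((-\lambda)^+),\\
F'_{\lambda,a}(0^-)&=-\lambda<\lambda=F'_{\lambda,a}(0^+),\\
F'_{\lambda,a}(\lambda^-)&=\lambda+\frac{\lambda}{a-1}=\frac{a\lambda}{a-1}=F'_{\lambda,a}(\lambda^+),\\
F'_{\lambda,a}((a\lambda)^-)&=\frac{a\lambda}{a-1}=F'_{\lambda,a}((a\lambda)^+).
\end{align*}
Thus the one-sided derivatives agree at every breakpoint except $0$, where the jump is upward.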
Convexity of $F_{\lambda,a}$ at the point $\hat\beta^{\mathrm{or}}_j$ gives
\begin{align*}
F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j+h_j)-F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)
\ge
\frac{\hat\beta^{\mathrm{or}}_j}{a-1}h_j.
\end{align*}
Expanding the definition of $F_{\lambda,a}$ and cancelling the linear term $\hat\beta^{\mathrm{or}}_j h_j/(a-1)$ from both sides yields
\begin{align*}
p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)\ge -\frac{h_j^2}{2(a-1)}.
\end{align*}
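In detail, since $q_{\lambda,a}(u)=p_{\lambda,a}(|u|)$, the left-hand side of the convexity inequality expands as
\begin{align*}
F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j+h_j)-F_{\lambda,a}(\hat\beta^{\mathrm{or}}_j)
&=p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)+\frac{(\hat\beta^{\mathrm{or}}_j+h_j)^2-(\hat\beta^{\mathrm{or}}_j)^2}{2(a-1)}\\
&=p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)+\frac{\hat\beta^{\mathrm{or}}_j h_j}{a-1}+\frac{h_j^2}{2(a-1)},
\end{align*}
and subtracting $\hat\beta^{\mathrm{or}}_j h_j/(a-1)$ from both sides of the convexity inequality leaves exactly the stated bound.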
Using $Y-X(\hat\beta^{\mathrm{or}}+h)=r^{\mathrm{or}}-Xh$, the objective difference is
\begin{align*}
Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})=\frac{1}{2n}|Xh|^2-\frac{1}{n}(r^{\mathrm{or}})^\top Xh+\sum_{j=1}^p\left[p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j+h_j|)-p_{\lambda,a}(|\hat\beta^{\mathrm{or}}_j|)\right].
\end{align*}
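The quadratic and linear terms in this identity come from expanding the squared residual. The following display assumes, as the identity itself indicates, that the loss part of $Q_n(\beta)$ is $\frac{1}{2n}|Y-X\beta|^2$ and that $r^{\mathrm{or}}=Y-X\hat\beta^{\mathrm{or}}$:
\begin{align*}
\frac{1}{2n}|r^{\mathrm{or}}-Xh|^2-\frac{1}{2n}|r^{\mathrm{or}}|^2=\frac{1}{2n}|Xh|^2-\frac{1}{n}(r^{\mathrm{or}})^\top Xh.
\end{align*}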
Define $g_j:=X_j^\top r^{\mathrm{or}}/n$ for each $j\in\{1,\dots,p\}$. The active linear term vanishes because $X_S^\top r^{\mathrm{or}}/n=0$, while the inactive linear term is bounded below by $-\sum_{j\in S^c}|g_j||h_j|$. Combining the active and inactive penalty lower bounds gives
\begin{align*}
Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})\ge\frac{1}{2n}|Xh|^2-\sum_{j\in S^c}|g_j||h_j|+\sum_{j\in S^c}\lambda |h_j|-\frac{1}{2(a-1)}\sum_{j=1}^p h_j^2.
\end{align*}
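The treatment of the linear term can be spelled out coordinatewise. Since $g_j=0$ for $j\in S$,
\begin{align*}
-\frac{1}{n}(r^{\mathrm{or}})^\top Xh=-\sum_{j\in S}g_jh_j-\sum_{j\in S^c}g_jh_j=-\sum_{j\in S^c}g_jh_j\ge-\sum_{j\in S^c}|g_j||h_j|.
\end{align*}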
Since $|g_j|\le\lambda$ for every $j\in S^c$, the inactive linear loss is dominated by the inactive SCAD linear gain, and hence
\begin{align*}
Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})\ge\frac{1}{2n}|Xh|^2-\frac{1}{2(a-1)}|h|^2.
\end{align*}
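Concretely, the two inactive sums combine into a single nonnegative term:
\begin{align*}
-\sum_{j\in S^c}|g_j||h_j|+\sum_{j\in S^c}\lambda|h_j|=\sum_{j\in S^c}\bigl(\lambda-|g_j|\bigr)|h_j|\ge 0.
\end{align*}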
The sparse curvature bound then gives
\begin{align*}
Q_n(\hat\beta^{\mathrm{or}}+h)-Q_n(\hat\beta^{\mathrm{or}})\ge\frac{1}{2}\left(\kappa-\frac{1}{a-1}\right)|h|^2\ge 0.
\end{align*}
The last inequality uses $\kappa>1/(a-1)$. Therefore $Q_n(\hat\beta^{\mathrm{or}}+h)\ge Q_n(\hat\beta^{\mathrm{or}})$ for every $h$ such that $\hat\beta^{\mathrm{or}}+h\in\mathcal N_S(\rho)$.[/step]