[proofplan]
The proof separates the curvature of the objective into the squared-loss contribution and the penalty contribution. On an $m$-sparse coordinate subspace, the [sparse curvature](/page/Sparse%20Curvature) hypothesis gives a lower quadratic curvature bound for the least-squares loss. The coordinatewise [maximum concavity](/page/Maximum%20Concavity) hypothesis says that adding $\frac{\zeta}{2}s^2$ makes the even scalar penalty $s \mapsto p_\lambda(|s|)$ convexified on $\mathbb{R}$, so the penalty can destroy at most $\zeta$ units of curvature per coordinate. Combining these two estimates gives the restricted chord curvature inequality with parameter $\kappa-\zeta$; when $\kappa>\zeta$, this parameter is positive and gives genuine [strong convexity](/page/Strong%20Convexity) along every line segment whose active coordinates have size at most $m$.
[/proofplan]
[step:Fix the sparse coordinate subspace containing the line segment]
The theorem statement declares $n,p,m \in \mathbb{N}$, the design matrix $X \in \mathbb{R}^{n \times p}$, the response vector $Y \in \mathbb{R}^n$, the squared loss $\ell: \mathbb{R}^p \to \mathbb{R}$, and the objective $Q: \mathbb{R}^p \to \mathbb{R}$. Let $\beta,\beta' \in \mathbb{R}^p$ satisfy
\begin{align*}
|\operatorname{supp}(\beta) \cup \operatorname{supp}(\beta')| \leq m.
\end{align*}
Define the active index set
\begin{align*}
S := \operatorname{supp}(\beta) \cup \operatorname{supp}(\beta') \subseteq \{1,\dots,p\}.
\end{align*}
For every $t \in [0,1]$, define the line segment point
\begin{align*}
\beta_t := (1-t)\beta + t\beta' \in \mathbb{R}^p.
\end{align*}
Since both $\beta$ and $\beta'$ vanish outside $S$, the vector $\beta_t$ also vanishes outside $S$. Hence
\begin{align*}
\operatorname{supp}(\beta_t) \subseteq S
\end{align*}
for every $t \in [0,1]$.
Define the displacement vector $\delta \in \mathbb{R}^p$ by
\begin{align*}
\delta := \beta' - \beta.
\end{align*}
Then $\operatorname{supp}(\delta) \subseteq S$, so $|\operatorname{supp}(\delta)| \leq m$.
[/step]
[step:Use sparse curvature to bound the squared loss from below along the chord]
For $t \in [0,1]$, expand the squared loss at $\beta_t=\beta+t\delta$:
\begin{align*}
\ell(\beta_t) = \frac{1}{2n}|Y-X\beta-tX\delta|^2.
\end{align*}
Expanding the Euclidean square gives
\begin{align*}
\ell(\beta_t) = \frac{1}{2n}|Y-X\beta|^2 - \frac{t}{n}(Y-X\beta)\cdot X\delta + \frac{t^2}{2n}|X\delta|^2.
\end{align*}
Similarly, first write
\begin{align*}
\ell(\beta') = \frac{1}{2n}|Y-X\beta-X\delta|^2.
\end{align*}
Expanding the Euclidean square gives
\begin{align*}
\ell(\beta') = \frac{1}{2n}|Y-X\beta|^2 - \frac{1}{n}(Y-X\beta)\cdot X\delta + \frac{1}{2n}|X\delta|^2.
\end{align*}
Taking the convex combination $(1-t)\ell(\beta)+t\ell(\beta')$ and subtracting $\ell(\beta_t)$ gives
\begin{align*}
(1-t)\ell(\beta)+t\ell(\beta')-\ell(\beta_t)
= \frac{t(1-t)}{2n}|X\delta|^2.
\end{align*}
Because $|\operatorname{supp}(\delta)| \leq m$, the sparse curvature hypothesis applies to $\delta$ and yields
\begin{align*}
\frac{1}{n}|X\delta|^2 \geq \kappa|\delta|^2.
\end{align*}
Therefore
\begin{align*}
\ell(\beta_t)
\leq (1-t)\ell(\beta)+t\ell(\beta')-\frac{\kappa}{2}t(1-t)|\delta|^2.
\end{align*}
[guided]
The squared loss is a quadratic function of $\beta$, so along any line it has an exact one-dimensional quadratic expansion. We use this exact identity because the strong convexity inequality is also a statement about the gap between the value at a convex combination and the convex combination of endpoint values. Define
\begin{align*}
S := \operatorname{supp}(\beta)\cup\operatorname{supp}(\beta') \subseteq \{1,\dots,p\}.
\end{align*}
The theorem hypotheses give $|S|\leq m$.
Define the displacement vector $\delta \in \mathbb{R}^p$ by
\begin{align*}
\delta := \beta' - \beta.
\end{align*}
Since $\beta$ and $\beta'$ are both supported in $S$, their difference $\delta$ is supported in $S$ as well. Hence $|\operatorname{supp}(\delta)| \leq m$, which is precisely the sparsity condition needed to apply the sparse curvature hypothesis.
For $t \in [0,1]$, the point on the line segment from $\beta$ to $\beta'$ is the vector $\beta_t \in \mathbb{R}^p$ defined by
\begin{align*}
\beta_t = \beta + t\delta.
\end{align*}
Expanding the Euclidean square in $\mathbb{R}^n$ gives
\begin{align*}
\ell(\beta_t) = \frac{1}{2n}|Y-X(\beta+t\delta)|^2.
\end{align*}
Since $X(\beta+t\delta)=X\beta+tX\delta$, this is
\begin{align*}
\ell(\beta_t) = \frac{1}{2n}|Y-X\beta-tX\delta|^2.
\end{align*}
Expanding the Euclidean square in $\mathbb{R}^n$ gives
\begin{align*}
\ell(\beta_t) = \frac{1}{2n}|Y-X\beta|^2 - \frac{t}{n}(Y-X\beta)\cdot X\delta + \frac{t^2}{2n}|X\delta|^2.
\end{align*}
At the endpoint $t=1$, the same expansion first gives
\begin{align*}
\ell(\beta') = \frac{1}{2n}|Y-X\beta-X\delta|^2.
\end{align*}
Expanding the Euclidean square gives
\begin{align*}
\ell(\beta') = \frac{1}{2n}|Y-X\beta|^2 - \frac{1}{n}(Y-X\beta)\cdot X\delta + \frac{1}{2n}|X\delta|^2.
\end{align*}
Now compare $\ell(\beta_t)$ with the chord value $(1-t)\ell(\beta)+t\ell(\beta')$. The constant terms and the linear terms in $t$ cancel, leaving exactly
\begin{align*}
(1-t)\ell(\beta)+t\ell(\beta')-\ell(\beta_t)
= \frac{t(1-t)}{2n}|X\delta|^2.
\end{align*}
This identity says that the curvature of the squared loss along this line is measured by $\frac{1}{n}|X\delta|^2$. Since $\delta$ is supported on at most $m$ coordinates, the sparse curvature assumption gives
\begin{align*}
\frac{1}{n}|X\delta|^2 \geq \kappa|\delta|^2.
\end{align*}
Substituting this lower bound into the exact chord identity yields
\begin{align*}
\ell(\beta_t)
\leq (1-t)\ell(\beta)+t\ell(\beta')-\frac{\kappa}{2}t(1-t)|\delta|^2.
\end{align*}
Thus the loss contributes at least $\kappa$ units of strong convexity along the sparse line segment.
[/guided]
[/step]
[step:Use maximum concavity to bound the penalty loss of curvature]
The theorem statement declares the folded-concave penalty as a map $p_\lambda: [0,\infty) \to \mathbb{R}$ and assumes coordinatewise maximum concavity at most $\zeta$. Thus the scalar convexification $g: \mathbb{R} \to \mathbb{R}$ defined by
\begin{align*}
g(s) = p_\lambda(|s|)+\frac{\zeta}{2}s^2
\end{align*}
is convex on $\mathbb{R}$. For each coordinate $j \in \{1,\dots,p\}$, convexity of $g$ applied to the two [real numbers](/page/Real%20Numbers) $\beta_j$ and $\beta'_j$ gives
\begin{align*}
g((1-t)\beta_j+t\beta'_j)
\leq (1-t)g(\beta_j)+tg(\beta'_j).
\end{align*}
Expanding the definition of $g$ and using the scalar identity
\begin{align*}
(1-t)\beta_j^2+t(\beta'_j)^2-((1-t)\beta_j+t\beta'_j)^2
= t(1-t)(\beta'_j-\beta_j)^2,
\end{align*}
we obtain
\begin{align*}
p_\lambda(|(\beta_t)_j|)
\leq (1-t)p_\lambda(|\beta_j|)
+t p_\lambda(|\beta'_j|)
+\frac{\zeta}{2}t(1-t)(\beta'_j-\beta_j)^2.
\end{align*}
Summing this inequality over $j=1,\dots,p$ gives
\begin{align*}
\sum_{j=1}^p p_\lambda(|(\beta_t)_j|) \leq (1-t)\sum_{j=1}^p p_\lambda(|\beta_j|)+t\sum_{j=1}^p p_\lambda(|\beta'_j|)+\frac{\zeta}{2}t(1-t)|\delta|^2.
\end{align*}
[guided]
The penalty is separable across coordinates, so it is enough to understand what happens to one coordinate. The theorem statement declares the penalty as a map $p_\lambda: [0,\infty) \to \mathbb{R}$ and assumes coordinatewise maximum concavity at most $\zeta$. This means that the scalar convexification $g: \mathbb{R} \to \mathbb{R}$ defined by
\begin{align*}
g(s) = p_\lambda(|s|)+\frac{\zeta}{2}s^2
\end{align*}
is convex on $\mathbb{R}$. This coordinatewise formulation is the hypothesis needed here: the coordinates $\beta_j$ and $\beta'_j$ may have either sign, so convexity only of the radial map on $[0,\infty)$ would not by itself justify applying convexity between the real numbers $\beta_j$ and $\beta'_j$. Therefore, for every $j \in \{1,\dots,p\}$ and every $t \in [0,1]$, convexity of $g$ applied to the two real numbers $\beta_j$ and $\beta'_j$ gives
\begin{align*}
g((1-t)\beta_j+t\beta'_j) \leq (1-t)g(\beta_j)+tg(\beta'_j).
\end{align*}
Substituting the definition of $g$ into this inequality gives
\begin{align*}
p_\lambda(|(1-t)\beta_j+t\beta'_j|)+\frac{\zeta}{2}((1-t)\beta_j+t\beta'_j)^2 \leq (1-t)p_\lambda(|\beta_j|)+t p_\lambda(|\beta'_j|)+\frac{\zeta}{2}\bigl((1-t)\beta_j^2+t(\beta'_j)^2\bigr).
\end{align*}
Moving the quadratic term on the left to the right and using
\begin{align*}
(1-t)\beta_j^2+t(\beta'_j)^2-((1-t)\beta_j+t\beta'_j)^2 = t(1-t)(\beta'_j-\beta_j)^2
\end{align*}
yields
\begin{align*}
p_\lambda(|(\beta_t)_j|) \leq (1-t)p_\lambda(|\beta_j|)+t p_\lambda(|\beta'_j|)+\frac{\zeta}{2}t(1-t)(\beta'_j-\beta_j)^2.
\end{align*}
Finally, summing over all coordinates gives
\begin{align*}
\sum_{j=1}^p p_\lambda(|(\beta_t)_j|) \leq (1-t)\sum_{j=1}^p p_\lambda(|\beta_j|)+t\sum_{j=1}^p p_\lambda(|\beta'_j|)+\frac{\zeta}{2}t(1-t)|\delta|^2.
\end{align*}
This is the precise sense in which the penalty can remove at most $\zeta$ units of quadratic curvature along the chord.
[/guided]
[/step]
[step:Combine the loss and penalty estimates to obtain the restricted chord curvature inequality]
By definition of $Q$, we have
\begin{align*}
Q(\beta_t) = \ell(\beta_t)+\sum_{j=1}^p p_\lambda(|(\beta_t)_j|).
\end{align*}
Adding the two estimates gives
\begin{align*}
Q(\beta_t) \leq (1-t)\ell(\beta)+t\ell(\beta')-\frac{\kappa}{2}t(1-t)|\delta|^2 +(1-t)\sum_{j=1}^p p_\lambda(|\beta_j|)+t\sum_{j=1}^p p_\lambda(|\beta'_j|)+\frac{\zeta}{2}t(1-t)|\delta|^2.
\end{align*}
Collecting the endpoint terms and the quadratic terms gives
\begin{align*}
Q(\beta_t) \leq (1-t)Q(\beta)+tQ(\beta')-\frac{\kappa-\zeta}{2}t(1-t)|\delta|^2.
\end{align*}
Since $\delta=\beta'-\beta$, this is
\begin{align*}
Q((1-t)\beta+t\beta') \leq (1-t)Q(\beta)+tQ(\beta')-\frac{\kappa-\zeta}{2}t(1-t)|\beta'-\beta|^2.
\end{align*}
This is the restricted chord curvature inequality with parameter $\kappa-\zeta$ along the segment from $\beta$ to $\beta'$.
The same argument also applies to any two vectors $u,v\in\mathbb{R}^p$ with $\operatorname{supp}(u)\cup\operatorname{supp}(v)\subseteq S$, because then $\operatorname{supp}(v-u)\subseteq S$ and $|S|\leq m$. Hence the restriction of $Q$ to the coordinate subspace supported on $S$ has strong convexity modulus at least $\kappa-\zeta$. If $\kappa>\zeta$, then $\kappa-\zeta>0$, so the displayed inequality is the strong convexity inequality on each sparse line segment with positive modulus at least $\kappa-\zeta$. Hence $Q$ is strongly convex along every line segment joining sparse vectors whose union of supports has cardinality at most $m$. This proves the claimed local restricted strong convexity.
[guided]
The objective is the sum of the squared loss and the coordinatewise penalty. By definition,
\begin{align*}
Q(\beta_t) = \ell(\beta_t)+\sum_{j=1}^p p_\lambda(|(\beta_t)_j|).
\end{align*}
The loss estimate proved above is
\begin{align*}
\ell(\beta_t) \leq (1-t)\ell(\beta)+t\ell(\beta')-\frac{\kappa}{2}t(1-t)|\delta|^2.
\end{align*}
The penalty estimate proved above is
\begin{align*}
\sum_{j=1}^p p_\lambda(|(\beta_t)_j|) \leq (1-t)\sum_{j=1}^p p_\lambda(|\beta_j|)+t\sum_{j=1}^p p_\lambda(|\beta'_j|)+\frac{\zeta}{2}t(1-t)|\delta|^2.
\end{align*}
Adding these two inequalities is valid because both have the same left-side point $\beta_t$ and the same chord parameter $t$. The endpoint loss terms and endpoint penalty terms combine back into $Q(\beta)$ and $Q(\beta')$, while the quadratic coefficients combine by
\begin{align*}
-\frac{\kappa}{2}+\frac{\zeta}{2}=-\frac{\kappa-\zeta}{2}.
\end{align*}
Therefore
\begin{align*}
Q(\beta_t) \leq (1-t)Q(\beta)+tQ(\beta')-\frac{\kappa-\zeta}{2}t(1-t)|\delta|^2.
\end{align*}
Since $\beta_t=(1-t)\beta+t\beta'$ and $\delta=\beta'-\beta$, this becomes
\begin{align*}
Q((1-t)\beta+t\beta') \leq (1-t)Q(\beta)+tQ(\beta')-\frac{\kappa-\zeta}{2}t(1-t)|\beta'-\beta|^2.
\end{align*}
This proves the restricted chord curvature inequality with parameter $\kappa-\zeta$ for the chosen pair. The same proof applies to arbitrary $u,v\in\mathbb{R}^p$ supported in the coordinate subspace indexed by $S$, because $\operatorname{supp}(v-u)\subseteq S$ and $|S|\leq m$ are the only support facts used in the loss estimate. Therefore the entire restriction of $Q$ to that coordinate subspace has strong convexity modulus at least $\kappa-\zeta$. To call this genuine strong convexity, the modulus must be positive. The additional hypothesis $\kappa>\zeta$ gives $\kappa-\zeta>0$, and then the displayed inequality is exactly the strong convexity inequality along every sparse line segment under consideration.
[/guided]
[/step]