[proofplan]
The penalty depends only on $|u|$, so replacing $u$ by $-u$ changes the objective only through the quadratic term; this lets us first reduce to the case $z \geq 0$ and $u \geq 0$. On the nonnegative half-line, the derivative of the objective is affine on each SCAD region, so a minimizer can occur only at a zero of one of these affine pieces or at one of the breakpoints $0,\lambda,a\lambda$. We determine the sign of the one-sided derivatives on each interval and show that the displayed threshold point is exactly where the objective changes from decreasing to increasing. Finally, the symmetry $Q_z(u)=Q_{-z}(-u)$ extends the result to arbitrary $z \in \mathbb{R}$.
[/proofplan]
[step:Reduce the minimization to the nonnegative half-line when $z \geq 0$]
Define the objective function $Q_z: \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
Q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(|u|).
\end{align*}
For $u \in \mathbb{R}$, the penalty terms cancel because $|-u|=|u|$, and expanding the two squares gives
\begin{align*}
Q_z(-u)-Q_z(u)=\frac{1}{2}(-u-z)^2-\frac{1}{2}(u-z)^2=2uz.
\end{align*}
If $z \geq 0$ and $u \geq 0$, then $Q_z(-u) \geq Q_z(u)$. Hence, for $z \geq 0$, a global minimizer may be chosen in $[0,\infty)$.
[guided]
We first remove a nuisance: the penalty depends only on $|u|$, but the quadratic term prefers $u$ to have the same sign as $z$. Define $Q_z: \mathbb{R} \to \mathbb{R}$ by
\begin{align*}
Q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(|u|).
\end{align*}
The penalty term is unchanged when $u$ is replaced by $-u$. Thus the only difference between $Q_z(u)$ and $Q_z(-u)$ comes from the square. First,
\begin{align*}
Q_z(-u)-Q_z(u)=\frac{1}{2}(-u-z)^2+p_{\lambda,a}(|u|)-\frac{1}{2}(u-z)^2-p_{\lambda,a}(|u|).
\end{align*}
Cancelling the penalty terms gives
\begin{align*}
Q_z(-u)-Q_z(u)=\frac{1}{2}\bigl((u+z)^2-(u-z)^2\bigr).
\end{align*}
Expanding the squares yields
\begin{align*}
Q_z(-u)-Q_z(u)=2uz.
\end{align*}
When $z \geq 0$ and $u \geq 0$, this difference is nonnegative. Therefore replacing a negative candidate $-u$ by the nonnegative candidate $u$ cannot increase the objective. It is enough, for $z \geq 0$, to minimize $Q_z$ over $[0,\infty)$.
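As a quick numerical check of the identity above, take the illustrative values $u=2$ and $z=3$ (chosen here only for the example). The penalty terms cancel, and
\begin{align*}
Q_3(-2)-Q_3(2)=\frac{1}{2}(-2-3)^2-\frac{1}{2}(2-3)^2=\frac{25}{2}-\frac{1}{2}=12=2\cdot 2\cdot 3.
\end{align*}
So the negative candidate $-2$ costs $12$ more than the nonnegative candidate $2$, whatever the values of $\lambda$ and $a$.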
[/guided]
[/step]
[step:Compute the derivative of the restricted objective on the SCAD regions]
Assume $z \geq 0$ and define $q_z: [0,\infty) \to \mathbb{R}$ by
\begin{align*}
q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(u).
\end{align*}
The SCAD penalty is the [continuous function](/page/Continuous%20Function) $p_{\lambda,a}: [0,\infty) \to [0,\infty)$ defined as follows. For $0\leq t\leq \lambda$,
\begin{align*}
p_{\lambda,a}(t)=\lambda t.
\end{align*}
For $\lambda<t\leq a\lambda$,
\begin{align*}
p_{\lambda,a}(t)=\frac{-t^2+2a\lambda t-\lambda^2}{2(a-1)}.
\end{align*}
For $t>a\lambda$,
\begin{align*}
p_{\lambda,a}(t)=\frac{(a+1)\lambda^2}{2}.
\end{align*}
The values agree at $t=\lambda$ and $t=a\lambda$, so $p_{\lambda,a}$ is continuous across the breakpoints. Since $\lambda>0$ and $a>2$, the breakpoints satisfy $0<\lambda<2\lambda<a\lambda$. For $u > 0$ away from the breakpoints $\lambda$ and $a\lambda$, differentiating this piecewise formula gives the following derivative region by region. On $0<u<\lambda$,
\begin{align*}
q_z'(u)=u-z+\lambda.
\end{align*}
On $\lambda<u<a\lambda$,
\begin{align*}
q_z'(u)=u-z+\frac{a\lambda-u}{a-1}.
\end{align*}
On $u>a\lambda$,
\begin{align*}
q_z'(u)=u-z.
\end{align*}
Equivalently, on $\lambda<u<a\lambda$,
\begin{align*}
q_z'(u)=\frac{(a-2)u+(a\lambda-(a-1)z)}{a-1}.
\end{align*}
Because $a>2$, the derivative is strictly increasing on each of the intervals $(0,\lambda)$, $(\lambda,a\lambda)$, and $(a\lambda,\infty)$.
[guided]
The restricted objective is a one-variable piecewise smooth function, so the minimizer can be found by tracking where the derivative changes sign. Assume $z\geq 0$ and define $q_z:[0,\infty)\to\mathbb{R}$ by
\begin{align*}
q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(u).
\end{align*}
Here the SCAD penalty is the continuous map $p_{\lambda,a}: [0,\infty)\to[0,\infty)$ given as follows. For $0\leq t\leq \lambda$,
\begin{align*}
p_{\lambda,a}(t)=\lambda t.
\end{align*}
For $\lambda<t\leq a\lambda$,
\begin{align*}
p_{\lambda,a}(t)=\frac{-t^2+2a\lambda t-\lambda^2}{2(a-1)}.
\end{align*}
For $t>a\lambda$,
\begin{align*}
p_{\lambda,a}(t)=\frac{(a+1)\lambda^2}{2}.
\end{align*}
The first and second formulas both give $\lambda^2$ at $t=\lambda$, and the second and third formulas both give $(a+1)\lambda^2/2$ at $t=a\lambda$. Thus $p_{\lambda,a}$, and hence $q_z$, is continuous across the breakpoints. Since $\lambda>0$ and $a>2$, the breakpoints satisfy $0<\lambda<2\lambda<a\lambda$. Differentiating the displayed piecewise formula gives the SCAD derivative: $\lambda$ on $0<u<\lambda$, $(a\lambda-u)/(a-1)$ on $\lambda<u<a\lambda$, and $0$ on $u>a\lambda$. Therefore differentiating the quadratic term and adding the SCAD derivative gives
\begin{align*}
q_z'(u)=u-z+\lambda
\end{align*}
for $0<u<\lambda$,
\begin{align*}
q_z'(u)=u-z+\frac{a\lambda-u}{a-1}
\end{align*}
for $\lambda<u<a\lambda$, and
\begin{align*}
q_z'(u)=u-z
\end{align*}
for $u>a\lambda$. On the middle interval, collecting the $u$ terms gives
\begin{align*}
q_z'(u)=\frac{(a-2)u+(a\lambda-(a-1)z)}{a-1}.
\end{align*}
The hypothesis $a>2$ is used here: the coefficient of $u$ on the middle interval is $(a-2)/(a-1)>0$. Hence $q_z'$ is strictly increasing on each smooth SCAD region. This reduces the minimization problem to finding the unique sign change of $q_z'$ in the appropriate interval and checking the adjacent breakpoints.
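To make the piecewise formulas concrete, consider the illustrative parameter values $\lambda=1$ and $a=3$ (an assumption made only for this example; any $\lambda>0$ and $a>2$ behave the same way). The breakpoints are $1$ and $3$, and the middle-region penalty is $p_{1,3}(t)=(-t^2+6t-1)/4$, which equals $1$ at $t=1$ and $2$ at $t=3$, matching the neighboring formulas. The derivative of the restricted objective becomes
\begin{align*}
q_z'(u)=u-z+1 \quad (0<u<1),\qquad
q_z'(u)=\frac{u+3-2z}{2} \quad (1<u<3),\qquad
q_z'(u)=u-z \quad (u>3).
\end{align*}
Each piece has positive slope ($1$, $1/2$, and $1$), and the one-sided limits agree at both breakpoints: they equal $2-z$ at $u=1$ and $3-z$ at $u=3$.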
[/guided]
[/step]
[step:Identify the minimizer for $0 \leq z \leq 2\lambda$]
On $(0,\lambda)$,
\begin{align*}
q_z'(u)=u-z+\lambda.
\end{align*}
If $0 \leq z \leq \lambda$, then $q_z'(u)>0$ for every $u \in (0,\lambda)$, and the right derivative at $0$ is $\lambda-z \geq 0$. On $(\lambda,a\lambda)$,
\begin{align*}
q_z'(\lambda+)
=
\lambda-z+\frac{a\lambda-\lambda}{a-1}
=
2\lambda-z
\geq 0,
\end{align*}
and since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, $q_z'(u)>0$ there. Also $q_z'(u)=u-z>0$ on $(a\lambda,\infty)$ because $u>a\lambda>\lambda\geq z$. Thus $q_z$ is nondecreasing from $0$ onward, and $u=0$ is a global minimizer.
If $\lambda<z\leq 2\lambda$, then the equation $q_z'(u)=0$ on $(0,\lambda)$ gives
\begin{align*}
u=z-\lambda.
\end{align*}
This point lies in $(0,\lambda]$, and it equals the breakpoint $\lambda$ only when $z=2\lambda$. The derivative is negative on $(0,z-\lambda)$ and positive on $(z-\lambda,\lambda)$. At $\lambda$,
\begin{align*}
q_z'(\lambda+)=2\lambda-z\geq 0,
\end{align*}
and the derivative remains positive on $(\lambda,a\lambda)$ and on $(a\lambda,\infty)$. Hence $u=z-\lambda$ is a global minimizer.
[guided]
We now use the derivative signs to locate the minimizer when $0\leq z\leq 2\lambda$. If $0\leq z\leq\lambda$, then on $0<u<\lambda$ we have
\begin{align*}
q_z'(u)=u-z+\lambda>0,
\end{align*}
and the right derivative at $0$ is $\lambda-z\geq0$. At the next region, the right derivative at $\lambda$ is
\begin{align*}
q_z'(\lambda+)=\lambda-z+\frac{a\lambda-\lambda}{a-1}=2\lambda-z\geq0.
\end{align*}
Since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, it stays positive there. On $(a\lambda,\infty)$, $q_z'(u)=u-z>0$ because $u>a\lambda>\lambda\geq z$. Thus $q_z$ never decreases after $0$, and $u=0$ is a global minimizer.
If $\lambda<z\leq2\lambda$, then solving $q_z'(u)=0$ on $(0,\lambda)$ gives $u=z-\lambda$, which lies in $(0,\lambda]$. The formula $q_z'(u)=u-z+\lambda$ shows that $q_z'$ is negative before $z-\lambda$ and positive after $z-\lambda$ inside $(0,\lambda)$. At the breakpoint $\lambda$, the one-sided derivative into the next region satisfies
\begin{align*}
q_z'(\lambda+)=2\lambda-z\geq0.
\end{align*}
The strict increase of $q_z'$ on $(\lambda,a\lambda)$ and the formula $q_z'(u)=u-z$ on $(a\lambda,\infty)$ keep the derivative nonnegative after this point. Therefore $q_z$ decreases until $u=z-\lambda$ and does not decrease afterward, so $u=z-\lambda$ is a global minimizer.
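For a worked instance of the second case, take the illustrative values $\lambda=1$, $a=3$, and $z=3/2$ (assumptions made only for this example). Then $q_z'(u)=u-1/2$ on $(0,1)$, which vanishes at $u=1/2=z-\lambda$. Comparing the objective at this point with its values at the two neighboring breakpoints gives
\begin{align*}
q_z\!\left(\tfrac{1}{2}\right)=\frac{1}{2}\left(\tfrac{1}{2}-\tfrac{3}{2}\right)^2+\tfrac{1}{2}=1,\qquad
q_z(0)=\frac{1}{2}\left(\tfrac{3}{2}\right)^2=\frac{9}{8},\qquad
q_z(1)=\frac{1}{2}\left(1-\tfrac{3}{2}\right)^2+1=\frac{9}{8}.
\end{align*}
Both breakpoint values exceed $q_z(1/2)=1$, consistent with the sign analysis above.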
[/guided]
[/step]
[step:Identify the minimizer for $2\lambda<z\leq a\lambda$]
Assume $2\lambda<z\leq a\lambda$. The equation $q_z'(u)=0$ on $(\lambda,a\lambda)$ is
\begin{align*}
u-z+\frac{a\lambda-u}{a-1}=0.
\end{align*}
Solving the affine equation, first multiply by $a-1$ to obtain
\begin{align*}
(a-1)u-(a-1)z+a\lambda-u=0.
\end{align*}
Collecting the $u$ terms gives
\begin{align*}
(a-2)u=(a-1)z-a\lambda.
\end{align*}
Since $a>2$, division by $a-2$ gives
\begin{align*}
u=\frac{(a-1)z-a\lambda}{a-2}.
\end{align*}
Call this point $u_z$. Since $z>2\lambda$,
\begin{align*}
u_z>\frac{2(a-1)\lambda-a\lambda}{a-2}=\lambda,
\end{align*}
and since $z\leq a\lambda$,
\begin{align*}
u_z\leq \frac{a(a-1)\lambda-a\lambda}{a-2}=a\lambda.
\end{align*}
Thus $u_z \in (\lambda,a\lambda]$. Moreover $q_z'(\lambda+)=2\lambda-z<0$ and $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, so $q_z'$ is negative on $(\lambda,u_z)$ and positive on $(u_z,a\lambda)$. On $(0,\lambda)$ the derivative is increasing with $q_z'(\lambda-)=2\lambda-z<0$, so it is negative throughout. On $(a\lambda,\infty)$, $q_z'(u)=u-z>a\lambda-z\geq 0$. Therefore $q_z$ decreases on $[0,u_z]$ and increases on $[u_z,\infty)$, so $u_z$ is a global minimizer.
[guided]
Assume $2\lambda<z\leq a\lambda$. The minimizer should now occur in the concave-transition part of the SCAD penalty, so we solve the middle-region equation. On $\lambda<u<a\lambda$,
\begin{align*}
q_z'(u)=u-z+\frac{a\lambda-u}{a-1}.
\end{align*}
Setting this equal to zero and multiplying by $a-1$ gives
\begin{align*}
(a-1)u-(a-1)z+a\lambda-u=0.
\end{align*}
Collecting the $u$ terms gives
\begin{align*}
(a-2)u=(a-1)z-a\lambda.
\end{align*}
Because $a>2$, division by $a-2$ is valid, and the critical point is
\begin{align*}
u_z=\frac{(a-1)z-a\lambda}{a-2}.
\end{align*}
The assumptions place this point in the middle SCAD region. Indeed, $z>2\lambda$ implies
\begin{align*}
u_z>\frac{2(a-1)\lambda-a\lambda}{a-2}=\lambda,
\end{align*}
and $z\leq a\lambda$ implies
\begin{align*}
u_z\leq\frac{a(a-1)\lambda-a\lambda}{a-2}=a\lambda.
\end{align*}
At the left edge of the middle region,
\begin{align*}
q_z'(\lambda+)=2\lambda-z<0.
\end{align*}
Since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$ and its affine formula vanishes at $u_z$, it is negative on $(\lambda,u_z)$ and positive on $(u_z,a\lambda)$. Before $\lambda$, the derivative is increasing and $q_z'(\lambda-)=2\lambda-z<0$, so it is negative on all of $(0,\lambda)$. After $a\lambda$, the derivative is $q_z'(u)=u-z>a\lambda-z\geq0$. Therefore $q_z$ decreases up to $u_z$ and increases afterward, so $u_z$ is a global minimizer.
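For a worked instance of this case, take the illustrative values $\lambda=1$, $a=3$, and $z=5/2$ (assumptions made only for this example), so that $2\lambda<z\leq a\lambda$. The critical point is
\begin{align*}
u_z=\frac{(3-1)\cdot\tfrac{5}{2}-3\cdot 1}{3-2}=2,
\end{align*}
which lies in $(1,3]$, and the middle-region derivative is $q_z'(u)=(u-2)/2$. Using $p_{1,3}(t)=(-t^2+6t-1)/4$ on $1<t\leq 3$, the objective at $u_z$ and at the soft-thresholding value $z-\lambda=3/2$ is
\begin{align*}
q_z(2)=\frac{1}{2}\left(2-\tfrac{5}{2}\right)^2+\frac{7}{4}=\frac{15}{8},\qquad
q_z\!\left(\tfrac{3}{2}\right)=\frac{1}{2}\left(\tfrac{3}{2}-\tfrac{5}{2}\right)^2+\frac{23}{16}=\frac{31}{16}.
\end{align*}
Since $15/8=30/16<31/16$, the point $u_z=2$ does better than plain soft thresholding in this example, as the derivative analysis predicts.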
[/guided]
[/step]
[step:Identify the minimizer for $z>a\lambda$]
Assume $z>a\lambda$. On $(a\lambda,\infty)$,
\begin{align*}
q_z'(u)=u-z,
\end{align*}
so the unique critical point in this region is $u=z$, and it lies in $(a\lambda,\infty)$. On $(0,\lambda)$ and $(\lambda,a\lambda)$ the derivative is strictly increasing, so on each interval it is bounded above by its limit at the right endpoint. These limits are $q_z'(\lambda-)=2\lambda-z<0$, because $z>a\lambda>2\lambda$, and
\begin{align*}
q_z'(a\lambda-)=a\lambda-z<0.
\end{align*}
Thus $q_z'<0$ on both intervals. After $a\lambda$, the derivative $u-z$ is negative for $a\lambda<u<z$ and positive for $u>z$. Hence $q_z$ decreases up to $u=z$ and increases after $u=z$, so $u=z$ is a global minimizer.
[guided]
Assume $z>a\lambda$. On the last SCAD region the penalty derivative is zero, so
\begin{align*}
q_z'(u)=u-z
\end{align*}
for $u>a\lambda$. The equation $q_z'(u)=0$ has the critical point $u=z$, and the assumption $z>a\lambda$ ensures that this critical point lies inside the region where the formula applies. At the left endpoint of this region, the derivative coming from the middle interval satisfies
\begin{align*}
q_z'(a\lambda-)=a\lambda-z<0.
\end{align*}
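Since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, it is negative on that whole interval. The same argument applies on $(0,\lambda)$, where $q_z'(\lambda-)=2\lambda-z<0$ because $z>a\lambda>2\lambda$. Thus the objective is decreasing all the way up to $a\lambda$. For $a\lambda<u<z$, the derivative $u-z$ remains negative, and for $u>z$ it is positive. Hence $q_z$ decreases until $u=z$ and increases after $u=z$, so $u=z$ is a global minimizer.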
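For a worked instance of this case, take the illustrative values $\lambda=1$, $a=3$, and $z=4$ (assumptions made only for this example), so that $z>a\lambda=3$. The one-sided derivatives at the two breakpoints are
\begin{align*}
q_z'(1-)=1-4+1=-2,\qquad q_z'(3-)=3-4=-1,
\end{align*}
both negative, and the objective values at $0$, at the last breakpoint, and at the critical point are
\begin{align*}
q_z(0)=\frac{1}{2}\cdot 4^2=8,\qquad
q_z(3)=\frac{1}{2}(3-4)^2+2=\frac{5}{2},\qquad
q_z(4)=0+2=2.
\end{align*}
The value at $u=z=4$ is the smallest of the three: the penalty is constant beyond $a\lambda$, so the estimate is left unshrunk.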
[/guided]
[/step]
[step:Restore the sign and read off the SCAD thresholding rule]
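For $z\geq 0$, the preceding steps show that in each case $q_z$ is strictly decreasing before the identified point and strictly increasing after it, so the minimizer over $[0,\infty)$ is unique. It is also the unique minimizer over $\mathbb{R}$: if $z>0$, then $Q_z(-u)-Q_z(u)=2uz>0$ for every $u>0$, and if $z=0$, then $Q_0(-u)=Q_0(u)>Q_0(0)$ for every $u>0$. The minimizer is described by the following four alternatives. If $0 \leq z \leq \lambda$, then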
\begin{align*}
\hat u=0.
\end{align*}
If $\lambda<z\leq 2\lambda$, then
\begin{align*}
\hat u=z-\lambda.
\end{align*}
If $2\lambda<z\leq a\lambda$, then
\begin{align*}
\hat u=\frac{(a-1)z-a\lambda}{a-2}.
\end{align*}
If $z>a\lambda$, then
\begin{align*}
\hat u=z.
\end{align*}
This is precisely $T_{\mathrm{SCAD}}(z)$.
For $z<0$, write $r=|z|$. Since
\begin{align*}
Q_z(u)=Q_r(-u),
\end{align*}
the minimizers of $Q_z$ are precisely the negatives of the minimizers of $Q_r$. Since the minimizer for $Q_r$ is unique, every global minimizer for $Q_z$ is
\begin{align*}
\hat u=\operatorname{sgn}(z)\,T_{\mathrm{SCAD}}(|z|).
\end{align*}
This is the asserted SCAD thresholding rule.
[guided]
The previous steps solved the problem for $z\geq0$. In that case the global minimizer is unique and is exactly the four-piece threshold value: $0$ for $0\leq z\leq\lambda$, $z-\lambda$ for $\lambda<z\leq2\lambda$, $((a-1)z-a\lambda)/(a-2)$ for $2\lambda<z\leq a\lambda$, and $z$ for $z>a\lambda$. Thus the nonnegative case gives $T_{\mathrm{SCAD}}(z)$.
Now suppose $z<0$ and define $r=|z|$. For every $u\in\mathbb{R}$, direct substitution gives
\begin{align*}
Q_z(u)=Q_r(-u).
\end{align*}
Therefore minimizing $Q_z$ over $u$ is the same as minimizing $Q_r$ over $-u$. Since the nonnegative case gives the unique global minimizer $T_{\mathrm{SCAD}}(r)$ for $Q_r$, every minimizer for $Q_z$ is $-T_{\mathrm{SCAD}}(r)$. Because $z<0$, this equals $\operatorname{sgn}(z)T_{\mathrm{SCAD}}(|z|)$. Combining this with the case $z\geq0$ proves that every global minimizer satisfies
\begin{align*}
\hat u=\operatorname{sgn}(z)\,T_{\mathrm{SCAD}}(|z|)
\end{align*}
for every $z\in\mathbb{R}$.
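Although it is not needed for the proof, a short consistency check shows that the four pieces of the rule fit together continuously. At $z=\lambda$ the first two pieces both give $0$. At $z=2\lambda$ and $z=a\lambda$ the adjacent formulas give
\begin{align*}
2\lambda-\lambda=\lambda=\frac{2(a-1)\lambda-a\lambda}{a-2},\qquad
\frac{a(a-1)\lambda-a\lambda}{a-2}=\frac{a(a-2)\lambda}{a-2}=a\lambda.
\end{align*}
As a sign example, with the illustrative values $\lambda=1$, $a=3$, and $z=-5/2$ (assumptions made only for this example), the third piece applies to $|z|=5/2$ and
\begin{align*}
\hat u=\operatorname{sgn}\!\left(-\tfrac{5}{2}\right)T_{\mathrm{SCAD}}\!\left(\tfrac{5}{2}\right)=-\frac{2\cdot\tfrac{5}{2}-3}{1}=-2.
\end{align*}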
[/guided]
[/step]