SCAD Thresholding Rule — Statement & Proof

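Theorem

Let $\lambda>0$ and $a>2$, and let $p_{\lambda,a}$ denote the SCAD penalty. For every $z\in\mathbb{R}$, the function \begin{align*} Q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(|u|) \end{align*} has a unique global minimizer over $u\in\mathbb{R}$, namely \begin{align*} \hat u=\operatorname{sgn}(z)\,T_{\mathrm{SCAD}}(|z|), \end{align*} where for $t\geq 0$ \begin{align*} T_{\mathrm{SCAD}}(t)=\begin{cases} 0, & 0\leq t\leq\lambda,\\ t-\lambda, & \lambda<t\leq 2\lambda,\\ \dfrac{(a-1)t-a\lambda}{a-2}, & 2\lambda<t\leq a\lambda,\\ t, & t>a\lambda. \end{cases} \end{align*}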

Proof

[proofplan] Apart from the linear term $-uz$, the objective is even in $u$, so we first reduce to the case $z \geq 0$ and $u \geq 0$. On the nonnegative half-line, the derivative of the objective is affine on each SCAD region, so the only possible minimizers occur at the affine critical points or at the breakpoints $0,\lambda,a\lambda$. We determine the sign of the one-sided derivatives on each interval and show that the claimed threshold value is exactly where the objective changes from decreasing to increasing. Finally, symmetry restores the sign for arbitrary $z \in \mathbb{R}$. [/proofplan] [step:Reduce the minimization to the nonnegative half-line when $z \geq 0$] Define the objective function $Q_z: \mathbb{R} \to \mathbb{R}$ by \begin{align*} Q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(|u|). \end{align*} For $u \in \mathbb{R}$, the penalty terms cancel because $p_{\lambda,a}(|-u|)=p_{\lambda,a}(|u|)$, and expanding the two squares gives \begin{align*} Q_z(-u)-Q_z(u)=\frac{1}{2}(-u-z)^2-\frac{1}{2}(u-z)^2=2uz. \end{align*} If $z \geq 0$ and $u \geq 0$, then $Q_z(-u) \geq Q_z(u)$, with strict inequality when $z>0$ and $u>0$. Hence, for $z \geq 0$, a global minimizer may be chosen in $[0,\infty)$. [guided] We first remove a nuisance: the penalty depends only on $|u|$, but the quadratic term prefers $u$ to have the same sign as $z$. Define $Q_z: \mathbb{R} \to \mathbb{R}$ by \begin{align*} Q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(|u|). \end{align*} The penalty term is unchanged when $u$ is replaced by $-u$. Thus the only difference between $Q_z(u)$ and $Q_z(-u)$ comes from the square. Writing out both values gives \begin{align*} Q_z(-u)-Q_z(u)=\frac{1}{2}(-u-z)^2+p_{\lambda,a}(|u|)-\frac{1}{2}(u-z)^2-p_{\lambda,a}(|u|). \end{align*} Cancelling the penalty terms and using $(-u-z)^2=(u+z)^2$ gives \begin{align*} Q_z(-u)-Q_z(u)=\frac{1}{2}\bigl((u+z)^2-(u-z)^2\bigr). \end{align*} Expanding the squares yields \begin{align*} Q_z(-u)-Q_z(u)=2uz. \end{align*} When $z \geq 0$ and $u \geq 0$, this difference is nonnegative. Therefore replacing a negative candidate $-u$ by the nonnegative candidate $u$ cannot increase the objective. 
It is enough, for $z \geq 0$, to minimize $Q_z$ over $[0,\infty)$. [/guided] [/step] [step:Compute the derivative of the restricted objective on the SCAD regions] Assume $z \geq 0$ and define $q_z: [0,\infty) \to \mathbb{R}$ by \begin{align*} q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(u). \end{align*} The SCAD penalty is the [continuous function](/page/Continuous%20Function) $p_{\lambda,a}: [0,\infty) \to [0,\infty)$ defined as follows. For $0\leq t\leq \lambda$, \begin{align*} p_{\lambda,a}(t)=\lambda t. \end{align*} For $\lambda<t\leq a\lambda$, \begin{align*} p_{\lambda,a}(t)=\frac{-t^2+2a\lambda t-\lambda^2}{2(a-1)}. \end{align*} For $t>a\lambda$, \begin{align*} p_{\lambda,a}(t)=\frac{(a+1)\lambda^2}{2}. \end{align*} The values agree at $t=\lambda$ and $t=a\lambda$, so $p_{\lambda,a}$ is continuous across the breakpoints. Since $\lambda>0$ and $a>2$, the breakpoints and the additional threshold $2\lambda$ satisfy $0<\lambda<2\lambda<a\lambda$. For $u > 0$ away from the breakpoints $\lambda$ and $a\lambda$, differentiating this piecewise formula gives the following derivative on each region. On $0<u<\lambda$, \begin{align*} q_z'(u)=u-z+\lambda. \end{align*} On $\lambda<u<a\lambda$, \begin{align*} q_z'(u)=u-z+\frac{a\lambda-u}{a-1}. \end{align*} On $u>a\lambda$, \begin{align*} q_z'(u)=u-z. \end{align*} Equivalently, on $\lambda<u<a\lambda$, \begin{align*} q_z'(u)=\frac{(a-2)u+(a\lambda-(a-1)z)}{a-1}. \end{align*} Because $a>2$, the derivative is strictly increasing on each of the intervals $(0,\lambda)$, $(\lambda,a\lambda)$, and $(a\lambda,\infty)$. At the breakpoints the one-sided limits of $q_z'$ agree: $q_z'(\lambda-)=q_z'(\lambda+)=2\lambda-z$ and $q_z'(a\lambda-)=q_z'(a\lambda+)=a\lambda-z$. [guided] The restricted objective is a one-variable piecewise smooth function, so the minimizer can be found by tracking where the derivative changes sign. Assume $z\geq 0$ and define $q_z:[0,\infty)\to\mathbb{R}$ by \begin{align*} q_z(u)=\frac{1}{2}(u-z)^2+p_{\lambda,a}(u). \end{align*} Here the SCAD penalty is the continuous map $p_{\lambda,a}: [0,\infty)\to[0,\infty)$ given as follows. For $0\leq t\leq \lambda$, \begin{align*} p_{\lambda,a}(t)=\lambda t. 
\end{align*} For $\lambda<t\leq a\lambda$, \begin{align*} p_{\lambda,a}(t)=\frac{-t^2+2a\lambda t-\lambda^2}{2(a-1)}. \end{align*} For $t>a\lambda$, \begin{align*} p_{\lambda,a}(t)=\frac{(a+1)\lambda^2}{2}. \end{align*} The first and second formulas both give $\lambda^2$ at $t=\lambda$, and the second and third formulas both give $(a+1)\lambda^2/2$ at $t=a\lambda$. Thus $p_{\lambda,a}$, and hence $q_z$, is continuous across the breakpoints. Since $\lambda>0$ and $a>2$, the thresholds satisfy $0<\lambda<2\lambda<a\lambda$. Differentiating the displayed piecewise formula gives the SCAD derivative $\lambda$ on $0<u<\lambda$, $(a\lambda-u)/(a-1)$ on $\lambda<u<a\lambda$, and $0$ on $u>a\lambda$. Therefore differentiating the quadratic term and adding the SCAD derivative gives \begin{align*} q_z'(u)=u-z+\lambda \end{align*} for $0<u<\lambda$, \begin{align*} q_z'(u)=u-z+\frac{a\lambda-u}{a-1} \end{align*} for $\lambda<u<a\lambda$, and \begin{align*} q_z'(u)=u-z \end{align*} for $u>a\lambda$. On the middle interval, collecting the $u$ terms gives \begin{align*} q_z'(u)=\frac{(a-2)u+(a\lambda-(a-1)z)}{a-1}. \end{align*} The hypothesis $a>2$ is used here: the coefficient of $u$ on the middle interval is $(a-2)/(a-1)>0$. Hence $q_z'$ is strictly increasing on each smooth SCAD region. This reduces the minimization problem to finding the unique sign change of $q_z'$ in the appropriate interval and checking the adjacent breakpoints. [/guided] [/step] [step:Identify the minimizer for $0 \leq z \leq 2\lambda$] On $(0,\lambda)$, \begin{align*} q_z'(u)=u-z+\lambda. \end{align*} If $0 \leq z \leq \lambda$, then $q_z'(u)>0$ for every $u \in (0,\lambda)$, and the right derivative at $0$ is $\lambda-z \geq 0$. On $(\lambda,a\lambda)$, \begin{align*} q_z'(\lambda+) = \lambda-z+\frac{a\lambda-\lambda}{a-1} = 2\lambda-z \geq 0, \end{align*} and since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, $q_z'(u)>0$ there. 
Also $q_z'(u)=u-z>0$ on $(a\lambda,\infty)$ because $u>a\lambda>\lambda\geq z$. Thus $q_z$ is strictly increasing on $[0,\infty)$, and $u=0$ is its unique minimizer there. If $\lambda<z\leq 2\lambda$, then the equation $u-z+\lambda=0$ gives \begin{align*} u=z-\lambda. \end{align*} This point lies in $(0,\lambda]$, and it is the breakpoint $\lambda$ exactly when $z=2\lambda$. On $(0,\lambda)$ the derivative is negative before $z-\lambda$ and positive after $z-\lambda$. At $\lambda$, \begin{align*} q_z'(\lambda+)=2\lambda-z\geq 0, \end{align*} so the derivative is positive on $(\lambda,a\lambda)$, and on $(a\lambda,\infty)$ it satisfies $q_z'(u)=u-z>a\lambda-2\lambda>0$. Hence $q_z$ is strictly decreasing on $[0,z-\lambda]$ and strictly increasing on $[z-\lambda,\infty)$, so $u=z-\lambda$ is its unique minimizer on $[0,\infty)$. [guided] We now use the derivative signs to locate the minimizer when $0\leq z\leq 2\lambda$. If $0\leq z\leq\lambda$, then on $0<u<\lambda$ we have \begin{align*} q_z'(u)=u-z+\lambda>0, \end{align*} and the right derivative at $0$ is $\lambda-z\geq0$. At the next region, the right derivative at $\lambda$ is \begin{align*} q_z'(\lambda+)=\lambda-z+\frac{a\lambda-\lambda}{a-1}=2\lambda-z\geq0. \end{align*} Since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, it stays positive there. On $(a\lambda,\infty)$, $q_z'(u)=u-z>0$ because $u>a\lambda>\lambda\geq z$. Thus $q_z$ never decreases after $0$, and $u=0$ is a global minimizer. If $\lambda<z\leq2\lambda$, then solving $u-z+\lambda=0$ gives $u=z-\lambda$, which lies in $(0,\lambda]$. The formula $q_z'(u)=u-z+\lambda$ shows that $q_z'$ is negative before $z-\lambda$ and positive after $z-\lambda$ inside $(0,\lambda)$. At the breakpoint $\lambda$, the one-sided derivative into the next region satisfies \begin{align*} q_z'(\lambda+)=2\lambda-z\geq0. \end{align*} The strict increase of $q_z'$ on $(\lambda,a\lambda)$ makes the derivative positive there, and on $(a\lambda,\infty)$ the formula $q_z'(u)=u-z$ gives $q_z'(u)>a\lambda-2\lambda>0$. Therefore $q_z$ decreases until $u=z-\lambda$ and increases afterward, so $u=z-\lambda$ is a global minimizer. 
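The sign pattern in this step can be checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the proof: the sample values $\lambda=1$ and $a=3.7$ are assumptions (any $\lambda>0$ and $a>2$ would do), and `scad_dq` and `first_nonneg` are hypothetical helper names. It evaluates the piecewise derivative $q_z'$ on a fine grid and returns the first grid point where $q_z'$ becomes nonnegative; for $0\leq z\leq 2\lambda$ this should agree with $\max(z-\lambda,0)$, the soft-thresholding value, up to the grid spacing.

```python
# Sketch only: numerically locate the sign change of q_z' for 0 <= z <= 2*lam.
# LAM and A are assumed sample values; any lam > 0 and a > 2 would do.
LAM, A = 1.0, 3.7


def scad_dq(u, z, lam=LAM, a=A):
    """Return q_z'(u) for u > 0, where q_z(u) = (u - z)**2 / 2 + p_{lam,a}(u).

    The SCAD derivative is lam on (0, lam], (a*lam - u) / (a - 1) on
    (lam, a*lam], and 0 beyond a*lam; the pieces agree at both breakpoints.
    """
    if u <= lam:
        pen = lam
    elif u <= a * lam:
        pen = (a * lam - u) / (a - 1)
    else:
        pen = 0.0
    return u - z + pen


def first_nonneg(z, lam=LAM, a=A, n=200_000):
    """Return the smallest grid point u in (0, 3*a*lam] with q_z'(u) >= 0."""
    h = 3 * a * lam / n
    for k in range(1, n + 1):
        u = k * h
        if scad_dq(u, z, lam, a) >= 0:
            return u
    return None  # q_z' stays negative on the whole grid


# For 0 <= z <= 2*lam the sign change should sit at max(z - lam, 0),
# i.e. the soft-thresholding value, up to the grid spacing h.
for z in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(z, first_nonneg(z), max(z - LAM, 0.0))
```

Scanning for the first nonnegative value is enough here because $q_z'$ is continuous and increasing on $(0,\infty)$: each piece is increasing and the pieces match at $\lambda$ and $a\lambda$.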
[/guided] [/step] [step:Identify the minimizer for $2\lambda<z\leq a\lambda$] Assume $2\lambda<z\leq a\lambda$. The equation $q_z'(u)=0$ on $(\lambda,a\lambda)$ is \begin{align*} u-z+\frac{a\lambda-u}{a-1}=0. \end{align*} Multiplying by $a-1$ gives \begin{align*} (a-1)u-(a-1)z+a\lambda-u=0. \end{align*} Collecting the $u$ terms gives \begin{align*} (a-2)u=(a-1)z-a\lambda. \end{align*} Since $a>2$, division by $a-2$ gives \begin{align*} u=\frac{(a-1)z-a\lambda}{a-2}. \end{align*} Call this point $u_z$. Since $z>2\lambda$, \begin{align*} u_z>\frac{2(a-1)\lambda-a\lambda}{a-2}=\lambda, \end{align*} and since $z\leq a\lambda$, \begin{align*} u_z\leq \frac{a(a-1)\lambda-a\lambda}{a-2}=a\lambda. \end{align*} Thus $u_z \in (\lambda,a\lambda]$, with $u_z=a\lambda$ exactly when $z=a\lambda$. Moreover $q_z'(\lambda+)=2\lambda-z<0$ and $q_z'$ is strictly increasing on $(\lambda,a\lambda)$, so $q_z'$ is negative on $(\lambda,u_z)$ and positive on $(u_z,a\lambda)$. On $(0,\lambda)$ the derivative is increasing with $q_z'(\lambda-)=2\lambda-z<0$, so it is negative throughout that interval. On $(a\lambda,\infty)$, $q_z'(u)=u-z>a\lambda-z\geq 0$. Therefore $q_z$ is strictly decreasing on $[0,u_z]$ and strictly increasing on $[u_z,\infty)$, so $u_z$ is its unique minimizer on $[0,\infty)$. [guided] Assume $2\lambda<z\leq a\lambda$. The minimizer should now occur in the concave-transition part of the SCAD penalty, so we solve the middle-region equation. On $\lambda<u<a\lambda$, \begin{align*} q_z'(u)=u-z+\frac{a\lambda-u}{a-1}. \end{align*} Setting this equal to zero and multiplying by $a-1$ gives \begin{align*} (a-1)u-(a-1)z+a\lambda-u=0. \end{align*} Collecting the $u$ terms gives \begin{align*} (a-2)u=(a-1)z-a\lambda. \end{align*} Because $a>2$, division by $a-2$ is valid, and the critical point is \begin{align*} u_z=\frac{(a-1)z-a\lambda}{a-2}. \end{align*} The assumptions place this point in the middle SCAD region. 
Indeed, $z>2\lambda$ implies \begin{align*} u_z>\frac{2(a-1)\lambda-a\lambda}{a-2}=\lambda, \end{align*} and $z\leq a\lambda$ implies \begin{align*} u_z\leq\frac{a(a-1)\lambda-a\lambda}{a-2}=a\lambda. \end{align*} At the left edge of the middle region, \begin{align*} q_z'(\lambda+)=2\lambda-z<0. \end{align*} Since $q_z'$ is strictly increasing on $(\lambda,a\lambda)$ and vanishes at $u_z$ (as a one-sided limit when $u_z=a\lambda$), it is negative before $u_z$ and positive after it within that region. On $(0,\lambda)$, the derivative is increasing and $q_z'(\lambda-)=2\lambda-z<0$, so it is negative throughout. After $a\lambda$, the derivative is $q_z'(u)=u-z>a\lambda-z\geq0$. Therefore $q_z$ decreases up to $u_z$ and increases afterward, so $u_z$ is a global minimizer. [/guided] [/step] [step:Identify the minimizer for $z>a\lambda$] Assume $z>a\lambda$. On $(a\lambda,\infty)$, \begin{align*} q_z'(u)=u-z, \end{align*} so the unique critical point in this region is $u=z$, and it lies in $(a\lambda,\infty)$. On $(0,\lambda)$ and $(\lambda,a\lambda)$ the derivative is increasing, and its limits at the right endpoints are negative: \begin{align*} q_z'(\lambda-)=2\lambda-z<0 \quad\text{and}\quad q_z'(a\lambda-)=a\lambda-z<0. \end{align*} Thus $q_z'<0$ on both intervals. After $a\lambda$, the derivative $u-z$ is negative for $a\lambda<u<z$ and positive for $u>z$. Hence $q_z$ strictly decreases up to $u=z$ and strictly increases after $u=z$, so $u=z$ is its unique minimizer on $[0,\infty)$. [guided] Assume $z>a\lambda$. On the last SCAD region the penalty derivative is zero, so \begin{align*} q_z'(u)=u-z \end{align*} for $u>a\lambda$. The equation $q_z'(u)=0$ has the critical point $u=z$, and the assumption $z>a\lambda$ ensures that this critical point lies inside the region where the formula applies. At the left endpoint of this region, the derivative coming from the middle interval satisfies \begin{align*} q_z'(a\lambda-)=a\lambda-z<0. \end{align*} Because $q_z'$ increases on each earlier region and also $q_z'(\lambda-)=2\lambda-z<0$, the objective is decreasing on all of $(0,a\lambda)$. 
For $a\lambda<u<z$, the derivative $u-z$ remains negative, and for $u>z$ it is positive. Hence $q_z$ decreases until $u=z$ and increases after $u=z$, so $u=z$ is a global minimizer. [/guided] [/step] [step:Restore the sign and read off the SCAD thresholding rule] For $z\geq 0$, the preceding steps show that $q_z$ has a unique minimizer on $[0,\infty)$, described by the following four alternatives. If $0 \leq z \leq \lambda$, then \begin{align*} \hat u=0. \end{align*} If $\lambda<z\leq 2\lambda$, then \begin{align*} \hat u=z-\lambda. \end{align*} If $2\lambda<z\leq a\lambda$, then \begin{align*} \hat u=\frac{(a-1)z-a\lambda}{a-2}. \end{align*} If $z>a\lambda$, then \begin{align*} \hat u=z. \end{align*} This is precisely $T_{\mathrm{SCAD}}(z)$. It is also the unique global minimizer of $Q_z$ on $\mathbb{R}$: if $z>0$, the first step gives $Q_z(-u)>Q_z(u)\geq Q_z(\hat u)$ for every $u>0$, and if $z=0$, then $Q_0$ is even and its only minimizer on $[0,\infty)$ is $\hat u=0$. For $z<0$, write $r=|z|$. Since \begin{align*} Q_z(u)=Q_r(-u) \end{align*} for every $u\in\mathbb{R}$, the minimizers of $Q_z$ are precisely the negatives of the minimizers of $Q_r$. Since $Q_r$ has the unique minimizer $T_{\mathrm{SCAD}}(r)$, the unique global minimizer of $Q_z$ is \begin{align*} \hat u=-T_{\mathrm{SCAD}}(r)=\operatorname{sgn}(z)\,T_{\mathrm{SCAD}}(|z|). \end{align*} This is the asserted SCAD thresholding rule. [guided] The previous steps solved the problem for $z\geq0$. In that case the global minimizer is unique and is exactly the four-piece threshold value: $0$ for $0\leq z\leq\lambda$, $z-\lambda$ for $\lambda<z\leq2\lambda$, $((a-1)z-a\lambda)/(a-2)$ for $2\lambda<z\leq a\lambda$, and $z$ for $z>a\lambda$. Thus the nonnegative case gives $T_{\mathrm{SCAD}}(z)$. Now suppose $z<0$ and define $r=|z|$. For every $u\in\mathbb{R}$, direct substitution and $(-u-r)^2=(u+r)^2=(u-z)^2$ give \begin{align*} Q_z(u)=Q_r(-u). \end{align*} Therefore minimizing $Q_z$ over $u$ is the same as minimizing $Q_r$ over $-u$. Since the nonnegative case gives the unique global minimizer $T_{\mathrm{SCAD}}(r)$ for $Q_r$, every minimizer for $Q_z$ is $-T_{\mathrm{SCAD}}(r)$. Because $z<0$, this equals $\operatorname{sgn}(z)T_{\mathrm{SCAD}}(|z|)$. 
Combining this with the case $z\geq0$ proves that every global minimizer satisfies \begin{align*} \hat u=\operatorname{sgn}(z)\,T_{\mathrm{SCAD}}(|z|) \end{align*} for every $z\in\mathbb{R}$. [/guided] [/step]
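The complete rule can be cross-checked numerically. The Python sketch below is a hedged illustration rather than part of the proof: the sample values $\lambda=1$ and $a=3.7$ are assumptions, and the function names `scad_penalty`, `t_scad`, `scad_threshold`, and `brute_force_min`, the grid size, and the search window are hypothetical choices. It implements $\operatorname{sgn}(z)\,T_{\mathrm{SCAD}}(|z|)$ from the four alternatives in the last step and compares it with a brute-force grid minimization of $Q_z$ over $u\in\mathbb{R}$.

```python
import math

# Sketch only. LAM and A are assumed sample values (any lam > 0, a > 2 works).
LAM, A = 1.0, 3.7


def scad_penalty(t, lam=LAM, a=A):
    """SCAD penalty p_{lam,a}(t) for t >= 0, using the three-piece formula."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (-t * t + 2 * a * lam * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def t_scad(t, lam=LAM, a=A):
    """Four-piece threshold T_SCAD(t) for t >= 0."""
    if t <= lam:
        return 0.0
    if t <= 2 * lam:
        return t - lam
    if t <= a * lam:
        return ((a - 1) * t - a * lam) / (a - 2)
    return t


def scad_threshold(z, lam=LAM, a=A):
    """Closed-form minimizer sgn(z) * T_SCAD(|z|)."""
    return math.copysign(t_scad(abs(z), lam, a), z)


def brute_force_min(z, lam=LAM, a=A, n=40_000):
    """Grid minimizer of Q_z(u) = (u - z)**2 / 2 + p_{lam,a}(|u|).

    Any u with |u| > 2|z| has |u - z| > |z|, hence Q_z(u) > Q_z(0), so the
    window [-r, r] with r = 2|z| + 1 contains every global minimizer.
    """
    r = 2 * abs(z) + 1.0
    best_u, best_q = 0.0, math.inf
    for k in range(-n, n + 1):
        u = r * k / n
        q = 0.5 * (u - z) ** 2 + scad_penalty(abs(u), lam, a)
        if q < best_q:
            best_u, best_q = u, q
    return best_u


for z in (-4.0, -1.5, 0.5, 1.5, 3.0, 5.0):
    print(f"z={z:+.1f}  closed form={scad_threshold(z):+.4f}  grid={brute_force_min(z):+.4f}")
```

A grid search only confirms the formula up to the grid spacing and for the sampled values of $z$; the argument above is what establishes it for every $z$. The window $[-(2|z|+1),\,2|z|+1]$ is chosen so the search cannot miss a minimizer, since $|u|>2|z|$ forces $Q_z(u)>Q_z(0)$.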

SCAD Thresholding Rule (Theorem # 5578)
