[proofplan]
We verify the classification calibration condition: for each $\eta \neq 1/2$, we must find $\alpha$ with $\operatorname{sgn}(\alpha) = \operatorname{sgn}(2\eta - 1)$ such that $C_\eta(\alpha) < \inf_{\beta : \beta(2\eta-1) \leq 0} C_\eta(\beta)$, where $C_\eta(\alpha) = \eta\varphi(\alpha) + (1-\eta)\varphi(-\alpha)$. The strategy is to show that the first-order condition for the convex function $C_\eta$ at $0$ forces $C_\eta(\alpha) \geq C_\eta(0)$ on the wrong-sign half-line, while the sign of the derivative $C_\eta'(0) = (2\eta-1)\varphi'(0)$ guarantees a strict improvement over $C_\eta(0)$ in the correct-sign direction. We treat $\eta > 1/2$ directly and reduce $\eta < 1/2$ to it by symmetry.
[/proofplan]
[step:Compute the derivative $C_\eta'(0)$ and establish its sign]
Define the conditional $\varphi$-risk:
\begin{align*}
C_\eta : \mathbb{R} &\to [0, \infty) \\
\alpha &\mapsto \eta\varphi(\alpha) + (1-\eta)\varphi(-\alpha).
\end{align*}
Since $\varphi$ is convex, $C_\eta$ is a non-negative linear combination of the convex functions $\alpha \mapsto \varphi(\alpha)$ and $\alpha \mapsto \varphi(-\alpha)$ (the latter is convex because composing a convex function with the linear map $\alpha \mapsto -\alpha$ preserves convexity, by [Properties of Convex Functions](/theorems/1976), part (ii)). Hence $C_\eta$ is convex.
Since $\varphi$ is differentiable at $0$, so is $C_\eta$. By the chain rule:
\begin{align*}
C_\eta'(0) = \eta\varphi'(0) + (1-\eta)\cdot(-1)\cdot\varphi'(0) = (2\eta - 1)\varphi'(0).
\end{align*}
Since $\varphi'(0) < 0$ by hypothesis, the sign of $C_\eta'(0)$ is opposite to the sign of $2\eta - 1$:
- If $\eta > 1/2$, then $2\eta - 1 > 0$ and $C_\eta'(0) < 0$.
- If $\eta < 1/2$, then $2\eta - 1 < 0$ and $C_\eta'(0) > 0$.
[guided]
The conditional $\varphi$-risk $C_\eta(\alpha)$ measures the expected surrogate loss when predicting margin $\alpha$, given that the true label is $+1$ with probability $\eta$. Classification calibration requires that the minimiser of $C_\eta$ has the "correct" sign -- positive when $\eta > 1/2$, negative when $\eta < 1/2$.
The derivative at zero tells us the direction in which $C_\eta$ initially decreases. If $C_\eta'(0) < 0$ (as when $\eta > 1/2$), then $C_\eta(\alpha) < C_\eta(0)$ for all sufficiently small $\alpha > 0$ -- so there exist positive $\alpha$ values that improve on $\alpha = 0$. This is the correct-sign direction for $\eta > 1/2$.
Why is the composition $\alpha \mapsto \varphi(-\alpha)$ convex? By part (ii) of [Properties of Convex Functions](/theorems/1976) with $A = -1 \in \mathbb{R}^{1 \times 1}$ and $b = 0$, the composition of a convex function with an affine map is convex.
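As a concrete check, take the exponential loss $\varphi(\alpha) = e^{-\alpha}$. This loss is our choice of illustration, not part of the hypotheses; it is convex and differentiable with $\varphi'(0) = -1 < 0$. Differentiating $C_\eta$ directly gives
\begin{align*}
C_\eta(\alpha) &= \eta e^{-\alpha} + (1-\eta) e^{\alpha}, \\
C_\eta'(\alpha) &= -\eta e^{-\alpha} + (1-\eta) e^{\alpha}, \\
C_\eta'(0) &= 1 - 2\eta = (2\eta - 1)\varphi'(0).
\end{align*}
For $\eta = 3/4$ this gives $C_\eta'(0) = -1/2 < 0$, and for $\eta = 1/4$ it gives $C_\eta'(0) = 1/2 > 0$, matching the two sign cases.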
[/guided]
[/step]
[step:Treat the case $\eta > 1/2$: use the first-order condition to bound the infimum over the wrong-sign half-line]
Assume $\eta > 1/2$, so $2\eta - 1 > 0$ and $C_\eta'(0) < 0$.
We first show that $\inf_{\alpha \leq 0} C_\eta(\alpha) = C_\eta(0)$. Since $C_\eta$ is convex and differentiable at $0$, the first-order condition ([Properties of Convex Functions](/theorems/1976), part (v)) gives
\begin{align*}
C_\eta(\alpha) \geq C_\eta(0) + C_\eta'(0) \cdot \alpha \quad \text{for all } \alpha \in \mathbb{R}.
\end{align*}
For $\alpha \leq 0$: since $C_\eta'(0) < 0$ and $\alpha \leq 0$, the product $C_\eta'(0) \cdot \alpha \geq 0$. Therefore $C_\eta(\alpha) \geq C_\eta(0)$ for all $\alpha \leq 0$, and since $C_\eta(0)$ is attained at $\alpha = 0$:
\begin{align*}
\inf_{\alpha \leq 0} C_\eta(\alpha) = C_\eta(0).
\end{align*}
Note that this is the infimum over the "wrong-sign" half-line: when $\eta > 1/2$, the Bayes-optimal prediction should be positive, so $\alpha \leq 0$ corresponds to the wrong sign, i.e., $\alpha(2\eta - 1) \leq 0$.
[guided]
The first-order condition for convex functions says that the tangent line at any point lies below the graph. At $\alpha = 0$, the tangent line is $\alpha \mapsto C_\eta(0) + C_\eta'(0)\alpha$. Since $C_\eta'(0) < 0$, this tangent line is decreasing -- so for $\alpha \leq 0$, the tangent value $C_\eta(0) + C_\eta'(0)\alpha$ is at least $C_\eta(0)$ (the product of two non-positive numbers is non-negative). Since $C_\eta(\alpha)$ lies above the tangent line, we get $C_\eta(\alpha) \geq C_\eta(0)$ on the entire non-positive half-line.
This is the key geometric fact: a convex function with negative derivative at $0$ cannot dip below its value at $0$ anywhere on the half-line $\alpha \leq 0$.
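To see the bound in a specific case, suppose (purely as an illustration, not as part of the hypotheses) that $\varphi(\alpha) = e^{-\alpha}$ and $\eta = 3/4$. Then $C_\eta(0) = 1$ and $C_\eta'(0) = -1/2$, so the tangent line at $0$ is $\alpha \mapsto 1 - \alpha/2$. At the wrong-sign point $\alpha = -1$:
\begin{align*}
C_\eta(-1) = \tfrac{3}{4} e + \tfrac{1}{4} e^{-1} \approx 2.131 \;\geq\; 1 - \tfrac{1}{2}(-1) = 1.5 \;\geq\; 1 = C_\eta(0).
\end{align*}
The same chain of inequalities (graph above tangent, tangent above $C_\eta(0)$) holds at every $\alpha \leq 0$.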
[/guided]
[/step]
[step:Show that $C_\eta$ achieves a strictly lower value for some $\alpha^* > 0$]
Since $C_\eta'(0) < 0$, by the definition of the derivative there exists $\alpha^* > 0$ sufficiently small such that
\begin{align*}
\frac{C_\eta(\alpha^*) - C_\eta(0)}{\alpha^*} < 0,
\end{align*}
which gives $C_\eta(\alpha^*) < C_\eta(0)$. (Concretely, for any $\varepsilon$ with $0 < \varepsilon < |C_\eta'(0)|$, there exists $\delta > 0$ such that $(C_\eta(\alpha) - C_\eta(0))/\alpha < C_\eta'(0) + \varepsilon < 0$ for all $\alpha \in (0, \delta)$, so any $\alpha^* \in (0, \delta)$ works.)
Combining with the previous step:
\begin{align*}
\inf_{\alpha \in \mathbb{R}} C_\eta(\alpha) \leq C_\eta(\alpha^*) < C_\eta(0) = \inf_{\alpha \leq 0} C_\eta(\alpha) = \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha).
\end{align*}
Since $\alpha^* > 0$ and $2\eta - 1 > 0$, the point $\alpha^*$ has the correct sign. This verifies the classification calibration condition for $\eta > 1/2$.
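[guided]
The argument only needs some $\alpha^* > 0$ with $C_\eta(\alpha^*) < C_\eta(0)$; it does not need a minimiser of $C_\eta$. As an illustration under the assumed choices $\varphi(\alpha) = e^{-\alpha}$ and $\eta = 3/4$ (neither is part of the hypotheses), we have $C_\eta(0) = 1$ and $C_\eta'(0) = -1/2$, and the point $\alpha^* = 1/2$ already works:
\begin{align*}
C_\eta\!\left(\tfrac{1}{2}\right) = \tfrac{3}{4} e^{-1/2} + \tfrac{1}{4} e^{1/2} \approx 0.867 < 1 = C_\eta(0) = \inf_{\alpha \leq 0} C_\eta(\alpha).
\end{align*}
For comparison, the actual minimiser in this example is $\tfrac{1}{2}\ln 3 \approx 0.549$, with value $\sqrt{3}/2 \approx 0.866$.
[/guided]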
[/step]
[step:Treat the case $\eta < 1/2$ by symmetry]
For $\eta < 1/2$, we have $2\eta - 1 < 0$ and $C_\eta'(0) = (2\eta-1)\varphi'(0) > 0$ (product of two negatives).
Observe the symmetry $C_{1-\eta}(\alpha) = (1-\eta)\varphi(\alpha) + \eta\varphi(-\alpha) = C_\eta(-\alpha)$ for all $\alpha \in \mathbb{R}$. Setting $\tilde{\eta} := 1 - \eta > 1/2$, the case $\tilde{\eta} > 1/2$ already treated gives the existence of $\tilde{\alpha}^* > 0$ with
\begin{align*}
C_{\tilde{\eta}}(\tilde{\alpha}^*) < \inf_{\alpha : \alpha(2\tilde{\eta}-1) \leq 0} C_{\tilde{\eta}}(\alpha).
\end{align*}
Using the symmetry $C_{\tilde{\eta}}(\alpha) = C_\eta(-\alpha)$, set $\alpha^* := -\tilde{\alpha}^* < 0$. Then $C_\eta(\alpha^*) = C_\eta(-\tilde{\alpha}^*) = C_{\tilde{\eta}}(\tilde{\alpha}^*)$. For the infimum on the wrong-sign half-line: $\alpha(2\eta - 1) \leq 0$ means $\alpha \geq 0$ (since $2\eta - 1 < 0$), and
\begin{align*}
\inf_{\alpha \geq 0} C_\eta(\alpha) = \inf_{\alpha \geq 0} C_{\tilde{\eta}}(-\alpha) = \inf_{\beta \leq 0} C_{\tilde{\eta}}(\beta) = \inf_{\beta : \beta(2\tilde{\eta}-1) \leq 0} C_{\tilde{\eta}}(\beta).
\end{align*}
Therefore
\begin{align*}
C_\eta(\alpha^*) = C_{\tilde{\eta}}(\tilde{\alpha}^*) < \inf_{\beta : \beta(2\tilde{\eta}-1) \leq 0} C_{\tilde{\eta}}(\beta) = \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha).
\end{align*}
Since $\alpha^* < 0$ and $2\eta - 1 < 0$, we have $\alpha^*(2\eta - 1) > 0$, so $\alpha^*$ has the correct sign. This verifies classification calibration for $\eta < 1/2$.
[guided]
The symmetry reduces the $\eta < 1/2$ case to the $\eta > 1/2$ case. The conditional risk $C_\eta(\alpha)$ at label probability $\eta$ and margin $\alpha$ equals $C_{1-\eta}(-\alpha)$ -- flipping the label probability is equivalent to negating the margin. If $\eta < 1/2$, then $\tilde{\eta} = 1-\eta > 1/2$, and we can apply the result already proved for that case. The margin $\tilde{\alpha}^* > 0$ found for $C_{\tilde{\eta}}$ (which need not be a minimiser, only strictly better than every wrong-sign margin) translates to $\alpha^* = -\tilde{\alpha}^* < 0$ for $C_\eta$, which is the correct (negative) sign when $\eta < 1/2$.
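For instance, with the illustrative (assumed) loss $\varphi(\alpha) = e^{-\alpha}$ and $\eta = 1/4$, so that $\tilde{\eta} = 3/4$, the symmetry reads
\begin{align*}
C_{1/4}(\alpha) = \tfrac{1}{4} e^{-\alpha} + \tfrac{3}{4} e^{\alpha} = C_{3/4}(-\alpha).
\end{align*}
The margin $\tilde{\alpha}^* = 1/2$ satisfies $C_{3/4}(1/2) \approx 0.867 < 1 = C_{3/4}(0) = \inf_{\beta \leq 0} C_{3/4}(\beta)$, so $\alpha^* = -1/2$ satisfies
\begin{align*}
C_{1/4}\!\left(-\tfrac{1}{2}\right) = C_{3/4}\!\left(\tfrac{1}{2}\right) \approx 0.867 < 1 = C_{1/4}(0) = \inf_{\alpha \geq 0} C_{1/4}(\alpha).
\end{align*}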
Why is the case $\eta = 1/2$ excluded? When $\eta = 1/2$, the Bayes-optimal classifier can predict either label (both achieve the same misclassification risk $1/2$), so the classification calibration condition imposes no constraint at $\eta = 1/2$.
[/guided]
[/step]
[step:Conclude that $\varphi$ is classification calibrated]
For every $\eta \in (0,1)$ with $\eta \neq 1/2$, we have shown that
\begin{align*}
\inf_{\alpha \in \mathbb{R}} C_\eta(\alpha) < \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha).
\end{align*}
This is precisely the definition of classification calibration: restricting $\alpha$ to the wrong-sign half-line strictly increases the infimum of the conditional $\varphi$-risk. Therefore $\varphi$ is classification calibrated whenever $\varphi$ is convex, differentiable at $0$, and $\varphi'(0) < 0$.
[/step]