Calibration via Differentiability at Zero — Statement & Proof

Calibration via Differentiability at Zero (Theorem # 1978)

Theorem

If $\varphi$ is convex, differentiable at $0$, and $\varphi'(0) < 0$, then $\varphi$ is classification calibrated.

Proof

[proofplan] We verify the classification calibration condition: for each $\eta \neq 1/2$, we must find $\alpha^*$ with $\operatorname{sgn}(\alpha^*) = \operatorname{sgn}(2\eta - 1)$ such that $C_\eta(\alpha^*) < \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha)$, where $C_\eta(\alpha) = \eta\varphi(\alpha) + (1-\eta)\varphi(-\alpha)$. The strategy is to show that the first-order condition for the convex function $C_\eta$ forces $C_\eta(\alpha) \geq C_\eta(0)$ on the wrong-sign half-line, while the derivative $C_\eta'(0) = (2\eta-1)\varphi'(0)$ has a sign that guarantees a strict improvement over $C_\eta(0)$ in the correct-sign direction. We treat $\eta > 1/2$ directly and reduce $\eta < 1/2$ to it by symmetry. [/proofplan]

[step:Compute the derivative $C_\eta'(0)$ and establish its sign] Define the conditional $\varphi$-risk:
\begin{align*} C_\eta : \mathbb{R} &\to [0, \infty) \\ \alpha &\mapsto \eta\varphi(\alpha) + (1-\eta)\varphi(-\alpha). \end{align*}
Since $\varphi$ is convex, $C_\eta$ is a non-negative linear combination of the convex functions $\alpha \mapsto \varphi(\alpha)$ and $\alpha \mapsto \varphi(-\alpha)$ (the latter is convex because composing a convex function with the linear map $\alpha \mapsto -\alpha$ preserves convexity, by [Properties of Convex Functions](/theorems/1976), part (ii)). Hence $C_\eta$ is convex.

Since $\varphi$ is differentiable at $0$, so is $C_\eta$. By the chain rule:
\begin{align*} C_\eta'(0) = \eta\varphi'(0) + (1-\eta)\cdot(-1)\cdot\varphi'(0) = (2\eta - 1)\varphi'(0). \end{align*}
Since $\varphi'(0) < 0$ by hypothesis, the sign of $C_\eta'(0)$ is opposite to the sign of $2\eta - 1$:
- If $\eta > 1/2$, then $2\eta - 1 > 0$ and $C_\eta'(0) < 0$.
- If $\eta < 1/2$, then $2\eta - 1 < 0$ and $C_\eta'(0) > 0$.

[guided] The conditional $\varphi$-risk $C_\eta(\alpha)$ measures the expected surrogate loss when predicting margin $\alpha$, given that the true label is $+1$ with probability $\eta$. 
Classification calibration requires that some margin of the "correct" sign (positive when $\eta > 1/2$, negative when $\eta < 1/2$) achieves a strictly smaller value of $C_\eta$ than every margin of the wrong sign. The derivative at zero tells us the direction in which $C_\eta$ initially decreases. If $C_\eta'(0) < 0$ (as when $\eta > 1/2$), then $C_\eta$ decreases as $\alpha$ increases from $0$, so there exist positive values of $\alpha$ with $C_\eta(\alpha) < C_\eta(0)$. This is the correct-sign direction for $\eta > 1/2$.

Why is the composition $\alpha \mapsto \varphi(-\alpha)$ convex? By part (ii) of [Properties of Convex Functions](/theorems/1976), the composition of a convex function with an affine map is convex; here the affine map is given by $A = -1 \in \mathbb{R}^{1 \times 1}$ and $b = 0$. [/guided] [/step]

[step:Treat the case $\eta > 1/2$: use the first-order condition to bound the infimum over the wrong-sign half-line] Assume $\eta > 1/2$, so $2\eta - 1 > 0$ and $C_\eta'(0) < 0$. We first show that $\inf_{\alpha \leq 0} C_\eta(\alpha) = C_\eta(0)$. Since $C_\eta$ is convex and differentiable at $0$, the first-order condition ([Properties of Convex Functions](/theorems/1976), part (v)) gives
\begin{align*} C_\eta(\alpha) \geq C_\eta(0) + C_\eta'(0) \cdot \alpha \quad \text{for all } \alpha \in \mathbb{R}. \end{align*}
For $\alpha \leq 0$, both factors of the product $C_\eta'(0) \cdot \alpha$ are non-positive, so $C_\eta'(0) \cdot \alpha \geq 0$. Therefore $C_\eta(\alpha) \geq C_\eta(0)$ for all $\alpha \leq 0$, and since the value $C_\eta(0)$ is attained at $\alpha = 0$:
\begin{align*} \inf_{\alpha \leq 0} C_\eta(\alpha) = C_\eta(0). \end{align*}
Note that this is the infimum over the "wrong-sign" half-line: when $\eta > 1/2$, the Bayes-optimal prediction should be positive, so $\alpha \leq 0$ corresponds to the wrong sign, i.e., $\alpha(2\eta - 1) \leq 0$.

[guided] The first-order condition for convex functions says that the tangent line at any point lies below the graph. At $\alpha = 0$, the tangent line is $\alpha \mapsto C_\eta(0) + C_\eta'(0)\alpha$. 
Since $C_\eta'(0) < 0$, this tangent line is decreasing, so for $\alpha \leq 0$ the tangent value $C_\eta(0) + C_\eta'(0)\alpha$ is at least $C_\eta(0)$ (the product of two non-positive numbers is non-negative). Since $C_\eta(\alpha)$ lies above the tangent line, we get $C_\eta(\alpha) \geq C_\eta(0)$ on the entire non-positive half-line. This is the key geometric fact: a convex function with negative derivative at $0$ cannot dip below its value at $0$ anywhere on $\alpha \leq 0$. [/guided] [/step]

[step:Show that $C_\eta$ achieves a strictly lower value for some $\alpha^* > 0$] Since $C_\eta'(0) < 0$, by the definition of the derivative there exists a sufficiently small $\alpha^* > 0$ such that
\begin{align*} \frac{C_\eta(\alpha^*) - C_\eta(0)}{\alpha^*} < 0, \end{align*}
which gives $C_\eta(\alpha^*) < C_\eta(0)$. (Concretely, for any $\varepsilon$ with $0 < \varepsilon < |C_\eta'(0)|$, there exists $\delta > 0$ such that $(C_\eta(\alpha) - C_\eta(0))/\alpha < C_\eta'(0) + \varepsilon < 0$ for all $\alpha \in (0, \delta)$, so any $\alpha^* \in (0, \delta)$ works.) Combining with the previous step:
\begin{align*} \inf_{\alpha \in \mathbb{R}} C_\eta(\alpha) \leq C_\eta(\alpha^*) < C_\eta(0) = \inf_{\alpha \leq 0} C_\eta(\alpha) = \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha). \end{align*}
Since $\alpha^* > 0$ and $2\eta - 1 > 0$, the point $\alpha^*$ has the correct sign. This verifies the classification calibration condition for $\eta > 1/2$. [/step]

[step:Treat the case $\eta < 1/2$ by symmetry] For $\eta < 1/2$, we have $2\eta - 1 < 0$ and $C_\eta'(0) = (2\eta-1)\varphi'(0) > 0$ (product of two negatives). Observe the symmetry $C_{1-\eta}(\alpha) = (1-\eta)\varphi(\alpha) + \eta\varphi(-\alpha) = C_\eta(-\alpha)$ for all $\alpha \in \mathbb{R}$. 
Set $\tilde{\eta} := 1 - \eta > 1/2$. The case already treated, applied to $\tilde{\eta}$, gives the existence of $\tilde{\alpha}^* > 0$ with
\begin{align*} C_{\tilde{\eta}}(\tilde{\alpha}^*) < \inf_{\alpha : \alpha(2\tilde{\eta}-1) \leq 0} C_{\tilde{\eta}}(\alpha). \end{align*}
Using the symmetry $C_{\tilde{\eta}}(\alpha) = C_\eta(-\alpha)$, set $\alpha^* := -\tilde{\alpha}^* < 0$. Then $C_\eta(\alpha^*) = C_\eta(-\tilde{\alpha}^*) = C_{\tilde{\eta}}(\tilde{\alpha}^*)$. For the infimum on the wrong-sign half-line: $\alpha(2\eta - 1) \leq 0$ means $\alpha \geq 0$ (since $2\eta - 1 < 0$), and
\begin{align*} \inf_{\alpha \geq 0} C_\eta(\alpha) = \inf_{\alpha \geq 0} C_{\tilde{\eta}}(-\alpha) = \inf_{\beta \leq 0} C_{\tilde{\eta}}(\beta) = \inf_{\beta : \beta(2\tilde{\eta}-1) \leq 0} C_{\tilde{\eta}}(\beta). \end{align*}
Therefore
\begin{align*} C_\eta(\alpha^*) = C_{\tilde{\eta}}(\tilde{\alpha}^*) < \inf_{\beta : \beta(2\tilde{\eta}-1) \leq 0} C_{\tilde{\eta}}(\beta) = \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha). \end{align*}
Since $\alpha^* < 0$ and $2\eta - 1 < 0$, we have $\alpha^*(2\eta - 1) > 0$, so $\alpha^*$ has the correct sign. This verifies classification calibration for $\eta < 1/2$.

[guided] The symmetry reduces the $\eta < 1/2$ case to the $\eta > 1/2$ case. The conditional risk $C_\eta(\alpha)$ at label probability $\eta$ and margin $\alpha$ equals $C_{1-\eta}(-\alpha)$: flipping the label probability is equivalent to negating the margin. If $\eta < 1/2$, then $1-\eta > 1/2$, and we can apply the result already proved to $\tilde{\eta} = 1-\eta$. The improving margin $\tilde{\alpha}^* > 0$ for $C_{\tilde{\eta}}$ translates to $\alpha^* = -\tilde{\alpha}^* < 0$ for $C_\eta$, which is the correct (negative) sign when $\eta < 1/2$.

Why is the case $\eta = 1/2$ excluded? 
When $\eta = 1/2$, the Bayes-optimal classifier can predict either label (both achieve the same misclassification risk $1/2$), so the classification calibration condition imposes no constraint at $\eta = 1/2$. [/guided] [/step]

[step:Conclude that $\varphi$ is classification calibrated] For every $\eta \in (0,1)$ with $\eta \neq 1/2$, we have shown that
\begin{align*} \inf_{\alpha \in \mathbb{R}} C_\eta(\alpha) < \inf_{\alpha : \alpha(2\eta-1) \leq 0} C_\eta(\alpha). \end{align*}
This is precisely the definition of classification calibration: the infimum of the conditional $\varphi$-risk over all margins is strictly smaller than its infimum over the wrong-sign half-line. Therefore $\varphi$ is classification calibrated whenever $\varphi$ is convex, differentiable at $0$, and $\varphi'(0) < 0$. [/step]
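
The strict inequality established in the proof can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the three losses (hinge, logistic, exponential) are illustrative choices that are convex and differentiable at $0$ with $\varphi'(0) < 0$, and the names `conditional_risk`, `calibration_gap`, and `LOSSES`, as well as the grid truncated to $[0, 5]$, are assumptions made for this example. The wrong-sign infimum is attained at $\alpha = 0$ (as shown above), so the grid value on that side is exact, while the correct-sign grid minimum can only overestimate the true infimum; a positive computed gap is therefore consistent with the theorem.

```python
import math

def conditional_risk(phi, eta, alpha):
    """C_eta(alpha) = eta * phi(alpha) + (1 - eta) * phi(-alpha)."""
    return eta * phi(alpha) + (1.0 - eta) * phi(-alpha)

# Illustrative convex surrogates, each differentiable at 0 with phi'(0) < 0.
LOSSES = {
    "hinge": lambda a: max(0.0, 1.0 - a),            # phi'(0) = -1
    "logistic": lambda a: math.log1p(math.exp(-a)),  # phi'(0) = -1/2
    "exponential": lambda a: math.exp(-a),           # phi'(0) = -1
}

def calibration_gap(phi, eta, grid_max=5.0, n=2001):
    """Grid estimate of (inf over wrong-sign half-line) - (inf over correct-sign margins).

    Assumes eta != 1/2. The grid covers [0, grid_max] and is reflected
    according to the sign of 2*eta - 1.
    """
    sign = 1.0 if eta > 0.5 else -1.0
    grid = [grid_max * k / (n - 1) for k in range(n)]
    # Wrong-sign half-line: alpha * (2*eta - 1) <= 0, including alpha = 0.
    wrong = min(conditional_risk(phi, eta, -sign * a) for a in grid)
    # Correct-sign margins: alpha * (2*eta - 1) > 0, excluding alpha = 0.
    right = min(conditional_risk(phi, eta, sign * a) for a in grid[1:])
    return wrong - right

if __name__ == "__main__":
    for name, phi in LOSSES.items():
        for eta in (0.1, 0.3, 0.7, 0.9):
            print(f"{name:12s} eta={eta:.1f} gap={calibration_gap(phi, eta):.4f}")
```

Every printed gap should be strictly positive. Values of $\eta$ on both sides of $1/2$ are included so that the symmetry $C_{1-\eta}(\alpha) = C_\eta(-\alpha)$ used in the last case of the proof is exercised as well.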
