Subdifferential at Points of Differentiability — Statement & Proof

Subdifferential at Points of Differentiability (Theorem # 1988)

Theorem

Let $f$ be a convex function on $\mathbb{R}^d$ that is differentiable at $x \in \mathbb{R}^d$. Then $\partial f(x) = \{\nabla f(x)\}$.

Proof

[proofplan] We prove both inclusions. The first-order condition for convex functions ([Properties of Convex Functions](/theorems/1976), part (v)) immediately gives $\nabla f(x) \in \partial f(x)$. For the reverse inclusion, we take an arbitrary $g \in \partial f(x)$, apply the subgradient inequality along every direction $z \in \mathbb{R}^d$, pass to the limit using differentiability to obtain $g^\top z \leq \nabla f(x)^\top z$ for all $z$, and then choose $z = g - \nabla f(x)$ to force $g = \nabla f(x)$. [/proofplan]

[step:Show that $\nabla f(x) \in \partial f(x)$ via the first-order condition] Since $f$ is convex and differentiable at $x \in \mathbb{R}^d$, the first-order condition ([Properties of Convex Functions](/theorems/1976), part (v)) states that for all $y \in \mathbb{R}^d$:
\begin{align*} f(y) \geq f(x) + \nabla f(x)^\top(y - x). \end{align*}
This is precisely the subgradient inequality for $\nabla f(x)$, so $\nabla f(x) \in \partial f(x)$. [/step]

[step:Show that every $g \in \partial f(x)$ satisfies $g^\top z \leq \nabla f(x)^\top z$ for all $z \in \mathbb{R}^d$] Suppose $g \in \partial f(x)$. By the definition of the subdifferential, for all $y \in \mathbb{R}^d$:
\begin{align*} f(y) \geq f(x) + g^\top(y - x). \end{align*}
Fix an arbitrary direction $z \in \mathbb{R}^d$ and set $y = x + tz$ for $t > 0$. The subgradient inequality becomes
\begin{align*} f(x + tz) \geq f(x) + tg^\top z. \end{align*}
Rearranging and dividing by $t > 0$:
\begin{align*} g^\top z \leq \frac{f(x + tz) - f(x)}{t}. \end{align*}
Since $f$ is differentiable at $x$, the right-hand side converges to the directional derivative $\nabla f(x)^\top z$ as $t \downarrow 0$. Weak inequalities are preserved in the limit, so
\begin{align*} g^\top z \leq \nabla f(x)^\top z. \end{align*}
Since $z \in \mathbb{R}^d$ was arbitrary, this holds for every direction.

[guided] The key idea is to extract information about $g$ from the subgradient inequality by probing along rays $x + tz$ and sending $t \downarrow 0$. For each $t > 0$, the subgradient inequality bounds the difference quotient $(f(x + tz) - f(x))/t$ from below by $g^\top z$, and differentiability at $x$ identifies the limit of these difference quotients as $\nabla f(x)^\top z$. Because $g^\top z$ is bounded above by the difference quotient for every $t > 0$, it is also bounded above by the limit, yielding $g^\top z \leq \nabla f(x)^\top z$.

Why is differentiability essential here? Without differentiability, the directional derivative $\lim_{t \downarrow 0}(f(x + tz) - f(x))/t$ may still exist (convex functions always have directional derivatives), but it need not be a linear function of $z$. Differentiability forces the directional derivative to equal $\nabla f(x)^\top z$, which is linear in $z$, and this linearity is what allows the final step below to pin down $g$ uniquely. [/guided] [/step]

[step:Choose $z = g - \nabla f(x)$ to conclude $g = \nabla f(x)$] Taking $z = g - \nabla f(x)$ in the inequality $g^\top z \leq \nabla f(x)^\top z$:
\begin{align*} g^\top(g - \nabla f(x)) \leq \nabla f(x)^\top(g - \nabla f(x)). \end{align*}
Subtracting the right-hand side from both sides:
\begin{align*} (g - \nabla f(x))^\top(g - \nabla f(x)) \leq 0, \end{align*}
i.e., $\|g - \nabla f(x)\|_2^2 \leq 0$. Since the squared norm is non-negative, we conclude $\|g - \nabla f(x)\|_2^2 = 0$ and hence $g = \nabla f(x)$. Thus every element of $\partial f(x)$ equals $\nabla f(x)$. Combined with the first step, which shows $\nabla f(x) \in \partial f(x)$, this gives $\partial f(x) = \{\nabla f(x)\}$. [/step]
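
The argument above can be sanity-checked numerically. The following is a minimal sketch, not part of the proof: it assumes log-sum-exp as an illustrative convex, everywhere-differentiable $f$, and the helper names (`satisfies_inequality`, `probe`), the step size `t`, and the tolerance `tol` are hypothetical choices made for this example. It checks that $\nabla f(x)$ passes the subgradient inequality at random points $y$, and that a perturbed candidate $g \neq \nabla f(x)$ fails it at the single point $y = x + tz$ with $z = g - \nabla f(x)$, the direction chosen in the last step.

```python
import math
import random


def f(x):
    """Log-sum-exp: an assumed example of a convex, differentiable function on R^d."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def grad_f(x):
    """Gradient of log-sum-exp, which is the softmax of x."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def satisfies_inequality(g, x, y, tol=1e-12):
    """Check f(y) >= f(x) + g^T (y - x), up to a floating-point tolerance."""
    diff = [yi - xi for xi, yi in zip(x, y)]
    return f(y) >= f(x) + dot(g, diff) - tol


def probe(g, x, t=1e-3):
    """Test g at y = x + t*z with z = g - grad_f(x), the direction used in the proof."""
    z = [gi - di for gi, di in zip(g, grad_f(x))]
    y = [xi + t * zi for xi, zi in zip(x, z)]
    return satisfies_inequality(g, x, y)


random.seed(0)
d = 4
x = [random.gauss(0, 1) for _ in range(d)]
grad = grad_f(x)

# The gradient satisfies the subgradient inequality at many random points y.
gradient_ok = all(
    satisfies_inequality(grad, x, [xi + random.gauss(0, 1) for xi in x])
    for _ in range(1000)
)

# A perturbed candidate g != grad is rejected by one well-chosen test point.
g_bad = [gi + 0.1 for gi in grad]
perturbed_ok = probe(g_bad, x)

print(gradient_ok, perturbed_ok)
```

A finite sample of points $y$ cannot establish the inequality for all $y$, so the first check is only evidence, whereas a single violation is enough to rule out a candidate $g$. This asymmetry mirrors the proof: membership of $\nabla f(x)$ comes from the first-order condition, and uniqueness comes from one specific choice of $z$.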
