Non-Emptiness of the Subdifferential — Statement & Proof

Non-Emptiness of the Subdifferential (Theorem # 1987)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We construct a subgradient at $x$ using a supporting hyperplane argument on the epigraph. The epigraph $\operatorname{epi}(f) = \{(z, y) \in \mathbb{R}^d \times \mathbb{R} : y \geq f(z)\}$ is a closed convex set, and the point $(x, f(x))$ lies on its boundary. We approach $(x, f(x))$ from outside the epigraph via a sequence of points, use the [Projection Theorem](/theorems/1985) to produce outer normals, extract a convergent subsequence by the Bolzano--Weierstrass theorem, and show that the limiting normal vector defines a subgradient. [/proofplan] [step:Establish that the epigraph is closed and convex, and construct an approaching sequence] The epigraph of $f$ is \begin{align*} C := \operatorname{epi}(f) = \{(z, y) \in \mathbb{R}^d \times \mathbb{R} : y \geq f(z)\}. \end{align*} Since $f$ is convex and real-valued on $\mathbb{R}^d$, $C$ is convex: for $(z_1, y_1), (z_2, y_2) \in C$ and $t \in [0,1]$, \begin{align*} f((1-t)z_1 + tz_2) \leq (1-t)f(z_1) + tf(z_2) \leq (1-t)y_1 + ty_2, \end{align*} so $((1-t)z_1 + tz_2, (1-t)y_1 + ty_2) \in C$. Moreover, $C$ is closed: a finite-valued convex function on $\mathbb{R}^d$ is continuous (this is a standard consequence of local boundedness of convex functions), so if $(z_k, y_k) \in C$ and $(z_k, y_k) \to (z_0, y_0)$, then $y_0 = \lim_k y_k \geq \lim_k f(z_k) = f(z_0)$ by continuity, hence $(z_0, y_0) \in C$. The point $(x, f(x))$ lies on the boundary of $C$. Define the sequence \begin{align*} w_k := \left(x,\; f(x) - \frac{1}{k}\right) \in \mathbb{R}^{d+1}, \quad k = 1, 2, 3, \ldots \end{align*} Since $f(x) - 1/k < f(x)$, each $w_k \notin C$, and $w_k \to (x, f(x))$ as $k \to \infty$. [/step] [step:Apply the Projection Theorem to each $w_k$ to obtain outer normal vectors] For each $k$, since $C$ is closed and convex in $\mathbb{R}^{d+1}$ and $w_k \notin C$, the [Projection Theorem](/theorems/1985) guarantees the existence of a unique projection $\pi_C(w_k) \in C$. The variational characterisation (part (i) of the Projection Theorem) states that for all $w \in C$: \begin{align*} (w_k - \pi_C(w_k))^\top(w - \pi_C(w_k)) \leq 0. \end{align*} Define $v_k := w_k - \pi_C(w_k)$. Since $w_k \notin C$ but $\pi_C(w_k) \in C$, we have $v_k \neq 0$. The variational inequality becomes: for all $w \in C$, \begin{align*} v_k^\top w \leq v_k^\top \pi_C(w_k) \leq v_k^\top w_k, \end{align*} where the second inequality uses $v_k^\top \pi_C(w_k) = v_k^\top(w_k - v_k) = v_k^\top w_k - \|v_k\|_2^2 \leq v_k^\top w_k$. Normalise: define $\hat{v}_k := v_k / \|v_k\|_2$. Then for all $w \in C$: \begin{align*} \hat{v}_k^\top w \leq \hat{v}_k^\top w_k. \end{align*} [/step] [step:Extract a convergent subsequence of the normalised normals] The sequence $(\hat{v}_k)_{k \geq 1}$ lies in the unit sphere $\{u \in \mathbb{R}^{d+1} : \|u\|_2 = 1\}$, which is compact (closed and bounded in $\mathbb{R}^{d+1}$). By the Bolzano--Weierstrass theorem, there exists a subsequence $(\hat{v}_{k_j})_{j \geq 1}$ converging to some $\hat{v} \in \mathbb{R}^{d+1}$ with $\|\hat{v}\|_2 = 1$. Write $\hat{v} = (-\tilde{g}, \alpha)$ where $\tilde{g} \in \mathbb{R}^d$ and $\alpha \in \mathbb{R}$. Passing to the limit in $\hat{v}_{k_j}^\top w \leq \hat{v}_{k_j}^\top w_{k_j}$, and using $w_{k_j} \to (x, f(x))$: for all $w = (z, y) \in C$, \begin{align*} -\tilde{g}^\top z + \alpha y \leq -\tilde{g}^\top x + \alpha f(x). \end{align*} [/step] [step:Show that $\alpha < 0$ and extract a subgradient] Taking $z = x$ and letting $y \to +\infty$ (since $(x, y) \in C$ for all $y \geq f(x)$), the left-hand side becomes $-\tilde{g}^\top x + \alpha y$. For this to remain bounded above by $-\tilde{g}^\top x + \alpha f(x)$ as $y \to +\infty$, we require $\alpha \leq 0$. Suppose $\alpha = 0$. Then the inequality becomes $-\tilde{g}^\top z \leq -\tilde{g}^\top x$ for all $z \in \mathbb{R}^d$ (since every $(z, f(z)) \in C$), i.e., $\tilde{g}^\top(z - x) \geq 0$ for all $z \in \mathbb{R}^d$. Taking $z = x - \tilde{g}$ gives $-\|\tilde{g}\|_2^2 \geq 0$, hence $\tilde{g} = 0$. But then $\hat{v} = (0, 0)$, contradicting $\|\hat{v}\|_2 = 1$. Therefore $\alpha < 0$. [guided] Why must $\alpha$ be strictly negative? The vector $\hat{v} = (-\tilde{g}, \alpha)$ is a unit outward normal to the epigraph at the boundary point $(x, f(x))$. The epigraph extends upward (in the $y$-direction) without bound, so any outward normal must have a non-positive last component. If the last component were zero, the normal would lie entirely in the "horizontal" subspace $\mathbb{R}^d \times \{0\}$, meaning the supporting hyperplane is vertical. But the epigraph of a finite-valued convex function on all of $\mathbb{R}^d$ is "wide" enough that no vertical hyperplane can support it (because $f$ is defined everywhere, the epigraph projects onto all of $\mathbb{R}^d$). This forces $\alpha < 0$. [/guided] [/step] [step:Rearrange the inequality to obtain $g \in \partial f(x)$] Since $\alpha < 0$, define $g := \tilde{g}/|\alpha| = -\tilde{g}/\alpha \in \mathbb{R}^d$. (Note that $\alpha < 0$, so $|\alpha| = -\alpha$.) Divide the inequality $-\tilde{g}^\top z + \alpha y \leq -\tilde{g}^\top x + \alpha f(x)$ by $|\alpha| = -\alpha > 0$ (which flips the sign of the $\alpha$ terms): \begin{align*} \frac{\tilde{g}^\top}{\alpha} z - y \leq \frac{\tilde{g}^\top}{\alpha} x - f(x), \end{align*} which rearranges to \begin{align*} f(x) - g^\top x \leq y - g^\top z. \end{align*} This holds for all $(z, y) \in C$, i.e., for all $z \in \mathbb{R}^d$ and all $y \geq f(z)$. Taking $y = f(z)$: \begin{align*} f(x) + g^\top(z - x) \leq f(z) \quad \text{for all } z \in \mathbb{R}^d. \end{align*} This is exactly the subgradient inequality, so $g \in \partial f(x)$. In particular, $\partial f(x) \neq \varnothing$. [guided] The full argument is a geometric one: we separate the point $(x, f(x))$ from the interior of the epigraph using a supporting hyperplane. The supporting hyperplane has equation $-\tilde{g}^\top z + \alpha y = -\tilde{g}^\top x + \alpha f(x)$, and the entire epigraph lies on one side. The crucial observation is that the "vertical" component $\alpha$ of the normal must be strictly negative (the hyperplane is not vertical), which allows us to solve for the slope $g = \tilde{g}/|\alpha|$ and recover the subgradient inequality. This argument generalises: any convex function $f$ that is proper (not identically $+\infty$) and lower semicontinuous has non-empty subdifferential at every point of its effective domain. The proof here specialises to the case where $f$ is finite-valued on all of $\mathbb{R}^d$, which avoids issues with relative interiors and effective domains. [/guided] [/step]

Explore Further

AdaBoost Training Error Bound Machine Learning Finite Class Generalization Bound Machine Learning Ensemble Variance Formula Machine Learning Expected Risk Bound via Rademacher Complexity Machine Learning Universal Approximation Machine Learning Subgradient Calculus Machine Learning Projection Theorem Machine Learning Bayes Classifier Machine Learning

What brings you to Androma?

Start with a route through the knowledge graph.

Non-Emptiness of the Subdifferential (Theorem # 1987)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Non-Emptiness of the Subdifferential (Theorem # 1987)

Discussion

Proof

Explore Further