First-Order Optimality Condition for Constrained Convex Minimization (Theorem #6675)

Theorem

Proof

[proofplan]
The normal cone condition is exactly a variational inequality: the gradient has nonnegative [inner product](/page/Inner%20Product) with every feasible displacement from $x_*$. For a differentiable convex function, convexity along each line segment gives the first-order lower bound $f(y) \ge f(x_*) + \nabla f(x_*) \cdot (y - x_*)$, which proves sufficiency. Conversely, if $x_*$ minimizes $f$ on $C$, then the restriction of $f$ to every feasible segment starting at $x_*$ has nonnegative right derivative at $0$, which gives the same variational inequality and hence the normal cone condition.
[/proofplan]

[step:Translate the normal cone condition into a variational inequality]
By the definition of $N_C(x_*)$,
\begin{align*}
-\nabla f(x_*) \in N_C(x_*)
\end{align*}
is equivalent to
\begin{align*}
(-\nabla f(x_*)) \cdot (y - x_*) \le 0 \text{ for every } y \in C.
\end{align*}
Multiplying by $-1$, this is equivalent to the variational inequality
\begin{align*}
\nabla f(x_*) \cdot (y - x_*) \ge 0 \text{ for every } y \in C.
\end{align*}
[/step]

[step:Derive the first-order convexity inequality along line segments]
We first record the first-order inequality needed for the sufficiency direction. Let $x \in U$ and $y \in U$. Because $U$ is convex, the segment $\{x + t(y - x) : 0 \le t \le 1\}$ is contained in $U$. Define
\begin{align*}
\phi_{x,y}: [0,1] \to \mathbb{R}, \qquad t \mapsto f(x + t(y - x)).
\end{align*}
Convexity of $f$ implies convexity of $\phi_{x,y}$. For every $t \in (0,1]$, convexity gives
\begin{align*}
\phi_{x,y}(t) \le (1 - t)\phi_{x,y}(0) + t\phi_{x,y}(1).
\end{align*}
Subtracting $\phi_{x,y}(0)$ and dividing by $t > 0$,
\begin{align*}
\frac{\phi_{x,y}(t) - \phi_{x,y}(0)}{t} \le \phi_{x,y}(1) - \phi_{x,y}(0).
\end{align*}
Since $f$ is differentiable at $x$, the right derivative of $\phi_{x,y}$ at $0$ exists and equals
\begin{align*}
\phi_{x,y}'(0) = \nabla f(x) \cdot (y - x).
\end{align*}
Letting $t \downarrow 0$ in the previous inequality, and using $\phi_{x,y}(0) = f(x)$ and $\phi_{x,y}(1) = f(y)$, yields
\begin{align*}
f(y) \ge f(x) + \nabla f(x) \cdot (y - x).
\end{align*}

[guided]
We need a lower bound for $f(y)$ in terms of the value and gradient at $x$. The natural way to get it is to restrict the convex function $f$ to the line segment from $x$ to $y$. Since $U$ is convex and $x,y \in U$, every point $x + t(y - x)$ with $0 \le t \le 1$ lies in $U$, so the following map is well-defined:
\begin{align*}
\phi_{x,y}: [0,1] \to \mathbb{R}, \qquad t \mapsto f(x + t(y - x)).
\end{align*}
Convexity of $f$ implies that $\phi_{x,y}$ is convex. Indeed, for $s,t \in [0,1]$ and $\lambda \in [0,1]$, the point
\begin{align*}
x + ((1-\lambda)s + \lambda t)(y - x)
\end{align*}
is the convex combination
\begin{align*}
(1-\lambda)(x + s(y - x)) + \lambda(x + t(y - x)).
\end{align*}
Applying convexity of $f$ to these two points gives convexity of $\phi_{x,y}$.

Now fix $t \in (0,1]$. Convexity of $\phi_{x,y}$ at the point $t = (1-t) \cdot 0 + t \cdot 1$ gives
\begin{align*}
\phi_{x,y}(t) \le (1 - t)\phi_{x,y}(0) + t\phi_{x,y}(1).
\end{align*}
Subtracting $\phi_{x,y}(0)$ and dividing by the positive number $t$ gives
\begin{align*}
\frac{\phi_{x,y}(t) - \phi_{x,y}(0)}{t} \le \phi_{x,y}(1) - \phi_{x,y}(0).
\end{align*}
Because $f$ is differentiable at $x$, the directional derivative of $f$ at $x$ in the direction $y - x$ exists and is given by the Euclidean inner product with the gradient:
\begin{align*}
\lim_{t \downarrow 0}\frac{f(x + t(y - x)) - f(x)}{t} = \nabla f(x) \cdot (y - x).
\end{align*}
This is exactly the right derivative of $\phi_{x,y}$ at $0$. Taking the limit $t \downarrow 0$ in the previous inequality, and using $\phi_{x,y}(0) = f(x)$ and $\phi_{x,y}(1) = f(y)$, therefore yields
\begin{align*}
\nabla f(x) \cdot (y - x) \le f(y) - f(x).
\end{align*}
Equivalently,
\begin{align*}
f(y) \ge f(x) + \nabla f(x) \cdot (y - x).
\end{align*}
This is the first-order inequality for differentiable convex functions.
[/guided]
[/step]

[step:Use the variational inequality to prove minimality]
Assume
\begin{align*}
-\nabla f(x_*) \in N_C(x_*).
\end{align*}
By the first step, this means
\begin{align*}
\nabla f(x_*) \cdot (y - x_*) \ge 0 \text{ for every } y \in C.
\end{align*}
Let $y \in C$ be arbitrary. Since $C \subset U$, both $x_*$ and $y$ belong to $U$. Applying the first-order convexity inequality from the previous step with $x = x_*$ gives
\begin{align*}
f(y) \ge f(x_*) + \nabla f(x_*) \cdot (y - x_*).
\end{align*}
The variational inequality makes the second term nonnegative, so
\begin{align*}
f(y) \ge f(x_*).
\end{align*}
Because $y \in C$ was arbitrary, $x_*$ is a global minimizer of $f$ over $C$.
[/step]

[step:Use minimality along feasible segments to recover the normal cone condition]
Assume that $x_*$ is a global minimizer of $f$ over $C$. Let $y \in C$ be arbitrary. Since $C$ is convex and $x_*, y \in C$, we have
\begin{align*}
x_* + t(y - x_*) \in C
\end{align*}
for every $t \in [0,1]$. Define
\begin{align*}
\psi_y: [0,1] \to \mathbb{R}, \qquad t \mapsto f(x_* + t(y - x_*)).
\end{align*}
For every $t \in (0,1]$, minimality of $x_*$ over $C$ gives
\begin{align*}
\psi_y(t) = f(x_* + t(y - x_*)) \ge f(x_*) = \psi_y(0).
\end{align*}
Hence
\begin{align*}
\frac{\psi_y(t) - \psi_y(0)}{t} \ge 0.
\end{align*}
Since $f$ is differentiable at $x_*$, the left-hand side converges to $\nabla f(x_*) \cdot (y - x_*)$ as $t \downarrow 0$, so
\begin{align*}
\nabla f(x_*) \cdot (y - x_*) \ge 0.
\end{align*}
Because $y \in C$ was arbitrary,
\begin{align*}
(-\nabla f(x_*)) \cdot (y - x_*) \le 0 \text{ for every } y \in C.
\end{align*}
Thus
\begin{align*}
-\nabla f(x_*) \in N_C(x_*).
\end{align*}
This proves the converse implication and completes the equivalence.
[/step]
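
As a numerical sanity check of the equivalence, here is a minimal sketch on a hypothetical instance that is not part of the theorem page: $f(x) = \tfrac12\|x - a\|^2$ on the box $C = [0,1]^2$ with $a = (2, -0.5)$. For this $f$ the gradient is $x - a$ and the minimizer over $C$ is the coordinatewise clipping of $a$. The helper names (`grad_f`, `min_variational_gap`) and the grid resolution are assumptions chosen for illustration. The script evaluates $\nabla f(x) \cdot (y - x)$ over a grid of points $y \in C$, once at the minimizer and once at a non-optimal feasible point.

```python
import itertools

def f(x, a):
    """Objective f(x) = 0.5 * ||x - a||^2 (convex and differentiable)."""
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def grad_f(x, a):
    """Gradient of f at x, which is x - a for this objective."""
    return [xi - ai for xi, ai in zip(x, a)]

def dot(u, v):
    """Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))

def min_variational_gap(x, a, points):
    """Smallest value of grad f(x) . (y - x) over the sampled points y."""
    g = grad_f(x, a)
    return min(dot(g, [yi - xi for yi, xi in zip(y, x)]) for y in points)

# Hypothetical data: C = [0, 1]^2 sampled on a 21 x 21 grid, target a outside C.
a = [2.0, -0.5]
grid = [i / 20 for i in range(21)]
C_samples = [list(p) for p in itertools.product(grid, repeat=2)]

# Minimizer of f over the box: clip each coordinate of a to [0, 1].
x_star = [min(max(ai, 0.0), 1.0) for ai in a]
# A feasible point that is not a minimizer.
x_bad = [0.5, 0.5]

gap_star = min_variational_gap(x_star, a, C_samples)
gap_bad = min_variational_gap(x_bad, a, C_samples)
f_min_on_grid = min(f(y, a) for y in C_samples)

print("min gap at x_star:", gap_star)
print("min gap at x_bad: ", gap_bad)
print("f(x_star):", f(x_star, a), " grid minimum:", f_min_on_grid)
```

The smallest inner product at `x_star` is nonnegative, as the variational inequality requires, while at `x_bad` some feasible direction has a negative inner product with the gradient, so the condition fails there. A finite grid only samples $C$, so this illustrates the statement and does not prove it.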

Prerequisites

Theorems

Variational Inequality for the Obstacle Problem

Definitions & Concepts

Inner Product

Explore Further

Inner Product (Definition)
Variational Inequality for the Obstacle Problem (Theorem #6434)
Similarity Invariance of Transfer Functions (applied)
3-SAT Is NP-Complete (applied)
Polynomial Size Bound for Local Gadget Constructions (applied)
S-Lemma (applied)
Closure of Polynomial Time Under Polynomial-Time Many-One Reductions (applied)
Nesterov-Todd Self-Concordant Barrier (applied)
Epigraph Characterization of Convex Functions (applied)
Linear-Time Multi-Tape Simulation of Single-Tape Turing Machines (applied)
