First-Order Optimality Condition for Composite Convex Minimization

First-Order Optimality Condition for Composite Convex Minimization (Theorem # 6719)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the equivalence directly from one-sided variations and the defining inequalities for convex differentiable functions and subgradients. If $\hat{x}$ is optimal, we compare $F(\hat{x})$ with $F((1-t)\hat{x}+tx)$, divide by $t>0$, use convexity of $R$, and let $t \downarrow 0$ using differentiability of $\ell$; this gives the subgradient inequality for $R$ at $\hat{x}$ with subgradient \begin{align*} -\frac{1}{\lambda}\nabla \ell(\hat{x}). \end{align*} Conversely, if such a subgradient exists, the subgradient inequality for $R$ and the first-order convexity inequality for $\ell$ combine to show that every $x \in \mathbb{R}^n$ has objective value at least that of $\hat{x}$. [/proofplan] [step:Record the two convex inequalities used in both directions] Define the composite objective $F: \mathbb{R}^n \to (-\infty,\infty]$ by \begin{align*} F(x) = \ell(x) + \lambda R(x). \end{align*} Since $\hat{x} \in \operatorname{dom} R$ and $\ell$ is real-valued, $F(\hat{x}) < \infty$. Because $\ell: \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable, for every $x \in \mathbb{R}^n$ the first-order convexity inequality gives \begin{align*} \ell(x) \geq \ell(\hat{x}) + \nabla \ell(\hat{x}) \cdot (x - \hat{x}). \end{align*} For a vector $z \in \mathbb{R}^n$, the subgradient condition $z \in \partial R(\hat{x})$ means precisely that for every $x \in \mathbb{R}^n$, \begin{align*} R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}). \end{align*} [/step] [step:Derive the subgradient certificate from global optimality] Assume that $\hat{x}$ minimizes $F$ over $\mathbb{R}^n$. Define the vector $z \in \mathbb{R}^n$ by \begin{align*} z = -\frac{1}{\lambda}\nabla \ell(\hat{x}). \end{align*} We prove that $z \in \partial R(\hat{x})$. Fix $x \in \mathbb{R}^n$. If $R(x) = \infty$, then the subgradient inequality for $R$ at $\hat{x}$ holds because $R(\hat{x}) < \infty$ and the left-hand side is $\infty$. Suppose instead that $R(x) < \infty$. For each $t \in (0,1]$, define the point $x_t \in \mathbb{R}^n$ by \begin{align*} x_t = (1-t)\hat{x} + tx. \end{align*} Convexity of $R$ and finiteness of $R(\hat{x})$ and $R(x)$ give $R(x_t) < \infty$. Since $\hat{x}$ minimizes $F$, \begin{align*} 0 \leq \ell(x_t) - \ell(\hat{x}) + \lambda\bigl(R(x_t) - R(\hat{x})\bigr). \end{align*} Dividing by $t>0$ gives \begin{align*} 0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\frac{R(x_t)-R(\hat{x})}{t}. \end{align*} Convexity of $R$ also gives \begin{align*} R(x_t) \leq (1-t)R(\hat{x}) + tR(x), \end{align*} and hence \begin{align*} \frac{R(x_t)-R(\hat{x})}{t} \leq R(x)-R(\hat{x}). \end{align*} Therefore \begin{align*} 0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\bigl(R(x)-R(\hat{x})\bigr). \end{align*} Because $\ell$ is differentiable at $\hat{x}$ and $x_t-\hat{x}=t(x-\hat{x})$, letting $t \downarrow 0$ yields \begin{align*} 0 \leq \nabla \ell(\hat{x}) \cdot (x - \hat{x}) + \lambda\bigl(R(x)-R(\hat{x})\bigr). \end{align*} Dividing by $\lambda>0$ and rearranging gives \begin{align*} R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}). \end{align*} Thus $z \in \partial R(\hat{x})$, and by definition of $z$, \begin{align*} \nabla \ell(\hat{x}) + \lambda z = 0. \end{align*} [guided] Assume that $\hat{x}$ is a global minimizer of the composite objective $F: \mathbb{R}^n \to (-\infty,\infty]$, where \begin{align*} F(x) = \ell(x) + \lambda R(x). \end{align*} We want to turn the global comparison $F(\hat{x}) \leq F(y)$ for every $y \in \mathbb{R}^n$ into the subgradient inequality for $R$ at $\hat{x}$. The desired optimality equation forces the candidate subgradient. Define $z \in \mathbb{R}^n$ by \begin{align*} z = -\frac{1}{\lambda}\nabla \ell(\hat{x}). \end{align*} To prove $z \in \partial R(\hat{x})$, we must prove \begin{align*} R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}) \end{align*} for every $x \in \mathbb{R}^n$. Fix $x \in \mathbb{R}^n$. If $R(x)=\infty$, the inequality holds because $R(\hat{x})<\infty$ and the left-hand side is $\infty$. Hence suppose $R(x)<\infty$. The direct comparison $F(\hat{x})\leq F(x)$ is not enough by itself, because a lower bound for $\ell(x)$ cannot be substituted into the upper side of that inequality. Instead, we approach $x$ from $\hat{x}$ along the line segment and then take a one-sided derivative. For each $t \in (0,1]$, define $x_t \in \mathbb{R}^n$ by \begin{align*} x_t = (1-t)\hat{x} + tx. \end{align*} Since $R$ is convex and both $R(\hat{x})$ and $R(x)$ are finite, convexity gives \begin{align*} R(x_t) \leq (1-t)R(\hat{x}) + tR(x) < \infty. \end{align*} Thus $F(x_t)$ is finite, and optimality of $\hat{x}$ gives \begin{align*} 0 \leq F(x_t)-F(\hat{x}) = \ell(x_t)-\ell(\hat{x}) + \lambda\bigl(R(x_t)-R(\hat{x})\bigr). \end{align*} Dividing by $t>0$ preserves the inequality: \begin{align*} 0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\frac{R(x_t)-R(\hat{x})}{t}. \end{align*} The convexity estimate for $R(x_t)$ also gives \begin{align*} R(x_t)-R(\hat{x}) \leq t\bigl(R(x)-R(\hat{x})\bigr), \end{align*} so \begin{align*} \frac{R(x_t)-R(\hat{x})}{t} \leq R(x)-R(\hat{x}). \end{align*} Substituting this upper bound is valid because it makes the right-hand side larger, and the previous non-negative quantity remains bounded above by that larger expression. We obtain \begin{align*} 0 \leq \frac{\ell(x_t)-\ell(\hat{x})}{t} + \lambda\bigl(R(x)-R(\hat{x})\bigr). \end{align*} Now use differentiability of $\ell$ at $\hat{x}$. Since $x_t-\hat{x}=t(x-\hat{x})$, the directional difference quotient satisfies \begin{align*} \lim_{t\downarrow 0}\frac{\ell(x_t)-\ell(\hat{x})}{t} = \nabla \ell(\hat{x}) \cdot (x-\hat{x}). \end{align*} Letting $t \downarrow 0$ gives \begin{align*} 0 \leq \nabla \ell(\hat{x}) \cdot (x - \hat{x}) + \lambda\bigl(R(x)-R(\hat{x})\bigr). \end{align*} After dividing by the positive scalar $\lambda$ and rearranging, this becomes \begin{align*} R(x) \geq R(\hat{x}) - \frac{1}{\lambda}\nabla \ell(\hat{x}) \cdot (x - \hat{x}). \end{align*} Using the definition of $z$, we get \begin{align*} R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}). \end{align*} This is exactly the definition of $z \in \partial R(\hat{x})$. Finally, the definition of $z$ gives \begin{align*} \nabla \ell(\hat{x}) + \lambda z = 0. \end{align*} [/guided] [/step] [step:Use the certificate to prove global optimality] Conversely, assume that there exists $z \in \partial R(\hat{x})$ such that \begin{align*} \nabla \ell(\hat{x}) + \lambda z = 0. \end{align*} Fix $x \in \mathbb{R}^n$. The first-order convexity inequality for $\ell$ gives \begin{align*} \ell(x) \geq \ell(\hat{x}) + \nabla \ell(\hat{x}) \cdot (x - \hat{x}). \end{align*} The subgradient inequality for $z \in \partial R(\hat{x})$ gives \begin{align*} R(x) \geq R(\hat{x}) + z \cdot (x - \hat{x}). \end{align*} Multiplying the [second inequality](/theorems/2136) by $\lambda > 0$ and adding it to the first yields \begin{align*} \ell(x) + \lambda R(x) \geq \ell(\hat{x}) + \lambda R(\hat{x}) + \bigl(\nabla \ell(\hat{x}) + \lambda z\bigr) \cdot (x - \hat{x}). \end{align*} The certificate makes the final [inner product](/page/Inner%20Product) equal to $0$, hence \begin{align*} F(x) \geq F(\hat{x}). \end{align*} Since $x \in \mathbb{R}^n$ was arbitrary, $\hat{x}$ minimizes $F$ over $\mathbb{R}^n$. [/step] [step:Rewrite the certificate as a subdifferential inclusion] The equation \begin{align*} \nabla \ell(\hat{x}) + \lambda z = 0 \end{align*} is equivalent, because $\lambda > 0$, to \begin{align*} z = -\frac{1}{\lambda}\nabla \ell(\hat{x}). \end{align*} Therefore the existence of $z \in \partial R(\hat{x})$ satisfying the optimality equation is equivalent to the single inclusion \begin{align*} -\frac{1}{\lambda}\nabla \ell(\hat{x}) \in \partial R(\hat{x}). \end{align*} This proves both stated formulations of the optimality certificate. [/step]

Explore Further

Kalman Observability Rank Theorem applied Saddle Point Theorem for Lagrange Duality applied Observable Companion Form Theorem applied Polynomial Invariance of Polynomial Time applied PBH Stabilizability Criterion applied Feedback Invariance of Unreachable Eigenvalues applied Convexity of the Set of Minimizers of a Convex Function applied Space-Bounded Computations Have Exponential-Time Simulations applied

What brings you to Androma?

Start with a route through the knowledge graph.