Properties of Convex Functions — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We prove each part by direct application of the definition of convexity and standard calculus. Parts (i)--(iv) verify the convexity inequality for the newly constructed function or set. Part (v) derives the first-order condition by rearranging the convexity inequality along a line segment and taking the limit as $t \to 0^+$. Part (vi) is a short contradiction argument. Part (vii) connects convexity to the Hessian via Taylor's theorem applied along line segments. [/proofplan] [step:Prove part (i): non-negative linear combinations preserve (strict) convexity] Let $x, y \in C$ and $t \in (0,1)$. Since $f$ and $g$ are both convex on $C$: \begin{align*} (af + bg)\bigl((1-t)x + ty\bigr) &= af\bigl((1-t)x + ty\bigr) + bg\bigl((1-t)x + ty\bigr) \\ &\leq a\bigl[(1-t)f(x) + tf(y)\bigr] + b\bigl[(1-t)g(x) + tg(y)\bigr] \\ &= (1-t)\bigl[af(x) + bg(x)\bigr] + t\bigl[af(y) + bg(y)\bigr] \\ &= (1-t)(af + bg)(x) + t(af + bg)(y). \end{align*} The second line uses $a > 0$ and the convexity of $f$ (so $a$ preserves the inequality direction), and similarly $b > 0$ with the convexity of $g$. This establishes convexity of $af + bg$. For strict convexity: if $g$ is strictly convex and $x \neq y$, then $g\bigl((1-t)x + ty\bigr) < (1-t)g(x) + tg(y)$ for $t \in (0,1)$. Since $b > 0$, the inequality in the second line becomes strict, so $(af + bg)\bigl((1-t)x + ty\bigr) < (1-t)(af + bg)(x) + t(af + bg)(y)$. [/step] [step:Prove part (ii): composition with affine maps preserves convexity] Define $g : \mathbb{R}^m \to \mathbb{R}$ by $g(x) = f(Ax - b)$. Let $x, y \in \mathbb{R}^m$ and $t \in (0,1)$. Since the affine map $x \mapsto Ax - b$ is linear in $x$: \begin{align*} A\bigl((1-t)x + ty\bigr) - b = (1-t)(Ax - b) + t(Ay - b). \end{align*} The points $Ax - b$ and $Ay - b$ lie in $\mathbb{R}^d = C$, so by the convexity of $f$: \begin{align*} g\bigl((1-t)x + ty\bigr) &= f\bigl(A\bigl((1-t)x + ty\bigr) - b\bigr) \\ &= f\bigl((1-t)(Ax-b) + t(Ay-b)\bigr) \\ &\leq (1-t)f(Ax-b) + tf(Ay-b) \\ &= (1-t)g(x) + tg(y). \end{align*} [/step] [step:Prove part (iii): pointwise suprema of convex functions are convex on their effective domain] First, we show $D = \{x \in C : g(x) < \infty\}$ is convex. Take $x, y \in D$ (so $g(x) < \infty$ and $g(y) < \infty$) and $t \in (0,1)$. Since $C$ is convex, $(1-t)x + ty \in C$. For each $\alpha \in I$, the convexity of $f_\alpha$ gives \begin{align*} f_\alpha\bigl((1-t)x + ty\bigr) \leq (1-t)f_\alpha(x) + tf_\alpha(y) \leq (1-t)g(x) + tg(y), \end{align*} where the second inequality uses $f_\alpha(x) \leq \sup_{\alpha' \in I} f_{\alpha'}(x) = g(x)$ and similarly for $y$. Taking the supremum over $\alpha \in I$: \begin{align*} g\bigl((1-t)x + ty\bigr) = \sup_{\alpha \in I} f_\alpha\bigl((1-t)x + ty\bigr) \leq (1-t)g(x) + tg(y) < \infty. \end{align*} This shows $(1-t)x + ty \in D$ (the supremum is finite), establishing that $D$ is convex. The displayed inequality is also the convexity condition for $g$ on $D$. [guided] The argument has two components. First, each convex function $f_\alpha$ satisfies the convexity inequality, giving an upper bound on $f_\alpha$ at the convex combination. Second, we use $f_\alpha \leq g$ to replace $f_\alpha(x)$ and $f_\alpha(y)$ by $g(x)$ and $g(y)$, yielding an upper bound that is uniform in $\alpha$. Taking the supremum on the left-hand side then gives the convexity of $g$ itself. An important subtlety: we must verify that $(1-t)x + ty \in D$, not just that $g$ satisfies the convexity inequality. The displayed computation confirms both simultaneously -- the right-hand side is finite (since $g(x), g(y) < \infty$), so $g((1-t)x + ty) < \infty$ and the point lies in $D$. [/guided] [/step] [step:Prove part (iv): sublevel sets of convex functions are convex] Let $S_M := \{x \in C : f(x) \leq M\}$ and take $x, y \in S_M$, so $f(x) \leq M$ and $f(y) \leq M$. For $t \in (0,1)$, the point $(1-t)x + ty \in C$ by convexity of $C$, and \begin{align*} f\bigl((1-t)x + ty\bigr) \leq (1-t)f(x) + tf(y) \leq (1-t)M + tM = M. \end{align*} Therefore $(1-t)x + ty \in S_M$, and $S_M$ is convex. [/step] [step:Prove part (v): the first-order condition from convexity and differentiability] Let $x \in \operatorname{int}(C)$ and $y \in C$. For $t \in (0,1)$, the point $x + t(y - x) = (1-t)x + ty \in C$ by convexity of $C$. The convexity of $f$ gives \begin{align*} f\bigl(x + t(y-x)\bigr) \leq (1-t)f(x) + tf(y) = f(x) + t\bigl(f(y) - f(x)\bigr). \end{align*} Rearranging: \begin{align*} \frac{f\bigl(x + t(y-x)\bigr) - f(x)}{t} \leq f(y) - f(x). \end{align*} Since $f$ is differentiable at $x$, the left-hand side has a limit as $t \to 0^+$. By the definition of the directional derivative and the chain rule applied to the map $t \mapsto f(x + t(y-x))$: \begin{align*} \lim_{t \to 0^+} \frac{f\bigl(x + t(y-x)\bigr) - f(x)}{t} = \nabla f(x)^\top (y - x). \end{align*} Since the inequality holds for all $t \in (0,1)$, it persists in the limit: \begin{align*} \nabla f(x)^\top (y - x) \leq f(y) - f(x). \end{align*} Rearranging gives $f(y) \geq f(x) + \nabla f(x)^\top (y - x)$. For the "in particular" statement: if $\nabla f(x) = 0$, the inequality becomes $f(y) \geq f(x)$ for all $y \in C$, so $x$ is a global minimiser. [guided] The geometric interpretation is fundamental. The first-order condition says that the tangent hyperplane $y \mapsto f(x) + \nabla f(x)^\top(y - x)$ at any point $x$ lies below the graph of $f$. This is equivalent to saying that the graph of $f$ lies above every tangent hyperplane -- the defining property of a convex function from the perspective of supporting hyperplanes. The derivation proceeds by dividing the convexity inequality by $t$: \begin{align*} \frac{f(x + t(y-x)) - f(x)}{t} \leq f(y) - f(x). \end{align*} The left-hand side is a difference quotient along the direction $y - x$. As $t \to 0^+$, this converges to the directional derivative $\nabla f(x)^\top(y-x)$ by differentiability. The right-hand side is constant in $t$, so the limit inequality follows. Why do we need $x \in \operatorname{int}(C)$? Differentiability at $x$ requires that $x$ be an interior point of the domain, so that the difference quotient is well-defined for small perturbations in all directions. [/guided] [/step] [step:Prove part (vi): strict convexity implies uniqueness of minimisers] Suppose for contradiction that $x, y \in C$ are both minimisers of $f$ with $x \neq y$, so $f(x) = f(y) = \min_{z \in C} f(z)$. By strict convexity with $t = 1/2$: \begin{align*} f\!\left(\frac{x + y}{2}\right) < \frac{1}{2}f(x) + \frac{1}{2}f(y) = f(x). \end{align*} Since $(x + y)/2 \in C$ by convexity of $C$, this contradicts the assumption that $f(x)$ is the minimum of $f$ over $C$. [/step] [step:Prove part (vii)(a): convexity is equivalent to the Hessian being positive semi-definite everywhere] Let $f : \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable, with Hessian $H(x) = \bigl(\partial_{x_i}\partial_{x_j} f(x)\bigr)_{i,j=1}^d$. **($\Rightarrow$)** Assume $f$ is convex. Fix $x \in \mathbb{R}^d$ and $v \in \mathbb{R}^d$. Define the one-variable function $\varphi : \mathbb{R} \to \mathbb{R}$ by $\varphi(t) = f(x + tv)$. By the first-order condition (part (v)) applied with $y = x + tv$ and then $y = x$ (taking $x + sv$ in place of $x$): \begin{align*} \varphi(t) = f(x + tv) \geq f(x + sv) + \nabla f(x + sv)^\top(t - s)v = \varphi(s) + \varphi'(s)(t - s) \end{align*} for all $s, t \in \mathbb{R}$. This shows $\varphi$ is convex on $\mathbb{R}$. In particular, $\varphi''(0) \geq 0$. Computing $\varphi''(0)$ by the chain rule: \begin{align*} \varphi''(0) = v^\top H(x) v. \end{align*} Since $v \in \mathbb{R}^d$ was arbitrary, $H(x)$ is positive semi-definite. Since $x$ was arbitrary, this holds for all $x \in \mathbb{R}^d$. **($\Leftarrow$)** Assume $H(x)$ is positive semi-definite for all $x$. Fix $x, y \in \mathbb{R}^d$ and $t \in (0,1)$. By Taylor's theorem with integral remainder applied to $f$ along the line segment from $x$ to $y$: \begin{align*} f(y) = f(x) + \nabla f(x)^\top(y - x) + \int_0^1 (1-s)(y-x)^\top H\bigl(x + s(y-x)\bigr)(y-x) \, d\mathcal{L}^1(s). \end{align*} Since $(y-x)^\top H(\xi)(y-x) \geq 0$ for every $\xi$ (by the positive semi-definiteness hypothesis), the integral is non-negative: \begin{align*} f(y) \geq f(x) + \nabla f(x)^\top(y - x). \end{align*} This is the first-order condition, which implies convexity: for $t \in (0,1)$, applying the inequality once with the pair $(x, (1-t)x + ty)$ and once with $(y, (1-t)x + ty)$, and taking the convex combination $(1-t) \cdot [\text{first}] + t \cdot [\text{second}]$, yields $f((1-t)x + ty) \leq (1-t)f(x) + tf(y)$. [guided] The forward direction ($\Rightarrow$) reduces convexity in $\mathbb{R}^d$ to convexity along every line. If $f$ is convex, then for any $x, v \in \mathbb{R}^d$, the restriction $\varphi(t) = f(x + tv)$ is a convex function of one variable. A twice-differentiable convex function of one variable satisfies $\varphi''(t) \geq 0$ for all $t$. In particular $\varphi''(0) = v^\top H(x) v \geq 0$, giving positive semi-definiteness. The reverse direction ($\Leftarrow$) uses Taylor's theorem to establish the first-order condition $f(y) \geq f(x) + \nabla f(x)^\top(y-x)$, and then deduces convexity from it. To see how the first-order condition implies convexity: let $z = (1-t)x + ty$ for $t \in (0,1)$. Apply the first-order condition at $z$ with $y$ replaced by $x$: $f(x) \geq f(z) + \nabla f(z)^\top(x - z)$. Apply it at $z$ with $y$ replaced by $y$: $f(y) \geq f(z) + \nabla f(z)^\top(y - z)$. Multiply the first by $(1-t)$ and the second by $t$ and add: \begin{align*} (1-t)f(x) + tf(y) &\geq f(z) + \nabla f(z)^\top\bigl[(1-t)(x-z) + t(y-z)\bigr] \\ &= f(z) + \nabla f(z)^\top\bigl[(1-t)x + ty - z\bigr] \\ &= f(z), \end{align*} since $(1-t)x + ty = z$ by definition. This is the convexity inequality. [/guided] [/step] [step:Prove part (vii)(b): positive definite Hessian implies strict convexity] Assume $H(x)$ is positive definite for all $x \in \mathbb{R}^d$. Take $x, y \in \mathbb{R}^d$ with $x \neq y$ and $t \in (0,1)$. By the Taylor expansion with integral remainder used in the previous step: \begin{align*} f(y) = f(x) + \nabla f(x)^\top(y - x) + \int_0^1 (1-s)(y-x)^\top H\bigl(x + s(y-x)\bigr)(y-x) \, d\mathcal{L}^1(s). \end{align*} Since $y \neq x$, the direction $v := y - x \neq 0$, and $v^\top H(\xi) v > 0$ for every $\xi$ by positive definiteness. The integrand $(1-s)v^\top H(x + sv)v > 0$ for $s \in [0,1)$ (and is zero only at $s = 1$), so the integral is strictly positive: \begin{align*} f(y) > f(x) + \nabla f(x)^\top(y - x). \end{align*} This is a strict first-order condition. The same argument as in part (vii)(a) (taking convex combinations of the strict first-order conditions at $x$ and $y$) then yields the strict convexity inequality $f((1-t)x + ty) < (1-t)f(x) + tf(y)$ for $t \in (0,1)$ and $x \neq y$. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Properties of Convex Functions (Theorem # 1976)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Properties of Convex Functions (Theorem # 1976)

Discussion

Proof

Explore Further