Alexandrov's Theorem (Convex Functions) — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The proof has four major stages. First, derive that the distributional Hessian of $f$ is a matrix of locally finite signed Radon measures, by combining the local Lipschitz bound for $\nabla f$ with the monotonicity of the subgradient (which forces $\nabla f \in BV_{\mathrm{loc}}$). Second, on the full-measure set of Lebesgue points $x_0$ of $\nabla f$ at which the absolutely continuous part of $D^2 f$ has a Lebesgue density $A(x_0) \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$, deduce the $L^1$ Lebesgue-density estimate for the gradient remainder near $x_0$. Third, invoke the classical Alexandrov rigidity lemma (Federer 3.2.22 / Evans–Gariepy §6.4) as a black box: it states that the $L^1$-mean estimate on the gradient remainder, together with convexity of $f$, forces the pointwise quadratic Taylor expansion. Fourth, observe that the limit Hessian is positive semidefinite by the symmetric-difference convexity trick. The technical inputs are stated as `[claim]` blocks: bounded variation of $\nabla f$ from monotonicity, Lebesgue–Besicovitch differentiation for matrix-valued Radon measures, and the Alexandrov rigidity lemma upgrading $L^1$-control to pointwise control. [/proofplan] [step:Prepare the setting and the local Lipschitz bound for $\nabla f$] Throughout this proof, $U \subseteq \mathbb{R}^n$ is open and convex and $f: U \to \mathbb{R}$ is convex. By the theorem that convex functions are locally Lipschitz, $f$ is locally Lipschitz on $U$, hence by Rademacher's theorem, $f$ is (classically) differentiable at $\mathcal{L}^n$-a.e. point of $U$, with weak gradient \begin{align*} \nabla f: U &\to \mathbb{R}^n \end{align*} defined $\mathcal{L}^n$-a.e. and belonging to $L^\infty_{\mathrm{loc}}(U; \mathbb{R}^n)$. By the standard properties of the subdifferential, at every point of differentiability $\partial f(x) = \{\nabla f(x)\}$. Fix any open ball $B \Subset U$ (i.e., $\overline{B} \subseteq U$ compact). By local Lipschitzness there is a constant $L_B < \infty$ with $|f(x) - f(y)| \le L_B |x - y|$ for $x, y \in \overline{B}$, and $|\nabla f(x)| \le L_B$ for $\mathcal{L}^n$-a.e. $x \in \overline{B}$. [/step] [step:Show $\nabla f$ is locally $BV$ via monotonicity] [claim:For every open ball $B \Subset U$, $\nabla f \in BV(B; \mathbb{R}^n)$, and the distributional Jacobian $D(\nabla f)$ is an $n \times n$ matrix of finite signed Radon measures on $B$, symmetric in the sense that $\partial_{x_i}(\partial_{x_j} f) = \partial_{x_j}(\partial_{x_i} f)$ as distributions.] [/claim] *Proof of claim.* Fix an open ball $B \Subset U$. Choose a slightly enlarged ball $B' \Subset U$ with $\overline{B} \Subset B'$, and set $\delta := \operatorname{dist}(\overline{B}, \partial B') > 0$. For each unit vector $v \in S^{n-1}$ and each $h \in (0, \delta)$, define the directional difference quotient \begin{align*} \Delta_v^h \nabla f: B &\to \mathbb{R}^n, \\ x &\mapsto \frac{\nabla f(x + h v) - \nabla f(x)}{h}. \end{align*} This is well-defined for a.e. $x \in B$ since $x + hv \in B'$. By the monotonicity of the subdifferential, applied with $p = \nabla f(x + hv) \in \partial f(x + hv)$ and $q = \nabla f(x) \in \partial f(x)$ (at points of differentiability), \begin{align*} (\nabla f(x + hv) - \nabla f(x)) \cdot ((x + hv) - x) \ge 0, \end{align*} i.e., $h \, v \cdot \Delta_v^h \nabla f(x) \ge 0$. Dividing by $h^2 > 0$, \begin{align*} v \cdot \Delta_v^h \nabla f(x) \ge 0 \qquad \text{for a.e. } x \in B. \end{align*} In particular, taking $v = e_k$ shows that the scalar $\partial_{x_k} f$ is monotone-nondecreasing in the $x_k$-direction along $B$ (in the difference-quotient sense). Taking $v = (e_i + e_j)/\sqrt 2$ and expanding $v \cdot \nabla f = (\partial_{x_i} f + \partial_{x_j} f)/\sqrt 2$ shows that the scalar $\partial_{x_i} f + \partial_{x_j} f$ is monotone-nondecreasing in the direction $v$. Taking $v = (e_i - e_j)/\sqrt 2$ shows analogously that $\partial_{x_i} f - \partial_{x_j} f$ is monotone-nondecreasing in $v' = (e_i - e_j)/\sqrt 2$. Integrating these difference-quotient inequalities against test functions $\varphi \in C_c^\infty(B)$ with $\varphi \ge 0$, and passing to the distributional limit as $h \to 0^+$, yields nonnegative distributions: \begin{align*} T_k &:= \partial_{x_k}(\partial_{x_k} f) \ge 0, \\ T_{ij}^+ &:= \partial_v(\partial_{v} f) = \tfrac{1}{2}(\partial_{x_i} + \partial_{x_j})(\partial_{x_i} f + \partial_{x_j} f) = \tfrac{1}{2}(\partial_{ii} f + 2 \partial_{ij} f + \partial_{jj} f) \ge 0, \\ T_{ij}^- &:= \partial_{v'}(\partial_{v'} f) = \tfrac{1}{2}(\partial_{x_i} - \partial_{x_j})(\partial_{x_i} f - \partial_{x_j} f) = \tfrac{1}{2}(\partial_{ii} f - 2 \partial_{ij} f + \partial_{jj} f) \ge 0. \end{align*} By Schwartz's theorem on positive distributions (every nonnegative distribution is a Radon measure; see Schwartz, *Théorie des distributions*, Ch. I §4), each of $T_k$, $T_{ij}^+$, $T_{ij}^-$ is a nonnegative Radon measure on $B$. We now extract the off-diagonal entries by polarization: \begin{align*} \partial_{ij} f = \tfrac{1}{2}\bigl(T_{ij}^+ - T_{ij}^-\bigr), \end{align*} exhibiting $\partial_{ij} f$ as a difference of two nonnegative Radon measures, hence a finite signed Radon measure on $B$. The diagonal entries $\partial_{ii} f = T_i$ are themselves nonnegative measures. Since each component $\partial_{x_i} f \in L^\infty(B)$ has each distributional partial derivative $\partial_{x_j}(\partial_{x_i} f)$ a finite signed Radon measure on $B$, we conclude $\partial_{x_i} f \in BV(B)$ by the definition of $BV$, and hence $\nabla f \in BV(B; \mathbb{R}^n)$. The symmetry $\partial_{x_i}(\partial_{x_j} f) = \partial_{x_j}(\partial_{x_i} f)$ holds as distributions because mixed partials commute when applied to test functions (using $f \in L^1_{\mathrm{loc}}(B)$, which follows from $f$ being locally Lipschitz hence locally bounded): \begin{align*} \partial_{x_j}(\partial_{x_i} f)(\varphi) = -\int_B \partial_{x_i} f \, \partial_{x_j}\varphi = \int_B f \, \partial_{x_i}\partial_{x_j}\varphi = -\int_B \partial_{x_j} f\, \partial_{x_i}\varphi = \partial_{x_i}(\partial_{x_j} f)(\varphi). \end{align*} $\square$ By the claim, $\nabla f \in BV(B; \mathbb{R}^n)$ and $D(\nabla f) = (\partial_{x_j}(\partial_{x_i} f))_{i,j} =: D^2 f$ is an $n \times n$ matrix of finite signed Radon measures on $B$. By Lebesgue's decomposition for the matrix of signed measures $D^2 f$, write \begin{align*} D^2 f = D^2 f_{\mathrm{ac}} + D^2 f_{\mathrm{s}}, \end{align*} where $D^2 f_{\mathrm{ac}} \ll \mathcal{L}^n$ and $D^2 f_{\mathrm{s}} \perp \mathcal{L}^n$ on $B$. The Radon–Nikodym theorem provides a measurable matrix-valued density $A_B \in L^1(B; \mathbb{R}^{n \times n}_{\mathrm{sym}})$ with $D^2 f_{\mathrm{ac}} = A_B \, \mathcal{L}^n$. The symmetry of $D^2 f$ as a distribution descends to symmetry of $A_B(x)$ for $\mathcal{L}^n$-a.e. $x$. The densities $A_B$ on different balls agree on overlaps (the absolutely continuous part of $D^2 f$ on $B$ is the restriction to $B$ of the absolutely continuous part of $D^2 f$ on any larger ball $B'' \supseteq B$), so they patch into a single density $A \in L^1_{\mathrm{loc}}(U; \mathbb{R}^{n \times n}_{\mathrm{sym}})$. [/step] [step:Identify the full-measure set of good points $x_0$] Define the **set of good points** $G \subseteq U$ by \begin{align*} G := \{x_0 \in U: x_0 \text{ satisfies all of (G1)-(G4) below}\}, \end{align*} where the conditions are: - (G1) $f$ is classically differentiable at $x_0$; - (G2) $x_0$ is a Lebesgue point of $\nabla f$ in the $L^1$ sense; - (G3) $x_0$ is a Lebesgue point of the global density $A \in L^1_{\mathrm{loc}}(U; \mathbb{R}^{n \times n}_{\mathrm{sym}})$, with Lebesgue value $A(x_0)$; - (G4) the singular part $D^2 f_{\mathrm{s}}$ has zero Lebesgue density at $x_0$ (a property holding $\mathcal{L}^n$-a.e. by the Lebesgue–Besicovitch differentiation theorem for Radon measures). We claim $\mathcal{L}^n(U \setminus G) = 0$. Indeed, (G1) holds on a full-measure set by Rademacher's theorem; (G2) holds on a full-measure set by the Lebesgue differentiation theorem applied to $\nabla f \in L^1_{\mathrm{loc}}(U)$; (G3) holds on a full-measure set by the Lebesgue differentiation theorem applied to $A \in L^1_{\mathrm{loc}}(U)$; (G4) holds by the Lebesgue–Besicovitch differentiation theorem for Radon measures (Evans–Gariepy, *Measure Theory and Fine Properties of Functions*, §1.6.2) applied to $D^2 f_{\mathrm{s}}$. Each is a countable intersection of full-measure sets, hence $G$ has full measure. For the rest of the proof we fix $x_0 \in G$, set $A(x_0)$ as the symmetric matrix in $\mathbb{R}^{n \times n}$ from (G3), and prove the Taylor expansion at $x_0$. [/step] [step:State the gradient density estimate at $x_0$] [claim:$L^1$-gradient density estimate at $x_0 \in G$. With $x_0 \in G$ and $A(x_0)$ as in Step 3, defining the **gradient deviation** \begin{align*} e: U &\to \mathbb{R}^n, \\ y &\mapsto \nabla f(y) - \nabla f(x_0) - A(x_0)(y - x_0), \end{align*} we have \begin{align*} \fint_{B(x_0, r)} |e(y)| \, d\mathcal{L}^n(y) = o(r) \qquad \text{as } r \to 0^+, \end{align*} i.e., the average gradient deviation on $B(x_0, r)$ from the affine model $y \mapsto \nabla f(x_0) + A(x_0)(y - x_0)$ is $o(r)$ as $r \to 0$.] [/claim] *Proof of claim.* This is the assertion that $\nabla f \in BV_{\mathrm{loc}}(U; \mathbb{R}^n)$ has $A(x_0)$ as its $L^1$-approximate derivative at $x_0$. Apply the BV $L^{1^*}$-differentiability theorem (Evans–Gariepy, *Measure Theory and Fine Properties of Functions*, §6.1.2, Theorem 4) componentwise to each scalar $\partial_{x_i} f \in BV_{\mathrm{loc}}(U)$ — its hypotheses are met because Step 2 established $\nabla f \in BV_{\mathrm{loc}}(U; \mathbb{R}^n)$, and the absolutely continuous part of $D(\partial_{x_i} f)$ has $\mathcal{L}^n$-density given by the $i$-th row of $A$ (Step 2), with the singular part vanishing in density at $x_0$ by (G4). Here $1^* = n/(n-1)$ is the Sobolev conjugate of $1$. The theorem yields, at $\mathcal{L}^n$-a.e. $x_0$ (in particular at every $x_0$ satisfying (G1)–(G4)), \begin{align*} \left(\fint_{B(x_0, r)} |e(y)|^{1^*} \, d\mathcal{L}^n(y)\right)^{1/1^*} = o(r) \qquad \text{as } r \to 0^+. \end{align*} By Jensen's inequality (since $1 < 1^*$, the function $s \mapsto s^{1^*}$ is convex on $[0, \infty)$), \begin{align*} \fint_{B(x_0, r)} |e| \, d\mathcal{L}^n \le \left(\fint_{B(x_0, r)} |e|^{1^*}\, d\mathcal{L}^n\right)^{1/1^*}, \end{align*} so the $L^1$-average bound $\fint_{B(x_0,r)}|e| = o(r)$ follows from the $L^{1^*}$ one. $\square$ [/step] [step:Pointwise Taylor expansion via the Alexandrov rigidity lemma] We now upgrade the $L^1$-mean control of the gradient deviation $e$ established in Step 4 to a pointwise quadratic Taylor expansion of $f$ at $x_0$. This upgrade is the heart of Alexandrov's theorem and is a standard, named result in geometric measure theory; we invoke it as a black box. [claim:Alexandrov rigidity lemma. Let $V \subseteq \mathbb{R}^n$ be open and convex, let $g: V \to \mathbb{R}$ be convex, and let $x_0 \in V$ be a point at which $g$ is classically differentiable. Suppose there exists a symmetric matrix $A \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$ such that the gradient remainder satisfies the $L^1$-mean estimate \begin{align*} \fint_{B(x_0, r)}\bigl|\nabla g(y) - \nabla g(x_0) - A(y - x_0)\bigr|\, d\mathcal{L}^n(y) = o(r) \qquad \text{as } r \to 0^+. \end{align*} Then $g$ admits the pointwise quadratic Taylor expansion \begin{align*} g(y) = g(x_0) + \nabla g(x_0)\cdot(y - x_0) + \tfrac{1}{2}(y - x_0)^\top A\,(y - x_0) + o(|y - x_0|^2) \qquad \text{as } y \to x_0. \end{align*}] [/claim] *Reference for the proof of the claim.* This is the standard rigidity lemma underlying Alexandrov's theorem; see Federer, *Geometric Measure Theory* (Springer, 1969), Theorem 3.2.22 (Aleksandrov's theorem), and Evans–Gariepy, *Measure Theory and Fine Properties of Functions* (CRC Press, 2nd ed. 2015), §6.4 ("Aleksandrov's Theorem"). Both references prove this exact upgrade — that an $L^1$-mean estimate on the gradient remainder of a convex function, at a point of classical differentiability, implies the corresponding pointwise quadratic Taylor expansion — using maximal-function arguments combined with the radial-monotonicity properties of one-dimensional convex functions. The convexity of $g$ is essential; without it, the $L^1$-to-pointwise upgrade fails for $W^{1,1}$ functions in general. *Application to $f$ at $x_0 \in G$.* The hypotheses of the rigidity lemma hold with $V = U$, $g = f$, the point $x_0 \in G$, and the symmetric matrix $A = A(x_0) \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$ from Step 3: - $U \subseteq \mathbb{R}^n$ is open and convex (global hypothesis). - $f: U \to \mathbb{R}$ is convex (global hypothesis). - $f$ is classically differentiable at $x_0$ by (G1). - The $L^1$-mean estimate $\fint_{B(x_0, r)} |\nabla f(y) - \nabla f(x_0) - A(x_0)(y - x_0)|\, d\mathcal{L}^n(y) = o(r)$ as $r \to 0^+$ holds by Step 4. Therefore the lemma yields the pointwise quadratic Taylor expansion at $x_0$: \begin{align*} f(y) = f(x_0) + \nabla f(x_0)\cdot(y - x_0) + \tfrac{1}{2}(y - x_0)^\top A(x_0) (y - x_0) + o(|y - x_0|^2) \qquad \text{as } y \to x_0. \end{align*} [/step] [step:Verify the Hessian is positive semidefinite] Set $A(x_0) := D^2 f(x_0)$, the symmetric matrix from Step 3. Fix $h \in \mathbb{R}^n$ with $|h| = 1$. By convexity of $f$, the one-dimensional function \begin{align*} \psi: [-\rho_0, \rho_0] &\to \mathbb{R}, \\ t &\mapsto f(x_0 + t h) \end{align*} is convex (where $\rho_0 > 0$ is small enough that $x_0 \pm \rho_0 h \in U$, which exists because $U$ is open). By the Taylor expansion (Step 5) applied along $\pm t h$: \begin{align*} \psi(t) + \psi(-t) &= f(x_0 + th) + f(x_0 - th) \\ &= 2 f(x_0) + (\nabla f(x_0) \cdot h - \nabla f(x_0) \cdot h) t + t^2 h^\top A(x_0) h + o(t^2) \\ &= 2 f(x_0) + t^2 h^\top A(x_0) h + o(t^2) \quad \text{as } t \to 0. \end{align*} By convexity of $\psi$ at $0$ with the symmetric pair $(t, -t)$, \begin{align*} \frac{\psi(t) + \psi(-t)}{2} \ge \psi(0) = f(x_0). \end{align*} Combining, \begin{align*} 2 f(x_0) + t^2 h^\top A(x_0) h + o(t^2) = \psi(t) + \psi(-t) \ge 2 f(x_0), \end{align*} hence \begin{align*} t^2 h^\top A(x_0) h + o(t^2) \ge 0. \end{align*} Dividing by $t^2 > 0$ and letting $t \to 0$ gives $h^\top A(x_0) h \ge 0$. Since $h$ was an arbitrary unit vector, $A(x_0) \ge 0$ (positive semidefinite). This completes the proof: at every point $x_0 \in G$ (a set of full $\mathcal{L}^n$-measure in $U$), $f$ admits the second-order Taylor expansion \begin{align*} f(x) = f(x_0) + \nabla f(x_0)\cdot(x - x_0) + \tfrac{1}{2}(x - x_0)^\top A(x_0)(x - x_0) + o(|x - x_0|^2) \end{align*} as $x \to x_0$, with $A(x_0) = D^2 f(x_0)$ symmetric and positive semidefinite. [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Alexandrov's Theorem (Convex Functions) (Theorem # 3088)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Alexandrov's Theorem (Convex Functions) (Theorem # 3088)

Discussion

Proof

Explore Further