Jensen's Inequality for Finite Convex Combinations (Theorem #6671)

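Theorem

Let $f:\mathbb{R}^n \to (-\infty,+\infty]$ be convex, and use the convention $0 \cdot (+\infty)=0$. Let $m \in \mathbb{N}$, let $x_1,\dots,x_m \in \mathbb{R}^n$, and let $\lambda_1,\dots,\lambda_m \in [0,\infty)$ satisfy $\sum_{i=1}^m \lambda_i = 1$. Then
\begin{align*}
f\left(\sum_{i=1}^m \lambda_i x_i\right) \leq \sum_{i=1}^m \lambda_i f(x_i).
\end{align*}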

Proof

[proofplan]
We prove the finite [Jensen inequality](/theorems/515) by induction on the number of points. The case of one point is an equality. For the induction step, we isolate the last weight, normalize the remaining weights when their total mass is positive, and rewrite the full convex combination as a two-point convex combination. Two-point convexity gives the [first inequality](/theorems/2897), and the induction hypothesis bounds the normalized combination of the first $m-1$ points.
[/proofplan]

[step:Prove the one-point case directly]
For $m=1$, the condition $\sum_{i=1}^1 \lambda_i = 1$ gives $\lambda_1 = 1$. Hence
\begin{align*}
f\left(\sum_{i=1}^1 \lambda_i x_i\right) = f(x_1) = \sum_{i=1}^1 \lambda_i f(x_i).
\end{align*}
Thus the desired inequality holds, with equality, for $m=1$.
[/step]

[step:Assume the inequality for $m-1$ points and prepare the normalized weights]
Let $m \in \mathbb{N}$ with $m \geq 2$, and assume the result holds for $m-1$ points. Let $x_1,\dots,x_m \in \mathbb{R}^n$ and let $\lambda_1,\dots,\lambda_m \in [0,\infty)$ satisfy
\begin{align*}
\sum_{i=1}^m \lambda_i = 1.
\end{align*}
Define the total weight $s \in [0,1]$ of the first $m-1$ points by
\begin{align*}
s := \sum_{i=1}^{m-1} \lambda_i = 1 - \lambda_m.
\end{align*}
If $s=0$, then $\lambda_m=1$ and, since all weights are nonnegative, $\lambda_i=0$ for every $1 \leq i \leq m-1$; in particular $\sum_{i=1}^m \lambda_i x_i = x_m$. Therefore
\begin{align*}
f\left(\sum_{i=1}^m \lambda_i x_i\right) = f(x_m) = \sum_{i=1}^m \lambda_i f(x_i),
\end{align*}
where the terms with $\lambda_i=0$ vanish by the convention $0 \cdot (+\infty)=0$. Hence the desired inequality holds in this case.

Assume now that $s>0$. For each $1 \leq i \leq m-1$, define the normalized weight $\mu_i \in [0,\infty)$ by
\begin{align*}
\mu_i := \frac{\lambda_i}{s}.
\end{align*}
Then
\begin{align*}
\sum_{i=1}^{m-1} \mu_i = \frac{1}{s}\sum_{i=1}^{m-1}\lambda_i = 1.
\end{align*}
Define the normalized convex-combination point $y \in \mathbb{R}^n$ by
\begin{align*}
y := \sum_{i=1}^{m-1} \mu_i x_i.
\end{align*}
Since $s\mu_i=\lambda_i$ for every $1 \leq i \leq m-1$, it follows that
\begin{align*}
\sum_{i=1}^m \lambda_i x_i = s y + \lambda_m x_m.
\end{align*}
[/step]

[step:Apply two-point convexity and then the induction hypothesis]
Since $s \geq 0$, $\lambda_m \geq 0$, and $s+\lambda_m=1$, convexity of $f$ applied to the two points $y,x_m \in \mathbb{R}^n$ gives
\begin{align*}
f(sy+\lambda_m x_m) \leq s f(y) + \lambda_m f(x_m).
\end{align*}
By the induction hypothesis applied to the $m-1$ points $x_1,\dots,x_{m-1}$ with weights $\mu_1,\dots,\mu_{m-1}$,
\begin{align*}
f(y) = f\left(\sum_{i=1}^{m-1}\mu_i x_i\right) \leq \sum_{i=1}^{m-1}\mu_i f(x_i).
\end{align*}
Because $s>0$, multiplication by $s$ preserves the order on $(-\infty,+\infty]$, so
\begin{align*}
s f(y) \leq \sum_{i=1}^{m-1} s\mu_i f(x_i) = \sum_{i=1}^{m-1} \lambda_i f(x_i).
\end{align*}
Adding $\lambda_m f(x_m)$ to both sides preserves this inequality, because all quantities involved lie in $(-\infty,+\infty]$, where addition is well defined and monotone. Combining the result with the two-point bound and the identity $\sum_{i=1}^m \lambda_i x_i = sy+\lambda_m x_m$ yields
\begin{align*}
f\left(\sum_{i=1}^m \lambda_i x_i\right) \leq \sum_{i=1}^{m-1} \lambda_i f(x_i) + \lambda_m f(x_m) = \sum_{i=1}^m \lambda_i f(x_i).
\end{align*}

[guided]
The key move is to turn an $m$-point convex combination into a two-point convex combination, because convexity is only assumed for two points at a time. We have already defined
\begin{align*}
s := \sum_{i=1}^{m-1}\lambda_i = 1-\lambda_m.
\end{align*}
In the present case $s>0$, so the first $m-1$ weights can be normalized. For each $1 \leq i \leq m-1$, define
\begin{align*}
\mu_i := \frac{\lambda_i}{s}.
\end{align*}
These are nonnegative and satisfy
\begin{align*}
\sum_{i=1}^{m-1}\mu_i = \frac{1}{s}\sum_{i=1}^{m-1}\lambda_i = 1.
\end{align*}
Thus $\mu_1,\dots,\mu_{m-1}$ are legitimate weights for the induction hypothesis. Define the point $y \in \mathbb{R}^n$ by
\begin{align*}
y := \sum_{i=1}^{m-1}\mu_i x_i.
\end{align*}
This point packages the first $m-1$ points into a single convex-combination point.
Since $s\mu_i=\lambda_i$ for every $1 \leq i \leq m-1$, we have
\begin{align*}
sy+\lambda_m x_m = \sum_{i=1}^{m-1}s\mu_i x_i+\lambda_m x_m = \sum_{i=1}^m \lambda_i x_i.
\end{align*}
Now $s,\lambda_m \in [0,\infty)$ and $s+\lambda_m=1$, so convexity of $f$ applied to the two points $y$ and $x_m$ gives
\begin{align*}
f(sy+\lambda_m x_m) \leq s f(y)+\lambda_m f(x_m).
\end{align*}
The induction hypothesis applies to $x_1,\dots,x_{m-1}$ with weights $\mu_1,\dots,\mu_{m-1}$ and gives
\begin{align*}
f(y) = f\left(\sum_{i=1}^{m-1}\mu_i x_i\right) \leq \sum_{i=1}^{m-1}\mu_i f(x_i).
\end{align*}
Because $s>0$, multiplying this inequality by $s$ preserves order in $(-\infty,+\infty]$, so
\begin{align*}
s f(y) \leq \sum_{i=1}^{m-1}s\mu_i f(x_i) = \sum_{i=1}^{m-1}\lambda_i f(x_i).
\end{align*}
Substituting this estimate into the two-point convexity bound gives
\begin{align*}
f\left(\sum_{i=1}^m \lambda_i x_i\right) = f(sy+\lambda_m x_m) \leq \sum_{i=1}^{m-1}\lambda_i f(x_i)+\lambda_m f(x_m) = \sum_{i=1}^m \lambda_i f(x_i).
\end{align*}
This proves the induction step.
[/guided]
[/step]

[step:Conclude the induction]
The one-point case holds, and the preceding induction step shows that validity for $m-1$ points implies validity for $m$ points for every $m \geq 2$. By induction, the stated Jensen inequality holds for every $m \in \mathbb{N}$.
[/step]
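The induction can also be exercised numerically. The Python sketch below is an illustration only, not part of the proof. It assumes a finite-valued convex function (the squared Euclidean norm, chosen purely as an example), points in $\mathbb{R}^n$ given as lists of floats, and a floating-point tolerance `tol`. The names `combine`, `jensen_by_induction`, and `squared_norm` are hypothetical helpers introduced here. Each recursive call mirrors one induction step: it peels off the last point, normalizes the first $m-1$ weights by $s$, checks the identity $\sum_{i=1}^m \lambda_i x_i = sy+\lambda_m x_m$ and the two-point convexity bound, and then recurses on the weights $\mu_1,\dots,\mu_{m-1}$.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def combine(weights: Sequence[float], points: Sequence[Vector]) -> Vector:
    """Return the weighted sum sum_i weights[i] * points[i] in R^n."""
    n = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) for k in range(n)]


def jensen_by_induction(f: Callable[[Vector], float],
                        points: Sequence[Vector],
                        weights: Sequence[float],
                        tol: float = 1e-9) -> bool:
    """Re-run the induction step numerically; True if every check passes.

    Assumption: f is finite-valued. The +inf values allowed in the proof
    are not handled here. Comparisons use the hypothetical tolerance `tol`.
    """
    m = len(points)
    if m == 1:
        return True  # one-point case: both sides equal f(x_1)
    s = sum(weights[:-1])  # total weight of the first m-1 points
    lam_m = weights[-1]
    if s == 0:
        return True  # all mass on x_m: both sides equal f(x_m)
    mu = [w / s for w in weights[:-1]]  # normalized weights, summing to 1
    y = combine(mu, points[:-1])
    x_m = points[-1]
    z = combine([s, lam_m], [y, x_m])  # the point s*y + lam_m*x_m
    # Decomposition identity: sum_i lam_i x_i == s*y + lam_m*x_m (up to rounding).
    same_point = all(abs(a - b) <= tol for a, b in zip(combine(weights, points), z))
    # Two-point convexity bound: f(s*y + lam_m*x_m) <= s*f(y) + lam_m*f(x_m).
    two_point = f(z) <= s * f(y) + lam_m * f(x_m) + tol
    # Induction hypothesis: recurse on the first m-1 points with weights mu.
    return same_point and two_point and jensen_by_induction(f, points[:-1], mu, tol)


def squared_norm(x: Vector) -> float:
    """Illustrative convex, finite-valued test function: |x|^2."""
    return sum(t * t for t in x)


if __name__ == "__main__":
    rng = random.Random(0)
    m, n = 5, 3
    points = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
    raw = [rng.random() for _ in range(m)]
    total = sum(raw)
    weights = [r / total for r in raw]
    lhs = squared_norm(combine(weights, points))
    rhs = sum(w * squared_norm(p) for w, p in zip(weights, points))
    print(lhs <= rhs + 1e-9, jensen_by_induction(squared_norm, points, weights))
```

Run as a script, this should print `True True`. The sketch computes $s$ as the sum of the first $m-1$ weights rather than as $1-\lambda_m$; the two agree in exact arithmetic, as in the proof. Replacing `squared_norm` with a non-convex function such as $x \mapsto -\lVert x \rVert^2$ makes the two-point check fail, which is where the convexity hypothesis enters.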

Prerequisites

Theorems

Jensen Inequality
