Finite Jensen Inequality — Statement & Proof

Finite Jensen Inequality (Theorem # 7940)

Theorem



Proof

[proofplan] We prove the finite [Jensen inequality](/theorems/515) by induction on the number of points. In the base case there is a single coefficient, and the condition that the coefficients sum to $1$ forces it to equal $1$. In the induction step, we separate the last coefficient from the others. If the total mass on the first points is positive, we normalize their coefficients and apply the induction hypothesis to the resulting convex combination. The two-point convexity of the set $C$ and of the function $f$ then combines that normalized point with the last point. [/proofplan]

[step:Prove the one-point case] Let $x_1 \in C$ and let $\lambda_1 \in [0,\infty)$ satisfy \begin{align*} \lambda_1 = 1. \end{align*} Then \begin{align*} \sum_{i=1}^1 \lambda_i x_i = \lambda_1 x_1 = x_1 \in C. \end{align*} Therefore \begin{align*} f\left(\sum_{i=1}^1 \lambda_i x_i\right) = f(x_1) = \lambda_1 f(x_1) = \sum_{i=1}^1 \lambda_i f(x_i). \end{align*} This proves the assertion for $m=1$. [/step]

[step:Normalize the first $m$ weights in the induction step] Fix an integer $m \ge 1$, and assume the theorem holds for this value of $m$. We prove it for $m+1$. Let $x_1,\ldots,x_m,x_{m+1} \in C$, and let $\lambda_1,\ldots,\lambda_m,\lambda_{m+1} \in [0,\infty)$ satisfy \begin{align*} \sum_{i=1}^{m+1} \lambda_i = 1. \end{align*} Define the remaining mass $\mu$ by \begin{align*} \mu = \sum_{i=1}^m \lambda_i. \end{align*} Then \begin{align*} \mu + \lambda_{m+1} = 1, \end{align*} and since $\mu$ and $\lambda_{m+1}$ are both nonnegative, we have $\mu \in [0,1]$ and $\lambda_{m+1} \in [0,1]$. If $\mu = 0$, then $\lambda_i = 0$ for every $i \in \{1,\ldots,m\}$, because each $\lambda_i$ is nonnegative. Hence $\lambda_{m+1}=1$, and \begin{align*} \sum_{i=1}^{m+1} \lambda_i x_i = x_{m+1} \in C. \end{align*} Moreover, \begin{align*} f\left(\sum_{i=1}^{m+1} \lambda_i x_i\right) = f(x_{m+1}) = \sum_{i=1}^{m+1} \lambda_i f(x_i). \end{align*} Thus the desired conclusion holds in the case $\mu=0$. Assume now that $\mu>0$.
For each $i \in \{1,\ldots,m\}$, define $\alpha_i \in [0,\infty)$ by \begin{align*} \alpha_i = \frac{\lambda_i}{\mu}. \end{align*} Then \begin{align*} \sum_{i=1}^m \alpha_i = \frac{1}{\mu}\sum_{i=1}^m \lambda_i = 1. \end{align*} Define $y \in V$ by \begin{align*} y = \sum_{i=1}^m \alpha_i x_i. \end{align*} By the induction hypothesis applied to the points $x_1,\ldots,x_m$ and the coefficients $\alpha_1,\ldots,\alpha_m$, we have $y \in C$ and \begin{align*} f(y) \le \sum_{i=1}^m \alpha_i f(x_i). \end{align*}

[guided] We separate the last point $x_{m+1}$ and measure how much coefficient mass remains on the first $m$ points. Define \begin{align*} \mu = \sum_{i=1}^m \lambda_i. \end{align*} Since all coefficients are nonnegative and their total sum is $1$, we have $\mu \in [0,1]$ and \begin{align*} \mu + \lambda_{m+1} = 1. \end{align*} There are two cases. If $\mu=0$, then the sum of the nonnegative numbers $\lambda_1,\ldots,\lambda_m$ is zero, so each one is zero. The total sum condition then gives $\lambda_{m+1}=1$. Therefore the weighted point is just \begin{align*} \sum_{i=1}^{m+1} \lambda_i x_i = x_{m+1}. \end{align*} Since $x_{m+1}\in C$, the weighted point lies in $C$, and the Jensen inequality becomes equality: \begin{align*} f\left(\sum_{i=1}^{m+1} \lambda_i x_i\right) = f(x_{m+1}) = \sum_{i=1}^{m+1} \lambda_i f(x_i). \end{align*} Now suppose $\mu>0$. The reason for introducing $\mu$ is that the first $m$ coefficients need not sum to $1$, so we cannot apply the induction hypothesis to them directly. We normalize them. For each $i \in \{1,\ldots,m\}$, define \begin{align*} \alpha_i = \frac{\lambda_i}{\mu}. \end{align*} Each $\alpha_i$ is nonnegative, and the normalized coefficients sum to $1$: \begin{align*} \sum_{i=1}^m \alpha_i = \frac{1}{\mu}\sum_{i=1}^m \lambda_i = 1. \end{align*} Define the normalized weighted point $y \in V$ by \begin{align*} y = \sum_{i=1}^m \alpha_i x_i.
\end{align*} The induction hypothesis applies because the points $x_1,\ldots,x_m$ lie in $C$ and the coefficients $\alpha_1,\ldots,\alpha_m$ are nonnegative with sum $1$. Hence $y \in C$, and \begin{align*} f(y) \le \sum_{i=1}^m \alpha_i f(x_i). \end{align*} This is where the induction hypothesis collapses the first $m$ terms of the weighted sum into a single point $y \in C$, which can then be combined with $x_{m+1}$ using the ordinary two-point convexity inequality. [/guided] [/step]

[step:Apply two-point convexity to combine the normalized point with the last point] Continue in the case $\mu>0$. Define $z \in V$ by \begin{align*} z = \sum_{i=1}^{m+1} \lambda_i x_i. \end{align*} Using the identity $\lambda_i=\mu\alpha_i$ for $i \in \{1,\ldots,m\}$ and the definition of $y$, we compute \begin{align*} z = \sum_{i=1}^{m} \mu\alpha_i x_i + \lambda_{m+1}x_{m+1} = \mu \sum_{i=1}^{m} \alpha_i x_i + \lambda_{m+1}x_{m+1} = \mu y + \lambda_{m+1}x_{m+1}. \end{align*} Since $y \in C$, $x_{m+1}\in C$, and $\mu,\lambda_{m+1}\in[0,1]$ with $\mu+\lambda_{m+1}=1$, the stated two-point convexity property of $C$ gives $z\in C$. Applying the stated two-point convexity property of $f$ to the points $y,x_{m+1}\in C$ with coefficients $\mu$ and $\lambda_{m+1}$ gives \begin{align*} f(z) \le \mu f(y) + \lambda_{m+1}f(x_{m+1}). \end{align*} Using the induction estimate for $f(y)$ and the nonnegativity of $\mu$, we obtain \begin{align*} f(z) \le \mu \sum_{i=1}^m \alpha_i f(x_i) + \lambda_{m+1}f(x_{m+1}). \end{align*} Substituting $\mu\alpha_i=\lambda_i$ for each $i \in \{1,\ldots,m\}$ yields \begin{align*} f(z) \le \sum_{i=1}^m \lambda_i f(x_i) + \lambda_{m+1}f(x_{m+1}) = \sum_{i=1}^{m+1} \lambda_i f(x_i). \end{align*} Because $z=\sum_{i=1}^{m+1}\lambda_i x_i$, this is precisely the desired inequality for $m+1$. [/step]

[step:Conclude by induction] The assertion holds for $m=1$, and the induction step shows that validity for $m$ implies validity for $m+1$. Therefore, by induction, the assertion holds for every integer $m\ge 1$. [/step]
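The induction is constructive: it builds the weighted point by normalizing the first $m$ weights, recursing, and combining the result with the last point. The Python sketch below mirrors that recursion numerically and checks the two-point inequality at each combination step. It is an illustration under explicit assumptions, not part of the proof: $V$ and $C$ are both taken to be $\mathbb{R}^d$ (points as tuples of floats), $f$ is the squared Euclidean norm, and the helper names `combine`, `jensen_point`, and `sq_norm` are hypothetical. Comparisons use a small floating-point tolerance.

```python
import random
from collections.abc import Callable, Sequence

# Assumption for this sketch: V = C = R^d, with points stored as tuples of floats.
Vector = tuple[float, ...]


def combine(a: float, y: Vector, b: float, x: Vector) -> Vector:
    """Two-point combination a*y + b*x, computed coordinate by coordinate."""
    return tuple(a * yi + b * xi for yi, xi in zip(y, x))


def jensen_point(
    f: Callable[[Vector], float],
    points: Sequence[Vector],
    weights: Sequence[float],
    tol: float = 1e-9,
) -> Vector:
    """Build z = sum_i weights[i] * points[i] following the induction.

    Assumes (without checking) that the weights are nonnegative and sum to 1.
    At each combination step it asserts f(z) <= mu*f(y) + lam*f(x), up to tol.
    """
    if len(points) == 1:
        # Base case m = 1: the single weight equals 1, so the point is x_1.
        return points[0]
    mu = sum(weights[:-1])  # remaining mass on the first m points
    lam_last = weights[-1]
    if mu == 0:
        # Degenerate case mu = 0: all mass sits on the last point.
        return points[-1]
    # Normalize the first m weights and recurse (the induction hypothesis).
    alphas = [w / mu for w in weights[:-1]]
    y = jensen_point(f, points[:-1], alphas, tol)
    # Combine y with the last point via two-point convexity.
    z = combine(mu, y, lam_last, points[-1])
    assert f(z) <= mu * f(y) + lam_last * f(points[-1]) + tol
    return z


def sq_norm(v: Vector) -> float:
    """Squared Euclidean norm, a convex function on all of R^d."""
    return sum(c * c for c in v)


if __name__ == "__main__":
    random.seed(0)
    pts = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(6)]
    raw = [random.random() for _ in pts]
    total = sum(raw)
    w = [r / total for r in raw]

    z = jensen_point(sq_norm, pts, w)
    direct = tuple(sum(wi * p[k] for wi, p in zip(w, pts)) for k in range(3))
    # The recursive construction agrees with the direct weighted sum,
    assert all(abs(a - b) < 1e-12 for a, b in zip(z, direct))
    # and the finite Jensen inequality holds for this sample.
    assert sq_norm(z) <= sum(wi * sq_norm(p) for wi, p in zip(w, pts))
    # Weights with mu = 0 return the last point unchanged.
    assert jensen_point(sq_norm, pts[:3], [0.0, 0.0, 1.0]) == pts[2]
    print("all checks passed")
```

A passing run only confirms the inequality for the sampled data; it illustrates the structure of the proof rather than replacing it. The recursion depth equals the number of points, so very long weight vectors would call for an iterative left-to-right fold instead.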
