Jensen's Inequality — Statement & Proof

Jensen's Inequality (Theorem #9)

Theorem


Discussion

Proof

[proofplan] The argument proceeds in three stages. First, we show that $\mathbb{E}[X]$ lies in the interval $I$, so that $f(\mathbb{E}[X])$ is well-defined. Second, we invoke the [Existence of a Supporting Hyperplane](/theorems/7) at the point $\mathbb{E}[X]$ to obtain an affine minorant $f(y) \geq f(\mathbb{E}[X]) + r(y - \mathbb{E}[X])$ for all $y \in I$, and use this affine lower bound to control the negative part of $f(X)$, guaranteeing that $\mathbb{E}[f(X)]$ is well-defined in $(-\infty, +\infty]$. Third, we substitute $y = X(\omega)$ into the supporting hyperplane inequality, integrate both sides against $\mathbb{P}$, and use linearity of expectation together with the cancellation $\mathbb{E}[X] - \mathbb{E}[X] = 0$ to conclude $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$. [/proofplan]

[step:Show that $\mathbb{E}[X]$ lies in the interval $I$] Since $X: \Omega \to I$, every value of $X$ lies in $I$. Write $a_0 = \inf I$ and $b_0 = \sup I$ (either bound may be infinite). For every $\omega \in \Omega$, the inequality $a_0 \leq X(\omega) \leq b_0$ holds. Integrating against $\mathbb{P}$ and using monotonicity of the [Lebesgue Integral](/pages/1152) together with $\mathbb{P}(\Omega) = 1$ gives
\begin{align*} a_0 = a_0 \cdot \mathbb{P}(\Omega) \leq \mathbb{E}[X] \leq b_0 \cdot \mathbb{P}(\Omega) = b_0, \end{align*}
where an infinite bound is read as the trivial inequality, since $\mathbb{E}[X]$ is finite by integrability of $X$. An endpoint that does not belong to $I$ is not attained by $\mathbb{E}[X]$: if $a_0 \notin I$ is finite, then $X - a_0 > 0$ everywhere, and a strictly positive random variable has strictly positive expectation, so $\mathbb{E}[X] > a_0$; the same argument applied to $b_0 - X$ gives $\mathbb{E}[X] < b_0$ when $b_0 \notin I$ is finite. Since $I$ is an interval, it contains every point strictly between $a_0$ and $b_0$, together with whichever endpoints belong to it. Hence $\mathbb{E}[X] \in I$, so $f(\mathbb{E}[X])$ is defined.

[guided] Why must we check this at all? The function $f: I \to \mathbb{R}$ is defined only on $I$, so we must verify that $\mathbb{E}[X]$ lies in the domain of $f$ before writing $f(\mathbb{E}[X])$. Geometrically, $\mathbb{E}[X]$ is a weighted average of values in $I$, and an interval is a convex set, so any convex combination of its elements remains in $I$.
The integral $\mathbb{E}[X] = \int_\Omega X \, d\mathbb{P}$ is the continuous analogue of such a convex combination (with $\mathbb{P}$ as the weighting measure of total mass $1$). To make this precise, write $a_0 = \inf I$ and $b_0 = \sup I$. Since $X(\omega) \in I$ for every $\omega \in \Omega$, we have $a_0 \leq X(\omega) \leq b_0$ pointwise. Integrating both sides of this inequality against $\mathbb{P}$ and using that $\mathbb{P}(\Omega) = 1$:
\begin{align*} a_0 = a_0 \cdot \mathbb{P}(\Omega) \leq \int_\Omega X \, d\mathbb{P} = \mathbb{E}[X] \leq b_0 \cdot \mathbb{P}(\Omega) = b_0. \end{align*}
If both endpoints belong to $I$, this already gives $\mathbb{E}[X] \in I$. If an endpoint is missing, say $I = (a_0, b_0)$ is open, the non-strict bounds are not enough on their own, but the strict inequalities $a_0 < X(\omega) < b_0$ do carry over to the expectation. Indeed, when $a_0$ is finite the random variable $X - a_0$ is strictly positive, so the events $\{X - a_0 > 1/n\}$ increase to $\Omega$ and one of them, say for $n = n_0$, has positive probability; then $\mathbb{E}[X] - a_0 \geq \tfrac{1}{n_0}\,\mathbb{P}(X - a_0 > 1/n_0) > 0$. The same argument applied to $b_0 - X$ gives $\mathbb{E}[X] < b_0$, and an infinite endpoint causes no difficulty because $\mathbb{E}[X]$ is finite. Since $I$ is an interval, it contains every point strictly between $a_0$ and $b_0$ together with whichever endpoints belong to it, so in all cases $\mathbb{E}[X] \in I$ and $f(\mathbb{E}[X])$ is defined. [/guided] [/step]

[step:Construct an affine minorant of $f$ at $\mathbb{E}[X]$ and show $\mathbb{E}[f(X)]$ is well-defined] Set $m := \mathbb{E}[X] \in I$. Since $f: I \to \mathbb{R}$ is convex, the [Existence of a Supporting Hyperplane](/theorems/7) (applied in dimension $n = 1$) provides a constant $r \in \mathbb{R}$ such that
\begin{align*} f(y) \geq f(m) + r(y - m) \quad \text{for all } y \in I. \end{align*}
Define the affine function
\begin{align*} g: I &\to \mathbb{R} \\ y &\mapsto ry + (f(m) - rm). \end{align*}
Substituting $y = X(\omega)$ gives the pointwise bound $f(X(\omega)) \geq g(X(\omega))$ for every $\omega \in \Omega$. To show that $\mathbb{E}[f(X)]$ is well-defined (i.e., not of the indeterminate form $\infty - \infty$), it suffices to show that the negative part $(f(X))^{-} := \max\{-f(X),\, 0\}$ is integrable.
Since $f(X) \geq g(X)$ pointwise:
\begin{align*} (f(X))^{-} = \max\{-f(X),\, 0\} \leq \max\{-g(X),\, 0\} = (g(X))^{-} \leq |g(X)| \leq |r| \cdot |X| + |f(m) - rm|. \end{align*}
The right-hand side is integrable: $X$ is integrable by hypothesis (so $\mathbb{E}[|X|] < \infty$) and $|f(m) - rm|$ is a finite constant. Therefore $(f(X))^{-}$ is integrable, and $\mathbb{E}[f(X)]$ is well-defined in $(-\infty, +\infty]$.

[guided] The subtlety here is that $f(X)$ may take arbitrarily large positive values (for instance, if $f$ grows quadratically and $X$ has heavy tails), so $\mathbb{E}[f(X)]$ could equal $+\infty$. That is acceptable: the inequality $f(\mathbb{E}[X]) \leq +\infty$ holds trivially. What would be problematic is if the negative part of $f(X)$ also had infinite expectation, making the integral $\int_\Omega f(X) \, d\mathbb{P}$ of the indeterminate form $\infty - \infty$. The supporting hyperplane eliminates this possibility.

Set $m := \mathbb{E}[X] \in I$. Since $f: I \to \mathbb{R}$ is convex, the [Existence of a Supporting Hyperplane](/theorems/7) (applied in one dimension) provides a slope $r \in \mathbb{R}$ such that
\begin{align*} f(y) \geq f(m) + r(y - m) \quad \text{for all } y \in I. \end{align*}
This is the key property of convex functions: they lie above every supporting line. Define the affine minorant
\begin{align*} g: I &\to \mathbb{R} \\ y &\mapsto ry + (f(m) - rm). \end{align*}
Substituting $y = X(\omega)$ for an arbitrary $\omega \in \Omega$, we obtain the pointwise lower bound $f(X(\omega)) \geq g(X(\omega))$. In particular, wherever $f(X)$ is negative, $g(X)$ is at least as negative, so
\begin{align*} (f(X))^{-} = \max\{-f(X),\, 0\} \leq \max\{-g(X),\, 0\} = (g(X))^{-} \leq |g(X)| \leq |r| \cdot |X| + |f(m) - rm|. \end{align*}
The right-hand side is integrable because $X$ is integrable by hypothesis ($\mathbb{E}[|X|] < \infty$) and $|f(m) - rm|$ is a finite constant (since $m \in I$ and $f(m) \in \mathbb{R}$).
This is precisely the step where the integrability hypothesis on $X$ is consumed: without $\mathbb{E}[|X|] < \infty$, even the affine bound $g(X) = rX + c$ (with $c = f(m) - rm$) would fail to be integrable whenever $r \neq 0$, and $\mathbb{E}[f(X)]$ could be undefined. Since $(f(X))^{-}$ is integrable, the extended expectation $\mathbb{E}[f(X)] = \mathbb{E}[(f(X))^{+}] - \mathbb{E}[(f(X))^{-}]$ is well-defined in $(-\infty, +\infty]$. [/guided] [/step]

[step:Integrate the supporting hyperplane inequality to conclude $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$] From the previous step, the pointwise inequality
\begin{align*} f(X(\omega)) \geq f(m) + r(X(\omega) - m) \end{align*}
holds for all $\omega \in \Omega$, where $m = \mathbb{E}[X]$ and $r \in \mathbb{R}$ is the supporting slope from the [Existence of a Supporting Hyperplane](/theorems/7). Since $(f(X))^{-}$ is integrable, the expectation $\mathbb{E}[f(X)]$ is well-defined in $(-\infty, +\infty]$. The right-hand side $f(m) + r(X - m)$ is integrable (it is an affine function of the integrable random variable $X$). Applying the monotonicity of the Lebesgue integral to the pointwise inequality gives
\begin{align*} \mathbb{E}[f(X)] &\geq \mathbb{E}\big[f(m) + r(X - m)\big] \\ &= f(m) + r\,(\mathbb{E}[X] - m) \\ &= f(m) + r\,(m - m) \\ &= f(m) \\ &= f(\mathbb{E}[X]). \end{align*}
The first equality (second line) uses linearity of expectation: $\mathbb{E}[f(m) + r(X - m)] = f(m) + r(\mathbb{E}[X] - m)$, since $f(m)$ and $rm$ are finite constants and $\mathbb{E}$ is linear on integrable random variables. The second equality (third line) uses the definition $m = \mathbb{E}[X]$, which forces the correction term $\mathbb{E}[X] - m$ to vanish. This completes the proof that $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$.

[guided] We now integrate the supporting hyperplane inequality to extract the Jensen bound.
Recall from the previous step: for all $\omega \in \Omega$,
\begin{align*} f(X(\omega)) \geq f(m) + r(X(\omega) - m), \end{align*}
where $m = \mathbb{E}[X]$ and $r \in \mathbb{R}$ is the supporting slope from the [Existence of a Supporting Hyperplane](/theorems/7). The right-hand side is the affine function $g(X(\omega)) = rX(\omega) + (f(m) - rm)$, which is integrable because $X$ is integrable and $f(m) - rm$ is a finite constant. Since $(f(X))^{-}$ is integrable (established in the previous step), the monotonicity of the Lebesgue integral applies: for two [measurable functions](/pages/1234) $h_1 \geq h_2$ with $h_2$ integrable, we have $\int h_1 \, d\mathbb{P} \geq \int h_2 \, d\mathbb{P}$ (with the left side possibly $+\infty$). Applying this with $h_1 = f(X)$ and $h_2 = g(X)$:
\begin{align*} \mathbb{E}[f(X)] &\geq \mathbb{E}\big[f(m) + r(X - m)\big]. \end{align*}
Now we expand the right-hand side using linearity of expectation. Since $f(m)$ and $rm$ are finite constants:
\begin{align*} \mathbb{E}\big[f(m) + r(X - m)\big] &= \mathbb{E}[f(m)] + r\,\mathbb{E}[X] - r\,\mathbb{E}[m] \\ &= f(m) + r\,\mathbb{E}[X] - rm \\ &= f(m) + r\,(m - m) \\ &= f(m) = f(\mathbb{E}[X]). \end{align*}

The heart of the argument is the cancellation $\mathbb{E}[X] - m = 0$ in the third line. This is not a coincidence: it is precisely *why* we placed the supporting hyperplane at $m = \mathbb{E}[X]$ rather than at any other point of $I$. At any other point $x_0 \in I$, with supporting slope $r_{x_0}$, the same argument yields the valid lower bound $\mathbb{E}[f(X)] \geq f(x_0) + r_{x_0}(\mathbb{E}[X] - x_0)$, but the right-hand side is generally *not* equal to $f(\mathbb{E}[X])$: by the supporting inequality at $x_0$ evaluated at $y = \mathbb{E}[X]$, it is at most $f(\mathbb{E}[X])$, so the resulting bound is never stronger. The choice $x_0 = m = \mathbb{E}[X]$ makes the affine correction term vanish upon taking expectations, whatever the slope $r$, converting the supporting hyperplane inequality into the Jensen bound $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$. [/guided] [/step]
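
As a numerical sanity check of the three stages, the following Python sketch evaluates both sides of the inequality for a small discrete example. The distribution (`values`, `probs`), the choice $f = \exp$, and the helper names `expectation` and `g` are illustrative assumptions and not part of the theorem; the slope $r = f'(m)$ is used because, for a differentiable convex $f$, the tangent line at $m$ is a supporting line.

```python
import math

# Hypothetical example data: a discrete random variable X with
# P(X = values[i]) = probs[i]. These numbers are purely illustrative.
values = [0.5, 1.0, 2.0, 4.0]
probs = [0.1, 0.4, 0.3, 0.2]


def expectation(h, values, probs):
    """Return E[h(X)] for the discrete distribution above."""
    return sum(p * h(x) for x, p in zip(values, probs))


f = math.exp   # convex on all of R (assumed choice of f)
df = math.exp  # derivative of f, used to pick a supporting slope

# Stage 1: m = E[X] lies between the smallest and largest value of X.
m = expectation(lambda x: x, values, probs)
in_interval = min(values) <= m <= max(values)

# Stage 2: affine minorant g(y) = f(m) + r (y - m) with r = f'(m).
r = df(m)


def g(y):
    return f(m) + r * (y - m)


minorant_holds = all(f(x) >= g(x) for x in values)

# Stage 3: integrate the pointwise inequality f(X) >= g(X).
lhs = f(m)                                   # f(E[X])
affine_mean = expectation(g, values, probs)  # E[g(X)], equals f(m) up to rounding
rhs = expectation(f, values, probs)          # E[f(X)]

print("E[X] in interval:", in_interval)
print("f >= g on the support:", minorant_holds)
print("E[g(X)] matches f(E[X]):", math.isclose(affine_mean, lhs))
print("Jensen holds:", lhs <= rhs)
```

The check that `affine_mean` agrees with `lhs` mirrors the cancellation $\mathbb{E}[X] - m = 0$: the affine minorant integrates to exactly $f(m)$, and monotonicity then gives `lhs <= rhs`.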

Prerequisites

Theorems

Existence of a Supporting Hyperplane for Convex Functions

Definitions & Concepts
