Jensen's Inequality — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] The proof pivots on the supporting hyperplane theorem for convex functions: at the point $\mu := \mathbb{E}[Z] \in \mathbb{R}^d$, convexity guarantees the existence of a subgradient $c \in \mathbb{R}^d$ such that the affine function $x \mapsto f(\mu) + c \cdot (x - \mu)$ lies everywhere below $f$. Substituting the random variable $Z$ into this global inequality yields a pointwise assertion on $\Omega$. Taking expectations of both sides, the linear term $c \cdot (\mathbb{E}[Z] - \mu)$ vanishes by definition of $\mu$, and what remains is exactly $\mathbb{E}[f(Z)] \geq f(\mathbb{E}[Z])$. [/proofplan] [step:Apply the supporting hyperplane theorem at $\mu := \mathbb{E}[Z]$ to produce a global affine lower bound for $f$] Since $f: \mathbb{R}^d \to \mathbb{R}$ is convex, by [Convex Functions Are Locally Lipschitz](/theorems/3086) it is continuous at every point of $\mathbb{R}^d$, hence Borel measurable. Since $Z$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}^d))$-measurable, the composition $f \circ Z: \Omega \to \mathbb{R}$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable, confirming that $f(Z)$ is a well-defined real-valued random variable. Set $\mu := \mathbb{E}[Z] \in \mathbb{R}^d$, which is finite since $\mathbb{E}|Z| < \infty$ by hypothesis. Applying the [Existence of a Supporting Hyperplane for Convex Functions](/theorems/7) to $f$ at the point $\mu$, there exists a vector $c \in \mathbb{R}^d$ such that \begin{align*} f(x) \geq f(\mu) + c \cdot (x - \mu) \quad \text{for all } x \in \mathbb{R}^d, \end{align*} where $c \cdot (x - \mu) = \sum_{i=1}^d c_i(x_i - \mu_i)$ is the standard Euclidean inner product. [guided] We want a global lower bound for $f$ that can be integrated against $\mathbb{P}$. The central observation is that convexity is precisely the condition that guarantees such a bound exists: any convex function $f: \mathbb{R}^d \to \mathbb{R}$ admits a supporting affine minorant at every point. This is the content of the [Existence of a Supporting Hyperplane for Convex Functions](/theorems/7). Before applying the theorem, we verify that $f(Z)$ is a well-defined random variable so that expectations of it are meaningful. By [Convex Functions Are Locally Lipschitz](/theorems/3086), $f$ is locally Lipschitz on $\mathbb{R}^d$, hence continuous everywhere. A continuous map between metric spaces sends open sets to Borel sets (and preimages of open sets are open), so $f$ is $(\mathcal{B}(\mathbb{R}^d), \mathcal{B}(\mathbb{R}))$-measurable. Since $Z: \Omega \to \mathbb{R}^d$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}^d))$-measurable, the composition $f \circ Z: \Omega \to \mathbb{R}$ is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable. Hence $f(Z)$ is a well-defined real-valued random variable on $(\Omega, \mathcal{F}, \mathbb{P})$, and $\mathbb{E}[f(Z)]$ is meaningful. We set $\mu := \mathbb{E}[Z]$. Since $\mathbb{E}|Z| < \infty$ by hypothesis, each component satisfies $\mathbb{E}|Z_i| \leq \mathbb{E}|Z| < \infty$, so $\mu_i = \mathbb{E}[Z_i] \in \mathbb{R}$ for each $i = 1, \dots, d$, giving $\mu \in \mathbb{R}^d$. We now apply the [Existence of a Supporting Hyperplane for Convex Functions](/theorems/7) at $\mu$. That theorem requires: (i) $f$ is convex — given; (ii) $\mu \in \mathbb{R}^d$ — verified above. The conclusion supplies a subgradient $c \in \mathbb{R}^d$ such that \begin{align*} f(x) \geq f(\mu) + c \cdot (x - \mu) \quad \text{for all } x \in \mathbb{R}^d. \end{align*} The affine function $x \mapsto f(\mu) + c \cdot (x - \mu)$ is a global lower bound for $f$ that achieves equality at $x = \mu$. It is this global bound that carries the geometric content of convexity into an analytic inequality between expectations. [/guided] [/step] [step:Substitute $Z$ into the affine lower bound, integrate both sides, and identify $c \cdot (\mathbb{E}[Z] - \mu) = 0$ to conclude] Substituting $x = Z(\omega)$ into the supporting hyperplane inequality — which holds for every $x \in \mathbb{R}^d$ — gives the pointwise inequality \begin{align*} f(Z(\omega)) \geq f(\mu) + c \cdot (Z(\omega) - \mu) \quad \text{for every } \omega \in \Omega. \end{align*} The right-hand side belongs to $L^1(\Omega, \mathcal{F}, \mathbb{P})$: since $|c \cdot Z| \leq |c||Z|$ by the Cauchy-Schwarz inequality and $|c| \in [0, \infty)$ is a finite constant, \begin{align*} \mathbb{E}|c \cdot Z| \leq |c|\,\mathbb{E}|Z| < \infty, \end{align*} where the last inequality uses $\mathbb{E}|Z| < \infty$ by hypothesis; the constants $f(\mu)$ and $c \cdot \mu$ are finite. The left-hand side satisfies $\mathbb{E}|f(Z)| < \infty$ by hypothesis. By monotonicity and linearity of expectation ([Properties of Expectation](/theorems/1117)): \begin{align*} \mathbb{E}[f(Z)] \geq \mathbb{E}\bigl[f(\mu) + c \cdot (Z - \mu)\bigr] = f(\mu) + c \cdot \bigl(\mathbb{E}[Z] - \mu\bigr) = f(\mu) + c \cdot (\mu - \mu) = f(\mu) = f(\mathbb{E}[Z]). \end{align*} This completes the proof. [guided] The supporting hyperplane inequality $f(x) \geq f(\mu) + c \cdot (x - \mu)$ holds for every $x \in \mathbb{R}^d$. Since $Z(\omega) \in \mathbb{R}^d$ for every $\omega \in \Omega$, substituting $x = Z(\omega)$ is valid at each $\omega$: \begin{align*} f(Z(\omega)) \geq f(\mu) + c \cdot (Z(\omega) - \mu) \quad \text{for every } \omega \in \Omega. \end{align*} This is a deterministic inequality holding at every point of $\Omega$, not merely $\mathbb{P}$-almost surely. We wish to integrate both sides over $(\Omega, \mathcal{F}, \mathbb{P})$. To apply monotonicity of expectation from [Properties of Expectation](/theorems/1117), both sides must be in $L^1(\Omega, \mathcal{F}, \mathbb{P})$. We verify this for each side in turn. **Left side.** By hypothesis, $\mathbb{E}|f(Z)| < \infty$, so $f(Z) \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. **Right side.** We write $f(\mu) + c \cdot (Z - \mu) = f(\mu) + c \cdot Z - c \cdot \mu$. Here $f(\mu) \in \mathbb{R}$ (finite, since $\mu \in \mathbb{R}^d$) and $c \cdot \mu = \sum_{i=1}^d c_i \mu_i \in \mathbb{R}$ (finite, since $c, \mu \in \mathbb{R}^d$) are constants. For the term $c \cdot Z = \sum_{i=1}^d c_i Z_i$, the Cauchy-Schwarz inequality for the Euclidean inner product gives $|c \cdot Z| \leq |c||Z|$. Since $|c| \in [0, \infty)$ is a finite constant, \begin{align*} \mathbb{E}|c \cdot Z| \leq |c|\,\mathbb{E}|Z| < \infty, \end{align*} where the final inequality uses $\mathbb{E}|Z| < \infty$ by hypothesis. Hence $c \cdot Z \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, and the entire right-hand side is integrable. Both sides now confirmed to be in $L^1(\Omega, \mathcal{F}, \mathbb{P})$. Since $f(Z(\omega)) \geq f(\mu) + c \cdot (Z(\omega) - \mu)$ holds for every $\omega \in \Omega$, monotonicity of expectation from [Properties of Expectation](/theorems/1117) gives \begin{align*} \mathbb{E}[f(Z)] \geq \mathbb{E}\bigl[f(\mu) + c \cdot (Z - \mu)\bigr]. \end{align*} By linearity of expectation ([Properties of Expectation](/theorems/1117)), distributing over the integrable summands: \begin{align*} \mathbb{E}\bigl[f(\mu) + c \cdot (Z - \mu)\bigr] = f(\mu) + c \cdot \bigl(\mathbb{E}[Z] - \mu\bigr). \end{align*} Linearity applies to $c \cdot Z = \sum_{i=1}^d c_i Z_i$ because each $Z_i \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ (following from $\mathbb{E}|Z_i| \leq \mathbb{E}|Z| < \infty$), and $\mathbb{E}[c_i Z_i] = c_i \mathbb{E}[Z_i]$ for each $i$. By definition of $\mu$, we have $\mathbb{E}[Z] = \mu$, so $\mathbb{E}[Z] - \mu = 0 \in \mathbb{R}^d$ and therefore $c \cdot (\mathbb{E}[Z] - \mu) = 0$. The chain of inequalities gives \begin{align*} \mathbb{E}[f(Z)] \geq f(\mu) + 0 = f(\mu) = f(\mathbb{E}[Z]), \end{align*} which is Jensen's inequality. [/guided] [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Jensen's Inequality (Theorem # 1977)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Jensen's Inequality (Theorem # 1977)

Discussion

Proof

Explore Further