[proofplan]
We prove Jensen's inequality in three steps. First, we show that the mean $m := \frac{1}{\mu(E)}\int_{E} g \, d\mu(\omega)$ lies in the interval $I$, so that $f(m)$ is well-defined. Second, we invoke the [Existence of Supporting Hyperplane](/theorems/7) at $m$ to produce an affine minorant $t \mapsto f(m) + r(t - m)$ lying below $f$ on all of $I$. Third, we substitute $t = g(\omega)$, integrate the resulting pointwise inequality against the normalized measure $\mu / \mu(E)$, and observe that the affine correction term vanishes by the definition of $m$, yielding the desired inequality.
[/proofplan]
[step:Show that the mean value $m$ lies in $I$ so that $f(m)$ is well-defined]
Since $0 < \mu(E) < \infty$, define the mean
\begin{align*}
m := \frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega).
\end{align*}
Since $g: E \to I$ and $I$ is an interval, we have $g(\omega) \in I$ for every $\omega \in E$. Let $a \leq b$ (possibly infinite) be the endpoints of $I$. Then $a \leq g(\omega) \leq b$ for all $\omega \in E$. Integrating this inequality against the probability measure $\tilde{\mu} := \mu / \mu(E)$ on the [measure space](/pages/1251) $(E, \mathcal{E})$ and applying monotonicity of the integral gives
\begin{align*}
a = \int_{E} a \, d\tilde{\mu}(\omega) \leq \int_{E} g \, d\tilde{\mu}(\omega) = m \leq \int_{E} b \, d\tilde{\mu}(\omega) = b.
\end{align*}
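If an endpoint does not belong to $I$, the corresponding inequality is strict. Indeed, if $a \notin I$ is finite, then $g - a > 0$ on $E$, and a strictly positive function has strictly positive integral over a set of positive measure, so $m - a = \int_{E} (g - a) \, d\tilde{\mu}(\omega) > 0$; if $a = -\infty$, then $m > a$ because $m$ is finite ($g$ being integrable). The same argument applies to $b$. Hence $m \in I$, and $f(m)$ is well-defined.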
[guided]
The entire proof rests on evaluating $f$ at the point $m$, so we must first verify that $m \in I$. If $m$ fell outside $I$, the expression $f(m)$ would be undefined and the inequality could not even be stated.
Define the normalized measure
\begin{align*}
\tilde{\mu} := \frac{\mu}{\mu(E)}.
\end{align*}
Since $\mu(E) \in (0, \infty)$ by hypothesis, $\tilde{\mu}$ is a well-defined probability measure on $(E, \mathcal{E})$, and
\begin{align*}
m = \int_{E} g \, d\tilde{\mu}(\omega).
\end{align*}
Why must $m$ lie in $I$? The key fact is that integration against a probability measure cannot move the result outside the convex hull of the values being integrated. Since $g(\omega) \in I$ for every $\omega \in E$ and $I$ is an interval (hence convex), the convex hull of $g(E)$ is contained in $I$. Concretely, let $a \leq b$ be the endpoints of $I$. Then $a \leq g(\omega) \leq b$ for all $\omega \in E$. Integrating this pair of inequalities against $\tilde{\mu}$ and using monotonicity of the integral gives
\begin{align*}
a = \int_{E} a \, d\tilde{\mu}(\omega) \leq \int_{E} g \, d\tilde{\mu}(\omega) = m \leq \int_{E} b \, d\tilde{\mu}(\omega) = b.
\end{align*}
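Hence $a \leq m \leq b$. When $I$ is not closed this is not quite enough, since an endpoint may fail to belong to $I$; in that case the corresponding inequality is strict. For instance, if $a \notin I$ is finite, then $g - a > 0$ on all of $E$, and since $\tilde{\mu}(E) = 1 > 0$ this gives $m - a = \int_{E} (g - a) \, d\tilde{\mu}(\omega) > 0$; an infinite endpoint is excluded because $m$ is finite. Therefore $m \in I$, and $f(m)$ is well-defined. This is the measure-theoretic version of the fact that a convex combination of points in a convex set remains in that set.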
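As a concrete check (an illustrative example, not part of the hypotheses), take $E = (0, 1]$ with Lebesgue measure $\mu$, $I = (0, 1]$, and $g(\omega) = \omega$. Then $\mu(E) = 1$, the endpoints of $I$ are $a = 0 \notin I$ and $b = 1 \in I$, and
\begin{align*}
m = \frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega) = \int_{(0,1]} \omega \, d\mu(\omega) = \frac{1}{2} \in I.
\end{align*}
The lower bound is strict here, $m > 0$, because $g$ is strictly positive on a set of positive measure.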
[/guided]
[/step]
[step:Obtain an affine minorant of $f$ at $m$ via the supporting hyperplane]
Since $f: I \to \mathbb{R}$ is convex and $m \in I$, the Existence of Supporting Hyperplane (theorem 7) guarantees the existence of a constant $r \in \mathbb{R}$ (a subgradient of $f$ at $m$) such that
\begin{align*}
f(t) \geq f(m) + r(t - m) \quad \text{for all } t \in I.
\end{align*}
This is the one-dimensional supporting hyperplane inequality: the graph of $f$ lies on or above the affine function $t \mapsto f(m) + r(t - m)$.
[guided]
The Existence of Supporting Hyperplane (theorem 7) is the central ingredient of the proof. What does it say? For a convex function $f: I \to \mathbb{R}$ and any point $m \in I$, there exists a real number $r$ such that the affine function $\ell(t) := f(m) + r(t - m)$ satisfies $f(t) \geq \ell(t)$ for all $t \in I$. Geometrically, the graph of $f$ lies on or above the line through $(m, f(m))$ with slope $r$.
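Where does $r$ come from? At any interior point $m$ of $I$, the left and right derivatives $f'_{-}(m)$ and $f'_{+}(m)$ both exist and are finite (this is a standard property of convex functions on intervals) and satisfy $f'_{-}(m) \leq f'_{+}(m)$. Any value $r \in [f'_{-}(m), f'_{+}(m)]$ serves as a subgradient. If $m$ is an endpoint of $I$, say the left endpoint $m = a \in I$, the right derivative $f'_{+}(a)$ exists in $[-\infty, \infty)$, and the supporting inequality holds with $r = f'_{+}(a)$ whenever this value is finite. In the remaining case $f'_{+}(a) = -\infty$ (for example $f(t) = -\sqrt{t}$ at $a = 0$) no finite $r$ works, but then no supporting line is needed: $g \geq a$ on $E$ and $\int_{E} (g - a) \, d\tilde{\mu}(\omega) = m - a = 0$ force $g = a$ $\mu$-almost everywhere, so both sides of Jensen's inequality equal $f(a)$.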
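For a concrete instance of the interior case (an illustrative example, not taken from the cited theorem), let $I = \mathbb{R}$, $f(t) = |t|$, and $m = 0$. The one-sided derivatives are $f'_{-}(0) = -1$ and $f'_{+}(0) = 1$, so every $r \in [-1, 1]$ should give a supporting line, and indeed
\begin{align*}
f(m) + r(t - m) = rt \leq |r| \, |t| \leq |t| = f(t) \quad \text{for all } t \in \mathbb{R}.
\end{align*}
A slope with $|r| > 1$ fails: at $t = r / |r|$ the line takes the value $rt = |r| > 1 = f(t)$.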
The hypothesis that $f$ is convex is consumed here: without convexity, a supporting affine minorant need not exist. The plan for the next step is to evaluate the following pointwise inequality at $t = g(\omega)$ and integrate:
\begin{align*}
f(t) \geq f(m) + r(t - m) \quad \text{for all } t \in I.
\end{align*}
[/guided]
[/step]
[step:Substitute $t = g(\omega)$, integrate, and conclude by the definition of $m$]
For every $\omega \in E$, we have $g(\omega) \in I$. Substituting $t = g(\omega)$ into the supporting hyperplane inequality from the previous step gives
\begin{align*}
f(g(\omega)) \geq f(m) + r\bigl(g(\omega) - m\bigr) \quad \text{for all } \omega \in E.
\end{align*}
Both sides are $\mu$-[integrable](/pages/1152): the left-hand side $f \circ g$ is integrable by hypothesis, and the right-hand side $\omega \mapsto f(m) + r(g(\omega) - m)$ is an affine function of the integrable function $g$ with finite constants $f(m), r, m$, hence integrable (constants are integrable because $\mu(E) < \infty$). Integrating both sides against the normalized measure $\tilde{\mu} = \mu / \mu(E)$ and applying monotonicity of the integral (which preserves the direction of a pointwise inequality between integrable functions) gives
\begin{align*}
\frac{1}{\mu(E)} \int_{E} f(g(\omega)) \, d\mu(\omega) &\geq \frac{1}{\mu(E)} \int_{E} \bigl[f(m) + r(g(\omega) - m)\bigr] \, d\mu(\omega).
\end{align*}
By linearity of the integral, the right-hand side expands as
\begin{align*}
\frac{1}{\mu(E)} \int_{E} \bigl[f(m) + r(g(\omega) - m)\bigr] \, d\mu(\omega) &= f(m) \cdot \frac{\mu(E)}{\mu(E)} + r\!\left(\frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega) - m\right) \\
&= f(m) + r(m - m) \\
&= f(m).
\end{align*}
In the second equality, we used the definition $m = \frac{1}{\mu(E)}\int_{E} g \, d\mu(\omega)$, so the correction term $r(m - m)$ vanishes. Combining gives
\begin{align*}
f\!\left(\frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega)\right) = f(m) \leq \frac{1}{\mu(E)} \int_{E} f \circ g \, d\mu(\omega),
\end{align*}
which is the desired Jensen's inequality.
[guided]
This is the step where the definition of $m$ pays off. We substitute $t = g(\omega)$ into the affine minorant inequality to obtain a pointwise bound valid for every $\omega \in E$:
\begin{align*}
f(g(\omega)) \geq f(m) + r\bigl(g(\omega) - m\bigr) \quad \text{for all } \omega \in E.
\end{align*}
Before integrating, we must verify that both sides are $\mu$-integrable so that monotonicity of the integral applies. The left-hand side $f \circ g$ is integrable by hypothesis. For the right-hand side, the function $\omega \mapsto f(m) + r(g(\omega) - m)$ is an affine function of $g(\omega)$. Since $g$ is $\mu$-integrable by hypothesis, $f(m)$, $r$, and $m$ are finite real constants, and constants are integrable because $\mu(E) < \infty$, this affine combination is $\mu$-integrable. With both sides integrable, monotonicity of the integral preserves the pointwise inequality upon integration against $\tilde{\mu} = \mu / \mu(E)$:
\begin{align*}
\frac{1}{\mu(E)} \int_{E} f(g(\omega)) \, d\mu(\omega) \geq \frac{1}{\mu(E)} \int_{E} \bigl[f(m) + r(g(\omega) - m)\bigr] \, d\mu(\omega).
\end{align*}
Now we expand the right-hand side using linearity of the integral. The constant term $f(m)$ integrates to $f(m) \cdot \tilde{\mu}(E) = f(m) \cdot 1 = f(m)$. The linear term becomes $r \cdot \left(\frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega) - m\right)$. Here is the critical cancellation: by the very definition of $m$ as the mean of $g$, we have $\frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega) = m$, so the linear term equals $r \cdot (m - m) = 0$. This is not a coincidence: the mean $m$ is precisely the point at which the linear correction integrates to zero against the normalized measure.
Combining:
\begin{align*}
\frac{1}{\mu(E)} \int_{E} f(g(\omega)) \, d\mu(\omega) &\geq f(m) + r(m - m) = f(m) \\
&= f\!\left(\frac{1}{\mu(E)} \int_{E} g \, d\mu(\omega)\right).
\end{align*}
This completes the proof of Jensen's inequality for finite measure spaces.
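To see the three steps in a concrete case (an illustrative choice, not part of the theorem's hypotheses), take $E = [0, 1]$ with Lebesgue measure $\mu$, $I = \mathbb{R}$, $g(\omega) = \omega$, and $f(t) = t^2$. Then $\mu(E) = 1$, $m = \tfrac{1}{2}$, and $r = f'(m) = 1$ is a subgradient at $m$. The pointwise gap between $f \circ g$ and the affine minorant is a perfect square, and integrating it recovers exactly the gap in Jensen's inequality:
\begin{align*}
f(g(\omega)) - f(m) - r\bigl(g(\omega) - m\bigr) &= \omega^2 - \tfrac{1}{4} - \bigl(\omega - \tfrac{1}{2}\bigr) = \bigl(\omega - \tfrac{1}{2}\bigr)^2 \geq 0, \\
\frac{1}{\mu(E)} \int_{E} f \circ g \, d\mu(\omega) - f(m) &= \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12} = \int_{0}^{1} \bigl(\omega - \tfrac{1}{2}\bigr)^2 \, d\omega.
\end{align*}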
[/guided]
[/step]