[proofplan]
The argument proceeds in three stages. First, we show that $\mathbb{E}[X]$ lies in the interval $I$, so that $f(\mathbb{E}[X])$ is well-defined. Second, we invoke the [Existence of a Supporting Hyperplane](/theorems/7) at the point $\mathbb{E}[X]$ to obtain an affine minorant $f(y) \geq f(\mathbb{E}[X]) + r(y - \mathbb{E}[X])$ for all $y \in I$, and use this affine lower bound to control the negative part of $f(X)$, guaranteeing that $\mathbb{E}[f(X)]$ is well-defined in $(-\infty, +\infty]$. Third, we substitute $y = X(\omega)$ into the supporting hyperplane inequality, integrate both sides against $\mathbb{P}$, and use linearity of expectation together with the cancellation $\mathbb{E}[X] - \mathbb{E}[X] = 0$ to conclude $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$.
[/proofplan]
[step:Show that $\mathbb{E}[X]$ lies in the interval $I$]
Since $X: \Omega \to I$, every value of $X$ lies in $I$. Write $a_0 = \inf I$ and $b_0 = \sup I$, allowing $a_0 = -\infty$ and $b_0 = +\infty$. For every $\omega \in \Omega$, the inequality $a_0 \leq X(\omega) \leq b_0$ holds. Integrating against $\mathbb{P}$ and using monotonicity of the [Lebesgue Integral](/pages/1152) together with $\mathbb{P}(\Omega) = 1$ gives
\begin{align*}
a_0 = a_0 \cdot \mathbb{P}(\Omega) \leq \mathbb{E}[X] \leq b_0 \cdot \mathbb{P}(\Omega) = b_0.
\end{align*}
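These weak inequalities only place $\mathbb{E}[X]$ in the closure of $I$, so the endpoints need a separate check. An infinite endpoint is never attained, because $\mathbb{E}[X]$ is finite for integrable $X$. If a finite endpoint is excluded from $I$, say $a_0 \notin I$, then $X - a_0 > 0$ on all of $\Omega$, and a strictly positive random variable has strictly positive expectation, so $\mathbb{E}[X] > a_0$; the same argument applies to $b_0$. Since $I$ is an interval, it contains every point strictly between $a_0$ and $b_0$ together with whichever endpoints belong to it. Hence $\mathbb{E}[X] \in I$, so $f(\mathbb{E}[X])$ is defined.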
[guided]
Why must we check this at all? The function $f: I \to \mathbb{R}$ is defined only on $I$, so we must verify that $\mathbb{E}[X]$ lies in the domain of $f$ before writing $f(\mathbb{E}[X])$. Geometrically, $\mathbb{E}[X]$ is a weighted average of values in $I$, and an interval is a convex set, so any convex combination of its elements remains in $I$. The integral $\mathbb{E}[X] = \int_\Omega X \, d\mathbb{P}$ is precisely such a convex combination (with $\mathbb{P}$ as the weighting measure of total mass $1$).
To make this precise, write $a_0 = \inf I$ and $b_0 = \sup I$. Since $X(\omega) \in I$ for every $\omega \in \Omega$, we have $a_0 \leq X(\omega) \leq b_0$ pointwise. Integrating both sides of this inequality against $\mathbb{P}$ and using that $\mathbb{P}(\Omega) = 1$:
\begin{align*}
a_0 = a_0 \cdot \mathbb{P}(\Omega) \leq \int_\Omega X \, d\mathbb{P} = \mathbb{E}[X] \leq b_0 \cdot \mathbb{P}(\Omega) = b_0.
\end{align*}
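This alone places $\mathbb{E}[X]$ only in the closure $[a_0, b_0]$, so the endpoints need a separate check. An infinite endpoint is never attained, because $\mathbb{E}[X]$ is finite for integrable $X$, and a finite endpoint that belongs to $I$ causes no difficulty.
The remaining case is a finite endpoint that $I$ excludes, for instance when $I = (a_0, b_0)$ is open. Here the strict pointwise inequalities $a_0 < X(\omega) < b_0$ do upgrade to strict inequalities for the mean: $X - a_0$ is strictly positive on all of $\Omega$, and a nonnegative random variable with zero expectation vanishes almost surely, so $\mathbb{E}[X - a_0] > 0$, that is, $\mathbb{E}[X] > a_0$. The same argument gives $\mathbb{E}[X] < b_0$. In every case $\mathbb{E}[X] \in I$, so $f(\mathbb{E}[X])$ is defined.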
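The passage from strict pointwise positivity to a strictly positive mean can be checked directly; what follows is a minimal sketch for a finite left endpoint $a_0 \notin I$. The names $Y$ and $A_n$ are introduced here for illustration only and do not appear elsewhere in the proof. Put $Y := X - a_0$, so that $Y(\omega) > 0$ for every $\omega \in \Omega$, and let $A_n := \{\omega \in \Omega : Y(\omega) \geq 1/n\}$. The events $A_n$ increase to $\Omega$, so continuity of $\mathbb{P}$ from below gives $\mathbb{P}(A_n) \to 1$, and in particular $\mathbb{P}(A_n) > 0$ for some $n$. For such an $n$,
\begin{align*}
\mathbb{E}[X] - a_0 = \mathbb{E}[Y] \geq \mathbb{E}\big[Y \mathbf{1}_{A_n}\big] \geq \frac{1}{n}\,\mathbb{P}(A_n) > 0.
\end{align*}
The symmetric argument with $Y := b_0 - X$ handles a finite right endpoint $b_0 \notin I$.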
[/guided]
[/step]
[step:Construct an affine minorant of $f$ at $\mathbb{E}[X]$ and show $\mathbb{E}[f(X)]$ is well-defined]
Set $m := \mathbb{E}[X] \in I$. Since $f: I \to \mathbb{R}$ is convex, the [Existence of a Supporting Hyperplane](/theorems/7) (applied in dimension $n = 1$) provides a constant $r \in \mathbb{R}$ such that
\begin{align*}
f(y) \geq f(m) + r(y - m) \quad \text{for all } y \in I.
\end{align*}
Define the affine function
\begin{align*}
g: I &\to \mathbb{R} \\
y &\mapsto ry + (f(m) - rm).
\end{align*}
Substituting $y = X(\omega)$ gives the pointwise bound $f(X(\omega)) \geq g(X(\omega))$ for every $\omega \in \Omega$. To show that $\mathbb{E}[f(X)]$ is well-defined (i.e., not of the indeterminate form $\infty - \infty$), it suffices to show that the negative part $(f(X))^{-} := \max\{-f(X),\, 0\}$ is integrable. Since $f(X) \geq g(X)$ pointwise:
\begin{align*}
(f(X))^{-} = \max\{-f(X),\, 0\} \leq \max\{-g(X),\, 0\} = (g(X))^{-} \leq |g(X)| \leq |r| \cdot |X| + |f(m) - rm|.
\end{align*}
The right-hand side is integrable: $X$ is integrable by hypothesis (so $\mathbb{E}[|X|] < \infty$) and $|f(m) - rm|$ is a finite constant. Therefore $(f(X))^{-}$ is integrable, and $\mathbb{E}[f(X)]$ is well-defined in $(-\infty, +\infty]$.
[guided]
The subtlety here is that $f(X)$ may take arbitrarily large positive values (for instance, if $f$ grows quadratically and $X$ has heavy tails), so $\mathbb{E}[f(X)]$ could equal $+\infty$. That is acceptable: the inequality $f(\mathbb{E}[X]) \leq +\infty$ holds trivially. What would be problematic is if the positive and negative parts of $f(X)$ both had infinite expectation, making the integral $\int_\Omega f(X) \, d\mathbb{P}$ of the indeterminate form $\infty - \infty$.
The supporting hyperplane eliminates this possibility. Set $m := \mathbb{E}[X] \in I$. Since $f: I \to \mathbb{R}$ is convex, the [Existence of a Supporting Hyperplane](/theorems/7) (applied in one dimension) provides a slope $r \in \mathbb{R}$ such that
\begin{align*}
f(y) \geq f(m) + r(y - m) \quad \text{for all } y \in I.
\end{align*}
This is the key property of convex functions: they lie above every supporting line. Define the affine minorant
\begin{align*}
g: I &\to \mathbb{R} \\
y &\mapsto ry + (f(m) - rm).
\end{align*}
Substituting $y = X(\omega)$ for an arbitrary $\omega \in \Omega$, we obtain the pointwise lower bound $f(X(\omega)) \geq g(X(\omega))$. In particular, wherever $f(X)$ is negative, $g(X)$ is at least as negative, so
\begin{align*}
(f(X))^{-} = \max\{-f(X),\, 0\} \leq \max\{-g(X),\, 0\} = (g(X))^{-} \leq |g(X)| \leq |r| \cdot |X| + |f(m) - rm|.
\end{align*}
The right-hand side is integrable because $X$ is integrable by hypothesis ($\mathbb{E}[|X|] < \infty$) and $|f(m) - rm|$ is a finite constant (since $m \in I$ and $f(m) \in \mathbb{R}$). This is precisely the step where the integrability hypothesis on $X$ is consumed: without $\mathbb{E}[|X|] < \infty$, the affine bound $g(X) = rX + c$ with $c := f(m) - rm$ would fail to be integrable whenever $r \neq 0$, and $\mathbb{E}[f(X)]$ could be undefined.
Since $(f(X))^{-}$ is integrable, the extended expectation $\mathbb{E}[f(X)] = \mathbb{E}[(f(X))^{+}] - \mathbb{E}[(f(X))^{-}]$ is well-defined in $(-\infty, +\infty]$.
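As a worked instance of this bound, consider the illustrative choice (an assumption made here, not part of the general proof) $I = (0, \infty)$ and $f(y) = -\log y$, which is convex and takes negative values for $y > 1$. Because this $f$ is differentiable, the supporting slope at $m$ can be taken to be $r = f'(m) = -1/m$, and the supporting inequality reduces to the elementary estimate $\log t \leq t - 1$ with $t = y/m$:
\begin{align*}
-\log y \;\geq\; -\log m - \frac{1}{m}(y - m) \;=\; -\frac{y}{m} + (1 - \log m) \quad \text{for all } y > 0.
\end{align*}
Here $f(m) - rm = 1 - \log m$, so the general bound on the negative part specialises to
\begin{align*}
(-\log X)^{-} = \max\{\log X,\, 0\} \;\leq\; \frac{X}{m} + |1 - \log m|,
\end{align*}
whose right-hand side is integrable as soon as $\mathbb{E}[X] < \infty$. In this example $\mathbb{E}[-\log X]$ may still equal $+\infty$ (when $X$ concentrates enough mass near $0$), but it can never be of the form $\infty - \infty$.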
[/guided]
[/step]
[step:Integrate the supporting hyperplane inequality to conclude $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$]
From the previous step, the pointwise inequality
\begin{align*}
f(X(\omega)) \geq f(m) + r(X(\omega) - m)
\end{align*}
holds for all $\omega \in \Omega$, where $m = \mathbb{E}[X]$ and $r \in \mathbb{R}$ is the supporting slope from the [Existence of a Supporting Hyperplane](/theorems/7). Since $(f(X))^{-}$ is integrable, the expectation $\mathbb{E}[f(X)]$ is well-defined in $(-\infty, +\infty]$. The right-hand side $f(m) + r(X - m)$ is integrable (it is an affine function of the integrable random variable $X$). Applying the monotonicity of the Lebesgue integral to the pointwise inequality gives
\begin{align*}
\mathbb{E}[f(X)] &\geq \mathbb{E}\big[f(m) + r(X - m)\big] \\
&= f(m) + r\,(\mathbb{E}[X] - m) \\
&= f(m) + r\,(m - m) \\
&= f(m) \\
&= f(\mathbb{E}[X]).
\end{align*}
The second line uses linearity of expectation: $\mathbb{E}[f(m) + r(X - m)] = f(m) + r(\mathbb{E}[X] - m)$, since $f(m)$ and $rm$ are finite constants and $\mathbb{E}$ is linear on integrable random variables. The third line uses the definition $m = \mathbb{E}[X]$, which forces the correction term $\mathbb{E}[X] - m$ to vanish. This completes the proof that $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$.
[guided]
We now integrate the supporting hyperplane inequality to extract the Jensen bound. Recall from the previous step: for all $\omega \in \Omega$,
\begin{align*}
f(X(\omega)) \geq f(m) + r(X(\omega) - m),
\end{align*}
where $m = \mathbb{E}[X]$ and $r \in \mathbb{R}$ is the supporting slope from the [Existence of a Supporting Hyperplane](/theorems/7). The right-hand side is the affine function $g(X(\omega)) = rX(\omega) + (f(m) - rm)$, which is integrable because $X$ is integrable and $f(m) - rm$ is a finite constant. Since $(f(X))^{-}$ is integrable (established in the previous step), the monotonicity of the Lebesgue integral applies: for two [measurable functions](/pages/1234) $h_1 \geq h_2$ with $h_2$ integrable, we have $\int h_1 \, d\mathbb{P} \geq \int h_2 \, d\mathbb{P}$ (with the left side possibly $+\infty$). Applying this with $h_1 = f(X)$ and $h_2 = g(X)$:
\begin{align*}
\mathbb{E}[f(X)] &\geq \mathbb{E}\big[f(m) + r(X - m)\big].
\end{align*}
Now we expand the right-hand side using linearity of expectation. Since $f(m)$ and $rm$ are finite constants:
\begin{align*}
\mathbb{E}\big[f(m) + r(X - m)\big] &= \mathbb{E}[f(m)] + r\,\mathbb{E}[X] - r\,\mathbb{E}[m] \\
&= f(m) + r\,\mathbb{E}[X] - rm \\
&= f(m) + r\,(m - m) \\
&= f(m) = f(\mathbb{E}[X]).
\end{align*}
The heart of the argument is the cancellation $\mathbb{E}[X] - m = 0$ in the third line. This is not a coincidence: it is precisely *why* we placed the supporting hyperplane at $m = \mathbb{E}[X]$ rather than at any other point of $I$. At another point $x_0 \in I$ admitting a supporting slope $r_{x_0}$, the same integration yields the valid lower bound $\mathbb{E}[f(X)] \geq f(x_0) + r_{x_0}(\mathbb{E}[X] - x_0)$. The right-hand side is the supporting line at $x_0$ evaluated at $m$, which by the supporting hyperplane inequality is at most $f(m)$, so this bound is never stronger than Jensen's and is in general strictly weaker. The choice $x_0 = m = \mathbb{E}[X]$ makes the affine correction term vanish upon taking expectations, converting the supporting hyperplane inequality into the Jensen bound $f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$.
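To see the role of the base point concretely, take the illustrative case (an assumption made here, not part of the general proof) $I = \mathbb{R}$ and $f(y) = y^2$, with $m = \mathbb{E}[X]$. This $f$ is differentiable, so the supporting slope at a point $x_0$ can be taken to be $r_{x_0} = f'(x_0) = 2x_0$. Integrating the supporting line at $x_0$ gives
\begin{align*}
\mathbb{E}[X^2] &\geq x_0^2 + 2x_0\,(m - x_0) \\
&= 2x_0 m - x_0^2 \\
&= m^2 - (m - x_0)^2.
\end{align*}
Every choice of $x_0$ gives a valid bound, but the penalty term $(m - x_0)^2$ vanishes only at $x_0 = m$, where the bound becomes $\mathbb{E}[X^2] \geq (\mathbb{E}[X])^2$. When $\mathbb{E}[X^2] < \infty$ this is the familiar statement that the variance $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ is nonnegative.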
[/guided]
[/step]