[proofplan]
We introduce an independent copy $X_1',\dots,X_n'$ of the sample and replace each deterministic mean $P f$ by the [conditional expectation](/page/Conditional%20Expectation) of $f(X_i')$. [Jensen's inequality](/theorems/9) then bounds the centered empirical supremum by the expected supremum of the ghost-sample differences. The symmetry of the pair $(X_i,X_i')$ allows independent Rademacher signs to be inserted into those differences without changing their joint law. Finally, the triangle inequality separates the signed difference into two Rademacher sums, which have equal expectations because the ghost sample has the same law as the original sample.
[/proofplan]
[step:Introduce an independent ghost sample and encode the centered process]
Work on a product [probability space](/page/Probability%20Space) carrying, in addition to $X_1,\dots,X_n$, random variables $X_1',\dots,X_n'$ such that $X_1',\dots,X_n'$ are i.i.d. with distribution $P$ and independent of $X_1,\dots,X_n$. For each $f\in\mathcal F$, define
\begin{align*}
Z_f:=\sum_{i=1}^{n}(f(X_i)-P f).
\end{align*}
Since $P|f|<\infty$, each quantity $P f$ is finite. Also define the ghost-sample difference
\begin{align*}
D_f:=\sum_{i=1}^{n}(f(X_i)-f(X_i')).
\end{align*}
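For each fixed $f\in\mathcal F$, since each $X_i'$ has distribution $P$,
\begin{align*}
\mathbb E[f(X_i')]=P f
\end{align*}
for every $1\le i\le n$. Because the ghost sample is independent of $X_1,\dots,X_n$, taking the conditional expectation of $D_f$ term by term gives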
\begin{align*}
Z_f=\mathbb E\left[D_f\mid X_1,\dots,X_n\right].
\end{align*}
Here the conditional expectation integrates over the ghost sample while the original sample is held fixed.
[/step]
[step:Apply Jensen to replace the means by ghost observations]
Let $\mathcal V$ denote the real [vector space](/page/Vector%20Space) of all real-valued functions on $\mathcal F$. Define the map
\begin{align*}
\Phi:\mathcal V&\to[0,\infty],\\
a&\mapsto \sup_{f\in\mathcal F}|a(f)|.
\end{align*}
The map $\Phi$ is convex because, for $0\le\lambda\le1$ and $a,b\in\mathcal V$,
\begin{align*}
\Phi(\lambda a+(1-\lambda)b)
\le \lambda\Phi(a)+(1-\lambda)\Phi(b)
\end{align*}
by the triangle inequality in $\mathbb R$ and the subadditivity of the supremum, $\sup_f(A_f+B_f)\le\sup_f A_f+\sup_f B_f$ for nonnegative $A_f,B_f$. Applying [Jensen's inequality](/theorems/1977) conditionally on $X_1,\dots,X_n$ to the random element $f\mapsto D_f$ gives
\begin{align*}
\sup_{f\in\mathcal F}|Z_f|
=\Phi\left(\mathbb E[D_\cdot\mid X_1,\dots,X_n]\right)
\le \mathbb E\left[\Phi(D_\cdot)\mid X_1,\dots,X_n\right].
\end{align*}
Taking expectations and using the [tower property of conditional expectation](/theorems/1150) yields
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|Z_f|\right]
\le \mathbb E\left[\sup_{f\in\mathcal F}|D_f|\right].
\end{align*}
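As a sanity check of the conditional Jensen step, consider the following illustrative special case, which is an assumption made only for this example and not part of the statement: $n=1$, $S=\{-1,0,1\}$, $P$ uniform on $S$, and $\mathcal F=\{f\}$ with $f(x)=x$, so that $P f=0$. Conditionally on $X_1=x$,
\begin{align*}
\sup_{f\in\mathcal F}|Z_f|=|x|,
\qquad
\mathbb E\left[\Phi(D_\cdot)\mid X_1=x\right]=\frac{|x+1|+|x|+|x-1|}{3},
\end{align*}
which gives $0\le 2/3$ for $x=0$ and $1\le 1$ for $x=\pm1$. Averaging over $x$ yields
\begin{align*}
\mathbb E|X_1|=\frac23\le\frac89=\mathbb E|X_1-X_1'|,
\end{align*}
so in this case the Jensen bound holds with strict inequality.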
[guided]
The purpose of the ghost sample is to turn the deterministic centering term $P f$ into an average over independent random observations. For fixed $f\in\mathcal F$, the integrability assumption $P|f|<\infty$ ensures that $f(X_i')$ is integrable and that
\begin{align*}
\mathbb E[f(X_i')]=\int_S f\,dP=P f.
\end{align*}
Since $X_1',\dots,X_n'$ are independent of $X_1,\dots,X_n$, conditioning on the original sample leaves only the ghost variables random. Therefore
\begin{align*}
\mathbb E\left[\sum_{i=1}^{n}(f(X_i)-f(X_i'))\mid X_1,\dots,X_n\right]
=\sum_{i=1}^{n}(f(X_i)-P f).
\end{align*}
Now we need to pass from this pointwise identity in $f$ to an inequality after taking $\sup_{f\in\mathcal F}|\cdot|$. The relevant function is
\begin{align*}
\Phi:\mathcal V&\to[0,\infty],\\
a&\mapsto \sup_{f\in\mathcal F}|a(f)|,
\end{align*}
where $\mathcal V$ is the vector space of all real-valued functions on $\mathcal F$. This map is convex: if $0\le\lambda\le1$, then for every $f\in\mathcal F$,
\begin{align*}
|\lambda a(f)+(1-\lambda)b(f)|
\le \lambda |a(f)|+(1-\lambda)|b(f)|
\le \lambda\Phi(a)+(1-\lambda)\Phi(b).
\end{align*}
Taking the supremum over $f\in\mathcal F$ gives
\begin{align*}
\Phi(\lambda a+(1-\lambda)b)
\le \lambda\Phi(a)+(1-\lambda)\Phi(b).
\end{align*}
Conditional Jensen's inequality may therefore be applied to the conditional expectation of the random function $f\mapsto D_f$. It gives
\begin{align*}
\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}(f(X_i)-P f)\right|
\le
\mathbb E\left[
\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}(f(X_i)-f(X_i'))\right|
\mid X_1,\dots,X_n
\right].
\end{align*}
Finally, taking expectation on both sides and using the tower property gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}(f(X_i)-P f)\right|\right]
\le
\mathbb E\left[
\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}(f(X_i)-f(X_i'))\right|
\right].
\end{align*}
[/guided]
[/step]
[step:Insert independent Rademacher signs by symmetry of the paired samples]
Let $\varepsilon_1,\dots,\varepsilon_n$ be independent Rademacher random variables independent of both samples. For each $1\le i\le n$, the ordered pair $(X_i,X_i')$ has the same distribution as $(X_i',X_i)$. Hence the real-valued process indexed by $\mathcal F$,
\begin{align*}
f\mapsto f(X_i)-f(X_i'),
\end{align*}
has the same distribution as
\begin{align*}
f\mapsto -(f(X_i)-f(X_i')).
\end{align*}
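Since the pairs $(X_i,X_i')$ are independent across $i$ and the signs are independent of both samples, conditioning on any fixed sign pattern amounts to swapping $X_i$ and $X_i'$ for those $i$ with $\varepsilon_i=-1$, which does not change the joint law of the samples. Hence the two indexed processes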
\begin{align*}
f\mapsto \sum_{i=1}^{n}(f(X_i)-f(X_i'))
\end{align*}
and
\begin{align*}
f\mapsto \sum_{i=1}^{n}\varepsilon_i(f(X_i)-f(X_i'))
\end{align*}
have the same distribution. Therefore
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}(f(X_i)-f(X_i'))\right|\right]
=
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i(f(X_i)-f(X_i'))\right|\right].
\end{align*}
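To make the swapping argument concrete, here is a sketch for the illustrative case $n=2$ with the fixed sign pattern $(\varepsilon_1,\varepsilon_2)=(1,-1)$; the choice of $n$ and of the pattern is an assumption made only for illustration. On this event,
\begin{align*}
\sum_{i=1}^{2}\varepsilon_i(f(X_i)-f(X_i'))
=(f(X_1)-f(X_1'))+(f(X_2')-f(X_2)),
\end{align*}
which is the ghost-sample difference computed from the rearranged samples $(X_1,X_2')$ and $(X_1',X_2)$. Since $X_1,X_2,X_1',X_2'$ are i.i.d. with distribution $P$ and independent of the signs, the rearranged vector $(X_1,X_2',X_1',X_2)$ has the same law as $(X_1,X_2,X_1',X_2')$. Conditionally on this pattern, the signed process therefore has the same law as the unsigned one. The same holds for each of the four sign patterns, each of probability $1/4$, and averaging over them gives the identity displayed above.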
[/step]
[step:Split the signed difference into two Rademacher sums]
For every realization of the samples and signs, the triangle inequality in $\mathbb R$ gives, for each $f\in\mathcal F$,
\begin{align*}
\left|\sum_{i=1}^{n}\varepsilon_i(f(X_i)-f(X_i'))\right|
\le
\left|\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|
+
\left|\sum_{i=1}^{n}\varepsilon_i f(X_i')\right|.
\end{align*}
Taking the supremum over $f\in\mathcal F$ and using $\sup_f(A_f+B_f)\le\sup_f A_f+\sup_f B_f$ for nonnegative indexed quantities,
\begin{align*}
\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i(f(X_i)-f(X_i'))\right|
\le
\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|
+
\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i f(X_i')\right|.
\end{align*}
Taking expectations gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i(f(X_i)-f(X_i'))\right|\right]
\le
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|\right]
+
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i f(X_i')\right|\right].
\end{align*}
Because $(X_1',\dots,X_n')$ has the same distribution as $(X_1,\dots,X_n)$ and each sample is independent of the Rademacher vector $(\varepsilon_1,\dots,\varepsilon_n)$, each sample has the same joint law with the signs, so the two expectations on the right-hand side are equal. Thus
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i(f(X_i)-f(X_i'))\right|\right]
\le
2\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|\right].
\end{align*}
[/step]
[step:Combine the ghost-sample and Rademacher bounds]
Combining the Jensen bound with the sign-symmetrisation identity and the triangle-inequality estimate gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}(f(X_i)-P f)\right|\right]
\le
2\mathbb E\left[\sup_{f\in\mathcal F}\left|\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|\right].
\end{align*}
This is the desired basic symmetrisation inequality.
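As a quick check of the whole chain, consider the illustrative case (an assumption for this example only) $n=1$, $S=\{-1,1\}$, $P$ uniform on $S$, and $\mathcal F=\{f\}$ with $f(x)=x$, so that $P f=0$. Then
\begin{align*}
\mathbb E|X_1-P f|=1,
\qquad
\mathbb E|X_1-X_1'|=\tfrac12\cdot 0+\tfrac12\cdot 2=1,
\qquad
2\,\mathbb E|\varepsilon_1X_1|=2,
\end{align*}
so the chain reads $1\le 1\le 2$. Here the Jensen step holds with equality, and the entire loss comes from the triangle-inequality step.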
[/step]