[proofplan]
We first introduce an independent ghost sample and compare the empirical process $P_n-P$ to the difference of two empirical measures. The comparison uses conditional [Jensen's inequality](/theorems/9): conditionally on the observed sample, $Pf$ is the conditional mean of the ghost empirical average. Then exchangeability of each pair $(X_i,X_i')$ inserts independent Rademacher signs into the sample difference. Finally, the triangle inequality separates the signed difference into two Rademacher processes, and the ghost and original terms have the same distribution, giving the factor $2$.
[/proofplan]
[step:Introduce an independent ghost sample and its empirical averages]
Let
\begin{align*}
X_1',\dots,X_n':(\Omega',\mathcal A',\mathbb P')\to(S,\mathcal S)
\end{align*}
be independent identically distributed random variables with common distribution $P$, independent of $X_1,\dots,X_n$. We work on the product [probability space](/page/Probability%20Space)
\begin{align*}
(\overline\Omega,\overline{\mathcal A},\overline{\mathbb P})
:=
(\Omega\times\Omega',\mathcal A\otimes\mathcal A',\mathbb P\otimes\mathbb P').
\end{align*}
Let $\overline{\mathbb E}$ denote expectation with respect to $\overline{\mathbb P}$. For every integrable [random variable](/page/Random%20Variable) $H:\overline\Omega\to\mathbb R$, let $\mathbb E'[H]$ denote the [conditional expectation](/page/Conditional%20Expectation) obtained by integrating in the $\Omega'$ coordinate with the original sample held fixed.
For $f\in\mathcal F$, define the ghost empirical average
\begin{align*}
P_n'f:=\frac{1}{n}\sum_{i=1}^{n}f(X_i').
\end{align*}
Since $P|f|<\infty$ for every $f\in\mathcal F$ and $X_i'\sim P$, each random variable $f(X_i')$ is integrable, and therefore
\begin{align*}
\mathbb E'[P_n'f]=Pf,
\end{align*}
where $\mathbb E'$ denotes expectation with respect to $\mathbb P'$ while $X_1,\dots,X_n$ are held fixed.
[/step]
[step:Compare the centered empirical process to the ghost difference]
Fix first a nonempty finite subclass $\mathcal G\subset\mathcal F$. Define
\begin{align*}
Z_{\mathcal G}:=\sup_{f\in\mathcal G}|P_nf-Pf|
\end{align*}
as a nonnegative measurable random variable on $(\Omega,\mathcal A,\mathbb P)$, and define
\begin{align*}
W_{\mathcal G}:=\sup_{f\in\mathcal G}|P_nf-P_n'f|
\end{align*}
as a nonnegative measurable random variable on $(\overline\Omega,\overline{\mathcal A},\overline{\mathbb P})$.
For each fixed $f\in\mathcal G$,
\begin{align*}
P_nf-Pf
=
\mathbb E'[P_nf-P_n'f],
\end{align*}
because $P_nf$ is constant with respect to $\mathbb P'$ and $\mathbb E'[P_n'f]=Pf$. Taking absolute values and applying conditional [Jensen's inequality](/theorems/1977) to the convex map $t\mapsto |t|$ gives
\begin{align*}
|P_nf-Pf|
\le
\mathbb E'[|P_nf-P_n'f|].
\end{align*}
Taking the supremum over $f\in\mathcal G$ and using the elementary monotonicity estimate
\begin{align*}
\sup_{f\in\mathcal G}\mathbb E'[|P_nf-P_n'f|]
\le
\mathbb E'\left[\sup_{f\in\mathcal G}|P_nf-P_n'f|\right],
\end{align*}
we obtain
\begin{align*}
Z_{\mathcal G}
\le
\mathbb E'[W_{\mathcal G}].
\end{align*}
Taking expectation with respect to $\mathbb P$ and using Tonelli's theorem for the nonnegative random variable $W_{\mathcal G}$ yields
\begin{align*}
\mathbb E[Z_{\mathcal G}]
\le
\overline{\mathbb E}[W_{\mathcal G}],
\end{align*}
where $\overline{\mathbb E}$ denotes expectation on the product probability space.
[guided]
The purpose of the ghost sample is to replace the deterministic centering $Pf$ by a random empirical average that has the same mean. For a fixed function $f\in\mathcal G$, the quantity $P_nf$ depends only on the original sample, while $P_n'f$ depends only on the ghost sample. Since $X_i'\sim P$ and $P|f|<\infty$, we have
\begin{align*}
\mathbb E'[f(X_i')]=\int_S f(x)\,dP(x)=Pf.
\end{align*}
Averaging over $i\in\{1,\dots,n\}$ gives
\begin{align*}
\mathbb E'[P_n'f]
=
\mathbb E'\left[\frac{1}{n}\sum_{i=1}^{n}f(X_i')\right]
=
\frac{1}{n}\sum_{i=1}^{n}\mathbb E'[f(X_i')]
=
Pf.
\end{align*}
Therefore
\begin{align*}
P_nf-Pf
=
\mathbb E'[P_nf-P_n'f].
\end{align*}
Now we apply conditional Jensen's inequality. The [convex function](/page/Convex%20Function) is the absolute value map
\begin{align*}
\varphi:\mathbb R\to[0,\infty),\qquad t\mapsto |t|.
\end{align*}
The random variable $P_nf-P_n'f$ is integrable with respect to $\mathbb P'$ for each fixed original sample because it is bounded in absolute value by
\begin{align*}
|P_nf|+P_n'|f|,
\end{align*}
and $\mathbb E'[P_n'|f|]=P|f|<\infty$. Thus conditional Jensen gives
\begin{align*}
|P_nf-Pf|
=
\left|\mathbb E'[P_nf-P_n'f]\right|
\le
\mathbb E'[|P_nf-P_n'f|].
\end{align*}
Because $\mathcal G$ is finite, the supremum over $\mathcal G$ is just a finite maximum, so all random variables in this step are measurable. Taking the maximum over $f\in\mathcal G$ gives
\begin{align*}
\sup_{f\in\mathcal G}|P_nf-Pf|
\le
\sup_{f\in\mathcal G}\mathbb E'[|P_nf-P_n'f|].
\end{align*}
The supremum of expectations is bounded by the expectation of the supremum, since for every $f\in\mathcal G$,
\begin{align*}
|P_nf-P_n'f|
\le
\sup_{g\in\mathcal G}|P_ng-P_n'g|.
\end{align*}
Taking $\mathbb E'$ and then the supremum over $f$ yields
\begin{align*}
\sup_{f\in\mathcal G}\mathbb E'[|P_nf-P_n'f|]
\le
\mathbb E'\left[\sup_{f\in\mathcal G}|P_nf-P_n'f|\right].
\end{align*}
Hence
\begin{align*}
Z_{\mathcal G}
\le
\mathbb E'[W_{\mathcal G}].
\end{align*}
Finally, $W_{\mathcal G}$ is nonnegative, so Tonelli's theorem applies without any finiteness assumption and gives
\begin{align*}
\mathbb E[Z_{\mathcal G}]
\le
\overline{\mathbb E}[W_{\mathcal G}].
\end{align*}
This is the first half of symmetrisation: the deterministic centering $Pf$ has been replaced by a second independent empirical average.
[/guided]
[/step]
[step:Insert Rademacher signs by pairwise exchangeability]
Enlarge the product space, if necessary, by a further factor carrying independent Rademacher random variables
\begin{align*}
\varepsilon_1,\dots,\varepsilon_n
\end{align*}
that are independent of both samples, and continue to write $\overline{\mathbb E}$ for expectation on this enlarged product space. For each $i\in\{1,\dots,n\}$, the pair $(X_i,X_i')$ has distribution $P\otimes P$, so it has the same distribution as $(X_i',X_i)$. Since the pairs $(X_i,X_i')$ are independent across $i$ and the signs $\varepsilon_1,\dots,\varepsilon_n$ are independent, independently swapping the two coordinates in the $i$th pair according to $\varepsilon_i$ preserves the joint distribution of the full vector of pairs. Therefore the random element
\begin{align*}
\left((X_1,X_1'),\dots,(X_n,X_n')\right)
\end{align*}
has the same distribution as
\begin{align*}
\left((Y_1,Y_1'),\dots,(Y_n,Y_n')\right),
\end{align*}
where for each $i\in\{1,\dots,n\}$,
\begin{align*}
(Y_i,Y_i')
=
(X_i,X_i')
\end{align*}
on the event $\{\varepsilon_i=1\}$, and
\begin{align*}
(Y_i,Y_i')
=
(X_i',X_i)
\end{align*}
on the event $\{\varepsilon_i=-1\}$.
For each fixed finite $\mathcal G\subset\mathcal F$, the measurable map from $(S\times S)^n$ to $[0,\infty]$ given by
\begin{align*}
((x_1,x_1'),\dots,(x_n,x_n'))\mapsto
\sup_{f\in\mathcal G}\left|\frac{1}{n}\sum_{i=1}^{n}\{f(x_i)-f(x_i')\}\right|
\end{align*}
is invariant under replacing the difference in the $i$th coordinate by $\varepsilon_i\{f(X_i)-f(X_i')\}$ after the corresponding coordinate swap. Consequently, for every finite $\mathcal G\subset\mathcal F$,
\begin{align*}
\overline{\mathbb E}\left[\sup_{f\in\mathcal G}|P_nf-P_n'f|\right]
=
\overline{\mathbb E}\left[
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}
\right|
\right].
\end{align*}
[/step]
[step:Separate the signed original and ghost processes]
For every $f\in\mathcal G$,
\begin{align*}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}
\right|
\le
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|
+
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i')
\right|.
\end{align*}
Taking suprema over $f\in\mathcal G$ gives
\begin{align*}
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}
\right|
\le
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|
+
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i')
\right|.
\end{align*}
Taking extended nonnegative expectations and using subadditivity of expectation for nonnegative random variables yields
\begin{align*}
\overline{\mathbb E}\left[
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}
\right|
\right]
\le
\overline{\mathbb E}\left[
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|
\right]
+
\overline{\mathbb E}\left[
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i')
\right|
\right].
\end{align*}
Because $(X_1',\dots,X_n')$ and $(X_1,\dots,X_n)$ have the same distribution and are both independent of $(\varepsilon_1,\dots,\varepsilon_n)$, the two expectations on the right-hand side are equal. Hence
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal G}|P_nf-Pf|\right]
\le
2\mathbb E\left[
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|
\right].
\end{align*}
[/step]
[step:Pass from finite subclasses to the full class]
Let
\begin{align*}
Z:=\sup_{f\in\mathcal F}|P_nf-Pf|
\end{align*}
and
\begin{align*}
R:=\sup_{f\in\mathcal F}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|.
\end{align*}
By hypothesis, $Z$ and $R$ are measurable extended real-valued random variables. If $\mathcal F=\varnothing$, then the stated convention gives $Z=R=0$, and the desired inequality follows. Hence assume for the rest of the step that $\mathcal F$ is nonempty. For every nonempty finite subclass $\mathcal G\subset\mathcal F$, the finite-class estimate gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal G}|P_nf-Pf|\right]
\le
2\mathbb E\left[
\sup_{f\in\mathcal G}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|
\right].
\end{align*}
Since $\mathcal F$ is countable and nonempty, choose an enumeration $\mathcal F=\{f_1,f_2,\dots\}$ if $\mathcal F$ is infinite, and choose an enumeration ending at $m$ if $\mathcal F$ is finite. Let $\mathcal G_k:=\{f_1,\dots,f_k\}$ in the infinite case and $\mathcal G_k:=\mathcal F$ for all $k\ge m$ in the finite case. The corresponding finite suprema form increasing sequences of nonnegative measurable random variables and converge pointwise to $Z$ and $R$, respectively. By the [monotone convergence theorem](/theorems/509) applied to these two increasing sequences, taking $k\to\infty$ in the finite-class estimate gives
\begin{align*}
\mathbb E[Z]
\le
2\mathbb E[R].
\end{align*}
Written out, this is
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|P_nf-Pf|\right]
\le
2\mathbb E\left[\sup_{f\in\mathcal F}
\left|
\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)
\right|
\right].
\end{align*}
This is the claimed symmetrisation inequality.
[/step]