[proofplan]
Introduce an independent copy $X_1',\dots,X_n'$ of the sample and compare the centred empirical process with the difference of the two empirical measures. Conditional [Jensen's inequality](/theorems/9) replaces the mean $P f$ by the ghost empirical mean $P_n'f$ first on finite subclasses, and countability of $\mathcal F$ then passes to the full class by monotone convergence. The joint law of each pair $(X_i,X_i')$ is invariant under swapping its two coordinates, so independent Rademacher signs may be inserted into the ghost-sample difference. The triangle inequality splits the signed difference into two identically distributed Rademacher processes, giving the factor $2$.
[/proofplan]
[step:Introduce a ghost sample and its empirical measure]
Let
\begin{align*}
(\Omega',\mathcal A',\mathbb P'):=(S^n,\mathcal S^{\otimes n},P^{\otimes n})
\end{align*}
and, for $1\le i\le n$, let $X_i':(\Omega',\mathcal A')\to(S,\mathcal S)$ be the $i$-th coordinate projection. Then $X_1',\dots,X_n'$ are independent identically distributed random variables with common distribution $P$. On the product extension below, they are independent of $X_1,\dots,X_n$ and of $\varepsilon_1,\dots,\varepsilon_n$ by the product construction. Work on the product [probability space](/page/Probability%20Space)
\begin{align*}
(\overline\Omega,\overline{\mathcal A},\overline{\mathbb P}):=(\Omega\times\Omega'\times\Omega_\varepsilon,\mathcal A\otimes\mathcal A'\otimes\mathcal A_\varepsilon,\mathbb P\otimes\mathbb P'\otimes\mathbb P_\varepsilon).
\end{align*}
Write $\mathbb E'$ for expectation with respect to $\mathbb P'$. For every measurable $h:S\to\mathbb R$ with $P|h|<\infty$, define the ghost empirical average
\begin{align*}
P_n'h:=\frac{1}{n}\sum_{i=1}^{n}h(X_i').
\end{align*}
For $f\in\mathcal F$, define
\begin{align*}
\Delta_n(f):=\sqrt n\,(P_n-P_n')f=\frac{1}{\sqrt n}\sum_{i=1}^{n}\{f(X_i)-f(X_i')\}.
\end{align*}
Since $P|f|<\infty$ and $X_i,X_i'$ have distribution $P$, all random variables $f(X_i)$ and $f(X_i')$ are integrable.
[/step]
[step:Replace the population mean by the ghost empirical mean]
We first prove
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_n(f)|\right]\le \mathbb E\left[\sup_{f\in\mathcal F}|\Delta_n(f)|\right].
\end{align*}
For a finite nonempty subclass $\mathcal H\subset\mathcal F$, the map $\Phi_{\mathcal H}:\mathbb R^{\mathcal H}\to[0,\infty)$ defined by
\begin{align*}
\Phi_{\mathcal H}(z):=\sup_{f\in\mathcal H}|z_f|
\end{align*}
is convex. Conditional on $X_1,\dots,X_n$, integrability gives
\begin{align*}
\mathbb E'\left[\frac{1}{\sqrt n}\sum_{i=1}^{n}\{f(X_i)-f(X_i')\}\right]=\frac{1}{\sqrt n}\sum_{i=1}^{n}\{f(X_i)-P f\}=G_n(f)
\end{align*}
for every $f\in\mathcal H$. The finite-dimensional [Jensen inequality](/theorems/515) for the convex map $\Phi_{\mathcal H}$ yields
\begin{align*}
\sup_{f\in\mathcal H}|G_n(f)|\le \mathbb E'\left[\sup_{f\in\mathcal H}|\Delta_n(f)|\right]
\end{align*}
for $\mathbb P$-almost every sample. Taking expectation in $X_1,\dots,X_n$ gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal H}|G_n(f)|\right]\le \mathbb E\left[\sup_{f\in\mathcal H}|\Delta_n(f)|\right].
\end{align*}
Because $\mathcal F$ is countable, choose an enumeration $\mathcal F=\{f_1,f_2,\dots\}$ when $\mathcal F$ is infinite, and set $\mathcal H_m:=\{f_1,\dots,f_m\}$ for $m\in\mathbb N$. When $\mathcal F$ is finite, take $\mathcal H_m=\mathcal F$ for all sufficiently large $m$. Then
\begin{align*}
\sup_{f\in\mathcal H_m}|G_n(f)|\uparrow \sup_{f\in\mathcal F}|G_n(f)|
\end{align*}
and
\begin{align*}
\sup_{f\in\mathcal H_m}|\Delta_n(f)|\uparrow \sup_{f\in\mathcal F}|\Delta_n(f)|
\end{align*}
pointwise as $m\to\infty$. The displayed random variables are measurable as suprema of countably many [measurable functions](/page/Measurable%20Functions). Applying the [monotone convergence theorem](/theorems/509) to the increasing measurable nonnegative sequences on both sides of the finite-subclass inequality gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_n(f)|\right]\le \mathbb E\left[\sup_{f\in\mathcal F}|\Delta_n(f)|\right].
\end{align*}
[guided]
The role of the ghost sample is to replace the fixed population mean $P f$ by a random empirical mean with the same [conditional expectation](/page/Conditional%20Expectation). We first make this replacement for a finite subclass $\mathcal H\subset\mathcal F$, because then all suprema are ordinary measurable suprema of finitely many random variables.
Define $\Phi_{\mathcal H}:\mathbb R^{\mathcal H}\to[0,\infty)$ by
\begin{align*}
\Phi_{\mathcal H}(z):=\sup_{f\in\mathcal H}|z_f|.
\end{align*}
This function is convex because it is the supremum of the convex functions $z\mapsto z_f$ and $z\mapsto -z_f$ over $f\in\mathcal H$. Conditional on $X_1,\dots,X_n$, the only randomness in $\Delta_n(f)$ is the ghost sample. Since $X_i'$ has law $P$ and $P|f|<\infty$, we have
\begin{align*}
\mathbb E'[f(X_i')]=\int_S f(x)\,dP(x)=P f.
\end{align*}
Therefore, for every $f\in\mathcal H$,
\begin{align*}
\mathbb E'\left[\Delta_n(f)\right]=\mathbb E'\left[\frac{1}{\sqrt n}\sum_{i=1}^{n}\{f(X_i)-f(X_i')\}\right]=\frac{1}{\sqrt n}\sum_{i=1}^{n}\{f(X_i)-P f\}=G_n(f).
\end{align*}
Applying the finite-dimensional Jensen inequality to the integrable random vector $(\Delta_n(f))_{f\in\mathcal H}$ and the [convex function](/page/Convex%20Function) $\Phi_{\mathcal H}$ gives
\begin{align*}
\sup_{f\in\mathcal H}|G_n(f)|=\Phi_{\mathcal H}\left(\mathbb E'[(\Delta_n(f))_{f\in\mathcal H}]\right)\le \mathbb E'\left[\Phi_{\mathcal H}((\Delta_n(f))_{f\in\mathcal H})\right]=\mathbb E'\left[\sup_{f\in\mathcal H}|\Delta_n(f)|\right].
\end{align*}
Taking expectation over the original sample yields
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal H}|G_n(f)|\right]\le \mathbb E\left[\sup_{f\in\mathcal H}|\Delta_n(f)|\right].
\end{align*}
Now pass from finite subclasses to the full class using countability. If $\mathcal F$ is infinite, enumerate it as $\mathcal F=\{f_1,f_2,\dots\}$ and define $\mathcal H_m:=\{f_1,\dots,f_m\}$ for $m\in\mathbb N$; if $\mathcal F$ is finite, take $\mathcal H_m=\mathcal F$ once $m$ is large enough. For each outcome,
\begin{align*}
\sup_{f\in\mathcal H_m}|G_n(f)|\uparrow \sup_{f\in\mathcal F}|G_n(f)|
\end{align*}
and
\begin{align*}
\sup_{f\in\mathcal H_m}|\Delta_n(f)|\uparrow \sup_{f\in\mathcal F}|\Delta_n(f)|.
\end{align*}
These suprema are measurable because they are countable suprema of measurable real-valued random variables, and the two displayed sequences are increasing and nonnegative. Therefore the monotone convergence theorem applies to both sides of the finite-subclass inequality and yields
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_n(f)|\right]\le \mathbb E\left[\sup_{f\in\mathcal F}|\Delta_n(f)|\right].
\end{align*}
This is exactly the point at which countability is used: it replaces the outer-expectation difficulty by an increasing sequence of measurable finite suprema.
[/guided]
[/step]
[step:Insert Rademacher signs by exchangeability of the sample pairs]
For each $1\le i\le n$, the pair $(X_i,X_i')$ has the same distribution as $(X_i',X_i)$, and the pairs
\begin{align*}
(X_1,X_1'),\dots,(X_n,X_n')
\end{align*}
are independent. For each fixed sign vector $e=(e_1,\dots,e_n)\in\{-1,1\}^n$, replacing $(X_i,X_i')$ by $(X_i',X_i)$ when $e_i=-1$ leaves the joint law of the full vector
\begin{align*}
(X_1,X_1',\dots,X_n,X_n')
\end{align*}
unchanged and transforms $f(X_i)-f(X_i')$ into $e_i\{f(X_i)-f(X_i')\}$ for every $f\in\mathcal F$. Hence, conditional on $\varepsilon=e$, the measurable [random variable](/page/Random%20Variable)
\begin{align*}
\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\{f(X_i)-f(X_i')\}\right|
\end{align*}
has the same distribution as
\begin{align*}
\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}e_i\{f(X_i)-f(X_i')\}\right|.
\end{align*}
Averaging over the finitely many possible sign vectors gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|\Delta_n(f)|\right]=\mathbb E\left[\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}\right|\right].
\end{align*}
[/step]
[step:Split the signed ghost difference into two Rademacher processes]
For every realization of $X_1,\dots,X_n,X_1',\dots,X_n'$ and $\varepsilon_1,\dots,\varepsilon_n$, the triangle inequality gives
\begin{align*}
\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}\right|\le \sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|+\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i f(X_i')\right|.
\end{align*}
Taking ordinary expectations of these measurable extended nonnegative random variables and using monotonicity of expectation gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i\{f(X_i)-f(X_i')\}\right|\right]\le \mathbb E\left[\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i f(X_i)\right|\right]+\mathbb E\left[\sup_{f\in\mathcal F}\left|\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i f(X_i')\right|\right].
\end{align*}
The random vectors $(X_1,\dots,X_n)$ and $(X_1',\dots,X_n')$ have the same distribution and are both independent of $(\varepsilon_1,\dots,\varepsilon_n)$. Hence the two outer expectations on the right-hand side are equal. Since
\begin{align*}
R_n(f)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\varepsilon_i f(X_i),
\end{align*}
we obtain
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|\Delta_n(f)|\right]\le 2\mathbb E\left[\sup_{f\in\mathcal F}|R_n(f)|\right].
\end{align*}
[/step]
[step:Combine the ghost-sample and Rademacher bounds]
Combining the ghost-sample comparison with the Rademacher splitting gives
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_n(f)|\right]\le \mathbb E\left[\sup_{f\in\mathcal F}|\Delta_n(f)|\right]\le 2\mathbb E\left[\sup_{f\in\mathcal F}|R_n(f)|\right].
\end{align*}
This is the asserted symmetrization inequality in the extended nonnegative sense. The convention that the supremum over $\varnothing$ is $0$ makes the argument valid also when $\mathcal F=\varnothing$.
[/step]