[proofplan]
We first use pointwise measurability to replace every supremum by a supremum over a countable subclass, so that the displayed expectation is an ordinary expectation. We then divide the class by the envelope bound $M$, producing a unit-bounded class whose localization radius is $\sigma/M$. The [local Rademacher bound](/theorems/9843) [citetheorem:9843] applies to this scaled localized class with bound parameter $1$, envelope $H=F/M$, and radius $\delta=\sigma/M$. The normalized entropy integral is unchanged by the scaling. Multiplying the unit-bound estimate back by $M$ gives the asserted maximal inequality.
[/proofplan]
[step:Replace the supremum by a countable measurable supremum]Let $\mathcal F_0\subset\mathcal F$ be a countable pointwise dense subclass as in the hypothesis. For each $\omega\in\Omega$ and each $f\in\mathcal F$, choose a sequence $(f_k)_{k\in\mathbb N}$ in $\mathcal F_0$ such that $f_k(x)\to f(x)$ for every $x\in S$. Since the set $\{X_1(\omega),\dots,X_n(\omega)\}$ is finite, we have
\begin{align*}
P_nf_k(\omega)\to P_nf(\omega).
\end{align*}
Also, since $|f_k|\le F\le M$ and $f_k\to f$ pointwise, the [dominated convergence theorem](/theorems/4) with dominating function $F$ gives
\begin{align*}
P f_k\to P f.
\end{align*}
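Therefore $G_nf_k(\omega)\to G_nf(\omega)$, so $|G_nf(\omega)|\le\sup_{g\in\mathcal F_0}|G_ng(\omega)|$; the reverse inequality is immediate from $\mathcal F_0\subset\mathcal F$. Hence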
\begin{align*}
\sup_{f\in\mathcal F}|G_nf(\omega)|=\sup_{f\in\mathcal F_0}|G_nf(\omega)|
\end{align*}
for every $\omega\in\Omega$. The right-hand side is a supremum of countably many real-valued measurable random variables, and is therefore measurable. Thus the expectation in the statement is an ordinary expectation.[/step]
[guided]The role of pointwise measurability is to remove an otherwise serious measurability issue. We are taking a supremum over a possibly uncountable class $\mathcal F$, and an uncountable supremum of measurable random variables need not be measurable. The hypothesis gives a countable subclass $\mathcal F_0\subset\mathcal F$ that is pointwise dense in $\mathcal F$.
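As an illustration only (this class is an assumed example, not part of the theorem's hypotheses), take $S=\mathbb R$ and the half-line indicators $\{1_{(-\infty,t]}:t\in\mathbb R\}$, with the countable subclass indexed by $t\in\mathbb Q$. Given $t\in\mathbb R$, choose rationals $t_k\downarrow t$. Then for every $x\in\mathbb R$,
\begin{align*}
1_{(-\infty,t_k]}(x)\to 1_{(-\infty,t]}(x),
\end{align*}
because the left-hand side equals $1$ for all $k$ when $x\le t$, and equals $0$ as soon as $t_k<x$ when $x>t$.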
Fix $\omega\in\Omega$ and $f\in\mathcal F$. By pointwise measurability, there is a sequence $(f_k)_{k\in\mathbb N}$ in $\mathcal F_0$ such that $f_k(x)\to f(x)$ for every $x\in S$. Since $P_n$ evaluates functions only at the finitely many sample points $X_1(\omega),\dots,X_n(\omega)$, pointwise convergence gives
\begin{align*}
P_nf_k(\omega)=\frac{1}{n}\sum_{i=1}^{n}f_k(X_i(\omega))\to \frac{1}{n}\sum_{i=1}^{n}f(X_i(\omega))=P_nf(\omega).
\end{align*}
For the population part, the envelope bound gives $|f_k|\le F\le M$, and $F$ is $P$-integrable because
\begin{align*}
\int_S F(x)\,dP(x)\le M P(S)=M<\infty.
\end{align*}
Thus the [dominated convergence theorem](/theorems/7529) applies with respect to the measure $P$, and yields
\begin{align*}
P f_k=\int_S f_k(x)\,dP(x)\to \int_S f(x)\,dP(x)=P f.
\end{align*}
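Combining the empirical and population convergences gives $G_nf_k(\omega)\to G_nf(\omega)$. Hence each value $|G_nf(\omega)|$ with $f\in\mathcal F$ is a limit of values $|G_nf_k(\omega)|$ with $f_k\in\mathcal F_0$, so it is at most the supremum over $\mathcal F_0$; the reverse inequality holds because $\mathcal F_0\subset\mathcal F$. Therefore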
\begin{align*}
\sup_{f\in\mathcal F}|G_nf(\omega)|=\sup_{f\in\mathcal F_0}|G_nf(\omega)|.
\end{align*}
The right-hand side is a countable supremum of measurable random variables. This proves the measurability of the supremum and justifies using $\mathbb E$ rather than outer expectation.[/guided]
[step:Scale to a unit-bounded localized class]Define the scaled class
\begin{align*}
\mathcal H:=\{h:S\to[-1,1]\mid h=f/M\text{ for some }f\in\mathcal F\}.
\end{align*}
Define the envelope $H:S\to[0,1]$ by $H(x):=F(x)/M$. Then $|h(x)|\le H(x)$ for every $h\in\mathcal H$ and every $x\in S$. Pointwise measurability of $\mathcal F$ implies pointwise measurability of $\mathcal H$ by scaling the countable dense subclass $\mathcal F_0$ by $1/M$.
Set $\delta:=\sigma/M$. Since $0<\sigma\le M$, we have $\delta\in(0,1]$. If $h=f/M\in\mathcal H$, then
\begin{align*}
P h^2=\frac{1}{M^2}P f^2\le \frac{\sigma^2}{M^2}=\delta^2.
\end{align*}
Thus $\mathcal H\subseteq\mathcal H(\delta)$, where
\begin{align*}
\mathcal H(\delta):=\{h\in\mathcal H:P h^2\le\delta^2\}.
\end{align*}
The hypotheses of the local Rademacher bound [citetheorem:9843] are now satisfied: $\mathcal H$ is pointwise measurable, each $h\in\mathcal H$ maps $S$ into $[-1,1]$, $H\le 1$ is an envelope, $\delta\in(0,1]$, and $X_1,\dots,X_n$ are i.i.d. with common distribution $P$. Applying that theorem with bound parameter $1$, envelope $H$, and localization radius $\delta$ gives a universal constant $C_0>0$ such that
\begin{align*}
\mathbb E\left[\sup_{h\in\mathcal H(\delta)}|(P_n-P)h|\right]\le C_0\left\{\frac{1}{\sqrt n}J(\delta,\mathcal H,H)+\frac{1}{n\delta^2}J(\delta,\mathcal H,H)^2\right\}.
\end{align*}
Since $\mathcal H\subseteq\mathcal H(\delta)$, the same upper bound holds with the supremum taken over $\mathcal H$.[/step]
[guided]The goal of the scaling is to put the class into the $M=1$ form of the local Rademacher bound [citetheorem:9843]. Define
\begin{align*}
\mathcal H:=\{h:S\to[-1,1]\mid h=f/M\text{ for some }f\in\mathcal F\}.
\end{align*}
Because every $f\in\mathcal F$ takes values in $[-M,M]$, every $h=f/M$ takes values in $[-1,1]$. The scaled envelope is the map $H:S\to[0,1]$ given by $H(x):=F(x)/M$, and the envelope property follows from
\begin{align*}
|h(x)|=\frac{|f(x)|}{M}\le \frac{F(x)}{M}=H(x).
\end{align*}
Pointwise measurability is also preserved: if $f_k\in\mathcal F_0$ and $f_k(x)\to f(x)$ pointwise, then $f_k(x)/M\to f(x)/M$ pointwise, so the countable class $\{f_0/M:f_0\in\mathcal F_0\}$ is pointwise dense in $\mathcal H$.
Now set $\delta:=\sigma/M$. The hypothesis $0<\sigma\le M$ gives $\delta\in(0,1]$, so it is an admissible localization radius for a unit-bounded class. If $h=f/M$, the variance hypothesis gives
\begin{align*}
P h^2=\frac{1}{M^2}P f^2\le \frac{\sigma^2}{M^2}=\delta^2.
\end{align*}
Therefore every member of $\mathcal H$ lies in the localized subclass
\begin{align*}
\mathcal H(\delta):=\{h\in\mathcal H:P h^2\le\delta^2\}.
\end{align*}
We may apply [citetheorem:9843] to $\mathcal H$ because the class is pointwise measurable and unit bounded, $H\le 1$ is an envelope, $\delta\in(0,1]$, and the sample $X_1,\dots,X_n$ is i.i.d. with common law $P$. The theorem yields a universal constant $C_0>0$ such that
\begin{align*}
\mathbb E\left[\sup_{h\in\mathcal H(\delta)}|(P_n-P)h|\right]\le C_0\left\{\frac{1}{\sqrt n}J(\delta,\mathcal H,H)+\frac{1}{n\delta^2}J(\delta,\mathcal H,H)^2\right\}.
\end{align*}
Since $\mathcal H\subseteq\mathcal H(\delta)\subseteq\mathcal H$, the two classes coincide, so the same upper bound holds with the supremum taken over $\mathcal H$. This is the only external empirical-process estimate used in the proof.[/guided]
[step:Check that the entropy integral is unchanged by scaling]Let $Q$ be a finitely supported probability measure on $(S,\mathcal S)$ with $0<\|F\|_{L^2(Q)}<\infty$. For $f,g\in\mathcal F$, put $h=f/M$ and $k=g/M$. Then
\begin{align*}
\|h-k\|_{L^2(Q)}=\frac{1}{M}\|f-g\|_{L^2(Q)}
\end{align*}
and
\begin{align*}
\|H\|_{L^2(Q)}=\frac{1}{M}\|F\|_{L^2(Q)}.
\end{align*}
Therefore, for every $\varepsilon>0$,
\begin{align*}
N\left(\varepsilon\|H\|_{L^2(Q)},\mathcal H,L^2(Q)\right)=N\left(\varepsilon\|F\|_{L^2(Q)},\mathcal F,L^2(Q)\right).
\end{align*}
Taking the supremum over finitely supported probability measures $Q$ and integrating over $\varepsilon\in(0,\delta)$ with respect to $\mathcal L^1$ gives
\begin{align*}
J(\delta,\mathcal H,H)=J(\delta,\mathcal F,F).
\end{align*}[/step]
[guided]The normalized entropy integral is designed to be invariant under multiplying both the class and the envelope by the same positive constant. Fix a finitely supported probability measure $Q$ on $(S,\mathcal S)$ with $0<\|F\|_{L^2(Q)}<\infty$. If $h=f/M$ and $k=g/M$, then the $L^2(Q)$ distance scales by the same factor:
\begin{align*}
\|h-k\|_{L^2(Q)}=\frac{1}{M}\|f-g\|_{L^2(Q)}.
\end{align*}
The envelope norm also scales by that factor:
\begin{align*}
\|H\|_{L^2(Q)}=\frac{1}{M}\|F\|_{L^2(Q)}.
\end{align*}
Thus an open $L^2(Q)$-ball of radius $\varepsilon\|H\|_{L^2(Q)}$ around $h=f/M$ corresponds exactly, after multiplying by $M$, to an open $L^2(Q)$-ball of radius $\varepsilon\|F\|_{L^2(Q)}$ around $f$. Hence the covering numbers agree:
\begin{align*}
N\left(\varepsilon\|H\|_{L^2(Q)},\mathcal H,L^2(Q)\right)=N\left(\varepsilon\|F\|_{L^2(Q)},\mathcal F,L^2(Q)\right).
\end{align*}
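A sketch of the correspondence in one direction, with the centers $f_1,\dots,f_N$ introduced here only for illustration: suppose every $f\in\mathcal F$ satisfies $\|f-f_j\|_{L^2(Q)}<\varepsilon\|F\|_{L^2(Q)}$ for some $j\in\{1,\dots,N\}$. Then for $h=f/M$,
\begin{align*}
\left\|h-\frac{f_j}{M}\right\|_{L^2(Q)}=\frac{1}{M}\|f-f_j\|_{L^2(Q)}<\frac{\varepsilon}{M}\|F\|_{L^2(Q)}=\varepsilon\|H\|_{L^2(Q)},
\end{align*}
so $f_1/M,\dots,f_N/M$ cover $\mathcal H$ at radius $\varepsilon\|H\|_{L^2(Q)}$. The reverse direction follows in the same way after multiplying by $M$.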
Taking the supremum over the same finitely supported probability measures $Q$ and integrating over $\varepsilon\in(0,\delta)$ with respect to $\mathcal L^1$ yields
\begin{align*}
J(\delta,\mathcal H,H)=J(\delta,\mathcal F,F).
\end{align*}[/guided]
[step:Multiply the scaled estimate by $M$]For $h=f/M$, linearity of $P_n$ and $P$ gives
\begin{align*}
G_nh=\frac{1}{M}G_nf.
\end{align*}
Consequently,
\begin{align*}
\sup_{h\in\mathcal H}|G_nh|=\frac{1}{M}\sup_{f\in\mathcal F}|G_nf|.
\end{align*}
Multiplying the estimate from [citetheorem:9843] by $\sqrt n$ gives
\begin{align*}
\mathbb E\left[\sup_{h\in\mathcal H}|G_nh|\right]\le C_0\left\{J(\delta,\mathcal H,H)+\frac{J(\delta,\mathcal H,H)^2}{\delta^2\sqrt n}\right\}.
\end{align*}
Using $\delta=\sigma/M$ and $J(\delta,\mathcal H,H)=J(\delta,\mathcal F,F)$, we obtain
\begin{align*}
\frac{1}{M}\mathbb E\left[\sup_{f\in\mathcal F}|G_nf|\right]\le C_0\left\{J\left(\frac{\sigma}{M},\mathcal F,F\right)+\frac{J\left(\frac{\sigma}{M},\mathcal F,F\right)^2}{(\sigma/M)^2\sqrt n}\right\}.
\end{align*}
Multiplying by $M$ yields
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_nf|\right]\le C_0\left\{M J\left(\frac{\sigma}{M},\mathcal F,F\right)+\frac{M^3J\left(\frac{\sigma}{M},\mathcal F,F\right)^2}{\sigma^2\sqrt n}\right\}.
\end{align*}
Taking $C:=C_0$ proves the asserted inequality. If the entropy integral is infinite, the right-hand side is $+\infty$ and the inequality holds trivially. This completes the proof.[/step]
[guided]The bound from [citetheorem:9843] controls $(P_n-P)h$, while the theorem is stated for $G_nh=\sqrt n(P_n-P)h$. Multiplying the localized estimate by $\sqrt n$ gives
\begin{align*}
\mathbb E\left[\sup_{h\in\mathcal H}|G_nh|\right]\le C_0\left\{J(\delta,\mathcal H,H)+\frac{J(\delta,\mathcal H,H)^2}{\delta^2\sqrt n}\right\}.
\end{align*}
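For the record, the second term comes from an elementary simplification of the factor in front of the squared entropy integral (no new assumptions are involved):
\begin{align*}
\sqrt n\cdot\frac{1}{n\delta^2}J(\delta,\mathcal H,H)^2=\frac{J(\delta,\mathcal H,H)^2}{\delta^2\sqrt n},
\end{align*}
while the first term uses $\sqrt n\cdot n^{-1/2}=1$.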
Now we return from the scaled class $\mathcal H$ to the original class $\mathcal F$. If $h=f/M$, then by linearity of $P_n$ and $P$,
\begin{align*}
G_nh=\sqrt n\left(P_n(f/M)-P(f/M)\right)=\frac{1}{M}G_nf.
\end{align*}
Therefore
\begin{align*}
\sup_{h\in\mathcal H}|G_nh|=\frac{1}{M}\sup_{f\in\mathcal F}|G_nf|.
\end{align*}
The previous step proved that the entropy integral is unchanged by the scaling, and the definition of $\delta$ gives $\delta=\sigma/M$. Substituting these two facts into the displayed estimate yields
\begin{align*}
\frac{1}{M}\mathbb E\left[\sup_{f\in\mathcal F}|G_nf|\right]\le C_0\left\{J\left(\frac{\sigma}{M},\mathcal F,F\right)+\frac{J\left(\frac{\sigma}{M},\mathcal F,F\right)^2}{(\sigma/M)^2\sqrt n}\right\}.
\end{align*}
Finally multiply both sides by $M$ and simplify the second term:
\begin{align*}
M\cdot\frac{1}{(\sigma/M)^2\sqrt n}=\frac{M^3}{\sigma^2\sqrt n}.
\end{align*}
Thus
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_nf|\right]\le C_0\left\{M J\left(\frac{\sigma}{M},\mathcal F,F\right)+\frac{M^3J\left(\frac{\sigma}{M},\mathcal F,F\right)^2}{\sigma^2\sqrt n}\right\}.
\end{align*}
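As a consistency check, consider the special case $M=1$, in which $\delta=\sigma$, $\mathcal H=\mathcal F$, and $H=F$. The display then reduces to
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}|G_nf|\right]\le C_0\left\{J(\sigma,\mathcal F,F)+\frac{J(\sigma,\mathcal F,F)^2}{\sigma^2\sqrt n}\right\},
\end{align*}
which is exactly $\sqrt n$ times the unit-bounded estimate with $\delta=\sigma$, as one would expect.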
Taking $C:=C_0$ gives the asserted universal constant. If the entropy integral is infinite, the right-hand side is $+\infty$, so the inequality holds trivially.[/guided]