[proofplan]
Fix a target accuracy and cover $\mathcal F$ by finitely many $L^1(P)$ brackets of very small width. For a function inside one bracket, compare $P_n f-Pf$ to the empirical errors of the two bracket endpoints and to the empirical and true widths of the bracket. The endpoint errors and the empirical widths involve only finitely many integrable functions, so the [finite-class uniform law of large numbers](/theorems/9816) applies. Choosing the bracket width small relative to the target accuracy gives convergence of the full supremum to zero in probability.
[/proofplan]
[step:Choose a finite bracket cover with small $L^1(P)$ width]
Fix $\eta>0$. Choose a number $\delta>0$ satisfying $4\delta<\eta$. By the bracketing hypothesis, there exist an integer $J\in\mathbb N$ and measurable $P$-integrable functions
\begin{align*}
l_j:S\to\mathbb R,\qquad u_j:S\to\mathbb R,\qquad 1\le j\le J,
\end{align*}
such that $l_j\le u_j$ pointwise,
\begin{align*}
P(u_j-l_j)<\delta
\end{align*}
for every $1\le j\le J$, and for every $f\in\mathcal F$ there exists $j\in\{1,\dots,J\}$ such that $l_j\le f\le u_j$ pointwise.
For later use, define the finite endpoint class
\begin{align*}
\mathcal E:=\{l_1,u_1,\dots,l_J,u_J\}
\end{align*}
and the finite width class
\begin{align*}
\mathcal W:=\{w_1,\dots,w_J\},
\end{align*}
where each width function $w_j:S\to[0,\infty)$ is given by
\begin{align*}
w_j(x):=u_j(x)-l_j(x).
\end{align*}
Since $l_j$ and $u_j$ are $P$-integrable, each $w_j$ is $P$-integrable.
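To make the bracketing hypothesis concrete, here is an illustration only; the setting is an assumption made for this example and is not part of the hypotheses. Take $S=\mathbb R$, suppose $P$ has a continuous distribution function $F$, and let $\mathcal F=\{\mathbf 1_{(-\infty,t]}:t\in\mathbb R\}$. Pick an integer $J>1/\delta$ and points $-\infty=t_0<t_1<\dots<t_J=+\infty$ with $F(t_j)=j/J$ for $1\le j<J$. One possible bracket cover is then
\begin{align*}
l_j:=\mathbf 1_{(-\infty,t_{j-1}]},\qquad u_j:=\mathbf 1_{(-\infty,t_j)},\qquad P(u_j-l_j)=F(t_j)-F(t_{j-1})=\frac1J<\delta,
\end{align*}
with the conventions $l_1=0$ and $u_J=1$. Every $t\in\mathbb R$ lies in exactly one interval $[t_{j-1},t_j)$, and for that index $l_j\le \mathbf 1_{(-\infty,t]}\le u_j$ pointwise.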
[/step]
[step:Control each function by its bracket endpoints and bracket width]Let $f\in\mathcal F$, and choose $j\in\{1,\dots,J\}$ such that $l_j\le f\le u_j$ pointwise. Since $l_j\le f\le u_j$, we have
\begin{align*}
P_n f-Pf\le P_n u_j-P l_j.
\end{align*}
Adding and subtracting $P u_j$ gives
\begin{align*}
P_n f-Pf\le (P_n u_j-Pu_j)+P(u_j-l_j).
\end{align*}
Also, using $l_j\le f\le u_j$ in the opposite direction,
\begin{align*}
Pf-P_n f\le P u_j-P_n l_j.
\end{align*}
Adding and subtracting $P_n u_j$ gives
\begin{align*}
Pf-P_n f\le (P u_j-P_n u_j)+P_n(u_j-l_j).
\end{align*}
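Since $u_j\in\mathcal E$, $P(u_j-l_j)<\delta$, and $0\le P_n(u_j-l_j)=P_nw_j\le\max_{1\le k\le J}P_nw_k$, both one-sided bounds are dominated by the same quantity. Therefore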
\begin{align*}
|P_n f-Pf|\le \max_{h\in\mathcal E}|P_nh-Ph|+\max_{1\le k\le J}P_n w_k+\delta.
\end{align*}
Taking the supremum over $f\in\mathcal F$ yields
\begin{align*}
\sup_{f\in\mathcal F}|P_n f-Pf|\le \max_{h\in\mathcal E}|P_nh-Ph|+\max_{1\le k\le J}P_n w_k+\delta.
\end{align*}[/step]
[guided]The point of bracketing is that we do not approximate $f$ by a single function; instead, we trap it between a lower endpoint and an upper endpoint. Fix $f\in\mathcal F$. By construction of the bracket cover, there is some index $j\in\{1,\dots,J\}$ such that
\begin{align*}
l_j(x)\le f(x)\le u_j(x)
\end{align*}
for every $x\in S$.
Because empirical averaging preserves pointwise inequalities, the upper bound $f\le u_j$ gives
\begin{align*}
P_n f\le P_n u_j.
\end{align*}
Because integration with respect to $P$ also preserves pointwise inequalities, the lower bound $l_j\le f$ gives
\begin{align*}
P l_j\le P f.
\end{align*}
Combining these two inequalities,
\begin{align*}
P_n f-Pf\le P_n u_j-P l_j.
\end{align*}
Now we insert the missing quantity $P u_j$ so that an empirical error appears:
\begin{align*}
P_n u_j-P l_j=(P_n u_j-Pu_j)+P(u_j-l_j).
\end{align*}
The first term is an empirical fluctuation of the fixed endpoint $u_j$, while the second term is the true width of the bracket. Since the bracket was chosen with $P(u_j-l_j)<\delta$, this gives
\begin{align*}
P_n f-Pf\le (P_n u_j-Pu_j)+\delta.
\end{align*}
For the lower tail, we reverse the comparison. The inequalities $l_j\le f\le u_j$ imply
\begin{align*}
Pf\le Pu_j
\end{align*}
and
\begin{align*}
P_n l_j\le P_n f.
\end{align*}
Hence
\begin{align*}
Pf-P_n f\le P u_j-P_n l_j.
\end{align*}
We insert $P_n u_j$ because this exposes the empirical bracket width:
\begin{align*}
P u_j-P_n l_j=(P u_j-P_n u_j)+P_n(u_j-l_j).
\end{align*}
Thus
\begin{align*}
Pf-P_n f\le |P_nu_j-Pu_j|+P_n w_j,
\end{align*}
where $w_j:S\to[0,\infty)$ is defined by $w_j(x)=u_j(x)-l_j(x)$.
Combining the upper and lower estimates, and then allowing the worst endpoint and worst width among the finitely many brackets, gives
\begin{align*}
|P_n f-Pf|\le \max_{h\in\mathcal E}|P_nh-Ph|+\max_{1\le k\le J}P_n w_k+\delta.
\end{align*}
Since this estimate holds for every $f\in\mathcal F$, taking the supremum over $\mathcal F$ gives
\begin{align*}
\sup_{f\in\mathcal F}|P_n f-Pf|\le \max_{h\in\mathcal E}|P_nh-Ph|+\max_{1\le k\le J}P_n w_k+\delta.
\end{align*}[/guided]
[step:Apply the finite-class law to endpoints and bracket widths]
The finite endpoint class $\mathcal E$ consists of $P$-integrable [measurable functions](/page/Measurable%20Functions). Hence, by the [Finite-Class Uniform Law of Large Numbers][citetheorem:9816],
\begin{align*}
\max_{h\in\mathcal E}|P_nh-Ph|\xrightarrow{\mathbb P}0.
\end{align*}
The finite width class $\mathcal W$ also consists of $P$-integrable measurable functions. Applying the same result to $\mathcal W$ gives
\begin{align*}
\max_{1\le k\le J}|P_nw_k-Pw_k|\xrightarrow{\mathbb P}0.
\end{align*}
Since $Pw_k<\delta$ for every $k$, we have
\begin{align*}
\max_{1\le k\le J}P_n w_k
\le \max_{1\le k\le J}|P_nw_k-Pw_k|+\delta.
\end{align*}
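Therefore, on the event $\left\{\max_{1\le k\le J}|P_nw_k-Pw_k|\le\delta\right\}$ we have $\max_{1\le k\le J}P_n w_k\le 2\delta$, and so
\begin{align*}
\mathbb P\left(\max_{1\le k\le J}P_n w_k>2\delta\right)
\le \mathbb P\left(\max_{1\le k\le J}|P_nw_k-Pw_k|>\delta\right)\to 0.
\end{align*}
In this sense, $\max_{1\le k\le J}P_n w_k$ is bounded above by $2\delta$ with probability tending to one.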
[/step]
[step:Choose the bracket width to force the supremum below the target accuracy]
Combining the previous estimates, for every $n\in\mathbb N$,
\begin{align*}
\sup_{f\in\mathcal F}|P_n f-Pf|
\le \max_{h\in\mathcal E}|P_nh-Ph|+\max_{1\le k\le J}P_n w_k+\delta.
\end{align*}
Hence
\begin{align*}
\mathbb P\left(\sup_{f\in\mathcal F}|P_n f-Pf|>\eta\right)
\le
\mathbb P\left(\max_{h\in\mathcal E}|P_nh-Ph|>\eta-3\delta\right)
+
\mathbb P\left(\max_{1\le k\le J}P_n w_k>2\delta\right).
\end{align*}
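One way to check this bound is through an inclusion of events. Write $A_n:=\max_{h\in\mathcal E}|P_nh-Ph|$ and $B_n:=\max_{1\le k\le J}P_n w_k$; this shorthand is introduced only for this remark. If $A_n\le\eta-3\delta$ and $B_n\le 2\delta$, then $\sup_{f\in\mathcal F}|P_n f-Pf|\le A_n+B_n+\delta\le\eta$. Taking the contrapositive,
\begin{align*}
\left\{\sup_{f\in\mathcal F}|P_n f-Pf|>\eta\right\}
\subseteq
\left\{A_n>\eta-3\delta\right\}\cup\left\{B_n>2\delta\right\},
\end{align*}
and the union bound gives the displayed inequality. For instance, with the admissible choice $\delta=\eta/5$, the threshold $\eta-3\delta$ equals $2\eta/5$.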
Because $4\delta<\eta$, we have $\eta-3\delta>\delta>0$. The first probability tends to $0$ by the finite-class uniform law applied to $\mathcal E$. The second tends to $0$ by the previous step, which combines the finite-class uniform law applied to $\mathcal W$ with the bound $Pw_k<\delta$. Consequently,
\begin{align*}
\lim_{n\to\infty}\mathbb P\left(\sup_{f\in\mathcal F}|P_n f-Pf|>\eta\right)=0.
\end{align*}
Since $\eta>0$ was arbitrary, this proves
\begin{align*}
\sup_{f\in\mathcal F}|P_n f-P f|\xrightarrow{\mathbb P}0.
\end{align*}
Thus $\mathcal F$ is $P$-Glivenko-Cantelli. If the supremum is not measurable, the same argument is read with outer probability throughout.
[/step]