[proofplan]
Fix two functions $f,g\in\mathcal F$ and write the increment $G_n(f)-G_n(g)$ as $\sqrt n$ times the centered empirical average of the single [bounded function](/page/Bounded%20Function) $h=f-g$. The function $h$ takes values in $[-1,1]$, so its range length is $2$, and its variance under $P$ is bounded by $P h^2=d(f,g)^2$. Applying the bounded [Bernstein inequality for empirical averages](/theorems/9839) with threshold $u/\sqrt n$ then yields the denominator $2d(f,g)^2+4u/(3\sqrt n)$ of the stated bound.
[/proofplan]
[step:Rewrite the increment as a centered empirical average of $h=f-g$]Fix $f,g\in\mathcal F$ and $u>0$. Define the [measurable function](/page/Measurable%20Function)
\begin{align*}
h:S\to[-1,1],\qquad h(x):=f(x)-g(x).
\end{align*}
Since $f$ and $g$ take values in $[0,1]$, the range inclusion $h(S)\subset[-1,1]$ holds. In particular $P|h|<\infty$ and $P h^2<\infty$.
By linearity of $P_n$ and $P$ on bounded [measurable functions](/page/Measurable%20Functions),
\begin{align*}
G_n(f)-G_n(g)=\sqrt n\{(P_n f-Pf)-(P_n g-Pg)\}.
\end{align*}
Thus
\begin{align*}
G_n(f)-G_n(g)=\sqrt n\,(P_n h-P h).
\end{align*}
Equivalently,
\begin{align*}
G_n(f)-G_n(g)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\left(h(X_i)-P h\right).
\end{align*}[/step]
[guided]Fix $f,g\in\mathcal F$ and $u>0$. The increment of the empirical process should be rewritten as the empirical average of one function, so define
\begin{align*}
h:S\to[-1,1],\qquad h(x):=f(x)-g(x).
\end{align*}
The function $h$ is measurable because $f$ and $g$ are measurable, and $h(S)\subset[-1,1]$ because both $f$ and $g$ take values in $[0,1]$. Hence $h$ is bounded, so $P|h|<\infty$ and $P h^2<\infty$.
By the definitions of $G_n$, $P_n$, and $P$, and by linearity of finite sums and integrals for bounded measurable functions,
\begin{align*}
G_n(f)-G_n(g)=\sqrt n\{(P_n f-Pf)-(P_n g-Pg)\}.
\end{align*}
The two empirical averages combine into $P_n(f-g)=P_nh$, and the two population averages combine into $P(f-g)=Ph$. Therefore
\begin{align*}
G_n(f)-G_n(g)=\sqrt n\,(P_n h-P h).
\end{align*}
Expanding the definition of $P_nh$ gives the equivalent centered-sum form
\begin{align*}
G_n(f)-G_n(g)=\frac{1}{\sqrt n}\sum_{i=1}^{n}\left(h(X_i)-P h\right).
\end{align*}
This identity is the reduction from an empirical-process increment to a scalar empirical-average deviation.[/guided]
[step:Verify the variance and boundedness hypotheses for Bernstein's inequality]The random variables
\begin{align*}
h(X_i):(\Omega,\mathcal E)\to(\mathbb R,\mathcal B(\mathbb R))
\end{align*}
are independent and identically distributed because $X_1,\dots,X_n$ are independent and identically distributed and $h$ is measurable. Their common expectation is
\begin{align*}
\mathbb E[h(X_i)]=\int_S h(x)\,dP(x)=P h.
\end{align*}
Their common variance is, by definition,
\begin{align*}
\operatorname{Var}(h(X_i))=\mathbb E\left[\left(h(X_i)-P h\right)^2\right].
\end{align*}
Using the identity $\operatorname{Var}(Z)=\mathbb E[Z^2]-(\mathbb E[Z])^2$ for the square-integrable real-valued [random variable](/page/Random%20Variable) $Z=h(X_i)$, we get
\begin{align*}
\operatorname{Var}(h(X_i))=P h^2-(P h)^2\le P h^2.
\end{align*}
By the definition of $d$,
\begin{align*}
P h^2=\int_S |f(x)-g(x)|^2\,dP(x)=d(f,g)^2.
\end{align*}
Moreover $h(S)\subset[-1,1]$, so the bounded empirical-average Bernstein inequality applies to $h$ with lower bound $a=-1$, upper bound $b=1$, and variance proxy $d(f,g)^2$.[/step]
[guided]The point of introducing $h=f-g$ is that an increment of the empirical process is just the empirical process applied to one bounded function. Because $f,g:S\to[0,1]$, the difference function
\begin{align*}
h:S\to[-1,1],\qquad h(x):=f(x)-g(x)
\end{align*}
is measurable and bounded. Hence $h(X_i)$ is a real-valued random variable for each $1\le i\le n$, and it is integrable and square-integrable.
We must now check the hypotheses needed for the Bernstein inequality. First, the random variables $h(X_1),\dots,h(X_n)$ are independent and identically distributed: independence is preserved under applying the same measurable map $h$ to the independent variables $X_i$, and each $X_i$ has distribution $P$. Their common expectation is computed by the change-of-variables formula for the distribution $P$ of $X_i$:
\begin{align*}
\mathbb E[h(X_i)]=\int_S h(x)\,dP(x)=P h.
\end{align*}
Second, Bernstein needs a variance bound. For each $i$, the variance identity gives
\begin{align*}
\operatorname{Var}(h(X_i))=\mathbb E\left[\left(h(X_i)-P h\right)^2\right]=P h^2-(P h)^2.
\end{align*}
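For completeness, the second equality can be checked by expanding the square. Writing $\mu:=P h$ (a shorthand introduced only for this display) and using $\mathbb E[h(X_i)]=\mu$ and $\mathbb E[h(X_i)^2]=P h^2$,
\begin{align*}
\mathbb E\left[\left(h(X_i)-\mu\right)^2\right]=\mathbb E[h(X_i)^2]-2\mu\,\mathbb E[h(X_i)]+\mu^2=P h^2-2\mu^2+\mu^2=P h^2-(P h)^2.
\end{align*}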
Since $(P h)^2\ge 0$, this implies
\begin{align*}
\operatorname{Var}(h(X_i))\le P h^2.
\end{align*}
The quantity on the right is exactly the squared $L^2(P)$ distance between $f$ and $g$:
\begin{align*}
P h^2=\int_S |f(x)-g(x)|^2\,dP(x)=d(f,g)^2.
\end{align*}
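As an illustration, assume (hypothetically, only for this example) that $f=\mathbf 1_A$ and $g=\mathbf 1_B$ are indicators of measurable sets $A,B\subset S$. Then $h=\mathbf 1_A-\mathbf 1_B$ and $h^2=\mathbf 1_{A\triangle B}$, so the quantities above take the form
\begin{align*}
P h=P(A)-P(B),\qquad P h^2=P(A\triangle B)=d(f,g)^2,\qquad \operatorname{Var}(h(X_i))=P(A\triangle B)-\left(P(A)-P(B)\right)^2\le P(A\triangle B).
\end{align*}
In this special case the variance proxy is simply the $P$-measure of the symmetric difference of the two sets.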
Third, Bernstein needs a bounded range. The inclusion $h(S)\subset[-1,1]$ gives lower and upper bounds $a=-1$ and $b=1$, so the range length is $b-a=2$. These are precisely the constants that will produce the term $4u/(3\sqrt n)$ after substituting the threshold $u/\sqrt n$.[/guided]
[step:Apply Bernstein's inequality with threshold $u/\sqrt n$]Apply [citetheorem:9839] to the measurable function $h:S\to[-1,1]$ with threshold
\begin{align*}
t:=\frac{u}{\sqrt n}>0.
\end{align*}
The hypotheses have been verified above: $h$ is measurable, $-1\le h\le 1$, and the variance is bounded by $d(f,g)^2$. Therefore
\begin{align*}
\mathbb P\left(|(P_n-P)h|>t\right)\le 2\exp\left(-\frac{n t^2}{2d(f,g)^2+2(1-(-1))t/3}\right).
\end{align*}
Since $1-(-1)=2$, this becomes
\begin{align*}
\mathbb P\left(|(P_n-P)h|>t\right)\le 2\exp\left(-\frac{n t^2}{2d(f,g)^2+4t/3}\right).
\end{align*}
Substituting $t=u/\sqrt n$ gives $n t^2=u^2$ and $4t/3=4u/(3\sqrt n)$, and the event $\{|(P_n-P)h|>t\}$ coincides with $\{\sqrt n\,|(P_n-P)h|>u\}$; hence
\begin{align*}
\mathbb P\left(\sqrt n\,|(P_n-P)h|>u\right)\le 2\exp\left(-\frac{u^2}{2d(f,g)^2+4u/(3\sqrt n)}\right).
\end{align*}
Using $G_n(f)-G_n(g)=\sqrt n(P_n-P)h$, we obtain
\begin{align*}
\mathbb P\left(|G_n(f)-G_n(g)|>u\right)\le 2\exp\left(-\frac{u^2}{2d(f,g)^2+4u/(3\sqrt n)}\right).
\end{align*}
This is the desired increment bound.[/step]
[guided]We now convert the scalar Bernstein estimate into the stated empirical-process increment estimate. Define
\begin{align*}
t:=\frac{u}{\sqrt n}.
\end{align*}
Since $u>0$ and $n\ge 1$, we have $t>0$. The theorem [citetheorem:9839] applies to the measurable function $h:S\to[-1,1]$ because the previous step verified that $h(X_1),\dots,h(X_n)$ are independent and identically distributed, that $h$ has lower bound $a=-1$ and upper bound $b=1$, and that the common variance is bounded by $d(f,g)^2$. Therefore
\begin{align*}
\mathbb P\left(|(P_n-P)h|>t\right)\le 2\exp\left(-\frac{n t^2}{2d(f,g)^2+2(1-(-1))t/3}\right).
\end{align*}
The range length is $1-(-1)=2$, so the denominator becomes
\begin{align*}
2d(f,g)^2+2(1-(-1))t/3=2d(f,g)^2+4t/3.
\end{align*}
Substituting $t=u/\sqrt n$ gives
\begin{align*}
n t^2=u^2
\end{align*}
and
\begin{align*}
4t/3=\frac{4u}{3\sqrt n}.
\end{align*}
Since the events $\{|(P_n-P)h|>t\}$ and $\{\sqrt n\,|(P_n-P)h|>u\}$ coincide, it follows that
\begin{align*}
\mathbb P\left(\sqrt n\,|(P_n-P)h|>u\right)\le 2\exp\left(-\frac{u^2}{2d(f,g)^2+4u/(3\sqrt n)}\right).
\end{align*}
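As a rough numerical illustration, take the hypothetical values $n=100$, $u=1$, and $d(f,g)=0.1$ (chosen only for this example). The right-hand side then evaluates to
\begin{align*}
2\exp\left(-\frac{1}{2(0.1)^2+4/(3\cdot 10)}\right)=2\exp\left(-\frac{1}{0.02+0.1333\ldots}\right)\approx 2e^{-6.52}\approx 0.003.
\end{align*}
In this instance the range term $4u/(3\sqrt n)\approx 0.133$ is larger than the variance term $2d(f,g)^2=0.02$.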
Finally, the first step proved the identity
\begin{align*}
G_n(f)-G_n(g)=\sqrt n(P_n-P)h.
\end{align*}
Substituting this identity into the event $\{\sqrt n\,|(P_n-P)h|>u\}$ on the left-hand side of the previous probability bound gives
\begin{align*}
\mathbb P\left(|G_n(f)-G_n(g)|>u\right)\le 2\exp\left(-\frac{u^2}{2d(f,g)^2+4u/(3\sqrt n)}\right),
\end{align*}
which is exactly the desired bound.[/guided]