[proofplan]
We first convert the absolute uniform empirical deviation over $\mathcal H$ into a one-sided supremum over the symmetrized class $\mathcal H^\pm$. The expectation of this supremum is bounded by the empirical Rademacher complexity through the symmetrization inequality. A bounded-differences argument then upgrades the expected bound to a high-probability bound, using the fact that the original functions take values in $[0,1]$. Finally, the empirical-risk-minimization statement follows from the deterministic inequality that excess risk is at most twice the uniform deviation.
[/proofplan]
[step:Rewrite the absolute deviation as a one-sided supremum over the symmetrized class]
Define the map $S:\Omega\to[0,\infty)$ by
\begin{align*}
S(\omega):=\sup_{h\in\mathcal H}|(P_n-P)h(\omega)|.
\end{align*}
Since $\mathcal H$ is pointwise measurable, the usual countable separability reduction for pointwise measurable classes makes this supremum measurable. For every $\omega\in\Omega$ and every $h\in\mathcal H$,
\begin{align*}
|(P_n-P)h(\omega)|=\max\{(P_n-P)h(\omega),(P_n-P)(-h)(\omega)\}.
\end{align*}
Since $\mathcal H^\pm=\mathcal H\cup\{-h:h\in\mathcal H\}$, it follows that
\begin{align*}
S=\sup_{g\in\mathcal H^\pm}(P_n-P)g.
\end{align*}
Every $g\in\mathcal H^\pm$ is measurable and satisfies $|g|\le 1$, so $P|g|<\infty$.
[guided]
The absolute value is the reason for introducing $\mathcal H^\pm$. For a fixed function $h\in\mathcal H$, the quantity $(P_n-P)h$ can be positive or negative, and taking absolute values is the same as allowing the sign of the function to change. More precisely, for every sample point $\omega\in\Omega$,
\begin{align*}
|(P_n-P)h(\omega)|=\max\{(P_n-P)h(\omega),-(P_n-P)h(\omega)\}.
\end{align*}
By linearity of $P_n$ and $P$ on bounded [measurable functions](/page/Measurable%20Functions),
\begin{align*}
-(P_n-P)h(\omega)=(P_n-P)(-h)(\omega).
\end{align*}
Thus the largest absolute deviation over $h\in\mathcal H$ is exactly the largest one-sided deviation over the enlarged class containing both $h$ and $-h$:
\begin{align*}
\sup_{h\in\mathcal H}|(P_n-P)h(\omega)|=\sup_{g\in\mathcal H^\pm}(P_n-P)g(\omega).
\end{align*}
The measurability of this supremum is the point of the pointwise measurability assumption: it permits replacing the supremum over $\mathcal H$ by a supremum over a countable pointwise-dense subclass, so that the supremum is a measurable [random variable](/page/Random%20Variable). Finally, because each $h$ takes values in $[0,1]$, every $g\in\mathcal H^\pm$ takes values in $[-1,1]$, and hence $P|g|<\infty$.
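As a small illustrative check, with hypothetical numbers that are not taken from the theorem, suppose $\mathcal H=\{h_1,h_2\}$ and that at some sample point $(P_n-P)h_1=0.1$ and $(P_n-P)h_2=-0.3$. Then
\begin{align*}
\sup_{h\in\mathcal H}|(P_n-P)h|&=\max\{0.1,\,0.3\}=0.3,\\
\sup_{g\in\mathcal H^\pm}(P_n-P)g&=\max\{0.1,\,-0.3,\,-0.1,\,0.3\}=0.3,
\end{align*}
whereas the one-sided supremum over $\mathcal H$ alone would be only $0.1$. The negated functions are what recover the large negative deviation of $h_2$.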
[/guided]
[/step]
[step:Bound the expected uniform deviation by symmetrization]
We apply the [[Symmetrization Inequality for Empirical Processes](/theorems/9851)][citetheorem:9851] to the class $\mathcal H^\pm$. Pointwise measurability passes from $\mathcal H$ to $\mathcal H^\pm$ because the countable pointwise-dense subclass for $\mathcal H$ together with its negatives is countable and pointwise dense in $\mathcal H^\pm$. Each $g\in\mathcal H^\pm$ is measurable and satisfies $|g|\le 1$, so $P|g|<\infty$ and the empirical and population averages are integrable. The same countable reduction makes the suprema in the symmetrization inequality measurable, and they have finite expectation because $|(P_n-P)g|\le 2$ and $\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i g(Z_i)\right|\le 1$ for every $g\in\mathcal H^\pm$. Therefore
\begin{align*}
\mathbb E[S]\le 2\mathbb E\left[\mathbb E_\varepsilon\left[\sup_{g\in\mathcal H^\pm}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i g(Z_i)\right]\right].
\end{align*}
By the definition of $\mathfrak R_n$,
\begin{align*}
\mathbb E[S]\le 2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right].
\end{align*}
[guided]
We now turn the deterministic rewriting from the previous step into an expectation bound. The theorem we use is the [Symmetrization Inequality for Empirical Processes][citetheorem:9851], applied with the function class $\mathcal H^\pm$ and the sample $Z_1,\dots,Z_n$.
We verify its hypotheses. First, $\mathcal H^\pm$ consists of measurable functions because every element is either $h$ or $-h$ for some measurable $h\in\mathcal H$. Second, every $g\in\mathcal H^\pm$ satisfies $|g|\le 1$, and therefore
\begin{align*}
P|g|=\int_{\mathcal Z}|g(z)|\,dP(z)\le 1<\infty.
\end{align*}
Third, the relevant suprema are measurable. Indeed, pointwise measurability of $\mathcal H$ gives a countable subclass whose pointwise evaluations determine suprema over $\mathcal H$; adjoining the negatives of this countable subclass gives a countable pointwise-dense subclass of $\mathcal H^\pm$. Thus the supremum over $\mathcal H^\pm$ is the supremum of a countable family of measurable random variables. Since the summands are bounded by $1$ in absolute value, these suprema have finite expectation.
The symmetrization inequality therefore gives
\begin{align*}
\mathbb E[S]\le 2\mathbb E\left[\mathbb E_\varepsilon\left[\sup_{g\in\mathcal H^\pm}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i g(Z_i)\right]\right].
\end{align*}
The inner expectation is exactly the empirical Rademacher complexity of $\mathcal H^\pm$ at the realised sample. Hence, by the definition of $\mathfrak R_n$,
\begin{align*}
\mathbb E[S]\le 2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right].
\end{align*}
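To get a feel for the size of the right-hand side, consider the simplest case of a singleton class $\mathcal H=\{h\}$, an assumption made only for this illustration, so that $\mathcal H^\pm=\{h,-h\}$. Assuming, as is standard, that $\varepsilon_1,\dots,\varepsilon_n$ are independent uniform signs, Jensen's inequality and $0\le h\le 1$ give
\begin{align*}
\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)
&=\mathbb E_\varepsilon\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i h(Z_i)\right|
\le\left(\mathbb E_\varepsilon\left[\left(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i h(Z_i)\right)^{2}\right]\right)^{1/2}\\
&=\frac{1}{n}\left(\sum_{i=1}^{n}h(Z_i)^2\right)^{1/2}
\le\frac{1}{\sqrt n}.
\end{align*}
In this special case the symmetrization bound therefore reads $\mathbb E[S]\le 2/\sqrt n$. The cross terms vanish in the second line because $\mathbb E_\varepsilon[\varepsilon_i\varepsilon_k]=0$ for $i\ne k$ and $\varepsilon_i^2=1$.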
[/guided]
[/step]
[step:Apply bounded differences to the original bounded class]
Define the deterministic function $\Phi:\mathcal Z^n\to[0,\infty)$ by
\begin{align*}
\Phi(z_1,\dots,z_n):=\sup_{h\in\mathcal H}\left|\frac{1}{n}\sum_{i=1}^{n}h(z_i)-P h\right|.
\end{align*}
Then $S=\Phi(Z_1,\dots,Z_n)$. Let $(z_1,\dots,z_n)\in\mathcal Z^n$, let $(z_1',\dots,z_n')\in\mathcal Z^n$, and assume that these two tuples differ only in the $j$-th coordinate for some $j\in\{1,\dots,n\}$. For each $h\in\mathcal H$,
\begin{align*}
\left|\left(\frac{1}{n}\sum_{i=1}^{n}h(z_i)-P h\right)-\left(\frac{1}{n}\sum_{i=1}^{n}h(z_i')-P h\right)\right|=\frac{1}{n}|h(z_j)-h(z_j')|.
\end{align*}
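Since $0\le h\le 1$, the right-hand side is at most $1/n$. By the reverse triangle inequality $\bigl||a|-|b|\bigr|\le|a-b|$, the absolute deviations for the two tuples also differ by at most $1/n$ for each $h$. The elementary inequality $|\sup_{h}a_h-\sup_{h}b_h|\le\sup_{h}|a_h-b_h|$, valid for bounded families indexed by $h\in\mathcal H$, then gives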
\begin{align*}
|\Phi(z_1,\dots,z_n)-\Phi(z_1',\dots,z_n')|\le \frac{1}{n}.
\end{align*}
Thus $\Phi$ has bounded differences with constants $1/n,\dots,1/n$. By [McDiarmid's bounded differences inequality](/theorems/6072) applied to the independent variables $Z_1,\dots,Z_n$,
\begin{align*}
\mathbb P\left(S-\mathbb E[S]\ge u\right)\le \exp(-2nu^2)
\end{align*}
for every $u>0$. Taking
\begin{align*}
u:=\sqrt{\frac{t}{2n}}
\end{align*}
gives
\begin{align*}
\mathbb P\left(S\le \mathbb E[S]+\sqrt{\frac{t}{2n}}\right)\ge 1-e^{-t}.
\end{align*}
Using the expectation bound from the previous step, we obtain
\begin{align*}
\mathbb P\left(S\le 2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right]+\sqrt{\frac{t}{2n}}\right)\ge 1-e^{-t}.
\end{align*}
This is the first asserted inequality.
[guided]
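The concentration step is carried out with the original class $\mathcal H$, not with $\mathcal H^\pm$, because the bounded-differences constant $1/n$ relies on the range $[0,1]$ of the functions in $\mathcal H$; a direct argument using only the range $[-1,1]$ of $\mathcal H^\pm$ would give the weaker constant $2/n$. Define $\Phi:\mathcal Z^n\to[0,\infty)$ by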
\begin{align*}
\Phi(z_1,\dots,z_n):=\sup_{h\in\mathcal H}\left|\frac{1}{n}\sum_{i=1}^{n}h(z_i)-P h\right|.
\end{align*}
Then the random variable we want to control is exactly
\begin{align*}
S=\Phi(Z_1,\dots,Z_n).
\end{align*}
We verify the bounded-differences hypothesis. Fix two deterministic samples $(z_1,\dots,z_n)$ and $(z_1',\dots,z_n')$ that differ only at coordinate $j$. For a fixed $h\in\mathcal H$, the population term $P h$ is unchanged, and all empirical summands except the $j$-th one cancel. Hence
\begin{align*}
\left|\left(\frac{1}{n}\sum_{i=1}^{n}h(z_i)-P h\right)-\left(\frac{1}{n}\sum_{i=1}^{n}h(z_i')-P h\right)\right|=\frac{1}{n}|h(z_j)-h(z_j')|.
\end{align*}
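Because every $h\in\mathcal H$ takes values in $[0,1]$, we have $|h(z_j)-h(z_j')|\le 1$, so the displayed difference is at most $1/n$. By the reverse triangle inequality, the absolute deviations for the two samples then also differ by at most $1/n$ for each fixed $h$. Two suprema over $\mathcal H$ whose terms differ by at most $1/n$ for every $h$ themselves differ by at most $1/n$, and therefore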
\begin{align*}
|\Phi(z_1,\dots,z_n)-\Phi(z_1',\dots,z_n')|\le \frac{1}{n}.
\end{align*}
We now apply McDiarmid's bounded differences inequality, whose conclusion says that a [measurable function](/page/Measurable%20Function) of independent variables with coordinate sensitivities $c_1,\dots,c_n$ satisfies
\begin{align*}
\mathbb P\left(\Phi(Z_1,\dots,Z_n)-\mathbb E[\Phi(Z_1,\dots,Z_n)]\ge u\right)\le \exp\left(-\frac{2u^2}{\sum_{i=1}^{n}c_i^2}\right).
\end{align*}
Here $c_i=1/n$ for every $i$, so
\begin{align*}
\sum_{i=1}^{n}c_i^2=\sum_{i=1}^{n}\frac{1}{n^2}=\frac{1}{n}.
\end{align*}
Thus
\begin{align*}
\mathbb P\left(S-\mathbb E[S]\ge u\right)\le \exp(-2nu^2).
\end{align*}
Choosing
\begin{align*}
u=\sqrt{\frac{t}{2n}}
\end{align*}
makes the right-hand side equal to $e^{-t}$. Combining this concentration estimate with the symmetrization estimate
\begin{align*}
\mathbb E[S]\le 2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right]
\end{align*}
gives
\begin{align*}
\mathbb P\left(S\le 2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right]+\sqrt{\frac{t}{2n}}\right)\ge 1-e^{-t}.
\end{align*}
This is exactly the asserted high-probability uniform deviation bound.
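The bound is often read in terms of a failure probability. As a sketch, set $t=\log(1/\delta)$ for some $\delta\in(0,1)$; the symbol $\delta$ is introduced here only for illustration and does not appear in the theorem. Then $e^{-t}=\delta$, and the bound becomes
\begin{align*}
\mathbb P\left(S\le 2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right]+\sqrt{\frac{\log(1/\delta)}{2n}}\right)\ge 1-\delta.
\end{align*}
For instance, with the hypothetical values $n=1000$ and $\delta=0.05$, the concentration term is $\sqrt{\log(20)/2000}\approx 0.039$. It decays like $n^{-1/2}$ and depends on $\delta$ only logarithmically.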
[/guided]
[/step]
[step:Convert the uniform deviation bound into the ERM excess-risk bound]
For the ERM consequence, set
\begin{align*}
B_t:=2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right]+\sqrt{\frac{t}{2n}}.
\end{align*}
By the first part,
\begin{align*}
\mathbb P\left(\sup_{f\in\mathcal F_0}|(P_n-P)\ell_f|\le B_t\right)\ge 1-e^{-t}.
\end{align*}
Fix a sample point $\omega\in\Omega$ for which
\begin{align*}
\sup_{f\in\mathcal F_0}|(P_n-P)\ell_f(\omega)|\le B_t
\end{align*}
and for which the selector $\hat f(\omega)$ is evaluated and satisfies the empirical minimization identity. Since
\begin{align*}
P\ell_{\hat f(\omega)}\le P_n\ell_{\hat f(\omega)}(\omega)+B_t
\end{align*}
and
\begin{align*}
P_n\ell_{\hat f(\omega)}(\omega)=\inf_{f\in\mathcal F_0}P_n\ell_f(\omega),
\end{align*}
we have
\begin{align*}
P\ell_{\hat f(\omega)}\le \inf_{f\in\mathcal F_0}P_n\ell_f(\omega)+B_t.
\end{align*}
For every $f\in\mathcal F_0$,
\begin{align*}
P_n\ell_f(\omega)\le P\ell_f+B_t.
\end{align*}
Taking the infimum over $f\in\mathcal F_0$ gives
\begin{align*}
\inf_{f\in\mathcal F_0}P_n\ell_f(\omega)\le \inf_{f\in\mathcal F_0}P\ell_f+B_t.
\end{align*}
Therefore
\begin{align*}
P\ell_{\hat f(\omega)}-\inf_{f\in\mathcal F_0}P\ell_f\le 2B_t.
\end{align*}
Thus the event
\begin{align*}
\left\{\sup_{f\in\mathcal F_0}|(P_n-P)\ell_f|\le B_t\right\}
\end{align*}
is contained in the outer event
\begin{align*}
\left\{P\ell_{\hat f}-\inf_{f\in\mathcal F_0}P\ell_f\le 2B_t\right\}.
\end{align*}
The outer probability of a set is at least the probability of any measurable event contained in it, and hence
\begin{align*}
\mathbb P^*\left(P\ell_{\hat f}-\inf_{f\in\mathcal F_0}P\ell_f\le 2B_t\right)\ge 1-e^{-t}.
\end{align*}
Substituting the definition of $B_t$ proves the stated ERM bound.
[guided]
The ERM part is deterministic once the uniform deviation event is known. Define
\begin{align*}
B_t:=2\mathbb E\left[\mathfrak R_n(\mathcal H^\pm;Z_1,\dots,Z_n)\right]+\sqrt{\frac{t}{2n}}.
\end{align*}
The first part of the theorem gives
\begin{align*}
\mathbb P\left(\sup_{f\in\mathcal F_0}|(P_n-P)\ell_f|\le B_t\right)\ge 1-e^{-t}.
\end{align*}
Fix a sample point $\omega\in\Omega$ on this event and suppose the selector $\hat f(\omega)$ is evaluated and satisfies the empirical minimization identity. The uniform deviation bound applied to $\ell_{\hat f(\omega)}$ gives
\begin{align*}
P\ell_{\hat f(\omega)}\le P_n\ell_{\hat f(\omega)}(\omega)+B_t.
\end{align*}
Since $\hat f(\omega)$ is an empirical risk minimizer,
\begin{align*}
P_n\ell_{\hat f(\omega)}(\omega)=\inf_{f\in\mathcal F_0}P_n\ell_f(\omega).
\end{align*}
For each fixed $f\in\mathcal F_0$, the same uniform deviation event gives
\begin{align*}
P_n\ell_f(\omega)\le P\ell_f+B_t.
\end{align*}
Taking the infimum over $f\in\mathcal F_0$ preserves the inequality, so
\begin{align*}
\inf_{f\in\mathcal F_0}P_n\ell_f(\omega)\le \inf_{f\in\mathcal F_0}P\ell_f+B_t.
\end{align*}
Chaining the first two of these displays with the last one yields
\begin{align*}
P\ell_{\hat f(\omega)}-\inf_{f\in\mathcal F_0}P\ell_f\le 2B_t.
\end{align*}
Thus the high-probability uniform deviation event is contained in the outer event appearing in the theorem statement. Taking outer probability is necessary because $\hat f$ may be nonmeasurable, and it preserves the lower bound from the measurable event. Therefore
\begin{align*}
\mathbb P^*\left(P\ell_{\hat f}-\inf_{f\in\mathcal F_0}P\ell_f\le 2B_t\right)\ge 1-e^{-t}.
\end{align*}
Substituting the displayed definition of $B_t$ gives the asserted ERM bound.
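For a fixed $\omega$ in the uniform deviation event, the argument can be summarised as a single chain, which merely restates the displays above:
\begin{align*}
P\ell_{\hat f(\omega)}
\le P_n\ell_{\hat f(\omega)}(\omega)+B_t
=\inf_{f\in\mathcal F_0}P_n\ell_f(\omega)+B_t
\le\inf_{f\in\mathcal F_0}P\ell_f+2B_t.
\end{align*}
The factor $2$ arises because the uniform deviation bound is used twice: once at the selected function $\hat f(\omega)$, and once uniformly over the comparison functions $f\in\mathcal F_0$.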
[/guided]
[/step]