[proofplan]
We reduce the empirical Rademacher complexity to the expected maximum of finitely many sub-Gaussian random variables, then apply the [Expected Maximum of Sub-Gaussians](/theorems/1963) bound. The key observations are: (1) the supremum over $\mathcal{F}$ can be replaced by a supremum over the finitely many distinct behaviours of $\mathcal{F}$ on $z_{1:n}$; (2) each resulting random variable is a normalised Rademacher sum, hence sub-Gaussian by [Hoeffding's Lemma](/theorems/1956) and [Sub-Gaussian Stability Under Linear Combinations](/theorems/1960); and (3) the number of distinct behaviours is $|\mathcal{H}(x_{1:n})|$.
[/proofplan]
[step:Reduce the supremum to finitely many distinct loss vectors]
Two loss functions $f, f' \in \mathcal{F}$ that agree on the data -- meaning $f(z_i) = f'(z_i)$ for all $i = 1, \ldots, n$ -- contribute identically to the empirical Rademacher complexity because $\sum_{i=1}^n \varepsilon_i f(z_i) = \sum_{i=1}^n \varepsilon_i f'(z_i)$. Therefore the supremum in
\begin{align*}
\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n})) = \mathbb{E}\!\left[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n \varepsilon_i f(z_i)\right]
\end{align*}
ranges over at most $d := |\mathcal{F}(z_{1:n})|$ distinct values, where $\mathcal{F}(z_{1:n}) = \{(f(z_1), \ldots, f(z_n)) : f \in \mathcal{F}\}$ is the set of distinct behaviour vectors. Let $f_1, \ldots, f_d$ be representatives realising these $d$ distinct behaviours. Then
\begin{align*}
\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n})) = \mathbb{E}\!\left[\max_{j=1,\ldots,d} \frac{1}{n}\sum_{i=1}^n \varepsilon_i f_j(z_i)\right].
\end{align*}
Since $\mathcal{F}$ is the misclassification loss class of $\mathcal{H}$, the number of distinct behaviours of $\mathcal{F}$ on $z_{1:n}$ equals the number of distinct label patterns of $\mathcal{H}$ on $x_{1:n}$, i.e., $d = |\mathcal{H}(x_{1:n})|$.
[/step]
[step:Verify that each $W_j := \frac{1}{n}\sum_{i=1}^n \varepsilon_i f_j(z_i)$ is mean-zero and sub-Gaussian with parameter $1/\sqrt{n}$]
For each $j \in \{1, \ldots, d\}$, define $W_j := \frac{1}{n}\sum_{i=1}^n \varepsilon_i f_j(z_i)$. Since the data $z_{1:n}$ are fixed and the $\varepsilon_i$ are i.i.d. Rademacher (symmetric about zero), each summand $\varepsilon_i f_j(z_i)$ has mean zero: $\mathbb{E}[\varepsilon_i f_j(z_i)] = f_j(z_i)\,\mathbb{E}[\varepsilon_i] = 0$. By linearity of expectation, $\mathbb{E}[W_j] = 0$.
For the sub-Gaussian parameter: each $\varepsilon_i$ takes values in $\{-1, 1\}$ and each $f_j(z_i) \in \{0, 1\}$ (since $\mathcal{F}$ is the misclassification loss class with values in $\{0, 1\}$), so $\varepsilon_i f_j(z_i)$ takes values in $\{-1, 0, 1\} \subset [-1, 1]$. By [Hoeffding's Lemma](/theorems/1956), $\varepsilon_i f_j(z_i)$ is sub-Gaussian with parameter $(1 - (-1))/2 = 1$. The $\varepsilon_1, \ldots, \varepsilon_n$ are independent (and the $f_j(z_i)$ are deterministic constants), so by [Sub-Gaussian Stability Under Linear Combinations](/theorems/1960) applied with weights $\gamma_i = 1/n$ and parameters $\sigma_i = 1$:
\begin{align*}
W_j \text{ is sub-Gaussian with parameter } \left(\sum_{i=1}^n \frac{1}{n^2} \cdot 1\right)^{1/2} = \frac{1}{\sqrt{n}}.
\end{align*}
[guided]
Why parameter $1/\sqrt{n}$ and not some other value? The stability theorem says a linear combination $\sum_i \gamma_i V_i$ of independent sub-Gaussians with parameters $\sigma_i$ is sub-Gaussian with parameter $(\sum_i \gamma_i^2 \sigma_i^2)^{1/2}$. Here each $V_i = \varepsilon_i f_j(z_i)$ is sub-Gaussian with parameter $\sigma_i = 1$ (from Hoeffding's Lemma, since $V_i \in [-1, 1]$), and $\gamma_i = 1/n$. So
\begin{align*}
\tilde{\sigma} = \left(\sum_{i=1}^n \frac{1}{n^2}\right)^{1/2} = \left(\frac{1}{n}\right)^{1/2} = \frac{1}{\sqrt{n}}.
\end{align*}
One might obtain a tighter bound by noting $f_j(z_i) \in \{0, 1\}$, so $\varepsilon_i f_j(z_i) \in \{0, \pm 1\}$ and some summands may be deterministically zero. However, the uniform bound $\sigma_i = 1$ suffices for the stated result.
[/guided]
[/step]
[step:Apply the expected maximum of sub-Gaussians bound to conclude]
By the previous step, $W_1, \ldots, W_d$ are mean-zero random variables, each sub-Gaussian with the common parameter $\sigma = 1/\sqrt{n}$. (Independence of $W_1, \ldots, W_d$ is not required.) By the [Expected Maximum of Sub-Gaussians](/theorems/1963) bound:
\begin{align*}
\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n})) = \mathbb{E}\!\left[\max_{j=1,\ldots,d} W_j\right] \leq \sigma\sqrt{2\log d} = \frac{1}{\sqrt{n}}\sqrt{2\log d} = \sqrt{\frac{2\log d}{n}}.
\end{align*}
Substituting $d = |\mathcal{H}(x_{1:n})|$:
\begin{align*}
\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n})) \leq \sqrt{\frac{2\log|\mathcal{H}(x_{1:n})|}{n}},
\end{align*}
which is the stated bound.
[/step]