[proofplan]
We chain two previously established bounds. The [Empirical Rademacher Bound via Counting](/theorems/1968) controls $\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n}))$ in terms of $\log|\mathcal{H}(x_{1:n})|$. The [Sauer--Shelah Lemma](/theorems/1969) then bounds $|\mathcal{H}(x_{1:n})|$ by a polynomial in $n$ determined by the VC dimension. Composing these two estimates and taking expectations over $z_{1:n}$ yields the stated $O(\sqrt{d\log n/n})$ rate.
[/proofplan]
[step:Bound the empirical Rademacher complexity via the counting argument]
By the [Empirical Rademacher Bound via Counting](/theorems/1968), for every fixed data sequence $z_{1:n} \in (\mathcal{X} \times \{-1,1\})^n$:
\begin{align*}
\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n})) \leq \sqrt{\frac{2\log|\mathcal{H}(x_{1:n})|}{n}},
\end{align*}
where $\mathcal{H}(x_{1:n}) = \{(h(x_1), \ldots, h(x_n)) : h \in \mathcal{H}\}$ is the set of distinct label patterns that $\mathcal{H}$ produces on the feature vectors $x_{1:n}$.
[/step]
[step:Apply the Sauer--Shelah Lemma to replace $|\mathcal{H}(x_{1:n})|$ by a polynomial in $n$]
By the [Sauer--Shelah Lemma](/theorems/1969), since $\operatorname{VC}(\mathcal{H}) = d < \infty$, the growth function satisfies $s(\mathcal{H}, n) \leq (n+1)^d$ for all $n \geq 1$. Since $|\mathcal{H}(x_{1:n})| \leq s(\mathcal{H}, n)$ by definition of the growth function (the growth function is the maximum of $|\mathcal{H}(x_{1:n})|$ over all choices of $x_{1:n}$), we obtain
\begin{align*}
|\mathcal{H}(x_{1:n})| \leq s(\mathcal{H}, n) \leq (n+1)^d.
\end{align*}
Substituting into the bound from the previous step:
\begin{align*}
\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n})) \leq \sqrt{\frac{2\log(n+1)^d}{n}} = \sqrt{\frac{2d\log(n+1)}{n}}.
\end{align*}
This bound holds for every fixed $z_{1:n}$.
[guided]
Why does $|\mathcal{H}(x_{1:n})| \leq s(\mathcal{H}, n)$? The growth function is defined as $s(\mathcal{H}, n) := \max_{x_1, \ldots, x_n \in \mathcal{X}} |\mathcal{H}(x_{1:n})|$, so for any particular choice of feature vectors $x_{1:n}$, the number of distinct label patterns is at most the growth function. The Sauer--Shelah Lemma then provides the uniform upper bound $(n+1)^d$ on the growth function, which depends only on the sample size $n$ and the VC dimension $d$ -- not on the specific data points. Applying the logarithm: $\log|\mathcal{H}(x_{1:n})| \leq \log(n+1)^d = d\log(n+1)$.
[/guided]
[/step]
[step:Take expectations over $z_{1:n}$ to obtain the population Rademacher complexity bound]
The population Rademacher complexity is defined as $\mathcal{R}_n(\mathcal{F}) = \mathbb{E}_{z_{1:n}}[\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n}))]$, where the expectation is over the i.i.d. draw of $z_{1:n}$. Since the bound $\sqrt{2d\log(n+1)/n}$ holds for every realisation of $z_{1:n}$ (it does not depend on $z_{1:n}$), taking expectations preserves the inequality:
\begin{align*}
\mathcal{R}_n(\mathcal{F}) = \mathbb{E}_{z_{1:n}}\!\left[\widehat{\mathcal{R}}(\mathcal{F}(z_{1:n}))\right] \leq \sqrt{\frac{2d\log(n+1)}{n}}.
\end{align*}
This is the stated Rademacher bound via VC dimension.
[/step]