[proofplan]
The excess risk is bounded by the expected supremum of the empirical process $G = \sup_{h \in \mathcal{H}}\{R(h) - \hat{R}(h)\}$. We rewrite this supremum in the form required by the [Symmetrization Bound](/theorems/1965), then invoke symmetrisation to obtain a bound in terms of Rademacher complexity of the negated loss class $-F$. A sign-symmetry argument for Rademacher variables shows $\mathcal{R}_n(-F) = \mathcal{R}_n(F)$, completing the proof.
[/proofplan]
[step:Bound the expected excess risk by the expected supremum of the empirical process]
Let $G := \sup_{h \in \mathcal{H}}\{R(h) - \hat{R}(h)\}$. For the ERM $\hat{h}$, the ERM property $\hat{R}(\hat{h}) \leq \hat{R}(h^*)$ gives
\begin{align*}
R(\hat{h}) - R(h^*) = \bigl[R(\hat{h}) - \hat{R}(\hat{h})\bigr] + \bigl[\hat{R}(\hat{h}) - \hat{R}(h^*)\bigr] + \bigl[\hat{R}(h^*) - R(h^*)\bigr] \leq \bigl[R(\hat{h}) - \hat{R}(\hat{h})\bigr] + \bigl[\hat{R}(h^*) - R(h^*)\bigr].
\end{align*}
Since $R(\hat{h}) - \hat{R}(\hat{h}) \leq G$ and $\hat{R}(h^*) - R(h^*) \leq 0$ in expectation (by unbiasedness of $\hat{R}(h^*)$), taking expectations:
\begin{align*}
\mathbb{E}[R(\hat{h}) - R(h^*)] \leq \mathbb{E}[R(\hat{h}) - \hat{R}(\hat{h})] + \underbrace{\mathbb{E}[\hat{R}(h^*) - R(h^*)]}_{= 0} \leq \mathbb{E}[G].
\end{align*}
The second term vanishes because $\mathbb{E}[\hat{R}(h^*)] = R(h^*)$ by linearity of expectation and the fact that $h^*$ is non-random.
[/step]
[step:Rewrite $\mathbb{E}[G]$ as the expected supremum of a centred empirical process and apply the Symmetrization Bound]
For each $f \in F$, write $f(Z_i) = \ell(h(X_i), Y_i)$ where $Z_i = (X_i, Y_i)$. Then $R(h) = \mathbb{E}[f(Z)]$ and $\hat{R}(h) = \frac{1}{n}\sum_{i=1}^n f(Z_i)$, so
\begin{align*}
G = \sup_{f \in F}\left\{\mathbb{E}[f(Z)] - \frac{1}{n}\sum_{i=1}^n f(Z_i)\right\} = \sup_{f \in F}\frac{1}{n}\sum_{i=1}^n\bigl\{\mathbb{E}[f(Z_i)] - f(Z_i)\bigr\}.
\end{align*}
Negating inside the supremum, this is equivalently
\begin{align*}
G = \sup_{g \in -F}\frac{1}{n}\sum_{i=1}^n\bigl\{g(Z_i) - \mathbb{E}[g(Z_i)]\bigr\},
\end{align*}
where $-F := \{-f : f \in F\}$ is the negated loss class. This is exactly the form to which the [Symmetrization Bound](/theorems/1965) applies. The hypotheses of the Symmetrization Bound are satisfied: $-F$ is a class of real-valued functions on $\mathcal{Z} = \mathcal{X} \times \{-1, 1\}$, and $Z_1, \ldots, Z_n$ are i.i.d. Applying the bound:
\begin{align*}
\mathbb{E}[G] \leq 2\mathcal{R}_n(-F).
\end{align*}
[guided]
The sign reversal is the key algebraic step. The empirical process $G$ takes the form $\sup_{f \in F}\{\mathbb{E}f - \hat{\mathbb{E}}_n f\}$, but the Symmetrization Bound controls $\sup_{g \in \mathcal{G}}\{\hat{\mathbb{E}}_n g - \mathbb{E}g\}$ (empirical minus population). Introducing $g = -f$ converts one form into the other: $\mathbb{E}f - \hat{\mathbb{E}}_n f = \hat{\mathbb{E}}_n(-f) - \mathbb{E}(-f) = \hat{\mathbb{E}}_n g - \mathbb{E}g$.
The Symmetrization Bound requires i.i.d. data and a class of real-valued functions. Both conditions are met: the data $(X_i, Y_i)$ are i.i.d. by assumption, and each $g \in -F$ is a measurable function from $\mathcal{Z}$ to $\mathbb{R}$.
[/guided]
[/step]
[step:Use Rademacher sign symmetry to replace $\mathcal{R}_n(-F)$ by $\mathcal{R}_n(F)$]
By definition, the Rademacher complexity of a function class $\mathcal{G}$ is
\begin{align*}
\mathcal{R}_n(\mathcal{G}) = \mathbb{E}\!\left[\sup_{g \in \mathcal{G}} \frac{1}{n}\sum_{i=1}^n \varepsilon_i g(Z_i)\right],
\end{align*}
where $\varepsilon_1, \ldots, \varepsilon_n$ are i.i.d. Rademacher random variables (taking values $\pm 1$ with equal probability), independent of $Z_1, \ldots, Z_n$. For the negated class $-F$:
\begin{align*}
\mathcal{R}_n(-F) = \mathbb{E}\!\left[\sup_{f \in F} \frac{1}{n}\sum_{i=1}^n \varepsilon_i \bigl(-f(Z_i)\bigr)\right] = \mathbb{E}\!\left[\sup_{f \in F} \frac{1}{n}\sum_{i=1}^n (-\varepsilon_i) f(Z_i)\right].
\end{align*}
Since $\varepsilon_i$ are symmetric about zero, the vector $(-\varepsilon_1, \ldots, -\varepsilon_n)$ has the same joint distribution as $(\varepsilon_1, \ldots, \varepsilon_n)$. Therefore
\begin{align*}
\mathbb{E}\!\left[\sup_{f \in F} \frac{1}{n}\sum_{i=1}^n (-\varepsilon_i) f(Z_i)\right] = \mathbb{E}\!\left[\sup_{f \in F} \frac{1}{n}\sum_{i=1}^n \varepsilon_i f(Z_i)\right] = \mathcal{R}_n(F).
\end{align*}
Combining with the bound from the previous step:
\begin{align*}
\mathbb{E}[R(\hat{h})] - R(h^*) \leq \mathbb{E}[G] \leq 2\mathcal{R}_n(-F) = 2\mathcal{R}_n(F),
\end{align*}
which is the stated inequality.
[guided]
The sign-symmetry argument is worth spelling out carefully. Each $\varepsilon_i$ takes values $\pm 1$ with equal probability, so $-\varepsilon_i$ has the same distribution as $\varepsilon_i$. Since $\varepsilon_1, \ldots, \varepsilon_n$ are independent, the joint distribution of $(-\varepsilon_1, \ldots, -\varepsilon_n)$ equals the joint distribution of $(\varepsilon_1, \ldots, \varepsilon_n)$. Any measurable function of $(\varepsilon_1, \ldots, \varepsilon_n)$ -- including $\sup_{f \in F}\frac{1}{n}\sum_i \varepsilon_i f(Z_i)$ -- therefore has the same expectation when $\varepsilon_i$ is replaced by $-\varepsilon_i$ throughout.
This symmetry is the reason Rademacher complexity is insensitive to negation of the function class: $\mathcal{R}_n(-\mathcal{G}) = \mathcal{R}_n(\mathcal{G})$ for any class $\mathcal{G}$. It is a purely distributional property of Rademacher random variables and does not require any structure on $\mathcal{G}$.
[/guided]
[/step]