[proofplan]
The proof is a deterministic algebraic comparison between true risk and empirical risk. We insert and subtract the two empirical risks $P_n\ell_{\hat f_n}$ and $P_n\ell_{f^*}$, so that the excess risk splits into two empirical-process errors and one empirical minimization error. The minimization hypothesis controls the middle term by $0$ in the exact case and by $\varepsilon_n$ in the approximate case, while the two remaining terms are each bounded by the same uniform deviation supremum.
[/proofplan]
custom_env
admin
[step:Decompose the excess risk into empirical deviation terms and an empirical risk gap]Define the finite quantity
\begin{align*}
U:=\sup_{f\in\mathcal F}\left|(P_n-P)(\ell_f)\right|.
\end{align*}
Let $\varepsilon_n\ge 0$, and suppose first that $\hat f_n\in\mathcal F$ satisfies
\begin{align*}
R_n(\hat f_n)\le \inf_{f\in\mathcal F}R_n(f)+\varepsilon_n.
\end{align*}
Since all displayed quantities are finite, we may add and subtract $P_n\ell_{\hat f_n}$ and $P_n\ell_{f^*}$ to obtain
\begin{align*}
R(\hat f_n)-R(f^*)=(P-P_n)(\ell_{\hat f_n})+\left(P_n\ell_{\hat f_n}-P_n\ell_{f^*}\right)+(P_n-P)(\ell_{f^*}).
\end{align*}[/step]
custom_env
admin
[guided]The goal is to compare the true risks $R(\hat f_n)$ and $R(f^*)$, but the assumption on $\hat f_n$ concerns the empirical risks $R_n(\hat f_n)$ and $R_n(f)$. We therefore introduce the empirical risks into the expression by adding and subtracting them.
Define
\begin{align*}
U:=\sup_{f\in\mathcal F}\left|(P_n-P)(\ell_f)\right|.
\end{align*}
This is finite by the finiteness assumption in the theorem statement. Let $\varepsilon_n\ge 0$, and assume
\begin{align*}
R_n(\hat f_n)\le \inf_{f\in\mathcal F}R_n(f)+\varepsilon_n.
\end{align*}
By definition of $R$ and $R_n$,
\begin{align*}
R(\hat f_n)-R(f^*)=P\ell_{\hat f_n}-P\ell_{f^*}.
\end{align*}
Now add and subtract $P_n\ell_{\hat f_n}$ and $P_n\ell_{f^*}$. Since every term is finite, this algebraic rearrangement is legitimate:
\begin{align*}
P\ell_{\hat f_n}-P\ell_{f^*}=(P\ell_{\hat f_n}-P_n\ell_{\hat f_n})+(P_n\ell_{\hat f_n}-P_n\ell_{f^*})+(P_n\ell_{f^*}-P\ell_{f^*}).
\end{align*}
Equivalently,
\begin{align*}
R(\hat f_n)-R(f^*)=(P-P_n)(\ell_{\hat f_n})+\left(P_n\ell_{\hat f_n}-P_n\ell_{f^*}\right)+(P_n-P)(\ell_{f^*}).
\end{align*}
This decomposition isolates exactly where the two ingredients enter: the outer terms are empirical deviations, and the middle term is controlled by empirical risk minimization.[/guided]
custom_env
admin
[step:Use approximate empirical minimization to control the empirical risk gap]
Because $f^*\in\mathcal F$, the definition of the infimum gives
\begin{align*}
\inf_{f\in\mathcal F}R_n(f)\le R_n(f^*).
\end{align*}
Combining this with the approximate empirical minimization assumption gives
\begin{align*}
P_n\ell_{\hat f_n}-P_n\ell_{f^*}=R_n(\hat f_n)-R_n(f^*)\le \varepsilon_n.
\end{align*}
[/step]
custom_env
admin
[step:Bound the two empirical deviation terms by the uniform deviation]
Since $\hat f_n\in\mathcal F$, the definition of $U$ gives
\begin{align*}
(P-P_n)(\ell_{\hat f_n})\le \left|(P-P_n)(\ell_{\hat f_n})\right|=\left|(P_n-P)(\ell_{\hat f_n})\right|\le U.
\end{align*}
Since $f^*\in\mathcal F$, the same definition gives
\begin{align*}
(P_n-P)(\ell_{f^*})\le \left|(P_n-P)(\ell_{f^*})\right|\le U.
\end{align*}
[/step]
custom_env
admin
[step:Combine the estimates and recover the exact minimizer case]
Substituting the preceding three bounds into the decomposition yields
\begin{align*}
R(\hat f_n)-R(f^*)\le U+\varepsilon_n+U=2U+\varepsilon_n.
\end{align*}
By the definition of $U$, this is
\begin{align*}
R(\hat f_n)-R(f^*)\le 2\sup_{f\in\mathcal F}\left|(P_n-P)(\ell_f)\right|+\varepsilon_n.
\end{align*}
This proves the approximate empirical risk minimization bound. The exact empirical risk minimization statement is the special case $\varepsilon_n=0$, because
\begin{align*}
R_n(\hat f_n)\le \inf_{f\in\mathcal F}R_n(f)
\end{align*}
implies the approximate inequality with $\varepsilon_n=0$.
[/step]