[proofplan]
Define the empirical-process supremum $Z$ and compare its expectation with the expected empirical Rademacher complexity by the expected Rademacher symmetrisation bound. Then prove that both $Z$ and the empirical Rademacher complexity have bounded differences with coordinate sensitivity given by
\begin{align*}
\frac{b-a}{n}
\end{align*}
[McDiarmid's bounded differences inequality](/theorems/6072) gives one upper-tail estimate for $Z$ and one lower-tail estimate for the empirical Rademacher complexity. Intersecting the two high-probability events and using finite subadditivity of probability gives the asserted constant $3$.
[/proofplan]
custom_env
admin
[step:Introduce the two random functionals to be concentrated]
Define the map
\begin{align*}
Z:\Omega\to\mathbb R
\end{align*}
by
\begin{align*}
Z:=\sup_{f\in\mathcal F}(P f-P_n f).
\end{align*}
Define the map
\begin{align*}
R:\Omega\to\mathbb R
\end{align*}
by
\begin{align*}
R(\omega):=\widehat{\mathfrak R}_n(\mathcal F;X_1(\omega),\dots,X_n(\omega))
\end{align*}
for every $\omega\in\Omega$.
By the hypotheses, both $Z$ and $R$ are measurable real-valued random variables. In particular, the relevant suprema are finite-valued by assumption, and all expectations below are finite.
[/step]
custom_env
admin
[step:Control the mean of the empirical-process supremum by symmetrisation]By the [expected Rademacher uniform deviation bound](/theorems/9827) [citetheorem:9827], applied to the measurable class $\mathcal F$ of $[a,b]$-valued functions and the i.i.d. sample $X_1,\dots,X_n$, we have
\begin{align*}
\mathbb E[Z]\le 2\mathbb E[R].
\end{align*}[/step]
custom_env
admin
[guided]The first task is to replace the unknown mean of the empirical-process supremum by a Rademacher quantity. The theorem [citetheorem:9827] applies in the present setting because $X_1,\dots,X_n$ are i.i.d. with common distribution $P$, every $f\in\mathcal F$ is measurable and bounded in $[a,b]$, and the relevant supremum is measurable by hypothesis. Its conclusion is exactly the expectation comparison
\begin{align*}
\mathbb E\left[\sup_{f\in\mathcal F}(P f-P_n f)\right]\le 2\mathbb E\left[\widehat{\mathfrak R}_n(\mathcal F;X_1,\dots,X_n)\right].
\end{align*}
With the definitions of $Z$ and $R$, this becomes
\begin{align*}
\mathbb E[Z]\le 2\mathbb E[R].
\end{align*}
This is the only symmetrisation input: the remaining work is concentration around these two expectations.[/guided]
custom_env
admin
[step:Verify bounded differences for the empirical-process supremum]
Let $s=(s_1,\dots,s_n)\in S^n$ and $s'=(s_1',\dots,s_n')\in S^n$ be two deterministic samples that differ only in coordinate $k\in\{1,\dots,n\}$. Define
\begin{align*}
\Phi:S^n\to\mathbb R
\end{align*}
by
\begin{align*}
\Phi(s):=\sup_{f\in\mathcal F}\left(P f-\frac{1}{n}\sum_{i=1}^{n}f(s_i)\right).
\end{align*}
For every $f\in\mathcal F$,
\begin{align*}
\left|\left(P f-\frac{1}{n}\sum_{i=1}^{n}f(s_i)\right)-\left(P f-\frac{1}{n}\sum_{i=1}^{n}f(s_i')\right)\right|
\le \frac{b-a}{n},
\end{align*}
because all terms except the $k$th cancel and $f(s_k),f(s_k')\in[a,b]$. Taking suprema preserves this Lipschitz bound, so
\begin{align*}
|\Phi(s)-\Phi(s')|\le \frac{b-a}{n}.
\end{align*}
Thus $Z=\Phi(X_1,\dots,X_n)$ has bounded differences with constants
\begin{align*}
c_1=\cdots=c_n=\frac{b-a}{n}.
\end{align*}
[/step]
custom_env
admin
[step:Concentrate the empirical-process supremum above its mean]Apply McDiarmid's bounded differences inequality to the measurable [random variable](/page/Random%20Variable) $Z=\Phi(X_1,\dots,X_n)$, viewed as a sample functional. The theorem's hypotheses are satisfied because $X_1,\dots,X_n$ are independent, the composed random variable $Z$ is measurable by assumption, and the previous step proves the coordinate bounded-difference constants
\begin{align*}
c_i=\frac{b-a}{n}\qquad (i=1,\dots,n).
\end{align*}
Hence, for every $t>0$,
\begin{align*}
\mathbb P(Z-\mathbb E[Z]\ge t)\le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n}c_i^2}\right).
\end{align*}
Since
\begin{align*}
\sum_{i=1}^{n}c_i^2=n\left(\frac{b-a}{n}\right)^2=\frac{(b-a)^2}{n},
\end{align*}
choosing
\begin{align*}
t_0:=(b-a)\sqrt{\frac{\log(2/\delta)}{2n}}
\end{align*}
gives
\begin{align*}
\mathbb P(Z\le \mathbb E[Z]+t_0)\ge 1-\frac{\delta}{2}.
\end{align*}[/step]
custom_env
admin
[guided]We now apply McDiarmid's bounded differences inequality to $Z$. The inequality requires independent input variables, a measurable output random variable, and deterministic coordinate-sensitivity constants. These hypotheses hold here: $X_1,\dots,X_n$ are independent by the theorem statement, the composed random variable $Z=\Phi(X_1,\dots,X_n)$ is measurable by the theorem statement, and the previous step showed that changing one sample coordinate changes the deterministic functional $\Phi$ by at most the bound displayed above. Therefore McDiarmid gives, for every $t>0$,
\begin{align*}
\mathbb P(Z-\mathbb E[Z]\ge t)\le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n}c_i^2}\right).
\end{align*}[/guided]
custom_env
admin
[step:Verify bounded differences for the empirical Rademacher complexity]For $s=(s_1,\dots,s_n)\in S^n$, define
\begin{align*}
\Psi:S^n\to\mathbb R
\end{align*}
by
\begin{align*}
\Psi(s):=\mathbb E_\varepsilon\left[\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i)\right].
\end{align*}
Let $s,s'\in S^n$ differ only in coordinate $k$. For each fixed Rademacher vector $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)\in\{-1,1\}^n$, define the map
\begin{align*}
A_\varepsilon:S^n \to \mathbb R
\end{align*}
by
\begin{align*}
A_\varepsilon(s)=\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i)
\end{align*}
for $s\in S^n$.
For every $f\in\mathcal F$,
\begin{align*}
\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i)-\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i')\right|
=\frac{1}{n}|f(s_k)-f(s_k')|
\le \frac{b-a}{n}.
\end{align*}
Taking suprema over $f\in\mathcal F$ gives
\begin{align*}
|A_\varepsilon(s)-A_\varepsilon(s')|\le \frac{b-a}{n}.
\end{align*}
Taking expectation with respect to the Rademacher variables yields
\begin{align*}
|\Psi(s)-\Psi(s')|\le \frac{b-a}{n}.
\end{align*}
Thus $R=\Psi(X_1,\dots,X_n)$ has bounded differences with constants
\begin{align*}
c_1=\cdots=c_n=\frac{b-a}{n}.
\end{align*}[/step]
custom_env
admin
[guided]We must check the bounded-difference condition after the supremum over $\mathcal F$ and after averaging over the Rademacher signs. Fix two deterministic samples $s=(s_1,\dots,s_n)$ and $s'=(s_1',\dots,s_n')$ that differ only at coordinate $k$. For a fixed sign vector $\varepsilon=(\varepsilon_1,\dots,\varepsilon_n)$, the two Rademacher averages differ only through the $k$th summand. Hence, for every $f\in\mathcal F$,
\begin{align*}
\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i)-\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i')\right|
=\left|\frac{\varepsilon_k}{n}\{f(s_k)-f(s_k')\}\right|.
\end{align*}
Because $|\varepsilon_k|=1$ and $f(s_k),f(s_k')\in[a,b]$, this is bounded by
\begin{align*}
\frac{b-a}{n}.
\end{align*}
The elementary inequality
\begin{align*}
|\sup_{f\in\mathcal F}u_f-\sup_{f\in\mathcal F}v_f|\le \sup_{f\in\mathcal F}|u_f-v_f|
\end{align*}
therefore gives
\begin{align*}
|A_\varepsilon(s)-A_\varepsilon(s')|\le \frac{b-a}{n},
\end{align*}
where
\begin{align*}
A_\varepsilon(s):=\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(s_i).
\end{align*}
Finally, expectation with respect to the Rademacher variables cannot increase this deterministic Lipschitz bound, so
\begin{align*}
|\Psi(s)-\Psi(s')|\le \frac{b-a}{n},
\end{align*}
where
\begin{align*}
\Psi(s):=\mathbb E_\varepsilon[A_\varepsilon(s)].
\end{align*}
Thus the empirical Rademacher complexity, viewed as a function of the sample, has the same coordinate sensitivity as the constant displayed above.[/guided]
custom_env
admin
[step:Concentrate the empirical Rademacher complexity below its mean]Apply the lower-tail form of McDiarmid's bounded differences inequality to the measurable random variable $R=\Psi(X_1,\dots,X_n)$. Equivalently, apply the upper-tail form to $-R$. The hypotheses are satisfied because $X_1,\dots,X_n$ are independent, the composed random variable $R$ is measurable by assumption, and the previous step proves the bounded-difference constants
\begin{align*}
c_i=\frac{b-a}{n}\qquad (i=1,\dots,n).
\end{align*}
Thus, for every $t>0$,
\begin{align*}
\mathbb P(\mathbb E[R]-R\ge t)\le \exp\left(-\frac{2t^2}{(b-a)^2/n}\right).
\end{align*}
With the same value
\begin{align*}
t_0=(b-a)\sqrt{\frac{\log(2/\delta)}{2n}},
\end{align*}
we obtain
\begin{align*}
\mathbb P(\mathbb E[R]\le R+t_0)\ge 1-\frac{\delta}{2}.
\end{align*}[/step]
custom_env
admin
[guided]The empirical Rademacher complexity must also be concentrated, but now we need a lower-tail estimate because the final bound will replace $\mathbb E[R]$ by the observable quantity $R$. Apply McDiarmid's inequality to $-R$, or equivalently use its lower-tail form for $R$. The required hypotheses hold: the sample variables $X_1,\dots,X_n$ are independent, the composed random variable $R=\Psi(X_1,\dots,X_n)$ is measurable by the theorem statement, and the previous step proved that changing one coordinate of the sample changes $\Psi$ by at most $\frac{b-a}{n}$. Therefore, for every $t>0$,
\begin{align*}
\mathbb P(\mathbb E[R]-R\ge t)\le \exp\left(-\frac{2t^2}{(b-a)^2/n}\right).
\end{align*}[/guided]
custom_env
admin
[step:Intersect the concentration events and collect the constants]Let $E_1\in\mathcal G$ denote the event
\begin{align*}
E_1:=\{Z\le \mathbb E[Z]+t_0\}
\end{align*}
and let $E_2\in\mathcal G$ denote the event
\begin{align*}
E_2:=\{\mathbb E[R]\le R+t_0\}.
\end{align*}
The previous two steps give
\begin{align*}
\mathbb P(E_1)\ge 1-\frac{\delta}{2}
\end{align*}
and
\begin{align*}
\mathbb P(E_2)\ge 1-\frac{\delta}{2}.
\end{align*}
By finite subadditivity of probability applied to $E_1^c\cup E_2^c$,
\begin{align*}
\mathbb P(E_1\cap E_2)\ge 1-\delta.
\end{align*}
On $E_1\cap E_2$, the expectation comparison and the two concentration bounds imply
\begin{align*}
Z\le \mathbb E[Z]+t_0\le 2\mathbb E[R]+t_0\le 2R+3t_0.
\end{align*}
Substituting the definitions of $Z$, $R$, and $t_0$ gives
\begin{align*}
\sup_{f\in\mathcal F}(P f-P_n f)\le 2\widehat{\mathfrak R}_n(\mathcal F;X_1,\dots,X_n)+3(b-a)\sqrt{\frac{\log(2/\delta)}{2n}}.
\end{align*}
This holds with probability at least $1-\delta$, completing the proof.[/step]
custom_env
admin
[guided]The two concentration estimates each fail with probability at most $\delta/2$. Since $E_1$ and $E_2$ are events in the probability sigma-algebra $\mathcal G$, finite subadditivity gives
\begin{align*}
\mathbb P((E_1\cap E_2)^c)=\mathbb P(E_1^c\cup E_2^c)\le \mathbb P(E_1^c)+\mathbb P(E_2^c)\le \delta.
\end{align*}
Thus $\mathbb P(E_1\cap E_2)\ge 1-\delta$. On this intersection, the event $E_1$ gives $Z\le \mathbb E[Z]+t_0$, the symmetrisation estimate gives $\mathbb E[Z]\le 2\mathbb E[R]$, and the event $E_2$ gives $\mathbb E[R]\le R+t_0$. Combining these three inequalities yields
\begin{align*}
Z\le \mathbb E[Z]+t_0\le 2\mathbb E[R]+t_0\le 2R+3t_0.
\end{align*}[/guided]