[proofplan]
We first prove the finite-vector inequality by bounding the logarithmic [moment generating function](/page/Moment%20Generating%20Function) of the maximum of the signed sums. [Jensen's inequality](/theorems/9) converts the expectation of the supremum into a logarithm of an exponential moment, the maximum is bounded by a sum over the finite set $T$, and independence of the Rademacher variables factors each exponential moment. The elementary inequality $\cosh u\le e^{u^2/2}$ gives the sub-Gaussian bound, after which optimizing over the parameter $\lambda>0$ gives the stated constant. The function-class statement follows by applying the vector result to the finite set of evaluation vectors $(f(x_1),\dots,f(x_n))$ and dividing by $n$.
[/proofplan]
[step:Reduce the expected supremum to an exponential moment]
For each $t=(t_1,\dots,t_n)\in T$, define the real-valued [random variable](/page/Random%20Variable)
\begin{align*}
X_t:\Omega_\varepsilon\to\mathbb R,\qquad X_t(\omega):=\sum_{i=1}^{n}\varepsilon_i(\omega)t_i.
\end{align*}
Since $T$ is finite, the random variable
\begin{align*}
Y:\Omega_\varepsilon\to\mathbb R,\qquad Y(\omega):=\sup_{t\in T}X_t(\omega)
\end{align*}
is measurable and integrable.
Fix $\lambda>0$. Since the logarithm is concave and $\exp(\lambda Y)>0$, [Jensen's inequality](/theorems/1977) for the probability measure $\mathbb P_\varepsilon$ gives
\begin{align*}
\lambda\mathbb E_\varepsilon[Y]=\mathbb E_\varepsilon[\log(\exp(\lambda Y))]\le \log\mathbb E_\varepsilon[\exp(\lambda Y)].
\end{align*}
Thus
\begin{align*}
\mathbb E_\varepsilon[Y]\le \frac{1}{\lambda}\log\mathbb E_\varepsilon[\exp(\lambda Y)].
\end{align*}
[guided]
The goal is to estimate the expectation of a maximum. Exponential moments are useful here because exponentials turn maxima into quantities controlled by sums. For each vector $t=(t_1,\dots,t_n)\in T$, define
\begin{align*}
X_t:\Omega_\varepsilon\to\mathbb R,\qquad X_t(\omega):=\sum_{i=1}^{n}\varepsilon_i(\omega)t_i.
\end{align*}
The random variable whose expectation we need to bound is
\begin{align*}
Y:\Omega_\varepsilon\to\mathbb R,\qquad Y(\omega):=\sup_{t\in T}X_t(\omega).
\end{align*}
Because $T$ is finite and each $X_t$ is a finite linear combination of measurable Rademacher variables, $Y$ is the maximum of finitely many measurable real-valued random variables. Hence $Y$ is measurable. It is also integrable because
\begin{align*}
|Y|\le \sup_{t\in T}\sum_{i=1}^{n}|t_i|
\end{align*}
and the right-hand side is a finite deterministic constant.
Fix $\lambda>0$. We apply Jensen's inequality to the concave function $\log:(0,\infty)\to\mathbb R$ and the positive random variable $\exp(\lambda Y)$. This gives
\begin{align*}
\mathbb E_\varepsilon[\log(\exp(\lambda Y))]\le \log\mathbb E_\varepsilon[\exp(\lambda Y)].
\end{align*}
The left-hand side simplifies because $\log(\exp(\lambda Y))=\lambda Y$, so
\begin{align*}
\lambda\mathbb E_\varepsilon[Y]\le \log\mathbb E_\varepsilon[\exp(\lambda Y)].
\end{align*}
Dividing by $\lambda>0$ yields
\begin{align*}
\mathbb E_\varepsilon[Y]\le \frac{1}{\lambda}\log\mathbb E_\varepsilon[\exp(\lambda Y)].
\end{align*}
This is the point of introducing $\lambda$: it creates a bound with two competing terms, one decreasing in $\lambda$ and one increasing in $\lambda$, which can later be optimized.
[/guided]
[/step]
[step:Bound the exponential moment by summing over $T$]
For each $\omega\in\Omega_\varepsilon$,
\begin{align*}
\exp(\lambda Y(\omega))=\exp\left(\lambda\sup_{t\in T}X_t(\omega)\right)=\sup_{t\in T}\exp(\lambda X_t(\omega)).
\end{align*}
Since $T$ is finite and all summands are nonnegative,
\begin{align*}
\sup_{t\in T}\exp(\lambda X_t(\omega))\le \sum_{t\in T}\exp(\lambda X_t(\omega)).
\end{align*}
Taking expectations with respect to $\mathbb P_\varepsilon$ gives
\begin{align*}
\mathbb E_\varepsilon[\exp(\lambda Y)]\le \sum_{t\in T}\mathbb E_\varepsilon[\exp(\lambda X_t)].
\end{align*}
[/step]
[step:Use the Rademacher moment bound for each vector]
Fix $t=(t_1,\dots,t_n)\in T$. By independence of $\varepsilon_1,\dots,\varepsilon_n$,
\begin{align*}
\mathbb E_\varepsilon[\exp(\lambda X_t)]=\prod_{i=1}^{n}\mathbb E_\varepsilon[\exp(\lambda\varepsilon_i t_i)].
\end{align*}
For each $i\in\{1,\dots,n\}$,
\begin{align*}
\mathbb E_\varepsilon[\exp(\lambda\varepsilon_i t_i)]=\frac{e^{\lambda t_i}+e^{-\lambda t_i}}{2}=\cosh(\lambda t_i).
\end{align*}
For every real number $u$, the elementary estimate $\cosh u\le e^{u^2/2}$ holds. Indeed,
\begin{align*}
\cosh u=\sum_{k=0}^{\infty}\frac{u^{2k}}{(2k)!}\le \sum_{k=0}^{\infty}\frac{u^{2k}}{2^k k!}=e^{u^2/2},
\end{align*}
where the inequality uses $(2k)!\ge 2^k k!$ for every integer $k\ge 0$. Therefore
\begin{align*}
\mathbb E_\varepsilon[\exp(\lambda X_t)]\le \prod_{i=1}^{n}\exp\left(\frac{\lambda^2t_i^2}{2}\right)=\exp\left(\frac{\lambda^2|t|^2}{2}\right).
\end{align*}
Since $|t|\le a$, this gives
\begin{align*}
\mathbb E_\varepsilon[\exp(\lambda X_t)]\le \exp\left(\frac{\lambda^2a^2}{2}\right).
\end{align*}
Combining this estimate with the previous step yields
\begin{align*}
\mathbb E_\varepsilon[\exp(\lambda Y)]\le |T|\exp\left(\frac{\lambda^2a^2}{2}\right).
\end{align*}
[/step]
[step:Optimize the logarithmic bound]
Using the estimates above, for every $\lambda>0$,
\begin{align*}
\mathbb E_\varepsilon[Y]\le \frac{\log |T|}{\lambda}+\frac{\lambda a^2}{2}.
\end{align*}
If $|T|=1$, write $T=\{t_0\}$. Then
\begin{align*}
\mathbb E_\varepsilon[Y]=\mathbb E_\varepsilon[X_{t_0}]=\sum_{i=1}^{n}t_{0,i}\mathbb E_\varepsilon[\varepsilon_i]=0.
\end{align*}
Since $\log |T|=0$, the desired estimate holds.
Now assume $|T|\ge 2$. If $a=0$, then $|t|=0$ for every $t\in T$, so $t=0$ for every $t\in T$ and $Y=0$. Hence the desired estimate holds. If $a>0$, choose
\begin{align*}
\lambda:=\frac{\sqrt{2\log |T|}}{a}.
\end{align*}
Substituting this value into the preceding bound gives
\begin{align*}
\mathbb E_\varepsilon[Y]\le \frac{a\log |T|}{\sqrt{2\log |T|}}+\frac{a\sqrt{2\log |T|}}{2}=a\sqrt{2\log |T|}.
\end{align*}
Thus
\begin{align*}
\mathbb E_\varepsilon\left[\sup_{t\in T}\sum_{i=1}^{n}\varepsilon_i t_i\right]\le a\sqrt{2\log |T|}.
\end{align*}
[/step]
[step:Apply the vector inequality to evaluation vectors]
Define the finite set of evaluation vectors
\begin{align*}
T_{\mathcal F}:=\{(f(x_1),\dots,f(x_n))\in\mathbb R^n:f\in\mathcal F\}.
\end{align*}
Since $\mathcal F$ is nonempty and finite, $T_{\mathcal F}$ is nonempty and finite, and
\begin{align*}
|T_{\mathcal F}|\le |\mathcal F|.
\end{align*}
For every vector $t=(f(x_1),\dots,f(x_n))\in T_{\mathcal F}$,
\begin{align*}
|t|^2=\sum_{i=1}^{n}f(x_i)^2\le nb^2.
\end{align*}
Thus
\begin{align*}
\sup_{t\in T_{\mathcal F}}|t|\le \sqrt n\,b.
\end{align*}
Applying the finite-vector inequality to $T_{\mathcal F}$ with $a=\sqrt n\,b$ gives
\begin{align*}
\mathbb E_\varepsilon\left[\sup_{t\in T_{\mathcal F}}\sum_{i=1}^{n}\varepsilon_i t_i\right]\le \sqrt n\,b\sqrt{2\log |T_{\mathcal F}|}.
\end{align*}
By the definition of $T_{\mathcal F}$,
\begin{align*}
\sup_{t\in T_{\mathcal F}}\sum_{i=1}^{n}\varepsilon_i t_i=\sup_{f\in\mathcal F}\sum_{i=1}^{n}\varepsilon_i f(x_i).
\end{align*}
Dividing by $n$ and using $|T_{\mathcal F}|\le |\mathcal F|$ gives
\begin{align*}
\widehat{\mathfrak R}_n(\mathcal F;x_1,\dots,x_n)\le b\sqrt{\frac{2\log |\mathcal F|}{n}}.
\end{align*}
This proves the function-class consequence and completes the proof.
[/step]