[proofplan]
For each fixed function $f\in\mathcal F$, we first prove a two-sided concentration bound for the centered empirical average by applying the exponential moment assumption to the sum of the independent centered variables. We then choose the deviation level so that the failure probability for one fixed $f$ is at most $\beta/|\mathcal F|$. Finally, we take a finite union over $f\in\mathcal F$; no independence between different functions is needed because the union estimate uses only subadditivity of probability.
[/proofplan]
[step:Prove the fixed-function sub-Gaussian deviation bound]
Fix $f\in\mathcal F$. For each $i\in\{1,\dots,n\}$, define the centered real-valued [random variable](/page/Random%20Variable)
\begin{align*}
X_{i,f}:\Omega\to\mathbb R,\quad X_{i,f}(\omega)=f(Z_i(\omega))-\mathbb E[f(Z_i)].
\end{align*}
Define the centered empirical average
\begin{align*}
M_f:\Omega\to\mathbb R,\quad M_f(\omega)=\frac{1}{n}\sum_{i=1}^n X_{i,f}(\omega).
\end{align*}
Let $t>0$. For any $\lambda>0$, the elementary estimate $a\,\mathbb 1_{\{Y\ge a\}}\le Y$ for a nonnegative random variable $Y$ and a number $a>0$ gives
\begin{align*}
\mathbb P(M_f\ge t)=\mathbb P\left(\exp(\lambda nM_f)\ge \exp(\lambda nt)\right).
\end{align*}
By the preceding estimate,
\begin{align*}
\mathbb P(M_f\ge t)\le \exp(-\lambda nt)\mathbb E[\exp(\lambda nM_f)].
\end{align*}
Because the random variables $X_{1,f},\dots,X_{n,f}$ are independent, and each has variance proxy $\sigma^2$,
\begin{align*}
\mathbb E[\exp(\lambda nM_f)]=\mathbb E\left[\exp\left(\lambda\sum_{i=1}^n X_{i,f}\right)\right].
\end{align*}
Independence gives
\begin{align*}
\mathbb E\left[\exp\left(\lambda\sum_{i=1}^n X_{i,f}\right)\right]=\prod_{i=1}^n \mathbb E[\exp(\lambda X_{i,f})].
\end{align*}
The sub-Gaussian moment bound gives
\begin{align*}
\prod_{i=1}^n \mathbb E[\exp(\lambda X_{i,f})]\le \prod_{i=1}^n \exp\left(\frac{\lambda^2\sigma^2}{2}\right)=\exp\left(\frac{n\lambda^2\sigma^2}{2}\right).
\end{align*}
Therefore
\begin{align*}
\mathbb P(M_f\ge t)
\le
\exp\left(-\lambda nt+\frac{n\lambda^2\sigma^2}{2}\right).
\end{align*}
Choosing $\lambda=t/\sigma^2$ gives
\begin{align*}
\mathbb P(M_f\ge t)\le \exp\left(-\frac{nt^2}{2\sigma^2}\right).
\end{align*}
Applying the same argument to $-M_f$, using the sub-Gaussian bound with $-\lambda$ in place of $\lambda$, gives
\begin{align*}
\mathbb P(M_f\le -t)\le \exp\left(-\frac{nt^2}{2\sigma^2}\right).
\end{align*}
Thus
\begin{align*}
\mathbb P(|M_f|>t)\le 2\exp\left(-\frac{nt^2}{2\sigma^2}\right).
\end{align*}
[guided]
Fix one function $f\in\mathcal F$. The purpose of this step is to get a deviation estimate for that single function before taking a union over the class. For each $i\in\{1,\dots,n\}$, define
\begin{align*}
X_{i,f}:\Omega\to\mathbb R,\quad X_{i,f}(\omega)=f(Z_i(\omega))-\mathbb E[f(Z_i)].
\end{align*}
These are the centered observations associated with $f$. Define their empirical average by
\begin{align*}
M_f:\Omega\to\mathbb R,\quad M_f(\omega)=\frac{1}{n}\sum_{i=1}^n X_{i,f}(\omega).
\end{align*}
We want to bound $\mathbb P(|M_f|>t)$ for a number $t>0$.
First bound the upper tail. Let $\lambda>0$. Since the exponential function is increasing,
\begin{align*}
\{M_f\ge t\}
=
\left\{\exp(\lambda nM_f)\ge \exp(\lambda nt)\right\}.
\end{align*}
For any nonnegative random variable $Y$ and number $a>0$, the inequality $a\,\mathbb 1_{\{Y\ge a\}}\le Y$ implies $\mathbb P(Y\ge a)\le \mathbb E[Y]/a$. Applying this with $Y=\exp(\lambda nM_f)$ and $a=\exp(\lambda nt)$ gives
\begin{align*}
\mathbb P(M_f\ge t)
&\le
\exp(-\lambda nt)\mathbb E[\exp(\lambda nM_f)].
\end{align*}
Now compute the exponential moment. Since
\begin{align*}
nM_f=\sum_{i=1}^n X_{i,f},
\end{align*}
and the random variables $X_{1,f},\dots,X_{n,f}$ are independent for this fixed $f$, the expectation factors:
\begin{align*}
\mathbb E[\exp(\lambda nM_f)]=\mathbb E\left[\exp\left(\lambda\sum_{i=1}^n X_{i,f}\right)\right].
\end{align*}
Since the exponential of a sum is the product of exponentials,
\begin{align*}
\mathbb E\left[\exp\left(\lambda\sum_{i=1}^n X_{i,f}\right)\right]=\mathbb E\left[\prod_{i=1}^n \exp(\lambda X_{i,f})\right].
\end{align*}
Independence of $X_{1,f},\dots,X_{n,f}$ gives
\begin{align*}
\mathbb E\left[\prod_{i=1}^n \exp(\lambda X_{i,f})\right]=\prod_{i=1}^n \mathbb E[\exp(\lambda X_{i,f})].
\end{align*}
The sub-Gaussian hypothesis gives, for every $i$,
\begin{align*}
\mathbb E[\exp(\lambda X_{i,f})]\le \exp\left(\frac{\lambda^2\sigma^2}{2}\right).
\end{align*}
Therefore
\begin{align*}
\mathbb E[\exp(\lambda nM_f)]\le \prod_{i=1}^n \exp\left(\frac{\lambda^2\sigma^2}{2}\right)=\exp\left(\frac{n\lambda^2\sigma^2}{2}\right).
\end{align*}
Substituting this into the upper-tail estimate yields
\begin{align*}
\mathbb P(M_f\ge t)
\le
\exp\left(-\lambda nt+\frac{n\lambda^2\sigma^2}{2}\right).
\end{align*}
The exponent is minimized over $\lambda>0$ by choosing $\lambda=t/\sigma^2$, which is positive because $t>0$ and $\sigma>0$. With this choice,
\begin{align*}
-\lambda nt+\frac{n\lambda^2\sigma^2}{2}=-\frac{nt^2}{\sigma^2}+\frac{nt^2}{2\sigma^2}=-\frac{nt^2}{2\sigma^2}.
\end{align*}
Hence
\begin{align*}
\mathbb P(M_f\ge t)\le \exp\left(-\frac{nt^2}{2\sigma^2}\right).
\end{align*}
The lower tail is the same exponential argument applied to $-M_f$. Equivalently, use the sub-Gaussian moment bound with $-\lambda$ in place of $\lambda$:
\begin{align*}
\mathbb E[\exp(-\lambda X_{i,f})]\le \exp\left(\frac{\lambda^2\sigma^2}{2}\right).
\end{align*}
Repeating the preceding computation gives
\begin{align*}
\mathbb P(M_f\le -t)\le \exp\left(-\frac{nt^2}{2\sigma^2}\right).
\end{align*}
Combining the upper and lower tails by subadditivity of probability,
\begin{align*}
\mathbb P(|M_f|>t)=\mathbb P\bigl(\{M_f>t\}\cup\{M_f<-t\}\bigr).
\end{align*}
By subadditivity of probability,
\begin{align*}
\mathbb P\bigl(\{M_f>t\}\cup\{M_f<-t\}\bigr)\le \mathbb P(M_f>t)+\mathbb P(M_f<-t)\le 2\exp\left(-\frac{nt^2}{2\sigma^2}\right).
\end{align*}
This is the fixed-function concentration estimate.
[/guided]
[/step]
[step:Choose the deviation radius with failure probability $\beta/|\mathcal F|$]
Define
\begin{align*}
r:=\sigma\sqrt{\frac{2\log(2|\mathcal F|/\beta)}{n}}.
\end{align*}
Since $\mathcal F$ is nonempty and finite and $\beta\in(0,1)$, we have $2|\mathcal F|/\beta>1$, so $r>0$. Applying the fixed-function bound with $t=r$ gives, for every $f\in\mathcal F$,
\begin{align*}
\mathbb P(|M_f|>r)\le 2\exp\left(-\frac{nr^2}{2\sigma^2}\right).
\end{align*}
By the definition of $r$,
\begin{align*}
2\exp\left(-\frac{nr^2}{2\sigma^2}\right)=2\exp\left(-\log(2|\mathcal F|/\beta)\right)=\frac{\beta}{|\mathcal F|}.
\end{align*}
[/step]
[step:Take the finite union over the function class]
For each $f\in\mathcal F$, define the bad event
\begin{align*}
B_f:=\{\omega\in\Omega: |M_f(\omega)|>r\}.
\end{align*}
The event on which at least one function violates the desired bound is $\bigcup_{f\in\mathcal F}B_f$. By finite subadditivity of probability and the estimate from the previous step,
\begin{align*}
\mathbb P\left(\bigcup_{f\in\mathcal F}B_f\right)\le \sum_{f\in\mathcal F}\mathbb P(B_f).
\end{align*}
Using the estimate from the previous step for every $f\in\mathcal F$,
\begin{align*}
\sum_{f\in\mathcal F}\mathbb P(B_f)\le \sum_{f\in\mathcal F}\frac{\beta}{|\mathcal F|}=\beta.
\end{align*}
No independence between the events $B_f$ is used in this estimate.
Taking complements, with probability at least $1-\beta$, no bad event occurs. Therefore, for every $f\in\mathcal F$,
\begin{align*}
\left|\frac{1}{n}\sum_{i=1}^n f(Z_i)-\frac{1}{n}\sum_{i=1}^n \mathbb E[f(Z_i)]\right|=|M_f|\le r.
\end{align*}
Taking the supremum over $f\in\mathcal F$ on this same event gives
\begin{align*}
\sup_{f\in\mathcal F}\left|\frac{1}{n}\sum_{i=1}^n f(Z_i)-\frac{1}{n}\sum_{i=1}^n \mathbb E[f(Z_i)]\right|\le \sigma\sqrt{\frac{2\log(2|\mathcal F|/\beta)}{n}}.
\end{align*}
Hence the probability of the displayed supremum event is at least $1-\beta$, as required.
[/step]