[proofplan]
We center the bounded observations by writing $Y_i=f(X_i)-Pf$, so that the upper-tail event becomes a deviation event for the sum $\sum_{i=1}^n Y_i$. For a positive exponential parameter $\lambda$, Markov's inequality applied to the nonnegative [random variable](/page/Random%20Variable) $\exp(\lambda\sum_i Y_i)$ converts the tail probability into a [moment generating function](/page/Moment%20Generating%20Function) estimate. Independence factors this moment [generating function](/page/Generating%20Function), and [Hoeffding's lemma](/theorems/1956) bounds each centered bounded factor by a quadratic exponential. Optimizing the resulting bound in $\lambda$ gives the upper-tail estimate; the lower-tail estimate follows by applying the same argument to $-f$.
[/proofplan]
custom_env
admin
[step:Center the empirical average and record the bounded interval]For each $1\le i\le n$, define the real-valued random variable
\begin{align*}
Y_i:\Omega\to\mathbb R
\end{align*}
by
\begin{align*}
Y_i(\omega):=f(X_i(\omega))-Pf.
\end{align*}
Since $a\le f\le b$ on $S$, the function $f$ is bounded and hence $P$-integrable. Moreover,
\begin{align*}
a\le Pf\le b,
\end{align*}
because $P$ is a probability measure and
\begin{align*}
a=\int_S a\,dP(x)\le \int_S f(x)\,dP(x)\le \int_S b\,dP(x)=b.
\end{align*}
Thus each $Y_i$ is bounded by
\begin{align*}
a-Pf\le Y_i\le b-Pf.
\end{align*}
The interval length is
\begin{align*}
(b-Pf)-(a-Pf)=b-a.
\end{align*}
Also, since $X_i$ has distribution $P$,
\begin{align*}
\mathbb E[Y_i]=\int_\Omega f(X_i(\omega))\,d\mathbb P(\omega)-Pf=\int_S f(x)\,dP(x)-Pf=0.
\end{align*}
Finally,
\begin{align*}
P_nf-Pf=\frac{1}{n}\sum_{i=1}^{n}Y_i.
\end{align*}[/step]
custom_env
admin
[guided]The point of this step is to isolate the random fluctuation from its deterministic mean. For each $1\le i\le n$, define
\begin{align*}
Y_i:\Omega\to\mathbb R
\end{align*}
by
\begin{align*}
Y_i(\omega):=f(X_i(\omega))-Pf.
\end{align*}
The function $f$ is bounded because $a\le f(x)\le b$ for every $x\in S$, so the integral defining $Pf$ is finite. Since $P$ is a probability measure, integrating the pointwise inequality $a\le f\le b$ gives
\begin{align*}
a=\int_S a\,dP(x)\le \int_S f(x)\,dP(x)\le \int_S b\,dP(x)=b.
\end{align*}
Therefore
\begin{align*}
a-Pf\le f(X_i)-Pf\le b-Pf,
\end{align*}
so each $Y_i$ is supported in the interval $[a-Pf,b-Pf]$. This interval has length
\begin{align*}
(b-Pf)-(a-Pf)=b-a.
\end{align*}
The variables are centered. Indeed, because $X_i$ has distribution $P$,
\begin{align*}
\mathbb E[Y_i]=\int_\Omega f(X_i(\omega))\,d\mathbb P(\omega)-Pf=\int_S f(x)\,dP(x)-Pf=0.
\end{align*}
The empirical deviation is exactly the average of these centered variables:
\begin{align*}
P_nf-Pf=\frac{1}{n}\sum_{i=1}^{n}Y_i.
\end{align*}
This reformulation puts the problem into the standard bounded-sum form needed for the exponential moment argument.[/guided]
custom_env
admin
[step:Bound the exponential moment of the centered sum]
We use the following form of Hoeffding's lemma: if $Z:\Omega\to\mathbb R$ is a centered random variable with $\alpha\le Z\le \beta$ almost surely, then for every $\lambda\in\mathbb R$,
\begin{align*}
\mathbb E[\exp(\lambda Z)]\le \exp\left(\frac{\lambda^2(\beta-\alpha)^2}{8}\right).
\end{align*}
This is a standard external ingredient not yet resolved to a wiki theorem: Hoeffding's Lemma.
For each $i$, apply this lemma to $Z=Y_i$, $\alpha=a-Pf$, and $\beta=b-Pf$. Since $\mathbb E[Y_i]=0$ and $\beta-\alpha=b-a$, for every $\lambda>0$,
\begin{align*}
\mathbb E[\exp(\lambda Y_i)]\le \exp\left(\frac{\lambda^2(b-a)^2}{8}\right).
\end{align*}
Because $X_1,\dots,X_n$ are independent and the maps $f:S\to\mathbb R$ are measurable, the random variables $Y_1,\dots,Y_n$ are independent. Hence
\begin{align*}
\mathbb E\left[\exp\left(\lambda\sum_{i=1}^{n}Y_i\right)\right]
=\prod_{i=1}^{n}\mathbb E[\exp(\lambda Y_i)]
\le \exp\left(\frac{n\lambda^2(b-a)^2}{8}\right).
\end{align*}
[/step]
custom_env
admin
[step:Apply exponential Markov inequality and optimize the parameter]
Fix $t>0$ and $\lambda>0$. The event $\{P_nf-Pf\ge t\}$ is the same as
\begin{align*}
\left\{\sum_{i=1}^{n}Y_i\ge nt\right\}.
\end{align*}
Since the exponential map is increasing,
\begin{align*}
\left\{\sum_{i=1}^{n}Y_i\ge nt\right\}
=
\left\{\exp\left(\lambda\sum_{i=1}^{n}Y_i\right)\ge \exp(\lambda nt)\right\}.
\end{align*}
Applying Markov's inequality to the nonnegative random variable
\begin{align*}
\omega\mapsto \exp\left(\lambda\sum_{i=1}^{n}Y_i(\omega)\right)
\end{align*}
gives
\begin{align*}
\mathbb P(P_nf-Pf\ge t)
\le \exp(-\lambda nt)\mathbb E\left[\exp\left(\lambda\sum_{i=1}^{n}Y_i\right)\right].
\end{align*}
Using the exponential moment bound from the previous step,
\begin{align*}
\mathbb P(P_nf-Pf\ge t)
\le \exp\left(-\lambda nt+\frac{n\lambda^2(b-a)^2}{8}\right).
\end{align*}
Choose
\begin{align*}
\lambda_*:=\frac{4t}{(b-a)^2}.
\end{align*}
Then $\lambda_*>0$, and substituting $\lambda=\lambda_*$ yields
\begin{align*}
-\lambda_*nt+\frac{n\lambda_*^2(b-a)^2}{8}
=
-\frac{4nt^2}{(b-a)^2}+\frac{2nt^2}{(b-a)^2}
=
-\frac{2nt^2}{(b-a)^2}.
\end{align*}
Therefore
\begin{align*}
\mathbb P(P_nf-Pf\ge t)\le \exp\left(-\frac{2nt^2}{(b-a)^2}\right).
\end{align*}
[/step]
custom_env
admin
[step:Apply the upper-tail result to $-f$ for the lower tail]
Define the [measurable function](/page/Measurable%20Function)
\begin{align*}
g:S\to\mathbb R
\end{align*}
by
\begin{align*}
g(x):=-f(x).
\end{align*}
Since $a\le f(x)\le b$ for every $x\in S$, we have
\begin{align*}
-b\le g(x)\le -a
\end{align*}
for every $x\in S$, and the interval length is
\begin{align*}
(-a)-(-b)=b-a.
\end{align*}
Also
\begin{align*}
P_ng=-P_nf
\end{align*}
and
\begin{align*}
Pg=-Pf.
\end{align*}
Applying the upper-tail estimate already proved to $g$ gives, for every $t>0$,
\begin{align*}
\mathbb P(P_ng-Pg\ge t)\le \exp\left(-\frac{2nt^2}{(b-a)^2}\right).
\end{align*}
But
\begin{align*}
P_ng-Pg=(-P_nf)-(-Pf)=Pf-P_nf.
\end{align*}
Hence
\begin{align*}
\mathbb P(Pf-P_nf\ge t)\le \exp\left(-\frac{2nt^2}{(b-a)^2}\right).
\end{align*}
Together with the upper-tail estimate, this proves both claimed inequalities.
[/step]