[proofplan]
We reveal the variables one at a time and form the Doob martingale whose $i$-th term is the [conditional expectation](/page/Conditional%20Expectation) of $Z$ after observing the first $i$ variables. The bounded differences hypothesis implies that each successive martingale increment is almost surely confined, conditionally on the previously revealed variables, to an interval of length at most the corresponding constant $c_i$. A conditional Hoeffding-type estimate then gives an exponential bound for each conditional increment, and iteration controls the moment generating function of $Z-\mathbb E[Z]$. Markov's inequality and optimization in the exponential parameter give the upper tail; applying the same argument to $-f$ gives the lower tail, and the two-sided estimate follows from the union bound.
[/proofplan]
[step:Construct the Doob martingale by revealing the variables one at a time]
Let $(\Omega,\mathcal F,\mathbb P)$ be the probability space on which the random variables $X_1,\dots,X_n$ are defined.
For $i\in\{0,\dots,n\}$, define the sub-$\sigma$-algebra
\begin{align*}
\mathcal F_i:=\sigma(X_1,\dots,X_i),
\end{align*}
with $\mathcal F_0:=\{\varnothing,\Omega\}$ completed if necessary. Define the [random variable](/page/Random%20Variable) $M_i:\Omega\to\mathbb R$ by
\begin{align*}
M_i(\omega):=\mathbb E[Z\mid \mathcal F_i](\omega).
\end{align*}
Here $L^1(\Omega,\mathcal F,\mathbb P)$ denotes the space of integrable real-valued random variables on the probability space $(\Omega,\mathcal F,\mathbb P)$, modulo almost sure equality. Since $Z\in L^1(\Omega,\mathcal F,\mathbb P)$, each $M_i$ is integrable and $\mathcal F_i$-measurable. The [tower property of conditional expectation](/theorems/1150) applies because $\mathcal F_{i-1}\subset\mathcal F_i$ and $Z\in L^1(\Omega,\mathcal F,\mathbb P)$, and gives
\begin{align*}
\mathbb E[M_i\mid \mathcal F_{i-1}]
=
\mathbb E[\mathbb E[Z\mid \mathcal F_i]\mid \mathcal F_{i-1}]
=
\mathbb E[Z\mid \mathcal F_{i-1}]
=
M_{i-1},
\end{align*}
so $(M_i)_{i=0}^n$ is a martingale with respect to $(\mathcal F_i)_{i=0}^n$. Also $M_0=\mathbb E[Z]$ and $M_n=Z$ almost surely, because $Z$ is $\mathcal F_n$-measurable.
[/step]
[step:Bound each martingale increment by the corresponding bounded difference constant]
For $i\in\{1,\dots,n\}$, define the martingale difference $D_i:\Omega\to\mathbb R$ by
\begin{align*}
D_i(\omega):=M_i(\omega)-M_{i-1}(\omega).
\end{align*}
We claim that $D_i$ is almost surely contained in an interval of length at most $c_i$ conditional on $\mathcal F_{i-1}$.
Fix $i$. Let $\mu_j$ denote the law of $X_j$ on $(\mathcal X_j,\mathcal A_j)$. Let $\nu_i$ denote the product probability measure $\mu_{i+1}\otimes\cdots\otimes\mu_n$ on $\mathcal X_{i+1}\times\cdots\times\mathcal X_n$, with the convention that for $i=n$ this is the one-point probability measure on the empty tail. Independence means that the law of $(X_1,\dots,X_n)$ on the product measurable space is $\mu_1\otimes\cdots\otimes\mu_n$. We will use this product-law identity directly to verify conditional expectations, rather than invoking a regular conditional distribution of the unrevealed coordinates.
Define a measurable version $G_i:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ of the conditional expectation section as follows. Since $Z\in L^1(\Omega,\mathcal F,\mathbb P)$, the Tonelli theorem applied to $|f|$ under the product law is valid because $|f|$ is nonnegative and measurable. Hence for $(\mu_1\otimes\cdots\otimes\mu_i)$-almost every prefix $(x_1,\dots,x_i)$ the integral of $|f(x_1,\dots,x_i,\cdot)|$ with respect to $\nu_i$ is finite. On this full-measure set define
\begin{align*}
G_i(x_1,\dots,x_i):=\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n} f(x_1,\dots,x_i,y_{i+1},\dots,y_n)\, d\nu_i(y_{i+1},\dots,y_n),
\end{align*}
and define $G_i$ to be $0$ on the exceptional null set. For $i=n$, this formula reads $G_n=f$ outside a null set. The function $G_i$ is $\mathcal A_1\otimes\cdots\otimes\mathcal A_i$-measurable by the measurable-parameter form of Tonelli theorem applied to the positive and negative parts of the integrable function $f$.
We verify the conditional-expectation identity by the defining property. Let $H:\Omega\to\mathbb R$ be a bounded $\mathcal F_i$-measurable random variable. By the Doob-Dynkin factorization lemma for $\sigma(X_1,\dots,X_i)$-measurable real random variables, there is a bounded $\mathcal A_1\otimes\cdots\otimes\mathcal A_i$-measurable function $h:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ such that
\begin{align*}
H=h(X_1,\dots,X_i)
\end{align*}
almost surely. Since $h$ is bounded and $f$ is integrable under the product law, [Fubini's theorem](/theorems/2961) applies to the integrable function
\begin{align*}
(x_1,\dots,x_n)\mapsto h(x_1,\dots,x_i)f(x_1,\dots,x_n).
\end{align*}
Using the product law of $(X_1,\dots,X_n)$, we have
\begin{align*}
\mathbb E[HZ]
=
\int_{\mathcal X_1\times\cdots\times\mathcal X_n}
h(x_1,\dots,x_i)f(x_1,\dots,x_n)\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n).
\end{align*}
[Fubini's theorem](/theorems/2961) identifies the last integral as
\begin{align*}
\int_{\mathcal X_1\times\cdots\times\mathcal X_i}
h(x_1,\dots,x_i)G_i(x_1,\dots,x_i)\,d(\mu_1\otimes\cdots\otimes\mu_i)(x_1,\dots,x_i).
\end{align*}
By the product law of $(X_1,\dots,X_i)$, this equals
\begin{align*}
\mathbb E[H G_i(X_1,\dots,X_i)].
\end{align*}
Thus $G_i(X_1,\dots,X_i)$ is integrable, $\mathcal F_i$-measurable, and satisfies the defining identity for $\mathbb E[Z\mid\mathcal F_i]$. Hence
\begin{align*}
M_i=G_i(X_1,\dots,X_i)
\end{align*}
almost surely.
By [Fubini's theorem](/theorems/2961) applied to the integrable function $|f|$ under the product law, there is a set $B_i\subset\mathcal X_1\times\cdots\times\mathcal X_{i-1}$ with $(\mu_1\otimes\cdots\otimes\mu_{i-1})(B_i)=1$ such that, for every prefix $x=(x_1,\dots,x_{i-1})\in B_i$, the section $u\mapsto G_i(x_1,\dots,x_{i-1},u)$ is finite for $\mu_i$-almost every $u\in\mathcal X_i$. For such a prefix $x$, define the finite-a.e. one-coordinate section $g_{i,x}:\mathcal X_i\to\mathbb R$ by $g_{i,x}(u):=G_i(x_1,\dots,x_{i-1},u)$ on its full-measure domain.
If $u,v$ belong to this full-measure domain, then the bounded differences hypothesis gives, for every tail point outside a $\nu_i$-null set depending on $u$ and $v$,
\begin{align*}
\left|
f(x_1,\dots,x_{i-1},u,y_{i+1},\dots,y_n)
-
f(x_1,\dots,x_{i-1},v,y_{i+1},\dots,y_n)
\right|
\le c_i.
\end{align*}
Integrating this pointwise inequality with respect to the probability measure $\nu_i$ gives
\begin{align*}
|g_{i,x}(u)-g_{i,x}(v)|
\le c_i.
\end{align*}
Thus the essential range of $g_{i,x}(X_i)$ under $\mu_i$ is contained in an interval of length at most $c_i$. Conditionally on $\mathcal F_{i-1}$, $M_i$ has this essential range and
\begin{align*}
M_{i-1}=\mathbb E[M_i\mid\mathcal F_{i-1}].
\end{align*}
Subtracting $M_{i-1}$ translates the interval, so conditionally on $\mathcal F_{i-1}$, $D_i=M_i-M_{i-1}$ has mean $0$ and is supported, in the essential-range sense, in an interval of length at most $c_i$.
[guided]
The point of the Doob martingale is that $M_i$ is the best prediction of $Z$ after seeing the first $i$ variables. We must show that revealing $X_i$ can change this prediction by at most $c_i$.
Fix $i\in\{1,\dots,n\}$. Let $\mu_j$ be the distribution of $X_j$ on $(\mathcal X_j,\mathcal A_j)$ for each $j$, and let $\nu_i$ be the product measure $\mu_{i+1}\otimes\cdots\otimes\mu_n$ on the unrevealed coordinates, with the empty product interpreted as the one-point probability measure when $i=n$. Because $X_1,\dots,X_n$ are independent, the joint law of $(X_1,\dots,X_n)$ is the product measure $\mu_1\otimes\cdots\otimes\mu_n$. This product-law statement is what allows us to compute conditional expectations by testing against bounded functions of the revealed coordinates; no regular [conditional probability](/page/Conditional%20Probability) is needed.
Since the theorem assumes $Z\in L^1(\Omega,\mathcal F,\mathbb P)$, the Tonelli theorem applies to $|f|$ under the product law because $|f|$ is nonnegative and measurable. It implies that for $(\mu_1\otimes\cdots\otimes\mu_i)$-almost every prefix $(x_1,\dots,x_i)$, the tail integral of $|f(x_1,\dots,x_i,\cdot)|$ with respect to $\nu_i$ is finite. On this full-measure set define $G_i:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ by
\begin{align*}
G_i(x_1,\dots,x_i):=\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n} f(x_1,\dots,x_i,y_{i+1},\dots,y_n)\, d\nu_i(y_{i+1},\dots,y_n),
\end{align*}
and set $G_i$ equal to $0$ on the exceptional null set. For $i=n$, this is just the statement that $G_n=f$ outside a null set. The function $G_i$ is measurable with respect to $\mathcal A_1\otimes\cdots\otimes\mathcal A_i$ by the measurable-parameter form of Tonelli theorem, applied separately to the positive and negative parts of the integrable function $f$.
We now verify that $G_i(X_1,\dots,X_i)$ is the conditional expectation of $Z$ given $\mathcal F_i$. Take any bounded $\mathcal F_i$-measurable random variable $H:\Omega\to\mathbb R$. Since $\mathcal F_i=\sigma(X_1,\dots,X_i)$, the Doob-Dynkin factorization lemma gives a bounded measurable function $h:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ such that
\begin{align*}
H=h(X_1,\dots,X_i)
\end{align*}
almost surely. Because $h$ is bounded and $f$ is integrable under the product law, [Fubini's theorem](/theorems/2961) applies to the integrable function $(x_1,\dots,x_n)\mapsto h(x_1,\dots,x_i)f(x_1,\dots,x_n)$. Therefore
\begin{align*}
\mathbb E[HZ]
=
\int_{\mathcal X_1\times\cdots\times\mathcal X_n}
h(x_1,\dots,x_i)f(x_1,\dots,x_n)\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n).
\end{align*}
Fubini's theorem turns this into
\begin{align*}
\int_{\mathcal X_1\times\cdots\times\mathcal X_i}
h(x_1,\dots,x_i)G_i(x_1,\dots,x_i)\,d(\mu_1\otimes\cdots\otimes\mu_i)(x_1,\dots,x_i).
\end{align*}
By the product law of $(X_1,\dots,X_i)$, the last display is
\begin{align*}
\mathbb E[H G_i(X_1,\dots,X_i)].
\end{align*}
This is exactly the defining property of $\mathbb E[Z\mid\mathcal F_i]$, and $G_i(X_1,\dots,X_i)$ is $\mathcal F_i$-measurable by construction. Hence
\begin{align*}
M_i=G_i(X_1,\dots,X_i)
\end{align*}
almost surely.
Now we must be careful about null sets. [Fubini's theorem](/theorems/2961) applied to the integrable function $|f|$ gives a full-measure set $B_i\subset\mathcal X_1\times\cdots\times\mathcal X_{i-1}$ such that, once $x=(x_1,\dots,x_{i-1})\in B_i$ is fixed, the one-coordinate section $u\mapsto G_i(x_1,\dots,x_{i-1},u)$ is finite for $\mu_i$-almost every $u$. Define $g_{i,x}:\mathcal X_i\to\mathbb R$ by $g_{i,x}(u):=G_i(x_1,\dots,x_{i-1},u)$ on this full-measure domain.
Compare two values $u,v$ from that full-measure domain. The bounded differences condition says that changing only the $i$-th coordinate changes $f$ by at most $c_i$. Hence for $\nu_i$-almost every tail point,
\begin{align*}
\left|
f(x_1,\dots,x_{i-1},u,y_{i+1},\dots,y_n)
-
f(x_1,\dots,x_{i-1},v,y_{i+1},\dots,y_n)
\right|
\le c_i.
\end{align*}
Since $\nu_i$ is a probability measure, integrating preserves this bound:
\begin{align*}
|g_{i,x}(u)-g_{i,x}(v)|
\le c_i.
\end{align*}
Thus, conditional on the information $\mathcal F_{i-1}$, the essential range of $M_i$ under the remaining randomness of $X_i$ is contained in an interval whose length is at most $c_i$. Since
\begin{align*}
M_{i-1}=\mathbb E[M_i\mid\mathcal F_{i-1}],
\end{align*}
subtracting $M_{i-1}$ only translates that interval. Therefore $D_i=M_i-M_{i-1}$ has conditional mean zero and, conditional on $\mathcal F_{i-1}$, is supported in the essential-range sense in an interval of length at most $c_i$.
[/guided]
[/step]
[step:Control the exponential moment of each martingale increment]
We use the following conditional form of [Hoeffding's lemma](/theorems/1956). Let $\mathcal G\subset\mathcal F$ be a sub-$\sigma$-algebra. Let $A:\Omega\to\mathbb R$ and $B:\Omega\to\mathbb R$ be $\mathcal G$-measurable real-valued random variables satisfying $A\le B$ almost surely and $B-A\le c$ almost surely. If $Y:\Omega\to\mathbb R$ is an integrable real-valued random variable with $\mathbb E[Y\mid\mathcal G]=0$ and $A\le Y\le B$ almost surely, then for every $\lambda\in\mathbb R$,
\begin{align*}
\mathbb E[e^{\lambda Y}\mid\mathcal G]\le \exp\left(\frac{\lambda^2c^2}{8}\right).
\end{align*}
This follows by applying the scalar argument below on each conditioning fiber, with the $\mathcal G$-measurable endpoints $A$ and $B$. If $B=A$, then the support condition forces $Y=A$ conditionally on $\mathcal G$, and $\mathbb E[Y\mid\mathcal G]=0$ forces $A=0$ conditionally; hence $\mathbb E[e^{\lambda Y}\mid\mathcal G]=1$, which gives the estimate.
Assume now that $B>A$. Since $A\le Y\le B$ and $\mathbb E[Y\mid\mathcal G]=0$, taking conditional expectations gives
\begin{align*}
A\le 0\le B
\end{align*}
almost surely. The graph of $y\mapsto e^{\lambda y}$ lies below the chord joining $(A,e^{\lambda A})$ and $(B,e^{\lambda B})$ on $[A,B]$. Taking conditional expectations and using $\mathbb E[Y\mid\mathcal G]=0$ gives
\begin{align*}
\mathbb E[e^{\lambda Y}\mid\mathcal G]\le \frac{-A}{B-A}e^{\lambda B}+\frac{B}{B-A}e^{\lambda A}.
\end{align*}
Writing $p:=-A/(B-A)$ and $h:=\lambda(B-A)$, the right-hand side equals $(1-p)e^{-ph}+p e^{(1-p)h}$. The remaining scalar estimate is exactly [Hoeffding's lemma](/theorems/1956) applied pointwise to the centered two-point random variable taking values $-ph$ and $(1-p)h$ with probabilities $1-p$ and $p$, respectively; its hypotheses hold because this variable has mean $0$ and range length $|h|$. Therefore
\begin{align*}
(1-p)e^{-ph}+p e^{(1-p)h}\le e^{h^2/8}.
\end{align*}
Since $|h|=|\lambda|(B-A)\le |\lambda|c$, the displayed conditional bound follows.
In the preceding step, the interval containing the conditional essential range of $D_i$ may depend on the value of the prefix $(X_1,\dots,X_{i-1})$. This causes no change in the estimate: for each prefix in the full-measure set $B_i$, apply the deterministic interval argument above to the conditional law of $D_i$ on that fiber. The bound depends only on the interval length, which is at most $c_i$, and hence the resulting inequality is $\mathcal F_{i-1}$-almost sure.
Applying this fiberwise estimate to $Y=D_i$, $\mathcal G=\mathcal F_{i-1}$, and $c=c_i$, we obtain
\begin{align*}
\mathbb E[e^{\lambda D_i}\mid\mathcal F_{i-1}]
\le
\exp\left(\frac{\lambda^2c_i^2}{8}\right)
\end{align*}
for every $\lambda\in\mathbb R$ and every $i\in\{1,\dots,n\}$.
[/step]
[step:Iterate the conditional exponential bounds]
Define
\begin{align*}
C:=\sum_{i=1}^n c_i^2.
\end{align*}
Since the martingale telescopes, we have
\begin{align*}
Z-\mathbb E[Z]
=
M_n-M_0
=
\sum_{i=1}^n D_i
\end{align*}
almost surely. Repeated conditioning gives the moment generating function estimate. First,
\begin{align*}
\mathbb E[e^{\lambda(Z-\mathbb E[Z])}]=\mathbb E\left[\exp\left(\lambda\sum_{i=1}^n D_i\right)\right].
\end{align*}
Conditioning on $\mathcal F_{n-1}$ and using that $D_1,\dots,D_{n-1}$ are $\mathcal F_{n-1}$-measurable gives
\begin{align*}
\mathbb E[e^{\lambda(Z-\mathbb E[Z])}]=\mathbb E\left[\exp\left(\lambda\sum_{i=1}^{n-1}D_i\right)\mathbb E[e^{\lambda D_n}\mid\mathcal F_{n-1}]\right].
\end{align*}
Using the conditional exponential bound for $D_n$ yields
\begin{align*}
\mathbb E[e^{\lambda(Z-\mathbb E[Z])}]\le \exp\left(\frac{\lambda^2c_n^2}{8}\right)\mathbb E\left[\exp\left(\lambda\sum_{i=1}^{n-1}D_i\right)\right].
\end{align*}
Repeating the same conditioning step for $D_{n-1},\dots,D_1$ gives
\begin{align*}
\mathbb E[e^{\lambda(Z-\mathbb E[Z])}]\le \exp\left(\frac{\lambda^2}{8}\sum_{i=1}^n c_i^2\right)=\exp\left(\frac{\lambda^2C}{8}\right).
\end{align*}
[/step]
[step:Optimize Markov's inequality to obtain the upper tail]
Assume first that $C>0$. If $t=0$, then the desired upper-tail estimate is immediate because $\mathbb P(Z-\mathbb E[Z]\ge 0)\le 1=\exp(0)$. Now assume $t>0$. For $\lambda>0$, [Markov's inequality](/theorems/514) applies to the nonnegative random variable $e^{\lambda(Z-\mathbb E[Z])}$ and gives
\begin{align*}
\mathbb P(Z-\mathbb E[Z]\ge t)=\mathbb P(e^{\lambda(Z-\mathbb E[Z])}\ge e^{\lambda t}).
\end{align*}
Markov's inequality and the exponential moment bound give
\begin{align*}
\mathbb P(Z-\mathbb E[Z]\ge t)\le e^{-\lambda t}\mathbb E[e^{\lambda(Z-\mathbb E[Z])}].
\end{align*}
Therefore
\begin{align*}
\mathbb P(Z-\mathbb E[Z]\ge t)\le \exp\left(-\lambda t+\frac{\lambda^2C}{8}\right).
\end{align*}
The quadratic function
\begin{align*}
\lambda\mapsto -\lambda t+\frac{\lambda^2C}{8}
\end{align*}
is minimized over $\lambda>0$ at
\begin{align*}
\lambda=\frac{4t}{C}.
\end{align*}
Substituting this value gives
\begin{align*}
\mathbb P(Z-\mathbb E[Z]\ge t)
\le
\exp\left(-\frac{2t^2}{C}\right).
\end{align*}
[/step]
[step:Apply the same argument to $-Z$ and combine the two tails]
In this step assume $C>0$; the case $C=0$ is handled at the end of the proof.
Define $\tilde f:\mathcal X_1\times\cdots\times\mathcal X_n\to\mathbb R$ by
\begin{align*}
\tilde f(x_1,\dots,x_n):=-f(x_1,\dots,x_n).
\end{align*}
If two inputs differ only in coordinate $i$, then
\begin{align*}
|\tilde f(x_1,\dots,x_n)-\tilde f(y_1,\dots,y_n)|
=
|f(x_1,\dots,x_n)-f(y_1,\dots,y_n)|
\le c_i.
\end{align*}
Thus $\tilde f$ has the same bounded differences constants, and applying the upper-tail estimate to $-Z=\tilde f(X_1,\dots,X_n)$ gives
\begin{align*}
\mathbb P(\mathbb E[Z]-Z\ge t)
=
\mathbb P((-Z)-\mathbb E[-Z]\ge t)
\le
\exp\left(-\frac{2t^2}{C}\right).
\end{align*}
Finally,
\begin{align*}
\{|Z-\mathbb E[Z]|\ge t\}
=
\{Z-\mathbb E[Z]\ge t\}\cup\{\mathbb E[Z]-Z\ge t\}.
\end{align*}
Using [subadditivity of probability](/theorems/1106),
\begin{align*}
\mathbb P(|Z-\mathbb E[Z]|\ge t)\le \mathbb P(Z-\mathbb E[Z]\ge t)+\mathbb P(\mathbb E[Z]-Z\ge t).
\end{align*}
Using the two one-sided estimates, we obtain
\begin{align*}
\mathbb P(|Z-\mathbb E[Z]|\ge t)\le 2\exp\left(-\frac{2t^2}{C}\right).
\end{align*}
If $C=0$, then each $c_i=0$, so the bounded differences condition shows that changing any coordinate does not change $f$. Hence $Z$ is almost surely constant under the product law of $(X_1,\dots,X_n)$, and therefore $Z=\mathbb E[Z]$ almost surely. This gives the degenerate case.
[/step]