[proofplan]
We express the Doob martingale increments as differences between two conditional averages of $f$: one after revealing $X_i$ and one before revealing $X_i$. Independence lets us write these conditional expectations as product-measure integrals over the unrevealed coordinates. The bounded differences condition says that changing only the $i$-th coordinate changes each future-conditioned value by at most $c_i$, and averaging over future coordinates preserves that oscillation bound. Finally, the value after revealing $X_i$ and the average before revealing $X_i$ both lie in an interval of length at most $c_i$.
[/proofplan]
[step:Represent the conditional expectations by product-measure kernels]
For each $j\in\{1,\dots,n\}$, let $\mu_j:=\mathbb P\circ X_j^{-1}$ be the law of the measurable map $X_j:(\Omega,\mathcal F)\to(\mathcal X_j,\mathcal A_j)$. Fix $i\in\{1,\dots,n\}$. Define the future product measure
\begin{align*}
\nu_i:=\mu_{i+1}\otimes\cdots\otimes\mu_n
\end{align*}
on $(\mathcal X_{i+1}\times\cdots\times\mathcal X_n,\mathcal A_{i+1}\otimes\cdots\otimes\mathcal A_n)$, with the convention that $\nu_n$ is the probability measure on the one-point measurable space. Since $f$ is product-measurable, the section
\begin{align*}
(x_1,\dots,x_i,y_{i+1},\dots,y_n)\mapsto f(x_1,\dots,x_i,y_{i+1},\dots,y_n)
\end{align*}
is measurable with respect to $(\mathcal A_1\otimes\cdots\otimes\mathcal A_i)\otimes(\mathcal A_{i+1}\otimes\cdots\otimes\mathcal A_n)$.
Since the variables $X_1,\dots,X_n$ are independent, the law of $(X_1,\dots,X_n)$ is $\mu_1\otimes\cdots\otimes\mu_n$. Hence integrability of $Z=f(X_1,\dots,X_n)$ is exactly
\begin{align*}
\int_{\mathcal X_1\times\cdots\times\mathcal X_n}
|f(x_1,\dots,x_n)|\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n)<\infty.
\end{align*}
Tonelli's theorem applied to $|f|$ with respect to $\mu_1\otimes\cdots\otimes\mu_n$ gives a finite section integral for $\mu_1\otimes\cdots\otimes\mu_i$-almost every $(x_1,\dots,x_i)$. [Fubini's theorem](/theorems/2961) also gives measurability and integrability of the section integral as a function of $(x_1,\dots,x_i)$. Define the measurable integrable map $g_i:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ by this section integral on the full-measure set where it is finite and by $0$ on its complement.
We claim that
\begin{align*}
M_i=g_i(X_1,\dots,X_i)
\end{align*}
almost surely, after modifying $g_i$ on a null set if necessary. Indeed, for every bounded $\mathcal A_1\otimes\cdots\otimes\mathcal A_i$-measurable function $\varphi:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$, independence gives
\begin{align*}
\mathbb E[\varphi(X_1,\dots,X_i)Z]
=
\int_{\mathcal X_1\times\cdots\times\mathcal X_i}
\varphi(x_1,\dots,x_i)g_i(x_1,\dots,x_i)\,
d(\mu_1\otimes\cdots\otimes\mu_i)(x_1,\dots,x_i).
\end{align*}
This is precisely the defining identity for [conditional expectation](/page/Conditional%20Expectation) with respect to $\mathcal F_i=\sigma(X_1,\dots,X_i)$.
Similarly, since $g_i$ is integrable with respect to $\mu_1\otimes\cdots\otimes\mu_i$, Fubini's theorem gives a measurable integrable section integral over $\mathcal X_i$. Define the measurable integrable map $h_i:\mathcal X_1\times\cdots\times\mathcal X_{i-1}\to\mathbb R$ by
\begin{align*}
h_i(x_1,\dots,x_{i-1}):=\int_{\mathcal X_i} g_i(x_1,\dots,x_{i-1},u)\,d\mu_i(u)
\end{align*}
on the full-measure set where this integral is finite, and by $0$ on its complement.
For every bounded $\mathcal A_1\otimes\cdots\otimes\mathcal A_{i-1}$-measurable function $\psi:\mathcal X_1\times\cdots\times\mathcal X_{i-1}\to\mathbb R$, the product representation of the law and Fubini's theorem give
\begin{align*}
\mathbb E[\psi(X_1,\dots,X_{i-1})Z]
=
\int_{\mathcal X_1\times\cdots\times\mathcal X_{i-1}}
\psi(x_1,\dots,x_{i-1})h_i(x_1,\dots,x_{i-1})\,
d(\mu_1\otimes\cdots\otimes\mu_{i-1})(x_1,\dots,x_{i-1}).
\end{align*}
Here Fubini's theorem applies because $\psi$ is bounded and $f$ is integrable under the full product law. This is the defining identity for conditional expectation with respect to $\mathcal F_{i-1}=\sigma(X_1,\dots,X_{i-1})$, so
\begin{align*}
M_{i-1}=h_i(X_1,\dots,X_{i-1})
\end{align*}
almost surely.
[guided]
The martingale increment compares what we know after seeing $X_i$ with what we knew just before seeing $X_i$. To make this comparison concrete, we write both conditional expectations as ordinary integrals.
For each coordinate $j$, define the law
\begin{align*}
\mu_j:=\mathbb P\circ X_j^{-1}.
\end{align*}
Because $X_1,\dots,X_n$ are independent, the law of the full random vector $(X_1,\dots,X_n)$ is the product measure
\begin{align*}
\mu_1\otimes\cdots\otimes\mu_n.
\end{align*}
Fix $i\in\{1,\dots,n\}$. The future variables are $X_{i+1},\dots,X_n$, so define their product law by
\begin{align*}
\nu_i:=\mu_{i+1}\otimes\cdots\otimes\mu_n,
\end{align*}
with the convention that when $i=n$, this is the probability measure on a one-point space.
Now define the map $g_i:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ for $\mu_1\otimes\cdots\otimes\mu_i$-almost every $(x_1,\dots,x_i)$ by
\begin{align*}
g_i(x_1,\dots,x_i):=\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n} f(x_1,\dots,x_i,y_{i+1},\dots,y_n)\,d\nu_i(y_{i+1},\dots,y_n).
\end{align*}
This is the expected value of $f$ after fixing the first $i$ coordinates and averaging over the independent future coordinates. The integrability hypothesis on $Z=f(X_1,\dots,X_n)$ says exactly that
\begin{align*}
\int_{\mathcal X_1\times\cdots\times\mathcal X_n} |f(x_1,\dots,x_n)|\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n)<\infty.
\end{align*}
Therefore Tonelli's theorem applied to $|f|$ implies that the future integral defining $g_i$ is finite for $\mu_1\otimes\cdots\otimes\mu_i$-almost every fixed past-and-present tuple.
We verify that $g_i(X_1,\dots,X_i)$ is the conditional expectation of $Z$ given $\mathcal F_i$. Let $\varphi:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ be bounded and measurable. Since $\varphi(X_1,\dots,X_i)$ is $\mathcal F_i$-measurable, the defining identity for conditional expectation requires
\begin{align*}
\mathbb E[\varphi(X_1,\dots,X_i)Z]=\mathbb E[\varphi(X_1,\dots,X_i)g_i(X_1,\dots,X_i)].
\end{align*}
Using the product law of $(X_1,\dots,X_n)$, the left-hand side is
\begin{align*}
\int_{\mathcal X_1\times\cdots\times\mathcal X_n} \varphi(x_1,\dots,x_i)f(x_1,\dots,x_n)\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n).
\end{align*}
[Fubini's theorem](/theorems/2961) applies because $\varphi$ is bounded and $f$ is integrable under the product law. Integrating first over the future coordinates gives
\begin{align*}
\int_{\mathcal X_1\times\cdots\times\mathcal X_i} \varphi(x_1,\dots,x_i)g_i(x_1,\dots,x_i)\,d(\mu_1\otimes\cdots\otimes\mu_i)(x_1,\dots,x_i),
\end{align*}
which equals
\begin{align*}
\mathbb E[\varphi(X_1,\dots,X_i)g_i(X_1,\dots,X_i)].
\end{align*}
Thus
\begin{align*}
M_i=\mathbb E[Z\mid\mathcal F_i]=g_i(X_1,\dots,X_i)
\end{align*}
almost surely.
Before revealing $X_i$, we average this same function over the possible values of $X_i$. Define the map $h_i:\mathcal X_1\times\cdots\times\mathcal X_{i-1}\to\mathbb R$ by
\begin{align*}
h_i(x_1,\dots,x_{i-1}):=\int_{\mathcal X_i}g_i(x_1,\dots,x_{i-1},u)\,d\mu_i(u).
\end{align*}
The same testing argument, now with bounded [measurable functions](/page/Measurable%20Functions) of $(X_1,\dots,X_{i-1})$, proves that
\begin{align*}
M_{i-1}=\mathbb E[Z\mid\mathcal F_{i-1}]=h_i(X_1,\dots,X_{i-1})
\end{align*}
almost surely. This representation is the key reduction: the martingale increment is now a pointwise difference between a function value and its average over the $i$-th coordinate.
[/guided]
[/step]
[step:Show that averaging over future coordinates preserves the bounded oscillation in the current coordinate]
Let $\mu_{1:i-1}:=\mu_1\otimes\cdots\otimes\mu_{i-1}$, with the convention that $\mu_{1:0}$ is the probability measure on a one-point space. Applying Fubini's theorem to $|f|$ under $\mu_{1:i-1}\otimes\mu_i\otimes\nu_i$ and to $|g_i|$ under $\mu_{1:i-1}\otimes\mu_i$, there is a set $P_i\subset\mathcal X_1\times\cdots\times\mathcal X_{i-1}$ with $\mu_{1:i-1}(P_i)=1$ such that, for every $p=(x_1,\dots,x_{i-1})\in P_i$, the map $F_p:\mathcal X_i\times\mathcal X_{i+1}\times\cdots\times\mathcal X_n\to\mathbb R$ defined by
\begin{align*}
F_p(u,y_{i+1},\dots,y_n):=f(x_1,\dots,x_{i-1},u,y_{i+1},\dots,y_n)
\end{align*}
is integrable with respect to $\mu_i\otimes\nu_i$, and the identity
\begin{align*}
h_i(p)=\int_{\mathcal X_i}g_i(p,u)\,d\mu_i(u)
\end{align*}
holds. For such $p$, Fubini's theorem gives a set $U_p\subset\mathcal X_i$ with $\mu_i(U_p)=1$ such that $F_p(u,\cdot)$ is $\nu_i$-integrable and
\begin{align*}
g_i(p,u)=\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}F_p(u,y)\,d\nu_i(y)
\end{align*}
for every $u\in U_p$. No change of version is made in this step; all subsequent estimates are restricted to the full-measure section $U_p$, which is enough for the almost-sure martingale bound.
Now let $u,v\in U_p$. For every $y=(y_{i+1},\dots,y_n)$, the points $(p,u,y)$ and $(p,v,y)$ differ only in coordinate $i$, so
\begin{align*}
|F_p(u,y)-F_p(v,y)|\le c_i.
\end{align*}
Integrating this inequality with respect to the probability measure $\nu_i$ and using the triangle inequality for integrals gives
\begin{align*}
|g_i(p,u)-g_i(p,v)|=\left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|.
\end{align*}
The triangle inequality for integrals gives
\begin{align*}
\left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|\le \int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}|F_p(u,y)-F_p(v,y)|\,d\nu_i(y).
\end{align*}
Since $|F_p(u,y)-F_p(v,y)|\le c_i$ for every $y$ and $\nu_i$ is a probability measure, the last integral is at most $c_i$. Thus, for every $p\in P_i$, the map $u\mapsto g_i(p,u)$ has oscillation at most $c_i$ on the full-measure set $U_p\subset\mathcal X_i$.
[guided]
We now prove the main pointwise estimate behind the martingale bound: after the future variables have been averaged out, changing the current coordinate $u$ can still change the conditional average by no more than $c_i$.
Define the past product measure
\begin{align*}
\mu_{1:i-1}:=\mu_1\otimes\cdots\otimes\mu_{i-1},
\end{align*}
with the convention that $\mu_{1:0}$ is the probability measure on a one-point space. We apply [Fubini's theorem](/theorems/2961) to $|f|$ with respect to the product measure $\mu_{1:i-1}\otimes\mu_i\otimes\nu_i$. This is valid because the previous step showed that $f$ is integrable under the full product law. Therefore there is a set $P_i\subset\mathcal X_1\times\cdots\times\mathcal X_{i-1}$ satisfying $\mu_{1:i-1}(P_i)=1$ such that, for every $p=(x_1,\dots,x_{i-1})\in P_i$, the map $F_p:\mathcal X_i\times\mathcal X_{i+1}\times\cdots\times\mathcal X_n\to\mathbb R$ defined by
\begin{align*}
F_p(u,y_{i+1},\dots,y_n):=f(x_1,\dots,x_{i-1},u,y_{i+1},\dots,y_n)
\end{align*}
is integrable with respect to $\mu_i\otimes\nu_i$.
Fix $p\in P_i$. Applying [Fubini's theorem](/theorems/2961) once more to the integrable function $F_p$ gives a full-measure set $U_p\subset\mathcal X_i$ such that $F_p(u,\cdot)$ is $\nu_i$-integrable for every $u\in U_p$. Choose $u_0\in U_p$; this is possible because $\mu_i(U_p)=1$. If $u\in\mathcal X_i$ and $y=(y_{i+1},\dots,y_n)$ is a future-coordinate tuple, then $(p,u,y)$ and $(p,u_0,y)$ differ only in coordinate $i$. The bounded differences condition therefore gives
\begin{align*}
|F_p(u,y)|\le |F_p(u_0,y)|+c_i.
\end{align*}
The right-hand side is $\nu_i$-integrable because $F_p(u_0,\cdot)$ is $\nu_i$-integrable and $\nu_i$ is a probability measure. Hence $F_p(u,\cdot)$ is $\nu_i$-integrable for every $u\in\mathcal X_i$, not merely for $\mu_i$-almost every $u$. We now choose a pointwise version of $g_i$: we keep the old values wherever the original section integral already defined the conditional-expectation representative, and for each $p\in P_i$ and remaining $u\in\mathcal X_i$ we define $g_i(p,u)$ by that same $\nu_i$-section integral. This modification occurs only on a $(\mu_1\otimes\cdots\otimes\mu_i)$-null set, so it does not change the almost-sure identity $M_i=g_i(X_1,\dots,X_i)$, but it lets us use $g_i(p,u)$ for every current-coordinate value $u$ at every $p\in P_i$.
Now take arbitrary $u,v\in\mathcal X_i$. For every future tuple $y$, the two complete inputs $(p,u,y)$ and $(p,v,y)$ differ only in coordinate $i$, so the bounded differences condition yields
\begin{align*}
|F_p(u,y)-F_p(v,y)|\le c_i.
\end{align*}
By the definition of $g_i$ at the fixed past tuple $p$,
\begin{align*}
|g_i(p,u)-g_i(p,v)|=\left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|.
\end{align*}
The triangle inequality for integrals gives
\begin{align*}
\left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|\le \int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}|F_p(u,y)-F_p(v,y)|\,d\nu_i(y).
\end{align*}
Since the integrand is bounded above by $c_i$ and $\nu_i$ has total mass $1$, this integral is at most $c_i$. Thus
\begin{align*}
|g_i(p,u)-g_i(p,v)|\le c_i.
\end{align*}
So for every $p\in P_i$, the current-coordinate section $u\mapsto g_i(p,u)$ has oscillation at most $c_i$ on all of $\mathcal X_i$.
[/guided]
[/step]
[step:Compare the revealed value with its conditional average]
Fix $p=(x_1,\dots,x_{i-1})\in P_i$ and $x_i\in U_p$. By the oscillation estimate from the previous step, for every $u\in U_p$,
\begin{align*}
g_i(p,x_i)-c_i\leq g_i(p,u)\leq g_i(p,x_i)+c_i.
\end{align*}
Since $\mu_i(U_p)=1$, integrating this double inequality with respect to the probability measure $\mu_i$ gives
\begin{align*}
g_i(p,x_i)-c_i
\leq
\int_{\mathcal X_i}g_i(p,u)\,d\mu_i(u)
\leq
g_i(p,x_i)+c_i.
\end{align*}
By the definition of $h_i$, this is
\begin{align*}
|g_i(x_1,\dots,x_i)-h_i(x_1,\dots,x_{i-1})|\leq c_i.
\end{align*}
Since $\mu_{1:i-1}(P_i)=1$ and $\mu_i(U_p)=1$ for every $p\in P_i$, independence and Fubini's theorem imply
\begin{align*}
\mathbb P((X_1,\dots,X_{i-1})\in P_i\text{ and }X_i\in U_{(X_1,\dots,X_{i-1})})=1.
\end{align*}
Substituting $(x_1,\dots,x_i)=(X_1,\dots,X_i)$ on this probability-one event and using the conditional expectation representations
\begin{align*}
M_i=g_i(X_1,\dots,X_i)
\end{align*}
and
\begin{align*}
M_{i-1}=h_i(X_1,\dots,X_{i-1}),
\end{align*}
we obtain
\begin{align*}
|M_i-M_{i-1}|\leq c_i
\end{align*}
almost surely. Since $i\in\{1,\dots,n\}$ was arbitrary, the bound holds for every martingale difference.
[guided]
Fix $p=(x_1,\dots,x_{i-1})\in P_i$ and $x_i\in\mathcal X_i$. The previous step proved that every value of the function $u\mapsto g_i(p,u)$ lies within distance $c_i$ of every other value. In particular, for every $u\in\mathcal X_i$,
\begin{align*}
g_i(p,x_i)-c_i\leq g_i(p,u)\leq g_i(p,x_i)+c_i.
\end{align*}
We now average this interval inequality with respect to the probability measure $\mu_i$. Since $\mu_i(\mathcal X_i)=1$, integration preserves the two endpoint constants and gives
\begin{align*}
g_i(p,x_i)-c_i\leq \int_{\mathcal X_i}g_i(p,u)\,d\mu_i(u)\leq g_i(p,x_i)+c_i.
\end{align*}
By the definition of $h_i$, this is exactly
\begin{align*}
|g_i(x_1,\dots,x_i)-h_i(x_1,\dots,x_{i-1})|\leq c_i.
\end{align*}
The set $P_i$ has $\mu_{1:i-1}$-measure $1$. Because independence identifies the law of $(X_1,\dots,X_{i-1})$ with $\mu_{1:i-1}$, we have $(X_1,\dots,X_{i-1})\in P_i$ almost surely. On this probability-one event, substitute $(x_1,\dots,x_i)=(X_1,\dots,X_i)$ into the pointwise estimate. The conditional expectation representations from the first step give
\begin{align*}
M_i=g_i(X_1,\dots,X_i)
\end{align*}
and
\begin{align*}
M_{i-1}=h_i(X_1,\dots,X_{i-1}).
\end{align*}
Therefore
\begin{align*}
|M_i-M_{i-1}|\leq c_i
\end{align*}
almost surely. Since the index $i\in\{1,\dots,n\}$ was arbitrary, this proves the asserted bound for every Doob martingale difference.
[/guided]
[/step]