[guided]The key point is that the future coordinates are still averaged against their original product law after the first $i$ coordinates have been revealed. This is exactly where independence is used, and no identical-distribution assumption is involved.
Fix $i\in\{1,\dots,n\}$. Once the first $i$ coordinates are prescribed as
\begin{align*}
a=(a_1,\dots,a_i)\in\prod_{j=1}^i\mathcal X_j,
\end{align*}
the only remaining randomness is in the future coordinates. Because $f\in L^1(\mathcal X,\mathcal A,\mu)$ and $\mu$ is a product of finitely many probability measures, [Fubini's theorem](/theorems/2961) implies that the future-coordinate sections of $f$ are integrable for almost every prefix $a$ with respect to $\bigotimes_{j=1}^i\mu_j$. After modifying the definition on the exceptional null set so that the result is finite and measurable, we encode the remaining average by the measurable map $g_i:\prod_{j=1}^i\mathcal X_j\to\mathbb R$ defined by
\begin{align*}
g_i(a_1,\dots,a_i)
:=
\int_{\prod_{j=i+1}^n\mathcal X_j}
f(a_1,\dots,a_i,u_{i+1},\dots,u_n)
\,d\left(\bigotimes_{j=i+1}^n\mu_j\right)(u_{i+1},\dots,u_n).
\end{align*}
When $i=n$, no future coordinates remain, the product over $j>n$ is empty, and the definition reduces to
\begin{align*}
g_n(a_1,\dots,a_n)=f(a_1,\dots,a_n).
\end{align*}
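As an illustration only, and not as part of the argument, suppose every $\mathcal X_j=[0,1]$ and $f$ is the empirical mean. The symbols $m_j$ are notation introduced here for the coordinate means. Under these assumptions the map $g_i$ can be written in closed form:
\begin{align*}
f(x_1,\dots,x_n)=\frac1n\sum_{j=1}^n x_j,
\qquad
m_j:=\int_{[0,1]}u\,d\mu_j(u),
\qquad
g_i(a_1,\dots,a_i)=\frac1n\left(\sum_{j=1}^i a_j+\sum_{j=i+1}^n m_j\right).
\end{align*}
In this example replacing a single coordinate changes $f$ by at most $1/n$, and $g_i$ depends on the revealed prefix only through its partial sum, while each unrevealed coordinate is replaced by its own mean $m_j$, which need not be the same for different $j$.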
Since the joint law is the product measure $\mu=\bigotimes_{j=1}^n\mu_j$, the product-space formula for [conditional expectation](/page/Conditional%20Expectation), again justified by Fubini's theorem, gives
\begin{align*}
M_i=\mathbb E_\mu[f(Y)\mid\mathcal F_i]
=g_i(Y_1,\dots,Y_i)
\end{align*}
almost surely. Conditioning on one fewer coordinate gives
\begin{align*}
M_{i-1}
=
\int_{\mathcal X_i} g_i(Y_1,\dots,Y_{i-1},v_i)\,d\mu_i(v_i)
\end{align*}
almost surely.
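The last display can be checked directly from the definition of $g_i$. The following is a sketch, assuming $i\ge2$ so that $g_{i-1}$ is given by the same formula (for $i=1$ the left-hand side is the constant $\int_{\mathcal X}f\,d\mu$ and the same computation applies). For almost every prefix $(a_1,\dots,a_{i-1})$, Fubini's theorem allows the integral over the coordinates $i,\dots,n$ to be iterated:
\begin{align*}
g_{i-1}(a_1,\dots,a_{i-1})
&=\int_{\prod_{j=i}^n\mathcal X_j}
f(a_1,\dots,a_{i-1},v_i,u_{i+1},\dots,u_n)
\,d\left(\bigotimes_{j=i}^n\mu_j\right)(v_i,u_{i+1},\dots,u_n)\\
&=\int_{\mathcal X_i}\left(\int_{\prod_{j=i+1}^n\mathcal X_j}
f(a_1,\dots,a_{i-1},v_i,u_{i+1},\dots,u_n)
\,d\left(\bigotimes_{j=i+1}^n\mu_j\right)(u_{i+1},\dots,u_n)\right)d\mu_i(v_i)\\
&=\int_{\mathcal X_i} g_i(a_1,\dots,a_{i-1},v_i)\,d\mu_i(v_i).
\end{align*}
Substituting the random prefix $(Y_1,\dots,Y_{i-1})$ and using $M_{i-1}=g_{i-1}(Y_1,\dots,Y_{i-1})$ almost surely recovers the stated identity.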
Now fix a prefix $(a_1,\dots,a_{i-1})$ outside the null set supplied by Fubini's theorem. The bounded differences hypothesis says that replacing only the $i$-th coordinate changes $f$ by at most $c_i$, uniformly over all future coordinates on the product space. Thus Fubini's theorem, applied to the non-negative measurable function obtained from the absolute difference of the two future-coordinate sections, gives for $\mu_i\otimes\mu_i$-almost every pair $(v_i,w_i)$ the integrated inequality
\begin{align*}
|g_i(a_1,\dots,a_{i-1},v_i)-g_i(a_1,\dots,a_{i-1},w_i)|\le c_i.
\end{align*}
This almost-everywhere pairwise estimate is exactly what is needed to control essential ranges: as the $i$-th coordinate varies according to $\mu_i$, the essential supremum and essential infimum of $v_i\mapsto g_i(a_1,\dots,a_{i-1},v_i)$ differ by at most $c_i$.
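The integrated inequality can be spelled out in one chain. This is a sketch in which $a_{<i}:=(a_1,\dots,a_{i-1})$, $u:=(u_{i+1},\dots,u_n)$ and $\nu_i:=\bigotimes_{j=i+1}^n\mu_j$ are shorthand introduced here, and $(v_i,w_i)$ is any pair for which both future-coordinate sections are integrable:
\begin{align*}
|g_i(a_{<i},v_i)-g_i(a_{<i},w_i)|
&=\left|\int\bigl(f(a_{<i},v_i,u)-f(a_{<i},w_i,u)\bigr)\,d\nu_i(u)\right|\\
&\le\int\bigl|f(a_{<i},v_i,u)-f(a_{<i},w_i,u)\bigr|\,d\nu_i(u)\\
&\le\int c_i\,d\nu_i(u)
=c_i.
\end{align*}
The second inequality is the bounded differences hypothesis applied pointwise in $u$, and the final equality uses only that $\nu_i$ is a probability measure. Both values of $g_i$ are averaged against the same law $\nu_i$, which is where independence of the future coordinates from the $i$-th coordinate enters.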
To turn this fiberwise statement into conditional bounds, define $A_i:\mathcal X\to\mathbb R\cup\{-\infty\}$ to be the conditional essential infimum of $M_i$ given $\mathcal F_{i-1}$, and define $B_i:\mathcal X\to\mathbb R\cup\{\infty\}$ to be the conditional essential supremum of $M_i$ given $\mathcal F_{i-1}$. These are not arbitrary fiberwise choices: by the definition of conditional essential infimum and supremum, $A_i$ and $B_i$ are $\mathcal F_{i-1}$-measurable random variables. They satisfy
\begin{align*}
A_i\le M_i\le B_i
\end{align*}
almost surely.
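In the product setting these conditional bounds admit a concrete fiberwise description, which may help in reading the next step. The functions $\alpha_i$ and $\beta_i$ are notation introduced here, and the identification below is stated as a standard consequence of Fubini's theorem without proof:
\begin{align*}
\alpha_i(a_1,\dots,a_{i-1})
&:=\operatorname*{ess\,inf}_{v_i\sim\mu_i}\,g_i(a_1,\dots,a_{i-1},v_i),
&
\beta_i(a_1,\dots,a_{i-1})
&:=\operatorname*{ess\,sup}_{v_i\sim\mu_i}\,g_i(a_1,\dots,a_{i-1},v_i),\\
A_i&=\alpha_i(Y_1,\dots,Y_{i-1}),
&
B_i&=\beta_i(Y_1,\dots,Y_{i-1})
\end{align*}
almost surely. For $i=1$ the prefix is empty and both $\alpha_1$ and $\beta_1$ are constants.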
Why does the length bound survive this conditional formulation? Outside the Fubini null set, conditioning on $\mathcal F_{i-1}$ fixes the prefix $(Y_1,\dots,Y_{i-1})$ and leaves the $i$-th coordinate distributed according to $\mu_i$. The preceding pairwise estimate says that the resulting essential range of $g_i$ over that coordinate has length at most $c_i$. Therefore the conditional essential bounds satisfy
\begin{align*}
B_i-A_i\le c_i
\end{align*}
almost surely. Since $M_i$ is real-valued almost surely and this conditional essential range has finite length, we may use finite versions of $A_i$ and $B_i$ after altering them on a null set. Since
\begin{align*}
M_{i-1}=\mathbb E_\mu[M_i\mid\mathcal F_{i-1}],
\end{align*}
the conditional average of a [random variable](/page/Random%20Variable) lying between $A_i$ and $B_i$ also lies between those same $\mathcal F_{i-1}$-measurable bounds:
\begin{align*}
A_i\le M_{i-1}\le B_i
\end{align*}
almost surely. Consequently the martingale increment
\begin{align*}
D_i=M_i-M_{i-1}
\end{align*}
has conditional mean
\begin{align*}
\mathbb E_\mu[D_i\mid\mathcal F_{i-1}]=0
\end{align*}
and, conditionally on $\mathcal F_{i-1}$, lies almost surely in the interval $[A_i-M_{i-1},B_i-M_{i-1}]$, which has $\mathcal F_{i-1}$-measurable endpoints, contains $0$, and has length $B_i-A_i\le c_i$. This is the precise form of the bounded increment estimate needed for the exponential argument.[/guided]