Martingale Difference Bound under the Bounded Differences Condition

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We express the Doob martingale increments as differences between two conditional averages of $f$: one after revealing $X_i$ and one before revealing $X_i$. Independence lets us write these conditional expectations as product-measure integrals over the unrevealed coordinates. The bounded differences condition says that changing only the $i$-th coordinate changes each future-conditioned value by at most $c_i$, and averaging over future coordinates preserves that oscillation bound. Finally, the value after revealing $X_i$ and the average before revealing $X_i$ both lie in an interval of length at most $c_i$. [/proofplan] [step:Represent the conditional expectations by product-measure kernels] For each $j\in\{1,\dots,n\}$, let $\mu_j:=\mathbb P\circ X_j^{-1}$ be the law of the measurable map $X_j:(\Omega,\mathcal F)\to(\mathcal X_j,\mathcal A_j)$. Fix $i\in\{1,\dots,n\}$. Define the future product measure \begin{align*} \nu_i:=\mu_{i+1}\otimes\cdots\otimes\mu_n \end{align*} on $(\mathcal X_{i+1}\times\cdots\times\mathcal X_n,\mathcal A_{i+1}\otimes\cdots\otimes\mathcal A_n)$, with the convention that $\nu_n$ is the probability measure on the one-point measurable space. Since $f$ is product-measurable, the section \begin{align*} (x_1,\dots,x_i,y_{i+1},\dots,y_n)\mapsto f(x_1,\dots,x_i,y_{i+1},\dots,y_n) \end{align*} is measurable with respect to $(\mathcal A_1\otimes\cdots\otimes\mathcal A_i)\otimes(\mathcal A_{i+1}\otimes\cdots\otimes\mathcal A_n)$. Since the variables $X_1,\dots,X_n$ are independent, the law of $(X_1,\dots,X_n)$ is $\mu_1\otimes\cdots\otimes\mu_n$. Hence integrability of $Z=f(X_1,\dots,X_n)$ is exactly \begin{align*} \int_{\mathcal X_1\times\cdots\times\mathcal X_n} |f(x_1,\dots,x_n)|\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n)<\infty. \end{align*} Tonelli's theorem applied to $|f|$ with respect to $\mu_1\otimes\cdots\otimes\mu_n$ gives a finite section integral for $\mu_1\otimes\cdots\otimes\mu_i$-almost every $(x_1,\dots,x_i)$. [Fubini's theorem](/theorems/2961) also gives measurability and integrability of the section integral as a function of $(x_1,\dots,x_i)$. Define the measurable integrable map $g_i:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ by this section integral on the full-measure set where it is finite and by $0$ on its complement. We claim that \begin{align*} M_i=g_i(X_1,\dots,X_i) \end{align*} almost surely, after modifying $g_i$ on a null set if necessary. Indeed, for every bounded $\mathcal A_1\otimes\cdots\otimes\mathcal A_i$-measurable function $\varphi:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$, independence gives \begin{align*} \mathbb E[\varphi(X_1,\dots,X_i)Z] = \int_{\mathcal X_1\times\cdots\times\mathcal X_i} \varphi(x_1,\dots,x_i)g_i(x_1,\dots,x_i)\, d(\mu_1\otimes\cdots\otimes\mu_i)(x_1,\dots,x_i). \end{align*} This is precisely the defining identity for [conditional expectation](/page/Conditional%20Expectation) with respect to $\mathcal F_i=\sigma(X_1,\dots,X_i)$. Similarly, since $g_i$ is integrable with respect to $\mu_1\otimes\cdots\otimes\mu_i$, Fubini's theorem gives a measurable integrable section integral over $\mathcal X_i$. Define the measurable integrable map $h_i:\mathcal X_1\times\cdots\times\mathcal X_{i-1}\to\mathbb R$ by \begin{align*} h_i(x_1,\dots,x_{i-1}):=\int_{\mathcal X_i} g_i(x_1,\dots,x_{i-1},u)\,d\mu_i(u) \end{align*} on the full-measure set where this integral is finite, and by $0$ on its complement. For every bounded $\mathcal A_1\otimes\cdots\otimes\mathcal A_{i-1}$-measurable function $\psi:\mathcal X_1\times\cdots\times\mathcal X_{i-1}\to\mathbb R$, the product representation of the law and Fubini's theorem give \begin{align*} \mathbb E[\psi(X_1,\dots,X_{i-1})Z] = \int_{\mathcal X_1\times\cdots\times\mathcal X_{i-1}} \psi(x_1,\dots,x_{i-1})h_i(x_1,\dots,x_{i-1})\, d(\mu_1\otimes\cdots\otimes\mu_{i-1})(x_1,\dots,x_{i-1}). \end{align*} Here Fubini's theorem applies because $\psi$ is bounded and $f$ is integrable under the full product law. This is the defining identity for conditional expectation with respect to $\mathcal F_{i-1}=\sigma(X_1,\dots,X_{i-1})$, so \begin{align*} M_{i-1}=h_i(X_1,\dots,X_{i-1}) \end{align*} almost surely. [guided] The martingale increment compares what we know after seeing $X_i$ with what we knew just before seeing $X_i$. To make this comparison concrete, we write both conditional expectations as ordinary integrals. For each coordinate $j$, define the law \begin{align*} \mu_j:=\mathbb P\circ X_j^{-1}. \end{align*} Because $X_1,\dots,X_n$ are independent, the law of the full random vector $(X_1,\dots,X_n)$ is the product measure \begin{align*} \mu_1\otimes\cdots\otimes\mu_n. \end{align*} Fix $i\in\{1,\dots,n\}$. The future variables are $X_{i+1},\dots,X_n$, so define their product law by \begin{align*} \nu_i:=\mu_{i+1}\otimes\cdots\otimes\mu_n, \end{align*} with the convention that when $i=n$, this is the probability measure on a one-point space. Now define the map $g_i:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ for $\mu_1\otimes\cdots\otimes\mu_i$-almost every $(x_1,\dots,x_i)$ by \begin{align*} g_i(x_1,\dots,x_i):=\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n} f(x_1,\dots,x_i,y_{i+1},\dots,y_n)\,d\nu_i(y_{i+1},\dots,y_n). \end{align*} This is the expected value of $f$ after fixing the first $i$ coordinates and averaging over the independent future coordinates. The integrability hypothesis on $Z=f(X_1,\dots,X_n)$ says exactly that \begin{align*} \int_{\mathcal X_1\times\cdots\times\mathcal X_n} |f(x_1,\dots,x_n)|\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n)<\infty. \end{align*} Therefore Tonelli's theorem applied to $|f|$ implies that the future integral defining $g_i$ is finite for $\mu_1\otimes\cdots\otimes\mu_i$-almost every fixed past-and-present tuple. We verify that $g_i(X_1,\dots,X_i)$ is the conditional expectation of $Z$ given $\mathcal F_i$. Let $\varphi:\mathcal X_1\times\cdots\times\mathcal X_i\to\mathbb R$ be bounded and measurable. Since $\varphi(X_1,\dots,X_i)$ is $\mathcal F_i$-measurable, the defining identity for conditional expectation requires \begin{align*} \mathbb E[\varphi(X_1,\dots,X_i)Z]=\mathbb E[\varphi(X_1,\dots,X_i)g_i(X_1,\dots,X_i)]. \end{align*} Using the product law of $(X_1,\dots,X_n)$, the left-hand side is \begin{align*} \int_{\mathcal X_1\times\cdots\times\mathcal X_n} \varphi(x_1,\dots,x_i)f(x_1,\dots,x_n)\,d(\mu_1\otimes\cdots\otimes\mu_n)(x_1,\dots,x_n). \end{align*} [Fubini's theorem](/theorems/2961) applies because $\varphi$ is bounded and $f$ is integrable under the product law. Integrating first over the future coordinates gives \begin{align*} \int_{\mathcal X_1\times\cdots\times\mathcal X_i} \varphi(x_1,\dots,x_i)g_i(x_1,\dots,x_i)\,d(\mu_1\otimes\cdots\otimes\mu_i)(x_1,\dots,x_i), \end{align*} which equals \begin{align*} \mathbb E[\varphi(X_1,\dots,X_i)g_i(X_1,\dots,X_i)]. \end{align*} Thus \begin{align*} M_i=\mathbb E[Z\mid\mathcal F_i]=g_i(X_1,\dots,X_i) \end{align*} almost surely. Before revealing $X_i$, we average this same function over the possible values of $X_i$. Define the map $h_i:\mathcal X_1\times\cdots\times\mathcal X_{i-1}\to\mathbb R$ by \begin{align*} h_i(x_1,\dots,x_{i-1}):=\int_{\mathcal X_i}g_i(x_1,\dots,x_{i-1},u)\,d\mu_i(u). \end{align*} The same testing argument, now with bounded [measurable functions](/page/Measurable%20Functions) of $(X_1,\dots,X_{i-1})$, proves that \begin{align*} M_{i-1}=\mathbb E[Z\mid\mathcal F_{i-1}]=h_i(X_1,\dots,X_{i-1}) \end{align*} almost surely. This representation is the key reduction: the martingale increment is now a pointwise difference between a function value and its average over the $i$-th coordinate. [/guided] [/step] [step:Show that averaging over future coordinates preserves the bounded oscillation in the current coordinate] Let $\mu_{1:i-1}:=\mu_1\otimes\cdots\otimes\mu_{i-1}$, with the convention that $\mu_{1:0}$ is the probability measure on a one-point space. Applying Fubini's theorem to $|f|$ under $\mu_{1:i-1}\otimes\mu_i\otimes\nu_i$ and to $|g_i|$ under $\mu_{1:i-1}\otimes\mu_i$, there is a set $P_i\subset\mathcal X_1\times\cdots\times\mathcal X_{i-1}$ with $\mu_{1:i-1}(P_i)=1$ such that, for every $p=(x_1,\dots,x_{i-1})\in P_i$, the map $F_p:\mathcal X_i\times\mathcal X_{i+1}\times\cdots\times\mathcal X_n\to\mathbb R$ defined by \begin{align*} F_p(u,y_{i+1},\dots,y_n):=f(x_1,\dots,x_{i-1},u,y_{i+1},\dots,y_n) \end{align*} is integrable with respect to $\mu_i\otimes\nu_i$, and the identity \begin{align*} h_i(p)=\int_{\mathcal X_i}g_i(p,u)\,d\mu_i(u) \end{align*} holds. For such $p$, Fubini's theorem gives a set $U_p\subset\mathcal X_i$ with $\mu_i(U_p)=1$ such that $F_p(u,\cdot)$ is $\nu_i$-integrable and \begin{align*} g_i(p,u)=\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}F_p(u,y)\,d\nu_i(y) \end{align*} for every $u\in U_p$. No change of version is made in this step; all subsequent estimates are restricted to the full-measure section $U_p$, which is enough for the almost-sure martingale bound. Now let $u,v\in U_p$. For every $y=(y_{i+1},\dots,y_n)$, the points $(p,u,y)$ and $(p,v,y)$ differ only in coordinate $i$, so \begin{align*} |F_p(u,y)-F_p(v,y)|\le c_i. \end{align*} Integrating this inequality with respect to the probability measure $\nu_i$ and using the triangle inequality for integrals gives \begin{align*} |g_i(p,u)-g_i(p,v)|=\left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|. \end{align*} The triangle inequality for integrals gives \begin{align*} \left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|\le \int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}|F_p(u,y)-F_p(v,y)|\,d\nu_i(y). \end{align*} Since $|F_p(u,y)-F_p(v,y)|\le c_i$ for every $y$ and $\nu_i$ is a probability measure, the last integral is at most $c_i$. Thus, for every $p\in P_i$, the map $u\mapsto g_i(p,u)$ has oscillation at most $c_i$ on the full-measure set $U_p\subset\mathcal X_i$. [guided] We now prove the main pointwise estimate behind the martingale bound: after the future variables have been averaged out, changing the current coordinate $u$ can still change the conditional average by no more than $c_i$. Define the past product measure \begin{align*} \mu_{1:i-1}:=\mu_1\otimes\cdots\otimes\mu_{i-1}, \end{align*} with the convention that $\mu_{1:0}$ is the probability measure on a one-point space. We apply [Fubini's theorem](/theorems/2961) to $|f|$ with respect to the product measure $\mu_{1:i-1}\otimes\mu_i\otimes\nu_i$. This is valid because the previous step showed that $f$ is integrable under the full product law. Therefore there is a set $P_i\subset\mathcal X_1\times\cdots\times\mathcal X_{i-1}$ satisfying $\mu_{1:i-1}(P_i)=1$ such that, for every $p=(x_1,\dots,x_{i-1})\in P_i$, the map $F_p:\mathcal X_i\times\mathcal X_{i+1}\times\cdots\times\mathcal X_n\to\mathbb R$ defined by \begin{align*} F_p(u,y_{i+1},\dots,y_n):=f(x_1,\dots,x_{i-1},u,y_{i+1},\dots,y_n) \end{align*} is integrable with respect to $\mu_i\otimes\nu_i$. Fix $p\in P_i$. Applying [Fubini's theorem](/theorems/2961) once more to the integrable function $F_p$ gives a full-measure set $U_p\subset\mathcal X_i$ such that $F_p(u,\cdot)$ is $\nu_i$-integrable for every $u\in U_p$. Choose $u_0\in U_p$; this is possible because $\mu_i(U_p)=1$. If $u\in\mathcal X_i$ and $y=(y_{i+1},\dots,y_n)$ is a future-coordinate tuple, then $(p,u,y)$ and $(p,u_0,y)$ differ only in coordinate $i$. The bounded differences condition therefore gives \begin{align*} |F_p(u,y)|\le |F_p(u_0,y)|+c_i. \end{align*} The right-hand side is $\nu_i$-integrable because $F_p(u_0,\cdot)$ is $\nu_i$-integrable and $\nu_i$ is a probability measure. Hence $F_p(u,\cdot)$ is $\nu_i$-integrable for every $u\in\mathcal X_i$, not merely for $\mu_i$-almost every $u$. We now choose a pointwise version of $g_i$: we keep the old values wherever the original section integral already defined the conditional-expectation representative, and for each $p\in P_i$ and remaining $u\in\mathcal X_i$ we define $g_i(p,u)$ by that same $\nu_i$-section integral. This modification occurs only on a $(\mu_1\otimes\cdots\otimes\mu_i)$-null set, so it does not change the almost-sure identity $M_i=g_i(X_1,\dots,X_i)$, but it lets us use $g_i(p,u)$ for every current-coordinate value $u$ at every $p\in P_i$. Now take arbitrary $u,v\in\mathcal X_i$. For every future tuple $y$, the two complete inputs $(p,u,y)$ and $(p,v,y)$ differ only in coordinate $i$, so the bounded differences condition yields \begin{align*} |F_p(u,y)-F_p(v,y)|\le c_i. \end{align*} By the definition of $g_i$ at the fixed past tuple $p$, \begin{align*} |g_i(p,u)-g_i(p,v)|=\left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|. \end{align*} The triangle inequality for integrals gives \begin{align*} \left|\int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}(F_p(u,y)-F_p(v,y))\,d\nu_i(y)\right|\le \int_{\mathcal X_{i+1}\times\cdots\times\mathcal X_n}|F_p(u,y)-F_p(v,y)|\,d\nu_i(y). \end{align*} Since the integrand is bounded above by $c_i$ and $\nu_i$ has total mass $1$, this integral is at most $c_i$. Thus \begin{align*} |g_i(p,u)-g_i(p,v)|\le c_i. \end{align*} So for every $p\in P_i$, the current-coordinate section $u\mapsto g_i(p,u)$ has oscillation at most $c_i$ on all of $\mathcal X_i$. [/guided] [/step] [step:Compare the revealed value with its conditional average] Fix $p=(x_1,\dots,x_{i-1})\in P_i$ and $x_i\in U_p$. By the oscillation estimate from the previous step, for every $u\in U_p$, \begin{align*} g_i(p,x_i)-c_i\leq g_i(p,u)\leq g_i(p,x_i)+c_i. \end{align*} Since $\mu_i(U_p)=1$, integrating this double inequality with respect to the probability measure $\mu_i$ gives \begin{align*} g_i(p,x_i)-c_i \leq \int_{\mathcal X_i}g_i(p,u)\,d\mu_i(u) \leq g_i(p,x_i)+c_i. \end{align*} By the definition of $h_i$, this is \begin{align*} |g_i(x_1,\dots,x_i)-h_i(x_1,\dots,x_{i-1})|\leq c_i. \end{align*} Since $\mu_{1:i-1}(P_i)=1$ and $\mu_i(U_p)=1$ for every $p\in P_i$, independence and Fubini's theorem imply \begin{align*} \mathbb P((X_1,\dots,X_{i-1})\in P_i\text{ and }X_i\in U_{(X_1,\dots,X_{i-1})})=1. \end{align*} Substituting $(x_1,\dots,x_i)=(X_1,\dots,X_i)$ on this probability-one event and using the conditional expectation representations \begin{align*} M_i=g_i(X_1,\dots,X_i) \end{align*} and \begin{align*} M_{i-1}=h_i(X_1,\dots,X_{i-1}), \end{align*} we obtain \begin{align*} |M_i-M_{i-1}|\leq c_i \end{align*} almost surely. Since $i\in\{1,\dots,n\}$ was arbitrary, the bound holds for every martingale difference. [guided] Fix $p=(x_1,\dots,x_{i-1})\in P_i$ and $x_i\in\mathcal X_i$. The previous step proved that every value of the function $u\mapsto g_i(p,u)$ lies within distance $c_i$ of every other value. In particular, for every $u\in\mathcal X_i$, \begin{align*} g_i(p,x_i)-c_i\leq g_i(p,u)\leq g_i(p,x_i)+c_i. \end{align*} We now average this interval inequality with respect to the probability measure $\mu_i$. Since $\mu_i(\mathcal X_i)=1$, integration preserves the two endpoint constants and gives \begin{align*} g_i(p,x_i)-c_i\leq \int_{\mathcal X_i}g_i(p,u)\,d\mu_i(u)\leq g_i(p,x_i)+c_i. \end{align*} By the definition of $h_i$, this is exactly \begin{align*} |g_i(x_1,\dots,x_i)-h_i(x_1,\dots,x_{i-1})|\leq c_i. \end{align*} The set $P_i$ has $\mu_{1:i-1}$-measure $1$. Because independence identifies the law of $(X_1,\dots,X_{i-1})$ with $\mu_{1:i-1}$, we have $(X_1,\dots,X_{i-1})\in P_i$ almost surely. On this probability-one event, substitute $(x_1,\dots,x_i)=(X_1,\dots,X_i)$ into the pointwise estimate. The conditional expectation representations from the first step give \begin{align*} M_i=g_i(X_1,\dots,X_i) \end{align*} and \begin{align*} M_{i-1}=h_i(X_1,\dots,X_{i-1}). \end{align*} Therefore \begin{align*} |M_i-M_{i-1}|\leq c_i \end{align*} almost surely. Since the index $i\in\{1,\dots,n\}$ was arbitrary, this proves the asserted bound for every Doob martingale difference. [/guided] [/step]

Prerequisites (0/6 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.