Androma — The Home of Mathematics on the Internet


[step:Bound each martingale increment by the corresponding coordinate oscillation]Fix $i\in\{1,\dots,n\}$. By [Fubini's theorem](/theorems/2961), applied to the integrable function $f\in L^1(\mathcal X,\mathcal A,\mu)$ on the finite product probability space, the following section integrals are finite for $\bigotimes_{j=1}^i\mu_j$-almost every prefix. Choose measurable versions on the remaining null set arbitrarily. Define the measurable map $g_i:\prod_{j=1}^i\mathcal X_j\to\mathbb R$ by \begin{align*} g_i(a_1,\dots,a_i) := \int_{\prod_{j=i+1}^n\mathcal X_j} f(a_1,\dots,a_i,u_{i+1},\dots,u_n) \,d\left(\bigotimes_{j=i+1}^n\mu_j\right)(u_{i+1},\dots,u_n), \end{align*} with the convention that, when $i=n$, the integral over the empty product is the value of $f(a_1,\dots,a_n)$. The standard product-space formula for [conditional expectation](/page/Conditional%20Expectation), justified by [Fubini's theorem](/theorems/2961), gives \begin{align*} M_i=g_i(Y_1,\dots,Y_i) \end{align*} almost surely. Similarly, \begin{align*} M_{i-1}= \int_{\mathcal X_i} g_i(Y_1,\dots,Y_{i-1},v_i)\,d\mu_i(v_i) \end{align*} almost surely. For $\bigotimes_{j=1}^{i-1}\mu_j$-almost every prefix $(a_1,\dots,a_{i-1})$, Fubini's theorem applied on $\mathcal X_i\times\mathcal X_i\times\prod_{j=i+1}^n\mathcal X_j$ gives, for $\mu_i\otimes\mu_i$-almost every pair $(v_i,w_i)$, \begin{align*} |g_i(a_1,\dots,a_{i-1},v_i)-g_i(a_1,\dots,a_{i-1},w_i)|\le c_i. \end{align*} This follows by integrating the bounded differences inequality in the future coordinates against $\bigotimes_{j=i+1}^n\mu_j$. Hence, for almost every fixed prefix, the essential range with respect to $\mu_i$ of the function $v_i\mapsto g_i(a_1,\dots,a_{i-1},v_i)$ has length at most $c_i$. Define $A_i:\mathcal X\to\mathbb R\cup\{-\infty\}$ to be the conditional essential infimum of $M_i$ given $\mathcal F_{i-1}$, and define $B_i:\mathcal X\to\mathbb R\cup\{\infty\}$ to be the conditional essential supremum of $M_i$ given $\mathcal F_{i-1}$. 
By definition of conditional essential infimum and conditional essential supremum, $A_i$ and $B_i$ are $\mathcal F_{i-1}$-measurable, and \begin{align*} A_i\le M_i\le B_i \end{align*} almost surely. The preceding fiberwise essential-range estimate gives \begin{align*} B_i-A_i\le c_i \end{align*} almost surely, because conditional essential bounds over $\mathcal F_{i-1}$ are computed by fixing the revealed prefix outside the Fubini null set and then taking essential bounds over the remaining $i$-th coordinate. Since $M_i$ is finite almost surely and this conditional essential range has finite length, the bounds $A_i$ and $B_i$ may be chosen finite almost surely after changing them on a null set. Since $M_{i-1}=\mathbb E_\mu[M_i\mid\mathcal F_{i-1}]$, the conditional average also satisfies \begin{align*} A_i\le M_{i-1}\le B_i \end{align*} almost surely. Therefore $D_i=M_i-M_{i-1}$ has conditional mean \begin{align*} \mathbb E_\mu[D_i\mid\mathcal F_{i-1}]=0 \end{align*} and, conditionally on $\mathcal F_{i-1}$, lies in the $\mathcal F_{i-1}$-measurable interval $[A_i-M_{i-1},B_i-M_{i-1}]$, whose length is at most $c_i$.[/step]
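The oscillation bound in this step can be checked by exact enumeration on a small finite product space. The following is only an illustrative sketch: the coordinate laws `p`, the test function `f` (a sample mean, so that $c_i=1/n$), and the helper names `weight`, `g`, and `max_oscillation` are all assumptions made for the example and are not part of the proof.

```python
from itertools import product

# Hypothetical toy setup: three independent, non-identically distributed
# coordinates on {0, 1}, with mu_j({1}) = p[j] (0-indexed).
p = [0.2, 0.5, 0.9]
n = len(p)

def f(x):
    # Sample mean: changing a single coordinate moves f by at most 1/n.
    return sum(x) / n

# Bounded-differences constants for this particular f.
c = [1.0 / n] * n

def weight(j, u):
    # Mass that the j-th coordinate law (0-indexed) assigns to the point u.
    return p[j] if u == 1 else 1.0 - p[j]

def g(i, prefix):
    # g_i(a_1, ..., a_i): integrate f over coordinates i+1, ..., n
    # against their product law, with the first i coordinates fixed.
    total = 0.0
    for tail in product((0, 1), repeat=n - i):
        w = 1.0
        for k, u in enumerate(tail):
            w *= weight(i + k, u)
        total += w * f(tuple(prefix) + tail)
    return total

def max_oscillation(i):
    # Largest range of v -> g_i(prefix, v) over all prefixes of length i-1.
    worst = 0.0
    for prefix in product((0, 1), repeat=i - 1):
        values = [g(i, prefix + (v,)) for v in (0, 1)]
        worst = max(worst, max(values) - min(values))
    return worst

oscillations = [max_oscillation(i) for i in range(1, n + 1)]
```

In this toy case each entry of `oscillations` should not exceed the matching entry of `c`, which is the finite analogue of $B_i-A_i\le c_i$. Because every point has positive mass here, essential ranges coincide with ordinary ranges, so no null sets need to be discarded.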


[guided]The key point is that the future coordinates are still averaged against their original product law after the first $i$ coordinates have been revealed. This is exactly where independence is used, and no identical-distribution assumption is involved. Fix $i\in\{1,\dots,n\}$. Once the first $i$ coordinates are prescribed as \begin{align*} a=(a_1,\dots,a_i)\in\prod_{j=1}^i\mathcal X_j, \end{align*} the only remaining randomness is in the future coordinates. Because $f\in L^1(\mathcal X,\mathcal A,\mu)$ and $\mu$ is a finite product probability measure, [Fubini's theorem](/theorems/2961) implies that these future-coordinate sections are integrable for almost every prefix. We choose measurable finite versions on the exceptional null set and encode the remaining average by the measurable map $g_i:\prod_{j=1}^i\mathcal X_j\to\mathbb R$ defined by \begin{align*} g_i(a_1,\dots,a_i) := \int_{\prod_{j=i+1}^n\mathcal X_j} f(a_1,\dots,a_i,u_{i+1},\dots,u_n) \,d\left(\bigotimes_{j=i+1}^n\mu_j\right)(u_{i+1},\dots,u_n). \end{align*} When $i=n$, this means simply \begin{align*} g_n(a_1,\dots,a_n)=f(a_1,\dots,a_n). \end{align*} Since the joint law is the product measure $\mu=\bigotimes_{j=1}^n\mu_j$, the product-space formula for [conditional expectation](/page/Conditional%20Expectation), again justified by Fubini's theorem, gives \begin{align*} M_i=\mathbb E_\mu[f(Y)\mid\mathcal F_i] =g_i(Y_1,\dots,Y_i) \end{align*} almost surely. Conditioning on one fewer coordinate gives \begin{align*} M_{i-1} = \int_{\mathcal X_i} g_i(Y_1,\dots,Y_{i-1},v_i)\,d\mu_i(v_i) \end{align*} almost surely. Now fix a prefix $(a_1,\dots,a_{i-1})$ outside the null set supplied by Fubini's theorem. The bounded differences hypothesis says that replacing only the $i$-th coordinate changes $f$ by at most $c_i$, uniformly over all future coordinates on the product space. 
Thus Fubini's theorem, applied to the non-negative measurable function obtained from the absolute difference of the two future-coordinate sections, gives for $\mu_i\otimes\mu_i$-almost every pair $(v_i,w_i)$ the integrated inequality \begin{align*} |g_i(a_1,\dots,a_{i-1},v_i)-g_i(a_1,\dots,a_{i-1},w_i)|\le c_i. \end{align*} This almost-everywhere pairwise estimate is exactly the statement needed for essential ranges: as the $i$-th coordinate varies with respect to $\mu_i$, the conditional value $g_i$ has essential range length at most $c_i$. To turn this fiberwise statement into conditional bounds, define $A_i:\mathcal X\to\mathbb R\cup\{-\infty\}$ to be the conditional essential infimum of $M_i$ given $\mathcal F_{i-1}$, and define $B_i:\mathcal X\to\mathbb R\cup\{\infty\}$ to be the conditional essential supremum of $M_i$ given $\mathcal F_{i-1}$. These are not arbitrary fiberwise choices: by the definition of conditional essential infimum and supremum, $A_i$ and $B_i$ are $\mathcal F_{i-1}$-measurable random variables. They satisfy \begin{align*} A_i\le M_i\le B_i \end{align*} almost surely. Why does the length bound survive this conditional formulation? Outside the Fubini null set, conditioning on $\mathcal F_{i-1}$ fixes the prefix $(Y_1,\dots,Y_{i-1})$ and leaves the $i$-th coordinate distributed according to $\mu_i$. The preceding pairwise estimate says that the resulting essential range of $g_i$ over that coordinate has length at most $c_i$. Therefore the conditional essential bounds satisfy \begin{align*} B_i-A_i\le c_i \end{align*} almost surely. Since $M_i$ is real-valued almost surely and this conditional essential range has finite length, we may use finite versions of $A_i$ and $B_i$ after altering them on a null set. 
Since \begin{align*} M_{i-1}=\mathbb E_\mu[M_i\mid\mathcal F_{i-1}], \end{align*} the conditional average of a [random variable](/page/Random%20Variable) lying between $A_i$ and $B_i$ also lies between those same $\mathcal F_{i-1}$-measurable bounds: \begin{align*} A_i\le M_{i-1}\le B_i \end{align*} almost surely. Consequently the martingale increment \begin{align*} D_i=M_i-M_{i-1} \end{align*} has conditional mean \begin{align*} \mathbb E_\mu[D_i\mid\mathcal F_{i-1}]=0 \end{align*} and, conditionally on $\mathcal F_{i-1}$, lies in the interval $[A_i-M_{i-1},B_i-M_{i-1}]$, whose length is at most $c_i$. This is the precise form of the bounded increment estimate needed for the exponential argument.[/guided]
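The two identities used in the guided argument, namely that averaging $g_i$ over the $i$-th coordinate recovers $M_{i-1}$ and that this average lies between the fiberwise bounds, can be verified exactly on a finite example. This is a sketch under assumed data: the laws `p`, the function `f` (with bounded differences $c=(0.5,1,1)$), and the helper names `mu` and `g` are hypothetical choices for illustration only.

```python
from itertools import product

# Hypothetical example: independent coordinates on {0, 1} with different laws.
p = [0.3, 0.6, 0.8]  # assumed values of mu_j({1}), 0-indexed
n = len(p)

def f(x):
    # Assumed test function with bounded differences c = (0.5, 1, 1).
    return 0.5 * x[0] + max(x[1], x[2])

def mu(j, u):
    # Mass that the j-th coordinate law (0-indexed) gives to the point u.
    return p[j] if u == 1 else 1.0 - p[j]

def g(i, prefix):
    # Average f over the coordinates after the first i, keeping the prefix fixed.
    total = 0.0
    for tail in product((0, 1), repeat=n - i):
        w = 1.0
        for k, u in enumerate(tail):
            w *= mu(i + k, u)
        total += w * f(prefix + tail)
    return total

tower_gap = 0.0     # largest |g_{i-1}(a) - sum_v mu_i(v) g_i(a, v)|
sandwich_ok = True  # whether min_v g_i <= averaged value <= max_v g_i always holds
for i in range(1, n + 1):
    for prefix in product((0, 1), repeat=i - 1):
        values = {v: g(i, prefix + (v,)) for v in (0, 1)}
        # Integrate out the i-th coordinate (index i-1) against mu_i.
        averaged = sum(mu(i - 1, v) * values[v] for v in (0, 1))
        tower_gap = max(tower_gap, abs(g(i - 1, prefix) - averaged))
        lo, hi = min(values.values()), max(values.values())
        sandwich_ok = sandwich_ok and (lo - 1e-12 <= averaged <= hi + 1e-12)
```

Here `tower_gap` should vanish up to rounding, matching $M_{i-1}=\int_{\mathcal X_i} g_i(Y_1,\dots,Y_{i-1},v_i)\,d\mu_i(v_i)$, and `sandwich_ok` mirrors $A_i\le M_{i-1}\le B_i$.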


[step:Estimate the exponential moment of the martingale sum] Define the variance proxy \begin{align*} V:=\sum_{i=1}^n c_i^2. \end{align*} We use the following elementary conditional Hoeffding estimate: if $W$ is an integrable real-valued random variable, $\mathcal G$ is a sub-$\sigma$-algebra, $\mathbb E[W\mid\mathcal G]=0$, and there are $\mathcal G$-measurable real-valued random variables $A$ and $B$ such that $A\le W\le B$ almost surely and $B-A\le c$, then for every $\lambda\in\mathbb R$, \begin{align*} \mathbb E[\exp(\lambda W)\mid\mathcal G]\le \exp\left(\frac{\lambda^2c^2}{8}\right). \end{align*} Indeed, $\mathbb E[W\mid\mathcal G]=0$ and $A\le W\le B$ imply $A\le 0\le B$ almost surely after taking conditional expectations. On the event $\{A=B\}$ we have $W=0$ almost surely and the estimate is trivial, so we work on the event $\{A<B\}$. On each conditional interval $[A,B]$, convexity of the map $x\mapsto \exp(\lambda x)$ gives the chord bound \begin{align*} \exp(\lambda W) \le \frac{B-W}{B-A}\exp(\lambda A)+\frac{W-A}{B-A}\exp(\lambda B). \end{align*} Taking conditional expectation and using $\mathbb E[W\mid\mathcal G]=0$, together with the $\mathcal G$-measurability of $A$ and $B$, gives \begin{align*} \mathbb E[\exp(\lambda W)\mid\mathcal G] \le \frac{B}{B-A}\exp(\lambda A)+\frac{-A}{B-A}\exp(\lambda B). \end{align*} Write $L:=B-A$ for the interval length and \begin{align*} \theta:=\frac{B}{B-A}\in[0,1]. \end{align*} Since $A=-(1-\theta)L$ and $B=\theta L$, it remains to prove the scalar estimate \begin{align*} \theta e^{-\lambda(1-\theta)L}+(1-\theta)e^{\lambda\theta L} \le \exp\left(\frac{\lambda^2L^2}{8}\right). \end{align*} Set $s:=\lambda L$, fix $\theta\in[0,1]$, and define $\varphi:\mathbb R\to\mathbb R$ by \begin{align*} \varphi(s):=\log\left(\theta e^{-(1-\theta)s}+(1-\theta)e^{\theta s}\right) =-(1-\theta)s+\log\left(\theta+(1-\theta)e^{s}\right). \end{align*} The standard one-variable proof of [Hoeffding's lemma](/theorems/1956) differentiates twice in $s$: one has $\varphi(0)=0$, $\varphi'(0)=0$, and \begin{align*} \varphi''(s)=\frac{\theta(1-\theta)e^{s}}{\left(\theta+(1-\theta)e^{s}\right)^2}\le\frac14 \end{align*} for every $s\in\mathbb R$, by the inequality $xy\le (x+y)^2/4$ applied to $x=\theta$ and $y=(1-\theta)e^{s}$. Applying [Taylor's theorem](/theorems/827) with remainder to $\varphi$ around $s=0$ yields \begin{align*} \varphi(s)\le \frac{s^2}{8}. 
\end{align*} Thus \begin{align*} \theta e^{-\lambda(1-\theta)L}+(1-\theta)e^{\lambda\theta L} \le \exp\left(\frac{\lambda^2L^2}{8}\right) \le \exp\left(\frac{\lambda^2c^2}{8}\right), \end{align*} and the conditional Hoeffding estimate follows. Applying this estimate to $W=D_i$, $\mathcal G=\mathcal F_{i-1}$, and $c=c_i$, we obtain \begin{align*} \mathbb E_\mu[\exp(\lambda D_i)\mid\mathcal F_{i-1}] \le \exp\left(\frac{\lambda^2c_i^2}{8}\right). \end{align*} Iterating conditional expectations from $i=n$ down to $i=1$ gives \begin{align*} \mathbb E_\mu\left[\exp\left(\lambda\sum_{i=1}^nD_i\right)\right] \le \prod_{i=1}^n \exp\left(\frac{\lambda^2c_i^2}{8}\right) = \exp\left(\frac{\lambda^2}{8}\sum_{i=1}^n c_i^2\right). \end{align*} Since $\sum_{i=1}^nD_i=f(Y)-\mathbb E_\mu[f(Y)]$, this becomes \begin{align*} \mathbb E_\mu\left[\exp\left(\lambda(f(Y)-\mathbb E_\mu[f(Y)])\right)\right] \le \exp\left(\frac{\lambda^2V}{8}\right). \end{align*} [/step]
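As a numerical sanity check of this step, the scalar two-point estimate can be evaluated on a grid, and the final moment generating function bound can be compared with an exact computation on a small product space. The sketch below assumes a hypothetical toy model (coordinate laws `p`, sample-mean `f`, $c_i=1/n$); the grid ranges and helper names are likewise illustrative choices, and a grid check is of course not a proof.

```python
import math
from itertools import product

def two_point_mgf(theta, s):
    # Left side of the scalar estimate, written with s = lambda * L.
    return theta * math.exp(-(1.0 - theta) * s) + (1.0 - theta) * math.exp(theta * s)

# Grid check of theta*e^{-(1-theta)s} + (1-theta)*e^{theta*s} <= exp(s^2/8).
worst_ratio = 0.0
for a in range(0, 101):
    theta = a / 100.0
    for b in range(-80, 81):
        s = b / 10.0
        ratio = two_point_mgf(theta, s) / math.exp(s * s / 8.0)
        worst_ratio = max(worst_ratio, ratio)

# Hypothetical toy model: independent coordinates on {0, 1}, mu_j({1}) = p[j].
p = [0.2, 0.5, 0.9]
n = len(p)
c = [1.0 / n] * n              # bounded-differences constants of the sample mean
V = sum(ci * ci for ci in c)   # variance proxy V = sum of c_i^2

def f(x):
    return sum(x) / n

def prob(x):
    # Product-measure mass of the point x.
    w = 1.0
    for j, u in enumerate(x):
        w *= p[j] if u == 1 else 1.0 - p[j]
    return w

points = list(product((0, 1), repeat=n))
mean_f = sum(prob(x) * f(x) for x in points)

def centered_mgf(lam):
    # Exact value of E[exp(lam * (f(Y) - E f(Y)))] by enumeration.
    return sum(prob(x) * math.exp(lam * (f(x) - mean_f)) for x in points)

mgf_ok = all(
    centered_mgf(lam) <= math.exp(lam * lam * V / 8.0) + 1e-12
    for lam in (-10.0, -3.0, -1.0, 0.0, 0.5, 2.0, 10.0)
)
```

The ratio equals $1$ at $s=0$ and should never exceed $1$ elsewhere on the grid, while `mgf_ok` confirms the bound $\exp(\lambda^2V/8)$ for the sampled values of $\lambda$ in this toy model.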
