McDiarmid's Bounded Differences Inequality — Statement & Proof

McDiarmid's Bounded Differences Inequality (Theorem # 6072)

Theorem

Proof

[proofplan] We expose the independent variables one at a time and form the Doob martingale of $Z$ with respect to the filtration generated by the revealed coordinates. The bounded differences hypothesis implies that, after the first $k-1$ coordinates are fixed, changing only the $k$-th coordinate changes the [conditional expectation](/page/Conditional%20Expectation) of $Z$ by at most $c_k$. Thus the $k$-th martingale increment is almost surely bounded by $c_k$. The desired tail estimate then follows from the [Azuma-Hoeffding inequality](/theorems/6071) for martingales with bounded increments. [/proofplan] [step:Build the coordinate filtration and the Doob martingale] For each $k\in\{0,1,\dots,n\}$, define the sub-$\sigma$-algebra \begin{align*} \mathcal F_k:=\sigma(Y_1,\dots,Y_k), \end{align*} with $\mathcal F_0:=\{\varnothing,\Omega\}$. Define the process $(M_k)_{k=0}^n$ by \begin{align*} M_k:=\mathbb E[Z\mid \mathcal F_k]. \end{align*} Since $Z\in L^1(\Omega,\mathcal F,\mathbb P)$, each conditional expectation $M_k$ is integrable and $\mathcal F_k$-measurable. Since $(\mathcal F_k)_{k=0}^n$ is increasing, the tower property gives, for $0\le k\le n-1$, \begin{align*} \mathbb E[M_{k+1}\mid \mathcal F_k]=\mathbb E[\mathbb E[Z\mid \mathcal F_{k+1}]\mid \mathcal F_k]=\mathbb E[Z\mid \mathcal F_k]=M_k. \end{align*} Thus $(M_k,\mathcal F_k)_{k=0}^n$ is a martingale. Moreover $M_0=\mathbb E[Z]$ and $M_n=Z$, because $Z=f(Y_1,\dots,Y_n)$ is $\mathcal F_n$-measurable. [/step] [step:Represent each conditional expectation by integrating over unrevealed coordinates] For each $j\in\{1,\dots,n\}$, let $\mu_j:=\mathbb P\circ Y_j^{-1}$ denote the law of $Y_j$ on $(E_j,\mathcal E_j)$. Independence gives that the law of $(Y_1,\dots,Y_n)$ is the product measure \begin{align*} \mu:=\mu_1\otimes\cdots\otimes\mu_n \end{align*} on $E_1\times\cdots\times E_n$. 
For each $k\in\{0,1,\dots,n\}$, define an extended-real measurable partial integral $\tilde h_k:E_1\times\cdots\times E_k\to[-\infty,\infty]$ by \begin{align*} \tilde h_k(y_1,\dots,y_k)=\int_{E_{k+1}\times\cdots\times E_n} f(y_1,\dots,y_k,u_{k+1},\dots,u_n)\, d(\mu_{k+1}\otimes\cdots\otimes\mu_n)(u_{k+1},\dots,u_n). \end{align*} Since $Z\in L^1$ and the joint law is $\mu$, [Fubini's theorem](/theorems/2961) gives a measurable set $A_k\subset E_1\times\cdots\times E_k$ with $(\mu_1\otimes\cdots\otimes\mu_k)(A_k)=1$ such that $\tilde h_k$ is finite on $A_k$. Define the real-valued measurable version $h_k:E_1\times\cdots\times E_k\to\mathbb R$ by $h_k=\tilde h_k$ on $A_k$ and $h_k=0$ on $(E_1\times\cdots\times E_k)\setminus A_k$. For $k=n$ we take $h_n=f$ with the original everywhere-defined function from the theorem statement, and for $k=0$ this means \begin{align*} h_0=\int_{E_1\times\cdots\times E_n} f(u_1,\dots,u_n)\, d\mu(u_1,\dots,u_n)=\mathbb E[Z]. \end{align*} We claim that, after choosing these versions, \begin{align*} M_k=h_k(Y_1,\dots,Y_k) \end{align*} almost surely for each $k$. Indeed, for every bounded $\mathcal F_k$-measurable [random variable](/page/Random%20Variable) $G:\Omega\to\mathbb R$, there exists a bounded measurable map $\varphi:E_1\times\cdots\times E_k\to\mathbb R$ such that $G=\varphi(Y_1,\dots,Y_k)$ almost surely. Using independence and the product-measure representation of the joint law, Fubini's theorem applies to the integrable function $\varphi f$. First, \begin{align*} \mathbb E[GZ]=\int_{E_1\times\cdots\times E_n}\varphi(y_1,\dots,y_k)f(y_1,\dots,y_n)\,d\mu(y_1,\dots,y_n). \end{align*} Fubini's theorem over the unrevealed coordinates gives \begin{align*} \mathbb E[GZ]=\int_{E_1\times\cdots\times E_k}\varphi(y_1,\dots,y_k)h_k(y_1,\dots,y_k)\,d(\mu_1\otimes\cdots\otimes\mu_k)(y_1,\dots,y_k). \end{align*} By the law of $(Y_1,\dots,Y_k)$, this last integral is \begin{align*} \mathbb E\left[G\,h_k(Y_1,\dots,Y_k)\right]. 
\end{align*} This is precisely the defining property of $\mathbb E[Z\mid\mathcal F_k]$. [guided] The purpose of this step is to make the conditional expectations concrete. A conditional expectation with respect to the first $k$ coordinates should be obtained by freezing those coordinates and averaging over the remaining independent coordinates. For each $j\in\{1,\dots,n\}$, define the law of $Y_j$ by \begin{align*} \mu_j:=\mathbb P\circ Y_j^{-1}. \end{align*} This is a probability measure on $(E_j,\mathcal E_j)$. Since $Y_1,\dots,Y_n$ are independent, the joint law of $(Y_1,\dots,Y_n)$ is \begin{align*} \mu:=\mu_1\otimes\cdots\otimes\mu_n. \end{align*} Now fix $k\in\{0,1,\dots,n\}$. First define the extended-real partial integral $\tilde h_k:E_1\times\cdots\times E_k\to[-\infty,\infty]$ by \begin{align*} \tilde h_k(y_1,\dots,y_k)=\int_{E_{k+1}\times\cdots\times E_n} f(y_1,\dots,y_k,u_{k+1},\dots,u_n)\, d(\mu_{k+1}\otimes\cdots\otimes\mu_n)(u_{k+1},\dots,u_n). \end{align*} The hypothesis $Z\in L^1(\Omega,\mathcal F,\mathbb P)$ means that $f$ is integrable with respect to the joint law $\mu$. Therefore Fubini's theorem gives a full-measure measurable set $A_k\subset E_1\times\cdots\times E_k$ on which $\tilde h_k$ is finite. We define the real-valued measurable version $h_k:E_1\times\cdots\times E_k\to\mathbb R$ by $h_k=\tilde h_k$ on $A_k$ and $h_k=0$ outside $A_k$. When $k=n$, no variables remain to average over, so $h_n=f$ with the original everywhere-defined function from the theorem statement. When $k=0$, no coordinates have been revealed, and the definition becomes \begin{align*} h_0=\int_{E_1\times\cdots\times E_n} f(u_1,\dots,u_n)\, d\mu(u_1,\dots,u_n)=\mathbb E[Z]. \end{align*} We verify that $h_k(Y_1,\dots,Y_k)$ is a version of $\mathbb E[Z\mid\mathcal F_k]$. Let $G:\Omega\to\mathbb R$ be any bounded $\mathcal F_k$-measurable random variable. 
Since $\mathcal F_k=\sigma(Y_1,\dots,Y_k)$, there is a bounded measurable map $\varphi:E_1\times\cdots\times E_k\to\mathbb R$ such that $G=\varphi(Y_1,\dots,Y_k)$ almost surely. The product $\varphi f$ is integrable because $\varphi$ is bounded and $f\in L^1(E_1\times\cdots\times E_n,\mu)$. Since $(Y_1,\dots,Y_n)$ has law $\mu$, the change-of-variables formula gives \begin{align*} \mathbb E[GZ]=\int_{E_1\times\cdots\times E_n}\varphi(y_1,\dots,y_k)f(y_1,\dots,y_n)\,d\mu(y_1,\dots,y_n). \end{align*} By Fubini's theorem, averaging first over the unrevealed coordinates and using that $h_k=\tilde h_k$ on the full-measure set $A_k$ yields \begin{align*} \mathbb E[GZ]=\int_{E_1\times\cdots\times E_k}\varphi(y_1,\dots,y_k)h_k(y_1,\dots,y_k)\,d(\mu_1\otimes\cdots\otimes\mu_k)(y_1,\dots,y_k). \end{align*} Since $(Y_1,\dots,Y_k)$ has law $\mu_1\otimes\cdots\otimes\mu_k$, the last integral is \begin{align*} \mathbb E\left[G\,h_k(Y_1,\dots,Y_k)\right]. \end{align*} Because $h_k(Y_1,\dots,Y_k)$ is integrable and $\mathcal F_k$-measurable, this identity for every bounded $\mathcal F_k$-measurable $G$ is the defining property of conditional expectation. Hence \begin{align*} M_k=\mathbb E[Z\mid\mathcal F_k]=h_k(Y_1,\dots,Y_k) \end{align*} almost surely. [/guided] [/step] [step:Bound each martingale increment by the corresponding coordinate oscillation] Fix $k\in\{1,\dots,n\}$. Let $\nu_{k-1}:=\mu_1\otimes\cdots\otimes\mu_{k-1}$, with $\nu_0$ the unit measure on a one-point space, and let $\lambda_k:=\mu_{k+1}\otimes\cdots\otimes\mu_n$, with $\lambda_n$ the unit measure on a one-point space. 
By Fubini's theorem applied to the integrable function $f$, there is a measurable set $B_{k-1}\subset E_1\times\cdots\times E_{k-1}$ with $\nu_{k-1}(B_{k-1})=1$ such that for each prefix $a=(a_1,\dots,a_{k-1})\in B_{k-1}$ the extended-real measurable map $\tilde g_a:E_k\to[-\infty,\infty]$ defined by \begin{align*} \tilde g_a(y):=\int_{E_{k+1}\times\cdots\times E_n}f(a_1,\dots,a_{k-1},y,u_{k+1},\dots,u_n)\,d\lambda_k(u_{k+1},\dots,u_n) \end{align*} is finite for $\mu_k$-almost every $y\in E_k$, agrees with $h_k(a_1,\dots,a_{k-1},y)$ for $\mu_k$-almost every $y\in E_k$, and satisfies \begin{align*} h_{k-1}(a)=\int_{E_k}\tilde g_a(y)\,d\mu_k(y). \end{align*} For $a\in B_{k-1}$, let $C_a\subset E_k$ be a measurable set with $\mu_k(C_a)=1$ on which $\tilde g_a$ is finite and agrees with $h_k(a_1,\dots,a_{k-1},\cdot)$. For any $y,y'\in C_a$, both values are finite, so linearity of the integral over the tail variables gives \begin{align*} |\tilde g_a(y)-\tilde g_a(y')|=\left|\int_{E_{k+1}\times\cdots\times E_n} [f(a_1,\dots,a_{k-1},y,u_{k+1},\dots,u_n)-f(a_1,\dots,a_{k-1},y',u_{k+1},\dots,u_n)]\, d\lambda_k(u_{k+1},\dots,u_n)\right|. \end{align*} Taking the absolute value inside the integral and using the bounded differences hypothesis for the $k$-th coordinate gives \begin{align*} |\tilde g_a(y)-\tilde g_a(y')|\le \int_{E_{k+1}\times\cdots\times E_n} c_k\, d\lambda_k(u_{k+1},\dots,u_n)=c_k. \end{align*} Thus the $\mu_k$-essential range of $\tilde g_a$ has diameter at most $c_k$. Moreover, since $\mu_k$ is a probability measure and $C_a$ has full $\mu_k$-measure, for every $y\in C_a$, \begin{align*} \left|\tilde g_a(y)-\int_{E_k}\tilde g_a(y')\,d\mu_k(y')\right|\le \int_{E_k}|\tilde g_a(y)-\tilde g_a(y')|\,d\mu_k(y')\le c_k. 
\end{align*} Since $Y_k$ is independent of $\mathcal F_{k-1}$, the previous representation gives \begin{align*} M_{k-1}=\int_{E_k}\tilde g_{(Y_1,\dots,Y_{k-1})}(y)\,d\mu_k(y) \end{align*} almost surely, while \begin{align*} M_k=\tilde g_{(Y_1,\dots,Y_{k-1})}(Y_k) \end{align*} almost surely on the event where $(Y_1,\dots,Y_{k-1})\in B_{k-1}$ and $Y_k\in C_{(Y_1,\dots,Y_{k-1})}$. This event has probability one by independence and Fubini's theorem. Therefore \begin{align*} |M_k-M_{k-1}|\le c_k \end{align*} almost surely. [guided] The delicate point is that conditional expectations are only defined up to null sets, so we must not argue using the ordinary pointwise range of an arbitrarily modified version of $h_k$. We instead prove an essential-range statement on the full-measure set where the partial-integral formula is valid. Fix $k\in\{1,\dots,n\}$. Define $\nu_{k-1}:=\mu_1\otimes\cdots\otimes\mu_{k-1}$, with $\nu_0$ the unit measure on a one-point space, and define $\lambda_k:=\mu_{k+1}\otimes\cdots\otimes\mu_n$, with $\lambda_n$ the unit measure on a one-point space. Fubini's theorem applies because $f$ is integrable with respect to $\mu_1\otimes\cdots\otimes\mu_n$. Hence there is a measurable set $B_{k-1}\subset E_1\times\cdots\times E_{k-1}$ with $\nu_{k-1}(B_{k-1})=1$ such that, for each $a=(a_1,\dots,a_{k-1})\in B_{k-1}$, the extended-real measurable map $\tilde g_a:E_k\to[-\infty,\infty]$ defined by \begin{align*} \tilde g_a(y):=\int_{E_{k+1}\times\cdots\times E_n}f(a_1,\dots,a_{k-1},y,u_{k+1},\dots,u_n)\,d\lambda_k(u_{k+1},\dots,u_n) \end{align*} is finite for $\mu_k$-almost every $y$, agrees with $h_k(a_1,\dots,a_{k-1},y)$ for $\mu_k$-almost every $y$, and satisfies \begin{align*} h_{k-1}(a)=\int_{E_k}\tilde g_a(y)\,d\mu_k(y). \end{align*} Choose a measurable full-measure set $C_a\subset E_k$ on which these identities hold. For $y,y'\in C_a$, the two points differ only in the $k$-th coordinate, while the tail variables are the same. 
The bounded differences hypothesis gives a pointwise bound by $c_k$ inside the tail integral, so \begin{align*} |\tilde g_a(y)-\tilde g_a(y')|\le \int_{E_{k+1}\times\cdots\times E_n}c_k\,d\lambda_k(u_{k+1},\dots,u_n)=c_k. \end{align*} Now fix $y\in C_a$ and average this inequality over $y'\in E_k$. Since $C_a$ has full $\mu_k$-measure, \begin{align*} \left|\tilde g_a(y)-\int_{E_k}\tilde g_a(y')\,d\mu_k(y')\right|\le \int_{E_k}|\tilde g_a(y)-\tilde g_a(y')|\,d\mu_k(y')\le c_k. \end{align*} Finally take $a=(Y_1,\dots,Y_{k-1})$. The event $a\in B_{k-1}$ has probability one, and by independence plus Fubini the event $Y_k\in C_a$ also has probability one. On this event, \begin{align*} M_k=\tilde g_a(Y_k) \end{align*} and \begin{align*} M_{k-1}=\int_{E_k}\tilde g_a(y)\,d\mu_k(y). \end{align*} Therefore $|M_k-M_{k-1}|\le c_k$ almost surely. [/guided] [/step] [step:Apply the martingale bounded differences inequality] The martingale $(M_k,\mathcal F_k)_{k=0}^n$ is a finite real-valued martingale on the probability space $(\Omega,\mathcal F,\mathbb P)$. It is integrable, satisfies $M_0=\mathbb E[Z]$ and $M_n=Z$, and has deterministic increment bounds \begin{align*} |M_k-M_{k-1}|\le c_k \end{align*} almost surely for every $k\in\{1,\dots,n\}$. These are exactly the hypotheses of the [Azuma-Hoeffding Inequality](/theorems/6071) for a finite martingale with deterministic increment bounds. Applying that inequality gives, for every $t\ge 0$, \begin{align*} \mathbb P(M_n-M_0\ge t)\le\exp\left(-\frac{t^2}{2\sum_{k=1}^n c_k^2}\right). \end{align*} Substituting $M_n=Z$ and $M_0=\mathbb E[Z]$ yields \begin{align*} \mathbb P\left(Z-\mathbb E[Z]\ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{k=1}^n c_k^2}\right). \end{align*} If $\sum_{k=1}^n c_k^2=0$, then $c_k=0$ for every $k$, so the increment bound gives $M_k=M_{k-1}$ almost surely for every $k$. Hence $Z=M_n=M_0=\mathbb E[Z]$ almost surely, and the stated convention for the right-hand side gives the same conclusion. 
This completes the proof. [/step]
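
The argument above can be checked numerically on a small finite example. The following is a minimal sketch, not part of the proof: it assumes each $Y_j$ is a fair bit, so every $\mu_j$ is uniform on $\{0,1\}$, and it uses a hypothetical test function `count_descents` chosen only for illustration. All function names are assumptions introduced here. The partial integrals $h_k$ become finite averages, the constants $c_k$ are found by brute force, and the exact upper tail of $Z$ is compared with $\exp\left(-t^2/(2\sum_k c_k^2)\right)$.

```python
from itertools import product
from math import exp


def partial_average(f, prefix, n):
    """h_k(prefix): average of f over all completions of the unrevealed fair bits."""
    k = len(prefix)
    tails = list(product((0, 1), repeat=n - k))
    return sum(f(prefix + tail) for tail in tails) / len(tails)


def coordinate_oscillations(f, n):
    """c_k: largest change in f when only coordinate k is flipped (brute force)."""
    c = [0] * n
    for y in product((0, 1), repeat=n):
        for k in range(n):
            flipped = y[:k] + (1 - y[k],) + y[k + 1:]
            c[k] = max(c[k], abs(f(y) - f(flipped)))
    return c


def max_increments(f, n):
    """Largest |h_k(a, y) - h_{k-1}(a)| over all prefixes a and bits y, for k = 1..n."""
    worst = []
    for k in range(1, n + 1):
        w = 0.0
        for a in product((0, 1), repeat=k - 1):
            base = partial_average(f, a, n)
            for y in (0, 1):
                w = max(w, abs(partial_average(f, a + (y,), n) - base))
        worst.append(w)
    return worst


def exact_upper_tail(f, n, t):
    """P(Z - E[Z] >= t) for Z = f(Y) with Y uniform on {0,1}^n, by enumeration."""
    mean = partial_average(f, (), n)
    hits = sum(1 for y in product((0, 1), repeat=n) if f(y) - mean >= t)
    return hits / 2 ** n


def azuma_bound(t, c):
    """exp(-t^2 / (2 * sum c_k^2)); assumes the sum of squares is positive."""
    return exp(-t ** 2 / (2 * sum(ck ** 2 for ck in c)))


def count_descents(y):
    """Hypothetical test function: number of positions where a 1 is followed by a 0."""
    return sum(1 for i in range(len(y) - 1) if y[i] == 1 and y[i + 1] == 0)


n = 8
c = coordinate_oscillations(count_descents, n)
increments = max_increments(count_descents, n)

for k, (inc, ck) in enumerate(zip(increments, c), start=1):
    print(f"k={k}: max |M_k - M_(k-1)| = {inc:.4f}, c_k = {ck}")

for t in (0.5, 1.0, 2.0, 3.0):
    tail = exact_upper_tail(count_descents, n, t)
    print(f"t={t}: exact tail = {tail:.4f}, bound = {azuma_bound(t, c):.4f}")
```

The enumeration costs on the order of $n2^n$ evaluations of the test function, so this sketch is only practical for small $n$. In this discrete setting every null set is empty, so the almost-sure increment bound can be checked pointwise.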

