Expected Rademacher Uniform Deviation Bound — Statement & Proof

Expected Rademacher Uniform Deviation Bound (Theorem # 9827)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the first one-sided deviation bound by applying one-sided symmetrisation to the expected supremum and then applying [McDiarmid's bounded differences inequality](/theorems/6072) to the sample-dependent supremum. Countability of $\mathcal F$ makes the relevant suprema measurable, and boundedness of the functions gives the bounded-differences constant $(b-a)/n$ in each coordinate. The second bound follows by applying the same argument to the reflected class $-\mathcal F$ and using Rademacher symmetry to identify its expected Rademacher complexity with $\mathfrak R_n(\mathcal F)$. [/proofplan] [step:Verify measurability and finiteness of the upper deviation supremum] Define the product measurable space \begin{align*} (S^n,\mathcal E^{\otimes n}) \end{align*} and define the sample map $X_{1:n}:(\Omega,\mathcal A)\to(S^n,\mathcal E^{\otimes n})$ by \begin{align*} X_{1:n}(\omega):=(X_1(\omega),\dots,X_n(\omega)). \end{align*} For each $f\in\mathcal F$, define \begin{align*} T_f:S^n\to\mathbb R,\qquad (x_1,\dots,x_n)\mapsto P f-\frac{1}{n}\sum_{i=1}^{n}f(x_i). \end{align*} Since $f$ is $\mathcal E$-measurable and bounded, $P f$ is finite and $T_f$ is $\mathcal E^{\otimes n}$-measurable. Because $\mathcal F$ is countable, the map $z:S^n\to\mathbb R$ defined by \begin{align*} z(x_1,\dots,x_n):=\sup_{f\in\mathcal F}T_f(x_1,\dots,x_n) \end{align*} is measurable as the pointwise supremum of a countable family of measurable real-valued maps. Moreover, for every $(x_1,\dots,x_n)\in S^n$ and every $f\in\mathcal F$, \begin{align*} a\le P f\le b \end{align*} and \begin{align*} a\le \frac{1}{n}\sum_{i=1}^{n}f(x_i)\le b, \end{align*} so \begin{align*} -(b-a)\le T_f(x_1,\dots,x_n)\le b-a. \end{align*} Thus $z$ is finite and bounded. Finally, \begin{align*} Z:=z(X_{1:n})=\sup_{f\in\mathcal F}(P f-P_n f) \end{align*} is a bounded real-valued [random variable](/page/Random%20Variable). [guided] We first isolate the random quantity to which concentration will be applied. The concentration inequality is a statement about a [measurable function](/page/Measurable%20Function) of independent inputs, so we must define that deterministic function and check measurability. Let $X_{1:n}:(\Omega,\mathcal A)\to(S^n,\mathcal E^{\otimes n})$ be the vector of observations, defined by \begin{align*} X_{1:n}(\omega):=(X_1(\omega),\dots,X_n(\omega)). \end{align*} For each $f\in\mathcal F$, define $T_f:S^n\to\mathbb R$ by \begin{align*} T_f(x_1,\dots,x_n):=P f-\frac{1}{n}\sum_{i=1}^{n}f(x_i). \end{align*} The integral $P f=\int_S f(x)\,dP(x)$ is finite because $a\le f\le b$. The map $(x_1,\dots,x_n)\mapsto f(x_i)$ is $\mathcal E^{\otimes n}$-measurable for each coordinate $i$, hence $T_f$ is measurable as a finite linear combination of measurable real-valued functions plus the constant $P f$. Now define $z:S^n\to\mathbb R$ by \begin{align*} z(x_1,\dots,x_n):=\sup_{f\in\mathcal F}T_f(x_1,\dots,x_n). \end{align*} The countability of $\mathcal F$ is used exactly here: a countable supremum of measurable real-valued functions is measurable. Without countability, this step would require an outer-probability or separability convention. The function $z$ is also bounded. Since each $f$ takes values in $[a,b]$, the integral average $P f$ lies in $[a,b]$, and every empirical average $\frac{1}{n}\sum_{i=1}^{n}f(x_i)$ also lies in $[a,b]$. Therefore, for every $f\in\mathcal F$ and every sample point $(x_1,\dots,x_n)\in S^n$, \begin{align*} -(b-a)\le P f-\frac{1}{n}\sum_{i=1}^{n}f(x_i)\le b-a. \end{align*} Taking the supremum over $f\in\mathcal F$ preserves finiteness, so $z$ is a bounded measurable map. Consequently \begin{align*} Z:=z(X_{1:n})=\sup_{f\in\mathcal F}(P f-P_n f) \end{align*} is a bounded real-valued random variable. [/guided] [/step] [step:Bound the expectation by one-sided symmetrisation] The class $\mathcal F$ is nonempty and countable, and each $f\in\mathcal F$ is measurable and $P$-integrable because $a\le f\le b$. Hence the hypotheses of [citetheorem:9824] apply to $\mathcal F$. Therefore \begin{align*} \mathbb E[Z]\le 2\mathbb E\left[\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)\right]. \end{align*} By the definition of $\mathfrak R_n(\mathcal F)$, this gives \begin{align*} \mathbb E[Z]\le 2\mathfrak R_n(\mathcal F). \end{align*} [/step] [step:Check the bounded differences constants for the upper deviation] Let $j\in\{1,\dots,n\}$. Let $x=(x_1,\dots,x_n)\in S^n$ and $x'=(x_1',\dots,x_n')\in S^n$ satisfy $x_i=x_i'$ for every $i\ne j$. For each $f\in\mathcal F$, \begin{align*} T_f(x)-T_f(x')=-\frac{1}{n}\bigl(f(x_j)-f(x_j')\bigr). \end{align*} Since $f(S)\subset[a,b]$, we have \begin{align*} |T_f(x)-T_f(x')|\le \frac{b-a}{n}. \end{align*} Taking suprema preserves this Lipschitz bound in the following sense: for all $f\in\mathcal F$, \begin{align*} T_f(x)\le T_f(x')+\frac{b-a}{n}\le z(x')+\frac{b-a}{n}. \end{align*} Taking the supremum over $f\in\mathcal F$ gives \begin{align*} z(x)\le z(x')+\frac{b-a}{n}. \end{align*} Interchanging $x$ and $x'$ gives \begin{align*} z(x')\le z(x)+\frac{b-a}{n}. \end{align*} Thus \begin{align*} |z(x)-z(x')|\le \frac{b-a}{n}. \end{align*} So $z$ has bounded differences with coordinate constants \begin{align*} c_j:=\frac{b-a}{n} \end{align*} for $j=1,\dots,n$. [/step] [step:Apply McDiarmid's inequality to obtain the upper deviation bound] The random variables $X_1,\dots,X_n$ are independent, and the measurable function $z:S^n\to\mathbb R$ has bounded differences constants $c_j=(b-a)/n$. By McDiarmid's bounded differences inequality, for every $t>0$, \begin{align*} \mathbb P\bigl(Z-\mathbb E[Z]\ge t\bigr)\le \exp\left(-\frac{2t^2}{\sum_{j=1}^{n}c_j^2}\right). \end{align*} Since \begin{align*} \sum_{j=1}^{n}c_j^2=n\left(\frac{b-a}{n}\right)^2=\frac{(b-a)^2}{n}, \end{align*} we obtain \begin{align*} \mathbb P\bigl(Z-\mathbb E[Z]\ge t\bigr)\le \exp\left(-\frac{2nt^2}{(b-a)^2}\right). \end{align*} Set \begin{align*} t_\delta:=(b-a)\sqrt{\frac{\log(1/\delta)}{2n}}. \end{align*} Because $\delta\in(0,1)$, $t_\delta>0$, and substitution gives \begin{align*} \mathbb P\bigl(Z-\mathbb E[Z]\ge t_\delta\bigr)\le \delta. \end{align*} Therefore, with probability at least $1-\delta$, \begin{align*} Z\le \mathbb E[Z]+t_\delta. \end{align*} Combining this with $\mathbb E[Z]\le 2\mathfrak R_n(\mathcal F)$ yields \begin{align*} \sup_{f\in\mathcal F}(P f-P_n f)\le 2\mathfrak R_n(\mathcal F)+(b-a)\sqrt{\frac{\log(1/\delta)}{2n}}. \end{align*} [guided] We now convert the expectation estimate into a high-probability estimate. The correct concentration tool is McDiarmid's bounded differences inequality, because $Z$ is a function of the independent inputs $X_1,\dots,X_n$ and changing one input changes every empirical average by at most $(b-a)/n$. The hypotheses required for McDiarmid's inequality are satisfied. First, $X_1,\dots,X_n$ are independent by assumption. Second, the map $z:S^n\to\mathbb R$ is measurable by the first step. Third, the previous step proved that if two samples differ only in coordinate $j$, then \begin{align*} |z(x)-z(x')|\le \frac{b-a}{n}. \end{align*} Thus the bounded differences constants are \begin{align*} c_j:=\frac{b-a}{n} \end{align*} for $j=1,\dots,n$. McDiarmid's bounded differences inequality gives, for every $t>0$, \begin{align*} \mathbb P\bigl(Z-\mathbb E[Z]\ge t\bigr)\le \exp\left(-\frac{2t^2}{\sum_{j=1}^{n}c_j^2}\right). \end{align*} The denominator is explicit: \begin{align*} \sum_{j=1}^{n}c_j^2=n\left(\frac{b-a}{n}\right)^2=\frac{(b-a)^2}{n}. \end{align*} Therefore \begin{align*} \mathbb P\bigl(Z-\mathbb E[Z]\ge t\bigr)\le \exp\left(-\frac{2nt^2}{(b-a)^2}\right). \end{align*} To make the right-hand side equal to $\delta$, choose \begin{align*} t_\delta:=(b-a)\sqrt{\frac{\log(1/\delta)}{2n}}. \end{align*} Since $\delta\in(0,1)$, $\log(1/\delta)>0$, so $t_\delta>0$. Substituting this value gives \begin{align*} \mathbb P\bigl(Z-\mathbb E[Z]\ge t_\delta\bigr)\le \delta. \end{align*} Equivalently, with probability at least $1-\delta$, \begin{align*} Z\le \mathbb E[Z]+t_\delta. \end{align*} Finally, the symmetrisation step gave \begin{align*} \mathbb E[Z]\le 2\mathfrak R_n(\mathcal F). \end{align*} Since $Z=\sup_{f\in\mathcal F}(P f-P_n f)$, we conclude \begin{align*} \sup_{f\in\mathcal F}(P f-P_n f)\le 2\mathfrak R_n(\mathcal F)+(b-a)\sqrt{\frac{\log(1/\delta)}{2n}}. \end{align*} [/guided] [/step] [step:Reflect the class to prove the lower deviation bound] Define the reflected class \begin{align*} -\mathcal F:=\{-f:f\in\mathcal F\}. \end{align*} Each $g\in-\mathcal F$ is a measurable function $g:S\to[-b,-a]$, and $-\mathcal F$ is nonempty and countable. Applying the already proved upper deviation bound to $-\mathcal F$ gives, with probability at least $1-\delta$, \begin{align*} \sup_{g\in-\mathcal F}(P g-P_n g)\le 2\mathfrak R_n(-\mathcal F)+(b-a)\sqrt{\frac{\log(1/\delta)}{2n}}. \end{align*} For $g=-f$, \begin{align*} P g-P_n g=P(-f)-P_n(-f)=P_n f-P f. \end{align*} Hence \begin{align*} \sup_{g\in-\mathcal F}(P g-P_n g)=\sup_{f\in\mathcal F}(P_n f-P f). \end{align*} It remains to identify the Rademacher complexities. Conditional on $X_1,\dots,X_n$, the vector $(-\varepsilon_1,\dots,-\varepsilon_n)$ has the same distribution as $(\varepsilon_1,\dots,\varepsilon_n)$. Therefore \begin{align*} \mathfrak R_n(-\mathcal F)=\mathbb E\left[\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(-f)(X_i)\right]. \end{align*} By Rademacher symmetry, \begin{align*} \mathfrak R_n(-\mathcal F)=\mathbb E\left[\sup_{f\in\mathcal F}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i f(X_i)\right]=\mathfrak R_n(\mathcal F). \end{align*} Substitution yields, with probability at least $1-\delta$, \begin{align*} \sup_{f\in\mathcal F}(P_n f-P f)\le 2\mathfrak R_n(\mathcal F)+(b-a)\sqrt{\frac{\log(1/\delta)}{2n}}. \end{align*} This proves both asserted inequalities. [/step]

Prerequisites (0/4 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

Distribution Definition Supremum Definition McDiarmid's Bounded Differences Inequality Theorem #6072 McDiarmid's Bounded Differences Inequality for Independent Non-Identically Distributed Inputs Theorem #6075 Hedge Nonidentifiability Criterion Probability & Statistics Time Inversion of Brownian Motion Brownian Motion Asymptotic Normality of Z-Estimators Probability & Statistics Stable Embedding Lower Bound for Sparse Vectors Probability & Statistics Cantelli Inequality Probability & Statistics Gaussian Prediction Interval Probability & Statistics Gaussian Log-Sobolev Concentration Probability & Statistics Doob's $L^p$ Inequality Martingale Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Expected Rademacher Uniform Deviation Bound (Theorem # 9827)

Discussion

Proof

Prerequisites (0/4 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Expected Rademacher Uniform Deviation Bound (Theorem # 9827)

Discussion

Proof

Prerequisites (0/4 completed)

Prerequisites Graph

Explore Further