Massart Finite-Class Lemma — Statement & Proof

Massart Finite-Class Lemma (Theorem # 9826)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first prove the finite-vector inequality by bounding the logarithmic [moment generating function](/page/Moment%20Generating%20Function) of the maximum of the signed sums. [Jensen's inequality](/theorems/9) converts the expectation of the supremum into a logarithm of an exponential moment, the maximum is bounded by a sum over the finite set $T$, and independence of the Rademacher variables factors each exponential moment. The elementary inequality $\cosh u\le e^{u^2/2}$ gives the sub-Gaussian bound, after which optimizing over the parameter $\lambda>0$ gives the stated constant. The function-class statement follows by applying the vector result to the finite set of evaluation vectors $(f(x_1),\dots,f(x_n))$ and dividing by $n$. [/proofplan] [step:Reduce the expected supremum to an exponential moment] For each $t=(t_1,\dots,t_n)\in T$, define the real-valued [random variable](/page/Random%20Variable) \begin{align*} X_t:\Omega_\varepsilon\to\mathbb R,\qquad X_t(\omega):=\sum_{i=1}^{n}\varepsilon_i(\omega)t_i. \end{align*} Since $T$ is finite, the random variable \begin{align*} Y:\Omega_\varepsilon\to\mathbb R,\qquad Y(\omega):=\sup_{t\in T}X_t(\omega) \end{align*} is measurable and integrable. Fix $\lambda>0$. Since the logarithm is concave and $\exp(\lambda Y)>0$, [Jensen's inequality](/theorems/1977) for the probability measure $\mathbb P_\varepsilon$ gives \begin{align*} \lambda\mathbb E_\varepsilon[Y]=\mathbb E_\varepsilon[\log(\exp(\lambda Y))]\le \log\mathbb E_\varepsilon[\exp(\lambda Y)]. \end{align*} Thus \begin{align*} \mathbb E_\varepsilon[Y]\le \frac{1}{\lambda}\log\mathbb E_\varepsilon[\exp(\lambda Y)]. \end{align*} [guided] The goal is to estimate the expectation of a maximum. Exponential moments are useful here because exponentials turn maxima into quantities controlled by sums. For each vector $t=(t_1,\dots,t_n)\in T$, define \begin{align*} X_t:\Omega_\varepsilon\to\mathbb R,\qquad X_t(\omega):=\sum_{i=1}^{n}\varepsilon_i(\omega)t_i. \end{align*} The random variable whose expectation we need to bound is \begin{align*} Y:\Omega_\varepsilon\to\mathbb R,\qquad Y(\omega):=\sup_{t\in T}X_t(\omega). \end{align*} Because $T$ is finite and each $X_t$ is a finite linear combination of measurable Rademacher variables, $Y$ is the maximum of finitely many measurable real-valued random variables. Hence $Y$ is measurable. It is also integrable because \begin{align*} |Y|\le \sup_{t\in T}\sum_{i=1}^{n}|t_i| \end{align*} and the right-hand side is a finite deterministic constant. Fix $\lambda>0$. We apply Jensen's inequality to the concave function $\log:(0,\infty)\to\mathbb R$ and the positive random variable $\exp(\lambda Y)$. This gives \begin{align*} \mathbb E_\varepsilon[\log(\exp(\lambda Y))]\le \log\mathbb E_\varepsilon[\exp(\lambda Y)]. \end{align*} The left-hand side simplifies because $\log(\exp(\lambda Y))=\lambda Y$, so \begin{align*} \lambda\mathbb E_\varepsilon[Y]\le \log\mathbb E_\varepsilon[\exp(\lambda Y)]. \end{align*} Dividing by $\lambda>0$ yields \begin{align*} \mathbb E_\varepsilon[Y]\le \frac{1}{\lambda}\log\mathbb E_\varepsilon[\exp(\lambda Y)]. \end{align*} This is the point of introducing $\lambda$: it creates a bound with two competing terms, one decreasing in $\lambda$ and one increasing in $\lambda$, which can later be optimized. [/guided] [/step] [step:Bound the exponential moment by summing over $T$] For each $\omega\in\Omega_\varepsilon$, \begin{align*} \exp(\lambda Y(\omega))=\exp\left(\lambda\sup_{t\in T}X_t(\omega)\right)=\sup_{t\in T}\exp(\lambda X_t(\omega)). \end{align*} Since $T$ is finite and all summands are nonnegative, \begin{align*} \sup_{t\in T}\exp(\lambda X_t(\omega))\le \sum_{t\in T}\exp(\lambda X_t(\omega)). \end{align*} Taking expectations with respect to $\mathbb P_\varepsilon$ gives \begin{align*} \mathbb E_\varepsilon[\exp(\lambda Y)]\le \sum_{t\in T}\mathbb E_\varepsilon[\exp(\lambda X_t)]. \end{align*} [/step] [step:Use the Rademacher moment bound for each vector] Fix $t=(t_1,\dots,t_n)\in T$. By independence of $\varepsilon_1,\dots,\varepsilon_n$, \begin{align*} \mathbb E_\varepsilon[\exp(\lambda X_t)]=\prod_{i=1}^{n}\mathbb E_\varepsilon[\exp(\lambda\varepsilon_i t_i)]. \end{align*} For each $i\in\{1,\dots,n\}$, \begin{align*} \mathbb E_\varepsilon[\exp(\lambda\varepsilon_i t_i)]=\frac{e^{\lambda t_i}+e^{-\lambda t_i}}{2}=\cosh(\lambda t_i). \end{align*} For every real number $u$, the elementary estimate $\cosh u\le e^{u^2/2}$ holds. Indeed, \begin{align*} \cosh u=\sum_{k=0}^{\infty}\frac{u^{2k}}{(2k)!}\le \sum_{k=0}^{\infty}\frac{u^{2k}}{2^k k!}=e^{u^2/2}, \end{align*} where the inequality uses $(2k)!\ge 2^k k!$ for every integer $k\ge 0$. Therefore \begin{align*} \mathbb E_\varepsilon[\exp(\lambda X_t)]\le \prod_{i=1}^{n}\exp\left(\frac{\lambda^2t_i^2}{2}\right)=\exp\left(\frac{\lambda^2|t|^2}{2}\right). \end{align*} Since $|t|\le a$, this gives \begin{align*} \mathbb E_\varepsilon[\exp(\lambda X_t)]\le \exp\left(\frac{\lambda^2a^2}{2}\right). \end{align*} Combining this estimate with the previous step yields \begin{align*} \mathbb E_\varepsilon[\exp(\lambda Y)]\le |T|\exp\left(\frac{\lambda^2a^2}{2}\right). \end{align*} [/step] [step:Optimize the logarithmic bound] Using the estimates above, for every $\lambda>0$, \begin{align*} \mathbb E_\varepsilon[Y]\le \frac{\log |T|}{\lambda}+\frac{\lambda a^2}{2}. \end{align*} If $|T|=1$, write $T=\{t_0\}$. Then \begin{align*} \mathbb E_\varepsilon[Y]=\mathbb E_\varepsilon[X_{t_0}]=\sum_{i=1}^{n}t_{0,i}\mathbb E_\varepsilon[\varepsilon_i]=0. \end{align*} Since $\log |T|=0$, the desired estimate holds. Now assume $|T|\ge 2$. If $a=0$, then $|t|=0$ for every $t\in T$, so $t=0$ for every $t\in T$ and $Y=0$. Hence the desired estimate holds. If $a>0$, choose \begin{align*} \lambda:=\frac{\sqrt{2\log |T|}}{a}. \end{align*} Substituting this value into the preceding bound gives \begin{align*} \mathbb E_\varepsilon[Y]\le \frac{a\log |T|}{\sqrt{2\log |T|}}+\frac{a\sqrt{2\log |T|}}{2}=a\sqrt{2\log |T|}. \end{align*} Thus \begin{align*} \mathbb E_\varepsilon\left[\sup_{t\in T}\sum_{i=1}^{n}\varepsilon_i t_i\right]\le a\sqrt{2\log |T|}. \end{align*} [/step] [step:Apply the vector inequality to evaluation vectors] Define the finite set of evaluation vectors \begin{align*} T_{\mathcal F}:=\{(f(x_1),\dots,f(x_n))\in\mathbb R^n:f\in\mathcal F\}. \end{align*} Since $\mathcal F$ is nonempty and finite, $T_{\mathcal F}$ is nonempty and finite, and \begin{align*} |T_{\mathcal F}|\le |\mathcal F|. \end{align*} For every vector $t=(f(x_1),\dots,f(x_n))\in T_{\mathcal F}$, \begin{align*} |t|^2=\sum_{i=1}^{n}f(x_i)^2\le nb^2. \end{align*} Thus \begin{align*} \sup_{t\in T_{\mathcal F}}|t|\le \sqrt n\,b. \end{align*} Applying the finite-vector inequality to $T_{\mathcal F}$ with $a=\sqrt n\,b$ gives \begin{align*} \mathbb E_\varepsilon\left[\sup_{t\in T_{\mathcal F}}\sum_{i=1}^{n}\varepsilon_i t_i\right]\le \sqrt n\,b\sqrt{2\log |T_{\mathcal F}|}. \end{align*} By the definition of $T_{\mathcal F}$, \begin{align*} \sup_{t\in T_{\mathcal F}}\sum_{i=1}^{n}\varepsilon_i t_i=\sup_{f\in\mathcal F}\sum_{i=1}^{n}\varepsilon_i f(x_i). \end{align*} Dividing by $n$ and using $|T_{\mathcal F}|\le |\mathcal F|$ gives \begin{align*} \widehat{\mathfrak R}_n(\mathcal F;x_1,\dots,x_n)\le b\sqrt{\frac{2\log |\mathcal F|}{n}}. \end{align*} This proves the function-class consequence and completes the proof. [/step]

Prerequisites (0/5 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Expectation

Explore Further

What brings you to Androma?

Start with a route through the knowledge graph.

Massart Finite-Class Lemma (Theorem # 9826)

Discussion

Proof

Prerequisites (0/5 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Massart Finite-Class Lemma (Theorem # 9826)

Discussion

Proof

Prerequisites (0/5 completed)

Prerequisites Graph

Explore Further