Uniform Entropy Integral Sufficient Condition for Donsker Classes (Theorem # 6306)

Proof

[proofplan] We prove Donsker convergence by verifying the two standard ingredients for [weak convergence](/page/Weak%20Convergence) of empirical processes in $\ell^\infty(\mathcal F)$: finite-dimensional convergence and asymptotic equicontinuity for the intrinsic $L^2(P)$ semimetric. The square-integrable envelope gives the required second moments and domination. The uniform entropy integral gives the maximal inequality that forces asymptotic equicontinuity, and it also yields the total boundedness required by the convergence criterion. Once these conditions are established, the empirical-process convergence criterion yields a tight centred Gaussian limit, and the covariance is computed directly from the finite-dimensional [central limit theorem](/theorems/521). [/proofplan] [step:Define the empirical process and its finite-dimensional marginals] Let $(\Omega,\mathcal G,\mathbb P)$ be the probability space on which the sample is defined. Let $X_1,X_2,\dots$ be independent measurable maps $X_i:(\Omega,\mathcal G)\to(\mathcal X,\mathcal A)$ with common law $P$. For each $n\in\mathbb N$, the empirical process at stage $n$ uses $X_1,\dots,X_n$. For every $P$-integrable measurable function $h:\mathcal X\to\mathbb R$, write \begin{align*} P h=\int_{\mathcal X} h(x)\,dP(x). \end{align*} Define the empirical process map $\alpha_n:\mathcal F\to\mathbb R$ by \begin{align*} \alpha_n(f)=\frac{1}{\sqrt n}\sum_{i=1}^n \bigl(f(X_i)-P f\bigr),\qquad f\in\mathcal F. \end{align*} Since $|f|\leq F_e$ and $P F_e^2<\infty$, every $f\in\mathcal F$ belongs to $L^2(P)$, and hence $P|f|\leq (Pf^2)^{1/2}<\infty$ by the Cauchy-Schwarz inequality. For fixed $f_1,\dots,f_k\in\mathcal F$, define the random vector $Z_i:\Omega\to\mathbb R^k$ by \begin{align*} Z_i(\omega)=\bigl(f_1(X_i(\omega))-P f_1,\dots,f_k(X_i(\omega))-P f_k\bigr),\qquad \omega\in\Omega. \end{align*} The vectors $Z_1,Z_2,\dots$ are independent and identically distributed because $X_1,X_2,\dots$ are independent and identically distributed. 
They have mean zero by the definition of $P f_j$, and they have finite second moments since $|f_j|\leq F_e$ and $P F_e^2<\infty$ for each $j\in\{1,\dots,k\}$. To prove vector convergence, fix an arbitrary vector $a=(a_1,\dots,a_k)\in\mathbb R^k$ and define the real-valued [random variable](/page/Random%20Variable) $Y_i^a:\Omega\to\mathbb R$ by \begin{align*} Y_i^a(\omega)=\sum_{j=1}^k a_j\bigl(f_j(X_i(\omega))-P f_j\bigr),\qquad \omega\in\Omega. \end{align*} The variables $Y_1^a,Y_2^a,\dots$ are independent and identically distributed and have mean zero. They have finite variance because \begin{align*} |Y_i^a|^2\leq \left(\sum_{j=1}^k |a_j|\,|f_j(X_i)-P f_j|\right)^2 \leq k\sum_{j=1}^k |a_j|^2 |f_j(X_i)-P f_j|^2 \end{align*} and each summand has finite expectation. Let $\Sigma\in\mathbb R^{k\times k}$ be the matrix defined by \begin{align*} \Sigma_{ij}=P(f_i f_j)-P f_i\,P f_j. \end{align*} This is the covariance matrix of $Z_1$. Expanding the square gives \begin{align*} \mathbb E\bigl[(Y_1^a)^2\bigr]=\sum_{j=1}^k\sum_{l=1}^k a_j a_l\bigl(P(f_j f_l)-P f_j\,P f_l\bigr)=a^\top\Sigma a. \end{align*} Applying the [Central Limit Theorem](/theorems/532) to $(Y_i^a)_{i\geq1}$ gives \begin{align*} \sum_{j=1}^k a_j\alpha_n(f_j)=\frac{1}{\sqrt n}\sum_{i=1}^n Y_i^a \xrightarrow{d} \mathcal N(0,a^\top\Sigma a). \end{align*} If $a^\top\Sigma a=0$, then $Y_i^a=0$ almost surely, and the limit is the point mass at $0$, which is $\mathcal N(0,0)$. By the Cramér-Wold characterization of convergence in distribution in finite-dimensional Euclidean spaces, this proves \begin{align*} \bigl(\alpha_n(f_1),\dots,\alpha_n(f_k)\bigr) \xrightarrow{d} \mathcal N_k(0,\Sigma). \end{align*} Thus the finite-dimensional distributions converge to those of a centred Gaussian random map with the covariance stated in the theorem. [/step] [step:Use the entropy integral to obtain asymptotic equicontinuity] Define the intrinsic semimetric $\rho_P: \mathcal F\times\mathcal F\to[0,\infty)$ by \begin{align*} \rho_P(f,g)=\|f-g\|_{L^2(P)}. \end{align*} For a finitely supported probability measure $Q$ with $0<\|F_e\|_{L^2(Q)}<\infty$, let $N(r,\mathcal H,L^2(Q))$ denote the least number of $L^2(Q)$-balls of radius $r$ needed to cover a function class $\mathcal H$. 
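For a finitely supported $Q$, each function in $\mathcal F$ is determined on the support of $Q$ by a finite vector of values. Covering numbers and the difference-cover construction can therefore be checked numerically. The following Python sketch is only an illustration under hypothetical choices. $Q$ has 12 random atoms in $[0,1]$ with random weights, and $\mathcal F$ is a finite family of half-line indicators $x\mapsto 1\{x\leq t\}$ with envelope $F_e=1$. The sketch builds a greedy $r$-cover. A greedy cover only gives an upper bound on $N(r,\mathcal F,L^2(Q))$, not its exact value. The sketch then tests whether the pairwise differences of the centres form a $2r$-cover of $\mathcal F-\mathcal F$. The helper names are hypothetical.

```python
import numpy as np


def l2q_dists(A, B, q):
    """Pairwise L^2(Q) distances between the rows of A and the rows of B.

    Each row lists a function's values at the atoms of Q; q holds the atom weights.
    """
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((q * diff ** 2).sum(axis=-1))


def greedy_cover(F, q, r):
    """Indices of centres such that every row of F is at L^2(Q) distance < r from one.

    A greedy cover only gives an upper bound on the covering number N(r, F, L^2(Q)).
    """
    centres = []
    for i in range(F.shape[0]):
        # start a new centre only if row i is not already within r of an existing centre
        if not centres or l2q_dists(F[i:i + 1], F[centres], q).min() >= r:
            centres.append(i)
    return centres


def difference_cover_holds(F, q, r):
    """Check that {f_a - f_b : f_a, f_b centres of an r-cover} is a 2r-cover of F - F."""
    C = F[greedy_cover(F, q, r)]
    k = F.shape[1]
    D = (C[:, None, :] - C[None, :, :]).reshape(-1, k)  # all centre differences f_a - f_b
    H = (F[:, None, :] - F[None, :, :]).reshape(-1, k)  # all increments f - g
    covered = l2q_dists(H, D, q).min(axis=1) < 2 * r
    return bool(covered.all()), len(C)


# Hypothetical toy setting: Q has 12 random atoms with random weights,
# and F consists of the indicators x -> 1{x <= t} for 17 grid values of t.
rng = np.random.default_rng(0)
atoms = np.sort(rng.uniform(size=12))
q = rng.dirichlet(np.ones(12))
ts = np.linspace(0.0, 1.0, 17)
F = (atoms[None, :] <= ts[:, None]).astype(float)

for r in (0.5, 0.25, 0.1):
    ok, m = difference_cover_holds(F, q, r)
    print(f"r={r}: greedy cover size {m}, difference-cover size {m ** 2}, valid 2r-cover: {ok}")
```

Every run should report a valid $2r$-cover, because the triangle inequality holds for any cover. The printed $m^2$ counts the centre differences built from this particular greedy cover.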
If $\{f_1,\dots,f_m\}$ is an $r$-cover of $\mathcal F$ in $L^2(Q)$, then $\{f_a-f_b:1\leq a,b\leq m\}$ is a $2r$-cover of $\mathcal F-\mathcal F$ in $L^2(Q)$. Indeed, given $f,g\in\mathcal F$, choose $a,b$ with $\|f-f_a\|_{L^2(Q)}<r$ and $\|g-f_b\|_{L^2(Q)}<r$, so that \begin{align*} \|(f-g)-(f_a-f_b)\|_{L^2(Q)}\leq \|f-f_a\|_{L^2(Q)}+\|g-f_b\|_{L^2(Q)}<2r. \end{align*} Thus \begin{align*} N(2r,\mathcal F-\mathcal F,L^2(Q))\leq N(r,\mathcal F,L^2(Q))^2. \end{align*} Since $|f-g|\leq|f|+|g|\leq 2F_e$, the function $2F_e$ is an envelope of $\mathcal F-\mathcal F$, and $P(2F_e)^2=4P F_e^2<\infty$. Because $\|2F_e\|_{L^2(Q)}=2\|F_e\|_{L^2(Q)}$, taking $r=\tau\|F_e\|_{L^2(Q)}$ in the covering inequality gives \begin{align*} \log N\bigl(\tau\|2F_e\|_{L^2(Q)},\mathcal F-\mathcal F,L^2(Q)\bigr)\leq 2\log N\bigl(\tau\|F_e\|_{L^2(Q)},\mathcal F,L^2(Q)\bigr) \end{align*} for every relative radius $\tau>0$ and every finitely supported $Q$ with $0<\|F_e\|_{L^2(Q)}<\infty$. Hence, at each relative radius $\tau$, the entropy integrand of the increment class with envelope $2F_e$ is at most $\sqrt2$ times that of $\mathcal F$ with envelope $F_e$. The uniform entropy integral of the increment class is therefore finite whenever $J(1,\mathcal F)<\infty$. For $\delta>0$, define the localized increment class $\mathcal H_\delta$ by \begin{align*} \mathcal H_\delta=\{f-g:f,g\in\mathcal F,\ \rho_P(f,g)<\delta\}. \end{align*} This class has envelope $2F_e$ and satisfies \begin{align*} \sup_{h\in\mathcal H_\delta}\|h\|_{L^2(P)}\leq \delta. \end{align*} We use the uniform entropy maximal inequality for pointwise measurable empirical-process classes. In the form needed here, it states the following. Suppose $\mathcal H$ is pointwise measurable, has envelope $H_e$, satisfies $P H_e^2<\infty$, and has finite uniform entropy integral relative to $H_e$. Then the expected suprema of the empirical process over the small balls $\{h\in\mathcal H:\|h\|_{L^2(P)}<\eta\}$ satisfy \begin{align*} \lim_{\eta\downarrow0}\limsup_{n\to\infty} \mathbb E\left[\sup\left\{\left|\frac{1}{\sqrt n}\sum_{i=1}^n\bigl(h(X_i)-P h\bigr)\right|:h\in\mathcal H,\ \|h\|_{L^2(P)}<\eta\right\}\right]=0. \end{align*} The class $\mathcal F-\mathcal F$ inherits pointwise measurability from $\mathcal F$. Let $\mathcal G\subset\mathcal F$ be a countable subclass such that every $f\in\mathcal F$ is a pointwise limit of a sequence in $\mathcal G$. Then every $f-g\in\mathcal F-\mathcal F$ is a pointwise limit of a sequence in the countable class $\mathcal G-\mathcal G$. The envelope condition required by this maximal inequality is exactly $P(2F_e)^2<\infty$. 
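The localized increments can be examined in the classical case where $P$ is uniform on $[0,1]$ and $\mathcal F=\{x\mapsto 1\{x\leq t\}:t\in[0,1]\}$. This is a pointwise measurable VC class with envelope $F_e=1$ and finite uniform entropy integral. Here $\rho_P(s,t)^2=|s-t|$, so the supremum over $\mathcal H_\delta$ is a modulus of continuity of $t\mapsto\alpha_n(1\{\cdot\leq t\})$ over pairs with $|s-t|<\delta^2$. The Python sketch below is a Monte Carlo illustration only. The sample size, grid, replication count, seed and function names are hypothetical choices. Restricting to grid pairs with $|s-t|\leq\delta^2$ only approximates the supremum.

```python
import numpy as np


def empirical_process_on_grid(u, grid):
    """alpha_n(t) = sqrt(n) * (F_n(t) - t) at each grid point, for X_i ~ Uniform(0, 1)."""
    n = u.size
    Fn = np.searchsorted(np.sort(u), grid, side="right") / n  # fraction of the sample <= t
    return np.sqrt(n) * (Fn - grid)


def grid_modulus(alpha, max_lag):
    """max |alpha(s) - alpha(t)| over grid pairs at most max_lag steps apart (0 if max_lag == 0)."""
    best = 0.0
    for lag in range(1, max_lag + 1):
        best = max(best, float(np.abs(alpha[lag:] - alpha[:-lag]).max()))
    return best


def mean_modulus(n, deltas, m=500, reps=200, seed=0):
    """Monte Carlo average of the grid modulus for each delta.

    For the indicators 1{x <= t} under Uniform(0, 1), rho_P(s, t) = sqrt(|s - t|), so
    rho_P(s, t) <= delta means |s - t| <= delta**2, i.e. at most floor(delta**2 * m) grid steps.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, m + 1)
    lags = [min(m, int(np.floor(d ** 2 * m))) for d in deltas]
    totals = np.zeros(len(deltas))
    for _ in range(reps):
        alpha = empirical_process_on_grid(rng.uniform(size=n), grid)
        totals += np.array([grid_modulus(alpha, lag) for lag in lags])
    return totals / reps


deltas = [0.5, 0.25, 0.1]
print(dict(zip(deltas, np.round(mean_modulus(n=2000, deltas=deltas), 3).tolist())))
```

For every sample, the grid modulus is nondecreasing in $\delta$ because the sets of grid pairs are nested. The averages are expected to shrink as $\delta$ decreases, in line with the maximal inequality at a fixed large $n$. A finite simulation, however, says nothing rigorous about the limits.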
This maximal inequality does not presuppose the theorem being proved. It gives only a localized bound on expected suprema of increments, and it asserts neither weak convergence, nor a Gaussian limit, nor the Donsker property. We apply it to $\mathcal H=\mathcal F-\mathcal F$ with envelope $H_e=2F_e$. The preceding covering calculation verifies the finite entropy hypothesis. Pointwise measurability of $\mathcal F-\mathcal F$ verifies the measurability hypothesis, and $P(2F_e)^2<\infty$ verifies the square-integrability hypothesis on the envelope. Since $\mathcal H_\delta$ is exactly the set of $h\in\mathcal F-\mathcal F$ with $\|h\|_{L^2(P)}<\delta$, this gives \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb E\left[\sup_{h\in\mathcal H_\delta}\left|\frac{1}{\sqrt n}\sum_{i=1}^n\bigl(h(X_i)-P h\bigr)\right|\right]=0. \end{align*} For $h=f-g$, linearity shows that the displayed empirical process equals $\alpha_n(f)-\alpha_n(g)$. Apply Markov's inequality, take $\limsup_{n\to\infty}$, and then let $\delta\downarrow0$. This gives, for every $\varepsilon>0$, \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb P\left(\sup_{\rho_P(f,g)<\delta}|\alpha_n(f)-\alpha_n(g)|>\varepsilon\right)=0. \end{align*} [guided] We need asymptotic equicontinuity, so the right object is not the original class alone but the class of local increments. For $\delta>0$, define \begin{align*} \mathcal H_\delta=\{f-g:f,g\in\mathcal F,\ \rho_P(f,g)<\delta\}. \end{align*} If $h=f-g\in\mathcal H_\delta$, then $|h|\leq |f|+|g|\leq 2F_e$, so $2F_e$ is an envelope for every localized increment class. Also \begin{align*} \|h\|_{L^2(P)}=\|f-g\|_{L^2(P)}=\rho_P(f,g)<\delta, \end{align*} and hence \begin{align*} \sup_{h\in\mathcal H_\delta}\|h\|_{L^2(P)}\leq\delta. \end{align*} Next comes the covering estimate for increments. If $\{f_1,\dots,f_m\}$ is an $r$-cover of $\mathcal F$ in $L^2(Q)$, then every increment $f-g$ is within $2r$ of some $f_a-f_b$. To see this, choose $a,b$ with $\|f-f_a\|_{L^2(Q)}<r$ and $\|g-f_b\|_{L^2(Q)}<r$; then \begin{align*} \|(f-g)-(f_a-f_b)\|_{L^2(Q)}\leq \|f-f_a\|_{L^2(Q)}+\|g-f_b\|_{L^2(Q)}<2r. \end{align*} Therefore \begin{align*} N(2r,\mathcal F-\mathcal F,L^2(Q))\leq N(r,\mathcal F,L^2(Q))^2. 
\end{align*} Because $\|2F_e\|_{L^2(Q)}=2\|F_e\|_{L^2(Q)}$, take $r=\tau\|F_e\|_{L^2(Q)}$. This shows that the logarithm of the covering number of $\mathcal F-\mathcal F$ at relative radius $\tau$, measured against the envelope $2F_e$, is at most twice the corresponding quantity for $\mathcal F$ with envelope $F_e$. Together with the identity \begin{align*} P(2F_e)^2=4P F_e^2<\infty, \end{align*} this proves that the increment class with envelope $2F_e$ has a square-integrable envelope and a finite uniform entropy integral whenever $J(1,\mathcal F)<\infty$. Now we invoke only a maximal inequality, not the Donsker conclusion we are proving. The uniform entropy maximal inequality says that a pointwise measurable class $\mathcal H$ with envelope $H_e$, finite uniform entropy integral relative to $H_e$, and $P H_e^2<\infty$ satisfies the following local expectation bound over all $h\in\mathcal H$ with $\|h\|_{L^2(P)}<\eta$: \begin{align*} \lim_{\eta\downarrow0}\limsup_{n\to\infty} \mathbb E\left[\sup\left\{\left|\frac{1}{\sqrt n}\sum_{i=1}^n\bigl(h(X_i)-P h\bigr)\right|:h\in\mathcal H,\ \|h\|_{L^2(P)}<\eta\right\}\right]=0. \end{align*} This is not circular: it is an expectation estimate for suprema over small $L^2(P)$ balls, and it does not assert weak convergence in $\ell^\infty(\mathcal F)$ or the existence of a Gaussian limit. Apply the inequality with $\mathcal H=\mathcal F-\mathcal F$ and $H_e=2F_e$. Since every $h\in\mathcal H_\delta$ has $\|h\|_{L^2(P)}<\delta$, we obtain \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb E\left[\sup_{h\in\mathcal H_\delta}\left|\frac{1}{\sqrt n}\sum_{i=1}^n\bigl(h(X_i)-P h\bigr)\right|\right]=0. \end{align*} For $h=f-g$, linearity of the empirical process gives \begin{align*} \frac{1}{\sqrt n}\sum_{i=1}^n\bigl(h(X_i)-P h\bigr)=\alpha_n(f)-\alpha_n(g). \end{align*} Thus Markov's inequality yields, for every $\varepsilon>0$, \begin{align*} \mathbb P\left(\sup_{\rho_P(f,g)<\delta}|\alpha_n(f)-\alpha_n(g)|>\varepsilon\right) \leq \frac{1}{\varepsilon}\mathbb E\left[\sup_{h\in\mathcal H_\delta}\left|\frac{1}{\sqrt n}\sum_{i=1}^n\bigl(h(X_i)-P h\bigr)\right|\right]. 
\end{align*} Taking $\limsup_{n\to\infty}$ and then letting $\delta\downarrow0$ proves \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb P\left(\sup_{\rho_P(f,g)<\delta}|\alpha_n(f)-\alpha_n(g)|>\varepsilon\right)=0. \end{align*} [/guided] [/step] [step:Combine finite-dimensional convergence with the empirical-process convergence criterion] We now use the empirical-process convergence criterion, a general result that does not depend on the entropy condition being proved. In this context, asymptotic measurability means the following: for every bounded continuous functional $\Phi:\ell^\infty(\mathcal F)\to\mathbb R$, the difference between the outer and inner expectations of $\Phi(\alpha_n)$ tends to $0$ as $n\to\infty$. The criterion concerns a pointwise measurable empirical process indexed by a semimetric space $(\mathcal F,\rho_P)$. It states that the process converges weakly in $\ell^\infty(\mathcal F)$ if three conditions hold: its finite-dimensional distributions converge, it is asymptotically uniformly equicontinuous in $\rho_P$, and $(\mathcal F,\rho_P)$ is totally bounded (equivalently, its quotient by the relation $\rho_P(f,g)=0$ is totally bounded). We verify these hypotheses. Pointwise measurability supplies the asymptotic measurability and separability requirements through a countable pointwise-dense subclass. The finite entropy integral supplies [total boundedness](/page/Total%20Boundedness) of $(\mathcal F,\rho_P)$. To see this, note that covering numbers are nonincreasing in the radius. For every finitely supported $Q$, the entropy integrand is therefore nonincreasing in the relative radius $\tau$, and its value at any $\tau\in(0,1]$ is at most $J(1,\mathcal F)/\tau$. Hence $N(\tau\|F_e\|_{L^2(Q)},\mathcal F,L^2(Q))$ is bounded uniformly in $Q$. Now take finitely many functions of $\mathcal F$ that are pairwise more than $\tau\|F_e\|_{L^2(P)}$ apart in $L^2(P)$. By the law of large numbers, the empirical measure $Q$ of a sufficiently large auxiliary sample from $P$ keeps them pairwise more than $\tfrac{\tau}{2}\|F_e\|_{L^2(Q)}$ apart in $L^2(Q)$. No $L^2(Q)$-ball of radius $\tfrac{\tau}{4}\|F_e\|_{L^2(Q)}$ then contains two of them, so their number is at most the uniform bound on $N(\tfrac{\tau}{4}\|F_e\|_{L^2(Q)},\mathcal F,L^2(Q))$. Hence every $\tau\|F_e\|_{L^2(P)}$-separated subset of $(\mathcal F,\rho_P)$ is finite with uniformly bounded size, and $(\mathcal F,\rho_P)$ is totally bounded. The first step gives convergence of every finite-dimensional marginal of $\alpha_n$ to a centred Gaussian vector with covariance matrix \begin{align*} \Sigma_{ij}=P(f_i f_j)-P f_i\,P f_j. 
\end{align*} The preceding step proves the required asymptotic uniform equicontinuity: \begin{align*} \lim_{\delta\downarrow 0}\limsup_{n\to\infty} \mathbb P\left(\sup_{\rho_P(f,g)<\delta}|\alpha_n(f)-\alpha_n(g)|>\varepsilon\right)=0 \end{align*} for every $\varepsilon>0$. The convergence criterion therefore yields weak convergence in $\ell^\infty(\mathcal F)$ to a tight centred Gaussian random map $G_P:\mathcal F\to\mathbb R$. Its covariance is determined by the displayed finite-dimensional limits, namely \begin{align*} \operatorname{Cov}(G_P(f),G_P(g))=P(fg)-P f\,P g. \end{align*} This is precisely the asserted $P$-Donsker property of $\mathcal F$. [/step]
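Write $f_t(x)=1\{x\leq t\}$. For these half-line indicators under the uniform law on $[0,1]$, $P(f_sf_t)-Pf_s\,Pf_t=\min(s,t)-st$, which is the covariance of a Brownian bridge. The Python sketch below is illustrative only. The sample size $n$, the number of replications, the seed, the points $t_j$ and the function names are hypothetical choices. It simulates $(\alpha_n(f_{t_1}),\dots,\alpha_n(f_{t_k}))$ and compares its sample covariance with $\Sigma$. Each $\alpha_n(f)$ is a normalized sum of independent copies of $f(X_1)-Pf$, so its covariance equals $\Sigma$ exactly for every $n$. Any discrepancy is therefore Monte Carlo error.

```python
import numpy as np


def simulate_fidi(ts, n=200, reps=10_000, seed=1):
    """Draws of (alpha_n(f_{t_1}), ..., alpha_n(f_{t_k})) for f_t = 1{x <= t} and X_i ~ Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    ts = np.asarray(ts, dtype=float)
    u = rng.uniform(size=(reps, n))
    # F_n(t_j) in each replication: fraction of the n observations that are <= t_j
    Fn = (u[:, :, None] <= ts).mean(axis=1)
    return np.sqrt(n) * (Fn - ts)


def limit_covariance(ts):
    """Sigma_ij = P(f_i f_j) - P f_i P f_j = min(t_i, t_j) - t_i t_j for these indicators."""
    ts = np.asarray(ts, dtype=float)
    return np.minimum.outer(ts, ts) - np.outer(ts, ts)


ts = [0.2, 0.5, 0.8]
draws = simulate_fidi(ts)
emp = np.cov(draws, rowvar=False)  # sample covariance of the simulated marginal vectors
print(np.round(emp, 3))
print(limit_covariance(ts))
```

The two printed matrices should agree to about two decimal places.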
