Bootstrap Donsker Theorem for the Empirical Distribution Function (Theorem # 6353)

Proof

[proofplan] We view the empirical distribution function as the empirical measure indexed by the class of lower half-lines. That class is a pointwise measurable Vapnik-Chervonenkis class with bounded envelope, so the ordinary empirical process converges to the Brownian-bridge process and the conditional multinomial bootstrap empirical process has the same weak limit in bounded-Lipschitz distance. We use the countable rational subfamily to fix separable versions of the indexed processes, which makes the Borel-law identification in $\ell^\infty$ legitimate. Finally, the map from the indexed process to the distribution-function process is continuous, and the supremum norm is a Lipschitz functional, so the Kolmogorov-statistic convergence follows from the continuous mapping theorem. [/proofplan] [step:Represent the empirical distribution functions as indexed empirical measures] For each $x \in \mathbb{R}$, define the lower half-line $H_x := (-\infty,x]$. Let $h_x: \mathbb{R} \to \{0,1\}$ be the Borel measurable map whose value at $y \in \mathbb{R}$ is $\mathbb{1}_{H_x}(y)$, and define $\mathcal{H} := \{h_x \mid x \in \mathbb{R}\}$. Let $P$ denote the law of $X_1$. Let $P_n$ denote the empirical measure on Borel sets $A \subset \mathbb{R}$ defined by $P_n(A) := n^{-1}\sum_{i=1}^{n}\mathbb{1}_A(X_i)$. Let $X_1^*,\dots,X_n^*$ be the nonparametric bootstrap sample, conditionally i.i.d. with conditional law $P_n$, and let $P_n^*$ denote its empirical measure. Let $\mathbb{P}^*$ and $\mathbb{E}^*$ denote [conditional probability](/page/Conditional%20Probability) and [conditional expectation](/page/Conditional%20Expectation) given $X_1,\dots,X_n$; all bootstrap laws below are conditional laws under $\mathbb{P}^*$. Then, for every $x \in \mathbb{R}$, \begin{align*} F(x) = P(H_x), \qquad F_n(x) = P_n(H_x), \qquad F_n^*(x) = P_n^*(H_x). 
\end{align*} Define the ordinary empirical process $\mathbb{G}_n: \mathcal{H} \to \mathbb{R}$ and the bootstrap empirical process $\mathbb{G}_n^*: \mathcal{H} \to \mathbb{R}$ by \begin{align*} \mathbb{G}_n(h_x) := \sqrt n\{P_n(H_x)-P(H_x)\} \end{align*} and \begin{align*} \mathbb{G}_n^*(h_x) := \sqrt n\{P_n^*(H_x)-P_n(H_x)\}. \end{align*} Thus proving conditional [weak convergence](/page/Weak%20Convergence) of $\mathbb{G}_n^*$ in $\ell^\infty(\mathcal{H})$ proves the first assertion after the index change $h_x \leftrightarrow x$. [/step] [step:Apply the conditional multinomial Donsker theorem to lower half-lines] The class $\mathcal{H}$ has Vapnik-Chervonenkis dimension $1$: on any two ordered points $a<b$, the trace of lower half-lines cannot realize the subset $\{b\}$ without also containing $a$, while any one-point set is shattered. The class is pointwise measurable because the countable subclass $\mathcal{H}_{\mathbb{Q}} := \{h_q \mid q \in \mathbb{Q}\}$ approximates it pointwise: for each $x \in \mathbb{R}$ choose rational numbers $q_m \downarrow x$, and then $h_{q_m}(y) \to h_x(y)$ for every $y \in \mathbb{R}$. Its envelope $E: \mathbb{R} \to \mathbb{R}$, $E(y):=1$, is bounded and satisfies $P(E^2)=1$. Let $\mathbb{G}_P: \mathcal{H} \to \mathbb{R}$ denote the centered Gaussian process with covariance \begin{align*} \operatorname{Cov}(\mathbb{G}_P(h_x),\mathbb{G}_P(h_y)) := P(H_x \cap H_y)-P(H_x)P(H_y). \end{align*} Since $H_x \cap H_y = H_{\min\{x,y\}}$, this covariance equals \begin{align*} F(\min\{x,y\})-F(x)F(y). \end{align*} We use the conditional multinomial bootstrap Donsker theorem in the following precise form. Let $\mathcal{F}$ be a pointwise measurable $P$-Donsker class of measurable real-valued functions with envelope $E_{\mathcal{F}}$ satisfying $P(E_{\mathcal{F}}^2)<\infty$. If $X_1^*,\dots,X_n^*$ are conditionally i.i.d. 
from $P_n$, then a separable version of the conditional bootstrap empirical process $\sqrt n(P_n^*-P_n)$ indexed by $\mathcal{F}$ satisfies \begin{align*} d_{\mathrm{BL}}\bigl(\mathcal{L}^*(\sqrt n(P_n^*-P_n)),\mathcal{L}(\mathbb{G}_P)\bigr) \xrightarrow{\mathbb{P}} 0 \end{align*} for the bounded-Lipschitz metric on Borel probability laws on $\ell^\infty(\mathcal{F})$, where the separable version is determined by the countable pointwise-dense subclass. In the present case, $\mathcal{H}_{\mathbb{Q}}$ supplies that countable subclass, the VC property gives the $P$-Donsker property, and $P(E^2)=1$ gives the square-integrable envelope condition. Therefore \begin{align*} d_{\mathrm{BL}}\bigl(\mathcal{L}^*(\mathbb{G}_n^*),\mathcal{L}(\mathbb{G}_P)\bigr) \xrightarrow{\mathbb{P}} 0, \end{align*} where $\mathcal{L}^*(\mathbb{G}_n^*)$ is the conditional law of the separable version of $\mathbb{G}_n^*$ under $\mathbb{P}^*$, $\mathcal{L}(\mathbb{G}_P)$ is the law of the corresponding tight separable Gaussian limit, and $d_{\mathrm{BL}}$ denotes the supremum of expectation differences over all real-valued Borel functions on $\ell^\infty(\mathcal{H})$ bounded by $1$ and Lipschitz with constant at most $1$. This bounded-Lipschitz metric statement is the asserted conditional weak convergence in probability of $\mathbb{G}_n^*$ to $\mathbb{G}_P$ in the empirical-process sense. [guided] The point of introducing $\mathcal{H}$ is that the empirical distribution function is an empirical process indexed by sets. For $h_x = \mathbb{1}_{(-\infty,x]}$, the coordinate $\mathbb{G}_n^*(h_x)$ is exactly $\sqrt n\{F_n^*(x)-F_n(x)\}$. Thus we need a bootstrap [central limit theorem](/theorems/521) for the whole indexed process, not merely for one fixed $x$. We verify the hypotheses of the conditional multinomial bootstrap Donsker theorem. First, $\mathcal{H}$ is a VC class. 
If $a<b$ are two [real numbers](/page/Real%20Numbers), a lower half-line containing $b$ also contains $a$, so the subset $\{b\}$ of $\{a,b\}$ cannot be realized as a trace. Any one-point set can be shattered, hence the VC dimension is $1$. Second, the uncountable index class is measurable in the empirical-process sense. Define the countable subclass $\mathcal{H}_{\mathbb{Q}} := \{h_q \mid q \in \mathbb{Q}\}$. For a fixed $x \in \mathbb{R}$, choose rational numbers $q_m$ with $q_m \downarrow x$. Then for every $y \in \mathbb{R}$, the indicators $h_{q_m}(y)=\mathbb{1}_{(-\infty,q_m]}(y)$ converge to $h_x(y)=\mathbb{1}_{(-\infty,x]}(y)$. This pointwise approximation by a countable subclass is the separability condition that prevents measurability pathologies in $\ell^\infty(\mathcal{H})$. Third, the envelope condition holds. The envelope map $E: \mathbb{R} \to \mathbb{R}$ defined by $E(y):=1$ dominates every $h_x$ and satisfies \begin{align*} P(E^2)=1. \end{align*} Thus the envelope is square-integrable and bounded. Since VC classes with square-integrable envelope are $P$-Donsker, the ordinary empirical process indexed by $\mathcal{H}$ has a tight centered Gaussian limit $\mathbb{G}_P$ in $\ell^\infty(\mathcal{H})$. The conditional multinomial bootstrap Donsker theorem now applies because, under $\mathbb{P}^*$, the bootstrap variables $X_1^*,\dots,X_n^*$ are i.i.d. with law $P_n$. The theorem gives convergence of the conditional bootstrap law in bounded-Lipschitz metric, not merely pointwise convergence for each [test function](/page/Test%20Function): \begin{align*} d_{\mathrm{BL}}\bigl(\mathcal{L}^*(\mathbb{G}_n^*),\mathcal{L}(\mathbb{G}_P)\bigr) \xrightarrow{\mathbb{P}} 0. 
\end{align*} Here $\mathcal{L}^*(\mathbb{G}_n^*)$ is the conditional law of $\mathbb{G}_n^*$, $\mathcal{L}(\mathbb{G}_P)$ is the law of the Gaussian limit, and $d_{\mathrm{BL}}$ is the supremum of the absolute differences of expectations over all real-valued Borel functions on $\ell^\infty(\mathcal{H})$ that are bounded by $1$ and Lipschitz with constant at most $1$. This metric formulation is what justifies calling the result conditional weak convergence in probability. It remains to record the covariance of the Gaussian limit. The limiting process $\mathbb{G}_P$ is centered Gaussian, and for $x,y \in \mathbb{R}$ its covariance is the covariance of the two indicators: \begin{align*} \operatorname{Cov}(\mathbb{G}_P(h_x),\mathbb{G}_P(h_y)) = P(H_x \cap H_y)-P(H_x)P(H_y). \end{align*} Because $H_x \cap H_y = H_{\min\{x,y\}}$, this becomes \begin{align*} \operatorname{Cov}(\mathbb{G}_P(h_x),\mathbb{G}_P(h_y)) = F(\min\{x,y\})-F(x)F(y). \end{align*} Since $F$ is nondecreasing, $F(\min\{x,y\})=\min\{F(x),F(y)\}$, so this is the covariance of a standard Brownian bridge evaluated at $F(x)$ and $F(y)$. [/guided] [/step] [step:Identify the Gaussian limit with the Brownian bridge composed with $F$] Let $B: [0,1] \to \mathbb{R}$ be a standard Brownian bridge with continuous sample paths, meaning a centered Gaussian process with covariance \begin{align*} \operatorname{Cov}(B(s),B(t)) = \min\{s,t\}-st. \end{align*} Let $Z: \mathbb{R} \to \mathbb{R}$ be the random element of $\ell^\infty(\mathbb{R})$ whose value at $x \in \mathbb{R}$ is $Z(x):=B(F(x))$. For $x,y \in \mathbb{R}$, monotonicity of $F$ gives \begin{align*} \min\{F(x),F(y)\}=F(\min\{x,y\}). \end{align*} Therefore \begin{align*} \operatorname{Cov}(Z(x),Z(y)) = \min\{F(x),F(y)\}-F(x)F(y) = F(\min\{x,y\})-F(x)F(y), \end{align*} which is the covariance of $\mathbb{G}_P(h_x)$ and $\mathbb{G}_P(h_y)$. It remains to identify the full laws, not only finite-dimensional distributions. Recall the countable subclass $\mathcal{H}_{\mathbb{Q}}=\{h_q\mid q\in\mathbb{Q}\}$. The Gaussian limit $\mathbb{G}_P$ is taken in its separable version determined by $\mathcal{H}_{\mathbb{Q}}$. 
For $x\in\mathbb{R}$ and rational $q_m\downarrow x$, right-continuity of $F$ gives $F(q_m)\to F(x)$, and continuity of $B$ gives $B(F(q_m))\to B(F(x))$. Hence $Z$ is also determined by its coordinates on $\mathbb{Q}$. Since the two centered Gaussian processes have identical finite-dimensional distributions on the countable determining set $\mathbb{Q}$, their induced Borel probability laws on the corresponding separable subspace of $\ell^\infty(\mathbb{R})$ agree. Thus the re-indexed process $x\mapsto\mathbb{G}_P(h_x)$ has the same law as $B\circ F$ as a random element of $\ell^\infty(\mathbb{R})$. [guided] We must be slightly careful here: equality of finite-dimensional distributions on an uncountable index set does not automatically identify a Borel law on $\ell^\infty(\mathbb{R})$. The countable rational subfamily is what removes this ambiguity. Let $B: [0,1]\to\mathbb{R}$ be a continuous standard Brownian bridge, so $B$ is centered Gaussian and \begin{align*} \operatorname{Cov}(B(s),B(t))=\min\{s,t\}-st. \end{align*} Define $Z: \mathbb{R}\to\mathbb{R}$ by $Z(x):=B(F(x))$. Then $Z$ is bounded because $B$ is continuous on the compact interval $[0,1]$. For $x,y\in\mathbb{R}$, the distribution function $F$ is nondecreasing, so \begin{align*} \min\{F(x),F(y)\}=F(\min\{x,y\}). \end{align*} Thus \begin{align*} \operatorname{Cov}(Z(x),Z(y))=F(\min\{x,y\})-F(x)F(y), \end{align*} which matches the covariance of $\mathbb{G}_P(h_x)$ and $\mathbb{G}_P(h_y)$. Now we pass from covariance matching to equality of laws in $\ell^\infty(\mathbb{R})$. The process $\mathbb{G}_P$ was chosen as the separable Gaussian version determined by $\mathcal{H}_{\mathbb{Q}}$. For the Brownian-bridge process, take any $x\in\mathbb{R}$ and choose rational numbers $q_m\downarrow x$. Right-continuity of the distribution function gives $F(q_m)\to F(x)$, and continuity of the sample path of $B$ gives \begin{align*} B(F(q_m))\to B(F(x)). \end{align*} Therefore $Z(x)$ is determined by the values $Z(q)$ with $q\in\mathbb{Q}$. 
Both processes are therefore determined by their rational coordinates. On that countable coordinate set, centered Gaussianity and matching covariances imply that all finite-dimensional distributions match. Since the coordinate laws on a countable determining set fix the Borel law of a separable version, the re-indexed process $x\mapsto\mathbb{G}_P(h_x)$ and $B\circ F$ have the same law in $\ell^\infty(\mathbb{R})$. [/guided] [/step] [step:Transfer the indexed convergence to $\ell^\infty(\mathbb{R})$] Define the [linear map](/page/Linear%20Map) $T: \ell^\infty(\mathcal{H}) \to \ell^\infty(\mathbb{R})$ by \begin{align*} (Tz)(x) := z(h_x). \end{align*} For $z,w \in \ell^\infty(\mathcal{H})$, \begin{align*} \|Tz-Tw\|_{\ell^\infty(\mathbb{R})} \leq \|z-w\|_{\ell^\infty(\mathcal{H})}, \end{align*} so $T$ is $1$-Lipschitz and hence continuous. Applying the [Continuous Mapping Theorem](/theorems/1847) in its conditional bounded-Lipschitz form to the continuous map $T$ and to the convergence from the previous step gives \begin{align*} T\mathbb{G}_n^* \xrightarrow{d} T\mathbb{G}_P \end{align*} conditionally in probability in $\ell^\infty(\mathbb{R})$. By the definitions of $T$ and $\mathbb{G}_n^*$, \begin{align*} (T\mathbb{G}_n^*)(x)=\sqrt n\{F_n^*(x)-F_n(x)\}. \end{align*} By the identification of the preceding step, $T\mathbb{G}_P$, whose value at $x$ is $\mathbb{G}_P(h_x)$, has the same law as $B \circ F$. Hence \begin{align*} \sqrt n(F_n^*-F_n) \xrightarrow{d} B\circ F \end{align*} conditionally in probability as a random element of $\ell^\infty(\mathbb{R})$. [guided] The map $T$ only changes the index notation: it sends a function indexed by half-lines to the same values indexed by real numbers. Its continuity is exactly the hypothesis needed for the [Continuous Mapping Theorem](/theorems/1847). Since the bootstrap convergence is stated in bounded-Lipschitz distance conditionally on the data, applying the conditional version of the theorem gives the conditional convergence in probability of $T\mathbb{G}_n^*$ to $T\mathbb{G}_P$. 
For each $x\in\mathbb{R}$, the definition of the bootstrap empirical process gives \begin{align*} (T\mathbb{G}_n^*)(x)=\mathbb{G}_n^*(h_x)=\sqrt n\{P_n^*(H_x)-P_n(H_x)\}=\sqrt n\{F_n^*(x)-F_n(x)\}. \end{align*} The previous step identifies the law of $T\mathbb{G}_P$ with the law of $B\circ F$ as a random element of $\ell^\infty(\mathbb{R})$. Therefore the distribution-function process itself satisfies \begin{align*} \sqrt n(F_n^*-F_n) \xrightarrow{d} B\circ F \end{align*} conditionally in probability in $\ell^\infty(\mathbb{R})$. [/guided] [/step] [step:Apply the supremum functional to obtain the Kolmogorov statistic limit] Define $S: \ell^\infty(\mathbb{R}) \to \mathbb{R}$ by \begin{align*} S(z) := \sup_{x \in \mathbb{R}} |z(x)|. \end{align*} For $z,w \in \ell^\infty(\mathbb{R})$, the [reverse triangle inequality](/theorems/2300) gives \begin{align*} |S(z)-S(w)| \leq \|z-w\|_{\ell^\infty(\mathbb{R})}, \end{align*} so $S$ is Lipschitz continuous. The [Continuous Mapping Theorem](/theorems/1847), applied conditionally to $S$, yields \begin{align*} \sup_{x \in \mathbb{R}} \sqrt n |F_n^*(x)-F_n(x)| \xrightarrow{d} \sup_{x \in \mathbb{R}} |B(F(x))| \end{align*} conditionally in probability. The ordinary empirical-process Donsker theorem for the same pointwise measurable VC class $\mathcal{H}$ with square-integrable envelope $E=1$ gives, in the same separable empirical-process sense, \begin{align*} \sqrt n(F_n-F) \xrightarrow{d} B\circ F \end{align*} in $\ell^\infty(\mathbb{R})$. Applying the same Lipschitz functional $S$ and the [Continuous Mapping Theorem](/theorems/1847) gives \begin{align*} \sup_{x \in \mathbb{R}} \sqrt n |F_n(x)-F(x)| \xrightarrow{d} \sup_{x \in \mathbb{R}} |B(F(x))|. \end{align*} Thus the conditional distribution of the bootstrap supremum statistic converges in probability to the same limiting distribution as the ordinary empirical supremum statistic. This completes the proof. [/step]
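
The conclusion of the proof can be checked numerically. The following is a minimal simulation sketch, not part of the proof, assuming $P$ is the uniform law on $[0,1]$ so that $F(x)=x$. It compares the conditional distribution of the bootstrap statistic $\sup_x \sqrt n\,|F_n^*(x)-F_n(x)|$, computed from one fixed sample, with the distribution of $\sup_x \sqrt n\,|F_n(x)-F(x)|$ over fresh samples. The function names, the sample size, the number of replications, and the seed are all illustrative choices.

```python
import math
import random


def bootstrap_ks_statistic(sorted_sample, rng):
    """sqrt(n) * sup_x |F_n^*(x) - F_n(x)| for one multinomial bootstrap draw."""
    n = len(sorted_sample)
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1  # draw one observation from P_n
    cumulative = 0
    sup_gap = 0.0
    for i in range(n):
        cumulative += counts[i]
        # Both step functions jump only at data points, so it suffices to
        # compare them at the last copy of each distinct value.
        if i == n - 1 or sorted_sample[i + 1] != sorted_sample[i]:
            sup_gap = max(sup_gap, abs(cumulative - (i + 1)) / n)
    return math.sqrt(n) * sup_gap


def ordinary_ks_statistic(sorted_sample, cdf):
    """sqrt(n) * sup_x |F_n(x) - F(x)| for a continuous reference CDF."""
    n = len(sorted_sample)
    sup_gap = 0.0
    for i, x in enumerate(sorted_sample):
        fx = cdf(x)
        # Check the gap just after and just before the jump at x.
        sup_gap = max(sup_gap, (i + 1) / n - fx, fx - i / n)
    return math.sqrt(n) * sup_gap


def empirical_quantile(values, level):
    """Crude empirical quantile: the order statistic at index floor(level * m)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(level * len(ordered)))
    return ordered[index]


rng = random.Random(0)           # illustrative seed
n, replications = 400, 500       # illustrative sizes

# One fixed data set; the bootstrap law is conditional on it.
data = sorted(rng.random() for _ in range(n))
boot = [bootstrap_ks_statistic(data, rng) for _ in range(replications)]

# Fresh samples from P give the unconditional law of the ordinary statistic.
fresh = [
    ordinary_ks_statistic(sorted(rng.random() for _ in range(n)), lambda x: x)
    for _ in range(replications)
]

print("bootstrap 0.95 quantile:", empirical_quantile(boot, 0.95))
print("ordinary  0.95 quantile:", empirical_quantile(fresh, 0.95))
```

If the theorem's conclusion holds, the two printed quantiles should be close for large $n$, up to Monte Carlo error. The comparison at data points only is valid because $F_n^*-F_n$ is a right-continuous step function whose jumps all lie in the original sample.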
