Dvoretzky-Kiefer-Wolfowitz Inequality — Statement & Proof

Dvoretzky-Kiefer-Wolfowitz Inequality (Theorem # 6300)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the two-sided Dvoretzky-Kiefer-Wolfowitz bound by applying the sharp one-sided Massart empirical distribution inequality as an external prerequisite, separately to the upper and lower deviations. The empirical distribution function is first treated through measurable upper and lower deviation events, with measurability reduced to suprema over rational points. Each one-sided event has probability at most $e^{-2n\varepsilon^2}$, and the union bound then gives the factor $2$ with no loss in the exponential constant. [/proofplan] [step:Declare the upper and lower empirical deviation events and verify their measurability] Define the upper deviation event $A_+:=\{\sup_{x\in\mathbb R}(F_n(x)-F(x))>\varepsilon\}$ and the lower deviation event $A_-:=\{\sup_{x\in\mathbb R}(F(x)-F_n(x))>\varepsilon\}$. For each fixed outcome, define the functions $h_+,h_-:\mathbb R\to\mathbb R$ by \begin{align*} h_+(x)&:=F_n(x)-F(x), \end{align*} and \begin{align*} h_-(x)&:=F(x)-F_n(x). \end{align*} Both $F$ and $F_n$ are right-continuous distribution functions, so $h_+$ and $h_-$ are right-continuous. Since $\mathbb Q$ is dense in $\mathbb R$, for every $x\in\mathbb R$ there is a decreasing sequence $(q_m)_{m\in\mathbb N}$ in $\mathbb Q$ with $q_m\downarrow x$, and right-continuity gives $h_\pm(q_m)\to h_\pm(x)$. Therefore \begin{align*} \sup_{x\in\mathbb R}h_\pm(x)=\sup_{q\in\mathbb Q}h_\pm(q). \end{align*} It follows that $A_+$ and $A_-$ are countable unions of events determined by the real-valued random variables $F_n(q)$, $q\in\mathbb Q$, hence are measurable. Since $|a|>\varepsilon$ is equivalent to $a>\varepsilon$ or $-a>\varepsilon$ for each $a\in\mathbb R$, we have \begin{align*} \left\{\sup_{x\in\mathbb R}|F_n(x)-F(x)|>\varepsilon\right\} \subset A_+\cup A_-. \end{align*} [/step] [step:Apply the sharp one-sided Massart empirical distribution bound to the upper deviation] We use the [Massart sharp one-sided empirical distribution inequality](https://doi.org/10.1214/aop/1015345959) as an external prerequisite: if $Y_1,\dots,Y_n$ are i.i.d. real-valued random variables with distribution function $G$, and if $G_n: \mathbb R\to[0,1]$ is their empirical distribution function, then for every $\varepsilon>0$ the event $\{\sup_{x\in\mathbb R}(G_n(x)-G(x))>\varepsilon\}$ is measurable and \begin{align*} \mathbb P\left(\sup_{x\in\mathbb R}(G_n(x)-G(x))>\varepsilon\right) \le e^{-2n\varepsilon^2}. \end{align*} The hypotheses of this inequality are exactly satisfied with $Y_i=X_i$, $G=F$, and $G_n=F_n$: the theorem statement gives that $X_1,\dots,X_n$ are i.i.d. real-valued random variables with distribution function $F$, and the formalized statement defines $F_n$ as their empirical distribution function. Therefore $\mathbb P(A_+)\le e^{-2n\varepsilon^2}$. [/step] [step:Reflect the sample to bound the lower deviation] For the lower deviation, apply the same upper-tail inequality to the reflected sample $Y_i:=-X_i$. Let $G: \mathbb R\to[0,1]$ be the distribution function of $Y_1$, and let $G_n: \mathbb R\to[0,1]$ be the empirical distribution function of $Y_1,\dots,Y_n$. For each $t\in\mathbb R$ and each $m\in\mathbb N$, where $\mathbb N=\{1,2,3,\dots\}$, set $y_m:=-t-1/m$. Then $\{Y_i\le y_m\}=\{X_i\ge t+1/m\}$ increases to $\{X_i>t\}$. By continuity from below for probability measures, $G(y_m)\uparrow \mathbb P(X_1>t)$. Also, for each fixed outcome, the indicators $\mathbb{1}_{\{Y_i\le y_m\}}=\mathbb{1}_{\{X_i\ge t+1/m\}}$ increase to $\mathbb{1}_{\{X_i>t\}}$, so the finite empirical averages $G_n(y_m)$ increase to $\frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i>t\}}$. Hence \begin{align*} \frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i>t\}}-\mathbb P(X_1>t)=\lim_{m\to\infty}\left(G_n(y_m)-G(y_m)\right). \end{align*} Since each term $G_n(y_m)-G(y_m)$ is bounded above by $\sup_{y\in\mathbb R}(G_n(y)-G(y))$, passing to the limit gives \begin{align*} \frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i>t\}}-\mathbb P(X_1>t)\le \sup_{y\in\mathbb R}(G_n(y)-G(y)). \end{align*} Since $F(t)-F_n(t)=\frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i>t\}}-\mathbb P(X_1>t)$, taking the supremum over $t\in\mathbb R$ and applying the one-sided inequality to $Y_1,\dots,Y_n$ gives $\mathbb P(A_-)\le e^{-2n\varepsilon^2}$. [guided] For the lower deviation, we reduce to the upper-tail inequality by reflecting the sample. Define $Y_i:=-X_i$ for $i\in\{1,\dots,n\}$. The random variables $Y_1,\dots,Y_n$ are again i.i.d. and real-valued. Let $G:\mathbb R\to[0,1]$ be their distribution function and let $G_n:\mathbb R\to[0,1]$ be their empirical distribution function. Applying the [Massart sharp one-sided empirical distribution inequality](https://doi.org/10.1214/aop/1015345959) to $Y_1,\dots,Y_n$ shows that the event $\{\sup_{y\in\mathbb R}(G_n(y)-G(y))>\varepsilon\}$ is measurable and gives \begin{align*} \mathbb P\left(\sup_{y\in\mathbb R}(G_n(y)-G(y))>\varepsilon\right)\le e^{-2n\varepsilon^2}. \end{align*} For $t\in\mathbb R$, \begin{align*} F(t)-F_n(t)=\mathbb P(X_1\le t)-\frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i\le t\}} =\frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i>t\}}-\mathbb P(X_1>t). \end{align*} We now compare the strict upper half-line expression with the reflected empirical process. For each $m\in\mathbb N$, where $\mathbb N=\{1,2,3,\dots\}$, define $y_m:=-t-1/m$. Then \begin{align*} \{Y_i\le y_m\}=\{-X_i\le -t-1/m\}=\{X_i\ge t+1/m\}. \end{align*} As $m\to\infty$, the events $\{X_i\ge t+1/m\}$ increase to $\{X_i>t\}$. Continuity from below for probability measures gives $G(y_m)\uparrow\mathbb P(X_1>t)$, and the corresponding indicators increase pointwise to $\mathbb{1}_{\{X_i>t\}}$. Because the empirical average is a finite sum, this pointwise convergence gives \begin{align*} \frac{1}{n}\sum_{i=1}^n\mathbb{1}_{\{X_i>t\}}-\mathbb P(X_1>t)=\lim_{m\to\infty}\left(G_n(y_m)-G(y_m)\right). \end{align*} Every term $G_n(y_m)-G(y_m)$ is at most $\sup_{y\in\mathbb R}(G_n(y)-G(y))$, so passing to the limit gives \begin{align*} F(t)-F_n(t)\le \sup_{y\in\mathbb R}(G_n(y)-G(y)). \end{align*} Taking the supremum over $t\in\mathbb R$ and then applying the one-sided inequality to the reflected sample gives \begin{align*} \mathbb P(A_-)=\mathbb P\left(\sup_{x\in\mathbb R}(F(x)-F_n(x))>\varepsilon\right)\le e^{-2n\varepsilon^2}. \end{align*} This is the point where the sharp constant for the lower sign is fixed: the lower deviation has been converted into the same one-sided empirical process bound as the upper deviation, so it contributes exactly $e^{-2n\varepsilon^2}$. [/guided] [/step] [step:Combine the one-sided bounds by the union bound] By the union bound applied to the two events $A_+$ and $A_-$, \begin{align*} \mathbb P\left(\sup_{x\in\mathbb R}|F_n(x)-F(x)|>\varepsilon\right) \le \mathbb P(A_+\cup A_-). \end{align*} The union bound gives $\mathbb P(A_+\cup A_-)\le \mathbb P(A_+)+\mathbb P(A_-)$. Using the one-sided estimates from the previous step, \begin{align*} \mathbb P(A_+)+\mathbb P(A_-) \le e^{-2n\varepsilon^2}+e^{-2n\varepsilon^2}=2e^{-2n\varepsilon^2}. \end{align*} Combining the last two inequalities gives \begin{align*} \mathbb P\left(\sup_{x\in\mathbb R}|F_n(x)-F(x)|>\varepsilon\right)\le 2e^{-2n\varepsilon^2}. \end{align*} This is the asserted Dvoretzky-Kiefer-Wolfowitz inequality. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

What brings you to Androma?

Start with a route through the knowledge graph.

Dvoretzky-Kiefer-Wolfowitz Inequality (Theorem # 6300)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Dvoretzky-Kiefer-Wolfowitz Inequality (Theorem # 6300)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further