Consistency of the Cramér-von Mises Test Against Fixed Alternatives (Theorem #6312)

Proof

[proofplan] We compare the empirical discrepancy $F_n-F_0$ with the fixed discrepancy $F-F_0$ using the probability measure induced by the null distribution function $F_0$. The continuity of $F_0$ is part of the usual null calibration for the Cramér-von Mises statistic, while the divergence argument itself uses only the induced weighting measure, boundedness of distribution functions, and bounded critical values. The [Glivenko-Cantelli theorem](/theorems/2004) gives uniform almost sure convergence of the empirical distribution function $F_n$ to $F$, and this implies convergence of the weighted squared discrepancies. Since the limiting discrepancy has strictly positive squared norm, multiplying by $n$ forces the statistic $W_n^2$ to diverge almost surely, and bounded critical values are eventually exceeded with probability tending to $1$. [/proofplan] [step:Convert uniform empirical convergence into weighted $L^2(\mu_0)$ convergence] Let $(\Omega,\mathcal{F},\mathbb{P}_F)$ denote the probability space carrying the sample sequence $(X_n)_{n=1}^{\infty}$ under the true distribution function $F$. Let $\mu_0$ denote the probability measure on $\mathbb{R}$ induced by the distribution function $F_0$. For each $n \in \mathbb{N}$, let $F_n: \mathbb{R} \to [0,1]$ denote the empirical distribution function of the first $n$ observations. Define the function $\Delta: \mathbb{R} \to [-1,1]$ by \begin{align*} \Delta(x)=F(x)-F_0(x). \end{align*} For each $n \in \mathbb{N}$, define the function $\Delta_n: \mathbb{R} \to [-1,1]$ by \begin{align*} \Delta_n(x)=F_n(x)-F_0(x). \end{align*} We invoke the classical [Glivenko-Cantelli theorem](/theorems/2004), applied to the independent identically distributed real-valued random variables $(X_n)_{n=1}^{\infty}$ with distribution function $F$. It gives \begin{align*} \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \to 0 \end{align*} $\mathbb{P}_F$-almost surely. 
For every $x \in \mathbb{R}$ and every $n \in \mathbb{N}$, since distribution functions take values in $[0,1]$, we have $|\Delta_n(x)| \leq 1$ and $|\Delta(x)| \leq 1$. The algebraic identity $a^2-b^2=(a-b)(a+b)$ gives \begin{align*} \left|(\Delta_n(x))^2-(\Delta(x))^2\right|= |\Delta_n(x)-\Delta(x)|\,|\Delta_n(x)+\Delta(x)|. \end{align*} Since $\Delta_n(x)-\Delta(x)=F_n(x)-F(x)$ and $|\Delta_n(x)+\Delta(x)| \leq 2$, it follows that \begin{align*} \left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq 2|F_n(x)-F(x)|. \end{align*} The triangle inequality for integrals gives \begin{align*} \left|\int_{\mathbb{R}}\bigl((\Delta_n(x))^2-(\Delta(x))^2\bigr)\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x). \end{align*} Using the pointwise estimate above gives \begin{align*} \left|\int_{\mathbb{R}}(\Delta_n(x))^2\,d\mu_0(x)-\int_{\mathbb{R}}(\Delta(x))^2\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x). \end{align*} Since $\mu_0(\mathbb{R})=1$, the right-hand side is bounded by \begin{align*} 2\sup_{x \in \mathbb{R}}|F_n(x)-F(x)|. \end{align*} Therefore \begin{align*} \int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x) \to \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x) \end{align*} $\mathbb{P}_F$-almost surely. [guided] The statistic is built from the weighted squared distance between $F_n$ and $F_0$, so the first task is to show that this weighted distance converges to the corresponding fixed distance between $F$ and $F_0$. The theorem statement defines $F_0: \mathbb{R} \to [0,1]$ and $F: \mathbb{R} \to [0,1]$ as distribution functions and $F_n: \mathbb{R} \to [0,1]$ as the empirical distribution function of the sample. The probability space $(\Omega,\mathcal{F},\mathbb{P}_F)$ carries the sample sequence $(X_n)_{n=1}^{\infty}$ under the true distribution function $F$. Let $\mu_0$ be the probability measure on $\mathbb{R}$ induced by the distribution function $F_0$. 
Define the discrepancy function $\Delta: \mathbb{R} \to [-1,1]$ by \begin{align*} \Delta(x)=F(x)-F_0(x). \end{align*} For each $n \in \mathbb{N}$, define the discrepancy function $\Delta_n: \mathbb{R} \to [-1,1]$ by \begin{align*} \Delta_n(x)=F_n(x)-F_0(x). \end{align*} These functions take values in $[-1,1]$ because all distribution functions take values in $[0,1]$. We now invoke the classical [Glivenko-Cantelli theorem](/theorems/2004). Its hypothesis is exactly that $X_1,X_2,\dots$ are independent and identically distributed real-valued random variables with common distribution function $F$, which is the sampling model under the true distribution. Therefore \begin{align*} \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \to 0 \end{align*} $\mathbb{P}_F$-almost surely. Why does [uniform convergence](/page/Uniform%20Convergence) imply convergence of the weighted $L^2(\mu_0)$ discrepancies? For each $x \in \mathbb{R}$, \begin{align*} \Delta_n(x)-\Delta(x)=(F_n(x)-F_0(x))-(F(x)-F_0(x)). \end{align*} After cancelling the two occurrences of $F_0(x)$, this becomes \begin{align*} \Delta_n(x)-\Delta(x)=F_n(x)-F(x). \end{align*} Using the algebraic identity $a^2-b^2=(a-b)(a+b)$ with $a=\Delta_n(x)$ and $b=\Delta(x)$, we obtain \begin{align*} \left|(\Delta_n(x))^2-(\Delta(x))^2\right|= |\Delta_n(x)-\Delta(x)|\,|\Delta_n(x)+\Delta(x)|. \end{align*} Substituting $\Delta_n(x)-\Delta(x)=F_n(x)-F(x)$ and applying the triangle inequality to $\Delta_n(x)+\Delta(x)$ gives \begin{align*} \left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq |F_n(x)-F(x)|\bigl(|\Delta_n(x)|+|\Delta(x)|\bigr). \end{align*} Since $|\Delta_n(x)| \leq 1$ and $|\Delta(x)| \leq 1$, we conclude \begin{align*} \left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq 2|F_n(x)-F(x)|. \end{align*} Now integrate with respect to $\mu_0$. 
Since $\mu_0$ is a probability measure induced by the distribution function $F_0$, we have $\mu_0(\mathbb{R})=1$. First, the triangle inequality for integrals gives \begin{align*} \left|\int_{\mathbb{R}}\bigl((\Delta_n(x))^2-(\Delta(x))^2\bigr)\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x). \end{align*} Using the pointwise bound above gives \begin{align*} \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x) \leq \int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x). \end{align*} Finally, since $|F_n(x)-F(x)| \leq \sup_{y \in \mathbb{R}}|F_n(y)-F(y)|$ for every $x \in \mathbb{R}$, and since $\mu_0(\mathbb{R})=1$, we obtain \begin{align*} \int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x) \leq 2\sup_{y \in \mathbb{R}}|F_n(y)-F(y)|. \end{align*} The right-hand side converges to $0$ $\mathbb{P}_F$-almost surely by Glivenko-Cantelli. Therefore \begin{align*} \int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x) \to \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x) \end{align*} $\mathbb{P}_F$-almost surely. [/guided] [/step] [step:Use the positive limiting discrepancy to force $W_n^2$ to diverge] Define, for each $n \in \mathbb{N}$, \begin{align*} I_n := \int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x). \end{align*} Define the Cramér-von Mises statistic $W_n^2: \Omega \to [0,\infty)$ by \begin{align*} W_n^2 := nI_n. \end{align*} Define the limiting discrepancy $I \in [0,1]$ by \begin{align*} I := \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x). \end{align*} The previous step gives $I_n \to I$ $\mathbb{P}_F$-almost surely, and the hypothesis gives $I>0$. Therefore, on an event $A$ with $\mathbb{P}_F(A)=1$, for every $\omega \in A$ the convergence $I_n(\omega) \to I > I/2$ yields an index $N_1(\omega) \in \mathbb{N}$ such that \begin{align*} I_n(\omega) \geq \frac{I}{2} \end{align*} for all $n \geq N_1(\omega)$. Hence, for all $n \geq N_1(\omega)$, \begin{align*} W_n^2(\omega)=nI_n(\omega)\geq \frac{nI}{2}. \end{align*} Since $nI/2 \to \infty$ as $n \to \infty$, it follows that $W_n^2 \to \infty$ $\mathbb{P}_F$-almost surely. 
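As a numerical sanity check of the two steps so far, the following Python sketch simulates one concrete case. Everything specific in it is an assumption made for illustration and is not part of the theorem: the null $F_0$ is taken to be the Uniform$(0,1)$ distribution function, the alternative is $F(x)=x^a$ on $[0,1]$ with $a=0.5$, and the names `cvm_statistic` and `limiting_discrepancy` are hypothetical. For this pair, $I=\int_0^1 (x^a-x)^2\,dx = \frac{1}{2a+1}-\frac{2}{a+2}+\frac{1}{3}$. The statistic is evaluated with the standard computational formula $W_n^2=\frac{1}{12n}+\sum_{i=1}^n\bigl(F_0(X_{(i)})-\frac{2i-1}{2n}\bigr)^2$, which relies on $F_0$ being continuous.

```python
import random


def cvm_statistic(sample):
    """Return W_n^2 = n * I_n for the assumed null F_0 = Uniform(0, 1).

    Uses the computational formula for a continuous null, with
    F_0(x) = x for x in [0, 1].
    """
    n = len(sample)
    xs = sorted(sample)
    return 1.0 / (12 * n) + sum(
        (x - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def limiting_discrepancy(a):
    """Closed form of I = integral over [0, 1] of (x**a - x)**2 dx."""
    return 1.0 / (2 * a + 1) - 2.0 / (a + 2) + 1.0 / 3.0


rng = random.Random(0)  # fixed seed so the run is reproducible
a = 0.5                 # assumed alternative: F(x) = x**a on [0, 1]
I = limiting_discrepancy(a)

results = []
for n in (100, 1000, 10000, 100000):
    # If U is Uniform(0, 1), then U**(1/a) has distribution function x**a.
    sample = [rng.random() ** (1.0 / a) for _ in range(n)]
    w2 = cvm_statistic(sample)
    results.append((n, w2 / n, w2))
    print(f"n={n:>6}  I_n={w2 / n:.5f}  I={I:.5f}  W_n^2={w2:.2f}  n*I/2={n * I / 2:.2f}")
```

In runs of this sketch, the column `I_n` should settle near `I` while `W_n^2` grows roughly linearly in `n` and stays above `n*I/2` for large `n`, matching the bound derived in this step.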
[/step] [step:Compare the diverging statistic with bounded critical values] Let $(c_n)_{n=1}^{\infty}$ denote the bounded sequence of critical values for the Cramér-von Mises test. By definition, the rejection region at sample size $n$ is exactly $\{W_n^2>c_n\}$. Since the sequence $(c_n)_{n=1}^{\infty}$ is bounded, choose a finite constant $C \in \mathbb{R}$ such that $c_n \leq C$ for every $n \in \mathbb{N}$. On the probability-one event from the previous step, for each $\omega$ there exists $N_2(\omega) \in \mathbb{N}$ such that \begin{align*} W_n^2(\omega) > C \end{align*} for all $n \geq N_2(\omega)$. Since $c_n \leq C$, we have $W_n^2(\omega) > c_n$ for all $n \geq N_2(\omega)$, so the indicator of the rejection region equals $1$ from that index on. This implies \begin{align*} \mathbb{1}_{\{W_n^2>c_n\}}(\omega) \to 1 \end{align*} for $\mathbb{P}_F$-almost every $\omega$. For each $n \in \mathbb{N}$, define the indicator function $Y_n: \Omega \to \{0,1\}$ by $Y_n(\omega)=\mathbb{1}_{\{W_n^2>c_n\}}(\omega)$. The almost sure convergence just proved says $Y_n \to 1$ $\mathbb{P}_F$-almost surely, and the domination condition for the bounded convergence theorem is $0 \leq Y_n \leq 1$, where the constant function $1: \Omega \to \mathbb{R}$, $\omega \mapsto 1$, is integrable because $\mathbb{P}_F$ is a probability measure. Therefore the bounded convergence theorem applies to $(Y_n)_{n=1}^{\infty}$ and gives \begin{align*} \mathbb{P}_F(W_n^2>c_n)=\int_{\Omega}Y_n(\omega)\,d\mathbb{P}_F(\omega) \to \int_{\Omega}1\,d\mathbb{P}_F(\omega)=1. \end{align*} Therefore the rejection probability tends to $1$, so the Cramér-von Mises test with bounded critical values is consistent against the fixed alternative $F$. [/step]
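
The conclusion of the proof can be illustrated with a small Monte Carlo sketch in Python. The setup is an assumption chosen for illustration, not taken from the theorem: the null $F_0$ is the Uniform$(0,1)$ distribution function, the fixed alternative is $F(x)=x^{a}$ on $[0,1]$ with $a=1.5$, and the critical values are held constant at $c_n = 0.461$, an approximate value commonly tabulated for the asymptotic 5% level of the Cramér-von Mises statistic. The function names `cvm_statistic` and `rejection_rate` are hypothetical.

```python
import random

# Assumed constant critical value c_n (approximate asymptotic 5% level).
CRITICAL_VALUE = 0.461


def cvm_statistic(sample):
    """Return W_n^2 for the assumed null F_0 = Uniform(0, 1).

    Computational formula for a continuous null, with F_0(x) = x on [0, 1].
    """
    n = len(sample)
    xs = sorted(sample)
    return 1.0 / (12 * n) + sum(
        (x - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def rejection_rate(n, a, reps, rng):
    """Estimate P_F(W_n^2 > c_n) by simulation when F(x) = x**a on [0, 1]."""
    rejections = 0
    for _ in range(reps):
        # If U is Uniform(0, 1), then U**(1/a) has distribution function x**a.
        sample = [rng.random() ** (1.0 / a) for _ in range(n)]
        if cvm_statistic(sample) > CRITICAL_VALUE:
            rejections += 1
    return rejections / reps


rng = random.Random(1)  # fixed seed so the run is reproducible
rates = {n: rejection_rate(n, a=1.5, reps=200, rng=rng) for n in (10, 100, 2000)}
for n, rate in rates.items():
    print(f"n={n:>5}  estimated rejection probability = {rate:.3f}")
```

The estimated rejection probability should rise toward $1$ as $n$ grows, which is the behaviour the theorem guarantees in the limit. The simulation says nothing about the rate of convergence beyond this one assumed alternative.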
