[proofplan]
We compare the empirical discrepancy $F_n-F_0$ with the fixed discrepancy $F-F_0$ using the probability measure induced by the null distribution function $F_0$. The continuity of $F_0$ is part of the usual null calibration for the Cramér-von Mises statistic, while the divergence argument itself uses only the induced weighting measure, boundedness of distribution functions, and bounded critical values. The [Glivenko-Cantelli theorem](/theorems/2004) gives uniform almost sure convergence of the empirical distribution function $F_n$ to $F$, and this implies convergence of the squared weighted discrepancies. Since the limiting discrepancy has strictly positive squared norm, multiplying by $n$ forces the statistic $W_n^2$ to diverge almost surely, and bounded critical values are eventually exceeded with probability tending to $1$.
[/proofplan]
[step:Convert uniform empirical convergence into weighted $L^2(\mu_0)$ convergence]
Let $(\Omega,\mathcal{F},\mathbb{P}_F)$ denote the probability space carrying the sample sequence $(X_n)_{n=1}^{\infty}$ under the true distribution function $F$. Let $\mu_0$ denote the probability measure on $\mathbb{R}$ induced by the distribution function $F_0$. For each $n \in \mathbb{N}$, let $F_n: \mathbb{R} \to [0,1]$ denote the empirical distribution function of the first $n$ observations. Define the function $\Delta: \mathbb{R} \to [-1,1]$ by
\begin{align*}
\Delta(x)=F(x)-F_0(x).
\end{align*}
For each $n \in \mathbb{N}$, define the function $\Delta_n: \mathbb{R} \to [-1,1]$ by
\begin{align*}
\Delta_n(x)=F_n(x)-F_0(x).
\end{align*}
We invoke the classical [Glivenko-Cantelli theorem](/theorems/2004), applied to the independent identically distributed real-valued random variables $(X_n)_{n=1}^{\infty}$ with distribution function $F$. It gives
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \to 0
\end{align*}
$\mathbb{P}_F$-almost surely.
For every $x \in \mathbb{R}$ and every $n \in \mathbb{N}$, since distribution functions take values in $[0,1]$, we have $|\Delta_n(x)| \leq 1$ and $|\Delta(x)| \leq 1$. The algebraic identity $a^2-b^2=(a-b)(a+b)$ gives
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right|= |\Delta_n(x)-\Delta(x)|\,|\Delta_n(x)+\Delta(x)|.
\end{align*}
Since $\Delta_n(x)-\Delta(x)=F_n(x)-F(x)$ and $|\Delta_n(x)+\Delta(x)| \leq 2$, it follows that
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq 2|F_n(x)-F(x)|.
\end{align*}
The triangle inequality for integrals gives
\begin{align*}
\left|\int_{\mathbb{R}}\bigl((\Delta_n(x))^2-(\Delta(x))^2\bigr)\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x).
\end{align*}
Using the pointwise estimate above gives
\begin{align*}
\left|\int_{\mathbb{R}}(\Delta_n(x))^2\,d\mu_0(x)-\int_{\mathbb{R}}(\Delta(x))^2\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x).
\end{align*}
Since $\mu_0(\mathbb{R})=1$, the right-hand side is bounded by
\begin{align*}
2\sup_{x \in \mathbb{R}}|F_n(x)-F(x)|.
\end{align*}
Since this bound tends to $0$ $\mathbb{P}_F$-almost surely by the Glivenko-Cantelli theorem, it follows that
\begin{align*}
\int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x) \to \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x)
\end{align*}
$\mathbb{P}_F$-almost surely.
[guided]
The statistic is built from the weighted squared distance between $F_n$ and $F_0$, so the first task is to show that this weighted distance converges to the corresponding fixed distance between $F$ and $F_0$. The theorem statement defines $F_0: \mathbb{R} \to [0,1]$ and $F: \mathbb{R} \to [0,1]$ as distribution functions and $F_n: \mathbb{R} \to [0,1]$ as the empirical distribution function of the sample. The probability space $(\Omega,\mathcal{F},\mathbb{P}_F)$ carries the sample sequence $(X_n)_{n=1}^{\infty}$ under the true distribution function $F$. Let $\mu_0$ be the probability measure on $\mathbb{R}$ induced by the distribution function $F_0$. Define the discrepancy function $\Delta: \mathbb{R} \to [-1,1]$ by
\begin{align*}
\Delta(x)=F(x)-F_0(x).
\end{align*}
For each $n \in \mathbb{N}$, define the discrepancy function $\Delta_n: \mathbb{R} \to [-1,1]$ by
\begin{align*}
\Delta_n(x)=F_n(x)-F_0(x).
\end{align*}
These functions take values in $[-1,1]$ because all distribution functions take values in $[0,1]$.
We now invoke the classical [Glivenko-Cantelli theorem](/theorems/2004). Its hypothesis is exactly that $X_1,X_2,\dots$ are independent and identically distributed real-valued random variables with common distribution function $F$, which is the sampling model under the true distribution. Therefore
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \to 0
\end{align*}
$\mathbb{P}_F$-almost surely.
Why does [uniform convergence](/page/Uniform%20Convergence) imply convergence of the weighted $L^2(\mu_0)$ discrepancies? For each $x \in \mathbb{R}$,
\begin{align*}
\Delta_n(x)-\Delta(x)=(F_n(x)-F_0(x))-(F(x)-F_0(x)).
\end{align*}
After cancelling the two occurrences of $F_0(x)$, this becomes
\begin{align*}
\Delta_n(x)-\Delta(x)=F_n(x)-F(x).
\end{align*}
Taking absolute values in the algebraic identity $a^2-b^2=(a-b)(a+b)$ with $a=\Delta_n(x)$ and $b=\Delta(x)$, we obtain
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right|= |\Delta_n(x)-\Delta(x)|\,|\Delta_n(x)+\Delta(x)|.
\end{align*}
Substituting $\Delta_n(x)-\Delta(x)=F_n(x)-F(x)$ and applying the triangle inequality to $\Delta_n(x)+\Delta(x)$ gives
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq |F_n(x)-F(x)|\bigl(|\Delta_n(x)|+|\Delta(x)|\bigr).
\end{align*}
Since $|\Delta_n(x)| \leq 1$ and $|\Delta(x)| \leq 1$, we conclude
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq 2|F_n(x)-F(x)|.
\end{align*}
Now integrate with respect to $\mu_0$, which is a probability measure because it is induced by the distribution function $F_0$; in particular $\mu_0(\mathbb{R})=1$. First, the triangle inequality for integrals gives
\begin{align*}
\left|\int_{\mathbb{R}}\bigl((\Delta_n(x))^2-(\Delta(x))^2\bigr)\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x).
\end{align*}
Using the pointwise bound above gives
\begin{align*}
\int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x) \leq \int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x).
\end{align*}
Finally, since $|F_n(x)-F(x)| \leq \sup_{y \in \mathbb{R}}|F_n(y)-F(y)|$ for every $x \in \mathbb{R}$, and since $\mu_0(\mathbb{R})=1$, we obtain
\begin{align*}
\int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x) \leq 2\sup_{y \in \mathbb{R}}|F_n(y)-F(y)|.
\end{align*}
The right-hand side converges to $0$ $\mathbb{P}_F$-almost surely by Glivenko-Cantelli. Therefore
\begin{align*}
\int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x) \to \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x)
\end{align*}
$\mathbb{P}_F$-almost surely.
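As a hedged illustration of the limit, consider a hypothetical pair that is not part of the theorem: assume $F_0(x)=x$ and $F(x)=x^2$ for $x \in [0,1]$, so that $\mu_0$ is Lebesgue measure on $[0,1]$. Under these assumptions the limiting integral can be evaluated directly:
\begin{align*}
\int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x)=\int_0^1 (x^2-x)^2\,dx=\frac{1}{5}-\frac{1}{2}+\frac{1}{3}=\frac{1}{30}.
\end{align*}
By the bound above, on any sample path and at any $n$ with $\sup_{y \in \mathbb{R}}|F_n(y)-F(y)| \leq 1/120$, the empirical integral lies within $1/60$ of $1/30$.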
[/guided]
[/step]
[step:Use the positive limiting discrepancy to force $W_n^2$ to diverge]
Define, for each $n \in \mathbb{N}$,
\begin{align*}
I_n := \int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x).
\end{align*}
Define the Cramér-von Mises statistic $W_n^2: \Omega \to [0,\infty)$ by
\begin{align*}
W_n^2 := nI_n.
\end{align*}
Define the limiting discrepancy $I \in [0,1]$ by
\begin{align*}
I := \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x).
\end{align*}
The previous step gives $I_n \to I$ $\mathbb{P}_F$-almost surely, and the hypothesis gives $I>0$. Therefore, on an event $A$ with $\mathbb{P}_F(A)=1$, for every $\omega \in A$ there exists $N_1(\omega) \in \mathbb{N}$ such that
\begin{align*}
I_n(\omega) \geq \frac{I}{2}
\end{align*}
for all $n \geq N_1(\omega)$. Hence, for all $n \geq N_1(\omega)$,
\begin{align*}
W_n^2(\omega)=nI_n(\omega)\geq \frac{nI}{2}.
\end{align*}
Thus $W_n^2 \to \infty$ $\mathbb{P}_F$-almost surely.
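For a sense of scale, take the illustrative assumption (not part of the theorem) that $F_0(x)=x$ and $F(x)=x^2$ on $[0,1]$. Then $F(x)-F_0(x)=x^2-x$ on $[0,1]$ and
\begin{align*}
I=\int_0^1 (x^2-x)^2\,dx=\frac{1}{30}, \qquad W_n^2(\omega) \geq \frac{n}{60} \quad \text{for all } n \geq N_1(\omega).
\end{align*}
So under this assumed alternative the statistic eventually grows at least linearly in $n$, with slope $1/60$.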
[/step]
[step:Compare the diverging statistic with bounded critical values]
Let $(c_n)_{n=1}^{\infty}$ denote the bounded sequence of critical values for the Cramér-von Mises test. By definition, the rejection region at sample size $n$ is exactly $\{W_n^2>c_n\}$. Since the sequence $(c_n)_{n=1}^{\infty}$ is bounded, choose a finite constant $C \in \mathbb{R}$ such that $c_n \leq C$ for every $n \in \mathbb{N}$. On the probability-one event from the previous step, for each $\omega$ there exists $N_2(\omega) \in \mathbb{N}$ such that
\begin{align*}
W_n^2(\omega) > C
\end{align*}
for all $n \geq N_2(\omega)$. Since $c_n \leq C$, this implies
\begin{align*}
\mathbb{1}_{\{W_n^2>c_n\}}(\omega) \to 1
\end{align*}
for $\mathbb{P}_F$-almost every $\omega$. For each $n \in \mathbb{N}$, define the indicator function $Y_n: \Omega \to \{0,1\}$ by $Y_n(\omega)=\mathbb{1}_{\{W_n^2>c_n\}}(\omega)$. The almost sure convergence just proved says $Y_n \to 1$ $\mathbb{P}_F$-almost surely, and the domination condition for the bounded convergence theorem is $0 \leq Y_n \leq 1$, where the constant function $1: \Omega \to \mathbb{R}$, $\omega \mapsto 1$, is integrable because $\mathbb{P}_F$ is a probability measure. Therefore the bounded convergence theorem applies to $(Y_n)_{n=1}^{\infty}$ and gives
\begin{align*}
\mathbb{P}_F(W_n^2>c_n)=\int_{\Omega}Y_n(\omega)\,d\mathbb{P}_F(\omega) \to \int_{\Omega}1\,d\mathbb{P}_F(\omega)=1.
\end{align*}
Therefore the rejection probability tends to $1$, so the Cramér-von Mises test with bounded critical values is consistent against the fixed alternative $F$.
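To make the last comparison concrete, suppose purely for illustration that $I=1/30$ (the value obtained when $F_0(x)=x$ and $F(x)=x^2$ on $[0,1]$) and that the critical values satisfy $c_n \leq C=1$; both numbers are hypothetical. The lower bound $W_n^2(\omega) \geq nI/2$ from the previous step then reads
\begin{align*}
W_n^2(\omega) \geq \frac{n}{60} > 1 = C \geq c_n \quad \text{for all } n \geq \max\{N_1(\omega),61\},
\end{align*}
so one admissible choice is $N_2(\omega)=\max\{N_1(\omega),61\}$.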
[/step]