[proofplan]
We first use the probability integral transform to rewrite the statistic, which is defined in terms of the original distribution $F_0$, as a functional of the empirical distribution function of i.i.d. uniform random variables on $[0,1]$. In that form, the statistic is the squared $L^2([0,1])$ norm of the uniform empirical process. Donsker's theorem gives [weak convergence](/page/Weak%20Convergence) of this empirical process to a standard Brownian bridge in the uniform topology, and the map sending a path to the integral of its square is continuous for that topology. The conclusion then follows from the continuous mapping theorem.
[/proofplan]
[step:Transform the sample to uniform random variables]
Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which the random variables $(X_i)_{i\in\mathbb N}$ are defined. Let $\mathcal B([0,1])$ denote the Borel $\sigma$-algebra of the interval $[0,1]$ with its Euclidean [subspace topology](/page/Subspace%20Topology). For each $i \in \mathbb{N}$, define the measurable map $U_i:(\Omega,\mathcal F)\to([0,1],\mathcal B([0,1]))$ by $U_i(\omega):=F_0(X_i(\omega))$. Since $F_0$ is continuous and $X_i$ has distribution function $F_0$, the [Probability Integral Transform](/theorems/1139) gives $U_i \sim \operatorname{Unif}(0,1)$ for each $i$, where $\operatorname{Unif}(0,1)$ denotes the probability measure $\mathcal L^1|_{[0,1]}$ on $([0,1],\mathcal B([0,1]))$. Since each $U_i$ is a measurable function of $X_i$, and the random variables $(X_i)_{i\in\mathbb N}$ are independent, the random variables $(U_i)_{i\in\mathbb N}$ are independent. Thus $(U_i)_{i\in\mathbb N}$ are i.i.d. with distribution $\operatorname{Unif}(0,1)$.
Define the uniform empirical distribution function $H_n:[0,1]\to[0,1]$ by
\begin{align*}
H_n(t)=\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{[0,t]}(U_i)
\end{align*}
for $t\in[0,1]$.
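As a small numerical illustration of this step, the following Python sketch transforms a simulated sample and evaluates $H_n$. The choice $F_0(x)=1-e^{-x}$ (the standard exponential distribution), the seed, and the helper names `F0` and `empirical_cdf` are assumptions made only for this example and are not part of the proof.

```python
import math
import random

def F0(x):
    """Assumed example CDF: standard exponential, continuous on the real line."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def empirical_cdf(u, t):
    """H_n(t) = (1/n) * #{i : u_i <= t}, matching the indicator of [0, t]."""
    return sum(1 for ui in u if ui <= t) / len(u)

rng = random.Random(0)                              # fixed seed, illustration only
x = [rng.expovariate(1.0) for _ in range(10_000)]   # X_i with distribution function F0
u = [F0(xi) for xi in x]                            # U_i = F0(X_i)

# For a uniform sample, H_n(t) should be close to t.
for t in (0.1, 0.5, 0.9):
    print(t, empirical_cdf(u, t))
```

The printed values of $H_n(t)$ should lie close to $t$, which is what one expects when the transformed sample is uniform on $[0,1]$.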
[/step]
[step:Rewrite the Cramér-von Mises statistic as a functional of the uniform empirical process]
For each $n\in\mathbb N$, define the empirical distribution function of the original sample $F_n:\mathbb R\to[0,1]$ by
\begin{align*}
F_n(x)=\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{(-\infty,x]}(X_i)
\end{align*}
for $x\in\mathbb R$. The Cramér-von Mises statistic is the real-valued [random variable](/page/Random%20Variable) $W_n^2:\Omega\to\mathbb R$ defined by
\begin{align*}
W_n^2=n\int_{\mathbb R}\bigl(F_n(x)-F_0(x)\bigr)^2\,dF_0(x),
\end{align*}
where $F_0$ also denotes the probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ induced by the distribution function $F_0$.
We claim that, almost surely,
\begin{align*}
W_n^2
=
\int_0^1 \left(\sqrt{n}\bigl(H_n(t)-t\bigr)\right)^2\,d\mathcal{L}^1(t).
\end{align*}
Let $\mu_0$ denote the probability measure on $(\mathbb R,\mathcal B(\mathbb R))$ induced by the distribution function $F_0$. Define the pushforward measure $(F_0)_\#\mu_0$ on $([0,1],\mathcal B([0,1]))$ by
\begin{align*}
(F_0)_\#\mu_0(E)=\mu_0(F_0^{-1}(E))
\end{align*}
for every $E\in\mathcal B([0,1])$. Since $F_0$ is continuous, the [Probability Integral Transform](/theorems/1139) gives $(F_0)_\#\mu_0=\mathcal{L}^1|_{[0,1]}$. Equivalently, for every bounded Borel function $\varphi: [0,1]\to\mathbb{R}$,
\begin{align*}
\int_{\mathbb{R}} \varphi(F_0(x))\,d\mu_0(x)
=
\int_0^1 \varphi(t)\,d\mathcal{L}^1(t).
\end{align*}
For each fixed $\omega\in\Omega$ and each $i\in\{1,\dots,n\}$, define
\begin{align*}
A_i(\omega):=\{x\in\mathbb R: \mathbb{1}_{(-\infty,x]}(X_i(\omega))\ne \mathbb{1}_{[0,F_0(x)]}(F_0(X_i(\omega)))\}.
\end{align*}
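If $x\in A_i(\omega)$, then $x<X_i(\omega)$, because $X_i(\omega)\le x$ would force both indicators to equal $1$ by monotonicity of $F_0$. The second indicator must then equal $1$, so $F_0(X_i(\omega))\le F_0(x)\le F_0(X_i(\omega))$, that is, $F_0(x)=F_0(X_i(\omega))$. Hence $A_i(\omega)\subset F_0^{-1}(\{F_0(X_i(\omega))\})$. The pushforward identity above gives
\begin{align*}
\mu_0\bigl(F_0^{-1}(\{F_0(X_i(\omega))\})\bigr)=\mathcal L^1(\{F_0(X_i(\omega))\})=0,
\end{align*}
so $\mu_0(A_i(\omega))=0$. The finite union $A(\omega):=\bigcup_{i=1}^n A_i(\omega)$ therefore also has $\mu_0$-measure zero, that is, it is an $F_0$-null set. By the definition of the sets $A_i(\omega)$ and of $U_i=F_0(X_i)$, for every $x\in\mathbb R\setminus A(\omega)$ we have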
\begin{align*}
F_n(x)=H_n(F_0(x)).
\end{align*}
Since $H_n$ is a bounded step function, the map $\varphi:[0,1]\to\mathbb{R}$ defined by $\varphi(t):=\bigl(H_n(t)-t\bigr)^2$ is bounded and Borel, so the preceding change of variables applies to it. By the definition of $W_n^2$,
\begin{align*}
W_n^2=n\int_{\mathbb{R}}\bigl(F_n(x)-F_0(x)\bigr)^2\,dF_0(x).
\end{align*}
Substituting $F_n(x)=H_n(F_0(x))$ for $F_0$-almost every $x$ gives
\begin{align*}
W_n^2=n\int_{\mathbb{R}}\bigl(H_n(F_0(x))-F_0(x)\bigr)^2\,dF_0(x).
\end{align*}
The pushforward change of variables for $F_0$ gives
\begin{align*}
W_n^2=n\int_0^1 \bigl(H_n(t)-t\bigr)^2\,d\mathcal{L}^1(t).
\end{align*}
Rewriting the factor $n$ inside the square yields
\begin{align*}
W_n^2=\int_0^1 \left(\sqrt{n}\bigl(H_n(t)-t\bigr)\right)^2\,d\mathcal{L}^1(t).
\end{align*}
[guided]
The point of this step is to remove the unknown distribution function $F_0$ from the statistic. We do this by using $F_0$ itself as a change of variables. Define the map $\varphi:[0,1]\to\mathbb{R}$ by $\varphi(t):=\bigl(H_n(t)-t\bigr)^2$.
Because $F_0$ is continuous, the probability integral transform says that $F_0(X)$ is uniformly distributed on $[0,1]$ whenever $X$ has distribution function $F_0$. In measure-theoretic language, the pushforward of the probability measure $dF_0$ under the map $F_0:\mathbb{R}\to[0,1]$ is one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $[0,1]$. Therefore
\begin{align*}
\int_{\mathbb{R}} \varphi(F_0(x))\,dF_0(x)
=
\int_0^1 \varphi(t)\,d\mathcal{L}^1(t).
\end{align*}
We also need to compare the empirical distribution function $F_n$ of the original sample with the empirical distribution function $H_n$ of the transformed sample. For $F_0$-almost every $x \in \mathbb{R}$, the events $\{X_i \le x\}$ and $\{F_0(X_i)\le F_0(x)\}$ agree. The only possible failure comes from flat parts of the continuous distribution function, and such intervals carry zero $F_0$-measure. Hence, for $F_0$-almost every $x$,
\begin{align*}
F_n(x)
=
\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{(-\infty,x]}(X_i)
=
\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{[0,F_0(x)]}(U_i)
=
H_n(F_0(x)).
\end{align*}
Recall that, by definition,
\begin{align*}
W_n^2=n\int_{\mathbb{R}}\bigl(F_n(x)-F_0(x)\bigr)^2\,dF_0(x).
\end{align*}
Using $F_n(x)=H_n(F_0(x))$ for $F_0$-almost every $x$, we get
\begin{align*}
W_n^2=n\int_{\mathbb{R}}\bigl(H_n(F_0(x))-F_0(x)\bigr)^2\,dF_0(x).
\end{align*}
Now apply the pushforward change of variables with the displayed function $\varphi$. This yields
\begin{align*}
W_n^2=n\int_0^1 \bigl(H_n(t)-t\bigr)^2\,d\mathcal{L}^1(t).
\end{align*}
Rewriting the factor $n$ inside the square gives
\begin{align*}
W_n^2=\int_0^1 \left(\sqrt{n}\bigl(H_n(t)-t\bigr)\right)^2\,d\mathcal{L}^1(t).
\end{align*}
This is the desired reduction: the statistic is now a deterministic functional of the uniform empirical process.
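The reduced form can be checked numerically. The sketch below evaluates $n\int_0^1 (H_n(t)-t)^2\,dt$ exactly, piece by piece between consecutive order statistics, and compares it with the standard computing formula $W_n^2=\frac{1}{12n}+\sum_{i=1}^n\bigl(U_{(i)}-\frac{2i-1}{2n}\bigr)^2$. That formula is not stated in this proof and is used here only as an independent cross-check; the function names `cvm_integral` and `cvm_classical` are hypothetical.

```python
import random

def cvm_integral(u):
    """n * integral over [0,1] of (H_n(t) - t)^2 dt, computed exactly piecewise."""
    n = len(u)
    knots = [0.0] + sorted(u) + [1.0]
    total = 0.0
    for k in range(n + 1):
        a, b = knots[k], knots[k + 1]   # H_n equals k/n on [a, b)
        c = k / n
        # Exact antiderivative of (t - c)^2 evaluated between a and b.
        total += ((b - c) ** 3 - (a - c) ** 3) / 3.0
    return n * total

def cvm_classical(u):
    """Computing formula 1/(12n) + sum_i (u_(i) - (2i-1)/(2n))^2 (assumed, see lead-in)."""
    n = len(u)
    s = sorted(u)
    # i runs over 0..n-1 here, so (2i+1) plays the role of (2i-1) in 1-based indexing.
    return 1.0 / (12 * n) + sum((s[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))

rng = random.Random(0)                     # fixed seed, illustration only
sample = [rng.random() for _ in range(50)]
print(cvm_integral(sample), cvm_classical(sample))
```

The two printed numbers should agree up to floating-point rounding, for any sample of distinct points in $[0,1]$.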
[/guided]
[/step]
[step:Apply Donsker convergence to the uniform empirical process]
Let $\ell^\infty([0,1])$ denote the [vector space](/page/Vector%20Space) of bounded Borel maps $g:[0,1]\to\mathbb R$, equipped with the uniform norm
\begin{align*}
\|g\|_\infty := \sup_{t\in[0,1]} |g(t)|.
\end{align*}
For each $n \in \mathbb{N}$, define the empirical process $G_n:[0,1]\to\mathbb{R}$ by
\begin{align*}
G_n(t)=\sqrt{n}\bigl(H_n(t)-t\bigr)
\end{align*}
for $t\in[0,1]$.
The maps $G_n$ belong to $\ell^\infty([0,1])$ because each $H_n$ is a bounded step function. We use the version of [Donsker's Invariance Principle](/theorems/1189) for the uniform empirical distribution process which directly gives convergence in distribution in $\ell^\infty([0,1])$ equipped with the uniform norm, including the usual empirical-process measurability and topology conventions for this separable limit. Its hypotheses are satisfied for the class of indicator functions $\{\mathbb{1}_{[0,t]}:t\in[0,1]\}$ because $(U_i)_{i\in\mathbb N}$ are i.i.d. with common law $\operatorname{Unif}(0,1)$. Therefore
\begin{align*}
G_n \xrightarrow{d} B
\end{align*}
in $\ell^\infty([0,1])$ equipped with $\|\cdot\|_\infty$, where the limiting process $B:[0,1]\to\mathbb{R}$ is a standard Brownian bridge.
[guided]
We now invoke the empirical-process limit theorem in exactly the setting to which it applies. The space $\ell^\infty([0,1])$ is the vector space of bounded Borel maps $g:[0,1]\to\mathbb R$ with norm
\begin{align*}
\|g\|_\infty := \sup_{t\in[0,1]} |g(t)|.
\end{align*}
For each $n\in\mathbb N$, the empirical distribution function $H_n$ is a bounded step function, so
\begin{align*}
G_n(t)=\sqrt n\bigl(H_n(t)-t\bigr),\qquad t\in[0,1],
\end{align*}
defines an element of $\ell^\infty([0,1])$.
We use the version of [Donsker's Invariance Principle](/theorems/1189) for the uniform empirical distribution process that directly states convergence in distribution in $\ell^\infty([0,1])$ with the uniform norm. This formulation includes the usual empirical-process measurability and topology conventions, so the convergence statement is interpreted in the standard separable version of the uniform empirical process. This version requires an i.i.d. sample with common uniform distribution on $[0,1]$ and considers the empirical process indexed by intervals $[0,t]$. Those hypotheses have been verified in the first step: $(U_i)_{i\in\mathbb N}$ are i.i.d. with distribution $\operatorname{Unif}(0,1)$, and $H_n(t)=n^{-1}\sum_{i=1}^n\mathbb{1}_{[0,t]}(U_i)$ is exactly the empirical distribution function indexed by the interval $[0,t]$. Hence the theorem gives
\begin{align*}
G_n \xrightarrow{d} B
\end{align*}
in the uniform norm topology on $\ell^\infty([0,1])$, where $B:[0,1]\to\mathbb R$ is a standard Brownian bridge. This is the probabilistic limit input needed before applying the deterministic squared-integral functional.
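A simulation cannot verify weak convergence, but it can check that the second moments of $G_n$ match those of a Brownian bridge, whose covariance is $\min(s,t)-st$. The following Python sketch does this at two fixed points; the sample size, number of replications, seed, and the helper name `G` are assumptions chosen only for illustration.

```python
import math
import random

def G(u, t):
    """Uniform empirical process G_n(t) = sqrt(n) * (H_n(t) - t)."""
    n = len(u)
    h = sum(1 for ui in u if ui <= t) / n
    return math.sqrt(n) * (h - t)

rng = random.Random(1)                 # fixed seed, illustration only
n, reps, s, t = 200, 2000, 0.3, 0.7
pairs = []
for _ in range(reps):
    u = [rng.random() for _ in range(n)]
    pairs.append((G(u, s), G(u, t)))

mean_s = sum(a for a, _ in pairs) / reps
mean_t = sum(b for _, b in pairs) / reps
var_s = sum((a - mean_s) ** 2 for a, _ in pairs) / reps
cov_st = sum((a - mean_s) * (b - mean_t) for a, b in pairs) / reps

print(var_s, s * (1 - s))     # compare with the bridge variance s(1 - s)
print(cov_st, s * (1 - t))    # compare with the bridge covariance min(s, t) - s*t
```

Agreement of these moments is consistent with the Donsker limit but is much weaker than convergence in $\ell^\infty([0,1])$.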
[/guided]
[/step]
[step:Use continuity of the squared integral functional]
Define the functional $\Phi:\ell^\infty([0,1])\to\mathbb{R}$ by
\begin{align*}
\Phi(g)=\int_0^1 g(t)^2\,d\mathcal{L}^1(t)
\end{align*}
for $g\in\ell^\infty([0,1])$.
We show that $\Phi$ is continuous with respect to the uniform norm. Let $g,h \in \ell^\infty([0,1])$. By linearity of the integral,
\begin{align*}
|\Phi(g)-\Phi(h)|=\left|\int_0^1 \bigl(g(t)^2-h(t)^2\bigr)\,d\mathcal{L}^1(t)\right|.
\end{align*}
Using the identity $g(t)^2-h(t)^2=(g(t)-h(t))(g(t)+h(t))$ and the triangle inequality for the [Lebesgue integral](/page/Lebesgue%20Integral), we get
\begin{align*}
|\Phi(g)-\Phi(h)|\le\int_0^1 |g(t)-h(t)|\,|g(t)+h(t)|\,d\mathcal{L}^1(t).
\end{align*}
Using the definitions of the uniform norms of $g-h$, $g$, and $h$, we obtain
\begin{align*}
|\Phi(g)-\Phi(h)|\le\|g-h\|_\infty\bigl(\|g\|_\infty+\|h\|_\infty\bigr)\mathcal{L}^1([0,1]).
\end{align*}
Since $\mathcal{L}^1([0,1])=1$, this becomes
\begin{align*}
|\Phi(g)-\Phi(h)|\le\|g-h\|_\infty\bigl(\|g\|_\infty+\|h\|_\infty\bigr).
\end{align*}
Thus if $h \to g$ in $\|\cdot\|_\infty$, then $\|h\|_\infty \to \|g\|_\infty$ by the [reverse triangle inequality](/theorems/2300), and the displayed estimate implies $\Phi(h)\to \Phi(g)$. Therefore $\Phi$ is continuous on $\ell^\infty([0,1])$.
[guided]
We need continuity of the map that turns a sample path into the squared integral appearing in the limit. Define $\Phi:\ell^\infty([0,1])\to\mathbb R$ by
\begin{align*}
\Phi(g)=\int_0^1 g(t)^2\,d\mathcal{L}^1(t)
\end{align*}
for $g\in\ell^\infty([0,1])$. This integral is finite because $g$ is bounded and $\mathcal L^1([0,1])=1$.
Let $g,h\in\ell^\infty([0,1])$. To compare $\Phi(g)$ and $\Phi(h)$, use the algebraic identity $g(t)^2-h(t)^2=(g(t)-h(t))(g(t)+h(t))$ for each $t\in[0,1]$. Taking absolute values gives
\begin{align*}
|\Phi(g)-\Phi(h)|
=\left|\int_0^1 \bigl(g(t)^2-h(t)^2\bigr)\,d\mathcal{L}^1(t)\right|.
\end{align*}
Applying the triangle inequality for the Lebesgue integral gives
\begin{align*}
|\Phi(g)-\Phi(h)|
\le\int_0^1 |g(t)-h(t)|\,|g(t)+h(t)|\,d\mathcal{L}^1(t).
\end{align*}
By the definition of the uniform norm,
\begin{align*}
|g(t)-h(t)|\le \|g-h\|_\infty
\end{align*}
and
\begin{align*}
|g(t)+h(t)|\le \|g\|_\infty+\|h\|_\infty
\end{align*}
for every $t\in[0,1]$. Therefore
\begin{align*}
|\Phi(g)-\Phi(h)|
\le\|g-h\|_\infty\bigl(\|g\|_\infty+\|h\|_\infty\bigr)\mathcal{L}^1([0,1]).
\end{align*}
Since $\mathcal L^1([0,1])=1$, this becomes
\begin{align*}
|\Phi(g)-\Phi(h)|
\le\|g-h\|_\infty\bigl(\|g\|_\infty+\|h\|_\infty\bigr).
\end{align*}
Now suppose $h\to g$ in the uniform norm. The reverse triangle inequality gives $\|h\|_\infty\to\|g\|_\infty$, so the factor $\|g\|_\infty+\|h\|_\infty$ remains bounded while $\|g-h\|_\infty\to0$. The displayed estimate then gives $\Phi(h)\to\Phi(g)$. Hence $\Phi$ is continuous on $\ell^\infty([0,1])$.
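The estimate can be illustrated on a grid. In the Python sketch below the integral is replaced by a grid average and the supremum by a maximum over the grid, so this is a discretised stand-in for $\Phi$ and $\|\cdot\|_\infty$ rather than the functional itself. The test functions, the grid size, and the names `phi`, `sup_norm`, and `bound_check` are assumptions made for the example.

```python
import math

def sup_norm(g):
    """Maximum absolute value over the grid (discrete stand-in for the uniform norm)."""
    return max(abs(v) for v in g)

def phi(g):
    """Grid average of g^2 (discrete stand-in for the integral of g(t)^2 over [0,1])."""
    return sum(v * v for v in g) / len(g)

m = 1000
grid = [(j + 0.5) / m for j in range(m)]
g = [math.sin(2 * math.pi * t) for t in grid]

def bound_check(eps):
    """Return both sides of |Phi(g) - Phi(h)| <= ||g - h|| (||g|| + ||h||) for a perturbation h."""
    h = [gv + eps * math.cos(5 * t) for gv, t in zip(g, grid)]
    diff = [a - b for a, b in zip(g, h)]
    lhs = abs(phi(g) - phi(h))
    rhs = sup_norm(diff) * (sup_norm(g) + sup_norm(h))
    return lhs, rhs

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, *bound_check(eps))
```

As the perturbation size shrinks, both sides shrink with it, and the left side never exceeds the right.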
[/guided]
[/step]
[step:Conclude by the continuous mapping theorem]
By the preceding step, $\Phi$ is continuous on $\ell^\infty([0,1])$. Since $G_n \xrightarrow{d} B$ in $\ell^\infty([0,1])$, the [Continuous Mapping Theorem](/theorems/1847) gives
\begin{align*}
\Phi(G_n) \xrightarrow{d} \Phi(B).
\end{align*}
By the definition of $\Phi$ and the reduction already proved,
\begin{align*}
\Phi(G_n)=\int_0^1 \left(\sqrt{n}\bigl(H_n(t)-t\bigr)\right)^2\,d\mathcal{L}^1(t)=W_n^2.
\end{align*}
Also,
\begin{align*}
\Phi(B)=\int_0^1 B(t)^2\,d\mathcal{L}^1(t).
\end{align*}
Therefore
\begin{align*}
W_n^2 \xrightarrow{d} \int_0^1 B(t)^2\,d\mathcal{L}^1(t),
\end{align*}
which proves the theorem.
[guided]
The final step is an application of the [Continuous Mapping Theorem](/theorems/1847). Its input is convergence in distribution in a topological function space, and its hypothesis on the map is continuity. We have both ingredients: the previous step gives
\begin{align*}
G_n \xrightarrow{d} B
\end{align*}
in $\ell^\infty([0,1])$, and the squared-integral map $\Phi:\ell^\infty([0,1])\to\mathbb R$ was proved continuous with respect to the uniform norm. Hence the theorem yields
\begin{align*}
\Phi(G_n) \xrightarrow{d} \Phi(B).
\end{align*}
We now identify both sides. From the definition of $\Phi$ and the earlier rewriting of the statistic,
\begin{align*}
\Phi(G_n)=\int_0^1 \left(\sqrt{n}\bigl(H_n(t)-t\bigr)\right)^2\,d\mathcal{L}^1(t)=W_n^2.
\end{align*}
For the limiting Brownian bridge $B:[0,1]\to\mathbb R$,
\begin{align*}
\Phi(B)=\int_0^1 B(t)^2\,d\mathcal{L}^1(t).
\end{align*}
Substituting these identities into the continuous-mapping conclusion gives
\begin{align*}
W_n^2 \xrightarrow{d} \int_0^1 B(t)^2\,d\mathcal{L}^1(t),
\end{align*}
which is exactly the asserted Cramér-von Mises Brownian bridge limit.
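A rough Monte Carlo check of the first moment is possible. Since a Brownian bridge has $\operatorname{Var}(B(t))=t(1-t)$, Tonelli's theorem gives $\mathbb E\int_0^1 B(t)^2\,dt=\int_0^1 t(1-t)\,dt=\tfrac16$. The Python sketch below compares this with the simulated mean of $W_n^2$ for uniform samples, using the standard computing formula $W_n^2=\frac{1}{12n}+\sum_{i=1}^n\bigl(U_{(i)}-\frac{2i-1}{2n}\bigr)^2$. That formula, the sample size, the number of replications, the seed, and the name `cvm_statistic` are assumptions for illustration and do not come from the proof.

```python
import random

def cvm_statistic(u):
    """W_n^2 for a sample already transformed to [0,1], via the computing formula."""
    n = len(u)
    s = sorted(u)
    # i runs over 0..n-1, so (2i+1) plays the role of (2i-1) in 1-based indexing.
    return 1.0 / (12 * n) + sum((s[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))

rng = random.Random(2)          # fixed seed, illustration only
n, reps = 100, 4000
values = [cvm_statistic([rng.random() for _ in range(n)]) for _ in range(reps)]
mc_mean = sum(values) / reps

print(mc_mean, 1 / 6)           # simulated mean of W_n^2 versus E[int B^2] = 1/6
```

Matching means is only a consistency check on one moment; the theorem asserts convergence of the whole distribution.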
[/guided]
[/step]