[guided]The statistic is built from the weighted squared distance between $F_n$ and $F_0$, so the first task is to show that this weighted distance converges to the corresponding fixed distance between $F$ and $F_0$. The theorem statement defines $F_0: \mathbb{R} \to [0,1]$ and $F: \mathbb{R} \to [0,1]$ as distribution functions and $F_n: \mathbb{R} \to [0,1]$ as the empirical distribution function of the sample. The probability space $(\Omega,\mathcal{F},\mathbb{P}_F)$ carries the sample sequence $(X_n)_{n=1}^{\infty}$ under the true distribution function $F$. Let $\mu_0$ be the probability measure on $\mathbb{R}$ induced by the distribution function $F_0$. Define the discrepancy function $\Delta: \mathbb{R} \to [-1,1]$ by
\begin{align*}
\Delta(x)=F(x)-F_0(x).
\end{align*}
For each $n \in \mathbb{N}$, define the discrepancy function $\Delta_n: \mathbb{R} \to [-1,1]$ by
\begin{align*}
\Delta_n(x)=F_n(x)-F_0(x).
\end{align*}
These functions take values in $[-1,1]$ because all distribution functions take values in $[0,1]$.
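As a concrete illustration (a hypothetical example chosen here, not part of the theorem), suppose $F_0$ is the uniform distribution function on $[0,1]$, so that $\mu_0$ is Lebesgue measure on $[0,1]$, and suppose $F(x)=\sqrt{x}$ for $x \in [0,1]$. Then $\Delta(x)=\sqrt{x}-x$ on $[0,1]$, and the fixed weighted squared distance between $F$ and $F_0$ evaluates to
\begin{align*}
\int_{\mathbb{R}}(\Delta(x))^2\,d\mu_0(x)=\int_0^1\bigl(x-2x^{3/2}+x^2\bigr)\,dx=\frac{1}{2}-\frac{4}{5}+\frac{1}{3}=\frac{1}{30}.
\end{align*}
In this assumed example the distance is strictly positive, as one would expect whenever $F$ and $F_0$ differ on a set of positive $\mu_0$-measure.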
We now invoke the classical [Glivenko-Cantelli theorem](/theorems/2004). Its hypothesis is exactly that $X_1,X_2,\dots$ are independent and identically distributed real-valued random variables with common distribution function $F$, which is the sampling model under the true distribution. Therefore
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \to 0
\end{align*}
$\mathbb{P}_F$-almost surely.
Why does [uniform convergence](/page/Uniform%20Convergence) imply convergence of the weighted $L^2(\mu_0)$ discrepancies? For each $x \in \mathbb{R}$,
\begin{align*}
\Delta_n(x)-\Delta(x)=(F_n(x)-F_0(x))-(F(x)-F_0(x)).
\end{align*}
After cancelling the two occurrences of $F_0(x)$, this becomes
\begin{align*}
\Delta_n(x)-\Delta(x)=F_n(x)-F(x).
\end{align*}
Factoring with the algebraic identity $a^2-b^2=(a-b)(a+b)$, where $a=\Delta_n(x)$ and $b=\Delta(x)$, and taking absolute values, we obtain
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right|= |\Delta_n(x)-\Delta(x)|\,|\Delta_n(x)+\Delta(x)|.
\end{align*}
Substituting $\Delta_n(x)-\Delta(x)=F_n(x)-F(x)$ and applying the triangle inequality to $\Delta_n(x)+\Delta(x)$ gives
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq |F_n(x)-F(x)|\bigl(|\Delta_n(x)|+|\Delta(x)|\bigr).
\end{align*}
Since $|\Delta_n(x)| \leq 1$ and $|\Delta(x)| \leq 1$, we conclude
\begin{align*}
\left|(\Delta_n(x))^2-(\Delta(x))^2\right| \leq 2|F_n(x)-F(x)|.
\end{align*}
Now integrate with respect to $\mu_0$. Since $\mu_0$ is the probability measure induced by the distribution function $F_0$, we have $\mu_0(\mathbb{R})=1$, and since $(\Delta_n)^2$ and $\Delta^2$ are measurable and bounded by $1$, every integral below is finite. First, the triangle inequality for integrals gives
\begin{align*}
\left|\int_{\mathbb{R}}\bigl((\Delta_n(x))^2-(\Delta(x))^2\bigr)\,d\mu_0(x)\right| \leq \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x).
\end{align*}
Using the pointwise bound above gives
\begin{align*}
\int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x) \leq \int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x).
\end{align*}
Finally, since $|F_n(x)-F(x)| \leq \sup_{y \in \mathbb{R}}|F_n(y)-F(y)|$ for every $x \in \mathbb{R}$, and since $\mu_0(\mathbb{R})=1$, we obtain
\begin{align*}
\int_{\mathbb{R}}2|F_n(x)-F(x)|\,d\mu_0(x) \leq 2\sup_{y \in \mathbb{R}}|F_n(y)-F(y)|.
\end{align*}
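For reference, the three integral inequalities can be chained into a single display, with the leftmost integral written as a difference of two integrals. This only restates the preceding steps and assumes nothing new:
\begin{align*}
\left|\int_{\mathbb{R}}(\Delta_n(x))^2\,d\mu_0(x)-\int_{\mathbb{R}}(\Delta(x))^2\,d\mu_0(x)\right|
&\leq \int_{\mathbb{R}}\left|(\Delta_n(x))^2-(\Delta(x))^2\right|\,d\mu_0(x) \\
&\leq 2\int_{\mathbb{R}}|F_n(x)-F(x)|\,d\mu_0(x) \\
&\leq 2\sup_{y \in \mathbb{R}}|F_n(y)-F(y)|.
\end{align*}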
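The right-hand side converges to $0$ $\mathbb{P}_F$-almost surely by Glivenko-Cantelli. By linearity of the integral, the absolute difference between $\int_{\mathbb{R}}(\Delta_n(x))^2\,d\mu_0(x)$ and $\int_{\mathbb{R}}(\Delta(x))^2\,d\mu_0(x)$ is bounded by this quantity, so it also converges to $0$ $\mathbb{P}_F$-almost surely. Recalling that $\Delta_n=F_n-F_0$ and $\Delta=F-F_0$, we conclude that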
\begin{align*}
\int_{\mathbb{R}}(F_n(x)-F_0(x))^2\,d\mu_0(x) \to \int_{\mathbb{R}}(F(x)-F_0(x))^2\,d\mu_0(x)
\end{align*}
$\mathbb{P}_F$-almost surely.[/guided]