[step:Reduce the statistic to the uniform empirical process]For each $n \in \mathbb{N}$, define the empirical distribution function $F_n: \mathbb{R} \to [0,1]$ by
\begin{align*}
F_n(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{(-\infty,x]}(X_i).
\end{align*}
Define the generalized inverse $Q_0: (0,1) \to \mathbb{R}$ of $F_0$ by
\begin{align*}
Q_0(t)=\inf\{x \in \mathbb{R}: F_0(x) \geq t\}.
\end{align*}
Let $\mathcal{L}^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. Let $\lambda_{[0,1]}$ denote the probability measure on $([0,1],\mathcal{B}([0,1]))$ defined by $\lambda_{[0,1]}(A)=\mathcal{L}^1(A)$ for every Borel set $A \subset [0,1]$. We write $\operatorname{Unif}(0,1)$ for this probability distribution. Let $U_1,U_2,\dots$ be i.i.d. random variables with distribution $\operatorname{Unif}(0,1)$, and for each $n \in \mathbb{N}$ define the uniform empirical distribution function $H_n: [0,1] \to [0,1]$ by
\begin{align*}
H_n(t) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{[0,t]}(U_i).
\end{align*}
We justify the probability integral transform in the continuous non-strict case. Let $X$ be any real-valued [random variable](/page/Random%20Variable) with distribution function $F_0$. For $u \in (0,1)$, define
\begin{align*}
r_u=\sup\{x \in \mathbb{R}: F_0(x) \leq u\}.
\end{align*}
Since $F_0$ is nondecreasing, continuous, and has limits $0$ and $1$ at $-\infty$ and $+\infty$, the [intermediate value theorem](/theorems/180) and the definition of $r_u$ give $F_0(r_u)=u$. Moreover, monotonicity gives the event identity $\{F_0(X) \leq u\}=\{X \leq r_u\}$ up to the possible level set $\{x:F_0(x)=u\}$, which has probability $F_0(r_u)-\lim_{x \uparrow r_u}F_0(x)=0$ by continuity. Therefore
\begin{align*}
\mathbb{P}(F_0(X) \leq u)=\mathbb{P}(X \leq r_u)=F_0(r_u)=u.
\end{align*}
The endpoint cases $u=0$ and $u=1$ follow from the endpoint limits of the distribution function. Thus $F_0(X)$ has distribution $\operatorname{Unif}(0,1)$. Therefore the random variables $F_0(X_i)$ are i.i.d. with distribution $\operatorname{Unif}(0,1)$. Since $U_1,U_2,\dots$ are also i.i.d. uniform variables, the vectors $(F_0(X_1),\dots,F_0(X_n))$ and $(U_1,\dots,U_n)$ have the same distribution.
We now justify the distribution-free identity without using a pointwise null event depending on $x$. For the given sample $X_1,\dots,X_n$, define $Y_i=F_0(X_i)$ for $1 \leq i \leq n$, and define the empirical distribution function of $Y_1,\dots,Y_n$ by $G_n: [0,1] \to [0,1]$,
\begin{align*}
G_n(t)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{[0,t]}(Y_i).
\end{align*}
It suffices to prove the pathwise identity
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
=
\sup_{0 \leq t \leq 1}|G_n(t)-t|
\end{align*}
outside a probability-zero event, because $G_n$ has the same distribution as $H_n$.
Let $Y_{(1)} \leq \cdots \leq Y_{(n)}$ denote the order statistics of $Y_1,\dots,Y_n$. Since $Y_1,\dots,Y_n$ are i.i.d. with the continuous distribution $\operatorname{Unif}(0,1)$, there is a probability-one event $\Omega_{n,\mathrm{good}}$ on which $0<Y_{(1)}<\cdots<Y_{(n)}<1$. Work on $\Omega_{n,\mathrm{good}}$.
For the empirical distribution function $G_n$, monotonicity shows that its positive and negative deviations are attained at the jump levels and immediately before the jump levels. More precisely,
\begin{align*}
\sup_{0 \leq t \leq 1}(G_n(t)-t)=\max_{1 \leq j \leq n}\left(\frac{j}{n}-Y_{(j)}\right).
\end{align*}
Also,
\begin{align*}
\sup_{0 \leq t \leq 1}(t-G_n(t))=\max_{1 \leq j \leq n}\left(Y_{(j)}-\frac{j-1}{n}\right).
\end{align*}
Thus
\begin{align*}
\sup_{0 \leq t \leq 1}|G_n(t)-t|=\max_{1 \leq j \leq n}\max\left\{\frac{j}{n}-Y_{(j)},Y_{(j)}-\frac{j-1}{n}\right\}.
\end{align*}
Let $X_{(j)}$ be an ordering of the sample chosen so that $F_0(X_{(j)})=Y_{(j)}$. This ordering exists on $\Omega_{n,\mathrm{good}}$ because the map $x \mapsto F_0(x)$ is nondecreasing and the values $Y_i$ are distinct. At $x=X_{(j)}$, exactly $j$ sample points are at most $x$, so
\begin{align*}
F_n(X_{(j)})-F_0(X_{(j)})=\frac{j}{n}-Y_{(j)}.
\end{align*}
By continuity of $F_0$, points $x$ increasing to $X_{(j)}$ from below through values with no additional sample point satisfy $F_n(x)=(j-1)/n$ and $F_0(x) \to Y_{(j)}$, so
\begin{align*}
\sup_{x \in \mathbb{R}}(F_0(x)-F_n(x)) \geq Y_{(j)}-\frac{j-1}{n}.
\end{align*}
Conversely, if $x \in \mathbb{R}$ and exactly $j$ sample points are at most $x$, then monotonicity gives $Y_{(j)} \leq F_0(x) \leq Y_{(j+1)}$, with the endpoint conventions $Y_{(0)}=0$ and $Y_{(n+1)}=1$. Hence the positive and negative deviations of $F_n-F_0$ are bounded by the same displayed maxima. Therefore
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|=\sup_{0 \leq t \leq 1}|G_n(t)-t|
\end{align*}
on the single probability-one event $\Omega_{n,\mathrm{good}}$. Thus
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
=
\sup_{0 \leq t \leq 1}|G_n(t)-t|
\end{align*}
almost surely, and therefore
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\overset{d}{=}
\sup_{0 \leq t \leq 1}|H_n(t)-t|.
\end{align*}
Equivalently,
\begin{align*}
\sqrt{n}\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\overset{d}{=}
\sup_{0 \leq t \leq 1}\left|\sqrt{n}(H_n(t)-t)\right|.
\end{align*}
The reduction is therefore to prove convergence of the supremum norm of the uniform empirical process.[/step]