[proofplan]
The proof first removes the unknown continuous distribution function by the probability integral transform and the distribution-free property of the Kolmogorov-Smirnov statistic. This reduces the statistic to the supremum norm of the uniform empirical process on $[0,1]$. Donsker's theorem gives convergence of that empirical process to a standard Brownian bridge, and the continuous mapping theorem transfers this convergence through the supremum norm. The final displayed distribution function is the classical Kolmogorov formula, obtained from the reflection-principle computation for [Brownian motion](/page/Brownian%20Motion) conditioned to return to zero at time $1$.
[/proofplan]
[step:Reduce the statistic to the uniform empirical process]
For each $n \in \mathbb{N}$, define the empirical distribution function $F_n: \mathbb{R} \to [0,1]$ by
\begin{align*}
F_n(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{(-\infty,x]}(X_i).
\end{align*}
Define the generalized inverse $Q_0: (0,1) \to \mathbb{R}$ of $F_0$ by
\begin{align*}
Q_0(t)=\inf\{x \in \mathbb{R}: F_0(x) \geq t\}.
\end{align*}
Let $\mathcal{L}^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. Let $\lambda_{[0,1]}$ denote the probability measure on $([0,1],\mathcal{B}([0,1]))$ defined by $\lambda_{[0,1]}(A)=\mathcal{L}^1(A)$ for every Borel set $A \subset [0,1]$. We write $\operatorname{Unif}(0,1)$ for this probability distribution. Let $U_1,U_2,\dots$ be i.i.d. random variables with distribution $\operatorname{Unif}(0,1)$, and for each $n \in \mathbb{N}$ define the uniform empirical distribution function $H_n: [0,1] \to [0,1]$ by
\begin{align*}
H_n(t) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{[0,t]}(U_i).
\end{align*}
We justify the probability integral transform when $F_0$ is continuous but not necessarily strictly increasing. Let $X$ be any real-valued [random variable](/page/Random%20Variable) with distribution function $F_0$. For $u \in (0,1)$, define
\begin{align*}
r_u=\sup\{x \in \mathbb{R}: F_0(x) \leq u\}.
\end{align*}
Since $F_0$ is nondecreasing, continuous, and has limits $0$ and $1$ at $-\infty$ and $+\infty$, the [intermediate value theorem](/theorems/180) shows that the level $u$ is attained, and the definition of $r_u$ together with continuity gives $F_0(r_u)=u$. Moreover, the event identity $\{F_0(X) \leq u\}=\{X \leq r_u\}$ holds exactly: if $X \leq r_u$, then monotonicity gives $F_0(X) \leq F_0(r_u)=u$, and if $F_0(X) \leq u$, then $X$ belongs to the set whose supremum is $r_u$, so $X \leq r_u$. Therefore
\begin{align*}
\mathbb{P}(F_0(X) \leq u)=\mathbb{P}(X \leq r_u)=F_0(r_u)=u.
\end{align*}
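The endpoint cases are direct: for $u=1$ the event $\{F_0(X) \leq 1\}$ is certain because $F_0 \leq 1$, and for $u=0$ we have $\mathbb{P}(F_0(X) \leq 0) \leq \mathbb{P}(F_0(X) \leq v)=v$ for every $v \in (0,1)$, so $\mathbb{P}(F_0(X) \leq 0)=0$. Thus $F_0(X)$ has distribution $\operatorname{Unif}(0,1)$. Applying this to each $X_i$, and using that measurable functions of independent random variables are independent, the random variables $F_0(X_i)$ are i.i.d. with distribution $\operatorname{Unif}(0,1)$. Since $U_1,U_2,\dots$ are also i.i.d. uniform variables, the vectors $(F_0(X_1),\dots,F_0(X_n))$ and $(U_1,\dots,U_n)$ have the same distribution.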
We now justify the distribution-free identity without using a pointwise null event depending on $x$. For the given sample $X_1,\dots,X_n$, define $Y_i=F_0(X_i)$ for $1 \leq i \leq n$, and define the empirical distribution function of $Y_1,\dots,Y_n$ by $G_n: [0,1] \to [0,1]$,
\begin{align*}
G_n(t)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{[0,t]}(Y_i).
\end{align*}
It suffices to prove the pathwise identity
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
=
\sup_{0 \leq t \leq 1}|G_n(t)-t|
\end{align*}
outside a probability-zero event, because $G_n$ has the same distribution as $H_n$.
Let $Y_{(1)} \leq \cdots \leq Y_{(n)}$ denote the order statistics of $Y_1,\dots,Y_n$. Since $Y_1,\dots,Y_n$ are i.i.d. with the continuous distribution $\operatorname{Unif}(0,1)$, there is a probability-one event $\Omega_{n,\mathrm{good}}$ on which $0<Y_{(1)}<\cdots<Y_{(n)}<1$. Work on $\Omega_{n,\mathrm{good}}$.
For the empirical distribution function $G_n$, monotonicity shows that its positive and negative deviations are attained at the jump levels and immediately before the jump levels. More precisely,
\begin{align*}
\sup_{0 \leq t \leq 1}(G_n(t)-t)=\max_{1 \leq j \leq n}\left(\frac{j}{n}-Y_{(j)}\right).
\end{align*}
Also,
\begin{align*}
\sup_{0 \leq t \leq 1}(t-G_n(t))=\max_{1 \leq j \leq n}\left(Y_{(j)}-\frac{j-1}{n}\right).
\end{align*}
Thus
\begin{align*}
\sup_{0 \leq t \leq 1}|G_n(t)-t|=\max_{1 \leq j \leq n}\max\left\{\frac{j}{n}-Y_{(j)},Y_{(j)}-\frac{j-1}{n}\right\}.
\end{align*}
Let $X_{(j)}$ be an ordering of the sample chosen so that $F_0(X_{(j)})=Y_{(j)}$. This ordering exists on $\Omega_{n,\mathrm{good}}$ because the map $x \mapsto F_0(x)$ is nondecreasing and the values $Y_i$ are distinct. At $x=X_{(j)}$, exactly $j$ sample points are at most $x$, so
\begin{align*}
F_n(X_{(j)})-F_0(X_{(j)})=\frac{j}{n}-Y_{(j)}.
\end{align*}
By continuity of $F_0$, as $x$ increases to $X_{(j)}$ within the interval $(X_{(j-1)},X_{(j)})$, with the convention $X_{(0)}=-\infty$, we have $F_n(x)=(j-1)/n$ and $F_0(x) \to Y_{(j)}$, so
\begin{align*}
\sup_{x \in \mathbb{R}}(F_0(x)-F_n(x)) \geq Y_{(j)}-\frac{j-1}{n}.
\end{align*}
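Conversely, if $x \in \mathbb{R}$ and exactly $j$ sample points are at most $x$, then $F_n(x)=j/n$ and monotonicity gives $Y_{(j)} \leq F_0(x) \leq Y_{(j+1)}$, with the endpoint conventions $Y_{(0)}=0$ and $Y_{(n+1)}=1$. Consequently $F_n(x)-F_0(x) \leq j/n-Y_{(j)}$ and $F_0(x)-F_n(x) \leq Y_{(j+1)}-j/n$. The first bound is a term of the first displayed maximum when $j \geq 1$ and equals $0$ when $j=0$; the second is the term with index $j+1$ of the second displayed maximum when $j \leq n-1$ and equals $0$ when $j=n$. Both maxima are positive on $\Omega_{n,\mathrm{good}}$, so the positive and negative deviations of $F_n-F_0$ are bounded by the same displayed maxima. Therefore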
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|=\sup_{0 \leq t \leq 1}|G_n(t)-t|
\end{align*}
on the single probability-one event $\Omega_{n,\mathrm{good}}$, hence almost surely. Since $G_n$ and $H_n$ have the same distribution, it follows that
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\overset{d}{=}
\sup_{0 \leq t \leq 1}|H_n(t)-t|.
\end{align*}
Equivalently,
\begin{align*}
\sqrt{n}\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\overset{d}{=}
\sup_{0 \leq t \leq 1}\left|\sqrt{n}(H_n(t)-t)\right|.
\end{align*}
The reduction is therefore to prove convergence of the supremum norm of the uniform empirical process.
[guided]
The role of the continuity hypothesis on $F_0$ is to make the null distribution independent of the particular continuous distribution function. Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. Let $\lambda_{[0,1]}$ be the probability measure on $([0,1],\mathcal{B}([0,1]))$ defined by $\lambda_{[0,1]}(A)=\mathcal{L}^1(A)$ for every Borel set $A \subset [0,1]$, and write $\operatorname{Unif}(0,1)$ for this distribution. First define the empirical distribution function $F_n: \mathbb{R} \to [0,1]$ by
\begin{align*}
F_n(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{(-\infty,x]}(X_i).
\end{align*}
For a real-valued random variable $X$ with distribution function $F_0$, continuity and monotonicity of $F_0$ imply $\mathbb{P}(F_0(X) \leq u)=u$ for every $u \in [0,1]$. For $u \in (0,1)$, set
\begin{align*}
r_u=\sup\{x \in \mathbb{R}: F_0(x) \leq u\}.
\end{align*}
Because $F_0$ is continuous, nondecreasing, and has endpoint limits $0$ and $1$, we have $F_0(r_u)=u$. The events $\{F_0(X) \leq u\}$ and $\{X \leq r_u\}$ coincide: monotonicity gives $F_0(X) \leq F_0(r_u)=u$ whenever $X \leq r_u$, and $F_0(X) \leq u$ places $X$ in the set whose supremum is $r_u$. In particular, a flat stretch of $F_0$ at level $u$ causes no difficulty, because $r_u$ is its right endpoint. Hence
\begin{align*}
\mathbb{P}(F_0(X) \leq u)=\mathbb{P}(X \leq r_u)=F_0(r_u)=u.
\end{align*}
The endpoint cases are immediate from the endpoint limits. Hence $F_0(X_i)$ has distribution $\operatorname{Unif}(0,1)$. Since [measurable functions](/page/Measurable%20Functions) of independent random variables are independent, the transformed variables $F_0(X_1),\dots,F_0(X_n)$ are i.i.d. uniform variables.
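To see how $r_u$ behaves when $F_0$ has a flat stretch, the following Python sketch may help. It uses a hypothetical piecewise-linear distribution function `F0` that is constant on $[1/2,3/2]$, and a closed-form `r` that is valid only for this particular example; both are assumptions made for illustration and are not part of the argument. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def F0(x):
    """Hypothetical continuous CDF with a flat stretch at level 1/2 on [1/2, 3/2]."""
    if x <= 0:
        return Fraction(0)
    if x <= Fraction(1, 2):
        return Fraction(x)
    if x <= Fraction(3, 2):
        return Fraction(1, 2)
    if x <= 2:
        return Fraction(x) - 1
    return Fraction(1)

def r(u):
    """r_u = sup{x : F0(x) <= u} for 0 < u < 1, in closed form for this F0 only."""
    return u if u < Fraction(1, 2) else u + 1

for u in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    ru = r(u)
    assert F0(ru) == u                     # F0(r_u) = u
    assert F0(ru + Fraction(1, 100)) > u   # points to the right of r_u exceed level u
```

At $u=1/2$ the supremum is the right endpoint $3/2$ of the flat stretch, which is why the events $\{F_0(X) \leq u\}$ and $\{X \leq r_u\}$ agree even though $F_0$ is not injective.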
Define $Y_i=F_0(X_i)$ for $1 \leq i \leq n$, and define the empirical distribution function $G_n: [0,1] \to [0,1]$ of $Y_1,\dots,Y_n$ by
\begin{align*}
G_n(t)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}_{[0,t]}(Y_i).
\end{align*}
Because $(Y_1,\dots,Y_n)$ and $(U_1,\dots,U_n)$ have the same distribution, $G_n$ and $H_n$ have the same distribution as random distribution functions. It remains to prove a pathwise identity for the two suprema.
Let $Y_{(1)} \leq \cdots \leq Y_{(n)}$ denote the order statistics of $Y_1,\dots,Y_n$. Since the $Y_i$ are i.i.d. with the continuous distribution $\operatorname{Unif}(0,1)$, there is a probability-one event $\Omega_{n,\mathrm{good}}$ on which $0<Y_{(1)}<\cdots<Y_{(n)}<1$. Work on $\Omega_{n,\mathrm{good}}$. On each interval between successive order statistics, $G_n(t)$ is constant and $t \mapsto G_n(t)-t$ is decreasing. Therefore the largest positive deviation occurs at a jump value and the largest negative deviation occurs immediately before a jump value:
\begin{align*}
\sup_{0 \leq t \leq 1}(G_n(t)-t)=\max_{1 \leq j \leq n}\left(\frac{j}{n}-Y_{(j)}\right),
\end{align*}
and
\begin{align*}
\sup_{0 \leq t \leq 1}(t-G_n(t))=\max_{1 \leq j \leq n}\left(Y_{(j)}-\frac{j-1}{n}\right).
\end{align*}
Thus
\begin{align*}
\sup_{0 \leq t \leq 1}|G_n(t)-t|=\max_{1 \leq j \leq n}\max\left\{\frac{j}{n}-Y_{(j)},Y_{(j)}-\frac{j-1}{n}\right\}.
\end{align*}
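The order-statistic formula can be checked on a small example. The Python sketch below uses hypothetical helper names `ks_uniform` and `ks_grid` and an arbitrary four-point sample. It compares the formula with a brute-force lower bound taken over the grid $t=k/1000$. The grid only bounds the supremum from below in general; here it happens to contain the point where the maximum is attained.

```python
from fractions import Fraction

def ks_uniform(ys):
    """sup_t |G_n(t) - t| via the order-statistic formula (assumes distinct values in (0, 1))."""
    n = len(ys)
    s = sorted(ys)
    return max(max(Fraction(j, n) - y, y - Fraction(j - 1, n))
               for j, y in enumerate(s, start=1))

def ks_grid(ys, m=1000):
    """Lower bound for the same supremum, evaluated on the grid t = k/m."""
    n = len(ys)
    best = Fraction(0)
    for k in range(m + 1):
        t = Fraction(k, m)
        g = Fraction(sum(1 for y in ys if y <= t), n)  # G_n(t)
        best = max(best, abs(g - t))
    return best

sample = [Fraction(9, 10), Fraction(1, 10), Fraction(45, 100), Fraction(4, 10)]
d_formula = ks_uniform(sample)
d_grid = ks_grid(sample)
print(d_formula, d_grid)
```

For this sample the maximum comes from the positive deviation at the third order statistic, $3/4-45/100$.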
Choose an ordering $X_{(1)},\dots,X_{(n)}$ of the sample such that $F_0(X_{(j)})=Y_{(j)}$. This is compatible with the order on the real line because $F_0$ is nondecreasing and the values $Y_i$ are distinct. At the point $X_{(j)}$, exactly $j$ sample points are at most $X_{(j)}$, so
\begin{align*}
F_n(X_{(j)})-F_0(X_{(j)})=\frac{j}{n}-Y_{(j)}.
\end{align*}
By continuity of $F_0$, letting $x$ increase to $X_{(j)}$ within the interval $(X_{(j-1)},X_{(j)})$, with the convention $X_{(0)}=-\infty$, gives $F_n(x)=(j-1)/n$ and $F_0(x) \to Y_{(j)}$. Hence
\begin{align*}
\sup_{x \in \mathbb{R}}(F_0(x)-F_n(x)) \geq Y_{(j)}-\frac{j-1}{n}.
\end{align*}
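Conversely, if $x \in \mathbb{R}$ and exactly $j$ sample points are at most $x$, with endpoint conventions $Y_{(0)}=0$ and $Y_{(n+1)}=1$, then $F_n(x)=j/n$ and monotonicity of $F_0$ gives $Y_{(j)} \leq F_0(x) \leq Y_{(j+1)}$. Hence $F_n(x)-F_0(x) \leq j/n-Y_{(j)}$ and $F_0(x)-F_n(x) \leq Y_{(j+1)}-j/n$. For $1 \leq j \leq n-1$ these are terms of the two maxima displayed above, and in the boundary cases $j=0$ and $j=n$ the corresponding bound is $0$, which is smaller than both maxima on $\Omega_{n,\mathrm{good}}$. Therefore the positive and negative deviations of $F_n-F_0$ are bounded by the same maxima displayed above. Consequently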
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
=
\sup_{0 \leq t \leq 1}|G_n(t)-t|
\end{align*}
on $\Omega_{n,\mathrm{good}}$, and hence almost surely. Because $G_n$ and $H_n$ have the same distribution, this becomes
\begin{align*}
\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\overset{d}{=}
\sup_{0 \leq t \leq 1}|H_n(t)-t|.
\end{align*}
Multiplying by $\sqrt{n}$ preserves equality in distribution, so
\begin{align*}
\sqrt{n}\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\overset{d}{=}
\sup_{0 \leq t \leq 1}\left|\sqrt{n}(H_n(t)-t)\right|.
\end{align*}
Thus the original statistic has been reduced to the supremum norm of the uniform empirical process $\alpha_n: [0,1] \to \mathbb{R}$ defined by
\begin{align*}
\alpha_n(t)=\sqrt{n}(H_n(t)-t).
\end{align*}
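As an informal check of the pathwise identity on the original scale, the sketch below takes the hypothetical distribution function $F_0(x)=x^2$ on $[0,1]$ and an arbitrary three-point sample, both assumptions made only for illustration. It compares a grid evaluation of $\sup_x|F_n(x)-F_0(x)|$ with the order-statistic formula applied to $Y_i=F_0(X_i)$. The grid $x=k/1000$ gives only a lower bound in general; here it contains the sample point at which the supremum is attained.

```python
from fractions import Fraction

def F0(x):
    """Hypothetical continuous CDF: F0(x) = x^2 on [0, 1]."""
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    return Fraction(x) ** 2

xs = [Fraction(1, 2), Fraction(1, 3), Fraction(9, 10)]
n = len(xs)

def Fn(x):
    """Empirical distribution function of the sample xs."""
    return Fraction(sum(1 for xi in xs if xi <= x), n)

# Left side: sup_x |F_n(x) - F0(x)|, bounded from below on the grid x = k/1000.
lhs_grid = max(abs(Fn(Fraction(k, 1000)) - F0(Fraction(k, 1000))) for k in range(1001))

# Right side: order-statistic formula applied to Y_i = F0(X_i).
ys = sorted(F0(x) for x in xs)
rhs = max(max(Fraction(j, n) - y, y - Fraction(j - 1, n))
          for j, y in enumerate(ys, start=1))
print(lhs_grid, rhs)
```

Both sides depend on the sample only through the transformed values $F_0(X_i)$, which is the distribution-free property in concrete form.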
[/guided]
[/step]
[step:Apply Donsker convergence to the uniform empirical process]
Let $D([0,1])$ denote the Skorokhod space of càdlàg functions $g: [0,1] \to \mathbb{R}$, equipped with the Skorokhod $J_1$ topology. For each $n \in \mathbb{N}$, define the empirical process $\alpha_n: [0,1] \to \mathbb{R}$ by
\begin{align*}
\alpha_n(t)=\sqrt{n}(H_n(t)-t).
\end{align*}
The hypotheses of the [Donsker-Kolmogorov-Doob theorem](/theorems/2005) for the uniform empirical process are satisfied because $U_1,U_2,\dots$ are i.i.d. with distribution $\operatorname{Unif}(0,1)$ and $H_n$ is their empirical distribution function. Therefore
\begin{align*}
\alpha_n \xrightarrow{d} B
\end{align*}
in $D([0,1])$ with the Skorokhod $J_1$ topology, where $B$ is a standard Brownian bridge. The limiting process $B$ has almost surely continuous sample paths, so this mode of convergence can be passed through functionals that are continuous at continuous paths.
[guided]
We now invoke the [Donsker-Kolmogorov-Doob theorem](/theorems/2005) for the uniform empirical process. The theorem applies to the empirical distribution function of an i.i.d. sample with distribution $\operatorname{Unif}(0,1)$. Those hypotheses hold by the construction in the previous step: the variables $U_1,U_2,\dots$ are i.i.d. uniform variables, and $H_n: [0,1] \to [0,1]$ is their empirical distribution function.
Define the empirical process $\alpha_n: [0,1] \to \mathbb{R}$ by
\begin{align*}
\alpha_n(t)=\sqrt{n}(H_n(t)-t).
\end{align*}
Donsker's theorem then gives convergence in distribution
\begin{align*}
\alpha_n \xrightarrow{d} B
\end{align*}
in the Skorokhod space $D([0,1])$ equipped with the Skorokhod $J_1$ topology. The limit $B$ is a standard Brownian bridge, meaning a centered Gaussian process on $[0,1]$ with covariance
\begin{align*}
\mathbb{E}[B(s)B(t)]=\min\{s,t\}-st \quad \text{for } 0 \leq s,t \leq 1,
\end{align*}
and with almost surely continuous sample paths. The almost sure continuity of the limit is the feature needed in the next step, because the supremum functional is continuous at continuous paths even though it is defined on the larger Skorokhod space.
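The covariance above can be cross-checked against the empirical process itself. The following short computation is a consistency check rather than part of the proof. It uses only that $U_1,\dots,U_n$ are i.i.d. with distribution $\operatorname{Unif}(0,1)$ and that $\mathbb{E}[\alpha_n(t)]=0$, and it shows that $\alpha_n$ already has the bridge covariance for every $n$:
\begin{align*}
\mathbb{E}[\alpha_n(s)\alpha_n(t)]
&=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\operatorname{Cov}\left(\mathbb{1}_{[0,s]}(U_i),\mathbb{1}_{[0,t]}(U_j)\right) \\
&=\operatorname{Cov}\left(\mathbb{1}_{[0,s]}(U_1),\mathbb{1}_{[0,t]}(U_1)\right) \\
&=\mathbb{P}(U_1 \leq \min\{s,t\})-\mathbb{P}(U_1 \leq s)\,\mathbb{P}(U_1 \leq t)
=\min\{s,t\}-st.
\end{align*}
The cross terms with $i \neq j$ vanish by independence, so only the $n$ diagonal terms survive, and they are identical.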
[/guided]
[/step]
[step:Pass to the supremum norm by the continuous mapping theorem]
Let $C([0,1])$ denote the space of continuous functions $g: [0,1] \to \mathbb{R}$, equipped with the uniform norm. Define the supremum functional $\Phi: D([0,1]) \to [0,\infty)$ by
\begin{align*}
\Phi(g)=\sup_{0 \leq t \leq 1}|g(t)|.
\end{align*}
If $g \in C([0,1])$, then $\Phi$ is continuous at $g$ for the Skorokhod $J_1$ topology. Indeed, Skorokhod $J_1$ convergence $h_m \to g$ to a continuous limit $g$ implies [uniform convergence](/page/Uniform%20Convergence) of $h_m$ to $g$ on $[0,1]$. For such uniformly convergent $h_m \in D([0,1])$, the [reverse triangle inequality](/theorems/2300) gives
\begin{align*}
|\Phi(h_m)-\Phi(g)|
\leq
\sup_{0 \leq t \leq 1}|h_m(t)-g(t)|,
\end{align*}
and the right-hand side tends to $0$.
Define $\operatorname{Disc}(\Phi)=\{g \in D([0,1]): \Phi \text{ is not continuous at } g\}$ to be the discontinuity set of $\Phi$. This set is contained in $D([0,1]) \setminus C([0,1])$. Since $B \in C([0,1])$ almost surely, we have
\begin{align*}
\mathbb{P}(B \in \operatorname{Disc}(\Phi))=0.
\end{align*}
The hypotheses of the [Continuous Mapping Theorem](/theorems/1847) are satisfied, so applying it to $\alpha_n \xrightarrow{d} B$ gives
\begin{align*}
\Phi(\alpha_n) \xrightarrow{d} \Phi(B).
\end{align*}
Unpacking the definitions,
\begin{align*}
\sup_{0 \leq t \leq 1}\left|\sqrt{n}(H_n(t)-t)\right|
\xrightarrow{d}
\sup_{0 \leq t \leq 1}|B(t)|.
\end{align*}
Together with the distribution-free reduction, this proves
\begin{align*}
\sqrt{n}\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\xrightarrow{d}
\sup_{0 \leq t \leq 1}|B(t)|.
\end{align*}
[guided]
The functional we need to pass through the [weak convergence](/page/Weak%20Convergence) is the supremum norm. Define $\Phi: D([0,1]) \to [0,\infty)$ by
\begin{align*}
\Phi(g)=\sup_{0 \leq t \leq 1}|g(t)|.
\end{align*}
The [Continuous Mapping Theorem](/theorems/1847) does not require $\Phi$ to be continuous everywhere. It requires the limiting random element to land in the continuity set of $\Phi$ with probability one.
We verify that condition. Let $g \in C([0,1])$, and suppose $h_m \to g$ in the Skorokhod $J_1$ topology. Since the limit $g$ is continuous, Skorokhod $J_1$ convergence implies uniform convergence on $[0,1]$. The reverse triangle inequality then gives
\begin{align*}
|\Phi(h_m)-\Phi(g)|
\leq
\sup_{0 \leq t \leq 1}|h_m(t)-g(t)|,
\end{align*}
and the right-hand side tends to $0$. Thus every continuous path is a continuity point of $\Phi$.
Define $\operatorname{Disc}(\Phi)=\{g \in D([0,1]): \Phi \text{ is not continuous at } g\}$ to be the discontinuity set of $\Phi$. The standard Brownian bridge $B$ has almost surely continuous sample paths, so
\begin{align*}
\mathbb{P}(B \in \operatorname{Disc}(\Phi))=0.
\end{align*}
Because the previous step proved $\alpha_n \xrightarrow{d} B$ in $D([0,1])$, the continuous mapping theorem gives
\begin{align*}
\Phi(\alpha_n) \xrightarrow{d} \Phi(B).
\end{align*}
Substituting the definitions of $\Phi$ and $\alpha_n$ yields
\begin{align*}
\sup_{0 \leq t \leq 1}\left|\sqrt{n}(H_n(t)-t)\right|
\xrightarrow{d}
\sup_{0 \leq t \leq 1}|B(t)|.
\end{align*}
Finally, the distribution-free reduction from the first step transfers this convergence back to the original Kolmogorov-Smirnov statistic:
\begin{align*}
\sqrt{n}\sup_{x \in \mathbb{R}} |F_n(x)-F_0(x)|
\xrightarrow{d}
\sup_{0 \leq t \leq 1}|B(t)|.
\end{align*}
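The convergence can be illustrated numerically, though a simulation proves nothing. The Python sketch below simulates the uniform statistic with a fixed seed and reports how often $\sqrt{n}\sup_t|H_n(t)-t|$ falls below $a=1.358$, a value at which the limiting distribution function is approximately $0.95$. The sample size, the number of replications, the seed, and the helper name `ks_uniform` are all arbitrary choices made for this illustration.

```python
import math
import random

def ks_uniform(us):
    """sup_t |H_n(t) - t| via the order-statistic formula."""
    n = len(us)
    s = sorted(us)
    return max(max(j / n - u, u - (j - 1) / n) for j, u in enumerate(s, start=1))

rng = random.Random(0)   # fixed seed so the run is reproducible
n, reps = 200, 2000      # illustrative sample size and number of replications
a = 1.358                # approximate 95% point of the limiting law

hits = sum(
    math.sqrt(n) * ks_uniform([rng.random() for _ in range(n)]) <= a
    for _ in range(reps)
)
frac = hits / reps       # empirical estimate of P(sqrt(n) * D_n <= a)
print(frac)
```

The printed fraction should be close to $0.95$, up to Monte Carlo error and the finite-sample discrepancy between the law of the statistic and its limit.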
[/guided]
[/step]
[step:Compute the Brownian bridge supremum distribution from killed Brownian transition densities]
Let $a>0$, and let $W=(W_t)_{0 \leq t \leq 1}$ be a standard Brownian motion started at $0$. For $t>0$, define the Brownian transition density $p_t: \mathbb{R} \to (0,\infty)$ by
\begin{align*}
p_t(z)=\frac{1}{\sqrt{2\pi t}}e^{-z^2/(2t)}.
\end{align*}
Define the first exit time from $(-a,a)$ by
\begin{align*}
\tau_a=\inf\{t \geq 0: |W_t|=a\}.
\end{align*}
Let $p_{1,a,\mathrm{kill}}(0,0)$ denote the transition density from $0$ to $0$ at time $1$ for Brownian motion killed when it exits $(-a,a)$, that is, the value at $y=0$ of the density of the sub-probability measure $\mathbb{P}(W_1 \in dy,\ \tau_a>1)$. The [Reflection Principle for Brownian Motion](/theorems/1181), applied by the standard image method successively to the boundary points $-a$ and $a$, gives the image expansion for the killed transition density. The reflected Gaussian series converges absolutely for $a>0$ at time $1$, and the alternating signs enforce the absorbing boundary condition at both endpoints. More explicitly, the images of the starting point obtained by an even number of reflections sit at the points $4ka$ and enter with a plus sign, while those obtained by an odd number of reflections sit at the points $(4k+2)a$ and enter with a minus sign. Thus
\begin{align*}
p_{1,a,\mathrm{kill}}(0,0)=\sum_{k \in \mathbb{Z}}\left(p_1(4ka)-p_1((4k+2)a)\right)
=\frac{1}{\sqrt{2\pi}}\sum_{k \in \mathbb{Z}}\left(e^{-8k^2a^2}-e^{-2(2k+1)^2a^2}\right).
\end{align*}
The free transition density from $0$ to $0$ at time $1$ is
\begin{align*}
p_1(0)=\frac{1}{\sqrt{2\pi}}.
\end{align*}
A standard Brownian bridge $B$ on $[0,1]$ is Brownian motion conditioned on $W_1=0$. Equivalently, for the event $A_a=\{\sup_{0 \leq t \leq 1}|W_t|\leq a\}$, the bridge probability is obtained as the small-window conditional limit
\begin{align*}
\lim_{\varepsilon \downarrow 0}\mathbb{P}(A_a\mid |W_1|<\varepsilon)
=
\frac{p_{1,a,\mathrm{kill}}(0,0)}{p_1(0)},
\end{align*}
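because, as $\varepsilon \downarrow 0$, the numerator $\mathbb{P}(A_a \cap \{|W_1|<\varepsilon\})$ is asymptotic to $2\varepsilon\,p_{1,a,\mathrm{kill}}(0,0)$ and the denominator $\mathbb{P}(|W_1|<\varepsilon)$ is asymptotic to $2\varepsilon\,p_1(0)$, by continuity of the killed and free transition densities at $0$; here the events $A_a$ and $\{\tau_a>1\}$ differ only by a null event. Therefore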
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)=\frac{p_{1,a,\mathrm{kill}}(0,0)}{p_1(0)}.
\end{align*}
Substituting the two densities yields
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)=\sum_{k \in \mathbb{Z}}\left(e^{-8k^2a^2}-e^{-2(2k+1)^2a^2}\right).
\end{align*}
The first exponential runs over the even integers $m=2k$, because $8k^2=2(2k)^2$, and the second runs over the odd integers $m=2k+1$. Since the series converges absolutely, the terms can be regrouped. Hence
\begin{align*}
\sum_{k \in \mathbb{Z}}\left(e^{-8k^2a^2}-e^{-2(2k+1)^2a^2}\right)=\sum_{m \in \mathbb{Z}}(-1)^m e^{-2m^2a^2}.
\end{align*}
Pairing the terms $m$ and $-m$ gives
\begin{align*}
\sum_{m \in \mathbb{Z}}(-1)^m e^{-2m^2a^2}=1+2\sum_{m=1}^{\infty}(-1)^m e^{-2m^2a^2}.
\end{align*}
Equivalently,
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)=1-2\sum_{m=1}^{\infty}(-1)^{m-1}e^{-2m^2a^2}.
\end{align*}
Renaming the summation index $m$ as $k$ gives the displayed formula in the theorem.
[guided]
It remains to compute the law of the random variable $\sup_{0 \leq t \leq 1}|B(t)|$. The standard Brownian bridge can be realized as Brownian motion conditioned to return to $0$ at time $1$. Thus the probability of the event $\{\sup_{0 \leq t \leq 1}|B(t)| \leq a\}$ is a ratio of two Brownian transition densities: the density of paths from $0$ to $0$ in time $1$ that stay inside $(-a,a)$, divided by the unrestricted density of paths from $0$ to $0$ in time $1$.
Let $W=(W_t)_{0 \leq t \leq 1}$ be standard Brownian motion started at $0$. For $t>0$, define $p_t: \mathbb{R} \to (0,\infty)$ by
\begin{align*}
p_t(z)=\frac{1}{\sqrt{2\pi t}}e^{-z^2/(2t)}.
\end{align*}
This is the transition density of $W_t-W_0$. Define the first exit time from the interval $(-a,a)$ by
\begin{align*}
\tau_a=\inf\{t \geq 0: |W_t|=a\}.
\end{align*}
Let $p_{1,a,\mathrm{kill}}(0,0)$ denote the transition density from $0$ to $0$ at time $1$ for Brownian motion killed when it exits $(-a,a)$. The [Reflection Principle for Brownian Motion](/theorems/1181), used through the standard image method at the two boundary points $-a$ and $a$, gives the killed transition density by summing free Gaussian densities over reflected image points. The reflected Gaussian series converges absolutely for $a>0$ at time $1$; the even reflections put image endpoints at $4ka$, while the odd reflections put image endpoints at $(4k+2)a$ and enter with a minus sign to enforce the absorbing boundary condition at both endpoints. Therefore
\begin{align*}
p_{1,a,\mathrm{kill}}(0,0)=\sum_{k \in \mathbb{Z}}\left(p_1(4ka)-p_1((4k+2)a)\right)
=\frac{1}{\sqrt{2\pi}}\sum_{k \in \mathbb{Z}}\left(e^{-8k^2a^2}-e^{-2(2k+1)^2a^2}\right).
\end{align*}
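As a sanity check on the signs, here is a sketch of why the alternating pattern enforces absorption. It assumes the standard image formula for a general endpoint $y \in [-a,a]$, which is not stated above and is used here only for illustration:
\begin{align*}
p_{1,a,\mathrm{kill}}(0,y)=\sum_{k \in \mathbb{Z}}\left(p_1(y-4ka)-p_1(y-(4k+2)a)\right).
\end{align*}
Setting $y=0$ and using the symmetry $p_1(-z)=p_1(z)$ recovers the series displayed above. At the boundary point $y=a$, the same symmetry turns the two sums into $\sum_{k}p_1((1-4k)a)$ and $\sum_{k}p_1((4k+1)a)$, and the substitution $k \mapsto -k$ in the first sum gives
\begin{align*}
\sum_{k \in \mathbb{Z}}p_1((1-4k)a)-\sum_{k \in \mathbb{Z}}p_1((4k+1)a)=0.
\end{align*}
The case $y=-a$ is analogous, with the substitution $k \mapsto -k-1$ in the second sum. So the density vanishes at both endpoints, as an absorbing boundary requires.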
The unrestricted Brownian transition density from $0$ to $0$ at time $1$ is
\begin{align*}
p_1(0)=\frac{1}{\sqrt{2\pi}}.
\end{align*}
Since the Brownian bridge is Brownian motion conditioned on $W_1=0$, we compute the conditioning through shrinking windows around $0$. If $A_a=\{\sup_{0 \leq t \leq 1}|W_t|\leq a\}$, then
\begin{align*}
\mathbb{P}(A_a\mid |W_1|<\varepsilon)
=
\frac{\mathbb{P}(A_a\cap\{|W_1|<\varepsilon\})}{\mathbb{P}(|W_1|<\varepsilon)}.
\end{align*}
As $\varepsilon \downarrow 0$, the numerator is asymptotic to $2\varepsilon\,p_{1,a,\mathrm{kill}}(0,0)$ and the denominator is asymptotic to $2\varepsilon\,p_1(0)$, by continuity of the killed and free transition densities at the endpoint $0$. Hence conditioning by densities gives
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)=\frac{p_{1,a,\mathrm{kill}}(0,0)}{p_1(0)}.
\end{align*}
Substitution cancels the common factor $1/\sqrt{2\pi}$ and yields
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)=\sum_{k \in \mathbb{Z}}\left(e^{-8k^2a^2}-e^{-2(2k+1)^2a^2}\right).
\end{align*}
Now rewrite the two image sums as a single alternating theta series. The terms $e^{-8k^2a^2}$ are exactly the terms $e^{-2m^2a^2}$ with even $m=2k$, and the terms $e^{-2(2k+1)^2a^2}$ are exactly the terms with odd $m=2k+1$. Therefore
\begin{align*}
\sum_{k \in \mathbb{Z}}\left(e^{-8k^2a^2}-e^{-2(2k+1)^2a^2}\right)=\sum_{m \in \mathbb{Z}}(-1)^m e^{-2m^2a^2}.
\end{align*}
Finally, the $m=0$ term is $1$, and the terms with indices $m$ and $-m$ are equal. Thus
\begin{align*}
\sum_{m \in \mathbb{Z}}(-1)^m e^{-2m^2a^2}=1+2\sum_{m=1}^{\infty}(-1)^m e^{-2m^2a^2}.
\end{align*}
This is the same as
\begin{align*}
1-2\sum_{m=1}^{\infty}(-1)^{m-1}e^{-2m^2a^2}.
\end{align*}
Renaming $m$ as $k$ gives
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)=1-2\sum_{k=1}^{\infty}(-1)^{k-1}e^{-2k^2a^2}.
\end{align*}
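The final formula is easy to evaluate numerically. The Python sketch below uses hypothetical function names and arbitrary truncation levels. It computes the alternating series and, separately, the two-term image sum it was derived from, so the regrouping can be checked at a few values of $a$.

```python
import math

def kolmogorov_cdf(a, terms=100):
    """Truncated series 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 a^2), for a > 0."""
    if a <= 0:
        return 0.0  # sup|B| > 0 almost surely, so the probability is 0 for a <= 0
    return 1.0 - 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * a * a)
        for k in range(1, terms + 1)
    )

def image_sum(a, kmax=50):
    """Truncated image expansion: sum over k of exp(-8 k^2 a^2) - exp(-2 (2k+1)^2 a^2)."""
    return sum(
        math.exp(-8.0 * k * k * a * a) - math.exp(-2.0 * (2 * k + 1) ** 2 * a * a)
        for k in range(-kmax, kmax + 1)
    )

for a in (0.5, 1.0, 1.358):
    print(a, kolmogorov_cdf(a), image_sum(a))
```

For moderate $a$ the series converges very quickly, so a handful of terms already suffices; the truncation levels above are deliberately generous.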
[/guided]
[/step]
[step:Combine the convergence and the bridge formula]
The preceding steps prove both assertions in the theorem. The distribution-free reduction identifies the original Kolmogorov-Smirnov statistic with the supremum norm of the uniform empirical process, the [Donsker-Kolmogorov-Doob theorem](/theorems/2005) and the [Continuous Mapping Theorem](/theorems/1847) give the limiting Brownian bridge supremum, and the killed-density computation from the [Reflection Principle for Brownian Motion](/theorems/1181) gives its distribution function:
\begin{align*}
\mathbb{P}\left(\sup_{0 \leq t \leq 1}|B(t)| \leq a\right)
=
1-2\sum_{k=1}^{\infty}(-1)^{k-1}e^{-2k^2a^2}.
\end{align*}
This completes the proof.
[/step]