[proofplan]
The proof rewrites every bandwidth-scale increment $Z_n(x)-Z_n(y)$ as the empirical process indexed by the normalized kernel increment class $\mathcal G_{h_n,\delta}$. The $L^2(\mathcal L^d)$ translation continuity of $K$, together with boundedness of the density $f$, shows that the intrinsic $L^2(P)$ size of this class, where $P$ is the law of $X_1$, is controlled by $\omega_K(\delta)$. The compactness of $A$, the boundedness and compact support of $K$, the square-integrability of $K$, and the VC-type bounded-envelope hypothesis enter only as the structural assumptions behind the assumed local maximal inequality for these increment classes; once that inequality is invoked, they are not used independently. The maximal bound makes the expected supremum of the empirical increments vanish in the iterated limit, first $n\to\infty$ and then $\delta\downarrow0$, and [Markov's inequality](/theorems/514) converts this expectation estimate into the asserted probability estimate.
[/proofplan]
[step:Rewrite the process increments as an empirical process over normalized kernel differences]
Fix $\delta>0$ and $n\in\mathbb N$. We use the standard empirical-process convention that the displayed suprema are taken over measurable versions; alternatively, if separability has not been established, the same argument is read with outer expectation and outer probability. The kernel density estimator is the map $\hat f_{h_n}:A\to\mathbb R$ defined by
\begin{align*}
\hat f_{h_n}(z)=\frac{1}{n h_n^d}\sum_{i=1}^n K\left(\frac{z-X_i}{h_n}\right),\qquad z\in A.
\end{align*}
The normalized increment class $\mathcal G_{h_n,\delta}$ is the class defined in the theorem statement with bandwidth $h=h_n$. For $x,y\in A$ with $|x-y|\le h_n\delta$, let $g_{x,y,h_n}:\mathbb R^d\to\mathbb R$ be the measurable map defined, for $u\in\mathbb R^d$, by
\begin{align*}
g_{x,y,h_n}(u)=h_n^{-d/2}\left(K\left(\frac{x-u}{h_n}\right)-K\left(\frac{y-u}{h_n}\right)\right).
\end{align*}
Then $g_{x,y,h_n}\in\mathcal G_{h_n,\delta}$ by definition. Using the definition of $\hat f_{h_n}$, the identity $\sqrt{n h_n^d}/(n h_n^d)=n^{-1/2}h_n^{-d/2}$, and distributing the centering,
\begin{align*}
Z_n(x)-Z_n(y)=\frac{1}{\sqrt n}\sum_{i=1}^n\left(g_{x,y,h_n}(X_i)-\mathbb E[g_{x,y,h_n}(X_i)]\right).
\end{align*}
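The intermediate line behind this display is spelled out below, assuming (as the scaling factor $\sqrt{n h_n^d}/(n h_n^d)$ used in the guided computation indicates) that the theorem defines $Z_n(z)=\sqrt{n h_n^d}\,\bigl(\hat f_{h_n}(z)-\mathbb E[\hat f_{h_n}(z)]\bigr)$ for $z\in A$. The factor $h_n^{-d/2}$ is exactly the one built into $g_{x,y,h_n}$.
\begin{align*}
Z_n(x)-Z_n(y)
&=\frac{\sqrt{n h_n^d}}{n h_n^d}\sum_{i=1}^n\left(K\left(\frac{x-X_i}{h_n}\right)-K\left(\frac{y-X_i}{h_n}\right)-\mathbb E\left[K\left(\frac{x-X_i}{h_n}\right)-K\left(\frac{y-X_i}{h_n}\right)\right]\right)\\
&=\frac{1}{\sqrt n}\sum_{i=1}^n h_n^{-d/2}\left(K\left(\frac{x-X_i}{h_n}\right)-K\left(\frac{y-X_i}{h_n}\right)-\mathbb E\left[K\left(\frac{x-X_i}{h_n}\right)-K\left(\frac{y-X_i}{h_n}\right)\right]\right)\\
&=\frac{1}{\sqrt n}\sum_{i=1}^n\left(g_{x,y,h_n}(X_i)-\mathbb E[g_{x,y,h_n}(X_i)]\right).
\end{align*}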
Therefore
\begin{align*}
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|
\le
\sup_{g\in\mathcal G_{h_n,\delta}}
\left|
\frac{1}{\sqrt n}
\sum_{i=1}^n
\left(g(X_i)-\mathbb E[g(X_i)]\right)
\right|.
\end{align*}
[guided]
The normalization in the theorem-statement definition of $\mathcal G_{h,\delta}$ is chosen so that the kernel-density scaling disappears exactly. The kernel density estimator is the map $\hat f_{h_n}:A\to\mathbb R$ defined by
\begin{align*}
\hat f_{h_n}(z)=\frac{1}{n h_n^d}\sum_{i=1}^n K\left(\frac{z-X_i}{h_n}\right),\qquad z\in A.
\end{align*}
In this step we use the class $\mathcal G_{h_n,\delta}$ obtained by setting $h=h_n$. Let $x,y\in A$ satisfy $|x-y|\le h_n\delta$, and let $g_{x,y,h_n}:\mathbb R^d\to\mathbb R$ be the measurable map defined, for $u\in\mathbb R^d$, by
\begin{align*}
g_{x,y,h_n}(u)=h_n^{-d/2}\left(K\left(\frac{x-u}{h_n}\right)-K\left(\frac{y-u}{h_n}\right)\right).
\end{align*}
This function belongs to $\mathcal G_{h_n,\delta}$ because the pair $(x,y)$ satisfies the required bandwidth-scale proximity constraint $|x-y|\le h_n\delta$.
Now compute the increment of the centred process directly. The scalar factor satisfies
\begin{align*}
\frac{\sqrt{n h_n^d}}{n h_n^d}=\frac{1}{\sqrt n\, h_n^{d/2}},
\end{align*}
and therefore
\begin{align*}
Z_n(x)-Z_n(y)=\frac{1}{\sqrt n}\sum_{i=1}^n\left(g_{x,y,h_n}(X_i)-\mathbb E[g_{x,y,h_n}(X_i)]\right).
\end{align*}
Taking the supremum over all admissible pairs $(x,y)$ gives a supremum over a subcollection of $\mathcal G_{h_n,\delta}$, so
\begin{align*}
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|
\le
\sup_{g\in\mathcal G_{h_n,\delta}}
\left|
\frac{1}{\sqrt n}
\sum_{i=1}^n
\left(g(X_i)-\mathbb E[g(X_i)]\right)
\right|.
\end{align*}
This is the essential reduction: stochastic equicontinuity of $Z_n$ at spatial scale $h_n$ becomes a maximal inequality for an empirical process indexed by normalized kernel increments.
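As a concrete illustration (an illustrative kernel choice, not one fixed by the theorem), take $d=1$ and the box kernel $K=\mathbf 1_{[-1/2,1/2]}$. Writing $W_z:=[z-h_n/2,\,z+h_n/2]$ for the window around $z$, we have $K((z-u)/h_n)=\mathbf 1_{W_z}(u)$, so the reduction reads
\begin{align*}
g_{x,y,h_n}(u)&=h_n^{-1/2}\left(\mathbf 1_{W_x}(u)-\mathbf 1_{W_y}(u)\right),\\
Z_n(x)-Z_n(y)&=\frac{1}{\sqrt{n h_n}}\sum_{i=1}^n\Bigl(\mathbf 1_{W_x}(X_i)-\mathbf 1_{W_y}(X_i)-\bigl(\mathbb P(X_1\in W_x)-\mathbb P(X_1\in W_y)\bigr)\Bigr).
\end{align*}
Each difference $\mathbf 1_{W_x}(X_i)-\mathbf 1_{W_y}(X_i)$ is nonzero only when $X_i$ lies in the symmetric difference $W_x\triangle W_y$, whose Lebesgue measure is $2\min(|x-y|,h_n)\le 2h_n\delta$. The increment class therefore indexes counts of sample points in thin sets at the bandwidth scale.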
[/guided]
[/step]
[step:Control the intrinsic $L^2(P)$ size of the increment class by the translation modulus]
Let $P$ denote the law of $X_1$, that is, the probability measure on $(\mathbb R^d,\mathcal B(\mathbb R^d))$ satisfying $P(B)=\mathbb P(X_1\in B)$ for every $B\in\mathcal B(\mathbb R^d)$. Let $M_f:=\|f\|_\infty$. For $g_{x,y,h}\in\mathcal G_{h,\delta}$, define $t_{x,y,h}:=(y-x)/h\in\mathbb R^d$. Since $|x-y|\le h\delta$, we have $|t_{x,y,h}|\le\delta$. Using the density of $X_1$ with respect to $\mathcal L^d$,
\begin{align*}
\mathbb E[g_{x,y,h}(X_1)^2]=\int_{\mathbb R^d}h^{-d}\left|K\left(\frac{x-u}{h}\right)-K\left(\frac{y-u}{h}\right)\right|^2 f(u)\,d\mathcal L^d(u).
\end{align*}
Since $f(u)\le M_f$ for $\mathcal L^d$-almost every $u\in\mathbb R^d$, this gives
\begin{align*}
\mathbb E[g_{x,y,h}(X_1)^2]\le M_f\int_{\mathbb R^d}h^{-d}\left|K\left(\frac{x-u}{h}\right)-K\left(\frac{y-u}{h}\right)\right|^2\,d\mathcal L^d(u).
\end{align*}
Apply the change-of-variables formula for [Lebesgue measure](/page/Lebesgue%20Measure) with the affine substitution $v=(x-u)/h$, equivalently $u=x-hv$. The Jacobian determinant of the inverse map $v\mapsto x-hv$ has absolute value $h^d$, so $d\mathcal L^d(u)=h^d\,d\mathcal L^d(v)$, and the domain $\mathbb R^d$ maps onto $\mathbb R^d$. Thus
\begin{align*}
\mathbb E[g_{x,y,h}(X_1)^2]\le M_f\int_{\mathbb R^d}|K(v)-K(v+t_{x,y,h})|^2\,d\mathcal L^d(v).
\end{align*}
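The two facts used in this substitution can be checked directly: with $u=x-hv$,
\begin{align*}
\frac{x-u}{h}=v,\qquad
\frac{y-u}{h}=\frac{y-x+hv}{h}=v+t_{x,y,h},\qquad
h^{-d}\,d\mathcal L^d(u)=h^{-d}h^{d}\,d\mathcal L^d(v)=d\mathcal L^d(v).
\end{align*}
So the prefactor $h^{-d}$ cancels exactly, and the bound depends on $x$, $y$ and $h$ only through the rescaled shift $t_{x,y,h}$.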
Because $|t_{x,y,h}|\le\delta$, the definition of $\omega_K(\delta)$ gives
\begin{align*}
\mathbb E[g_{x,y,h}(X_1)^2]\le M_f\,\omega_K(\delta)^2.
\end{align*}
Since $\operatorname{Var}(g_{x,y,h}(X_1))\le \mathbb E[g_{x,y,h}(X_1)^2]$, and the bound above depends neither on $(x,y)$ nor on $h$, it follows that, for every $h>0$,
\begin{align*}
\sup_{g\in\mathcal G_{h,\delta}}
\operatorname{Var}(g(X_1))
\le
M_f\,\omega_K(\delta)^2.
\end{align*}
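For an illustrative kernel, take $d=1$ and the box kernel $K=\mathbf 1_{[-1/2,1/2]}$, and assume $\omega_K(\delta)=\sup_{|t|\le\delta}\|K(\cdot+t)-K\|_{L^2(\mathcal L^1)}$, which is the form used in the estimate above. The integrand $|K(v)-K(v+t)|^2$ is then the indicator of the symmetric difference of $[-1/2,1/2]$ and $[-1/2-t,\,1/2-t]$, so for $0<\delta\le1$
\begin{align*}
\|K(\cdot+t)-K\|_{L^2(\mathcal L^1)}^2=2\min(|t|,1),\qquad
\omega_K(\delta)=\sqrt{2\delta},\qquad
\sup_{g\in\mathcal G_{h,\delta}}\operatorname{Var}(g(X_1))\le 2M_f\,\delta .
\end{align*}
A direct computation confirms the last bound. With $W_z:=[z-h/2,\,z+h/2]$, one has $g_{x,y,h}^2=h^{-1}\mathbf 1_{W_x\triangle W_y}$, hence $\mathbb E[g_{x,y,h}(X_1)^2]\le h^{-1}M_f\cdot 2|x-y|\le 2M_f\delta$. The modulus is of order $\delta^{1/2}$ rather than $\delta$, reflecting the discontinuity of $K$, but it still tends to $0$.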
[/step]
[step:Apply the local VC maximal bound to the increment supremum]
For each $n$, set
\begin{align*}
S_{n,\delta}
:=
\sup_{g\in\mathcal G_{h_n,\delta}}
\left|
\frac{1}{\sqrt n}
\sum_{i=1}^n
\left(g(X_i)-\mathbb E[g(X_i)]\right)
\right|.
\end{align*}
The local VC-type maximal bound is used here as an assumed empirical-process estimate for these increment classes; the preceding $L^2(P)$ computation records the radius scale that motivates this hypothesis. It supplies constants $h_0,C_0,A_0>0$ such that the maximal inequality displayed below holds for every $0<h\le h_0$, every $n\in\mathbb N$, and every $\delta>0$ with $\omega_K(\delta)>0$ and
\begin{align*}
\frac{A_0}{\omega_K(\delta)}>1;
\end{align*}
the second condition keeps the logarithms below positive. Call such $\delta$ admissible. Since $\omega_K(\delta)\to0$ as $\delta\downarrow0$, the second condition holds automatically for every sufficiently small $\delta$ with $\omega_K(\delta)>0$, so, apart from the degenerate case $\omega_K(\delta)=0$ treated below, it suffices to work with admissible $\delta$. Because $h_n\downarrow0$, there exists $N\in\mathbb N$ such that $h_n\le h_0$ for all $n\ge N$. For admissible $\delta$ and every $n\ge N$, applying the theorem-statement maximal bound with $h=h_n$ gives
\begin{align*}
\mathbb E[S_{n,\delta}]
\le
C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}
+
C_0\,\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}.
\end{align*}
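The condition $(n h_n^d)/\log n\to\infty$ implies $n h_n^d\to\infty$, because $\log n\ge1$ for $n\ge3$ and hence $n h_n^d\ge (n h_n^d)/\log n$ for such $n$. The second term therefore vanishes as $n\to\infty$, and for every such fixed $\delta$,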
\begin{align*}
\limsup_{n\to\infty}\mathbb E[S_{n,\delta}]
\le
C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}.
\end{align*}
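As an illustration only (the theorem allows any bandwidth sequence satisfying its hypotheses), take $h_n=n^{-1/(d+4)}$, which decreases to $0$. Then, for each fixed $\delta$ for which the maximal bound applies,
\begin{align*}
n h_n^d=n^{4/(d+4)},\qquad
\frac{n h_n^d}{\log n}=\frac{n^{4/(d+4)}}{\log n}\to\infty,\qquad
C_0\,\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}=C_0\,\log\frac{A_0}{\omega_K(\delta)}\;n^{-2/(d+4)}\to0,
\end{align*}
so for this choice the residual term decays polynomially in $n$.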
The classes are monotone in $\delta$: if $0<\delta_1\le\delta_2$, then $\mathcal G_{h_n,\delta_1}\subseteq\mathcal G_{h_n,\delta_2}$ and hence $S_{n,\delta_1}\le S_{n,\delta_2}$, so $\delta\mapsto\limsup_{n\to\infty}\mathbb E[S_{n,\delta}]$ is nondecreasing and its limit as $\delta\downarrow0$ exists. It remains to account for radii with $\omega_K(\delta)=0$. Since $\omega_K$ is nondecreasing (as a supremum over the growing sets $\{|t|\le\delta\}$), there are two cases. If $\omega_K(\delta)>0$ for every $\delta>0$, then every sufficiently small $\delta$ is admissible and the preceding bound applies directly. If instead $\omega_K$ vanishes on an interval $(0,\delta_0)$, then for every $|t|<\delta_0$ the translation identity $K(\cdot+t)=K(\cdot)$ holds in $L^2(\mathcal L^d)$, so the preceding $L^2(P)$ estimate gives $\operatorname{Var}(g(X_1))=0$ for every $g\in\mathcal G_{h,\delta}$ with $0<\delta<\delta_0$. Under the measurable-version convention fixed in the first step, or alternatively under the corresponding outer expectation and outer probability formulation, the centred empirical-process version of each such increment is zero, so the chosen supremum satisfies $S_{n,\delta}=0$ almost surely. Since $\omega_K(\delta)\to0$ as $\delta\downarrow0$ and $r\sqrt{\log(A_0/r)}\to0$ as $r\downarrow0$, in either case
\begin{align*}
\lim_{\delta\downarrow0}
\limsup_{n\to\infty}
\mathbb E[S_{n,\delta}]
=
0.
\end{align*}
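The elementary limit used above can be checked with the substitution $r=A_0e^{-s}$, under which $r\in(0,A_0)$ corresponds to $s>0$ and $r\downarrow0$ corresponds to $s\to\infty$:
\begin{align*}
r\sqrt{\log\frac{A_0}{r}}=A_0\,e^{-s}\sqrt{s}<A_0\,s^{-1/2}\longrightarrow0\qquad(s\to\infty),
\end{align*}
where the inequality uses $e^{s}>s$ for $s>0$.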
[guided]
The previous step identifies the correct intrinsic scale of the class: its $L^2(P)$ radius is at most $\sqrt{M_f}\,\omega_K(\delta)$, uniformly in $h>0$, where $P$ denotes the law of $X_1$. The theorem assumes precisely the empirical-process input needed at this scale; the compactness of $A$, the compact support and square-integrability of $K$, and the VC-type bounded-envelope assumptions are used here only as the background hypotheses that make that local maximal inequality available. Define the non-negative [random variable](/page/Random%20Variable)
\begin{align*}
S_{n,\delta}
:=
\sup_{g\in\mathcal G_{h_n,\delta}}
\left|
\frac{1}{\sqrt n}
\sum_{i=1}^n
\left(g(X_i)-\mathbb E[g(X_i)]\right)
\right|.
\end{align*}
The local VC-type maximal bound is used as an assumed empirical-process estimate for these classes; the preceding $L^2(P)$ computation explains the radius scale but is not an additional condition needed to invoke the bound. It provides constants $h_0,C_0,A_0>0$ such that, whenever $0<h\le h_0$ and $\omega_K(\delta)>0$, the expected supremum over $\mathcal G_{h,\delta}$ is bounded by the two terms displayed below. Since the structural hypotheses are encoded in that bound, the present proof only needs to check that the bandwidth sequence eventually lies in the allowed range: because $h_n\downarrow0$, there exists $N\in\mathbb N$ such that $h_n\le h_0$ for every $n\ge N$.
There is one endpoint issue to handle before taking logarithms. We apply the maximal inequality only for admissible $\delta>0$, meaning $\omega_K(\delta)>0$ and $A_0/\omega_K(\delta)>1$, exactly as required by the theorem-statement hypothesis; since $\omega_K(\delta)\to0$ as $\delta\downarrow0$, the second condition is automatic for small $\delta$ once the first holds. This loses no information in the final limit. Indeed, if $0<\delta_1\le\delta_2$, then the defining constraint $|x-y|\le h_n\delta_1$ implies $|x-y|\le h_n\delta_2$, so $\mathcal G_{h_n,\delta_1}\subseteq\mathcal G_{h_n,\delta_2}$ and therefore $S_{n,\delta_1}\le S_{n,\delta_2}$. Because $\omega_K$ is nondecreasing, two cases remain. If $\omega_K(\delta)>0$ for every $\delta>0$, then every sufficiently small radius is admissible. Otherwise $\omega_K$ vanishes on a whole interval $(0,\delta_0)$; for every $|t|<\delta_0$, the equality $\|K(\cdot+t)-K\|_{L^2(\mathcal L^d)}=0$ means $K(\cdot+t)=K$ in $L^2(\mathcal L^d)$. The previous $L^2(P)$ estimate then gives $\operatorname{Var}(g(X_1))=0$ for every $g\in\mathcal G_{h,\delta}$ with $0<\delta<\delta_0$, so each $g(X_i)$ equals its mean almost surely. Under the measurable-version convention fixed at the start of the proof, or alternatively under outer expectation and outer probability, the centred empirical-process version of every such increment is zero in the chosen supremum. Therefore $S_{n,\delta}=0$ almost surely for those radii. For every admissible $\delta$ and every $n\ge N$, the theorem-statement maximal bound applies with $h=h_n$ and gives
\begin{align*}
\mathbb E[S_{n,\delta}]
\le
C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}
+
C_0\,\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}.
\end{align*}
Here the first term is the Gaussian-size term governed by the local $L^2(P)$ radius $\omega_K(\delta)$, and the second term is the residual contribution of the bounded envelope.
Now fix such an admissible $\delta>0$. The positive number $\log(A_0/\omega_K(\delta))$ is independent of $n$. The bandwidth hypothesis
\begin{align*}
\frac{n h_n^d}{\log n}\to\infty
\end{align*}
implies $n h_n^d\to\infty$, so
\begin{align*}
\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}\to0
\end{align*}
as $n\to\infty$. Taking the upper limit in $n$ therefore yields
\begin{align*}
\limsup_{n\to\infty}\mathbb E[S_{n,\delta}]
\le
C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}.
\end{align*}
Finally, the assumed $L^2(\mathcal L^d)$ translation continuity gives $\omega_K(\delta)\to0$ as $\delta\downarrow0$. The elementary limit
\begin{align*}
r\sqrt{\log(A_0/r)}\to0
\end{align*}
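as $r\downarrow0$ applies with $r=\omega_K(\delta)$ along admissible radii, while in the vanishing case treated above $\mathbb E[S_{n,\delta}]=0$ for all sufficiently small $\delta$. By the monotonicity $S_{n,\delta_1}\le S_{n,\delta_2}$ for $\delta_1\le\delta_2$, the map $\delta\mapsto\limsup_{n\to\infty}\mathbb E[S_{n,\delta}]$ is nondecreasing, so its limit as $\delta\downarrow0$ exists. Hence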
\begin{align*}
\lim_{\delta\downarrow0}
\limsup_{n\to\infty}
\mathbb E[S_{n,\delta}]
=
0.
\end{align*}
This is the analytic core of the proof: stochastic equicontinuity at the bandwidth scale does not require pointwise Lipschitz regularity of $K$; the $L^2$ translation modulus suffices once the class admits a local VC-type maximal inequality.
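To see this in a simple case, consider $d=1$ and the discontinuous box kernel $K=\mathbf 1_{[-1/2,1/2]}$, assuming the assumed maximal bound is available for it and that $\omega_K(\delta)=\sup_{|t|\le\delta}\|K(\cdot+t)-K\|_{L^2(\mathcal L^1)}$. Then $\|K(\cdot+t)-K\|_{L^2(\mathcal L^1)}^2=2\min(|t|,1)$, so $\omega_K(\delta)=\sqrt{2\delta}$ for $0<\delta\le1$. This $K$ is not Lipschitz, yet for $0<\delta<\min\{1,A_0^2/2\}$ the first term of the maximal bound is
\begin{align*}
C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}
=C_0\sqrt{2\delta}\,\sqrt{\log\frac{A_0}{\sqrt{2\delta}}}
=C_0\sqrt{\delta\,\log\frac{A_0^2}{2\delta}}\longrightarrow0\qquad(\delta\downarrow0).
\end{align*}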
[/guided]
[/step]
[step:Convert the expectation bound into the required probability limit]
Let $\varepsilon>0$. From the first step,
\begin{align*}
\mathbb P\left(
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|>\varepsilon
\right)
\le
\mathbb P(S_{n,\delta}>\varepsilon).
\end{align*}
Since $S_{n,\delta}\ge0$, [Markov's inequality](/theorems/514) gives
\begin{align*}
\mathbb P(S_{n,\delta}>\varepsilon)
\le
\frac{1}{\varepsilon}\mathbb E[S_{n,\delta}].
\end{align*}
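Combining this with the maximal bound of the previous step gives an explicit finite-sample estimate: for every $\delta$ with $\omega_K(\delta)>0$ and $A_0/\omega_K(\delta)>1$, and every $n\ge N$,
\begin{align*}
\mathbb P\left(
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|>\varepsilon
\right)
\le
\frac{C_0}{\varepsilon}\left(\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}+\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}\right).
\end{align*}
The second term vanishes as $n\to\infty$ and the first as $\delta\downarrow0$, which is the order in which the limits are taken.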
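Taking $\limsup_{n\to\infty}$ and then letting $\delta\downarrow0$ (both sides are nondecreasing in $\delta$, because the constraint set and $\mathcal G_{h_n,\delta}$ grow with $\delta$, so these limits exist),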
\begin{align*}
\lim_{\delta\downarrow0}\limsup_{n\to\infty}
\mathbb P\left(
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|>\varepsilon
\right)
\le
\frac{1}{\varepsilon}
\lim_{\delta\downarrow0}\limsup_{n\to\infty}\mathbb E[S_{n,\delta}].
\end{align*}
The expectation limit from the previous step is zero, so the left-hand side is at most $0$; since it is also non-negative,
\begin{align*}
\lim_{\delta\downarrow0}\limsup_{n\to\infty}
\mathbb P\left(
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|>\varepsilon
\right)
=0.
\end{align*}
This proves the asserted bandwidth-scale stochastic equicontinuity.
[guided]
We now translate the expected empirical-process bound into the probability statement in the theorem. Fix $\varepsilon>0$. The first step proved the pointwise comparison
\begin{align*}
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|
\le S_{n,\delta}.
\end{align*}
Therefore the event $\bigl\{\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}|Z_n(x)-Z_n(y)|>\varepsilon\bigr\}$ is contained in the event $\{S_{n,\delta}>\varepsilon\}$, and hence
\begin{align*}
\mathbb P\left(
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|>\varepsilon
\right)
\le
\mathbb P(S_{n,\delta}>\varepsilon).
\end{align*}
The random variable $S_{n,\delta}$ is non-negative by definition, so [Markov's inequality](/theorems/514) applies and gives
\begin{align*}
\mathbb P(S_{n,\delta}>\varepsilon)
\le
\frac{1}{\varepsilon}\mathbb E[S_{n,\delta}].
\end{align*}
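Both sides are nondecreasing in $\delta$, because the constraint set and the class $\mathcal G_{h_n,\delta}$ grow with $\delta$, so the limits as $\delta\downarrow0$ exist. Taking the upper limit in $n$ and then letting $\delta\downarrow0$ yields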
\begin{align*}
\lim_{\delta\downarrow0}\limsup_{n\to\infty}
\mathbb P\left(
\sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}}
|Z_n(x)-Z_n(y)|>\varepsilon
\right)
\le
\frac{1}{\varepsilon}
\lim_{\delta\downarrow0}\limsup_{n\to\infty}
\mathbb E[S_{n,\delta}].
\end{align*}
The previous step proved that the expectation limit on the right-hand side is $0$. Since probabilities are non-negative, the left-hand side is both at most $0$ and at least $0$, so it equals $0$. This is exactly the asserted bandwidth-scale stochastic equicontinuity.
[/guided]
[/step]