[proofplan]
We split the uniform error into a stochastic fluctuation term and a deterministic bias term. The bias is the difference between the expected estimator $\mathbb{E}[\hat f_h(x)]=Pg_{x,h}$ and $f(x)$, and it vanishes uniformly because $f$ is uniformly continuous while $K$ is compactly supported and has integral $1$. The stochastic term is controlled by a VC-type maximal inequality for the empirical process indexed by the bandwidth-scaled kernel class. The bandwidth assumptions force both the square-root term and the linear term of that inequality to vanish, so the two pieces combine to give convergence in probability.
[/proofplan]
[step:Decompose the estimator into empirical fluctuation and convolution bias]
Let $P$ denote the common law of $X_1$, and for each $n \in \mathbb{N}$ let $P_n$ denote the empirical probability measure
\begin{align*}
P_n(B):=\frac{1}{n}\sum_{i=1}^n \mathbb{1}_B(X_i)
\end{align*}
for Borel sets $B \subset \mathbb{R}^d$. For $h>0$ and $x \in \mathbb{R}^d$, define the measurable function $g_{x,h}: \mathbb{R}^d \to \mathbb{R}$ by
\begin{align*}
g_{x,h}(u)=h^{-d}K\left(\frac{x-u}{h}\right).
\end{align*}
The kernel density estimator at bandwidth $h$ is therefore
\begin{align*}
\hat f_h(x) = P_n g_{x,h}.
\end{align*}
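Written out, and assuming the estimator is defined in the usual way as an average of rescaled kernels (its definition is not restated in this proof), the identity $\hat f_h(x)=P_ng_{x,h}$ unpacks as
\begin{align*}
P_n g_{x,h}
=\int_{\mathbb{R}^d} g_{x,h}(u)\,dP_n(u)
=\frac{1}{n}\sum_{i=1}^n g_{x,h}(X_i)
=\frac{1}{n}\sum_{i=1}^n h^{-d}K\left(\frac{x-X_i}{h}\right).
\end{align*}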
Since each $X_i$ has law $P$, linearity of expectation gives
\begin{align*}
\mathbb{E}[\hat f_h(x)] = P g_{x,h}.
\end{align*}
Hence, for every compact set $A \subset \mathbb{R}^d$,
\begin{align*}
\|\hat f_h-f\|_{\infty,A}
\leq
\sup_{x \in A}|(P_n-P)g_{x,h}|
+
\sup_{x \in A}|Pg_{x,h}-f(x)|.
\end{align*}
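The inequality can be checked pointwise. Assuming that $\|\cdot\|_{\infty,A}$ denotes the supremum over $x \in A$ (the notation is not defined in this proof), adding and subtracting $Pg_{x,h}$ gives, for each $x \in A$,
\begin{align*}
\hat f_h(x)-f(x) &= (P_n-P)g_{x,h} + \bigl(Pg_{x,h}-f(x)\bigr),\\
|\hat f_h(x)-f(x)| &\leq |(P_n-P)g_{x,h}| + |Pg_{x,h}-f(x)|,
\end{align*}
and taking the supremum over $x \in A$ on both sides yields the stated bound.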
It remains to prove that, when $h=h_n$, the first term on the right converges to $0$ in probability and the second term converges to $0$ deterministically.
[/step]
[step:Show that the convolution bias vanishes uniformly on compact sets]
Let $R>0$ be such that $\operatorname{supp}K \subset \overline{B}(0,R)$. Since $K$ is bounded and compactly supported, $K \in L^1(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d),\mathcal{L}^d)$, and we define
\begin{align*}
M_K:=\int_{\mathbb{R}^d}|K(z)|\,d\mathcal{L}^d(z)<\infty.
\end{align*}
For $x \in \mathbb{R}^d$ and $h>0$, since $P$ has density $f$ with respect to $\mathcal{L}^d$,
\begin{align*}
Pg_{x,h}=\int_{\mathbb{R}^d} h^{-d}K\left(\frac{x-u}{h}\right)f(u)\,d\mathcal{L}^d(u).
\end{align*}
The substitution $z=(x-u)/h$, equivalently $u=x-hz$, transforms $d\mathcal{L}^d(u)$ into $h^d\,d\mathcal{L}^d(z)$, so the same quantity is
\begin{align*}
Pg_{x,h}=\int_{\mathbb{R}^d}K(z)f(x-hz)\,d\mathcal{L}^d(z).
\end{align*}
Using $\int_{\mathbb{R}^d}K(z)\,d\mathcal{L}^d(z)=1$, we obtain
\begin{align*}
|Pg_{x,h}-f(x)|\leq \int_{\mathbb{R}^d}|K(z)|\,|f(x-hz)-f(x)|\,d\mathcal{L}^d(z).
\end{align*}
Since $K(z)=0$ for $z \notin \overline{B}(0,R)$, this bound becomes
\begin{align*}
|Pg_{x,h}-f(x)|\leq \int_{\overline{B}(0,R)}|K(z)|\,|f(x-hz)-f(x)|\,d\mathcal{L}^d(z).
\end{align*}
Because $f$ is uniformly continuous on $\mathbb{R}^d$, for every $\varepsilon>0$ there exists $\delta>0$ such that
\begin{align*}
|f(y)-f(x)|<\frac{\varepsilon}{1+M_K}
\end{align*}
whenever $|y-x|<\delta$. If $h>0$ satisfies
\begin{align*}
0<h<\frac{\delta}{1+R}
\end{align*}
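and $z \in \overline{B}(0,R)$, then $|(x-hz)-x|=h|z|\leq hR<\delta$ for every $x \in \mathbb{R}^d$, so the integrand in the previous display is at most $|K(z)|\,\varepsilon/(1+M_K)$, and therefore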
\begin{align*}
\sup_{x \in A}|Pg_{x,h}-f(x)|
\leq
\frac{\varepsilon}{1+M_K}M_K
<\varepsilon.
\end{align*}
Thus
\begin{align*}
\sup_{x \in A}|Pg_{x,h}-f(x)| \to 0
\end{align*}
as $h \downarrow 0$.
[guided]
The deterministic term is the bias of the kernel estimator. We first rewrite it as an approximate identity acting on $f$. Let $R>0$ be chosen so that $\operatorname{supp}K \subset \overline{B}(0,R)$. Since $K$ is bounded and vanishes outside the compact set $\overline{B}(0,R)$, it is absolutely integrable with respect to [Lebesgue measure](/page/Lebesgue%20Measure), so the finite constant
\begin{align*}
M_K:=\int_{\mathbb{R}^d}|K(z)|\,d\mathcal{L}^d(z)
\end{align*}
is well-defined.
For $x \in \mathbb{R}^d$ and $h>0$, compute the expectation using the density of $X_1$. The map $u \mapsto h^{-d}K((x-u)/h)$ is Borel measurable and bounded because $K$ is Borel measurable and bounded. Thus
\begin{align*}
Pg_{x,h}
=
\int_{\mathbb{R}^d} h^{-d}K\left(\frac{x-u}{h}\right)f(u)\,d\mathcal{L}^d(u).
\end{align*}
Now use the affine change of variables $z=(x-u)/h$, or $u=x-hz$. The Jacobian determinant has absolute value $h^d$, so
\begin{align*}
d\mathcal{L}^d(u)=h^d\,d\mathcal{L}^d(z).
\end{align*}
The whole domain $\mathbb{R}^d$ maps onto $\mathbb{R}^d$, and therefore
\begin{align*}
Pg_{x,h}
=
\int_{\mathbb{R}^d}K(z)f(x-hz)\,d\mathcal{L}^d(z).
\end{align*}
Since the kernel has unit integral,
\begin{align*}
f(x)=f(x)\int_{\mathbb{R}^d}K(z)\,d\mathcal{L}^d(z).
\end{align*}
Subtracting this identity from the previous display and applying the triangle inequality gives
\begin{align*}
|Pg_{x,h}-f(x)|\leq \int_{\mathbb{R}^d}|K(z)|\,|f(x-hz)-f(x)|\,d\mathcal{L}^d(z).
\end{align*}
Since $K(z)=0$ for $z \notin \overline{B}(0,R)$, the integral over $\mathbb{R}^d$ is the same as the integral over $\overline{B}(0,R)$, so
\begin{align*}
|Pg_{x,h}-f(x)|\leq \int_{\overline{B}(0,R)}|K(z)|\,|f(x-hz)-f(x)|\,d\mathcal{L}^d(z).
\end{align*}
The reason compact support matters is that $hz$ is uniformly small for every $z$ contributing to the integral. Let $\varepsilon>0$. [Uniform continuity](/page/Uniform%20Continuity) of $f$ on all of $\mathbb{R}^d$ gives a number $\delta>0$ such that
\begin{align*}
|y-x|<\delta
\implies
|f(y)-f(x)|<\frac{\varepsilon}{1+M_K}
\end{align*}
for all $x,y \in \mathbb{R}^d$. If $h>0$ satisfies
\begin{align*}
0<h<\frac{\delta}{1+R}
\end{align*}
and $z \in \overline{B}(0,R)$, then
\begin{align*}
|x-hz-x|=h|z|\leq hR<\delta.
\end{align*}
Hence
\begin{align*}
\sup_{x \in A}|Pg_{x,h}-f(x)|\leq \int_{\overline{B}(0,R)}|K(z)|\frac{\varepsilon}{1+M_K}\,d\mathcal{L}^d(z).
\end{align*}
Using the definition of $M_K$, this gives
\begin{align*}
\sup_{x \in A}|Pg_{x,h}-f(x)|\leq \frac{\varepsilon M_K}{1+M_K}<\varepsilon.
\end{align*}
This proves that the bias converges to zero uniformly on $A$ as $h \downarrow 0$.
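As an illustration of how the argument becomes quantitative, suppose in addition that $f$ is Lipschitz with constant $L>0$. This is an assumption made only for this example, not a hypothesis of the theorem, and $L$ is a hypothetical constant introduced here. Then $|f(x-hz)-f(x)|\leq Lh|z|\leq LhR$ for $z \in \overline{B}(0,R)$, and the integral bound above gives
\begin{align*}
\sup_{x \in A}|Pg_{x,h}-f(x)|
\leq
\int_{\overline{B}(0,R)}|K(z)|\,LhR\,d\mathcal{L}^d(z)
=
LRM_K\,h,
\end{align*}
so in this special case the bias is of order $h$. Under uniform continuity alone the argument gives convergence to zero but no rate.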
[/guided]
[/step]
[step:Control the empirical fluctuation by the VC kernel maximal inequality]
Define, for each $h>0$ and compact $A \subset \mathbb{R}^d$, the function class
\begin{align*}
\mathcal{G}_{h,A}:=\{g_{x,h}: x \in A\}.
\end{align*}
We use the VC-subgraph and pointwise-measurability assumptions on $\mathcal K$. For fixed $h>0$ and compact $A$, the restricted scaled class $\mathcal{G}_{h,A}$ is pointwise measurable because pointwise measurability is inherited by restricting the index set and multiplying every function by the fixed positive scalar $h^{-d}$. The subgraph VC dimension is unchanged by multiplication by the positive scalar $h^{-d}$, because
\begin{align*}
\{(u,t):h^{-d}K((x-u)/h)>t\}=\{(u,t):K((x-u)/h)>h^d t\},
\end{align*}
and the map $(u,t)\mapsto (u,h^dt)$ is a bijection of $\mathbb{R}^d\times\mathbb{R}$ preserving shattering relations.
Let $B_K:=\sup_{z \in \mathbb{R}^d}|K(z)|<\infty$. Define the constant envelope $G_h: \mathbb{R}^d \to [0,\infty)$ by
\begin{align*}
G_h(u)=h^{-d}B_K.
\end{align*}
This is a measurable envelope for $\mathcal{G}_{h,A}$. Since $\mathcal{K}$ is VC-subgraph, there exist constants $a\geq e$ and $v\geq 1$, depending only on the VC characteristics of $\mathcal{K}$, such that for every probability measure $Q$ on $\mathbb{R}^d$ and every $0<\eta\leq 1$,
\begin{align*}
N\left(\eta\|G_h\|_{L^2(Q)},\mathcal{G}_{h,A},L^2(Q)\right)
\leq
\left(\frac{a}{\eta}\right)^v.
\end{align*}
This is the polynomial entropy hypothesis required by the VC maximal inequality applied below.
The variance is bounded uniformly in $x \in A$. First,
\begin{align*}
P g_{x,h}^2=\int_{\mathbb{R}^d}h^{-2d}K\left(\frac{x-u}{h}\right)^2 f(u)\,d\mathcal{L}^d(u).
\end{align*}
Using the affine substitution $z=(x-u)/h$, equivalently $u=x-hz$, the measure transforms as $d\mathcal{L}^d(u)=h^d\,d\mathcal{L}^d(z)$ and the domain remains $\mathbb{R}^d$. Hence
\begin{align*}
P g_{x,h}^2=h^{-d}\int_{\mathbb{R}^d}K(z)^2f(x-hz)\,d\mathcal{L}^d(z).
\end{align*}
Since $f$ is bounded and $K$ is bounded with compact support,
\begin{align*}
P g_{x,h}^2\leq h^{-d}\|f\|_{\infty}\int_{\mathbb{R}^d}K(z)^2\,d\mathcal{L}^d(z).
\end{align*}
Set
\begin{align*}
C_{K,f}:=
\|f\|_{\infty}
\int_{\mathbb{R}^d}K(z)^2\,d\mathcal{L}^d(z)<\infty.
\end{align*}
Thus $\sup_{g\in\mathcal{G}_{h,A}}Pg^2\leq C_{K,f}h^{-d}$.
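The finiteness of $C_{K,f}$ can be made explicit with constants already introduced, namely $B_K$ above and $M_K$ from the bias step. A short check, using only that $K$ is bounded and integrable:
\begin{align*}
\int_{\mathbb{R}^d}K(z)^2\,d\mathcal{L}^d(z)
\leq
B_K\int_{\mathbb{R}^d}|K(z)|\,d\mathcal{L}^d(z)
=
B_KM_K,
\qquad\text{so}\qquad
C_{K,f}\leq \|f\|_{\infty}B_KM_K.
\end{align*}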
We use the following VC empirical-process maximal inequality with Bernstein-Talagrand concentration, in the form of Theorem 2.14.1 in van der Vaart and Wellner's Weak Convergence and Empirical Processes, together with the standard Bernstein-Talagrand tail integration: if a pointwise measurable class $\mathcal{F}$ has measurable envelope $F$, polynomial covering numbers
\begin{align*}
N\left(\eta\|F\|_{L^2(Q)},\mathcal{F},L^2(Q)\right)\leq \left(\frac{a}{\eta}\right)^v
\end{align*}
for all probability measures $Q$ and all $0<\eta\leq 1$, and if $\sup_{g\in\mathcal F}Pg^2\leq \sigma^2$, then the measurable [random variable](/page/Random%20Variable) $\sup_{g\in\mathcal F}|(P_n-P)g|$ satisfies
\begin{align*}
\sup_{g\in\mathcal F}|(P_n-P)g|
=
O_{\mathbb P}\left(\sqrt{\frac{\sigma^2\Lambda}{n}}\right)
+
O_{\mathbb P}\left(\frac{\|F\|_{\infty}\Lambda}{n}\right),
\end{align*}
where
\begin{align*}
\Lambda:=1+\log\left(\frac{a\|F\|_{\infty}}{\sigma}\right),
\end{align*}
and the implicit constants depend only on $a$ and $v$. Apply this result with $\mathcal F=\mathcal G_{h,A}$, $F=G_h$, and $\sigma^2=C_{K,f}h^{-d}$. The preceding paragraphs verify pointwise measurability, polynomial entropy, measurability of the envelope, and the variance bound. Moreover $\|G_h\|_{\infty}=B_Kh^{-d}$, so
\begin{align*}
\Lambda
=
1+\log\left(\frac{aB_Kh^{-d}}{C_{K,f}^{1/2}h^{-d/2}}\right)
\leq C_1\log(e/h)
\end{align*}
for a constant $C_1=C_1(a,B_K,C_{K,f},d)>0$ and all sufficiently small $h>0$. Therefore, along $h=h_n$,
\begin{align*}
\sup_{x \in A}|(P_n-P)g_{x,h_n}|
=
O_{\mathbb{P}}\left(\sqrt{\frac{\log(e/h_n)}{n h_n^d}}\right)+O_{\mathbb{P}}\left(\frac{\log(e/h_n)}{n h_n^d}\right)
\end{align*}
as $n \to \infty$ and $h_n \downarrow 0$, with implicit constants depending only on $a$, $v$, $B_K$, $C_{K,f}$, and $d$.
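For the record, the two rates come from the following substitutions, a sketch that uses only $\sigma^2=C_{K,f}h^{-d}$, $\|G_h\|_{\infty}=B_Kh^{-d}$, and $\Lambda\leq C_1\log(e/h)$ for small $h>0$:
\begin{align*}
\sqrt{\frac{\sigma^2\Lambda}{n}}
&\leq
\sqrt{\frac{C_{K,f}h^{-d}\,C_1\log(e/h)}{n}}
=
\sqrt{C_{K,f}C_1}\,\sqrt{\frac{\log(e/h)}{nh^d}},\\
\frac{\|G_h\|_{\infty}\Lambda}{n}
&\leq
\frac{B_Kh^{-d}\,C_1\log(e/h)}{n}
=
B_KC_1\,\frac{\log(e/h)}{nh^d}.
\end{align*}
The factors $\sqrt{C_{K,f}C_1}$ and $B_KC_1$ do not depend on $n$ or $h$, so they are absorbed into the $O_{\mathbb{P}}$ terms.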
[guided]
The technically important point is that the empirical-process theorem is applied to the scaled class $\mathcal G_{h,A}$, so both the variance and the envelope depend on $h$. We verify the hypotheses from scratch. For fixed $h>0$ and compact $A \subset \mathbb{R}^d$, define
\begin{align*}
\mathcal{G}_{h,A}:=\{g_{x,h}:x\in A\},
\end{align*}
where $g_{x,h}:\mathbb{R}^d\to\mathbb{R}$ is the measurable function
\begin{align*}
g_{x,h}(u)=h^{-d}K\left(\frac{x-u}{h}\right).
\end{align*}
The maximal inequality is applied in its ordinary-probability form, so pointwise measurability of the restricted class $\mathcal G_{h,A}$ has to be checked. It is inherited from the original class by restricting the index set and multiplying the resulting countable approximating subclass by $h^{-d}$. The VC-subgraph property is also available for the scaled class: multiplication by $h^{-d}>0$ sends subgraphs to subgraphs after the vertical change of variables $t\mapsto h^dt$, and this change preserves the relevant shattering numbers. If
\begin{align*}
B_K:=\sup_{z\in\mathbb{R}^d}|K(z)|,
\end{align*}
then the constant function $G_h:\mathbb{R}^d\to[0,\infty)$ defined by
\begin{align*}
G_h(u)=h^{-d}B_K
\end{align*}
is a measurable envelope for $\mathcal G_{h,A}$, and
\begin{align*}
\|G_h\|_\infty=B_Kh^{-d}.
\end{align*}
Because the VC-subgraph property is preserved under multiplication by a positive scalar, the scaled class has polynomial entropy: there are constants $a\geq e$ and $v\geq 1$, depending only on the VC characteristics of the kernel class, such that for every probability measure $Q$ on $\mathbb{R}^d$ and every $0<\eta\leq 1$,
\begin{align*}
N\left(\eta\|G_h\|_{L^2(Q)},\mathcal{G}_{h,A},L^2(Q)\right)
\leq
\left(\frac{a}{\eta}\right)^v.
\end{align*}
It remains to compute the second-moment scale. For $x\in A$,
\begin{align*}
Pg_{x,h}^2=\int_{\mathbb{R}^d}h^{-2d}K\left(\frac{x-u}{h}\right)^2f(u)\,d\mathcal{L}^d(u).
\end{align*}
Use the affine substitution $z=(x-u)/h$, equivalently $u=x-hz$. This maps $\mathbb{R}^d$ onto $\mathbb{R}^d$ and transforms the measure by
\begin{align*}
d\mathcal{L}^d(u)=h^d\,d\mathcal{L}^d(z).
\end{align*}
Therefore
\begin{align*}
Pg_{x,h}^2=h^{-d}\int_{\mathbb{R}^d}K(z)^2f(x-hz)\,d\mathcal{L}^d(z).
\end{align*}
Since $f$ is bounded and $K$ is bounded with compact support, define
\begin{align*}
C_{K,f}:=
\|f\|_{\infty}
\int_{\mathbb{R}^d}K(z)^2\,d\mathcal{L}^d(z)<\infty.
\end{align*}
Then
\begin{align*}
\sup_{g\in\mathcal G_{h,A}}Pg^2\leq C_{K,f}h^{-d}.
\end{align*}
We now apply the VC empirical-process maximal inequality with Bernstein-Talagrand concentration, in the cited entropy form for pointwise measurable VC-type classes. Its hypotheses in this formulation are pointwise measurability, the measurable envelope, polynomial entropy, and the second-moment bound verified above. Applying it with $\mathcal F=\mathcal G_{h,A}$, $F=G_h$, and $\sigma^2=C_{K,f}h^{-d}$ gives the ordinary-probability estimate
\begin{align*}
\sup_{g\in\mathcal G_{h,A}}|(P_n-P)g|
=
O_{\mathbb P}\left(\sqrt{\frac{\sigma^2\Lambda}{n}}\right)
+
O_{\mathbb P}\left(\frac{\|G_h\|_\infty\Lambda}{n}\right),
\end{align*}
where
\begin{align*}
\Lambda:=1+\log\left(\frac{a\|G_h\|_\infty}{\sigma}\right).
\end{align*}
Substituting $\|G_h\|_\infty=B_Kh^{-d}$ and $\sigma=C_{K,f}^{1/2}h^{-d/2}$ yields
\begin{align*}
\Lambda
=
1+\log\left(\frac{aB_Kh^{-d}}{C_{K,f}^{1/2}h^{-d/2}}\right)
\leq C_1\log(e/h)
\end{align*}
for a constant $C_1=C_1(a,B_K,C_{K,f},d)>0$ and all sufficiently small $h>0$. This logarithmic factor is the entropy logarithm evaluated at the ratio of the envelope size to the standard-deviation scale. Substituting this bound and $\sigma^2=C_{K,f}h^{-d}$ into the maximal inequality gives
\begin{align*}
\sup_{x \in A}|(P_n-P)g_{x,h}|
=
O_{\mathbb{P}}\left(\sqrt{\frac{\log(e/h)}{n h^d}}\right)+O_{\mathbb{P}}\left(\frac{\log(e/h)}{n h^d}\right),
\end{align*}
with constants depending only on $a$, $v$, $B_K$, $C_{K,f}$, and $d$.
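One explicit admissible choice of $C_1$ can be read off by expanding the logarithm. Writing $c_0:=1+\log\left(aB_KC_{K,f}^{-1/2}\right)$, a shorthand introduced only for this remark, we have for $0<h\leq 1$
\begin{align*}
\Lambda
=
c_0+\frac{d}{2}\log(1/h)
\leq
|c_0|\log(e/h)+\frac{d}{2}\log(e/h)
=
\left(|c_0|+\frac{d}{2}\right)\log(e/h),
\end{align*}
because $\log(e/h)=1+\log(1/h)\geq 1$ and $0\leq \log(1/h)\leq \log(e/h)$ on this range. Thus $C_1=|c_0|+d/2$ is one possible choice; it is not claimed to be optimal.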
[/guided]
[/step]
[step:Use the bandwidth assumptions to force the stochastic term to vanish]
Since $\log(e/h_n)=1+\log(1/h_n)$ and $\log(1/h_n)=O(\log n)$, there are a constant $C_h>0$ and an integer $N_h \in \mathbb{N}$ such that
\begin{align*}
\log(e/h_n) \leq C_h \log n
\end{align*}
for all $n \geq N_h$. Therefore
\begin{align*}
\frac{\log(e/h_n)}{n h_n^d}
\leq
C_h\frac{\log n}{n h_n^d}
\to 0
\end{align*}
because $n h_n^d/\log n \to \infty$. Taking square roots, it also follows that
\begin{align*}
\sqrt{\frac{\log(e/h_n)}{n h_n^d}}\to 0.
\end{align*}
The stochastic estimate from the previous step therefore gives
\begin{align*}
\sup_{x \in A}|(P_n-P)g_{x,h_n}|
\xrightarrow{\mathbb{P}}0.
\end{align*}
This is ordinary convergence in probability because the pointwise-measurability hypothesis made the indexed supremum a measurable random variable in the maximal inequality.
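As a sanity check on the bandwidth conditions, consider the polynomial bandwidth $h_n=n^{-\alpha}$ with a fixed exponent $0<\alpha<1/d$. This choice and the exponent $\alpha$ are hypothetical, introduced only for illustration. Then $h_n \downarrow 0$ and
\begin{align*}
\log(1/h_n)=\alpha\log n=O(\log n),
\qquad
\frac{n h_n^d}{\log n}=\frac{n^{1-\alpha d}}{\log n}\to\infty,
\end{align*}
since $1-\alpha d>0$. Both assumptions used in this step therefore hold, and the stochastic term is of order $\sqrt{\log n/n^{1-\alpha d}}$ in probability.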
[/step]
[step:Combine the stochastic and bias estimates]
For the fixed compact set $A \subset \mathbb{R}^d$, the decomposition from the first step gives
\begin{align*}
\|\hat f_{h_n}-f\|_{\infty,A}
\leq
\sup_{x \in A}|(P_n-P)g_{x,h_n}|
+
\sup_{x \in A}|Pg_{x,h_n}-f(x)|.
\end{align*}
The first term on the right converges to $0$ in probability by the VC empirical-process estimate and the bandwidth assumptions. The second term converges to $0$ deterministically because $h_n \downarrow 0$ and the convolution bias vanishes uniformly. Hence, by the preceding inequality,
\begin{align*}
\|\hat f_{h_n}-f\|_{\infty,A}\xrightarrow{\mathbb{P}}0.
\end{align*}
This proves the asserted uniform consistency on every compact set $A \subset \mathbb{R}^d$.
[/step]