[proofplan]
We rewrite the centered kernel density estimator as the supremum of an empirical process over a fixed-bandwidth translate class. The boundedness and compact support of $K$, together with the boundedness of $f$, give an envelope of size $h^{-d}$ and a variance bound of size $h^{-d}$, which is of smaller order than the squared envelope $h^{-2d}$. The pointwise measurable VC-subgraph assumption supplies measurable suprema and polynomial covering numbers, so the standard VC-type empirical-process maximal inequality, followed by Talagrand concentration, gives a stochastic bound with square-root and linear terms. The bandwidth assumptions replace $\log(1/h_n)$ by $\log n$ and make the linear term negligible.
[/proofplan]
[step:Rewrite the centered estimator as an empirical-process supremum]
Let $\mathcal L^d$ denote $d$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $(\mathbb R^d,\mathcal B(\mathbb R^d))$. Let $P$ denote the probability law of $X_1$ on $(\mathbb R^d,\mathcal B(\mathbb R^d))$. For $y\in\mathbb R^d$, let $\delta_y$ denote the Dirac probability measure at $y$, defined by $\delta_y(B)=\mathbb 1_B(y)$ for every $B\in\mathcal B(\mathbb R^d)$. Define the empirical probability measure $P_n$ by
\begin{align*}
P_n := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}.
\end{align*}
For each $h>0$, the theorem statement defines the kernel density estimator $\hat f_{n,h}:\mathbb R^d\to\mathbb R$ by
\begin{align*}
\hat f_{n,h}(x)=\frac{1}{n}\sum_{i=1}^n h^{-d}K\left(\frac{x-X_i}{h}\right).
\end{align*}
For each $h>0$ and $x\in A$, define the function $g_{x,h}:\mathbb R^d\to\mathbb R$ by
\begin{align*}
g_{x,h}(u):=h^{-d}K\left(\frac{x-u}{h}\right).
\end{align*}
This function is Borel measurable because $K$ is Borel measurable and the affine map $u\mapsto (x-u)/h$ from $\mathbb R^d$ to $\mathbb R^d$ is continuous.
Define the fixed-bandwidth kernel class $\mathcal F_h$ by
\begin{align*}
\mathcal F_h := \{g_{x,h}:x\in A\}.
\end{align*}
For every $x \in A$,
\begin{align*}
\hat f_{n,h}(x)-\mathbb E[\hat f_{n,h}(x)]
=
(P_n-P)g_{x,h}.
\end{align*}
Therefore
\begin{align*}
\|\hat f_{n,h}-\mathbb E[\hat f_{n,h}]\|_{\infty,A}
=
\sup_{g \in \mathcal F_h}|(P_n-P)g|.
\end{align*}
[guided]
For each bandwidth $h>0$, the estimator is the map $\hat f_{n,h}:\mathbb R^d\to\mathbb R$ given by
\begin{align*}
\hat f_{n,h}(x)=\frac{1}{n}\sum_{i=1}^n h^{-d}K\left(\frac{x-X_i}{h}\right).
\end{align*}
The purpose of introducing $g_{x,h}$ is to view the value of the estimator at $x$ as an empirical average of one function. Define $g_{x,h}:\mathbb R^d\to\mathbb R$ by
\begin{align*}
g_{x,h}(u):=h^{-d}K\left(\frac{x-u}{h}\right).
\end{align*}
This map is Borel measurable because $K$ is Borel measurable and $u\mapsto (x-u)/h$ is continuous. With $P_n=n^{-1}\sum_{i=1}^n\delta_{X_i}$ and $P$ the law of $X_1$, we have
\begin{align*}
P_ng_{x,h}=\frac{1}{n}\sum_{i=1}^n g_{x,h}(X_i)=\hat f_{n,h}(x),
\qquad
Pg_{x,h}=\mathbb E[g_{x,h}(X_1)]=\mathbb E[\hat f_{n,h}(x)].
\end{align*}
Therefore
\begin{align*}
\hat f_{n,h}(x)-\mathbb E[\hat f_{n,h}(x)]=(P_n-P)g_{x,h}.
\end{align*}
Taking the supremum over $x\in A$, equivalently over $\mathcal F_h=\{g_{x,h}:x\in A\}$, gives
\begin{align*}
\|\hat f_{n,h}-\mathbb E[\hat f_{n,h}]\|_{\infty,A}
=
\sup_{g\in\mathcal F_h}|(P_n-P)g|.
\end{align*}
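As an illustration only, suppose $d=1$ and take the box kernel $K=\tfrac12\mathbb 1_{[-1,1]}$. This specific kernel is an assumption made for the example and is not singled out by the theorem. Then
\begin{align*}
g_{x,h}(u)=\frac{1}{2h}\mathbb 1_{[x-h,x+h]}(u),
\qquad
P_ng_{x,h}=\frac{\#\{i\le n:|X_i-x|\le h\}}{2nh},
\qquad
Pg_{x,h}=\frac{P([x-h,x+h])}{2h},
\end{align*}
so in this special case $(P_n-P)g_{x,h}$ is the difference between the empirical and the true probability of the window $[x-h,x+h]$, divided by the window length $2h$.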
[/guided]
[/step]
[step:Compute the envelope and variance scales]
Set
\begin{align*}
M := \|K\|_{\infty,\mathbb R^d}.
\end{align*}
Since $K$ is bounded and compactly supported, $M<\infty$ and
\begin{align*}
L_K^2 := \int_{\mathbb R^d} |K(v)|^2\,d\mathcal L^d(v)<\infty.
\end{align*}
Moreover $L_K>0$: if $L_K=0$, then $K=0$ $\mathcal L^d$-a.e., contradicting $\int_{\mathbb R^d}K(v)\,d\mathcal L^d(v)=1$.
Define the function $F_h:\mathbb R^d\to[0,\infty)$ by
\begin{align*}
F_h(u):=Mh^{-d}.
\end{align*}
Then $F_h$ is an envelope for $\mathcal F_h$.
For $g_{x,h}\in\mathcal F_h$, using the density $f$ of $X_1$ gives
\begin{align*}
\mathbb E[g_{x,h}(X_1)^2]=h^{-2d}\int_{\mathbb R^d} K\left(\frac{x-u}{h}\right)^2 f(u)\,d\mathcal L^d(u).
\end{align*}
Apply the change of variables $v=(x-u)/h$, equivalently $u=x-hv$, under which $d\mathcal L^d(u)=h^d\,d\mathcal L^d(v)$ and the domain $\mathbb R^d$ maps onto $\mathbb R^d$. This yields
\begin{align*}
\mathbb E[g_{x,h}(X_1)^2]=h^{-d}\int_{\mathbb R^d} K(v)^2 f(x-hv)\,d\mathcal L^d(v).
\end{align*}
Since $f$ is bounded and $L_K^2$ is the squared $L^2(\mathcal L^d)$ norm of $K$ defined above,
\begin{align*}
\mathbb E[g_{x,h}(X_1)^2]\le h^{-d}\|f\|_{\infty,\mathbb R^d} L_K^2.
\end{align*}
Thus, with
\begin{align*}
\sigma_h^2 := \|f\|_{\infty,\mathbb R^d}L_K^2 h^{-d},
\end{align*}
we have
\begin{align*}
\sup_{g\in\mathcal F_h}\operatorname{Var}(g(X_1))
\le
\sup_{g\in\mathcal F_h}\mathbb E[g(X_1)^2]
\le
\sigma_h^2.
\end{align*}
[guided]
The empirical-process inequalities used below need two numerical inputs: a uniform envelope and a uniform variance bound. The envelope is obtained directly from boundedness of $K$. Define
\begin{align*}
M := \|K\|_{\infty,\mathbb R^d}.
\end{align*}
Then every function $g_{x,h}\in\mathcal F_h$ satisfies
\begin{align*}
|g_{x,h}(u)|
=
h^{-d}\left|K\left(\frac{x-u}{h}\right)\right|
\le
M h^{-d}
\end{align*}
for every $u\in\mathbb R^d$. Hence $F_h(u)=Mh^{-d}$ is an envelope for $\mathcal F_h$.
Next we estimate the variance. Since $\operatorname{Var}(Y)\le \mathbb E[Y^2]$ for every square-integrable real-valued [random variable](/page/Random%20Variable) $Y$, it is enough to control $\mathbb E[g_{x,h}(X_1)^2]$. The random vector $X_1$ has density $f$ with respect to $\mathcal L^d$, so
\begin{align*}
\mathbb E[g_{x,h}(X_1)^2]
=
h^{-2d}\int_{\mathbb R^d} K\left(\frac{x-u}{h}\right)^2 f(u)\,d\mathcal L^d(u).
\end{align*}
We now perform the explicit change of variables $v=(x-u)/h$, so $u=x-hv$ and $d\mathcal L^d(u)=h^d\,d\mathcal L^d(v)$. This gives
\begin{align*}
\mathbb E[g_{x,h}(X_1)^2]
=
h^{-d}\int_{\mathbb R^d}K(v)^2 f(x-hv)\,d\mathcal L^d(v).
\end{align*}
Because $K$ is bounded with compact support, the constant
\begin{align*}
L_K^2 := \int_{\mathbb R^d}|K(v)|^2\,d\mathcal L^d(v)
\end{align*}
is finite. It is also positive: if $L_K=0$, then $K=0$ $\mathcal L^d$-a.e., contradicting $\int_{\mathbb R^d}K(v)\,d\mathcal L^d(v)=1$. Bounding $f(x-hv)$ by $\|f\|_{\infty,\mathbb R^d}<\infty$ inside the integral therefore gives
\begin{align*}
\mathbb E[g_{x,h}(X_1)^2]
\le
h^{-d}\|f\|_{\infty,\mathbb R^d}L_K^2.
\end{align*}
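Thus the natural variance scale is $h^{-d}$, whereas the squared envelope $F_h^2=M^2h^{-2d}$ is of the larger order $h^{-2d}$. This distinction is what produces the final square-root rate $\sqrt{1/(nh^d)}$, up to the logarithmic factor, rather than the cruder $\sqrt{1/(nh^{2d})}$ that the envelope alone would give.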
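For a concrete sense of the two scales, take as an illustrative assumption $d=1$ and the box kernel $K=\tfrac12\mathbb 1_{[-1,1]}$; this choice is made only for the example. Then $M=\tfrac12$ and $L_K^2=\int_{-1}^{1}\tfrac14\,d\mathcal L^1(v)=\tfrac12$, so
\begin{align*}
F_h=\frac{1}{2h},
\qquad
F_h^2=\frac{1}{4h^2},
\qquad
\sigma_h^2=\frac{\|f\|_{\infty,\mathbb R}}{2h},
\qquad
\frac{\sigma_h^2}{F_h^2}=2\|f\|_{\infty,\mathbb R}\,h.
\end{align*}
The ratio $\sigma_h^2/F_h^2$ tends to $0$ as $h\to0$, which is the gain of the variance bound over the squared envelope in this special case.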
[/guided]
[/step]
[step:Apply the VC maximal inequality at bandwidth $h$]
We use the [VC-type maximal inequality for bounded empirical processes, van der Vaart and Wellner Theorem 2.14.1](external:van-der-vaart-wellner-1996-theorem-2-14-1) for pointwise measurable VC-subgraph classes. By the pointwise measurability assumption on $\mathcal K$, the fixed-bandwidth subclass $\mathcal F_h\subset h^{-d}\mathcal K$ is pointwise measurable: for this $h$ there is a countable subclass $\mathcal F_{h,0}\subset\mathcal F_h$ such that every $g\in\mathcal F_h$ is the pointwise limit of a sequence in $\mathcal F_{h,0}$. Since every member of $\mathcal F_h$ is bounded by the integrable envelope $F_h$ under $P$, dominated convergence gives $Pg_k\to Pg$ along such an approximating sequence $(g_k)_{k=1}^\infty$; at the finitely many sample points, pointwise convergence also gives $P_ng_k\to P_ng$. Hence the supremum over $\mathcal F_h$ agrees with the supremum over the countable class $\mathcal F_{h,0}$ and is measurable. Since $\mathcal K$ is VC-subgraph with bounded envelope $M$, rescaling by the positive scalar $h^{-d}$ and restricting the translation parameter to $x\in A$ preserve the VC-subgraph entropy exponents. Thus there exist constants $a\ge e$ and $v\ge 1$, depending only on the VC characteristics of the original translated-dilated kernel class, and a constant $C_0>0$ from the maximal inequality, depending only on these VC characteristics, such that $\mathcal F_h$ has envelope $Mh^{-d}$ and for every $h\in(0,1)$,
\begin{align*}
\mathbb E\left[\sup_{g\in\mathcal F_h}|(P_n-P)g|\right]
\le
C_0\left(
\sqrt{\frac{\sigma_h^2}{n}\log\left(\frac{aMh^{-d}}{\sigma_h}\right)}
+
\frac{Mh^{-d}}{n}\log\left(\frac{aMh^{-d}}{\sigma_h}\right)
\right).
\end{align*}
The hypotheses required for this standard empirical-process inequality are satisfied: $\mathcal F_h$ is pointwise measurable, it is a subclass of the rescaled VC-subgraph class $h^{-d}\mathcal K$, its envelope is $Mh^{-d}$, its VC entropy constants are the fixed constants $a$ and $v$ above, and its variance is bounded by $\sigma_h^2$.
Since $\sigma_h=\|f\|_{\infty,\mathbb R^d}^{1/2}L_Kh^{-d/2}$, we have
\begin{align*}
\frac{aMh^{-d}}{\sigma_h}=\frac{aM}{\|f\|_{\infty,\mathbb R^d}^{1/2}L_K}\,h^{-d/2}.
\end{align*}
Therefore there is a constant $C_1=C_1(C_0,d,K,f,a,v)>0$ such that, for all sufficiently small $h$,
\begin{align*}
\mathbb E\left[\sup_{g\in\mathcal F_h}|(P_n-P)g|\right]
\le
C_1\left(
\sqrt{\frac{\log(1/h)}{nh^d}}
+
\frac{\log(1/h)}{nh^d}
\right).
\end{align*}
[guided]
The maximal inequality is applied to an uncountable class, so we must first make sure the supremum is a measurable random variable rather than only an outer supremum. The theorem assumes that $\mathcal K$ is pointwise measurable. Since $\mathcal F_h\subset h^{-d}\mathcal K$ is obtained by restriction of parameters and multiplication by the fixed positive scalar $h^{-d}$, it is also pointwise measurable. Thus there is a countable subclass $\mathcal F_{h,0}\subset\mathcal F_h$ such that each member of $\mathcal F_h$ is the pointwise limit of a sequence from $\mathcal F_{h,0}$. Along such an approximating sequence, the empirical averages converge because there are only finitely many sample points, and the expectations converge by dominated convergence using the integrable envelope $F_h$ under $P$. Hence the supremum over $\mathcal F_h$ agrees with the supremum over the countable class $\mathcal F_{h,0}$, so it is measurable.
Now the standard VC-type maximal inequality for bounded empirical processes applies. Its hypotheses are: pointwise measurability, VC-subgraph entropy with fixed characteristics $a\ge e$ and $v\ge1$ inherited from $\mathcal K$, an envelope $Mh^{-d}$, and a variance bound $\sigma_h^2$. The first condition was just verified, the VC-subgraph condition is inherited from $\mathcal K$ by rescaling and restriction to $x\in A$, the envelope was computed above, and the variance bound is
\begin{align*}
\sup_{g\in\mathcal F_h}\operatorname{Var}(g(X_1))\le \sigma_h^2.
\end{align*}
Therefore there is a constant $C_0>0$, depending only on the VC characteristics of $\mathcal K$, such that
\begin{align*}
\mathbb E\left[\sup_{g\in\mathcal F_h}|(P_n-P)g|\right]\le C_0\left(\sqrt{\frac{\sigma_h^2}{n}\log\left(\frac{aMh^{-d}}{\sigma_h}\right)}+\frac{Mh^{-d}}{n}\log\left(\frac{aMh^{-d}}{\sigma_h}\right)\right).
\end{align*}
Since $\sigma_h=\|f\|_{\infty,\mathbb R^d}^{1/2}L_Kh^{-d/2}$, the logarithm is bounded by a constant multiple of $\log(1/h)$ for all sufficiently small $h$. Absorbing the fixed constants into $C_1=C_1(C_0,d,K,f,a,v)>0$ gives
\begin{align*}
\mathbb E\left[\sup_{g\in\mathcal F_h}|(P_n-P)g|\right]
\le
C_1\left(
\sqrt{\frac{\log(1/h)}{nh^d}}
+
\frac{\log(1/h)}{nh^d}
\right).
\end{align*}
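The absorption of constants can be made explicit as follows; the symbol $c_0$ and the particular value of $C_1$ below are notation introduced only for this computation, and other choices work equally well. Write $c_0:=aM/(\|f\|_{\infty,\mathbb R^d}^{1/2}L_K)$. Then
\begin{align*}
\log\left(\frac{aMh^{-d}}{\sigma_h}\right)
=
\log c_0+\frac d2\log(1/h)
\le
d\log(1/h)
\qquad\text{whenever } 0<h<1 \text{ and } h\le c_0^{-2/d},
\end{align*}
because $\log c_0\le \frac d2\log(1/h)$ is equivalent to $h\le c_0^{-2/d}$. Inserting this together with $\sigma_h^2=\|f\|_{\infty,\mathbb R^d}L_K^2h^{-d}$ gives, for such $h$,
\begin{align*}
\sqrt{\frac{\sigma_h^2}{n}\log\left(\frac{aMh^{-d}}{\sigma_h}\right)}
\le
\|f\|_{\infty,\mathbb R^d}^{1/2}L_K\sqrt d\,\sqrt{\frac{\log(1/h)}{nh^d}},
\qquad
\frac{Mh^{-d}}{n}\log\left(\frac{aMh^{-d}}{\sigma_h}\right)
\le
Md\,\frac{\log(1/h)}{nh^d},
\end{align*}
so one admissible choice is $C_1=C_0\max\{\|f\|_{\infty,\mathbb R^d}^{1/2}L_K\sqrt d,\;Md\}$.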
[/guided]
[/step]
[step:Use concentration to pass from the mean bound to a probability bound]
We use [Talagrand's concentration inequality for bounded empirical processes in Bousquet's form](external:bousquet-2002-talagrand-bounded-empirical-process-concentration) indexed by pointwise measurable classes. The pointwise measurability verification from the preceding step applies to the same class $\mathcal F_h$, so the supremum is measurable and Talagrand's inequality can be stated with ordinary rather than outer probability. There is a constant $C_2>0$, depending only on the universal constants in Talagrand's inequality, such that for every $t>0$,
\begin{align*}
\mathbb P\left(
\sup_{g\in\mathcal F_h}|(P_n-P)g|
>
\mathbb E\left[\sup_{g\in\mathcal F_h}|(P_n-P)g|\right]
+
C_2\left(\sqrt{\frac{\sigma_h^2 t}{n}}+\frac{Mh^{-d}t}{n}\right)
\right)
\le
e^{-t}.
\end{align*}
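Taking $t=\log n$, so that the exceptional probability is $e^{-t}=n^{-1}\to0$, and inserting the mean bound from the preceding step together with $\sigma_h^2=\|f\|_{\infty,\mathbb R^d}L_K^2h^{-d}$ gives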
\begin{align*}
\sup_{g\in\mathcal F_h}|(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log(1/h)}{nh^d}}+\frac{\log(1/h)}{nh^d}+\sqrt{\frac{\log n}{nh^d}}+\frac{\log n}{nh^d}\right).
\end{align*}
[guided]
[Talagrand's concentration inequality for bounded empirical processes in Bousquet's form](external:bousquet-2002-talagrand-bounded-empirical-process-concentration) is applied to the same pointwise measurable class $\mathcal F_h$. The hypotheses are the measurable supremum condition, the uniform envelope bound $|g|\le Mh^{-d}$ for all $g\in\mathcal F_h$, and the variance bound $\sup_{g\in\mathcal F_h}\operatorname{Var}(g(X_1))\le\sigma_h^2$. These have all been verified, so for every $t>0$,
\begin{align*}
\mathbb P\left(
\sup_{g\in\mathcal F_h}|(P_n-P)g|
>
\mathbb E\left[\sup_{g\in\mathcal F_h}|(P_n-P)g|\right]
+
C_2\left(
\sqrt{\frac{\sigma_h^2t}{n}}
+
\frac{Mh^{-d}t}{n}
\right)
\right)
\le e^{-t}.
\end{align*}
Choose $t=\log n$. Then $e^{-t}=n^{-1}\to0$, so the displayed bound holds with probability tending to one. Substituting the mean bound from the previous step and the identity $\sigma_h^2=\|f\|_{\infty,\mathbb R^d}L_K^2h^{-d}$ yields
\begin{align*}
\sup_{g\in\mathcal F_h}|(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log(1/h)}{nh^d}}+\frac{\log(1/h)}{nh^d}+\sqrt{\frac{\log n}{nh^d}}+\frac{\log n}{nh^d}\right).
\end{align*}
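Spelled out, the substitution reads as follows; this is only a restatement of the two preceding displays with $t=\log n$, using the constants $C_1$ and $C_2$ already introduced. The deviation term becomes
\begin{align*}
C_2\left(\sqrt{\frac{\sigma_h^2\log n}{n}}+\frac{Mh^{-d}\log n}{n}\right)
=
C_2\left(\|f\|_{\infty,\mathbb R^d}^{1/2}L_K\sqrt{\frac{\log n}{nh^d}}+M\,\frac{\log n}{nh^d}\right),
\end{align*}
so, for all sufficiently small $h$, with probability at least $1-n^{-1}$,
\begin{align*}
\sup_{g\in\mathcal F_h}|(P_n-P)g|
\le
C_1\left(\sqrt{\frac{\log(1/h)}{nh^d}}+\frac{\log(1/h)}{nh^d}\right)
+
C_2\left(\|f\|_{\infty,\mathbb R^d}^{1/2}L_K\sqrt{\frac{\log n}{nh^d}}+M\,\frac{\log n}{nh^d}\right).
\end{align*}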
[/guided]
[/step]
[step:Insert the bandwidth assumptions and conclude]
Now take $h=h_n$. Since $\log(1/h_n)=O(\log n)$, the preceding bound becomes
\begin{align*}
\sup_{g\in\mathcal F_{h_n}} |(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log n}{n h_n^d}}+\frac{\log n}{n h_n^d}\right).
\end{align*}
The assumption
\begin{align*}
\frac{n h_n^d}{\log n}\to\infty
\end{align*}
implies
\begin{align*}
\frac{\log n}{n h_n^d}
=
o\left(\sqrt{\frac{\log n}{n h_n^d}}\right).
\end{align*}
Therefore
\begin{align*}
\sup_{g\in\mathcal F_{h_n}} |(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log n}{n h_n^d}}\right).
\end{align*}
Using the empirical-process identity from the first step, this is exactly
\begin{align*}
\|\hat f_{n,h_n}-\mathbb E[\hat f_{n,h_n}]\|_{\infty,A}=O_{\mathbb P}\left(\sqrt{\frac{\log n}{n h_n^d}}\right).
\end{align*}
This proves the asserted maximal deviation rate.
[guided]
Set $h=h_n$. From the preceding concentration step,
\begin{align*}
\sup_{g\in\mathcal F_{h_n}} |(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log(1/h_n)}{n h_n^d}}+\frac{\log(1/h_n)}{n h_n^d}+\sqrt{\frac{\log n}{n h_n^d}}+\frac{\log n}{n h_n^d}\right).
\end{align*}
The assumption $\log(1/h_n)=O(\log n)$ means that the terms involving $\log(1/h_n)$ are bounded by constant multiples of the corresponding terms involving $\log n$. Hence
\begin{align*}
\sup_{g\in\mathcal F_{h_n}} |(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log n}{n h_n^d}}+\frac{\log n}{n h_n^d}\right).
\end{align*}
Now use the second bandwidth assumption. Since
\begin{align*}
\frac{n h_n^d}{\log n}\to\infty,
\end{align*}
we have $\log n/(n h_n^d)\to0$, and therefore
\begin{align*}
\frac{\log n}{n h_n^d}
=
o\left(\sqrt{\frac{\log n}{n h_n^d}}\right).
\end{align*}
So the linear term is absorbed into the square-root term:
\begin{align*}
\sup_{g\in\mathcal F_{h_n}} |(P_n-P)g|=O_{\mathbb P}\left(\sqrt{\frac{\log n}{n h_n^d}}\right).
\end{align*}
The first step identified this supremum with the uniform centered estimator deviation over $A$, namely
\begin{align*}
\|\hat f_{n,h_n}-\mathbb E[\hat f_{n,h_n}]\|_{\infty,A}=\sup_{g\in\mathcal F_{h_n}} |(P_n-P)g|.
\end{align*}
Thus
\begin{align*}
\|\hat f_{n,h_n}-\mathbb E[\hat f_{n,h_n}]\|_{\infty,A}=O_{\mathbb P}\left(\sqrt{\frac{\log n}{n h_n^d}}\right),
\end{align*}
which is the claimed maximal deviation rate.
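As a hedged illustration of the bandwidth conditions, consider the polynomial choice $h_n=n^{-1/(d+4)}$; this particular sequence is an assumption made for the example and is not prescribed by the theorem. Then
\begin{align*}
\log(1/h_n)=\frac{\log n}{d+4}=O(\log n),
\qquad
\frac{nh_n^d}{\log n}=\frac{n^{4/(d+4)}}{\log n}\to\infty,
\end{align*}
so both assumptions hold, and the rate becomes
\begin{align*}
\sqrt{\frac{\log n}{nh_n^d}}
=
n^{-2/(d+4)}\sqrt{\log n}.
\end{align*}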
[/guided]
[/step]