Bandwidth-Scale Stochastic Equicontinuity for Kernel Density Processes (Theorem # 6322)

Proof

[proofplan] The proof rewrites every bandwidth-scale increment $Z_n(x)-Z_n(y)$ as the empirical process indexed by the normalized kernel increment class $\mathcal G_{h_n,\delta}$. The $L^2(\mathcal L^d)$ translation continuity of $K$, together with boundedness of the density $f$, shows that the intrinsic $L^2(P)$ size of this class is controlled by $\omega_K(\delta)$. The compactness of $A$, the boundedness and compact support of $K$, the square-integrability of $K$, and the VC-type bounded-envelope hypothesis are used only as the structural assumptions behind the assumed local maximal inequality for these increment classes; after that inequality is invoked, they are not used independently. That maximal bound makes the expected supremum of the empirical increments small first as $n\to\infty$ and then as $\delta\downarrow0$, and [Markov's inequality](/theorems/514) converts the expectation estimate into the asserted probability estimate. [/proofplan] [step:Rewrite the process increments as an empirical process over normalized kernel differences] Fix $\delta>0$ and $n\in\mathbb N$. We use the standard empirical-process convention that the displayed suprema are taken over measurable versions; equivalently, if separability has not been fixed, the same argument is read with outer expectation and outer probability. The kernel density estimator is the map $\hat f_{h_n}:A\to\mathbb R$ defined by \begin{align*} \hat f_{h_n}(z)=\frac{1}{n h_n^d}\sum_{i=1}^n K\left(\frac{z-X_i}{h_n}\right),\qquad z\in A. \end{align*} The normalized increment class $\mathcal G_{h_n,\delta}$ is the class defined in the theorem statement with bandwidth $h=h_n$. For $x,y\in A$ with $|x-y|\le h_n\delta$, let $g_{x,y,h_n}:\mathbb R^d\to\mathbb R$ be the measurable map defined, for $u\in\mathbb R^d$, by \begin{align*} g_{x,y,h_n}(u)=h_n^{-d/2}\left(K\left(\frac{x-u}{h_n}\right)-K\left(\frac{y-u}{h_n}\right)\right). \end{align*} Then $g_{x,y,h_n}\in\mathcal G_{h_n,\delta}$ by definition. 
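The definitions above can be checked numerically before the centring is distributed. The following is a minimal sketch for $d=1$, assuming that $Z_n$ carries the scaling $\sqrt{n h_n^d}$ applied to the centred estimator (the theorem statement is not reproduced here). The Epanechnikov kernel, the uniform sample, and the values of `n`, `h`, `delta`, `x`, `y` are illustrative choices, not taken from the theorem. Because centring is linear, it is enough to compare the uncentred quantities $\sqrt{n h}\,(\hat f_h(x)-\hat f_h(y))$ and $n^{-1/2}\sum_i g_{x,y,h}(X_i)$.

```python
import math
import random

def kernel(v):
    """Epanechnikov kernel on [-1, 1] (illustrative choice, not from the theorem)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def kde(z, sample, h):
    """Kernel density estimate at z for d = 1: (n h)^{-1} * sum_i K((z - X_i) / h)."""
    return sum(kernel((z - xi) / h) for xi in sample) / (len(sample) * h)

def g(x, y, h, u):
    """Normalized kernel increment g_{x,y,h}(u) for d = 1."""
    return h ** -0.5 * (kernel((x - u) / h) - kernel((y - u) / h))

random.seed(0)
n, h, delta = 500, 0.2, 0.5                    # hypothetical sample size, bandwidth, radius
sample = [random.random() for _ in range(n)]   # X_i ~ Uniform(0, 1), hypothetical law
x, y = 0.40, 0.40 + h * delta                  # a pair with |x - y| <= h * delta

# Uncentred left-hand side: sqrt(n h) * (f_hat(x) - f_hat(y)).
lhs = math.sqrt(n * h) * (kde(x, sample, h) - kde(y, sample, h))
# Uncentred right-hand side: n^{-1/2} * sum_i g(X_i).
rhs = sum(g(x, y, h, xi) for xi in sample) / math.sqrt(n)

print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, which reflects the scalar identity $\sqrt{n h^d}/(n h^d)=1/(\sqrt n\,h^{d/2})$ used in the proof.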
Using the definition of $\hat f_{h_n}$ and distributing the centering, \begin{align*} Z_n(x)-Z_n(y)=\frac{1}{\sqrt n}\sum_{i=1}^n\left(g_{x,y,h_n}(X_i)-\mathbb E[g_{x,y,h_n}(X_i)]\right). \end{align*} Therefore \begin{align*} \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)| \le \sup_{g\in\mathcal G_{h_n,\delta}} \left| \frac{1}{\sqrt n} \sum_{i=1}^n \left(g(X_i)-\mathbb E[g(X_i)]\right) \right|. \end{align*} [guided] The normalization in the theorem-statement definition of $\mathcal G_{h,\delta}$ is chosen so that the kernel-density scaling disappears exactly. The kernel density estimator is the map $\hat f_{h_n}:A\to\mathbb R$ defined by \begin{align*} \hat f_{h_n}(z)=\frac{1}{n h_n^d}\sum_{i=1}^n K\left(\frac{z-X_i}{h_n}\right),\qquad z\in A. \end{align*} In this step we use the class $\mathcal G_{h_n,\delta}$ obtained by setting $h=h_n$. Let $x,y\in A$ satisfy $|x-y|\le h_n\delta$, and let $g_{x,y,h_n}:\mathbb R^d\to\mathbb R$ be the measurable map defined, for $u\in\mathbb R^d$, by \begin{align*} g_{x,y,h_n}(u)=h_n^{-d/2}\left(K\left(\frac{x-u}{h_n}\right)-K\left(\frac{y-u}{h_n}\right)\right). \end{align*} This function belongs to $\mathcal G_{h_n,\delta}$ because the pair $(x,y)$ satisfies the required bandwidth-scale separation. Now compute the increment of the centred process directly. The scalar factor satisfies \begin{align*} \frac{\sqrt{n h_n^d}}{n h_n^d}=\frac{1}{\sqrt n\, h_n^{d/2}}, \end{align*} and therefore \begin{align*} Z_n(x)-Z_n(y)=\frac{1}{\sqrt n}\sum_{i=1}^n\left(g_{x,y,h_n}(X_i)-\mathbb E[g_{x,y,h_n}(X_i)]\right). \end{align*} Taking the supremum over all admissible pairs $(x,y)$ gives a supremum over a subcollection of $\mathcal G_{h_n,\delta}$, so \begin{align*} \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)| \le \sup_{g\in\mathcal G_{h_n,\delta}} \left| \frac{1}{\sqrt n} \sum_{i=1}^n \left(g(X_i)-\mathbb E[g(X_i)]\right) \right|. 
\end{align*} This is the essential reduction: stochastic equicontinuity of $Z_n$ at spatial scale $h_n$ becomes a maximal inequality for an empirical process indexed by normalized kernel increments. [/guided] [/step] [step:Control the intrinsic $L^2(P)$ size of the increment class by the translation modulus] Let $P$ denote the law of $X_1$, that is, the probability measure on $(\mathbb R^d,\mathcal B(\mathbb R^d))$ satisfying $P(B)=\mathbb P(X_1\in B)$ for every $B\in\mathcal B(\mathbb R^d)$. Let $M_f:=\|f\|_\infty$. For $g_{x,y,h}\in\mathcal G_{h,\delta}$, define $t_{x,y,h}:=(y-x)/h\in\mathbb R^d$. Since $|x-y|\le h\delta$, we have $|t_{x,y,h}|\le\delta$. Using the density of $X_1$ with respect to $\mathcal L^d$, \begin{align*} \mathbb E[g_{x,y,h}(X_1)^2]=\int_{\mathbb R^d}h^{-d}\left|K\left(\frac{x-u}{h}\right)-K\left(\frac{y-u}{h}\right)\right|^2 f(u)\,d\mathcal L^d(u). \end{align*} Since $f(u)\le M_f$ for $\mathcal L^d$-almost every $u\in\mathbb R^d$, this gives \begin{align*} \mathbb E[g_{x,y,h}(X_1)^2]\le M_f\int_{\mathbb R^d}h^{-d}\left|K\left(\frac{x-u}{h}\right)-K\left(\frac{y-u}{h}\right)\right|^2\,d\mathcal L^d(u). \end{align*} Apply the change-of-variables formula for [Lebesgue measure](/page/Lebesgue%20Measure) with the affine substitution $v=(x-u)/h$, equivalently $u=x-hv$. The Jacobian determinant of the inverse map $v\mapsto x-hv$ has absolute value $h^d$, so $d\mathcal L^d(u)=h^d\,d\mathcal L^d(v)$, and the domain $\mathbb R^d$ maps onto $\mathbb R^d$. Thus \begin{align*} \mathbb E[g_{x,y,h}(X_1)^2]\le M_f\int_{\mathbb R^d}|K(v)-K(v+t_{x,y,h})|^2\,d\mathcal L^d(v). \end{align*} Because $|t_{x,y,h}|\le\delta$, the definition of $\omega_K(\delta)$ gives \begin{align*} \mathbb E[g_{x,y,h}(X_1)^2]\le M_f\,\omega_K(\delta)^2. \end{align*} Since $\operatorname{Var}(g_{x,y,h}(X_1))\le \mathbb E[g_{x,y,h}(X_1)^2]$, it follows that \begin{align*} \sup_{g\in\mathcal G_{h,\delta}} \operatorname{Var}(g(X_1)) \le M_f\,\omega_K(\delta)^2. 
\end{align*} [/step] [step:Apply the local VC maximal bound to the increment supremum] For each $n$, set \begin{align*} S_{n,\delta} := \sup_{g\in\mathcal G_{h_n,\delta}} \left| \frac{1}{\sqrt n} \sum_{i=1}^n \left(g(X_i)-\mathbb E[g(X_i)]\right) \right|. \end{align*} The local VC-type maximal bound is used here as an assumed empirical-process estimate for these increment classes; the preceding $L^2(P)$ computation records the radius input that motivates that hypothesis. It supplies constants $h_0,C_0,A_0>0$ such that the displayed maximal inequality holds for every $0<h\le h_0$, every $n\in\mathbb N$, and every $\delta>0$ with $\omega_K(\delta)>0$. Because $h_n\downarrow0$, there exists $N\in\mathbb N$ such that $h_n\le h_0$ for all $n\ge N$. For the limiting argument it is enough to work with sufficiently small $\delta>0$ for which $\omega_K(\delta)>0$ and \begin{align*} \frac{A_0}{\omega_K(\delta)}>1; \end{align*} the latter condition, which makes the logarithm below positive, is part of the maximal-bound hypothesis. For such $\delta$ and every $n\ge N$, applying the theorem-statement maximal bound with $h=h_n$ gives \begin{align*} \mathbb E[S_{n,\delta}] \le C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}} + C_0\,\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}. \end{align*} The condition $(n h_n^d)/\log n\to\infty$ implies $n h_n^d\to\infty$ because $\log n\to\infty$, so the second term vanishes as $n\to\infty$. Therefore, for every such fixed $\delta$, \begin{align*} \limsup_{n\to\infty}\mathbb E[S_{n,\delta}] \le C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}. \end{align*} The classes are monotone in $\delta$: if $0<\delta_1\le\delta_2$, then $\mathcal G_{h_n,\delta_1}\subseteq\mathcal G_{h_n,\delta_2}$ and hence $S_{n,\delta_1}\le S_{n,\delta_2}$. Radii $\delta$ with $\omega_K(\delta)=0$, to which the maximal bound does not apply directly, are handled as follows. If there are admissible radii $\delta_2>\delta$ arbitrarily close to $\delta$, monotonicity gives $S_{n,\delta}\le S_{n,\delta_2}$, and the bound for $S_{n,\delta_2}$ is then used as $\delta_2\downarrow\delta$.
If instead $\omega_K$ vanishes on an interval $(0,\delta_0)$, then for every $|t|<\delta_0$ the translation identity $K(\cdot+t)=K(\cdot)$ holds in $L^2(\mathcal L^d)$, so the preceding $L^2(P)$ estimate gives $\operatorname{Var}(g(X_1))=0$ for every $g\in\mathcal G_{h,\delta}$ with $0<\delta<\delta_0$. Under the measurable-version convention fixed in the first step, equivalently under the corresponding outer expectation and outer probability formulation, the centred empirical-process version of each such increment is zero, so the chosen supremum satisfies $S_{n,\delta}=0$ almost surely. Since $\omega_K(\delta)\to0$ as $\delta\downarrow0$ and $r\sqrt{\log(A_0/r)}\to0$ as $r\downarrow0$, this monotonicity and zero-radius alternative give \begin{align*} \lim_{\delta\downarrow0} \limsup_{n\to\infty} \mathbb E[S_{n,\delta}] = 0. \end{align*} [guided] The previous step identifies the correct intrinsic scale of the class: its $L^2(P)$ radius is at most a constant multiple of $\omega_K(\delta)$, where $P$ denotes the law of $X_1$. The theorem assumes precisely the empirical-process input needed at this scale; the compactness of $A$, the compact support and square-integrability of $K$, and the VC-type bounded-envelope assumptions are used here only as the background hypotheses that make that local maximal inequality available. Define the non-negative [random variable](/page/Random%20Variable) \begin{align*} S_{n,\delta} := \sup_{g\in\mathcal G_{h_n,\delta}} \left| \frac{1}{\sqrt n} \sum_{i=1}^n \left(g(X_i)-\mathbb E[g(X_i)]\right) \right|. \end{align*} The local VC-type maximal bound is used as an assumed empirical-process estimate for these classes; the preceding $L^2(P)$ computation explains the radius scale but is not an additional condition needed to invoke the bound. It provides constants $h_0,C_0,A_0>0$ such that, whenever $0<h\le h_0$ and $\omega_K(\delta)>0$, the expected supremum over $\mathcal G_{h,\delta}$ is bounded by the two displayed terms. 
The role of the compactness of $A$, the boundedness and compact support of $K$, and the VC-type structure is encoded in that hypothesis; the present proof only needs to check that the bandwidth sequence eventually lies in the allowed range. Since $h_n\downarrow0$, there exists $N\in\mathbb N$ such that $h_n\le h_0$ for every $n\ge N$. There is one endpoint issue to handle before taking logarithms: the quantity $\log(A_0/\omega_K(\delta))$ must be well defined and positive. We therefore apply the maximal inequality only for sufficiently small $\delta>0$ satisfying $\omega_K(\delta)>0$ and $A_0/\omega_K(\delta)>1$, exactly as required by the theorem-statement hypothesis. This restriction loses nothing in the final limit. Indeed, if $0<\delta_1\le\delta_2$, then the defining constraint $|x-y|\le h_n\delta_1$ implies $|x-y|\le h_n\delta_2$, so $\mathcal G_{h_n,\delta_1}\subseteq\mathcal G_{h_n,\delta_2}$ and therefore $S_{n,\delta_1}\le S_{n,\delta_2}$. If admissible radii with positive $\omega_K$ occur arbitrarily close above $\delta_1$, this monotonicity bounds $S_{n,\delta_1}$ by $S_{n,\delta_2}$ for those nearby admissible radii $\delta_2$. If no such nearby admissible radii exist, then $\omega_K$ vanishes on a whole interval $(0,\delta_0)$; for every $|t|<\delta_0$, the equality $\|K(\cdot+t)-K\|_{L^2(\mathcal L^d)}=0$ means $K(\cdot+t)=K$ in $L^2(\mathcal L^d)$. The previous $L^2(P)$ estimate then gives $\operatorname{Var}(g(X_1))=0$ for every $g\in\mathcal G_{h,\delta}$ with $0<\delta<\delta_0$. Under the measurable-version convention fixed at the start of the proof, equivalently under outer expectation and outer probability, every such centred increment vanishes in the chosen supremum. Therefore $S_{n,\delta}=0$ almost surely for those radii. For every admissible $\delta$ and every $n\ge N$, the theorem-statement maximal bound applies with $h=h_n$ and gives \begin{align*} \mathbb E[S_{n,\delta}] \le C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}} + C_0\,\frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}.
\end{align*} Here the first term is the Gaussian-size term controlled by the local $L^2(P)$ diameter, and the second term is the residual bounded-envelope contribution. Now fix such an admissible $\delta>0$. The positive number $\log(A_0/\omega_K(\delta))$ is independent of $n$. The bandwidth hypothesis \begin{align*} \frac{n h_n^d}{\log n}\to\infty \end{align*} implies $n h_n^d\to\infty$, so \begin{align*} \frac{\log(A_0/\omega_K(\delta))}{\sqrt{n h_n^d}}\to0 \end{align*} as $n\to\infty$. Taking the upper limit in $n$ therefore yields \begin{align*} \limsup_{n\to\infty}\mathbb E[S_{n,\delta}] \le C_0\,\omega_K(\delta)\sqrt{\log\frac{A_0}{\omega_K(\delta)}}. \end{align*} Finally, the assumed $L^2(\mathcal L^d)$ translation continuity gives $\omega_K(\delta)\to0$ as $\delta\downarrow0$. The elementary limit \begin{align*} r\sqrt{\log(A_0/r)}\to0 \end{align*} as $r\downarrow0$ applies with $r=\omega_K(\delta)$ along admissible radii. The monotonicity $S_{n,\delta_1}\le S_{n,\delta_2}$ for $\delta_1\le\delta_2$ extends the same limiting bound to all $\delta\downarrow0$. Hence \begin{align*} \lim_{\delta\downarrow0} \limsup_{n\to\infty} \mathbb E[S_{n,\delta}] = 0. \end{align*} This is the analytic core of the proof: the bandwidth scale does not require pointwise Lipschitz regularity of $K$; the $L^2$ translation modulus is enough once the class has a local VC-type maximal inequality. [/guided] [/step] [step:Convert the expectation bound into the required probability limit] Let $\varepsilon>0$. From the first step, \begin{align*} \mathbb P\left( \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)|>\varepsilon \right) \le \mathbb P(S_{n,\delta}>\varepsilon). \end{align*} Since $S_{n,\delta}\ge0$, [Markov's inequality](/theorems/514) gives \begin{align*} \mathbb P(S_{n,\delta}>\varepsilon) \le \frac{1}{\varepsilon}\mathbb E[S_{n,\delta}]. 
\end{align*} Taking $\limsup_{n\to\infty}$ and then letting $\delta\downarrow0$ gives \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb P\left( \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)|>\varepsilon \right) \le \frac{1}{\varepsilon} \lim_{\delta\downarrow0}\limsup_{n\to\infty}\mathbb E[S_{n,\delta}]. \end{align*} The expectation limit from the previous step is zero and probabilities are non-negative, so \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb P\left( \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)|>\varepsilon \right) =0. \end{align*} This proves the asserted bandwidth-scale stochastic equicontinuity. [guided] We now translate the expected empirical-process bound into the probability statement in the theorem. Fix $\varepsilon>0$. The first step proved the pointwise comparison \begin{align*} \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)| \le S_{n,\delta}. \end{align*} Therefore the event on the left is contained in the event $\{S_{n,\delta}>\varepsilon\}$, and hence \begin{align*} \mathbb P\left( \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)|>\varepsilon \right) \le \mathbb P(S_{n,\delta}>\varepsilon). \end{align*} The random variable $S_{n,\delta}$ is non-negative by definition, so [Markov's inequality](/theorems/514) applies and gives \begin{align*} \mathbb P(S_{n,\delta}>\varepsilon) \le \frac{1}{\varepsilon}\mathbb E[S_{n,\delta}]. \end{align*} Taking the upper limit in $n$ and then letting $\delta\downarrow0$ yields \begin{align*} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb P\left( \sup_{\{(x,y)\in A\times A:\ |x-y|\le h_n\delta\}} |Z_n(x)-Z_n(y)|>\varepsilon \right) \le \frac{1}{\varepsilon} \lim_{\delta\downarrow0}\limsup_{n\to\infty} \mathbb E[S_{n,\delta}]. \end{align*} The previous step proved that the expectation limit on the right-hand side is $0$.
Since probabilities are non-negative, the left-hand side is both at most $0$ and at least $0$, so it equals $0$. This is exactly the asserted bandwidth-scale stochastic equicontinuity. [/guided] [/step]
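
The remark that the $L^2$ translation modulus suffices without pointwise Lipschitz regularity can be illustrated with a discontinuous kernel. The sketch below assumes that $\omega_K(\delta)=\sup_{|t|\le\delta}\|K(\cdot+t)-K\|_{L^2(\mathcal L^d)}$, which is how the modulus is used in the proof (the theorem statement is not reproduced here). It takes $d=1$ and the box kernel $K=\tfrac12\mathbf 1_{[-1,1]}$, for which a direct computation gives $\|K(\cdot+t)-K\|_{L^2}^2=|t|/2$ when $|t|\le2$, hence $\omega_K(\delta)=\sqrt{\delta/2}$. The constant `a0` stands in for $A_0$ and is purely hypothetical, as are the grid sizes.

```python
import math

def box_kernel(v):
    """Uniform kernel on [-1, 1]: bounded, compactly supported, discontinuous."""
    return 0.5 if -1.0 <= v <= 1.0 else 0.0

def translation_norm(t, lo=-4.0, hi=4.0, m=16_000):
    """Midpoint Riemann sum approximating ||K(. + t) - K||_{L^2} on [lo, hi]."""
    step = (hi - lo) / m
    total = 0.0
    for j in range(m):
        v = lo + (j + 0.5) * step
        diff = box_kernel(v + t) - box_kernel(v)
        total += diff * diff
    return math.sqrt(total * step)

def omega(delta, grid=8):
    """Approximate the supremum over |t| <= delta on a finite grid of shifts."""
    return max(translation_norm(delta * k / grid) for k in range(-grid, grid + 1))

def leading_term(r, a0=math.e):
    """First term of the maximal bound without C_0: r * sqrt(log(a0 / r)), for 0 < r < a0."""
    return r * math.sqrt(math.log(a0 / r))

# Numerical modulus, closed form sqrt(delta / 2), and the resulting leading term.
results = {}
for delta in (0.4, 0.1, 0.025):
    r = omega(delta)
    results[delta] = (r, math.sqrt(delta / 2.0), leading_term(r))
    print(delta, results[delta])
```

In this example the modulus decays like $\sqrt\delta$ even though $K$ has jumps, and the leading term shrinks with $\delta$. The elementary limit used in the proof can also be seen from $\log x\le x$ for $x>0$, which gives $r\sqrt{\log(A_0/r)}\le\sqrt{A_0 r}$ whenever $0<r<A_0$.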
