Undersmoothed Pointwise Asymptotic Normality and Confidence Interval for Kernel Density Estimation (Theorem # 6356)

Proof

[proofplan] We decompose the estimation error into a centered stochastic term and a deterministic bias term. The centered term is a triangular-array sum of independent kernel summands; Lyapunov's [central limit theorem](/theorems/521) applies because boundedness of $K$, together with the pointwise variance expansion, controls the required Lyapunov moments and identifies the limiting variance. The undersmoothing condition removes the bias at the standard-error scale. Finally, pointwise consistency makes the positive-part plug-in standard error asymptotically equivalent to the oracle standard error, and Slutsky's theorem gives the stated interval coverage. [/proofplan] [step:Normalize the centered stochastic term as a triangular array] Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space on which the independent random variables $X_i:(\Omega,\mathcal{F})\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ are defined. Let $(h_n)_{n\in\mathbb{N}}$ denote the bandwidth sequence from the theorem statement; this is the asymptotic version of the bandwidth parameter $h$ appearing in the interval formula. For each $n\in\mathbb{N}$ and $1\leq i\leq n$, define the real-valued [random variable](/page/Random%20Variable) \begin{align*} Y_{n,i} := \frac{1}{\sqrt{n h_n}} \left[ K\left(\frac{x-X_i}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_i}{h_n}\right)\right] \right]. \end{align*} Then $(Y_{n,i})_{1\leq i\leq n}$ are independent and centered. Their sum is exactly the centered kernel estimator at the $\sqrt{n h_n}$ scale: \begin{align*} \sum_{i=1}^{n}Y_{n,i} &= \sqrt{n h_n} \left( \hat f_{h_n}(x)-\mathbb{E}[\hat f_{h_n}(x)] \right). \end{align*} The variance of this sum is \begin{align*} s_n^2 := \operatorname{Var}\left(\sum_{i=1}^{n}Y_{n,i}\right) = n\operatorname{Var}(Y_{n,1}) = n h_n\,\operatorname{Var}(\hat f_{h_n}(x)). \end{align*} By the assumed pointwise variance expansion, \begin{align*} s_n^2\to f(x)R(K). 
\end{align*} Since $f(x)>0$ and $R(K)>0$, the limiting variance is positive. [/step] [step:Verify Lyapunov's condition for the centered kernel array] Set $\delta:=1$, let $\mathcal{L}^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}$, and let $M:=\|K\|_\infty<\infty$. Since $K$ is bounded, for every $u\in\mathbb{R}$, \begin{align*} |K(u)|^{2+\delta}\leq M^\delta K(u)^2. \end{align*} Thus $\int_{\mathbb{R}} |K(u)|^{2+\delta}\,d\mathcal{L}^1(u)<\infty$ follows from $K\in L^\infty(\mathbb{R})$ and $R(K)<\infty$. Using the inequality $|a-b|^{2+\delta}\leq 2^{1+\delta}(|a|^{2+\delta}+|b|^{2+\delta})$ for $a,b\in\mathbb{R}$ and [Jensen's inequality](/theorems/9) for the convex function $t\mapsto |t|^{2+\delta}$, we obtain \begin{align*} \mathbb{E}\left[ \left| K\left(\frac{x-X_1}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right] \right|^{2+\delta} \right] \leq 2^{2+\delta} \mathbb{E}\left[ \left| K\left(\frac{x-X_1}{h_n}\right) \right|^{2+\delta} \right]. \end{align*} Using the pointwise bound $|K(u)|^{2+\delta}\leq M^\delta K(u)^2$ then gives \begin{align*} \mathbb{E}\left[ \left| K\left(\frac{x-X_1}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right] \right|^{2+\delta} \right] \leq 2^{2+\delta}M^\delta \mathbb{E}\left[ K\left(\frac{x-X_1}{h_n}\right)^2 \right]. \end{align*} The bias expansion gives \begin{align*} \mathbb{E}[\hat f_{h_n}(x)]=f(x)+o(1). \end{align*} Since \begin{align*} \mathbb{E}[\hat f_{h_n}(x)]=\frac{1}{h_n}\mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right], \end{align*} we obtain \begin{align*} \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]=h_n f(x)+o(h_n), \end{align*} and hence the squared mean term is $O(h_n^2)$. The variance expansion gives \begin{align*} \operatorname{Var}\left(K\left(\frac{x-X_1}{h_n}\right)\right)=h_n f(x)R(K)+o(h_n). \end{align*} The second moment is the variance plus the squared mean, and the squared mean is $O(h_n^2)$, which is $o(h_n)$ as $h_n\to0$. Therefore \begin{align*} \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)^2\right]=h_n f(x)R(K)+o(h_n). 
\end{align*} Choose $A>f(x)R(K)$ such that, for all sufficiently large $n$, \begin{align*} \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)^2\right]\leq A h_n. \end{align*} Define the constant $C:=2^{2+\delta}M^\delta A$. Then, for all sufficiently large $n$, \begin{align*} \mathbb{E}\left[\left|K\left(\frac{x-X_1}{h_n}\right)-\mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]\right|^{2+\delta}\right]\leq C h_n. \end{align*} By the definition of $Y_{n,i}$, \begin{align*} \sum_{i=1}^{n}\mathbb{E}[|Y_{n,i}|^{2+\delta}]=n(nh_n)^{-(1+\delta/2)}\mathbb{E}\left[\left|K\left(\frac{x-X_1}{h_n}\right)-\mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]\right|^{2+\delta}\right]. \end{align*} Using the preceding centered-moment bound, \begin{align*} \sum_{i=1}^{n}\mathbb{E}[|Y_{n,i}|^{2+\delta}]\leq C(nh_n)^{-\delta/2}\to0. \end{align*} Since $s_n^2\to f(x)R(K)>0$, Lyapunov's condition holds: \begin{align*} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n}\mathbb{E}[|Y_{n,i}|^{2+\delta}] \to0. \end{align*} By the Lyapunov [central limit theorem](/theorems/1848) for triangular arrays, \begin{align*} \frac{\sum_{i=1}^{n}Y_{n,i}}{s_n} \xrightarrow{d} \mathcal N(0,1). \end{align*} [guided] The stochastic part of the estimator is a sum of independent terms whose distribution changes with $n$, because the bandwidth $h_n$ changes with $n$. This is why the correct central limit theorem is a triangular-array version, not the classical i.i.d. central limit theorem. We must verify Lyapunov's condition with exponent $\delta:=1$. The random variable in one summand is \begin{align*} K\left(\frac{x-X_1}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]. \end{align*} The first issue is to control its $(2+\delta)$ moment. Let $M:=\|K\|_\infty$. Since $K$ is bounded, for every $u\in\mathbb{R}$, \begin{align*} |K(u)|^{2+\delta}=|K(u)|^\delta K(u)^2\leq M^\delta K(u)^2. 
\end{align*} Consequently $\int_{\mathbb{R}} |K(u)|^{2+\delta}\,d\mathcal{L}^1(u)<\infty$ is automatic from boundedness of $K$ and $R(K)<\infty$. Also, for [real numbers](/page/Real%20Numbers) $a,b$, \begin{align*} |a-b|^{2+\delta}\leq 2^{1+\delta}(|a|^{2+\delta}+|b|^{2+\delta}). \end{align*} Applying this with $a=K\left(\frac{x-X_1}{h_n}\right)$ and $b=\mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]$, and then using [Jensen's inequality](/theorems/1977) for the convex function $t\mapsto |t|^{2+\delta}$ gives \begin{align*} \mathbb{E}\left[ \left| K\left(\frac{x-X_1}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right] \right|^{2+\delta} \right] \leq 2^{2+\delta} \mathbb{E}\left[ \left| K\left(\frac{x-X_1}{h_n}\right) \right|^{2+\delta} \right]. \end{align*} Using $|K(u)|^{2+\delta}\leq M^\delta K(u)^2$ then gives \begin{align*} \mathbb{E}\left[ \left| K\left(\frac{x-X_1}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right] \right|^{2+\delta} \right] \leq 2^{2+\delta}M^\delta \mathbb{E}\left[ K\left(\frac{x-X_1}{h_n}\right)^2 \right]. \end{align*} The pointwise variance expansion gives the size of the last expectation. Indeed, \begin{align*} \operatorname{Var}(\hat f_{h_n}(x)) = \frac{1}{n h_n^2} \operatorname{Var}\left(K\left(\frac{x-X_1}{h_n}\right)\right), \end{align*} so the hypothesis \begin{align*} \operatorname{Var}(\hat f_{h_n}(x)) = \frac{f(x)R(K)}{n h_n}+o\left(\frac{1}{n h_n}\right) \end{align*} implies \begin{align*} \operatorname{Var}\left(K\left(\frac{x-X_1}{h_n}\right)\right) = h_n f(x)R(K)+o(h_n). \end{align*} The bias expansion also gives \begin{align*} \mathbb{E}[\hat f_{h_n}(x)]=f(x)+o(1). \end{align*} Since $\mathbb{E}[\hat f_{h_n}(x)]=h_n^{-1}\mathbb{E}[K((x-X_1)/h_n)]$, this implies \begin{align*} \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]=h_n f(x)+o(h_n). 
\end{align*} Thus the squared mean is $O(h_n^2)$, which is $o(h_n)$ as $h_n\to0$, and consequently \begin{align*} \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)^2\right]=h_n f(x)R(K)+o(h_n). \end{align*} Choose $A>f(x)R(K)$ such that, for all sufficiently large $n$, \begin{align*} \mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)^2\right]\leq A h_n. \end{align*} The constant in the centered-moment estimate is therefore explicit: define $C:=2^{2+\delta}M^\delta A$. Then, for all sufficiently large $n$, \begin{align*} \mathbb{E}\left[\left|K\left(\frac{x-X_1}{h_n}\right)-\mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]\right|^{2+\delta}\right]\leq C h_n. \end{align*} Now compute Lyapunov's numerator for the array $Y_{n,i}$. Since \begin{align*} Y_{n,i} = \frac{1}{\sqrt{n h_n}} \left[ K\left(\frac{x-X_i}{h_n}\right) - \mathbb{E}\left[K\left(\frac{x-X_i}{h_n}\right)\right] \right], \end{align*} we have \begin{align*} \sum_{i=1}^{n}\mathbb{E}[|Y_{n,i}|^{2+\delta}]=n(nh_n)^{-(1+\delta/2)}\mathbb{E}\left[\left|K\left(\frac{x-X_1}{h_n}\right)-\mathbb{E}\left[K\left(\frac{x-X_1}{h_n}\right)\right]\right|^{2+\delta}\right]. \end{align*} The centered-moment estimate gives \begin{align*} \sum_{i=1}^{n}\mathbb{E}[|Y_{n,i}|^{2+\delta}]\leq C(nh_n)^{-\delta/2}. \end{align*} Because $nh_n\to\infty$, the right-hand side tends to $0$. The variance of the whole array is \begin{align*} s_n^2 = \operatorname{Var}\left(\sum_{i=1}^{n}Y_{n,i}\right) = n h_n\,\operatorname{Var}(\hat f_{h_n}(x)) \to f(x)R(K)>0. \end{align*} Therefore \begin{align*} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n}\mathbb{E}[|Y_{n,i}|^{2+\delta}] \to0. \end{align*} This is exactly Lyapunov's condition. Hence the Lyapunov central limit theorem for triangular arrays yields \begin{align*} \frac{\sum_{i=1}^{n}Y_{n,i}}{s_n} \xrightarrow{d} \mathcal N(0,1). 
\end{align*} [/guided] [/step] [step:Replace the random-array variance by its deterministic limit] Since \begin{align*} s_n^2\to f(x)R(K)>0, \end{align*} the deterministic ratio satisfies \begin{align*} \frac{s_n}{\sqrt{f(x)R(K)}}\to1. \end{align*} By Slutsky's theorem, \begin{align*} \frac{\sqrt{n h_n}\left(\hat f_{h_n}(x)-\mathbb{E}[\hat f_{h_n}(x)]\right)} {\sqrt{f(x)R(K)}} \xrightarrow{d} \mathcal N(0,1). \end{align*} Equivalently, \begin{align*} \frac{\hat f_{h_n}(x)-\mathbb{E}[\hat f_{h_n}(x)]} {\sqrt{f(x)R(K)/(n h_n)}} \xrightarrow{d} \mathcal N(0,1). \end{align*} [/step] [step:Use undersmoothing to remove the bias] Decompose \begin{align*} \frac{\hat f_{h_n}(x)-f(x)} {\sqrt{f(x)R(K)/(n h_n)}} &= \frac{\hat f_{h_n}(x)-\mathbb{E}[\hat f_{h_n}(x)]} {\sqrt{f(x)R(K)/(n h_n)}} + \frac{\mathbb{E}[\hat f_{h_n}(x)]-f(x)} {\sqrt{f(x)R(K)/(n h_n)}}. \end{align*} The first term converges in distribution to $\mathcal N(0,1)$ by the previous step. The second term equals \begin{align*} \frac{\sqrt{n h_n}\left(\mathbb{E}[\hat f_{h_n}(x)]-f(x)\right)} {\sqrt{f(x)R(K)}}, \end{align*} which tends to $0$ by the undersmoothing condition and the positivity of $f(x)R(K)$. A second application of Slutsky's theorem gives \begin{align*} \frac{\hat f_{h_n}(x)-f(x)} {\sqrt{f(x)R(K)/(n h_n)}} \xrightarrow{d} \mathcal N(0,1). \end{align*} [/step] [step:Show that the plug-in standard error is asymptotically equivalent] We first prove pointwise consistency. The pointwise variance expansion gives \begin{align*} \operatorname{Var}(\hat f_{h_n}(x)) = \frac{f(x)R(K)}{n h_n}+o\left(\frac{1}{n h_n}\right) \to0, \end{align*} because the bandwidth assumptions include $n h_n\to\infty$. The undersmoothing condition gives \begin{align*} \sqrt{n h_n}\left(\mathbb{E}[\hat f_{h_n}(x)]-f(x)\right)\to0. \end{align*} Since $n h_n\to\infty$, this implies \begin{align*} \mathbb{E}[\hat f_{h_n}(x)]\to f(x). 
\end{align*} For every $\varepsilon>0$, the triangle inequality gives \begin{align*} \mathbb{P}\left( |\hat f_{h_n}(x)-f(x)|>\varepsilon \right) \leq \mathbb{P}\left( |\hat f_{h_n}(x)-\mathbb{E}[\hat f_{h_n}(x)]|>\frac{\varepsilon}{2} \right) \end{align*} for all sufficiently large $n$, because $|\mathbb{E}[\hat f_{h_n}(x)]-f(x)|\leq\varepsilon/2$ eventually. Hence, by [Chebyshev's inequality](/theorems/1126) applied to $\hat f_{h_n}(x)-\mathbb{E}[\hat f_{h_n}(x)]$, \begin{align*} \mathbb{P}\left( |\hat f_{h_n}(x)-f(x)|>\varepsilon \right) \leq \frac{4\operatorname{Var}(\hat f_{h_n}(x))}{\varepsilon^2} \to0. \end{align*} Thus \begin{align*} \hat f_{h_n}(x)\xrightarrow{\mathbb P} f(x). \end{align*} Define the positive part $a^+:=\max\{a,0\}$ for $a\in\mathbb{R}$. Since $f(x)>0$ and $\hat f_{h_n}(x)\xrightarrow{\mathbb P} f(x)$, the [continuous mapping theorem](/theorems/1847) applied to the continuous map $a\mapsto \sqrt{a^+/f(x)}$ gives \begin{align*} \sqrt{\frac{\hat f_{h_n}(x)^+}{f(x)}}\xrightarrow{\mathbb P}1. \end{align*} Equivalently, \begin{align*} \frac{\sqrt{\hat f_{h_n}(x)^+R(K)/(n h_n)}}{\sqrt{f(x)R(K)/(n h_n)}}\xrightarrow{\mathbb P}1. \end{align*} Therefore, by Slutsky's theorem, \begin{align*} \frac{\hat f_{h_n}(x)-f(x)}{\sqrt{\hat f_{h_n}(x)^+R(K)/(n h_n)}}\xrightarrow{d}\mathcal N(0,1), \end{align*} where the statistic may be assigned any value on the event $\{\hat f_{h_n}(x)^+=0\}$, whose probability tends to $0$. [/step] [step:Convert the studentized limit into interval coverage] Let $\Phi: \mathbb{R}\to[0,1]$ denote the distribution function of $\mathcal N(0,1)$, and let $z_{1-\tau/2}$ be the number satisfying \begin{align*} \Phi(z_{1-\tau/2})=1-\tau/2. \end{align*} Let $L_n$ and $U_n$ denote the lower and upper endpoints \begin{align*} L_n:=\hat f_{h_n}(x)-z_{1-\tau/2}\sqrt{\frac{\hat f_{h_n}(x)^+R(K)}{n h_n}} \end{align*} and \begin{align*} U_n:=\hat f_{h_n}(x)+z_{1-\tau/2}\sqrt{\frac{\hat f_{h_n}(x)^+R(K)}{n h_n}}. \end{align*} Define $I_n:=[L_n,U_n]$. 
The event that the interval covers $f(x)$ is $\{f(x)\in I_n\}$. On the event $\{\hat f_{h_n}(x)^+>0\}$, this event is equivalent to \begin{align*} \left\{\left|\frac{\hat f_{h_n}(x)-f(x)}{\sqrt{\hat f_{h_n}(x)^+R(K)/(n h_n)}}\right|\leq z_{1-\tau/2}\right\}. \end{align*} The probability of $\{\hat f_{h_n}(x)^+=0\}$ tends to $0$ because $\hat f_{h_n}(x)\xrightarrow{\mathbb P}f(x)>0$. Since the studentized statistic converges in distribution to $\mathcal N(0,1)$ and the boundary points $\pm z_{1-\tau/2}$ have zero normal probability, \begin{align*} \mathbb{P}\left( \left| \frac{\hat f_{h_n}(x)-f(x)}{\sqrt{\hat f_{h_n}(x)^+R(K)/(n h_n)}} \right| \leq z_{1-\tau/2} \right) \to \Phi(z_{1-\tau/2})-\Phi(-z_{1-\tau/2}) = 1-\tau. \end{align*} Thus the positive-part plug-in pointwise interval has asymptotic coverage $1-\tau$. [/step]
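
The interval $I_n=[L_n,U_n]$ constructed in the proof can be illustrated numerically. The following is a minimal Python sketch under assumptions that are not part of the theorem statement: a Gaussian kernel (for which $R(K)=1/(2\sqrt{\pi})$), standard normal data, and the bandwidth $h_n=n^{-1/3}$, which is undersmoothed when the bias is of order $h_n^2$ because then $\sqrt{n h_n}\,h_n^2=n^{-1/3}\to0$. The names `kde_at`, `plugin_interval`, and `coverage` are hypothetical helpers introduced only for this illustration.

```python
import math
import random
from statistics import NormalDist

# R(K) = integral of K(u)^2 du for the standard Gaussian kernel (assumed kernel).
R_GAUSS = 1.0 / (2.0 * math.sqrt(math.pi))


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_at(x, sample, h):
    """Kernel density estimate at the point x with bandwidth h."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


def plugin_interval(x, sample, h, tau=0.05):
    """Positive-part plug-in interval [L_n, U_n] at nominal level 1 - tau."""
    n = len(sample)
    f_hat = kde_at(x, sample, h)
    z = NormalDist().inv_cdf(1.0 - tau / 2.0)
    # Standard error uses the positive part of the estimate, as in the proof.
    se = math.sqrt(max(f_hat, 0.0) * R_GAUSS / (n * h))
    return f_hat - z * se, f_hat + z * se


def coverage(x, n, reps, tau=0.05, seed=0):
    """Monte Carlo coverage frequency for standard normal data (assumption)."""
    rng = random.Random(seed)
    h = n ** (-1.0 / 3.0)  # undersmoothed relative to the usual n^(-1/5) rate
    truth = NormalDist().pdf(x)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = plugin_interval(x, sample, h, tau)
        hits += lo <= truth <= hi
    return hits / reps


if __name__ == "__main__":
    # Empirical coverage at x = 0; the theorem predicts a value near 1 - tau
    # for large n, up to Monte Carlo error.
    print(coverage(x=0.0, n=1000, reps=200))
```

The result is only asymptotic, so the empirical frequency in a finite simulation need not equal $1-\tau$ exactly. Replacing the exponent $-1/3$ by the rate-optimal $-1/5$ in this sketch would leave a bias of the same order as the standard error, which is the situation the undersmoothing condition rules out.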
