Pointwise Asymptotic Normality of the Kernel Density Estimator (Theorem # 6317)

Proof

[proofplan] We write the centred kernel estimator as the sum of a triangular array of independent centred random variables. The variance of this array is computed directly by a change of variables and the [dominated convergence theorem](/theorems/4), giving the limiting variance $f(x)R(K)$. Boundedness and compact support of $K$, together with $nh_n\to\infty$, imply that the individual triangular-array summands converge uniformly to $0$, so the Lindeberg condition holds. The final centring replacement follows because the scaled bias is deterministic and tends to $0$. [/proofplan] [step:Rewrite the centred estimator as a triangular-array sum] Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which the real-valued random variables $X_i:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ are defined. Let $\mathcal L^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb R$, let $\mathbb E[Z]:=\int_\Omega Z\,d\mathbb P$ denote expectation of an integrable real-valued [random variable](/page/Random%20Variable) $Z$, and let $\operatorname{Var}(Z):=\mathbb E[(Z-\mathbb E[Z])^2]$ denote its variance when $Z\in L^2(\Omega,\mathcal F,\mathbb P)$. Define the kernel estimator by \begin{align*} \hat f_{n,h_n}(x):=\frac{1}{nh_n}\sum_{i=1}^{n}K\left(\frac{x-X_i}{h_n}\right). \end{align*} Define the squared kernel integral by \begin{align*} R(K):=\int_{\mathbb R}K(t)^2\,d\mathcal L^1(t). \end{align*} Let $\operatorname{supp}K$ denote the closed support of the measurable kernel $K:\mathbb R\to\mathbb R$. Because $K$ is measurable, each composition $\omega\mapsto K((x-X_i(\omega))/h_n)$ is measurable. For each $n\in\mathbb N$ and $1\le i\le n$, define the random variable \begin{align*} W_{n,i}:\Omega\to\mathbb R,\qquad W_{n,i}(\omega)=h_n^{-1}K\left(\frac{x-X_i(\omega)}{h_n}\right), \end{align*} and define \begin{align*} Y_{n,i}:\Omega\to\mathbb R,\qquad Y_{n,i}(\omega)=\sqrt{\frac{h_n}{n}}\left(W_{n,i}(\omega)-\mathbb E[W_{n,i}]\right). 
\end{align*} For each fixed $n$, the random variables $Y_{n,1},\dots,Y_{n,n}$ are independent because $X_1,\dots,X_n$ are independent, and they are centred by construction. Moreover, \begin{align*} \sum_{i=1}^{n}Y_{n,i}=\sqrt{\frac{h_n}{n}}\sum_{i=1}^{n}\left(h_n^{-1}K\left(\frac{x-X_i}{h_n}\right)-\mathbb E\left[h_n^{-1}K\left(\frac{x-X_i}{h_n}\right)\right]\right). \end{align*} Equivalently, \begin{align*} \sum_{i=1}^{n}Y_{n,i}=\sqrt{nh_n}\left(\frac{1}{nh_n}\sum_{i=1}^{n}K\left(\frac{x-X_i}{h_n}\right)-\mathbb E[\hat f_{n,h_n}(x)]\right). \end{align*} By the definition of $\hat f_{n,h_n}(x)$, \begin{align*} \sum_{i=1}^{n}Y_{n,i}=\sqrt{nh_n}\left(\hat f_{n,h_n}(x)-\mathbb E[\hat f_{n,h_n}(x)]\right). \end{align*} Thus it suffices to prove a [central limit theorem](/theorems/521) for the triangular array $(Y_{n,i})_{1\le i\le n}$. [/step] [step:Compute the limiting variance of the triangular array] Let \begin{align*} \sigma_n^2:=\sum_{i=1}^{n}\operatorname{Var}(Y_{n,i}). \end{align*} Since the $Y_{n,i}$ have the same distribution for fixed $n$, \begin{align*} \sigma_n^2=n\operatorname{Var}(Y_{n,1})=h_n\operatorname{Var}(W_{n,1}). \end{align*} We compute the two terms in this variance. Since $X_1$ has density $f$ with respect to $\mathcal L^1$, \begin{align*} \mathbb E[W_{n,1}] &=\int_{\mathbb R} h_n^{-1}K\left(\frac{x-u}{h_n}\right)f(u)\,d\mathcal L^1(u). \end{align*} Under the substitution $t=(x-u)/h_n$, equivalently $u=x-h_nt$, the one-dimensional Lebesgue measure transforms as $d\mathcal L^1(u)=h_n\,d\mathcal L^1(t)$, and the domain $\mathbb R$ is mapped onto $\mathbb R$. Therefore \begin{align*} \mathbb E[W_{n,1}] &=\int_{\mathbb R}K(t)f(x-h_nt)\,d\mathcal L^1(t). \end{align*} Similarly, \begin{align*} h_n\mathbb E[W_{n,1}^2] =h_n\int_{\mathbb R}h_n^{-2}K\left(\frac{x-u}{h_n}\right)^2f(u)\,d\mathcal L^1(u). \end{align*} Applying the same substitution gives \begin{align*} h_n\mathbb E[W_{n,1}^2] =\int_{\mathbb R}K(t)^2f(x-h_nt)\,d\mathcal L^1(t). 
\end{align*} Choose $A>0$ such that $\operatorname{supp}K\subset[-A,A]$. Since $f$ is continuous at $x$, there exist $\delta>0$ and $B<\infty$ such that $f(y)\le B$ whenever $|y-x|<\delta$. For all sufficiently large $n$, $h_nA<\delta$, so on $\operatorname{supp}K$ we have $f(x-h_nt)\le B$. Since $K$ is bounded and supported in $[-A,A]$, the functions $t\mapsto B K(t)^2$ and $t\mapsto B|K(t)|$ are integrable with respect to $\mathcal L^1$. The dominated convergence theorem, applied first with dominating function $B K(t)^2$ and then with dominating function $B|K(t)|$, gives \begin{align*} \int_{\mathbb R}K(t)^2f(x-h_nt)\,d\mathcal L^1(t) \to f(x)\int_{\mathbb R}K(t)^2\,d\mathcal L^1(t) =f(x)R(K), \end{align*} and \begin{align*} \int_{\mathbb R}K(t)f(x-h_nt)\,d\mathcal L^1(t) \to f(x)\int_{\mathbb R}K(t)\,d\mathcal L^1(t). \end{align*} Hence $\mathbb E[W_{n,1}]$ is bounded as $n\to\infty$, and because $h_n\to0$, \begin{align*} h_n(\mathbb E[W_{n,1}])^2\to0. \end{align*} Combining the preceding identities, \begin{align*} \sigma_n^2=h_n\operatorname{Var}(W_{n,1}). \end{align*} The variance identity gives \begin{align*} \sigma_n^2=h_n\mathbb E[W_{n,1}^2]-h_n(\mathbb E[W_{n,1}])^2. \end{align*} Therefore \begin{align*} \sigma_n^2\to f(x)R(K). \end{align*} [guided] The variance computation is the point where the scaling $\sqrt{nh_n}$ is determined. We need the total variance of the triangular array to converge to a finite non-zero quantity, and the natural candidate is the local value of the density times the squared $L^2$ size of the kernel. For each $n\in\mathbb N$, define \begin{align*} \sigma_n^2:=\sum_{i=1}^{n}\operatorname{Var}(Y_{n,i}). \end{align*} Because $X_1,\dots,X_n$ are identically distributed, the random variables $Y_{n,1},\dots,Y_{n,n}$ also have the same distribution. 
Therefore \begin{align*} \sigma_n^2=n\operatorname{Var}(Y_{n,1})=n\operatorname{Var}\left(\sqrt{\frac{h_n}{n}}W_{n,1}\right)=h_n\operatorname{Var}(W_{n,1}), \end{align*} where the second equality holds because subtracting the constant $\sqrt{h_n/n}\,\mathbb E[W_{n,1}]$ in the definition of $Y_{n,1}$ does not change the variance. We now compute $\mathbb E[W_{n,1}]$ and $\mathbb E[W_{n,1}^2]$ from the density of $X_1$. Since $X_1$ has density $f$ with respect to $\mathcal L^1$, \begin{align*} \mathbb E[W_{n,1}] &=\int_{\mathbb R} h_n^{-1}K\left(\frac{x-u}{h_n}\right)f(u)\,d\mathcal L^1(u). \end{align*} Use the substitution $t=(x-u)/h_n$, so that $u=x-h_nt$. The map $t\mapsto x-h_nt$ sends $\mathbb R$ onto $\mathbb R$, and the one-dimensional Lebesgue measure transforms by \begin{align*} d\mathcal L^1(u)=h_n\,d\mathcal L^1(t). \end{align*} Thus \begin{align*} \mathbb E[W_{n,1}] &=\int_{\mathbb R}K(t)f(x-h_nt)\,d\mathcal L^1(t). \end{align*} The same substitution applied to the second moment starts from \begin{align*} h_n\mathbb E[W_{n,1}^2] =h_n\int_{\mathbb R}h_n^{-2}K\left(\frac{x-u}{h_n}\right)^2f(u)\,d\mathcal L^1(u). \end{align*} After substituting $t=(x-u)/h_n$, this becomes \begin{align*} h_n\mathbb E[W_{n,1}^2] =\int_{\mathbb R}K(t)^2f(x-h_nt)\,d\mathcal L^1(t). \end{align*} Now we justify the limiting passage. Since $K$ has compact support, choose $A>0$ such that $\operatorname{supp}K\subset[-A,A]$. Since $f$ is continuous at $x$, it is bounded in a neighbourhood of $x$: there exist $\delta>0$ and $B<\infty$ such that $f(y)\le B$ whenever $|y-x|<\delta$. For all sufficiently large $n$, $h_nA<\delta$. Hence, whenever $t\in\operatorname{supp}K$, we have \begin{align*} |x-h_nt-x|\le h_nA<\delta, \end{align*} and therefore $f(x-h_nt)\le B$. The functions $t\mapsto K(t)^2f(x-h_nt)$ are then dominated by the integrable function $t\mapsto B K(t)^2$ supported on $[-A,A]$. Also $f(x-h_nt)\to f(x)$ for each fixed $t\in\mathbb R$, because $h_n\to0$ and $f$ is continuous at $x$. 
By the dominated convergence theorem, \begin{align*} \int_{\mathbb R}K(t)^2f(x-h_nt)\,d\mathcal L^1(t) \to f(x)\int_{\mathbb R}K(t)^2\,d\mathcal L^1(t) =f(x)R(K). \end{align*} For the first moment, the possible sign changes of $K$ require domination by the absolute value. Since $K$ is bounded and compactly supported, the function $t\mapsto B|K(t)|$ is integrable with respect to $\mathcal L^1$, and \begin{align*} |K(t)f(x-h_nt)|\le B|K(t)| \end{align*} for all sufficiently large $n$ and all $t\in\operatorname{supp}K$. Applying the dominated convergence theorem again gives \begin{align*} \int_{\mathbb R}K(t)f(x-h_nt)\,d\mathcal L^1(t) \to f(x)\int_{\mathbb R}K(t)\,d\mathcal L^1(t). \end{align*} In particular, $\mathbb E[W_{n,1}]$ is bounded as $n\to\infty$. Therefore \begin{align*} h_n(\mathbb E[W_{n,1}])^2\to0, \end{align*} because $h_n\to0$. Finally, \begin{align*} \sigma_n^2=h_n\operatorname{Var}(W_{n,1}). \end{align*} Using the variance identity gives \begin{align*} \sigma_n^2=h_n\mathbb E[W_{n,1}^2]-h_n(\mathbb E[W_{n,1}])^2. \end{align*} Combining the two limits proved above, \begin{align*} \sigma_n^2\to f(x)R(K). \end{align*} This proves that the total variance of the centred triangular array converges to the variance appearing in the claimed normal limit. [/guided] [/step] [step:Verify the Lindeberg condition from the uniform boundedness of the kernel] Let $M:=\sup_{t\in\mathbb R}|K(t)|<\infty$. From the preceding step, there is a constant $C_0<\infty$ such that $|\mathbb E[W_{n,1}]|\le C_0$ for all sufficiently large $n$. Hence, for all sufficiently large $n$ and all $1\le i\le n$, \begin{align*} |Y_{n,i}|\le \sqrt{\frac{h_n}{n}}\left(|W_{n,i}|+|\mathbb E[W_{n,i}]|\right). \end{align*} Using \begin{align*} |W_{n,i}|\le \frac{M}{h_n} \end{align*} gives \begin{align*} |Y_{n,i}|\le \sqrt{\frac{h_n}{n}}\left(\frac{M}{h_n}+C_0\right). \end{align*} Equivalently, \begin{align*} |Y_{n,i}|\le\frac{M}{\sqrt{nh_n}}+C_0\sqrt{\frac{h_n}{n}}. 
\end{align*} Since \begin{align*} nh_n\to\infty \end{align*} and \begin{align*} \frac{h_n}{n}\to0, \end{align*} the right-hand side tends to $0$. Therefore, for every $\varepsilon>0$, there exists $N_\varepsilon\in\mathbb N$ such that $|Y_{n,i}|\le\varepsilon$ for all $n\ge N_\varepsilon$ and all $1\le i\le n$. Consequently, \begin{align*} \sum_{i=1}^{n}\mathbb E\left[Y_{n,i}^2\mathbb{1}_{\{|Y_{n,i}|>\varepsilon\}}\right]=0 \end{align*} for all $n\ge N_\varepsilon$. This is the Lindeberg condition for the triangular array $(Y_{n,i})_{1\le i\le n}$. [guided] We need to verify that no single summand in the triangular array can contribute a non-negligible jump. Let \begin{align*} M:=\sup_{t\in\mathbb R}|K(t)|<\infty. \end{align*} From the variance computation, the sequence $\mathbb E[W_{n,1}]$ is bounded for all sufficiently large $n$; hence there is a constant $C_0<\infty$ such that $|\mathbb E[W_{n,1}]|\le C_0$ for all sufficiently large $n$. For such $n$ and all $1\le i\le n$, \begin{align*} |Y_{n,i}|\le \sqrt{\frac{h_n}{n}}\left(|W_{n,i}|+|\mathbb E[W_{n,i}]|\right). \end{align*} The boundedness of $K$ gives \begin{align*} |W_{n,i}|=h_n^{-1}\left|K\left(\frac{x-X_i}{h_n}\right)\right|\le \frac{M}{h_n}. \end{align*} Therefore \begin{align*} |Y_{n,i}|\le\frac{M}{\sqrt{nh_n}}+C_0\sqrt{\frac{h_n}{n}}. \end{align*} The first term tends to $0$ because $nh_n\to\infty$, and the second term tends to $0$ because $h_n\to0$ while $n\to\infty$. Thus for every $\varepsilon>0$ there exists $N_\varepsilon\in\mathbb N$ such that $|Y_{n,i}|\le\varepsilon$ for all $n\ge N_\varepsilon$ and all $1\le i\le n$. Hence the event $\{|Y_{n,i}|>\varepsilon\}$ is empty for every row index $i$ once $n\ge N_\varepsilon$, and so \begin{align*} \sum_{i=1}^{n}\mathbb E\left[Y_{n,i}^2\mathbb{1}_{\{|Y_{n,i}|>\varepsilon\}}\right]=0. \end{align*} This proves the Lindeberg condition. 
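As a numerical illustration of the bound just derived, the following Python sketch evaluates $M/\sqrt{nh_n}+C_0\sqrt{h_n/n}$ and searches for an index at which it drops below a given $\varepsilon$. The bandwidth $h_n=n^{-0.4}$, the constants $M=0.75$ and $C_0=0.4$, and all function names are illustrative assumptions, not part of the theorem. The search runs only over powers of two, so it returns some admissible $N_\varepsilon$ rather than the smallest one, and it relies on the bound being non-increasing in $n$, which holds for this bandwidth.

```python
import math
from typing import Callable, Optional


def summand_bound(n: int, h: float, M: float, C0: float) -> float:
    """Uniform bound M / sqrt(n h) + C0 * sqrt(h / n) on |Y_{n,i}|."""
    return M / math.sqrt(n * h) + C0 * math.sqrt(h / n)


def bandwidth(n: int) -> float:
    """Hypothetical bandwidth h_n = n^(-0.4): h_n -> 0 and n * h_n -> infinity."""
    return n ** -0.4


def index_below(eps: float, M: float, C0: float,
                h_of_n: Callable[[int], float],
                n_max: int = 2 ** 40) -> Optional[int]:
    """Return the first power of two n <= n_max whose bound is <= eps, else None.

    The result is a valid N_eps only if the bound is non-increasing in n.
    """
    n = 1
    while n <= n_max:
        if summand_bound(n, h_of_n(n), M, C0) <= eps:
            return n
        n *= 2
    return None


# Illustrative constants (assumptions): M = sup|K| for the Epanechnikov kernel,
# C0 = an upper bound on |E[W_{n,1}]| when f is at most 0.4 near x.
M, C0 = 0.75, 0.4
for eps in (0.5, 0.1, 0.02):
    print(eps, index_below(eps, M, C0, bandwidth))
```

Once $n$ exceeds the returned index, every indicator $\mathbb{1}_{\{|Y_{n,i}|>\varepsilon\}}$ in the Lindeberg sum vanishes, which is exactly how the proof makes the sum equal to $0$.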
[/guided] [/step] [step:Apply the triangular-array central limit theorem] If $R(K)>0$, then, since $f(x)>0$, the variance limit from the second step is \begin{align*} \sigma^2:=f(x)R(K)>0. \end{align*} The triangular array $(Y_{n,i})_{1\le i\le n}$ is row-wise independent, centred, satisfies the Lindeberg condition, and has total variance $\sigma_n^2\to\sigma^2$. By the Lindeberg-Feller [central limit theorem](/theorems/1848) for triangular arrays applied to this row-wise independent centred array, \begin{align*} \sum_{i=1}^{n}Y_{n,i}\xrightarrow{d}\mathcal N(0,\sigma^2) =\mathcal N(0,f(x)R(K)). \end{align*} Using the identity from the first step, \begin{align*} \sqrt{nh_n}\left(\hat f_{n,h_n}(x)-\mathbb E[\hat f_{n,h_n}(x)]\right) \xrightarrow{d}\mathcal N(0,f(x)R(K)). \end{align*} If $R(K)=0$, then $K=0$ $\mathcal L^1$-a.e. Since each $X_i$ has a density with respect to $\mathcal L^1$, the random variable $K((x-X_i)/h_n)$ is $0$ almost surely for every $i$ and $n$. Hence the centred estimator is $0$ almost surely, and the same conclusion holds with the degenerate normal law $\mathcal N(0,0)$. [guided] There are two cases, depending on whether the limiting variance is positive. First suppose $R(K)>0$. Since $f(x)>0$, define \begin{align*} \sigma^2:=f(x)R(K)>0. \end{align*} The preceding steps verify exactly the hypotheses of the Lindeberg-Feller central limit theorem for triangular arrays: the array is row-wise independent, each summand is centred, the Lindeberg condition holds for every $\varepsilon>0$, and the total variance satisfies $\sigma_n^2\to\sigma^2$. Therefore \begin{align*} \sum_{i=1}^{n}Y_{n,i}\xrightarrow{d}\mathcal N(0,\sigma^2) =\mathcal N(0,f(x)R(K)). \end{align*} The first step identified this sum with the scaled centred estimator, so \begin{align*} \sqrt{nh_n}\left(\hat f_{n,h_n}(x)-\mathbb E[\hat f_{n,h_n}(x)]\right) \xrightarrow{d}\mathcal N(0,f(x)R(K)). 
\end{align*} If $R(K)=0$, then \begin{align*} \int_{\mathbb R}K(t)^2\,d\mathcal L^1(t)=0, \end{align*} so $K=0$ $\mathcal L^1$-a.e. Because each $X_i$ has a density with respect to $\mathcal L^1$, the transformed random variable $K((x-X_i)/h_n)$ is $0$ almost surely for every $i$ and $n$. Hence the centred estimator is $0$ almost surely, and the asserted convergence holds with the degenerate normal law $\mathcal N(0,0)$. [/guided] [/step] [step:Replace the expectation by $f(x)$ when the scaled bias vanishes] Assume now that, for some $s>0$, \begin{align*} \mathbb E[\hat f_{n,h_n}(x)]-f(x)=O(h_n^s) \end{align*} and that $\sqrt{nh_n}\,h_n^s\to0$. Then there are constants $C_b<\infty$ and $N_b\in\mathbb N$ such that, for all $n\ge N_b$, \begin{align*} \left|\mathbb E[\hat f_{n,h_n}(x)]-f(x)\right|\le C_bh_n^s. \end{align*} Multiplying by $\sqrt{nh_n}$ gives \begin{align*} \sqrt{nh_n}\left|\mathbb E[\hat f_{n,h_n}(x)]-f(x)\right| \le C_b\sqrt{nh_n}\,h_n^s\to0. \end{align*} Therefore \begin{align*} \sqrt{nh_n}\left(\hat f_{n,h_n}(x)-f(x)\right)=\sqrt{nh_n}\left(\hat f_{n,h_n}(x)-\mathbb E[\hat f_{n,h_n}(x)]\right)+\sqrt{nh_n}\left(\mathbb E[\hat f_{n,h_n}(x)]-f(x)\right), \end{align*} where the first term converges in distribution to $\mathcal N(0,f(x)R(K))$ and the second term is deterministic and converges to $0$. Hence, by Slutsky's theorem (convergence in distribution is stable under addition of deterministic $o(1)$ terms), \begin{align*} \sqrt{nh_n}\left(\hat f_{n,h_n}(x)-f(x)\right) \xrightarrow{d}\mathcal N(0,f(x)R(K)). \end{align*} This proves the bias-corrected centring statement and completes the proof. [/step]
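
The conclusion can be probed numerically. The following is a minimal Monte Carlo sketch under assumptions that are not part of the theorem: standard normal data, the Epanechnikov kernel (for which $R(K)=3/5$), the evaluation point $x=0$, and the hypothetical bandwidth $h_n=n^{-0.4}$. All function names are illustrative. It compares the empirical variance of $\sqrt{nh_n}\,(\hat f_{n,h_n}(x)-f(x))$ over repeated samples with the limiting variance $f(x)R(K)$.

```python
import math
import random
import statistics
from typing import List, Sequence


def epanechnikov(t: float) -> float:
    """Bounded, compactly supported kernel with R(K) = 3/5."""
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0


def normal_pdf(u: float) -> float:
    """Density f of the standard normal law used to draw the samples."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_at(x: float, sample: Sequence[float], h: float) -> float:
    """Kernel estimator (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(epanechnikov((x - xi) / h) for xi in sample) / (len(sample) * h)


def scaled_errors(x: float, n: int, h: float, reps: int, seed: int = 0) -> List[float]:
    """Return sqrt(n h) * (fhat(x) - f(x)) for `reps` independent samples of size n."""
    rng = random.Random(seed)
    scale = math.sqrt(n * h)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        out.append(scale * (kde_at(x, sample, h) - normal_pdf(x)))
    return out


# Illustrative settings (assumptions): n = 1000, bandwidth n^(-0.4), x = 0.
n, reps, x = 1000, 400, 0.0
h = n ** -0.4
errors = scaled_errors(x, n, h, reps)
empirical_var = statistics.variance(errors)
limit_var = normal_pdf(x) * 0.6  # f(x) * R(K) with R(K) = 3/5

print(f"empirical variance: {empirical_var:.4f}")
print(f"limiting variance : {limit_var:.4f}")
```

At finite $n$ the empirical variance should sit slightly below $f(x)R(K)$, because $\sigma_n^2$ still contains the term $-h_n(\mathbb E[W_{n,1}])^2$ that vanishes only in the limit. The agreement is also subject to Monte Carlo error from the finite number of repetitions.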
