Pointwise Variance Asymptotic for the Kernel Density Estimator (Theorem # 6315)

Proof

[proofplan] We first reduce the variance of the estimator to the variance of one summand using independence. We then compute the second moment of a single kernel summand by changing variables from the sample variable $y$ to the kernel variable $u = (x-y)/h$. The compact support of $K$ localizes the integral near $x$, where continuity of $f$ gives [uniform convergence](/page/Uniform%20Convergence) of $f(x-hu)$ to $f(x)$ over $u \in \operatorname{supp}K$. Finally, the squared mean stays bounded as $h \downarrow 0$, hence is of lower order than the $h^{-1}$ leading term. [/proofplan]

[step:Reduce the estimator variance to one summand] Let $(\Omega,\mathcal{F},\mathbb{P})$ denote the probability space on which the random variables $X_1,\dots,X_n$ are defined. For each $h > 0$ and each $i \in \{1,\dots,n\}$, define the real-valued [random variable](/page/Random%20Variable) $Y_{i,h}: \Omega \to \mathbb{R}$ by
\begin{align*}
Y_{i,h}(\omega) = \frac{1}{h}K\left(\frac{x - X_i(\omega)}{h}\right).
\end{align*}
Then
\begin{align*}
\hat f_{n,h}(x) = \frac{1}{n}\sum_{i=1}^n Y_{i,h}.
\end{align*}
Since $X_1,\dots,X_n$ are independent and identically distributed, the random variables $Y_{1,h},\dots,Y_{n,h}$ are independent and identically distributed. Therefore the [variance of a sum of independent random variables](/theorems/1119) gives
\begin{align*}
\operatorname{Var}(\hat f_{n,h}(x)) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^n Y_{i,h}\right) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(Y_{i,h}) = \frac{1}{n}\operatorname{Var}(Y_{1,h}).
\end{align*}
Thus it remains to prove
\begin{align*}
\operatorname{Var}(Y_{1,h}) = \frac{f(x)}{h}R(K) + o\left(\frac{1}{h}\right).
\end{align*}
[/step]

[step:Compute the second moment by changing variables] Since $X_1$ has density $f$ with respect to $\mathcal{L}^1$, the second moment of $Y_{1,h}$ is
\begin{align*}
\mathbb{E}[Y_{1,h}^2] &= \int_{\mathbb{R}} \frac{1}{h^2}K\left(\frac{x-y}{h}\right)^2 f(y)\,d\mathcal{L}^1(y).
\end{align*}
Use the change of variables $u = (x-y)/h$, equivalently $y = x-hu$. Since $h>0$, one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) transforms as $d\mathcal{L}^1(y) = h\,d\mathcal{L}^1(u)$. Therefore
\begin{align*}
\mathbb{E}[Y_{1,h}^2] &= \frac{1}{h}\int_{\mathbb{R}} K(u)^2 f(x-hu)\,d\mathcal{L}^1(u).
\end{align*}

[guided] We need the precise size of the second moment because the variance is the second moment minus the square of the mean. Starting from the definition of $Y_{1,h}$ and using the fact that $X_1$ has Lebesgue density $f$, we get
\begin{align*}
\mathbb{E}[Y_{1,h}^2] = \int_{\mathbb{R}} \left(\frac{1}{h}K\left(\frac{x-y}{h}\right)\right)^2 f(y)\,d\mathcal{L}^1(y) = \int_{\mathbb{R}} \frac{1}{h^2}K\left(\frac{x-y}{h}\right)^2 f(y)\,d\mathcal{L}^1(y).
\end{align*}
The natural kernel variable is $u = (x-y)/h$, so we set $y = x-hu$. Because $h>0$, this affine map scales one-dimensional Lebesgue measure by the factor $h$, meaning
\begin{align*}
d\mathcal{L}^1(y) = h\,d\mathcal{L}^1(u).
\end{align*}
The domain $\mathbb{R}$ is mapped onto $\mathbb{R}$ under this substitution. Hence
\begin{align*}
\mathbb{E}[Y_{1,h}^2] = \int_{\mathbb{R}} \frac{1}{h^2}K(u)^2 f(x-hu)\,h\,d\mathcal{L}^1(u) = \frac{1}{h}\int_{\mathbb{R}} K(u)^2 f(x-hu)\,d\mathcal{L}^1(u).
\end{align*}
This is the decisive normalization: the factor $h^{-1}$ is the source of the leading variance term. [/guided]
[/step]

[step:Use continuity at $x$ on the compact support of $K$] Let $S := \operatorname{supp}K \subset \mathbb{R}$. Since $K$ is compactly supported, $S$ is compact. Define
\begin{align*}
A_h := \int_{\mathbb{R}} K(u)^2 f(x-hu)\,d\mathcal{L}^1(u).
\end{align*}
Because $K(u)=0$ for $\mathcal{L}^1$-a.e. $u \notin S$, we have
\begin{align*}
A_h - f(x)R(K) = \int_S K(u)^2\bigl(f(x-hu)-f(x)\bigr)\,d\mathcal{L}^1(u).
\end{align*}
Continuity of $f$ at $x$ implies
\begin{align*}
\sup_{u \in S}|f(x-hu)-f(x)| \to 0
\end{align*}
as $h \downarrow 0$. Indeed, with $M := \sup_{u \in S}|u| < \infty$ we have $|(x-hu)-x| \le hM$ for every $u \in S$, so given $\varepsilon > 0$ and a $\delta > 0$ supplied by continuity of $f$ at $x$, the supremum is at most $\varepsilon$ whenever $hM < \delta$. Since $K \in L^2(\mathbb{R})$,
\begin{align*}
\int_S K(u)^2\,d\mathcal{L}^1(u) = R(K) < \infty.
\end{align*}
Therefore
\begin{align*}
|A_h - f(x)R(K)| \le \sup_{u \in S}|f(x-hu)-f(x)|\int_S K(u)^2\,d\mathcal{L}^1(u) = R(K)\sup_{u \in S}|f(x-hu)-f(x)| \to 0.
\end{align*}
Thus
\begin{align*}
\mathbb{E}[Y_{1,h}^2] = \frac{f(x)}{h}R(K) + o\left(\frac{1}{h}\right).
\end{align*}
[/step]

[step:Show the squared mean is lower order] The first moment of $Y_{1,h}$ is
\begin{align*}
\mathbb{E}[Y_{1,h}] = \int_{\mathbb{R}} \frac{1}{h}K\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y) = \int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u),
\end{align*}
using the same substitution $u=(x-y)/h$ and $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$. Since $K \in L^2(\mathbb{R})$ is supported on the compact set $S$, it belongs to $L^1(\mathbb{R})$ by Cauchy-Schwarz. Indeed,
\begin{align*}
\int_{\mathbb{R}} |K(u)|\,d\mathcal{L}^1(u) = \int_S |K(u)|\,d\mathcal{L}^1(u) \le \left(\mathcal{L}^1(S)\right)^{1/2}\left(\int_S K(u)^2\,d\mathcal{L}^1(u)\right)^{1/2} < \infty.
\end{align*}
By continuity of $f$ at $x$, there exist $\delta>0$ and $B>0$ such that $|f(y)|\le B$ whenever $|y-x|<\delta$. For all sufficiently small $h>0$, the inclusion $x-hS \subset (x-\delta,x+\delta)$ holds. Hence
\begin{align*}
|\mathbb{E}[Y_{1,h}]| \le \int_S |K(u)|\,|f(x-hu)|\,d\mathcal{L}^1(u) \le B\int_S |K(u)|\,d\mathcal{L}^1(u).
\end{align*}
Thus $\mathbb{E}[Y_{1,h}] = O(1)$ as $h \downarrow 0$, and consequently
\begin{align*}
(\mathbb{E}[Y_{1,h}])^2 = O(1) = o\left(\frac{1}{h}\right).
\end{align*}
[/step]

[step:Combine the moment estimates and rescale by $n$] By the identity $\operatorname{Var}(Y_{1,h})=\mathbb{E}[Y_{1,h}^2]-(\mathbb{E}[Y_{1,h}])^2$, the preceding two steps give
\begin{align*}
\operatorname{Var}(Y_{1,h}) = \frac{f(x)}{h}R(K) + o\left(\frac{1}{h}\right) - o\left(\frac{1}{h}\right) = \frac{f(x)}{h}R(K) + o\left(\frac{1}{h}\right).
\end{align*}
Using the variance reduction from independence,
\begin{align*}
\operatorname{Var}(\hat f_{n,h}(x)) = \frac{1}{n}\operatorname{Var}(Y_{1,h}) = \frac{f(x)}{nh}R(K) + o\left(\frac{1}{nh}\right).
\end{align*}
This is the asserted pointwise variance asymptotic. [/step]
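As a numerical sanity check of the single-summand statement $h\operatorname{Var}(Y_{1,h}) \to f(x)R(K)$, the following minimal sketch evaluates the two integrals in the kernel variable $u$ by Gauss-Legendre quadrature. The concrete choices are assumptions made only for illustration and do not come from the proof: $K$ is taken to be the Epanechnikov kernel on $[-1,1]$ (for which $R(K) = 3/5$), $f$ is the standard normal density, $x = 0$, and the helper names `kernel`, `density`, `summand_variance`, and `roughness` are hypothetical.

```python
import math
import numpy as np

# Assumed example kernel (not fixed by the proof): Epanechnikov, supported on [-1, 1].
def kernel(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

# Assumed example density (not fixed by the proof): standard normal.
def density(y):
    y = np.asarray(y, dtype=float)
    return np.exp(-0.5 * y**2) / math.sqrt(2.0 * math.pi)

# Gauss-Legendre nodes and weights on [-1, 1], which is the support S of the kernel.
NODES, WEIGHTS = np.polynomial.legendre.leggauss(64)

def roughness():
    """R(K) = integral of K(u)^2 over the support."""
    return float(np.sum(WEIGHTS * kernel(NODES) ** 2))

def summand_variance(x, h):
    """Var(Y_{1,h}) = E[Y^2] - (E[Y])^2, both written in the kernel variable u."""
    f_shifted = density(x - h * NODES)
    second_moment = float(np.sum(WEIGHTS * kernel(NODES) ** 2 * f_shifted)) / h
    mean = float(np.sum(WEIGHTS * kernel(NODES) * f_shifted))
    return second_moment - mean**2

x = 0.0
target = float(density(x)) * roughness()  # the limit f(x) R(K)
for h in (0.5, 0.1, 0.02, 0.004):
    # The middle column should approach the last column as h shrinks.
    print(h, h * summand_variance(x, h), target)
```

In this example the gap between $h\operatorname{Var}(Y_{1,h})$ and $f(x)R(K)$ is dominated by $h(\mathbb{E}[Y_{1,h}])^2$, the squared-mean term that the proof shows to be of lower order.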
