Pointwise Asymptotic Bias and Variance of the Kernel Density Estimator (Theorem # 6355)

Theorem

Proof

[proofplan] We compute the expectation and variance directly from the one-sample summand in the kernel estimator. The expectation becomes a kernel average of $f(x-hu)$ after the change of variables $y=x-hu$, and a second-order Taylor expansion near $x$ gives the bias; symmetry of $K$ removes the first-order term. For the variance, independence reduces the computation to the variance of one summand. Its second moment is $h^{-1} f(x)R(K)$ to leading order, while the squared mean is $O(1)$ and therefore of lower order; dividing by $n$ then gives the variance of the estimator. [/proofplan] [step:Rewrite the expectation as a kernel average of the density near $x$] Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space from the statement, and let $X_1,\dots,X_n: (\Omega,\mathcal{F})\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ be the independent identically distributed real-valued random variables with common density $f$. For each $h>0$, define the measurable [random variable](/page/Random%20Variable) \begin{align*} Y_{h,1}: \Omega \to \mathbb{R}, \qquad Y_{h,1}(\omega)=\frac{1}{h}K\left(\frac{x-X_1(\omega)}{h}\right). \end{align*} Then \begin{align*} \hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} Y_{h,i}, \end{align*} where $Y_{h,i}$ is defined from $X_i$ in the same way as $Y_{h,1}$. Since $X_1$ has density $f$ with respect to $\mathcal{L}^1$, the expectation of $Y_{h,1}$ is \begin{align*} \mathbb{E}[Y_{h,1}] = \int_{\mathbb{R}} \frac{1}{h} K\left(\frac{x-y}{h}\right) f(y)\,d\mathcal{L}^1(y). \end{align*} Apply the change of variables $u=(x-y)/h$, equivalently $y=x-hu$. The one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) transforms as $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, and the domain $\mathbb{R}$ maps onto $\mathbb{R}$. Hence \begin{align*} \mathbb{E}[Y_{h,1}] = \int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u).
\end{align*} Linearity of expectation and identical distribution of the $Y_{h,i}$ give \begin{align*} \mathbb{E}[\hat f_h(x)] = \mathbb{E}[Y_{h,1}] = \int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u). \end{align*} [guided] The estimator is an average of independent copies of a single kernel summand, so the first task is to compute the mean of that one summand. Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space from the statement, and let $X_1,\dots,X_n: (\Omega,\mathcal{F})\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ be the independent identically distributed real-valued random variables with common density $f$. Define \begin{align*} Y_{h,1}: \Omega \to \mathbb{R}, \qquad Y_{h,1}(\omega)=\frac{1}{h}K\left(\frac{x-X_1(\omega)}{h}\right). \end{align*} For each $i \in \{1,\dots,n\}$, define $Y_{h,i}$ by replacing $X_1$ with $X_i$. Then \begin{align*} \hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n}Y_{h,i}. \end{align*} Because $X_1$ has probability density function $f: \mathbb{R}\to[0,\infty)$ with respect to $\mathcal{L}^1$, expectation against $X_1$ is integration against $f(y)\,d\mathcal{L}^1(y)$. Therefore \begin{align*} \mathbb{E}[Y_{h,1}] = \int_{\mathbb{R}} \frac{1}{h}K\left(\frac{x-y}{h}\right) f(y)\,d\mathcal{L}^1(y). \end{align*} Now use the substitution $u=(x-y)/h$, so $y=x-hu$. Since $h>0$, this affine change of variables maps $\mathbb{R}$ bijectively onto $\mathbb{R}$ and transforms Lebesgue measure by \begin{align*} d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u). \end{align*} Substituting gives \begin{align*} \mathbb{E}[Y_{h,1}] = \int_{\mathbb{R}} K(u) f(x-hu)\,d\mathcal{L}^1(u). \end{align*} Finally, expectation is linear and all $Y_{h,i}$ have the same distribution, so \begin{align*} \mathbb{E}[\hat f_h(x)] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[Y_{h,i}] = \mathbb{E}[Y_{h,1}] = \int_{\mathbb{R}} K(u) f(x-hu)\,d\mathcal{L}^1(u). 
\end{align*} [/guided] [/step] [step:Expand the kernel average to obtain the pointwise bias] Let $U_x\subset\mathbb{R}$ be an open interval containing $x$ on which $f$ has two continuous derivatives. Choose $\delta>0$ such that $(x-\delta,x+\delta)\subset U_x$. Define the remainder function $\rho: (-\delta,\delta) \to \mathbb{R}$ as follows. For $t\ne0$, set \begin{align*} \rho(t)=\frac{f(x+t)-f(x)-t f'(x)-\frac{t^2}{2}f''(x)}{t^2}. \end{align*} Set $\rho(0)=0$. Since $f$ is twice differentiable at $x$, Taylor's theorem with Peano remainder gives $\rho(t)\to 0$ as $t\to 0$. For $|hu|<\delta$, \begin{align*} f(x-hu) = f(x)-hu f'(x)+\frac{h^2u^2}{2}f''(x)+h^2u^2\rho(-hu). \end{align*} Integrating this identity over $\{|u|<\delta/h\}$ against $K(u)\,d\mathcal{L}^1(u)$ gives the local contribution. The constant term over the truncated region is \begin{align*} f(x)\int_{|u|<\delta/h}K(u)\,d\mathcal{L}^1(u), \end{align*} which, since $\int_{\mathbb{R}}K(u)\,d\mathcal{L}^1(u)=1$, equals $f(x)$ up to the tail $f(x)\int_{|u|\ge\delta/h}K(u)\,d\mathcal{L}^1(u)$. This constant-tail term satisfies \begin{align*} \left|f(x)\int_{|u|\ge\delta/h}K(u)\,d\mathcal{L}^1(u)\right| \le |f(x)|\int_{|u|\ge\delta/h}|K(u)|\,d\mathcal{L}^1(u) \le \frac{|f(x)|h^2}{\delta^2}\int_{|u|\ge\delta/h}u^2|K(u)|\,d\mathcal{L}^1(u) = o(h^2), \end{align*} where the second inequality uses $1\le h^2u^2/\delta^2$ on $\{|u|\ge\delta/h\}$ and the final step uses integrability of $u^2|K(u)|$. The function $u\mapsto uK(u)$ is integrable on the bounded interval $\{|u|<\delta/h\}$ because $K$ is Lebesgue-integrable and $|u|\le \delta/h$ there. The first-order term $-hf'(x)\int_{|u|<\delta/h}uK(u)\,d\mathcal{L}^1(u)$ therefore exists, and it vanishes because the interval is symmetric about $0$ and symmetry of $K$ makes $uK(u)$ odd. The second-order term over the truncated region equals \begin{align*} \frac{h^2 f''(x)}{2}\int_{|u|<\delta/h}u^2K(u)\,d\mathcal{L}^1(u), \end{align*} which differs from \begin{align*} \frac{h^2 f''(x)}{2}\int_{\mathbb{R}}u^2K(u)\,d\mathcal{L}^1(u) = \frac{h^2\mu_2(K)}{2}f''(x) \end{align*} by a tail error bounded by \begin{align*} \frac{h^2|f''(x)|}{2}\int_{|u|\ge \delta/h}u^2|K(u)|\,d\mathcal{L}^1(u) = o(h^2).
\end{align*} The Taylor remainder satisfies \begin{align*} \left|h^2\int_{|u|<\delta/h}u^2K(u)\rho(-hu)\,d\mathcal{L}^1(u)\right| \le h^2\int_{\mathbb{R}}u^2|K(u)|\,|\rho(-hu)|\mathbb{1}_{\{|u|<\delta/h\}}\,d\mathcal{L}^1(u). \end{align*} The function $\rho$ is continuous on $(-\delta,\delta)$: at $t\ne0$ because $f$ is continuous on $U_x$, and at $t=0$ because $\rho(t)\to0=\rho(0)$. Hence, after decreasing $\delta$ if necessary so that $\rho$ is continuous on the compact interval $[-\delta,\delta]$, there is a finite constant \begin{align*} A_\rho := \sup_{|t|<\delta}|\rho(t)| < \infty. \end{align*} After dividing the right-hand side by $h^2$, the integrand $u^2|K(u)|\,|\rho(-hu)|\mathbb{1}_{\{|u|<\delta/h\}}$ converges pointwise to $0$ as $h\to0$, because $\rho(-hu)\to0$ for each fixed $u$, and it is dominated by $A_\rho u^2|K(u)|$, which is integrable. Hence the [Dominated Convergence Theorem](/theorems/4) gives that this term is $o(h^2)$. It remains to control the part of the expectation over $\{|u|\ge \delta/h\}$. Since $f$ is bounded, let \begin{align*} M := \sup_{y\in\mathbb{R}} f(y) < \infty. \end{align*} Then \begin{align*} \left|\int_{|u|\ge \delta/h}K(u)f(x-hu)\,d\mathcal{L}^1(u)\right| \le M\int_{|u|\ge \delta/h}|K(u)|\,d\mathcal{L}^1(u) \le \frac{Mh^2}{\delta^2}\int_{|u|\ge \delta/h}u^2|K(u)|\,d\mathcal{L}^1(u) = o(h^2). \end{align*} Combining the local expansion and the tail estimates yields \begin{align*} \mathbb{E}[\hat f_h(x)]-f(x) = \frac{h^2\mu_2(K)}{2}f''(x)+o(h^2). \end{align*} [guided] The bias calculation is a local Taylor expansion plus a tail estimate. Let $U_x\subset\mathbb{R}$ be an open interval containing $x$ on which $f$ has two continuous derivatives. Choose $\delta>0$ such that $(x-\delta,x+\delta)\subset U_x$. Since $f''$ is continuous at $x$, [Taylor's theorem](/theorems/827) with Peano remainder gives a function $\rho: (-\delta,\delta)\to\mathbb{R}$ such that $\rho(t)\to0$ as $t\to0$ and, whenever $|t|<\delta$, \begin{align*} f(x+t)=f(x)+t f'(x)+\frac{t^2}{2}f''(x)+t^2\rho(t). \end{align*} Taking $t=-hu$ gives, for $|hu|<\delta$, \begin{align*} f(x-hu)=f(x)-hu f'(x)+\frac{h^2u^2}{2}f''(x)+h^2u^2\rho(-hu).
\end{align*} We integrate this identity only over $\{|u|<\delta/h\}$ because precisely on that set the point $x-hu$ remains in the neighbourhood where the Taylor expansion is valid. The constant term contributes $f(x)\int_{|u|<\delta/h}K(u)\,d\mathcal{L}^1(u)$. Since $\int_{\mathbb{R}}K(u)\,d\mathcal{L}^1(u)=1$, replacing the truncated integral by the full integral introduces only the tail $f(x)\int_{|u|\ge\delta/h}K(u)\,d\mathcal{L}^1(u)$. This tail is small at the needed scale because \begin{align*} \left|f(x)\int_{|u|\ge\delta/h}K(u)\,d\mathcal{L}^1(u)\right| \le |f(x)|\int_{|u|\ge\delta/h}|K(u)|\,d\mathcal{L}^1(u) \le \frac{|f(x)|h^2}{\delta^2}\int_{|u|\ge\delta/h}u^2|K(u)|\,d\mathcal{L}^1(u) = o(h^2). \end{align*} The first-order term is \begin{align*} -h f'(x)\int_{|u|<\delta/h}uK(u)\,d\mathcal{L}^1(u). \end{align*} The function $u\mapsto uK(u)$ is integrable on $\{|u|<\delta/h\}$ because $K$ is Lebesgue-integrable and $|u|\le\delta/h$ on this bounded interval. The interval $\{|u|<\delta/h\}$ is symmetric about $0$, and symmetry of $K$ makes $uK(u)$ odd, so this integral is $0$. The second-order term is \begin{align*} \frac{h^2 f''(x)}{2}\int_{|u|<\delta/h}u^2K(u)\,d\mathcal{L}^1(u). \end{align*} Extending this truncated integral to $\mathbb{R}$ gives the main term \begin{align*} \frac{h^2 f''(x)}{2}\int_{\mathbb{R}}u^2K(u)\,d\mathcal{L}^1(u)=\frac{h^2\mu_2(K)}{2}f''(x), \end{align*} and the discarded second-moment tail is $o(h^2)$ because $u^2|K(u)|$ is integrable. For the Taylor remainder, we estimate \begin{align*} \left|h^2\int_{|u|<\delta/h}u^2K(u)\rho(-hu)\,d\mathcal{L}^1(u)\right| \le h^2\int_{\mathbb{R}}u^2|K(u)|\,|\rho(-hu)|\mathbb{1}_{\{|u|<\delta/h\}}\,d\mathcal{L}^1(u). \end{align*} Because $\rho(t)\to0$ as $t\to0$ and $\rho(0)=0$, after decreasing $\delta$ if necessary the constant \begin{align*} A_\rho := \sup_{|t|<\delta}|\rho(t)| \end{align*} is finite. 
After dividing the remainder bound by $h^2$, the integrand $u^2|K(u)|\,|\rho(-hu)|\mathbb{1}_{\{|u|<\delta/h\}}$ converges pointwise to $0$ and is dominated by the integrable function $A_\rho u^2|K(u)|$. The [Dominated Convergence Theorem](/theorems/4) therefore gives an $o(h^2)$ remainder. It remains to control the part of the expectation with $|u|\ge\delta/h$. With $M:=\sup_{y\in\mathbb{R}}f(y)<\infty$, \begin{align*} \left|\int_{|u|\ge \delta/h}K(u)f(x-hu)\,d\mathcal{L}^1(u)\right| \le M\int_{|u|\ge \delta/h}|K(u)|\,d\mathcal{L}^1(u) \le \frac{Mh^2}{\delta^2}\int_{|u|\ge \delta/h}u^2|K(u)|\,d\mathcal{L}^1(u)=o(h^2). \end{align*} Combining the Taylor expansion, the vanishing first-order term, and the tail estimates gives \begin{align*} \mathbb{E}[\hat f_h(x)]-f(x)=\frac{h^2\mu_2(K)}{2}f''(x)+o(h^2). \end{align*} [/guided] [/step] [step:Compute the leading second moment of one kernel summand] Using the density of $X_1$, \begin{align*} \mathbb{E}[Y_{h,1}^2] = \int_{\mathbb{R}}\frac{1}{h^2}K\left(\frac{x-y}{h}\right)^2 f(y)\,d\mathcal{L}^1(y). \end{align*} With the same substitution $u=(x-y)/h$, so $y=x-hu$ and $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, this becomes \begin{align*} \mathbb{E}[Y_{h,1}^2] = \frac{1}{h}\int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u). \end{align*} Since $f$ is continuous at $x$, $f(x-hu)\to f(x)$ as $h\to0$ for each fixed $u\in\mathbb{R}$. Since $0\le f\le M$, the integrands are dominated by $MK(u)^2$, which is integrable because $R(K)<\infty$; the [Dominated Convergence Theorem](/theorems/4) therefore gives \begin{align*} \int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u) \to f(x)\int_{\mathbb{R}}K(u)^2\,d\mathcal{L}^1(u) = f(x)R(K). \end{align*} Therefore \begin{align*} \mathbb{E}[Y_{h,1}^2] = \frac{f(x)R(K)}{h}+o\left(\frac{1}{h}\right). \end{align*} [guided] The variance of the average will be controlled by the second moment of one summand, so we compute that moment explicitly.
Since \begin{align*} Y_{h,1} = \frac{1}{h}K\left(\frac{x-X_1}{h}\right), \end{align*} and $X_1$ has density $f$, we have \begin{align*} \mathbb{E}[Y_{h,1}^2] = \int_{\mathbb{R}}\frac{1}{h^2}K\left(\frac{x-y}{h}\right)^2 f(y)\,d\mathcal{L}^1(y). \end{align*} Use the same change of variables as in the expectation calculation: $u=(x-y)/h$, equivalently $y=x-hu$. The measure transforms as $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, and the full real line maps onto the full real line. Thus \begin{align*} \mathbb{E}[Y_{h,1}^2] = \frac{1}{h}\int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u). \end{align*} The factor $1/h$ is the source of the leading variance order. It remains to identify the limit of the integral. For each fixed $u\in\mathbb{R}$, the condition $h\to0$ gives $x-hu\to x$, and continuity of $f$ at $x$ gives \begin{align*} f(x-hu)\to f(x). \end{align*} To pass the limit through the integral, use the [Dominated Convergence Theorem](/theorems/4). The domination is valid because $f$ is bounded: with \begin{align*} M := \sup_{y\in\mathbb{R}} f(y) < \infty, \end{align*} we have \begin{align*} |K(u)^2 f(x-hu)| \le M K(u)^2. \end{align*} The function $u\mapsto M K(u)^2$ is integrable because $R(K)<\infty$. Hence \begin{align*} \int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u) \to f(x)\int_{\mathbb{R}}K(u)^2\,d\mathcal{L}^1(u) = f(x)R(K). \end{align*} Multiplying by $1/h$ gives \begin{align*} \mathbb{E}[Y_{h,1}^2] = \frac{f(x)R(K)}{h}+o\left(\frac{1}{h}\right). \end{align*} [/guided] [/step] [step:Subtract the squared mean and divide by $n$] From the bias expansion, \begin{align*} \mathbb{E}[Y_{h,1}] = f(x)+O(h^2), \end{align*} so \begin{align*} (\mathbb{E}[Y_{h,1}])^2 = O(1). \end{align*} Therefore \begin{align*} \operatorname{Var}(Y_{h,1}) = \mathbb{E}[Y_{h,1}^2]-(\mathbb{E}[Y_{h,1}])^2 = \frac{f(x)R(K)}{h}+o\left(\frac{1}{h}\right), \end{align*} because the bounded term $O(1)$ is $o(1/h)$ as $h\to0$. 
The random variables $Y_{h,1},\dots,Y_{h,n}$ are independent because $X_1,\dots,X_n$ are independent and each $Y_{h,i}$ is a measurable function of $X_i$. Hence \begin{align*} \operatorname{Var}(\hat f_h(x)) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n}Y_{h,i}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(Y_{h,i}) = \frac{1}{n}\operatorname{Var}(Y_{h,1}). \end{align*} Substituting the single-summand variance gives \begin{align*} \operatorname{Var}(\hat f_h(x)) = \frac{f(x)R(K)}{nh}+o\left(\frac{1}{nh}\right). \end{align*} Together with the bias expansion above, this proves both asserted asymptotic formulas. [guided] We now assemble the variance from the one-summand calculation. The bias expansion already proved that \begin{align*} \mathbb{E}[Y_{h,1}]=f(x)+O(h^2), \end{align*} so the squared mean satisfies \begin{align*} (\mathbb{E}[Y_{h,1}])^2=O(1). \end{align*} The second-moment computation gave \begin{align*} \mathbb{E}[Y_{h,1}^2]=\frac{f(x)R(K)}{h}+o\left(\frac{1}{h}\right). \end{align*} Since $h\to0$, every bounded quantity is $o(1/h)$. Therefore subtracting the squared mean does not change the leading order: \begin{align*} \operatorname{Var}(Y_{h,1})=\mathbb{E}[Y_{h,1}^2]-(\mathbb{E}[Y_{h,1}])^2=\frac{f(x)R(K)}{h}+o\left(\frac{1}{h}\right). \end{align*} Because $X_1,\dots,X_n$ are independent and each $Y_{h,i}$ is a measurable function of $X_i$, the random variables $Y_{h,1},\dots,Y_{h,n}$ are independent. Variance is additive for independent random variables, so \begin{align*} \operatorname{Var}(\hat f_h(x))=\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n}Y_{h,i}\right)=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(Y_{h,i})=\frac{1}{n}\operatorname{Var}(Y_{h,1}). \end{align*} Substituting the one-summand asymptotic gives \begin{align*} \operatorname{Var}(\hat f_h(x))=\frac{f(x)R(K)}{nh}+o\left(\frac{1}{nh}\right). \end{align*} Together with the bias formula, this proves both asserted asymptotic statements. [/guided] [/step]
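As a numerical sanity check of the two asymptotic formulas, here is a minimal Python sketch under illustrative assumptions that are not part of the theorem: the density is the standard normal $f=\varphi$ and the kernel is the Gaussian kernel $K=\varphi$, for which $\mu_2(K)=1$ and $R(K)=1/(2\sqrt{\pi})$. For this pair the two integrals from the proof have closed forms, $\int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u)=\varphi_{\sqrt{1+h^2}}(x)$ and $h^{-1}\int_{\mathbb{R}} K(u)^2 f(x-hu)\,d\mathcal{L}^1(u)=\varphi_{\sqrt{1+h^2/2}}(x)/(2\sqrt{\pi}\,h)$, where $\varphi_\sigma$ denotes the $N(0,\sigma^2)$ density, so the quadrature can be checked against them. The helper names (`mean_one_summand`, `bias_check`, `variance_check`, and so on) are hypothetical and simply mirror $\mathbb{E}[Y_{h,1}]$, $\mathbb{E}[Y_{h,1}^2]$, and the exact identity $\operatorname{Var}(\hat f_h(x))=\operatorname{Var}(Y_{h,1})/n$.

```python
import math


def phi(t, sigma=1.0):
    """Density of N(0, sigma^2) at t."""
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(g, a, b, m=4000):
    """Composite Simpson rule for g on [a, b] with m subintervals (m made even)."""
    if m % 2:
        m += 1
    step = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * g(a + k * step)
    return total * step / 3.0


# Illustrative assumptions (not from the theorem): f and K are both the standard
# normal density, so K is symmetric, integrates to 1, and f is bounded and C^2.
K = phi
f = phi
MU2 = 1.0                               # mu_2(K) = int u^2 K(u) du
R_K = 1.0 / (2.0 * math.sqrt(math.pi))  # R(K) = int K(u)^2 du
LIMIT = 12.0                            # truncation of the u-integrals; phi(12) < 1e-31


def f_second(x):
    """f''(x) for the standard normal density."""
    return (x * x - 1.0) * phi(x)


def mean_one_summand(x, h):
    """E[Y_{h,1}] = int K(u) f(x - h u) du, the kernel average after y = x - h u."""
    return integrate(lambda u: K(u) * f(x - h * u), -LIMIT, LIMIT)


def second_moment_one_summand(x, h):
    """E[Y_{h,1}^2] = (1/h) int K(u)^2 f(x - h u) du."""
    return integrate(lambda u: K(u) ** 2 * f(x - h * u), -LIMIT, LIMIT) / h


def bias_check(x, h):
    """Return (exact bias, leading term h^2 mu_2(K) f''(x) / 2)."""
    return mean_one_summand(x, h) - f(x), 0.5 * h * h * MU2 * f_second(x)


def variance_check(x, h, n):
    """Return (exact Var(f_hat_h(x)), leading term f(x) R(K) / (n h))."""
    m1 = mean_one_summand(x, h)
    m2 = second_moment_one_summand(x, h)
    # Independence of the summands gives Var(f_hat_h(x)) = Var(Y_{h,1}) / n.
    return (m2 - m1 * m1) / n, f(x) * R_K / (n * h)


if __name__ == "__main__":
    x = 0.5  # avoid x = +-1, where f''(x) = 0 and the bias ratio is undefined
    for h in (0.2, 0.1, 0.05):
        bias, lead = bias_check(x, h)
        print(f"h={h:<6} bias / leading term     = {bias / lead:.5f}")
    for h in (0.1, 0.01, 0.001):
        var, lead = variance_check(x, h, n=100)
        print(f"h={h:<6} variance / leading term = {var / lead:.5f}")
```

Both printed ratios should approach $1$ as $h$ decreases. The bias ratio has relative error of order $h^2$, while the variance ratio converges more slowly, with relative error of order $h$, because the subtracted squared mean is $O(1)$ against a leading term of order $1/h$.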
