Pointwise Bias Expansion for Kernel Density Estimators (Theorem # 6314)

Proof

[proofplan] We first compute the expectation of the kernel density estimator and rewrite it as an integral against $K(u)f(x-hu)$. Since $K$ has compact support, only values of $f$ in a bounded shrinking neighbourhood of $x$ enter the integral. We then apply [Taylor's theorem](/theorems/827) to $f(x-hu)$ through order $s$, use the kernel moment conditions to cancel all terms except the zeroth and $s$th terms, and control the Taylor remainder uniformly on the support of $K$ by continuity of $f^{(s)}$ at $x$. [/proofplan] [step:Rewrite the expectation as a localized convolution integral] Let $R>0$ be such that $\operatorname{supp} K \subset [-R,R]$. By the definition of a kernel of order $s$, $K\in L^1(\mathbb R)$, its zeroth moment equals $1$, and its moments of orders $1,\dots,s-1$ vanish. For each $h>0$, define the kernel density estimator $\hat f_{n,h}:\mathbb R\to\mathbb R$ by \begin{align*} \hat f_{n,h}(t) = \frac{1}{nh}\sum_{i=1}^n K\!\left(\frac{t-X_i}{h}\right), \qquad t\in\mathbb R. \end{align*} Fix $h>0$ small enough that $x+[-hR,hR]$ is contained in a neighbourhood on which $f$ is continuous. Then $f$ is bounded on the compact interval $x+[-hR,hR]$. Moreover, the map $\kappa_h:\mathbb R\to\mathbb R$ defined by $\kappa_h(y)=K((x-y)/h)$ belongs to $L^1(\mathbb R)$, because the affine change of variables $u=(x-y)/h$ gives \begin{align*} \int_{\mathbb R}|\kappa_h(y)|\,d\mathcal L^1(y)=h\int_{\mathbb R}|K(u)|\,d\mathcal L^1(u)<\infty. \end{align*} Thus $K((x-\cdot)/h)f$ is integrable with respect to $\mathcal L^1$ because its support is contained in $x+[-hR,hR]$ and $f$ is bounded there. Since the random variables $X_1,\dots,X_n$ are identically distributed with density $f$, linearity of expectation gives \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \frac{1}{nh}\sum_{i=1}^n \mathbb E\!\left[K\!\left(\frac{x-X_i}{h}\right)\right]. 
\end{align*} Since all summands have the same expectation and each $X_i$ has density $f$ with respect to $\mathcal{L}^1$, \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \frac{1}{h}\int_{\mathbb{R}} K\!\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y). \end{align*} For $h>0$, use the [change of variables formula](/theorems/22) with $u=(x-y)/h$, equivalently $y=x-hu$. Under this affine substitution, $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, and the domain $\mathbb{R}$ is mapped onto $\mathbb{R}$. Hence \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u). \end{align*} [guided] Let $R>0$ be such that $\operatorname{supp} K \subset [-R,R]$. The expectation reduces to a one-dimensional integral because each $X_i$ has density $f$ with respect to $\mathcal{L}^1$, but first we state exactly which estimator is being averaged. For each $h>0$, define the map $\hat f_{n,h}:\mathbb R\to\mathbb R$ by \begin{align*} \hat f_{n,h}(t) = \frac{1}{nh}\sum_{i=1}^n K\!\left(\frac{t-X_i}{h}\right), \qquad t\in\mathbb R. \end{align*} Now we check that the integrand appearing in the expectation is integrable. By the definition of a kernel of order $s$, $K\in L^1(\mathbb R)$, its zeroth moment equals $1$, and its moments of orders $1,\dots,s-1$ vanish. For $h>0$ small enough that $x+[-hR,hR]$ is contained in a neighbourhood on which $f$ is continuous, the function $f$ is bounded on the compact interval $x+[-hR,hR]$. Also $K((x-\cdot)/h)$ is supported in $x+[-hR,hR]$. To see that the rescaled kernel is integrable, define $\kappa_h:\mathbb R\to\mathbb R$ by $\kappa_h(y)=K((x-y)/h)$. The affine substitution $u=(x-y)/h$ gives \begin{align*} \int_{\mathbb R}|\kappa_h(y)|\,d\mathcal L^1(y)=h\int_{\mathbb R}|K(u)|\,d\mathcal L^1(u)<\infty. 
\end{align*} Thus $K((x-\cdot)/h)f$ is integrable with respect to $\mathcal L^1$, and for each index $i \in \{1,\dots,n\}$, \begin{align*} \mathbb E\!\left[K\!\left(\frac{x-X_i}{h}\right)\right] = \int_{\mathbb{R}} K\!\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y). \end{align*} The variables are identically distributed, so all $n$ summands have the same expectation. Therefore \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \frac{1}{nh}\sum_{i=1}^n \int_{\mathbb{R}} K\!\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y). \end{align*} Because the $n$ summands are identical, this becomes \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \frac{1}{h}\int_{\mathbb{R}} K\!\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y). \end{align*} Now apply the [change of variables formula](/theorems/22) with $u=(x-y)/h$, so $y=x-hu$. Because $h>0$, the one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) transforms by \begin{align*} d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u). \end{align*} The affine map $u \mapsto x-hu$ maps $\mathbb{R}$ onto $\mathbb{R}$, so the integration domain remains the whole real line. Substituting gives \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \frac{1}{h}\int_{\mathbb{R}} K(u)f(x-hu)\,h\,d\mathcal{L}^1(u). \end{align*} Cancelling the scalar factor $h$ yields \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u). \end{align*} This is the form in which the kernel moments can act directly on the Taylor expansion of $f$ at $x$. [/guided] [/step] [step:Apply Taylor's theorem uniformly on the support of the kernel] Let $I\subset\mathbb R$ be an open interval containing $x$ on which $f$ has $s$ continuous derivatives. Choose $\delta>0$ such that $(x-\delta,x+\delta)\subset I$. For $0<h<\delta/R$, and for every $u \in \operatorname{supp}K$, the point $x-hu$ lies in $I$. 
Thus the one-dimensional Taylor theorem with integral remainder, applied with order parameter $s-1$, applies to the restriction $f|_I:I\to\mathbb R$ on the line segment from $x$ to $x-hu$. For $u\in\operatorname{supp}K$, Taylor's theorem gives \begin{align*} f(x-hu) = \sum_{k=0}^{s-1} \frac{(-hu)^k}{k!}f^{(k)}(x) + \frac{(-hu)^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1} f^{(s)}(x-\theta hu)\,d\mathcal L^1(\theta). \end{align*} Adding and subtracting the term $\frac{(-hu)^s}{s!}f^{(s)}(x)$, and using \begin{align*} \int_0^1 (1-\theta)^{s-1}\,d\mathcal L^1(\theta)=\frac{1}{s}, \end{align*} we obtain \begin{align*} f(x-hu) = \sum_{k=0}^s \frac{(-hu)^k}{k!}f^{(k)}(x) + \rho_h(u), \end{align*} where the remainder function $\rho_h:\mathbb R \to \mathbb{R}$ is defined on $\operatorname{supp}K$ by \begin{align*} \rho_h(u)=\frac{(-hu)^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1}\left(f^{(s)}(x-\theta hu)-f^{(s)}(x)\right)\,d\mathcal L^1(\theta), \end{align*} and is defined by $\rho_h(u)=0$ for $u\notin\operatorname{supp}K$. On $\operatorname{supp}K$, it satisfies \begin{align*} |\rho_h(u)| \leq \frac{h^s |u|^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1} \left|f^{(s)}(x-\theta hu)-f^{(s)}(x)\right|\,d\mathcal{L}^1(\theta). \end{align*} Define the modulus quantity \begin{align*} \omega(h) := \sup\left\{\left|f^{(s)}(z)-f^{(s)}(x)\right| : |z-x|\leq hR\right\}. \end{align*} Since $f^{(s)}$ is continuous at $x$, $\omega(h)\to 0$ as $h\to0^+$. Thus \begin{align*} |\rho_h(u)| \leq \frac{h^s |u|^s}{s!}\omega(h) \end{align*} for every $u \in \operatorname{supp}K$. [guided] Let $I\subset\mathbb R$ be an open interval containing $x$ on which $f$ has $s$ continuous derivatives. The reason for introducing $I$ is that Taylor's theorem must be applied on an interval containing both the expansion point $x$ and the evaluation point $x-hu$. Choose $\delta>0$ such that $(x-\delta,x+\delta)\subset I$. 
Since $R$ was chosen positive and $\operatorname{supp}K\subset[-R,R]$, if $0<h<\delta/R$ and $u\in\operatorname{supp}K$, then $|hu|\le hR<\delta$, so $x-hu\in I$. The one-dimensional Taylor theorem with integral remainder, with order parameter $s-1$, therefore applies to the map $f|_I:I\to\mathbb R$: $f$ has $s$ continuous derivatives on $I$, and the segment from $x$ to $x-hu$ is contained in $I$. In this version of Taylor's theorem the polynomial part stops at order $s-1$, and the $s$th derivative appears only inside the integral remainder. For each $u\in\operatorname{supp}K$, it gives \begin{align*} f(x-hu) = \sum_{k=0}^{s-1} \frac{(-hu)^k}{k!}f^{(k)}(x) + \frac{(-hu)^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1} f^{(s)}(x-\theta hu)\,d\mathcal L^1(\theta). \end{align*} We add and subtract the $s$th Taylor term $\frac{(-hu)^s}{s!}f^{(s)}(x)$, which equals $\frac{(-hu)^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1}f^{(s)}(x)\,d\mathcal L^1(\theta)$ because $\int_0^1 (1-\theta)^{s-1}\,d\mathcal L^1(\theta)=1/s$, to write \begin{align*} f(x-hu) = \sum_{k=0}^s \frac{(-hu)^k}{k!}f^{(k)}(x) + \rho_h(u), \end{align*} where the remainder function $\rho_h:\mathbb R\to\mathbb R$ is defined on $\operatorname{supp}K$ as the difference between the integral remainder of order $s-1$ and the added $s$th term, \begin{align*} \rho_h(u)=\frac{(-hu)^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1}\left(f^{(s)}(x-\theta hu)-f^{(s)}(x)\right)\,d\mathcal L^1(\theta), \end{align*} and by $\rho_h(u)=0$ for $u\notin\operatorname{supp}K$. On $\operatorname{supp}K$, it satisfies \begin{align*} |\rho_h(u)| \leq \frac{h^s |u|^s}{(s-1)!}\int_0^1 (1-\theta)^{s-1} \left|f^{(s)}(x-\theta hu)-f^{(s)}(x)\right|\,d\mathcal{L}^1(\theta). \end{align*} Define \begin{align*} \omega(h) := \sup\left\{\left|f^{(s)}(z)-f^{(s)}(x)\right| : |z-x|\leq hR\right\}. \end{align*} For $u\in\operatorname{supp}K$ and $\theta\in[0,1]$, the point $z=x-\theta hu$ satisfies $|z-x|\le hR$, so the difference $\left|f^{(s)}(x-\theta hu)-f^{(s)}(x)\right|$ inside the integral is bounded above by $\omega(h)$. Therefore \begin{align*} |\rho_h(u)| \leq \frac{h^s |u|^s\omega(h)}{(s-1)!}\int_0^1 (1-\theta)^{s-1}\,d\mathcal{L}^1(\theta). \end{align*} Since this integral equals $1/s$, as noted above, we obtain \begin{align*} |\rho_h(u)| \leq \frac{h^s |u|^s}{s!}\omega(h). \end{align*} Finally, continuity of $f^{(s)}$ at $x$ gives $\omega(h)\to0$ as $h\to0^+$, so the bound on $\rho_h(u)/h^s$ tends to zero uniformly in $u\in\operatorname{supp}K$, which is the uniform smallness needed after integration against $K$. 
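As an illustration only, suppose in addition that $f^{(s)}$ is Lipschitz on $I$ with constant $L$. This hypothesis is an assumption made for the example and is not part of the theorem. Then the modulus $\omega(h)$ can be made explicit: for $0<h<\delta/R$, every $z$ with $|z-x|\le hR$ lies in $I$, so \begin{align*} \omega(h)\le LhR, \qquad |\rho_h(u)|\le \frac{LR\,h^{s+1}|u|^s}{s!}, \qquad u\in\operatorname{supp}K. \end{align*} In that special case the remainder is $O(h^{s+1})$ uniformly on $\operatorname{supp}K$, which is stronger than the $o(h^s)$ rate that continuity of $f^{(s)}$ at $x$ alone provides. 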
[/guided] [/step] [step:Use the kernel moment conditions to isolate the leading bias term] For each $k\in\{0,\dots,s\}$, the function $u\mapsto u^kK(u)$ is integrable with respect to $\mathcal L^1$ because $K$ is compactly supported and $K\in L^1(\mathbb R)$. The remainder estimate gives $|K(u)\rho_h(u)|\leq h^s\omega(h)|u|^s|K(u)|/s!$ on $\operatorname{supp}K$, and the right-hand side is integrable. Hence linearity of the [Lebesgue integral](/page/Lebesgue%20Integral) allows the finite Taylor sum and the remainder term to be integrated separately. Substituting the Taylor expansion into the expectation formula gives \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \sum_{k=0}^s \frac{(-h)^k}{k!}f^{(k)}(x) \int_{\mathbb{R}} u^kK(u)\,d\mathcal{L}^1(u) + \int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u). \end{align*} The zeroth moment equals $1$, and the moments of orders $1,\dots,s-1$ vanish. Therefore \begin{align*} \mathbb E[\hat f_{n,h}(x)] - f(x) = \frac{(-1)^s h^s}{s!}f^{(s)}(x)\mu_s(K) + \int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u). \end{align*} [guided] We now use exactly the moment conditions encoded in the phrase kernel of order $s$. First, for each $k\in\{0,\dots,s\}$, the function $u\mapsto u^kK(u)$ is integrable with respect to $\mathcal L^1$: the factor $|u|^k$ is bounded on the compact set $[-R,R]$, and $K\in L^1(\mathbb R)$. The Taylor expansion from the previous step is valid for every $u\in\operatorname{supp}K$. Multiplying by $K(u)$ and integrating is legitimate because the finite Taylor terms are integrable, and the remainder estimate gives \begin{align*} |K(u)\rho_h(u)|\leq \frac{h^s\omega(h)}{s!}|u|^s|K(u)| \end{align*} on $\operatorname{supp}K$, with the right-hand side integrable with respect to $\mathcal L^1$. Hence linearity of the Lebesgue integral gives \begin{align*} \mathbb E[\hat f_{n,h}(x)] = \sum_{k=0}^s \frac{(-h)^k}{k!}f^{(k)}(x) \int_{\mathbb{R}} u^kK(u)\,d\mathcal{L}^1(u) + \int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u). 
\end{align*} The normalization condition gives \begin{align*} \int_{\mathbb{R}}K(u)\,d\mathcal L^1(u)=1, \end{align*} and the order-$s$ moment conditions give \begin{align*} \int_{\mathbb{R}}u^kK(u)\,d\mathcal L^1(u)=0,\qquad 1\leq k\leq s-1. \end{align*} Thus all Taylor terms except the constant term and the $s$th term disappear after integration. Using the definition \begin{align*} \mu_s(K)=\int_{\mathbb R}u^sK(u)\,d\mathcal L^1(u), \end{align*} we obtain \begin{align*} \mathbb E[\hat f_{n,h}(x)] - f(x) = \frac{(-1)^s h^s}{s!}f^{(s)}(x)\mu_s(K) + \int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u). \end{align*} [/guided] [/step] [step:Show that the integrated Taylor remainder is $o(h^s)$] Because $\operatorname{supp}K \subset [-R,R]$, the function $u \mapsto |u|^s|K(u)|$ is integrable with respect to $\mathcal{L}^1$. Using the triangle inequality for the Lebesgue integral, \begin{align*} \left|\int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u)\right| \leq \int_{\mathbb{R}} |K(u)|\,|\rho_h(u)|\,d\mathcal{L}^1(u). \end{align*} Using the remainder bound from the Taylor step on $\operatorname{supp}K$, and noting that $K(u)=0$ for $u\notin\operatorname{supp}K$ up to $\mathcal{L}^1$-null changes of representative, gives \begin{align*} \left|\int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u)\right| \leq \frac{h^s\omega(h)}{s!} \int_{\mathbb{R}} |u|^s|K(u)|\,d\mathcal{L}^1(u). \end{align*} The integral on the right-hand side is finite and independent of $h$, while $\omega(h)\to0$. Hence \begin{align*} \int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u)=o(h^s). \end{align*} Combining this estimate with the previous identity yields \begin{align*} \mathbb E[\hat f_{n,h}(x)] - f(x) = \frac{(-1)^s h^s}{s!} f^{(s)}(x)\mu_s(K) + o(h^s), \end{align*} which is the asserted pointwise bias expansion. [guided] It remains to prove that the integrated Taylor remainder is smaller than $h^s$. 
Because $\operatorname{supp}K \subset [-R,R]$ and $K\in L^1(\mathbb R)$, the function $u\mapsto |u|^s|K(u)|$ is integrable with respect to $\mathcal L^1$. Apply the triangle inequality for the Lebesgue integral to the remainder term: \begin{align*} \left|\int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u)\right| \leq \int_{\mathbb{R}} |K(u)|\,|\rho_h(u)|\,d\mathcal{L}^1(u). \end{align*} The pointwise Taylor remainder bound applies on $\operatorname{supp}K$. Outside $\operatorname{supp}K$, the value of $K$ is zero up to the chosen $\mathcal L^1$-representative, so the same integral bound over $\mathbb R$ gives \begin{align*} \left|\int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u)\right| \leq \frac{h^s\omega(h)}{s!} \int_{\mathbb{R}} |u|^s|K(u)|\,d\mathcal{L}^1(u). \end{align*} The integral \begin{align*} \int_{\mathbb{R}} |u|^s|K(u)|\,d\mathcal{L}^1(u) \end{align*} is finite and independent of $h$, while continuity of $f^{(s)}$ at $x$ gives $\omega(h)\to0$ as $h\to0^+$. Therefore \begin{align*} \int_{\mathbb{R}} K(u)\rho_h(u)\,d\mathcal{L}^1(u)=o(h^s). \end{align*} Substituting this into the identity from the moment-cancellation step yields \begin{align*} \mathbb E[\hat f_{n,h}(x)] - f(x) = \frac{(-1)^s h^s}{s!} f^{(s)}(x)\mu_s(K) + o(h^s), \end{align*} which is the desired pointwise bias expansion. [/guided] [/step]
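
The expansion can also be checked numerically. The sketch below is illustrative and not part of the proof. It assumes the Epanechnikov kernel $K(u)=\tfrac34(1-u^2)$ on $[-1,1]$, which is a kernel of order $s=2$ with $R=1$ and $\mu_2(K)=1/5$, and it takes $f$ to be the standard normal density. Neither choice appears in the theorem, and all function names are hypothetical. The code evaluates $\mathbb E[\hat f_{n,h}(x)]=\int K(u)f(x-hu)\,du$ by Simpson quadrature and compares the bias with the leading term $\frac{(-1)^s h^s}{s!}f^{(s)}(x)\mu_s(K)$.

```python
import math


def epanechnikov(u):
    """Assumed example kernel of order 2, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def normal_pdf(y):
    """Assumed example density f: the standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_pdf_second_derivative(y):
    """f''(y) = (y^2 - 1) f(y) for the standard normal density."""
    return (y * y - 1.0) * normal_pdf(y)


def integrate(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0


def kernel_moment(k):
    """k-th moment of the kernel: integral of u^k K(u) over [-1, 1]."""
    return integrate(lambda u: u ** k * epanechnikov(u), -1.0, 1.0)


def expected_kde(x, h):
    """E[f_hat(x)] written as the integral of K(u) f(x - h u) over supp K."""
    return integrate(lambda u: epanechnikov(u) * normal_pdf(x - h * u), -1.0, 1.0)


def leading_bias(x, h, s=2):
    """Leading term (-1)^s h^s / s! * f^(s)(x) * mu_s(K), here with s = 2."""
    return ((-1) ** s * h ** s / math.factorial(s)
            * normal_pdf_second_derivative(x) * kernel_moment(s))


def scaled_remainder(x, h, s=2):
    """(bias - leading term) / h^s, which should tend to 0 as h -> 0+."""
    bias = expected_kde(x, h) - normal_pdf(x)
    return (bias - leading_bias(x, h, s)) / h ** s


x = 0.5
for h in (0.4, 0.2, 0.1, 0.05):
    bias = expected_kde(x, h) - normal_pdf(x)
    print(f"h={h:<5} bias={bias:+.3e} "
          f"leading={leading_bias(x, h):+.3e} "
          f"scaled remainder={scaled_remainder(x, h):+.3e}")
```

If the expansion holds for this example, the last column, which is the remainder divided by $h^2$, should shrink toward zero as $h$ decreases. The expectation is computed by quadrature rather than by simulation so that sampling noise does not mask the $o(h^s)$ term.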
