MISE Expansion for Kernel Density Estimators with an Order $s$ Kernel (Theorem #6318)


Proof

[proofplan] We decompose the mean integrated squared error into the integrated squared bias and the integrated variance. The bias is the difference between $f$ and its kernel average; the order $s$ moment conditions eliminate all lower Taylor terms, and translation continuity of $f^{(s)}$ makes the Taylor remainder small in $L^2(\mathbb{R})$. The variance is computed directly by integrating the pointwise variance of one summand, giving the leading term $R(K)/(nh)$ and a lower-order subtraction from the squared mean. The condition $nh_n\to\infty$ is used after the variance expansion to record that this integrated variance contribution vanishes along the stated bandwidth sequence. [/proofplan] [step:Decompose the MISE into integrated bias and integrated variance] Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which the i.i.d. real-valued random variables $X_1,\dots,X_n$ are defined. For each $x\in\mathbb{R}$ and each $i\in\{1,\dots,n\}$, define the real-valued [random variable](/page/Random%20Variable) $Y_{h,x,i}:\Omega\to\mathbb{R}$ by \begin{align*} Y_{h,x,i}(\omega)=\frac{1}{h}K\left(\frac{x-X_i(\omega)}{h}\right). \end{align*} Then \begin{align*} \hat f_{n,h}(x)=\frac{1}{n}\sum_{i=1}^n Y_{h,x,i}. \end{align*} Define the mean function $m_h:\mathbb{R}\to\mathbb{R}$ by \begin{align*} m_h(x)=\mathbb{E}[\hat f_{n,h}(x)]. \end{align*} Since the summands are independent and identically distributed, \begin{align*} \operatorname{Var}(\hat f_{n,h}(x))=\frac{1}{n}\operatorname{Var}(Y_{h,x,1}). \end{align*} We use the convention \begin{align*} \operatorname{MISE}(\hat f_{n,h},f) = \int_{\mathbb{R}}\mathbb{E}\!\left[\left|\hat f_{n,h}(x)-f(x)\right|^2\right]\,dx, \end{align*} which agrees with the expectation of the integrated squared error whenever the latter is finite, by Tonelli's theorem for the non-negative squared error. 
Using the identity $\mathbb{E}[(Z-a)^2]=\operatorname{Var}(Z)+(\mathbb{E}[Z]-a)^2$ with $Z=\hat f_{n,h}(x)$ and $a=f(x)$, and integrating over $\mathbb{R}$, we obtain \begin{align*} \operatorname{MISE}(\hat f_{n,h},f) = \int_{\mathbb{R}} |m_h(x)-f(x)|^2\,dx + \int_{\mathbb{R}} \operatorname{Var}(\hat f_{n,h}(x))\,dx. \end{align*} [/step] [step:Express the bias as a kernel average of translated functions] Because $X_1$ has density $f$ with respect to [Lebesgue measure](/page/Lebesgue%20Measure), for every $x\in\mathbb{R}$, \begin{align*} m_h(x) &= \frac{1}{h}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)f(y)\,dy. \end{align*} Apply the change of variables $u=(x-y)/h$, equivalently $y=x-hu$; this gives $dy=h\,du$ because $h>0$. Thus \begin{align*} m_h(x) = \int_{\mathbb{R}}K(u)f(x-hu)\,du. \end{align*} Since $\int_{\mathbb{R}}K(u)\,du=1$, define the bias function $b_h:\mathbb R\to\mathbb R$ by \begin{align*} b_h(x)=m_h(x)-f(x),\qquad x\in\mathbb{R}, \end{align*} and then $b_h$ satisfies \begin{align*} b_h(x) = \int_{\mathbb{R}}K(u)\bigl(f(x-hu)-f(x)\bigr)\,du. \end{align*} [/step] [step:Use the order $s$ moment conditions to isolate the leading bias term] For $0\le j\le s$, let $f^{(j)}$ denote the $j$th [weak derivative](/page/Weak%20Derivative) of $f$, with $f^{(0)}:=f$. For fixed $u\in\mathbb{R}$ and $0\le k\le s-1$, the map $t\mapsto f^{(k)}(\cdot-tu)$ is absolutely continuous as an $L^2(\mathbb{R})$-valued map on bounded intervals, with derivative $-u f^{(k+1)}(\cdot-tu)$ for almost every $t$. This follows by first checking the identity for smooth compactly supported functions and then passing to functions with weak derivatives through order $s$ in $L^2(\mathbb{R})$ by approximation, using strong continuity of translations on $L^2(\mathbb{R})$. 
Iterating this Banach-valued [fundamental theorem of calculus](/theorems/632) gives, in $L^2(\mathbb{R})$ for each fixed $u\in\mathbb{R}$, \begin{align*} f(\cdot-hu)-f = \sum_{j=1}^{s-1}\frac{(-hu)^j}{j!}f^{(j)} + \frac{(-hu)^s}{(s-1)!} \int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta. \end{align*} Here the integral in $\theta$ is a Bochner integral in $L^2(\mathbb{R})$. Since translations are isometries on $L^2(\mathbb{R})$ and $\int_0^1(1-\theta)^{s-1}\,d\theta=1/s$, the remainder term has $L^2$ norm at most $\frac{|hu|^s}{s!}\|f^{(s)}\|_{L^2(\mathbb{R})}$. Because $K$ is compactly supported and belongs to $L^2(\mathbb{R})$, the function $u\mapsto |u|^s|K(u)|$ belongs to $L^1(\mathbb{R})$. Thus the resulting $u$-integral is a well-defined Bochner integral in $L^2(\mathbb{R})$. After multiplying by $K(u)$ and integrating over $u$, the moment conditions \begin{align*} \int_{\mathbb{R}}u^jK(u)\,du=0, \qquad 1\le j\le s-1, \end{align*} remove all lower-order terms. Therefore \begin{align*} b_h = \frac{(-1)^sh^s}{(s-1)!} \int_{\mathbb{R}}u^sK(u) \int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta\,du. \end{align*} Write $\mu_s(K)=\int_{\mathbb{R}}u^sK(u)\,du$. Define the function $a:\mathbb{R}\to\mathbb{R}$ by \begin{align*} a(x)=\frac{(-1)^s\mu_s(K)}{s!}f^{(s)}(x), \end{align*} and define $r_h:\mathbb{R}\to\mathbb{R}$ by the identity $b_h=h^s(a+r_h)$. Explicitly, \begin{align*} r_h = \frac{(-1)^s}{(s-1)!} \int_{\mathbb{R}}u^sK(u) \int_0^1(1-\theta)^{s-1} \bigl(f^{(s)}(\cdot-\theta hu)-f^{(s)}\bigr) \,d\theta\,du. \end{align*} [guided] The bias is where the order of the kernel is used. We want to compare the translated function $f(\cdot-hu)$ to $f$ in a way that keeps the dependence on $u$ visible, because the kernel moments are integrals in the variable $u$. For each fixed $u\in\mathbb{R}$, the Sobolev Taylor formula for translations gives an expansion of $f(\cdot-hu)-f$ in $L^2(\mathbb{R})$. 
This formula follows by applying the fundamental theorem of calculus for Sobolev functions iteratively to the $L^2(\mathbb{R})$-valued translation path $t\mapsto f(\cdot-tu)$: \begin{align*} f(\cdot-hu)-f = \sum_{j=1}^{s-1}\frac{(-hu)^j}{j!}f^{(j)} + \frac{(-hu)^s}{(s-1)!} \int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta. \end{align*} The hypotheses needed here are exactly that $f$ has weak derivatives through order $s$ in $L^2(\mathbb{R})$. More explicitly, for $0\le k\le s-1$ the translation path $t\mapsto f^{(k)}(\cdot-tu)$ is absolutely continuous in $L^2(\mathbb{R})$ on bounded intervals, and its derivative is $-u f^{(k+1)}(\cdot-tu)$ for almost every $t$. This is obtained by approximation from smooth functions and the strong continuity of translations on $L^2(\mathbb{R})$. Iterating the Banach-valued fundamental theorem of calculus gives the displayed Taylor formula. The integral over $\theta$ is interpreted as an $L^2(\mathbb{R})$-valued integral. Because $K$ is compactly supported and belongs to $L^2(\mathbb{R})$, both $K$ and $u^sK(u)$ belong to $L^1(\mathbb{R})$; translations preserve the $L^2$ norm of $f^{(s)}$, so the Taylor remainder is integrable in $u$ as a Bochner integral. The displayed $L^2(\mathbb{R})$-valued Taylor identity may therefore be multiplied by $K(u)$ and integrated in $u$. Substitute this expansion into \begin{align*} b_h = \int_{\mathbb{R}}K(u)\bigl(f(\cdot-hu)-f\bigr)\,du. \end{align*} For $1\le j\le s-1$, the coefficient of $f^{(j)}$ is \begin{align*} \frac{(-h)^j}{j!}\int_{\mathbb{R}}u^jK(u)\,du, \end{align*} and this is zero by the order $s$ moment condition. This is the reason an order $s$ kernel has bias of size $h^s$ rather than size $h$. After the lower-order terms vanish, only the integral remainder remains: \begin{align*} b_h = \frac{(-1)^sh^s}{(s-1)!} \int_{\mathbb{R}}u^sK(u) \int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta\,du. 
\end{align*} To identify the leading term, add and subtract $f^{(s)}$ inside the inner integral. Since \begin{align*} \int_0^1(1-\theta)^{s-1}\,d\theta=\frac{1}{s}, \end{align*} the part containing $f^{(s)}$ equals \begin{align*} \frac{(-1)^sh^s}{(s-1)!} \left(\int_{\mathbb{R}}u^sK(u)\,du\right) \left(\frac{1}{s}\right)f^{(s)} = h^s\frac{(-1)^s\mu_s(K)}{s!}f^{(s)}. \end{align*} Thus, with $a:\mathbb{R}\to\mathbb{R}$ defined by \begin{align*} a(x)=\frac{(-1)^s\mu_s(K)}{s!}f^{(s)}(x), \end{align*} we may write $b_h=h^s(a+r_h)$, where \begin{align*} r_h = \frac{(-1)^s}{(s-1)!} \int_{\mathbb{R}}u^sK(u) \int_0^1(1-\theta)^{s-1} \bigl(f^{(s)}(\cdot-\theta hu)-f^{(s)}\bigr) \,d\theta\,du. \end{align*} This separates the deterministic leading bias from the translation-continuity remainder. It remains in this guided argument to show that the remainder is genuinely smaller than the leading bias scale, that is, $\|r_h\|_{L^2(\mathbb{R})}\to0$ as $h\to0$. The compact support of $K$ gives a number $A>0$ with $\operatorname{supp}K\subset[-A,A]$. Given $\varepsilon>0$, choose $\delta>0$ so that $|t|<\delta$ implies $\|f^{(s)}(\cdot-t)-f^{(s)}\|_{L^2(\mathbb{R})}<\varepsilon$. For $h<\delta/A$, all $u\in[-A,A]$ and $\theta\in[0,1]$ satisfy $|\theta hu|\le hA<\delta$. Hence the assumed $L^2$ translation continuity of $f^{(s)}$ gives \begin{align*} \sup_{\substack{|u|\le A,\;0\le\theta\le1}} \|f^{(s)}(\cdot-\theta hu)-f^{(s)}\|_{L^2(\mathbb{R})} \to0 \qquad\text{as } h\to0. \end{align*} [Minkowski's integral inequality](/theorems/464) then bounds $\|r_h\|_{L^2(\mathbb{R})}$ by this supremum times the finite constant \begin{align*} \frac{1}{(s-1)!} \int_{\mathbb{R}}|u|^s|K(u)| \int_0^1(1-\theta)^{s-1}\,d\theta\,du, \end{align*} so $\|r_h\|_{L^2(\mathbb{R})}\to0$. Thus the guided computation is a complete proof of the $L^2$ bias expansion. [/guided] [/step] [step:Show the Taylor remainder is small in $L^2(\mathbb{R})$] Let $A>0$ be such that $\operatorname{supp}K\subset[-A,A]$. 
Since $K$ is compactly supported and measurable with $K\in L^2(\mathbb{R})$, the function $u\mapsto |u|^s|K(u)|$ belongs to $L^1(\mathbb{R})$. By Minkowski's integral inequality for Bochner integrals applied in the [Banach space](/page/Banach%20Space) $L^2(\mathbb{R})$, \begin{align*} \|r_h\|_{L^2(\mathbb{R})} &\le \frac{1}{(s-1)!} \int_{\mathbb{R}}|u|^s|K(u)| \int_0^1(1-\theta)^{s-1} \|f^{(s)}(\cdot-\theta hu)-f^{(s)}\|_{L^2(\mathbb{R})} \,d\theta\,du. \end{align*} For $u\in[-A,A]$ and $\theta\in[0,1]$, the translation parameter $\theta hu$ tends to $0$ uniformly as $h\to0$. Indeed, given $\varepsilon>0$, translation continuity of $f^{(s)}$ gives $\delta>0$ such that \begin{align*} |t|<\delta \quad\Longrightarrow\quad \|f^{(s)}(\cdot-t)-f^{(s)}\|_{L^2(\mathbb{R})}<\varepsilon. \end{align*} If $h<\delta/A$, then $|\theta hu|<\delta$ for every $|u|\le A$ and every $\theta\in[0,1]$. Hence \begin{align*} \sup_{\substack{|u|\le A,\;0\le \theta\le 1}} \|f^{(s)}(\cdot-\theta hu)-f^{(s)}\|_{L^2(\mathbb{R})} \to 0. \end{align*} Therefore $\|r_h\|_{L^2(\mathbb{R})}\to0$. Since $b_h=h^s(a+r_h)$, we get \begin{align*} \int_{\mathbb{R}}|b_h(x)|^2\,dx=h^{2s}\|a+r_h\|_{L^2(\mathbb{R})}^2. \end{align*} Since $\|r_h\|_{L^2(\mathbb{R})}\to0$, this gives \begin{align*} \int_{\mathbb{R}}|b_h(x)|^2\,dx=h^{2s}\|a\|_{L^2(\mathbb{R})}^2+o(h^{2s}). \end{align*} Using the formula for $a$, we obtain \begin{align*} \int_{\mathbb{R}}|b_h(x)|^2\,dx=\frac{h^{2s}\mu_s(K)^2}{(s!)^2}\|f^{(s)}\|_{L^2(\mathbb{R})}^2+o(h^{2s}). \end{align*} [/step] [step:Compute the integrated variance] For each $x\in\mathbb{R}$, \begin{align*} \operatorname{Var}(\hat f_{n,h}(x)) = \frac{1}{n} \left( \mathbb{E}[Y_{h,x,1}^2]-m_h(x)^2 \right). \end{align*} First compute the integral of $\mathbb{E}[Y_{h,x,1}^2]$. Since $X_1$ has density $f$, \begin{align*} \mathbb{E}[Y_{h,x,1}^2] = \frac{1}{h^2} \int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)^2 f(y)\,dy. 
\end{align*} The integrand $K((x-y)/h)^2f(y)$ is non-negative because $f$ is a density, so Tonelli's theorem justifies interchanging the $x$ and $y$ integrations. Integrating over $x$ gives \begin{align*} \int_{\mathbb{R}}\mathbb{E}[Y_{h,x,1}^2]\,dx=\frac{1}{h^2}\int_{\mathbb{R}}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)^2 f(y)\,dy\,dx. \end{align*} Interchanging the order of integration and substituting $u=(x-y)/h$ in the $x$-integral, so that $dx=h\,du$, this becomes \begin{align*} \int_{\mathbb{R}}\mathbb{E}[Y_{h,x,1}^2]\,dx=\frac{1}{h}\int_{\mathbb{R}}f(y)\,dy\int_{\mathbb{R}}K(u)^2\,du. \end{align*} Because $f$ is a probability density, the last display equals $R(K)/h$, where $R(K)=\int_{\mathbb{R}}K(u)^2\,du$. Next, by [Young's convolution inequality](/theorems/463) in the case $L^2(\mathbb{R})*L^1(\mathbb{R})\to L^2(\mathbb{R})$, applied to the convolution representation of $m_h$, \begin{align*} \|m_h\|_{L^2(\mathbb{R})} \le \|f\|_{L^2(\mathbb{R})}\|K\|_{L^1(\mathbb{R})}. \end{align*} The right-hand side is finite because $f\in L^2(\mathbb{R})$ and $K$ is compactly supported with $K\in L^2(\mathbb{R})$, hence $K\in L^1(\mathbb{R})$. Consequently, \begin{align*} \frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,dx = O\left(\frac{1}{n}\right) = o\left(\frac{1}{nh}\right), \end{align*} because $h\to0$. Integrating the pointwise variance formula over $x$ gives \begin{align*} \int_{\mathbb{R}}\operatorname{Var}(\hat f_{n,h}(x))\,dx=\frac{1}{n}\int_{\mathbb{R}}\mathbb{E}[Y_{h,x,1}^2]\,dx-\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,dx. \end{align*} Combining this identity with the two estimates above, \begin{align*} \int_{\mathbb{R}}\operatorname{Var}(\hat f_{n,h}(x))\,dx=\frac{R(K)}{nh}+o\left(\frac{1}{nh}\right). \end{align*} The condition $nh_n\to\infty$ also records that this integrated variance contribution tends to $0$ along the stated bandwidth sequence. [/step] [step:Combine the bias and variance expansions] From the bias estimate, \begin{align*} \int_{\mathbb{R}}|m_h(x)-f(x)|^2\,dx = \frac{h^{2s}\mu_s(K)^2}{(s!)^2} \|f^{(s)}\|_{L^2(\mathbb{R})}^2 + o(h^{2s}). 
\end{align*} From the variance estimate, \begin{align*} \int_{\mathbb{R}}\operatorname{Var}(\hat f_{n,h}(x))\,dx = \frac{R(K)}{nh} + o\left(\frac{1}{nh}\right). \end{align*} Adding these two identities in the MISE decomposition gives \begin{align*} \operatorname{MISE}(\hat f_{n,h},f) = \frac{h^{2s}\mu_s(K)^2}{(s!)^2} \|f^{(s)}\|_{L^2(\mathbb{R})}^2 + \frac{R(K)}{nh} + o(h^{2s}) + o\left(\frac{1}{nh}\right). \end{align*} This is the claimed expansion. [/step]
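
The expansion can be checked numerically in a concrete case. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the Epanechnikov kernel $K(u)=\tfrac34(1-u^2)$ on $[-1,1]$ (order $s=2$, with $\mu_2(K)=1/5$ and $R(K)=3/5$) and the standard normal density, for which $\|f''\|_{L^2(\mathbb{R})}^2=3/(8\sqrt{\pi})$. Neither choice appears in the argument above, and all function names are hypothetical. It evaluates $m_h(x)=\int_{\mathbb{R}}K(u)f(x-hu)\,du$ by Simpson quadrature, forms the integrated squared bias and the integrated variance $\frac{1}{n}\bigl(R(K)/h-\|m_h\|_{L^2(\mathbb{R})}^2\bigr)$, and compares each with its leading term.

```python
import math

# Illustrative assumptions (not from the proof): Epanechnikov kernel, N(0,1) density.
S = 2                                            # kernel order
MU_S = 0.2                                       # mu_2(K) = int u^2 K(u) du = 1/5
R_K = 0.6                                        # R(K) = int K(u)^2 du = 3/5
NORM_FS_SQ = 3.0 / (8.0 * math.sqrt(math.pi))    # ||f''||_2^2 for the N(0,1) density


def epanechnikov(u):
    """Compactly supported order-2 kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson_samples(values, step):
    """Composite Simpson rule from an odd number of equally spaced samples."""
    if len(values) % 2 == 0:
        raise ValueError("need an odd number of samples")
    total = values[0] + values[-1]
    total += 4.0 * sum(values[1:-1:2]) + 2.0 * sum(values[2:-1:2])
    return total * step / 3.0


def mean_function(x, h, m=200):
    """m_h(x) = int K(u) f(x - h u) du, integrated over supp K = [-1, 1]."""
    step = 2.0 / m
    us = [-1.0 + i * step for i in range(m + 1)]
    return simpson_samples([epanechnikov(u) * normal_pdf(x - h * u) for u in us], step)


def mise_terms(n, h, lim=8.0, m=800):
    """Return (integrated squared bias, integrated variance) by quadrature on [-lim, lim]."""
    step = 2.0 * lim / m
    xs = [-lim + i * step for i in range(m + 1)]
    means = [mean_function(x, h) for x in xs]
    bias_sq = simpson_samples([(mh - normal_pdf(x)) ** 2 for mh, x in zip(means, xs)], step)
    mean_sq = simpson_samples([mh * mh for mh in means], step)
    variance = (R_K / h - mean_sq) / n           # (1/n) * (R(K)/h - int m_h^2)
    return bias_sq, variance


def leading_terms(n, h):
    """Leading bias and variance terms of the MISE expansion."""
    bias_lead = h ** (2 * S) * MU_S ** 2 / math.factorial(S) ** 2 * NORM_FS_SQ
    var_lead = R_K / (n * h)
    return bias_lead, var_lead


if __name__ == "__main__":
    n = 1000
    for h in (0.4, 0.2, 0.1):
        bias_sq, variance = mise_terms(n, h)
        bias_lead, var_lead = leading_terms(n, h)
        print(h, bias_sq / bias_lead, variance / var_lead)
```

In this setting the bias ratio should approach $1$ as $h$ decreases, reflecting the $o(h^{2s})$ remainder, while the variance ratio differs from $1$ by a term of order $h$, reflecting the subtracted $\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,dx=O(1/n)$.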
