Asymptotic Mean Integrated Squared Error for Kernel Density Estimators (Theorem #6328)

Proof

[proofplan] We decompose the mean integrated squared error into the integrated squared bias plus the integrated variance. The bias is expanded by the second-order Taylor formula with integral remainder: symmetry of $K$ cancels the first-order term, and the stated $L^2$ remainder assumption makes the integrated squared bias equal to $\frac{h^4\mu_2(K)^2}{4}R(f'')+o(h^4)$. The variance is computed directly from independence; its leading integrated contribution is $\frac{R(K)}{nh}$, while the centering correction is $O(1/n)=o(1/(nh))$. Combining the two expansions gives the stated AMISE and the exact MISE remainder. [/proofplan] [step:Decompose the risk into integrated squared bias and integrated variance] Let $(\Omega,\mathcal F,\mathbb P)$, the sample $X_1,\dots,X_n:\Omega\to\mathbb R$, and the estimator $\hat f_h:\mathbb R\times\Omega\to\mathbb R$ be as in the theorem statement. For fixed $x\in\mathbb R$, write $\hat f_h(x):\Omega\to\mathbb R$ for the [random variable](/page/Random%20Variable) $\omega\mapsto\hat f_h(x,\omega)$. For $h>0$, define the mean estimator $m_h: \mathbb{R}\to\mathbb{R}$ by $m_h(x)=\mathbb{E}[\hat f_h(x)]$. Define the pointwise bias $b_h: \mathbb{R}\to\mathbb{R}$ by $b_h(x)=m_h(x)-f(x)$. For a real-valued random variable $Z:\Omega\to\mathbb R$ with finite second moment, write $\operatorname{Var}(Z)=\mathbb E[(Z-\mathbb E[Z])^2]$ for its variance. For each $x\in\mathbb{R}$, the identity \begin{align*} \mathbb{E}\left[\left(\hat f_h(x)-f(x)\right)^2\right] = b_h(x)^2+\operatorname{Var}(\hat f_h(x)) \end{align*} follows by expanding around $m_h(x)$. Integrating over $\mathbb{R}$ gives \begin{align*} \operatorname{MISE}(h) = \int_{\mathbb{R}} b_h(x)^2\,d\mathcal{L}^1(x) + \int_{\mathbb{R}} \operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x), \end{align*} where the interchange of expectation and integration is justified by Tonelli's theorem applied to the nonnegative measurable function $(\omega,x)\mapsto (\hat f_h(x,\omega)-f(x))^2$. 
The variance step below expresses the integrated variance as $\frac{R(K)}{nh}$ minus a finite centering correction, and the bias step shows that the integrated squared bias is finite along the bandwidth sequence under the stated $L^2$ remainder assumption. [/step] [step:Expand the bias and isolate the second derivative term] For Lebesgue-a.e. $x\in\mathbb{R}$, the expectation of the estimator is \begin{align*} m_h(x)=\mathbb{E}\left[\frac{1}{h}K\left(\frac{x-X_1}{h}\right)\right]. \end{align*} Since $X_1$ has density $f$ with respect to $\mathcal L^1$, \begin{align*} m_h(x)=\frac{1}{h}\int_{\mathbb{R}} K\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y). \end{align*} Using the substitution $u=(x-y)/h$, so that $y=x-hu$ and $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, we obtain, for Lebesgue-a.e. $x$, \begin{align*} m_h(x) = \int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u). \end{align*} Consequently, using $\int_{\mathbb{R}}K(u)\,d\mathcal{L}^1(u)=1$, for Lebesgue-a.e. $x$, \begin{align*} b_h(x) = \int_{\mathbb{R}} K(u)\{f(x-hu)-f(x)\}\,d\mathcal{L}^1(u). \end{align*} Because $K\in L^1(\mathbb{R})$ and $\int_{\mathbb{R}} |u|^2|K(u)|\,d\mathcal{L}^1(u)<\infty$, the elementary bound $|u|\le 1+|u|^2$ shows that the first moment $\int_{\mathbb{R}} |u||K(u)|\,d\mathcal{L}^1(u)$ is finite. Since $K$ is symmetric, the function $u\mapsto uK(u)$ is odd and integrable, hence \begin{align*} \int_{\mathbb{R}} uK(u)\,d\mathcal{L}^1(u)=0. \end{align*} For almost every pair $(x,u)$, the second-order Taylor formula with integral remainder gives \begin{align*} f(x-hu)-f(x)=-hu f'(x)+h^2u^2\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t), \end{align*} where the Taylor formula with integral remainder is applicable because $f'$ is absolutely continuous on compact intervals and $f''$ is its almost-everywhere derivative. We fix a measurable representative of $f''$. Then $(x,u,t)\mapsto f''(x-thu)$ is measurable, and for each fixed $(x,u)$ the $t$-integral is finite because $f''\in L^1_{\mathrm{loc}}(\mathbb R)$. 
The subsequent $u$-integral defining the remainder is interpreted as the measurable $L^2(\mathbb R)$ function whose existence and convergence are asserted by the stated remainder assumption. Hence \begin{align*} b_h(x)=-hf'(x)\int_{\mathbb{R}}uK(u)\,d\mathcal{L}^1(u)+h^2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u). \end{align*} Using $\int_{\mathbb R}uK(u)\,d\mathcal L^1(u)=0$, we obtain \begin{align*} b_h(x)=h^2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u). \end{align*} Define the bias remainder $\rho_h:\mathbb{R}\to\mathbb{R}$ by \begin{align*} \rho_h(x)=2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)\left\{f''(x-thu)-f''(x)\right\}\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u). \end{align*} Since $\int_0^1(1-t)\,d\mathcal L^1(t)=1/2$, it follows that \begin{align*} b_h(x)=\frac{h^2}{2}\left(\mu_2(K)f''(x)+\rho_h(x)\right). \end{align*} [guided] The purpose of this step is to find the leading deterministic error in the estimator. We compute its expectation as an identity holding for Lebesgue-a.e. $x$. Define $m_h:\mathbb{R}\to\mathbb{R}$ by $m_h(x)=\mathbb{E}[\hat f_h(x)]$ where this expectation is finite, choosing any measurable representative elsewhere. Since the $X_i$ have common density $f$, linearity of expectation gives, for Lebesgue-a.e. $x$, \begin{align*} m_h(x)=\mathbb{E}\left[\frac{1}{h}K\left(\frac{x-X_1}{h}\right)\right]. \end{align*} The density formula for $X_1$ gives \begin{align*} m_h(x)=\frac{1}{h}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y). \end{align*} Now make the change of variables $u=(x-y)/h$. Then $y=x-hu$, and one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) transforms as $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$. Thus, for Lebesgue-a.e. $x$, \begin{align*} m_h(x) = \int_{\mathbb{R}}K(u)f(x-hu)\,d\mathcal{L}^1(u). \end{align*} The bias is therefore, for Lebesgue-a.e. 
$x$, \begin{align*} b_h(x) = m_h(x)-f(x) = \int_{\mathbb{R}}K(u)\{f(x-hu)-f(x)\}\,d\mathcal{L}^1(u), \end{align*} using $\int_{\mathbb{R}}K(u)\,d\mathcal{L}^1(u)=1$. We now expand $f(x-hu)$ around $x$. The absolute continuity of $f'$ on compact intervals permits the second-order Taylor formula with integral remainder: \begin{align*} f(x-hu)-f(x)=-hu f'(x)+h^2u^2\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t). \end{align*} This is the correct remainder form under the stated hypotheses: $f''$ is only an $L^2$ function, so point evaluation of $f''$ at a Lagrange remainder point need not be meaningful. We fix a measurable representative of $f''$; then $(x,u,t)\mapsto f''(x-thu)$ is measurable, and the $t$-integral is finite for each fixed $(x,u)$ because $f''\in L^1_{\mathrm{loc}}(\mathbb R)$. The theorem's remainder hypothesis is precisely the assertion that the weighted $u$-integral obtained below defines an $L^2(\mathbb R)$ function tending to zero. Substituting the integral remainder into the bias gives \begin{align*} b_h(x)=-hf'(x)\int_{\mathbb{R}}uK(u)\,d\mathcal{L}^1(u)+h^2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u). \end{align*} The first term disappears because $K$ is symmetric. Indeed, $u\mapsto uK(u)$ is odd, and it is integrable because $K\in L^1(\mathbb{R})$ and $\int_{\mathbb{R}}|u|^2|K(u)|\,d\mathcal{L}^1(u)<\infty$. Hence \begin{align*} \int_{\mathbb{R}}uK(u)\,d\mathcal{L}^1(u)=0. \end{align*} This cancellation is exactly why the leading bias of a symmetric second-order kernel is quadratic in $h$, rather than linear in $h$. To separate the main term from the Taylor error, define $\rho_h:\mathbb{R}\to\mathbb{R}$ by \begin{align*} \rho_h(x)=2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)\left\{f''(x-thu)-f''(x)\right\}\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u). 
\end{align*} Since \begin{align*} \mu_2(K)=\int_{\mathbb{R}}u^2K(u)\,d\mathcal{L}^1(u) \end{align*} and $\int_0^1(1-t)\,d\mathcal L^1(t)=1/2$, we obtain the exact decomposition \begin{align*} b_h(x)=\frac{h^2}{2}\left(\mu_2(K)f''(x)+\rho_h(x)\right). \end{align*} This formula contains both the main AMISE bias term and the remainder term controlled by the hypothesis. [/guided] [/step] [step:Integrate the squared bias and use the $L^2$ remainder assumption] From the previous step, \begin{align*} \int_{\mathbb{R}}b_h(x)^2\,d\mathcal{L}^1(x) = \frac{h^4}{4} \int_{\mathbb{R}} \left(\mu_2(K)f''(x)+\rho_h(x)\right)^2 \,d\mathcal{L}^1(x). \end{align*} Expanding the square gives \begin{align*} \int_{\mathbb{R}}b_h(x)^2\,d\mathcal{L}^1(x)=\frac{h^4\mu_2(K)^2}{4}\int_{\mathbb{R}}f''(x)^2\,d\mathcal{L}^1(x)+\frac{h^4\mu_2(K)}{2}\int_{\mathbb{R}}f''(x)\rho_h(x)\,d\mathcal{L}^1(x)+\frac{h^4}{4}\int_{\mathbb{R}}\rho_h(x)^2\,d\mathcal{L}^1(x). \end{align*} By hypothesis, \begin{align*} \|\rho_h\|_{L^2(\mathbb{R})}^2 = \int_{\mathbb{R}}\rho_h(x)^2\,d\mathcal{L}^1(x) \to0. \end{align*} Since $f''\in L^2(\mathbb{R})$, the [Cauchy-Schwarz inequality](/theorems/432) gives \begin{align*} \left| \int_{\mathbb{R}}f''(x)\rho_h(x)\,d\mathcal{L}^1(x) \right| \le \|f''\|_{L^2(\mathbb{R})}\|\rho_h\|_{L^2(\mathbb{R})} \to0. \end{align*} Therefore \begin{align*} \int_{\mathbb{R}}b_h(x)^2\,d\mathcal{L}^1(x) = \frac{h^4\mu_2(K)^2}{4}R(f'') + o(h^4). \end{align*} [/step] [step:Compute the integrated variance and identify its leading term] For $x\in\mathbb{R}$, define $Y_{h,x}:\Omega\to\mathbb{R}$ by \begin{align*} Y_{h,x}(\omega)=\frac{1}{h}K\left(\frac{x-X_1(\omega)}{h}\right). \end{align*} Since the $X_i$ are independent and identically distributed, \begin{align*} \operatorname{Var}(\hat f_h(x))=\frac{1}{n}\operatorname{Var}(Y_{h,x})=\frac{1}{n}\mathbb{E}[Y_{h,x}^2]-\frac{1}{n}m_h(x)^2. 
\end{align*} The second moment is \begin{align*} \mathbb{E}[Y_{h,x}^2] = \frac{1}{h^2} \int_{\mathbb{R}} K\left(\frac{x-y}{h}\right)^2 f(y)\,d\mathcal{L}^1(y). \end{align*} Again using $u=(x-y)/h$, so that $y=x-hu$ and $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, \begin{align*} \mathbb{E}[Y_{h,x}^2] = \frac{1}{h} \int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u). \end{align*} Integrating over $x$ gives \begin{align*} \int_{\mathbb{R}}\mathbb{E}[Y_{h,x}^2]\,d\mathcal{L}^1(x)=\frac{1}{h}\int_{\mathbb{R}}\int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u)\,d\mathcal{L}^1(x). \end{align*} By Tonelli's theorem, applied to the nonnegative measurable integrand $(x,u)\mapsto K(u)^2f(x-hu)$, the order of integration may be exchanged, so this equals \begin{align*} \frac{1}{h}\int_{\mathbb{R}}K(u)^2\left(\int_{\mathbb{R}}f(x-hu)\,d\mathcal{L}^1(x)\right)d\mathcal{L}^1(u). \end{align*} For each fixed $u\in\mathbb{R}$, the translation $z=x-hu$ preserves Lebesgue measure, so \begin{align*} \int_{\mathbb{R}}f(x-hu)\,d\mathcal{L}^1(x) = \int_{\mathbb{R}}f(z)\,d\mathcal{L}^1(z) = 1. \end{align*} Thus, recalling that $R(K)=\int_{\mathbb{R}}K(u)^2\,d\mathcal{L}^1(u)$, \begin{align*} \int_{\mathbb{R}}\mathbb{E}[Y_{h,x}^2]\,d\mathcal{L}^1(x) = \frac{R(K)}{h}. \end{align*} Integrating the variance identity over $x$ therefore gives \begin{align*} \int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x) = \frac{R(K)}{nh} - \frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,d\mathcal{L}^1(x). \end{align*} [/step] [step:Show the variance centering correction is negligible] Define the rescaled kernel $K_h:\mathbb{R}\to\mathbb{R}$ by \begin{align*} K_h(t)=\frac{1}{h}K\left(\frac{t}{h}\right). \end{align*} Then $m_h=K_h*f$, where convolution is taken with respect to $\mathcal{L}^1$. Moreover, \begin{align*} \|K_h\|_{L^1(\mathbb{R})} = \int_{\mathbb{R}}\frac{1}{h}\left|K\left(\frac{t}{h}\right)\right|\,d\mathcal{L}^1(t) = \int_{\mathbb{R}}|K(u)|\,d\mathcal{L}^1(u) = \|K\|_{L^1(\mathbb{R})}. 
\end{align*} Since $K\in L^1(\mathbb{R})$ and $f\in L^2(\mathbb{R})$, [Young's convolution inequality](/theorems/463) gives \begin{align*} \|m_h\|_{L^2(\mathbb{R})} = \|K_h*f\|_{L^2(\mathbb{R})} \le \|K_h\|_{L^1(\mathbb{R})}\|f\|_{L^2(\mathbb{R})} = \|K\|_{L^1(\mathbb{R})}\|f\|_{L^2(\mathbb{R})}, \end{align*} where Young's convolution inequality is used in the case $L^1*L^2\subset L^2$. The hypothesis $f\in L^2(\mathbb R)$ in the theorem statement is exactly the condition needed for this bound. Define the asymptotic notation $A_n=O(1/n)$ to mean that there are constants $C>0$ and $n_0\in\mathbb N$ such that $|A_n|\le C/n$ for all $n\ge n_0$. Hence \begin{align*} 0\le\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,d\mathcal{L}^1(x)\le\frac{\|K\|_{L^1(\mathbb{R})}^2\|f\|_{L^2(\mathbb{R})}^2}{n}=O\left(\frac{1}{n}\right). \end{align*} Because $h\to0$, \begin{align*} \frac{1/n}{1/(nh)} = h \to0, \end{align*} so $1/n=o(1/(nh))$. Therefore \begin{align*} \int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x) = \frac{R(K)}{nh} + o\left(\frac{1}{nh}\right). \end{align*} [/step] [step:Combine the bias and variance expansions] Using the risk decomposition, \begin{align*} \operatorname{MISE}(h) = \int_{\mathbb{R}} b_h(x)^2\,d\mathcal{L}^1(x) + \int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x). \end{align*} The integrated squared bias expansion gives \begin{align*} \int_{\mathbb{R}} b_h(x)^2\,d\mathcal{L}^1(x) = \frac{h^4\mu_2(K)^2}{4}R(f'') + o(h^4), \end{align*} and the integrated variance expansion gives \begin{align*} \int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x) = \frac{R(K)}{nh} + o\left(\frac{1}{nh}\right). \end{align*} Therefore \begin{align*} \operatorname{MISE}(h)=\frac{R(K)}{nh}+\frac{h^4\mu_2(K)^2}{4}R(f'')+o\left(\frac{1}{nh}\right)+o(h^4). 
\end{align*} Since \begin{align*} o\left(\frac{1}{nh}\right)+o(h^4) = o\left(\frac{1}{nh}+h^4\right), \end{align*} we obtain \begin{align*} \operatorname{MISE}(h) = \operatorname{AMISE}(h) + o\left(\frac{1}{nh}+h^4\right), \end{align*} with \begin{align*} \operatorname{AMISE}(h) = \frac{R(K)}{nh} + \frac{h^4\mu_2(K)^2}{4}R(f''). \end{align*} This is the desired asymptotic mean integrated squared error formula. [/step]
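
As a numerical sanity check of the expansion, the following minimal sketch compares the exact MISE with the AMISE in one special case where both are available in closed form. It rests on assumptions that are not part of the proof: $K$ is taken to be the standard Gaussian kernel and $f$ the standard normal density (a pair presumed to satisfy the theorem's hypotheses), and the bandwidth sequence is $h=n^{-1/5}$. In that case $m_h=K_h*f$ is the $N(0,1+h^2)$ density, so the integrated squared bias and $\int m_h^2$ follow from the identity $\int\varphi_a\varphi_b\,d\mathcal L^1=(2\pi(a^2+b^2))^{-1/2}$ for centered normal densities, and $R(K)=1/(2\sqrt\pi)$, $\mu_2(K)=1$, $R(f'')=3/(8\sqrt\pi)$. The function names are hypothetical and chosen for illustration only.

```python
import math

SQRT_PI = math.sqrt(math.pi)

# Constants for the assumed special case: Gaussian kernel K, standard normal f.
R_K = 1.0 / (2.0 * SQRT_PI)    # R(K)   = integral of K^2
MU2_K = 1.0                    # mu_2(K) = integral of u^2 K(u)
R_F2 = 3.0 / (8.0 * SQRT_PI)   # R(f'') = integral of (f'')^2


def amise(n, h):
    """AMISE(h) = R(K)/(n h) + h^4 mu_2(K)^2 R(f'') / 4."""
    return R_K / (n * h) + 0.25 * h**4 * MU2_K**2 * R_F2


def exact_mise(n, h):
    """Exact MISE for the Gaussian/normal pair, via the bias-variance split."""
    # Integrated squared bias: integral of (phi_{sqrt(1+h^2)} - phi_1)^2.
    int_sq_bias = (
        1.0 / math.sqrt(1.0 + h * h)
        - 2.0 / math.sqrt(1.0 + 0.5 * h * h)
        + 1.0
    ) / (2.0 * SQRT_PI)
    # Integrated variance: R(K)/(n h) minus the centering correction
    # (1/n) * integral of m_h^2, where m_h is the N(0, 1 + h^2) density.
    centering = 1.0 / (2.0 * SQRT_PI * math.sqrt(1.0 + h * h) * n)
    int_var = R_K / (n * h) - centering
    return int_sq_bias + int_var


def relative_remainder(n, h):
    """(MISE - AMISE) divided by the rate 1/(n h) + h^4 from the theorem."""
    return (exact_mise(n, h) - amise(n, h)) / (1.0 / (n * h) + h**4)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6, 10**8):
        h = n ** (-0.2)  # assumed bandwidth sequence h = n^(-1/5)
        print(n, h, exact_mise(n, h), amise(n, h), relative_remainder(n, h))
```

Along this bandwidth sequence the last printed column should shrink toward zero as $n$ grows, which is what the remainder $o\left(\frac{1}{nh}+h^4\right)$ asserts. The check covers only this one kernel and density and does not replace the argument above.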
