MISE Rate for Multivariate Kernel Density Estimation (Theorem # 6326)

Theorem

Proof

[proofplan] We decompose the pointwise mean squared error into squared bias plus variance, then integrate this identity over $\mathbb{R}^d$ with respect to [Lebesgue measure](/page/Lebesgue%20Measure). The two assumed bounds immediately control the integrated squared bias and the integrated variance. Finally, the bandwidth rate is obtained by balancing the powers $h^{2s}$ and $(n h^d)^{-1}$ and substituting $h_n \asymp n^{-1/(2s+d)}$ into both terms. [/proofplan] [step:Decompose the pointwise mean squared error into bias and variance] Fix $n \in \mathbb{N}$ and $0 < h \le h_0$. For each $x \in \mathbb{R}^d$ such that $\mathbb{E}[|\hat f_{n,h}(x)|^2] < \infty$, define the real-valued [random variable](/page/Random%20Variable) $Y_x: \Omega \to \mathbb{R}$ by \begin{align*} Y_x(\omega)=\hat f_{n,h}(\omega,x). \end{align*} Since $f(x)$ is a finite constant, $Y_x - f(x)$ also has finite second moment, so every expectation below is finite. Squaring the identity \begin{align*} Y_x - f(x) = \left(Y_x-\mathbb{E}[Y_x]\right) + \left(\mathbb{E}[Y_x]-f(x)\right) \end{align*} and taking expectations gives \begin{align*} \mathbb{E}\left[(Y_x-f(x))^2\right] = \mathbb{E}\left[ \left(Y_x-\mathbb{E}[Y_x]\right)^2 \right] + 2\left(\mathbb{E}[Y_x]-f(x)\right) \mathbb{E}\left[Y_x-\mathbb{E}[Y_x]\right] + \left(\mathbb{E}[Y_x]-f(x)\right)^2. \end{align*} Since $\mathbb{E}[Y_x-\mathbb{E}[Y_x]]=0$, the cross term vanishes. Because $Y_x=\hat f_{n,h}(x)$, the first term is $\operatorname{Var}(\hat f_{n,h}(x))$. Hence \begin{align*} \mathbb{E}\left[(\hat f_{n,h}(x)-f(x))^2\right] = \operatorname{Var}(\hat f_{n,h}(x)) + \left(\mathbb{E}[\hat f_{n,h}(x)]-f(x)\right)^2. \end{align*} [guided] Fix $n \in \mathbb{N}$ and $0 < h \le h_0$. The quantity inside the MISE integral is a pointwise mean squared error, so we first prove the usual bias-variance identity at each spatial point $x \in \mathbb{R}^d$. For every $x \in \mathbb{R}^d$ such that $\mathbb{E}[|\hat f_{n,h}(x)|^2] < \infty$, define the real-valued random variable $Y_x: \Omega \to \mathbb{R}$ by \begin{align*} Y_x(\omega)=\hat f_{n,h}(\omega,x). 
\end{align*} The deterministic number $f(x)$ is the target value of the density at $x$, while $\mathbb{E}[Y_x]$ is the mean of the estimator at $x$. We split the estimation error into a centered random fluctuation and a deterministic bias: \begin{align*} Y_x - f(x) = \left(Y_x-\mathbb{E}[Y_x]\right) + \left(\mathbb{E}[Y_x]-f(x)\right). \end{align*} Squaring this identity gives \begin{align*} (Y_x-f(x))^2 = \left(Y_x-\mathbb{E}[Y_x]\right)^2 + 2\left(Y_x-\mathbb{E}[Y_x]\right)\left(\mathbb{E}[Y_x]-f(x)\right) + \left(\mathbb{E}[Y_x]-f(x)\right)^2. \end{align*} Now take expectations. The middle term vanishes for two reasons: $\mathbb{E}[Y_x]-f(x)$ is a deterministic constant that factors out of the expectation, and \begin{align*} \mathbb{E}\left[Y_x-\mathbb{E}[Y_x]\right] = \mathbb{E}[Y_x]-\mathbb{E}[Y_x] = 0. \end{align*} Therefore \begin{align*} \mathbb{E}\left[(Y_x-f(x))^2\right] = \mathbb{E}\left[ \left(Y_x-\mathbb{E}[Y_x]\right)^2 \right] + \left(\mathbb{E}[Y_x]-f(x)\right)^2. \end{align*} By the definition of variance, the first term is $\operatorname{Var}(Y_x)=\operatorname{Var}(\hat f_{n,h}(x))$. Hence \begin{align*} \mathbb{E}\left[(\hat f_{n,h}(x)-f(x))^2\right] = \operatorname{Var}(\hat f_{n,h}(x)) + \left(\mathbb{E}[\hat f_{n,h}(x)]-f(x)\right)^2. \end{align*} This identity is the pointwise source of the MISE bound: random fluctuation contributes variance, and systematic displacement of the mean contributes squared bias. [/guided] [/step] [step:Integrate the bias-variance decomposition and use the assumed bounds] By the integrated variance hypothesis, $\operatorname{Var}(\hat f_{n,h}(x))<\infty$, and hence $\mathbb{E}[|\hat f_{n,h}(x)|^2]<\infty$, for $\mathcal{L}^d$-almost every $x$. The pointwise identity therefore holds almost everywhere, which is all the integration requires. Integrating it over $\mathbb{R}^d$ with respect to $\mathcal{L}^d$ gives the following, where the integral splits because both integrands are nonnegative: \begin{align*} \operatorname{MISE}_{n}(h) &= \int_{\mathbb{R}^d} \left(\mathbb{E}[\hat f_{n,h}(x)]-f(x)\right)^2 \,d\mathcal{L}^d(x) + \int_{\mathbb{R}^d} \operatorname{Var}(\hat f_{n,h}(x)) \,d\mathcal{L}^d(x). 
\end{align*} The first integral is bounded by $C_1 h^{2s}$ by the integrated squared-bias hypothesis, and the second integral is bounded by $C_2/(n h^d)$ by the integrated variance hypothesis. Therefore \begin{align*} \operatorname{MISE}_{n}(h) \le C_1 h^{2s} + \frac{C_2}{n h^d}. \end{align*} With $C_3 := \max\{C_1,C_2\}$, this implies \begin{align*} \operatorname{MISE}_{n}(h) \le C_3\left(h^{2s}+\frac{1}{n h^d}\right). \end{align*} Since $C_3$ depends on neither $n$ nor $h \in (0,h_0]$, this is precisely the uniform bound \begin{align*} \operatorname{MISE}_{n}(h) = O\left(h^{2s}+\frac{1}{n h^d}\right). \end{align*} [/step] [step:Balance the two terms to determine the bandwidth rate] The squared-bias term $h^{2s}$ increases with $h$, while the variance term $(n h^d)^{-1}$ decreases with $h$. The order of their sum is therefore smallest, up to constants, when the two terms are of the same order: \begin{align*} h^{2s} \asymp \frac{1}{n h^d}. \end{align*} Multiplying by $h^d$ gives \begin{align*} h^{2s+d} \asymp \frac{1}{n}. \end{align*} Taking the positive $(2s+d)$-th root yields \begin{align*} h \asymp n^{-1/(2s+d)}. \end{align*} This is the claimed bandwidth scale. [/step] [step:Substitute the balanced bandwidth into the MISE bound] Let $(h_n)_{n=1}^{\infty}$ be a positive sequence such that $h_n \asymp n^{-1/(2s+d)}$ and $h_n \le h_0$ for all sufficiently large $n$. By the definition of $\asymp$, and after enlarging $N$ if necessary, there exist constants $a,b>0$ and $N \in \mathbb{N}$ such that for every $n \ge N$ we have $h_n \le h_0$ and \begin{align*} a n^{-1/(2s+d)} \le h_n \le b n^{-1/(2s+d)}. \end{align*} For the bias term, the upper bound on $h_n$ gives \begin{align*} h_n^{2s} \le b^{2s} n^{-2s/(2s+d)}. \end{align*} For the variance term, the lower bound on $h_n$ gives \begin{align*} \frac{1}{n h_n^d} \le \frac{1}{n a^d n^{-d/(2s+d)}} = a^{-d} n^{-2s/(2s+d)}. \end{align*} Since $0 < h_n \le h_0$ for $n \ge N$, the MISE bound already proved applies with $h = h_n$, so for every $n \ge N$, \begin{align*} \operatorname{MISE}_{n}(h_n) \le C_1 h_n^{2s} + \frac{C_2}{n h_n^d}. 
\end{align*} The two preceding estimates for the bias and variance terms therefore give \begin{align*} \operatorname{MISE}_{n}(h_n) \le \left(C_1 b^{2s}+C_2 a^{-d}\right) n^{-2s/(2s+d)}. \end{align*} Thus \begin{align*} \operatorname{MISE}_{n}(h_n) = O\left(n^{-2s/(2s+d)}\right), \end{align*} which completes the proof. [/step]
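The following Python sketch is a numerical sanity check of the argument above, not part of the proof. The first two helpers cover Steps 2 to 4. They evaluate the bound $C_1 h^{2s} + C_2/(n h^d)$ and its exact minimizer $h^\ast = (d C_2/(2 s C_1 n))^{1/(2s+d)}$, which is obtained by setting the derivative in $h$ to zero and is of order $n^{-1/(2s+d)}$. The last helper is a Monte Carlo check of the pointwise identity from Step 1. Several choices here are assumptions that the theorem does not make. The values of $C_1$, $C_2$ and $s$ are illustrative placeholders. The Monte Carlo part uses a one-dimensional ($d = 1$) Gaussian kernel with standard normal data. All function names are hypothetical.

```python
import math
import random
import statistics


def mise_bound(h, n, s, d, c1, c2):
    # Upper bound from the integrated bias/variance hypotheses:
    # C1 * h^(2s) + C2 / (n * h^d).  c1, c2 are hypothetical constants.
    return c1 * h ** (2 * s) + c2 / (n * h ** d)


def balanced_bandwidth(n, s, d, c1, c2):
    # Exact minimizer of mise_bound in h, found by setting the derivative
    # 2s*C1*h^(2s-1) - d*C2/(n*h^(d+1)) to zero:
    # h* = (d*C2 / (2s*C1*n))^(1/(2s+d)), which is of order n^(-1/(2s+d)).
    return (d * c2 / (2 * s * c1 * n)) ** (1.0 / (2 * s + d))


def gaussian_kde_at(x, sample, h):
    # One-dimensional (d = 1) Gaussian-kernel density estimate at the point x.
    scale = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)


def pointwise_mse_split(x, n, h, reps, seed=0):
    # Monte Carlo check of Step 1 at a single point x: draw `reps` samples of
    # size n from the standard normal density f, and return the empirical
    # MSE, variance, and squared bias of the estimator at x.
    rng = random.Random(seed)
    f_x = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    estimates = [
        gaussian_kde_at(x, [rng.gauss(0.0, 1.0) for _ in range(n)], h)
        for _ in range(reps)
    ]
    mean = statistics.fmean(estimates)
    mse = statistics.fmean([(e - f_x) ** 2 for e in estimates])
    # Variance with 1/reps normalization, so the split is an exact identity.
    var = statistics.fmean([(e - mean) ** 2 for e in estimates])
    return mse, var, (mean - f_x) ** 2


if __name__ == "__main__":
    s, d, c1, c2 = 2, 1, 1.0, 1.0  # illustrative values only
    for n in (10**2, 10**4, 10**6):
        h = balanced_bandwidth(n, s, d, c1, c2)
        # Both rescaled quantities should not change with n (Steps 3 and 4).
        print(n, h * n ** (1 / (2 * s + d)),
              mise_bound(h, n, s, d, c1, c2) * n ** (2 * s / (2 * s + d)))
    mse, var, bias2 = pointwise_mse_split(0.0, n=100, h=0.4, reps=100)
    print(mse, var + bias2)  # equal up to floating-point rounding
```

At $h^\ast$ the variance term equals $(2s/d)$ times the bias term. The minimized bound is therefore $(1 + 2s/d) C_1 (h^\ast)^{2s}$, which is a fixed multiple of $n^{-2s/(2s+d)}$ and matches the rate proved in the last step.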