Mean Integrated Squared Error Upper Bound for Hölder Kernel Density Estimators

Mean Integrated Squared Error Upper Bound for Hölder Kernel Density Estimators (Theorem # 6358)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We decompose the mean integrated squared error into a variance term and a squared bias term. The variance is controlled by independence and a uniform $L^2$ bound for the rescaled kernel sections; in the periodic case the periodized kernel is handled with an explicit finite-overlap constant, giving order $(nh^d)^{-1}$. The bias is controlled either by Taylor expansion on the periodic Hölder extension to the torus or by the stated uniform boundary-corrected bias bound; the kernel moment conditions cancel all polynomial terms, and the Hölder remainder gives order $h^s$ pointwise, hence order $h^{2s}$ in squared $L^2$. Balancing $h^{2s}$ and $(nh^d)^{-1}$ with the two-sided bandwidth bound gives the asserted rate. [/proofplan] [step:Decompose the risk into variance and squared bias] Fix $f\in\mathcal F$. Let $\mathbb P_f$ denote the probability law on $\mathcal X$ with density $f$ with respect to $\mathcal L^d$, and let $\mathbb E_f$ and $\operatorname{Var}_f$ denote expectation and variance under $\mathbb P_f$. Let $\mathbb P_f^{(n)}=\mathbb P_f^{\otimes n}$ denote the joint law of $(X_1,\dots,X_n)$, and let $\mathbb E_f^{(n)}$ denote expectation under $\mathbb P_f^{(n)}$. Define the mean estimator $m_h:\mathcal X\to\mathbb R$ by \begin{align*} m_h(x)=\mathbb E_f^{(n)}[\hat f_h(x)]. \end{align*} For each $x\in\mathcal X$, \begin{align*} \hat f_h(x)-f(x) = \bigl(\hat f_h(x)-m_h(x)\bigr)+\bigl(m_h(x)-f(x)\bigr). \end{align*} Since $\mathbb E_f^{(n)}[\hat f_h(x)-m_h(x)]=0$, the cross term vanishes after taking expectation. Therefore \begin{align*} \mathbb E_f^{(n)} \left[ \|\hat f_h-f\|_{L^2(\mathcal X)}^2 \right] = \int_{\mathcal X} \mathbb E_f^{(n)} \left[ |\hat f_h(x)-f(x)|^2 \right] \,d\mathcal L^d(x). \end{align*} Moreover, \begin{align*} \int_{\mathcal X} \mathbb E_f^{(n)} \left[ |\hat f_h(x)-f(x)|^2 \right] \,d\mathcal L^d(x) = \int_{\mathcal X} \operatorname{Var}_f(\hat f_h(x)) \,d\mathcal L^d(x) + \int_{\mathcal X} |m_h(x)-f(x)|^2 \,d\mathcal L^d(x). \end{align*} Here Tonelli's theorem applies because the integrand is non-negative, and the pointwise cross term has expectation zero by the elementary variance identity $\mathbb E[(Y-\mathbb E[Y])a]=0$ for an integrable real-valued [random variable](/page/Random%20Variable) $Y$ and a deterministic scalar $a$. [/step] [step:Bound the integrated variance by the rescaled kernel size] First consider the periodic estimator. For $x,y\in\mathcal X$, define the periodic rescaled kernel section $k_{h,x}:\mathcal X\to\mathbb R$ by \begin{align*} k_{h,x}(y) = \sum_{m\in\mathbb Z^d} h^{-d}K\left(\frac{x-y+m}{h}\right). \end{align*} Then \begin{align*} \hat f_h(x)=\frac{1}{n}\sum_{j=1}^n k_{h,x}(X_j). \end{align*} By independence of $X_1,\dots,X_n$, \begin{align*} \operatorname{Var}_f(\hat f_h(x)) = \frac{1}{n}\operatorname{Var}_f(k_{h,x}(X_1)) \le \frac{1}{n}\mathbb E_f[|k_{h,x}(X_1)|^2]. \end{align*} Integrating in $x$ and using the density of $X_1$ gives \begin{align*} \int_{\mathcal X}\operatorname{Var}_f(\hat f_h(x))\,d\mathcal L^d(x) \le \frac{1}{n} \int_{\mathcal X}\int_{\mathcal X} |k_{h,x}(y)|^2 f(y) \,d\mathcal L^d(y)\,d\mathcal L^d(x). \end{align*} Using $f(y)\le\|f\|_\infty$ for $\mathcal L^d$-almost every $y\in\mathcal X$ gives \begin{align*} \int_{\mathcal X}\operatorname{Var}_f(\hat f_h(x))\,d\mathcal L^d(x) \le \frac{\|f\|_\infty}{n} \int_{\mathcal X}\int_{\mathcal X} |k_{h,x}(y)|^2 \,d\mathcal L^d(y)\,d\mathcal L^d(x). \end{align*} Let $R_K<\infty$ be chosen so that $\operatorname{supp}K\subset B(0,R_K)$, and define the finite overlap constant \begin{align*} N_K = \sup_{u\in\mathbb R^d}\#\left\{m\in\mathbb Z^d:u+m\in B(0,R_K)\right\}. \end{align*} For $0<h\le 1$, at most $N_K$ summands in the periodized kernel section are non-zero for each pair $(x,y)\in\mathcal X\times\mathcal X$. Hence the finite-sum [Cauchy-Schwarz inequality](/theorems/432) gives \begin{align*} |k_{h,x}(y)|^2 \le N_K\sum_{m\in\mathbb Z^d} h^{-2d}\left|K\left(\frac{x-y+m}{h}\right)\right|^2. \end{align*} The right-hand side is non-negative, so Tonelli's theorem permits the sum and the integral in $x$ to be interchanged. For each $m\in\mathbb Z^d$, the substitution $z=(x-y+m)/h$ transforms the measure by $d\mathcal L^d(x)=h^d\,d\mathcal L^d(z)$ and sends $x\in\mathcal X$ to $z\in h^{-1}(\mathcal X-y+m)$. The translated cubes $\mathcal X-y+m$, with $m\in\mathbb Z^d$, tile $\mathbb R^d$ up to their boundary sets, which are $\mathcal L^d$-null; after scaling by $h^{-1}$, the sets $h^{-1}(\mathcal X-y+m)$ also tile $\mathbb R^d$ up to $\mathcal L^d$-null sets. Therefore \begin{align*} \int_{\mathcal X}|k_{h,x}(y)|^2\,d\mathcal L^d(x) \le N_K\sum_{m\in\mathbb Z^d}h^{-2d} \int_{\mathcal X}\left|K\left(\frac{x-y+m}{h}\right)\right|^2 \,d\mathcal L^d(x). \end{align*} The change of variables and the tiling give \begin{align*} \sum_{m\in\mathbb Z^d}h^{-2d} \int_{\mathcal X}\left|K\left(\frac{x-y+m}{h}\right)\right|^2 \,d\mathcal L^d(x) = h^{-d}\int_{\mathbb R^d}|K(z)|^2\,d\mathcal L^d(z). \end{align*} Hence \begin{align*} \int_{\mathcal X}|k_{h,x}(y)|^2\,d\mathcal L^d(x) \le N_Kh^{-d}\|K\|_{L^2(\mathbb R^d)}^2. \end{align*} Thus, using $\mathcal L^d(\mathcal X)=1$ and $\|f\|_\infty\le B$, \begin{align*} \int_{\mathcal X}\operatorname{Var}_f(\hat f_h(x))\,d\mathcal L^d(x) \le \frac{BN_K\|K\|_{L^2(\mathbb R^d)}^2}{nh^d}. \end{align*} In the boundary-corrected case, the assumed uniform $L^2$ kernel-size bound gives the same estimate with $N_K\|K\|_{L^2(\mathbb R^d)}^2$ replaced by the corresponding boundary-correction constant. [guided] The variance calculation has two purposes: use independence to gain the factor $1/n$, and use the $h^{-d}$ scaling of the kernel to determine the size of one summand. For $x,y\in\mathcal X$, write \begin{align*} k_{h,x}(y) = \sum_{m\in\mathbb Z^d} h^{-d}K\left(\frac{x-y+m}{h}\right). \end{align*} This is the function of the observation variable $y$ that is evaluated at $X_j$. Hence \begin{align*} \hat f_h(x)=\frac{1}{n}\sum_{j=1}^n k_{h,x}(X_j). \end{align*} Because the random variables $X_1,\dots,X_n$ are independent and identically distributed, the variance of the average is \begin{align*} \operatorname{Var}_f(\hat f_h(x)) = \frac{1}{n}\operatorname{Var}_f(k_{h,x}(X_1)). \end{align*} We then bound the variance by the second moment: \begin{align*} \operatorname{Var}_f(k_{h,x}(X_1)) \le \mathbb E_f[|k_{h,x}(X_1)|^2]. \end{align*} Since $X_1$ has density $f$ with respect to $\mathcal L^d$ on $\mathcal X$, \begin{align*} \mathbb E_f[|k_{h,x}(X_1)|^2] = \int_{\mathcal X}|k_{h,x}(y)|^2 f(y)\,d\mathcal L^d(y). \end{align*} Now integrate over $x$ and use the uniform density bound $\|f\|_\infty\le B$: \begin{align*} \int_{\mathcal X}\operatorname{Var}_f(\hat f_h(x))\,d\mathcal L^d(x) \le \frac{1}{n} \int_{\mathcal X}\int_{\mathcal X} |k_{h,x}(y)|^2 f(y) \,d\mathcal L^d(y)\,d\mathcal L^d(x). \end{align*} Using $f(y)\le B$ for $\mathcal L^d$-almost every $y\in\mathcal X$, we get \begin{align*} \int_{\mathcal X}\operatorname{Var}_f(\hat f_h(x))\,d\mathcal L^d(x) \le \frac{B}{n} \int_{\mathcal X}\int_{\mathcal X} |k_{h,x}(y)|^2 \,d\mathcal L^d(y)\,d\mathcal L^d(x). \end{align*} The remaining integral is the only place where periodization needs care. Squaring the periodized kernel creates cross terms, so we do not identify its squared mass with the squared mass of $K$. Instead choose $R_K<\infty$ such that $\operatorname{supp}K\subset B(0,R_K)$ and define \begin{align*} N_K = \sup_{u\in\mathbb R^d}\#\left\{m\in\mathbb Z^d:u+m\in B(0,R_K)\right\}. \end{align*} This number is finite because $B(0,R_K)$ is bounded and the lattice $\mathbb Z^d$ is locally finite. For $0<h\le 1$, at most $N_K$ terms in $k_{h,x}(y)$ can be non-zero for any fixed $(x,y)$. The finite-sum Cauchy-Schwarz inequality therefore gives \begin{align*} |k_{h,x}(y)|^2 \le N_K\sum_{m\in\mathbb Z^d}h^{-2d}\left|K\left(\frac{x-y+m}{h}\right)\right|^2. \end{align*} The right-hand side is non-negative, so Tonelli's theorem allows us to interchange the sum and the integral over $x$. For a fixed $m\in\mathbb Z^d$, use the substitution $z=(x-y+m)/h$; the measure transforms as \begin{align*} d\mathcal L^d(x)=h^d\,d\mathcal L^d(z). \end{align*} The image of $x\in\mathcal X$ is $h^{-1}(\mathcal X-y+m)$. The translated cubes $\mathcal X-y+m$, indexed by $m\in\mathbb Z^d$, tile $\mathbb R^d$ up to boundary sets of $\mathcal L^d$-measure zero; scaling by $h^{-1}$ preserves this tiling property up to null sets. Hence the sum of the changed-variable integrals is exactly the integral over $\mathbb R^d$: \begin{align*} \sum_{m\in\mathbb Z^d}h^{-2d} \int_{\mathcal X}\left|K\left(\frac{x-y+m}{h}\right)\right|^2 \,d\mathcal L^d(x) = h^{-d}\int_{\mathbb R^d}|K(z)|^2\,d\mathcal L^d(z). \end{align*} Combining this with the finite-overlap inequality gives \begin{align*} \int_{\mathcal X}|k_{h,x}(y)|^2\,d\mathcal L^d(x) \le N_Kh^{-d}\|K\|_{L^2(\mathbb R^d)}^2. \end{align*} Integrating in $y$ over $\mathcal X$ contributes the factor $\mathcal L^d(\mathcal X)=1$, so \begin{align*} \int_{\mathcal X}\operatorname{Var}_f(\hat f_h(x))\,d\mathcal L^d(x) \le \frac{BN_K\|K\|_{L^2(\mathbb R^d)}^2}{nh^d}. \end{align*} For a boundary-corrected estimator, the same argument is encoded in the assumed uniform $L^2$ kernel-size bound: the corrected kernel sections have squared $L^2$ mass at most a constant times $h^{-d}$ uniformly in the boundary location. Thus the variance remains of order $(nh^d)^{-1}$. [/guided] [/step] [step:Use the kernel moments to bound the squared bias] In the periodic case, let $f_{\operatorname{per}}:\mathbb T^d\to\mathbb R$ denote the periodic extension of $f$, which belongs to $\mathcal H^s(L;\mathbb T^d)$ by the periodic hypothesis in the statement. The expectation of the estimator is the periodic convolution \begin{align*} m_h(x) = \int_{\mathbb R^d}K(z)f_{\operatorname{per}}(x-hz)\,d\mathcal L^d(z), \end{align*} where $x-hz$ is interpreted in $\mathbb T^d$. Fix $x\in\mathcal X$. Let $r=\lceil s\rceil-1$. The Hölder-space convention for $\mathcal H^s(L;\mathbb T^d)$ supplies an order-$r$ Taylor expansion with a remainder bounded by the $s$-order Hölder modulus, including when $s$ is an integer and $r=s-1$. Taylor's formula with this Hölder remainder on the torus gives \begin{align*} f_{\operatorname{per}}(x-hz) = \sum_{|\beta|\le r} \frac{(-hz)^\beta}{\beta!}D^\beta f_{\operatorname{per}}(x) + R_x(z), \end{align*} where $R_x:\mathbb R^d\to\mathbb R$ denotes the Taylor remainder and satisfies \begin{align*} |R_x(z)| \le C_{s,d}Lh^s|z|^s \end{align*} for a constant $C_{s,d}<\infty$ depending only on $s$ and $d$. Therefore \begin{align*} m_h(x)-f(x) = \int_{\mathbb R^d} K(z)\bigl(f_{\operatorname{per}}(x-hz)-f_{\operatorname{per}}(x)\bigr) \,d\mathcal L^d(z). \end{align*} Expanding the Taylor polynomial inside this integral gives \begin{align*} m_h(x)-f(x) = \sum_{1\le|\beta|\le r} \frac{(-h)^{|\beta|}D^\beta f_{\operatorname{per}}(x)}{\beta!} \int_{\mathbb R^d}z^\beta K(z)\,d\mathcal L^d(z) + \int_{\mathbb R^d}K(z)R_x(z)\,d\mathcal L^d(z). \end{align*} The polynomial terms vanish by the moment assumptions on $K$. Since $K$ is bounded and compactly supported, the moment \begin{align*} M_{K,s} = \int_{\mathbb R^d}|z|^s|K(z)|\,d\mathcal L^d(z) \end{align*} is finite. Hence \begin{align*} |m_h(x)-f(x)| \le C_{s,d}L M_{K,s} h^s. \end{align*} Squaring and integrating over $\mathcal X$ gives \begin{align*} \|m_h-f\|_{L^2(\mathcal X)}^2 \le C_{s,d}^2L^2M_{K,s}^2 h^{2s}. \end{align*} In the boundary-corrected case, the assumed uniform order-$s$ bias bound gives the same conclusion, with the constant replaced by the corresponding boundary-correction constant. [/step] [step:Balance the variance and bias terms] Combining the preceding estimates, there is a constant $C_0<\infty$, depending only on $s,d,L,B,K$ and on the boundary correction convention, such that \begin{align*} \mathbb E_f^{(n)} \left[ \|\hat f_h-f\|_{L^2(\mathcal X)}^2 \right] \le C_0\left(h^{2s}+\frac{1}{nh^d}\right) \end{align*} for every $f\in\mathcal F$. Taking the supremum over $f\in\mathcal F$ preserves the same upper bound. By the two-sided bandwidth hypothesis in the statement, there exist fixed constants $a,b>0$ such that \begin{align*} a n^{-1/(2s+d)} \le h \le b n^{-1/(2s+d)}. \end{align*} Thus \begin{align*} h^{2s} \le b^{2s}n^{-2s/(2s+d)} \end{align*} and \begin{align*} \frac{1}{nh^d} \le a^{-d}n^{-1+d/(2s+d)} = a^{-d}n^{-2s/(2s+d)}. \end{align*} Therefore, with \begin{align*} C=C_0(b^{2s}+a^{-d}), \end{align*} we obtain \begin{align*} \sup_{f\in\mathcal F} \mathbb E_f^{(n)} \left[ \|\hat f_h-f\|_{L^2(\mathcal X)}^2 \right] \le C n^{-2s/(2s+d)}. \end{align*} This is the asserted uniform upper bound. [/step]

Prerequisites (0/5 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Explore Further

What brings you to Androma?

Start with a route through the knowledge graph.