Pointwise Bias and Variance Expansion for the Nadaraya-Watson Estimator

Pointwise Bias and Variance Expansion for the Nadaraya-Watson Estimator (Theorem # 6348)

Proof

[proofplan] We first show that the kernel denominator is positive with exponentially high probability, using the uniform lower bound of $K$ near $0$ and the positivity of $f_X(x)$. We then replace the random ratio by the ratio of its expectations and prove that the error is $o(h^2)$, where the condition $nh^3\to\infty$ makes denominator fluctuations smaller than the deterministic second-order bias. Finally, Taylor expansion of the numerator and denominator gives the bias constant, and a first-order linearisation of the ratio gives the variance constant. [/proofplan] [step:Bound the probability that the denominator vanishes] For each $h>0$, define the rescaled kernel map $K_h:\mathbb R\to[0,\infty)$ by \begin{align*} K_h(u):=h^{-1}K(u/h). \end{align*} Since $K$ is uniformly positive on a neighbourhood of $0$, choose constants $a_K>0$ and $c_K>0$ such that $K(u)\ge c_K$ whenever $|u|\le a_K$. For $h>0$ and $i\in\{1,\dots,n\}$, define the kernel weight $Z_{i,h}:\Omega\to[0,\infty)$ by \begin{align*} Z_{i,h}:=K_h(x-X_i)=h^{-1}K((x-X_i)/h). \end{align*} Define the denominator $S_{n,h}:\Omega\to[0,\infty)$ by \begin{align*} S_{n,h}:=\sum_{i=1}^n Z_{i,h}. \end{align*} Since $K\ge0$, the event $A_{n,h}(x)^c$ is contained in the event that no observation lies in $(x-a_Kh,x+a_Kh)$. Let \begin{align*} p_h:=\mathbb P(|X-x|\le a_Kh). \end{align*} Because $f_X$ is continuous at $x$ and $f_X(x)>0$, there are constants $h_0>0$ and $c_x>0$ such that $p_h\ge c_x h$ for $0<h<h_0$. Independence gives \begin{align*} \mathbb P(A_{n,h}(x)^c)\le (1-p_h)^n\le \exp(-np_h)\le \exp(-c_xnh). \end{align*} Thus $\mathbb P(A_{n,h}(x)^c)$ is exponentially small in $nh$. [guided] The denominator can vanish only if every kernel weight is zero. The nonnegativity hypothesis is essential here: with $K\ge0$, no cancellation among positive and negative weights is possible. 
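The chain of bounds $(1-p_h)^n\le\exp(-np_h)\le\exp(-c_xnh)$ stated for this step can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the source: a toy density $f_X(t)=0.5+t$ on $[0,1]$, the point $x=0.5$, and $a_K=0.5$. The function name `miss_probability_bounds` is hypothetical.

```python
import math


def miss_probability_bounds(n, h, x=0.5, a_K=0.5):
    """Return ((1 - p_h)^n, exp(-n p_h), exp(-c_x n h)) for a toy design.

    Assumption (illustrative only): f_X(t) = 0.5 + t on [0, 1]. For this
    linear density, p_h = P(|X - x| <= a_K h) = 2 a_K h f_X(x) exactly,
    as long as the window [x - a_K h, x + a_K h] stays inside [0, 1].
    """
    f_x = 0.5 + x
    p_h = 2.0 * a_K * h * f_x      # exact window probability for the toy density
    c_x = a_K * f_x                # the constant c_x = a_K f_X(x) from the proof
    exact = (1.0 - p_h) ** n       # probability that no observation hits the window
    return exact, math.exp(-n * p_h), math.exp(-c_x * n * h)


for n in (10, 100, 1000):
    exact, middle, loose = miss_probability_bounds(n, h=0.1)
    print(n, exact, middle, loose)
```

In this toy setting each printed triple is nondecreasing from left to right, and all three quantities shrink rapidly as $nh$ grows.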
Since $K(u)\ge c_K>0$ whenever $|u|\le a_K$, any observation satisfying $|X_i-x|\le a_Kh$ contributes the positive amount at least $h^{-1}c_K$ to $S_{n,h}$. Therefore \begin{align*} A_{n,h}(x)^c\subseteq \bigcap_{i=1}^n\{|X_i-x|>a_Kh\}. \end{align*} Define $p_h:=\mathbb P(|X-x|\le a_Kh)$. Since $f_X$ is continuous at $x$ and $f_X(x)>0$, shrinking $h_0>0$ if necessary gives $f_X(t)\ge f_X(x)/2$ for $|t-x|\le a_Kh_0$. Hence, for $0<h<h_0$, \begin{align*} p_h=\int_{x-a_Kh}^{x+a_Kh} f_X(t)\,d\mathcal L^1(t)\ge a_K f_X(x)h. \end{align*} Set $c_x:=a_Kf_X(x)$. The variables $X_1,\dots,X_n$ are independent, so \begin{align*} \mathbb P(A_{n,h}(x)^c)\le (1-p_h)^n. \end{align*} For $0\le r\le1$, the elementary exponential inequality $1-r\le e^{-r}$ follows from convexity of the exponential function. Applying this with $r=p_h$ gives \begin{align*} \mathbb P(A_{n,h}(x)^c)\le \exp(-np_h)\le \exp(-c_xnh). \end{align*} This proves the claimed exponential smallness. [/guided] [/step] [step:Linearise the random ratio around its deterministic numerator and denominator] Define the random numerator $T_{n,h}:\Omega\to\mathbb R$ by \begin{align*} T_{n,h}:=\sum_{i=1}^n Y_iZ_{i,h}. \end{align*} Define the deterministic quantities $a_h\in\mathbb R$, $b_h\in\mathbb R$, and $r_h\in\mathbb R$ by \begin{align*} a_h:=\mathbb E[Z_{1,h}],\qquad b_h:=\mathbb E[Y_1Z_{1,h}],\qquad r_h:=b_h/a_h. \end{align*} Let $R_K>0$ be such that $\operatorname{supp}K\subset[-R_K,R_K]$, and let $M_K:=\sup_{u\in\mathbb R}|K(u)|$. Fix the regular conditional version near $x$ used to define $m(t)=\mathbb E[Y\mid X=t]$ and $\sigma^2(t)=\operatorname{Var}(Y\mid X=t)$. The local second-moment hypothesis gives $h_0>0$ and $M_2<\infty$ such that $\mathbb E[Y^2\mid X=t]\le M_2$ whenever $|t-x|\le R_Kh_0$. Since $K$ is compactly supported and $f_X$ is continuous at $x$, $a_h\to f_X(x)>0$, so after decreasing $h_0$ we have $a_h\ge f_X(x)/2$ for $0<h<h_0$. 
For $0<h<h_0$, the support condition implies $Z_{1,h}=0$ unless $|X_1-x|\le R_Kh$. We use the [law of total expectation](/theorems/1121) with respect to the regular conditional law of $Y$ given $X=t$, and then use that $X$ has density $f_X$ with respect to $\mathcal L^1$. Hence \begin{align*} \mathbb E[Z_{1,h}^2] &=h^{-1}\int_{\mathbb R}K(u)^2f_X(x-hu)\,d\mathcal L^1(u)=O(h^{-1}), \end{align*} and \begin{align*} \mathbb E[Y_1^2Z_{1,h}^2] &=h^{-1}\int_{\mathbb R}K(u)^2\mathbb E[Y^2\mid X=x-hu]f_X(x-hu)\,d\mathcal L^1(u)=O(h^{-1}). \end{align*} Therefore \begin{align*} \mathbb E[(S_{n,h}-na_h)^2]=O(nh^{-1}). \end{align*} Also, \begin{align*} \mathbb E[(T_{n,h}-nb_h)^2]=O(nh^{-1}). \end{align*} We verify the inputs for the displayed scalar concentration bound. Since $|K|\le M_K$, each centered summand satisfies \begin{align*} |Z_{i,h}-a_h|\le h^{-1}M_K+a_h\le C_Zh^{-1} \end{align*} for a constant $C_Z>0$. Also $\operatorname{Var}(Z_{i,h})\le\mathbb E[Z_{i,h}^2]\le C_Zh^{-1}$, so \begin{align*} \sum_{i=1}^n\operatorname{Var}(Z_{i,h})\le C_Znh^{-1}. \end{align*} We use the following scalar [Bernstein Inequality](/theorems/1200): if independent centered variables are bounded in absolute value by $M>0$ and have total variance at most $v>0$, then for every $t>0$ their sum exceeds $t$ in absolute value with probability at most \begin{align*} 2\exp\left(-\frac{t^2}{2(v+Mt/3)}\right). \end{align*} Applying this bound to the independent centered variables $Z_{i,h}-a_h$, with $M=C_Zh^{-1}$, $v=C_Znh^{-1}$, and $t=na_h/2$, and using $a_h\ge f_X(x)/2$, gives constants $c_1,c_2>0$ depending only on $x$, $K$, and an upper bound for $f_X$ near $x$ such that \begin{align*} \mathbb P\left(|S_{n,h}-na_h|>\frac{na_h}{2}\right)\le c_1\exp(-c_2nh). \end{align*} Let \begin{align*} B_{n,h}:=\left\{|S_{n,h}-na_h|\le \frac{na_h}{2}\right\}. 
\end{align*} On $B_{n,h}$, $S_{n,h}\ge na_h/2$, and the algebraic identity \begin{align*} \frac{T_{n,h}}{S_{n,h}}-r_h =\frac{T_{n,h}-r_hS_{n,h}}{na_h} -\frac{(T_{n,h}-r_hS_{n,h})(S_{n,h}-na_h)}{na_hS_{n,h}} \end{align*} holds. Since $\mathbb E[T_{n,h}-r_hS_{n,h}]=nb_h-r_hna_h=0$, taking expectations on $B_{n,h}$ and subtracting the missing part of the centered linear term gives \begin{align*} \left|\mathbb E\left[\left(\frac{T_{n,h}}{S_{n,h}}-r_h\right)\mathbb 1_{B_{n,h}}\right]\right| \le \frac{2}{n^2a_h^2}\mathbb E\left[|T_{n,h}-r_hS_{n,h}|\,|S_{n,h}-na_h|\right]+\frac{1}{na_h}\mathbb E\left[|T_{n,h}-r_hS_{n,h}|\mathbb 1_{B_{n,h}^c}\right]. \end{align*} Applying the [Cauchy-Schwarz Inequality](/theorems/1201), in the [Hilbert space](/page/Hilbert%20Space) $L^2(\Omega)$, to the two centered sums and to the indicator term gives \begin{align*} \left|\mathbb E\left[\left(\frac{T_{n,h}}{S_{n,h}}-r_h\right)\mathbb 1_{B_{n,h}}\right]\right| \le C(nh)^{-1}+C(nh)^{-1/2}\mathbb P(B_{n,h}^c)^{1/2}. \end{align*} The exponential bound for $B_{n,h}^c$ makes the second term $O((nh)^{-1})$, so the contribution from $B_{n,h}$ is $O((nh)^{-1})$. Here the constant $C>0$ is chosen large enough to dominate the preceding second-moment bounds; it depends only on $x$, $K$, $f_X$, and the local bounded second-moment hypothesis for $Y$ near $x$. It remains to bound the actual ratio on $B_{n,h}^c$. Choose $\delta>0$ and $M_2<\infty$ such that $\mathbb E[Y^2\mid X=t]\le M_2$ whenever $|t-x|<\delta$, where $m(t)=\mathbb E[Y\mid X=t]$ is the regression function. Since $K$ is compactly supported, for all sufficiently small $h$ every nonzero weight $K_h(x-X_i)$ has $|X_i-x|<\delta$. Because $K\ge0$, on $A_{n,h}(x)$ the estimator is a weighted average of those $Y_i$ with nonzero weight, so [Jensen's inequality](/theorems/9) for the finite probability weights gives \begin{align*} \hat m_{NW}(x)^2\le \frac{\sum_{i=1}^n K_h(x-X_i)Y_i^2}{\sum_{i=1}^n K_h(x-X_i)}. 
\end{align*} Conditioning on $X_1,\dots,X_n$ and using the local second-moment bound gives $\mathbb E[\hat m_{NW}(x)^2\mid X_1,\dots,X_n]\le M_2$ on $A_{n,h}(x)$, while $\hat m_{NW}(x)=0$ on $A_{n,h}(x)^c$. Since $r_h\to m(x)$, there is $M_r<\infty$ such that $|r_h|\le M_r$ for all sufficiently small $h$. Therefore the [Cauchy-Schwarz inequality](/theorems/432) gives \begin{align*} \left|\mathbb E\left[(\hat m_{NW}(x)-r_h)\mathbb 1_{B_{n,h}^c}\right]\right| \le \left(2M_2+2M_r^2\right)^{1/2}\mathbb P(B_{n,h}^c)^{1/2}=O((nh)^{-1}). \end{align*} Thus \begin{align*} \mathbb E[\hat m_{NW}(x)]-r_h=O((nh)^{-1}). \end{align*} Since $nh^3\to\infty$, we have $(nh)^{-1}=o(h^2)$, and therefore \begin{align*} \mathbb E[\hat m_{NW}(x)]-r_h=o(h^2). \end{align*} [/step] [step:Expand the deterministic numerator and denominator to second order] Choose an open interval $I\subset\mathbb R$ containing $x$ on which $m$ and $f_X$ are twice continuously differentiable. Define the product map $g:I\to\mathbb R$ by $g(t)=m(t)f_X(t)$. Since $m$ and $f_X$ are twice continuously differentiable on $I$, the function $g$ is twice continuously differentiable on $I$. By the law of total expectation applied to the regular conditional law of $Y$ given $X=t$, and using the density $f_X$ of $X$, the compact support of $K$ permits the change of variables $t=x-hu$, with $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$, giving \begin{align*} a_h=\int_{\mathbb R}K(u)f_X(x-hu)\,d\mathcal L^1(u). \end{align*} The [Taylor Theorem With Remainder](/theorems/1202), applied at $x$, together with symmetry of $K$ and $\int_{\mathbb R}K(u)\,d\mathcal L^1(u)=1$, gives \begin{align*} a_h=f_X(x)+\frac{h^2\mu_2(K)}{2}f_X''(x)+o(h^2). \end{align*} Similarly, \begin{align*} b_h=\int_{\mathbb R}K(u)g(x-hu)\,d\mathcal L^1(u)=g(x)+\frac{h^2\mu_2(K)}{2}g''(x)+o(h^2). 
\end{align*} Dividing the two expansions and using $g(x)=m(x)f_X(x)$ yields \begin{align*} r_h-m(x)=\frac{h^2\mu_2(K)}{2}\left(\frac{g''(x)}{f_X(x)}-m(x)\frac{f_X''(x)}{f_X(x)}\right)+o(h^2). \end{align*} Since $g''=m''f_X+2m'f_X'+mf_X''$, this becomes \begin{align*} r_h-m(x)=\frac{h^2\mu_2(K)}{2}\left(m''(x)+2m'(x)\frac{f_X'(x)}{f_X(x)}\right)+o(h^2). \end{align*} Combining this with $\mathbb E[\hat m_{NW}(x)]-r_h=o(h^2)$ proves the bias expansion. [guided] The deterministic ratio $r_h=b_h/a_h$ is the ratio obtained by replacing the random numerator and denominator by their expectations. Choose an open interval $I\subset\mathbb R$ containing $x$ on which $m$ and $f_X$ are twice continuously differentiable, and define the product map $g:I\to\mathbb R$ by $g(t)=m(t)f_X(t)$. Then $g$ is twice continuously differentiable on $I$ by the product rule. For all sufficiently small $h$, the compact support of $K$ ensures that $x-hu\in I$ for every $u\in\operatorname{supp}K$. We first condition on $X=t$ and use the regular conditional law of $Y$ given $X=t$; the law of total expectation then writes expectations involving $Y$ as integrals against the density $f_X(t)\,d\mathcal L^1(t)$. Applying the change of variables $t=x-hu$, so that $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$, gives \begin{align*} a_h=\int_{\mathbb R}K(u)f_X(x-hu)\,d\mathcal L^1(u) \end{align*} and \begin{align*} b_h=\int_{\mathbb R}K(u)g(x-hu)\,d\mathcal L^1(u). \end{align*} The [Taylor Theorem With Remainder](/theorems/1202), applied at $x$ to the one-dimensional function $f_X:I\to\mathbb R$, gives \begin{align*} f_X(x-hu)=f_X(x)-hu f_X'(x)+\frac{h^2u^2}{2}f_X''(x)+o(h^2u^2) \end{align*} uniformly for $u\in\operatorname{supp}K$, because $\operatorname{supp}K$ is compact and $f_X''$ is continuous at $x$. Since $K$ is symmetric, $\int_{\mathbb R}uK(u)\,d\mathcal L^1(u)=0$, and since $\int_{\mathbb R}K(u)\,d\mathcal L^1(u)=1$, we obtain \begin{align*} a_h=f_X(x)+\frac{h^2\mu_2(K)}{2}f_X''(x)+o(h^2). 
\end{align*} The same argument applied to $g$ gives \begin{align*} b_h=g(x)+\frac{h^2\mu_2(K)}{2}g''(x)+o(h^2). \end{align*} Because $f_X(x)>0$, division of the two expansions is valid and yields \begin{align*} r_h-m(x)=\frac{h^2\mu_2(K)}{2}\left(\frac{g''(x)}{f_X(x)}-m(x)\frac{f_X''(x)}{f_X(x)}\right)+o(h^2). \end{align*} Finally, differentiating $g=mf_X$ twice gives \begin{align*} g''(x)=m''(x)f_X(x)+2m'(x)f_X'(x)+m(x)f_X''(x). \end{align*} Substitution cancels the final $m(x)f_X''(x)$ term and gives \begin{align*} r_h-m(x)=\frac{h^2\mu_2(K)}{2}\left(m''(x)+2m'(x)\frac{f_X'(x)}{f_X(x)}\right)+o(h^2). \end{align*} The previous step proved $\mathbb E[\hat m_{NW}(x)]-r_h=o(h^2)$, so the same expansion holds for $\mathbb E[\hat m_{NW}(x)]-m(x)$. [/guided] [/step] [step:Compute the leading variance from the linearised centered numerator] Define $U_{i,h}:\Omega\to\mathbb R$ by $U_{i,h}:=(Y_i-r_h)Z_{i,h}$ for $i\in\{1,\dots,n\}$. On $B_{n,h}\cap A_{n,h}(x)$, the ratio identity from the preceding step gives \begin{align*} \hat m_{NW}(x)-r_h =\frac{1}{na_h}\sum_{i=1}^n U_{i,h}+R_{n,h}, \end{align*} where \begin{align*} R_{n,h}:=-\frac{\left(\sum_{i=1}^n U_{i,h}\right)(S_{n,h}-na_h)}{na_hS_{n,h}}. \end{align*} Let $V_{n,h}:=\sum_{i=1}^n U_{i,h}$ and $W_{n,h}:=S_{n,h}-na_h$. On $B_{n,h}$, $S_{n,h}\ge na_h/2$, so \begin{align*} R_{n,h}^2\mathbb 1_{B_{n,h}} \le \frac{4V_{n,h}^2W_{n,h}^2}{n^4a_h^4}. \end{align*} We estimate this remainder using conditional second moments, without requiring fourth moments of $Y$. Conditional on $X_1,\dots,X_n$, the quantity $W_{n,h}$ is fixed. The summands in $V_{n,h}$ are not conditionally centered at $r_h$, so we separate their conditional means. Define the conditional noise variables $\varepsilon_i:=Y_i-m(X_i)$ and the deterministic conditional drift \begin{align*} D_{n,h}:=\sum_{i=1}^n (m(X_i)-r_h)Z_{i,h}. 
\end{align*} Then $V_{n,h}=\sum_{i=1}^n \varepsilon_iZ_{i,h}+D_{n,h}$, and the conditional independence of the observations, together with $\mathbb E[\varepsilon_i\mid X_1,\dots,X_n]=0$, gives \begin{align*} \mathbb E\left[\left(\sum_{i=1}^n \varepsilon_iZ_{i,h}\right)^2\mid X_1,\dots,X_n\right] \le C_0\sum_{i=1}^n Z_{i,h}^2 \end{align*} for a constant $C_0>0$ depending only on the local conditional variance bound. Since $m$ is continuously differentiable near $x$ and $r_h=m(x)+O(h^2)$ by the deterministic expansion, there is $C_m>0$ such that $|m(t)-r_h|\le C_mh$ whenever $|t-x|\le R_Kh$ and $h$ is sufficiently small. Hence \begin{align*} D_{n,h}^2\le C_m^2h^2S_{n,h}^2. \end{align*} Since $0\le K\le M_K$, we have $Z_{i,h}^2\le M_Kh^{-1}Z_{i,h}$ and therefore $\sum_i Z_{i,h}^2\le M_Kh^{-1}S_{n,h}$. On $B_{n,h}$, also $S_{n,h}\le 3na_h/2$, so the inequality $(a+b)^2\le2a^2+2b^2$ gives \begin{align*} \mathbb E[V_{n,h}^2\mid X_1,\dots,X_n]\mathbb 1_{B_{n,h}} \le C_1(nh^{-1}+n^2h^2)\mathbb 1_{B_{n,h}} \end{align*} for a constant $C_1>0$. Hence \begin{align*} \mathbb E[R_{n,h}^2\mathbb 1_{B_{n,h}}] \le C_2\left(\frac{1}{n^3h}+\frac{h^2}{n^2}\right)\mathbb E[W_{n,h}^2]. \end{align*} The denominator calculation from the preceding step gives $\mathbb E[W_{n,h}^2]=O(nh^{-1})$, so \begin{align*} \mathbb E[R_{n,h}^2\mathbb 1_{B_{n,h}}] =O(n^{-2}h^{-2})+O(h/n)=o((nh)^{-1}), \end{align*} because $nh\to\infty$ and $h\to0$. Define the linearised leading term \begin{align*} L_{n,h}:=\frac{1}{na_h}\sum_{i=1}^n U_{i,h}=\frac{V_{n,h}}{na_h}. \end{align*} It remains to control the complement of $B_{n,h}$. On $A_{n,h}(x)$, all nonzero weights are nonnegative and come from observations with $|X_i-x|\le R_Kh$. For sufficiently small $h$, the local second-moment assumption gives $M_2<\infty$ such that $\mathbb E[Y_i^2\mid X_i=t]\le M_2$ throughout this local support. 
Since $\hat m_{NW}(x)$ is a weighted average on $A_{n,h}(x)$, [Jensen's inequality](/theorems/1977) for the finite probability weights and conditioning on $X_1,\dots,X_n$ give $\mathbb E[\hat m_{NW}(x)^2\mid X_1,\dots,X_n]\le M_2$ on $A_{n,h}(x)$, while $\hat m_{NW}(x)=0$ on $A_{n,h}(x)^c$. Since $r_h\to m(x)$, define $M_r<\infty$ so that $|r_h|\le M_r$ for all sufficiently small $h$. Therefore \begin{align*} \mathbb E[(\hat m_{NW}(x)-r_h)^2\mathbb 1_{B_{n,h}^c}] \le (2M_2+2M_r^2)\mathbb P(B_{n,h}^c) =o((nh)^{-1}) \end{align*} by the exponential bound for $B_{n,h}^c$. The leading term must also be controlled on this same bad event. Using the corrected conditional estimate \begin{align*} \mathbb E[V_{n,h}^2\mid X_1,\dots,X_n] \le C\left(h^{-1}S_{n,h}+h^2S_{n,h}^2\right) \end{align*} and the definition $L_{n,h}=V_{n,h}/(na_h)$, we obtain \begin{align*} \mathbb E[L_{n,h}^2\mathbb 1_{B_{n,h}^c}] \le \frac{C}{n^2}\left(h^{-1}\mathbb E[S_{n,h}\mathbb 1_{B_{n,h}^c}]+h^2\mathbb E[S_{n,h}^2\mathbb 1_{B_{n,h}^c}]\right). \end{align*} By the [Cauchy-Schwarz Inequality](/theorems/1201), \begin{align*} \mathbb E[S_{n,h}\mathbb 1_{B_{n,h}^c}] \le \mathbb E[S_{n,h}^2]^{1/2}\mathbb P(B_{n,h}^c)^{1/2}. \end{align*} Again by the same inequality, \begin{align*} \mathbb E[S_{n,h}^2\mathbb 1_{B_{n,h}^c}] \le \mathbb E[S_{n,h}^4]^{1/2}\mathbb P(B_{n,h}^c)^{1/2}. \end{align*} The boundedness and compact support of $K$ give polynomial bounds for $\mathbb E[S_{n,h}^2]$ and $\mathbb E[S_{n,h}^4]$, while $\mathbb P(B_{n,h}^c)$ is exponentially small in $nh$. Therefore \begin{align*} \mathbb E[L_{n,h}^2\mathbb 1_{B_{n,h}^c}]=o((nh)^{-1}). \end{align*} Combining the good-event remainder estimate with the two bad-event estimates and the inequality $(a+b)^2\le2a^2+2b^2$ gives \begin{align*} \mathbb E[(\hat m_{NW}(x)-r_h-L_{n,h})^2]=o((nh)^{-1}). \end{align*} Also $\mathbb E[L_{n,h}^2]=O((nh)^{-1})$. 
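The order comparisons used for the good-event remainder, namely that $n^{-2}h^{-2}$ and $h/n$ are both $o((nh)^{-1})$, can be illustrated with simple arithmetic. The sketch below assumes the hypothetical bandwidth sequence $h=n^{-1/5}$, which is not prescribed by the source but satisfies $h\to0$, $nh\to\infty$, and $nh^3\to\infty$; the function name is likewise illustrative.

```python
def remainder_to_leading_ratio(n, exponent=0.2):
    """Ratio of n^{-2} h^{-2} + h/n to (nh)^{-1} for h = n^{-exponent}.

    Assumption (illustrative only): the bandwidth is h = n^{-exponent}.
    Algebraically the ratio equals 1/(nh) + h^2, which tends to 0
    whenever nh -> infinity and h -> 0.
    """
    h = n ** (-exponent)
    remainder = 1.0 / (n * h) ** 2 + h / n   # O(n^{-2} h^{-2}) + O(h / n)
    leading = 1.0 / (n * h)                  # the (nh)^{-1} variance scale
    return remainder / leading


for n in (10**2, 10**4, 10**6):
    print(n, remainder_to_leading_ratio(n))
```

The printed ratios decrease with $n$, consistent with the remainder being of smaller order than the leading variance term.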
Applying the [Cauchy-Schwarz Inequality](/theorems/1201) in $L^2(\Omega)$ to the covariance term gives \begin{align*} \left|\mathbb E[L_{n,h}(\hat m_{NW}(x)-r_h-L_{n,h})]\right|=o((nh)^{-1}). \end{align*} The preceding bias estimate gives $\mathbb E[\hat m_{NW}(x)]-r_h=O((nh)^{-1})$, so \begin{align*} \left(\mathbb E[\hat m_{NW}(x)]-r_h\right)^2=O((nh)^{-2})=o((nh)^{-1}) \end{align*} because $nh\to\infty$. Since subtracting the deterministic constant $r_h$ does not change variance, this yields \begin{align*} \operatorname{Var}(\hat m_{NW}(x))=\operatorname{Var}(L_{n,h})+o((nh)^{-1}) \end{align*} and therefore \begin{align*} \operatorname{Var}(\hat m_{NW}(x))=\frac{1}{n^2a_h^2}\operatorname{Var}\left(\sum_{i=1}^n U_{i,h}\right)+o((nh)^{-1}). \end{align*} The variables $U_{1,h},\dots,U_{n,h}$ are independent and identically distributed, so \begin{align*} \operatorname{Var}\left(\sum_{i=1}^n U_{i,h}\right)=n\operatorname{Var}(U_{1,h}). \end{align*} Moreover $\mathbb E[U_{1,h}]=0$ by the definition $r_h=b_h/a_h$, hence $\operatorname{Var}(U_{1,h})=\mathbb E[U_{1,h}^2]$. Choose a regular conditional version near $x$ for which $t\mapsto\mathbb E[(Y-m(t))^2\mid X=t]$ is defined at $x$ and in a neighbourhood of $x$. Choose an open interval $I\subset\mathbb R$ containing $x$ on which this version is defined, and define the conditional variance map $v:I\to[0,\infty)$ by $v(t)=\mathbb E[(Y-m(t))^2\mid X=t]$. Then $v(x)=\sigma^2(x)$. Using the law of total expectation with respect to the regular conditional law of $Y$ given $X=t$, and then making the change of variables $t=x-hu$, with $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$, gives \begin{align*} \mathbb E[U_{1,h}^2]=\frac{1}{h}\int_{\mathbb R}K(u)^2\mathbb E[(Y-r_h)^2\mid X=x-hu]f_X(x-hu)\,d\mathcal L^1(u). \end{align*} For $u\in\operatorname{supp}K$ and all sufficiently small $h$, the point $x-hu$ lies in $I$. 
For every $t\in I$, the identity $\mathbb E[Y\mid X=t]=m(t)$ gives \begin{align*} \mathbb E[(Y-r_h)^2\mid X=t]=\sigma^2(t)+(m(t)-r_h)^2. \end{align*} Since $r_h\to m(x)$, $m$ is continuous at $x$, $\sigma^2$ is continuous at $x$, and $f_X$ is continuous at $x$, the integrand converges pointwise to \begin{align*} K(u)^2\sigma^2(x)f_X(x). \end{align*} The local bound on $\mathbb E[Y^2\mid X=t]$, boundedness of $m$ near $x$, boundedness of $f_X$ near $x$, and compact support of $K$ give an integrable dominating function $C_3K(u)^2\mathbb 1_{[-R_K,R_K]}(u)$ with respect to $\mathcal L^1$, where $C_3>0$ is chosen large enough to dominate these local bounds and depends only on the same local quantities. By the [Dominated Convergence Theorem](/theorems/4), \begin{align*} h\mathbb E[U_{1,h}^2]\to \sigma^2(x)f_X(x)R(K). \end{align*} Since $a_h\to f_X(x)>0$, we obtain \begin{align*} \operatorname{Var}(\hat m_{NW}(x))=\frac{1}{nh}\frac{\sigma^2(x)}{f_X(x)}R(K)+o((nh)^{-1}). \end{align*} This is the stated variance expansion. [guided] The variance calculation has two separate tasks: first show that replacing the ratio by its first-order linearisation costs only $o((nh)^{-1})$, and then compute the variance of that linearised term. Define \begin{align*} U_{i,h}:=(Y_i-r_h)Z_{i,h} \end{align*} for $i\in\{1,\dots,n\}$. On the good denominator event $B_{n,h}\cap A_{n,h}(x)$, the exact algebraic identity gives \begin{align*} \hat m_{NW}(x)-r_h =\frac{1}{na_h}\sum_{i=1}^n U_{i,h} -\frac{\left(\sum_{i=1}^n U_{i,h}\right)(S_{n,h}-na_h)}{na_hS_{n,h}}. \end{align*} Set $V_{n,h}:=\sum_{i=1}^n U_{i,h}$ and $W_{n,h}:=S_{n,h}-na_h$, and write $R_{n,h}$ for the second term on the right-hand side. Since $S_{n,h}\ge na_h/2$ on $B_{n,h}$, the remainder satisfies \begin{align*} R_{n,h}^2\mathbb 1_{B_{n,h}} \le \frac{4V_{n,h}^2W_{n,h}^2}{n^4a_h^4}. \end{align*} The local second-moment hypothesis is enough if we condition on the design points, but the summands are not conditionally centered at $r_h$. 
Define $\varepsilon_i:=Y_i-m(X_i)$ and \begin{align*} D_{n,h}:=\sum_{i=1}^n(m(X_i)-r_h)Z_{i,h}. \end{align*} Then $V_{n,h}=\sum_i\varepsilon_iZ_{i,h}+D_{n,h}$. Conditional on $X_1,\dots,X_n$, the quantity $W_{n,h}$ is fixed, and conditional independence gives \begin{align*} \mathbb E\left[\left(\sum_{i=1}^n\varepsilon_iZ_{i,h}\right)^2\mid X_1,\dots,X_n\right]\le C_0\sum_{i=1}^n Z_{i,h}^2. \end{align*} Because $m$ is continuously differentiable near $x$ and $r_h=m(x)+O(h^2)$, we also have $|m(t)-r_h|\le C_mh$ on the local kernel support, hence $D_{n,h}^2\le C_m^2h^2S_{n,h}^2$. Since $K$ is bounded and nonnegative, $Z_{i,h}^2\le M_Kh^{-1}Z_{i,h}$, so $\sum_i Z_{i,h}^2\le M_Kh^{-1}S_{n,h}$. On $B_{n,h}$, $S_{n,h}\le 3na_h/2$, and therefore \begin{align*} \mathbb E[V_{n,h}^2\mid X_1,\dots,X_n]\mathbb 1_{B_{n,h}}\le C_1(nh^{-1}+n^2h^2)\mathbb 1_{B_{n,h}}. \end{align*} Using this in the displayed remainder bound gives \begin{align*} \mathbb E[R_{n,h}^2\mathbb 1_{B_{n,h}}] \le C_2\left(\frac{1}{n^3h}+\frac{h^2}{n^2}\right)\mathbb E[W_{n,h}^2] =O(n^{-2}h^{-2})+O(h/n)=o((nh)^{-1}), \end{align*} because $\mathbb E[W_{n,h}^2]=O(nh^{-1})$, $nh\to\infty$, and $h\to0$. We must also control the event $B_{n,h}^c$, where the denominator might be small. On this event, a second-moment bound for the numerator alone would not control the ratio. Instead, use nonnegativity of the weights: on $A_{n,h}(x)$, the estimator is a weighted average of those $Y_i$ whose $X_i$ lie within the compact local support of the kernel. Jensen's inequality for these finite probability weights gives \begin{align*} \hat m_{NW}(x)^2\le \frac{\sum_{i=1}^n K_h(x-X_i)Y_i^2}{\sum_{i=1}^n K_h(x-X_i)}. \end{align*} After conditioning on $X_1,\dots,X_n$, the local second-moment bound makes the [conditional expectation](/page/Conditional%20Expectation) of the right-hand side uniformly bounded. On $A_{n,h}(x)^c$ the estimator is defined to be $0$. 
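The weighted-average structure and the convention $\hat m_{NW}(x)=0$ on $A_{n,h}(x)^c$ can be made concrete with a small sketch. It assumes the Epanechnikov kernel and a made-up data set, neither of which comes from the source; `nw_estimate` is a hypothetical helper that returns both the estimate and the weighted mean of the $Y_i^2$, so the Jensen bound can be checked directly.

```python
def epanechnikov(u):
    """Epanechnikov kernel: nonnegative, symmetric, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nw_estimate(x, xs, ys, h, kernel=epanechnikov):
    """Return (Nadaraya-Watson estimate at x, weighted mean of Y_i^2).

    Both values are set to 0 when the kernel denominator vanishes,
    mirroring the convention used on the complement of A_{n,h}(x).
    """
    weights = [kernel((x - xi) / h) / h for xi in xs]
    denom = sum(weights)
    if denom == 0.0:
        return 0.0, 0.0
    estimate = sum(w * y for w, y in zip(weights, ys)) / denom
    mean_square = sum(w * y * y for w, y in zip(weights, ys)) / denom
    return estimate, mean_square


# Made-up illustrative data (not from the source).
xs = [0.1, 0.4, 0.45, 0.55, 0.9]
ys = [1.0, -2.0, 3.0, 0.5, 10.0]

estimate, mean_square = nw_estimate(0.5, xs, ys, h=0.2)
print(estimate ** 2 <= mean_square)        # Jensen bound for the weighted average
print(nw_estimate(0.5, xs, ys, h=0.01))    # no point in the window: zero convention
```

Only the observations with $|X_i-x|\le h$ receive nonzero weight here, so the distant value $Y=10$ does not affect the estimate at $x=0.5$ when $h=0.2$.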
Since $r_h\to m(x)$, the squared error on $B_{n,h}^c$ is bounded in expectation by a fixed constant times $\mathbb P(B_{n,h}^c)$, which is exponentially small by Bernstein's inequality. Thus the bad-denominator contribution from $\hat m_{NW}(x)-r_h$ is $o((nh)^{-1})$. We must also control the leading linearised term on $B_{n,h}^c$, because the ratio identity was only used on $B_{n,h}$. If $L_{n,h}:=(na_h)^{-1}\sum_{i=1}^n U_{i,h}=V_{n,h}/(na_h)$, then the corrected conditional estimate gives \begin{align*} \mathbb E[L_{n,h}^2\mathbb 1_{B_{n,h}^c}] \le \frac{C}{n^2}\left(h^{-1}\mathbb E[S_{n,h}\mathbb 1_{B_{n,h}^c}]+h^2\mathbb E[S_{n,h}^2\mathbb 1_{B_{n,h}^c}]\right). \end{align*} Applying the [Cauchy-Schwarz Inequality](/theorems/1201) to $S_{n,h}$ and $\mathbb 1_{B_{n,h}^c}$, and again to $S_{n,h}^2$ and $\mathbb 1_{B_{n,h}^c}$, reduces the two terms to polynomial moments of $S_{n,h}$ multiplied by $\mathbb P(B_{n,h}^c)^{1/2}$. These polynomial moments are finite with polynomial growth because $K$ is bounded and compactly supported, while the Bernstein bound for $B_{n,h}^c$ is exponentially small in $nh$. Therefore \begin{align*} \mathbb E[L_{n,h}^2\mathbb 1_{B_{n,h}^c}]=o((nh)^{-1}). \end{align*} Combining this with the good-event remainder estimate, the bad-event bound for $\hat m_{NW}(x)-r_h$, and $(a+b)^2\le2a^2+2b^2$ shows that $\hat m_{NW}(x)-r_h-L_{n,h}$ has second moment $o((nh)^{-1})$. Since $\mathbb E[L_{n,h}^2]=O((nh)^{-1})$, the [Cauchy-Schwarz Inequality](/theorems/1201) in $L^2(\Omega)$ bounds the covariance between $L_{n,h}$ and the remainder by $o((nh)^{-1})$. Therefore the variance of $\hat m_{NW}(x)$ is the variance of $L_{n,h}$ up to $o((nh)^{-1})$. It remains to compute the leading variance. Independence gives \begin{align*} \operatorname{Var}\left(\sum_{i=1}^n U_{i,h}\right)=n\operatorname{Var}(U_{1,h}), \end{align*} and $\mathbb E[U_{1,h}]=0$ because $r_h=b_h/a_h$. Hence $\operatorname{Var}(U_{1,h})=\mathbb E[U_{1,h}^2]$. 
Using the regular conditional version fixed in the statement, define the conditional variance map $v:I\to[0,\infty)$ near $x$ by $v(t)=\mathbb E[(Y-m(t))^2\mid X=t]$, so that $v(x)=\sigma^2(x)$. Using the law of total expectation with respect to the regular conditional law of $Y$ given $X=t$, and then setting $t=x-hu$ with $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$, \begin{align*} \mathbb E[U_{1,h}^2]=\frac{1}{h}\int_{\mathbb R}K(u)^2\mathbb E[(Y-r_h)^2\mid X=x-hu]f_X(x-hu)\,d\mathcal L^1(u). \end{align*} For $t\in I$, the regression identity $\mathbb E[Y\mid X=t]=m(t)$ gives \begin{align*} \mathbb E[(Y-r_h)^2\mid X=t]=\sigma^2(t)+(m(t)-r_h)^2. \end{align*} The integrand converges pointwise to $K(u)^2\sigma^2(x)f_X(x)$ because $r_h\to m(x)$ and $m$, the conditional variance, and the density are continuous at $x$. The same local boundedness assumptions provide the integrable domination $C_3K(u)^2\mathbb 1_{[-R_K,R_K]}(u)$, where $C_3>0$ is chosen to dominate the local bounds. The [Dominated Convergence Theorem](/theorems/4) gives \begin{align*} h\mathbb E[U_{1,h}^2]\to \sigma^2(x)f_X(x)R(K). \end{align*} Combining this with $a_h\to f_X(x)>0$ gives \begin{align*} \operatorname{Var}(\hat m_{NW}(x))=\frac{1}{nh}\frac{\sigma^2(x)}{f_X(x)}R(K)+o((nh)^{-1}). \end{align*} [/guided] [/step]
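The two leading constants derived in the proof can be sanity-checked by numerical quadrature of the deterministic integrals $a_h$, $b_h$, and $h\,\mathbb E[U_{1,h}^2]$. The sketch below is a toy check under assumptions that are not part of the source: the Epanechnikov kernel (for which $\mu_2(K)=1/5$ and $R(K)=3/5$), the density $f_X(t)=0.5+t$ on $[0,1]$, the regression function $m(t)=t^2$, constant conditional variance $\sigma^2=1$, and $x=0.5$. With these choices the bias constant is $\tfrac{\mu_2(K)}{2}\bigl(m''(x)+2m'(x)f_X'(x)/f_X(x)\bigr)=0.4$ and the variance limit is $\sigma^2(x)f_X(x)R(K)=0.6$. All function names are hypothetical.

```python
X0 = 0.5        # evaluation point x (assumed)
SIGMA2 = 1.0    # constant conditional variance (assumed)


def kernel(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def density(t):
    """Toy design density f_X(t) = 0.5 + t on [0, 1]."""
    return 0.5 + t


def regression(t):
    """Toy regression function m(t) = t^2."""
    return t * t


def midpoint(func, lo=-1.0, hi=1.0, panels=4000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    step = (hi - lo) / panels
    return step * sum(func(lo + (j + 0.5) * step) for j in range(panels))


def leading_terms(h):
    """Return ((r_h - m(x)) / h^2, h * E[U_{1,h}^2]) computed by quadrature."""
    a_h = midpoint(lambda u: kernel(u) * density(X0 - h * u))
    b_h = midpoint(
        lambda u: kernel(u) * regression(X0 - h * u) * density(X0 - h * u)
    )
    r_h = b_h / a_h
    h_second_moment = midpoint(
        lambda u: kernel(u) ** 2
        * (SIGMA2 + (regression(X0 - h * u) - r_h) ** 2)
        * density(X0 - h * u)
    )
    return (r_h - regression(X0)) / h ** 2, h_second_moment


bias_ratio, variance_limit = leading_terms(0.05)
print(bias_ratio, variance_limit)   # compare with the constants 0.4 and 0.6
```

Because the toy $m$ and $f_X$ are polynomials, the bias ratio matches its limiting constant up to quadrature error, while the variance quantity differs from its limit by a term of order $h^2$ coming from $(m(t)-r_h)^2$.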
