Asymptotic Normality of Interior Local Polynomial Regression Estimators with Negligible Projection Residual (Theorem # 6351)

Theorem

Let $p\in\mathbb N\cup\{0\}$ and let $r_p:\mathbb R\to\mathbb R^{p+1}$ be the polynomial basis map $r_p(u)=(1,u,\dots,u^p)^\top$. Let $e_0\in\mathbb R^{p+1}$ be the first coordinate vector. Let $(X_i,Y_i)_{i=1}^n$ be i.i.d. copies of a pair $(X,Y)$ on a probability space $(\Omega,\mathcal F,\mathbb P)$ such that $X$ has Lebesgue density $f_X:\mathbb R\to[0,\infty)$, $Y=m(X)+\varepsilon$, $m:\mathbb R\to\mathbb R$ is measurable, $\varepsilon:\Omega\to\mathbb R$ is integrable conditionally on $X$, $\mathbb E[\varepsilon\mid X]=0$, and $\operatorname{Var}(\varepsilon\mid X=t)=\sigma^2(t)$ for a measurable function $\sigma^2:\mathbb R\to[0,\infty)$. Fix $x\in\mathbb R$ such that $f_X(x)>0$, and assume that $f_X$ is continuous and positive on a neighbourhood of $x$, that $\sigma^2$ is continuous at $x$, and that there are $\delta>0$, a neighbourhood $I_x$ of $x$, and $C_\delta<\infty$ such that $\mathbb E[|\varepsilon|^{2+\delta}\mid X=t]\le C_\delta$ for every $t\in I_x$. Let $K:\mathbb R\to\mathbb R$ be a bounded Borel measurable compactly supported kernel with finite moments through order $2p$, and assume that the matrix $M_p\in\mathbb R^{(p+1)\times(p+1)}$ defined by \begin{align*} (M_p)_{jk}=\int_{\mathbb R}u^{j+k}K(u)\,d\mathcal L^1(u) \end{align*} for $0\le j,k\le p$ is nonsingular. Let $h=h_n>0$ satisfy $h\to0$ and $nh\to\infty$. Define $U_{i,n}=(X_i-x)/h$, \begin{align*} S_n=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})r_p(U_{i,n})r_p(U_{i,n})^\top, \end{align*} and \begin{align*} T_n=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})r_p(U_{i,n})Y_i. \end{align*} On the event that $S_n$ is invertible, define the local polynomial intercept estimator by $\hat m_p(x)=e_0^\top S_n^{-1}T_n$. 
For each sufficiently small $h$, define the population moment matrix and population response vector by \begin{align*} A_h=\frac{1}{h}\mathbb E\left[K\left(\frac{X-x}{h}\right)r_p\left(\frac{X-x}{h}\right)r_p\left(\frac{X-x}{h}\right)^\top\right] \end{align*} and \begin{align*} a_h=\frac{1}{h}\mathbb E\left[K\left(\frac{X-x}{h}\right)r_p\left(\frac{X-x}{h}\right)m(X)\right]. \end{align*} Assume $A_h$ is invertible for all sufficiently small $h$, set $\beta_{p,h,\mathrm{pop}}(x)=A_h^{-1}a_h$, and set $m_{p,h,\mathrm{pop}}(x)=e_0^\top\beta_{p,h,\mathrm{pop}}(x)$. Assume that there are an integer $r\ge1$, a finite coefficient $b_p(x)$, and a deterministic remainder $\rho_{n,h}(x)$ such that \begin{align*} m_{p,h,\mathrm{pop}}(x)-m(x)=b_p(x)h^r+\rho_{n,h}(x) \end{align*} and \begin{align*} \sqrt{nh}\,\rho_{n,h}(x)\to0. \end{align*} Define $d_h:\mathbb R\to\mathbb R$ by \begin{align*} d_h(t)=m(t)-r_p((t-x)/h)^\top\beta_{p,h,\mathrm{pop}}(x). \end{align*} Assume the projection residual is negligible on the local $L^2$ scale: \begin{align*} \frac{1}{h}\mathbb E\left[K\left(\frac{X-x}{h}\right)^2\left|r_p\left(\frac{X-x}{h}\right)\right|^2|d_h(X)|^2\right]\to0. \end{align*} Then $S_n$ is invertible with probability tending to $1$, and \begin{align*} \sqrt{nh}\left(\hat m_p(x)-m(x)-b_p(x)h^r\right)\xrightarrow{d}\mathcal N\left(0,\frac{\sigma^2(x)}{f_X(x)}V_p(K)\right), \end{align*} where \begin{align*} V_p(K)=e_0^\top M_p^{-1}\left(\int_{\mathbb R}r_p(u)r_p(u)^\top K(u)^2\,d\mathcal L^1(u)\right)M_p^{-1}e_0. \end{align*}
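The estimator in the statement can be computed directly from $S_n$ and $T_n$. The following is a minimal sketch, not part of the theorem: the Epanechnikov kernel and the function names `epanechnikov` and `local_poly_intercept` are illustrative assumptions, chosen only because that kernel is bounded and compactly supported as the hypotheses require.

```python
import numpy as np

def epanechnikov(u):
    """Illustrative kernel choice (an assumption): bounded, supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_poly_intercept(x, X, Y, h, p, kernel=epanechnikov):
    """Return e_0^T S_n^{-1} T_n for data (X, Y), bandwidth h and degree p."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    U = (X - x) / h                              # U_{i,n} = (X_i - x) / h
    R = np.vander(U, N=p + 1, increasing=True)   # row i is r_p(U_{i,n})^T
    w = kernel(U)                                # K(U_{i,n})
    S = (R * w[:, None]).T @ R / (n * h)         # S_n
    T = (R * w[:, None]).T @ Y / (n * h)         # T_n
    beta = np.linalg.solve(S, T)                 # raises LinAlgError if S_n is singular
    return beta[0]                               # first coordinate, i.e. e_0^T S_n^{-1} T_n
```

When $m$ is itself a polynomial of degree at most $p$ and the responses are noise-free, $T_n=S_n\beta$ for the coefficient vector $\beta$ of $m$ in the rescaled basis, so the sketch returns $m(x)$ up to floating-point error whenever $S_n$ is invertible. This gives a quick sanity check of an implementation.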

Proof

[proofplan] We write the estimator through its weighted normal equations and separate it into the population local polynomial target, a stochastic weighted error term, and an empirical projection residual. The deterministic target is handled by the assumed bias expansion, while the projection residual is negligible by its local $L^2$ hypothesis. The main stochastic term is a triangular array; we compute its limiting variance, verify Lyapunov's condition using the conditional $(2+\delta)$-moment bound, and then use Slutsky's theorem to replace the deterministic inverse by the empirical inverse. [/proofplan] [step:Reduce the estimator to a stochastic weighted error term] Let $T_{n,m}:\Omega\to\mathbb R^{p+1}$ and $Z_n:\Omega\to\mathbb R^{p+1}$ be the random vectors defined by \begin{align*} T_{n,m}=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})r_p(U_{i,n})m(X_i) \end{align*} and \begin{align*} Z_n=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})r_p(U_{i,n})\varepsilon_i. \end{align*} Since $Y_i=m(X_i)+\varepsilon_i$, we have $T_n=T_{n,m}+Z_n$. On the event that $S_n$ is invertible, \begin{align*} \hat m_p(x)=e_0^\top S_n^{-1}T_n. \end{align*} Thus \begin{align*} \hat m_p(x)-m(x)=m_{p,h,\mathrm{pop}}(x)-m(x)+e_0^\top S_n^{-1}Z_n+e_0^\top S_n^{-1}(T_{n,m}-S_n\beta_{p,h,\mathrm{pop}}(x)). \end{align*} By the definition of $d_h:\mathbb R\to\mathbb R$, \begin{align*} T_{n,m}-S_n\beta_{p,h,\mathrm{pop}}(x)=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})r_p(U_{i,n})d_h(X_i). \end{align*} The expectation of each summand is zero, because $A_h\beta_{p,h,\mathrm{pop}}(x)=a_h$. For each fixed vector $v\in\mathbb R^{p+1}$, independence gives \begin{align*} \operatorname{Var}\left(\frac{1}{\sqrt{nh}}\sum_{i=1}^n v^\top K(U_{i,n})r_p(U_{i,n})d_h(X_i)\right)\le |v|^2\frac{1}{h}\mathbb E\left[K(U_{1,n})^2|r_p(U_{1,n})|^2|d_h(X_1)|^2\right]. \end{align*} The right-hand side tends to $0$ by the projection-residual hypothesis. 
[Chebyshev's inequality](/theorems/1126) applied coordinatewise therefore yields \begin{align*} \sqrt{nh}(T_{n,m}-S_n\beta_{p,h,\mathrm{pop}}(x))\xrightarrow{\mathbb P}0. \end{align*} The next step proves $S_n^{-1}\xrightarrow{\mathbb P}f_X(x)^{-1}M_p^{-1}$, so $S_n^{-1}$ is tight. The product of a tight sequence and a sequence converging to $0$ in probability converges to $0$ in probability; hence \begin{align*} \sqrt{nh}\,e_0^\top S_n^{-1}(T_{n,m}-S_n\beta_{p,h,\mathrm{pop}}(x))=o_{\mathbb P}(1). \end{align*} Using the assumed bias expansion, \begin{align*} \sqrt{nh}\left(\hat m_p(x)-m(x)-b_p(x)h^r\right)=\sqrt{nh}\,e_0^\top S_n^{-1}Z_n+\sqrt{nh}\,\rho_{n,h}(x)+o_{\mathbb P}(1). \end{align*} Since $\sqrt{nh}\,\rho_{n,h}(x)\to0$, it remains to prove the normal limit for $\sqrt{nh}\,e_0^\top S_n^{-1}Z_n$. [/step] [step:Show that the empirical moment matrix converges to its population limit] For $0\le j,k\le p$, the $(j,k)$ entry of $S_n$ is \begin{align*} (S_n)_{jk}=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})U_{i,n}^{j+k}. \end{align*} Its expectation is \begin{align*} \mathbb E[(S_n)_{jk}]=\int_{\mathbb R}K(u)u^{j+k}f_X(x+hu)\,d\mathcal L^1(u). \end{align*} This follows from the substitution $t=x+hu$, under which $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$. Since $K$ is bounded and compactly supported and $f_X$ is bounded near $x$, the integrands are dominated by an integrable function on $\mathbb R$. Since $f_X$ is continuous at $x$, the [Dominated Convergence Theorem](/theorems/4) gives \begin{align*} \mathbb E[(S_n)_{jk}]\to f_X(x)\int_{\mathbb R}K(u)u^{j+k}\,d\mathcal L^1(u). \end{align*} Also, \begin{align*} \operatorname{Var}((S_n)_{jk})\le \frac{1}{nh^2}\mathbb E[K(U_{1,n})^2U_{1,n}^{2j+2k}]. \end{align*} The same substitution gives \begin{align*} \mathbb E[K(U_{1,n})^2U_{1,n}^{2j+2k}]=h\int_{\mathbb R}K(u)^2u^{2j+2k}f_X(x+hu)\,d\mathcal L^1(u). \end{align*} The integral is bounded for all sufficiently small $h$, so $\operatorname{Var}((S_n)_{jk})=O((nh)^{-1})\to0$. 
Therefore $(S_n)_{jk}\xrightarrow{\mathbb P}f_X(x)(M_p)_{jk}$ for every $j,k$, and hence \begin{align*} S_n\xrightarrow{\mathbb P}f_X(x)M_p. \end{align*} Since $f_X(x)>0$ and $M_p$ is nonsingular, $f_X(x)M_p$ is nonsingular. Continuity of the determinant gives $\mathbb P(\det S_n\ne0)\to1$, and continuity of matrix inversion on the [open set](/page/Open%20Set) of nonsingular matrices gives \begin{align*} S_n^{-1}\xrightarrow{\mathbb P}\frac{1}{f_X(x)}M_p^{-1}. \end{align*} [guided] Fix indices $j,k\in\{0,\dots,p\}$. The entrywise convergence of $S_n$ is enough because the space of $(p+1)\times(p+1)$ matrices is finite-dimensional. The $(j,k)$ entry is \begin{align*} (S_n)_{jk}=\frac{1}{nh}\sum_{i=1}^n K(U_{i,n})U_{i,n}^{j+k}. \end{align*} We first compute its mean. Since $X_1$ has density $f_X$ with respect to $\mathcal L^1$, the definition of expectation gives \begin{align*} \mathbb E[(S_n)_{jk}]=\frac{1}{h}\int_{\mathbb R}K\left(\frac{t-x}{h}\right)\left(\frac{t-x}{h}\right)^{j+k}f_X(t)\,d\mathcal L^1(t). \end{align*} Make the change of variables $t=x+hu$. Under this substitution the domain $\mathbb R$ remains $\mathbb R$ and $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$. Therefore \begin{align*} \mathbb E[(S_n)_{jk}]=\int_{\mathbb R}K(u)u^{j+k}f_X(x+hu)\,d\mathcal L^1(u). \end{align*} The integrand converges pointwise to $K(u)u^{j+k}f_X(x)$ because $f_X$ is continuous at $x$. To apply the [Dominated Convergence Theorem](/theorems/4), we need an integrable majorant. Since $K$ is compactly supported, choose $R>0$ with $\operatorname{supp}K\subset[-R,R]$. Since $f_X$ is continuous near $x$, it is bounded on $[x-Rh_0,x+Rh_0]$ for some $h_0>0$. For $0<h<h_0$ and $u\in[-R,R]$, the factor $f_X(x+hu)$ is bounded by a finite constant, while $K(u)u^{j+k}$ is bounded and supported on $[-R,R]$. Thus dominated convergence applies and yields \begin{align*} \mathbb E[(S_n)_{jk}]\to f_X(x)\int_{\mathbb R}K(u)u^{j+k}\,d\mathcal L^1(u). 
\end{align*} Next we show that the random fluctuation around the mean vanishes. Independence of the observations gives \begin{align*} \operatorname{Var}((S_n)_{jk})\le \frac{1}{nh^2}\mathbb E[K(U_{1,n})^2U_{1,n}^{2j+2k}]. \end{align*} Using again the substitution $t=x+hu$ with $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$, we obtain \begin{align*} \mathbb E[K(U_{1,n})^2U_{1,n}^{2j+2k}]=h\int_{\mathbb R}K(u)^2u^{2j+2k}f_X(x+hu)\,d\mathcal L^1(u). \end{align*} The integral is uniformly bounded for small $h$ by the same compact-support and local-boundedness argument. Hence \begin{align*} \operatorname{Var}((S_n)_{jk})=O((nh)^{-1})\to0. \end{align*} Convergence of the mean plus convergence of the variance to zero implies \begin{align*} (S_n)_{jk}\xrightarrow{\mathbb P}f_X(x)(M_p)_{jk}. \end{align*} Since this holds for each of finitely many entries, \begin{align*} S_n\xrightarrow{\mathbb P}f_X(x)M_p. \end{align*} The limiting matrix is invertible because $f_X(x)>0$ and $M_p$ is nonsingular. The determinant is a continuous polynomial in the entries, so $\det S_n\xrightarrow{\mathbb P}\det(f_X(x)M_p)\ne0$. Therefore $S_n$ is invertible with probability tending to $1$. Finally, the inversion map is continuous at every nonsingular matrix, so the [continuous mapping theorem](/theorems/1847) gives \begin{align*} S_n^{-1}\xrightarrow{\mathbb P}\frac{1}{f_X(x)}M_p^{-1}. \end{align*} [/guided] [/step] [step:Compute the limiting variance of the deterministic-inverse error sum] Define $q_p:\mathbb R\to\mathbb R$ by \begin{align*} q_p(u)=e_0^\top M_p^{-1}r_p(u). \end{align*} For $1\le i\le n$, define the real-valued triangular-array variable \begin{align*} \xi_{i,n}=\frac{1}{\sqrt{nh}}\frac{1}{f_X(x)}q_p(U_{i,n})K(U_{i,n})\varepsilon_i. \end{align*} The variables $\xi_{1,n},\dots,\xi_{n,n}$ are independent and centered because the observations are i.i.d. and $\mathbb E[\varepsilon_i\mid X_i]=0$. 
Their variance sum is \begin{align*} \sum_{i=1}^n\mathbb E[\xi_{i,n}^2]=\frac{1}{hf_X(x)^2}\mathbb E[q_p(U_{1,n})^2K(U_{1,n})^2\sigma^2(X_1)]. \end{align*} Using $t=x+hu$ and $d\mathcal L^1(t)=h\,d\mathcal L^1(u)$, this becomes \begin{align*} \sum_{i=1}^n\mathbb E[\xi_{i,n}^2]=\frac{1}{f_X(x)^2}\int_{\mathbb R}q_p(u)^2K(u)^2\sigma^2(x+hu)f_X(x+hu)\,d\mathcal L^1(u). \end{align*} The integrands are dominated by an integrable function because $K$ is bounded with compact support, $q_p$ is a polynomial, $f_X$ is bounded near $x$, and $\sigma^2$ is bounded near $x$ by continuity at $x$. They converge pointwise because $f_X$ and $\sigma^2$ are continuous at $x$. By the [Dominated Convergence Theorem](/theorems/4), \begin{align*} \sum_{i=1}^n\mathbb E[\xi_{i,n}^2]\to\frac{\sigma^2(x)}{f_X(x)}\int_{\mathbb R}q_p(u)^2K(u)^2\,d\mathcal L^1(u). \end{align*} Since $M_p$ is symmetric, $q_p(u)^2=e_0^\top M_p^{-1}r_p(u)r_p(u)^\top M_p^{-1}e_0$, and therefore \begin{align*} \int_{\mathbb R}q_p(u)^2K(u)^2\,d\mathcal L^1(u)=V_p(K), \end{align*} so the limiting variance is $\sigma^2(x)V_p(K)/f_X(x)$. [/step] [step:Verify Lyapunov's condition for the triangular array] Let $R>0$ be such that $\operatorname{supp}K\subset[-R,R]$. Define \begin{align*} C_K=\sup_{u\in[-R,R]}\left|\frac{1}{f_X(x)}q_p(u)K(u)\right|. \end{align*} This constant is finite because $K$ is bounded and $q_p$ is continuous. Since $|\xi_{i,n}|\le(nh)^{-1/2}C_K|\varepsilon_i|\mathbb 1_{\{|U_{i,n}|\le R\}}$ and the observations are identically distributed, \begin{align*} \sum_{i=1}^n\mathbb E[|\xi_{i,n}|^{2+\delta}]\le n(nh)^{-(1+\delta/2)}C_K^{2+\delta}\mathbb E[|\varepsilon_1|^{2+\delta}\mathbb 1_{\{|U_{1,n}|\le R\}}]. \end{align*} For all sufficiently small $h$, the event $\{|U_{1,n}|\le R\}$ implies $X_1\in I_x$. Conditioning on $X_1$ and using the conditional moment hypothesis gives \begin{align*} \mathbb E[|\varepsilon_1|^{2+\delta}\mathbb 1_{\{|U_{1,n}|\le R\}}]\le C_\delta\mathbb P(|U_{1,n}|\le R). \end{align*} Since $f_X$ is bounded near $x$, choose $h_0>0$ and $C_X<\infty$ such that $f_X(t)\le C_X$ for every $t\in[x-Rh_0,x+Rh_0]$. Then, for every $0<h<h_0$, \begin{align*} \mathbb P(|U_{1,n}|\le R)=\int_{x-Rh}^{x+Rh}f_X(t)\,d\mathcal L^1(t)\le 2RC_Xh.
\end{align*} Combining these estimates gives \begin{align*} \sum_{i=1}^n\mathbb E[|\xi_{i,n}|^{2+\delta}]\le 2RC_XC_\delta C_K^{2+\delta}(nh)^{-\delta/2}\to0. \end{align*} For every $\eta>0$, the pointwise inequality $z^2\mathbb 1_{\{|z|>\eta\}}\le \eta^{-\delta}|z|^{2+\delta}$ gives \begin{align*} \sum_{i=1}^n\mathbb E[\xi_{i,n}^2\mathbb 1_{\{|\xi_{i,n}|>\eta\}}]\le \eta^{-\delta}\sum_{i=1}^n\mathbb E[|\xi_{i,n}|^{2+\delta}]\to0. \end{align*} Thus the Lindeberg condition holds. If the limiting variance $\sigma^2(x)V_p(K)/f_X(x)$ is positive, the Lindeberg-Feller [central limit theorem](/theorems/521) for independent triangular arrays applies to the centered row-wise independent array $(\xi_{i,n})_{1\le i\le n}$. If it is zero, then $\sum_{i=1}^n\mathbb E[\xi_{i,n}^2]\to0$ and [Chebyshev's inequality](/theorems/1126) gives $\sum_{i=1}^n\xi_{i,n}\xrightarrow{\mathbb P}0$, which is the degenerate normal limit. In either case \begin{align*} \sum_{i=1}^n\xi_{i,n}\xrightarrow{d}\mathcal N\left(0,\frac{\sigma^2(x)}{f_X(x)}V_p(K)\right). \end{align*} [/step] [step:Replace the deterministic inverse by the empirical inverse and finish] Define the vector-valued [random variable](/page/Random%20Variable) $W_n:\Omega\to\mathbb R^{p+1}$ by \begin{align*} W_n=\frac{1}{\sqrt{nh}}\sum_{i=1}^n K(U_{i,n})r_p(U_{i,n})\varepsilon_i. \end{align*} By construction, $\sum_{i=1}^n\xi_{i,n}=e_0^\top f_X(x)^{-1}M_p^{-1}W_n$, so the preceding step gives \begin{align*} e_0^\top\frac{1}{f_X(x)}M_p^{-1}W_n\xrightarrow{d}\mathcal N\left(0,\frac{\sigma^2(x)}{f_X(x)}V_p(K)\right). \end{align*} For $0\le j,k\le p$, the $(j,k)$ covariance entry of $W_n$ is bounded in absolute value by \begin{align*} \frac{1}{h}\mathbb E[|K(U_{1,n})|^2|U_{1,n}|^{j+k}\sigma^2(X_1)]. \end{align*} After the substitution $t=x+hu$, this becomes $\int_{\mathbb R}|K(u)|^2|u|^{j+k}\sigma^2(x+hu)f_X(x+hu)\,d\mathcal L^1(u)$, which is bounded uniformly in small $h$ because the kernel factor is bounded with compact support and $f_X\sigma^2$ is bounded near $x$. Hence the covariance matrices of the centered vectors $W_n$ are uniformly bounded, and $W_n$ is tight in $\mathbb R^{p+1}$. Since \begin{align*} S_n^{-1}-\frac{1}{f_X(x)}M_p^{-1}\xrightarrow{\mathbb P}0, \end{align*} tightness of $W_n$ implies \begin{align*} e_0^\top\left(S_n^{-1}-\frac{1}{f_X(x)}M_p^{-1}\right)W_n\xrightarrow{\mathbb P}0.
\end{align*} Since $W_n=\sqrt{nh}\,Z_n$, on the event that $S_n$ is invertible we have \begin{align*} \sqrt{nh}\,e_0^\top S_n^{-1}Z_n=e_0^\top\frac{1}{f_X(x)}M_p^{-1}W_n+e_0^\top\left(S_n^{-1}-\frac{1}{f_X(x)}M_p^{-1}\right)W_n, \end{align*} and this event has probability tending to $1$. By Slutsky's theorem, \begin{align*} \sqrt{nh}\,e_0^\top S_n^{-1}Z_n\xrightarrow{d}\mathcal N\left(0,\frac{\sigma^2(x)}{f_X(x)}V_p(K)\right). \end{align*} The reduction step showed \begin{align*} \sqrt{nh}\left(\hat m_p(x)-m(x)-b_p(x)h^r\right)=\sqrt{nh}\,e_0^\top S_n^{-1}Z_n+\sqrt{nh}\,\rho_{n,h}(x)+o_{\mathbb P}(1). \end{align*} The assumption $\sqrt{nh}\,\rho_{n,h}(x)\to0$ and Slutsky's theorem therefore yield \begin{align*} \sqrt{nh}\left(\hat m_p(x)-m(x)-b_p(x)h^r\right)\xrightarrow{d}\mathcal N\left(0,\frac{\sigma^2(x)}{f_X(x)}V_p(K)\right). \end{align*} This proves the asserted asymptotic normality. [/step]
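
The kernel constant $V_p(K)$ in the limiting variance can be evaluated numerically from its definition. The sketch below is only illustrative: the Epanechnikov kernel, the trapezoidal quadrature on $[-1,1]$, and the names `weighted_moment_matrix` and `asymptotic_variance_constant` are assumptions made for the example and are not part of the theorem or its proof.

```python
import numpy as np

def epanechnikov(u):
    """Illustrative kernel choice (an assumption): bounded, supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def weighted_moment_matrix(kernel, p, power, support=(-1.0, 1.0), num=20001):
    """Trapezoidal approximation of the matrix of integrals of u^(j+k) K(u)^power."""
    u = np.linspace(support[0], support[1], num)
    du = u[1] - u[0]
    weights = np.full(num, du)
    weights[0] = weights[-1] = du / 2.0          # trapezoidal end-point weights
    R = np.vander(u, N=p + 1, increasing=True)   # row i is r_p(u_i)^T
    return np.einsum("i,ij,ik->jk", weights * kernel(u) ** power, R, R)

def asymptotic_variance_constant(kernel, p):
    """Approximate V_p(K) = e_0^T M_p^{-1} (int r_p r_p^T K^2) M_p^{-1} e_0."""
    M = weighted_moment_matrix(kernel, p, power=1)   # M_p
    G = weighted_moment_matrix(kernel, p, power=2)   # moment matrix of K^2
    e0 = np.zeros(p + 1)
    e0[0] = 1.0
    a = np.linalg.solve(M, e0)                       # M_p^{-1} e_0 (M_p is symmetric)
    return float(a @ G @ a)
```

For a symmetric kernel with unit mass and $p\in\{0,1\}$, the first row of $M_p^{-1}$ is $e_0^\top$, so $V_p(K)=\int K^2$, which equals $3/5$ for the Epanechnikov kernel. For $p=2$ the same kernel has $q_2(u)=\tfrac{15}{8}-\tfrac{35}{8}u^2$, and integrating $q_2^2K^2$ gives $V_2(K)=5/4$. The numerical sketch should reproduce both values up to quadrature error.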
