Androma — The Home of Mathematics on the Internet


[step:Expand the local constant ratio to first order] Define $m_0:=m(a)$ and $m_1:=m'(a)$. Since $m\in C^2([a,a+\eta])$, define the Taylor remainder map $R_{m,h}:[0,1]\to\mathbb R$ by \begin{align*} R_{m,h}(u):=\frac{m(a+hu)-m_0-hm_1u}{h^2} \end{align*} for $0<h\leq\eta$. [Taylor's theorem](/theorems/827) on the compact interval $[a,a+\eta]$ gives a constant $C_m>0$ such that $|R_{m,h}(u)|\leq C_m$ for every $u\in[0,1]$ and all sufficiently small $h$. Define the denominator moment $D_h\in\mathbb R$ by \begin{align*} D_h:=\int_0^1K(u)f_X(a+hu)\,d\mathcal L^1(u), \end{align*} define the first weighted design moment $I_h\in\mathbb R$ by \begin{align*} I_h:=\int_0^1uK(u)f_X(a+hu)\,d\mathcal L^1(u), \end{align*} and define the remainder moment $J_h\in\mathbb R$ by \begin{align*} J_h:=\int_0^1K(u)R_{m,h}(u)f_X(a+hu)\,d\mathcal L^1(u). \end{align*} The quantity $J_h$ is bounded uniformly in all sufficiently small $h$, because $K$ is bounded on $[0,1]$, $R_{m,h}$ is uniformly bounded, and $f_X$ is continuous, hence bounded, on $[a,a+\eta]$. Substituting the Taylor formula $m(a+hu)=m_0+hm_1u+h^2R_{m,h}(u)$ into the numerator gives \begin{align*} \int_0^1K(u)m(a+hu)f_X(a+hu)\,d\mathcal L^1(u) = m_0D_h+hm_1I_h+h^2J_h. \end{align*} Dividing by $D_h$ and subtracting $m_0=m(a)$ gives \begin{align*} m_{0,h,\mathrm{pop}}(a)-m(a) = hm_1\frac{I_h}{D_h}+h^2\frac{J_h}{D_h}. \end{align*} The denominator satisfies $D_h\to f_X(a)\mu_0^+>0$, so $J_h/D_h=O(1)$. It remains to identify $I_h/D_h$ to zeroth order with an $O(h)$ error. Define $f_0:=f_X(a)$ and $f_1:=f_X'(a)$. Since $f_X\in C^1([a,a+\eta])$, uniformly for $u\in[0,1]$, \begin{align*} f_X(a+hu)=f_0+hf_1u+h\rho_h(u), \end{align*} where the remainder map $\rho_h:[0,1]\to\mathbb R$ satisfies $\sup_{u\in[0,1]}|\rho_h(u)|\to0$ as $h\to0$. Hence \begin{align*} D_h=f_0\mu_0^+ + hf_1\mu_1^+ + h\theta_{0,h} \end{align*} and \begin{align*} I_h=f_0\mu_1^+ + hf_1\mu_2^+ + h\theta_{1,h}, \end{align*} where $\theta_{0,h}\to0$ and $\theta_{1,h}\to0$. 
Since $f_0\mu_0^+>0$, the quotient expansion with denominator bounded away from zero gives \begin{align*} \frac{I_h}{D_h}=\frac{\mu_1^+}{\mu_0^+}+O(h). \end{align*} Combining this with the previous identity yields \begin{align*} m_{0,h,\mathrm{pop}}(a)-m(a) = h\,m'(a)\frac{\mu_1^+}{\mu_0^+} + O(h^2). \end{align*} [/step]
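
The first-order expansion above can be checked numerically. The following is a minimal sketch, not part of the proof. The concrete choices $a=0$, $m(x)=e^x$, $f_X(x)=(1+x)/1.5$ on $[0,1]$, and the one-sided Epanechnikov weight $K(u)=0.75(1-u^2)$ are illustrative assumptions, and `midpoint`, `local_constant_bias`, and `residual` are hypothetical helper names. If the expansion holds, the bias divided by $h$ should approach $m'(a)\mu_1^+/\mu_0^+$, and the residual divided by $h^2$ should stay bounded.

```python
import math

# Illustrative assumptions (not from the source): a = 0, m = exp,
# f_X(x) = (1 + x) / 1.5 on [0, 1], one-sided Epanechnikov weight K.
A = 0.0

def K(u):
    return 0.75 * (1.0 - u * u)

def m(x):
    return math.exp(x)

def f_X(x):
    return (1.0 + x) / 1.5

def midpoint(g, n=4000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

MU0 = midpoint(K)                    # mu_0^+ (0.5 analytically for this K)
MU1 = midpoint(lambda u: u * K(u))   # mu_1^+ (0.1875 analytically for this K)

def local_constant_bias(h):
    """m_{0,h,pop}(a) - m(a), computed as a ratio of the two integrals."""
    num = midpoint(lambda u: K(u) * m(A + h * u) * f_X(A + h * u))
    den = midpoint(lambda u: K(u) * f_X(A + h * u))
    return num / den - m(A)

def residual(h):
    """Bias minus the predicted term h * m'(a) * mu_1^+ / mu_0^+."""
    m1 = math.exp(A)  # m'(a) for m = exp
    return local_constant_bias(h) - h * m1 * MU1 / MU0

for h in (0.08, 0.04, 0.02):
    # Columns: h, bias / h, residual / h^2.
    print(h, local_constant_bias(h) / h, residual(h) / h ** 2)
```

Halving $h$ should roughly quarter the residual, which is the numerical signature of an $O(h^2)$ remainder.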


[guided]The reason local linear regression removes the boundary first-order bias is not symmetry; near $a$ there is no symmetry. The reason is exact reproduction of affine functions. Work in the rescaled coordinate $u=(t-a)/h$. In that coordinate, the first two Taylor terms of $m(a+hu)$ form the affine map $\ell_h:[0,1]\to\mathbb R$ defined by \begin{align*} \ell_h(u)=m(a)+hm'(a)u. \end{align*} If the regression target were exactly $\ell_h(u)$, then the weighted least-squares minimizer would be exactly $(m(a),hm'(a))^\top$, because the model class $\gamma_0+\gamma_1u$ contains $\ell_h$ itself. Thus the intercept would be exactly $m(a)$. Now quantify the error from replacing $m(a+hu)$ by $\ell_h(u)$. Since $m\in C^2([a,a+\eta])$, its second derivative is bounded on this compact interval. Choose \begin{align*} C_m:=\frac12\sup_{s\in[a,a+\eta]}|m''(s)|. \end{align*} Taylor's formula with remainder gives, for every $u\in[0,1]$ and all $h<\eta$, \begin{align*} |m(a+hu)-m(a)-hm'(a)u| \leq C_mh^2u^2 \leq C_mh^2. \end{align*} Define \begin{align*} G_h\in\mathbb R^{2\times2},\qquad (G_h)_{jk} := \int_0^1u^{j+k}K(u)f_X(a+hu)\,d\mathcal L^1(u), \quad 0\leq j,k\leq1, \end{align*} and define \begin{align*} r_h\in\mathbb R^2,\qquad (r_h)_j := \int_0^1u^jK(u)m(a+hu)f_X(a+hu)\,d\mathcal L^1(u), \quad 0\leq j\leq1. \end{align*} The local linear normal equations are \begin{align*} G_h(\gamma_{0,h},\gamma_{1,h})^\top=r_h. \end{align*} Also define $c_h\in\mathbb R^2$ by \begin{align*} c_h:=(m(a),hm'(a))^\top. \end{align*} Then $G_hc_h$ is exactly the right-hand side that would occur if $m(a+hu)$ were replaced by $\ell_h(u)$. Therefore the only error is \begin{align*} e_h:=r_h-G_hc_h, \end{align*} whose components are \begin{align*} (e_h)_j = \int_0^1u^jK(u) \bigl(m(a+hu)-m(a)-hm'(a)u\bigr) f_X(a+hu)\,d\mathcal L^1(u). 
\end{align*} The integrand is bounded in absolute value by a constant times $h^2K(u)$, because $u^j\leq1$, $K$ is bounded and integrable on $[0,1]$, $f_X$ is bounded near $a$, and the Taylor remainder is $O(h^2)$ uniformly in $u$. Hence $e_h=O(h^2)$ in $\mathbb R^2$. Finally, $G_h\to f_X(a)M^+$ entrywise. Since $f_X(a)>0$ and $M^+$ is nonsingular, $f_X(a)M^+$ is nonsingular. By continuity of the determinant, $\det G_h$ stays bounded away from $0$ for all sufficiently small $h$; using the adjugate formula for a $2\times2$ inverse, $G_h^{-1}$ exists and is uniformly bounded for all sufficiently small $h$. Thus \begin{align*} (\gamma_{0,h},\gamma_{1,h})^\top-c_h = G_h^{-1}e_h = O(h^2). \end{align*} Taking the first coordinate yields \begin{align*} m_{1,h,\mathrm{pop}}(a)-m(a) = \gamma_{0,h}-m(a) = O(h^2). \end{align*} This is the precise sense in which local linear fitting removes the boundary term of order $h$.[/guided]
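
The affine-reproduction argument can also be illustrated numerically. The sketch below is a hedged illustration under assumed choices ($a=0$, $f_X(x)=(1+x)/1.5$, one-sided Epanechnikov $K$). The helper `local_linear_pop` is a hypothetical name that solves the $2\times2$ normal equations $G_h(\gamma_{0,h},\gamma_{1,h})^\top=r_h$ by the adjugate formula, with the integrals approximated by a midpoint rule.

```python
import math

# Illustrative assumptions (not from the source): a = 0,
# f_X(x) = (1 + x) / 1.5 on [0, 1], one-sided Epanechnikov weight K.
A = 0.0

def K(u):
    return 0.75 * (1.0 - u * u)

def f_X(x):
    return (1.0 + x) / 1.5

def midpoint(g, n=4000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

def local_linear_pop(target, h):
    """Solve G_h gamma = r_h for the rescaled population coefficients."""
    def w(u):
        return K(u) * f_X(A + h * u)
    g0 = midpoint(w)                            # (G_h)_{00}
    g1 = midpoint(lambda u: u * w(u))           # (G_h)_{01} = (G_h)_{10}
    g2 = midpoint(lambda u: u * u * w(u))       # (G_h)_{11}
    r0 = midpoint(lambda u: w(u) * target(A + h * u))
    r1 = midpoint(lambda u: u * w(u) * target(A + h * u))
    det = g0 * g2 - g1 * g1
    gamma0 = (g2 * r0 - g1 * r1) / det          # adjugate formula, row 0
    gamma1 = (g0 * r1 - g1 * r0) / det          # adjugate formula, row 1
    return gamma0, gamma1

# An affine target is reproduced exactly: intercept 2, rescaled slope 3h.
affine = local_linear_pop(lambda x: 2.0 + 3.0 * x, 0.1)

def intercept_error(h):
    """gamma_{0,h} - m(a) for the assumed target m = exp."""
    return local_linear_pop(math.exp, h)[0] - math.exp(A)

for h in (0.08, 0.04, 0.02):
    # Columns: h, intercept error / h^2.
    print(h, intercept_error(h) / h ** 2)
```

For the affine target the intercept is recovered up to rounding, while for the curved target the error divided by $h^2$ stays bounded, matching the claim that the order-$h$ boundary term disappears.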


[step:Control the empirical local averages on the boundary scale] For integers $j\geq0$, define the normalized empirical design moment by \begin{align*} \hat s_j := \frac1{nh}\sum_{i=1}^n \left(\frac{X_i-a}{h}\right)^j K\!\left(\frac{X_i-a}{h}\right). \end{align*} Define the normalized empirical response moment by \begin{align*} \hat q_j := \frac1{nh}\sum_{i=1}^n \left(\frac{X_i-a}{h}\right)^j K\!\left(\frac{X_i-a}{h}\right)Y_i. \end{align*} Let $s_j:=\mathbb E[\hat s_j]$ and $q_j:=\mathbb E[\hat q_j]$. For each fixed $j$ needed below, namely $0\leq j\leq2$ for design moments, \begin{align*} \hat s_j-s_j=O_{\mathbb P}((nh)^{-1/2}). \end{align*} For each fixed $j$ with $0\leq j\leq1$ for response moments, \begin{align*} \hat q_j-q_j=O_{\mathbb P}((nh)^{-1/2}). \end{align*} Indeed, for $\hat s_j$, the summands are i.i.d., so \begin{align*} \operatorname{Var}(\hat s_j) = \frac1{nh^2} \operatorname{Var}\!\left[ \left(\frac{X-a}{h}\right)^jK\!\left(\frac{X-a}{h}\right) \right]. \end{align*} Since $K$ is supported in $[0,1]$ after restriction to the design support, $0\leq (X-a)/h\leq1$ on the effective support. Using this and bounding the variance by the second moment gives \begin{align*} \operatorname{Var}(\hat s_j) \leq \frac1{nh^2} \mathbb E\!\left[ K\!\left(\frac{X-a}{h}\right)^2 \right]. \end{align*} Changing variables $t=a+hu$ gives \begin{align*} \mathbb E\!\left[ K\!\left(\frac{X-a}{h}\right)^2 \right] = h\int_0^1K(u)^2f_X(a+hu)\,d\mathcal L^1(u). \end{align*} Since $K$ is bounded on $[0,1]$ and $f_X$ is bounded on $[a,a+\eta]$, this expectation is $O(h)$, so $\operatorname{Var}(\hat s_j)=O((nh)^{-1})$. For $\hat q_j$, write $Y=m(X)+\varepsilon$. On $[a,a+h]$, the function $m$ is bounded for small $h$. Since $\mathbb E[\varepsilon\mid X=t]=0$, the boundedness of $\sigma^2(t)=\operatorname{Var}(\varepsilon\mid X=t)$ on $[a,a+\eta]$ gives a uniform bound on $\mathbb E[\varepsilon^2\mid X=t]$ there. Hence $\mathbb E[Y^2\mid X=t]$ is bounded uniformly for $t\in[a,a+\eta]$. 
Therefore \begin{align*} \operatorname{Var}(\hat q_j) \leq \frac1{nh^2} \mathbb E\!\left[ K\!\left(\frac{X-a}{h}\right)^2Y^2 \right]. \end{align*} By the same change of variables and the uniform bound on $\mathbb E[Y^2\mid X=t]$ for $t\in[a,a+\eta]$, the right-hand side is $O((nh)^{-1})$. For each fixed $h$, the summands defining $\hat s_j$ and $\hat q_j$ are i.i.d. finite-variance random variables; as $h\to0$ they form a triangular array indexed by the bandwidth. Applying [Chebyshev's inequality](/theorems/1126) to each centered average gives the asserted stochastic orders. [/step]
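
The variance scaling $\operatorname{Var}(\hat s_j)=O((nh)^{-1})$ can be illustrated by simulation. The sketch below is only a sanity check under assumed choices: $a=0$, $X\sim\mathrm{Uniform}[0,1]$, and the one-sided Epanechnikov weight. The names `s_hat` and `scaled_variance` are hypothetical. For this design one has $n h\operatorname{Var}(\hat s_0)=\int_0^1K(u)^2\,du-h(\mu_0^+)^2$, so the rescaled variance should stay of order one as $h$ shrinks.

```python
import random

def K(u):
    """One-sided Epanechnikov weight, zero outside [0, 1]."""
    return 0.75 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0

def s_hat(xs, h, j, a=0.0):
    """Normalized empirical design moment (1 / (n h)) * sum u_i^j K(u_i)."""
    total = 0.0
    for x in xs:
        u = (x - a) / h
        total += (u ** j) * K(u)
    return total / (len(xs) * h)

def scaled_variance(n, h, j, reps, rng):
    """Monte Carlo mean of s_hat_j and estimate of n * h * Var(s_hat_j).

    Assumes X ~ Uniform[0, 1], an illustrative design not taken from the source.
    """
    draws = [s_hat([rng.random() for _ in range(n)], h, j) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, n * h * var

rng = random.Random(0)  # fixed seed so the run is reproducible
results = {h: scaled_variance(n=1000, h=h, j=0, reps=300, rng=rng)
           for h in (0.2, 0.05)}

for h, (mean, nh_var) in results.items():
    # Columns: h, Monte Carlo mean of s_hat_0, n * h * Var(s_hat_0).
    print(h, mean, nh_var)
```

The rescaled variance stays of the same order for both bandwidths, which is what Chebyshev's inequality needs to deliver the $O_{\mathbb P}((nh)^{-1/2})$ rate.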


[step:Transfer local average fluctuations to the fitted intercepts] For the local constant estimator, \begin{align*} \hat m_{0,h}(a)=\frac{\hat q_0}{\hat s_0}. \end{align*} The corresponding population target is \begin{align*} m_{0,h,\mathrm{pop}}(a)=\frac{q_0}{s_0}. \end{align*} Since $s_0\to f_X(a)\mu_0^+>0$, there are constants $h_0>0$ and $c_s>0$ such that $s_0\geq c_s$ for $0<h<h_0$. Since $\hat s_0-s_0=O_{\mathbb P}((nh)^{-1/2})$ and $nh\to\infty$, the denominator $\hat s_0$ is bounded below by $c_s/2$ with probability tending to $1$. On the compact neighborhood where $s\geq c_s/2$, the quotient map $(q,s)\mapsto q/s$ is continuously differentiable with bounded derivative. The finite-dimensional [delta method](/theorems/1861), equivalently the [mean value theorem](/theorems/186) on this compact neighborhood, gives \begin{align*} \hat m_{0,h}(a)-m_{0,h,\mathrm{pop}}(a) = O_{\mathbb P}((nh)^{-1/2}). \end{align*} For local linear regression, define $\hat G_h\in\mathbb R^{2\times2}$ by $(\hat G_h)_{jk}=\hat s_{j+k}$ for $0\leq j,k\leq1$, and define $\hat r_h\in\mathbb R^2$ by \begin{align*} \hat r_h=(\hat q_0,\hat q_1)^\top. \end{align*} Define $G_h\in\mathbb R^{2\times2}$ by $(G_h)_{jk}=s_{j+k}$ for $0\leq j,k\leq1$, and define $r_h\in\mathbb R^2$ by \begin{align*} r_h=(q_0,q_1)^\top. \end{align*} The rescaled empirical local linear coefficients satisfy \begin{align*} \hat G_h(\hat\gamma_{0,h},\hat\gamma_{1,h})^\top=\hat r_h, \end{align*} and the population coefficients satisfy \begin{align*} G_h(\gamma_{0,h},\gamma_{1,h})^\top=r_h. \end{align*} From the preceding step, \begin{align*} \|\hat G_h-G_h\|_{\mathrm{op}} = O_{\mathbb P}((nh)^{-1/2}). \end{align*} Also, \begin{align*} |\hat r_h-r_h| = O_{\mathbb P}((nh)^{-1/2}). \end{align*} Also $G_h\to f_X(a)M^+$. Since the smallest singular value of the nonsingular matrix $f_X(a)M^+$ is positive, there are constants $h_1>0$ and $c_G>0$ such that the smallest singular value of $G_h$ is at least $c_G$ for $0<h<h_1$. 
The perturbation bound $\|\hat G_h-G_h\|_{\mathrm{op}}=O_{\mathbb P}((nh)^{-1/2})$ and $nh\to\infty$ imply that the smallest singular value of $\hat G_h$ is at least $c_G/2$ with probability tending to $1$. Therefore $\|G_h^{-1}\|_{\mathrm{op}}\leq c_G^{-1}$ for small $h$, and $\|\hat G_h^{-1}\|_{\mathrm{op}}\leq 2c_G^{-1}$ with probability tending to $1$. On this event, subtracting $\hat G_h(\gamma_{0,h},\gamma_{1,h})^\top$ from both sides of the empirical normal equations, substituting $r_h=G_h(\gamma_{0,h},\gamma_{1,h})^\top$, and multiplying by $\hat G_h^{-1}$ gives \begin{align*} (\hat\gamma_{0,h},\hat\gamma_{1,h})^\top - (\gamma_{0,h},\gamma_{1,h})^\top = \hat G_h^{-1}(\hat r_h-r_h) + \hat G_h^{-1}(G_h-\hat G_h)(\gamma_{0,h},\gamma_{1,h})^\top. \end{align*} The population coefficient vector is bounded, because it converges to $(m(a),0)^\top$ in the rescaled coordinates. Combining this with the bounds on $\|\hat G_h^{-1}\|_{\mathrm{op}}$, $\|\hat G_h-G_h\|_{\mathrm{op}}$, and $|\hat r_h-r_h|$ gives \begin{align*} (\hat\gamma_{0,h},\hat\gamma_{1,h})^\top - (\gamma_{0,h},\gamma_{1,h})^\top = O_{\mathbb P}((nh)^{-1/2}). \end{align*} Taking first coordinates, and using $\hat m_{1,h}(a)=\hat\gamma_{0,h}$ and $m_{1,h,\mathrm{pop}}(a)=\gamma_{0,h}$, gives \begin{align*} \hat m_{1,h}(a)-m_{1,h,\mathrm{pop}}(a) = O_{\mathbb P}((nh)^{-1/2}). \end{align*} Together with the local constant conclusion, this proves the empirical part and completes the proof of the theorem. [/step]
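
The decomposition of the coefficient error used in this step is a purely algebraic identity, so it can be checked on any invertible $2\times2$ example. In the sketch below, the matrices and vectors are hypothetical numbers standing in for $G_h$, $r_h$, $\hat G_h$, and $\hat r_h$, and `solve2` and `matvec` are illustrative helpers.

```python
def solve2(G, r):
    """Solve the 2x2 linear system G x = r by the adjugate formula."""
    (g00, g01), (g10, g11) = G
    det = g00 * g11 - g01 * g10
    return [(g11 * r[0] - g01 * r[1]) / det,
            (g00 * r[1] - g10 * r[0]) / det]

def matvec(G, x):
    """Product of a 2x2 matrix with a vector of length 2."""
    return [G[0][0] * x[0] + G[0][1] * x[1],
            G[1][0] * x[0] + G[1][1] * x[1]]

# Hypothetical stand-ins for G_h, r_h and their empirical counterparts.
G = [[0.50, 0.19], [0.19, 0.10]]
r = [0.52, 0.20]
G_hat = [[0.53, 0.18], [0.18, 0.11]]
r_hat = [0.50, 0.21]

gamma = solve2(G, r)              # population coefficients
gamma_hat = solve2(G_hat, r_hat)  # empirical coefficients

# Left-hand side: gamma_hat - gamma.
lhs = [gamma_hat[i] - gamma[i] for i in range(2)]

# Right-hand side: G_hat^{-1}(r_hat - r) + G_hat^{-1}(G - G_hat) gamma.
term1 = solve2(G_hat, [r_hat[i] - r[i] for i in range(2)])
G_diff = [[G[i][j] - G_hat[i][j] for j in range(2)] for i in range(2)]
term2 = solve2(G_hat, matvec(G_diff, gamma))
rhs = [term1[i] + term2[i] for i in range(2)]

print(lhs, rhs)
```

Both sides agree up to floating-point rounding, which confirms that the stochastic order of the coefficient error is controlled by $\hat r_h-r_h$, $\hat G_h-G_h$, and the bound on $\hat G_h^{-1}$ alone.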
