[step:Transfer local average fluctuations to the fitted intercepts]
For the local constant estimator,
\begin{align*}
\hat m_{0,h}(a)=\frac{\hat q_0}{\hat s_0}.
\end{align*}
The corresponding population target is
\begin{align*}
m_{0,h,\mathrm{pop}}(a)=\frac{q_0}{s_0}.
\end{align*}
Since $s_0\to f_X(a)\mu_0^+>0$, there are constants $h_0>0$ and $c_s>0$ such that $s_0\geq c_s$ for $0<h<h_0$. Since $\hat s_0-s_0=O_{\mathbb P}((nh)^{-1/2})$ and $nh\to\infty$, the denominator satisfies $\hat s_0\geq c_s/2$ with probability tending to $1$. On any bounded convex subset of the half-plane $\{(q,s):s\geq c_s/2\}$, the quotient map $(q,s)\mapsto q/s$ is continuously differentiable with bounded derivative. Combined with the $O_{\mathbb P}((nh)^{-1/2})$ bounds on $\hat q_0-q_0$ and $\hat s_0-s_0$ from the preceding step, the finite-dimensional [delta method](/theorems/1861), equivalently the [mean value theorem](/theorems/186) on such a subset, gives
\begin{align*}
\hat m_{0,h}(a)-m_{0,h,\mathrm{pop}}(a)
=
O_{\mathbb P}((nh)^{-1/2}).
\end{align*}
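As a check on this step, the quotient difference can also be expanded exactly. The sketch below assumes that the preceding step gives $\hat q_0-q_0=O_{\mathbb P}((nh)^{-1/2})$ and that $m_{0,h,\mathrm{pop}}(a)=q_0/s_0$ stays bounded for $0<h<h_0$; the constant $C_m$ is introduced here only for illustration and denotes such a bound.
\begin{align*}
\frac{\hat q_0}{\hat s_0}-\frac{q_0}{s_0}
&=
\frac{\hat q_0-q_0}{\hat s_0}
-
\frac{q_0}{s_0}\cdot\frac{\hat s_0-s_0}{\hat s_0},
\\
\left|\hat m_{0,h}(a)-m_{0,h,\mathrm{pop}}(a)\right|
&\leq
\frac{2}{c_s}\left(|\hat q_0-q_0|+C_m|\hat s_0-s_0|\right)
\qquad\text{on the event }\hat s_0\geq c_s/2.
\end{align*}
Under these assumptions both terms on the right are $O_{\mathbb P}((nh)^{-1/2})$, and the event has probability tending to $1$, so the same rate follows without appealing to differentiability.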
For local linear regression, define $\hat G_h\in\mathbb R^{2\times2}$ by $(\hat G_h)_{jk}=\hat s_{j+k}$ for $0\leq j,k\leq1$, and define $\hat r_h\in\mathbb R^2$ by
\begin{align*}
\hat r_h=(\hat q_0,\hat q_1)^\top.
\end{align*}
Define $G_h\in\mathbb R^{2\times2}$ by $(G_h)_{jk}=s_{j+k}$ for $0\leq j,k\leq1$, and define $r_h\in\mathbb R^2$ by
\begin{align*}
r_h=(q_0,q_1)^\top.
\end{align*}
The rescaled empirical local linear coefficients satisfy
\begin{align*}
\hat G_h(\hat\gamma_{0,h},\hat\gamma_{1,h})^\top=\hat r_h,
\end{align*}
and the population coefficients satisfy
\begin{align*}
G_h(\gamma_{0,h},\gamma_{1,h})^\top=r_h.
\end{align*}
From the preceding step,
\begin{align*}
\|\hat G_h-G_h\|_{\mathrm{op}}
=
O_{\mathbb P}((nh)^{-1/2}).
\end{align*}
Similarly,
\begin{align*}
|\hat r_h-r_h|
=
O_{\mathbb P}((nh)^{-1/2}).
\end{align*}
Moreover, $G_h\to f_X(a)M^+$. Since $f_X(a)M^+$ is nonsingular, its smallest singular value is positive, so there are constants $h_1>0$ and $c_G>0$ such that the smallest singular value of $G_h$ is at least $c_G$ for $0<h<h_1$. Writing $\sigma_{\min}$ for the smallest singular value, Weyl's inequality gives $\sigma_{\min}(\hat G_h)\geq\sigma_{\min}(G_h)-\|\hat G_h-G_h\|_{\mathrm{op}}$. Hence the perturbation bound $\|\hat G_h-G_h\|_{\mathrm{op}}=O_{\mathbb P}((nh)^{-1/2})$ and $nh\to\infty$ imply that the smallest singular value of $\hat G_h$ is at least $c_G/2$ with probability tending to $1$. Therefore $\|G_h^{-1}\|_{\mathrm{op}}\leq c_G^{-1}$ for $0<h<h_1$, and $\|\hat G_h^{-1}\|_{\mathrm{op}}\leq 2c_G^{-1}$ with probability tending to $1$. On this event,
\begin{align*}
(\hat\gamma_{0,h},\hat\gamma_{1,h})^\top
-
(\gamma_{0,h},\gamma_{1,h})^\top
=
\hat G_h^{-1}(\hat r_h-r_h)
+
\hat G_h^{-1}(G_h-\hat G_h)(\gamma_{0,h},\gamma_{1,h})^\top.
\end{align*}
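This identity can be verified directly from the two linear systems. Writing $\hat\gamma_h=(\hat\gamma_{0,h},\hat\gamma_{1,h})^\top$ and $\gamma_h=(\gamma_{0,h},\gamma_{1,h})^\top$ (shorthand introduced here only for brevity), and using $\hat G_h\hat\gamma_h=\hat r_h$ and $G_h\gamma_h=r_h$,
\begin{align*}
\hat G_h(\hat\gamma_h-\gamma_h)
&=
\hat r_h-\hat G_h\gamma_h
\\
&=
(\hat r_h-r_h)+(G_h\gamma_h-\hat G_h\gamma_h)
\\
&=
(\hat r_h-r_h)+(G_h-\hat G_h)\gamma_h.
\end{align*}
Multiplying on the left by $\hat G_h^{-1}$, which exists on the event above, gives the stated decomposition.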
The population coefficient vector is bounded, because it converges to $(m(a),0)^\top$ in the rescaled coordinates. Hence
\begin{align*}
(\hat\gamma_{0,h},\hat\gamma_{1,h})^\top
-
(\gamma_{0,h},\gamma_{1,h})^\top
=
O_{\mathbb P}((nh)^{-1/2}).
\end{align*}
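The rate can be read off from a norm bound. As a sketch, work on the event where $\|\hat G_h^{-1}\|_{\mathrm{op}}\leq 2c_G^{-1}$, and let $C_\gamma$ be a constant, introduced here only for illustration, that bounds $|(\gamma_{0,h},\gamma_{1,h})^\top|$ for small $h$. Then
\begin{align*}
\left|(\hat\gamma_{0,h},\hat\gamma_{1,h})^\top-(\gamma_{0,h},\gamma_{1,h})^\top\right|
&\leq
\|\hat G_h^{-1}\|_{\mathrm{op}}
\left(|\hat r_h-r_h|+\|\hat G_h-G_h\|_{\mathrm{op}}\,\left|(\gamma_{0,h},\gamma_{1,h})^\top\right|\right)
\\
&\leq
\frac{2}{c_G}\left(|\hat r_h-r_h|+C_\gamma\|\hat G_h-G_h\|_{\mathrm{op}}\right).
\end{align*}
Each term in the last line is $O_{\mathbb P}((nh)^{-1/2})$, and the event has probability tending to $1$.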
Taking first coordinates, and using $\hat m_{1,h}(a)=\hat\gamma_{0,h}$ and $m_{1,h,\mathrm{pop}}(a)=\gamma_{0,h}$, gives
\begin{align*}
\hat m_{1,h}(a)-m_{1,h,\mathrm{pop}}(a)
=
O_{\mathbb P}((nh)^{-1/2}).
\end{align*}
Together with the local constant conclusion, this proves the empirical part and completes the proof of the theorem.
[/step]