[proofplan]
We decompose the mean integrated squared error into the integrated squared bias and the integrated variance. The bias is the kernel average of $f$ minus $f$ itself; the order $s$ moment conditions eliminate all lower-order Taylor terms, and translation continuity of $f^{(s)}$ makes the Taylor remainder small in $L^2(\mathbb{R})$. The variance is computed directly by integrating the pointwise variance of one summand, giving the leading term $R(K)/(nh)$ and a lower-order correction coming from the squared mean. The condition $nh_n\to\infty$ ensures that this integrated variance contribution vanishes along the stated bandwidth sequence.
[/proofplan]
[step:Decompose the MISE into integrated bias and integrated variance]
Let $(\Omega,\mathcal F,\mathbb P)$ denote the probability space on which the i.i.d. real-valued random variables $X_1,\dots,X_n$ are defined. For each $x\in\mathbb{R}$ and each $i\in\{1,\dots,n\}$, define the real-valued [random variable](/page/Random%20Variable) $Y_{h,x,i}:\Omega\to\mathbb{R}$ by
\begin{align*}
Y_{h,x,i}(\omega)=\frac{1}{h}K\left(\frac{x-X_i(\omega)}{h}\right).
\end{align*}
Then
\begin{align*}
\hat f_{n,h}(x)=\frac{1}{n}\sum_{i=1}^n Y_{h,x,i}.
\end{align*}
Define the mean function $m_h:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
m_h(x)=\mathbb{E}[\hat f_{n,h}(x)].
\end{align*}
Since the summands are independent and identically distributed,
\begin{align*}
\operatorname{Var}(\hat f_{n,h}(x))=\frac{1}{n}\operatorname{Var}(Y_{h,x,1}).
\end{align*}
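The factor $1/n$ can be traced through a short computation. The following is a sketch that assumes $Y_{h,x,1}$ has finite variance at the point $x$ under consideration, so that variances of independent summands add:
\begin{align*}
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^n Y_{h,x,i}\right)
=
\frac{1}{n^2}\sum_{i=1}^n\operatorname{Var}(Y_{h,x,i})
=
\frac{1}{n^2}\cdot n\operatorname{Var}(Y_{h,x,1})
=
\frac{1}{n}\operatorname{Var}(Y_{h,x,1}).
\end{align*}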
We use the convention
\begin{align*}
\operatorname{MISE}(\hat f_{n,h},f)
=
\int_{\mathbb{R}}\mathbb{E}\!\left[\left|\hat f_{n,h}(x)-f(x)\right|^2\right]\,dx,
\end{align*}
which agrees with the expectation of the integrated squared error, as an identity in $[0,\infty]$, by Tonelli's theorem applied to the non-negative squared error. Using the identity $\mathbb{E}[(Z-a)^2]=\operatorname{Var}(Z)+(\mathbb{E}[Z]-a)^2$ with $Z=\hat f_{n,h}(x)$ and $a=f(x)$, and integrating over $\mathbb{R}$, we obtain
\begin{align*}
\operatorname{MISE}(\hat f_{n,h},f)
=
\int_{\mathbb{R}} |m_h(x)-f(x)|^2\,dx
+
\int_{\mathbb{R}} \operatorname{Var}(\hat f_{n,h}(x))\,dx.
\end{align*}
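The pointwise identity behind this decomposition can be checked by expanding the square around $\mathbb{E}[Z]$. The following short derivation is a sketch that assumes $Z$ is square integrable and $a\in\mathbb{R}$ is deterministic:
\begin{align*}
\mathbb{E}[(Z-a)^2]
&=
\mathbb{E}\bigl[(Z-\mathbb{E}[Z])^2\bigr]
+2(\mathbb{E}[Z]-a)\,\mathbb{E}\bigl[Z-\mathbb{E}[Z]\bigr]
+(\mathbb{E}[Z]-a)^2\\
&=
\operatorname{Var}(Z)+(\mathbb{E}[Z]-a)^2.
\end{align*}
The cross term vanishes because $\mathbb{E}[Z-\mathbb{E}[Z]]=0$.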
[/step]
[step:Express the bias as a kernel average of translated functions]
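By linearity of expectation and identical distribution of the summands, $m_h(x)=\mathbb{E}[Y_{h,x,1}]$. Because $X_1$ has density $f$ with respect to [Lebesgue measure](/page/Lebesgue%20Measure), for every $x\in\mathbb{R}$,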
\begin{align*}
m_h(x)
&=
\frac{1}{h}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)f(y)\,dy.
\end{align*}
Apply the change of variables $u=(x-y)/h$, equivalently $y=x-hu$; this gives $dy=h\,du$ because $h>0$. Thus
\begin{align*}
m_h(x)
=
\int_{\mathbb{R}}K(u)f(x-hu)\,du.
\end{align*}
Define the bias function $b_h:\mathbb R\to\mathbb R$ by
\begin{align*}
b_h(x)=m_h(x)-f(x),\qquad x\in\mathbb{R}.
\end{align*}
Since $\int_{\mathbb{R}}K(u)\,du=1$, we may write $f(x)=\int_{\mathbb{R}}K(u)f(x)\,du$, and therefore $b_h$ satisfies
\begin{align*}
b_h(x)
=
\int_{\mathbb{R}}K(u)\bigl(f(x-hu)-f(x)\bigr)\,du.
\end{align*}
[/step]
[step:Use the order $s$ moment conditions to isolate the leading bias term]
For $0\le j\le s$, let $f^{(j)}$ denote the $j$th [weak derivative](/page/Weak%20Derivative) of $f$, with $f^{(0)}:=f$. For fixed $u\in\mathbb{R}$ and $0\le k\le s-1$, the map $t\mapsto f^{(k)}(\cdot-tu)$ is absolutely continuous as an $L^2(\mathbb{R})$-valued map on bounded intervals, with derivative $-u f^{(k+1)}(\cdot-tu)$ for almost every $t$. This follows by first checking the identity for smooth compactly supported functions and then passing to functions with weak derivatives through order $s$ in $L^2(\mathbb{R})$ by approximation, using strong continuity of translations on $L^2(\mathbb{R})$. Iterating this Banach-valued [fundamental theorem of calculus](/theorems/632) gives, in $L^2(\mathbb{R})$ for each fixed $u\in\mathbb{R}$,
\begin{align*}
f(\cdot-hu)-f
=
\sum_{j=1}^{s-1}\frac{(-hu)^j}{j!}f^{(j)}
+
\frac{(-hu)^s}{(s-1)!}
\int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta.
\end{align*}
Here the integral in $\theta$ is a Bochner integral in $L^2(\mathbb{R})$. Since translations are isometries on $L^2(\mathbb{R})$, the $L^2$ norm of the remainder integrand is bounded by a constant times $|u|^s\|f^{(s)}\|_{L^2(\mathbb{R})}$. Because $K$ is compactly supported and belongs to $L^2(\mathbb{R})$, the function $u\mapsto |u|^s|K(u)|$ belongs to $L^1(\mathbb{R})$. Thus the resulting $u$-integral is a well-defined Bochner integral in $L^2(\mathbb{R})$. Multiplying by $K(u)$ and integrating over $u$, the moment conditions
\begin{align*}
\int_{\mathbb{R}}u^jK(u)\,du=0,
\qquad 1\le j\le s-1,
\end{align*}
remove all lower-order terms. Therefore
\begin{align*}
b_h
=
\frac{(-1)^sh^s}{(s-1)!}
\int_{\mathbb{R}}u^sK(u)
\int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta\,du.
\end{align*}
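Define the function $a:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
a(x)=\frac{(-1)^s\mu_s(K)}{s!}f^{(s)}(x),
\end{align*}
where $\mu_s(K)=\int_{\mathbb{R}}u^sK(u)\,du$, and define $r_h:\mathbb{R}\to\mathbb{R}$ by the identity $b_h=h^s(a+r_h)$. Since $\int_0^1(1-\theta)^{s-1}\,d\theta=1/s$, subtracting $h^sa$ from the formula for $b_h$ gives explicitly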
\begin{align*}
r_h
=
\frac{(-1)^s}{(s-1)!}
\int_{\mathbb{R}}u^sK(u)
\int_0^1(1-\theta)^{s-1}
\bigl(f^{(s)}(\cdot-\theta hu)-f^{(s)}\bigr)
\,d\theta\,du.
\end{align*}
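As an illustration, and assuming for this sketch only that $s=2$ (the general argument does not depend on this choice), the Taylor formula for fixed $u\in\mathbb{R}$ reads
\begin{align*}
f(\cdot-hu)-f
=
-hu\,f'
+
h^2u^2\int_0^1(1-\theta)f''(\cdot-\theta hu)\,d\theta.
\end{align*}
In this case the single moment condition $\int_{\mathbb{R}}uK(u)\,du=0$ removes the first-order term, and the decomposition of the bias becomes
\begin{align*}
b_h=h^2\left(\frac{\mu_2(K)}{2}f''+r_h\right),
\qquad
r_h=\int_{\mathbb{R}}u^2K(u)\int_0^1(1-\theta)\bigl(f''(\cdot-\theta hu)-f''\bigr)\,d\theta\,du.
\end{align*}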
[guided]
The bias is where the order of the kernel is used. We want to compare the translated function $f(\cdot-hu)$ to $f$ in a way that keeps the dependence on $u$ visible, because the kernel moments are integrals in the variable $u$.
For each fixed $u\in\mathbb{R}$, the Sobolev Taylor formula for translations gives the following identity in $L^2(\mathbb{R})$; it is obtained by applying the fundamental theorem of calculus for Sobolev functions iteratively to the $L^2(\mathbb{R})$-valued translation path $t\mapsto f(\cdot-tu)$:
\begin{align*}
f(\cdot-hu)-f
=
\sum_{j=1}^{s-1}\frac{(-hu)^j}{j!}f^{(j)}
+
\frac{(-hu)^s}{(s-1)!}
\int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta.
\end{align*}
The hypotheses needed here are exactly that $f$ has weak derivatives through order $s$ in $L^2(\mathbb{R})$. More explicitly, for $0\le k\le s-1$ the translation path $t\mapsto f^{(k)}(\cdot-tu)$ is absolutely continuous in $L^2(\mathbb{R})$ on bounded intervals, and its derivative is $-u f^{(k+1)}(\cdot-tu)$ for almost every $t$. This is obtained by approximation from smooth functions and the strong continuity of translations on $L^2(\mathbb{R})$. Iterating the Banach-valued fundamental theorem of calculus gives the displayed Taylor formula. The integral over $\theta$ is interpreted as an $L^2(\mathbb{R})$-valued integral. Because $K$ is compactly supported and belongs to $L^2(\mathbb{R})$, both $K$ and $u^sK(u)$ belong to $L^1(\mathbb{R})$; translations preserve the $L^2$ norm of $f^{(s)}$, so the Taylor remainder is integrable in $u$ as a Bochner integral. The displayed $L^2(\mathbb{R})$-valued Taylor identity may therefore be multiplied by $K(u)$ and integrated in $u$.
Substitute this expansion into
\begin{align*}
b_h
=
\int_{\mathbb{R}}K(u)\bigl(f(\cdot-hu)-f\bigr)\,du.
\end{align*}
For $1\le j\le s-1$, the coefficient of $f^{(j)}$ is
\begin{align*}
\frac{(-h)^j}{j!}\int_{\mathbb{R}}u^jK(u)\,du,
\end{align*}
and this is zero by the order $s$ moment condition. This is the reason an order $s$ kernel has bias of size $h^s$ rather than size $h$.
After the lower-order terms vanish, only the integral remainder remains:
\begin{align*}
b_h
=
\frac{(-1)^sh^s}{(s-1)!}
\int_{\mathbb{R}}u^sK(u)
\int_0^1(1-\theta)^{s-1}f^{(s)}(\cdot-\theta hu)\,d\theta\,du.
\end{align*}
To identify the leading term, add and subtract $f^{(s)}$ inside the inner integral. Since
\begin{align*}
\int_0^1(1-\theta)^{s-1}\,d\theta=\frac{1}{s},
\end{align*}
the part containing $f^{(s)}$ equals
\begin{align*}
\frac{(-1)^sh^s}{(s-1)!}
\left(\int_{\mathbb{R}}u^sK(u)\,du\right)
\left(\frac{1}{s}\right)f^{(s)}
=
h^s\frac{(-1)^s\mu_s(K)}{s!}f^{(s)}.
\end{align*}
Thus, with $a:\mathbb{R}\to\mathbb{R}$ defined by
\begin{align*}
a(x)=\frac{(-1)^s\mu_s(K)}{s!}f^{(s)}(x),
\end{align*}
we may write $b_h=h^s(a+r_h)$, where
\begin{align*}
r_h
=
\frac{(-1)^s}{(s-1)!}
\int_{\mathbb{R}}u^sK(u)
\int_0^1(1-\theta)^{s-1}
\bigl(f^{(s)}(\cdot-\theta hu)-f^{(s)}\bigr)
\,d\theta\,du.
\end{align*}
This separates the deterministic leading bias from the translation-continuity remainder.
It remains to show that the remainder is genuinely smaller than the leading bias scale. The compact support of $K$ gives a number $A>0$ with $\operatorname{supp}K\subset[-A,A]$. Given $\varepsilon>0$, choose $\delta>0$ so that $|t|<\delta$ implies $\|f^{(s)}(\cdot-t)-f^{(s)}\|_{L^2(\mathbb{R})}<\varepsilon$. For $h<\delta/A$, every $u\in[-A,A]$ and $\theta\in[0,1]$ satisfies $|\theta hu|<\delta$. Hence the assumed $L^2$ translation continuity of $f^{(s)}$ gives, as $h\to0$,
\begin{align*}
\sup_{\substack{|u|\le A,\;0\le\theta\le1}}
\|f^{(s)}(\cdot-\theta hu)-f^{(s)}\|_{L^2(\mathbb{R})}
\to0.
\end{align*}
[Minkowski's integral inequality](/theorems/464) then bounds $\|r_h\|_{L^2(\mathbb{R})}$ by this supremum times the finite constant
\begin{align*}
\frac{1}{(s-1)!}
\int_{\mathbb{R}}|u|^s|K(u)|
\int_0^1(1-\theta)^{s-1}\,d\theta\,du,
\end{align*}
so $\|r_h\|_{L^2(\mathbb{R})}\to0$. Thus the guided computation is a complete proof of the $L^2$ bias expansion.
[/guided]
[/step]
[step:Show the Taylor remainder is small in $L^2(\mathbb{R})$]
Let $A>0$ be such that $\operatorname{supp}K\subset[-A,A]$. Since $K$ is compactly supported and measurable with $K\in L^2(\mathbb{R})$, the function $u\mapsto |u|^s|K(u)|$ belongs to $L^1(\mathbb{R})$. By Minkowski's integral inequality for Bochner integrals applied in the [Banach space](/page/Banach%20Space) $L^2(\mathbb{R})$,
\begin{align*}
\|r_h\|_{L^2(\mathbb{R})}
&\le
\frac{1}{(s-1)!}
\int_{\mathbb{R}}|u|^s|K(u)|
\int_0^1(1-\theta)^{s-1}
\|f^{(s)}(\cdot-\theta hu)-f^{(s)}\|_{L^2(\mathbb{R})}
\,d\theta\,du.
\end{align*}
For $u\in[-A,A]$ and $\theta\in[0,1]$, the translation parameter $\theta hu$ tends to $0$ uniformly as $h\to0$. Indeed, given $\varepsilon>0$, translation continuity of $f^{(s)}$ gives $\delta>0$ such that
\begin{align*}
|t|<\delta
\quad\Longrightarrow\quad
\|f^{(s)}(\cdot-t)-f^{(s)}\|_{L^2(\mathbb{R})}<\varepsilon.
\end{align*}
If $h<\delta/A$, then $|\theta hu|<\delta$ for every $|u|\le A$ and every $\theta\in[0,1]$. Hence, as $h\to0$,
\begin{align*}
\sup_{\substack{|u|\le A,\;0\le \theta\le 1}}
\|f^{(s)}(\cdot-\theta hu)-f^{(s)}\|_{L^2(\mathbb{R})}
\to 0.
\end{align*}
Therefore $\|r_h\|_{L^2(\mathbb{R})}\to0$. Since $b_h=h^s(a+r_h)$, we get
\begin{align*}
\int_{\mathbb{R}}|b_h(x)|^2\,dx=h^{2s}\|a+r_h\|_{L^2(\mathbb{R})}^2.
\end{align*}
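Since $\|r_h\|_{L^2(\mathbb{R})}\to0$, the reverse triangle inequality $\bigl|\|a+r_h\|_{L^2(\mathbb{R})}-\|a\|_{L^2(\mathbb{R})}\bigr|\le\|r_h\|_{L^2(\mathbb{R})}$ shows that $\|a+r_h\|_{L^2(\mathbb{R})}^2\to\|a\|_{L^2(\mathbb{R})}^2$, and this gives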
\begin{align*}
\int_{\mathbb{R}}|b_h(x)|^2\,dx=h^{2s}\|a\|_{L^2(\mathbb{R})}^2+o(h^{2s}).
\end{align*}
Using the formula for $a$, we obtain
\begin{align*}
\int_{\mathbb{R}}|b_h(x)|^2\,dx=\frac{h^{2s}\mu_s(K)^2}{(s!)^2}\|f^{(s)}\|_{L^2(\mathbb{R})}^2+o(h^{2s}).
\end{align*}
[/step]
[step:Compute the integrated variance]
For each $x\in\mathbb{R}$,
\begin{align*}
\operatorname{Var}(\hat f_{n,h}(x))
=
\frac{1}{n}
\left(
\mathbb{E}[Y_{h,x,1}^2]-m_h(x)^2
\right).
\end{align*}
First compute the integral of $\mathbb{E}[Y_{h,x,1}^2]$. Since $X_1$ has density $f$,
\begin{align*}
\mathbb{E}[Y_{h,x,1}^2]
=
\frac{1}{h^2}
\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)^2 f(y)\,dy.
\end{align*}
The integrand $K((x-y)/h)^2f(y)$ is non-negative because $f$ is a density, so Tonelli's theorem justifies interchanging the $x$ and $y$ integrations. Integrating over $x$ gives
\begin{align*}
\int_{\mathbb{R}}\mathbb{E}[Y_{h,x,1}^2]\,dx=\frac{1}{h^2}\int_{\mathbb{R}}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)^2 f(y)\,dy\,dx.
\end{align*}
Interchanging the order of integration and substituting $u=(x-y)/h$ in the inner $x$-integral, so that $dx=h\,du$, this becomes
\begin{align*}
\int_{\mathbb{R}}\mathbb{E}[Y_{h,x,1}^2]\,dx=\frac{1}{h}\int_{\mathbb{R}}f(y)\,dy\int_{\mathbb{R}}K(u)^2\,du.
\end{align*}
Because $f$ is a probability density, the last display equals $R(K)/h$.
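The steps above can be written as one chain. This is only a sketch of the computation already described, with the inner $x$-integral evaluated first for each fixed $y$ and with $R(K)=\int_{\mathbb{R}}K(u)^2\,du$:
\begin{align*}
\frac{1}{h^2}\int_{\mathbb{R}}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)^2 f(y)\,dy\,dx
&=
\frac{1}{h^2}\int_{\mathbb{R}}f(y)\left(\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)^2dx\right)dy\\
&=
\frac{1}{h^2}\int_{\mathbb{R}}f(y)\left(h\int_{\mathbb{R}}K(u)^2\,du\right)dy\\
&=
\frac{R(K)}{h}\int_{\mathbb{R}}f(y)\,dy
=
\frac{R(K)}{h}.
\end{align*}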
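Next, write $m_h=K_h*f$ with $K_h(x)=h^{-1}K(x/h)$, so that $\|K_h\|_{L^1(\mathbb{R})}=\|K\|_{L^1(\mathbb{R})}$ by the substitution $u=x/h$. By [Young's convolution inequality](/theorems/463) in the case $L^2(\mathbb{R})*L^1(\mathbb{R})\to L^2(\mathbb{R})$, applied to this convolution representation of $m_h$,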
\begin{align*}
\|m_h\|_{L^2(\mathbb{R})}
\le
\|f\|_{L^2(\mathbb{R})}\|K\|_{L^1(\mathbb{R})}.
\end{align*}
The right-hand side is finite because $f\in L^2(\mathbb{R})$ and $K$ is compactly supported with $K\in L^2(\mathbb{R})$, hence $K\in L^1(\mathbb{R})$. Consequently,
\begin{align*}
\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,dx
=
O\left(\frac{1}{n}\right)
=
o\left(\frac{1}{nh}\right),
\end{align*}
because $h\to0$. Integrating the pointwise variance formula over $x$ gives
\begin{align*}
\int_{\mathbb{R}}\operatorname{Var}(\hat f_{n,h}(x))\,dx=\frac{1}{n}\int_{\mathbb{R}}\mathbb{E}[Y_{h,x,1}^2]\,dx-\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,dx.
\end{align*}
Combining the two estimates above,
\begin{align*}
\int_{\mathbb{R}}\operatorname{Var}(\hat f_{n,h}(x))\,dx=\frac{R(K)}{nh}+o\left(\frac{1}{nh}\right).
\end{align*}
Since $nh_n\to\infty$, this integrated variance contribution tends to $0$ along the stated bandwidth sequence.
[/step]
[step:Combine the bias and variance expansions]
From the bias estimate,
\begin{align*}
\int_{\mathbb{R}}|m_h(x)-f(x)|^2\,dx
=
\frac{h^{2s}\mu_s(K)^2}{(s!)^2}
\|f^{(s)}\|_{L^2(\mathbb{R})}^2
+
o(h^{2s}).
\end{align*}
From the variance estimate,
\begin{align*}
\int_{\mathbb{R}}\operatorname{Var}(\hat f_{n,h}(x))\,dx
=
\frac{R(K)}{nh}
+
o\left(\frac{1}{nh}\right).
\end{align*}
Substituting these two expansions into the MISE decomposition gives
\begin{align*}
\operatorname{MISE}(\hat f_{n,h},f)
=
\frac{h^{2s}\mu_s(K)^2}{(s!)^2}
\|f^{(s)}\|_{L^2(\mathbb{R})}^2
+
\frac{R(K)}{nh}
+
o(h^{2s})
+
o\left(\frac{1}{nh}\right).
\end{align*}
This is the claimed expansion.
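As a concrete instance, assume for illustration only that $s=2$ and that $K$ is the Epanechnikov kernel $K(u)=\tfrac34(1-u^2)$ for $|u|\le1$ and $K(u)=0$ otherwise; this choice of kernel is not part of the statement and is used here only as a sketch. It is compactly supported, integrates to $1$, and has vanishing first moment by symmetry. Its constants are
\begin{align*}
\mu_2(K)
&=
\frac34\int_{-1}^{1}(u^2-u^4)\,du
=
\frac34\left(\frac23-\frac25\right)
=
\frac15,\\
R(K)
&=
\frac{9}{16}\int_{-1}^{1}(1-2u^2+u^4)\,du
=
\frac{9}{16}\left(2-\frac43+\frac25\right)
=
\frac35.
\end{align*}
Under these assumptions the expansion reads
\begin{align*}
\operatorname{MISE}(\hat f_{n,h},f)
=
\frac{h^{4}}{100}\|f''\|_{L^2(\mathbb{R})}^2
+
\frac{3}{5nh}
+
o(h^{4})
+
o\left(\frac{1}{nh}\right).
\end{align*}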
[/step]