[proofplan]
We decompose the mean integrated squared error into the integrated squared bias plus the integrated variance. The bias is expanded by the second-order Taylor formula with integral remainder: symmetry of $K$ cancels the first-order term, and the stated $L^2$ remainder assumption makes the integrated squared bias equal to $\frac{h^4\mu_2(K)^2}{4}R(f'')+o(h^4)$. The variance is computed directly from independence; its leading integrated contribution is $\frac{R(K)}{nh}$, while the centering correction is $O(1/n)=o(1/(nh))$. Combining the two expansions gives the stated AMISE formula together with the remainder $o\left(\frac{1}{nh}+h^4\right)$.
[/proofplan]
[step:Decompose the risk into integrated squared bias and integrated variance]
Let $(\Omega,\mathcal F,\mathbb P)$, the sample $X_1,\dots,X_n:\Omega\to\mathbb R$, and the estimator $\hat f_h:\mathbb R\times\Omega\to\mathbb R$ be as in the theorem statement. For fixed $x\in\mathbb R$, write $\hat f_h(x):\Omega\to\mathbb R$ for the [random variable](/page/Random%20Variable) $\omega\mapsto\hat f_h(x,\omega)$. For $h>0$, define the mean estimator $m_h: \mathbb{R}\to\mathbb{R}$ by $m_h(x)=\mathbb{E}[\hat f_h(x)]$. Define the pointwise bias $b_h: \mathbb{R}\to\mathbb{R}$ by $b_h(x)=m_h(x)-f(x)$. For a real-valued random variable $Z:\Omega\to\mathbb R$ with finite second moment, write $\operatorname{Var}(Z)=\mathbb E[(Z-\mathbb E[Z])^2]$ for its variance. For each $x\in\mathbb{R}$, the identity
\begin{align*}
\mathbb{E}\left[\left(\hat f_h(x)-f(x)\right)^2\right]
=
b_h(x)^2+\operatorname{Var}(\hat f_h(x))
\end{align*}
follows by expanding around $m_h(x)$. Integrating over $\mathbb{R}$ gives
\begin{align*}
\operatorname{MISE}(h)
=
\int_{\mathbb{R}} b_h(x)^2\,d\mathcal{L}^1(x)
+
\int_{\mathbb{R}} \operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x),
\end{align*}
where the interchange of expectation and integration is justified by Tonelli's theorem applied to the nonnegative measurable function $(\omega,x)\mapsto (\hat f_h(x,\omega)-f(x))^2$. The steps below show that both terms on the right-hand side are finite: the integrated variance equals $\frac{R(K)}{nh}$ minus a finite centering correction, and the integrated squared bias is finite along the bandwidth sequence under the stated $L^2$ remainder assumption.
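As a sketch of the expansion behind the pointwise identity, assume that $\hat f_h(x)$ has finite second moment and write $\hat f_h(x)-f(x)=(\hat f_h(x)-m_h(x))+b_h(x)$. Then
\begin{align*}
\mathbb{E}\left[\left(\hat f_h(x)-f(x)\right)^2\right]
&=
\mathbb{E}\left[\left(\hat f_h(x)-m_h(x)\right)^2\right]
+2b_h(x)\,\mathbb{E}\left[\hat f_h(x)-m_h(x)\right]
+b_h(x)^2\\
&=
\operatorname{Var}(\hat f_h(x))+b_h(x)^2,
\end{align*}
because $b_h(x)$ is deterministic and $\mathbb{E}[\hat f_h(x)-m_h(x)]=0$ by the definition of $m_h$.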
[/step]
[step:Expand the bias and isolate the second derivative term]
For Lebesgue-a.e. $x\in\mathbb{R}$, the expectation of the estimator is
\begin{align*}
m_h(x)=\mathbb{E}\left[\frac{1}{h}K\left(\frac{x-X_1}{h}\right)\right].
\end{align*}
Since $X_1$ has density $f$ with respect to $\mathcal L^1$,
\begin{align*}
m_h(x)=\frac{1}{h}\int_{\mathbb{R}} K\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y).
\end{align*}
Using the substitution $u=(x-y)/h$, so that $y=x-hu$ and $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$, we obtain, for Lebesgue-a.e. $x$,
\begin{align*}
m_h(x)
=
\int_{\mathbb{R}} K(u)f(x-hu)\,d\mathcal{L}^1(u).
\end{align*}
Consequently, since $\int_{\mathbb{R}}K(u)\,d\mathcal{L}^1(u)=1$, for Lebesgue-a.e. $x$,
\begin{align*}
b_h(x)
=
\int_{\mathbb{R}} K(u)\{f(x-hu)-f(x)\}\,d\mathcal{L}^1(u).
\end{align*}
Because $K\in L^1(\mathbb{R})$ and $\int_{\mathbb{R}} |u|^2|K(u)|\,d\mathcal{L}^1(u)<\infty$, the elementary bound $|u|\le 1+|u|^2$ shows that the first moment $\int_{\mathbb{R}} |u||K(u)|\,d\mathcal{L}^1(u)$ is finite. Since $K$ is symmetric, the function $u\mapsto uK(u)$ is odd and integrable, hence
\begin{align*}
\int_{\mathbb{R}} uK(u)\,d\mathcal{L}^1(u)=0.
\end{align*}
For almost every pair $(x,u)$, the second-order Taylor formula with integral remainder gives
\begin{align*}
f(x-hu)-f(x)=-hu f'(x)+h^2u^2\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t),
\end{align*}
where the formula applies because $f'$ is absolutely continuous on compact intervals and $f''$ is its almost-everywhere derivative. We fix a measurable representative of $f''$. Then $(x,u,t)\mapsto f''(x-thu)$ is measurable, and for each fixed $(x,u)$ the $t$-integral is finite because $f''\in L^1_{\mathrm{loc}}(\mathbb R)$. The subsequent $u$-integral defining the remainder is interpreted as the measurable $L^2(\mathbb R)$ function whose existence and convergence are asserted by the stated remainder assumption. Substituting the expansion into the integral formula for $b_h$ gives
\begin{align*}
b_h(x)=-hf'(x)\int_{\mathbb{R}}uK(u)\,d\mathcal{L}^1(u)+h^2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u).
\end{align*}
Using $\int_{\mathbb R}uK(u)\,d\mathcal L^1(u)=0$, we obtain
\begin{align*}
b_h(x)=h^2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u).
\end{align*}
Define the bias remainder $\rho_h:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
\rho_h(x)=2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)\left\{f''(x-thu)-f''(x)\right\}\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u).
\end{align*}
Since $\mu_2(K)=\int_{\mathbb{R}}u^2K(u)\,d\mathcal{L}^1(u)$ and $\int_0^1(1-t)\,d\mathcal L^1(t)=1/2$, it follows that
\begin{align*}
b_h(x)=\frac{h^2}{2}\left(\mu_2(K)f''(x)+\rho_h(x)\right).
\end{align*}
[guided]
The purpose of this step is to find the leading deterministic error in the estimator. We compute its expectation as an identity holding for Lebesgue-a.e. $x$. Define $m_h:\mathbb{R}\to\mathbb{R}$ by $m_h(x)=\mathbb{E}[\hat f_h(x)]$ where this expectation is finite, choosing any measurable representative elsewhere. Since the $X_i$ have common density $f$, linearity of expectation gives, for Lebesgue-a.e. $x$,
\begin{align*}
m_h(x)=\mathbb{E}\left[\frac{1}{h}K\left(\frac{x-X_1}{h}\right)\right].
\end{align*}
The density formula for $X_1$ gives
\begin{align*}
m_h(x)=\frac{1}{h}\int_{\mathbb{R}}K\left(\frac{x-y}{h}\right)f(y)\,d\mathcal{L}^1(y).
\end{align*}
Now make the change of variables $u=(x-y)/h$. Then $y=x-hu$, and one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) transforms as $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$. Thus, for Lebesgue-a.e. $x$,
\begin{align*}
m_h(x)
=
\int_{\mathbb{R}}K(u)f(x-hu)\,d\mathcal{L}^1(u).
\end{align*}
The bias is therefore, for Lebesgue-a.e. $x$,
\begin{align*}
b_h(x)
=
m_h(x)-f(x)
=
\int_{\mathbb{R}}K(u)\{f(x-hu)-f(x)\}\,d\mathcal{L}^1(u),
\end{align*}
using $\int_{\mathbb{R}}K(u)\,d\mathcal{L}^1(u)=1$.
We now expand $f(x-hu)$ around $x$. The absolute continuity of $f'$ on compact intervals permits the second-order Taylor formula with integral remainder:
\begin{align*}
f(x-hu)-f(x)=-hu f'(x)+h^2u^2\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t).
\end{align*}
This is the correct remainder form under the stated hypotheses: $f''$ is only an $L^2$ function, so point evaluation of $f''$ at a Lagrange remainder point need not be meaningful. We fix a measurable representative of $f''$; then $(x,u,t)\mapsto f''(x-thu)$ is measurable, and the $t$-integral is finite for each fixed $(x,u)$ because $f''\in L^1_{\mathrm{loc}}(\mathbb R)$. The theorem's remainder hypothesis is precisely the assertion that the weighted $u$-integral obtained below defines an $L^2(\mathbb R)$ function tending to zero. Substituting the integral remainder into the bias gives
\begin{align*}
b_h(x)=-hf'(x)\int_{\mathbb{R}}uK(u)\,d\mathcal{L}^1(u)+h^2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)f''(x-thu)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u).
\end{align*}
The first term disappears because $K$ is symmetric. Indeed, $u\mapsto uK(u)$ is odd, and it is integrable because $|u|\le 1+|u|^2$, $K\in L^1(\mathbb{R})$, and $\int_{\mathbb{R}}|u|^2|K(u)|\,d\mathcal{L}^1(u)<\infty$. Hence
\begin{align*}
\int_{\mathbb{R}}uK(u)\,d\mathcal{L}^1(u)=0.
\end{align*}
This cancellation is exactly why the leading bias of a symmetric second-order kernel is quadratic in $h$, rather than linear in $h$.
To separate the main term from the Taylor error, define $\rho_h:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
\rho_h(x)=2\int_{\mathbb{R}}u^2K(u)\int_0^1(1-t)\left\{f''(x-thu)-f''(x)\right\}\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(u).
\end{align*}
Since
\begin{align*}
\mu_2(K)=\int_{\mathbb{R}}u^2K(u)\,d\mathcal{L}^1(u)
\end{align*}
and $\int_0^1(1-t)\,d\mathcal L^1(t)=1/2$, we obtain the exact decomposition
\begin{align*}
b_h(x)=\frac{h^2}{2}\left(\mu_2(K)f''(x)+\rho_h(x)\right).
\end{align*}
This formula contains both the main AMISE bias term and the remainder term controlled by the hypothesis.
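For illustration only, and under assumptions that are not part of the theorem, take $K=\varphi$ to be the standard normal density and $f=\varphi_\sigma$ to be the $N(0,\sigma^2)$ density; the symbols $\varphi$, $\varphi_\sigma$, and $g$ are introduced here just for this example. Then $\mu_2(K)=1$, and since the convolution of centered normal densities is normal with the variances added,
\begin{align*}
m_h(x)=\int_{\mathbb{R}}\varphi(u)\varphi_\sigma(x-hu)\,d\mathcal{L}^1(u)=\varphi_{\sqrt{\sigma^2+h^2}}(x).
\end{align*}
Write $g(v,x)=(2\pi v)^{-1/2}e^{-x^2/(2v)}$ for the $N(0,v)$ density, which satisfies $\partial_v g=\tfrac12\partial_x^2 g$. A first-order Taylor expansion in $v$ at $v=\sigma^2$ then gives, for each fixed $x$,
\begin{align*}
b_h(x)=g(\sigma^2+h^2,x)-g(\sigma^2,x)=\frac{h^2}{2}f''(x)+O(h^4),
\end{align*}
which agrees with the decomposition above when $\mu_2(K)=1$.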
[/guided]
[/step]
[step:Integrate the squared bias and use the $L^2$ remainder assumption]
From the previous step,
\begin{align*}
\int_{\mathbb{R}}b_h(x)^2\,d\mathcal{L}^1(x)
=
\frac{h^4}{4}
\int_{\mathbb{R}}
\left(\mu_2(K)f''(x)+\rho_h(x)\right)^2
\,d\mathcal{L}^1(x).
\end{align*}
Expanding the square gives
\begin{align*}
\int_{\mathbb{R}}b_h(x)^2\,d\mathcal{L}^1(x)=\frac{h^4\mu_2(K)^2}{4}\int_{\mathbb{R}}f''(x)^2\,d\mathcal{L}^1(x)+\frac{h^4\mu_2(K)}{2}\int_{\mathbb{R}}f''(x)\rho_h(x)\,d\mathcal{L}^1(x)+\frac{h^4}{4}\int_{\mathbb{R}}\rho_h(x)^2\,d\mathcal{L}^1(x).
\end{align*}
By hypothesis,
\begin{align*}
\|\rho_h\|_{L^2(\mathbb{R})}^2
=
\int_{\mathbb{R}}\rho_h(x)^2\,d\mathcal{L}^1(x)
\to0.
\end{align*}
Since $f''\in L^2(\mathbb{R})$, the [Cauchy-Schwarz inequality](/theorems/432) gives
\begin{align*}
\left|
\int_{\mathbb{R}}f''(x)\rho_h(x)\,d\mathcal{L}^1(x)
\right|
\le
\|f''\|_{L^2(\mathbb{R})}\|\rho_h\|_{L^2(\mathbb{R})}
\to0.
\end{align*}
Therefore the cross term and the squared-remainder term in the expansion are both $o(h^4)$, and
\begin{align*}
\int_{\mathbb{R}}b_h(x)^2\,d\mathcal{L}^1(x)
=
\frac{h^4\mu_2(K)^2}{4}R(f'')
+
o(h^4).
\end{align*}
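As a worked instance of the leading constant, assume (beyond the theorem's hypotheses) that $f=\varphi_\sigma$ is the $N(0,\sigma^2)$ density; the symbols $\varphi_\sigma$ and $\psi$ are introduced only for this example. Then $f''(x)=\frac{x^2-\sigma^2}{\sigma^4}\varphi_\sigma(x)$ and $\varphi_\sigma(x)^2=\frac{1}{2\sigma\sqrt{\pi}}\psi(x)$, where $\psi$ is the $N(0,\sigma^2/2)$ density, with second moment $\sigma^2/2$ and fourth moment $3\sigma^4/4$. Hence
\begin{align*}
R(f'')
=
\frac{1}{2\sigma^9\sqrt{\pi}}\int_{\mathbb{R}}(x^2-\sigma^2)^2\psi(x)\,d\mathcal{L}^1(x)
=
\frac{1}{2\sigma^9\sqrt{\pi}}\left(\frac{3\sigma^4}{4}-\sigma^4+\sigma^4\right)
=
\frac{3}{8\sqrt{\pi}\,\sigma^5},
\end{align*}
so in this case the leading integrated squared bias is $\frac{3h^4\mu_2(K)^2}{32\sqrt{\pi}\,\sigma^5}$.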
[/step]
[step:Compute the integrated variance and identify its leading term]
For $x\in\mathbb{R}$, define $Y_{h,x}:\Omega\to\mathbb{R}$ by
\begin{align*}
Y_{h,x}(\omega)=\frac{1}{h}K\left(\frac{x-X_1(\omega)}{h}\right).
\end{align*}
Since the $X_i$ are independent and identically distributed,
\begin{align*}
\operatorname{Var}(\hat f_h(x))=\frac{1}{n}\operatorname{Var}(Y_{h,x})=\frac{1}{n}\mathbb{E}[Y_{h,x}^2]-\frac{1}{n}m_h(x)^2.
\end{align*}
The second moment is
\begin{align*}
\mathbb{E}[Y_{h,x}^2]
=
\frac{1}{h^2}
\int_{\mathbb{R}}
K\left(\frac{x-y}{h}\right)^2 f(y)\,d\mathcal{L}^1(y).
\end{align*}
Again using $u=(x-y)/h$, so that $y=x-hu$ and $d\mathcal{L}^1(y)=h\,d\mathcal{L}^1(u)$,
\begin{align*}
\mathbb{E}[Y_{h,x}^2]
=
\frac{1}{h}
\int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u).
\end{align*}
Integrating over $x$ gives
\begin{align*}
\int_{\mathbb{R}}\mathbb{E}[Y_{h,x}^2]\,d\mathcal{L}^1(x)=\frac{1}{h}\int_{\mathbb{R}}\int_{\mathbb{R}}K(u)^2 f(x-hu)\,d\mathcal{L}^1(u)\,d\mathcal{L}^1(x).
\end{align*}
By Tonelli's theorem, applied to the nonnegative measurable integrand $K(u)^2f(x-hu)$, this equals
\begin{align*}
\frac{1}{h}\int_{\mathbb{R}}K(u)^2\left(\int_{\mathbb{R}}f(x-hu)\,d\mathcal{L}^1(x)\right)d\mathcal{L}^1(u).
\end{align*}
For each fixed $u\in\mathbb{R}$, the translation $z=x-hu$ preserves Lebesgue measure, so
\begin{align*}
\int_{\mathbb{R}}f(x-hu)\,d\mathcal{L}^1(x)
=
\int_{\mathbb{R}}f(z)\,d\mathcal{L}^1(z)
=
1.
\end{align*}
Thus
\begin{align*}
\int_{\mathbb{R}}\mathbb{E}[Y_{h,x}^2]\,d\mathcal{L}^1(x)
=
\frac{R(K)}{h}.
\end{align*}
Consequently
\begin{align*}
\int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x)
=
\frac{R(K)}{nh}
-
\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,d\mathcal{L}^1(x).
\end{align*}
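To make the constant $R(K)=\int_{\mathbb{R}}K(u)^2\,d\mathcal{L}^1(u)$ concrete, here are two standard kernels, chosen only as examples and not singled out by the theorem. For the Gaussian kernel $K(u)=(2\pi)^{-1/2}e^{-u^2/2}$ and the Epanechnikov kernel $K(u)=\frac34(1-u^2)\mathbf{1}_{[-1,1]}(u)$, respectively,
\begin{align*}
R(K)&=\frac{1}{2\pi}\int_{\mathbb{R}}e^{-u^2}\,d\mathcal{L}^1(u)=\frac{1}{2\sqrt{\pi}}\approx 0.2821,\\
R(K)&=\frac{9}{16}\int_{-1}^{1}(1-u^2)^2\,d\mathcal{L}^1(u)=\frac{9}{16}\cdot\frac{16}{15}=\frac{3}{5}.
\end{align*}
The leading integrated variance term is then approximately $0.2821/(nh)$ for the Gaussian kernel and $0.6/(nh)$ for the Epanechnikov kernel.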
[/step]
[step:Show the variance centering correction is negligible]
Define the rescaled kernel $K_h:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
K_h(t)=\frac{1}{h}K\left(\frac{t}{h}\right).
\end{align*}
Then $m_h=K_h*f$, where convolution is taken with respect to $\mathcal{L}^1$. Moreover,
\begin{align*}
\|K_h\|_{L^1(\mathbb{R})}
=
\int_{\mathbb{R}}\frac{1}{h}\left|K\left(\frac{t}{h}\right)\right|\,d\mathcal{L}^1(t)
=
\int_{\mathbb{R}}|K(u)|\,d\mathcal{L}^1(u)
=
\|K\|_{L^1(\mathbb{R})}.
\end{align*}
Since $K\in L^1(\mathbb{R})$ and $f\in L^2(\mathbb{R})$, [Young's convolution inequality](/theorems/463) gives
\begin{align*}
\|m_h\|_{L^2(\mathbb{R})}
=
\|K_h*f\|_{L^2(\mathbb{R})}
\le
\|K_h\|_{L^1(\mathbb{R})}\|f\|_{L^2(\mathbb{R})}
=
\|K\|_{L^1(\mathbb{R})}\|f\|_{L^2(\mathbb{R})},
\end{align*}
where Young's convolution inequality is used in the case $L^1*L^2\subset L^2$. The hypothesis $f\in L^2(\mathbb R)$ in the theorem statement is exactly the condition needed for this bound. Here and below, $A_n=O(1/n)$ means that there are constants $C>0$ and $n_0\in\mathbb N$ such that $|A_n|\le C/n$ for all $n\ge n_0$. Squaring the bound and dividing by $n$ gives
\begin{align*}
0\le\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,d\mathcal{L}^1(x)\le\frac{\|K\|_{L^1(\mathbb{R})}^2\|f\|_{L^2(\mathbb{R})}^2}{n}=O\left(\frac{1}{n}\right).
\end{align*}
Because $h\to0$,
\begin{align*}
\frac{1/n}{1/(nh)}
=
h
\to0,
\end{align*}
so $1/n=o(1/(nh))$. Therefore
\begin{align*}
\int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x)
=
\frac{R(K)}{nh}
+
o\left(\frac{1}{nh}\right).
\end{align*}
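As a sanity check in one explicit case, assume (for illustration only) that $K$ is the standard normal density and $f=\varphi_\sigma$ is the $N(0,\sigma^2)$ density, where $\varphi_s$ denotes the $N(0,s^2)$ density, a notation introduced just for this example. Then $\|K\|_{L^1(\mathbb{R})}=1$, $R(K)=\frac{1}{2\sqrt{\pi}}$, $m_h=\varphi_{\sqrt{\sigma^2+h^2}}$, and $\int_{\mathbb{R}}\varphi_s(x)^2\,d\mathcal{L}^1(x)=\frac{1}{2s\sqrt{\pi}}$, so
\begin{align*}
\frac{1}{n}\int_{\mathbb{R}}m_h(x)^2\,d\mathcal{L}^1(x)
=
\frac{1}{2\sqrt{\pi}\,n\sqrt{\sigma^2+h^2}}
\le
\frac{1}{2\sqrt{\pi}\,n\,\sigma}
=
\frac{\|K\|_{L^1(\mathbb{R})}^2\|f\|_{L^2(\mathbb{R})}^2}{n}.
\end{align*}
The ratio of this correction to the leading term $\frac{R(K)}{nh}$ is $\frac{h}{\sqrt{\sigma^2+h^2}}$, which tends to $0$ as $h\to0$, consistent with the general bound.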
[/step]
[step:Combine the bias and variance expansions]
Using the risk decomposition,
\begin{align*}
\operatorname{MISE}(h)
=
\int_{\mathbb{R}} b_h(x)^2\,d\mathcal{L}^1(x)
+
\int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x).
\end{align*}
The integrated squared bias expansion gives
\begin{align*}
\int_{\mathbb{R}} b_h(x)^2\,d\mathcal{L}^1(x)
=
\frac{h^4\mu_2(K)^2}{4}R(f'')
+
o(h^4),
\end{align*}
and the integrated variance expansion gives
\begin{align*}
\int_{\mathbb{R}}\operatorname{Var}(\hat f_h(x))\,d\mathcal{L}^1(x)
=
\frac{R(K)}{nh}
+
o\left(\frac{1}{nh}\right).
\end{align*}
Therefore
\begin{align*}
\operatorname{MISE}(h)=\frac{R(K)}{nh}+\frac{h^4\mu_2(K)^2}{4}R(f'')+o\left(\frac{1}{nh}\right)+o(h^4).
\end{align*}
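Each of the positive quantities $\frac{1}{nh}$ and $h^4$ is bounded by their sum, so each error term is little-$o$ of the sum. Since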
\begin{align*}
o\left(\frac{1}{nh}\right)+o(h^4)
=
o\left(\frac{1}{nh}+h^4\right),
\end{align*}
we obtain
\begin{align*}
\operatorname{MISE}(h)
=
\operatorname{AMISE}(h)
+
o\left(\frac{1}{nh}+h^4\right),
\end{align*}
with
\begin{align*}
\operatorname{AMISE}(h)
=
\frac{R(K)}{nh}
+
\frac{h^4\mu_2(K)^2}{4}R(f'').
\end{align*}
This is the desired asymptotic mean integrated squared error formula.
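As an illustration of how the formula is typically used, which is not part of the theorem, one can minimize $\operatorname{AMISE}$ over $h>0$; the symbol $h^*$ is introduced here for the minimizer, and we assume $R(f'')>0$. Setting the derivative $-\frac{R(K)}{nh^2}+h^3\mu_2(K)^2R(f'')$ to zero gives
\begin{align*}
h^*=\left(\frac{R(K)}{\mu_2(K)^2R(f'')}\right)^{1/5}n^{-1/5},
\qquad
\operatorname{AMISE}(h^*)=\frac54\,R(K)^{4/5}\left(\mu_2(K)^2R(f'')\right)^{1/5}n^{-4/5}.
\end{align*}
For example, if $K$ is the standard normal density and $f$ is the $N(0,\sigma^2)$ density, then $R(K)=\frac{1}{2\sqrt{\pi}}$, $\mu_2(K)=1$, and $R(f'')=\frac{3}{8\sqrt{\pi}\,\sigma^5}$, so $h^*=(4/3)^{1/5}\sigma n^{-1/5}\approx 1.06\,\sigma n^{-1/5}$. This choice satisfies $h^*\to0$ and $nh^*\to\infty$, so it is compatible with the asymptotic regime used above.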
[/step]