[proofplan]
We decompose the pointwise mean squared error into squared bias plus variance, then integrate this identity over $\mathbb{R}^d$ with respect to [Lebesgue measure](/page/Lebesgue%20Measure). The two assumed bounds immediately control the integrated squared bias and integrated variance. Finally, the bandwidth rate is obtained by balancing the powers $h^{2s}$ and $(n h^d)^{-1}$ and substituting $h_n \asymp n^{-1/(2s+d)}$ into both terms.
[/proofplan]
[step:Decompose the pointwise mean squared error into bias and variance]
Fix $n \in \mathbb{N}$ and $0 < h \le h_0$. For each $x \in \mathbb{R}^d$ such that $\mathbb{E}[|\hat f_{n,h}(x)|^2] < \infty$, define the real-valued [random variable](/page/Random%20Variable) $Y_x: \Omega \to \mathbb{R}$ by
\begin{align*}
Y_x(\omega)=\hat f_{n,h}(\omega,x).
\end{align*}
Then $Y_x - f(x)$ has finite second moment. Using the identity
\begin{align*}
Y_x - f(x)
=
\left(Y_x-\mathbb{E}[Y_x]\right)
+
\left(\mathbb{E}[Y_x]-f(x)\right),
\end{align*}
we square both sides and take expectations. Since $\mathbb{E}[Y_x]-f(x)$ is deterministic, it factors out of the cross term, which gives
\begin{align*}
\mathbb{E}\left[(Y_x-f(x))^2\right]
=
\mathbb{E}\left[
\left(Y_x-\mathbb{E}[Y_x]\right)^2
\right]
+
2\left(\mathbb{E}[Y_x]-f(x)\right)
\mathbb{E}\left[Y_x-\mathbb{E}[Y_x]\right]
+
\left(\mathbb{E}[Y_x]-f(x)\right)^2.
\end{align*}
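Since $\mathbb{E}[Y_x-\mathbb{E}[Y_x]]=0$, the cross term vanishes. The first term is $\operatorname{Var}(Y_x)$ by definition, so, recalling $Y_x=\hat f_{n,h}(x)$, this becomes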
\begin{align*}
\mathbb{E}\left[(\hat f_{n,h}(x)-f(x))^2\right]
=
\operatorname{Var}(\hat f_{n,h}(x))
+
\left(\mathbb{E}[\hat f_{n,h}(x)]-f(x)\right)^2.
\end{align*}
[guided]
Fix $n \in \mathbb{N}$ and $0 < h \le h_0$. The quantity inside the MISE integral is a pointwise mean squared error, so we first prove the usual bias-variance identity at each spatial point $x \in \mathbb{R}^d$.
For every $x \in \mathbb{R}^d$ such that $\mathbb{E}[|\hat f_{n,h}(x)|^2] < \infty$, define the real-valued random variable $Y_x: \Omega \to \mathbb{R}$ by
\begin{align*}
Y_x(\omega)=\hat f_{n,h}(\omega,x).
\end{align*}
The deterministic number $f(x)$ is the target value of the density at $x$, while $\mathbb{E}[Y_x]$ is the mean of the estimator at $x$. We split the estimation error into a centered random fluctuation and a deterministic bias:
\begin{align*}
Y_x - f(x)
=
\left(Y_x-\mathbb{E}[Y_x]\right)
+
\left(\mathbb{E}[Y_x]-f(x)\right).
\end{align*}
Squaring this identity gives
\begin{align*}
(Y_x-f(x))^2
=
\left(Y_x-\mathbb{E}[Y_x]\right)^2
+
2\left(Y_x-\mathbb{E}[Y_x]\right)\left(\mathbb{E}[Y_x]-f(x)\right)
+
\left(\mathbb{E}[Y_x]-f(x)\right)^2.
\end{align*}
Now take expectation. The middle term vanishes because $\mathbb{E}[Y_x]-f(x)$ is deterministic and
\begin{align*}
\mathbb{E}\left[Y_x-\mathbb{E}[Y_x]\right]
=
\mathbb{E}[Y_x]-\mathbb{E}[Y_x]
=
0.
\end{align*}
Therefore
\begin{align*}
\mathbb{E}\left[(Y_x-f(x))^2\right]
=
\mathbb{E}\left[
\left(Y_x-\mathbb{E}[Y_x]\right)^2
\right]
+
\left(\mathbb{E}[Y_x]-f(x)\right)^2.
\end{align*}
By the definition of variance, the first term is $\operatorname{Var}(Y_x)=\operatorname{Var}(\hat f_{n,h}(x))$. Hence
\begin{align*}
\mathbb{E}\left[(\hat f_{n,h}(x)-f(x))^2\right]
=
\operatorname{Var}(\hat f_{n,h}(x))
+
\left(\mathbb{E}[\hat f_{n,h}(x)]-f(x)\right)^2.
\end{align*}
This identity is the exact pointwise mechanism behind the MISE bound: random fluctuation contributes variance, and systematic displacement of the mean contributes squared bias.
[/guided]
[/step]
[step:Integrate the bias-variance decomposition and use the assumed bounds]
Since both sides of the pointwise identity are nonnegative, integrating it over $\mathbb{R}^d$ with respect to $\mathcal{L}^d$ and using additivity of the integral gives
\begin{align*}
\operatorname{MISE}_{n}(h)
&=
\int_{\mathbb{R}^d}
\left(\mathbb{E}[\hat f_{n,h}(x)]-f(x)\right)^2
\,d\mathcal{L}^d(x)
+
\int_{\mathbb{R}^d}
\operatorname{Var}(\hat f_{n,h}(x))
\,d\mathcal{L}^d(x).
\end{align*}
The first integral is bounded by $C_1 h^{2s}$ by the integrated squared-bias hypothesis, and the second integral is bounded by $C_2/(n h^d)$ by the integrated variance hypothesis. Therefore
\begin{align*}
\operatorname{MISE}_{n}(h)
\le
C_1 h^{2s}
+
\frac{C_2}{n h^d}.
\end{align*}
With $C_3 := \max\{C_1,C_2\}$, this implies
\begin{align*}
\operatorname{MISE}_{n}(h)
\le
C_3\left(h^{2s}+\frac{1}{n h^d}\right),
\end{align*}
which, uniformly over $n \in \mathbb{N}$ and $0 < h \le h_0$, is precisely
\begin{align*}
\operatorname{MISE}_{n}(h)
=
O\left(h^{2s}+\frac{1}{n h^d}\right).
\end{align*}
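To see why neither term can be dropped, it may help to look at the two extreme regimes of the right-hand side. This is a sketch about the upper bound only, not about the MISE itself, and it assumes $C_3 > 0$:
\begin{align*}
\lim_{n\to\infty} C_3\left(h^{2s}+\frac{1}{n h^d}\right)
&=
C_3 h^{2s}
&&\text{for fixed } h \in (0,h_0],\\
\lim_{h\downarrow 0} C_3\left(h^{2s}+\frac{1}{n h^d}\right)
&=
+\infty
&&\text{for fixed } n \in \mathbb{N}.
\end{align*}
Hence, along a sequence $(h_n)$, the bound tends to zero only if $h_n \to 0$ and $n h_n^d \to \infty$ hold simultaneously.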
[/step]
[step:Balance the two terms to determine the bandwidth rate]
The term $h^{2s}$ increases with $h$ while $(n h^d)^{-1}$ decreases, so the upper bound is smallest in order when the squared-bias contribution and the variance contribution are of the same order:
\begin{align*}
h^{2s}
\asymp
\frac{1}{n h^d}.
\end{align*}
Multiplying by $h^d$ gives
\begin{align*}
h^{2s+d}
\asymp
\frac{1}{n}.
\end{align*}
Taking the positive $(2s+d)$-th root yields
\begin{align*}
h
\asymp
n^{-1/(2s+d)}.
\end{align*}
This is the claimed bandwidth scale.
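As a cross-check on the balancing heuristic, the bound can also be minimized exactly. The following sketch assumes $C_1, C_2 > 0$ and writes $g_n(h) := C_1 h^{2s} + C_2/(n h^d)$ for the right-hand side of the bound; the names $g_n$ and $h_n^\ast$ are introduced here for illustration only.
\begin{align*}
g_n'(h)
&=
2s\, C_1 h^{2s-1} - \frac{d\, C_2}{n}\, h^{-d-1}
= 0
\iff
h^{2s+d} = \frac{d\, C_2}{2s\, C_1\, n},\\
h_n^\ast
&=
\left(\frac{d\, C_2}{2s\, C_1}\right)^{1/(2s+d)} n^{-1/(2s+d)}.
\end{align*}
Since $g_n(h) \to \infty$ both as $h \downarrow 0$ and as $h \to \infty$, this unique critical point is the global minimizer of $g_n$ on $(0,\infty)$. It has the same order $n^{-1/(2s+d)}$ as the balanced choice, so exact minimization changes only the constant. For all sufficiently large $n$ it also satisfies $h_n^\ast \le h_0$.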
[/step]
[step:Substitute the balanced bandwidth into the MISE bound]
Let $(h_n)_{n=1}^{\infty}$ be a positive sequence such that $h_n \asymp n^{-1/(2s+d)}$ and $h_n \le h_0$ for all sufficiently large $n$. By the definition of $\asymp$, there exist constants $a,b>0$ and $N \in \mathbb{N}$ such that for every $n \ge N$,
\begin{align*}
a n^{-1/(2s+d)}
\le
h_n
\le
b n^{-1/(2s+d)}.
\end{align*}
For the bias term, the upper bound on $h_n$ gives
\begin{align*}
h_n^{2s}
\le
b^{2s} n^{-2s/(2s+d)}.
\end{align*}
For the variance term, the lower bound on $h_n$ gives
\begin{align*}
\frac{1}{n h_n^d}
\le
\frac{1}{n a^d n^{-d/(2s+d)}}
=
a^{-d} n^{-2s/(2s+d)}.
\end{align*}
Enlarging $N$ if necessary so that $h_n \le h_0$ for all $n \ge N$, the MISE bound already proved applies with $h = h_n$. Hence, for every $n \ge N$,
\begin{align*}
\operatorname{MISE}_{n}(h_n)
\le
C_1 h_n^{2s}
+
\frac{C_2}{n h_n^d}.
\end{align*}
The two preceding estimates for the bias and variance terms therefore give
\begin{align*}
\operatorname{MISE}_{n}(h_n)
\le
\left(C_1 b^{2s}+C_2 a^{-d}\right)
n^{-2s/(2s+d)}.
\end{align*}
Thus
\begin{align*}
\operatorname{MISE}_{n}(h_n)
=
O\left(n^{-2s/(2s+d)}\right),
\end{align*}
which completes the proof.
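To make the exponents concrete, here is the arithmetic for one illustrative choice of parameters. The values $s=2$ and $d=1$ are assumptions made only for this example and are not part of the statement.
\begin{align*}
h_n &\asymp n^{-1/(2\cdot 2+1)} = n^{-1/5},\\
h_n^{2s} &\asymp n^{-4/5},\\
\frac{1}{n h_n^d} &\asymp n^{-1+1/5} = n^{-4/5},\\
\operatorname{MISE}_{n}(h_n) &= O\left(n^{-4/5}\right).
\end{align*}
Both terms carry the same exponent, as the balancing step requires. Keeping $s=2$ but taking $d=10$ gives the exponent $2s/(2s+d)=4/14=2/7$, so the same argument yields the slower bound $O\left(n^{-2/7}\right)$.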
[/step]