[proofplan]
The estimator is a second-order $U$-statistic, so independence reduces its expectation to a common two-sample kernel integral. After introducing the rescaled kernel and changing variables by $y=x-hu$, the bias becomes an average, against $K(u)\,d\mathcal L^1(u)$, of the $L^2$ pairing between $f$ and the translation difference $f(\cdot-hu)-f$. The $H^2$ regularity of $f$ gives a second-order translation estimate in $L^2$, while the zeroth-order term is the quadratic functional from the statement and the first-order term vanishes because the first moment of $K$ is zero. Averaging the quadratic remainder against the finite second absolute moment of $K$ gives the desired $O(h^2)$ bound, uniformly in $n$.
[/proofplan]
[step:Rewrite the expectation as a kernel-smoothed quadratic form]
Let $(\Omega,\mathcal F,\mathbb P)$ be the probability space on which the independent identically distributed real-valued random variables $X_1,\dots,X_n:\Omega\to\mathbb R$ are defined, each with density $f:\mathbb R\to[0,\infty)$. Let $L^2(\mathbb R)$ denote the [Hilbert space](/page/Hilbert%20Space) of real-valued $\mathcal L^1$-square-integrable functions on $\mathbb R$, modulo equality $\mathcal L^1$-a.e., with norm
\begin{align*}
\|g\|_{L^2(\mathbb R)}:=\left(\int_{\mathbb R}|g(x)|^2\,d\mathcal L^1(x)\right)^{1/2}.
\end{align*}
Let $H^2(\mathbb R)$ denote the [Sobolev space](/page/Sobolev%20Space) of all $g\in L^2(\mathbb R)$ whose first and second weak derivatives belong to $L^2(\mathbb R)$. Define the quadratic functional $Q(f)\in\mathbb R$ by
\begin{align*}
Q(f):=\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x).
\end{align*}
For $h>0$, define the rescaled kernel $K_h:\mathbb R\to\mathbb R$ by
\begin{align*}
K_h(z):=h^{-1}K(z/h).
\end{align*}
Define the kernel quadratic estimator $\widehat Q_{n,h}:\Omega\to\mathbb R$ by
\begin{align*}
\widehat Q_{n,h}:=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}K_h(X_i-X_j).
\end{align*}
For $h>0$, the function $K_h:\mathbb R\to\mathbb R$ satisfies
\begin{align*}
\|K_h\|_{L^1(\mathbb R)}=\int_{\mathbb R}|K_h(z)|\,d\mathcal L^1(z)=\int_{\mathbb R}|K(u)|\,d\mathcal L^1(u)<\infty.
\end{align*}
Since $f\in H^2(\mathbb R)\subset L^2(\mathbb R)$, Tonelli's theorem applied to the non-negative integrand gives
\begin{align*}
\int_{\mathbb R}\int_{\mathbb R}|K_h(x-y)|f(x)f(y)\,d\mathcal L^1(y)\,d\mathcal L^1(x)
\le \|K\|_{L^1(\mathbb R)}\|f\|_{L^2(\mathbb R)}^2<\infty.
\end{align*}
Therefore the signed kernel integral is absolutely integrible, and [Fubini's theorem](/theorems/2961) justifies the iterated integral representation used below.
By independence and identical distribution, each pair $(X_i,X_j)$ with $i<j$ has the same joint density $(x,y)\mapsto f(x)f(y)$ on $\mathbb R^2$. Therefore
\begin{align*}
\mathbb E[\widehat Q_{n,h}]=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\mathbb E[K_h(X_i-X_j)].
\end{align*}
Since there are $n(n-1)/2$ pairs and each summand has the displayed joint-density integral,
\begin{align*}
\mathbb E[\widehat Q_{n,h}]=\int_{\mathbb R}\int_{\mathbb R}K_h(x-y)f(x)f(y)\,d\mathcal L^1(y)\,d\mathcal L^1(x).
\end{align*}
For fixed $x\in\mathbb R$, use the substitution $u=(x-y)/h$, equivalently $y=x-hu$. Since $d\mathcal L^1(y)=h\,d\mathcal L^1(u)$ under this affine change of variables and the preceding absolute-integrability estimate permits Fubini's theorem after the substitution, we obtain
\begin{align*}
\mathbb E[\widehat Q_{n,h}]=\int_{\mathbb R}K(u)\int_{\mathbb R}f(x)f(x-hu)\,d\mathcal L^1(x)\,d\mathcal L^1(u).
\end{align*}
For every $u\in\mathbb R$, the inner integral is finite by the [Cauchy-Schwarz inequality](/theorems/432) and translation-invariance of $\mathcal L^1$, and the rewritten integral is absolutely integrable because
\begin{align*}
\int_{\mathbb R}|K(u)|\left|\int_{\mathbb R}f(x)f(x-hu)\,d\mathcal L^1(x)\right|\,d\mathcal L^1(u)\le \|K\|_{L^1(\mathbb R)}\|f\|_{L^2(\mathbb R)}^2.
\end{align*}
Using $\int_{\mathbb R}K(u)\,d\mathcal L^1(u)=1$, this gives
\begin{align*}
\mathbb E[\widehat Q_{n,h}]-Q(f)
=\int_{\mathbb R}K(u)\int_{\mathbb R}f(x)\{f(x-hu)-f(x)\}\,d\mathcal L^1(x)\,d\mathcal L^1(u).
\end{align*}
[guided]
We first reduce the expectation of the $U$-statistic to a deterministic integral. Let $(\Omega,\mathcal F,\mathbb P)$ be the probability space on which the independent identically distributed real-valued random variables $X_1,\dots,X_n:\Omega\to\mathbb R$ are defined, each with density $f:\mathbb R\to[0,\infty)$. Let $L^2(\mathbb R)$ denote the Hilbert space of real-valued $\mathcal L^1$-square-integrable functions on $\mathbb R$, modulo equality $\mathcal L^1$-a.e., with norm
\begin{align*}
\|g\|_{L^2(\mathbb R)}:=\left(\int_{\mathbb R}|g(x)|^2\,d\mathcal L^1(x)\right)^{1/2}.
\end{align*}
Let $H^2(\mathbb R)$ denote the Sobolev space of all $g\in L^2(\mathbb R)$ whose first and second weak derivatives belong to $L^2(\mathbb R)$. Define the quadratic functional $Q(f)\in\mathbb R$ by
\begin{align*}
Q(f):=\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x).
\end{align*}
For $h>0$, define the rescaled kernel $K_h:\mathbb R\to\mathbb R$ by
\begin{align*}
K_h(z):=h^{-1}K(z/h).
\end{align*}
Define the kernel quadratic estimator $\widehat Q_{n,h}:\Omega\to\mathbb R$ by
\begin{align*}
\widehat Q_{n,h}:=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}K_h(X_i-X_j).
\end{align*}
The first issue is integrability: we need the expectation and the changes in the order of integration to be legitimate. Since $K_h$ is a rescaling of $K$, the affine substitution $z=hu$ gives
\begin{align*}
\|K_h\|_{L^1(\mathbb R)}=\int_{\mathbb R}|K_h(z)|\,d\mathcal L^1(z)=\int_{\mathbb R}|K(u)|\,d\mathcal L^1(u)<\infty.
\end{align*}
Since $f\in H^2(\mathbb R)\subset L^2(\mathbb R)$, Tonelli's theorem applied to the non-negative integrand and Young's convolution bound for the pairing with $f$ give
\begin{align*}
\int_{\mathbb R}\int_{\mathbb R}|K_h(x-y)|f(x)f(y)\,d\mathcal L^1(y)\,d\mathcal L^1(x)
\le \|K\|_{L^1(\mathbb R)}\|f\|_{L^2(\mathbb R)}^2<\infty.
\end{align*}
Thus the signed kernel integral is absolutely integrable, so Fubini's theorem applies.
By independence and identical distribution, each pair $(X_i,X_j)$ with $i<j$ has joint density $(x,y)\mapsto f(x)f(y)$ on $\mathbb R^2$. Therefore
\begin{align*}
\mathbb E[\widehat Q_{n,h}]=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\mathbb E[K_h(X_i-X_j)].
\end{align*}
There are $n(n-1)/2$ pairs, and every pair has the same expectation, so the coefficient cancels the number of summands and yields
\begin{align*}
\mathbb E[\widehat Q_{n,h}]=\int_{\mathbb R}\int_{\mathbb R}K_h(x-y)f(x)f(y)\,d\mathcal L^1(y)\,d\mathcal L^1(x).
\end{align*}
For fixed $x\in\mathbb R$, make the affine substitution $u=(x-y)/h$, equivalently $y=x-hu$. Under this substitution, $d\mathcal L^1(y)=h\,d\mathcal L^1(u)$ and $K_h(x-y)=h^{-1}K(u)$, so the factors of $h$ cancel. Hence
\begin{align*}
\mathbb E[\widehat Q_{n,h}]=\int_{\mathbb R}K(u)\int_{\mathbb R}f(x)f(x-hu)\,d\mathcal L^1(x)\,d\mathcal L^1(u).
\end{align*}
The inner integral is finite for every $u\in\mathbb R$ by the Cauchy-Schwarz inequality and translation-invariance of $\mathcal L^1$, because
\begin{align*}
\left|\int_{\mathbb R}f(x)f(x-hu)\,d\mathcal L^1(x)\right|\le \|f\|_{L^2(\mathbb R)}^2.
\end{align*}
Consequently the outer integral is absolutely integrable against $|K(u)|\,d\mathcal L^1(u)$. Finally, using the normalization $\int_{\mathbb R}K(u)\,d\mathcal L^1(u)=1$, we subtract
\begin{align*}
Q(f)=\int_{\mathbb R}K(u)\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x)\,d\mathcal L^1(u)
\end{align*}
from the preceding identity. This gives the bias identity
\begin{align*}
\mathbb E[\widehat Q_{n,h}]-Q(f)
=\int_{\mathbb R}K(u)\int_{\mathbb R}f(x)\{f(x-hu)-f(x)\}\,d\mathcal L^1(x)\,d\mathcal L^1(u).
\end{align*}
This identity is the bridge from probability to analysis: the rest of the proof only needs to estimate the translation difference $f(\cdot-hu)-f$ in $L^2(\mathbb R)$.
[/guided]
[/step]
[step:Control the second-order translation remainder in $L^2$]
For each $t\in\mathbb R$, define the translation operator $T_t:L^2(\mathbb R)\to L^2(\mathbb R)$ as follows: for each $g\in L^2(\mathbb R)$, $T_tg:\mathbb R\to\mathbb R$ is the equivalence class represented by the function $x\mapsto g(x-t)$ for $\mathcal L^1$-a.e. $x\in\mathbb R$. Since [Lebesgue measure](/page/Lebesgue%20Measure) is translation-invariant, the affine substitution $y=x-t$ gives
\begin{align*}
\|T_tg\|_{L^2(\mathbb R)}^2=\int_{\mathbb R}|g(x-t)|^2\,d\mathcal L^1(x)=\int_{\mathbb R}|g(y)|^2\,d\mathcal L^1(y)=\|g\|_{L^2(\mathbb R)}^2
\end{align*}
for every $g\in L^2(\mathbb R)$, so $T_t$ is an isometry on $L^2(\mathbb R)$.
Since $f\in H^2(\mathbb R)$, let $f'\in L^2(\mathbb R)$ denote its first [weak derivative](/page/Weak%20Derivative) and let $f''\in L^2(\mathbb R)$ denote its second weak derivative.
We claim that for every $t\in\mathbb R$,
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}
\le \frac{t^2}{2}\|f''\|_{L^2(\mathbb R)}.
\end{align*}
First assume $f\in C_c^\infty(\mathbb R)$, the space of smooth compactly supported real-valued functions on $\mathbb R$. For every $x\in\mathbb R$, the [Fundamental Theorem of Calculus](/theorems/632) applied twice to the map $s\mapsto f(x-s)$ gives
\begin{align*}
f(x-t)-f(x)+t f'(x)
=\int_0^t (t-s)f''(x-s)\,d\mathcal L^1(s),
\end{align*}
with the oriented integral understood when $t<0$. Taking the $L^2(\mathbb R)$ norm and using the triangle inequality for the Bochner integral in the [Banach space](/page/Banach%20Space) $L^2(\mathbb R)$ together with the isometry of $T_s$ gives
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}\le \int_{\min\{0,t\}}^{\max\{0,t\}} |t-s|\,\|T_s f''\|_{L^2(\mathbb R)}\,d\mathcal L^1(s).
\end{align*}
Using $\|T_s f''\|_{L^2(\mathbb R)}=\|f''\|_{L^2(\mathbb R)}$ and evaluating the one-dimensional integral gives
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}\le \frac{t^2}{2}\|f''\|_{L^2(\mathbb R)}.
\end{align*}
For general $f\in H^2(\mathbb R)$, by the [Global Approximation in Sobolev Spaces](/theorems/57), choose a sequence $(f_m)_{m=1}^\infty$ in $C_c^\infty(\mathbb R)$ with $f_m\to f$ in $H^2(\mathbb R)$. Applying the estimate to $f_m$, using the isometry of $T_t$ to get $T_t f_m\to T_t f$ in $L^2(\mathbb R)$, and passing to the limit in $L^2(\mathbb R)$ proves the claim.
[guided]
The purpose of this step is to make the informal Taylor expansion valid for a Sobolev function. Since $f$ need not be twice classically differentiable pointwise, we do not use a pointwise Taylor theorem directly. Instead we prove the expansion in the $L^2$ norm, where weak derivatives are the correct objects. Since $f\in H^2(\mathbb R)$, let $f'\in L^2(\mathbb R)$ denote its first weak derivative and let $f''\in L^2(\mathbb R)$ denote its second weak derivative.
For $t\in\mathbb R$, define the translation operator $T_t:L^2(\mathbb R)\to L^2(\mathbb R)$ as follows: for each $g\in L^2(\mathbb R)$, $T_tg:\mathbb R\to\mathbb R$ is the equivalence class represented by the function $x\mapsto g(x-t)$ for $\mathcal L^1$-a.e. $x\in\mathbb R$. Translation-invariance of $\mathcal L^1$ gives, by the substitution $y=x-t$,
\begin{align*}
\|T_tg\|_{L^2(\mathbb R)}^2=\int_{\mathbb R}|g(x-t)|^2\,d\mathcal L^1(x)=\int_{\mathbb R}|g(y)|^2\,d\mathcal L^1(y)=\|g\|_{L^2(\mathbb R)}^2.
\end{align*}
Thus translations do not change $L^2$ size.
We first prove the estimate for a smooth compactly supported function $f\in C_c^\infty(\mathbb R)$, where $C_c^\infty(\mathbb R)$ denotes the space of smooth compactly supported real-valued functions on $\mathbb R$. For every $x\in\mathbb R$, the [Fundamental Theorem of Calculus](/theorems/632) applied twice to the one-dimensional map $s\mapsto f(x-s)$ gives
\begin{align*}
f(x-t)-f(x)+t f'(x)
=\int_0^t (t-s)f''(x-s)\,d\mathcal L^1(s).
\end{align*}
This identity is the second-order Taylor formula with integral remainder, written in translation form. Taking the $L^2(\mathbb R)$ norm yields
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}=\left\|\int_0^t (t-s)T_s f''\,d\mathcal L^1(s)\right\|_{L^2(\mathbb R)}.
\end{align*}
Using the triangle inequality for the Bochner integral in the Banach space $L^2(\mathbb R)$ gives
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}\le \int_{\min\{0,t\}}^{\max\{0,t\}} |t-s|\,\|T_s f''\|_{L^2(\mathbb R)}\,d\mathcal L^1(s).
\end{align*}
The translation isometry gives $\|T_s f''\|_{L^2(\mathbb R)}=\|f''\|_{L^2(\mathbb R)}$, so evaluating the remaining scalar integral gives
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}\le \frac{t^2}{2}\|f''\|_{L^2(\mathbb R)}.
\end{align*}
Now let $f\in H^2(\mathbb R)$. By the [Global Approximation in Sobolev Spaces](/theorems/57), there exists a sequence $(f_m)_{m=1}^\infty$ in $C_c^\infty(\mathbb R)$ such that $f_m\to f$ in $H^2(\mathbb R)$. The estimate just proved gives
\begin{align*}
\|T_tf_m-f_m+t f_m'\|_{L^2(\mathbb R)}
\le \frac{t^2}{2}\|f_m''\|_{L^2(\mathbb R)}.
\end{align*}
Since $f_m\to f$, $f_m'\to f'$, and $f_m''\to f''$ in $L^2(\mathbb R)$, and since $T_t$ is an isometry on $L^2(\mathbb R)$, passing to the limit gives
\begin{align*}
\|T_tf-f+t f'\|_{L^2(\mathbb R)}
\le \frac{t^2}{2}\|f''\|_{L^2(\mathbb R)}.
\end{align*}
This is the Sobolev replacement for the classical second-order Taylor remainder. In the bias computation, we will use it with $t=hu$ to show that the translation error is bounded by a multiple of $h^2u^2$, which is integrable against $|K(u)|\,d\mathcal L^1(u)$ by the finite second absolute moment assumption.
[/guided]
[/step]
[step:Cancel the lower-order terms and bound the averaged remainder]
Define the scalar $A\in\mathbb R$ by
\begin{align*}
A:=\int_{\mathbb R}f(x)f'(x)\,d\mathcal L^1(x).
\end{align*}
This integral is finite by the Cauchy-Schwarz inequality because $f,f'\in L^2(\mathbb R)$. For $t\in\mathbb R$, define the remainder scalar $R(t)\in\mathbb R$ by
\begin{align*}
R(t):=\int_{\mathbb R}f(x)\{(T_tf)(x)-f(x)+t f'(x)\}\,d\mathcal L^1(x).
\end{align*}
The Cauchy-Schwarz inequality gives
\begin{align*}
|R(t)|\le \|f\|_{L^2(\mathbb R)}\|T_tf-f+t f'\|_{L^2(\mathbb R)}.
\end{align*}
Applying the translation estimate to the second factor gives
\begin{align*}
|R(t)|\le \frac{t^2}{2}\|f\|_{L^2(\mathbb R)}\|f''\|_{L^2(\mathbb R)}.
\end{align*}
With $t=hu$, the bias identity becomes
\begin{align*}
\mathbb E[\widehat Q_{n,h}]-Q(f)=\int_{\mathbb R}K(u)\{-huA+R(hu)\}\,d\mathcal L^1(u).
\end{align*}
The first-order term vanishes by the moment condition:
\begin{align*}
\int_{\mathbb R}K(u)(-huA)\,d\mathcal L^1(u)
=-hA\int_{\mathbb R}uK(u)\,d\mathcal L^1(u)=0.
\end{align*}
Therefore
\begin{align*}
\left|\mathbb E[\widehat Q_{n,h}]-Q(f)\right|\le \int_{\mathbb R}|K(u)|\,|R(hu)|\,d\mathcal L^1(u).
\end{align*}
Using the remainder estimate with $t=hu$ gives
\begin{align*}
\left|\mathbb E[\widehat Q_{n,h}]-Q(f)\right|\le \frac{h^2}{2}\|f\|_{L^2(\mathbb R)}\|f''\|_{L^2(\mathbb R)}\int_{\mathbb R}u^2|K(u)|\,d\mathcal L^1(u).
\end{align*}
Define
\begin{align*}
C:=\frac{1}{2}\|f\|_{L^2(\mathbb R)}\|f''\|_{L^2(\mathbb R)}
\int_{\mathbb R}u^2|K(u)|\,d\mathcal L^1(u).
\end{align*}
The hypotheses imply $C<\infty$, and the expression for $C$ depends only on $K$ and $f$. Hence, for every $h>0$ and every $n\ge2$,
\begin{align*}
\left|\mathbb E[\widehat Q_{n,h}]-Q(f)\right|\le Ch^2.
\end{align*}
In particular, the same estimate holds for all sufficiently small $h>0$, and the constant is independent of $n$. This proves
\begin{align*}
\mathbb E[\widehat Q_{n,h}]-Q(f)=O(h^2)
\end{align*}
as $h\to0$.
[guided]
We now insert the $L^2$ translation estimate into the bias identity. Define the scalar $A\in\mathbb R$ by
\begin{align*}
A:=\int_{\mathbb R}f(x)f'(x)\,d\mathcal L^1(x).
\end{align*}
This integral is finite by the Cauchy-Schwarz inequality applied in $L^2(\mathbb R)$, since $f,f'\in L^2(\mathbb R)$. For each $t\in\mathbb R$, define the scalar remainder $R(t)\in\mathbb R$ by
\begin{align*}
R(t):=\int_{\mathbb R}f(x)\{(T_tf)(x)-f(x)+t f'(x)\}\,d\mathcal L^1(x).
\end{align*}
The point of this definition is to isolate the part of the translation difference that remains after removing the zeroth-order and first-order terms. By the Cauchy-Schwarz inequality in $L^2(\mathbb R)$,
\begin{align*}
|R(t)|\le \|f\|_{L^2(\mathbb R)}\|T_tf-f+t f'\|_{L^2(\mathbb R)}.
\end{align*}
The translation estimate from the preceding step gives
\begin{align*}
|R(t)|\le \frac{t^2}{2}\|f\|_{L^2(\mathbb R)}\|f''\|_{L^2(\mathbb R)}.
\end{align*}
Using the bias identity with $t=hu$, we have
\begin{align*}
\mathbb E[\widehat Q_{n,h}]-Q(f)=\int_{\mathbb R}K(u)\{-huA+R(hu)\}\,d\mathcal L^1(u).
\end{align*}
The first-order term is zero because the hypothesis gives the first moment condition:
\begin{align*}
\int_{\mathbb R}K(u)(-huA)\,d\mathcal L^1(u)
=-hA\int_{\mathbb R}uK(u)\,d\mathcal L^1(u)=0.
\end{align*}
Therefore only the second-order remainder contributes to the bias. Taking absolute values gives
\begin{align*}
\left|\mathbb E[\widehat Q_{n,h}]-Q(f)\right|
\le \int_{\mathbb R}|K(u)|\,|R(hu)|\,d\mathcal L^1(u).
\end{align*}
Using the estimate for $R(hu)$ in this integral gives
\begin{align*}
\left|\mathbb E[\widehat Q_{n,h}]-Q(f)\right|
\le \frac{h^2}{2}\|f\|_{L^2(\mathbb R)}\|f''\|_{L^2(\mathbb R)}\int_{\mathbb R}u^2|K(u)|\,d\mathcal L^1(u).
\end{align*}
Define
\begin{align*}
C:=\frac{1}{2}\|f\|_{L^2(\mathbb R)}\|f''\|_{L^2(\mathbb R)}\int_{\mathbb R}u^2|K(u)|\,d\mathcal L^1(u).
\end{align*}
The assumptions $f\in H^2(\mathbb R)$ and $\int_{\mathbb R}u^2|K(u)|\,d\mathcal L^1(u)<\infty$ imply $C<\infty$. The constant $C$ depends only on $K$ and $f$, and no factor in $C$ depends on the sample size $n$. Hence, for every $h>0$ and every $n\ge2$,
\begin{align*}
\left|\mathbb E[\widehat Q_{n,h}]-Q(f)\right|\le Ch^2.
\end{align*}
This is exactly the assertion that $\mathbb E[\widehat Q_{n,h}]-Q(f)=O(h^2)$ as $h\to0$, with an implicit constant independent of $n$.
[/guided]
[/step]