[proofplan]
We expand the mean integrated squared error into its quadratic, cross, and constant terms. The quadratic term already appears in $\operatorname{LSCV}(h)$, so the main point is to show that the leave-one-out average is an unbiased estimator of the cross term $\mathbb E[\int_{\mathbb R}\hat f_h(x) f(x)\,d\mathcal L^1(x)]$. This follows by conditioning on all observations except $X_i$: after conditioning, $\hat f_{h,-i}$ is fixed while $X_i$ still has conditional density $f$ with respect to $\mathcal L^1$. Finally, the pathwise identity that the average of the leave-one-out estimators equals the full estimator converts the averaged leave-one-out expectation into the full cross term.
[/proofplan]
[step:Expand the mean integrated squared error into three finite terms]
Because $f\in L^2(\mathbb R)$ and
\begin{align*}
\mathbb E\left[\int_{\mathbb R}\hat f_h(x)^2\,d\mathcal L^1(x)\right]<\infty,
\end{align*}
the cross term is integrable. Indeed, for almost every $\omega\in\Omega$, the [Cauchy-Schwarz inequality](/theorems/432) in $L^2(\mathbb R)$ gives the following bound:
\begin{align*}
\int_{\mathbb R}|\hat f_h(x)f(x)|\,d\mathcal L^1(x) \leq \left(\int_{\mathbb R}\hat f_h(x)^2\,d\mathcal L^1(x)\right)^{1/2}\left(\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x)\right)^{1/2}.
\end{align*}
Taking expectations and applying the [Cauchy-Schwarz inequality](/theorems/432) for real random variables gives
\begin{align*}
\mathbb E\left[\int_{\mathbb R}|\hat f_h(x)f(x)|\,d\mathcal L^1(x)\right]
\leq \left(\mathbb E\left[\int_{\mathbb R}\hat f_h(x)^2\,d\mathcal L^1(x)\right]\right)^{1/2}\left(\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x)\right)^{1/2}<\infty.
\end{align*}
Therefore we may expand the square and take expectations term by term:
\begin{align*}
\operatorname{MISE}(h) = \mathbb E\left[\int_{\mathbb R}\hat f_h(x)^2\,d\mathcal L^1(x)\right] -2\mathbb E\left[\int_{\mathbb R}\hat f_h(x)f(x)\,d\mathcal L^1(x)\right] +\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x).
\end{align*}
[/step]
[step:Condition on the observations outside the leave-one-out index]
Fix $i\in\{1,\dots,n\}$. Let $\mathcal F_{-i}:=\sigma(X_j:1\leq j\leq n,\ j\neq i)$ be the $\sigma$-algebra generated by all observations except $X_i$, and let $\mathcal B(\mathbb R)$ denote the Borel $\sigma$-algebra on $\mathbb R$. By the theorem hypotheses, $K:\mathbb R\to\mathbb R$ is Borel-measurable and $\hat f_{h,-i}$ is given by its finite-sum formula in the variables $X_j$ with $j\neq i$. Therefore the map $(\omega,x)\mapsto \hat f_{h,-i}(\omega,x)$ is $\mathcal F_{-i}\otimes\mathcal B(\mathbb R)$-measurable.
We use the following conditional integration fact. If $G:\Omega\times\mathbb R\to\mathbb R$ is $\mathcal F_{-i}\otimes\mathcal B(\mathbb R)$-measurable and $G(\cdot,X_i)$ is integrable, then independence of $X_i$ from $\mathcal F_{-i}$ and the product-measure construction imply
\begin{align*}
\mathbb E\left[G(\cdot,X_i)\mid \mathcal F_{-i}\right]
=
\int_{\mathbb R}G(\cdot,x)f(x)\,d\mathcal L^1(x)
\end{align*}
almost surely, whenever the right-hand side is interpreted as an absolutely finite [random variable](/page/Random%20Variable). Indeed, the identity first holds for non-negative $G$ by Tonelli's theorem applied to the joint law of $(\omega,X_i)$ over $\mathcal F_{-i}\otimes\mathcal B(\mathbb R)$, and the signed case follows by applying the non-negative case to $G^+$ and $G^-$ whenever $G(\cdot,X_i)$ is integrable.
Apply this fact to $G(\omega,x):=\hat f_{h,-i}(\omega,x)$. The theorem hypothesis $\mathbb E[|\hat f_{h,-i}(X_i)|]<\infty$ gives the required integrability. Hence
\begin{align*}
\mathbb E\left[\hat f_{h,-i}(X_i)\mid \mathcal F_{-i}\right]
=
\int_{\mathbb R}\hat f_{h,-i}(x)f(x)\,d\mathcal L^1(x)
\end{align*}
almost surely, and the conditional integral is absolutely finite almost surely. Taking expectations and using the defining property of [conditional expectation](/page/Conditional%20Expectation) yields
\begin{align*}
\mathbb E[\hat f_{h,-i}(X_i)]
=
\mathbb E\left[\int_{\mathbb R}\hat f_{h,-i}(x)f(x)\,d\mathcal L^1(x)\right].
\end{align*}
[guided]
Fix one index $i\in\{1,\dots,n\}$. The reason for deleting $X_i$ from the estimator is that the remaining estimator no longer depends on $X_i$. Define
\begin{align*}
\mathcal F_{-i}:=\sigma(X_j:1\leq j\leq n,\ j\neq i).
\end{align*}
Let $\mathcal B(\mathbb R)$ denote the Borel $\sigma$-algebra on $\mathbb R$. By construction from the theorem statement, $\hat f_{h,-i}:\mathbb R\to\mathbb R$ is determined by the random variables $X_j$ with $j\neq i$. More explicitly, its finite-sum formula is built from the Borel-measurable kernel $K:\mathbb R\to\mathbb R$ and the measurable coordinate maps $X_j$, so $(\omega,x)\mapsto \hat f_{h,-i}(\omega,x)$ is $\mathcal F_{-i}\otimes\mathcal B(\mathbb R)$-measurable. Thus it is legitimate to treat $\hat f_{h,-i}$ as an $\mathcal F_{-i}$-measurable random function. Since the theorem assumes that $X_1,\dots,X_n$ are independent identically distributed random variables with common density $f$, the random variable $X_i$ is independent of $\mathcal F_{-i}$ and its law has density $f$ with respect to $\mathcal L^1$.
The precise tool is the following conditional integration identity. Suppose $G:\Omega\times\mathbb R\to\mathbb R$ is $\mathcal F_{-i}\otimes\mathcal B(\mathbb R)$-measurable and $G(\cdot,X_i)$ is integrable. Then
\begin{align*}
\mathbb E\left[G(\cdot,X_i)\mid \mathcal F_{-i}\right]
=
\int_{\mathbb R}G(\cdot,x)f(x)\,d\mathcal L^1(x)
\end{align*}
almost surely. Why is this true? For non-negative $G$, independence says that the joint law factors into the law generated by $\mathcal F_{-i}$ and the law of $X_i$, and the latter has density $f$ with respect to $\mathcal L^1$. Tonelli's theorem then gives the displayed identity after testing against every bounded $\mathcal F_{-i}$-measurable random variable. For signed $G$, we apply the non-negative result to $G^+$ and $G^-$ and subtract the two finite identities; integrability of $G(\cdot,X_i)$ ensures that this subtraction is legitimate.
Now take $G(\omega,x):=\hat f_{h,-i}(\omega,x)$. The integrability hypothesis
\begin{align*}
\mathbb E\left[\left|\hat f_{h,-i}(X_i)\right|\right]<\infty
\end{align*}
ensures that $G(\cdot,X_i)$ is integrable. Therefore the conditional integration identity gives
\begin{align*}
\mathbb E\left[\hat f_{h,-i}(X_i)\mid \mathcal F_{-i}\right]
=
\int_{\mathbb R}\hat f_{h,-i}(x)f(x)\,d\mathcal L^1(x)
\end{align*}
almost surely. The same identity applied to $|G|$ shows that the integral on the right is absolutely finite almost surely. Taking expectation on both sides and using the defining property of [conditional expectation](/page/Conditional%20Expectation) gives
\begin{align*}
\mathbb E[\hat f_{h,-i}(X_i)]
=
\mathbb E\left[\int_{\mathbb R}\hat f_{h,-i}(x)f(x)\,d\mathcal L^1(x)\right].
\end{align*}
This is the key unbiasedness mechanism: leave-one-out removes the dependence between the evaluation point $X_i$ and the estimator being evaluated there.
[/guided]
[/step]
[step:Average the leave-one-out cross terms before taking expectation]
For each $i\in\{1,\dots,n\}$, define the random variable
\begin{align*}
A_i:=\int_{\mathbb R}\hat f_{h,-i}(x)f(x)\,d\mathcal L^1(x).
\end{align*}
The previous step shows that $A_i$ is integrable and that
\begin{align*}
\mathbb E[A_i]=\mathbb E[\hat f_{h,-i}(X_i)].
\end{align*}
We compare the average of the random variables $A_i$ with the full cross term pathwise, avoiding pointwise expectations of the kernel. By the theorem statement, the estimators are the displayed finite-sum random fields, so the finite sums below define measurable random functions and the evaluation maps used above are measurable. By linearity in the definition of the [Lebesgue integral](/page/Lebesgue%20Integral) for finite sums,
\begin{align*}
\frac{1}{n}\sum_{i=1}^n A_i
=
\int_{\mathbb R}\left(\frac{1}{n}\sum_{i=1}^n \hat f_{h,-i}(x)\right)f(x)\,d\mathcal L^1(x).
\end{align*}
For each $x\in\mathbb R$, the finite-sum formula gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n \hat f_{h,-i}(x)
=
\frac{1}{n}\sum_{i=1}^n \frac{1}{(n-1)h}\sum_{\substack{1\leq j\leq n,\ j\neq i}} K\left(\frac{x-X_j}{h}\right).
\end{align*}
Since each kernel term with index $j$ appears exactly $n-1$ times in the double sum, this becomes
\begin{align*}
\frac{1}{n}\sum_{i=1}^n \hat f_{h,-i}(x)
=
\frac{1}{nh}\sum_{j=1}^n K\left(\frac{x-X_j}{h}\right).
\end{align*}
By the definition of the full kernel density estimator, the last expression equals $\hat f_h(x)$, so
\begin{align*}
\frac{1}{n}\sum_{i=1}^n \hat f_{h,-i}(x)=\hat f_h(x).
\end{align*}
Hence
\begin{align*}
\frac{1}{n}\sum_{i=1}^n A_i
=
\int_{\mathbb R}\hat f_h(x)f(x)\,d\mathcal L^1(x).
\end{align*}
The right-hand side is integrable by the cross-term estimate in the first step, and the left-hand side is integrable because each $A_i$ is integrable. Taking expectations gives
\begin{align*}
\frac{1}{n}\sum_{i=1}^n\mathbb E[\hat f_{h,-i}(X_i)]
=
\mathbb E\left[\int_{\mathbb R}\hat f_h(x)f(x)\,d\mathcal L^1(x)\right].
\end{align*}
[/step]
[step:Average over the leave-one-out terms and identify the risk]
Using linearity in the definition of expectation and the identity from the previous step,
\begin{align*}
\mathbb E[\operatorname{LSCV}(h)] = \mathbb E\left[\int_{\mathbb R}\hat f_h(x)^2\,d\mathcal L^1(x)\right]-\frac{2}{n}\sum_{i=1}^n \mathbb E[\hat f_{h,-i}(X_i)].
\end{align*}
Substituting the averaged leave-one-out identity gives
\begin{align*}
\mathbb E[\operatorname{LSCV}(h)] = \mathbb E\left[\int_{\mathbb R}\hat f_h(x)^2\,d\mathcal L^1(x)\right]-2\mathbb E\left[\int_{\mathbb R}\hat f_h(x)f(x)\,d\mathcal L^1(x)\right].
\end{align*}
From the expansion of $\operatorname{MISE}(h)$, the preceding right-hand side is exactly
\begin{align*}
\operatorname{MISE}(h)
-
\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x).
\end{align*}
Therefore
\begin{align*}
\mathbb E[\operatorname{LSCV}(h)]
=
\operatorname{MISE}(h)
-
\int_{\mathbb R}f(x)^2\,d\mathcal L^1(x),
\end{align*}
which is the desired identity.
[/step]