[proofplan]
We write the prediction error as a signed combination of three pieces: the new noise, minus the deterministic bias, minus the centered fluctuation of the estimator. After expanding the square and taking expectations, the three diagonal terms become the noise variance, the squared bias, and the estimator variance. The three mixed terms vanish because the new noise has mean zero, the centered estimator has mean zero, and the new noise is independent of the estimator.
[/proofplan]
[step:Decompose the prediction error into noise, bias, and centered estimator fluctuation]
Define the deterministic bias scalar $b \in \mathbb R$ by
\begin{align*}
b := \mathbb E[\hat f(x_0)] - f(x_0).
\end{align*}
Define the centered estimator fluctuation
\begin{align*}
Z : \Omega \to \mathbb R,
\qquad
Z := \hat f(x_0) - \mathbb E[\hat f(x_0)].
\end{align*}
Since $\hat f(x_0)$ is square-integrable, $Z$ is square-integrable and
\begin{align*}
\mathbb E[Z] = 0,
\qquad
\mathbb E[Z^2] = \operatorname{Var}(\hat f(x_0)).
\end{align*}
Using $Y_{\mathrm{new}} = f(x_0)+\varepsilon_{\mathrm{new}}$, we obtain
\begin{align*}
Y_{\mathrm{new}}-\hat f(x_0)
&=
f(x_0)+\varepsilon_{\mathrm{new}}-\hat f(x_0) \\
&=
\varepsilon_{\mathrm{new}}-\left(\mathbb E[\hat f(x_0)]-f(x_0)\right)
-\left(\hat f(x_0)-\mathbb E[\hat f(x_0)]\right) \\
&=
\varepsilon_{\mathrm{new}} - b - Z.
\end{align*}
[/step]
[step:Expand the squared error and take expectations]
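Because $\varepsilon_{\mathrm{new}}$ and $Z$ are square-integrable and $b$ is a constant, every term in the following expansion is integrable; in particular, the product $\varepsilon_{\mathrm{new}}Z$ is integrable by the Cauchy–Schwarz inequality. Expanding the square gives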
\begin{align*}
(Y_{\mathrm{new}}-\hat f(x_0))^2
&=
(\varepsilon_{\mathrm{new}}-b-Z)^2 \\
&=
\varepsilon_{\mathrm{new}}^2 + b^2 + Z^2
-2b\varepsilon_{\mathrm{new}}
-2\varepsilon_{\mathrm{new}}Z
+2bZ.
\end{align*}
Taking expectations and using linearity of expectation,
\begin{align*}
\mathbb E[(Y_{\mathrm{new}}-\hat f(x_0))^2]
&=
\mathbb E[\varepsilon_{\mathrm{new}}^2]
+b^2
+\mathbb E[Z^2]
-2b\,\mathbb E[\varepsilon_{\mathrm{new}}]
-2\mathbb E[\varepsilon_{\mathrm{new}}Z]
+2b\,\mathbb E[Z].
\end{align*}
[/step]
[step:Show that the mixed terms vanish]
Since $\mathbb E[\varepsilon_{\mathrm{new}}]=0$, the term $-2b\,\mathbb E[\varepsilon_{\mathrm{new}}]$ is zero. Since $\mathbb E[Z]=0$, the term $2b\,\mathbb E[Z]$ is zero.
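The random variable $Z$ is a measurable function of $\hat f(x_0)$, namely $\hat f(x_0)$ shifted by the constant $\mathbb E[\hat f(x_0)]$. Since $\hat f(x_0)$ is independent of $\varepsilon_{\mathrm{new}}$, the random variables $Z$ and $\varepsilon_{\mathrm{new}}$ are independent. Both are integrable, so the expectation of their product factorizes. Therefore,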
\begin{align*}
\mathbb E[\varepsilon_{\mathrm{new}}Z]
=
\mathbb E[\varepsilon_{\mathrm{new}}]\,\mathbb E[Z]
=
0 \cdot 0
=
0.
\end{align*}
Thus all mixed terms vanish.
[/step]
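The role of independence in this step can be checked on a small example. The following is a minimal sketch in Python using exact rational arithmetic; the two finite distributions and the helper names `cross_term_independent` and `cross_term_dependent` are hypothetical and chosen only for illustration. It computes $\mathbb E[\varepsilon_{\mathrm{new}}Z]$ once under independence, and once for a contrived case where $Z$ equals the noise itself, which violates the independence assumption.

```python
from fractions import Fraction

# Hypothetical toy distributions, given as {value: probability}.
# Both have mean zero, matching E[eps_new] = 0 and E[Z] = 0.
noise_dist = {Fraction(-1): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
z_dist = {Fraction(-3): Fraction(1, 4), Fraction(1): Fraction(3, 4)}


def cross_term_independent(noise, z):
    """E[eps * Z] when eps and Z are independent (joint = product of marginals)."""
    return sum(
        p_eps * p_z * eps * z_val
        for eps, p_eps in noise.items()
        for z_val, p_z in z.items()
    )


def cross_term_dependent(noise):
    """E[eps * Z] in the contrived dependent case Z = eps, i.e. E[eps^2]."""
    return sum(p_eps * eps * eps for eps, p_eps in noise.items())


independent_value = cross_term_independent(noise_dist, z_dist)
dependent_value = cross_term_dependent(noise_dist)
print(independent_value, dependent_value)
```

In the independent case the cross term is zero, as in the argument above. In the dependent case it equals $\mathbb E[\varepsilon_{\mathrm{new}}^2]$, which is nonzero here, so the mean-zero conditions alone would not be enough to remove the mixed term.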
[step:Identify the remaining terms with noise variance, squared bias, and estimator variance]
Since $\mathbb E[\varepsilon_{\mathrm{new}}]=0$ and $\operatorname{Var}(\varepsilon_{\mathrm{new}})=\sigma^2$,
\begin{align*}
\mathbb E[\varepsilon_{\mathrm{new}}^2]
=
\operatorname{Var}(\varepsilon_{\mathrm{new}})
=
\sigma^2.
\end{align*}
By the definition of $b$,
\begin{align*}
b^2
=
\left(\mathbb E[\hat f(x_0)]-f(x_0)\right)^2.
\end{align*}
By the definition of $Z$ and of the variance,
\begin{align*}
\mathbb E[Z^2]
=
\operatorname{Var}(\hat f(x_0)).
\end{align*}
Substituting these three identities into the expectation expansion gives
\begin{align*}
\mathbb E[(Y_{\mathrm{new}}-\hat f(x_0))^2]
=
\sigma^2
+
\left(\mathbb E[\hat f(x_0)]-f(x_0)\right)^2
+
\operatorname{Var}(\hat f(x_0)).
\end{align*}
This is the desired bias–variance decomposition.
[/step]
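As a sanity check on the identity just derived, here is a minimal sketch in Python that evaluates both sides exactly for finite distributions. Everything specific in it is an assumption made for illustration: the value of $f(x_0)$, the two toy distributions, and the helper names `expectation` and `check_decomposition`. The estimator and the new noise are taken to be independent, so the joint probability of a pair is the product of the marginal probabilities.

```python
from fractions import Fraction
from itertools import product


def expectation(dist, g=lambda v: v):
    """E[g(V)] for a finite distribution given as {value: probability}."""
    return sum(p * g(v) for v, p in dist.items())


def check_decomposition(f_x0, est_dist, noise_dist):
    """Return (lhs, rhs) of the bias-variance identity, computed exactly.

    Assumes the estimator value and the new noise are independent and that
    the noise distribution has mean zero.
    """
    # Left side: E[(Y_new - fhat)^2], enumerated over the product distribution.
    lhs = sum(
        p_est * p_eps * (f_x0 + eps - fhat) ** 2
        for (fhat, p_est), (eps, p_eps) in product(
            est_dist.items(), noise_dist.items()
        )
    )
    # Right side: noise variance + squared bias + estimator variance.
    mean_est = expectation(est_dist)
    sigma2 = expectation(noise_dist, lambda e: e ** 2)  # Var(eps), as E[eps] = 0
    bias = mean_est - f_x0
    var_est = expectation(est_dist, lambda v: (v - mean_est) ** 2)
    return lhs, sigma2 + bias ** 2 + var_est


# Hypothetical toy numbers, not taken from the text.
f_x0 = Fraction(2)
est_dist = {
    Fraction(1): Fraction(1, 4),
    Fraction(2): Fraction(1, 2),
    Fraction(4): Fraction(1, 4),
}
noise_dist = {Fraction(-1): Fraction(1, 2), Fraction(1): Fraction(1, 2)}

lhs, rhs = check_decomposition(f_x0, est_dist, noise_dist)
print(lhs, rhs)  # both sides equal 9/4 for these numbers
```

For these numbers the noise variance is $1$, the squared bias is $1/16$, and the estimator variance is $19/16$, which sum to $9/4$. Exact fractions are used instead of floating point so that the two sides can be compared with equality rather than a tolerance.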