Bias–Variance Decomposition for Prediction Error (Theorem # 4464)

Theorem

Proof

[proofplan]
We write the prediction error as a signed combination of three pieces: the new noise, the deterministic bias, and the centered fluctuation of the estimator. After expanding the square, the three diagonal terms become the noise variance, the squared bias, and the estimator variance. The mixed terms vanish because the new noise has mean zero, the centered estimator has mean zero, and the new noise is independent of the estimator.
[/proofplan]

[step:Decompose the prediction error into noise, bias, and centered estimator fluctuation]
Define the deterministic bias scalar $b \in \mathbb R$ by
\begin{align*}
b := \mathbb E[\hat f(x_0)] - f(x_0).
\end{align*}
Define the centered estimator fluctuation
\begin{align*}
Z : \Omega \to \mathbb R, \qquad Z := \hat f(x_0) - \mathbb E[\hat f(x_0)].
\end{align*}
Since $\hat f(x_0)$ is square-integrable, $Z$ is square-integrable and
\begin{align*}
\mathbb E[Z] = 0, \qquad \mathbb E[Z^2] = \operatorname{Var}(\hat f(x_0)).
\end{align*}
Using $Y_{\mathrm{new}} = f(x_0)+\varepsilon_{\mathrm{new}}$, and adding and subtracting $\mathbb E[\hat f(x_0)]$, we obtain
\begin{align*}
Y_{\mathrm{new}}-\hat f(x_0)
&= f(x_0)+\varepsilon_{\mathrm{new}}-\hat f(x_0) \\
&= \varepsilon_{\mathrm{new}}-\left(\mathbb E[\hat f(x_0)]-f(x_0)\right) -\left(\hat f(x_0)-\mathbb E[\hat f(x_0)]\right) \\
&= \varepsilon_{\mathrm{new}} - b - Z.
\end{align*}
[/step]

[step:Expand the squared error and take expectations]
Because $\varepsilon_{\mathrm{new}}$ and $Z$ are square-integrable and $b$ is a constant, every term in the following expansion is integrable; in particular, the product $\varepsilon_{\mathrm{new}}Z$ is integrable by the Cauchy–Schwarz inequality. Expanding the square gives
\begin{align*}
(Y_{\mathrm{new}}-\hat f(x_0))^2
&= (\varepsilon_{\mathrm{new}}-b-Z)^2 \\
&= \varepsilon_{\mathrm{new}}^2 + b^2 + Z^2 -2b\varepsilon_{\mathrm{new}} -2\varepsilon_{\mathrm{new}}Z +2bZ.
\end{align*}
Taking expectations and using linearity of expectation,
\begin{align*}
\mathbb E[(Y_{\mathrm{new}}-\hat f(x_0))^2]
&= \mathbb E[\varepsilon_{\mathrm{new}}^2] +b^2 +\mathbb E[Z^2] -2b\,\mathbb E[\varepsilon_{\mathrm{new}}] -2\mathbb E[\varepsilon_{\mathrm{new}}Z] +2b\,\mathbb E[Z].
\end{align*}
[/step]

[step:Show that the mixed terms vanish]
Since $\mathbb E[\varepsilon_{\mathrm{new}}]=0$, the term $-2b\,\mathbb E[\varepsilon_{\mathrm{new}}]$ is zero. Since $\mathbb E[Z]=0$, the term $2b\,\mathbb E[Z]$ is zero.

The random variable $Z$ is a measurable function of $\hat f(x_0)$. Since $\hat f(x_0)$ is independent of $\varepsilon_{\mathrm{new}}$, the random variables $Z$ and $\varepsilon_{\mathrm{new}}$ are independent. Therefore,
\begin{align*}
\mathbb E[\varepsilon_{\mathrm{new}}Z] = \mathbb E[\varepsilon_{\mathrm{new}}]\,\mathbb E[Z] = 0 \cdot 0 = 0.
\end{align*}
Thus all three mixed terms vanish.
[/step]

[step:Identify the remaining terms with noise variance, squared bias, and estimator variance]
Since $\mathbb E[\varepsilon_{\mathrm{new}}]=0$ and $\operatorname{Var}(\varepsilon_{\mathrm{new}})=\sigma^2$,
\begin{align*}
\mathbb E[\varepsilon_{\mathrm{new}}^2] = \operatorname{Var}(\varepsilon_{\mathrm{new}}) = \sigma^2.
\end{align*}
By the definition of $b$,
\begin{align*}
b^2 = \left(\mathbb E[\hat f(x_0)]-f(x_0)\right)^2.
\end{align*}
By the definition of $Z$,
\begin{align*}
\mathbb E[Z^2] = \operatorname{Var}(\hat f(x_0)).
\end{align*}
Substituting these three identities into the expectation expansion gives
\begin{align*}
\mathbb E[(Y_{\mathrm{new}}-\hat f(x_0))^2] = \sigma^2 + \left(\mathbb E[\hat f(x_0)]-f(x_0)\right)^2 + \operatorname{Var}(\hat f(x_0)).
\end{align*}
This is the desired bias–variance decomposition.
[/step]
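
As a numerical sanity check on the decomposition proved above, the following minimal Python sketch compares a Monte Carlo estimate of $\mathbb E[(Y_{\mathrm{new}}-\hat f(x_0))^2]$ with $\sigma^2 + b^2 + \operatorname{Var}(\hat f(x_0))$. Everything specific here is an illustrative assumption and not part of the theorem: the noise is taken to be Gaussian, the estimator is a shrunken sample mean of training responses observed at $x_0$ (chosen so that it has nonzero bias), and the names `simulate`, `shrink`, `n_train`, and `n_rep` are hypothetical.

```python
import random


def simulate(f_x0=2.0, sigma=1.0, n_train=5, shrink=0.8, n_rep=50_000, seed=0):
    """Compare simulated prediction error with sigma^2 + bias^2 + variance.

    Assumed toy setup (not from the source):
      training responses  Y_i = f(x0) + eps_i,  eps_i ~ N(0, sigma^2), i.i.d.
      estimator           f_hat(x0) = shrink * mean(Y_1, ..., Y_n)
      new response        Y_new = f(x0) + eps_new, drawn independently of f_hat
    """
    rng = random.Random(seed)
    sq_err_sum = 0.0
    for _ in range(n_rep):
        # Fresh training sample, hence a fresh realisation of the estimator.
        train = [f_x0 + rng.gauss(0.0, sigma) for _ in range(n_train)]
        f_hat = shrink * sum(train) / n_train
        # New noise is drawn separately, so it is independent of f_hat.
        y_new = f_x0 + rng.gauss(0.0, sigma)
        sq_err_sum += (y_new - f_hat) ** 2
    mc_error = sq_err_sum / n_rep

    # Closed-form pieces for this particular toy estimator.
    bias = (shrink - 1.0) * f_x0               # b = E[f_hat(x0)] - f(x0)
    variance = shrink ** 2 * sigma ** 2 / n_train  # Var(f_hat(x0))
    theory = sigma ** 2 + bias ** 2 + variance
    return mc_error, theory


if __name__ == "__main__":
    mc_error, theory = simulate()
    print(f"Monte Carlo estimate: {mc_error:.4f}")
    print(f"sigma^2 + b^2 + Var:  {theory:.4f}")
```

With the default arguments the closed-form value is $1 + 0.16 + 0.128 = 1.288$; the simulated value should agree up to Monte Carlo error, which shrinks as `n_rep` grows. Setting `shrink=1.0` removes the bias term, which is one way to see the three contributions separately.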
