Conditional Expectation as the $L^2$ Risk Minimizer

Theorem #4421

Proof

[proofplan] Write the prediction error as the sum of the unpredictable residual $Y-m_X(X)$ and the discrepancy $m_X(X)-g(X)$ between the proposed predictor and the conditional mean. Expanding the square gives the desired risk decomposition once the cross term is shown to have expectation zero. That orthogonality follows from the defining property of conditional expectation because $m_X(X)-g(X)$ is a square-integrable $\sigma(X)$-measurable random variable. The non-negative discrepancy term then identifies the minimizers exactly. [/proofplan]

[step:Define the residual and the predictor discrepancy] Fix an admissible Borel measurable function \begin{align*} g:\mathbb R^p \to \mathbb R \end{align*} with $g(X)\in L^2(\Omega,\mathcal F,\mathbb P)$. Define the residual random variable \begin{align*} U:\Omega &\to \mathbb R \\ \omega &\mapsto Y(\omega)-m_X(X(\omega)) \end{align*} and the discrepancy random variable \begin{align*} V:\Omega &\to \mathbb R \\ \omega &\mapsto m_X(X(\omega))-g(X(\omega)). \end{align*} Since $Y\in L^2(\Omega,\mathcal F,\mathbb P)$, $m_X(X)=\mathbb E[Y\mid\sigma(X)]\in L^2(\Omega,\mathcal F,\mathbb P)$, and $g(X)\in L^2(\Omega,\mathcal F,\mathbb P)$, both $U$ and $V$ belong to $L^2(\Omega,\mathcal F,\mathbb P)$. Also, $V$ is $\sigma(X)$-measurable because both $m_X(X)$ and $g(X)$ are $\sigma(X)$-measurable. [/step]

[step:Show the residual is orthogonal to every square-integrable function of $X$] We prove that \begin{align*} \mathbb E[UV]=0. \end{align*} For each $k\in\mathbb N$, define the bounded truncation \begin{align*} V_k:\Omega &\to \mathbb R \\ \omega &\mapsto \max\{-k,\min\{V(\omega),k\}\}. \end{align*} Each $V_k$ is bounded and $\sigma(X)$-measurable. Since $m_X(X)=\mathbb E[Y\mid\sigma(X)]$ $\mathbb P$-almost surely, the defining property of conditional expectation gives \begin{align*} \mathbb E[YV_k]=\mathbb E[m_X(X)V_k]. \end{align*} Subtracting the right-hand side from the left-hand side yields \begin{align*} \mathbb E[UV_k]=\mathbb E[(Y-m_X(X))V_k]=0. \end{align*} Because $V_k\to V$ pointwise and $|V_k-V|^2\leq 4|V|^2$ with $V\in L^2(\Omega,\mathcal F,\mathbb P)$, we have $V_k\to V$ in $L^2(\Omega,\mathcal F,\mathbb P)$. Applying the [Cauchy-Schwarz inequality](/theorems/432) to the product $U(V_k-V)$ gives \begin{align*} |\mathbb E[U(V_k-V)]| \leq \mathbb E[U^2]^{1/2}\mathbb E[(V_k-V)^2]^{1/2} \to 0. \end{align*} Therefore \begin{align*} \mathbb E[UV] = \lim_{k\to\infty}\mathbb E[UV_k] = 0. \end{align*}

[guided] The only delicate point is that the defining property of conditional expectation is immediate for bounded $\sigma(X)$-measurable test variables, while $V=m_X(X)-g(X)$ is only known to be square-integrable. We therefore approximate $V$ by bounded $\sigma(X)$-measurable truncations. For each $k\in\mathbb N$, define \begin{align*} V_k:\Omega &\to \mathbb R \\ \omega &\mapsto \max\{-k,\min\{V(\omega),k\}\}. \end{align*} Since $V$ is $\sigma(X)$-measurable and the truncation map $t\mapsto \max\{-k,\min\{t,k\}\}$ is Borel measurable, $V_k$ is $\sigma(X)$-measurable. It is also bounded by $k$. Thus the defining property of $m_X(X)=\mathbb E[Y\mid\sigma(X)]$ applies to $V_k$ and gives \begin{align*} \mathbb E[YV_k]=\mathbb E[m_X(X)V_k]. \end{align*} Equivalently, \begin{align*} \mathbb E[(Y-m_X(X))V_k]=0. \end{align*} In the notation $U=Y-m_X(X)$, this is \begin{align*} \mathbb E[UV_k]=0. \end{align*} It remains to pass from $V_k$ to $V$. Since $V_k(\omega)\to V(\omega)$ for every $\omega\in\Omega$ and $|V_k(\omega)-V(\omega)|\leq 2|V(\omega)|$, we have \begin{align*} |V_k(\omega)-V(\omega)|^2\leq 4|V(\omega)|^2. \end{align*} The function $4|V|^2$ is integrable because $V\in L^2(\Omega,\mathcal F,\mathbb P)$, so $V_k\to V$ in $L^2(\Omega,\mathcal F,\mathbb P)$. Since $U\in L^2(\Omega,\mathcal F,\mathbb P)$, the Cauchy-Schwarz inequality gives \begin{align*} |\mathbb E[U(V_k-V)]| \leq \mathbb E[U^2]^{1/2}\mathbb E[(V_k-V)^2]^{1/2} \to 0. \end{align*} Thus \begin{align*} \mathbb E[UV] = \lim_{k\to\infty}\mathbb E[UV_k] = 0. \end{align*} This is the orthogonality statement: the residual $Y-\mathbb E[Y\mid\sigma(X)]$ has zero inner product with every square-integrable function of $X$. [/guided] [/step]

[step:Expand the squared error and remove the cross term] Using \begin{align*} Y-g(X)=\bigl(Y-m_X(X)\bigr)+\bigl(m_X(X)-g(X)\bigr)=U+V, \end{align*} we expand the square: \begin{align*} \mathbb E[(Y-g(X))^2] &= \mathbb E[(U+V)^2] \\ &= \mathbb E[U^2]+2\mathbb E[UV]+\mathbb E[V^2]. \end{align*} By the orthogonality established above, $\mathbb E[UV]=0$. Hence \begin{align*} \mathbb E[(Y-g(X))^2] = \mathbb E[(Y-m_X(X))^2] + \mathbb E[(m_X(X)-g(X))^2]. \end{align*} [/step]

[step:Identify the minimum and the equality case] The second term in the decomposition is non-negative: \begin{align*} \mathbb E[(m_X(X)-g(X))^2]\geq 0. \end{align*} Therefore, for every admissible $g$, \begin{align*} \mathbb E[(Y-g(X))^2] \geq \mathbb E[(Y-m_X(X))^2]. \end{align*} Taking $g=m_X$ gives equality, so the minimum risk is \begin{align*} \mathbb E[(Y-m_X(X))^2] = \mathbb E[(Y-\mathbb E[Y\mid\sigma(X)])^2]. \end{align*} Moreover, equality holds for an admissible $g$ exactly when \begin{align*} \mathbb E[(m_X(X)-g(X))^2]=0, \end{align*} which is equivalent to $m_X(X)=g(X)$ $\mathbb P$-almost surely. This proves both the minimizing property and the stated equality case. [/step]
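The orthogonality $\mathbb E[UV]=0$ and the resulting risk decomposition can be checked by hand on a small finite probability space. The sketch below is only an illustration, not part of the proof: the joint distribution `pmf`, the competing predictor `g`, and the helper names `expect`, `cond_mean`, and `risk` are all hypothetical choices made for this example. Exact rational arithmetic is used so that the identities hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y) on a finite space: {(x, y): probability}.
pmf = {
    (0, 1): F(1, 8), (0, 3): F(1, 8),
    (1, 0): F(1, 4), (1, 4): F(1, 4),
    (2, 5): F(1, 4),
}

def expect(h):
    """E[h(X, Y)] under the joint pmf."""
    return sum(p * h(x, y) for (x, y), p in pmf.items())

def cond_mean(x0):
    """m_X(x0) = E[Y | X = x0], computed directly from the joint pmf."""
    px = sum(p for (x, _), p in pmf.items() if x == x0)
    return sum(p * y for (x, y), p in pmf.items() if x == x0) / px

def risk(predictor):
    """L^2 risk E[(Y - predictor(X))^2]."""
    return expect(lambda x, y: (y - predictor(x)) ** 2)

def g(x):
    """An arbitrary competing predictor (illustrative choice)."""
    return 2 * x

# Cross term E[UV] with U = Y - m_X(X) and V = m_X(X) - g(X).
cross = expect(lambda x, y: (y - cond_mean(x)) * (cond_mean(x) - g(x)))
# Discrepancy term E[V^2].
gap = expect(lambda x, y: (cond_mean(x) - g(x)) ** 2)

print("E[UV]          =", cross)
print("risk(g)        =", risk(g))
print("risk(m_X)+gap  =", risk(cond_mean) + gap)
```

In this example the cross term vanishes and `risk(g)` equals `risk(cond_mean) + gap`, matching the decomposition in the proof. Replacing `g` by any other function of `x` should leave the cross term at zero and change only `gap`.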

Prerequisites

Theorems

Cauchy-Schwarz Inequality

Definitions & Concepts

Orthogonality

Expectation