Properties of Conditional Expectation — Statement & Proof

Properties of Conditional Expectation (Theorem # 1122)

Proof

[proofplan] We prove Property (iv), the orthogonal decomposition that identifies $\mathbb{E}[Z \mid W]$ as the best least-squares predictor of $Z$ among functions of $W$. The strategy is to add and subtract $\mathbb{E}[Z \mid W]$ inside $(Z - g(W))^2$, expand the square into three terms, and show that the cross term vanishes. The cross term is eliminated by conditioning on $W$, applying the "taking out what is known" property (iii) to extract $(\mathbb{E}[Z \mid W] - g(W))$ as a measurable function of $W$, and using the defining property $\mathbb{E}[Z - \mathbb{E}[Z \mid W] \mid W] = 0$. [/proofplan]

[step:Add and subtract $\mathbb{E}[Z \mid W]$ and expand the square] Write $m(W) := \mathbb{E}[Z \mid W]$ for brevity. Decompose
\begin{align*} Z - g(W) = \bigl(Z - m(W)\bigr) + \bigl(m(W) - g(W)\bigr). \end{align*}
Expanding the square and taking expectations:
\begin{align*} \mathbb{E}[(Z - g(W))^2] &= \mathbb{E}\bigl[\bigl(Z - m(W)\bigr)^2\bigr] + \mathbb{E}\bigl[\bigl(m(W) - g(W)\bigr)^2\bigr] \\ &\quad + 2\,\mathbb{E}\bigl[\bigl(Z - m(W)\bigr)\bigl(m(W) - g(W)\bigr)\bigr]. \end{align*}
The expansion uses the algebraic identity $(a + b)^2 = a^2 + 2ab + b^2$ applied with $a = Z - m(W)$ and $b = m(W) - g(W)$, together with linearity of expectation. [/step]

[step:Show the cross term vanishes by conditioning on $W$] It remains to show
\begin{align*} \mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = 0. \end{align*}
Apply the tower property (ii) with the $\sigma$-algebra $\sigma(W)$:
\begin{align*} \mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = \mathbb{E}\Bigl[\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr]\Bigr]. \end{align*}
The tower property applies because $(Z - m(W))(m(W) - g(W))$ is integrable: by the Cauchy--Schwarz inequality, $\mathbb{E}[|(Z - m(W))(m(W) - g(W))|] \leq \|Z - m(W)\|_{L^2} \cdot \|m(W) - g(W)\|_{L^2}$. Both factors are finite since $\mathbb{E}[Z^2] < \infty$ and $\mathbb{E}[(g(W))^2] < \infty$, and since the conditional Jensen inequality gives $\mathbb{E}[m(W)^2] \leq \mathbb{E}[Z^2] < \infty$, so that $Z - m(W)$ and $m(W) - g(W)$ both lie in $L^2$.
Inside the inner conditional expectation, $m(W) - g(W)$ is $\sigma(W)$-measurable. By Property (iii) ("taking out what is known"), which applies because $\mathbb{E}[(m(W) - g(W))^2] < \infty$ and $\mathbb{E}[(Z - m(W))^2] < \infty$:
\begin{align*} \mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr] = (m(W) - g(W)) \cdot \mathbb{E}[Z - m(W) \mid W]. \end{align*}
Now evaluate $\mathbb{E}[Z - m(W) \mid W]$. By linearity of conditional expectation, and because $m(W) = \mathbb{E}[Z \mid W]$ is $\sigma(W)$-measurable so that $\mathbb{E}[m(W) \mid W] = m(W)$ almost surely:
\begin{align*} \mathbb{E}[Z - m(W) \mid W] = \mathbb{E}[Z \mid W] - \mathbb{E}[m(W) \mid W] = m(W) - m(W) = 0. \end{align*}
Therefore the inner conditional expectation is $(m(W) - g(W)) \cdot 0 = 0$ almost surely, and consequently
\begin{align*} \mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = \mathbb{E}[0] = 0. \end{align*}

[guided] The core of this step is showing that the "prediction error" $Z - \mathbb{E}[Z \mid W]$ is orthogonal (in the $L^2$ sense) to every square-integrable $\sigma(W)$-measurable random variable. We use the tower property to move from a global expectation to a conditional one:
\begin{align*} \mathbb{E}\bigl[(Z - m(W))(m(W) - g(W))\bigr] = \mathbb{E}\Bigl[\mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr]\Bigr]. \end{align*}
Why does this help? Because conditioning on $W$ makes $m(W) - g(W)$ a known quantity (it is $\sigma(W)$-measurable), so we can pull it outside the conditional expectation using Property (iii):
\begin{align*} \mathbb{E}\bigl[(Z - m(W))(m(W) - g(W)) \mid W\bigr] = (m(W) - g(W)) \cdot \mathbb{E}[Z - m(W) \mid W]. \end{align*}
The application of Property (iii) requires both factors to be in $L^2$: $\mathbb{E}[(Z - m(W))^2] < \infty$ holds because $\mathbb{E}[Z^2] < \infty$ and $\mathbb{E}[m(W)^2] \leq \mathbb{E}[Z^2]$ by the conditional Jensen inequality, and $\mathbb{E}[(m(W) - g(W))^2] < \infty$ holds because both $m(W)$ and $g(W)$ are in $L^2$.

Now the decisive computation: $\mathbb{E}[Z - m(W) \mid W] = \mathbb{E}[Z \mid W] - m(W) = 0$.
This uses the linearity of conditional expectation and the fact that $m(W) = \mathbb{E}[Z \mid W]$ is $\sigma(W)$-measurable, so $\mathbb{E}[m(W) \mid W] = m(W)$. In other words, $Z - \mathbb{E}[Z \mid W]$ has conditional mean zero given $W$; this is the fundamental property of conditional expectation as a projection. Since the inner conditional expectation is zero, the outer expectation is zero, and the cross term vanishes. [/guided] [/step]

[step:Conclude the orthogonal decomposition and uniqueness of the minimiser] Combining the expansion with the vanishing cross term:
\begin{align*} \mathbb{E}[(Z - g(W))^2] = \mathbb{E}[(Z - \mathbb{E}[Z \mid W])^2] + \mathbb{E}[(\mathbb{E}[Z \mid W] - g(W))^2]. \end{align*}
The first term on the right-hand side is the irreducible error: it does not depend on the choice of $g$. The second term satisfies $\mathbb{E}[(\mathbb{E}[Z \mid W] - g(W))^2] \geq 0$, with equality if and only if $g(W) = \mathbb{E}[Z \mid W]$ almost surely, because a non-negative random variable with zero expectation vanishes almost surely. Therefore $g(W) = \mathbb{E}[Z \mid W]$ minimises $g \mapsto \mathbb{E}[(Z - g(W))^2]$ over all measurable $g : \mathbb{R}^d \to \mathbb{R}$ with $\mathbb{E}[(g(W))^2] < \infty$, and the minimiser is unique up to almost sure equality of $g(W)$. [/step]
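
The decomposition can be checked numerically. The following is a minimal sketch under assumptions that are not part of the theorem: a toy model $Z = W^2 + \varepsilon$ with $W$ and $\varepsilon$ independent standard normals, so that $\mathbb{E}[Z \mid W] = W^2$, and an arbitrary competing predictor $g(W) = 1 + W/2$. The function name `decomposition_check` and all parameters are hypothetical and chosen only for illustration. It estimates the three terms of the expanded square by sample averages.

```python
import random


def decomposition_check(n=200_000, seed=0):
    """Monte Carlo estimate of the three terms in the expansion of E[(Z - g(W))^2].

    Assumed toy model (illustrative only): W, eps independent N(0, 1),
    Z = W^2 + eps, hence m(W) = E[Z | W] = W^2. The competitor is g(W) = 1 + W/2.
    """
    rng = random.Random(seed)
    mse_g = irreducible = gap = cross = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        z = w * w + eps          # response
        m = w * w                # conditional expectation E[Z | W] in this model
        g = 1.0 + 0.5 * w        # some other square-integrable predictor
        mse_g += (z - g) ** 2
        irreducible += (z - m) ** 2
        gap += (m - g) ** 2
        cross += (z - m) * (m - g)
    # Convert running sums to sample means.
    return {
        "mse_g": mse_g / n,
        "irreducible": irreducible / n,
        "gap": gap / n,
        "cross": cross / n,
    }


result = decomposition_check()
for name, value in result.items():
    print(f"{name:12s} {value:.4f}")
```

In this toy model the population values are $\mathbb{E}[(Z - m(W))^2] = 1$ and $\mathbb{E}[(m(W) - g(W))^2] = 2.25$, so the sample cross term should be close to zero and `mse_g` close to the sum of the other two terms, up to Monte Carlo error.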

Definitions & Concepts

random variable (Definition)
conditional expectation (Definition)
independence (Definition)

Explore Further

Existence of Nonmeasurable Subsets of the Real Line (Probability & Statistics)
Inverse Transform Sampling (Probability Theory)
Unbiasedness of Ordinary Least Squares Under Strict Exogeneity (Probability & Statistics)
Donsker's Invariance Principle (Brownian Motion)
Radon-Nikodym Theorem (Probabilistic) (Martingale Theory)
Ordinary Least Squares Projection Theorem (Probability & Statistics)
Wald, Likelihood-Ratio, and Score Tests in Regular Generalized Linear Models (Probability & Statistics)
Moments from the MGF (Probability Theory)

Area: Probability & Statistics
Subarea: Probability Theory
