Conditional Expectation as the $L^2$ Projection — Statement & Proof

Conditional Expectation as the $L^2$ Projection (Theorem # 3537)

Theorem

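Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-$\sigma$-algebra, and let $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$. Then for every $Z \in L^2(\Omega, \mathcal{G}, \mathbb{P})$, \begin{align*} \mathbb{E}[(X - Z)^2] \geq \mathbb{E}\!\left[(X - \mathbb{E}[X \mid \mathcal{G}])^2\right], \end{align*} with equality if and only if $Z = \mathbb{E}[X \mid \mathcal{G}]$ $\mathbb{P}$-almost surely.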
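As a minimal illustration, assume a finite $\Omega$ with strictly positive weights and a sub-$\sigma$-algebra $\mathcal{G}$ generated by a finite partition. Then $\mathbb{E}[X \mid \mathcal{G}]$ is the $\mathbb{P}$-weighted average of $X$ over each atom, and the $\mathcal{G}$-measurable $Z$ are exactly the functions that are constant on atoms. The Python sketch below brute-forces the mean-square error over a grid of such $Z$ and checks that the minimizer coincides with $\mathbb{E}[X \mid \mathcal{G}]$. The data `p`, `block`, `X` and the helpers `expect`, `cond_exp`, `from_atoms`, `mse` are hypothetical. A grid search only illustrates the statement for one example; it does not prove it.

```python
import itertools
from fractions import Fraction

# Hypothetical finite probability space: Omega = {0, ..., 5} with uniform weights p.
# G is generated by the partition {0, 1}, {2, 3, 4}, {5}; block[i] is the atom of outcome i.
p = [Fraction(1, 6)] * 6
block = [0, 0, 1, 1, 1, 2]
X = [Fraction(v) for v in (1, 4, -2, 0, 5, 3)]


def expect(Y):
    """E[Y] on the finite space."""
    return sum(pi * yi for pi, yi in zip(p, Y))


def cond_exp(Y):
    """E[Y | G]: the P-weighted average of Y over each atom of G."""
    mass, total = {}, {}
    for pi, yi, b in zip(p, Y, block):
        mass[b] = mass.get(b, 0) + pi
        total[b] = total.get(b, 0) + pi * yi
    return [total[b] / mass[b] for b in block]


def from_atoms(values):
    """A G-measurable Z (constant on atoms): Z = values[b] on atom b."""
    return [Fraction(values[b]) for b in block]


def mse(Z):
    """Mean-square error E[(X - Z)^2]."""
    return expect([(x - z) ** 2 for x, z in zip(X, Z)])


# Brute-force search over G-measurable Z whose atom values lie on a grid of halves.
grid = [Fraction(k, 2) for k in range(-4, 9)]  # -2, -3/2, ..., 4
best_Z = min((from_atoms(v) for v in itertools.product(grid, repeat=3)), key=mse)

assert best_Z == cond_exp(X)  # minimiser is E[X | G] = (5/2, 5/2, 1, 1, 1, 3)
print(mse(best_Z))            # prints 61/12
```

Exact `Fraction` arithmetic makes `best_Z == cond_exp(X)` a true equality test rather than a floating-point tolerance check. The grid of halves is chosen so that the atom averages $5/2$, $1$ and $3$ lie on it.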


Proof

[proofplan] The theorem is equivalent to the statement that $\mathbb{E}[X \mid \mathcal{G}]$ is the orthogonal projection of $X$ onto the closed linear subspace $L^2(\Omega, \mathcal{G}, \mathbb{P})$ of the real Hilbert space $L^2(\Omega, \mathcal{F}, \mathbb{P})$, equipped with the inner product $(U, V)_{L^2} = \mathbb{E}[UV]$. The proof has four steps. First, the conditional Jensen inequality establishes that $\mathbb{E}[X \mid \mathcal{G}]$ itself belongs to $L^2(\Omega, \mathcal{G}, \mathbb{P})$. Next, the "taking out what is known" and tower properties of conditional expectation yield the key orthogonality: the residual $X - \mathbb{E}[X \mid \mathcal{G}]$ satisfies $\mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])\,W] = 0$ for every $W \in L^2(\Omega, \mathcal{G}, \mathbb{P})$. A Pythagorean expansion of $\mathbb{E}[(X - Z)^2]$ then produces a non-negative correction term $\mathbb{E}[(\mathbb{E}[X \mid \mathcal{G}] - Z)^2]$, giving the inequality; equality holds precisely when this correction term vanishes, which forces $Z = \mathbb{E}[X \mid \mathcal{G}]$ $\mathbb{P}$-almost surely. [/proofplan] [step:Verify that $\mathbb{E}[X \mid \mathcal{G}]$ belongs to $L^2(\Omega, \mathcal{G}, \mathbb{P})$ via the conditional Jensen inequality] Since $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space and $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$, the [Cauchy-Schwarz Inequality](/theorems/432) gives $\mathbb{E}[|X|] = \mathbb{E}[|X| \cdot 1] \leq \|X\|_{L^2(\Omega,\mathcal{F},\mathbb{P})}\|1\|_{L^2(\Omega,\mathcal{F},\mathbb{P})} = \|X\|_{L^2(\Omega,\mathcal{F},\mathbb{P})} < \infty$, so $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. The function $\varphi: \mathbb{R} \to \mathbb{R}$, $t \mapsto t^2$, is convex, and $\mathbb{E}[\varphi(X)] = \mathbb{E}[X^2] < \infty$ since $X \in L^2$. 
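These are the hypotheses of the conditional Jensen inequality for $\varphi(t) = t^2$. The resulting bound can be checked exactly on a small example. The sketch assumes a finite $\Omega$ with positive weights and $\mathcal{G}$ generated by a partition; the names `p`, `block`, `X`, `expect` and `cond_exp` are illustrative and not taken from the source. It compares $(\mathbb{E}[X \mid \mathcal{G}])^2$ with $\mathbb{E}[X^2 \mid \mathcal{G}]$ outcome by outcome, and then again after integrating.

```python
from fractions import Fraction

# Hypothetical example: Omega = {0, ..., 4} with non-uniform weights p;
# G is generated by the partition {0, 1}, {2, 3, 4}, encoded by atom labels in `block`.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(1, 10), Fraction(3, 10)]
block = [0, 0, 1, 1, 1]
X = [Fraction(v) for v in (3, -1, 2, 7, -4)]


def expect(Y):
    """E[Y] on the finite space."""
    return sum(pi * yi for pi, yi in zip(p, Y))


def cond_exp(Y):
    """E[Y | G] for a partition-generated G: P-weighted average over each atom."""
    mass, total = {}, {}
    for pi, yi, b in zip(p, Y, block):
        mass[b] = mass.get(b, 0) + pi
        total[b] = total.get(b, 0) + pi * yi
    return [total[b] / mass[b] for b in block]


m = cond_exp(X)                    # E[X | G]
m2 = cond_exp([x * x for x in X])  # E[X^2 | G]

# Conditional Jensen with phi(t) = t^2, checked outcome by outcome.
assert all(a * a <= b for a, b in zip(m, m2))

# Integrating: E[(E[X|G])^2] <= E[E[X^2|G]], and E[E[X^2|G]] = E[X^2].
assert expect([a * a for a in m]) <= expect(m2)
assert expect(m2) == expect([x * x for x in X])
```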
Applying the [Conditional Jensen Inequality](/theorems/1149) to $\varphi$ and the sub-$\sigma$-algebra $\mathcal{G}$: \begin{align*} \bigl(\mathbb{E}[X \mid \mathcal{G}]\bigr)^2 \leq \mathbb{E}[X^2 \mid \mathcal{G}] \quad \mathbb{P}\text{-a.s.} \end{align*} Taking expectations of both sides and applying the [Tower Property of Conditional Expectation](/theorems/1150): \begin{align*} \mathbb{E}\!\left[\bigl(\mathbb{E}[X \mid \mathcal{G}]\bigr)^2\right] \leq \mathbb{E}\!\left[\mathbb{E}[X^2 \mid \mathcal{G}]\right] = \mathbb{E}[X^2] < \infty. \end{align*} Since $\mathbb{E}[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable by definition and has finite second moment, $\mathbb{E}[X \mid \mathcal{G}] \in L^2(\Omega, \mathcal{G}, \mathbb{P})$. [/step] [step:Show that the residual $X - \mathbb{E}[X \mid \mathcal{G}]$ is orthogonal to every element of $L^2(\Omega, \mathcal{G}, \mathbb{P})$] Let $W \in L^2(\Omega, \mathcal{G}, \mathbb{P})$. By the [Cauchy-Schwarz Inequality](/theorems/432): \begin{align*} \mathbb{E}[|XW|] \leq \|X\|_{L^2(\Omega,\mathcal{F},\mathbb{P})}\,\|W\|_{L^2(\Omega,\mathcal{G},\mathbb{P})} < \infty, \end{align*} so $XW \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. Since $W$ is $\mathcal{G}$-measurable and $XW \in L^1$, the "taking out what is known" property of conditional expectation (cf. [Basic Properties of Conditional Expectation](/theorems/1148)) gives: \begin{align*} \mathbb{E}[XW \mid \mathcal{G}] = W\,\mathbb{E}[X \mid \mathcal{G}] \quad \mathbb{P}\text{-a.s.} \end{align*} Taking expectations and applying the [Tower Property of Conditional Expectation](/theorems/1150) to the left-hand side: \begin{align*} \mathbb{E}[XW] = \mathbb{E}\!\left[\mathbb{E}[XW \mid \mathcal{G}]\right] = \mathbb{E}\!\left[W\,\mathbb{E}[X \mid \mathcal{G}]\right]. 
\end{align*} By linearity of expectation: \begin{align*} \mathbb{E}\!\left[\bigl(X - \mathbb{E}[X \mid \mathcal{G}]\bigr)W\right] = \mathbb{E}[XW] - \mathbb{E}\!\left[\mathbb{E}[X \mid \mathcal{G}]\cdot W\right] = \mathbb{E}[XW] - \mathbb{E}[XW] = 0. \end{align*} [guided] We want to show the residual $X - \mathbb{E}[X \mid \mathcal{G}]$ is uncorrelated with every $\mathcal{G}$-measurable square-integrable random variable — that is, $\mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])\,W] = 0$ for all $W \in L^2(\Omega, \mathcal{G}, \mathbb{P})$. In Hilbert space language this says the error is orthogonal to the entire subspace $L^2(\Omega, \mathcal{G}, \mathbb{P})$, which is the defining property of an orthogonal projection. Intuitively, $\mathbb{E}[X \mid \mathcal{G}]$ has already "extracted" everything in $X$ that is visible through the $\sigma$-algebra $\mathcal{G}$; the leftover residual cannot be detected by any $\mathcal{G}$-measurable probe $W$. Let $W \in L^2(\Omega, \mathcal{G}, \mathbb{P})$. **Step 1: Verify that $XW$ and $\mathbb{E}[X \mid \mathcal{G}] \cdot W$ are integrable.** This is needed before we can manipulate expectations. By the [Cauchy-Schwarz Inequality](/theorems/432) applied to $(\Omega, \mathcal{F}, \mathbb{P})$: \begin{align*} \mathbb{E}[|XW|] \leq \|X\|_{L^2(\Omega,\mathcal{F},\mathbb{P})}\,\|W\|_{L^2(\Omega,\mathcal{G},\mathbb{P})} < \infty, \end{align*} so $XW \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. Since $\mathbb{E}[X \mid \mathcal{G}] \in L^2(\Omega, \mathcal{G}, \mathbb{P})$ (established in the previous step), the same Cauchy-Schwarz bound gives $\mathbb{E}[X \mid \mathcal{G}] \cdot W \in L^1(\Omega, \mathcal{G}, \mathbb{P})$. **Step 2: Reduce to the defining identity of conditional expectation.** By linearity: \begin{align*} \mathbb{E}\!\left[\bigl(X - \mathbb{E}[X \mid \mathcal{G}]\bigr)W\right] = \mathbb{E}[XW] - \mathbb{E}\!\left[\mathbb{E}[X \mid \mathcal{G}]\cdot W\right]. 
\end{align*} It suffices to show $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \cdot W] = \mathbb{E}[XW]$. **Step 3: Apply "taking out what is known" and the tower property.** We use one fact from [Basic Properties of Conditional Expectation](/theorems/1148) and one from the [Tower Property of Conditional Expectation](/theorems/1150). - **Taking out what is known**: since $W$ is $\mathcal{G}$-measurable and $XW \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, we have \begin{align*} \mathbb{E}[XW \mid \mathcal{G}] = W\,\mathbb{E}[X \mid \mathcal{G}] \quad \mathbb{P}\text{-a.s.} \end{align*} Intuitively, once the information in $\mathcal{G}$ is revealed, the value of $W$ is already determined, so $W$ factors out of the conditional expectation. - **Tower property**: for any $Y \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, $\mathbb{E}[\mathbb{E}[Y \mid \mathcal{G}]] = \mathbb{E}[Y]$. Chaining these, with $Y := XW$ in the tower property: \begin{align*} \mathbb{E}\!\left[\mathbb{E}[X \mid \mathcal{G}]\cdot W\right] = \mathbb{E}\!\left[\mathbb{E}[XW \mid \mathcal{G}]\right] = \mathbb{E}[XW]. \end{align*} **Conclusion.** Substituting back: \begin{align*} \mathbb{E}\!\left[\bigl(X - \mathbb{E}[X \mid \mathcal{G}]\bigr)W\right] = \mathbb{E}[XW] - \mathbb{E}[XW] = 0. \end{align*} This orthogonality is the key structural fact: the prediction error $X - \mathbb{E}[X \mid \mathcal{G}]$ is invisible to every square-integrable $\mathcal{G}$-measurable random variable. The Pythagorean expansion in the next step converts this orthogonality into the minimization inequality, and the final step shows that the minimizer is unique up to $\mathbb{P}$-null sets. [/guided] [/step] [step:Decompose $X - Z$ into the residual and correction, then apply the Pythagorean identity to obtain the inequality] Define the residual and correction: \begin{align*} \varepsilon &:= X - \mathbb{E}[X \mid \mathcal{G}] \in L^2(\Omega, \mathcal{F}, \mathbb{P}), \\ \delta &:= \mathbb{E}[X \mid \mathcal{G}] - Z \in L^2(\Omega, \mathcal{G}, \mathbb{P}).
\end{align*} By construction $X - Z = \varepsilon + \delta$. Since $\varepsilon, \delta \in L^2(\Omega, \mathcal{F}, \mathbb{P})$, the [Cauchy-Schwarz Inequality](/theorems/432) gives $\mathbb{E}[|\varepsilon\delta|] \leq \|\varepsilon\|_{L^2}\|\delta\|_{L^2} < \infty$, so $\varepsilon\delta \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ and the bilinear expansion is valid: \begin{align*} \mathbb{E}[(X - Z)^2] = \mathbb{E}[(\varepsilon + \delta)^2] = \mathbb{E}[\varepsilon^2] + 2\,\mathbb{E}[\varepsilon\delta] + \mathbb{E}[\delta^2]. \end{align*} Since $\delta \in L^2(\Omega, \mathcal{G}, \mathbb{P})$, the preceding step applies with $W := \delta$ to give $\mathbb{E}[\varepsilon\delta] = 0$. Since $\mathbb{E}[\delta^2] \geq 0$: \begin{align*} \mathbb{E}[(X - Z)^2] = \mathbb{E}[\varepsilon^2] + \mathbb{E}[\delta^2] \geq \mathbb{E}[\varepsilon^2] = \mathbb{E}\!\left[(X - \mathbb{E}[X \mid \mathcal{G}])^2\right]. \end{align*} This is the claimed inequality. [guided] The strategy is to write $X - Z$ as a sum of two parts: the irreducible error $\varepsilon = X - \mathbb{E}[X \mid \mathcal{G}]$ (the error of the optimal predictor, which no $\mathcal{G}$-measurable estimate can improve) and the correction $\delta = \mathbb{E}[X \mid \mathcal{G}] - Z$ (the gap between the optimal predictor and the chosen estimator $Z$). The orthogonality established in the preceding step means these two parts are perpendicular in $L^2$, so the total squared error splits as the sum of the two squared errors — the Pythagorean theorem in the Hilbert space $L^2(\Omega, \mathcal{F}, \mathbb{P})$. **Setting up the decomposition.** Define: \begin{align*} \varepsilon &:= X - \mathbb{E}[X \mid \mathcal{G}], \\ \delta &:= \mathbb{E}[X \mid \mathcal{G}] - Z. \end{align*} We check membership in the relevant $L^2$ spaces. 
Since $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$ and $\mathbb{E}[X \mid \mathcal{G}] \in L^2(\Omega, \mathcal{G}, \mathbb{P}) \subseteq L^2(\Omega, \mathcal{F}, \mathbb{P})$ (from the first step), $\varepsilon$ belongs to $L^2(\Omega, \mathcal{F}, \mathbb{P})$. Since $\mathbb{E}[X \mid \mathcal{G}], Z \in L^2(\Omega, \mathcal{G}, \mathbb{P})$ and that space is closed under subtraction, $\delta \in L^2(\Omega, \mathcal{G}, \mathbb{P})$. In particular, $\delta$ is $\mathcal{G}$-measurable. By construction, $\varepsilon + \delta = X - Z$. **Validity of the bilinear expansion.** To expand $\mathbb{E}[(\varepsilon + \delta)^2]$ term by term we need the cross term $\mathbb{E}[\varepsilon\delta]$ to be finite. By the [Cauchy-Schwarz Inequality](/theorems/432): \begin{align*} \mathbb{E}[|\varepsilon\delta|] \leq \|\varepsilon\|_{L^2(\Omega,\mathcal{F},\mathbb{P})}\,\|\delta\|_{L^2(\Omega,\mathcal{G},\mathbb{P})} < \infty, \end{align*} so $\varepsilon\delta \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ and the expansion is justified: \begin{align*} \mathbb{E}[(X - Z)^2] = \mathbb{E}[(\varepsilon + \delta)^2] = \mathbb{E}[\varepsilon^2] + 2\,\mathbb{E}[\varepsilon\delta] + \mathbb{E}[\delta^2]. \end{align*} **Killing the cross term.** The cross term vanishes by orthogonality: since $\delta \in L^2(\Omega, \mathcal{G}, \mathbb{P})$, we apply the result of the preceding step with $W := \delta$ to obtain $\mathbb{E}[\varepsilon\delta] = \mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])\,\delta] = 0$. In Hilbert space terms, $\varepsilon$ and $\delta$ are orthogonal elements of $L^2(\Omega, \mathcal{F}, \mathbb{P})$, so the Pythagorean identity holds: $\|\varepsilon + \delta\|_{L^2}^2 = \|\varepsilon\|_{L^2}^2 + \|\delta\|_{L^2}^2$. **Concluding the inequality.** Since $\mathbb{E}[\delta^2] \geq 0$: \begin{align*} \mathbb{E}[(X - Z)^2] = \mathbb{E}[\varepsilon^2] + \mathbb{E}[\delta^2] \geq \mathbb{E}[\varepsilon^2] = \mathbb{E}\!\left[(X - \mathbb{E}[X \mid \mathcal{G}])^2\right].
\end{align*} The mean-square error of any $\mathcal{G}$-measurable estimator $Z$ is at least the mean-square error of the conditional expectation. The extra cost of using $Z$ rather than $\mathbb{E}[X \mid \mathcal{G}]$ is precisely $\mathbb{E}[\delta^2] = \mathbb{E}[(\mathbb{E}[X \mid \mathcal{G}] - Z)^2]$, the squared $L^2$-distance between $Z$ and the optimal predictor. [/guided] [/step] [step:Conclude that equality holds if and only if $Z = \mathbb{E}[X \mid \mathcal{G}]$ $\mathbb{P}$-almost surely] The preceding step established the identity \begin{align*} \mathbb{E}[(X - Z)^2] = \mathbb{E}\!\left[(X - \mathbb{E}[X \mid \mathcal{G}])^2\right] + \mathbb{E}\!\left[(\mathbb{E}[X \mid \mathcal{G}] - Z)^2\right]. \end{align*} Equality $\mathbb{E}[(X - Z)^2] = \mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2]$ holds if and only if $\mathbb{E}[(\mathbb{E}[X \mid \mathcal{G}] - Z)^2] = 0$. Since $(\mathbb{E}[X \mid \mathcal{G}] - Z)^2 \geq 0$ $\mathbb{P}$-a.s., its expectation is zero if and only if $(\mathbb{E}[X \mid \mathcal{G}] - Z)^2 = 0$ $\mathbb{P}$-a.s., which is equivalent to $Z = \mathbb{E}[X \mid \mathcal{G}]$ $\mathbb{P}$-almost surely. This establishes both the inequality and the characterization of equality, completing the proof. [/step]
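The orthogonality and Pythagorean steps can be checked exactly on a finite example. The sketch below assumes a finite $\Omega$ with strictly positive weights and $\mathcal{G}$ generated by a partition into three atoms; the data and helper names (`p`, `block`, `X`, `expect`, `cond_exp`, `g_measurable`) are hypothetical. In this setting the indicators of the atoms span $L^2(\Omega, \mathcal{G}, \mathbb{P})$, so orthogonality of the residual $\varepsilon$ to each indicator is the whole orthogonality claim. The identity $\mathbb{E}[(X - Z)^2] = \mathbb{E}[\varepsilon^2] + \mathbb{E}[\delta^2]$ is then tested for 27 choices of $Z$.

```python
import itertools
from fractions import Fraction

# Hypothetical finite probability space: Omega = {0, ..., 5} with weights p;
# G is generated by the partition whose atoms are labelled 0, 1, 2 in `block`.
p = [Fraction(w, 12) for w in (1, 3, 2, 2, 1, 3)]
block = [0, 0, 1, 1, 2, 2]
X = [Fraction(v) for v in (2, -3, 5, 1, 0, 4)]
ATOMS = range(3)


def expect(Y):
    """E[Y] on the finite space."""
    return sum(pi * yi for pi, yi in zip(p, Y))


def cond_exp(Y):
    """E[Y | G]: the P-weighted average of Y over each atom of G."""
    mass, total = {}, {}
    for pi, yi, b in zip(p, Y, block):
        mass[b] = mass.get(b, 0) + pi
        total[b] = total.get(b, 0) + pi * yi
    return [total[b] / mass[b] for b in block]


def g_measurable(values):
    """A G-measurable random variable: equal to values[b] on atom b."""
    return [Fraction(values[b]) for b in block]


m = cond_exp(X)                         # E[X | G]
eps = [x - a for x, a in zip(X, m)]     # residual epsilon = X - E[X | G]
err_opt = expect([e * e for e in eps])  # E[epsilon^2]

# Orthogonality: E[epsilon * W] = 0 for each atom indicator W = 1_B; every
# G-measurable W here is a linear combination of these indicators.
for b0 in ATOMS:
    W = g_measurable([1 if b == b0 else 0 for b in ATOMS])
    assert expect([e * w for e, w in zip(eps, W)]) == 0

# Pythagorean identity E[(X - Z)^2] = E[epsilon^2] + E[delta^2] for 27 G-measurable Z.
# The inequality is strict: E[X | G] equals -7/4 on atom 0, which no candidate matches.
for values in itertools.product([-1, 0, 2], repeat=3):
    Z = g_measurable(values)
    delta = [a - z for a, z in zip(m, Z)]  # correction delta = E[X | G] - Z
    lhs = expect([(x - z) ** 2 for x, z in zip(X, Z)])
    assert lhs == err_opt + expect([d * d for d in delta])
    assert lhs > err_opt
```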
