Conditional Expectation as the Mean Square Forecast

Theorem


Proof

[proofplan] Let $X := \mathbb E[Y \mid \mathcal G]$. We first prove that $X$ is square-integrable by testing the defining identity for conditional expectation against bounded truncations of $X$. We then show that the bounded-test orthogonality property is exactly the conditional expectation property and prove uniqueness by testing the difference of two candidates against its own truncations. Finally, we extend orthogonality from bounded $\mathcal G$-measurable test variables to all variables in $L^2(\Omega,\mathcal G,\mathbb P)$ and use the resulting Pythagorean identity to prove the unique mean-square minimisation property. [/proofplan] [step:Show that the conditional expectation is square-integrable] Because $Y \in L^2(\Omega,\mathcal F,\mathbb P)$ and $\mathbb P(\Omega)=1$, the random variable $Y$ belongs to $L^1(\Omega,\mathcal F,\mathbb P)$. Let \begin{align*} X: (\Omega,\mathcal G) &\to (\mathbb R,\mathcal B(\mathbb R)) \end{align*} denote $X := \mathbb E[Y \mid \mathcal G]$. For each $n \in \mathbb N$, define the bounded $\mathcal G$-measurable random variable \begin{align*} Z_n: \Omega &\to \mathbb R \\ \omega &\mapsto X(\omega)\mathbb 1_{\{|X|\le n\}}(\omega). \end{align*} The defining property of conditional expectation gives \begin{align*} \mathbb E[X Z_n] = \mathbb E[Y Z_n]. \end{align*} Since $X Z_n = X^2\mathbb 1_{\{|X|\le n\}}$, we have \begin{align*} \mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] = \mathbb E[Y X\mathbb 1_{\{|X|\le n\}}]. \end{align*} Set \begin{align*} a_n := \mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] \in [0,\infty). 
\end{align*} Using $2|ab|\le a^2+b^2$ with \begin{align*} a = \varepsilon |Y|, \qquad b = \varepsilon^{-1}|X|\mathbb 1_{\{|X|\le n\}}, \end{align*} for any $\varepsilon>0$, we obtain \begin{align*} a_n &\le \mathbb E[|Y|\,|X|\,\mathbb 1_{\{|X|\le n\}}] \\ &\le \frac{\varepsilon^2}{2}\mathbb E[Y^2] + \frac{1}{2\varepsilon^2}\mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] \\ &= \frac{\varepsilon^2}{2}\mathbb E[Y^2] + \frac{1}{2\varepsilon^2}a_n. \end{align*} Taking $\varepsilon=1$ and subtracting $\frac{1}{2}a_n$ from both sides, which is permitted because $a_n<\infty$, gives \begin{align*} a_n \le \mathbb E[Y^2]. \end{align*} The sequence $(X^2\mathbb 1_{\{|X|\le n\}})_{n\in\mathbb N}$ is non-negative and increases pointwise to $X^2$, so the monotone convergence theorem gives \begin{align*} \mathbb E[X^2] = \sup_{n\in\mathbb N} a_n \le \mathbb E[Y^2] < \infty. \end{align*} Thus $X \in L^2(\Omega,\mathcal G,\mathbb P)$. [guided] The conditional expectation is initially defined for integrable random variables, and $Y$ is integrable because $Y \in L^2$ on a probability space: \begin{align*} |Y| \le \frac{1}{2}(Y^2+1), \end{align*} so $\mathbb E[|Y|]<\infty$. Let \begin{align*} X: (\Omega,\mathcal G) &\to (\mathbb R,\mathcal B(\mathbb R)) \end{align*} be $X := \mathbb E[Y\mid\mathcal G]$. The point is to prove $X \in L^2$ without assuming it. We cannot test the defining identity with $X$ directly, because we do not yet know that $X$ is square-integrable. Instead, we test against bounded truncations of $X$. For each $n\in\mathbb N$, define \begin{align*} Z_n: \Omega &\to \mathbb R \\ \omega &\mapsto X(\omega)\mathbb 1_{\{|X|\le n\}}(\omega). \end{align*} Since $X$ is $\mathcal G$-measurable, the set $\{|X|\le n\}$ belongs to $\mathcal G$, and $Z_n$ is $\mathcal G$-measurable. It is bounded by $n$, so it is an admissible test variable in the defining property of conditional expectation. Therefore \begin{align*} \mathbb E[XZ_n] = \mathbb E[YZ_n]. 
\end{align*} Substituting the definition of $Z_n$ gives \begin{align*} \mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] = \mathbb E[Y X\mathbb 1_{\{|X|\le n\}}]. \end{align*} Define \begin{align*} a_n := \mathbb E[X^2\mathbb 1_{\{|X|\le n\}}]. \end{align*} Each $a_n$ is finite, because the integrand is bounded by $n^2$. We now bound $a_n$ uniformly in $n$. The elementary inequality $2|ab|\le a^2+b^2$, applied with \begin{align*} a = |Y|, \qquad b = |X|\mathbb 1_{\{|X|\le n\}}, \end{align*} gives \begin{align*} |Y|\,|X|\,\mathbb 1_{\{|X|\le n\}} \le \frac{1}{2}Y^2 + \frac{1}{2}X^2\mathbb 1_{\{|X|\le n\}}. \end{align*} Taking expectations yields \begin{align*} a_n &\le \mathbb E[|Y|\,|X|\,\mathbb 1_{\{|X|\le n\}}] \\ &\le \frac{1}{2}\mathbb E[Y^2] + \frac{1}{2}a_n. \end{align*} Since $a_n$ is finite, we may subtract $\frac{1}{2}a_n$ from both sides and obtain \begin{align*} a_n \le \mathbb E[Y^2]. \end{align*} The functions $X^2\mathbb 1_{\{|X|\le n\}}$ are non-negative and increase pointwise to $X^2$. Therefore, by the monotone convergence theorem, \begin{align*} \mathbb E[X^2] = \sup_{n\in\mathbb N}\mathbb E[X^2\mathbb 1_{\{|X|\le n\}}] \le \mathbb E[Y^2] < \infty. \end{align*} This proves $X\in L^2(\Omega,\mathcal G,\mathbb P)$. [/guided] [/step] [step:Identify the conditional expectation by bounded orthogonality] Let $Z:\Omega\to\mathbb R$ be a bounded $\mathcal G$-measurable random variable. Since $Y\in L^1$, $X\in L^1$, and $Z$ is bounded, both $\mathbb E[YZ]$ and $\mathbb E[XZ]$ are finite. The defining property of conditional expectation gives \begin{align*} \mathbb E[XZ] = \mathbb E[YZ]. \end{align*} Therefore \begin{align*} \mathbb E[(Y-X)Z] = 0. \end{align*} Conversely, suppose $U:\Omega\to\mathbb R$ is a $\mathcal G$-measurable random variable in $L^2(\Omega,\mathcal F,\mathbb P)$ such that \begin{align*} \mathbb E[(Y-U)Z] = 0 \end{align*} for every bounded $\mathcal G$-measurable real-valued random variable $Z$. Taking $Z=\mathbb 1_A$ for an arbitrary set $A\in\mathcal G$ gives \begin{align*} \mathbb E[Y\mathbb 1_A] = \mathbb E[U\mathbb 1_A]. 
\end{align*} Since $U$ is $\mathcal G$-measurable and integrable, this is precisely the defining identity for $U=\mathbb E[Y\mid\mathcal G]$. Thus the bounded orthogonality condition characterises the conditional expectation. [/step] [step:Prove uniqueness by testing the difference against its truncations] Let $U_1,U_2\in L^2(\Omega,\mathcal F,\mathbb P)$ be $\mathcal G$-measurable random variables satisfying \begin{align*} \mathbb E[(Y-U_i)Z]=0 \end{align*} for every bounded $\mathcal G$-measurable real-valued random variable $Z$ and for $i\in\{1,2\}$. Define \begin{align*} W:\Omega&\to\mathbb R\\ \omega&\mapsto U_1(\omega)-U_2(\omega). \end{align*} Then $W$ is $\mathcal G$-measurable and belongs to $L^2(\Omega,\mathcal F,\mathbb P)$. Subtracting the two orthogonality identities gives \begin{align*} \mathbb E[WZ]=0 \end{align*} for every bounded $\mathcal G$-measurable real-valued random variable $Z$. For each $n\in\mathbb N$, define \begin{align*} T_n:\Omega&\to\mathbb R\\ \omega&\mapsto W(\omega)\mathbb 1_{\{|W|\le n\}}(\omega). \end{align*} Then $T_n$ is bounded and $\mathcal G$-measurable. Hence \begin{align*} 0=\mathbb E[WT_n]=\mathbb E[W^2\mathbb 1_{\{|W|\le n\}}]. \end{align*} Since $W^2\mathbb 1_{\{|W|\le n\}}$ increases pointwise to $W^2$, we get \begin{align*} \mathbb E[W^2]=0. \end{align*} Therefore $W=0$ $\mathbb P$-a.s., so $U_1=U_2$ in $L^2(\Omega,\mathcal F,\mathbb P)$. This proves uniqueness. [/step] [step:Extend orthogonality from bounded tests to all square-integrable forecasts] Let $V\in L^2(\Omega,\mathcal G,\mathbb P)$. Define \begin{align*} V_n:\Omega&\to\mathbb R\\ \omega&\mapsto V(\omega)\mathbb 1_{\{|V|\le n\}}(\omega) \end{align*} for each $n\in\mathbb N$. Each $V_n$ is bounded and $\mathcal G$-measurable, so \begin{align*} \mathbb E[(Y-X)V_n]=0. 
\end{align*} Moreover, \begin{align*} |(Y-X)(V-V_n)| \le \frac{1}{2}(Y-X)^2 + \frac{1}{2}(V-V_n)^2, \end{align*} so each product $(Y-X)(V-V_n)$ is integrable, and $(V_n)_{n\in\mathbb N}$ converges to $V$ in $L^2(\Omega,\mathcal G,\mathbb P)$ by dominated convergence, since $(V-V_n)^2=V^2\mathbb 1_{\{|V|>n\}}\le V^2$ tends to $0$ pointwise. Because $Y-X\in L^2(\Omega,\mathcal F,\mathbb P)$, the Cauchy-Schwarz inequality gives \begin{align*} |\mathbb E[(Y-X)(V-V_n)]| \le \mathbb E[(Y-X)^2]^{1/2}\,\mathbb E[(V-V_n)^2]^{1/2} \to 0. \end{align*} It follows that \begin{align*} \mathbb E[(Y-X)V]=0. \end{align*} [guided] We already know orthogonality against bounded $\mathcal G$-measurable test variables. For the minimisation argument, the test variable will be $V-X$, where $V$ is an arbitrary square-integrable $\mathcal G$-measurable forecast. This variable need not be bounded, so we extend the identity by truncation. Let $V\in L^2(\Omega,\mathcal G,\mathbb P)$. For each $n\in\mathbb N$, define \begin{align*} V_n:\Omega&\to\mathbb R\\ \omega&\mapsto V(\omega)\mathbb 1_{\{|V|\le n\}}(\omega). \end{align*} Because $V$ is $\mathcal G$-measurable, the set $\{|V|\le n\}$ lies in $\mathcal G$, and hence $V_n$ is $\mathcal G$-measurable. It is bounded by $n$, so the already proved bounded orthogonality gives \begin{align*} \mathbb E[(Y-X)V_n]=0. \end{align*} We now pass to the limit. The difference $V-V_n$ converges to $0$ in $L^2$ because \begin{align*} (V-V_n)^2 = V^2\mathbb 1_{\{|V|>n\}}, \end{align*} and these non-negative functions decrease pointwise to $0$ while being dominated by the integrable function $V^2$, so their expectations tend to $0$ by dominated convergence. Also $Y-X\in L^2$ because both $Y$ and $X$ are in $L^2$. The elementary inequality \begin{align*} 2|(Y-X)(V-V_n)|\le (Y-X)^2+(V-V_n)^2 \end{align*} shows that each product $(Y-X)(V-V_n)$ is integrable, and the Cauchy-Schwarz inequality gives \begin{align*} |\mathbb E[(Y-X)(V-V_n)]| \le \mathbb E[(Y-X)^2]^{1/2}\,\mathbb E[(V-V_n)^2]^{1/2} \to 0. \end{align*} Therefore \begin{align*} \mathbb E[(Y-X)V] = \lim_{n\to\infty}\mathbb E[(Y-X)V_n] = 0. \end{align*} Thus the orthogonality identity holds for every $V\in L^2(\Omega,\mathcal G,\mathbb P)$, not only for bounded test variables. 
[/guided] [/step] [step:Use the Pythagorean identity to prove the mean-square minimisation property] Let $V\in L^2(\Omega,\mathcal G,\mathbb P)$. Define \begin{align*} W:\Omega&\to\mathbb R\\ \omega&\mapsto V(\omega)-X(\omega). \end{align*} Then $W\in L^2(\Omega,\mathcal G,\mathbb P)$, because $V$ and $X$ both belong to this space. By the extended orthogonality just proved, \begin{align*} \mathbb E[(Y-X)W]=0. \end{align*} Since $Y-V=(Y-X)-W$, expanding the square gives \begin{align*} \mathbb E[(Y-V)^2] &= \mathbb E[((Y-X)-W)^2] \\ &= \mathbb E[(Y-X)^2] - 2\mathbb E[(Y-X)W] + \mathbb E[W^2] \\ &= \mathbb E[(Y-X)^2] + \mathbb E[(V-X)^2]. \end{align*} In terms of the mean-square error $J(V)=\mathbb E[(Y-V)^2]$, this identity reads $J(V)=J(X)+\mathbb E[(V-X)^2]$. Hence \begin{align*} J(V) \ge J(X). \end{align*} Equality holds if and only if \begin{align*} \mathbb E[(V-X)^2]=0, \end{align*} which is equivalent to $V=X$ $\mathbb P$-a.s. Therefore $X=\mathbb E[Y\mid\mathcal G]$ is the unique minimiser of $J$ over $L^2(\Omega,\mathcal G,\mathbb P)$. [/step]
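
The orthogonality and Pythagorean identities in the final step can be checked numerically in a simple special case. The sketch below is only an illustration under stated assumptions, not part of the proof: it assumes that $\mathcal G$ is generated by a finite partition (encoded by the hypothetical array `labels`) and that $\mathbb P$ is the uniform measure on the sample points, so that $\mathbb E[Y\mid\mathcal G]$ is the cell-wise average of $Y$. The names `conditional_mean`, `mse`, `labels` and `cell_values` are introduced here for illustration and do not come from the source.

```python
import numpy as np


def conditional_mean(y, labels):
    """Cell-wise average of y: E[Y | G] when G is generated by a finite
    partition and P is uniform on the sample points (an assumption)."""
    sums = np.bincount(labels, weights=y)   # sum of y over each cell
    counts = np.bincount(labels)            # number of points in each cell
    return (sums / counts)[labels]          # broadcast cell means to points


def mse(y, v):
    """Empirical mean-square error E[(Y - V)^2]."""
    return float(np.mean((y - v) ** 2))


rng = np.random.default_rng(0)
n, k = 10_000, 5
labels = np.arange(n) % k                   # partition into k non-empty cells
y = labels.astype(float) + rng.normal(size=n)

x = conditional_mean(y, labels)             # X = E[Y | G]

# An arbitrary G-measurable forecast V: constant on each cell.
cell_values = rng.normal(size=k)
v = cell_values[labels]

# Orthogonality E[(Y - X)(V - X)] and the two sides of the
# Pythagorean identity E[(Y-V)^2] = E[(Y-X)^2] + E[(V-X)^2].
orthogonality = float(np.mean((y - x) * (v - x)))
lhs = mse(y, v)
rhs = mse(y, x) + mse(v, x)

print(orthogonality, lhs, rhs)
```

In this finite setting the first printed value should vanish up to floating-point rounding and the last two should agree, which mirrors the inequality $J(V)\ge J(X)$ for this particular choice of $V$.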
