Rao-Blackwell Variance Reduction Theorem — Statement & Proof

Rao-Blackwell Variance Reduction Theorem (Theorem # 7205)

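Theorem

Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $\mathcal G \subseteq \mathcal F$ be a sub-$\sigma$-algebra, and let $Y \in L^2(\Omega,\mathcal F,\mathbb P)$ be a real-valued random variable. Then $\mathbb E[\mathbb E[Y\mid\mathcal G]] = \mathbb E[Y]$ and $\operatorname{Var}(\mathbb E[Y\mid\mathcal G]) \leq \operatorname{Var}(Y)$.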



Proof

[proofplan] We set $Z := \mathbb E[Y \mid \mathcal G]$ and use the defining property of [conditional expectation](/page/Conditional%20Expectation). First, testing against the event $\Omega$ gives equality of expectations. Next, a truncation argument shows that $Z$ is square-integrable. Then we prove the $L^2$ orthogonality identity $\mathbb E[(Y-Z)W]=0$ for every square-integrable $\mathcal G$-measurable [random variable](/page/Random%20Variable) $W$, first for bounded $W$ and then by truncation. Applying this with $W := Z-\mathbb E[Y]$ yields the Pythagorean variance decomposition, from which the variance inequality follows by nonnegativity of a square. [/proofplan]

[step:Use the defining property of conditional expectation to preserve the mean] Let \begin{align*} Z: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)) \end{align*} denote a version of the conditional expectation $\mathbb E[Y \mid \mathcal G]$. By definition, $Z$ is $\mathcal G$-measurable, $Z \in L^1(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$, and for every event $A \in \mathcal G$, \begin{align*} \mathbb E[\mathbb 1_A Z] = \mathbb E[\mathbb 1_A Y]. \end{align*} Since $\Omega \in \mathcal G$, taking $A := \Omega$ gives \begin{align*} \mathbb E[Z] = \mathbb E[Y]. \end{align*} [/step]

[step:Show that the conditional expectation is square-integrable] For each $m \in \mathbb N$, define the truncation map \begin{align*} T_m: \mathbb R \to \mathbb R, \qquad t \mapsto \max\{-m,\min\{t,m\}\}. \end{align*} Define the bounded $\mathcal G$-measurable random variable \begin{align*} Z_m: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto T_m(Z(\omega)). \end{align*} The defining property of conditional expectation extends from indicators to bounded $\mathcal G$-measurable random variables by linearity and monotone approximation by nonnegative simple functions, applied separately to positive and negative parts. Since $Z_m$ is bounded and $\mathcal G$-measurable, this extension gives \begin{align*} \mathbb E[Z_m Z] = \mathbb E[Z_m Y]. \end{align*} The product $Z_m Z$ is nonnegative because $Z_m$ has the same sign as $Z$, and \begin{align*} Z_m Z = |Z|\min\{|Z|,m\}. \end{align*} By Cauchy-Schwarz applied to the real-valued square-integrable random variables $Z_m$ and $Y$, \begin{align*} \mathbb E[Z_m Z] = \mathbb E[Z_m Y] \leq \mathbb E[Z_m^2]^{1/2}\mathbb E[Y^2]^{1/2}. \end{align*} Since $|Z_m| \leq |Z|$ and $Z_m$ has the same sign as $Z$, \begin{align*} Z_m^2 \leq Z_m Z. \end{align*} Therefore \begin{align*} \mathbb E[Z_m Z] \leq \mathbb E[Z_m Z]^{1/2}\mathbb E[Y^2]^{1/2}. \end{align*} If $\mathbb E[Z_m Z] = 0$, the desired bound is immediate for that $m$; otherwise dividing by $\mathbb E[Z_m Z]^{1/2}$, which is finite and positive, and then squaring gives \begin{align*} \mathbb E[Z_m Z] \leq \mathbb E[Y^2]. \end{align*} As $m \to \infty$, the nonnegative random variables $Z_m Z = |Z|\min\{|Z|,m\}$ increase pointwise to $Z^2$. By the [monotone convergence theorem](/theorems/509), \begin{align*} \mathbb E[Z^2] = \lim_{m \to \infty}\mathbb E[Z_m Z] \leq \mathbb E[Y^2] < \infty. \end{align*} Thus $Z \in L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$.

[guided] We need $Z$ to be square-integrable before it is legitimate to discuss $\operatorname{Var}(Z)$ as a finite quantity. Conditional expectation is initially defined only as an $L^1$ object, so this step proves the extra $L^2$ bound from the assumption $Y \in L^2$. For each $m \in \mathbb N$, define the truncation map \begin{align*} T_m: \mathbb R \to \mathbb R, \qquad t \mapsto \max\{-m,\min\{t,m\}\}. \end{align*} Now define \begin{align*} Z_m: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto T_m(Z(\omega)). \end{align*} The random variable $Z_m$ is bounded and $\mathcal G$-measurable. The defining property of conditional expectation says that integration against $Z$ and against $Y$ agree on indicators of sets in $\mathcal G$. By linearity this holds for simple $\mathcal G$-[measurable functions](/page/Measurable%20Functions), and by monotone approximation it holds for bounded $\mathcal G$-measurable functions. Applying this extension with the [test function](/page/Test%20Function) $Z_m$ gives \begin{align*} \mathbb E[Z_m Z] = \mathbb E[Z_m Y]. \end{align*} The reason for choosing $Z_m$ is that it has the same sign as $Z$, so the product $Z_mZ$ is nonnegative. More precisely, \begin{align*} Z_m Z = |Z|\min\{|Z|,m\}. \end{align*} This expression increases pointwise to $Z^2$ as $m \to \infty$. We now bound its expectation uniformly in $m$. By the [Cauchy-Schwarz inequality](/theorems/432) applied to the real-valued square-integrable random variables $Z_m$ and $Y$, \begin{align*} \mathbb E[Z_m Z] = \mathbb E[Z_m Y] \leq \mathbb E[Z_m^2]^{1/2}\mathbb E[Y^2]^{1/2}. \end{align*} Because $Z_m$ is a truncation of $Z$ with the same sign, we have $Z_m^2 \leq Z_mZ$. Hence \begin{align*} \mathbb E[Z_m Z] \leq \mathbb E[Z_m Z]^{1/2}\mathbb E[Y^2]^{1/2}. \end{align*} If $\mathbb E[Z_mZ]=0$, this already gives $\mathbb E[Z_mZ]\leq \mathbb E[Y^2]$. If $\mathbb E[Z_mZ]>0$, divide by $\mathbb E[Z_mZ]^{1/2}$ and square both sides to obtain the same conclusion: \begin{align*} \mathbb E[Z_m Z] \leq \mathbb E[Y^2]. \end{align*} Finally, since $Z_mZ$ increases pointwise to $Z^2$, the monotone convergence theorem gives \begin{align*} \mathbb E[Z^2] = \lim_{m \to \infty}\mathbb E[Z_m Z] \leq \mathbb E[Y^2] < \infty. \end{align*} Therefore $Z \in L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$. [/guided] [/step]

[step:Prove the orthogonality of the residual to square-integrable $\mathcal G$-measurable variables] Let \begin{align*} W: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)) \end{align*} be any random variable in $L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$. For each $m \in \mathbb N$, define \begin{align*} W_m: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto T_m(W(\omega)), \end{align*} where $T_m$ is the truncation map from the previous step. Since $W_m$ is bounded and $\mathcal G$-measurable, the extension of the defining property of conditional expectation to bounded test functions, established in the previous step, gives \begin{align*} \mathbb E[W_mY] = \mathbb E[W_mZ]. \end{align*} Because $W_m \to W$ pointwise and $|W_m-W| \leq |W|$ with $W \in L^2$, the dominated convergence theorem gives $\mathbb E[(W_m-W)^2] \to 0$. Hence, by Cauchy-Schwarz, \begin{align*} \mathbb E[|W_mY-WY|] \leq \mathbb E[(W_m-W)^2]^{1/2}\mathbb E[Y^2]^{1/2} \to 0. \end{align*} Similarly, since $Z \in L^2$, \begin{align*} \mathbb E[|W_mZ-WZ|] \leq \mathbb E[(W_m-W)^2]^{1/2}\mathbb E[Z^2]^{1/2} \to 0. \end{align*} Passing to the limit in $\mathbb E[W_mY] = \mathbb E[W_mZ]$ yields \begin{align*} \mathbb E[WY] = \mathbb E[WZ]. \end{align*} Equivalently, \begin{align*} \mathbb E[W(Y-Z)] = 0. \end{align*} [/step]

[step:Apply orthogonality to decompose the variance] Define the constant \begin{align*} \mu := \mathbb E[Y]. \end{align*} By the first step, $\mu = \mathbb E[Z]$. Define the centered random variables \begin{align*} Y_0: (\Omega,\mathcal F) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto Y(\omega)-\mu \end{align*} and \begin{align*} Z_0: (\Omega,\mathcal G) \to (\mathbb R,\mathcal B(\mathbb R)), \qquad \omega \mapsto Z(\omega)-\mu. \end{align*} Since $Z_0$ is $\mathcal G$-measurable and belongs to $L^2(\Omega,\mathcal G,\mathbb P|_{\mathcal G})$, the orthogonality result with $W := Z_0$ gives \begin{align*} \mathbb E[Z_0(Y-Z)] = 0. \end{align*} Now $Y_0 = Y-\mu = (Z-\mu)+(Y-Z)=Z_0+(Y-Z)$. Expanding the square and using the preceding orthogonality relation, \begin{align*} \operatorname{Var}(Y) &= \mathbb E[(Y-\mu)^2] \\ &= \mathbb E[Z_0^2] + 2\mathbb E[Z_0(Y-Z)] + \mathbb E[(Y-Z)^2] \\ &= \mathbb E[Z_0^2] + \mathbb E[(Y-Z)^2]. \end{align*} Since $\mu=\mathbb E[Z]$, \begin{align*} \mathbb E[Z_0^2] = \operatorname{Var}(Z). \end{align*} Therefore \begin{align*} \operatorname{Var}(Y) = \operatorname{Var}(Z) + \mathbb E[(Y-Z)^2]. \end{align*} [/step]

[step:Conclude that conditioning cannot increase variance] The random variable $(Y-Z)^2$ is nonnegative, so its expectation is nonnegative: \begin{align*} \mathbb E[(Y-Z)^2] \geq 0. \end{align*} Using the variance decomposition from the previous step, \begin{align*} \operatorname{Var}(Y) = \operatorname{Var}(Z) + \mathbb E[(Y-Z)^2] \geq \operatorname{Var}(Z). \end{align*} Since $Z=\mathbb E[Y\mid\mathcal G]$, this is exactly \begin{align*} \operatorname{Var}(\mathbb E[Y\mid\mathcal G]) \leq \operatorname{Var}(Y). \end{align*} Together with $\mathbb E[\mathbb E[Y\mid\mathcal G]]=\mathbb E[Y]$, this proves the theorem. [/step]
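
As a numerical sanity check of the identity $\operatorname{Var}(Y) = \operatorname{Var}(Z) + \mathbb E[(Y-Z)^2]$ and of $\mathbb E[Z]=\mathbb E[Y]$, the following Python sketch simulates a toy model that is not part of the proof and is assumed purely for illustration: $X_1 \sim N(1, 2^2)$ and $X_2 \sim N(0, 3^2)$ are independent, $Y := X_1 + X_2$, and $\mathcal G := \sigma(X_1)$, so that $Z = \mathbb E[Y \mid \mathcal G] = X_1$. The function names `simulate` and `summarize` are hypothetical helpers, and the check is a Monte Carlo approximation, not a proof.

```python
import random
from statistics import fmean


def simulate(n_samples=100_000, seed=0):
    """Sample (Y, Z) pairs from an assumed toy model (illustration only).

    X1 ~ N(1, 2^2) and X2 ~ N(0, 3^2) are independent, Y = X1 + X2 and
    G = sigma(X1). Then Z = E[Y | G] = X1, because E[X2 | X1] = E[X2] = 0.
    """
    rng = random.Random(seed)
    ys, zs = [], []
    for _ in range(n_samples):
        x1 = rng.gauss(1.0, 2.0)
        x2 = rng.gauss(0.0, 3.0)
        ys.append(x1 + x2)  # the original variable Y
        zs.append(x1)       # its conditional expectation Z
    return ys, zs


def summarize(ys, zs):
    """Return empirical means, variances and the mean squared residual."""
    mean_y, mean_z = fmean(ys), fmean(zs)
    var_y = fmean([(y - mean_y) ** 2 for y in ys])
    var_z = fmean([(z - mean_z) ** 2 for z in zs])
    residual = fmean([(y - z) ** 2 for y, z in zip(ys, zs)])
    return {
        "mean_y": mean_y,
        "mean_z": mean_z,
        "var_y": var_y,
        "var_z": var_z,
        "residual": residual,  # estimates E[(Y - Z)^2]
    }


if __name__ == "__main__":
    for name, value in summarize(*simulate()).items():
        print(f"{name}: {value:.3f}")
```

In this assumed model the exact values are $\operatorname{Var}(Y)=13$, $\operatorname{Var}(Z)=4$ and $\mathbb E[(Y-Z)^2]=9$, so the empirical quantities should satisfy the decomposition up to Monte Carlo error.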

