Variance Formula for Stratified Sampling — Statement & Proof

Variance Formula for Stratified Sampling (Theorem # 7204)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first isolate each stratum by writing its sample mean as a [random variable](/page/Random%20Variable) $Y_k := \bar{g}_k$. Inside a fixed stratum, the centered variables $g(X_{k,j}) - \mathbb{E}[g(X_{k,1})]$ are independent, square-integrable, and have mean zero, so expanding the square of their average gives $\operatorname{Var}(Y_k) = \sigma_k^2/n_k$. The random variables $Y_1,\dots,Y_K$ are independent because each is a measurable function of an independent block of samples. Finally, we expand the variance of the weighted sum defining $\hat{I}_{\mathrm{strat}}$, use the vanishing of cross terms, and factor out $\mathcal{L}^d(D)^2$. [/proofplan] [step:Define the stratum means as square-integrable random variables] For each $k \in \{1,\dots,K\}$, define the real-valued random variable \begin{align*} Y_k : \Omega \to \mathbb{R}, \qquad Y_k(\omega) := \frac{1}{n_k}\sum_{j=1}^{n_k} g(X_{k,j}(\omega)). \end{align*} Thus $Y_k = \bar{g}_k$. The measurability of $Y_k$ follows because each composition $g \circ X_{k,j} : (\Omega,\mathcal{F}) \to (\mathbb{R},\mathcal{B}(\mathbb{R}))$ is measurable and finite sums of measurable real-valued functions are measurable. Since each $g(X_{k,j})$ belongs to $L^2(\Omega,\mathcal{F},\mathbb{P})$ and $n_k$ is finite, the finite linear combination $Y_k$ also belongs to $L^2(\Omega,\mathcal{F},\mathbb{P})$. Hence $\operatorname{Var}(Y_k)$ is finite for every $k$. For each $k$, define the mean \begin{align*} \mu_k := \mathbb{E}[g(X_{k,1})]. \end{align*} This expectation is finite because $g(X_{k,1}) \in L^2(\Omega,\mathcal{F},\mathbb{P})$, and therefore $g(X_{k,1}) \in L^1(\Omega,\mathcal{F},\mathbb{P})$. [/step] [step:Compute the variance of each stratum sample mean by expanding centered variables] Fix $k \in \{1,\dots,K\}$. For each $j \in \{1,\dots,n_k\}$, define the centered real-valued random variable \begin{align*} Z_{k,j} : \Omega \to \mathbb{R}, \qquad Z_{k,j}(\omega) := g(X_{k,j}(\omega)) - \mu_k \end{align*} The variables $Z_{k,1},\dots,Z_{k,n_k}$ are independent because they are [measurable functions](/page/Measurable%20Functions) of the independent variables $X_{k,1},\dots,X_{k,n_k}$; this is the standard [preservation of independence](/theorems/1116) under measurable maps. They satisfy $\mathbb{E}[Z_{k,j}] = 0$ and \begin{align*} \mathbb{E}[Z_{k,j}^2] = \operatorname{Var}(g(X_{k,j})) = \sigma_k^2 \end{align*} where the last equality uses that $X_{k,j}$ and $X_{k,1}$ have the same distribution on $D_k$. Since \begin{align*} Y_k - \mathbb{E}[Y_k] = \frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j} \end{align*} we compute \begin{align*} \operatorname{Var}(Y_k) = \mathbb{E}\left[\left(\frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j}\right)^2\right] \end{align*} Expanding the finite square gives \begin{align*} \operatorname{Var}(Y_k) = \frac{1}{n_k^2}\sum_{j=1}^{n_k}\sum_{\ell=1}^{n_k}\mathbb{E}[Z_{k,j}Z_{k,\ell}] \end{align*} If $j \neq \ell$, independence and zero mean give \begin{align*} \mathbb{E}[Z_{k,j}Z_{k,\ell}] = \mathbb{E}[Z_{k,j}]\mathbb{E}[Z_{k,\ell}] = 0 \end{align*} If $j = \ell$, then $\mathbb{E}[Z_{k,j}^2] = \sigma_k^2$. Therefore exactly $n_k$ diagonal terms remain, and \begin{align*} \operatorname{Var}(Y_k) = \frac{1}{n_k^2} n_k \sigma_k^2 = \frac{\sigma_k^2}{n_k} \end{align*} [guided] Fix one stratum index $k \in \{1,\dots,K\}$. The goal in this step is to compute the variance of the average of the samples inside this single stratum. Define, for each $j \in \{1,\dots,n_k\}$, the centered random variable \begin{align*} Z_{k,j} : \Omega \to \mathbb{R}, \qquad Z_{k,j}(\omega) := g(X_{k,j}(\omega)) - \mu_k \end{align*} The centering is useful because variance is the second moment of the centered variable. Since $X_{k,1},\dots,X_{k,n_k}$ are independent and each $Z_{k,j}$ is obtained from $X_{k,j}$ by applying the measurable map $g$ and subtracting the constant $\mu_k$, the variables $Z_{k,1},\dots,Z_{k,n_k}$ are independent. Also, \begin{align*} \mathbb{E}[Z_{k,j}] = \mathbb{E}[g(X_{k,j})] - \mu_k = 0 \end{align*} The equality $\mathbb{E}[g(X_{k,j})] = \mu_k$ holds because all variables $X_{k,j}$ have the same uniform distribution on $D_k$. Similarly, the second centered moment is the common stratum variance: \begin{align*} \mathbb{E}[Z_{k,j}^2] = \operatorname{Var}(g(X_{k,j})) = \operatorname{Var}(g(X_{k,1})) = \sigma_k^2 \end{align*} Now write the centered sample mean in terms of these centered variables: \begin{align*} Y_k - \mathbb{E}[Y_k] = \frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j} \end{align*} By the definition of variance, \begin{align*} \operatorname{Var}(Y_k) = \mathbb{E}\left[\left(Y_k - \mathbb{E}[Y_k]\right)^2\right] = \mathbb{E}\left[\left(\frac{1}{n_k}\sum_{j=1}^{n_k} Z_{k,j}\right)^2\right] \end{align*} Because the sum is finite, we may expand the square algebraically before taking expectation: \begin{align*} \operatorname{Var}(Y_k) = \frac{1}{n_k^2}\sum_{j=1}^{n_k}\sum_{\ell=1}^{n_k}\mathbb{E}[Z_{k,j}Z_{k,\ell}] \end{align*} There are two kinds of terms. For off-diagonal terms with $j \neq \ell$, independence gives factorisation of expectation, and the zero means make the term vanish: \begin{align*} \mathbb{E}[Z_{k,j}Z_{k,\ell}] = \mathbb{E}[Z_{k,j}]\mathbb{E}[Z_{k,\ell}] = 0 \end{align*} For diagonal terms with $j = \ell$, the term is exactly the second centered moment: \begin{align*} \mathbb{E}[Z_{k,j}Z_{k,j}] = \mathbb{E}[Z_{k,j}^2] = \sigma_k^2 \end{align*} Thus the double sum has $n_k$ nonzero diagonal contributions, each equal to $\sigma_k^2$. Hence \begin{align*} \operatorname{Var}(Y_k) = \frac{1}{n_k^2} n_k \sigma_k^2 = \frac{\sigma_k^2}{n_k} \end{align*} This is precisely the variance-reduction factor for averaging $n_k$ independent samples in the $k\text{-th}$ stratum. [/guided] [/step] [step:Show the stratum sample means are independent] For each $k \in \{1,\dots,K\}$, define the sub-$\sigma$-algebra \begin{align*} \mathcal{G}_k := \sigma(X_{k,1},\dots,X_{k,n_k}) \subset \mathcal{F}. \end{align*} The hypothesis that all samples are independent implies that the sub-$\sigma$-algebras $\mathcal{G}_1,\dots,\mathcal{G}_K$ are independent. Indeed, for events $A_k \in \mathcal{G}_k$, the defining identity $\mathbb{P}(A_1 \cap \cdots \cap A_K) = \prod_{k=1}^K \mathbb{P}(A_k)$ first holds on cylinder events determined by finitely many coordinates in each block by independence of the full family, and the finite monotone-class theorem extends it to the generated $\sigma$-algebras. Since $Y_k$ is $\mathcal{G}_k$-measurable for each $k$, the random variables $Y_1,\dots,Y_K$ are independent. [/step] [step:Expand the variance of the weighted stratified estimator] Define the real-valued random variable \begin{align*} S : \Omega \to \mathbb{R}, \qquad S(\omega) := \sum_{k=1}^K p_k Y_k(\omega). \end{align*} Then \begin{align*} \hat{I}_{\mathrm{strat}} = \mathcal{L}^d(D) S. \end{align*} Since $\mathcal{L}^d(D)$ is a deterministic finite constant, the variance homogeneity identity $\operatorname{Var}(cV)=c^2\operatorname{Var}(V)$ for $c \in \mathbb{R}$ and $V \in L^2(\Omega,\mathcal{F},\mathbb{P})$ gives \begin{align*} \operatorname{Var}(\hat{I}_{\mathrm{strat}}) = \mathcal{L}^d(D)^2 \operatorname{Var}(S). \end{align*} For each $k$, define the centered weighted random variable \begin{align*} W_k : \Omega \to \mathbb{R}, \qquad W_k(\omega) := p_k\bigl(Y_k(\omega) - \mathbb{E}[Y_k]\bigr). \end{align*} The variables $W_1,\dots,W_K$ are independent, square-integrable, and have mean zero, because the variables $Y_1,\dots,Y_K$ have these properties after centering and deterministic scaling. Moreover, \begin{align*} S - \mathbb{E}[S] = \sum_{k=1}^K W_k. \end{align*} By the definition of variance for a square-integrable real-valued random variable, applied to $S \in L^2(\Omega,\mathcal{F},\mathbb{P})$, \begin{align*} \operatorname{Var}(S) = \mathbb{E}\left[\left(\sum_{k=1}^K W_k\right)^2\right]. \end{align*} Expanding the finite square, \begin{align*} \operatorname{Var}(S) = \sum_{k=1}^K\sum_{\ell=1}^K \mathbb{E}[W_k W_\ell]. \end{align*} For $k \neq \ell$, independence and zero mean imply \begin{align*} \mathbb{E}[W_k W_\ell] = \mathbb{E}[W_k]\mathbb{E}[W_\ell] = 0. \end{align*} For $k = \ell$, \begin{align*} \mathbb{E}[W_k^2] = p_k^2 \operatorname{Var}(Y_k). \end{align*} Consequently, this proves the finite additivity of variance for the independent centered square-integrable summands $W_1,\dots,W_K$ in the present setting, namely \begin{align*} \operatorname{Var}(S) = \sum_{k=1}^K p_k^2 \operatorname{Var}(Y_k). \end{align*} Substituting $\operatorname{Var}(Y_k) = \sigma_k^2/n_k$ from the previous computation yields \begin{align*} \operatorname{Var}(S) = \sum_{k=1}^K \frac{p_k^2\sigma_k^2}{n_k}. \end{align*} Multiplying by $\mathcal{L}^d(D)^2$ gives \begin{align*} \operatorname{Var}(\hat{I}_{\mathrm{strat}}) = \mathcal{L}^d(D)^2 \sum_{k=1}^K \frac{p_k^2\sigma_k^2}{n_k}. \end{align*} This is the desired variance formula. [/step]

Explore Further

Schur Complement Criterion for Positive Semidefinite Block Matrices applied Transitivity of Polynomial-Time Many-One Reductions applied Importance Sampling Identity applied Spectral Theorem for Self-Adjoint Operators applied Characterization of ZPP as RP Intersect coRP applied Stability from Uniform Compact Near-Minimizers applied Karush Kuhn Tucker Theorem For Convex Programmes applied One-Dimensional Stationary Phase applied

What brings you to Androma?

Start with a route through the knowledge graph.

Variance Formula for Stratified Sampling (Theorem # 7204)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Variance Formula for Stratified Sampling (Theorem # 7204)

Discussion

Proof

Explore Further