Ensemble Variance Formula — Statement & Proof

Ensemble Variance Formula (Theorem # 1949)

Theorem
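
The statement itself does not appear on this page. Judging from the hypotheses and the conclusion that the proof below invokes, it presumably reads along the following lines; the exact wording, and the reading of $\hat{T}^{(b)}$ as the $b$-th base learner of the ensemble, are assumptions.

Fix $x \in \mathbb{R}^p$ and let $\hat{T}^{(1)}(x), \dots, \hat{T}^{(B)}(x)$ be identically distributed with $\operatorname{Var}(\hat{T}^{(b)}(x)) = \sigma_T^2$ and $\operatorname{Corr}(\hat{T}^{(b_1)}(x), \hat{T}^{(b_2)}(x)) = \rho$ for all $b_1 \neq b_2$. Then the average $\hat{f}_{\mathrm{rf}}(x) = \frac{1}{B}\sum_{b=1}^B \hat{T}^{(b)}(x)$ satisfies \begin{align*} \operatorname{Var}(\hat{f}_{\mathrm{rf}}(x)) = \frac{1 - \rho}{B}\,\sigma_T^2 + \rho\,\sigma_T^2. \end{align*}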



Proof

[proofplan] We expand $\operatorname{Var}(\hat{f}_{\mathrm{rf}}(x))$ using the bilinearity of covariance, separate the diagonal (variance) and off-diagonal (covariance) terms, express the covariances in terms of $\rho$ and $\sigma_T^2$, and rearrange the resulting expression into the stated form. [/proofplan]

[step:Expand the variance of the average using bilinearity of covariance]
Fix $x \in \mathbb{R}^p$. Write $T_b := \hat{T}^{(b)}(x)$ for brevity. By the bilinearity of covariance applied to the linear combination $\hat{f}_{\mathrm{rf}}(x) = \frac{1}{B}\sum_{b=1}^B T_b$: \begin{align*} \operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^B T_b\right) = \frac{1}{B^2} \sum_{b_1=1}^B \sum_{b_2=1}^B \operatorname{Cov}(T_{b_1}, T_{b_2}). \end{align*}
This uses $\operatorname{Var}\!\left(\sum_b \alpha_b T_b\right) = \sum_{b_1, b_2} \alpha_{b_1} \alpha_{b_2} \operatorname{Cov}(T_{b_1}, T_{b_2})$ with $\alpha_b = 1/B$ for each $b$.
[/step]

[step:Separate diagonal and off-diagonal contributions]
Split the double sum into the $B$ diagonal terms ($b_1 = b_2$) and the $B(B-1)$ off-diagonal terms ($b_1 \neq b_2$): \begin{align*} \sum_{b_1=1}^B \sum_{b_2=1}^B \operatorname{Cov}(T_{b_1}, T_{b_2}) = \sum_{b=1}^B \operatorname{Var}(T_b) + \sum_{\substack{b_1, b_2 = 1 \\ b_1 \neq b_2}}^B \operatorname{Cov}(T_{b_1}, T_{b_2}). \end{align*}

For the diagonal terms: since the $T_b$ are identically distributed with $\operatorname{Var}(T_b) = \sigma_T^2$, the sum of variances is $B \sigma_T^2$.

For the off-diagonal terms: by definition, $\operatorname{Corr}(T_{b_1}, T_{b_2}) = \operatorname{Cov}(T_{b_1}, T_{b_2}) / (\sigma_T \cdot \sigma_T)$ for $b_1 \neq b_2$, since all marginal standard deviations equal $\sigma_T$. The hypothesis $\operatorname{Corr}(T_{b_1}, T_{b_2}) = \rho$ for all $b_1 \neq b_2$ gives $\operatorname{Cov}(T_{b_1}, T_{b_2}) = \rho \sigma_T^2$. There are $B(B-1)$ ordered pairs with $b_1 \neq b_2$, so the off-diagonal sum equals $B(B-1) \rho \sigma_T^2$.

Substituting: \begin{align*} \operatorname{Var}(\hat{f}_{\mathrm{rf}}(x)) = \frac{1}{B^2}\bigl[B\sigma_T^2 + B(B-1)\rho\sigma_T^2\bigr] = \frac{\sigma_T^2}{B} + \frac{(B-1)\rho\sigma_T^2}{B}. \end{align*}
[/step]

[step:Rearrange into the stated form]
Factor out $\sigma_T^2 / B$: \begin{align*} \frac{\sigma_T^2}{B} + \frac{(B-1)\rho\sigma_T^2}{B} = \frac{\sigma_T^2}{B}\bigl[1 + (B-1)\rho\bigr] = \frac{\sigma_T^2}{B}\bigl[1 - \rho + B\rho\bigr] = \frac{(1-\rho)\sigma_T^2}{B} + \rho\,\sigma_T^2. \end{align*}
The second equality uses $1 + (B-1)\rho = 1 + B\rho - \rho = (1 - \rho) + B\rho$. Therefore \begin{align*} \operatorname{Var}(\hat{f}_{\mathrm{rf}}(x)) = \frac{1 - \rho}{B}\,\sigma_T^2 + \rho\,\sigma_T^2, \end{align*} which is the stated decomposition.

The first term $\frac{1-\rho}{B}\sigma_T^2$ vanishes as $B \to \infty$ (the benefit of averaging), while the second term $\rho\,\sigma_T^2$ persists regardless of the ensemble size (the cost of correlation between the base learners).
[/step]
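
As a sanity check on the algebra, the sketch below sums the $B \times B$ covariance entries directly, as in the first two steps of the proof, and compares the result with the closed form. The function names and the sample values of $B$, $\rho$, and $\sigma_T^2$ are illustrative assumptions, not part of the source.

```python
def ensemble_variance_direct(B, rho, sigma2):
    """Variance of the average via the double sum (1/B^2) * sum Cov(T_b1, T_b2).

    Assumes every T_b has variance sigma2 and every distinct pair has
    correlation rho, so Cov(T_b1, T_b2) = rho * sigma2 for b1 != b2.
    """
    total = 0.0
    for b1 in range(B):
        for b2 in range(B):
            # Diagonal entries are variances; off-diagonal entries are covariances.
            total += sigma2 if b1 == b2 else rho * sigma2
    return total / B**2


def ensemble_variance_formula(B, rho, sigma2):
    """Closed form from the proof: (1 - rho) / B * sigma2 + rho * sigma2."""
    return (1.0 - rho) / B * sigma2 + rho * sigma2


if __name__ == "__main__":
    rho, sigma2 = 0.5, 4.0  # hypothetical values chosen for illustration
    for B in (1, 10, 100, 1000):
        direct = ensemble_variance_direct(B, rho, sigma2)
        closed = ensemble_variance_formula(B, rho, sigma2)
        print(f"B={B:5d}  direct={direct:.6f}  closed form={closed:.6f}")
```

Under these assumed values the two columns should agree up to floating-point rounding, and both should approach $\rho\,\sigma_T^2$ as $B$ grows, which is the floor that averaging cannot remove.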

