Bonferroni Simultaneous Confidence Intervals for Linear Contrasts of a Multivariate Normal Mean

Bonferroni Simultaneous Confidence Intervals for Linear Contrasts of a Multivariate Normal Mean (Theorem # 4028)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] Project the multivariate observations onto each fixed contrast direction $a_j$, reducing the problem to scalar normal samples. For every nonzero contrast, the usual centered and studentized sample mean has a Student $t$ distribution with $n-1$ degrees of freedom, so the chosen Bonferroni quantile gives marginal failure probability $\alpha/k$. Zero contrast vectors are harmless because their target and interval both collapse to $0$. Finally, the union bound converts the $k$ marginal failure bounds into the stated simultaneous coverage bound. [/proofplan] [step:Project each fixed contrast to a scalar normal sample] For each $j \in \{1,\dots,k\}$ and $i \in \{1,\dots,n\}$, define the real-valued random variable \begin{align*} Y_{ij}: (\Omega,\mathcal{F}) &\to (\mathbb{R},\mathcal{B}(\mathbb{R})) \\ \omega &\mapsto a_j^\top X_i(\omega). \end{align*} Since $X_1,\dots,X_n$ are independent and identically distributed as $\mathcal{N}_p(\mu,\Sigma)$ and $a_j$ is deterministic, the random variables $Y_{1j},\dots,Y_{nj}$ are independent and identically distributed with univariate normal distribution \begin{align*} Y_{ij} \sim \mathcal{N}(a_j^\top \mu,\, a_j^\top \Sigma a_j). \end{align*} Define the scalar sample mean and scalar sample variance for the $j$th projected sample by \begin{align*} \bar Y_j := \frac{1}{n}\sum_{i=1}^n Y_{ij}, \qquad V_j := \frac{1}{n-1}\sum_{i=1}^n (Y_{ij}-\bar Y_j)^2 . \end{align*} By the definitions of $\bar X$ and $S$, \begin{align*} \bar Y_j = a_j^\top \bar X \end{align*} and \begin{align*} V_j &= \frac{1}{n-1}\sum_{i=1}^n \left(a_j^\top X_i-a_j^\top \bar X\right)^2 \\ &= \frac{1}{n-1}\sum_{i=1}^n a_j^\top (X_i-\bar X)(X_i-\bar X)^\top a_j \\ &= a_j^\top S a_j . \end{align*} [guided] The goal is to prove a statement about $a_j^\top \bar X$ and $a_j^\top S a_j$. For a fixed $j$, these are exactly the ordinary sample mean and sample variance of the scalar observations obtained by applying the fixed linear functional $x \mapsto a_j^\top x$ to each vector observation. For each $j \in \{1,\dots,k\}$ and $i \in \{1,\dots,n\}$, define \begin{align*} Y_{ij}: (\Omega,\mathcal{F}) &\to (\mathbb{R},\mathcal{B}(\mathbb{R})) \\ \omega &\mapsto a_j^\top X_i(\omega). \end{align*} The map $x \mapsto a_j^\top x$ is deterministic and linear. Therefore applying it to independent random vectors preserves independence, and since the $X_i$ have the same distribution, the $Y_{ij}$ also have the same distribution for fixed $j$. A linear image of a multivariate normal random vector is univariate normal, with mean and variance computed by the usual linear transformation formulas: \begin{align*} Y_{ij} \sim \mathcal{N}(a_j^\top \mu,\, a_j^\top \Sigma a_j). \end{align*} Now define the scalar sample mean and scalar sample variance by \begin{align*} \bar Y_j := \frac{1}{n}\sum_{i=1}^n Y_{ij}, \qquad V_j := \frac{1}{n-1}\sum_{i=1}^n (Y_{ij}-\bar Y_j)^2 . \end{align*} The sample mean is \begin{align*} \bar Y_j = \frac{1}{n}\sum_{i=1}^n a_j^\top X_i = a_j^\top \left(\frac{1}{n}\sum_{i=1}^n X_i\right) = a_j^\top \bar X . \end{align*} For the variance, substitute $Y_{ij}=a_j^\top X_i$ and $\bar Y_j=a_j^\top \bar X$: \begin{align*} V_j &= \frac{1}{n-1}\sum_{i=1}^n \left(a_j^\top X_i-a_j^\top \bar X\right)^2 \\ &= \frac{1}{n-1}\sum_{i=1}^n \left(a_j^\top (X_i-\bar X)\right)^2 \\ &= \frac{1}{n-1}\sum_{i=1}^n a_j^\top (X_i-\bar X)(X_i-\bar X)^\top a_j \\ &= a_j^\top \left[ \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^\top \right] a_j \\ &= a_j^\top S a_j . \end{align*} Thus the displayed interval in the theorem is precisely the usual scalar $t$ interval for the projected observations $Y_{1j},\dots,Y_{nj}$. [/guided] [/step] [step:Obtain the marginal coverage probability for every contrast] Define the Bonferroni critical value \begin{align*} c := t_{n-1;1-\alpha/(2k)} . \end{align*} For each $j \in \{1,\dots,k\}$, define the coverage event \begin{align*} A_j := \left\{ \omega \in \Omega : |a_j^\top \bar X(\omega)-a_j^\top \mu| \leq c\sqrt{\frac{a_j^\top S(\omega)a_j}{n}} \right\}. \end{align*} If $a_j=0$, then $a_j^\top \bar X=0$, $a_j^\top \mu=0$, and $a_j^\top S a_j=0$, so $A_j=\Omega$ and hence \begin{align*} \mathbb{P}(A_j^c)=0 \leq \frac{\alpha}{k}. \end{align*} Now suppose $a_j \neq 0$. Since $\Sigma$ is positive definite, \begin{align*} \sigma_j^2 := a_j^\top \Sigma a_j > 0. \end{align*} By the standard univariate normal theory fact that the studentized mean of an independent normal sample has a Student $t$ distribution with $n-1$ degrees of freedom (citing a result not yet in the wiki: Student $t$ statistic for a normal sample), the statistic \begin{align*} T_j := \frac{\bar Y_j-a_j^\top \mu}{\sqrt{V_j/n}} \end{align*} has the Student $t$ distribution with $n-1$ degrees of freedom. Using $\bar Y_j=a_j^\top \bar X$ and $V_j=a_j^\top S a_j$, the event $A_j$ is exactly $\{|T_j|\leq c\}$. Therefore, by the definition of $c$ as the two-sided $1-\alpha/k$ Student $t$ critical value, \begin{align*} \mathbb{P}(A_j^c) = \mathbb{P}(|T_j|>c) = \frac{\alpha}{k}. \end{align*} Thus for every $j \in \{1,\dots,k\}$, \begin{align*} \mathbb{P}(A_j^c) \leq \frac{\alpha}{k}. \end{align*} [guided] We now prove the marginal confidence statement for one contrast at a time. Define \begin{align*} c := t_{n-1;1-\alpha/(2k)} . \end{align*} For each $j \in \{1,\dots,k\}$, define the event \begin{align*} A_j := \left\{ \omega \in \Omega : |a_j^\top \bar X(\omega)-a_j^\top \mu| \leq c\sqrt{\frac{a_j^\top S(\omega)a_j}{n}} \right\}. \end{align*} This is the event that the $j$th displayed interval contains its target value $a_j^\top \mu$. There is a small degeneracy to handle first. If $a_j=0$, then the contrast is identically zero: \begin{align*} a_j^\top \bar X=0, \qquad a_j^\top \mu=0, \qquad a_j^\top S a_j=0. \end{align*} Hence the interval is $[0,0]$, which contains $0$ for every outcome $\omega \in \Omega$. Thus $A_j=\Omega$, and \begin{align*} \mathbb{P}(A_j^c)=0 \leq \frac{\alpha}{k}. \end{align*} Assume now that $a_j \neq 0$. Since $\Sigma$ is positive definite, the projected variance is strictly positive: \begin{align*} \sigma_j^2 := a_j^\top \Sigma a_j > 0. \end{align*} From the previous step, $Y_{1j},\dots,Y_{nj}$ are independent and identically distributed as $\mathcal{N}(a_j^\top \mu,\sigma_j^2)$. Therefore the usual studentized mean statistic \begin{align*} T_j := \frac{\bar Y_j-a_j^\top \mu}{\sqrt{V_j/n}} \end{align*} has the Student $t$ distribution with $n-1$ degrees of freedom by the standard univariate normal theory result for the Student statistic (citing a result not yet in the wiki: Student $t$ statistic for a normal sample). The hypotheses of that result are satisfied here: the observations are independent, normally distributed, have common finite mean $a_j^\top \mu$, have common positive variance $\sigma_j^2$, and $n\geq 2$. Because $\bar Y_j=a_j^\top \bar X$ and $V_j=a_j^\top S a_j$, the event $A_j$ can be rewritten as \begin{align*} A_j = \{|T_j|\leq c\}. \end{align*} The Student $t$ distribution is symmetric about $0$, and $c=t_{n-1;1-\alpha/(2k)}$ is chosen so that the upper tail probability is $\alpha/(2k)$. Hence the two tails together have probability $\alpha/k$: \begin{align*} \mathbb{P}(A_j^c) = \mathbb{P}(|T_j|>c) = \frac{\alpha}{2k}+\frac{\alpha}{2k} = \frac{\alpha}{k}. \end{align*} Combining this with the zero-contrast case gives, for every $j \in \{1,\dots,k\}$, \begin{align*} \mathbb{P}(A_j^c) \leq \frac{\alpha}{k}. \end{align*} [/guided] [/step] [step:Apply the union bound to pass from marginal to simultaneous coverage] The event that all intervals cover their corresponding targets is \begin{align*} \bigcap_{j=1}^k A_j. \end{align*} Its complement is the event that at least one interval fails: \begin{align*} \left(\bigcap_{j=1}^k A_j\right)^c = \bigcup_{j=1}^k A_j^c. \end{align*} By finite subadditivity of probability, \begin{align*} \mathbb{P}\left(\bigcup_{j=1}^k A_j^c\right) \leq \sum_{j=1}^k \mathbb{P}(A_j^c) \leq \sum_{j=1}^k \frac{\alpha}{k} = \alpha . \end{align*} Therefore \begin{align*} \mathbb{P}\left(\bigcap_{j=1}^k A_j\right) = 1- \mathbb{P}\left(\left(\bigcap_{j=1}^k A_j\right)^c\right) \geq 1-\alpha . \end{align*} This is exactly the asserted simultaneous coverage probability. [guided] The marginal estimates do not require the events $A_1,\dots,A_k$ to be independent. This is precisely why the Bonferroni argument is useful: it only uses finite subadditivity of probability. The event of simultaneous success is \begin{align*} \bigcap_{j=1}^k A_j, \end{align*} because this says that every one of the $k$ intervals contains its corresponding target. Its complement is the event that at least one interval fails. By De Morgan's law, \begin{align*} \left(\bigcap_{j=1}^k A_j\right)^c = \bigcup_{j=1}^k A_j^c. \end{align*} Now apply finite subadditivity of probability to the events $A_1^c,\dots,A_k^c$: \begin{align*} \mathbb{P}\left(\bigcup_{j=1}^k A_j^c\right) \leq \sum_{j=1}^k \mathbb{P}(A_j^c). \end{align*} From the previous step, each failure probability is at most $\alpha/k$, so \begin{align*} \mathbb{P}\left(\bigcup_{j=1}^k A_j^c\right) \leq \sum_{j=1}^k \frac{\alpha}{k} = \alpha . \end{align*} Taking complements gives \begin{align*} \mathbb{P}\left(\bigcap_{j=1}^k A_j\right) = 1- \mathbb{P}\left(\left(\bigcap_{j=1}^k A_j\right)^c\right) \geq 1-\alpha . \end{align*} Since $A_j$ was defined exactly as the event that the $j$th displayed interval contains $a_j^\top \mu$, this proves the claimed simultaneous coverage statement. [/guided] [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Wiener–Khinchin Theorem probability Birkhoff Ergodic Theorem for Sample Means probability Positive Definiteness of the Autocovariance Function probability Wishart Mean and Entrywise Covariance Formula probability Affine Stability of the Multivariate Normal Distribution probability Wilks' Lambda for Testing Zero Cross-Covariance probability Tail Dependence Coefficients of the Bivariate Student $t$ Copula probability Linear Forecast from the Wold Representation probability

What brings you to Androma?

Start with a route through the knowledge graph.