Two-Sample Hotelling $T^2$ Distribution Theorem — Statement & Proof

Two-Sample Hotelling $T^2$ Distribution Theorem (Theorem # 4025)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We reduce the two-sample statistic to the standard normal vector plus independent Wishart matrix form. Under $H_0$, the scaled mean difference is $\mathcal{N}_p(0,\Sigma)$. The pooled scatter matrix has a Wishart distribution with $n_1+n_2-2$ degrees of freedom and is independent of the mean difference. After this reduction, the standard Hotelling-Wishart distributional calculation yields the stated $F$ law. [/proofplan] [step:Scale the sample mean difference to a centered multivariate normal vector] Let \begin{align*} \nu &:= n_1+n_2-2, & a &:= \left(\frac{1}{n_1}+\frac{1}{n_2}\right)^{-1/2}. \end{align*} Since the samples are independent and each sample consists of independent Gaussian random vectors, \begin{align*} \bar{X} &\sim \mathcal{N}_p\left(\mu_X,\frac{1}{n_1}\Sigma\right), & \bar{Y} &\sim \mathcal{N}_p\left(\mu_Y,\frac{1}{n_2}\Sigma\right), \end{align*} and $\bar{X}$ is independent of $\bar{Y}$. Therefore \begin{align*} \bar{X}-\bar{Y} \sim \mathcal{N}_p\left(\mu_X-\mu_Y,\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\Sigma\right). \end{align*} Under $H_0:\mu_X=\mu_Y$, define the scaled mean-difference random vector \begin{align*} Z:\Omega &\to \mathbb{R}^p \\ \omega &\mapsto a\bigl(\bar{X}(\omega)-\bar{Y}(\omega)\bigr). \end{align*} Then \begin{align*} Z \sim \mathcal{N}_p(0,\Sigma). \end{align*} [guided] The first goal is to turn the two-sample mean difference into the same object that appears in the one-sample Hotelling calculation: one centered Gaussian vector with covariance matrix $\Sigma$. Define \begin{align*} \nu &:= n_1+n_2-2, & a &:= \left(\frac{1}{n_1}+\frac{1}{n_2}\right)^{-1/2}. \end{align*} The sample mean $\bar{X}$ is a linear combination of independent $\mathcal{N}_p(\mu_X,\Sigma)$ random vectors, so \begin{align*} \bar{X}\sim \mathcal{N}_p\left(\mu_X,\frac{1}{n_1}\Sigma\right). \end{align*} Similarly, \begin{align*} \bar{Y}\sim \mathcal{N}_p\left(\mu_Y,\frac{1}{n_2}\Sigma\right). \end{align*} Because the entire $X$-sample is independent of the entire $Y$-sample, $\bar{X}$ and $\bar{Y}$ are independent. The difference of independent Gaussian random vectors is Gaussian, with means subtracting and covariance matrices adding: \begin{align*} \bar{X}-\bar{Y} \sim \mathcal{N}_p\left(\mu_X-\mu_Y,\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\Sigma\right). \end{align*} Under $H_0$, the mean term vanishes. Therefore the map \begin{align*} Z:\Omega &\to \mathbb{R}^p \\ \omega &\mapsto a\bigl(\bar{X}(\omega)-\bar{Y}(\omega)\bigr) \end{align*} satisfies \begin{align*} Z \sim \mathcal{N}_p(0,\Sigma). \end{align*} The scalar $a$ was chosen exactly so that the covariance factor $\frac{1}{n_1}+\frac{1}{n_2}$ is removed. [/guided] [/step] [step:Identify the pooled covariance matrix as an independent Wishart estimate] Define the pooled scatter matrix \begin{align*} W := \nu S_p. \end{align*} We use the following convention for [Wishart distribution](/page/Wishart%20Distribution) notation: for a positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$ and an integer $m \geq 0$, the notation $W_p(\Sigma,m)$ denotes the law of \begin{align*} \sum_{k=1}^{m} G_kG_k^\top, \end{align*} where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent random vectors with distribution $\mathcal{N}_p(0,\Sigma)$; when $m=0$, the sum is the zero matrix. By the standard Gaussian sample-mean and residual-scatter decomposition, for the first sample, \begin{align*} \sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top \sim W_p(\Sigma,n_1-1), \end{align*} and this scatter matrix is independent of $\bar{X}$. For the second sample, \begin{align*} \sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top \sim W_p(\Sigma,n_2-1), \end{align*} and this scatter matrix is independent of $\bar{Y}$. Since the two samples are independent, the two scatter matrices are independent of each other and jointly independent of $(\bar{X},\bar{Y})$. The additivity of independent Wishart matrices with common scale matrix $\Sigma$ gives \begin{align*} W \sim W_p(\Sigma,\nu). \end{align*} Moreover, $W$ is independent of $Z$ because $Z$ is a measurable function of $(\bar{X},\bar{Y})$. Since $n_1+n_2-1>p$, we have $\nu>p-1$, so $W$ and $S_p$ are invertible almost surely. Here we are using the standard Gaussian-Wishart decomposition and Wishart additivity facts (citing results not yet in the wiki: independence of Gaussian sample mean and sample scatter; additivity of independent Wishart matrices). [guided] Now we identify the random matrix that appears in the denominator of $T^2$. Define \begin{align*} W := \nu S_p, \end{align*} where $\nu=n_1+n_2-2$. We use the following convention for [Wishart distribution](/page/Wishart%20Distribution) notation: for a positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$ and an integer $m \geq 0$, the notation $W_p(\Sigma,m)$ denotes the law of \begin{align*} \sum_{k=1}^{m} G_kG_k^\top, \end{align*} where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent random vectors with distribution $\mathcal{N}_p(0,\Sigma)$. If $m=0$, this sum is the zero matrix. This convention covers the cases $n_1=1$ or $n_2=1$, where one residual scatter matrix has zero degrees of freedom. By the definition of $S_p$, \begin{align*} W = \sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top + \sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top. \end{align*} For a Gaussian sample, the sample mean and residual scatter matrix are independent, and the residual scatter matrix has a Wishart distribution. Applying this to the $X$-sample gives \begin{align*} \sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top \sim W_p(\Sigma,n_1-1), \end{align*} with independence from $\bar{X}$. Applying the same fact to the $Y$-sample gives \begin{align*} \sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top \sim W_p(\Sigma,n_2-1), \end{align*} with independence from $\bar{Y}$. The two samples are independent, so the two scatter matrices are independent of each other. They are also jointly independent of the pair $(\bar{X},\bar{Y})$. Because $Z$ is defined only from $(\bar{X},\bar{Y})$, this implies that $W$ is independent of $Z$. The sum of independent Wishart matrices with the same scale matrix is Wishart, with degrees of freedom adding. Hence \begin{align*} W \sim W_p(\Sigma,(n_1-1)+(n_2-1)) = W_p(\Sigma,\nu). \end{align*} The condition $n_1+n_2-1>p$ is exactly the condition $\nu>p-1$, which guarantees that a $W_p(\Sigma,\nu)$ matrix is nonsingular almost surely. Therefore $S_p=W/\nu$ is also nonsingular almost surely. This step uses the standard Gaussian-Wishart decomposition and Wishart additivity facts (citing results not yet in the wiki: independence of Gaussian sample mean and sample scatter; additivity of independent Wishart matrices). [/guided] [/step] [step:Rewrite $T^2$ in the one-sample Hotelling form] Since \begin{align*} a^2=\left(\frac{1}{n_1}+\frac{1}{n_2}\right)^{-1} =\frac{n_1n_2}{n_1+n_2}, \end{align*} the definition of $Z$ gives \begin{align*} Z^\top S_p^{-1}Z &= a^2(\bar{X}-\bar{Y})^\top S_p^{-1}(\bar{X}-\bar{Y}) \\ &= \frac{n_1n_2}{n_1+n_2} (\bar{X}-\bar{Y})^\top S_p^{-1}(\bar{X}-\bar{Y}) \\ &= T^2. \end{align*} Since $S_p=W/\nu$, we also have $S_p^{-1}=\nu W^{-1}$ almost surely, and therefore \begin{align*} T^2 = \nu Z^\top W^{-1}Z. \end{align*} Thus $T^2$ has the standard Hotelling-Wishart form with $Z\sim \mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, and $Z$ independent of $W$. [/step] [step:Apply the Hotelling-Wishart distribution calculation] The standard Hotelling-Wishart distribution calculation states that if $Z\sim\mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, $Z$ and $W$ are independent, $\Sigma$ is positive definite, and $\nu>p-1$, then \begin{align*} \frac{\nu-p+1}{p\nu}\,\nu Z^\top W^{-1}Z \sim F_{p,\nu-p+1}. \end{align*} The hypotheses have been verified: $\Sigma$ is positive definite by assumption, $Z\sim\mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, $Z$ and $W$ are independent, and $\nu>p-1$. Substituting $T^2=\nu Z^\top W^{-1}Z$ and $\nu=n_1+n_2-2$ yields \begin{align*} \frac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2 \sim F_{p,n_1+n_2-p-1}. \end{align*} This is the desired distribution under $H_0$. Here we are using the standard one-sample Hotelling-Wishart distribution calculation (citing a result not yet in the wiki: one-sample Hotelling $T^2$ distribution theorem). [guided] At this point the two-sample problem has been reduced to a standard distributional form. We have a Gaussian vector \begin{align*} Z\sim\mathcal{N}_p(0,\Sigma), \end{align*} an independent Wishart matrix \begin{align*} W\sim W_p(\Sigma,\nu), \end{align*} and the statistic has been rewritten as \begin{align*} T^2=\nu Z^\top W^{-1}Z. \end{align*} The standard Hotelling-Wishart distribution calculation says that whenever $\Sigma$ is positive definite, $Z\sim\mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, $Z$ and $W$ are independent, and $\nu>p-1$, then \begin{align*} \frac{\nu-p+1}{p\nu}\,\nu Z^\top W^{-1}Z \sim F_{p,\nu-p+1}. \end{align*} We verify each condition in the present setting. The positive definiteness of $\Sigma$ is an assumption of the theorem. The distribution of $Z$ was proved in the first step. The distribution of $W$ and the independence of $Z$ and $W$ were proved in the second step. Finally, \begin{align*} \nu>p-1 \end{align*} is equivalent to \begin{align*} n_1+n_2-2>p-1, \end{align*} which is exactly the stated hypothesis $n_1+n_2-1>p$. Therefore the Hotelling-Wishart calculation applies. Substituting $T^2=\nu Z^\top W^{-1}Z$ gives \begin{align*} \frac{\nu-p+1}{p\nu}T^2 \sim F_{p,\nu-p+1}. \end{align*} Since $\nu=n_1+n_2-2$, the numerator degrees of freedom remain $p$, and the denominator degrees of freedom become \begin{align*} \nu-p+1 = n_1+n_2-p-1. \end{align*} Thus \begin{align*} \frac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2 \sim F_{p,n_1+n_2-p-1}. \end{align*} This proves the claimed null distribution. This final step uses the standard one-sample Hotelling-Wishart distribution calculation (citing a result not yet in the wiki: one-sample Hotelling $T^2$ distribution theorem). [/guided] [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Forecast Error Variance of a Causal ARMA Process probability Wishart Principal Block Marginal Theorem probability Nonnegative Definiteness of the Spectral Density Matrix probability Characteristic Function of the Multivariate Normal Distribution probability Gauss-Markov Efficiency of Generalized Least Squares in Seemingly Unrelated Regressions probability Law of the Unconscious Statistician probability Diagonal Marginals of the Wishart Distribution probability Durbin-Levinson Recursion probability

What brings you to Androma?

Start with a route through the knowledge graph.

Two-Sample Hotelling $T^2$ Distribution Theorem (Theorem # 4025)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Two-Sample Hotelling $T^2$ Distribution Theorem (Theorem # 4025)

Discussion

Proof

Prerequisites (0/1 completed)

Prerequisites Graph

Explore Further