Noncentral F Distribution of Hotelling's $T^2$ Statistic

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first whiten and recenter the observations so that the problem becomes the corresponding statement for independent Gaussian vectors with identity covariance and common mean $\delta=\Sigma^{-1/2}(\mu-\mu_0)$. An orthogonal change of coordinates in the sample index separates the transformed sample mean from the transformed residuals, giving an independent noncentral Gaussian vector and a central Wishart matrix. The remaining distributional reduction is a block decomposition of a Wishart matrix: for $Z\sim \mathcal N_p(\theta,I_p)$ independent of $W\sim W_p(I_p,n-1)$, the quadratic form $Z^\top W^{-1}Z$ has the same law as $\chi^2_p(|\theta|^2)/\chi^2_{n-p}$ with independent numerator and denominator. This is exactly the defining representation of the noncentral $F$ distribution after multiplying by $(n-p)/p$. [/proofplan] [step:Whiten the observations and rewrite Hotelling's statistic] Let $\Sigma^{1/2}$ denote the unique symmetric positive definite square root of $\Sigma$, and let $\Sigma^{-1/2}$ denote its inverse. Define \begin{align*} \delta &= \Sigma^{-1/2}(\mu-\mu_0) \in \mathbb{R}^p, \end{align*} and define transformed random vectors \begin{align*} Y_i : \Omega &\to \mathbb{R}^p \\ \omega &\mapsto \Sigma^{-1/2}(X_i(\omega)-\mu_0) \end{align*} for $i=1,\dots,n$. Since affine transformations of multivariate Gaussian random vectors are Gaussian, the random vectors $Y_1,\dots,Y_n$ are independent with common distribution $\mathcal N_p(\delta,I_p)$. Define \begin{align*} \bar Y &= \frac{1}{n}\sum_{i=1}^n Y_i, \\ S_Y &= \frac{1}{n-1}\sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top. \end{align*} Then \begin{align*} \bar Y = \Sigma^{-1/2}(\bar X-\mu_0), \end{align*} and \begin{align*} S_Y = \Sigma^{-1/2}S\Sigma^{-1/2}. \end{align*} Since $S_Y^{-1}=\Sigma^{1/2}S^{-1}\Sigma^{1/2}$ whenever $S$ is invertible, Hotelling's statistic may be written as \begin{align*} T^2 &= n(\bar X-\mu_0)^\top S^{-1}(\bar X-\mu_0) \\ &= n\bar Y^\top S_Y^{-1}\bar Y. \end{align*} Thus it suffices to determine the distribution of $n\bar Y^\top S_Y^{-1}\bar Y$ in the standardized model. [guided] The first move is to remove the covariance matrix from the problem. Because $\Sigma$ is symmetric positive definite, it has a unique symmetric positive definite square root $\Sigma^{1/2}$, and $\Sigma^{-1/2}$ is well-defined. We introduce the vector \begin{align*} \delta = \Sigma^{-1/2}(\mu-\mu_0), \end{align*} which is the mean shift measured in covariance-standardized coordinates. For each observation, define the whitened random vector \begin{align*} Y_i : \Omega &\to \mathbb{R}^p \\ \omega &\mapsto \Sigma^{-1/2}(X_i(\omega)-\mu_0). \end{align*} The transformation $x\mapsto \Sigma^{-1/2}(x-\mu_0)$ is affine. Therefore each $Y_i$ is Gaussian with mean \begin{align*} \mathbb{E}[Y_i] = \Sigma^{-1/2}(\mu-\mu_0)=\delta \end{align*} and covariance \begin{align*} \Sigma^{-1/2}\Sigma(\Sigma^{-1/2})^\top = I_p, \end{align*} because $\Sigma^{-1/2}$ is symmetric. Independence is preserved because each $Y_i$ is a measurable function of $X_i$ alone. Hence $Y_1,\dots,Y_n$ are independent with common distribution $\mathcal N_p(\delta,I_p)$. Now define \begin{align*} \bar Y &= \frac{1}{n}\sum_{i=1}^n Y_i, \\ S_Y &= \frac{1}{n-1}\sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top. \end{align*} From the definition of $Y_i$, \begin{align*} \bar Y=\Sigma^{-1/2}(\bar X-\mu_0). \end{align*} Also, \begin{align*} Y_i-\bar Y=\Sigma^{-1/2}(X_i-\bar X), \end{align*} so \begin{align*} S_Y &= \frac{1}{n-1}\sum_{i=1}^n \Sigma^{-1/2}(X_i-\bar X)(X_i-\bar X)^\top \Sigma^{-1/2} \\ &= \Sigma^{-1/2}S\Sigma^{-1/2}. \end{align*} Therefore, on the almost sure event where $S$ is invertible, \begin{align*} S_Y^{-1}=\Sigma^{1/2}S^{-1}\Sigma^{1/2}. \end{align*} Substituting these identities into Hotelling's statistic gives \begin{align*} T^2 &= n(\bar X-\mu_0)^\top S^{-1}(\bar X-\mu_0) \\ &= n\bar Y^\top S_Y^{-1}\bar Y. \end{align*} Thus the original statistic is unchanged by whitening, provided we express it using the whitened sample covariance matrix. The noncentrality parameter also becomes \begin{align*} \lambda &= n(\mu-\mu_0)^\top \Sigma^{-1}(\mu-\mu_0) \\ &= n|\delta|^2. \end{align*} [/guided] [/step] [step:Separate the sample mean from the residual directions] Choose an orthogonal matrix $Q\in \mathbb{R}^{n\times n}$ whose first row is \begin{align*} \left(\frac{1}{\sqrt n},\dots,\frac{1}{\sqrt n}\right). \end{align*} Define random vectors $Z_1,\dots,Z_n : \Omega \to \mathbb{R}^p$ by \begin{align*} Z_a = \sum_{i=1}^n Q_{ai}Y_i \end{align*} for $a=1,\dots,n$. Since $(Y_1,\dots,Y_n)$ is jointly Gaussian with independent components across the sample index and common covariance $I_p$, orthogonality of $Q$ implies that $Z_1,\dots,Z_n$ are independent Gaussian random vectors. Their means are \begin{align*} \mathbb{E}[Z_1] &= \sqrt n\,\delta, \\ \mathbb{E}[Z_a] &= 0 \qquad \text{for } a=2,\dots,n, \end{align*} and each has covariance matrix $I_p$. Hence \begin{align*} Z_1 \sim \mathcal N_p(\sqrt n\,\delta,I_p), \end{align*} while $Z_2,\dots,Z_n$ are independent $\mathcal N_p(0,I_p)$ random vectors, independent of $Z_1$. Moreover, \begin{align*} Z_1=\sqrt n\,\bar Y, \end{align*} and orthogonality of $Q$ gives the residual sum-of-squares identity \begin{align*} \sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top = \sum_{a=2}^n Z_aZ_a^\top. \end{align*} Define \begin{align*} W = \sum_{a=2}^n Z_aZ_a^\top. \end{align*} Then $W$ is independent of $Z_1$, has the central Wishart distribution $W_p(I_p,n-1)$, and \begin{align*} S_Y = \frac{1}{n-1}W. \end{align*} Consequently, \begin{align*} T^2 = n\bar Y^\top S_Y^{-1}\bar Y = (n-1)Z_1^\top W^{-1}Z_1. \end{align*} [guided] The purpose of this step is to isolate the part of the data containing the sample mean from the part containing the sample covariance. We do this by rotating the $n$ observations in the sample-index direction. Choose an orthogonal matrix $Q\in \mathbb{R}^{n\times n}$ whose first row is \begin{align*} \left(\frac{1}{\sqrt n},\dots,\frac{1}{\sqrt n}\right). \end{align*} For each $a=1,\dots,n$, define \begin{align*} Z_a=\sum_{i=1}^n Q_{ai}Y_i. \end{align*} This is a linear transformation of the jointly Gaussian vector $(Y_1,\dots,Y_n)$, so $(Z_1,\dots,Z_n)$ is jointly Gaussian. Orthogonality of $Q$ preserves the covariance in the sample-index direction. More explicitly, for $a,b\in\{1,\dots,n\}$, \begin{align*} \operatorname{Cov}(Z_a,Z_b) &= \sum_{i=1}^n\sum_{j=1}^n Q_{ai}Q_{bj}\operatorname{Cov}(Y_i,Y_j) \\ &= \sum_{i=1}^n Q_{ai}Q_{bi} I_p \\ &= \delta_{ab}I_p, \end{align*} where $\delta_{ab}$ denotes the Kronecker delta. Since the vector is jointly Gaussian, zero covariance between distinct $Z_a$ and $Z_b$ implies independence. The means are computed from $\mathbb{E}[Y_i]=\delta$: \begin{align*} \mathbb{E}[Z_a] &= \sum_{i=1}^n Q_{ai}\delta \\ &= \left(\sum_{i=1}^n Q_{ai}\right)\delta. \end{align*} For $a=1$, the row sum is $\sqrt n$, so $\mathbb{E}[Z_1]=\sqrt n\,\delta$. For $a\ge 2$, the row of $Q$ is orthogonal to the first row, hence its entries sum to $0$, so $\mathbb{E}[Z_a]=0$. Therefore \begin{align*} Z_1\sim \mathcal N_p(\sqrt n\,\delta,I_p), \end{align*} and $Z_2,\dots,Z_n$ are independent central $\mathcal N_p(0,I_p)$ random vectors, independent of $Z_1$. The first transformed vector is the scaled sample mean: \begin{align*} Z_1=\sum_{i=1}^n \frac{1}{\sqrt n}Y_i=\sqrt n\,\bar Y. \end{align*} The remaining transformed vectors span the orthogonal complement of the constant sample direction, so they contain exactly the centered residual information. Orthogonality of $Q$ gives \begin{align*} \sum_{i=1}^n Y_iY_i^\top = \sum_{a=1}^n Z_aZ_a^\top. \end{align*} Since $Z_1Z_1^\top=n\bar Y\bar Y^\top$, subtracting this rank-one mean part gives \begin{align*} \sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top = \sum_{a=2}^n Z_aZ_a^\top. \end{align*} Define \begin{align*} W=\sum_{a=2}^n Z_aZ_a^\top. \end{align*} Because $Z_2,\dots,Z_n$ are independent $\mathcal N_p(0,I_p)$ random vectors, $W$ has the central Wishart distribution $W_p(I_p,n-1)$. It is independent of $Z_1$ because it is a measurable function of $Z_2,\dots,Z_n$, which are independent of $Z_1$. Finally, \begin{align*} S_Y=\frac{1}{n-1}W, \end{align*} and therefore \begin{align*} T^2 &= n\bar Y^\top S_Y^{-1}\bar Y \\ &= Z_1^\top \left(\frac{1}{n-1}W\right)^{-1}Z_1 \\ &= (n-1)Z_1^\top W^{-1}Z_1. \end{align*} This reduces the theorem to a distributional statement about one noncentral Gaussian vector independent of one central Wishart matrix. [/guided] [/step] [step:Reduce the inverse Wishart quadratic form to independent chi-square variables] Let $m=n-1$ and let $\nu=m-p+1=n-p$. We prove the following distributional fact in the present setting: if $Z:\Omega\to\mathbb{R}^p$ satisfies $Z\sim\mathcal N_p(\theta,I_p)$, if $W\sim W_p(I_p,m)$ is independent of $Z$, and if $m\ge p$, then \begin{align*} Z^\top W^{-1}Z \sim \frac{U}{V}, \end{align*} where $U\sim\chi^2_p(|\theta|^2)$, $V\sim\chi^2_{\nu}$, and $U,V$ are independent. Condition on the value $Z=z\in\mathbb{R}^p$. If $z=0$, then $z^\top W^{-1}z=0$. If $z\neq 0$, choose an orthogonal matrix $R_z\in\mathbb{R}^{p\times p}$ such that $R_zz=|z|e_1$, where $e_1=(1,0,\dots,0)^\top$. Since $W\sim W_p(I_p,m)$ is orthogonally invariant, $R_zWR_z^\top$ has the same distribution as $W$. Therefore the conditional distribution of $z^\top W^{-1}z$ equals the distribution of \begin{align*} |z|^2 e_1^\top W^{-1}e_1. \end{align*} Represent $W$ as \begin{align*} W=\sum_{r=1}^m G_rG_r^\top, \end{align*} where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent $\mathcal N_p(0,I_p)$ random vectors. Let $A\in\mathbb{R}^{m\times p}$ be the random matrix whose $r$-th row is $G_r^\top$, so $W=A^\top A$. Write $A=(a,B)$, where $a:\Omega\to\mathbb{R}^m$ is the first column and $B:\Omega\to\mathbb{R}^{m\times(p-1)}$ is the matrix of the remaining columns. Then $a$ is a standard Gaussian vector in $\mathbb{R}^m$, $B$ is independent of $a$, and almost surely $B$ has rank $p-1$. The Schur complement formula for the inverse of $W=A^\top A$ gives \begin{align*} e_1^\top W^{-1}e_1 = \frac{1}{a^\top (I_m-P_B)a}, \end{align*} where $I_m$ is the identity matrix on $\mathbb{R}^m$ and $P_B:\mathbb{R}^m\to\mathbb{R}^m$ is the [orthogonal projection](/theorems/437) onto the column space of $B$. Conditional on $B$, the matrix $I_m-P_B$ is the orthogonal projection onto a subspace of dimension \begin{align*} m-(p-1)=m-p+1=\nu. \end{align*} Since $a\sim\mathcal N_m(0,I_m)$ and $a$ is independent of $B$, the quadratic form \begin{align*} a^\top(I_m-P_B)a \end{align*} has the $\chi^2_{\nu}$ distribution, conditionally on $B$ and hence unconditionally. Denote this random variable by $V$. Its conditional distribution does not depend on $z$, and $W$ is independent of $Z$, so $V$ may be chosen independent of $Z$. Thus, conditionally on $Z=z$, \begin{align*} Z^\top W^{-1}Z \sim \frac{|z|^2}{V}, \end{align*} with $V\sim\chi^2_{\nu}$ independent of $Z$. Since $Z\sim\mathcal N_p(\theta,I_p)$, the squared norm \begin{align*} U=|Z|^2 \end{align*} has the noncentral chi-square distribution $\chi^2_p(|\theta|^2)$. Hence \begin{align*} Z^\top W^{-1}Z \sim \frac{U}{V}, \end{align*} with $U$ and $V$ independent. [guided] We now need the exact distribution of the quadratic form $Z^\top W^{-1}Z$, where $Z$ is Gaussian and $W$ is an independent central Wishart matrix. The key point is that, after conditioning on $Z=z$, the Wishart matrix is rotationally symmetric. This lets us rotate $z$ onto the first coordinate axis and compute only the upper-left entry of $W^{-1}$. Let $m=n-1$ and $\nu=m-p+1=n-p$. Consider the general situation where \begin{align*} Z\sim\mathcal N_p(\theta,I_p), \end{align*} where $\theta\in\mathbb{R}^p$, and where $W\sim W_p(I_p,m)$ is independent of $Z$. Since $m=n-1\ge p$, the Wishart matrix $W$ is invertible almost surely. Fix $z\in\mathbb{R}^p$ and condition on the event $Z=z$ in the regular conditional distribution sense. If $z=0$, then $z^\top W^{-1}z=0$, so there is nothing to rotate. If $z\neq 0$, choose an orthogonal matrix $R_z\in\mathbb{R}^{p\times p}$ satisfying \begin{align*} R_zz=|z|e_1, \end{align*} where $e_1=(1,0,\dots,0)^\top$. The central Wishart distribution with scale matrix $I_p$ is invariant under orthogonal conjugation: $R_zWR_z^\top$ has the same distribution as $W$. Therefore \begin{align*} z^\top W^{-1}z &= (R_zz)^\top (R_zWR_z^\top)^{-1}(R_zz) \\ &\sim |z|^2 e_1^\top W^{-1}e_1 \end{align*} conditionally on $Z=z$. It remains to compute $e_1^\top W^{-1}e_1$. Represent the Wishart matrix as \begin{align*} W=\sum_{r=1}^m G_rG_r^\top, \end{align*} where $G_1,\dots,G_m$ are independent $\mathcal N_p(0,I_p)$ random vectors. Equivalently, if $A\in\mathbb{R}^{m\times p}$ is the matrix whose $r$-th row is $G_r^\top$, then \begin{align*} W=A^\top A. \end{align*} Write the columns of $A$ as \begin{align*} A=(a,B), \end{align*} where $a:\Omega\to\mathbb{R}^m$ is the first column and $B:\Omega\to\mathbb{R}^{m\times(p-1)}$ contains the remaining columns. The entries of $A$ are independent standard normal random variables, so $a$ is a standard Gaussian vector in $\mathbb{R}^m$, $B$ is independent of $a$, and $B$ has rank $p-1$ almost surely because $m\ge p$. With respect to this block decomposition, \begin{align*} W=A^\top A = \begin{pmatrix} a^\top a & a^\top B \\ B^\top a & B^\top B \end{pmatrix}. \end{align*} The Schur complement formula says that the upper-left entry of $W^{-1}$ is the reciprocal of the Schur complement of $B^\top B$: \begin{align*} e_1^\top W^{-1}e_1 = \frac{1}{a^\top a-a^\top B(B^\top B)^{-1}B^\top a}. \end{align*} Let $P_B:\mathbb{R}^m\to\mathbb{R}^m$ be the orthogonal projection onto the column space of $B$. Since \begin{align*} P_B=B(B^\top B)^{-1}B^\top \end{align*} almost surely, the denominator becomes \begin{align*} a^\top(I_m-P_B)a. \end{align*} Conditional on $B$, the operator $I_m-P_B$ is the orthogonal projection onto the orthogonal complement of the column space of $B$. Since $B$ has rank $p-1$ almost surely, this orthogonal complement has dimension \begin{align*} m-(p-1)=m-p+1=\nu. \end{align*} Because $a\sim\mathcal N_m(0,I_m)$ and is independent of $B$, the squared Euclidean length of the projection of $a$ onto this $\nu$-dimensional subspace has the $\chi^2_\nu$ distribution. Thus \begin{align*} a^\top(I_m-P_B)a\sim\chi^2_\nu. \end{align*} Call this denominator $V$. The conditional distribution of $V$ does not depend on $z$, and the entire Wishart construction is independent of $Z$. Therefore $V$ is independent of $Z$. We have shown that, conditionally on $Z=z$, \begin{align*} Z^\top W^{-1}Z \sim \frac{|z|^2}{V}, \end{align*} where $V\sim\chi^2_\nu$ is independent of $Z$. Finally, since $Z\sim\mathcal N_p(\theta,I_p)$, its squared norm \begin{align*} U=|Z|^2 \end{align*} has the noncentral chi-square distribution $\chi^2_p(|\theta|^2)$. Hence \begin{align*} Z^\top W^{-1}Z \sim \frac{U}{V}, \end{align*} where $U\sim\chi^2_p(|\theta|^2)$, $V\sim\chi^2_\nu$, and $U,V$ are independent. [/guided] [/step] [step:Identify the noncentral F distribution] Apply the previous step with \begin{align*} Z=Z_1,\qquad \theta=\sqrt n\,\delta,\qquad m=n-1,\qquad \nu=n-p. \end{align*} Then \begin{align*} |\theta|^2 = n|\delta|^2 = n(\mu-\mu_0)^\top\Sigma^{-1}(\mu-\mu_0) = \lambda. \end{align*} Therefore there exist independent random variables \begin{align*} U &\sim \chi^2_p(\lambda), \\ V &\sim \chi^2_{n-p} \end{align*} such that \begin{align*} Z_1^\top W^{-1}Z_1 \sim \frac{U}{V}. \end{align*} Using $T^2=(n-1)Z_1^\top W^{-1}Z_1$, we obtain \begin{align*} \frac{n-p}{p(n-1)}T^2 &= \frac{n-p}{p}Z_1^\top W^{-1}Z_1 \\ &\sim \frac{U/p}{V/(n-p)}. \end{align*} By definition, the distribution of $(U/p)/(V/(n-p))$, with $U\sim\chi^2_p(\lambda)$ independent of $V\sim\chi^2_{n-p}$, is the noncentral $F$ distribution $F_{p,n-p}(\lambda)$. Hence \begin{align*} \frac{n-p}{p(n-1)}T^2 \sim F_{p,n-p}(\lambda). \end{align*} This proves the theorem. [guided] We now substitute the objects obtained from the data into the distributional reduction. In our case, \begin{align*} Z=Z_1,\qquad \theta=\sqrt n\,\delta,\qquad m=n-1. \end{align*} The denominator degrees of freedom from the previous step are \begin{align*} \nu=m-p+1=(n-1)-p+1=n-p. \end{align*} The noncentrality parameter of $|Z_1|^2$ is the squared length of its mean: \begin{align*} |\theta|^2 &= |\sqrt n\,\delta|^2 \\ &= n|\delta|^2 \\ &= n(\mu-\mu_0)^\top\Sigma^{-1}(\mu-\mu_0) \\ &= \lambda. \end{align*} Therefore the previous step gives independent random variables \begin{align*} U &\sim \chi^2_p(\lambda), \\ V &\sim \chi^2_{n-p} \end{align*} such that \begin{align*} Z_1^\top W^{-1}Z_1 \sim \frac{U}{V}. \end{align*} From the sample-mean and residual decomposition, we already proved \begin{align*} T^2=(n-1)Z_1^\top W^{-1}Z_1. \end{align*} Multiplying by the normalizing factor in the theorem gives \begin{align*} \frac{n-p}{p(n-1)}T^2 &= \frac{n-p}{p}Z_1^\top W^{-1}Z_1 \\ &\sim \frac{n-p}{p}\frac{U}{V} \\ &= \frac{U/p}{V/(n-p)}. \end{align*} The defining representation of the noncentral $F$ distribution with numerator degrees of freedom $p$, denominator degrees of freedom $n-p$, and noncentrality parameter $\lambda$ is precisely \begin{align*} \frac{U/p}{V/(n-p)}, \end{align*} where $U\sim\chi^2_p(\lambda)$ and $V\sim\chi^2_{n-p}$ are independent. Hence \begin{align*} \frac{n-p}{p(n-1)}T^2\sim F_{p,n-p}(\lambda). \end{align*} This is the desired distributional identity. [/guided] [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Orthogonal Projection

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.