Androma — The Home of Mathematics on the Internet

custom_env admin

[guided]The first move is to remove the covariance matrix from the problem. Because $\Sigma$ is symmetric positive definite, it has a unique symmetric positive definite square root $\Sigma^{1/2}$, and $\Sigma^{-1/2}$ is well-defined. We introduce the vector \begin{align*} \delta = \Sigma^{-1/2}(\mu-\mu_0), \end{align*} which is the mean shift measured in covariance-standardized coordinates. For each observation, define the whitened random vector \begin{align*} Y_i : \Omega &\to \mathbb{R}^p \\ \omega &\mapsto \Sigma^{-1/2}(X_i(\omega)-\mu_0). \end{align*} The transformation $x\mapsto \Sigma^{-1/2}(x-\mu_0)$ is affine. Therefore each $Y_i$ is Gaussian with mean \begin{align*} \mathbb{E}[Y_i] = \Sigma^{-1/2}(\mu-\mu_0)=\delta \end{align*} and covariance \begin{align*} \Sigma^{-1/2}\Sigma(\Sigma^{-1/2})^\top = I_p, \end{align*} because $\Sigma^{-1/2}$ is symmetric. Independence is preserved because each $Y_i$ is a measurable function of $X_i$ alone. Hence $Y_1,\dots,Y_n$ are independent with common distribution $\mathcal N_p(\delta,I_p)$. Now define \begin{align*} \bar Y &= \frac{1}{n}\sum_{i=1}^n Y_i, \\ S_Y &= \frac{1}{n-1}\sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top. \end{align*} From the definition of $Y_i$, \begin{align*} \bar Y=\Sigma^{-1/2}(\bar X-\mu_0). \end{align*} Also, \begin{align*} Y_i-\bar Y=\Sigma^{-1/2}(X_i-\bar X), \end{align*} so \begin{align*} S_Y &= \frac{1}{n-1}\sum_{i=1}^n \Sigma^{-1/2}(X_i-\bar X)(X_i-\bar X)^\top \Sigma^{-1/2} \\ &= \Sigma^{-1/2}S\Sigma^{-1/2}. \end{align*} Therefore, on the almost sure event where $S$ is invertible, \begin{align*} S_Y^{-1}=\Sigma^{1/2}S^{-1}\Sigma^{1/2}. \end{align*} Substituting these identities into Hotelling's statistic gives \begin{align*} T^2 &= n(\bar X-\mu_0)^\top S^{-1}(\bar X-\mu_0) \\ &= n\bar Y^\top S_Y^{-1}\bar Y. \end{align*} Thus the original statistic is unchanged by whitening, provided we express it using the whitened sample covariance matrix. The noncentrality parameter also becomes \begin{align*} \lambda &= n(\mu-\mu_0)^\top \Sigma^{-1}(\mu-\mu_0) \\ &= n|\delta|^2. \end{align*}[/guided]

custom_env admin

[guided]The purpose of this step is to isolate the part of the data containing the sample mean from the part containing the sample covariance. We do this by rotating the $n$ observations in the sample-index direction. Choose an orthogonal matrix $Q\in \mathbb{R}^{n\times n}$ whose first row is \begin{align*} \left(\frac{1}{\sqrt n},\dots,\frac{1}{\sqrt n}\right). \end{align*} For each $a=1,\dots,n$, define \begin{align*} Z_a=\sum_{i=1}^n Q_{ai}Y_i. \end{align*} This is a linear transformation of the jointly Gaussian vector $(Y_1,\dots,Y_n)$, so $(Z_1,\dots,Z_n)$ is jointly Gaussian. Orthogonality of $Q$ preserves the covariance in the sample-index direction. More explicitly, for $a,b\in\{1,\dots,n\}$, \begin{align*} \operatorname{Cov}(Z_a,Z_b) &= \sum_{i=1}^n\sum_{j=1}^n Q_{ai}Q_{bj}\operatorname{Cov}(Y_i,Y_j) \\ &= \sum_{i=1}^n Q_{ai}Q_{bi} I_p \\ &= \delta_{ab}I_p, \end{align*} where $\delta_{ab}$ denotes the Kronecker delta. Since the vector is jointly Gaussian, zero covariance between distinct $Z_a$ and $Z_b$ implies independence. The means are computed from $\mathbb{E}[Y_i]=\delta$: \begin{align*} \mathbb{E}[Z_a] &= \sum_{i=1}^n Q_{ai}\delta \\ &= \left(\sum_{i=1}^n Q_{ai}\right)\delta. \end{align*} For $a=1$, the row sum is $\sqrt n$, so $\mathbb{E}[Z_1]=\sqrt n\,\delta$. For $a\ge 2$, the row of $Q$ is orthogonal to the first row, hence its entries sum to $0$, so $\mathbb{E}[Z_a]=0$. Therefore \begin{align*} Z_1\sim \mathcal N_p(\sqrt n\,\delta,I_p), \end{align*} and $Z_2,\dots,Z_n$ are independent central $\mathcal N_p(0,I_p)$ random vectors, independent of $Z_1$. The first transformed vector is the scaled sample mean: \begin{align*} Z_1=\sum_{i=1}^n \frac{1}{\sqrt n}Y_i=\sqrt n\,\bar Y. \end{align*} The remaining transformed vectors span the orthogonal complement of the constant sample direction, so they contain exactly the centered residual information. Orthogonality of $Q$ gives \begin{align*} \sum_{i=1}^n Y_iY_i^\top = \sum_{a=1}^n Z_aZ_a^\top. \end{align*} Since $Z_1Z_1^\top=n\bar Y\bar Y^\top$, subtracting this rank-one mean part gives \begin{align*} \sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top = \sum_{a=2}^n Z_aZ_a^\top. \end{align*} Define \begin{align*} W=\sum_{a=2}^n Z_aZ_a^\top. \end{align*} Because $Z_2,\dots,Z_n$ are independent $\mathcal N_p(0,I_p)$ random vectors, $W$ has the central Wishart distribution $W_p(I_p,n-1)$. It is independent of $Z_1$ because it is a measurable function of $Z_2,\dots,Z_n$, which are independent of $Z_1$. Finally, \begin{align*} S_Y=\frac{1}{n-1}W, \end{align*} and therefore \begin{align*} T^2 &= n\bar Y^\top S_Y^{-1}\bar Y \\ &= Z_1^\top \left(\frac{1}{n-1}W\right)^{-1}Z_1 \\ &= (n-1)Z_1^\top W^{-1}Z_1. \end{align*} This reduces the theorem to a distributional statement about one noncentral Gaussian vector independent of one central Wishart matrix.[/guided]

custom_env admin

[step:Reduce the inverse Wishart quadratic form to independent chi-square variables]Let $m=n-1$ and let $\nu=m-p+1=n-p$. We prove the following distributional fact in the present setting: if $Z:\Omega\to\mathbb{R}^p$ satisfies $Z\sim\mathcal N_p(\theta,I_p)$, if $W\sim W_p(I_p,m)$ is independent of $Z$, and if $m\ge p$, then \begin{align*} Z^\top W^{-1}Z \sim \frac{U}{V}, \end{align*} where $U\sim\chi^2_p(|\theta|^2)$, $V\sim\chi^2_{\nu}$, and $U,V$ are independent. Condition on the value $Z=z\in\mathbb{R}^p$. If $z=0$, then $z^\top W^{-1}z=0$. If $z\neq 0$, choose an orthogonal matrix $R_z\in\mathbb{R}^{p\times p}$ such that $R_zz=|z|e_1$, where $e_1=(1,0,\dots,0)^\top$. Since $W\sim W_p(I_p,m)$ is orthogonally invariant, $R_zWR_z^\top$ has the same distribution as $W$. Therefore the conditional distribution of $z^\top W^{-1}z$ equals the distribution of \begin{align*} |z|^2 e_1^\top W^{-1}e_1. \end{align*} Represent $W$ as \begin{align*} W=\sum_{r=1}^m G_rG_r^\top, \end{align*} where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent $\mathcal N_p(0,I_p)$ random vectors. Let $A\in\mathbb{R}^{m\times p}$ be the random matrix whose $r$-th row is $G_r^\top$, so $W=A^\top A$. Write $A=(a,B)$, where $a:\Omega\to\mathbb{R}^m$ is the first column and $B:\Omega\to\mathbb{R}^{m\times(p-1)}$ is the matrix of the remaining columns. Then $a$ is a standard Gaussian vector in $\mathbb{R}^m$, $B$ is independent of $a$, and almost surely $B$ has rank $p-1$. The Schur complement formula for the inverse of $W=A^\top A$ gives \begin{align*} e_1^\top W^{-1}e_1 = \frac{1}{a^\top (I_m-P_B)a}, \end{align*} where $I_m$ is the identity matrix on $\mathbb{R}^m$ and $P_B:\mathbb{R}^m\to\mathbb{R}^m$ is the [orthogonal projection](/theorems/437) onto the column space of $B$. Conditional on $B$, the matrix $I_m-P_B$ is the orthogonal projection onto a subspace of dimension \begin{align*} m-(p-1)=m-p+1=\nu. \end{align*} Since $a\sim\mathcal N_m(0,I_m)$ and $a$ is independent of $B$, the quadratic form \begin{align*} a^\top(I_m-P_B)a \end{align*} has the $\chi^2_{\nu}$ distribution, conditionally on $B$ and hence unconditionally. Denote this random variable by $V$. Its conditional distribution does not depend on $z$, and $W$ is independent of $Z$, so $V$ may be chosen independent of $Z$. Thus, conditionally on $Z=z$, \begin{align*} Z^\top W^{-1}Z \sim \frac{|z|^2}{V}, \end{align*} with $V\sim\chi^2_{\nu}$ independent of $Z$. Since $Z\sim\mathcal N_p(\theta,I_p)$, the squared norm \begin{align*} U=|Z|^2 \end{align*} has the noncentral chi-square distribution $\chi^2_p(|\theta|^2)$. Hence \begin{align*} Z^\top W^{-1}Z \sim \frac{U}{V}, \end{align*} with $U$ and $V$ independent.[/step]

custom_env admin

[guided]We now need the exact distribution of the quadratic form $Z^\top W^{-1}Z$, where $Z$ is Gaussian and $W$ is an independent central Wishart matrix. The key point is that, after conditioning on $Z=z$, the Wishart matrix is rotationally symmetric. This lets us rotate $z$ onto the first coordinate axis and compute only the upper-left entry of $W^{-1}$. Let $m=n-1$ and $\nu=m-p+1=n-p$. Consider the general situation where \begin{align*} Z\sim\mathcal N_p(\theta,I_p), \end{align*} where $\theta\in\mathbb{R}^p$, and where $W\sim W_p(I_p,m)$ is independent of $Z$. Since $m=n-1\ge p$, the Wishart matrix $W$ is invertible almost surely. Fix $z\in\mathbb{R}^p$ and condition on the event $Z=z$ in the regular conditional distribution sense. If $z=0$, then $z^\top W^{-1}z=0$, so there is nothing to rotate. If $z\neq 0$, choose an orthogonal matrix $R_z\in\mathbb{R}^{p\times p}$ satisfying \begin{align*} R_zz=|z|e_1, \end{align*} where $e_1=(1,0,\dots,0)^\top$. The central Wishart distribution with scale matrix $I_p$ is invariant under orthogonal conjugation: $R_zWR_z^\top$ has the same distribution as $W$. Therefore \begin{align*} z^\top W^{-1}z &= (R_zz)^\top (R_zWR_z^\top)^{-1}(R_zz) \\ &\sim |z|^2 e_1^\top W^{-1}e_1 \end{align*} conditionally on $Z=z$. It remains to compute $e_1^\top W^{-1}e_1$. Represent the Wishart matrix as \begin{align*} W=\sum_{r=1}^m G_rG_r^\top, \end{align*} where $G_1,\dots,G_m$ are independent $\mathcal N_p(0,I_p)$ random vectors. Equivalently, if $A\in\mathbb{R}^{m\times p}$ is the matrix whose $r$-th row is $G_r^\top$, then \begin{align*} W=A^\top A. \end{align*} Write the columns of $A$ as \begin{align*} A=(a,B), \end{align*} where $a:\Omega\to\mathbb{R}^m$ is the first column and $B:\Omega\to\mathbb{R}^{m\times(p-1)}$ contains the remaining columns. The entries of $A$ are independent standard normal random variables, so $a$ is a standard Gaussian vector in $\mathbb{R}^m$, $B$ is independent of $a$, and $B$ has rank $p-1$ almost surely because $m\ge p$. With respect to this block decomposition, \begin{align*} W=A^\top A = \begin{pmatrix} a^\top a & a^\top B \\ B^\top a & B^\top B \end{pmatrix}. \end{align*} The Schur complement formula says that the upper-left entry of $W^{-1}$ is the reciprocal of the Schur complement of $B^\top B$: \begin{align*} e_1^\top W^{-1}e_1 = \frac{1}{a^\top a-a^\top B(B^\top B)^{-1}B^\top a}. \end{align*} Let $P_B:\mathbb{R}^m\to\mathbb{R}^m$ be the orthogonal projection onto the column space of $B$. Since \begin{align*} P_B=B(B^\top B)^{-1}B^\top \end{align*} almost surely, the denominator becomes \begin{align*} a^\top(I_m-P_B)a. \end{align*} Conditional on $B$, the operator $I_m-P_B$ is the orthogonal projection onto the orthogonal complement of the column space of $B$. Since $B$ has rank $p-1$ almost surely, this orthogonal complement has dimension \begin{align*} m-(p-1)=m-p+1=\nu. \end{align*} Because $a\sim\mathcal N_m(0,I_m)$ and is independent of $B$, the squared Euclidean length of the projection of $a$ onto this $\nu$-dimensional subspace has the $\chi^2_\nu$ distribution. Thus \begin{align*} a^\top(I_m-P_B)a\sim\chi^2_\nu. \end{align*} Call this denominator $V$. The conditional distribution of $V$ does not depend on $z$, and the entire Wishart construction is independent of $Z$. Therefore $V$ is independent of $Z$. We have shown that, conditionally on $Z=z$, \begin{align*} Z^\top W^{-1}Z \sim \frac{|z|^2}{V}, \end{align*} where $V\sim\chi^2_\nu$ is independent of $Z$. Finally, since $Z\sim\mathcal N_p(\theta,I_p)$, its squared norm \begin{align*} U=|Z|^2 \end{align*} has the noncentral chi-square distribution $\chi^2_p(|\theta|^2)$. Hence \begin{align*} Z^\top W^{-1}Z \sim \frac{U}{V}, \end{align*} where $U\sim\chi^2_p(|\theta|^2)$, $V\sim\chi^2_\nu$, and $U,V$ are independent.[/guided]

custom_env admin

What brings you to Androma?

Start with a route through the knowledge graph.

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Sign in to Androma

Check your inbox

One last step

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Raw Attribution Data