Wishart Distribution of the Sample Covariance Matrix

Wishart Distribution of the Sample Covariance Matrix (Theorem # 4015)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We write the centered sum of squares as a matrix quadratic form using the [orthogonal projection](/theorems/437) onto the subspace of $\mathbb{R}^n$ perpendicular to the all-ones vector. An orthogonal change of coordinates diagonalizes this projection and transforms the centered data into $n-1$ independent Gaussian contrast vectors and one sample-mean direction. The contrast vectors have distribution $\mathcal{N}_p(0,\Sigma)$, and the centered cross-product matrix is exactly the sum of their outer products. This is the defining representation of the Wishart distribution with $n-1$ degrees of freedom and scale matrix $\Sigma$. [/proofplan] [step:Represent the centered cross-product matrix by the centering projection] Let $Y: \Omega \to \mathbb{R}^{n \times p}$ be the random matrix whose $i$-th row is $Y_i^\top$. Let $\mathbf{1}_n \in \mathbb{R}^n$ denote the column vector whose entries are all $1$, and define the centering projection $P \in \mathbb{R}^{n \times n}$ by \begin{align*} P &= I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top . \end{align*} For each $\omega \in \Omega$, the $i$-th row of $PY(\omega)$ is \begin{align*} Y_i(\omega)^\top - \bar{Y}(\omega)^\top . \end{align*} Therefore \begin{align*} Y^\top P Y &= (PY)^\top(PY) \\ &= \sum_{i=1}^n (Y_i-\bar{Y})(Y_i-\bar{Y})^\top \\ &= (n-1)S. \end{align*} Here we used $P^\top=P$ and $P^2=P$, which follow directly from the definition of $P$ and the identity $\mathbf{1}_n^\top\mathbf{1}_n=n$. [/step] [step:Diagonalize the centering projection by an orthogonal basis of contrasts] Choose vectors $q_1,\dots,q_{n-1} \in \mathbb{R}^n$ forming an orthonormal basis of the subspace \begin{align*} \mathbf{1}_n^\perp &= \{x \in \mathbb{R}^n : x^\top \mathbf{1}_n = 0\}, \end{align*} and set \begin{align*} q_n &= \frac{1}{\sqrt{n}}\mathbf{1}_n. \end{align*} Let $Q \in \mathbb{R}^{n \times n}$ be the orthogonal matrix whose $a$-th row is $q_a^\top$. Then $Q Q^\top=I_n$, and the projection $P$ has the decomposition \begin{align*} P = Q^\top D Q, \qquad D = \begin{pmatrix} I_{n-1} & 0 \\ 0 & 0 \end{pmatrix}. \end{align*} Indeed, $Pq_a=q_a$ for $a \in \{1,\dots,n-1\}$ because $q_a^\top\mathbf{1}_n=0$, while $Pq_n=0$ because $q_n$ spans $\operatorname{span}\{\mathbf{1}_n\}$. [guided] The matrix $P$ subtracts row averages, so it should preserve exactly the directions in $\mathbb{R}^n$ whose coordinates sum to zero. We make this precise by defining \begin{align*} \mathbf{1}_n^\perp &= \{x \in \mathbb{R}^n : x^\top \mathbf{1}_n = 0\}. \end{align*} This subspace has dimension $n-1$. Choose an orthonormal basis $q_1,\dots,q_{n-1}$ for it, and complete it by setting \begin{align*} q_n &= \frac{1}{\sqrt{n}}\mathbf{1}_n. \end{align*} The vectors $q_1,\dots,q_n$ form an orthonormal basis of $\mathbb{R}^n$, because $q_n$ has Euclidean norm $1$ and is orthogonal to every vector in $\mathbf{1}_n^\perp$. Let $Q \in \mathbb{R}^{n \times n}$ be the orthogonal matrix whose $a$-th row is $q_a^\top$. The projection $P$ acts as the identity on $\mathbf{1}_n^\perp$: if $x \in \mathbf{1}_n^\perp$, then \begin{align*} Px &= x - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top x \\ &= x. \end{align*} It annihilates the mean direction: \begin{align*} Pq_n &= \frac{1}{\sqrt{n}}\mathbf{1}_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top\frac{1}{\sqrt{n}}\mathbf{1}_n \\ &= \frac{1}{\sqrt{n}}\mathbf{1}_n - \frac{1}{n\sqrt{n}}\mathbf{1}_n(\mathbf{1}_n^\top\mathbf{1}_n) \\ &= \frac{1}{\sqrt{n}}\mathbf{1}_n - \frac{1}{\sqrt{n}}\mathbf{1}_n \\ &=0. \end{align*} Thus, in the orthonormal basis $q_1,\dots,q_n$, the matrix of $P$ is \begin{align*} D = \begin{pmatrix} I_{n-1} & 0 \\ 0 & 0 \end{pmatrix}, \end{align*} which means \begin{align*} P = Q^\top DQ. \end{align*} [/guided] [/step] [step:Transform the observations into independent Gaussian contrasts] Define the transformed data matrix $Z: \Omega \to \mathbb{R}^{n \times p}$ by \begin{align*} Z &= QY. \end{align*} Let $Z_a: \Omega \to \mathbb{R}^p$ denote the column vector whose transpose is the $a$-th row of $Z$. Thus \begin{align*} Z_a &= \sum_{i=1}^n q_{a i}Y_i, \end{align*} where $q_{a i}$ is the $i$-th component of $q_a$. For $a \in \{1,\dots,n-1\}$, since $\sum_{i=1}^n q_{a i}=q_a^\top\mathbf{1}_n=0$, the mean of $Z_a$ is \begin{align*} \mathbb{E}[Z_a] &= \sum_{i=1}^n q_{a i}\mu \\ &= 0. \end{align*} Its covariance matrix is \begin{align*} \operatorname{Cov}(Z_a) &= \sum_{i=1}^n q_{a i}^2\Sigma \\ &= \Sigma, \end{align*} because $|q_a|^2=1$. If $a,b \in \{1,\dots,n-1\}$ with $a \neq b$, then \begin{align*} \operatorname{Cov}(Z_a,Z_b) &= \sum_{i=1}^n q_{a i}q_{b i}\Sigma \\ &= (q_a^\top q_b)\Sigma \\ &=0. \end{align*} Since $(Z_1,\dots,Z_n)$ is obtained from the jointly Gaussian vector $(Y_1,\dots,Y_n)$ by a linear transformation, it is jointly Gaussian. Therefore the zero cross-covariances imply that $Z_1,\dots,Z_{n-1}$ are independent, and we have \begin{align*} Z_1,\dots,Z_{n-1} \overset{\mathrm{ind}}{\sim} \mathcal{N}_p(0,\Sigma). \end{align*} [guided] The purpose of multiplying by $Q$ is to replace the original observations by orthogonal linear combinations: $n-1$ contrast directions and one mean direction. Define \begin{align*} Z &= QY, \end{align*} and let $Z_a: \Omega \to \mathbb{R}^p$ be the column vector whose transpose is row $a$ of $Z$. Because row $a$ of $Q$ is $q_a^\top$, we have \begin{align*} Z_a = \sum_{i=1}^n q_{a i}Y_i. \end{align*} For $a \leq n-1$, the vector $q_a$ lies in $\mathbf{1}_n^\perp$, so its coordinates sum to zero. Therefore the common mean $\mu$ cancels: \begin{align*} \mathbb{E}[Z_a] &= \mathbb{E}\left[\sum_{i=1}^n q_{a i}Y_i\right] \\ &= \sum_{i=1}^n q_{a i}\mathbb{E}[Y_i] \\ &= \sum_{i=1}^n q_{a i}\mu \\ &= \left(\sum_{i=1}^n q_{a i}\right)\mu \\ &=0. \end{align*} The covariance calculation uses independence of the original observations. Since $\operatorname{Cov}(Y_i,Y_j)=0$ for $i \neq j$ and $\operatorname{Cov}(Y_i)=\Sigma$, we get \begin{align*} \operatorname{Cov}(Z_a) &= \operatorname{Cov}\left(\sum_{i=1}^n q_{a i}Y_i\right) \\ &= \sum_{i=1}^n q_{a i}^2\operatorname{Cov}(Y_i) \\ &= \left(\sum_{i=1}^n q_{a i}^2\right)\Sigma \\ &= \Sigma. \end{align*} For two distinct contrast rows $a,b \leq n-1$, the same independence calculation gives \begin{align*} \operatorname{Cov}(Z_a,Z_b) &= \sum_{i=1}^n q_{a i}q_{b i}\operatorname{Cov}(Y_i) \\ &= \left(\sum_{i=1}^n q_{a i}q_{b i}\right)\Sigma \\ &= (q_a^\top q_b)\Sigma \\ &=0. \end{align*} The vector $(Z_1,\dots,Z_n)$ is a linear transformation of the jointly Gaussian vector $(Y_1,\dots,Y_n)$, hence is jointly Gaussian. For jointly Gaussian random vectors, vanishing cross-covariance is equivalent to independence. Thus \begin{align*} Z_1,\dots,Z_{n-1} \overset{\mathrm{ind}}{\sim} \mathcal{N}_p(0,\Sigma). \end{align*} [/guided] [/step] [step:Identify the centered cross-product matrix as a Wishart sum] Using the diagonalization $P=Q^\top DQ$ and the definition $Z=QY$, we obtain \begin{align*} (n-1)S &= Y^\top P Y \\ &= Y^\top Q^\top DQY \\ &= Z^\top D Z \\ &= \sum_{a=1}^{n-1} Z_a Z_a^\top. \end{align*} By the defining representation of the Wishart distribution, the sum of $m$ independent outer products $X_aX_a^\top$, where \begin{align*} X_1,\dots,X_m \overset{\mathrm{ind}}{\sim} \mathcal{N}_p(0,\Sigma), \end{align*} has distribution $W_p(m,\Sigma)$. Applying this definition with $m=n-1$ and $X_a=Z_a$ gives \begin{align*} (n-1)S \sim W_p(n-1,\Sigma). \end{align*} This is the desired conclusion. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Rank Bound for the Sample Covariance Matrix probability Rotational Diagonalization of a Maximum Likelihood Factor Loading Representative probability Independence of the Sample Mean and Sample Covariance for a Multivariate Normal Sample probability Rank Correlations for the Bivariate Normal Distribution probability Birkhoff Ergodic Theorem for Sample Means probability One-Sample Gaussian Mean Test with Known Covariance probability Asymptotic Normality of ARMA Maximum Likelihood Estimators probability Conditional Moments in the Gaussian Factor Model probability

What brings you to Androma?

Start with a route through the knowledge graph.