Independence of the Sample Mean and Sample Covariance for Multivariate Normal Samples

Independence of the Sample Mean and Sample Covariance for Multivariate Normal Samples (Theorem # 4024)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We rotate the observation index by an orthogonal matrix whose first row is the normalized all-ones vector. This separates the data into one Gaussian coordinate equal to $\sqrt{n}(\bar{X}-\mu)$ and $n-1$ Gaussian contrast coordinates. Orthogonality preserves the joint Gaussian structure, and a characteristic-function computation shows that these transformed coordinates are independent with covariance $\Sigma$. Finally, the centered residual sum of squares is exactly the sum of the outer products of the contrast coordinates, which gives both independence from $\bar{X}$ and the Wishart law. [/proofplan] [step:Choose an orthogonal basis separating the mean direction] Let $e \in \mathbb{R}^n$ be the vector \begin{align*} e &= \frac{1}{\sqrt{n}}(1,\dots,1). \end{align*} Extend $e$ to an orthonormal basis $h_1,\dots,h_n$ of $\mathbb{R}^n$, with $h_1=e$. Let $H \in \mathbb{R}^{n \times n}$ be the orthogonal matrix whose $k$-th row is $h_k^\top$. Define centered observations $Y_j: \Omega \to \mathbb{R}^p$ by \begin{align*} Y_j &= X_j-\mu, \end{align*} and define transformed coordinates $Z_k: \Omega \to \mathbb{R}^p$ by \begin{align*} Z_k &= \sum_{j=1}^{n} h_{kj}Y_j, \qquad 1 \leq k \leq n. \end{align*} Because $h_{1j}=1/\sqrt{n}$ for every $j$, \begin{align*} Z_1 &= \frac{1}{\sqrt{n}}\sum_{j=1}^{n}(X_j-\mu) \\ &= \sqrt{n}(\bar{X}-\mu). \end{align*} [guided] The observation index is the index $j \in \{1,\dots,n\}$, not the coordinate index in $\mathbb{R}^p$. We choose an orthogonal change of basis in this $n$-dimensional observation space so that one new coordinate measures the average and the remaining coordinates measure deviations from the average. Let \begin{align*} e &= \frac{1}{\sqrt{n}}(1,\dots,1) \in \mathbb{R}^n. \end{align*} Since $|e|=1$, we may extend $e$ to an orthonormal basis $h_1,\dots,h_n$ of $\mathbb{R}^n$ with $h_1=e$. Let $H \in \mathbb{R}^{n \times n}$ be the orthogonal matrix whose $k$-th row is $h_k^\top$. Thus \begin{align*} \sum_{j=1}^{n} h_{kj}h_{\ell j}=\delta_{k\ell}, \qquad \sum_{k=1}^{n} h_{ki}h_{kj}=\delta_{ij}. \end{align*} Define $Y_j: \Omega \to \mathbb{R}^p$ by $Y_j=X_j-\mu$. Each $Y_j$ has law $\mathcal{N}_p(0,\Sigma)$, and the random vectors $Y_1,\dots,Y_n$ are independent because subtracting a deterministic vector preserves independence. For $1 \leq k \leq n$, define \begin{align*} Z_k &= \sum_{j=1}^{n} h_{kj}Y_j. \end{align*} The first transformed coordinate is the mean direction: \begin{align*} Z_1 &= \sum_{j=1}^{n} \frac{1}{\sqrt{n}}(X_j-\mu) \\ &= \sqrt{n}\left(\frac{1}{n}\sum_{j=1}^{n}X_j-\mu\right) \\ &= \sqrt{n}(\bar{X}-\mu). \end{align*} Thus proving independence of $Z_1$ from $Z_2,\dots,Z_n$ will prove independence of $\bar{X}$ from any measurable function of $Z_2,\dots,Z_n$. [/guided] [/step] [step:Show the transformed coordinates are independent centered normal vectors] We compute the joint characteristic function of $(Z_1,\dots,Z_n)$. Let $a_1,\dots,a_n \in \mathbb{R}^p$, and define $b_j \in \mathbb{R}^p$ by \begin{align*} b_j &= \sum_{k=1}^{n} h_{kj}a_k. \end{align*} Using independence of $Y_1,\dots,Y_n$ and the characteristic function of $\mathcal{N}_p(0,\Sigma)$, \begin{align*} \mathbb{E}\left[\exp\left(i\sum_{k=1}^{n} a_k^\top Z_k\right)\right] &= \mathbb{E}\left[\exp\left(i\sum_{j=1}^{n} b_j^\top Y_j\right)\right] \\ &= \prod_{j=1}^{n}\exp\left(-\frac{1}{2}b_j^\top\Sigma b_j\right) \\ &= \exp\left(-\frac{1}{2}\sum_{j=1}^{n} b_j^\top\Sigma b_j\right). \end{align*} By orthogonality of $H$, \begin{align*} \sum_{j=1}^{n} b_j^\top\Sigma b_j &= \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{\ell=1}^{n} h_{kj}h_{\ell j}a_k^\top\Sigma a_\ell \\ &= \sum_{k=1}^{n}\sum_{\ell=1}^{n}\delta_{k\ell}a_k^\top\Sigma a_\ell \\ &= \sum_{k=1}^{n}a_k^\top\Sigma a_k. \end{align*} Therefore \begin{align*} \mathbb{E}\left[\exp\left(i\sum_{k=1}^{n} a_k^\top Z_k\right)\right] &= \prod_{k=1}^{n}\exp\left(-\frac{1}{2}a_k^\top\Sigma a_k\right). \end{align*} This factorization shows that $Z_1,\dots,Z_n$ are independent, and each $Z_k$ has law $\mathcal{N}_p(0,\Sigma)$. [guided] We now verify independence directly rather than merely asserting that an orthogonal transform preserves normality. Let $a_1,\dots,a_n \in \mathbb{R}^p$ be arbitrary test vectors. Define $b_j \in \mathbb{R}^p$ by \begin{align*} b_j &= \sum_{k=1}^{n}h_{kj}a_k. \end{align*} Then \begin{align*} \sum_{k=1}^{n}a_k^\top Z_k &= \sum_{k=1}^{n}a_k^\top\left(\sum_{j=1}^{n}h_{kj}Y_j\right) \\ &= \sum_{j=1}^{n}\left(\sum_{k=1}^{n}h_{kj}a_k\right)^\top Y_j \\ &= \sum_{j=1}^{n}b_j^\top Y_j. \end{align*} Since $Y_1,\dots,Y_n$ are independent and each $Y_j$ has law $\mathcal{N}_p(0,\Sigma)$, their characteristic functions multiply: \begin{align*} \mathbb{E}\left[\exp\left(i\sum_{k=1}^{n}a_k^\top Z_k\right)\right] &= \prod_{j=1}^{n}\mathbb{E}\left[\exp(i b_j^\top Y_j)\right] \\ &= \prod_{j=1}^{n}\exp\left(-\frac{1}{2}b_j^\top\Sigma b_j\right) \\ &= \exp\left(-\frac{1}{2}\sum_{j=1}^{n}b_j^\top\Sigma b_j\right). \end{align*} It remains to simplify the quadratic form. Expanding $b_j$ gives \begin{align*} \sum_{j=1}^{n}b_j^\top\Sigma b_j &= \sum_{j=1}^{n} \left(\sum_{k=1}^{n}h_{kj}a_k\right)^\top \Sigma \left(\sum_{\ell=1}^{n}h_{\ell j}a_\ell\right) \\ &= \sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{\ell=1}^{n} h_{kj}h_{\ell j}a_k^\top\Sigma a_\ell. \end{align*} Because the rows of $H$ are orthonormal, \begin{align*} \sum_{j=1}^{n}h_{kj}h_{\ell j}=\delta_{k\ell}. \end{align*} Hence \begin{align*} \sum_{j=1}^{n}b_j^\top\Sigma b_j &= \sum_{k=1}^{n}\sum_{\ell=1}^{n}\delta_{k\ell}a_k^\top\Sigma a_\ell \\ &= \sum_{k=1}^{n}a_k^\top\Sigma a_k. \end{align*} Therefore \begin{align*} \mathbb{E}\left[\exp\left(i\sum_{k=1}^{n}a_k^\top Z_k\right)\right] &= \prod_{k=1}^{n}\exp\left(-\frac{1}{2}a_k^\top\Sigma a_k\right). \end{align*} This is exactly the product of the characteristic functions of $n$ independent $\mathcal{N}_p(0,\Sigma)$ random vectors. Thus $Z_1,\dots,Z_n$ are independent and each $Z_k$ has law $\mathcal{N}_p(0,\Sigma)$. [/guided] [/step] [step:Rewrite the residual sum of squares using only the contrast coordinates] Let $R_j: \Omega \to \mathbb{R}^p$ be the centered residual \begin{align*} R_j &= X_j-\bar{X}. \end{align*} Since $Y_j=X_j-\mu$, we have \begin{align*} R_j &= Y_j-\frac{1}{n}\sum_{m=1}^{n}Y_m. \end{align*} Using the inverse orthogonal expansion $Y_j=\sum_{k=1}^{n}h_{kj}Z_k$ and $h_{1j}=1/\sqrt{n}$, \begin{align*} R_j &= \sum_{k=1}^{n}h_{kj}Z_k-\frac{1}{n}\sum_{m=1}^{n}\sum_{k=1}^{n}h_{km}Z_k \\ &= \sum_{k=1}^{n}h_{kj}Z_k-\frac{1}{n}\sum_{k=1}^{n}\left(\sum_{m=1}^{n}h_{km}\right)Z_k. \end{align*} Now $\sum_{m=1}^{n}h_{1m}=\sqrt{n}$, while $\sum_{m=1}^{n}h_{km}=0$ for $k \geq 2$ because $h_k$ is orthogonal to $h_1=e$. Hence \begin{align*} R_j &= \sum_{k=2}^{n}h_{kj}Z_k. \end{align*} Therefore, using orthonormality of the rows of $H$, \begin{align*} \sum_{j=1}^{n}R_jR_j^\top &= \sum_{j=1}^{n} \left(\sum_{k=2}^{n}h_{kj}Z_k\right) \left(\sum_{\ell=2}^{n}h_{\ell j}Z_\ell\right)^\top \\ &= \sum_{k=2}^{n}\sum_{\ell=2}^{n} \left(\sum_{j=1}^{n}h_{kj}h_{\ell j}\right)Z_kZ_\ell^\top \\ &= \sum_{k=2}^{n}Z_kZ_k^\top. \end{align*} [guided] The sample covariance only depends on residuals from the sample mean, so we need to express those residuals in the new orthogonal coordinates. Define \begin{align*} R_j &= X_j-\bar{X}. \end{align*} Because $Y_j=X_j-\mu$, \begin{align*} R_j &= X_j-\frac{1}{n}\sum_{m=1}^{n}X_m \\ &= Y_j-\frac{1}{n}\sum_{m=1}^{n}Y_m. \end{align*} The transformation from $(Y_1,\dots,Y_n)$ to $(Z_1,\dots,Z_n)$ is orthogonal in the observation index, so its inverse is the transpose transformation: \begin{align*} Y_j &= \sum_{k=1}^{n}h_{kj}Z_k. \end{align*} Substituting this into the residual gives \begin{align*} R_j &= \sum_{k=1}^{n}h_{kj}Z_k -\frac{1}{n}\sum_{m=1}^{n}\sum_{k=1}^{n}h_{km}Z_k \\ &= \sum_{k=1}^{n}h_{kj}Z_k -\frac{1}{n}\sum_{k=1}^{n}\left(\sum_{m=1}^{n}h_{km}\right)Z_k. \end{align*} The row $h_1$ equals $(1/\sqrt{n},\dots,1/\sqrt{n})$, so \begin{align*} \sum_{m=1}^{n}h_{1m}=\sqrt{n}. \end{align*} For $k \geq 2$, the row $h_k$ is orthogonal to $h_1$, which means \begin{align*} 0=h_k\cdot h_1=\frac{1}{\sqrt{n}}\sum_{m=1}^{n}h_{km}, \end{align*} and therefore $\sum_{m=1}^{n}h_{km}=0$. Thus the mean component cancels and only the contrast coordinates remain: \begin{align*} R_j &= \sum_{k=2}^{n}h_{kj}Z_k. \end{align*} Now compute the residual sum of squares: \begin{align*} \sum_{j=1}^{n}R_jR_j^\top &= \sum_{j=1}^{n} \left(\sum_{k=2}^{n}h_{kj}Z_k\right) \left(\sum_{\ell=2}^{n}h_{\ell j}Z_\ell\right)^\top \\ &= \sum_{k=2}^{n}\sum_{\ell=2}^{n} \left(\sum_{j=1}^{n}h_{kj}h_{\ell j}\right)Z_kZ_\ell^\top. \end{align*} Orthonormality of the rows of $H$ gives \begin{align*} \sum_{j=1}^{n}h_{kj}h_{\ell j}=\delta_{k\ell}. \end{align*} Hence all cross terms vanish and \begin{align*} \sum_{j=1}^{n}R_jR_j^\top &= \sum_{k=2}^{n}Z_kZ_k^\top. \end{align*} This identity is the algebraic reason the covariance matrix has $n-1$ degrees of freedom: the mean direction has been removed, leaving $n-1$ independent contrast directions. [/guided] [/step] [step:Conclude independence and identify the Wishart distribution] By the previous step, \begin{align*} (n-1)S &= \sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})^\top \\ &= \sum_{k=2}^{n}Z_kZ_k^\top. \end{align*} The random vector $\bar{X}$ is a measurable function of $Z_1$, since \begin{align*} \bar{X} &= \mu+\frac{1}{\sqrt{n}}Z_1. \end{align*} The matrix $(n-1)S$ is a measurable function of $(Z_2,\dots,Z_n)$. Since $Z_1$ is independent of $(Z_2,\dots,Z_n)$, it follows that $\bar{X}$ and $S$ are independent. Finally, $Z_2,\dots,Z_n$ are $n-1$ independent random vectors with common law $\mathcal{N}_p(0,\Sigma)$. By the defining characterization of the Wishart distribution, \begin{align*} \sum_{k=2}^{n}Z_kZ_k^\top \sim W_p(\Sigma,n-1). \end{align*} Therefore \begin{align*} (n-1)S \sim W_p(\Sigma,n-1), \end{align*} which completes the proof. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

test

Definitions & Concepts

Explore Further

Distribution Definition Orthogonality Definition test Theorem #89 Conditional Expectation as the Mean Square Forecast probability Wiener–Khinchin Theorem probability Wishart Mean and Entrywise Covariance Formula probability Tail Dependence Coefficients of the Bivariate Student $t$ Copula probability Maximum Likelihood Estimator of the Coefficient Matrix in the Multivariate Linear Model probability Orthogonal Equivariance of Covariance Principal Components probability Population Canonical Correlations via Singular Values probability Monotonicity and Triangle Inequality for Expectation probability

What brings you to Androma?

Start with a route through the knowledge graph.