Bartlett's Chi-Squared Approximation for Wilks' Lambda in One-Way MANOVA

Bartlett's Chi-Squared Approximation for Wilks' Lambda in One-Way MANOVA (Theorem # 4057)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the result directly from the Gaussian MANOVA decomposition. Under $H_0$, the between-group matrix $B$ has rank-$g-1$ Wishart form and $\operatorname{tr}(\Sigma^{-1}B)$ has the $\chi^2_{p(g-1)}$ distribution, while the pooled within-group matrix satisfies $W/N \xrightarrow{\mathbb P} \Sigma$. Since $B = O_{\mathbb P}(1)$ and $W = N\Sigma + o_{\mathbb P}(N)$, the logarithm of Wilks' lambda has the first-order expansion $-N\log\Lambda = \operatorname{tr}(\Sigma^{-1}B) + o_{\mathbb P}(1)$. The Bartlett factor differs from $N$ only by a fixed additive constant, so it has the same limiting distribution. [/proofplan] [step:Declare the MANOVA matrices and rewrite Wilks' lambda] For each group $i \in \{1,\dots,g\}$, define the sample mean $\bar X_i: (\Omega,\mathcal F) \to \mathbb R^p$ and the grand mean $\bar X: (\Omega,\mathcal F) \to \mathbb R^p$ by \begin{align*} \bar X_i &= \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, & \bar X &= \frac{1}{N}\sum_{i=1}^g\sum_{j=1}^{n_i} X_{ij}. \end{align*} Define the within-group and between-group sum-of-squares-and-cross-products matrices $W,B: \Omega \to \mathbb R^{p\times p}$ by \begin{align*} W &= \sum_{i=1}^g\sum_{j=1}^{n_i}(X_{ij}-\bar X_i)(X_{ij}-\bar X_i)^\top,\\ B &= \sum_{i=1}^g n_i(\bar X_i-\bar X)(\bar X_i-\bar X)^\top. \end{align*} Wilks' lambda is \begin{align*} \Lambda = \frac{\det W}{\det(W+B)}. \end{align*} For all sufficiently large $N$, $W$ is invertible with probability tending to $1$, since $N-g \to \infty$ and $\Sigma$ is positive definite. On this event, \begin{align*} -\log\Lambda &= \log\det(W+B)-\log\det W\\ &= \log\det\left(I_p+W^{-1/2}BW^{-1/2}\right), \end{align*} where $I_p \in \mathbb R^{p\times p}$ is the identity matrix and $W^{-1/2}$ denotes the symmetric inverse square root of the positive definite matrix $W$. [/step] [step:Identify the null distribution of the between-group matrix] Under $H_0$, let $\mu \in \mathbb R^p$ denote the common mean. Define the standardized group-mean matrix $Y: \Omega \to \mathbb R^{g\times p}$ by giving its $i$th row as \begin{align*} Y_i = \sqrt{n_i}(\bar X_i-\mu)^\top\Sigma^{-1/2} \in \mathbb R^{1\times p}. \end{align*} The rows $Y_1,\dots,Y_g$ are independent $\mathcal N(0,I_p)$ row vectors, because the observations are independent Gaussian vectors with covariance $\Sigma$. Define $a_N \in \mathbb R^g$ by $(a_N)_i = \sqrt{n_i/N}$. Then $|a_N|^2=1$, and the matrix \begin{align*} P_N = I_g-a_Na_N^\top \in \mathbb R^{g\times g} \end{align*} is the [orthogonal projection](/theorems/437) onto the $(g-1)$-dimensional subspace $a_N^\perp \subset \mathbb R^g$. A direct substitution of the definitions gives \begin{align*} \Sigma^{-1/2}B\Sigma^{-1/2} = Y^\top P_NY. \end{align*} Choose an orthogonal matrix $Q_N \in \mathbb R^{g\times g}$ whose first $g-1$ rows form an orthonormal basis of $a_N^\perp$ and whose last row is $a_N^\top$. Orthogonal invariance of the standard Gaussian law gives that $Z=Q_NY: \Omega \to \mathbb R^{g\times p}$ again has independent $\mathcal N(0,I_p)$ rows. Since $Q_NP_NQ_N^\top=\operatorname{diag}(1,\dots,1,0)$, \begin{align*} \Sigma^{-1/2}B\Sigma^{-1/2} = Z^\top \operatorname{diag}(1,\dots,1,0)Z = \sum_{r=1}^{g-1} Z_r^\top Z_r, \end{align*} where $Z_r \in \mathbb R^{1\times p}$ is the $r$th row of $Z$. Therefore \begin{align*} \operatorname{tr}(\Sigma^{-1}B) = \operatorname{tr}(\Sigma^{-1/2}B\Sigma^{-1/2}) = \sum_{r=1}^{g-1}\sum_{k=1}^p Z_{rk}^2 \sim \chi^2_{p(g-1)}. \end{align*} [guided] The aim of this step is to turn the between-group variation into a sum of squares of independent standard normal variables. Under $H_0$, all group means have the same expectation $\mu \in \mathbb R^p$, and the Gaussian assumption gives \begin{align*} \bar X_i \sim \mathcal N\left(\mu,\frac{1}{n_i}\Sigma\right). \end{align*} Thus the standardized row vector \begin{align*} Y_i = \sqrt{n_i}(\bar X_i-\mu)^\top\Sigma^{-1/2} \end{align*} is a standard Gaussian row vector in $\mathbb R^p$, and the rows are independent because the groups are independent. The grand mean removes one linear degree of freedom among the $g$ group means. This is encoded by the unit vector $a_N \in \mathbb R^g$ with entries $(a_N)_i = \sqrt{n_i/N}$. The orthogonal projection \begin{align*} P_N = I_g-a_Na_N^\top \end{align*} projects onto the subspace perpendicular to $a_N$, which has dimension $g-1$. Expanding $Y^\top P_NY$ gives \begin{align*} Y^\top P_NY &= \sum_{i=1}^g n_i\Sigma^{-1/2}(\bar X_i-\mu)(\bar X_i-\mu)^\top\Sigma^{-1/2}\\ &\quad - N\Sigma^{-1/2}(\bar X-\mu)(\bar X-\mu)^\top\Sigma^{-1/2}\\ &= \Sigma^{-1/2}\left[\sum_{i=1}^g n_i(\bar X_i-\bar X)(\bar X_i-\bar X)^\top\right]\Sigma^{-1/2}\\ &= \Sigma^{-1/2}B\Sigma^{-1/2}. \end{align*} This verifies the matrix identity rather than assuming the Wishart form. Now choose an orthogonal matrix $Q_N$ whose first $g-1$ rows span $a_N^\perp$ and whose last row is $a_N^\top$. Orthogonal multiplication preserves the joint standard Gaussian distribution of the rows, so $Z=Q_NY$ has independent standard Gaussian rows. Since $Q_NP_NQ_N^\top=\operatorname{diag}(1,\dots,1,0)$, we obtain \begin{align*} \Sigma^{-1/2}B\Sigma^{-1/2} = \sum_{r=1}^{g-1}Z_r^\top Z_r. \end{align*} Taking the trace converts this matrix identity into a scalar sum of squares: \begin{align*} \operatorname{tr}(\Sigma^{-1}B) = \sum_{r=1}^{g-1}\sum_{k=1}^p Z_{rk}^2. \end{align*} There are exactly $p(g-1)$ independent standard normal squares, so the distribution is $\chi^2_{p(g-1)}$. [/guided] [/step] [step:Show that the within-group covariance consistently estimates $\Sigma$] Define the pooled covariance matrix $S_N: \Omega \to \mathbb R^{p\times p}$ by \begin{align*} S_N = \frac{1}{N}W. \end{align*} For each fixed group $i$, the centered residuals $X_{ij}-\bar X_i$ have the usual Gaussian within-group sum-of-squares decomposition. Since $p$ and $g$ are fixed and $N-g \to \infty$, the law of large numbers for the independent centered Gaussian quadratic terms gives \begin{align*} S_N \xrightarrow{\mathbb P} \Sigma. \end{align*} Because $\Sigma$ is positive definite, matrix inversion is continuous at $\Sigma$, hence \begin{align*} S_N^{-1} \xrightarrow{\mathbb P} \Sigma^{-1}. \end{align*} The same Gaussian decomposition also gives $B=O_{\mathbb P}(1)$, since the preceding step identifies $\operatorname{tr}(\Sigma^{-1}B)$ as a fixed chi-squared random variable. [/step] [step:Expand the log determinant to first order] Define $A_N: \Omega \to \mathbb R^{p\times p}$ by \begin{align*} A_N = W^{-1/2}BW^{-1/2}. \end{align*} Since $W=NS_N$, we have \begin{align*} A_N = \frac{1}{N}S_N^{-1/2}BS_N^{-1/2}. \end{align*} The convergence $S_N^{-1}\xrightarrow{\mathbb P}\Sigma^{-1}$ and the bound $B=O_{\mathbb P}(1)$ imply $A_N=O_{\mathbb P}(N^{-1})$. For symmetric matrices $A$ with $\|A\|_{\mathrm{op}}\leq 1/2$, the Taylor expansion of $\log(1+t)$ applied to the eigenvalues of $A$ gives \begin{align*} \left|\log\det(I_p+A)-\operatorname{tr}A\right| \leq \operatorname{tr}(A^2). \end{align*} Applying this estimate to $A_N$ on an event whose probability tends to $1$ yields \begin{align*} -N\log\Lambda &= N\log\det(I_p+A_N)\\ &= N\operatorname{tr}(A_N)+o_{\mathbb P}(1)\\ &= \operatorname{tr}(S_N^{-1}B)+o_{\mathbb P}(1). \end{align*} Finally, \begin{align*} \operatorname{tr}(S_N^{-1}B)-\operatorname{tr}(\Sigma^{-1}B) = \operatorname{tr}\left((S_N^{-1}-\Sigma^{-1})B\right) \xrightarrow{\mathbb P}0, \end{align*} because $S_N^{-1}-\Sigma^{-1}\xrightarrow{\mathbb P}0$ and $B=O_{\mathbb P}(1)$. Therefore \begin{align*} -N\log\Lambda = \operatorname{tr}(\Sigma^{-1}B)+o_{\mathbb P}(1). \end{align*} [guided] The determinant ratio is converted into a trace because the matrix perturbation is small. Since $W=NS_N$ and $S_N\xrightarrow{\mathbb P}\Sigma$, the matrix $W$ is of order $N$, whereas $B$ is of order one under $H_0$. Thus \begin{align*} A_N = W^{-1/2}BW^{-1/2} = \frac{1}{N}S_N^{-1/2}BS_N^{-1/2} \end{align*} has operator norm of order $N^{-1}$ in probability. The scalar expansion behind the argument is \begin{align*} \log(1+t)=t+O(t^2) \quad \text{as } t\to 0. \end{align*} For a symmetric matrix, applying this expansion to the eigenvalues gives \begin{align*} \left|\log\det(I_p+A)-\operatorname{tr}A\right| \leq \operatorname{tr}(A^2) \end{align*} whenever $\|A\|_{\mathrm{op}}\leq 1/2$. This condition holds for $A=A_N$ with probability tending to $1$. Therefore \begin{align*} -N\log\Lambda &= N\log\det(I_p+A_N)\\ &= N\operatorname{tr}(A_N)+o_{\mathbb P}(1). \end{align*} Using cyclic invariance of trace and $W=NS_N$ gives \begin{align*} N\operatorname{tr}(A_N) = N\operatorname{tr}(W^{-1}B) = \operatorname{tr}(S_N^{-1}B). \end{align*} The pooled covariance matrix satisfies $S_N^{-1}\xrightarrow{\mathbb P}\Sigma^{-1}$, and the between-group matrix is bounded in probability. Hence \begin{align*} \operatorname{tr}(S_N^{-1}B) = \operatorname{tr}(\Sigma^{-1}B)+o_{\mathbb P}(1). \end{align*} Combining these estimates gives \begin{align*} -N\log\Lambda = \operatorname{tr}(\Sigma^{-1}B)+o_{\mathbb P}(1). \end{align*} [/guided] [/step] [step:Replace $N$ by the Bartlett factor and conclude] Define the Bartlett factor $c_N \in \mathbb R$ by \begin{align*} c_N = N-1-\frac{p+g}{2}. \end{align*} Since $p$ and $g$ are fixed, \begin{align*} \frac{c_N}{N} \to 1. \end{align*} From the previous step, $-N\log\Lambda = O_{\mathbb P}(1)$, so \begin{align*} -c_N\log\Lambda = \frac{c_N}{N}(-N\log\Lambda) = \operatorname{tr}(\Sigma^{-1}B)+o_{\mathbb P}(1). \end{align*} The second step showed that $\operatorname{tr}(\Sigma^{-1}B)\sim \chi^2_{p(g-1)}$. Hence \begin{align*} -\left(N-1-\frac{p+g}{2}\right)\log\Lambda \xrightarrow{d} \chi^2_{p(g-1)}. \end{align*} This is the claimed large-residual-degrees-of-freedom chi-squared approximation. [/step]

Prerequisites (0/4 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Orthogonal Projection

Definitions & Concepts

Explore Further

Determinant Definition Distribution Definition Expectation Definition Orthogonal Projection Theorem #437 Reduction of Multivariate Linear Hypotheses to MANOVA probability Wishart Distribution of the Residual Sum of Squares in the Multivariate Linear Model probability Durbin-Levinson Recursion probability Rank Correlations for the Bivariate Normal Distribution probability Nonnegative Definiteness of the Spectral Density Matrix probability Independence of the Sample Mean and Sample Covariance for Multivariate Normal Samples probability Recursive Forecast Formula for Causal ARMA Processes probability Asymptotic Normality of the Maximum Likelihood Estimators in the Multivariate Normal Model probability

What brings you to Androma?

Start with a route through the knowledge graph.