Wilks' Lambda Statistic for One-Way MANOVA — Statement & Proof

Wilks' Lambda Statistic for One-Way MANOVA (Theorem # 4056)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We write the multivariate normal likelihood and maximize it first over the mean parameters for a fixed covariance matrix. Under the unrestricted model, the minimizing fitted means are the group sample means and the residual matrix is $W$; under $H_0$, the minimizing common mean is the grand mean and the residual matrix is $T$. We then maximize over the positive definite covariance matrix and compute the two maximized likelihoods explicitly. Their ratio is $\left(\det W/\det T\right)^{N/2}$, so the monotone statistic determining the same rejection regions is Wilks' lambda $\Lambda = \det W/\det T$. [/proofplan] [step:Write the likelihood in terms of a residual sums-of-products matrix] For parameter values $m_1,\dots,m_g \in \mathbb{R}^p$ and a symmetric positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$, define the residual sums-of-products matrix \begin{align*} S(m_1,\dots,m_g) := \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)(x_{ij}-m_i)^\top. \end{align*} The unrestricted likelihood function \begin{align*} L_{\mathrm{un}}: (\mathbb{R}^p)^g \times \{\Sigma \in \mathbb{R}^{p \times p}: \Sigma \text{ symmetric positive definite}\} &\to (0,\infty) \end{align*} is \begin{align*} L_{\mathrm{un}}(m_1,\dots,m_g,\Sigma) = (2\pi)^{-Np/2}(\det \Sigma)^{-N/2} \exp\left( -\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S(m_1,\dots,m_g)\right) \right). \end{align*} Under $H_0$, all group means are equal. For $m \in \mathbb{R}^p$, define \begin{align*} S_0(m) := \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m)(x_{ij}-m)^\top. \end{align*} The null likelihood is the restriction \begin{align*} L_0(m,\Sigma) = (2\pi)^{-Np/2}(\det \Sigma)^{-N/2} \exp\left( -\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S_0(m)\right) \right). \end{align*} [guided] For a fixed data array $(x_{ij})$, the likelihood is a function of the unknown mean vectors and covariance matrix. In the unrestricted model the group mean for group $i$ is an arbitrary vector $m_i \in \mathbb{R}^p$, so we package the residuals into the matrix \begin{align*} S(m_1,\dots,m_g) = \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)(x_{ij}-m_i)^\top. \end{align*} This matrix records exactly the quadratic form appearing in the multivariate normal density, because \begin{align*} \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i) = \operatorname{tr}\left(\Sigma^{-1}S(m_1,\dots,m_g)\right). \end{align*} Therefore the unrestricted likelihood is \begin{align*} L_{\mathrm{un}}(m_1,\dots,m_g,\Sigma) = (2\pi)^{-Np/2}(\det \Sigma)^{-N/2} \exp\left( -\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S(m_1,\dots,m_g)\right) \right). \end{align*} Under the null hypothesis $H_0$, there is a single common mean vector $m \in \mathbb{R}^p$. Thus the null residual matrix is \begin{align*} S_0(m) = \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m)(x_{ij}-m)^\top, \end{align*} and substituting this residual matrix into the same normal likelihood formula gives \begin{align*} L_0(m,\Sigma) = (2\pi)^{-Np/2}(\det \Sigma)^{-N/2} \exp\left( -\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S_0(m)\right) \right). \end{align*} [/guided] [/step] [step:Maximize over the unrestricted group means to obtain $W$] Fix a symmetric positive definite matrix $\Sigma$. Since $\Sigma^{-1}$ is symmetric positive definite, maximizing $L_{\mathrm{un}}$ over $m_1,\dots,m_g$ is equivalent to minimizing \begin{align*} Q(m_1,\dots,m_g) := \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i). \end{align*} For each $i$, completing the square gives \begin{align*} \sum_{j=1}^{n_i} (x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i) &= \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(x_{ij}-\bar{x}_i) + n_i(m_i-\bar{x}_i)^\top \Sigma^{-1}(m_i-\bar{x}_i). \end{align*} The second term is nonnegative and equals $0$ exactly when $m_i=\bar{x}_i$. Hence the unrestricted maximizing mean vectors are \begin{align*} \hat{m}_i = \bar{x}_i, \qquad i=1,\dots,g, \end{align*} and the corresponding residual matrix is \begin{align*} S(\hat{m}_1,\dots,\hat{m}_g)=W. \end{align*} [guided] For fixed $\Sigma$, the factors $(2\pi)^{-Np/2}$ and $(\det \Sigma)^{-N/2}$ do not depend on the mean vectors. The exponential function is strictly increasing, so maximizing the likelihood over the mean vectors is the same as minimizing \begin{align*} Q(m_1,\dots,m_g) = \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i). \end{align*} Because $\Sigma$ is symmetric positive definite, $\Sigma^{-1}$ is also symmetric positive definite, so each quadratic term is nonnegative. We minimize separately group by group. Fix $i \in \{1,\dots,g\}$. Since \begin{align*} x_{ij}-m_i = (x_{ij}-\bar{x}_i)+(\bar{x}_i-m_i), \end{align*} we expand: \begin{align*} \sum_{j=1}^{n_i} (x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i) &= \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(x_{ij}-\bar{x}_i)\\ &\quad + 2\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(\bar{x}_i-m_i)\\ &\quad + \sum_{j=1}^{n_i} (\bar{x}_i-m_i)^\top \Sigma^{-1}(\bar{x}_i-m_i). \end{align*} The middle term vanishes because \begin{align*} \sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)=0. \end{align*} Therefore \begin{align*} \sum_{j=1}^{n_i} (x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i) &= \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(x_{ij}-\bar{x}_i) + n_i(m_i-\bar{x}_i)^\top \Sigma^{-1}(m_i-\bar{x}_i). \end{align*} The final term is minimized uniquely at $m_i=\bar{x}_i$. Applying this to each group gives $\hat{m}_i=\bar{x}_i$ for all $i$, and substituting these fitted means into the residual matrix gives \begin{align*} S(\hat{m}_1,\dots,\hat{m}_g) = \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)(x_{ij}-\bar{x}_i)^\top = W. \end{align*} [/guided] [/step] [step:Maximize over the null mean to obtain $T$] Fix a symmetric positive definite matrix $\Sigma$. Maximizing $L_0$ over $m \in \mathbb{R}^p$ is equivalent to minimizing \begin{align*} Q_0(m) := \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m)^\top \Sigma^{-1}(x_{ij}-m). \end{align*} Completing the square around the grand mean gives \begin{align*} Q_0(m) &= \sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-\bar{x})^\top \Sigma^{-1}(x_{ij}-\bar{x}) + N(m-\bar{x})^\top \Sigma^{-1}(m-\bar{x}). \end{align*} The second term is nonnegative and equals $0$ exactly when $m=\bar{x}$. Hence the null maximizing mean is $\hat{m}_0=\bar{x}$, and the corresponding residual matrix is \begin{align*} S_0(\hat{m}_0)=T. \end{align*} [/step] [step:Maximize the likelihood over the covariance matrix] Let $S \in \mathbb{R}^{p \times p}$ be symmetric positive definite. Define \begin{align*} \ell_S: \{\Sigma \in \mathbb{R}^{p \times p}: \Sigma \text{ symmetric positive definite}\} &\to \mathbb{R} \end{align*} by \begin{align*} \ell_S(\Sigma) := -\frac{N}{2}\log\det\Sigma -\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S). \end{align*} Writing $A:=\Sigma^{-1}S$, the matrix $A$ is similar to the symmetric positive definite matrix $S^{1/2}\Sigma^{-1}S^{1/2}$, so its eigenvalues $\lambda_1,\dots,\lambda_p$ are positive. Since \begin{align*} \det\Sigma = \frac{\det S}{\det A}, \qquad \operatorname{tr}(\Sigma^{-1}S)=\operatorname{tr}A=\sum_{k=1}^p \lambda_k, \end{align*} we have \begin{align*} \ell_S(\Sigma) &= -\frac{N}{2}\log\det S + \frac{N}{2}\sum_{k=1}^p \log \lambda_k - \frac{1}{2}\sum_{k=1}^p \lambda_k. \end{align*} For each $\lambda>0$, the function $\lambda \mapsto N\log\lambda-\lambda$ is maximized at $\lambda=N$. Therefore $\ell_S$ is maximized exactly when $\lambda_1=\cdots=\lambda_p=N$, equivalently $A=N I_p$, equivalently \begin{align*} \hat{\Sigma}=\frac{1}{N}S. \end{align*} At this maximizer, \begin{align*} \max_{\Sigma}\, (2\pi)^{-Np/2}(\det \Sigma)^{-N/2} \exp\left(-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S)\right) = (2\pi)^{-Np/2} \left(\det\frac{S}{N}\right)^{-N/2} e^{-Np/2}. \end{align*} [guided] After the means have been fitted, the likelihood depends on the covariance matrix only through a fixed positive definite residual matrix $S$. We therefore study the log-likelihood part \begin{align*} \ell_S(\Sigma) = -\frac{N}{2}\log\det\Sigma -\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S), \end{align*} where $\Sigma$ ranges over symmetric positive definite matrices. The assumption that $S$ is positive definite is used here to ensure that the determinant is positive and that the maximum occurs at a positive definite covariance matrix. Set $A:=\Sigma^{-1}S$. Although $A$ need not be symmetric, it is similar to \begin{align*} S^{1/2}\Sigma^{-1}S^{1/2}, \end{align*} which is symmetric positive definite. Hence $A$ has positive eigenvalues $\lambda_1,\dots,\lambda_p$. We rewrite the two terms in $\ell_S$ using these eigenvalues. Since $A=\Sigma^{-1}S$, we have \begin{align*} \det A = \det(\Sigma^{-1}S)=\frac{\det S}{\det\Sigma}, \end{align*} and therefore \begin{align*} \det\Sigma = \frac{\det S}{\det A}. \end{align*} Also, \begin{align*} \operatorname{tr}(\Sigma^{-1}S)=\operatorname{tr}A=\sum_{k=1}^p \lambda_k. \end{align*} Substituting these identities gives \begin{align*} \ell_S(\Sigma) &= -\frac{N}{2}\log\det S + \frac{N}{2}\sum_{k=1}^p \log \lambda_k - \frac{1}{2}\sum_{k=1}^p \lambda_k. \end{align*} Thus the maximization separates into $p$ one-variable problems. For $\lambda>0$, the derivative of $N\log\lambda-\lambda$ is $N/\lambda-1$, so the unique critical point is $\lambda=N$; the second derivative is $-N/\lambda^2<0$, so this critical point is the unique maximum. Therefore the maximum occurs exactly when every eigenvalue of $A$ equals $N$. Since $A$ is similar to a symmetric positive definite matrix, this condition is equivalent here to $A=N I_p$, hence \begin{align*} \Sigma^{-1}S=N I_p, \qquad \text{so} \qquad \hat{\Sigma}=\frac{1}{N}S. \end{align*} Substituting this maximizing covariance matrix into the normal likelihood gives \begin{align*} (2\pi)^{-Np/2} \left(\det\frac{S}{N}\right)^{-N/2} \exp\left( -\frac{1}{2}\operatorname{tr}\left(N S^{-1}S\right) \right) = (2\pi)^{-Np/2} \left(\det\frac{S}{N}\right)^{-N/2} e^{-Np/2}. \end{align*} [/guided] [/step] [step:Compute the likelihood ratio and identify Wilks' lambda] Applying the covariance maximization formula with $S=W$ gives \begin{align*} \sup_{\mathrm{unrestricted}} L = (2\pi)^{-Np/2} \left(\det\frac{W}{N}\right)^{-N/2} e^{-Np/2}. \end{align*} Applying the same formula with $S=T$ gives \begin{align*} \sup_{H_0} L = (2\pi)^{-Np/2} \left(\det\frac{T}{N}\right)^{-N/2} e^{-Np/2}. \end{align*} Therefore \begin{align*} \frac{\sup_{H_0} L}{\sup_{\mathrm{unrestricted}} L} &= \frac{ (2\pi)^{-Np/2} \left(\det\frac{T}{N}\right)^{-N/2} e^{-Np/2} }{ (2\pi)^{-Np/2} \left(\det\frac{W}{N}\right)^{-N/2} e^{-Np/2} }\\ &= \left( \frac{\det W}{\det T} \right)^{N/2}. \end{align*} Since the map $u \mapsto u^{N/2}$ is strictly increasing on $(0,\infty)$, the likelihood-[ratio test](/theorems/174) has the same rejection regions when expressed in terms of \begin{align*} \Lambda := \frac{\det W}{\det T}. \end{align*} This is Wilks' lambda statistic for the one-way MANOVA likelihood-ratio test. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Definitions & Concepts

Determinant

Explore Further

Determinant Definition test Theorem #89 Ratio Test Theorem #174 Kalman Filter Step with a Missing Observation probability Monotonicity and Triangle Inequality for Expectation probability Fisher Discriminant Maximisation Theorem probability Gauss-Markov Efficiency of Generalized Least Squares in Seemingly Unrelated Regressions probability Consistency of Sample Principal Component Analysis probability Simultaneous Confidence Intervals for Mean Contrasts in One-Way MANOVA probability Innovations Algorithm probability Conditional Distribution of a Multivariate Normal Vector probability

What brings you to Androma?

Start with a route through the knowledge graph.

Wilks' Lambda Statistic for One-Way MANOVA (Theorem # 4056)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Wilks' Lambda Statistic for One-Way MANOVA (Theorem # 4056)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further