[proofplan]
We write the multivariate normal likelihood and maximize it first over the mean parameters for a fixed covariance matrix. Under the unrestricted model, the minimizing fitted means are the group sample means and the residual matrix is $W$; under $H_0$, the minimizing common mean is the grand mean and the residual matrix is $T$. We then maximize over the positive definite covariance matrix and compute the two maximized likelihoods explicitly. Their ratio is $\left(\det W/\det T\right)^{N/2}$, so the monotone statistic determining the same rejection regions is Wilks' lambda $\Lambda = \det W/\det T$.
[/proofplan]
[step:Write the likelihood in terms of a residual sums-of-products matrix]
For parameter values $m_1,\dots,m_g \in \mathbb{R}^p$ and a symmetric positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$, define the residual sums-of-products matrix
\begin{align*}
S(m_1,\dots,m_g)
:=
\sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)(x_{ij}-m_i)^\top.
\end{align*}
The unrestricted likelihood function
\begin{align*}
L_{\mathrm{un}}: (\mathbb{R}^p)^g \times \{\Sigma \in \mathbb{R}^{p \times p}: \Sigma \text{ symmetric positive definite}\} &\to (0,\infty)
\end{align*}
is
\begin{align*}
L_{\mathrm{un}}(m_1,\dots,m_g,\Sigma)
=
(2\pi)^{-Np/2}(\det \Sigma)^{-N/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S(m_1,\dots,m_g)\right)
\right).
\end{align*}
Under $H_0$, all group means are equal. For $m \in \mathbb{R}^p$, define
\begin{align*}
S_0(m)
:=
\sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m)(x_{ij}-m)^\top.
\end{align*}
The null likelihood is the restriction
\begin{align*}
L_0(m,\Sigma)
=
(2\pi)^{-Np/2}(\det \Sigma)^{-N/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S_0(m)\right)
\right).
\end{align*}
[guided]
For a fixed data array $(x_{ij})$, the likelihood is a function of the unknown mean vectors and covariance matrix. In the unrestricted model the group mean for group $i$ is an arbitrary vector $m_i \in \mathbb{R}^p$, so we package the residuals into the matrix
\begin{align*}
S(m_1,\dots,m_g)
=
\sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m_i)(x_{ij}-m_i)^\top.
\end{align*}
This matrix records exactly the quadratic form appearing in the multivariate normal density, because
\begin{align*}
\sum_{i=1}^g \sum_{j=1}^{n_i}
(x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i)
=
\operatorname{tr}\left(\Sigma^{-1}S(m_1,\dots,m_g)\right).
\end{align*}
Therefore the unrestricted likelihood is
\begin{align*}
L_{\mathrm{un}}(m_1,\dots,m_g,\Sigma)
=
(2\pi)^{-Np/2}(\det \Sigma)^{-N/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S(m_1,\dots,m_g)\right)
\right).
\end{align*}
Under the null hypothesis $H_0$, there is a single common mean vector $m \in \mathbb{R}^p$. Thus the null residual matrix is
\begin{align*}
S_0(m)
=
\sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-m)(x_{ij}-m)^\top,
\end{align*}
and substituting this residual matrix into the same normal likelihood formula gives
\begin{align*}
L_0(m,\Sigma)
=
(2\pi)^{-Np/2}(\det \Sigma)^{-N/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1}S_0(m)\right)
\right).
\end{align*}
[/guided]
[/step]
[step:Maximize over the unrestricted group means to obtain $W$]
Fix a symmetric positive definite matrix $\Sigma$. Since $\Sigma^{-1}$ is symmetric positive definite, maximizing $L_{\mathrm{un}}$ over $m_1,\dots,m_g$ is equivalent to minimizing
\begin{align*}
Q(m_1,\dots,m_g)
:=
\sum_{i=1}^g \sum_{j=1}^{n_i}
(x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i).
\end{align*}
For each $i$, completing the square gives
\begin{align*}
\sum_{j=1}^{n_i}
(x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i)
&=
\sum_{j=1}^{n_i}
(x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(x_{ij}-\bar{x}_i)
+
n_i(m_i-\bar{x}_i)^\top \Sigma^{-1}(m_i-\bar{x}_i).
\end{align*}
The second term is nonnegative and equals $0$ exactly when $m_i=\bar{x}_i$. Hence the unrestricted maximizing mean vectors are
\begin{align*}
\hat{m}_i = \bar{x}_i,
\qquad i=1,\dots,g,
\end{align*}
and the corresponding residual matrix is
\begin{align*}
S(\hat{m}_1,\dots,\hat{m}_g)=W.
\end{align*}
[guided]
For fixed $\Sigma$, the factors $(2\pi)^{-Np/2}$ and $(\det \Sigma)^{-N/2}$ do not depend on the mean vectors. The exponential function is strictly increasing, so maximizing the likelihood over the mean vectors is the same as minimizing
\begin{align*}
Q(m_1,\dots,m_g)
=
\sum_{i=1}^g \sum_{j=1}^{n_i}
(x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i).
\end{align*}
Because $\Sigma$ is symmetric positive definite, $\Sigma^{-1}$ is also symmetric positive definite, so each quadratic term is nonnegative.
We minimize separately group by group. Fix $i \in \{1,\dots,g\}$. Since
\begin{align*}
x_{ij}-m_i = (x_{ij}-\bar{x}_i)+(\bar{x}_i-m_i),
\end{align*}
we expand:
\begin{align*}
\sum_{j=1}^{n_i}
(x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i)
&=
\sum_{j=1}^{n_i}
(x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(x_{ij}-\bar{x}_i)\\
&\quad
+
2\sum_{j=1}^{n_i}
(x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(\bar{x}_i-m_i)\\
&\quad
+
\sum_{j=1}^{n_i}
(\bar{x}_i-m_i)^\top \Sigma^{-1}(\bar{x}_i-m_i).
\end{align*}
The middle term vanishes because
\begin{align*}
\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)=0.
\end{align*}
Therefore
\begin{align*}
\sum_{j=1}^{n_i}
(x_{ij}-m_i)^\top \Sigma^{-1}(x_{ij}-m_i)
&=
\sum_{j=1}^{n_i}
(x_{ij}-\bar{x}_i)^\top \Sigma^{-1}(x_{ij}-\bar{x}_i)
+
n_i(m_i-\bar{x}_i)^\top \Sigma^{-1}(m_i-\bar{x}_i).
\end{align*}
The final term is minimized uniquely at $m_i=\bar{x}_i$. Applying this to each group gives $\hat{m}_i=\bar{x}_i$ for all $i$, and substituting these fitted means into the residual matrix gives
\begin{align*}
S(\hat{m}_1,\dots,\hat{m}_g)
=
\sum_{i=1}^g \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)(x_{ij}-\bar{x}_i)^\top
=
W.
\end{align*}
[/guided]
[/step]
[step:Maximize over the null mean to obtain $T$]
Fix a symmetric positive definite matrix $\Sigma$. Maximizing $L_0$ over $m \in \mathbb{R}^p$ is equivalent to minimizing
\begin{align*}
Q_0(m)
:=
\sum_{i=1}^g \sum_{j=1}^{n_i}
(x_{ij}-m)^\top \Sigma^{-1}(x_{ij}-m).
\end{align*}
Completing the square around the grand mean gives
\begin{align*}
Q_0(m)
&=
\sum_{i=1}^g \sum_{j=1}^{n_i}
(x_{ij}-\bar{x})^\top \Sigma^{-1}(x_{ij}-\bar{x})
+
N(m-\bar{x})^\top \Sigma^{-1}(m-\bar{x}).
\end{align*}
The second term is nonnegative and equals $0$ exactly when $m=\bar{x}$. Hence the null maximizing mean is $\hat{m}_0=\bar{x}$, and the corresponding residual matrix is
\begin{align*}
S_0(\hat{m}_0)=T.
\end{align*}
[/step]
[step:Maximize the likelihood over the covariance matrix]
Let $S \in \mathbb{R}^{p \times p}$ be symmetric positive definite. Define
\begin{align*}
\ell_S: \{\Sigma \in \mathbb{R}^{p \times p}: \Sigma \text{ symmetric positive definite}\} &\to \mathbb{R}
\end{align*}
by
\begin{align*}
\ell_S(\Sigma)
:=
-\frac{N}{2}\log\det\Sigma
-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S).
\end{align*}
Writing $A:=\Sigma^{-1}S$, the matrix $A$ is similar to the symmetric positive definite matrix $S^{1/2}\Sigma^{-1}S^{1/2}$, so its eigenvalues $\lambda_1,\dots,\lambda_p$ are positive. Since
\begin{align*}
\det\Sigma = \frac{\det S}{\det A},
\qquad
\operatorname{tr}(\Sigma^{-1}S)=\operatorname{tr}A=\sum_{k=1}^p \lambda_k,
\end{align*}
we have
\begin{align*}
\ell_S(\Sigma)
&=
-\frac{N}{2}\log\det S
+
\frac{N}{2}\sum_{k=1}^p \log \lambda_k
-
\frac{1}{2}\sum_{k=1}^p \lambda_k.
\end{align*}
For each $\lambda>0$, the function $\lambda \mapsto N\log\lambda-\lambda$ is maximized at $\lambda=N$. Therefore $\ell_S$ is maximized exactly when $\lambda_1=\cdots=\lambda_p=N$, equivalently $A=N I_p$, equivalently
\begin{align*}
\hat{\Sigma}=\frac{1}{N}S.
\end{align*}
At this maximizer,
\begin{align*}
\max_{\Sigma}\,
(2\pi)^{-Np/2}(\det \Sigma)^{-N/2}
\exp\left(-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S)\right)
=
(2\pi)^{-Np/2}
\left(\det\frac{S}{N}\right)^{-N/2}
e^{-Np/2}.
\end{align*}
[guided]
After the means have been fitted, the likelihood depends on the covariance matrix only through a fixed positive definite residual matrix $S$. We therefore study the log-likelihood part
\begin{align*}
\ell_S(\Sigma)
=
-\frac{N}{2}\log\det\Sigma
-\frac{1}{2}\operatorname{tr}(\Sigma^{-1}S),
\end{align*}
where $\Sigma$ ranges over symmetric positive definite matrices. The assumption that $S$ is positive definite is used here to ensure that the determinant is positive and that the maximum occurs at a positive definite covariance matrix.
Set $A:=\Sigma^{-1}S$. Although $A$ need not be symmetric, it is similar to
\begin{align*}
S^{1/2}\Sigma^{-1}S^{1/2},
\end{align*}
which is symmetric positive definite. Hence $A$ has positive eigenvalues $\lambda_1,\dots,\lambda_p$. We rewrite the two terms in $\ell_S$ using these eigenvalues. Since $A=\Sigma^{-1}S$, we have
\begin{align*}
\det A = \det(\Sigma^{-1}S)=\frac{\det S}{\det\Sigma},
\end{align*}
and therefore
\begin{align*}
\det\Sigma = \frac{\det S}{\det A}.
\end{align*}
Also,
\begin{align*}
\operatorname{tr}(\Sigma^{-1}S)=\operatorname{tr}A=\sum_{k=1}^p \lambda_k.
\end{align*}
Substituting these identities gives
\begin{align*}
\ell_S(\Sigma)
&=
-\frac{N}{2}\log\det S
+
\frac{N}{2}\sum_{k=1}^p \log \lambda_k
-
\frac{1}{2}\sum_{k=1}^p \lambda_k.
\end{align*}
Thus the maximization separates into $p$ one-variable problems. For $\lambda>0$, the derivative of $N\log\lambda-\lambda$ is $N/\lambda-1$, so the unique critical point is $\lambda=N$; the second derivative is $-N/\lambda^2<0$, so this critical point is the unique maximum. Therefore the maximum occurs exactly when every eigenvalue of $A$ equals $N$. Since $A$ is similar to a symmetric positive definite matrix, this condition is equivalent here to $A=N I_p$, hence
\begin{align*}
\Sigma^{-1}S=N I_p,
\qquad
\text{so}
\qquad
\hat{\Sigma}=\frac{1}{N}S.
\end{align*}
Substituting this maximizing covariance matrix into the normal likelihood gives
\begin{align*}
(2\pi)^{-Np/2}
\left(\det\frac{S}{N}\right)^{-N/2}
\exp\left(
-\frac{1}{2}\operatorname{tr}\left(N S^{-1}S\right)
\right)
=
(2\pi)^{-Np/2}
\left(\det\frac{S}{N}\right)^{-N/2}
e^{-Np/2}.
\end{align*}
[/guided]
[/step]
[step:Compute the likelihood ratio and identify Wilks' lambda]
Applying the covariance maximization formula with $S=W$ gives
\begin{align*}
\sup_{\mathrm{unrestricted}} L
=
(2\pi)^{-Np/2}
\left(\det\frac{W}{N}\right)^{-N/2}
e^{-Np/2}.
\end{align*}
Applying the same formula with $S=T$ gives
\begin{align*}
\sup_{H_0} L
=
(2\pi)^{-Np/2}
\left(\det\frac{T}{N}\right)^{-N/2}
e^{-Np/2}.
\end{align*}
Therefore
\begin{align*}
\frac{\sup_{H_0} L}{\sup_{\mathrm{unrestricted}} L}
&=
\frac{
(2\pi)^{-Np/2}
\left(\det\frac{T}{N}\right)^{-N/2}
e^{-Np/2}
}{
(2\pi)^{-Np/2}
\left(\det\frac{W}{N}\right)^{-N/2}
e^{-Np/2}
}\\
&=
\left(
\frac{\det W}{\det T}
\right)^{N/2}.
\end{align*}
Since the map $u \mapsto u^{N/2}$ is strictly increasing on $(0,\infty)$, the likelihood-[ratio test](/theorems/174) has the same rejection regions when expressed in terms of
\begin{align*}
\Lambda := \frac{\det W}{\det T}.
\end{align*}
This is Wilks' lambda statistic for the one-way MANOVA likelihood-ratio test.
[/step]