[proofplan]
We reduce the hypothesis $C B M = 0$ to an ordinary multivariate general linear model for the transformed response matrix $Z := YM$. The unrestricted least-squares fit gives the residual SSP matrix $E_M$, while the constrained least-squares fit under $C\Gamma = 0$ gives an increase in fitted SSP equal to $H_M$. Once $E_M$ is positive definite, the pair $(H_M,E_M)$ determines a well-defined generalized eigenvalue problem, and the four classical MANOVA statistics are exactly the standard symmetric functions of those roots.
[/proofplan]
[step:Pass to the response contrast model]
Define the response contrast matrix
\begin{align*}
Z := YM \in \mathbb{R}^{N \times \ell}.
\end{align*}
Since $Y = XB + U$, we have
\begin{align*}
Z = XBM + UM.
\end{align*}
Define the contrast coefficient matrix
\begin{align*}
\Gamma := BM \in \mathbb{R}^{r \times \ell}
\end{align*}
and the contrast error matrix
\begin{align*}
V := UM \in \mathbb{R}^{N \times \ell}.
\end{align*}
Then the transformed model is
\begin{align*}
Z = X\Gamma + V.
\end{align*}
The hypothesis $H_0 : C B M = 0$ is therefore exactly
\begin{align*}
H_0 : C\Gamma = 0.
\end{align*}
Because $X$ has full column rank, $X^\top X$ is invertible. Hence the unrestricted least-squares estimator of $\Gamma$ is
\begin{align*}
\hat\Gamma
&= (X^\top X)^{-1}X^\top Z \\
&= (X^\top X)^{-1}X^\top Y M \\
&= \hat B M.
\end{align*}
[guided]
The role of $M$ is to select the response directions in which the hypothesis is tested. We therefore replace the original response matrix $Y$ by the transformed response matrix
\begin{align*}
Z := YM \in \mathbb{R}^{N \times \ell}.
\end{align*}
Substituting the model equation $Y = XB + U$ gives
\begin{align*}
Z = YM = XBM + UM.
\end{align*}
Define
\begin{align*}
\Gamma := BM \in \mathbb{R}^{r \times \ell},
\qquad
V := UM \in \mathbb{R}^{N \times \ell}.
\end{align*}
Then the transformed model has the standard multivariate linear form
\begin{align*}
Z = X\Gamma + V.
\end{align*}
The original null hypothesis $C B M = 0$ becomes exactly
\begin{align*}
C\Gamma = 0.
\end{align*}
Since $X$ has full column rank, the Gram matrix $X^\top X$ is invertible. Thus the least-squares estimator in the transformed model is
\begin{align*}
\hat\Gamma
&= (X^\top X)^{-1}X^\top Z \\
&= (X^\top X)^{-1}X^\top Y M \\
&= \hat B M.
\end{align*}
This shows that testing $C B M = 0$ in the original model is the same as testing $C\Gamma = 0$ in the contrast model.
[/guided]
[/step]
[step:Identify the error SSP matrix from the unrestricted residuals]
The unrestricted fitted value in the contrast model is
\begin{align*}
\hat Z := X\hat\Gamma = X(X^\top X)^{-1}X^\top Z = P_X Z.
\end{align*}
The unrestricted residual matrix is therefore
\begin{align*}
R := Z - \hat Z = (I_N - P_X)Z.
\end{align*}
The error SSP matrix is
\begin{align*}
R^\top R
&= Z^\top (I_N - P_X)^\top (I_N - P_X)Z.
\end{align*}
Since $P_X$ is symmetric and idempotent, $I_N - P_X$ is also symmetric and idempotent. Hence
\begin{align*}
R^\top R
&= Z^\top (I_N - P_X)Z \\
&= M^\top Y^\top (I_N - P_X)Y M \\
&= E_M.
\end{align*}
Thus $E_M$ is the error sum-of-squares-and-products matrix for the transformed response $YM$.
[/step]
[step:Compute the constrained least-squares increase under $C\Gamma = 0$]
Let
\begin{align*}
A := C(X^\top X)^{-1}C^\top \in \mathbb{R}^{k \times k}.
\end{align*}
Because $C$ has full row rank and $X^\top X$ is positive definite, $A$ is positive definite and hence invertible.
Define
\begin{align*}
\tilde\Gamma
:= \hat\Gamma - (X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma.
\end{align*}
Then
\begin{align*}
C\tilde\Gamma
&= C\hat\Gamma - C(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma \\
&= C\hat\Gamma - A A^{-1}C\hat\Gamma \\
&= 0,
\end{align*}
so $\tilde\Gamma$ is feasible for the constraint $C\Gamma=0$.
We now prove that $\tilde\Gamma$ is the constrained least-squares minimizer. Define the residual sum-of-squares functional
\begin{align*}
Q: \mathbb{R}^{r \times \ell} &\to \mathbb{R} \\
G &\mapsto \operatorname{tr}\!\left((Z-XG)^\top(Z-XG)\right).
\end{align*}
Since $X^\top(Z-X\hat\Gamma)=0$, expanding around $\hat\Gamma$ gives, for every $G \in \mathbb{R}^{r \times \ell}$,
\begin{align*}
Q(G)
&= Q(\hat\Gamma) + \operatorname{tr}\!\left((G-\hat\Gamma)^\top X^\top X(G-\hat\Gamma)\right).
\end{align*}
Let $G \in \mathbb{R}^{r \times \ell}$ satisfy $CG=0$, and define $\Delta := G-\tilde\Gamma \in \mathbb{R}^{r \times \ell}$. Then $C\Delta=0$. Moreover
\begin{align*}
\tilde\Gamma-\hat\Gamma
= -(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma,
\end{align*}
and hence the $X^\top X$-inner product cross-term vanishes:
\begin{align*}
\operatorname{tr}\!\left((\tilde\Gamma-\hat\Gamma)^\top X^\top X\Delta\right)
&= -\operatorname{tr}\!\left((C\hat\Gamma)^\top A^{-1}C\Delta\right) \\
&= 0.
\end{align*}
Therefore
\begin{align*}
Q(G)-Q(\tilde\Gamma)
&= \operatorname{tr}\!\left(\Delta^\top X^\top X\Delta\right)
\geq 0.
\end{align*}
Since $X^\top X$ is positive definite, equality holds only when $\Delta=0$. Thus $\tilde\Gamma$ is the unique constrained least-squares estimator under $C\Gamma=0$.
The fitted-value difference between the unrestricted and constrained fits is
\begin{align*}
X\hat\Gamma - X\tilde\Gamma
= X(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma.
\end{align*}
Therefore the hypothesis SSP matrix, defined as the cross-product of this fitted-value difference, is
\begin{align*}
(X\hat\Gamma - X\tilde\Gamma)^\top(X\hat\Gamma - X\tilde\Gamma)
&= \hat\Gamma^\top C^\top A^{-1}C(X^\top X)^{-1}X^\top X(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma \\
&= \hat\Gamma^\top C^\top A^{-1}C(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma \\
&= \hat\Gamma^\top C^\top A^{-1}A A^{-1}C\hat\Gamma \\
&= \hat\Gamma^\top C^\top A^{-1}C\hat\Gamma.
\end{align*}
Using $\hat\Gamma = \hat B M$, this becomes
\begin{align*}
H_M
= M^\top \hat B^\top C^\top \left(C(X^\top X)^{-1}C^\top\right)^{-1}C\hat B M.
\end{align*}
Thus $H_M$ is the hypothesis sum-of-squares-and-products matrix associated with $H_0 : C B M = 0$.
[guided]
The constrained estimator is the unrestricted estimator adjusted just enough to satisfy $C\Gamma = 0$. Define
\begin{align*}
A := C(X^\top X)^{-1}C^\top \in \mathbb{R}^{k \times k}.
\end{align*}
The matrix $X^\top X$ is positive definite because $X$ has full column rank. Since $C$ has full row rank, for every nonzero $a \in \mathbb{R}^k$ we have $C^\top a \neq 0$, and therefore
\begin{align*}
a^\top A a
= (C^\top a)^\top (X^\top X)^{-1}(C^\top a)
> 0.
\end{align*}
So $A$ is positive definite and invertible.
Define
\begin{align*}
\tilde\Gamma
:= \hat\Gamma - (X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma.
\end{align*}
This matrix satisfies the constraint because
\begin{align*}
C\tilde\Gamma
&= C\hat\Gamma - C(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma \\
&= C\hat\Gamma - A A^{-1}C\hat\Gamma \\
&= 0.
\end{align*}
So $\tilde\Gamma$ is feasible, but feasibility alone is not enough: we must prove it minimizes the least-squares criterion among all matrices satisfying $C\Gamma=0$.
Define the least-squares objective
\begin{align*}
Q: \mathbb{R}^{r \times \ell} &\to \mathbb{R} \\
G &\mapsto \operatorname{tr}\!\left((Z-XG)^\top(Z-XG)\right).
\end{align*}
The unrestricted normal equation is $X^\top(Z-X\hat\Gamma)=0$. Therefore, when we expand $Q(G)$ around $\hat\Gamma$, the linear cross-term disappears and we get
\begin{align*}
Q(G)
&= Q(\hat\Gamma) + \operatorname{tr}\!\left((G-\hat\Gamma)^\top X^\top X(G-\hat\Gamma)\right)
\end{align*}
for every $G \in \mathbb{R}^{r \times \ell}$.
Now let $G \in \mathbb{R}^{r \times \ell}$ be any feasible coefficient matrix, so $CG=0$, and define
\begin{align*}
\Delta := G-\tilde\Gamma \in \mathbb{R}^{r \times \ell}.
\end{align*}
Since both $G$ and $\tilde\Gamma$ satisfy the constraint, $C\Delta=0$. Also
\begin{align*}
\tilde\Gamma-\hat\Gamma
= -(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma.
\end{align*}
Thus the correction $\tilde\Gamma-\hat\Gamma$ is orthogonal, in the $X^\top X$ inner product, to every feasible direction $\Delta$:
\begin{align*}
\operatorname{tr}\!\left((\tilde\Gamma-\hat\Gamma)^\top X^\top X\Delta\right)
&= -\operatorname{tr}\!\left((C\hat\Gamma)^\top A^{-1}C\Delta\right) \\
&= 0.
\end{align*}
Consequently,
\begin{align*}
Q(G)-Q(\tilde\Gamma)
&= \operatorname{tr}\!\left(\Delta^\top X^\top X\Delta\right)
\geq 0.
\end{align*}
The matrix $X^\top X$ is positive definite, so equality occurs only when $\Delta=0$, that is, only when $G=\tilde\Gamma$. Hence $\tilde\Gamma$ is the unique constrained least-squares fit. This is exactly the projection of $\hat\Gamma$ onto the constraint space $\{G \in \mathbb{R}^{r \times \ell}: CG=0\}$ with respect to the $X^\top X$ inner product.
The hypothesis SSP matrix measures how much fitted variation is lost by imposing the null hypothesis. The difference between unrestricted and constrained fitted values is
\begin{align*}
X\hat\Gamma - X\tilde\Gamma
= X(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma.
\end{align*}
Taking its cross-product gives
\begin{align*}
(X\hat\Gamma - X\tilde\Gamma)^\top(X\hat\Gamma - X\tilde\Gamma)
&= \hat\Gamma^\top C^\top A^{-1}C(X^\top X)^{-1}X^\top X(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma \\
&= \hat\Gamma^\top C^\top A^{-1}C(X^\top X)^{-1}C^\top A^{-1}C\hat\Gamma \\
&= \hat\Gamma^\top C^\top A^{-1}A A^{-1}C\hat\Gamma \\
&= \hat\Gamma^\top C^\top A^{-1}C\hat\Gamma.
\end{align*}
Finally, since $\hat\Gamma = \hat B M$, this is exactly
\begin{align*}
H_M
= M^\top \hat B^\top C^\top \left(C(X^\top X)^{-1}C^\top\right)^{-1}C\hat B M.
\end{align*}
This verifies that $H_M$ is the hypothesis SSP matrix for the contrast hypothesis.
[/guided]
[/step]
[step:Express the four MANOVA statistics through the real nonnegative roots of $(H_M,E_M)$]
Assume $E_M$ is positive definite. Since $H_M$ is a cross-product matrix, it is positive semidefinite. Let $E_M^{1/2} \in \mathbb{R}^{\ell \times \ell}$ denote the positive definite square root of $E_M$, and define
\begin{align*}
S := E_M^{-1/2}H_M E_M^{-1/2} \in \mathbb{R}^{\ell \times \ell}.
\end{align*}
Then $S$ is symmetric positive semidefinite. Also
\begin{align*}
E_M^{-1}H_M
= E_M^{-1/2} S E_M^{1/2},
\end{align*}
so $E_M^{-1}H_M$ is similar to $S$. Therefore the generalized roots of $(H_M,E_M)$ are real and nonnegative. Denote the nonzero roots, counted with multiplicity, by
\begin{align*}
\lambda_1,\dots,\lambda_s,
\end{align*}
where $s \leq \ell$.
The four standard MANOVA statistics are then obtained from these roots:
\begin{align*}
\Lambda_{\mathrm{Wilks}}
&:= \frac{\det(E_M)}{\det(E_M + H_M)}
= \prod_{i=1}^{s}(1+\lambda_i)^{-1}, \\
V_{\mathrm{Pillai}}
&:= \operatorname{tr}\!\left(H_M(H_M+E_M)^{-1}\right)
= \sum_{i=1}^{s}\frac{\lambda_i}{1+\lambda_i}, \\
T_{\mathrm{LH}}
&:= \operatorname{tr}\!\left(E_M^{-1}H_M\right)
= \sum_{i=1}^{s}\lambda_i, \\
\Theta_{\mathrm{Roy}}
&:= \lambda_{\max},
\end{align*}
where $\lambda_{\max} := \max\{\lambda_i : 1 \leq i \leq s\}$. Thus Wilks' lambda, Pillai's trace, the Lawley-Hotelling trace, and Roy's largest root are precisely the root statistics associated with $(H_M,E_M)$.
[guided]
The final step is to check that the root statistics are well-defined. The hypothesis matrix $H_M$ was constructed as a cross-product matrix, so for every $a \in \mathbb{R}^\ell$,
\begin{align*}
a^\top H_M a \geq 0.
\end{align*}
Thus $H_M$ is positive semidefinite. The theorem assumes $E_M$ is positive definite, so it has a unique positive definite square root $E_M^{1/2}$ and an inverse square root $E_M^{-1/2}$.
Define
\begin{align*}
S := E_M^{-1/2}H_M E_M^{-1/2} \in \mathbb{R}^{\ell \times \ell}.
\end{align*}
This matrix is symmetric because both $E_M^{-1/2}$ and $H_M$ are symmetric, and it is positive semidefinite because
\begin{align*}
a^\top S a
= (E_M^{-1/2}a)^\top H_M(E_M^{-1/2}a)
\geq 0
\end{align*}
for every $a \in \mathbb{R}^\ell$. Therefore all eigenvalues of $S$ are real and nonnegative.
The generalized eigenvalue problem for the pair $(H_M,E_M)$ is equivalent to the ordinary eigenvalue problem for $S$. Indeed,
\begin{align*}
E_M^{-1}H_M
= E_M^{-1/2} S E_M^{1/2},
\end{align*}
so $E_M^{-1}H_M$ is similar to $S$ and has the same eigenvalues. Denote the nonzero eigenvalues by
\begin{align*}
\lambda_1,\dots,\lambda_s,
\end{align*}
counted with multiplicity, where $s \leq \ell$. These are the real nonnegative generalized roots of $(H_M,E_M)$.
The classical MANOVA tests are symmetric functions of these roots:
\begin{align*}
\Lambda_{\mathrm{Wilks}}
&:= \frac{\det(E_M)}{\det(E_M + H_M)}
= \prod_{i=1}^{s}(1+\lambda_i)^{-1}, \\
V_{\mathrm{Pillai}}
&:= \operatorname{tr}\!\left(H_M(H_M+E_M)^{-1}\right)
= \sum_{i=1}^{s}\frac{\lambda_i}{1+\lambda_i}, \\
T_{\mathrm{LH}}
&:= \operatorname{tr}\!\left(E_M^{-1}H_M\right)
= \sum_{i=1}^{s}\lambda_i, \\
\Theta_{\mathrm{Roy}}
&:= \lambda_{\max},
\end{align*}
where $\lambda_{\max} := \max\{\lambda_i : 1 \leq i \leq s\}$. Hence the pair $(H_M,E_M)$ supplies exactly the roots used by Wilks' lambda, Pillai's trace, the Lawley-Hotelling trace, and Roy's largest root.
[/guided]
[/step]
[step:Conclude that the constructed matrices are the required SSP matrices]
The preceding steps construct $E_M$ as the residual cross-product matrix for the unrestricted fit of $YM$ on $X$, and construct $H_M$ as the fitted cross-product increase obtained by comparing the unrestricted fit with the constrained fit satisfying $C B M = 0$. Hence $H_M$ and $E_M$ are exactly the hypothesis and error SSP matrices associated with the estimable hypothesis $H_0 : C B M = 0$. When $E_M$ is positive definite, the matrix pair $(H_M,E_M)$ has well-defined generalized roots, and the four classical MANOVA tests are obtained by applying the corresponding Wilks, Pillai, Lawley-Hotelling, and Roy root statistics to that pair. This proves the theorem.
[/step]