[proofplan]
The unrestricted Gaussian likelihood is maximized at the full sample covariance matrix $S$, while the likelihood under $H_0$ is maximized at the block-diagonal covariance matrix with diagonal blocks $S_{11}$ and $S_{22}$. Thus Wilks' lambda reduces to a determinant ratio. The Schur complement rewrites this determinant ratio as the determinant of an identity minus a whitened cross-covariance product. The eigenvalues of that whitened product are precisely the squared sample canonical correlations, so the determinant factors into the desired product.
[/proofplan]
[step:Compute the unrestricted and constrained covariance maximizers]
Let $d=p+q$, and let $(\Omega,\mathcal F,\mathbb P)$ denote the underlying probability space. Let $Y_1,\dots,Y_n: (\Omega,\mathcal F) \to (\mathbb{R}^{d},\mathcal B(\mathbb{R}^{d}))$ denote the observed independent Gaussian random vectors, and define the sample mean $\bar{Y}\in\mathbb{R}^{d}$ and sample covariance matrix $S\in\mathbb{R}^{d\times d}$ by
\begin{align*}
\bar{Y}
&=
\frac{1}{n}\sum_{i=1}^{n}Y_i,\qquad
S
=
\frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})(Y_i-\bar{Y})^\top.
\end{align*}
Partition $S$ conformally with $\mathbb{R}^{d}=\mathbb{R}^{p}\times\mathbb{R}^{q}$ by
\begin{align*}
S=
\begin{pmatrix}
S_{11} & S_{12}\\
S_{21} & S_{22}
\end{pmatrix},
\end{align*}
where $S_{11}\in\mathbb{R}^{p\times p}$, $S_{12}\in\mathbb{R}^{p\times q}$, $S_{21}\in\mathbb{R}^{q\times p}$, and $S_{22}\in\mathbb{R}^{q\times q}$.
Let $\mathcal{P}_d$ denote the set of positive definite symmetric matrices in $\mathbb{R}^{d\times d}$. The centered Gaussian log-likelihood, up to terms independent of $\Sigma$, is the map
\begin{align*}
\ell: \mathcal{P}_d &\to \mathbb{R}\\
\Sigma &\mapsto
-\frac{n}{2}\log\det \Sigma
-\frac{n}{2}\operatorname{tr}(\Sigma^{-1}S).
\end{align*}
For $\Sigma\in\mathcal{P}_d$, define $M=\Sigma^{-1/2}S\Sigma^{-1/2}\in\mathbb{R}^{d\times d}$. Since $n>d$, $S\in\mathcal{P}_d$ almost surely. If $\lambda_1,\dots,\lambda_d>0$ are the eigenvalues of $M$, then
\begin{align*}
\ell(\Sigma)-\ell(S)
&=
-\frac{n}{2}\left(\log\det M^{-1}+\operatorname{tr}M-d\right)\\
&=
-\frac{n}{2}\sum_{j=1}^{d}\left(\lambda_j-1-\log\lambda_j\right)
\leq 0,
\end{align*}
where the last inequality uses $\log t\leq t-1$ for every $t>0$. Equality holds exactly when $\lambda_j=1$ for every $j$, equivalently $M=I_d$, so the unrestricted maximum is attained at $\hat{\Sigma}=S$.
Under $H_0$, the covariance matrix has the block-diagonal form
\begin{align*}
\Sigma_0=
\begin{pmatrix}
A & 0\\
0 & B
\end{pmatrix},
\end{align*}
where $A\in\mathbb{R}^{p\times p}$ and $B\in\mathbb{R}^{q\times q}$ are positive definite. Substituting this form gives
\begin{align*}
\ell(\Sigma_0)
=
-\frac{n}{2}\log\det A
-\frac{n}{2}\log\det B
-\frac{n}{2}\operatorname{tr}(A^{-1}S_{11})
-\frac{n}{2}\operatorname{tr}(B^{-1}S_{22}).
\end{align*}
The two blocks separate. Applying the preceding maximization argument in dimensions $p$ and $q$, respectively, to the positive definite blocks $S_{11}$ and $S_{22}$ shows that the constrained maximum is attained at
\begin{align*}
\hat{\Sigma}_0=
\begin{pmatrix}
S_{11} & 0\\
0 & S_{22}
\end{pmatrix}.
\end{align*}
Therefore the determinant form of Wilks' lambda is
\begin{align*}
\Lambda
=
\frac{\det S}{\det S_{11}\det S_{22}}.
\end{align*}
[guided]
The Gaussian likelihood depends on $\Sigma$ through two terms: a volume term $\log\det\Sigma$ and a quadratic-fit term involving $\Sigma^{-1}S$. Let $d=p+q$, and let $(\Omega,\mathcal F,\mathbb P)$ be the underlying probability space. Let $Y_1,\dots,Y_n: (\Omega,\mathcal F) \to (\mathbb{R}^{d},\mathcal B(\mathbb{R}^{d}))$ be the observed independent Gaussian random vectors. Define
\begin{align*}
\bar{Y}
&=
\frac{1}{n}\sum_{i=1}^{n}Y_i,\qquad
S
=
\frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})(Y_i-\bar{Y})^\top.
\end{align*}
We partition $S$ according to the decomposition $\mathbb{R}^{d}=\mathbb{R}^{p}\times\mathbb{R}^{q}$:
\begin{align*}
S=
\begin{pmatrix}
S_{11} & S_{12}\\
S_{21} & S_{22}
\end{pmatrix},
\end{align*}
where $S_{11}\in\mathbb{R}^{p\times p}$, $S_{12}\in\mathbb{R}^{p\times q}$, $S_{21}\in\mathbb{R}^{q\times p}$, and $S_{22}\in\mathbb{R}^{q\times q}$.
Let $\mathcal{P}_d$ be the set of positive definite symmetric matrices in $\mathbb{R}^{d\times d}$. After maximizing over the mean, the relevant log-likelihood is the map
\begin{align*}
\ell: \mathcal{P}_d &\to \mathbb{R}\\
\Sigma &\mapsto
-\frac{n}{2}\log\det \Sigma
-\frac{n}{2}\operatorname{tr}(\Sigma^{-1}S).
\end{align*}
Why is the maximizer $S$? For $\Sigma\in\mathcal{P}_d$, set $M=\Sigma^{-1/2}S\Sigma^{-1/2}\in\mathbb{R}^{d\times d}$. Since $n>d$, the centered sample covariance matrix $S$ is positive definite almost surely. If $\lambda_1,\dots,\lambda_d>0$ are the eigenvalues of $M$, then
\begin{align*}
\ell(\Sigma)-\ell(S)
&=
-\frac{n}{2}\left(\log\det M^{-1}+\operatorname{tr}M-d\right)\\
&=
-\frac{n}{2}\sum_{j=1}^{d}\left(\lambda_j-1-\log\lambda_j\right).
\end{align*}
The scalar inequality $\log t\leq t-1$ for $t>0$ gives $\lambda_j-1-\log\lambda_j\geq 0$ for each $j$, so $\ell(\Sigma)\leq \ell(S)$. Equality holds precisely when every $\lambda_j=1$, which is equivalent to $M=I_d$ and hence $\Sigma=S$. Therefore the unrestricted maximizer is $\hat{\Sigma}=S$.
Under the null hypothesis $\Sigma_{12}=0$, the covariance matrix must be block diagonal:
\begin{align*}
\Sigma_0=
\begin{pmatrix}
A & 0\\
0 & B
\end{pmatrix},
\end{align*}
with $A$ and $B$ positive definite. Its inverse and determinant are
\begin{align*}
\Sigma_0^{-1}
=
\begin{pmatrix}
A^{-1} & 0\\
0 & B^{-1}
\end{pmatrix},
\qquad
\det\Sigma_0=(\det A)(\det B).
\end{align*}
Substituting into the log-likelihood gives
\begin{align*}
\ell(\Sigma_0)
=
-\frac{n}{2}\log\det A
-\frac{n}{2}\log\det B
-\frac{n}{2}\operatorname{tr}(A^{-1}S_{11})
-\frac{n}{2}\operatorname{tr}(B^{-1}S_{22}).
\end{align*}
The $A$-terms and $B$-terms are independent, so the constrained likelihood maximizes each block separately. The same eigenvalue argument just used applies to the $p\times p$ block with empirical covariance $S_{11}$ and to the $q\times q$ block with empirical covariance $S_{22}$. Since $S$ is positive definite almost surely, both principal blocks $S_{11}$ and $S_{22}$ are positive definite almost surely, so the argument applies without degeneracy. Hence the constrained covariance estimator is
\begin{align*}
\hat{\Sigma}_0=
\begin{pmatrix}
S_{11} & 0\\
0 & S_{22}
\end{pmatrix}.
\end{align*}
At both maximizers, the trace term contributes the same dimension-dependent constant. Thus the maximized likelihood ratio is
\begin{align*}
\left(\frac{\det S}{\det S_{11}\det S_{22}}\right)^{n/2}.
\end{align*}
The conventional Wilks' lambda statistic is the determinant ratio inside this power, namely
\begin{align*}
\Lambda
=
\frac{\det S}{\det S_{11}\det S_{22}}.
\end{align*}
[/guided]
[/step]
[step:Rewrite the determinant ratio using the Schur complement]
Since $n>d$, the sample covariance matrix $S$ is positive definite almost surely, and therefore $S_{11}$ and $S_{22}$ are positive definite almost surely. We use the Schur complement determinant identity for the invertible block $S_{22}$. To verify it here, multiply by block triangular matrices of determinant $1$:
\begin{align*}
\begin{pmatrix}
I_p & -S_{12}S_{22}^{-1}\\
0 & I_q
\end{pmatrix}
S
\begin{pmatrix}
I_p & 0\\
-S_{22}^{-1}S_{21} & I_q
\end{pmatrix}
=
\begin{pmatrix}
S_{11}-S_{12}S_{22}^{-1}S_{21} & 0\\
0 & S_{22}
\end{pmatrix}.
\end{align*}
Taking determinants gives
\begin{align*}
\det S
=
\det S_{22}\det(S_{11}-S_{12}S_{22}^{-1}S_{21}).
\end{align*}
Dividing by $\det S_{11}\det S_{22}$ gives
\begin{align*}
\Lambda
&=
\frac{\det(S_{11}-S_{12}S_{22}^{-1}S_{21})}{\det S_{11}}\\
&=
\det\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right).
\end{align*}
Define
\begin{align*}
\hat{C}: \mathbb{R}^{q} &\to \mathbb{R}^{p}\\
v &\mapsto S_{11}^{-1/2}S_{12}S_{22}^{-1/2}v.
\end{align*}
Then
\begin{align*}
\hat{C}\hat{C}^{\top}
=
S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2},
\end{align*}
so
\begin{align*}
\Lambda=\det(I_p-\hat{C}\hat{C}^{\top}).
\end{align*}
[guided]
The determinant ratio becomes useful only after isolating the cross-covariance block $S_{12}$. Because $n>d$, the centered Gaussian sample covariance matrix is positive definite almost surely, so its principal blocks $S_{11}$ and $S_{22}$ are also positive definite almost surely. This lets us invert $S_{22}$ and take the positive definite square roots $S_{11}^{1/2}$ and $S_{22}^{1/2}$.
Apply the Schur complement determinant identity to the lower-right block $S_{22}$. The hypothesis needed for this identity is invertibility of $S_{22}$, which holds because $S_{22}$ is positive definite. In this concrete block matrix, the identity follows from the determinant-preserving block elimination
\begin{align*}
\begin{pmatrix}
I_p & -S_{12}S_{22}^{-1}\\
0 & I_q
\end{pmatrix}
S
\begin{pmatrix}
I_p & 0\\
-S_{22}^{-1}S_{21} & I_q
\end{pmatrix}
=
\begin{pmatrix}
S_{11}-S_{12}S_{22}^{-1}S_{21} & 0\\
0 & S_{22}
\end{pmatrix}.
\end{align*}
Both outside matrices are block triangular with identity diagonal blocks, so each has determinant $1$. Taking determinants therefore gives
\begin{align*}
\det S
=
\det S_{22}\det(S_{11}-S_{12}S_{22}^{-1}S_{21}).
\end{align*}
Substituting this into the determinant ratio gives
\begin{align*}
\Lambda
=
\frac{\det(S_{11}-S_{12}S_{22}^{-1}S_{21})}{\det S_{11}}.
\end{align*}
Now factor $S_{11}^{1/2}$ from both sides of the matrix inside the determinant:
\begin{align*}
S_{11}-S_{12}S_{22}^{-1}S_{21}
=
S_{11}^{1/2}
\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right)
S_{11}^{1/2}.
\end{align*}
Taking determinants yields
\begin{align*}
\det(S_{11}-S_{12}S_{22}^{-1}S_{21})
=
\det S_{11}\,
\det\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right).
\end{align*}
After cancellation of $\det S_{11}$, we obtain
\begin{align*}
\Lambda
=
\det\left(I_p-S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}\right).
\end{align*}
Finally define the whitened sample cross-covariance map
\begin{align*}
\hat{C}: \mathbb{R}^{q} &\to \mathbb{R}^{p}\\
v &\mapsto S_{11}^{-1/2}S_{12}S_{22}^{-1/2}v.
\end{align*}
Then
\begin{align*}
\hat{C}\hat{C}^{\top}
=
S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2},
\end{align*}
and hence
\begin{align*}
\Lambda=\det(I_p-\hat{C}\hat{C}^{\top}).
\end{align*}
[/guided]
[/step]
[step:Identify the eigenvalues with squared sample canonical correlations]
By definition, the sample canonical correlations $\hat{\rho}_1,\dots,\hat{\rho}_s$ are the singular values of the [linear map](/page/Linear%20Map) $\hat{C}:\mathbb{R}^{q}\to\mathbb{R}^{p}$. Therefore the nonzero eigenvalues of the positive semidefinite matrix $\hat{C}\hat{C}^{\top}$ are
\begin{align*}
\hat{\rho}_1^2,\dots,\hat{\rho}_s^2,
\end{align*}
with additional zero eigenvalues if $p>s$. Hence
\begin{align*}
\det(I_p-\hat{C}\hat{C}^{\top})
=
\prod_{k=1}^{s}(1-\hat{\rho}_k^2).
\end{align*}
Combining this identity with the determinant formula for $\Lambda$ gives
\begin{align*}
\Lambda
=
\prod_{k=1}^{s}(1-\hat{\rho}_k^2),
\end{align*}
which is the desired expression for Wilks' lambda.
[/step]