[guided]The Gaussian likelihood depends on $\Sigma$ through two terms: a volume term $\log\det\Sigma$ and a quadratic-fit term involving $\Sigma^{-1}S$. Let $d=p+q$, and let $(\Omega,\mathcal F,\mathbb P)$ be the underlying probability space. Let $Y_1,\dots,Y_n: (\Omega,\mathcal F) \to (\mathbb{R}^{d},\mathcal B(\mathbb{R}^{d}))$ be the observed independent Gaussian random vectors. Define
\begin{align*}
\bar{Y}
&=
\frac{1}{n}\sum_{i=1}^{n}Y_i,\qquad
S
=
\frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})(Y_i-\bar{Y})^\top.
\end{align*}
We partition $S$ according to the decomposition $\mathbb{R}^{d}=\mathbb{R}^{p}\times\mathbb{R}^{q}$:
\begin{align*}
S=
\begin{pmatrix}
S_{11} & S_{12}\\
S_{21} & S_{22}
\end{pmatrix},
\end{align*}
where $S_{11}\in\mathbb{R}^{p\times p}$, $S_{12}\in\mathbb{R}^{p\times q}$, $S_{21}\in\mathbb{R}^{q\times p}$, and $S_{22}\in\mathbb{R}^{q\times q}$.
Let $\mathcal{P}_d$ be the set of positive definite symmetric matrices in $\mathbb{R}^{d\times d}$. After maximizing over the mean, the relevant log-likelihood is the map
\begin{align*}
\ell: \mathcal{P}_d &\to \mathbb{R}\\
\Sigma &\mapsto
-\frac{n}{2}\log\det \Sigma
-\frac{n}{2}\operatorname{tr}(\Sigma^{-1}S).
\end{align*}
Why is the maximizer $S$? For $\Sigma\in\mathcal{P}_d$, set $M=\Sigma^{-1/2}S\Sigma^{-1/2}\in\mathbb{R}^{d\times d}$. Since $n>d$, the centered sample covariance matrix $S$ is positive definite almost surely. If $\lambda_1,\dots,\lambda_d>0$ are the eigenvalues of $M$, then
\begin{align*}
\ell(\Sigma)-\ell(S)
&=
-\frac{n}{2}\left(\log\det M^{-1}+\operatorname{tr}M-d\right)\\
&=
-\frac{n}{2}\sum_{j=1}^{d}\left(\lambda_j-1-\log\lambda_j\right).
\end{align*}
The scalar inequality $\log t\leq t-1$ for $t>0$ gives $\lambda_j-1-\log\lambda_j\geq 0$ for each $j$, so $\ell(\Sigma)\leq \ell(S)$. Equality holds precisely when every $\lambda_j=1$, which is equivalent to $M=I_d$ and hence $\Sigma=S$. Therefore the unrestricted maximizer is $\hat{\Sigma}=S$.
Under the null hypothesis $\Sigma_{12}=0$, the covariance matrix must be block diagonal:
\begin{align*}
\Sigma_0=
\begin{pmatrix}
A & 0\\
0 & B
\end{pmatrix},
\end{align*}
with $A$ and $B$ positive definite. Its inverse and determinant are
\begin{align*}
\Sigma_0^{-1}
=
\begin{pmatrix}
A^{-1} & 0\\
0 & B^{-1}
\end{pmatrix},
\qquad
\det\Sigma_0=(\det A)(\det B).
\end{align*}
Substituting into the log-likelihood gives
\begin{align*}
\ell(\Sigma_0)
=
-\frac{n}{2}\log\det A
-\frac{n}{2}\log\det B
-\frac{n}{2}\operatorname{tr}(A^{-1}S_{11})
-\frac{n}{2}\operatorname{tr}(B^{-1}S_{22}).
\end{align*}
The $A$-terms and $B$-terms are independent, so the constrained likelihood maximizes each block separately. The same eigenvalue argument just used applies to the $p\times p$ block with empirical covariance $S_{11}$ and to the $q\times q$ block with empirical covariance $S_{22}$. Since $S$ is positive definite almost surely, both principal blocks $S_{11}$ and $S_{22}$ are positive definite almost surely, so the argument applies without degeneracy. Hence the constrained covariance estimator is
\begin{align*}
\hat{\Sigma}_0=
\begin{pmatrix}
S_{11} & 0\\
0 & S_{22}
\end{pmatrix}.
\end{align*}
At both maximizers, the trace term contributes the same dimension-dependent constant. Thus the maximized likelihood ratio is
\begin{align*}
\left(\frac{\det S}{\det S_{11}\det S_{22}}\right)^{n/2}.
\end{align*}
The conventional Wilks' lambda statistic is the determinant ratio inside this power, namely
\begin{align*}
\Lambda
=
\frac{\det S}{\det S_{11}\det S_{22}}.
\end{align*}[/guided]