[guided]Now we identify the random matrix that appears in the denominator of $T^2$. Define
\begin{align*}
W := \nu S_p,
\end{align*}
where $\nu=n_1+n_2-2$. We use the following convention for [Wishart distribution](/page/Wishart%20Distribution) notation: for a positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$ and an integer $m \geq 0$, the notation $W_p(\Sigma,m)$ denotes the law of
\begin{align*}
\sum_{k=1}^{m} G_kG_k^\top,
\end{align*}
where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent random vectors with distribution $\mathcal{N}_p(0,\Sigma)$. If $m=0$, this sum is the zero matrix. This convention covers the cases $n_1=1$ or $n_2=1$, where one residual scatter matrix has zero degrees of freedom.
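As a quick sanity check on this convention (a short computation, not part of the argument; the symbol $M$ is introduced here only for illustration): if $M \sim W_p(\Sigma,m)$, then linearity of expectation gives
\begin{align*}
\mathbb{E}[M] = \sum_{k=1}^{m} \mathbb{E}\left[G_kG_k^\top\right] = m\Sigma,
\end{align*}
since each $G_k$ has mean zero and covariance matrix $\Sigma$. For $m=0$ both sides are the zero matrix, consistent with the empty-sum convention.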
By the definition of $S_p$,
\begin{align*}
W =
\sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top
+
\sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top.
\end{align*}
For an i.i.d. sample of size $n \geq 1$ from $\mathcal{N}_p(\mu,\Sigma)$, the sample mean and the residual scatter matrix are independent, and the residual scatter matrix has distribution $W_p(\Sigma,n-1)$. Applying this to the $X$-sample gives
\begin{align*}
\sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top \sim W_p(\Sigma,n_1-1),
\end{align*}
and this matrix is independent of $\bar{X}$. Applying the same fact to the $Y$-sample gives
\begin{align*}
\sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top \sim W_p(\Sigma,n_2-1),
\end{align*}
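and this matrix is independent of $\bar{Y}$.
Within each sample the mean is independent of the scatter matrix, and the two samples are independent of each other. Hence $\bar{X}$, $\bar{Y}$, and the two scatter matrices are mutually independent; in particular, the pair of scatter matrices is independent of the pair $(\bar{X},\bar{Y})$. Because $Z$ is a function of $(\bar{X},\bar{Y})$ alone and $W$ is a function of the two scatter matrices alone, $W$ is independent of $Z$.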
The sum of independent Wishart matrices with the same scale matrix is Wishart, with degrees of freedom adding. Hence
\begin{align*}
W \sim W_p(\Sigma,(n_1-1)+(n_2-1)) = W_p(\Sigma,\nu).
\end{align*}
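One way to see this additivity directly from the convention above is the following sketch. The names $A$, $B$, $G_k$, and $H_l$ are introduced here for illustration only. Write $A$ and $B$ for the $X$- and $Y$-scatter matrices. The law of $A+B$ depends only on the joint law of $(A,B)$, and $A$ and $B$ are independent, so we may assume that
\begin{align*}
A = \sum_{k=1}^{n_1-1} G_kG_k^\top,
\qquad
B = \sum_{l=1}^{n_2-1} H_lH_l^\top,
\end{align*}
where $G_1,\dots,G_{n_1-1},H_1,\dots,H_{n_2-1}$ are independent random vectors with distribution $\mathcal{N}_p(0,\Sigma)$. Then
\begin{align*}
A+B = \sum_{k=1}^{n_1-1} G_kG_k^\top + \sum_{l=1}^{n_2-1} H_lH_l^\top
\end{align*}
is a sum of $(n_1-1)+(n_2-1)=\nu$ outer products of independent $\mathcal{N}_p(0,\Sigma)$ vectors, which is the defining form of $W_p(\Sigma,\nu)$.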
The condition $n_1+n_2-1>p$ is exactly the condition $\nu>p-1$, which guarantees that a $W_p(\Sigma,\nu)$ matrix is nonsingular almost surely. Therefore $S_p=W/\nu$ is also nonsingular almost surely.
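For completeness, the equivalence of the two conditions is a one-line computation from $\nu=n_1+n_2-2$:
\begin{align*}
\nu > p-1
\iff n_1+n_2-2 > p-1
\iff n_1+n_2-1 > p.
\end{align*}
Since $\nu$ and $p$ are integers, this is the same as $\nu \geq p$: the number of Gaussian outer products in $W$ is at least the dimension.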
This step relies on two standard facts that are not yet in the wiki: the independence of the sample mean and the sample scatter matrix for a Gaussian sample, together with the Wishart distribution of that scatter matrix, and the additivity of independent Wishart matrices with a common scale matrix.[/guided]