[proofplan]
We reduce the two-sample statistic to the standard normal vector plus independent Wishart matrix form. Under $H_0$, the scaled mean difference is $\mathcal{N}_p(0,\Sigma)$. The pooled scatter matrix has a Wishart distribution with $n_1+n_2-2$ degrees of freedom and is independent of the mean difference. After this reduction, the standard Hotelling-Wishart distributional calculation yields the stated $F$ law.
[/proofplan]
[step:Scale the sample mean difference to a centered multivariate normal vector]
Let
\begin{align*}
\nu &:= n_1+n_2-2, &
a &:= \left(\frac{1}{n_1}+\frac{1}{n_2}\right)^{-1/2}.
\end{align*}
Since the samples are independent and each sample consists of independent Gaussian random vectors,
\begin{align*}
\bar{X} &\sim \mathcal{N}_p\left(\mu_X,\frac{1}{n_1}\Sigma\right), &
\bar{Y} &\sim \mathcal{N}_p\left(\mu_Y,\frac{1}{n_2}\Sigma\right),
\end{align*}
and $\bar{X}$ is independent of $\bar{Y}$. Therefore
\begin{align*}
\bar{X}-\bar{Y}
\sim
\mathcal{N}_p\left(\mu_X-\mu_Y,\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\Sigma\right).
\end{align*}
Under $H_0:\mu_X=\mu_Y$, define the scaled mean-difference random vector
\begin{align*}
Z:\Omega &\to \mathbb{R}^p \\
\omega &\mapsto a\bigl(\bar{X}(\omega)-\bar{Y}(\omega)\bigr).
\end{align*}
Then
\begin{align*}
Z \sim \mathcal{N}_p(0,\Sigma).
\end{align*}
[guided]
The first goal is to turn the two-sample mean difference into the same object that appears in the one-sample Hotelling calculation: one centered Gaussian vector with covariance matrix $\Sigma$.
Define
\begin{align*}
\nu &:= n_1+n_2-2, &
a &:= \left(\frac{1}{n_1}+\frac{1}{n_2}\right)^{-1/2}.
\end{align*}
The sample mean $\bar{X}$ is a linear combination of independent $\mathcal{N}_p(\mu_X,\Sigma)$ random vectors, so
\begin{align*}
\bar{X}\sim \mathcal{N}_p\left(\mu_X,\frac{1}{n_1}\Sigma\right).
\end{align*}
Similarly,
\begin{align*}
\bar{Y}\sim \mathcal{N}_p\left(\mu_Y,\frac{1}{n_2}\Sigma\right).
\end{align*}
Because the entire $X$-sample is independent of the entire $Y$-sample, $\bar{X}$ and $\bar{Y}$ are independent. The difference of independent Gaussian random vectors is Gaussian, with means subtracting and covariance matrices adding:
\begin{align*}
\bar{X}-\bar{Y}
\sim
\mathcal{N}_p\left(\mu_X-\mu_Y,\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\Sigma\right).
\end{align*}
Under $H_0$, the mean term vanishes. Therefore the map
\begin{align*}
Z:\Omega &\to \mathbb{R}^p \\
\omega &\mapsto a\bigl(\bar{X}(\omega)-\bar{Y}(\omega)\bigr)
\end{align*}
satisfies
\begin{align*}
Z \sim \mathcal{N}_p(0,\Sigma).
\end{align*}
The scalar $a$ was chosen exactly so that the covariance factor $\frac{1}{n_1}+\frac{1}{n_2}$ is removed.
[/guided]
[/step]
[step:Identify the pooled covariance matrix as an independent Wishart estimate]
Define the pooled scatter matrix
\begin{align*}
W := \nu S_p.
\end{align*}
We use the following convention for [Wishart distribution](/page/Wishart%20Distribution) notation: for a positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$ and an integer $m \geq 0$, the notation $W_p(\Sigma,m)$ denotes the law of
\begin{align*}
\sum_{k=1}^{m} G_kG_k^\top,
\end{align*}
where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent random vectors with distribution $\mathcal{N}_p(0,\Sigma)$; when $m=0$, the sum is the zero matrix.
By the standard Gaussian sample-mean and residual-scatter decomposition, for the first sample,
\begin{align*}
\sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top \sim W_p(\Sigma,n_1-1),
\end{align*}
and this scatter matrix is independent of $\bar{X}$. For the second sample,
\begin{align*}
\sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top \sim W_p(\Sigma,n_2-1),
\end{align*}
and this scatter matrix is independent of $\bar{Y}$. Since the two samples are independent, the two scatter matrices are independent of each other and jointly independent of $(\bar{X},\bar{Y})$.
The additivity of independent Wishart matrices with common scale matrix $\Sigma$ gives
\begin{align*}
W \sim W_p(\Sigma,\nu).
\end{align*}
Moreover, $W$ is independent of $Z$ because $Z$ is a measurable function of $(\bar{X},\bar{Y})$. Since $n_1+n_2-1>p$, we have $\nu>p-1$, so $W$ and $S_p$ are invertible almost surely.
Here we are using the standard Gaussian-Wishart decomposition and Wishart additivity facts (citing results not yet in the wiki: independence of Gaussian sample mean and sample scatter; additivity of independent Wishart matrices).
[guided]
Now we identify the random matrix that appears in the denominator of $T^2$. Define
\begin{align*}
W := \nu S_p,
\end{align*}
where $\nu=n_1+n_2-2$. We use the following convention for [Wishart distribution](/page/Wishart%20Distribution) notation: for a positive definite matrix $\Sigma \in \mathbb{R}^{p \times p}$ and an integer $m \geq 0$, the notation $W_p(\Sigma,m)$ denotes the law of
\begin{align*}
\sum_{k=1}^{m} G_kG_k^\top,
\end{align*}
where $G_1,\dots,G_m:\Omega\to\mathbb{R}^p$ are independent random vectors with distribution $\mathcal{N}_p(0,\Sigma)$. If $m=0$, this sum is the zero matrix. This convention covers the cases $n_1=1$ or $n_2=1$, where one residual scatter matrix has zero degrees of freedom.
By the definition of $S_p$,
\begin{align*}
W =
\sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top
+
\sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top.
\end{align*}
For a Gaussian sample, the sample mean and residual scatter matrix are independent, and the residual scatter matrix has a Wishart distribution. Applying this to the $X$-sample gives
\begin{align*}
\sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^\top \sim W_p(\Sigma,n_1-1),
\end{align*}
with independence from $\bar{X}$. Applying the same fact to the $Y$-sample gives
\begin{align*}
\sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^\top \sim W_p(\Sigma,n_2-1),
\end{align*}
with independence from $\bar{Y}$.
The two samples are independent, so the two scatter matrices are independent of each other. They are also jointly independent of the pair $(\bar{X},\bar{Y})$. Because $Z$ is defined only from $(\bar{X},\bar{Y})$, this implies that $W$ is independent of $Z$.
The sum of independent Wishart matrices with the same scale matrix is Wishart, with degrees of freedom adding. Hence
\begin{align*}
W \sim W_p(\Sigma,(n_1-1)+(n_2-1)) = W_p(\Sigma,\nu).
\end{align*}
The condition $n_1+n_2-1>p$ is exactly the condition $\nu>p-1$, which guarantees that a $W_p(\Sigma,\nu)$ matrix is nonsingular almost surely. Therefore $S_p=W/\nu$ is also nonsingular almost surely.
This step uses the standard Gaussian-Wishart decomposition and Wishart additivity facts (citing results not yet in the wiki: independence of Gaussian sample mean and sample scatter; additivity of independent Wishart matrices).
[/guided]
[/step]
[step:Rewrite $T^2$ in the one-sample Hotelling form]
Since
\begin{align*}
a^2=\left(\frac{1}{n_1}+\frac{1}{n_2}\right)^{-1}
=\frac{n_1n_2}{n_1+n_2},
\end{align*}
the definition of $Z$ gives
\begin{align*}
Z^\top S_p^{-1}Z
&=
a^2(\bar{X}-\bar{Y})^\top S_p^{-1}(\bar{X}-\bar{Y}) \\
&=
\frac{n_1n_2}{n_1+n_2}
(\bar{X}-\bar{Y})^\top S_p^{-1}(\bar{X}-\bar{Y}) \\
&= T^2.
\end{align*}
Since $S_p=W/\nu$, we also have $S_p^{-1}=\nu W^{-1}$ almost surely, and therefore
\begin{align*}
T^2 = \nu Z^\top W^{-1}Z.
\end{align*}
Thus $T^2$ has the standard Hotelling-Wishart form with $Z\sim \mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, and $Z$ independent of $W$.
[/step]
[step:Apply the Hotelling-Wishart distribution calculation]
The standard Hotelling-Wishart distribution calculation states that if $Z\sim\mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, $Z$ and $W$ are independent, $\Sigma$ is positive definite, and $\nu>p-1$, then
\begin{align*}
\frac{\nu-p+1}{p\nu}\,\nu Z^\top W^{-1}Z \sim F_{p,\nu-p+1}.
\end{align*}
The hypotheses have been verified: $\Sigma$ is positive definite by assumption, $Z\sim\mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, $Z$ and $W$ are independent, and $\nu>p-1$.
Substituting $T^2=\nu Z^\top W^{-1}Z$ and $\nu=n_1+n_2-2$ yields
\begin{align*}
\frac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2
\sim
F_{p,n_1+n_2-p-1}.
\end{align*}
This is the desired distribution under $H_0$.
Here we are using the standard one-sample Hotelling-Wishart distribution calculation (citing a result not yet in the wiki: one-sample Hotelling $T^2$ distribution theorem).
[guided]
At this point the two-sample problem has been reduced to a standard distributional form. We have a Gaussian vector
\begin{align*}
Z\sim\mathcal{N}_p(0,\Sigma),
\end{align*}
an independent Wishart matrix
\begin{align*}
W\sim W_p(\Sigma,\nu),
\end{align*}
and the statistic has been rewritten as
\begin{align*}
T^2=\nu Z^\top W^{-1}Z.
\end{align*}
The standard Hotelling-Wishart distribution calculation says that whenever $\Sigma$ is positive definite, $Z\sim\mathcal{N}_p(0,\Sigma)$, $W\sim W_p(\Sigma,\nu)$, $Z$ and $W$ are independent, and $\nu>p-1$, then
\begin{align*}
\frac{\nu-p+1}{p\nu}\,\nu Z^\top W^{-1}Z \sim F_{p,\nu-p+1}.
\end{align*}
We verify each condition in the present setting. The positive definiteness of $\Sigma$ is an assumption of the theorem. The distribution of $Z$ was proved in the first step. The distribution of $W$ and the independence of $Z$ and $W$ were proved in the second step. Finally,
\begin{align*}
\nu>p-1
\end{align*}
is equivalent to
\begin{align*}
n_1+n_2-2>p-1,
\end{align*}
which is exactly the stated hypothesis $n_1+n_2-1>p$.
Therefore the Hotelling-Wishart calculation applies. Substituting $T^2=\nu Z^\top W^{-1}Z$ gives
\begin{align*}
\frac{\nu-p+1}{p\nu}T^2 \sim F_{p,\nu-p+1}.
\end{align*}
Since $\nu=n_1+n_2-2$, the numerator degrees of freedom remain $p$, and the denominator degrees of freedom become
\begin{align*}
\nu-p+1 = n_1+n_2-p-1.
\end{align*}
Thus
\begin{align*}
\frac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2
\sim
F_{p,n_1+n_2-p-1}.
\end{align*}
This proves the claimed null distribution.
This final step uses the standard one-sample Hotelling-Wishart distribution calculation (citing a result not yet in the wiki: one-sample Hotelling $T^2$ distribution theorem).
[/guided]
[/step]