[proofplan]
We first reduce the statistic to a whitened normal-Wishart problem: under $H_0$, the standardized sample mean is standard Gaussian and is independent of the standardized sample covariance matrix. The core distributional calculation is then the identity that, for $Y \sim \mathcal{N}_p(0,I_p)$ independent of $B \sim W_p(n-1,I_p)$, the quadratic form $Y^\top B^{-1}Y$ is a ratio of independent chi-square variables. This follows by rotating $Y$ to the first coordinate axis and applying a Schur-complement decomposition of a standard Wishart matrix. The claimed $F$ distribution is then obtained by restoring the factor relating $B$ to $(n-1)S$.
[/proofplan]
[step:Whiten the sample mean and covariance matrix]
Let $\Sigma^{1/2} \in \mathbb{R}^{p \times p}$ denote the symmetric positive definite square root of $\Sigma$, and let $C=(\Sigma^{1/2})^{-1} \in \mathbb{R}^{p \times p}$ denote its symmetric positive definite inverse square root, so that $C\Sigma C = I_p$. Define the random vector $Y: \Omega \to \mathbb{R}^p$ and the random matrix $B: \Omega \to \mathbb{R}^{p \times p}$ by
\begin{align*}
Y &= \sqrt{n}\,C(\bar{X}-\mu_0), &
B &= (n-1)CSC.
\end{align*}
By the normal-Wishart sampling theorem for multivariate normal samples (citing a result not yet in the wiki: normal-Wishart independence theorem), under $H_0$ we have
\begin{align*}
Y &\sim \mathcal{N}_p(0,I_p), &
B &\sim W_p(n-1,I_p),
\end{align*}
and $Y$ and $B$ are independent. Here $W_p(\nu,\Psi)$ denotes the $p$-variate Wishart distribution with $\nu$ degrees of freedom and scale matrix $\Psi$, equivalently the law of $\sum_{k=1}^{\nu} Z_k Z_k^\top$ for independent random vectors $Z_k \sim \mathcal{N}_p(0,\Psi)$.
Since $n>p$, the Wishart matrix $B$ has $n-1 \ge p$ degrees of freedom and is invertible almost surely. Hence $S$ is invertible almost surely because
\begin{align*}
S = \frac{1}{n-1}\Sigma^{1/2}B\Sigma^{1/2}.
\end{align*}
On this almost sure event,
\begin{align*}
S^{-1} = (n-1)CB^{-1}C.
\end{align*}
Therefore
\begin{align*}
T^2
&= n(\bar{X}-\mu_0)^\top S^{-1}(\bar{X}-\mu_0) \\
&= n(\bar{X}-\mu_0)^\top (n-1)CB^{-1}C(\bar{X}-\mu_0) \\
&= (n-1)Y^\top B^{-1}Y.
\end{align*}
[guided]
The goal is to remove the unknown covariance matrix $\Sigma$ from the statistic. Since $\Sigma$ is symmetric positive definite, it has a symmetric positive definite square root $\Sigma^{1/2}$ and a symmetric positive definite inverse square root $C=\Sigma^{-1/2}$ satisfying $C\Sigma C=I_p$.
Define
\begin{align*}
Y &= \sqrt{n}\,C(\bar{X}-\mu_0), &
B &= (n-1)CSC.
\end{align*}
The random vector $Y: \Omega \to \mathbb{R}^p$ is the whitened sample mean, and the random matrix $B: \Omega \to \mathbb{R}^{p \times p}$ is the whitened scaled sample covariance matrix.
By the normal-Wishart sampling theorem for multivariate normal samples, under the null hypothesis $H_0:\mu=\mu_0$,
\begin{align*}
Y &\sim \mathcal{N}_p(0,I_p), &
B &\sim W_p(n-1,I_p),
\end{align*}
and $Y$ is independent of $B$. Here $W_p(\nu,\Psi)$ denotes the $p$-variate Wishart distribution with $\nu$ degrees of freedom and scale matrix $\Psi$, equivalently the law of $\sum_{k=1}^{\nu} Z_k Z_k^\top$ for independent random vectors $Z_k \sim \mathcal{N}_p(0,\Psi)$. The hypothesis $n>p$ is used here to ensure that the Wishart matrix has enough degrees of freedom: $n-1 \ge p$. A standard Wishart matrix with at least $p$ degrees of freedom is invertible almost surely, so $B$ is invertible almost surely.
The relation between $B$ and $S$ is
\begin{align*}
B=(n-1)CSC.
\end{align*}
Solving this for $S$ gives
\begin{align*}
S = \frac{1}{n-1}\Sigma^{1/2}B\Sigma^{1/2}.
\end{align*}
Thus, whenever $B$ is invertible,
\begin{align*}
S^{-1} = (n-1)CB^{-1}C.
\end{align*}
Substituting this into Hotelling's statistic gives
\begin{align*}
T^2
&= n(\bar{X}-\mu_0)^\top S^{-1}(\bar{X}-\mu_0) \\
&= n(\bar{X}-\mu_0)^\top (n-1)CB^{-1}C(\bar{X}-\mu_0).
\end{align*}
Because $Y=\sqrt{n}\,C(\bar{X}-\mu_0)$, this becomes
\begin{align*}
T^2 = (n-1)Y^\top B^{-1}Y.
\end{align*}
So the theorem is reduced to identifying the distribution of $Y^\top B^{-1}Y$ when $Y$ is standard Gaussian and independent of a standard Wishart matrix.
[/guided]
[/step]
[step:Compute the inverse-Wishart quadratic form by rotating the Gaussian vector]
Set $m=n-1$. We prove the following distributional identity: if $Y \sim \mathcal{N}_p(0,I_p)$, $B \sim W_p(m,I_p)$, $m \ge p$, and $Y$ is independent of $B$, then
\begin{align*}
\frac{m-p+1}{p}Y^\top B^{-1}Y \sim F_{p,m-p+1}.
\end{align*}
Define the nonnegative random variable $R: \Omega \to [0,\infty)$ by
\begin{align*}
R = (Y^\top Y)^{1/2}.
\end{align*}
Then $R^2=Y^\top Y \sim \chi^2_p$. On the event $Y\ne 0$, choose an orthogonal matrix $Q_Y \in \mathbb{R}^{p \times p}$ such that
\begin{align*}
Q_YY = R e_1,
\end{align*}
where $e_1=(1,0,\dots,0)^\top \in \mathbb{R}^p$. Since $Y=0$ has probability $0$, this defines the relevant distribution almost surely.
Conditional on $Y$, the matrix
\begin{align*}
A = Q_Y B Q_Y^\top
\end{align*}
has distribution $W_p(m,I_p)$ and is independent of $R^2$, because the standard Wishart law is invariant under deterministic orthogonal conjugation and $B$ is independent of $Y$. Hence
\begin{align*}
Y^\top B^{-1}Y
&= (Q_YY)^\top (Q_YBQ_Y^\top)^{-1}(Q_YY) \\
&= R^2 e_1^\top A^{-1}e_1.
\end{align*}
[claim:Schur complement of a standard Wishart matrix]
Let $A \sim W_p(m,I_p)$ with $m \ge p$. Write $A$ in block form
\begin{align*}
A =
\begin{pmatrix}
a_{11} & a_{12}^\top \\
a_{12} & A_{22}
\end{pmatrix},
\end{align*}
where $a_{11} \in \mathbb{R}$, $a_{12} \in \mathbb{R}^{p-1}$, and $A_{22} \in \mathbb{R}^{(p-1)\times(p-1)}$. Then $A_{22}$ is invertible almost surely, and the Schur complement
\begin{align*}
C_A = a_{11}-a_{12}^\top A_{22}^{-1}a_{12}
\end{align*}
satisfies
\begin{align*}
C_A \sim \chi^2_{m-p+1}.
\end{align*}
Moreover,
\begin{align*}
e_1^\top A^{-1}e_1 = C_A^{-1}.
\end{align*}
[/claim]
[proof]
Represent $A$ as
\begin{align*}
A = G^\top G,
\end{align*}
where $G: \Omega \to \mathbb{R}^{m \times p}$ is a random matrix whose entries are independent $\mathcal{N}(0,1)$ random variables. Partition $G$ by columns as
\begin{align*}
G = (g_1\; G_2),
\end{align*}
where $g_1: \Omega \to \mathbb{R}^m$ is the first column and $G_2: \Omega \to \mathbb{R}^{m \times (p-1)}$ is the remaining column block. Then
\begin{align*}
a_{11} &= g_1^\top g_1, &
a_{12} &= G_2^\top g_1, &
A_{22} &= G_2^\top G_2.
\end{align*}
Since $m \ge p$, the Gaussian matrix $G_2$ has rank $p-1$ almost surely, so $A_{22}=G_2^\top G_2$ is invertible almost surely.
On this event, define the [orthogonal projection](/theorems/437) matrix $P_2: \mathbb{R}^m \to \mathbb{R}^m$ onto $\operatorname{Range}(G_2)$ by
\begin{align*}
P_2 = G_2(G_2^\top G_2)^{-1}G_2^\top.
\end{align*}
Then
\begin{align*}
C_A
&= g_1^\top g_1 - g_1^\top G_2(G_2^\top G_2)^{-1}G_2^\top g_1 \\
&= g_1^\top(I_m-P_2)g_1.
\end{align*}
Conditional on $G_2$, the matrix $I_m-P_2$ is an orthogonal projection of rank
\begin{align*}
m-(p-1)=m-p+1.
\end{align*}
Since $g_1 \sim \mathcal{N}_m(0,I_m)$ and $g_1$ is independent of $G_2$, the quadratic form $g_1^\top(I_m-P_2)g_1$ has conditional distribution $\chi^2_{m-p+1}$. This conditional distribution does not depend on $G_2$, so $C_A \sim \chi^2_{m-p+1}$.
Finally, the block inverse formula for an invertible block matrix with invertible lower-right block gives
\begin{align*}
e_1^\top A^{-1}e_1
=
\left(a_{11}-a_{12}^\top A_{22}^{-1}a_{12}\right)^{-1}
=
C_A^{-1}.
\end{align*}
[/proof]
By the claim,
\begin{align*}
Y^\top B^{-1}Y = \frac{R^2}{C_A},
\end{align*}
where $R^2 \sim \chi^2_p$ and $C_A \sim \chi^2_{m-p+1}$. The random variables $R^2$ and $C_A$ are independent because $R^2$ is a function of $Y$ and $C_A$ is computed from the conditionally rotated Wishart matrix, whose conditional distribution is independent of $Y$. Therefore, by the definition of the $F$ distribution as a ratio of independent normalized chi-square variables,
\begin{align*}
\frac{m-p+1}{p}Y^\top B^{-1}Y
=
\frac{R^2/p}{C_A/(m-p+1)}
\sim F_{p,m-p+1}.
\end{align*}
[guided]
We now prove the distributional identity that drives the theorem. Let $m=n-1$, and suppose
\begin{align*}
Y &\sim \mathcal{N}_p(0,I_p), &
B &\sim W_p(m,I_p),
\end{align*}
with $Y$ independent of $B$ and $m\ge p$.
The first idea is rotational symmetry. Define the nonnegative random variable $R: \Omega \to [0,\infty)$ by
\begin{align*}
R = (Y^\top Y)^{1/2}.
\end{align*}
Since $Y$ is standard Gaussian in $\mathbb{R}^p$, we have $R^2=Y^\top Y\sim \chi^2_p$. On the event $Y\ne 0$, choose an orthogonal matrix $Q_Y\in\mathbb{R}^{p\times p}$ such that
\begin{align*}
Q_YY = R e_1,
\end{align*}
where $e_1=(1,0,\dots,0)^\top$. The event $Y=0$ has probability $0$, so ignoring it does not change the distribution of the statistic.
Now define
\begin{align*}
A = Q_YBQ_Y^\top.
\end{align*}
Conditional on $Y$, the matrix $Q_Y$ is deterministic. Since the standard Wishart distribution $W_p(m,I_p)$ is invariant under orthogonal conjugation, the conditional distribution of $A$ is again $W_p(m,I_p)$. Also, because $B$ is independent of $Y$, this conditional distribution does not depend on $Y$.
The quadratic form becomes
\begin{align*}
Y^\top B^{-1}Y
&= (Q_YY)^\top (Q_YBQ_Y^\top)^{-1}(Q_YY) \\
&= R^2 e_1^\top A^{-1}e_1.
\end{align*}
So the only remaining task is to identify the upper-left entry of $A^{-1}$ for a standard Wishart matrix $A$.
Write $A$ in block form:
\begin{align*}
A =
\begin{pmatrix}
a_{11} & a_{12}^\top \\
a_{12} & A_{22}
\end{pmatrix},
\end{align*}
where $a_{11}\in\mathbb{R}$, $a_{12}\in\mathbb{R}^{p-1}$, and $A_{22}\in\mathbb{R}^{(p-1)\times(p-1)}$. We claim that
\begin{align*}
e_1^\top A^{-1}e_1
=
\left(a_{11}-a_{12}^\top A_{22}^{-1}a_{12}\right)^{-1},
\end{align*}
and that the Schur complement
\begin{align*}
C_A = a_{11}-a_{12}^\top A_{22}^{-1}a_{12}
\end{align*}
has distribution $\chi^2_{m-p+1}$.
To prove this, represent the Wishart matrix as $A=G^\top G$, where $G:\Omega\to\mathbb{R}^{m\times p}$ has independent standard normal entries. Partition $G$ by columns:
\begin{align*}
G=(g_1\;G_2),
\end{align*}
with $g_1:\Omega\to\mathbb{R}^m$ and $G_2:\Omega\to\mathbb{R}^{m\times(p-1)}$. Then
\begin{align*}
a_{11} &= g_1^\top g_1, &
a_{12} &= G_2^\top g_1, &
A_{22} &= G_2^\top G_2.
\end{align*}
Since $m\ge p$, the Gaussian matrix $G_2$ has full column rank $p-1$ almost surely, so $A_{22}$ is invertible almost surely.
Define the orthogonal projection matrix $P_2:\mathbb{R}^m\to\mathbb{R}^m$ onto the column space $\operatorname{Range}(G_2)$ by
\begin{align*}
P_2 = G_2(G_2^\top G_2)^{-1}G_2^\top.
\end{align*}
Then the Schur complement is
\begin{align*}
C_A
&= g_1^\top g_1 - g_1^\top G_2(G_2^\top G_2)^{-1}G_2^\top g_1 \\
&= g_1^\top(I_m-P_2)g_1.
\end{align*}
Conditional on $G_2$, the matrix $I_m-P_2$ is an orthogonal projection. Its rank is the [dimension of the orthogonal complement](/theorems/3297) of $\operatorname{Range}(G_2)$, namely
\begin{align*}
m-(p-1)=m-p+1.
\end{align*}
Since $g_1$ is an $\mathcal{N}_m(0,I_m)$ random vector independent of $G_2$, the projected squared length $g_1^\top(I_m-P_2)g_1$ is chi-square with $m-p+1$ degrees of freedom. Hence
\begin{align*}
C_A\sim \chi^2_{m-p+1}.
\end{align*}
The block inverse formula gives
\begin{align*}
e_1^\top A^{-1}e_1
=
C_A^{-1}.
\end{align*}
Therefore
\begin{align*}
Y^\top B^{-1}Y = \frac{R^2}{C_A}.
\end{align*}
Here $R^2\sim\chi^2_p$, while $C_A\sim\chi^2_{m-p+1}$, and these two variables are independent because $R^2$ depends only on $Y$ whereas the Schur complement comes from the independently rotated Wishart matrix.
By the definition of the $F$ distribution,
\begin{align*}
\frac{R^2/p}{C_A/(m-p+1)} \sim F_{p,m-p+1}.
\end{align*}
Equivalently,
\begin{align*}
\frac{m-p+1}{p}Y^\top B^{-1}Y \sim F_{p,m-p+1}.
\end{align*}
[/guided]
[/step]
[step:Restore the Hotelling scaling]
From the first step,
\begin{align*}
T^2=(n-1)Y^\top B^{-1}Y.
\end{align*}
From the second step with $m=n-1$,
\begin{align*}
\frac{n-p}{p}Y^\top B^{-1}Y \sim F_{p,n-p}.
\end{align*}
Multiplying the identity $T^2=(n-1)Y^\top B^{-1}Y$ by $\frac{n-p}{p(n-1)}$ gives
\begin{align*}
\frac{n-p}{p(n-1)}T^2
=
\frac{n-p}{p}Y^\top B^{-1}Y
\sim F_{p,n-p}.
\end{align*}
This is the asserted distribution of the one-sample Hotelling statistic under $H_0$.
[/step]