[guided]The purpose of this step is to isolate the part of the data containing the sample mean from the part containing the sample covariance. We do this by rotating the $n$ observations in the sample-index direction.
Choose an orthogonal matrix $Q\in \mathbb{R}^{n\times n}$ whose first row is
\begin{align*}
\left(\frac{1}{\sqrt n},\dots,\frac{1}{\sqrt n}\right).
\end{align*}
For each $a=1,\dots,n$, define
\begin{align*}
Z_a=\sum_{i=1}^n Q_{ai}Y_i.
\end{align*}
This is a linear transformation of the jointly Gaussian vector $(Y_1,\dots,Y_n)$, so $(Z_1,\dots,Z_n)$ is jointly Gaussian. Orthogonality of $Q$ preserves the covariance in the sample-index direction. More explicitly, for $a,b\in\{1,\dots,n\}$,
\begin{align*}
\operatorname{Cov}(Z_a,Z_b)
&= \sum_{i=1}^n\sum_{j=1}^n Q_{ai}Q_{bj}\operatorname{Cov}(Y_i,Y_j) \\
&= \sum_{i=1}^n Q_{ai}Q_{bi} I_p \\
&= \delta_{ab}I_p,
\end{align*}
where $\delta_{ab}$ denotes the Kronecker delta. Since the vector is jointly Gaussian, zero covariance between distinct $Z_a$ and $Z_b$ implies independence.
The means are computed from $\mathbb{E}[Y_i]=\delta$:
\begin{align*}
\mathbb{E}[Z_a]
&= \sum_{i=1}^n Q_{ai}\delta \\
&= \left(\sum_{i=1}^n Q_{ai}\right)\delta.
\end{align*}
For $a=1$, the row sum is $\sqrt n$, so $\mathbb{E}[Z_1]=\sqrt n\,\delta$. For $a\ge 2$, the row of $Q$ is orthogonal to the first row, hence its entries sum to $0$, so $\mathbb{E}[Z_a]=0$. Therefore
\begin{align*}
Z_1\sim \mathcal N_p(\sqrt n\,\delta,I_p),
\end{align*}
and $Z_2,\dots,Z_n$ are independent central $\mathcal N_p(0,I_p)$ random vectors, independent of $Z_1$.
The first transformed vector is the scaled sample mean:
\begin{align*}
Z_1=\sum_{i=1}^n \frac{1}{\sqrt n}Y_i=\sqrt n\,\bar Y.
\end{align*}
The remaining transformed vectors span the orthogonal complement of the constant sample direction, so they contain exactly the centered residual information. Orthogonality of $Q$ gives
\begin{align*}
\sum_{i=1}^n Y_iY_i^\top
=
\sum_{a=1}^n Z_aZ_a^\top.
\end{align*}
Since $Z_1Z_1^\top=n\bar Y\bar Y^\top$, subtracting this rank-one mean part gives
\begin{align*}
\sum_{i=1}^n (Y_i-\bar Y)(Y_i-\bar Y)^\top
=
\sum_{a=2}^n Z_aZ_a^\top.
\end{align*}
Define
\begin{align*}
W=\sum_{a=2}^n Z_aZ_a^\top.
\end{align*}
Because $Z_2,\dots,Z_n$ are independent $\mathcal N_p(0,I_p)$ random vectors, $W$ has the central Wishart distribution $W_p(I_p,n-1)$. It is independent of $Z_1$ because it is a measurable function of $Z_2,\dots,Z_n$, which are independent of $Z_1$.
Finally,
\begin{align*}
S_Y=\frac{1}{n-1}W,
\end{align*}
and therefore
\begin{align*}
T^2
&= n\bar Y^\top S_Y^{-1}\bar Y \\
&= Z_1^\top \left(\frac{1}{n-1}W\right)^{-1}Z_1 \\
&= (n-1)Z_1^\top W^{-1}Z_1.
\end{align*}
This reduces the theorem to a distributional statement about one noncentral Gaussian vector independent of one central Wishart matrix.[/guided]