[proofplan]
We rotate the $n$ sample index directions by an orthogonal matrix whose first row averages the observations. After stacking the sample into one vector in $\mathbb{R}^{np}$, this orthogonal rotation separates the averaged component $\sqrt n\,\bar X$ from the $n-1$ residual components. The transformed vector is Gaussian, and its first block has zero covariance with every residual block, so the first block is independent of the residual blocks by the characteristic-function factorisation for Gaussian vectors. Finally, the sample covariance is a measurable function only of the residual blocks, so independence passes from the residual vector to $S$.
[/proofplan]
[step:Choose an orthogonal change of sample coordinates whose first row averages the observations]
Let $e \in \mathbb{R}^n$ be the unit vector
\begin{align*}
e := \frac{1}{\sqrt n}(1,\dots,1).
\end{align*}
By extending $e$ to an orthonormal basis of $\mathbb{R}^n$, choose an orthogonal matrix $A \in \mathbb{R}^{n \times n}$ whose first row is $e^\top$. Thus
\begin{align*}
A A^\top = A^\top A = I_n,
\qquad
A_{1i} = \frac{1}{\sqrt n}
\quad \text{for every } i \in \{1,\dots,n\}.
\end{align*}
Define the stacked sample vector $Z: \Omega \to \mathbb{R}^{np}$ by
\begin{align*}
Z :=
\begin{pmatrix}
X_1\\
\vdots\\
X_n
\end{pmatrix}.
\end{align*}
Since $X_1,\dots,X_n$ are independent and each has distribution $\mathcal{N}_p(\mu,\Sigma)$, the vector $Z$ is Gaussian in $\mathbb{R}^{np}$ with mean vector
\begin{align*}
m :=
\begin{pmatrix}
\mu\\
\vdots\\
\mu
\end{pmatrix}
\in \mathbb{R}^{np}
\end{align*}
and covariance matrix $I_n \otimes \Sigma$.
Let $T: \mathbb{R}^{np} \to \mathbb{R}^{np}$ be the [linear map](/page/Linear%20Map) represented by $A \otimes I_p$, and define
\begin{align*}
Y: \Omega &\to \mathbb{R}^{np}\\
\omega &\mapsto T(Z(\omega)).
\end{align*}
Write $Y$ in $p$-dimensional blocks as
\begin{align*}
Y =
\begin{pmatrix}
Y_1\\
\vdots\\
Y_n
\end{pmatrix},
\qquad
Y_j: \Omega \to \mathbb{R}^p.
\end{align*}
For the first block,
\begin{align*}
Y_1
= \sum_{i=1}^n A_{1i}X_i
= \frac{1}{\sqrt n}\sum_{i=1}^n X_i
= \sqrt n\,\bar X.
\end{align*}
[guided]
The goal is to separate the average direction from all directions orthogonal to it. The vector
\begin{align*}
e := \frac{1}{\sqrt n}(1,\dots,1) \in \mathbb{R}^n
\end{align*}
has Euclidean norm $1$, and the linear functional $x \mapsto e \cdot x$ computes the normalized sum of the $n$ sample-index coordinates. Extending $e$ to an orthonormal basis of $\mathbb{R}^n$ gives an orthogonal matrix $A \in \mathbb{R}^{n \times n}$ whose first row is $e^\top$. Hence
\begin{align*}
A A^\top = A^\top A = I_n,
\qquad
A_{1i} = \frac{1}{\sqrt n}
\quad \text{for every } i \in \{1,\dots,n\}.
\end{align*}
Now stack the $p$-dimensional observations into one $np$-dimensional random vector:
\begin{align*}
Z: \Omega &\to \mathbb{R}^{np}\\
\omega &\mapsto
\begin{pmatrix}
X_1(\omega)\\
\vdots\\
X_n(\omega)
\end{pmatrix}.
\end{align*}
Independence of $X_1,\dots,X_n$ and the common law $\mathcal{N}_p(\mu,\Sigma)$ imply that $Z$ is Gaussian with mean
\begin{align*}
m =
\begin{pmatrix}
\mu\\
\vdots\\
\mu
\end{pmatrix}
\in \mathbb{R}^{np}
\end{align*}
and block covariance matrix $I_n \otimes \Sigma$: the diagonal covariance blocks are $\Sigma$, and the off-diagonal covariance blocks vanish because the observations are independent.
Define the linear map $T: \mathbb{R}^{np} \to \mathbb{R}^{np}$ by the matrix $A \otimes I_p$, and define the transformed random vector
\begin{align*}
Y: \Omega &\to \mathbb{R}^{np}\\
\omega &\mapsto T(Z(\omega)).
\end{align*}
Writing $Y$ in $p$-dimensional blocks as $Y=(Y_1,\dots,Y_n)^\top$, the first block is
\begin{align*}
Y_1
= \sum_{i=1}^n A_{1i}X_i
= \frac{1}{\sqrt n}\sum_{i=1}^n X_i
= \sqrt n\,\bar X.
\end{align*}
Thus the first transformed block is exactly the sample mean, up to the nonzero scalar factor $\sqrt n$.
[/guided]
[/step]
[step:Compute the transformed covariance and isolate the mean block]
Because $Y=(A\otimes I_p)Z$ and $Z$ has covariance $I_n\otimes \Sigma$, the covariance matrix of $Y$ is
\begin{align*}
(A\otimes I_p)(I_n\otimes \Sigma)(A^\top\otimes I_p)
= (A I_n A^\top)\otimes (I_p\Sigma I_p)
= I_n\otimes \Sigma.
\end{align*}
Therefore, for $j,k \in \{1,\dots,n\}$,
\begin{align*}
\operatorname{Cov}(Y_j,Y_k)
=
\begin{cases}
\Sigma, & j=k,\\
0, & j\neq k.
\end{cases}
\end{align*}
In particular,
\begin{align*}
\operatorname{Cov}(Y_1,Y_j)=0
\quad \text{for every } j \in \{2,\dots,n\}.
\end{align*}
[guided]
Linear transformations preserve Gaussianity, so $Y=(A\otimes I_p)Z$ is again a Gaussian vector in $\mathbb{R}^{np}$. Its covariance is obtained by the standard covariance transformation rule for a linear map: if a vector has covariance matrix $C$ and is multiplied by a matrix $M$, the transformed covariance is $MCM^\top$. Here
\begin{align*}
M = A\otimes I_p,
\qquad
C = I_n\otimes \Sigma.
\end{align*}
Using the mixed-product identity for Kronecker products, we compute
\begin{align*}
(A\otimes I_p)(I_n\otimes \Sigma)(A^\top\otimes I_p)
= (A I_n A^\top)\otimes (I_p\Sigma I_p)
= I_n\otimes \Sigma,
\end{align*}
because $A$ is orthogonal and therefore $A A^\top=I_n$.
This block diagonal covariance means that the transformed blocks $Y_1,\dots,Y_n$ have covariance
\begin{align*}
\operatorname{Cov}(Y_j,Y_k)
=
\begin{cases}
\Sigma, & j=k,\\
0, & j\neq k.
\end{cases}
\end{align*}
The important conclusion is the vanishing cross-covariance
\begin{align*}
\operatorname{Cov}(Y_1,Y_j)=0
\quad \text{for every } j \in \{2,\dots,n\}.
\end{align*}
For arbitrary random vectors, zero covariance would not imply independence. For jointly Gaussian vectors, it does, and the next step verifies that implication by factorising the Gaussian characteristic function.
[/guided]
[/step]
[step:Use the Gaussian characteristic function to prove independence of the first block and the residual blocks]
Define the residual block vector $R: \Omega \to \mathbb{R}^{(n-1)p}$ by
\begin{align*}
R :=
\begin{pmatrix}
Y_2\\
\vdots\\
Y_n
\end{pmatrix}.
\end{align*}
Since $(Y_1,R)$ is Gaussian and its covariance matrix has block form
\begin{align*}
\begin{pmatrix}
\Sigma & 0\\
0 & I_{n-1}\otimes \Sigma
\end{pmatrix},
\end{align*}
its characteristic function factorises. Indeed, let $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^{(n-1)p}$. Let $m_1:=\mathbb{E}[Y_1]\in\mathbb{R}^p$ and $m_R:=\mathbb{E}[R]\in\mathbb{R}^{(n-1)p}$. The Gaussian characteristic function gives
\begin{align*}
\mathbb{E}\left[\exp\left(i\,a\cdot Y_1+i\,b\cdot R\right)\right]
&=
\exp\left(
i\,a\cdot m_1+i\,b\cdot m_R
-\frac{1}{2}a^\top \Sigma a
-\frac{1}{2}b^\top (I_{n-1}\otimes \Sigma)b
\right)\\
&=
\exp\left(i\,a\cdot m_1-\frac{1}{2}a^\top \Sigma a\right)
\exp\left(i\,b\cdot m_R-\frac{1}{2}b^\top (I_{n-1}\otimes \Sigma)b\right)\\
&=
\mathbb{E}\left[\exp(i\,a\cdot Y_1)\right]
\mathbb{E}\left[\exp(i\,b\cdot R)\right].
\end{align*}
By [uniqueness of characteristic functions](/theorems/530), $Y_1$ and $R$ are independent.
[guided]
We now convert the covariance computation into independence. Define the residual block vector
\begin{align*}
R: \Omega &\to \mathbb{R}^{(n-1)p}\\
\omega &\mapsto
\begin{pmatrix}
Y_2(\omega)\\
\vdots\\
Y_n(\omega)
\end{pmatrix}.
\end{align*}
The pair $(Y_1,R)$ is Gaussian because it is obtained from the Gaussian vector $Y$ by a coordinate projection. From the previous covariance computation, the covariance matrix of $(Y_1,R)$ has block form
\begin{align*}
\begin{pmatrix}
\Sigma & 0\\
0 & I_{n-1}\otimes \Sigma
\end{pmatrix}.
\end{align*}
To prove independence, we show that the joint characteristic function splits into the product of the marginal characteristic functions. Let $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^{(n-1)p}$ be arbitrary test vectors for the two blocks. Define
\begin{align*}
m_1 := \mathbb{E}[Y_1] \in \mathbb{R}^p,
\qquad
m_R := \mathbb{E}[R] \in \mathbb{R}^{(n-1)p}.
\end{align*}
The characteristic function of a Gaussian vector with mean $m$ and covariance $C$ is
\begin{align*}
u \mapsto \exp\left(i\,u\cdot m-\frac{1}{2}u^\top C u\right).
\end{align*}
Applying this formula to the Gaussian vector $(Y_1,R)$ and to the vector $(a,b)$ gives
\begin{align*}
\mathbb{E}\left[\exp\left(i\,a\cdot Y_1+i\,b\cdot R\right)\right]
&=
\exp\left(
i\,a\cdot m_1+i\,b\cdot m_R
-\frac{1}{2}a^\top \Sigma a
-\frac{1}{2}b^\top (I_{n-1}\otimes \Sigma)b
\right).
\end{align*}
There is no mixed term involving both $a$ and $b$ because the off-diagonal covariance block is zero. Therefore the exponential separates:
\begin{align*}
\mathbb{E}\left[\exp\left(i\,a\cdot Y_1+i\,b\cdot R\right)\right]
&=
\exp\left(i\,a\cdot m_1-\frac{1}{2}a^\top \Sigma a\right)
\exp\left(i\,b\cdot m_R-\frac{1}{2}b^\top (I_{n-1}\otimes \Sigma)b\right)\\
&=
\mathbb{E}\left[\exp(i\,a\cdot Y_1)\right]
\mathbb{E}\left[\exp(i\,b\cdot R)\right].
\end{align*}
Since this factorisation holds for every $a \in \mathbb{R}^p$ and every $b \in \mathbb{R}^{(n-1)p}$, uniqueness of characteristic functions implies that $Y_1$ and $R$ are independent.
[/guided]
[/step]
[step:Express the sample covariance as a function of the residual blocks]
Let $P \in \mathbb{R}^{n\times n}$ be the centering matrix
\begin{align*}
P := I_n-\frac{1}{n}\mathbf{1}\mathbf{1}^\top,
\end{align*}
where $\mathbf{1}:=(1,\dots,1)^\top \in \mathbb{R}^n$. The stacked centered sample vector $C: \Omega \to \mathbb{R}^{np}$ is
\begin{align*}
C :=
\begin{pmatrix}
X_1-\bar X\\
\vdots\\
X_n-\bar X
\end{pmatrix}
=
(P\otimes I_p)Z.
\end{align*}
Since the first row of $A$ is $e^\top=\frac{1}{\sqrt n}\mathbf{1}^\top$, the matrix $P$ is the [orthogonal projection](/theorems/437) onto $e^\perp$. Hence
\begin{align*}
P = A^\top
\begin{pmatrix}
0 & 0\\
0 & I_{n-1}
\end{pmatrix}
A.
\end{align*}
Therefore
\begin{align*}
C
&=
\left(A^\top
\begin{pmatrix}
0 & 0\\
0 & I_{n-1}
\end{pmatrix}
A\otimes I_p\right)Z\\
&=
(A^\top\otimes I_p)
\begin{pmatrix}
0\\
Y_2\\
\vdots\\
Y_n
\end{pmatrix}.
\end{align*}
Because $A^\top\otimes I_p$ is orthogonal, it preserves Euclidean inner products in $\mathbb{R}^{np}$. Applying this to each pair of coordinate directions in $\mathbb{R}^p$ gives
\begin{align*}
\sum_{i=1}^n (X_i-\bar X)\otimes (X_i-\bar X)
=
\sum_{j=2}^n Y_j\otimes Y_j.
\end{align*}
Thus
\begin{align*}
S
=
\frac{1}{n-1}\sum_{j=2}^n Y_j\otimes Y_j.
\end{align*}
Consequently $S$ is a measurable function of $R=(Y_2,\dots,Y_n)$.
[guided]
We must now connect the residual blocks $Y_2,\dots,Y_n$ with the sample covariance. Define the vector
\begin{align*}
\mathbf{1}:=(1,\dots,1)^\top \in \mathbb{R}^n
\end{align*}
and define the centering matrix
\begin{align*}
P := I_n-\frac{1}{n}\mathbf{1}\mathbf{1}^\top \in \mathbb{R}^{n\times n}.
\end{align*}
This matrix subtracts the average from an $n$-tuple. Therefore the stacked centered sample vector
\begin{align*}
C: \Omega &\to \mathbb{R}^{np}\\
\omega &\mapsto
\begin{pmatrix}
X_1(\omega)-\bar X(\omega)\\
\vdots\\
X_n(\omega)-\bar X(\omega)
\end{pmatrix}
\end{align*}
satisfies
\begin{align*}
C=(P\otimes I_p)Z.
\end{align*}
The first row of $A$ is $e^\top=\frac{1}{\sqrt n}\mathbf{1}^\top$, so $P$ is the orthogonal projection in $\mathbb{R}^n$ onto the hyperplane perpendicular to $e$. In the orthonormal coordinate system determined by $A$, that projection kills the first coordinate and keeps the remaining $n-1$ coordinates. Hence
\begin{align*}
P = A^\top
\begin{pmatrix}
0 & 0\\
0 & I_{n-1}
\end{pmatrix}
A.
\end{align*}
Substituting this into $C=(P\otimes I_p)Z$ and using $Y=(A\otimes I_p)Z$ gives
\begin{align*}
C
&=
\left(A^\top
\begin{pmatrix}
0 & 0\\
0 & I_{n-1}
\end{pmatrix}
A\otimes I_p\right)Z\\
&=
(A^\top\otimes I_p)
\begin{pmatrix}
0\\
Y_2\\
\vdots\\
Y_n
\end{pmatrix}.
\end{align*}
The sample covariance uses the sum of outer products of the centered observations. Since $A^\top\otimes I_p$ is orthogonal, changing from the centered block vector $C$ to the residual-coordinate block vector $(0,Y_2,\dots,Y_n)^\top$ preserves the Gram matrix in the $p$ coordinate directions. Equivalently,
\begin{align*}
\sum_{i=1}^n (X_i-\bar X)\otimes (X_i-\bar X)
=
\sum_{j=2}^n Y_j\otimes Y_j.
\end{align*}
Therefore
\begin{align*}
S
=
\frac{1}{n-1}\sum_{j=2}^n Y_j\otimes Y_j.
\end{align*}
This formula shows that $S$ depends only on the residual vector $R=(Y_2,\dots,Y_n)$, and the map
\begin{align*}
\mathbb{R}^{(n-1)p} &\to \mathbb{R}^{p\times p}\\
(y_2,\dots,y_n) &\mapsto \frac{1}{n-1}\sum_{j=2}^n y_j\otimes y_j
\end{align*}
is continuous, hence measurable.
[/guided]
[/step]
[step:Pass independence through measurable functions and conclude]
From the previous steps, $Y_1=\sqrt n\,\bar X$ is independent of $R$, and $S$ is a measurable function of $R$. Since $\bar X=n^{-1/2}Y_1$ is a measurable function of $Y_1$, it follows that $\bar X$ and $S$ are independent. This is precisely the desired independence of the sample mean and the sample covariance matrix.
[/step]