[proofplan]
We view the i.i.d. sample $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ as a single multivariate normal vector $X \sim N_n(\mu, \sigma^2 I_n)$ with $\mu = (\mu, \ldots, \mu)^\top$, and apply a carefully chosen orthogonal transformation $Y = AX$. The matrix $A$ is built so that its first row extracts $\sqrt{n}\,\bar{X}$; the remaining rows are filled in so that $A$ is orthogonal (a concrete choice is the Helmert matrix). Under this transformation, $Y$ has i.i.d. normal components, the sample mean is recovered as $\bar{X} = Y_1/\sqrt{n}$, and the key algebraic identity $S_{XX} = \sum_{i=2}^n Y_i^2$ follows from the norm-preservation of orthogonal matrices. The stated independence and distributional results are immediate from this decomposition.
[/proofplan]
[step:Assemble the sample into a multivariate normal vector]
Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ random variables with $n \geq 2$, and define
\begin{align*}
X: \Omega &\to \mathbb{R}^n \\
\omega &\mapsto (X_1(\omega), \ldots, X_n(\omega))^\top.
\end{align*}
Since the $X_i$ are independent and each is $N(\mu, \sigma^2)$, the joint distribution is the product of $n$ identical univariate normal distributions, which is the multivariate normal $N_n(\mu, \sigma^2 I_n)$ with $\mu = (\mu, \ldots, \mu)^\top = \mu\,\mathbf{1}$, where $\mathbf{1} := (1, \ldots, 1)^\top \in \mathbb{R}^n$.
[/step]
[step:Construct an orthogonal matrix whose first row is $\tfrac{1}{\sqrt{n}}\mathbf{1}^\top$]
We assert the existence of an orthogonal matrix $A \in \mathbb{R}^{n \times n}$ whose first row is
\begin{align*}
A_{1,\,\cdot} = \left(\tfrac{1}{\sqrt{n}}, \ldots, \tfrac{1}{\sqrt{n}}\right).
\end{align*}
A concrete choice is the **Helmert matrix**: its first row is $\tfrac{1}{\sqrt{n}}\mathbf{1}^\top$ as above, and for $k \in \{2, \ldots, n\}$ the $k$-th row is
\begin{align*}
A_{k,\,\cdot} = \left(\underbrace{\tfrac{1}{\sqrt{k(k-1)}}, \ldots, \tfrac{1}{\sqrt{k(k-1)}}}_{k-1 \text{ entries}},\; -\tfrac{k-1}{\sqrt{k(k-1)}},\; 0,\; \ldots,\; 0\right).
\end{align*}
One checks directly that each row has unit $\ell^2$ norm and that distinct rows are orthogonal, so $A A^\top = I_n$ and $A$ is orthogonal. The precise form of rows $2, \ldots, n$ plays no role in what follows; only two properties are used:
(i) the first row is $\tfrac{1}{\sqrt{n}}\mathbf{1}^\top$;
(ii) $A$ is orthogonal, hence rows $2, \ldots, n$ are orthogonal to the first row, i.e. orthogonal to $\mathbf{1}$.
[guided]
The core of this proof is choosing the right orthogonal change of basis. The "right" choice is one whose first coordinate in the new basis is aligned with the sample mean direction, because that is the direction that $\bar{X}$ picks out. The unit vector in the direction of $\mathbf{1}$ is $\mathbf{1}/\|\mathbf{1}\| = \mathbf{1}/\sqrt{n}$, so we demand $A_{1,\,\cdot} = \tfrac{1}{\sqrt{n}}\mathbf{1}^\top$.
Does such an orthogonal matrix exist? Yes: any unit vector $v \in \mathbb{R}^n$ can be extended to an orthonormal basis $\{v, e_2', \ldots, e_n'\}$ of $\mathbb{R}^n$ by Gram–Schmidt, and stacking these as rows gives an orthogonal matrix with $v^\top$ as its first row. The Helmert matrix is one specific choice that can be written down explicitly.
Let us verify the Helmert construction. Row $k \geq 2$ has squared norm
\begin{align*}
(k-1)\cdot\tfrac{1}{k(k-1)} + \tfrac{(k-1)^2}{k(k-1)} = \tfrac{1}{k} + \tfrac{k-1}{k} = 1,
\end{align*}
confirming unit length. For orthogonality of row $k$ with row $1$, sum the entries of row $k$:
\begin{align*}
(k-1)\cdot\tfrac{1}{\sqrt{k(k-1)}} - \tfrac{k-1}{\sqrt{k(k-1)}} = 0,
\end{align*}
so rows $2, \ldots, n$ are orthogonal to $\mathbf{1}$. Orthogonality between rows $j < k$ is checked similarly: row $j$ is non-zero only in the first $j$ positions, and in those positions row $k$ is constantly $\tfrac{1}{\sqrt{k(k-1)}}$, so their inner product is $\tfrac{1}{\sqrt{k(k-1)}}$ times the sum of the first $j$ entries of row $j$, which is zero by the same cancellation used above.
The reader should notice that we never actually need these formulae. All that matters is **(i)** the first row is $\tfrac{1}{\sqrt{n}}\mathbf{1}^\top$ and **(ii)** $A$ is orthogonal. The Helmert matrix is just a witness to the existence of such an $A$.
[/guided]
[/step]
[step:Apply the orthogonal-transformation theorem to obtain an i.i.d. $Y$]
Define the transformed vector
\begin{align*}
Y: \Omega &\to \mathbb{R}^n \\
\omega &\mapsto AX(\omega).
\end{align*}
By the [Orthogonal Transformations Preserve Multivariate Normality](/theorems/1434) theorem, $Y \sim N_n(A\mu, \sigma^2 A A^\top)$. The hypotheses are verified: $X$ is multivariate normal by the previous step, $A$ is orthogonal by construction, and the covariance is $\sigma^2 I_n$. Since $A A^\top = I_n$, we conclude
\begin{align*}
Y \sim N_n(A\mu,\; \sigma^2 I_n).
\end{align*}
In particular, the components $Y_1, \ldots, Y_n$ are **independent** (the covariance matrix is diagonal, and joint normality upgrades uncorrelatedness to independence) and each $Y_i$ is univariate normal with $\operatorname{Var}(Y_i) = \sigma^2$.
[/step]
[step:Compute the transformed mean vector]
We compute the entries of $A\mu = \mu\, A\mathbf{1} \in \mathbb{R}^n$. The first entry is
\begin{align*}
(A\mu)_1 = \mu \sum_{j=1}^n A_{1j} = \mu \sum_{j=1}^n \tfrac{1}{\sqrt{n}} = \sqrt{n}\,\mu.
\end{align*}
For $k \geq 2$, the $k$-th entry is $(A\mu)_k = \mu\, A_{k,\,\cdot} \cdot \mathbf{1}$, which is $\mu$ times the inner product of row $k$ of $A$ with $\mathbf{1}$. By property (ii) above (rows $2, \ldots, n$ are orthogonal to $\mathbf{1}$), this is $0$. Hence
\begin{align*}
A\mu = \left(\sqrt{n}\,\mu,\; 0,\; \ldots,\; 0\right)^\top.
\end{align*}
Combined with the previous step, the distributions of the components of $Y$ are
\begin{align*}
Y_1 &\sim N(\sqrt{n}\,\mu,\; \sigma^2), & Y_i &\sim N(0,\, \sigma^2) \quad \text{for } i = 2, \ldots, n,
\end{align*}
and $Y_1, \ldots, Y_n$ are mutually independent.
[guided]
We need to compute $A\mu$ entry by entry, where $\mu = \mu \mathbf{1}$. By linearity, $A\mu = \mu (A\mathbf{1})$, so we need $A\mathbf{1}$.
**First entry.** The first row is $\tfrac{1}{\sqrt{n}}\mathbf{1}^\top$, so
\begin{align*}
(A\mathbf{1})_1 = \tfrac{1}{\sqrt{n}}\mathbf{1}^\top \mathbf{1} = \tfrac{1}{\sqrt{n}} \cdot n = \sqrt{n}.
\end{align*}
**Remaining entries.** For $k \geq 2$, the $k$-th entry is $(A\mathbf{1})_k = A_{k,\,\cdot} \cdot \mathbf{1}$. We need this to be zero. Here is where orthogonality of $A$ enters: the rows of $A$ are orthonormal, so in particular rows $k \geq 2$ are orthogonal to row $1$, which is a scalar multiple of $\mathbf{1}$. Orthogonality to a scalar multiple of $\mathbf{1}$ is the same as orthogonality to $\mathbf{1}$ itself, so $(A\mathbf{1})_k = 0$.
Multiplying through by $\mu$ gives
\begin{align*}
A\mu = (\sqrt{n}\,\mu,\, 0,\, \ldots,\, 0)^\top.
\end{align*}
The meaning of this computation: in the new basis defined by the rows of $A$, the constant mean vector $\mu$ has a single non-zero coordinate — precisely along the $\bar{X}$ direction. All the "mean" of the sample is concentrated into $Y_1$, and $Y_2, \ldots, Y_n$ are centred. This is the structural reason why the sample variance, which depends only on the centred coordinates, will be independent of $\bar{X}$.
[/guided]
[/step]
[step:Identify $Y_1 = \sqrt{n}\,\bar{X}$ and rewrite $\bar{X}$ as a function of $Y_1$ alone]
The first component of $Y = AX$ is
\begin{align*}
Y_1 = \sum_{j=1}^n A_{1j} X_j = \tfrac{1}{\sqrt{n}}\sum_{j=1}^n X_j = \tfrac{n}{\sqrt{n}}\,\bar{X} = \sqrt{n}\,\bar{X}.
\end{align*}
Equivalently, $\bar{X} = Y_1 / \sqrt{n}$.
Combined with $Y_1 \sim N(\sqrt{n}\,\mu, \sigma^2)$ from the previous step, the distribution of $\bar{X}$ is
\begin{align*}
\bar{X} = \tfrac{Y_1}{\sqrt{n}} \sim N\!\left(\mu,\; \tfrac{\sigma^2}{n}\right),
\end{align*}
using that $N(a, b)$ scaled by $1/c$ yields $N(a/c, b/c^2)$. This establishes part (1) of the theorem.
[/step]
[step:Recover $S_{XX} = \sum_{i=2}^n Y_i^2$ from norm preservation]
Orthogonal matrices preserve the Euclidean norm: $\|Y\|^2 = \|AX\|^2 = X^\top A^\top A X = X^\top X = \|X\|^2$, using $A^\top A = I_n$. Expanding the norms component-wise,
\begin{align*}
\sum_{i=1}^n X_i^2 = \sum_{i=1}^n Y_i^2.
\end{align*}
Now recall the algebraic identity for the sum of squared deviations: using $\sum_{i=1}^n X_i = n\bar{X}$,
\begin{align*}
S_{XX} := \sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - 2\bar{X}\sum_{i=1}^n X_i + n\bar{X}^2 = \sum_{i=1}^n X_i^2 - n\bar{X}^2.
\end{align*}
Substituting $\sum_{i=1}^n X_i^2 = \sum_{i=1}^n Y_i^2$ and $n\bar{X}^2 = (\sqrt{n}\,\bar{X})^2 = Y_1^2$,
\begin{align*}
S_{XX} = \sum_{i=1}^n Y_i^2 - Y_1^2 = \sum_{i=2}^n Y_i^2.
\end{align*}
[guided]
This step is the crux. We want to express $S_{XX}$ as something depending only on $Y_2, \ldots, Y_n$ — because then $S_{XX}$ will be independent of $Y_1$ (and hence of $\bar{X}$) by the independence of the components of $Y$.
**The identity for the sum of squared deviations.** Any $n$-tuple $a_1, \ldots, a_n$ with mean $\bar{a} = \tfrac{1}{n}\sum a_i$ satisfies
\begin{align*}
\sum_{i=1}^n (a_i - \bar{a})^2 = \sum_{i=1}^n a_i^2 - n\bar{a}^2.
\end{align*}
Proof: expand $(a_i - \bar{a})^2 = a_i^2 - 2a_i\bar{a} + \bar{a}^2$, sum, and observe $\sum a_i = n\bar{a}$, giving $\sum a_i^2 - 2\bar{a}\cdot n\bar{a} + n\bar{a}^2 = \sum a_i^2 - n\bar{a}^2$.
Applied to our sample, $S_{XX} = \sum X_i^2 - n\bar{X}^2$.
**Norm preservation by orthogonal matrices.** A matrix $A$ satisfies $A^\top A = I_n$ iff $\|Av\| = \|v\|$ for all $v \in \mathbb{R}^n$. This gives $\|Y\|^2 = \|AX\|^2 = \|X\|^2$, i.e. $\sum_i Y_i^2 = \sum_i X_i^2$. We use this to replace the $X$-sum with a $Y$-sum.
**Substitution.** We also have $Y_1 = \sqrt{n}\,\bar{X}$, so $Y_1^2 = n\bar{X}^2$. Plugging both substitutions in:
\begin{align*}
S_{XX} = \sum_{i=1}^n X_i^2 - n\bar{X}^2 = \sum_{i=1}^n Y_i^2 - Y_1^2 = \sum_{i=2}^n Y_i^2.
\end{align*}
Why is this the right manoeuvre? The identity $S_{XX} = \sum X_i^2 - n\bar{X}^2$ already hints that the sample variance "lives in $n-1$ dimensions" — it is the total squared length minus the squared component along $\mathbf{1}$. The orthogonal change of basis $A$ makes this decomposition explicit: $Y_1$ carries the component along $\mathbf{1}$, and $Y_2, \ldots, Y_n$ span the orthogonal complement, which is the $(n-1)$-dimensional subspace where the centred data live.
[/guided]
[/step]
[step:Conclude independence of $\bar{X}$ and $S_{XX}$ and identify the $\chi^2_{n-1}$ distribution]
From the previous steps:
(a) $\bar{X} = Y_1/\sqrt{n}$ is a Borel-measurable function of $Y_1$ alone.
(b) $S_{XX} = \sum_{i=2}^n Y_i^2$ is a Borel-measurable function of $(Y_2, \ldots, Y_n)$ alone.
(c) $Y_1, Y_2, \ldots, Y_n$ are mutually independent.
By the standard fact that measurable functions of disjoint subsets of a mutually independent collection are independent, $\bar{X}$ and $S_{XX}$ are independent. This proves the independence claim of the theorem.
For the distribution of $S_{XX}/\sigma^2$: each $Y_i$ for $i \geq 2$ satisfies $Y_i \sim N(0, \sigma^2)$ independently, so $Y_i/\sigma \sim N(0, 1)$ independently. Therefore
\begin{align*}
\tfrac{S_{XX}}{\sigma^2} = \sum_{i=2}^n \left(\tfrac{Y_i}{\sigma}\right)^2
\end{align*}
is a sum of $n-1$ independent squared standard normals, which by definition of the chi-squared distribution is $\chi^2_{n-1}$. This establishes part (2) of the theorem.
Combining all the conclusions: $\bar{X} \sim N(\mu, \sigma^2/n)$, $S_{XX}/\sigma^2 \sim \chi^2_{n-1}$, and the two are independent. This completes the proof.
[/step]