[proofplan]
We write the coefficient error and the residual sum of squares as orthogonal components of the Gaussian noise vector. The coefficient numerator is the projection of $\varepsilon$ onto a unit vector in $\operatorname{Range}(X)$, while the residual sum of squares is the squared length of the projection of $\varepsilon$ onto $\operatorname{Range}(X)^\perp$. Orthogonal invariance of the multivariate normal distribution makes these components independent, giving one standard normal variable and one independent chi-squared variable. The defining representation of the Student $t$ distribution then gives the result.
[/proofplan]
[step:Express the coefficient error as a Gaussian coordinate in the column space of $X$]
Let $e_j \in \mathbb R^p$ denote the $j$th standard basis vector. Since $X$ has rank $p$, the matrix $X^\top X$ is invertible and positive definite, so $C=(X^\top X)^{-1}$ is positive definite and $c_{jj}=e_j^\top C e_j>0$.
From $Y=X\beta+\varepsilon$ and $\hat\beta=CX^\top Y$, we obtain
\begin{align*}
\hat\beta-\beta
&=CX^\top(X\beta+\varepsilon)-\beta \\
&=CX^\top X\beta+CX^\top\varepsilon-\beta \\
&=\beta+CX^\top\varepsilon-\beta \\
&=CX^\top\varepsilon.
\end{align*}
Therefore
\begin{align*}
\hat\beta_j-\beta_j=e_j^\top CX^\top\varepsilon=(XC e_j)^\top\varepsilon.
\end{align*}
Define the vector $u_j\in \mathbb R^n$ by
\begin{align*}
u_j:=\frac{XC e_j}{\sqrt{c_{jj}}}.
\end{align*}
Then $u_j\in \operatorname{Range}(X)$ and
\begin{align*}
|u_j|^2
&=\frac{e_j^\top C X^\top X C e_j}{c_{jj}} \\
&=\frac{e_j^\top C e_j}{c_{jj}} \\
&=1.
\end{align*}
Under $H_0$, $\beta_j=\beta_{j,0}$, so
\begin{align*}
\frac{\hat\beta_j-\beta_{j,0}}{\sigma\sqrt{c_{jj}}}
=\frac{\hat\beta_j-\beta_j}{\sigma\sqrt{c_{jj}}}
=\frac{u_j^\top\varepsilon}{\sigma}.
\end{align*}
Since $\varepsilon\sim\mathcal N(0,\sigma^2 I_n)$ and $u_j$ is a unit vector, the scalar random variable
\begin{align*}
Z_j:=\frac{u_j^\top\varepsilon}{\sigma}
\end{align*}
has distribution $\mathcal N(0,1)$.
[guided]
The numerator of the test statistic is a single coordinate of the least-squares error. We first rewrite it directly in terms of the noise vector $\varepsilon$.
Let $e_j\in\mathbb R^p$ be the $j$th standard basis vector. Because $X$ has full column rank, $X^\top X$ is positive definite and invertible. Hence $C=(X^\top X)^{-1}$ is positive definite, and in particular
\begin{align*}
c_{jj}=e_j^\top C e_j>0.
\end{align*}
This justifies the denominator $\sqrt{c_{jj}}$.
Using the model equation $Y=X\beta+\varepsilon$, we compute
\begin{align*}
\hat\beta-\beta
&=CX^\top Y-\beta \\
&=CX^\top(X\beta+\varepsilon)-\beta \\
&=CX^\top X\beta+CX^\top\varepsilon-\beta \\
&=\beta+CX^\top\varepsilon-\beta \\
&=CX^\top\varepsilon.
\end{align*}
Taking the $j$th coordinate gives
\begin{align*}
\hat\beta_j-\beta_j=e_j^\top CX^\top\varepsilon=(XC e_j)^\top\varepsilon.
\end{align*}
Now normalize the vector defining this linear functional. Define
\begin{align*}
u_j:=\frac{XC e_j}{\sqrt{c_{jj}}}\in\mathbb R^n.
\end{align*}
This vector lies in $\operatorname{Range}(X)$ because it is $X$ applied to the vector $C e_j\in\mathbb R^p$. Its Euclidean norm is
\begin{align*}
|u_j|^2
&=\frac{(XC e_j)^\top(XC e_j)}{c_{jj}} \\
&=\frac{e_j^\top C X^\top X C e_j}{c_{jj}} \\
&=\frac{e_j^\top C e_j}{c_{jj}} \\
&=1,
\end{align*}
where we used $C X^\top X C=C$ because $C=(X^\top X)^{-1}$.
Under the null hypothesis, $\beta_j=\beta_{j,0}$, so the normalized numerator becomes
\begin{align*}
\frac{\hat\beta_j-\beta_{j,0}}{\sigma\sqrt{c_{jj}}}
=\frac{\hat\beta_j-\beta_j}{\sigma\sqrt{c_{jj}}}
=\frac{u_j^\top\varepsilon}{\sigma}.
\end{align*}
A one-dimensional projection of $\mathcal N(0,\sigma^2 I_n)$ onto a unit vector has distribution $\mathcal N(0,\sigma^2)$. Therefore
\begin{align*}
Z_j:=\frac{u_j^\top\varepsilon}{\sigma}
\end{align*}
has distribution $\mathcal N(0,1)$.
[/guided]
[/step]
[step:Rewrite the residual sum of squares as an orthogonal Gaussian length]
Define the matrix $P\in\mathbb R^{n\times n}$ by
\begin{align*}
P:=X C X^\top,
\end{align*}
and define $M\in\mathbb R^{n\times n}$ by
\begin{align*}
M:=I_n-P.
\end{align*}
The matrix $P$ is the [orthogonal projection](/theorems/437) onto $\operatorname{Range}(X)$, since $P^\top=P$ and $P^2=P$. Hence $M$ is the orthogonal projection onto $\operatorname{Range}(X)^\perp$.
Because $PX=X$, we have $MX=0$. Also $X\hat\beta=PX Y$. Therefore
\begin{align*}
Y-X\hat\beta
&=Y-PY \\
&=MY \\
&=M(X\beta+\varepsilon) \\
&=MX\beta+M\varepsilon \\
&=M\varepsilon.
\end{align*}
Thus
\begin{align*}
\operatorname{RSS}=|M\varepsilon|^2.
\end{align*}
Choose an orthonormal basis $(q_1,\dots,q_n)$ of $\mathbb R^n$ such that $q_1=u_j$, the vectors $q_1,\dots,q_p$ span $\operatorname{Range}(X)$, and $q_{p+1},\dots,q_n$ span $\operatorname{Range}(X)^\perp$. Let $Q\in\mathbb R^{n\times n}$ be the orthogonal matrix with columns $q_1,\dots,q_n$, and define the random vector $\eta:\Omega\to\mathbb R^n$ by
\begin{align*}
\eta:=\frac{Q^\top\varepsilon}{\sigma}.
\end{align*}
By orthogonal invariance of the standard multivariate normal distribution, $\eta\sim\mathcal N(0,I_n)$, so its coordinates $\eta_1,\dots,\eta_n$ are independent $\mathcal N(0,1)$ random variables. In this basis,
\begin{align*}
Z_j=\eta_1
\end{align*}
and
\begin{align*}
\frac{\operatorname{RSS}}{\sigma^2}
=\frac{|M\varepsilon|^2}{\sigma^2}
=\sum_{k=p+1}^{n}\eta_k^2.
\end{align*}
Consequently,
\begin{align*}
W:=\frac{\operatorname{RSS}}{\sigma^2}
\end{align*}
has the chi-squared distribution with $n-p$ degrees of freedom, and $Z_j$ is independent of $W$.
[guided]
The residual vector is obtained by projecting $Y$ away from the column space of $X$. Define
\begin{align*}
P:=XCX^\top\in\mathbb R^{n\times n},
\qquad
M:=I_n-P\in\mathbb R^{n\times n}.
\end{align*}
We verify that $P$ is the orthogonal projection onto $\operatorname{Range}(X)$. First,
\begin{align*}
P^\top=(XCX^\top)^\top=XC^\top X^\top=XCX^\top=P,
\end{align*}
because $C$ is symmetric. Second,
\begin{align*}
P^2
&=XCX^\top XCX^\top \\
&=XC(X^\top X)CX^\top \\
&=XCX^\top \\
&=P.
\end{align*}
Thus $P$ is a symmetric idempotent matrix, hence the orthogonal projection onto its range. Since $PX=X$, its range is exactly $\operatorname{Range}(X)$. Therefore $M=I_n-P$ is the orthogonal projection onto $\operatorname{Range}(X)^\perp$.
Now compute the residual vector:
\begin{align*}
Y-X\hat\beta
&=Y-XCX^\top Y \\
&=Y-PY \\
&=MY \\
&=M(X\beta+\varepsilon) \\
&=MX\beta+M\varepsilon.
\end{align*}
Because $PX=X$, we have $MX=(I_n-P)X=X-X=0$, so $MX\beta=0$. Hence
\begin{align*}
Y-X\hat\beta=M\varepsilon,
\qquad
\operatorname{RSS}=|M\varepsilon|^2.
\end{align*}
To identify the distribution of this squared length and its independence from the numerator, choose coordinates adapted to the two orthogonal subspaces. Since $u_j\in\operatorname{Range}(X)$ and $|u_j|=1$, choose an orthonormal basis $(q_1,\dots,q_n)$ of $\mathbb R^n$ such that $q_1=u_j$, the vectors $q_1,\dots,q_p$ span $\operatorname{Range}(X)$, and $q_{p+1},\dots,q_n$ span $\operatorname{Range}(X)^\perp$. Let $Q\in\mathbb R^{n\times n}$ be the orthogonal matrix with columns $q_1,\dots,q_n$, and define
\begin{align*}
\eta:=\frac{Q^\top\varepsilon}{\sigma}.
\end{align*}
Since $\varepsilon/\sigma\sim\mathcal N(0,I_n)$ and $Q$ is orthogonal, orthogonal invariance of the standard multivariate normal distribution gives
\begin{align*}
\eta\sim\mathcal N(0,I_n).
\end{align*}
Thus the coordinates $\eta_1,\dots,\eta_n$ are independent standard normal random variables.
The numerator coordinate is
\begin{align*}
Z_j=\frac{u_j^\top\varepsilon}{\sigma}
=\frac{q_1^\top\varepsilon}{\sigma}
=\eta_1.
\end{align*}
The residual projection $M\varepsilon$ keeps exactly the coordinates in $\operatorname{Range}(X)^\perp$, so
\begin{align*}
\frac{\operatorname{RSS}}{\sigma^2}
=\frac{|M\varepsilon|^2}{\sigma^2}
=\sum_{k=p+1}^{n}\eta_k^2.
\end{align*}
A sum of squares of $n-p$ independent standard normal random variables has the chi-squared distribution with $n-p$ degrees of freedom. Therefore
\begin{align*}
W:=\frac{\operatorname{RSS}}{\sigma^2}\sim\chi^2_{n-p}.
\end{align*}
Moreover $Z_j=\eta_1$ depends only on the first coordinate, while $W$ depends only on $\eta_{p+1},\dots,\eta_n$. Since the coordinates of $\eta$ are independent, $Z_j$ and $W$ are independent.
[/guided]
[/step]
[step:Identify the Student $t$ representation]
By the definition of $\hat\sigma^2$,
\begin{align*}
\frac{(n-p)\hat\sigma^2}{\sigma^2}
=\frac{\operatorname{RSS}}{\sigma^2}
=W.
\end{align*}
Therefore
\begin{align*}
T_j
&=\frac{\hat\beta_j-\beta_{j,0}}{\hat\sigma\sqrt{c_{jj}}} \\
&=\frac{(\hat\beta_j-\beta_{j,0})/(\sigma\sqrt{c_{jj}})}
{\sqrt{\hat\sigma^2/\sigma^2}} \\
&=\frac{Z_j}{\sqrt{W/(n-p)}}.
\end{align*}
We have shown that $Z_j\sim\mathcal N(0,1)$, $W\sim\chi^2_{n-p}$, and $Z_j$ is independent of $W$. By the defining representation of the Student $t$ distribution, $T_j$ has the Student $t$ distribution with $n-p$ degrees of freedom. This proves the theorem.
[/step]