[proofplan]
We rewrite the standardised coefficient as a ratio $Z / \sqrt{W/(n - p)}$ where $Z$ is standard normal and $W$ is an independent $\chi^2_{n-p}$. The numerator $Z$ comes from extracting the $j$-th marginal of $\hat\beta$ and standardising using the variance $\sigma^2(X^\top X)^{-1}_{jj}$. The denominator $\sqrt{W/(n-p)}$ comes from $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$ and uses that $\tilde\sigma^2 = \mathrm{RSS}/(n-p)$ while $\operatorname{se}(\hat\beta_j) = \tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}$. Independence of $\hat\beta$ and $\mathrm{RSS}$ delivers the independence required by the definition of $t_{n-p}$.
[/proofplan]
[step:Extract the marginal of $\hat\beta_j$ and form a standard normal $Z$]
Under the normal linear model, the [Distributional Properties of the Normal Linear Model](/theorems/1445) theorem gives $\hat\beta \sim N_p(\beta, \sigma^2 (X^\top X)^{-1})$. The marginal distribution of the $j$-th coordinate of a multivariate normal is univariate normal with mean equal to the $j$-th entry of the mean vector and variance equal to the $(j, j)$ entry of the covariance matrix:
\begin{align*}
\hat\beta_j &\sim N\!\big(\beta_j,\; \sigma^2 (X^\top X)^{-1}_{jj}\big).
\end{align*}
Standardising, define the map
\begin{align*}
Z &: \Omega \to \mathbb{R}, & \omega &\mapsto \frac{\hat\beta_j(\omega) - \beta_j}{\sigma \sqrt{(X^\top X)^{-1}_{jj}}}.
\end{align*}
Then $Z \sim N(0, 1)$.
The matrix $(X^\top X)^{-1}$ is positive definite — the inverse of the positive definite Gram matrix $X^\top X$ (positive definite because $\operatorname{rank}(X) = p$ forces $X^\top X$ to have full rank $p$, and $X^\top X = (X^\top X)^\top$ has non-negative eigenvalues). In particular, the diagonal entry $(X^\top X)^{-1}_{jj}$ is strictly positive, so the square root and division are well-defined.
[guided]
We need a standard normal to play the role of the numerator in the definition of a $t_{n-p}$ random variable, "standard normal over square root of an independent scaled chi-squared." The natural candidate is the standardised marginal of $\hat\beta_j$.
Why does the marginal of a multivariate normal give a univariate normal? For $X \sim N_p(\mu, \Sigma)$ and $j \in \{1, \ldots, p\}$, the $j$-th component is the linear combination $e_j^\top X$ where $e_j$ is the $j$-th standard basis vector. By the [Orthogonal Transformations Preserve Multivariate Normality](/theorems/1434) theorem applied to the $1 \times p$ matrix $e_j^\top$, this linear combination is normal with mean $e_j^\top \mu = \mu_j$ and variance $e_j^\top \Sigma e_j = \Sigma_{jj}$. Applied to $\hat\beta$:
\begin{align*}
\hat\beta_j = e_j^\top \hat\beta \sim N(\beta_j, \sigma^2 (X^\top X)^{-1}_{jj}).
\end{align*}
*Why is $(X^\top X)^{-1}_{jj} > 0$?* The matrix $X^\top X$ is symmetric. It is also positive definite: for any non-zero $v \in \mathbb{R}^p$,
\begin{align*}
v^\top (X^\top X) v = (Xv)^\top (Xv) = \|Xv\|^2 \geq 0,
\end{align*}
with equality iff $Xv = 0$. Since $\operatorname{rank}(X) = p$ means $X$ has trivial kernel, $Xv = 0 \Leftrightarrow v = 0$. So $X^\top X$ is positive definite, and its inverse $(X^\top X)^{-1}$ is positive definite too (same eigenvectors, reciprocal eigenvalues). The diagonal entries of a positive definite matrix are strictly positive: $(X^\top X)^{-1}_{jj} = e_j^\top (X^\top X)^{-1} e_j > 0$. This guarantees the standardisation is well-defined.
Subtracting the mean and dividing by the standard deviation:
\begin{align*}
Z = \frac{\hat\beta_j - \beta_j}{\sigma \sqrt{(X^\top X)^{-1}_{jj}}} \sim N(0, 1).
\end{align*}
[/guided]
[/step]
[step:Identify the scaled chi-squared $W = \mathrm{RSS}/\sigma^2$ independent of $Z$]
By the [Distributional Properties of the Normal Linear Model](/theorems/1445) (parts 2 and 3), the random variable
\begin{align*}
W &: \Omega \to [0, \infty), & \omega &\mapsto \frac{\mathrm{RSS}(\omega)}{\sigma^2}
\end{align*}
satisfies $W \sim \chi^2_{n-p}$, and $W$ is independent of $\hat\beta$. Independence is preserved under measurable functions (Borel measurability of component extraction and affine standardisation): since $Z$ is a measurable function of $\hat\beta$ and $W$ is a measurable function of $\mathrm{RSS}$, and $\hat\beta \perp\!\!\!\perp \mathrm{RSS}$, we conclude $Z \perp\!\!\!\perp W$.
[guided]
We need a $\chi^2_{n-p}$ random variable independent of $Z$ to serve as the denominator in the $t$-ratio. Both the distribution and the independence were already established in [Distributional Properties of the Normal Linear Model](/theorems/1445): its part (2) gives $W := \mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$, and its part (3) gives $\hat\beta \perp\!\!\!\perp \mathrm{RSS}$.
To get the tighter independence $Z \perp\!\!\!\perp W$ we apply the standard fact that independence is preserved under measurable transformations on each side. Explicitly, if $U \perp\!\!\!\perp V$ and $f, g$ are measurable, then $f(U) \perp\!\!\!\perp g(V)$ — this follows from $\sigma(f(U)) \subseteq \sigma(U)$ and $\sigma(g(V)) \subseteq \sigma(V)$, plus the fact that independence between two $\sigma$-algebras is inherited by sub-$\sigma$-algebras.
Here:
- $Z = (\hat\beta_j - \beta_j)/[\sigma \sqrt{(X^\top X)^{-1}_{jj}}]$ is an affine function of the $j$-th coordinate of $\hat\beta$, hence Borel-measurable in $\hat\beta$.
- $W = \mathrm{RSS}/\sigma^2$ is an affine function of $\mathrm{RSS}$, hence Borel-measurable in $\mathrm{RSS}$.
Since $\hat\beta \perp\!\!\!\perp \mathrm{RSS}$, we conclude $Z \perp\!\!\!\perp W$, exactly the independence required by the definition of the $t$ distribution.
[/guided]
[/step]
[step:Rewrite the standardised coefficient as $Z / \sqrt{W/(n-p)}$]
Recall the standard error $\operatorname{se}(\hat\beta_j) = \tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}$, where $\tilde\sigma^2 = \mathrm{RSS}/(n-p)$ is the unbiased estimator of $\sigma^2$. The quantity of interest is
\begin{align*}
\frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} &= \frac{\hat\beta_j - \beta_j}{\tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}}.
\end{align*}
Multiply numerator and denominator by $1/\big(\sigma \sqrt{(X^\top X)^{-1}_{jj}}\big)$:
\begin{align*}
\frac{\hat\beta_j - \beta_j}{\tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}} &= \frac{\dfrac{\hat\beta_j - \beta_j}{\sigma\sqrt{(X^\top X)^{-1}_{jj}}}}{\dfrac{\tilde\sigma}{\sigma}} = \frac{Z}{\tilde\sigma/\sigma}.
\end{align*}
Square-rooting the identity $\tilde\sigma^2/\sigma^2 = \mathrm{RSS}/[(n-p)\sigma^2] = W/(n-p)$ (recalling $W \geq 0$ so taking positive square roots is unambiguous):
\begin{align*}
\frac{\tilde\sigma}{\sigma} &= \sqrt{\frac{W}{n-p}}.
\end{align*}
Therefore
\begin{align*}
\frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} &= \frac{Z}{\sqrt{W/(n-p)}}.
\end{align*}
[/step]
[step:Apply the definition of the $t_{n-p}$ distribution]
The $t_k$ distribution is defined as the law of $Z/\sqrt{W/k}$ where $Z \sim N(0, 1)$ and $W \sim \chi^2_k$ are independent. In Step 1 we established $Z \sim N(0,1)$; in Step 2, $W \sim \chi^2_{n-p}$ independent of $Z$; in Step 3, the target expression equals $Z/\sqrt{W/(n-p)}$. Taking $k = n - p$ in the definition gives
\begin{align*}
\frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} &\sim t_{n-p},
\end{align*}
as claimed. Since $j \in \{1, \ldots, p\}$ was arbitrary, the same conclusion holds for every coefficient.
[/step]