[proofplan]
We compute the distributions of the least-squares estimator and the residual variance estimator by decomposing the Gaussian error vector into its projection onto the column space of $X$ and its orthogonal residual component. Orthogonality of these two projections gives independence, so the numerator in each statistic is Gaussian and independent of the chi-square denominator. The mean-response statistic has variance factor $h_0$, while the prediction statistic has the additional independent future-error variance, producing the factor $1+h_0$.
[/proofplan]
custom_env
admin
[step:Decompose the Gaussian error into fitted and residual components]
Define the hat matrix $H \in \mathbb{R}^{n \times n}$ and residual matrix $M \in \mathbb{R}^{n \times n}$ by
\begin{align*}
H := X(X^\top X)^{-1}X^\top,
\qquad
M := I_n-H.
\end{align*}
Since $X$ has full column rank, $X^\top X$ is invertible. The matrix $H$ is symmetric and idempotent, hence it is the [orthogonal projection](/theorems/437) of $\mathbb{R}^n$ onto $\operatorname{Range}(X)$. Therefore $M$ is symmetric and idempotent, $MX=0$, and $\operatorname{rank}(M)=n-p$.
The estimator satisfies
\begin{align*}
\hat{\beta}
&= (X^\top X)^{-1}X^\top(X\beta+\varepsilon) \\
&= \beta + (X^\top X)^{-1}X^\top \varepsilon.
\end{align*}
The residual vector $r: \Omega \to \mathbb{R}^n$ is
\begin{align*}
r := Y-X\hat{\beta}.
\end{align*}
Using $H X=X$ and the preceding expression for $\hat{\beta}$, we obtain
\begin{align*}
r
&= Y-HY \\
&= MY \\
&= M(X\beta+\varepsilon) \\
&= M\varepsilon.
\end{align*}
Thus
\begin{align*}
\operatorname{RSS}=|M\varepsilon|^2.
\end{align*}
[/step]
custom_env
admin
[step:Identify the distribution of the mean-response numerator]
Define the row vector $a_0 \in \mathbb{R}^{1 \times n}$ by
\begin{align*}
a_0 := x_0(X^\top X)^{-1}X^\top.
\end{align*}
Then
\begin{align*}
x_0\hat{\beta}-x_0\beta
= a_0\varepsilon.
\end{align*}
Because $\varepsilon \sim \mathcal{N}(0,\sigma^2 I_n)$, every linear image of $\varepsilon$ is Gaussian. Hence $a_0\varepsilon$ is a real-valued Gaussian random variable with mean
\begin{align*}
\mathbb{E}[a_0\varepsilon]=a_0\mathbb{E}[\varepsilon]=0
\end{align*}
and variance
\begin{align*}
\operatorname{Var}(a_0\varepsilon)
&= a_0(\sigma^2 I_n)a_0^\top \\
&= \sigma^2 x_0(X^\top X)^{-1}X^\top X(X^\top X)^{-1}x_0^\top \\
&= \sigma^2 x_0(X^\top X)^{-1}x_0^\top \\
&= \sigma^2 h_0.
\end{align*}
Since $X^\top X$ is positive definite and $x_0 \neq 0$, we have $h_0>0$. Therefore
\begin{align*}
Z_0 :=
\frac{x_0\hat{\beta}-x_0\beta}{\sigma\sqrt{h_0}}
\end{align*}
has distribution $\mathcal{N}(0,1)$.
[/step]
custom_env
admin
[step:Show that the residual variance is an independent chi-square scale estimate]
The random vector $H\varepsilon$ is Gaussian with covariance $\sigma^2H$, and $M\varepsilon$ is Gaussian with covariance $\sigma^2M$. Their cross-covariance is
\begin{align*}
\operatorname{Cov}(H\varepsilon,M\varepsilon)
&= H(\sigma^2 I_n)M^\top \\
&= \sigma^2 HM \\
&= \sigma^2 H(I_n-H) \\
&= 0.
\end{align*}
Since $(H\varepsilon,M\varepsilon)$ is jointly Gaussian, zero cross-covariance implies that $H\varepsilon$ and $M\varepsilon$ are independent.
The numerator $x_0\hat{\beta}-x_0\beta=a_0\varepsilon$ depends only on $H\varepsilon$, because
\begin{align*}
a_0M
= x_0(X^\top X)^{-1}X^\top(I_n-H)
= x_0(X^\top X)^{-1}(X^\top-X^\top H)
=0.
\end{align*}
The residual sum of squares depends only on $M\varepsilon$. Hence $x_0\hat{\beta}-x_0\beta$ and $\operatorname{RSS}$ are independent.
Because $M$ is a symmetric idempotent matrix of rank $n-p$, there exists an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$ such that
\begin{align*}
QMQ^\top
=
\begin{pmatrix}
I_{n-p} & 0 \\
0 & 0
\end{pmatrix}.
\end{align*}
Define $\eta: \Omega \to \mathbb{R}^n$ by
\begin{align*}
\eta := \sigma^{-1}Q\varepsilon.
\end{align*}
Since $Q$ is orthogonal and $\varepsilon \sim \mathcal{N}(0,\sigma^2 I_n)$, we have $\eta \sim \mathcal{N}(0,I_n)$. Therefore
\begin{align*}
\frac{\operatorname{RSS}}{\sigma^2}
&= \frac{|M\varepsilon|^2}{\sigma^2} \\
&= \frac{\varepsilon^\top M\varepsilon}{\sigma^2} \\
&= \eta^\top QMQ^\top \eta \\
&= \sum_{i=1}^{n-p}\eta_i^2.
\end{align*}
Thus $\operatorname{RSS}/\sigma^2$ has the chi-square distribution with $n-p$ degrees of freedom, and it is independent of $Z_0$.
[/step]
custom_env
admin
[step:Studentize the mean-response error and invert the probability statement]
Define
\begin{align*}
W := \frac{\operatorname{RSS}}{\sigma^2}.
\end{align*}
From the preceding step, $W$ has the chi-square distribution with $n-p$ degrees of freedom and is independent of $Z_0$. Since
\begin{align*}
\hat{\sigma}^2
= \frac{\operatorname{RSS}}{n-p}
= \sigma^2\frac{W}{n-p},
\end{align*}
we have
\begin{align*}
T_0
&:=
\frac{x_0\hat{\beta}-x_0\beta}{\hat{\sigma}\sqrt{h_0}} \\
&=
\frac{Z_0}{\sqrt{W/(n-p)}}.
\end{align*}
By the definition of the Student $t$ distribution, $T_0$ has the Student $t$ distribution with $n-p$ degrees of freedom.
Let $c:=t_{n-p,1-\alpha/2}$. Since the Student $t$ distribution is symmetric about $0$,
\begin{align*}
\mathbb{P}(-c \leq T_0 \leq c)=1-\alpha.
\end{align*}
Multiplying the inequalities by the positive random variable $\hat{\sigma}\sqrt{h_0}$ gives
\begin{align*}
\mathbb{P}\left(
-c\hat{\sigma}\sqrt{h_0}
\leq
x_0\hat{\beta}-x_0\beta
\leq
c\hat{\sigma}\sqrt{h_0}
\right)
=1-\alpha.
\end{align*}
Rearranging the two inequalities yields
\begin{align*}
\mathbb{P}\left(
x_0\beta \in
\left[
x_0\hat{\beta}
-
c\hat{\sigma}\sqrt{h_0},
\,
x_0\hat{\beta}
+
c\hat{\sigma}\sqrt{h_0}
\right]
\right)
=1-\alpha.
\end{align*}
This proves the stated confidence interval for the mean response.
[/step]
custom_env
admin
[step:Compute the prediction-error distribution and obtain the prediction interval]
Assume now that $\varepsilon_0: \Omega \to \mathbb{R}$ is independent of $\varepsilon$ and has distribution $\mathcal{N}(0,\sigma^2)$, and define
\begin{align*}
Y_0 := x_0\beta+\varepsilon_0.
\end{align*}
The prediction error centered at $x_0\hat{\beta}$ is
\begin{align*}
x_0\hat{\beta}-Y_0
=
(x_0\hat{\beta}-x_0\beta)-\varepsilon_0.
\end{align*}
The random variables $x_0\hat{\beta}-x_0\beta$ and $\varepsilon_0$ are independent because the first is a function of $\varepsilon$ and $\varepsilon_0$ is independent of $\varepsilon$. Hence $x_0\hat{\beta}-Y_0$ is Gaussian with mean $0$ and variance
\begin{align*}
\operatorname{Var}(x_0\hat{\beta}-Y_0)
&=
\operatorname{Var}(x_0\hat{\beta}-x_0\beta)
+
\operatorname{Var}(\varepsilon_0) \\
&=
\sigma^2h_0+\sigma^2 \\
&=
\sigma^2(1+h_0).
\end{align*}
Define
\begin{align*}
Z_1 :=
\frac{x_0\hat{\beta}-Y_0}{\sigma\sqrt{1+h_0}}.
\end{align*}
Then $Z_1 \sim \mathcal{N}(0,1)$.
The variable $Z_1$ is independent of $W=\operatorname{RSS}/\sigma^2$. Indeed, $W$ is a function of $M\varepsilon$, while $x_0\hat{\beta}-x_0\beta$ is a function of $H\varepsilon$, and these two Gaussian components are independent; moreover $\varepsilon_0$ is independent of $\varepsilon$, hence independent of both $H\varepsilon$ and $M\varepsilon$. Therefore
\begin{align*}
T_1
&:=
\frac{x_0\hat{\beta}-Y_0}{\hat{\sigma}\sqrt{1+h_0}} \\
&=
\frac{Z_1}{\sqrt{W/(n-p)}}
\end{align*}
has the Student $t$ distribution with $n-p$ degrees of freedom.
Again set $c:=t_{n-p,1-\alpha/2}$. Symmetry of the Student $t$ distribution gives
\begin{align*}
\mathbb{P}(-c \leq T_1 \leq c)=1-\alpha.
\end{align*}
Multiplying by the positive random variable $\hat{\sigma}\sqrt{1+h_0}$ and rearranging gives
\begin{align*}
\mathbb{P}\left(
Y_0 \in
\left[
x_0\hat{\beta}
-
c\hat{\sigma}\sqrt{1+h_0},
\,
x_0\hat{\beta}
+
c\hat{\sigma}\sqrt{1+h_0}
\right]
\right)
=1-\alpha.
\end{align*}
This is exactly the stated prediction interval.
[/step]