[proofplan]
We express the unrestricted and restricted least-squares estimators explicitly and identify the loss caused by imposing the affine constraint $Rb=r$. The excess residual sum of squares is the quadratic form in the constraint discrepancy $R\hat{\beta}-r$, while the unrestricted residual sum of squares is the squared length of the Gaussian residual projection. Under $H_0$, these two quantities are independent chi-squared variables after division by $\sigma^2$, with degrees of freedom $q$ and $n-p$ respectively. The stated statistic is therefore their ratio divided by degrees of freedom, which is the defining form of the $F_{q,n-p}$ distribution.
[/proofplan]
[step:Write the unrestricted fit as an orthogonal projection]
Define
\begin{align*}
C &:= (X^\top X)^{-1} \in \mathbb{R}^{p \times p}, \\
P_X &:= XCX^\top \in \mathbb{R}^{n \times n}.
\end{align*}
Since $X$ has rank $p$, $X^\top X$ is invertible, and $P_X$ is the [orthogonal projection](/theorems/437) of $\mathbb{R}^n$ onto $\operatorname{Range}(X)$. The unrestricted least-squares estimator is the map
\begin{align*}
\hat{\beta}: \mathbb{R}^n &\to \mathbb{R}^p \\
y &\mapsto C X^\top y.
\end{align*}
Thus the unrestricted fitted value is $X\hat{\beta}=P_XY$, and
\begin{align*}
\operatorname{RSS}_1 = |Y-X\hat{\beta}|^2 = |(I_n-P_X)Y|^2.
\end{align*}
[guided]
We first separate the part of $Y$ explained by the design matrix from the residual part. Define
\begin{align*}
C &:= (X^\top X)^{-1} \in \mathbb{R}^{p \times p}, \\
P_X &:= XCX^\top \in \mathbb{R}^{n \times n}.
\end{align*}
The rank assumption $\operatorname{rank}(X)=p$ implies that $X^\top X$ is invertible. The matrix $P_X$ is symmetric and idempotent:
\begin{align*}
P_X^\top = P_X,
\qquad
P_X^2 = XCX^\top XCX^\top = XC(X^\top X)CX^\top = XCX^\top = P_X.
\end{align*}
Therefore $P_X$ is the orthogonal projection onto $\operatorname{Range}(X)$.
The unrestricted least-squares estimator is the [linear map](/page/Linear%20Map)
\begin{align*}
\hat{\beta}: \mathbb{R}^n &\to \mathbb{R}^p \\
y &\mapsto C X^\top y.
\end{align*}
For the observed random vector $Y$, the fitted value is $X\hat{\beta}=XCX^\top Y=P_XY$. Since least squares residuals are orthogonal to the column space of $X$, the unrestricted residual vector is
\begin{align*}
Y-X\hat{\beta}=(I_n-P_X)Y.
\end{align*}
Hence
\begin{align*}
\operatorname{RSS}_1 = |Y-X\hat{\beta}|^2 = |(I_n-P_X)Y|^2.
\end{align*}
[/guided]
[/step]
[step:Compute the excess sum of squares from the affine restriction]
Define
\begin{align*}
A := RCR^\top \in \mathbb{R}^{q \times q}.
\end{align*}
The matrix $A$ is positive definite because $R$ has rank $q$ and $C$ is positive definite. The restricted estimator is
\begin{align*}
\tilde{\beta}
:= \hat{\beta} - CR^\top A^{-1}(R\hat{\beta}-r).
\end{align*}
Indeed,
\begin{align*}
R\tilde{\beta}
= R\hat{\beta} - RCR^\top A^{-1}(R\hat{\beta}-r)
= R\hat{\beta} - (R\hat{\beta}-r)
= r.
\end{align*}
For every $b \in \mathbb{R}^p$ with $Rb=r$, write $h:=b-\tilde{\beta}$. Then $Rh=0$, and
\begin{align*}
(Y-Xb)
= (Y-X\hat{\beta}) + X(\hat{\beta}-\tilde{\beta}) - Xh.
\end{align*}
The three terms are pairwise orthogonal: $Y-X\hat{\beta}$ is orthogonal to $\operatorname{Range}(X)$, and
\begin{align*}
\langle X(\hat{\beta}-\tilde{\beta}),Xh\rangle_{\mathbb{R}^n}
&= (\hat{\beta}-\tilde{\beta})^\top X^\top Xh \\
&= (R\hat{\beta}-r)^\top A^{-1}RCX^\top Xh \\
&= (R\hat{\beta}-r)^\top A^{-1}Rh \\
&= 0.
\end{align*}
Therefore $\tilde{\beta}$ minimizes the restricted least-squares problem, and
\begin{align*}
\operatorname{RSS}_0-\operatorname{RSS}_1
&= |X(\hat{\beta}-\tilde{\beta})|^2 \\
&= (\hat{\beta}-\tilde{\beta})^\top X^\top X(\hat{\beta}-\tilde{\beta}) \\
&= (R\hat{\beta}-r)^\top A^{-1}(R\hat{\beta}-r).
\end{align*}
[guided]
The restricted problem is an affine least-squares problem because the constraint is $Rb=r$, not necessarily $Rb=0$. Define
\begin{align*}
A := RCR^\top \in \mathbb{R}^{q \times q}.
\end{align*}
We need $A^{-1}$ to exist. Since $C=(X^\top X)^{-1}$ is positive definite and $R$ has rank $q$, every nonzero vector $u \in \mathbb{R}^q$ satisfies $R^\top u \ne 0$, and hence
\begin{align*}
u^\top A u
= u^\top RCR^\top u
= (R^\top u)^\top C(R^\top u)
> 0.
\end{align*}
Thus $A$ is positive definite and invertible.
Define
\begin{align*}
\tilde{\beta}
:= \hat{\beta} - CR^\top A^{-1}(R\hat{\beta}-r).
\end{align*}
This is the unrestricted estimator corrected in the directions needed to enforce the constraint. We verify feasibility:
\begin{align*}
R\tilde{\beta}
= R\hat{\beta} - RCR^\top A^{-1}(R\hat{\beta}-r)
= R\hat{\beta} - A A^{-1}(R\hat{\beta}-r)
= r.
\end{align*}
Now let $b \in \mathbb{R}^p$ satisfy $Rb=r$, and define $h:=b-\tilde{\beta} \in \mathbb{R}^p$. Since both $b$ and $\tilde{\beta}$ satisfy the restriction, $Rh=0$. Decompose the residual:
\begin{align*}
Y-Xb
= (Y-X\hat{\beta}) + X(\hat{\beta}-\tilde{\beta}) - Xh.
\end{align*}
The first term is orthogonal to $\operatorname{Range}(X)$ by the normal equations for unrestricted least squares, so it is orthogonal to both $X(\hat{\beta}-\tilde{\beta})$ and $Xh$. It remains to check the orthogonality of the two fitted-space terms:
\begin{align*}
\langle X(\hat{\beta}-\tilde{\beta}),Xh\rangle_{\mathbb{R}^n}
&= (\hat{\beta}-\tilde{\beta})^\top X^\top Xh \\
&= \bigl(CR^\top A^{-1}(R\hat{\beta}-r)\bigr)^\top X^\top Xh \\
&= (R\hat{\beta}-r)^\top A^{-1}RCX^\top Xh \\
&= (R\hat{\beta}-r)^\top A^{-1}Rh \\
&= 0.
\end{align*}
The [Pythagorean theorem](/theorems/3266) gives
\begin{align*}
|Y-Xb|^2
= |Y-X\hat{\beta}|^2 + |X(\hat{\beta}-\tilde{\beta})|^2 + |Xh|^2.
\end{align*}
This is minimized exactly when $h=0$, so $\tilde{\beta}$ is the restricted least-squares estimator. Therefore
\begin{align*}
\operatorname{RSS}_0-\operatorname{RSS}_1
&= |X(\hat{\beta}-\tilde{\beta})|^2 \\
&= (\hat{\beta}-\tilde{\beta})^\top X^\top X(\hat{\beta}-\tilde{\beta}) \\
&= (R\hat{\beta}-r)^\top A^{-1}RCX^\top XCR^\top A^{-1}(R\hat{\beta}-r) \\
&= (R\hat{\beta}-r)^\top A^{-1}RCR^\top A^{-1}(R\hat{\beta}-r) \\
&= (R\hat{\beta}-r)^\top A^{-1}(R\hat{\beta}-r).
\end{align*}
[/guided]
[/step]
[step:Identify the numerator as a chi-squared variable with $q$ degrees of freedom]
Under $H_0$, $R\beta=r$. Since
\begin{align*}
\hat{\beta}
= C X^\top Y
= \beta + C X^\top \varepsilon,
\end{align*}
we have
\begin{align*}
R\hat{\beta}-r = RCX^\top \varepsilon.
\end{align*}
Thus $R\hat{\beta}-r$ is Gaussian with mean $0$ and covariance matrix
\begin{align*}
\operatorname{Cov}(R\hat{\beta}-r)
= \sigma^2 RCX^\top XCR^\top
= \sigma^2 RCR^\top
= \sigma^2 A.
\end{align*}
Let $A^{1/2} \in \mathbb{R}^{q \times q}$ denote the unique symmetric positive definite square root of $A$, and let $A^{-1/2}:=(A^{1/2})^{-1}$. Define the random vector
\begin{align*}
Z := \sigma^{-1} A^{-1/2}(R\hat{\beta}-r) \in \mathbb{R}^q.
\end{align*}
Then $Z$ has the $q$-dimensional normal distribution with mean $0$ and covariance $I_q$, denoted $Z \sim \mathcal{N}_q(0,I_q)$. Therefore $Z^\top Z$ has the chi-squared distribution with $q$ degrees of freedom, denoted $\chi^2_q$, and
\begin{align*}
\frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}
= Z^\top Z
\sim \chi^2_q.
\end{align*}
[guided]
Under the null hypothesis $H_0$, the true parameter satisfies $R\beta=r$. The normal linear model writes the observed random vector as $Y=X\beta+\varepsilon$, where $\varepsilon \sim \mathcal{N}_n(0,\sigma^2 I_n)$. Substituting this into the unrestricted estimator gives
\begin{align*}
\hat{\beta}
= C X^\top Y
= C X^\top(X\beta+\varepsilon)
= C(X^\top X)\beta + C X^\top \varepsilon
= \beta + C X^\top \varepsilon.
\end{align*}
Applying $R$ and using $R\beta=r$ yields
\begin{align*}
R\hat{\beta}-r = RCX^\top \varepsilon.
\end{align*}
Thus $R\hat{\beta}-r$ is a linear transformation of the Gaussian vector $\varepsilon$, so it is Gaussian. Its mean is zero, and its covariance matrix is
\begin{align*}
\operatorname{Cov}(R\hat{\beta}-r)
&= \operatorname{Cov}(RCX^\top \varepsilon) \\
&= RCX^\top \operatorname{Cov}(\varepsilon)XCR^\top \\
&= \sigma^2 RCX^\top XCR^\top \\
&= \sigma^2 RCR^\top \\
&= \sigma^2 A.
\end{align*}
We now standardize this Gaussian vector. Since $A$ is symmetric positive definite, it has a unique symmetric positive definite square root $A^{1/2} \in \mathbb{R}^{q \times q}$. Define $A^{-1/2}:=(A^{1/2})^{-1}$ and define
\begin{align*}
Z := \sigma^{-1} A^{-1/2}(R\hat{\beta}-r) \in \mathbb{R}^q.
\end{align*}
The covariance of $Z$ is
\begin{align*}
\operatorname{Cov}(Z)
&= \sigma^{-2}A^{-1/2}\operatorname{Cov}(R\hat{\beta}-r)A^{-1/2} \\
&= \sigma^{-2}A^{-1/2}(\sigma^2 A)A^{-1/2} \\
&= I_q.
\end{align*}
Hence $Z$ has the $q$-dimensional normal distribution with mean $0$ and covariance $I_q$, denoted $Z \sim \mathcal{N}_q(0,I_q)$. By the definition of the chi-squared distribution, $Z^\top Z$ has the chi-squared distribution with $q$ degrees of freedom, denoted $\chi^2_q$. Finally,
\begin{align*}
Z^\top Z
&= \sigma^{-2}(R\hat{\beta}-r)^\top A^{-1/2}A^{-1/2}(R\hat{\beta}-r) \\
&= \sigma^{-2}(R\hat{\beta}-r)^\top A^{-1}(R\hat{\beta}-r) \\
&= \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}.
\end{align*}
Therefore
\begin{align*}
\frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}
\sim \chi^2_q.
\end{align*}
[/guided]
[/step]
[step:Identify the denominator as an independent chi-squared variable with $n-p$ degrees of freedom]
Since $P_X$ is an orthogonal projection of rank $p$, the matrix $I_n-P_X$ is an orthogonal projection of rank $n-p$. Under $H_0$,
\begin{align*}
(I_n-P_X)Y
= (I_n-P_X)(X\beta+\varepsilon)
= (I_n-P_X)\varepsilon,
\end{align*}
because $X\beta \in \operatorname{Range}(X)$. Therefore
\begin{align*}
\frac{\operatorname{RSS}_1}{\sigma^2}
= \frac{|(I_n-P_X)\varepsilon|^2}{\sigma^2}
\sim \chi^2_{n-p}.
\end{align*}
Moreover, $R\hat{\beta}-r=RCX^\top\varepsilon$ and $(I_n-P_X)\varepsilon$ are jointly Gaussian linear transformations of $\varepsilon$, and their covariance is
\begin{align*}
\operatorname{Cov}\bigl(RCX^\top\varepsilon,(I_n-P_X)\varepsilon\bigr)
= \sigma^2 RCX^\top(I_n-P_X)
= 0.
\end{align*}
Joint Gaussian random vectors with zero covariance are independent, so
\begin{align*}
\frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}
\quad\text{and}\quad
\frac{\operatorname{RSS}_1}{\sigma^2}
\end{align*}
are independent.
[guided]
The unrestricted residual is the part of the noise lying orthogonally to the column space of $X$. Since $P_X$ is an orthogonal projection onto a $p$-dimensional subspace, $I_n-P_X$ is an orthogonal projection onto the orthogonal complement, whose dimension is $n-p$.
Under $H_0$ we still have $Y=X\beta+\varepsilon$, and $X\beta \in \operatorname{Range}(X)$. Hence
\begin{align*}
(I_n-P_X)Y
= (I_n-P_X)(X\beta+\varepsilon)
= (I_n-P_X)X\beta + (I_n-P_X)\varepsilon
= (I_n-P_X)\varepsilon.
\end{align*}
Because $\varepsilon \sim \mathcal{N}_n(0,\sigma^2 I_n)$, the squared length of its projection onto an $(n-p)$-dimensional subspace, divided by $\sigma^2$, has the $\chi^2_{n-p}$ distribution:
\begin{align*}
\frac{\operatorname{RSS}_1}{\sigma^2}
= \frac{|(I_n-P_X)\varepsilon|^2}{\sigma^2}
\sim \chi^2_{n-p}.
\end{align*}
We also need independence from the numerator. The numerator depends only on
\begin{align*}
R\hat{\beta}-r=RCX^\top\varepsilon,
\end{align*}
while the denominator depends only on $(I_n-P_X)\varepsilon$. These are jointly Gaussian because both are linear transformations of the Gaussian vector $\varepsilon$. Their covariance is
\begin{align*}
\operatorname{Cov}\bigl(RCX^\top\varepsilon,(I_n-P_X)\varepsilon\bigr)
= \sigma^2 RCX^\top(I_n-P_X).
\end{align*}
Since $P_X=XCX^\top$, we have $X^\top P_X=X^\top$, and therefore
\begin{align*}
X^\top(I_n-P_X)=X^\top-X^\top P_X=0.
\end{align*}
Thus the covariance is zero. For jointly Gaussian random vectors, zero covariance implies independence. Consequently,
\begin{align*}
\frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}
\quad\text{and}\quad
\frac{\operatorname{RSS}_1}{\sigma^2}
\end{align*}
are independent.
[/guided]
[/step]
[step:Form the ratio and conclude the $F$ distribution]
Define
\begin{align*}
U := \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2},
\qquad
V := \frac{\operatorname{RSS}_1}{\sigma^2}.
\end{align*}
The previous steps show that $U \sim \chi^2_q$, $V \sim \chi^2_{n-p}$, and $U$ and $V$ are independent. The distribution $F_{q,n-p}$ denotes the law of $(U/q)/(V/(n-p))$ when $U$ and $V$ are independent chi-squared random variables with $q$ and $n-p$ degrees of freedom. Hence
\begin{align*}
\frac{U/q}{V/(n-p)} \sim F_{q,n-p}.
\end{align*}
Substituting the definitions of $U$ and $V$ gives
\begin{align*}
\frac{U/q}{V/(n-p)}
=
\frac{(\operatorname{RSS}_0-\operatorname{RSS}_1)/q}{\operatorname{RSS}_1/(n-p)}.
\end{align*}
Therefore
\begin{align*}
F
=
\frac{(\operatorname{RSS}_0-\operatorname{RSS}_1)/q}{\operatorname{RSS}_1/(n-p)}
\sim F_{q,n-p},
\end{align*}
as required.
[/step]