[proofplan]
The extra sum of squares $\mathrm{RSS}_0 - \mathrm{RSS}$ is a quadratic form in $Y$ with matrix $P - P_0$, where $P$ and $P_0$ are the orthogonal projections onto the column spaces of $X$ and $X_0$. Since the null hypothesis embeds the small model into the large one, $P_0 = P P_0$, which forces $P - P_0$ to be a symmetric idempotent of rank $p - p_0$ and orthogonal to the residual projection $I_n - P$. Centering at $Z = Y - X_0 \beta_0$ reduces the problem to quadratic forms in a centered Gaussian vector, at which point the [Independence of Orthogonal Quadratic Forms](/theorems/1448) and the [Chi-Squared Distribution of RSS](/theorems/1443) give the two chi-squared laws and their independence. The ratio definition of the F-distribution then yields $F \sim F_{p - p_0,\,n-p}$.
[/proofplan]
[step:Express $\mathrm{RSS}$ and $\mathrm{RSS}_0$ as quadratic forms in $Y$ via the projection matrices]
Let $P = X(X^\top X)^{-1} X^\top$ and $P_0 = X_0 (X_0^\top X_0)^{-1} X_0^\top$ be the orthogonal projections of $\mathbb{R}^n$ onto the column spaces $\operatorname{Range}(X)$ and $\operatorname{Range}(X_0)$ respectively. Both matrices are symmetric and idempotent. The residual vectors in the two models are $R = (I_n - P)Y$ and $R_0 = (I_n - P_0)Y$. Using symmetry and idempotence of $I_n - P$ (and analogously for $I_n - P_0$),
\begin{align*}
\mathrm{RSS} &= R^\top R = Y^\top (I_n - P)^\top (I_n - P)Y = Y^\top (I_n - P) Y, \\
\mathrm{RSS}_0 &= R_0^\top R_0 = Y^\top (I_n - P_0)Y.
\end{align*}
Subtracting,
\begin{align*}
\mathrm{RSS}_0 - \mathrm{RSS} = Y^\top \bigl[(I_n - P_0) - (I_n - P)\bigr] Y = Y^\top (P - P_0) Y.
\end{align*}
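The identity can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated design; the sizes, the seed, and the helper names `proj` and `rss_from_fit` are illustrative choices, not part of the proof. It compares the residual sums of squares obtained from least-squares fits with the quadratic-form expressions above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, p0 = 12, 4, 2                 # illustrative sizes (assumption)

X = rng.standard_normal((n, p))     # full design matrix (random, hypothetical)
X0 = X[:, :p0]                      # null design: the first p0 columns of X
Y = rng.standard_normal(n)          # any response vector works for this identity


def proj(A):
    """Orthogonal projection onto Range(A), assuming A has full column rank."""
    return A @ np.linalg.solve(A.T @ A, A.T)


def rss_from_fit(A, y):
    """Residual sum of squares from an ordinary least-squares fit of y on A."""
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ coef
    return float(resid @ resid)


P, P0 = proj(X), proj(X0)
rss, rss0 = rss_from_fit(X, Y), rss_from_fit(X0, Y)

# Quadratic-form expressions from the step above.
rss_qf = float(Y @ (np.eye(n) - P) @ Y)
extra_qf = float(Y @ (P - P0) @ Y)

print(rss, rss_qf)            # the two values should agree up to rounding
print(rss0 - rss, extra_qf)   # likewise for the extra sum of squares
```

The identity is purely algebraic, so it holds for any $Y$; no distributional assumption is used at this point.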
[/step]
[step:Verify that $P - P_0$ is symmetric idempotent of rank $p - p_0$]
**Nested range.** Since $X_0$ is a sub-matrix of $X$ (the null model is nested in the full model), the columns of $X_0$ lie in $\operatorname{Range}(X)$, so $\operatorname{Range}(X_0) \subseteq \operatorname{Range}(X)$. A projection leaves its range pointwise fixed: $P y = y$ for every $y \in \operatorname{Range}(X)$. Applied columnwise to $P_0 = X_0(X_0^\top X_0)^{-1} X_0^\top$ (whose range is $\operatorname{Range}(X_0) \subseteq \operatorname{Range}(X)$), this gives
\begin{align*}
P P_0 = P_0, \qquad \text{and taking transposes,} \qquad P_0 P = P_0^\top P^\top = (P P_0)^\top = P_0^\top = P_0.
\end{align*}
**Symmetry.** Both $P$ and $P_0$ are symmetric, so $P - P_0$ is symmetric.
**Idempotence.** Using $PP_0 = P_0 P = P_0$ and $P^2 = P$, $P_0^2 = P_0$:
\begin{align*}
(P - P_0)^2 = P^2 - P P_0 - P_0 P + P_0^2 = P - P_0 - P_0 + P_0 = P - P_0.
\end{align*}
**Rank.** For a symmetric idempotent, rank equals trace. Since $\operatorname{tr}(P) = p$ (the rank of $X$, see the computation in the proof of the [Chi-Squared Distribution of RSS](/theorems/1443)) and $\operatorname{tr}(P_0) = p_0$ analogously,
\begin{align*}
\operatorname{rank}(P - P_0) = \operatorname{tr}(P - P_0) = \operatorname{tr}(P) - \operatorname{tr}(P_0) = p - p_0.
\end{align*}
[guided]
We will apply the [Independence of Orthogonal Quadratic Forms](/theorems/1448) and the [Chi-Squared Distribution of RSS](/theorems/1443) to the matrices $I_n - P$ and $P - P_0$. Both results require the matrices to be **symmetric** and **idempotent**, and the independence result also requires the product to vanish. We therefore verify these properties carefully.
The key geometric fact is that $P_0$ projects onto a subspace of the range of $P$. Why? The null hypothesis $H_0$ sets the coefficients of the extra columns of $X$ to zero, so the null design $X_0$ consists of $p_0$ of the $p$ columns of $X$, and therefore $\operatorname{Range}(X_0) \subseteq \operatorname{Range}(X)$. If $y \in \operatorname{Range}(X)$, then $P$ fixes $y$. Since every column of $P_0$ lies in $\operatorname{Range}(X_0) \subseteq \operatorname{Range}(X)$, we get $P P_0 = P_0$. Transposing (both $P$ and $P_0$ are symmetric) gives $P_0 P = (P P_0)^\top = P_0^\top = P_0$.
**Symmetry** is clear: $(P - P_0)^\top = P^\top - P_0^\top = P - P_0$.
**Idempotence.** We expand:
\begin{align*}
(P - P_0)^2 = P^2 - P P_0 - P_0 P + P_0^2.
\end{align*}
The identity $P^2 = P$ holds because $P$ is a projection. The cross-terms both equal $P_0$ by the nested-range identity. And $P_0^2 = P_0$. So:
\begin{align*}
(P - P_0)^2 = P - P_0 - P_0 + P_0 = P - P_0.
\end{align*}
**Rank.** For a symmetric idempotent $A$, the eigenvalues are $0$ and $1$, so $\operatorname{rank}(A) = \operatorname{tr}(A)$. We computed $\operatorname{tr}(P) = p$ previously (via $\operatorname{tr}(X(X^\top X)^{-1}X^\top) = \operatorname{tr}((X^\top X)^{-1}X^\top X) = \operatorname{tr}(I_p) = p$), and by the same argument $\operatorname{tr}(P_0) = p_0$. Thus $\operatorname{rank}(P - P_0) = p - p_0$. This rank is exactly the number of constraints imposed by $H_0$ (the number of coefficients set to zero), which is why it shows up as the numerator degrees of freedom of the F-statistic.
[/guided]
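The properties verified in this step can be confirmed on a concrete example. The sketch below assumes NumPy and a random design whose null model keeps the first `p0` columns; the helper `proj`, the sizes, and the rank tolerance are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, p0 = 10, 5, 2                 # illustrative sizes (assumption)

X = rng.standard_normal((n, p))     # full design matrix (random, hypothetical)
X0 = X[:, :p0]                      # nested null design


def proj(A):
    """Orthogonal projection onto Range(A), assuming A has full column rank."""
    return A @ np.linalg.solve(A.T @ A, A.T)


P, P0 = proj(X), proj(X0)
D = P - P0

# Nested-range identities P P0 = P0 P = P0.
nested_ok = np.allclose(P @ P0, P0) and np.allclose(P0 @ P, P0)

# Symmetry and idempotence of D = P - P0.
symmetric_ok = np.allclose(D, D.T)
idempotent_ok = np.allclose(D @ D, D)

# Rank equals trace for a symmetric idempotent matrix.
trace_D = float(np.trace(D))
# An explicit tolerance guards against rounding in the near-zero singular values.
rank_D = int(np.linalg.matrix_rank(D, tol=1e-8))

print(nested_ok, symmetric_ok, idempotent_ok)
print(trace_D, rank_D, p - p0)      # trace and rank should both match p - p0
```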
[/step]
[step:Check the orthogonality $(I_n - P)(P - P_0) = 0$]
Expanding and again using $P^2 = P$, $P P_0 = P_0$:
\begin{align*}
(I_n - P)(P - P_0) = P - P_0 - P^2 + P P_0 = P - P_0 - P + P_0 = 0.
\end{align*}
So the projection onto the residual space of the full model is orthogonal (as matrices) to the projection $P - P_0$ that picks out the increment from $H_0$ to the full model.
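A short numerical sketch of this orthogonality, assuming NumPy and a random nested design (the helper `proj` and all sizes are illustrative). Besides the matrix product, it checks the consequence for an arbitrary vector $y$: the full-model residual $(I_n - P)y$ and the increment $(P - P_0)y$ are orthogonal, so their squared norms add up to the squared norm of the null-model residual $(I_n - P_0)y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, p0 = 9, 4, 1                  # illustrative sizes (assumption)

X = rng.standard_normal((n, p))     # full design matrix (random, hypothetical)
X0 = X[:, :p0]                      # nested null design


def proj(A):
    """Orthogonal projection onto Range(A), assuming A has full column rank."""
    return A @ np.linalg.solve(A.T @ A, A.T)


P, P0 = proj(X), proj(X0)
I = np.eye(n)

# The matrix product (I - P)(P - P0) should vanish up to rounding.
product = (I - P) @ (P - P0)
max_entry = float(np.max(np.abs(product)))

y = rng.standard_normal(n)
r_full = (I - P) @ y                # residual in the full model
incr = (P - P0) @ y                 # increment between the two fitted vectors
r_null = (I - P0) @ y               # residual in the null model

inner = float(r_full @ incr)        # should be zero up to rounding
# Pythagoras: |r_null|^2 = |r_full|^2 + |incr|^2, i.e. RSS_0 = RSS + (RSS_0 - RSS).
pythagoras_gap = float(r_null @ r_null - (r_full @ r_full + incr @ incr))

print(max_entry, inner, pythagoras_gap)
```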
[/step]
[step:Center at $Z = Y - X_0 \beta_0$ and rewrite both quadratic forms in $Z$]
Under $H_0$, $Y \sim N_n(X_0 \beta_0, \sigma^2 I_n)$, so
\begin{align*}
Z := Y - X_0 \beta_0 \sim N_n(\mathbf{0}, \sigma^2 I_n).
\end{align*}
We show the two quadratic forms in $Y$ coincide with the same quadratic forms in $Z$. Expanding $Y^\top A Y$ for any symmetric matrix $A$ with $Y = Z + X_0 \beta_0$:
\begin{align*}
Y^\top A Y = Z^\top A Z + 2\,\beta_0^\top X_0^\top A Z + \beta_0^\top X_0^\top A X_0 \beta_0.
\end{align*}
For $A = I_n - P$: the columns of $X_0$ lie in $\operatorname{Range}(X_0) \subseteq \operatorname{Range}(X)$, and $P$ fixes $\operatorname{Range}(X)$, so $(I_n - P) X_0 = X_0 - P X_0 = X_0 - X_0 = 0$ and $X_0^\top (I_n - P) = 0$ by transposition. Both the cross term and the constant term vanish, giving
\begin{align*}
\mathrm{RSS} = Y^\top(I_n - P)Y = Z^\top (I_n - P)Z.
\end{align*}
For $A = P - P_0$: we must show $(P - P_0) X_0 = 0$. Since $P X_0 = X_0$ (as above) and $P_0 X_0 = X_0$ (because the columns of $X_0$ lie in $\operatorname{Range}(X_0)$ and $P_0$ fixes that range),
\begin{align*}
(P - P_0) X_0 = X_0 - X_0 = 0,
\end{align*}
and $X_0^\top (P - P_0) = 0$ by transposition. Therefore
\begin{align*}
\mathrm{RSS}_0 - \mathrm{RSS} = Y^\top(P - P_0)Y = Z^\top (P - P_0)Z.
\end{align*}
[guided]
The quadratic forms $\mathrm{RSS}$ and $\mathrm{RSS}_0 - \mathrm{RSS}$ are written in terms of $Y$, but to apply the independence and chi-squared lemmas we need a **centered** Gaussian vector with covariance $\sigma^2 I_n$. Under $H_0$, the mean of $Y$ is $X_0 \beta_0$, so we define $Z = Y - X_0 \beta_0 \sim N_n(\mathbf{0}, \sigma^2 I_n)$. We now show that replacing $Y$ by $Z$ in each quadratic form gives the same value.
The algebraic identity we need holds for any symmetric matrix $A$:
\begin{align*}
Y^\top A Y = (Z + X_0 \beta_0)^\top A (Z + X_0 \beta_0) = Z^\top A Z + 2 \beta_0^\top X_0^\top A Z + \beta_0^\top X_0^\top A X_0 \beta_0.
\end{align*}
When $A X_0 = 0$ (and hence $X_0^\top A = 0$ by transposition, using $A^\top = A$), both the cross term and the constant vanish, so $Y^\top A Y = Z^\top A Z$.
**Case $A = I_n - P$.** We verify $(I_n - P) X_0 = 0$. Because the null model is nested in the full model, every column of $X_0$ is a column of $X$, so $\operatorname{Range}(X_0) \subseteq \operatorname{Range}(X)$. The projection $P$ fixes its range, so $P X_0 = X_0$, giving $(I_n - P) X_0 = X_0 - X_0 = 0$.
**Case $A = P - P_0$.** Again $P X_0 = X_0$ by the argument above, and $P_0 X_0 = X_0$ because the columns of $X_0$ lie in $\operatorname{Range}(X_0)$ and $P_0$ fixes that range by definition. So $(P - P_0) X_0 = X_0 - X_0 = 0$.
In both cases, the quadratic form is unchanged under the shift, and we have:
\begin{align*}
\mathrm{RSS} = Z^\top (I_n - P) Z, \qquad \mathrm{RSS}_0 - \mathrm{RSS} = Z^\top (P - P_0) Z.
\end{align*}
This is why we insist on $H_0$. If the true mean were $X\beta$ with a component outside $\operatorname{Range}(X_0)$, then $(I_n - P) X\beta = 0$ would still hold, so $\mathrm{RSS}/\sigma^2$ would remain a central $\chi^2_{n-p}$ variable, but $(P - P_0) X\beta \neq 0$. Centering $Y$ at its true mean would then leave a non-vanishing cross term and constant in $Y^\top (P - P_0) Y$, and $(\mathrm{RSS}_0 - \mathrm{RSS})/\sigma^2$ would follow a non-central chi-squared distribution with non-centrality parameter $\lVert (P - P_0) X\beta \rVert^2 / \sigma^2$. This non-centrality is the source of the test's power under the alternative.
[/guided]
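The shift invariance can be illustrated numerically. This is a sketch assuming NumPy; the design, the null coefficients `beta0`, the noise level `sigma`, and the alternative coefficient vector are all hypothetical values chosen for the example. It checks that both quadratic forms take the same value at $Y$ and at $Z$ under $H_0$, and, for contrast, that a mean vector involving a column outside $X_0$ is not annihilated by $P - P_0$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, p0 = 12, 4, 2                 # illustrative sizes (assumption)
sigma = 1.5                         # hypothetical noise level

X = rng.standard_normal((n, p))     # full design matrix (random, hypothetical)
X0 = X[:, :p0]                      # nested null design
beta0 = np.array([2.0, -1.0])       # hypothetical null coefficients


def proj(A):
    """Orthogonal projection onto Range(A), assuming A has full column rank."""
    return A @ np.linalg.solve(A.T @ A, A.T)


P, P0 = proj(X), proj(X0)
A_res = np.eye(n) - P               # matrix of RSS
A_inc = P - P0                      # matrix of RSS_0 - RSS

Z = sigma * rng.standard_normal(n)  # centered Gaussian noise
Y = X0 @ beta0 + Z                  # response generated under H0

# Under H0 both quadratic forms are unchanged when Y is replaced by Z.
same_rss = bool(np.isclose(Y @ A_res @ Y, Z @ A_res @ Z))
same_extra = bool(np.isclose(Y @ A_inc @ Y, Z @ A_inc @ Z))

# Contrast: a mean that uses the third column of X, which is not in X0.
mu_alt = X @ np.array([2.0, -1.0, 3.0, 0.0])
shift_alt = float(mu_alt @ A_inc @ mu_alt)   # equals |(P - P0) mu_alt|^2

print(same_rss, same_extra)
print(shift_alt)                    # strictly positive: the shift no longer cancels
```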
[/step]
[step:Apply the chi-squared and independence lemmas to $Z^\top (I_n - P)Z$ and $Z^\top (P - P_0)Z$]
We apply the [Quadratic Forms and Idempotent Matrices](/theorems/1441) lemma to each of $I_n - P$ and $P - P_0$. The lemma requires the matrix to be symmetric idempotent and $Z \sim N_n(\mathbf{0}, \sigma^2 I_n)$; its conclusion is that $Z^\top A Z / \sigma^2 \sim \chi^2_r$, where $r = \operatorname{rank}(A)$.
- $I_n - P$ is symmetric idempotent of rank $n - p$ (standard computation; see the proof of the [Chi-Squared Distribution of RSS](/theorems/1443)). Hence
\begin{align*}
\frac{\mathrm{RSS}}{\sigma^2} = \frac{Z^\top(I_n - P)Z}{\sigma^2} \sim \chi^2_{n - p}.
\end{align*}
- $P - P_0$ is symmetric idempotent of rank $p - p_0$ by Step 2. Hence
\begin{align*}
\frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\sigma^2} = \frac{Z^\top(P - P_0)Z}{\sigma^2} \sim \chi^2_{p - p_0}.
\end{align*}
For independence, we apply the [Independence of Orthogonal Quadratic Forms](/theorems/1448) with $A_1 = I_n - P$ and $A_2 = P - P_0$. The hypotheses are:
1. $Z \sim N_n(\mathbf{0}, \sigma^2 I_n)$: verified in Step 4.
2. $A_1, A_2$ symmetric idempotent: verified in the first bullet above for $A_1 = I_n - P$ and in Step 2 for $A_2 = P - P_0$.
3. $A_1 A_2 = 0$: verified in Step 3, where $(I_n - P)(P - P_0) = 0$.
The lemma's conclusion is that $Z^\top A_1 Z$ and $Z^\top A_2 Z$ are independent, i.e.\ $\mathrm{RSS}$ and $\mathrm{RSS}_0 - \mathrm{RSS}$ are independent.
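A Monte Carlo sketch can make these conclusions tangible. It assumes NumPy, a fixed random design, and hypothetical values for `sigma` and the number of replications. It compares the sample means of the two scaled quadratic forms with the degrees of freedom $n - p$ and $p - p_0$ (the mean of a $\chi^2_r$ variable is $r$) and reports their sample correlation. A correlation near zero is only consistent with independence and does not establish it; the proof above is what does.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, p0 = 12, 4, 2                 # illustrative sizes (assumption)
sigma = 1.5                         # hypothetical noise level
reps = 20000                        # number of simulated vectors Z

X = rng.standard_normal((n, p))     # full design matrix (random, hypothetical)
X0 = X[:, :p0]                      # nested null design


def proj(A):
    """Orthogonal projection onto Range(A), assuming A has full column rank."""
    return A @ np.linalg.solve(A.T @ A, A.T)


P, P0 = proj(X), proj(X0)
A1 = np.eye(n) - P
A2 = P - P0

# Each row of Z is one draw from N_n(0, sigma^2 I_n).
Z = sigma * rng.standard_normal((reps, n))

# Row-wise quadratic forms z^T A z, scaled by sigma^2.
q1 = np.sum((Z @ A1) * Z, axis=1) / sigma**2
q2 = np.sum((Z @ A2) * Z, axis=1) / sigma**2

mean_q1, mean_q2 = float(q1.mean()), float(q2.mean())
corr = float(np.corrcoef(q1, q2)[0, 1])

print(mean_q1, n - p)               # sample mean should be close to n - p
print(mean_q2, p - p0)              # sample mean should be close to p - p0
print(corr)                         # should be close to zero
```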
[/step]
[step:Assemble the F-statistic as a ratio of independent scaled chi-squareds]
By definition, if $U \sim \chi^2_a$ and $V \sim \chi^2_b$ are independent, then $(U/a)/(V/b) \sim F_{a, b}$. Applying this with
\begin{align*}
U = \frac{\mathrm{RSS}_0 - \mathrm{RSS}}{\sigma^2} \sim \chi^2_{p - p_0}, \qquad V = \frac{\mathrm{RSS}}{\sigma^2} \sim \chi^2_{n - p},
\end{align*}
which are independent by Step 5, and noting that $\sigma^2$ cancels in the ratio, we obtain
\begin{align*}
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)} = \frac{U/(p - p_0)}{V/(n - p)} \sim F_{p - p_0,\, n - p}.
\end{align*}
This completes the proof: under $H_0$, the test statistic $F$ follows the $F$-distribution with $(p - p_0, n - p)$ degrees of freedom.
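As a sanity check of the final statement, the sketch below simulates the statistic under $H_0$, assuming NumPy and hypothetical values for the design, `beta0`, and `sigma`. It uses the standard fact that an $F_{a,b}$ variable with $b > 2$ has mean $b/(b-2)$, so the sample mean of the simulated statistics should be close to $(n-p)/(n-p-2)$. Matching one moment does not identify the distribution; it is only a consistency check.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, p0 = 20, 4, 2                 # illustrative sizes (assumption)
sigma = 0.7                         # hypothetical noise level; F should not depend on it
reps = 20000                        # number of simulated data sets

X = rng.standard_normal((n, p))     # full design matrix (random, hypothetical)
X0 = X[:, :p0]                      # nested null design
beta0 = np.array([1.0, -2.0])       # hypothetical null coefficients


def proj(A):
    """Orthogonal projection onto Range(A), assuming A has full column rank."""
    return A @ np.linalg.solve(A.T @ A, A.T)


P, P0 = proj(X), proj(X0)
A1 = np.eye(n) - P
A2 = P - P0

# Each row of Y is one response vector generated under H0.
Y = X0 @ beta0 + sigma * rng.standard_normal((reps, n))

rss = np.sum((Y @ A1) * Y, axis=1)      # RSS for each data set
extra = np.sum((Y @ A2) * Y, axis=1)    # RSS_0 - RSS for each data set

F = (extra / (p - p0)) / (rss / (n - p))

mean_F = float(F.mean())
expected_mean = (n - p) / (n - p - 2)   # mean of F_{a,b} is b / (b - 2) for b > 2

print(mean_F, expected_mean)            # the two values should be close
```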
[/step]