Partial F-Test for Linear Restrictions in the Normal Linear Model

Partial F-Test for Linear Restrictions in the Normal Linear Model (Theorem # 4445)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We express the unrestricted and restricted least-squares estimators explicitly and identify the loss caused by imposing the affine constraint $Rb=r$. The excess residual sum of squares is the quadratic form in the constraint discrepancy $R\hat{\beta}-r$, while the unrestricted residual sum of squares is the squared length of the Gaussian residual projection. Under $H_0$, these two quantities are independent chi-squared variables after division by $\sigma^2$, with degrees of freedom $q$ and $n-p$ respectively. The stated statistic is therefore their ratio divided by degrees of freedom, which is the defining form of the $F_{q,n-p}$ distribution. [/proofplan] [step:Write the unrestricted fit as an orthogonal projection] Define \begin{align*} C &:= (X^\top X)^{-1} \in \mathbb{R}^{p \times p}, \\ P_X &:= XCX^\top \in \mathbb{R}^{n \times n}. \end{align*} Since $X$ has rank $p$, $X^\top X$ is invertible, and $P_X$ is the [orthogonal projection](/theorems/437) of $\mathbb{R}^n$ onto $\operatorname{Range}(X)$. The unrestricted least-squares estimator is the map \begin{align*} \hat{\beta}: \mathbb{R}^n &\to \mathbb{R}^p \\ y &\mapsto C X^\top y. \end{align*} Thus the unrestricted fitted value is $X\hat{\beta}=P_XY$, and \begin{align*} \operatorname{RSS}_1 = |Y-X\hat{\beta}|^2 = |(I_n-P_X)Y|^2. \end{align*} [guided] We first separate the part of $Y$ explained by the design matrix from the residual part. Define \begin{align*} C &:= (X^\top X)^{-1} \in \mathbb{R}^{p \times p}, \\ P_X &:= XCX^\top \in \mathbb{R}^{n \times n}. \end{align*} The rank assumption $\operatorname{rank}(X)=p$ implies that $X^\top X$ is invertible. The matrix $P_X$ is symmetric and idempotent: \begin{align*} P_X^\top = P_X, \qquad P_X^2 = XCX^\top XCX^\top = XC(X^\top X)CX^\top = XCX^\top = P_X. \end{align*} Therefore $P_X$ is the orthogonal projection onto $\operatorname{Range}(X)$. The unrestricted least-squares estimator is the [linear map](/page/Linear%20Map) \begin{align*} \hat{\beta}: \mathbb{R}^n &\to \mathbb{R}^p \\ y &\mapsto C X^\top y. \end{align*} For the observed random vector $Y$, the fitted value is $X\hat{\beta}=XCX^\top Y=P_XY$. Since least squares residuals are orthogonal to the column space of $X$, the unrestricted residual vector is \begin{align*} Y-X\hat{\beta}=(I_n-P_X)Y. \end{align*} Hence \begin{align*} \operatorname{RSS}_1 = |Y-X\hat{\beta}|^2 = |(I_n-P_X)Y|^2. \end{align*} [/guided] [/step] [step:Compute the excess sum of squares from the affine restriction] Define \begin{align*} A := RCR^\top \in \mathbb{R}^{q \times q}. \end{align*} The matrix $A$ is positive definite because $R$ has rank $q$ and $C$ is positive definite. The restricted estimator is \begin{align*} \tilde{\beta} := \hat{\beta} - CR^\top A^{-1}(R\hat{\beta}-r). \end{align*} Indeed, \begin{align*} R\tilde{\beta} = R\hat{\beta} - RCR^\top A^{-1}(R\hat{\beta}-r) = R\hat{\beta} - (R\hat{\beta}-r) = r. \end{align*} For every $b \in \mathbb{R}^p$ with $Rb=r$, write $h:=b-\tilde{\beta}$. Then $Rh=0$, and \begin{align*} (Y-Xb) = (Y-X\hat{\beta}) + X(\hat{\beta}-\tilde{\beta}) - Xh. \end{align*} The three terms are pairwise orthogonal: $Y-X\hat{\beta}$ is orthogonal to $\operatorname{Range}(X)$, and \begin{align*} \langle X(\hat{\beta}-\tilde{\beta}),Xh\rangle_{\mathbb{R}^n} &= (\hat{\beta}-\tilde{\beta})^\top X^\top Xh \\ &= (R\hat{\beta}-r)^\top A^{-1}RCX^\top Xh \\ &= (R\hat{\beta}-r)^\top A^{-1}Rh \\ &= 0. \end{align*} Therefore $\tilde{\beta}$ minimizes the restricted least-squares problem, and \begin{align*} \operatorname{RSS}_0-\operatorname{RSS}_1 &= |X(\hat{\beta}-\tilde{\beta})|^2 \\ &= (\hat{\beta}-\tilde{\beta})^\top X^\top X(\hat{\beta}-\tilde{\beta}) \\ &= (R\hat{\beta}-r)^\top A^{-1}(R\hat{\beta}-r). \end{align*} [guided] The restricted problem is an affine least-squares problem because the constraint is $Rb=r$, not necessarily $Rb=0$. Define \begin{align*} A := RCR^\top \in \mathbb{R}^{q \times q}. \end{align*} We need $A^{-1}$ to exist. Since $C=(X^\top X)^{-1}$ is positive definite and $R$ has rank $q$, every nonzero vector $u \in \mathbb{R}^q$ satisfies $R^\top u \ne 0$, and hence \begin{align*} u^\top A u = u^\top RCR^\top u = (R^\top u)^\top C(R^\top u) > 0. \end{align*} Thus $A$ is positive definite and invertible. Define \begin{align*} \tilde{\beta} := \hat{\beta} - CR^\top A^{-1}(R\hat{\beta}-r). \end{align*} This is the unrestricted estimator corrected in the directions needed to enforce the constraint. We verify feasibility: \begin{align*} R\tilde{\beta} = R\hat{\beta} - RCR^\top A^{-1}(R\hat{\beta}-r) = R\hat{\beta} - A A^{-1}(R\hat{\beta}-r) = r. \end{align*} Now let $b \in \mathbb{R}^p$ satisfy $Rb=r$, and define $h:=b-\tilde{\beta} \in \mathbb{R}^p$. Since both $b$ and $\tilde{\beta}$ satisfy the restriction, $Rh=0$. Decompose the residual: \begin{align*} Y-Xb = (Y-X\hat{\beta}) + X(\hat{\beta}-\tilde{\beta}) - Xh. \end{align*} The first term is orthogonal to $\operatorname{Range}(X)$ by the normal equations for unrestricted least squares, so it is orthogonal to both $X(\hat{\beta}-\tilde{\beta})$ and $Xh$. It remains to check the orthogonality of the two fitted-space terms: \begin{align*} \langle X(\hat{\beta}-\tilde{\beta}),Xh\rangle_{\mathbb{R}^n} &= (\hat{\beta}-\tilde{\beta})^\top X^\top Xh \\ &= \bigl(CR^\top A^{-1}(R\hat{\beta}-r)\bigr)^\top X^\top Xh \\ &= (R\hat{\beta}-r)^\top A^{-1}RCX^\top Xh \\ &= (R\hat{\beta}-r)^\top A^{-1}Rh \\ &= 0. \end{align*} The [Pythagorean theorem](/theorems/3266) gives \begin{align*} |Y-Xb|^2 = |Y-X\hat{\beta}|^2 + |X(\hat{\beta}-\tilde{\beta})|^2 + |Xh|^2. \end{align*} This is minimized exactly when $h=0$, so $\tilde{\beta}$ is the restricted least-squares estimator. Therefore \begin{align*} \operatorname{RSS}_0-\operatorname{RSS}_1 &= |X(\hat{\beta}-\tilde{\beta})|^2 \\ &= (\hat{\beta}-\tilde{\beta})^\top X^\top X(\hat{\beta}-\tilde{\beta}) \\ &= (R\hat{\beta}-r)^\top A^{-1}RCX^\top XCR^\top A^{-1}(R\hat{\beta}-r) \\ &= (R\hat{\beta}-r)^\top A^{-1}RCR^\top A^{-1}(R\hat{\beta}-r) \\ &= (R\hat{\beta}-r)^\top A^{-1}(R\hat{\beta}-r). \end{align*} [/guided] [/step] [step:Identify the numerator as a chi-squared variable with $q$ degrees of freedom] Under $H_0$, $R\beta=r$. Since \begin{align*} \hat{\beta} = C X^\top Y = \beta + C X^\top \varepsilon, \end{align*} we have \begin{align*} R\hat{\beta}-r = RCX^\top \varepsilon. \end{align*} Thus $R\hat{\beta}-r$ is Gaussian with mean $0$ and covariance matrix \begin{align*} \operatorname{Cov}(R\hat{\beta}-r) = \sigma^2 RCX^\top XCR^\top = \sigma^2 RCR^\top = \sigma^2 A. \end{align*} Let $A^{1/2} \in \mathbb{R}^{q \times q}$ denote the unique symmetric positive definite square root of $A$, and let $A^{-1/2}:=(A^{1/2})^{-1}$. Define the random vector \begin{align*} Z := \sigma^{-1} A^{-1/2}(R\hat{\beta}-r) \in \mathbb{R}^q. \end{align*} Then $Z$ has the $q$-dimensional normal distribution with mean $0$ and covariance $I_q$, denoted $Z \sim \mathcal{N}_q(0,I_q)$. Therefore $Z^\top Z$ has the chi-squared distribution with $q$ degrees of freedom, denoted $\chi^2_q$, and \begin{align*} \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2} = Z^\top Z \sim \chi^2_q. \end{align*} [guided] Under the null hypothesis $H_0$, the true parameter satisfies $R\beta=r$. The normal linear model writes the observed random vector as $Y=X\beta+\varepsilon$, where $\varepsilon \sim \mathcal{N}_n(0,\sigma^2 I_n)$. Substituting this into the unrestricted estimator gives \begin{align*} \hat{\beta} = C X^\top Y = C X^\top(X\beta+\varepsilon) = C(X^\top X)\beta + C X^\top \varepsilon = \beta + C X^\top \varepsilon. \end{align*} Applying $R$ and using $R\beta=r$ yields \begin{align*} R\hat{\beta}-r = RCX^\top \varepsilon. \end{align*} Thus $R\hat{\beta}-r$ is a linear transformation of the Gaussian vector $\varepsilon$, so it is Gaussian. Its mean is zero, and its covariance matrix is \begin{align*} \operatorname{Cov}(R\hat{\beta}-r) &= \operatorname{Cov}(RCX^\top \varepsilon) \\ &= RCX^\top \operatorname{Cov}(\varepsilon)XCR^\top \\ &= \sigma^2 RCX^\top XCR^\top \\ &= \sigma^2 RCR^\top \\ &= \sigma^2 A. \end{align*} We now standardize this Gaussian vector. Since $A$ is symmetric positive definite, it has a unique symmetric positive definite square root $A^{1/2} \in \mathbb{R}^{q \times q}$. Define $A^{-1/2}:=(A^{1/2})^{-1}$ and define \begin{align*} Z := \sigma^{-1} A^{-1/2}(R\hat{\beta}-r) \in \mathbb{R}^q. \end{align*} The covariance of $Z$ is \begin{align*} \operatorname{Cov}(Z) &= \sigma^{-2}A^{-1/2}\operatorname{Cov}(R\hat{\beta}-r)A^{-1/2} \\ &= \sigma^{-2}A^{-1/2}(\sigma^2 A)A^{-1/2} \\ &= I_q. \end{align*} Hence $Z$ has the $q$-dimensional normal distribution with mean $0$ and covariance $I_q$, denoted $Z \sim \mathcal{N}_q(0,I_q)$. By the definition of the chi-squared distribution, $Z^\top Z$ has the chi-squared distribution with $q$ degrees of freedom, denoted $\chi^2_q$. Finally, \begin{align*} Z^\top Z &= \sigma^{-2}(R\hat{\beta}-r)^\top A^{-1/2}A^{-1/2}(R\hat{\beta}-r) \\ &= \sigma^{-2}(R\hat{\beta}-r)^\top A^{-1}(R\hat{\beta}-r) \\ &= \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}. \end{align*} Therefore \begin{align*} \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2} \sim \chi^2_q. \end{align*} [/guided] [/step] [step:Identify the denominator as an independent chi-squared variable with $n-p$ degrees of freedom] Since $P_X$ is an orthogonal projection of rank $p$, the matrix $I_n-P_X$ is an orthogonal projection of rank $n-p$. Under $H_0$, \begin{align*} (I_n-P_X)Y = (I_n-P_X)(X\beta+\varepsilon) = (I_n-P_X)\varepsilon, \end{align*} because $X\beta \in \operatorname{Range}(X)$. Therefore \begin{align*} \frac{\operatorname{RSS}_1}{\sigma^2} = \frac{|(I_n-P_X)\varepsilon|^2}{\sigma^2} \sim \chi^2_{n-p}. \end{align*} Moreover, $R\hat{\beta}-r=RCX^\top\varepsilon$ and $(I_n-P_X)\varepsilon$ are jointly Gaussian linear transformations of $\varepsilon$, and their covariance is \begin{align*} \operatorname{Cov}\bigl(RCX^\top\varepsilon,(I_n-P_X)\varepsilon\bigr) = \sigma^2 RCX^\top(I_n-P_X) = 0. \end{align*} Joint Gaussian random vectors with zero covariance are independent, so \begin{align*} \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2} \quad\text{and}\quad \frac{\operatorname{RSS}_1}{\sigma^2} \end{align*} are independent. [guided] The unrestricted residual is the part of the noise lying orthogonally to the column space of $X$. Since $P_X$ is an orthogonal projection onto a $p$-dimensional subspace, $I_n-P_X$ is an orthogonal projection onto the orthogonal complement, whose dimension is $n-p$. Under $H_0$ we still have $Y=X\beta+\varepsilon$, and $X\beta \in \operatorname{Range}(X)$. Hence \begin{align*} (I_n-P_X)Y = (I_n-P_X)(X\beta+\varepsilon) = (I_n-P_X)X\beta + (I_n-P_X)\varepsilon = (I_n-P_X)\varepsilon. \end{align*} Because $\varepsilon \sim \mathcal{N}_n(0,\sigma^2 I_n)$, the squared length of its projection onto an $(n-p)$-dimensional subspace, divided by $\sigma^2$, has the $\chi^2_{n-p}$ distribution: \begin{align*} \frac{\operatorname{RSS}_1}{\sigma^2} = \frac{|(I_n-P_X)\varepsilon|^2}{\sigma^2} \sim \chi^2_{n-p}. \end{align*} We also need independence from the numerator. The numerator depends only on \begin{align*} R\hat{\beta}-r=RCX^\top\varepsilon, \end{align*} while the denominator depends only on $(I_n-P_X)\varepsilon$. These are jointly Gaussian because both are linear transformations of the Gaussian vector $\varepsilon$. Their covariance is \begin{align*} \operatorname{Cov}\bigl(RCX^\top\varepsilon,(I_n-P_X)\varepsilon\bigr) = \sigma^2 RCX^\top(I_n-P_X). \end{align*} Since $P_X=XCX^\top$, we have $X^\top P_X=X^\top$, and therefore \begin{align*} X^\top(I_n-P_X)=X^\top-X^\top P_X=0. \end{align*} Thus the covariance is zero. For jointly Gaussian random vectors, zero covariance implies independence. Consequently, \begin{align*} \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2} \quad\text{and}\quad \frac{\operatorname{RSS}_1}{\sigma^2} \end{align*} are independent. [/guided] [/step] [step:Form the ratio and conclude the $F$ distribution] Define \begin{align*} U := \frac{\operatorname{RSS}_0-\operatorname{RSS}_1}{\sigma^2}, \qquad V := \frac{\operatorname{RSS}_1}{\sigma^2}. \end{align*} The previous steps show that $U \sim \chi^2_q$, $V \sim \chi^2_{n-p}$, and $U$ and $V$ are independent. The distribution $F_{q,n-p}$ denotes the law of $(U/q)/(V/(n-p))$ when $U$ and $V$ are independent chi-squared random variables with $q$ and $n-p$ degrees of freedom. Hence \begin{align*} \frac{U/q}{V/(n-p)} \sim F_{q,n-p}. \end{align*} Substituting the definitions of $U$ and $V$ gives \begin{align*} \frac{U/q}{V/(n-p)} = \frac{(\operatorname{RSS}_0-\operatorname{RSS}_1)/q}{\operatorname{RSS}_1/(n-p)}. \end{align*} Therefore \begin{align*} F = \frac{(\operatorname{RSS}_0-\operatorname{RSS}_1)/q}{\operatorname{RSS}_1/(n-p)} \sim F_{q,n-p}, \end{align*} as required. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Orthogonal Projection

Definitions & Concepts

Explore Further

Distribution Definition Orthogonality Definition Orthogonal Projection Theorem #437 Gambler's Ruin Probability Probability Theory Positive Disjoint Events Are Not Independent Probability Theory Wald, Likelihood-Ratio, and Score Tests in Regular Generalized Linear Models Probability & Statistics Basic Implications Between Modes of Convergence Probability Theory Invariance Properties of Brownian Motion Brownian Motion Kolmogorov's Continuity Criterion Stochastic Processes Reflection Principle for Brownian Motion Brownian Motion Limiting Events Are Measurable Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.