Wald Form of the Linear Restriction F-Test — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first derive the constrained least-squares estimator under the restriction $R\beta=r$ and show that the increase in residual sum of squares is exactly the Wald quadratic form. Then we compute the distribution of $R\hat{\beta}-r$ under $H_0$ and standardise it to obtain a $\chi^2_q$ random variable. Finally, we show that this numerator is independent of the residual variance estimator, whose scaled form has the $\chi^2_{n-p}$ distribution, and the quotient is therefore $F_{q,n-p}$. [/proofplan] [step:Express the constrained residual sum of squares as a quadratic form in $R\hat{\beta}-r$] Let $A \in \mathbb{R}^{p \times p}$ denote the inverse Gram matrix \begin{align*} A = (X^\top X)^{-1}, \end{align*} and let $V \in \mathbb{R}^{q \times q}$ denote the restriction covariance matrix without the factor $\sigma^2$: \begin{align*} V = RAR^\top. \end{align*} Since $X$ has full column rank, $X^\top X$ is positive definite, so $A$ is positive definite. Since $R$ has full row rank, for every nonzero $u \in \mathbb{R}^q$, \begin{align*} u^\top V u = (R^\top u)^\top A(R^\top u) > 0, \end{align*} because $R^\top u \neq 0$. Hence $V$ is positive definite and invertible. Define the unrestricted residual sum of squares $S_U: \Omega \to [0,\infty)$ by \begin{align*} S_U = |Y-X\hat{\beta}|^2, \end{align*} and define the restricted residual sum of squares $S_R: \Omega \to [0,\infty)$ by \begin{align*} S_R = \min\{|Y-Xb|^2 : b \in \mathbb{R}^p,\ Rb=r\}. \end{align*} The affine constraint set is nonempty: the matrix $R \in \mathbb{R}^{q \times p}$ has full row rank, so the [linear map](/page/Linear%20Map) $b \mapsto Rb$ from $\mathbb{R}^p$ to $\mathbb{R}^q$ has range dimension $q$ and is surjective; hence there exists $b_0 \in \mathbb{R}^p$ with $Rb_0=r$. For each $b \in \mathbb{R}^p$, expanding around $\hat{\beta}$ gives \begin{align*} |Y-Xb|^2 &= |Y-X\hat{\beta}+X(\hat{\beta}-b)|^2 \\ &= |Y-X\hat{\beta}|^2 +2(Y-X\hat{\beta})^\top X(\hat{\beta}-b) +(\hat{\beta}-b)^\top X^\top X(\hat{\beta}-b). \end{align*} The normal equations $X^\top(Y-X\hat{\beta})=0$ annihilate the cross term, so \begin{align*} |Y-Xb|^2 = S_U+(b-\hat{\beta})^\top X^\top X(b-\hat{\beta}). \end{align*} We minimize this expression subject to $Rb=r$. The Lagrange multiplier equations for the map \begin{align*} L: \mathbb{R}^p \times \mathbb{R}^q &\to \mathbb{R} \\ (b,\lambda) &\mapsto (b-\hat{\beta})^\top X^\top X(b-\hat{\beta})+2\lambda^\top(Rb-r) \end{align*} are \begin{align*} X^\top X(b-\hat{\beta})+R^\top\lambda &= 0, \\ Rb-r &= 0. \end{align*} The first equation gives $b=\hat{\beta}-AR^\top\lambda$. Substituting this into the constraint gives \begin{align*} R\hat{\beta}-RAR^\top\lambda-r=0, \end{align*} hence \begin{align*} \lambda = V^{-1}(R\hat{\beta}-r). \end{align*} Therefore the restricted least-squares estimator $\tilde{\beta}: \Omega \to \mathbb{R}^p$ is \begin{align*} \tilde{\beta} = \hat{\beta}-AR^\top V^{-1}(R\hat{\beta}-r). \end{align*} Using the quadratic expansion at $b=\tilde{\beta}$, \begin{align*} S_R-S_U &= (\tilde{\beta}-\hat{\beta})^\top X^\top X(\tilde{\beta}-\hat{\beta}) \\ &= (R\hat{\beta}-r)^\top V^{-1}RA X^\top X AR^\top V^{-1}(R\hat{\beta}-r) \\ &= (R\hat{\beta}-r)^\top V^{-1}RAR^\top V^{-1}(R\hat{\beta}-r) \\ &= (R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r). \end{align*} Since $V=R(X^\top X)^{-1}R^\top$, the partial $F$-statistic \begin{align*} F=\frac{(S_R-S_U)/q}{S_U/(n-p)} \end{align*} is exactly \begin{align*} F = \frac{1}{q} \frac{(R\hat{\beta}-r)^\top\left(R(X^\top X)^{-1}R^\top\right)^{-1}(R\hat{\beta}-r)} {\hat{\sigma}^2}. \end{align*} [guided] The goal of this step is to connect the geometric residual-sum-of-squares definition of the partial $F$-test with the matrix expression in the theorem. Define \begin{align*} A=(X^\top X)^{-1}, \qquad V=RAR^\top. \end{align*} Because $X$ has full column rank, $X^\top X$ is positive definite and therefore invertible. Thus $A$ is positive definite. Because $R$ has full row rank, $R^\top u=0$ implies $u=0$ for $u \in \mathbb{R}^q$. Hence, for $u \neq 0$, \begin{align*} u^\top V u = (R^\top u)^\top A(R^\top u) > 0. \end{align*} So $V$ is positive definite and invertible. This verifies that the inverse appearing in the Wald statistic is well-defined. Now define \begin{align*} S_U=|Y-X\hat{\beta}|^2, \qquad S_R=\min\{|Y-Xb|^2:b\in\mathbb{R}^p,\ Rb=r\}. \end{align*} The set over which $S_R$ is minimized is not empty. Indeed, since $R \in \mathbb{R}^{q \times p}$ has full row rank, the linear map $b \mapsto Rb$ from $\mathbb{R}^p$ to $\mathbb{R}^q$ has range dimension $q$, so it is surjective. Because $r \in \mathbb{R}^q$, there exists $b_0 \in \mathbb{R}^p$ satisfying $Rb_0=r$. The basic least-squares identity is obtained by expanding around the unrestricted minimizer $\hat{\beta}$. For $b \in \mathbb{R}^p$, \begin{align*} |Y-Xb|^2 &= |Y-X\hat{\beta}+X(\hat{\beta}-b)|^2 \\ &= |Y-X\hat{\beta}|^2 +2(Y-X\hat{\beta})^\top X(\hat{\beta}-b) +(\hat{\beta}-b)^\top X^\top X(\hat{\beta}-b). \end{align*} The middle term vanishes because the normal equations for ordinary least squares are \begin{align*} X^\top(Y-X\hat{\beta})=0. \end{align*} Therefore \begin{align*} |Y-Xb|^2 = S_U+(b-\hat{\beta})^\top X^\top X(b-\hat{\beta}). \end{align*} Thus the restricted problem is the problem of minimizing the positive definite quadratic form \begin{align*} (b-\hat{\beta})^\top X^\top X(b-\hat{\beta}) \end{align*} over the affine set $\{b\in\mathbb{R}^p:Rb=r\}$. Introduce a Lagrange multiplier $\lambda \in \mathbb{R}^q$ and define \begin{align*} L: \mathbb{R}^p \times \mathbb{R}^q &\to \mathbb{R} \\ (b,\lambda) &\mapsto (b-\hat{\beta})^\top X^\top X(b-\hat{\beta})+2\lambda^\top(Rb-r). \end{align*} The stationarity equations are \begin{align*} X^\top X(b-\hat{\beta})+R^\top\lambda &=0,\\ Rb-r&=0. \end{align*} The first equation gives \begin{align*} b=\hat{\beta}-AR^\top\lambda. \end{align*} Substitution into $Rb=r$ gives \begin{align*} R\hat{\beta}-RAR^\top\lambda-r=0. \end{align*} Since $V=RAR^\top$ is invertible, \begin{align*} \lambda=V^{-1}(R\hat{\beta}-r). \end{align*} Therefore the restricted estimator is \begin{align*} \tilde{\beta} = \hat{\beta}-AR^\top V^{-1}(R\hat{\beta}-r). \end{align*} Now compute the increase in residual sum of squares: \begin{align*} S_R-S_U &= (\tilde{\beta}-\hat{\beta})^\top X^\top X(\tilde{\beta}-\hat{\beta}) \\ &= (R\hat{\beta}-r)^\top V^{-1}RA X^\top X AR^\top V^{-1}(R\hat{\beta}-r). \end{align*} Since $A=(X^\top X)^{-1}$, we have $AX^\top XA=A$, so \begin{align*} S_R-S_U &= (R\hat{\beta}-r)^\top V^{-1}RAR^\top V^{-1}(R\hat{\beta}-r) \\ &= (R\hat{\beta}-r)^\top V^{-1}V V^{-1}(R\hat{\beta}-r) \\ &= (R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r). \end{align*} Since $\hat{\sigma}^2=S_U/(n-p)$, the residual-sum-of-squares form \begin{align*} F=\frac{(S_R-S_U)/q}{S_U/(n-p)} \end{align*} becomes exactly \begin{align*} F = \frac{1}{q} \frac{(R\hat{\beta}-r)^\top\left(R(X^\top X)^{-1}R^\top\right)^{-1}(R\hat{\beta}-r)} {\hat{\sigma}^2}. \end{align*} [/guided] [/step] [step:Standardise the restriction estimator to obtain a $\chi^2_q$ numerator] Since \begin{align*} \hat{\beta} = (X^\top X)^{-1}X^\top Y = \beta+(X^\top X)^{-1}X^\top\varepsilon, \end{align*} the random vector $\hat{\beta}: \Omega \to \mathbb{R}^p$ is Gaussian with mean $\beta$ and covariance matrix $\sigma^2 A$ by the standard affine-transformation property of Gaussian random vectors. Hence \begin{align*} R\hat{\beta}-r \end{align*} is Gaussian with mean $R\beta-r$ and covariance matrix $\sigma^2 V$. Under $H_0$, this mean is zero. Let $V^{-1/2} \in \mathbb{R}^{q \times q}$ denote the positive definite inverse square root of $V$, and define \begin{align*} Z: \Omega \to \mathbb{R}^q, \qquad Z=\sigma^{-1}V^{-1/2}(R\hat{\beta}-r). \end{align*} Under $H_0$, $Z \sim \mathcal{N}(0,I_q)$. Define the numerator statistic $Q: \Omega \to \mathbb{R}$ by \begin{align*} Q = \frac{(R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r)}{\sigma^2}. \end{align*} Then \begin{align*} Q = Z^\top Z = \sum_{j=1}^q Z_j^2, \end{align*} so $Q$ has the $\chi^2_q$ distribution by the definition of the chi-squared distribution as a sum of squares of independent standard normal random variables. [guided] We now identify the numerator distribution. The ordinary least-squares estimator is an affine function of the Gaussian error vector: \begin{align*} \hat{\beta} = (X^\top X)^{-1}X^\top Y = (X^\top X)^{-1}X^\top(X\beta+\varepsilon) = \beta+(X^\top X)^{-1}X^\top\varepsilon. \end{align*} By the standard affine-transformation property of Gaussian random vectors, an affine image of a Gaussian vector is Gaussian. Applying that fact to the affine map $e \mapsto \beta+(X^\top X)^{-1}X^\top e$ from $\mathbb{R}^n$ to $\mathbb{R}^p$ shows that $\hat{\beta}$ is Gaussian. Its mean is \begin{align*} \mathbb{E}[\hat{\beta}] = \beta+(X^\top X)^{-1}X^\top\mathbb{E}[\varepsilon] = \beta, \end{align*} and its covariance matrix is \begin{align*} \operatorname{Cov}(\hat{\beta}) &= (X^\top X)^{-1}X^\top(\sigma^2 I_n)X(X^\top X)^{-1} \\ &= \sigma^2(X^\top X)^{-1}. \end{align*} Thus $R\hat{\beta}-r$ is Gaussian with mean $R\beta-r$ and covariance \begin{align*} \sigma^2R(X^\top X)^{-1}R^\top = \sigma^2V. \end{align*} Under the null hypothesis $H_0:R\beta=r$, the mean is zero. Because $V$ is positive definite, it has a positive definite inverse square root $V^{-1/2}$. Define the standardised random vector \begin{align*} Z: \Omega \to \mathbb{R}^q, \qquad Z=\sigma^{-1}V^{-1/2}(R\hat{\beta}-r). \end{align*} Under $H_0$, its mean is zero and its covariance matrix is \begin{align*} \operatorname{Cov}(Z) &= \sigma^{-2}V^{-1/2}(\sigma^2V)V^{-1/2} \\ &= I_q. \end{align*} Hence $Z\sim\mathcal{N}(0,I_q)$. Define the numerator statistic $Q: \Omega \to \mathbb{R}$ by \begin{align*} Q = \frac{(R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r)}{\sigma^2}. \end{align*} Then the quadratic form in the numerator satisfies \begin{align*} Q = Z^\top Z = \sum_{j=1}^q Z_j^2. \end{align*} Since the coordinates $Z_1,\dots,Z_q$ are independent standard normal random variables, this sum has the $\chi^2_q$ distribution by the definition of the chi-squared distribution. [/guided] [/step] [step:Show that the residual variance gives an independent $\chi^2_{n-p}$ denominator] Define the projection matrix $P_X \in \mathbb{R}^{n \times n}$ onto the column space of $X$ by \begin{align*} P_X=X(X^\top X)^{-1}X^\top, \end{align*} and define the residual projection matrix $M_X \in \mathbb{R}^{n \times n}$ by \begin{align*} M_X=I_n-P_X. \end{align*} Then \begin{align*} Y-X\hat{\beta}=M_XY=M_X\varepsilon, \end{align*} because $M_XX\beta=0$. The matrix $M_X$ is symmetric and idempotent, and its rank is $n-p$. By the spectral theorem for real symmetric matrices, applied to the symmetric idempotent matrix $M_X$, there is an orthonormal basis of eigenvectors and every eigenvalue is either $0$ or $1$ because $\lambda^2=\lambda$ for each eigenvalue $\lambda$. Choose an orthogonal matrix $U \in \mathbb{R}^{n \times n}$ such that \begin{align*} UM_XU^\top = \begin{pmatrix} I_{n-p} & 0 \\ 0 & 0_p \end{pmatrix}. \end{align*} Define $\xi: \Omega \to \mathbb{R}^n$ by \begin{align*} \xi=\sigma^{-1}U\varepsilon. \end{align*} Since $U$ is orthogonal and $\varepsilon\sim\mathcal{N}(0,\sigma^2I_n)$, we have $\xi\sim\mathcal{N}(0,I_n)$. Therefore \begin{align*} \frac{(n-p)\hat{\sigma}^2}{\sigma^2} = \frac{|M_X\varepsilon|^2}{\sigma^2} = \xi^\top \begin{pmatrix} I_{n-p} & 0 \\ 0 & 0_p \end{pmatrix} \xi = \sum_{j=1}^{n-p}\xi_j^2 \end{align*} has the $\chi^2_{n-p}$ distribution. Moreover, $\hat{\beta}-\beta=A X^\top\varepsilon$ depends only on $P_X\varepsilon$, while $Y-X\hat{\beta}=M_X\varepsilon$ depends only on $M_X\varepsilon$. Since $P_XM_X=0$ and $\varepsilon$ is Gaussian with covariance $\sigma^2I_n$, the Gaussian random vectors $P_X\varepsilon$ and $M_X\varepsilon$ have zero cross-covariance. By the standard independence criterion for jointly Gaussian random vectors, zero cross-covariance implies independence. Thus the numerator variable $Q$ is independent of $\hat{\sigma}^2$. [guided] The residual variance estimator is built from the part of $Y$ orthogonal to the column space of $X$. Define \begin{align*} P_X=X(X^\top X)^{-1}X^\top, \qquad M_X=I_n-P_X. \end{align*} The matrix $P_X$ is the [orthogonal projection](/theorems/437) onto $\operatorname{Range}(X)$, and $M_X$ is the orthogonal projection onto its orthogonal complement. Since \begin{align*} X\hat{\beta}=P_XY, \end{align*} we have \begin{align*} Y-X\hat{\beta} = (I_n-P_X)Y = M_XY. \end{align*} Using $Y=X\beta+\varepsilon$ and $M_XX=0$, \begin{align*} Y-X\hat{\beta} = M_X(X\beta+\varepsilon) = M_X\varepsilon. \end{align*} The matrix $M_X$ is symmetric and idempotent. Its rank is $n-p$ because $P_X$ has rank $p$ and projects onto the $p$-dimensional column space of $X$. By the spectral theorem for real symmetric matrices, the symmetric matrix $M_X$ is orthogonally diagonalizable. Since $M_X^2=M_X$, every eigenvalue $\lambda$ of $M_X$ satisfies $\lambda^2=\lambda$, hence $\lambda \in \{0,1\}$. The number of eigenvalues equal to $1$ is the rank of $M_X$, namely $n-p$, so there exists an orthogonal matrix $U \in \mathbb{R}^{n\times n}$ such that \begin{align*} UM_XU^\top = \begin{pmatrix} I_{n-p} & 0 \\ 0 & 0_p \end{pmatrix}. \end{align*} Define \begin{align*} \xi: \Omega \to \mathbb{R}^n, \qquad \xi=\sigma^{-1}U\varepsilon. \end{align*} Because $U$ is orthogonal and $\varepsilon\sim\mathcal{N}(0,\sigma^2I_n)$, the vector $\xi$ has distribution $\mathcal{N}(0,I_n)$. Hence its coordinates are independent standard normal random variables. Therefore \begin{align*} \frac{(n-p)\hat{\sigma}^2}{\sigma^2} &= \frac{|Y-X\hat{\beta}|^2}{\sigma^2} \\ &= \frac{|M_X\varepsilon|^2}{\sigma^2} \\ &= \xi^\top \begin{pmatrix} I_{n-p} & 0 \\ 0 & 0_p \end{pmatrix} \xi \\ &= \sum_{j=1}^{n-p}\xi_j^2. \end{align*} This has the $\chi^2_{n-p}$ distribution. It remains to check independence from the numerator. The numerator depends on $R\hat{\beta}-r$, and \begin{align*} \hat{\beta}-\beta=A X^\top\varepsilon. \end{align*} Thus it depends only on the projection of $\varepsilon$ onto $\operatorname{Range}(X)$, namely $P_X\varepsilon$. The residual vector is $M_X\varepsilon$, which lies in the orthogonal complement of $\operatorname{Range}(X)$. Their cross-covariance is \begin{align*} \operatorname{Cov}(P_X\varepsilon,M_X\varepsilon) = P_X(\sigma^2I_n)M_X = \sigma^2P_XM_X = 0. \end{align*} Since $(P_X\varepsilon,M_X\varepsilon)$ is jointly Gaussian, the standard jointly Gaussian independence criterion applies: zero cross-covariance implies independence. Therefore the quadratic-form numerator is independent of $\hat{\sigma}^2$. [/guided] [/step] [step:Form the quotient and identify the $F_{q,n-p}$ distribution] Under $H_0$, define the random variables $Q: \Omega \to \mathbb{R}$ and $W: \Omega \to \mathbb{R}$ by \begin{align*} Q=\frac{(R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r)}{\sigma^2}, \qquad W=\frac{(n-p)\hat{\sigma}^2}{\sigma^2}. \end{align*} The previous steps show that $Q\sim\chi^2_q$, $W\sim\chi^2_{n-p}$, and $Q$ and $W$ are independent. Therefore \begin{align*} \frac{Q/q}{W/(n-p)} \sim F_{q,n-p}. \end{align*} Finally, \begin{align*} \frac{Q/q}{W/(n-p)} &= \frac{ \frac{1}{q}(R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r)/\sigma^2 }{ \hat{\sigma}^2/\sigma^2 } \\ &= \frac{1}{q} \frac{(R\hat{\beta}-r)^\top V^{-1}(R\hat{\beta}-r)} {\hat{\sigma}^2}. \end{align*} Substituting $V=R(X^\top X)^{-1}R^\top$ gives the displayed Wald statistic, so the statistic has the $F_{q,n-p}$ distribution under $H_0$. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Orthogonal Projection

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.

Wald Form of the Linear Restriction F-Test (Theorem # 4446)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Wald Form of the Linear Restriction F-Test (Theorem # 4446)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further