Student's t Distribution of the Regression Coefficient Test Statistic

Student's t Distribution of the Regression Coefficient Test Statistic (Theorem # 4442)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We write the coefficient error and the residual sum of squares as orthogonal components of the Gaussian noise vector. The coefficient numerator is the projection of $\varepsilon$ onto a unit vector in $\operatorname{Range}(X)$, while the residual sum of squares is the squared length of the projection of $\varepsilon$ onto $\operatorname{Range}(X)^\perp$. Orthogonal invariance of the multivariate normal distribution makes these components independent, giving one standard normal variable and one independent chi-squared variable. The defining representation of the Student $t$ distribution then gives the result. [/proofplan] [step:Express the coefficient error as a Gaussian coordinate in the column space of $X$] Let $e_j \in \mathbb R^p$ denote the $j$th standard basis vector. Since $X$ has rank $p$, the matrix $X^\top X$ is invertible and positive definite, so $C=(X^\top X)^{-1}$ is positive definite and $c_{jj}=e_j^\top C e_j>0$. From $Y=X\beta+\varepsilon$ and $\hat\beta=CX^\top Y$, we obtain \begin{align*} \hat\beta-\beta &=CX^\top(X\beta+\varepsilon)-\beta \\ &=CX^\top X\beta+CX^\top\varepsilon-\beta \\ &=\beta+CX^\top\varepsilon-\beta \\ &=CX^\top\varepsilon. \end{align*} Therefore \begin{align*} \hat\beta_j-\beta_j=e_j^\top CX^\top\varepsilon=(XC e_j)^\top\varepsilon. \end{align*} Define the vector $u_j\in \mathbb R^n$ by \begin{align*} u_j:=\frac{XC e_j}{\sqrt{c_{jj}}}. \end{align*} Then $u_j\in \operatorname{Range}(X)$ and \begin{align*} |u_j|^2 &=\frac{e_j^\top C X^\top X C e_j}{c_{jj}} \\ &=\frac{e_j^\top C e_j}{c_{jj}} \\ &=1. \end{align*} Under $H_0$, $\beta_j=\beta_{j,0}$, so \begin{align*} \frac{\hat\beta_j-\beta_{j,0}}{\sigma\sqrt{c_{jj}}} =\frac{\hat\beta_j-\beta_j}{\sigma\sqrt{c_{jj}}} =\frac{u_j^\top\varepsilon}{\sigma}. \end{align*} Since $\varepsilon\sim\mathcal N(0,\sigma^2 I_n)$ and $u_j$ is a unit vector, the scalar random variable \begin{align*} Z_j:=\frac{u_j^\top\varepsilon}{\sigma} \end{align*} has distribution $\mathcal N(0,1)$. [guided] The numerator of the test statistic is a single coordinate of the least-squares error. We first rewrite it directly in terms of the noise vector $\varepsilon$. Let $e_j\in\mathbb R^p$ be the $j$th standard basis vector. Because $X$ has full column rank, $X^\top X$ is positive definite and invertible. Hence $C=(X^\top X)^{-1}$ is positive definite, and in particular \begin{align*} c_{jj}=e_j^\top C e_j>0. \end{align*} This justifies the denominator $\sqrt{c_{jj}}$. Using the model equation $Y=X\beta+\varepsilon$, we compute \begin{align*} \hat\beta-\beta &=CX^\top Y-\beta \\ &=CX^\top(X\beta+\varepsilon)-\beta \\ &=CX^\top X\beta+CX^\top\varepsilon-\beta \\ &=\beta+CX^\top\varepsilon-\beta \\ &=CX^\top\varepsilon. \end{align*} Taking the $j$th coordinate gives \begin{align*} \hat\beta_j-\beta_j=e_j^\top CX^\top\varepsilon=(XC e_j)^\top\varepsilon. \end{align*} Now normalize the vector defining this linear functional. Define \begin{align*} u_j:=\frac{XC e_j}{\sqrt{c_{jj}}}\in\mathbb R^n. \end{align*} This vector lies in $\operatorname{Range}(X)$ because it is $X$ applied to the vector $C e_j\in\mathbb R^p$. Its Euclidean norm is \begin{align*} |u_j|^2 &=\frac{(XC e_j)^\top(XC e_j)}{c_{jj}} \\ &=\frac{e_j^\top C X^\top X C e_j}{c_{jj}} \\ &=\frac{e_j^\top C e_j}{c_{jj}} \\ &=1, \end{align*} where we used $C X^\top X C=C$ because $C=(X^\top X)^{-1}$. Under the null hypothesis, $\beta_j=\beta_{j,0}$, so the normalized numerator becomes \begin{align*} \frac{\hat\beta_j-\beta_{j,0}}{\sigma\sqrt{c_{jj}}} =\frac{\hat\beta_j-\beta_j}{\sigma\sqrt{c_{jj}}} =\frac{u_j^\top\varepsilon}{\sigma}. \end{align*} A one-dimensional projection of $\mathcal N(0,\sigma^2 I_n)$ onto a unit vector has distribution $\mathcal N(0,\sigma^2)$. Therefore \begin{align*} Z_j:=\frac{u_j^\top\varepsilon}{\sigma} \end{align*} has distribution $\mathcal N(0,1)$. [/guided] [/step] [step:Rewrite the residual sum of squares as an orthogonal Gaussian length] Define the matrix $P\in\mathbb R^{n\times n}$ by \begin{align*} P:=X C X^\top, \end{align*} and define $M\in\mathbb R^{n\times n}$ by \begin{align*} M:=I_n-P. \end{align*} The matrix $P$ is the [orthogonal projection](/theorems/437) onto $\operatorname{Range}(X)$, since $P^\top=P$ and $P^2=P$. Hence $M$ is the orthogonal projection onto $\operatorname{Range}(X)^\perp$. Because $PX=X$, we have $MX=0$. Also $X\hat\beta=PX Y$. Therefore \begin{align*} Y-X\hat\beta &=Y-PY \\ &=MY \\ &=M(X\beta+\varepsilon) \\ &=MX\beta+M\varepsilon \\ &=M\varepsilon. \end{align*} Thus \begin{align*} \operatorname{RSS}=|M\varepsilon|^2. \end{align*} Choose an orthonormal basis $(q_1,\dots,q_n)$ of $\mathbb R^n$ such that $q_1=u_j$, the vectors $q_1,\dots,q_p$ span $\operatorname{Range}(X)$, and $q_{p+1},\dots,q_n$ span $\operatorname{Range}(X)^\perp$. Let $Q\in\mathbb R^{n\times n}$ be the orthogonal matrix with columns $q_1,\dots,q_n$, and define the random vector $\eta:\Omega\to\mathbb R^n$ by \begin{align*} \eta:=\frac{Q^\top\varepsilon}{\sigma}. \end{align*} By orthogonal invariance of the standard multivariate normal distribution, $\eta\sim\mathcal N(0,I_n)$, so its coordinates $\eta_1,\dots,\eta_n$ are independent $\mathcal N(0,1)$ random variables. In this basis, \begin{align*} Z_j=\eta_1 \end{align*} and \begin{align*} \frac{\operatorname{RSS}}{\sigma^2} =\frac{|M\varepsilon|^2}{\sigma^2} =\sum_{k=p+1}^{n}\eta_k^2. \end{align*} Consequently, \begin{align*} W:=\frac{\operatorname{RSS}}{\sigma^2} \end{align*} has the chi-squared distribution with $n-p$ degrees of freedom, and $Z_j$ is independent of $W$. [guided] The residual vector is obtained by projecting $Y$ away from the column space of $X$. Define \begin{align*} P:=XCX^\top\in\mathbb R^{n\times n}, \qquad M:=I_n-P\in\mathbb R^{n\times n}. \end{align*} We verify that $P$ is the orthogonal projection onto $\operatorname{Range}(X)$. First, \begin{align*} P^\top=(XCX^\top)^\top=XC^\top X^\top=XCX^\top=P, \end{align*} because $C$ is symmetric. Second, \begin{align*} P^2 &=XCX^\top XCX^\top \\ &=XC(X^\top X)CX^\top \\ &=XCX^\top \\ &=P. \end{align*} Thus $P$ is a symmetric idempotent matrix, hence the orthogonal projection onto its range. Since $PX=X$, its range is exactly $\operatorname{Range}(X)$. Therefore $M=I_n-P$ is the orthogonal projection onto $\operatorname{Range}(X)^\perp$. Now compute the residual vector: \begin{align*} Y-X\hat\beta &=Y-XCX^\top Y \\ &=Y-PY \\ &=MY \\ &=M(X\beta+\varepsilon) \\ &=MX\beta+M\varepsilon. \end{align*} Because $PX=X$, we have $MX=(I_n-P)X=X-X=0$, so $MX\beta=0$. Hence \begin{align*} Y-X\hat\beta=M\varepsilon, \qquad \operatorname{RSS}=|M\varepsilon|^2. \end{align*} To identify the distribution of this squared length and its independence from the numerator, choose coordinates adapted to the two orthogonal subspaces. Since $u_j\in\operatorname{Range}(X)$ and $|u_j|=1$, choose an orthonormal basis $(q_1,\dots,q_n)$ of $\mathbb R^n$ such that $q_1=u_j$, the vectors $q_1,\dots,q_p$ span $\operatorname{Range}(X)$, and $q_{p+1},\dots,q_n$ span $\operatorname{Range}(X)^\perp$. Let $Q\in\mathbb R^{n\times n}$ be the orthogonal matrix with columns $q_1,\dots,q_n$, and define \begin{align*} \eta:=\frac{Q^\top\varepsilon}{\sigma}. \end{align*} Since $\varepsilon/\sigma\sim\mathcal N(0,I_n)$ and $Q$ is orthogonal, orthogonal invariance of the standard multivariate normal distribution gives \begin{align*} \eta\sim\mathcal N(0,I_n). \end{align*} Thus the coordinates $\eta_1,\dots,\eta_n$ are independent standard normal random variables. The numerator coordinate is \begin{align*} Z_j=\frac{u_j^\top\varepsilon}{\sigma} =\frac{q_1^\top\varepsilon}{\sigma} =\eta_1. \end{align*} The residual projection $M\varepsilon$ keeps exactly the coordinates in $\operatorname{Range}(X)^\perp$, so \begin{align*} \frac{\operatorname{RSS}}{\sigma^2} =\frac{|M\varepsilon|^2}{\sigma^2} =\sum_{k=p+1}^{n}\eta_k^2. \end{align*} A sum of squares of $n-p$ independent standard normal random variables has the chi-squared distribution with $n-p$ degrees of freedom. Therefore \begin{align*} W:=\frac{\operatorname{RSS}}{\sigma^2}\sim\chi^2_{n-p}. \end{align*} Moreover $Z_j=\eta_1$ depends only on the first coordinate, while $W$ depends only on $\eta_{p+1},\dots,\eta_n$. Since the coordinates of $\eta$ are independent, $Z_j$ and $W$ are independent. [/guided] [/step] [step:Identify the Student $t$ representation] By the definition of $\hat\sigma^2$, \begin{align*} \frac{(n-p)\hat\sigma^2}{\sigma^2} =\frac{\operatorname{RSS}}{\sigma^2} =W. \end{align*} Therefore \begin{align*} T_j &=\frac{\hat\beta_j-\beta_{j,0}}{\hat\sigma\sqrt{c_{jj}}} \\ &=\frac{(\hat\beta_j-\beta_{j,0})/(\sigma\sqrt{c_{jj}})} {\sqrt{\hat\sigma^2/\sigma^2}} \\ &=\frac{Z_j}{\sqrt{W/(n-p)}}. \end{align*} We have shown that $Z_j\sim\mathcal N(0,1)$, $W\sim\chi^2_{n-p}$, and $Z_j$ is independent of $W$. By the defining representation of the Student $t$ distribution, $T_j$ has the Student $t$ distribution with $n-p$ degrees of freedom. This proves the theorem. [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Orthogonal Projection

Definitions & Concepts

Distribution

Explore Further

Distribution Definition Orthogonal Projection Theorem #437 Tail Sum Formula for Expectation Probability Theory Properties of Variance Probability Theory Extreme Information Levels Probability & Statistics Conditional Expectation as the $L^2$ Risk Minimizer Probability & Statistics Pushforward Formula Probability Theory Conditional Expectations are Uniformly Integrable Martingale Theory Martingale Regularisation Theorem Stochastic Processes Limiting Events Are Measurable Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.