t-Distribution of Normalised Coefficient Estimate

t-Distribution of Normalised Coefficient Estimate (Theorem # 1446)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We rewrite the standardised coefficient as a ratio $Z / \sqrt{W/(n - p)}$ where $Z$ is standard normal and $W$ is an independent $\chi^2_{n-p}$. The numerator $Z$ comes from extracting the $j$-th marginal of $\hat\beta$ and standardising using the variance $\sigma^2(X^\top X)^{-1}_{jj}$. The denominator $\sqrt{W/(n-p)}$ comes from $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$ and uses that $\tilde\sigma^2 = \mathrm{RSS}/(n-p)$ while $\operatorname{se}(\hat\beta_j) = \tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}$. Independence of $\hat\beta$ and $\mathrm{RSS}$ delivers the independence required by the definition of $t_{n-p}$. [/proofplan] [step:Extract the marginal of $\hat\beta_j$ and form a standard normal $Z$] Under the normal linear model, the [Distributional Properties of the Normal Linear Model](/theorems/1445) theorem gives $\hat\beta \sim N_p(\beta, \sigma^2 (X^\top X)^{-1})$. The marginal distribution of the $j$-th coordinate of a multivariate normal is univariate normal with mean equal to the $j$-th entry of the mean vector and variance equal to the $(j, j)$ entry of the covariance matrix: \begin{align*} \hat\beta_j &\sim N\!\big(\beta_j,\; \sigma^2 (X^\top X)^{-1}_{jj}\big). \end{align*} Standardising, define the map \begin{align*} Z &: \Omega \to \mathbb{R}, & \omega &\mapsto \frac{\hat\beta_j(\omega) - \beta_j}{\sigma \sqrt{(X^\top X)^{-1}_{jj}}}. \end{align*} Then $Z \sim N(0, 1)$. The matrix $(X^\top X)^{-1}$ is positive definite — the inverse of the positive definite Gram matrix $X^\top X$ (positive definite because $\operatorname{rank}(X) = p$ forces $X^\top X$ to have full rank $p$, and $X^\top X = (X^\top X)^\top$ has non-negative eigenvalues). In particular, the diagonal entry $(X^\top X)^{-1}_{jj}$ is strictly positive, so the square root and division are well-defined. [guided] We need a standard normal to play the role of the numerator in the definition of a $t_{n-p}$ random variable, "standard normal over square root of an independent scaled chi-squared." The natural candidate is the standardised marginal of $\hat\beta_j$. Why does the marginal of a multivariate normal give a univariate normal? For $X \sim N_p(\mu, \Sigma)$ and $j \in \{1, \ldots, p\}$, the $j$-th component is the linear combination $e_j^\top X$ where $e_j$ is the $j$-th standard basis vector. By the [Orthogonal Transformations Preserve Multivariate Normality](/theorems/1434) theorem applied to the $1 \times p$ matrix $e_j^\top$, this linear combination is normal with mean $e_j^\top \mu = \mu_j$ and variance $e_j^\top \Sigma e_j = \Sigma_{jj}$. Applied to $\hat\beta$: \begin{align*} \hat\beta_j = e_j^\top \hat\beta \sim N(\beta_j, \sigma^2 (X^\top X)^{-1}_{jj}). \end{align*} *Why is $(X^\top X)^{-1}_{jj} > 0$?* The matrix $X^\top X$ is symmetric. It is also positive definite: for any non-zero $v \in \mathbb{R}^p$, \begin{align*} v^\top (X^\top X) v = (Xv)^\top (Xv) = \|Xv\|^2 \geq 0, \end{align*} with equality iff $Xv = 0$. Since $\operatorname{rank}(X) = p$ means $X$ has trivial kernel, $Xv = 0 \Leftrightarrow v = 0$. So $X^\top X$ is positive definite, and its inverse $(X^\top X)^{-1}$ is positive definite too (same eigenvectors, reciprocal eigenvalues). The diagonal entries of a positive definite matrix are strictly positive: $(X^\top X)^{-1}_{jj} = e_j^\top (X^\top X)^{-1} e_j > 0$. This guarantees the standardisation is well-defined. Subtracting the mean and dividing by the standard deviation: \begin{align*} Z = \frac{\hat\beta_j - \beta_j}{\sigma \sqrt{(X^\top X)^{-1}_{jj}}} \sim N(0, 1). \end{align*} [/guided] [/step] [step:Identify the scaled chi-squared $W = \mathrm{RSS}/\sigma^2$ independent of $Z$] By the [Distributional Properties of the Normal Linear Model](/theorems/1445) (parts 2 and 3), the random variable \begin{align*} W &: \Omega \to [0, \infty), & \omega &\mapsto \frac{\mathrm{RSS}(\omega)}{\sigma^2} \end{align*} satisfies $W \sim \chi^2_{n-p}$, and $W$ is independent of $\hat\beta$. Independence is preserved under measurable functions (Borel measurability of component extraction and affine standardisation): since $Z$ is a measurable function of $\hat\beta$ and $W$ is a measurable function of $\mathrm{RSS}$, and $\hat\beta \perp\!\!\!\perp \mathrm{RSS}$, we conclude $Z \perp\!\!\!\perp W$. [guided] We need a $\chi^2_{n-p}$ random variable independent of $Z$ to serve as the denominator in the $t$-ratio. Both the distribution and the independence were already established in [Distributional Properties of the Normal Linear Model](/theorems/1445): its part (2) gives $W := \mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$, and its part (3) gives $\hat\beta \perp\!\!\!\perp \mathrm{RSS}$. To get the tighter independence $Z \perp\!\!\!\perp W$ we apply the standard fact that independence is preserved under measurable transformations on each side. Explicitly, if $U \perp\!\!\!\perp V$ and $f, g$ are measurable, then $f(U) \perp\!\!\!\perp g(V)$ — this follows from $\sigma(f(U)) \subseteq \sigma(U)$ and $\sigma(g(V)) \subseteq \sigma(V)$, plus the fact that independence between two $\sigma$-algebras is inherited by sub-$\sigma$-algebras. Here: - $Z = (\hat\beta_j - \beta_j)/[\sigma \sqrt{(X^\top X)^{-1}_{jj}}]$ is an affine function of the $j$-th coordinate of $\hat\beta$, hence Borel-measurable in $\hat\beta$. - $W = \mathrm{RSS}/\sigma^2$ is an affine function of $\mathrm{RSS}$, hence Borel-measurable in $\mathrm{RSS}$. Since $\hat\beta \perp\!\!\!\perp \mathrm{RSS}$, we conclude $Z \perp\!\!\!\perp W$, exactly the independence required by the definition of the $t$ distribution. [/guided] [/step] [step:Rewrite the standardised coefficient as $Z / \sqrt{W/(n-p)}$] Recall the standard error $\operatorname{se}(\hat\beta_j) = \tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}$, where $\tilde\sigma^2 = \mathrm{RSS}/(n-p)$ is the unbiased estimator of $\sigma^2$. The quantity of interest is \begin{align*} \frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} &= \frac{\hat\beta_j - \beta_j}{\tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}}. \end{align*} Multiply numerator and denominator by $1/\big(\sigma \sqrt{(X^\top X)^{-1}_{jj}}\big)$: \begin{align*} \frac{\hat\beta_j - \beta_j}{\tilde\sigma \sqrt{(X^\top X)^{-1}_{jj}}} &= \frac{\dfrac{\hat\beta_j - \beta_j}{\sigma\sqrt{(X^\top X)^{-1}_{jj}}}}{\dfrac{\tilde\sigma}{\sigma}} = \frac{Z}{\tilde\sigma/\sigma}. \end{align*} Square-rooting the identity $\tilde\sigma^2/\sigma^2 = \mathrm{RSS}/[(n-p)\sigma^2] = W/(n-p)$ (recalling $W \geq 0$ so taking positive square roots is unambiguous): \begin{align*} \frac{\tilde\sigma}{\sigma} &= \sqrt{\frac{W}{n-p}}. \end{align*} Therefore \begin{align*} \frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} &= \frac{Z}{\sqrt{W/(n-p)}}. \end{align*} [/step] [step:Apply the definition of the $t_{n-p}$ distribution] The $t_k$ distribution is defined as the law of $Z/\sqrt{W/k}$ where $Z \sim N(0, 1)$ and $W \sim \chi^2_k$ are independent. In Step 1 we established $Z \sim N(0,1)$; in Step 2, $W \sim \chi^2_{n-p}$ independent of $Z$; in Step 3, the target expression equals $Z/\sqrt{W/(n-p)}$. Taking $k = n - p$ in the definition gives \begin{align*} \frac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} &\sim t_{n-p}, \end{align*} as claimed. Since $j \in \{1, \ldots, p\}$ was arbitrary, the same conclusion holds for every coefficient. [/step]

Explore Further

Jackknife Bias Reduction Statistics Cramér-Rao Lower Bound Statistics Consistency of Observed Fisher Information Statistics Consistency of the MLE Statistics Bayes Classification Risk as Misclassification Probability Statistics Fisher Information Tensorizes for I.I.D. Samples Statistics Slutsky's Lemma Statistics Uniform Law of Large Numbers Statistics

What brings you to Androma?

Start with a route through the knowledge graph.