Distributional Properties of the Normal Linear Model (Theorem # 1445)

Theorem

Proof

[proofplan] This is a synthesis theorem: each of the three distributional claims has been established separately in the preceding results. We collect them here, check that their hypotheses coincide with the hypotheses of the present theorem, and derive the unbiasedness corollary $\mathbb{E}[\tilde\sigma^2] = \sigma^2$ from the expectation of the chi-squared distribution. No new arguments are introduced. [/proofplan]

[step:Invoke the distribution of $\hat\beta$] Under the normal linear model $Y \sim N_n(X\beta, \sigma^2 I_n)$ with $\operatorname{rank}(X) = p$, the [Distribution of the Least Squares Estimator](/theorems/1442) gives \begin{align*} \hat\beta &\sim N_p\!\big(\beta,\; \sigma^2 (X^\top X)^{-1}\big). \end{align*} The hypothesis of that theorem, namely that $Y$ is $N_n(X\beta, \sigma^2 I_n)$ with $X$ of full column rank (so that $X^\top X$ is invertible), matches the present hypothesis verbatim. [/step]

[step:Invoke the chi-squared distribution of $\mathrm{RSS}$ and extract its expectation] By the [Chi-Squared Distribution of RSS](/theorems/1443), again under the same hypothesis $Y \sim N_n(X\beta, \sigma^2 I_n)$ with $\operatorname{rank}(X) = p$, \begin{align*} \frac{\mathrm{RSS}}{\sigma^2} &\sim \chi^2_{n-p}. \end{align*} Since the expectation of a $\chi^2_k$ random variable is $k$ (the sum of squares of $k$ independent standard normals has mean $k$), \begin{align*} \mathbb{E}\!\left[\frac{\mathrm{RSS}}{\sigma^2}\right] = n - p \quad\Longrightarrow\quad \mathbb{E}[\mathrm{RSS}] = \sigma^2(n-p). \end{align*} [/step]

[step:Invoke independence of $\hat\beta$ and $\mathrm{RSS}$] By the [Independence of $\hat\beta$ and $\mathrm{RSS}$](/theorems/1444) theorem, under the same normal linear model hypothesis, $\hat\beta$ and $\mathrm{RSS}$ are independent. [/step]

[step:Derive unbiasedness of $\tilde\sigma^2 = \mathrm{RSS}/(n-p)$] Assume $n > p$, as the statement $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-p}$ implicitly requires, and define the map \begin{align*} \tilde\sigma^2 &: \Omega \to [0, \infty), & \omega &\mapsto \frac{\mathrm{RSS}(\omega)}{n - p}. \end{align*} By linearity of expectation and the computation in Step 2, \begin{align*} \mathbb{E}[\tilde\sigma^2] &= \frac{\mathbb{E}[\mathrm{RSS}]}{n - p} = \frac{\sigma^2(n-p)}{n-p} = \sigma^2. \end{align*} Hence $\tilde\sigma^2$ is an unbiased estimator of $\sigma^2$, completing the proof of all claims.

[guided] The three distributional facts (1)–(3) were each established separately; our job here is only to collect them and show that the corollary $\mathbb{E}[\tilde\sigma^2] = \sigma^2$ follows immediately.

*Why is $\tilde\sigma^2 = \mathrm{RSS}/(n - p)$ the "right" unbiased estimator, rather than $\mathrm{RSS}/n$?* The MLE under normality is $\hat\sigma^2 = \mathrm{RSS}/n$, as established in [MLE Equals Least Squares Under Normality](/theorems/1440), but this estimator is biased whenever $p \geq 1$: \begin{align*} \mathbb{E}[\hat\sigma^2] = \frac{\mathbb{E}[\mathrm{RSS}]}{n} = \frac{\sigma^2(n-p)}{n} = \sigma^2 \cdot \frac{n - p}{n} < \sigma^2. \end{align*} On average, the MLE underestimates $\sigma^2$ by the factor $(n-p)/n$. The explanation is that the residual vector lies in the $(n-p)$-dimensional orthogonal complement of the column space of $X$, so $\mathrm{RSS}$ carries only $n - p$ degrees of freedom even though it sums $n$ squared residuals; dividing by $n$ ignores the $p$ degrees of freedom spent on fitting the coefficients. Dividing by $n - p$ instead corrects for this, giving $\tilde\sigma^2 = \mathrm{RSS}/(n - p)$ with $\mathbb{E}[\tilde\sigma^2] = \sigma^2$.

*The expectation of $\chi^2_k$.* This step uses $\mathbb{E}[W] = k$ for $W \sim \chi^2_k$. Writing $W = \sum_{i=1}^k Z_i^2$ with $Z_i \stackrel{\text{iid}}{\sim} N(0, 1)$, linearity of expectation and $\mathbb{E}[Z_i^2] = \operatorname{Var}(Z_i) + \mathbb{E}[Z_i]^2 = 1 + 0 = 1$ give $\mathbb{E}[W] = k \cdot 1 = k$. Thus $\mathbb{E}[\mathrm{RSS}/\sigma^2] = n - p$, and multiplying by $\sigma^2$ yields $\mathbb{E}[\mathrm{RSS}] = \sigma^2 (n - p)$. Dividing by $n - p$ gives $\mathbb{E}[\tilde\sigma^2] = \sigma^2$, which is the claimed unbiasedness. [/guided] [/step]
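
As a numerical sanity check (not part of the proof), the conclusions above can be illustrated by Monte Carlo simulation. The sketch below assumes NumPy is available; the function name `simulate_normal_linear_model`, the random design matrix, and the default values of `n`, `p`, `sigma2`, and `reps` are illustrative choices, not anything prescribed by the theorem. It simulates $Y \sim N_n(X\beta, \sigma^2 I_n)$ many times for one fixed $X$ and compares the averages of $\mathrm{RSS}/(n-p)$ and $\mathrm{RSS}/n$ with $\sigma^2$ and $\sigma^2 (n-p)/n$.

```python
import numpy as np


def simulate_normal_linear_model(n=30, p=4, sigma2=2.5, reps=20000, seed=0):
    """Monte Carlo check of the estimators under Y ~ N_n(X beta, sigma2 * I_n).

    All defaults are illustrative assumptions, not values from the theorem.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))   # one fixed design; full column rank almost surely
    beta = rng.normal(size=p)     # arbitrary "true" coefficient vector

    # Each row of Y is one simulated response vector of length n.
    Y = X @ beta + np.sqrt(sigma2) * rng.normal(size=(reps, n))

    # Least squares estimates, one per row: solve (X^T X) b = X^T y for every y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y.T).T   # shape (reps, p)

    # Residual sum of squares for every replication.
    residuals = Y - beta_hat @ X.T
    rss = np.sum(residuals**2, axis=1)

    # Sample correlation between each coefficient estimate and RSS.
    # Near-zero correlation is consistent with independence (claim 3),
    # though zero correlation alone would not prove it.
    corr = [np.corrcoef(beta_hat[:, j], rss)[0, 1] for j in range(p)]

    return {
        "unbiased_mean": rss.mean() / (n - p),   # should be close to sigma2
        "mle_mean": rss.mean() / n,              # should be close to mle_theory
        "mle_theory": sigma2 * (n - p) / n,
        "beta_hat_max_bias": np.max(np.abs(beta_hat.mean(axis=0) - beta)),
        "max_abs_corr": np.max(np.abs(corr)),
    }
```

With the defaults, `simulate_normal_linear_model()["unbiased_mean"]` should land close to $2.5$ and `"mle_mean"` close to $2.5 \cdot 26/30 \approx 2.17$, up to Monte Carlo error, while `"beta_hat_max_bias"` and `"max_abs_corr"` should both be near zero.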
