Independence of $\hat\beta$ and $\mathrm{RSS}$ — Statement & Proof

Theorem


Proof

[proofplan]
We stack $\hat\beta$ and the residual vector $R = (I_n - P)Y$ into a single vector $V = DY$. Since $Y$ is multivariate normal and $D$ is deterministic, $V$ is also multivariate normal. We then compute the off-diagonal block of $\operatorname{Cov}(V)$ and show it vanishes: this reduces to the identity $X^\top(I_n - P) = 0$, which says residuals are orthogonal to the column space of $X$. For jointly normal vectors, zero cross-covariance implies independence, so $\hat\beta$ is independent of $R$, hence of any measurable function of $R$, and in particular of $\mathrm{RSS} = R^\top R$.
[/proofplan]

[step:Stack $\hat\beta$ and the residual vector into a single linear function of $Y$]
Let $X \in \mathbb{R}^{n \times p}$ be the design matrix, with $X^\top X$ invertible. Let $P = X(X^\top X)^{-1} X^\top$ be the hat matrix, $C := (X^\top X)^{-1} X^\top \in \mathbb{R}^{p \times n}$ so that $\hat\beta = CY$, and let $R = (I_n - P)Y$ be the residual vector. Define the transformation matrix and the stacked vector as maps
\begin{align*}
D &: \mathbb{R}^n \to \mathbb{R}^{p+n}, & V &: \Omega \to \mathbb{R}^{p+n} \\
y &\mapsto \begin{pmatrix} C \\ I_n - P \end{pmatrix} y, & \omega &\mapsto DY(\omega) = \begin{pmatrix} \hat\beta(\omega) \\ R(\omega) \end{pmatrix}.
\end{align*}
Here $D \in \mathbb{R}^{(p+n) \times n}$ is a deterministic matrix and $V = DY$ is a random vector on the underlying probability space.

[guided]
Our goal is to prove independence of $\hat\beta$ (a $p$-vector) and $\mathrm{RSS}$ (a scalar). Attacking this directly, through moment generating functions or density factorisations, quickly becomes unwieldy. The standard technique is to exploit a defining feature of multivariate normals: **for jointly normal vectors, independence is equivalent to zero cross-covariance**. The proof therefore runs in three moves, followed by a short concluding step:

1. Package $\hat\beta$ and $R$ together as one linear function of $Y$ (this step).
2. Use multivariate normality of $Y$ to conclude the package is jointly normal (next step).
3. Compute the cross-covariance block and show it vanishes (third step).

Why prove independence of $\hat\beta$ and $R$ rather than of $\hat\beta$ and $\mathrm{RSS}$ directly? Because $\mathrm{RSS} = R^\top R$ is a measurable function of $R$, and independence is preserved under measurable functions: if $U$ is independent of $W$, then $U$ is independent of $f(W)$ for any measurable $f$. So it is enough, and much more natural, to prove the vector-vector independence $\hat\beta \perp\!\!\!\perp R$ and then invoke this elementary property at the end.

Packaging into a single vector $V = DY$ is just notation: we line up the linear maps that extract $\hat\beta$ and $R$ from $Y$ into one tall matrix $D$. Reading off the two blocks of $D$:

- top block $C = (X^\top X)^{-1} X^\top$, representing $\hat\beta = CY$ (the least squares formula);
- bottom block $I_n - P$, representing $R = (I_n - P)Y$ (the orthogonal projection of $Y$ onto $\operatorname{Range}(X)^\perp$).

Stacking produces a $(p+n) \times n$ matrix whose action on $Y$ returns both quantities simultaneously.
[/guided]
[/step]

[step:Conclude that $V$ is multivariate normal]
Under the normal linear model, $Y \sim N_n(X\beta, \sigma^2 I_n)$. Affine transformations of multivariate normals are multivariate normal: for any deterministic matrix $B \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$, the [Orthogonal Transformations Preserve Multivariate Normality](/theorems/1434) theorem (applied with a general matrix $B$, not just an orthogonal one) gives
\begin{align*}
BY + b &\sim N_m\!\big(B(X\beta) + b,\; B(\sigma^2 I_n) B^\top\big) = N_m\!\big(BX\beta + b,\; \sigma^2 B B^\top\big).
\end{align*}
Applying this with $B = D$ and $b = \mathbf{0}$, the stacked vector $V = DY$ is multivariate normal:
\begin{align*}
V &\sim N_{p+n}\!\big(D X\beta,\; \sigma^2 D D^\top\big).
\end{align*}
[/step]

[step:Compute the cross-covariance block and show it vanishes]
Write the covariance of $V$ in block form, partitioned into blocks of sizes $p$ and $n$:
\begin{align*}
\operatorname{Cov}(V) &= \sigma^2 D D^\top = \sigma^2 \begin{pmatrix} C \\ I_n - P \end{pmatrix} \begin{pmatrix} C^\top & (I_n - P)^\top \end{pmatrix} = \sigma^2 \begin{pmatrix} C C^\top & C (I_n - P)^\top \\ (I_n - P) C^\top & (I_n - P)(I_n - P)^\top \end{pmatrix}.
\end{align*}
The off-diagonal block of interest is $\sigma^2 C (I_n - P)^\top$; the other off-diagonal block is its transpose. Substituting $C = (X^\top X)^{-1} X^\top$ and using symmetry of $I_n - P$ (established in the proof of the [Chi-Squared Distribution of RSS](/theorems/1443)):
\begin{align*}
C (I_n - P)^\top &= (X^\top X)^{-1} X^\top (I_n - P) = (X^\top X)^{-1} \big(X^\top - X^\top P\big).
\end{align*}
Now
\begin{align*}
X^\top P &= X^\top X (X^\top X)^{-1} X^\top = X^\top,
\end{align*}
so $X^\top (I_n - P) = X^\top - X^\top = \mathbf{0} \in \mathbb{R}^{p \times n}$. Therefore
\begin{align*}
C (I_n - P)^\top &= (X^\top X)^{-1} \cdot \mathbf{0} = \mathbf{0}_{p \times n},
\end{align*}
so the full off-diagonal block vanishes:
\begin{align*}
\operatorname{Cov}(\hat\beta, R) &= \sigma^2 C (I_n - P)^\top = \mathbf{0}_{p \times n}.
\end{align*}

[guided]
We computed the cross-covariance block $\operatorname{Cov}(\hat\beta, R) = \sigma^2 C (I_n - P)^\top$ and need to show it is the zero matrix. Substituting the formula $C = (X^\top X)^{-1} X^\top$ and using that $I_n - P$ is symmetric:
\begin{align*}
C (I_n - P)^\top = (X^\top X)^{-1} X^\top (I_n - P).
\end{align*}
Everything now comes down to the identity $X^\top (I_n - P) = \mathbf{0}$. This has both an algebraic and a geometric reading.

*Algebraic.* We multiply out:
\begin{align*}
X^\top (I_n - P) &= X^\top - X^\top P \\
&= X^\top - X^\top \cdot X(X^\top X)^{-1} X^\top \\
&= X^\top - (X^\top X)(X^\top X)^{-1} X^\top \\
&= X^\top - I_p X^\top = \mathbf{0}.
\end{align*}
The collapse happens because $(X^\top X)(X^\top X)^{-1} = I_p$, the same cancellation that makes $(X^\top X)^{-1} X^\top$ act as a left (pseudo)inverse of $X$.

*Geometric.* The matrix $P$ is the orthogonal projection onto $\operatorname{Range}(X) \subset \mathbb{R}^n$, so $I_n - P$ is the orthogonal projection onto $\operatorname{Range}(X)^\perp$. The columns of $X$ lie in $\operatorname{Range}(X)$, so applying $I_n - P$ to them gives zero, which is exactly what the identity $(I_n - P) X = 0$ says (already noted in the proof of [Chi-Squared Distribution of RSS](/theorems/1443)). Transposing:
\begin{align*}
X^\top (I_n - P)^\top = 0 \quad\Longleftrightarrow\quad X^\top (I_n - P) = 0 \quad \text{(since } I_n - P \text{ symmetric)}.
\end{align*}
Applied to $Y$, this identity gives $X^\top R = 0$, which is the **normal equations** in matrix form: residuals are orthogonal to the column space of $X$, equivalently, to every predictor.

Substituting back, the left-hand factor $(X^\top X)^{-1}$ is a fixed matrix multiplying the zero matrix, so
\begin{align*}
C (I_n - P)^\top = (X^\top X)^{-1} \cdot 0 = 0_{p \times n},
\end{align*}
and the cross-covariance block vanishes.
[/guided]
[/step]

[step:Conclude independence of $\hat\beta$ and $\mathrm{RSS}$]
Since $V = (\hat\beta^\top, R^\top)^\top$ is multivariate normal (Step 2) and its cross-covariance block is zero (Step 3), the two sub-vectors are independent: for jointly normal vectors, zero cross-covariance is equivalent to independence. Therefore
\begin{align*}
\hat\beta &\perp\!\!\!\perp R.
\end{align*}
Independence is preserved under measurable functions of either side: for any Borel-measurable $f: \mathbb{R}^n \to \mathbb{R}$, $\hat\beta$ is independent of $f(R)$. Taking $f(r) := r^\top r$, we obtain $f(R) = R^\top R = \mathrm{RSS}$ and hence
\begin{align*}
\hat\beta &\perp\!\!\!\perp \mathrm{RSS}.
\end{align*}
Finally, since both $\hat\sigma^2 = \mathrm{RSS}/n$ (the MLE) and $\tilde\sigma^2 = \mathrm{RSS}/(n-p)$ (the unbiased estimator) are deterministic functions of $\mathrm{RSS}$, $\hat\beta$ is independent of each of them as well. This completes the proof.

[guided]
We assemble the pieces:

*Joint normality.* Step 2 established $V = (\hat\beta^\top, R^\top)^\top \sim N_{p+n}(D X\beta, \sigma^2 D D^\top)$.

*Zero cross-covariance.* Step 3 showed the $p \times n$ off-diagonal block of $\operatorname{Cov}(V)$ is zero.

*Implication.* For a jointly normal vector $V = (U^\top, W^\top)^\top$, the blocks $U$ and $W$ are independent if and only if $\operatorname{Cov}(U, W) = 0$. The "if" direction is the non-trivial one: zero covariance generally does not imply independence, but joint normality removes the exception. When the covariance matrix is block-diagonal, the quadratic form in the joint characteristic function splits into a $U$-part and a $W$-part, so the joint characteristic function factorises into the product of the two marginal ones. (When $\operatorname{Cov}(V)$ is invertible, the same conclusion can be read off from the joint density. Here $\operatorname{Cov}(V) = \sigma^2 D D^\top$ has rank at most $n < p + n$, so $V$ has no density on $\mathbb{R}^{p+n}$ and the characteristic-function argument is the one that applies.) We conclude $\hat\beta \perp\!\!\!\perp R$.

*Function of an independent variable stays independent.* The last piece is the elementary fact that if $U \perp\!\!\!\perp W$ and $f$ is a Borel-measurable function on the codomain of $W$, then $U \perp\!\!\!\perp f(W)$. This follows from the definition of independence via $\sigma$-algebras: $\sigma(f(W)) \subseteq \sigma(W)$, and independence between $\sigma$-algebras passes to sub-$\sigma$-algebras. Applying this with $f(r) = r^\top r$:
\begin{align*}
\mathrm{RSS} = R^\top R = f(R) \quad\Longrightarrow\quad \hat\beta \perp\!\!\!\perp \mathrm{RSS}.
\end{align*}
The estimators $\hat\sigma^2 = \mathrm{RSS}/n$ and $\tilde\sigma^2 = \mathrm{RSS}/(n - p)$ are deterministic functions of $\mathrm{RSS}$, so $\hat\beta \perp\!\!\!\perp \hat\sigma^2$ and $\hat\beta \perp\!\!\!\perp \tilde\sigma^2$ follow immediately. This is the independence property that powers the derivation of the $t$-distribution for normalised coefficients.
[/guided]
[/step]
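
The argument above can be sanity-checked numerically. The sketch below is not part of the proof; it assumes NumPy and uses a hypothetical randomly generated design matrix `X`, coefficient vector `beta`, and noise level `sigma` chosen purely for illustration. It verifies the identity $C (I_n - P)^\top = \mathbf{0}$ up to floating-point error, then simulates many draws of $Y$ to see that the sample correlation between each component of $\hat\beta$ and $\mathrm{RSS}$ is close to zero. A near-zero sample correlation is consistent with independence but does not by itself establish it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative values, not from the source).
n, p = 50, 3
X = rng.normal(size=(n, p))           # design matrix, full column rank with probability 1
beta = np.array([1.0, -2.0, 0.5])     # assumed true coefficients
sigma = 1.5                           # assumed noise standard deviation

# Matrices from the proof.
C = np.linalg.inv(X.T @ X) @ X.T      # beta_hat = C Y
P = X @ C                             # hat matrix
M = np.eye(n) - P                     # residual projection, R = M Y

# Exact identity: the cross-covariance block C (I - P)^T is the zero matrix.
cross_block = C @ M.T
max_abs_cross = np.abs(cross_block).max()

# Monte Carlo check: each row of Y is one draw from N(X beta, sigma^2 I_n).
reps = 20_000
Y = X @ beta + sigma * rng.normal(size=(reps, n))
beta_hat = Y @ C.T                    # shape (reps, p)
R = Y @ M.T                           # shape (reps, n)
rss = np.sum(R**2, axis=1)            # shape (reps,)

# Sample correlation between each coefficient estimate and RSS.
corrs = np.array([np.corrcoef(beta_hat[:, j], rss)[0, 1] for j in range(p)])

print("max |C (I - P)^T| entry:", max_abs_cross)
print("corr(beta_hat_j, RSS):", corrs)
```

The first printed value should be at the level of machine precision, and the correlations should be of the order $1/\sqrt{\text{reps}}$.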

Independence of $\hat\beta$ and $\mathrm{RSS}$ (Theorem # 1444)
