Independence of Fitted Values and Residuals in the Normal Linear Model

Independence of Fitted Values and Residuals in the Normal Linear Model (Theorem # 4441)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The proof is a Gaussian orthogonal-projection argument. First we verify that the hat matrix is symmetric and idempotent, so $H$ and $I_n-H$ project onto complementary subspaces. Then we compute the cross-covariance of $HY$ and $(I_n-H)Y$ and show it vanishes. Finally, we use the characteristic function of the joint Gaussian vector to prove that zero cross-covariance gives independence here, and then pass independence through the measurable maps defining $\hat{\beta}$ and $\operatorname{RSS}$. [/proofplan] [step:Verify that the hat matrix is a symmetric idempotent matrix] Since $X$ has full column rank $p$, the matrix $X^\top X \in \mathbb{R}^{p \times p}$ is invertible. Define \begin{align*} H := X(X^\top X)^{-1}X^\top \in \mathbb{R}^{n \times n}. \end{align*} Because $X^\top X$ is symmetric and invertible, $(X^\top X)^{-1}$ is symmetric. Hence \begin{align*} H^\top = \left(X(X^\top X)^{-1}X^\top\right)^\top = X(X^\top X)^{-1}X^\top = H. \end{align*} Also, \begin{align*} H^2 &= X(X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top \\ &= X(X^\top X)^{-1}(X^\top X)(X^\top X)^{-1}X^\top \\ &= X(X^\top X)^{-1}X^\top \\ &= H. \end{align*} Thus $H$ is symmetric and idempotent. Consequently, \begin{align*} H(I_n-H)=H-H^2=0. \end{align*} [/step] [step:Compute the cross-covariance of the fitted vector and residual vector] Define the fitted vector and residual vector by \begin{align*} \hat{Y}: \Omega &\to \mathbb{R}^n, & \hat{Y}(\omega)&=HY(\omega),\\ e: \Omega &\to \mathbb{R}^n, & e(\omega)&=(I_n-H)Y(\omega). \end{align*} Since $Y \sim \mathcal{N}_n(X\beta,\sigma^2 I_n)$, its covariance matrix is $\operatorname{Cov}(Y)=\sigma^2 I_n$. For deterministic matrices $A,B \in \mathbb{R}^{n \times n}$, the covariance identity \begin{align*} \operatorname{Cov}(AY,BY)=A\operatorname{Cov}(Y)B^\top \end{align*} applied with $A=H$ and $B=I_n-H$ gives \begin{align*} \operatorname{Cov}(\hat{Y},e) &=H(\sigma^2 I_n)(I_n-H)^\top \\ &=\sigma^2 H(I_n-H) \\ &=0, \end{align*} because $H^\top=H$ and $H(I_n-H)=0$. [guided] The fitted vector and the residual vector are both linear transformations of the same Gaussian vector $Y$. Their dependence can therefore be detected by their cross-covariance. We first record the two maps: \begin{align*} \hat{Y}: \Omega &\to \mathbb{R}^n, & \hat{Y}(\omega)&=HY(\omega),\\ e: \Omega &\to \mathbb{R}^n, & e(\omega)&=(I_n-H)Y(\omega). \end{align*} The covariance matrix of $Y$ is $\sigma^2 I_n$ by the hypothesis $Y \sim \mathcal{N}_n(X\beta,\sigma^2 I_n)$. For deterministic matrices $A,B \in \mathbb{R}^{n \times n}$, covariance transforms by \begin{align*} \operatorname{Cov}(AY,BY)=A\operatorname{Cov}(Y)B^\top. \end{align*} Using $A=H$ and $B=I_n-H$, we obtain \begin{align*} \operatorname{Cov}(\hat{Y},e) &=H(\sigma^2 I_n)(I_n-H)^\top. \end{align*} The matrix $H$ is symmetric, so $(I_n-H)^\top=I_n-H$. Therefore \begin{align*} \operatorname{Cov}(\hat{Y},e) &=\sigma^2H(I_n-H). \end{align*} From the idempotence $H^2=H$, we have \begin{align*} H(I_n-H)=H-H^2=0. \end{align*} Thus \begin{align*} \operatorname{Cov}(\hat{Y},e)=0. \end{align*} This is the essential orthogonality calculation: the projection part $HY$ and the residual part $(I_n-H)Y$ have zero cross-covariance. [/guided] [/step] [step:Use the Gaussian characteristic function to factor the joint law] Let $a,b \in \mathbb{R}^n$ be arbitrary deterministic vectors. Define the scalar random variable \begin{align*} Z_{a,b}: \Omega &\to \mathbb{R}, & Z_{a,b}(\omega)&=a^\top \hat{Y}(\omega)+b^\top e(\omega). \end{align*} Since $Z_{a,b}$ is a linear functional of the Gaussian vector $Y$, it is a real Gaussian random variable. Its mean is \begin{align*} \mathbb{E}[Z_{a,b}] &=a^\top HX\beta+b^\top(I_n-H)X\beta. \end{align*} Because $HX=X$, we have $(I_n-H)X=0$, and hence \begin{align*} \mathbb{E}[Z_{a,b}]=a^\top X\beta. \end{align*} Its variance is \begin{align*} \operatorname{Var}(Z_{a,b}) &=\operatorname{Var}(a^\top\hat{Y})+\operatorname{Var}(b^\top e) +2\operatorname{Cov}(a^\top\hat{Y},b^\top e)\\ &=\operatorname{Var}(a^\top\hat{Y})+\operatorname{Var}(b^\top e), \end{align*} because \begin{align*} \operatorname{Cov}(a^\top\hat{Y},b^\top e) =a^\top\operatorname{Cov}(\hat{Y},e)b =0. \end{align*} Therefore the joint characteristic function of $(\hat{Y},e)$ satisfies \begin{align*} \mathbb{E}\left[\exp\left(i a^\top\hat{Y}+i b^\top e\right)\right] &=\exp\left(i\mathbb{E}[a^\top\hat{Y}]-\frac{1}{2}\operatorname{Var}(a^\top\hat{Y})\right) \exp\left(i\mathbb{E}[b^\top e]-\frac{1}{2}\operatorname{Var}(b^\top e)\right)\\ &=\mathbb{E}\left[\exp\left(i a^\top\hat{Y}\right)\right] \mathbb{E}\left[\exp\left(i b^\top e\right)\right]. \end{align*} Since this factorization holds for every $a,b \in \mathbb{R}^n$, the joint characteristic function of $(\hat{Y},e)$ factors as the product of the marginal characteristic functions. Hence $\hat{Y}$ and $e$ are independent. [guided] We now convert the zero cross-covariance into independence. This step uses the special structure of Gaussian random vectors: for Gaussian vectors, the characteristic function is determined entirely by the mean vector and covariance matrix. Fix deterministic vectors $a,b \in \mathbb{R}^n$. Define \begin{align*} Z_{a,b}: \Omega &\to \mathbb{R}, & Z_{a,b}(\omega)&=a^\top \hat{Y}(\omega)+b^\top e(\omega). \end{align*} This random variable is a linear functional of $Y$, because \begin{align*} Z_{a,b} =a^\top HY+b^\top(I_n-H)Y. \end{align*} Since $Y$ is multivariate normal, every real linear functional of $Y$ is a real Gaussian random variable. We compute its mean and variance. First, \begin{align*} \mathbb{E}[Z_{a,b}] &=a^\top H\mathbb{E}[Y]+b^\top(I_n-H)\mathbb{E}[Y]\\ &=a^\top HX\beta+b^\top(I_n-H)X\beta. \end{align*} The identity $HX=X$ follows from \begin{align*} HX=X(X^\top X)^{-1}X^\top X=X. \end{align*} Thus $(I_n-H)X=0$, and so \begin{align*} \mathbb{E}[Z_{a,b}]=a^\top X\beta. \end{align*} Next, using the bilinearity of covariance, \begin{align*} \operatorname{Var}(Z_{a,b}) &=\operatorname{Var}(a^\top\hat{Y}+b^\top e)\\ &=\operatorname{Var}(a^\top\hat{Y})+\operatorname{Var}(b^\top e) +2\operatorname{Cov}(a^\top\hat{Y},b^\top e). \end{align*} The cross term is zero because \begin{align*} \operatorname{Cov}(a^\top\hat{Y},b^\top e) =a^\top\operatorname{Cov}(\hat{Y},e)b =0. \end{align*} Therefore \begin{align*} \operatorname{Var}(Z_{a,b}) =\operatorname{Var}(a^\top\hat{Y})+\operatorname{Var}(b^\top e). \end{align*} For a real Gaussian random variable $W$ with mean $m$ and variance $v$, its characteristic function is \begin{align*} \mathbb{E}\left[e^{itW}\right]=\exp\left(itm-\frac{1}{2}t^2v\right). \end{align*} Applying this formula to $Z_{a,b}$ with $t=1$, and using the additivity of the mean and the variance just proved, gives \begin{align*} \mathbb{E}\left[\exp\left(i a^\top\hat{Y}+i b^\top e\right)\right] &=\exp\left(i\mathbb{E}[a^\top\hat{Y}]-\frac{1}{2}\operatorname{Var}(a^\top\hat{Y})\right) \exp\left(i\mathbb{E}[b^\top e]-\frac{1}{2}\operatorname{Var}(b^\top e)\right)\\ &=\mathbb{E}\left[\exp\left(i a^\top\hat{Y}\right)\right] \mathbb{E}\left[\exp\left(i b^\top e\right)\right]. \end{align*} This identity holds for every pair $a,b \in \mathbb{R}^n$. Hence the joint characteristic function of $(\hat{Y},e)$ equals the product of the marginal characteristic functions. Characteristic functions determine probability laws on Euclidean spaces, so the joint law is the product of the two marginal laws. This is exactly the independence of $\hat{Y}$ and $e$. [/guided] [/step] [step:Pass independence to $\hat{\beta}$ and $\operatorname{RSS}$ through measurable functions] Define the deterministic maps \begin{align*} B: \mathbb{R}^n &\to \mathbb{R}^p, & B(z)&=(X^\top X)^{-1}X^\top z,\\ R: \mathbb{R}^n &\to [0,\infty), & R(r)&=r^\top r. \end{align*} The map $B$ is linear and the map $R$ is continuous, hence both are Borel measurable. Since $X^\top H=X^\top$, we have \begin{align*} B(\hat{Y}) &=(X^\top X)^{-1}X^\top HY\\ &=(X^\top X)^{-1}X^\top Y\\ &=\hat{\beta}. \end{align*} Also, \begin{align*} R(e)=e^\top e=\operatorname{RSS}. \end{align*} Because [measurable functions](/page/Measurable%20Functions) of independent random vectors are independent, $B(\hat{Y})$ and $R(e)$ are independent. Therefore $\hat{\beta}$ and $\operatorname{RSS}$ are independent. [/step]

Prerequisites (0/1 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Orthogonality

Explore Further

Orthogonality Definition Convergence Criterion via Upcrossings Martingale Theory Addition Property of the Poisson Distribution Poisson Processes Unbiasedness of the Holdout Risk Estimator Probability & Statistics Gambler's Ruin Probability Probability Theory Doob's Upcrossing Inequality Martingale Theory Bias–Variance Decomposition for Prediction Error Probability & Statistics Population Least Squares Projection Probability & Statistics MGF of a Sum Probability Theory Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.