Androma — The Home of Mathematics on the Internet

custom_env admin

[guided]The key point is that whitening has reduced the problem to the ordinary homoscedastic Gauss-Markov geometry. We nevertheless write the argument directly, because it also gives the exact covariance difference. Let $A \in \mathbb{R}^{k \times mn}$ be any matrix satisfying $AX=I_k$. This condition is precisely the linear unbiasedness condition: since $\mathbb{E}[y]=X\beta$, we have \begin{align*} \mathbb{E}[Ay]=AX\beta=\beta \end{align*} for every $\beta \in \mathbb{R}^k$. To compare $Ay$ with the whitened least squares estimator, define \begin{align*} B := A\Omega^{1/2} \in \mathbb{R}^{k \times mn}. \end{align*} Because $\tilde{X}=\Omega^{-1/2}X$, the unbiasedness condition becomes \begin{align*} B\tilde{X}=A\Omega^{1/2}\Omega^{-1/2}X=AX=I_k. \end{align*} Thus $B$ is a linear unbiased estimator matrix for the transformed homoscedastic model. Now define the least squares matrix in the transformed model by \begin{align*} B_0 := (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top \in \mathbb{R}^{k \times mn}. \end{align*} Since $\tilde{X}$ has full column rank, $\tilde{X}^\top\tilde{X}$ is invertible, and \begin{align*} B_0\tilde{X} = (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{X} = I_k. \end{align*} Set \begin{align*} C := B-B_0. \end{align*} Then $C$ measures the deviation of the arbitrary unbiased estimator from least squares. Since both $B$ and $B_0$ are unbiased in the transformed model, \begin{align*} C\tilde{X}=B\tilde{X}-B_0\tilde{X}=I_k-I_k=0. \end{align*} This orthogonality relation is the whole Gauss-Markov argument. We compute the covariance of the arbitrary estimator: \begin{align*} \operatorname{Cov}(Ay) &= A\operatorname{Cov}(y)A^\top\\ &= A\Omega A^\top\\ &= A\Omega^{1/2}\Omega^{1/2}A^\top\\ &= BB^\top. \end{align*} The GLS estimator is $B_0\tilde{y}$, so because $\operatorname{Cov}(\tilde{y})=\operatorname{Cov}(\tilde{\varepsilon})=I_{mn}$, its covariance is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) &= B_0 I_{mn} B_0^\top\\ &= B_0B_0^\top\\ &= (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{X}(\tilde{X}^\top\tilde{X})^{-1}\\ &= (\tilde{X}^\top\tilde{X})^{-1}. \end{align*} The cross terms vanish because \begin{align*} CB_0^\top = C\tilde{X}(\tilde{X}^\top\tilde{X})^{-1} = 0. \end{align*} Taking transposes also gives $B_0C^\top=0$. Therefore \begin{align*} \operatorname{Cov}(Ay)-\operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) &= BB^\top-B_0B_0^\top\\ &= (B_0+C)(B_0+C)^\top-B_0B_0^\top\\ &= B_0C^\top+CB_0^\top+CC^\top\\ &= CC^\top. \end{align*} Finally, $CC^\top$ is positive semidefinite because for every vector $v \in \mathbb{R}^k$, \begin{align*} v^\top CC^\top v=|C^\top v|^2 \geq 0. \end{align*} Thus no linear unbiased estimator can have smaller covariance than GLS in the positive semidefinite order.[/guided]

custom_env admin

[step:Exhibit strict gain when errors are correlated and regressor spaces differ]It remains to justify the strict-efficiency assertion. It is enough to give one valid seemingly unrelated regression system with correlated equation errors and different regressor spaces for which GLS and equation-by-equation OLS have different covariance matrices. Take $m=2$, $n=2$, $k_1=1$, and $k_2=1$. Define \begin{align*} X_1 &:= \begin{pmatrix} 1\\ 1 \end{pmatrix} \in \mathbb{R}^{2 \times 1},\\ X_2 &:= \begin{pmatrix} 1\\ 0 \end{pmatrix} \in \mathbb{R}^{2 \times 1}. \end{align*} Both matrices have full column rank, and their column spaces differ because $\operatorname{Range}(X_1)=\operatorname{span}\{(1,1)^\top\}$ while $\operatorname{Range}(X_2)=\operatorname{span}\{(1,0)^\top\}$. Let $\rho \in (-1,1)\setminus\{0\}$ and define \begin{align*} \Sigma := \begin{pmatrix} 1 & \rho\\ \rho & 1 \end{pmatrix}. \end{align*} The eigenvalues of $\Sigma$ are $1+\rho$ and $1-\rho$, both positive because $|\rho|<1$. Hence $\Sigma$ is symmetric positive definite, and its nonzero off-diagonal entries mean that the equation errors are correlated. For this system, define \begin{align*} X:=\operatorname{diag}(X_1,X_2) \in \mathbb{R}^{4 \times 2}, \qquad \Omega:=\Sigma \otimes I_2 \in \mathbb{R}^{4 \times 4}. \end{align*} A direct block computation gives \begin{align*} X^\top X &= \begin{pmatrix} 2 & 0\\ 0 & 1 \end{pmatrix}, \end{align*} so the covariance of equation-by-equation OLS is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) &= (X^\top X)^{-1}X^\top\Omega X(X^\top X)^{-1}. \end{align*} The first coefficient belongs to the first equation. Its OLS variance is the $(1,1)$ entry of this matrix. Since the first equation error variance is $1$, this entry is \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{OLS}})_1) = (X_1^\top X_1)^{-1} = \frac{1}{2}. \end{align*} For GLS, using \begin{align*} \Sigma^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho\\ -\rho & 1 \end{pmatrix}, \end{align*} we obtain \begin{align*} X^\top\Omega^{-1}X = \frac{1}{1-\rho^2} \begin{pmatrix} X_1^\top X_1 & -\rho X_1^\top X_2\\ -\rho X_2^\top X_1 & X_2^\top X_2 \end{pmatrix} = \frac{1}{1-\rho^2} \begin{pmatrix} 2 & -\rho\\ -\rho & 1 \end{pmatrix}. \end{align*} Therefore \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = (X^\top\Omega^{-1}X)^{-1}. \end{align*} Inverting the displayed $2 \times 2$ matrix gives \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = \frac{1-\rho^2}{2-\rho^2} \begin{pmatrix} 1 & \rho\\ \rho & 2 \end{pmatrix}. \end{align*} Thus the GLS variance of the first coefficient is \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{GLS}})_1) = \frac{1-\rho^2}{2-\rho^2}. \end{align*} Since $\rho \neq 0$ and $|\rho|<1$, \begin{align*} \frac{1-\rho^2}{2-\rho^2} < \frac{1}{2}, \end{align*} because \begin{align*} 2(1-\rho^2)<2-\rho^2 \iff -\rho^2<0. \end{align*} Thus the GLS variance of the first coefficient is strictly smaller than the corresponding equation-by-equation OLS variance. Consequently, \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) - \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) \end{align*} is not the zero matrix. Since the previous step already proved it is positive semidefinite, the efficiency gain is strict for this correlated-error system with different regressor spaces.[/step]

custom_env admin

[guided]The general covariance comparison already proves that GLS is never worse than equation-by-equation OLS. To show that the comparison can be strict, we build a two-equation, two-observation example where the two equation design spaces are different and the cross-equation error covariance is nonzero. Let \begin{align*} X_1 &:= \begin{pmatrix} 1\\ 1 \end{pmatrix} \in \mathbb{R}^{2 \times 1}, & X_2 &:= \begin{pmatrix} 1\\ 0 \end{pmatrix} \in \mathbb{R}^{2 \times 1}. \end{align*} The first equation has the constant regressor, while the second equation uses only the first observation direction. The column spaces differ because \begin{align*} \operatorname{Range}(X_1)=\operatorname{span}\{(1,1)^\top\}, \qquad \operatorname{Range}(X_2)=\operatorname{span}\{(1,0)^\top\}. \end{align*} Both $X_1$ and $X_2$ have full column rank because neither matrix is the zero matrix. Choose a correlation parameter $\rho \in (-1,1)\setminus\{0\}$ and put \begin{align*} \Sigma := \begin{pmatrix} 1 & \rho\\ \rho & 1 \end{pmatrix}. \end{align*} The eigenvalues of $\Sigma$ are $1+\rho$ and $1-\rho$, both positive because $|\rho|<1$. Thus $\Sigma$ is symmetric positive definite. Its off-diagonal entries are nonzero because $\rho \neq 0$, so the equation errors are correlated. Define the stacked design matrix and full covariance matrix by \begin{align*} X:=\operatorname{diag}(X_1,X_2) \in \mathbb{R}^{4 \times 2}, \qquad \Omega:=\Sigma \otimes I_2 \in \mathbb{R}^{4 \times 4}. \end{align*} For equation-by-equation OLS, the covariance formula is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) &= (X^\top X)^{-1}X^\top\Omega X(X^\top X)^{-1}. \end{align*} Since the first coefficient belongs to the first equation, its OLS variance is the $(1,1)$ entry of this matrix. The first equation error variance is $1$, and \begin{align*} X_1^\top X_1=2. \end{align*} Therefore \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{OLS}})_1) = (X_1^\top X_1)^{-1} = \frac{1}{2}. \end{align*} This calculation avoids component notation for the error vector and uses only the covariance matrix already defined. Now compute the GLS variance. Since \begin{align*} \Sigma^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho\\ -\rho & 1 \end{pmatrix}, \end{align*} we have \begin{align*} \Omega^{-1}=\Sigma^{-1}\otimes I_2. \end{align*} Using the block form of $X=\operatorname{diag}(X_1,X_2)$, the information matrix is \begin{align*} X^\top\Omega^{-1}X = \frac{1}{1-\rho^2} \begin{pmatrix} X_1^\top X_1 & -\rho X_1^\top X_2\\ -\rho X_2^\top X_1 & X_2^\top X_2 \end{pmatrix}. \end{align*} The required products are \begin{align*} X_1^\top X_1 &= 2,\\ X_1^\top X_2 &= 1,\\ X_2^\top X_1 &= 1,\\ X_2^\top X_2 &= 1. \end{align*} Therefore \begin{align*} X^\top\Omega^{-1}X = \frac{1}{1-\rho^2} \begin{pmatrix} 2 & -\rho\\ -\rho & 1 \end{pmatrix}. \end{align*} Since the GLS covariance matrix is the inverse of this information matrix, \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = (X^\top\Omega^{-1}X)^{-1}. \end{align*} The inverse of the displayed $2 \times 2$ matrix is computed using the determinant $2-\rho^2$: \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = \frac{1-\rho^2}{2-\rho^2} \begin{pmatrix} 1 & \rho\\ \rho & 2 \end{pmatrix}. \end{align*} Thus the first diagonal entry is \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{GLS}})_1) = \frac{1-\rho^2}{2-\rho^2}. \end{align*} This is strictly smaller than the OLS variance: \begin{align*} \frac{1-\rho^2}{2-\rho^2} < \frac{1}{2} \end{align*} because \begin{align*} 2(1-\rho^2)<2-\rho^2 \iff -\rho^2<0, \end{align*} and $\rho \neq 0$. Thus, in this concrete SUR system, the off-diagonal covariance supplies useful information precisely because the two equations do not have the same regressor space. GLS uses that information through $\Omega^{-1}$, while equation-by-equation OLS ignores it. The covariance improvement is therefore strict.[/guided]

custom_env admin

What brings you to Androma?

Start with a route through the knowledge graph.

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Sign in to Androma

Check your inbox

One last step

Attributions & Verification

Proof

Verification Progress

Contributors

Who Can Verify

Quick Actions

Raw Attribution Data