Gauss-Markov Efficiency of Generalized Least Squares in Seemingly Unrelated Regressions

Theorem

Edit Issues Pull Requests Attributions Admin

Let $m,n \in \mathbb{N}$. For each equation index $i \in \{1,\dots,m\}$, let $X_i \in \mathbb{R}^{n \times k_i}$ have full column rank, let $\beta_i \in \mathbb{R}^{k_i}$, and let \begin{align*} y_i = X_i \beta_i + \varepsilon_i \end{align*} be the $i$th regression equation. Define the stacked response vector, coefficient vector, design matrix, and error vector by \begin{align*} y &:= (y_1^\top,\dots,y_m^\top)^\top \in \mathbb{R}^{mn},\\ \beta &:= (\beta_1^\top,\dots,\beta_m^\top)^\top \in \mathbb{R}^{k}, \qquad k := \sum_{i=1}^m k_i,\\ X &:= \operatorname{diag}(X_1,\dots,X_m) \in \mathbb{R}^{mn \times k},\\ \varepsilon &:= (\varepsilon_1^\top,\dots,\varepsilon_m^\top)^\top \in \mathbb{R}^{mn}, \end{align*} so that \begin{align*} y = X\beta + \varepsilon. \end{align*} Assume $\mathbb{E}[\varepsilon]=0$ and \begin{align*} \operatorname{Cov}(\varepsilon)=\Omega:=\Sigma \otimes I_n, \end{align*} where $\Sigma \in \mathbb{R}^{m \times m}$ is symmetric positive definite and $I_n$ is the $n \times n$ identity matrix. Then the generalized least squares estimator \begin{align*} \hat{\beta}_{\mathrm{GLS}} := (X^\top \Omega^{-1}X)^{-1}X^\top\Omega^{-1}y \end{align*} is the best linear unbiased estimator of $\beta$ among all estimators of the form $Ay$ with $A \in \mathbb{R}^{k \times mn}$ and $AX=I_k$. Equivalently, for every such linear unbiased estimator $Ay$, \begin{align*} \operatorname{Cov}(Ay)-\operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) \end{align*} is positive semidefinite. In particular, the equation-by-equation ordinary least squares estimator \begin{align*} \hat{\beta}_{\mathrm{OLS}} := (X^\top X)^{-1}X^\top y \end{align*} is linear unbiased, and hence \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) - \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) \end{align*} is positive semidefinite. Moreover, if the off-diagonal entries of $\Sigma$ are nonzero and the column spaces of the matrices $X_i$ are not all identical, then such systems can exhibit strict efficiency gain: there are choices of $(X_i)_{i=1}^m$ and $\Sigma$ satisfying these conditions for which the above covariance difference is nonzero positive semidefinite.

Discussion

Proof

[proofplan] We regard the seemingly unrelated regression system as one stacked linear model with nonspherical covariance matrix $\Omega$. Whitening by a symmetric inverse square root of $\Omega$ reduces the model to an ordinary homoscedastic linear model, and a direct Gauss-Markov projection argument proves that GLS has minimal covariance among all linear unbiased estimators. Since equation-by-equation OLS is one admissible linear unbiased estimator, GLS has covariance no larger than OLS. Finally, an explicit two-equation system with correlated errors and different regressor spaces shows that the improvement can be strict. [/proofplan] [step:Whiten the stacked regression system] Since $\Sigma$ is symmetric positive definite and $I_n$ is symmetric positive definite, the Kronecker product \begin{align*} \Omega := \Sigma \otimes I_n \end{align*} is symmetric positive definite. Hence there is a symmetric positive definite matrix $\Omega^{-1/2} \in \mathbb{R}^{mn \times mn}$ such that \begin{align*} \Omega^{-1/2}\Omega\Omega^{-1/2}=I_{mn}. \end{align*} Define the transformed response, design matrix, and error vector by \begin{align*} \tilde{y} &:= \Omega^{-1/2}y \in \mathbb{R}^{mn},\\ \tilde{X} &:= \Omega^{-1/2}X \in \mathbb{R}^{mn \times k},\\ \tilde{\varepsilon} &:= \Omega^{-1/2}\varepsilon \in \mathbb{R}^{mn}. \end{align*} Then \begin{align*} \tilde{y}=\tilde{X}\beta+\tilde{\varepsilon}, \end{align*} and \begin{align*} \mathbb{E}[\tilde{\varepsilon}] &=\Omega^{-1/2}\mathbb{E}[\varepsilon] =0,\\ \operatorname{Cov}(\tilde{\varepsilon}) &=\Omega^{-1/2}\operatorname{Cov}(\varepsilon)\Omega^{-1/2} =\Omega^{-1/2}\Omega\Omega^{-1/2} =I_{mn}. \end{align*} Because $X$ has full column rank and $\Omega^{-1/2}$ is invertible, $\tilde{X}$ has full column rank. Therefore $\tilde{X}^\top\tilde{X}$ is invertible. [/step] [step:Identify GLS as ordinary least squares after whitening] The ordinary least squares estimator in the transformed model is \begin{align*} \hat{\beta}_{\mathrm{OLS},\sim} := (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{y}. \end{align*} Using the definitions of $\tilde{X}$ and $\tilde{y}$, we compute \begin{align*} \tilde{X}^\top\tilde{X} &= X^\top\Omega^{-1/2}\Omega^{-1/2}X = X^\top\Omega^{-1}X,\\ \tilde{X}^\top\tilde{y} &= X^\top\Omega^{-1/2}\Omega^{-1/2}y = X^\top\Omega^{-1}y. \end{align*} Hence \begin{align*} \hat{\beta}_{\mathrm{OLS},\sim} = (X^\top\Omega^{-1}X)^{-1}X^\top\Omega^{-1}y = \hat{\beta}_{\mathrm{GLS}}. \end{align*} [/step] [step:Prove the covariance minimality of GLS] Let $A \in \mathbb{R}^{k \times mn}$ satisfy $AX=I_k$, so that $Ay$ is a linear unbiased estimator of $\beta$. Define \begin{align*} B := A\Omega^{1/2} \in \mathbb{R}^{k \times mn}. \end{align*} Since $\tilde{X}=\Omega^{-1/2}X$, the condition $AX=I_k$ is equivalent to \begin{align*} B\tilde{X}=A\Omega^{1/2}\Omega^{-1/2}X=AX=I_k. \end{align*} Let \begin{align*} B_0 := (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top \in \mathbb{R}^{k \times mn} \end{align*} be the ordinary least squares matrix in the whitened model. Then $B_0\tilde{X}=I_k$. Define \begin{align*} C := B-B_0 \in \mathbb{R}^{k \times mn}. \end{align*} Then \begin{align*} C\tilde{X}=B\tilde{X}-B_0\tilde{X}=I_k-I_k=0. \end{align*} The covariance of $Ay$ is \begin{align*} \operatorname{Cov}(Ay) &= A\Omega A^\top\\ &= A\Omega^{1/2}\Omega^{1/2}A^\top\\ &= BB^\top. \end{align*} The covariance of $\hat{\beta}_{\mathrm{GLS}}=B_0\tilde{y}$ is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) &= B_0B_0^\top\\ &= (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{X}(\tilde{X}^\top\tilde{X})^{-1}\\ &= (\tilde{X}^\top\tilde{X})^{-1}. \end{align*} Since $C\tilde{X}=0$, we also have \begin{align*} CB_0^\top &= C\tilde{X}(\tilde{X}^\top\tilde{X})^{-1} =0. \end{align*} Therefore \begin{align*} \operatorname{Cov}(Ay)-\operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) &= BB^\top-B_0B_0^\top\\ &= (B_0+C)(B_0+C)^\top-B_0B_0^\top\\ &= B_0C^\top+CB_0^\top+CC^\top\\ &= CC^\top. \end{align*} For every $v \in \mathbb{R}^k$, \begin{align*} v^\top CC^\top v = |C^\top v|^2 \geq 0, \end{align*} so $CC^\top$ is positive semidefinite. Thus $\hat{\beta}_{\mathrm{GLS}}$ is best linear unbiased among all linear unbiased estimators using the full covariance matrix $\Omega$. [guided] The key point is that whitening has reduced the problem to the ordinary homoscedastic Gauss-Markov geometry. We nevertheless write the argument directly, because it also gives the exact covariance difference. Let $A \in \mathbb{R}^{k \times mn}$ be any matrix satisfying $AX=I_k$. This condition is precisely the linear unbiasedness condition: since $\mathbb{E}[y]=X\beta$, we have \begin{align*} \mathbb{E}[Ay]=AX\beta=\beta \end{align*} for every $\beta \in \mathbb{R}^k$. To compare $Ay$ with the whitened least squares estimator, define \begin{align*} B := A\Omega^{1/2} \in \mathbb{R}^{k \times mn}. \end{align*} Because $\tilde{X}=\Omega^{-1/2}X$, the unbiasedness condition becomes \begin{align*} B\tilde{X}=A\Omega^{1/2}\Omega^{-1/2}X=AX=I_k. \end{align*} Thus $B$ is a linear unbiased estimator matrix for the transformed homoscedastic model. Now define the least squares matrix in the transformed model by \begin{align*} B_0 := (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top \in \mathbb{R}^{k \times mn}. \end{align*} Since $\tilde{X}$ has full column rank, $\tilde{X}^\top\tilde{X}$ is invertible, and \begin{align*} B_0\tilde{X} = (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{X} = I_k. \end{align*} Set \begin{align*} C := B-B_0. \end{align*} Then $C$ measures the deviation of the arbitrary unbiased estimator from least squares. Since both $B$ and $B_0$ are unbiased in the transformed model, \begin{align*} C\tilde{X}=B\tilde{X}-B_0\tilde{X}=I_k-I_k=0. \end{align*} This orthogonality relation is the whole Gauss-Markov argument. We compute the covariance of the arbitrary estimator: \begin{align*} \operatorname{Cov}(Ay) &= A\operatorname{Cov}(y)A^\top\\ &= A\Omega A^\top\\ &= A\Omega^{1/2}\Omega^{1/2}A^\top\\ &= BB^\top. \end{align*} The GLS estimator is $B_0\tilde{y}$, so because $\operatorname{Cov}(\tilde{y})=\operatorname{Cov}(\tilde{\varepsilon})=I_{mn}$, its covariance is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) &= B_0 I_{mn} B_0^\top\\ &= B_0B_0^\top\\ &= (\tilde{X}^\top\tilde{X})^{-1}\tilde{X}^\top\tilde{X}(\tilde{X}^\top\tilde{X})^{-1}\\ &= (\tilde{X}^\top\tilde{X})^{-1}. \end{align*} The cross terms vanish because \begin{align*} CB_0^\top = C\tilde{X}(\tilde{X}^\top\tilde{X})^{-1} = 0. \end{align*} Taking transposes also gives $B_0C^\top=0$. Therefore \begin{align*} \operatorname{Cov}(Ay)-\operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) &= BB^\top-B_0B_0^\top\\ &= (B_0+C)(B_0+C)^\top-B_0B_0^\top\\ &= B_0C^\top+CB_0^\top+CC^\top\\ &= CC^\top. \end{align*} Finally, $CC^\top$ is positive semidefinite because for every vector $v \in \mathbb{R}^k$, \begin{align*} v^\top CC^\top v=|C^\top v|^2 \geq 0. \end{align*} Thus no linear unbiased estimator can have smaller covariance than GLS in the positive semidefinite order. [/guided] [/step] [step:Compare GLS with equation-by-equation ordinary least squares] The equation-by-equation ordinary least squares estimator is \begin{align*} \hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1}X^\top y. \end{align*} Because $X$ has full column rank, \begin{align*} (X^\top X)^{-1}X^\top X=I_k. \end{align*} Thus $\hat{\beta}_{\mathrm{OLS}}$ is linear unbiased. Applying the covariance minimality result with \begin{align*} A := (X^\top X)^{-1}X^\top \end{align*} gives \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) - \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) \end{align*} positive semidefinite. [/step] [step:Exhibit strict gain when errors are correlated and regressor spaces differ] It remains to justify the strict-efficiency assertion. It is enough to give one valid seemingly unrelated regression system with correlated equation errors and different regressor spaces for which GLS and equation-by-equation OLS have different covariance matrices. Take $m=2$, $n=2$, $k_1=1$, and $k_2=1$. Define \begin{align*} X_1 &:= \begin{pmatrix} 1\\ 1 \end{pmatrix} \in \mathbb{R}^{2 \times 1},\\ X_2 &:= \begin{pmatrix} 1\\ 0 \end{pmatrix} \in \mathbb{R}^{2 \times 1}. \end{align*} Both matrices have full column rank, and their column spaces differ because $\operatorname{Range}(X_1)=\operatorname{span}\{(1,1)^\top\}$ while $\operatorname{Range}(X_2)=\operatorname{span}\{(1,0)^\top\}$. Let $\rho \in (-1,1)\setminus\{0\}$ and define \begin{align*} \Sigma := \begin{pmatrix} 1 & \rho\\ \rho & 1 \end{pmatrix}. \end{align*} The eigenvalues of $\Sigma$ are $1+\rho$ and $1-\rho$, both positive because $|\rho|<1$. Hence $\Sigma$ is symmetric positive definite, and its nonzero off-diagonal entries mean that the equation errors are correlated. For this system, define \begin{align*} X:=\operatorname{diag}(X_1,X_2) \in \mathbb{R}^{4 \times 2}, \qquad \Omega:=\Sigma \otimes I_2 \in \mathbb{R}^{4 \times 4}. \end{align*} A direct block computation gives \begin{align*} X^\top X &= \begin{pmatrix} 2 & 0\\ 0 & 1 \end{pmatrix}, \end{align*} so the covariance of equation-by-equation OLS is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) &= (X^\top X)^{-1}X^\top\Omega X(X^\top X)^{-1}. \end{align*} The first coefficient belongs to the first equation. Its OLS variance is the $(1,1)$ entry of this matrix. Since the first equation error variance is $1$, this entry is \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{OLS}})_1) = (X_1^\top X_1)^{-1} = \frac{1}{2}. \end{align*} For GLS, using \begin{align*} \Sigma^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho\\ -\rho & 1 \end{pmatrix}, \end{align*} we obtain \begin{align*} X^\top\Omega^{-1}X = \frac{1}{1-\rho^2} \begin{pmatrix} X_1^\top X_1 & -\rho X_1^\top X_2\\ -\rho X_2^\top X_1 & X_2^\top X_2 \end{pmatrix} = \frac{1}{1-\rho^2} \begin{pmatrix} 2 & -\rho\\ -\rho & 1 \end{pmatrix}. \end{align*} Therefore \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = (X^\top\Omega^{-1}X)^{-1}. \end{align*} Inverting the displayed $2 \times 2$ matrix gives \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = \frac{1-\rho^2}{2-\rho^2} \begin{pmatrix} 1 & \rho\\ \rho & 2 \end{pmatrix}. \end{align*} Thus the GLS variance of the first coefficient is \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{GLS}})_1) = \frac{1-\rho^2}{2-\rho^2}. \end{align*} Since $\rho \neq 0$ and $|\rho|<1$, \begin{align*} \frac{1-\rho^2}{2-\rho^2} < \frac{1}{2}, \end{align*} because \begin{align*} 2(1-\rho^2)<2-\rho^2 \iff -\rho^2<0. \end{align*} Thus the GLS variance of the first coefficient is strictly smaller than the corresponding equation-by-equation OLS variance. Consequently, \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) - \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) \end{align*} is not the zero matrix. Since the previous step already proved it is positive semidefinite, the efficiency gain is strict for this correlated-error system with different regressor spaces. [guided] The general covariance comparison already proves that GLS is never worse than equation-by-equation OLS. To show that the comparison can be strict, we build a two-equation, two-observation example where the two equation design spaces are different and the cross-equation error covariance is nonzero. Let \begin{align*} X_1 &:= \begin{pmatrix} 1\\ 1 \end{pmatrix} \in \mathbb{R}^{2 \times 1}, & X_2 &:= \begin{pmatrix} 1\\ 0 \end{pmatrix} \in \mathbb{R}^{2 \times 1}. \end{align*} The first equation has the constant regressor, while the second equation uses only the first observation direction. The column spaces differ because \begin{align*} \operatorname{Range}(X_1)=\operatorname{span}\{(1,1)^\top\}, \qquad \operatorname{Range}(X_2)=\operatorname{span}\{(1,0)^\top\}. \end{align*} Both $X_1$ and $X_2$ have full column rank because neither matrix is the zero matrix. Choose a correlation parameter $\rho \in (-1,1)\setminus\{0\}$ and put \begin{align*} \Sigma := \begin{pmatrix} 1 & \rho\\ \rho & 1 \end{pmatrix}. \end{align*} The eigenvalues of $\Sigma$ are $1+\rho$ and $1-\rho$, both positive because $|\rho|<1$. Thus $\Sigma$ is symmetric positive definite. Its off-diagonal entries are nonzero because $\rho \neq 0$, so the equation errors are correlated. Define the stacked design matrix and full covariance matrix by \begin{align*} X:=\operatorname{diag}(X_1,X_2) \in \mathbb{R}^{4 \times 2}, \qquad \Omega:=\Sigma \otimes I_2 \in \mathbb{R}^{4 \times 4}. \end{align*} For equation-by-equation OLS, the covariance formula is \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}}) &= (X^\top X)^{-1}X^\top\Omega X(X^\top X)^{-1}. \end{align*} Since the first coefficient belongs to the first equation, its OLS variance is the $(1,1)$ entry of this matrix. The first equation error variance is $1$, and \begin{align*} X_1^\top X_1=2. \end{align*} Therefore \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{OLS}})_1) = (X_1^\top X_1)^{-1} = \frac{1}{2}. \end{align*} This calculation avoids component notation for the error vector and uses only the covariance matrix already defined. Now compute the GLS variance. Since \begin{align*} \Sigma^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho\\ -\rho & 1 \end{pmatrix}, \end{align*} we have \begin{align*} \Omega^{-1}=\Sigma^{-1}\otimes I_2. \end{align*} Using the block form of $X=\operatorname{diag}(X_1,X_2)$, the information matrix is \begin{align*} X^\top\Omega^{-1}X = \frac{1}{1-\rho^2} \begin{pmatrix} X_1^\top X_1 & -\rho X_1^\top X_2\\ -\rho X_2^\top X_1 & X_2^\top X_2 \end{pmatrix}. \end{align*} The required products are \begin{align*} X_1^\top X_1 &= 2,\\ X_1^\top X_2 &= 1,\\ X_2^\top X_1 &= 1,\\ X_2^\top X_2 &= 1. \end{align*} Therefore \begin{align*} X^\top\Omega^{-1}X = \frac{1}{1-\rho^2} \begin{pmatrix} 2 & -\rho\\ -\rho & 1 \end{pmatrix}. \end{align*} Since the GLS covariance matrix is the inverse of this information matrix, \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = (X^\top\Omega^{-1}X)^{-1}. \end{align*} The inverse of the displayed $2 \times 2$ matrix is computed using the determinant $2-\rho^2$: \begin{align*} \operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}}) = \frac{1-\rho^2}{2-\rho^2} \begin{pmatrix} 1 & \rho\\ \rho & 2 \end{pmatrix}. \end{align*} Thus the first diagonal entry is \begin{align*} \operatorname{Var}((\hat{\beta}_{\mathrm{GLS}})_1) = \frac{1-\rho^2}{2-\rho^2}. \end{align*} This is strictly smaller than the OLS variance: \begin{align*} \frac{1-\rho^2}{2-\rho^2} < \frac{1}{2} \end{align*} because \begin{align*} 2(1-\rho^2)<2-\rho^2 \iff -\rho^2<0, \end{align*} and $\rho \neq 0$. Thus, in this concrete SUR system, the off-diagonal covariance supplies useful information precisely because the two equations do not have the same regressor space. GLS uses that information through $\Omega^{-1}$, while equation-by-equation OLS ignores it. The covariance improvement is therefore strict. [/guided] [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

What brings you to Androma?

Start with a route through the knowledge graph.