[guided]The general covariance comparison already proves that GLS is never worse than equation-by-equation OLS. To show that the comparison can be strict, we build a two-equation, two-observation example where the two equation design spaces are different and the cross-equation error covariance is nonzero.
Let
\begin{align*}
X_1
&:=
\begin{pmatrix}
1\\
1
\end{pmatrix}
\in \mathbb{R}^{2 \times 1},
&
X_2
&:=
\begin{pmatrix}
1\\
0
\end{pmatrix}
\in \mathbb{R}^{2 \times 1}.
\end{align*}
The first equation has the constant regressor, while the second equation uses only the first observation direction. The column spaces differ because
\begin{align*}
\operatorname{Range}(X_1)=\operatorname{span}\{(1,1)^\top\},
\qquad
\operatorname{Range}(X_2)=\operatorname{span}\{(1,0)^\top\}.
\end{align*}
Both $X_1$ and $X_2$ have full column rank because neither matrix is the zero matrix.
Choose a correlation parameter $\rho \in (-1,1)\setminus\{0\}$ and put
\begin{align*}
\Sigma
:=
\begin{pmatrix}
1 & \rho\\
\rho & 1
\end{pmatrix}.
\end{align*}
The eigenvalues of $\Sigma$ are $1+\rho$ and $1-\rho$, both positive because $|\rho|<1$. Thus $\Sigma$ is symmetric positive definite. Its off-diagonal entries are nonzero because $\rho \neq 0$, so the equation errors are correlated.
Define the stacked design matrix and full covariance matrix by
\begin{align*}
X:=\operatorname{diag}(X_1,X_2) \in \mathbb{R}^{4 \times 2},
\qquad
\Omega:=\Sigma \otimes I_2 \in \mathbb{R}^{4 \times 4}.
\end{align*}
For equation-by-equation OLS, the covariance formula is
\begin{align*}
\operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}})
&=
(X^\top X)^{-1}X^\top\Omega X(X^\top X)^{-1}.
\end{align*}
Since the first coefficient belongs to the first equation, its OLS variance is the $(1,1)$ entry of this matrix. The first equation error variance is $1$, and
\begin{align*}
X_1^\top X_1=2.
\end{align*}
Therefore
\begin{align*}
\operatorname{Var}((\hat{\beta}_{\mathrm{OLS}})_1)
=
(X_1^\top X_1)^{-1}
=
\frac{1}{2}.
\end{align*}
This calculation avoids component notation for the error vector and uses only the covariance matrix already defined.
Now compute the GLS variance. Since
\begin{align*}
\Sigma^{-1}
=
\frac{1}{1-\rho^2}
\begin{pmatrix}
1 & -\rho\\
-\rho & 1
\end{pmatrix},
\end{align*}
we have
\begin{align*}
\Omega^{-1}=\Sigma^{-1}\otimes I_2.
\end{align*}
Using the block form of $X=\operatorname{diag}(X_1,X_2)$, the information matrix is
\begin{align*}
X^\top\Omega^{-1}X
=
\frac{1}{1-\rho^2}
\begin{pmatrix}
X_1^\top X_1 & -\rho X_1^\top X_2\\
-\rho X_2^\top X_1 & X_2^\top X_2
\end{pmatrix}.
\end{align*}
The required products are
\begin{align*}
X_1^\top X_1 &= 2,\\
X_1^\top X_2 &= 1,\\
X_2^\top X_1 &= 1,\\
X_2^\top X_2 &= 1.
\end{align*}
Therefore
\begin{align*}
X^\top\Omega^{-1}X
=
\frac{1}{1-\rho^2}
\begin{pmatrix}
2 & -\rho\\
-\rho & 1
\end{pmatrix}.
\end{align*}
Since the GLS covariance matrix is the inverse of this information matrix,
\begin{align*}
\operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}})
=
(X^\top\Omega^{-1}X)^{-1}.
\end{align*}
The inverse of the displayed $2 \times 2$ matrix is computed using the determinant $2-\rho^2$:
\begin{align*}
\operatorname{Cov}(\hat{\beta}_{\mathrm{GLS}})
=
\frac{1-\rho^2}{2-\rho^2}
\begin{pmatrix}
1 & \rho\\
\rho & 2
\end{pmatrix}.
\end{align*}
Thus the first diagonal entry is
\begin{align*}
\operatorname{Var}((\hat{\beta}_{\mathrm{GLS}})_1)
=
\frac{1-\rho^2}{2-\rho^2}.
\end{align*}
This is strictly smaller than the OLS variance:
\begin{align*}
\frac{1-\rho^2}{2-\rho^2}
<
\frac{1}{2}
\end{align*}
because
\begin{align*}
2(1-\rho^2)<2-\rho^2
\iff
-\rho^2<0,
\end{align*}
and $\rho \neq 0$.
Thus, in this concrete SUR system, the off-diagonal covariance supplies useful information precisely because the two equations do not have the same regressor space. GLS uses that information through $\Omega^{-1}$, while equation-by-equation OLS ignores it. The covariance improvement is therefore strict.[/guided]