[proofplan]
We prove the result directly by matrix algebra rather than reducing to the ordinary homoskedastic case. First we show that $X^\top \Omega^{-1}X$ is invertible and that the proposed weighted least squares estimator is unbiased. Then we characterize every linear estimator $AY$ that is unbiased for every $\beta \in \mathbb{R}^p$ by the constraint $AX = I_p$, where $I_p \in \mathbb{R}^{p \times p}$ denotes the identity matrix. Finally we write $A$ as the weighted least squares matrix plus an error matrix $B$ satisfying $BX = 0$, expand the covariance, and show that all cross terms vanish while the remaining term $B\Omega B^\top$ is [positive semidefinite](/page/Positive%20Semidefinite%20Matrix).
[/proofplan]
custom_env
admin
[step:Show that the weighted normal matrix is invertible]Since $\Omega$ is symmetric [positive definite](/page/Positive%20Definite%20Matrix), the standard inverse-positivity theorem for positive definite matrices implies that $\Omega^{-1}$ is symmetric positive definite. Define the matrix
\begin{align*}
M := X^\top \Omega^{-1}X \in \mathbb{R}^{p \times p}.
\end{align*}
For every nonzero vector $v \in \mathbb{R}^p$, the full column rank of $X$ gives $Xv \neq 0$. Therefore
\begin{align*}
v^\top Mv
= v^\top X^\top \Omega^{-1}Xv
= (Xv)^\top \Omega^{-1}(Xv)
> 0.
\end{align*}
Thus $M$ is symmetric positive definite. By the standard criterion that a positive definite square matrix has no zero eigenvalue, $M$ is [invertible](/page/Invertible%20Matrix).[/step]
custom_env
admin
[guided]The estimator contains the inverse of $X^\top \Omega^{-1}X$, so the first point is to prove that this inverse exists. Define
\begin{align*}
M := X^\top \Omega^{-1}X \in \mathbb{R}^{p \times p}.
\end{align*}
Because $\Omega$ is symmetric [positive definite](/page/Positive%20Definite%20Matrix), the standard inverse-positivity theorem for positive definite matrices gives that its inverse $\Omega^{-1}$ is also symmetric positive definite. Now take any nonzero vector $v \in \mathbb{R}^p$. Since $X$ has rank $p$, its columns are linearly independent, so $Xv \neq 0$. Applying positive definiteness of $\Omega^{-1}$ to the nonzero vector $Xv \in \mathbb{R}^n$ gives
\begin{align*}
v^\top Mv
= v^\top X^\top \Omega^{-1}Xv
= (Xv)^\top \Omega^{-1}(Xv)
> 0.
\end{align*}
Therefore $M$ is symmetric positive definite. A symmetric positive definite square matrix has no zero eigenvalue, so it is [invertible](/page/Invertible%20Matrix).[/guided]
custom_env
admin
[step:Verify that the weighted least squares estimator is linear and unbiased]
Define
\begin{align*}
A_{\Omega} := (X^\top \Omega^{-1}X)^{-1}X^\top \Omega^{-1} \in \mathbb{R}^{p \times n}.
\end{align*}
Then $\hat{\beta}_{\Omega} = A_{\Omega}Y$, so $\hat{\beta}_{\Omega}$ is linear in $Y$. Moreover,
\begin{align*}
A_{\Omega}X
= (X^\top \Omega^{-1}X)^{-1}X^\top \Omega^{-1}X
= I_p.
\end{align*}
Here $I_p \in \mathbb{R}^{p \times p}$ is the identity matrix, as defined in the proof plan. Using $Y = X\beta + \varepsilon$ and $\mathbb{E}[\varepsilon] = 0$, we obtain
\begin{align*}
\mathbb{E}[\hat{\beta}_{\Omega}]
= \mathbb{E}[A_{\Omega}Y]
= A_{\Omega}X\beta + A_{\Omega}\mathbb{E}[\varepsilon]
= \beta.
\end{align*}
Thus $\hat{\beta}_{\Omega}$ is an unbiased linear estimator of $\beta$.
[/step]
custom_env
admin
[step:Characterize all uniformly unbiased linear estimators by the equation $AX = I_p$]Let $A \in \mathbb{R}^{p \times n}$ be such that $AY$ is an [unbiased linear estimator](/page/Linear%20Unbiased%20Estimator) of $\beta$ for every parameter value $\beta \in \mathbb{R}^p$. Since
\begin{align*}
\mathbb{E}[AY]
= A\mathbb{E}[Y]
= AX\beta,
\end{align*}
this uniform unbiasedness condition is equivalent to
\begin{align*}
AX = I_p.
\end{align*}
Indeed, if $AX = I_p$, then $\mathbb{E}[AY] = \beta$ for every $\beta$. Conversely, if $AX\beta = \beta$ for every $\beta \in \mathbb{R}^p$, then $(AX - I_p)\beta = 0$ for every $\beta \in \mathbb{R}^p$, which implies $AX = I_p$.[/step]
custom_env
admin
[guided]A linear estimator of $\beta$ has the form $AY$, where $A \in \mathbb{R}^{p \times n}$. In the Gauss-Markov comparison class, unbiasedness means unbiasedness for every possible value of the parameter $\beta \in \mathbb{R}^p$, not merely at one fixed parameter value. Its expectation is determined by the mean of $Y$. Since $Y = X\beta + \varepsilon$ and $\mathbb{E}[\varepsilon] = 0$, we have
\begin{align*}
\mathbb{E}[AY]
= A\mathbb{E}[Y]
= A(X\beta)
= AX\beta.
\end{align*}
For $AY$ to be unbiased for $\beta$ for every possible parameter value $\beta \in \mathbb{R}^p$, we must have
\begin{align*}
AX\beta = \beta
\end{align*}
for every $\beta \in \mathbb{R}^p$. This condition is exactly
\begin{align*}
AX = I_p.
\end{align*}
The implication in the other direction is immediate: if $AX = I_p$, then
\begin{align*}
\mathbb{E}[AY] = AX\beta = I_p\beta = \beta.
\end{align*}
Thus the matrices producing unbiased linear estimators are precisely those satisfying $AX = I_p$.[/guided]
custom_env
admin
[step:Decompose an arbitrary unbiased estimator into the weighted estimator plus an orthogonal error]Let $A \in \mathbb{R}^{p \times n}$ satisfy $AX = I_p$. Define
\begin{align*}
B := A - A_{\Omega} \in \mathbb{R}^{p \times n}.
\end{align*}
Since both $A$ and $A_{\Omega}$ satisfy the unbiasedness equation, we have
\begin{align*}
BX
= (A - A_{\Omega})X
= AX - A_{\Omega}X
= I_p - I_p
= 0.
\end{align*}
Therefore
\begin{align*}
A = A_{\Omega} + B
\end{align*}
with $BX = 0$.[/step]
custom_env
admin
[guided]We now compare an arbitrary unbiased linear estimator $AY$ with the weighted least squares estimator $A_{\Omega}Y$. The natural object to measure their difference is the matrix
\begin{align*}
B := A - A_{\Omega} \in \mathbb{R}^{p \times n}.
\end{align*}
The key structural fact is that $B$ kills the columns of $X$. Indeed, the previous step gives $AX = I_p$, and the direct computation of $A_{\Omega}X$ gives $A_{\Omega}X = I_p$. Hence
\begin{align*}
BX
= (A - A_{\Omega})X
= AX - A_{\Omega}X
= I_p - I_p
= 0.
\end{align*}
This identity is the algebraic reason the covariance cross terms vanish in the next step.[/guided]
custom_env
admin
[step:Expand the covariance difference and identify the positive semidefinite remainder]Since $\operatorname{Var}(Y) = \operatorname{Var}(\varepsilon) = \Omega$, covariance under a deterministic matrix gives
\begin{align*}
\operatorname{Var}(AY) = A\Omega A^\top,
\qquad
\operatorname{Var}(\hat{\beta}_{\Omega}) = A_{\Omega}\Omega A_{\Omega}^\top.
\end{align*}
Using $A = A_{\Omega} + B$, we expand:
\begin{align*}
A\Omega A^\top
&= (A_{\Omega}+B)\Omega(A_{\Omega}+B)^\top \\
&= A_{\Omega}\Omega A_{\Omega}^\top
+ A_{\Omega}\Omega B^\top
+ B\Omega A_{\Omega}^\top
+ B\Omega B^\top.
\end{align*}
The first cross term vanishes because
\begin{align*}
A_{\Omega}\Omega B^\top
&= (X^\top \Omega^{-1}X)^{-1}X^\top \Omega^{-1}\Omega B^\top \\
&= (X^\top \Omega^{-1}X)^{-1}X^\top B^\top \\
&= (X^\top \Omega^{-1}X)^{-1}(BX)^\top \\
&= 0.
\end{align*}
The second cross term is its transpose, so
\begin{align*}
B\Omega A_{\Omega}^\top = 0.
\end{align*}
Therefore
\begin{align*}
\operatorname{Var}(AY) - \operatorname{Var}(\hat{\beta}_{\Omega})
= B\Omega B^\top.
\end{align*}
For every $u \in \mathbb{R}^p$,
\begin{align*}
u^\top B\Omega B^\top u
= (B^\top u)^\top \Omega (B^\top u)
\geq 0,
\end{align*}
because $\Omega$ is positive definite. Hence $B\Omega B^\top$ is positive semidefinite.[/step]
custom_env
admin
[guided]The [variance](/page/Variance) of a deterministic linear transformation of a random vector satisfies
\begin{align*}
\operatorname{Var}(AY) = A\operatorname{Var}(Y)A^\top.
\end{align*}
Here $\operatorname{Var}(Y) = \operatorname{Var}(X\beta+\varepsilon) = \operatorname{Var}(\varepsilon) = \Omega$, since $X\beta$ is deterministic. Therefore
\begin{align*}
\operatorname{Var}(AY) = A\Omega A^\top,
\qquad
\operatorname{Var}(\hat{\beta}_{\Omega}) = A_{\Omega}\Omega A_{\Omega}^\top.
\end{align*}
Substitute the decomposition $A = A_{\Omega}+B$:
\begin{align*}
A\Omega A^\top
&= (A_{\Omega}+B)\Omega(A_{\Omega}+B)^\top \\
&= A_{\Omega}\Omega A_{\Omega}^\top
+ A_{\Omega}\Omega B^\top
+ B\Omega A_{\Omega}^\top
+ B\Omega B^\top.
\end{align*}
The important point is that the middle two terms vanish. For the first cross term, use the definition of $A_{\Omega}$ and the identity $\Omega^{-1}\Omega = I_n$, where $I_n \in \mathbb{R}^{n \times n}$ denotes the identity matrix:
\begin{align*}
A_{\Omega}\Omega B^\top
&= (X^\top \Omega^{-1}X)^{-1}X^\top \Omega^{-1}\Omega B^\top \\
&= (X^\top \Omega^{-1}X)^{-1}X^\top B^\top \\
&= (X^\top \Omega^{-1}X)^{-1}(BX)^\top.
\end{align*}
But the previous step proved $BX = 0$, so
\begin{align*}
A_{\Omega}\Omega B^\top = 0.
\end{align*}
The other cross term is the transpose of this one:
\begin{align*}
B\Omega A_{\Omega}^\top
= (A_{\Omega}\Omega B^\top)^\top
= 0.
\end{align*}
Thus the entire covariance excess is
\begin{align*}
\operatorname{Var}(AY) - \operatorname{Var}(\hat{\beta}_{\Omega})
= B\Omega B^\top.
\end{align*}
Finally, to prove positive semidefiniteness, test against an arbitrary vector $u \in \mathbb{R}^p$:
\begin{align*}
u^\top B\Omega B^\top u
= (B^\top u)^\top \Omega (B^\top u)
\geq 0.
\end{align*}
The inequality follows from the positive definiteness of $\Omega$; it also covers the case $B^\top u = 0$, where the value is exactly $0$. Hence $B\Omega B^\top$ is positive semidefinite.[/guided]
custom_env
admin
[step:Conclude the best linear unbiased property]
For every linear estimator $AY$ that is unbiased for every $\beta \in \mathbb{R}^p$, we have shown that
\begin{align*}
\operatorname{Var}(AY) - \operatorname{Var}(\hat{\beta}_{\Omega})
= B\Omega B^\top
\end{align*}
for some $B \in \mathbb{R}^{p \times n}$, and this matrix is positive semidefinite. Therefore $\hat{\beta}_{\Omega}$ has covariance no larger than any other uniformly unbiased linear estimator in the Loewner order. Since $\hat{\beta}_{\Omega}$ is itself linear and unbiased for every $\beta \in \mathbb{R}^p$, it is the [best linear unbiased estimator](/page/Best%20Linear%20Unbiased%20Estimator) of $\beta$ in this comparison class.
[/step]