[proofplan]
We first use the full-column-rank assumption to make the least-squares formula well-defined. Substituting the linear model into the estimator gives the exact decomposition of $\hat\beta$ into the deterministic target $\beta$ plus a conditional-mean-zero error term. Since the matrix multiplying $\varepsilon$ is $\sigma(X)$-measurable, strict exogeneity removes that error term after conditioning on $X$. The unconditional assertion then follows by applying the defining averaging property of conditional expectation.
[/proofplan]
[step:Use full column rank to justify the inverse in the OLS formula]
Let $\Omega_0\in\mathcal F$ denote the event
\begin{align*}
\Omega_0:=\{\omega\in\Omega:\operatorname{rank}X(\omega)=p\}.
\end{align*}
By hypothesis, $\mathbb P(\Omega_0)=1$. For every $\omega\in\Omega_0$, the matrix $X(\omega)^\top X(\omega)\in\mathbb R^{p\times p}$ is invertible. Indeed, if $v\in\mathbb R^p$ satisfies
\begin{align*}
X(\omega)^\top X(\omega)v=0,
\end{align*}
then multiplying on the left by $v^\top$ gives
\begin{align*}
\|X(\omega)v\|^2=v^\top X(\omega)^\top X(\omega)v=0.
\end{align*}
Thus $X(\omega)v=0$, and since $X(\omega)$ has rank $p$, its nullspace is $\{0\}$, so $v=0$. Therefore $X^\top X$ is invertible almost surely, and $\hat\beta=(X^\top X)^{-1}X^\top y$ is well-defined almost surely.
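As a concrete check of this argument, consider a hypothetical design with $n=3$ and $p=2$; the numbers are illustrative only and are not part of the theorem. For the full-column-rank matrix
\begin{align*}
X=\begin{pmatrix}1&0\\1&1\\1&2\end{pmatrix},
\qquad
X^\top X=\begin{pmatrix}3&3\\3&5\end{pmatrix},
\qquad
\det(X^\top X)=15-9=6\neq 0,
\end{align*}
so $X^\top X$ is invertible. By contrast, if the second column is twice the first, the rank drops to $1$ and invertibility fails:
\begin{align*}
\tilde X=\begin{pmatrix}1&2\\1&2\\1&2\end{pmatrix},
\qquad
\tilde X^\top\tilde X=\begin{pmatrix}3&6\\6&12\end{pmatrix},
\qquad
\det(\tilde X^\top\tilde X)=36-36=0.
\end{align*}
Here the nonzero vector $v=(2,-1)^\top$ satisfies $\tilde Xv=0$, which is exactly the situation the rank assumption excludes.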
[/step]
[step:Decompose the estimator into the target plus a noise term]
Define the random matrix
\begin{align*}
A:\Omega&\to\mathbb R^{p\times n}\\
\omega&\mapsto (X(\omega)^\top X(\omega))^{-1}X(\omega)^\top
\end{align*}
on $\Omega_0$. On the [open set](/page/Open%20Set) of invertible $p\times p$ matrices, inversion is continuous, as is matrix multiplication, so $A$ is a Borel-measurable function of $X$ on $\Omega_0$ and is therefore $\sigma(X)$-measurable.
Using $y=X\beta+\varepsilon$, we compute almost surely:
\begin{align*}
\hat\beta
&=(X^\top X)^{-1}X^\top y\\
&=(X^\top X)^{-1}X^\top(X\beta+\varepsilon)\\
&=(X^\top X)^{-1}X^\top X\beta+(X^\top X)^{-1}X^\top\varepsilon\\
&=\beta+A\varepsilon.
\end{align*}
[guided]
The goal is to isolate the part of $\hat\beta$ whose conditional expectation is controlled by strict exogeneity. Define
\begin{align*}
A:\Omega&\to\mathbb R^{p\times n}\\
\omega&\mapsto (X(\omega)^\top X(\omega))^{-1}X(\omega)^\top.
\end{align*}
On the almost sure event $\Omega_0$, where $X^\top X$ is invertible, this random matrix converts the response vector $y$ into the least-squares coefficient vector. It is $\sigma(X)$-measurable because it is a Borel-measurable function of $X$ alone.
Now substitute the model equation $y=X\beta+\varepsilon$ into the estimator:
\begin{align*}
\hat\beta
&=(X^\top X)^{-1}X^\top y\\
&=(X^\top X)^{-1}X^\top(X\beta+\varepsilon)\\
&=(X^\top X)^{-1}X^\top X\beta+(X^\top X)^{-1}X^\top\varepsilon\\
&=\beta+A\varepsilon.
\end{align*}
The equality $(X^\top X)^{-1}X^\top X\beta=\beta$ is valid almost surely because $X^\top X$ is invertible almost surely. Thus the estimator differs from the true parameter only by the transformed noise term $A\varepsilon$.
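To see the decomposition in numbers, here is a small hypothetical instance with $n=2$ and $p=1$; the values of $X$, $\beta$, and $\varepsilon$ are chosen only for illustration. Take $X=(1,2)^\top$, $\beta=3$, and $\varepsilon=(1,-3)^\top$, so that $y=X\beta+\varepsilon=(4,3)^\top$. Then $X^\top X=5$ and $A=\tfrac15\begin{pmatrix}1&2\end{pmatrix}$, and the two routes to $\hat\beta$ agree:
\begin{align*}
\hat\beta&=Ay=\tfrac15(1\cdot 4+2\cdot 3)=2,\\
\beta+A\varepsilon&=3+\tfrac15\big(1\cdot 1+2\cdot(-3)\big)=3-1=2.
\end{align*}
The estimate misses $\beta=3$ by exactly the transformed noise $A\varepsilon=-1$.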
[/guided]
[/step]
[step:Condition on $X$ and remove the noise term by strict exogeneity]
Since $\hat\beta$ is integrable by hypothesis, its conditional expectation with respect to $\sigma(X)$ is defined. Using the decomposition above and linearity of conditional expectation,
\begin{align*}
\mathbb E[\hat\beta\mid\sigma(X)]
&=\mathbb E[\beta+A\varepsilon\mid\sigma(X)]\\
&=\beta+\mathbb E[A\varepsilon\mid\sigma(X)].
\end{align*}
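Because $A$ is $\sigma(X)$-measurable and $A\varepsilon=\hat\beta-\beta$ is integrable (as is $\varepsilon$, which strict exogeneity presupposes), the pull-out property of conditional expectation gives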
\begin{align*}
\mathbb E[A\varepsilon\mid\sigma(X)]
=
A\,\mathbb E[\varepsilon\mid\sigma(X)].
\end{align*}
Strict exogeneity says $\mathbb E[\varepsilon\mid\sigma(X)]=0$ almost surely, hence
\begin{align*}
\mathbb E[\hat\beta\mid\sigma(X)]
=
\beta+A0
=
\beta
\end{align*}
almost surely.
[guided]
After the algebraic decomposition, all randomness in the estimation error is contained in $A\varepsilon$. The important point is that $A$ is known once $X$ is known: it is $\sigma(X)$-measurable. Therefore, when conditioning on $\sigma(X)$, the matrix $A$ behaves as a fixed coefficient matrix.
Using linearity of conditional expectation,
\begin{align*}
\mathbb E[\hat\beta\mid\sigma(X)]
&=\mathbb E[\beta+A\varepsilon\mid\sigma(X)]\\
&=\beta+\mathbb E[A\varepsilon\mid\sigma(X)].
\end{align*}
The vector $\beta$ is deterministic, so its conditional expectation is itself. For the second term, the pull-out property applies because $A$ is $\sigma(X)$-measurable and $A\varepsilon=\hat\beta-\beta$ is integrable:
\begin{align*}
\mathbb E[A\varepsilon\mid\sigma(X)]
=
A\,\mathbb E[\varepsilon\mid\sigma(X)].
\end{align*}
Now strict exogeneity is used exactly once:
\begin{align*}
\mathbb E[\varepsilon\mid\sigma(X)] = 0
\end{align*}
almost surely. Substituting this into the previous display gives
\begin{align*}
\mathbb E[\hat\beta\mid\sigma(X)]
=
\beta+A0
=
\beta
\end{align*}
almost surely. This proves conditional unbiasedness of the OLS estimator.
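A minimal illustration, assuming $p=1$, $n=2$, and noise independent of $X$ (a stronger condition than strict exogeneity, used here only to keep the example explicit): suppose $\varepsilon_1,\varepsilon_2$ are independent and each equals $\pm 1$ with probability $\tfrac12$, and condition on the event $\{X=(1,2)^\top\}$, assumed to have positive probability. On this event $A=\tfrac15\begin{pmatrix}1&2\end{pmatrix}$, so $A\varepsilon=\tfrac15(\varepsilon_1+2\varepsilon_2)$ takes the values $\tfrac35,-\tfrac15,\tfrac15,-\tfrac35$, each with conditional probability $\tfrac14$. Hence
\begin{align*}
\mathbb E\big[A\varepsilon\mid X=(1,2)^\top\big]
&=\tfrac14\Big(\tfrac35-\tfrac15+\tfrac15-\tfrac35\Big)=0,\\
\mathbb E\big[\hat\beta\mid X=(1,2)^\top\big]
&=\beta+0=\beta.
\end{align*}
Individual realizations of $\hat\beta$ differ from $\beta$, but the errors cancel on average once $X$ is held fixed.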
[/guided]
[/step]
[step:Average the conditional identity to obtain unconditional unbiasedness]
Since $\hat\beta$ is integrable, $\mathbb E[\hat\beta]$ exists. Taking expectations in the almost sure identity
\begin{align*}
\mathbb E[\hat\beta\mid\sigma(X)]=\beta
\end{align*}
and using the defining averaging property of conditional expectation, we obtain
\begin{align*}
\mathbb E[\hat\beta]
=
\mathbb E\big[\mathbb E[\hat\beta\mid\sigma(X)]\big]
=
\mathbb E[\beta]
=
\beta.
\end{align*}
Thus the ordinary least squares estimator is unconditionally unbiased whenever its unconditional expectation is defined.
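For intuition, the averaging step can be written out explicitly when $X$ takes only finitely many values. Assume, hypothetically, that $X\in\{x_1,\dots,x_m\}$ with $\mathbb P(X=x_k)=\pi_k>0$ and each $x_k$ of rank $p$; the symbols $x_k$, $\pi_k$, and $m$ are introduced for this sketch only. The conditional identity then reads $\mathbb E[\hat\beta\mid X=x_k]=\beta$ for every $k$, and since $\sum_{k=1}^m\pi_k=1$,
\begin{align*}
\mathbb E[\hat\beta]
=\sum_{k=1}^m\pi_k\,\mathbb E[\hat\beta\mid X=x_k]
=\sum_{k=1}^m\pi_k\,\beta
=\beta.
\end{align*}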
[/step]