Unbiasedness of Ordinary Least Squares Under Strict Exogeneity (Theorem # 4435)

Proof

[proofplan] We first use the full-column-rank assumption to make the least-squares formula well-defined. Substituting the linear model into the estimator gives an exact decomposition of $\hat\beta$ into the deterministic target $\beta$ plus a conditional-mean-zero error term. Since the matrix multiplying $\varepsilon$ is $\sigma(X)$-measurable, strict exogeneity removes that error term after conditioning on $X$. The unconditional assertion then follows from the defining averaging property of conditional expectation (the law of total expectation). [/proofplan]

[step:Use full column rank to justify the inverse in the OLS formula] Let $\Omega_0\in\mathcal F$ denote the event \begin{align*} \Omega_0:=\{\omega\in\Omega:\operatorname{rank}X(\omega)=p\}. \end{align*} By hypothesis, $\mathbb P(\Omega_0)=1$. For every $\omega\in\Omega_0$, the matrix $X(\omega)^\top X(\omega)\in\mathbb R^{p\times p}$ is invertible. Indeed, if $v\in\mathbb R^p$ satisfies \begin{align*} X(\omega)^\top X(\omega)v=0, \end{align*} then multiplying on the left by $v^\top$ gives \begin{align*} \|X(\omega)v\|^2=v^\top X(\omega)^\top X(\omega)v=0. \end{align*} Thus $X(\omega)v=0$, and since $X(\omega)$ has rank $p$, its nullspace is $\{0\}$, so $v=0$. Therefore $X^\top X$ is invertible almost surely, and $\hat\beta=(X^\top X)^{-1}X^\top y$ is well-defined almost surely. [/step]

[step:Decompose the estimator into the target plus a noise term] Define the random matrix \begin{align*} A:\Omega&\to\mathbb R^{p\times n}\\ \omega&\mapsto (X(\omega)^\top X(\omega))^{-1}X(\omega)^\top \end{align*} on $\Omega_0$, and set $A(\omega):=0$ for $\omega\notin\Omega_0$. The event $\Omega_0$ belongs to $\sigma(X)$, and on $\Omega_0$ the matrix $A$ is obtained from $X$ by matrix multiplication and by inversion, which is continuous on the [open set](/page/Open%20Set) of invertible $p\times p$ matrices. Hence $A$ is $\sigma(X)$-measurable. Using $y=X\beta+\varepsilon$, we compute almost surely: \begin{align*} \hat\beta &=(X^\top X)^{-1}X^\top y\\ &=(X^\top X)^{-1}X^\top(X\beta+\varepsilon)\\ &=(X^\top X)^{-1}X^\top X\beta+(X^\top X)^{-1}X^\top\varepsilon\\ &=\beta+A\varepsilon. \end{align*}

[guided] The goal is to isolate the part of $\hat\beta$ whose conditional expectation is controlled by strict exogeneity. On $\Omega_0$, define \begin{align*} A:\Omega&\to\mathbb R^{p\times n}\\ \omega&\mapsto (X(\omega)^\top X(\omega))^{-1}X(\omega)^\top. \end{align*} This is the random matrix that converts the response vector $y$ into the least-squares coefficient vector. It is $\sigma(X)$-measurable because it is a measurable function of $X$ alone. Now substitute the model equation $y=X\beta+\varepsilon$ into the estimator: \begin{align*} \hat\beta &=(X^\top X)^{-1}X^\top y\\ &=(X^\top X)^{-1}X^\top(X\beta+\varepsilon)\\ &=(X^\top X)^{-1}X^\top X\beta+(X^\top X)^{-1}X^\top\varepsilon\\ &=\beta+A\varepsilon. \end{align*} The equality $(X^\top X)^{-1}X^\top X\beta=\beta$ is valid almost surely because $X^\top X$ is invertible almost surely. Thus the estimator differs from the true parameter only by the transformed noise term $A\varepsilon$. [/guided] [/step]

[step:Condition on $X$ and remove the noise term by strict exogeneity] Since $\hat\beta$ is integrable by hypothesis, its conditional expectation with respect to $\sigma(X)$ is defined, and $A\varepsilon=\hat\beta-\beta$ is integrable as well. Using the decomposition above and linearity of conditional expectation, \begin{align*} \mathbb E[\hat\beta\mid\sigma(X)] &=\mathbb E[\beta+A\varepsilon\mid\sigma(X)]\\ &=\beta+\mathbb E[A\varepsilon\mid\sigma(X)]. \end{align*} Because $A$ is $\sigma(X)$-measurable and both $\varepsilon$ and $A\varepsilon$ are integrable, the pull-out property of conditional expectation gives \begin{align*} \mathbb E[A\varepsilon\mid\sigma(X)] = A\,\mathbb E[\varepsilon\mid\sigma(X)]. \end{align*} Strict exogeneity says $\mathbb E[\varepsilon\mid\sigma(X)]=0$ almost surely, hence \begin{align*} \mathbb E[\hat\beta\mid\sigma(X)] = \beta+A\cdot 0 = \beta \end{align*} almost surely.

[guided] After the algebraic decomposition, all randomness in the estimation error is contained in $A\varepsilon$. The important point is that $A$ is known once $X$ is known: it is $\sigma(X)$-measurable. Therefore, when conditioning on $\sigma(X)$, the matrix $A$ behaves as a fixed coefficient matrix. Using linearity of conditional expectation, \begin{align*} \mathbb E[\hat\beta\mid\sigma(X)] &=\mathbb E[\beta+A\varepsilon\mid\sigma(X)]\\ &=\beta+\mathbb E[A\varepsilon\mid\sigma(X)]. \end{align*} The vector $\beta$ is deterministic, so its conditional expectation is itself. For the second term, the pull-out property applies because $A$ is $\sigma(X)$-measurable: \begin{align*} \mathbb E[A\varepsilon\mid\sigma(X)] = A\,\mathbb E[\varepsilon\mid\sigma(X)]. \end{align*} Now strict exogeneity is used exactly once: \begin{align*} \mathbb E[\varepsilon\mid\sigma(X)] = 0 \end{align*} almost surely. Substituting this into the previous display gives \begin{align*} \mathbb E[\hat\beta\mid\sigma(X)] = \beta+A\cdot 0 = \beta \end{align*} almost surely. This proves conditional unbiasedness of the OLS estimator. [/guided] [/step]

[step:Average the conditional identity to obtain unconditional unbiasedness] Since $\hat\beta$ is integrable by hypothesis, $\mathbb E[\hat\beta]$ exists. Taking expectations in the almost sure identity \begin{align*} \mathbb E[\hat\beta\mid\sigma(X)]=\beta \end{align*} and using the defining averaging property of conditional expectation, we obtain \begin{align*} \mathbb E[\hat\beta] = \mathbb E\big[\mathbb E[\hat\beta\mid\sigma(X)]\big] = \mathbb E[\beta] = \beta. \end{align*} Thus the ordinary least squares estimator is unconditionally unbiased under the integrability hypothesis. [/step]
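
The argument above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the proof, and every concrete choice in it is an assumption made for illustration: $p=2$ (intercept plus one regressor), $n=20$, a standard Gaussian regressor, and heteroskedastic noise $\varepsilon_i=(1+|x_i|)z_i$ with $z_i$ standard Gaussian and independent of $X$, so that $\mathbb E[\varepsilon\mid\sigma(X)]=0$ holds by construction. The function names `ols_2d` and `simulate` are hypothetical. The sketch verifies the exact decomposition $\hat\beta=\beta+A\varepsilon$ on each draw and averages $\hat\beta$ across draws.

```python
import random


def ols_2d(xs, ys):
    """OLS for y = b0 + b1 * x, solving the 2x2 normal equations in closed form."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]]; its determinant is nonzero iff rank X = 2.
    det = n * sxx - sx * sx
    if det == 0.0:
        raise ValueError("design matrix is rank deficient")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


def simulate(beta=(1.0, 2.0), n=20, reps=20000, seed=0):
    """Return the Monte Carlo mean of beta_hat and the largest decomposition gap."""
    rng = random.Random(seed)
    sum_b0 = 0.0
    sum_b1 = 0.0
    max_gap = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed noise model: scale depends on x, but E[eps | X] = 0.
        eps = [(1.0 + abs(x)) * rng.gauss(0.0, 1.0) for x in xs]
        ys = [beta[0] + beta[1] * x + e for x, e in zip(xs, eps)]
        b0, b1 = ols_2d(xs, ys)
        # A @ eps is the OLS map applied to the noise vector alone.
        a0, a1 = ols_2d(xs, eps)
        gap = max(abs(b0 - beta[0] - a0), abs(b1 - beta[1] - a1))
        max_gap = max(max_gap, gap)
        sum_b0 += b0
        sum_b1 += b1
    return (sum_b0 / reps, sum_b1 / reps), max_gap


if __name__ == "__main__":
    mean, gap = simulate()
    print("Monte Carlo mean of beta_hat:", mean)
    print("largest |beta_hat - (beta + A eps)|:", gap)
```

The decomposition gap should be at the level of floating-point rounding, since $\hat\beta=\beta+A\varepsilon$ is an algebraic identity on $\Omega_0$. The Monte Carlo mean should be close to $\beta$ even though the noise variance depends on $X$, which reflects that the proof uses only $\mathbb E[\varepsilon\mid\sigma(X)]=0$ and no homoskedasticity assumption.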
