Omitted Variable Bias Formula — Statement & Proof

Omitted Variable Bias Formula (Theorem # 4453)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] Substitute the population linear projection of $Z$ on an intercept and $X$ into the structural equation for $Y$. This rewrites $Y$ as an affine function of $X$ plus a composite error. We then verify directly that the composite error is orthogonal to both the constant regressor and $X$, so the affine coefficient obtained from the substitution satisfies the population least-squares normal equations. Since $\operatorname{Var}(X)>0$, those normal equations have a unique solution, giving the stated coefficient. [/proofplan] [step:Rewrite the structural equation using the projection of $Z$ on $X$] Define the composite error random variable $w:\Omega\to\mathbb{R}$ by \begin{align*} w := u+\beta_2v. \end{align*} Substituting \begin{align*} Z=\pi_0+\pi_1X+v \end{align*} into \begin{align*} Y=\beta_0+\beta_1X+\beta_2Z+u \end{align*} gives \begin{align*} Y &=\beta_0+\beta_1X+\beta_2(\pi_0+\pi_1X+v)+u\\ &=(\beta_0+\beta_2\pi_0)+(\beta_1+\beta_2\pi_1)X+w. \end{align*} Define \begin{align*} \alpha_0 &:= \beta_0+\beta_2\pi_0,\\ \alpha_1 &:= \beta_1+\beta_2\pi_1. \end{align*} Then \begin{align*} Y=\alpha_0+\alpha_1X+w. \end{align*} [guided] The point of the proof is to express the misspecified regression of $Y$ on only $X$ in a form where the coefficient of $X$ can be read off from the population normal equations. We start from the given linear projection of the omitted variable: \begin{align*} Z=\pi_0+\pi_1X+v. \end{align*} Substituting this identity into the structural equation for $Y$ gives \begin{align*} Y &=\beta_0+\beta_1X+\beta_2Z+u\\ &=\beta_0+\beta_1X+\beta_2(\pi_0+\pi_1X+v)+u\\ &=(\beta_0+\beta_2\pi_0)+(\beta_1+\beta_2\pi_1)X+(u+\beta_2v). \end{align*} Define the composite error random variable $w:\Omega\to\mathbb{R}$ by \begin{align*} w:=u+\beta_2v, \end{align*} and define the two constants \begin{align*} \alpha_0 &:= \beta_0+\beta_2\pi_0,\\ \alpha_1 &:= \beta_1+\beta_2\pi_1. \end{align*} With this notation, the equation becomes \begin{align*} Y=\alpha_0+\alpha_1X+w. \end{align*} The desired conclusion is exactly that the population regression coefficient $\tilde{\beta}_1$ equals this coefficient $\alpha_1$. [/guided] [/step] [step:Show that the composite error is orthogonal to the regressors] We first prove $\mathbb{E}[u\mid X]=0$. Let $h:\mathbb{R}\to\mathbb{R}$ be any bounded Borel measurable function. Since $h(X)$ is measurable with respect to the $\sigma$-algebra generated by $(X,Z)$, the defining property of conditional expectation gives \begin{align*} \mathbb{E}[h(X)u] =\mathbb{E}\big[h(X)\mathbb{E}[u\mid X,Z]\big] =0. \end{align*} Thus $\mathbb{E}[u\mid X]=0$. Therefore \begin{align*} \mathbb{E}[w\mid X] &=\mathbb{E}[u+\beta_2v\mid X]\\ &=\mathbb{E}[u\mid X]+\beta_2\mathbb{E}[v\mid X]\\ &=0. \end{align*} Taking $h(X)=1$ and $h(X)=X$ in the conditional-mean-zero identity, using square-integrability to justify integrability of $Xw$, yields \begin{align*} \mathbb{E}[w]&=0,\\ \mathbb{E}[Xw]&=0. \end{align*} [guided] The rewritten equation is useful only if the new error $w$ is orthogonal to the regressors in the population regression on an intercept and $X$. Those regressors are the constant random variable $1$ and the random variable $X$. First we derive $\mathbb{E}[u\mid X]=0$ from the stronger assumption $\mathbb{E}[u\mid X,Z]=0$. Let $h:\mathbb{R}\to\mathbb{R}$ be any bounded Borel measurable function. Then $h(X)$ is measurable with respect to the information contained in $(X,Z)$, so the defining property of conditional expectation gives \begin{align*} \mathbb{E}[h(X)u] =\mathbb{E}\big[h(X)\mathbb{E}[u\mid X,Z]\big]. \end{align*} The hypothesis $\mathbb{E}[u\mid X,Z]=0$ makes the right-hand side equal to $0$, so \begin{align*} \mathbb{E}[h(X)u]=0. \end{align*} This is precisely the conditional-mean-zero condition $\mathbb{E}[u\mid X]=0$. Now use the definition \begin{align*} w:=u+\beta_2v. \end{align*} By linearity of conditional expectation and the hypothesis $\mathbb{E}[v\mid X]=0$, \begin{align*} \mathbb{E}[w\mid X] &=\mathbb{E}[u+\beta_2v\mid X]\\ &=\mathbb{E}[u\mid X]+\beta_2\mathbb{E}[v\mid X]\\ &=0+\beta_2\cdot 0\\ &=0. \end{align*} Finally, conditional mean zero implies orthogonality to every square-integrable function of $X$. In particular, using the constant function and the identity function of $X$, we obtain \begin{align*} \mathbb{E}[w]&=0,\\ \mathbb{E}[Xw]&=0. \end{align*} Square-integrability of $X,u,v$ ensures that these expectations are finite. [/guided] [/step] [step:Verify that the rewritten coefficients solve the population normal equations] For population least squares on an intercept and $X$, the normal equations are \begin{align*} \mathbb{E}[Y-a-bX]&=0,\\ \mathbb{E}[X(Y-a-bX)]&=0. \end{align*} Substituting $(a,b)=(\alpha_0,\alpha_1)$ and using $Y=\alpha_0+\alpha_1X+w$ gives \begin{align*} \mathbb{E}[Y-\alpha_0-\alpha_1X] &=\mathbb{E}[w] =0, \end{align*} and \begin{align*} \mathbb{E}[X(Y-\alpha_0-\alpha_1X)] &=\mathbb{E}[Xw] =0. \end{align*} Hence $(\alpha_0,\alpha_1)$ satisfies the population least-squares normal equations. [guided] To identify the population regression coefficients, we use the normal equations. A pair $(a,b)\in\mathbb{R}^2$ minimizes \begin{align*} \mathbb{E}\big[(Y-a-bX)^2\big] \end{align*} only if the residual $Y-a-bX$ is orthogonal to both regressors $1$ and $X$. Thus the normal equations are \begin{align*} \mathbb{E}[Y-a-bX]&=0,\\ \mathbb{E}[X(Y-a-bX)]&=0. \end{align*} We test the candidate coefficients obtained from the substitution: \begin{align*} a=\alpha_0,\qquad b=\alpha_1. \end{align*} Since \begin{align*} Y=\alpha_0+\alpha_1X+w, \end{align*} the residual from this candidate regression is exactly $w$. Therefore \begin{align*} \mathbb{E}[Y-\alpha_0-\alpha_1X] &=\mathbb{E}[w] =0, \end{align*} and \begin{align*} \mathbb{E}[X(Y-\alpha_0-\alpha_1X)] &=\mathbb{E}[Xw] =0. \end{align*} So the coefficients produced by the substitution satisfy precisely the two equations characterizing the population least-squares projection onto the span of $1$ and $X$. [/guided] [/step] [step:Use uniqueness of the population projection to identify the coefficient on $X$] Because $\operatorname{Var}(X)>0$, the regressors $1$ and $X$ are linearly independent in $L^2(\Omega,\mathcal{F},\mathbb{P})$. Equivalently, the population Gram matrix \begin{align*} G := \begin{pmatrix} 1 & \mathbb{E}[X]\\ \mathbb{E}[X] & \mathbb{E}[X^2] \end{pmatrix} \end{align*} has determinant \begin{align*} \det G =\mathbb{E}[X^2]-(\mathbb{E}[X])^2 =\operatorname{Var}(X) >0. \end{align*} Thus the population normal equations have a unique solution. Since both $(\tilde{\beta}_0,\tilde{\beta}_1)$ and $(\alpha_0,\alpha_1)$ solve them, they are equal. In particular, \begin{align*} \tilde{\beta}_1 =\alpha_1 =\beta_1+\beta_2\pi_1. \end{align*} This proves the omitted variable bias formula. [/step]

Prerequisites (0/3 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Expectation Definition Orthogonality Definition Determinant Definition Affine Regression as Projection of the Conditional Mean Probability & Statistics Martingale Regularisation Theorem Stochastic Processes Expectation of a Product of Independent Variables Probability Theory Preservation of Independence Probability Theory Asymptotic Normality of Random Design Ordinary Least Squares Probability & Statistics Joint Law Criterion for Independence Probability Theory Conditional Expectation on a Finite Partition Probability Theory Likelihood-Ratio Deviance Comparison Theorem for Nested GLMs Probability & Statistics Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.

Omitted Variable Bias Formula (Theorem # 4453)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Omitted Variable Bias Formula (Theorem # 4453)

Discussion

Proof

Prerequisites (0/3 completed)

Prerequisites Graph

Explore Further