Population Normal Equations for Simple Linear Regression

Population Normal Equations for Simple Linear Regression (Theorem # 4426)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We expand the quadratic risk $Q(a,b)$ using only finite first and second moments, so the first-order conditions can be computed algebraically. The two resulting normal equations are then solved explicitly: the intercept equation expresses $a$ in terms of $b$, and the slope equation reduces to $\operatorname{Cov}(X,Y)-b\operatorname{Var}(X)=0$. Finally, after inserting the resulting coefficients, the residual is orthogonal to both $1$ and $X$, so every other risk differs by the non-negative quantity $\mathbb E[(c+dX)^2]$; the assumption $\operatorname{Var}(X)>0$ makes equality possible only for $c=d=0$. [/proofplan] [step:Expand the quadratic risk into finite moments] Since $X,Y\in L^2(\Omega,\mathcal F,\mathbb P)$, the random variables $X$, $Y$, $X^2$, $Y^2$, and $XY$ are integrable. The integrability of $XY$ follows from the elementary inequality \begin{align*} |XY|\leq \frac{1}{2}X^2+\frac{1}{2}Y^2. \end{align*} Thus, for every $(a,b)\in\mathbb R^2$, expanding the square gives \begin{align*} Q(a,b) &=\mathbb E[(Y-a-bX)^2]\\ &=\mathbb E[Y^2]-2a\mathbb E[Y]-2b\mathbb E[XY]+a^2+2ab\mathbb E[X]+b^2\mathbb E[X^2]. \end{align*} Therefore $Q$ is a real quadratic polynomial in $(a,b)$ with finite coefficients. [guided] The purpose of this step is to avoid any hidden differentiating-under-the-expectation issue. We first show that every coefficient in the expanded expression is finite. Because $X,Y\in L^2(\Omega,\mathcal F,\mathbb P)$, the second moments $\mathbb E[X^2]$ and $\mathbb E[Y^2]$ are finite. The mixed term $XY$ is integrable because, pointwise on $\Omega$, \begin{align*} |XY|\leq \frac{1}{2}X^2+\frac{1}{2}Y^2, \end{align*} and the right-hand side has finite expectation. Since a probability measure has total mass $1$, square-integrability also implies $\mathbb E[|X|]<\infty$ and $\mathbb E[|Y|]<\infty$. Now expand the square before taking expectations: \begin{align*} (Y-a-bX)^2 =Y^2-2aY-2bXY+a^2+2abX+b^2X^2. \end{align*} Taking expectations term by term is justified by the integrability just verified, and gives \begin{align*} Q(a,b) &=\mathbb E[Y^2]-2a\mathbb E[Y]-2b\mathbb E[XY]+a^2+2ab\mathbb E[X]+b^2\mathbb E[X^2]. \end{align*} Thus $Q:\mathbb R^2\to\mathbb R$ is an ordinary quadratic polynomial in the two real variables $a$ and $b$. [/guided] [/step] [step:Derive the population normal equations from the quadratic polynomial] For fixed $(a,b)\in\mathbb R^2$, the partial derivatives of the polynomial expression for $Q$ are \begin{align*} \frac{\partial Q}{\partial a}(a,b) &=-2\mathbb E[Y]+2a+2b\mathbb E[X]\\ &=-2\mathbb E[Y-a-bX], \end{align*} and \begin{align*} \frac{\partial Q}{\partial b}(a,b) &=-2\mathbb E[XY]+2a\mathbb E[X]+2b\mathbb E[X^2]\\ &=-2\mathbb E[X(Y-a-bX)]. \end{align*} Hence every unconstrained minimizer of $Q$ must satisfy \begin{align*} \mathbb E[Y-a-bX]&=0,& \mathbb E[X(Y-a-bX)]&=0. \end{align*} [/step] [step:Solve the normal equations explicitly] Assume $(a,b)\in\mathbb R^2$ satisfies the normal equations. From the first equation, \begin{align*} 0=\mathbb E[Y-a-bX]=\mathbb E[Y]-a-b\mathbb E[X], \end{align*} so \begin{align*} a=\mathbb E[Y]-b\mathbb E[X]. \end{align*} Substitute this value of $a$ into the second equation: \begin{align*} 0 &=\mathbb E[X(Y-a-bX)]\\ &=\mathbb E[XY]-a\mathbb E[X]-b\mathbb E[X^2]\\ &=\mathbb E[XY]-(\mathbb E[Y]-b\mathbb E[X])\mathbb E[X]-b\mathbb E[X^2]\\ &=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]-b\bigl(\mathbb E[X^2]-(\mathbb E[X])^2\bigr)\\ &=\operatorname{Cov}(X,Y)-b\operatorname{Var}(X). \end{align*} Since $\operatorname{Var}(X)>0$, this gives \begin{align*} b=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}. \end{align*} Consequently \begin{align*} a=\mathbb E[Y]-\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}\mathbb E[X]. \end{align*} Thus the normal equations have exactly one solution, namely $(\alpha,\beta)$. [guided] We now solve the two equations as a linear system in the unknown [real numbers](/page/Real%20Numbers) $a$ and $b$. The first normal equation is \begin{align*} 0=\mathbb E[Y-a-bX]. \end{align*} Linearity of expectation gives \begin{align*} 0=\mathbb E[Y]-a-b\mathbb E[X], \end{align*} so the intercept must be \begin{align*} a=\mathbb E[Y]-b\mathbb E[X]. \end{align*} The second normal equation is \begin{align*} 0=\mathbb E[X(Y-a-bX)]. \end{align*} Expanding the product and using linearity of expectation gives \begin{align*} 0=\mathbb E[XY]-a\mathbb E[X]-b\mathbb E[X^2]. \end{align*} Substitute the expression $a=\mathbb E[Y]-b\mathbb E[X]$ obtained from the first equation: \begin{align*} 0 &=\mathbb E[XY]-(\mathbb E[Y]-b\mathbb E[X])\mathbb E[X]-b\mathbb E[X^2]\\ &=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]+b(\mathbb E[X])^2-b\mathbb E[X^2]\\ &=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y]-b\bigl(\mathbb E[X^2]-(\mathbb E[X])^2\bigr). \end{align*} The identity \begin{align*} \operatorname{Cov}(X,Y)=\mathbb E[XY]-\mathbb E[X]\mathbb E[Y] \end{align*} follows by expanding $\mathbb E[(X-\mathbb E[X])(Y-\mathbb E[Y])]$, and \begin{align*} \operatorname{Var}(X)=\mathbb E[X^2]-(\mathbb E[X])^2. \end{align*} Therefore the second normal equation becomes \begin{align*} 0=\operatorname{Cov}(X,Y)-b\operatorname{Var}(X). \end{align*} Because $\operatorname{Var}(X)>0$, division by $\operatorname{Var}(X)$ is valid, and hence \begin{align*} b=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}. \end{align*} Substituting this value of $b$ into the intercept formula gives \begin{align*} a=\mathbb E[Y]-\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}\mathbb E[X]. \end{align*} So the normal equations have the unique solution $(a,b)=(\alpha,\beta)$. [/guided] [/step] [step:Show the normal-equation solution is the unique minimizer] Define the residual random variable \begin{align*} R:\Omega&\to\mathbb R,\\ \omega&\mapsto Y(\omega)-\alpha-\beta X(\omega). \end{align*} By the previous step, $(\alpha,\beta)$ satisfies the normal equations, so \begin{align*} \mathbb E[R]&=0,& \mathbb E[XR]&=0. \end{align*} Let $(a,b)\in\mathbb R^2$ be arbitrary, and define $c:=a-\alpha$ and $d:=b-\beta$. Then \begin{align*} Y-a-bX=R-c-dX. \end{align*} Expanding the square and using the two orthogonality relations gives \begin{align*} Q(a,b) &=\mathbb E[(R-c-dX)^2]\\ &=\mathbb E[R^2]-2c\mathbb E[R]-2d\mathbb E[XR]+\mathbb E[(c+dX)^2]\\ &=Q(\alpha,\beta)+\mathbb E[(c+dX)^2]\\ &\geq Q(\alpha,\beta). \end{align*} Thus $(\alpha,\beta)$ is a minimizer. If equality holds, then $\mathbb E[(c+dX)^2]=0$, so $c+dX=0$ $\mathbb P$-a.s. If $d\neq 0$, then $X=-c/d$ $\mathbb P$-a.s., which implies $\operatorname{Var}(X)=0$, contradicting the hypothesis. Hence $d=0$, and then $c=0$. Therefore $a=\alpha$ and $b=\beta$, so the minimizer is unique. [/step] [step:Identify the minimizer with the normal equations] We have shown that every minimizer satisfies the normal equations, that the normal equations have the unique solution $(\alpha,\beta)$, and that $(\alpha,\beta)$ is the unique minimizer of $Q$. Therefore, for every $(a,b)\in\mathbb R^2$, \begin{align*} (a,b)=(\alpha,\beta) \end{align*} if and only if \begin{align*} \mathbb E[Y-a-bX]&=0,& \mathbb E[X(Y-a-bX)]&=0. \end{align*} This proves both the stated formulas and the claimed equivalence with the population normal equations. [/step]

Prerequisites (0/2 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Definitions & Concepts

Explore Further

Orthogonality Definition Expectation Definition Moments from the MGF Probability Theory Identifiability Under Full Column Rank Probability & Statistics PGF of a Sum Probability Theory Law of Total Probability Probability Theory Boundary Continuity Under the Cone Condition Brownian Motion Ordinary Least Squares Projection Theorem Probability & Statistics Discrete Factorisation Criterion Probability Theory Structure of the Zero Set of Brownian Motion Brownian Motion Probability & Statistics Area

What brings you to Androma?

Start with a route through the knowledge graph.