Centroid Property of the Fitted Line — Statement & Proof

Centroid Property of the Fitted Line (Theorem # 1438)

Theorem

Proof

[proofplan] The first normal equation fixes the least-squares intercept estimator in such a way that the sample centroid $(\bar x, \bar Y)$ lies on the fitted regression line. We substitute $x = \bar x$ into the equation $y = \hat\alpha + \hat\beta\, x$ of the fitted line and use the identity $\hat\alpha = \bar Y - \hat\beta\,\bar x$ obtained from the first normal equation. The two occurrences of $\hat\beta\,\bar x$ cancel and we recover $\bar Y$. No further structure is needed beyond this identity for $\hat\alpha$. [/proofplan] [step:Recall the defining formula for the intercept from the first normal equation] Let $(x_1, Y_1), \dots, (x_n, Y_n)$ be the observations with sample means \begin{align*} \bar x := \frac{1}{n}\sum_{i=1}^n x_i, \qquad \bar Y := \frac{1}{n}\sum_{i=1}^n Y_i. \end{align*} The ordinary least squares estimators $(\hat\alpha, \hat\beta)$ minimise the residual sum of squares \begin{align*} S(\alpha, \beta) := \sum_{i=1}^n (Y_i - \alpha - \beta\,x_i)^2. \end{align*} Setting $\partial S / \partial \alpha = 0$ yields the first [normal equation](/theorems/501): \begin{align*} \sum_{i=1}^n (Y_i - \hat\alpha - \hat\beta\,x_i) = 0. \end{align*} Dividing by $n$ and solving for $\hat\alpha$ gives the identity \begin{align*} \hat\alpha = \bar Y - \hat\beta\,\bar x, \end{align*} which we use in the next step. [guided] Before beginning, we clarify which of the two normal equations actually forces the centroid property. The residual sum of squares is the map \begin{align*} S : \mathbb{R}^2 &\to [0, \infty) \\ (\alpha, \beta) &\mapsto \sum_{i=1}^n (Y_i - \alpha - \beta\,x_i)^2. \end{align*} Since $S$ is a convex quadratic function of $(\alpha, \beta)$, a point is a global minimiser if and only if it satisfies the two first-order conditions $\partial_\alpha S = 0$ and $\partial_\beta S = 0$. It turns out that the centroid property uses **only** the $\partial_\alpha$ equation; the $\partial_\beta$ equation is not needed. 
Compute \begin{align*} \frac{\partial S}{\partial \alpha} = -2\sum_{i=1}^n (Y_i - \alpha - \beta\,x_i). \end{align*} At the minimiser $(\hat\alpha, \hat\beta)$, the first-order condition $\partial_\alpha S (\hat\alpha, \hat\beta) = 0$ therefore reads \begin{align*} \sum_{i=1}^n (Y_i - \hat\alpha - \hat\beta\,x_i) = 0. \end{align*} This is the statement that the residuals $R_i := Y_i - \hat\alpha - \hat\beta\,x_i$ sum to zero. Dividing through by $n$ and rearranging, \begin{align*} \bar Y - \hat\alpha - \hat\beta\,\bar x = 0, \qquad \text{hence} \qquad \hat\alpha = \bar Y - \hat\beta\,\bar x. \end{align*} So the first normal equation is **equivalent** to the statement $\hat\alpha = \bar Y - \hat\beta\,\bar x$. It is this rewritten form that drives the argument. Why did we not need the slope equation? Because the centroid property is about a single point; the slope equation governs how the line tilts about that point. [/guided] [/step] [step:Evaluate the fitted line at $x = \bar x$] The fitted regression line is the map \begin{align*} \hat\ell : \mathbb{R} &\to \mathbb{R} \\ x &\mapsto \hat\alpha + \hat\beta\,x. \end{align*} Substituting $x = \bar x$ and using the identity $\hat\alpha = \bar Y - \hat\beta\,\bar x$ from the previous step, \begin{align*} \hat\ell(\bar x) = \hat\alpha + \hat\beta\,\bar x = (\bar Y - \hat\beta\,\bar x) + \hat\beta\,\bar x = \bar Y. \end{align*} Hence the point $(\bar x, \bar Y)$ satisfies the equation of $\hat\ell$, i.e. lies on the fitted line. The argument uses only the first normal equation, which holds at every minimiser of $S$, so the identity holds for every data set. [/step]
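
The proof above can be checked numerically on a small data set. The following is a minimal sketch, not part of the original proof: the function name `fit_ols` and the three data points are hypothetical, and the second normal equation $\sum_{i=1}^n x_i (Y_i - \hat\alpha - \hat\beta\,x_i) = 0$ is assumed as a standard fact that the text does not state. The sketch solves both normal equations as a $2 \times 2$ linear system, without using the identity $\hat\alpha = \bar Y - \hat\beta\,\bar x$, and then confirms that the fitted line passes through $(\bar x, \bar Y)$ and that the residuals sum to zero.

```python
import math


def fit_ols(xs, ys):
    """Solve the two normal equations for (alpha_hat, beta_hat) by Cramer's rule.

    Hypothetical helper for illustration. The system solved is
        n * alpha      + sum(x) * beta    = sum(y)
        sum(x) * alpha + sum(x^2) * beta  = sum(x * y)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # mathematically zero only if all x values coincide
    if det == 0:
        raise ValueError("all x values are equal; the slope is not identified")
    alpha_hat = (sy * sxx - sx * sxy) / det
    beta_hat = (n * sxy - sx * sy) / det
    return alpha_hat, beta_hat


# Hypothetical data, chosen so that the arithmetic is easy to follow by hand.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 7.0]

alpha_hat, beta_hat = fit_ols(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Evaluate the fitted line at x_bar and sum the residuals R_i.
fitted_at_centroid = alpha_hat + beta_hat * x_bar
residual_sum = sum(y - alpha_hat - beta_hat * x for x, y in zip(xs, ys))

print(alpha_hat, beta_hat)            # -1.0 2.5
print(fitted_at_centroid, y_bar)      # 4.0 4.0
print(math.isclose(fitted_at_centroid, y_bar))
```

Because the solver never uses the rearranged intercept identity, agreement between `fitted_at_centroid` and `y_bar` is a genuine check of the centroid property rather than a restatement of it.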

