[proofplan]
The least-squares intercept estimator is pinned down by the first normal equation, and this alone forces the sample centroid $(\bar x, \bar Y)$ to lie on the fitted regression line. We substitute $x = \bar x$ into the equation $y = \hat\alpha + \hat\beta\, x$ of the fitted line and use the identity $\hat\alpha = \bar Y - \hat\beta\,\bar x$ obtained from the first normal equation. The two occurrences of $\hat\beta\,\bar x$ cancel, leaving $\bar Y$. Nothing beyond this identity for $\hat\alpha$ is needed; in particular, the second normal equation (the one determining $\hat\beta$) plays no role.
[/proofplan]
[step:Recall the defining formula for the intercept from the first normal equation]
Let $(x_1, Y_1), \dots, (x_n, Y_n)$ be the observations with sample means
\begin{align*}
\bar x := \frac{1}{n}\sum_{i=1}^n x_i, \qquad \bar Y := \frac{1}{n}\sum_{i=1}^n Y_i.
\end{align*}
The ordinary least squares estimators $(\hat\alpha, \hat\beta)$ minimise the residual sum of squares
\begin{align*}
S(\alpha, \beta) := \sum_{i=1}^n (Y_i - \alpha - \beta\,x_i)^2.
\end{align*}
Setting $\partial S / \partial \alpha = 0$ at the minimiser $(\hat\alpha, \hat\beta)$ yields the first [normal equation](/theorems/501):
\begin{align*}
\sum_{i=1}^n (Y_i - \hat\alpha - \hat\beta\,x_i) = 0.
\end{align*}
Dividing by $n$ and solving for $\hat\alpha$ gives the identity
\begin{align*}
\hat\alpha = \bar Y - \hat\beta\,\bar x,
\end{align*}
which we use in the next step.
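As a minimal numerical sketch (assuming NumPy; the data below are made up for illustration), the identity for $\hat\alpha$ can be paired with the standard centred slope formula $\hat\beta = \sum_i (x_i - \bar x)(Y_i - \bar Y) / \sum_i (x_i - \bar x)^2$, which comes from the second normal equation and is not derived here, and then cross-checked against a generic least-squares solve on the design matrix with columns $1$ and $x_i$. The helper `ols_fit` is hypothetical, not part of any library.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form simple linear regression (illustrative sketch).

    Slope: standard centred formula (from the second normal equation).
    Intercept: alpha_hat = ybar - beta_hat * xbar (from the first normal equation).
    Assumes the x_i are not all equal, so the slope is well defined.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    beta_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    alpha_hat = ybar - beta_hat * xbar
    return alpha_hat, beta_hat

# Made-up data, used only for illustration.
x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([1.0, 2.5, 2.0, 5.5])
alpha_hat, beta_hat = ols_fit(x, y)

# Cross-check with a generic least-squares solver on the design matrix [1, x].
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(alpha_hat, beta_hat, coef)
```

The residuals $Y_i - \hat\alpha - \hat\beta\,x_i$ of this fit should also sum to zero up to rounding, which is exactly the first normal equation.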
[guided]
Before beginning, we identify which of the two normal equations actually forces the centroid property. The residual sum of squares is the map
\begin{align*}
S : \mathbb{R}^2 &\to [0, \infty) \\
(\alpha, \beta) &\mapsto \sum_{i=1}^n (Y_i - \alpha - \beta\,x_i)^2.
\end{align*}
Since $S$ is a sum of squares of affine functions of $(\alpha, \beta)$, it is a convex quadratic, so a point is a global minimiser if and only if both first-order conditions $\partial_\alpha S = 0$ and $\partial_\beta S = 0$ hold there; when the $x_i$ are not all equal, $S$ is strictly convex and the minimiser is unique. The centroid property uses **only** the $\partial_\alpha$ equation; the $\partial_\beta$ equation is not needed.
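The first-order conditions characterise the minimiser because $S$ is a convex quadratic. Its Hessian is $2\begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}$, which depends neither on $(\alpha, \beta)$ nor on the $Y_i$. It is positive semidefinite, since its quadratic form is $2\sum_i (u + v\,x_i)^2$, and positive definite exactly when the $x_i$ are not all equal. A quick sketch (assuming NumPy; `hessian_of_S` is a hypothetical helper and both designs are made up) checks the eigenvalue signs:

```python
import numpy as np

def hessian_of_S(x):
    """Hessian of S(alpha, beta) = sum_i (Y_i - alpha - beta * x_i)^2.

    Constant in (alpha, beta) and independent of Y, because S is quadratic:
    H = 2 * [[n, sum x_i], [sum x_i, sum x_i^2]].
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    return 2.0 * np.array([[n, x.sum()], [x.sum(), np.sum(x ** 2)]])

x_spread = np.array([0.0, 1.0, 2.0, 4.0])  # x_i not all equal: strictly convex
x_flat = np.array([3.0, 3.0, 3.0])         # all x_i equal: convex but degenerate

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig_spread = np.linalg.eigvalsh(hessian_of_S(x_spread))
eig_flat = np.linalg.eigvalsh(hessian_of_S(x_flat))
print(eig_spread, eig_flat)
```

In the degenerate design one eigenvalue is zero, so the minimiser is not unique; every minimiser still satisfies $\partial_\alpha S = 0$, which is all the centroid argument uses.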
Compute
\begin{align*}
\frac{\partial S}{\partial \alpha} = -2\sum_{i=1}^n (Y_i - \alpha - \beta\,x_i).
\end{align*}
At the minimiser $(\hat\alpha, \hat\beta)$, the first-order condition $\partial_\alpha S (\hat\alpha, \hat\beta) = 0$ therefore reads
\begin{align*}
\sum_{i=1}^n (Y_i - \hat\alpha - \hat\beta\,x_i) = 0.
\end{align*}
This is the statement that the residuals $R_i := Y_i - \hat\alpha - \hat\beta\,x_i$ sum to zero. Dividing through by $n$ and rearranging,
\begin{align*}
\bar Y - \hat\alpha - \hat\beta\,\bar x = 0, \qquad \text{hence} \qquad \hat\alpha = \bar Y - \hat\beta\,\bar x.
\end{align*}
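So the first normal equation is **equivalent** to the statement $\hat\alpha = \bar Y - \hat\beta\,\bar x$, and it is this rewritten form that drives the argument. Why did we not need the slope equation? Because the centroid property concerns a single point: for any slope $\beta$ whatsoever, the intercept $\bar Y - \beta\,\bar x$ puts $(\bar x, \bar Y)$ on the line $y = (\bar Y - \beta\,\bar x) + \beta\,x$. The slope equation only selects which of these lines through the centroid is the least-squares fit, i.e. how the line tilts about that point.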
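The following sketch (assuming NumPy, on a small made-up data set) checks this numerically: for several arbitrary trial slopes $b$, the intercept $a = \bar Y - b\,\bar x$ makes $\partial S/\partial\alpha$ vanish and places $(\bar x, \bar Y)$ on the line $y = a + b\,x$, while $\partial S/\partial\beta$ stays nonzero because none of these trial slopes is the least-squares slope.

```python
import numpy as np

# Small made-up data set, used only for illustration.
x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([1.0, 2.5, 2.0, 5.5])
xbar, ybar = x.mean(), y.mean()

def dS_dalpha(alpha, beta):
    # dS/dalpha = -2 * sum_i (Y_i - alpha - beta * x_i)
    return -2.0 * np.sum(y - alpha - beta * x)

def dS_dbeta(alpha, beta):
    # dS/dbeta = -2 * sum_i x_i * (Y_i - alpha - beta * x_i)
    return -2.0 * np.sum(x * (y - alpha - beta * x))

# For each trial slope b, take the intercept a = ybar - b * xbar.
# The first normal equation then holds and the line passes through the centroid;
# dS/dbeta vanishes only at the least-squares slope, which is not in this list.
for b in [-3.0, 0.0, 0.5, 10.0]:
    a = ybar - b * xbar
    print(f"b={b:5.1f}  dS/dalpha={dS_dalpha(a, b): .1e}  "
          f"line(xbar)-ybar={a + b * xbar - ybar: .1e}  dS/dbeta={dS_dbeta(a, b): .2f}")
```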
[/guided]
[/step]
[step:Evaluate the fitted line at $x = \bar x$]
The fitted regression line is the map
\begin{align*}
\hat\ell : \mathbb{R} &\to \mathbb{R} \\
x &\mapsto \hat\alpha + \hat\beta\,x.
\end{align*}
Substituting $x = \bar x$ and using the identity $\hat\alpha = \bar Y - \hat\beta\,\bar x$ from the previous step,
\begin{align*}
\hat\ell(\bar x) = \hat\alpha + \hat\beta\,\bar x = (\bar Y - \hat\beta\,\bar x) + \hat\beta\,\bar x = \bar Y.
\end{align*}
Hence the point $(\bar x, \bar Y)$ satisfies the equation of $\hat\ell$, i.e. lies on the fitted line. The argument used nothing about the data beyond the first normal equation, so the identity holds for every data set and for every least-squares minimiser $(\hat\alpha, \hat\beta)$.
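As an independent numerical check, offered as a sketch assuming NumPy, one can fit lines with `numpy.polyfit` (degree 1, which returns the slope before the intercept) to a few synthetic random data sets and confirm that the fitted line evaluated at $\bar x$ reproduces $\bar Y$ up to floating-point error. The data are simulated, not taken from any real study, and `centroid_gap` is a hypothetical helper.

```python
import numpy as np

def centroid_gap(x, y):
    """Fitted line at mean(x) minus mean(y); hypothetical helper for this check."""
    # For deg=1, np.polyfit returns coefficients highest power first: [slope, intercept].
    beta_hat, alpha_hat = np.polyfit(x, y, 1)
    return (alpha_hat + beta_hat * np.mean(x)) - np.mean(y)

rng = np.random.default_rng(0)
gaps = []
for _ in range(5):
    n = int(rng.integers(3, 40))                               # random sample size
    x = rng.uniform(-5.0, 5.0, size=n)                         # synthetic design points
    y = rng.normal() + rng.normal() * x + rng.normal(size=n)   # random line plus noise
    gaps.append(centroid_gap(x, y))

print(max(abs(g) for g in gaps))
```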
[/step]