[proofplan]
We show that $x^*$ minimises $\|Ax - b\|^2$ if and only if $A^\top(Ax^* - b) = \mathbf{0}$. The proof computes the gradient of the quadratic objective $\phi(x) = \|Ax - b\|^2$ and sets it to zero to obtain the normal equations. Convexity of $\phi$ (from the positive semi-definiteness of $A^\top A$) shows that every critical point is a global minimiser; conversely, differentiability of $\phi$ shows that every global minimiser is a critical point.
[/proofplan]
[step:Compute the gradient of the quadratic objective]
Define the objective function
\begin{align*}
\phi: \mathbb{R}^n &\to \mathbb{R} \\
x &\mapsto \|Ax - b\|^2 = (Ax - b)^\top(Ax - b) = x^\top A^\top Ax - 2b^\top Ax + \|b\|^2.
\end{align*}
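This is a quadratic function of $x$. Since $A^\top A$ is symmetric, $\nabla(x^\top A^\top A x) = 2A^\top Ax$, and $\nabla(b^\top Ax) = A^\top b$, so the gradient is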
\begin{align*}
\nabla\phi(x) = 2A^\top Ax - 2A^\top b = 2A^\top(Ax - b).
\end{align*}
Setting $\nabla\phi(x^*) = \mathbf{0}$ gives $A^\top(Ax^* - b) = \mathbf{0}$, equivalently $A^\top A\,x^* = A^\top b$.
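As an illustrative check of the normal equations, consider the following small instance. The matrix $A$ and vector $b$ are chosen here purely for the example and are not part of the statement being proved.
\begin{align*}
A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad
b = \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix}, \quad
A^\top A = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \quad
A^\top b = \begin{pmatrix} 6 \\ 0 \end{pmatrix}.
\end{align*}
Solving the $2 \times 2$ system $A^\top A\,x^* = A^\top b$ gives $x^* = (5, -3)^\top$, and the gradient condition can be confirmed directly:
\begin{align*}
Ax^* - b = \begin{pmatrix} 5 \\ 2 \\ -1 \end{pmatrix} - \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix}, \qquad
A^\top(Ax^* - b) = \begin{pmatrix} -1 + 2 - 1 \\ 0 + 2 - 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\end{align*}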
[/step]
[step:Verify that $\phi$ is convex so critical points are global minimisers]
The matrix $A^\top A$ is positive semi-definite: for any $x \in \mathbb{R}^n$,
\begin{align*}
x^\top A^\top Ax = \|Ax\|^2 \geq 0.
\end{align*}
Since $\phi$ is a quadratic function with positive semi-definite Hessian $2A^\top A$, the function $\phi$ is convex.
For a convex function, any critical point is a global minimiser: if $\nabla\phi(x^*) = \mathbf{0}$, then by the first-order convexity condition $\phi(x) \geq \phi(x^*) + \nabla\phi(x^*)^\top(x - x^*) = \phi(x^*)$ for all $x$.
Conversely, since $\phi$ is differentiable on all of $\mathbb{R}^n$, any global minimiser $x^*$ of $\phi$ satisfies $\nabla\phi(x^*) = \mathbf{0}$.
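For this particular quadratic, the first-order inequality can also be verified by a direct computation, without appealing to general convexity theory. Writing $Ax - b = A(x - x^*) + (Ax^* - b)$ and expanding the squared norm gives, for any $x, x^* \in \mathbb{R}^n$,
\begin{align*}
\phi(x) &= \|A(x - x^*)\|^2 + 2(x - x^*)^\top A^\top(Ax^* - b) + \|Ax^* - b\|^2 \\
&= \phi(x^*) + \nabla\phi(x^*)^\top(x - x^*) + \|A(x - x^*)\|^2.
\end{align*}
If the gradient vanishes at $x^*$, the middle term drops out and $\phi(x) - \phi(x^*) = \|A(x - x^*)\|^2 \geq 0$ for every $x$.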
[/step]
[step:Interpret the normal equations geometrically as orthogonal projection]
Define the residual $r^* = Ax^* - b$.
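The condition $A^\top r^* = \mathbf{0}$ means that $r^*$ is orthogonal to every column of $A$, i.e., $r^* \perp \operatorname{col}(A)$.
Since $Ax^*$ itself lies in $\operatorname{col}(A)$ and $b - Ax^* = -r^*$ is orthogonal to $\operatorname{col}(A)$, this says precisely that $Ax^*$ is the orthogonal projection of $b$ onto the column space of $A$.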
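One way to see that this characterisation determines $Ax^*$ uniquely, even when $x^*$ itself is not unique, is the following short computation. It is a sketch only, and the symbols $p$ and $q$ are introduced here just for illustration. Suppose $p, q \in \operatorname{col}(A)$ both satisfy $b - p \perp \operatorname{col}(A)$ and $b - q \perp \operatorname{col}(A)$. Then $p - q \in \operatorname{col}(A)$, and $p - q = (b - q) - (b - p)$, so
\begin{align*}
\|p - q\|^2 = (p - q)^\top(b - q) - (p - q)^\top(b - p) = 0 - 0 = 0,
\end{align*}
hence $p = q$. In particular, any two solutions of the normal equations produce the same fitted vector $Ax^*$.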
[/step]