[proofplan]
We compute $\nabla F(x)$ for the quadratic form $F(x) = \frac{1}{2}\langle x, Ax \rangle - \langle b, x \rangle$ by differentiating each term with respect to $x$. The symmetry of $A$ gives $\nabla(\frac{1}{2}x^\top Ax) = Ax$ and the linear term contributes $-b$, yielding $\nabla F(x) = Ax - b = -r$, where $r := b - Ax$ is the residual.
[/proofplan]
[step:Differentiate $F(x) = \frac{1}{2}\langle x, Ax \rangle - \langle b, x \rangle$ term by term]
Write $F$ in component form. In coordinates $x = (x_1, \ldots, x_n)^\top$:
\begin{align*}
F(x) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j - \sum_{i=1}^{n} b_i x_i.
\end{align*}
Differentiating with respect to $x_\ell$ for $\ell = 1, \ldots, n$:
\begin{align*}
\frac{\partial F}{\partial x_\ell} = \frac{1}{2}\sum_{j=1}^{n} a_{\ell j} x_j + \frac{1}{2}\sum_{i=1}^{n} a_{i\ell} x_i - b_\ell.
\end{align*}
Since $A$ is symmetric ($a_{i\ell} = a_{\ell i}$), the two sums are equal:
\begin{align*}
\frac{\partial F}{\partial x_\ell} = \sum_{j=1}^{n} a_{\ell j} x_j - b_\ell = (Ax)_\ell - b_\ell.
\end{align*}
Assembling the components into a vector:
\begin{align*}
\nabla F(x) = Ax - b.
\end{align*}
Evaluating at $x = x^{(k)}$ and using the definition $r^{(k)} := b - Ax^{(k)}$:
\begin{align*}
\nabla F(x^{(k)}) = Ax^{(k)} - b = -(b - Ax^{(k)}) = -r^{(k)}.
\end{align*}
[guided]
To differentiate $\frac{1}{2}x^\top Ax$, we use the product rule for bilinear forms. In index notation, $\frac{1}{2}\sum_{i,j} a_{ij} x_i x_j$ has two types of contributions to $\partial/\partial x_\ell$: terms where $i = \ell$ (giving $\frac{1}{2}\sum_j a_{\ell j} x_j$) and terms where $j = \ell$ (giving $\frac{1}{2}\sum_i a_{i\ell} x_i$). The diagonal term $a_{\ell\ell} x_\ell^2$ appears in both groups, consistent with $\partial(x_\ell^2)/\partial x_\ell = 2x_\ell$. Symmetry of $A$ means $a_{i\ell} = a_{\ell i}$, so the two contributions are identical, and their sum is $\sum_j a_{\ell j} x_j = (Ax)_\ell$.
In matrix calculus notation, the standard identity for a symmetric matrix $A$ is $\nabla(x^\top Ax) = 2Ax$, so $\nabla(\frac{1}{2}x^\top Ax) = Ax$. The linear term $-b^\top x$ differentiates to $-b$.
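As a small worked illustration of the identity (the matrix and vector below are chosen only for this example and do not come from the problem itself), take $n = 2$ with
\begin{align*}
A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\end{align*}
Expanding the quadratic form gives
\begin{align*}
F(x) = \frac{1}{2}\left(2x_1^2 + 2x_1 x_2 + 3x_2^2\right) - x_1 - 2x_2 = x_1^2 + x_1 x_2 + \frac{3}{2}x_2^2 - x_1 - 2x_2,
\end{align*}
and differentiating directly,
\begin{align*}
\frac{\partial F}{\partial x_1} = 2x_1 + x_2 - 1 = (Ax)_1 - b_1, \qquad \frac{\partial F}{\partial x_2} = x_1 + 3x_2 - 2 = (Ax)_2 - b_2,
\end{align*}
which agrees with $\nabla F(x) = Ax - b$.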
The identity $\nabla F = -r$ is fundamental to the conjugate gradient method: it means that **the negative gradient of the quadratic objective is exactly the residual**. Stationary points of $F$ are therefore exactly the solutions of $Ax = b$; when $A$ is also positive definite, as the conjugate gradient method assumes, $F$ is strictly convex, so minimizing $F$ is equivalent to driving the residual to zero. This is also why steepest descent moves in the direction $-\nabla F = r$: with an exact line search, each step decreases the objective value, although the Euclidean residual norm need not decrease monotonically.
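The identity can also be checked numerically. The following is a minimal sketch, assuming NumPy is available. The helper names (`F`, `grad_F`, `finite_difference_gradient`), the random test matrix, and the step size `h` are illustrative choices, not part of the original derivation. It compares the closed form $Ax - b$ against a central-difference approximation and against $-r$.

```python
import numpy as np


def F(A, b, x):
    """Quadratic objective F(x) = 1/2 <x, Ax> - <b, x>."""
    return 0.5 * x @ (A @ x) - b @ x


def grad_F(A, b, x):
    """Closed-form gradient Ax - b (valid when A is symmetric)."""
    return A @ x - b


def finite_difference_gradient(A, b, x, h=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    n = x.size
    g = np.zeros(n)
    for ell in range(n):
        e = np.zeros(n)
        e[ell] = h  # perturb only coordinate ell
        g[ell] = (F(A, b, x + e) - F(A, b, x - e)) / (2.0 * h)
    return g


# Illustrative test data: a random symmetric positive definite matrix (assumption).
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)  # symmetric by construction
b = rng.standard_normal(n)
x = rng.standard_normal(n)

r = b - A @ x                      # residual
g_exact = grad_F(A, b, x)          # Ax - b
g_fd = finite_difference_gradient(A, b, x)

print("gradient equals -r:        ", np.allclose(g_exact, -r))
print("matches finite differences:", np.allclose(g_fd, g_exact, atol=1e-5))
```

Because $F$ is quadratic, the central difference has no truncation error, so any discrepancy comes from floating-point roundoff. If $A$ were not symmetric, the same check would instead match $\frac{1}{2}(A + A^\top)x - b$.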
[/guided]
[/step]