[proofplan]
We derive the steepest descent update by specializing the [Optimal Step Size via Exact Line Search](/theorems/1396) to the case $d^{(k)} = r^{(k)}$. Substituting $d = r^{(k)}$ into the exact line search formula gives $\alpha_k = \langle r^{(k)}, r^{(k)} \rangle / \langle r^{(k)}, Ar^{(k)} \rangle = \|r^{(k)}\|_2^2 / \|r^{(k)}\|_A^2$, and the update $x^{(k+1)} = x^{(k)} + \alpha_k r^{(k)}$ follows immediately.
[/proofplan]
[step:Identify the search direction and verify the hypotheses of the exact line search formula]
In steepest descent, the search direction is chosen as $d^{(k)} = r^{(k)}$, the negative gradient direction (since $\nabla F(x^{(k)}) = -r^{(k)}$ by the [Gradient Equals Negative Residual](/theorems/1395) theorem).
The [Optimal Step Size via Exact Line Search](/theorems/1396) requires that $\langle d^{(k)}, Ad^{(k)} \rangle \neq 0$. Since $A$ is positive definite and $r^{(k)} \neq 0$ (the iteration has not yet converged), we have $\langle r^{(k)}, Ar^{(k)} \rangle > 0$, so the hypothesis is satisfied.
[guided]
Why is $d^{(k)} = r^{(k)}$ the steepest descent direction? The quadratic $F(x) = \frac{1}{2}x^\top Ax - b^\topx$ has gradient $\nabla F(x) = Ax - b = -r$. The direction of steepest decrease is $-\nabla F = r$, which is the residual. So choosing $d^{(k)} = r^{(k)}$ means we move in the direction that locally decreases $F$ most rapidly.
For the exact line search formula to be valid, we need $\langle d^{(k)}, Ad^{(k)} \rangle \neq 0$. Since $A$ is SPD and $r^{(k)} \neq 0$, the quadratic form $\langle r^{(k)}, Ar^{(k)} \rangle$ is strictly positive — this is the definition of positive definiteness. The condition $r^{(k)} \neq 0$ holds whenever the iteration has not yet converged, i.e., $x^{(k)} \neq A^{-1}b$.
[/guided]
[/step]
[step:Substitute $d^{(k)} = r^{(k)}$ into the exact line search formula]
By the [Optimal Step Size via Exact Line Search](/theorems/1396), the optimal step size along $d^{(k)}$ is
\begin{align*}
\alpha_k = \frac{\langle r^{(k)}, d^{(k)} \rangle}{\langle d^{(k)}, Ad^{(k)} \rangle}.
\end{align*}
Substituting $d^{(k)} = r^{(k)}$:
\begin{align*}
\alpha_k = \frac{\langle r^{(k)}, r^{(k)} \rangle}{\langle r^{(k)}, Ar^{(k)} \rangle} = \frac{\|r^{(k)}\|_2^2}{\langle r^{(k)}, Ar^{(k)} \rangle}.
\end{align*}
[guided]
The numerator $\langle r^{(k)}, r^{(k)} \rangle = \|r^{(k)}\|_2^2$ is the squared Euclidean norm of the residual. The denominator $\langle r^{(k)}, Ar^{(k)} \rangle$ is the $A$-inner product of the residual with itself. Both quantities are strictly positive (the numerator because $r^{(k)} \neq 0$, the denominator by positive definiteness of $A$), so $\alpha_k > 0$. This confirms we take a positive step in the descent direction, as expected.
[/guided]
[/step]
[step:Express the denominator as $\|r^{(k)}\|_A^2$ and write the update]
The denominator $\langle r^{(k)}, Ar^{(k)} \rangle$ is precisely $\|r^{(k)}\|_A^2$, the squared $A$-norm of the residual (defined by $\|v\|_A^2 := \langle v, Av \rangle$ for any $v \in \mathbb{R}^n$). The step size is therefore
\begin{align*}
\alpha_k = \frac{\|r^{(k)}\|_2^2}{\|r^{(k)}\|_A^2},
\end{align*}
and the steepest descent update is
\begin{align*}
x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)} = x^{(k)} + \frac{\|r^{(k)}\|_2^2}{\|r^{(k)}\|_A^2}\,r^{(k)}.
\end{align*}
[guided]
The ratio $\|r^{(k)}\|_2^2 / \|r^{(k)}\|_A^2$ has a clean interpretation. By the Rayleigh quotient, $\|r\|_A^2 / \|r\|_2^2 = r^\top A r / r^\top r$ lies between $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$. Therefore
\begin{align*}
\frac{1}{\lambda_{\max}(A)} \leq \alpha_k = \frac{\|r^{(k)}\|_2^2}{\|r^{(k)}\|_A^2} \leq \frac{1}{\lambda_{\min}(A)}.
\end{align*}
The step size is always bounded between the reciprocals of the extreme eigenvalues. When the residual is aligned with an eigenvector of $A$, the step size equals the reciprocal of the corresponding eigenvalue — in this case, steepest descent takes a single exact step toward the solution along that eigenspace.
[/guided]
[/step]