Steepest Descent Update with Exact Line Search — Statement & Proof

Steepest Descent Update with Exact Line Search (Theorem # 1397)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We derive the steepest descent update by specializing the [Optimal Step Size via Exact Line Search](/theorems/1396) to the case $d^{(k)} = r^{(k)}$. Substituting $d = r^{(k)}$ into the exact line search formula gives $\alpha_k = \langle r^{(k)}, r^{(k)} \rangle / \langle r^{(k)}, Ar^{(k)} \rangle = \|r^{(k)}\|_2^2 / \|r^{(k)}\|_A^2$, and the update $x^{(k+1)} = x^{(k)} + \alpha_k r^{(k)}$ follows immediately. [/proofplan] [step:Identify the search direction and verify the hypotheses of the exact line search formula] In steepest descent, the search direction is chosen as $d^{(k)} = r^{(k)}$, the negative gradient direction (since $\nabla F(x^{(k)}) = -r^{(k)}$ by the [Gradient Equals Negative Residual](/theorems/1395) theorem). The [Optimal Step Size via Exact Line Search](/theorems/1396) requires that $\langle d^{(k)}, Ad^{(k)} \rangle \neq 0$. Since $A$ is positive definite and $r^{(k)} \neq 0$ (the iteration has not yet converged), we have $\langle r^{(k)}, Ar^{(k)} \rangle > 0$, so the hypothesis is satisfied. [guided] Why is $d^{(k)} = r^{(k)}$ the steepest descent direction? The quadratic $F(x) = \frac{1}{2}x^\top Ax - b^\topx$ has gradient $\nabla F(x) = Ax - b = -r$. The direction of steepest decrease is $-\nabla F = r$, which is the residual. So choosing $d^{(k)} = r^{(k)}$ means we move in the direction that locally decreases $F$ most rapidly. For the exact line search formula to be valid, we need $\langle d^{(k)}, Ad^{(k)} \rangle \neq 0$. Since $A$ is SPD and $r^{(k)} \neq 0$, the quadratic form $\langle r^{(k)}, Ar^{(k)} \rangle$ is strictly positive — this is the definition of positive definiteness. The condition $r^{(k)} \neq 0$ holds whenever the iteration has not yet converged, i.e., $x^{(k)} \neq A^{-1}b$. [/guided] [/step] [step:Substitute $d^{(k)} = r^{(k)}$ into the exact line search formula] By the [Optimal Step Size via Exact Line Search](/theorems/1396), the optimal step size along $d^{(k)}$ is \begin{align*} \alpha_k = \frac{\langle r^{(k)}, d^{(k)} \rangle}{\langle d^{(k)}, Ad^{(k)} \rangle}. \end{align*} Substituting $d^{(k)} = r^{(k)}$: \begin{align*} \alpha_k = \frac{\langle r^{(k)}, r^{(k)} \rangle}{\langle r^{(k)}, Ar^{(k)} \rangle} = \frac{\|r^{(k)}\|_2^2}{\langle r^{(k)}, Ar^{(k)} \rangle}. \end{align*} [guided] The numerator $\langle r^{(k)}, r^{(k)} \rangle = \|r^{(k)}\|_2^2$ is the squared Euclidean norm of the residual. The denominator $\langle r^{(k)}, Ar^{(k)} \rangle$ is the $A$-inner product of the residual with itself. Both quantities are strictly positive (the numerator because $r^{(k)} \neq 0$, the denominator by positive definiteness of $A$), so $\alpha_k > 0$. This confirms we take a positive step in the descent direction, as expected. [/guided] [/step] [step:Express the denominator as $\|r^{(k)}\|_A^2$ and write the update] The denominator $\langle r^{(k)}, Ar^{(k)} \rangle$ is precisely $\|r^{(k)}\|_A^2$, the squared $A$-norm of the residual (defined by $\|v\|_A^2 := \langle v, Av \rangle$ for any $v \in \mathbb{R}^n$). The step size is therefore \begin{align*} \alpha_k = \frac{\|r^{(k)}\|_2^2}{\|r^{(k)}\|_A^2}, \end{align*} and the steepest descent update is \begin{align*} x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)} = x^{(k)} + \frac{\|r^{(k)}\|_2^2}{\|r^{(k)}\|_A^2}\,r^{(k)}. \end{align*} [guided] The ratio $\|r^{(k)}\|_2^2 / \|r^{(k)}\|_A^2$ has a clean interpretation. By the Rayleigh quotient, $\|r\|_A^2 / \|r\|_2^2 = r^\top A r / r^\top r$ lies between $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$. Therefore \begin{align*} \frac{1}{\lambda_{\max}(A)} \leq \alpha_k = \frac{\|r^{(k)}\|_2^2}{\|r^{(k)}\|_A^2} \leq \frac{1}{\lambda_{\min}(A)}. \end{align*} The step size is always bounded between the reciprocals of the extreme eigenvalues. When the residual is aligned with an eigenvector of $A$, the step size equals the reciprocal of the corresponding eigenvalue — in this case, steepest descent takes a single exact step toward the solution along that eigenspace. [/guided] [/step]

Explore Further

Row Formula for Non-Symmetric Deflation Numerical Analysis Gershgorin Circle Theorem Numerical Analysis Convergence of Rayleigh Quotients Numerical Analysis Spectral Radius Criterion Numerical Analysis Complexity of the FFT Numerical Analysis Properties of the Conjugate Gradient Method Numerical Analysis Chebyshev Coefficients of a Derivative Numerical Analysis Derivatives of Chebyshev Polynomials Numerical Analysis

What brings you to Androma?

Start with a route through the knowledge graph.

Steepest Descent Update with Exact Line Search (Theorem # 1397)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Steepest Descent Update with Exact Line Search (Theorem # 1397)

Discussion

Proof

Explore Further