Properties of the Conjugate Gradient Method — Statement & Proof

Properties of the Conjugate Gradient Method (Theorem # 1400)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We prove three properties of the conjugate gradient method by induction on $m$. Property (1): the span of residuals equals the span of search directions equals the Krylov subspace $K_{m+1}(A, r^{(0)})$. The equality of the first two spans follows from the Gram-Schmidt-like construction of $d^{(m)}$ from $r^{(m)}$; the identification with the Krylov subspace uses the residual update $r^{(m+1)} = r^{(m)} - \alpha_m Ad^{(m)}$ and a dimension argument. Property (2): mutual orthogonality of residuals follows from the [Finite Termination of Conjugate Direction Methods](/theorems/1399) and the span equality. Property (3): conjugacy of search directions holds by construction. [/proofplan] [step:Prove $\operatorname{span}\{d^{(i)}\}_{i=0}^m = \operatorname{span}\{r^{(i)}\}_{i=0}^m$ from the CG construction] The conjugate gradient method constructs $d^{(m)}$ from $r^{(m)}$ by the recurrence: \begin{align*} d^{(0)} &= r^{(0)}, \\ d^{(m)} &= r^{(m)} + \beta_{m-1}\,d^{(m-1)}, \qquad m \geq 1, \end{align*} where $\beta_{m-1} = \|r^{(m)}\|_2^2 / \|r^{(m-1)}\|_2^2$. This is a Gram-Schmidt-type process that produces search directions from residuals. We prove $\operatorname{span}\{d^{(i)}\}_{i=0}^m = \operatorname{span}\{r^{(i)}\}_{i=0}^m$ by induction on $m$. **Base case ($m = 0$).** $d^{(0)} = r^{(0)}$, so $\operatorname{span}\{d^{(0)}\} = \operatorname{span}\{r^{(0)}\}$. **Inductive step.** Assume $\operatorname{span}\{d^{(i)}\}_{i=0}^{m-1} = \operatorname{span}\{r^{(i)}\}_{i=0}^{m-1}$. The recurrence $d^{(m)} = r^{(m)} + \beta_{m-1}d^{(m-1)}$ shows $d^{(m)} \in \operatorname{span}\{r^{(m)}\} + \operatorname{span}\{d^{(i)}\}_{i=0}^{m-1} = \operatorname{span}\{r^{(m)}\} + \operatorname{span}\{r^{(i)}\}_{i=0}^{m-1}$. Conversely, $r^{(m)} = d^{(m)} - \beta_{m-1}d^{(m-1)} \in \operatorname{span}\{d^{(i)}\}_{i=0}^m$. Therefore: \begin{align*} \operatorname{span}\{d^{(i)}\}_{i=0}^m = \operatorname{span}\{r^{(i)}\}_{i=0}^m. \end{align*} [/step] [step:Prove $\operatorname{span}\{r^{(i)}\}_{i=0}^m = K_{m+1}(A, r^{(0)})$ by induction using the residual update] Recall that $K_{m+1}(A, r^{(0)}) := \operatorname{span}\{r^{(0)}, Ar^{(0)}, \ldots, A^mr^{(0)}\}$. We prove $\operatorname{span}\{r^{(i)}\}_{i=0}^m = K_{m+1}(A, r^{(0)})$ by induction on $m$. **Base case ($m = 0$).** Both sides equal $\operatorname{span}\{r^{(0)}\}$. **Inductive step.** Assume $\operatorname{span}\{r^{(i)}\}_{i=0}^{m-1} = K_m(A, r^{(0)})$. The residual update in CG is: \begin{align*} r^{(m)} = r^{(m-1)} - \alpha_{m-1} Ad^{(m-1)}. \end{align*} By the previous step, $d^{(m-1)} \in \operatorname{span}\{r^{(i)}\}_{i=0}^{m-1} = K_m(A, r^{(0)})$. Therefore $Ad^{(m-1)} \in AK_m(A, r^{(0)}) \subset K_{m+1}(A, r^{(0)})$. Since $r^{(m-1)} \in K_m(A, r^{(0)}) \subset K_{m+1}(A, r^{(0)})$, we get $r^{(m)} \in K_{m+1}(A, r^{(0)})$, hence: \begin{align*} \operatorname{span}\{r^{(i)}\}_{i=0}^m \subset K_{m+1}(A, r^{(0)}). \end{align*} For the reverse inclusion, $K_{m+1}(A, r^{(0)}) = K_m(A, r^{(0)}) + \operatorname{span}\{A^m r^{(0)}\}$. By the inductive hypothesis, $K_m(A, r^{(0)}) = \operatorname{span}\{r^{(i)}\}_{i=0}^{m-1}$. Since $r^{(m)} \notin \operatorname{span}\{r^{(i)}\}_{i=0}^{m-1}$ (because $r^{(m)}$ is orthogonal to all $r^{(0)}, \ldots, r^{(m-1)}$ by property (2), which we establish independently below, and $r^{(m)} \neq 0$ by hypothesis), the dimension satisfies: \begin{align*} \dim \operatorname{span}\{r^{(i)}\}_{i=0}^m = m + 1 = \dim K_{m+1}(A, r^{(0)}), \end{align*} where the last equality holds because $K_{m+1}(A, r^{(0)})$ has dimension at most $m + 1$ and contains the $(m+1)$-dimensional subspace $\operatorname{span}\{r^{(i)}\}_{i=0}^m$. Since $\operatorname{span}\{r^{(i)}\}_{i=0}^m \subset K_{m+1}(A, r^{(0)})$ and both have dimension $m+1$, equality follows. [guided] The residual update $r^{(m)} = r^{(m-1)} - \alpha_{m-1} Ad^{(m-1)}$ is the mechanism that connects residuals to the Krylov subspace. At each step, the new residual $r^{(m)}$ is formed by taking a linear combination of $r^{(m-1)}$ (which lives in $K_m$) and $Ad^{(m-1)}$ (which lives in $AK_m \subset K_{m+1}$). This is how $A$ enters: the multiplication by $A$ in the residual update generates the next power of $A$ needed to extend the Krylov subspace. The dimension argument is crucial: we need $\operatorname{span}\{r^{(i)}\}_{i=0}^m$ to have full dimension $m + 1$ (rather than being a proper subspace of $K_{m+1}$). This is guaranteed by the orthogonality of the residuals: since $r^{(0)}, \ldots, r^{(m)}$ are mutually orthogonal and all nonzero (by hypothesis $r^{(m)} \neq 0$), they are linearly independent. Hence $\dim \operatorname{span}\{r^{(i)}\}_{i=0}^m = m + 1$. Why does the Krylov subspace $K_{m+1}(A, r^{(0)})$ have dimension exactly $m + 1$? The dimension is at most $m + 1$ by definition (it is spanned by $m + 1$ vectors). It is at least $m + 1$ because it contains the $(m+1)$-dimensional subspace $\operatorname{span}\{r^{(i)}\}_{i=0}^m$. So both spaces have the same dimension, and the inclusion $\operatorname{span}\{r^{(i)}\}_{i=0}^m \subset K_{m+1}$ forces equality. [/guided] [/step] [step:Prove mutual orthogonality of residuals from the finite termination theorem and the span equality] By the [Finite Termination of Conjugate Direction Methods](/theorems/1399), $r^{(k)}$ is orthogonal to $\operatorname{span}\{d^{(0)}, \ldots, d^{(k-1)}\}$ for each $k = 1, \ldots, n$. By property (1), $\operatorname{span}\{d^{(0)}, \ldots, d^{(k-1)}\} = \operatorname{span}\{r^{(0)}, \ldots, r^{(k-1)}\}$. Therefore: \begin{align*} \langle r^{(m)}, r^{(i)} \rangle = 0 \qquad \text{for all } i < m. \end{align*} Since $\operatorname{span}\{d^{(i)}\}_{i=0}^{m-1} = \operatorname{span}\{r^{(i)}\}_{i=0}^{m-1}$, orthogonality to the residuals immediately gives: \begin{align*} \langle r^{(m)}, d^{(i)} \rangle = 0 \qquad \text{for all } i < m. \end{align*} [/step] [step:Verify pairwise conjugacy of search directions from the Gram-Schmidt construction] The conjugacy $\langle d^{(m)}, d^{(i)} \rangle_A = 0$ for $i < m$ follows from the choice of $\beta_{m-1}$ in the CG recurrence. By the recurrence $d^{(m)} = r^{(m)} + \beta_{m-1}d^{(m-1)}$: \begin{align*} \langle d^{(m)}, d^{(i)} \rangle_A = \langle r^{(m)}, d^{(i)} \rangle_A + \beta_{m-1}\langle d^{(m-1)}, d^{(i)} \rangle_A. \end{align*} For $i < m - 1$, both terms vanish: $\langle r^{(m)}, d^{(i)} \rangle_A = \langle r^{(m)}, Ad^{(i)} \rangle$, and $Ad^{(i)} = \frac{1}{\alpha_i}(r^{(i)} - r^{(i+1)})$ (from the residual update), so $\langle r^{(m)}, Ad^{(i)} \rangle = \frac{1}{\alpha_i}(\langle r^{(m)}, r^{(i)} \rangle - \langle r^{(m)}, r^{(i+1)} \rangle) = 0$ by the mutual orthogonality of residuals (property (2), since $i < m$ and $i + 1 \leq m - 1 < m$). By induction on $m$, $\langle d^{(m-1)}, d^{(i)} \rangle_A = 0$ for $i < m - 1$. For $i = m - 1$, the coefficient $\beta_{m-1}$ is chosen precisely so that $\langle d^{(m)}, d^{(m-1)} \rangle_A = 0$: \begin{align*} \langle d^{(m)}, d^{(m-1)} \rangle_A = \langle r^{(m)}, d^{(m-1)} \rangle_A + \beta_{m-1}\langle d^{(m-1)}, d^{(m-1)} \rangle_A = 0, \end{align*} which gives $\beta_{m-1} = -\langle r^{(m)}, d^{(m-1)} \rangle_A / \langle d^{(m-1)}, d^{(m-1)} \rangle_A$. This is a one-term Gram-Schmidt $A$-orthogonalisation of $r^{(m)}$ against $d^{(m-1)}$ (only one correction term is needed because $r^{(m)}$ is already $A$-orthogonal to $d^{(0)}, \ldots, d^{(m-2)}$). [guided] A natural question is: why does the CG recurrence only involve a single correction term $\beta_{m-1}d^{(m-1)}$ rather than a full Gram-Schmidt process against all previous directions? The answer is that the mutual orthogonality of the residuals (property (2)) already ensures $\langle r^{(m)}, d^{(i)} \rangle_A = 0$ for $i < m - 1$. The only direction against which $r^{(m)}$ needs correction is $d^{(m-1)}$. To see why $\langle r^{(m)}, d^{(i)} \rangle_A = 0$ for $i < m - 1$: from the residual update $r^{(i+1)} = r^{(i)} - \alpha_i Ad^{(i)}$, we get $Ad^{(i)} = \frac{1}{\alpha_i}(r^{(i)} - r^{(i+1)})$. Therefore: \begin{align*} \langle r^{(m)}, d^{(i)} \rangle_A = \langle r^{(m)}, Ad^{(i)} \rangle = \frac{1}{\alpha_i}\bigl(\langle r^{(m)}, r^{(i)} \rangle - \langle r^{(m)}, r^{(i+1)} \rangle\bigr). \end{align*} When $i + 1 < m$ (i.e., $i \leq m - 2$), both $\langle r^{(m)}, r^{(i)} \rangle$ and $\langle r^{(m)}, r^{(i+1)} \rangle$ vanish by mutual orthogonality. This is the "short recurrence" property that makes CG computationally efficient: each new direction requires only one correction, not $m$ corrections. This short recurrence is special to the symmetric positive definite case. For non-symmetric $A$, the tridiagonal structure breaks down, and one must use full orthogonalisation methods like GMRES, which require $O(m)$ storage for $m$ iterations. [/guided] [/step]

Explore Further

Convergence for the Five-Point Poisson Scheme Numerical Analysis Chebyshev Coefficients of a Derivative Numerical Analysis Equivalence of Orderings Numerical Analysis von Neumann Stability Criterion Numerical Analysis Spectral Radius Criterion Numerical Analysis Chebyshev Convergence Bound for CG Numerical Analysis Jacobi Convergence for SPD-like Matrices Numerical Analysis Strict Diagonal Dominance Implies Nonsingularity Numerical Analysis

What brings you to Androma?

Start with a route through the knowledge graph.

Properties of the Conjugate Gradient Method (Theorem # 1400)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Properties of the Conjugate Gradient Method (Theorem # 1400)

Discussion

Proof

Explore Further