Chebyshev Convergence Bound for CG — Statement & Proof

Chebyshev Convergence Bound for CG (Theorem # 1403)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

No discussion available for this theorem.

Proof

[proofplan] We derive the Chebyshev convergence bound for CG by combining the polynomial minimization characterization from the [CG Minimizes Error over Polynomials](/theorems/1402) theorem with an explicit upper bound from shifted Chebyshev polynomials. First, we bound $\|P_k(A)e^{(0)}\|_A$ by $\max_{\lambda \in \sigma(A)} |P_k(\lambda)| \cdot \|e^{(0)}\|_A$ using the eigenbasis expansion. Then we construct the optimal polynomial from the Chebyshev polynomial $T_k$ shifted to $[l, L]$, and evaluate the resulting bound using the identity $T_k(t) \geq \frac{1}{2}(t + \sqrt{t^2 - 1})^k$ for $t \geq 1$. [/proofplan] [step:Bound $\|P_k(A)e^{(0)}\|_A$ using the eigenbasis expansion] Since $A$ is symmetric positive definite, let $u_1, \ldots, u_n$ be an orthonormal eigenbasis with eigenvalues $\lambda_1, \ldots, \lambda_n \in [l, L]$. Expand $e^{(0)} = \sum_{i=1}^{n} \hat{e}_i u_i$ where $\hat{e}_i = \langle e^{(0)}, u_i \rangle$. Applying $P_k(A)$: \begin{align*} P_k(A)e^{(0)} = \sum_{i=1}^{n} P_k(\lambda_i) \hat{e}_i u_i. \end{align*} Computing the squared $A$-norm using orthonormality of $(u_i)$ and $Au_i = \lambda_i u_i$: \begin{align*} \|P_k(A)e^{(0)}\|_A^2 = \langle P_k(A)e^{(0)}, AP_k(A)e^{(0)} \rangle = \sum_{i=1}^{n} |P_k(\lambda_i)|^2 \lambda_i |\hat{e}_i|^2. \end{align*} Bounding $|P_k(\lambda_i)|^2 \leq \max_{\lambda \in [l,L]} |P_k(\lambda)|^2$ for each $i$: \begin{align*} \|P_k(A)e^{(0)}\|_A^2 \leq \left(\max_{\lambda \in [l,L]} |P_k(\lambda)|\right)^2 \sum_{i=1}^{n} \lambda_i |\hat{e}_i|^2 = \left(\max_{\lambda \in [l,L]} |P_k(\lambda)|\right)^2 \|e^{(0)}\|_A^2. \end{align*} By the [CG Minimizes Error over Polynomials](/theorems/1402): \begin{align*} \|e^{(k)}\|_A \leq \min_{\substack{P_k : \deg P_k \leq k \\ P_k(0) = 1}} \max_{\lambda \in [l,L]} |P_k(\lambda)| \cdot \|e^{(0)}\|_A. \end{align*} [guided] The eigenbasis expansion converts the matrix polynomial $P_k(A)$ into scalar evaluations $P_k(\lambda_i)$. Since $Au_i = \lambda_i u_i$, any polynomial in $A$ acts on $u_i$ by evaluating that polynomial at $\lambda_i$: \begin{align*} P_k(A)u_i = P_k(\lambda_i)u_i. \end{align*} The $A$-norm computation uses $\|v\|_A^2 = \langle v, Av \rangle$. When $v = \sum_i c_i u_i$, orthonormality gives $\langle v, Av \rangle = \sum_i \lambda_i |c_i|^2$. With $c_i = P_k(\lambda_i)\hat{e}_i$: \begin{align*} \|P_k(A)e^{(0)}\|_A^2 = \sum_{i=1}^{n} \lambda_i |P_k(\lambda_i)|^2 |\hat{e}_i|^2 \leq \left(\max_i |P_k(\lambda_i)|^2\right) \sum_i \lambda_i |\hat{e}_i|^2. \end{align*} The inequality $\max_i |P_k(\lambda_i)| \leq \max_{\lambda \in [l,L]} |P_k(\lambda)|$ holds because all eigenvalues lie in $[l, L]$. The problem has now been reduced to a scalar approximation theory question: find the degree-$k$ polynomial that is smallest on $[l, L]$ subject to being $1$ at the origin. [/guided] [/step] [step:Construct the optimal polynomial from the shifted Chebyshev polynomial on $[l, L]$] The Chebyshev polynomial $T_k$ satisfies $|T_k(x)| \leq 1$ for $x \in [-1, 1]$ and is the polynomial of degree $k$ with leading coefficient $2^{k-1}$ that deviates least from zero on $[-1, 1]$. To transfer this to $[l, L]$, define the affine map $\varphi: [l, L] \to [-1, 1]$ by: \begin{align*} \varphi(x) = \frac{2x - (L + l)}{L - l} = \frac{2(L - x)}{L - l} - 1, \end{align*} so $\varphi(l) = -1$ and $\varphi(L) = 1$. Note that $0 \notin [l, L]$ since $l > 0$, and $\varphi(0) = -(L + l)/(L - l) < -1$. Define the shifted polynomial: \begin{align*} P_k(x) := \frac{T_k(\varphi(x))}{T_k(\varphi(0))}. \end{align*} This satisfies $P_k(0) = 1$ (by construction) and $\deg P_k = k$ (since $\varphi$ is affine and $\deg T_k = k$). For $x \in [l, L]$, $\varphi(x) \in [-1, 1]$, so $|T_k(\varphi(x))| \leq 1$, giving: \begin{align*} \max_{x \in [l, L]} |P_k(x)| = \frac{1}{|T_k(\varphi(0))|} = \frac{1}{\left|T_k\!\left(\frac{L + l}{L - l}\right)\right|}. \end{align*} [/step] [step:Evaluate $T_k\!\left(\frac{L+l}{L-l}\right)$ using the Chebyshev lower bound] Set $\sigma := \frac{L + l}{L - l}$. Since $L > l > 0$, we have $\sigma > 1$. For $t > 1$, the Chebyshev polynomial satisfies $T_k(t) = \cosh(k \operatorname{arccosh}(t))$, and the identity $\cosh(\theta) \geq \frac{1}{2}e^{\theta}$ for $\theta \geq 0$ gives: \begin{align*} T_k(\sigma) \geq \frac{1}{2}\left(\sigma + \sqrt{\sigma^2 - 1}\right)^k. \end{align*} We compute $\sigma + \sqrt{\sigma^2 - 1}$. Setting $\sigma = (L + l)/(L - l)$: \begin{align*} \sigma^2 - 1 = \frac{(L+l)^2 - (L-l)^2}{(L-l)^2} = \frac{4Ll}{(L-l)^2}, \end{align*} so $\sqrt{\sigma^2 - 1} = \frac{2\sqrt{Ll}}{L - l} = \frac{2\sqrt{L}\sqrt{l}}{L - l}$. Therefore: \begin{align*} \sigma + \sqrt{\sigma^2 - 1} = \frac{L + l + 2\sqrt{Ll}}{L - l} = \frac{(\sqrt{L} + \sqrt{l})^2}{(\sqrt{L} - \sqrt{l})(\sqrt{L} + \sqrt{l})} = \frac{\sqrt{L} + \sqrt{l}}{\sqrt{L} - \sqrt{l}}. \end{align*} Hence: \begin{align*} T_k(\sigma) \geq \frac{1}{2}\left(\frac{\sqrt{L} + \sqrt{l}}{\sqrt{L} - \sqrt{l}}\right)^k. \end{align*} [guided] The identity $\cosh(k\theta) = T_k(\cosh\theta)$ is the defining relation for Chebyshev polynomials outside $[-1, 1]$: while $T_k(\cos\theta) = \cos(k\theta)$ for the oscillatory regime, the hyperbolic version $T_k(\cosh\theta) = \cosh(k\theta)$ governs the exponential growth regime. Since $\sigma > 1$, we are in the latter regime, and $T_k(\sigma)$ grows exponentially in $k$. The lower bound $\cosh(\theta) \geq \frac{1}{2}e^\theta$ follows from $\cosh(\theta) = \frac{e^\theta + e^{-\theta}}{2} \geq \frac{e^\theta}{2}$. This gives $T_k(\sigma) = \cosh(k\operatorname{arccosh}(\sigma)) \geq \frac{1}{2}e^{k\operatorname{arccosh}(\sigma)} = \frac{1}{2}(\sigma + \sqrt{\sigma^2 - 1})^k$, using $e^{\operatorname{arccosh}(t)} = t + \sqrt{t^2 - 1}$. The algebraic simplification of $\sigma + \sqrt{\sigma^2 - 1}$ uses the perfect square factorisation $(L+l)^2 - (L-l)^2 = 4Ll$ (difference of squares) and $L + l + 2\sqrt{Ll} = (\sqrt{L} + \sqrt{l})^2$. The denominator factors as $L - l = (\sqrt{L} - \sqrt{l})(\sqrt{L} + \sqrt{l})$, yielding the clean ratio $(\sqrt{L} + \sqrt{l})/(\sqrt{L} - \sqrt{l})$. [/guided] [/step] [step:Combine the bounds to obtain the Chebyshev convergence estimate] From the first step: \begin{align*} \|e^{(k)}\|_A \leq \max_{x \in [l,L]} |P_k(x)| \cdot \|e^{(0)}\|_A = \frac{1}{T_k(\sigma)} \|e^{(0)}\|_A. \end{align*} Substituting the lower bound on $T_k(\sigma)$: \begin{align*} \frac{1}{T_k(\sigma)} \leq \frac{2}{\left(\frac{\sqrt{L} + \sqrt{l}}{\sqrt{L} - \sqrt{l}}\right)^k} = 2\left(\frac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}}\right)^k. \end{align*} Therefore: \begin{align*} \|e^{(k)}\|_A \leq 2\left(\frac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}}\right)^k \|e^{(0)}\|_A. \end{align*} The convergence rate $\rho = \frac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}$ where $\kappa = L/l = \kappa(A)$ is the spectral condition number. Since $\kappa > 1$, we have $0 < \rho < 1$, and the error decreases geometrically. The factor $2$ accounts for the Chebyshev polynomial not being exactly $\frac{1}{2}(\sigma + \sqrt{\sigma^2 - 1})^k$ but only bounded below by this quantity. [guided] The convergence rate $\rho = (\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ reveals why CG is dramatically faster than steepest descent for ill-conditioned systems. Steepest descent has convergence rate $(\kappa - 1)/(\kappa + 1)$, which approaches $1$ linearly as $\kappa \to \infty$. CG has convergence rate $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$, which approaches $1$ much more slowly. For example, if $\kappa = 100$: steepest descent has rate $99/101 \approx 0.98$ (needing about $115$ iterations to reduce the error by a factor of $10$), while CG has rate $9/11 \approx 0.82$ (needing about $13$ iterations for the same reduction). The square root in CG's rate comes from the Chebyshev polynomial's ability to exploit the gap between $0$ and $[l, L]$ — the polynomial must equal $1$ at $t = 0$ but can oscillate on $[l, L]$, and the "leverage" from the gap at the origin is what produces the $\sqrt{\kappa}$ improvement. The factor of $2$ in the bound is an artefact of the lower bound $\cosh(\theta) \geq \frac{1}{2}e^\theta$. In practice, CG often converges faster than this bound suggests, especially when the eigenvalues are clustered (the bound is sharp only when the eigenvalues are spread uniformly across $[l, L]$). [/guided] [/step]

Explore Further

Algebraic Closure and Differentiation of Fourier Expansions Numerical Analysis Steepest Descent Update with Exact Line Search Numerical Analysis Convergence of Inverse Iteration Numerical Analysis Gradient Equals Negative Residual Numerical Analysis Lax Equivalence Theorem Numerical Analysis QR Iteration Preserves Tridiagonal Structure Numerical Analysis Symmetry and Negative Definiteness of $A$ Numerical Analysis Convergence of Simultaneous Iteration Numerical Analysis

What brings you to Androma?

Start with a route through the knowledge graph.

Chebyshev Convergence Bound for CG (Theorem # 1403)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Chebyshev Convergence Bound for CG (Theorem # 1403)

Discussion

Proof

Explore Further