[proofplan]
We show that after $k$ CG iterations, $\|e^{(k)}\|_A = \min_{P_k} \|P_k(A)e^{(0)}\|_A$ where the minimum is over polynomials $P_k$ of degree $\leq k$ with $P_k(0) = 1$. The proof has two parts. First, we show that $e^{(k)}$ minimizes the $A$-norm of $e^{(0)} - v$ over $v \in K_k(A, r^{(0)})$, using the $A$-orthogonality property from the conjugate direction theory. Second, we identify $K_k(A, r^{(0)})$ with $\{(I - P_k(A))e^{(0)} : P_k(0) = 1, \deg P_k \leq k\}$ via the relation $r^{(0)} = Ae^{(0)}$, establishing the polynomial characterization.
[/proofplan]
[step:Show $e^{(k)}$ is the $A$-orthogonal projection of $e^{(0)}$ away from $K_k(A, r^{(0)})$]
From the [Error Projection Along Search Direction](/theorems/1398), the error update satisfies $e^{(j+1)} = e^{(j)} - \alpha_j d^{(j)}$ for each $j = 0, \ldots, k-1$. Summing these telescopically:
\begin{align*}
e^{(0)} - e^{(k)} = \sum_{j=0}^{k-1} \alpha_j d^{(j)} \in \operatorname{span}\{d^{(0)}, \ldots, d^{(k-1)}\}.
\end{align*}
By the [Properties of the Conjugate Gradient Method](/theorems/1400), $\operatorname{span}\{d^{(0)}, \ldots, d^{(k-1)}\} = K_k(A, r^{(0)})$. Moreover, by the [Finite Termination of Conjugate Direction Methods](/theorems/1399), $\langle e^{(k)}, d^{(j)} \rangle_A = 0$ for all $j = 0, \ldots, k-1$, which means $e^{(k)}$ is $A$-orthogonal to $K_k(A, r^{(0)})$.
This is exactly the characterisation of the $A$-orthogonal projection: $e^{(0)} - e^{(k)} \in K_k(A, r^{(0)})$ and $e^{(k)} \perp_A K_k(A, r^{(0)})$. By the Pythagorean theorem in the $A$-inner product, for any $v \in K_k(A, r^{(0)})$:
\begin{align*}
\|e^{(0)} - v\|_A^2 = \|e^{(k)}\|_A^2 + \|(e^{(0)} - e^{(k)}) - v\|_A^2 \geq \|e^{(k)}\|_A^2,
\end{align*}
with equality iff $v = e^{(0)} - e^{(k)}$. Therefore:
\begin{align*}
\|e^{(k)}\|_A = \min_{v \in K_k(A, r^{(0)})} \|e^{(0)} - v\|_A.
\end{align*}
[guided]
The Pythagorean theorem applies because $e^{(k)}$ and $(e^{(0)} - e^{(k)}) - v$ are $A$-orthogonal: since $(e^{(0)} - e^{(k)}) - v \in K_k(A, r^{(0)})$ (as a difference of two elements of $K_k$), and $e^{(k)} \perp_A K_k(A, r^{(0)})$.
This shows that CG solves a best approximation problem: among all corrections $v$ in the Krylov subspace, CG picks the one that minimizes the $A$-norm of the resulting error. The $A$-orthogonality condition is the necessary and sufficient condition for this optimality.
[/guided]
[/step]
[step:Identify $K_k(A, r^{(0)})$ with the set $\{(I - P_k(A))e^{(0)} : P_k(0) = 1, \deg P_k \leq k\}$]
Since $r^{(0)} = Ae^{(0)}$:
\begin{align*}
K_k(A, r^{(0)}) = \operatorname{span}\{Ae^{(0)}, A^2e^{(0)}, \ldots, A^ke^{(0)}\}.
\end{align*}
Any $v \in K_k(A, r^{(0)})$ has the form $v = \sum_{i=1}^{k} c_i A^i e^{(0)}$ for some coefficients $c_1, \ldots, c_k \in \mathbb{R}$. Therefore:
\begin{align*}
e^{(0)} - v = e^{(0)} - \sum_{i=1}^{k} c_i A^i e^{(0)} = \left(I - \sum_{i=1}^{k} c_i A^i\right)e^{(0)} = P_k(A)e^{(0)},
\end{align*}
where $P_k(t) := 1 - \sum_{i=1}^{k} c_i t^i$ is a polynomial of degree at most $k$ satisfying $P_k(0) = 1$.
Conversely, every polynomial $P_k$ of degree $\leq k$ with $P_k(0) = 1$ can be written as $P_k(t) = 1 - \sum_{i=1}^{k} c_i t^i$ for some coefficients $c_i$, so $P_k(A)e^{(0)} = e^{(0)} - v$ with $v = \sum_{i=1}^{k} c_i A^i e^{(0)} \in K_k(A, r^{(0)})$.
This establishes a bijection between elements $v \in K_k(A, r^{(0)})$ and polynomials $P_k$ of degree $\leq k$ with $P_k(0) = 1$. Combining with the previous step:
\begin{align*}
\|e^{(k)}\|_A = \min_{v \in K_k(A, r^{(0)})} \|e^{(0)} - v\|_A = \min_{\substack{P_k : \deg P_k \leq k \\ P_k(0) = 1}} \|P_k(A)e^{(0)}\|_A.
\end{align*}
[guided]
The constraint $P_k(0) = 1$ comes from the fact that the identity term in $I - \sum c_i A^i$ contributes $1$ at $t = 0$, and the higher-order terms vanish. Geometrically, $P_k(0) = 1$ means the polynomial has no effect on the null space of $A$ — but since $A$ is positive definite, $0$ is not an eigenvalue, so this constraint is really about normalisation.
The polynomial minimization framework is powerful because it decouples the problem from the initial error $e^{(0)}$: we seek a single polynomial $P_k$ that is small on the spectrum of $A$, subject to $P_k(0) = 1$. This is a classical problem in approximation theory, and the Chebyshev polynomials provide the optimal solution (as used in the [Chebyshev Convergence Bound for CG](/theorems/1403)).
The bijection between $v \in K_k$ and polynomials $P_k$ with $P_k(0) = 1$ is central: $v = (I - P_k(A))e^{(0)}$ represents the correction applied after $k$ steps, and $P_k(A)e^{(0)} = e^{(k)}$ is the remaining error. CG automatically finds the polynomial $P_k$ that minimizes $\|P_k(A)e^{(0)}\|_A$.
[/guided]
[/step]