[course:Part IB Numerical Analysis, Cambridge, Michaelmas Term]
Exact solutions to mathematical problems — inverting matrices, evaluating [integrals](/page/Integral), solving differential equations — are rarely available in practice. When the matrix is $10{,}000 \times 10{,}000$, the integral has no closed form, or the ODE is nonlinear, the only option is to compute an *approximate* answer and control the error. [Numerical analysis](/page/Numerical%20Analysis) is the systematic study of such approximations: how to construct them, how accurate they are, when they converge, and when they fail.
The course divides into three parts. Chapters 1–3 address **approximation theory**: how to represent complicated [functions](/page/Function) by polynomials (interpolation, least-squares fitting), how to approximate integrals using finitely many function evaluations (quadrature), and how to bound the resulting errors (the Peano kernel theorem). Chapters 4–6 treat the **numerical solution of ordinary differential equations**: one-step methods (Euler, $\theta$-methods), multi-step methods (Adams, BDF), Runge-Kutta methods, and the interplay between accuracy and stability. Chapters 7–8 cover **numerical linear algebra**: LU and Cholesky factorizations for solving linear systems, and QR factorization for the least-squares problem.
A recurring theme is the tension between *accuracy* (how well does the approximation match the exact answer?) and *efficiency* (how much computational work is required?). The best numerical methods resolve this tension by exploiting mathematical structure — orthogonality, symmetry, sparsity — to achieve high accuracy at low cost.
# Polynomial Interpolation
Polynomials are the simplest class of functions that can be evaluated, differentiated, and integrated exactly using finitely many arithmetic operations. Writing down a polynomial of degree $n$ requires only $n + 1$ numbers (its coefficients), and evaluating it at a point costs $O(n)$ operations via Horner's scheme. The goal of this chapter is to approximate a complicated function $f$ by a polynomial that agrees with $f$ at prescribed points, and to understand the error of this approximation.
**Notation.** We write $P_n[x]$ for the real vector space of polynomials of degree $\le n$. Note $\dim P_n[x] = n + 1$.
## The Interpolation Problem and the Lagrange Formula
Given $n + 1$ distinct points $x_0, \dots, x_n \in \mathbb{R}$ and values $f_0, \dots, f_n \in \mathbb{R}$, the **interpolation problem** asks for a polynomial $p \in P_n[x]$ satisfying $p(x_i) = f_i$ for $i = 0, \dots, n$. Writing $p(x) = a_n x^n + \cdots + a_0$ and imposing the $n + 1$ conditions gives a system of $n + 1$ equations in $n + 1$ unknowns. From linear algebra, such a system need not have a solution in general — but the special structure of polynomial evaluation (the Vandermonde structure) guarantees both existence and uniqueness.
The construction uses the **Lagrange cardinal polynomials** $\ell_k(x) := \prod_{i \ne k} \frac{x - x_i}{x_k - x_i}$, which satisfy $\ell_k(x_j) = \delta_{jk}$. Each $\ell_k$ has degree exactly $n$, equals $1$ at $x_k$, and vanishes at every other interpolation point.
[quotetheorem:473]
The existence proof is immediate: $p(x) = \sum_{k=0}^n f_k \ell_k(x) \in P_n[x]$ satisfies $p(x_j) = \sum_k f_k \delta_{jk} = f_j$. For uniqueness: if $p, q \in P_n[x]$ both interpolate, then $r = p - q \in P_n[x]$ has $n + 1$ zeros $x_0, \dots, x_n$, but a nonzero polynomial of degree $\le n$ has at most $n$ zeros. So $r = 0$.
The Lagrange formula is elegant but impractical: adding a new interpolation point $x_{n+1}$ requires recomputing every cardinal polynomial from scratch. Newton's formula solves this by building the interpolant incrementally.
## Newton's Formula and Divided Differences
The idea is to write the interpolant as a telescoping sum. For $k = 0, \dots, n$, let $p_k \in P_k[x]$ be the unique polynomial interpolating the first $k + 1$ points. Then $p_n(x) = p_0(x) + \sum_{k=1}^n (p_k(x) - p_{k-1}(x))$. Since $p_k$ and $p_{k-1}$ agree at $x_0, \dots, x_{k-1}$, the difference $p_k - p_{k-1}$ vanishes at these points, so $p_k(x) - p_{k-1}(x) = A_k \prod_{i=0}^{k-1}(x - x_i)$ for some coefficient $A_k$.
[definition:Divided Difference]
The **divided difference** $f[x_j, \dots, x_k]$ is the leading coefficient of the unique polynomial in $P_{k-j}[x]$ that interpolates $f$ at $x_j, \dots, x_k$. In particular, $f[x_i] = f(x_i)$.
[/definition]
[quotetheorem:474]
The coefficients $A_k = f[x_0, \dots, x_k]$ appear along the top diagonal of the **divided difference table**, which is built column by column using the recurrence relation.
[quotetheorem:907]
The proof constructs the interpolant at $\{x_j, \dots, x_k\}$ as a convex combination of interpolants at $\{x_j, \dots, x_{k-1}\}$ and $\{x_{j+1}, \dots, x_k\}$: $q_2(x) = \frac{x - x_j}{x_k - x_j} q_1(x) + \frac{x_k - x}{x_k - x_j} q_0(x)$. This correctly interpolates all the data (check directly at $x_j$, $x_k$, and the interior points), so by uniqueness it equals $q_2$. Comparing leading coefficients gives the recurrence.
[citeproof:907]
[example:Divided Difference Table]
Consider $f(x) = e^x$ at points $x_0 = 0, x_1 = 1, x_2 = 2$. The divided difference table is:
$f[x_0] = e^0 = 1$, $f[x_1] = e^1 = e$, $f[x_2] = e^2$.
First-order differences: $f[x_0, x_1] = \frac{e - 1}{1 - 0} = e - 1 \approx 1.718$, $f[x_1, x_2] = \frac{e^2 - e}{2 - 1} = e^2 - e \approx 4.671$.
Second-order difference: $f[x_0, x_1, x_2] = \frac{(e^2 - e) - (e - 1)}{2 - 0} = \frac{e^2 - 2e + 1}{2} = \frac{(e-1)^2}{2} \approx 1.476$.
So the Newton interpolant is $p_2(x) = 1 + (e-1)x + \frac{(e-1)^2}{2} x(x-1)$.
To evaluate at $x = 0.5$ using Horner's scheme: $S \leftarrow \frac{(e-1)^2}{2} \approx 1.476$; $S \leftarrow (0.5 - 1) \cdot 1.476 + (e-1) = -0.738 + 1.718 = 0.980$; $S \leftarrow (0.5 - 0) \cdot 0.980 + 1 = 1.490$. The true value is $e^{0.5} \approx 1.649$, so the error is about $0.16$ — not bad for 3 points, but we can do better with a smarter choice of interpolation points.
[/example]
The crucial link between divided differences and derivatives, which is the engine behind all error bounds, is the following mean-value-type result.
[quotetheorem:908]
The proof applies Rolle's theorem repeatedly: $e = f - p_n$ vanishes at $n + 1$ points, so $e'$ has $\ge n$ zeros, $e''$ has $\ge n - 1$ zeros, and after $n$ differentiations, $e^{(n)}$ has at least one zero $\xi$. Since $p_n^{(n)} = n! f[x_0, \dots, x_n]$ (a constant), we get $f^{(n)}(\xi) = n! f[x_0, \dots, x_n]$.
[citeproof:908]
## Error Bounds and Chebyshev Polynomials
How well does the interpolant $p_n$ approximate $f$ between the interpolation points? The error formula shows the answer depends on two factors: the smoothness of $f$ (via $f^{(n+1)}$) and the placement of the interpolation points (via the nodal polynomial $\omega(x) = \prod_{i=0}^n (x - x_i)$).
[quotetheorem:475]
The proof treats $\bar{x}$ as a new interpolation point $x_{n+1}$. The Newton formula gives $p_{n+1}(x) - p_n(x) = f[x_0, \dots, x_n, \bar{x}] \omega(x)$. Evaluating at $\bar{x}$ (where $p_{n+1}(\bar{x}) = f(\bar{x})$) gives $f(\bar{x}) - p_n(\bar{x}) = f[x_0, \dots, x_n, \bar{x}] \omega(\bar{x})$. The divided-difference-derivative theorem then converts the divided difference into $\frac{1}{(n+1)!} f^{(n+1)}(\xi)$.
[citeproof:475]
The practical bound is $|f(x) - p_n(x)| \le \frac{1}{(n+1)!} \|f^{(n+1)}\|_\infty |\omega(x)|$. For a fixed function $f$, the only term we can control is $\|\omega\|_\infty$, which depends on our choice of interpolation points. This leads to a minimax problem: find the monic polynomial of degree $n + 1$ with smallest $\|\cdot\|_\infty$ on $[-1, 1]$.
The answer involves the **Chebyshev polynomials** $T_n(x) = \cos(n \cos^{-1} x)$, which satisfy the recurrence $T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)$ with $T_0 = 1$, $T_1 = x$. The key properties of $T_n$ on $[-1, 1]$ are: (i) $|T_n(x)| \le 1$ everywhere, with the maximum $|T_n(X_k)| = 1$ attained at $n + 1$ points $X_k = \cos(k\pi/n)$ with alternating sign $T_n(X_k) = (-1)^k$; (ii) $T_n$ has $n$ distinct zeros at $x_k = \cos\!\left(\frac{2k-1}{2n}\pi\right)$ for $k = 1, \dots, n$; (iii) the leading coefficient of $T_n$ is $2^{n-1}$ for $n \ge 1$.
[quotetheorem:476]
The proof is by contradiction: if some monic $q_n$ had $\|q_n\|_\infty < 2^{-(n-1)}$, then $r = 2^{-(n-1)} T_n - q_n \in P_{n-1}[x]$ (the leading terms cancel) would alternate in sign at the $n + 1$ extrema of $T_n$ (since $|T_n|$ dominates $|q_n|$ there), forcing $r$ to have $\ge n$ zeros — contradicting $\deg r \le n - 1$.
[citeproof:476]
Applying this to the nodal polynomial $\omega = \prod_{i=0}^n (x - x_i) \in P_{n+1}[x]$ (monic of degree $n + 1$): the optimal interpolation points on $[-1, 1]$ are the zeros of $T_{n+1}$, namely $x_k = \cos\!\left(\frac{2k+1}{2(n+1)}\pi\right)$ for $k = 0, \dots, n$, which give $\|\omega\|_\infty = 2^{-n}$.
[example:Chebyshev vs Equispaced Interpolation]
Consider interpolating $f(x) = (1 + 25x^2)^{-1}$ (Runge's function) on $[-1, 1]$ with $n + 1 = 11$ points.
**Equispaced points** $x_k = -1 + 2k/10$: the interpolant oscillates wildly near $x = \pm 1$ and the maximum error is $\approx 1.9$ — worse than not interpolating at all. This is the famous **Runge phenomenon**: equispaced interpolation can diverge as $n \to \infty$ for perfectly smooth functions.
**Chebyshev points** $x_k = \cos\!\left(\frac{2k+1}{22}\pi\right)$: the maximum error drops to $\approx 0.11$. The Chebyshev points cluster near the endpoints, where equispaced interpolation suffers, and spread out in the middle.
The error bound gives $\|f - p_{10}\|_\infty \le \frac{1}{2^{10} \cdot 11!} \|f^{(11)}\|_\infty$. The exponential and factorial in the denominator ensure that for analytic $f$ with controlled derivatives, the error tends to zero as $n \to \infty$ — but only with the Chebyshev choice of points.
[/example]
# Orthogonal Polynomials and Quadrature
The Chebyshev polynomials are one instance of a general class: **orthogonal polynomials** with respect to a weighted inner product. This chapter develops the theory of orthogonal polynomials, uses them for least-squares approximation (which minimizes total error rather than matching $f$ at isolated points), and then for Gaussian quadrature (which approximates integrals using optimally placed evaluation points).
## Inner Products on Polynomial Spaces
The inner products relevant to numerical analysis are weighted integrals and discrete sums.
[definition:Weighted $L^2$ Inner Product]
Let $w: (a, b) \to (0, \infty)$ be a weight function with $\int_a^b w(x) x^n \, dx < \infty$ for all $n \ge 0$. The **weighted inner product** on the space of polynomials is $\langle f, g \rangle = \int_a^b w(x) f(x) g(x) \, dx$.
[/definition]
Important examples include: Legendre ($[a,b] = [-1,1]$, $w \equiv 1$), Chebyshev ($[-1,1]$, $w = (1-x^2)^{-1/2}$), Laguerre ($[0, \infty)$, $w = e^{-x}$), and Hermite ($(- \infty, \infty)$, $w = e^{-x^2}$). The interval may be infinite, provided the weight decays fast enough for $\int w(x) x^n \, dx$ to converge for all $n$.
A **discrete inner product** $\langle f, g \rangle = \sum_{j=1}^m w_j f(\xi_j) g(\xi_j)$ with distinct nodes $\xi_j$ and positive weights $w_j > 0$ is also valid, but only on $P_{m-1}[x]$: a polynomial $p \in P_m[x]$ with $p(\xi_j) = 0$ for all $j$ would satisfy $\langle p, p \rangle = 0$ despite $p \ne 0$.
An essential property of the weighted integral inner product is $\langle xf, g \rangle = \langle f, xg \rangle$ (since $w(x) \cdot xf(x) \cdot g(x) = w(x) \cdot f(x) \cdot xg(x)$). This symmetry is what makes the three-term recurrence work.
## Orthogonal Polynomials and Three-Term Recurrence
In finite-dimensional linear algebra, orthogonal bases simplify everything: projections become dot products, least-squares reduces to component-by-component minimisation, and there are no cross-terms in norm computations. The same principle applies to polynomial spaces. Given an inner product on polynomials, we want a basis $\{p_0, p_1, \dots\}$ where each $p_k$ has degree exactly $k$ and is orthogonal to all lower-degree polynomials. The question is whether such a basis exists (yes, by Gram-Schmidt) and whether it is unique (yes, if we fix the leading coefficient).
[quotetheorem:477]
The proof is by induction using the Gram-Schmidt process. Given $\{p_0, \dots, p_n\}$, define $p_{n+1} = x^{n+1} - \sum_{k=0}^n \frac{\langle x^{n+1}, p_k \rangle}{\langle p_k, p_k \rangle} p_k$, which is monic and orthogonal to $P_n[x]$. Uniqueness follows from the fact that if $\hat{p}_{n+1}$ were another monic orthogonal polynomial, then $p_{n+1} - \hat{p}_{n+1} \in P_n[x]$ would be orthogonal to itself (hence zero).
[citeproof:477]
Computing the Gram-Schmidt sum naively requires inner products against all previous $p_k$. A much more efficient approach exploits the symmetry $\langle xf, g \rangle = \langle f, xg \rangle$ to reduce the sum to just two terms.
[quotetheorem:478]
The proof starts with $q_{n+1} = xp_n$ and applies Gram-Schmidt: $p_{n+1} = xp_n - \sum_{k=0}^n \frac{\langle xp_n, p_k \rangle}{\langle p_k, p_k \rangle} p_k$. The symmetry gives $\langle xp_n, p_k \rangle = \langle p_n, xp_k \rangle$, which vanishes whenever $\deg(xp_k) = k + 1 < n$, i.e. $k \le n - 2$. So only the $k = n$ and $k = n - 1$ terms survive. For $\beta_k$: $\langle p_n, xp_{n-1} \rangle = \langle p_n, p_n + \text{lower} \rangle = \langle p_n, p_n \rangle$ since $xp_{n-1}$ is monic of degree $n$.
[citeproof:478]
[example:Legendre Polynomials]
For $w \equiv 1$ on $[-1, 1]$: $p_0 = 1$, $\alpha_0 = \frac{\langle x, 1 \rangle}{\langle 1, 1 \rangle} = \frac{\int_{-1}^1 x \, dx}{\int_{-1}^1 1 \, dx} = 0$ (by symmetry), so $p_1 = x$. Next: $\alpha_1 = \frac{\langle x^2, x \rangle}{\langle x, x \rangle} = 0$ (odd integrand), and $\beta_1 = \frac{\langle x, x \rangle}{\langle 1, 1 \rangle} = \frac{2/3}{2} = \frac{1}{3}$, giving $p_2 = x^2 - \frac{1}{3}$. Continuing: $\alpha_2 = 0$, $\beta_2 = \frac{\langle p_2, p_2 \rangle}{\langle p_1, p_1 \rangle} = \frac{8/45}{2/3} = \frac{4}{15}$, so $p_3 = x^3 - \frac{3}{5}x$. These are (up to scaling) the Legendre polynomials. All $\alpha_k = 0$ by the symmetry $w(-x) = w(x)$ and the parity of the polynomials.
[/example]
## Least-Squares Polynomial Approximation
Interpolation forces $p$ to agree with $f$ at specific points but says nothing about the error elsewhere. A global alternative is to minimize the total weighted error $\|f - p\|^2 = \langle f - p, f - p \rangle$ over all $p \in P_n[x]$.
[quotetheorem:479]
The proof expands $\|f - p\|^2 = \|f\|^2 - 2\sum_k c_k \langle f, p_k \rangle + \sum_k c_k^2 \|p_k\|^2$ (no cross-terms since $\langle p_j, p_k \rangle = 0$ for $j \ne k$). Differentiating with respect to each $c_k$ and setting to zero gives $c_k = \langle f, p_k \rangle / \|p_k\|^2$. The Hessian is $2 \operatorname{diag}(\|p_0\|^2, \dots, \|p_n\|^2)$, which is positive definite, confirming the minimum.
A key property: the residual $f - p$ is orthogonal to $P_n[x]$, i.e. $\langle f - p, q \rangle = 0$ for all $q \in P_n[x]$. Taking $q = p$ gives $\langle f, p \rangle = \|p\|^2$, and expanding $\|f - p\|^2$ yields the Pythagorean identity $\|f - p\|^2 + \|p\|^2 = \|f\|^2$.
[citeproof:479]
The error formula $\|f - p\|^2 = \|f\|^2 - \sum_{k=0}^n \frac{\langle f, p_k \rangle^2}{\|p_k\|^2}$ shows that adding more terms can only decrease the error: each new $c_{n+1} p_{n+1}$ subtracts a positive quantity. As $n \to \infty$, the [Weierstrass Approximation Theorem](/theorems/480) guarantees $\|f - p_n\|^2 \to 0$ for [continuous](/page/Continuity) $f$ on a finite interval (since Weierstrass gives $\|f - q\|_\infty < \varepsilon$ for some polynomial $q$ of sufficiently high degree, and $\|f - p_n\|^2 \le \|f - q\|^2 \le \varepsilon^2 \int_a^b w$). Since the error tends to zero, the sum of subtracted squares must exhaust $\|f\|^2$ entirely — this is the analogue of Parseval's identity from Fourier analysis, but for arbitrary orthogonal polynomial expansions.
[quotetheorem:481]
[example:Least-Squares Approximation of $e^x$ on $[-1, 1]$]
Approximate $f(x) = e^x$ by a polynomial of degree 1 using the Legendre inner product $\langle f, g \rangle = \int_{-1}^1 f(x)g(x) \, dx$ (weight $w \equiv 1$). The orthogonal polynomials are $p_0 = 1$ and $p_1 = x$.
The coefficients are:
\begin{align*}
c_0 &= \frac{\langle e^x, 1 \rangle}{\langle 1, 1 \rangle} = \frac{\int_{-1}^1 e^x \, dx}{\int_{-1}^1 1 \, dx} = \frac{e - e^{-1}}{2} = \frac{2\sinh(1)}{2} = \sinh(1) \approx 1.1752, \\
c_1 &= \frac{\langle e^x, x \rangle}{\langle x, x \rangle} = \frac{\int_{-1}^1 xe^x \, dx}{\int_{-1}^1 x^2 \, dx} = \frac{2e^{-1}}{2/3} = \frac{3}{e} \approx 1.1036.
\end{align*}
(The numerator $\int_{-1}^1 xe^x \, dx$ is computed by parts: $[xe^x]_{-1}^1 - \int_{-1}^1 e^x \, dx = (e + e^{-1}) - (e - e^{-1}) = 2e^{-1}$.)
So the best linear approximation is $p(x) = \sinh(1) + \frac{3}{e}x \approx 1.1752 + 1.1036x$.
The error is $\|e^x - p\|^2 = \|e^x\|^2 - c_0^2 \|p_0\|^2 - c_1^2 \|p_1\|^2 = \int_{-1}^1 e^{2x} \, dx - \sinh^2(1) \cdot 2 - \frac{9}{e^2} \cdot \frac{2}{3} = \sinh(2) - 2\sinh^2(1) - \frac{6}{e^2} \approx 3.627 - 2.762 - 0.812 = 0.053$. So $\|e^x - p\| \approx 0.230$. Adding a quadratic term $c_2 p_2$ would subtract $c_2^2 \|p_2\|^2 > 0$ from this, further reducing the error.
[/example]
## Gaussian Quadrature
The problem of **numerical quadrature** is to approximate $\int_a^b w(x) f(x) \, dx \approx \sum_{k=1}^\nu b_k f(c_k)$ for some weights $b_k$ and nodes $c_k$, chosen independently of $f$. We have $2\nu$ free parameters ($\nu$ weights and $\nu$ nodes), so we might hope for exactness on $P_{2\nu - 1}[x]$ (which has $2\nu$ dimensions). This is indeed achievable — and optimal.
The starting point is **ordinary quadrature**: fix any nodes $c_1, \dots, c_\nu$ and [set](/page/Set) $b_k = \int_a^b w(x) \ell_k(x) \, dx$ where $\ell_k$ are the Lagrange cardinal polynomials. This is exact on $P_{\nu-1}[x]$ by construction (every $f \in P_{\nu-1}[x]$ equals its own interpolant). But no choice of $\nu$ nodes can give exactness on $P_{2\nu}[x]$: taking $f(x) = \prod_{k=1}^\nu (x - c_k)^2 \in P_{2\nu}[x]$ gives $\int_a^b w f > 0$ but $\sum b_k f(c_k) = 0$.
The optimal nodes are the zeros of the $\nu$-th orthogonal polynomial, as the following results show.
[quotetheorem:482]
The proof shows that if $p_\nu$ changed sign at fewer than $\nu$ points $\xi_1, \dots, \xi_m$ in $(a, b)$, then $q(x) = \prod_{i=1}^m (x - \xi_i) \in P_m[x]$ with $m < \nu$ would satisfy $\langle q, p_\nu \rangle = \int_a^b w \cdot q \cdot p_\nu > 0$ (since $qp_\nu$ does not change sign), contradicting orthogonality. So $p_\nu$ must have $\nu$ sign changes, and since it has degree $\nu$ it can have at most $\nu$ zeros — hence exactly $\nu$ distinct real zeros in $(a, b)$.
[citeproof:482]
[quotetheorem:483]
The proof of part (i) uses the division algorithm: any $f \in P_{2\nu-1}[x]$ can be written as $f = qp_\nu + r$ with $q, r \in P_{\nu-1}[x]$. Then $\int_a^b wf = \langle q, p_\nu \rangle + \int_a^b wr = \int_a^b wr$ (by orthogonality), and $\sum b_k f(c_k) = \sum b_k r(c_k)$ (since $p_\nu(c_k) = 0$). Both sides equal $\int_a^b wr$ since the quadrature is exact on $P_{\nu-1}[x]$. Part (ii) follows by testing $f = \ell_k^2 \in P_{2\nu-2}[x]$: exactness gives $b_k = \int_a^b w \ell_k^2 > 0$.
[citeproof:483]
[example:Gaussian Quadrature on $[-1, 1]$ with $w \equiv 1$]
The Legendre polynomials are $p_1(x) = x$, $p_2(x) = x^2 - 1/3$, $p_3(x) = x^3 - 3x/5$.
**$\nu = 1$:** The zero of $p_1$ is $c_1 = 0$, with weight $b_1 = \int_{-1}^1 1 \, dx = 2$. The rule $\int_{-1}^1 f \approx 2f(0)$ is the midpoint rule, exact for $f \in P_1[x]$.
**$\nu = 2$:** The zeros of $p_2$ are $c_{1,2} = \pm 1/\sqrt{3}$. The weights are $b_1 = b_2 = 1$ (by symmetry and the requirement $b_1 + b_2 = \int_{-1}^1 1 = 2$). The rule $\int_{-1}^1 f \approx f(-1/\sqrt{3}) + f(1/\sqrt{3})$ is exact for $f \in P_3[x]$.
**Test:** $\int_{-1}^1 x^4 \, dx = 2/5 = 0.4$. The $\nu = 2$ rule gives $(-1/\sqrt{3})^4 + (1/\sqrt{3})^4 = 2/9 \approx 0.222$, which is not exact (as expected, since $x^4 \in P_4[x]$ and the rule is only exact on $P_3[x]$).
[/example]
# Error of Approximation
Gaussian quadrature and interpolation are exact for polynomials, but we apply them to general functions. When a quadrature formula (or any linear approximation) is exact for polynomials of degree $\le k$ but applied to a function $f \in C^{k+1}[a,b]$, how large is the error? The Peano kernel theorem provides a systematic way to answer this question for any linear functional — not just quadrature, but also numerical differentiation, interpolation error, and finite difference approximations.
## The Peano Kernel Theorem
The idea is to use Taylor's formula with integral remainder. For $f \in C^{k+1}[a,b]$:
\begin{align*}
f(x) = \sum_{j=0}^k \frac{(x-a)^j}{j!} f^{(j)}(a) + \frac{1}{k!} \int_a^b (x - \theta)_+^k f^{(k+1)}(\theta) \, d\theta,
\end{align*}
where $(x - \theta)_+^k = (x - \theta)^k$ if $x \ge \theta$ and $0$ otherwise. The polynomial part is annihilated by any linear functional $L$ that is exact on $P_k[x]$, leaving $L(f) = \frac{1}{k!} \int_a^b K(\theta) f^{(k+1)}(\theta) \, d\theta$ where $K(\theta) = L((x - \theta)_+^k)$ is the **Peano kernel** — a function of $\theta$ alone that encodes all the information about $L$.
[quotetheorem:484]
The Peano kernel is computed by applying $L$ (which acts on $x$, with $\theta$ held fixed) to the truncated power $(x - \theta)_+^k$. The crucial point is that $K$ is independent of $f$: it depends only on the approximation rule $L$.
[citeproof:484]
[quotetheorem:485]
When $K$ does not change sign on $(a, b)$, the [mean value theorem](/theorems/186) for integrals gives $L(f) = \frac{f^{(k+1)}(\xi)}{k!} \int_a^b K(\theta) \, d\theta$ for some $\xi \in (a, b)$, which is a sharper result than the general bound.
[citeproof:485]
[example:Peano Kernel for Simpson's Rule]
Simpson's rule approximates $\int_{-1}^1 f(x) \, dx \approx \frac{1}{3}(f(-1) + 4f(0) + f(1))$. The error functional is $L(f) = \int_{-1}^1 f(x) \, dx - \frac{1}{3}(f(-1) + 4f(0) + f(1))$. Direct computation shows $L$ annihilates $P_3[x]$ (not just $P_2[x]$ — Simpson's rule gains an extra order of exactness by symmetry).
The Peano kernel for the third-order case is $K(\theta) = L_x\!\left(\frac{(x - \theta)_+^3}{6}\right)$. After computing (splitting into cases $-1 \le \theta \le 0$ and $0 \le \theta \le 1$):
\begin{align*}
K(\theta) = \begin{cases} -\frac{1}{3}\theta(1 + \theta)^2 & -1 \le \theta \le 0, \\ -\frac{1}{3}\theta(1 - \theta)^2 & 0 \le \theta \le 1. \end{cases}
\end{align*}
Since $K(\theta) \le 0$ for all $\theta \in [-1, 1]$ (the factor $-\theta/3$ is non-negative on $[-1, 0]$ and the squared terms are always non-negative; on $[0, 1]$, $-\theta/3 \le 0$ and $(1 - \theta)^2 \ge 0$), the kernel does not change sign. The mean value form gives $L(f) = \frac{f'''(\xi)}{6} \int_{-1}^1 K(\theta) \, d\theta$ for some $\xi$. Computing $\int_{-1}^1 K(\theta) \, d\theta = -1/6$ gives the sharp bound $|L(f)| \le \frac{1}{36} \|f'''\|_\infty$.
For $f \in C^4[-1, 1]$, repeating the Peano kernel computation with $k = 3$ gives $|L(f)| \le \frac{1}{90} \|f^{(4)}\|_\infty$, which is the classical Simpson error bound (after rescaling to a general interval $[a, b]$ of length $h$, this becomes $\frac{h^5}{90} \|f^{(4)}\|_\infty$).
[/example]
# Numerical Solution of Ordinary Differential Equations
We now turn from approximation of functions and integrals to the numerical solution of initial value problems $y'(t) = f(t, y(t))$, $y(0) = y_0$, for $t \in [0, T]$ with $f: \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}^N$ Lipschitz. The [Picard-Lindelöf Theorem](/theorems/69) guarantees existence and uniqueness of the solution, and the goal is to construct a [sequence](/page/Sequence) of approximations $y_n \approx y(t_n)$ at discrete times $t_n = nh$ that converge to the true solution as $h \to 0$.
## One-Step Methods
The most natural approach to numerical ODE solving is to march forward in time: given the current approximation $y_n \approx y(t_n)$, compute the next value $y_{n+1} \approx y(t_{n+1})$ using only local information at $t_n$. This is a **one-step method** — the simplest class of numerical integrators, where each step depends only on the immediately preceding value.
[definition:One-Step Method]
An **(explicit) one-step method** computes $y_{n+1} = \Phi_h(t_n, y_n)$ for some function $\Phi_h: \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}^N$.
[/definition]
The simplest example is **Euler's method**: $y_{n+1} = y_n + hf(t_n, y_n)$, which follows the tangent line at $(t_n, y_n)$ for one step of length $h$. The **local truncation error** $\eta_{n+1} = y(t_{n+1}) - y(t_n) - hf(t_n, y(t_n))$ measures the error at a single step assuming the previous value is exact. By Taylor expansion: $\eta_{n+1} = \frac{h^2}{2} y''(\tau)$ for some $\tau$, so $\|\eta_{n+1}\| \le ch^2$ where $c = \frac{1}{2}\|y''\|_\infty$. A method with $\|\eta_{n+1}\| = O(h^{p+1})$ is said to have **order** $p$; Euler has order 1.
[quotetheorem:488]
The proof patches local errors together via a discrete Gronwall argument. Writing $e_{n+1} = e_n + h(f(t_n, y_n) - f(t_n, y(t_n))) - \eta_{n+1}$, the Lipschitz condition gives $\|e_{n+1}\| \le (1 + h\lambda)\|e_n\| + ch^2$. Unrolling the recursion: $\|e_n\| \le ch^2 \sum_{j=0}^{n-1} (1+h\lambda)^j \le \frac{ch}{\lambda}((1+h\lambda)^n - 1) \le \frac{ch}{\lambda}(e^{\lambda T} - 1)$, using $(1 + h\lambda) \le e^{h\lambda}$.
[citeproof:488]
[definition:$\theta$-Method]
For $\theta \in [0, 1]$, the **$\theta$-method** is $y_{n+1} = y_n + h[\theta f(t_n, y_n) + (1 - \theta)f(t_{n+1}, y_{n+1})]$.
[/definition]
Setting $\theta = 1$ gives Euler; $\theta = 0$ gives the **backward Euler** method; $\theta = 1/2$ gives the **trapezoidal rule**. For $\theta \ne 1$, the method is **implicit**: $y_{n+1}$ appears on both sides, requiring a nonlinear solve at each step. Taylor-expanding the local truncation error gives $\eta = (\theta - \frac{1}{2})h^2 y''(t_n) + O(h^3)$, so the trapezoidal rule ($\theta = 1/2$) has order 2 while all other $\theta$-methods have order 1.
[example:Euler's Method Applied to $y' = -y$]
Consider $y' = -y$ with $y(0) = 1$ (exact solution $y(t) = e^{-t}$). Euler's method gives $y_{n+1} = y_n + h(-y_n) = (1 - h)y_n$, so $y_n = (1-h)^n$.
With $h = 0.5$ and $T = 2$ (so 4 steps):
\begin{align*}
y_0 &= 1, \quad y_1 = 0.5, \quad y_2 = 0.25, \quad y_3 = 0.125, \quad y_4 = 0.0625.
\end{align*}
The exact value is $y(2) = e^{-2} \approx 0.1353$. Error: $|0.0625 - 0.1353| = 0.0728$.
With $h = 0.25$ (8 steps): $y_8 = (0.75)^8 \approx 0.1001$. Error: $|0.1001 - 0.1353| = 0.0352$.
The error roughly halved when $h$ halved — consistent with order 1 (error $\sim ch$). The convergence theorem gives $\|e_n\| \le ch(e^{\lambda T} - 1)/\lambda = ch(e^{2} - 1) \approx 6.39ch$, where $c = \frac{1}{2}\|y''\|_\infty = \frac{1}{2}$ and $\lambda = 1$. So $\|e_n\| \le 3.19h$, giving bounds of $1.60$ (for $h = 0.5$) and $0.80$ (for $h = 0.25$). The actual errors are much smaller — the bound is pessimistic because it accounts for worst-case error accumulation over all possible ODEs with the same Lipschitz constant.
Note: if $h > 2$ (i.e. $h > 2/|\lambda|$), then $|1 - h| > 1$ and $y_n$ oscillates with growing amplitude — the method is **unstable**. This is a preview of the stability theory in the next chapter.
[/example]
## Multi-Step Methods
Instead of using only $(t_n, y_n)$ to compute $y_{n+1}$, multi-step methods reuse previously computed values $y_{n-1}, y_{n-2}, \dots$ to achieve higher order at low cost per step.
[definition:Linear Multi-Step Method]
An **$s$-step linear multi-step method** is $\sum_{\ell=0}^s \rho_\ell y_{n+\ell} = h \sum_{\ell=0}^s \sigma_\ell f(t_{n+\ell}, y_{n+\ell})$ with $\rho_s = 1$. It is **explicit** if $\sigma_s = 0$ and **implicit** otherwise.
[/definition]
The method is characterised by its **generating polynomials** $\rho(w) = \sum_\ell \rho_\ell w^\ell$ and $\sigma(w) = \sum_\ell \sigma_\ell w^\ell$.
[quotetheorem:489]
The proof expands $\rho(e^x) = \sum_\ell \rho_\ell e^{\ell x}$ and $x\sigma(e^x) = x \sum_\ell \sigma_\ell e^{\ell x}$ in Taylor [series](/page/Series) about $x = 0$, and matches coefficients. The condition $\rho(1) = 0$ (i.e. $\sum_\ell \rho_\ell = 0$) is necessary for the method to have any order at all.
[citeproof:489]
[example:Two-Step Adams-Bashforth (AB2)]
The method $y_{n+2} = y_{n+1} + \frac{h}{2}(3f_{n+1} - f_n)$ has $\rho(w) = w^2 - w$ and $\sigma(w) = \frac{3}{2}w - \frac{1}{2}$. Check: $\rho(e^x) - x\sigma(e^x) = (e^{2x} - e^x) - x(\frac{3}{2}e^x - \frac{1}{2}) = \frac{5}{12}x^3 + O(x^4)$, so the order is $p = 2$.
[/example]
Important classes of multi-step methods include **Adams-Bashforth** (explicit, $\rho(w) = w^{s-1}(w-1)$, order $s$), **Adams-Moulton** (implicit, same $\rho$, order $s + 1$), and **backward [differentiation](/page/Derivative) formulae** (BDF, $\sigma(w) = \sigma_s w^s$, order $s$, convergent for $s \le 6$).
High order alone does not guarantee convergence for multi-step methods — an additional structural condition is required.
[quotetheorem:490]
The root condition prevents parasitic solutions from growing. If $\rho$ has a root $w_0$ with $|w_0| > 1$, the numerical solution picks up a component proportional to $w_0^n$ that grows exponentially, even if the true solution is bounded. The proof of the Dahlquist theorem is beyond this course (it is treated in Part III).
[citeproof:490]
[example:A Method That Fails the Root Condition]
Consider the 2-step method $y_{n+2} + 4y_{n+1} - 5y_n = h(4f_{n+1} + 2f_n)$. The generating polynomials are $\rho(w) = w^2 + 4w - 5$ and $\sigma(w) = 4w + 2$. Checking the order: $\rho(e^x) - x\sigma(e^x) = (e^{2x} + 4e^x - 5) - x(4e^x + 2) = O(x^2)$ (the constant term is $1 + 4 - 5 = 0$ and the $x$ coefficient is $2 + 4 - 4 - 2 = 0$), so $p \ge 1$.
However, $\rho(w) = w^2 + 4w - 5 = (w + 5)(w - 1)$. The root $w = -5$ has $|w| = 5 > 1$, violating the root condition. Applied to the trivial ODE $y' = 0$ (solution $y(t) \equiv y_0$), the method gives $y_{n+2} = -4y_{n+1} + 5y_n$. The general solution of this recurrence is $y_n = A + B(-5)^n$. Even a tiny rounding error introduces a component $B(-5)^n$ that grows exponentially, drowning out the true (constant) solution. This method has order $\ge 1$ but is not convergent — the Dahlquist theorem correctly identifies it as defective.
[/example]
## Runge-Kutta Methods
Runge-Kutta methods achieve high order within a single-step framework by evaluating $f$ at multiple intermediate "stages" within each time step. They are motivated by Gaussian quadrature: since $y(t_{n+1}) = y(t_n) + \int_{t_n}^{t_{n+1}} f(t, y(t)) \, dt$, we can approximate the integral using quadrature nodes $t_n + c_\ell h$.
[definition:Explicit Runge-Kutta Method]
A **$\nu$-stage explicit Runge-Kutta method** computes:
\begin{align*}
k_1 &= f(t_n, y_n), \\
k_\ell &= f\!\left(t_n + c_\ell h, \, y_n + h \sum_{j=1}^{\ell-1} a_{\ell j} k_j\right), \quad \ell = 2, \dots, \nu, \\
y_{n+1} &= y_n + h \sum_{\ell=1}^\nu b_\ell k_\ell,
\end{align*}
where $\sum_{j=1}^{\ell-1} a_{\ell j} = c_\ell$ for each $\ell$.
[/definition]
[example:The Classical RK4 Method]
The most widely used Runge-Kutta method is the 4-stage, 4th-order method:
\begin{align*}
k_1 &= f(t_n, y_n), \quad k_2 = f(t_n + \tfrac{h}{2}, y_n + \tfrac{h}{2}k_1), \\
k_3 &= f(t_n + \tfrac{h}{2}, y_n + \tfrac{h}{2}k_2), \quad k_4 = f(t_n + h, y_n + hk_3), \\
y_{n+1} &= y_n + \tfrac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4).
\end{align*}
This has order 4: the local truncation error is $O(h^5)$. The proof of order requires matching Taylor expansions of $y(t_{n+1})$ and the numerical formula through the $h^4$ term, which involves an elaborate comparison of "elementary differentials" — the algebraic structure of iterated derivatives of $f(t, y)$. For the test equation $y' = \lambda y$: $k_1 = \lambda y_n$, $k_2 = \lambda(1 + \frac{h\lambda}{2})y_n$, $k_3 = \lambda(1 + \frac{h\lambda}{2} + \frac{(h\lambda)^2}{4})y_n$, $k_4 = \lambda(1 + h\lambda + \frac{(h\lambda)^2}{2} + \frac{(h\lambda)^3}{4})y_n$, and the update gives $y_{n+1} = (1 + h\lambda + \frac{(h\lambda)^2}{2} + \frac{(h\lambda)^3}{6} + \frac{(h\lambda)^4}{24})y_n = \sum_{k=0}^4 \frac{(h\lambda)^k}{k!} y_n$, matching the Taylor expansion of $e^{h\lambda}$ through order 4.
[/example]
# Stability and Implementation
## Linear Stability
A numerical method should reproduce qualitative behaviour: if the true solution decays, the numerical solution should decay too. We test this on the model equation $y' = \lambda y$ with $\operatorname{Re}(\lambda) < 0$ (whose solution $y(t) = e^{\lambda t} \to 0$).
[definition:Linear Stability Domain]
The **linear stability domain** of a numerical method is $D = \{h\lambda \in \mathbb{C} : \lim_{n \to \infty} y_n = 0\}$ when the method is applied to $y' = \lambda y$.
[/definition]
[definition:A-Stability]
A method is **A-stable** if $\mathbb{C}^- := \{z \in \mathbb{C} : \operatorname{Re}(z) < 0\} \subseteq D$.
[/definition]
[example:Stability Domains of Basic Methods]
**Euler's method:** $y_n = (1 + h\lambda)^n$, so $D = \{z \in \mathbb{C} : |1 + z| < 1\}$ — a disk of radius 1 centred at $-1$. This is *not* A-stable: for $\lambda$ real and negative, we need $h < 2/|\lambda|$.
**Backward Euler:** $y_n = (1 - h\lambda)^{-n}$, so $D = \{z : |1 - z| > 1\}$ — the complement of a disk of radius 1 centred at $+1$. This *is* A-stable: $\mathbb{C}^- \subseteq D$.
**Trapezoidal rule:** $y_n = \left(\frac{1 + z/2}{1 - z/2}\right)^n$ where $z = h\lambda$. For $z = it$ (purely imaginary): $\left|\frac{1 + it/2}{1 - it/2}\right| = 1$, so $|y_n| = 1$ — the method is marginally stable on the imaginary axis. For $\operatorname{Re}(z) < 0$: $\left|\frac{1 + z/2}{1 - z/2}\right| < 1$ (the numerator has smaller modulus than the denominator). So $D = \mathbb{C}^-$ and the method is A-stable.
[/example]
A-stability is a very restrictive condition. The [Dahlquist Second Barrier](/theorems/492) states that no A-stable linear multi-step method can have order greater than 2, and among order-2 A-stable methods, the trapezoidal rule has the smallest error constant. No explicit Runge-Kutta method is A-stable either. This motivates the use of implicit methods (backward Euler, trapezoidal, implicit Runge-Kutta) for **stiff** problems — systems where $|\lambda|$ is very large, requiring impractically small $h$ for an explicit method to be stable.
To verify A-stability for more complex methods, the [Maximum Modulus Principle](/theorems/491) is invaluable: if the stability function $r(z)$ is analytic in $\mathbb{C}^-$ and $|r(z)| \le 1$ on the [boundary](/page/Boundary) $\{z : \operatorname{Re}(z) = 0\} \cup \{\infty\}$, then $|r(z)| \le 1$ throughout $\mathbb{C}^-$.
## Local Error Estimation and Implementation
In practice, the step size $h$ is chosen adaptively: small where the solution changes rapidly, large where it is smooth. **Milne's device** estimates the local truncation error by running two methods of the same order and comparing their outputs.
For example, the Adams-Bashforth method (order 2, $\eta \approx \frac{5}{12}h^3 y'''$) and the trapezoidal rule (order 2, $\eta \approx -\frac{1}{12}h^3 y'''$) give different approximations $y_{n+1}^{\text{AB}}$ and $y_{n+1}^{\text{TR}}$. Eliminating $h^3 y'''$ between the two error expansions:
\begin{align*}
y(t_{n+1}) - y_{n+1}^{\text{TR}} \approx \frac{1}{6}(y_{n+1}^{\text{AB}} - y_{n+1}^{\text{TR}}).
\end{align*}
This estimates the trapezoidal rule's error without knowing $y'''$. If the estimated error exceeds a tolerance, $h$ is halved; if it is much smaller, $h$ is doubled.
For implicit methods, each step requires solving a nonlinear equation $y_{n+1} = y_n + hf(t_{n+1}, y_{n+1})$. **Functional iteration** ($y_{n+1}^{(k+1)} = y_n + hf(t_{n+1}, y_{n+1}^{(k)})$) converges if $h\lambda < 1$ by the [contraction mapping theorem](/theorems/71), but this restriction defeats the purpose of implicit methods for stiff problems. **Newton's method** ($y_{n+1}^{(k+1)} = y_{n+1}^{(k)} - (I - hJ)^{-1}(y_{n+1}^{(k)} - y_n - hf(t_{n+1}, y_{n+1}^{(k)}))$, where $J = \nabla f$) converges without step-size restrictions and is the standard approach.
# Numerical Linear Algebra
Solving linear systems $Ax = b$ is the computational backbone of numerical methods — it arises at every step of an implicit ODE method (Newton iteration), in least-squares fitting, and in PDE discretisations. The strategy is to factor $A$ into triangular matrices, which can be solved in $O(n^2)$ operations by forward/backward substitution.
## LU Factorization
A lower triangular system $Lx = b$ is solved by **forward substitution**: $x_i = \frac{1}{L_{ii}}(b_i - \sum_{j<i} L_{ij} x_j)$. An upper triangular system $Ux = b$ is solved by **backward substitution**: $x_i = \frac{1}{U_{ii}}(b_i - \sum_{j>i} U_{ij} x_j)$. Each requires $O(n^2)$ operations.
The **LU factorization** writes $A = LU$ with $L$ unit lower triangular (ones on the diagonal) and $U$ upper triangular. If this exists, then $Ax = b$ reduces to $Ly = b$ (forward substitution) and $Ux = y$ (backward substitution). The factorization itself costs $O(n^3)$ operations, but once computed, solving for multiple right-hand sides $b$ costs only $O(n^2)$ each.
[quotetheorem:497]
The proof constructs $L$ and $U$ by Gaussian elimination: at step $k$, the $k$-th column of $L$ and $k$-th row of $U$ are extracted from the current matrix $A^{(k-1)}$, and the rank-one update $A^{(k)} = A^{(k-1)} - l_k u_k^T$ zeroes out the $k$-th column below the diagonal. The process succeeds provided $A_{kk}^{(k-1)} = U_{kk} \ne 0$ at each step, which is guaranteed when all leading principal submatrices are nonsingular.
[citeproof:497]
When $A_{kk}^{(k-1)} = 0$ (or is very small, causing numerical instability), we permute rows to bring a large entry into the pivot position. This gives the PLU factorization.
[quotetheorem:498]
In practice, **partial pivoting** (choosing the largest entry in the current column) is standard, giving $|L_{ij}| \le 1$ and excellent numerical stability.
[citeproof:498]
[example:LU Factorization of a $3 \times 3$ Matrix]
Factor $A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}$.
**Step 1.** Extract $u_1^T = (2, 1, 1)$ (first row) and $l_1 = (1, 2, 4)^T$ (first column divided by $A_{11} = 2$). Compute $A^{(1)} = A - l_1 u_1^T$:
\begin{align*}
A^{(1)} = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} \begin{pmatrix} 2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{pmatrix}.
\end{align*}
**Step 2.** From the $(2,2)$ submatrix of $A^{(1)}$: $u_2^T = (0, 1, 1)$ and $l_2 = (0, 1, 3)^T$ (dividing by $A^{(1)}_{22} = 1$). Compute $A^{(2)} = A^{(1)} - l_2 u_2^T$: the only nonzero entry is $A^{(2)}_{33} = 5 - 3 = 2$.
**Step 3.** $u_3^T = (0, 0, 2)$, $l_3 = (0, 0, 1)^T$.
Assembling: $L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}$, $U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}$.
Check: $\det(A) = \det(U) = 2 \cdot 1 \cdot 2 = 4$. To solve $Ax = (4, 10, 24)^T$: forward substitution gives $y = (4, 2, 0)^T$, then backward substitution gives $x = (1, 2, 0)^T$. Verify: $A(1, 2, 0)^T = (2 + 2, 4 + 6, 8 + 14)^T = (4, 10, 22)^T$ — this checks out only if $b_3 = 22$; for $b_3 = 24$, the back-substitution gives $x = (1, 1, 1)^T$ instead.
[/example]
[example:LU Factorization Fails Without Pivoting]
The matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$ is nonsingular ($\det A = -1$) but has $A_{11} = 0$, so the first step of Gaussian elimination cannot proceed — we cannot divide by the $(1,1)$ pivot. No LU factorization exists: if $A = LU$ with $L = \begin{pmatrix} 1 & 0 \\ l & 1 \end{pmatrix}$ and $U = \begin{pmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{pmatrix}$, then $A_{11} = u_{11} = 0$, forcing $A_{21} = l \cdot 0 = 0 \ne 1$, a contradiction.
With row pivoting: swap rows to get $PA = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = LU$ with $L = I$ and $U = PA$. The permutation matrix is $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. This is guaranteed to work for any nonsingular matrix by the [PLU theorem](/theorems/498).
[/example]
## Special Matrices
Symmetric positive-definite matrices admit a particularly efficient factorization that exploits both symmetry and positive-definiteness.
[quotetheorem:499]
The key step is showing that all leading principal submatrices have positive determinant (restricting the quadratic form to the first $k$ coordinates gives $x^T A_k x > 0$ for $x \ne 0$). This guarantees the LU factorization exists, and symmetry forces $\hat{U} = L^T$. The diagonal entries $D_{kk} > 0$ can be verified by testing $y_k^T A y_k > 0$ where $L^T y_k = e_k$.
[citeproof:499]
The Cholesky factorization $A = GG^T$ where $G = LD^{1/2}$ is the standard form; it requires half the work of general LU.
For **band matrices** (where $A_{ij} = 0$ for $|i - j| > r$), the sparsity is preserved:
[quotetheorem:500]
This reduces the cost of solving banded systems from $O(n^3)$ to $O(nr^2)$ — a dramatic improvement for the tridiagonal systems ($r = 1$) that arise from finite difference discretisations of ODEs and PDEs.
[citeproof:500]
## Linear Least Squares and QR Factorization
The **least-squares problem** arises when we have more equations than unknowns: given $A \in \mathbb{R}^{m \times n}$ with $m > n$ and $b \in \mathbb{R}^m$, find $x^* \in \mathbb{R}^n$ minimising $\|Ax - b\|^2$.
[quotetheorem:501]
The condition $A^T(Ax^* - b) = 0$ says the residual $r^* = Ax^* - b$ is orthogonal to the column space of $A$: the best approximation $Ax^*$ is the orthogonal projection of $b$ onto $\operatorname{col}(A)$. If $A$ has full column rank, $A^TA$ is positive definite and the [solution is unique](/theorems/502).
[citeproof:501]
Solving the normal equations $A^TAx = A^Tb$ directly is possible but numerically inadvisable: the condition number of $A^TA$ is the square of that of $A$. The **QR factorization** $A = QR$ with $Q$ orthogonal and $R$ upper triangular avoids this problem.
[quotetheorem:503]
Since $Q$ is orthogonal, $\|Ax - b\| = \|QRx - b\| = \|Rx - Q^Tb\|$. Writing $Q^Tb = \binom{c}{d}$ with $c \in \mathbb{R}^n$ and $R = \binom{R_1}{0}$: $\|Ax - b\|^2 = \|R_1 x - c\|^2 + \|d\|^2$. The minimum $\|d\|^2$ is achieved by $x^* = R_1^{-1}c$ (if $A$ has full rank, $R_1$ is nonsingular).
[citeproof:503]
Three algorithms compute the QR factorization:
**Gram-Schmidt** orthogonalises the columns of $A$ directly: $a_k = \sum_{i \le k} R_{ik} q_i$ where $q_1, \dots, q_n$ are orthonormal. This targets the "skinny" QR ($Q \in \mathbb{R}^{m \times n}$, $R \in \mathbb{R}^{n \times n}$). The modified Gram-Schmidt variant reorthogonalises at each step for better numerical stability.
**Givens rotations** introduce zeros one entry at a time by applying $2 \times 2$ rotations in the $(p, q)$ plane: choosing the angle $\theta$ so that the $(q, k)$ entry becomes zero. This requires $mn - n(n+1)/2$ rotations for a full $m \times n$ matrix, but is well-suited for matrices with special sparsity patterns.
**Householder reflections** $H_u = I - 2\frac{uu^T}{u^Tu}$ introduce zeros in an entire column at once. At step $k$, the reflection is chosen so that the $k$-th column below the diagonal is mapped to a multiple of $e_k$. Only $n$ reflections are needed (versus $O(mn)$ Givens rotations), making Householder the method of choice for dense matrices. For numerical stability, the sign in $u = a \mp \|a\|e_k$ is chosen to maximise $\|u\|$ (avoiding catastrophic cancellation).
[example:Least Squares via QR]
Fit a line $y = c_0 + c_1 t$ to the data $(0, 1)$, $(1, 2)$, $(2, 4)$. The system $Ax = b$ is:
\begin{align*}
\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}.
\end{align*}
This is over-determined ($m = 3 > n = 2$). The normal equations $A^TAx = A^Tb$ give:
\begin{align*}
A^TA = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \quad A^Tb = \begin{pmatrix} 7 \\ 10 \end{pmatrix}.
\end{align*}
Solving: $\det(A^TA) = 15 - 9 = 6$, so $c_0 = (5 \cdot 7 - 3 \cdot 10)/6 = 5/6$ and $c_1 = (3 \cdot 10 - 3 \cdot 7)/6 = 9/6 = 3/2$. The best-fit line is $y = 5/6 + (3/2)t$.
**Checking the residual.** The fitted values are $5/6 \approx 0.833$, $5/6 + 3/2 = 7/3 \approx 2.333$, $5/6 + 3 = 23/6 \approx 3.833$. The residuals are $1 - 5/6 = 1/6$, $2 - 7/3 = -1/3$, $4 - 23/6 = 1/6$. The residual vector $r = (1/6, -1/3, 1/6)^T$ satisfies $A^T r = (1/6 - 1/3 + 1/6, 0 - 1/3 + 1/3)^T = (0, 0)^T$ — confirming it is orthogonal to $\operatorname{col}(A)$, as the normal equations guarantee.
The residual norm is $\|r\| = \sqrt{1/36 + 1/9 + 1/36} = \sqrt{1/6} \approx 0.408$. No line can fit these three non-collinear points exactly; $\|r\|$ measures how far the data deviates from linear.
[/example]