These notes cover the [Cambridge IB Numerical Analysis](/page/Cambridge%20IB%20Numerical%20Analysis) course. The subject addresses a single overarching question: *how do we compute with [functions](/page/Function), [integrals](/page/Integral), derivatives, and differential equations when exact symbolic answers are unavailable or impractical?* The answer draws on a surprising interplay between algebra (polynomial spaces, matrix factorisations), analysis (error bounds, convergence), and geometry (orthogonal projections, stability domains).
The course is organised in eight sections. Sections 1–3 develop the theory of **polynomial approximation**: interpolation, orthogonal polynomials, and a unified error framework via the Peano kernel. Sections 4–6 apply these tools to **ordinary differential equations**, progressing from basic convergence theory through stiffness and A-stability to practical implementation with adaptive step-size control. Sections 7–8 address the **linear algebra** that underpins all numerical methods: direct solvers via LU and Cholesky factorisation, and the QR-based solution of least-squares problems.
Throughout, definitions are motivated before they are stated, theorems are linked to their proofs, and worked examples illustrate both the theory and its computational consequences.
# Polynomial Interpolation
Polynomials are among the simplest and most versatile functions in mathematics: they are easy to evaluate, differentiate, and integrate, and a degree-$n$ polynomial is completely determined by just $n+1$ real numbers. This makes them natural candidates for approximating more complicated functions. But polynomials also appear as the *only* option in many computational settings. Numerical quadrature rules approximate integrals by weighted sums of function values — and these rules are exact for polynomials up to a certain degree. Finite difference methods for differential equations replace derivatives by polynomial difference quotients. Spectral methods expand solutions in polynomial bases. In each case, the quality of the computation depends on how well a polynomial can represent the function of interest.
The central question driving this section is: *Given a set of data points, can we construct a polynomial that passes exactly through them — and if so, how well does this polynomial approximate the underlying function elsewhere?* The answer involves two distinct ideas. The first is algebraic: the interpolation problem is always uniquely solvable, and both the Lagrange and Newton forms provide explicit constructions. The second is analytic: the error of the interpolant depends on the smoothness of the function being approximated and — crucially — on the geometric arrangement of the interpolation nodes. This interplay between algebra and analysis culminates in the theory of Chebyshev nodes, which solve a minimax problem and provide the best possible node placement for worst-case error control.
Throughout, we assume only basic knowledge of real analysis and linear algebra: [continuity](/page/Continuity), differentiability, Rolle's theorem, and the fact that a non-zero polynomial of degree $n$ has at most $n$ real roots.
## The Interpolation Problem
To capture the idea of "fitting a curve through given points," we introduce the interpolation problem precisely.
[definition:Polynomial Interpolation Problem]
Given $n+1$ distinct real numbers $x_0, x_1, \dots, x_n$ (called *interpolation nodes*) and corresponding values $f_0, f_1, \dots, f_n \in \mathbb{R}$, find a polynomial $p \in P_n[x]$ such that
\begin{align*}
p(x_i) = f_i \quad \text{for all } i = 0, 1, \dots, n,
\end{align*}
where $P_n[x]$ denotes the real vector space of polynomials of degree at most $n$.
[/definition]
The definition isolates the requirement that the approximating polynomial must match the data exactly at the specified nodes. Note that the data need not come from a known function — interpolation applies equally to experimental measurements, tabulated values, or purely abstract data. In practice, however, we often assume $f_i = f(x_i)$ for some smooth $f$, which enables the error analysis developed later in this section.
Before constructing the interpolant, we must settle the foundational question: is the problem always solvable, and is the solution unique? If two different polynomials of degree $\leq n$ could pass through the same $n+1$ data points, the interpolation problem would be ambiguous and the resulting approximation theory ill-defined.
[quotetheorem:473]
This theorem is the bedrock of polynomial interpolation. Uniqueness means that no matter how we construct the interpolant — Lagrange form, Newton form, or by solving a Vandermonde system — we always arrive at the same polynomial. Existence means the construction always succeeds. The connection to linear algebra is worth noting: the interpolation conditions $p(x_i) = f_i$ form a square linear system whose coefficient matrix (the Vandermonde matrix with entries $V_{ij} = x_i^j$) is invertible precisely when the nodes are distinct. But the Lagrange construction in the proof is far more elegant and explicit than solving a linear system.
[citeproof:473]
[example:Lagrange Interpolation for Three Points]
Let $x_0 = 0$, $x_1 = 1$, $x_2 = 2$, and $f_0 = 1$, $f_1 = 3$, $f_2 = 2$. The Lagrange cardinal polynomials are
\begin{align*}
\ell_0(x) = \frac{(x-1)(x-2)}{(0-1)(0-2)} = \frac{(x-1)(x-2)}{2}, \quad \ell_1(x) = \frac{x(x-2)}{(1)(1-2)} = -x(x-2), \quad \ell_2(x) = \frac{x(x-1)}{(2)(2-1)} = \frac{x(x-1)}{2}.
\end{align*}
Then the interpolating polynomial is
\begin{align*}
p(x) = 1 \cdot \ell_0(x) + 3 \cdot \ell_1(x) + 2 \cdot \ell_2(x) = -\tfrac{3}{2}x^2 + \tfrac{7}{2}x + 1.
\end{align*}
One verifies: $p(0) = 1$, $p(1) = -\tfrac{3}{2} + \tfrac{7}{2} + 1 = 3$, $p(2) = -6 + 7 + 1 = 2$. $\checkmark$
[/example]
## Newton's Divided Differences
The Lagrange form provides an explicit closed-form formula, but it has a serious computational drawback: if a new data point $(x_{n+1}, f_{n+1})$ is added, the entire set of cardinal polynomials $\ell_0, \dots, \ell_n$ must be recomputed (each gains an additional factor in both numerator and denominator). Newton's approach resolves this by expressing the interpolant as a *telescoping* sum, where adding a new point requires computing only one new coefficient.
The key ingredient is the divided difference, which generalises the notion of a slope to higher-order data.
[definition:Newton Divided Differences]
Given a function $f$ and distinct points $x_0, \dots, x_k$, the *divided difference* $f[x_0, \dots, x_k]$ is defined recursively by:
\begin{align*}
f[x_i] = f(x_i), \qquad f[x_j, \dots, x_k] = \frac{f[x_{j+1}, \dots, x_k] - f[x_j, \dots, x_{k-1}]}{x_k - x_j} \quad \text{for } j < k.
\end{align*}
[/definition]
The first divided difference $f[x_0, x_1] = (f(x_1) - f(x_0))/(x_1 - x_0)$ is the slope of the secant line. The second divided difference $f[x_0, x_1, x_2]$ measures how the slope changes — it is the leading coefficient of the quadratic interpolant through the three points. In general, $f[x_0, \dots, x_k]$ is the leading coefficient of the unique polynomial in $P_k[x]$ interpolating $f$ at $x_0, \dots, x_k$. A crucial but non-obvious property is symmetry: divided differences are invariant under any permutation of their arguments, even though the recursive definition uses a particular ordering.
[quotetheorem:474]
The Newton form has three practical advantages over the Lagrange form. First, *incremental computation*: adding a new point $x_{n+1}$ appends one new term $f[x_0, \dots, x_{n+1}]\prod_{i=0}^n (x - x_i)$ without disturbing the existing terms. Second, *efficient evaluation*: the nested product structure allows evaluation via a Horner-like scheme in $O(n)$ operations. Third, *error estimation*: comparing interpolants of successive degrees (the difference is exactly $f[x_0, \dots, x_{n+1}]\prod_{i=0}^n(x - x_i)$) provides a practical estimate of the interpolation error.
[citeproof:474]
[example:Newton Form and Incremental Update]
Using the data from the previous example ($x_0 = 0, x_1 = 1, x_2 = 2$ with $f_0 = 1, f_1 = 3, f_2 = 2$), the divided differences are:
\begin{align*}
f[x_0] &= 1, \quad f[x_0, x_1] = \frac{3 - 1}{1 - 0} = 2, \quad f[x_0, x_1, x_2] = \frac{\frac{2-3}{2-1} - 2}{2 - 0} = \frac{-1 - 2}{2} = -\tfrac{3}{2}.
\end{align*}
The Newton form is $p(x) = 1 + 2(x - 0) - \tfrac{3}{2}(x - 0)(x - 1) = 1 + 2x - \tfrac{3}{2}x^2 + \tfrac{3}{2}x = -\tfrac{3}{2}x^2 + \tfrac{7}{2}x + 1$, matching the Lagrange result.
Now suppose we add the point $x_3 = 3$ with $f_3 = 0$. We need only compute $f[x_0, x_1, x_2, x_3]$: first $f[x_1, x_2, x_3] = \frac{f[x_2,x_3] - f[x_1,x_2]}{x_3 - x_1} = \frac{-1 - (-1)}{2} = 0$, wait — let us redo this. $f[x_2, x_3] = (0 - 2)/(3 - 2) = -2$, $f[x_1, x_2] = (2 - 3)/(2 - 1) = -1$, so $f[x_1, x_2, x_3] = (-2 - (-1))/(3 - 1) = -1/2$. Then $f[x_0, x_1, x_2, x_3] = (-1/2 - (-3/2))/(3 - 0) = 1/3$. The updated interpolant is
\begin{align*}
p_3(x) = p(x) + \tfrac{1}{3}(x)(x-1)(x-2).
\end{align*}
No recomputation of the earlier terms was needed.
[/example]
## Error Analysis
Having established how to construct the interpolant, we now turn to the more subtle question: *how well does it approximate the underlying function at points other than the nodes?* Existence and uniqueness guarantee that the interpolant is well-defined, but they say nothing about accuracy. For the interpolant to be useful as an approximation, we need quantitative control of the error $f(x) - p_n(x)$.
The following result provides an exact pointwise error formula, revealing that the error is governed by two independent factors: the smoothness of $f$ (measured by its $(n+1)$th derivative) and the geometry of the nodes (measured by the nodal polynomial $\omega(x) = \prod_{i=0}^n(x - x_i)$).
[quotetheorem:475]
The factored structure of the error is what makes this formula so powerful. The derivative factor $f^{(n+1)}(\xi_x)/(n+1)!$ depends only on $f$ and cannot be controlled by the choice of nodes. The nodal polynomial $\omega(x) = \prod_{i=0}^n(x-x_i)$, however, depends *only* on the nodes and can be optimised. This cleanly separates the problem into two parts: understanding the function's regularity (which determines how fast the error *can* decay with $n$) and choosing the nodes wisely (which determines whether this potential is realised). The proof technique — constructing an auxiliary function with enough zeros to apply Rolle's theorem repeatedly — is a classic device in approximation theory that reappears in the analysis of quadrature rules and splines.
[citeproof:475]
Taking suprema over $[a,b]$, the error formula yields the bound
\begin{align*}
\|f - p_n\|_{L^\infty[a,b]} \leq \frac{1}{(n+1)!} \|f^{(n+1)}\|_{L^\infty[a,b]} \cdot \|\omega\|_{L^\infty[a,b]}.
\end{align*}
The first factor is fixed by $f$; the second factor is where the choice of nodes matters. This motivates the central optimisation problem: *which nodes minimise $\|\omega\|_{L^\infty}$?*
## Chebyshev Polynomials and Optimal Nodes
The minimax problem — minimising $\|\omega\|_\infty$ over all choices of $n+1$ nodes — is solved by the Chebyshev polynomials. To state the result, we work on the standard interval $[-1,1]$ (any other interval $[a,b]$ is reduced to this case by the affine change of variables $x \mapsto \frac{2x - a - b}{b - a}$).
[definition:Chebyshev Polynomial of the First Kind]
The Chebyshev polynomial of degree $n$ is defined by
\begin{align*}
T_n(x) = \cos(n \arccos x), \quad x \in [-1,1].
\end{align*}
Despite the trigonometric definition, $T_n$ is a polynomial in $x$: this follows from the recurrence relation $T_0(x) = 1$, $T_1(x) = x$, $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$.
[/definition]
The trigonometric form $T_n(\cos\theta) = \cos(n\theta)$ immediately reveals the key properties: $T_n$ oscillates between $-1$ and $+1$ exactly $n+1$ times on $[-1,1]$, and its $n$ zeros are located at $x_k = \cos\!\left(\frac{2k+1}{2n}\pi\right)$. This equioscillation is the mechanism behind the following extremal result.
[quotetheorem:476]
The equioscillation argument in the proof deserves emphasis: it shows that no monic polynomial can "beat" $\tilde{T}_{n+1}$ because any smaller competitor would have to change sign in the same places, forcing it to have too many roots for its degree. This is a manifestation of Chebyshev's equioscillation theorem, which characterises best polynomial approximations more generally.
The practical consequence for interpolation is immediate. Combining the [Error Formula](/theorems/475) with the [Minimax Property](/theorems/476), interpolation at Chebyshev nodes on $[-1,1]$ gives
\begin{align*}
\|f - p_n\|_{L^\infty[-1,1]} \leq \frac{1}{2^n(n+1)!} \|f^{(n+1)}\|_{L^\infty[-1,1]}.
\end{align*}
For analytic functions (where $\|f^{(n+1)}\|_\infty$ grows at most like $(n+1)! \, C^{n+1}$ for some constant $C$), this bound decays geometrically in $n$, guaranteeing rapid convergence. By contrast, equispaced nodes on $[-1,1]$ can produce $\|\omega\|_\infty$ that grows exponentially with $n$, leading to the Runge phenomenon: the interpolants oscillate wildly near the endpoints even for smooth functions like $f(x) = 1/(1 + 25x^2)$.
[citeproof:476]
[example:Chebyshev Nodes for $n=2$]
For $n = 2$ (three interpolation points), the Chebyshev nodes on $[-1,1]$ are the zeros of $T_3(x) = 4x^3 - 3x$:
\begin{align*}
x_0 = \cos\!\left(\frac{\pi}{6}\right) = \frac{\sqrt{3}}{2}, \quad x_1 = \cos\!\left(\frac{\pi}{2}\right) = 0, \quad x_2 = \cos\!\left(\frac{5\pi}{6}\right) = -\frac{\sqrt{3}}{2}.
\end{align*}
The nodal polynomial is $\omega(x) = (x - x_0)(x - x_1)(x - x_2) = x^3 - \frac{3}{4}x = \frac{1}{4}T_3(x)$, and $\|\omega\|_{L^\infty[-1,1]} = 1/4 = 1/2^2$, matching the theoretical minimum.
By contrast, the equispaced nodes $x_0 = -1$, $x_1 = 0$, $x_2 = 1$ give $\omega(x) = x^3 - x$ with $\|\omega\|_\infty = 2/(3\sqrt{3}) \approx 0.385 > 0.25$. The Chebyshev nodes reduce the worst-case nodal polynomial by about 35% even for this small example; the advantage grows dramatically with $n$.
[/example]
[problem]
Let $f(x) = e^x$ on $[-1, 1]$. Compute the Newton divided differences $f[x_0]$, $f[x_0, x_1]$, and $f[x_0, x_1, x_2]$ for the equispaced nodes $x_0 = -1$, $x_1 = 0$, $x_2 = 1$. Write down the Newton interpolating polynomial and bound the interpolation error using the [Error Formula](/theorems/475).
[/problem]
[solution]
**Step 1: Divided differences.**
\begin{align*}
f[x_0] &= e^{-1}, \\
f[x_0, x_1] &= \frac{e^0 - e^{-1}}{0 - (-1)} = 1 - e^{-1}, \\
f[x_1, x_2] &= \frac{e^1 - e^0}{1 - 0} = e - 1, \\
f[x_0, x_1, x_2] &= \frac{(e - 1) - (1 - e^{-1})}{1 - (-1)} = \frac{e - 2 + e^{-1}}{2}.
\end{align*}
**Step 2: Newton interpolant.**
\begin{align*}
p_2(x) &= e^{-1} + (1 - e^{-1})(x + 1) + \frac{e - 2 + e^{-1}}{2}(x + 1)x.
\end{align*}
**Step 3: Error bound.** Since $f^{(3)}(x) = e^x$, we have $\|f^{(3)}\|_{L^\infty[-1,1]} = e$. The nodal polynomial is $\omega(x) = (x+1)x(x-1) = x^3 - x$, and $\|\omega\|_{L^\infty[-1,1]} = 2/(3\sqrt{3})$. By the [Error Formula](/theorems/475):
\begin{align*}
\|f - p_2\|_{L^\infty[-1,1]} \leq \frac{e}{3!} \cdot \frac{2}{3\sqrt{3}} = \frac{e}{9\sqrt{3}} \approx 0.175.
\end{align*}
Using Chebyshev nodes instead, the bound improves to $\frac{e}{6} \cdot \frac{1}{4} = \frac{e}{24} \approx 0.113$, a reduction of about 35%.
[/solution]
# Orthogonal Polynomials
[Polynomial interpolation](/theorems/473) constructs a polynomial that matches a given function at prescribed nodes — but it guarantees agreement only at those nodes, with no systematic control over the error elsewhere. For smooth functions at well-chosen nodes (such as [Chebyshev nodes](/theorems/476)), the interpolant converges rapidly, but for noisy data or poorly distributed nodes, it can oscillate wildly between the data points. A more robust approach is to seek a polynomial that is *globally close* to $f$, not just pointwise exact at a few locations. This is the least-squares approximation problem: find $p \in P_n[x]$ minimising $\|f - p\|$ in some appropriate norm.
The solution to this problem is elegant when framed in the language of inner products and orthogonality. An inner product on a space of polynomials defines a notion of "angle" and "projection," and the best approximation is simply the orthogonal projection of $f$ onto $P_n[x]$. To compute this projection efficiently, one needs a basis of $P_n[x]$ that is *orthogonal* with respect to the inner product — a basis in which the projection coefficients decouple, so that each can be computed independently. The resulting orthogonal polynomials are not merely a computational convenience: they possess deep structural properties (a three-term recurrence, interlacing zeros, extremal properties) that make them fundamental tools in numerical integration, spectral methods, and approximation theory. The Chebyshev polynomials encountered in the [Polynomial Interpolation](/theorems/476) section are one instance; Legendre, Hermite, and Laguerre polynomials are others, each arising from a different choice of weight function.
## Inner Products on Polynomial Spaces
To make "closeness" precise, we need a way to measure the size of a polynomial — or more generally, the distance between two polynomials. An inner product provides this, together with the geometric notions of orthogonality and projection that power the entire theory.
[definition:Inner Product on Polynomial Space]
Let $V$ be a vector space of real-valued polynomials. An *inner product* on $V$ is a bilinear map $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ satisfying:
(i) $\langle f, g \rangle = \langle g, f \rangle$ (symmetry),
(ii) $\langle \alpha f + g, h \rangle = \alpha \langle f, h \rangle + \langle g, h \rangle$ (linearity),
(iii) $\langle f, f \rangle \geq 0$ with equality if and only if $f = 0$ (positive definiteness).
The two most important examples are:
**Continuous inner products:** $\langle f, g \rangle = \int_a^b w(x)\,f(x)\,g(x)\,dx$, where $w > 0$ on $(a,b)$ and $\int_a^b w(x)\,x^{2n}\,dx < \infty$ for all $n \geq 0$. Different choices of weight $w$ and interval $(a,b)$ yield different families of orthogonal polynomials: $w = 1$ on $[-1,1]$ gives Legendre polynomials, $w(x) = (1-x^2)^{-1/2}$ gives Chebyshev polynomials, $w(x) = e^{-x}$ on $(0,\infty)$ gives Laguerre polynomials.
**Discrete inner products:** $\langle f, g \rangle = \sum_{j=1}^m w_j\,f(\xi_j)\,g(\xi_j)$, with $w_j > 0$ and $\xi_j$ distinct. This is defined on $P_{m-1}[x]$ (positive definiteness holds because a polynomial of degree $\leq m-1$ vanishing at $m$ distinct points must be zero). The discrete case is natural for data fitting: the inner product measures the weighted sum of squared residuals at the data points.
[/definition]
[definition:Orthogonal Polynomials]
Given an inner product $\langle \cdot, \cdot \rangle$ on a space of polynomials, a sequence $\{p_n\}_{n=0}^\infty$ with $\deg p_n = n$ is called a [sequence](/page/Sequence) of *orthogonal polynomials* if $\langle p_n, p_m \rangle = 0$ for all $n \neq m$. Equivalently, $p_n$ is orthogonal to every polynomial of lower degree: $\langle p_n, q \rangle = 0$ for all $q \in P_{n-1}[x]$.
[/definition]
Orthogonal polynomials are not unique without a normalisation convention — any non-zero scalar multiple of an orthogonal polynomial is still orthogonal. The standard convention is to require each $p_n$ to be *monic* (leading coefficient $1$), which determines it uniquely.
## Existence, Uniqueness, and the Three-Term Recurrence
The first fundamental result is that monic orthogonal polynomials always exist, are unique, and form a basis for $P_n[x]$. The proof is constructive: it is simply Gram–Schmidt orthogonalisation applied to the monomial basis $\{1, x, x^2, \dots\}$.
[quotetheorem:477]
The significance of this result is that it reduces polynomial approximation to linear algebra in an orthogonal basis. In a non-orthogonal basis like $\{1, x, x^2, \dots\}$, finding the best approximation requires solving a linear system (the normal equations), which can be ill-conditioned. In the orthogonal basis $\{p_0, p_1, \dots, p_n\}$, the coefficients decouple entirely — each is given by a single inner product, with no system to solve.
[citeproof:477]
Computing orthogonal polynomials via the full Gram–Schmidt procedure requires $O(n^2)$ inner products. A dramatic simplification comes from the following structural result: each orthogonal polynomial depends only on the two preceding ones.
[quotetheorem:478]
This recurrence is remarkable for several reasons. First, it reduces the computation of the entire sequence $\{p_0, p_1, \dots, p_n\}$ to $O(n)$ inner products — a major efficiency gain. Second, it reveals a deep connection to tridiagonal matrices: the recurrence coefficients $\alpha_k$ and $\beta_k$ are exactly the entries of a symmetric tridiagonal (Jacobi) matrix, whose eigenvalues are the zeros of $p_n$. This connection to spectral theory is the basis for efficient algorithms to compute Gaussian quadrature nodes. Third, the positivity $\beta_k > 0$ ensures that the recurrence is stable and the zeros of successive orthogonal polynomials interlace.
[citeproof:478]
[example:Chebyshev Polynomials as Orthogonal Polynomials]
The Chebyshev polynomials $T_n(x) = \cos(n\arccos x)$ are orthogonal with respect to the weight $w(x) = (1 - x^2)^{-1/2}$ on $[-1,1]$. Using the substitution $x = \cos\theta$:
\begin{align*}
\langle T_n, T_m \rangle = \int_{-1}^1 \frac{T_n(x)\,T_m(x)}{\sqrt{1-x^2}}\,dx = \int_0^\pi \cos(n\theta)\cos(m\theta)\,d\theta = \begin{cases} 0 & n \neq m, \\ \pi/2 & n = m \geq 1, \\ \pi & n = m = 0. \end{cases}
\end{align*}
The monic Chebyshev polynomials are $\tilde{T}_n = 2^{1-n}T_n$ for $n \geq 1$. The three-term recurrence for $T_n$ is $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$, corresponding to $\alpha_k = 0$ (by symmetry of the weight) and $\beta_k = 1/4$ for $k \geq 2$ in the monic normalisation.
[/example]
[example:Legendre Polynomials]
For the unit weight $w(x) = 1$ on $[-1,1]$, the monic orthogonal polynomials are the monic Legendre polynomials. The first few are:
\begin{align*}
p_0(x) = 1, \quad p_1(x) = x, \quad p_2(x) = x^2 - \tfrac{1}{3}, \quad p_3(x) = x^3 - \tfrac{3}{5}x.
\end{align*}
Since the weight is symmetric about $0$, each $\alpha_k = 0$ (odd integrands vanish). The recurrence coefficients $\beta_k$ can be computed explicitly: $\beta_k = k^2/(4k^2 - 1)$.
[/example]
## Least-Squares Polynomial Approximation
With orthogonal polynomials in hand, the least-squares approximation problem has an immediate and elegant solution: the best polynomial approximation is the orthogonal projection of $f$ onto $P_n[x]$.
[quotetheorem:479]
The formula $c_k = \langle f, p_k \rangle / \langle p_k, p_k \rangle$ is the polynomial analogue of the Fourier coefficient formula. Each coefficient depends only on $f$ and $p_k$ — not on any of the other basis polynomials or coefficients. This means the approximation is *incrementally refinable*: increasing the degree from $n$ to $n+1$ requires computing only one new coefficient $c_{n+1}$, without changing the existing ones. The error formula $\|f - p\|^2 = \|f\|^2 - \sum c_k^2 \langle p_k, p_k \rangle$ shows that the error decreases monotonically with $n$, and the decrease at each step equals $c_k^2 \langle p_k, p_k \rangle$ — the "energy" captured by the $k$th component.
[citeproof:479]
A natural question is whether the error can be driven to zero — that is, whether the orthogonal expansion converges. This requires knowing that polynomials are dense in the relevant function space. The following classical result provides the answer.
[quotetheorem:480]
The Weierstrass theorem guarantees that continuous functions on compact intervals can be uniformly approximated by polynomials. Combined with the least-squares theorem, this yields a completeness result: the orthogonal polynomial expansion of any continuous function converges in the $L^2_w$ norm.
[citeproof:480]
[quotetheorem:481]
This is the polynomial Parseval identity: the expansion captures the full "energy" $\|f\|^2$, with no leakage. The proof combines the [Least-Squares Theorem](/theorems/479) (which gives the error formula) with the [Weierstrass Theorem](/theorems/480) (which guarantees that best polynomial approximations converge to zero). Just as in Fourier analysis, Parseval's identity is the key tool for transferring between function-space and coefficient-space computations.
[citeproof:481]
## Gaussian Quadrature
Orthogonal polynomials solve not only approximation problems but also a fundamental problem of numerical integration: given a weight function $w$ on $(a,b)$, choose $\nu$ nodes and weights to make the quadrature rule $\int_a^b w\,f\,dx \approx \sum_{k=1}^\nu b_k\,f(c_k)$ as accurate as possible. With $\nu$ free nodes and $\nu$ free weights, one has $2\nu$ parameters, so one might hope for exactness on all polynomials of degree up to $2\nu - 1$ — a space of dimension $2\nu$. This turns out to be achievable, and the optimal nodes are precisely the zeros of $p_\nu$.
First, we need to know that these zeros are well-behaved.
[quotetheorem:482]
The proof is a beautiful application of orthogonality: if $p_n$ had fewer than $n$ sign changes, one could construct a polynomial of lower degree that does not oscillate against $p_n$, contradicting orthogonality. The fact that all zeros are simple and interior to $(a,b)$ makes them suitable as quadrature nodes.
[citeproof:482]
[quotetheorem:483]
The exactness for degree $2\nu - 1$ is twice what one gets from naive interpolatory quadrature (which uses $\nu$ nodes to achieve exactness for degree $\nu - 1$). The "doubling" comes from orthogonality: when a polynomial of degree $\leq 2\nu - 1$ is divided by $p_\nu$, the quotient is orthogonal to $p_\nu$ and the remainder is interpolated exactly. The positivity of weights $b_k > 0$ is essential for numerical stability — it ensures that rounding errors in function evaluations do not get amplified by sign changes in the weights.
[citeproof:483]
[example:Gauss–Legendre Quadrature]
For $w(x) = 1$ on $[-1,1]$, the orthogonal polynomials are the Legendre polynomials. The $\nu = 2$ Gauss–Legendre rule uses the zeros of $p_2(x) = x^2 - 1/3$, namely $c_{1,2} = \pm 1/\sqrt{3}$, with weights $b_1 = b_2 = 1$ (by symmetry). This two-point rule is exact for all polynomials of degree $\leq 3$:
\begin{align*}
\int_{-1}^1 f(x)\,dx \approx f\!\left(-\tfrac{1}{\sqrt{3}}\right) + f\!\left(\tfrac{1}{\sqrt{3}}\right).
\end{align*}
For comparison, the two-point trapezoidal rule $f(-1) + f(1)$ is exact only for degree $\leq 1$. Gaussian quadrature achieves twice the polynomial exactness with the same number of function evaluations.
[/example]
## Discrete Least-Squares Fitting
The same framework applies verbatim to discrete data. Given values $f(\xi_1), \dots, f(\xi_m)$ at $m$ distinct points, we minimise $\sum_{j=1}^m w_j(f(\xi_j) - p(\xi_j))^2$ over $p \in P_n[x]$ with $n < m$. (The case $n = m - 1$ reduces to interpolation; the interesting case is $n \ll m$, where there are far more data points than parameters.) Using the discrete inner product $\langle f, g \rangle = \sum_{j=1}^m w_j\,f(\xi_j)\,g(\xi_j)$, the [Least-Squares Theorem](/theorems/479) gives the solution immediately:
\begin{align*}
p(x) = \sum_{k=0}^n \frac{\langle f, p_k \rangle}{\langle p_k, p_k \rangle}\,p_k(x),
\end{align*}
where $\{p_k\}$ are the monic orthogonal polynomials for the discrete inner product, computed via the [Three-Term Recurrence](/theorems/478). This approach is numerically superior to solving the normal equations directly, because the orthogonal basis eliminates the ill-conditioning that plagues the Vandermonde and Gram matrices in the monomial basis.
[problem]
Given $m = 4$ data points $(\xi_j, f_j)$ at $\xi_1 = -1$, $\xi_2 = -0.5$, $\xi_3 = 0.5$, $\xi_4 = 1$ with unit weights $w_j = 1$, compute the discrete monic orthogonal polynomials $p_0, p_1, p_2$ via the three-term recurrence and write down the least-squares quadratic fit in the orthogonal basis.
[/problem]
[solution]
**Step 1: $p_0$ and $p_1$.** We have $p_0(x) = 1$. The recurrence gives $p_1(x) = x - \alpha_0$ where
\begin{align*}
\alpha_0 = \frac{\langle x, 1 \rangle}{\langle 1, 1 \rangle} = \frac{(-1) + (-0.5) + 0.5 + 1}{4} = 0,
\end{align*}
so $p_1(x) = x$.
**Step 2: $p_2$.** We need $\alpha_1 = \langle x \cdot p_1, p_1 \rangle / \langle p_1, p_1 \rangle$ and $\beta_1 = \langle p_1, p_1 \rangle / \langle p_0, p_0 \rangle$. Computing:
\begin{align*}
\langle p_1, p_1 \rangle &= (-1)^2 + (-0.5)^2 + 0.5^2 + 1^2 = 2.5, \\
\langle x^2, x \rangle &= (-1)^3 + (-0.5)^3 + 0.5^3 + 1^3 = 0, \quad \text{so } \alpha_1 = 0, \\
\beta_1 &= 2.5/4 = 0.625.
\end{align*}
Therefore $p_2(x) = x \cdot p_1(x) - 0.625 \cdot p_0(x) = x^2 - 0.625$.
**Step 3: Least-squares coefficients.** For data values $(f_1, f_2, f_3, f_4)$, the fit is $p(x) = c_0 + c_1\,x + c_2(x^2 - 0.625)$ with
\begin{align*}
c_0 = \frac{\sum f_j}{4}, \quad c_1 = \frac{\sum f_j \xi_j}{2.5}, \quad c_2 = \frac{\sum f_j(\xi_j^2 - 0.625)}{\langle p_2, p_2 \rangle},
\end{align*}
where $\langle p_2, p_2 \rangle = 2(1 - 0.625)^2 + 2(0.25 - 0.625)^2 = 2(0.140625 + 0.140625) = 0.5625$. Each coefficient is computed by a single inner product — no linear system required.
[/solution]
# Error of Approximation
The previous sections established *how* to approximate: [polynomial interpolation](/theorems/473) constructs a polynomial matching a function at prescribed nodes, while [orthogonal polynomial expansions](/theorems/479) provide optimal least-squares fits. But a method is only useful if we can quantify its accuracy. When we replace a function $f$ by an interpolating polynomial, or approximate an integral by a [Gaussian quadrature rule](/theorems/483), we inevitably introduce error — and sharp bounds on that error determine whether the method is viable in practice, especially when function evaluations are costly or the function is only moderately smooth.
The central insight of this section is that *all* of these error analyses can be unified through a single framework: the **Peano kernel**. The idea is simple but powerful. Any linear approximation scheme that is exact on polynomials up to some degree $k$ defines a linear error functional that annihilates $P_k[x]$. The Peano kernel theorem represents this error functional as an integral of $f^{(k+1)}$ against a kernel function $K_L$ that depends *only on the scheme* and not on $f$. Once $K_L$ is known, the error bound reduces to computing $\|K_L\|_{L^1}$ — a one-time calculation per scheme. This separates the problem cleanly: the smoothness of $f$ contributes through $\|f^{(k+1)}\|_\infty$, while the quality of the scheme contributes through the $L^1$ norm of its Peano kernel.
## Error Functionals and Peano Kernels
Every approximation scheme can be viewed as a linear functional applied to $f$. For instance, the [interpolation error formula](/theorems/475) gives $f(x) - p_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!}\omega(x)$, which is the value of the linear functional $L_x(f) = f(x) - p_n(x)$ (linear in $f$ because $p_n$ depends linearly on $f$). Similarly, the quadrature error $\int_a^b w\,f\,dx - \sum a_i f(x_i)$ is a linear functional of $f$. In both cases, the functional vanishes on polynomials up to a certain degree.
[definition:Peano Kernel]
Let $L$ be a linear functional on $C^{k+1}[a,b]$ that annihilates all polynomials of degree $\leq k$: $L(p) = 0$ for all $p \in P_k[x]$. The *Peano kernel* of $L$ is the function $K_L : [a,b] \to \mathbb{R}$ defined by
\begin{align*}
K_L(\theta) = L\!\left((x - \theta)_+^k\right),
\end{align*}
where $(x - \theta)_+^k = \max(x - \theta, 0)^k$ is the truncated power function.
[/definition]
The truncated power $(x - \theta)_+^k$ is a piecewise polynomial of degree $k$ in $x$ with a "kink" at $x = \theta$: it is $C^{k-1}$ but not $C^k$. As $\theta$ varies, this family sweeps through all possible locations of the kink, probing how $L$ responds to non-polynomial behaviour at each point. The Peano kernel $K_L(\theta)$ records this response.
## The Peano Kernel Theorem
The fundamental result states that the error of $L$ on a general smooth function is the integral of its highest derivative against the Peano kernel.
[quotetheorem:484]
The theorem is remarkable for its generality: it applies to any linear scheme that is exact on polynomials, including interpolation errors, quadrature errors, finite difference approximation errors, and collocation errors. The proof is short and elegant — it applies $L$ to [Taylor's theorem with integral remainder](/theorems/189), using the annihilation condition to kill the polynomial terms and the commutativity condition to interchange $L$ with the integral.
[citeproof:484]
The immediate consequence is an error bound: taking absolute values and pulling out $\|f^{(k+1)}\|_\infty$ gives $|L(f)| \leq \frac{\|f^{(k+1)}\|_\infty}{k!}\int_a^b |K_L(\theta)|\,d\theta$. The next result shows this bound is sharp — no tighter constant is possible.
[quotetheorem:485]
The sharpness is established by constructing [test functions](/page/Test%20Function) whose $(k+1)$th derivative approximates $\operatorname{sgn}(K_L)$, which maximises the integral $\int K_L\,f^{(k+1)}\,d\theta$. This means the error constant $c_L$ is not merely an upper bound but the *exact* worst-case error rate.
[citeproof:485]
An important special case occurs when the Peano kernel does not change sign on $[a,b]$. In this case, the [mean value theorem](/theorems/186) for integrals applies, and the error takes the simpler form $L(f) = \frac{f^{(k+1)}(\zeta)}{k!}\int_a^b K_L(\theta)\,d\theta$ for some $\zeta \in (a,b)$. The integral $\int_a^b K_L(\theta)\,d\theta$ can then be evaluated by testing $L$ on a single monomial $x^{k+1}/(k+1)!$. Many classical error formulas — including those for the trapezoidal rule, Simpson's rule, and the midpoint rule — are derived this way.
[example:Simpson's Rule Error via Peano Kernel]
Consider Simpson's rule on $[-1,1]$:
\begin{align*}
L(f) = \int_{-1}^1 f(x)\,dx - \tfrac{1}{3}\!\left(f(-1) + 4f(0) + f(1)\right).
\end{align*}
One checks directly that $L(1) = L(x) = L(x^2) = L(x^3) = 0$, so $L$ annihilates $P_3[x]$. By the [Peano Kernel Theorem](/theorems/484) with $k = 3$, $L(f) = \frac{1}{3!}\int_{-1}^1 K_L(\theta)\,f^{(4)}(\theta)\,d\theta$. Computing $K_L(\theta) = L\!\left((x-\theta)_+^3\right)$ by evaluating the integral and the point evaluations, one obtains (after simplification)
\begin{align*}
K_L(\theta) = \begin{cases} -\tfrac{1}{3}\theta^2(1+\theta)^2 & -1 \leq \theta \leq 0, \\ -\tfrac{1}{3}\theta^2(1-\theta)^2 & 0 \leq \theta \leq 1. \end{cases}
\end{align*}
Since $K_L(\theta) \leq 0$ on $[-1,1]$ (the kernel does not change sign), the mean value theorem gives $L(f) = \frac{f^{(4)}(\zeta)}{6}\int_{-1}^1 K_L(\theta)\,d\theta$ for some $\zeta$. Evaluating the integral: $\int_{-1}^1 K_L(\theta)\,d\theta = -\frac{4}{45}$, giving the classical Simpson's rule error formula
\begin{align*}
\int_{-1}^1 f(x)\,dx - \tfrac{1}{3}(f(-1) + 4f(0) + f(1)) = -\frac{2}{45}\,f^{(4)}(\zeta).
\end{align*}
On an interval of width $h$ (by scaling), this becomes $-\frac{h^5}{90}\,f^{(4)}(\zeta)$, confirming Simpson's rule is fourth-order accurate despite using only three function evaluations — a consequence of the "bonus" degree of exactness from symmetry.
[/example]
## Application: Error of Gaussian Quadrature
The Peano kernel framework provides a clean derivation of the error bound for [Gaussian quadrature](/theorems/483). Since $\nu$-point Gaussian quadrature is exact on $P_{2\nu-1}[x]$, the error functional $L$ annihilates polynomials of degree up to $2\nu - 1$, and the Peano kernel theorem applies with $k = 2\nu - 1$. The resulting Peano kernel turns out to have constant sign (a consequence of the positivity of the quadrature weights), which enables a mean-value-theorem simplification and connects the error constant to the $L^2_w$ norm of the orthogonal polynomial.
[quotetheorem:486]
This result is significant for two reasons. First, the error constant $\|Q_{n+1}\|_{L^2_w}^2/(2n+2)!$ involves only the orthogonal polynomial and the weight — quantities that are computable from the [three-term recurrence](/theorems/478). Second, the factorial $(2n+2)!$ in the denominator ensures rapid decay of the error constant with $n$: for analytic functions whose derivatives grow at most geometrically, the Gaussian quadrature error converges faster than any polynomial in $1/n$.
[citeproof:486]
## Application: Error in Derivative Approximation
When a function $f$ is replaced by its interpolant $\ell_\Delta$, the derivatives of $\ell_\Delta$ serve as approximations to the derivatives of $f$. This is the basis of finite difference methods: the derivative $f'(x)$ is approximated by $\ell_\Delta'(x)$, where $\ell_\Delta$ interpolates $f$ at nearby nodes. How accurate is this approximation?
[quotetheorem:487]
The error is controlled by $\|\omega_\Delta^{(k)}\|_\infty$ — the supremum of the $k$th derivative of the nodal polynomial. For equispaced nodes with spacing $h$, one can show $\|\omega_\Delta^{(k)}\|_\infty = O(h^{n+1-k})$, giving the familiar finite-difference convergence rate: the $k$th derivative is approximated to order $O(h^{n+1-k})$ by a degree-$n$ interpolant on $n+1$ equispaced nodes. For Chebyshev nodes, the bound is tighter still (by the [Minimax Property](/theorems/476)), which is why spectral [differentiation](/page/Derivative) matrices — based on interpolation at Chebyshev nodes — achieve spectral (exponential) accuracy for analytic functions.
[citeproof:487]
[problem]
The composite trapezoidal rule on $[0, 1]$ with $N$ equal subintervals of width $h = 1/N$ is
\begin{align*}
T_N(f) = h\left(\tfrac{1}{2}f(0) + \sum_{j=1}^{N-1} f(jh) + \tfrac{1}{2}f(1)\right).
\end{align*}
Use the Peano kernel theorem to derive the error formula $\int_0^1 f(x)\,dx - T_N(f) = -\frac{h^2}{12}\,f''(\zeta)$ for a single panel $[0, h]$, and then obtain the composite error bound $|\int_0^1 f\,dx - T_N(f)| \leq \frac{h^2}{12}\,\|f''\|_\infty$.
[/problem]
[solution]
**Step 1: Single panel.** On $[0, h]$, the trapezoidal rule is $L(f) = \int_0^h f\,dx - \frac{h}{2}(f(0) + f(h))$. One checks $L(1) = h - h = 0$ and $L(x) = h^2/2 - h \cdot h/2 = 0$, so $L$ annihilates $P_1[x]$. Apply the [Peano Kernel Theorem](/theorems/484) with $k = 1$:
\begin{align*}
K_L(\theta) = L\!\left((x - \theta)_+\right) = \int_\theta^h (x - \theta)\,dx - \frac{h}{2}(h - \theta) = \frac{(h-\theta)^2}{2} - \frac{h(h-\theta)}{2} = -\frac{\theta(h-\theta)}{2}.
\end{align*}
Since $K_L(\theta) = -\theta(h-\theta)/2 \leq 0$ on $[0,h]$, the kernel does not change sign. The mean value theorem gives $L(f) = f''(\zeta)\int_0^h K_L(\theta)\,d\theta = f''(\zeta) \cdot (-h^3/12)$, so
\begin{align*}
\int_0^h f\,dx - \frac{h}{2}(f(0) + f(h)) = -\frac{h^3}{12}\,f''(\zeta_j)
\end{align*}
for some $\zeta_j$ in the $j$th panel.
**Step 2: Composite rule.** Summing over $N$ panels: $\int_0^1 f\,dx - T_N(f) = -\frac{h^3}{12}\sum_{j=1}^N f''(\zeta_j)$. By the [intermediate value theorem](/theorems/629) (since $f'' \in C[0,1]$), $\frac{1}{N}\sum f''(\zeta_j) = f''(\zeta)$ for some $\zeta \in [0,1]$. Since $Nh = 1$:
\begin{align*}
\int_0^1 f\,dx - T_N(f) = -\frac{h^3 N}{12}\,f''(\zeta) = -\frac{h^2}{12}\,f''(\zeta),
\end{align*}
giving $\left|\int_0^1 f\,dx - T_N(f)\right| \leq \frac{h^2}{12}\,\|f''\|_\infty$. The composite trapezoidal rule is second-order accurate: halving $h$ reduces the error by a factor of $4$.
[/solution]
# Ordinary Differential Equations
The preceding sections developed tools for approximating *functions*: polynomial interpolation, orthogonal expansions, and the Peano kernel framework for quantifying error. We now turn to a fundamentally different problem — approximating *dynamics*. An ordinary differential equation $y'(t) = f(t, y(t))$ describes a continuous evolution, but a computer can store only finitely many values. The central challenge of numerical ODE theory is to replace the continuous trajectory $y(t)$ by a discrete sequence $y_0, y_1, y_2, \dots$ that approximates it at grid points $t_n = nh$, and to guarantee that this approximation converges as the step size $h \to 0$.
The connection to approximation theory is direct. Every numerical ODE method is, at its core, an approximation of the integral $y(t_{n+1}) = y(t_n) + \int_{t_n}^{t_{n+1}} f(t, y(t))\,dt$. Replacing this integral by a quadrature rule — using current and past values of $f$ — yields a multi-step method. Evaluating $f$ at intermediate points within the step yields a Runge–Kutta method. The error analysis of these methods draws on exactly the Taylor expansion and polynomial exactness techniques from the [Error of Approximation](/theorems/484) section. The new ingredient is *error propagation*: a local truncation error of $O(h^{p+1})$ per step accumulates over $O(1/h)$ steps, and the Lipschitz constant of $f$ controls whether this accumulation is benign or catastrophic.
## The Initial Value Problem
We consider the initial value problem (IVP)
\begin{align*}
y'(t) = f(t, y(t)), \quad y(0) = y_0,
\end{align*}
where $f: [0,T] \times \mathbb{R}^N \to \mathbb{R}^N$ and $y_0 \in \mathbb{R}^N$. The Picard–Lindelöf theorem guarantees a unique solution $y \in C^1([0,T]; \mathbb{R}^N)$ provided $f$ is Lipschitz continuous in its second argument — a condition we now make precise.
[definition:Lipschitz Function]
A function $f: [0,T] \times \mathbb{R}^N \to \mathbb{R}^N$ is *Lipschitz with constant $\lambda \geq 0$* if
\begin{align*}
\|f(t, x) - f(t, \hat{x})\| \leq \lambda\,\|x - \hat{x}\|
\end{align*}
for all $t \in [0,T]$ and $x, \hat{x} \in \mathbb{R}^N$.
[/definition]
The Lipschitz condition ensures that small perturbations in the state $y$ produce proportionally small changes in the velocity $f$. This is essential for both the well-posedness of the IVP and the stability of numerical methods: if $f$ amplifies errors without bound, no discrete approximation can converge. Note that Lipschitz continuity is weaker than differentiability — $f$ need not be smooth — and the choice of norm is immaterial since all norms on $\mathbb{R}^N$ are equivalent.
## One-Step Methods and Convergence
The simplest class of numerical methods advances the solution one step at a time, using only the current state.
[definition:One-Step Method]
A numerical method is a *(explicit) one-step method* if the next iterate $y_{n+1}$ depends only on $(t_n, y_n)$:
\begin{align*}
y_{n+1} = \phi_h(t_n, y_n)
\end{align*}
for some function $\phi_h: [0,T] \times \mathbb{R}^N \to \mathbb{R}^N$.
[/definition]
[definition:Convergence]
A numerical method *converges* if $\lim_{h \to 0} \max_{0 \leq n \leq \lfloor T/h \rfloor} \|y_n(h) - y(nh)\| = 0$, where $y(nh)$ is the exact solution evaluated at $t = nh$.
[/definition]
[definition:Local Truncation Error and Order]
The *local truncation error* at step $n+1$ is $\eta_{n+1} = y(t_{n+1}) - \phi_h(t_n, y(t_n))$: the error of a single step starting from the exact solution. The *order* of the method is the largest integer $p \geq 1$ such that $\|\eta_{n+1}\| = O(h^{p+1})$ as $h \to 0$.
[/definition]
The distinction between local and global error is crucial. The local truncation error measures the accuracy of a *single step*; the global error $e_n = y_n - y(t_n)$ measures the accumulated effect over *all steps*. A method of order $p$ has local error $O(h^{p+1})$, but over $O(1/h)$ steps the global error is $O(h^p)$ — one order lower. The following theorem makes this precise for the simplest method.
[quotetheorem:488]
The bound reveals two competing effects. The factor $ch$ shows that the global error is first-order in $h$ — one order less than the $O(h^2)$ local truncation error, because errors accumulate over $O(1/h)$ steps. The exponential factor $e^{\lambda T}$ shows that the Lipschitz constant $\lambda$ controls error amplification: large $\lambda$ (steep vector field) leads to exponential growth of the error bound. This exponential sensitivity is fundamental — it is not an artefact of the analysis but reflects genuine instability in the underlying ODE when $\lambda$ is large.
[citeproof:488]
[example:Euler's Method Applied to the Scalar Exponential]
Consider $y' = -y$, $y(0) = 1$ on $[0, 1]$, with exact solution $y(t) = e^{-t}$. Here $\lambda = 1$. Euler's method gives $y_{n+1} = y_n - hy_n = (1-h)y_n$, so $y_n = (1-h)^n$. At $t = 1$ (i.e. $n = 1/h$), $y_n = (1-h)^{1/h}$. As $h \to 0$, $(1-h)^{1/h} \to e^{-1}$, confirming convergence. The error is $|e^{-1} - (1-h)^{1/h}| = O(h)$, consistent with the first-order bound.
[/example]
## The $\theta$-Method
A natural generalisation of Euler's method interpolates between evaluating $f$ at the beginning and end of the step.
[definition:$\theta$-Method]
For $\theta \in [0,1]$, the *$\theta$-method* is
\begin{align*}
y_{n+1} = y_n + h\Big(\theta\,f(t_n, y_n) + (1-\theta)\,f(t_{n+1}, y_{n+1})\Big).
\end{align*}
[/definition]
When $\theta = 1$, this is explicit (forward) Euler. When $\theta = 0$, it is implicit (backward) Euler — $y_{n+1}$ appears on both sides, requiring the solution of a nonlinear equation at each step. When $\theta = 1/2$, it is the trapezoidal rule (also called Crank–Nicolson in the PDE literature). Taylor expansion shows that only $\theta = 1/2$ achieves second order; all other values give first-order methods. The advantage of implicit methods ($\theta < 1$) is *stability*: they can handle stiff equations where explicit methods require impractically small step sizes (see the [Stiff Equations](/pages/stiff-equations) section).
## Linear Multi-Step Methods
One-step methods use only the current state $y_n$ to compute $y_{n+1}$. Multi-step methods use a window of $s$ previous values $y_n, y_{n-1}, \dots, y_{n-s+1}$, gaining accuracy by exploiting the trajectory's history.
An $s$-step linear multi-step method has the form
\begin{align*}
\sum_{\ell=0}^s \rho_\ell\,y_{n+\ell} = h\sum_{\ell=0}^s \sigma_\ell\,f(t_{n+\ell}, y_{n+\ell}),
\end{align*}
where the characteristic polynomials $\rho(w) = \sum_\ell \rho_\ell w^\ell$ and $\sigma(w) = \sum_\ell \sigma_\ell w^\ell$ define the method, normalised so that $\rho_s = 1$. The method is *explicit* if $\sigma_s = 0$ (so $y_{n+s}$ can be computed directly) and *implicit* otherwise.
The order conditions for multi-step methods can be expressed elegantly using generating functions.
[quotetheorem:489]
This reformulation is extremely convenient: instead of checking $p+1$ separate algebraic conditions, one expands a single function $\rho(e^x) - x\sigma(e^x)$ in powers of $x$ and reads off the order from where the expansion begins. It also guides the *design* of methods: given $\rho$ (or $\sigma$), the remaining polynomial is determined by requiring maximal order.
[citeproof:489]
Two important families arise from specific structural choices:
**Adams methods** fix $\rho(w) = w^{s-1}(w-1)$ (a simple differencing operator) and choose $\sigma$ for maximal order. If $\sigma_s = 0$ (explicit), the result is an *Adams–Bashforth* method; if $\sigma_s \neq 0$ (implicit), it is an *Adams–Moulton* method. The $s$-step Adams–Bashforth method has order $s$, while the $s$-step Adams–Moulton method has order $s+1$ — one extra order from the implicit evaluation.
**Backward differentiation formulas (BDF)** fix $\sigma(w) = \sigma_s w^s$ (using only $f$ at the newest time level) and choose $\rho$ for maximal order. The $s$-step BDF method has order $s$. BDF methods are particularly suited to stiff problems because their stability regions are large.
[example:Adams–Bashforth 2-Step Method (AB2)]
The AB2 method is
\begin{align*}
y_{n+2} = y_{n+1} + \tfrac{h}{2}\big(3f(t_{n+1}, y_{n+1}) - f(t_n, y_n)\big).
\end{align*}
Here $\rho(w) = w^2 - w$ and $\sigma(w) = \frac{1}{2}(3w - 1)$ (with $\sigma_2 = 0$, so explicit). The [generating polynomial test](/theorems/489) gives $\rho(e^x) - x\sigma(e^x) = (e^{2x} - e^x) - x \cdot \frac{1}{2}(3e^x - 1) = \frac{5}{12}x^3 + O(x^4)$, confirming order $p = 2$. The roots of $\rho(w) = w(w-1) = 0$ are $w = 0$ and $w = 1$, both satisfying the root condition.
[/example]
## The Dahlquist Equivalence Theorem
The central question for multi-step methods is: when does the method converge? The answer separates into two independent conditions — one algebraic (order), one analytic (stability).
[definition:Root Condition]
The *root condition* holds for the polynomial $\rho(w) = \sum_{\ell=0}^s \rho_\ell w^\ell$ if every root $w$ of $\rho(w) = 0$ satisfies $|w| \leq 1$, and any root with $|w| = 1$ is simple.
[/definition]
The root condition is a *stability* requirement: it ensures that the homogeneous recurrence $\sum_\ell \rho_\ell y_{n+\ell} = 0$ (the "parasitic" solution introduced by the multi-step format) does not grow without bound. Violation of the root condition produces spurious oscillations that overwhelm the true solution.
[quotetheorem:490]
This is one of the most important results in numerical analysis. It cleanly decomposes convergence into *consistency* (the method has enough accuracy, order $\geq 1$) and *zero-stability* (the method does not amplify parasitic modes, root condition). Neither condition alone suffices: a method can be high-order yet divergent if the root condition fails, or stable yet inaccurate if the order is zero. The theorem also explains why some natural-looking multi-step formulas are useless in practice — for instance, the midpoint rule $y_{n+2} = y_n + 2hf(t_{n+1}, y_{n+1})$ has order 2 and $\rho(w) = w^2 - 1$ with roots $\pm 1$, both simple, so it satisfies the root condition and converges. But the "leapfrog with memory" rule $y_{n+2} = -y_{n+1} + 2y_n + h(\cdots)$ would have $\rho(w) = w^2 + w - 2$ with a root at $w = -2$, violating the root condition.
[citeproof:490]
## Runge–Kutta Methods
Multi-step methods gain accuracy by using the *history* of the trajectory. Runge–Kutta (RK) methods take a different approach: they gain accuracy by evaluating $f$ at multiple *intermediate* points within a single step, without retaining values from previous steps.
[definition:Runge–Kutta Method]
A *$\nu$-stage Runge–Kutta method* computes
\begin{align*}
y_{n+1} = y_n + h\sum_{\ell=1}^\nu b_\ell\,k_\ell, \quad \text{where} \quad k_\ell = f\!\left(t_n + c_\ell h,\; y_n + h\sum_{j=1}^\nu a_{\ell j}\,k_j\right),
\end{align*}
with $c_\ell = \sum_{j=1}^\nu a_{\ell j}$. The method is encoded by its *Butcher tableau*:
\begin{align*}
\begin{array}{c|c} c & A \\ \hline & b^T \end{array}
\end{align*}
where $A = (a_{\ell j})$, $b = (b_1, \dots, b_\nu)^T$, $c = (c_1, \dots, c_\nu)^T$. The method is *explicit* if $A$ is strictly lower triangular ($a_{\ell j} = 0$ for $j \geq \ell$); otherwise it is *implicit*.
[/definition]
RK methods are *self-starting* (they need only $y_0$, not a window of previous values) and *modular* (the number of stages can be adjusted independently of the problem). The price of high order is multiple $f$-evaluations per step: a $\nu$-stage method requires $\nu$ evaluations. For explicit methods, the maximum achievable order with $\nu$ stages is $p = \nu$ for $\nu \leq 4$; for $\nu \geq 5$, the order is strictly less than $\nu$ due to algebraic constraints (the "Butcher barriers").
[example:The Classical Fourth-Order Runge–Kutta Method (RK4)]
The most widely used explicit RK method is the four-stage, fourth-order scheme:
\begin{align*}
k_1 &= f(t_n, y_n), \\
k_2 &= f(t_n + \tfrac{h}{2}, y_n + \tfrac{h}{2}k_1), \\
k_3 &= f(t_n + \tfrac{h}{2}, y_n + \tfrac{h}{2}k_2), \\
k_4 &= f(t_n + h, y_n + hk_3), \\
y_{n+1} &= y_n + \tfrac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4).
\end{align*}
This is Simpson's rule applied to $\int_{t_n}^{t_{n+1}} f\,dt$, with the intermediate values $k_2, k_3$ providing the midpoint evaluation. Four function evaluations per step yield fourth-order accuracy — the best possible for a four-stage explicit RK method.
[/example]
[example:Second-Order Runge–Kutta and Order Barriers]
A two-stage explicit RK method has the form $y_{n+1} = y_n + h(b_1k_1 + b_2k_2)$ with $k_1 = f(t_n, y_n)$ and $k_2 = f(t_n + c_2 h, y_n + c_2 hk_1)$. Expanding via Taylor series and matching terms up to $O(h^2)$ gives the conditions $b_1 + b_2 = 1$ and $b_2 c_2 = 1/2$, which form a one-parameter family of second-order methods. The choice $b_1 = b_2 = 1/2$, $c_2 = 1$ gives the modified Euler (trapezoidal) method; $b_1 = 0$, $b_2 = 1$, $c_2 = 1/2$ gives the midpoint method. No choice of parameters can achieve order 3 with only two stages.
[/example]
[problem]
Consider the 2-step method $y_{n+2} + \alphay_{n+1} + \betay_n = h\sigma_1f(t_{n+1}, y_{n+1})$ (an implicit one-point method). Use the [generating polynomial condition](/theorems/489) to determine $\alpha$, $\beta$, and $\sigma_1$ for maximum order, and then check the [root condition](/theorems/490) to determine whether the method converges.
[/problem]
[solution]
**Step 1: Order conditions.** Here $\rho(w) = w^2 + \alpha w + \beta$ and $\sigma(w) = \sigma_1 w$. The generating function is
\begin{align*}
\rho(e^x) - x\sigma(e^x) = e^{2x} + \alpha e^x + \beta - \sigma_1 x e^x.
\end{align*}
Expanding: $e^{2x} = 1 + 2x + 2x^2 + \frac{4}{3}x^3 + \cdots$ and $xe^x = x + x^2 + \frac{1}{2}x^3 + \cdots$. Collecting powers:
\begin{align*}
x^0: &\quad 1 + \alpha + \beta = 0, \\
x^1: &\quad 2 + \alpha - \sigma_1 = 0, \\
x^2: &\quad 2 + \frac{\alpha}{2} - \sigma_1 = 0.
\end{align*}
From $x^1$ and $x^2$: $\sigma_1 = 2 + \alpha$ and $\sigma_1 = 2 + \alpha/2$, giving $\alpha = 0$, $\sigma_1 = 2$, $\beta = -1$.
**Step 2: Check order.** With these values, $\rho(e^x) - x\sigma(e^x) = e^{2x} - 1 - 2xe^x = (1 + 2x + 2x^2 + \frac{4}{3}x^3 + \cdots) - 1 - (2x + 2x^2 + x^3 + \cdots) = \frac{1}{3}x^3 + O(x^4)$, confirming order $p = 2$.
**Step 3: Root condition.** $\rho(w) = w^2 - 1 = (w-1)(w+1)$. The roots are $w = 1$ and $w = -1$, both simple and on the unit circle. The root condition holds. By the [Dahlquist Equivalence Theorem](/theorems/490), the method converges.
This is the *midpoint rule* (also called the leapfrog method): $y_{n+2} - y_n = 2hf(t_{n+1}, y_{n+1})$. Despite convergence, the root $w = -1$ introduces a parasitic oscillatory mode that can cause practical difficulties on long time intervals.
[/solution]
# Stiff Equations
The [Dahlquist Equivalence Theorem](/theorems/490) guarantees that a convergent multi-step method will eventually produce accurate results as $h \to 0$. But "eventually" can be a long wait. For many systems arising in practice — chemical kinetics, electrical circuits, semi-discretisations of parabolic PDEs — the step size required for *stability* is orders of magnitude smaller than the step size required for *accuracy*. A system whose solution varies on a timescale of seconds may contain transient components that decay on a timescale of microseconds; an explicit method must resolve the fastest transient to remain stable, even though it contributes nothing to the solution after a brief initial layer.
This phenomenon is called *stiffness*, and it represents the central practical limitation of explicit numerical methods. The diagnosis of stiffness — and the design of methods that overcome it — requires a new conceptual framework centred on *stability domains* and *A-stability*. The key tool is the scalar test equation $y' = \lambda y$ with $\lambda \in \mathbb{C}$, $\operatorname{Re}(\lambda) < 0$: the simplest equation that decays, and a model for each eigenmode of a linear system. Applying a numerical method to this equation produces a recurrence $y_n = r(z)^n y_0$ where $z = h\lambda$ and $r(z)$ is the *stability function* of the method. The question is: for which values of $z$ does $|r(z)| < 1$ (so that $y_n \to 0$, matching the exact decay)?
## Linear Stability
[definition:Linear Stability Domain]
Given a numerical method with stability function $r(z)$, the *linear stability domain* is
\begin{align*}
D = \{z \in \mathbb{C} : |r(z)| < 1\}.
\end{align*}
Here $r(z)$ is determined by applying the method to $y' = \lambda y$ and writing $y_{n+1} = r(z)\,y_n$ with $z = h\lambda$. For one-step methods $r$ is typically a polynomial or rational function; for multi-step methods, the condition generalises to requiring all roots of the characteristic equation (in $w$, with $z$ as parameter) to lie inside the unit disk.
[/definition]
The exact solution of $y' = \lambda y$ decays if and only if $\operatorname{Re}(\lambda) < 0$, i.e. $\lambda$ lies in the open left half-plane $\mathbb{C}^-$. For the numerical solution to mimic this decay, we need $z = h\lambda \in D$. If $|\lambda|$ is large (fast decay), then $|z| = h|\lambda|$ is large, and $z$ may fall outside $D$ unless $h$ is very small. This is the mechanism of stiffness: the method is forced to take tiny steps not for accuracy but to keep $z$ inside the stability domain.
[example:Stability Domain of Forward Euler]
Forward Euler has $r(z) = 1 + z$, so $D = \{z : |1+z| < 1\}$ — the open unit disk centred at $-1$. For a real eigenvalue $\lambda = -1000$ (rapid decay), stability requires $|1 + h\lambda| < 1$, i.e. $0 < h < 2/1000 = 0.002$. The solution $e^{-1000t}$ is negligible after $t \approx 0.01$, yet forward Euler is forced to use $h < 0.002$ for *all* subsequent time, even when the solution is nearly constant.
[/example]
[example:Stability Domain of Backward Euler]
Backward Euler has $r(z) = 1/(1-z)$, so $D = \{z : |1-z| > 1\}$ — the exterior of the unit disk centred at $1$. Since $\operatorname{Re}(z) < 0$ implies $|1-z| > 1$, the entire left half-plane is contained in $D$: backward Euler is A-stable (defined below). For $\lambda = -1000$, *any* $h > 0$ gives $|r(z)| = |1/(1+1000h)| < 1$, so the step size is limited only by accuracy, not stability.
[/example]
## A-Stability
The examples above motivate a stability criterion that eliminates step-size restrictions for decaying problems.
[definition:A-stability]
A numerical method is *A-stable* if its linear stability domain contains the entire open left half-plane: $\mathbb{C}^- \subseteq D$.
[/definition]
A-stability guarantees that for any system $y' = Ay$ with all eigenvalues of $A$ in $\mathbb{C}^-$, the numerical solution decays for *any* step size $h > 0$. The step size is then chosen purely for accuracy, not stability — exactly what is needed for stiff problems.
Verifying A-stability requires checking $|r(z)| \leq 1$ for all $z$ with $\operatorname{Re}(z) \leq 0$. This seems like an infinite-dimensional condition, but the [Maximum Modulus Principle](/theorems/491) reduces it to a [boundary](/page/Boundary) check.
[quotetheorem:491]
Applied to A-stability: if the stability function $r(z)$ is analytic in $\mathbb{C}^-$ (i.e., $r$ has no poles in the left half-plane), then $|r(z)| \leq 1$ on $\mathbb{C}^-$ if and only if $|r(it)| \leq 1$ for all $t \in \mathbb{R}$ (the imaginary axis) and $\limsup_{|z| \to \infty, z \in \mathbb{C}^-} |r(z)| \leq 1$. This reduces A-stability verification to checking a one-dimensional condition.
[citeproof:491]
[example:A-stability of the Trapezoidal Rule]
The trapezoidal rule ($\theta = 1/2$ in the $\theta$-method) has stability function $r(z) = (1 + z/2)/(1 - z/2)$. On the imaginary axis $z = it$:
\begin{align*}
|r(it)| = \left|\frac{1 + it/2}{1 - it/2}\right| = \frac{|1 + it/2|}{|1 - it/2|} = \frac{\sqrt{1 + t^2/4}}{\sqrt{1 + t^2/4}} = 1.
\end{align*}
Since $r$ has its only pole at $z = 2$ (in the right half-plane), $r$ is analytic in $\mathbb{C}^-$. By the [Maximum Modulus Principle](/theorems/491), $|r(z)| \leq 1$ on $\mathbb{C}^-$ (with equality only on the boundary $i\mathbb{R}$, since $r$ is non-constant). The trapezoidal rule is A-stable and second-order.
[/example]
[example:Using the Maximum Modulus Principle for a Rational Stability Function]
Consider a method with $r(z) = (6 - 2z)/(6 - 4z + z^2)$. The poles are at $z = 2 \pm \sqrt{2}i$, both in the right half-plane, so $r$ is analytic in $\mathbb{C}^-$. On $z = it$: $|r(it)|^2 = (36 + 4t^2)/((6-t^2)^2 + 16t^2)$. Expanding the denominator: $(6-t^2)^2 + 16t^2 = 36 - 12t^2 + t^4 + 16t^2 = 36 + 4t^2 + t^4$. Therefore $|r(it)|^2 = (36 + 4t^2)/(36 + 4t^2 + t^4) \leq 1$. As $|z| \to \infty$, $|r(z)| \sim 2/|z| \to 0$. By the [Maximum Modulus Principle](/theorems/491), the method is A-stable.
[/example]
## Dahlquist's Second Barrier
A-stability is clearly desirable for stiff problems. Can we combine it with high order? The following result shows there is a fundamental trade-off.
[quotetheorem:492]
This is a striking limitation. One might hope to construct an A-stable Adams or BDF method of order 4 or 6, but the theorem forbids it. Among all A-stable multi-step methods of order 2, the trapezoidal rule is optimal (smallest error constant), further justifying its central role.
The barrier explains why the *practical* approach to stiff problems takes two directions. First, one can relax A-stability to weaker conditions: BDF methods of order up to 6 satisfy $A(\alpha)$-stability (stability in a wedge $|\arg(-z)| < \alpha$ rather than the full half-plane), which suffices for many applications. Second, one can leave the multi-step framework entirely: *implicit Runge–Kutta methods* (such as Gauss–Legendre or Radau IIA methods) can be both A-stable and of arbitrarily high order, since the Dahlquist barrier applies only to multi-step methods.
[citeproof:492]
## L-Stability
A-stability guarantees decay for all $\operatorname{Re}(z) < 0$, but it does not specify *how fast* the numerical solution decays for very stiff components ($|z| \gg 1$). The trapezoidal rule, for instance, has $|r(it)| = 1$ on the imaginary axis and $|r(z)| \to 1$ as $\operatorname{Re}(z) \to -\infty$ — it damps very stiff components only slowly, which can introduce parasitic oscillations.
[definition:L-stability]
An A-stable method is *L-stable* if its stability function satisfies $r(z) \to 0$ as $\operatorname{Re}(z) \to -\infty$.
[/definition]
L-stability ensures that very stiff components are strongly damped in a single step, matching the qualitative behaviour of the exact solution (which decays to machine precision in a single timestep when $|h\lambda| \gg 1$). Backward Euler is L-stable ($r(z) = 1/(1-z) \to 0$), but the trapezoidal rule is not ($|r(z)| \to 1$). The Radau IIA methods (implicit Runge–Kutta) are both L-stable and high-order — making them the methods of choice for very stiff problems in production ODE solvers.
[problem]
Show that the $\theta$-method is A-stable if and only if $\theta \leq 1/2$, and L-stable if and only if $\theta = 0$ (backward Euler).
[/problem]
[solution]
**Step 1: Stability function.** Applied to $y' = \lambda y$, the $\theta$-method gives $y_{n+1} = y_n + h\lambda(\theta y_n + (1-\theta)y_{n+1})$, so $y_{n+1}(1 - (1-\theta)z) = (1 + \theta z)y_n$ and
\begin{align*}
r(z) = \frac{1 + \theta z}{1 - (1-\theta)z}.
\end{align*}
The pole is at $z = 1/(1-\theta)$, which lies in the right half-plane for $\theta < 1$, so $r$ is analytic in $\mathbb{C}^-$.
**Step 2: A-stability.** On $z = it$: $|r(it)|^2 = (1 + \theta^2 t^2)/(1 + (1-\theta)^2 t^2)$. This is $\leq 1$ iff $\theta^2 \leq (1-\theta)^2$ iff $|\theta| \leq |1-\theta|$ iff $\theta \leq 1/2$ (for $\theta \in [0,1]$). By the [Maximum Modulus Principle](/theorems/491), the method is A-stable iff $\theta \leq 1/2$.
**Step 3: L-stability.** As $\operatorname{Re}(z) \to -\infty$: $r(z) \to \theta/(-(1-\theta)) = -\theta/(1-\theta)$. This is zero iff $\theta = 0$. Therefore only backward Euler ($\theta = 0$) is L-stable among $\theta$-methods.
[/solution]
# Implementation of ODE Methods
The preceding sections established the *theory* of numerical ODE solvers: [convergence](/theorems/488) guarantees that discrete approximations approach the true solution, the [Dahlquist Equivalence Theorem](/theorems/490) characterises convergent multi-step methods, and [A-stability](/theorems/492) identifies methods that handle stiff systems without step-size restrictions. But theory alone does not produce a working solver. Two practical problems remain: *how do we choose the step size $h$?* and *how do we solve the nonlinear equations that implicit methods require?*
The first problem arises because the optimal step size depends on the unknown solution. Too large a step produces unacceptable error; too small a step wastes computation. The solution is *adaptive step-size control*: at each step, estimate the local error using a computable proxy, compare it to a tolerance, and adjust $h$ accordingly. The second problem arises because implicit methods like backward Euler or the trapezoidal rule define $y_{n+1}$ as the solution of a nonlinear equation involving $f(t_{n+1}, y_{n+1})$. Solving this equation requires iterative methods — and the choice between simple fixed-point iteration and Newton's method has profound consequences for stiff problems.
## Local Error Estimation
The key idea is to compute *two* approximations to $y(t_{n+1})$ using methods of the same order but different error constants, and use their difference as a proxy for the local truncation error.
Suppose two methods of order $p$ produce approximations $y_{n+1}^{(1)}$ and $y_{n+1}^{(2)}$ with local truncation errors
\begin{align*}
y(t_{n+1}) - y_{n+1}^{(1)} &= c_1\,h^{p+1}\,y^{(p+1)}(t_n) + O(h^{p+2}), \\
y(t_{n+1}) - y_{n+1}^{(2)} &= c_2\,h^{p+1}\,y^{(p+1)}(t_n) + O(h^{p+2}),
\end{align*}
where $c_1 \neq c_2$. Subtracting eliminates the unknown solution:
\begin{align*}
y_{n+1}^{(1)} - y_{n+1}^{(2)} = (c_2 - c_1)\,h^{p+1}\,y^{(p+1)}(t_n) + O(h^{p+2}).
\end{align*}
This computable difference is proportional to the leading error term, so the local truncation error of method (2) is estimated by
\begin{align*}
y(t_{n+1}) - y_{n+1}^{(2)} \approx \frac{c_2}{c_2 - c_1}\left(y_{n+1}^{(1)} - y_{n+1}^{(2)}\right).
\end{align*}
This is *Milne's device*: pairing two methods of the same order but different error constants to extract a computable error estimate without evaluating any derivatives of $y$.
[example:Adams–Bashforth / Trapezoidal Rule Pair]
The AB2 method (order 2) has error constant $c_{\mathrm{AB}} = 5/12$, and the trapezoidal rule (order 2) has $c_{\mathrm{TR}} = -1/12$. Milne's device gives
\begin{align*}
y(t_{n+1}) - y_{n+1}^{\mathrm{TR}} \approx \frac{-1/12}{-1/12 - 5/12}\left(y_{n+1}^{\mathrm{AB}} - y_{n+1}^{\mathrm{TR}}\right) = \frac{1}{6}\left(y_{n+1}^{\mathrm{AB}} - y_{n+1}^{\mathrm{TR}}\right).
\end{align*}
In the predictor–corrector workflow, AB2 provides a cheap explicit *prediction* that also initialises the trapezoidal rule's implicit solve, and the difference between the two approximations monitors accuracy — all at essentially no extra cost.
[/example]
The same principle applies to Runge–Kutta methods via *embedded pairs*. An embedded RK pair uses two [sets](/page/Set) of weights $b$ and $\hat{b}$ applied to the *same* stage values $k_1, \dots, k_\nu$, producing approximations of orders $p$ and $\hat{p} = p+1$ (or vice versa) with no additional $f$-evaluations. The Dormand–Prince method (DOPRI5), used in MATLAB's `ode45`, embeds a fourth-order and fifth-order pair in a seven-stage framework, using the difference to estimate the local error of the fourth-order solution.
## Adaptive Step-Size Control
Given an error estimate $e_{n+1} \approx y(t_{n+1}) - y_{n+1}$ and a user-specified tolerance $\varepsilon$, the step is *accepted* if $\|e_{n+1}\| \leq \varepsilon$ and *rejected* otherwise. In either case, the next step size is chosen to keep the error near $\varepsilon$. Since the leading error term scales as $h^{p+1}$, a step size $h_{\mathrm{new}}$ that would produce error exactly $\varepsilon$ satisfies
\begin{align*}
\varepsilon \approx \|e_{n+1}\| \cdot \left(\frac{h_{\mathrm{new}}}{h}\right)^{p+1}, \quad \text{so} \quad h_{\mathrm{new}} = h\left(\frac{\varepsilon}{\|e_{n+1}\|}\right)^{1/(p+1)}.
\end{align*}
In practice, a *safety factor* $\sigma \in (0.8, 0.95)$ is applied: $h_{\mathrm{new}} = \sigma\,h\,(\varepsilon/\|e_{n+1}\|)^{1/(p+1)}$, and the ratio $h_{\mathrm{new}}/h$ is clamped to prevent excessively large or small changes (typically between $0.2$ and $5$).
The adaptive algorithm thus works as follows at each step: (1) compute $y_{n+1}$ and an error estimate using the embedded pair; (2) if the error exceeds $\varepsilon$, reject the step, reduce $h$, and retry; (3) if the error is acceptable, advance and possibly increase $h$ for the next step. This feedback loop ensures that the solver spends computational effort where the solution varies rapidly and coasts through smooth regions.
## Solving Implicit Equations
Implicit methods require solving $y_{n+1} = y_n + hf(t_{n+1}, y_{n+1})$ (for backward Euler; analogous equations arise for the trapezoidal rule and implicit Runge–Kutta methods). This is a fixed-point equation $y_{n+1} = G(y_{n+1})$ with $G(x) = y_n + hf(t_{n+1}, x)$.
The simplest approach is *functional iteration* (also called Picard iteration): starting from an initial guess $y_{n+1}^{(0)}$ (typically the explicit Euler prediction), iterate $y_{n+1}^{(k+1)} = G(y_{n+1}^{(k)})$.
[quotetheorem:493]
The condition $h\lambda < 1$ is the crux of the problem. For stiff systems, the Lipschitz constant $\lambda$ can be enormous (e.g. $\lambda = 10^6$ for a fast chemical reaction), so $h < 1/\lambda$ would require microsecond steps — precisely the restriction that implicit methods were supposed to eliminate. Functional iteration reintroduces the stiffness constraint through the back door.
[citeproof:493]
The resolution is *Newton's method*. Define $F(x) = x - y_n - hf(t_{n+1}, x)$, so $y_{n+1}$ is a root of $F$. Newton iteration is
\begin{align*}
F'(y_{n+1}^{(k)})\,\Deltay^{(k)} &= -F(y_{n+1}^{(k)}), \\
y_{n+1}^{(k+1)} &= y_{n+1}^{(k)} + \Deltay^{(k)},
\end{align*}
where $F'(x) = I - hJ$ with $J = \nabla_{y}f(t_{n+1}, x)$. Each iteration requires solving an $N \times N$ linear system with matrix $I - hJ$ — a task addressed by the [Numerical Linear Algebra](/pages/numerical-linear-algebra) section. The crucial advantage is that Newton's method converges *quadratically* near the solution, and its convergence does not require $h\lambda < 1$. For stiff problems, the matrix $I - hJ$ is well-conditioned (when $\operatorname{Re}(\lambda) < 0$ and $h$ is large, $\|hJ\| \gg 1$ and the matrix is dominated by $-hJ$, which is invertible). In practice, 2–3 Newton iterations typically suffice.
A common economy is *simplified Newton iteration*: freeze the Jacobian $J$ at the beginning of the step (or even across several steps), factoring $I - hJ$ once and reusing the factorisation. This trades quadratic for linear convergence but dramatically reduces cost when $J$ changes slowly.
[example:Newton vs. Functional Iteration on a Stiff System]
For $y' = -1000y$, $y(0) = 1$, backward Euler gives $y_{n+1} = y_n/(1 + 1000h)$. Since $f$ is linear, Newton solves the implicit equation *exactly* in one step for any $h > 0$: $F(x) = x - y_n + 1000hx = (1 + 1000h)x - y_n$, so $x = y_n/(1+1000h)$. Functional iteration, by contrast, requires $1000h < 1$ (i.e. $h < 0.001$) to converge at all — even though accuracy requires only $h \approx 0.01$ or larger. For nonlinear stiff systems, Newton needs a few iterations rather than one, but the qualitative advantage persists: convergence is governed by the accuracy of the initial guess, not by the stiffness $\lambda$.
[/example]
## Predictor–Corrector Methods
The ideas above combine naturally in *predictor–corrector* schemes. A typical workflow for the AB2/trapezoidal pair:
(i) **Predict:** compute $y_{n+1}^{(0)}$ using the explicit AB2 formula (cheap, no system to solve).
(ii) **Correct:** use $y_{n+1}^{(0)}$ to initialise Newton iteration (or a single corrector step) for the implicit trapezoidal rule, yielding $y_{n+1}$.
(iii) **Estimate error:** $\|e_{n+1}\| \approx \frac{1}{6}\|y_{n+1}^{(0)} - y_{n+1}\|$.
(iv) **Adapt:** accept or reject the step and adjust $h$ using the formula above.
This PECE (Predict–Evaluate–Correct–Evaluate) cycle is the backbone of classical variable-step ODE codes. Modern codes (such as MATLAB's `ode15s` for stiff problems or `ode45` for non-stiff) replace the multi-step pair with embedded Runge–Kutta or BDF families, but the architectural pattern — predict, correct, estimate, adapt — remains the same.
[problem]
Using the AB2/trapezoidal pair, suppose at step $n$ we compute $y_{n+1}^{\mathrm{AB}} = 1.234$ and $y_{n+1}^{\mathrm{TR}} = 1.230$ with step size $h = 0.1$ and order $p = 2$. Estimate the local truncation error of the trapezoidal rule. If the tolerance is $\varepsilon = 10^{-3}$, compute the recommended new step size (with safety factor $\sigma = 0.9$).
[/problem]
[solution]
**Error estimate.** $\|e_{n+1}\| \approx \frac{1}{6}|1.234 - 1.230| = \frac{0.004}{6} \approx 6.67 \times 10^{-4}$.
**Acceptance.** Since $6.67 \times 10^{-4} < 10^{-3} = \varepsilon$, the step is accepted.
**New step size.** With $p = 2$:
\begin{align*}
h_{\mathrm{new}} = 0.9 \times 0.1 \times \left(\frac{10^{-3}}{6.67 \times 10^{-4}}\right)^{1/3} = 0.09 \times (1.5)^{1/3} \approx 0.09 \times 1.145 \approx 0.103.
\end{align*}
The solver can safely increase the step size by about 3%. If the error had been $3 \times 10^{-3} > \varepsilon$, the step would be rejected and retried with $h_{\mathrm{new}} = 0.09 \times (10^{-3}/3 \times 10^{-3})^{1/3} \approx 0.09 \times 0.693 \approx 0.062$ — a 38% reduction.
[/solution]
# Numerical Linear Algebra
At the heart of nearly every numerical method developed so far lies a linear system. Implicit ODE solvers require solving $(I - hJ)\,\Deltay = r$ at each Newton step (see [Functional Iteration](/theorems/493)). Least-squares approximation leads to normal equations. Gaussian quadrature nodes are eigenvalues of a tridiagonal matrix built from [orthogonal polynomial recurrence coefficients](/theorems/478). The efficiency of all these algorithms depends on a single question: *how do we solve $Ax = b$ quickly and stably?*
The answer lies in *matrix factorisation*: decomposing $A$ into a product of simpler matrices — typically triangular — for which the system can be solved by direct substitution. The fundamental factorisation is $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. Once computed (at a cost of $O(n^3)$ operations), this factorisation reduces $Ax = b$ to two triangular solves $Ly = b$ and $Ux = y$, each costing only $O(n^2)$. For symmetric positive-definite matrices — arising in least-squares problems, finite element methods, and Gaussian process regression — the factorisation simplifies to $A = LDL^T$ or the Cholesky form $A = GG^T$, halving the work. For banded matrices — ubiquitous in ODE and PDE discretisations — the factorisation preserves the band structure, reducing the cost to $O(nr^2)$ where $r$ is the bandwidth.
## Triangular Systems and Substitution
The building blocks of all direct methods are triangular matrices, whose zero structure allows sequential solution.
[definition:Triangular Matrix]
A matrix $A \in \mathbb{R}^{n \times n}$ is *upper triangular* if $A_{ij} = 0$ for $i > j$, and *lower triangular* if $A_{ij} = 0$ for $i < j$. It is *unit* lower (or upper) triangular if additionally $A_{ii} = 1$ for all $i$.
[/definition]
If $L$ is lower triangular with $L_{ii} \neq 0$, the system $Ly = b$ is solved by *forward substitution*: $y_1 = b_1/L_{11}$, and for $i = 2, \dots, n$,
\begin{align*}
y_i = \frac{1}{L_{ii}}\left(b_i - \sum_{j=1}^{i-1} L_{ij}\,y_j\right).
\end{align*}
Each $y_i$ depends only on previously computed values, so the cost is $\sum_{i=1}^n i = O(n^2)$ operations. *Backward substitution* for $Ux = y$ is analogous, proceeding from $x_n$ to $x_1$.
## LU Factorisation
[definition:LU Factorisation]
A matrix $A \in \mathbb{R}^{n \times n}$ admits an *LU factorisation* if $A = LU$ where $L$ is unit lower triangular and $U$ is upper triangular.
[/definition]
[quotetheorem:497]
The condition that all leading principal minors are non-zero is equivalent to requiring that Gaussian elimination succeeds without encountering a zero pivot. The uniqueness argument is elegant: if $A = L_1 U_1 = L_2 U_2$, then $L_2^{-1}L_1 = U_2 U_1^{-1}$ is simultaneously unit lower triangular and upper triangular, hence the identity.
[citeproof:497]
[example:Non-Existence of LU Factorisation]
The matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$ has $\det(A_1) = 0$. Attempting $A = LU$: the $(1,1)$ entry gives $u_{11} = 0$, and the $(2,1)$ entry gives $\ell_{21} \cdot 0 = 1$ — a contradiction.
[/example]
## Pivoting
[quotetheorem:498]
*Partial pivoting* — swapping the current row with the row having the largest absolute value in the current column — ensures that all multipliers satisfy $|L_{ij}| \leq 1$, controlling rounding error growth. The cost of LU factorisation is $\frac{2}{3}n^3 + O(n^2)$ multiplications. Once $PA = LU$ is computed, each solve requires only $2n^2 + O(n)$ operations.
[citeproof:498]
## Symmetric Positive-Definite Systems
[definition:Positive-Definite Matrix]
A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is *positive-definite* if $x^T Ax > 0$ for all $x \neq \mathbf{0}$.
[/definition]
[quotetheorem:499]
The $LDL^T$ factorisation halves the storage and arithmetic ($\frac{1}{3}n^3$ instead of $\frac{2}{3}n^3$). The positivity $d_k > 0$ means the factorisation succeeds *without pivoting*.
[citeproof:499]
[definition:Cholesky Factorisation]
The *Cholesky factorisation* of a symmetric positive-definite matrix $A$ is $A = GG^T$ where $G$ is lower triangular with $G_{ii} > 0$. It is related to $LDL^T$ by $G = LD^{1/2}$.
[/definition]
[example:Cholesky Factorisation of a $2 \times 2$ Matrix]
Let $A = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}$. Then $G_{11} = 2$, $G_{21} = 1$, $G_{22} = \sqrt{2}$, giving $G = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix}$.
[/example]
## Banded Systems
[definition:Band Matrix]
A matrix $A \in \mathbb{R}^{n \times n}$ has *bandwidth $r$* if $A_{ij} = 0$ whenever $|i - j| > r$.
[/definition]
[quotetheorem:500]
For the tridiagonal case ($r = 1$), the cost is $O(n)$, making it feasible to solve systems with millions of unknowns. The combination of bandwidth preservation with $LDL^T$ is particularly powerful for discretised elliptic operators.
[citeproof:500]
[problem]
Show that the inverse of a unit lower triangular matrix is also unit lower triangular.
[/problem]
[solution]
Let $L$ be $n \times n$ unit lower triangular. We prove $M = L^{-1}$ is unit lower triangular by induction on the row index.
**Base case.** Row 1 of $LM = I$: $L_{11}M_{1j} = M_{1j} = \delta_{1j}$. So $M_{11} = 1$ and $M_{1j} = 0$ for $j > 1$. $\checkmark$
**Inductive step.** Assume $M_{ij} = 0$ for $i < j$ and $M_{ii} = 1$ for all $i \leq k-1$. For $j > k$: by the inductive hypothesis $M_{\ell j} = 0$ for $\ell \leq k < j$, so the sum vanishes. For $j = k$: the sum reduces to $L_{kk}M_{kk} = 1$, giving $M_{kk} = 1$. $\checkmark$
Therefore $M$ is unit lower triangular. (This fact was used in the uniqueness argument of the [LU Existence Theorem](/theorems/497).)
[/solution]
# Linear Least Squares
The preceding sections addressed problems where the data is *exact*: interpolation passes through given points precisely, and ODE solvers track a well-defined trajectory. But in practice, data is often noisy, redundant, or overdetermined — there are more measurements than unknowns, and no single set of parameters fits all observations exactly. The natural question becomes: *what is the best fit?*
The linear least-squares problem provides the answer. Given an $m \times n$ matrix $A$ (with $m > n$) and a vector $b \in \mathbb{R}^m$, we seek $x^* \in \mathbb{R}^n$ minimising $\|Ax - b\|^2$. This framework encompasses polynomial fitting (where $A$ is a Vandermonde matrix, connecting directly to the [discrete least-squares theory](/theorems/479) of orthogonal polynomials), linear regression, and calibration problems throughout the sciences.
## The Normal Equations
[quotetheorem:501]
The normal equations $A^TAx^* = A^Tb$ are the algebraic translation of the geometric orthogonality condition $r^* \perp \operatorname{col}(A)$.
[citeproof:501]
[quotetheorem:502]
Full column rank ensures $A^TA$ is positive-definite, making it invertible and guaranteeing uniqueness. The system can be solved via [Cholesky](/theorems/499) at cost $\frac{1}{3}n^3 + mn^2$.
[citeproof:502]
[example:Rank-Deficient Least Squares]
For $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}$ and $b = (1, 2, 3)^T$: $A^TA = \begin{pmatrix} 3 & 3 \\ 3 & 3 \end{pmatrix}$ is singular. The normal equations give $x_1 + x_2 = 2$, a one-parameter family all achieving minimum residual $\|r^*\|^2 = 2$.
[/example]
## QR Factorisation
While the normal equations are theoretically clean, forming $A^TA$ *squares the condition number*: $\kappa(A^TA) = \kappa(A)^2$. The remedy is orthogonal transformations.
[definition:Orthogonal Matrix]
A matrix $Q \in \mathbb{R}^{m \times m}$ is *orthogonal* if $Q^TQ = I$.
[/definition]
[definition:QR Factorisation]
A *QR factorisation* of $A \in \mathbb{R}^{m \times n}$ ($m \geq n$) is $A = QR$ where $Q$ is orthogonal and $R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}$ with $R_1 \in \mathbb{R}^{n \times n}$ upper triangular.
[/definition]
[quotetheorem:503]
The proof exploits the norm-preserving property: $\|Ax - b\|^2 = \|Rx - Q^Tb\|^2$ splits into $\|R_1x - c\|^2 + \|d\|^2$, where $c = \tilde{Q}^Tb$. The first term vanishes by solving $R_1x = c$; the second is the irreducible residual. Total cost: $O(mn^2)$ with condition number $\kappa(A)$, not $\kappa(A)^2$.
[citeproof:503]
## Computing the QR Factorisation
Three standard algorithms: **Gram–Schmidt** (classical or modified, $2mn^2$ flops, can lose orthogonality), **Householder reflections** ($2mn^2 - \frac{2}{3}n^3$ flops, numerically superior, standard in LAPACK), and **Givens rotations** ($3mn^2$ flops, advantageous for sparse/banded matrices).
[example:QR Factorisation and Least Squares]
Let $A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{pmatrix}$, $b = (2, 0, 1)^T$. Gram–Schmidt: $q_1 = (1,1,1)^T/\sqrt{3}$, $R_{11} = \sqrt{3}$, $R_{12} = 0$, $q_2 = (1,-1,0)^T/\sqrt{2}$, $R_{22} = \sqrt{2}$. Then $c = (\sqrt{3}, \sqrt{2})^T$, giving $x^* = (1, 1)^T$ with $\|r^*\| = 2$.
[/example]
## Connection to Polynomial Least Squares
The [discrete least-squares theory via orthogonal polynomials](/theorems/479) is a special case of the QR framework. When the polynomial basis is orthogonal, $A$ already has orthogonal columns: $A^TA$ is diagonal and the normal equations decouple. Computing orthogonal polynomials via the [three-term recurrence](/theorems/478) is equivalent to performing Gram–Schmidt on the Vandermonde-like matrix. Using the monomial basis produces a Vandermonde matrix whose condition number grows exponentially with $n$ — a dramatic illustration of why the choice of basis matters for numerical stability.
[problem]
Given $A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{pmatrix}$ and $b = (2, 0, 1)^T$, verify that the normal equations give the same answer as QR, and compare the condition numbers.
[/problem]
[solution]
**Normal equations.** $A^TA = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$, $A^Tb = (3, 2)^T$, giving $x^* = (1, 1)^T$. $\checkmark$
**QR approach.** $\tilde{Q}^Tb = (\sqrt{3}, \sqrt{2})^T$, $R_1^{-1}(\sqrt{3}, \sqrt{2})^T = (1, 1)^T$. $\checkmark$
**Condition numbers.** The singular values of $A$ are $\sqrt{3}$ and $\sqrt{2}$, so $\kappa(A) = \sqrt{3/2} \approx 1.22$. For $A^TA$: $\kappa(A^TA) = 3/2 = \kappa(A)^2$. For a matrix with $\kappa(A) = 10^8$, the normal equations would produce $\kappa(A^TA) = 10^{16}$ — exceeding double-precision accuracy — while QR remains stable.
[/solution]