Differential equations sit at the centre of applied mathematics: Newton's second law, the [heat equation](/page/Heat%20Equation), population dynamics, electrical circuits — each reduces to a relationship between an unknown function and its derivatives. Solving such equations requires not only ingenuity specific to each problem type, but a robust toolkit of calculus techniques that recur across every chapter of the subject. This course develops that toolkit alongside the differential equations it serves, beginning with single-variable approximation and [differentiation](/page/Derivative), progressing through first-order nonlinear equations and higher-order linear theory, and culminating in multivariable dynamical systems and an introduction to partial differential equations.
A recurring theme throughout is the interplay between *exact* and *approximate*. We will frequently encounter equations we cannot solve in closed form, and will instead linearise — replacing a complicated nonlinear problem with a simpler linear one that captures the essential behaviour near a point of interest. The quality of this linearisation depends on controlling the error, and for that we need Taylor's theorem, asymptotic notation, and the algebraic rules of differentiation. These are the subjects of the first chapter.
# Basic Calculus
The material in this chapter is a review of tools from Analysis I, recast in the language of asymptotic notation that will be used throughout the course. The three pillars are: big-$O$ and little-$o$ notation for controlling errors and comparing rates of growth; the algebraic differentiation rules (chain, product, quotient, Leibniz) for computing derivatives of composite expressions; and Taylor's theorem, which converts local derivative information into global polynomial approximation with explicit error bounds. These tools are not merely background — they are the engine behind perturbation analysis (Chapter 3), the method of detuning (Chapter 4), [series](/page/Series) solutions (Chapter 4, §11), and the multivariable Taylor expansion that classifies stationary points (Chapter 5).
## Asymptotic Notation
Suppose we approximate $\sin(x)$ near $x = 0$ by its first Taylor polynomial: $\sin(x) \approx x$. How good is this approximation? For $x = 0.1$, the error is roughly $1.7 \times 10^{-4}$; for $x = 0.01$, it drops to $1.7 \times 10^{-7}$. The error is shrinking much faster than $x$ itself — in fact, it behaves like $x^3/6$. To express this precisely, we need a language for comparing the sizes of [functions](/page/Function) near a point. This is what asymptotic notation provides, and it will be indispensable when we truncate Taylor expansions in perturbation arguments later in the course.
[definition: Little O Notation]
Let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be functions defined in a neighbourhood of $x_0$ with $g(x) \neq 0$ near $x_0$. We write
\begin{align*}
f(x) = o(g(x)) \quad \text{as } x \to x_0
\end{align*}
if
\begin{align*}
\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0.
\end{align*}
Informally, $f(x)$ is *much smaller* than $g(x)$ as $x \to x_0$.
[/definition]
The mnemonic is: "the ratio *o*ver inside goes to zer*o*."
[definition: Big O Notation]
Let $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ be functions defined in a neighbourhood of $x_0$. We write
\begin{align*}
f(x) = O(g(x)) \quad \text{as } x \to x_0
\end{align*}
if there exist $\delta > 0$ and $M > 0$ such that for all $x$ with $0 < |x - x_0| < \delta$,
\begin{align*}
|f(x)| \leq M |g(x)|.
\end{align*}
[/definition]
The mnemonic: "the rati*O* *O*ver inside is b*O*unded by a constant." The definition extends to behaviour at infinity by replacing the punctured disc condition with $x > N$ for some $N > 0$.
The two notations are related by a strict hierarchy: little-$o$ implies big-$O$, but not conversely.
[remark: Relationship Between O And o]
If $f(x) = o(g(x))$ as $x \to x_0$, then $f(x) = O(g(x))$ as $x \to x_0$. The converse fails. For instance, $x^2 = O(x^2)$ as $x \to 0$ (take $M = 1$), but $x^2 \neq o(x^2)$ since $x^2/x^2 = 1 \not\to 0$. In this sense, big-$O$ is the weaker statement — it says $f$ is *at most as large as* $g$, while little-$o$ says $f$ is *negligible* compared to $g$.
[/remark]
[example: Polynomial Growth Rate]
Consider $f(x) = x^2 + x$. We claim $f(x) = O(x^2)$ as $x \to \infty$. For $x > 1$:
\begin{align*}
|x^2 + x| = x^2 + x < x^2 + x^2 = 2x^2,
\end{align*}
so the bound holds with $M = 2$ and $N = 1$. More generally, for any polynomial $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ with $a_n \neq 0$:
\begin{align*}
a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 = O(x^n) \quad \text{as } x \to \infty,
\end{align*}
since each lower-order term $a_k x^k$ satisfies $|a_k x^k| \leq |a_k| x^n$ for $x \geq 1$.
[/example]
The distinction between $O$ and $o$ becomes critical when we discuss error terms in Taylor's theorem — the stronger estimate $O(h^{n+1})$ carries more information than the weaker $o(h^n)$, and we will need to understand exactly why. This is the subject of the next section, where asymptotic notation provides the natural language for Taylor remainder estimates.
## Differentiation Rules
The derivative of a function $f: \mathbb{R} \to \mathbb{R}$ at a point $x_0$ is defined as the limit of the difference quotient, when it exists:
[definition: Derivative]
Let $f: I \to \mathbb{R}$ be defined on an open interval $I \subseteq \mathbb{R}$. The **derivative** of $f$ at $x_0 \in I$ is
\begin{align*}
f'(x_0) := \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h},
\end{align*}
provided this limit exists. For the derivative to exist, the left and right [limits](/page/Limit) must agree:
\begin{align*}
\lim_{h \to 0^-} \frac{f(x_0 + h) - f(x_0)}{h} = \lim_{h \to 0^+} \frac{f(x_0 + h) - f(x_0)}{h}.
\end{align*}
[/definition]
This is a strong smoothness requirement. Functions with corners, cusps, or vertical tangents fail to be differentiable at those points, even if they are continuous everywhere.
[example: Non Differentiability At A Corner]
The function $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$ is continuous at $x = 0$ but not differentiable there. The one-sided limits of the difference quotient disagree:
\begin{align*}
\lim_{h \to 0^-} \frac{|h|}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1, \qquad \lim_{h \to 0^+} \frac{|h|}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1.
\end{align*}
The graph has a corner at the origin: the left half has slope $-1$ and the right half has slope $+1$, so no single tangent line exists. This function will reappear in a more sophisticated guise when we study the Sobolev-space perspective on weak derivatives — the equation $u' = \operatorname{sgn}(x)$ has $u(x) = |x|$ as a solution in a [distributional](/page/Distribution) sense, even though the classical derivative fails at $x = 0$.
[/example]
Using the definition directly to differentiate composite expressions is impractical. The algebraic rules — collected in the following theorem — reduce the problem to differentiating elementary building blocks.
In terms of asymptotic notation, differentiability at $x_0$ means precisely that $f(x_0 + h) = f(x_0) + hf'(x_0) + o(h)$ as $h \to 0$: the function is approximated by its tangent line, with an error that is $o(h)$.
[quotetheorem:198]
The [Algebra of Derivatives](/theorems/198) contains the four fundamental rules: the sum rule, the product rule, the quotient rule, and the chain rule. The product rule proof illustrates a technique we will use repeatedly — expanding both factors via their linear approximations ($u(x+h) = u(x) + hu'(x) + o(h)$, similarly for $v$), multiplying out, and observing that cross terms involving $o(h)$ vanish after dividing by $h$. The chain rule is more delicate: the naive cancellation argument $\frac{\Delta f}{\Delta g} \cdot \frac{\Delta g}{\Delta x}$ fails when $\Delta g = 0$, and the correct proof introduces an auxiliary error function $\phi$ that is continuous at $g(x_0)$ and absorbs the problematic case. Both proofs reward careful reading — they are the prototypes for the more involved perturbation arguments in Chapter 3.
[citeproof:198]
The product rule generalises to higher derivatives via the Leibniz rule, which has the same combinatorial structure as the binomial theorem — a pattern that is not coincidental, since repeated application of the product rule generates the same Pascal's-triangle coefficients.
[quotetheorem:826]
The [General Leibniz Rule](/theorems/826) is proved by induction on $n$. The base case is the product rule, and the inductive step uses Pascal's identity $\binom{n}{r} + \binom{n}{r-1} = \binom{n+1}{r}$ to combine coefficients after differentiating the $n$-th formula. Although we will not use the Leibniz rule directly in our ODE work, it appears naturally in the theory of linear differential operators and in the analysis of Green's functions.
[citeproof:826]
## Taylor's Theorem
The differentiation rules from the previous section compute derivatives efficiently, but derivatives give only *infinitesimal* information — the slope, curvature, and higher-order bending of a function at a single point. Taylor's theorem is the bridge from this local data to *semi-global* approximation: it reconstructs the values of $f$ near $x_0$ from the derivatives $f^{(k)}(x_0)$, with an explicit error bound that tells us the cost of truncation.
This theorem is arguably the single most important tool in the course. Every time we linearise a nonlinear ODE around an equilibrium (Chapter 3), every time we detune a degenerate characteristic equation (Chapter 4), and every time we expand a coefficient in a Frobenius series (Chapter 4, §11), we are applying Taylor's theorem and controlling the remainder. Understanding the distinction between the weak remainder estimate $o(h^n)$ and the strong estimate $O(h^{n+1})$ is essential for knowing when a linearisation is justified.
Suppose we want to solve $\frac{dy}{dt} = \sin(y)$ near the equilibrium $y = 0$. Replacing $\sin(y)$ by $y$ (its first Taylor polynomial) gives the linearised equation $\frac{dy}{dt} = y$, which we can solve explicitly. But what is the error? Taylor's theorem tells us $\sin(y) = y + E_1$ where $E_1 = o(y)$ — the error is negligible compared to $y$ for small $y$. If we need a sharper bound, the stronger form gives $E_1 = O(y^3)$, meaning the linearisation is accurate to within a constant times $y^3$. Without this control, perturbation analysis would be guesswork.
[quotetheorem:827]
[Taylor's Theorem](/theorems/827) comes in two strengths, and the difference matters. The basic version — requiring only $n$-times differentiability — gives a remainder of $o(h^n)$: the error vanishes faster than $h^n$, but we have no explicit bound on *how much* faster. The stronger version — requiring the existence of $f^{(n+1)}$ — upgrades this to $O(h^{n+1})$ with the precise Lagrange form $E_n = f^{(n+1)}(\xi) h^{n+1}/(n+1)!$, where $\xi$ lies between $x_0$ and $x_0 + h$.
### Why $O(h^{n+1})$ Is Stronger Than $o(h^n)$
The claim that $O(h^{n+1})$ implies $o(h^n)$ is straightforward: if $|g(h)| \leq M h^{n+1}$ for $0 < |h| < \delta$, then
\begin{align*}
\left|\frac{g(h)}{h^n}\right| \leq \frac{M h^{n+1}}{h^n} = Mh \to 0 \quad \text{as } h \to 0,
\end{align*}
so $g = o(h^n)$. The converse fails: for $0 < a < 1$, the function $h^{n+a}$ satisfies $h^{n+a} = o(h^n)$ (since $h^{n+a}/h^n = h^a \to 0$), but $h^{n+a} \neq O(h^{n+1})$, because the inequality $h^{n+a} \leq M h^{n+1}$ would require $1 \leq M h^{1-a}$, which fails for sufficiently small $h$ regardless of the constant $M$. The function $h^{n+a}$ decays faster than $h^n$ but slower than $h^{n+1}$ — it lives in the gap between the two estimates.
This distinction becomes operationally important in perturbation analysis. When we Taylor-expand $f(t, c + \varepsilon)$ in Chapter 3 to derive the linearised stability equation, we write the remainder as $O(\varepsilon^2)$ (not merely $o(\varepsilon)$), because the stronger estimate guarantees that the neglected terms are bounded by a constant times $\varepsilon^2$. The weaker $o(\varepsilon)$ would only tell us the terms are negligible, without quantifying *how* negligible — and for rigorous error control in numerical methods or asymptotic expansions, quantitative bounds are essential.
The proof of [Taylor's Theorem](/theorems/827) belongs to the Analysis I course and proceeds by repeated application of the [mean value theorem](/theorems/186) (for the Lagrange remainder) or by induction on $n$ using [integration by parts](/theorems/210) (for the integral remainder). We will use the theorem freely throughout the course without reproving it.
### L'Hôpital's Rule
Taylor's theorem has an elegant application to the evaluation of indeterminate limits. When both $f(x) \to 0$ and $g(x) \to 0$ as $x \to x_0$, the ratio $f(x)/g(x)$ takes the indeterminate form $0/0$. Rather than manipulating the ratio algebraically (which may be impossible for complicated expressions), we can replace both functions by their Taylor expansions and cancel the common vanishing factor.
[quotetheorem:828]
The key idea behind [L'Hôpital's Rule](/theorems/828) is simple: if $f(x_0) = g(x_0) = 0$, then near $x_0$ both functions are dominated by their first-order Taylor terms, and the ratio $f(x)/g(x) \approx f'(x_0)(x - x_0) / (g'(x_0)(x - x_0)) = f'(x_0)/g'(x_0)$. The proof we give here uses exactly this argument — it is valid under the stronger hypothesis that $f'$ and $g'$ are continuous at $x_0$. The general version (which handles the $\infty/\infty$ form and does not require [continuity](/page/Continuity) of the derivatives) uses the Cauchy mean value theorem and is proved in Analysis I.
[citeproof:828]
[example: Evaluating A Limit Via Taylor Expansion]
Consider $\lim_{x \to 0} \frac{e^x - 1 - x}{x^2}$. By [Taylor's Theorem](/theorems/827) with $n = 2$:
\begin{align*}
e^x = 1 + x + \frac{x^2}{2} + O(x^3),
\end{align*}
so
\begin{align*}
e^x - 1 - x = \frac{x^2}{2} + O(x^3).
\end{align*}
Dividing by $x^2$:
\begin{align*}
\frac{e^x - 1 - x}{x^2} = \frac{1}{2} + O(x) \to \frac{1}{2} \quad \text{as } x \to 0.
\end{align*}
This is faster and more transparent than applying L'Hôpital's rule twice (which would require differentiating the numerator and denominator separately at each step). In general, when the functions involved have known Taylor expansions, direct substitution is preferable to repeated application of L'Hôpital's rule.
[/example]
With the single-variable toolkit in place — asymptotic notation for error control, differentiation rules for computation, and Taylor's theorem for approximation — we are ready to extend these ideas to functions of several variables. The passage from one to many variables introduces qualitatively new phenomena (partial derivatives that may not commute, path-dependent differentials, implicit functions defined by level [sets](/page/Set)), and the multivariate chain rule will become the primary computational tool for solving differential equations by change of variables.
# Integration and Multivariable Calculus
The single-variable tools of Chapter 1 — Taylor's theorem, asymptotic notation, the algebraic differentiation rules — operate on functions of one variable. But the differential equations we care about typically involve functions of several variables: the right-hand side $f(t, y)$ of a first-order ODE $dy/dt = f(t, y)$ depends on both the independent variable $t$ and the unknown $y$; the coefficients $p(x)$ and $q(x)$ of a second-order ODE may themselves require differentiation; and PDEs like the [wave equation](/page/Wave%20Equation) involve derivatives with respect to multiple independent variables simultaneously. To handle these, we need the calculus of functions of several variables — partial derivatives, the multivariate chain rule, and the technique of differentiating under the integral sign.
This chapter begins with the definition of the integral (which we will need for variation of parameters in Chapter 4 and for the Leibniz integral rule at the end of this chapter), then develops partial differentiation and the multivariate chain rule in full generality. The chain rule is not a single result but a family of techniques: it computes derivatives along parametrised curves, transforms differential equations under changes of variable, extracts derivatives of implicitly defined functions, and — in its most powerful application — allows us to differentiate integrals whose limits and integrands depend on a parameter. Each of these applications will be used repeatedly in Chapters 3–5.
## The [Riemann Integral](/page/Riemann%20Integral)
Integration enters this course in three essential ways. First, solving [separable](/page/Separable) ODEs $q(y) \, dy = p(x) \, dx$ requires integrating both sides — the solution is $\int q(y) \, dy = \int p(x) \, dx + C$. Second, the variation of parameters formula in Chapter 4 expresses particular integrals as definite integrals of known functions. Third, the Leibniz integral rule at the end of this chapter differentiates integrals whose limits depend on a parameter — a technique needed to verify that variation-of-parameters solutions actually satisfy the ODE. For all of these, we need a precise definition of the integral.
[definition: Riemann Integral]
Let $f: [a, b] \to \mathbb{R}$ be a bounded function. Partition $[a, b]$ into $N$ equal subintervals of width $\Delta x = (b - a)/N$, with endpoints $x_n = a + n \Delta x$ for $n = 0, 1, \ldots, N$. The **Riemann integral** of $f$ over $[a, b]$ is
\begin{align*}
\int_a^b f(x) \, dx = \lim_{N \to \infty} \sum_{n=0}^{N-1} f(x_n) \Delta x,
\end{align*}
provided this limit exists and is independent of the choice of sample points within each subinterval.
[/definition]
The rigorous theory of when this limit exists — via upper and lower Darboux sums — belongs to the Analysis I course. For our purposes, it suffices to know that continuous functions on closed bounded intervals are always Riemann [integrable](/page/Integral), and that the [Fundamental Theorem of Calculus](/theorems/632) connects integration to antidifferentiation: if $F$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with $F'(x) = f(x)$, then $\int_a^b f(x) \, dx = F(b) - F(a)$. We will use integration freely from this point onwards.
## Partial Differentiation
When a function depends on several variables, we can ask how it changes when we vary one variable while holding the others fixed. This is the idea behind partial differentiation — and it raises a subtlety that has no analogue in single-variable calculus: the order in which we differentiate with respect to different variables might, in principle, matter.
Consider a function $f: U \to \mathbb{R}$ defined on an open subset $U \subseteq \mathbb{R}^2$. If we fix $y$ and vary $x$, we obtain a single-variable function, and its derivative (if it exists) is the partial derivative with respect to $x$.
[definition: Partial Derivative]
Let $U \subseteq \mathbb{R}^2$ be open and let $f: U \to \mathbb{R}$. The **partial derivative** of $f$ with respect to $x$ at the point $(x, y) \in U$ is
\begin{align*}
\frac{\partial f}{\partial x}(x, y) := \lim_{\delta x \to 0} \frac{f(x + \delta x, y) - f(x, y)}{\delta x},
\end{align*}
provided this limit exists. Geometrically, $\partial f / \partial x$ measures the slope of $f$ experienced when moving purely in the positive $x$-direction, with $y$ held constant.
[/definition]
The partial derivative $\partial f / \partial y$ is defined analogously by varying $y$ at fixed $x$. Higher-order partial derivatives are obtained by iterating: $\frac{\partial^2 f}{\partial x \partial y}$ means "first differentiate with respect to $y$, then with respect to $x$." A natural question arises: does the order matter? If we instead compute $\frac{\partial^2 f}{\partial y \partial x}$ — first with respect to $x$, then $y$ — do we get the same result?
The answer is yes, provided the mixed partials are continuous — and this is the content of Schwarz's theorem (also known as Clairaut's theorem). The continuity hypothesis is essential: without it, there exist pathological functions where the mixed partials exist everywhere but disagree at a point.
[quotetheorem:332]
The [Symmetry of Second Derivatives](/theorems/332) is proved in Analysis II using the second-difference operator $\Delta(s,t) = f(a + se_1 + te_2) - f(a + se_1) - f(a + te_2) + f(a)$, which can be computed by differentiating in either order and applying the mean value theorem. The equality of the two iterated computations, in the limit, gives $D_1 D_2 f = D_2 D_1 f$. We will use this result without proof whenever we exchange the order of partial differentiation — which happens implicitly every time we apply the exactness criterion in Chapter 3 or manipulate the Hessian matrix in Chapter 5.
## The Multivariable Chain Rule
In single-variable calculus, the chain rule tells us how to differentiate a composition: if $y = f(g(x))$, then $dy/dx = f'(g(x)) g'(x)$. In several variables, the situation is richer because there are multiple independent directions along which we can differentiate, and the function's dependence on each variable can be mediated through different intermediate quantities. The multivariable chain rule accounts for all of these pathways simultaneously.
The question that motivates the result is concrete: given a function $f(x, y)$ where $x$ and $y$ are themselves functions of some other variable(s), how does $f$ change? For instance, if a temperature field $T(x, y)$ is measured by a sensor moving along a path $(x(t), y(t))$, at what rate does the sensor's reading change with time? The answer requires tracking how $T$ changes due to the motion in the $x$-direction and the motion in the $y$-direction, and adding both contributions — this is the essence of the chain rule.
To derive the result, consider the increment $\delta f = f(x + \delta x, y + \delta y) - f(x, y)$. We decompose the path from $(x, y)$ to $(x + \delta x, y + \delta y)$ into two segments: first move along the $x$-axis to $(x + \delta x, y)$, then along the $y$-axis to $(x + \delta x, y + \delta y)$. Applying [Taylor's Theorem](/theorems/827) along each segment and collecting terms yields the differential $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy$.
[quotetheorem:830]
The [Multivariable Chain Rule](/theorems/830) is the single most computationally important result in this chapter. Its power comes from the fact that it applies in any situation where variables are related through intermediate quantities — whether those are explicit parametric functions, coordinate transformations, or implicit constraints. The proof given here follows the derivation in the lecture notes: decompose the increment along coordinate axes, apply single-variable Taylor expansions, and take the limit.
[citeproof:830]
The remainder of this chapter develops four distinct applications of the chain rule, each of which will be used in the ODE and PDE chapters.
### Differentiation Along a Parametrised Path
The most direct application is computing $\frac{d}{dt} f(x(t), y(t))$ when $f(x, y)$ is evaluated along a curve $(x(t), y(t))$. The [Multivariable Chain Rule](/theorems/830) gives this directly: since $f$ depends on $t$ through $x(t)$ and $y(t)$, we differentiate with respect to $t$ by summing the contributions from each pathway:
\begin{align*}
\frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}.
\end{align*}
This formula will be the starting point for the [method of characteristics](/page/Method%20of%20Characteristics) in Chapter 5: we will choose the path $(x(t), y(t))$ so that $\frac{d}{dt}f$ simplifies, reducing a PDE to an ODE along each characteristic curve.
The chain rule result above is sometimes written in the shorthand $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy$, where $df$, $dx$, $dy$ are *formal symbols* (the "total differential" of $f$ and the "coordinate differentials") satisfying $df/dt = \frac{d}{dt}f$, $dx/dt = \dot{x}$, $dy/dt = \dot{y}$. This notation is convenient for bookkeeping — for instance, along a curve where $f$ is constant, $df = 0$ — but it should always be understood as an abbreviation for the chain rule applied to a specific parametrisation.
### Change of Variables
Suppose we reparametrise $(x, y)$ in terms of new coordinates $(u, v)$: that is, $x = x(u, v)$ and $y = y(u, v)$. The chain rule gives derivatives of $f$ with respect to the new coordinates:
\begin{align*}
\frac{\partial f}{\partial u} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial u}, \qquad \frac{\partial f}{\partial v} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial v}.
\end{align*}
In operator form, this reads $\frac{\partial}{\partial u} = \frac{\partial x}{\partial u} \frac{\partial}{\partial x} + \frac{\partial y}{\partial u} \frac{\partial}{\partial y}$, and similarly for $\frac{\partial}{\partial v}$. The operator form is particularly useful when transforming differential equations: we substitute the new derivative operators into the equation and simplify. This is exactly how we will solve equidimensional ODEs in Chapter 4 (via $z = \ln x$) and factor the wave equation in Chapter 5 (via $\xi = x + ct$, $\eta = x - ct$).
[example: Polar Coordinate Transformation]
Let $f(x, y)$ be a function expressed in Cartesian coordinates, and suppose we wish to compute $\partial f / \partial r$ and $\partial f / \partial \theta$ in polar coordinates, where $x = r \cos \theta$ and $y = r \sin \theta$. The chain rule gives:
\begin{align*}
\frac{\partial f}{\partial r} &= \frac{\partial f}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial r} = \frac{\partial f}{\partial x} \cos \theta + \frac{\partial f}{\partial y} \sin \theta, \\
\frac{\partial f}{\partial \theta} &= \frac{\partial f}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial \theta} = -\frac{\partial f}{\partial x} r \sin \theta + \frac{\partial f}{\partial y} r \cos \theta.
\end{align*}
These can be inverted to express the Cartesian partial derivatives in terms of polar ones:
\begin{align*}
\frac{\partial f}{\partial x} = \cos \theta \frac{\partial f}{\partial r} - \frac{\sin \theta}{r} \frac{\partial f}{\partial \theta}, \qquad \frac{\partial f}{\partial y} = \sin \theta \frac{\partial f}{\partial r} + \frac{\cos \theta}{r} \frac{\partial f}{\partial \theta}.
\end{align*}
From these, one derives the Laplacian in polar coordinates: $\Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r} \frac{\partial f}{\partial r} + \frac{1}{r^2} \frac{\partial^2 f}{\partial \theta^2}$, a computation that requires applying the chain rule twice and using [Schwarz's theorem](/theorems/332) to commute mixed partials.
[/example]
### Implicit Differentiation
In many applications, the relationship between variables is not given explicitly (as $y = g(x)$) but implicitly, through a constraint. For instance, the equation $x^2 + y^2 = 1$ defines $y$ as a function of $x$ near any point where $y \neq 0$, but the explicit forms $y = \pm\sqrt{1 - x^2}$ are cumbersome and lose the symmetry of the original equation. Implicit differentiation computes $dy/dx$ directly from the constraint, without ever solving for $y$.
The idea is to differentiate both sides of the constraint with respect to $x$, using the chain rule to handle the implicit dependence of $y$ on $x$. If $f(x, y) = c$ and $y = y(x)$ satisfies this constraint, then $f(x, y(x)) = c$ is an identity in $x$. Differentiating both sides with respect to $x$ using the [Multivariable Chain Rule](/theorems/830):
\begin{align*}
\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx} = 0,
\end{align*}
where $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are the partial derivatives of $f$ as a function of two independent variables, evaluated at $(x, y(x))$. Solving for $dy/dx$ (valid whenever $\frac{\partial f}{\partial y} \neq 0$):
\begin{align*}
\frac{dy}{dx} = -\frac{\partial f / \partial x}{\partial f / \partial y}.
\end{align*}
The same argument extends to three variables. Suppose $f(x, y, z) = c$ defines $z$ implicitly as a function $z = z(x, y)$. Then $f(x, y, z(x, y)) = c$ identically. Differentiating with respect to $x$ while holding $y$ fixed:
\begin{align*}
\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z} \frac{\partial z}{\partial x} = 0 \quad \implies \quad \frac{\partial z}{\partial x} = -\frac{\partial f / \partial x}{\partial f / \partial z},
\end{align*}
and differentiating with respect to $y$ while holding $x$ fixed:
\begin{align*}
\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z} \frac{\partial z}{\partial y} = 0 \quad \implies \quad \frac{\partial z}{\partial y} = -\frac{\partial f / \partial y}{\partial f / \partial z}.
\end{align*}
In both cases, the formula has the form "minus the ratio of the partial derivative with respect to the numerator variable, to the partial derivative with respect to the denominator variable." The underlying mechanism is the same: apply the chain rule to the identity $f(\ldots) = c$ and solve for the desired derivative.
[example: Implicit Derivative On A Level Surface]
Let $f(x, y, z) = x^2 y + yz^2 + xz$, and suppose $f(x, y, z) = c$ defines $z$ implicitly as a function of $x$ and $y$. Then:
\begin{align*}
\frac{\partial f}{\partial x} = 2xy + z, \qquad \frac{\partial f}{\partial y} = x^2 + z^2, \qquad \frac{\partial f}{\partial z} = 2yz + x.
\end{align*}
The implicit derivatives are:
\begin{align*}
\frac{\partial z}{\partial x} = -\frac{2xy + z}{2yz + x}, \qquad \frac{\partial z}{\partial y} = -\frac{x^2 + z^2}{2yz + x},
\end{align*}
valid at any point where $2yz + x \neq 0$. No explicit formula for $z(x, y)$ is needed.
[/example]
### Rules for Partial Derivatives
When three variables $x, y, z$ are related by a single constraint $f(x, y, z) = c$ (so that any one is implicitly a function of the other two), the implicit differentiation technique from the previous section yields a family of identities. These are frequently useful in thermodynamics and fluid mechanics, where physical quantities are constrained by equations of state. We derive each from the chain rule — no identity is assumed without proof.
**Reciprocal rule.** The constraint $f(x, y, z) = c$ defines both $y = y(x, z)$ (with $z$ held fixed) and $x = x(y, z)$ (with $z$ held fixed). By implicit differentiation:
\begin{align*}
\left(\frac{\partial y}{\partial x}\right)_z = -\frac{\partial f / \partial x}{\partial f / \partial y}, \qquad \left(\frac{\partial x}{\partial y}\right)_z = -\frac{\partial f / \partial y}{\partial f / \partial x}.
\end{align*}
These are reciprocals of each other, so:
\begin{align*}
\left(\frac{\partial x}{\partial y}\right)_z = \frac{1}{(\partial y / \partial x)_z}.
\end{align*}
**Chain rule.** If $x = x(y, z)$ and $y = y(t, z)$ are both defined implicitly (with $z$ held fixed throughout), then $x$ depends on $t$ through $y$. By the single-variable chain rule applied at fixed $z$:
\begin{align*}
\left(\frac{\partial x}{\partial t}\right)_z = \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial t}\right)_z.
\end{align*}
This is simply the chain rule for compositions, with the subscript $z$ reminding us that all three derivatives are computed with the same variable held fixed.
**Cyclic rule.** This is the non-obvious identity, and its proof reveals why the minus sign appears. From $f(x, y, z) = c$, implicit differentiation gives three ratios:
\begin{align*}
\left(\frac{\partial x}{\partial y}\right)_z = -\frac{f_y}{f_x}, \qquad \left(\frac{\partial y}{\partial z}\right)_x = -\frac{f_z}{f_y}, \qquad \left(\frac{\partial z}{\partial x}\right)_y = -\frac{f_x}{f_z}.
\end{align*}
Multiplying all three:
\begin{align*}
\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y = \left(-\frac{f_y}{f_x}\right)\left(-\frac{f_z}{f_y}\right)\left(-\frac{f_x}{f_z}\right) = (-1)^3 \cdot 1 = -1.
\end{align*}
The factors $f_x, f_y, f_z$ cancel in pairs, leaving only the three minus signs — one from each application of implicit differentiation. The cyclic rule is a direct consequence of the fact that implicit differentiation introduces a minus sign, and an odd number of such applications produces $-1$ rather than $+1$.
These identities hold only when $x, y, z$ are genuinely related through a single constraint. They fail when the variables belong to different coordinate systems — for instance, if $(x, y)$ and $(\xi, \eta)$ are two coordinate systems on $\mathbb{R}^2$, the derivative operators $\partial/\partial x$ and $\partial/\partial \xi$ act differently and do not satisfy the reciprocal or cyclic rules in general, because the "held fixed" variable changes between the two systems.
## Differentiating Under the Integral Sign
The final application of the chain rule addresses a problem that arises naturally in the study of parametrised families of integrals: if $I(\alpha) = \int_{a(\alpha)}^{b(\alpha)} f(x; \alpha) \, dx$, where the integrand and the limits of integration all depend on a parameter $\alpha$, what is $dI/d\alpha$?
This question is not merely academic. In Chapter 4, the variation of parameters formula expresses the particular integral of an ODE as an integral whose limits depend on the independent variable. Computing derivatives of such expressions — for instance, verifying that a proposed solution satisfies the ODE — requires the Leibniz integral rule. The result also provides an elegant technique for evaluating definite integrals by introducing a parameter, differentiating, solving the resulting ODE, and then setting the parameter to a specific value.
[quotetheorem:831]
The [Leibniz Integral Rule](/theorems/831) has three terms, each with a clear interpretation. The integral $\int_{a}^{b} \frac{\partial f}{\partial x} dt$ captures the change in $I$ due to the integrand itself varying with the parameter. The term $f(b(x); x) \frac{db}{dx}$ accounts for the contribution from the upper limit sweeping through new values of the integrand. The term $-f(a(x); x) \frac{da}{dx}$ does the same for the lower limit, with a minus sign because the lower limit contributes negatively to the integral. When the limits are constant, only the first term survives, and we recover the familiar rule "differentiate under the integral sign."
[citeproof:831]
[example: Leibniz Rule With Variable Limits]
Consider the function
\begin{align*}
I(x) = \int_0^{x^2} e^{-xt} \, dt.
\end{align*}
Here $f(t; x) = e^{-xt}$, $a(x) = 0$, and $b(x) = x^2$. Applying the [Leibniz Integral Rule](/theorems/831):
\begin{align*}
\frac{dI}{dx} &= \int_0^{x^2} \frac{\partial}{\partial x} e^{-xt} \, dt + e^{-x \cdot x^2} \cdot \frac{d}{dx}(x^2) - e^{-x \cdot 0} \cdot \frac{d}{dx}(0) \\
&= \int_0^{x^2} (-t) e^{-xt} \, dt + 2x e^{-x^3}.
\end{align*}
The remaining integral can be evaluated by integration by parts. Setting $u = t$ and $dv = -x e^{-xt} dt$ (so $du = dt$ and $v = e^{-xt}$, noting we integrate with respect to $t$):
\begin{align*}
\int_0^{x^2} (-t) e^{-xt} \, dt &= \frac{1}{x}\left[t e^{-xt}\right]_0^{x^2} - \frac{1}{x}\int_0^{x^2} e^{-xt} \, dt \\
&= \frac{x^2 e^{-x^3}}{x} - \frac{1}{x} \left[-\frac{1}{x} e^{-xt}\right]_0^{x^2} \\
&= x e^{-x^3} + \frac{1}{x^2}\left(e^{-x^3} - 1\right).
\end{align*}
Combining:
\begin{align*}
\frac{dI}{dx} = x e^{-x^3} + \frac{e^{-x^3} - 1}{x^2} + 2x e^{-x^3} = 3x e^{-x^3} + \frac{e^{-x^3} - 1}{x^2}.
\end{align*}
One can verify this independently by first evaluating $I(x) = \int_0^{x^2} e^{-xt} \, dt = \frac{1 - e^{-x^3}}{x}$ (for $x \neq 0$) and differentiating directly — the results agree.
[/example]
The Leibniz integral rule generalises the Fundamental Theorem of Calculus. In the special case where $f(t; x) = f(t)$ does not depend on $x$ and $a(x) = 0$, the rule reduces to $\frac{d}{dx} \int_0^{b(x)} f(t) \, dt = f(b(x)) b'(x)$, which is exactly the chain-rule form of the FTC. The full version, with all three terms, is what we need for the more complex integral expressions that arise in variation of parameters and Green's function methods.
With the multivariate chain rule and its applications established, we now have the computational machinery needed to attack differential equations directly. The next chapter introduces the first class of equations we will solve: nonlinear first-order ODEs, where the chain rule appears through exact differentials and the technique of separation of variables, and Taylor's theorem drives the perturbation analysis of equilibrium points.
# Nonlinear First-Order Differential Equations
The calculus machinery of Chapters 1 and 2 was preparation; now we use it. A first-order ordinary differential equation is a relation between an unknown function $y$, its first derivative $dy/dx$, and the independent variable $x$. In its most general form:
\begin{align*}
Q(x, y) \frac{dy}{dx} + P(x, y) = 0.
\end{align*}
The central question is: given $P$ and $Q$, can we find $y(x)$? Unlike linear algebra, where a systematic algorithm (Gaussian elimination) solves every system, there is no universal method for nonlinear ODEs. Instead, we develop a collection of techniques, each applicable to a specific structural class of equations. This chapter covers two solution methods — separation of variables and exact equations — and then addresses the qualitative theory of equilibria, where we cannot solve explicitly but can still determine the long-term behaviour of solutions using the perturbation analysis tools from Chapter 1.
The connecting thread is the [Multivariable Chain Rule](/theorems/830) from Chapter 2. Separation of variables works by rewriting $Q \, dy/dx + P = 0$ so that each side depends on only one variable, then integrating. Exact equations are those where $P \, dx + Q \, dy$ is already the differential $df$ of some function $f(x, y)$, so that the solution is the level set $f(x, y) = C$. In both cases, the chain rule — in the guise of the differential $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy$ — is the mechanism that connects the ODE to an implicit solution.
## Separable Equations
The simplest strategy for solving a first-order ODE is to manipulate it until each side involves only one variable, then integrate. This works when the right-hand side of $dy/dx = f(x, y)$ factors as a product $p(x) / q(y)$ — the $x$-dependence and $y$-dependence are completely disentangled. Rearranging gives $q(y) \, dy = p(x) \, dx$, and since each side is the differential of a single-variable antiderivative, integrating both sides independently yields an implicit solution. This is, in effect, a special case of the exact equations method we will develop next: the equation $p(x) \, dx - q(y) \, dy = 0$ is automatically exact because $P = p(x)$ depends only on $x$ and $Q = -q(y)$ depends only on $y$, so $\partial P / \partial y = 0 = \partial Q / \partial x$.
[definition: Separable Equation]
A first-order ODE is **separable** if it can be written in the form
\begin{align*}
q(y) \, dy = p(x) \, dx,
\end{align*}
where $q: \mathbb{R} \to \mathbb{R}$ depends only on $y$ and $p: \mathbb{R} \to \mathbb{R}$ depends only on $x$.
[/definition]
The solution method is immediate: integrate both sides independently to obtain $\int q(y) \, dy = \int p(x) \, dx + C$. The constant of integration $C$ is determined by an initial condition. The result is an implicit equation relating $x$ and $y$; whether it can be solved explicitly for $y(x)$ depends on the form of $q$ and $p$.
[example: Exponential Growth]
The equation $\frac{dy}{dt} = ky$ (with $k \in \mathbb{R}$ constant) models exponential growth ($k > 0$) or decay ($k < 0$). Separating: $\frac{dy}{y} = k \, dt$. Integrating both sides: $\ln|y| = kt + C_0$, giving $y(t) = Ae^{kt}$ where $A = \pm e^{C_0}$. This is the prototypical separable equation, and its solution — the exponential function — will appear as the complementary function of every constant-coefficient ODE in Chapter 4. The equation $dy/dt = ky$ also arises from the perturbation analysis later in this chapter: the linearised equation near any equilibrium of an autonomous system reduces to this form.
[/example]
[example: Finite Time Blowup]
The equation $\frac{dy}{dt} = y^2$ with $y(0) = 1$ demonstrates that nonlinear equations can behave very differently from linear ones. Separating: $\frac{dy}{y^2} = dt$. Integrating: $-1/y = t + C$. The initial condition gives $C = -1$, so $y(t) = \frac{1}{1 - t}$. The solution exists only for $t < 1$: as $t \to 1^-$, $y(t) \to +\infty$. The solution *blows up in finite time* — a phenomenon impossible for linear equations, where solutions exist for all time. This shows that the domain of a solution is not determined by the equation alone but depends on the initial condition, and that nonlinearity introduces qualitatively new behaviour.
[/example]
Separable equations are rare in practice — most first-order ODEs do not factor into a product of single-variable functions. When separation fails, the next tool is to recognise the equation as an exact differential.
## Exact Equations
Suppose the expression $P(x, y) \, dx + Q(x, y) \, dy$ happens to be the total differential of some function $f(x, y)$. Then the ODE $Q \, dy/dx + P = 0$ is equivalent to $df = 0$, meaning $f(x, y)$ is constant along any solution curve. The solution is therefore the family of level sets $f(x, y) = C$.
The question is: when does such an $f$ exist, and how do we find it?
[definition: Exact Differential Equation]
The equation $P(x, y) \, dx + Q(x, y) \, dy = 0$ is **exact** if there exists a function $f: D \to \mathbb{R}$ (on some domain $D \subseteq \mathbb{R}^2$) such that
\begin{align*}
df = P(x, y) \, dx + Q(x, y) \, dy,
\end{align*}
that is, $\frac{\partial f}{\partial x} = P$ and $\frac{\partial f}{\partial y} = Q$.
[/definition]
If the equation is exact and $df = 0$, then $f(x, y) = C$ is the implicit solution. The key insight connecting this to the chain rule is that $df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy$ by the [Multivariable Chain Rule](/theorems/830), so matching coefficients with $P \, dx + Q \, dy$ forces $\partial f / \partial x = P$ and $\partial f / \partial y = Q$.
### The Exactness Test
If $f$ exists with $\partial f / \partial x = P$ and $\partial f / \partial y = Q$, then by [Schwarz's theorem](/theorems/332) (the symmetry of mixed partials):
\begin{align*}
\frac{\partial P}{\partial y} = \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial Q}{\partial x}.
\end{align*}
This necessary condition becomes sufficient on simply connected domains — the content of the following theorem.
[definition: Simply Connected Domain]
A domain $D \subseteq \mathbb{R}^2$ is **simply connected** if it is path-connected and every closed curve in $D$ can be continuously shrunk to a point without leaving $D$.
[/definition]
Intuitively, a simply connected domain has no holes. The open disc, the entire plane, and any convex set are simply connected; the punctured plane $\mathbb{R}^2 \setminus \{0\}$ is not (a loop around the origin cannot be contracted to a point). The [topological](/page/Topology) condition matters because on multiply connected domains, the equality $\partial P / \partial y = \partial Q / \partial x$ does not guarantee the existence of a single-valued potential $f$ — the classical example being the angular form $d\theta = \frac{-y \, dx + x \, dy}{x^2 + y^2}$ on $\mathbb{R}^2 \setminus \{0\}$.
[quotetheorem:832]
The [Exactness Criterion](/theorems/832) gives a practical test: compute $\partial P / \partial y$ and $\partial Q / \partial x$. If they are equal throughout a simply connected domain, the equation is exact and a potential $f$ exists. The proof (which belongs to the theory of line integrals and is covered in Vector Calculus) constructs $f$ by integrating $P$ along a path.
### Solving Exact Equations
Once we know the equation is exact, finding $f$ is a systematic computation. We integrate one of the conditions — say $\partial f / \partial x = P$ — with respect to $x$, treating $y$ as a constant. This gives $f(x, y) = \int P(x, y) \, dx + \gamma(y)$, where the "constant of integration" $\gamma(y)$ may depend on $y$ (since we held $y$ fixed). We then determine $\gamma$ by substituting into the other condition $\partial f / \partial y = Q$ and solving the resulting ODE for $\gamma'(y)$.
[example: Solving An Exact Equation]
Consider the equation $(2xy + 3) \, dx + (x^2 + 4y) \, dy = 0$. Here $P(x, y) = 2xy + 3$ and $Q(x, y) = x^2 + 4y$.
**Step 1: Verify exactness.** Compute:
\begin{align*}
\frac{\partial P}{\partial y} = 2x, \qquad \frac{\partial Q}{\partial x} = 2x.
\end{align*}
Since $\partial P / \partial y = \partial Q / \partial x$, the equation is exact.
**Step 2: Integrate $P$ with respect to $x$.** Holding $y$ fixed:
\begin{align*}
f(x, y) = \int (2xy + 3) \, dx = x^2 y + 3x + \gamma(y).
\end{align*}
**Step 3: Determine $\gamma(y)$ from the second condition.** Differentiating with respect to $y$:
\begin{align*}
\frac{\partial f}{\partial y} = x^2 + \gamma'(y).
\end{align*}
Setting this equal to $Q = x^2 + 4y$:
\begin{align*}
x^2 + \gamma'(y) = x^2 + 4y \quad \implies \quad \gamma'(y) = 4y \quad \implies \quad \gamma(y) = 2y^2.
\end{align*}
**Step 4: Write the solution.** The potential is $f(x, y) = x^2 y + 3x + 2y^2$, and the general solution is
\begin{align*}
x^2 y + 3x + 2y^2 = C.
\end{align*}
[/example]
The exactness test $\partial P / \partial y = \partial Q / \partial x$ is a direct application of [Schwarz's theorem](/theorems/332): if the mixed partials of $f$ are continuous (which they are, since $P$ and $Q$ are assumed $C^1$), they must commute, and this places a constraint on $P$ and $Q$. Conversely, on simply connected domains, this constraint is the only obstruction to the existence of $f$.
With separable and exact equations, we can solve certain first-order ODEs in closed form. But many equations — including most nonlinear ones arising in applications — admit no explicit solution. For these, we turn to qualitative analysis: rather than finding $y(x)$ exactly, we determine the long-term behaviour of solutions by studying the geometry of the vector field $f(t, y)$.
## Solution Curves and Isoclines
We have developed two algebraic methods for solving first-order ODEs — separation of variables and the exactness test. But most nonlinear equations are neither separable nor exact, and for these we need a different approach. The idea is to abandon the search for an explicit formula $y(x)$ and instead visualise the *qualitative* behaviour of solutions directly from the equation $dy/dt = f(t, y)$. This geometric viewpoint — thinking of the ODE as a vector field that steers solutions through the $(t, y)$-plane — is the foundation of the stability analysis that follows.
The starting observation is that the ODE $dy/dt = f(t, y)$ assigns a slope $f(t_0, y_0)$ to every point $(t_0, y_0)$ in the plane. A solution through that point must have exactly that slope there. Plotting these slopes across the plane gives a picture of how all solutions behave simultaneously, without solving anything.
[definition: Solution Curve]
A **solution curve** (or **trajectory**) of the ODE $dy/dt = f(t, y)$ is the graph $\{(t, y(t)) : t \in I\}$ of a solution $y: I \to \mathbb{R}$ corresponding to a specific initial condition $y(t_0) = y_0$. At every point on the curve, its slope equals $f(t, y(t))$.
[/definition]
[definition: Slope Field]
The **slope field** (or **direction field**) of the ODE $dy/dt = f(t, y)$ is the assignment of a short line segment with slope $f(t, y)$ to each point $(t, y)$ in the domain. Solution curves are tangent to the slope field at every point.
[/definition]
Sketching a slope field by evaluating $f(t, y)$ at a grid of points gives a qualitative picture of all solutions at once, but the computation is tedious. A more efficient technique uses isoclines — curves along which the slope is constant.
[definition: Isocline]
An **isocline** of the ODE $dy/dt = f(t, y)$ is a curve in the $(t, y)$-plane along which $f(t, y)$ takes a constant value $c$. On each isocline, the slope field has the same slope $c$, so all solution curves crossing the isocline do so at the same angle to the horizontal.
[/definition]
To sketch the slope field using isoclines: for several values of $c$ (e.g., $c = -2, -1, 0, 1, 2$), plot the curve $f(t, y) = c$ and draw short line segments with slope $c$ along that curve. Solution curves thread through these segments, and their qualitative shape becomes apparent by interpolation. This technique is especially powerful for autonomous equations ($f$ depending only on $y$, not $t$), where the isoclines are horizontal lines $y = \text{const}$ and the slope field is the same in every vertical strip — so solutions starting at different times but the same $y$-value have congruent trajectories.
## Fixed Points and Stability
The qualitative theory of ODEs begins with the simplest possible solutions: the constant ones. If $y(t) = c$ for all $t$, then $dy/dt = 0$, so the ODE $dy/dt = f(t, y)$ requires $f(t, c) = 0$. Such a constant solution is an equilibrium — the system, once placed at $y = c$, remains there forever. The fundamental question is: if the system is *slightly perturbed* away from equilibrium, does it return (stable) or diverge (unstable)?
[definition: Fixed Point]
A **fixed point** (or **equilibrium point**) of the ODE $\frac{dy}{dt} = f(t, y)$ is a constant $c$ such that $f(t, c) = 0$ for all $t$. The constant function $y(t) = c$ is then a solution.
[/definition]
[definition: Stability Of Fixed Points]
A fixed point $y = c$ is **stable** if whenever $y$ is displaced slightly from $c$, the solution $y(t) \to c$ as $t \to \infty$. It is **unstable** if the displacement grows in magnitude as $t \to \infty$.
[/definition]
### Perturbation Analysis
To determine stability, we perturb the system slightly from equilibrium and track the evolution of the perturbation. This is where [Taylor's Theorem](/theorems/827) enters — we expand $f$ around the fixed point and drop higher-order terms, reducing the nonlinear ODE to a linear one that we can solve explicitly.
[quotetheorem:698]
The [Linearised Stability theorem](/theorems/698) (stated here for the general $n$-dimensional case, though we use only the $n = 1$ case in this chapter) is the rigorous formulation of this perturbation argument. For a first-order ODE $dy/dt = f(t, y)$ with fixed point $y = c$, writing $y(t) = c + \varepsilon(t)$ and expanding via [Taylor's Theorem](/theorems/827):
\begin{align*}
\frac{d\varepsilon}{dt} = f(t, c + \varepsilon) = f(t, c) + \varepsilon \frac{\partial f}{\partial y}(t, c) + O(\varepsilon^2) = \varepsilon \frac{\partial f}{\partial y}(t, c) + O(\varepsilon^2),
\end{align*}
since $f(t, c) = 0$ at the equilibrium. For sufficiently small $|\varepsilon|$, the $O(\varepsilon^2)$ term is negligible (this is precisely where the distinction between $O(\varepsilon^2)$ and $o(\varepsilon)$ from Chapter 1 matters — we need the stronger estimate to justify the approximation), giving the linearised equation:
\begin{align*}
\frac{d\varepsilon}{dt} \approx \frac{\partial f}{\partial y}(t, c) \, \varepsilon.
\end{align*}
This is a separable first-order ODE with solution:
\begin{align*}
\varepsilon(t) = \varepsilon(0) \exp\left(\int_0^t \frac{\partial f}{\partial y}(s, c) \, ds\right).
\end{align*}
The perturbation grows or decays depending on the sign of $\int \frac{\partial f}{\partial y} \, ds$: if the integral tends to $-\infty$, the perturbation decays and the fixed point is stable; if to $+\infty$, it is unstable.
### Autonomous Systems
The analysis simplifies dramatically when $f$ does not depend explicitly on $t$ — the autonomous case $dy/dt = f(y)$.
[definition: Autonomous System]
An **autonomous system** is an ODE of the form $\frac{dy}{dt} = f(y)$, where the right-hand side depends only on $y$, not on $t$.
[/definition]
For an autonomous system with fixed point $y = c$, the derivative $\partial f / \partial y = f'(c)$ is a constant (since $f$ depends only on $y$ and we evaluate at the fixed value $c$). The perturbation equation becomes $d\varepsilon/dt = f'(c) \varepsilon$, with solution $\varepsilon(t) = \varepsilon(0) e^{f'(c) t}$. The stability criterion is then immediate:
If $f'(c) < 0$: the perturbation decays exponentially, so the fixed point is **stable**.
If $f'(c) > 0$: the perturbation grows exponentially, so the fixed point is **unstable**.
If $f'(c) = 0$: the linear approximation is inconclusive and higher-order terms in the Taylor expansion must be retained.
[example: Logistic Equation Stability]
The logistic equation $\frac{dy}{dt} = y(1 - y)$ models population growth with a carrying capacity at $y = 1$. Setting $f(y) = y(1 - y) = 0$ gives two fixed points: $y = 0$ and $y = 1$.
Computing $f'(y) = 1 - 2y$:
At $y = 0$: $f'(0) = 1 > 0$, so the zero equilibrium is **unstable**. A small positive population grows away from zero.
At $y = 1$: $f'(1) = -1 < 0$, so the carrying-capacity equilibrium is **stable**. A population slightly above or below $1$ returns to $1$.
This confirms the biological intuition: an initially small population grows toward the carrying capacity, and a population that overshoots is pushed back down.
[/example]
### Fixed Points of Discrete Equations
The perturbation analysis extends naturally to discrete dynamical systems — recurrence relations of the form $x_{n+1} = g(x_n)$.
[definition: Fixed Point Of A Discrete Map]
A **fixed point** of the discrete equation $x_{n+1} = g(x_n)$ is a value $x_f$ such that $g(x_f) = x_f$.
[/definition]
Writing $x_n = x_f + \varepsilon_n$ and expanding $g$ via [Taylor's Theorem](/theorems/827):
\begin{align*}
x_f + \varepsilon_{n+1} = g(x_f + \varepsilon_n) \approx g(x_f) + \varepsilon_n g'(x_f) = x_f + \varepsilon_n g'(x_f),
\end{align*}
so $\varepsilon_{n+1} \approx g'(x_f) \varepsilon_n$. Iterating: $\varepsilon_n \approx \varepsilon_0 [g'(x_f)]^n$. The perturbation shrinks or grows depending on whether $|g'(x_f)|$ is less than or greater than $1$:
If $|g'(x_f)| < 1$: the fixed point is **stable** (perturbations contract geometrically).
If $|g'(x_f)| > 1$: the fixed point is **unstable** (perturbations grow geometrically).
The threshold $|g'(x_f)| = 1$ is the discrete analogue of $f'(c) = 0$ in the continuous case — the linear theory is inconclusive, and higher-order analysis (or direct computation) is needed.
[remark: Continuous Versus Discrete Stability]
The stability criteria for continuous and discrete systems have different thresholds because the mechanisms of growth are different. In continuous time, $\varepsilon(t) = \varepsilon_0 e^{kt}$ decays when $k < 0$ (exponential decay). In discrete time, $\varepsilon_n = \varepsilon_0 \lambda^n$ decays when $|\lambda| < 1$ (geometric decay). The continuous criterion $f'(c) < 0$ corresponds to the discrete criterion $|g'(x_f)| < 1$ via the relationship $\lambda = e^{k \Delta t}$: the continuous eigenvalue $k$ and the discrete multiplier $\lambda$ are related by the exponential map, and $|e^{k \Delta t}| < 1$ if and only if $k < 0$.
[/remark]
The perturbation analysis in this chapter treats a single first-order equation — one degree of freedom. In Chapter 5, we will extend this to systems of two coupled first-order equations $\dot{x} = f(x, y)$, $\dot{y} = g(x, y)$, where the linearisation produces a $2 \times 2$ matrix and the stability is determined by its eigenvalues. The richer geometry of two dimensions — saddle points, spirals, centres — requires the matrix methods developed in the linear algebra course and the multivariate Taylor expansion from Chapter 2. But the underlying idea is the same: expand around an equilibrium, drop higher-order terms, and classify the linear system.
# Higher-Order Linear ODEs
The first-order equations of Chapter 3 model systems with a single degree of freedom — population growth, radioactive decay, simple circuits. Many physical systems, however, are governed by second-order equations: Newton's second law $F = ma$ relates a force to the *second* derivative of position; a vibrating string has displacement satisfying a second-order wave equation; an $LCR$ circuit satisfies a second-order ODE for the charge. This chapter develops the theory of linear ODEs of order two and higher, building on the algebraic and perturbation tools from Chapters 1–3.
The key structural advantage of *linear* equations is the superposition principle: if $y_1$ and $y_2$ are solutions of a homogeneous linear ODE, then so is $c_1 y_1 + c_2 y_2$ for any constants $c_1, c_2$. This reduces the problem of finding the general solution to finding sufficiently many linearly independent particular solutions. For second-order equations, "sufficiently many" means two — and the Wronskian determinant provides a test for linear independence, while [Abel's Theorem](/theorems/834) governs how the Wronskian evolves.
The chapter proceeds in four stages. First, we solve homogeneous equations with constant coefficients via the characteristic equation — the exponential ansatz $y = e^{\lambda x}$ converts the ODE into a polynomial equation for $\lambda$. Second, we develop the general theory of the Wronskian and use it to find second solutions via reduction of order. Third, we find particular integrals of inhomogeneous equations using variation of parameters — a method that works for *any* forcing term, unlike the more restrictive method of undetermined coefficients. Finally, we treat equations with variable coefficients via [power series](/page/Power%20Series) and Frobenius methods, where [Fuchs' Theorem](/theorems/836) guarantees convergence near regular singular points.
## Constant-Coefficient Equations
Consider a second-order linear ODE with constant coefficients:
\begin{align*}
ay'' + by' + cy = 0, \qquad a, b, c \in \mathbb{R}, \quad a \neq 0.
\end{align*}
The exponential ansatz $y = e^{\lambda x}$ converts the ODE into an algebraic equation. Substituting $y = e^{\lambda x}$, $y' = \lambda e^{\lambda x}$, $y'' = \lambda^2 e^{\lambda x}$ and dividing by $e^{\lambda x} \neq 0$:
\begin{align*}
a\lambda^2 + b\lambda + c = 0.
\end{align*}
[definition: Characteristic Equation]
The **characteristic equation** (or **auxiliary equation**) of the constant-coefficient ODE $ay'' + by' + cy = 0$ is the polynomial equation
\begin{align*}
a\lambda^2 + b\lambda + c = 0.
\end{align*}
Its roots $\lambda_1, \lambda_2$ determine the general solution.
[/definition]
The nature of the roots — real and distinct, repeated, or complex conjugates — determines the form of the general solution. The three cases exhaust all possibilities for a quadratic with real coefficients.
**Case 1: Two distinct real roots** $\lambda_1 \neq \lambda_2$. The general solution is $y(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x}$.
**Case 2: A repeated real root** $\lambda_1 = \lambda_2 = \lambda$. The exponential $e^{\lambda x}$ provides one solution, but we need two linearly independent solutions. A second is obtained by *detuning*: write $\lambda_1 = \lambda + \varepsilon$ and $\lambda_2 = \lambda - \varepsilon$ for small $\varepsilon > 0$, so that the general solution for the non-degenerate case is $c_1 e^{(\lambda+\varepsilon)x} + c_2 e^{(\lambda-\varepsilon)x}$. Choosing $c_1 = c_2 = 1/(2\varepsilon)$ and taking $\varepsilon \to 0$:
\begin{align*}
\lim_{\varepsilon \to 0} \frac{e^{(\lambda+\varepsilon)x} - e^{(\lambda-\varepsilon)x}}{2\varepsilon} = \lim_{\varepsilon \to 0} e^{\lambda x} \frac{e^{\varepsilon x} - e^{-\varepsilon x}}{2\varepsilon} = e^{\lambda x} \cdot x,
\end{align*}
where the last step uses [L'Hôpital's Rule](/theorems/828) (or equivalently, the Taylor expansion $e^{\varepsilon x} = 1 + \varepsilon x + O(\varepsilon^2)$). The general solution is therefore $y(x) = (c_1 + c_2 x) e^{\lambda x}$.
**Case 3: Complex conjugate roots** $\lambda = \alpha \pm i\beta$ with $\beta \neq 0$. Euler's formula gives $e^{(\alpha + i\beta)x} = e^{\alpha x}(\cos \beta x + i \sin \beta x)$. Taking real and imaginary parts:
\begin{align*}
y(x) = e^{\alpha x}(c_1 \cos \beta x + c_2 \sin \beta x).
\end{align*}
The parameter $\alpha$ controls exponential growth or decay, while $\beta$ controls the frequency of oscillation. When $\alpha = 0$, the solutions are purely oscillatory — this is the undamped case.
[example: Damped Harmonic Oscillator]
The equation $y'' + 2\gamma y' + \omega_0^2 y = 0$ models a damped oscillator with natural frequency $\omega_0 > 0$ and damping coefficient $\gamma > 0$. The characteristic equation is $\lambda^2 + 2\gamma \lambda + \omega_0^2 = 0$, with roots:
\begin{align*}
\lambda = -\gamma \pm \sqrt{\gamma^2 - \omega_0^2}.
\end{align*}
The three damping regimes correspond to the three cases above. When $\gamma < \omega_0$ (underdamping), the roots are complex: $\lambda = -\gamma \pm i\sqrt{\omega_0^2 - \gamma^2}$, and the solution oscillates with exponentially decaying amplitude. When $\gamma = \omega_0$ (critical damping), the roots coalesce: $\lambda = -\gamma$ (repeated), and the solution is $(c_1 + c_2 t)e^{-\gamma t}$, which decays without oscillation. When $\gamma > \omega_0$ (overdamping), both roots are real and negative, and the solution decays as a sum of two exponentials. Critical damping is the fastest return to equilibrium without oscillation — a fact exploited in the design of shock absorbers and galvanometers.
[/example]
## The Phase Space Viewpoint and Reduction of Order
A second-order ODE $y'' = F(x, y, y')$ can always be rewritten as a first-order *system* by introducing the phase space variable $Y = (y, y')^\top$:
\begin{align*}
\frac{d}{dx} \begin{pmatrix} y \\ y' \end{pmatrix} = \begin{pmatrix} y' \\ F(x, y, y') \end{pmatrix}.
\end{align*}
For the linear equation $y'' + p(x)y' + q(x)y = 0$, this becomes $Y' = A(x) Y$ where $A(x) = \begin{pmatrix} 0 & 1 \\ -q(x) & -p(x) \end{pmatrix}$. The phase space formulation makes the structure of the solution space transparent: the general solution is a two-dimensional vector space (since the initial conditions $y(x_0)$ and $y'(x_0)$ are two free parameters), and any basis of this space provides two linearly independent solutions.
### The Wronskian
The test for linear independence of solutions is the Wronskian determinant.
[definition: Wronskian]
Let $y_1: I \to \mathbb{R}$ and $y_2: I \to \mathbb{R}$ be differentiable functions on an interval $I$. Their **Wronskian** is
\begin{align*}
W(x) = \det \begin{pmatrix} y_1 & y_2 \\ y_1' & y_2' \end{pmatrix} = y_1(x) y_2'(x) - y_2(x) y_1'(x).
\end{align*}
[/definition]
If $y_1$ and $y_2$ are linearly dependent (one is a constant multiple of the other), then $W = 0$ identically. The converse holds for solutions of a second-order linear ODE: if $W(x_0) = 0$ at any single point $x_0$, then $W \equiv 0$ and the solutions are linearly dependent. This remarkable all-or-nothing property is the content of Abel's theorem.
[quotetheorem:834]
[Abel's Theorem](/theorems/834) gives an explicit formula for the Wronskian: $W(x) = W(x_0) \exp\bigl(-\int_{x_0}^x p(u) \, du\bigr)$. Since the exponential is never zero, $W$ either vanishes identically or never vanishes — there is no intermediate case. The proof is a direct computation: differentiate $W$, substitute the ODE to eliminate $y_1''$ and $y_2''$, and recognise the result as the separable equation $W' = -p(x)W$, which we solve using the technique from Chapter 3.
[citeproof:834]
### Reduction of Order
If one solution $y_1$ of a second-order homogeneous ODE is known, the second solution can be found by the substitution $y_2 = v(x) y_1(x)$, where $v$ is an unknown function. Substituting into the ODE and using the fact that $y_1$ is a solution reduces the equation for $v$ to a first-order ODE for $v'$ (the terms involving $v$ without derivatives cancel). [Abel's Theorem](/theorems/834) provides a shortcut: since $W = y_1 y_2' - y_2 y_1' = y_1^2 v'$, we have:
\begin{align*}
v'(x) = \frac{W(x)}{y_1(x)^2} = \frac{W(x_0)}{y_1(x)^2} \exp\left(-\int_{x_0}^x p(u) \, du\right),
\end{align*}
and $v$ is obtained by one further integration.
## Equidimensional Equations
An important class of variable-coefficient equations is the **equidimensional** (or **Euler–Cauchy**) equation:
\begin{align*}
x^2 y'' + bxy' + cy = 0,
\end{align*}
where $b, c \in \mathbb{R}$ are constants. The key observation is that each term $x^k y^{(k)}$ has the same "dimension" — replacing $x$ by $\alpha x$ and $y$ by $y$ scales each term by the same power of $\alpha$. This suggests looking for power-law solutions $y = x^\sigma$.
Substituting $y = x^\sigma$, $y' = \sigma x^{\sigma - 1}$, $y'' = \sigma(\sigma - 1) x^{\sigma - 2}$ and dividing by $x^\sigma$:
\begin{align*}
\sigma(\sigma - 1) + b\sigma + c = 0.
\end{align*}
This is the **indicial equation** for the equidimensional ODE. Its roots determine $\sigma$, and the analysis parallels the characteristic equation: distinct real roots give $y = c_1 x^{\sigma_1} + c_2 x^{\sigma_2}$; a repeated root gives $y = (c_1 + c_2 \ln x) x^\sigma$; and complex roots $\sigma = \alpha \pm i\beta$ give $y = x^\alpha(c_1 \cos(\beta \ln x) + c_2 \sin(\beta \ln x))$.
Alternatively, the substitution $x = e^t$ (or $t = \ln x$ for $x > 0$) transforms the equidimensional equation into a constant-coefficient equation in $t$. To see why, we use the chain rule from Chapter 2. Since $t = \ln x$, we have $dt/dx = 1/x$, so $\frac{d}{dx} = \frac{1}{x} \frac{d}{dt}$ and therefore $xy' = \frac{dy}{dt}$. For the second derivative, we differentiate $y' = \frac{1}{x} \frac{dy}{dt}$ using the product rule:
\begin{align*}
y'' = \frac{d}{dx}\left(\frac{1}{x} \frac{dy}{dt}\right) = -\frac{1}{x^2} \frac{dy}{dt} + \frac{1}{x} \frac{d}{dx}\left(\frac{dy}{dt}\right) = -\frac{1}{x^2} \frac{dy}{dt} + \frac{1}{x^2} \frac{d^2 y}{dt^2},
\end{align*}
so $x^2 y'' = \frac{d^2 y}{dt^2} - \frac{dy}{dt}$. Substituting into $x^2 y'' + bxy' + cy = 0$:
\begin{align*}
\frac{d^2 y}{dt^2} - \frac{dy}{dt} + b \frac{dy}{dt} + cy = \frac{d^2 y}{dt^2} + (b - 1)\frac{dy}{dt} + cy = 0,
\end{align*}
a constant-coefficient equation in $t$, which we solve using the characteristic equation from §1.
## Particular Integrals and Variation of Parameters
For the inhomogeneous equation $y'' + p(x)y' + q(x)y = f(x)$, the general solution is $y = y_h + y_p$, where $y_h = c_1 y_1 + c_2 y_2$ is the complementary function (general solution of the homogeneous equation) and $y_p$ is any particular integral. The method of **variation of parameters** provides a systematic formula for $y_p$ that works for *any* continuous forcing $f(x)$, given two linearly independent complementary functions.
[quotetheorem:835]
The idea behind [Variation of Parameters](/theorems/835) is to "promote" the constants $c_1, c_2$ in the complementary function to unknown functions $u(x), v(x)$: write $y_p = u(x)y_1(x) + v(x)y_2(x)$. The two unknown functions provide one degree of freedom more than needed (we have two unknowns but only one second-order equation), so we impose the constraint $u'y_1 + v'y_2 = 0$ to simplify the computation. Substituting into the ODE then yields $u'y_1' + v'y_2' = f(x)$. Together with the constraint, this is a $2 \times 2$ linear system for $u'$ and $v'$ whose coefficient matrix is the Wronskian matrix — guaranteed to be invertible by [Abel's Theorem](/theorems/834).
[citeproof:835]
### Resonance
When the forcing term $f(x)$ is itself a solution of the homogeneous equation, the standard ansatz for a particular integral fails — this is **resonance**. The physical picture is that the driving force is precisely in tune with the system's natural frequency, causing the response to grow without bound (or, in the presence of damping, to reach a large finite amplitude).
For a constant-coefficient equation $y'' + \omega_0^2 y = F_0 \cos(\omega t)$, the standard guess $y_p = A \cos(\omega t) + B \sin(\omega t)$ fails when $\omega = \omega_0$ because $\cos(\omega_0 t)$ is already a complementary function. The detuning technique from Case 2 of the characteristic equation resolves this: write $\omega = \omega_0 + \varepsilon$ for small $\varepsilon > 0$ and solve the non-resonant problem. A standard calculation (substituting and matching coefficients) gives
\begin{align*}
y_p = \frac{F_0}{\omega_0^2 - \omega^2} \cos(\omega t) = \frac{F_0}{(\omega_0 - \omega)(\omega_0 + \omega)} \cos(\omega t).
\end{align*}
This particular integral is valid only for $\omega \neq \omega_0$. To find the resonant response, we form the general solution at $\omega = \omega_0 + \varepsilon$ (with $A \cos(\omega_0 t)$ from the complementary function), choose the constant $A$ so that $y(0) = 0$ (giving $A = -F_0/(\omega_0^2 - \omega^2)$), and take $\varepsilon \to 0$:
\begin{align*}
y(t) &= \frac{F_0}{\omega_0^2 - (\omega_0 + \varepsilon)^2}\bigl(\cos((\omega_0 + \varepsilon)t) - \cos(\omega_0 t)\bigr) \\
&= \frac{F_0}{-\varepsilon(2\omega_0 + \varepsilon)} \bigl(\cos((\omega_0 + \varepsilon)t) - \cos(\omega_0 t)\bigr).
\end{align*}
As $\varepsilon \to 0$, the numerator $\cos((\omega_0 + \varepsilon)t) - \cos(\omega_0 t) \to -\varepsilon t \sin(\omega_0 t) + O(\varepsilon^2)$ by [Taylor's Theorem](/theorems/827) applied to the $\varepsilon$ dependence, while the denominator is $-2\omega_0 \varepsilon + O(\varepsilon^2)$. The $\varepsilon$ factors cancel:
\begin{align*}
y_p = \lim_{\varepsilon \to 0} \frac{-F_0 \varepsilon t \sin(\omega_0 t)}{-2\omega_0 \varepsilon} = \frac{F_0 t}{2\omega_0} \sin(\omega_0 t),
\end{align*}
which grows linearly in time — the hallmark of resonance. With damping ($y'' + 2\gamma y' + \omega_0^2 y = F_0 \cos(\omega t)$), the response remains bounded but reaches a maximum amplitude at $\omega \approx \omega_0$ (for small $\gamma$), with peak amplitude proportional to $1/\gamma$.
### Delta Function and Heaviside Forcing
The [Variation of Parameters](/theorems/835) formula applies to any continuous forcing, but in applications we often encounter forcing that is singular (the Dirac delta $\delta(x - x_0)$) or discontinuous (the Heaviside step function $H(x - x_0)$).
[definition: Heaviside Step Function]
The **Heaviside step function** is $H(x) = \begin{cases} 0 & x < 0, \\ 1 & x \geq 0. \end{cases}$
[/definition]
For the equation $y'' + p(x)y' + q(x)y = \delta(x - x_0)$ (delta-function forcing), the solution is continuous but its derivative has a jump discontinuity at $x = x_0$. Integrating the ODE across a small interval $[x_0 - \varepsilon, x_0 + \varepsilon]$ and taking $\varepsilon \to 0$, the $\delta$-function contributes $1$ on the right-hand side, the $y$ and $y'$ terms contribute zero (since $y$ is continuous and the interval shrinks to zero width), and the $y''$ term contributes $[y']_{x_0^-}^{x_0^+}$. This gives the **jump condition**:
\begin{align*}
[y']_{x_0^-}^{x_0^+} = y'(x_0^+) - y'(x_0^-) = 1.
\end{align*}
For Heaviside forcing $y'' + p(x)y' + q(x)y = H(x - x_0)$, the solution is $C^1$ but its second derivative has a jump. This can be verified by differentiating the Heaviside solution to recover the delta-function case.
## Series Solutions and Fuchs' Theorem
When the coefficient functions $p(x)$ and $q(x)$ are not constant but are analytic (expressible as convergent power series), we can seek solutions of $y'' + p(x)y' + q(x)y = 0$ as power series themselves. This is the method of **Frobenius**, and its theoretical underpinning is [Fuchs' Theorem](/theorems/836).
The classification of points as ordinary or singular determines which type of series expansion is appropriate.
[definition: Ordinary And Singular Points]
Consider the ODE $P(x)y'' + Q(x)y' + R(x)y = 0$. A point $x_0$ is an **ordinary point** if $Q(x)/P(x)$ and $R(x)/P(x)$ are both analytic at $x_0$. Otherwise, $x_0$ is a **singular point**. A singular point $x_0$ is **regular** if $(x - x_0) Q(x)/P(x)$ and $(x - x_0)^2 R(x)/P(x)$ are both analytic at $x_0$; otherwise it is **irregular**.
[/definition]
The distinction between regular and irregular singular points is crucial: at a regular singular point, the coefficients blow up, but not too fast — at worst like $1/(x - x_0)$ for $p$ and $1/(x - x_0)^2$ for $q$. This controlled singularity allows the Frobenius method to succeed.
[quotetheorem:836]
[Fuchs' Theorem](/theorems/836) guarantees that at an ordinary point, we can find two linearly independent power series solutions converging in a neighbourhood, and at a regular singular point, we can find at least one Frobenius solution of the form $(x - x_0)^\sigma \sum a_n (x - x_0)^n$. The exponent $\sigma$ is determined by the **indicial equation** — the lowest-order algebraic condition obtained by substituting the Frobenius ansatz into the ODE.
### The Frobenius Method
At a regular singular point $x_0$ (which we may take to be $0$ by translation), the equation takes the form:
\begin{align*}
x^2 y'' + x b(x) y' + c(x) y = 0,
\end{align*}
where $b(x)$ and $c(x)$ are analytic at $x = 0$. We substitute the Frobenius ansatz $y = x^\sigma \sum_{n=0}^\infty a_n x^n = \sum_{n=0}^\infty a_n x^{n+\sigma}$ with $a_0 \neq 0$, compute each term of the ODE as a power series in $x$, and equate coefficients of $x^{n+\sigma}$ to zero.
The lowest power of $x$ (the $n = 0$ term) gives the indicial equation:
\begin{align*}
\sigma(\sigma - 1) + b(0) \sigma + c(0) = 0,
\end{align*}
a quadratic in $\sigma$ whose roots $\sigma_1, \sigma_2$ (the **exponents** of the singularity) determine the leading behaviour of the solutions.
Higher powers of $x$ give recurrence relations for the coefficients $a_n$ in terms of $a_0, a_1, \ldots, a_{n-1}$. These recurrence relations determine all coefficients once $a_0$ (and sometimes $a_1$) are chosen freely.
### Second Solutions at Regular Singular Points
The first Frobenius solution (corresponding to the larger root $\sigma_2$ of the indicial equation) is always guaranteed by [Fuchs' Theorem](/theorems/836). The form of the second solution depends on the relationship between the two roots:
[quotetheorem:837]
The [Second Solutions theorem](/theorems/837) classifies three cases. When $\sigma_2 - \sigma_1 \notin \mathbb{Z}$, both roots yield independent Frobenius series — the most favourable situation. When $\sigma_2 - \sigma_1$ is a positive integer, the second solution may (but need not) involve a logarithmic term $y_1(x) \ln(x - x_0)$. When the roots coincide ($\sigma_1 = \sigma_2$), the logarithmic term is always required — a single Frobenius ansatz cannot produce two linearly independent solutions.
The appearance of logarithms is not a pathology but a structural feature: it arises from the reduction of order method. Writing $y_2 = v(x) y_1(x)$ and using [Abel's Theorem](/theorems/834) to find $v'$, the integral $\int v' \, dx$ produces a $\ln(x - x_0)$ term precisely when the indicial roots coincide or differ by an integer.
[example: Bessel's Equation Near The Origin]
Bessel's equation of order $\nu$ is $x^2 y'' + x y' + (x^2 - \nu^2) y = 0$. The origin $x = 0$ is a regular singular point (since $b(x) = 1$ and $c(x) = x^2 - \nu^2$ are analytic). The indicial equation is $\sigma^2 - \nu^2 = 0$, giving $\sigma = \pm \nu$.
When $\nu \notin \mathbb{Z}/2$ (the generic case), the two roots differ by a non-integer amount, and [Fuchs' Theorem](/theorems/836) yields two independent Frobenius solutions $J_\nu(x)$ and $J_{-\nu}(x)$. When $\nu \in \mathbb{N}_0$ (the integer case), the roots differ by an integer, and the second solution $Y_\nu(x)$ (the Bessel function of the second kind) involves a $\ln x$ term.
[/example]
The series and Frobenius methods, underpinned by [Fuchs' Theorem](/theorems/836), extend the reach of exact solution methods to a wide class of variable-coefficient equations. Many special functions of mathematical physics — Bessel functions, Legendre polynomials, Airy functions, hypergeometric functions — arise as Frobenius solutions at regular singular points of specific second-order ODEs. The general theory developed in this chapter provides the framework for constructing and analysing all of them.
With the linear ODE toolkit complete — complementary functions via the characteristic equation, particular integrals via variation of parameters, and series solutions via Frobenius — we are ready for the final chapter, which combines everything: multivariable calculus from Chapter 2, stability analysis from Chapter 3, and the linear systems perspective from this chapter, applied to functions of several variables and to partial differential equations.
# Multivariable Functions and Dynamical Systems
The previous chapters treated differential equations in one independent variable — time or position along a line. This final chapter brings together the multivariable calculus of Chapter 2, the stability analysis of Chapter 3, and the linear theory of Chapter 4, applying them to functions of several variables and to systems of coupled ODEs. The chapter has three parts: the classification of stationary points of scalar functions $f(x, y)$ via the Hessian matrix; the qualitative theory of planar dynamical systems $\dot{x} = f(x, y)$, $\dot{y} = g(x, y)$; and an introduction to partial differential equations through the wave equation.
The unifying idea is Taylor expansion in several variables. Classifying stationary points requires the second-order multivariable Taylor expansion — the Hessian quadratic form — and the sign-definiteness criterion of [Sylvester](/theorems/839). Analysing planar dynamical systems requires the first-order multivariable Taylor expansion — the Jacobian linearisation — and the eigenvalue classification from Chapter 4. Solving the wave equation uses the change-of-variables technique from Chapter 2 to reduce a PDE to an ODE, and [D'Alembert's Formula](/theorems/665) gives the explicit solution.
## Contours and the Gradient
Consider a function $f(x, y)$ representing, say, the height of a landscape. A hiker at position $(x_0, y_0)$ wants to know: in which direction should I walk to ascend most steeply? And which directions lead neither up nor down — the directions along level ground? These two questions motivate the gradient vector and its relationship to contours, and the answers depend entirely on the [Multivariable Chain Rule](/theorems/830).
[definition: Contour]
Let $f: U \to \mathbb{R}$ be defined on an [open set](/page/Open%20Set) $U \subseteq \mathbb{R}^2$. A **contour** (or **level curve**) of $f$ at height $c \in \mathbb{R}$ is the set
\begin{align*}
\{(x, y) \in U : f(x, y) = c\}.
\end{align*}
[/definition]
[definition: Gradient]
Let $f: U \to \mathbb{R}$ be differentiable on an open set $U \subseteq \mathbb{R}^2$. The **gradient** of $f$ at $(x, y) \in U$ is the vector
\begin{align*}
\nabla f(x, y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right).
\end{align*}
[/definition]
To understand the gradient geometrically, we compute the rate of change of $f$ along a straight line through $(x_0, y_0)$ in the direction of a unit vector $\hat{n} = (n_1, n_2)$. Parametrise the line as $(x(t), y(t)) = (x_0 + n_1 t, y_0 + n_2 t)$, so that $dx/dt = n_1$ and $dy/dt = n_2$. By the [Multivariable Chain Rule](/theorems/830):
\begin{align*}
\frac{d}{dt}\bigg|_{t=0} f(x_0 + n_1 t, y_0 + n_2 t) = \frac{\partial f}{\partial x} n_1 + \frac{\partial f}{\partial y} n_2 = \nabla f \cdot \hat{n}.
\end{align*}
[definition: Directional Derivative]
The **directional derivative** of $f$ at $(x_0, y_0)$ in the direction of a unit vector $\hat{n}$ is
\begin{align*}
D_{\hat{n}} f := \frac{d}{dt}\bigg|_{t=0} f(x_0 + n_1 t, y_0 + n_2 t) = \nabla f \cdot \hat{n}.
\end{align*}
[/definition]
The directional derivative answers both questions from the opening paragraph. Writing $\nabla f \cdot \hat{n} = |\nabla f| \cos \theta$ where $\theta$ is the angle between $\nabla f$ and $\hat{n}$: the directional derivative is maximised when $\theta = 0$ (walk parallel to $\nabla f$ — the direction of steepest ascent) and equals zero when $\theta = \pi/2$ (walk perpendicular to $\nabla f$ — along a contour). The gradient is therefore perpendicular to contours: if $(x(t), y(t))$ is a curve lying on the contour $f = c$, then $f(x(t), y(t)) = c$ for all $t$, and differentiation via the chain rule gives $\nabla f \cdot (\dot{x}, \dot{y}) = 0$, confirming that $\nabla f$ is orthogonal to the tangent vector of any contour curve.
## Stationary Points and the Hessian
A stationary point of $f(x, y)$ is a point where the gradient vanishes: $\nabla f = \mathbf{0}$. At such a point, the function has no preferred direction of increase — it could be a local maximum, a local minimum, or a saddle point. To distinguish these, we need the second-order behaviour of $f$, captured by the Hessian matrix.
[definition: Stationary Point]
A point $(x_0, y_0)$ is a **stationary point** (or **critical point**) of $f: U \to \mathbb{R}$ if $\nabla f(x_0, y_0) = \mathbf{0}$, i.e., $\frac{\partial f}{\partial x}(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) = 0$.
[/definition]
[definition: Hessian Matrix]
The **Hessian matrix** of $f: U \to \mathbb{R}$ at a point $(x_0, y_0)$ is the symmetric matrix of second partial derivatives:
\begin{align*}
H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}\bigg|_{(x_0, y_0)},
\end{align*}
where $f_{xy} = f_{yx}$ by [Schwarz's theorem](/theorems/332), assuming $f \in C^2$.
[/definition]
By the multivariable Taylor expansion (see theorem [85](/theorems/85)), near a stationary point:
\begin{align*}
f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + \frac{1}{2} \begin{pmatrix} \delta x & \delta y \end{pmatrix} H \begin{pmatrix} \delta x \\ \delta y \end{pmatrix} + o(|\delta x|^2).
\end{align*}
The first-order terms vanish because $\nabla f = \mathbf{0}$ at a stationary point. The quadratic form $\frac{1}{2} \deltax^\top H \, \deltax$ determines the local shape:
If $H$ is **positive definite** (both eigenvalues positive): the quadratic form is positive for all nonzero $\deltax$, so $f$ increases in every direction — **local minimum**.
If $H$ is **negative definite** (both eigenvalues negative): $f$ decreases in every direction — **local maximum**.
If $H$ is **indefinite** (eigenvalues of opposite signs): $f$ increases in some directions and decreases in others — **saddle point**.
If $H$ is **semidefinite** ($\det H = 0$): the second-order test is inconclusive, and higher-order terms must be examined.
### Sylvester's Criterion
Computing eigenvalues of the Hessian is one way to test definiteness, but for $2 \times 2$ matrices there is a simpler algebraic test via the determinant and trace.
[quotetheorem:839]
For a $2 \times 2$ Hessian $H = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}$, [Sylvester's Criterion](/theorems/839) reduces to:
$\det H > 0$ and $f_{xx} > 0$: **local minimum** ($H$ positive definite).
$\det H > 0$ and $f_{xx} < 0$: **local maximum** ($H$ negative definite).
$\det H < 0$: **saddle point** ($H$ indefinite).
$\det H = 0$: **inconclusive** ($H$ semidefinite).
Since $\det H = f_{xx} f_{yy} - f_{xy}^2$, this provides a completely explicit classification in terms of the second partial derivatives — no eigenvalue computation needed.
[example: Classifying Stationary Points Of A Polynomial]
Consider $f(x, y) = x^3 - 3xy + y^3$. Setting $\nabla f = \mathbf{0}$:
\begin{align*}
f_x = 3x^2 - 3y = 0 \quad &\implies \quad y = x^2, \\
f_y = -3x + 3y^2 = 0 \quad &\implies \quad x = y^2.
\end{align*}
Substituting $y = x^2$ into $x = y^2$: $x = x^4$, so $x(x^3 - 1) = 0$, giving $x = 0$ (with $y = 0$) and $x = 1$ (with $y = 1$).
The Hessian is $H = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}$.
At $(0, 0)$: $H = \begin{pmatrix} 0 & -3 \\ -3 & 0 \end{pmatrix}$, $\det H = -9 < 0$ — **saddle point**.
At $(1, 1)$: $H = \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix}$, $\det H = 36 - 9 = 27 > 0$ and $f_{xx} = 6 > 0$ — **local minimum**.
[/example]
## Linear Systems and Phase Portraits
The second-order linear ODE $y'' + py' + qy = 0$ with constant coefficients can be written as a first-order system $\dot{Y} = AY$ where $Y = (y, y')^\top$ and $A = \begin{pmatrix} 0 & 1 \\ -q & -p \end{pmatrix}$. The eigenvalues of $A$ are precisely the roots of the characteristic equation from the previous section. The general solution $Y(t) = c_1 v_1 e^{\lambda_1 t} + c_2 v_2 e^{\lambda_2 t}$ traces out curves in the $(y, y')$-plane — the **phase portrait** of the system. The qualitative shape of these curves is determined entirely by the eigenvalue type, and the reason is visible directly from the general solution.
**Real eigenvalues, same sign (node).** If $\lambda_1, \lambda_2$ are both real and negative (say $\lambda_1 < \lambda_2 < 0$), then each exponential $e^{\lambda_i t}$ decays as $t \to \infty$. Since $e^{\lambda_1 t}$ decays faster than $e^{\lambda_2 t}$, for large $t$ the solution is dominated by the slower term: $Y(t) \approx c_2 v_2 e^{\lambda_2 t}$. Trajectories therefore approach the origin tangent to the eigenvector $v_2$ (the "slow eigendirection") — there is no rotation, only exponential decay along the eigenvector directions. This is a **stable node**. If both eigenvalues are positive, the same argument with $t \to -\infty$ shows trajectories recede from the origin — an **unstable node**.
**Real eigenvalues, opposite signs (saddle).** If $\lambda_1 < 0 < \lambda_2$, then the component along $v_1$ decays ($c_1 e^{\lambda_1 t} \to 0$) while the component along $v_2$ grows ($c_2 e^{\lambda_2 t} \to \infty$). A trajectory starting near the origin is pushed away along $v_2$ and pulled toward $v_1$, tracing hyperbolic curves. Only the special trajectories with $c_2 = 0$ (which lie exactly along $v_1$) converge to the origin; all others diverge. This is a **saddle point** — always unstable.
**Complex conjugate eigenvalues (spiral or centre).** If $\lambda = \alpha \pm i\beta$ with $\beta \neq 0$, the general solution involves terms $e^{\alpha t} \cos(\beta t)$ and $e^{\alpha t} \sin(\beta t)$. The trigonometric factors cause trajectories to *rotate* around the origin, while the exponential factor $e^{\alpha t}$ causes them to grow or shrink. If $\alpha < 0$, the rotation is accompanied by decay — a **stable spiral** (or **stable focus**). If $\alpha > 0$, by growth — an **unstable spiral**. If $\alpha = 0$ (purely imaginary eigenvalues), there is no growth or decay, and the trajectories are closed ellipses — a **centre**.
**Repeated eigenvalue (degenerate node or star).** If $\lambda_1 = \lambda_2 = \lambda$ and the eigenspace is two-dimensional ($A = \lambda I$), every nonzero vector is an eigenvector, so all trajectories are straight lines through the origin — a **star node**. If the eigenspace is one-dimensional, the general solution involves a term $t e^{\lambda t}$ (from the detuning argument of §1), and trajectories approach the origin tangent to the single eigenvector direction — a **degenerate node**.
### The Trace-Determinant Classification
These cases can be organised using two quantities: $\operatorname{tr}(A) = \lambda_1 + \lambda_2$ and $\det(A) = \lambda_1 \lambda_2$. The eigenvalues are the roots of the characteristic polynomial $\lambda^2 - \operatorname{tr}(A) \lambda + \det(A) = 0$, whose discriminant is:
\begin{align*}
\Delta = \operatorname{tr}(A)^2 - 4\det(A).
\end{align*}
The sign of $\Delta$ determines whether the roots are real or complex: $\Delta > 0$ gives two distinct real roots, $\Delta = 0$ gives a repeated root, and $\Delta < 0$ gives complex conjugates. Meanwhile, $\det(A) = \lambda_1 \lambda_2$ determines whether the real roots have the same or opposite signs: $\det(A) > 0$ means same sign (node), $\det(A) < 0$ means opposite signs (saddle). Combining:
Saddles occur when $\det(A) < 0$. Nodes occur when $\det(A) > 0$ and $\Delta > 0$ (i.e., $\operatorname{tr}(A)^2 > 4\det(A)$). Spirals occur when $\det(A) > 0$ and $\Delta < 0$ (i.e., $\operatorname{tr}(A)^2 < 4\det(A)$). Centres occur when $\operatorname{tr}(A) = 0$ and $\det(A) > 0$ (purely imaginary eigenvalues). Stability within each class is determined by the sign of $\operatorname{tr}(A) = \lambda_1 + \lambda_2$: the equilibrium is stable if $\operatorname{tr}(A) < 0$ (both eigenvalues have negative real part) and unstable if $\operatorname{tr}(A) > 0$.
## Nonlinear Dynamical Systems
The phase portrait analysis of linear systems extends to *nonlinear* planar systems $\dot{x} = f(x, y)$, $\dot{y} = g(x, y)$ via linearisation — the same perturbation argument used for scalar equations in Chapter 3, now applied in two dimensions.
At an equilibrium $(x_0, y_0)$ where $f(x_0, y_0) = g(x_0, y_0) = 0$, we write $(x, y) = (x_0 + \xi, y_0 + \eta)$ and expand $f$ and $g$ via the multivariable Taylor expansion (theorem [85](/theorems/85)):
\begin{align*}
\dot{\xi} &= f(x_0 + \xi, y_0 + \eta) = \xi f_x + \eta f_y + O(|\varepsilon|^2), \\
\dot{\eta} &= g(x_0 + \xi, y_0 + \eta) = \xi g_x + \eta g_y + O(|\varepsilon|^2),
\end{align*}
where $\varepsilon = (\xi, \eta)$ and all partial derivatives are evaluated at $(x_0, y_0)$. Dropping the $O(|\varepsilon|^2)$ terms gives the linearised system:
\begin{align*}
\begin{pmatrix} \dot{\xi} \\ \dot{\eta} \end{pmatrix} = M \begin{pmatrix} \xi \\ \eta \end{pmatrix}, \qquad M = \begin{pmatrix} f_x & f_y \\ g_x & g_y \end{pmatrix}\bigg|_{(x_0, y_0)}.
\end{align*}
The matrix $M$ is the Jacobian of the vector field at the equilibrium. By [Linearised Stability](/theorems/698), the eigenvalues of $M$ determine the stability:
If all eigenvalues have negative real part: the equilibrium is **asymptotically stable**.
If any eigenvalue has positive real part: the equilibrium is **unstable**.
If eigenvalues are purely imaginary: the linearisation is **inconclusive** — the nonlinear terms may create a stable spiral, an unstable spiral, or a true centre. This is the *only* case where the linear classification can be overturned by nonlinear effects.
[example: Predator Prey System]
The Lotka–Volterra equations $\dot{x} = x(\alpha - \beta y)$, $\dot{y} = y(-\gamma + \delta x)$ model a predator-prey interaction ($x$ = prey, $y$ = predator, $\alpha, \beta, \gamma, \delta > 0$). The equilibria are $(0, 0)$ and $(x^*, y^*) = (\gamma/\delta, \alpha/\beta)$.
At $(0, 0)$: the Jacobian is $M = \begin{pmatrix} \alpha & 0 \\ 0 & -\gamma \end{pmatrix}$ with eigenvalues $\alpha > 0$ and $-\gamma < 0$ — a **saddle point** (unstable).
At $(\gamma/\delta, \alpha/\beta)$: the Jacobian is $M = \begin{pmatrix} 0 & -\beta\gamma/\delta \\ \delta\alpha/\beta & 0 \end{pmatrix}$ with $\operatorname{tr}(M) = 0$ and $\det(M) = \alpha\gamma > 0$, giving purely imaginary eigenvalues $\lambda = \pm i\sqrt{\alpha\gamma}$. The linearisation predicts a **centre**, and indeed the Lotka–Volterra system has a conserved quantity $H(x, y) = \delta x - \gamma \ln x + \beta y - \alpha \ln y$, confirming that the orbits are closed curves (not spirals). This is one of the rare cases where the linearisation's centre prediction is confirmed by the nonlinear system.
[/example]
## Partial Differential Equations: The Wave Equation
The final topic extends the ODE framework to partial differential equations — equations involving partial derivatives with respect to multiple independent variables. We focus on the one-dimensional wave equation, which combines the change-of-variables technique from Chapter 2 with the superposition principle from Chapter 4.
The one-dimensional wave equation governs the displacement $u(x, t)$ of a vibrating string:
\begin{align*}
\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2},
\end{align*}
where $c > 0$ is the wave speed.
### Factorisation via Change of Variables
The wave operator $\partial_t^2 - c^2 \partial_x^2$ factors as $(\partial_t - c\partial_x)(\partial_t + c\partial_x)$. This suggests introducing **characteristic coordinates** $\xi = x + ct$ and $\eta = x - ct$. By the change-of-variables formula from Chapter 2:
\begin{align*}
\frac{\partial}{\partial x} = \frac{\partial}{\partial \xi} + \frac{\partial}{\partial \eta}, \qquad \frac{\partial}{\partial t} = c\frac{\partial}{\partial \xi} - c\frac{\partial}{\partial \eta}.
\end{align*}
In these coordinates, the wave equation becomes:
\begin{align*}
\frac{\partial^2 u}{\partial \xi \partial \eta} = 0.
\end{align*}
This is trivially solved: integrating with respect to $\eta$ gives $\frac{\partial u}{\partial \xi} = F'(\xi)$ for some function $F$; integrating with respect to $\xi$ gives $u(\xi, \eta) = F(\xi) + G(\eta)$. Reverting to the original coordinates:
\begin{align*}
u(x, t) = F(x + ct) + G(x - ct).
\end{align*}
This is the **general solution** of the wave equation: a superposition of a rightward-travelling wave $G(x - ct)$ and a leftward-travelling wave $F(x + ct)$, each maintaining its shape as it propagates at speed $c$.
### D'Alembert's Formula
The initial-value problem specifies the initial displacement $u(x, 0) = g(x)$ and the initial velocity $\partial_t u(x, 0) = h(x)$. Substituting $t = 0$ into the general solution and its time derivative:
\begin{align*}
g(x) &= F(x) + G(x), \\
h(x) &= cF'(x) - cG'(x).
\end{align*}
Solving for $F$ and $G$ and substituting back gives d'Alembert's formula.
[quotetheorem:665]
[D'Alembert's Formula](/theorems/665) expresses the solution as:
\begin{align*}
u(x, t) = \frac{1}{2}\bigl(g(x + ct) + g(x - ct)\bigr) + \frac{1}{2c}\int_{x - ct}^{x + ct} h(s) \, ds.
\end{align*}
The first term shows that the initial displacement splits into two halves, each travelling in opposite directions at speed $c$. The second term shows that the initial velocity creates a wave whose profile is the integral of $h$ over a growing interval — the **domain of dependence** $[x - ct, x + ct]$ of the point $(x, t)$.
### First-Order PDEs and the Method of Characteristics
The factorisation of the wave equation into two first-order operators suggests studying first-order PDEs directly. Consider the general first-order linear PDE:
\begin{align*}
a(x, y) \frac{\partial u}{\partial x} + b(x, y) \frac{\partial u}{\partial y} = c(x, y, u).
\end{align*}
The left-hand side is the directional derivative of $u$ along the vector field $(a, b)$. The **method of characteristics** parametrises the curves $(x(t), y(t))$ that are tangent to this vector field:
\begin{align*}
\frac{dx}{dt} = a(x, y), \qquad \frac{dy}{dt} = b(x, y).
\end{align*}
Along these **characteristic curves**, the PDE reduces to an ODE:
\begin{align*}
\frac{du}{dt} = \frac{\partial u}{\partial x} \frac{dx}{dt} + \frac{\partial u}{\partial y} \frac{dy}{dt} = a u_x + b u_y = c(x, y, u),
\end{align*}
where the first equality uses the [Multivariable Chain Rule](/theorems/830). This reduction from PDE to ODE along characteristics is one of the most powerful applications of the chain rule in the entire course. The characteristic equations are themselves a system of ODEs, which can be solved using the techniques of Chapters 3 and 4.
[example: Advection Equation]
The simplest first-order PDE is the advection equation $u_t + c u_x = 0$, describing transport at constant speed $c$. The characteristic equations are $\frac{dt}{ds} = 1$, $\frac{dx}{ds} = c$, giving the straight-line characteristics $x = x_0 + cs$, $t = s$. Along each characteristic, $\frac{du}{ds} = 0$, so $u$ is constant: $u(x, t) = u(x - ct, 0) = g(x - ct)$, where $g$ is the initial profile. The solution is simply the initial data translated at speed $c$ — a rightward-travelling wave.
[/example]
This chapter has brought the course full circle. The Taylor expansion from Chapter 1 — now in several variables — classifies stationary points of functions and linearises nonlinear dynamical systems. The chain rule from Chapter 2 transforms the wave equation into a trivially solvable form and reduces first-order PDEs to ODEs along characteristic curves. The linear algebra of eigenvalues from Chapter 4 classifies phase portraits of planar systems. And the stability analysis from Chapter 3, extended to two dimensions, determines whether equilibria of nonlinear systems attract or repel nearby trajectories. These tools, taken together, form the core of applied mathematics at this level.