Integrable systems represent one of the deepest and most beautiful intersections of geometry, dynamics, and analysis in mathematical physics. Rather than viewing these systems as exceptional cases, this course positions integrability as a fundamental organizing principle: many of the nonlinear equations that govern waves, fluids, and field theories possess hidden symmetries and conservation laws that make them exactly solvable. We begin with finite-dimensional Hamiltonian mechanics, where the Arnold-Liouville theorem reveals that integrable systems foliate phase space into invariant tori, allowing us to reduce their dynamics to simple flows on these geometric structures. This classical perspective provides both intuition and rigorous tools that we will extend into infinite dimensions.
The course then transitions to integrable partial differential equations—primarily nonlinear wave equations like the Korteweg-de Vries equation and sine-Gordon equation—where integrability manifests through remarkable properties: solitons that preserve their shape through collisions, exact solutions via algebraic methods, and a surprising connection to the scattering data of an associated linear operator. The Inverse Scattering Transform emerges as the fundamental technique, revealing that certain nonlinear PDEs can be linearized through a clever change of variables involving spectral data. From this vantage point, solving a nonlinear equation reduces to manipulating the spectral parameters of an associated Schrodinger-type problem.
Throughout, we emphasize that integrability is not a computational accident but a structural phenomenon rooted in deep algebraic and geometric principles: infinite-dimensional Hamiltonian structures, Lax representations, and symmetry groups. By the course's end, you will understand how these disparate tools—canonical transformations, Lie algebras, spectral theory, and differential geometry—unify under the umbrella of integrable systems theory.
The theory of ordinary differential equations reaches its most elegant form when a system possesses enough structure to be solved by pure integration rather than case-by-case analysis. This chapter develops the classical theory of integrable Hamiltonian systems, beginning with the geometry of flows generated by vector fields and culminating in the Arnold-Liouville theorem, which characterises exactly when a Hamiltonian system can be reduced to linear motion on a torus. The journey passes through the Poisson bracket formalism, the notion of first integrals, canonical coordinate transformations, and the action-angle variables that make explicit integration possible.
The central question is: when can we find coordinates in which Hamilton's equations become so simple that the solution is just a linear function of time? The answer, given by Arnold-Liouville, requires precisely $n$ independent first integrals in involution for a $2n$-dimensional system. Establishing this requires understanding both the algebraic structure of Hamiltonian flows and the topology of the level sets they foliate.
# 1. Integrability of ODE's
## Vector Fields and Flow Maps
A fundamental question in the theory of differential equations is whether a given initial value problem has a solution, whether that solution is unique, and how it depends on the initial data. For ODE systems of the form $\dot{x} = V(x)$, these questions are answered once we know the vector field $V$ is sufficiently regular.
An $m$-dimensional first-order ODE is specified by a vector field $V: \mathbb{R}^m \to \mathbb{R}^m$ and an initial condition $x_0 \in \mathbb{R}^m$. The task is to find $x: (a, b) \to \mathbb{R}^m$, defined on some open interval $(a, b)$ containing $0$, satisfying
[definition: Initial Value Problem]
The **initial value problem** associated to a vector field $V: \mathbb{R}^m \to \mathbb{R}^m$ and initial condition $x_0 \in \mathbb{R}^m$ is the system
\begin{align*}
\dot{x} = V(x), \quad x(0) = x_0.
\end{align*}
A **solution** is a smooth map $x: (a, b) \to \mathbb{R}^m$ on some open interval $(a, b) \ni 0$ satisfying both conditions.
[/definition]
Restricting to first-order systems is no restriction in practice: any $k$-th order ODE can be rewritten as a first-order system by introducing auxiliary variables for each derivative up to order $k - 1$. The course assumes throughout that vector fields are smooth enough for the following fundamental existence and uniqueness result to hold.
[quotetheorem:1331]
The proof of this result uses the Picard-Lindelöf fixed-point argument and belongs to the Part IB Differential Equations and Analysis courses; it is quoted here without proof.
The smoothness hypothesis on $V$ is essential. To see what can go wrong, consider the scalar ODE $\dot{x} = x^{2/3}$ on $\mathbb{R}$. The right-hand side is continuous but fails to be Lipschitz at $x = 0$. Starting from $x(0) = 0$, both $x(t) = 0$ and $x(t) = (t/3)^3$ are solutions — so uniqueness fails. Even when a local solution exists and is unique, it need not persist for all time: $\dot{x} = x^2$ with $x(0) = 1$ has the unique solution $x(t) = 1/(1-t)$, which blows up at $t = 1$. Global existence requires either a global Lipschitz bound or an a priori estimate preventing blow-up.
The smooth dependence on initial conditions is particularly significant: it allows us to package all solutions simultaneously into a single map.
[definition: Flow Map]
Given a smooth vector field $V: \mathbb{R}^m \to \mathbb{R}^m$, the **flow map** is the family of smooth maps $g^t: \mathbb{R}^m \to \mathbb{R}^m$ defined by
\begin{align*}
g^t(x_0) = x(t),
\end{align*}
where $x(t)$ is the unique solution to $\dot{x} = V(x)$, $x(0) = x_0$. We often write $g^t x_0$ for $g^t(x_0)$.
[/definition]
The flow map encodes the full dynamics: knowing $g^t$ for all $t$ is equivalent to knowing all solutions. Its algebraic properties reflect the deterministic and time-reversible character of smooth ODEs.
[quotetheorem:1351]
[citeproof:1351]
Together, properties (1)-(3) say that $\{g^t : t \in \mathbb{R}\}$ forms a one-parameter group of diffeomorphisms of $\mathbb{R}^m$. The vector field $V$ is called the **infinitesimal generator** of this group, because Taylor-expanding the flow gives
\begin{align*}
g^\varepsilon x_0 = x_0 + \varepsilon V(x_0) + o(\varepsilon),
\end{align*}
so $V(x_0)$ is precisely the initial velocity of the trajectory starting at $x_0$.
### Commutativity of Flows and the Commutator
Having two vector fields $V_1$ and $V_2$ with respective flows $g_1^t$ and $g_2^s$, one naturally asks: do these flows commute? That is, does following the flow of $V_1$ for time $t$ and then $V_2$ for time $s$ give the same result as doing them in the opposite order?
In general, the answer is no. The obstruction is measured by a third vector field constructed from $V_1$ and $V_2$.
[definition: Commutator Of Vector Fields]
For two smooth vector fields $V_1, V_2: \mathbb{R}^m \to \mathbb{R}^m$, the **commutator** (or Lie bracket) is the vector field $[V_1, V_2]: \mathbb{R}^m \to \mathbb{R}^m$ whose $i$-th component is
\begin{align*}
[V_1, V_2]_i = \sum_{j=1}^m \left( (V_1)_j \frac{\partial (V_2)_i}{\partial x_j} - (V_2)_j \frac{\partial (V_1)_i}{\partial x_j} \right).
\end{align*}
Equivalently, in operator notation,
\begin{align*}
[V_1, V_2] = \left(V_1 \cdot \frac{\partial}{\partial x}\right) V_2 - \left(V_2 \cdot \frac{\partial}{\partial x}\right) V_1.
\end{align*}
[/definition]
The commutator measures precisely the failure of the two flows to commute. To see why the formula is the right one, think of following $V_1$ for a small time $\varepsilon$, then $V_2$ for time $\varepsilon$, then $V_1$ backward, then $V_2$ backward. To first order in $\varepsilon$ these paths cancel, but to order $\varepsilon^2$ there is a residual displacement proportional to $[V_1, V_2]$. The commutator thus has an intrinsically second-order geometric meaning: it measures the area enclosed by the four-legged path in phase space.
[quotetheorem:1332]
[citeproof:1332]
The converse direction — showing that $[V_1, V_2] = 0$ implies commutativity of the flows for all $s$ and $t$, not just infinitesimally — is harder and uses the full Taylor expansion of the flow. The proof above establishes the more useful half: if flows commute, the bracket must vanish.
[remark: Commutator As Lie Bracket]
The commutator $[V_1, V_2]$ is the Lie bracket of vector fields. The space of smooth vector fields on $\mathbb{R}^m$ equipped with this bracket forms a Lie algebra. The theorem above is the statement that flows commute if and only if the corresponding Lie algebra elements commute.
[/remark]
## Hamiltonian Dynamics
The general theory of ODEs becomes vastly richer when the vector field has special geometric structure. The most important class for this course is Hamiltonian systems, which model mechanical and physical systems where energy is conserved. The question now is: what makes a system "Hamiltonian", and what extra structure does this impose on the solutions?
The arena for Hamiltonian mechanics is the **phase space** $M = \mathbb{R}^{2n}$, whose points are described by coordinates
\begin{align*}
(q, p) = (q_1, \ldots, q_n, p_1, \ldots, p_n),
\end{align*}
where the $q_i$ are interpreted as generalised position coordinates and the $p_i$ as their conjugate momenta. It is convenient to write $x = (q, p)^\top \in \mathbb{R}^{2n}$.
The pairing between positions and momenta is encoded in the **symplectic matrix**
\begin{align*}
J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix},
\end{align*}
which satisfies $J^2 = -I_{2n}$ and $J^\top = -J$. This matrix plays the role that the metric plays in Riemannian geometry: it defines the fundamental bilinear structure on phase space.
### The Poisson Bracket and First Integrals
To study how functions on phase space evolve under the dynamics, we need a way to combine two observables and produce a third. The Poisson bracket provides this operation.
[definition: Poisson Bracket]
For smooth functions $f, g: M \to \mathbb{R}$ on phase space $M = \mathbb{R}^{2n}$, the **Poisson bracket** is defined by
\begin{align*}
\{f, g\} = \left\langle \frac{\partial f}{\partial x}, J \frac{\partial g}{\partial x} \right\rangle = \frac{\partial f}{\partial q} \cdot \frac{\partial g}{\partial p} - \frac{\partial f}{\partial p} \cdot \frac{\partial g}{\partial q},
\end{align*}
where $\langle \cdot, \cdot \rangle$ denotes the standard inner product on $\mathbb{R}^{2n}$.
[/definition]
The expression $\langle \partial f/\partial x, J \, \partial g / \partial x \rangle$ is the more intrinsic form, since it makes clear how to generalise the Poisson bracket to other phase spaces. The component-wise formula is more useful for computation.
[quotetheorem:1333]
[citeproof:1333]
Properties (2) and (4) together say that $C^\infty(M)$ equipped with the Poisson bracket is a Lie algebra. Property (3) says it is also a derivation of the ring structure of $C^\infty(M)$, making it a **Poisson algebra**. Property (5) is the fundamental commutation relation of classical mechanics.
The Jacobi identity deserves comment. To verify it directly, expand all three cyclic terms using the component formula $\{f,g\} = \sum_i (\partial f/\partial q_i \, \partial g/\partial p_i - \partial f/\partial p_i \, \partial g/\partial q_i)$. Each of the three bracket-of-brackets produces terms involving second partial derivatives of one function and first partials of the other two. Expanding all nine contributions and collecting, every mixed partial $\partial^2 f / \partial q_i \partial p_j$ appears with opposite signs in two of the three cyclic terms, so the entire sum cancels by symmetry of mixed partials. The Jacobi identity is not automatic from antisymmetry alone — it encodes the associativity of the underlying flow composition. A bilinear antisymmetric bracket on $C^\infty(M)$ that fails the Jacobi identity would not correspond to any consistent notion of commuting flows, and the correspondence $[V_f, V_g] = -V_{\{f,g\}}$ would break down.
Now we can define Hamiltonian dynamics. The role of the Hamiltonian is to generate the time evolution of all observables through the Poisson bracket.
Before writing down Hamilton's equations, it is worth appreciating what makes Hamiltonian structure special. Consider a damped harmonic oscillator $\ddot{q} = -\omega^2 q - \gamma \dot{q}$ (with damping coefficient $\gamma > 0$). Written as a first-order system $\dot{q} = p$, $\dot{p} = -\omega^2 q - \gamma p$, the total energy $E = \frac{1}{2}p^2 + \frac{1}{2}\omega^2 q^2$ satisfies $\dot{E} = -\gamma p^2 \leq 0$. Energy is strictly decreasing whenever $p \neq 0$ — the system dissipates. There is no Hamiltonian $H$ such that this system equals $\dot{x} = J \, \partial H / \partial x$: a Hamiltonian system always conserves $H$, so any dissipative system is automatically non-Hamiltonian. The Hamiltonian structure is not merely a notational convenience — it enforces energy conservation and with it all the rigid geometric properties (volume preservation, symplectic structure) that make integrability possible.
[definition: Hamilton's Equations]
Given a smooth function $H: M \to \mathbb{R}$ called the **Hamiltonian**, **Hamilton's equations** are
\begin{align*}
\dot{q} = \frac{\partial H}{\partial p}, \quad \dot{p} = -\frac{\partial H}{\partial q}.
\end{align*}
In compact form using the symplectic matrix,
\begin{align*}
\dot{x} = J \frac{\partial H}{\partial x}.
\end{align*}
We interpret $H$ as the total energy of the system.
[/definition]
The definition captures a profound idea: the Hamiltonian $H$ simultaneously describes the energy of the system and generates all the dynamics through the Poisson bracket. Every observable $f$ evolves via $\dot{f} = \{f, H\}$, so knowing $H$ is equivalent to knowing the equations of motion for every quantity at once. This is quite different from the Newtonian picture, where force laws must be specified separately for each degree of freedom.
[example: Simple Harmonic Oscillator]
Consider a particle of unit mass on a spring with angular frequency $\omega$. The Hamiltonian is the total energy
\begin{align*}
H(q, p) = \frac{1}{2}p^2 + \frac{1}{2}\omega^2 q^2,
\end{align*}
where $q$ is displacement and $p$ is momentum. Hamilton's equations give
\begin{align*}
\dot{q} = \frac{\partial H}{\partial p} = p, \quad \dot{p} = -\frac{\partial H}{\partial q} = -\omega^2 q.
\end{align*}
This is the standard harmonic oscillator $\ddot{q} = -\omega^2 q$. The level sets of $H$ are ellipses $\frac{1}{2}p^2 + \frac{1}{2}\omega^2 q^2 = c$ in the $(q, p)$-plane, and solutions trace these ellipses periodically.
[/example]
The Poisson bracket immediately gives a criterion for conservation.
[quotetheorem:1334]
[citeproof:1334]
This theorem is fundamental: it says that $f$ is conserved along Hamiltonian trajectories if and only if $\{f, H\} = 0$. As an immediate corollary, the Hamiltonian itself is always conserved.
[quotetheorem:1335]
[citeproof:1335]
Conservation of energy means all trajectories lie on hypersurfaces $\{H = c\}$ in phase space. For the harmonic oscillator these are the ellipses pictured above. The topology of these level sets turns out to determine whether the system can be completely integrated. This is a first hint that Hamiltonian dynamics has a strongly geometric character: the algebraic property $\dot{H} = 0$ constrains the trajectory to a specific geometric locus, and the shape of that locus (compact? connected? simply connected?) governs what kind of motion is possible.
### Hamiltonian Vector Fields
The Poisson bracket formalism connects vector fields to functions on phase space in a natural way. Each smooth function $f$ generates a vector field, called its Hamiltonian vector field, via the symplectic matrix.
[definition: Hamiltonian Vector Field]
For a smooth function $f: M \to \mathbb{R}$ on a $2n$-dimensional phase space $M \subseteq \mathbb{R}^{2n}$, the **Hamiltonian vector field** generated by $f$ is the map $V_f: M \to \mathbb{R}^{2n}$ defined by
\begin{align*}
V_f = J \frac{\partial f}{\partial x}.
\end{align*}
In particular, $V_H = J \, \partial H / \partial x$ is the vector field whose flow is the Hamiltonian flow.
[/definition]
The commutator of two Hamiltonian vector fields is itself a Hamiltonian vector field, and the generating function is the Poisson bracket. This is the bridge between the algebraic structure of $C^\infty(M)$ under the Poisson bracket and the geometric structure of flows on phase space: bracket of functions corresponds to commutator of the flows they generate.
[quotetheorem:1336]
The proof follows by direct computation: expand $[V_f, V_g]_i = \sum_j ((V_f)_j \, \partial_j (V_g)_i - (V_g)_j \, \partial_j (V_f)_i)$ using $(V_f)_j = (J \, \partial f/\partial x)_j$ and $(V_g)_j = (J \, \partial g/\partial x)_j$, then collect terms to recognise the component formula for $-V_{\{f,g\}}$. This result has a crucial consequence: if $\{f, g\} = 0$, then $[V_f, V_g] = 0$, which means the flows of $V_f$ and $V_g$ commute by the earlier theorem on commutativity. Functions whose Poisson bracket vanishes will be central to the definition of integrability.
Conversely, when $\{f, g\} \neq 0$, the flows of $V_f$ and $V_g$ fail to commute: following one flow and then the other produces a different result depending on the order. This means there is no coordinate system in which both flows simultaneously straighten out to translations along coordinate axes. Since the strategy for solving an integrable system is precisely to find such coordinates — the action-angle variables — non-vanishing Poisson brackets are the fundamental obstruction to integrability. The requirement that all first integrals be "in involution" (i.e.\ pairwise Poisson-commuting) is not an arbitrary algebraic condition but a geometric necessity: without it, the level sets of the integrals lack the torus structure needed for the Arnold-Liouville theorem to apply.
### First Integrals and Involution
We now introduce the key ingredients for integrability. A first integral is a quantity that is preserved along the Hamiltonian flow.
[definition: First Integral]
A smooth function $f: M \to \mathbb{R}$ is a **first integral** of the Hamiltonian system $(M, H)$ if
\begin{align*}
\{f, H\} = 0.
\end{align*}
By the Evolution theorem, this is equivalent to $\dot{f} = 0$ along all trajectories, i.e.\ $f$ is conserved.
[/definition]
The Hamiltonian $H$ is itself always a first integral. The Jacobi identity for the Poisson bracket shows that first integrals are closed under the bracket operation.
[quotetheorem:1352]
[citeproof:1352]
This theorem is encouraging but not always useful in practice: new first integrals produced this way may be functionally dependent on $f$ and $g$, or may simply vanish. For integrability we need the right notion of independence and compatibility.
The next two definitions make precise what we mean by a collection of first integrals being genuinely independent and mutually compatible. Independence is a rank condition on the gradients — it ensures each integral constrains the motion in a new direction. Involution is an algebraic compatibility condition — it ensures the corresponding Hamiltonian flows can be simultaneously straightened out.
Two first integrals are in involution if they Poisson-commute. This condition ensures their Hamiltonian flows commute, which is precisely what allows a simultaneous change of coordinates.
[definition: Involution]
Two smooth functions $F, G: M \to \mathbb{R}$ are **in involution** (or **Poisson-commute**) if
\begin{align*}
\{F, G\} = 0.
\end{align*}
A collection of functions $\{f_i\}$ is in involution if $\{f_i, f_j\} = 0$ for all $i \neq j$.
[/definition]
Independence is a separate condition: we need the first integrals to genuinely constrain the motion in independent directions.
[definition: Independent First Integrals]
A collection of smooth functions $f_1, \ldots, f_k: M \to \mathbb{R}$ is **independent** if at each point $x \in M$, the gradient vectors $\partial f_i / \partial x$ for $i = 1, \ldots, k$ are linearly independent.
[/definition]
Independence guarantees that the level sets $\{f_i = c_i\}$ are smooth hypersurfaces that intersect transversally, cutting the phase space down to a smooth submanifold of codimension $k$. Without linear independence, some integral would be redundant — its level set would contain the intersection of the others, contributing no new constraint and failing to reduce the dimension of the accessible region of phase space.
We can now state the central definition of the course.
[definition: Integrable System]
A $2n$-dimensional Hamiltonian system $(M, H)$ is **completely integrable** (or **Liouville integrable**) if there exist $n$ first integrals $f_1, \ldots, f_n$ that are:
- **independent**: $\partial f_1 / \partial x, \ldots, \partial f_n / \partial x$ are linearly independent at each point, and
- **in involution**: $\{f_i, f_j\} = 0$ for all $i, j$.
[/definition]
The choice of $n$ integrals for a $2n$-dimensional system is not arbitrary. Conservation of $H$ already confines trajectories to a $(2n-1)$-dimensional hypersurface. Each additional independent first integral cuts down the dimension by one further. With $n$ independent integrals in involution, the dynamics is confined to an $n$-dimensional submanifold. The involution condition then ensures these $n$-dimensional level sets have just enough structure — they turn out to be tori — to support linear motion.
[example: Two-Dimensional Systems Are Always Integrable]
Any $2n$-dimensional Hamiltonian system with $n = 1$, i.e.\ any system on $\mathbb{R}^2$ with one degree of freedom, is automatically completely integrable. The Hamiltonian $H$ itself is the unique first integral required (since $\{H, H\} = 0$ by antisymmetry, involution is automatic). The level sets $\{H = c\}$ are curves in the $(q, p)$-plane, and the motion on each curve is determined.
[/example]
## Canonical Transformations
Integrability promises that we can find coordinates in which Hamilton's equations simplify drastically. But not all coordinate changes are valid in Hamiltonian mechanics: the pairing between $q_i$ and $p_i$ is part of the structure, and we must preserve it. The question is: which coordinate changes leave Hamilton's equations in Hamiltonian form?
[definition: Canonical Transformation]
A smooth coordinate change $(q, p) \mapsto (Q, P)$ on phase space is **canonical** if, whenever $(q(t), p(t))$ satisfies Hamilton's equations with Hamiltonian $H$, the transformed coordinates $(Q(t), P(t))$ satisfy Hamilton's equations with the transformed Hamiltonian $\widetilde{H}(Q, P) = H(q, p)$.
[/definition]
Writing $x = (q, p)^\top$ and $y = (Q, P)^\top$, the condition is that
\begin{align*}
\dot{x} = J \frac{\partial H}{\partial x} \quad \Longleftrightarrow \quad \dot{y} = J \frac{\partial \widetilde{H}}{\partial y}.
\end{align*}
A simple but instructive non-example: swapping $q$ and $p$ by setting $Q = p$, $P = q$ gives $\dot{Q} = -\partial H / \partial P$ and $\dot{P} = \partial H / \partial Q$, which has the wrong sign — this is not canonical.
The condition for canonicalness has a clean characterisation for linear maps.
[quotetheorem:1337]
[citeproof:1337]
This characterisation is both algebraically clean and geometrically meaningful. The condition $AJA^\top = J$ says that $A$ intertwines $J$ with itself — equivalently, that $A$ preserves the symplectic form. Linear canonical transformations are precisely the linear symmetries of Hamiltonian mechanics. The set of all such matrices forms the **symplectic group** $\operatorname{Sp}(2n)$.
[definition: Symplectic Matrix]
A matrix $A \in \mathbb{R}^{2n \times 2n}$ is **symplectic** if $A J A^\top = J$.
[/definition]
Symplectic matrices form a group under multiplication. One can verify that $\det A = \pm 1$ for any symplectic $A$, and in fact $\det A = 1$. The condition $AJA^\top = J$ is equivalent to requiring that the transformation preserve the bilinear form $\omega(u,v) = \langle u, J^{-1}v \rangle = \langle u, -Jv \rangle$ — the symplectic form. Preserving this bilinear structure is the Hamiltonian analogue of preserving the inner product in orthogonal geometry.
For nonlinear maps, the local condition is that the derivative is symplectic at each point.
[quotetheorem:1338]
This is the nonlinear analogue: a diffeomorphism is canonical precisely when it is a symplectomorphism — when it preserves the symplectic structure at each point. The locality of the condition is essential: even if a map looks globally complicated, it is canonical if and only if its linearisation at every point is symplectic. A map that is symplectic at some points but not others cannot be canonical — there is no way to patch local failures into a globally consistent Hamiltonian structure. This is also why the generating function method is powerful: it automatically produces maps whose Jacobian satisfies $Jy_x J (Jy_x)^\top = J$ everywhere, by construction from the exact differential $dS$.
[example: Checking That A Transformation Is Not Canonical]
Consider the map $(q, p) \mapsto (Q, P)$ defined by $Q = p$, $P = q$ (swapping position and momentum). The Jacobian is
\begin{align*}
Jy = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\end{align*}
Computing:
\begin{align*}
Jy \cdot J \cdot (Jy)^\top = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = -J.
\end{align*}
Since this equals $-J \neq J$, the transformation is not canonical. This confirms the earlier sign-flip observation directly.
[/example]
### Generating Functions
Knowing that a transformation is canonical does not tell us how to construct one. A systematic method uses **generating functions**: functions of mixed old and new coordinates from which the transformation is derived by differentiation.
[definition: Generating Function Method]
Let $S = S(q, P): \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be a smooth function satisfying the non-degeneracy condition
\begin{align*}
\det\left( \frac{\partial^2 S}{\partial q_i \, \partial P_j} \right) \neq 0
\end{align*}
everywhere. Define $(Q, P)$ from $(q, p)$ by the implicit relations
\begin{align*}
p_i = \frac{\partial S}{\partial q_i}(q, P), \quad Q_i = \frac{\partial S}{\partial P_i}(q, P(q, p)).
\end{align*}
Then the map $(q, p) \mapsto (Q, P)$ is a canonical transformation. Such a function $S$ is called a **generating function** of the transformation.
[/definition]
The non-degeneracy condition ensures the implicit function theorem can be applied to solve for $P$ as a function of $(q, p)$. The proof that the resulting transformation is canonical can be found in the course handouts; the key idea is that the transformation preserves the symplectic form $dq \wedge dp = dQ \wedge dP$.
The identity example below shows that the generating function method is consistent: the simplest possible function $S$ reproduces the identity transformation. More interesting generating functions produce genuine canonical maps — including, ultimately, the transformation to action-angle coordinates.
[example: Identity Transformation From A Generating Function]
Take $S(q, P) = q \cdot P = \sum_i q_i P_i$. The non-degeneracy condition gives
\begin{align*}
\det\left( \frac{\partial^2 S}{\partial q_i \, \partial P_j} \right) = \det(\delta_{ij}) = 1 \neq 0.
\end{align*}
The first relation gives $p_i = \partial S / \partial q_i = P_i$, so $P = p$. The second gives $Q_i = \partial S / \partial P_i = q_i$, so $Q = q$. The generating function $S(q, P) = q \cdot P$ thus produces the identity transformation $(Q, P) = (q, p)$.
[/example]
The shear example below illustrates that even a simple quadratic perturbation of the identity generating function produces a genuine canonical map, and that the symplecticity condition can be verified directly from the Jacobian.
[example: A Shear Canonical Transformation]
In a two-dimensional phase space ($n = 1$), consider $S(q, P) = qP + q^2$. The non-degeneracy condition gives $\partial^2 S / \partial q \, \partial P = 1 \neq 0$, so this is a valid generating function. Setting $p = \partial S / \partial q = P + 2q$ gives $P = p - 2q$. Then $Q = \partial S / \partial P = q$. The resulting canonical transformation is
\begin{align*}
(Q, P) = (q,\; p - 2q),
\end{align*}
which in matrix form reads
\begin{align*}
\begin{pmatrix} Q \\ P \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}.
\end{align*}
To verify canonicalness directly, let $A = \bigl(\begin{smallmatrix} 1 & 0 \\ -2 & 1 \end{smallmatrix}\bigr)$. Then
\begin{align*}
AJA^\top = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = J,
\end{align*}
confirming $A$ is symplectic and the transformation is canonical.
[/example]
## The Arnold-Liouville Theorem
We now reach the central result of the chapter. For integrable systems, there is a canonical transformation to coordinates in which the Hamiltonian depends only on the action variables, making the equations of motion immediately solvable. The question is: what are these coordinates, and why do they exist?
To motivate the general theorem, consider what would happen if the Hamiltonian $\widetilde{H}$ depended only on new momentum variables $I = (I_1, \ldots, I_n)$ in some canonical coordinates $(\phi, I)$. Hamilton's equations would become
\begin{align*}
\dot{I} = -\frac{\partial \widetilde{H}}{\partial \phi} = 0, \quad \dot{\phi} = \frac{\partial \widetilde{H}}{\partial I} = \Omega,
\end{align*}
where $\Omega = \partial \widetilde{H} / \partial I$ is a constant (since $I$ is constant). The solution is immediate:
\begin{align*}
I(t) = I_0, \quad \phi(t) = \phi_0 + \Omega t.
\end{align*}
The dynamics is linear motion at constant angular velocity. The Arnold-Liouville theorem guarantees such coordinates exist for any completely integrable system.
### The Harmonic Oscillator as a Model Case
Before stating the general theorem, we work through the harmonic oscillator to see explicitly what the action-angle coordinates look like and how they are found.
[example: Action-Angle Coordinates For The Harmonic Oscillator]
Consider the harmonic oscillator $H(q, p) = \frac{1}{2}p^2 + \frac{1}{2}\omega^2 q^2$. This is a $2$-dimensional system ($n = 1$), and $H$ itself is the single required first integral, so the system is integrable.
The level sets of $H$ are the ellipses $\frac{1}{2}p^2 + \frac{1}{2}\omega^2 q^2 = c$ in the $(q, p)$-plane. Each such ellipse is homeomorphic to $S^1$. The Arnold-Liouville theorem will tell us that $n$-dimensional level sets of integrable systems are tori $T^n = S^1 \times \cdots \times S^1$; for $n = 1$ this is indeed $S^1$.
We introduce action-angle coordinates $(\phi, I)$ defined by
\begin{align*}
q = \sqrt{\frac{2I}{\omega}} \sin\phi, \quad p = \sqrt{2I\omega} \cos\phi.
\end{align*}
Substituting into the Hamiltonian,
\begin{align*}
\widetilde{H}(\phi, I) = \frac{1}{2}(2I\omega)\cos^2\phi + \frac{1}{2}\omega^2 \cdot \frac{2I}{\omega}\sin^2\phi = I\omega\cos^2\phi + I\omega\sin^2\phi = \omega I.
\end{align*}
The Hamiltonian $\widetilde{H} = \omega I$ is independent of $\phi$, exactly as desired. Hamilton's equations give $\dot{I} = 0$ and $\dot{\phi} = \omega$, integrating to $I(t) = I_0$, $\phi(t) = \phi_0 + \omega t$.
The action variable $I$ can be recovered from the original coordinates by integrating along the level set. Computing the integral of $p\, dq$ along the ellipse $H = c$ with $\phi$ running from $0$ to $2\pi$ (at fixed $I$):
\begin{align*}
\frac{1}{2\pi}\oint p\, dq &= \frac{1}{2\pi}\int_0^{2\pi} p(\phi, I) \frac{\partial q}{\partial \phi} \, d\phi = \frac{1}{2\pi}\int_0^{2\pi} \sqrt{2I\omega}\cos\phi \cdot \sqrt{\frac{2I}{\omega}}\cos\phi \, d\phi\\
&= \frac{1}{2\pi}\int_0^{2\pi} 2I\cos^2\phi\, d\phi = I.
\end{align*}
This is not a coincidence: in general, the action variables are defined by exactly this contour integral.
[/example]
### Statement of the Arnold-Liouville Theorem
The harmonic oscillator example reveals two key features that hold in general: the level sets of the first integrals are tori, and the action variables arise as contour integrals. The Arnold-Liouville theorem makes both precise.
[quotetheorem:1353]
The proof is not covered in this course. It involves showing that the involution condition $\{f_i, f_j\} = 0$ implies the Hamiltonian vector fields $V_{f_i}$ commute (by the theorem relating commutators and Poisson brackets), that these $n$ commuting flows generate an action of $\mathbb{R}^n$ on $M_c$, that this action is transitive and that compactness forces the isotropy group to be a lattice $\mathbb{Z}^n \subset \mathbb{R}^n$, making $M_c \cong \mathbb{R}^n / \mathbb{Z}^n = T^n$.
Each hypothesis earns its place. Compactness of $M_c$ is essential: without it, the level set can be diffeomorphic to $\mathbb{R}^k \times T^{n-k}$ rather than a torus. For instance, the free particle on $\mathbb{R}$ has Hamiltonian $H = \frac{1}{2}p^2$, and the level sets $\{p = \text{const}\}$ are lines, not circles — the motion is unbounded translation rather than periodic oscillation. The requirement of exactly $n$ independent integrals for a $2n$-dimensional system is also sharp: with fewer than $n$, the level set has dimension greater than $n$ and cannot support the rigid torus structure needed for linear flow. With $n - 1$ integrals one typically sees chaotic motion on the remaining unconstrained directions. Finally, the result is local in the space of level sets: the action-angle coordinates are constructed in a neighbourhood of a single compact connected component $M_c$, and may not extend globally to the entire phase space. Points where the gradients $\partial f_i / \partial x$ become linearly dependent — the singular values of the moment map $(f_1, \ldots, f_n)$ — are excluded, and the torus fibration can change topology across such singularities.
[remark: Action-Angle Variables]
In action-angle coordinates $(\phi, I)$:
- The **angle variables** $\phi_1, \ldots, \phi_n$ are coordinates on the torus $M_c$, each taking values in $[0, 2\pi)$.
- The **action variables** $I_1, \ldots, I_n$ are first integrals that are constant along each torus $M_c$; they parametrise which torus we are on.
The angle variables increase linearly in time, while the action variables are frozen. This is the content of the phrase "integrable motion is quasiperiodic."
[/remark]
### Applying Arnold-Liouville in Practice
The Arnold-Liouville theorem not only guarantees the existence of action-angle coordinates but also provides a concrete algorithm to find them. The action variables are computed as contour integrals, and the angle variables are derived from a generating function.
[definition: Arnold-Liouville Algorithm]
For a completely integrable $2n$-dimensional Hamiltonian system with first integrals $f_1, \ldots, f_n$:
**Step 1 — Find the first integrals** $f_1, \ldots, f_n$ that are independent and in involution.
**Step 2 — Compute the action variables** by integrating along independent cycles $C_i$ on the torus $M_c$:
\begin{align*}
I_i = \frac{1}{2\pi} \oint_{C_i} \sum_j p_j \, dq_j.
\end{align*}
**Step 3 — Build the generating function**
\begin{align*}
S(q, I) = \int_{x_0}^{q} \sum_j p_j(q', c(I)) \, dq_j',
\end{align*}
where $x_0$ is a base point on the torus and $c(I)$ expresses the level $c$ as a function of the actions $I$.
**Step 4 — Find the angle variables** by differentiating the generating function:
\begin{align*}
\phi_i = \frac{\partial S}{\partial I_i}.
\end{align*}
**Step 5 — Integrate the reduced system:**
\begin{align*}
I(t) = I_0, \quad \phi(t) = \phi_0 + \frac{\partial \widetilde{H}}{\partial I}(I_0) \cdot t.
\end{align*}
[/definition]
A subtlety in Step 3 is that the integral $\int_{x_0}^q p \, dq$ depends on the path taken through the torus, not just the endpoints. Different paths differing by a full loop $C_i$ contribute $2\pi I_i$ to the integral, so $S$ is well-defined only up to multiples of $2\pi I_i$. This is why $\phi_i = \partial S / \partial I_i$ is well-defined modulo $2\pi$ — it is genuinely an angle coordinate.
[example: Two Coupled Harmonic Oscillators]
Consider the Hamiltonian on $\mathbb{R}^4$:
\begin{align*}
H(q_1, q_2, p_1, p_2) = \frac{1}{2}(p_1^2 + \omega_1^2 q_1^2 + p_2^2 + \omega_2^2 q_2^2),
\end{align*}
with $\omega_1, \omega_2 > 0$. This is a four-dimensional system ($n = 2$), so we need two independent first integrals in involution.
Write $H = H_1 + H_2$ where $H_k = \frac{1}{2}(p_k^2 + \omega_k^2 q_k^2)$ is the energy of the $k$-th oscillator. The equations of motion decouple: $\dot{q}_k = p_k$, $\dot{p}_k = -\omega_k^2 q_k$. Since the system splits as two independent oscillators, $H_1$ and $H_2$ are separately conserved:
\begin{align*}
\{H_1, H\} = \{H_1, H_2\} = 0, \quad \{H_2, H\} = 0.
\end{align*}
To confirm involution, note $\{H_1, H_2\} = 0$ since $H_1$ depends only on $(q_1, p_1)$ and $H_2$ only on $(q_2, p_2)$.
The level set $M_c = \{H_1 = c_1, H_2 = c_2\}$ is the product of two ellipses, which is a $2$-torus $T^2 = S^1 \times S^1$, as expected from Arnold-Liouville.
To find the action variables, we integrate along the loops $C_k = \{\text{ellipse for } H_k = c_k\}$ while holding the other pair of coordinates constant. For $I_1$, holding $(q_2, p_2)$ fixed at a constant makes the $p_2 \, dq_2$ contribution vanish:
\begin{align*}
I_1 = \frac{1}{2\pi} \oint_{C_1} p_1 \, dq_1 = \frac{c_1}{\omega_1}.
\end{align*}
Similarly $I_2 = c_2 / \omega_2$. Inverting, $c_k = \omega_k I_k$, so the transformed Hamiltonian is
\begin{align*}
\widetilde{H}(\phi, I) = \omega_1 I_1 + \omega_2 I_2.
\end{align*}
The equations of motion in action-angle coordinates are $\dot{I}_k = 0$ and $\dot{\phi}_k = \omega_k$, giving
\begin{align*}
I_k(t) = I_k^0, \quad \phi_k(t) = \phi_k^0 + \omega_k t.
\end{align*}
Solutions are periodic in time if and only if both angles return simultaneously to their initial values, which requires $\omega_1 T = 2\pi m$ and $\omega_2 T = 2\pi n$ for some integers $m, n$ and period $T$. Dividing, we need $\omega_1 / \omega_2 = m/n \in \mathbb{Q}$.
When $\omega_1 = \omega_2 = \omega$, the Hamiltonian becomes $H = \frac{1}{2}(p_1^2 + p_2^2 + \omega^2(q_1^2 + q_2^2))$, which is rotationally symmetric in the $(q_1, q_2)$-plane. By Noether's theorem, angular momentum is conserved:
\begin{align*}
L = q_1 p_2 - q_2 p_1.
\end{align*}
One can verify directly that $\{L, H\} = 0$.
[/example]
### Review: The Structure of Integrable Systems
The chapter has built up the following picture. Starting from a Hamiltonian system on $\mathbb{R}^{2n}$, we ask whether it possesses $n$ independent first integrals in involution. If it does, then:
For any value $c \in \mathbb{R}^n$ of the first integrals, the level set $M_c$ is a smooth $n$-dimensional submanifold invariant under the flow. When $M_c$ is compact and connected, it is topologically a torus $T^n$. The action variables $I_i = \frac{1}{2\pi} \oint_{C_i} p \, dq$ measure the "size" of each cycle on this torus, and they are first integrals themselves. The angle variables $\phi_i$ give coordinates on the torus and advance linearly in time with constant angular velocities $\Omega_i = \partial \widetilde{H} / \partial I_i$.
In the action-angle picture, the phase space fibers over action space $\{(I_1, \ldots, I_n)\}$, with each fiber being a torus $T^n$ carrying the angle dynamics. Different initial conditions on the same torus lead to the same action values but different phases, and the motion on each torus is uniform rotation. When the frequencies $\Omega_i$ are rationally related, the trajectory closes up periodically; when they are irrationally related, the trajectory is dense on $T^n$.
[remark: What Integrable Buys Us]
The word "integrable" reflects that the equations of motion can be literally integrated: the solution is obtained by computing the contour integrals for the actions, solving the algebraic problem of inverting $c \mapsto I(c)$, and then reading off the linear motion $\phi_i(t) = \phi_i^0 + \Omega_i t$. No approximation is involved. The contrast with generic Hamiltonian systems — where trajectories can be chaotic — is stark: integrability is an exceptional property that makes the dynamics completely explicit.
[/remark]
Having established the geometric framework of finite-dimensional Hamiltonian mechanics and the Arnold-Liouville characterization of integrability, we now face the question: do infinite-dimensional systems—particularly nonlinear PDEs—exhibit analogous integrable structure? The next chapter answers this affirmatively by studying concrete examples of integrable PDEs, where exact solutions and conserved quantities reveal a discrete hierarchy of nonlinear waves called solitons.
Partial differential equations offer a vast generalization of the ODE theory developed in Chapter 1. Where a $2n$-dimensional integrable ODE possesses $n$ conserved first integrals, an integrable PDE — being infinite-dimensional — should possess infinitely many. The central question of this chapter is: what does integrability look like for PDEs, and what remarkable structures does it produce?
We approach this question not through a general definition — the notion of integrability for PDEs is genuinely subtle and will not be given a structural treatment until Chapter 4, where the Hamiltonian and bi-Hamiltonian frameworks are extended to infinite dimensions — but through two concrete equations that exhibit the hallmarks of integrable behavior: the Korteweg–de Vries (KdV) equation and the sine-Gordon equation. Both admit special solutions called solitons, which maintain their shape indefinitely and survive collisions intact. Recall from Chapter 1 that a Hamiltonian system is called completely integrable when it possesses the maximal number of independent first integrals in involution; the PDE analogues of these integrals — infinitely many of them — will be produced systematically in Chapters 3 and 4. We also introduce Bäcklund transformations, a technique for generating new solutions from old ones, which will prove to be one of the key structural features of integrable systems.
## Competing Effects in Nonlinear Wave Equations
The KdV equation sits at the intersection of two competing physical effects: dispersion and nonlinearity. To understand the balance between them, it helps to study each in isolation first.
Consider the linear PDE
\begin{align*}
u_t + u_{xxx} = 0,
\end{align*}
where $u = u(x,t)$ is a real-valued function of position $x \in \mathbb{R}$ and time $t \geq 0$. This equation admits solutions of the form $e^{ikx - i\omega t}$, called **plane wave modes**. Substituting into the equation forces the dispersion relation
\begin{align*}
\omega = \omega(k) = -k^3.
\end{align*}
For any $k \in \mathbb{R}$, choosing $\omega$ according to this relation yields a solution. Writing the plane wave mode as
\begin{align*}
u(x,t) = \exp\!\left(ik\!\left(x - \frac{\omega(k)}{k}\,t\right)\right),
\end{align*}
we see that it travels at phase speed $\omega(k)/k = -k^2$. The crucial point is that this speed depends on $k$: different Fourier modes travel at different speeds. For linear PDEs on convex domains, every solution is a superposition of plane wave modes,
\begin{align*}
u(x,t) = \int_{\mathbb{R}} A(k)\, e^{ikx - i\omega(k)t}\, d\mathcal{L}^1(k).
\end{align*}
So a localized initial pulse, which is composed of many Fourier modes, will spread out over time as its components drift apart. This phenomenon is called **dispersion**, and it is caused by the third-order spatial derivative $\partial_{xxx}$.
The opposite effect — concentration rather than spreading — is produced by nonlinearity. Consider the pure nonlinear equation
\begin{align*}
u_t - 6uu_x = 0
\end{align*}
with initial condition $u(x,0) = f(x)$. The method of characteristics solves this equation: $u(x,t) = f(\xi)$ where the characteristic $\xi$ is given implicitly by
\begin{align*}
\xi = x - 6t\,f(\xi).
\end{align*}
To understand what happens to derivatives, differentiate this implicit relation with respect to $x$:
\begin{align*}
\frac{\partial \xi}{\partial x} = 1 - 6t\,f'(\xi)\,\frac{\partial \xi}{\partial x} \implies \frac{\partial \xi}{\partial x} = \frac{1}{1 + 6t\,f'(\xi)}.
\end{align*}
Therefore
\begin{align*}
u_x = f'(\xi)\,\frac{\partial \xi}{\partial x} = \frac{f'(\xi)}{1 + 6t\,f'(\xi)}.
\end{align*}
If $f'(\xi) < 0$ somewhere, then the denominator $1 + 6t\,f'(\xi)$ will reach zero at a finite time $t^* = -1/(6f'(\xi))$, causing $u_x$ to blow up. Geometrically, the profile steepens until it becomes vertical and then tries to fold over itself, producing a multi-valued function. This phenomenon is called **wave-breaking**, and it is the characteristic signature of the nonlinear term $-6uu_x$.
The competition between these two effects is the key to understanding the KdV equation: dispersion wants to spread the solution out, while nonlinearity wants to focus and break it. The remarkable fact about the KdV equation is that these two tendencies can balance each other perfectly, producing solutions that neither disperse nor break, but instead travel intact at constant speed.
## The KdV Equation and Solitons
We have now seen that dispersion spreads a localized pulse while nonlinearity steepens it toward breaking. What equation balances these two effects, and what solutions can arise from that balance? The KdV equation is precisely the equation where dispersion and nonlinearity cancel each other out, allowing solutions to propagate indefinitely without changing shape.
[definition: KdV Equation]
The **KdV equation** (Korteweg–de Vries equation) is the partial differential equation
\begin{align*}
u_t + u_{xxx} - 6u\,u_x = 0,
\end{align*}
where $u: \mathbb{R} \times [0,\infty) \to \mathbb{R}$. The term $u_{xxx}$ produces dispersion and the term $-6uu_x$ produces nonlinear steepening.
[/definition]
The KdV equation was first derived by Diederik Korteweg and Gustav de Vries in 1895 to model shallow-water waves. Its importance was not fully appreciated until the 1960s, when numerical experiments revealed that it has an extraordinary class of solutions that behave more like particles than waves.
[definition: Soliton]
A **soliton** is a solitary wave solution to a nonlinear PDE that maintains its shape and amplitude as it propagates, and that survives collisions with other solitons with only a phase shift — emerging from the interaction unchanged in form.
[/definition]
The existence of solitons is striking because nonlinear equations are typically very poorly behaved: solutions develop singularities, interact in complicated ways, and resist closed-form analysis. That the KdV equation supports perfectly stable, particle-like wave structures is the first sign that it is a very special — integrable — equation.
[quotetheorem:1354]
[citeproof:1354]
This solution is localized: $\operatorname{sech}^2\eta \to 0$ exponentially as $|x| \to \infty$, so $u \to 0$ at spatial infinity for all $t$. The solution profile is a single negative hump centered at $x = 4\chi_1^2 t$, which moves to the right at speed $4\chi_1^2$. A crucial observation is that taller solitons (larger $\chi_1$) travel faster: the amplitude is $-2\chi_1^2$ and the speed is $4\chi_1^2$, so speed scales as the absolute amplitude. This amplitude-speed coupling is what makes multi-soliton collisions nontrivial.
[remark: Hypothesis on the Parameter]
The constant $\chi_1$ may in principle be any real number, but taking $\chi_1 > 0$ gives a leftward-opening profile moving to the right. Negative $\chi_1$ gives the same solution by symmetry (since $\chi_1^2 = (-\chi_1)^2$ and $\operatorname{sech}^2$ is even). The amplitude $-2\chi_1^2$ is always negative, so 1-solitons for KdV are always troughs, not peaks.
[/remark]
The term **$N$-soliton solution** refers to a solution that, for large $|t|$, decomposes into $N$ well-separated solitons of the form above. The 2-soliton interaction illustrates the defining property of integrability. Place a tall soliton (large $\chi_1$, moving fast) far to the left of a shorter soliton (small $\chi_1$, moving slowly). Since the tall one moves faster, it will eventually catch up. Based on the nonlinearity of the KdV equation, one might expect a complicated, destructive collision. What actually happens is more remarkable: the two solitons pass through each other as if they were ghosts. After the interaction, both emerge with exactly their original shapes and speeds, shifted only slightly in position relative to where they would have been had no collision occurred. This phase shift is the only lasting trace of the interaction.
This phenomenon was first observed in numerical simulations by Zabusky and Kruskal in 1965, and it was their discovery that prompted the name "soliton" — by analogy with particles, which also preserve their identity through interactions. Later, the inverse scattering transform (developed in the next chapter) provides explicit formulas for all $N$-soliton solutions and explains this quasi-particle behavior through the integrability of the system.
## The Sine-Gordon Equation
The soliton phenomenon observed in the KdV equation raises a natural question: is this stability a consequence of KdV's specific algebraic structure, or is it a general feature that other nonlinear equations can share? The sine-Gordon equation answers this decisively — it too supports stable, topologically protected solitons, but via a mechanism entirely different from the amplitude-speed coupling of KdV. Understanding this second example clarifies which features of soliton behavior are universal and which are equation-specific.
A second important example arises in the study of crystal dislocations and differential geometry: the sine-Gordon equation.
The equation we are about to define is motivated by a concrete failure: the linearized version $u_{tt} - u_{xx} + u = 0$ (the Klein-Gordon equation) does not support soliton solutions. Replacing the linear restoring force $u$ with the nonlinear term $\sin u$ remedies this, coupling the dispersive wave structure of the Klein-Gordon equation to a nonlinear potential with infinitely many minima.
[definition: Sine-Gordon Equation]
The **sine-Gordon equation** is the partial differential equation
\begin{align*}
u_{tt} - u_{xx} + \sin u = 0,
\end{align*}
where $u: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$.
[/definition]
The name is a pun on the Klein-Gordon equation $u_{tt} - u_{xx} + u = 0$, obtained by replacing the linear term $u$ with $\sin u$. This replacement, though it looks mild, makes the equation nonlinear and gives it an entirely different character.
A natural change of variables simplifies the structure. In **light cone coordinates**
\begin{align*}
\xi = \tfrac{1}{2}(x - t), \qquad \tau = \tfrac{1}{2}(x + t),
\end{align*}
the sine-Gordon equation transforms into
\begin{align*}
\frac{\partial^2 u}{\partial \xi\,\partial \tau} = \sin u.
\end{align*}
This mixed-derivative form is often more convenient for analysis and is the version we will work with when studying Bäcklund transformations.
[quotetheorem:1339]
[citeproof:1339]
The parameter $a > 0$ controls the width and velocity of the kink: the solution is concentrated near $\eta = 0$, i.e., near $\xi = -\tau/a^2$ (in lab coordinates, this corresponds to motion at velocity $v = (a^2 - 1)/(a^2 + 1)$ in the original $x, t$ variables, satisfying $|v| < 1$ for all $a > 0$). The constraint $|v| < 1$ thus follows automatically from the parameterization: since $a > 0$, we have $(a^2-1)/(a^2+1) \in (-1,1)$, so the soliton always travels slower than the wave speed.
At a fixed time $\tau$, this solution interpolates smoothly between the values $u \to 0$ as $\xi \to -\infty$ and $u \to 2\pi$ as $\xi \to +\infty$. The profile is an S-shaped curve (a kink) connecting the two limiting values.
The topological character of this soliton becomes clearest when we regard $u$ as taking values in the circle $S^1 = \mathbb{R}/2\pi\mathbb{Z}$ rather than in $\mathbb{R}$. Under this identification, both $u = 0$ and $u = 2\pi$ represent the same point on the circle. The 1-soliton solution wraps once around $S^1$ as $\xi$ ranges over $\mathbb{R}$, while the constant solution $u = 0$ does not wrap at all. Since the winding number is an integer-valued quantity that varies continuously, no continuous deformation can convert the soliton into the constant solution. The soliton is **topologically protected**: it cannot decay, regardless of how complicated the dynamics become. This is a qualitatively different mechanism for stability than the amplitude-speed coupling responsible for KdV solitons.
[remark: Velocity in Lab Coordinates]
In the original $(x,t)$ coordinates, the 1-soliton travels at velocity $v = (a^2-1)/(a^2+1)$. As $a \to 0^+$ the velocity approaches $-1$ (leftward at wave speed), as $a = 1$ the soliton is stationary ($v = 0$), and as $a \to +\infty$ the velocity approaches $+1$ (rightward at wave speed). Since $a > 0$ is unrestricted, all velocities $v \in (-1,1)$ are achieved, confirming the relativistic constraint $|v| < 1$ automatically. This is analogous to the requirement that a massive particle travels slower than light: the sine-Gordon equation has a Lorentz symmetry in the $(x,t)$ plane, and the kink behaves like a massive relativistic particle.
[/remark]
## Bäcklund Transformations
For a linear PDE, the principle of superposition allows us to manufacture new solutions by adding known ones. No such principle is available for nonlinear equations. How, then, can we generate new solutions? The Cauchy-Riemann equations offer a classical precedent for a powerful substitute technique.
[example: Cauchy-Riemann as a Bäcklund Transformation]
Laplace's equation in two dimensions,
\begin{align*}
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0,
\end{align*}
is solved by any harmonic function $u: \mathbb{R}^2 \to \mathbb{R}$. The Cauchy-Riemann equations,
\begin{align*}
u_x = v_y, \qquad u_y = -v_x,
\end{align*}
relate a harmonic function $u$ to another harmonic function $v$, called the harmonic conjugate of $u$. Given a solution $u$, we can find $v$ by integrating these first-order equations. For instance, if $u = 2xy$, then $v_x = -2y$ and $v_y = 2x$; integrating gives $v = -x^2 + y^2 + C$ for an arbitrary constant $C$. We have generated a new solution to Laplace's equation from a known one. The Cauchy-Riemann system is an example of an **auto-Bäcklund transformation** for Laplace's equation: it is a system of first-order equations involving two solutions of the same PDE that, given one solution, allows us to find another.
[/example]
This structure — a system of first-order equations that mediates between solutions of a PDE — generalizes to a broad class of equations.
[definition: Bäcklund Transformation]
Let $u, v: \mathbb{R}^p \to \mathbb{R}$ be smooth functions of $p$ independent variables. A **Bäcklund transformation** is a system of first-order equations relating $u$ and $v$ such that:
1. If $v$ is a solution to a given PDE $P_1(v) = 0$, then any $u$ satisfying the system is a solution to a (possibly different) PDE $P_2(u) = 0$.
2. Conversely, if $u$ solves $P_2(u) = 0$, then any $v$ satisfying the system solves $P_1(v) = 0$.
When $P_1 = P_2$ — that is, when both $u$ and $v$ satisfy the same equation — the transformation is called an **auto-Bäcklund transformation**.
[/definition]
Without Bäcklund transformations, generating explicit multi-soliton solutions would require solving the full nonlinear PDE directly, which is generally infeasible. The power of the method is that it reduces the problem of finding a new solution to solving a system of first-order equations — a much simpler task. Given a known "seed" solution (often the constant solution $u = 0$), one applies the Bäcklund transformation to produce a nontrivial solution, and then iterates the process to generate solutions with progressively more solitons.
[example: Bäcklund Transformation Relating Two PDEs]
The equation $u_{xt} = e^u$ is related to the linear equation $v_{xt} = 0$ via the Bäcklund transformation
\begin{align*}
u_x + v_x &= \sqrt{2}\,\exp\!\left(\frac{u - v}{2}\right),\\
u_t - v_t &= \sqrt{2}\,\exp\!\left(\frac{u + v}{2}\right).
\end{align*}
Since $v_{xt} = 0$ is easy to solve (its general solution is $v(x,t) = f(x) + g(t)$ for arbitrary functions $f$ and $g$), this transformation converts the nonlinear problem $u_{xt} = e^u$ into an integrable first-order system. One verifies the transformation by cross-differentiating: differentiating the first equation with respect to $t$ and the second with respect to $x$, then using $v_{xt} = 0$, recovers $u_{xt} = e^u$.
[/example]
The sine-Gordon equation possesses a particularly rich family of auto-Bäcklund transformations, parameterized by a continuous parameter $\varepsilon \neq 0$.
[quotetheorem:1340]
[citeproof:1340]
This computation is elegant in its economy: the verification amounts to a single product-to-sum identity, and the parameter $\varepsilon$ cancels in the critical step. The fact that $\varepsilon$ appears in the transformation but not in the final equivalence means that each value of $\varepsilon$ gives a different way of producing a new solution, and hence applying transformations with different $\varepsilon$ values sequentially can produce a rich family of solutions.
The requirement $\varepsilon \neq 0$ is not merely a technical convenience: at $\varepsilon = 0$, the first equation of the Bäcklund system becomes $\partial_\xi(\varphi_1 - \varphi_2) = 0$ while the second acquires a factor $2/\varepsilon \to \infty$, so the system degenerates and no longer defines a meaningful transformation. The parameter $\varepsilon$ therefore plays the role of a spectral parameter: each nonzero value produces a genuinely distinct transformation, and the rich family they form as $\varepsilon$ varies is a signature of the integrability of the sine-Gordon equation.
The significance of having a free parameter $\varepsilon$ goes beyond mere aesthetic richness. For linear equations, a family of solutions depending on a parameter is not remarkable — we can always scale solutions. For the sine-Gordon equation, however, the parameter $\varepsilon$ encodes genuine physical information: it controls the velocity of the soliton produced. This is a hallmark of integrability: the existence of a parameter-dependent family of Bäcklund transformations corresponds, at a deeper level, to the existence of infinitely many conserved quantities.
### Generating Solitons from the Constant Solution
The Bäcklund transformation is useful as a generator: starting from a known simple solution, we apply the transformation to produce a nontrivial one. For the sine-Gordon equation, the constant solution $\varphi_1 = 0$ serves as the seed.
With $\varphi_1 = 0$, the Bäcklund system reduces to
\begin{align*}
\frac{\partial \varphi}{\partial \xi} &= 2\varepsilon\,\sin\!\left(\frac{\varphi}{2}\right),\\
\frac{\partial \varphi}{\partial \tau} &= -\frac{2}{\varepsilon}\,\sin\!\left(\frac{\varphi}{2}\right),
\end{align*}
where $\varphi = \varphi_2$. The two equations have opposite signs and the replacement $\varepsilon \leftrightarrow -\varepsilon^{-1}$ maps one to the other, motivating the ansatz
\begin{align*}
\varphi(\xi,\tau) = 2\chi(\varepsilon\xi - \varepsilon^{-1}\tau).
\end{align*}
Substituting into either equation, and writing $z = \varepsilon\xi - \varepsilon^{-1}\tau$, both equations reduce to the single ODE
\begin{align*}
\frac{d\chi}{dz} = \sin\chi.
\end{align*}
This separable ODE is solved by writing $\csc\chi\,d\chi = dz$, which integrates to
\begin{align*}
\log\tan\frac{\chi}{2} = z + C.
\end{align*}
Therefore $\chi(z) = 2\arctan(Ae^z)$ for an arbitrary constant $A$, and the new solution is
\begin{align*}
\varphi(\xi,\tau) = 4\arctan\!\left(A\,e^{\varepsilon\xi - \varepsilon^{-1}\tau}\right).
\end{align*}
To match the canonical 1-soliton $4\arctan(\exp(a\xi + a^{-1}\tau))$ from the theorem above, set $\varepsilon = -a$ (which is nonzero for $a \neq 0$): then $\varepsilon\xi - \varepsilon^{-1}\tau = -a\xi + a^{-1}\tau$, and $4\arctan(Ae^{-a\xi+a^{-1}\tau})$ is an antikink traveling at velocity $v = (a^2-1)/(a^2+1)$ in the opposite sense. The kink $4\arctan(e^{a\xi+a^{-1}\tau})$ arises by taking $A$ and $\varepsilon$ with the opposite orientation. Both kink and antikink are valid 1-soliton solutions, related by $u \mapsto -u$ (i.e., sending $u \to 2\pi - u$ modulo the circle identification). The free parameters $A$ and $|\varepsilon|$ correspond to the position and speed of the soliton.
[example: Generating Multi-Solitons by Iteration]
Once we have the 1-soliton solution $\varphi_2$ produced from the seed $\varphi_1 = 0$, we can apply the Bäcklund transformation again — this time to $\varphi_2$ as the new seed, using a different value $\varepsilon'$ of the parameter — to produce a 2-soliton solution. Each application of the Bäcklund transformation adds one soliton. In this way, the entire $N$-soliton family can in principle be constructed iteratively, without ever solving the full nonlinear PDE directly. The resulting solutions have the property that, as $t \to \pm\infty$, they decompose into $N$ individual solitons traveling at different speeds, with the only lasting effect of their interactions being a spatial phase shift.
[/example]
The emergence of explicit, structured multi-soliton solutions from a simple iterative procedure is not coincidental. It reflects the underlying integrability of the sine-Gordon equation, which will be understood more systematically once we introduce the Lax pair formalism. The Bäcklund transformation is, in a precise sense, the shadow of a deeper algebraic structure — but tracing that shadow back to its source requires the inverse scattering transform developed in the following chapter.
The KdV and sine-Gordon equations exhibit a remarkable duality: their solutions can be read directly from the spectral data of an associated linear scattering problem. This observation motivates the Inverse Scattering Transform, a method that transforms the nonlinear PDE into a sequence of simpler problems on spectral parameters, thereby linearizing the nonlinear dynamics and providing exact solutions to nonlinear waves.
The inverse scattering transform is one of the deepest ideas in the theory of integrable systems — a nonlinear analogue of the Fourier transform that reduces the problem of solving certain PDEs to a sequence of linear steps. Chapter 1 developed Hamiltonian mechanics — the Poisson bracket, first integrals, and the Arnold–Liouville theorem — and Chapter 2 introduced solitons and Bäcklund transformations as concrete evidence that certain PDEs behave like infinite-dimensional integrable systems; here we harness those ideas in a spectral setting. The central equation throughout is the Korteweg–de Vries (KdV) equation $u_t + u_{xxx} - 6uu_x = 0$, a nonlinear PDE that turns out to be completely solvable via a remarkable indirect route: transform the potential $u$ into scattering data, evolve the data in time (a linear operation), and then invert the transform to recover $u$.
# 3. Inverse Scattering Transform
## Forward Scattering Problem
The starting point is a question about spectra. Given a potential $u = u(x)$ with compact support, what are the eigenvalues and eigenfunctions of the associated Schrödinger operator? This question, entirely linear in character, is the forward scattering problem.
Throughout this section we work with the Schrödinger operator
\begin{align*}
L = -\frac{\partial^2}{\partial x^2} + u(x),
\end{align*}
where the potential $u : \mathbb{R} \to \mathbb{R}$ has compact support, meaning there exists $R > 0$ such that $u(x) = 0$ for all $|x| > R$.
[definition: Forward Problem]
The **forward problem** for the Schrödinger operator $L = -\partial_x^2 + u$, where $u \in C_c^\infty(\mathbb{R})$ is a compactly supported smooth potential, is: given $u$, find all $\lambda \in \mathbb{R}$ and $\psi \in C^2(\mathbb{R})$ satisfying $L\psi = \lambda\psi$. For $\lambda < 0$, we additionally require $\psi \in L^2(\mathbb{R})$ (bound states); for $\lambda > 0$, we require $\psi$ to be bounded (scattering states). Equivalently, find the spectrum of $L$ and an associated collection of eigenfunctions.
[/definition]
The spectral theory of this operator separates cleanly into two regimes, distinguished by the sign of $\lambda$.
[definition: Inverse Problem]
The **inverse problem** for $L = -\partial_x^2 + u$ is the converse: given the spectral data — the discrete eigenvalues $\{-\kappa_n^2\}_{n=1}^N$, their normalisation constants $\{c_n\}_{n=1}^N$, and the reflection coefficient $R: \mathbb{R} \to \mathbb{C}$ — reconstruct the potential $u \in C_c^\infty(\mathbb{R})$.
[/definition]
The power of the inverse scattering method lies in the fact that both the forward and inverse problems can be solved, and that the forward transform interacts beautifully with the time evolution imposed by KdV.
### Continuous Spectrum
When $\lambda = k^2 > 0$, the eigenvalue equation $L\psi = k^2\psi$ becomes
\begin{align*}
-\psi_{xx} + u(x)\psi = k^2\psi.
\end{align*}
Since $u$ has compact support and vanishes for $|x| > R$, in the region $|x| > R$ this reduces to
\begin{align*}
\psi_{xx} + k^2\psi = 0,
\end{align*}
which has the general solution $\psi = \alpha e^{-ikx} + \beta e^{ikx}$. So any solution, as $|x| \to \infty$, must be a linear combination of plane waves $e^{\pm ikx}$. Since the potential $u$ is compactly supported, there is no reason the coefficients on the left and right need to match: the potential acts as a scatterer.
We single out a specific solution $\phi(x, k)$ by imposing the asymptotic condition on the left:
\begin{align*}
\phi(x, k) \sim e^{-ikx} \quad \text{as } x \to -\infty.
\end{align*}
This is an incoming wave from the right travelling leftward. After passing through the support of $u$, the wave is partially reflected and partially transmitted.
[quotetheorem:1341]
The existence follows from the fact that for $|x| > R$ the equation is a constant-coefficient ODE with a two-dimensional solution space, and the conditions at $-\infty$ pick out a unique element. Since $k$ is real, solutions exist for every $k$, which is why this part of the spectrum is called the continuous spectrum.
The functions $a(k)$ and $b(k)$ encode how the potential responds to the incident wave. It is more physically transparent to work with their ratio.
[definition: Reflection Coefficient]
Given the scattering solution $\phi(x, k)$ with asymptotic coefficients $a(k)$ and $b(k)$, the **reflection coefficient** $R: \mathbb{R} \to \mathbb{C}$ is
\begin{align*}
R(k) = \frac{b(k)}{a(k)},
\end{align*}
and the **transmission coefficient** $T: \mathbb{R} \to \mathbb{C}$ is
\begin{align*}
T(k) = \frac{1}{a(k)}.
\end{align*}
[/definition]
To interpret these, set $\Phi = \phi / a$. Then
\begin{align*}
\Phi(x, k) \sim \begin{cases} T(k) e^{-ikx} & x \to -\infty, \\ e^{-ikx} + R(k)e^{ikx} & x \to +\infty. \end{cases}
\end{align*}
Viewing $e^{-ikx}$ as a wave moving to the left and $e^{ikx}$ as a wave moving to the right, we can read $\Phi$ as follows: an incident wave $e^{-ikx}$ arrives from the right; the potential reflects a fraction $R(k)e^{ikx}$ back to the right and transmits the remainder $T(k)e^{-ikx}$ through to the left. The reflection and transmission coefficients satisfy an energy conservation law.
[quotetheorem:1355]
This identity is the statement that the total energy of the scattered wave equals the energy of the incident wave: the potential neither creates nor destroys energy, only redistributes it between transmitted and reflected components.
At large $|k|$, the wave has high frequency and correspondingly high energy. The potential becomes a negligible perturbation at these scales, so we expect $T(k) \to 1$ and $R(k) \to 0$ as $|k| \to \infty$. This is indeed the case and provides a useful sanity check on computations.
### Discrete Spectrum and Bound States
When $\lambda = -\kappa^2 < 0$ with $\kappa > 0$, the character of the problem changes entirely. The eigenvalue equation becomes
\begin{align*}
-\psi_{xx} + u(x)\psi = -\kappa^2\psi.
\end{align*}
For $|x| > R$ this reduces to $\psi_{xx} - \kappa^2\psi = 0$, whose general solution is $\alpha e^{\kappa x} + \beta e^{-\kappa x}$. For $\psi$ to be square-integrable — that is, for $\int_{-\infty}^\infty \psi(x)^2 \, dx < \infty$ — we need exponential decay in both directions, which forces $\psi(x) \to 0$ as $|x| \to \infty$.
[definition: Bound States]
A **bound state** with eigenvalue $-\kappa^2 < 0$ is a solution $\psi_\kappa \in L^2(\mathbb{R})$ to $L\psi_\kappa = -\kappa^2\psi_\kappa$ satisfying the normalisation condition
\begin{align*}
\|\psi_\kappa\|^2 = \int_{-\infty}^\infty \psi_\kappa(x)^2 \, dx = 1.
\end{align*}
[/definition]
The square-integrability requirement imposes a severe constraint on which $\kappa$ can appear. This is what produces the discrete part of the spectrum.
[quotetheorem:1356]
[citeproof:1356]
The key structural fact about bound states is their finiteness.
[quotetheorem:1342]
The proof of this result uses an explicit formula for the coefficient $\beta(k)$ (analytically continued to imaginary $k$) and shows that it has only finitely many zeros, each corresponding to a bound state eigenvalue. We quote the result without proof.
The finitely many bound states $\{\psi_n\}_{n=1}^N$ satisfy $L\psi_n = -\kappa_n^2\psi_n$ with
\begin{align*}
\kappa_1 > \kappa_2 > \cdots > \kappa_N > 0,
\end{align*}
and $\|\psi_n\| = 1$, with large-$x$ asymptotics
\begin{align*}
\psi_n(x) \sim c_n \, e^{-\kappa_n x} \quad \text{as } x \to +\infty.
\end{align*}
The normalisation constants $\{c_n\}_{n=1}^N$ play a central role in what follows.
### Summary and Scattering Data
Assembling the continuous and discrete parts of the spectrum, we arrive at the key definition that makes the entire method work.
[definition: Scattering Data]
The **scattering data** for the Schrödinger operator $L = -\partial_x^2 + u : C^2(\mathbb{R}) \to C^0(\mathbb{R})$ is the collection
\begin{align*}
S = \bigl\{\{\kappa_n, c_n\}_{n=1}^N,\, R(k)\bigr\},
\end{align*}
consisting of the discrete eigenvalues $\kappa_n$ with their normalisation constants $c_n$, and the reflection coefficient $R(k)$ from the continuous spectrum. (The transmission coefficient $T(k)$ is determined by $R(k)$ via the energy conservation identity and is therefore not independent data.)
[/definition]
[example: Scattering Data For Dirac Delta Potential]
Consider the Dirac potential $u(x) = -2\alpha\delta(x)$ with $\alpha > 0$.
**Continuous spectrum.** Since $u = 0$ away from $x = 0$, we have
\begin{align*}
\Phi(x, k) = \begin{cases} T(k)e^{-ikx} & x < 0, \\ e^{-ikx} + R(k)e^{ikx} & x > 0. \end{cases}
\end{align*}
Requiring continuity of $\Phi$ at $x = 0$ gives $T(k) = 1 + R(k)$. Integrating $L\Phi = k^2\Phi$ over $(-\varepsilon, \varepsilon)$ and taking $\varepsilon \to 0$ yields the jump condition $ik(R - 1) + ikT = -2\alpha T$. Solving these two equations gives
\begin{align*}
R(k) = \frac{i\alpha}{k - i\alpha}, \qquad T(k) = \frac{k}{k - i\alpha}.
\end{align*}
One verifies $|R|^2 + |T|^2 = 1$, and as $k \to \infty$ we have $R(k) \to 0$, $T(k) \to 1$, confirming that high-energy waves pass through unimpeded.
**Discrete spectrum.** For $x \neq 0$ we need $-\partial_x^2\psi + \kappa^2\psi = 0$, so $\psi_n(x) = c_n e^{-\kappa_n|x|}$. Integrating $L\psi_n = -\kappa_n^2\psi_n$ over $(-\varepsilon, \varepsilon)$ and passing to the limit gives $c_n\kappa_n = c_n\alpha$, so there is exactly one bound state with $\kappa_1 = \alpha$. Normalising: $1 = \int_{-\infty}^\infty c_1^2 e^{-2\alpha|x|} \, dx = c_1^2/\alpha$, giving $c_1 = \sqrt{\alpha}$.
The full scattering data is therefore
\begin{align*}
S = \left\{\bigl\{\alpha, \sqrt{\alpha}\bigr\},\; \frac{i\alpha}{k - i\alpha}\right\}.
\end{align*}
[/example]
## Inverse Scattering Problem
The forward problem produces scattering data from a potential. The inverse question asks whether this process can be reversed: given scattering data $S = \{\{\kappa_n, c_n\}_{n=1}^N, R(k)\}$, can we reconstruct a potential $u$ whose Schrödinger operator has exactly this scattering data?
Without a positive answer to this question, the scattering data would be a useful description but not a tool for solving PDEs. The remarkable fact, established by Gelfand, Levitan, and Marchenko in the 1950s, is that the answer is yes and the reconstruction formula is explicit.
[quotetheorem:1357]
The proof of the GLM theorem is substantial — it requires analytic function theory and careful control of the Volterra operators involved — and is not covered in this course. What matters for us is the structure of the result.
Notice that the GLM equation is linear in $K$: it has the form
\begin{align*}
K + F + \mathcal{A}K = 0
\end{align*}
for a linear operator $\mathcal{A}$ depending on $F$. For each fixed $x$, this is a Fredholm integral equation in $y$, and standard results guarantee a unique solution under mild conditions on $F$. This linearity is crucial: it means that even though we began with the nonlinear KdV equation, the reconstruction step is a linear problem.
The function $F(x)$ is the **Fourier transform of the reflection coefficient** plus an exponential contribution from each bound state. It captures all the spectral information of the potential in a single function. The GLM equation then unscrambles this information to produce $K$, and the derivative of $K$ along the diagonal gives the potential.
[remark: Structure of the GLM Equation]
The GLM equation can be viewed as an infinite-dimensional linear system $K + F + AK = 0$, where $A$ is a linear operator parametrised by $x$. For each $x$, one solves for $K(x, \cdot)$ as a function of $y$, then evaluates at $y = x$ to extract $u(x)$.
[/remark]
The full circle of the inverse scattering method for KdV can now be sketched. Starting from an initial potential $u_0(x) = u(x, 0)$, we compute $S(0) = \{\{\kappa_n, c_n(0)\}_{n=1}^N, R(k, 0)\}$. The KdV flow acts on the scattering data, producing $S(t)$ for each time $t$. We then apply GLM to $S(t)$ to recover $u(x, t)$. The KdV equation, which directly evolves $u$, becomes a simple linear ODE for the scattering data. The details of how the scattering data evolves under KdV are determined by the Lax pair formulation developed in the next section.
## Lax Pairs
The missing link in the programme above is a precise mechanism for converting the KdV equation into a statement about how scattering data evolves. To do this, we need a structure that connects the time evolution of the potential $u(x, t)$ to the spectral theory of the Schrödinger operator $L = -\partial_x^2 + u$. This structure is a **Lax pair**.
The motivation is an analogy with Hamiltonian mechanics. Recall from Chapter 1 that in a Hamiltonian system, the time evolution of any observable $f$ is determined by the Poisson bracket: $\dot{f} = \{f, H\}$. The Hamiltonian $H$ generates time evolution through this algebraic structure. We need something similar at the operator level: a second operator that generates the time evolution of $L$.
If we allowed $u$ (and hence $L$) to vary in time, we would expect the eigenvalue equation $L(t)\psi(t) = \lambda(t)\psi(t)$ to evolve. The question is: for which time evolutions of $u$ do the eigenvalues $\lambda$ remain constant? Time-independent eigenvalues would mean the spectral type of $L$ is preserved — a property called **isospectrality**.
[definition: Lax Pair]
Let $L(t)$ be a time-dependent self-adjoint linear differential operator acting on $C^\infty(\mathbb{R})$,
\begin{align*}
L = a_m(x, t)\partial_x^m + \cdots + a_1(x, t)\partial_x + a_0(x, t),
\end{align*}
with $L_t := \dot{a}_m\partial_x^m + \cdots + \dot{a}_0$ denoting the operator obtained by differentiating each coefficient in $t$. If there exists a second operator $A: C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$ such that
\begin{align*}
L_t = [L, A] := LA - AL,
\end{align*}
then the pair $(L, A)$ is called a **Lax pair**.
[/definition]
The equation $L_t = [L, A]$ is called the **Lax equation**. Its geometric content is this: $A$ generates a unitary flow in the space of operators, and the Lax equation says that $L$ evolves by conjugation, which preserves eigenvalues. This is the content of the following fundamental theorem.
[quotetheorem:1358]
[citeproof:1358]
The isospectral flow theorem is the engine of the entire method. For the Schrödinger operator specifically, it tells us that the discrete eigenvalues $\{-\kappa_n^2\}$ are conserved in time whenever the potential $u$ evolves according to a PDE that can be written as a Lax equation. The eigenvalues $\kappa_n$ are integrals of motion.
The theorem also requires self-adjointness of $L$, which is a genuine hypothesis and not merely a technical convenience. If $L$ were not self-adjoint, the inner product identity used in the proof would fail. More broadly, the requirement that $L_t = [L, A]$ holds exactly — not approximately, not as a formal expansion — is essential. The fact that KdV admits such an exact Lax pair representation is the core miracle of the theory.
[example: Lax Pair Formulation of KdV]
Set
\begin{align*}
L &= -\partial_x^2 + u(x, t), \\
A &= 4\partial_x^3 - 3\bigl(u\partial_x + \partial_x u\bigr).
\end{align*}
Then $(L, A)$ is a Lax pair if and only if $u = u(x, t)$ satisfies the KdV equation:
\begin{align*}
L_t = [L, A] \iff u_t + u_{xxx} - 6uu_x = 0.
\end{align*}
This equivalence is verified by direct computation: expanding $[L, A]$ and $L_t$ in terms of differential operators and equating coefficients, one finds that all terms involving $\partial_x$, $\partial_x^2$, and lower cancel, leaving exactly the KdV equation as the compatibility condition.
[/example]
The discovery of this Lax pair by Peter Lax in 1968 was the breakthrough that unlocked the full theory. It converted the nonlinear PDE into the question of how a self-adjoint operator evolves under a Lax flow, and the Isospectral Flow Theorem gave the answer. Notice that $A$ is antisymmetric (in the $L^2$ sense): $A^* = -A$. This is what ensures the operator $e^{-At}$ is unitary, and unitarity is what preserves the spectrum.
## Evolution of Scattering Data
We now allow the potential $u = u(x, t)$ to evolve according to the KdV equation
\begin{align*}
u_t + u_{xxx} - 6uu_x = 0,
\end{align*}
and track how the scattering data of $L = -\partial_x^2 + u(x, t)$ evolves. By the Lax Pair formulation of KdV, we have $L_t = [L, A]$ with $A = 4\partial_x^3 - 3(u\partial_x + \partial_x u)$. Since $u$ has compact support, for large $|x|$ the potential vanishes and $A$ simplifies to $A = 4\partial_x^3$.
The isospectral flow theorem immediately tells us that the discrete eigenvalues $\kappa_n$ are constant in time. The normalisation constants $c_n$ and the reflection coefficient $R(k)$ require more work to track.
### Evolution of the Continuous Spectrum
[quotetheorem:1343]
[citeproof:1343]
The transmission coefficient is preserved exactly, while the reflection coefficient acquires a purely oscillatory phase factor. In particular, $|R(k, t)| = |R(k, 0)|$ for all time: the amplitude of the reflection coefficient is a conserved quantity, only its phase evolves. This is precisely the linear, easy-to-invert evolution that the method requires.
### Evolution of the Discrete Spectrum
[quotetheorem:1344]
[citeproof:1344]
The structure of the proof is elegant: the Isospectral Flow Theorem guarantees that $\tilde\psi_n$ is a bound state, but the antisymmetry of $A$ forces $\tilde\psi_n$ to be orthogonal to every bound state at the same eigenvalue, collapsing it to zero. The entire time evolution of the discrete spectrum is then encoded in the simple exponential $e^{4\kappa_n^3 t}$.
### Summary of the Inverse Scattering Transform
Putting together the forward transform, the time evolution, and the GLM inverse, the method gives a complete solution procedure for KdV. Given initial data $u_0(x) = u(x, 0)$:
1. **Forward transform.** Compute $S(0) = \{\{\kappa_n, c_n(0)\}_{n=1}^N, R(k, 0)\}$ from $u_0$.
2. **Evolve.** The scattering data at time $t$ is
\begin{align*}
S(t) = \left\{\bigl\{\kappa_n,\, c_n(0)e^{4\kappa_n^3 t}\bigr\}_{n=1}^N,\; R(k, 0)e^{8ik^3 t}\right\}.
\end{align*}
3. **Inverse transform.** Apply the GLM theorem to $S(t)$ to recover $u(x, t)$.
Every step in this procedure is linear. The nonlinearity of KdV is entirely absorbed into the nonlinear forward transform (computing $S$ from $u_0$). Once in scattering space, the dynamics are purely linear.
The analogy with the Fourier method for the linearised equation $u_t + u_{xxx} = 0$ is illuminating. For that equation, one applies the Fourier transform to get $\hat u_t - ik^3\hat u = 0$, evolves as $\hat u(k, t) = \hat u_0(k)e^{ik^3 t}$, and inverts. The inverse scattering transform replaces the Fourier transform with the scattering map, the simple Fourier evolution with the explicit scattering data evolution, and the inverse Fourier transform with the GLM equation. The key enabling fact is that $u_t + u_{xxx} - 6uu_x = 0$ holds if and only if $L_t = [L, A]$ — without this exact equivalence, the Isospectral Flow Theorem would not apply and the method would fail.
[remark: Integrals of Motion]
Since the discrete eigenvalues $\kappa_n$ are time-independent, the quantities $\kappa_1, \ldots, \kappa_N$ are conserved quantities of the KdV equation. They are in involution (their Poisson brackets vanish), which is one reason KdV is called Liouville-integrable in the PDE sense; the full infinite-dimensional Hamiltonian structure that makes this precise — including the bi-Hamiltonian framework that generates the complete tower of commuting conservation laws — is developed in Chapter 4.
[/remark]
## Reflectionless Potentials
The GLM equation is difficult to solve in full generality because $F(x)$ contains an integral over the continuous spectrum. However, for a special class of potentials, the contribution from $R(k)$ vanishes entirely, and the GLM equation reduces to a finite-dimensional linear system that can be solved by elementary linear algebra. These special potentials are the seeds from which soliton solutions grow.
[definition: Reflectionless Potential]
A compactly supported potential $u(\cdot\,, 0) \in C_c^\infty(\mathbb{R})$ is called **reflectionless** if $R(k, 0) = 0$ for all $k \in \mathbb{R}$.
[/definition]
If $u$ starts reflectionless, the evolution formula gives $R(k, t) = R(k, 0)e^{8ik^3 t} = 0$ for all $t > 0$. Reflectionless potentials remain reflectionless under KdV evolution — a remarkable structural stability property.
For a reflectionless potential, the function $F$ entering the GLM equation simplifies dramatically:
\begin{align*}
F(x) = \sum_{n=1}^N c_n^2 e^{-\kappa_n x}.
\end{align*}
This is a finite sum of exponentials. We now make an Ansatz for the kernel $K$: suppose
\begin{align*}
K(x, y) = \sum_{m=1}^N K_m(x) e^{-\kappa_m y}
\end{align*}
for some unknown functions $K_1, \ldots, K_N$. Substituting into the GLM equation and using the linear independence of $\{e^{-\kappa_n y}\}_{n=1}^N$, we extract $N$ equations, one for each $n$:
\begin{align*}
c_n^2 e^{-\kappa_n x} + K_n(x) + \sum_{m=1}^N \frac{c_n^2 K_m(x)}{\kappa_n + \kappa_m} e^{-(\kappa_n + \kappa_m)x} = 0.
\end{align*}
This is a linear system for $(K_1(x), \ldots, K_N(x))$ at each fixed $x$. Setting
\begin{align*}
A_{nm}(x) = \delta_{nm} + \frac{c_n^2 e^{-(\kappa_n + \kappa_m)x}}{\kappa_n + \kappa_m}, \qquad K = (K_1, \ldots, K_N)^\top, \qquad c = (c_1^2 e^{-\kappa_1 x}, \ldots, c_N^2 e^{-\kappa_N x})^\top,
\end{align*}
the system becomes $AK = -c$.
Rather than solving this system directly, we exploit a beautiful identity to compute $K(x, x)$ without knowing $K$ explicitly. Observe that
\begin{align*}
\frac{d}{dx} A_{nm}(x) = -c_n^2 e^{-\kappa_n x} e^{-\kappa_m x} = (-c)_n e^{-\kappa_m x}.
\end{align*}
Since $K(x, x) = \sum_{m=1}^N K_m(x)e^{-\kappa_m x} = \sum_{m,n} (A^{-1})_{mn}(-c)_n e^{-\kappa_m x}$, we can substitute to get
\begin{align*}
K(x, x) = \sum_{m,n} (A^{-1})_{mn} A'_{nm} = \operatorname{tr}(A^{-1}A') = \frac{d}{dx}\log(\det A).
\end{align*}
The last step uses the matrix identity $\operatorname{tr}(A^{-1}A') = (\det A)'/(\det A)$. Therefore,
\begin{align*}
u(x) = -2\frac{d}{dx}K(x, x) = -2\frac{d^2}{dx^2}\log(\det A).
\end{align*}
Restoring the time dependence via $c_n(t) = c_n(0)e^{4\kappa_n^3 t}$, we obtain the $N$-soliton solution to KdV:
\begin{align*}
u(x, t) = -2\frac{\partial^2}{\partial x^2}\log(\det A(x, t)),
\end{align*}
where
\begin{align*}
A_{nm}(x, t) = \delta_{nm} + \frac{c_n(0)^2 e^{8\kappa_n^3 t} e^{-(\kappa_n + \kappa_m)x}}{\kappa_n + \kappa_m}.
\end{align*}
The integer $N$ — the number of discrete bound states — is exactly the number of solitons in the solution. Each bound state eigenvalue $\kappa_n$ determines the speed and amplitude of the $n$-th soliton. The interaction between solitons is encoded in the off-diagonal entries of the matrix $A$.
[example: One-Soliton Solution]
For $N = 1$, the matrix $A$ is the scalar
\begin{align*}
A(x, t) = 1 + \frac{c_1(0)^2 e^{8\kappa_1^3 t} e^{-2\kappa_1 x}}{2\kappa_1}.
\end{align*}
Setting $\xi = \kappa_1 x - 4\kappa_1^3 t - \frac{1}{2}\log\frac{c_1(0)^2}{2\kappa_1}$ and computing $-2\partial_x^2\log A$ gives
\begin{align*}
u(x, t) = -2\kappa_1^2 \operatorname{sech}^2(\kappa_1 x - 4\kappa_1^3 t + \delta),
\end{align*}
for some phase shift $\delta$. This is the classical one-soliton: a localised wave of amplitude $-2\kappa_1^2$ (negative, since KdV has a negative sign in its nonlinear term with this sign convention) moving at speed $4\kappa_1^2$. Larger $\kappa_1$ gives a taller, narrower, faster soliton.
[/example]
The fact that the number of solitons is determined by the number of discrete eigenvalues explains why different initial data produce different numbers of solitons. A smooth, rapidly decaying initial condition $u_0(x)$ generically produces finitely many bound states, hence finitely many solitons, plus a dispersive tail (the radiative part from the continuous spectrum). Over long times, the solitons separate and travel at their individual speeds, with only the radiative component decaying.
## Linearised Inverse Scattering as a Fourier Transform
The inverse scattering transform is a nonlinear generalisation of the Fourier transform. To make this precise, consider a small potential $u(x) = \varepsilon q(x)$ with $0 < \varepsilon \ll 1$. In this perturbative regime, the scattering map can be linearised, and the result is precisely the Fourier transform.
This section shows that the Fourier transform is the limit of the inverse scattering transform as the nonlinearity is turned off — confirming that IST is the natural nonlinear extension of the classical Fourier method.
[quotetheorem:1345]
[citeproof:1345]
With this linearised $R(k)$, the GLM equation becomes tractable. Since both $K$ and $F$ are $O(\varepsilon)$, we have $K(x, y) + F(x + y) = O(\varepsilon^2)$, so to leading order $K(x, x) = -F(2x)$. For a reflectionless perturbation with no bound states, $F(x) = \frac{1}{2\pi}\int_{-\infty}^\infty e^{ikx}R(k)\,dk$, so
\begin{align*}
K(x, x) = -\frac{1}{2\pi}\int_{-\infty}^\infty e^{2ikx}R(k)\,dk.
\end{align*}
Differentiating and substituting the linearised $R(k)$:
\begin{align*}
u(x) = -2\frac{d}{dx}K(x, x) = \frac{1}{2\pi}\int_{-\infty}^\infty 2ik\, R(k)\, e^{2ikx}\, dk.
\end{align*}
Substituting $R(k) = \frac{\varepsilon}{2ik}\hat q(2k) + O(\varepsilon^2)$ where $\hat q(\xi) = \int_{-\infty}^\infty e^{-i\xi z}q(z)\,dz$ is the Fourier transform of $q$:
\begin{align*}
u(x) = \frac{\varepsilon}{2\pi}\int_{-\infty}^\infty \hat q(2k)\, e^{2ikx}\, d(2k) = \varepsilon q(x),
\end{align*}
by the Fourier inversion theorem. The IST recovers $\varepsilon q(x)$ exactly, which is $u(x)$ to leading order.
This computation has a striking interpretation. In the linear regime, the scattering map $u \mapsto R(k)$ is (up to the weight $1/(2ik)$) the Fourier transform. The GLM inversion is the inverse Fourier transform. The full nonlinear IST is therefore the natural generalisation of the Fourier method to the nonlinear setting: the scattering map plays the role of the Fourier transform, the evolution of scattering data plays the role of the ODE in Fourier space, and the GLM equation plays the role of the inverse Fourier transform.
[remark: IST as Nonlinear Fourier Analysis]
The inverse scattering transform can be viewed as a change of coordinates on the space of initial data, mapping $u_0$ to scattering data $S$. In these new coordinates, the KdV flow is linear. The hard part — the nonlinearity — is entirely contained in the coordinate change. This is the same philosophy as the action-angle variables constructed in Chapter 1: find canonical coordinates in which the Hamiltonian depends only on the action variables, exploit the resulting linearity of Hamilton's equations, and transform back.
[/remark]
While the Inverse Scattering Transform solves specific equations, a deeper question emerges: what structural principles—beyond ad hoc Lax pairs and spectral methods—guarantee that a PDE is integrable? The next chapter unveils the abstract architecture underlying all integrable systems: infinite-dimensional Hamiltonian formalism, bi-Hamiltonian structures, and zero-curvature conditions that encode integrability as a geometric property independent of specific examples.
Chapter 1 developed integrability theory for finite-dimensional Hamiltonian systems — phase spaces of dimension $2n$, the symplectic matrix $J$, the Poisson bracket, and the Arnold–Liouville theorem, which identifies the invariant tori and action-angle coordinates. Chapters 2 and 3 then showed that the KdV equation and related PDEs behave like infinite-dimensional integrable systems, admitting soliton solutions and an inverse scattering transform. The Korteweg–de Vries equation and its kin, however, live in an infinite-dimensional world — the phase space is a function space, not $\mathbb{R}^{2n}$. To understand why these PDEs are integrable, we must extend the entire Hamiltonian framework to this infinite-dimensional setting. This chapter builds that extension systematically: we define functionals, functional derivatives, and Hamiltonian operators, show that KdV fits this mould in not just one but two compatible ways (making it bi-Hamiltonian), and then develop the zero curvature representation, which is the universal language unifying Lax pairs and the inverse scattering method.
# 4. Structure of Integrable PDEs
## Infinite-Dimensional Hamiltonian Systems
What does it mean for a PDE to be a Hamiltonian system? In finite dimensions, the data are a phase space $\mathbb{R}^{2n}$, an antisymmetric nondegenerate matrix $J$, and a Hamiltonian $H: \mathbb{R}^{2n} \to \mathbb{R}$. The equation of motion is
\begin{align*}
\frac{dx}{dt} = J \frac{\partial H}{\partial x}.
\end{align*}
For a PDE like KdV, the "state" is a function $u(x,t)$, not a finite list of coordinates. The index $i = 1, \ldots, 2n$ is replaced by a continuous variable $x \in \mathbb{R}$. To lift the finite-dimensional theory, we must replace each ingredient — inner products, derivative operators, and the matrix $J$ — with an infinite-dimensional analogue.
### The $L^2$ Inner Product and Functionals
The finite-dimensional dot product $x \cdot y = \sum_i x_i y_i$ becomes an integral. For functions $u, v \in L^2(\mathbb{R})$, we write
\begin{align*}
\langle u, v \rangle = \int_{\mathbb{R}} u(x) v(x)\, dx.
\end{align*}
When $u$ and $v$ also depend on $t$, this inner product is a function of $t$.
In the finite-dimensional setting, the Hamiltonian is a smooth function $H: \mathbb{R}^{2n} \to \mathbb{R}$. Its infinite-dimensional analogue cannot be an arbitrary functional on a function space — we restrict attention to a class of functionals with good variational properties.
Without this restriction, the notion of "differentiating with respect to $u$" would not be well-defined in the sense needed to write down equations of motion.
[definition: Functional]
A **functional** $F: C^\infty(\mathbb{R}) \to \mathbb{R}$ is a real-valued map of the form
\begin{align*}
F[u] = \int_{\mathbb{R}} f(x, u, u_x, u_{xx}, \ldots)\, dx,
\end{align*}
where $f$ is a smooth function of its arguments and $u \in C^\infty(\mathbb{R})$ decays sufficiently rapidly at infinity for the integral to converge. If $u$ depends on time, then $F[u]$ is a function of $t$.
[/definition]
Square brackets denote functionals throughout, to distinguish them from ordinary functions: $H[u]$ is a functional, while $h(x)$ is a function.
### The Functional Derivative
To write the equation of motion $u_t = \mathcal{J}\, \delta H$, we need a notion of "derivative of a functional with respect to $u$." The analogue of $\partial H / \partial x_i$ is the functional derivative. The key point is that $\delta F$ is a function (not a number), playing the role of the gradient of $F$ in function space.
Without the functional derivative, we cannot formulate the equation of motion for an infinite-dimensional Hamiltonian system, since the right-hand side of $u_t = \mathcal{J}\, \delta H$ would be undefined.
[definition: Functional Derivative]
The **functional derivative** of $F[u]$ at $u$ is the function $\delta F[u]: \mathbb{R} \to \mathbb{R}$ satisfying
\begin{align*}
\langle \delta F, \eta \rangle = \lim_{\varepsilon \to 0} \frac{F[u + \varepsilon\eta] - F[u]}{\varepsilon}
\end{align*}
for all $\eta \in C_c^\infty(\mathbb{R})$. Equivalently,
\begin{align*}
F[u + \varepsilon\eta] = F[u] + \varepsilon\langle \delta F, \eta \rangle + o(\varepsilon)
\end{align*}
as $\varepsilon \to 0$. The functional derivative $\delta F$ is itself a function of $x$, depending on the base point $u$.
[/definition]
The functional derivative is the infinite-dimensional analogue of the gradient vector $\partial H / \partial x$: it is the direction in function space along which $F$ increases most steeply.
[example: Functional Derivative of the Dirichlet Functional]
Let $F[u] = \frac{1}{2}\int_{\mathbb{R}} u_x^2\, dx$. To compute $\delta F$, we expand:
\begin{align*}
F[u + \varepsilon\eta] &= \frac{1}{2}\int_{\mathbb{R}} (u_x + \varepsilon\eta_x)^2\, dx \\
&= F[u] + \varepsilon \int_{\mathbb{R}} u_x \eta_x\, dx + o(\varepsilon) \\
&= F[u] + \varepsilon \langle u_x, \eta_x \rangle + o(\varepsilon).
\end{align*}
This is not yet in the form $\langle \delta F, \eta \rangle$, since the inner product involves $\eta_x$ rather than $\eta$. Integrating by parts — valid because $\eta$ has compact support so boundary terms vanish — gives
\begin{align*}
\int_{\mathbb{R}} u_x \eta_x\, dx = -\int_{\mathbb{R}} u_{xx} \eta\, dx = \langle -u_{xx}, \eta \rangle.
\end{align*}
Therefore $\delta F = -u_{xx}$.
[/example]
Integration by parts is the workhorse for computing functional derivatives: whenever the inner product lands on a derivative of $\eta$, we integrate by parts to transfer derivatives back onto $u$.
The general formula, familiar from variational principles, gives: if $F[u] = \int f(x, u, u_x, u_{xx}, \ldots)\, dx$, then
\begin{align*}
\delta F = \frac{\partial f}{\partial u} - \frac{d}{dx}\left(\frac{\partial f}{\partial u_x}\right) + \frac{d^2}{dx^2}\left(\frac{\partial f}{\partial u_{xx}}\right) - \cdots
\end{align*}
Here $\frac{d}{dx}$ denotes the **total derivative** with respect to $x$, which differs from the partial derivative when $f$ depends on $u(x)$ and its derivatives.
[definition: Total Derivative]
Let $f: \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \cdots \to \mathbb{R}$, written $f = f(x, u, u_x, u_{xx}, \ldots)$, be a smooth function of finitely many arguments. For a given function $u = u(x)$, the **total derivative** of $f$ with respect to $x$ is
\begin{align*}
\frac{d}{dx} f(x, u(x), u_x(x), \ldots) = \frac{\partial f}{\partial x} + u_x \frac{\partial f}{\partial u} + u_{xx} \frac{\partial f}{\partial u_x} + \cdots
\end{align*}
[/definition]
The total derivative applies the chain rule to $f$ along a particular function $u(x)$, treating $u$ and all its derivatives as functions of $x$.
### The Hamiltonian Operator and Equation of Motion
The last ingredient is the replacement for the antisymmetric matrix $J$. In finite dimensions, the Poisson bracket of two functions $f, g: \mathbb{R}^{2n} \to \mathbb{R}$ is
\begin{align*}
\{f, g\} = \frac{\partial f}{\partial x} \cdot J \frac{\partial g}{\partial x}.
\end{align*}
This bracket is bilinear, antisymmetric, and satisfies the Jacobi identity. To promote this to infinite dimensions, we replace $\partial / \partial x$ with the functional derivative, replace the dot product with $\langle \cdot, \cdot \rangle$, and replace $J$ with a linear operator $\mathcal{J}$.
The difficulty is that antisymmetry and linearity of $\mathcal{J}$ do not automatically guarantee the Jacobi identity. The Jacobi identity is an additional constraint on $\mathcal{J}$, which must be verified case by case. Without it, the bracket would not define a genuine Poisson structure, and the connection to conserved quantities would break down.
[definition: Hamiltonian Operator]
A linear operator $\mathcal{J}: C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$ is a **Hamiltonian operator** if it is antisymmetric (i.e., $\langle u, \mathcal{J} v \rangle = -\langle \mathcal{J} u, v \rangle$ for all $u, v$ decaying at infinity) and the bracket
\begin{align*}
\{F, G\} = \langle \delta F, \mathcal{J}\, \delta G \rangle = \int_{\mathbb{R}} \delta F(x)\, (\mathcal{J}\, \delta G)(x)\, dx
\end{align*}
satisfies the Jacobi identity $\{F, \{G, H\}\} + \{G, \{H, F\}\} + \{H, \{F, G\}\} = 0$ for all functionals $F, G, H$.
[/definition]
The simplest antisymmetric linear operator on functions is $\mathcal{J} = \partial_x$. To see why $\partial_x$ is antisymmetric, integrate by parts: for any two smooth functions $u, v$ decaying at infinity,
\begin{align*}
\langle u, \partial_x v \rangle = \int_{\mathbb{R}} u\, v_x\, dx = -\int_{\mathbb{R}} u_x\, v\, dx = -\langle \partial_x u, v \rangle,
\end{align*}
where the boundary term vanishes. Antisymmetry of $\partial_x$ is therefore a direct consequence of integration by parts. That $\partial_x$ also satisfies the Jacobi identity can be verified by a direct calculation; the same integration-by-parts argument shows that the relevant triple bracket terms cancel.
However, antisymmetry alone does not guarantee that an operator is Hamiltonian — the Jacobi identity is an additional, independent condition. To see this concretely, consider $\mathcal{J} = \partial_x^2$. This operator is symmetric, not antisymmetric (integration by parts gives $\langle u, \partial_x^2 v \rangle = \langle \partial_x^2 u, v \rangle$), so it fails already at the first requirement. A more instructive failure is $\mathcal{J} = \partial_x^2 \circ Q$ for suitable nonlinear multipliers $Q$: such operators can be antisymmetric yet fail the Jacobi identity, and the bracket they define is not a Poisson bracket. Antisymmetry is necessary but not sufficient.
[definition: Hamiltonian Form]
An evolution equation for $u = u(x, t)$ is in **Hamiltonian form** if it can be written as
\begin{align*}
\frac{\partial u}{\partial t} = \mathcal{J}\, \delta H
\end{align*}
for some functional $H = H[u]$ and some Hamiltonian operator $\mathcal{J}: C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$.
[/definition]
The functional $H$ is the Hamiltonian of the system, and $\mathcal{J}$ encodes the symplectic (or Poisson) structure. For any functional $I = I[u]$, the time evolution of $I$ along solutions is governed by the Poisson bracket, exactly as in finite dimensions.
[quotetheorem:1346]
[citeproof:1346]
This result has an immediate and important consequence: $I$ is a first integral of the system (conserved in time) if and only if $\{I, H\} = 0$. The functional $I$ Poisson-commutes with the Hamiltonian exactly when it is conserved. This is the infinite-dimensional version of Noether's theorem.
The Jacobi identity for $\mathcal{J}$ is what makes this statement useful in practice. If $\mathcal{J}$ failed the Jacobi identity, the bracket $\{\cdot,\cdot\}$ would be a skew-symmetric bilinear form but not a Poisson bracket. In that case the identity $\frac{dI}{dt} = \{I,H\}$ would still hold formally — it is a tautology given the definition of $\{I,H\}$ — but the collection of first integrals would no longer be closed under the bracket. Two conserved quantities $I_1, I_2$ with $\{I_1, H\} = 0 = \{I_2, H\}$ do not necessarily give $\{\{I_1, I_2\}, H\} = 0$ unless the Jacobi identity holds. The entire architecture of commuting conservation laws therefore depends on $\mathcal{J}$ being a genuine Hamiltonian operator.
The correspondence between finite and infinite dimensions is captured in the following dictionary:
| Finite-dimensional | Infinite-dimensional |
|---|---|
| Coordinates $x_i(t)$, $i = 1, \ldots, 2n$ | Function $u(x, t)$, $x \in \mathbb{R}$ |
| Inner product $x \cdot y = \sum_i x_i y_i$ | Inner product $\langle u, v \rangle = \int u(x)v(x)\, dx$ |
| Gradient $\partial / \partial x$ | Functional derivative $\delta / \delta u$ |
| Antisymmetric matrix $J$ | Hamiltonian operator $\mathcal{J}$ |
| Functions $f = f(x)$ | Functionals $F = F[u]$ |
| Hamiltonian system $\dot{x} = J\, \partial H/\partial x$ | PDE $u_t = \mathcal{J}\, \delta H$ |
## Bi-Hamiltonian Systems
### Two Compatible Hamiltonian Structures
The Hamiltonian framework explains conservation laws one at a time: each Poisson-commuting pair $(I, H)$ gives one conserved quantity. But integrable PDEs like KdV possess infinitely many independent conserved quantities. Where do they all come from?
The answer lies in a remarkable phenomenon: some equations can be written in Hamiltonian form in two distinct ways, using two different Hamiltonian operators. When the two structures are compatible in an appropriate sense, they generate an infinite tower of conserved quantities automatically.
[definition: Bi-Hamiltonian System]
A PDE is **bi-Hamiltonian** if there exist two Hamiltonian operators $\mathcal{J}_0, \mathcal{J}_1: C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$, and two functionals $H_0, H_1: C^\infty(\mathbb{R}) \to \mathbb{R}$, such that the PDE can be written both as
\begin{align*}
\frac{\partial u}{\partial t} = \mathcal{J}_1\, \delta H_1 = \mathcal{J}_0\, \delta H_0.
\end{align*}
[/definition]
The two Hamiltonian representations must be genuinely different — otherwise the bi-Hamiltonian structure is vacuous. The key condition that makes the two structures "compatible" is that any linear combination $\alpha \mathcal{J}_0 + \beta \mathcal{J}_1$ is also a Hamiltonian operator. This compatibility is the engine that drives the generation of infinitely many conserved quantities.
### KdV is Bi-Hamiltonian
The definition of a bi-Hamiltonian system raises an immediate question: does any concrete PDE actually satisfy it? Writing down two different Hamiltonian operators for the same equation is not automatic — we need $\mathcal{J}_1\,\delta H_1 = \mathcal{J}_0\,\delta H_0$, and we also need both $\mathcal{J}_0$ and $\mathcal{J}_1$ to satisfy the Jacobi identity independently. For the simpler operator $\mathcal{J}_1 = \partial_x$ this can be checked directly, but for a more complicated operator involving $u$ itself the Jacobi identity is far from obvious. The KdV equation $u_t = 6uu_x - u_{xxx}$ is the prototypical instance where all of this works out.
[example: KdV as a Bi-Hamiltonian System]
**First Hamiltonian structure.** Take $\mathcal{J}_1 = \partial_x$ and
\begin{align*}
H_1[u] = \int_{\mathbb{R}} \left(\frac{1}{2}u_x^2 + u^3\right) dx.
\end{align*}
The functional derivative is $\delta H_1 = -u_{xx} + 3u^2$, and therefore
\begin{align*}
\mathcal{J}_1\, \delta H_1 = \partial_x(-u_{xx} + 3u^2) = -u_{xxx} + 6uu_x,
\end{align*}
which is exactly the KdV equation.
**Second Hamiltonian structure.** Take
\begin{align*}
\mathcal{J}_0 = -\partial_x^3 + 4u\partial_x + 2u_x, \qquad H_0[u] = \int_{\mathbb{R}} \frac{1}{2}u^2\, dx.
\end{align*}
The functional derivative is $\delta H_0 = u$, and one can verify that $\mathcal{J}_0\, \delta H_0 = -u_{xxx} + 6uu_x$, again reproducing KdV. So $\mathcal{J}_1\, \delta H_1 = \mathcal{J}_0\, \delta H_0$, confirming that KdV is bi-Hamiltonian.
[/example]
The second Hamiltonian operator $\mathcal{J}_0 = -\partial_x^3 + 4u\partial_x + 2u_x$ is more complex than $\mathcal{J}_1 = \partial_x$: it depends on $u$ itself, making it a nonlinear operator in the usual sense (linear as a map on function space for fixed $u$, but the coefficients depend on $u$). Verifying that $\mathcal{J}_0$ satisfies the Jacobi identity requires checking the Schouten bracket condition, which is a substantial computation we do not carry out here.
### The Infinite Tower of Conserved Quantities
A single Hamiltonian structure gives us one Poisson bracket, and from it we can extract conservation laws one at a time: $I$ is conserved when $\{I, H\} = 0$. There is no mechanism in a single Hamiltonian structure that automatically produces new conserved quantities from old ones. The second Hamiltonian structure changes this entirely. Having two compatible structures $\mathcal{J}_0$ and $\mathcal{J}_1$ satisfying $\mathcal{J}_1\,\delta H_1 = \mathcal{J}_0\,\delta H_0$, we can define a sequence of Hamiltonians via the recurrence relation
\begin{align*}
\mathcal{J}_1\, \delta H_{n+1} = \mathcal{J}_0\, \delta H_n, \qquad n \geq 0.
\end{align*}
This says: to pass from $H_n$ to $H_{n+1}$, apply $\mathcal{J}_0$ to $\delta H_n$ and then "invert" $\mathcal{J}_1$. The existence and uniqueness of $H_{n+1}$ given $H_n$ — that is, the solvability of this equation in the space of local functionals — can be established, though the proof requires tools from formal variational calculus.
[quotetheorem:1347]
[citeproof:1347]
[remark: Involution and Integrability]
The involution property $\{H_n, H_m\}_1 = 0$ is the infinite-dimensional analogue of the condition for Arnold–Liouville integrability established in Chapter 1: functionally independent first integrals that Poisson-commute pairwise. For KdV, the first few conserved quantities $H_0, H_1, H_2, \ldots$ recover the mass, momentum, energy, and higher-order conserved densities familiar from explicit computation. The bi-Hamiltonian structure explains all of these at once.
[/remark]
The theorem's proof hinges critically on the antisymmetry of $\mathcal{J}_0$. If $\mathcal{J}_0$ failed to be antisymmetric, the third equality would not hold, and the index-shift argument would collapse. Similarly, the recurrence relation must be solvable for each $H_{n+1}$; if $\mathcal{J}_1$ were not invertible in the appropriate sense, the tower would terminate. Both hypotheses are essential.
## Zero Curvature Representation
### The Compatibility Condition
The Lax pair approach — writing a PDE as the compatibility condition of an overdetermined linear system — generalises beyond the specific KdV framework. Let us consider a general overdetermined system for an $N$-dimensional vector $v = v(x, t; \lambda)$:
\begin{align*}
\frac{\partial v}{\partial x} = U(x, t; \lambda)\, v, \qquad \frac{\partial v}{\partial t} = V(x, t; \lambda)\, v,
\end{align*}
where $U$ and $V$ are $N \times N$ matrix-valued functions depending on $(x, t)$ and on a spectral parameter $\lambda$. This system is overdetermined: two equations for a single unknown vector $v$. For a solution to exist, the two equations must be compatible.
[quotetheorem:1348]
[citeproof:1348]
[remark: Compatibility vs. Curvature]
The name "zero curvature" has geometric content. The matrices $U$ and $V$ can be thought of as the components of a connection one-form on the $(x, t)$-plane with values in a Lie algebra. The expression $\frac{\partial U}{\partial t} - \frac{\partial V}{\partial x} + [U, V]$ is precisely the curvature of this connection. The equation says the connection is flat — that is, parallel transport around any loop in the $(x,t)$-plane is path-independent.
[/remark]
If the curvature $\frac{\partial U}{\partial t} - \frac{\partial V}{\partial x} + [U,V]$ is nonzero, then $v_{xt} \neq v_{tx}$: parallel transport from $(x_0, t_0)$ to $(x_1, t_1)$ depends on the path taken, and the fundamental matrix is no longer well-defined as a function on the $(x,t)$-plane. In physical terms, the scattering data picks up a nontrivial path-dependent factor as time evolves, destroying the isospectrality on which the inverse scattering method relies. The zero curvature condition is therefore precisely what must hold for the inverse scattering method to be applicable.
The zero curvature equation is a nonlinear PDE for $U$ and $V$ themselves. When $U$ and $V$ are parametrised by a physical field $u(x,t)$ in a particular way, the zero curvature equation becomes equivalent to the integrable PDE satisfied by $u$.
### From Lax Pairs to Zero Curvature
How does the Lax pair formulation of earlier chapters fit into the zero curvature framework? Recall from Chapter 3 that the isospectral flow theorem says: if $L_t = [L, A]$, then the eigenvalues of $L$ are constant in time. The eigensolutions satisfy the pair of equations
\begin{align*}
L\psi &= \lambda\psi \\
\psi_t + A\psi &= 0.
\end{align*}
Conversely, if we enforce $\lambda_t = 0$ and differentiate the first equation with respect to $t$, substituting the second, we recover $L_t = [L, A]$. So Lax's equation is precisely the compatibility condition for these two linear equations.
To convert this into zero curvature form, we reduce the scalar eigenvalue problem to a first-order system. Suppose $L$ and $A$ are differential operators of order $n$ and $m$ respectively. The equation $L\psi = \lambda\psi$ expresses $\partial_x^n\psi$ as a linear combination of $\psi, \partial_x\psi, \ldots, \partial_x^{n-1}\psi$. Introducing the vector
\begin{align*}
\Psi = (\psi,\, \partial_x\psi,\, \ldots,\, \partial_x^{n-1}\psi)^\top \in \mathbb{R}^n,
\end{align*}
the scalar equation $L\psi = \lambda\psi$ becomes the matrix equation
\begin{align*}
\frac{\partial \Psi}{\partial x} = U(\lambda)\,\Psi,
\end{align*}
where $U(\lambda)$ is the companion matrix. Similarly, the time equation $\psi_t + A\psi = 0$ produces
\begin{align*}
\frac{\partial \Psi}{\partial t} = V(\lambda)\,\Psi
\end{align*}
for some matrix $V(\lambda)$.
The chain of equivalences is therefore:
\begin{align*}
L_t = [L, A] \quad \iff \quad \begin{cases} L\psi = \lambda\psi \\ \psi_t + A\psi = 0 \end{cases} \quad \iff \quad \begin{cases} \Psi_x = U(\lambda)\Psi \\ \Psi_t = V(\lambda)\Psi \end{cases} \quad \iff \quad \frac{\partial U}{\partial t} - \frac{\partial V}{\partial x} + [U, V] = 0.
\end{align*}
Every Lax pair gives rise to a zero curvature representation, and the two conditions are equivalent.
[example: Zero Curvature for KdV]
For the KdV Lax pair, $L = -\partial_x^2 + u$ and $A = -4\partial_x^3 + 6u\partial_x + 3u_x$, so $n = 2$. Setting $\Psi = (\psi, \partial_x\psi)^\top$, the companion matrix is
\begin{align*}
U(\lambda) = \begin{pmatrix} 0 & 1 \\ \lambda - u & 0 \end{pmatrix}.
\end{align*}
The matrix $V(\lambda)$ encoding the time evolution $\psi_t + A\psi = 0$ has entries depending on $u$ and $u_x$. The zero curvature equation $U_t - V_x + [U, V] = 0$ then recovers $u_t = 6uu_x - u_{xxx}$ by direct computation.
[/example]
## The Geometric Idea Behind Zero Curvature
### Scattering Data and Flat Connections
The zero curvature equation guarantees that parallel transport of $v$ around any loop in the $(x,t)$-plane is path-independent. But why should that imply anything about the eigenvalues of the scattering operator? The question is this: the spatial equation $\partial_x v = Uv$ generates a fundamental matrix solution $\Phi(x,t;\lambda)$ whose monodromy encodes the scattering data. As $t$ advances, $u(x,t)$ changes, so $U$ changes, so $\Phi$ changes. What geometric structure forces the eigenvalues of the monodromy to remain constant? The answer is flatness. Consider the general system
\begin{align*}
\frac{\partial v}{\partial x} = U(x, t; \lambda)\,v, \qquad \frac{\partial v}{\partial t} = V(x, t; \lambda)\,v.
\end{align*}
The spatial equation $\partial_x v = Uv$ plays the role of the Schrödinger equation $L\psi = k^2\psi$ from inverse scattering: it generates scattering data at a fixed time. The time equation $\partial_t v = Vv$ then governs how this scattering data evolves.
The zero curvature equation is the condition that ensures this evolution is consistent — that time can be run forwards and backwards without ambiguity. In the language of connections, this is the statement that the connection with one-form components $U\, dx + V\, dt$ is flat.
Flatness has a crucial consequence: the monodromy matrix — the matrix solution obtained by transporting the fundamental matrix around a spatial period — depends on $\lambda$ but its eigenvalues are independent of $t$. These eigenvalues are the spectral invariants of the system, playing precisely the role that the discrete eigenvalues played in the KdV inverse scattering transform.
[remark: Why Zero Curvature Unifies Integrable Systems]
The zero curvature formulation provides a uniform language for all known integrable PDEs. Different integrable systems correspond to different choices of Lie algebra (the matrices $U$ and $V$ may take values in $\mathfrak{sl}_2(\mathbb{C})$, $\mathfrak{gl}_N(\mathbb{C})$, loop algebras, etc.) and different polynomial dependence on the spectral parameter $\lambda$. The nonlinear Schrödinger equation, the sine-Gordon equation, the KdV equation, and the Toda lattice all fit into this framework. This universality — one condition, many integrable systems — is part of what makes the zero curvature representation so powerful.
[/remark]
The central message of this chapter is that the structural properties of integrable PDEs are not accidents. They reflect deep geometric and algebraic facts: the Hamiltonian structure explains conservation laws, the bi-Hamiltonian structure explains why there are infinitely many of them and why they commute, and the zero curvature representation reveals that the entire theory rests on the geometry of flat connections in the $(x,t)$-plane. These three perspectives — Hamiltonian, bi-Hamiltonian, and zero curvature — are complementary lenses on the same underlying integrability.
The abstract structures of integrable PDEs are intimately connected to their symmetries, which can be systematized through Lie groups and prolongation techniques. The final chapter explores how symmetry groups and their infinitesimal generators organize the solution space, and how special tests—notably the Painleve test—serve as diagnostic tools to identify integrable systems and unlock their explicit solutions.
Symmetry is one of the most powerful organising principles in mathematics and physics. The chapters so far have examined integrability through conserved quantities, Hamiltonian structure, and the inverse scattering method. This chapter takes a different but complementary route: instead of asking what quantities are preserved, we ask what transformations preserve the equation itself. When a differential equation possesses a continuous family of such transformations — a Lie group of symmetries — the structure can be exploited systematically to reduce order, find exact solutions, and test for integrability. The machinery that makes this precise is the theory of Lie groups, Lie algebras, and prolongations.
# 5. Symmetry Methods in PDEs
## Lie Groups and Lie Algebras
What does it mean for a family of transformations to depend "smoothly" on its parameters? The rotation matrices $\mathrm{SO}(2)$ form a group, but they also depend continuously on an angle — rotating by $\theta$ and then by $\phi$ is the same as rotating by $\theta + \phi$, and this composition rule is smooth in the parameters. This extra smoothness is precisely what a Lie group captures, and it is what allows us to differentiate group elements and pass from the nonlinear world of groups to the linear world of algebras.
We begin with the basic algebraic structures before adding the smooth manifold structure.
[definition: Group]
A **group** is a set $G$ equipped with a binary operation $(g_1, g_2) \mapsto g_1 g_2$ satisfying:
1. **Associativity**: $(g_1 g_2) g_3 = g_1(g_2 g_3)$ for all $g_1, g_2, g_3 \in G$.
2. **Identity**: there exists a unique $e \in G$ with $ge = eg = g$ for all $g \in G$.
3. **Inverses**: for each $g \in G$ there exists $g^{-1} \in G$ with $gg^{-1} = g^{-1}g = e$.
[/definition]
What matters for applications is not just the algebraic structure, but how a group acts on the spaces where our differential equations live.
[definition: Group Action]
A group $G$ **acts** on a set $X$ if there is a map $G \times X \to X$, written $(g, x) \mapsto g(x)$, satisfying $g(h(x)) = (gh)(x)$ and $e(x) = x$ for all $g, h \in G$ and $x \in X$.
[/definition]
The rotation group $\mathrm{SO}(2)$ acts on $\mathbb{R}^2$ via matrix multiplication, and this action is smooth in the parameter $t$ used to label the rotation. Making this smoothness precise requires the notion of a Lie group.
[definition: Lie Group]
An **$m$-dimensional Lie group** is a group $G$ whose elements depend continuously on $m$ parameters $t = (t_1, \ldots, t_m) \in \mathbb{R}^m$, written $g(t)$, in such a way that the map $(g_1, g_2) \mapsto g_1 g_2^{-1}$ is smooth as a function of the parameters. We adopt the convention $g(\mathbf{0}) = e$.
[/definition]
The practical criterion is that we only need to check smoothness of $(g_1, g_2) \mapsto g_1 g_2^{-1}$; the separate conditions on multiplication and inversion follow from this.
[example: SO(2) As A Lie Group]
Any element of $G = \mathrm{SO}(2)$ can be written as
\begin{align*}
g(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}, \quad t \in \mathbb{R}.
\end{align*}
Since $g(t)^{-1} = g(-t)$, the map $(g_1, g_2) \mapsto g_1 g_2^{-1}$ corresponds to $(t_1, t_2) \mapsto t_1 - t_2$, which is smooth. So $\mathrm{SO}(2)$ is a $1$-dimensional Lie group.
[/example]
[example: The Heisenberg Group]
Consider upper-triangular matrices of the form
\begin{align*}
g(t) = \begin{pmatrix} 1 & t_1 & t_3 \\ 0 & 1 & t_2 \\ 0 & 0 & 1 \end{pmatrix}, \quad t = (t_1, t_2, t_3) \in \mathbb{R}^3.
\end{align*}
Under matrix multiplication, $g(a) g(b)$ yields upper-triangular matrices with entries $a_1 + b_1$, $a_2 + b_2$, and $a_3 + b_3 + a_1 b_2$. Computing the inverse gives
\begin{align*}
g(b)^{-1} = \begin{pmatrix} 1 & -b_1 & b_1 b_2 - b_3 \\ 0 & 1 & -b_2 \\ 0 & 0 & 1 \end{pmatrix}.
\end{align*}
The map $(a, b) \mapsto g(a)g(b)^{-1}$ corresponds to the parameter map $(a, b) \mapsto (a_1 - b_1,\, a_2 - b_2,\, b_1 b_2 - b_3 - a_1 b_2 + a_3)$, which is polynomial and hence smooth. The Heisenberg group is therefore a $3$-dimensional Lie group.
[/example]
The key insight is that Lie groups are complicated nonlinear objects, but they carry an associated linear structure that is far easier to work with. Recall from Chapter 1 that flow maps are hard to study directly, but the vector fields generating them — the infinitesimal flows — are tractable; Chapter 1 formalised this through the commutator $[V_1, V_2]$ of vector fields and the theorem that two flows commute if and only if their generating vector fields have vanishing commutator. Exactly the same philosophy applies here: instead of working with the Lie group directly, we pass to its **Lie algebra**, which consists of the "infinitesimal group elements" near the identity.
For matrix Lie groups, this passage is concrete. Given a curve $A(\varepsilon)$ in $G$ with $A(0) = I$, expand
\begin{align*}
A(\varepsilon) = I + \varepsilon a + o(\varepsilon).
\end{align*}
The **Lie algebra** $\mathfrak{g}$ is the set of all leading-order terms $a$ arising from such curves. To see that $\mathfrak{g}$ is a vector space: if $A(\varepsilon) = I + \varepsilon a + o(\varepsilon)$ and $B(\varepsilon) = I + \varepsilon b + o(\varepsilon)$, then $A(\varepsilon)B(\varepsilon) = I + \varepsilon(a+b) + o(\varepsilon)$, so $a + b \in \mathfrak{g}$. Scalar multiples follow by reparametrising $\varepsilon \mapsto \lambda\varepsilon$.
The more remarkable fact is that $\mathfrak{g}$ is closed under the **Lie bracket** (commutator). Consider the curve
\begin{align*}
C(\varepsilon) = A(\sqrt{\varepsilon})\, B(\sqrt{\varepsilon})\, A(\sqrt{\varepsilon})^{-1} B(\sqrt{\varepsilon})^{-1}.
\end{align*}
Since $A(\varepsilon)^{-1} = I - \varepsilon a + o(\varepsilon)$ (which follows from $A(\varepsilon)A(\varepsilon)^{-1} = I$), expanding $C(\varepsilon)$ to order $\varepsilon$ yields
\begin{align*}
C(\varepsilon) = I + \varepsilon(ab - ba) + o(\varepsilon).
\end{align*}
So $[a, b]_L := ab - ba \in \mathfrak{g}$ whenever $a, b \in \mathfrak{g}$. The product of two infinitesimally-near-identity elements need not stay near the identity, but their commutator does.
[definition: Lie Algebra]
A **Lie algebra** is a vector space $\mathfrak{g}$ equipped with a bilinear, anti-symmetric map $[\cdot,\cdot]_L : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$, called the **Lie bracket**, satisfying the **Jacobi identity**:
\begin{align*}
[a,[b,c]_L]_L + [b,[c,a]_L]_L + [c,[a,b]_L]_L = 0
\end{align*}
for all $a, b, c \in \mathfrak{g}$. If $\dim \mathfrak{g} = m$, we say the algebra has **dimension** $m$.
[/definition]
The most important source of Lie algebras in this course is Lie groups, but the definition is purely algebraic.
[example: Cross Product As Lie Bracket]
Set $\mathfrak{g} = \mathbb{R}^3$ with $[a, b]_L = a \times b$. The cross product is bilinear and anti-symmetric. The vector triple product identity $a \times (b \times c) = b(a \cdot c) - c(a \cdot b)$ implies the Jacobi identity, confirming that $(\mathbb{R}^3, \times)$ is a $3$-dimensional Lie algebra.
[/example]
[example: Poisson Bracket As Lie Bracket]
Let $M$ be a phase space and $\mathfrak{g} = \{f : M \to \mathbb{R} \text{ smooth}\}$. Define $[f, g]_L = \{f, g\}$ (the Poisson bracket). Since the Poisson bracket is bilinear, anti-symmetric, and satisfies the Jacobi identity, this is a Lie algebra.
[/example]
[example: Lie Algebra Of SO(n)]
Let $G = \mathrm{SO}(n)$. For a curve $A(\varepsilon)$ in $G$ with $A(0) = I$, differentiate $A(\varepsilon)A(\varepsilon)^\top = I$ to get $a + a^\top = 0$, so the leading coefficient $a$ must be anti-symmetric. The trace condition $\det A(\varepsilon) = 1$ gives $\operatorname{tr}(a) = 0$, which is automatic for anti-symmetric matrices. Conversely, every anti-symmetric matrix $a$ generates a curve $A(t) = \exp(at)$ in $\mathrm{SO}(n)$. Thus
\begin{align*}
\mathfrak{so}(n) = \{a \in \mathrm{Mat}_n(\mathbb{R}) : a + a^\top = 0\}.
\end{align*}
That $\mathfrak{so}(n)$ is closed under $[a,b]_L = ab - ba$ follows from $(ab - ba)^\top = b^\top a^\top - a^\top b^\top = ba - ab = -(ab-ba)$.
[/example]
By convention, if a Lie group is denoted by capital letters (e.g. $\mathrm{SO}(n)$), its Lie algebra is denoted by the same letters in lowercase Fraktur (e.g. $\mathfrak{so}(n)$). Since $\mathfrak{g}$ is a vector space, we may choose a basis $\{a_i\}_{i=1}^m$. The Lie bracket must then satisfy
\begin{align*}
[a_i, a_j]_L = \sum_{k=1}^m c_{ij}^k\, a_k
\end{align*}
for constants $c_{ij}^k$ called the **structure constants** of the algebra. These constants encode the entire bracket structure.
## Vector Fields and One-Parameter Groups of Transformations
How do we bring Lie group theory to bear on differential equations? The key mechanism is to let the group act on the coordinate space of our equation. A group element $g \in G$ sends coordinates $x$ to new coordinates $\tilde{x} = g(x)$. For this to be useful analytically, we need a way to pass between the group (complicated) and a linear description (tractable). One-parameter subgroups provide exactly this bridge.
[definition: One-Parameter Group Of Transformations]
A smooth map $g^\varepsilon : \mathbb{R}^n \to \mathbb{R}^n$ is called a **one-parameter group of transformations** (1.p.g.t.) if
\begin{align*}
g^0 = \mathrm{id}, \qquad g^{\varepsilon_1} g^{\varepsilon_2} = g^{\varepsilon_1 + \varepsilon_2}.
\end{align*}
[/definition]
This is a smooth homomorphism from $(\mathbb{R}, +)$ into the group of diffeomorphisms of $\mathbb{R}^n$. The generating vector field records the infinitesimal action.
[definition: Generating Vector Field]
Given a one-parameter group of transformations $g^\varepsilon$, its **generating vector field** is $V : \mathbb{R}^n \to \mathbb{R}^n$ defined by
\begin{align*}
V(x) = \frac{d}{d\varepsilon} g^\varepsilon x\bigg|_{\varepsilon = 0}.
\end{align*}
This is the leading-order term in the Taylor expansion $g^\varepsilon x = x + \varepsilon V(x) + o(\varepsilon)$.
[/definition]
Going in the reverse direction — from a vector field to a one-parameter group — is an existence result for ODEs.
[quotetheorem:1349]
The result is not surprising: we are simply invoking the existence and uniqueness theorem for ODEs, which guarantees a smooth dependence on initial conditions. The one-dimensional space of solutions to the ODE is pinned down by requiring $\tilde{x}(0) = x$, forcing the linear term to be exactly $V$.
[example: From Vector Field To Transformation]
Consider the vector field
\begin{align*}
V = x \frac{\partial}{\partial x} + \frac{\partial}{\partial y}.
\end{align*}
The ODE system is $d\tilde{x}/d\varepsilon = \tilde{x}$ and $d\tilde{y}/d\varepsilon = 1$, with initial conditions $(\tilde{x}(0), \tilde{y}(0)) = (x, y)$. Solving gives the one-parameter group
\begin{align*}
g^\varepsilon(x, y) = (x e^\varepsilon,\; y + \varepsilon).
\end{align*}
[/example]
[example: From Transformation To Vector Field]
Consider the rotation group acting on $\mathbb{R}^2$:
\begin{align*}
g^\varepsilon(x, y) = (x\cos\varepsilon - y\sin\varepsilon,\; y\cos\varepsilon + x\sin\varepsilon).
\end{align*}
One can verify $g^0 = \mathrm{id}$ and $g^{\varepsilon_1} g^{\varepsilon_2} = g^{\varepsilon_1+\varepsilon_2}$. Differentiating at $\varepsilon = 0$:
\begin{align*}
V(x, y) = \frac{d}{d\varepsilon}(x\cos\varepsilon - y\sin\varepsilon, y\cos\varepsilon + x\sin\varepsilon)\bigg|_{\varepsilon=0} = (-y, x).
\end{align*}
In differential operator notation, this is $V = -y\,\partial/\partial x + x\,\partial/\partial y$.
[/example]
### The Exponential Map and Taylor Expansion
It is natural to identify a vector field $V = (V_1, \ldots, V_n)^\top$ with the first-order differential operator $V = \sum_{i=1}^n V_i \,\partial/\partial x_i$. Under this identification, the commutator of vector fields as differential operators coincides with the bracket we computed from the Lie group:
\begin{align*}
[V, W](f) = (VW - WV)(f) = \left(V_j \frac{\partial W_i}{\partial x_j} - W_j \frac{\partial V_i}{\partial x_j}\right)\frac{\partial f}{\partial x_i},
\end{align*}
where repeated indices are summed. So the commutator $[V, W] = VW - WV$ is itself a first-order operator, confirming closure under the Lie bracket.
[quotetheorem:1359]
[citeproof:1359]
[remark: Convergence Of The Exponential]
The series $\exp(\varepsilon V)$ should be interpreted with care: it converges extremely rarely, even for simple vector fields. Defining convergence of the operator series $\exp(\varepsilon V)$ requires significant functional-analytic machinery. The formula is nonetheless useful as a formal device for computing the action of $\psi^\varepsilon$.
[/remark]
The link to Lie groups is as follows. If a Lie group $G$ acts on $\mathbb{R}^n$, its action typically contains many one-parameter subgroups. Crucially, near the identity, every element $g(t)$ can be reached by composing finitely many one-parameter flows:
\begin{align*}
g(t) = g_{i_1}^{\varepsilon_1} g_{i_2}^{\varepsilon_2} \cdots g_{i_N}^{\varepsilon_N}.
\end{align*}
So to understand the group, we only need to understand its one-parameter subgroups — and to understand those, we only need their generating vector fields, i.e. the Lie algebra. This is the fundamental economy of Lie theory.
## Symmetries of Differential Equations
With the machinery of one-parameter groups in place, we can make precise what it means for a transformation to be a symmetry of a differential equation.
Denote a general differential equation (possibly a system) by
\begin{align*}
\Delta[x, u, u_{x}, u_{\mathbf{xx}}, \ldots] = 0,
\end{align*}
where $u = u(x)$ is the unknown function. The key observation is that when we apply a transformation, we must transform **both** the independent variables $x$ and the dependent variable $u$ — and these can be mixed together. A transformation $g^\varepsilon$ thus acts on the "extended space" of both domain and codomain.
[definition: Lie Point Symmetry]
A one-parameter group of transformations $g^\varepsilon : (x, u) \mapsto (\tilde{x}, \tilde{u})$ is a **Lie point symmetry** of $\Delta$ if
\begin{align*}
\Delta[x, u, u_{x}, \ldots] = 0 \implies \Delta[\tilde{x}, \tilde{u}, \tilde{u}_{\tilde{x}}, \ldots] = 0.
\end{align*}
In other words, $g^\varepsilon$ maps solutions to solutions.
[/definition]
The vector field generating the Lie point symmetry takes the form
\begin{align*}
V = \xi(x, u) \frac{\partial}{\partial x} + \eta(x, u) \frac{\partial}{\partial u},
\end{align*}
where $\xi$ and $\eta$ encode the infinitesimal transformations of the independent and dependent variables respectively.
[example: Translation Symmetry Of KdV]
Consider the KdV equation $\Delta = u_t + u_{xxx} - 6uu_x = 0$. The time translation
\begin{align*}
g^\varepsilon(x, t, u) = (x, t + \varepsilon, u)
\end{align*}
is a Lie point symmetry. By the chain rule, $\partial\tilde{u}/\partial\tilde{t} = \partial u/\partial t$, $\tilde{u}_{\tilde{x}} = u_x$, and $\tilde{u}_{\tilde{x}\tilde{x}\tilde{x}} = u_{xxx}$. So $\Delta[\tilde{x}, \tilde{t}, \tilde{u}] = \Delta[x, t, u] = 0$ whenever $\Delta = 0$. The generator is $V = \partial/\partial t$.
[/example]
The real power of Lie point symmetries comes from using them to solve or reduce equations. The strategy is to find coordinates adapted to the symmetry — coordinates in which the symmetry generator looks like a pure translation.
[example: Solving ODEs Via Lie Point Symmetry]
Consider the ODE $du/dx = F(u/x)$. The transformation $g^\varepsilon(x, u) = (e^\varepsilon x, e^\varepsilon u)$ is a Lie point symmetry, generated by $V = x\,\partial/\partial x + u\,\partial/\partial u$. We seek coordinates $(s, t)$ in which $V = \partial/\partial t$, i.e. coordinates satisfying
\begin{align*}
V(s) = x\frac{\partial s}{\partial x} + u\frac{\partial s}{\partial u} = 0, \qquad V(t) = x\frac{\partial t}{\partial x} + u\frac{\partial t}{\partial u} = 1.
\end{align*}
One solution is $s = u/x$ and $t = \log|x|$, with inverse $x = e^t$, $u = se^t$. In the $(s,t)$ coordinates, the ODE transforms to
\begin{align*}
\frac{dt}{ds} = \frac{1}{F(s) - s}
\end{align*}
(for $F(s) \neq s$), which has no explicit $t$ dependence and can be integrated directly:
\begin{align*}
\log|x| = C + \int^{u/x} \frac{ds}{F(s) - s}.
\end{align*}
Knowledge of a single Lie point symmetry reduced the problem to a quadrature.
[/example]
This example illustrates the general reduction method: for an $n$th-order ODE admitting a Lie point symmetry generated by $V = \xi(x,u)\,\partial/\partial x + \eta(x,u)\,\partial/\partial u$, introduce invariant coordinates $(s,t)$ with $V = \partial/\partial t$. In these coordinates the ODE has no explicit $t$, so the substitution $r = dt/ds$ reduces the order by one. Repeating with further symmetries can, in principle, fully integrate the equation.
This reduction strategy has two limitations worth emphasising. First, we need to know a symmetry in advance — finding it systematically requires the prolongation theory developed in the next section. Second, the method reduces order by one for each symmetry, but the reduced equation may itself be difficult. The method converts the problem, not necessarily to something easy, but to something more tractable.
## Jets and Prolongations
We know that Lie point symmetries are useful, but how do we find them systematically for a given equation? The fundamental difficulty is that a symmetry of an $n$th-order equation must preserve not just the values of $u$ and $x$, but also all the derivatives up to order $n$. We need a framework that tracks how derivatives transform when we change variables.
Without a systematic way to compute how $u_x$, $u_{xx}$, and higher derivatives transform under $(x, u) \mapsto (\tilde{x}, \tilde{u})$, we would have to recompute from scratch for every equation. The jet space framework provides the right setting, and the prolongation formula makes the computation algorithmic.
### The Jet Space Framework
The idea is to treat the values of $u$ and all its derivatives up to order $n$ as independent coordinates in a larger "jet" space.
[definition: Jet Space]
Given dependent variable $u$ and independent variable $x$, the **base space** has coordinates $(x, u)$. The **first jet space** has coordinates $(x, u, u_x)$, and the **$n$th jet space** has coordinates $(x, u, u_x, u_{xx}, \ldots, u^{(n)})$.
[/definition]
A function $u = u(x)$ traces out a curve in the $n$th jet space parametrised by $x$. An $n$th-order ODE $\Delta[x, u, u_x, \ldots, u^{(n)}] = 0$ is an algebraic equation cutting out a hypersurface in the $n$th jet space. A Lie point symmetry is then a transformation of the base space whose induced action on the $n$th jet space preserves this hypersurface.
The key question is: how does a transformation of the base space act on the jet space? Given $g^\varepsilon : (x, u) \mapsto (\tilde{x}, \tilde{u})$, the induced transformation $\mathrm{pr}^{(n)} g^\varepsilon$ on the $n$th jet space (the **$n$th prolongation**) is determined by the chain rule.
[example: Prolongation By Direct Computation]
Let $g^\varepsilon : (x, u) \mapsto (e^\varepsilon x, e^{-\varepsilon} u)$, with generator $V = x\,\partial/\partial x - u\,\partial/\partial u$. By the chain rule:
\begin{align*}
\frac{d\tilde{u}}{d\tilde{x}} = \frac{d\tilde{u}/dx}{d\tilde{x}/dx} = \frac{e^{-\varepsilon} u_x}{e^\varepsilon} = e^{-2\varepsilon} u_x.
\end{align*}
So the first prolongation is $\mathrm{pr}^{(1)} g^\varepsilon : (x, u, u_x) \mapsto (e^\varepsilon x, e^{-\varepsilon} u, e^{-2\varepsilon} u_x)$. Continuing, $d\tilde{u}_{xx}/d\tilde{x} = e^{-3\varepsilon} u_{xx}$, giving
\begin{align*}
\mathrm{pr}^{(2)} g^\varepsilon : (x, u, u_x, u_{xx}) \mapsto (e^\varepsilon x, e^{-\varepsilon} u, e^{-2\varepsilon} u_x, e^{-3\varepsilon} u_{xx}).
\end{align*}
The generator of $\mathrm{pr}^{(2)} g^\varepsilon$, called the **second prolongation** of $V$, is
\begin{align*}
\mathrm{pr}^{(2)} V = x\frac{\partial}{\partial x} - u\frac{\partial}{\partial u} - 2u_x\frac{\partial}{\partial u_x} - 3u_{xx}\frac{\partial}{\partial u_{xx}}.
\end{align*}
[/example]
Notice how we only needed to compute the new $u_{xx}$ coefficient — the earlier terms $x$, $u$, $u_x$ do not change when we prolong further. This observation is the key to the recursive prolongation formula.
### The Prolongation Formula
Rather than working with the Lie group action (which would be horrendous to prolong), we prolong the vector field — an object in the Lie algebra. This is where the linearisation pays off.
The fundamental tool is the **total derivative** operator:
\begin{align*}
D_x = \frac{\partial}{\partial x} + u_x \frac{\partial}{\partial u} + u_{xx} \frac{\partial}{\partial u_x} + u_{xxx} \frac{\partial}{\partial u_{xx}} + \cdots,
\end{align*}
which differentiates along curves in jet space arising from genuine functions $u = u(x)$.
[quotetheorem:1350]
[citeproof:1350]
The prolongation formula is the engine of the symmetry calculation. Its significance is that it reduces a geometric question (how does the group action extend to jet space?) to a routine algebraic computation using the recursion. Crucially, we only need to know $\xi$ and $\eta$ — the infinitesimal data — and everything higher follows by differentiation.
[example: Checking The Prolongation Formula]
For $g^\varepsilon : (x, u) \mapsto (e^\varepsilon x, e^{-\varepsilon} u)$, we have $\xi = x$ and $\eta = -u$. Then
\begin{align*}
\eta_1 &= D_x(-u) - u_x D_x(x) = -u_x - u_x \cdot 1 = -2u_x,
\end{align*}
so $\mathrm{pr}^{(1)} V = x\,\partial/\partial x - u\,\partial/\partial u - 2u_x\,\partial/\partial u_x$, agreeing with the direct computation.
[/example]
### Finding Lie Point Symmetries Systematically
With the prolongation formula in hand, we can derive the **symmetry condition** that characterises Lie point symmetries.
For an $n$th-order ODE written as $u^{(n)} = \omega(x, u, u', \ldots, u^{(n-1)})$, think of $\Delta = u^{(n)} - \omega$ as a function on the $n$th jet space. The Lie point symmetry condition (under a maximal rank hypothesis) is:
[quotetheorem:1360]
This is the **on-shell condition**: the prolonged vector field must annihilate the defining equation when evaluated on the solution surface in jet space. The condition translates into an overdetermined system of PDEs for $(\xi, \eta)$, called the **determining equations**. Because these equations are overdetermined, they generically have only finitely-dimensional solution spaces — the symmetry algebra of a generic ODE is finite-dimensional.
[example: Symmetries Of The Free Particle Equation]
Consider $u'' = 0$. The symmetry condition requires $\mathrm{pr}^{(2)} V(u'') = \eta^{(2)} = 0$. Expanding $\eta^{(2)}$ using the prolongation formula gives
\begin{align*}
\eta^{(2)} = \eta_{xx} + (2\eta_{xu} - \xi_{xx}) u' + (\eta_{uu} - 2\xi_{xu}) (u')^2 - \xi_{uu} (u')^3.
\end{align*}
For this to vanish for all values of $u'$ (treating $u'$ as a free parameter on the jet space), each coefficient must vanish separately:
\begin{align*}
\eta_{xx} = 0, \quad 2\eta_{xu} - \xi_{xx} = 0, \quad \eta_{uu} - 2\xi_{xu} = 0, \quad \xi_{uu} = 0.
\end{align*}
Solving this system gives the general solution
\begin{align*}
\xi(x, u) &= c_1 + c_3 x + c_5 u + c_7 x^2 + c_8 xu, \\
\eta(x, u) &= c_2 + c_4 u + c_6 x + c_7 xu + c_8 u^2.
\end{align*}
The symmetry algebra is $8$-dimensional, spanned by
\begin{align*}
V_1 = \partial_x, \quad V_2 = \partial_u, \quad V_3 = x\partial_x, \quad V_4 = u\partial_u, \quad V_5 = u\partial_x, \\
V_6 = x\partial_u, \quad V_7 = x^2\partial_x + xu\partial_u, \quad V_8 = xu\partial_x + u^2\partial_u.
\end{align*}
[/example]
The fact that $u'' = 0$ has an $8$-dimensional symmetry algebra reflects its very high degree of symmetry: it is equivalent to the projective transformations of the line. The size of the symmetry algebra is a measure of how "integrable" an ODE is; most $n$th-order ODEs have no symmetries at all.
[example: KdV Symmetries]
For the KdV equation $u_t + u_{xxx} - 6uu_x = 0$, the vector field $V_3 = \partial/\partial u + \alpha t\,\partial/\partial x$ generates a Lie point symmetry for a specific value of $\alpha$. The flow is $g^\varepsilon : (x, t, u) \mapsto (x + \alpha t\varepsilon, t, u + \varepsilon)$. Transforming the equation under this flow and requiring it to still be satisfied forces $\alpha = -6$. This reflects a Galilean-type boost symmetry of KdV.
[/example]
## Painleve Test and Integrability
We have seen that Lie point symmetries can be used to reduce PDEs to ODEs. The Ablowitz–Ramani–Segur conjecture, which we state below, asserts a deep connection between this reduction process and a much older notion of "goodness" for ODEs: the Painleve property. This section explains that property and shows how it provides a necessary test for integrability of PDEs.
How can one tell whether an equation is integrable just by looking at it? For linear equations, the answer involves the singularity structure of solutions. For nonlinear equations, the analogous question is far more subtle.
For the linear ODE $w'' + p(z) w' + q(z) w = 0$, any singularity of a solution $w(z)$ must arise from a singularity of $p(z)$ or $q(z)$. In particular, the **locations** of singularities are determined by the equation, not by the initial conditions. This is a strong rigidity property.
Nonlinear equations behave differently. Consider
\begin{align*}
\frac{dw}{dz} + w^2 = 0,
\end{align*}
which has the general solution $w(z) = 1/(z - z_0)$. The singularity at $z = z_0$ is a pole, and its location $z_0$ is determined by the initial condition — not by the equation itself. Such singularities are called **movable**.
[definition: Movable Singularity]
Let $w: U \subseteq \mathbb{C} \to \mathbb{C}$ be a complex-valued function defined on an open subset $U$. A **singularity** of $w$ is a point $z_0 \in \mathbb{C}$ at which $w$ loses analyticity (a pole, branch point, essential singularity, etc.). A singularity is **movable** if its location depends on the initial conditions rather than being fixed by the equation.
[/definition]
The nonlinear equation $dw/dz + w^3 = 0$ has the general solution $w(z) = 1/\sqrt{2(z - z_0)}$, which has a **branch point** singularity. Branch points represent a qualitatively worse behaviour than poles: they prevent the solution from being continued as a single-valued function. This distinction motivates the Painleve property.
[definition: Painleve Property]
An ODE of the form $d^n w/dz^n = F(d^{n-1}w/dz^{n-1}, \ldots, w, z)$, where $F$ is rational in $w$ and its derivatives and analytic in $z$, has the **Painleve property** if all movable singularities of its solutions are at worst poles.
[/definition]
The condition "at worst poles" is the key: poles are manageable (the function is still meromorphic), while branch points and essential singularities introduce multi-valuedness that prevents analytic continuation.
[example: ODE With The Painleve Property]
The equation $dw/dz + w^2 = 0$ has the Painleve property: its only movable singularities are the poles $z = z_0$.
[/example]
[example: ODE Without The Painleve Property]
The equation $dw/dz + w^3 = 0$ does not have the Painleve property: the movable singularity $w = (2(z-z_0))^{-1/2}$ is a branch point, not a pole.
[/example]
Painleve's original motivation was classification: he sought to catalogue all second-order ODEs of the form $d^2w/dz^2 = F(dw/dz, w, z)$, with $F$ rational, that possess the Painleve property. He showed there are exactly 50 such equations (up to simple coordinate changes). Of these, 44 can be solved in terms of classical functions (Jacobi elliptic functions, Weierstrass $\wp$-functions, Bessel functions, etc.). The remaining six are irreducible — their general solutions define genuinely new transcendental functions, the six **Painleve transcendents** $P_\mathrm{I}$ through $P_\mathrm{VI}$:
\begin{align*}
(P_\mathrm{I}) \quad \frac{d^2w}{dz^2} &= 6w^2 + z \\
(P_\mathrm{II}) \quad \frac{d^2w}{dz^2} &= 2w^3 + zw + \alpha \\
(P_\mathrm{III}) \quad \frac{d^2w}{dz^2} &= \frac{1}{w}\left(\frac{dw}{dz}\right)^2 + \frac{1}{z}\left(-\frac{dw}{dz} + \alpha w^2 + \beta\right) + \gamma w^3 + \frac{\delta}{w} \\
(P_\mathrm{IV}) \quad \frac{d^2w}{dz^2} &= \frac{1}{2w}\left(\frac{dw}{dz}\right)^2 + \frac{3w^3}{2} + 4zw^2 + 2(z^2 - \alpha)w + \frac{\beta}{w} \\
(P_\mathrm{V}) \quad \frac{d^2w}{dz^2} &= \left(\frac{1}{2w} + \frac{1}{w-1}\right)\left(\frac{dw}{dz}\right)^2 - \frac{1}{z}\frac{dw}{dz} + \frac{(w-1)^2}{z^2}\left(\alpha w + \frac{\beta}{w}\right) + \frac{\gamma w}{z} + \frac{\delta w(w+1)}{w-1} \\
(P_\mathrm{VI}) \quad \frac{d^2w}{dz^2} &= \frac{1}{2}\left(\frac{1}{w} + \frac{1}{w-1} + \frac{1}{w-z}\right)\left(\frac{dw}{dz}\right)^{\!2} - \left(\frac{1}{z} + \frac{1}{z-1} + \frac{1}{w-z}\right)\frac{dw}{dz} \\
&\quad + \frac{w(w-1)(w-z)}{z^2(z-1)^2}\left(\alpha + \frac{\beta z}{w^2} + \frac{\gamma(z-1)}{(w-1)^2} + \frac{\delta z(z-1)}{(w-z)^2}\right).
\end{align*}
The Painleve transcendents are now regarded as nonlinear special functions of fundamental importance in mathematical physics — they arise in random matrix theory, quantum gravity, and the description of special solutions to integrable systems.
### ODE Reductions and the ARS Conjecture
The connection to integrable PDEs comes through **ODE reductions**: if a PDE admits a Lie point symmetry, we can use invariant coordinates to reduce it to an ODE. Specifically, if $g^\varepsilon$ is a Lie point symmetry of a PDE with invariant variable $z$ (satisfying $V(z) = 0$), then we seek solutions of the form $u(x,t) = F(z)$. Substituting into the PDE yields an ODE for $F$.
[example: ODE Reduction Of The Sine-Gordon Equation]
Consider the sine-Gordon equation $u_{xt} = \sin u$ in light-cone coordinates. It admits the Lie point symmetry
\begin{align*}
g^\varepsilon : (x, t, u) \mapsto (e^\varepsilon x, e^{-\varepsilon} t, u),
\end{align*}
generated by $V = x\,\partial/\partial x - t\,\partial/\partial t$. Since $V(xt) = xt - tx = 0$, the variable $z = xt$ is invariant. Setting $u(x,t) = F(z)$ and substituting into the sine-Gordon equation, then setting $w = e^{iF}$, yields
\begin{align*}
\frac{d^2w}{dz^2} = \frac{1}{w}\left(\frac{dw}{dz}\right)^2 - \frac{1}{z}\frac{dw}{dz} - \frac{1}{2z}.
\end{align*}
This is equivalent to $P_\mathrm{III}$. So this ODE reduction of the sine-Gordon equation has the Painleve property.
[/example]
[example: ODE Reduction Of KdV]
The KdV equation $u_t + u_{xxx} - 6uu_x = 0$ admits the less-obvious Lie point symmetry
\begin{align*}
g^\varepsilon(x, t, u) = \left(x + \varepsilon t + \tfrac{1}{2}\varepsilon^2,\; t + \varepsilon,\; u - \tfrac{1}{6}\varepsilon\right),
\end{align*}
generated by $V = t\,\partial/\partial x + \partial/\partial t - \tfrac{1}{6}\,\partial/\partial u$. The invariant coordinates are $z = x - \tfrac{1}{2}t^2$ and $w = u + \tfrac{1}{6}t$. Writing $u(x,t) = -\tfrac{1}{6}t + w(z)$ and substituting, all $t$-dependence cancels (as guaranteed by the symmetry), leaving
\begin{align*}
w'''(z) - 6w(z)w'(z) - \tfrac{1}{6} = 0.
\end{align*}
Integrating once gives $w''(z) - 3w(z)^2 - \tfrac{1}{6}z + z_0 = 0$, which is $P_\mathrm{I}$. This ODE reduction of KdV also has the Painleve property.
[/example]
These examples are not accidents. The Ablowitz–Ramani–Segur conjecture captures a general pattern:
[quotetheorem:1361]
This is stated as a theorem because it is widely believed and verified in all known integrable cases, but it remains a conjecture in full generality — partly because we do not yet have a universally agreed definition of integrability for PDEs. Where specific definitions of integrability are available (e.g. existence of a Lax pair, or solvability by inverse scattering), the conjecture has been verified.
The practical upshot is the Painleve test:
[definition: Painleve Test Of Integrability]
Given a PDE, to test for integrability:
1. Find all Lie point symmetries of the PDE.
2. For each symmetry, compute the corresponding ODE reduction.
3. Test each reduced ODE for the Painleve property.
If any ODE reduction fails the Painleve test (has movable branch points or essential singularities), then the PDE is **not** integrable. If all reductions pass, the PDE is a candidate for integrability, though this is not conclusive.
[/definition]
The asymmetry is important: the Painleve test can definitively rule out integrability, but passing the test does not guarantee it. It is a necessary condition, not a sufficient one. This limitation reflects the deeper fact that integrability of PDEs is not yet a fully characterised property — unlike for Hamiltonian systems in finite dimensions, where the Arnold–Liouville theorem of Chapter 1 gives a clear criterion: exactly $n$ independent first integrals in involution for a $2n$-dimensional system.
[remark: Historical Note]
Painleve served as Prime Minister of France on two occasions: for nine weeks in 1917 and for seven months in 1925. His mathematical work on singularities of ODEs predates both terms and remains foundational to the modern theory of integrable systems.
[/remark]
The framework assembled in this chapter — Lie groups, one-parameter groups, prolongations, and the Painleve test — forms a coherent approach to integrability that complements the Hamiltonian and inverse scattering methods developed earlier in the course. The Lie symmetry approach is particularly powerful for discovering new solutions and for identifying candidates for integrability, while the Painleve test provides a quick screening tool that can rule out integrability before more labour-intensive methods are applied.
## References
A. Ashton, *Integrable Systems*, Cambridge Part II Lecture Notes, Michaelmas 2016.
Contents
- 1. Integrability of ODE's
- Vector Fields and Flow Maps
- Commutativity of Flows and the Commutator
- Hamiltonian Dynamics
- The Poisson Bracket and First Integrals
- Hamiltonian Vector Fields
- First Integrals and Involution
- Canonical Transformations
- Generating Functions
- The Arnold-Liouville Theorem
- The Harmonic Oscillator as a Model Case
- Statement of the Arnold-Liouville Theorem
- Applying Arnold-Liouville in Practice
- Review: The Structure of Integrable Systems
- Competing Effects in Nonlinear Wave Equations
- The KdV Equation and Solitons
- The Sine-Gordon Equation
- Bäcklund Transformations
- Generating Solitons from the Constant Solution
- 3. Inverse Scattering Transform
- Forward Scattering Problem
- Continuous Spectrum
- Discrete Spectrum and Bound States
- Summary and Scattering Data
- Inverse Scattering Problem
- Lax Pairs
- Evolution of Scattering Data
- Evolution of the Continuous Spectrum
- Evolution of the Discrete Spectrum
- Summary of the Inverse Scattering Transform
- Reflectionless Potentials
- Linearised Inverse Scattering as a Fourier Transform
- 4. Structure of Integrable PDEs
- Infinite-Dimensional Hamiltonian Systems
- The $L^2$ Inner Product and Functionals
- The Functional Derivative
- The Hamiltonian Operator and Equation of Motion
- Bi-Hamiltonian Systems
- Two Compatible Hamiltonian Structures
- KdV is Bi-Hamiltonian
- The Infinite Tower of Conserved Quantities
- Zero Curvature Representation
- The Compatibility Condition
- From Lax Pairs to Zero Curvature
- The Geometric Idea Behind Zero Curvature
- Scattering Data and Flat Connections
- 5. Symmetry Methods in PDEs
- Lie Groups and Lie Algebras
- Vector Fields and One-Parameter Groups of Transformations
- The Exponential Map and Taylor Expansion
- Symmetries of Differential Equations
- Jets and Prolongations
- The Jet Space Framework
- The Prolongation Formula
- Finding Lie Point Symmetries Systematically
- Painleve Test and Integrability
- ODE Reductions and the ARS Conjecture
- References
Cambridge II Integrable Systems
Content
Problems
History
Created by admin on 4/22/2026 | Last updated on 4/22/2026
Prerequisites
No prerequisites required for this page.
Rate this page
★
★
★
★
★
Poor
Excellent