This course develops the mathematical framework of classical mechanics, treating mechanics not just as a collection of equations of motion but as a geometric theory of configuration, motion, and symmetry. It begins with the basic language of configuration spaces and variational principles, then moves to Lagrangian mechanics, constraints, and the passage from velocities to momenta via the Legendre transform. From there, the course builds the Hamiltonian formalism and the symplectic viewpoint, where phase space becomes the central object and the structure of the equations of motion is expressed through geometry.
The main themes are variational methods, symplectic geometry, conservation laws, symmetry reduction, and qualitative dynamics. Noether’s theory and momentum maps explain how symmetries produce conserved quantities, while symplectic reduction shows how to simplify systems with symmetry in a systematic way. Later chapters develop Hamilton-Jacobi theory, generating functions, stability analysis, periodic motion, normal forms, and the theory of integrable systems with action-angle coordinates. The final part turns to scattering, adiabatic invariants, and perturbative ideas, showing how the same geometric framework applies both to idealized exactly solvable systems and to more realistic systems with slow variation or weak interaction.
The chapters are arranged to move from foundations to advanced applications in a logical progression. The early material establishes the variational and geometric language needed to formulate mechanics cleanly; the middle chapters translate that language into Hamiltonian dynamics and symmetry-based methods; and the later chapters use those tools to study structure, stability, and long-time behavior. By the end of the course, the student should be able to move fluently between Lagrangian and Hamiltonian descriptions, exploit symmetry, and recognize the geometric mechanisms underlying classical phenomena.
# Introduction
This opening chapter sets the course in motion by fixing the viewpoint from which classical mechanics will be studied. Instead of treating mechanics as a catalogue of force laws, we regard it as a chain of geometric structures: configuration spaces describe possible positions, variational principles select physical paths, phase spaces organise evolution, and symmetries produce conserved quantities. The aim is to make familiar examples such as the pendulum, central force motion, and rigid body dynamics serve as prototypes for broader ideas in symplectic geometry and dynamical systems.
The course assumes multivariable calculus, linear algebra, ordinary differential equations, basic real analysis, smooth manifolds, differential forms, and elementary Lie theory. These prerequisites enter because the natural home of mechanics is rarely a single Euclidean coordinate chart: a particle on a sphere, a rigid body, or a system with symmetry asks for manifolds, tangent bundles, cotangent bundles, and group actions. The chapter therefore records the organising questions and the recurring objects that will appear throughout the lectures.
The failures to keep in mind are concrete. Coordinates may introduce artificial singularities, constraint forces may obscure the true degrees of freedom, and a conserved numerical quantity may be invisible if the symmetry producing it is not identified. The structures introduced here are therefore not decorative abstractions: each is designed to remove a specific ambiguity or obstruction that appears in direct Newtonian coordinates.
## What Is Classical Mechanics Trying To Describe?
The first question is not how to solve a differential equation, but what data a mechanical model should contain. A system has a space of possible positions, a rule assigning an energy or action to paths, and an evolution law that turns initial data into motion. The course studies how these pieces fit together and how much of the motion is forced by geometry rather than by coordinates.
[definition: Configuration Space]
A configuration space is a smooth manifold $Q$ whose points represent possible positions of a mechanical system.
[/definition]
The configuration space is the stage for positions, but motion needs velocities. For a path $q: I \to Q$, its velocity at time $t \in I$ lies in $T_{q(t)}Q$, and the collection of all such position-velocity pairs is the tangent bundle $TQ$. This motivates the Lagrangian notion of state, where position and velocity are recorded together.
[definition: State In Lagrangian Mechanics]
A state in Lagrangian mechanics is a point $(q, v) \in TQ$, where $q \in Q$ and $v \in T_qQ$.
[/definition]
This distinction matters because the same position can support many possible velocities. Newton's second-order equations become first-order equations on $TQ$, and the Lagrangian formalism expresses those equations through a scalar function on $TQ$. The basic Euclidean particle is the reference model that shows how this abstract language matches the classical force law.
[example: Particle In Euclidean Space]
For a particle moving in $\mathbb R^n$, the configuration space is $Q=\mathbb R^n$, and each tangent space is canonically $T_q\mathbb R^n\cong \mathbb R^n$. Hence
\begin{align*}
TQ\cong \mathbb R^n\times \mathbb R^n,
\end{align*}
where a point of $TQ$ is written $(q,v)$ with $q=(q_1,\dots,q_n)$ and $v=(v_1,\dots,v_n)$. For a path $q:I\to\mathbb R^n$, the velocity is
\begin{align*}
\dot q(t)=(\dot q_1(t),\dots,\dot q_n(t)).
\end{align*}
The Lagrangian
\begin{align*}
L(q,v)=\frac{1}{2}m|v|^2-V(q)
\end{align*}
means explicitly
\begin{align*}
L(q,v)=\frac{1}{2}m\sum_{j=1}^n v_j^2-V(q_1,\dots,q_n).
\end{align*}
Applying the *Euler-Lagrange equations from stationary action*, stated in the next section, we compute the two derivatives in the $i$th coordinate. Since $q_i$ occurs only in the potential term,
\begin{align*}
\frac{\partial L}{\partial q_i}(q,v)=-\frac{\partial V}{\partial q_i}(q).
\end{align*}
Since $v_i$ occurs in the kinetic term as $\frac{1}{2}m v_i^2$,
\begin{align*}
\frac{\partial L}{\partial v_i}(q,v)=m v_i.
\end{align*}
Along the path, $v_i=\dot q_i(t)$, so
\begin{align*}
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}(q(t),\dot q(t))\right)=\frac{d}{dt}(m\dot q_i(t))=m\ddot q_i(t).
\end{align*}
The Euler-Lagrange equation therefore becomes
\begin{align*}
m\ddot q_i(t)-\left(-\frac{\partial V}{\partial q_i}(q(t))\right)=0.
\end{align*}
Equivalently,
\begin{align*}
m\ddot q_i(t)=-\frac{\partial V}{\partial q_i}(q(t)).
\end{align*}
Putting the $n$ scalar equations together gives
\begin{align*}
m\ddot q(t)=-\nabla V(q(t)).
\end{align*}
Thus the variational equation for kinetic minus potential energy recovers Newton's force law with conservative force $F(q)=-\nabla V(q)$.
[/example]
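As a quick numerical sanity check of the last equation, the sketch below evaluates the residual $m\ddot q+\nabla V(q)$ along a candidate path by central differences. The potential (uniform gravity in the plane, $V(q)=mgq_2$), the two sample paths, and the helper name `el_residual` are illustrative assumptions and not part of the course text.

```python
def el_residual(path, grad_V, m, t, h=1e-3):
    """Approximate m*q''(t) + grad V(q(t)) with a central second difference."""
    q_minus, q_mid, q_plus = path(t - h), path(t), path(t + h)
    accel = [(a - 2.0 * b + c) / h**2 for a, b, c in zip(q_minus, q_mid, q_plus)]
    return [m * a + g for a, g in zip(accel, grad_V(q_mid))]

# Assumed example: uniform gravity in the plane, V(q) = m*g*q_2.
m, g = 2.0, 9.81

def grad_V(q):
    return [0.0, m * g]

def projectile(t):
    # Parabolic path; it should satisfy m*q'' = -grad V.
    return [3.0 * t, 10.0 - 0.5 * g * t**2]

def straight(t):
    # Constant-velocity path; it should fail the equation under gravity.
    return [3.0 * t, 10.0 - 1.0 * t]

res_projectile = el_residual(projectile, grad_V, m, t=0.7)
res_straight = el_residual(straight, grad_V, m, t=0.7)
print(res_projectile)  # both components close to zero
print(res_straight)    # second component close to m*g
```

The residual vanishes (up to rounding) only on the path that solves Newton's law, which is the coordinate content of the Euler-Lagrange equation for this Lagrangian.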
This example is deliberately familiar: it shows that the geometric language is not replacing Newtonian mechanics, but reframing it in a way that survives beyond Cartesian coordinates. The next object records the principle that physical paths are selected by varying a number attached to the whole path.
## Why Variational Principles Come First
The next problem is to replace local force balance by a global rule for choosing paths. In many systems, the equation of motion is not introduced directly; it is derived from stationarity of an action functional. This is the bridge from [calculus of variations](/page/Calculus%20of%20Variations) to mechanics.
[definition: Lagrangian]
A Lagrangian on a configuration space $Q$ is a smooth function $L: TQ \to \mathbb R$.
[/definition]
A Lagrangian assigns an instantaneous value to each position-velocity pair. To compare entire candidate motions, the course needs a path-level quantity obtained by accumulating this value over time. This motivates the action functional.
[definition: Action Functional]
Let $Q$ be a configuration space, let $I=[t_0,t_1]$, and let $L:TQ\to\mathbb R$ be a Lagrangian. For fixed endpoints $q_0,q_1\in Q$, let
\begin{align*}
\mathcal A(q_0,q_1)=\{q\in C^\infty(I,Q):q(t_0)=q_0,\ q(t_1)=q_1\}.
\end{align*}
The action functional is the map $S:\mathcal A(q_0,q_1)\to\mathbb R$ defined by
\begin{align*}
S[q] = \int_{t_0}^{t_1} L(q(t), \dot q(t))\,dt.
\end{align*}
[/definition]
The action turns mechanics into a variational problem, but stationarity is still an infinite-dimensional condition on an entire path. To obtain equations that can be solved locally in time, one has to understand how every compactly supported variation changes the integral and how fixed endpoints remove the boundary terms. This is the obstruction that the Euler-Lagrange calculation resolves.
[quotetheorem:3504]
[citeproof:3504]
This cited calculation explains why endpoint conditions, smoothness, and local coordinates are built into the formalism. The fixed-endpoint hypothesis removes boundary terms; if the endpoint $q(t_1)$ is allowed to vary freely, the same [integration by parts](/theorems/210) leaves the boundary contribution $\frac{\partial L}{\partial \dot q_i}(q(t_1),\dot q(t_1))\eta_i(t_1)$, so stationarity imposes a natural boundary condition rather than only the displayed differential equation. Smoothness is also doing real work: for a path with a corner, such as two straight line segments meeting with different velocities, [integration by parts](/theorems/2098) must be applied separately on the two subintervals and produces an extra corner condition matching the momenta across the join. The coordinate hypothesis is local for the same reason; a path on a circle crossing the angular coordinate cut cannot be treated by a single angle chart without splitting the interval or changing charts. The theorem is only a necessary condition for stationarity, not a guarantee that the path minimises the action, so later lectures separate the variational calculation from questions of existence, uniqueness, and stability of solutions. The free particle then isolates the kinetic-energy part of this calculation.
[example: Free Particle]
For $Q=\mathbb R^n$, write $q=(q_1,\dots,q_n)$ and $v=(v_1,\dots,v_n)$. The kinetic-energy Lagrangian is
\begin{align*}
L(q,v)=\frac{1}{2}m|v|^2=\frac{1}{2}m\sum_{j=1}^n v_j^2.
\end{align*}
For each coordinate $q_i$, the function $L$ has no dependence on $q_i$, so
\begin{align*}
\frac{\partial L}{\partial q_i}(q,v)=0.
\end{align*}
For each velocity coordinate $v_i$,
\begin{align*}
\frac{\partial L}{\partial v_i}(q,v)=\frac{\partial}{\partial v_i}\left(\frac{1}{2}m\sum_{j=1}^n v_j^2\right)=mv_i.
\end{align*}
Along a path $q(t)$, we have $v_i=\dot q_i(t)$, hence
\begin{align*}
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_i}(q(t),\dot q(t))\right)=\frac{d}{dt}\left(m\dot q_i(t)\right)=m\ddot q_i(t).
\end{align*}
By the *Euler-Lagrange equations from stationary action*,
\begin{align*}
m\ddot q_i(t)-0=0.
\end{align*}
Thus $m\ddot q_i(t)=0$, and since $m>0$ this is equivalent to
\begin{align*}
\ddot q_i(t)=0.
\end{align*}
Integrating once gives $\dot q_i(t)=b_i$ for a constant $b_i\in\mathbb R$, and integrating again gives
\begin{align*}
q_i(t)=a_i+tb_i
\end{align*}
for a constant $a_i\in\mathbb R$. Combining the coordinates, every solution has the form
\begin{align*}
q(t)=a+tb,\qquad a,b\in\mathbb R^n.
\end{align*}
Thus straight-line inertial motion is the variational consequence of kinetic energy alone.
[/example]
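Stationarity of the straight line can also be seen directly on a discretised action. The following is a minimal sketch, assuming a forward-difference Riemann sum for $S$ and a sine-shaped variation $\eta$ vanishing at both endpoints; the names `discrete_action` and `action_gap` are hypothetical. The first-order change in the action cancels, so the gap $S[q+\varepsilon\eta]-S[q]$ scales like $\varepsilon^2$.

```python
import math

def discrete_action(q, h, m=1.0):
    """Riemann-sum action for L = (1/2) m v^2 using forward-difference velocities."""
    return sum(0.5 * m * ((q[k + 1] - q[k]) / h) ** 2 * h for k in range(len(q) - 1))

N = 200
t0, t1 = 0.0, 2.0
h = (t1 - t0) / N
times = [t0 + k * h for k in range(N + 1)]

# Straight-line path q(t) = a + t*b from the example.
a, b = 1.0, 0.5
line = [a + b * t for t in times]

# Variation with fixed endpoints: eta(t0) = eta(t1) = 0.
eta = [math.sin(math.pi * (t - t0) / (t1 - t0)) for t in times]
eta[0] = eta[-1] = 0.0  # enforce the endpoint condition exactly

def action_gap(eps):
    """S[line + eps*eta] - S[line] for the discretised action."""
    varied = [q + eps * e for q, e in zip(line, eta)]
    return discrete_action(varied, h) - discrete_action(line, h)

gap_small = action_gap(1e-2)
gap_double = action_gap(2e-2)
print(gap_small, gap_double / gap_small)  # ratio close to 4: no first-order term
```

For this particular Lagrangian the gap is positive, so the straight line is in fact a minimiser among paths with the same endpoints; for general Lagrangians only stationarity is guaranteed.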
The variational formulation is especially useful when the configuration space is curved or constrained. Rather than writing external constraint forces first, the course often builds the geometry into $Q$ and lets the variational principle operate there.
## From Lagrangian To Hamiltonian Pictures
The next question is why the course later abandons velocities in favour of momenta. The reason is that phase space carries a canonical geometric structure, while $TQ$ depends on the chosen Lagrangian to produce the equations of motion. Passing from $TQ$ to $T^*Q$ makes symplectic geometry available.
[definition: Phase Space]
For a configuration space $Q$, the phase space of Hamiltonian mechanics is the cotangent bundle $T^*Q$.
[/definition]
A point of $T^*Q$ consists of a position and a covector at that position. In coordinates it is written $(q,p)$, where $p$ is momentum conjugate to $q$; this notation hides the coordinate-free fact that $p\in T_q^*Q$. To determine motion on this space, the next definition introduces the Hamiltonian as the scalar generator of phase-space evolution.
[definition: Hamiltonian]
A Hamiltonian on a phase space $T^*Q$ is a smooth function $H:T^*Q\to\mathbb R$.
[/definition]
The Hamiltonian often represents total energy, but a scalar function does not by itself specify a trajectory through phase space. The missing step is to turn the derivatives of $H$ into a velocity vector $(\dot q,\dot p)$ in a way compatible with the canonical pairing between positions and momenta. In canonical coordinates, this requirement forces a concrete system of first-order equations.
Concretely, one must decide which partial derivatives of $H$ give the position velocities and which give the momentum velocities, including the sign forced by the canonical pairing. Without such a recipe the Hamiltonian would remain only an energy function, and the phase-space curve through an initial condition would not be computable. Hamilton's equations resolve exactly that local computational problem.
[quotetheorem:6833]
[citeproof:6833]
This theorem is the local coordinate expression of a coordinate-free construction. The signs and variables depend on using canonical coordinates for the canonical symplectic form; in polar-type coordinates on the punctured plane, for instance, the area form carries coordinate factors, so writing an arbitrary coordinate pair as $(q,p)$ would give the wrong equations unless the pair is canonical for the symplectic form. Smoothness is also doing work, since the derivatives of $H$ must exist for the Hamiltonian vector field to be defined in this elementary form. The theorem is local on $T^*Q$, so global questions such as whether the flow exists for all time require separate estimates. Much of the course will move between the local equations, which are effective for calculation, and the symplectic formulation, which explains invariance under canonical transformations. The harmonic oscillator is the standard example where the phase-space geometry is visible at once.
[example: Harmonic Oscillator Phase Portrait]
For $Q=\mathbb R$, let
\begin{align*}
H(q,p)=\frac{p^2}{2m}+\frac{kq^2}{2}
\end{align*}
with $m,k>0$. We compute the Hamiltonian vector field using *[Hamilton Equations In Canonical Coordinates](/theorems/6841)*. The partial derivatives are
\begin{align*}
\frac{\partial H}{\partial p}(q,p)=\frac{\partial}{\partial p}\left(\frac{p^2}{2m}+\frac{kq^2}{2}\right)=\frac{p}{m}
\end{align*}
and
\begin{align*}
\frac{\partial H}{\partial q}(q,p)=\frac{\partial}{\partial q}\left(\frac{p^2}{2m}+\frac{kq^2}{2}\right)=kq.
\end{align*}
Therefore Hamilton's equations give
\begin{align*}
\dot q=\frac{p}{m}
\end{align*}
and
\begin{align*}
\dot p=-kq.
\end{align*}
Along a solution $(q(t),p(t))$, the time derivative of $H$ is
\begin{align*}
\frac{d}{dt}H(q(t),p(t))=\frac{\partial H}{\partial q}(q(t),p(t))\dot q(t)+\frac{\partial H}{\partial p}(q(t),p(t))\dot p(t).
\end{align*}
Substituting the derivatives and Hamilton's equations gives
\begin{align*}
\frac{d}{dt}H(q(t),p(t))=kq(t)\frac{p(t)}{m}+\frac{p(t)}{m}(-kq(t)).
\end{align*}
The two terms cancel:
\begin{align*}
kq(t)\frac{p(t)}{m}+\frac{p(t)}{m}(-kq(t))=\frac{kq(t)p(t)}{m}-\frac{kq(t)p(t)}{m}=0.
\end{align*}
Hence $H(q(t),p(t))$ is constant in time.
If the constant energy is $E>0$, the trajectory is contained in the level set
\begin{align*}
\frac{p^2}{2m}+\frac{kq^2}{2}=E.
\end{align*}
Dividing by $E$ gives
\begin{align*}
\frac{p^2}{2mE}+\frac{kq^2}{2E}=1.
\end{align*}
Equivalently,
\begin{align*}
\frac{p^2}{2mE}+\frac{q^2}{2E/k}=1,
\end{align*}
which is an ellipse in the $(q,p)$-plane with semi-axes $\sqrt{2E/k}$ in the $q$-direction and $\sqrt{2mE}$ in the $p$-direction. For $E=0$, positivity of both terms forces $p=0$ and $q=0$, so the zero-energy level is the equilibrium point $(0,0)$. Thus the phase portrait consists of a stable centre surrounded by nested elliptical energy curves.
[/example]
[illustration:harmonic-oscillator-phase-portrait]
The oscillator also previews a recurring theme: when conserved quantities restrict motion to smaller sets, the qualitative dynamics can often be read from geometry before solving the equations explicitly.
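The conservation of $H$ in the oscillator example can also be observed numerically. The sketch below integrates $\dot q=p/m$, $\dot p=-kq$ with a kick-drift-kick (leapfrog) step, which is one common choice of integrator and not a method prescribed by the course; the parameter values and function names are assumptions for illustration.

```python
import math

def hamiltonian(q, p, m, k):
    """H(q, p) = p^2/(2m) + k q^2/2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

def leapfrog_step(q, p, dt, m, k):
    """One kick-drift-kick step for dq/dt = p/m, dp/dt = -k q."""
    p_half = p - 0.5 * dt * k * q          # half kick using dp/dt = -dH/dq
    q_new = q + dt * p_half / m            # full drift using dq/dt = dH/dp
    p_new = p_half - 0.5 * dt * k * q_new  # second half kick
    return q_new, p_new

m, k = 1.5, 6.0
omega = math.sqrt(k / m)
q, p = 1.0, 0.0
E0 = hamiltonian(q, p, m, k)

dt = 1e-3
steps = round(2.0 * math.pi / omega / dt)  # roughly one period
max_drift = 0.0
for _ in range(steps):
    q, p = leapfrog_step(q, p, dt, m, k)
    max_drift = max(max_drift, abs(hamiltonian(q, p, m, k) - E0))

print(max_drift / E0)  # small relative energy error over the whole orbit
print(q, p)            # close to the starting point (1, 0) after one period
```

The computed points stay very near the ellipse $H=E_0$ and return close to the initial condition after one period $2\pi/\omega$, matching the closed energy curves of the phase portrait.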
## Symmetry And Conservation Laws
The next problem is to understand why some quantities stay constant during motion. [Conservation of energy](/theorems/1335), angular momentum, and linear momentum are not separate accidents; they arise from symmetries of the action or Hamiltonian. This course treats that relation as one of the central structural principles of mechanics.
[definition: Conserved Quantity]
Let $M$ be a phase space with flow $\varphi_t:M\to M$. A conserved quantity is a function $F:M\to\mathbb R$ such that
\begin{align*}
F(\varphi_t(x))=F(x)
\end{align*}
for every $x\in M$ and every time $t$ for which the flow is defined.
[/definition]
A conserved quantity cuts down the region of phase space that an orbit can visit. Several independent conserved quantities may make the system solvable by quadratures, which is the entry point to integrability. The next principle explains where the most important conserved quantities come from.
[quotetheorem:6834]
[citeproof:6834]
This principle gives the conceptual reason for many conservation laws used in computations. Its hypotheses are restrictive: preserving the equations of motion alone is not the same as preserving the action, and a transformation that changes the Lagrangian by a total time derivative requires a modified conserved quantity. Galilean boosts for a free particle provide the standard model situation: the equations are invariant, while the Lagrangian changes by a total derivative, so the conserved boost quantity includes the corresponding correction term. The conclusion holds only along solutions of the Euler-Lagrange equations; away from solutions the displayed quantity need not be constant. Chapter 5 refines this statement using group actions and momentum maps, and Chapter 6 applies it to symplectic reduction. Rotational symmetry gives the most familiar example.
[example: Rotational Symmetry And Angular Momentum]
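Consider a particle of mass $m$ in $\mathbb R^3$ moving in a central potential, with Lagrangian $L(q,v)=\frac{1}{2}m|v|^2-V(|q|)$, and let $R\in SO(3)$ be a rotation. Since rotations preserve the Euclidean [inner product](/page/Inner%20Product),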
\begin{align*}
|Rv|^2=(Rv)\cdot(Rv)=v\cdot v=|v|^2.
\end{align*}
They also preserve distance from the origin:
\begin{align*}
|Rq|^2=(Rq)\cdot(Rq)=q\cdot q=|q|^2.
\end{align*}
Therefore
\begin{align*}
L(Rq,Rv)=\frac{1}{2}m|Rv|^2-V(|Rq|)=\frac{1}{2}m|v|^2-V(|q|)=L(q,v).
\end{align*}
Thus every one-parameter rotation symmetry preserves the Lagrangian.
Fix an angular velocity vector $\omega\in\mathbb R^3$, and let $\Phi_s(q)$ be rotation about the axis $\omega$ with infinitesimal generator
\begin{align*}
\xi(q)=\omega\times q.
\end{align*}
The momentum conjugate to $q$ is
\begin{align*}
\frac{\partial L}{\partial v}=mv,
\end{align*}
because
\begin{align*}
\frac{\partial}{\partial v_i}\left(\frac{1}{2}m(v_1^2+v_2^2+v_3^2)-V(|q|)\right)=mv_i.
\end{align*}
By *[Noether's Theorem for Point Symmetries](/theorems/6834)*, the quantity
\begin{align*}
J_\omega(q,\dot q)=m\dot q\cdot(\omega\times q)
\end{align*}
is constant along every solution. Using the scalar triple product identity,
\begin{align*}
m\dot q\cdot(\omega\times q)=\omega\cdot(q\times m\dot q).
\end{align*}
Since this holds for every fixed $\omega$, the vector
\begin{align*}
\ell=q\times m\dot q
\end{align*}
is conserved. This is angular momentum.
If $\ell\neq 0$, then
\begin{align*}
q(t)\cdot \ell=q(t)\cdot(q(t)\times m\dot q(t))=0,
\end{align*}
so $q(t)$ lies in the fixed plane $\ell^\perp$. Also
\begin{align*}
\dot q(t)\cdot \ell=\dot q(t)\cdot(q(t)\times m\dot q(t))=m\,\dot q(t)\cdot(q(t)\times\dot q(t))=0,
\end{align*}
so the velocity remains tangent to the same plane. In polar coordinates in that plane,
\begin{align*}
q=r e_r
\end{align*}
and
\begin{align*}
\dot q=\dot r e_r+r\dot\theta e_\theta.
\end{align*}
Hence
\begin{align*}
|\dot q|^2=\dot r^2+r^2\dot\theta^2.
\end{align*}
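Since $\ell\neq 0$ is constant, $\dot\theta$ never changes sign; orienting the plane so that $\dot\theta>0$, the angular momentum magnitude is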
\begin{align*}
|\ell|=m r^2\dot\theta.
\end{align*}
Writing $\ell_0=|\ell|$, we get
\begin{align*}
\dot\theta=\frac{\ell_0}{m r^2}.
\end{align*}
The energy is
\begin{align*}
E=\frac{1}{2}m(\dot r^2+r^2\dot\theta^2)+V(r).
\end{align*}
Substituting $\dot\theta=\ell_0/(m r^2)$ gives
\begin{align*}
E=\frac{1}{2}m\dot r^2+\frac{\ell_0^2}{2mr^2}+V(r).
\end{align*}
Thus the three-dimensional central force problem reduces to motion in the fixed plane $\ell^\perp$, with the radial coordinate moving as a one-dimensional system with effective potential
\begin{align*}
V_{\mathrm{eff}}(r)=V(r)+\frac{\ell_0^2}{2mr^2}.
\end{align*}
[/example]
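The conservation of $\ell$ and the confinement to the plane $\ell^\perp$ can be checked numerically. The sketch below assumes the attractive potential $V(r)=-k/r$ and a velocity Verlet integrator, both chosen only for illustration; all function names are hypothetical. For a central force this particular integrator preserves $q\times m\dot q$ up to rounding, so the check isolates the symmetry rather than the accuracy of the time stepping.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def central_force(q, k):
    """Force -grad V for the assumed potential V(r) = -k / r."""
    r = math.sqrt(sum(x * x for x in q))
    return [-k * x / r**3 for x in q]

def verlet_step(q, v, dt, m, k):
    """One velocity Verlet step for m*q'' = central_force(q)."""
    f = central_force(q, k)
    v_half = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
    q_new = [qi + dt * vh for qi, vh in zip(q, v_half)]
    f_new = central_force(q_new, k)
    v_new = [vh + 0.5 * dt * fi / m for vh, fi in zip(v_half, f_new)]
    return q_new, v_new

def angular_momentum(q, v, m):
    """ell = q x (m v)."""
    return cross(q, [m * vi for vi in v])

m, k, dt = 1.0, 1.0, 1e-3
q, v = [1.0, 0.0, 0.0], [0.0, 0.8, 0.3]  # bound orbit with nonzero ell
ell0 = angular_momentum(q, v, m)

max_ell_error = 0.0
max_out_of_plane = 0.0
for _ in range(5000):
    q, v = verlet_step(q, v, dt, m, k)
    ell = angular_momentum(q, v, m)
    max_ell_error = max(max_ell_error, max(abs(x - y) for x, y in zip(ell, ell0)))
    max_out_of_plane = max(max_out_of_plane, abs(sum(x * y for x, y in zip(q, ell0))))

print(max_ell_error, max_out_of_plane)  # both at rounding level
```

The quantity `max_out_of_plane` tracks $q(t)\cdot\ell$, which the example shows must vanish identically.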
Symmetry also changes how we count degrees of freedom. If a system has redundant variables coming from a [group action](/page/Group%20Action), reduction constructs a smaller phase space on which the essential dynamics lives.
## Integrability As A Final Organising Goal
The final question of the course is when Hamiltonian systems can be solved by geometry. A general nonlinear system may have complicated long-time behaviour, but systems with enough commuting conserved quantities admit special coordinates in which the motion becomes linear. This is the setting of action-angle variables.
The bracket notation in the next definition is the Poisson bracket developed in Chapter 4; at this stage, it records the compatibility condition that the conserved quantities generate mutually commuting Hamiltonian flows.
[definition: Completely Integrable Hamiltonian System]
Let $(M,\omega)$ be a symplectic manifold of dimension $2n$, and let $H:M\to\mathbb R$ be a Hamiltonian. The Hamiltonian system is completely integrable if there are smooth functions $F_1,\dots,F_n:M\to\mathbb R$ such that each $F_i$ is conserved along the Hamiltonian flow, the differentials $dF_1,\dots,dF_n$ are linearly independent on a dense open subset of $M$, and
\begin{align*}
\{F_i,F_j\}=0,\qquad i,j=1,\dots,n.
\end{align*}
[/definition]
The condition has two parts: enough conserved quantities to restrict the motion, and compatibility through the Poisson bracket. The compatibility condition allows the corresponding Hamiltonian flows to fit together rather than obstruct each other. This motivates the Liouville-Arnold theorem, which identifies the geometric shape and coordinates forced by complete integrability under regularity and compactness assumptions.
[quotetheorem:1353]
The assumptions are essential to the conclusion. Regularity and independence ensure that the common level set is a smooth $n$-dimensional submanifold; at a critical energy of the pendulum, for instance, the separatrix has a singular crossing at the unstable equilibrium and is not a circle on which an angle coordinate winds uniformly. Compactness rules out noncompact level sets such as the positive-energy levels of a free particle on a line, where each component of the level set is a line $p=\pm\sqrt{2mE}$ rather than a circle. Connectedness prevents the conclusion from accidentally merging several invariant tori into one object. These examples explain why the theorem is stated with regular, compact, connected components rather than arbitrary common level sets.
[example: One-Dimensional Bound Motion]
Fix a regular energy value $E$ and a connected bounded component $C_E$ of the level set
\begin{align*}
H^{-1}(E)=\{(q,p):p^2/(2m)+V(q)=E\}.
\end{align*}
Since $dH=(V'(q),p/m)$ is nonzero on $C_E$, the level set is locally a smooth one-dimensional curve. Boundedness makes $C_E$ a compact connected one-dimensional curve in the $(q,p)$-plane, so it is a closed loop.
On this loop the energy equation gives
\begin{align*}
\frac{p^2}{2m}=E-V(q).
\end{align*}
Multiplying by $2m$ gives
\begin{align*}
p^2=2m(E-V(q)).
\end{align*}
Hence away from turning points,
\begin{align*}
p=\sqrt{2m(E-V(q))}
\end{align*}
on the upper branch and
\begin{align*}
p=-\sqrt{2m(E-V(q))}
\end{align*}
on the lower branch. If the bounded motion runs between turning points $q_-$ and $q_+$ with $V(q_-)=V(q_+)=E$, then the enclosed phase-space area is the upper-branch integral minus the lower-branch integral:
\begin{align*}
A(E)=\int_{q_-}^{q_+}\sqrt{2m(E-V(q))}\,dq-\int_{q_-}^{q_+}\left(-\sqrt{2m(E-V(q))}\right)\,dq.
\end{align*}
Combining the two terms gives
\begin{align*}
A(E)=2\int_{q_-}^{q_+}\sqrt{2m(E-V(q))}\,dq.
\end{align*}
The corresponding action variable is therefore
\begin{align*}
I(E)=\frac{1}{2\pi}A(E)=\frac{1}{\pi}\int_{q_-}^{q_+}\sqrt{2m(E-V(q))}\,dq.
\end{align*}
Hamilton's equations are
\begin{align*}
\dot q=\frac{p}{m}
\end{align*}
and
\begin{align*}
\dot p=-V'(q).
\end{align*}
On the upper branch, $p=\sqrt{2m(E-V(q))}$, so
\begin{align*}
dt=\frac{m\,dq}{\sqrt{2m(E-V(q))}}.
\end{align*}
The return trip along the lower branch contributes the same time, hence the period is
\begin{align*}
T(E)=2\int_{q_-}^{q_+}\frac{m\,dq}{\sqrt{2m(E-V(q))}}.
\end{align*}
An angle coordinate may then be chosen so that
\begin{align*}
\theta(t)=\theta(0)+\frac{2\pi}{T(E)}t\pmod{2\pi}.
\end{align*}
Thus the action records the area enclosed by the regular energy loop, while the angle records uniform motion around that loop; this is the one-degree-of-freedom model for the *Liouville-Arnold theorem*.
[/example]
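The formulas for $I(E)$ and $T(E)$ can be evaluated by quadrature. The sketch below is one possible approach, assuming a midpoint rule after the substitution $q=c+a\sin\varphi$ (with $c$ and $a$ the midpoint and half-width of $[q_-,q_+]$), which tames the inverse square-root singularity of the period integrand at the turning points. The function name `action_and_period` is hypothetical, and the harmonic oscillator serves as the test case because its exact values $I=E/\omega$ and $T=2\pi/\omega$, with $\omega=\sqrt{k/m}$, are known.

```python
import math

def action_and_period(V, E, q_minus, q_plus, m, n=2000):
    """Midpoint-rule estimates of I(E) and T(E) between two turning points.

    Uses q = c + a*sin(phi), phi in (-pi/2, pi/2), so that the integrand of
    the period integral stays bounded near q_minus and q_plus.
    """
    c = 0.5 * (q_plus + q_minus)
    a = 0.5 * (q_plus - q_minus)
    dphi = math.pi / n
    upper_area = 0.0   # integral of p dq along the upper branch
    half_time = 0.0    # integral of m dq / p along the upper branch
    for j in range(n):
        phi = -0.5 * math.pi + (j + 0.5) * dphi
        q = c + a * math.sin(phi)
        dq = a * math.cos(phi) * dphi
        p = math.sqrt(2.0 * m * (E - V(q)))
        upper_area += p * dq
        half_time += m * dq / p
    return upper_area / math.pi, 2.0 * half_time

# Test case: harmonic oscillator with V(q) = k q^2 / 2.
m, k, E = 1.5, 6.0, 3.0
omega = math.sqrt(k / m)
amplitude = math.sqrt(2.0 * E / k)  # turning points at -amplitude and +amplitude

def V(q):
    return 0.5 * k * q * q

I_num, T_num = action_and_period(V, E, -amplitude, amplitude, m)
print(I_num, E / omega)
print(T_num, 2.0 * math.pi / omega)
```

For other potentials the turning points $q_\pm$ must be supplied by the caller, for instance from a root finder applied to $V(q)=E$.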
This final theme links the earlier parts of the course. Variational principles produce equations, the Legendre transform relates Lagrangian and Hamiltonian descriptions, symplectic geometry gives the invariant language, symmetry supplies conserved quantities, and integrability explains when those quantities organise the full motion.
## Course Map
The course begins with configuration spaces, paths, action functionals, and the Euler-Lagrange equations. It then studies constraints, cyclic coordinates, Routh reduction, and effective potentials, showing how variational methods adapt when the allowed motions are restricted.
The middle part develops Hamiltonian mechanics on cotangent bundles. The main tools are the canonical symplectic form, Hamiltonian vector fields, Poisson brackets, canonical transformations, generating functions, and Hamilton-Jacobi theory.
The final part studies symmetry and integrability. In computations the recurring pattern is to choose the configuration space $Q$, write either a Lagrangian $L:TQ\to\mathbb R$ or a Hamiltonian $H:T^*Q\to\mathbb R$, derive the equations of motion, identify symmetries and conserved quantities, and then reduce the system when those quantities make redundant variables removable. Momentum maps and reduction explain how Lie group actions simplify dynamics, while the Liouville-Arnold theorem and action-angle variables describe the most structured Hamiltonian systems. Throughout, examples such as geodesic motion, the pendulum, central forces, rigid bodies, and oscillators serve as test cases for the theory.
The introductory chapter argued that symmetry, geometry, and conservation laws are the right language for mechanics. Chapter 1 begins by making that language precise: it separates the set of possible configurations from the bundle of possible velocities, and then uses the variational principle to pick out the actual motions.
# 1. Configuration Spaces and Variational Principles
Classical mechanics begins by separating the space of possible positions from the space of possible motions. A configuration space records where a system can be, while its tangent bundle records both position and instantaneous velocity. The variational principle then selects physically admissible paths by making an action functional stationary.
The prerequisites for this chapter are smooth manifolds, tangent spaces, smooth curves, and the chain rule in local coordinates. We also use the basic language of functionals on spaces of paths and ordinary differential equations with prescribed initial data. The chapter builds the Lagrangian side of the theory: configuration manifolds, actions, endpoint variations, the Euler-Lagrange equations, and the regularity condition that turns those equations into a second-order dynamical system.
## Configuration Manifolds and Paths
What should count as the position of a mechanical system? For a particle moving on a line the answer is a real number, but for a pendulum the angle is periodic, and for a rigid body the position includes an orientation. The point of using manifolds is to treat all these examples with one geometric language while still computing in coordinates when needed.
[definition: Configuration Manifold]
A configuration manifold is a smooth manifold $Q$ whose points $q \in Q$ represent the possible positions of a mechanical system.
[/definition]
The dimension of $Q$ is the number of degrees of freedom. Local coordinates on $Q$ are called generalized coordinates because they need not be Cartesian coordinates in an ambient Euclidean space.
[example: Pendulum Configuration Space]
For a planar pendulum of fixed length $l>0$ with pivot at the origin, the bob position is a point $(x,y)\in \mathbb R^2$ satisfying the constraint
\begin{align*}
x^2+y^2=l^2.
\end{align*}
Writing the position by an angle $\theta$ gives
\begin{align*}
(x,y)=(l\cos\theta,l\sin\theta).
\end{align*}
Substituting this parametrization into the constraint gives
\begin{align*}
(l\cos\theta)^2+(l\sin\theta)^2=l^2\cos^2\theta+l^2\sin^2\theta=l^2(\cos^2\theta+\sin^2\theta)=l^2.
\end{align*}
Thus every angular value produces an allowed position on the circle of radius $l$.
The configuration manifold is therefore $Q=S^1$: the radius $l$ is fixed, so the only remaining degree of freedom is the angular position. The values $\theta$ and $\theta+2\pi$ represent the same configuration because
\begin{align*}
(l\cos(\theta+2\pi),l\sin(\theta+2\pi))=(l\cos\theta,l\sin\theta).
\end{align*}
A local angular coordinate $\theta$ describes the position away from a chosen coordinate cut, but the manifold description records that the two ends of an angular interval are the same physical point rather than distinct boundary configurations.
[/example]
The pendulum example shows that a configuration is not enough to describe motion: two pendula can be at the same angle while moving with different angular velocities. The next problem is to build a space whose points record both a configuration and an allowed instantaneous direction at that configuration. Since the allowed directions vary from point to point on $Q$, the required object is obtained by attaching each tangent space to its base point.
[definition: Tangent Bundle]
The tangent bundle of a smooth manifold $Q$ is the disjoint union
\begin{align*}
TQ = \bigsqcup_{q \in Q} T_qQ,
\end{align*}
with projection $\pi_Q:TQ \to Q$ given by $\pi_Q(v_q)=q$ for $v_q \in T_qQ$.
[/definition]
A point of $TQ$ is often written $(q,v)$, but this notation hides the fact that $v \in T_qQ$ depends on $q$. In a coordinate chart $(U,\varphi)$ with coordinates $(q_1,\dots,q_n)$, a tangent vector has coordinate expression
\begin{align*}
v_q = \sum_{i=1}^n v_i \frac{\partial}{\partial q_i}\Big|_q,
\end{align*}
and the induced coordinates on $TU$ are $(q_1,\dots,q_n,v_1,\dots,v_n)$.
[example: Central Force Coordinates]
A particle moving in the punctured plane has configuration space $Q=\mathbb R^2_0:=\mathbb R^2\setminus\{0\}$. On the polar coordinate chart,
\begin{align*}
(x,y)=(r\cos\theta,r\sin\theta),
\end{align*}
with $r>0$, so a point of $TQ$ (a position together with a velocity) is recorded by the four coordinate numbers $(r,\theta,\dot r,\dot\theta)$. Differentiating the coordinate functions with respect to time gives
\begin{align*}
\dot x=\dot r\cos\theta-r\sin\theta\,\dot\theta.
\end{align*}
\begin{align*}
\dot y=\dot r\sin\theta+r\cos\theta\,\dot\theta.
\end{align*}
The Euclidean squared speed is $\dot x^2+\dot y^2$, so substituting these expressions gives
\begin{align*}
|\dot q|^2=(\dot r\cos\theta-r\sin\theta\,\dot\theta)^2+(\dot r\sin\theta+r\cos\theta\,\dot\theta)^2.
\end{align*}
Expanding both squares,
\begin{align*}
|\dot q|^2=\dot r^2\cos^2\theta-2r\dot r\dot\theta\sin\theta\cos\theta+r^2\dot\theta^2\sin^2\theta+\dot r^2\sin^2\theta+2r\dot r\dot\theta\sin\theta\cos\theta+r^2\dot\theta^2\cos^2\theta.
\end{align*}
The mixed terms cancel, leaving
\begin{align*}
|\dot q|^2=\dot r^2(\cos^2\theta+\sin^2\theta)+r^2\dot\theta^2(\sin^2\theta+\cos^2\theta).
\end{align*}
Since $\sin^2\theta+\cos^2\theta=1$,
\begin{align*}
|\dot q|^2=\dot r^2+r^2\dot\theta^2.
\end{align*}
Thus the coordinate velocity components $\dot r$ and $\dot\theta$ do not contribute symmetrically to Euclidean speed: angular motion at radius $r$ has length contribution $r\dot\theta$, not just $\dot\theta$.
[/example]
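The identity $|\dot q|^2=\dot r^2+r^2\dot\theta^2$ can be spot-checked numerically. The following is a minimal sketch in Python, not part of the formal development; the function names and the sample point are illustrative assumptions.

```python
import math

def cartesian_velocity(r, theta, r_dot, theta_dot):
    """Cartesian velocity (x_dot, y_dot) from polar data, by the chain rule."""
    x_dot = r_dot * math.cos(theta) - r * math.sin(theta) * theta_dot
    y_dot = r_dot * math.sin(theta) + r * math.cos(theta) * theta_dot
    return x_dot, y_dot

def squared_speed_cartesian(r, theta, r_dot, theta_dot):
    """Euclidean squared speed computed from the Cartesian components."""
    x_dot, y_dot = cartesian_velocity(r, theta, r_dot, theta_dot)
    return x_dot ** 2 + y_dot ** 2

def squared_speed_polar(r, r_dot, theta_dot):
    """Closed form derived in the example: r_dot^2 + r^2 * theta_dot^2."""
    return r_dot ** 2 + r ** 2 * theta_dot ** 2

# Arbitrary sample point of TQ in the polar chart (illustrative values only).
sample = dict(r=2.0, theta=0.7, r_dot=-0.3, theta_dot=1.5)
lhs = squared_speed_cartesian(**sample)
rhs = squared_speed_polar(sample["r"], sample["r_dot"], sample["theta_dot"])
```

The two values agree up to floating-point rounding, and the result does not depend on `theta`, which reflects the rotational invariance of the Euclidean metric.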
The central-force example uses dotted symbols, so we need to say what these symbols mean independent of a chosen coordinate formula. A motion should be a curve in $Q$, and its velocity should be the tangent vector obtained by differentiating that curve at each time. This construction turns a path in configuration space into a path in the tangent bundle.
[definition: Smooth Path and Velocity Lift]
Let $I \subset \mathbb R$ be an interval and let $Q$ be a smooth manifold. A smooth path is a smooth map $q:I \to Q$. Its velocity lift is the path $\dot q:I \to TQ$ satisfying $\dot q(t) \in T_{q(t)}Q$ and, in local coordinates,
\begin{align*}
\dot q(t) = \sum_{i=1}^n \dot q_i(t)\frac{\partial}{\partial q_i}\Big|_{q(t)}.
\end{align*}
[/definition]
The lift $\dot q(t)$ is coordinate-independent even though the symbols $\dot q_i$ depend on a chart. This distinction is important because the equations of motion must make sense on $Q$, not only in one coordinate system.
## Action Functionals and First Variation
Which paths are physically realised? Newtonian mechanics answers by differential equations, while the variational formulation answers by a scalar functional on paths. The key idea is that the physical path is not usually a minimiser in a global sense; it is a stationary point under variations that respect the prescribed endpoint data.
[definition: Lagrangian]
A Lagrangian on a configuration manifold $Q$ is a smooth function $L:TQ \times \mathbb R \to \mathbb R$. In the autonomous case it is a smooth function $L:TQ \to \mathbb R$.
[/definition]
The time variable is included when external forcing or time-dependent potentials are allowed. Most structural results in this chapter are already visible in the autonomous case, but the [first variation formula](/theorems/2728) has the same form with $L(q,v,t)$.
[example: Kinetic Minus Potential Energy]
Let $Q$ be a Riemannian manifold with metric $g$, and let $V:Q\to\mathbb R$ be a smooth potential. The mechanical Lagrangian assigns to each position-velocity pair $(q,v)\in TQ$ the number
\begin{align*}
L(q,v)=\frac12 g_q(v,v)-V(q).
\end{align*}
The first term is kinetic energy because $g_q(v,v)$ is the squared length of the velocity vector $v\in T_qQ$.
For $Q=\mathbb R^n$ with the Euclidean metric, write $v=(v_1,\dots,v_n)$. Then
\begin{align*}
g_q(v,v)=v\cdot v=v_1^2+\cdots+v_n^2.
\end{align*}
Therefore the Lagrangian becomes
\begin{align*}
L(q,v)=\frac12(v_1^2+\cdots+v_n^2)-V(q).
\end{align*}
If the particle has mass $m$, the Euclidean kinetic energy is obtained by using the metric $m\,g_{\mathrm{Euc}}$, since
\begin{align*}
(m\,g_{\mathrm{Euc}})_q(v,v)=m(v_1^2+\cdots+v_n^2).
\end{align*}
Thus
\begin{align*}
L(q,v)=\frac12m(v_1^2+\cdots+v_n^2)-V(q),
\end{align*}
which is kinetic energy minus potential energy.
For a curved or constrained configuration space, the same formula still makes sense because $g_q$ measures the squared length of velocities tangent to $Q$ at the point $q$. The metric is therefore the geometric object that supplies the kinetic-energy term in coordinates adapted to the configuration manifold.
[/example]
The preceding example assigns an instantaneous cost to each position and velocity, but a mechanical trajectory occupies a whole time interval. To compare entire paths, the local quantity $L(q(t),\dot q(t),t)$ must be accumulated over time. The resulting number is the action, the central functional of the variational formulation.
[definition: Action Functional]
Let $L:TQ \times \mathbb R \to \mathbb R$ be a Lagrangian and let $a<b$. The action functional on the smooth path space $C^\infty([a,b],Q)$ is the map
\begin{align*}
S:C^\infty([a,b],Q)\to \mathbb R.
\end{align*}
It is defined by
\begin{align*}
S[q]=\int_a^b L(q(t),\dot q(t),t)\,dt.
\end{align*}
[/definition]
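To make the definition concrete, the action of an explicitly given path can be approximated by numerical quadrature. The sketch below uses the midpoint rule and the free-particle Lagrangian $L=\frac12 v^2$ with unit mass; the helper `action`, the two sample paths, and the step count are assumptions chosen for illustration.

```python
import math

def action(lagrangian, path, velocity, a, b, steps=2000):
    """Approximate S[q] = int_a^b L(q(t), q'(t), t) dt by the midpoint rule.

    `path` and `velocity` are functions of t; the caller supplies the exact
    derivative so that no numerical differentiation is needed.
    """
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        total += lagrangian(path(t), velocity(t), t)
    return total * h

def free_particle(q, v, t):
    """Free-particle Lagrangian with unit mass (illustrative choice)."""
    return 0.5 * v * v

# Two paths on [0, 1] with the same endpoints q(0) = 0 and q(1) = 1.
straight = action(free_particle, lambda t: t, lambda t: 1.0, 0.0, 1.0)
wiggly = action(
    free_particle,
    lambda t: t + 0.1 * math.sin(math.pi * t),
    lambda t: 1.0 + 0.1 * math.pi * math.cos(math.pi * t),
    0.0,
    1.0,
)
```

For these two paths the exact values are $\frac12$ and $\frac12+0.0025\,\pi^2$, so the straight path has the smaller action, as expected for a free particle.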
The action turns mechanics into a problem about stationarity of functions on an infinite-dimensional space of paths. To test stationarity, we perturb a given path through nearby paths and differentiate the action with respect to the perturbation parameter. Fixing endpoints models experiments where the initial and final configurations are prescribed.
[definition: Fixed-Endpoint Variation]
Let $q:[a,b]\to Q$ be a smooth path. A fixed-endpoint variation of $q$ is a family of paths $q_\varepsilon:[a,b]\to Q$, defined for $\varepsilon$ in a neighbourhood of $0$ and depending smoothly on $(\varepsilon,t)$ jointly, such that $q_0=q$ and
\begin{align*}
q_\varepsilon(a)=q(a), \qquad q_\varepsilon(b)=q(b)
\end{align*}
for all $\varepsilon$.
[/definition]
The infinitesimal variation is the vector field along $q$ given by
\begin{align*}
\eta(t)=\frac{\partial q_\varepsilon(t)}{\partial \varepsilon}\Big|_{\varepsilon=0} \in T_{q(t)}Q.
\end{align*}
For fixed-endpoint variations, $\eta(a)=\eta(b)=0$. The next calculation identifies how the action changes to first order in terms of $\eta$, and it is the formula from which the equations of motion will be read.
In a local coordinate chart, write the path and infinitesimal variation as $q(t)=(q_1(t),\dots,q_n(t))$ and $\eta(t)=(\eta_1(t),\dots,\eta_n(t))$. Differentiating the action under the integral sign gives
\begin{align*}
\delta S[q](\eta)=\int_a^b \sum_{i=1}^n
\left(
\frac{\partial L}{\partial q_i}(q,\dot q,t)\eta_i
+\frac{\partial L}{\partial \dot q_i}(q,\dot q,t)\dot\eta_i
\right)\,dt.
\end{align*}
Integrating the velocity-variation term by parts gives the local first-variation formula
\begin{align*}
\delta S[q](\eta)
=
\left[\sum_{i=1}^n\frac{\partial L}{\partial \dot q_i}(q,\dot q,t)\eta_i(t)\right]_{a}^{b}
+\int_a^b\sum_{i=1}^n
\left(
\frac{\partial L}{\partial q_i}
-\frac{d}{dt}\frac{\partial L}{\partial \dot q_i}
\right)(q,\dot q,t)\eta_i(t)\,dt.
\end{align*}
The formula separates the bulk term, which controls the equation along the interior of the path, from the endpoint term, which records how boundary data enter the variational problem. The smoothness hypothesis on $L$ and on the variation is what permits differentiating under the integral sign and integrating by parts; with weaker regularity the same identity may require a Sobolev or distributional interpretation. The coordinate-chart hypothesis is a local convenience, not a claim that mechanics depends on a preferred coordinate system. For fixed-endpoint variations the boundary term vanishes, while for free endpoints stationarity would force additional boundary conditions and so define a different variational problem. This motivates the precise stationarity condition used in Hamilton's principle.
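The first-variation integral can be evaluated numerically to see stationarity in a concrete case. The sketch below assumes the harmonic-oscillator Lagrangian $L=\frac12 v^2-\frac12 q^2$ on $[0,1]$ and the test variation $\eta(t)=t(1-t)$, which vanishes at both endpoints; all names and sample choices are illustrative.

```python
import math

def first_variation(dL_dq, dL_dv, q, q_dot, eta, eta_dot, a, b, steps=4000):
    """Midpoint-rule value of int_a^b (dL/dq * eta + dL/dv * eta') dt."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        total += (dL_dq(q(t), q_dot(t), t) * eta(t)
                  + dL_dv(q(t), q_dot(t), t) * eta_dot(t))
    return total * h

# Partial derivatives of the assumed Lagrangian L = v^2/2 - q^2/2.
dL_dq = lambda q, v, t: -q
dL_dv = lambda q, v, t: v

# Variation field vanishing at t = 0 and t = 1, with its exact derivative.
eta = lambda t: t * (1.0 - t)
eta_dot = lambda t: 1.0 - 2.0 * t

# q(t) = sin t solves q'' = -q, so its first variation should vanish.
on_shell = first_variation(dL_dq, dL_dv, math.sin, math.cos,
                           eta, eta_dot, 0.0, 1.0)
# q(t) = t does not solve q'' = -q; here the exact value is -1/12.
off_shell = first_variation(dL_dq, dL_dv, lambda t: t, lambda t: 1.0,
                            eta, eta_dot, 0.0, 1.0)
```

The value for $q(t)=\sin t$ is zero up to quadrature error, while the straight path $q(t)=t$ gives $-\tfrac1{12}$, so only the first path is stationary against this variation.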
[definition: Stationary Path]
A smooth path $q:[a,b]\to Q$ is stationary for the action $S$ with fixed endpoints if
\begin{align*}
\delta S[q](\eta)=0
\end{align*}
for every smooth infinitesimal variation $\eta$ along $q$ with $\eta(a)=\eta(b)=0$.
[/definition]
The first variation formula reduces the variational problem to the question of when an integral against all compactly supported test variations vanishes. This is an interior question: compactly supported variations can be chosen near any point strictly between $a$ and $b$, while the endpoint values are controlled separately by the endpoint term in the first variation formula. The required analytic input is therefore a lemma that converts vanishing against all interior test functions into a pointwise interior conclusion.
[quotetheorem:45]
[citeproof:45]
This lemma is the bridge from stationarity to differential equations. Continuity is essential for the stated pointwise conclusion: for merely integrable $f$, the same argument gives $f=0$ only almost everywhere. The compact-support condition keeps the test variations away from the endpoints, so the lemma detects the interior equation but says nothing about endpoint terms. Thus the variational argument splits naturally into an interior Euler-Lagrange equation and, when endpoints are allowed to move, separate boundary conditions.
## Euler-Lagrange Equations on the Tangent Bundle
What differential equation is encoded by stationarity of the action? In coordinates, the answer is obtained by applying the fundamental lemma to each generalized coordinate. Geometrically, the result is an equation on $TQ$ whose solutions are velocity lifts of paths in $Q$.
[quotetheorem:6835]
The equations involve $q_i$, $\dot q_i$, and usually $\ddot q_i$, which appears after differentiating $\partial L/\partial v_i$ in time. The hypotheses above are not only technical packaging. For $L(q,v)=|v|$ on $Q=\mathbb R$, the derivative with respect to $v$ is not defined at $v=0$, so the displayed pointwise Euler-Lagrange equation is not available for paths that stop. If endpoint variations are allowed for $L(q,v)=\frac12 v^2$, the integration-by-parts boundary term forces the extra conditions $\dot q(a)=\dot q(b)=0$, which is a different problem from the fixed-endpoint one. If a path on $S^1$ is forced into a single angular coordinate crossing its coordinate cut, the coordinate expression can fail even though the geometric path is smooth. The theorem is also local in time: for an equation such as $\ddot q=\dot q^2$ on $Q=\mathbb R$, initial data with $\dot q(0)>0$ lead to finite-time blow-up of the velocity, so the variational derivation of the equation does not itself prove global existence.
[example: Geodesics from Kinetic Energy]
Let $(Q,g)$ be a Riemannian manifold and take the kinetic energy Lagrangian $L(q,v)=\frac12 g_q(v,v)$. In local coordinates,
\begin{align*}
L(q,v)=\frac12\sum_{i,j=1}^n g_{ij}(q)v_i v_j.
\end{align*}
Along a path $q(t)$, set $v_i=\dot q_i$. Since $g_{ij}=g_{ji}$,
\begin{align*}
\frac{\partial L}{\partial v_k}=\frac12\sum_{j=1}^n g_{kj}(q)v_j+\frac12\sum_{i=1}^n g_{ik}(q)v_i=\sum_{j=1}^n g_{kj}(q)v_j.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial v_k}(q,\dot q)=\sum_{j=1}^n g_{kj}(q)\ddot q_j+\sum_{i,j=1}^n \frac{\partial g_{kj}}{\partial q_i}(q)\dot q_i\dot q_j.
\end{align*}
Also,
\begin{align*}
\frac{\partial L}{\partial q_k}(q,\dot q)=\frac12\sum_{i,j=1}^n \frac{\partial g_{ij}}{\partial q_k}(q)\dot q_i\dot q_j.
\end{align*}
Substituting these two expressions into the coordinate Euler-Lagrange equation gives
\begin{align*}
\sum_{j=1}^n g_{kj}\ddot q_j+\sum_{i,j=1}^n \frac{\partial g_{kj}}{\partial q_i}\dot q_i\dot q_j-\frac12\sum_{i,j=1}^n \frac{\partial g_{ij}}{\partial q_k}\dot q_i\dot q_j=0.
\end{align*}
Because $\dot q_i\dot q_j=\dot q_j\dot q_i$, the middle sum may be symmetrized:
\begin{align*}
\sum_{i,j=1}^n \frac{\partial g_{kj}}{\partial q_i}\dot q_i\dot q_j=\frac12\sum_{i,j=1}^n\left(\frac{\partial g_{kj}}{\partial q_i}+\frac{\partial g_{ki}}{\partial q_j}\right)\dot q_i\dot q_j.
\end{align*}
Hence
\begin{align*}
\sum_{j=1}^n g_{kj}\ddot q_j+\frac12\sum_{i,j=1}^n\left(\frac{\partial g_{kj}}{\partial q_i}+\frac{\partial g_{ki}}{\partial q_j}-\frac{\partial g_{ij}}{\partial q_k}\right)\dot q_i\dot q_j=0.
\end{align*}
Multiplying by the inverse matrix $(g^{\ell k})$ and summing over $k$ gives
\begin{align*}
\ddot q_\ell+\sum_{i,j=1}^n \Gamma_{ij}^{\ell}(q)\dot q_i\dot q_j=0,
\end{align*}
where
\begin{align*}
\Gamma_{ij}^{\ell}=\frac12\sum_{k=1}^n g^{\ell k}\left(\frac{\partial g_{kj}}{\partial q_i}+\frac{\partial g_{ki}}{\partial q_j}-\frac{\partial g_{ij}}{\partial q_k}\right).
\end{align*}
Thus the stationary paths of the kinetic energy action satisfy the coordinate geodesic equation, so they are precisely the affinely parametrised geodesics; for this Lagrangian the squared speed $g_q(\dot q,\dot q)$, and hence the speed, is constant along such solutions.
[/example]
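The Christoffel formula can be evaluated numerically for a given metric. The sketch below is restricted to two dimensions and approximates the partial derivatives of $g_{ij}$ by central differences; the helper names and the choice of the Euclidean metric in polar coordinates, $g=\operatorname{diag}(1,r^2)$, are assumptions made for illustration.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def christoffel_2d(metric, q, h=1e-5):
    """Return gamma[l][i][j] approximating Gamma^l_{ij}(q) for a 2D metric.

    `metric` maps a point q = [q_1, q_2] to the 2x2 matrix (g_ij(q)).
    Partial derivatives of g are taken by central differences with step h.
    """
    n = 2
    dg = []  # dg[k][i][j] approximates d g_ij / d q_k
    for k in range(n):
        plus, minus = list(q), list(q)
        plus[k] += h
        minus[k] -= h
        gp, gm = metric(plus), metric(minus)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    g_inv = invert_2x2(metric(q))
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for l in range(n):
        for i in range(n):
            for j in range(n):
                # (1/2) g^{lk} (d_i g_kj + d_j g_ki - d_k g_ij), summed over k
                gamma[l][i][j] = 0.5 * sum(
                    g_inv[l][k] * (dg[i][k][j] + dg[j][k][i] - dg[k][i][j])
                    for k in range(n))
    return gamma

def polar_metric(q):
    """Euclidean metric in polar coordinates q = [r, theta]."""
    r, _theta = q
    return [[1.0, 0.0], [0.0, r * r]]

# Index 0 is r and index 1 is theta; evaluate at r = 2.
gamma = christoffel_2d(polar_metric, [2.0, 0.3])
```

For this metric the nonzero symbols are $\Gamma^{r}_{\theta\theta}=-r$ and $\Gamma^{\theta}_{r\theta}=\Gamma^{\theta}_{\theta r}=1/r$, so the geodesic equations read $\ddot r-r\dot\theta^2=0$ and $\ddot\theta+\tfrac{2}{r}\dot r\dot\theta=0$, which describe straight lines in polar form.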
The pendulum gives the first standard example in which the configuration space is not a vector space, yet a single angular coordinate describes the motion locally, and globally up to periodicity.
[example: Pendulum on the Circle]
For a pendulum of mass $m>0$ and length $l>0$ in a uniform gravitational field, use the angular coordinate $\theta$ on $Q=S^1$. With potential $V(\theta)=mgl(1-\cos\theta)$, the Lagrangian is
\begin{align*}
L(\theta,\dot\theta)=\frac12 ml^2\dot\theta^2-mgl(1-\cos\theta).
\end{align*}
By the *[Coordinate Euler-Lagrange Equations](/theorems/6835)*, the one-coordinate equation is
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial \dot\theta}-\frac{\partial L}{\partial \theta}=0.
\end{align*}
First,
\begin{align*}
\frac{\partial L}{\partial \dot\theta}=\frac{\partial}{\partial \dot\theta}\left(\frac12 ml^2\dot\theta^2-mgl(1-\cos\theta)\right)=ml^2\dot\theta.
\end{align*}
Since $m$ and $l$ are constant along the motion,
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial \dot\theta}=\frac{d}{dt}(ml^2\dot\theta)=ml^2\ddot\theta.
\end{align*}
Also,
\begin{align*}
\frac{\partial L}{\partial \theta}=\frac{\partial}{\partial \theta}\left(\frac12 ml^2\dot\theta^2-mgl(1-\cos\theta)\right)=-mgl\sin\theta.
\end{align*}
Substituting these two derivatives into the Euler-Lagrange equation gives
\begin{align*}
ml^2\ddot\theta-(-mgl\sin\theta)=0.
\end{align*}
Therefore
\begin{align*}
ml^2\ddot\theta+mgl\sin\theta=0.
\end{align*}
Dividing by $ml^2$ gives
\begin{align*}
\ddot\theta+\frac{g}{l}\sin\theta=0.
\end{align*}
Thus the variational principle on the circle produces the nonlinear pendulum equation; the nonlinearity is the $\sin\theta$ term coming from the gravitational potential.
[/example]
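The pendulum equation has no elementary closed-form solution, but it is easy to integrate numerically. The following sketch uses a classical fourth-order Runge-Kutta step, which is a choice made here for illustration and not a method prescribed by the text; the parameter values and function names are likewise assumptions. It monitors the energy divided by $ml^2$, namely $\frac12\dot\theta^2+\frac{g}{l}(1-\cos\theta)$, which should stay constant along solutions.

```python
import math

def pendulum_rhs(state, g_over_l):
    """First-order form of theta'' + (g/l) sin(theta) = 0."""
    theta, omega = state
    return (omega, -g_over_l * math.sin(theta))

def rk4_step(rhs, state, dt, *args):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = rhs(state, *args)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *args)
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *args)
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), *args)
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def energy_per_ml2(state, g_over_l):
    """Total energy divided by m l^2."""
    theta, omega = state
    return 0.5 * omega ** 2 + g_over_l * (1.0 - math.cos(theta))

def simulate(theta0, omega0, g_over_l=9.81, dt=1e-3, steps=5000):
    """Integrate from (theta0, omega0) and return the final state."""
    state = (theta0, omega0)
    for _ in range(steps):
        state = rk4_step(pendulum_rhs, state, dt, g_over_l)
    return state

# Release from rest at one radian (illustrative initial data).
start = (1.0, 0.0)
final = simulate(*start)
drift = abs(energy_per_ml2(final, 9.81) - energy_per_ml2(start, 9.81))
```

The energy drift measures only the integrator's error, since the exact flow conserves energy; the mass and length enter solely through the ratio $g/l$.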
For systems with symmetry, the coordinate expression often reveals conserved quantities before the full Hamiltonian formalism has been developed.
[example: Central Force Lagrangian]
For a particle of mass $m$ in the punctured plane with potential $V(r)$ depending only on distance from the origin, the polar-coordinate Lagrangian on $\mathbb R^2_0$ is
\begin{align*}
L(r,\theta,\dot r,\dot\theta)=\frac12m(\dot r^2+r^2\dot\theta^2)-V(r).
\end{align*}
We compute the two coordinate Euler-Lagrange equations for $r$ and $\theta$.
For the radial coordinate,
\begin{align*}
\frac{\partial L}{\partial \dot r}=\frac{\partial}{\partial \dot r}\left(\frac12m\dot r^2+\frac12mr^2\dot\theta^2-V(r)\right)=m\dot r.
\end{align*}
Since $m$ is constant,
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial \dot r}=\frac{d}{dt}(m\dot r)=m\ddot r.
\end{align*}
Also,
\begin{align*}
\frac{\partial L}{\partial r}=\frac{\partial}{\partial r}\left(\frac12m\dot r^2+\frac12mr^2\dot\theta^2-V(r)\right)=mr\dot\theta^2-V'(r).
\end{align*}
Substituting these expressions into the radial Euler-Lagrange equation gives
\begin{align*}
m\ddot r-(mr\dot\theta^2-V'(r))=0.
\end{align*}
Equivalently,
\begin{align*}
m\ddot r-mr\dot\theta^2+V'(r)=0.
\end{align*}
For the angular coordinate,
\begin{align*}
\frac{\partial L}{\partial \dot\theta}=\frac{\partial}{\partial \dot\theta}\left(\frac12m\dot r^2+\frac12mr^2\dot\theta^2-V(r)\right)=mr^2\dot\theta.
\end{align*}
The Lagrangian has no explicit $\theta$-dependence, so
\begin{align*}
\frac{\partial L}{\partial \theta}=0.
\end{align*}
Substituting into the angular Euler-Lagrange equation gives
\begin{align*}
\frac{d}{dt}(mr^2\dot\theta)-0=0.
\end{align*}
Thus
\begin{align*}
\frac{d}{dt}(mr^2\dot\theta)=0.
\end{align*}
Therefore $mr^2\dot\theta$ is constant along the motion. This constant is the angular momentum, so rotational symmetry of the potential appears in the variational equations as conservation of angular momentum.
[/example]
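Conservation of $mr^2\dot\theta$ can be observed by integrating the two Euler-Lagrange equations in polar coordinates. The sketch below assumes the attractive potential $V(r)=-k/r$ with $k=1$ and $m=1$, uses a classical Runge-Kutta step, and rewrites the angular equation as $\ddot\theta=-2\dot r\dot\theta/r$; all names and numerical values are illustrative assumptions.

```python
def central_force_rhs(state, m, dV):
    """First-order form of the radial and angular Euler-Lagrange equations."""
    r, theta, r_dot, theta_dot = state
    r_ddot = r * theta_dot ** 2 - dV(r) / m        # m r'' = m r theta'^2 - V'(r)
    theta_ddot = -2.0 * r_dot * theta_dot / r      # from d/dt (m r^2 theta') = 0
    return (r_dot, theta_dot, r_ddot, theta_ddot)

def rk4_step(rhs, state, dt, *args):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = rhs(state, *args)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *args)
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *args)
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), *args)
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def angular_momentum(state, m):
    """The conserved quantity m r^2 theta'."""
    r, _theta, _r_dot, theta_dot = state
    return m * r ** 2 * theta_dot

def kepler_dV(r, k=1.0):
    """Derivative V'(r) = k / r^2 of the assumed potential V(r) = -k / r."""
    return k / r ** 2

def integrate(state, m=1.0, dV=kepler_dV, dt=1e-3, steps=5000):
    """Integrate the polar equations and return the final state."""
    for _ in range(steps):
        state = rk4_step(central_force_rhs, state, dt, m, dV)
    return state

# Bound orbit starting at r = 1 with purely angular velocity (illustrative).
start = (1.0, 0.0, 0.0, 1.2)
final = integrate(start)
```

Along the computed orbit the radius varies while `angular_momentum` stays at its initial value $1.2$ up to integration error.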
The central-force example shows what goes wrong if the derivatives of $L$ with respect to velocities are treated as disposable algebraic terms: the conserved angular momentum would be hidden inside an integration-by-parts calculation. These derivatives are the quantities whose time derivatives appear in the Euler-Lagrange equations, and in cyclic coordinates they become conserved. This motivates naming them as the momenta conjugate to generalized coordinates.
[definition: Generalized Momentum]
Let $L:TQ\times\mathbb R\to\mathbb R$ be a Lagrangian. In local coordinates on $TQ$, the generalized momentum associated to $q_i$ is the coordinate function
\begin{align*}
p_i:TQ\times\mathbb R\to \mathbb R
\end{align*}
given by
\begin{align*}
p_i(q,v,t)=\frac{\partial L}{\partial v_i}(q,v,t).
\end{align*}
For an autonomous Lagrangian $L:TQ\to\mathbb R$, this is the coordinate function $p_i:TQ\to\mathbb R$ given by $p_i(q,v)=\partial L/\partial v_i(q,v)$.
[/definition]
This definition is coordinate-level at this stage. Later, the Legendre transform will assemble the $p_i$ into a covector and move the dynamics from $TQ$ to $T^*Q$.
## Regular Lagrangians and the Fiber Hessian
When do the Euler-Lagrange equations determine the acceleration from the current position and velocity? This question matters because mechanics is expected to define time evolution from initial data. The answer is controlled by the Hessian of $L$ in the velocity variables.
[definition: Fiber Hessian]
Let $L:TQ\times\mathbb R\to\mathbb R$ be smooth. In local coordinates $(q_i,v_i)$ on $TQ$, the fiber Hessian of $L$ is the map
\begin{align*}
\operatorname{Hess}_v L:TQ\times\mathbb R\to \operatorname{Mat}_{n\times n}(\mathbb R)
\end{align*}
given by
\begin{align*}
(\operatorname{Hess}_v L)(q,v,t)=\left(\frac{\partial^2 L}{\partial v_i\partial v_j}(q,v,t)\right)_{i,j=1}^n.
\end{align*}
[/definition]
The word fiber refers to differentiating only in the tangent-vector direction while keeping the base point and time fixed. The next problem is to identify which Lagrangians let the Euler-Lagrange equations solve for acceleration from initial data. Although the displayed matrix is coordinate-dependent, invertibility of this matrix is invariant under coordinate changes, so it gives a geometric condition on $L$.
[definition: Regular Lagrangian]
A Lagrangian $L:TQ\times\mathbb R\to\mathbb R$ is regular if its fiber Hessian is invertible at every point $(q,v,t)$ in its domain.
[/definition]
Regularity is the Lagrangian condition that lets the second-order equations be solved for $\ddot q$. Without it, the variational equations may contain constraints rather than evolution equations: the coefficients of the accelerations may fail to determine those accelerations uniquely. The local issue is therefore whether the Euler-Lagrange equations can be reorganized as an explicit ODE on $TQ$.
The next formal step is to turn that local algebraic solvability into an actual dynamical statement. Once the accelerations can be isolated, the Euler-Lagrange equations should define a vector field on the tangent bundle, so initial position and velocity determine a local motion.
[quotetheorem:6836]
[citeproof:6836]
This result is the local mechanism behind existence and uniqueness for regular Lagrangian systems: after solving for $\ddot q$, the dynamics become a first-order system on $TQ$. The invertibility hypothesis is not cosmetic; if the fiber Hessian has a kernel, some acceleration components may be undetermined or replaced by constraint equations. The theorem is local in time and in coordinates, so it does not guarantee global existence, completeness of the flow, or avoidance of coordinate singularities. Those questions belong to ODE theory and to the global geometry of the configuration manifold.
[example: Mechanical Lagrangians are Regular]
Let $(Q,g)$ be a Riemannian manifold and let $L(q,v)=\frac12 g_q(v,v)-V(q)$. In local coordinates, write $v=\sum_{i=1}^n v_i\partial/\partial q_i|_q$ and $g_{ij}(q)=g_q(\partial/\partial q_i,\partial/\partial q_j)$. By bilinearity of $g_q$,
\begin{align*}
g_q(v,v)=\sum_{i=1}^n\sum_{j=1}^n g_{ij}(q)v_i v_j.
\end{align*}
Thus
\begin{align*}
L(q,v)=\frac12\sum_{i=1}^n\sum_{j=1}^n g_{ij}(q)v_i v_j-V(q).
\end{align*}
Fix $k$. Differentiating with respect to $v_k$ gives
\begin{align*}
\frac{\partial L}{\partial v_k}=\frac12\sum_{j=1}^n g_{kj}(q)v_j+\frac12\sum_{i=1}^n g_{ik}(q)v_i.
\end{align*}
Since $g_{ik}(q)=g_{ki}(q)$ by symmetry of the metric, the two sums are the same after renaming the index, so
\begin{align*}
\frac{\partial L}{\partial v_k}=\sum_{j=1}^n g_{kj}(q)v_j.
\end{align*}
Differentiating this expression with respect to $v_\ell$ gives
\begin{align*}
\frac{\partial^2 L}{\partial v_\ell\partial v_k}=g_{k\ell}(q).
\end{align*}
Therefore the fiber Hessian matrix is exactly
\begin{align*}
\operatorname{Hess}_v L(q,v)=\bigl(g_{ij}(q)\bigr)_{i,j=1}^n.
\end{align*}
For each fixed $q$, the [bilinear form](/page/Bilinear%20Form) $g_q$ is an inner product on $T_qQ$, so its coordinate matrix is positive definite: if $a\in\mathbb R^n$ is nonzero and $w=\sum_i a_i\partial/\partial q_i|_q$, then
\begin{align*}
a^T(g_{ij}(q))a=g_q(w,w)>0.
\end{align*}
A positive definite matrix has trivial kernel, hence is invertible. Thus the fiber Hessian is invertible at every $(q,v)$, so mechanical Lagrangians with non-degenerate kinetic energy are regular.
[/example]
A singular Lagrangian often indicates that the chosen variables include constraints or gauge freedom. Such systems require extra techniques and will not behave like ordinary second-order equations on $Q$.
[example: A Singular Lagrangian]
On $Q=\mathbb R$, use the global coordinate $q$ and write the velocity coordinate as $v$. Consider
\begin{align*}
L(q,v)=qv.
\end{align*}
Differentiating once with respect to the velocity variable gives
\begin{align*}
\frac{\partial L}{\partial v}(q,v)=\frac{\partial}{\partial v}(qv)=q,
\end{align*}
because $q$ is held fixed when taking the fiber derivative. Differentiating once more with respect to $v$ gives
\begin{align*}
\frac{\partial^2 L}{\partial v^2}(q,v)=\frac{\partial}{\partial v}(q)=0.
\end{align*}
Thus the fiber Hessian is the $1\times 1$ zero matrix, so it is not invertible.
Along a smooth path $q(t)$, the velocity coordinate is $v(t)=\dot q(t)$. The one-coordinate Euler-Lagrange equation is
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial v}(q(t),\dot q(t))-\frac{\partial L}{\partial q}(q(t),\dot q(t))=0.
\end{align*}
The first term is
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial v}(q(t),\dot q(t))=\frac{d}{dt}q(t)=\dot q(t).
\end{align*}
The second term is
\begin{align*}
\frac{\partial L}{\partial q}(q,v)=\frac{\partial}{\partial q}(qv)=v.
\end{align*}
Substituting $v=\dot q(t)$ gives
\begin{align*}
\frac{\partial L}{\partial q}(q(t),\dot q(t))=\dot q(t).
\end{align*}
Therefore the Euler-Lagrange equation becomes
\begin{align*}
\dot q(t)-\dot q(t)=0.
\end{align*}
Since this identity holds for every smooth function $q(t)$, the variational equation imposes no condition on the acceleration or on the future motion. This singular Lagrangian therefore supplies no deterministic evolution law for $q$.
[/example]
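The contrast between regular and singular Lagrangians can be seen by estimating the fiber Hessian with a second difference in the velocity variable. The sketch below treats the one-dimensional case only; the helper name, the step size, and the comparison Lagrangian $\frac12 v^2-\frac12 q^2$ are assumptions made for illustration.

```python
def fiber_hessian_1d(lagrangian, q, v, h=1e-3):
    """Central second difference of L in v, with the base point q held fixed."""
    return (lagrangian(q, v + h)
            - 2.0 * lagrangian(q, v)
            + lagrangian(q, v - h)) / h ** 2

def singular_lagrangian(q, v):
    """The example L(q, v) = q v, which is linear in the velocity."""
    return q * v

def oscillator_lagrangian(q, v):
    """A mechanical Lagrangian with an assumed quadratic potential."""
    return 0.5 * v * v - 0.5 * q * q

# Evaluate both Hessians at the same arbitrary point of TQ.
hess_singular = fiber_hessian_1d(singular_lagrangian, 1.5, 0.7)
hess_regular = fiber_hessian_1d(oscillator_lagrangian, 1.5, 0.7)
```

The first value is zero up to rounding, matching the computation in the example, while the second equals the coefficient of $\frac12 v^2$, namely $1$.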
The chapter has moved from configuration spaces to variational dynamics. A configuration manifold $Q$ gives the possible positions, $TQ$ gives velocities, a Lagrangian assigns an action to paths, and the first variation turns stationarity into the Euler-Lagrange equations. Regularity of the fiber Hessian then ensures that these equations define genuine time evolution on the tangent bundle.
Once the Euler-Lagrange equations are in hand, the next question is what happens when coordinates are not all independent. Chapter 2 answers this by introducing constraints and symmetry reduction, so the variational principle can still be used, without losing its geometric form, in settings where some directions are fixed, cyclic, redundant, or only virtual.
# 2. Lagrangian Mechanics and Constraints
This chapter continues the variational mechanics developed in Chapter 1 by studying what changes when not every formal coordinate motion is physically available. The main prerequisites are the Euler-Lagrange equations, tangent spaces to smooth manifolds, and the interpretation of the first variation as a covector paired with a virtual displacement. The central question is how to impose restrictions on configurations or velocities without discarding the variational structure that produced the equations of motion. Holonomic constraints reduce the configuration space itself, non-holonomic constraints restrict the allowed velocities, and cyclic coordinates expose variables whose conjugate momenta are conserved, so that the variables themselves can be eliminated from the dynamics.
## Holonomic Constraints and Constrained Configuration Spaces
Suppose a system is first described in a manifold $Q$, but its configurations must satisfy equations $F(q)=0$. The problem is to decide when these equations define a genuine smaller configuration manifold and how the Euler-Lagrange equations on this smaller manifold are related to equations in the original coordinates.
[definition: Holonomic Constraint]
Let $Q$ be a smooth manifold. A holonomic constraint is a smooth submanifold $C \subset Q$ whose points are the admissible configurations.
[/definition]
This definition treats the constraint as geometry rather than as a list of equations. In computations, however, constraints usually arrive as equations
\begin{align*}
F_1(q)=\cdots=F_k(q)=0,
\end{align*}
so the next question is when those equations really define a submanifold and what the tangent vectors to that submanifold look like. The answer is needed before we can restrict velocities, variations, and the Lagrangian itself.
[quotetheorem:6837]
[citeproof:6837]
The theorem turns equations into a constrained configuration space and identifies the admissible velocities as $TC$. The regularity hypothesis is doing real work: it rules out singular level sets whose dimension changes or whose tangent space is not described by a constant-rank kernel. For instance, the map $F:\mathbb{R}^2\to\mathbb{R}$ defined by $F(x,y)=x^2+y^2$ has $F^{-1}(0)=\{(0,0)\}$ and $dF_{(0,0)}=0$, so $0$ is not a regular value and the level set is a single point rather than a codimension-one curve. The theorem also does not say that every equation chosen by a modeller is minimal; redundant equations may describe the same set while failing the regular-value condition.
Once the constraint equations have been replaced by a regular defining set, the variational principle from the first chapter can be applied without inventing a new rule: the action is evaluated only on paths in $C$. To express that operation intrinsically, we give the restricted Lagrangian its own name.
[definition: Reduced Holonomic Lagrangian]
Let $C\subset Q$ be a holonomic constraint and let $\iota:C\hookrightarrow Q$ be the inclusion. For a Lagrangian $L:TQ\to \mathbb{R}$, the reduced holonomic Lagrangian is
\begin{align*}
L_C := L\circ T\iota : TC\to \mathbb{R}.
\end{align*}
[/definition]
The reduced Lagrangian gives the clean geometric description, but many calculations are carried out in the ambient coordinates on $Q$. In those coordinates, tangent variations to $C$ are awkward to parametrize directly. The next result replaces that tangency condition by extra unknown functions, whose role is to enforce the constraint equations.
[quotetheorem:3509]
[citeproof:3509]
The multipliers are not new physical coordinates. They encode the constraint reactions, and their values are determined together with the trajectory by differentiating the constraint equations as needed. The independence assumption is what makes the conormal space have the advertised basis: if the same constraint is repeated, or if one equation is a function of the others along $C$, the multiplier representation becomes non-unique and may not record the actual rank of the constraint. These equations are necessary equations for a constrained extremal; by themselves they do not prove that a solution exists, that it is unique, or that a chosen multiplier parametrisation remains regular for all time.
[example: Particle on a Sphere]
Consider a particle of mass $m$ moving in $\mathbb{R}^3$ under a potential $V(x)$, constrained by
\begin{align*}
|x|^2=R^2.
\end{align*}
For
\begin{align*}
L(x,\dot x)=\frac{m}{2}|\dot x|^2-V(x),
\end{align*}
the constraint function is $F(x)=|x|^2-R^2$, so its coordinate gradient is
\begin{align*}
\nabla F(x)=2x.
\end{align*}
The holonomic multiplier equations therefore read
\begin{align*}
m\ddot x=-\nabla V(x)+2\lambda x,\qquad |x|^2=R^2.
\end{align*}
To determine $\lambda$, differentiate the constraint along the motion. Since $|x|^2=x\cdot x$ and $R^2$ is constant,
\begin{align*}
\frac{d}{dt}(x\cdot x)=2x\cdot\dot x=0.
\end{align*}
Thus $x\cdot\dot x=0$. Differentiating this identity once more gives
\begin{align*}
\frac{d}{dt}(x\cdot\dot x)=\dot x\cdot\dot x+x\cdot\ddot x=0,
\end{align*}
so
\begin{align*}
x\cdot\ddot x=-|\dot x|^2.
\end{align*}
Now take the dot product of the multiplier equation with $x$:
\begin{align*}
m\,x\cdot\ddot x=-x\cdot\nabla V(x)+2\lambda\,x\cdot x.
\end{align*}
Using $x\cdot\ddot x=-|\dot x|^2$ and $x\cdot x=R^2$, this becomes
\begin{align*}
-m|\dot x|^2=-x\cdot\nabla V(x)+2\lambda R^2.
\end{align*}
Solving for $\lambda$ gives
\begin{align*}
\lambda=\frac{x\cdot\nabla V(x)-m|\dot x|^2}{2R^2}.
\end{align*}
Thus the multiplier is not an extra degree of freedom: it is fixed by the radial component of the equation of motion, while the actual motion remains tangent to the sphere.
[/example]
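The multiplier formula can be tested by integrating $m\ddot x=-\nabla V(x)+2\lambda x$ with $\lambda$ substituted from the last display and checking that the motion stays on the sphere. The sketch below assumes the gravitational potential $V(x)=mg\,x_3$, unit mass and radius, and a classical Runge-Kutta step; these choices and all names are illustrative, not taken from the text.

```python
import math

def sphere_rhs(state, m, R, grad_V):
    """Right-hand side of m x'' = -grad V + 2 lambda x, with lambda eliminated."""
    x, v = state[:3], state[3:]
    gV = grad_V(x)
    x_dot_gV = sum(a * b for a, b in zip(x, gV))
    v_sq = sum(c * c for c in v)
    lam = (x_dot_gV - m * v_sq) / (2.0 * R * R)   # multiplier from the example
    acc = tuple((-gV[i] + 2.0 * lam * x[i]) / m for i in range(3))
    return tuple(v) + acc

def rk4_step(rhs, state, dt, *args):
    """One classical Runge-Kutta step for a tuple-valued state."""
    k1 = rhs(state, *args)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *args)
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *args)
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), *args)
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def gravity_gradient(x, m=1.0, g=9.81):
    """Gradient of the assumed potential V(x) = m g x_3."""
    return (0.0, 0.0, m * g)

def integrate(state, m=1.0, R=1.0, dt=1e-3, steps=2000):
    """Integrate the constrained equations and return the final state."""
    for _ in range(steps):
        state = rk4_step(sphere_rhs, state, dt, m, R, gravity_gradient)
    return state

# Start on the equator with a tangent velocity, so x . x' = 0 initially.
start = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
final = integrate(start)
radius = math.sqrt(sum(c * c for c in final[:3]))
radial_velocity = sum(a * b for a, b in zip(final[:3], final[3:]))
```

Initial data must satisfy both $|x|=R$ and $x\cdot\dot x=0$; the formula for $\lambda$ then keeps the exact solution on the sphere, and the computed `radius` and `radial_velocity` deviate only by integration error.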
## Non-Holonomic Velocity Constraints and the Lagrange-d'Alembert Principle
Some constraints restrict the instantaneous velocity without integrating to restrictions on position alone. The problem is that the path is not confined to a smaller configuration submanifold, so the correct variational principle must restrict virtual displacements rather than replace $Q$ by a smaller $C$.
[definition: Linear Non-Holonomic Constraint]
Let $Q$ be a smooth manifold. A linear non-holonomic constraint is a smooth vector subbundle $D\subset TQ$ whose fibre $D_q\subset T_qQ$ is the space of admissible velocities at $q$.
[/definition]
This definition separates admissible directions from admissible positions. A curve must have velocity in $D$, but the set of points it can reach may still be open in $Q$. Because there is no reduced configuration manifold available, we need a variational rule that says which virtual displacements are allowed at each point of the original configuration space.
[definition: Lagrange-d'Alembert Principle]
Let $L:TQ\to \mathbb{R}$ be a Lagrangian and let $D\subset TQ$ be a linear non-holonomic constraint. A curve $q:[t_0,t_1]\to Q$ satisfies the Lagrange-d'Alembert principle if $\dot q(t)\in D_{q(t)}$ and
\begin{align*}
\delta \int_{t_0}^{t_1} L(q(t),\dot q(t))\,dt=0
\end{align*}
for all variations with fixed endpoints and variation field $\delta q(t)\in D_{q(t)}$.
[/definition]
This principle is not the same as applying Hamilton's principle to all paths satisfying the velocity constraint. For non-integrable $D$, restricting the varied paths would impose a different set of equations. To compute actual trajectories, we therefore need the coordinate equations obtained by requiring the Euler-Lagrange covector to lie in the annihilator of the admissible distribution.
[quotetheorem:6838]
The distinction between holonomic and non-holonomic constraints is therefore a distinction between reducing the configuration manifold and projecting the variational equation onto allowed virtual displacements. The independence of the one-forms is needed so that $D_q^\circ$ has the displayed basis and the multipliers describe every reaction covector without artificial redundancy. If the one-forms become dependent, the same admissible distribution may be represented by too many equations, or by equations whose rank changes along the motion. A concrete rank-loss example is given on $Q=\mathbb{R}^2$ by the single constraint one-form $\omega=x\,dy-y\,dx$: away from the origin it cuts out a one-dimensional admissible subspace, while at $(0,0)$ it imposes no condition at all, so the fibres do not form a vector subbundle of constant rank. The theorem is also restricted to linear velocity constraints. For instance, the constraint $\dot x^2+\dot y^2=1$ selects a circle in each tangent space rather than a vector subspace, so it cannot be encoded as a distribution $D\subset TQ$ and requires a different formulation of admissible virtual displacements. As with the holonomic multiplier equations, the result gives necessary equations for curves satisfying the Lagrange-d'Alembert principle, not a general existence or uniqueness theorem.
The next example is the standard test case because its constraint is linear in velocity but not a position equation.
[example: Rolling Disk Without Slipping]
Let a vertical disk of radius $R>0$ roll on the plane. Use coordinates $(x,y,\theta,\varphi)$, where $(x,y)$ is the contact point, $\theta$ is the heading angle, and $\varphi$ is the spin angle. The no-slip constraints are
\begin{align*}
\dot x=R\dot\varphi\cos\theta,\qquad \dot y=R\dot\varphi\sin\theta.
\end{align*}
Equivalently, the admissible velocities are the vectors
\begin{align*}
v=a\frac{\partial}{\partial x}+b\frac{\partial}{\partial y}+c\frac{\partial}{\partial\theta}+d\frac{\partial}{\partial\varphi}
\end{align*}
satisfying
\begin{align*}
a-Rd\cos\theta=0,\qquad b-Rd\sin\theta=0.
\end{align*}
Thus $a=Rd\cos\theta$ and $b=Rd\sin\theta$, while $c$ and $d$ are free. Hence
\begin{align*}
v=c\frac{\partial}{\partial\theta}+d\left(R\cos\theta\frac{\partial}{\partial x}+R\sin\theta\frac{\partial}{\partial y}+\frac{\partial}{\partial\varphi}\right).
\end{align*}
So the constraint distribution is
\begin{align*}
D=\operatorname{span}\left\{\frac{\partial}{\partial\theta},\;R\cos\theta\frac{\partial}{\partial x}+R\sin\theta\frac{\partial}{\partial y}+\frac{\partial}{\partial\varphi}\right\},
\end{align*}
which has rank two at every point because the first spanning vector has $\partial/\partial\theta$ component $1$, while the second has $\partial/\partial\varphi$ component $1$.
This rank-two distribution is not integrable: it is not the tangent distribution of any family of submanifolds cut out by fixed equations among $x,y,\theta,\varphi$. To verify this, let
\begin{align*}
X=\frac{\partial}{\partial\theta}
\end{align*}
and
\begin{align*}
Y=R\cos\theta\frac{\partial}{\partial x}+R\sin\theta\frac{\partial}{\partial y}+\frac{\partial}{\partial\varphi}.
\end{align*}
Since $X$ differentiates the $\theta$-dependent coefficients of $Y$ and $Y$ has no $\partial/\partial\theta$ component, their Lie bracket is
\begin{align*}
[X,Y]=-R\sin\theta\frac{\partial}{\partial x}+R\cos\theta\frac{\partial}{\partial y}.
\end{align*}
If $[X,Y]$ lay in $D$, then for some functions $\alpha,\beta$ we would have
\begin{align*}
[X,Y]=\alpha X+\beta Y.
\end{align*}
Comparing the $\partial/\partial\varphi$ component gives $\beta=0$, and comparing the $\partial/\partial\theta$ component gives $\alpha=0$. The right-hand side would then be $0$, but the left-hand side has squared length $R^2\sin^2\theta+R^2\cos^2\theta=R^2\neq 0$ in the $(x,y)$ components. Thus $[X,Y]\notin D$.
Since $D$ is not closed under the Lie bracket, it is not involutive, and by the Frobenius theorem it is not integrable: no family of constraint surfaces has $D$ as its tangent distribution. Geometrically, changing the heading produces rolling directions that are not combinations of the old ones. Therefore the motion must be treated by the Lagrange-d'Alembert principle: velocities and virtual displacements lie in $D$, and the reaction covectors annihilate $D$ rather than coming from equations defining a smaller configuration manifold.
[/example]
[illustration:rolling-disk-distribution]
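The bracket computation in the rolling-disk example can be checked numerically. The following is a minimal sketch, not part of the original derivation: it approximates $[X,Y]$ by central differences at one sample point and tests whether the result is a combination of $X$ and $Y$. The radius value, the sample point, and all function names are illustrative assumptions.

```python
import math

R = 0.5  # disk radius (illustrative value, any R > 0 works)

def X(pt):
    """X = d/dtheta in coordinates (x, y, theta, phi)."""
    return [0.0, 0.0, 1.0, 0.0]

def Y(pt):
    """Y = R cos(theta) d/dx + R sin(theta) d/dy + d/dphi."""
    _, _, theta, _ = pt
    return [R * math.cos(theta), R * math.sin(theta), 0.0, 1.0]

def directional_derivative(field, pt, direction, h=1e-6):
    """Central-difference derivative of the components of `field` along `direction`."""
    plus = [p + h * d for p, d in zip(pt, direction)]
    minus = [p - h * d for p, d in zip(pt, direction)]
    return [(a - b) / (2 * h) for a, b in zip(field(plus), field(minus))]

def lie_bracket(A, B, pt):
    """[A, B]^i = A^j d_j B^i - B^j d_j A^i, approximated at pt."""
    dB_along_A = directional_derivative(B, pt, A(pt))
    dA_along_B = directional_derivative(A, pt, B(pt))
    return [b - a for a, b in zip(dA_along_B, dB_along_A)]

def in_span_of_X_Y(vec, pt, tol=1e-6):
    """Check vec = alpha X + beta Y; alpha and beta are read off the theta and phi components."""
    alpha, beta = vec[2], vec[3]
    combo = [alpha * xc + beta * yc for xc, yc in zip(X(pt), Y(pt))]
    return all(abs(v - c) < tol for v, c in zip(vec, combo))

point = (0.3, -1.2, 0.7, 2.0)  # arbitrary sample point (x, y, theta, phi)
bracket = lie_bracket(X, Y, point)
expected = [-R * math.sin(point[2]), R * math.cos(point[2]), 0.0, 0.0]

if __name__ == "__main__":
    print("numerical [X, Y]:", bracket)
    print("lies in D:", in_span_of_X_Y(bracket, point))
```

A check at a single point does not prove non-integrability, but the same test fails at every point because the $(x,y)$ part of the bracket always has length $R$.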
## Cyclic Coordinates, Routh Reduction, and Effective Potentials
Constraints reduce motion by removing forbidden directions, but symmetry can reduce motion by removing redundant directions. The guiding question here is what happens when a coordinate does not appear in the Lagrangian: its conjugate momentum is conserved, and under a regularity condition that momentum level can be used to eliminate the corresponding velocity.
[definition: Cyclic Coordinate]
Let $U\subset \mathbb{R}^n$ be a coordinate chart domain and let $L:U\times\mathbb{R}^n\to\mathbb{R}$ be a smooth local-coordinate Lagrangian with variables $(q,\dot q)=(q_1,\dots,q_n,\dot q_1,\dots,\dot q_n)$. The coordinate $q_n$ is cyclic if $L$ is independent of $q_n$.
[/definition]
The definition singles out an ignorable coordinate: the Lagrangian can depend on its velocity but not on its value. The Euler-Lagrange equation in that coordinate therefore loses its force term and becomes a conservation law. This is the first reduction mechanism that does not require a constraint force.
In local Lagrangian coordinates this conservation law is immediate. If $q_n$ is cyclic, then $\partial L/\partial q_n=0$. The $q_n$ component of the Euler-Lagrange equations is
\begin{align*}
\frac{d}{dt}\frac{\partial L}{\partial \dot q_n}(q(t),\dot q(t))-\frac{\partial L}{\partial q_n}(q(t),\dot q(t))=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}p_n(t)=0,\qquad p_n(t):=\frac{\partial L}{\partial \dot q_n}(q(t),\dot q(t)).
\end{align*}
Thus the momentum conjugate to an ignorable coordinate is constant along every Euler-Lagrange trajectory, for as long as the trajectory stays in the coordinate chart.
The hypothesis cannot be weakened to approximate independence: if $L$ depends on $q_n$, then the Euler-Lagrange equation gives $dp_n/dt=\partial L/\partial q_n$, so the generalized momentum may change in time. For example, a planar pendulum with angle $\theta$ has
\begin{align*}
L(\theta,\dot\theta)=\frac{ml^2}{2}\dot\theta^2-mgl(1-\cos\theta),
\end{align*}
and the conjugate angular momentum $p_\theta=ml^2\dot\theta$ satisfies $\dot p_\theta=-mgl\sin\theta$ rather than a conservation law. Conservation is also not yet a reduction theorem. It gives a first integral, but it eliminates the cyclic velocity only when the relation $p_n=\partial L/\partial\dot q_n$ can be solved for $\dot q_n$ on the momentum level under consideration.
A conserved momentum lowers the number of unknowns only if it can be used to solve for the velocity in the cyclic direction. Routh reduction formalises this elimination and produces a new variational problem for the remaining coordinates.
[definition: Routhian]
Let $V\subset\mathbb{R}^{n}$ be a coordinate domain, let $L:V\times\mathbb{R}^{n}\to\mathbb{R}$ be a smooth local-coordinate Lagrangian for which $q_n$ is cyclic, let $U\subset\mathbb{R}^{n-1}$ be the corresponding domain for the non-cyclic coordinates, and fix a momentum value $\ell\in\mathbb{R}$.
Assume the equation
\begin{align*}
\frac{\partial L}{\partial \dot q_n}(q_1,\dots,q_{n-1},\dot q_1,\dots,\dot q_n)=\ell
\end{align*}
can be solved for $\dot q_n$ as a smooth function $\psi_\ell:U\times\mathbb{R}^{n-1}\to\mathbb{R}$, written
\begin{align*}
\dot q_n=\psi_\ell(q_1,\dots,q_{n-1},\dot q_1,\dots,\dot q_{n-1}).
\end{align*}
The Routhian at momentum value $\ell$ is the smooth map $R_\ell:U\times\mathbb{R}^{n-1}\to\mathbb{R}$ defined by
\begin{align*}
R_\ell(q_1,\dots,q_{n-1},\dot q_1,\dots,\dot q_{n-1})
= L\big(q_1,\dots,q_{n-1},\dot q_1,\dots,\dot q_{n-1},\psi_\ell\big)-\ell\psi_\ell.
\end{align*}
[/definition]
With this sign convention, $R_\ell$ is the negative of the partial Legendre transform of $L$ in the cyclic velocity, so it acts as a Lagrangian for the remaining coordinates. The definition gives a candidate reduced Lagrangian, but it remains to justify that its Euler-Lagrange equations are exactly the non-cyclic part of the original dynamics at fixed momentum. That justification is where the regularity condition enters.
[quotetheorem:6839]
[citeproof:6839]
The nonzero fibre Hessian is the local invertibility condition behind the whole construction. If $\partial p_n/\partial\dot q_n$ vanishes, the momentum equation may fail to determine a smooth cyclic velocity. For example, if the cyclic part of a local Lagrangian is $L_{\mathrm{cyc}}(\dot q_n)=\dot q_n^3/3$, then $p_n=\dot q_n^2$. The momentum level $\ell=0$ has the unique solution $\dot q_n=0$, but $\partial p_n/\partial\dot q_n=2\dot q_n$ vanishes there, so the implicit function theorem supplies no smooth branch through it. For $\ell>0$ there are two branches $\dot q_n=\pm\sqrt{\ell}$ rather than a single canonical one, and for $\ell<0$ there is no solution at all. Even when the local calculation works, the theorem does not settle global reconstruction: different coordinate patches, periodic cyclic variables, or disconnected momentum levels may require separate branches and compatibility conditions. It also assumes that the momentum level is regular; singular momentum levels can carry dynamics that is not captured by one Routhian chart.
Effective potentials are the most familiar outcome of this procedure: when the cyclic coordinate is an angle, the conserved momentum contributes a repulsive term to the reduced one-dimensional motion.
[example: Spherical Pendulum and Effective Radial Motion]
For a spherical pendulum of length $l$, use spherical coordinates $(\theta,\varphi)$, where $\varphi$ is the azimuthal angle. The Lagrangian is
\begin{align*}
L=\frac{m l^2}{2}\big(\dot\theta^2+\sin^2\theta\,\dot\varphi^2\big)-mgl(1-\cos\theta).
\end{align*}
Since $L$ contains $\dot\varphi$ but not $\varphi$, the coordinate $\varphi$ is cyclic. Its conjugate momentum is
\begin{align*}
p_\varphi=\frac{\partial L}{\partial \dot\varphi}=ml^2\sin^2\theta\,\dot\varphi.
\end{align*}
Fix a momentum value $p_\varphi=\ell$. On the region where $\sin\theta\neq 0$, the momentum equation gives
\begin{align*}
ml^2\sin^2\theta\,\dot\varphi=\ell.
\end{align*}
Dividing by $ml^2\sin^2\theta$ gives
\begin{align*}
\dot\varphi=\frac{\ell}{ml^2\sin^2\theta}.
\end{align*}
The total energy is kinetic plus potential energy:
\begin{align*}
E=\frac{m l^2}{2}\dot\theta^2+\frac{m l^2}{2}\sin^2\theta\,\dot\varphi^2+mgl(1-\cos\theta).
\end{align*}
Substituting $\dot\varphi=\ell/(ml^2\sin^2\theta)$ into the azimuthal kinetic term gives
\begin{align*}
\frac{m l^2}{2}\sin^2\theta\,\dot\varphi^2=\frac{m l^2}{2}\sin^2\theta\left(\frac{\ell}{ml^2\sin^2\theta}\right)^2.
\end{align*}
Squaring the fraction gives
\begin{align*}
\left(\frac{\ell}{ml^2\sin^2\theta}\right)^2=\frac{\ell^2}{m^2l^4\sin^4\theta}.
\end{align*}
Therefore
\begin{align*}
\frac{m l^2}{2}\sin^2\theta\,\dot\varphi^2=\frac{m l^2\sin^2\theta}{2}\cdot\frac{\ell^2}{m^2l^4\sin^4\theta}=\frac{\ell^2}{2ml^2\sin^2\theta}.
\end{align*}
Hence the reduced one-dimensional energy is
\begin{align*}
E=\frac{ml^2}{2}\dot\theta^2+\frac{\ell^2}{2ml^2\sin^2\theta}+mgl(1-\cos\theta).
\end{align*}
Thus the effective potential for the $\theta$-motion is
\begin{align*}
V_{\mathrm{eff}}(\theta)=\frac{\ell^2}{2ml^2\sin^2\theta}+mgl(1-\cos\theta).
\end{align*}
The first term is the angular-momentum barrier: when $\ell\neq 0$, it tends to infinity as $\sin\theta\to 0$, so fixed nonzero azimuthal momentum prevents the pendulum from reaching the vertical axis in this reduced description.
[/example]
[illustration:spherical-pendulum-effective-potential]
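The effective potential above was read off from the energy. As a hedged cross-check, the sketch below evaluates the Routhian $R_\ell=L-\ell\dot\varphi$ for the spherical pendulum at sample values and compares it with $\frac{ml^2}{2}\dot\theta^2-V_{\mathrm{eff}}(\theta)$. The parameter values and function names are assumptions made for illustration only.

```python
import math

m, l, g = 1.0, 1.0, 9.81  # illustrative parameter values

def lagrangian(theta, theta_dot, phi_dot):
    """Spherical pendulum Lagrangian with theta measured from the downward vertical."""
    kinetic = 0.5 * m * l**2 * (theta_dot**2 + math.sin(theta)**2 * phi_dot**2)
    return kinetic - m * g * l * (1.0 - math.cos(theta))

def phi_dot_from_momentum(theta, ell):
    """Solve p_phi = ell for phi_dot; valid only where sin(theta) != 0."""
    return ell / (m * l**2 * math.sin(theta)**2)

def v_eff(theta, ell):
    """Effective potential: angular-momentum barrier plus gravity."""
    barrier = ell**2 / (2.0 * m * l**2 * math.sin(theta)**2)
    return barrier + m * g * l * (1.0 - math.cos(theta))

def routhian(theta, theta_dot, ell):
    """R_ell = L(theta, theta_dot, psi) - ell * psi with psi the eliminated velocity."""
    psi = phi_dot_from_momentum(theta, ell)
    return lagrangian(theta, theta_dot, psi) - ell * psi

def reduced_lagrangian(theta, theta_dot, ell):
    """Kinetic energy of the theta-motion minus the effective potential."""
    return 0.5 * m * l**2 * theta_dot**2 - v_eff(theta, ell)

if __name__ == "__main__":
    for theta, theta_dot, ell in [(0.8, 0.3, 0.5), (2.0, -1.1, 1.7)]:
        print(routhian(theta, theta_dot, ell), reduced_lagrangian(theta, theta_dot, ell))
```

The subtraction of $\ell\dot\varphi$ matters: substituting $\dot\varphi$ directly into $L$ without it would give the barrier term with the opposite sign in the reduced Lagrangian.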
The chapter has separated three mechanisms that often appear together in mechanical systems. Holonomic constraints change the configuration space, non-holonomic constraints change the variational directions, and cyclic variables allow a momentum level to replace an ignorable coordinate. The cyclic-coordinate theorem is the coordinate-level form of Noether's theorem: invariance under translation in $q_n$ gives conservation of the conjugate momentum. The contrast between holonomic and non-holonomic constraints also points toward distribution integrability: an integrable distribution comes from tangent spaces to constraint submanifolds, while a genuinely non-integrable one must be handled through Lagrange-d'Alembert virtual displacements. In Hamiltonian mechanics these ideas reappear as constraint surfaces, momentum maps, and symplectic reduction.
The constrained Lagrangian picture naturally points toward momenta and phase space. Chapter 3 takes that step by passing from the tangent bundle to the cotangent bundle, where the Legendre transform converts velocity data into Hamiltonian dynamics.
# 3. Legendre Transform and Hamiltonian Formalism
The preceding chapters formulated mechanics on the tangent bundle $TQ$: a path $q(t)$ is selected by a variational principle, and its velocity $\dot q(t)$ appears in the Lagrangian $L(q, \dot q)$. This chapter passes from velocities to momenta and builds the Hamiltonian description of classical mechanics. The main questions are when the Euler--Lagrange dynamics on $TQ$ can be rewritten as a first-order flow on the cotangent bundle $T^*Q$, how the canonical symplectic form encodes Hamilton's equations, and why autonomous Hamiltonians are conserved. The prerequisites are the variational formulation from the previous chapter, basic differential forms on manifolds, and the local-coordinate description of tangent and cotangent bundles.
The Hamiltonian viewpoint also prepares tools used well outside point-particle mechanics. The same Legendre transform appears in thermodynamics when passing between energy, Helmholtz free energy, and Gibbs free energy; the same cotangent-bundle geometry underlies geometric optics through the eikonal equation; and the same first-order symplectic formulation is the entry point for semiclassical analysis and canonical quantisation.
## From Velocities to Momenta
The Euler--Lagrange equations are second-order equations for $q(t)$, but many structural features of mechanics become sharper after replacing velocity by momentum. The problem is to turn the derivative of $L$ with respect to the velocity variables into a geometric map $TQ \to T^*Q$.
[definition: Fiber Derivative]
Let $Q$ be a smooth manifold and let $L:TQ \to \mathbb R$ be smooth. The fiber derivative of $L$ is the map $\mathbb{F}L:TQ \to T^*Q$ defined by
\begin{align*}
(\mathbb{F}L)(v_q)(w_q) = \frac{d}{ds}\Big|_{s=0} L(v_q + s w_q),
\end{align*}
for $v_q,w_q \in T_qQ$.
[/definition]
The definition differentiates only along the vector-space fiber $T_qQ$, so the output is a covector at the same base point $q$. In local coordinates $(q_i,\dot q_i)$ on $TQ$ and the induced coordinates $(q_i,p_i)$ on $T^*Q$, the fiber derivative is the map $(q_i,\dot q_i)\mapsto(q_i,p_i)$ with
\begin{align*}
p_i = \frac{\partial L}{\partial \dot q_i}(q,\dot q).
\end{align*}
To use these momenta as replacement coordinates, the map from velocities to momenta must be locally invertible. This requirement leads to a condition on the second velocity derivatives of $L$.
[definition: Regular Lagrangian]
Let $L:TQ \to \mathbb R$ be smooth. The Lagrangian $L$ is regular if, in every local coordinate system, the fiber Hessian matrix
\begin{align*}
W_{ij}(q,\dot q)=\frac{\partial^2 L}{\partial \dot q_i\partial \dot q_j}(q,\dot q)
\end{align*}
is invertible at every point of $TQ$.
[/definition]
Regularity says that the momentum variables locally determine the velocities. This is the exact non-degeneracy condition needed for the passage from a second-order Lagrangian equation to a first-order Hamiltonian equation.
[example: Mechanical Lagrangian]
Let $Q=\mathbb R^n$ and let
\begin{align*}
L(q,\dot q)=\frac12 \dot q^\top M\dot q - V(q),
\end{align*}
where $M$ is symmetric positive definite and $V:\mathbb R^n\to\mathbb R$ is smooth. For a fiber direction $w\in T_q\mathbb R^n\simeq \mathbb R^n$, the fiber derivative is computed by varying only the velocity:
\begin{align*}
L(q,\dot q+sw)=\frac12(\dot q+sw)^\top M(\dot q+sw)-V(q).
\end{align*}
Expanding the quadratic term and using $M^\top=M$,
\begin{align*}
(\dot q+sw)^\top M(\dot q+sw)=\dot q^\top M\dot q+s\,w^\top M\dot q+s\,\dot q^\top Mw+s^2w^\top Mw.
\end{align*}
Since $w^\top M\dot q=(\dot q^\top Mw)^\top=\dot q^\top Mw$, this becomes
\begin{align*}
L(q,\dot q+sw)=\frac12\dot q^\top M\dot q+s\,\dot q^\top Mw+\frac12s^2w^\top Mw-V(q).
\end{align*}
Differentiating at $s=0$ gives
\begin{align*}
(\mathbb FL)(q,\dot q)(w)=\dot q^\top Mw=(M\dot q)\cdot w.
\end{align*}
Thus the covector corresponding to $(\mathbb FL)(q,\dot q)$ has coordinate vector
\begin{align*}
p=M\dot q.
\end{align*}
In coordinates, $p_i=\sum_{j=1}^n M_{ij}\dot q_j$, so
\begin{align*}
\frac{\partial p_i}{\partial \dot q_j}=M_{ij}.
\end{align*}
Hence the fiber Hessian matrix is $W=M$. Because $M$ is positive definite, $z^\top Mz>0$ for every nonzero $z$, so $Mz=0$ forces $z^\top Mz=0$ and therefore $z=0$; thus $M$ is invertible. The Lagrangian is regular, and the momentum equation $p=M\dot q$ determines the velocity uniquely as
\begin{align*}
\dot q=M^{-1}p.
\end{align*}
This is the standard kinetic-minus-potential case: the potential affects the force through $q$, while regularity of the velocity-to-momentum map is controlled entirely by the kinetic matrix $M$.
[/example]
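The identity $(\mathbb FL)(q,\dot q)(w)=(M\dot q)\cdot w$ can be checked numerically straight from the definition of the fiber derivative. The sketch below is illustrative only: the $2\times 2$ matrix, the potential, and the function names are assumed sample choices, and the derivative in $s$ is approximated by a central difference.

```python
M = [[2.0, 0.5],
     [0.5, 1.0]]  # symmetric positive definite sample matrix (assumption)

def potential(q):
    """Any smooth V works; this quadratic is an arbitrary choice."""
    return 0.5 * (q[0]**2 + q[1]**2)

def momentum(v):
    """p = M v."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def lagrangian(q, v):
    """L(q, v) = (1/2) v^T M v - V(q)."""
    p = momentum(v)
    return 0.5 * sum(v[i] * p[i] for i in range(2)) - potential(q)

def fiber_derivative(q, v, w, h=1e-5):
    """Approximate d/ds L(q, v + s w) at s = 0; only the velocity is varied."""
    v_plus = [vi + h * wi for vi, wi in zip(v, w)]
    v_minus = [vi - h * wi for vi, wi in zip(v, w)]
    return (lagrangian(q, v_plus) - lagrangian(q, v_minus)) / (2 * h)

def velocity_from_momentum(p):
    """Invert p = M v with the explicit 2x2 inverse."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * p[0] - M[0][1] * p[1]) / det,
            (-M[1][0] * p[0] + M[0][0] * p[1]) / det]

if __name__ == "__main__":
    q, v, w = [0.3, -0.4], [1.0, -2.0], [0.7, 0.2]
    p = momentum(v)
    print(fiber_derivative(q, v, w), sum(pi * wi for pi, wi in zip(p, w)))
    print(velocity_from_momentum(p))
```

Changing `potential` leaves both printed pairings unchanged, which reflects the remark that regularity is controlled entirely by the kinetic matrix.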
The preceding example captures the usual kinetic-minus-potential case, but the same construction allows velocity-linear terms. Once the momentum has replaced the velocity, we need a function on phase space whose derivatives reproduce the equations of motion.
[definition: Hamiltonian Associated to a Regular Lagrangian]
Let $L:TQ \to \mathbb R$ be regular, and suppose that $\mathbb{F}L:TQ \to T^*Q$ is a diffeomorphism onto its image. The Hamiltonian associated to $L$ is the function $H:\mathbb{F}L(TQ) \to \mathbb R$ defined by
\begin{align*}
H(q,p)=p(v_q)-L(v_q), \qquad (q,p)=\mathbb{F}L(v_q).
\end{align*}
[/definition]
The formula subtracts the Lagrangian from the velocity-momentum pairing. In coordinates, if $p_i=\partial L/\partial \dot q_i$ and $\dot q=\dot q(q,p)$ is obtained by inverting the fiber derivative, then
\begin{align*}
H(q,p)=\sum_{i=1}^n p_i\dot q_i(q,p)-L(q,\dot q(q,p)).
\end{align*}
The remaining issue is whether this new phase-space function is dynamically equivalent to the Euler--Lagrange equations. The answer is the Legendre transform equivalence theorem.
[quotetheorem:6840]
[citeproof:6840]
This theorem is the algebraic bridge between the two pictures of mechanics, but its hypotheses are doing real work. Regularity gives local invertibility of the velocity-to-momentum map, so the equations can be rewritten in nearby coordinates. The diffeomorphism-onto-image hypothesis is stronger: it rules out different velocities producing the same momentum on the region under consideration, which is the global invertibility needed to define $H$ as a single-valued function there. The open-image assumption then ensures that this image is a genuine phase-space manifold region on which the Hamiltonian vector field can be formed. Thus the theorem does not say that every Lagrangian gives a Hamiltonian system on all of $T^*Q$, nor that momentum coordinates remain globally independent when the fiber derivative has singularities, self-overlaps, or non-open image.
[example: Singular Lagrangian]
Let $Q=\mathbb R$ and let $L(q,\dot q)=V(q)$, so the Lagrangian has no dependence on the velocity coordinate $\dot q$. We compute the fiber derivative by varying only the velocity: for $w\in T_q\mathbb R\simeq \mathbb R$,
\begin{align*}
L(q,\dot q+sw)=V(q).
\end{align*}
The right-hand side is constant as a function of $s$, so
\begin{align*}
(\mathbb FL)(q,\dot q)(w)=\frac{d}{ds}\Big|_{s=0}V(q)=0.
\end{align*}
Thus $(\mathbb FL)(q,\dot q)$ is the zero covector at $q$ for every velocity $\dot q$, and in coordinates this says
\begin{align*}
p=\frac{\partial L}{\partial \dot q}=0.
\end{align*}
The fiber Hessian is the $1\times 1$ matrix
\begin{align*}
W(q,\dot q)=\frac{\partial^2 L}{\partial \dot q^2}(q,\dot q)=\frac{\partial}{\partial \dot q}(0)=0.
\end{align*}
This matrix is not invertible, since multiplication by $0$ sends every velocity variation to $0$. Equivalently, the momentum equation $p=0$ imposes no condition on $\dot q$: if $p=0$, then every $\dot q\in\mathbb R$ maps to that same momentum. Therefore the velocity cannot be recovered from the momentum. This is the basic failure pattern behind constrained Hamiltonian systems: the image of the fiber derivative is the constraint subset $\{(q,p)\in T^*\mathbb R:p=0\}$, not the full cotangent bundle with arbitrary $(q,p)$.
[/example]
The open-image assumption is also not cosmetic. Hamilton's equations use vector fields on a phase-space manifold, so the image of the Legendre transform must at least be a suitable open region, or else one must replace the ordinary Hamiltonian flow by a constrained or presymplectic formulation.
[example: Global Failure of Injectivity]
Let $Q=S^1$ with angular coordinate $q$, and write $v=\dot q$. For
\begin{align*}
L(q,v)=\frac14 v^4-\frac12 v^2,
\end{align*}
the fiber derivative is obtained by differentiating with respect to the velocity variable:
\begin{align*}
p=\frac{\partial L}{\partial v}(q,v)=\frac14\cdot 4v^3-\frac12\cdot 2v=v^3-v.
\end{align*}
Thus the velocity-to-momentum map on each fiber is the function $f(v)=v^3-v$. Its derivative is
\begin{align*}
f'(v)=3v^2-1.
\end{align*}
At any velocity with $3v^2-1\ne 0$, the *one-variable [inverse function theorem](/theorems/51)* gives a locally defined inverse from momentum back to velocity.
The same map is not globally injective. Indeed,
\begin{align*}
f(-1)=(-1)^3-(-1)=-1+1=0,
\end{align*}
\begin{align*}
f(0)=0^3-0=0,
\end{align*}
and
\begin{align*}
f(1)=1^3-1=0.
\end{align*}
So the three distinct velocities $v=-1,0,1$ all determine the same momentum $p=0$ at the same base point $q$.
The Legendre expression $pv-L(q,v)$ therefore depends on which velocity lying over $p=0$ is chosen. For $v=0$,
\begin{align*}
pv-L(q,v)=0\cdot 0-\left(\frac14\cdot 0^4-\frac12\cdot 0^2\right)=0.
\end{align*}
For $v=1$,
\begin{align*}
pv-L(q,v)=0\cdot 1-\left(\frac14\cdot 1^4-\frac12\cdot 1^2\right)=0-\left(\frac14-\frac12\right)=\frac14.
\end{align*}
For $v=-1$,
\begin{align*}
pv-L(q,v)=0\cdot (-1)-\left(\frac14\cdot (-1)^4-\frac12\cdot (-1)^2\right)=0-\left(\frac14-\frac12\right)=\frac14.
\end{align*}
Thus the same phase-space point with momentum $p=0$ receives different Legendre-transform values from different velocities. Without restricting to a single locally invertible branch of $v\mapsto v^3-v$, this Lagrangian does not determine a single-valued Hamiltonian on its full Legendre image.
[/example]
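The arithmetic of the preceding example can be replayed exactly with rational numbers. This is a small sketch with illustrative function names; it only re-evaluates the formulas $p=v^3-v$ and $pv-L$ at the three velocities already considered.

```python
from fractions import Fraction

def lagrangian(v):
    """L(q, v) = v^4/4 - v^2/2; there is no dependence on q."""
    return Fraction(1, 4) * v**4 - Fraction(1, 2) * v**2

def momentum(v):
    """p = dL/dv = v^3 - v."""
    return v**3 - v

def legendre_value(v):
    """The expression p v - L evaluated along the velocity v."""
    return momentum(v) * v - lagrangian(v)

velocities = [Fraction(-1), Fraction(0), Fraction(1)]
momenta = [momentum(v) for v in velocities]
values = [legendre_value(v) for v in velocities]

if __name__ == "__main__":
    for v, p, h in zip(velocities, momenta, values):
        print(f"v = {v}, p = {p}, p*v - L = {h}")
```

All three velocities give the same momentum while `values` contains two distinct numbers, which is exactly the obstruction to a single-valued Hamiltonian.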
The preceding example separates local invertibility on selected branches from global injectivity. A different failure occurs when the velocity-to-momentum map lands in a subset with boundary, so the resulting momenta do not form an ordinary open phase-space region.
[example: Non-Open Legendre Image]
Let $Q=\mathbb R$ and define
\begin{align*}
L(q,\dot q)=\frac13\dot q^3.
\end{align*}
For a fiber direction $w\in T_q\mathbb R\simeq \mathbb R$, varying only the velocity gives
\begin{align*}
L(q,\dot q+sw)=\frac13(\dot q+sw)^3.
\end{align*}
Expanding the cubic,
\begin{align*}
(\dot q+sw)^3=\dot q^3+3s\dot q^2w+3s^2\dot q w^2+s^3w^3.
\end{align*}
Hence
\begin{align*}
L(q,\dot q+sw)=\frac13\dot q^3+s\dot q^2w+s^2\dot q w^2+\frac13s^3w^3.
\end{align*}
Differentiating with respect to $s$ at $s=0$ gives
\begin{align*}
(\mathbb FL)(q,\dot q)(w)=\dot q^2w.
\end{align*}
Therefore the covector $(\mathbb FL)(q,\dot q)$ has coordinate
\begin{align*}
p=\frac{\partial L}{\partial \dot q}=\dot q^2.
\end{align*}
Since $\dot q^2\ge 0$ for every $\dot q\in\mathbb R$, every point in the image satisfies $p\ge 0$. Conversely, if $p\ge 0$, then choosing $\dot q=\sqrt p$ gives $\dot q^2=p$, so every momentum with $p\ge 0$ occurs. Thus
\begin{align*}
\mathbb FL(T\mathbb R)=\{(q,p)\in T^*\mathbb R:p\ge 0\}.
\end{align*}
The fiber Hessian is
\begin{align*}
W(q,\dot q)=\frac{\partial^2 L}{\partial \dot q^2}=\frac{\partial}{\partial \dot q}(\dot q^2)=2\dot q.
\end{align*}
At the boundary point $p=0$, the equation $p=\dot q^2$ forces $\dot q=0$, so
\begin{align*}
W(q,0)=2\cdot 0=0.
\end{align*}
Thus the Legendre image is the closed half-space $p\ge 0$, with boundary $p=0$, and the fiber Hessian degenerates exactly at that boundary. The failure is not merely global: near $p=0$ the image is not an open phase-space region of $T^*\mathbb R$, so the usual Hamiltonian vector-field construction on an open cotangent-bundle domain does not apply there without a constrained or boundary-sensitive formulation.
[/example]
## Canonical Symplectic Geometry and Hamilton's Equations
After passing to $T^*Q$, the next problem is to describe dynamics without referring to any chosen Lagrangian. The cotangent bundle has a canonical one-form and symplectic form, and a Hamiltonian function determines a vector field through this symplectic structure.
[definition: Canonical One-Form]
Let $\pi:T^*Q \to Q$ be the cotangent projection. The canonical one-form is the smooth one-form $\theta \in \Omega^1(T^*Q)$ defined pointwise by
\begin{align*}
\theta_{\alpha_q}:T_{\alpha_q}(T^*Q) \to \mathbb R, \qquad
\theta_{\alpha_q}(V)=\alpha_q(d\pi_{\alpha_q}(V)),
\end{align*}
for $\alpha_q \in T_q^*Q$ and $V \in T_{\alpha_q}(T^*Q)$.
[/definition]
In canonical coordinates $(q_i,p_i)$, this becomes
\begin{align*}
\theta=\sum_{i=1}^n p_i\,dq_i.
\end{align*}
The canonical one-form measures the base displacement of a tangent vector against the covector sitting above that base point. A one-form cannot pair two tangent directions or identify differentials with vector fields, so it is not yet the geometric object needed for Hamiltonian dynamics.
To build Hamiltonian dynamics intrinsically, the cotangent bundle needs a two-form that can compare two tangent directions at once. Taking the [exterior derivative](/theorems/1525) of the canonical one-form produces exactly this object: a closed, non-degenerate form that will convert differentials of Hamiltonians into vector fields.
[definition: Canonical Symplectic Form]
The canonical symplectic form on $T^*Q$ is the smooth two-form $\omega \in \Omega^2(T^*Q)$ defined by
\begin{align*}
\omega=-d\theta.
\end{align*}
In canonical coordinates $(q_i,p_i)$,
\begin{align*}
\omega=\sum_{i=1}^n dq_i\wedge dp_i.
\end{align*}
[/definition]
Equivalently, for each $\alpha_q \in T^*Q$, the value $\omega_{\alpha_q}$ is an alternating bilinear map
\begin{align*}
\omega_{\alpha_q}:T_{\alpha_q}(T^*Q)\times T_{\alpha_q}(T^*Q) \to \mathbb R.
\end{align*}
The form $\omega$ is closed because exterior differentiation squares to zero, and it is non-degenerate in canonical coordinates. Non-degeneracy is what lets a differential $dH$ be converted into a vector field, so the next object is the vector field determined by a Hamiltonian.
[definition: Hamiltonian Vector Field]
Let $(M,\omega)$ be a symplectic manifold and let $H:M \to \mathbb R$ be smooth. The Hamiltonian vector field is the smooth section $X_H:M\to TM$ satisfying
\begin{align*}
\omega(X_H,Y)=dH(Y)
\end{align*}
for every vector field $Y:M\to TM$ on $M$.
[/definition]
This definition turns the scalar function $H$ into dynamics, but the equation $\omega(X_H,Y)=dH(Y)$ is implicit: it specifies $X_H$ only through its pairing with every test vector field $Y$. Before any trajectory can be computed, one must solve this equation for the components of $X_H$ in a chart.
The coordinate question is therefore precise: if $H$ is written as a function of canonical position and momentum variables, which derivatives of $H$ give the components of $X_H$? In canonical cotangent coordinates the Darboux form of $\omega$ makes the answer explicit, and the following result translates the intrinsic defining equation into the standard signed system for $\dot q_i$ and $\dot p_i$.
[quotetheorem:6841]
[citeproof:6841]
The symplectic derivation shows that Hamilton's equations are coordinate expressions of a geometric construction. The canonical-coordinate hypothesis is part of the statement, not a cosmetic choice: the formulas above use a chart in which the canonical form has the Darboux expression $\omega=\sum_i dq_i\wedge dp_i$ with $q_i$ and $p_i$ paired. In a non-canonical chart, the same vector field is still defined by $\omega(X_H,Y)=dH(Y)$, but the displayed signs and pairings may be replaced by the coordinate expression of the transformed two-form. Thus the theorem gives the local coordinate form of Hamiltonian dynamics on each canonical cotangent chart, while the global object is the vector field $X_H$.
[example: Scaling to a Non-Canonical Chart]
On $T^*\mathbb R$ with canonical coordinates $(q,p)$, introduce coordinates $Q=q$ and $P=2p$. Since $q=Q$ and $p=P/2$, their differentials satisfy $dq=dQ$ and $dp=\frac12\,dP$. Therefore the original canonical symplectic form becomes
\begin{align*}
\omega=dq\wedge dp=dQ\wedge \left(\frac12\,dP\right)=\frac12\,dQ\wedge dP.
\end{align*}
Write the Hamiltonian in the new coordinates as $K(Q,P)=H(Q,P/2)$. Let
\begin{align*}
X_K=a\,\partial_Q+b\,\partial_P
\end{align*}
and test it against an arbitrary vector field
\begin{align*}
Y=c\,\partial_Q+d\,\partial_P.
\end{align*}
The wedge product satisfies
\begin{align*}
(dQ\wedge dP)(X_K,Y)=dQ(X_K)dP(Y)-dP(X_K)dQ(Y)=ad-bc.
\end{align*}
Hence
\begin{align*}
\omega(X_K,Y)=\frac12(ad-bc).
\end{align*}
On the other hand,
\begin{align*}
dK(Y)=\frac{\partial K}{\partial Q}c+\frac{\partial K}{\partial P}d.
\end{align*}
The defining equation $\omega(X_K,Y)=dK(Y)$ for every choice of $c$ and $d$ gives
\begin{align*}
\frac12 a=\frac{\partial K}{\partial P}
\end{align*}
and
\begin{align*}
-\frac12 b=\frac{\partial K}{\partial Q}.
\end{align*}
Thus
\begin{align*}
a=2\frac{\partial K}{\partial P}
\end{align*}
and
\begin{align*}
b=-2\frac{\partial K}{\partial Q}.
\end{align*}
Along an integral curve of $X_K$, this means
\begin{align*}
\dot Q=2\frac{\partial K}{\partial P}
\end{align*}
and
\begin{align*}
\dot P=-2\frac{\partial K}{\partial Q}.
\end{align*}
The factor $2$ appears because the transformed coordinates satisfy $\omega=\frac12\,dQ\wedge dP$, so $(Q,P)$ is a legitimate coordinate chart but not a canonical coordinate pair for the original symplectic form.
[/example]
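The factor $2$ can be confirmed numerically by comparing the flow velocity in both charts. The sketch below assumes a sample Hamiltonian $H(q,p)=\frac12p^2+\frac14q^4$, which is not taken from the text, and approximates partial derivatives by central differences. Since $Q=q$ and $P=2p$, the expected relations are $\dot Q=\dot q$ and $\dot P=2\dot p$.

```python
def H(q, p):
    """Sample Hamiltonian in canonical coordinates (an arbitrary illustrative choice)."""
    return 0.5 * p**2 + 0.25 * q**4

def K(Q, P):
    """The same function written in the scaled chart Q = q, P = 2p."""
    return H(Q, P / 2.0)

def partial(f, x, y, index, h=1e-6):
    """Central-difference partial derivative of f(x, y) in slot `index` (0 or 1)."""
    if index == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def canonical_velocity(q, p):
    """(q_dot, p_dot) from Hamilton's equations in the canonical chart."""
    return partial(H, q, p, 1), -partial(H, q, p, 0)

def scaled_chart_velocity(Q, P):
    """(Q_dot, P_dot) from the equations derived for omega = (1/2) dQ ^ dP."""
    return 2.0 * partial(K, Q, P, 1), -2.0 * partial(K, Q, P, 0)

if __name__ == "__main__":
    q, p = 0.7, -1.3
    print(canonical_velocity(q, p))
    print(scaled_chart_velocity(q, 2.0 * p))
```

Dropping the factors of $2$ in `scaled_chart_velocity` would give velocities that disagree with the canonical chart, which is the practical meaning of the chart being non-canonical.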
The non-degeneracy of $\omega$ is essential: if a closed two-form is degenerate, then the equation $\omega(X,Y)=dH(Y)$ may have no solution or many solutions. For instance, if $\omega=0$ on $\mathbb R^2$ and $H$ is non-constant, then the left-hand side vanishes identically while $dH\neq 0$ somewhere, so no vector field $X$ solves the equation; if $H$ is constant, every vector field satisfies it.
The coordinate formula is also only local in meaning. A canonical coordinate chart gives the displayed equations on that chart, but the equations alone do not identify a global Hamiltonian system unless the underlying symplectic form and Hamiltonian function agree on overlaps. This point matters later when canonical transformations and symmetries are introduced: the form $\omega$, not a particular coordinate chart, is the object that must be preserved.
[example: Harmonic Oscillator]
Let $Q=\mathbb R$ and let $m,k>0$. For the Hamiltonian
\begin{align*}
H(q,p)=\frac{p^2}{2m}+\frac12 kq^2,
\end{align*}
the partial derivatives are
\begin{align*}
\frac{\partial H}{\partial p}=\frac{1}{2m}\cdot 2p=\frac{p}{m}
\end{align*}
and
\begin{align*}
\frac{\partial H}{\partial q}=\frac12 k\cdot 2q=kq.
\end{align*}
By *Hamilton Equations in Canonical Coordinates*, the integral curves satisfy
\begin{align*}
\dot q=\frac{\partial H}{\partial p}=\frac{p}{m}
\end{align*}
and
\begin{align*}
\dot p=-\frac{\partial H}{\partial q}=-kq.
\end{align*}
Differentiating $\dot q=p/m$ with respect to time gives
\begin{align*}
\ddot q=\frac{1}{m}\dot p.
\end{align*}
Substituting $\dot p=-kq$ yields
\begin{align*}
\ddot q=\frac{1}{m}(-kq)=-\frac{k}{m}q.
\end{align*}
Equivalently,
\begin{align*}
\ddot q+\frac{k}{m}q=0.
\end{align*}
Along any solution of the displayed first-order system,
\begin{align*}
\frac{d}{dt}H(q(t),p(t))
=\frac{\partial H}{\partial q}\dot q+\frac{\partial H}{\partial p}\dot p
=(kq)\frac{p}{m}+\frac{p}{m}(-kq)=0.
\end{align*}
Thus the Hamiltonian has a constant value $E$ on each trajectory, and the phase curve lies on
\begin{align*}
\frac{p^2}{2m}+\frac12 kq^2=E.
\end{align*}
For $E>0$, dividing by $E$ gives
\begin{align*}
\frac{p^2}{2mE}+\frac{q^2}{2E/k}=1,
\end{align*}
which is an ellipse in the $(q,p)$-plane. The case $E=0$ gives $p=0$ and $q=0$, so the origin is the equilibrium trajectory. Thus the Hamiltonian first-order system encodes the usual harmonic oscillator equation, and its constant-energy sets are the oscillator phase curves.
[/example]
[illustration:harmonic-oscillator-phase-portrait]
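The first-order system $\dot q=p/m$, $\dot p=-kq$ is also the form in which trajectories are usually computed. As a rough sketch, the code below integrates it with the kick-drift-kick leapfrog scheme, a standard choice that is not discussed in the text, and compares the result with the closed-form solution $q(t)=q_0\cos\omega t+\frac{p_0}{m\omega}\sin\omega t$, $\omega=\sqrt{k/m}$. The parameter values, step size, and function names are assumptions.

```python
import math

def hamiltonian(q, p, m, k):
    """H(q, p) = p^2 / (2m) + k q^2 / 2."""
    return p**2 / (2.0 * m) + 0.5 * k * q**2

def leapfrog(q, p, m, k, dt, steps):
    """Integrate q' = p/m, p' = -k q with kick-drift-kick leapfrog steps."""
    for _ in range(steps):
        p -= 0.5 * dt * k * q   # half kick:  p' = -dH/dq
        q += dt * p / m         # full drift: q' =  dH/dp
        p -= 0.5 * dt * k * q   # half kick with the updated position
    return q, p

m, k = 1.0, 4.0          # illustrative mass and spring constant
q0, p0 = 1.0, 0.0        # initial condition
dt, steps = 0.001, 5000  # total time t = 5

q1, p1 = leapfrog(q0, p0, m, k, dt, steps)

omega = math.sqrt(k / m)
t = dt * steps
q_exact = q0 * math.cos(omega * t) + p0 / (m * omega) * math.sin(omega * t)

energy_start = hamiltonian(q0, p0, m, k)
energy_end = hamiltonian(q1, p1, m, k)

if __name__ == "__main__":
    print("q numeric vs exact:", q1, q_exact)
    print("energy drift:", energy_end - energy_start)
```

The numerical energy is not exactly constant, but for this scheme and step size it stays close to the initial value, so the computed points remain near the ellipse $H=E$.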
The harmonic oscillator illustrates that Hamiltonian dynamics is naturally first-order, even when it represents a familiar second-order Newton equation. More complicated force laws are handled by changing the Hamiltonian rather than changing the symplectic form.
[example: Kepler Hamiltonian]
Let $Q=\mathbb R^3_0=\mathbb R^3\setminus\{0\}$, write $q=(q_1,q_2,q_3)$ and $p=(p_1,p_2,p_3)$, and consider
\begin{align*}
H(q,p)=\frac{|p|^2}{2m}-\frac{\mu m}{|q|}
\end{align*}
with $m,\mu>0$. Since
\begin{align*}
|p|^2=p_1^2+p_2^2+p_3^2,
\end{align*}
we have, for each $i=1,2,3$,
\begin{align*}
\frac{\partial H}{\partial p_i}=\frac{1}{2m}\frac{\partial}{\partial p_i}(p_1^2+p_2^2+p_3^2)=\frac{p_i}{m}.
\end{align*}
Also,
\begin{align*}
|q|^{-1}=(q_1^2+q_2^2+q_3^2)^{-1/2},
\end{align*}
so the chain rule gives
\begin{align*}
\frac{\partial}{\partial q_i}|q|^{-1}=-\frac12(q_1^2+q_2^2+q_3^2)^{-3/2}\cdot 2q_i=-\frac{q_i}{|q|^3}.
\end{align*}
Therefore
\begin{align*}
\frac{\partial H}{\partial q_i}=-\mu m\frac{\partial}{\partial q_i}|q|^{-1}=\frac{\mu m q_i}{|q|^3}.
\end{align*}
By *Hamilton Equations in Canonical Coordinates*, the integral curves satisfy
\begin{align*}
\dot q_i=\frac{\partial H}{\partial p_i}=\frac{p_i}{m}
\end{align*}
and
\begin{align*}
\dot p_i=-\frac{\partial H}{\partial q_i}=-\frac{\mu m q_i}{|q|^3}.
\end{align*}
In vector form this is
\begin{align*}
\dot q=\frac{p}{m}
\end{align*}
and
\begin{align*}
\dot p=-\frac{\mu m q}{|q|^3}.
\end{align*}
Differentiating $\dot q_i=p_i/m$ with respect to time gives
\begin{align*}
\ddot q_i=\frac{1}{m}\dot p_i.
\end{align*}
Substituting the equation for $\dot p_i$ gives
\begin{align*}
\ddot q_i=\frac{1}{m}\left(-\frac{\mu m q_i}{|q|^3}\right)=-\frac{\mu q_i}{|q|^3}.
\end{align*}
Multiplying by $m$ and collecting the three coordinate equations yields
\begin{align*}
m\ddot q=-\frac{\mu m q}{|q|^3}.
\end{align*}
The point $q=0$ is excluded because the term $|q|^{-1}$ is not defined there. Along every Hamiltonian trajectory, the chain rule and the displayed Hamilton equations give
\begin{align*}
\frac{d}{dt}H(q(t),p(t))
=\sum_{i=1}^3\frac{\partial H}{\partial q_i}\dot q_i+\sum_{i=1}^3\frac{\partial H}{\partial p_i}\dot p_i
=\sum_{i=1}^3\frac{\mu m q_i}{|q|^3}\frac{p_i}{m}
+\sum_{i=1}^3\frac{p_i}{m}\left(-\frac{\mu m q_i}{|q|^3}\right)=0.
\end{align*}
Hence $H(q(t),p(t))$ is constant along the motion.
[/example]
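The chain-rule computation of $\partial H/\partial q_i$ is the step most prone to sign slips, so a numerical cross-check can be useful. The sketch below is only an illustration: it compares the closed form $\mu m q_i/|q|^3$ from the example with central finite differences of $H$. The function names, the sample point, and the parameter values are assumptions.

```python
import math


def kepler_hamiltonian(q, p, m, mu):
    """H(q, p) = |p|^2 / (2m) - mu * m / |q| for q != 0."""
    r = math.sqrt(sum(x * x for x in q))
    return sum(x * x for x in p) / (2.0 * m) - mu * m / r


def analytic_dH_dq(q, m, mu):
    """Closed form from the example: dH/dq_i = mu * m * q_i / |q|^3."""
    r = math.sqrt(sum(x * x for x in q))
    return [mu * m * x / r**3 for x in q]


def numeric_dH_dq(q, p, m, mu, h=1e-6):
    """Central finite differences of H in each coordinate q_i."""
    grad = []
    for i in range(3):
        q_plus, q_minus = list(q), list(q)
        q_plus[i] += h
        q_minus[i] -= h
        diff = (kepler_hamiltonian(q_plus, p, m, mu)
                - kepler_hamiltonian(q_minus, p, m, mu))
        grad.append(diff / (2.0 * h))
    return grad


# Sample point and parameters are arbitrary illustrative choices.
q, p, m, mu = [1.0, 2.0, -0.5], [0.3, -0.1, 0.7], 2.0, 1.5
exact = analytic_dH_dq(q, m, mu)
approx = numeric_dH_dq(q, p, m, mu)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
print(max_error)
```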
## Energy and Autonomous Hamiltonian Dynamics
The final question in this chapter is what quantity remains constant during motion. In the Lagrangian picture the relevant function is the energy, while in the Hamiltonian picture it is the Hamiltonian itself whenever there is no explicit time dependence.
[definition: Lagrangian Energy]
Let $L:TQ \to \mathbb R$ be smooth. The Lagrangian energy is the function $E_L:TQ \to \mathbb R$ defined by
\begin{align*}
E_L(v_q)=(\mathbb{F}L(v_q))(v_q)-L(v_q).
\end{align*}
[/definition]
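As a concrete instance of this definition (a worked illustration, under the added assumptions that $Q=\mathbb R$ and that $L$ has the standard mechanical form with a smooth potential $V$), take $L(q,\dot q)=\frac12 m\dot q^2-V(q)$. In coordinates the fibre derivative pairs with the velocity as $(\mathbb FL(v_q))(v_q)=\frac{\partial L}{\partial \dot q}\dot q$, so
\begin{align*}
E_L(q,\dot q)=(m\dot q)\dot q-\left(\frac12 m\dot q^2-V(q)\right)=\frac12 m\dot q^2+V(q),
\end{align*}
which is kinetic plus potential energy. The pairing term contributes twice the kinetic energy and nothing from $V$, which is why the potential enters $E_L$ with the opposite sign to its sign in $L$.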
For a regular Lagrangian, the Hamiltonian associated to $L$ is precisely the Lagrangian energy transported through the Legendre transform. That is,
\begin{align*}
H\circ \mathbb{F}L=E_L.
\end{align*}
This identity explains why the Hamiltonian is interpreted as energy in autonomous mechanical systems, but conservation still needs a separate argument from the Hamiltonian flow.
[quotetheorem:6842]
[citeproof:6842]
This conservation argument is short because conservation is built into the symplectic definition of the vector field. The two-form structure matters more than the notation suggests: alternation gives $\omega(X_H,X_H)=0$ for every vector field $X_H$. If instead the dynamics were defined using a non-alternating bilinear form $g$ by $g(X,Y)=dH(Y)$, then the derivative along the resulting vector field would be $dH(X)=g(X,X)$, which need not vanish. Conservation is therefore a symplectic consequence, not a generic consequence of representing $dH$ by some bilinear pairing.
The autonomy hypothesis is equally essential. If a Hamiltonian depends explicitly on time, say
\begin{align*}
H_t(q,p)=\frac{p^2}{2m}+\frac12 k(t)q^2,
\end{align*}
then along a solution the total derivative includes the extra term $\partial H_t/\partial t=k'(t)q^2/2$, so the Hamiltonian need not be conserved. Thus conservation here is not a numerical accident of the coordinate equations; it is the autonomous symplectic pairing $dH(X_H)=\omega(X_H,X_H)$ that forces the derivative to vanish.
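To make the extra term explicit, here is a short worked computation that uses only the displayed $H_t$ and the corresponding Hamilton equations $\dot q=p/m$, $\dot p=-k(t)q$. Along a solution,
\begin{align*}
\frac{d}{dt}H_t(q(t),p(t))
&=\frac{\partial H_t}{\partial q}\dot q+\frac{\partial H_t}{\partial p}\dot p+\frac{\partial H_t}{\partial t}\\
&=k(t)q\,\frac{p}{m}+\frac{p}{m}\bigl(-k(t)q\bigr)+\frac12 k'(t)q^2
=\frac12 k'(t)q^2.
\end{align*}
The first two terms cancel exactly as in the autonomous case, so the energy changes at a rate set entirely by how fast the spring constant varies.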
[example: Charged Particle in a Magnetic Vector Potential]
Let $Q=\mathbb R^3$, let $A=(A_1,A_2,A_3):\mathbb R^3\to\mathbb R^3$ be smooth, and let $V:\mathbb R^3\to\mathbb R$ be smooth. For
\begin{align*}
L(q,\dot q)=\frac12 m|\dot q|^2+eA(q)\cdot \dot q-eV(q),
\end{align*}
we compute the momentum by differentiating with respect to the velocity coordinates. Since
\begin{align*}
|\dot q|^2=\sum_{j=1}^3 \dot q_j^2
\end{align*}
and
\begin{align*}
A(q)\cdot \dot q=\sum_{j=1}^3 A_j(q)\dot q_j,
\end{align*}
we have
\begin{align*}
p_i=\frac{\partial L}{\partial \dot q_i}=\frac12 m\cdot 2\dot q_i+eA_i(q)=m\dot q_i+eA_i(q).
\end{align*}
Thus
\begin{align*}
p=m\dot q+eA(q),
\end{align*}
so the velocity is recovered from the canonical momentum by
\begin{align*}
\dot q=\frac{1}{m}(p-eA(q)).
\end{align*}
The Legendre expression is
\begin{align*}
H(q,p)=p\cdot \dot q-L(q,\dot q).
\end{align*}
Using $p=m\dot q+eA(q)$, the first term becomes
\begin{align*}
p\cdot \dot q=(m\dot q+eA(q))\cdot \dot q=m|\dot q|^2+eA(q)\cdot \dot q.
\end{align*}
Therefore
\begin{align*}
H(q,p)=m|\dot q|^2+eA(q)\cdot \dot q-\left(\frac12 m|\dot q|^2+eA(q)\cdot \dot q-eV(q)\right).
\end{align*}
Cancelling the velocity-linear terms gives
\begin{align*}
H(q,p)=\frac12 m|\dot q|^2+eV(q).
\end{align*}
Substituting $\dot q=(p-eA(q))/m$ gives
\begin{align*}
H(q,p)=\frac12 m\left|\frac{p-eA(q)}{m}\right|^2+eV(q)=\frac{1}{2m}|p-eA(q)|^2+eV(q).
\end{align*}
Now set
\begin{align*}
u=p-eA(q).
\end{align*}
Then
\begin{align*}
H(q,p)=\frac{1}{2m}\sum_{j=1}^3 u_j^2+eV(q).
\end{align*}
For each $i$,
\begin{align*}
\frac{\partial H}{\partial p_i}=\frac{1}{2m}\cdot 2u_i\frac{\partial u_i}{\partial p_i}=\frac{u_i}{m},
\end{align*}
because $\partial u_i/\partial p_i=1$ and $\partial u_j/\partial p_i=0$ for $j\ne i$. Also,
\begin{align*}
\frac{\partial H}{\partial q_i}=\frac{1}{2m}\sum_{j=1}^3 2u_j\frac{\partial u_j}{\partial q_i}+e\frac{\partial V}{\partial q_i}.
\end{align*}
Since $u_j=p_j-eA_j(q)$,
\begin{align*}
\frac{\partial u_j}{\partial q_i}=-e\frac{\partial A_j}{\partial q_i}.
\end{align*}
Hence
\begin{align*}
\frac{\partial H}{\partial q_i}=-\frac{e}{m}\sum_{j=1}^3 u_j\frac{\partial A_j}{\partial q_i}+e\frac{\partial V}{\partial q_i}.
\end{align*}
By *Hamilton Equations in Canonical Coordinates*, the integral curves satisfy
\begin{align*}
\dot q_i=\frac{u_i}{m}=\frac{p_i-eA_i(q)}{m}
\end{align*}
and
\begin{align*}
\dot p_i=\frac{e}{m}\sum_{j=1}^3 u_j\frac{\partial A_j}{\partial q_i}-e\frac{\partial V}{\partial q_i}.
\end{align*}
Because $u_j=m\dot q_j$, the momentum equation becomes
\begin{align*}
\dot p_i=e\sum_{j=1}^3 \dot q_j\frac{\partial A_j}{\partial q_i}-e\frac{\partial V}{\partial q_i}.
\end{align*}
On the other hand, differentiating $p_i=m\dot q_i+eA_i(q)$ along the curve gives
\begin{align*}
\dot p_i=m\ddot q_i+e\sum_{j=1}^3 \frac{\partial A_i}{\partial q_j}\dot q_j.
\end{align*}
Equating the two formulas for $\dot p_i$ gives
\begin{align*}
m\ddot q_i+e\sum_{j=1}^3 \frac{\partial A_i}{\partial q_j}\dot q_j=e\sum_{j=1}^3 \dot q_j\frac{\partial A_j}{\partial q_i}-e\frac{\partial V}{\partial q_i}.
\end{align*}
Moving the $A_i$-derivative terms to the right gives
\begin{align*}
m\ddot q_i=e\sum_{j=1}^3 \dot q_j\left(\frac{\partial A_j}{\partial q_i}-\frac{\partial A_i}{\partial q_j}\right)-e\frac{\partial V}{\partial q_i}.
\end{align*}
If $B=\nabla\times A$, then the coordinate identity
\begin{align*}
(\dot q\times B)_i=\sum_{j=1}^3 \dot q_j\left(\frac{\partial A_j}{\partial q_i}-\frac{\partial A_i}{\partial q_j}\right)
\end{align*}
rewrites the equation as
\begin{align*}
m\ddot q=e\dot q\times B-e\nabla V.
\end{align*}
Thus the Hamiltonian equations recover the Lorentz force law with electric field $-\nabla V$ and magnetic field $B=\nabla\times A$. The canonical momentum is $p=m\dot q+eA(q)$, while the mechanical momentum is $m\dot q=p-eA(q)$, so the two coincide only when the vector-potential term vanishes.
[/example]
The magnetic example is a useful warning for later symplectic reduction and gauge symmetry. The cotangent coordinates remain canonical, but the physical velocity is $(p-eA(q))/m$ rather than $p/m$.
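The distinction between canonical and mechanical momentum can be seen in a small numerical experiment. The sketch below is an illustration under several assumptions that are not in the text: the motion is restricted to the plane, $V=0$, the field is uniform with $A(x,y)=(-B_0y/2,\,B_0x/2)$, and a classical Runge-Kutta step is used as a generic integrator. It integrates Hamilton's equations for $(q,p)$ and checks that the orbit closes after one cyclotron period $2\pi m/(eB_0)$ while the mechanical speed $|p-eA|/m$ stays constant.

```python
import math


def hamilton_rhs(state, e, m, b0):
    """Hamilton's equations for H = |p - eA|^2 / (2m) in the plane,
    with A(x, y) = (-b0*y/2, b0*x/2), so that curl A = (0, 0, b0)."""
    x, y, px, py = state
    ux = px - e * (-0.5 * b0 * y)   # u = p - eA = m * velocity
    uy = py - e * (0.5 * b0 * x)
    # dp_i/dt = (e/m) * sum_j u_j * dA_j/dq_i; only dA_y/dx and dA_x/dy are nonzero.
    dpx = (e / m) * uy * (0.5 * b0)
    dpy = (e / m) * ux * (-0.5 * b0)
    return (ux / m, uy / m, dpx, dpy)


def rk4_step(state, dt, e, m, b0):
    """Classical fourth-order Runge-Kutta step."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = hamilton_rhs(state, e, m, b0)
    k2 = hamilton_rhs(shifted(state, k1, 0.5 * dt), e, m, b0)
    k3 = hamilton_rhs(shifted(state, k2, 0.5 * dt), e, m, b0)
    k4 = hamilton_rhs(shifted(state, k3, dt), e, m, b0)
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def mechanical_speed(state, e, m, b0):
    """|p - eA(q)| / m, the speed of the particle."""
    x, y, px, py = state
    return math.hypot(px + 0.5 * e * b0 * y, py - 0.5 * e * b0 * x) / m


e, m, b0 = 1.0, 1.0, 1.0          # illustrative values (assumptions)
# Start at the origin with velocity (1, 0); there A = 0, so p = m * velocity.
state = (0.0, 0.0, 1.0, 0.0)
speed0 = mechanical_speed(state, e, m, b0)
steps = 2000
dt = (2.0 * math.pi * m / (e * b0)) / steps   # one cyclotron period in total
for _ in range(steps):
    state = rk4_step(state, dt, e, m, b0)
print(state[0], state[1], mechanical_speed(state, e, m, b0) - speed0)
```

Along the orbit the canonical momentum $p$ changes in a way that depends on the chosen potential $A$, whereas the speed computed from $p-eA(q)$ does not.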
Hamiltonian mechanics is most transparent once the underlying linear algebra is understood. Chapter 4 isolates the symplectic structure behind phase space, showing how canonical coordinates, skew pairings, and Poisson brackets organize the equations of motion.
# 4. Symplectic Linear Algebra and Canonical Phase Space
Building on the Legendre transform and Hamiltonian formalism of Chapter 3, this chapter isolates the symplectic linear algebra behind phase-space mechanics. The central issue is that phase space is not just a space of positions and momenta: it carries a skew pairing that determines Hamilton's equations, canonical transformations, and conserved brackets. We first study this structure in linear algebra, then see how the cotangent bundle $T^*Q$ carries it without making coordinate choices.
## Symplectic Vector Spaces and Darboux Bases
What kind of linear structure is needed to turn a derivative $dH$ of an energy function into a vector field? A Riemannian metric would identify covectors and vectors by a symmetric pairing, but Hamiltonian mechanics uses a skew pairing instead. The nondegeneracy of this skew pairing is what allows every covector to determine a unique vector.
[definition: Symplectic Vector Space]
A symplectic [vector space](/page/Vector%20Space) is a finite-dimensional real vector space $V$ together with a bilinear map $\omega: V \times V \to \mathbb R$ such that:
1. $\omega(u,v) = -\omega(v,u)$ for all $u,v \in V$;
2. if $\omega(u,v)=0$ for all $v \in V$, then $u=0$.
[/definition]
The first condition says that $\omega$ measures oriented two-dimensional area rather than length. The second condition says that no nonzero vector is invisible to the pairing, so the map $V \to V^*$ given by $u \mapsto \omega(u,\cdot)$ is injective and hence, by finite-dimensionality, an isomorphism. To compute with this abstract structure, the next example gives the model to which every finite-dimensional symplectic vector space turns out to be linearly isomorphic.
[example: Standard Symplectic Space]
Let $V=\mathbb R^{2n}$ with coordinates $(q_1,\dots,q_n,p_1,\dots,p_n)$, and write a vector as $u=(a,b)$ with $a,b\in\mathbb R^n$, where $a$ is the $q$-part and $b$ is the $p$-part. Define
\begin{align*}
\omega_0=\sum_{i=1}^n dq_i\wedge dp_i.
\end{align*}
For $u=(a,b)$ and $v=(c,d)$, the coordinate one-forms satisfy $dq_i(u)=a_i$, $dp_i(u)=b_i$, $dq_i(v)=c_i$, and $dp_i(v)=d_i$. Hence
\begin{align*}
(dq_i\wedge dp_i)(u,v)=dq_i(u)dp_i(v)-dq_i(v)dp_i(u)=a_i d_i-c_i b_i.
\end{align*}
Summing over $i$ gives
\begin{align*}
\omega_0(u,v)=\sum_{i=1}^n(a_i d_i-c_i b_i)=a\cdot d-c\cdot b.
\end{align*}
The same formula with $u$ and $v$ interchanged gives
\begin{align*}
\omega_0(v,u)=c\cdot b-a\cdot d=-(a\cdot d-c\cdot b)=-\omega_0(u,v),
\end{align*}
so $\omega_0$ is skew-symmetric.
It remains to check nondegeneracy. Suppose $\omega_0(u,v)=0$ for every $v=(c,d)$. Taking $v=(0,d)$ gives
\begin{align*}
0=\omega_0((a,b),(0,d))=a\cdot d.
\end{align*}
Since this holds for every $d\in\mathbb R^n$, choosing $d=a$ gives $a\cdot a=0$, so $a=0$. Taking $v=(c,0)$ gives
\begin{align*}
0=\omega_0((a,b),(c,0))=-c\cdot b.
\end{align*}
Since this holds for every $c\in\mathbb R^n$, choosing $c=b$ gives $-b\cdot b=0$, so $b=0$. Thus $u=(0,0)$, and $\omega_0$ is nondegenerate. This is the standard model of a symplectic vector space: the $q$-directions and $p$-directions pair with each other, while neither family pairs with itself.
[/example]
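The formula $\omega_0(u,v)=a\cdot d-c\cdot b$ is easy to experiment with. The following minimal sketch assumes vectors are stored as flat lists $[q_1,\dots,q_n,p_1,\dots,p_n]$; the function names and the test vectors are illustrative choices, not notation from the text.

```python
def omega0(u, v):
    """Standard symplectic form on R^(2n): for u = (a, b), v = (c, d),
    return a.d - c.b. Vectors are flat lists [q_1..q_n, p_1..p_n]."""
    n = len(u) // 2
    a, b = u[:n], u[n:]
    c, d = v[:n], v[n:]
    return (sum(ai * di for ai, di in zip(a, d))
            - sum(ci * bi for ci, bi in zip(c, b)))


def basis_vector(index, dim):
    """Standard basis vector with a 1 in position `index`."""
    return [1 if j == index else 0 for j in range(dim)]


n = 2
e = [basis_vector(i, 2 * n) for i in range(n)]        # q-directions
f = [basis_vector(n + i, 2 * n) for i in range(n)]    # p-directions

u, v = [1, 2, 3, 4], [5, 6, 7, 8]
print(omega0(u, v), omega0(v, u))   # prints: -16 16
```

Evaluating `omega0` on the lists `e` and `f` reproduces the pairing pattern described in the example: each $q$-direction pairs to $1$ with its own $p$-direction and to $0$ with everything else.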
The standard example suggests that every symplectic vector space should have paired position-like and momentum-like directions, but an arbitrary skew nondegenerate form need not come with such pairs visibly marked. The linear problem is to choose a basis in which the form has the same pairing pattern as the model on $\mathbb R^{2n}$. Solving that problem is what justifies using canonical linear coordinates in symplectic mechanics.
The needed linear algebra result is [Classification of Skew-Symmetric Forms](/theorems/3295): after a suitable choice of basis, the matrix of a skew-symmetric bilinear form has zero diagonal blocks, an identity block $I_r$ pairing one family of basis vectors with a second family, a block $-I_r$ in the opposite position, and only zero blocks on the radical. In the nondegenerate case the radical block is absent, so the vector space has even dimension and the form is equivalent to the standard symplectic form.
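The inductive construction behind this classification can be sketched in code. The following is a minimal illustration, not the proof referenced above: it runs a symplectic Gram-Schmidt procedure on the standard basis, using exact rational arithmetic, and returns vectors $e_i,f_i$ with $\omega(e_i,f_j)=\delta_{ij}$ and all other pairings zero. The function names and the sample matrix are assumptions made for the example, and the input is assumed to be skew-symmetric and nondegenerate.

```python
from fractions import Fraction


def pairing(omega, u, v):
    """omega(u, v) = sum_ij u_i * omega[i][j] * v_j."""
    return sum(u[i] * omega[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def darboux_basis(omega):
    """Symplectic Gram-Schmidt for a skew, nondegenerate matrix `omega`.
    Returns (es, fs) with omega(e_i, f_j) = delta_ij, other pairings zero."""
    dim = len(omega)
    pool = [[Fraction(int(i == j)) for j in range(dim)] for i in range(dim)]
    es, fs = [], []
    while pool:
        e = pool.pop(0)
        # Nondegeneracy guarantees a partner with nonzero pairing in the pool.
        k = next(i for i, w in enumerate(pool) if pairing(omega, e, w) != 0)
        w = pool.pop(k)
        scale = pairing(omega, e, w)
        f = [x / scale for x in w]              # now omega(e, f) = 1
        # Move the remaining vectors into the symplectic complement of span(e, f).
        updated = []
        for v in pool:
            ve = pairing(omega, v, e)
            vf = pairing(omega, v, f)
            updated.append([x - vf * y + ve * z for x, y, z in zip(v, e, f)])
        pool = updated
        es.append(e)
        fs.append(f)
    return es, fs


# Sample skew, nondegenerate matrix (an arbitrary choice for illustration).
omega = [[0, 1, 2, 0],
         [-1, 0, 0, 3],
         [-2, 0, 0, 1],
         [0, -3, -1, 0]]
es, fs = darboux_basis(omega)
for e, f in zip(es, fs):
    print([str(x) for x in e], [str(x) for x in f])
```

If the form had a nonzero radical, or the dimension were odd, the search for a partner would fail for some $e$ and `next` would raise `StopIteration`; in this sketch that is the point at which the induction stops.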
The nondegeneracy hypothesis is essential: a skew form with a nonzero radical cannot pair every chosen $e_i$ with some $f_i$, so the inductive construction of pairs would stop before spanning $V$. For example, the zero two-form on $\mathbb R^2$ is skew-symmetric but admits no basis $e_1,f_1$ with $\omega(e_1,f_1)=1$. The theorem does not say that an adapted basis is canonical or unique; many different adapted bases are related by symplectic linear transformations. Its value is that it reduces linear calculations to the standard model, which is exactly the model used next to analyse complements and Lagrangian subspaces.
A basis as in the theorem is called a Darboux basis, and it turns arbitrary symplectic linear algebra into the model calculation on $\mathbb R^{2n}$. The next question is how to identify subspaces that are invisible to the symplectic pairing, because constraints and canonical relations are often expressed by such subspaces rather than by full bases.
[definition: Symplectic Complement]
Let $(V,\omega)$ be a symplectic vector space and let $W\subset V$ be a linear subspace. The symplectic complement of $W$ is
\begin{align*}
W^\omega = \{v\in V : \omega(v,w)=0 \text{ for all } w\in W\}.
\end{align*}
[/definition]
The symplectic complement is not the same as an orthogonal complement from Euclidean geometry, since $W$ may intersect $W^\omega$ nontrivially. The covector-side analogue is the annihilator: if $U\subset V$ is a subspace of a finite-dimensional vector space, then
\begin{align*}
U^0=\{\alpha\in V^*:\alpha(u)=0\text{ for every }u\in U\}.
\end{align*}
Under the symplectic isomorphism $V\to V^*$, $v\mapsto \omega(v,\cdot)$, the symplectic complement $W^\omega$ corresponds to the annihilator $W^0$. To know how large complements can be, and hence how many independent constraints a vanishing pairing can impose, we need the following dimension formula.
[quotetheorem:420]
[citeproof:420]
The formula depends on nondegeneracy of $\omega$, because otherwise the map $V\to V^*$ used in the proof would not be an isomorphism. If $\omega=0$ on $V=\mathbb R^2$ and $W$ is a line, then $W^\omega=V$, so $\dim W+\dim W^\omega=3\ne 2$. The theorem does not claim that $W\cap W^\omega$ is zero: a subspace may lie inside its own symplectic complement, in which case it is called isotropic.
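A small illustration of the count in the standard model may help; the choice of subspace here is an example, not part of the theorem. Take $V=\mathbb R^4$ with $\omega_0$ and let $W$ be the line spanned by $w=(a,0)$ with $a=(1,0)$. For $v=(c,d)$, the formula for $\omega_0$ gives $\omega_0(v,w)=c\cdot 0-a\cdot d$, so
\begin{align*}
\omega_0(v,w)=-d_1,\qquad\text{and hence}\qquad W^{\omega_0}=\{(c,d)\in\mathbb R^4:d_1=0\}.
\end{align*}
Thus $\dim W+\dim W^{\omega_0}=1+3=4=\dim V$. Moreover $w$ itself has $d=0$, so $W\subset W^{\omega_0}$: the line is isotropic, and $W\cap W^{\omega_0}=W$ is not zero.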
The formula shows that a subspace equal to its symplectic complement must have half the dimension of the ambient space. This motivates the following definition of the maximal subspaces on which the symplectic pairing vanishes; these are the linear prototypes for configuration slices, generating functions, and graphs of canonical transformations.
[definition: Lagrangian Subspace]
Let $(V,\omega)$ be a symplectic vector space. A linear subspace $L\subset V$ is Lagrangian if $L=L^\omega$.
[/definition]
For example, in $\mathbb R^{2n}$ the subspace $\{(q,0):q\in\mathbb R^n\}$ and the subspace $\{(0,p):p\in\mathbb R^n\}$ are both Lagrangian. More generally, the graph of a [linear map](/page/Linear%20Map) $A:\mathbb R^n\to(\mathbb R^n)^*$ inside $\mathbb R^n\oplus(\mathbb R^n)^*$ is Lagrangian precisely when the bilinear form $(u,v)\mapsto A(u)(v)$ is symmetric. This prepares the language for linear canonical transformations, which are maps preserving the symplectic pairing itself.
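The symmetry criterion follows from a one-line computation, sketched here under the identification of $\mathbb R^n\oplus(\mathbb R^n)^*$ with the standard model, so that $\omega_0((u,\alpha),(v,\beta))=\beta(u)-\alpha(v)$. For two points $(u,A(u))$ and $(v,A(v))$ of the graph,
\begin{align*}
\omega_0\bigl((u,A(u)),(v,A(v))\bigr)=A(v)(u)-A(u)(v),
\end{align*}
which vanishes for all $u,v$ exactly when $A(u)(v)=A(v)(u)$. The graph has dimension $n$, so once the pairing vanishes on it, the dimension formula forces it to equal its own symplectic complement.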
[example: Linear Canonical Transformation]
Let $S:\mathbb R^{2n}\to\mathbb R^{2n}$ be linear, and let $J$ be the matrix representing the standard symplectic form, so that
\begin{align*}
\omega_0(u,v)=u^\top Jv
\end{align*}
for all $u,v\in\mathbb R^{2n}$. The condition that $S$ preserve $\omega_0$ is
\begin{align*}
\omega_0(Su,Sv)=\omega_0(u,v)\quad\text{for all }u,v\in\mathbb R^{2n}.
\end{align*}
Using the matrix formula for $\omega_0$, the left-hand side is
\begin{align*}
\omega_0(Su,Sv)=(Su)^\top J(Sv).
\end{align*}
Since $(Su)^\top=u^\top S^\top$, this becomes
\begin{align*}
(Su)^\top J(Sv)=u^\top S^\top J S v.
\end{align*}
Thus preservation of $\omega_0$ says
\begin{align*}
u^\top S^\top J S v=u^\top Jv\quad\text{for all }u,v.
\end{align*}
Subtracting the right-hand side gives
\begin{align*}
u^\top(S^\top J S-J)v=0\quad\text{for all }u,v.
\end{align*}
If a matrix $A$ satisfies $u^\top A v=0$ for all $u,v$, then choosing $u=e_i$ and $v=e_j$ gives $A_{ij}=0$ for every pair $(i,j)$, so $A=0$. Applying this to $A=S^\top J S-J$ gives
\begin{align*}
S^\top J S=J.
\end{align*}
Conversely, if $S^\top J S=J$, then
\begin{align*}
\omega_0(Su,Sv)=u^\top S^\top J S v=u^\top Jv=\omega_0(u,v),
\end{align*}
so $S$ preserves the standard symplectic form. Therefore a linear map is canonical precisely when its matrix satisfies $S^\top J S=J$; these matrices form the symplectic group $Sp(2n,\mathbb R)$.
[/example]
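The matrix criterion can be tested directly on small examples. The sketch below is an illustration only: it assumes the block form $J=\begin{pmatrix}0&I\\-I&0\end{pmatrix}$, which is the matrix for which $u^\top Jv=a\cdot d-c\cdot b$, and the helper names and sample matrices are hypothetical choices.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def standard_j(n):
    """Block matrix [[0, I], [-I, 0]], so that u^T J v = a.d - c.b."""
    dim = 2 * n
    j = [[0] * dim for _ in range(dim)]
    for i in range(n):
        j[i][n + i] = 1
        j[n + i][i] = -1
    return j


def is_symplectic(s):
    """Check S^T J S == J exactly (intended for integer or rational entries)."""
    j = standard_j(len(s) // 2)
    return matmul(matmul(transpose(s), j), s) == j


shear = [[1, 0], [3, 1]]      # (q, p) -> (q, p + 3q)
scaling = [[2, 0], [0, 2]]    # doubles both q and p
print(is_symplectic(shear), is_symplectic(scaling))   # prints: True False
```

The shear preserves $\omega_0$ even though it distorts lengths and angles, while the uniform scaling multiplies $\omega_0$ by $4$ and so fails the criterion.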
## The Tautological One-Form and the Canonical Symplectic Form on Cotangent Bundles
How does a general configuration manifold $Q$ acquire a phase space without choosing coordinates or a metric? The answer is that covectors already know how to pair with tangent vectors. The cotangent bundle $T^*Q$ therefore carries a canonical one-form, and its exterior derivative gives the canonical symplectic form.
[definition: Tautological One-Form]
Let $Q$ be a smooth manifold, let $\pi:T^*Q\to Q$ be the bundle projection, and let $\alpha\in T_q^*Q$. The tautological one-form $\theta$ on $T^*Q$ is defined at $\alpha\in T^*Q$ by
\begin{align*}
\theta_\alpha(X)=\alpha(d\pi_\alpha(X))
\end{align*}
for all $X\in T_\alpha(T^*Q)$.
[/definition]
This definition says that a tangent vector to $T^*Q$ first projects down to a tangent vector on $Q$, and then the covector sitting at the point evaluates it. To turn this canonical one-form into the two-form required by Hamiltonian mechanics, we take its exterior derivative, with the sign chosen to match Hamilton's equations.
[definition: Canonical Symplectic Form]
The canonical symplectic form on $T^*Q$ is the two-form $\omega\in\Omega^2(T^*Q)$ defined by
\begin{align*}
\omega = -d\theta.
\end{align*}
[/definition]
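In canonical coordinates the two definitions take a familiar form. The following short computation is a sketch assuming local coordinates $(q_1,\dots,q_n)$ on $Q$ with induced fibre coordinates $(p_1,\dots,p_n)$, so that a covector at $q$ is $\alpha=\sum_i p_i\,dq_i$. A tangent vector $X=\sum_i(a_i\partial_{q_i}+b_i\partial_{p_i})$ projects to $d\pi_\alpha(X)=\sum_i a_i\partial_{q_i}$, hence $\theta_\alpha(X)=\sum_i p_ia_i$ and
\begin{align*}
\theta=\sum_{i=1}^n p_i\,dq_i,\qquad
\omega=-d\theta=-\sum_{i=1}^n dp_i\wedge dq_i=\sum_{i=1}^n dq_i\wedge dp_i.
\end{align*}
The minus sign in $\omega=-d\theta$ is what makes this agree with the standard form $\omega_0$ on $\mathbb R^{2n}$.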
The definition gives a closed two-form, but Hamiltonian mechanics needs nondegeneracy as well, otherwise $dH$ would not determine a unique vector field. The point that still has to be checked is global rather than notational: the formula $\omega=-d\theta$ must give a genuine symplectic form at every cotangent vector, independently of the coordinate chart used to compute it.
[quotetheorem:6843]
Both parts of the theorem are needed for Hamiltonian mechanics. Closedness alone would not identify covectors with vectors: the zero two-form is closed but cannot define Hamiltonian vector fields. Nondegeneracy alone would not give the bracket identities that later rely on $d\omega=0$. The theorem also does not say that every symplectic manifold is globally a cotangent bundle; it only supplies the canonical phase-space structure once a configuration manifold $Q$ has been chosen. This is the structure transported by cotangent lifts and then used to define Hamiltonian dynamics.
This theorem is the geometric bridge from Lagrangian mechanics to Hamiltonian mechanics. Once the Legendre transform sends a regular Lagrangian system from $TQ$ to $T^*Q$, the canonical form is the structure used to define its evolution. It also explains why diffeomorphisms of configuration space produce canonical maps of phase space.
[example: Cotangent Lift of a Diffeomorphism]
Let $F:Q\to Q$ be a diffeomorphism, and define its cotangent lift by
\begin{align*}
F_\sharp(\alpha_q)=\alpha_q\circ d(F^{-1})_{F(q)}\in T_{F(q)}^*Q.
\end{align*}
This map covers $F$, because $F_\sharp(\alpha_q)$ is a covector based at $F(q)$, so
\begin{align*}
\pi(F_\sharp(\alpha_q))=F(q)=F(\pi(\alpha_q)).
\end{align*}
Thus
\begin{align*}
\pi\circ F_\sharp=F\circ \pi.
\end{align*}
We compute the pullback of the tautological one-form. Fix $\alpha_q\in T_q^*Q$ and $X\in T_{\alpha_q}(T^*Q)$. By the definition of pullback of a one-form,
\begin{align*}
(F_\sharp^*\theta)_{\alpha_q}(X)=\theta_{F_\sharp(\alpha_q)}(d(F_\sharp)_{\alpha_q}X).
\end{align*}
By the definition of the tautological one-form,
\begin{align*}
\theta_{F_\sharp(\alpha_q)}(d(F_\sharp)_{\alpha_q}X)=F_\sharp(\alpha_q)\bigl(d\pi_{F_\sharp(\alpha_q)}(d(F_\sharp)_{\alpha_q}X)\bigr).
\end{align*}
Using $\pi\circ F_\sharp=F\circ\pi$ and differentiating at $\alpha_q$ gives
\begin{align*}
d\pi_{F_\sharp(\alpha_q)}\circ d(F_\sharp)_{\alpha_q}=dF_q\circ d\pi_{\alpha_q}.
\end{align*}
Therefore
\begin{align*}
(F_\sharp^*\theta)_{\alpha_q}(X)=F_\sharp(\alpha_q)\bigl(dF_q(d\pi_{\alpha_q}X)\bigr).
\end{align*}
Substituting the definition of $F_\sharp(\alpha_q)$ gives
\begin{align*}
(F_\sharp^*\theta)_{\alpha_q}(X)=\alpha_q\bigl(d(F^{-1})_{F(q)}(dF_q(d\pi_{\alpha_q}X))\bigr).
\end{align*}
Since $d(F^{-1})_{F(q)}\circ dF_q=\operatorname{id}_{T_qQ}$, this becomes
\begin{align*}
(F_\sharp^*\theta)_{\alpha_q}(X)=\alpha_q(d\pi_{\alpha_q}X).
\end{align*}
By the definition of $\theta$ again,
\begin{align*}
\alpha_q(d\pi_{\alpha_q}X)=\theta_{\alpha_q}(X).
\end{align*}
Hence $F_\sharp^*\theta=\theta$.
Now use $\omega=-d\theta$ and the fact that exterior differentiation commutes with pullback:
\begin{align*}
F_\sharp^*\omega=F_\sharp^*(-d\theta).
\end{align*}
Thus
\begin{align*}
F_\sharp^*\omega=-d(F_\sharp^*\theta).
\end{align*}
Since $F_\sharp^*\theta=\theta$, we get
\begin{align*}
F_\sharp^*\omega=-d\theta=\omega.
\end{align*}
The inverse of $F_\sharp$ is $(F^{-1})_\sharp$, so $F_\sharp$ is a diffeomorphism of $T^*Q$ preserving the canonical symplectic form. Thus cotangent lifts are canonical transformations.
[/example]
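The simplest instance makes the mechanism visible. As an illustration (the specific map is an assumption chosen for the example), let $Q=\mathbb R$ and $F(q)=\lambda q$ with $\lambda\ne 0$. Since $d(F^{-1})$ is multiplication by $1/\lambda$, the covector $p\,dq$ at $q$ is sent to $(p/\lambda)\,dq$ at $\lambda q$, so in canonical coordinates
\begin{align*}
F_\sharp(q,p)=\left(\lambda q,\frac{p}{\lambda}\right),\qquad
F_\sharp^*(p\,dq)=\frac{p}{\lambda}\,d(\lambda q)=p\,dq.
\end{align*}
Stretching positions by $\lambda$ compresses momenta by the same factor, which is exactly what keeps $dq\wedge dp$ unchanged.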
## Poisson Brackets, Hamiltonian Flows, and Symplectomorphisms
Once a symplectic form identifies covectors with vectors, every smooth energy function should produce a vector field. The resulting vector field is Hamilton's vector field, its integral curves are the physical motions, and the Poisson bracket records how observables change along those motions.
[definition: Hamiltonian Vector Field]
Let $(M,\omega)$ be a symplectic manifold and let $H\in C^\infty(M)$. The Hamiltonian vector field $X_H\in\mathfrak X(M)$ is defined by
\begin{align*}
\omega(X_H,Y)=dH(Y)
\end{align*}
for all vector fields $Y\in\mathfrak X(M)$.
[/definition]
With the convention $\omega=\sum_i dq_i\wedge dp_i$, the Hamiltonian vector field on $T^*Q$ in canonical coordinates is
\begin{align*}
X_H=\sum_{i=1}^n \left(\frac{\partial H}{\partial p_i}\frac{\partial}{\partial q_i}-\frac{\partial H}{\partial q_i}\frac{\partial}{\partial p_i}\right).
\end{align*}
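The coordinate formula can be checked against the defining equation. As a brief sketch, write $X_H=\sum_i(a_i\partial_{q_i}+b_i\partial_{p_i})$ and test against an arbitrary $Y=\sum_i(c_i\partial_{q_i}+d_i\partial_{p_i})$; the coefficient names are introduced only for this computation. Then
\begin{align*}
\omega(X_H,Y)=\sum_{i=1}^n(a_id_i-c_ib_i),\qquad
dH(Y)=\sum_{i=1}^n\left(\frac{\partial H}{\partial q_i}c_i+\frac{\partial H}{\partial p_i}d_i\right).
\end{align*}
Comparing the coefficients of $d_i$ and $c_i$ gives $a_i=\partial H/\partial p_i$ and $b_i=-\partial H/\partial q_i$.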
Hence an integral curve $(q(t),p(t))$ satisfies Hamilton's equations
\begin{align*}
\dot q_i &= \frac{\partial H}{\partial p_i}, & \dot p_i &= -\frac{\partial H}{\partial q_i}.
\end{align*}
This motivates the following bracket operation, which packages differentiation of observables along Hamiltonian vector fields.
[definition: Poisson Bracket]
Let $(M,\omega)$ be a symplectic manifold. The Poisson bracket is the map $\{\cdot,\cdot\}:C^\infty(M)\times C^\infty(M)\to C^\infty(M)$ defined by
\begin{align*}
\{F,G\}=dF(X_G)=-dG(X_F)
\end{align*}
for $F,G\in C^\infty(M)$.
[/definition]
In canonical coordinates this becomes
\begin{align*}
\{F,G\}=\sum_{i=1}^n\left(\frac{\partial F}{\partial q_i}\frac{\partial G}{\partial p_i}-\frac{\partial F}{\partial p_i}\frac{\partial G}{\partial q_i}\right).
\end{align*}
The sign reflects the convention used in the definition of $X_H$ above. The time derivative of an observable $F$ along the Hamiltonian flow of $H$ is
\begin{align*}
\frac{d}{dt}F=\{F,H\}.
\end{align*}
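Two quick consequences of the coordinate formula are worth recording as a consistency check; they are computed here directly from the displayed bracket. Taking coordinate functions as observables gives
\begin{align*}
\{q_i,p_j\}=\delta_{ij},\qquad \{q_i,q_j\}=0,\qquad \{p_i,p_j\}=0,
\end{align*}
and choosing $F=q_i$ or $F=p_i$ in the evolution equation gives
\begin{align*}
\dot q_i=\{q_i,H\}=\frac{\partial H}{\partial p_i},\qquad
\dot p_i=\{p_i,H\}=-\frac{\partial H}{\partial q_i},
\end{align*}
so Hamilton's equations are the special case of $\dot F=\{F,H\}$ for the coordinate functions.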
The next theorem records the algebraic rules that make this bracket the correct language for conserved quantities and symmetry generators.
[quotetheorem:1333]
[citeproof:1333]
The identities require the full symplectic structure, not just an arbitrary bilinear operation on functions. Skew-symmetry and the Leibniz rule come from pointwise algebra and the product rule, so they survive for any skew nondegenerate two-form. The Jacobi identity is different: if $d\omega\ne 0$, the commutator calculation for Hamiltonian vector fields acquires an extra $d\omega$ term and the identity can fail. The bracket therefore turns smooth observables into a Lie algebra only when this obstruction is absent, while the Leibniz rule records the derivation property expected of differentiation. The theorem does not say that every Lie bracket on $C^\infty(M)$ comes from a symplectic form, nor does it identify all conserved quantities for a given Hamiltonian. This raises the geometric preservation question: which diffeomorphisms preserve the symplectic structure that defines all Hamiltonian vector fields and brackets?
[definition: Symplectomorphism]
Let $(M,\omega)$ and $(N,\eta)$ be symplectic manifolds. A diffeomorphism $\Phi:M\to N$ is a symplectomorphism if
\begin{align*}
\Phi^*\eta=\omega.
\end{align*}
[/definition]
Symplectomorphisms are the coordinate changes allowed in Hamiltonian mechanics. They preserve the structure that defines Hamiltonian vector fields, not necessarily a metric, length, or angle. The key dynamical question is whether time evolution generated by a Hamiltonian is itself such a canonical transformation.
[quotetheorem:6844]
[citeproof:6844]
The conclusion uses that the vector field is Hamiltonian, not merely any vector field on a symplectic manifold. On $\mathbb R^2$ with $\omega=dq\wedge dp$, the dilation flow $(q,p)\mapsto(e^tq,e^tp)$ pulls $\omega$ back to $e^{2t}\omega$, so a general flow need not be symplectic. The theorem also does not imply that every symplectomorphism is the time-$t$ map of a globally defined Hamiltonian flow. Its role here is narrower and fundamental: the time evolution generated by $H$ is a canonical transformation, which is what the harmonic oscillator example visualises.
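The dilation computation is short enough to write out. As a sketch, with $\Phi_t(q,p)=(e^tq,e^tp)$ and generating vector field $X=q\,\partial_q+p\,\partial_p$ (notation introduced here for the check),
\begin{align*}
\Phi_t^*(dq\wedge dp)=d(e^tq)\wedge d(e^tp)=e^{2t}\,dq\wedge dp,\qquad
d\bigl(\omega(X,\cdot)\bigr)=d(q\,dp-p\,dq)=2\,dq\wedge dp\ne 0.
\end{align*}
Since $\omega(X,\cdot)$ is not closed, it cannot equal $dH$ for any function $H$, which is consistent with the flow failing to preserve $\omega$.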
This result is the geometric form of Liouville's principle: Hamiltonian time evolution is canonical. It is stronger than conservation of energy, because it says that the entire symplectic structure of phase space is transported unchanged by the dynamics. The harmonic oscillator gives the basic picture in the smallest nontrivial phase space.
[illustration:harmonic-oscillator-phase-portrait]
[example: Harmonic Oscillator Phase Portrait]
On $T^*\mathbb R\cong\mathbb R^2$ with canonical coordinates $(q,p)$, let
\begin{align*}
H(q,p)=\frac{1}{2}(p^2+\nu^2q^2),\qquad \nu>0.
\end{align*}
The coordinate formula for the Hamiltonian vector field gives
\begin{align*}
X_H=\frac{\partial H}{\partial p}\frac{\partial}{\partial q}-\frac{\partial H}{\partial q}\frac{\partial}{\partial p}.
\end{align*}
Since
\begin{align*}
\frac{\partial H}{\partial p}=p
\end{align*}
and
\begin{align*}
\frac{\partial H}{\partial q}=\nu^2q,
\end{align*}
we get
\begin{align*}
X_H=p\frac{\partial}{\partial q}-\nu^2q\frac{\partial}{\partial p}.
\end{align*}
Thus an integral curve $(q(t),p(t))$ satisfies
\begin{align*}
\dot q(t)=p(t),\qquad \dot p(t)=-\nu^2q(t).
\end{align*}
Along such a curve, the energy is constant because
\begin{align*}
\frac{d}{dt}H(q(t),p(t))=\frac{\partial H}{\partial q}\dot q+\frac{\partial H}{\partial p}\dot p.
\end{align*}
Substituting the computed derivatives and Hamilton's equations gives
\begin{align*}
\frac{d}{dt}H(q(t),p(t))=(\nu^2q)p+p(-\nu^2q)=0.
\end{align*}
Therefore each trajectory stays on a level set $H=E$, which is
\begin{align*}
\frac{1}{2}(p^2+\nu^2q^2)=E.
\end{align*}
Equivalently,
\begin{align*}
p^2+\nu^2q^2=2E.
\end{align*}
For $E>0$ this is an ellipse in the $(q,p)$-plane, while $E=0$ gives the single equilibrium point $(0,0)$.
To see the periodic motion, introduce the rescaled coordinate $Q=\nu q$ (a function on phase space here, not the configuration manifold). Then
\begin{align*}
\dot Q=\nu\dot q=\nu p
\end{align*}
and
\begin{align*}
\dot p=-\nu^2q=-\nu Q.
\end{align*}
Hence
\begin{align*}
Q(t)=Q(0)\cos(\nu t)+p(0)\sin(\nu t)
\end{align*}
and
\begin{align*}
p(t)=p(0)\cos(\nu t)-Q(0)\sin(\nu t).
\end{align*}
Differentiating these two displayed formulas gives $\dot Q=\nu p$ and $\dot p=-\nu Q$, and at $t=0$ they recover the initial values. Thus, after rescaling the $q$-axis by $Q=\nu q$, the phase portrait is circular motion with angular frequency $\nu$ and period $2\pi/\nu$; in the original $(q,p)$-plane these circles appear as the ellipses $p^2+\nu^2q^2=2E$.
[/example]
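The explicit solution can be packaged as a linear map and tested against the two claims of this section: the flow is symplectic and it conserves $H$. The sketch below is an illustration with assumed names and sample values. It rewrites the solution in the original coordinates, $q(t)=q(0)\cos\nu t+\nu^{-1}p(0)\sin\nu t$ and $p(t)=p(0)\cos\nu t-\nu q(0)\sin\nu t$, and uses the fact that for one degree of freedom $S^\top JS=\det(S)\,J$, so the symplectic condition reduces to $\det S=1$.

```python
import math


def flow_matrix(nu, t):
    """Time-t flow of H = (p^2 + nu^2 q^2)/2 acting on the column vector (q, p)."""
    c, s = math.cos(nu * t), math.sin(nu * t)
    return [[c, s / nu], [-nu * s, c]]


def apply_flow(matrix, q, p):
    """Apply a 2x2 matrix to the phase point (q, p)."""
    return (matrix[0][0] * q + matrix[0][1] * p,
            matrix[1][0] * q + matrix[1][1] * p)


def energy(q, p, nu):
    return 0.5 * (p * p + nu * nu * q * q)


nu, t = 3.0, 0.7          # illustrative values (assumptions)
s_t = flow_matrix(nu, t)
# In one degree of freedom, S^T J S = det(S) J, so symplectic means det = 1.
det = s_t[0][0] * s_t[1][1] - s_t[0][1] * s_t[1][0]
q1, p1 = apply_flow(s_t, 1.0, 2.0)
print(det, energy(q1, p1, nu) - energy(1.0, 2.0, nu))
```

Evaluating `flow_matrix(nu, 2 * math.pi / nu)` returns the identity up to rounding, which is the period $2\pi/\nu$ found above.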
The oscillator shows the chapter's three main ideas at once. The plane carries its canonical area form, the Hamiltonian produces rotation-like flow through that form, and the flow preserves the symplectic structure while moving observables according to the Poisson bracket.
With phase space and Poisson brackets in place, symmetry can be treated systematically rather than case by case. Chapter 5 develops Noether theory and momentum maps, explaining how invariance produces conserved quantities and how group actions are encoded on phase space.
# 5. Noether Theory and Momentum Maps
Symmetry turns mechanics from a collection of differential equations into a theory with structure. After Chapters 3 and 4 introduced Hamiltonian vector fields and symplectic phase spaces, we now ask what happens when a Lie group moves the system without changing the physics. The answer is Noether theory: infinitesimal invariance produces conserved quantities, and in Hamiltonian mechanics those conserved quantities assemble into momentum maps.
The chapter moves from local infinitesimal symmetries to global Hamiltonian group actions. The guiding examples are translations, rotations, and rigid body motion, because they show how linear momentum, angular momentum, and body angular momentum are the same construction in different coordinates.
## Infinitesimal Symmetries of Lagrangian Systems
The first question is how to recognise a continuous symmetry before solving the equations of motion. A symmetry should transform admissible paths into admissible paths and preserve the action, possibly up to a boundary term, since boundary terms do not affect fixed-endpoint variational equations.
[definition: Infinitesimal Generator]
Let $Q$ be a smooth configuration manifold and let $\rho: G \times Q \to Q$ be a smooth left action of a Lie group $G$ on $Q$. For $\xi \in \mathfrak g$, the infinitesimal generator is the section $\xi_Q:Q\to TQ$ of the tangent bundle defined by
\begin{align*}
\xi_Q(q) = \frac{d}{ds}\Big|_{s=0} \rho(\exp(s\xi),q).
\end{align*}
[/definition]
Equivalently, $\xi_Q$ is the vector field in $\mathfrak X(Q)$ whose value at $q$ is the velocity of the one-parameter orbit $s\mapsto \rho(\exp(s\xi),q)$. Since the equations from a Lagrangian use both positions and velocities, the next problem is to lift this infinitesimal motion from $Q$ to $TQ$. This motivates the tangent lift, which is the correct action on velocities.
[definition: Tangent Lift]
Let $\rho: G \times Q \to Q$ be a smooth left action. Its tangent lift is the action $T\rho: G \times TQ \to TQ$ given by
\begin{align*}
T\rho_g(v_q) = T_q\rho_g(v_q),
\end{align*}
where $\rho_g(q)=\rho(g,q)$.
[/definition]
For $\xi \in \mathfrak g$, the infinitesimal generator of the tangent-lifted action is denoted $\xi_{TQ}$. In local coordinates $(q_i,\dot q_i)$, if $\xi_Q = \sum_i a_i(q)\partial_{q_i}$, then
\begin{align*}
\xi_{TQ}=\sum_i a_i(q)\partial_{q_i}+\sum_{i,j}\frac{\partial a_i}{\partial q_j}(q)\dot q_j\partial_{\dot q_i}.
\end{align*}
This formula says that varying the base point also varies the velocity vector by the derivative of the infinitesimal displacement field. Applying this lifted vector field to the Lagrangian gives the local criterion for strict invariance.
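The coordinate formula for $\xi_{TQ}$ can be sanity-checked numerically. The Python sketch below uses the planar rotation action on $Q=\mathbb R^2$ (an illustrative choice, not one of the chapter's examples), for which $a(q)=(-q_2,q_1)$, and compares the formula with a finite-difference derivative of the tangent-lifted flow. The function names and the sample point are assumptions made for this illustration.

```python
import math

def rotate(s, q):
    """Planar rotation by angle s: a one-parameter group acting on Q = R^2."""
    c, sn = math.cos(s), math.sin(s)
    return (c * q[0] - sn * q[1], sn * q[0] + c * q[1])

def lifted_flow(s, q, v):
    """Tangent lift of the rotation; the map is linear, so it rotates v as well."""
    return rotate(s, q) + rotate(s, v)  # tuple concatenation: (q1, q2, v1, v2)

def generator_formula(q, v):
    """xi_TQ from the coordinate formula with a(q) = (-q2, q1).

    Here da_1/dq_2 = -1 and da_2/dq_1 = 1, so the velocity part is (-v2, v1).
    """
    return (-q[1], q[0], -v[1], v[0])

def generator_numeric(q, v, h=1e-6):
    """Central finite difference of the lifted flow at s = 0."""
    plus, minus = lifted_flow(h, q, v), lifted_flow(-h, q, v)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

q, v = (1.0, 2.0), (0.5, -0.3)
exact = generator_formula(q, v)
approx = generator_numeric(q, v)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
print(max_error)  # small: limited by finite-difference and rounding error
```

The two results agree up to discretisation error, which reflects that the velocity component of the lifted generator is the derivative of $a$ applied to $\dot q$.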
[definition: Infinitesimal Lagrangian Symmetry]
Let $L:TQ \to \mathbb R$ be a smooth Lagrangian and let $\xi \in \mathfrak g$ act on $Q$ with tangent-lifted infinitesimal generator $\xi_{TQ}$. The element $\xi$ is an infinitesimal Lagrangian symmetry of $L$ if
\begin{align*}
\xi_{TQ}(L)=0.
\end{align*}
[/definition]
This is the strict form of symmetry. Many mechanical Lagrangians are invariant only up to a total time derivative, and this weaker condition is the one naturally matched to the action principle. To include these systems, the next definition allows the change in $L$ to be an endpoint contribution rather than zero pointwise.
[definition: Infinitesimal Variational Symmetry]
Let $L:TQ \to \mathbb R$ be a smooth Lagrangian. A vector field $Y\in \mathfrak X(Q)$ is an infinitesimal variational symmetry with boundary term $F:Q\to \mathbb R$ if, along every smooth path $q(t)$,
\begin{align*}
\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}L(q_\varepsilon(t),\dot q_\varepsilon(t)) = \frac{d}{dt}F(q(t)),
\end{align*}
where $q_\varepsilon(t)$ is any variation whose variational vector field is $Y(q(t))$.
[/definition]
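Over a time interval $[a,b]$, the boundary term changes the action to first order by $\varepsilon\bigl(F(q(b))-F(q(a))\bigr)$, which depends only on the endpoint values of the path, so fixed-endpoint variations still give the same Euler--Lagrange equations. This is the Lagrangian origin of conserved quantities that include gauge-like correction terms.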
[example: Translations Of A Free Particle]
Let $Q=\mathbb R^n$ with coordinates $(q_1,\ldots,q_n)$, and write $\dot q=(\dot q_1,\ldots,\dot q_n)$. The free-particle Lagrangian is
\begin{align*}
L(q,\dot q)=\frac{m}{2}\sum_{i=1}^n \dot q_i^2.
\end{align*}
For a fixed vector $a=(a_1,\ldots,a_n)\in\mathbb R^n$, the translation curve $s\mapsto q+sa$ has velocity
\begin{align*}
\frac{d}{ds}\Big|_{s=0}(q+sa)=a.
\end{align*}
Thus the infinitesimal generator is
\begin{align*}
Y(q)=\sum_{i=1}^n a_i\partial_{q_i}.
\end{align*}
Since each coefficient $a_i$ is constant, $\partial a_i/\partial q_j=0$ for every $i,j$, so the tangent-lift formula gives
\begin{align*}
Y_{TQ}=\sum_{i=1}^n a_i\partial_{q_i}.
\end{align*}
The Lagrangian has no dependence on the coordinates $q_i$, because it is a function only of $\dot q_1,\ldots,\dot q_n$. Therefore
\begin{align*}
\frac{\partial L}{\partial q_i}=0
\end{align*}
for every $i$, and hence
\begin{align*}
Y_{TQ}(L)=\sum_{i=1}^n a_i\frac{\partial L}{\partial q_i}=0.
\end{align*}
So translations are strict infinitesimal Lagrangian symmetries of the free particle.
The momentum pairing with $Y$ is
\begin{align*}
J_Y(q,\dot q)=\sum_{i=1}^n \frac{\partial L}{\partial \dot q_i}a_i.
\end{align*}
Since
\begin{align*}
\frac{\partial L}{\partial \dot q_i}=\frac{\partial}{\partial \dot q_i}\left(\frac{m}{2}\sum_{k=1}^n \dot q_k^2\right)=m\dot q_i,
\end{align*}
we get
\begin{align*}
J_Y(q,\dot q)=\sum_{i=1}^n m\dot q_i a_i=m\dot q\cdot a.
\end{align*}
For a free-particle solution, the Euler--Lagrange equations read
\begin{align*}
\frac{d}{dt}(m\dot q_i)-0=0
\end{align*}
for each $i$, so
\begin{align*}
\frac{d}{dt}(m\dot q\cdot a)=\sum_{i=1}^n a_i\frac{d}{dt}(m\dot q_i)=0.
\end{align*}
Thus the conserved scalar is the component of linear momentum $p=m\dot q$ in the direction $a$.
[/example]
This example already contains the template: a direction of symmetry gives a scalar conserved quantity. Varying $a$ over all translation directions packages these scalar quantities into the vector $p=m\dot q$.
[example: Rotations In A Central Potential]
Let $Q=\mathbb R^3\setminus\{0\}$, so that $|q|$ is smooth on $Q$, and let
\begin{align*}
L(q,\dot q)=\frac{m}{2}|\dot q|^2-V(|q|).
\end{align*}
Choose $\xi\in\mathfrak{so}(3)$, and identify it with the vector $\omega\in\mathbb R^3$ such that $\xi v=\omega\times v$. The infinitesimal generator on $Q$ is therefore
\begin{align*}
\xi_Q(q)=\omega\times q.
\end{align*}
Since the map $q\mapsto \omega\times q$ is linear, its derivative in the direction $\dot q$ is $\omega\times \dot q$, so the tangent-lifted infinitesimal generator is
\begin{align*}
\xi_{TQ}(q,\dot q)=(\omega\times q)\cdot \partial_q+(\omega\times \dot q)\cdot \partial_{\dot q}.
\end{align*}
We compute the change of $L$ under this lifted vector field. The velocity term gives
\begin{align*}
(\omega\times \dot q)\cdot \nabla_{\dot q}\left(\frac{m}{2}|\dot q|^2\right)=m(\omega\times \dot q)\cdot \dot q.
\end{align*}
Because $\omega\times \dot q$ is perpendicular to $\dot q$, this term is $0$. For the potential term, the chain rule gives
\begin{align*}
(\omega\times q)\cdot \nabla_q\bigl(-V(|q|)\bigr)=-V'(|q|)(\omega\times q)\cdot \frac{q}{|q|}.
\end{align*}
Because $\omega\times q$ is perpendicular to $q$, this term is also $0$. Hence
\begin{align*}
\xi_{TQ}(L)=0+0=0.
\end{align*}
Thus rotations are strict infinitesimal Lagrangian symmetries of a particle in a central potential.
The Lagrangian momentum paired with this infinitesimal rotation is
\begin{align*}
J_\omega(q,\dot q)=\frac{\partial L}{\partial \dot q}\cdot(\omega\times q).
\end{align*}
Since $\partial L/\partial \dot q=m\dot q$, we get
\begin{align*}
J_\omega(q,\dot q)=m\dot q\cdot(\omega\times q).
\end{align*}
By cyclic invariance of the scalar triple product,
\begin{align*}
m\dot q\cdot(\omega\times q)=\omega\cdot(q\times m\dot q).
\end{align*}
Writing $p=m\dot q$, the scalar conserved by the rotation generated by $\omega$ is therefore
\begin{align*}
J_\omega(q,\dot q)=\omega\cdot(q\times p).
\end{align*}
Since this holds for every rotation axis $\omega$, the vector quantity encoded by these scalar pairings is the angular momentum $q\times p$.
[/example]
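The conservation of $q\times p$ can also be observed along a numerically integrated trajectory. The sketch below assumes the test potential $V(r)=-1/r$ and uses a velocity-Verlet integrator; both are choices made for this illustration rather than part of the text. For a central force each sub-step of this scheme adds a multiple of $q$ to $p$ or a multiple of $p$ to $q$, so $q\times p$ should be preserved up to rounding error.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def force(q):
    """Central force -V'(|q|) q/|q| for the assumed potential V(r) = -1/r."""
    r = sum(x * x for x in q) ** 0.5
    return tuple(-x / r**3 for x in q)

def velocity_verlet(q, p, m, dt, steps):
    """Integrate m q'' = force(q) and return the final (q, p)."""
    f = force(q)
    for _ in range(steps):
        p = tuple(pi + 0.5 * dt * fi for pi, fi in zip(p, f))  # half kick
        q = tuple(qi + dt * pi / m for qi, pi in zip(q, p))    # drift
        f = force(q)
        p = tuple(pi + 0.5 * dt * fi for pi, fi in zip(p, f))  # half kick
    return q, p

m = 1.0
q0, p0 = (1.0, 0.0, 0.0), (0.0, 0.8, 0.3)
q1, p1 = velocity_verlet(q0, p0, m, dt=1e-3, steps=5000)

L_start, L_end = cross(q0, p0), cross(q1, p1)
drift = max(abs(a - b) for a, b in zip(L_start, L_end))
print(L_start, L_end, drift)
```

Pairing `L_end` with any axis $\omega$ gives the scalar $J_\omega$ of the example, so constancy of the vector is the same statement as constancy of every $J_\omega$.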
Translations and rotations differ geometrically, but they are treated by the same infinitesimal calculation. The next step is to prove that this calculation always produces a first integral.
## Noether Theorem For Variational Symmetries
The central problem is to pass from invariance of the action to an expression that is constant along every motion. The mechanism is the first variation formula: when the path solves the Euler--Lagrange equations, the only surviving contribution from an infinitesimal variation is the boundary term involving the conjugate momentum.
[definition: Lagrangian Momentum Pairing]
Let $L:TQ\to \mathbb R$ be a smooth Lagrangian. In local coordinates $(q_i,\dot q_i)$, the Lagrangian momentum paired with a vector field $Y=\sum_i Y_i(q)\partial_{q_i}$ is the function $J_Y:TQ\to\mathbb R$ defined by
\begin{align*}
J_Y(q,\dot q)=\sum_i \frac{\partial L}{\partial \dot q_i}(q,\dot q)Y_i(q).
\end{align*}
[/definition]
This expression is coordinate-independent: the fiber derivative $\mathbb{F}L:TQ\to T^*Q$ sends $(q,v)$ to the covector $D_vL(q,v):T_qQ\to\mathbb R$, and $J_Y(q,v)=D_vL(q,v)(Y(q))$. For a group action it is the scalar momentum associated with the Lie algebra element generating the motion. The following theorem is needed to connect that scalar momentum with the variational [symmetry condition](/theorems/1360).
[quotetheorem:3513]
[citeproof:3513]
This cited argument shows why the boundary correction has the sign it does: it subtracts the change already accounted for by the boundary term in the action. Each hypothesis has a real role. If $Y$ is not a variational symmetry, the action variation contains an uncancelled integral term, so the momentum pairing can drift; for instance, a particle in a non-constant potential has translation momentum changing according to the force. If the path is not a solution of the Euler--Lagrange equations, the first variation still contains the Euler--Lagrange residual, so Noether's conclusion is not a statement about arbitrary curves in $TQ$. In the strict invariant case $F=0$, the conserved quantity is just the momentum pairing, and this is the form that will reappear in Hamiltonian language as a component of a momentum map.
[example: Galilean Boost Boundary Term]
For a free particle on $Q=\mathbb R$, take
\begin{align*}
L(q,\dot q)=\frac{m}{2}\dot q^2.
\end{align*}
The infinitesimal Galilean boost is the time-dependent vector field $Y_t(q)=t\,\partial_q$, so along a path $q(t)$ the induced first-order variation is $\delta q=t$ and hence $\delta\dot q=\frac{d}{dt}(t)=1$. Therefore the first-order change of the Lagrangian is
\begin{align*}
\delta L=\frac{\partial L}{\partial q}\delta q+\frac{\partial L}{\partial\dot q}\delta\dot q=0\cdot t+m\dot q\cdot 1=m\dot q.
\end{align*}
This is a total time derivative, because
\begin{align*}
\frac{d}{dt}(mq)=m\dot q.
\end{align*}
Thus the boost is a time-dependent variational symmetry with boundary term $F(t,q)=mq$.
The momentum pairing with $Y_t$ is
\begin{align*}
J_{Y_t}(q,\dot q)=\frac{\partial L}{\partial\dot q}t=m\dot q\,t=mt\dot q.
\end{align*}
By the time-dependent form of *Noether's theorem*, the conserved quantity is
\begin{align*}
J_{Y_t}-F=mt\dot q-mq.
\end{align*}
Equivalently, multiplying by $-1$ gives the conserved quantity
\begin{align*}
m(q-t\dot q).
\end{align*}
For a free-particle solution $q(t)=q_0+vt$, we have $\dot q(t)=v$, so
\begin{align*}
m(q-t\dot q)=m(q_0+vt-tv)=mq_0.
\end{align*}
The boost symmetry therefore records the initial position parameter $q_0$, rather than the linear momentum.
[/example]
This example is slightly broader than the autonomous configuration-space statement because the generator depends on time. The same endpoint calculation applies with time-dependent vector fields, and this is often the right setting for spacetime symmetries.
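A short Python sketch makes the boost computation concrete. It evaluates $J_{Y_t}-F=mt\dot q-mq$ along the free solution $q(t)=q_0+vt$ and, for contrast, along a unit-frequency oscillator solution, whose Lagrangian is not boost-invariant. The numerical values and the oscillator contrast are assumptions chosen for illustration.

```python
import math

def boost_charge(m, t, q, qdot):
    """Noether quantity J_{Y_t} - F = m*t*qdot - m*q for the infinitesimal boost."""
    return m * t * qdot - m * q

times = (0.0, 1.0, 2.5, 10.0)
m, q0, v = 2.0, 1.5, 0.75

# Free motion q(t) = q0 + v*t: the charge equals -m*q0 at every time.
free_values = [boost_charge(m, t, q0 + v * t, v) for t in times]

# Contrast (assumed test case): oscillator solution q(t) = cos(t) of
# L = m/2 qdot^2 - m/2 q^2, for which the boost is not a variational symmetry.
oscillator_values = [boost_charge(m, t, math.cos(t), -math.sin(t)) for t in times]

print(free_values)
print(oscillator_values)
```

The first list is constant and equal to $-mq_0$, matching the example; the second list varies with $t$, as expected when the symmetry hypothesis fails.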
[remark: Conserved Currents In Field Theory]
In classical field theory, the same argument is localised. A variational symmetry of a Lagrangian density produces a current whose divergence vanishes on solutions. For mechanics, time is the only base variable, so a divergence-free current reduces to a quantity with zero time derivative.
[/remark]
Lagrangian Noether theory is tied to the tangent bundle and the variational principle. Hamiltonian mechanics gives a sharper geometric formulation: conserved quantities are functions whose Hamiltonian vector fields generate symmetries.
## Hamiltonian Symmetries And Momentum Maps
The Hamiltonian question is reversed: given a Lie group action on phase space, when are its infinitesimal generators Hamiltonian vector fields? If the answer is yes in a way that depends linearly on the Lie algebra element, the Hamiltonian functions combine into a momentum map.
[definition: Symplectic Group Action]
Let $(M,\omega)$ be a symplectic manifold. A smooth left action $\rho:G\times M\to M$ is symplectic if each map $\rho_g:M\to M$ satisfies
\begin{align*}
\rho_g^*\omega=\omega.
\end{align*}
[/definition]
A symplectic action preserves the Poisson bracket and maps Hamiltonian vector fields to Hamiltonian vector fields. This preservation is necessary for symmetry of Hamiltonian dynamics, but it does not by itself provide Hamiltonian functions for the infinitesimal generators. The next definition adds exactly that missing Hamiltonian data.
[definition: Hamiltonian Group Action]
Let $(M,\omega)$ be a symplectic manifold with a symplectic left action of $G$. The action is Hamiltonian if there exists a smooth map $J:M\to\mathfrak g^*$ such that, for every $\xi\in\mathfrak g$, the function
\begin{align*}
J_\xi(x)=J(x)(\xi)
\end{align*}
satisfies
\begin{align*}
dJ_\xi=\iota_{\xi_M}\omega.
\end{align*}
[/definition]
The map $J$ is the momentum map. The sign convention here fixes $X_{J_\xi}=\xi_M$ when Hamiltonian vector fields are defined by $\iota_{X_f}\omega=df$. To compare momentum values at points related by the group action, we need a compatibility condition with the natural action on $\mathfrak g^*$.
[definition: Equivariant Momentum Map]
Let $G$ act on $(M,\omega)$ by a Hamiltonian action with momentum map $J:M\to\mathfrak g^*$. The momentum map is equivariant if
\begin{align*}
J(g\cdot x)=\operatorname{Ad}^*_{g}J(x)
\end{align*}
for all $g\in G$ and $x\in M$, where $\operatorname{Ad}^*$ is the left coadjoint action $(\operatorname{Ad}^*_g\mu)(\xi)=\mu(\operatorname{Ad}_{g^{-1}}\xi)$ defined below.
[/definition]
The inverse sits inside the definition of $\operatorname{Ad}^*_g$; this is what makes $g\mapsto\operatorname{Ad}^*_g$ a left action, so that it can match the left action on $M$. Authors who write $\operatorname{Ad}^*_g$ for the plain dual map $(\operatorname{Ad}_g)^*$ state the same condition as $J(g\cdot x)=\operatorname{Ad}^*_{g^{-1}}J(x)$; once the notation is unpacked, both say $J_\xi(g\cdot x)=J_{\operatorname{Ad}_{g^{-1}}\xi}(x)$. For the rotation action on $T^*\mathbb R^3$ with angular momentum map $J(q,p)=q\times p$, this is the statement $J(Rq,Rp)=R\,J(q,p)$. Equivariance says that momentum transforms under the natural dual action of $G$ on $\mathfrak g^*$, and it is the condition that makes the components of the momentum map reproduce the Lie algebra bracket through the Poisson bracket.
More explicitly, use the left-action convention $\xi_M(x)=\frac{d}{ds}\big|_{s=0}\exp(s\xi)\cdot x$ and the momentum-map sign $dJ_\xi=\iota_{\xi_M}\omega$. Differentiating $J_\xi(\exp(s\eta)\cdot x)=J(x)(\operatorname{Ad}_{\exp(-s\eta)}\xi)$ at $s=0$ gives $dJ_\xi(\eta_M)=J(x)(-[\eta,\xi])=J_{[\xi,\eta]}(x)$. Hence an equivariant momentum map satisfies the infinitesimal identity
\begin{align*}
\{J_\xi,J_\eta\}=J_{[\xi,\eta]}
\end{align*}
for all $\xi,\eta\in\mathfrak g$, where the Poisson bracket is determined by $\{F,K\}=dF(X_K)$ and $\iota_{X_f}\omega=df$. This identity is the sign check that links equivariance, the chosen coadjoint action, and the later Lie--Poisson bracket convention: it says that $J$ carries the Poisson bracket on $M$ to the bracket $\mu([dF_\mu,dK_\mu])$ on $\mathfrak g^*$.
The next question is whether these Hamiltonian generators are constants of motion for a given Hamiltonian $H$. The answer cannot follow from the existence of $J$ alone: $J_\xi$ generates the symmetry direction, while conservation asks how $J_\xi$ changes along the separate vector field $X_H$. The missing hypothesis is that $H$ is unchanged along every symmetry direction, so that the derivative of $H$ on each infinitesimal generator vanishes. The following theorem is the Hamiltonian version of Noether's theorem and turns the momentum-map identity into an actual conservation law.
[quotetheorem:6845]
[citeproof:6845]
The theorem is the Hamiltonian form of Noether's theorem. Instead of deriving the conserved quantity from a first variation formula, it identifies the conserved quantity as the Hamiltonian function generating the symmetry. The $G$-invariance hypothesis is essential: without it, $dH(\xi_M)$ need not vanish, so the component $J_\xi$ changes along the flow; a particle in a potential depending on position gives the standard failure of translation-momentum conservation. Hamiltonianity of the action is also essential, because a merely symplectic action may preserve $\omega$ without supplying global functions $J_\xi$ whose Hamiltonian vector fields are the infinitesimal generators. A concrete obstruction appears for the translation action of $S^1$ on the symplectic torus $(T^2,d\theta\wedge d\phi)$: the generator $\partial_\theta$ satisfies $\iota_{\partial_\theta}(d\theta\wedge d\phi)=d\phi$, and the closed one-form $d\phi$ is not exact on $T^2$, so no globally defined Hamiltonian component exists. The conclusion is conservation of the momentum-map value along unreduced dynamics; it is the input for Marsden--Weinstein symplectic reduction at level sets $J^{-1}(\mu)$, but it does not by itself construct the reduced phase space or prove that quotient dynamics is smooth.
[example: Cotangent Lift Momentum Map]
Let $G$ act smoothly on $Q$, and let the cotangent-lifted action act on $T^*Q$ with canonical one-form $\theta$ and canonical symplectic form $\omega=-d\theta$. For $\alpha_q\in T_q^*Q$, define $J:T^*Q\to\mathfrak g^*$ by
\begin{align*}
J(\alpha_q)(\xi)=\alpha_q(\xi_Q(q)).
\end{align*}
We verify that this function has the momentum-map differential required by the convention $dJ_\xi=\iota_{\xi_{T^*Q}}\omega$, where $J_\xi(\alpha_q)=J(\alpha_q)(\xi)$.
The canonical one-form is characterized by
\begin{align*}
\theta_{\alpha_q}(V)=\alpha_q(T_{\alpha_q}\pi(V)),
\end{align*}
where $\pi:T^*Q\to Q$ is the bundle projection. The infinitesimal generator $\xi_{T^*Q}$ projects to $\xi_Q$, so
\begin{align*}
T_{\alpha_q}\pi(\xi_{T^*Q}(\alpha_q))=\xi_Q(q).
\end{align*}
Therefore
\begin{align*}
\theta_{\alpha_q}(\xi_{T^*Q}(\alpha_q))=\alpha_q(\xi_Q(q)).
\end{align*}
By the definition of $J_\xi$, this says
\begin{align*}
\iota_{\xi_{T^*Q}}\theta=J_\xi.
\end{align*}
The cotangent lift preserves the canonical one-form, so its infinitesimal generator satisfies $\mathcal L_{\xi_{T^*Q}}\theta=0$. Cartan's formula gives
\begin{align*}
0=\mathcal L_{\xi_{T^*Q}}\theta=\iota_{\xi_{T^*Q}}d\theta+d(\iota_{\xi_{T^*Q}}\theta).
\end{align*}
Since $\omega=-d\theta$ and $\iota_{\xi_{T^*Q}}\theta=J_\xi$, this becomes
\begin{align*}
0=-\iota_{\xi_{T^*Q}}\omega+dJ_\xi.
\end{align*}
Hence
\begin{align*}
dJ_\xi=\iota_{\xi_{T^*Q}}\omega.
\end{align*}
Thus $J$ is the canonical momentum map for the cotangent lift.
For translations on $\mathbb R^n$, an element $a\in\mathbb R^n$ has infinitesimal generator $a_Q(q)=a$. Writing a covector as $p\in(\mathbb R^n)^*$, the pairing is
\begin{align*}
J(q,p)(a)=p(a).
\end{align*}
Under the Euclidean identification $(\mathbb R^n)^*\cong\mathbb R^n$, this is
\begin{align*}
J(q,p)(a)=p\cdot a.
\end{align*}
Since this holds for every $a$, the momentum map is
\begin{align*}
J(q,p)=p.
\end{align*}
For rotations on $\mathbb R^3$, identify $\xi\in\mathfrak{so}(3)$ with $\omega\in\mathbb R^3$ by $\xi v=\omega\times v$. The infinitesimal generator is
\begin{align*}
\xi_Q(q)=\omega\times q.
\end{align*}
Therefore
\begin{align*}
J(q,p)(\omega)=p\cdot(\omega\times q).
\end{align*}
Using cyclic invariance of the scalar triple product,
\begin{align*}
p\cdot(\omega\times q)=\omega\cdot(q\times p).
\end{align*}
Thus, after identifying $\mathfrak{so}(3)^*\cong\mathbb R^3$ by the Euclidean pairing, the momentum map is
\begin{align*}
J(q,p)=q\times p.
\end{align*}
The same formula $J(\alpha_q)(\xi)=\alpha_q(\xi_Q(q))$ therefore packages linear momentum and angular momentum as covectors paired with infinitesimal displacements.
[/example]
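The defining relation $dJ_\xi=\iota_{\xi_{T^*Q}}\omega$ can be checked numerically for rotations. With $\omega=\sum_i dq_i\wedge dp_i$ and $\iota_{X_f}\omega=df$, the Hamiltonian vector field of $f$ is $(\partial f/\partial p,\,-\partial f/\partial q)$, so the sketch below compares finite-difference gradients of $J_\omega(q,p)=p\cdot(\omega\times q)$ with the lifted generator $(\omega\times q,\ \omega\times p)$. The helper names and sample values are assumptions for this illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors (any indexable sequences)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def J(q, p, w):
    """Momentum component J_w(q, p) = p . (w x q)."""
    return sum(pi * x for pi, x in zip(p, cross(w, q)))

def grad(f, x, h=1e-6):
    """Central finite-difference gradient of f at the 3-vector x."""
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return tuple(g)

q, p, w = (0.3, -1.2, 0.7), (1.1, 0.4, -0.5), (0.2, 0.9, -0.6)

# Hamiltonian vector field of J_w: qdot = dJ/dp, pdot = -dJ/dq.
qdot = grad(lambda y: J(q, y, w), p)
pdot = tuple(-x for x in grad(lambda y: J(y, p, w), q))

err_q = max(abs(a - b) for a, b in zip(qdot, cross(w, q)))
err_p = max(abs(a - b) for a, b in zip(pdot, cross(w, p)))
print(err_q, err_p)
```

Both errors are at rounding level, consistent with $X_{J_\xi}=\xi_{T^*Q}$ for the cotangent-lifted rotation $(q,p)\mapsto(Rq,Rp)$.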
Cotangent lifts explain why momenta in mechanics are pairings between covectors and infinitesimal displacements. They also show that the familiar vector formulas depend on identifications such as $\mathfrak{so}(3)^*\cong\mathbb R^3$.
[example: Angular Momentum For The Rotation Action]
Let $M=T^*\mathbb R^3$ with coordinates $(q,p)$ and canonical symplectic form, and let $SO(3)$ act by
\begin{align*}
R\cdot(q,p)=(Rq,Rp).
\end{align*}
For $\omega\in\mathbb R^3$, identify the corresponding element of $\mathfrak{so}(3)$ with the linear map $v\mapsto \omega\times v$. The infinitesimal generator on the configuration space is
\begin{align*}
\xi_Q(q)=\omega\times q.
\end{align*}
Using the cotangent-lift momentum formula $J(q,p)(\omega)=p\cdot \xi_Q(q)$, we get
\begin{align*}
J(q,p)(\omega)=p\cdot(\omega\times q).
\end{align*}
By cyclic invariance of the scalar triple product,
\begin{align*}
p\cdot(\omega\times q)=\omega\cdot(q\times p).
\end{align*}
Since this identity holds for every $\omega\in\mathbb R^3$, the vector representing the momentum map under the Euclidean identification $\mathfrak{so}(3)^*\cong\mathbb R^3$ is
\begin{align*}
J(q,p)=q\times p.
\end{align*}
Now take
\begin{align*}
H(q,p)=\frac{1}{2m}|p|^2+V(|q|).
\end{align*}
For every $R\in SO(3)$, orthogonality gives $|Rp|^2=p\cdot R^TRp=p\cdot p=|p|^2$ and $|Rq|=|q|$. Therefore
\begin{align*}
H(Rq,Rp)=\frac{1}{2m}|Rp|^2+V(|Rq|)=\frac{1}{2m}|p|^2+V(|q|)=H(q,p).
\end{align*}
Thus $H$ is rotation-invariant. By *Momentum Map Conservation*, the momentum-map value is constant along the Hamiltonian flow, so
\begin{align*}
q(t)\times p(t)
\end{align*}
is conserved. This is the Hamiltonian version of angular momentum conservation for a particle in a central potential.
[/example]
This is the phase-space version of the central-force calculation. The conservation law now follows from symmetry of $H$, without returning to the Euler--Lagrange equations.
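Conservation of $J_\omega$ along the flow of $H$ is equivalent to $\{J_\omega,H\}=0$. The following sketch evaluates the canonical bracket $\{F,K\}=\sum_i \partial_{q_i}F\,\partial_{p_i}K-\partial_{p_i}F\,\partial_{q_i}K$ by finite differences. The central test potential $V(r)=r^4/4$, the non-central contrast $V=q_1$, and all helper names are assumptions made for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors (any indexable sequences)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def partial(F, q, p, slot, i, h=1e-5):
    """Central difference of F(q, p) in q_i (slot 0) or p_i (slot 1)."""
    plus, minus = [list(q), list(p)], [list(q), list(p)]
    plus[slot][i] += h
    minus[slot][i] -= h
    return (F(*plus) - F(*minus)) / (2 * h)

def poisson(F, K, q, p):
    """Canonical bracket {F, K} evaluated at (q, p)."""
    total = 0.0
    for i in range(3):
        total += partial(F, q, p, 0, i) * partial(K, q, p, 1, i)
        total -= partial(F, q, p, 1, i) * partial(K, q, p, 0, i)
    return total

def H_central(q, p):
    """Assumed test Hamiltonian |p|^2/2 + |q|^4/4 (m = 1)."""
    r2 = sum(x * x for x in q)
    return 0.5 * sum(x * x for x in p) + 0.25 * r2 * r2

def H_tilted(q, p):
    """Contrast case: potential q_1 is not rotation-invariant."""
    return 0.5 * sum(x * x for x in p) + q[0]

w = (0.2, 0.9, -0.6)

def J_w(q, p):
    """Angular momentum component w . (q x p)."""
    return sum(a * b for a, b in zip(w, cross(q, p)))

q, p = (0.3, -1.2, 0.7), (1.1, 0.4, -0.5)
bracket_central = poisson(J_w, H_central, q, p)
bracket_tilted = poisson(J_w, H_tilted, q, p)
print(bracket_central, bracket_tilted)
```

The first bracket vanishes up to discretisation error, while the second does not, mirroring the role of the invariance hypothesis in the conservation theorem.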
[example: Linear Momentum For Translations]
Let $M=T^*\mathbb R^n$ with coordinates $(q,p)$, where $q=(q_1,\ldots,q_n)$ and $p=(p_1,\ldots,p_n)$. The translation action of $\mathbb R^n$ is
\begin{align*}
a\cdot(q,p)=(q+a,p).
\end{align*}
For a fixed $a=(a_1,\ldots,a_n)$, the one-parameter subgroup generated by $a$ is $s\mapsto sa$, so the orbit through $(q,p)$ is
\begin{align*}
s\mapsto(q+sa,p).
\end{align*}
Differentiating at $s=0$ gives the infinitesimal generator
\begin{align*}
a_{T^*\mathbb R^n}(q,p)=\sum_{i=1}^n a_i\partial_{q_i}.
\end{align*}
The corresponding infinitesimal generator on the configuration space is $a_Q(q)=a$. Using the cotangent-lift momentum formula, the momentum map satisfies
\begin{align*}
J(q,p)(a)=p(a).
\end{align*}
Writing the covector $p$ in coordinates as $p=\sum_{i=1}^n p_i\,dq_i$, we have
\begin{align*}
p(a)=\sum_{i=1}^n p_i\,dq_i\left(\sum_{j=1}^n a_j\partial_{q_j}\right).
\end{align*}
Since $dq_i(\partial_{q_j})=\delta_{ij}$, this becomes
\begin{align*}
p(a)=\sum_{i=1}^n\sum_{j=1}^n p_i a_j\delta_{ij}=\sum_{i=1}^n p_i a_i=p\cdot a.
\end{align*}
Under the Euclidean identification $(\mathbb R^n)^*\cong\mathbb R^n$, if $J(q,p)$ is represented by a vector $P$, then
\begin{align*}
P\cdot a=J(q,p)(a)=p\cdot a
\end{align*}
for every $a\in\mathbb R^n$. Taking $a=P-p$ gives
\begin{align*}
|P-p|^2=(P-p)\cdot(P-p)=0,
\end{align*}
so $P=p$. Thus, in this identification,
\begin{align*}
J(q,p)=p.
\end{align*}
For the free Hamiltonian
\begin{align*}
H(q,p)=\frac{1}{2m}|p|^2=\frac{1}{2m}\sum_{i=1}^n p_i^2,
\end{align*}
translations leave $H$ unchanged because
\begin{align*}
H(q+a,p)=\frac{1}{2m}|p|^2=H(q,p).
\end{align*}
By *Momentum Map Conservation*, the momentum-map value is constant along the Hamiltonian flow, hence $p(t)$ is conserved. More generally, if $H(q,p)=K(p)$ has no $q$-dependence, then $H(q+a,p)=K(p)=H(q,p)$, so the same conservation argument gives constant linear momentum.
[/example]
The same definition gives both linear and angular momentum. What changes is the Lie algebra acting on the configuration space and therefore the infinitesimal generator paired with the covector $p$.
## Coadjoint Geometry And The Lie--Poisson Bracket
Momentum maps naturally take values in $\mathfrak g^*$, not in $\mathfrak g$. To understand reduced Hamiltonian dynamics, we need the intrinsic Poisson geometry of $\mathfrak g^*$ itself.
There is an immediate obstruction to treating $\mathfrak g^*$ as an ordinary symplectic phase space. A symplectic form must be non-degenerate and therefore lives on an even-dimensional manifold, while $\mathfrak g^*$ may have odd dimension; even when its dimension is even, the natural reduced bracket is usually degenerate. The correct geometric picture is that $\mathfrak g^*$ is a Poisson manifold whose symplectic leaves are coadjoint orbits. The coadjoint action is therefore the structure that describes where reduced Hamiltonian motion is allowed to move.
[definition: Coadjoint Action]
Let $G$ be a Lie group with Lie algebra $\mathfrak g$. The coadjoint action is the smooth action
\begin{align*}
\operatorname{Ad}^*:G\times\mathfrak g^*\to\mathfrak g^*
\end{align*}
defined by
\begin{align*}
(\operatorname{Ad}^*_g\mu)(\xi)=\mu(\operatorname{Ad}_{g^{-1}}\xi)
\end{align*}
for $g\in G$, $\mu\in\mathfrak g^*$, and $\xi\in\mathfrak g$.
[/definition]
Coadjoint orbits are the natural symmetry leaves in the space of momenta. Reduced Hamiltonian motion should stay on these leaves, so we need a Poisson bracket on $\mathfrak g^*$ whose Hamiltonian vector fields are tangent to coadjoint orbits. This motivates the Lie--Poisson bracket, which assigns a new smooth function to each pair of smooth functions on $\mathfrak g^*$.
[definition: Lie Poisson Bracket]
Let $\mathfrak g$ be a finite-dimensional Lie algebra. The Lie--Poisson bracket is the bilinear map
\begin{align*}
\{\cdot,\cdot\}_{\mathfrak g^*}:C^\infty(\mathfrak g^*)\times C^\infty(\mathfrak g^*)\to C^\infty(\mathfrak g^*)
\end{align*}
defined for $F,K\in C^\infty(\mathfrak g^*)$ by
\begin{align*}
\{F,K\}_{\mathfrak g^*}(\mu)=\mu\left([dF_\mu,dK_\mu]\right),
\end{align*}
where $dF_\mu,dK_\mu\in\mathfrak g$ under the natural identification $T_\mu^*\mathfrak g^*\cong\mathfrak g$.
[/definition]
Some authors use the negative of this bracket, depending on left versus right invariance conventions. The sign must be kept consistent with the chosen momentum map and Hamiltonian vector field convention. The next theorem turns the bracket into an explicit differential equation on $\mathfrak g^*$.
[quotetheorem:6846]
[citeproof:6846]
The equation says that Lie--Poisson dynamics moves momentum along coadjoint orbits. The hypotheses matter: finite-dimensionality identifies $T^*_\mu\mathfrak g^*$ with $\mathfrak g$ without functional-analytic completions, and smoothness of $h$ is what makes $dh_\mu$ a well-defined element of the Lie algebra. The sign in the displayed equation is convention-dependent; changing the Lie--Poisson bracket sign or the momentum-map convention reverses the coadjoint term. The equation is also only the reduced Poisson dynamics on $\mathfrak g^*$: it does not reconstruct the original curve in the Lie group or phase space without an additional reconstruction equation.
[illustration:so3-coadjoint-orbits]
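For $\mathfrak{so}(3)\cong\mathbb R^3$ with the cross product as Lie bracket, the Lie--Poisson bracket defined above reads $\{F,K\}(\mu)=\mu\cdot(\nabla F\times\nabla K)$. The sketch below checks two consequences that do not depend on the overall sign convention: the bracket is antisymmetric, and $C(\mu)=|\mu|^2/2$ has vanishing bracket with every function tested, which is why motion stays on spheres. The diagonal $I^{-1}$ and the sample functions are assumptions chosen for the illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lie_poisson(grad_F, grad_K, mu):
    """{F, K}(mu) = mu . (dF_mu x dK_mu), using [x, y] = x cross y."""
    return dot(mu, cross(grad_F(mu), grad_K(mu)))

def grad_casimir(mu):
    """Gradient of C(mu) = |mu|^2 / 2."""
    return mu

def grad_energy(mu, inv_inertia=(1.0, 0.5, 0.25)):
    """Gradient of h(mu) = mu . I^{-1} mu / 2 for an assumed diagonal I^{-1}."""
    return tuple(a * x for a, x in zip(inv_inertia, mu))

def grad_first_component(mu):
    """Gradient of the linear function K(mu) = mu_1."""
    return (1.0, 0.0, 0.0)

mu = (0.4, -1.0, 2.0)
casimir_with_energy = lie_poisson(grad_casimir, grad_energy, mu)
casimir_with_linear = lie_poisson(grad_casimir, grad_first_component, mu)
energy_with_linear = lie_poisson(grad_energy, grad_first_component, mu)
linear_with_energy = lie_poisson(grad_first_component, grad_energy, mu)
print(casimir_with_energy, casimir_with_linear, energy_with_linear)
```

The brackets involving $C$ vanish because $\mu\cdot(\mu\times v)=0$ for every $v$, while a generic pair such as $h$ and $\mu_1$ has a nonzero bracket.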
[example: Rigid Body In Body Coordinates]
For a free rigid body, take $G=SO(3)$ and identify $\mathfrak{so}(3)^*\cong\mathbb R^3$ by the Euclidean dot product. Let $\Pi\in\mathbb R^3$ be the body angular momentum, and let the reduced Hamiltonian be
\begin{align*}
h(\Pi)=\frac{1}{2}\Pi\cdot I^{-1}\Pi,
\end{align*}
where the inertia tensor $I$ is symmetric and positive definite in body coordinates.
To compute $dh_\Pi$, test it on an arbitrary variation $\eta\in\mathbb R^3$. Note that $I^{-1}$ is symmetric because $I$ is. By the definition of $h$,
\begin{align*}
h(\Pi+s\eta)=\frac{1}{2}(\Pi+s\eta)\cdot I^{-1}(\Pi+s\eta).
\end{align*}
Expanding the dot product gives
\begin{align*}
h(\Pi+s\eta)=\frac{1}{2}\Pi\cdot I^{-1}\Pi+\frac{s}{2}\eta\cdot I^{-1}\Pi+\frac{s}{2}\Pi\cdot I^{-1}\eta+\frac{s^2}{2}\eta\cdot I^{-1}\eta.
\end{align*}
By symmetry of $I^{-1}$, $\eta\cdot I^{-1}\Pi=\Pi\cdot I^{-1}\eta$, so
\begin{align*}
h(\Pi+s\eta)=h(\Pi)+s\,\eta\cdot I^{-1}\Pi+\frac{s^2}{2}\eta\cdot I^{-1}\eta.
\end{align*}
Therefore
\begin{align*}
dh_\Pi(\eta)=\frac{d}{ds}\Big|_{s=0}h(\Pi+s\eta)=\eta\cdot I^{-1}\Pi.
\end{align*}
Under the Euclidean identification $T_\Pi^*\mathbb R^3\cong\mathbb R^3$, this means
\begin{align*}
dh_\Pi=I^{-1}\Pi.
\end{align*}
Writing
\begin{align*}
\Omega=I^{-1}\Pi,
\end{align*}
the differential of $h$ is the body angular velocity.
For $\mathfrak{so}(3)\cong\mathbb R^3$, the Lie bracket is the cross product:
\begin{align*}
[\xi,\eta]=\xi\times\eta.
\end{align*}
The Lie--Poisson equation from *Lie Poisson Hamilton Equations* is
\begin{align*}
\dot\Pi=-\operatorname{ad}^*_{\Omega}\Pi.
\end{align*}
To identify the vector on the right, pair it with an arbitrary $\eta\in\mathbb R^3$. By the definition of $\operatorname{ad}^*$,
\begin{align*}
(-\operatorname{ad}^*_{\Omega}\Pi)\cdot\eta=\Pi\cdot[\Omega,\eta].
\end{align*}
Using $[\Omega,\eta]=\Omega\times\eta$, this becomes
\begin{align*}
(-\operatorname{ad}^*_{\Omega}\Pi)\cdot\eta=\Pi\cdot(\Omega\times\eta).
\end{align*}
Cyclic invariance of the scalar triple product gives
\begin{align*}
\Pi\cdot(\Omega\times\eta)=\eta\cdot(\Pi\times\Omega).
\end{align*}
Thus
\begin{align*}
(-\operatorname{ad}^*_{\Omega}\Pi)\cdot\eta=(\Pi\times\Omega)\cdot\eta.
\end{align*}
Since this holds for every $\eta$, we obtain Euler's equation in body coordinates:
\begin{align*}
\dot\Pi=\Pi\times\Omega.
\end{align*}
The squared length of body angular momentum is constant along this motion, because
\begin{align*}
\frac{d}{dt}|\Pi|^2=2\Pi\cdot\dot\Pi.
\end{align*}
Substituting Euler's equation gives
\begin{align*}
\frac{d}{dt}|\Pi|^2=2\Pi\cdot(\Pi\times\Omega).
\end{align*}
The vector $\Pi\times\Omega$ is perpendicular to $\Pi$, so
\begin{align*}
\frac{d}{dt}|\Pi|^2=0.
\end{align*}
Thus the reduced rigid-body motion stays on the spheres $|\Pi|^2=\text{constant}$, which are the coadjoint orbits of $SO(3)$ under the Euclidean identification $\mathfrak{so}(3)^*\cong\mathbb R^3$.
[/example]
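Euler's equation $\dot\Pi=\Pi\times I^{-1}\Pi$ is easy to integrate numerically, which gives a direct check that $|\Pi|^2$ and $h$ stay constant while $\Pi$ itself moves. The sketch below assumes principal-axis coordinates with moments of inertia $(1,2,3)$ and uses a classical fourth-order Runge--Kutta step; both are illustrative choices, and a general-purpose integrator only conserves these quantities up to its truncation error.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def euler_rhs(Pi, inertia):
    """Right-hand side Pi x Omega with Omega = I^{-1} Pi, I diagonal (assumed)."""
    Omega = tuple(x / I for x, I in zip(Pi, inertia))
    return cross(Pi, Omega)

def rk4_step(Pi, inertia, dt):
    """One classical Runge-Kutta step for Euler's equation."""
    def shifted(k, s):
        return tuple(x + s * y for x, y in zip(Pi, k))
    k1 = euler_rhs(Pi, inertia)
    k2 = euler_rhs(shifted(k1, dt / 2), inertia)
    k3 = euler_rhs(shifted(k2, dt / 2), inertia)
    k4 = euler_rhs(shifted(k3, dt), inertia)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(Pi, k1, k2, k3, k4))

def casimir(Pi):
    """Squared length |Pi|^2."""
    return sum(x * x for x in Pi)

def energy(Pi, inertia):
    """Reduced Hamiltonian h = Pi . I^{-1} Pi / 2."""
    return 0.5 * sum(x * x / I for x, I in zip(Pi, inertia))

inertia = (1.0, 2.0, 3.0)
Pi0 = (1.0, 0.2, 0.5)
Pi = Pi0
for _ in range(2000):
    Pi = rk4_step(Pi, inertia, 1e-3)

casimir_drift = abs(casimir(Pi) - casimir(Pi0))
energy_drift = abs(energy(Pi, inertia) - energy(Pi0, inertia))
moved = max(abs(a - b) for a, b in zip(Pi, Pi0))
print(casimir_drift, energy_drift, moved)
```

The trajectory therefore lies on the intersection of a sphere $|\Pi|^2=\text{constant}$ with an energy ellipsoid $h=\text{constant}$, up to integration error.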
The rigid body example is the bridge from Noether conservation to reduced dynamics. Spatial angular momentum is conserved by rotational symmetry, while body angular momentum evolves by the coadjoint geometry induced by expressing the motion in a moving frame.
[remark: Momentum Maps As Organising Data]
A momentum map simultaneously records conserved quantities, the infinitesimal generators of symmetries, and the map from canonical phase space to reduced Poisson space. In applications, finding $J$ is often the decisive step: it identifies which quantities survive reduction and which variables describe the reduced motion.
[/remark]
Conserved quantities do more than simplify equations: they also make reduction possible. Chapter 6 uses momentum maps to quotient out symmetry directions and describe the resulting reduced dynamics on smaller symplectic or Poisson spaces.
# 6. Symplectic Reduction and Mechanical Systems with Symmetry
Symmetry changes the way Hamiltonian mechanics is organized: conserved quantities are not only first integrals, but also coordinates along directions that can be removed. In Chapter 5, momentum maps recorded the infinitesimal effect of group actions on phase space. This chapter asks how a Hamiltonian system descends to a smaller phase space after fixing a conserved momentum value, how solutions are reconstructed upstairs, and how symmetry helps identify steady motions that are not equilibria in the full phase space.
## Fixing Momentum and Dividing by Symmetry
The main reduction problem is this: if a Lie group acts by symmetries on a Hamiltonian system, then motion starting with a fixed value of the momentum map remains at that value; what geometric object carries the remaining dynamics after the redundant group directions are removed?
Let $(M,\omega)$ be a symplectic manifold, let $G$ be a Lie group acting smoothly on $M$, and write $\mathfrak g$ for its Lie algebra. For $\xi \in \mathfrak g$, the infinitesimal generator is the vector field $\xi_M$ on $M$ defined by the action curve $t \mapsto \exp(t\xi)\cdot m$.
[definition: Hamiltonian Group Action]
A smooth action $G \curvearrowright M$ on a symplectic manifold $(M,\omega)$ is Hamiltonian if there is a smooth map $J:M\to \mathfrak g^*$ such that, for every $\xi \in \mathfrak g$, the function $J_\xi:M\to \mathbb R$ defined by $J_\xi(m)=J(m)(\xi)$ satisfies
\begin{align*}
i_{\xi_M}\omega = dJ_\xi .
\end{align*}
[/definition]
The map $J$ is the momentum map. The equation says that infinitesimal symmetry directions are Hamiltonian vector fields for the components of momentum, so fixing $J$ is the geometric replacement for fixing linear or angular momentum.
[example: Translation Symmetry in Cotangent Space]
Let $Q=\mathbb R^n$ and identify $M=T^*Q$ with $\mathbb R^n_q\times \mathbb R^n_p$. Use canonical coordinates $(q_1,\ldots,q_n,p_1,\ldots,p_n)$ and the convention
\begin{align*}
\omega=\sum_{i=1}^n dq_i\wedge dp_i .
\end{align*}
The translation action of $\mathbb R^n$ is $a\cdot(q,p)=(q+a,p)$. For $\xi=(\xi_1,\ldots,\xi_n)\in\mathbb R^n$, the action curve through $(q,p)$ is
\begin{align*}
t\mapsto \exp(t\xi)\cdot(q,p)=(q+t\xi,p),
\end{align*}
so differentiating at $t=0$ gives
\begin{align*}
\xi_M(q,p)=\sum_{i=1}^n \xi_i\frac{\partial}{\partial q_i}.
\end{align*}
We compute the contraction with $\omega$ term by term. Since $dq_i(\xi_M)=\xi_i$ and $dp_i(\xi_M)=0$, the identity $i_X(\alpha\wedge\beta)=\alpha(X)\beta-\beta(X)\alpha$ gives
\begin{align*}
i_{\xi_M}(dq_i\wedge dp_i)=\xi_i\,dp_i-0\cdot dq_i=\xi_i\,dp_i.
\end{align*}
Therefore
\begin{align*}
i_{\xi_M}\omega=\sum_{i=1}^n \xi_i\,dp_i.
\end{align*}
On the other hand,
\begin{align*}
p\cdot \xi=\sum_{i=1}^n p_i\xi_i,
\end{align*}
and $\xi_i$ is constant with respect to the phase-space variables, so
\begin{align*}
d(p\cdot\xi)=\sum_{i=1}^n \xi_i\,dp_i.
\end{align*}
Thus $i_{\xi_M}\omega=d(p\cdot\xi)$ for every $\xi$, so the momentum map is
\begin{align*}
J(q,p)=p\in(\mathbb R^n)^*\cong\mathbb R^n.
\end{align*}
Translation symmetry therefore recovers ordinary linear momentum: the momentum component in direction $\xi$ is exactly $p\cdot\xi$.
[/example]
This example shows that the momentum value is not an arbitrary constraint; it is tied to a symmetry direction. After fixing a momentum value $\mu\in\mathfrak g^*$, the next problem is to identify which group elements still preserve the level set $J^{-1}(\mu)$, because only those elements can be quotiented while staying inside the fixed-momentum surface.
[definition: Coadjoint Stabiliser]
For $\mu \in \mathfrak g^*$, the coadjoint stabiliser is
\begin{align*}
G_\mu = \{g\in G : \operatorname{Ad}^*_{g^{-1}}\mu=\mu\}.
\end{align*}
[/definition]
The reduced space is formed by restricting to the constraint $J=\mu$ and then quotienting only by $G_\mu$. This is the symplectic analogue of first imposing a conserved value and then forgetting the remaining symmetry variables.
[definition: Marsden-Weinstein Reduced Space]
Let $(M,\omega)$ be a Hamiltonian $G$-space with momentum map $J:M\to\mathfrak g^*$. For $\mu\in\mathfrak g^*$, the Marsden-Weinstein reduced space at $\mu$ is
\begin{align*}
M_\mu = J^{-1}(\mu)/G_\mu,
\end{align*}
whenever the quotient is a smooth manifold.
[/definition]
The definition gives the candidate quotient, but a quotient of a symplectic manifold need not remain symplectic: the restricted two-form may have kernel directions along group orbits, and singular level sets or non-free actions may prevent the quotient from being a manifold. The reduction problem is to show that, after imposing the right regularity and freeness hypotheses, exactly those orbit directions are removed and a nondegenerate two-form descends to the quotient.
[quotetheorem:6847]
[citeproof:6847]
This theorem explains why reduction is not merely a quotient construction. The symplectic form on the quotient is forced by the original symplectic form, so reduced trajectories remain Hamiltonian trajectories rather than just projected curves. Each hypothesis has a concrete role. Regularity makes $J^{-1}(\mu)$ a smooth constraint surface; at a critical momentum value the level set may have singularities. Equivariance ensures that the correct residual symmetry is $G_\mu$ and that the kernel of $i_\mu^*\omega$ is the orbit distribution. Freeness and properness make the orbit space a manifold; if stabilisers remain, the quotient is usually stratified rather than smooth. The theorem also does not say that the quotient by all of $G$ is available: unless $\mu$ is fixed by the whole coadjoint action, a general element of $G$ moves $J^{-1}(\mu)$ to another momentum level.
[illustration:marsden-weinstein-reduction]
[example: Rotational Reduction in a Central Force Problem]
For a particle in $\mathbb R^3_0$ with Hamiltonian $H(q,p)=\frac{1}{2m}|p|^2+V(|q|)$, the rotation action of $SO(3)$ on $T^*\mathbb R^3_0$ has momentum map $J(q,p)=q\times p\in\mathfrak{so}(3)^*\cong\mathbb R^3$. Fix $\mu\neq 0$ and write $\ell=|\mu|$. On the level set $J(q,p)=\mu$, we have
\begin{align*}
q\cdot \mu=q\cdot(q\times p)=0.
\end{align*}
Also,
\begin{align*}
p\cdot \mu=p\cdot(q\times p)=0.
\end{align*}
Thus both $q$ and $p$ lie in the plane perpendicular to $\mu$.
Let $r=|q|$ and define the radial unit vector $e_r=q/r$. Decompose the momentum into its radial and angular parts in this plane:
\begin{align*}
p=p_r e_r+p_\perp,\qquad p_r=p\cdot e_r,\qquad p_\perp\cdot e_r=0.
\end{align*}
Since $q=r e_r$, the radial part contributes no angular momentum:
\begin{align*}
q\times(p_r e_r)=r e_r\times(p_r e_r)=rp_r(e_r\times e_r)=0.
\end{align*}
Therefore
\begin{align*}
q\times p=q\times p_\perp.
\end{align*}
Because $q$ is perpendicular to $p_\perp$, the norm of the cross product is
\begin{align*}
|q\times p_\perp|^2=|q|^2|p_\perp|^2=r^2|p_\perp|^2.
\end{align*}
Using $q\times p=\mu$, this gives
\begin{align*}
\ell^2=r^2|p_\perp|^2.
\end{align*}
Hence
\begin{align*}
|p_\perp|^2=\frac{\ell^2}{r^2}.
\end{align*}
Since $p_r e_r$ is perpendicular to $p_\perp$,
\begin{align*}
|p|^2=p_r^2+|p_\perp|^2=p_r^2+\frac{\ell^2}{r^2}.
\end{align*}
Substituting this into the Hamiltonian gives the reduced radial energy
\begin{align*}
H_{\mathrm{rad}}(r,p_r)=\frac{p_r^2}{2m}+V(r)+\frac{\ell^2}{2mr^2}.
\end{align*}
Thus the effective potential is
\begin{align*}
V_{\mathrm{eff}}(r)=V(r)+\frac{\ell^2}{2mr^2}.
\end{align*}
The stabiliser $SO(3)_\mu$ consists of rotations around the axis spanned by $\mu$, so quotienting by $SO(3)_\mu$ removes the remaining angular coordinate in the plane perpendicular to $\mu$. Fixing only $|J|=\ell$ would instead allow every angular momentum direction on the sphere of radius $\ell$, so that set is a union of momentum level sets over a whole coadjoint orbit rather than one Marsden-Weinstein level set.
[/example]
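The norm identity $|p|^2=p_r^2+\ell^2/r^2$ that produces the effective potential in the example above can be spot-checked numerically. The following is a minimal sketch in plain Python; the helper names `cross`, `dot`, and `radial_split`, the random seed, and the sampling range are illustrative assumptions and are not part of the course material.

```python
import math
import random

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def radial_split(q, p):
    """Return (r, p_r, ell) with r = |q|, p_r = p . e_r, ell = |q x p|."""
    r = math.sqrt(dot(q, q))
    p_r = dot(p, q) / r
    j = cross(q, p)
    ell = math.sqrt(dot(j, j))
    return r, p_r, ell

rng = random.Random(0)  # fixed seed so the check is reproducible
max_gap = 0.0
for _ in range(1000):
    q = tuple(rng.uniform(-2.0, 2.0) for _ in range(3))
    p = tuple(rng.uniform(-2.0, 2.0) for _ in range(3))
    r, p_r, ell = radial_split(q, p)
    if r < 0.1:
        continue  # stay away from the excluded origin
    # Compare |p|^2 with p_r^2 + ell^2 / r^2.
    gap = abs(dot(p, p) - (p_r**2 + ell**2 / r**2))
    max_gap = max(max_gap, gap)
```

The quantity `max_gap` should be at the level of floating-point rounding, which is consistent with the decomposition holding at every phase-space point with $q\neq 0$, not only on one momentum level.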
## Reduced Hamiltonians and Reconstruction
Once the reduced phase space has been constructed, the next question is dynamical: which Hamiltonians descend to the quotient, and how much of the original motion can be recovered from a reduced solution?
A Hamiltonian descends only when it gives the same energy to points representing the same physical state after quotienting. This requirement is expressed by invariance under the symmetry group, and it is the dynamical assumption behind the conservation law used in reduction.
[definition: Invariant Hamiltonian]
Let $G\curvearrowright M$ be a smooth action. A Hamiltonian $H:M\to\mathbb R$ is $G$-invariant if
\begin{align*}
H(g\cdot m)=H(m)
\end{align*}
for all $g\in G$ and $m\in M$.
[/definition]
Invariance is the condition that the Hamiltonian assigns the same energy to physically equivalent points. The next issue is to prove that this symmetry condition has two consequences at once: momentum stays fixed, and the Hamiltonian vector field projects to the reduced quotient.
[quotetheorem:6848]
[citeproof:6848]
The theorem gives a genuine reduced Hamiltonian system, but the hypotheses are again doing essential work. If $H$ is not $G$-invariant, then $dH(\xi_M)$ need not vanish, so the momentum components $J_\xi$ need not be conserved and the flow may leave $J^{-1}(\mu)$. If $\mu$ is singular or if the $G_\mu$-action is not free and proper, the formula for $H_\mu$ may still make formal sense on orbit classes, but there may be no smooth reduced phase space carrying an ordinary Hamiltonian vector field. The theorem also does not reconstruct the discarded group variables; it identifies only the projected motion on the quotient. The next definition isolates this missing part of the problem: after finding the reduced curve, we must recover the group motion that was removed by the quotient.
[definition: Reconstruction Problem]
Let $I\subset\mathbb R$ be an interval. Given a solution curve $\bar m:I\to M_\mu$ of the reduced Hamiltonian system on $M_\mu$, the reconstruction problem is to find a curve $m:I\to J^{-1}(\mu)$ such that $\pi_\mu(m(t))=\bar m(t)$ for all $t\in I$ and $m:I\to M$ solves the original Hamiltonian system.
[/definition]
The precise reconstruction equation depends on a choice of connection on the principal bundle $J^{-1}(\mu)\to M_\mu$. Conceptually, the reduced curve determines the horizontal part of the motion, while the conserved momentum determines the vertical group drift.
[example: Spherical Pendulum Reduced by Axial Symmetry]
Use spherical coordinates $(\theta,\varphi)$ on the unit sphere, with $\theta$ the polar angle from the vertical axis and $\varphi$ the azimuthal angle. In the unit-length convention, the kinetic energy is
\begin{align*}
T=\frac{m}{2}\dot\theta^2+\frac{m}{2}\sin^2\theta\,\dot\varphi^2.
\end{align*}
The momentum conjugate to $\varphi$ is
\begin{align*}
p_\varphi=\frac{\partial T}{\partial \dot\varphi}=m\sin^2\theta\,\dot\varphi.
\end{align*}
Rotations around the vertical axis translate $\varphi$, so $p_\varphi$ is the momentum map component for the $S^1$ symmetry. Fix $p_\varphi=\mu$. Then
\begin{align*}
\dot\varphi=\frac{\mu}{m\sin^2\theta}.
\end{align*}
The Hamiltonian in these coordinates is
\begin{align*}
H(\theta,\varphi,p_\theta,p_\varphi)=\frac{p_\theta^2}{2m}+\frac{p_\varphi^2}{2m\sin^2\theta}+V(\theta).
\end{align*}
Restricting to the momentum level $p_\varphi=\mu$ gives
\begin{align*}
H_\mu(\theta,p_\theta)=\frac{p_\theta^2}{2m}+V(\theta)+\frac{\mu^2}{2m\sin^2\theta}.
\end{align*}
Thus the reduced motion is the one-degree-of-freedom polar motion with effective potential
\begin{align*}
V_{\mathrm{eff}}(\theta)=V(\theta)+\frac{\mu^2}{2m\sin^2\theta}.
\end{align*}
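If the pendulum has length $L$ instead of $1$, both kinetic terms acquire a factor $L^2$: the polar term $p_\theta^2/(2m)$ becomes $p_\theta^2/(2mL^2)$, and the factor $m\sin^2\theta$ is replaced by $mL^2\sin^2\theta$, so the angular term becomes $\mu^2/(2mL^2\sin^2\theta)$.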
Once $\theta(t)$ is known from the reduced equation, the removed azimuthal variable is reconstructed by integrating
\begin{align*}
\varphi(t)=\varphi(0)+\int_0^t \frac{\mu}{m\sin^2\theta(s)}\,ds.
\end{align*}
The quotient has removed the cyclic angle $\varphi$, while the fixed momentum value $\mu$ determines exactly how that angle drifts in the original spherical pendulum.
[/example]
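The reduce-then-reconstruct procedure of the example above can be sketched numerically. The code below is a hedged illustration, not a definitive implementation: it assumes unit mass and length, the specific potential $V(\theta)=mg\cos\theta$ with $\theta$ measured from the upward vertical (the text leaves $V$ general), and a hand-written fourth-order Runge-Kutta step. It integrates the reduced equations for $(\theta,p_\theta)$ and carries the reconstruction quadrature for $\varphi$ along as a third component.

```python
import math

# Assumed illustrative parameters: unit mass and length, V(theta) = m*g*cos(theta).
m, g, mu = 1.0, 9.81, 2.0

def v_eff(theta):
    """Effective potential V(theta) + mu^2 / (2 m sin^2 theta)."""
    return m * g * math.cos(theta) + mu**2 / (2.0 * m * math.sin(theta)**2)

def reduced_energy(theta, p_theta):
    return p_theta**2 / (2.0 * m) + v_eff(theta)

def rhs(state):
    """Reduced equations for (theta, p_theta) plus the reconstruction ODE for phi."""
    theta, p_theta, phi = state
    s, c = math.sin(theta), math.cos(theta)
    dtheta = p_theta / m
    dp_theta = m * g * s + mu**2 * c / (m * s**3)  # equals -dV_eff/dtheta
    dphi = mu / (m * s**2)                         # azimuthal drift fixed by mu
    return (dtheta, dp_theta, dphi)

def rk4_step(state, dt):
    def shifted(base, slope, h):
        return tuple(x + h * y for x, y in zip(base, slope))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (1.0, 0.0, 0.0)  # theta(0), p_theta(0), phi(0)
e0 = reduced_energy(state[0], state[1])
dt, steps = 2e-4, 10000
max_drift = 0.0
phi_increasing = True
for _ in range(steps):
    new_state = rk4_step(state, dt)
    phi_increasing = phi_increasing and new_state[2] > state[2]
    state = new_state
    max_drift = max(max_drift, abs(reduced_energy(state[0], state[1]) - e0))
```

Monitoring `max_drift` checks that the reduced energy $H_\mu$ is conserved along the computed reduced curve, and `phi_increasing` reflects that for $\mu>0$ the reconstructed angle drifts monotonically.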
A larger mechanical system often has several layers of symmetry. Translational reduction removes center-of-mass motion, while rotational reduction can then leave internal shape variables and conserved angular momentum.
[example: Two-Body Problem Reduced by Translations and Rotations]
Let the masses be $m_1,m_2>0$, set $M=m_1+m_2$, and write
\begin{align*}
m_{\mathrm{red}}=\frac{m_1m_2}{M}.
\end{align*}
For an interaction potential depending only on separation, the Hamiltonian is
\begin{align*}
H(q_1,q_2,p_1,p_2)=\frac{|p_1|^2}{2m_1}+\frac{|p_2|^2}{2m_2}+V(|q_1-q_2|).
\end{align*}
Introduce the center-of-mass and relative coordinates
\begin{align*}
R=\frac{m_1q_1+m_2q_2}{M},\qquad r=q_1-q_2.
\end{align*}
Solving these equations for $q_1$ and $q_2$ gives
\begin{align*}
q_1=R+\frac{m_2}{M}r,\qquad q_2=R-\frac{m_1}{M}r.
\end{align*}
The corresponding canonical momenta are
\begin{align*}
P=p_1+p_2,\qquad p_r=\frac{m_2}{M}p_1-\frac{m_1}{M}p_2,
\end{align*}
because substituting $dq_1=dR+\frac{m_2}{M}dr$ and $dq_2=dR-\frac{m_1}{M}dr$ gives
\begin{align*}
p_1\cdot dq_1+p_2\cdot dq_2=(p_1+p_2)\cdot dR+\left(\frac{m_2}{M}p_1-\frac{m_1}{M}p_2\right)\cdot dr.
\end{align*}
The inverse momentum formulas are
\begin{align*}
p_1=\frac{m_1}{M}P+p_r,\qquad p_2=\frac{m_2}{M}P-p_r.
\end{align*}
Substituting these into the kinetic energy gives
\begin{align*}
\frac{|p_1|^2}{2m_1}=\frac{1}{2m_1}\left|\frac{m_1}{M}P+p_r\right|^2.
\end{align*}
Expanding the square,
\begin{align*}
\frac{|p_1|^2}{2m_1}=\frac{m_1}{2M^2}|P|^2+\frac{1}{M}P\cdot p_r+\frac{|p_r|^2}{2m_1}.
\end{align*}
Similarly,
\begin{align*}
\frac{|p_2|^2}{2m_2}=\frac{1}{2m_2}\left|\frac{m_2}{M}P-p_r\right|^2.
\end{align*}
Expanding this square,
\begin{align*}
\frac{|p_2|^2}{2m_2}=\frac{m_2}{2M^2}|P|^2-\frac{1}{M}P\cdot p_r+\frac{|p_r|^2}{2m_2}.
\end{align*}
Adding the two expressions cancels the mixed terms:
\begin{align*}
\frac{|p_1|^2}{2m_1}+\frac{|p_2|^2}{2m_2}=\frac{m_1+m_2}{2M^2}|P|^2+\left(\frac{1}{2m_1}+\frac{1}{2m_2}\right)|p_r|^2.
\end{align*}
Since $m_1+m_2=M$ and $\frac{1}{m_{\mathrm{red}}}=\frac{M}{m_1m_2}=\frac{1}{m_1}+\frac{1}{m_2}$, this becomes
\begin{align*}
\frac{|p_1|^2}{2m_1}+\frac{|p_2|^2}{2m_2}=\frac{|P|^2}{2M}+\frac{|p_r|^2}{2m_{\mathrm{red}}}.
\end{align*}
Thus
\begin{align*}
H=\frac{|P|^2}{2M}+\frac{|p_r|^2}{2m_{\mathrm{red}}}+V(|r|).
\end{align*}
Fixing the translation momentum $P$ fixes the center-of-mass velocity $\dot R=P/M$, so the internal dynamics is governed by
\begin{align*}
H_{\mathrm{rel}}(r,p_r)=\frac{|p_r|^2}{2m_{\mathrm{red}}}+V(|r|).
\end{align*}
Now fix the rotational momentum
\begin{align*}
\ell=r\times p_r.
\end{align*}
Let $\rho=|r|$ and $e_r=r/\rho$. Decompose
\begin{align*}
p_r=p_\rho e_r+p_\perp,\qquad p_\rho=p_r\cdot e_r,\qquad p_\perp\cdot e_r=0.
\end{align*}
The radial part contributes no angular momentum:
\begin{align*}
r\times(p_\rho e_r)=\rho e_r\times(p_\rho e_r)=\rho p_\rho(e_r\times e_r)=0.
\end{align*}
Therefore
\begin{align*}
\ell=r\times p_\perp.
\end{align*}
Because $r$ is perpendicular to $p_\perp$,
\begin{align*}
|\ell|^2=|r\times p_\perp|^2=|r|^2|p_\perp|^2=\rho^2|p_\perp|^2.
\end{align*}
Hence
\begin{align*}
|p_\perp|^2=\frac{|\ell|^2}{\rho^2}.
\end{align*}
Since $p_\rho e_r$ is perpendicular to $p_\perp$,
\begin{align*}
|p_r|^2=p_\rho^2+|p_\perp|^2=p_\rho^2+\frac{|\ell|^2}{\rho^2}.
\end{align*}
Substitution into $H_{\mathrm{rel}}$ gives the radial Hamiltonian
\begin{align*}
H_{\mathrm{rad}}(\rho,p_\rho)=\frac{p_\rho^2}{2m_{\mathrm{red}}}+V(\rho)+\frac{|\ell|^2}{2m_{\mathrm{red}}\rho^2}.
\end{align*}
Equivalently,
\begin{align*}
V_{\mathrm{eff}}(\rho)=V(\rho)+\frac{|\ell|^2}{2m_{\mathrm{red}}\rho^2}.
\end{align*}
Thus translation reduction removes uniform center-of-mass motion, rotational reduction removes the angular coordinate, and the remaining motion is encoded by a one-degree-of-freedom radial Hamiltonian. For the Kepler potential $V(\rho)=-k/\rho$ this radial system is the reduced description of the familiar conic-section orbits.
[/example]
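The momentum change of variables in the example above can be checked with random data. This is a minimal sketch in plain Python; the function names `to_relative` and `from_relative` and the sampling ranges are illustrative assumptions. It tests that the kinetic energy splits into center-of-mass and relative parts and that the stated inverse formulas recover $p_1$ and $p_2$.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_relative(m1, m2, p1, p2):
    """P = p1 + p2 and p_r = (m2/M) p1 - (m1/M) p2, with M = m1 + m2."""
    total = m1 + m2
    P = tuple(a + b for a, b in zip(p1, p2))
    p_r = tuple(m2 / total * a - m1 / total * b for a, b in zip(p1, p2))
    return P, p_r

def from_relative(m1, m2, P, p_r):
    """Inverse formulas p1 = (m1/M) P + p_r and p2 = (m2/M) P - p_r."""
    total = m1 + m2
    p1 = tuple(m1 / total * a + b for a, b in zip(P, p_r))
    p2 = tuple(m2 / total * a - b for a, b in zip(P, p_r))
    return p1, p2

rng = random.Random(1)  # fixed seed for reproducibility
max_energy_gap = 0.0
max_inverse_gap = 0.0
for _ in range(500):
    m1, m2 = rng.uniform(0.5, 5.0), rng.uniform(0.5, 5.0)
    p1 = tuple(rng.uniform(-3.0, 3.0) for _ in range(3))
    p2 = tuple(rng.uniform(-3.0, 3.0) for _ in range(3))
    P, p_r = to_relative(m1, m2, p1, p2)
    m_red = m1 * m2 / (m1 + m2)
    original = dot(p1, p1) / (2 * m1) + dot(p2, p2) / (2 * m2)
    split = dot(P, P) / (2 * (m1 + m2)) + dot(p_r, p_r) / (2 * m_red)
    max_energy_gap = max(max_energy_gap, abs(original - split))
    back1, back2 = from_relative(m1, m2, P, p_r)
    max_inverse_gap = max(max_inverse_gap,
                          max(abs(x - y) for x, y in zip(p1 + p2, back1 + back2)))
```

Both gaps should be at rounding level, which mirrors the cancellation of the mixed terms $\pm\frac{1}{M}P\cdot p_r$ in the algebra above.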
## Relative Equilibria and Amended Potentials
Reduction also changes the meaning of equilibrium. A point may move in the full phase space only by symmetry, while its projection to the reduced phase space is stationary; these motions are the natural steady states of symmetric mechanics.
The two examples above show reduced motion after quotienting symmetry, but a special case deserves its own language: the reduced curve may be constant. In the original phase space this corresponds not to rest, but to motion generated entirely by the group action.
[definition: Relative Equilibrium]
Let $G\curvearrowright M$ be a Hamiltonian group action with Hamiltonian $H:M\to\mathbb R$. A point $m\in M$ is a relative equilibrium if its Hamiltonian trajectory is contained in a single $G$-orbit.
[/definition]
Thus a rotating rigid body, a circular orbit, or a steadily precessing pendulum can be a relative equilibrium even though the point in the unreduced phase space is not fixed. The reduced phase space records it as an equilibrium after symmetry drift has been removed, and the next theorem makes this statement precise.
[quotetheorem:6849]
[citeproof:6849]
The theorem locates relative equilibria on the reduced phase space, but its equivalence is only as smooth as the reduction used to state it. With isotropy, a point can project to a singular stratum rather than to a point of a smooth reduced manifold, and the equilibrium condition must then be interpreted stratum by stratum. The result also does not say that the full trajectory is stationary in $M$; it may move steadily along a group orbit while all reduced observables remain constant. In computations, constructing the quotient explicitly may be more machinery than a mechanical example requires. For simple mechanical systems, the same information can often be extracted from a function on configuration space that combines the potential with the kinetic cost of fixed momentum.
[definition: Amended Potential]
Let $Q$ be a Riemannian configuration manifold with a free proper action of $G$ by isometries. Let $V:Q\to\mathbb R$ be a $G$-invariant potential, and let $L:TQ\to\mathbb R$ be the simple mechanical Lagrangian
\begin{align*}
L(q,\dot q)=\frac{1}{2}|\dot q|_q^2-V(q).
\end{align*}
For a fixed momentum value $\mu\in\mathfrak g^*$ for which the locked inertia tensor is invertible on the relevant orbit directions, the amended potential is the function $V_\mu:Q\to\mathbb R$ defined by
\begin{align*}
V_\mu(q)=V(q)+\frac{1}{2}\mu(\mathbb I(q)^{-1}\mu),
\end{align*}
where $\mathbb I(q):\mathfrak g\to\mathfrak g^*$ is the locked inertia tensor.
[/definition]
The term involving $\mathbb I(q)^{-1}$ is the kinetic cost of carrying momentum $\mu$ while remaining on the group orbit through $q$. Critical points of this potential are candidates for steady motions with fixed momentum.
[quotetheorem:6850]
[citeproof:6850]
This criterion is the version of the energy method adapted to symmetry. The positive-definiteness hypothesis is the mechanism that creates a strict local minimum of the reduced energy; if the Hessian has a negative direction, nearby reduced trajectories can move away from the candidate steady motion. If the Hessian is degenerate, the test is inconclusive rather than false, and higher-order terms or additional conserved quantities must be examined. Regularity, freeness, and properness are also built into the statement: without a smooth reduced model, the phrase "reduced shape directions" must be replaced by a stratum or slice calculation. The criterion does not claim that the full trajectory stays near a fixed point of $M$; it stays near the group orbit of the relative equilibrium.
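A small numerical sketch can make the role of critical points of the amended potential concrete. The code below assumes, purely for illustration, a planar particle of mass $m$ in the Kepler potential $V(r)=-k/r$ with rotational symmetry, so that the locked inertia is the scalar $mr^2$ and $V_\mu(r)=-k/r+\mu^2/(2mr^2)$. The parameter values and the hand-written `bisect` helper are assumptions of the sketch. It locates the critical radius and compares it with the circular-orbit value $\mu^2/(mk)$ obtained by solving $V_\mu'(r)=0$ by hand.

```python
# Assumed illustrative data: Kepler potential V(r) = -k/r and scalar
# locked inertia I(r) = m * r**2 for rotations in the orbital plane.
m, k, mu = 1.5, 2.0, 0.7

def amended_potential(r):
    return -k / r + mu**2 / (2.0 * m * r**2)

def d_amended(r):
    """First derivative of the amended potential."""
    return k / r**2 - mu**2 / (m * r**3)

def d2_amended(r):
    """Second derivative of the amended potential."""
    return -2.0 * k / r**3 + 3.0 * mu**2 / (m * r**4)

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign exactly once on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_crit = bisect(d_amended, 1e-3, 10.0)   # numerical critical radius
r_exact = mu**2 / (m * k)                # closed form from V_mu'(r) = 0
curvature = d2_amended(r_crit)           # sign of the second variation
```

In this assumed setting the curvature at the critical radius is positive, so the circular orbit is a strict minimum of the amended potential in the radial direction, in line with the energy-method reading described above.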
[example: Rigid Body with a Fixed Point]
For the heavy top, write $A\in SO(3)$ for the attitude of the body and let $e_3$ be the upward vertical unit vector in space. Let $m$ be the mass of the body, $g$ the gravitational acceleration, $\chi$ the body-frame unit vector pointing from the fixed point toward the centre of mass, and $\ell$ the distance between these two points. Then the height of the centre of mass is $\ell\, e_3\cdot A\chi$. Equivalently, with
\begin{align*}
\gamma=A^{-1}e_3,
\end{align*}
the gravitational potential is
\begin{align*}
V(A)=mg\ell\, e_3\cdot A\chi=mg\ell\, A^{-1}e_3\cdot \chi=mg\ell\, \gamma\cdot\chi .
\end{align*}
The symmetry is rotation about the vertical axis, so an infinitesimal group velocity $\xi\in\mathbb R$ produces spatial angular velocity $\xi e_3$. In body coordinates this is
\begin{align*}
\Omega=A^{-1}(\xi e_3)=\xi A^{-1}e_3=\xi\gamma .
\end{align*}
If $\mathbb I_{\mathrm{body}}$ is the inertia tensor about the fixed point, the kinetic energy along this symmetry orbit is
\begin{align*}
T_{\mathrm{orbit}}(A,\xi)=\frac{1}{2}\Omega\cdot \mathbb I_{\mathrm{body}}\Omega .
\end{align*}
Substituting $\Omega=\xi\gamma$ gives
\begin{align*}
T_{\mathrm{orbit}}(A,\xi)=\frac{1}{2}(\xi\gamma)\cdot \mathbb I_{\mathrm{body}}(\xi\gamma)=\frac{1}{2}\xi^2\,\gamma\cdot \mathbb I_{\mathrm{body}}\gamma .
\end{align*}
Thus the locked inertia for the vertical $S^1$-action is the scalar
\begin{align*}
\mathbb I(A)=\gamma\cdot \mathbb I_{\mathrm{body}}\gamma .
\end{align*}
The momentum conjugate to $\xi$ is
\begin{align*}
\mu=\frac{\partial T_{\mathrm{orbit}}}{\partial \xi}=\xi\,\gamma\cdot \mathbb I_{\mathrm{body}}\gamma=\xi\,\mathbb I(A),
\end{align*}
so, for fixed $\mu$, the required angular velocity is
\begin{align*}
\xi=\frac{\mu}{\mathbb I(A)}.
\end{align*}
The kinetic cost of carrying this fixed momentum is therefore
\begin{align*}
\frac{1}{2}\mathbb I(A)\xi^2=\frac{1}{2}\mathbb I(A)\left(\frac{\mu}{\mathbb I(A)}\right)^2=\frac{\mu^2}{2\mathbb I(A)}.
\end{align*}
Hence the amended potential is
\begin{align*}
V_\mu(A)=mg\ell\,\gamma\cdot\chi+\frac{\mu^2}{2\,\gamma\cdot \mathbb I_{\mathrm{body}}\gamma}.
\end{align*}
For an axisymmetric top with $\mathbb I_{\mathrm{body}}=\operatorname{diag}(I_1,I_1,I_3)$ and $\chi=e_3$, write $\theta$ for the angle between the body axis and the vertical. Then $\gamma\cdot\chi=\cos\theta$, and if $\gamma=(\sin\theta\cos\phi,\sin\theta\sin\phi,\cos\theta)$, then
\begin{align*}
\gamma\cdot \mathbb I_{\mathrm{body}}\gamma=I_1\sin^2\theta\cos^2\phi+I_1\sin^2\theta\sin^2\phi+I_3\cos^2\theta.
\end{align*}
Since $\cos^2\phi+\sin^2\phi=1$, this becomes
\begin{align*}
\gamma\cdot \mathbb I_{\mathrm{body}}\gamma=I_1\sin^2\theta+I_3\cos^2\theta.
\end{align*}
Thus
\begin{align*}
V_\mu(\theta)=mg\ell\cos\theta+\frac{\mu^2}{2(I_1\sin^2\theta+I_3\cos^2\theta)}.
\end{align*}
Critical points of this one-variable amended potential give steady rotations or steady precessions at the chosen vertical angular momentum $\mu$, and the sign of the second derivative at such a critical point is the reduced second-variation test for stability modulo the rotation symmetry.
[/example]
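The passage from the general amended potential to its axisymmetric form in the example above can be checked numerically. The sketch below uses assumed illustrative values for $mg\ell$, $\mu$, $I_1$, and $I_3$, and the function names are hypothetical. It evaluates $V_\mu$ directly from $\gamma$ and compares it with the one-variable formula over a grid of angles, confirming that the azimuthal angle $\phi$ drops out.

```python
import math

# Assumed illustrative values; mgl stands for the product m * g * ell.
mgl, mu = 3.0, 1.2
I1, I3 = 0.8, 0.3

def amended_general(gamma, chi, inertia_diag):
    """V_mu = mgl * gamma.chi + mu^2 / (2 * gamma . I_body gamma), diagonal I_body."""
    locked = sum(i_k * g_k * g_k for i_k, g_k in zip(inertia_diag, gamma))
    height = sum(g_k * c_k for g_k, c_k in zip(gamma, chi))
    return mgl * height + mu**2 / (2.0 * locked)

def amended_axisymmetric(theta):
    """One-variable form for I_body = diag(I1, I1, I3) and chi = e_3."""
    locked = I1 * math.sin(theta)**2 + I3 * math.cos(theta)**2
    return mgl * math.cos(theta) + mu**2 / (2.0 * locked)

max_gap = 0.0
for i in range(1, 20):
    theta = i * math.pi / 20
    for j in range(12):
        phi = j * math.pi / 6
        gamma = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
        general = amended_general(gamma, (0.0, 0.0, 1.0), (I1, I1, I3))
        max_gap = max(max_gap, abs(general - amended_axisymmetric(theta)))
```

Agreement to rounding level for every sampled $\phi$ reflects the identity $\cos^2\phi+\sin^2\phi=1$ used in the derivation.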
The rigid body example also indicates the limits of the clean regular theory: mechanical systems often contain special momentum values, residual isotropy, or degeneracies in the quotient. This motivates keeping track of the hypotheses in every reduction argument, especially freeness, properness, and regularity of the momentum value.
[remark: Singular Momentum Values]
The regular free proper hypotheses exclude important mechanical examples with isotropy, collisions, or zero angular momentum. At singular momentum values the quotient may be stratified rather than smooth, and the reduced dynamics moves on symplectic strata. The regular theory in this chapter is the model case; singular reduction keeps the same philosophy but requires more refined geometry.
[/remark]
Reduction exposes the geometry of motion, but it does not yet solve the equations. Chapter 7 turns to Hamilton-Jacobi theory, where the problem of integrating trajectories is recast as finding a generating function or action whose level sets organize the flow.
# 7. Hamilton-Jacobi Theory and Generating Functions
Hamilton-Jacobi theory rewrites Hamiltonian mechanics as a first-order partial differential equation for an action function. The reward is conceptual and computational: a good solution of this PDE produces a canonical transformation in which the motion becomes algebraic. This chapter develops the Hamilton-Jacobi equation, explains how generating functions encode canonical transformations, and then applies separation of variables to free motion, the harmonic oscillator, and the Kepler problem.
## The Hamilton-Jacobi Equation
Hamilton's equations describe curves in phase space, but in many problems we want a function whose level geometry controls a whole family of such curves. The guiding question is: can the dynamics be recovered from a single scalar function on configuration space and time?
Let $Q$ be an $n$-dimensional configuration manifold and work locally in coordinates $(q_1,\dots,q_n)$ on $Q$ with canonical coordinates $(q,p)$ on $T^*Q$. Let $H:T^*Q\times \mathbb R\to \mathbb R$ be a smooth Hamiltonian, written locally as $H(q,p,t)$.
[definition: Hamilton Principal Function]
A Hamilton principal function for $H$ is a smooth function $S:U\times I\to \mathbb R$, where $U\subset Q$ is a coordinate domain and $I\subset \mathbb R$ is a time interval, satisfying
\begin{align*}
\frac{\partial S}{\partial t}(q,t)+H\left(q,\frac{\partial S}{\partial q}(q,t),t\right)=0.
\end{align*}
[/definition]
The derivative $\partial S/\partial q$ denotes the covector with components $\partial S/\partial q_i$. The equation is nonlinear because the Hamiltonian is evaluated on this covector, and it is first order because only first derivatives of $S$ occur.
[example: Free Particle Hamilton-Jacobi Equation]
For a free particle of mass $m>0$ on $\mathbb R^n$, the Hamiltonian is
\begin{align*}
H(q,p)=\frac{|p|^2}{2m}.
\end{align*}
Substituting $p=\partial S/\partial q$ into the Hamilton-Jacobi equation gives
\begin{align*}
\frac{\partial S}{\partial t}+\frac{1}{2m}\left|\frac{\partial S}{\partial q}\right|^2=0.
\end{align*}
Fix a constant covector $a\in \mathbb R^n$ and define
\begin{align*}
S(q,t;a)=a\cdot q-\frac{|a|^2}{2m}t.
\end{align*}
For each coordinate $q_i$,
\begin{align*}
\frac{\partial S}{\partial q_i}(q,t;a)=a_i.
\end{align*}
Hence
\begin{align*}
\frac{\partial S}{\partial q}(q,t;a)=a.
\end{align*}
The time derivative is
\begin{align*}
\frac{\partial S}{\partial t}(q,t;a)=-\frac{|a|^2}{2m}.
\end{align*}
Therefore the left side of the Hamilton-Jacobi equation is
\begin{align*}
-\frac{|a|^2}{2m}+\frac{1}{2m}|a|^2=0.
\end{align*}
Thus $S(q,t;a)$ is a solution. The momentum obtained from the action is
\begin{align*}
p=\frac{\partial S}{\partial q}=a,
\end{align*}
so the momentum is constant in time; this is exactly the momentum behavior of uniform rectilinear free-particle motion.
[/example]
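The verification in the example above can also be done by finite differences. This is a minimal sketch under assumed values of $m$ and $a$; the names `S` and `hj_residual` are illustrative. Because $S$ is affine in $q$ and $t$, central differences reproduce its derivatives up to rounding, so the Hamilton-Jacobi residual should vanish to that accuracy.

```python
m = 2.0                 # assumed mass
a = (0.3, -1.1, 0.7)    # assumed constant covector

def S(q, t):
    """Free-particle action S(q, t; a) = a . q - |a|^2 t / (2 m)."""
    a_dot_q = sum(x * y for x, y in zip(a, q))
    a_sq = sum(x * x for x in a)
    return a_dot_q - a_sq * t / (2.0 * m)

def hj_residual(q, t, h=1e-4):
    """Central-difference estimate of dS/dt + |dS/dq|^2 / (2 m)."""
    dS_dt = (S(q, t + h) - S(q, t - h)) / (2.0 * h)
    grad = []
    for i in range(len(q)):
        q_plus = tuple(x + (h if k == i else 0.0) for k, x in enumerate(q))
        q_minus = tuple(x - (h if k == i else 0.0) for k, x in enumerate(q))
        grad.append((S(q_plus, t) - S(q_minus, t)) / (2.0 * h))
    return dS_dt + sum(g * g for g in grad) / (2.0 * m)

residual = hj_residual((0.5, 1.5, -2.0), 0.8)
```

The same `hj_residual` pattern can be pointed at other candidate actions, with the kinetic term replaced by the relevant Hamiltonian, as a quick sanity check before attempting an analytic verification.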
The free particle calculation gives a family of solutions, but a single parameter choice only describes one momentum value. This motivates the definition of a complete integral: an action with enough independent parameters to label nearby initial conditions and reconstruct the full Hamiltonian flow.
[definition: Complete Integral]
A complete integral of the Hamilton-Jacobi equation for an $n$-degree-of-freedom Hamiltonian is a smooth map
\begin{align*}
S:U\times I\times A\to \mathbb R,
\end{align*}
where $U\subset Q$ is a coordinate domain, $I\subset \mathbb R$ is a time interval, and $A\subset \mathbb R^n$ is open, such that for $\alpha=(\alpha_1,\dots,\alpha_n)\in A$,
\begin{align*}
\frac{\partial S}{\partial t}(q,t;\alpha)+H\left(q,\frac{\partial S}{\partial q}(q,t;\alpha),t\right)=0
\end{align*}
and the mixed Hessian matrix
\begin{align*}
\left(\frac{\partial^2 S}{\partial q_i\partial \alpha_j}\right)_{1\le i,j\le n}
\end{align*}
has nonzero determinant on $U\times I\times A$.
[/definition]
The nondegeneracy condition says that the parameters distinguish a full family of Lagrangian graphs $p=\partial S/\partial q$. The next theorem is needed to turn this parametrized family into actual Hamiltonian trajectories: it identifies the missing constants conjugate to $\alpha$ and proves that the resulting curve solves Hamilton's equations.
[quotetheorem:3532]
[citeproof:3532]
This theorem turns a PDE solution into the full phase-space flow only when the complete integral is nondegenerate. For example, if $S$ is independent of one parameter $\alpha_j$, then $\partial S/\partial \alpha_j$ cannot be used to solve for a missing coordinate, so the constants fail to label a full family of trajectories. The theorem also does not assert global existence or single-valuedness of $S$; caustics, coordinate singularities, or multivalued action functions may force the construction to be restricted to a local patch. The next point is to interpret the successful local construction as a canonical change of variables that makes the new momenta constant.
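As a hedged worked illustration, take the free-particle family $S(q,t;a)=a\cdot q-\frac{|a|^2}{2m}t$ from the earlier example, with $a$ playing the role of the parameters $\alpha$, and assume the theorem's conditions take the form $\partial S/\partial\alpha_i=\beta_i$ for constants $\beta_i$. The mixed Hessian is the identity matrix,
\begin{align*}
\frac{\partial^2 S}{\partial q_i\partial a_j}=\delta_{ij},
\end{align*}
so the family is nondegenerate. Differentiating with respect to the parameters gives
\begin{align*}
\beta_i=\frac{\partial S}{\partial a_i}=q_i-\frac{a_i}{m}t,
\end{align*}
and solving for $q$ together with $p=\partial S/\partial q$ yields
\begin{align*}
q(t)=\beta+\frac{a}{m}t,\qquad p(t)=a.
\end{align*}
This is uniform rectilinear motion with initial position $\beta$ and velocity $a/m$, so the $2n$ constants $(a,\beta)$ label all free-particle trajectories.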
## Generating Functions for Canonical Transformations
Canonical transformations preserve Hamilton's equations because they preserve the symplectic form. The practical problem is that checking preservation of the symplectic form directly can be cumbersome, so we seek functions whose differential identities guarantee canonicity.
Work locally with old canonical coordinates $(q,p)$ and new canonical coordinates $(Q,P)$ on a $2n$-dimensional phase space; here $Q_i$ denotes a new coordinate, not the earlier configuration manifold $Q$. The canonical one-form in old coordinates is $\theta=p\cdot dq$, and in new coordinates it is $\Theta=P\cdot dQ$. Throughout this section the symplectic form is $\omega=\sum_{i=1}^n dq_i\wedge dp_i=-d\theta$, with the same convention in the new coordinates.
[definition: Type-One Generating Function]
A type-one generating function is a smooth map
\begin{align*}
F:U_q\times U_Q\times I\to \mathbb R,
\end{align*}
where $U_q,U_Q\subset \mathbb R^n$ are coordinate domains and $I\subset \mathbb R$ is a time interval, such that for each fixed $t\in I$ the induced local transformation is a smooth map
\begin{align*}
\Phi_t:V_t\subset T^*U_q\to V_t'\subset T^*U_Q
\end{align*}
between open subsets of phase space, written in coordinates as $\Phi_t(q,p)=(Q,P)$, and defined by
\begin{align*}
p_i&=\frac{\partial F}{\partial q_i}, & P_i&=-\frac{\partial F}{\partial Q_i}.
\end{align*}
[/definition]
The signs are chosen so that the difference of canonical one-forms is exact. The theorem below is needed because exactness is the usable test: it proves that the differential identities from $F$ preserve the symplectic form and gives the transformed Hamiltonian when $F$ depends on time.
[quotetheorem:6851]
The mixed Hessian condition is the local invertibility condition hidden in the generating-function formulae. If, for instance, $F(q,Q,t)=f(q,t)+g(Q,t)$, then $p$ depends only on $q$ and $P$ depends only on $Q$: the mixed Hessian $\partial^2F/\partial q_i\partial Q_j$ vanishes, and the equations cannot be solved to determine a local phase-space transformation from $(q,p)$ to $(Q,P)$. The criterion is therefore local and sufficient for transformations represented by this type of generating function; it does not say that every canonical transformation admits a global type-one generating function. The Hamilton-Jacobi method is the special case where the generating function is chosen to eliminate the transformed Hamiltonian. If the generating function is the action function $S(q,t;\alpha)$ and the constants $\alpha$ are taken as the new canonical variables on which it depends, then $K=0$ is exactly the Hamilton-Jacobi equation.
[remark: Other Types of Generating Functions]
Generating functions may also use variables $(q,P)$, $(p,Q)$, or $(p,P)$ after Legendre transforms in the relevant variables. For instance, a type-two generating function $G(q,P,t)$ satisfies $p_i=\partial G/\partial q_i$ and $Q_i=\partial G/\partial P_i$. This is the most common form for the Hamilton-Jacobi theorem, where $G(q,\alpha,t)=S(q,t;\alpha)$ and $Q=\partial S/\partial\alpha$.
[/remark]
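A concrete type-one generating function helps to see the sign conventions $p=\partial F/\partial q$, $P=-\partial F/\partial Q$ at work. The sketch below uses the standard textbook choice $F(q,Q)=\tfrac12 m\omega q^2\cot Q$ for the harmonic oscillator, which is an assumption brought in for illustration and does not appear in this chapter; the parameter values and function names are likewise hypothetical. It checks by finite differences that the explicit map $(q,p)\mapsto(Q,P)$ agrees with the derivatives of $F$ and that its Jacobian determinant is $1$, which in one degree of freedom is equivalent to preserving $dq\wedge dp$.

```python
import math

m, omega = 1.3, 0.9  # assumed illustrative oscillator parameters

def F(q, Q):
    """Type-one generating function F(q, Q) = (m * omega / 2) * q^2 * cot(Q)."""
    return 0.5 * m * omega * q * q / math.tan(Q)

def transform(q, p):
    """Explicit map (q, p) -> (Q, P) solving p = dF/dq and P = -dF/dQ."""
    Q = math.atan2(m * omega * q, p)
    P = (p * p + (m * omega * q)**2) / (2.0 * m * omega)
    return Q, P

def jacobian_det(q, p, h=1e-6):
    """Central-difference determinant of d(Q, P)/d(q, p)."""
    Qa, Pa = transform(q + h, p)
    Qb, Pb = transform(q - h, p)
    Qc, Pc = transform(q, p + h)
    Qd, Pd = transform(q, p - h)
    dQ_dq, dP_dq = (Qa - Qb) / (2 * h), (Pa - Pb) / (2 * h)
    dQ_dp, dP_dp = (Qc - Qd) / (2 * h), (Pc - Pd) / (2 * h)
    return dQ_dq * dP_dp - dQ_dp * dP_dq

q0, p0 = 0.7, 0.4
Q0, P0 = transform(q0, p0)
h = 1e-6
p_from_F = (F(q0 + h, Q0) - F(q0 - h, Q0)) / (2 * h)    # should equal p0
P_from_F = -(F(q0, Q0 + h) - F(q0, Q0 - h)) / (2 * h)   # should equal P0
det = jacobian_det(q0, p0)
```

In these variables the oscillator Hamiltonian becomes $\omega P$, which is the kind of simplification that generating functions are designed to produce.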
The generating-function criterion supplies the geometric reason that a complete integral solves the dynamics. The remaining challenge is analytic: how do we find such an integral in examples?
## Separation of Variables and Constants of Motion
Separation of variables tries to exploit special coordinates in which the Hamilton-Jacobi equation splits into independent ordinary equations. The problem is to recognize constants that can be introduced during this splitting and then read them as conserved quantities or orbital parameters.
For an autonomous Hamiltonian $H(q,p)$, it is natural to look for a principal function of the form
\begin{align*}
S(q,t;\alpha)=W(q;\alpha)-Et,
\end{align*}
where $E$ is included among the parameters of the complete integral, or is singled out together with $n-1$ further separation constants. The Hamilton-Jacobi equation then becomes the time-independent equation
\begin{align*}
H\left(q,\frac{\partial W}{\partial q}(q;\alpha)\right)=E.
\end{align*}
The function $W$ is called Hamilton's characteristic function.
[definition: Separated Complete Integral]
A separated complete integral is a complete integral for which, in some coordinate system $(q_1,\dots,q_n)$, the characteristic function is a smooth map
\begin{align*}
W:U\times A\to \mathbb R
\end{align*}
of the form
\begin{align*}
W(q;\alpha)=\sum_{i=1}^n W_i(q_i;\alpha),
\end{align*}
where each component is a smooth map
\begin{align*}
W_i:U_i\times A\to \mathbb R
\end{align*}
and $U\subset U_1\times\cdots\times U_n$.
[/definition]
After substitution into the time-independent Hamilton-Jacobi equation, separation constants appear because terms depending on different coordinates must balance to produce the fixed energy. The next theorem is needed to extract motion from these constants: it says that differentiating the action with respect to parameters supplies the missing implicit equations for the trajectory.
[quotetheorem:3532]
[citeproof:3532]
Jacobi's theorem explains why differentiating a separated action with respect to constants gives angle variables, phase shifts, or time-of-flight formulae. The nondegeneracy assumption is again essential: if two parameters enter $S$ only through their sum, then the equations $\partial S/\partial\alpha_i=\beta_i$ are dependent and cannot determine all coordinates. The theorem also does not guarantee that the separated quadratures can be evaluated in elementary functions, nor that the resulting implicit solution is global through turning points or singular coordinates. The examples below show the mechanism in increasing geometric complexity.
[example: Harmonic Oscillator via the Action Function]
For the one-dimensional harmonic oscillator assume $m>0$, $\omega>0$, and
\begin{align*}
H(q,p)=\frac{p^2}{2m}+\frac{m\omega^2q^2}{2}.
\end{align*}
Seek a separated principal function $S(q,t;E)=W(q;E)-Et$. Then
\begin{align*}
\frac{\partial S}{\partial t}=-E
\end{align*}
and
\begin{align*}
\frac{\partial S}{\partial q}=\frac{dW}{dq}.
\end{align*}
Substitution into the Hamilton-Jacobi equation gives
\begin{align*}
-E+\frac{1}{2m}\left(\frac{dW}{dq}\right)^2+\frac{m\omega^2q^2}{2}=0.
\end{align*}
Equivalently,
\begin{align*}
\frac{1}{2m}\left(\frac{dW}{dq}\right)^2=E-\frac{m\omega^2q^2}{2}.
\end{align*}
Multiplying by $2m$ gives
\begin{align*}
\left(\frac{dW}{dq}\right)^2=2mE-m^2\omega^2q^2.
\end{align*}
Thus, on either momentum branch,
\begin{align*}
\frac{dW}{dq}=\sigma\sqrt{2mE-m^2\omega^2q^2},\qquad \sigma\in\{1,-1\}.
\end{align*}
For $E>0$, set
\begin{align*}
A=\sqrt{\frac{2E}{m\omega^2}}.
\end{align*}
Then
\begin{align*}
m^2\omega^2A^2=m^2\omega^2\frac{2E}{m\omega^2}=2mE.
\end{align*}
Therefore
\begin{align*}
2mE-m^2\omega^2q^2=m^2\omega^2(A^2-q^2).
\end{align*}
Choose the branch
\begin{align*}
W(q;E)=\sigma\int_0^q\sqrt{2mE-m^2\omega^2x^2}\,dx.
\end{align*}
Differentiating under the integral sign on the interval $|q|<A$ gives
\begin{align*}
\frac{\partial W}{\partial E}(q;E)=\sigma\int_0^q\frac{m}{\sqrt{2mE-m^2\omega^2x^2}}\,dx.
\end{align*}
Using $2mE-m^2\omega^2x^2=m^2\omega^2(A^2-x^2)$, this becomes
\begin{align*}
\frac{\partial W}{\partial E}(q;E)=\frac{\sigma}{\omega}\int_0^q\frac{dx}{\sqrt{A^2-x^2}}.
\end{align*}
Since
\begin{align*}
\frac{d}{dx}\arcsin\left(\frac{x}{A}\right)=\frac{1}{\sqrt{A^2-x^2}},
\end{align*}
we obtain
\begin{align*}
\frac{\partial W}{\partial E}(q;E)=\frac{\sigma}{\omega}\arcsin\left(\frac{q}{A}\right).
\end{align*}
By *Jacobi's Integration Theorem*, the time relation is
\begin{align*}
t+\beta=\frac{\partial W}{\partial E}(q;E).
\end{align*}
Hence
\begin{align*}
t+\beta=\frac{\sigma}{\omega}\arcsin\left(\frac{q}{A}\right).
\end{align*}
Multiplying by $\sigma\omega$ gives
\begin{align*}
\arcsin\left(\frac{q}{A}\right)=\sigma\omega(t+\beta).
\end{align*}
Taking sine of both sides gives
\begin{align*}
\frac{q}{A}=\sin(\sigma\omega(t+\beta)).
\end{align*}
Thus
\begin{align*}
q(t)=A\sin(\sigma\omega t+\sigma\omega\beta).
\end{align*}
If $\sigma=1$, set $\delta=\omega\beta$. If $\sigma=-1$, use $\sin(-x)=\sin(x+\pi)$ and set $\delta=\omega\beta+\pi$. In either case the branch sign is absorbed into the constant phase, giving
\begin{align*}
q(t)=A\sin(\omega t+\delta),\qquad A=\sqrt{\frac{2E}{m\omega^2}}.
\end{align*}
The separated action therefore recovers the oscillator amplitude from the conserved energy and recovers the remaining integration constant as the phase $\delta$.
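The two key steps of this example can be checked numerically. The following is a minimal sketch, not part of the derivation: it compares a midpoint-rule value of the integral for $\partial W/\partial E$ with the closed form $\arcsin(q/A)/\omega$ on the branch $\sigma=1$, and confirms that $q(t)=A\sin(\omega t+\delta)$ carries energy $E$. The parameter values and the function names are illustrative assumptions.

```python
import math

def dW_dE_quadrature(q, E, m, omega, n=20000):
    """Midpoint rule for the integral of m / sqrt(2mE - m^2 omega^2 x^2) over [0, q]."""
    h = q / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += m / math.sqrt(2.0 * m * E - (m * omega * x) ** 2)
    return total * h

def dW_dE_closed(q, E, m, omega):
    """Closed form arcsin(q / A) / omega on the branch sigma = +1."""
    A = math.sqrt(2.0 * E / (m * omega ** 2))
    return math.asin(q / A) / omega

def energy_on_solution(t, delta, E, m, omega):
    """Evaluate H along q(t) = A sin(omega t + delta), with p = m dq/dt."""
    A = math.sqrt(2.0 * E / (m * omega ** 2))
    q = A * math.sin(omega * t + delta)
    p = m * A * omega * math.cos(omega * t + delta)
    return p * p / (2.0 * m) + m * omega ** 2 * q * q / 2.0

# Illustrative parameter values (assumed, not taken from the text).
m, omega, E = 2.0, 3.0, 5.0
A = math.sqrt(2.0 * E / (m * omega ** 2))
q_test = 0.6 * A  # strictly inside the interval |q| < A
quadrature_value = dW_dE_quadrature(q_test, E, m, omega)
closed_value = dW_dE_closed(q_test, E, m, omega)
print(quadrature_value, closed_value)
```

The quadrature stays away from the turning point $q=A$, where the integrand has an integrable singularity and a plain midpoint rule would converge slowly.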
[/example]
The oscillator has one degree of freedom, so separation is almost forced by energy conservation. The Kepler problem is the first major example where coordinate choice exposes hidden structure.
[example: Kepler Problem by Separation in Polar Coordinates]
For planar Kepler motion with mass $m>0$ and gravitational parameter $\kappa>0$, use polar coordinates $(r,\theta)$ with Hamiltonian
\begin{align*}
H(r,\theta,p_r,p_\theta)=\frac{p_r^2}{2m}+\frac{p_\theta^2}{2mr^2}-\frac{\kappa}{r}.
\end{align*}
Since $\theta$ does not occur in $H$, seek a separated principal function
\begin{align*}
S(r,\theta,t;E,L)=W_r(r;E,L)+L\theta-Et.
\end{align*}
Then
\begin{align*}
\frac{\partial S}{\partial t}=-E.
\end{align*}
Also
\begin{align*}
\frac{\partial S}{\partial r}=\frac{dW_r}{dr}.
\end{align*}
And
\begin{align*}
\frac{\partial S}{\partial \theta}=L.
\end{align*}
Substituting these derivatives into the Hamilton-Jacobi equation gives
\begin{align*}
-E+\frac{1}{2m}\left(\frac{dW_r}{dr}\right)^2+\frac{L^2}{2mr^2}-\frac{\kappa}{r}=0.
\end{align*}
Equivalently,
\begin{align*}
\frac{1}{2m}\left(\frac{dW_r}{dr}\right)^2+\frac{L^2}{2mr^2}-\frac{\kappa}{r}=E.
\end{align*}
Multiplying by $2m$ gives
\begin{align*}
\left(\frac{dW_r}{dr}\right)^2+\frac{L^2}{r^2}-\frac{2m\kappa}{r}=2mE.
\end{align*}
Hence, on a chosen radial momentum branch,
\begin{align*}
\frac{dW_r}{dr}=\sigma\sqrt{2mE+\frac{2m\kappa}{r}-\frac{L^2}{r^2}},\qquad \sigma\in\{1,-1\}.
\end{align*}
The momenta recovered from the action are therefore
\begin{align*}
p_r=\sigma\sqrt{2mE+\frac{2m\kappa}{r}-\frac{L^2}{r^2}}.
\end{align*}
And
\begin{align*}
p_\theta=L.
\end{align*}
Thus $L$ is the angular momentum parameter.
Choose
\begin{align*}
W_r(r;E,L)=\sigma\int^r\sqrt{2mE+\frac{2m\kappa}{\rho}-\frac{L^2}{\rho^2}}\,d\rho
\end{align*}
on an interval where the square root is positive. Differentiating with respect to $L$ gives
\begin{align*}
\frac{\partial W_r}{\partial L}=\sigma\int^r\frac{-L/\rho^2}{\sqrt{2mE+2m\kappa/\rho-L^2/\rho^2}}\,d\rho.
\end{align*}
The constant conjugate to $L$ is
\begin{align*}
\beta_L=\frac{\partial S}{\partial L}=\theta+\frac{\partial W_r}{\partial L}.
\end{align*}
Therefore
\begin{align*}
\theta-\beta_L=\sigma\int^r\frac{L/\rho^2}{\sqrt{2mE+2m\kappa/\rho-L^2/\rho^2}}\,d\rho.
\end{align*}
Set $u=1/\rho$. Then $du=-d\rho/\rho^2$, so the integral becomes
\begin{align*}
\theta-\beta_L=-\sigma\int^u\frac{L\,du}{\sqrt{2mE+2m\kappa u-L^2u^2}}.
\end{align*}
Now complete the square in the denominator. First,
\begin{align*}
2mE+2m\kappa u-L^2u^2=-L^2\left(u^2-\frac{2m\kappa}{L^2}u-\frac{2mE}{L^2}\right).
\end{align*}
Since
\begin{align*}
u^2-\frac{2m\kappa}{L^2}u=\left(u-\frac{m\kappa}{L^2}\right)^2-\frac{m^2\kappa^2}{L^4},
\end{align*}
we get
\begin{align*}
2mE+2m\kappa u-L^2u^2=L^2\left(\frac{m^2\kappa^2}{L^4}+\frac{2mE}{L^2}-\left(u-\frac{m\kappa}{L^2}\right)^2\right).
\end{align*}
Define
\begin{align*}
e^2=1+\frac{2EL^2}{m\kappa^2}.
\end{align*}
Then
\begin{align*}
\frac{m^2\kappa^2}{L^4}+\frac{2mE}{L^2}=\frac{m^2\kappa^2}{L^4}e^2.
\end{align*}
Thus
\begin{align*}
2mE+2m\kappa u-L^2u^2=L^2\left(\frac{m^2\kappa^2}{L^4}e^2-\left(u-\frac{m\kappa}{L^2}\right)^2\right).
\end{align*}
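For bounded negative-energy motion with $L\ne 0$, we have $E<0$ and $0\le e<1$. Write $c=m\kappa/L^2$ and take $L>0$ and $e>0$. The integrand is then $1/\sqrt{c^2e^2-(u-c)^2}$, whose antiderivative is $\arcsin\left((u-c)/(ce)\right)$, so the preceding integral is an inverse trigonometric integral. Solving for $u$ and absorbing the signs and additive constants into a single angle $\theta_0$ gives the following formula, which also covers the circular case $e=0$, where $u=c$: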
\begin{align*}
u=\frac{m\kappa}{L^2}\left(1+e\cos(\theta-\theta_0)\right).
\end{align*}
Since $u=1/r$, this is
\begin{align*}
\frac{1}{r}=\frac{m\kappa}{L^2}\left(1+e\cos(\theta-\theta_0)\right).
\end{align*}
Inverting gives
\begin{align*}
r(\theta)=\frac{L^2/(m\kappa)}{1+e\cos(\theta-\theta_0)}.
\end{align*}
With $\ell=L^2/(m\kappa)$, the orbit is
\begin{align*}
r(\theta)=\frac{\ell}{1+e\cos(\theta-\theta_0)}.
\end{align*}
Thus the separated action identifies $L$ as angular momentum and recovers the bounded Kepler trajectories as conic sections with eccentricity $0\le e<1$.
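As a sanity check on the orbit formula, the sketch below substitutes $r(\theta)=\ell/(1+e\cos(\theta-\theta_0))$ back into the radial energy relation. It assumes the standard relation $\dot\theta=p_\theta/(mr^2)$ from Hamilton's equations, so that $p_r=m\dot r=(L/r^2)\,dr/d\theta$. The function name and the parameter values are illustrative.

```python
import math

def radial_energy_residual(theta, E, L, m, kappa, theta0=0.0):
    """Return p_r^2/(2m) + L^2/(2 m r^2) - kappa/r - E on the conic r(theta)."""
    e_squared = 1.0 + 2.0 * E * L ** 2 / (m * kappa ** 2)
    if not (E < 0.0 and e_squared >= 0.0):
        raise ValueError("need a bound orbit: E < 0 and e^2 >= 0")
    e = math.sqrt(e_squared)
    ell = L ** 2 / (m * kappa)
    denom = 1.0 + e * math.cos(theta - theta0)
    r = ell / denom
    dr_dtheta = ell * e * math.sin(theta - theta0) / denom ** 2
    # Assumption: dtheta/dt = L / (m r^2), hence p_r = m dr/dt = (L / r^2) dr/dtheta.
    p_r = L * dr_dtheta / r ** 2
    return p_r ** 2 / (2.0 * m) + L ** 2 / (2.0 * m * r ** 2) - kappa / r - E

# Illustrative values with e^2 = 1 - 0.6 = 0.4.
E, L, m, kappa = -0.3, 1.0, 1.0, 1.0
thetas = [0.1 * k for k in range(63)]
worst = max(abs(radial_energy_residual(th, E, L, m, kappa)) for th in thetas)
print(worst)
```

The residual should vanish up to rounding for every angle, which is a direct restatement of $e^2=1+2EL^2/(m\kappa^2)$.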
[/example]
The Hamilton-Jacobi viewpoint packages several earlier ideas from the course. The action is a generating function, its derivatives give momenta and constants, and separation finds coordinates adapted to conserved quantities. Before returning to that global integrability picture, Chapter 8 studies the local behaviour near special motions through stability, periodic orbits, and normal forms. Later, in Chapter 9, the same construction reappears as action-angle variables: the separated actions become coordinates on invariant tori, and Hamilton's equations become linear flow.
In that local picture, linearization and canonical coordinates reveal the structure of small oscillations and instabilities near equilibria.
# 8. Stability, Periodic Motion, and Normal Forms
Stability theory asks what the qualitative behaviour of a mechanical system looks like near a solution that is already known. The chapter assumes Hamilton's equations and conserved energy from Chapter 3, symplectic coordinates from Chapter 4, and the standard ODE tools of linearization, eigenvalues of real matrices, and Taylor expansion near a critical point. In earlier chapters, conserved quantities and symplectic structure gave global constraints on motion; here we use the local geometry of Hamiltonian vector fields to distinguish stable oscillation, escape from equilibrium, and the onset of periodic or quasiperiodic motion. The chapter moves from linearization and Lyapunov stability to the special eigenvalue constraints imposed by symplectic geometry, then ends with Birkhoff normal form as the local model for nonlinear elliptic dynamics.
## Stability Near Equilibria
The first local question is whether a system released close to an equilibrium remains close to it, and whether the linearized equations already decide the answer. Mechanical equilibria often come from critical points of an energy or potential, but the stability notion is dynamical: it concerns entire trajectories, not only the sign of a Hessian.
[definition: Equilibrium Point]
Let $X: U \to \mathbb R^n$ be a smooth vector field on an [open set](/page/Open%20Set) $U \subset \mathbb R^n$. A point $x^* \in U$ is an equilibrium point of the ODE $\dot{x}=X(x)$ if
\begin{align*}
X(x^*)=0.
\end{align*}
[/definition]
Once an equilibrium is fixed, the phase portrait is read in coordinates centred at that point. This motivates isolating the first-order approximation to the vector field, because nearby trajectories initially respond to the derivative at the equilibrium.
[definition: Linearization At An Equilibrium]
Let $X: U \to \mathbb R^n$ be smooth and let $x^* \in U$ be an equilibrium. The linearization of $\dot{x}=X(x)$ at $x^*$ is the linear ODE
\begin{align*}
\dot{y}=JX_{x^*}y,
\end{align*}
where $JX_{x^*}$ is the Jacobian matrix of $X$ at $x^*$.
[/definition]
The linearized equation is obtained by writing $x=x^*+y$ and expanding $X(x^*+y)=JX_{x^*}y+O(|y|^2)$ as $y \to 0$. This motivates a precise stability definition, since boundedness of the linear approximation only matters if it controls all future times for nearby nonlinear solutions.
[definition: Lyapunov Stability]
Let $X:U\to\mathbb R^n$ be a smooth vector field on an open set $U\subset\mathbb R^n$, let $x^*\in U$ be an equilibrium of $\dot{x}=X(x)$, and let $\varphi_t:V\to U$ denote the time-$t$ flow map defined on a neighbourhood $V\subset U$ of $x^*$ for all $t\ge 0$. The equilibrium $x^*$ is Lyapunov stable if for every $\varepsilon>0$ there exists $\delta>0$ such that
\begin{align*}
|x_0-x^*|<\delta \implies |\varphi_t(x_0)-x^*|<\varepsilon \quad \text{for all } t\ge 0.
\end{align*}
It is asymptotically stable if it is Lyapunov stable and there exists $\delta_0>0$ such that $|x_0-x^*|<\delta_0$ implies $\varphi_t(x_0)\to x^*$ as $t\to\infty$.
[/definition]
Lyapunov stability is deliberately uniform in forward time. A nearby orbit may oscillate forever without converging, which is stable but not asymptotically stable; this raises the central question of when the eigenvalues of the linearization force stability or instability for the nonlinear system.
[quotetheorem:6852]
[citeproof:6852]
The theorem leaves open the borderline case in which all eigenvalues have nonpositive real part and some lie on the imaginary axis. The hypothesis of strictly negative real parts is sufficient but not necessary for stability: the planar linear center $\dot{x}=-y$, $\dot{y}=x$ has eigenvalues $\pm i$, and all its nonconstant trajectories are circles about the origin, so the origin is Lyapunov stable but not asymptotically stable. The instability hypothesis is likewise only sufficient: the scalar equation $\dot{x}=x^2$ has zero linearization at the origin, yet solutions with $x_0>0$ move away from the equilibrium. Hamiltonian equilibria very often lie in this delicate borderline regime, so an example from small oscillations is the natural test case.
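The eigenvalue test discussed above can be mechanised for planar systems. The sketch below assumes the theorem has the usual form of the indirect method: all real parts negative gives asymptotic stability, some real part positive gives instability, and anything else is undecided by the linearization. The function names and the tolerance are illustrative choices.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the trace and the determinant."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def indirect_method_verdict(a, b, c, d, tol=1e-12):
    """Classify an equilibrium from the Jacobian [[a, b], [c, d]] of the vector field."""
    real_parts = [ev.real for ev in eigenvalues_2x2(a, b, c, d)]
    if all(r < -tol for r in real_parts):
        return "asymptotically stable"
    if any(r > tol for r in real_parts):
        return "unstable"
    return "inconclusive"

# Linear center x' = -y, y' = x: eigenvalues +i and -i.
print(indirect_method_verdict(0.0, -1.0, 1.0, 0.0))
# Damped oscillator x' = y, y' = -x - y/2.
print(indirect_method_verdict(0.0, 1.0, -1.0, -0.5))
# Saddle x' = y, y' = x.
print(indirect_method_verdict(0.0, 1.0, 1.0, 0.0))
```

The verdict "inconclusive" is exactly the borderline regime: the center is stable and $\dot x=x^2$ is not, yet both have no eigenvalue off the imaginary axis.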
[example: Small Oscillations About A Potential Minimum]
Consider a natural mechanical system on $\mathbb R^n$ with Hamiltonian
\begin{align*}
H(q,p)=\frac{1}{2}p^\top M^{-1}p+V(q),
\end{align*}
where $M$ is symmetric positive definite and $V\in C^2(\mathbb R^n)$. Hamilton's equations are
\begin{align*}
\dot q=M^{-1}p, \qquad \dot p=-\nabla V(q).
\end{align*}
If $q^*$ is a nondegenerate local minimum of $V$, then $\nabla V(q^*)=0$ and $K=D^2V(q^*)$ is symmetric positive definite, so at $(q^*,0)$ we have
\begin{align*}
\dot q=M^{-1}0=0, \qquad \dot p=-\nabla V(q^*)=0.
\end{align*}
Thus $(q^*,0)$ is an equilibrium.
Write $q=q^*+\eta$ and $p=\pi$. Since $V$ is $C^2$,
\begin{align*}
\nabla V(q^*+\eta)=\nabla V(q^*)+D^2V(q^*)\eta+r(\eta)
\end{align*}
with $r(\eta)/|\eta|\to 0$ as $\eta\to 0$. Using $\nabla V(q^*)=0$, the equations become
\begin{align*}
\dot\eta=M^{-1}\pi, \qquad \dot\pi=-K\eta-r(\eta).
\end{align*}
Keeping only the terms linear in $(\eta,\pi)$ gives the linearized system
\begin{align*}
\dot\eta=M^{-1}\pi, \qquad \dot\pi=-K\eta.
\end{align*}
Differentiating the first equation and substituting the second gives
\begin{align*}
\ddot\eta=M^{-1}\dot\pi=-M^{-1}K\eta.
\end{align*}
Now set $x=M^{1/2}\eta$, so $\eta=M^{-1/2}x$ and $\ddot\eta=M^{-1/2}\ddot x$. Substitution into $\ddot\eta=-M^{-1}K\eta$ gives
\begin{align*}
M^{-1/2}\ddot x=-M^{-1}KM^{-1/2}x.
\end{align*}
Multiplying by $M^{1/2}$ gives
\begin{align*}
\ddot x=-M^{-1/2}KM^{-1/2}x.
\end{align*}
The matrix $A=M^{-1/2}KM^{-1/2}$ is symmetric positive definite, because for $w\ne 0$,
\begin{align*}
w^\top Aw=(M^{-1/2}w)^\top K(M^{-1/2}w)>0.
\end{align*}
Hence there is an orthogonal matrix $O$ and positive numbers $\omega_1^2,\dots,\omega_n^2$ such that
\begin{align*}
O^\top AO=\operatorname{diag}(\omega_1^2,\dots,\omega_n^2).
\end{align*}
With $u=O^\top x$, we have $x=Ou$ and $\ddot x=O\ddot u$, so
\begin{align*}
O\ddot u=-AOu.
\end{align*}
Multiplying by $O^\top$ gives
\begin{align*}
\ddot u=-O^\top AOu=-\operatorname{diag}(\omega_1^2,\dots,\omega_n^2)u.
\end{align*}
Therefore each normal coordinate satisfies
\begin{align*}
\ddot u_j+\omega_j^2u_j=0.
\end{align*}
The linearized motion near the potential minimum is therefore a superposition of harmonic oscillations with positive frequencies $\omega_j$, so the linear model predicts bounded oscillation rather than attraction.
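For two degrees of freedom the reduction to normal coordinates can be carried out in closed form. The following sketch assumes a diagonal mass matrix $M=\operatorname{diag}(m_1,m_2)$, so that $A=M^{-1/2}KM^{-1/2}$ has entries $K_{ij}/\sqrt{m_im_j}$, and checks the result against the equivalent condition $\det(K-\omega^2M)=0$. The numbers and function names are illustrative.

```python
import math

def normal_frequencies(m1, m2, k11, k12, k22):
    """Normal frequencies for M = diag(m1, m2) and symmetric K = [[k11, k12], [k12, k22]]."""
    # Entries of A = M^{-1/2} K M^{-1/2}.
    a11 = k11 / m1
    a12 = k12 / math.sqrt(m1 * m2)
    a22 = k22 / m2
    # Eigenvalues of a symmetric 2x2 matrix: mean plus or minus radius.
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    lam_minus, lam_plus = mean - radius, mean + radius
    if lam_minus <= 0.0:
        raise ValueError("K is not positive definite")
    return math.sqrt(lam_minus), math.sqrt(lam_plus)

def generalized_determinant(w, m1, m2, k11, k12, k22):
    """det(K - w^2 M), which vanishes at the normal frequencies."""
    return (k11 - w * w * m1) * (k22 - w * w * m2) - k12 * k12

params = (1.0, 4.0, 3.0, 1.0, 8.0)  # m1, m2, k11, k12, k22 (assumed values)
w_minus, w_plus = normal_frequencies(*params)
print(w_minus, w_plus)
```

Both frequencies are real and positive precisely because $A$ is positive definite, matching the conclusion that the linear model oscillates rather than decays.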
[/example]
This example explains why positive definiteness of the potential Hessian produces bounded linear motion rather than attraction. Since linear eigenvalues on the imaginary axis do not trigger the indirect method, we need a conserved quantity argument that proves stability without decay.
[quotetheorem:6853]
[citeproof:6853]
The energy argument is weaker than asymptotic stability but is exactly adapted to conservative mechanics. Strictness of the minimum is essential for confinement: for $X(x,y)=(0,x)$, the function $H(x,y)=x^2$ is conserved and has a non-strict local minimum at the origin, but solutions with $x_0\ne 0$ satisfy $y(t)=y_0+x_0t$ and eventually leave every small ball. Conservation is essential as well, because a strict minimum of a non-conserved function gives no barrier against crossing its level sets. The criterion also does not imply asymptotic stability: a conserved quantity keeps its initial value along each orbit, so by continuity an orbit starting at a point $x_0$ with $H(x_0)\ne H(x^*)$ cannot converge to $x^*$. This motivates the classification of linear Hamiltonian equilibria, because the quadratic part of $H$ determines whether the nearby energy surfaces look confining or saddle-shaped.
## Symplectic Linearization And Eigenvalue Pairing
The next question is what extra restrictions appear when the vector field is Hamiltonian rather than arbitrary. A general real matrix can have many eigenvalue patterns, but a Hamiltonian linearization must preserve the symplectic form, and this forces eigenvalues to occur in symmetric families.
[definition: Linear Hamiltonian System]
Let $(\mathbb R^{2n},\omega_0)$ have canonical coordinates $(q,p)$, and let $J:\mathbb R^{2n}\to\mathbb R^{2n}$ be the canonical symplectic linear map defined by $J(q,p)=(p,-q)$. A linear ODE $\dot{z}=Az$, with $A:\mathbb R^{2n}\to\mathbb R^{2n}$ linear, is Hamiltonian if there exists a symmetric linear map $S:\mathbb R^{2n}\to\mathbb R^{2n}$, represented by a matrix satisfying $S=S^\top$, such that
\begin{align*}
A=JS.
\end{align*}
[/definition]
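A short exact check shows what the definition implies algebraically. The sketch below builds $A=JS$ for $n=2$ with an arbitrarily chosen symmetric integer matrix $S$ (an assumption made only for illustration) and verifies that $A^\top J+JA=0$ and that $A$ has zero trace, using plain integer arithmetic so that no rounding is involved.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Canonical J in coordinates (q1, q2, p1, p2): J(q, p) = (p, -q).
J = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [-1, 0, 0, 0],
     [0, -1, 0, 0]]

# An arbitrary symmetric matrix, chosen only for illustration.
S = [[2, 1, 0, 3],
     [1, 5, -1, 0],
     [0, -1, 4, 2],
     [3, 0, 2, 1]]

A = matmul(J, S)
residual = add(matmul(transpose(A), J), matmul(J, A))
trace_A = sum(A[i][i] for i in range(4))
print(residual, trace_A)
```

The identity follows from $J^\top J=I$ and $J^2=-I$, since $A^\top J+JA=S J^\top J+J^2S=S-S$.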
For a nonlinear Hamiltonian $H:U\subset\mathbb R^{2n}\to\mathbb R$, the linearization at an equilibrium $z^*$ is of this form with
\begin{align*}
A=J D^2H(z^*),
\end{align*}
where $D^2H(z^*)$ is the symmetric Hessian matrix of $H$ at $z^*$. This motivates the spectral pairing theorem, since the Hamiltonian matrix identity should have visible consequences for the eigenvalues.
[quotetheorem:6854]
[citeproof:6854]
This pairing is the source of the standard local types, but it is a special consequence of the Hamiltonian identity and fails for general real matrices: a matrix such as $\operatorname{diag}(-1,-2)$ has no matching positive eigenvalues. The realness assumption is what supplies the conjugate pair: without real coefficients in the characteristic polynomial, $\lambda$ need not bring $\bar{\lambda}$ with it. The Hamiltonian condition is what supplies the sign pair: without $A^\top J+JA=0$, there is no symplectic skew-adjointness forcing $\lambda$ and $-\lambda$ to occur together. Pairing also does not by itself prove nonlinear stability, since a hyperbolic pair already contains an expanding direction and an elliptic pair may be destabilised or constrained depending on nonlinear terms and conserved quantities. A standard concrete warning is the Cherry Hamiltonian
\begin{align*}
H(q_1,p_1,q_2,p_2)=\frac{1}{2}(q_1^2+p_1^2)-(q_2^2+p_2^2)+\frac{\mu}{2}\left(q_2(q_1^2-p_1^2)-2q_1p_1p_2\right),
\end{align*}
whose origin has purely imaginary linear eigenvalues $\pm i,\pm 2i$ for every $\mu$, while for $\mu\ne 0$ the cubic term is resonant with the $1:2$ frequency ratio and the nonlinear equilibrium is Lyapunov unstable. What the theorem supplies is a spectral grammar for Hamiltonian phase portraits. To use it, the next definition names the three eigenvalue patterns that occur in low-dimensional Hamiltonian blocks.
[definition: Elliptic Hyperbolic And Focus Linear Types]
Let $A:\mathbb R^{2n}\to\mathbb R^{2n}$ be a real Hamiltonian matrix for the canonical symplectic structure. A spectral pair is elliptic if it has the form $\{i\omega,-i\omega\}$ with $\omega>0$. It is hyperbolic if it has the form $\{\lambda,-\lambda\}$ with $\lambda>0$. A spectral quadruple is focus-focus if it has the form
\begin{align*}
\{\alpha+i\beta,\alpha-i\beta,-\alpha+i\beta,-\alpha-i\beta\}
\end{align*}
with $\alpha\ne 0$ and $\beta\ne 0$.
[/definition]
These labels describe the linear phase portrait rather than the full nonlinear dynamics. Elliptic directions rotate, hyperbolic directions stretch and contract, and focus-focus directions combine rotation with exponential growth and decay; a coupled oscillator gives a concrete elliptic model.
[illustration:hamiltonian-linear-blocks]
[example: Two Coupled Oscillators]
Let
\begin{align*}
H(q,p)=\frac{1}{2}(p_1^2+p_2^2)+\frac{1}{2}(k_1q_1^2+k_2q_2^2)+\varepsilon q_1q_2,
\end{align*}
where $k_1,k_2>0$ and $|\varepsilon|<\sqrt{k_1k_2}$. Hamilton's equations are
\begin{align*}
\dot q_1=p_1,\quad \dot q_2=p_2,\quad \dot p_1=-k_1q_1-\varepsilon q_2,\quad \dot p_2=-\varepsilon q_1-k_2q_2.
\end{align*}
At $(q,p)=(0,0)$ all four right-hand sides vanish, so the origin is an equilibrium.
The potential Hessian $K$ is the symmetric linear map with
\begin{align*}
K(q_1,q_2)=(k_1q_1+\varepsilon q_2,\varepsilon q_1+k_2q_2).
\end{align*}
For $v=(a,b)\ne 0$,
\begin{align*}
v^\top Kv=k_1a^2+2\varepsilon ab+k_2b^2.
\end{align*}
Completing the square in $a$ gives
\begin{align*}
k_1a^2+2\varepsilon ab+k_2b^2=k_1\left(a+\frac{\varepsilon}{k_1}b\right)^2+\left(k_2-\frac{\varepsilon^2}{k_1}\right)b^2.
\end{align*}
Since $k_1>0$ and $k_2-\varepsilon^2/k_1=(k_1k_2-\varepsilon^2)/k_1>0$, this quadratic form is positive for every nonzero $v$. Thus $K$ is positive definite.
The linearized equations are already the displayed equations, because $H$ is quadratic. Differentiating $\dot q=p$ gives
\begin{align*}
\ddot q_1=\dot p_1=-k_1q_1-\varepsilon q_2.
\end{align*}
Similarly,
\begin{align*}
\ddot q_2=\dot p_2=-\varepsilon q_1-k_2q_2.
\end{align*}
Equivalently,
\begin{align*}
\ddot q=-Kq.
\end{align*}
The characteristic polynomial of $K$ is
\begin{align*}
(k_1-\lambda)(k_2-\lambda)-\varepsilon^2=\lambda^2-(k_1+k_2)\lambda+(k_1k_2-\varepsilon^2).
\end{align*}
Hence its two eigenvalues are
\begin{align*}
\lambda_\pm=\frac{k_1+k_2\pm\sqrt{(k_1-k_2)^2+4\varepsilon^2}}{2}.
\end{align*}
Their sum is $k_1+k_2>0$ and their product is $k_1k_2-\varepsilon^2>0$, so both eigenvalues are positive. Write $\omega_\pm=\sqrt{\lambda_\pm}$.
Choose orthonormal eigenvectors of $K$ and let $O$ be the corresponding orthogonal change of coordinates. With $Q=O^\top q$ and $P=O^\top p$, we have $p^\top p=P^\top P$ and $q^\top Kq=Q^\top (O^\top KO)Q$. Since $O^\top KO$ acts by multiplying the two coordinates by $\lambda_+$ and $\lambda_-$, the Hamiltonian becomes
\begin{align*}
H=\frac{1}{2}(P_+^2+P_-^2)+\frac{1}{2}(\lambda_+Q_+^2+\lambda_-Q_-^2).
\end{align*}
Therefore
\begin{align*}
\ddot Q_++\omega_+^2Q_+=0,\quad \ddot Q_-+\omega_-^2Q_-=0.
\end{align*}
The equilibrium is elliptic: the two [normal coordinates](/theorems/2713) oscillate with positive frequencies $\omega_+$ and $\omega_-$, while the original coordinates are linear combinations of these normal oscillations. The coupling term $\varepsilon q_1q_2$ therefore changes the coordinate oscillators into collective normal modes rather than producing attraction or exponential escape.
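The elliptic character of the equilibrium can also be read off the four-dimensional linear system directly. Since $\ddot q=-Kq$, an exponential solution $e^{\lambda t}$ requires $\det(\lambda^2 I+K)=0$, that is $\lambda^4+(k_1+k_2)\lambda^2+(k_1k_2-\varepsilon^2)=0$. The sketch below, with illustrative parameter values and function names, checks that the roots are $\pm i\omega_+$ and $\pm i\omega_-$.

```python
import math

def stiffness_eigenvalues(k1, k2, eps):
    """Eigenvalues lambda_-, lambda_+ of K = [[k1, eps], [eps, k2]]."""
    if not (k1 > 0 and k2 > 0 and eps * eps < k1 * k2):
        raise ValueError("need k1, k2 > 0 and |eps| < sqrt(k1 k2)")
    root = math.sqrt((k1 - k2) ** 2 + 4.0 * eps ** 2)
    return 0.5 * (k1 + k2 - root), 0.5 * (k1 + k2 + root)

def hamiltonian_char_poly(lam, k1, k2, eps):
    """det(lam^2 I + K), the characteristic polynomial of the linear Hamiltonian system."""
    return lam ** 4 + (k1 + k2) * lam ** 2 + (k1 * k2 - eps ** 2)

k1, k2, eps = 2.0, 3.0, 1.0  # assumed values
lam_minus, lam_plus = stiffness_eigenvalues(k1, k2, eps)
omega_minus, omega_plus = math.sqrt(lam_minus), math.sqrt(lam_plus)

# The four eigenvalues of the 4x4 system form two elliptic pairs.
spectrum = [1j * omega_plus, -1j * omega_plus, 1j * omega_minus, -1j * omega_minus]
residuals = [abs(hamiltonian_char_poly(lam, k1, k2, eps)) for lam in spectrum]
print(spectrum, residuals)
```

The polynomial is even in $\lambda$, which is the pairing $\lambda\leftrightarrow-\lambda$ in its simplest form.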
[/example]
The coupled oscillator model is the finite-dimensional prototype for normal modes in mechanics. The same calculation explains why resonances matter: if integer relations among the frequencies occur, nonlinear terms can accumulate instead of averaging away.
[remark: Hamiltonian Systems Do Not Have Isolated Attractors]
A Hamiltonian flow preserves symplectic volume by [Liouville's theorem](/theorems/38). Therefore an open set of initial conditions cannot be compressed into a smaller attracting neighbourhood of an equilibrium. This is why asymptotic stability is not the natural target in conservative mechanics, while Lyapunov stability and orbital confinement are central.
[/remark]
The absence of attraction shifts attention from decay to persistence of oscillation. To understand nonlinear corrections to elliptic motion, we need coordinates that simplify the Hamiltonian order by order near the equilibrium.
## Birkhoff Normal Form Near Elliptic Equilibria
The final question is how much of the nonlinear Hamiltonian near an elliptic equilibrium can be removed by canonical changes of variables. Linear theory gives harmonic oscillators, but the nonlinear terms alter frequencies, create resonant interactions, and control the long-time geometry of nearby invariant tori.
[definition: Nonresonant Elliptic Equilibrium]
Let $H:U\to\mathbb R$ be a smooth Hamiltonian on an open neighbourhood $U\subset\mathbb R^{2n}$ of $0$, with an elliptic equilibrium at $0$. Let the linearized frequencies be $\omega_1,\dots,\omega_n>0$. The equilibrium is nonresonant to order $N$ if for every $k=(k_1,\dots,k_n)\in\mathbb Z^n$ with $0<|k_1|+\dots+|k_n|\le N$, one has
\begin{align*}
k_1\omega_1+\dots+k_n\omega_n\ne 0.
\end{align*}
It is nonresonant if this condition holds for every $N\in\mathbb N$.
[/definition]
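For rational frequency vectors the definition can be tested by brute force. The following sketch enumerates all integer vectors $k$ with $0<|k_1|+\dots+|k_n|\le N$ and reports those with $k\cdot\omega=0$. It uses exact rational arithmetic, so it is only meaningful when the frequencies are given as integers or fractions. Irrational frequencies would need a different, analytic argument. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def resonances_up_to(freqs, N):
    """All integer vectors k with 0 < |k_1| + ... + |k_n| <= N and k . freqs == 0."""
    exact = [Fraction(w) for w in freqs]
    found = []
    for k in product(range(-N, N + 1), repeat=len(exact)):
        order = sum(abs(ki) for ki in k)
        if 0 < order <= N and sum(ki * w for ki, w in zip(k, exact)) == 0:
            found.append(k)
    return found

def is_nonresonant_to_order(freqs, N):
    return not resonances_up_to(freqs, N)

# Equal frequencies: first resonance at order 2.
print(is_nonresonant_to_order([1, 1], 1), resonances_up_to([1, 1], 2))
# Frequencies 1 and 2: nonresonant to order 2, resonant at order 3.
print(is_nonresonant_to_order([1, 2], 2), resonances_up_to([1, 2], 3))
```

The enumeration has $(2N+1)^n$ candidates, so it is a teaching device for small $n$ and $N$ rather than a practical algorithm.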
Nonresonance says that no nontrivial integer combination of the linear frequencies vanishes. If $n=2$ and $\omega_1=\omega_2$, the relation $\omega_1-\omega_2=0$ means that angle-dependent monomials coupling the two oscillators can Poisson-commute with the quadratic part of $H$ and therefore cannot all be removed. This motivates action variables, since the normal form separates terms depending only on oscillator amplitudes from terms depending on oscillatory angles.
[definition: Action Variables For The Harmonic Oscillator]
In canonical coordinates $(Q,P)\in\mathbb R^{2n}$, the $j$th quadratic action is the map $I_j:\mathbb R^{2n}\to\mathbb R$ defined by
\begin{align*}
I_j(Q,P)=\frac{1}{2}(Q_j^2+P_j^2), \qquad j=1,\dots,n.
\end{align*}
[/definition]
The action $I_j$ measures the amplitude squared of the $j$th oscillator. Since a Hamiltonian depending only on $I=(I_1,\dots,I_n)$ produces rotations with angle-independent amplitudes, this motivates the [Birkhoff normal form theorem](/theorems/6855): it asks how far a nonlinear Hamiltonian can be converted into an action-only expression by canonical transformations.
[quotetheorem:6855]
[citeproof:6855]
The theorem is a formal and finite-order statement in these notes. The differentiability hypothesis is needed because a degree-$N$ normal form is built from the Taylor polynomial through order $N+1$; for example, adding a term such as $|Q_1|^{N+\alpha}$ with $0<\alpha<1$ to a harmonic oscillator gives a Hamiltonian with insufficient smoothness for the required degree-$(N+1)$ Taylor remainder. Ellipticity is also essential. At a hyperbolic equilibrium, such as the top of the pendulum, the linear flow has expanding and contracting directions rather than oscillator actions, so the action-only normal form above is the wrong local model. Nonresonance to order $N$ is the condition that permits division by the frequency combinations appearing in the homological equation; in the resonant case $\omega_1=\omega_2$, terms carrying the angle combination $\theta_1-\theta_2$ may survive because the corresponding divisor is zero. The smoothness and locality assumptions matter because the construction uses only a finite Taylor jet near the equilibrium, not a globally defined canonical coordinate system. Thus the theorem does not by itself assert convergence of the normalizing transformation or global stability for all time, but it gives the canonical local expansion from which long-time stability results, celestial-mechanics perturbation theory, molecular vibration theory, and KAM theory begin. The same distinction between removable oscillatory terms and resonant surviving terms also reappears in averaging methods, wave interactions, and semiclassical perturbation theory, where local normal forms turn complicated dynamics into a hierarchy of effective frequencies and slow variables.
[example: Nonlinear Pendulum Near The Bottom]
For the pendulum Hamiltonian
\begin{align*}
H(\theta,p)=\frac{1}{2}p^2+1-\cos\theta,
\end{align*}
Hamilton's equations are
\begin{align*}
\dot\theta=p,\qquad \dot p=-\sin\theta.
\end{align*}
At $(\theta,p)=(0,0)$, these give $\dot\theta=0$ and $\dot p=0$, so $(0,0)$ is an equilibrium. Linearizing $\sin\theta$ at $\theta=0$ gives $\sin\theta=\theta+O(|\theta|^3)$, hence the linearized equation is
\begin{align*}
\dot\theta=p,\qquad \dot p=-\theta.
\end{align*}
Differentiating $\dot\theta=p$ gives $\ddot\theta=\dot p=-\theta$, so the linearized motion has frequency $1$ and the equilibrium is elliptic.
Using the Taylor expansion
\begin{align*}
\cos\theta=1-\frac{\theta^2}{2}+\frac{\theta^4}{24}+O(|\theta|^6),
\end{align*}
we get
\begin{align*}
1-\cos\theta=\frac{\theta^2}{2}-\frac{\theta^4}{24}+O(|\theta|^6).
\end{align*}
Substituting this into $H$ gives
\begin{align*}
H(\theta,p)=\frac{1}{2}p^2+\frac{1}{2}\theta^2-\frac{1}{24}\theta^4+O(|\theta|^6).
\end{align*}
Thus
\begin{align*}
H(\theta,p)=\frac{1}{2}(p^2+\theta^2)-\frac{1}{24}\theta^4+O(|\theta|^6).
\end{align*}
Introduce harmonic-oscillator action-angle variables by
\begin{align*}
\theta=\sqrt{2I}\sin\phi,\qquad p=\sqrt{2I}\cos\phi.
\end{align*}
Then
\begin{align*}
\frac{1}{2}(p^2+\theta^2)=\frac{1}{2}(2I\cos^2\phi+2I\sin^2\phi).
\end{align*}
Since $\sin^2\phi+\cos^2\phi=1$, this becomes
\begin{align*}
\frac{1}{2}(p^2+\theta^2)=I.
\end{align*}
The quartic term is
\begin{align*}
-\frac{1}{24}\theta^4=-\frac{1}{24}(2I)^2\sin^4\phi.
\end{align*}
Therefore
\begin{align*}
-\frac{1}{24}\theta^4=-\frac{1}{6}I^2\sin^4\phi.
\end{align*}
Using
\begin{align*}
\sin^4\phi=\frac{3-4\cos(2\phi)+\cos(4\phi)}{8},
\end{align*}
we obtain
\begin{align*}
-\frac{1}{6}I^2\sin^4\phi=-\frac{1}{48}I^2(3-4\cos(2\phi)+\cos(4\phi)).
\end{align*}
Distributing the factor $-I^2/48$ gives
\begin{align*}
-\frac{1}{6}I^2\sin^4\phi=-\frac{1}{16}I^2+\frac{1}{12}I^2\cos(2\phi)-\frac{1}{48}I^2\cos(4\phi).
\end{align*}
In one degree of freedom there is no nonzero integer $k$ with $k\cdot 1=0$, so the elliptic frequency is nonresonant to every order. By *Birkhoff Normal Form*, the angle-dependent quartic terms can be removed by a local canonical change of variables, while the action-only quartic term remains. Thus, through fourth order, the normal form is
\begin{align*}
H_{\mathrm{BNF}}(I)=I-\frac{1}{16}I^2+O(I^3).
\end{align*}
The corresponding frequency is the derivative of the normal-form Hamiltonian with respect to the action:
\begin{align*}
\Omega(I)=\frac{d}{dI}\left(I-\frac{1}{16}I^2+O(I^3)\right).
\end{align*}
Hence
\begin{align*}
\Omega(I)=1-\frac{1}{8}I+O(I^2).
\end{align*}
Since $I=(\theta^2+p^2)/2$ to leading order, the action measures the squared oscillation amplitude, and the negative quartic correction means that small pendulum oscillations slow down as their amplitude increases.
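The predicted frequency shift can be compared with the exact libration frequency. The sketch below relies on a standard fact that is not derived in these notes: for $H=\frac12p^2+1-\cos\theta$, a libration with turning angle $\theta_0$ has frequency equal to the arithmetic-geometric mean of $1$ and $\cos(\theta_0/2)$. It also uses the leading-order identification $I\approx\theta_0^2/2$, so agreement is expected only up to terms of order $\theta_0^4$. The function names are illustrative.

```python
import math

def agm(a, b):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(60):
        if abs(a - b) <= 1e-15:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def exact_libration_frequency(theta0):
    """Assumed standard formula: frequency = AGM(1, cos(theta0 / 2))."""
    return agm(1.0, math.cos(0.5 * theta0))

def normal_form_frequency(theta0):
    """Omega(I) = 1 - I/8 with the leading-order action I = theta0^2 / 2."""
    action = 0.5 * theta0 ** 2
    return 1.0 - action / 8.0

def frequency_error(theta0):
    return abs(exact_libration_frequency(theta0) - normal_form_frequency(theta0))

for theta0 in (0.1, 0.2, 0.4):
    print(theta0, exact_libration_frequency(theta0), normal_form_frequency(theta0))
```

Doubling the amplitude should multiply the discrepancy by roughly sixteen, consistent with a truncation error of fourth order in $\theta_0$.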
[/example]
This local picture describes small oscillations, but it cannot see the global change in topology of trajectories at the separatrix. The same pendulum therefore illustrates both the strength and the limitation of normal forms.
[illustration:pendulum-phase-cylinder]
[example: Pendulum Near The Separatrix]
For
\begin{align*}
H(\theta,p)=\frac{1}{2}p^2+1-\cos\theta,
\end{align*}
Hamilton's equations are $\dot\theta=p$ and $\dot p=-\sin\theta$. At $(\theta,p)=(\pi,0)$, these give
\begin{align*}
\dot\theta=0
\end{align*}
and
\begin{align*}
\dot p=-\sin\pi=0,
\end{align*}
so $(\pi,0)$ is an equilibrium. Write $\theta=\pi+u$. Since
\begin{align*}
\sin(\pi+u)=-\sin u=-u+O(|u|^3),
\end{align*}
the equations near $(\pi,0)$ become $\dot u=p$ and $\dot p=u+O(|u|^3)$. Keeping the linear terms gives $\dot u=p$ and $\dot p=u$. The linearization sends $(u,p)$ to $(p,u)$, so applying it twice sends $(u,p)$ back to $(u,p)$. Hence its eigenvalues satisfy $\lambda^2=1$, and both values occur: $(1,1)$ has eigenvalue $1$, while $(1,-1)$ has eigenvalue $-1$. Thus the top equilibrium has one expanding linear direction and one contracting linear direction.
The energy at the top equilibrium is
\begin{align*}
H(\pi,0)=\frac{1}{2}0^2+1-\cos\pi=1-(-1)=2.
\end{align*}
On the same energy level,
\begin{align*}
\frac{1}{2}p^2+1-\cos\theta=2.
\end{align*}
Subtracting $1-\cos\theta$ from both sides gives
\begin{align*}
\frac{1}{2}p^2=1+\cos\theta.
\end{align*}
Multiplying by $2$ gives
\begin{align*}
p^2=2(1+\cos\theta).
\end{align*}
Using $1+\cos\theta=2\cos^2(\theta/2)$, this becomes
\begin{align*}
p^2=4\cos^2(\theta/2).
\end{align*}
Thus, on the interval $-\pi<\theta<\pi$, the two separatrix branches are
\begin{align*}
p=2\cos(\theta/2)
\end{align*}
and
\begin{align*}
p=-2\cos(\theta/2),
\end{align*}
with the endpoints identified on the phase cylinder.
For an energy $0<E<2$, the equation
\begin{align*}
\frac{1}{2}p^2+1-\cos\theta=E
\end{align*}
has turning points where $p=0$, namely where
\begin{align*}
1-\cos\theta=E.
\end{align*}
These trajectories oscillate between two turning angles, so they are librations. For $E>2$, the same energy equation gives
\begin{align*}
p^2=2(E-1+\cos\theta).
\end{align*}
Since $\cos\theta\ge -1$, we have
\begin{align*}
2(E-1+\cos\theta)\ge 2(E-2)>0.
\end{align*}
Therefore $p$ never vanishes, so $\theta$ keeps moving around the circle and the trajectories are rotations. The level $H=2$ through $(\pi,0)$ is exactly the boundary between libration and rotation: the bottom equilibrium is governed locally by elliptic normal form, while the top equilibrium organizes the global transition between qualitatively different motions.
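The energy classification above is easy to encode. The following minimal sketch labels a phase point by its energy and checks that the branches $p=\pm2\cos(\theta/2)$ lie on the level $H=2$. The function names and the tolerance are illustrative choices.

```python
import math

def pendulum_energy(theta, p):
    return 0.5 * p * p + 1.0 - math.cos(theta)

def classify_motion(theta, p, tol=1e-12):
    """Label the orbit through (theta, p) by comparing its energy with 0 and 2."""
    energy = pendulum_energy(theta, p)
    if energy < tol:
        return "bottom equilibrium"
    if energy < 2.0 - tol:
        return "libration"
    if energy <= 2.0 + tol:
        return "separatrix level"
    return "rotation"

def turning_angle(energy):
    """Positive turning angle of a libration, from 1 - cos(theta) = E."""
    if not (0.0 < energy < 2.0):
        raise ValueError("turning points exist only for 0 < E < 2")
    return math.acos(1.0 - energy)

print(classify_motion(0.5, 0.0))
print(classify_motion(0.0, 2.5))
print(classify_motion(1.0, 2.0 * math.cos(0.5)))
print(classify_motion(1.0, -2.0 * math.cos(0.5)))
```

The label "separatrix level" covers both the equilibrium $(\pi,0)$ and the two connecting branches, since they share the energy $H=2$.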
[/example]
The separatrix example marks the boundary of the chapter's local methods. Near elliptic equilibria, normal forms organize nonlinear oscillation; near hyperbolic equilibria, invariant manifolds and separatrix geometry control instability and transition.
The local analysis of equilibria and periodic motion prepares the way for the global structure of integrable systems. Chapter 9 combines the previous geometric tools to show when enough commuting integrals allow a change of variables to action-angle coordinates and a complete description of the flow.
# 9. Integrable Systems and Action-Angle Coordinates
This chapter brings together the geometric language developed earlier in the course and applies it to Hamiltonian systems with many conserved quantities. The guiding problem is to understand when a Hamiltonian system can be solved by changing coordinates rather than by integrating a new differential equation. The answer is that enough independent first integrals, provided they Poisson-commute, force the motion onto invariant tori where the dynamics becomes linear.
The progression is from algebra to geometry to coordinates. First integrals in involution give commuting Hamiltonian vector fields; their common level sets become the natural candidates for invariant manifolds. Under compactness and regularity hypotheses these level sets are tori, and action-angle coordinates turn the Hamiltonian flow into uniform rotation.
## First Integrals and Liouville Integrability
The main question is how many conserved quantities are needed to constrain a Hamiltonian system on a $2n$-dimensional phase space. A single conserved Hamiltonian confines motion to an energy hypersurface, but this still leaves many possible directions of motion. Complete integrability asks for $n$ independent conserved quantities whose Hamiltonian flows are mutually compatible.
[definition: First Integral]
Let $(M,\omega)$ be a symplectic manifold, and let $H:M\to \mathbb R$ be a smooth Hamiltonian. A smooth function $F:M\to \mathbb R$ is a first integral of $H$ if
\begin{align*}
\{F,H\}=0.
\end{align*}
[/definition]
The condition says that $F$ is constant along the Hamiltonian flow of $H$. It is stronger than constancy on a single trajectory, because the bracket must vanish at every point of $M$, so $F$ is constant along every trajectory at once.
[example: Energy as a First Integral]
Let $(M,\omega)$ be a symplectic manifold and let $H:M\to \mathbb R$ be a smooth Hamiltonian. The Poisson bracket is skew-symmetric, so for any smooth functions $F,G$ one has $\{F,G\}=-\{G,F\}$. Taking $F=G=H$ gives
\begin{align*}
\{H,H\}=-\{H,H\}.
\end{align*}
Adding $\{H,H\}$ to both sides gives
\begin{align*}
2\{H,H\}=0,
\end{align*}
and hence
\begin{align*}
\{H,H\}=0.
\end{align*}
Therefore $H$ satisfies the defining condition for being a first integral of its own Hamiltonian flow. This shows that first integrals are not rare by themselves; Liouville integrability requires enough additional conserved quantities that are independent from $H$.
[/example]
Conservation alone is too weak for integrability, because several conserved quantities may still fail to organise a compatible family of motions. To use several conserved quantities at once, we need a condition saying that their Hamiltonian flows can be followed in any order. This motivates the following definition.
[definition: Involution]
Let $(M,\omega)$ be a symplectic manifold. Smooth functions $F_1,\dots,F_k:M\to \mathbb R$ are in involution if
\begin{align*}
\{F_i,F_j\}=0
\end{align*}
for all $1\le i,j\le k$.
[/definition]
Involution is the algebraic condition that turns conservation laws into a compatible family, but the compatibility needed for integrability is dynamical rather than only algebraic. The obstruction is that the Hamiltonian flows of two conserved quantities might fail to preserve each other's level sets, so they would not stay on the same common fibre. Vanishing Poisson bracket is the test that removes this obstruction.
[quotetheorem:1336]
[citeproof:1336]
The theorem explains why involution is the right compatibility condition. The condition $\{F,G\}=0$ is a sufficient algebraic test for the Hamiltonian flows to commute and, more importantly for integrability, it says that each Hamiltonian flow preserves the other function's level sets. Without this preservation, the flows generated by the proposed integrals may carry a point away from the common level set on which the torus construction is supposed to take place. For example, on $(\mathbb R^2,dq\wedge dp)$ with $F=q$ and $G=p$, the bracket $\{F,G\}$ is nonzero: the coordinate vector fields themselves are constant and commute, but the flow of $X_F$ changes $G$ and the flow of $X_G$ changes $F$, so the pair does not define common invariant level sets. The theorem does not say that the individual flows are complete, nor does it say that common level sets are compact or toroidal. It only supplies the infinitesimal compatibility needed before the global Arnold-Liouville argument can turn level sets into invariant tori.
To turn this compatibility into a solvability criterion, we still need to specify how many such functions are required and rule out redundant conserved quantities. This leads to the definition of Liouville integrability.
[definition: Liouville Integrable System]
Let $(M,\omega)$ be a symplectic manifold of dimension $2n$. A Hamiltonian system with Hamiltonian $H:M\to \mathbb R$ is Liouville integrable on an open set $U\subset M$ if there exist smooth functions $F_1,\dots,F_n:U\to \mathbb R$ such that $F_1=H|_U$, the functions $F_1,\dots,F_n$ are in involution, and the differentials $dF_1,\dots,dF_n$ are linearly independent at every point of $U$.
[/definition]
The independence condition prevents the conserved quantities from being redundant. Since $n$ independent equations cut a regular level set down from dimension $2n$ to dimension $n$, Liouville integrability creates candidates for half-dimensional invariant manifolds.
[example: Harmonic Oscillator in n Dimensions]
On $T^*\mathbb R^n$ with canonical coordinates $(q_1,\dots,q_n,p_1,\dots,p_n)$, consider
\begin{align*}
H(q,p)=\frac{1}{2}\sum_{i=1}^{n}(p_i^2+\omega_i^2q_i^2),
\end{align*}
where each $\omega_i>0$, and define
\begin{align*}
F_i(q,p)=\frac{1}{2}(p_i^2+\omega_i^2q_i^2).
\end{align*}
Then $H=F_1+\cdots+F_n$. With the canonical Poisson bracket,
\begin{align*}
\{F_i,F_j\}=\sum_{k=1}^{n}\left(\frac{\partial F_i}{\partial q_k}\frac{\partial F_j}{\partial p_k}-\frac{\partial F_i}{\partial p_k}\frac{\partial F_j}{\partial q_k}\right).
\end{align*}
The $q_k$-partial derivative of $F_i$ is $\omega_i^2q_i$ when $k=i$ and is $0$ when $k\ne i$. The $p_k$-partial derivative of $F_i$ is $p_i$ when $k=i$ and is $0$ when $k\ne i$. If $i\ne j$, then in each summand either the derivative of $F_i$ or the derivative of $F_j$ in the same coordinate pair is $0$, so $\{F_i,F_j\}=0$. If $i=j$, the only possibly nonzero summand is
\begin{align*}
(\omega_i^2q_i)(p_i)-(p_i)(\omega_i^2q_i)=0.
\end{align*}
Thus $\{F_i,F_j\}=0$ for all $i,j$, and hence
\begin{align*}
\{F_i,H\}=\sum_{j=1}^{n}\{F_i,F_j\}=0.
\end{align*}
Each $F_i$ is therefore a first integral of $H$.
The differentials are
\begin{align*}
dF_i=\omega_i^2q_i\,dq_i+p_i\,dp_i.
\end{align*}
On the region where all $F_i>0$, no pair $(q_i,p_i)$ is equal to $(0,0)$. If $\sum_i a_i\,dF_i=0$, then evaluating on $\partial/\partial q_i$ gives
\begin{align*}
0=a_i\omega_i^2q_i,
\end{align*}
and evaluating on $\partial/\partial p_i$ gives
\begin{align*}
0=a_i p_i.
\end{align*}
Since $(\omega_i^2q_i,p_i)\ne(0,0)$, these two equalities force $a_i=0$. This holds for every $i$, so $dF_1,\dots,dF_n$ are linearly independent on this region. The family $(H,F_2,\dots,F_n)$ is also independent, because replacing $F_1$ by $H=F_1+\cdots+F_n$ is an invertible linear change of the list of differentials.
For a regular value $c=(c_1,\dots,c_n)$ with every $c_i>0$, the equations $F_i=c_i$ are
\begin{align*}
p_i^2+\omega_i^2q_i^2=2c_i.
\end{align*}
For each $i$, this is an ellipse in the $(q_i,p_i)$-plane. It is parametrised by
\begin{align*}
q_i=\frac{\sqrt{2c_i}}{\omega_i}\cos\phi_i.
\end{align*}
The corresponding momentum coordinate is
\begin{align*}
p_i=\sqrt{2c_i}\sin\phi_i.
\end{align*}
Substituting these expressions gives
\begin{align*}
p_i^2+\omega_i^2q_i^2=2c_i\sin^2\phi_i+2c_i\cos^2\phi_i=2c_i.
\end{align*}
Thus the common level set is a product of $n$ circles. On the open region where all oscillator energies are positive, the system is Liouville integrable and its regular level sets are $n$-tori.
[/example]
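The involution computation above can be checked numerically. The sketch below approximates the canonical Poisson bracket by central differences, which are exact up to rounding for quadratic functions; the frequencies, the sample point `x0`, and all function names are hypothetical choices made for illustration.

```python
def grad(f, x, h=1e-5):
    """Central-difference gradient of f at the point x (a list of floats)."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def poisson_bracket(F, G, x):
    """Canonical bracket {F, G} at x = (q_1, ..., q_n, p_1, ..., p_n)."""
    n = len(x) // 2
    dF, dG = grad(F, x), grad(G, x)
    return sum(dF[k] * dG[n + k] - dF[n + k] * dG[k] for k in range(n))

def oscillator_integral(i, omegas):
    """F_i = (p_i^2 + omega_i^2 q_i^2) / 2 as a function of x = (q, p)."""
    n = len(omegas)
    return lambda x: 0.5 * (x[n + i] ** 2 + omegas[i] ** 2 * x[i] ** 2)

omegas = [1.0, 2.5]                    # hypothetical oscillator frequencies
F1 = oscillator_integral(0, omegas)
F2 = oscillator_integral(1, omegas)
H = lambda x: F1(x) + F2(x)
x0 = [0.3, -0.7, 1.1, 0.4]             # sample point (q1, q2, p1, p2)

bracket_F1_F2 = poisson_bracket(F1, F2, x0)   # expected to vanish
bracket_F1_H = poisson_bracket(F1, H, x0)     # expected to vanish
```

The same helper applied to the coordinate functions $q_1$ and $p_1$ returns a value close to $1$, which is a quick check of the sign convention used in the bracket formula.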
## Lagrangian Invariant Tori and Quasi-Periodic Flows
Once the conserved quantities define a regular common level set, the geometric problem is to identify what kind of submanifold it is. The symplectic form gives a sharp answer: regular compact connected common level sets of a completely integrable system are Lagrangian tori. The dynamics on them is not arbitrary; it is translation by a constant vector.
[definition: Common Level Set of Integrals]
Let $F=(F_1,\dots,F_n):U\to \mathbb R^n$ be a smooth map on an open subset $U$ of a symplectic manifold $(M,\omega)$. For $c=(c_1,\dots,c_n)\in \mathbb R^n$, the common level set at $c$ is
\begin{align*}
N_c=F^{-1}(c)=\{x\in U: F_i(x)=c_i\text{ for all }1\le i\le n\}.
\end{align*}
[/definition]
When $c$ is a regular value, $N_c$ is an $n$-dimensional submanifold. The involution condition forces the Hamiltonian vector fields $X_{F_1},\dots,X_{F_n}$ to lie tangent to it and to span its tangent spaces, and the same Poisson-commuting condition makes the symplectic form vanish on those tangent directions. Thus the level set has two features that must be tracked together: it has half the ambient dimension, and the symplectic form detects no tangent two-dimensional area on it.
Those two features are not merely consequences of the preceding definition; they are the geometric condition that lets an invariant level set behave like a phase torus rather than an arbitrary submanifold. To state the integrable-systems theorem cleanly, we need a name for submanifolds that are maximal among those on which the symplectic form vanishes.
[definition: Lagrangian Submanifold]
Let $(M,\omega)$ be a symplectic manifold of dimension $2n$. A submanifold $L\subset M$ is Lagrangian if $\dim L=n$ and $\omega|_L=0$.
[/definition]
Lagrangian submanifolds are the largest submanifolds on which the symplectic form vanishes. In the present setting, the tangent vectors to the common level set are generated by mutually Poisson-commuting Hamiltonian vector fields. The next theorem says that compact regular common level sets have both this Lagrangian structure and the topology of tori.
[quotetheorem:1353]
[citeproof:1353]
The theorem gives a complete qualitative description near a regular compact level set, and each hypothesis has a visible role. Regularity is needed because singular fibres can collapse cycles or contain equilibria, as in the harmonic oscillator at zero energy where the level set is a point rather than a circle. Compactness excludes regular fibres such as cylinders or planes, where an $\mathbb R^n$-action need not quotient by a lattice to give a torus. Connectedness matters because a compact regular level set could be a disjoint union of invariant tori, so one connected component rather than the whole fibre is the natural object. Independence of the $dF_i$ is what makes the fibre $n$-dimensional and makes the Hamiltonian vector fields span it; without it the proposed coordinates would have redundant actions. Thus the theorem is a regular-fibre theorem, not a classification of singular fibres, noncompact fibres, or global topology across a whole family of tori.
[illustration:integrable-system-invariant-torus]
To understand the actual trajectories on the torus, we need a name for linear motion with several independent angular speeds. The familiar one-dimensional picture suggests that motion with a constant angular speed should close after a period, but higher-dimensional tori have a new failure mode: the trajectory may wind forever without returning to its starting point, and may instead fill the torus densely. This motivates the following definition.
[definition: Quasi-Periodic Flow]
A flow $\varphi:\mathbb R\times\mathbb T^n\to\mathbb T^n$ on $\mathbb T^n=\mathbb R^n/\mathbb Z^n$ is quasi-periodic with frequency vector $\omega=(\omega_1,\dots,\omega_n)\in\mathbb R^n$ if, for every $t\in\mathbb R$ and $\vartheta\in\mathbb T^n$, it has the form
\begin{align*}
\varphi(t,\vartheta)=\vartheta+t\omega \pmod{\mathbb Z^n}.
\end{align*}
[/definition]
The arithmetic of the components of $\omega$ controls the orbit structure. If there is a nonzero integer relation $k\cdot\omega=0$ with $k\in\mathbb Z^n$, each orbit is confined to a proper subtorus determined by the resonance relations; for $\omega\ne0$ the orbit is periodic exactly when $\omega$ is a real multiple of an integer vector, that is, when there are $n-1$ independent resonance relations. If the components are rationally independent, each orbit is dense in $\mathbb T^n$. Resonance is therefore a number-theoretic condition on the frequency vector: integer relations determine which torus directions are removed from the orbit closure and which directions remain densely wound.
[example: Irrational Flow on the Two-Torus]
On $\mathbb T^2=\mathbb R^2/\mathbb Z^2$, fix
\begin{align*}
\varphi_t(\vartheta_1,\vartheta_2)=(\vartheta_1+t,\vartheta_2+\alpha t)\pmod{\mathbb Z^2},
\end{align*}
where $\alpha\in\mathbb R\setminus\mathbb Q$. If the trajectory through $(\vartheta_1,\vartheta_2)$ returned to its starting point at time $t$, then
\begin{align*}
\vartheta_1+t\equiv \vartheta_1 \pmod{\mathbb Z}
\end{align*}
so $t\in\mathbb Z$, and also
\begin{align*}
\vartheta_2+\alpha t\equiv \vartheta_2 \pmod{\mathbb Z}
\end{align*}
so $\alpha t\in\mathbb Z$. If $t\ne 0$, then
\begin{align*}
\alpha=\frac{\alpha t}{t}\in\mathbb Q,
\end{align*}
contradicting the assumption that $\alpha$ is irrational. Hence the only return time is $t=0$, so the trajectory is not periodic.
The orbit is dense. Let $(\eta_1,\eta_2)\in\mathbb T^2$ be any target point, and choose real representatives for all angular coordinates. Set
\begin{align*}
a=\eta_1-\vartheta_1.
\end{align*}
For any integer $m$, the time $t=a+m$ gives
\begin{align*}
\vartheta_1+t=\vartheta_1+a+m=\eta_1+m\equiv \eta_1 \pmod{\mathbb Z}.
\end{align*}
At the same time, the second coordinate is
\begin{align*}
\vartheta_2+\alpha t=\vartheta_2+\alpha a+\alpha m.
\end{align*}
Since $\alpha$ is irrational, *Kronecker's density theorem* implies that the set $\{\alpha m\pmod{\mathbb Z}:m\in\mathbb Z\}$ is dense in $\mathbb T^1$. Therefore we can choose $m$ so that
\begin{align*}
\vartheta_2+\alpha a+\alpha m\equiv \eta_2 \pmod{\mathbb Z}
\end{align*}
to within any prescribed circle-neighbourhood. Thus every neighbourhood of every point of $\mathbb T^2$ meets the trajectory. The irrational frequency ratio prevents closure and makes the orbit wind densely around the invariant torus.
[/example]
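The density argument can be watched at work numerically. The sketch below follows the proof: it scans the times $t=a+m$ at which the first angle hits the target exactly and records how close the second angle comes. The choice $\alpha=\sqrt2$, the start and target points, and the search bound are assumptions made for illustration.

```python
import math

def circle_distance(x, y):
    """Distance between x and y on the circle R/Z."""
    d = (x - y) % 1.0
    return min(d, 1.0 - d)

def closest_approach(start, target, alpha, max_m):
    """Scan t = a + m for m = 0, ..., max_m - 1, where a = target[0] - start[0].

    Returns (smallest distance in the second angle, the time achieving it).
    """
    a = target[0] - start[0]
    best_dist, best_t = float("inf"), None
    for m in range(max_m):
        t = a + m
        second = start[1] + alpha * t
        d = circle_distance(second, target[1])
        if d < best_dist:
            best_dist, best_t = d, t
    return best_dist, best_t

alpha = math.sqrt(2.0)                 # an irrational frequency ratio
dist, t_best = closest_approach((0.1, 0.2), (0.6, 0.9), alpha, 1000)
```

Increasing `max_m` can only decrease the returned distance, which is the numerical shadow of the statement that every neighbourhood of the target meets the orbit.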
## Construction and Interpretation of Action-Angle Variables
The preceding section describes the invariant tori, but it does not yet provide canonical coordinates near them. The constructive problem is to find coordinates that preserve the symplectic form and make the Hamiltonian depend only on the conserved quantities. These coordinates separate the slow data specifying the torus from the angular data specifying position on the torus.
[definition: Action-Angle Coordinates]
Let $(M,\omega)$ be a symplectic manifold of dimension $2n$. Action-angle coordinates on an open set $V\subset M$ consist of an open set $A\subset\mathbb R^n$ and a diffeomorphism
\begin{align*}
\Phi:V\to A\times\mathbb T^n,
\end{align*}
where $\mathbb T^n=\mathbb R^n/\mathbb Z^n$, such that $\Phi(x)=(I(x),\vartheta(x))$ with $I:V\to A$ and $\vartheta:V\to\mathbb T^n$, and
\begin{align*}
\omega=\sum_{i=1}^{n}dI_i\wedge d\vartheta_i.
\end{align*}
[/definition]
As in Chapter 7's Hamilton-Jacobi discussion, the variables $I_i$ label nearby invariant tori, while the variables $\vartheta_i$ are period-$1$ coordinates along each torus. This convention absorbs the usual factor of $2\pi$ into the action variables: if $\phi_i=2\pi\vartheta_i$ are $2\pi$-periodic angles and $J_i$ are the corresponding normalised actions, then $I_i=2\pi J_i$ gives $dI_i\wedge d\vartheta_i=dJ_i\wedge d\phi_i$. What remains is to know that such coordinates exist under the regularity and compactness hypotheses of the Arnold-Liouville theorem. The next theorem gives that canonical normal form.
[quotetheorem:1353]
[citeproof:1353]
This theorem turns integrability into a coordinate normal form, but the conclusion is deliberately local near one compact regular torus. Without compact regular tori, there may be no periodic angular variables: for instance, the free particle on $T^*\mathbb R^n$ has regular fibres with translational directions rather than torus angles. A singular fibre can make the action integrals degenerate, as in the zero-energy harmonic oscillator where the circle collapses to an equilibrium. Locality also matters because a family of regular tori may have monodromy; the spherical pendulum is the standard mechanical example, where transporting a basis of cycles around the focus-focus singular value changes that basis. The theorem therefore gives a computational method near a chosen regular torus, not a global coordinate system on the whole phase space. In practice, one computes the actions by integrating the Liouville one-form over a basis of cycles, rewrites $H$ as $h(I)$, and reads off the frequencies as $\partial h/\partial I_i$.
Solving the equations now means evaluating $I(t)=I(0)$ and integrating a constant angular velocity.
[example: Actions for the One-Dimensional Harmonic Oscillator]
For
\begin{align*}
H(q,p)=\frac{1}{2}(p^2+\omega^2q^2),
\end{align*}
with $\omega>0$, the positive energy curve $H=E$ satisfies
\begin{align*}
p^2+\omega^2q^2=2E.
\end{align*}
Parametrize this ellipse by
\begin{align*}
q(\phi)=\frac{\sqrt{2E}}{\omega}\cos\phi,\qquad p(\phi)=-\sqrt{2E}\sin\phi,\qquad 0\le \phi\le 2\pi.
\end{align*}
Then
\begin{align*}
dq=-\frac{\sqrt{2E}}{\omega}\sin\phi\,d\phi.
\end{align*}
Substituting into the action integral gives
\begin{align*}
\oint p\,dq=\int_0^{2\pi}\left(-\sqrt{2E}\sin\phi\right)\left(-\frac{\sqrt{2E}}{\omega}\sin\phi\right)\,d\phi.
\end{align*}
Thus
\begin{align*}
\oint p\,dq=\frac{2E}{\omega}\int_0^{2\pi}\sin^2\phi\,d\phi.
\end{align*}
Using $\sin^2\phi=(1-\cos 2\phi)/2$,
\begin{align*}
\int_0^{2\pi}\sin^2\phi\,d\phi=\int_0^{2\pi}\frac{1-\cos 2\phi}{2}\,d\phi=\pi.
\end{align*}
Therefore
\begin{align*}
\oint p\,dq=\frac{2\pi E}{\omega}.
\end{align*}
The standard action paired with the $2\pi$-periodic phase $\phi$ is
\begin{align*}
J=\frac{1}{2\pi}\oint p\,dq=\frac{E}{\omega}.
\end{align*}
Equivalently,
\begin{align*}
E=\omega J.
\end{align*}
Since $H=E$ on the energy curve, this convention gives
\begin{align*}
H=\omega J.
\end{align*}
Hamilton's equation for the conjugate pair $(J,\phi)$ gives
\begin{align*}
\dot\phi=\frac{\partial H}{\partial J}=\omega.
\end{align*}
In the period-$1$ convention used for action-angle coordinates, set
\begin{align*}
\vartheta=\frac{\phi}{2\pi},\qquad I=2\pi J.
\end{align*}
Since $J=I/(2\pi)$, the Hamiltonian becomes
\begin{align*}
H=\omega J=\frac{\omega}{2\pi}I.
\end{align*}
Hamilton's equations in the coordinates $(I,\vartheta)$ therefore give
\begin{align*}
\dot I=-\frac{\partial H}{\partial \vartheta}=0.
\end{align*}
They also give
\begin{align*}
\dot\vartheta=\frac{\partial H}{\partial I}=\frac{\omega}{2\pi}.
\end{align*}
Finally, because $\phi=2\pi\vartheta$,
\begin{align*}
\dot\phi=2\pi\dot\vartheta=2\pi\frac{\omega}{2\pi}=\omega.
\end{align*}
Thus changing from the $2\pi$-periodic phase to the period-$1$ angle rescales the action and angular coordinate, but the physical oscillator still rotates with angular speed $\omega$.
[/example]
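The action integral in this example can be confirmed by direct quadrature along the parametrised ellipse. The following is a minimal sketch using the midpoint rule; the sample values of $E$ and $\omega$ and the number of nodes are arbitrary illustrative choices.

```python
import math

def action_integral(E, omega, n=2000):
    """Midpoint-rule approximation of the loop integral of p dq over H = E."""
    amp_q = math.sqrt(2.0 * E) / omega
    amp_p = math.sqrt(2.0 * E)
    dphi = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * dphi
        p = -amp_p * math.sin(phi)            # p(phi) from the parametrisation
        dq_dphi = -amp_q * math.sin(phi)      # derivative of q(phi)
        total += p * dq_dphi * dphi
    return total

E, omega = 1.7, 0.8                    # hypothetical sample values
loop = action_integral(E, omega)
J = loop / (2.0 * math.pi)             # action paired with the 2*pi-periodic phase
I = loop                               # action paired with the period-1 angle
```

With these values, `omega * J` and `omega / (2 * math.pi) * I` both reproduce $E$ up to rounding, matching $H=\omega J=\frac{\omega}{2\pi}I$.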
The formula for the action shows why these variables are natural in mechanics. Actions measure symplectic areas, so they are invariant under canonical changes of coordinates and record the size of the torus in phase space.
## Classical Families of Integrable Systems
The abstract theory becomes useful only after recognising complete sets of integrals in examples. Three recurrent sources are separable oscillators, rotational symmetry, and rigid body motion. Each illustrates a different way in which enough conserved quantities can appear.
The harmonic oscillator is the model case where integrability comes from decoupling into independent one-degree-of-freedom oscillators, each with its own two-dimensional phase plane. Its invariant tori are products of energy circles, and the frequencies are fixed by the oscillator constants.
[example: n-Dimensional Harmonic Oscillator Frequencies]
For the Hamiltonian
\begin{align*}
H(I,\vartheta)=\sum_{i=1}^{n}\omega_i I_i
\end{align*}
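in action-angle coordinates with period-$1$ angles, where each $\omega_i$ denotes the frequency conjugate to $\vartheta_i$ (for an oscillator this is its angular frequency divided by $2\pi$, as in the one-dimensional example), Hamilton's equations give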
\begin{align*}
\dot I_k=-\frac{\partial H}{\partial \vartheta_k}=0
\end{align*}
because $H$ has no $\vartheta_k$-dependence. Also,
\begin{align*}
\frac{\partial H}{\partial I_k}=\frac{\partial}{\partial I_k}\left(\omega_k I_k+\sum_{i\ne k}\omega_i I_i\right)=\omega_k,
\end{align*}
so
\begin{align*}
\dot\vartheta_k=\frac{\partial H}{\partial I_k}=\omega_k.
\end{align*}
Thus $I(t)=I(0)$, and the angular motion is
\begin{align*}
\vartheta(t)=\vartheta(0)+t(\omega_1,\dots,\omega_n)\pmod{\mathbb Z^n}.
\end{align*}
Assume first that all ratios $\omega_i/\omega_j$ are rational, with $\omega_i>0$. Fix $\omega_1$. For each $i$, write
\begin{align*}
\frac{\omega_i}{\omega_1}=\frac{a_i}{b_i}
\end{align*}
with $a_i,b_i\in\mathbb Z_{>0}$. Let $B$ be a common multiple of $b_1,\dots,b_n$, and set
\begin{align*}
T=\frac{B}{\omega_1}.
\end{align*}
Then for each $i$,
\begin{align*}
T\omega_i=\frac{B}{\omega_1}\omega_i=B\frac{\omega_i}{\omega_1}=B\frac{a_i}{b_i}\in\mathbb Z.
\end{align*}
Therefore
\begin{align*}
\vartheta(T)=\vartheta(0)+T\omega\equiv\vartheta(0)\pmod{\mathbb Z^n},
\end{align*}
so the trajectory closes after the common period $T$.
If instead $\omega_1,\dots,\omega_n$ are rationally independent, meaning that
\begin{align*}
k_1\omega_1+\cdots+k_n\omega_n=0
\end{align*}
with $k_i\in\mathbb Z$ forces $k_1=\cdots=k_n=0$, then *Kronecker's density theorem* implies that the set
\begin{align*}
\{\vartheta(0)+t\omega\pmod{\mathbb Z^n}:t\in\mathbb R\}
\end{align*}
is dense in $\mathbb T^n$. Thus full resonance, with all frequency ratios rational, gives closed trajectories, while rational independence makes the same linear flow wind densely on the invariant torus $I=I(0)$.
[/example]
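The construction of the common period $T=B/\omega_1$ can be carried out in exact arithmetic. The sketch below represents the frequencies as exact rationals, which is a stronger assumption than the argument needs (only the ratios must be rational); the sample frequencies and the function name are hypothetical.

```python
from fractions import Fraction
from math import gcd

def common_period(omegas):
    """Return T = B / omega_1 such that T * omega_i is an integer for every i.

    omegas: a list of positive Fractions, so every ratio omega_i / omega_1
    is rational by construction.
    """
    omega1 = omegas[0]
    B = 1
    for w in omegas:
        ratio = w / omega1             # a_i / b_i, automatically in lowest terms
        b = ratio.denominator
        B = B * b // gcd(B, b)         # running least common multiple of the b_i
    return B / omega1

omegas = [Fraction(1, 2), Fraction(3, 4), Fraction(5, 6)]   # sample frequencies
T = common_period(omegas)
windings = [T * w for w in omegas]     # winding numbers of each angle over time T
```

Here the denominators $b_i$ are $1,2,3$, so $B=6$ and $T=12$; each angle then advances by an integer, so $\vartheta(T)\equiv\vartheta(0)$.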
Central force motion is integrable for a different reason: rotational symmetry gives angular momentum conservation, and the planar reduction leaves a one-degree-of-freedom radial problem.
[example: Central Force Problem]
Let $m>0$, let $\mathbb R^3_0=\mathbb R^3\setminus\{0\}$, and consider the Hamiltonian
\begin{align*}
H(q,p)=\frac{1}{2m}|p|^2+V(|q|)
\end{align*}
on $T^*\mathbb R^3_0$. Write $r=|q|$ and $L=q\times p$. Hamilton's equations are
\begin{align*}
\dot q=\frac{\partial H}{\partial p}=\frac{p}{m}
\end{align*}
and
\begin{align*}
\dot p=-\frac{\partial H}{\partial q}=-V'(r)\frac{q}{r}.
\end{align*}
Therefore
\begin{align*}
\dot L=\frac{d}{dt}(q\times p)=\dot q\times p+q\times \dot p.
\end{align*}
Substituting the equations of motion gives
\begin{align*}
\dot L=\frac{p}{m}\times p+q\times\left(-V'(r)\frac{q}{r}\right).
\end{align*}
Since $p\times p=0$ and $q\times q=0$, this becomes
\begin{align*}
\dot L=0-\frac{V'(r)}{r}(q\times q)=0.
\end{align*}
Thus angular momentum is conserved.
If $L\ne 0$, then
\begin{align*}
q\cdot L=q\cdot(q\times p)=0
\end{align*}
and
\begin{align*}
p\cdot L=p\cdot(q\times p)=0.
\end{align*}
Because $L$ is constant, both $q(t)$ and $p(t)$ remain in the fixed plane perpendicular to $L$. Choose polar coordinates $(r,\phi)$ in that plane, with canonical momenta $(p_r,\ell)$, where $\ell$ is the signed angular momentum in the chosen plane. Then
\begin{align*}
p=p_r e_r+\frac{\ell}{r}e_\phi
\end{align*}
with $e_r\cdot e_\phi=0$ and $|e_r|=|e_\phi|=1$. Hence
\begin{align*}
|p|^2=\left|p_r e_r+\frac{\ell}{r}e_\phi\right|^2=p_r^2+\frac{\ell^2}{r^2}.
\end{align*}
The reduced Hamiltonian is therefore
\begin{align*}
H_{\mathrm{red}}(r,\phi,p_r,\ell)=\frac{p_r^2}{2m}+\frac{\ell^2}{2mr^2}+V(r).
\end{align*}
It has no $\phi$-dependence, so Hamilton's equation gives
\begin{align*}
\dot\ell=-\frac{\partial H_{\mathrm{red}}}{\partial \phi}=0.
\end{align*}
For a fixed value of $\ell$, the radial motion is governed by
\begin{align*}
H_{\mathrm{rad}}(r,p_r)=\frac{p_r^2}{2m}+V_{\mathrm{eff}}(r)
\end{align*}
where
\begin{align*}
V_{\mathrm{eff}}(r)=V(r)+\frac{\ell^2}{2mr^2}.
\end{align*}
Thus the central force problem reduces, in each plane of motion with nonzero angular momentum, to a one-dimensional radial Hamiltonian together with the conserved angular momentum $\ell$. On regular regions where the differentials of the chosen conserved quantities are independent, the energy and angular momentum give commuting integrals, so the reduced system is Liouville integrable.
[/example]
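Conservation of angular momentum can be observed in a numerical integration of the planar problem. The sketch below uses a kick-drift-kick (leapfrog) scheme, chosen here because each kick is parallel to $q$ and each drift is parallel to $p$, so $q\times p$ is preserved up to rounding. The attractive potential $V(r)=-k/r$, the initial data, and the step size are assumptions made for illustration.

```python
import math

def cross_z(q, p):
    """z-component of q x p for planar vectors."""
    return q[0] * p[1] - q[1] * p[0]

def force(q, k=1.0):
    """Force -V'(r) q / r for the sample potential V(r) = -k / r."""
    r = math.hypot(q[0], q[1])
    return (-k * q[0] / r**3, -k * q[1] / r**3)

def energy(q, p, m=1.0, k=1.0):
    """Total energy |p|^2 / (2m) + V(r) for the sample potential."""
    return (p[0]**2 + p[1]**2) / (2.0 * m) - k / math.hypot(q[0], q[1])

def leapfrog(q, p, m=1.0, dt=1e-3, steps=5000):
    """Kick-drift-kick integration of Hamilton's equations in the plane."""
    q, p = list(q), list(p)
    for _ in range(steps):
        f = force(q)
        p = [p[0] + 0.5 * dt * f[0], p[1] + 0.5 * dt * f[1]]   # half kick
        q = [q[0] + dt * p[0] / m, q[1] + dt * p[1] / m]       # full drift
        f = force(q)
        p = [p[0] + 0.5 * dt * f[0], p[1] + 0.5 * dt * f[1]]   # half kick
    return q, p

q0, p0 = (1.0, 0.0), (0.0, 1.2)
q1, p1 = leapfrog(q0, p0)
ell_drift = abs(cross_z(q1, p1) - cross_z(q0, p0))
energy_drift = abs(energy(q1, p1) - energy(q0, p0))
```

The angular momentum drift stays at rounding level, whereas the energy is only approximately conserved by the scheme, with an error controlled by the step size.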
The Euler top shows that integrability is not restricted to particle motion in Euclidean space. It is a finite-dimensional Hamiltonian system on a Lie-Poisson phase space, and its conserved quantities come from energy and the fixed magnitude of angular momentum.
[example: Euler Top as an Integrable System]
For a free rigid body with positive principal moments of inertia $I_1,I_2,I_3$, the body angular momentum components $M_1,M_2,M_3$ satisfy the Euler equations
\begin{align*}
\dot M_1&=\left(\frac{1}{I_3}-\frac{1}{I_2}\right)M_2M_3,\\
\dot M_2&=\left(\frac{1}{I_1}-\frac{1}{I_3}\right)M_3M_1,\\
\dot M_3&=\left(\frac{1}{I_2}-\frac{1}{I_1}\right)M_1M_2.
\end{align*}
We verify that the energy
\begin{align*}
H=\frac{1}{2}\left(\frac{M_1^2}{I_1}+\frac{M_2^2}{I_2}+\frac{M_3^2}{I_3}\right)
\end{align*}
is conserved along these equations. Differentiating term by term gives
\begin{align*}
\dot H=\frac{M_1\dot M_1}{I_1}+\frac{M_2\dot M_2}{I_2}+\frac{M_3\dot M_3}{I_3}.
\end{align*}
Substituting the Euler equations gives
\begin{align*}
\dot H=\frac{M_1M_2M_3}{I_1}\left(\frac{1}{I_3}-\frac{1}{I_2}\right)+\frac{M_1M_2M_3}{I_2}\left(\frac{1}{I_1}-\frac{1}{I_3}\right)+\frac{M_1M_2M_3}{I_3}\left(\frac{1}{I_2}-\frac{1}{I_1}\right).
\end{align*}
Factoring out $M_1M_2M_3$ leaves
\begin{align*}
\dot H=M_1M_2M_3\left(\frac{1}{I_1I_3}-\frac{1}{I_1I_2}+\frac{1}{I_1I_2}-\frac{1}{I_2I_3}+\frac{1}{I_2I_3}-\frac{1}{I_1I_3}\right).
\end{align*}
The six terms cancel in pairs, so
\begin{align*}
\dot H=0.
\end{align*}
The squared angular momentum
\begin{align*}
C=M_1^2+M_2^2+M_3^2
\end{align*}
is also conserved. Differentiating gives
\begin{align*}
\dot C=2M_1\dot M_1+2M_2\dot M_2+2M_3\dot M_3.
\end{align*}
Substituting the Euler equations gives
\begin{align*}
\dot C=2M_1M_2M_3\left(\frac{1}{I_3}-\frac{1}{I_2}\right)+2M_1M_2M_3\left(\frac{1}{I_1}-\frac{1}{I_3}\right)+2M_1M_2M_3\left(\frac{1}{I_2}-\frac{1}{I_1}\right).
\end{align*}
Factoring out $2M_1M_2M_3$ gives
\begin{align*}
\dot C=2M_1M_2M_3\left(\frac{1}{I_3}-\frac{1}{I_2}+\frac{1}{I_1}-\frac{1}{I_3}+\frac{1}{I_2}-\frac{1}{I_1}\right)=0.
\end{align*}
Thus each trajectory stays on the sphere $C=c$ and also on the energy ellipsoid $H=E$. On a regular sphere $C=c>0$, the reduced phase space is two-dimensional, and the additional equation $H=E$ cuts out one-dimensional invariant curves except at critical levels. Hence the reduced Euler top has one degree of freedom with a conserved Hamiltonian, so on regular level curves its motion is integrable and can be described by an angle coordinate along each invariant curve.
[/example]
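The two conservation laws can be checked along a numerically integrated trajectory of the Euler equations. The following is a sketch using a classical fourth-order Runge-Kutta step, which does not preserve $H$ and $C$ exactly but keeps their drift very small for small steps; the principal moments, initial momentum, and step size are hypothetical sample values.

```python
def euler_rhs(M, I):
    """Right-hand side of the Euler equations for the body angular momentum M."""
    M1, M2, M3 = M
    I1, I2, I3 = I
    return (
        (1.0 / I3 - 1.0 / I2) * M2 * M3,
        (1.0 / I1 - 1.0 / I3) * M3 * M1,
        (1.0 / I2 - 1.0 / I1) * M1 * M2,
    )

def rk4_step(M, I, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, k, s):
        return tuple(b + s * ki for b, ki in zip(base, k))
    k1 = euler_rhs(M, I)
    k2 = euler_rhs(shifted(M, k1, 0.5 * dt), I)
    k3 = euler_rhs(shifted(M, k2, 0.5 * dt), I)
    k4 = euler_rhs(shifted(M, k3, dt), I)
    return tuple(
        m + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for m, a, b, c, d in zip(M, k1, k2, k3, k4)
    )

def energy(M, I):
    """H = (M1^2/I1 + M2^2/I2 + M3^2/I3) / 2."""
    return 0.5 * sum(m * m / i for m, i in zip(M, I))

def casimir(M):
    """C = M1^2 + M2^2 + M3^2."""
    return sum(m * m for m in M)

I = (1.0, 2.0, 3.0)                    # hypothetical principal moments
M0 = (1.0, 0.5, 0.2)                   # hypothetical initial body momentum
M = M0
for _ in range(2000):
    M = rk4_step(M, I, 1e-3)

energy_drift = abs(energy(M, I) - energy(M0, I))
casimir_drift = abs(casimir(M) - casimir(M0))
```

Both drifts remain far below the size of the motion itself, consistent with the trajectory staying on the intersection of the sphere $C=c$ with the energy ellipsoid $H=E$.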
These examples also mark the boundary of the theory. Liouville integrability is a rigid condition: it needs the correct number of independent commuting integrals on regular regions, and singular fibres require additional analysis beyond the regular Arnold-Liouville theorem. In applications, much of the work lies in finding the integrals and identifying where their differentials are independent.
Even when exact integration is unavailable, the same geometric ideas still guide the analysis. Chapter 10 applies symplectic methods, reduction, Hamilton-Jacobi ideas, and action-angle thinking to scattering, adiabatic change, and perturbation theory, where conservation laws and slow variation remain the main organizing principles.
# 10. Classical Scattering, Adiabatic Invariants, and Perturbation Ideas
This closing chapter uses the symplectic, reduction, Hamilton-Jacobi, stability, and action-angle machinery developed in Chapters 3 through 9 in three situations where exact integration is no longer the whole story. Central-force scattering asks how conserved quantities determine an incoming-to-outgoing map rather than a bound orbit. Slowly varying systems show that some quantities remain nearly conserved even when energy need not be. Near-integrable Hamiltonians then explain why resonance, recurrence, and perturbation theory are the natural next questions after action-angle variables.
## Central-Force Scattering
The basic scattering problem is not to solve for the whole trajectory as a function of time, but to compare the incoming asymptote with the outgoing asymptote. For a particle moving in a central potential, rotational symmetry and energy conservation reduce this comparison to a one-dimensional radial calculation. We write $\mathbb R^3_0:=\mathbb R^3\setminus\{0\}$ for punctured Euclidean space, since the central force law is singular or undefined at the centre.
[definition: Classical Scattering Orbit]
Let $m>0$ be the mass. Let $V:(0,\infty)\to \mathbb R$ be a central potential. Consider a trajectory $q(t)\in\mathbb R^3_0$ satisfying
\begin{align*}
m\ddot q(t)=-V'(|q(t)|)\frac{q(t)}{|q(t)|}.
\end{align*}
Let $\mathcal S_E(V)$ be the set of maps $q\in C^2(\mathbb R;\mathbb R^3_0)$ satisfying this equation and having conserved energy $E$. A scattering orbit at energy $E$ is an element $q\in\mathcal S_E(V)$ for which $|q(t)|\to\infty$ as $t\to\pm\infty$ and the asymptotic velocities
\begin{align*}
v_\pm=\lim_{t\to\pm\infty}\dot q(t)
\end{align*}
exist with $|v_-|=|v_+|>0$, and for which there exist vectors $q_\pm\in\mathbb R^3$ such that
\begin{align*}
q(t)-t v_\pm\to q_\pm
\end{align*}
as $t\to\pm\infty$.
[/definition]
This definition isolates unbound motion with well-defined incoming and outgoing directions. To compare different incoming lines with the same speed, we need a scalar parameter measuring how close the incoming asymptote comes to the centre.
[definition: Impact Parameter]
Let $\mathcal S_{E,\mathrm{sc}}(V)\subset\mathcal S_E(V)$ be the set of scattering orbits at energy $E$. For $q\in\mathcal S_{E,\mathrm{sc}}(V)$, let $v_-=\lim_{t\to-\infty}\dot q(t)$ and let $q_-\in\mathbb R^3$ satisfy $q(t)-t v_-\to q_-$ as $t\to-\infty$. The impact parameter is the map $b:\mathcal S_{E,\mathrm{sc}}(V)\to[0,\infty)$ defined by
\begin{align*}
b(q)=\frac{|q_-\times v_-|}{|v_-|}.
\end{align*}
[/definition]
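A short numerical reading of this definition may help. The sketch below evaluates $b=|q_-\times v_-|/|v_-|$ for hypothetical asymptotic data and then repeats the computation after translating $q_-$ along $v_-$; the vectors and helper names are illustrative assumptions.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def norm(a):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in a))

def impact_parameter(q_minus, v_minus):
    """b = |q_- x v_-| / |v_-| for incoming asymptotic data."""
    return norm(cross(q_minus, v_minus)) / norm(v_minus)

q_minus = (-5.0, 2.0, 0.0)             # hypothetical asymptotic offset
v_minus = (3.0, 0.0, 0.0)              # hypothetical incoming velocity
b = impact_parameter(q_minus, v_minus)

# Translate q_- along the incoming direction and recompute.
shifted = tuple(q + 4.0 * v for q, v in zip(q_minus, v_minus))
b_shifted = impact_parameter(shifted, v_minus)
```

Both evaluations give the same number, the distance of the incoming line from the origin, which here is the $y$-offset $2$.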
For a fixed parametrisation of the trajectory the limit $q_-$ is unique, but shifting the time origin by $s$ replaces $q_-$ by $q_-+s v_-$; the displayed quantity is independent of that choice because $v_-\times v_-=0$. Hence $b(q)$ is the perpendicular distance from the origin to the incoming asymptotic line. The impact parameter is the geometric datum left after fixing the incoming speed and choosing the incoming direction. It should be tied to conserved quantities, because conservation laws are what make the incoming data computable throughout the encounter rather than only at infinity. The next theorem records the precise link between scattering geometry, angular momentum, and planar reduction.
[quotetheorem:6856]
[citeproof:6856]
The theorem turns a three-dimensional scattering problem into a planar problem with two constants of motion. The hypothesis $V(r)\to0$ is not cosmetic: without it, the asymptotic kinetic energy need not equal the conserved energy, so $v_\infty$ would not determine $E$ by the displayed formula. Smoothness is also doing work; for a potential with corners or singular impulses, differentiating $V(|q|)$ along the trajectory may fail at the singular point and the displayed force law may not define a classical $C^2$ solution. Centrality is equally essential: a small non-radial force can exert nonzero torque, so $mq\times\dot q$ need not be conserved and the motion need not stay in a fixed plane. The conclusion also does not say that every orbit scatters; bound orbits and collision orbits may still occur depending on the energy and angular momentum.
The radial coordinate is not free motion after the planar reduction. Even when the original potential is weak far from the centre, angular momentum forces the particle to spend kinetic energy on rotation, and that rotational cost grows like $r^{-2}$ near the origin. A one-dimensional radial equation should therefore remember both the original potential and this angular-momentum barrier; otherwise the turning-point condition would miss the closest approach.
[illustration:central-force-effective-potential]
The radial turning point is where the radial velocity vanishes: all available energy is then accounted for by the original potential together with the rotational kinetic energy fixed by the angular momentum. To make that condition usable, we package the original potential and the angular-momentum barrier into a single radial function whose level set records the closest approach.
[definition: Effective Radial Potential]
For fixed angular momentum magnitude $\ell>0$, the effective radial potential is the function $V_{\mathrm{eff}}:(0,\infty)\to\mathbb R$ defined by
\begin{align*}
V_{\mathrm{eff}}(r)=V(r)+\frac{\ell^2}{2mr^2},\qquad r>0.
\end{align*}
[/definition]
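As a numerical illustration of the definition, the following minimal Python sketch locates the closest approach by solving $E=V_{\mathrm{eff}}(r)$ with bisection. The helper names `v_eff` and `closest_approach`, the bracket $[10^{-3},10^{3}]$, and the sample data (a repulsive Coulomb potential $V(r)=\kappa/r$ with $m=\kappa=v_\infty=b=1$, together with $E=mv_\infty^2/2$ and $\ell=mv_\infty b$) are assumptions chosen for the illustration.

```python
import math


def v_eff(r, ell, m, potential):
    """Effective radial potential V(r) + ell^2 / (2 m r^2)."""
    return potential(r) + ell**2 / (2.0 * m * r**2)


def closest_approach(energy, ell, m, potential, r_lo, r_hi, tol=1e-12):
    """Solve E = V_eff(r) on [r_lo, r_hi] by bisection.

    Assumes V_eff(r_lo) > E > V_eff(r_hi) and a single crossing in between.
    """
    def gap(r):
        return v_eff(r, ell, m, potential) - energy

    if not (gap(r_lo) > 0.0 > gap(r_hi)):
        raise ValueError("bracket does not straddle the turning point")
    while r_hi - r_lo > tol * r_hi:
        mid = 0.5 * (r_lo + r_hi)
        if gap(mid) > 0.0:
            r_lo = mid  # still inside the barrier: move the inner end outward
        else:
            r_hi = mid
    return 0.5 * (r_lo + r_hi)


# Hypothetical sample data, not taken from the notes.
m, kappa, v_inf, b = 1.0, 1.0, 1.0, 1.0
energy = 0.5 * m * v_inf**2
ell = m * v_inf * b
r_min = closest_approach(energy, ell, m, lambda r: kappa / r, 1e-3, 1e3)
print(r_min)
```

For these values the turning-point condition reduces to $r^2-2r-1=0$, so the sketch should return a value close to $1+\sqrt2$.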
The term
\begin{align*}
\frac{\ell^2}{2mr^2}
\end{align*}
is the centrifugal contribution produced by eliminating the angular coordinate, and it creates a barrier near $r=0$ whenever $\ell\ne0$. Once the radial turning point is known from $E=V_{\mathrm{eff}}(r)$, the remaining task is to measure how much angle is accumulated during the inward and outward legs of the orbit. The following formula is the bridge from conserved quantities to the actual scattering map.
[quotetheorem:6857]
[citeproof:6857]
The formula expresses the scattering map through a single improper integral, but its hypotheses describe exactly where the reduction can fail. If the radicand vanishes again beyond $r_{\min}$, the orbit has another radial obstruction rather than a single incoming-to-outgoing passage. If the turning point is not simple, the endpoint singularity of the integrand is no longer of integrable inverse-square-root type: at a double root the integral diverges logarithmically, the orbit spirals toward a circular orbit instead of leaving, and the scattering angle varies non-uniformly for nearby data. Long-range potentials require additional care because the comparison with a straight-line asymptote may need renormalisation. Thus the formula determines the deflection for the chosen branch of scattering data; it does not by itself classify all possible orbits of the potential. Its main use in mechanics is that different potentials leave different fingerprints in the dependence of $\Theta$ on the impact parameter.
[illustration:scattering-geometry]
[example: Rutherford-Type Inverse-Square Scattering]
Consider the repulsive Coulomb potential
\begin{align*}
V(r)=\frac{\kappa}{r}
\end{align*}
with $\kappa>0$, incoming speed $v_\infty$, and nonzero impact parameter $b>0$. The conserved quantities are
\begin{align*}
E=\frac{m v_\infty^2}{2},\qquad \ell=m v_\infty b.
\end{align*}
For this potential, the scattering half-angle is
\begin{align*}
\Phi(E,\ell)=\int_{r_{\min}}^\infty \frac{\ell\,dr}{r^2\sqrt{2mE-2m\kappa/r-\ell^2/r^2}}.
\end{align*}
Put $u=1/r$, so $du=-dr/r^2$. As $r$ goes from $r_{\min}$ to $\infty$, $u$ goes from $u_{\max}=1/r_{\min}$ to $0$, hence
\begin{align*}
\Phi(E,\ell)=\int_0^{u_{\max}}\frac{\ell\,du}{\sqrt{2mE-2m\kappa u-\ell^2u^2}}.
\end{align*}
The turning point is determined by the vanishing of the radicand:
\begin{align*}
2mE-2m\kappa u_{\max}-\ell^2u_{\max}^2=0.
\end{align*}
Completing the square gives
\begin{align*}
2mE-2m\kappa u-\ell^2u^2=\left(2mE+\frac{m^2\kappa^2}{\ell^2}\right)-\ell^2\left(u+\frac{m\kappa}{\ell^2}\right)^2.
\end{align*}
Set
\begin{align*}
A=\sqrt{2mE+\frac{m^2\kappa^2}{\ell^2}},\qquad y=\frac{\ell}{A}\left(u+\frac{m\kappa}{\ell^2}\right).
\end{align*}
Then $\ell\,du=A\,dy$, and the radicand is $A^2(1-y^2)$, so
\begin{align*}
\Phi(E,\ell)=\int_{y_0}^{1}\frac{dy}{\sqrt{1-y^2}}.
\end{align*}
Here the upper endpoint is $1$ because the radicand vanishes at $u=u_{\max}$, while the lower endpoint is
\begin{align*}
y_0=\frac{m\kappa}{\ell A}=\frac{m\kappa}{\sqrt{2mE\ell^2+m^2\kappa^2}}.
\end{align*}
Therefore
\begin{align*}
\Phi(E,\ell)=\frac{\pi}{2}-\arcsin(y_0).
\end{align*}
Since $\Theta=\pi-2\Phi$, we get
\begin{align*}
\frac{\Theta}{2}=\arcsin(y_0).
\end{align*}
Thus
\begin{align*}
\tan\frac{\Theta}{2}=\frac{y_0}{\sqrt{1-y_0^2}}=\frac{m\kappa}{\sqrt{2mE\ell^2}}.
\end{align*}
Substituting $E=m v_\infty^2/2$ and $\ell=m v_\infty b$ gives
\begin{align*}
\tan\frac{\Theta}{2}=\frac{\kappa}{m v_\infty^2 b}.
\end{align*}
Consequently $\Theta\to\pi$ as $b\to0$ and $\Theta\to0$ as $b\to\infty$: small impact parameters produce large deflection, while large impact parameters produce nearly straight motion. The same dependence can also be read from the hyperbolic orbit and its asymptotes.
[/example]
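The closed form in the example can be checked against a direct quadrature of the half-angle integral in the variable $u=1/r$. The sketch below is one way to do this, not a definitive implementation: it removes the inverse-square-root endpoint singularity with the substitution $u=u_{\max}(1-s^2)$ and then applies the midpoint rule. The function names, the grid size `n`, and the sample values of $m$, $\kappa$, $v_\infty$, $b$ are assumptions made for the illustration.

```python
import math


def half_angle(m, kappa, energy, ell, n=4000):
    """Midpoint-rule estimate of Phi = int_0^{u_max} ell du / sqrt(R(u)),

    where R(u) = 2 m E - 2 m kappa u - ell^2 u^2. The substitution
    u = u_max (1 - s^2), 0 < s < 1, makes the integrand regular at the
    turning point s = 0.
    """
    # Positive root of R(u) = 0, i.e. u_max = 1 / r_min.
    u_max = (-m * kappa + math.sqrt(m**2 * kappa**2 + 2.0 * m * energy * ell**2)) / ell**2
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        u = u_max * (1.0 - s * s)
        radicand = 2.0 * m * energy - 2.0 * m * kappa * u - ell**2 * u**2
        # |du| = 2 u_max s ds
        total += ell * 2.0 * u_max * s / math.sqrt(radicand)
    return total * h


def rutherford_check(b, m=1.0, kappa=1.0, v_inf=1.0):
    """Return (numerical tan(Theta/2), closed form kappa / (m v_inf^2 b))."""
    energy = 0.5 * m * v_inf**2
    ell = m * v_inf * b
    phi = half_angle(m, kappa, energy, ell)
    theta = math.pi - 2.0 * phi
    return math.tan(0.5 * theta), kappa / (m * v_inf**2 * b)


for b in (0.5, 1.0, 2.0):
    numeric, closed = rutherford_check(b)
    print(b, numeric, closed)
```

The two printed columns should agree to several digits for each impact parameter; the regularising substitution is a convenience of this sketch and plays no role in the analytic derivation.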
The Coulomb example is a model calculation because the explicit dependence of $\Theta$ on $b$ links a measurable angular distribution to a potential law. It also shows that scattering data are naturally parametrised by asymptotic invariants rather than by initial position and velocity at a finite time.
## Adiabatic Invariance Of Action Variables
Exact conservation laws come from symmetries, but many physical systems change slowly in time and have no exact time-translation symmetry. The question is whether the action variables from integrable mechanics remain useful when the Hamiltonian depends on a slowly varying parameter.
[definition: Slowly Varying One-Degree Hamiltonian]
Let $M\subset\mathbb R^2$ be a two-dimensional phase space with canonical coordinates $(q,p)$. Let $0<\varepsilon\ll1$, let $\lambda:\mathbb R\to\Lambda$ be smooth for an open parameter interval $\Lambda\subset\mathbb R$, and let $H:M\times\Lambda\to\mathbb R$ be smooth. The slowly varying Hamiltonian is the map
\begin{align*}
H_\varepsilon:M\times\mathbb R&\to\mathbb R, & H_\varepsilon(q,p,t)&=H(q,p,\lambda(\varepsilon t)).
\end{align*}
[/definition]
The small parameter separates the rapid orbital motion from the slow drift of the external parameter. To state what is nearly preserved, we freeze the parameter and assign a phase-space area to each closed orbit. That area is the action variable, and it is the quantity that survives the slow deformation.
Freezing the parameter replaces the non-autonomous problem by an ordinary one-degree Hamiltonian system, so each regular bounded level set is a closed curve in the $(q,p)$-plane. The relevant invariant cannot be the frozen energy alone, because that energy changes when the parameter changes; instead we need a quantity attached to the whole closed curve. The canonical line integral $\oint p\,dq$ records the signed symplectic area enclosed by that curve, and normalising by $2\pi$ gives the action coordinate used in the adiabatic statement.
[definition: One-Degree Action Variable]
Let $\mathcal A$ be the set of pairs $(E,\lambda)\in\mathbb R\times\Lambda$ for which the frozen level set $\{(q,p)\in M: H(q,p,\lambda)=E\}$ contains a regular closed energy curve $\Gamma_{E,\lambda}$. The action variable is the map $I:\mathcal A\to\mathbb R$ defined by
\begin{align*}
I(E,\lambda)=\frac{1}{2\pi}\oint_{\Gamma_{E,\lambda}} p\,dq.
\end{align*}
[/definition]
The action measures the symplectic area enclosed by the periodic orbit, divided by $2\pi$. In a time-independent integrable system it is exactly conserved. The adiabatic theorem says that the same variable remains nearly conserved when the parameter drifts slowly and the orbit stays away from separatrices.
[quotetheorem:6858]
[citeproof:6858]
This result explains why slowly changing systems often keep an almost fixed phase-space area even while their energy changes. Each hypothesis protects one part of the averaging argument: smoothness gives action-angle coordinates varying regularly with the parameter, compactness keeps the estimates uniform, and staying away from separatrices prevents the period from becoming unbounded. If the frequency approaches zero, as in an oscillator with $\omega(\varepsilon t)\to0$, the fast angle is no longer fast and the averaging denominator loses control. If the orbit drifts toward a separatrix, as in a pendulum with slowly varying length or drive, crossing between oscillation and rotation can produce an order-one jump in the action. If the parameter dependence is not smooth, even a small kink in the Hamiltonian can create impulses in the transformed equations that are not captured by the averaged system. Thus adiabatic invariance is a regular-region statement, not a promise that slow variation preserves every qualitative type of motion.
[example: Slowly Varying Harmonic Oscillator]
Let
\begin{align*}H_\varepsilon(q,p,t)=\frac{p^2}{2m}+\frac{m\omega(\varepsilon t)^2q^2}{2},\end{align*}
where $\omega(s)>0$ is smooth and bounded below by a positive constant. Freeze the slow time $s=\varepsilon t$ and write $\omega=\omega(s)$. On the frozen energy level
\begin{align*}E=\frac{p^2}{2m}+\frac{m\omega^2q^2}{2},\end{align*}
the turning points are $q=\pm\sqrt{2E/(m\omega^2)}$, where $p=0$, and the extreme momenta are $p=\pm\sqrt{2mE}$, attained at $q=0$. A convenient parametrisation of the closed ellipse is
\begin{align*}q(\phi)=\sqrt{\frac{2E}{m\omega^2}}\sin\phi,\qquad p(\phi)=\sqrt{2mE}\cos\phi,\qquad 0\le\phi\le2\pi.\end{align*}
Then
\begin{align*}dq=\sqrt{\frac{2E}{m\omega^2}}\cos\phi\,d\phi.\end{align*}
Therefore the action of the frozen orbit is
\begin{align*}I=\frac{1}{2\pi}\oint p\,dq=\frac{1}{2\pi}\int_0^{2\pi}\sqrt{2mE}\cos\phi\sqrt{\frac{2E}{m\omega^2}}\cos\phi\,d\phi.\end{align*}
Multiplying the constants gives
\begin{align*}\sqrt{2mE}\sqrt{\frac{2E}{m\omega^2}}=\frac{2E}{\omega},\end{align*}
so
\begin{align*}I=\frac{1}{2\pi}\frac{2E}{\omega}\int_0^{2\pi}\cos^2\phi\,d\phi.\end{align*}
Since $\int_0^{2\pi}\cos^2\phi\,d\phi=\pi$, this becomes
\begin{align*}I=\frac{E}{\omega}.\end{align*}
For the slowly varying oscillator, the frozen action along the actual motion is therefore
\begin{align*}I(t)=\frac{E(t)}{\omega(\varepsilon t)},\qquad E(t)=\frac{p(t)^2}{2m}+\frac{m\omega(\varepsilon t)^2q(t)^2}{2}.\end{align*}
By *[Adiabatic Invariance of the Action for One-Degree-of-Freedom Hamiltonian Systems](/theorems/6858)*,
\begin{align*}\frac{E(t)}{\omega(\varepsilon t)}=\frac{E(0)}{\omega(0)}+O(\varepsilon)\end{align*}
for times $0\le t\le C/\varepsilon$, as long as the orbit remains in a compact family of frozen periodic orbits. Equivalently,
\begin{align*}E(t)=\omega(\varepsilon t)\frac{E(0)}{\omega(0)}+O(\varepsilon)\end{align*}
when $\omega$ stays bounded on the relevant interval. Thus a slow increase in frequency raises the energy in the same proportion as $\omega$, while the phase-space area $2\pi I$ enclosed by the frozen ellipse remains fixed up to order $\varepsilon$.
[/example]
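The contrast between energy and action can be seen in a short simulation. The following sketch integrates the slowly varying oscillator with a kick-drift-kick (velocity Verlet) scheme and compares $E/\omega$ at the end of the ramp with its initial value. The linear profile $\omega(s)=1+s/2$ on $0\le s\le1$, the values $\varepsilon=0.01$ and $dt=0.01$, and the initial data are hypothetical choices made for this illustration.

```python
def omega(s):
    """Hypothetical slow frequency profile: linear ramp from 1 to 1.5 on 0 <= s <= 1."""
    return 1.0 + 0.5 * s


def frozen_energy(q, p, w, m=1.0):
    """Frozen energy p^2 / (2m) + m w^2 q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * m * w * w * q * q


def simulate(eps=0.01, dt=0.01, m=1.0, q=1.0, p=0.0):
    """Integrate up to slow time s = eps * t = 1.

    Returns (E_final / E_initial, I_final / I_initial) with I = E / omega.
    """
    steps = int(round(1.0 / (eps * dt)))
    t = 0.0
    e0 = frozen_energy(q, p, omega(0.0), m)
    i0 = e0 / omega(0.0)
    for _ in range(steps):
        p -= 0.5 * dt * m * omega(eps * t) ** 2 * q  # half kick at time t
        q += dt * p / m                              # full drift
        t += dt
        p -= 0.5 * dt * m * omega(eps * t) ** 2 * q  # half kick at time t + dt
    w1 = omega(eps * t)
    e1 = frozen_energy(q, p, w1, m)
    return e1 / e0, (e1 / w1) / i0


energy_ratio, action_ratio = simulate()
print(energy_ratio, action_ratio)
```

Under these assumptions the energy ratio should come out close to $\omega(1)/\omega(0)=1.5$, while the action ratio should stay close to $1$, with a deviation that shrinks as $\varepsilon$ is reduced.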
The oscillator example is the canonical picture: energy is not protected, but action is. This distinction is central in celestial mechanics, plasma physics, and semiclassical quantisation.
[remark: Limits Of Adiabatic Reasoning]
Adiabatic invariance requires a persistent family of periodic orbits and a controlled separation between fast and slow time scales. Near equilibria, separatrices, or resonant multi-degree systems, the averaging argument may lose uniform estimates. These failures are not technical details; they mark the transition from regular perturbation theory to resonance and chaotic dynamics.
[/remark]
The final section begins from that transition. Once there is more than one angle, averaging no longer removes every oscillatory term, because integer combinations of frequencies may vanish.
## Near-Integrable Hamiltonians And Resonances
Action-angle variables make integrable Hamiltonian systems transparent: actions are constant and angles rotate linearly. Perturbation theory asks which parts of that picture survive under a small coupling or weak external force.
[definition: Near-Integrable Hamiltonian]
Let $D\subset\mathbb R^n$ be open and let $\mathbb T^n$ have angle coordinates $\theta$. A Hamiltonian $H:D\times\mathbb T^n\to\mathbb R$ is near-integrable if there exist smooth maps $H_0:D\to\mathbb R$ and $H_1:D\times\mathbb T^n\to\mathbb R$ such that
\begin{align*}
H(I,\theta)=H_0(I)+\varepsilon H_1(I,\theta),
\end{align*}
where $0<|\varepsilon|\ll1$ and $H_1$ is $2\pi$-periodic in each angle.
[/definition]
The unperturbed equations are $\dot I=0$ and $\dot\theta=\omega(I)$, where $\omega(I)=\nabla H_0(I)$. A perturbation contributes Fourier modes in the angles, and each mode rotates with frequency $k\cdot\omega(I)$. The modes with nonzero rotation tend to average, but modes with zero rotation can accumulate and force a separate analysis. This obstruction motivates the definition of resonance.
[definition: Resonance]
For an integrable Hamiltonian $H_0(I)$ with frequency map $\omega(I)=\nabla H_0(I)$, an action $I_0$ is resonant if there exists $k\in\mathbb Z^n\setminus\{0\}$ such that
\begin{align*}
k\cdot\omega(I_0)=0.
\end{align*}
[/definition]
A resonance means that the phase $k\cdot\theta$ stops rotating in the unperturbed dynamics. Fourier modes of the perturbation with this wave vector can accumulate instead of averaging away. The normal-form question is therefore which angle-dependent terms can be removed by a canonical change of variables and which terms must remain.
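For a concrete frequency vector, the low-order resonance relations can be listed by brute force. The sketch below enumerates nonzero integer vectors $k$ with $|k_1|+\dots+|k_n|$ at most a chosen cutoff and keeps those with $|k\cdot\omega|$ below a tolerance. The function name `resonant_vectors`, the cutoff, and the floating-point tolerance (a stand-in for the exact condition $k\cdot\omega=0$) are assumptions of the illustration.

```python
import itertools
import math


def resonant_vectors(omega, max_norm, tol=1e-9):
    """Nonzero integer vectors k with |k|_1 <= max_norm and |k . omega| <= tol."""
    n = len(omega)
    hits = []
    for k in itertools.product(range(-max_norm, max_norm + 1), repeat=n):
        if not any(k):
            continue  # skip k = 0, which is excluded in the definition
        if sum(abs(c) for c in k) > max_norm:
            continue
        if abs(sum(c * w for c, w in zip(k, omega))) <= tol:
            hits.append(k)
    return hits


# Equal frequencies: k = (1, -1) and its negative are resonant.
print(resonant_vectors((1.0, 1.0), 2))
# Frequency ratio sqrt(2): no resonance is found up to the cutoff.
print(resonant_vectors((1.0, math.sqrt(2.0)), 6))
```

An empty list only says that no resonance was detected up to the chosen order and tolerance; it does not certify that the frequency vector is nonresonant.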
[quotetheorem:6859]
[citeproof:6859]
The theorem is the first perturbative glimpse of KAM theory and resonance normal forms, and each hypothesis does work. The compactness and lower-bound hypotheses prevent small divisors from making the generating function large; without them, the formal division by $k\cdot\omega(I)$ may destroy the near-identity character of the transformation. Smoothness is needed because the canonical change of variables is built from derivatives of the generating function; with insufficient regularity, the Fourier coefficients or the transformed Hamiltonian may not have the differentiability required to close the first-order calculation. The finite Fourier truncation is also part of the statement rather than a convenience: controlling finitely many denominators is different from controlling infinitely many possible small divisors, whose accumulation is the source of the harder KAM estimates.
At an exact resonance, where $k\cdot\omega(I)=0$, the corresponding Fourier coefficient is not a removable oscillation but part of the effective slow dynamics. The statement is only a first-order normal-form result on a chosen finite set of modes: it does not prove persistence of invariant tori, long-time stability of actions, convergence of the perturbation series, or control of infinitely many small divisors. Those conclusions require stronger hypotheses and different arguments, such as the nondegeneracy and Diophantine conditions used in KAM theory. Thus nonresonant oscillations average out only in regions where the relevant frequency denominators are uniformly controlled, while resonant ones create slower effective dynamics.
[example: Resonance In Two Coupled Oscillators]
Consider two harmonic oscillators in action-angle variables with unperturbed Hamiltonian
\begin{align*}H_0(I_1,I_2)=\omega_1 I_1+\omega_2 I_2.\end{align*}
The unperturbed frequency vector is
\begin{align*}\omega(I)=\nabla H_0(I_1,I_2)=(\omega_1,\omega_2),\end{align*}
so the unperturbed angles satisfy
\begin{align*}\theta_1(t)=\theta_1(0)+\omega_1t,\qquad \theta_2(t)=\theta_2(0)+\omega_2t.\end{align*}
For the coupling
\begin{align*}\varepsilon H_1(I,\theta)=\varepsilon a(I)\cos(\theta_1-\theta_2),\end{align*}
the relevant Fourier wave vector is $k=(1,-1)$, because $\theta_1-\theta_2=k\cdot\theta$.
If $\omega_1\ne\omega_2$, then
\begin{align*}k\cdot\omega=(1,-1)\cdot(\omega_1,\omega_2)=\omega_1-\omega_2\ne0.\end{align*}
Along the unperturbed motion,
\begin{align*}\theta_1(t)-\theta_2(t)=\theta_1(0)-\theta_2(0)+(\omega_1-\omega_2)t.\end{align*}
Averaging over one period $T=2\pi/|\omega_1-\omega_2|$ of this phase gives
\begin{align*}\frac{1}{T}\int_{[0,T]}\cos(\theta_1(0)-\theta_2(0)+(\omega_1-\omega_2)t)\,dt=0,\end{align*}
because the substitution $\phi=\theta_1(0)-\theta_2(0)+(\omega_1-\omega_2)t$ turns the integral into the average of $\cos\phi$ over one full period. Thus this first-order coupling is a nonresonant oscillatory mode and is removed by the first-order averaging mechanism described in *[First-Order Averaging Lemma for Nonresonant Fourier Modes](/theorems/6859)*.
If $\omega_1=\omega_2$, then
\begin{align*}k\cdot\omega=(1,-1)\cdot(\omega_1,\omega_1)=\omega_1-\omega_1=0.\end{align*}
The phase $\theta_1-\theta_2$ is then stationary for the unperturbed flow, so the term $\varepsilon a(I)\cos(\theta_1-\theta_2)$ does not average to zero. Hamilton's equations for the actions give
\begin{align*}\dot I_1=-\frac{\partial H}{\partial\theta_1}=\varepsilon a(I)\sin(\theta_1-\theta_2),\end{align*}
and
\begin{align*}\dot I_2=-\frac{\partial H}{\partial\theta_2}=-\varepsilon a(I)\sin(\theta_1-\theta_2).\end{align*}
Therefore
\begin{align*}\frac{d}{dt}(I_1+I_2)=\dot I_1+\dot I_2=0,\end{align*}
while $I_1$ and $I_2$ can change with opposite signs when $a(I)\sin(\theta_1-\theta_2)\ne0$. The resonant coupling therefore preserves the total action but transfers action between the two oscillators, which is the local mechanism behind beats and resonant energy transfer.
[/example]
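The difference between the two cases can be made quantitative under one simplifying assumption: take the coupling amplitude to be a constant $a$, so that the angle equations reduce to $\dot\theta_j=\omega_j$ and the phase difference is exactly $\theta_1(0)-\theta_2(0)+(\omega_1-\omega_2)t$. The sketch below integrates the action equations of the example with the midpoint rule. The function name and the values of $\varepsilon$, $a$, the initial phase, and the time window are hypothetical.

```python
import math


def action_transfer(w1, w2, eps=0.01, a=1.0, phase0=math.pi / 2, t_end=200.0, dt=0.01):
    """Integrate dI1/dt = eps * a * sin(theta1 - theta2) and dI2/dt = -dI1/dt.

    Assumes a constant amplitude a, so theta1 - theta2 = phase0 + (w1 - w2) t.
    Returns (I1(t_end) - I1(0), I2(t_end) - I2(0)).
    """
    steps = int(round(t_end / dt))
    d1 = 0.0
    d2 = 0.0
    for i in range(steps):
        t_mid = (i + 0.5) * dt
        rate = eps * a * math.sin(phase0 + (w1 - w2) * t_mid)
        d1 += rate * dt
        d2 -= rate * dt
    return d1, d2


resonant = action_transfer(1.0, 1.0)      # omega_1 = omega_2
nonresonant = action_transfer(1.0, 1.5)   # omega_1 != omega_2
print(resonant, nonresonant)
```

In the resonant run the change in $I_1$ grows linearly, reaching $\varepsilon a\sin(\theta_1(0)-\theta_2(0))\,t$, whereas in the nonresonant run it stays bounded by $2\varepsilon a/|\omega_1-\omega_2|$; in both runs the two changes cancel, reflecting conservation of $I_1+I_2$.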
Perturbation theory also changes the qualitative question. Instead of asking for explicit solutions, we ask whether typical orbits return near their starting points, whether invariant tori persist, and where resonant regions dominate the dynamics.
[quotetheorem:3425]
The result is measure-theoretic: a measure-preserving transformation on a finite-[measure space](/page/Measure%20Space) cannot move a positive-measure set into infinitely many disjoint copies of itself. Both main hypotheses are essential. If the measure is infinite, translation on the real line preserves [Lebesgue measure](/page/Lebesgue%20Measure) but points do not return to a bounded interval. If measure preservation fails, a set may be compressed or dissipated away from itself, so no recurrence conclusion follows.
[remark: Recurrence Versus Stability]
Poincaré recurrence does not say that a trajectory is periodic, nor does it give a useful return time bound. It is compatible with complicated dynamics, sensitive dependence, and very long recurrence times. Its role here is to show that finite-measure Hamiltonian systems have a global measure-preserving constraint even when explicit integration fails.
[/remark]
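A toy model makes the recurrence statement concrete. Rotation of the circle $\mathbb R/\mathbb Z$ by a fixed angle preserves Lebesgue measure on a space of total measure one, so recurrence applies. The sketch below searches for the first return of the point $0$ to the interval $[0,\text{width})$. The rotation number (the golden-ratio conjugate), the interval width, and the function name are assumptions chosen for the illustration; the rotation is a stand-in for a Hamiltonian flow restricted to an invariant circle, not an example taken from the notes.

```python
import math


def first_return_time(alpha, width, max_steps=10**6):
    """First n >= 1 with (n * alpha mod 1) in [0, width), starting from x = 0.

    Returns None if no return is found within max_steps iterations.
    """
    x = 0.0
    for n in range(1, max_steps + 1):
        x = (x + alpha) % 1.0
        if x < width:
            return n
    return None


alpha = (math.sqrt(5.0) - 1.0) / 2.0  # hypothetical irrational rotation number
n_return = first_return_time(alpha, 0.01)
print(n_return)
```

The orbit is not periodic, since the rotation number is irrational, yet it does come back to the small interval. Shrinking `width` lengthens the return time, in line with the remark that recurrence comes without a useful time bound.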
The chapter ends by tying together three levels of approximation. Scattering extracts asymptotic invariants from unbound motion, adiabatic theory protects actions under slow change, and perturbation theory separates averaged behaviour from resonant behaviour. These are the entry points from classical mechanics into geometric scattering, KAM theory, and modern Hamiltonian dynamics.
## Beyond Classical Mechanics: Connections And Next Directions
Classical mechanics is not an isolated subject in these notes. It is a meeting point for differential geometry, variational analysis, Lie theory, symplectic geometry, and dynamical systems. The Lagrangian chapters use tangent bundles, constrained submanifolds, and first-variation arguments; the Hamiltonian chapters recast the same dynamics on cotangent bundles with canonical symplectic forms, Poisson brackets, and Hamiltonian vector fields. A natural next Androma path is therefore to study smooth manifolds, differential forms, vector fields, and symplectic manifolds as independent geometric objects.
The symmetry chapters point in a second direction. Noether's theorem, momentum maps, coadjoint orbits, and reduction show how Lie groups turn conservation laws into geometry. These ideas lead naturally to Lie algebras, group actions, representation theory, and geometric invariant theory. They also explain why the same mathematical structures appear in rigid body motion, fluid models, gauge theories, and geometric quantisation.
The stability, integrability, and perturbation chapters connect classical mechanics to analysis and long-time dynamics. Linearization, normal forms, action-angle variables, resonances, and recurrence are the starting vocabulary for qualitative ordinary differential equations, ergodic theory, KAM theory, and Hamiltonian chaos. The important lesson is that explicit solutions are exceptional: much of modern mechanics studies invariant structures, approximate invariants, and the obstructions that appear near resonances or separatrices.
For mathematical physics, the next subjects are quantum mechanics, field theory, and statistical mechanics. Hamilton-Jacobi theory foreshadows semiclassical phase functions; Poisson brackets foreshadow commutators; action principles and symmetries reappear in classical field theory; recurrence and invariant measure lead toward statistical ensembles. These later theories change the objects of study, but they keep the organising principles developed here: variational structure, symmetry, conservation, phase space, and approximation.
These notes should therefore be read as a bridge. They build the classical core, but their main purpose is to make the geometric and analytic language reusable in the broader Androma landscape of geometry, analysis, dynamical systems, and mathematical physics.
## References
- [The Euler-Lagrange Equation](/theorems/3504) for the variational equation behind the opening Lagrangian chapters.
- [Fundamental Lemma of Calculus of Variations](/theorems/45) for the analytic step from weak first variation to pointwise equations.
- [Classification of Skew-Symmetric Forms](/theorems/3295) and [Dimension Of Annihilator](/theorems/420) for the linear algebra supporting symplectic complements and Darboux bases.
- [Properties Of The Poisson Bracket](/theorems/1333) and [Commutator Of Hamiltonian Vector Fields](/theorems/1336) for the algebraic structure of Hamiltonian mechanics.
- [Noether's Theorem for First-Order Lagrangians](/theorems/3513) for the bridge from variational symmetries to conserved quantities.
- [Arnold-Liouville Theorem](/theorems/1353), [Jacobi's Integration Theorem](/theorems/3532), and [Poincare Recurrence Theorem](/theorems/3425) for the integrability, Hamilton-Jacobi, and recurrence themes used in the later chapters.
- [Differential Forms I: Exterior Calculus](/page/Differential%20Forms%20I%3A%20Exterior%20Calculus) and [Differential Forms II: Manifolds and Cohomology](/page/Differential%20Forms%20II%3A%20Manifolds%20and%20Cohomology) for the differential-form language behind symplectic geometry.
- [Lie Algebras I: Foundations](/page/Lie%20Algebras%20I%3A%20Foundations) for infinitesimal symmetries and momentum maps.
- [Cambridge II Dynamical Systems](/page/Cambridge%20II%20Dynamical%20Systems) and [Ergodic Theory I: Foundations](/page/Ergodic%20Theory%20I%3A%20Foundations) for the qualitative and measure-theoretic dynamics behind stability, recurrence, and perturbation theory.
Contents
- Introduction
- What Is Classical Mechanics Trying To Describe?
- Why Variational Principles Come First
- From Lagrangian To Hamiltonian Pictures
- Symmetry And Conservation Laws
- Integrability As A Final Organising Goal
- Course Map
- 1. Configuration Spaces and Variational Principles
- Configuration Manifolds and Paths
- Action Functionals and First Variation
- Euler-Lagrange Equations on the Tangent Bundle
- Regular Lagrangians and the Fiber Hessian
- 2. Lagrangian Mechanics and Constraints
- Holonomic Constraints and Constrained Configuration Spaces
- Non-Holonomic Velocity Constraints and the Lagrange-d'Alembert Principle
- Cyclic Coordinates, Routh Reduction, and Effective Potentials
- 3. Legendre Transform and Hamiltonian Formalism
- From Velocities to Momenta
- Canonical Symplectic Geometry and Hamilton's Equations
- Energy and Autonomous Hamiltonian Dynamics
- 4. Symplectic Linear Algebra and Canonical Phase Space
- Symplectic Vector Spaces and Darboux Bases
- The Tautological One-Form and the Canonical Symplectic Form on Cotangent Bundles
- Poisson Brackets, Hamiltonian Flows, and Symplectomorphisms
- 5. Noether Theory and Momentum Maps
- Infinitesimal Symmetries Of Lagrangian Systems
- Noether Theorem For Variational Symmetries
- Hamiltonian Symmetries And Momentum Maps
- Coadjoint Geometry And The Lie--Poisson Bracket
- 6. Symplectic Reduction and Mechanical Systems with Symmetry
- Fixing Momentum and Dividing by Symmetry
- Reduced Hamiltonians and Reconstruction
- Relative Equilibria and Amended Potentials
- 7. Hamilton-Jacobi Theory and Generating Functions
- The Hamilton-Jacobi Equation
- Generating Functions for Canonical Transformations
- Separation of Variables and Constants of Motion
- 8. Stability, Periodic Motion, and Normal Forms
- Stability Near Equilibria
- Symplectic Linearization And Eigenvalue Pairing
- Birkhoff Normal Form Near Elliptic Equilibria
- 9. Integrable Systems and Action-Angle Coordinates
- First Integrals and Liouville Integrability
- Lagrangian Invariant Tori and Quasi-Periodic Flows
- Construction and Interpretation of Action-Angle Variables
- Classical Families of Integrable Systems
- 10. Classical Scattering, Adiabatic Invariants, and Perturbation Ideas
- Central-Force Scattering
- Adiabatic Invariance Of Action Variables
- Near-Integrable Hamiltonians And Resonances
- Beyond Classical Mechanics: Connections And Next Directions
- References
Mathematical Physics I: Classical Mechanics
Created by admin on 6/12/2026 | Last updated on 6/12/2026