The [calculus of variations](/page/Calculus%20of%20Variations) studies the optimization of functionals: quantities that depend on entire functions or paths rather than on finitely many variables. Where ordinary calculus asks "what value of $x$ minimizes $f(x)$?", the calculus of variations asks "what function $y(x)$ minimizes a given functional like $\int_a^b L(x, y, y') \, dx$?" This shift in perspective leads to some of the most powerful and elegant mathematics in analysis, with applications spanning classical mechanics, optics, and differential geometry. This course develops the classical theory systematically, building from concrete variational problems to the abstract structures that unify and explain them.
The course begins by introducing functionals and the variational problem, then derives the Euler–Lagrange equation, the central [necessary condition for an extremum](/theorems/3520) of a functional. Subsequent chapters refine this picture: we handle boundary conditions and constraints, develop tests for distinguishing minima, maxima, and saddle points (via Legendre's condition and the second variation), and analyze the role of conjugate points in determining sufficiency. Along the way, we encounter the Weierstrass excess function, which is used to test for *strong* extrema: curves that remain optimal against competitors close in value, even when the derivatives of those competitors are not close. Together, these chapters develop the classical sufficiency theory for extremal problems.
The course culminates in perspectives that connect the calculus of variations to deeper structures in mathematics and physics. Noether's theorem reveals that every continuous symmetry of a variational problem yields a conservation law, unifying mechanics and geometry. The Hamiltonian formulation recasts the theory in terms of phase space rather than configuration space, and Hamilton–Jacobi theory provides a powerful method for solving variational problems and understanding the structure of solutions globally. The final chapter shows how Hamilton–Jacobi analysis completes the sufficiency story, tying together the classical necessary conditions of the middle chapters with a comprehensive existence and optimality framework.
# Introduction
What kind of mathematical object has a derivative when its input is an entire curve rather than a point in Euclidean space? This course studies that question in the classical setting: smooth curves, integral functionals, and the differential equations forced by extremality. The central theme is that a problem about minimising a number attached to a path often becomes a boundary value problem for an ordinary differential equation, and sometimes a first-order partial differential equation gives a global certificate of minimality.
The course is classical in two senses. First, the main unknowns are sufficiently smooth curves, so variations may be computed by differentiating under the integral sign and integrating by parts. Second, the theory is organised around the Euler--Lagrange equation, the second variation, Jacobi fields, conjugate points, Noether's theorem, and the Hamilton--Jacobi equation. Later courses in the series replace parts of this smooth picture by weak compactness, lower semicontinuity, relaxation, and Gamma-convergence.
## What the Course Is About
Which features of a minimisation problem survive when the variable is a function? In finite dimensions, a differentiable function $F: U \subseteq \mathbb R^n \to \mathbb R$ has a local extremum at an interior point $x_0$ only if $\nabla F(x_0)=0$. The calculus of variations asks for the analogue when the variable is a curve $y: [a,b] \to \mathbb R^n$ and the quantity to be extremised is usually an integral depending on $y$ and its derivative.
[definition: Integral Functional]
Let $a<b$, let $U\subseteq \mathbb R^n$ be open, and let $L:[a,b]\times U\times \mathbb R^n\to \mathbb R$ be a smooth function. An integral functional associated to $L$ is a map
\begin{align*}
J[y] = \int_a^b L(x,y(x),y'(x))\,dx
\end{align*}
defined on a specified class of curves $y\in C^1([a,b];U)$ satisfying prescribed admissibility conditions.
[/definition]
The function $L$ is called the Lagrangian. The variable $x$ often represents time, arc length, or a spatial parameter; the curve $y$ is the unknown; and $y'$ records the velocity or slope. The admissibility conditions are not decoration: fixed endpoints, free endpoints, constraints, and transversality conditions change the differential equations and boundary conditions that arise.
[example: Length Of A Plane Curve]
For curves $y\in C^1([a,b];\mathbb R)$ with fixed endpoints, the Euclidean length of the graph $x\mapsto (x,y(x))$ is
\begin{align*}
J[y]=\int_a^b \sqrt{1+(y'(x))^2}\,dx.
\end{align*}
Here the Lagrangian is $L(x,y,p)=\sqrt{1+p^2}$, so its derivatives in the $y$ and $p$ variables are
\begin{align*}
\partial_y L(x,y,p)=0
\end{align*}
and
\begin{align*}
\partial_p L(x,y,p)=\frac{p}{\sqrt{1+p^2}}.
\end{align*}
Thus the Euler--Lagrange equation for a smooth stationary curve becomes
\begin{align*}
\frac{d}{dx}\left(\frac{y'(x)}{\sqrt{1+(y'(x))^2}}\right)=0.
\end{align*}
This says that the function
\begin{align*}
\frac{y'(x)}{\sqrt{1+(y'(x))^2}}
\end{align*}
is constant on $[a,b]$. If $f(p)=p/\sqrt{1+p^2}$, then
\begin{align*}
f'(p)=\frac{\sqrt{1+p^2}-p\cdot p(1+p^2)^{-1/2}}{1+p^2}
\end{align*}
and the numerator equals
\begin{align*}
\sqrt{1+p^2}-\frac{p^2}{\sqrt{1+p^2}}=\frac{1+p^2-p^2}{\sqrt{1+p^2}}=\frac{1}{\sqrt{1+p^2}}.
\end{align*}
Therefore
\begin{align*}
f'(p)=\frac{1}{(1+p^2)^{3/2}}>0.
\end{align*}
So $f$ is strictly increasing, and $f(y'(x))$ being constant forces $y'(x)$ itself to be constant. Hence $y(x)=mx+c$ for constants $m,c$, and the extremals for the length functional are straight line segments.
[/example]
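The monotonicity step in the example above can be sanity-checked numerically. The following is a minimal sketch, assuming nothing beyond the formulas $f(p)=p/\sqrt{1+p^2}$ and $f'(p)=(1+p^2)^{-3/2}$ from the example; the function names, sample slopes, and finite-difference step are illustrative choices, not part of the course text.

```python
import math

def f(p):
    """Momentum of the length Lagrangian: f(p) = p / sqrt(1 + p^2)."""
    return p / math.sqrt(1.0 + p * p)

def f_prime_formula(p):
    """Closed form derived in the example: 1 / (1 + p^2)^(3/2)."""
    return (1.0 + p * p) ** -1.5

def f_prime_numeric(p, step=1e-5):
    """Central finite-difference approximation of f'(p)."""
    return (f(p + step) - f(p - step)) / (2.0 * step)

# Illustrative sample slopes, listed in increasing order.
slopes = [-3.0, -1.0, 0.0, 0.5, 2.0]

# Largest disagreement between the closed form and the finite difference.
max_gap = max(abs(f_prime_formula(p) - f_prime_numeric(p)) for p in slopes)

# f should take strictly increasing values on increasing slopes.
values = [f(p) for p in slopes]
is_increasing = all(u < v for u, v in zip(values, values[1:]))

print(max_gap, is_increasing)
```

A check of this kind confirms the algebra at sample points only; the strict monotonicity of $f$ on all of $\mathbb R$ still rests on the sign of $f'$ computed in the example.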
This first example explains the basic shape of the subject: a geometric or physical quantity is encoded by $L$, a class of admissible curves is fixed, and stationarity is translated into equations. The next issue is that a stationary curve need not be a minimiser, so the course distinguishes necessary conditions from sufficient conditions throughout.
## From Variations To Differential Equations
How does a small perturbation of a curve produce an equation? If $y$ is admissible and $h$ is an allowed variation direction, the perturbed curves $y+\varepsilon h$ generate a one-variable function $\varepsilon\mapsto J[y+\varepsilon h]$. The derivative at $\varepsilon=0$ is the first variation, and an interior extremum must make this derivative vanish for every allowed $h$.
[definition: First Variation]
Let $\mathcal A\subset C^1([a,b];\mathbb R^n)$ be an admissible class of curves, and let $J:\mathcal A\to\mathbb R$ be a functional. If $y\in\mathcal A$ and $h$ is an admissible variation direction at $y$, the first variation of $J$ at $y$ in the direction $h$ is
\begin{align*}
\delta J[y;h] = \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} J[y+\varepsilon h],
\end{align*}
whenever this derivative exists.
[/definition]
This definition is the infinite-dimensional analogue of a directional derivative. In the basic fixed-endpoint problem, $h(a)=h(b)=0$, so [integration by parts](/theorems/210) transfers derivatives from $h$ onto the coefficients involving $L$. The vanishing of $\delta J[y;h]$ for all such $h$ then forces the Euler--Lagrange equation.
[quotetheorem:6986]
[citeproof:6986]
The theorem gives only stationarity, not minimality. The hypotheses about admissible directions matter because a perturbation must preserve the endpoint and constraint conditions; otherwise the one-variable test is being applied along curves outside the problem. Much of the first half of the course develops the consequences of this stationarity condition, while the second half asks when stationarity plus additional hypotheses proves that a curve is actually minimising. The later example of a stationary curve with negative second variation shows concretely why the first variation cannot settle minimality by itself.
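The first variation defined above is a one-variable derivative, so it can be approximated by a difference quotient in $\varepsilon$. The sketch below does this for the graph-length functional on $[0,1]$; the interval, the variation $h(x)=\sin(\pi x)$, the comparison curve $y(x)=x^2$, and the quadrature resolution are all illustrative assumptions rather than choices made in the course.

```python
import math

def length(slope, n=2000):
    """Midpoint-rule approximation of J[y] = int_0^1 sqrt(1 + y'(x)^2) dx."""
    dx = 1.0 / n
    return sum(math.sqrt(1.0 + slope((i + 0.5) * dx) ** 2) for i in range(n)) * dx

def first_variation(slope, h_slope, eps=1e-4):
    """Central difference in eps of J[y + eps*h]; only the slopes y' and h' are needed."""
    plus = length(lambda x: slope(x) + eps * h_slope(x))
    minus = length(lambda x: slope(x) - eps * h_slope(x))
    return (plus - minus) / (2.0 * eps)

# h(x) = sin(pi x) vanishes at both endpoints, so it is a fixed-endpoint variation.
h_slope = lambda x: math.pi * math.cos(math.pi * x)

line_slope = lambda x: 1.0          # y(x) = x, a straight segment
parabola_slope = lambda x: 2.0 * x  # y(x) = x^2, same endpoints as the segment

dJ_line = first_variation(line_slope, h_slope)
dJ_parabola = first_variation(parabola_slope, h_slope)

print(dJ_line, dJ_parabola)
```

Under these assumptions the difference quotient is numerically zero along the straight segment and clearly nonzero along the parabola, which is the numerical shadow of stationarity. Testing a single direction $h$ can rule out stationarity but cannot establish it.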
[example: Brachistochrone Setup]
Choose coordinates with $x$ horizontal and $y$ measuring downward vertical distance from the release point, so the bead starts from rest at $y=0$ and admissible curves satisfy $y(x)>0$ for $x>0$. Conservation of mechanical energy gives the speed at depth $y$ as $v=\sqrt{2gy}$, and the length element along the graph is $\sqrt{1+(y'(x))^2}\,dx$, so the travel-time functional is
\begin{align*}
T[y]=\int_0^b \frac{\sqrt{1+(y'(x))^2}}{\sqrt{2g\,y(x)}}\,dx.
\end{align*}
Thus
\begin{align*}
L(y,p)=\frac{\sqrt{1+p^2}}{\sqrt{2gy}}.
\end{align*}
Since $L$ has no explicit $x$-dependence, the Euler--Lagrange equation implies that $L-p\partial_pL$ is constant along any smooth extremal. Indeed,
\begin{align*}
\frac{d}{dx}\bigl(L-y'\partial_pL\bigr)=\partial_yL\,y'+\partial_pL\,y''-y''\partial_pL-y'\frac{d}{dx}\partial_pL.
\end{align*}
The middle terms cancel, and the Euler--Lagrange equation gives $\frac{d}{dx}\partial_pL=\partial_yL$, so
\begin{align*}
\frac{d}{dx}\bigl(L-y'\partial_pL\bigr)=\partial_yL\,y'-y'\partial_yL=0.
\end{align*}
For the brachistochrone Lagrangian,
\begin{align*}
\partial_pL(y,p)=\frac{p}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Therefore
\begin{align*}
L-p\partial_pL=\frac{\sqrt{1+p^2}}{\sqrt{2gy}}-\frac{p^2}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Putting the terms over the common denominator $\sqrt{2gy}\sqrt{1+p^2}$ gives
\begin{align*}
L-p\partial_pL=\frac{1+p^2-p^2}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Hence
\begin{align*}
L-p\partial_pL=\frac{1}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Along an extremal this equals a positive constant $C$, so with $p=y'$ we get
\begin{align*}
2g\,y(x)\bigl(1+(y'(x))^2\bigr)=\frac{1}{C^2}.
\end{align*}
Writing $a=1/(2gC^2)>0$, this becomes
\begin{align*}
y(x)\bigl(1+(y'(x))^2\bigr)=a.
\end{align*}
Solving for $y'$ on a descending branch gives
\begin{align*}
y'(x)=\sqrt{\frac{a-y(x)}{y(x)}}.
\end{align*}
Equivalently,
\begin{align*}
\frac{dx}{dy}=\sqrt{\frac{y}{a-y}}.
\end{align*}
Set
\begin{align*}
y=\frac{a}{2}(1-\cos\theta).
\end{align*}
Then
\begin{align*}
a-y=\frac{a}{2}(1+\cos\theta).
\end{align*}
Also
\begin{align*}
\frac{dy}{d\theta}=\frac{a}{2}\sin\theta.
\end{align*}
For $0<\theta<\pi$, where both half-angle functions are positive, the identities $1-\cos\theta=2\sin^2(\theta/2)$ and $1+\cos\theta=2\cos^2(\theta/2)$ give
\begin{align*}
\sqrt{\frac{y}{a-y}}=\tan(\theta/2).
\end{align*}
Thus
\begin{align*}
\frac{dx}{d\theta}=\frac{dx}{dy}\frac{dy}{d\theta}=\tan(\theta/2)\cdot \frac{a}{2}\sin\theta.
\end{align*}
Since $\tan(\theta/2)=\sin\theta/(1+\cos\theta)$, this is
\begin{align*}
\frac{dx}{d\theta}=\frac{a}{2}\frac{\sin^2\theta}{1+\cos\theta}.
\end{align*}
Using $\sin^2\theta=(1-\cos\theta)(1+\cos\theta)$,
\begin{align*}
\frac{dx}{d\theta}=\frac{a}{2}(1-\cos\theta).
\end{align*}
Integrating from the initial point gives
\begin{align*}
x=\frac{a}{2}(\theta-\sin\theta).
\end{align*}
Together with
\begin{align*}
y=\frac{a}{2}(1-\cos\theta),
\end{align*}
this is a cycloid. Thus the Euler--Lagrange equation does not produce a straight chord for the fastest descent problem; it selects cycloidal arcs, with the parameter $a$ and endpoint value of $\theta$ determined by the prescribed final point.
[/example]
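The cycloid obtained above can be checked against the first integral $y(1+(y')^2)=a$ and compared with the straight chord. The sketch below is illustrative only: the values of $g$ and $a$, the endpoint $\theta=\pi$, and the function names are assumptions, and the chord time uses the closed form $\sqrt{2(X^2+Y^2)/(gY)}$, obtained by integrating the travel-time functional along $y=(Y/X)x$ rather than quoted from the course.

```python
import math

g = 9.81   # gravitational acceleration, illustrative value
a = 2.0    # cycloid parameter, illustrative value

def cycloid_point(theta):
    """Point (x, y) with x = a/2 (theta - sin theta), y = a/2 (1 - cos theta)."""
    return 0.5 * a * (theta - math.sin(theta)), 0.5 * a * (1.0 - math.cos(theta))

def cycloid_slope(theta):
    """dy/dx = (dy/dtheta) / (dx/dtheta) = sin(theta) / (1 - cos(theta))."""
    return math.sin(theta) / (1.0 - math.cos(theta))

# The first integral y (1 + y'^2) should equal a at every interior parameter value.
first_integral_error = max(
    abs(cycloid_point(t)[1] * (1.0 + cycloid_slope(t) ** 2) - a)
    for t in (0.3, 1.0, 2.0, 3.0)
)

def cycloid_time(theta_end, n=1000):
    """Midpoint rule in theta for the travel time, the integral of ds / sqrt(2 g y)."""
    d_theta = theta_end / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * d_theta
        dx = 0.5 * a * (1.0 - math.cos(t))  # dx/dtheta
        dy = 0.5 * a * math.sin(t)          # dy/dtheta
        y = 0.5 * a * (1.0 - math.cos(t))   # depth below the release point
        total += math.hypot(dx, dy) / math.sqrt(2.0 * g * y)
    return total * d_theta

def chord_time(x_end, y_end):
    """Travel time on the chord y = (y_end / x_end) x, integrated in closed form."""
    return math.sqrt(2.0 * (x_end ** 2 + y_end ** 2) / (g * y_end))

theta_end = math.pi
x_end, y_end = cycloid_point(theta_end)
t_cycloid = cycloid_time(theta_end)
t_chord = chord_time(x_end, y_end)

print(first_integral_error, t_cycloid, t_chord)
```

For this endpoint the cycloidal arc is faster than the chord. This is consistent with the cycloid being the minimiser, but a single comparison does not prove it.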
The brachistochrone illustrates why the subject is not only a reformulation of shortest-path geometry. The same formal machinery treats physical action, optical travel time, surface area, elastic energy, and constrained optimisation problems, but each model imposes its own admissible class and boundary conditions.
## Necessary Conditions, Sufficient Conditions, And Geometry
When does the differential equation obtained from the first variation identify a minimum rather than just a candidate? In finite-dimensional calculus, a critical point is classified using second derivatives, convexity, or comparison arguments. The variational theory has corresponding tools: the second variation, the Legendre condition, the Jacobi equation, conjugate points, and fields of extremals.
[definition: Stationary Curve]
Let $J$ be a functional on an admissible class $\mathcal A$. A curve $y\in\mathcal A$ is stationary for $J$ if
\begin{align*}
\delta J[y;h]=0
\end{align*}
for every admissible variation direction $h$ at $y$.
[/definition]
A stationary curve is therefore a candidate selected by the first-order theory. To decide whether such a candidate bends the functional upward or downward under perturbation, the next object records the second derivative of $\varepsilon\mapsto J[y+\varepsilon h]$ at $\varepsilon=0$.
[definition: Second Variation]
Let $\mathcal A\subset C^1([a,b];\mathbb R^n)$ be an admissible class of curves, let $J:\mathcal A\to\mathbb R$ be a functional, and let $y\in\mathcal A$ be a curve for which the following second derivative exists for admissible directions $h$. The second variation of $J$ at $y$ in the direction $h$ is
\begin{align*}
\delta^2 J[y;h] = \frac{d^2}{d\varepsilon^2}\Big|_{\varepsilon=0} J[y+\varepsilon h].
\end{align*}
[/definition]
The second variation gives a quadratic form in the variation direction for the classical integral functionals considered here. Nonnegativity of this quadratic form is necessary for a minimum under appropriate hypotheses, but the subtle question is whether strict positivity is strong enough to be sufficient.
[example: Stationary But Not Minimising]
For the functional
\begin{align*}
J[y]=\int_0^{2\pi}\big((y'(x))^2-y(x)^2\big)\,dx
\end{align*}
on curves satisfying $y(0)=y(2\pi)=0$, the zero curve is stationary because every admissible variation $k$ has $k(0)=k(2\pi)=0$ and
\begin{align*}
J[\varepsilon k]=\varepsilon^2\int_0^{2\pi}\big((k'(x))^2-k(x)^2\big)\,dx.
\end{align*}
Differentiating with respect to $\varepsilon$ gives
\begin{align*}
\frac{d}{d\varepsilon}J[\varepsilon k]=2\varepsilon\int_0^{2\pi}\big((k'(x))^2-k(x)^2\big)\,dx,
\end{align*}
so at $\varepsilon=0$ the first variation is $0$ for every admissible $k$.
Now take the admissible variation $h(x)=\sin(x/2)$, which satisfies $h(0)=0$ and $h(2\pi)=\sin\pi=0$. Its derivative is
\begin{align*}
h'(x)=\frac{1}{2}\cos(x/2).
\end{align*}
Therefore
\begin{align*}
\delta^2J[0;h]=2\int_0^{2\pi}\left(\frac{1}{4}\cos^2(x/2)-\sin^2(x/2)\right)\,dx.
\end{align*}
Using $u=x/2$, so $dx=2\,du$ and $u$ runs from $0$ to $\pi$, we get
\begin{align*}
\int_0^{2\pi}\cos^2(x/2)\,dx=2\int_0^\pi \cos^2 u\,du=\pi
\end{align*}
and
\begin{align*}
\int_0^{2\pi}\sin^2(x/2)\,dx=2\int_0^\pi \sin^2 u\,du=\pi.
\end{align*}
Hence
\begin{align*}
\delta^2J[0;h]=2\left(\frac{\pi}{4}-\pi\right)=-\frac{3\pi}{2}<0.
\end{align*}
Equivalently,
\begin{align*}
J[\varepsilon h]=\varepsilon^2\left(\frac{\pi}{4}-\pi\right)=-\frac{3\pi}{4}\varepsilon^2<0
\end{align*}
for every $\varepsilon\neq 0$, while $J[0]=0$. Thus arbitrarily close admissible curves lower the value of $J$, so the stationary zero curve is not a local minimiser.
[/example]
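The values computed in the example above can be reproduced by direct quadrature. The following sketch assumes only the functional $J[y]=\int_0^{2\pi}((y')^2-y^2)\,dx$ and the variation $h(x)=\sin(x/2)$ from the example; the midpoint rule, the grid size, and the value $\varepsilon=0.1$ are illustrative choices.

```python
import math

def J(y, dy, n=4000):
    """Midpoint-rule approximation of int_0^{2 pi} (y'(x)^2 - y(x)^2) dx."""
    dx = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += dy(x) ** 2 - y(x) ** 2
    return total * dx

def J_along_h(eps):
    """Value of J on the perturbed curve eps * sin(x / 2)."""
    return J(lambda x: eps * math.sin(0.5 * x),
             lambda x: 0.5 * eps * math.cos(0.5 * x))

eps = 0.1
numeric = J_along_h(eps)
predicted = -0.75 * math.pi * eps ** 2  # the value -(3 pi / 4) eps^2 from the example

# Second variation at the zero curve as a second central difference in eps.
second_variation = (J_along_h(eps) - 2.0 * J_along_h(0.0) + J_along_h(-eps)) / eps ** 2

print(numeric, predicted, second_variation)
```

Because $J[\varepsilon h]$ is exactly quadratic in $\varepsilon$, the second difference recovers $\delta^2J[0;h]=-3\pi/2$ up to quadrature and rounding error.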
The example points toward Jacobi theory: the interval length and the oscillation of solutions to an associated linear equation control whether the second variation can be positive. Conjugate points record where a nonzero variation field can vanish at both endpoints, and their absence becomes a decisive local minimality condition in the classical theory.
## Symmetry And Conservation Laws
Why do conserved quantities appear in variational problems with symmetry? Many Lagrangians do not depend on all variables equally: some are independent of $x$, some are independent of $y$, and some are invariant under a continuous group of transformations. The variational viewpoint turns those invariances into first integrals along Euler--Lagrange extremals.
[definition: First Integral]
Consider an Euler--Lagrange equation for curves $y:[a,b]\to\mathbb R^n$. A first integral of this equation is a function $I(x,y,p)$ such that
\begin{align*}
\frac{d}{dx} I(x,y(x),y'(x))=0
\end{align*}
for every solution $y$ in the class under consideration.
[/definition]
A first integral reduces the order of the equation or gives a quantity that can be tracked without solving the whole boundary value problem. The [Beltrami identity](/theorems/3505) and momentum conservation are early examples, and Noether's theorem is the general structural result behind them.
[example: Translation Invariance]
Suppose $L=L(p)$ depends only on the velocity variable for scalar curves, so $\partial_y L=0$ and $\partial_pL(x,y,y')=\partial_pL(y')$. Along a smooth extremal, the Euler--Lagrange equation is
\begin{align*}
\partial_yL(y'(x))-\frac{d}{dx}\partial_pL(y'(x))=0.
\end{align*}
Since $\partial_yL=0$, this becomes
\begin{align*}
-\frac{d}{dx}\partial_pL(y'(x))=0.
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
\frac{d}{dx}\partial_pL(y'(x))=0.
\end{align*}
Therefore $\partial_pL(y'(x))$ is constant along each smooth extremal.
For the length Lagrangian $L(p)=\sqrt{1+p^2}$, differentiating with respect to $p$ gives
\begin{align*}
\partial_pL(p)=\frac{1}{2}(1+p^2)^{-1/2}\cdot 2p.
\end{align*}
Hence
\begin{align*}
\partial_pL(p)=\frac{p}{\sqrt{1+p^2}}.
\end{align*}
The first integral says that
\begin{align*}
\frac{y'(x)}{\sqrt{1+(y'(x))^2}}
\end{align*}
is constant. If $f(p)=p(1+p^2)^{-1/2}$, then the product rule gives
\begin{align*}
f'(p)=(1+p^2)^{-1/2}-p^2(1+p^2)^{-3/2}.
\end{align*}
Putting the two terms over the common denominator $(1+p^2)^{3/2}$ gives
\begin{align*}
f'(p)=\frac{1+p^2-p^2}{(1+p^2)^{3/2}}.
\end{align*}
Thus
\begin{align*}
f'(p)=\frac{1}{(1+p^2)^{3/2}}>0.
\end{align*}
So $f$ is strictly increasing, and $f(y'(x))$ being constant forces $y'(x)$ to be constant. Therefore the extremals for the graph-length functional have constant slope, hence are straight line segments.
[/example]
This conservation-law perspective also prepares the Hamiltonian side of the course. Passing from velocities to momenta leads to Hamilton's equations, and the Hamilton--Jacobi equation supplies a method for certifying minimisers by constructing a suitable scalar function.
## The Hamilton--Jacobi Viewpoint
How can a partial differential equation prove that a curve minimises an integral? Instead of comparing a candidate curve with every nearby curve directly, the Hamilton--Jacobi method builds a function whose differential calibrates the Lagrangian along the chosen extremals. This turns the integral into a boundary term plus a nonnegative remainder.
[definition: Hamiltonian]
Let $L(x,y,p)$ be a smooth Lagrangian for which the momentum relation $q=\partial_p L(x,y,p)$ can be solved for $p$ in terms of $(x,y,q)$. The Hamiltonian associated to $L$ is
\begin{align*}
H(x,y,q)= q\cdot p - L(x,y,p),
\end{align*}
where $p$ is determined by $q=\partial_p L(x,y,p)$.
[/definition]
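The definition can be followed literally on a computer: invert the momentum relation numerically, then evaluate $q\cdot p-L$. The sketch below does this for the length Lagrangian $L(p)=\sqrt{1+p^2}$, chosen here purely for illustration. The bisection bracket and the comparison value $-\sqrt{1-q^2}$, which comes from solving $q=p/\sqrt{1+p^2}$ by hand, are assumptions of this sketch rather than statements from the course.

```python
import math

def lagrangian(p):
    """Length Lagrangian L(p) = sqrt(1 + p^2)."""
    return math.sqrt(1.0 + p * p)

def momentum(p):
    """q = dL/dp = p / sqrt(1 + p^2), strictly increasing with values in (-1, 1)."""
    return p / math.sqrt(1.0 + p * p)

def velocity_from_momentum(q, lo=-1e6, hi=1e6, iterations=200):
    """Solve momentum(p) = q for p by bisection; assumes q lies well inside (-1, 1)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if momentum(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hamiltonian(q):
    """H(q) = q * p - L(p), with p determined by q = dL/dp."""
    p = velocity_from_momentum(q)
    return q * p - lagrangian(p)

# Compare with the hand-computed closed form -sqrt(1 - q^2) at sample momenta.
momenta = [-0.9, -0.3, 0.0, 0.4, 0.8]
max_gap = max(abs(hamiltonian(q) + math.sqrt(1.0 - q * q)) for q in momenta)

print(max_gap)
```

The inversion succeeds here because $\partial_pL$ is strictly increasing in $p$. For this Lagrangian the momentum is confined to $|q|<1$, so the Hamiltonian is only defined on that range.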
The Hamiltonian changes the variables from velocity to momentum. To use it as a sufficient-condition tool, the course seeks a scalar function whose gradient supplies the momentum field along a family of extremals; the equation defining such a function is the Hamilton--Jacobi equation.
[definition: Hamilton--Jacobi Equation]
For a Hamiltonian $H(x,y,q)$, the Hamilton--Jacobi equation for an unknown function $S(x,y)$ is
\begin{align*}
\partial_x S(x,y)+H(x,y,\nabla_y S(x,y))=0.
\end{align*}
[/definition]
A solution $S$ behaves like a generating function for a family of extremals. The later Hamilton--Jacobi chapter uses this equation to obtain a global lower bound for the action of any admissible curve with the same endpoints.
[example: Mechanical Lagrangian]
For the one-dimensional mechanical Lagrangian
\begin{align*}
L(y,p)=\frac{1}{2}p^2 - V(y),
\end{align*}
the momentum variable is defined by differentiating $L$ with respect to the velocity $p$. Since $V(y)$ is independent of $p$,
\begin{align*}
q=\partial_p L(y,p)=\partial_p\left(\frac{1}{2}p^2\right)-\partial_p V(y)=p-0=p.
\end{align*}
Thus the momentum relation is $q=p$, so solving for the velocity gives $p=q$.
The Hamiltonian is defined by
\begin{align*}
H(y,q)=qp-L(y,p),
\end{align*}
with $p$ replaced by the velocity determined from $q=p$. Substituting $p=q$ gives
\begin{align*}
H(y,q)=q\cdot q-\left(\frac{1}{2}q^2-V(y)\right).
\end{align*}
Expanding the minus sign,
\begin{align*}
H(y,q)=q^2-\frac{1}{2}q^2+V(y).
\end{align*}
Combining the quadratic terms,
\begin{align*}
H(y,q)=\frac{1}{2}q^2+V(y).
\end{align*}
For a function $S(x,y)$, the Hamilton--Jacobi equation is
\begin{align*}
\partial_x S(x,y)+H(y,\partial_y S(x,y))=0.
\end{align*}
Substituting $q=\partial_y S(x,y)$ into the Hamiltonian gives
\begin{align*}
H(y,\partial_y S(x,y))=\frac{1}{2}(\partial_y S(x,y))^2+V(y).
\end{align*}
Therefore the Hamilton--Jacobi equation becomes
\begin{align*}
\partial_x S(x,y)+\frac{1}{2}(\partial_y S(x,y))^2+V(y)=0.
\end{align*}
The kinetic term becomes quadratic in the momentum $\partial_y S$, while the potential term remains $V(y)$, so the variational problem is converted into a first-order PDE for the generating function $S$.
[/example]
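A concrete solution makes the equation in the example above easier to trust. In the special case $V=0$, the function $S(x,y)=y^2/(2x)$ on $x>0$ satisfies $\partial_xS+\tfrac12(\partial_yS)^2=0$, since $\partial_xS=-y^2/(2x^2)$ and $\partial_yS=y/x$. This particular $S$ is an illustration chosen for the sketch below, not a solution taken from the course, and the sample points and step size are arbitrary.

```python
def S(x, y):
    """Candidate solution for V = 0: S(x, y) = y^2 / (2 x), defined for x > 0."""
    return y * y / (2.0 * x)

def hj_residual(x, y, potential=lambda y: 0.0, step=1e-5):
    """Finite-difference value of S_x + (1/2) S_y^2 + V(y) at the point (x, y)."""
    s_x = (S(x + step, y) - S(x - step, y)) / (2.0 * step)
    s_y = (S(x, y + step) - S(x, y - step)) / (2.0 * step)
    return s_x + 0.5 * s_y ** 2 + potential(y)

# Illustrative sample points with x > 0.
sample_points = [(0.5, -1.0), (1.0, 0.3), (2.0, 1.5), (3.0, -2.0)]
residuals = [hj_residual(x, y) for x, y in sample_points]
max_residual = max(abs(r) for r in residuals)

print(max_residual)
```

The residual is zero up to finite-difference error. Passing a different `potential` would test a candidate $S$ for another choice of $V$, but the function `S` above solves only the $V=0$ case.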
This final viewpoint completes the classical arc of the course: variation gives Euler--Lagrange equations, the second variation tests local behaviour, symmetry gives conservation laws, and Hamilton--Jacobi theory gives a powerful route to sufficient conditions.
## Prerequisites And Conventions
What background is needed to follow the arguments without interrupting the variational ideas? The course assumes real analysis at the level of metric spaces, compactness, [uniform convergence](/page/Uniform%20Convergence), and Lebesgue integration; multivariable calculus including the chain rule and [integration by parts](/theorems/2098); linear algebra for quadratic forms and eigenvalue intuition; and basic ODE existence theory. Familiarity with mechanics helps with examples, but the mathematical development is self-contained once the analytic prerequisites are in place.
The main function spaces in this first course are classical spaces such as $C^1([a,b];\mathbb R^n)$ and $C^2([a,b];\mathbb R^n)$, usually with endpoint restrictions. Norms on curves are written with subscripts when ambiguity matters, for instance
\begin{align*}
\|y\|_{C^1}=\sup_{x\in[a,b]} |y(x)|+\sup_{x\in[a,b]} |y'(x)|.
\end{align*}
Functionals are written with square brackets, as in $J[y]$, and ordinary functions are written with parentheses, as in $L(x,y,p)$.
The later direct-method course changes the ambient space from smooth curves to Sobolev spaces and replaces explicit ODE analysis by compactness and lower semicontinuity. This course deliberately stays in the smooth regime long enough to develop the geometric and differential-equation structure that motivates those later generalisations.
Having sketched the landscape of the calculus of variations and committed to the smooth regime, we now introduce the fundamental objects that make this program concrete: functionals and the first variation. These form the vocabulary for posing and solving variational problems.
# 1. Functionals and the Variational Problem
The calculus of variations studies functionals by asking how a quantity depending on an entire curve changes when the curve is perturbed. In this first chapter we set up the basic objects: functionals, admissible classes, boundary conditions, and the first variation. The aim is not yet to derive the Euler--Lagrange equation, but to isolate the variational principle that will drive the rest of the course: an extremising curve must have zero first-order change under every admissible infinitesimal perturbation.
## Functionals on Spaces of Curves
A variational problem begins with a quantity assigned to each curve, not with a function evaluated at a point. We need a name for a real-valued map whose input may be an entire curve, a surface, or another mathematical object.
[definition: Functional]
Let $X$ be a set. A functional on $X$ is a map
\begin{align*}
J : X \to \mathbb R.
\end{align*}
[/definition]
The word functional emphasises that the input is itself usually a function. To compare curves by small perturbations, we also need a curve space in which both the position and the derivative vary continuously. This motivates the standard $C^1$ setting for the first part of the course.
[definition: The Space C One Curves]
Let $a<b$ and let $n\in\mathbb N$. Define
\begin{align*}
C^1([a,b];\mathbb R^n)
= \{y:[a,b]\to\mathbb R^n : y \text{ is continuously differentiable}\}.
\end{align*}
For $y\in C^1([a,b];\mathbb R^n)$, set
\begin{align*}
\|y\|_{C^1} = \sup_{x\in[a,b]} |y(x)| + \sup_{x\in[a,b]} |y'(x)|.
\end{align*}
[/definition]
This norm makes two curves close when their positions and velocities are uniformly close. That is the right notion for a Lagrangian depending continuously on $x$, $y$, and $y'$.
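To see what the derivative term in the norm contributes, it helps to look at a curve with small values but steep slopes. The sketch below approximates both suprema on a grid for the scalar curve $y_k(x)=\sin(kx)/k$ on $[0,\pi]$; the curve, the value $k=100$, and the grid size are illustrative assumptions, and grid sampling gives only a lower bound for each supremum.

```python
import math

def sup_norm(func, a, b, samples=20001):
    """Grid approximation of sup |func(x)| over [a, b]."""
    return max(abs(func(a + (b - a) * i / (samples - 1))) for i in range(samples))

def c1_norm(func, derivative, a, b):
    """Grid approximation of ||y||_{C^1} = sup |y| + sup |y'|."""
    return sup_norm(func, a, b) + sup_norm(derivative, a, b)

k = 100
wiggle = lambda x: math.sin(k * x) / k    # values of size at most 1/k
wiggle_slope = lambda x: math.cos(k * x)  # slopes of size up to 1

c0_size = sup_norm(wiggle, 0.0, math.pi)
c1_size = c1_norm(wiggle, wiggle_slope, 0.0, math.pi)

print(c0_size, c1_size)
```

As $k$ grows, the curves $y_k$ approach the zero curve uniformly but stay at distance at least $1$ from it in the $C^1$ norm, so a statement about $C^1$-close competitors says nothing about them.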
[example: Length Functional on Plane Curves]
Let $y=(y_1,y_2)\in C^1([a,b];\mathbb R^2)$. Its velocity is $y'(x)=(y_1'(x),y_2'(x))$, so the Euclidean speed at $x$ is
\begin{align*}
|y'(x)|=\sqrt{(y_1'(x))^2+(y_2'(x))^2}.
\end{align*}
The length functional is therefore
\begin{align*}
J[y]=\int_a^b |y'(x)|\,dx=\int_a^b \sqrt{(y_1'(x))^2+(y_2'(x))^2}\,dx.
\end{align*}
Because $y_1'$ and $y_2'$ are continuous, the function $x\mapsto |y'(x)|$ is continuous on $[a,b]$, hence integrable. Thus $J$ assigns to the parametrised plane curve $y$ the accumulated speed over the interval, which is the usual arc length for a regular $C^1$ parametrisation. This illustrates the basic variational pattern: a geometric quantity is encoded as an integral depending on the derivative of the unknown curve.
[/example]
The length example is simple in form but already variational: if the endpoints are prescribed, the question is which curve gives the least value of $J$. Most of the course replaces $|y'|$ by a more general function of position and velocity.
## Integral Functionals and Admissible Curves
The central problem is to choose a curve from an allowed collection so that an integral quantity is as small, or sometimes as large, as possible. The first task is to write the integral in a form flexible enough for mechanics, geometry, and optimisation with constraints.
[definition: Classical Integral Functional]
Let $L:[a,b]\times\mathbb R^n\times\mathbb R^n\to\mathbb R$ be a function. The associated classical integral functional is
\begin{align*}
J[y] = \int_a^b L(x,y(x),y'(x))\,dx,
\end{align*}
defined for curves $y\in C^1([a,b];\mathbb R^n)$ for which the integrand is integrable.
[/definition]
The function $L$ is called the Lagrangian. The three arguments represent the independent variable, the position of the curve, and the velocity of the curve. The optimisation problem is not determined by $L$ alone, because boundary data and constraints decide which curves are allowed; this motivates naming the admissible class explicitly.
[definition: Admissible Class]
An admissible class for a variational problem is a subset
\begin{align*}
\mathcal A \subset C^1([a,b];\mathbb R^n)
\end{align*}
of curves over which the functional $J$ is to be optimised.
[/definition]
The admissible class is part of the problem, not an afterthought. Without fixing $\mathcal A$, the comparison statement "$y$ is best" has no mathematical content, because there is no specified collection of competitors. Once $\mathcal A$ is fixed, a solution must be described by an inequality against every admissible curve in that class, with the direction of the inequality distinguishing minimisation from maximisation.
[definition: Minimiser and Maximiser]
Let $J:\mathcal A\to\mathbb R$ be a functional. A curve $y\in\mathcal A$ is a minimiser of $J$ on $\mathcal A$ if
\begin{align*}
J[y]\le J[z]\quad\text{for all }z\in\mathcal A.
\end{align*}
It is a maximiser of $J$ on $\mathcal A$ if
\begin{align*}
J[y]\ge J[z]\quad\text{for all }z\in\mathcal A.
\end{align*}
[/definition]
A minimiser compares $y$ with every admissible competitor, so it is a global notion. First-variation arguments begin with comparisons against nearby competitors, because they are built from infinitesimal perturbations. This motivates the following local definition.
[definition: Local Minimiser]
Let $\mathcal A\subset C^1([a,b];\mathbb R^n)$ and let $J:\mathcal A\to\mathbb R$ be a functional. A curve $y\in\mathcal A$ is a local minimiser in the $C^1$ topology if there exists $\varepsilon>0$ such that
\begin{align*}
J[y]\le J[z]
\end{align*}
for every $z\in\mathcal A$ with $\|z-y\|_{C^1}<\varepsilon$.
[/definition]
Local minimality is enough to force first-order stationarity. Later chapters ask when stationarity, plus second-order or convexity hypotheses, is enough to recover a minimum.
[example: Shortest Path Between Two Points]
Let $p,q\in\mathbb R^2$ and consider
\begin{align*}
\mathcal A = \{y\in C^1([0,1];\mathbb R^2): y(0)=p,\ y(1)=q\},
\qquad
J[y] = \int_0^1 |y'(x)|\,dx.
\end{align*}
For any admissible curve $y\in\mathcal A$, the [fundamental theorem of calculus](/theorems/632) gives
\begin{align*}
q-p=y(1)-y(0)=\int_0^1 y'(x)\,dx.
\end{align*}
Using the triangle inequality for integrals in $\mathbb R^2$,
\begin{align*}
|q-p|=\left|\int_0^1 y'(x)\,dx\right|\le \int_0^1 |y'(x)|\,dx=J[y].
\end{align*}
Now take the straight-line curve
\begin{align*}
y_*(x)=(1-x)p+xq.
\end{align*}
It satisfies $y_*(0)=p$ and $y_*(1)=q$, so $y_*\in\mathcal A$. Its derivative is constant:
\begin{align*}
y_*'(x)=q-p.
\end{align*}
Therefore
\begin{align*}
J[y_*]=\int_0^1 |q-p|\,dx=|q-p|\int_0^1 1\,dx=|q-p|.
\end{align*}
Thus every admissible curve has length at least $|q-p|$, and the straight-line curve attains this value. The geometric shortest-path problem is therefore encoded as a minimisation problem over the admissible class $\mathcal A$.
[/example]
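The inequality $|q-p|\le J[y]$ from the example above can be observed numerically on a specific competitor. The sketch below assumes the endpoints $p=(0,0)$ and $q=(3,4)$ and a competitor obtained by adding $\tfrac12\sin(\pi x)$ to the second coordinate of the straight line; these choices and the midpoint quadrature are illustrative only.

```python
import math

def curve_length(velocity, n=2000):
    """Midpoint-rule approximation of J[y] = int_0^1 |y'(x)| dx for a plane curve."""
    dx = 1.0 / n
    return sum(math.hypot(*velocity((i + 0.5) * dx)) for i in range(n)) * dx

p, q = (0.0, 0.0), (3.0, 4.0)  # illustrative endpoints
chord = math.hypot(q[0] - p[0], q[1] - p[1])

def straight_velocity(x):
    """Derivative of y_*(x) = (1 - x) p + x q."""
    return (q[0] - p[0], q[1] - p[1])

def bent_velocity(x, amplitude=0.5):
    """Derivative of y_*(x) + amplitude * sin(pi x) * (0, 1); endpoints are unchanged."""
    return (q[0] - p[0], q[1] - p[1] + amplitude * math.pi * math.cos(math.pi * x))

straight_length = curve_length(straight_velocity)
bent_length = curve_length(bent_velocity)

print(chord, straight_length, bent_length)
```

The straight line reproduces $|q-p|$ and the bent competitor is strictly longer, as the triangle-inequality argument predicts for every admissible curve.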
This shortest-path problem also warns us that parametrisation can matter in the analysis even when the geometric image is the main object. Later, reparametrisation-invariant functionals require special care because their Lagrangians may fail strict convexity in the velocity variable.
## Boundary Conditions and Constraints
A curve is rarely allowed to vary freely in every direction, so the next problem is to record which endpoint and constraint data determine the permitted perturbations. These data enter the first variation through the class of allowed directions. The simplest and most important case fixes both endpoints in advance.
[definition: Fixed-Endpoint Boundary Condition]
Let $p,q\in\mathbb R^n$. The fixed-endpoint admissible class is
\begin{align*}
\mathcal A_{p,q}
=\{y\in C^1([a,b];\mathbb R^n): y(a)=p,\ y(b)=q\}.
\end{align*}
[/definition]
For fixed endpoints, admissible perturbations must vanish at both endpoints. Many geometric and physical problems instead leave one or both endpoints free to move, and then the boundary terms in the first variation carry information rather than disappearing. This motivates isolating the free-endpoint case.
[definition: Free-Endpoint Boundary Condition]
A free-endpoint variational problem is one in which at least one endpoint value $y(a)$ or $y(b)$ is not prescribed as a fixed vector in $\mathbb R^n$.
[/definition]
Free endpoints allow endpoint variations and lead to natural boundary conditions. A different obstruction appears when the admissible curve must preserve a numerical quantity such as length, area, or enclosed volume. Then not every small perturbation is allowed: admissible perturbations must remain on the level set of an auxiliary functional. The constraint therefore has to be recorded as part of the variational problem, rather than treated as an ordinary endpoint condition.
[definition: Isoperimetric Constraint]
Let $G:C^1([a,b];\mathbb R^n)\to\mathbb R$ be a functional and let $c\in\mathbb R$. An isoperimetric constraint is the requirement
\begin{align*}
G[y]=c.
\end{align*}
[/definition]
The name comes from classical problems in which length is fixed and area is optimised, or conversely. Analytically, an isoperimetric constraint restricts variations to those tangent to the constraint surface, leading to Lagrange multiplier conditions in later chapters.
[example: Brachistochrone Setup]
Let a particle start from rest at $p=(0,0)$ and move under constant gravity to $q=(X,Y)$, where $X>0$ and $Y<0$. Write the path as a graph $x\mapsto (x,u(x))$ with $u(0)=0$, $u(X)=Y$, and $u(x)<0$ for $0<x\le X$. Since the vertical drop at height $u(x)$ is $-u(x)$, conservation of mechanical energy gives
\begin{align*}
\frac12 m v(x)^2 = mg(-u(x)).
\end{align*}
Dividing by $m$ and multiplying by $2$ gives
\begin{align*}
v(x)^2 = 2g(-u(x)).
\end{align*}
Because speed is nonnegative,
\begin{align*}
v(x)=\sqrt{2g(-u(x))}.
\end{align*}
For the graph parametrisation $r(x)=(x,u(x))$, the derivative is $r'(x)=(1,u'(x))$, so the arclength element is
\begin{align*}
ds=|r'(x)|\,dx=\sqrt{1+u'(x)^2}\,dx.
\end{align*}
Travel time is arclength divided by speed, hence
\begin{align*}
dT=\frac{ds}{v(x)}=\frac{\sqrt{1+u'(x)^2}}{\sqrt{2g(-u(x))}}\,dx.
\end{align*}
Integrating from $0$ to $X$ gives
\begin{align*}
T[u]=\int_0^X \frac{\sqrt{1+u'(x)^2}}{\sqrt{2g(-u(x))}}\,dx.
\end{align*}
Equivalently, with the positive constant $C=1/\sqrt{2g}$,
\begin{align*}
T[u]=C\int_0^X \sqrt{\frac{1+u'(x)^2}{-u(x)}}\,dx.
\end{align*}
Thus the brachistochrone problem is to minimise this travel-time functional over admissible graphs with fixed endpoints and lying below the starting height.
[/example]
The brachistochrone illustrates why the Lagrangian may be singular at part of the boundary of the admissible class. Classical derivations still begin with formal variations, but existence and endpoint behaviour need separate analysis.
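A minimal numerical sketch of the travel-time functional shows one way to cope with the singular integrand. It assumes a profile that leaves the origin with finite nonzero slope, so that the substitution $x=s^2$ removes the $1/\sqrt{x}$-type singularity. The function name `travel_time` and the sample endpoint are illustrative choices.

```python
import math

def travel_time(u, du, X, g=9.81, n=4000):
    """Approximate T[u] = int_0^X sqrt((1 + u'^2) / (2 g (-u))) dx.

    The substitution x = s^2 tames the singularity at the starting point,
    after which a composite midpoint rule is applied in the variable s.
    """
    s_max = math.sqrt(X)
    ds = s_max / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        x = s * s
        integrand = math.sqrt((1.0 + du(x) ** 2) / (2.0 * g * (-u(x))))
        total += integrand * 2.0 * s * ds      # dx = 2 s ds
    return total

# Straight chord from (0, 0) to (X, Y) = (1, -1); sample data, not from the text.
X, Y, g = 1.0, -1.0, 9.81
line = lambda x: (Y / X) * x
dline = lambda x: Y / X
T_line = travel_time(line, dline, X, g)
print(T_line)
```

For the straight chord the result can be compared with the elementary inclined-plane formula $\sqrt{2(X^2+Y^2)/(g|Y|)}$, which equals $2/\sqrt g$ for this endpoint.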
[example: Minimal Surface of Revolution Setup]
Let $0<a<b$ and let $r_0,r_1>0$. A positive curve $y\in C^1([a,b];\mathbb R)$ with $y(a)=r_0$ and $y(b)=r_1$ generates a surface of revolution about the $x$-axis by
\begin{align*}
\Phi(x,\theta)=(x,y(x)\cos\theta,y(x)\sin\theta),
\end{align*}
where $x\in[a,b]$ and $\theta\in[0,2\pi]$. The tangent vectors are
\begin{align*}
\partial_x\Phi(x,\theta)=(1,y'(x)\cos\theta,y'(x)\sin\theta)
\end{align*}
and
\begin{align*}
\partial_\theta\Phi(x,\theta)=(0,-y(x)\sin\theta,y(x)\cos\theta).
\end{align*}
Their squared lengths and dot product are
\begin{align*}
|\partial_x\Phi|^2=1+y'(x)^2\cos^2\theta+y'(x)^2\sin^2\theta=1+y'(x)^2
\end{align*}
and
\begin{align*}
|\partial_\theta\Phi|^2=y(x)^2\sin^2\theta+y(x)^2\cos^2\theta=y(x)^2
\end{align*}
and
\begin{align*}
\partial_x\Phi\cdot\partial_\theta\Phi=-y(x)y'(x)\cos\theta\sin\theta+y(x)y'(x)\sin\theta\cos\theta=0.
\end{align*}
Thus the area element is
\begin{align*}
\sqrt{|\partial_x\Phi|^2|\partial_\theta\Phi|^2-(\partial_x\Phi\cdot\partial_\theta\Phi)^2}\,d\theta\,dx
=y(x)\sqrt{1+y'(x)^2}\,d\theta\,dx,
\end{align*}
using $y(x)>0$. Integrating over the full angle gives
\begin{align*}
A[y]=\int_a^b\int_0^{2\pi} y(x)\sqrt{1+y'(x)^2}\,d\theta\,dx
=2\pi\int_a^b y(x)\sqrt{1+y'(x)^2}\,dx.
\end{align*}
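The minimal surface of revolution problem asks for minimisers of this area functional among positive fixed-endpoint profiles; its degenerate competitor, consisting of two discs joined along the axis, is known as the Goldschmidt solution. The formula shows that the variational problem depends simultaneously on the radius $y(x)$ and the slope $y'(x)$.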
[/example]
This example is a first sign that the classical theory of stationary smooth curves is not the whole story of minimisation. A curve may satisfy the differential equation and still fail to be the global minimiser.
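The area functional from the minimal-surface setup is easy to evaluate numerically. The sketch below applies a midpoint rule to $A[y]=2\pi\int_a^b y\sqrt{1+(y')^2}\,dx$ and tests it on a cone frustum, whose lateral area is known from elementary geometry. The function name and the sample profile are assumptions made for illustration.

```python
import math

def revolution_area(y, dy, a, b, n=2000):
    """Midpoint-rule approximation of A[y] = 2*pi * int_a^b y * sqrt(1 + y'^2) dx."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width
        total += y(x) * math.sqrt(1.0 + dy(x) ** 2)
    return 2.0 * math.pi * width * total

# Cone frustum y(x) = x on [1, 2]: radii r0 = 1 and r1 = 2 (illustrative values).
frustum = lambda x: x
dfrustum = lambda x: 1.0
area = revolution_area(frustum, dfrustum, 1.0, 2.0)
print(area)
```

The frustum has slant length $\sqrt2$, so the classical formula $\pi(r_0+r_1)\cdot\text{slant}$ gives $3\pi\sqrt2$, which the quadrature should reproduce.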
## First Variation and Stationarity
To extract a necessary condition from minimality, we compare a candidate curve with nearby admissible curves. The first question is not how $J$ changes, but which infinitesimal directions respect the admissible class.
[definition: Admissible Variation]
Let $\mathcal A\subset C^1([a,b];\mathbb R^n)$ and let $y\in\mathcal A$. A curve $h\in C^1([a,b];\mathbb R^n)$ is an admissible variation at $y$ if there exists $\varepsilon_0>0$ such that
\begin{align*}
y+\varepsilon h\in\mathcal A
\end{align*}
for all $\varepsilon\in(-\varepsilon_0,\varepsilon_0)$.
[/definition]
An admissible variation identifies a line of nearby competitors through $y$. Local minimality only gives inequalities for the values of $J[y+\varepsilon h]$, so the next task is to extract the first-order term in this one-parameter comparison. If that directional derivative exists, it is the quantity that must vanish at an interior local extremum and the quantity that later becomes an integral expression involving $h$ and $h'$.
[definition: First Variation]
Let $J:\mathcal A\to\mathbb R$, let $y\in\mathcal A$, and let $h$ be an admissible variation at $y$. If the limit exists, the first variation of $J$ at $y$ in the direction $h$ is
\begin{align*}
\delta J[y;h]
= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} J[y+\varepsilon h]
= \lim_{\varepsilon\to 0}\frac{J[y+\varepsilon h]-J[y]}{\varepsilon}.
\end{align*}
[/definition]
The first variation is a directional derivative of a functional. Since the same construction appears for maps on arbitrary normed vector spaces, it is useful to name the general derivative before specialising back to integral functionals. This motivates the Gâteaux derivative.
[definition: Gateaux Derivative]
Let $X$ be a [normed vector space](/page/Normed%20Vector%20Space), let $U\subset X$, and let $J:U\to\mathbb R$. The Gâteaux derivative of $J$ at $y\in U$ in the direction $h\in X$ is the derivative at $0$ of the map
\begin{align*}
\varepsilon\mapsto J[y+\varepsilon h],
\end{align*}
whenever this derivative is defined for sufficiently small $\varepsilon$.
[/definition]
Thus the first variation is the Gâteaux derivative written in the notation traditional to the calculus of variations. For the integral functionals of this course, the abstract derivative must be converted into a formula involving $L$, $h$, and $h'$. If
\begin{align*}
J[y]=\int_a^b L(x,y(x),y'(x))\,dx
\end{align*}
and $L$ is smooth enough to differentiate under the integral sign, then the first variation is
\begin{align*}
\delta J[y;h]=\int_a^b \left(\partial_yL(x,y,y')\cdot h+\partial_{y'}L(x,y,y')\cdot h'\right)\,dx.
\end{align*}
This formula is the computational bridge between the original optimisation problem and differential equations. It separates the effect of changing the curve itself, represented by $\partial_yL\cdot h$, from the effect of changing its velocity, represented by $\partial_{y'}L\cdot h'$. The hypotheses are doing real work: differentiability of $L$ justifies differentiating through the integral, and admissibility of the variation ensures that the curve being tested stays inside the admissible class. The formula is not yet the Euler–Lagrange equation, because the test direction still appears through both $h$ and $h'$. The next step is to define the vanishing of this first-order expression as stationarity, and then use integration by parts to convert stationarity into an equation for the extremal.
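The first-variation formula can be checked against a difference quotient. The sketch below uses an illustrative Lagrangian $L(x,y,p)=\tfrac12p^2-\cos y$, a sample curve, and a sample variation, all of which are assumptions for the demonstration. It uses a central difference quotient rather than the one-sided quotient of the definition, because the central version has smaller truncation error.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

# Illustrative Lagrangian L(x, y, p) = p^2/2 - cos(y) and its partial derivatives.
L = lambda x, y, p: 0.5 * p * p - math.cos(y)
L_y = lambda x, y, p: math.sin(y)
L_p = lambda x, y, p: p

# Sample curve and variation on [0, 1]; h vanishes at both endpoints.
y, dy = (lambda x: x * x), (lambda x: 2.0 * x)
h, dh = (lambda x: math.sin(math.pi * x)), (lambda x: math.pi * math.cos(math.pi * x))

def J(eps):
    """J[y + eps * h] for the sample data above."""
    return integrate(lambda x: L(x, y(x) + eps * h(x), dy(x) + eps * dh(x)), 0.0, 1.0)

eps = 1e-4
difference_quotient = (J(eps) - J(-eps)) / (2.0 * eps)
formula = integrate(
    lambda x: L_y(x, y(x), dy(x)) * h(x) + L_p(x, y(x), dy(x)) * dh(x), 0.0, 1.0
)
print(difference_quotient, formula)
```

Both numbers are computed with the same quadrature rule, so their difference reflects only the $O(\varepsilon^2)$ truncation of the difference quotient and rounding.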
[definition: Stationary Point]
Let $J:\mathcal A\to\mathbb R$ and let $y\in\mathcal A$. The curve $y$ is stationary for $J$ on $\mathcal A$ if
\begin{align*}
\delta J[y;h]=0
\end{align*}
for every admissible variation $h$ at $y$ for which the first variation exists.
[/definition]
Stationarity is a necessary first-order condition, not a complete answer to the minimisation problem. A stationary curve may be a minimum, a maximum, or neither. The reason local extrema are stationary comes from reducing the variational problem to a one-variable extremum along each admissible direction.
[quotetheorem:6987]
[citeproof:6987]
The theorem reduces a problem over infinitely many curves to a family of one-dimensional perturbation problems. Each admissible direction $h$ gives an ordinary function $\varepsilon\mapsto J[y+\varepsilon h]$, and local optimality of $y$ forces that one-variable function to have zero derivative at $\varepsilon=0$. The hypotheses are deliberately modest but necessary: the variation must stay inside the admissible class, and the first variation must exist in that direction. If the endpoint constraints, obstacle constraints, or corner constraints exclude a direction, this theorem says nothing about that excluded perturbation. It also gives only a necessary condition; a curve can satisfy $\delta J[y;h]=0$ for every admissible $h$ and still be a maximum or a saddle. Its role is therefore diagnostic rather than decisive. In the next chapter, the explicit first-variation formula and integration by parts turn this abstract vanishing condition into the Euler–Lagrange equation and the natural boundary terms.
[example: First Variation of the Energy Functional]
Let
\begin{align*}
J[y]=\frac12\int_a^b |y'(x)|^2\,dx
\end{align*}
on fixed-endpoint curves $y\in C^1([a,b];\mathbb R^n)$. If $h\in C^1([a,b];\mathbb R^n)$ satisfies $h(a)=h(b)=0$, then $y+\varepsilon h$ has the same endpoints as $y$ for every $\varepsilon$, so $h$ is an admissible variation. For small $\varepsilon$,
\begin{align*}
J[y+\varepsilon h]=\frac12\int_a^b |y'(x)+\varepsilon h'(x)|^2\,dx.
\end{align*}
Using $|u+\varepsilon v|^2=|u|^2+2\varepsilon u\cdot v+\varepsilon^2|v|^2$ with $u=y'(x)$ and $v=h'(x)$,
\begin{align*}
J[y+\varepsilon h]=\frac12\int_a^b \left(|y'(x)|^2+2\varepsilon y'(x)\cdot h'(x)+\varepsilon^2|h'(x)|^2\right)\,dx.
\end{align*}
By linearity of the integral,
\begin{align*}
J[y+\varepsilon h]=J[y]+\varepsilon\int_a^b y'(x)\cdot h'(x)\,dx+\frac{\varepsilon^2}{2}\int_a^b |h'(x)|^2\,dx.
\end{align*}
Therefore
\begin{align*}
\frac{J[y+\varepsilon h]-J[y]}{\varepsilon}=\int_a^b y'(x)\cdot h'(x)\,dx+\frac{\varepsilon}{2}\int_a^b |h'(x)|^2\,dx.
\end{align*}
Letting $\varepsilon\to 0$ gives
\begin{align*}
\delta J[y;h]=\int_a^b y'(x)\cdot h'(x)\,dx.
\end{align*}
Now assume $y\in C^2([a,b];\mathbb R^n)$. Integration by parts for the scalar function $x\mapsto y'(x)\cdot h(x)$ gives
\begin{align*}
\int_a^b y'(x)\cdot h'(x)\,dx=\left[y'(x)\cdot h(x)\right]_{a}^{b}-\int_a^b y''(x)\cdot h(x)\,dx.
\end{align*}
Since $h(a)=h(b)=0$,
\begin{align*}
\left[y'(x)\cdot h(x)\right]_{a}^{b}=y'(b)\cdot h(b)-y'(a)\cdot h(a)=0.
\end{align*}
Thus
\begin{align*}
\delta J[y;h]=-\int_a^b y''(x)\cdot h(x)\,dx.
\end{align*}
If $y$ is stationary, then this quantity is $0$ for every fixed-endpoint variation $h$, so
\begin{align*}
\int_a^b y''(x)\cdot h(x)\,dx=0
\end{align*}
for every such $h$. By the *fundamental lemma of the calculus of variations*, $y''(x)=0$ on $(a,b)$. Hence $y'(x)$ is constant, and so $y(x)=cx+d$ for fixed vectors $c,d\in\mathbb R^n$. With endpoints $y(a)=p$ and $y(b)=q$, this affine curve is
\begin{align*}
y(x)=p+\frac{x-a}{b-a}(q-p).
\end{align*}
Thus the stationary fixed-endpoint curves for the energy functional are precisely affine line segments.
[/example]
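The expansion $J[y+\varepsilon h]=J[y]+\varepsilon\int y'\cdot h'\,dx+\tfrac{\varepsilon^2}{2}\int|h'|^2\,dx$ can be observed numerically. The sketch below works in the scalar case with the affine curve joining $p=0$ to $q=1$ on $[0,1]$ and the variation $h(x)=\sin(\pi x)$; these data and the helper names are illustrative assumptions. For the affine curve the linear term vanishes, so the energy gap should equal $\varepsilon^2\pi^2/4$ for either sign of $\varepsilon$.

```python
import math

def energy(dy, a, b, n=4000):
    """Midpoint-rule approximation of J[y] = (1/2) * int_a^b y'(x)^2 dx (scalar case)."""
    width = (b - a) / n
    return 0.5 * width * sum(dy(a + (i + 0.5) * width) ** 2 for i in range(n))

a, b, p, q = 0.0, 1.0, 0.0, 1.0
slope = (q - p) / (b - a)                          # derivative of the affine curve
dh = lambda x: math.pi * math.cos(math.pi * x)     # h(x) = sin(pi x), h(a) = h(b) = 0

def energy_gap(eps):
    """J[y + eps * h] - J[y] for the affine curve y."""
    return energy(lambda x: slope + eps * dh(x), a, b) - energy(lambda x: slope, a, b)

print(energy_gap(0.1), energy_gap(-0.1))
```

The gap is positive in both directions, which is consistent with the affine curve being a minimiser of the energy among these competitors, although the computation tests only one variation.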
This energy example previews the structure of the next chapter: compute $\delta J$, integrate by parts, use arbitrariness of $h$, and read off a differential equation. The only new ingredient needed is the fundamental lemma of the calculus of variations, which converts integral identities against all test variations into pointwise equations.
The first variation measures how an integral functional responds to a small perturbation of the curve, but a variational statement alone is not yet a differential equation. The Euler–Lagrange equation emerges by integrating by parts and invoking the fundamental lemma, which converts integral identities against all test variations into pointwise conditions.
# 2. The Euler–Lagrange Equation
The previous chapter introduced functionals, admissible variations, and the first variation as the linear response of an integral functional to a perturbation of a curve. In the notation of this chapter, the functional is written $I$ rather than $J$. The central question now is how the abstract stationarity condition $\delta I[y;h]=0$ becomes a differential equation for the unknown extremal $y$. The answer is the Euler-Lagrange equation: integration by parts moves derivatives off the arbitrary variation, and the remaining arbitrariness forces a pointwise condition on the integrand.
This chapter develops that passage in the classical one-dimensional setting. We first prove the fundamental lemma of the calculus of variations, then derive the scalar and vector Euler-Lagrange equations, extract first integrals in autonomous problems, and finish with the weak DuBois-Reymond formulation that explains what remains valid when solutions are not initially known to be smooth. The same pattern reappears in mechanics as conservation laws, in differential geometry as geodesic equations, and in PDE as the passage from weak formulations to classical equations under regularity.
## From Stationarity to a Pointwise Equation
The first variation is an integral identity involving every admissible perturbation $h$. The problem is to convert this global identity into local information about $y$, because an ordinary differential equation is a pointwise condition. The bridge is the ability to choose variations whose support is concentrated in a small interval.
[definition: Compactly Supported Variation]
Let $a<b$ and let $y \in C^1([a,b];\mathbb R^n)$. A compactly supported variation of $y$ is a function $h \in C_c^1((a,b);\mathbb R^n)$.
[/definition]
Compact support in $(a,b)$ forces $h(a)=h(b)=0$, so it encodes fixed endpoints without carrying boundary terms through every formula. To state the variational condition that will later become a differential equation, we need a name for curves whose first variation vanishes against all such interior perturbations.
[definition: Stationary Curve for Fixed Endpoints]
Let $L \in C^1([a,b]\times \mathbb R^n\times \mathbb R^n;\mathbb R)$ and let $\mathcal A\subset C^1([a,b];\mathbb R^n)$ be an admissible class of curves with prescribed endpoint values. Define the functional $I:\mathcal A\to\mathbb R$ by
\begin{align*}
I[y] = \int_a^b L(x,y(x),y'(x))\,dx,
\qquad y\in\mathcal A.
\end{align*}
A curve $y \in \mathcal A$ is stationary for fixed-endpoint variations if
\begin{align*}
\delta I[y;h] = 0
\end{align*}
for every $h \in C_c^1((a,b);\mathbb R^n)$ for which the first variation is defined.
[/definition]
Stationarity is still implicit until the first variation is written as an integral involving $L$. The first-variation formula from the previous chapter, with $I$ in place of $J$, supplies that expression and gives the raw material for the integration-by-parts argument.
The hypotheses in the formula do real work. The $C^1$ regularity of $L$ supplies the finite-dimensional chain rule in the variables $(y,y')$, while the assumed differentiability of $\varepsilon\mapsto I[y+\varepsilon h]$ is what permits the first variation to exist as a derivative of the integral rather than only as a formal expression. If $L$ has a corner in the velocity variable, such as $L(p)=|p|$, then the derivative with respect to $\varepsilon$ can fail when $y'+\varepsilon h'$ crosses zero; if the differentiated integrand lacks a local integrable bound, differentiating under the integral sign can also fail. The formula therefore computes the first variation when that differentiation is justified; it does not by itself prove stationarity, minimality, or the Euler-Lagrange equation.
The formula separates the variation into $h$ and $h'$. A pointwise necessary condition should involve only $h$, since the final coefficient must be tested against arbitrary local perturbations. The next obstruction is therefore the derivative on the [test function](/page/Test%20Function): before the fundamental lemma can be applied, integration by parts must move $h'$ onto the momentum term $\partial_{y'}L$. Once that has been done, the fundamental lemma removes the remaining integral identity and turns stationarity into a pointwise equation.
[quotetheorem:45]
[citeproof:45]
The lemma says that a continuous coefficient paired against all compactly supported variations must vanish. Continuity is what upgrades the conclusion from an integral statement to a pointwise statement: if the coefficient is positive at one point, it remains positive on a small interval and a non-negative test function supported there detects it. Without continuity, the same hypothesis only forces vanishing almost everywhere; a function that is nonzero at a single point still pairs to zero with every ordinary test function. Compact support also explains why endpoints require care: the tests live inside $(a,b)$, so endpoint values are recovered here only through continuity from the interior, and in weaker distributional versions no independent endpoint conclusion is available.
[example: Detecting a Nonzero Coefficient]
Let $f(x)=x-\frac12$ on $[0,1]$. Define $h$ by setting $h(x)=\left(x-\frac34\right)^2\left(\frac78-x\right)^2$ for $\frac34\le x\le \frac78$, and setting $h(x)=0$ outside this interval. Because the polynomial factor and its first derivative vanish at $x=\frac34$ and $x=\frac78$, this gives $h\in C_c^1((0,1))$. Also $h\ge 0$, and $h(x)>0$ for every $x\in\left(\frac34,\frac78\right)$.
On the support interval where $h$ is positive, we have
\begin{align*}
f(x)=x-\frac12>\frac34-\frac12=\frac14.
\end{align*}
Therefore
\begin{align*}
\int_0^1 f(x)h(x)\,dx=\int_{3/4}^{7/8}\left(x-\frac12\right)\left(x-\frac34\right)^2\left(\frac78-x\right)^2\,dx.
\end{align*}
For every $x\in\left(\frac34,\frac78\right)$, the factor $x-\frac12$ is positive and the factor $\left(x-\frac34\right)^2\left(\frac78-x\right)^2$ is positive, so the integrand is positive on a nonempty interval. Hence
\begin{align*}
\int_0^1 f(x)h(x)\,dx>0.
\end{align*}
Thus the integral cannot vanish for every compactly supported test function. A test supported in a small interval is already enough to detect that the coefficient has a fixed nonzero sign there.
[/example]
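The pairing in this example can also be evaluated directly. The sketch below integrates $f\,h$ with a midpoint rule and compares the result with the exact value $1/3145728$, obtained from the substitution $t=x-\tfrac34$ and the Beta integrals $\int_0^c t^2(c-t)^2\,dt=c^5/30$ and $\int_0^c t^3(c-t)^2\,dt=c^6/60$ with $c=\tfrac18$. The helper names are illustrative.

```python
def f(x):
    """The coefficient from the example."""
    return x - 0.5

def h(x):
    """The C^1 bump from the example, supported on [3/4, 7/8]."""
    if 0.75 <= x <= 0.875:
        return (x - 0.75) ** 2 * (0.875 - x) ** 2
    return 0.0

def integrate(func, a, b, n=8000):
    """Composite midpoint rule; n = 8000 aligns the grid with the support of h."""
    width = (b - a) / n
    return width * sum(func(a + (i + 0.5) * width) for i in range(n))

pairing = integrate(lambda x: f(x) * h(x), 0.0, 1.0)
exact = 1.0 / 3145728.0
print(pairing, exact)
```

The value is small because the bump is small, but it is strictly positive, which is all the fundamental lemma needs.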
## The Euler-Lagrange Equation
We now apply the fundamental lemma to the first variation. The problem is to remove the derivative from $h'$ without losing information. For fixed endpoints, the boundary term vanishes, leaving a coefficient of $h$ that must be zero.
[quotetheorem:3504]
[citeproof:3504]
The theorem gives a necessary condition, not a sufficient condition. The $C^2$ assumptions make the integration-by-parts step classical: $x\mapsto \partial_{y'}L(x,y(x),y'(x))$ is differentiable, so the displayed derivative is an ordinary derivative. If $y$ is only $C^1$, the same stationarity condition may still make sense weakly, but the term $d(\partial_{y'}L)/dx$ may not exist pointwise; for example, an absolutely continuous momentum with a corner has a [weak derivative](/page/Weak%20Derivative) but not a classical derivative at the corner. The fixed-endpoint hypothesis is also essential for this form of the equation, since allowing variations with $h(a)$ or $h(b)$ nonzero produces boundary terms and natural boundary conditions rather than only the interior Euler-Lagrange equation.
We therefore need terminology for curves selected by this necessary differential equation, independently of whether they minimise the functional. This terminology separates two ideas that are often conflated: being a minimiser is a variational property, while satisfying the Euler-Lagrange equation is a differential property. The distinction matters because later examples will first solve the differential equation to obtain candidate curves, then use separate arguments to decide which candidates minimise.
[definition: Extremal]
Let $L \in C^2([a,b]\times \mathbb R^n\times \mathbb R^n;\mathbb R)$. A curve $y\in C^2((a,b);\mathbb R^n)$ is a classical extremal for
\begin{align*}
I[y]=\int_a^b L(x,y(x),y'(x))\,dx
\end{align*}
if it satisfies the Euler-Lagrange equation associated with $L$ at every point of $(a,b)$.
[/definition]
The displayed theorem already includes vector-valued curves, so the component interpretation is immediate: applying integration by parts to each coordinate gives a system of second-order ODEs. Each coordinate of the unknown curve contributes one Euler-Lagrange equation, and the compact support of every component of $h$ is what prevents extra endpoint conditions from appearing in the statement. The hypotheses also exclude constrained variations such as curves forced to remain on a sphere; in that setting the admissible perturbations are tangent to the constraint, and the unconstrained vector equation must be replaced by a projected equation or by Lagrange multipliers. As in the scalar case, the result is necessary for smooth stationary curves and does not decide whether the extremal is a minimiser.
[example: Shortest Paths in the Plane]
For curves $y:[a,b]\to\mathbb R$ written as graphs, the length functional has Lagrangian
\begin{align*}
L(y,p)=\sqrt{1+p^2}.
\end{align*}
Since $L$ has no $y$ dependence, $\partial_y L=0$. Differentiating the velocity dependence gives
\begin{align*}
\partial_p L(y,p)=\frac{1}{2}(1+p^2)^{-1/2}\cdot 2p=\frac{p}{\sqrt{1+p^2}}.
\end{align*}
Thus the Euler-Lagrange equation for a stationary graph is
\begin{align*}
0-\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right)=0.
\end{align*}
Equivalently,
\begin{align*}
\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right)=0.
\end{align*}
Therefore there is a constant $C$ such that
\begin{align*}
\frac{y'(x)}{\sqrt{1+(y'(x))^2}}=C
\end{align*}
for every $x\in(a,b)$. To see that this forces $y'$ to be constant, define $\Phi(p)=p/\sqrt{1+p^2}$. Then
\begin{align*}
\Phi'(p)=\frac{1}{\sqrt{1+p^2}}-\frac{p^2}{(1+p^2)^{3/2}}=\frac{1+p^2-p^2}{(1+p^2)^{3/2}}=\frac{1}{(1+p^2)^{3/2}}>0.
\end{align*}
Hence $\Phi$ is strictly increasing, so $\Phi(y'(x))=C$ implies that $y'(x)$ is the same constant for all $x$. Integrating $y'=m$ gives $y(x)=mx+d$, so the extremals are straight line segments. The variational equation recovers the geometric fact that straight paths are the stationary curves for planar graph length.
[/example]
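A short numerical comparison illustrates the conserved quantity $\Phi(y')=y'/\sqrt{1+(y')^2}$. The sketch below samples $\Phi$ along two graphs joining $(0,0)$ to $(1,1)$, the line $y=x$ and the parabola $y=x^2$, and also compares their lengths. The choice of competitor and the function names are assumptions made for illustration.

```python
import math

def momentum(p):
    """Phi(p) = p / sqrt(1 + p^2), the quantity that is constant along extremals."""
    return p / math.sqrt(1.0 + p * p)

def graph_length(dy, a, b, n=4000):
    """Midpoint-rule approximation of int_a^b sqrt(1 + y'(x)^2) dx."""
    width = (b - a) / n
    return width * sum(math.sqrt(1.0 + dy(a + (i + 0.5) * width) ** 2) for i in range(n))

# Derivatives of two graphs joining (0, 0) to (1, 1).
d_line = lambda x: 1.0
d_parabola = lambda x: 2.0 * x

samples = [0.1, 0.5, 0.9]
line_momenta = [momentum(d_line(x)) for x in samples]
parabola_momenta = [momentum(d_parabola(x)) for x in samples]
length_line = graph_length(d_line, 0.0, 1.0)
length_parabola = graph_length(d_parabola, 0.0, 1.0)
print(line_momenta, parabola_momenta, length_line, length_parabola)
```

The momentum is constant along the line and varies along the parabola, so only the line satisfies the Euler-Lagrange equation; the length comparison is consistent with this but does not by itself prove minimality.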
## First Integrals and the Beltrami Identity
Solving the Euler-Lagrange equation directly may be difficult because it is usually second order. The next question is whether special structure in $L$ lowers the order. When the Lagrangian has no explicit $x$ dependence, a conserved quantity appears.
[definition: Autonomous Lagrangian]
Let $\mathcal A\subset C^1([a,b];\mathbb R^n)$ be an admissible class of curves. A functional $I:\mathcal A\to\mathbb R$ is generated by an autonomous Lagrangian if there exists $L\in C^2(\mathbb R^n\times\mathbb R^n;\mathbb R)$ such that
\begin{align*}
I[y]=\int_a^b L(y(x),y'(x))\,dx,
\qquad y\in\mathcal A.
\end{align*}
[/definition]
Autonomy means that translating the independent variable does not change the rule assigning cost. We now need the conserved quantity produced by this symmetry, because it reduces the Euler-Lagrange equation to a first-order relation.
[quotetheorem:3505]
[citeproof:3505]
This identity is especially useful in scalar problems where the first integral can be solved for $y'$ as a function of $y$. Autonomy is essential: if $L(x,y,p)=x p^2/2$, then an Euler-Lagrange solution need not make $L-\partial_pL\,p=-xp^2/2$ constant, because differentiating this expression along a solution leaves precisely the explicit term $\partial_xL=p^2/2$, which vanishes only where $y'=0$. The identity is only a first integral along curves that already satisfy the Euler-Lagrange equation; it does not prove that a curve is stationary, and it may not determine $y$ uniquely without endpoint data and a solvable first-order relation. The $C^2$ hypotheses on $L$ and on the curve ensure that all quantities in the differentiated conserved expression are classical. In the brachistochrone and catenoid computations below, the identity is the step that lowers the Euler-Lagrange equation to a separable first-order equation before the explicit parametrisations appear.
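The non-autonomous counterexample can be made concrete. For $L(x,y,p)=xp^2/2$ the Euler-Lagrange equation is $\frac{d}{dx}(x\,y')=0$, so on $x>0$ the solutions satisfy $y'=k/x$ for a constant $k$. The sketch below, with the arbitrary choice $k=1$, evaluates the momentum $\partial_pL$ and the Beltrami expression $L-p\,\partial_pL$ at two points; the variable names are illustrative.

```python
k = 1.0                                      # arbitrary constant of integration (assumption)
y_prime = lambda x: k / x                    # solves d/dx (x y') = 0 on x > 0
momentum = lambda x: x * y_prime(x)          # dL/dp = x p, constant along the solution

def beltrami(x):
    """L - p * dL/dp evaluated along the solution, with L = x p^2 / 2."""
    p = y_prime(x)
    return x * p * p / 2.0 - p * momentum(x)

print(momentum(1.0), momentum(2.0))          # equal: the momentum is conserved
print(beltrami(1.0), beltrami(2.0))          # unequal: the Beltrami expression is not
```

Here the conserved quantity is the momentum, which reflects the missing $y$ dependence of $L$, while the Beltrami expression $-k^2/(2x)$ changes with $x$.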
The brachistochrone is the standard example: time minimisation under gravity becomes an autonomous variational problem after choosing the vertical coordinate convention.
[example: Brachistochrone and the Cycloid]
Let $g>0$, and let $y>0$ measure vertical distance fallen from rest. Conservation of mechanical energy gives $\frac12 v^2=gy$, hence $v=\sqrt{2gy}$. For a graph $y=y(x)$, the arclength element is $ds=\sqrt{1+(y')^2}\,dx$, so the travel time is
\begin{align*}
T[y]=\int \frac{ds}{v}=\int \frac{\sqrt{1+(y')^2}}{\sqrt{2gy}}\,dx.
\end{align*}
Thus
\begin{align*}
L(y,p)=\frac{\sqrt{1+p^2}}{\sqrt{2gy}}.
\end{align*}
The Lagrangian is autonomous. Its momentum is
\begin{align*}
\partial_pL(y,p)=\frac{1}{\sqrt{2gy}}\cdot \frac{1}{2}(1+p^2)^{-1/2}\cdot 2p=\frac{p}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Therefore
\begin{align*}
L(y,p)-p\,\partial_pL(y,p)=\frac{\sqrt{1+p^2}}{\sqrt{2gy}}-\frac{p^2}{\sqrt{2gy}\sqrt{1+p^2}}=\frac{1+p^2-p^2}{\sqrt{2gy}\sqrt{1+p^2}}.
\end{align*}
Hence, by the *Beltrami Identity*, an extremal satisfies
\begin{align*}
\frac{1}{\sqrt{2gy}\sqrt{1+(y')^2}}=C
\end{align*}
for some constant $C>0$. Squaring gives
\begin{align*}
1=2gC^2y(1+(y')^2).
\end{align*}
If $A=1/(2gC^2)$, then
\begin{align*}
y(1+(y')^2)=A.
\end{align*}
Equivalently,
\begin{align*}
(y')^2=\frac{A-y}{y}.
\end{align*}
Write $A=2a$ with $a>0$ and set
\begin{align*}
y=a(1-\cos\theta).
\end{align*}
The cycloid parametrisation
\begin{align*}
x=a(\theta-\sin\theta)+x_0
\end{align*}
has
\begin{align*}
\frac{dy}{d\theta}=a\sin\theta
\end{align*}
and
\begin{align*}
\frac{dx}{d\theta}=a(1-\cos\theta).
\end{align*}
Therefore, where $dx/d\theta\ne0$,
\begin{align*}
y'=\frac{dy/d\theta}{dx/d\theta}=\frac{\sin\theta}{1-\cos\theta}.
\end{align*}
Substituting this into the first-order equation gives
\begin{align*}
y(1+(y')^2)=a(1-\cos\theta)\left(1+\frac{\sin^2\theta}{(1-\cos\theta)^2}\right).
\end{align*}
Since $\sin^2\theta=1-\cos^2\theta$, the factor in parentheses is
\begin{align*}
1+\frac{\sin^2\theta}{(1-\cos\theta)^2}=\frac{(1-\cos\theta)^2+\sin^2\theta}{(1-\cos\theta)^2}=\frac{2(1-\cos\theta)}{(1-\cos\theta)^2}=\frac{2}{1-\cos\theta}.
\end{align*}
Thus
\begin{align*}
y(1+(y')^2)=a(1-\cos\theta)\frac{2}{1-\cos\theta}=2a=A.
\end{align*}
So the extremal is an arc of a cycloid,
\begin{align*}
x=a(\theta-\sin\theta)+x_0,\qquad y=a(1-\cos\theta),
\end{align*}
with $a$ and $x_0$ determined by the endpoint conditions. The variational first integral has reduced the time-minimising curve to the classical cycloid.
[/example]
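The cycloid computation can be sanity-checked numerically. The sketch below verifies the first integral $y(1+(y')^2)=2a$ at a few parameter values and compares the descent time along the cycloid arc from $\theta=0$ to $\theta=\pi$ with the descent time along the straight chord to the same endpoint. The endpoint, the value of $g$, and the function names are illustrative assumptions; the chord time uses the elementary inclined-plane formula.

```python
import math

def cycloid_point(theta, a, x0=0.0):
    """Point (x, y) on the cycloid, with y measured downward."""
    return a * (theta - math.sin(theta)) + x0, a * (1.0 - math.cos(theta))

def first_integral(theta, a):
    """y * (1 + y'^2) along the cycloid, with y' = sin(theta) / (1 - cos(theta))."""
    y = a * (1.0 - math.cos(theta))
    slope = math.sin(theta) / (1.0 - math.cos(theta))
    return y * (1.0 + slope * slope)

def cycloid_time(theta_end, a, g=9.81, n=2000):
    """Midpoint rule in theta for T = int ds / v with v = sqrt(2 g y)."""
    dtheta = theta_end / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dtheta
        speed = math.sqrt(2.0 * g * a * (1.0 - math.cos(t)))
        arc = a * math.hypot(1.0 - math.cos(t), math.sin(t))   # |d(x, y)/d theta|
        total += arc / speed * dtheta
    return total

def line_time(x_end, y_end, g=9.81):
    """Descent time from rest along the straight chord; y_end is the vertical drop."""
    return math.sqrt(2.0 * (x_end ** 2 + y_end ** 2) / (g * y_end))

a, g = 1.0, 9.81
x_end, y_end = cycloid_point(math.pi, a)
T_cycloid = cycloid_time(math.pi, a, g)
T_line = line_time(x_end, y_end, g)
print(T_cycloid, T_line)
```

In the parameter $\theta$ the integrand $ds/v$ reduces to the constant $\sqrt{a/g}$, so the cycloid time is $\theta_{\text{end}}\sqrt{a/g}$, and the comparison with the chord shows the cycloid is faster for this endpoint.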
The same principle appears in geometric problems. Symmetries of the integrand create conserved quantities, and these conserved quantities reduce the order of the equations before explicit integration begins.
[example: Minimal Surfaces of Revolution]
Rotate the positive graph $r=r(x)>0$ about the $x$-axis. Up to the constant factor $2\pi$, the surface area functional is
\begin{align*}
A[r]=\int_a^b r\sqrt{1+(r')^2}\,dx.
\end{align*}
For $L(r,p)=r\sqrt{1+p^2}$, the momentum is
\begin{align*}
\partial_pL(r,p)=r\cdot \frac{1}{2}(1+p^2)^{-1/2}\cdot 2p=\frac{rp}{\sqrt{1+p^2}}.
\end{align*}
Since $L$ is autonomous, the *Beltrami Identity* gives
\begin{align*}
L(r,r')-r'\partial_pL(r,r')=C.
\end{align*}
The left-hand side is
\begin{align*}
r\sqrt{1+(r')^2}-r'\frac{rr'}{\sqrt{1+(r')^2}}=\frac{r(1+(r')^2)-r(r')^2}{\sqrt{1+(r')^2}}=\frac{r}{\sqrt{1+(r')^2}}.
\end{align*}
Thus an extremal satisfies
\begin{align*}
\frac{r}{\sqrt{1+(r')^2}}=C.
\end{align*}
Because $r>0$, the constant in this equation must be positive. Squaring gives
\begin{align*}
\frac{r^2}{1+(r')^2}=C^2.
\end{align*}
Multiplying by $1+(r')^2$ and then dividing by $C^2$ gives
\begin{align*}
1+(r')^2=\frac{r^2}{C^2}.
\end{align*}
Hence
\begin{align*}
(r')^2=\frac{r^2-C^2}{C^2}.
\end{align*}
On an interval where $r'\ne0$, separation gives
\begin{align*}
\frac{dr}{dx}=\pm \frac{\sqrt{r^2-C^2}}{C}.
\end{align*}
Equivalently,
\begin{align*}
dx=\pm \frac{C\,dr}{\sqrt{r^2-C^2}}.
\end{align*}
Using the substitution $r=Cu$, so that $dr=C\,du$, the integral becomes
\begin{align*}
\int \frac{C\,dr}{\sqrt{r^2-C^2}}=C\int \frac{du}{\sqrt{u^2-1}}=C\,\operatorname{arcosh}(u)=C\,\operatorname{arcosh}\left(\frac{r}{C}\right).
\end{align*}
Therefore
\begin{align*}
x-x_0=\pm C\,\operatorname{arcosh}\left(\frac{r}{C}\right).
\end{align*}
Applying $\cosh$ and using that $\cosh$ is even gives
\begin{align*}
r(x)=C\cosh\left(\frac{x-x_0}{C}\right).
\end{align*}
Indeed, for this formula,
\begin{align*}
r'(x)=\sinh\left(\frac{x-x_0}{C}\right),
\end{align*}
and
\begin{align*}
\frac{r}{\sqrt{1+(r')^2}}=\frac{C\cosh((x-x_0)/C)}{\sqrt{1+\sinh^2((x-x_0)/C)}}=\frac{C\cosh((x-x_0)/C)}{\cosh((x-x_0)/C)}=C.
\end{align*}
The corresponding surface of revolution is a catenoid, with $C$ and $x_0$ fixed by the prescribed boundary radii and the separation of the boundary circles.
[/example]
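Fixing $C$ from boundary data requires solving a transcendental equation. The sketch below treats the symmetric case with equal radii $R$ at $x=\pm d$, where $x_0=0$ and the condition is $C\cosh(d/C)=R$, and solves it by bisection. The data $R=2$, $d=1$, the bracket, and the function names are assumptions made for illustration. This equation can have two, one, or no solutions depending on $R/d$, and the bracket chosen here selects a single root.

```python
import math

def boundary_radius(C, half_width):
    """Radius at x = +/- half_width of the symmetric catenary r(x) = C cosh(x / C)."""
    return C * math.cosh(half_width / C)

def solve_C(R, half_width, lo, hi, tol=1e-12):
    """Bisection for boundary_radius(C) = R on a bracket [lo, hi] with a sign change."""
    f_lo = boundary_radius(lo, half_width) - R
    f_hi = boundary_radius(hi, half_width) - R
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = boundary_radius(mid, half_width) - R
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def first_integral(x, C):
    """r / sqrt(1 + r'^2) along r(x) = C cosh(x / C); should equal C."""
    r = C * math.cosh(x / C)
    dr = math.sinh(x / C)
    return r / math.sqrt(1.0 + dr * dr)

# Illustrative data: circles of radius 2 at x = -1 and x = 1.
C = solve_C(R=2.0, half_width=1.0, lo=0.9, hi=2.0)
print(C, boundary_radius(C, 1.0), first_integral(0.3, C))
```

Deciding which catenary, if any, actually minimises area is a separate question that the first integral does not answer.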
## Geodesics and Constrained Coordinates
The Euler-Lagrange equation also governs curves constrained to lie on a surface, but the coordinates used to describe the surface matter. The question is how to write a length or energy functional in local coordinates so that ordinary Euler-Lagrange equations encode intrinsic geometry.
[example: Geodesics on the Sphere]
Let the unit sphere be parametrised away from the poles by
\begin{align*}
q(\theta,\varphi)=(\sin\theta\cos\varphi,\sin\theta\sin\varphi,\cos\theta),
\end{align*}
where $\theta\in(0,\pi)$ and $\varphi\in\mathbb R/2\pi\mathbb Z$. For the coordinate Lagrangian
\begin{align*}
L(\theta,\varphi,\theta',\varphi')=\frac12\left((\theta')^2+\sin^2\theta\,(\varphi')^2\right),
\end{align*}
the energy is $E[\theta,\varphi]=\int_a^b L(\theta,\varphi,\theta',\varphi')\,dt$.
First compute the $\varphi$ equation. Since $L$ has no explicit $\varphi$ dependence,
\begin{align*}
\partial_\varphi L=0.
\end{align*}
Also,
\begin{align*}
\partial_{\varphi'}L=\frac12\cdot 2\sin^2\theta\,\varphi'=\sin^2\theta\,\varphi'.
\end{align*}
Thus the Euler-Lagrange equation in the $\varphi$ coordinate is
\begin{align*}
0-\frac{d}{dt}\left(\sin^2\theta\,\varphi'\right)=0.
\end{align*}
Equivalently,
\begin{align*}
\frac{d}{dt}\left(\sin^2\theta\,\varphi'\right)=0.
\end{align*}
Therefore $\sin^2\theta\,\varphi'$ is constant along the curve; this is the conserved angular momentum about the polar axis.
Now compute the $\theta$ equation. The velocity derivative is
\begin{align*}
\partial_{\theta'}L=\theta'.
\end{align*}
The $\theta$ derivative is
\begin{align*}
\partial_\theta L=\frac12\cdot 2\sin\theta\cos\theta\,(\varphi')^2=\sin\theta\cos\theta\,(\varphi')^2.
\end{align*}
Hence
\begin{align*}
\sin\theta\cos\theta\,(\varphi')^2-\frac{d}{dt}(\theta')=0.
\end{align*}
Since $\frac{d}{dt}(\theta')=\theta''$, this is
\begin{align*}
\theta''-\sin\theta\cos\theta\,(\varphi')^2=0.
\end{align*}
To see the geometric meaning, introduce the moving tangent vectors
\begin{align*}
e_\theta=\partial_\theta q,\qquad e_\varphi=\frac{1}{\sin\theta}\partial_\varphi q.
\end{align*}
Then $e_\theta$ and $e_\varphi$ are orthonormal tangent vectors on the coordinate patch, and
\begin{align*}
q'=\theta'e_\theta+\sin\theta\,\varphi'e_\varphi.
\end{align*}
Differentiating this expression and collecting the $e_\theta$, $e_\varphi$, and normal $q$ components gives
\begin{align*}
q''=\left(\theta''-\sin\theta\cos\theta\,(\varphi')^2\right)e_\theta+\left(\sin\theta\,\varphi''+2\cos\theta\,\theta'\varphi'\right)e_\varphi-\left((\theta')^2+\sin^2\theta\,(\varphi')^2\right)q.
\end{align*}
The $\theta$ equation makes the $e_\theta$ component vanish. Expanding the conserved momentum equation gives
\begin{align*}
\frac{d}{dt}\left(\sin^2\theta\,\varphi'\right)=2\sin\theta\cos\theta\,\theta'\varphi'+\sin^2\theta\,\varphi''.
\end{align*}
Because $\sin\theta>0$ on the coordinate patch, the equation $\frac{d}{dt}(\sin^2\theta\,\varphi')=0$ is equivalent to
\begin{align*}
\sin\theta\,\varphi''+2\cos\theta\,\theta'\varphi'=0.
\end{align*}
Thus the $e_\varphi$ component of $q''$ also vanishes, so
\begin{align*}
q''=-\left((\theta')^2+\sin^2\theta\,(\varphi')^2\right)q.
\end{align*}
The squared speed is
\begin{align*}
|q'|^2=(\theta')^2+\sin^2\theta\,(\varphi')^2.
\end{align*}
Differentiating it gives
\begin{align*}
\frac{d}{dt}|q'|^2=2\theta'\theta''+2\sin\theta\cos\theta\,\theta'(\varphi')^2+2\sin^2\theta\,\varphi'\varphi''.
\end{align*}
Using $\theta''=\sin\theta\cos\theta(\varphi')^2$ and $\sin^2\theta\,\varphi''=-2\sin\theta\cos\theta\,\theta'\varphi'$ in this line gives
\begin{align*}
\frac{d}{dt}|q'|^2=2\sin\theta\cos\theta\,\theta'(\varphi')^2+2\sin\theta\cos\theta\,\theta'(\varphi')^2-4\sin\theta\cos\theta\,\theta'(\varphi')^2=0.
\end{align*}
So $|q'|=v$ is constant. If $v>0$, the embedded curve satisfies
\begin{align*}
q''+v^2q=0.
\end{align*}
Solving this linear equation in $\mathbb R^3$ gives
\begin{align*}
q(t)=u\cos(v(t-t_0))+w\sin(v(t-t_0)),
\end{align*}
where $u=q(t_0)$ and $w=q'(t_0)/v$. Here $|u|=1$ because $q$ lies on the unit sphere, $|w|=1$ by the definition of $v$, and $u\cdot w=0$ because differentiating $|q|^2=1$ gives $q\cdot q'=0$. The curve therefore lies in the plane spanned by $u$ and $w$, and that plane passes through the origin. Its intersection with the unit sphere is a great circle. Thus the [coordinate Euler-Lagrange equations](/theorems/6835) express the invariant fact that constant-speed geodesics on the sphere are great circles, while the conserved quantity $\sin^2\theta\,\varphi'$ records the rotational symmetry about the polar axis.
[/example]
This example is a warning about coordinates rather than a new theorem. The Euler-Lagrange equations are written in the chosen chart, while the geometric curve may have a simpler invariant description.
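The great-circle conclusion of the sphere example can be sanity-checked numerically. The sketch below is an illustration, not part of the derivation. It assumes the standard embedding $q=(\sin\theta\cos\varphi,\sin\theta\sin\varphi,\cos\theta)$, under which $\sin^2\theta\,\varphi'=q_1q_2'-q_2q_1'$. The orthonormal pair `u`, `w` and the speed `v` are hypothetical values chosen for the check.

```python
import math

# Hypothetical orthonormal pair (u, w) and speed v; any such choice works.
u = (1.0, 0.0, 0.0)
w = (0.0, math.cos(0.7), math.sin(0.7))
v = 1.3

def q(t):
    """Great circle q(t) = u cos(vt) + w sin(vt)."""
    c, s = math.cos(v * t), math.sin(v * t)
    return tuple(ui * c + wi * s for ui, wi in zip(u, w))

def dq(t):
    """Velocity q'(t) of the great circle."""
    c, s = math.cos(v * t), math.sin(v * t)
    return tuple(v * (-ui * s + wi * c) for ui, wi in zip(u, w))

def polar_momentum(t):
    """sin^2(theta) * phi' written in Cartesian form as x*y' - y*x'."""
    (x, y, _), (dx, dy, _) = q(t), dq(t)
    return x * dy - y * dx

times = [0.1 * k for k in range(50)]
radii = [math.sqrt(sum(c * c for c in q(t))) for t in times]
speeds = [math.sqrt(sum(c * c for c in dq(t))) for t in times]
momenta = [polar_momentum(t) for t in times]

print(max(abs(r - 1.0) for r in radii))   # deviation from the unit sphere
print(max(abs(s - v) for s in speeds))    # deviation from constant speed
print(max(momenta) - min(momenta))        # drift of the conserved quantity
```

All three printed quantities should be at the level of floating-point rounding, reflecting that the curve stays on the sphere, has constant speed, and conserves the polar momentum.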
[remark: Energy Instead of Length]
For fixed endpoints and fixed parameter interval, geodesics are often obtained from the energy functional rather than the length functional. The energy integrand is smooth in the velocities, while the length integrand is homogeneous of degree one in the velocity, hence invariant under reparametrisation, and fails to be differentiable at zero velocity. Critical points of the energy automatically have constant speed, and they are precisely the geodesics parametrised proportionally to arclength.
[/remark]
## Weak Euler-Lagrange Equations and DuBois-Reymond
The classical derivation assumed enough smoothness to differentiate $\partial_{y'}L(x,y,y')$ with respect to $x$. In variational problems, a minimiser may first be known only in a weaker class. The question is how much of the Euler-Lagrange equation survives before full regularity is established.
[definition: Weak Euler-Lagrange Equation]
Let $L\in C^1([a,b]\times\mathbb R^n\times\mathbb R^n;\mathbb R)$ and let $y\in C^1([a,b];\mathbb R^n)$. The curve $y$ satisfies the weak Euler-Lagrange equation if
\begin{align*}
\int_a^b \left(\partial_y L(x,y,y')\cdot h + \partial_{y'}L(x,y,y')\cdot h'\right)\,dx=0
\end{align*}
for every $h\in C_c^1((a,b);\mathbb R^n)$.
[/definition]
The weak form is exactly the vanishing of the first variation. DuBois-Reymond's lemma converts such an identity into an integrated form of the Euler-Lagrange equation, requiring less differentiability than the classical equation.
[quotetheorem:3521]
[citeproof:3521]
Applied with $f=\partial_yL(x,y,y')$ and $g=\partial_{y'}L(x,y,y')$, the lemma says that $\partial_{y'}L$ is an antiderivative of $\partial_yL$ up to a constant. The continuity assumptions are what make the conclusion a pointwise identity for every $x$; with merely integrable $f$ and $g$, the same argument belongs to distribution theory and gives an almost-everywhere or weak conclusion instead. Compact support of $h$ is also essential: if endpoint values of $h$ were allowed, integration by parts would leave boundary terms, and $g$ could be constrained by endpoint conditions rather than only by a constant vector in the interior. The constant vector reflects the fact that testing only against derivatives $h'$ detects the derivative of $g-\int f$ but not its additive constant.
The next result records the regularity upgrade from the weak integral identity back to the pointwise equation used earlier in the chapter. DuBois-Reymond has already identified the momentum $\partial_{y'}L$ as an antiderivative of $\partial_yL$ up to a constant, but that statement is still an integrated relation. To recover the classical Euler-Lagrange equation, we need enough regularity to differentiate the momentum in the ordinary sense and compare its derivative with $\partial_yL$ point by point.
[quotetheorem:6988]
[citeproof:6988]
This weak-to-classical passage is the prototype for later variational analysis. The $C^1$ momentum hypothesis is the point at which the integrated DuBois-Reymond relation becomes an ordinary differential equation; if the momentum is only continuous, the identity still says it is an integral of a [continuous function](/page/Continuous%20Function) plus a constant and hence differentiable, but that conclusion must come from the lemma rather than from a pre-existing classical derivative. If the right-hand side were only $L^1$, the derivative of the momentum would generally exist only almost everywhere or distributionally, so the classical equation at every point would be too strong. Thus the theorem records a regularity threshold, not a general smoothness theorem for all weak stationary curves.
[example: Weak Form for the Dirichlet Energy in One Dimension]
For the one-dimensional Dirichlet energy, the Lagrangian is $L(y,p)=\frac12 p^2$, so
\begin{align*}
\partial_yL(y,p)=0.
\end{align*}
Also,
\begin{align*}
\partial_pL(y,p)=p.
\end{align*}
Therefore the weak Euler-Lagrange equation becomes
\begin{align*}
\int_a^b y'(x)h'(x)\,dx=0
\end{align*}
for every $h\in C_c^1((a,b))$.
This is the DuBois-Reymond form with $f(x)=0$ and $g(x)=y'(x)$. By the *DuBois-Reymond Lemma*, there is a constant $C\in\mathbb R$ such that
\begin{align*}
y'(x)=C+\int_a^x 0\,ds.
\end{align*}
Since
\begin{align*}
\int_a^x 0\,ds=0,
\end{align*}
we get
\begin{align*}
y'(x)=C
\end{align*}
on $(a,b)$. Fixing any $x_0\in(a,b)$ and integrating from $x_0$ to $x$ gives
\begin{align*}
y(x)-y(x_0)=\int_{x_0}^x C\,ds.
\end{align*}
The constant integral is
\begin{align*}
\int_{x_0}^x C\,ds=C(x-x_0),
\end{align*}
so
\begin{align*}
y(x)=Cx+\bigl(y(x_0)-Cx_0\bigr).
\end{align*}
Thus every weak stationary curve for the one-dimensional Dirichlet energy is affine: the weak equation alone, with no second derivative of $y$ assumed in advance, already forces a straight-line graph.
[/example]
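The weak identity in the Dirichlet example can be tested numerically. The following sketch works on $[a,b]=[0,1]$ with one hypothetical $C^1$ test function `bump` supported in $[0.1,0.9]$, and approximates $\int_0^1 y'h'\,dx$ by the midpoint rule. It only illustrates the statement for this single choice of $h$; it is not a proof.

```python
def bump(x, centre=0.5, radius=0.4):
    """C^1 test function supported in [centre - radius, centre + radius]."""
    s = (x - centre) / radius
    return (1.0 - s * s) ** 2 if abs(s) < 1.0 else 0.0

def bump_prime(x, centre=0.5, radius=0.4):
    """Derivative of bump; it vanishes at the edge of the support."""
    s = (x - centre) / radius
    return -4.0 * s * (1.0 - s * s) / radius if abs(s) < 1.0 else 0.0

def weak_form(y_prime, n=4000):
    """Midpoint-rule approximation of the integral of y'(x) h'(x) over [0, 1]."""
    dx = 1.0 / n
    return sum(y_prime((k + 0.5) * dx) * bump_prime((k + 0.5) * dx)
               for k in range(n)) * dx

affine = weak_form(lambda x: 2.0)        # y(x) = 2x + 1, an affine curve
parabola = weak_form(lambda x: 2.0 * x)  # y(x) = x^2, not affine

print(affine, parabola)
```

The affine curve gives a value at rounding level, while the parabola gives a clearly nonzero value. Integrating by parts shows that value equals $-2\int_0^1 h\,dx$.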
## Regularity and the Role of the Legendre Condition
The Euler-Lagrange equation may be implicit in the highest derivative. The final issue in this chapter is when a weak or first-order integrated equation actually gives a regular second-order ODE for $y$. The answer depends on whether the map $p\mapsto \partial_{y'}L(x,y,p)$ can be inverted locally.
[definition: Regular Lagrangian in One Dimension]
Let $L\in C^2([a,b]\times\mathbb R\times\mathbb R;\mathbb R)$. The Lagrangian is regular along a curve $y\in C^1([a,b];\mathbb R)$ if
\begin{align*}
\partial_{y'y'}L(x,y(x),y'(x))\ne 0
\end{align*}
for every $x\in[a,b]$.
[/definition]
Regularity is the scalar version of nondegeneracy of the momentum variable $\partial_{y'}L$. The analytic obstruction is that the Euler-Lagrange equation contains the derivative of $\partial_{y'}L(x,y,y')$, and after expanding this derivative the coefficient of $y''$ is $\partial_{y'y'}L$. If that coefficient vanishes, the equation may fail to determine an acceleration at all. Under regularity, the implicit second-order relation can instead be read as an ordinary differential equation for $y$.
[quotetheorem:6989]
[citeproof:6989]
This result does not prove that every weak extremal is smooth; it identifies the mechanism by which smoothness is propagated once the momentum relation is nondegenerate. The failure mode is concrete: for $L(p)=p^4$, the coefficient $\partial_{pp}L=12p^2$ vanishes whenever $p=0$, so the expanded Euler-Lagrange equation cannot be divided by $\partial_{pp}L$ at points where $y'=0$. In such a degenerate problem, the Euler-Lagrange equation may impose a lower-order relation on the momentum without selecting a unique acceleration. Later necessary conditions refine this nondegeneracy into the Legendre condition for minima, but even regularity and the ODE form remain necessary-equation tools rather than proofs of minimality.
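The division step described above can be made concrete. Expanding the total derivative by the chain rule gives $\partial_{pp}L\,y''=\partial_yL-\partial_{xp}L-\partial_{yp}L\,y'$. The sketch below solves this for $y''$ from supplied values of the partial derivatives. The function name `acceleration` and its argument names are hypothetical, introduced only for this illustration.

```python
def acceleration(L_y, L_xp, L_yp, L_pp, p, tol=1e-12):
    """Solve L_pp * y'' = L_y - L_xp - L_yp * p for y''.

    Returns None when L_pp is (numerically) zero, i.e. when the
    Lagrangian is not regular at this point.
    """
    if abs(L_pp) < tol:
        return None  # the equation does not determine y''
    return (L_y - L_xp - L_yp * p) / L_pp

# L = p^2/2 - y^2/2 at y = 0.3: L_y = -y, L_pp = 1, so y'' = -y.
oscillator = acceleration(L_y=-0.3, L_xp=0.0, L_yp=0.0, L_pp=1.0, p=0.8)

# L = p^4 at p = 0: L_pp = 12 p^2 = 0, so regularity fails.
p0 = 0.0
degenerate = acceleration(L_y=0.0, L_xp=0.0, L_yp=0.0, L_pp=12.0 * p0 ** 2, p=p0)

print(oscillator, degenerate)
```

The first call returns the acceleration of the regular problem. The second returns `None`, mirroring the fact that the expanded equation cannot be divided by $\partial_{pp}L$ where $y'=0$.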
[remark: Necessary Equations Versus Minimality]
The Euler-Lagrange equation, the Beltrami identity, and the integrated DuBois-Reymond form are all first-order conditions: every extremum must satisfy them, but they only detect stationarity. They do not decide whether the stationary curve minimises the functional. The next part of the course studies second variation, convexity, and conjugate points to distinguish minimisers from other extremals.
[/remark]
The derivation of the Euler–Lagrange equation so far assumed that all admissible curves take fixed boundary values, but many problems allow endpoints to move or impose partial constraints. The next chapter generalises the integration-by-parts argument to those settings, yielding natural boundary conditions and transversality relations that replace the simple fixed-endpoint assumption.
# 3. Natural Boundary Conditions and Constraints
Earlier chapters derived the Euler--Lagrange equation from the first variation in the fixed-endpoint case, where all admissible variations vanish at the boundary. This chapter loosens those assumptions: once endpoints, corners, and constraints are allowed, the same integration-by-parts calculation contains information not only in the integral term but also in boundary, matching, and multiplier terms.
## Free Endpoints and Boundary Terms
In the fixed-endpoint problem, variations vanish at $a$ and $b$, so the integration-by-parts boundary term disappears before it can say anything. If one or both endpoints may move, that same term becomes part of the stationarity condition. The boundary condition obtained in this way is called natural because it is forced by the variational problem rather than prescribed in the admissible class.
We begin with the scalar functional
\begin{align*}
I:C^1([a,b]) &\to \mathbb R, &
I[y] &= \int_a^b L(x,y(x),y'(x))\,dx,
\end{align*}
where $L \in C^2([a,b]\times \mathbb R \times \mathbb R)$. The admissible class is a subset of $C^1([a,b])$, determined by whichever endpoint conditions are imposed. For a variation $y_\varepsilon = y + \varepsilon h$, the first variation is
\begin{align*}
\delta I[y;h]
= \int_a^b \left(\partial_y L(x,y,y')h + \partial_{y'}L(x,y,y')h'\right)\,dx.
\end{align*}
Integrating by parts separates the interior and endpoint contributions:
\begin{align*}
\delta I[y;h]
= \int_a^b \left(\partial_y L - \frac{d}{dx}\partial_{y'}L\right)h\,dx
+ \left[\partial_{y'}L(x,y,y')h\right]_{a}^{b}.
\end{align*}
This computation is the place where the boundary data enter the variational problem. For fixed endpoints, the values $h(a)$ and $h(b)$ are forced to vanish, so the bracketed term disappears and only the interior Euler--Lagrange equation remains. If an endpoint value is free, the corresponding endpoint value of $h$ becomes an independent first-order test direction. Stationarity must then cancel not only the integral term but also the coefficient of that boundary variation, giving the natural boundary condition.
[quotetheorem:3506]
[citeproof:3506]
This theorem says that a free endpoint does not mean absence of a condition. It means that the conjugate momentum $\partial_{y'}L$ must vanish at that endpoint unless some geometric endpoint restriction supplies a different boundary term. The free-endpoint hypothesis has three separate roles. First, it identifies the allowed boundary variations: when $y(b)$ is free, the value $h(b)$ can be prescribed independently of the interior test function. Second, it supplies the boundary coefficient that must vanish after the Euler--Lagrange equation has removed the integral term. Third, it tells us which endpoint condition should be used in the next stage of solving the boundary-value problem. If the hypothesis is dropped, the conclusion can fail. Consider the fixed-endpoint problem
\begin{align*}
I[y]=\frac{1}{2}\int_0^1 (y')^2\,dx,
\qquad y(0)=0,\qquad y(1)=1.
\end{align*}
The stationary curve is $y(x)=x$, so $\partial_{y'}L(1,y(1),y'(1))=1\ne0$. Thus imposing the natural boundary condition at a fixed endpoint would discard the correct extremal. The theorem is only a first-order necessary condition; it does not decide whether the stationary curve is a minimum, which is why the next examples separate stationarity from minimality and why constrained endpoints require a different boundary equation.
[example: Free Endpoint for the Dirichlet Energy]
Consider
\begin{align*}
I[y] = \frac{1}{2}\int_a^b (y')^2\,dx
\end{align*}
with $y(a)=\alpha$ fixed and $y(b)$ free. Here $L(x,y,p)=\frac{1}{2}p^2$, so
\begin{align*}
\partial_y L(x,y,p)=0,\qquad \partial_pL(x,y,p)=p.
\end{align*}
The Euler--Lagrange expression is therefore
\begin{align*}
\partial_yL-\frac{d}{dx}\partial_pL=0-\frac{d}{dx}(y')=-y''.
\end{align*}
Stationarity in the interior gives $y''=0$, hence $y'$ is constant on $[a,b]$; write $y'(x)=m$. Integrating once gives
\begin{align*}
y(x)=mx+c.
\end{align*}
For an admissible variation $y_\varepsilon=y+\varepsilon h$, the fixed left endpoint gives $h(a)=0$, while the free right endpoint leaves $h(b)$ unrestricted. The first variation is
\begin{align*}
\delta I[y;h]=\int_a^b y'h'\,dx.
\end{align*}
Integration by parts gives
\begin{align*}
\delta I[y;h]=y'(b)h(b)-y'(a)h(a)-\int_a^b y''h\,dx.
\end{align*}
Using $h(a)=0$ and $y''=0$, this reduces to
\begin{align*}
\delta I[y;h]=y'(b)h(b).
\end{align*}
Since $h(b)$ may be chosen arbitrarily, stationarity forces $y'(b)=0$. For the affine candidate $y'(x)=m$, this gives $m=0$. The fixed endpoint condition then gives
\begin{align*}
\alpha=y(a)=0\cdot a+c=c.
\end{align*}
Thus the stationary curve is $y(x)=\alpha$. The free endpoint has selected the horizontal affine curve, rather than leaving every affine solution admissible.
[/example]
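The boundary term in the free-endpoint example can be checked directly. The sketch below takes the hypothetical interval $[a,b]=[0,1]$, the affine candidates $y=\alpha+m(x-a)$, and the single variation $h(x)=x-a$, which satisfies $h(a)=0$ and $h(b)=b-a\neq0$. It compares a finite-difference first variation with the predicted value $y'(b)h(b)=m(b-a)$.

```python
def energy(m, eps, a=0.0, b=1.0):
    """Dirichlet energy of y + eps*h for y = alpha + m(x - a), h = x - a.

    The derivative of y + eps*h is the constant m + eps, so the integral
    is computed exactly; the value of alpha does not affect the energy.
    """
    return 0.5 * (m + eps) ** 2 * (b - a)

def first_variation(m, step=1e-6):
    """Central-difference approximation of d/d(eps) I[y + eps*h] at eps = 0."""
    return (energy(m, step) - energy(m, -step)) / (2.0 * step)

for slope in (0.0, 0.5, 1.5):
    # Predicted boundary term y'(b) h(b) = slope * (b - a) = slope on [0, 1].
    print(slope, first_variation(slope))
```

Only the slope $m=0$ makes the first variation vanish, in agreement with the natural boundary condition $y'(b)=0$.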
The preceding example had a free vertical endpoint over a fixed value of $x$. A slightly richer boundary problem lets the endpoint slide along a prescribed curve, so the endpoint variation has both horizontal and vertical components. The first variation must then be tested only against endpoint displacements tangent to that curve, which leads to a transversality condition rather than the scalar condition $\partial_{y'}L=0$. If the tangency restriction is ignored, the calculation would incorrectly demand orthogonality to all endpoint displacements in $\mathbb R^2$, as if the endpoint were completely free rather than constrained to the track.
[quotetheorem:6990]
[citeproof:6990]
The vector in the theorem is the boundary momentum in the $(x,y)$ plane. The regularity of $\Gamma$ matters because a corner or cusp in the endpoint constraint may not have a single tangent line, and then the first-order condition must be phrased using the actual cone of admissible endpoint velocities. A concrete failure occurs when $\Gamma$ is the union of the positive coordinate axes at the origin: an endpoint at the origin has two one-sided tangent directions rather than a single tangent line, so orthogonality to one chosen line would miss admissible motions along the other branch. The hypothesis that the left endpoint is fixed prevents an additional boundary term at $a$ from entering the same stationarity equation; if both endpoints slide, stationarity contains one transversality condition at each end, and using only the right endpoint can leave an uncancelled term at $a$. The realisation hypothesis also matters: if the endpoint is constrained to move only to one side along a smooth track, the first variation yields a variational inequality against the allowed half-line of endpoint velocities, not orthogonality to the full tangent line. This transversality equation is only a necessary first-order condition, not a sufficient condition for a minimum; for the Dirichlet energy, an affine graph may satisfy the endpoint transversality equation on a horizontal track while still failing to minimise among curves with a different fixed left height. The fixed-abscissa free-ordinate case is recovered by taking $\tau=(0,1)$, while the next section shows that the same boundary-term bookkeeping also appears at interior joining points.
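As an illustration of transversality, take the arc-length integrand $L=\sqrt{1+p^2}$ and the boundary vector $(L-p\,\partial_pL,\partial_pL)$. The sketch below uses a hypothetical straight track of slope `track_slope`, so that $\tau=(1,\text{track\_slope})$. It checks that the orthogonality condition holds exactly when the extremal meets the track at a right angle.

```python
import math

def boundary_vector(p):
    """(L - p*L_p, L_p) for the arc-length integrand L = sqrt(1 + p^2)."""
    root = math.sqrt(1.0 + p * p)
    L_p = p / root
    return (root - p * L_p, L_p)

def transversality_residual(p, tau):
    """Dot product of the boundary vector with the track tangent tau."""
    bx, by = boundary_vector(p)
    return bx * tau[0] + by * tau[1]

track_slope = 0.5                         # hypothetical track y = 0.5 x + const
tau = (1.0, track_slope)
perpendicular_slope = -1.0 / track_slope  # slope of a line orthogonal to the track

residual_perp = transversality_residual(perpendicular_slope, tau)
residual_other = transversality_residual(1.0, tau)
print(residual_perp, residual_other)
```

For arc length the residual equals $(1+p\cdot\text{track\_slope})/\sqrt{1+p^2}$, so it vanishes only for the perpendicular slope. Other integrands generally give a different angle condition.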
## Broken Extremals and Corner Conditions
The Euler--Lagrange equation governs smooth arcs, but variational problems often admit candidates made by joining smooth pieces. The question is what happens at a corner: can a stationary curve change velocity abruptly and still satisfy the variational condition? The first variation answers this by leaving an interior boundary term at the joining point.
Let $c\in(a,b)$ and suppose $y$ is $C^2$ on $[a,c]$ and on $[c,b]$, continuous at $c$, and possibly has distinct one-sided derivatives $y'(c-)$ and $y'(c+)$. Such a curve is called a broken extremal when each smooth arc satisfies the Euler--Lagrange equation. The natural admissible class for this discussion is piecewise smooth, since a genuine corner is not an element of $C^1([a,b])$.
[definition: Broken Extremal]
Let $L:[a,b]\times\mathbb R\times\mathbb R\to\mathbb R$ be a $C^2$ function. Let $X$ be the set of continuous curves $y:[a,b]\to\mathbb R$ that are piecewise $C^1$ on a finite partition of $[a,b]$. Define the functional $I:X\to\mathbb R$ by
\begin{align*}
I[y]=\int_a^b L(x,y(x),y'(x))\,dx,
\end{align*}
where $y'$ is taken on each smooth subinterval. A function $y\in X$ is a broken extremal for $I$ with corner at $c\in(a,b)$ if the restrictions $y|_{[a,c]}$ and $y|_{[c,b]}$ are of class $C^2$ and each restriction satisfies the Euler--Lagrange equation on its open subinterval.
[/definition]
The definition isolates the interior equation on each side. Stationarity of the whole curve still has to control what happens when the corner point is varied, because the split integral creates boundary terms at $c$ from both arcs.
[quotetheorem:3507]
[citeproof:3507]
The first condition is continuity of the momentum conjugate to $y$. The second is continuity of the Hamiltonian quantity associated with translations in $x$, using the sign convention $H=y'\partial_{y'}L-L$. Each hypothesis controls a different possible jump. Vertical freedom of the corner gives continuity of $\partial_{y'}L$; if the admissible class pins the corner height, a jump in momentum can persist because the variation that would detect it is absent. Horizontal freedom gives continuity of $L-y'\partial_{y'}L$; if the joining point $c$ is fixed in advance, the horizontal corner variation is unavailable, and the second condition need not follow. For example, with a piecewise-defined admissible class in which curves must pass through a fixed point $(c,\beta)$, two affine Dirichlet-energy arcs joining $(a,\alpha)$ to $(c,\beta)$ and $(c,\beta)$ to $(b,\gamma)$ are stationary under variations fixing that interior point even when the one-sided slopes differ. The Weierstrass--Erdmann momentum condition would reject such a path only after the corner height is allowed to vary. These conditions are necessary, not sufficient: a curve may satisfy both matching equations and still fail to minimise the functional. The arc-length example below illustrates that, for strictly convex dependence on $y'$, the momentum condition alone can already rule out genuine corners.
[example: Corners for Arc Length]
For the arc-length functional
\begin{align*}
I[y]=\int_a^b \sqrt{1+(y')^2}\,dx,
\end{align*}
the integrand is $L(x,y,p)=\sqrt{1+p^2}$. Differentiating with respect to $p$ gives
\begin{align*}
\partial_pL(x,y,p)=\frac{1}{2}(1+p^2)^{-1/2}\cdot 2p=\frac{p}{\sqrt{1+p^2}}.
\end{align*}
By the first condition in *Weierstrass--Erdmann Corner Conditions*, a stationary broken extremal with a movable corner at $c$ must satisfy
\begin{align*}
\frac{y'(c-)}{\sqrt{1+(y'(c-))^2}}=\frac{y'(c+)}{\sqrt{1+(y'(c+))^2}}.
\end{align*}
To see what this condition implies, define
\begin{align*}
f(p)=\frac{p}{\sqrt{1+p^2}}=p(1+p^2)^{-1/2}.
\end{align*}
Then
\begin{align*}
f'(p)=1\cdot(1+p^2)^{-1/2}+p\left(-\frac{1}{2}\right)(1+p^2)^{-3/2}\cdot 2p.
\end{align*}
Combining the two terms over the common denominator $(1+p^2)^{3/2}$ gives
\begin{align*}
f'(p)=\frac{1+p^2}{(1+p^2)^{3/2}}-\frac{p^2}{(1+p^2)^{3/2}}=\frac{1}{(1+p^2)^{3/2}}>0.
\end{align*}
Thus $f$ is strictly increasing, so equality $f(y'(c-))=f(y'(c+))$ forces
\begin{align*}
y'(c-)=y'(c+).
\end{align*}
The one-sided slopes therefore match at the corner, so a stationary length candidate cannot have a genuine corner there; the variational condition recovers the geometric fact that shortest planar paths are straight line segments.
[/example]
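The injectivity of the arc-length momentum used in this example can also be seen from an explicit inverse: if $m=p/\sqrt{1+p^2}$ then $p=m/\sqrt{1-m^2}$. The sketch below checks the round trip on a few hypothetical sample slopes.

```python
import math

def momentum(p):
    """Arc-length momentum f(p) = p / sqrt(1 + p^2), with values in (-1, 1)."""
    return p / math.sqrt(1.0 + p * p)

def slope_from_momentum(m):
    """Inverse of f on (-1, 1): p = m / sqrt(1 - m^2)."""
    return m / math.sqrt(1.0 - m * m)

slopes = [-5.0, -1.0, -0.2, 0.0, 0.3, 2.0, 10.0]   # sample one-sided slopes
values = [momentum(p) for p in slopes]
recovered = [slope_from_momentum(m) for m in values]

for p, m, r in zip(slopes, values, recovered):
    print(p, m, r)
```

Since each momentum value corresponds to exactly one slope, equal one-sided momenta force equal one-sided slopes, which is the conclusion of the example.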
The corner conditions do not say every broken extremal is a minimiser. They are necessary matching conditions, and later second-variation theory is needed to decide whether a stationary candidate gives a minimum.
## Isoperimetric Constraints and Multipliers
Many variational problems ask for an extremum subject to an integral side condition. The model is to extremise $I[y]$ while requiring another functional $K[y]$ to have a prescribed value. The question is how the Euler--Lagrange equation changes when variations must stay tangent to the constraint surface.
Let $I:C^1([a,b])\to\mathbb R$ be defined by
\begin{align*}
I[y] = \int_a^b L(x,y(x),y'(x))\,dx.
\end{align*}
Let $K:C^1([a,b])\to\mathbb R$ be defined by
\begin{align*}
K[y] = \int_a^b M(x,y(x),y'(x))\,dx.
\end{align*}
The admissible curves lie in the fixed-endpoint affine subspace of $C^1([a,b])$ and satisfy $K[y]=\ell$. The first variation of the constraint is
\begin{align*}
\delta K[y;h]
= \int_a^b \left(\partial_y M - \frac{d}{dx}\partial_{y'}M\right)h\,dx
\end{align*}
for fixed-endpoint variations.
The multiplier argument only works when the constraint has a genuine first-order normal direction at the candidate curve. If the derivative of $K$ vanishes on every admissible direction, then the equation $K[y]=\ell$ may still restrict the admissible class, but its restriction is invisible to first-order variations. The next definition excludes this abnormal case by requiring at least one direction in which the constraint changes to first order.
[definition: Regular Isoperimetric Extremal]
Let $L,M\in C^2([a,b]\times\mathbb R\times\mathbb R)$, and define $I,K:C^1([a,b])\to\mathbb R$ by
\begin{align*}
I[y] = \int_a^b L(x,y(x),y'(x))\,dx
\end{align*}
and
\begin{align*}
K[y] = \int_a^b M(x,y(x),y'(x))\,dx.
\end{align*}
A curve $y\in C^2([a,b])$ in the fixed-endpoint admissible affine subspace of $C^1([a,b])$ is a regular isoperimetric extremal for $I$ subject to $K[y]=\ell$ if $y$ is stationary for $I$ among admissible curves satisfying $K[y]=\ell$, and there exists an admissible variation direction $h\in C^1([a,b])$ with fixed endpoints such that $\delta K[y;h]\ne 0$.
[/definition]
Regularity means the constraint surface has a genuine tangent hyperplane at $y$. The remaining problem is to express stationarity under all variations tangent to this hypersurface in a form usable by the Euler--Lagrange machinery. The multiplier theorem does this by replacing the constrained first-variation equation with an unconstrained first-variation equation for an augmented integrand.
[quotetheorem:3508]
[citeproof:3508]
The theorem reduces the constrained variational problem to an unconstrained one with a modified integrand. The regularity assumption is essential: if $\delta K[y;h]=0$ for every admissible direction, the constraint has no nonzero first-order normal at $y$, and the scalar multiplier argument can break down. A concrete abnormal failure is obtained by taking
\begin{align*}
K[y]=\left(\int_a^b y\,dx\right)^2
\end{align*}
at a feasible curve with
\begin{align*}
\int_a^b y\,dx=0.
\end{align*}
Then $\delta K[y;h]=0$ for every fixed-endpoint direction $h$, so first-order variations do not see the constraint even though the constraint still restricts admissible curves to the hypersurface
\begin{align*}
\int_a^b y\,dx=0.
\end{align*}
In such a situation, there need not be a scalar $\lambda$ detected by the usual ratio argument, because the denominator variation of $K$ is identically zero. The multiplier is not usually known in advance; it is determined together with the boundary conditions and the side constraint. The theorem gives a necessary first-order equation only, so the resulting augmented Euler--Lagrange equation may produce stationary curves that are maxima, minima, or saddle points for the constrained problem.
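The abnormal behaviour of the squared constraint can be observed numerically. The sketch below uses the hypothetical data $y(x)=x-\tfrac12$ on $[0,1]$, which has zero integral, and the fixed-endpoint direction $h(x)=x(1-x)$. It compares the first variation of $\left(\int y\,dx\right)^2$ with that of the linear constraint $\int y\,dx$.

```python
def integral(f, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def y(x):
    return x - 0.5          # feasible curve: its integral over [0, 1] is zero

def h(x):
    return x * (1.0 - x)    # fixed-endpoint direction: h(0) = h(1) = 0

def K_squared(eps):
    """Squared constraint evaluated on y + eps*h."""
    return integral(lambda x: y(x) + eps * h(x)) ** 2

def K_linear(eps):
    """Linear constraint evaluated on y + eps*h."""
    return integral(lambda x: y(x) + eps * h(x))

def derivative_at_zero(F, step=1e-4):
    """Central-difference approximation of F'(0)."""
    return (F(step) - F(-step)) / (2.0 * step)

dK_squared = derivative_at_zero(K_squared)   # abnormal case
dK_linear = derivative_at_zero(K_linear)     # regular case
print(dK_squared, dK_linear)
```

The squared constraint has vanishing first variation in this direction, while the linear constraint defining the same hypersurface has first variation $\int_0^1 h\,dx=\tfrac16$.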
[example: Queen Dido Isoperimetric Problem]
Queen Dido's problem asks for the shortest arc that, together with a straight coastline, encloses a fixed positive area $A$. In the standard straight-coastline model, take the coastline to be $y=0$, with the endpoints of the arc on that line. Away from endpoint contact, write the arc locally as a graph $y=y(x)$. On such a graph patch,
\begin{align*}
I[y]=\int_a^b \sqrt{1+(y')^2}\,dx
\end{align*}
and
\begin{align*}
K[y]=\int_a^b y\,dx=A.
\end{align*}
By the *Isoperimetric Multiplier Theorem*, the graph satisfies the Euler--Lagrange equation for the augmented integrand
\begin{align*}
F(x,y,p)=\sqrt{1+p^2}-\lambda y.
\end{align*}
Its partial derivatives are
\begin{align*}
\partial_yF(x,y,p)=-\lambda
\end{align*}
and
\begin{align*}
\partial_pF(x,y,p)=\frac{1}{2}(1+p^2)^{-1/2}\cdot 2p=\frac{p}{\sqrt{1+p^2}}.
\end{align*}
Therefore the augmented Euler--Lagrange equation is
\begin{align*}
-\lambda-\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right)=0.
\end{align*}
The derivative in the second term is the signed curvature of the graph. Indeed,
\begin{align*}
\frac{d}{dx}\left(y'(1+(y')^2)^{-1/2}\right)=y''(1+(y')^2)^{-1/2}+y'\left(-\frac{1}{2}\right)(1+(y')^2)^{-3/2}\cdot 2y'y''.
\end{align*}
Putting the two terms over the common denominator $(1+(y')^2)^{3/2}$ gives
\begin{align*}
\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right)=\frac{y''(1+(y')^2)}{(1+(y')^2)^{3/2}}-\frac{(y')^2y''}{(1+(y')^2)^{3/2}}.
\end{align*}
The numerator is
\begin{align*}
y''(1+(y')^2)-(y')^2y''=y'',
\end{align*}
so
\begin{align*}
\frac{d}{dx}\left(\frac{y'}{\sqrt{1+(y')^2}}\right)=\frac{y''}{(1+(y')^2)^{3/2}}.
\end{align*}
Thus the graph has constant signed curvature $-\lambda$. A regular plane curve with constant signed curvature is a circular arc when $\lambda\ne0$, and the case $\lambda=0$ gives a line segment, which cannot enclose positive area with the coastline.
The endpoint condition comes from the constrained-endpoint transversality equation. Along the coastline $y=0$, the endpoint tangent direction is $\tau=(1,0)$. For the augmented integrand $F$, the boundary vector in the *Transversality Condition for a Constrained Endpoint* is
\begin{align*}
\left(F-p\partial_pF,\partial_pF\right).
\end{align*}
At an endpoint on $y=0$ this becomes
\begin{align*}
\left(\sqrt{1+p^2}-p\frac{p}{\sqrt{1+p^2}},\frac{p}{\sqrt{1+p^2}}\right)=\left(\frac{1+p^2-p^2}{\sqrt{1+p^2}},\frac{p}{\sqrt{1+p^2}}\right).
\end{align*}
Hence
\begin{align*}
\left(F-p\partial_pF,\partial_pF\right)=\left(\frac{1}{\sqrt{1+p^2}},\frac{p}{\sqrt{1+p^2}}\right).
\end{align*}
Orthogonality to $\tau=(1,0)$ would require
\begin{align*}
\frac{1}{\sqrt{1+p^2}}=0,
\end{align*}
which is impossible for finite $p$. Thus the limiting endpoint slope must be vertical: the circular arc meets the coastline orthogonally.
If a circle meets the line $y=0$ orthogonally at two points $(x_1,0)$ and $(x_2,0)$, then the radius at each contact point is horizontal, so the centre lies on the coastline. Writing the centre as $(h,0)$, equality of the two radii gives
\begin{align*}
|x_1-h|=|x_2-h|.
\end{align*}
For $x_1<x_2$, this implies
\begin{align*}
h-x_1=x_2-h,
\end{align*}
so
\begin{align*}
h=\frac{x_1+x_2}{2}.
\end{align*}
The admissible stationary arc is therefore the upper semicircle with radius
\begin{align*}
r=\frac{x_2-x_1}{2}.
\end{align*}
The area constraint fixes $r$ because the enclosed area is
\begin{align*}
A=\frac{1}{2}\pi r^2,
\end{align*}
hence
\begin{align*}
r=\sqrt{\frac{2A}{\pi}}.
\end{align*}
This conclusion belongs to the parametrised-curve formulation, or to the limiting interpretation of the graph calculation on interior subarcs. The semicircle has vertical tangents where it meets the coastline, so it is not a single $C^1$ graph over the closed interval between the contact points with finite endpoint slopes; the graph Euler--Lagrange equation gives constant curvature in the interior, and the geometric endpoint transversality condition supplies orthogonal contact with the coastline.
[/example]
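The role of the orthogonal-contact condition can be illustrated within the family of circular arcs, all of which have constant curvature. The sketch below assumes the standard segment formula: an arc of radius $R$ subtending the angle $2\alpha$ at its centre, closed by its chord on the coastline, encloses area $R^2(\alpha-\sin\alpha\cos\alpha)$ and has length $2R\alpha$. The sample angles and the value $A=1$ are hypothetical.

```python
import math

def arc_length_for_area(area, half_angle):
    """Length of a circular arc over a chord on the coastline enclosing `area`.

    `half_angle` is half the angle the arc subtends at the centre, in (0, pi).
    """
    segment_factor = half_angle - math.sin(half_angle) * math.cos(half_angle)
    radius = math.sqrt(area / segment_factor)
    return 2.0 * radius * half_angle

A = 1.0
semicircle_radius = math.sqrt(2.0 * A / math.pi)
semicircle_length = math.pi * semicircle_radius

half_angles = [0.6, 1.0, math.pi / 2, 2.0, 2.6]
lengths = [arc_length_for_area(A, alpha) for alpha in half_angles]
shortest = min(lengths)

for alpha, length in zip(half_angles, lengths):
    print(alpha, length)
```

Among the sampled arcs, the one with $\alpha=\pi/2$ is the shortest and reproduces the semicircle with $r=\sqrt{2A/\pi}$. This is consistent with, but does not prove, the minimality of the semicircle.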
The same mechanism governs elastic rods under a prescribed length. There the constraint is not a boundary condition but a global geometric condition, and the multiplier has the interpretation of tension.
[example: Elastic Curve Under Length Constraint]
For a planar elastic curve of length $\ell$ parametrised by arclength $s\in[0,\ell]$, the bending energy is
\begin{align*}
I[\gamma]=\frac{1}{2}\int_0^{\ell} \kappa(s)^2\,ds,
\end{align*}
where $\kappa$ is curvature. On a graph patch $\gamma(x)=(x,y(x))$, the velocity with respect to $x$ is
\begin{align*}
\gamma'(x)=(1,y'(x)).
\end{align*}
Thus its speed is
\begin{align*}
|\gamma'(x)|=\sqrt{1+(y'(x))^2},
\end{align*}
so the arclength element is
\begin{align*}
ds=\sqrt{1+(y')^2}\,dx.
\end{align*}
The signed curvature of the graph is
\begin{align*}
\kappa(x)=\frac{y''(x)}{(1+(y'(x))^2)^{3/2}}.
\end{align*}
Substituting this curvature and the arclength element into the bending energy gives
\begin{align*}
I[y]=\frac{1}{2}\int_a^b \left(\frac{y''}{(1+(y')^2)^{3/2}}\right)^2\sqrt{1+(y')^2}\,dx.
\end{align*}
The square in the integrand is
\begin{align*}
\left(\frac{y''}{(1+(y')^2)^{3/2}}\right)^2=\frac{(y'')^2}{(1+(y')^2)^3}.
\end{align*}
Therefore
\begin{align*}
\frac{(y'')^2}{(1+(y')^2)^3}\sqrt{1+(y')^2}=\frac{(y'')^2}{(1+(y')^2)^{5/2}},
\end{align*}
and hence
\begin{align*}
I[y]=\frac{1}{2}\int_a^b \frac{(y'')^2}{(1+(y')^2)^{5/2}}\,dx.
\end{align*}
The length constraint is obtained from the same arclength element:
\begin{align*}
K[y]=\int_a^b \sqrt{1+(y')^2}\,dx=\ell.
\end{align*}
By the *Isoperimetric Multiplier Theorem*, a regular constrained stationary curve is stationary for the augmented functional
\begin{align*}
J[y]=I[y]-\lambda K[y].
\end{align*}
In graph coordinates this means
\begin{align*}
J[y]=\int_a^b \left(\frac{1}{2}\frac{(y'')^2}{(1+(y')^2)^{5/2}}-\lambda\sqrt{1+(y')^2}\right)\,dx.
\end{align*}
The augmented graph integrand depends on $y''$ as well as $y'$, so its Euler--Lagrange equation is a fourth-order equation rather than a second-order one. In intrinsic arclength coordinates, the same multiplier appears in the augmented energy
\begin{align*}
\int_0^{\ell} \left(\frac{1}{2}\kappa(s)^2-\lambda\right)\,ds,
\end{align*}
where $\lambda$ enforces the prescribed length. Thus the local graph formula and the arclength formulation encode the same constraint force, but the graph version exposes it through a higher-order equation because the curvature contains $y''$.
[/example]
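The graph-coordinate formula for the bending energy can be checked against the intrinsic one on a curve whose curvature is known. The sketch below uses a hypothetical test curve, the arc $y=\sqrt{1-x^2}$ of the unit circle over $[-\tfrac12,\tfrac12]$. Its curvature has magnitude $1$ and its length is $2\arcsin\tfrac12$, so the intrinsic energy is $\tfrac12\cdot 2\arcsin\tfrac12$.

```python
import math

def bending_energy_graph(y1, y2, a, b, n=2000):
    """Midpoint rule for (1/2) * integral of y''^2 / (1 + y'^2)^(5/2) dx."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += y2(x) ** 2 / (1.0 + y1(x) ** 2) ** 2.5
    return 0.5 * total * dx

def y1(x):
    """First derivative of y = sqrt(1 - x^2)."""
    return -x / math.sqrt(1.0 - x * x)

def y2(x):
    """Second derivative of y = sqrt(1 - x^2)."""
    return -1.0 / (1.0 - x * x) ** 1.5

graph_energy = bending_energy_graph(y1, y2, -0.5, 0.5)
arc_length = 2.0 * math.asin(0.5)            # length of the arc, radius 1
intrinsic_energy = 0.5 * 1.0 ** 2 * arc_length  # (1/2) * kappa^2 * length

print(graph_energy, intrinsic_energy)
```

The two values agree up to quadrature error, as expected from the change of variables $ds=\sqrt{1+(y')^2}\,dx$.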
## Holonomic Constraints and Variational Mechanics
Integral constraints restrict the whole path through a scalar side condition. Holonomic constraints restrict the configuration pointwise, for example forcing a mechanical system to move on a surface. The central question becomes how the variational equation separates into tangential dynamics and normal constraint forces.
[definition: Holonomic Constraint]
Let $q:[t_0,t_1]\to\mathbb R^n$ be a configuration path. A holonomic constraint is a condition
\begin{align*}
\Phi(q(t))=0
\end{align*}
for all $t\in[t_0,t_1]$, where $\Phi:\mathbb R^n\to\mathbb R^m$ is a smooth map with $J\Phi_q$ of rank $m$ along the constraint set.
[/definition]
The rank assumption makes the constraint set a smooth submanifold locally, so the definition has converted a pointwise equation into a geometric restriction on the path. Its immediate effect on the variational calculation is that variations $\eta(t)$ must be tangent to that submanifold, which means $J\Phi_{q(t)}\eta(t)=0$ for every $t$. To express the corresponding stationarity condition, we first need the mechanical analogue of the functionals used earlier in the chapter. That functional is the action: it integrates the Lagrangian along the path and gives the object whose first variation will later be restricted to tangent variations.
[definition: Mechanical Action]
For a Lagrangian $L:T\mathbb R^n\to\mathbb R$, written $L(q,\dot q)$, the action is the functional
\begin{align*}
S:C^1([t_0,t_1];\mathbb R^n) &\to \mathbb R, &
S[q]&=\int_{t_0}^{t_1} L(q(t),\dot q(t))\,dt.
\end{align*}
[/definition]
The action definition puts mechanics into the same variational form as the earlier functionals, but the holonomic constraint means the first variation is tested only against tangent variations. The resulting equation should therefore contain the usual Euler--Lagrange residual plus a normal reaction term. The following theorem makes that statement precise by representing every normal residual through multiplier functions.
[quotetheorem:3509]
[citeproof:3509]
The multiplier term is normal to the constraint manifold. The full-rank hypothesis is essential because it makes the normal space equal to the range of $J\Phi_q^\top$ with constant dimension; if the rank drops, the number and interpretation of constraint forces can change along the path. A concrete rank-dropping example is $\Phi(q)=|q|^2$ in $\mathbb R^n$ at the constrained point $q=0$. The set $\Phi^{-1}(0)$ is a single point, so all virtual displacements vanish, but $J\Phi_0=0$ and hence $J\Phi_0^\top\mu=0$ for every multiplier. A nonzero Euler--Lagrange residual at that point cannot be represented as $J\Phi_0^\top\mu$, showing why the full-rank condition is not cosmetic. In mechanics the multiplier is the reaction force: it enforces the constraint but does no virtual work along allowed variations. The theorem does not solve the constrained dynamics by itself, since the multiplier must be found simultaneously with $q$ and the constraint equation.
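For a single scalar constraint ($m=1$), the statement that the residual must lie in the range of $J\Phi_q^\top$ can be sketched as an orthogonal splitting. The helper `split_residual` below is hypothetical. It projects a given residual vector onto the constraint gradient and returns the multiplier together with the leftover tangential part, which has to vanish for the multiplier equation to hold. It returns `None` when the gradient vanishes, as in the rank-dropping example above.

```python
def split_residual(residual, gradient, tol=1e-12):
    """Split `residual` into mu * gradient plus a part orthogonal to gradient.

    `gradient` plays the role of J Phi^T for one scalar constraint.
    Returns (mu, tangential), or None when the gradient vanishes.
    """
    norm_sq = sum(g * g for g in gradient)
    if norm_sq < tol:
        return None  # rank drop: no normal direction is available
    mu = sum(r * g for r, g in zip(residual, gradient)) / norm_sq
    tangential = [r - mu * g for r, g in zip(residual, gradient)]
    return mu, tangential

# Sphere-type constraint Phi(q) = |q|^2 - R^2 has gradient 2q.
q = (0.0, 0.0, 2.0)
gradient = tuple(2.0 * c for c in q)
mu, tangential = split_residual((1.0, 0.0, 3.0), gradient)

# Rank-dropping constraint Phi(q) = |q|^2 at q = 0: the gradient is zero.
degenerate = split_residual((1.0, 0.0, 3.0), (0.0, 0.0, 0.0))

print(mu, tangential, degenerate)
```

In the sample call the tangential part is nonzero, so that particular residual could not occur along a constrained stationary path. On an actual solution the tangential part vanishes and `mu` is the reaction multiplier.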
[example: Particle Constrained to a Sphere]
Let $q(t)\in\mathbb R^3$ be constrained by
\begin{align*}
\Phi(q)=|q|^2-R^2=0.
\end{align*}
For
\begin{align*}
L(q,\dot q)=\frac{m}{2}|\dot q|^2-V(q),
\end{align*}
the velocity derivative is
\begin{align*}
\partial_{\dot q}L(q,\dot q)=m\dot q.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}\partial_{\dot q}L(q,\dot q)=m\ddot q.
\end{align*}
The position derivative is
\begin{align*}
\partial_qL(q,\dot q)=-\nabla V(q).
\end{align*}
Hence the Euler--Lagrange residual is
\begin{align*}
\frac{d}{dt}\partial_{\dot q}L-\partial_qL=m\ddot q-(-\nabla V(q))=m\ddot q+\nabla V(q).
\end{align*}
The constraint derivative is
\begin{align*}
J\Phi_q(v)=\frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\left(|q+\varepsilon v|^2-R^2\right)=2q\cdot v.
\end{align*}
Thus $J\Phi_q$ is the row vector $2q^\top$, so
\begin{align*}
J\Phi_q^\top\mu(t)=2\mu(t)q.
\end{align*}
The holonomic multiplier equation therefore becomes
\begin{align*}
m\ddot q+\nabla V(q)=2\mu(t)q.
\end{align*}
Together with the original constraint, the system is
\begin{align*}
m\ddot q+\nabla V(q)=2\mu(t)q,\qquad |q|^2=R^2.
\end{align*}
The force $2\mu(t)q$ is radial. Indeed, if $v$ is tangent to the sphere at $q$, then differentiating $|q+\varepsilon v|^2=R^2$ at $\varepsilon=0$ gives
\begin{align*}
2q\cdot v=0.
\end{align*}
Hence
\begin{align*}
(2\mu q)\cdot v=2\mu(q\cdot v)=0.
\end{align*}
So the multiplier force has no tangential component and acts only in the normal direction.
One can also see how $\mu$ is determined by the constraint. Differentiating $|q|^2=R^2$ once gives
\begin{align*}
2q\cdot \dot q=0.
\end{align*}
Differentiating again gives
\begin{align*}
2|\dot q|^2+2q\cdot \ddot q=0.
\end{align*}
Thus
\begin{align*}
q\cdot \ddot q=-|\dot q|^2.
\end{align*}
Taking the dot product of the equation of motion with $q$ gives
\begin{align*}
m q\cdot \ddot q+q\cdot \nabla V(q)=2\mu |q|^2.
\end{align*}
Using $q\cdot \ddot q=-|\dot q|^2$ and $|q|^2=R^2$, this becomes
\begin{align*}
-m|\dot q|^2+q\cdot \nabla V(q)=2\mu R^2.
\end{align*}
Therefore
\begin{align*}
\mu(t)=\frac{q(t)\cdot \nabla V(q(t))-m|\dot q(t)|^2}{2R^2}.
\end{align*}
The tangential part of $m\ddot q+\nabla V(q)$ governs motion along the sphere, while the multiplier supplies exactly the radial reaction needed to keep $|q(t)|=R$.
[/example]
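The closed-form multiplier can be checked numerically. The following is a minimal sketch, not part of the derivation above: it assumes the illustrative potential $V(q)=mgz$, the hypothetical helper names `dot` and `simulate_sphere`, and a plain fourth-order Runge--Kutta step. It integrates $m\ddot q+\nabla V(q)=2\mu q$ with $\mu$ taken from the formula in the example, starting on the sphere with a tangent velocity.

```python
import math

def dot(a, b):
    """Euclidean dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def simulate_sphere(m=1.0, R=1.0, g=9.81, dt=1e-3, steps=2000):
    """Integrate m q'' + grad V = 2 mu q for the assumed potential V = m g z."""
    grad_V = (0.0, 0.0, m * g)  # gradient of the illustrative potential

    def rhs(state):
        q, v = state[:3], state[3:]
        # mu = (q . grad V - m |v|^2) / (2 R^2), the formula from the example
        mu = (dot(q, grad_V) - m * dot(v, v)) / (2.0 * R * R)
        acc = tuple((2.0 * mu * qi - gi) / m for qi, gi in zip(q, grad_V))
        return v + acc  # derivative of (q, v) is (v, acceleration)

    def shifted(state, k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))

    # Start on the sphere with velocity tangent to it (q . v = 0).
    state = (R, 0.0, 0.0, 0.0, 1.0, 0.0)
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, dt / 2))
        k3 = rhs(shifted(state, k2, dt / 2))
        k4 = rhs(shifted(state, k3, dt))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    q, v = state[:3], state[3:]
    energy = 0.5 * m * dot(v, v) + m * g * q[2]
    return q, v, energy

q, v, energy = simulate_sphere()
print(math.sqrt(dot(q, q)), dot(q, v), energy)
```

With these initial data the radius should stay close to $R$, the product $q\cdot\dot q$ close to zero, and the energy close to its initial value, up to integration error. The constraint is never imposed separately here; it is maintained only through the multiplier, which is the content of the last sentence of the example.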
Natural boundary conditions, corner conditions, and multiplier equations all come from the same source: the terms left after taking the first variation and testing it only against the variations that the problem permits. The fixed-endpoint Euler--Lagrange equation is therefore only the interior part of a broader stationarity principle. This viewpoint is also the bridge to Hamiltonian mechanics: the quantities $\partial_{y'}L$ and $y'\partial_{y'}L-L$ are the momentum and Hamiltonian expressions whose boundary and corner matching conditions encode conservation laws when the Lagrangian has symmetries. In geometric optics, Fermat's principle gives the same structure, with transversality and corner conditions becoming refraction and reflection laws at interfaces. Subsequent chapters use these necessary conditions as the starting point for the second-variation and Jacobi-field tests, which decide when a stationary curve is actually locally minimizing.
The first-variation framework identifies which curves are stationary, but stationarity alone leaves open whether they minimise, maximise, or are saddle points of the functional. Second-order theory enters here: the second variation becomes a quadratic form on perturbations, and Legendre's condition on the Lagrangian provides the first test for minimality.
# 4. Legendre's Condition and the Second Variation
After the first-variation and boundary-condition chapters, this chapter begins the second-order theory of variational problems. The Euler--Lagrange equation tells us which curves can be stationary, but stationarity alone does not distinguish minima from maxima or saddle points. The second variation measures the first non-vanishing change of the functional near an extremal, and Legendre's condition extracts from it a pointwise convexity requirement in the velocity variable. This leads naturally to the Jacobi accessory equation, whose solutions record the directions in which the second variation may degenerate.
## The Second Variation as a Quadratic Test
Once a curve $y$ satisfies the Euler--Lagrange equation, the next question is whether nearby admissible curves increase the functional. We work first with fixed endpoints, since then variations vanish at the boundary and no transversality terms obscure the interior calculation.
[definition: Second Variation]
Let $L \in C^2([a,b] \times \mathbb R \times \mathbb R)$ and fix endpoint values $y_a,y_b \in \mathbb R$. Let
\begin{align*}
\mathcal A := \{z\in C^1([a,b]) : z(a)=y_a,\ z(b)=y_b\}
\end{align*}
and define the functional
\begin{align*}
J: \mathcal A &\to \mathbb R, &
z &\mapsto \int_a^b L(x,z(x),z'(x))\,dx.
\end{align*}
For $y\in \mathcal A\cap C^2([a,b])$, let
\begin{align*}
\mathcal V := \{h\in C^1([a,b]) : h(a)=h(b)=0\}.
\end{align*}
The second variation of $J$ at $y$ is the quadratic form
\begin{align*}
\delta^2J[y;\cdot]:\mathcal V&\to\mathbb R, &
h&\mapsto \frac{d^2}{d\varepsilon^2}\Big|_{\varepsilon=0}J[y+\varepsilon h].
\end{align*}
[/definition]
The definition is a Gâteaux second derivative along affine lines in the admissible space. For scalar fixed-endpoint problems it can be written as an integral quadratic form in $h$ and $h'$. If $L$ is $C^2$ and $y,h$ are regular enough to justify differentiating under the integral sign, then along the variation $y+\varepsilon h$ one obtains
\begin{align*}
\delta^2J[y;h]
= \int_a^b \left(
L_{yy}(x,y,y')h^2
+2L_{yp}(x,y,y')hh'
+L_{pp}(x,y,y')(h')^2
\right)\,dx.
\end{align*}
Here the partial derivatives of $L$ are evaluated at $(x,y(x),y'(x))$.
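The integral formula can be compared with a direct finite-difference second derivative of $\varepsilon\mapsto J[y+\varepsilon h]$. The sketch below uses choices that are purely illustrative and not taken from the text: the Lagrangian $L(x,y,p)=y^2p^2$ on $[0,1]$, the curve $y(x)=1+x$ (which need not be an extremal, since the formula is only a second derivative in $\varepsilon$), the variation $h(x)=\sin(\pi x)$, and a midpoint quadrature rule.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))

# Illustrative data (assumptions): L = y^2 p^2, y = 1 + x, h = sin(pi x).
def y(x):  return 1.0 + x
def dy(x): return 1.0
def h(x):  return math.sin(math.pi * x)
def dh(x): return math.pi * math.cos(math.pi * x)

def J(eps):
    """J[y + eps h] for L(x, y, p) = y^2 p^2."""
    return integrate(
        lambda x: (y(x) + eps * h(x)) ** 2 * (dy(x) + eps * dh(x)) ** 2, 0.0, 1.0)

def second_variation_formula():
    """Integral of L_yy h^2 + 2 L_yp h h' + L_pp (h')^2 along y."""
    def integrand(x):
        L_yy = 2.0 * dy(x) ** 2       # d^2/dy^2 of y^2 p^2
        L_yp = 4.0 * y(x) * dy(x)     # mixed derivative
        L_pp = 2.0 * y(x) ** 2        # d^2/dp^2 of y^2 p^2
        return L_yy * h(x) ** 2 + 2.0 * L_yp * h(x) * dh(x) + L_pp * dh(x) ** 2
    return integrate(integrand, 0.0, 1.0)

def second_variation_fd(eps=1e-3):
    """Central second difference of eps -> J[y + eps h] at eps = 0."""
    return (J(eps) - 2.0 * J(0.0) + J(-eps)) / eps ** 2

print(second_variation_formula(), second_variation_fd())
```

The two numbers should agree up to the finite-difference error, which for this quartic dependence on $\varepsilon$ is of order $\varepsilon^2$.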
The formula makes the second variation computable, but by itself it is only an identity. The $C^2$ hypothesis is what permits the Taylor expansion in the two dependent variables $(y,p)$ with a controlled quadratic part; with merely first differentiability there need not be a well-defined quadratic coefficient. The $C^1$ variation class is also essential for this classical calculation, because the integrand contains $h'$ pointwise. The endpoint assumptions are not needed for differentiating twice, but they remove boundary terms when this quadratic form is later compared with the Euler--Lagrange equation; without fixed endpoints, additional natural boundary conditions enter the second-order test. This motivates the next theorem: at a genuine local minimum, the computed quadratic form must be nonnegative in every admissible direction.
[quotetheorem:3522]
[citeproof:3522]
This theorem is a necessary condition, not a sufficient condition for minimality. A stationary curve can have nonnegative second variation in all directions and still fail to be a local minimiser if higher-order terms have the wrong sign; finite-dimensional examples such as $f(t)=t^3$ show the same limitation of second-order necessary tests. The weak $C^1$ topology matters because the proof only probes curves of the form $y+\varepsilon h$ that remain close in both position and velocity; a different admissible topology changes what perturbations count as nearby. The theorem is global in the direction $h$, but it hides a local consequence: the coefficient of $(h')^2$ cannot be negative at any point along a minimising extremal. Isolating that coefficient gives Legendre's condition.
[example: A Positive Second Variation for the Dirichlet Energy]
For the Dirichlet energy
\begin{align*}
J[y]=\int_a^b \frac{1}{2}(y')^2\,dx,
\end{align*}
the Lagrangian is $L(x,y,p)=\frac12p^2$. Its first derivatives are $L_y=0$ and $L_p=p$, so the Euler--Lagrange equation becomes
\begin{align*}
0-\frac{d}{dx}(y')=0.
\end{align*}
Thus $y''=0$, and integrating twice gives $y(x)=mx+c$ for constants $m,c$.
Now fix such an extremal $y$ and a fixed-endpoint variation $h\in C^1([a,b])$ with $h(a)=h(b)=0$. Expanding the perturbed functional gives
\begin{align*}
J[y+\varepsilon h]=\int_a^b \frac12(y'+\varepsilon h')^2\,dx.
\end{align*}
Since $(y'+\varepsilon h')^2=(y')^2+2\varepsilon y'h'+\varepsilon^2(h')^2$, this is
\begin{align*}
J[y+\varepsilon h]=\int_a^b \left(\frac12(y')^2+\varepsilon y'h'+\frac{\varepsilon^2}{2}(h')^2\right)\,dx.
\end{align*}
Differentiating with respect to $\varepsilon$ gives
\begin{align*}
\frac{d}{d\varepsilon}J[y+\varepsilon h]=\int_a^b \left(y'h'+\varepsilon(h')^2\right)\,dx.
\end{align*}
Differentiating once more and setting $\varepsilon=0$ gives
\begin{align*}
\delta^2J[y;h]=\int_a^b (h')^2\,dx.
\end{align*}
The integrand $(h')^2$ is nonnegative, so $\delta^2J[y;h]\ge 0$. If $\delta^2J[y;h]=0$, then $\int_a^b(h')^2\,dx=0$; since $h'$ is continuous, this forces $h'(x)=0$ for every $x\in[a,b]$. Hence $h$ is constant, and the endpoint conditions $h(a)=h(b)=0$ force $h=0$. Thus the second variation is positive on every nonzero fixed-endpoint variation, reflecting the strengthened Legendre coefficient $L_{pp}=1$ and the absence of nonzero degenerate directions.
[/example]
## Legendre's Necessary Condition
The nonnegativity of $\delta^2J$ must hold for every short, sharply concentrated variation. Such variations make the derivative term $(h')^2$ dominate the lower-order terms $h^2$ and $hh'$, so a negative value of $L_{pp}$ at one point would contradict minimality.
[quotetheorem:3523]
[citeproof:3523]
Legendre's condition is a pointwise convexity test in the velocity variable. The weak-minimum hypothesis is essential: for $L(x,y,p)=-p^2/2$, affine curves are stationary but the velocity Hessian is negative, so stationarity alone gives no Legendre inequality. The regularity hypothesis is also doing work, since the proof localises a continuous coefficient $L_{pp}(x,y(x),y'(x))$; if the second velocity derivative failed to exist or oscillated without continuity, the bump argument would not produce a stable sign on a small interval. The condition is necessary, not sufficient: it sees only the leading part of the quadratic form under localisation, not the cumulative effect of the lower-order terms over the whole interval.
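The localisation mechanism can be illustrated numerically. The sketch below is an assumption-laden toy, not the proof's construction: the coefficient functions `L_pp` and `L_yy` are hypothetical (they are not derived from a specific Lagrangian), the mixed coefficient is set to zero, and the variation is the $C^1$ bump $h(x)=\delta\cos^2\!\big(\pi(x-x_0)/(2\delta)\big)$ supported on $[x_0-\delta,x_0+\delta]$. For this bump $\int (h')^2\,dx$ scales like $\delta$ while $\int h^2\,dx$ scales like $\delta^3$, so shrinking $\delta$ lets the sign of $L_{pp}$ at $x_0$ decide the sign of the quadratic form.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))

def bump(x0, delta):
    """C^1 bump of height delta supported on [x0 - delta, x0 + delta], with its derivative."""
    def h(x):
        s = (x - x0) / delta
        return delta * math.cos(0.5 * math.pi * s) ** 2 if abs(s) < 1.0 else 0.0
    def dh(x):
        s = (x - x0) / delta
        return -0.5 * math.pi * math.sin(math.pi * s) if abs(s) < 1.0 else 0.0
    return h, dh

# Hypothetical coefficients on [0, 1]: L_pp is negative near x0 = 0.25,
# while the lower-order coefficient L_yy is large and positive.
def L_pp(x): return x - 0.5
def L_yy(x): return 100.0

def quadratic_form(delta, x0=0.25):
    """Integral of L_yy h^2 + L_pp (h')^2 for the bump of width parameter delta."""
    h, dh = bump(x0, delta)
    return integrate(lambda x: L_yy(x) * h(x) ** 2 + L_pp(x) * dh(x) ** 2, 0.0, 1.0)

for delta in (0.2, 0.1, 0.05):
    print(delta, quadratic_form(delta))
```

For the wide bump the positive lower-order term dominates and the form is positive; for the narrow bump the $(h')^2$ term dominates and the form turns negative, which is the contradiction with minimality that a negative value of $L_{pp}$ produces.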
[example: Failure of Legendre Condition for a Non-Convex Lagrangian]
Let $L(x,y,p)=-\frac{1}{2}p^2$ and impose fixed endpoints. Its derivatives in the dependent variables are
\begin{align*}
L_y(x,y,p)=0,\qquad L_p(x,y,p)=-p,\qquad L_{pp}(x,y,p)=-1.
\end{align*}
Along a curve $y$, the Euler--Lagrange equation is
\begin{align*}
L_y(x,y,y')-\frac{d}{dx}L_p(x,y,y')=0.
\end{align*}
Substituting the derivatives of $L$ gives
\begin{align*}
0-\frac{d}{dx}(-y')=0.
\end{align*}
Since $\frac{d}{dx}(-y')=-y''$, this becomes
\begin{align*}
y''=0.
\end{align*}
Therefore every affine curve $y(x)=mx+c$ is an extremal. However, because $L_{pp}=-1$ everywhere, the Legendre necessary condition would require $-1\ge 0$, which is false.
Fix an affine extremal $y(x)=mx+c$ and a fixed-endpoint variation $h\in C^1([a,b])$ with $h(a)=h(b)=0$. The perturbed functional is
\begin{align*}
J[y+\varepsilon h]=-\frac12\int_a^b (y'+\varepsilon h')^2\,dx.
\end{align*}
Since $y'=m$, expand the square:
\begin{align*}
(y'+\varepsilon h')^2=m^2+2\varepsilon mh'+\varepsilon^2(h')^2.
\end{align*}
Hence
\begin{align*}
J[y+\varepsilon h]=-\frac12\int_a^b m^2\,dx-\varepsilon m\int_a^b h'\,dx-\frac{\varepsilon^2}{2}\int_a^b (h')^2\,dx.
\end{align*}
The middle term vanishes because
\begin{align*}
\int_a^b h'\,dx=h(b)-h(a)=0.
\end{align*}
Thus
\begin{align*}
J[y+\varepsilon h]-J[y]=-\frac{\varepsilon^2}{2}\int_a^b (h')^2\,dx.
\end{align*}
If $h\ne 0$, then $\int_a^b(h')^2\,dx>0$: otherwise continuity of $h'$ would force $h'=0$ on $[a,b]$, so $h$ would be constant, and the endpoint conditions would force $h=0$. Therefore, for every nonzero fixed-endpoint variation $h$ and every $\varepsilon\ne 0$,
\begin{align*}
J[y+\varepsilon h]-J[y]<0.
\end{align*}
Equivalently,
\begin{align*}
\delta^2J[y;h]=-\int_a^b(h')^2\,dx<0.
\end{align*}
So these affine stationary curves are not weak local minimisers; solving the Euler--Lagrange equation has found stationary curves with the wrong second-order sign.
[/example]
The condition can also be read as convexity of $p \mapsto L(x,y,p)$, but only to second order and only at the points $(x,y(x),y'(x))$ of the extremal under study. A Lagrangian may be globally non-convex in $p$ while still satisfying Legendre's condition along a particular extremal, and then further tests are needed.
[remark: Scalar and Vector Forms]
For vector-valued curves $y:[a,b]\to \mathbb R^n$, the velocity Hessian is the matrix $L_{pp}(x,y,y')\in \mathbb R^{n\times n}$. Legendre's condition becomes
\begin{align*}
\xi^\top L_{pp}(x,y(x),y'(x))\xi \ge 0
\end{align*}
for every $\xi\in \mathbb R^n$ and every $x\in [a,b]$. The scalar statement is the case $n=1$.
[/remark]
## The Strengthened Legendre Condition
The weak inequality $L_{pp}\ge 0$ permits degeneracy: the second variation may fail to control the derivative of the variation. To build a stable second-order theory, especially for conjugate points, one usually assumes strict positivity of the velocity Hessian.
[definition: Strengthened Legendre Condition]
Let $y\in C^2([a,b])$ be an extremal for $J[y]=\int_a^b L(x,y,y')\,dx$. The strengthened Legendre condition along $y$ is
\begin{align*}
L_{pp}(x,y(x),y'(x))>0
\end{align*}
for every $x\in [a,b]$.
[/definition]
The scalar definition controls variations of one real-valued curve. This motivates the vector version, where the same positivity requirement must hold in every velocity direction rather than for a single coefficient.
[definition: Strengthened Vector Legendre Condition]
Let $y:[a,b]\to \mathbb R^n$ be an extremal. The strengthened vector Legendre condition along $y$ is that the symmetric matrix $L_{pp}(x,y(x),y'(x))$ is positive definite for every $x\in [a,b]$.
[/definition]
The pointwise definition is the form in which the condition is checked. For estimates and ODE theory, however, a uniform lower bound is the useful consequence, and compactness of the interval supplies it.
[quotetheorem:6991]
[citeproof:6991]
The preceding theorem explains why strict Legendre positivity is a natural hypothesis before deriving the Jacobi accessory equation. Both assumptions are needed. Strict positivity at each point would not by itself give a uniform lower bound on a non-compact interval; for instance $P(x)=e^{-x}$ is positive on $[0,\infty)$ but has no positive lower bound. Compactness is what turns pointwise positivity into a constant $\theta>0$, while strict positivity is what excludes examples such as $P(x)=(x-a)^2$ on $[a,b]$, whose minimum is zero. This uniform bound prevents the coefficient of the highest derivative term from vanishing, so the linear ODE governing degeneracy can be solved with standard initial data.
[example: Degenerate Velocity Hessian]
For
\begin{align*}
L(x,y,p)=p^4,
\end{align*}
the derivatives relevant to the Euler--Lagrange equation and the Legendre coefficient are
\begin{align*}
L_y(x,y,p)=0,\qquad L_p(x,y,p)=4p^3,\qquad L_{pp}(x,y,p)=12p^2.
\end{align*}
If $y(x)=c$ is constant, then $y'(x)=0$, so the Euler--Lagrange expression along $y$ is
\begin{align*}
L_y(x,c,0)-\frac{d}{dx}L_p(x,c,0)=0-\frac{d}{dx}(0)=0.
\end{align*}
Thus the constant curve is an extremal. Along this extremal,
\begin{align*}
L_{pp}(x,c,0)=12\cdot 0^2=0.
\end{align*}
Hence [Legendre's necessary condition](/theorems/3523) holds as the weak inequality $0\ge 0$, but the strengthened Legendre condition fails because it would require $L_{pp}(x,c,0)>0$ at every point.
Now take a fixed-endpoint variation $h\in C^1([a,b])$ with $h(a)=h(b)=0$. Since $(c+\varepsilon h)'=\varepsilon h'$, the perturbed functional is
\begin{align*}
J[c+\varepsilon h]=\int_a^b ((c+\varepsilon h)')^4\,dx=\int_a^b (\varepsilon h')^4\,dx=\varepsilon^4\int_a^b (h')^4\,dx.
\end{align*}
Therefore
\begin{align*}
\frac{d}{d\varepsilon}J[c+\varepsilon h]=4\varepsilon^3\int_a^b (h')^4\,dx.
\end{align*}
Differentiating once more gives
\begin{align*}
\frac{d^2}{d\varepsilon^2}J[c+\varepsilon h]=12\varepsilon^2\int_a^b (h')^4\,dx.
\end{align*}
Setting $\varepsilon=0$ yields
\begin{align*}
\delta^2J[c;h]=0
\end{align*}
for every admissible direction $h$. Thus the second variation cannot distinguish the constant extremal from nearby curves: the quadratic term vanishes identically, and the first possible nonzero contribution appears at order $\varepsilon^4$.
[/example]
## The Jacobi Accessory Equation and Jacobi Fields
After Legendre's condition, the next question is when the nonnegative quadratic form $\delta^2J$ has a nonzero direction in which it vanishes. The Euler--Lagrange equation can be linearised along an extremal, and the resulting equation is the Jacobi accessory equation.
[definition: Jacobi Accessory Equation]
Let $y\in C^2([a,b])$ be an extremal for a scalar fixed-endpoint variational problem with $L\in C^3$. A function $u\in C^2([a,b])$ satisfies the Jacobi accessory equation along $y$ if
\begin{align*}
-\frac{d}{dx}\left(P(x)u'(x)+Q(x)u(x)\right)+Q(x)u'(x)+R(x)u(x)=0,
\end{align*}
where $P(x)=L_{pp}(x,y(x),y'(x))$, $Q(x)=L_{yp}(x,y(x),y'(x))$, and $R(x)=L_{yy}(x,y(x),y'(x))$.
[/definition]
This equation is the Euler--Lagrange equation of the second variation quadratic functional. The notation $P,Q,R$ separates the second-order variational coefficients from the original curve and makes the accessory problem look like an ordinary linear boundary-value problem.
[definition: Jacobi Field]
A Jacobi field along an extremal $y$ is a solution $u\in C^2([a,b])$ of the Jacobi accessory equation along $y$.
[/definition]
Jacobi fields are infinitesimal variations through nearby extremals in the cases where such a family exists. Even when no family has been constructed, the accessory equation provides the linearised obstruction to strict positivity of the second variation.
[quotetheorem:3524]
[citeproof:3524]
The strengthened Legendre hypothesis is the regularity condition that makes this initial-value theorem ordinary rather than singular. If $P=L_{pp}$ vanishes, the accessory equation may cease to determine $u''$ from $u$ and $u'$; equations such as $x u''+u'=0$ near $x=0$ illustrate how uniqueness and smooth continuation can fail at a zero of the leading coefficient. The $C^3$ assumption ensures that the differentiated coefficients in the expanded equation are continuous along the $C^2$ extremal, which is the level of regularity required by the standard linear ODE theorem. The boundary conditions distinguish Jacobi fields relevant to minimality from arbitrary solutions of the accessory equation. A nonzero Jacobi field that vanishes at the left endpoint and again at an interior point signals the arrival of a conjugate point, which is the next chapter's criterion for loss of minimality.
[definition: Conjugate Point Along an Extremal]
Let $y$ be an extremal on $[a,b]$. A point $c\in(a,b]$ is conjugate to $a$ along $y$ if there exists a nonzero Jacobi field $u$ along $y|_{[a,c]}$ such that
\begin{align*}
u(a)=0,\qquad u(c)=0.
\end{align*}
[/definition]
This definition is included here because it explains why the accessory equation matters. The detailed relationship between conjugate points and positivity of the second variation belongs to the Jacobi theory developed after this chapter.
[example: Jacobi Fields on Geodesics of the Sphere]
On the unit sphere $S^2$, geodesics are great circles. Along a unit-speed great circle, write $x$ for arclength. Since the sectional curvature in every tangent two-plane is $1$, a normal Jacobi field has scalar coefficient $u$ satisfying
\begin{align*}
u''+u=0.
\end{align*}
We solve this equation explicitly. If
\begin{align*}
u(x)=A\cos x+B\sin x,
\end{align*}
then
\begin{align*}
u'(x)=-A\sin x+B\cos x
\end{align*}
and
\begin{align*}
u''(x)=-A\cos x-B\sin x.
\end{align*}
Therefore
\begin{align*}
u''(x)+u(x)=(-A\cos x-B\sin x)+(A\cos x+B\sin x)=0.
\end{align*}
Conversely, a solution of the second-order linear equation is determined by $u(0)$ and $u'(0)$, and the formula above has $u(0)=A$ and $u'(0)=B$, so the solutions are exactly
\begin{align*}
u(x)=A\cos x+B\sin x.
\end{align*}
Taking $A=0$ and $B=1$ gives the Jacobi field
\begin{align*}
u(x)=\sin x.
\end{align*}
Its endpoint values are
\begin{align*}
u(0)=\sin 0=0
\end{align*}
and
\begin{align*}
u(\pi)=\sin \pi=0.
\end{align*}
Since $u$ is not the zero function, this is a nonzero Jacobi field vanishing at $0$ and $\pi$. Hence the point at arclength $\pi$, the antipodal point on the great circle, is conjugate to the starting point. This is the basic geometric model for conjugate points: neighbouring geodesics separate at first and then refocus at the antipode.
[/example]
The sphere example shows that Jacobi fields are not merely formal solutions of a linear equation; they encode how neighbouring extremals separate and meet again. This geometric interpretation is also obtained directly by differentiating a family of extremals.
[remark: Accessory Equation as Linearised Euler--Lagrange]
If $y_\varepsilon$ is a smooth family of extremals and $u=\partial_\varepsilon y_\varepsilon|_{\varepsilon=0}$, differentiating the Euler--Lagrange equation for $y_\varepsilon$ with respect to $\varepsilon$ gives the Jacobi accessory equation for $u$. Thus Jacobi fields describe infinitesimal motions inside the space of extremals, not arbitrary perturbations of the curve.
[/remark]
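For the sphere this differentiation can be carried out by hand or by finite differences. The sketch below assumes a particular one-parameter family, chosen for illustration: unit-speed great circles through $e_1$ with initial direction $\cos\theta\,e_2+\sin\theta\,e_3$. The function names are hypothetical. Differentiating the family in $\theta$ at $\theta=0$ should give a field of size $\sin s$ in the $e_3$ direction, matching the Jacobi field $u(x)=\sin x$ of the example.

```python
import math

def great_circle(theta, s):
    """Point at arclength s on the great circle through e1 with initial
    direction cos(theta) e2 + sin(theta) e3 on the unit sphere."""
    return (math.cos(s),
            math.sin(s) * math.cos(theta),
            math.sin(s) * math.sin(theta))

def variation_field(s, eps=1e-6):
    """Forward-difference approximation of the theta-derivative of the family at theta = 0."""
    p0, p1 = great_circle(0.0, s), great_circle(eps, s)
    return tuple((b - a) / eps for a, b in zip(p0, p1))

for s in (0.5, math.pi / 2, math.pi):
    print(s, variation_field(s)[2], math.sin(s))
```

The third component tracks $\sin s$ and returns to zero at $s=\pi$, where every member of the family passes through the antipode $-e_1$.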
Legendre's condition is necessary but not sufficient for minimality: the second variation may still change sign along an extremal that satisfies it. Jacobi fields, the solutions of the accessory equation, encode this precisely: their zeros detect directions in which the quadratic form becomes non-positive, signalling loss of strict minimality.
# 5. Conjugate Points and Jacobi's Theorem
The previous chapter turned the second variation into a quadratic form and introduced Jacobi fields as possible degenerate directions. This chapter makes that connection precise: it asks when the quadratic form can change sign along an extremal and how the zeros of Jacobi fields encode that failure. The central object is a special solution of the linearised Euler--Lagrange equation, and the central geometric event is the appearance of a second point where such a solution vanishes.
Jacobi's theory refines the Legendre condition. The strengthened Legendre condition controls the integrand pointwise in the velocity variable; Jacobi's condition controls how that local positivity propagates along the whole interval. In the scalar fixed-endpoint problem, this propagation is governed by a second-order linear ODE in Sturm--Liouville form.
## Vanishing Variations Along an Extremal
The second variation tests nearby curves through variations $h$ with $h(a)=h(b)=0$. If the quadratic form has a non-zero direction on which its leading positive term is cancelled by the lower-order term, then a minimum can fail even when the Legendre condition holds. The first task is to isolate the ODE satisfied by the borderline directions.
Let $L \in C^3([a,b] \times \mathbb R \times \mathbb R)$ and fix endpoint values $\alpha,\beta \in \mathbb R$. The functional is the map
\begin{align*}
J: \mathcal A_{\alpha,\beta} \to \mathbb R, \qquad J[z] = \int_a^b L(x,z(x),z'(x))\,dx,
\end{align*}
where
\begin{align*}
\mathcal A_{\alpha,\beta}=\{z \in C^1([a,b];\mathbb R): z(a)=\alpha,\ z(b)=\beta\}.
\end{align*}
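Let $y \in C^2([a,b];\mathbb R) \cap \mathcal A_{\alpha,\beta}$ be an extremal. Along $y$, define $P(x)=L_{y'y'}(x,y(x),y'(x))$, $R(x)=L_{yy'}(x,y(x),y'(x))$, and $T(x)=L_{yy}(x,y(x),y'(x))$. The letters are reassigned relative to the previous chapter: the mixed coefficient, there called $Q$, is now $R$; the coefficient $L_{yy}$, there called $R$, is now $T$; and $Q$ is reserved for a combined coefficient defined below. For fixed-endpoint variations $h,k \in C^1([a,b])$ with $h(a)=h(b)=k(a)=k(b)=0$, the symmetric [bilinear form](/page/Bilinear%20Form) associated with the second variation is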
\begin{align*}
\delta^2J[y;h,k] = \int_a^b \big(P h'k' + R(hk' + h'k) + T hk\big)\,dx.
\end{align*}
Integrating the mixed term by parts when the endpoints vanish rewrites the quadratic form as
\begin{align*}
\delta^2J[y;h,h] = \int_a^b \big(P(h')^2 + Qh^2\big)\,dx, \qquad Q := T - R'.
\end{align*}
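The reduction rests on the identity $\int_a^b 2Rhh'\,dx=-\int_a^b R'h^2\,dx$ for $h(a)=h(b)=0$, which follows from $(Rh^2)'=R'h^2+2Rhh'$. A quick numerical sanity check is sketched below; the coefficient functions and the variation are hypothetical choices on $[0,1]$, not data from a particular Lagrangian, and the quadrature is a simple midpoint rule.

```python
import math

def integrate(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    w = (b - a) / n
    return w * sum(f(a + (i + 0.5) * w) for i in range(n))

# Hypothetical smooth coefficients and a variation with h(0) = h(1) = 0.
def P(x):  return 1.0 + x * x
def R(x):  return x * x
def dR(x): return 2.0 * x
def T(x):  return math.cos(x)
def h(x):  return x * math.sin(math.pi * x)
def dh(x): return math.sin(math.pi * x) + math.pi * x * math.cos(math.pi * x)

def form_with_mixed_term():
    """Integral of P (h')^2 + 2 R h h' + T h^2."""
    return integrate(
        lambda x: P(x) * dh(x) ** 2 + 2.0 * R(x) * h(x) * dh(x) + T(x) * h(x) ** 2,
        0.0, 1.0)

def form_reduced():
    """Integral of P (h')^2 + Q h^2 with Q = T - R'."""
    return integrate(
        lambda x: P(x) * dh(x) ** 2 + (T(x) - dR(x)) * h(x) ** 2, 0.0, 1.0)

print(form_with_mixed_term(), form_reduced())
```

The two values should coincide up to quadrature error; replacing `h` by a function that does not vanish at the endpoints breaks the agreement by the boundary term $[Rh^2]_a^b$.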
The operator hidden in this quadratic form is obtained by integrating by parts once more. On the test domain
\begin{align*}
\mathcal D_0=\{u \in C^2([a,b];\mathbb R): u(a)=u(b)=0\},
\end{align*}
the Jacobi operator is the map
\begin{align*}
\mathcal J_y: \mathcal D_0 \to C([a,b];\mathbb R), \qquad \mathcal J_yu=-(Pu')'+Qu.
\end{align*}
Its null solutions are the directions in which the second variation is stationary as a functional of the variation itself.
[definition: Jacobi Equation]
Let $P,Q \in C([a,b])$ with $P \in C^1([a,b])$. The Jacobi equation associated with the scalar second variation is the equation for an unknown map $u:[a,b]\to \mathbb R$ with $u \in C^2((a,b);\mathbb R)\cap C^1([a,b];\mathbb R)$ given by
\begin{align*}
-(P(x)u'(x))' + Q(x)u(x) = 0, \qquad x \in (a,b).
\end{align*}
[/definition]
The sign convention is chosen so that the second variation is the quadratic form of the differential operator $u \mapsto -(Pu')' + Qu$. To use this equation in the original variational problem, we need to single out its non-zero solutions along the extremal and regard them as infinitesimal deformations.
[definition: Jacobi Field Along an Extremal]
Let $y$ be an extremal and let $P,Q$ be the coefficients obtained from its second variation. A Jacobi field along $y$ is a non-zero map $u:[a,b]\to \mathbb R$ such that $u \in C^2((a,b);\mathbb R) \cap C^1([a,b];\mathbb R)$ and $u$ solves the Jacobi equation on $(a,b)$.
[/definition]
The word field comes from the geometric case, where variations through geodesics produce vector fields along a reference geodesic. In the scalar classical problem it is a function, but it plays the same role: it records an infinitesimal family of extremals.
[example: Jacobi Equation For The Quadratic Lagrangian]
Consider
\begin{align*}
L(x,y,y')=\frac{1}{2}(y')^2-\frac{\lambda}{2}y^2
\end{align*}
on $[a,b]$, with $\lambda\in\mathbb R$. Its first partial derivatives are
\begin{align*}
L_y(x,y,y')=-\lambda y, \qquad L_{y'}(x,y,y')=y'.
\end{align*}
Hence the Euler--Lagrange equation $\frac{d}{dx}L_{y'}-L_y=0$ becomes
\begin{align*}
y''-(-\lambda y)=0,
\end{align*}
or equivalently
\begin{align*}
y''+\lambda y=0.
\end{align*}
For the second variation coefficients, the relevant second partial derivatives are
\begin{align*}
L_{y'y'}=1, \qquad L_{yy'}=0, \qquad L_{yy}=-\lambda.
\end{align*}
Thus along any extremal,
\begin{align*}
P=1, \qquad R=0, \qquad T=-\lambda.
\end{align*}
Since $Q=T-R'$, we get
\begin{align*}
Q=-\lambda-0=-\lambda.
\end{align*}
The Jacobi equation $-(Pu')'+Qu=0$ is therefore
\begin{align*}
-(1\cdot u')'-\lambda u=0,
\end{align*}
which is
\begin{align*}
-u''-\lambda u=0.
\end{align*}
Equivalently,
\begin{align*}
u''+\lambda u=0.
\end{align*}
When $\lambda>0$, write $\mu=\sqrt{\lambda}$. The solutions are
\begin{align*}
u(x)=A\cos(\mu(x-a))+B\sin(\mu(x-a)).
\end{align*}
In particular, the principal solution with $u(a)=0$ and $u'(a)=1$ is
\begin{align*}
u(x)=\frac{1}{\mu}\sin(\mu(x-a)).
\end{align*}
Its zeros occur when $\mu(x-a)=n\pi$, so
\begin{align*}
x=a+\frac{n\pi}{\sqrt{\lambda}}, \qquad n\in\mathbb Z.
\end{align*}
Thus larger positive $\lambda$ makes the Jacobi fields oscillate faster, and the first zero after $a$ appears sooner. This is the mechanism by which oscillation of the Jacobi equation signals possible loss of minimality.
[/example]
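The location of the first zero can also be found numerically, which is useful when $P$ and $Q$ are not constant. The following sketch assumes a hypothetical helper `first_zero_after_a` that integrates $-(Pu')'+Qu=0$ with $u(a)=0$, $u'(a)=1$ as a first-order system in $(u,w)$ with $w=Pu'$, using a fixed-step Runge--Kutta scheme and linear interpolation at the first sign change. It is checked here against the closed form $a+\pi/\sqrt{\lambda}$ from the example.

```python
import math

def first_zero_after_a(P, Q, a, b, n=20000):
    """First zero in (a, b] of the solution of -(P u')' + Q u = 0 with
    u(a) = 0, u'(a) = 1, or None if u stays positive on (a, b]."""
    dx = (b - a) / n

    def rhs(x, u, w):
        # With w = P u', the equation reads u' = w / P and w' = Q u.
        return w / P(x), Q(x) * u

    u, w = 0.0, P(a)  # u'(a) = 1 corresponds to w(a) = P(a)
    for i in range(n):
        x = a + i * dx
        k1 = rhs(x, u, w)
        k2 = rhs(x + dx / 2, u + dx / 2 * k1[0], w + dx / 2 * k1[1])
        k3 = rhs(x + dx / 2, u + dx / 2 * k2[0], w + dx / 2 * k2[1])
        k4 = rhs(x + dx, u + dx * k3[0], w + dx * k3[1])
        u_next = u + dx / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        w_next = w + dx / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if u > 0.0 and u_next <= 0.0:
            # Linear interpolation of the sign change inside [x, x + dx].
            return x + dx * u / (u - u_next)
        u, w = u_next, w_next
    return None

lam = 4.0
zero = first_zero_after_a(lambda x: 1.0, lambda x: -lam, 0.0, 3.0)
print(zero, math.pi / math.sqrt(lam))
```

Calling the helper with a smaller $\lambda$ on the same interval, for instance $\lambda=1$ on $[0,3]$, returns `None`, since the first zero $\pi$ then lies beyond the right endpoint.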
## Conjugate Points and the Necessary Condition
The second variation is tested against functions vanishing at both endpoints. If a Jacobi field vanishes at $a$ and again at an interior point $c$, then it supplies a direction supported on $[a,c]$ where the quadratic form is already degenerate. The appearance of such a point is the obstruction Jacobi found.
[definition: Conjugate Point]
Let $y$ be an extremal on $[a,b]$ and let $P,Q$ be the scalar Jacobi coefficients along $y$. A point $c \in (a,b]$ is conjugate to $a$ along $y$ if there exists a non-zero map $u:[a,b]\to \mathbb R$ with $u \in C^2((a,b);\mathbb R)\cap C^1([a,b];\mathbb R)$ solving the Jacobi equation on $(a,b)$ such that
\begin{align*}
u(a)=0, \qquad u(c)=0.
\end{align*}
[/definition]
Only the base point $a$ is fixed in this definition. For a two-endpoint variational problem, conjugacy to the initial endpoint tells us whether the endpoint condition at $b$ lies beyond the first place where extremals cease to separate.
[quotetheorem:3525]
[citeproof:3525]
The hypothesis $P>0$ is essential: if the strengthened Legendre condition is dropped, the derivative part of the second variation can already have the wrong sign, so the absence or presence of conjugate points no longer controls minimality. The restriction $c\in(a,b)$ is also essential, because a zero exactly at $b$ gives a null endpoint variation rather than an immediate negative direction; for example the constant-coefficient problem with $P=1$, $Q=-\pi^2/(b-a)^2$ has principal solution proportional to $\sin(\pi(x-a)/(b-a))$ and a degenerate second variation. The theorem is therefore only a necessary test: it rules out weak local minima after an interior conjugate point, but it does not prove minimality when no such point has appeared. This is why the sufficiency theorem below must add a global non-vanishing condition up to $b$ and use the Weierstrass--Jacobi identity.
[remark: Endpoint Conjugacy]
If $b$ is conjugate to $a$, the second variation has a non-zero null direction satisfying the same endpoint conditions as the original variational problem. This does not by itself prove that $y$ is not a weak local minimum, but it prevents the second variation from being positive definite. Higher variations or global comparison may decide the problem.
[/remark]
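The three regimes (no conjugate point, endpoint conjugacy, interior conjugate point) can be seen in the constant-coefficient model $P=1$, $Q=-\lambda$ on $[0,1]$, where the first zero of the principal solution is $\pi/\sqrt{\lambda}$. The sketch below evaluates the second variation on the trial direction $h(x)=\sin(\pi x)$; this direction is a convenient witness chosen for illustration, not the construction used in the proof, and the function name is hypothetical. The exact value is $(\pi^2-\lambda)/2$.

```python
import math

def second_variation_on_sine(lam, n=20000):
    """Midpoint-rule value of the integral over [0, 1] of (h')^2 - lam h^2
    for the trial variation h(x) = sin(pi x)."""
    w = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * w
        h = math.sin(math.pi * x)
        dh = math.pi * math.cos(math.pi * x)
        total += dh * dh - lam * h * h
    return w * total

for lam in (5.0, math.pi ** 2, 15.0):
    print(lam, math.pi / math.sqrt(lam), second_variation_on_sine(lam))
```

For $\lambda<\pi^2$ the first zero lies beyond $b=1$ and the value is positive; at $\lambda=\pi^2$ the endpoint is conjugate and the value vanishes; for $\lambda>\pi^2$ the conjugate point is interior and the value is negative.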
## Sturm--Liouville Form and Oscillation
The Jacobi equation becomes useful because it is a second-order linear equation whose zeros can be counted and compared. The variational question about positivity of an integral is converted into an oscillation question for solutions of an ODE. This section records the ODE facts used in the sufficiency theorem.
The strengthened Legendre condition gives $P(x)>0$, so the Jacobi equation can be studied as
\begin{align*}
-(Pu')' + Qu = 0.
\end{align*}
Given initial data $u(a)=0$ and $u'(a)=1$, existence and uniqueness produce a distinguished solution. Its first zero after $a$, if it exists, is the first conjugate point to $a$.
[definition: Principal Jacobi Solution]
Assume $P \in C^1([a,b])$, $Q \in C([a,b])$, and $P(x)>0$ on $[a,b]$. The principal Jacobi solution based at $a$ is the map $u:[a,b]\to \mathbb R$ such that $u \in C^2((a,b);\mathbb R) \cap C^1([a,b];\mathbb R)$ and
\begin{align*}
-(Pu')' + Qu = 0, \qquad u(a)=0, \qquad u'(a)=1.
\end{align*}
[/definition]
The normalization $u'(a)=1$ is harmless here because the same positive coefficient $P$ is used throughout the comparison theory; replacing it by $P(a)u'(a)=1$ only rescales the same principal solution by the positive constant $1/P(a)$, which leaves its zeros unchanged.
Because the equation is linear and second order, any Jacobi field vanishing at $a$ is a scalar multiple of the principal solution. The next ODE result tells us that zeros of independent solutions interlace, so conjugate points are isolated events rather than an uncontrolled set.
[quotetheorem:6992]
[citeproof:6992]
Separation prevents zeros from clustering and gives the picture behind conjugate points: after the first zero of the principal solution, the Jacobi fields have begun to oscillate. Each hypothesis is doing real work. If $u$ and $v$ are linearly dependent, their zeros coincide rather than interlace, so the conclusion fails in the most basic way. If $P$ is allowed to vanish or change sign, the equation may cease to be a regular Sturm--Liouville equation and uniqueness through a zero can fail. For a concrete degenerate model, take $P(x)=x^2$ and $Q(x)=0$ on $[-1,1]$. Then $(x^2w')'=0$ on each side of $0$, so functions that are constant on $(-1,0)$ and independently constant on $(0,1)$ solve the equation away from the degeneracy after imposing only weak matching at $0$. The weighted Wronskian $x^2(u'v-uv')$ can vanish at $0$ without forcing the usual uniqueness information across $0$. Continuity of $Q$ and $C^1$ regularity of $P$ with $P>0$ are the regularity and non-degeneracy assumptions that justify differentiating the weighted Wronskian and applying uniqueness. The next comparison theorem explains how changing the coefficient $Q$ shifts these zeros once the same regular Sturm--Liouville framework is in place.
[quotetheorem:3510]
[citeproof:3510]
The [comparison principle](/theorems/4870) is the analytic form of the idea that a more negative potential in the second variation produces faster oscillation. The direction of the inequality matters: reversing $Q_2\le Q_1$ reverses which equation is forced to oscillate sooner. The strict inequality somewhere is also needed, since if $Q_1=Q_2$ then the two principal solutions coincide and there is no new zero before $c$. The assumption that $a$ and $c$ are consecutive zeros prevents sign changes of $u_1$ inside the interval; without it, the integral identity can have cancellations and no comparison conclusion follows. The theorem does not locate the first zero exactly, and it does not say that every later zero of $u_2$ is controlled without reapplying the argument between consecutive zeros.
[example: Constant Coefficients And The First Conjugate Point]
Let $\lambda>0$ and set $\mu=\sqrt{\lambda}$, so $\mu>0$. With $P=1$ and $Q=-\lambda$, the principal Jacobi equation is
\begin{align*}
-(1\cdot u')'-\lambda u=0.
\end{align*}
Since $(1\cdot u')'=u''$, this is
\begin{align*}
-u''-\lambda u=0.
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
u''+\lambda u=0.
\end{align*}
The principal initial conditions are $u(a)=0$ and $u'(a)=1$. For
\begin{align*}
u(x)=\frac{1}{\mu}\sin(\mu(x-a)),
\end{align*}
we have
\begin{align*}
u(a)=\frac{1}{\mu}\sin(0)=0.
\end{align*}
Differentiating gives
\begin{align*}
u'(x)=\cos(\mu(x-a)).
\end{align*}
Therefore
\begin{align*}
u'(a)=\cos(0)=1.
\end{align*}
Differentiating once more gives
\begin{align*}
u''(x)=-\mu\sin(\mu(x-a)).
\end{align*}
Also,
\begin{align*}
\lambda u(x)=\mu^2\cdot \frac{1}{\mu}\sin(\mu(x-a))=\mu\sin(\mu(x-a)).
\end{align*}
Hence
\begin{align*}
u''(x)+\lambda u(x)=-\mu\sin(\mu(x-a))+\mu\sin(\mu(x-a))=0.
\end{align*}
The zeros of $u$ are the points where
\begin{align*}
\sin(\mu(x-a))=0.
\end{align*}
This holds exactly when
\begin{align*}
\mu(x-a)=n\pi, \qquad n\in\mathbb Z.
\end{align*}
Solving for $x$ gives
\begin{align*}
x=a+\frac{n\pi}{\mu}=a+\frac{n\pi}{\sqrt{\lambda}}.
\end{align*}
The zero at $n=0$ is the base point $a$. The first zero after $a$ occurs at $n=1$, so the first conjugate point is
\begin{align*}
c=a+\frac{\pi}{\sqrt{\lambda}}.
\end{align*}
Thus the critical interval length is
\begin{align*}
b-a=\frac{\pi}{\sqrt{\lambda}}.
\end{align*}
If $b-a<\pi/\sqrt{\lambda}$, then the principal solution has no zero in $(a,b]$; if $b-a=\pi/\sqrt{\lambda}$, it vanishes at the endpoint $b$; and if $b-a>\pi/\sqrt{\lambda}$, it has already crossed zero inside the interval. Larger positive $\lambda$ therefore moves the first conjugate point closer to $a$ by increasing the oscillation frequency $\sqrt{\lambda}$.
[/example]
## Jacobi's Sufficient Condition
The necessary condition rules out one kind of failure, but a calculus of variations course also needs a positive test for a minimum. The correct statement combines two ingredients: the strengthened Legendre condition gives coercivity in the derivative direction, and absence of conjugate points gives a factorisation of the second variation.
[quotetheorem:6993]
[citeproof:6993]
The strengthened Legendre condition is needed because the square identity weights the derivative term by $P$; if $P$ becomes negative, a high-frequency variation can make the second variation negative independently of conjugate points. The absence of conjugate points must include the endpoint $b$: if $u(b)=0$, then the variation $h=u$ satisfies the fixed endpoint conditions and gives a non-zero null direction, so positive definiteness fails. The smoothness and uniform Taylor hypotheses are what convert positivity of the quadratic second variation into an actual local statement about $J[y+h]-J[y]$; without control of the remainder, a positive quadratic term need not dominate the higher-order terms in the chosen topology. The theorem is still local: it proves strict weak local minimality near $y$, not global minimality among distant competitors. This is the local counterpart to the global sufficient fields developed later in Hamilton--Jacobi theory, where the information is carried by a family of extremals filling a region rather than by a single non-vanishing Jacobi field.
[explanation: The Weierstrass--Jacobi Identity]
The identity in the proof is the decisive algebraic step. The Jacobi equation lets the lower-order term $Qh^2$ be absorbed into a square involving $h'-(u'/u)h$. Thus the second variation is not merely non-negative by estimation; it is represented as an integral of a square.
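The absence of conjugate points is exactly what permits division by $u$ on the whole half-open interval $(a,b]$. At the base point $a$ itself $u$ vanishes, but admissible variations satisfy $h(a)=0$, so the product $(u'/u)h$ stays bounded there. If $u$ vanished at some interior point, the Riccati coefficient $u'/u$ would become singular and the square-completion argument would break down at that point.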
[/explanation]
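As a numerical sanity check of the square-completion idea, the following sketch takes the constant-coefficient case $P=1$, $Q=-\lambda$ with principal solution $u(x)=\sin(\mu(x-a))/\mu$ and compares $\int_a^b (h'^2-\lambda h^2)\,dx$ with $\int_a^b (h'-(u'/u)h)^2\,dx$ for a single test variation. The interval $[0,1]$, the value $\lambda=4$, the variation $h(x)=\sin(\pi x)$, and the midpoint quadrature are illustrative assumptions, not choices made in the text.

```python
import math

def midpoint_integral(f, a, b, n=4000):
    """Composite midpoint rule; it never evaluates f at the endpoints."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

# Illustrative data (assumptions): P = 1, Q = -lam on [a, b] = [0, 1].
a, b, lam = 0.0, 1.0, 4.0
mu = math.sqrt(lam)  # first conjugate point a + pi/mu lies beyond b

def h(x):
    """Test variation with h(a) = h(b) = 0."""
    return math.sin(math.pi * x)

def dh(x):
    return math.pi * math.cos(math.pi * x)

def riccati(x):
    """u'/u for the principal solution u(x) = sin(mu (x - a)) / mu."""
    return mu * math.cos(mu * (x - a)) / math.sin(mu * (x - a))

# Second variation with P = 1, Q = -lam, and its square-completed form.
second_variation = midpoint_integral(lambda x: dh(x) ** 2 - lam * h(x) ** 2, a, b)
square_form = midpoint_integral(lambda x: (dh(x) - riccati(x) * h(x)) ** 2, a, b)
```

The midpoint rule is used so that the removable singularity of $u'/u$ at $x=a$ is never evaluated directly. If $b$ were moved past $a+\pi/\mu$, the Riccati coefficient would blow up inside the interval and the second integral would no longer be meaningful.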
## Geometric and Classical Examples
The abstract definition of conjugate points becomes memorable in examples where many extremals pass through the same endpoint. On a sphere, geodesics starting at the north pole meet again at the south pole; for surfaces of revolution, catenoids compete with discontinuous limiting surfaces. These examples show that Jacobi theory detects the first local obstruction, while global minimality may fail for additional reasons.
[example: Geodesics On The Sphere]
Let $S^2$ carry its standard round metric, and let $\gamma(t)$ be a unit-speed great circle with $\gamma(0)=p$. Along $\gamma$, a normal Jacobi field has the form $J(t)=j(t)E(t)$, where $E(t)$ is a parallel unit normal field along $\gamma$, and the scalar coefficient satisfies
\begin{align*}
j''(t)+j(t)=0.
\end{align*}
The general solution is
\begin{align*}
j(t)=A\cos t+B\sin t.
\end{align*}
The condition $J(0)=0$ is equivalent to $j(0)=0$, and
\begin{align*}
j(0)=A\cos 0+B\sin 0=A.
\end{align*}
Thus $A=0$, so every non-zero Jacobi field vanishing at $t=0$ has scalar part
\begin{align*}
j(t)=B\sin t
\end{align*}
with $B\ne 0$. Its next zero occurs when
\begin{align*}
B\sin t=0.
\end{align*}
Since $B\ne 0$, this is equivalent to
\begin{align*}
\sin t=0.
\end{align*}
Hence
\begin{align*}
t=n\pi, \qquad n\in\mathbb Z.
\end{align*}
The first positive zero is therefore $t=\pi$, which is the antipodal point of $p$ on the great circle.
For endpoints separated by a great-circle parameter length $T<\pi$, the chosen arc is the shorter arc and minimises length. If $T>\pi$, the complementary great-circle arc has length $2\pi-T$, and
\begin{align*}
2\pi-T<T
\end{align*}
is equivalent to
\begin{align*}
\pi<T.
\end{align*}
Thus after the antipodal point the original great-circle arc is longer than the arc going the other way around the sphere, so the conjugate point at $t=\pi$ marks the first loss of length-minimising behavior.
[/example]
This example explains the geometric meaning of the word conjugate. At the antipodal point, the family of meridians through the north pole focuses again, so the endpoint map from initial direction to final point loses rank.
[example: Catenoid And Goldschmidt Competitor]
For a concrete symmetric model, take two coaxial circles of radius $R$ in the planes $x=-h$ and $x=h$, and rotate a positive graph $y=y(x)$ about the $x$-axis. Its area is
\begin{align*}
A[y]=2\pi\int_{-h}^{h} y(x)\sqrt{1+y'(x)^2}\,dx.
\end{align*}
The Euler--Lagrange equation for $L(y,y')=y\sqrt{1+y'^2}$ has the first integral
\begin{align*}
L-y'L_{y'}=y\sqrt{1+y'^2}-y'\frac{yy'}{\sqrt{1+y'^2}}=\frac{y}{\sqrt{1+y'^2}}=a,
\end{align*}
where $a>0$ is constant. Solving $\frac{y}{\sqrt{1+y'^2}}=a$ gives the catenary
\begin{align*}
y(x)=a\cosh\left(\frac{x}{a}\right)
\end{align*}
in the symmetric case, and its rotation is a catenoid. The boundary condition is
\begin{align*}
R=a\cosh\left(\frac{h}{a}\right).
\end{align*}
Write $\tau=h/a$, so $R=a\cosh \tau$. The catenoid area is
\begin{align*}
A_{\mathrm{cat}}=2\pi a^2\int_{-\tau}^{\tau}\cosh^2 t\,dt.
\end{align*}
Since $\cosh^2t=(1+\cosh 2t)/2$,
\begin{align*}
\int_{-\tau}^{\tau}\cosh^2t\,dt=\tau+\sinh \tau\cosh \tau.
\end{align*}
Thus
\begin{align*}
A_{\mathrm{cat}}=2\pi a^2\bigl(\tau+\sinh \tau\cosh \tau\bigr).
\end{align*}
Using $a=R/\cosh \tau$, this becomes
\begin{align*}
A_{\mathrm{cat}}=2\pi R^2\left(\frac{\tau}{\cosh^2\tau}+\tanh \tau\right).
\end{align*}
The Goldschmidt competitor consists of the two disks spanning the boundary circles, so its area is
\begin{align*}
A_{\mathrm{Gold}}=2\pi R^2.
\end{align*}
Hence the Goldschmidt competitor has smaller area exactly when
\begin{align*}
\frac{\tau}{\cosh^2\tau}+\tanh \tau>1.
\end{align*}
For instance, at $\tau=1$,
\begin{align*}
\frac{1}{\cosh^2 1}+\tanh 1=\frac{4e^2}{(e^2+1)^2}+\frac{e^2-1}{e^2+1}.
\end{align*}
Putting the two terms over the common denominator $(e^2+1)^2$ gives
\begin{align*}
\frac{4e^2}{(e^2+1)^2}+\frac{e^2-1}{e^2+1}=\frac{e^4+4e^2-1}{(e^2+1)^2}.
\end{align*}
Because
\begin{align*}
e^4+4e^2-1-(e^2+1)^2=2(e^2-1)>0,
\end{align*}
we have $A_{\mathrm{cat}}>A_{\mathrm{Gold}}$ at $\tau=1$.
The Jacobi field coming from varying the neck parameter $a$ in the family $y_a(x)=a\cosh(x/a)$ is
\begin{align*}
\frac{\partial y_a}{\partial a}(x)=\cosh\left(\frac{x}{a}\right)-\frac{x}{a}\sinh\left(\frac{x}{a}\right).
\end{align*}
At the endpoint $x=h$, this equals
\begin{align*}
\frac{\partial y_a}{\partial a}(h)=\cosh \tau-\tau\sinh \tau=\cosh \tau(1-\tau\tanh \tau).
\end{align*}
Thus the endpoint Jacobi field vanishes precisely when
\begin{align*}
\tau\tanh \tau=1.
\end{align*}
This is the conjugate-point threshold for this symmetric catenoid family. The area comparison above shows that a catenoid may still be locally stable against nearby smooth variations while already losing the global comparison with the discontinuous two-disk competitor. The example therefore separates second-variation stability from global minimality.
[/example]
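The two thresholds in the catenoid example can be located numerically. The sketch below solves $\tau\tanh\tau=1$ and $\tau/\cosh^2\tau+\tanh\tau=1$ by bisection; the helper names and the bracketing intervals are assumptions chosen for illustration.

```python
import math

def area_ratio(tau):
    """A_cat / A_Gold = tau / cosh(tau)^2 + tanh(tau)."""
    return tau / math.cosh(tau) ** 2 + math.tanh(tau)

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Conjugate-point threshold: tau * tanh(tau) = 1.
tau_conj = bisect(lambda t: t * math.tanh(t) - 1.0, 0.5, 2.0)
# Equal-area threshold: the catenoid and the two disks have the same area.
tau_equal = bisect(lambda t: area_ratio(t) - 1.0, 0.1, 1.0)
```

With these brackets the equal-area value comes out near $0.64$ and the conjugate-point value near $1.20$, so the value $\tau=1$ used in the example lies in the window where the catenoid still passes the Jacobi test but already has larger area than the two disks.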
The lesson is that Jacobi's theorem is local in both the interval and the topology of admissible curves. It gives an exact second-variation criterion in the classical fixed-endpoint setting, but it does not replace comparison with distant competitors.
## Summary of the Jacobi Test
For a scalar fixed-endpoint variational problem, the practical test proceeds as follows. First solve the Euler--Lagrange equation and obtain the candidate extremal $y$. Next compute
\begin{align*}
P=L_{y'y'}(x,y,y'), \qquad Q=L_{yy}(x,y,y')-\frac{d}{dx}L_{yy'}(x,y,y').
\end{align*}
Then solve the principal Jacobi equation
\begin{align*}
-(Pu')' + Qu =0, \qquad u(a)=0, \qquad u'(a)=1.
\end{align*}
If $P>0$ and this solution has no zero in $(a,b]$, the second variation is positive definite and the extremal is a strict weak local minimum under the standard smoothness hypotheses. If a zero occurs in $(a,b)$, the extremal cannot be a weak local minimum. If the first zero occurs exactly at $b$, the second variation is degenerate and further analysis is needed.
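The last step of this test can be sketched numerically. The function below integrates the principal Jacobi equation as the first-order system $u'=v/P$, $v'=Qu$ with $v=Pu'$ and reports the first zero it detects. The function name, the fixed-step RK4 scheme, and the step count are assumptions made for illustration; a zero lying exactly at $b$ may be missed by rounding, so the borderline case still needs separate analysis.

```python
import math

def first_conjugate_point(P, Q, a, b, steps=20000):
    """First zero in (a, b] of the principal solution of -(P u')' + Q u = 0.

    Integrates u' = v / P, v' = Q u (with v = P u') from u(a) = 0, u'(a) = 1
    using the classical RK4 scheme. Returns None if no zero is detected.
    """
    def rhs(x, u, v):
        return v / P(x), Q(x) * u

    step = (b - a) / steps
    x, u, v = a, 0.0, P(a)
    for _ in range(steps):
        k1u, k1v = rhs(x, u, v)
        k2u, k2v = rhs(x + step / 2, u + step / 2 * k1u, v + step / 2 * k1v)
        k3u, k3v = rhs(x + step / 2, u + step / 2 * k2u, v + step / 2 * k2v)
        k4u, k4v = rhs(x + step, u + step * k3u, v + step * k3v)
        u_new = u + step / 6 * (k1u + 2 * k2u + 2 * k3u + k4u)
        v_new = v + step / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        if u_new <= 0.0:
            # Linear interpolation locates the crossing inside this step.
            return x + step * u / (u - u_new)
        x, u, v = x + step, u_new, v_new
    return None

# Constant coefficients P = 1, Q = -4: the first zero should be near pi / 2.
c = first_conjugate_point(lambda x: 1.0, lambda x: -4.0, 0.0, 2.0)
# On the shorter interval [0, 1] no conjugate point is found.
none_found = first_conjugate_point(lambda x: 1.0, lambda x: -4.0, 0.0, 1.0)
```

Writing the equation in the variables $(u, Pu')$ avoids differentiating $P$ numerically, which is why the sketch only needs the values of $P$ and $Q$.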
The conjugate-point theory of Chapter 5 tests minimality under small smooth perturbations in the [weak topology](/page/Weak%20Topology). This chapter asks a stronger question: does the extremal remain a minimiser against all nearby curves, even those with large slopes? The Weierstrass excess function, built from the Lagrangian itself, provides a pointwise test for this strong minimality.
# 6. Weierstrass Excess Function and Strong Extrema
Chapters 4 and 5 developed the second variation, Legendre's condition, and Jacobi's conjugate point theory as tests for extrema measured in a weak $C^1$ topology. This chapter changes the comparison class. We ask when an extremal minimises against all nearby curves in the stronger geometric sense of being close in position, even if their slopes differ substantially. The answer requires the Weierstrass excess function, which measures how much the integrand rises when the actual slope is replaced by a competing slope while the comparison is made against the tangent field of a family of extremals.
## Weak and Strong Comparison
The Euler--Lagrange equation is obtained by perturbing a curve by small $C^1$ variations. For many geometric problems, however, the natural competitors are only required to stay close to the curve in space; they may have corners or steep slopes. We therefore need to distinguish extrema relative to the topology controlling both $y$ and $y'$ from extrema relative to the topology controlling only $y$.
[definition: Weak Local Minimum]
Let $L: [a,b] \times U \times \mathbb R^n \to \mathbb R$ be $C^2$, where $U \subset \mathbb R^n$ is open, and let $y_0 \in C^1([a,b];U)$ satisfy fixed endpoint conditions. The functional
\begin{align*}
J:\mathcal A_w \to \mathbb R,
\qquad
J[y] = \int_a^b L(x,y(x),y'(x))\,dx,
\end{align*}
where $\mathcal A_w \subset C^1([a,b];U)$ is the class of admissible curves satisfying the prescribed endpoint conditions, has a weak local minimum at $y_0$ if there exists $\varepsilon>0$ such that $J[y] \ge J[y_0]$ for every $y \in \mathcal A_w$ with
\begin{align*}
\|y-y_0\|_{C^1} < \varepsilon.
\end{align*}
[/definition]
This is the topology used by the first and second variation. It prevents competitors from having large changes in velocity, so it cannot detect the cost of sharply kinked comparison curves that remain close to $y_0$. To state the stronger geometric problem, the next definition keeps only the $C^0$ closeness of curves and allows piecewise smooth competitors.
[definition: Strong Local Minimum]
Let $L: [a,b] \times U \times \mathbb R^n \to \mathbb R$ be $C^2$, where $U \subset \mathbb R^n$ is open, and let $y_0 \in C^1([a,b];U)$ satisfy fixed endpoint conditions. The functional
\begin{align*}
J:\mathcal A_s \to \mathbb R,
\qquad
J[y] = \int_a^b L(x,y(x),y'(x))\,dx,
\end{align*}
where $\mathcal A_s$ is the class of admissible piecewise $C^1$ curves $y:[a,b]\to U$ satisfying the prescribed endpoint conditions and for which the integral is finite, has a strong local minimum at $y_0$ if there exists $\varepsilon>0$ such that $J[y] \ge J[y_0]$ for every $y \in \mathcal A_s$ with
\begin{align*}
\|y-y_0\|_{C^0} < \varepsilon.
\end{align*}
[/definition]
A strong minimum is a more demanding object. Since the derivative of the competitor is not controlled in the hypothesis, a sufficient condition must compare the Lagrangian at many possible velocities, not only at velocities close to $y_0'$.
[example: Weak Minimum Need Not Control Corners]
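Let $L=L(q)$ be a $C^2$ Lagrangian depending only on the velocity, with $L''(0)>0$, and take $y_0\equiv 0$ on $[a,b]$ with zero endpoint values, so that $J[y]=\int_a^b L(y'(x))\,dx$ and $y_0$ is an extremal satisfying the strengthened Legendre condition. In this example the symbol $Q$ denotes a slope, not the Jacobi coefficient. Assume there is a velocity $Q>0$ such that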
\begin{align*}
L(Q)-L(0)-L'(0)Q<0.
\end{align*}
Write this negative number as $-\eta$, with $\eta>0$. Since $L''(0)>0$, Taylor expansion at $0$ gives
\begin{align*}
L(-r)-L(0)+L'(0)r=\frac{1}{2}L''(0)r^2+o(r^2)
\end{align*}
as $r\to 0^+$. Hence we may choose $r>0$ so small that
\begin{align*}
\frac{Q}{r}\bigl(L(-r)-L(0)+L'(0)r\bigr)<\frac{\eta}{2}.
\end{align*}
Choose a short interval and define a piecewise linear competitor that has slope $Q$ for time $\ell$, then slope $-r$ for time $Q\ell/r$, and is equal to $0$ outside this interval. Its net displacement is
\begin{align*}
Q\ell+(-r)\frac{Q\ell}{r}=0,
\end{align*}
so it returns to $y_0=0$ and keeps the same endpoints. Its maximum height is $Q\ell$, so by taking $\ell$ small we make $\|y-y_0\|_{C^0}$ arbitrarily small.
On the two sloped pieces, the change in action relative to $y_0$ is
\begin{align*}
\ell\bigl(L(Q)-L(0)\bigr)+\frac{Q\ell}{r}\bigl(L(-r)-L(0)\bigr).
\end{align*}
Insert and subtract the tangent-line terms:
\begin{align*}
\ell\bigl(L(Q)-L(0)-L'(0)Q\bigr)+\frac{Q\ell}{r}\bigl(L(-r)-L(0)+L'(0)r\bigr).
\end{align*}
The omitted linear part is zero because
\begin{align*}
\ell L'(0)Q+\frac{Q\ell}{r}L'(0)(-r)=0.
\end{align*}
Therefore
\begin{align*}
J[y]-J[y_0]<-\eta\ell+\frac{\eta}{2}\ell=-\frac{\eta}{2}\ell<0.
\end{align*}
Thus the competitor can be arbitrarily close in position while lowering the action. The positivity of $L''(0)$ only controls slopes near $0$, so strong minimality requires a finite-slope inequality such as nonnegativity of the Weierstrass excess.
[/example]
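A concrete instance makes the zigzag construction tangible. The sketch below uses the velocity-only Lagrangian $L(q)=q^2-q^3$, the steep slope $Q=2$, and the return slope $r=1/2$; all three are illustrative assumptions chosen so that $L''(0)>0$ and $L(Q)-L(0)-L'(0)Q<0$. Because the competitor is piecewise linear, the change in action is computed exactly rather than by quadrature.

```python
def L(q):
    """Illustrative velocity-only Lagrangian (an assumption): L''(0) = 2 > 0."""
    return q ** 2 - q ** 3

def dL(q):
    return 2 * q - 3 * q ** 2

Q, r = 2.0, 0.5  # steep slope and gentle return slope (assumptions)
eta = -(L(Q) - L(0.0) - dL(0.0) * Q)  # positive gap below the tangent line

def action_change(ell):
    """J[y] - J[0] for the zigzag: slope Q for time ell, then -r for Q*ell/r."""
    return ell * (L(Q) - L(0.0)) + (Q * ell / r) * (L(-r) - L(0.0))

def sup_norm(ell):
    """Maximum height of the zigzag, reached where the slope switches."""
    return Q * ell

# Shrinking ell shrinks the C^0 distance while the action change stays negative.
samples = [(ell, sup_norm(ell), action_change(ell)) for ell in (1e-1, 1e-2, 1e-3)]
```

For these values the action change is $-2.5\,\ell$, which is below the bound $-\tfrac{\eta}{2}\ell=-2\ell$ from the example, while the $C^0$ distance $Q\ell$ tends to zero with $\ell$.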
## Fields of Extremals and Slope Selection
A single extremal does not tell us what slope should be used when a nearby competitor passes through a nearby point. Weierstrass' method solves this by embedding the candidate curve in a family of extremals. The family assigns to each nearby point $(x,y)$ a preferred slope $p(x,y)$, and the comparison curve is judged against that field.
[definition: Field of Extremals]
Let $D \subset [a,b] \times U$ be relatively open. A field of extremals on $D$ is a $C^1$ map
\begin{align*}
p: D \to \mathbb R^n
\end{align*}
such that through each point $(x_0,y_0)\in D$ there passes a unique $C^2$ solution $y$ of the Euler--Lagrange equation with $y(x_0)=y_0$ and $y'(x_0)=p(x_0,y_0)$, and the graph of this solution remains in $D$ locally.
[/definition]
The field turns a geometric neighbourhood into a family of candidate tangents, but a sufficiency theorem about a specified extremal $y_0$ needs an anchoring condition: along the graph of $y_0$, the field must choose the slope already belonging to $y_0$. This motivates the definition of a central field.
[definition: Central Field]
Let $y_0 \in C^2([a,b];U)$ be an extremal. A central field around $y_0$ is a field of extremals $p:D\to \mathbb R^n$ on a neighbourhood $D$ of the graph of $y_0$ such that
\begin{align*}
p(x,y_0(x)) = y_0'(x)
\end{align*}
for every $x\in [a,b]$.
[/definition]
Central fields are usually constructed from a one-parameter or $n$-parameter family of extremals. The next issue is existence: a field can fail when nearby extremals cross in a way that makes the slope through a point ambiguous. Jacobi's no-conjugate-point condition is the standard local nondegeneracy condition preventing that failure.
[quotetheorem:6994]
[citeproof:6994]
This theorem connects the Jacobi theory from the previous chapter with the Weierstrass method. The strengthened Legendre condition is used because the construction depends on solving the Euler--Lagrange equation smoothly from initial data; without that nondegeneracy, the differential equation may not determine a smooth local family of extremals. The absence of conjugate points prevents neighbouring extremals from meeting the central extremal with the same endpoint data, which would make the slope through a nearby point ambiguous. Endpoint assumptions matter because a family with a common endpoint is singular at that endpoint, so the local field must be understood through the patching argument in the proof rather than through the raw initial-slope parametrisation alone. If a conjugate point occurs, the projection from the extremal family to the $(x,y)$-plane can lose rank, and the Weierstrass comparison no longer has a single slope field to compare against.
## The Weierstrass Excess Function
We now need the quantity that measures the penalty for choosing a competitor slope $q$ instead of the field slope $p$. The first-order part in $q-p$ must be subtracted, because the Euler--Lagrange equation makes that part integrate to a boundary term along a field of extremals.
[definition: Weierstrass Excess Function]
Let $L: [a,b]\times U\times \mathbb R^n\to \mathbb R$ be $C^1$ in the velocity variable. The Weierstrass excess function is the map
\begin{align*}
\mathcal E:[a,b]\times U\times \mathbb R^n\times \mathbb R^n \to \mathbb R,
\qquad
\mathcal E(x,y,q,p)
= L(x,y,q)-L(x,y,p)-\partial_{y'}L(x,y,p)\cdot(q-p),
\end{align*}
where $x\in [a,b]$, $y\in U$, and $p,q\in \mathbb R^n$.
[/definition]
For fixed $(x,y,p)$, the expression is the vertical gap between $L(x,y,\cdot)$ at $q$ and the affine tangent approximation at $p$. Thus nonnegativity of $\mathcal E$ is a finite-slope version of convexity at the slope selected by the extremal field.
[remark: Relation With Legendre Condition]
If $L$ is $C^2$ in $y'$ and $q=p+r$, then Taylor expansion gives
\begin{align*}
\mathcal E(x,y,p+r,p)=\frac{1}{2}r^\top \partial^2_{y'y'}L(x,y,p)r+o(|r|^2)
\end{align*}
as $r\to 0$. Therefore nonnegativity of $\mathcal E$ for all small $r$ implies the Legendre condition, but it also tests velocities far from $p$.
[/remark]
The excess function is the additional ingredient missing from the weak theory. To use it inside an integral, we must separate the action into a part depending only on endpoints and a part equal to the accumulated excess. Hilbert's invariant integral supplies exactly that endpoint-dependent part.
[quotetheorem:3527]
[citeproof:3527]
The hypotheses in this theorem are not cosmetic. Closedness of the Hilbert form gives endpoint independence only on regions where the relevant curves can be deformed into one another without leaving the field domain; on a multiply connected domain, a closed form may have nonzero periods around holes. The graph-in-$D$ condition is also essential, because the field slope $p(x,y)$ and the differential form are not available outside the tube where the extremal family is single-valued. Thus the identity should be read as a local invariant-integral statement in the tube used for the sufficiency proof, not as a global path-independence theorem for arbitrary curves.
Subtracting this invariant quantity from $J[y]$ isolates the nonnegative part of the variation. Indeed, the definition of $\mathcal E$ gives the pointwise identity
\begin{align*}
L(x,y,y') = L(x,y,p)+\partial_{y'}L(x,y,p)\cdot(y'-p)+\mathcal E(x,y,y',p).
\end{align*}
[example: Convex Integrands]
Let $L(x,y,q)=a(x,y)|q|^2+V(x,y)$ with $a(x,y)>0$. Fix $(x,y)$ and write $a=a(x,y)$ and $V=V(x,y)$. Since the derivative is taken only in the velocity variable,
\begin{align*}
\partial_{y'}L(x,y,p)=2a p.
\end{align*}
Substituting this into the definition of the excess function gives
\begin{align*}
\mathcal E(x,y,q,p)=a|q|^2+V-\bigl(a|p|^2+V\bigr)-2ap\cdot(q-p).
\end{align*}
The potential terms cancel, so
\begin{align*}
\mathcal E(x,y,q,p)=a|q|^2-a|p|^2-2ap\cdot q+2a|p|^2.
\end{align*}
Combining the two $|p|^2$ terms yields
\begin{align*}
\mathcal E(x,y,q,p)=a\bigl(|q|^2-2p\cdot q+|p|^2\bigr).
\end{align*}
Because $|q-p|^2=(q-p)\cdot(q-p)=|q|^2-2p\cdot q+|p|^2$, we obtain
\begin{align*}
\mathcal E(x,y,q,p)=a(x,y)|q-p|^2.
\end{align*}
Thus every central field for an extremal yields a nonnegative excess term, and since $a(x,y)>0$, the excess is strictly positive exactly when $q\ne p$.
[/example]
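The closed form $\mathcal E=a|q-p|^2$ can be checked against the definition directly. The sketch below evaluates the excess from a finite-difference gradient in the velocity variable; the helper names, the frozen values of $a(x,y)$ and $V(x,y)$, and the sample vectors are assumptions for illustration.

```python
def grad(f, p, eps=1e-6):
    """Central-difference gradient of f at the point p (a list of floats)."""
    g = []
    for i in range(len(p)):
        up = p[:i] + [p[i] + eps] + p[i + 1:]
        down = p[:i] + [p[i] - eps] + p[i + 1:]
        g.append((f(up) - f(down)) / (2 * eps))
    return g

def excess(L_vel, q, p):
    """E(q, p) = L(q) - L(p) - dL(p) . (q - p) at a fixed point (x, y)."""
    g = grad(L_vel, p)
    linear = sum(g_i * (q_i - p_i) for g_i, q_i, p_i in zip(g, q, p))
    return L_vel(q) - L_vel(p) - linear

# Illustrative frozen values of a(x, y) and V(x, y) (assumptions).
a_val, V_val = 1.5, -0.7

def L_quadratic(q):
    return a_val * sum(q_i * q_i for q_i in q) + V_val

q, p = [2.0, -1.0, 0.5], [0.3, 0.4, -1.2]
numeric = excess(L_quadratic, q, p)
closed_form = a_val * sum((q_i - p_i) ** 2 for q_i, p_i in zip(q, p))
```

The same `excess` helper accepts any velocity slice `L_vel`, so it can be reused to probe integrands that are not convex in the velocity.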
## Weierstrass Sufficient Condition
The final step is to combine the invariant integral with positivity of the excess function. The theorem is local in the strong topology: the central field only needs to exist in a tube around the candidate curve, and competitors are restricted to that tube by their $C^0$ distance from $y_0$.
[quotetheorem:3528]
[citeproof:3528]
This theorem is a genuine strong sufficiency result. Unlike second-variation criteria, it allows competitors with slopes far from the reference slope, provided the finite excess remains nonnegative. The central field is necessary because the comparison must know which extremal slope belongs to each nearby point; without a single-valued field, the affine part subtracted in $\mathcal E$ is not tied to an invariant integral. The endpoint condition is equally important, since Hilbert's invariant integral cancels between $y$ and $y_0$ only when the boundary contribution is the same. The inequality must hold for every competing velocity $q$, not only for $q$ near $p(x,y)$, because strong competitors are controlled in position but may have large slopes on short intervals. Without strictness, the theorem proves minimality but not uniqueness: equality may persist along another integral curve of the field, or along a competitor whose velocity lies in a flat direction of the excess.
[example: Brachistochrone As A Strong Minimum]
For the brachistochrone functional, choose the downward vertical coordinate $y>0$ and omit the positive physical constant multiplying the travel time. The Lagrangian is
\begin{align*}
L(y,q)=\sqrt{\frac{1+q^2}{y}}=y^{-1/2}(1+q^2)^{1/2}.
\end{align*}
For fixed $y>0$, its first velocity derivative is
\begin{align*}
\partial_q L(y,q)=y^{-1/2}\frac{q}{\sqrt{1+q^2}}.
\end{align*}
Differentiating once more gives
\begin{align*}
\partial_{qq}L(y,q)=y^{-1/2}\left((1+q^2)^{-1/2}-q^2(1+q^2)^{-3/2}\right).
\end{align*}
Putting the two terms over the common denominator $(1+q^2)^{3/2}$ gives
\begin{align*}
\partial_{qq}L(y,q)=y^{-1/2}\frac{1+q^2-q^2}{(1+q^2)^{3/2}}=\frac{1}{\sqrt y(1+q^2)^{3/2}}>0.
\end{align*}
Now fix the slope $p$ selected by the cycloidal central field and compare it with an arbitrary competing slope $q$. The excess is
\begin{align*}
\mathcal E(y,q,p)=\frac{1}{\sqrt y}\left(\sqrt{1+q^2}-\sqrt{1+p^2}-\frac{p}{\sqrt{1+p^2}}(q-p)\right).
\end{align*}
Let $f(s)=\sqrt{1+s^2}$. Since
\begin{align*}
f''(s)=\frac{1}{(1+s^2)^{3/2}}>0,
\end{align*}
the function $f$ is convex. Hence
\begin{align*}
f(q)-f(p)-f'(p)(q-p)\ge 0,
\end{align*}
because the tangent line to a convex differentiable function lies below its graph. Since $1/\sqrt y>0$, this gives
\begin{align*}
\mathcal E(y,q,p)\ge 0.
\end{align*}
Before the first caustic, the cycloidal extremals give a central field, so the hypotheses of *[Weierstrass Sufficient Condition](/theorems/3528) For A Strong Minimum* apply in a tube around the chosen cycloidal arc. Therefore that cycloidal arc is a strong local minimum among nearby admissible curves with the same endpoints.
[/example]
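The convexity argument can be spot-checked on a grid. The following sketch evaluates the brachistochrone excess for a range of slopes and depths; the grid bounds and sample depths are arbitrary illustrative choices, and a grid check is of course no substitute for the convexity proof above.

```python
import math

def arc(s):
    """f(s) = sqrt(1 + s^2), the velocity factor of the Lagrangian."""
    return math.sqrt(1.0 + s * s)

def brachistochrone_excess(y, q, p):
    """Excess for L(y, q) = sqrt((1 + q^2) / y), with y > 0."""
    return (arc(q) - arc(p) - p / arc(p) * (q - p)) / math.sqrt(y)

slopes = [k / 4.0 for k in range(-40, 41)]  # slopes from -10 to 10
depths = [0.1, 1.0, 5.0]                    # sample values of y > 0
smallest = min(
    brachistochrone_excess(y, q, p)
    for y in depths for q in slopes for p in slopes
)
```

The minimum over the grid is attained on the diagonal $q=p$, where the excess vanishes, consistent with the tangent-line inequality for the convex function $f$.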
The brachistochrone illustrates how the theorem handles a non-quadratic integrand once the correct field of extremals is known. A second geometric case shows the same principle in its most familiar form: the chosen field consists of straight geodesics, and the excess inequality is the triangle inequality in analytic language.
[example: Geodesics In A Convex Region]
Let $\Omega\subset\mathbb R^n$ be convex and open, and let $A,B\in\Omega$. For piecewise $C^1$ curves $y:[a,b]\to\Omega$ with $y(a)=A$ and $y(b)=B$, the length functional is
\begin{align*}
J[y]=\int_a^b |y'(x)|\,dx.
\end{align*}
The straight segment
\begin{align*}
\gamma(x)=A+\frac{x-a}{b-a}(B-A)
\end{align*}
lies in $\Omega$ because each point of $\gamma$ is a convex combination of $A$ and $B$. Its derivative is
\begin{align*}
\gamma'(x)=\frac{B-A}{b-a}.
\end{align*}
Hence its length is
\begin{align*}
J[\gamma]=\int_a^b \left|\frac{B-A}{b-a}\right|\,dx.
\end{align*}
Since the integrand is constant,
\begin{align*}
J[\gamma]=(b-a)\frac{|B-A|}{b-a}=|B-A|.
\end{align*}
Now let $y$ be any admissible piecewise $C^1$ curve in $\Omega$ with the same endpoints. By the fundamental theorem of calculus on each smooth subinterval and summing over the partition,
\begin{align*}
B-A=y(b)-y(a)=\int_a^b y'(x)\,dx.
\end{align*}
Taking norms gives
\begin{align*}
|B-A|=\left|\int_a^b y'(x)\,dx\right|.
\end{align*}
The integral form of the triangle inequality gives
\begin{align*}
\left|\int_a^b y'(x)\,dx\right|\le \int_a^b |y'(x)|\,dx=J[y].
\end{align*}
Therefore
\begin{align*}
J[y]\ge |B-A|=J[\gamma].
\end{align*}
Thus the straight segment minimises length among all admissible curves contained in $\Omega$, so it is in particular a strong local minimum.
This example sits just outside the stated $C^2$ Weierstrass theorem because $q\mapsto |q|$ is not differentiable at $q=0$, let alone $C^2$. Away from zero velocity, the corresponding excess calculation is still transparent: for $p\ne 0$,
\begin{align*}
\mathcal E(q,p)=|q|-|p|-\frac{p}{|p|}\cdot(q-p).
\end{align*}
Expanding the last term gives
\begin{align*}
\mathcal E(q,p)=|q|-|p|-\frac{p}{|p|}\cdot q+\frac{p}{|p|}\cdot p.
\end{align*}
Since $\frac{p}{|p|}\cdot p=|p|$, this becomes
\begin{align*}
\mathcal E(q,p)=|q|-\frac{p}{|p|}\cdot q.
\end{align*}
By Cauchy--Schwarz,
\begin{align*}
\frac{p}{|p|}\cdot q\le \left|\frac{p}{|p|}\right||q|=|q|.
\end{align*}
Hence
\begin{align*}
\mathcal E(q,p)\ge 0.
\end{align*}
So the Weierstrass excess inequality is exactly the analytic form of the triangle-inequality argument for straight geodesics.
[/example]
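The inequality $J[y]\ge |B-A|$ can be illustrated with polygonal competitors, which are piecewise $C^1$. The sketch below samples random polylines between two fixed points and records how much longer each is than the straight segment. The cube $[-1,1]^3$ as convex region, the endpoints, the seed, and the number of interior vertices are assumptions for illustration.

```python
import math
import random

def polyline_length(points):
    """Length of the piecewise linear curve through the given points."""
    return sum(math.dist(u, v) for u, v in zip(points, points[1:]))

random.seed(0)  # fixed seed so the experiment is reproducible
A, B = (-0.5, 0.0, 0.0), (0.5, 0.2, -0.1)

def random_point_in_cube():
    """Random point of the cube [-1, 1]^3, a stand-in convex region."""
    return tuple(random.uniform(-1.0, 1.0) for _ in range(3))

shortfalls = []
for _ in range(1000):
    interior = [random_point_in_cube() for _ in range(5)]
    path = [A] + interior + [B]
    # Excess length of the competitor over the straight segment.
    shortfalls.append(polyline_length(path) - math.dist(A, B))
```

Every sampled competitor lies in the convex region because its vertices do and the segments between them are convex combinations of those vertices.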
The chapter therefore completes the progression from infinitesimal necessary conditions to a local sufficient condition in the strong topology. Jacobi theory supplies the central field, the excess function supplies the finite-slope positivity, and Hilbert's invariant integral turns those two pieces into a comparison of actions.
Chapters 2 through 5 developed necessary conditions—Euler–Lagrange, Legendre, and Jacobi—all extracting information from local differential structure. This chapter adds a global perspective: when the variational problem is invariant under a continuous symmetry, Noether's theorem couples that symmetry to conservation laws, which often simplify the differential equations and reveal special solutions.
# 7. Noether's Theorem
The preceding chapters developed the Euler--Lagrange equation, boundary conditions, and the second-variation tests as tools for detecting and classifying stationary curves. This chapter adds a structural principle: when a variational problem has a continuous symmetry, the Euler--Lagrange equation contains a hidden first integral. Noether's theorem is the systematic form of this principle, and it explains why the conservation laws of mechanics are not accidental features of special examples.
## Continuous Symmetries of Curves
The basic question is how to describe a family of transformations depending smoothly on a parameter, so that differentiating at parameter value zero gives the infinitesimal motion generated by the symmetry. In mechanics the independent variable is often time $t$, and the dependent variable is a curve $q: [a,b] \to \mathbb R^n$, but the same notation applies to variational problems with a general independent variable $x$.
[definition: One-Parameter Group of Transformations]
A one-parameter group of transformations on a set $M$ is a family $(\Phi_s)_{s \in I}$ of maps $\Phi_s: M \to M$, where $I \subset \mathbb R$ is an interval containing $0$, such that $\Phi_0 = \operatorname{id}_M$ and
\begin{align*}
\Phi_{s+r} = \Phi_s \circ \Phi_r
\end{align*}
whenever $s,r,s+r \in I$.
[/definition]
The group law says that applying the transformation for parameter $r$ and then for parameter $s$ is the same as applying it once for parameter $s+r$. In applications $M$ is usually an open subset of $\mathbb R^n$ or an extended configuration space containing both $t$ and $q$.
[example: Translations and Rotations]
Fix $v\in \mathbb R^n$. The translations $\Phi_s(q)=q+sv$ form a one-parameter group because $\Phi_0(q)=q$ and, for parameters $s,r$,
\begin{align*}
\Phi_s(\Phi_r(q))=(q+rv)+sv=q+(r+s)v=\Phi_{s+r}(q).
\end{align*}
Their infinitesimal generator is the constant vector field
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\Phi_s(q)=\left.\frac{d}{ds}\right|_{s=0}(q+sv)=v.
\end{align*}
For rotations, let $R_s\in SO(n)$ with $R_0=I$ and $R_{s+r}=R_sR_r$, and set $\Phi_s(q)=R_s q$. Then
\begin{align*}
\Phi_s(\Phi_r(q))=R_s(R_rq)=(R_sR_r)q=R_{s+r}q=\Phi_{s+r}(q).
\end{align*}
If $A=\dot R_0$, then the infinitesimal generator is
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}R_s q=\dot R_0q=Aq.
\end{align*}
Since $R_s\in SO(n)$, we have $R_s^\top R_s=I$ for every $s$. Differentiating this identity at $s=0$ gives
\begin{align*}
\dot R_0^\top R_0+R_0^\top\dot R_0=A^\top+A=0,
\end{align*}
so $A$ is skew-symmetric. Thus translations generate the constant velocity field $v$, while rotations generate the linear field $q\mapsto Aq$; these are the infinitesimal forms of spatial homogeneity and isotropy.
[/example]
The example shows that the finite transformation is often more information than the variational calculation needs. Noether's theorem differentiates the symmetry identity at $s=0$, so it requires the velocity field of the [group action](/page/Group%20Action) at the identity rather than the whole orbit of transformations.
[definition: Infinitesimal Generator]
Let $M \subseteq \mathbb R^m$ be open, and let $(\Phi_s)$ be a smooth one-parameter group of transformations on $M$. Its infinitesimal generator is the vector field $X: M \to \mathbb R^m$ defined by
\begin{align*}
X(z) = \left.\frac{d}{ds}\right|_{s=0}\Phi_s(z).
\end{align*}
[/definition]
The generator is the linearised symmetry. Once $X$ is known, many conservation laws can be written without mentioning the full family $\Phi_s$.
[example: Infinitesimal Rotation in the Plane]
Let $M=\mathbb R^2$, write $q=(q_1,q_2)$, and let $\Phi_s(q)=R_s q$ be counterclockwise rotation by angle $s$, where
\begin{align*}
R_s q=(q_1\cos s-q_2\sin s,\ q_1\sin s+q_2\cos s).
\end{align*}
The infinitesimal generator is obtained by differentiating each component at $s=0$:
\begin{align*}
X(q)=\left.\frac{d}{ds}\right|_{s=0}R_s q=\left(-q_1\sin 0-q_2\cos 0,\ q_1\cos 0-q_2\sin 0\right)=(-q_2,q_1).
\end{align*}
Thus $X(q)=Aq$ for the [linear map](/page/Linear%20Map) $A:\mathbb R^2\to\mathbb R^2$ defined by $A(q_1,q_2)=(-q_2,q_1)$.
To see that this vector is tangent to the circle through $q$, use the fact that the circle centered at the origin has radial direction $q$. The dot product is
\begin{align*}
q\cdot X(q)=(q_1,q_2)\cdot(-q_2,q_1)=-q_1q_2+q_2q_1=0.
\end{align*}
So $X(q)$ is perpendicular to the radius vector $q$, hence tangent to the circle $|q|=\text{constant}$ at $q$. This vector field records the instantaneous velocity with which the rotation group moves each point.
[/example]
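The generator computed in the example can be checked numerically. The following is a minimal sketch, assuming a central-difference approximation of the derivative at $s=0$; the helper names `rotate` and `generator_fd` and the sample point are illustrative choices, not part of the course text.

```python
import math

def rotate(s, q):
    """Counterclockwise rotation of q = (q1, q2) by angle s."""
    q1, q2 = q
    return (q1 * math.cos(s) - q2 * math.sin(s),
            q1 * math.sin(s) + q2 * math.cos(s))

def generator_fd(q, h=1e-6):
    """Central-difference approximation of d/ds (R_s q) at s = 0."""
    plus = rotate(h, q)
    minus = rotate(-h, q)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

q = (2.0, -1.0)                      # arbitrary sample point (assumption)
X_numeric = generator_fd(q)
X_exact = (-q[1], q[0])              # the formula X(q) = (-q2, q1)

# Tangency: the generator is orthogonal to the radius vector.
radial_dot = q[0] * X_exact[0] + q[1] * X_exact[1]

print(X_numeric, X_exact, radial_dot)
```

The finite-difference vector should agree with $(-q_2,q_1)$ up to discretisation error, and the dot product with $q$ vanishes, matching the tangency computation above.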
For variational integrals we also need to transform velocities. If $q_s(t)=\Phi_s(q(t))$, then differentiating with respect to $t$ gives $\dot q_s(t)=D\Phi_s(q(t))\dot q(t)$, so the action on curves determines the induced action on position-velocity pairs.
## Invariance of the Action
The next question is what it means for a variational problem to have a symmetry. The answer is not that the Lagrangian must be pointwise unchanged in every possible coordinate expression, but that the action assigned to each admissible curve is unchanged when the curve is replaced by its transform.
[definition: Invariance Under a Configuration-Space Symmetry]
Let $L: [a,b] \times U \times \mathbb R^n \to \mathbb R$ be a $C^1$ Lagrangian, where $U \subseteq \mathbb R^n$ is open. The action
\begin{align*}
J: C^1([a,b];U) \to \mathbb R, \qquad
J[q] = \int_a^b L(t,q(t),\dot q(t))\,dt
\end{align*}
is invariant under a smooth one-parameter group $(\Phi_s)$ on $U$ if, for every subinterval $[\alpha,\beta]\subseteq [a,b]$, every admissible $C^1$ curve $q: [\alpha,\beta] \to U$, and all sufficiently small $s$,
\begin{align*}
\int_\alpha^\beta L(t,\Phi_s(q(t)),D\Phi_s(q(t))\dot q(t))\,dt
= \int_\alpha^\beta L(t,q(t),\dot q(t))\,dt.
\end{align*}
[/definition]
This definition treats the independent variable $t$ as fixed. It covers spatial translations, rotations, and internal symmetries of the configuration variable. To use it inside a differential equation, we need a pointwise identity obtained by differentiating the action invariance condition and then localising in time.
[quotetheorem:3512]
[citeproof:3512]
This criterion is the local form of symmetry. The subinterval hypothesis is not cosmetic: invariance of one fixed integral over $[a,b]$ can hide cancellations between different parts of the curve, while Noether's theorem needs a density-level cancellation at each time. The regularity assumptions ensure that differentiating the transformed velocity is legitimate and that the resulting integrand is continuous enough for localisation.
The criterion also does not assert that every transformation producing a zero first variation along one particular curve is a symmetry of the variational problem. For instance, an accidental cancellation along a single extremal gives no conservation law for neighbouring solutions unless it comes from the invariance identity for all admissible curves. The theorem is therefore best read as a bridge from a uniform symmetry of the Lagrangian density to the algebraic expression that will cancel against the Euler-Lagrange equation.
[example: Rotational Invariance of a Central Force Lagrangian]
Let $\mathbb R^2_0=\mathbb R^2\setminus\{0\}$, and let
\begin{align*}
L(q,\nu)=\frac12|\nu|^2-V(|q|)
\end{align*}
on $\mathbb R^2_0\times \mathbb R^2$. For the rotation $\Phi_s(q)=R_s q$, the induced velocity transformation is $D\Phi_s(q)\nu=R_s\nu$, because $\Phi_s$ is linear. Since $R_s\in SO(2)$, we have $R_s^\top R_s=I$, so
\begin{align*}
|R_s q|^2=(R_s q)\cdot(R_s q)=q\cdot(R_s^\top R_s)q=q\cdot q=|q|^2.
\end{align*}
Both norms are nonnegative, hence $|R_s q|=|q|$. The same calculation gives
\begin{align*}
|R_s\nu|^2=(R_s\nu)\cdot(R_s\nu)=\nu\cdot(R_s^\top R_s)\nu=\nu\cdot\nu=|\nu|^2.
\end{align*}
Therefore
\begin{align*}
L(R_s q,R_s\nu)=\frac12|R_s\nu|^2-V(|R_s q|)=\frac12|\nu|^2-V(|q|)=L(q,\nu).
\end{align*}
Applying this pointwise to $(q(t),\dot q(t))$ shows that the action is invariant under rotations on every subinterval.
If $A=\dot R_0$, then the infinitesimal generator is
\begin{align*}
X(q)=\left.\frac{d}{ds}\right|_{s=0}R_s q=Aq.
\end{align*}
Also,
\begin{align*}
\partial_\nu L(q,\nu)=\nu,
\end{align*}
because differentiating $\frac12|\nu|^2=\frac12(\nu\cdot\nu)$ with respect to $\nu$ gives $\nu$, while $V(|q|)$ has no $\nu$-dependence. Thus *Noether Theorem for Configuration-Space Symmetries* gives the conserved quantity
\begin{align*}
\partial_\nu L(q,\nu)\cdot Aq=\nu\cdot Aq.
\end{align*}
For the standard counterclockwise generator $A(q_1,q_2)=(-q_2,q_1)$, this scalar is
\begin{align*}
\nu\cdot Aq=(\nu_1,\nu_2)\cdot(-q_2,q_1)=-\nu_1q_2+\nu_2q_1=q_1\nu_2-q_2\nu_1,
\end{align*}
the usual planar angular momentum.
[/example]
Time translations require a slightly larger notion because the independent variable itself is transformed. For the classical mechanics applications below, it is enough to record the resulting energy law for autonomous Lagrangians.
[definition: Autonomous Lagrangian]
A Lagrangian $L: U\times \mathbb R^n\to \mathbb R$ is autonomous if it has no explicit dependence on the independent variable $t$.
[/definition]
Autonomy is invariance under translations of time: replacing $q(t)$ by the same geometric motion with shifted parameter does not change the rule by which the action density is computed. The conserved quantity associated with this symmetry is the energy.
## Noether's Theorem
We now ask how the symmetry identity interacts with the Euler-Lagrange equation. The Euler-Lagrange equation converts $\partial_q L$ into a total derivative of $\partial_{\dot q}L$, and this is the step that turns infinitesimal invariance into a conserved expression.
[quotetheorem:3513]
[citeproof:3513]
This is the classical form of Noether's theorem for symmetries of the dependent variable. The conserved quantity is obtained by pairing the canonical momentum $\partial_{\dot q}L$ with the infinitesimal generator of the symmetry. Each hypothesis has a concrete role: invariance supplies the algebraic cancellation, the Euler-Lagrange equation supplies stationarity of the curve, and the differentiability assumptions allow the product rule computation.
If the potential is not invariant under the proposed symmetry, the conclusion fails in the expected way. For example, with
\begin{align*}
L(q,\nu)=\frac12|\nu|^2-V(q_1)
\end{align*}
on $\mathbb R^2\times \mathbb R^2$, translations in the $q_1$-direction are not symmetries unless $V$ is constant. The Euler-Lagrange equation gives $\ddot q_1=-V'(q_1)$, so the momentum component $\dot q_1$ is not conserved when $V'(q_1)\ne 0$ along the motion.
The theorem also has a deliberately limited scope. It does not include transformations of time, symmetries that change the Lagrangian by a total derivative, or boundary terms coming from gauge-type transformations. Those extensions use the same cancellation principle but add extra terms to the conserved quantity; the autonomous energy law below is the first such extension needed in the classical course.
[remark: Momentum Map Viewpoint]
For each generator $X$, Noether's theorem gives the scalar quantity $\partial_{\dot q}L\cdot X(q)$. If a Lie group has several independent generators, these scalar quantities assemble into a map from phase space to the dual of the Lie algebra. In this classical course we use only the coordinate form, but the same calculation is the seed of the modern momentum map formalism.
[/remark]
The momentum-map viewpoint explains how many configuration-space symmetries can be packaged together, but it also shows the limitation of the version just proved: it pairs momentum only with vector fields on configuration space. The next missing symmetry is translation of the independent variable itself. This case matters because autonomous mechanical systems have no preferred time origin, and the corresponding first integral should measure energy rather than linear or angular momentum. This motivates the following theorem, obtained by the same Euler-Lagrange cancellation in a slightly different algebraic form.
[quotetheorem:6995]
[citeproof:6995]
This conserved energy is the Noether quantity for time-translation symmetry. Autonomy is essential: if $L=L(t,q,\dot q)$ depends explicitly on $t$, the same computation gives
\begin{align*}
\frac{d}{dt}\left(\partial_{\dot q}L(t,q,\dot q)\cdot \dot q-L(t,q,\dot q)\right)
=-\partial_t L(t,q,\dot q)
\end{align*}
along an Euler-Lagrange solution. Thus a periodically forced or time-dependent system generally exchanges energy with the forcing rather than conserving the expression above.
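The balance law above can be illustrated on a forced oscillator. This is a sketch under stated assumptions: the Lagrangian $L(t,q,v)=\frac12v^2-\frac12q^2+q\cos t$ is a hypothetical example chosen here, for which $\partial_tL=-q\sin t$, and a classical fourth-order Runge-Kutta step is used only as a convenient integrator. The third state component accumulates $\int -\partial_tL\,dt$ so that it can be compared with the change in $\partial_{\dot q}L\cdot\dot q-L$.

```python
import math

def rhs(t, state):
    """Euler-Lagrange equation q'' = -q + cos t, plus the integrand -dL/dt = q sin t."""
    q, v, balance = state
    return (v, -q + math.cos(t), q * math.sin(t))

def rk4_step(t, state, dt):
    """One classical Runge-Kutta step for the system defined by rhs."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(t, state)
    k2 = rhs(t + dt / 2, shifted(state, k1, dt / 2))
    k3 = rhs(t + dt / 2, shifted(state, k2, dt / 2))
    k4 = rhs(t + dt, shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def energy(t, q, v):
    """The expression (dL/dv) * v - L for the hypothetical Lagrangian."""
    return 0.5 * v * v + 0.5 * q * q - q * math.cos(t)

n_steps, dt = 5000, 1e-3
t, state = 0.0, (1.0, 0.0, 0.0)      # q(0) = 1, v(0) = 0, empty accumulator
E0 = energy(t, state[0], state[1])
for i in range(n_steps):
    state = rk4_step(t, state, dt)
    t = (i + 1) * dt
E1 = energy(t, state[0], state[1])

energy_change = E1 - E0
accumulated = state[2]               # integral of -dL/dt along the solution
print(energy_change, accumulated)
```

The two printed numbers should agree to integrator accuracy, and neither is zero: the forcing does work on the oscillator, so the energy expression is not a first integral here.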
This statement is still a Lagrangian conservation law, not the full Hamiltonian formalism. Writing
\begin{align*}
H=\partial_{\dot q}L\cdot \dot q-L
\end{align*}
matches the usual Hamiltonian only after the Legendre transform between velocity and momentum is well behaved. For this chapter the important point is narrower: absence of explicit time dependence gives a first integral directly from the Euler-Lagrange equation.
## Standard Conservation Laws
The main use of Noether's theorem is that it turns visible symmetries of $L$ into computable first integrals. We now extract the three classical conservation laws: momentum from translations, angular momentum from rotations, and energy from time translations.
[example: Free Particle]
Let $L(q,\nu)=\frac12|\nu|^2=\frac12(\nu\cdot \nu)$ on $\mathbb R^n\times \mathbb R^n$. Its derivatives are
\begin{align*}
\partial_q L(q,\nu)=0
\end{align*}
and
\begin{align*}
\partial_\nu L(q,\nu)=\nu.
\end{align*}
Therefore the Euler-Lagrange equation is
\begin{align*}
\frac{d}{dt}\partial_\nu L(q(t),\dot q(t))=\partial_qL(q(t),\dot q(t)).
\end{align*}
Substituting the two derivatives gives
\begin{align*}
\frac{d}{dt}\dot q(t)=0.
\end{align*}
Hence $\ddot q(t)=0$, so $\dot q(t)$ is constant and $q(t)=q(a)+(t-a)\dot q(a)$.
For translations in a fixed direction $v\in\mathbb R^n$, the symmetry is $\Phi_s(q)=q+sv$ and its infinitesimal generator is $X(q)=v$. The Noether quantity from *Noether Theorem for Configuration-Space Symmetries* is
\begin{align*}
\partial_\nu L(q,\nu)\cdot X(q)=\nu\cdot v.
\end{align*}
Along a solution this becomes
\begin{align*}
\dot q(t)\cdot v.
\end{align*}
Since this scalar is constant for every $v\in\mathbb R^n$, each component of $\dot q(t)$ is constant; this is conservation of linear momentum.
For rotations, let $R_s$ be an orthogonal one-parameter group and write $A=\dot R_0$. The induced velocity is $R_s\nu$, and orthogonality gives $R_s^\top R_s=I$. Thus
\begin{align*}
|R_s\nu|^2=(R_s\nu)\cdot(R_s\nu)=\nu\cdot(R_s^\top R_s)\nu=\nu\cdot\nu=|\nu|^2.
\end{align*}
Therefore
\begin{align*}
L(R_s q,R_s\nu)=\frac12|R_s\nu|^2=\frac12|\nu|^2=L(q,\nu),
\end{align*}
so the free-particle Lagrangian is rotationally invariant. The infinitesimal generator is $X(q)=Aq$, where differentiating $R_s^\top R_s=I$ at $s=0$ gives $A^\top+A=0$, so $A$ is skew-symmetric. Noether's theorem then gives the conserved scalar
\begin{align*}
\partial_\nu L(q,\nu)\cdot Aq=\nu\cdot Aq.
\end{align*}
Along a solution this is
\begin{align*}
\dot q(t)\cdot A q(t).
\end{align*}
This family of conserved scalars is the angular momentum tensor in disguise. With the Frobenius pairing normalized by $\langle M,A\rangle=\frac12\operatorname{tr}(M^\top A)$ on skew-symmetric matrices, set
\begin{align*}
M=\dot q\otimes q-q\otimes \dot q.
\end{align*}
Then
\begin{align*}
\langle M,A\rangle=\frac12\bigl((\dot q\otimes q):A-(q\otimes \dot q):A\bigr).
\end{align*}
Using $(u\otimes w):A=u\cdot Aw$, this becomes
\begin{align*}
\langle M,A\rangle=\frac12\bigl(\dot q\cdot Aq-q\cdot A\dot q\bigr).
\end{align*}
Since $A^\top=-A$, we have
\begin{align*}
q\cdot A\dot q=(A^\top q)\cdot\dot q=(-Aq)\cdot\dot q=-\dot q\cdot Aq.
\end{align*}
Hence
\begin{align*}
\langle M,A\rangle=\dot q\cdot Aq.
\end{align*}
Using the generator $-Aq$ reverses the sign and gives the convention $q\otimes \dot q-\dot q\otimes q$ instead.
[/example]
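The identity $\langle M,A\rangle=\dot q\cdot Aq$ from the free-particle example can be checked on concrete numbers. The sketch below uses plain Python lists; the vectors and the skew-symmetric matrix are arbitrary illustrative values, and the helper names are assumptions introduced here.

```python
def outer(u, w):
    """Tensor product with entries (u ⊗ w)_ij = u_i w_j."""
    return [[ui * wj for wj in w] for ui in u]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def pairing(M, A):
    """<M, A> = (1/2) tr(M^T A) = (1/2) * sum_ij M_ij A_ij."""
    n = len(M)
    return 0.5 * sum(M[i][j] * A[i][j] for i in range(n) for j in range(n))

q = [1.0, 2.0, -1.0]                 # sample position (assumption)
qdot = [0.5, -1.0, 3.0]              # sample velocity (assumption)
A = [[0.0, -2.0, 1.0],
     [2.0, 0.0, -4.0],
     [-1.0, 4.0, 0.0]]               # skew-symmetric: A^T = -A

P, Q = outer(qdot, q), outer(q, qdot)
M = [[P[i][j] - Q[i][j] for j in range(3)] for i in range(3)]

lhs = pairing(M, A)                  # <M, A>
rhs = dot(qdot, matvec(A, q))        # qdot . A q
print(lhs, rhs)
```

Both sides evaluate to the same number for any skew-symmetric `A`; replacing `A` by a matrix that is not skew-symmetric breaks the agreement, which is where the step $q\cdot A\dot q=-\dot q\cdot Aq$ is used.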
The free particle displays the entire dictionary in the simplest setting: homogeneity of space, isotropy of space, and homogeneity of time give momentum, angular momentum, and energy. The Kepler problem keeps the same symmetry logic but adds a nonconstant potential.
[example: Kepler Problem]
Consider the planar Kepler Lagrangian
\begin{align*}
L(q,\nu)=\frac12|\nu|^2+\frac{\mu}{|q|}, \qquad q\in \mathbb R^2_0,
\end{align*}
where $\mu>0$. It is autonomous because $L$ has no explicit $t$-dependence, and
\begin{align*}
\partial_\nu L(q,\nu)=\nu.
\end{align*}
Thus *[Energy Conservation for Autonomous Lagrangians](/theorems/6995)* gives
\begin{align*}
E=\partial_\nu L(q,\dot q)\cdot \dot q-L(q,\dot q)=\dot q\cdot \dot q-\left(\frac12|\dot q|^2+\frac{\mu}{|q|}\right)=\frac12|\dot q|^2-\frac{\mu}{|q|}.
\end{align*}
The Lagrangian is rotationally invariant. If $\Phi_s(q)=R_s q$ with $R_s\in SO(2)$, then $D\Phi_s(q)\nu=R_s\nu$, and $R_s^\top R_s=I$ gives
\begin{align*}
|R_s q|^2=(R_s q)\cdot(R_s q)=q\cdot(R_s^\top R_s)q=q\cdot q=|q|^2.
\end{align*}
Since both norms are nonnegative, $|R_s q|=|q|$. The same calculation gives $|R_s\nu|=|\nu|$, so
\begin{align*}
L(R_s q,R_s\nu)=\frac12|R_s\nu|^2+\frac{\mu}{|R_s q|}=\frac12|\nu|^2+\frac{\mu}{|q|}=L(q,\nu).
\end{align*}
For the standard infinitesimal rotation $A(q_1,q_2)=(-q_2,q_1)$, *Noether Theorem for Configuration-Space Symmetries* gives the conserved angular momentum
\begin{align*}
\partial_\nu L(q,\dot q)\cdot Aq=\dot q\cdot Aq=(\dot q_1,\dot q_2)\cdot(-q_2,q_1)=q_1\dot q_2-q_2\dot q_1.
\end{align*}
The Lagrangian is not translation invariant: for a translation $\Phi_s(q)=q+sv$, the kinetic term is unchanged, but the potential becomes $\mu/|q+sv|$. For example, with $q=(1,0)$ and $v=(1,0)$, one has $|q+sv|=|1+s|$, which is not equal to $|q|=1$ for small $s\ne 0$.
Writing $q=r(\cos\theta,\sin\theta)$ with $r=|q|>0$, the velocity is
\begin{align*}
\dot q=(\dot r\cos\theta-r\dot\theta\sin\theta,\ \dot r\sin\theta+r\dot\theta\cos\theta).
\end{align*}
Expanding the square gives
\begin{align*}
|\dot q|^2=(\dot r\cos\theta-r\dot\theta\sin\theta)^2+(\dot r\sin\theta+r\dot\theta\cos\theta)^2=\dot r^2+r^2\dot\theta^2.
\end{align*}
The angular momentum is
\begin{align*}
h=q_1\dot q_2-q_2\dot q_1=r\cos\theta(\dot r\sin\theta+r\dot\theta\cos\theta)-r\sin\theta(\dot r\cos\theta-r\dot\theta\sin\theta)=r^2\dot\theta.
\end{align*}
Substituting $\dot\theta=h/r^2$ into the energy gives
\begin{align*}
E=\frac12\dot r^2+\frac{h^2}{2r^2}-\frac{\mu}{r}.
\end{align*}
Thus energy and angular momentum reduce the planar Kepler motion to a one-dimensional radial equation with effective potential $h^2/(2r^2)-\mu/r$.
[/example]
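The two first integrals of the Kepler problem can be monitored along a numerically integrated orbit. This is a minimal sketch, assuming $\mu=1$, a velocity-Verlet integrator (chosen here for convenience, not prescribed by the course), and illustrative initial data giving a bound orbit. It also evaluates the reduced radial energy expression at the final state.

```python
import math

MU = 1.0                              # gravitational parameter (assumption)

def accel(q):
    """Euler-Lagrange acceleration q'' = -mu q / |q|^3."""
    r3 = (q[0] ** 2 + q[1] ** 2) ** 1.5
    return (-MU * q[0] / r3, -MU * q[1] / r3)

def energy(q, v):
    return 0.5 * (v[0] ** 2 + v[1] ** 2) - MU / math.hypot(q[0], q[1])

def ang_mom(q, v):
    return q[0] * v[1] - q[1] * v[0]

def verlet(q, v, dt, n):
    """Velocity-Verlet integration: half kick, drift, half kick."""
    a = accel(q)
    for _ in range(n):
        vh = (v[0] + 0.5 * dt * a[0], v[1] + 0.5 * dt * a[1])
        q = (q[0] + dt * vh[0], q[1] + dt * vh[1])
        a = accel(q)
        v = (vh[0] + 0.5 * dt * a[0], vh[1] + 0.5 * dt * a[1])
    return q, v

q0, v0 = (1.0, 0.0), (0.0, 1.2)       # illustrative bound orbit
E0, h0 = energy(q0, v0), ang_mom(q0, v0)
q1, v1 = verlet(q0, v0, 1e-3, 20000)
E1, h1 = energy(q1, v1), ang_mom(q1, v1)

# Reduced radial form of the energy at the final state.
r1 = math.hypot(q1[0], q1[1])
rdot1 = (q1[0] * v1[0] + q1[1] * v1[1]) / r1
E_radial = 0.5 * rdot1 ** 2 + h1 ** 2 / (2 * r1 ** 2) - MU / r1

print(E1 - E0, h1 - h0, E_radial - E1)
```

The angular momentum should stay constant up to rounding, since each kick is parallel to $q$ and each drift is parallel to $\dot q$. The energy is only approximately constant because the integrator is approximate, and the radial expression should reproduce the Cartesian energy at the same state.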
Rotational symmetry also appears in elastic variational problems, where the unknown curve represents the shape of a physical object rather than the trajectory of a particle.
[example: Elastic Rod With Rotational Symmetry]
Let $q:[0,\ell]\to\mathbb R^2$ be an arclength-parametrised rod, and define the second-order Lagrangian
\begin{align*}
\mathcal L(q,\dot q,\ddot q)=F(|\dot q|,|\ddot q|).
\end{align*}
For a rotation $\Phi_\theta(q)=R_\theta q$ with $R_\theta\in SO(2)$, the transformed curve is $q_\theta(s)=R_\theta q(s)$. Since $R_\theta$ is constant in $s$,
\begin{align*}
\dot q_\theta(s)=\frac{d}{ds}(R_\theta q(s))=R_\theta\dot q(s).
\end{align*}
Similarly,
\begin{align*}
\ddot q_\theta(s)=\frac{d}{ds}(R_\theta\dot q(s))=R_\theta\ddot q(s).
\end{align*}
Because $R_\theta^\top R_\theta=I$, the first derivative has unchanged norm:
\begin{align*}
|R_\theta\dot q|^2=(R_\theta\dot q)\cdot(R_\theta\dot q)=\dot q\cdot(R_\theta^\top R_\theta)\dot q=\dot q\cdot\dot q=|\dot q|^2.
\end{align*}
Both sides are nonnegative, so $|R_\theta\dot q|=|\dot q|$. The same calculation gives
\begin{align*}
|R_\theta\ddot q|^2=(R_\theta\ddot q)\cdot(R_\theta\ddot q)=\ddot q\cdot(R_\theta^\top R_\theta)\ddot q=|\ddot q|^2.
\end{align*}
Hence $|R_\theta\ddot q|=|\ddot q|$, and therefore
\begin{align*}
\mathcal L(q_\theta,\dot q_\theta,\ddot q_\theta)=F(|R_\theta\dot q|,|R_\theta\ddot q|)=F(|\dot q|,|\ddot q|)=\mathcal L(q,\dot q,\ddot q).
\end{align*}
Integrating this identity over $[0,\ell]$ shows that $J[q_\theta]=J[q]$, so rotating the whole rod is a symmetry of the bending energy.
If $A=\dot R_0$, then the infinitesimal rotation field is $X(q)=Aq$, and along the curve one has
\begin{align*}
\frac{d}{ds}X(q(s))=A\dot q(s).
\end{align*}
For a second-order Lagrangian, the angular Noether expression has the form
\begin{align*}
\left(\partial_{\dot q}\mathcal L-\frac{d}{ds}\partial_{\ddot q}\mathcal L\right)\cdot Aq+\partial_{\ddot q}\mathcal L\cdot A\dot q.
\end{align*}
The first factor is the higher-order analogue of the canonical momentum: $\partial_{\dot q}\mathcal L$ corrected by the derivative of $\partial_{\ddot q}\mathcal L$. The second term is the extra contribution caused by the dependence on $\ddot q$, which for an arclength-parametrised curve encodes the curvature. Thus the rotational symmetry of the rod produces an angular balance law, and its structure is exactly the higher-order analogue of pairing momentum with the infinitesimal generator.
[/example]
The chapter's message is that conservation laws are diagnostic: they reveal the symmetry group of the variational problem. Conversely, when solving Euler-Lagrange equations, finding a symmetry is often the fastest route to reducing the order of the differential equation.
Noether's theorem showed that the Lagrangian formulation produces a conserved quantity for each continuous symmetry, the simplest case being a conserved momentum for a cyclic coordinate. The next chapter introduces the Hamiltonian and the Legendre transform, recasting these conservation laws in the language of phase space and canonical coordinates.
# 8. Hamiltonian Formulation and Legendre Transform
The Euler-Lagrange equation treats the velocity $y'$ as part of the Lagrangian data, while mechanics often works instead with momentum. Building on Noether's conserved momenta from the previous chapter, this chapter explains the passage from the Lagrangian description to the Hamiltonian description by a Legendre transform in the velocity variable. The reward is a first-order system on phase space, a geometric formulation using the symplectic form, and a compact language for conservation laws already seen through Noether's theorem.
## From Velocity to Momentum
The guiding problem is whether the Euler-Lagrange equation can be rewritten as a first-order system whose variables are position and momentum. For a curve $y:[a,b] \to \mathbb R^n$ and a Lagrangian $L:[a,b]\times \mathbb R^n\times \mathbb R^n \to \mathbb R$, the momentum should measure the sensitivity of $L$ to changes in velocity.
[definition: Conjugate Momentum]
Let $L \in C^2([a,b]\times \mathbb R^n\times \mathbb R^n)$ and let $y:[a,b]\to \mathbb R^n$ be a $C^1$ curve. The conjugate momentum associated to $y$ is the map $p:[a,b]\to \mathbb R^n$ defined by
\begin{align*}
p(x) = \partial_v L(x,y(x),y'(x)).
\end{align*}
[/definition]
Momentum gives a new variable, but it is useful as a replacement for velocity only when no information is lost. The possible failure is that two different velocities can give the same momentum, or that the inverse relation changes unstably under small perturbations.
The next hypothesis packages the needed nondegeneracy as a uniform convexity condition in the velocity variable. By requiring the velocity Hessian to dominate a fixed positive multiple of the identity, it gives a robust criterion for treating momentum as an invertible replacement for velocity when passing from Lagrangian to Hamiltonian variables.
[definition: Strengthened Legendre Condition]
A Lagrangian $L \in C^2([a,b]\times \mathbb R^n\times \mathbb R^n)$ satisfies the strengthened Legendre condition on a set $U \subset [a,b]\times \mathbb R^n\times \mathbb R^n$ if there exists $\theta>0$ such that
\begin{align*}
\xi^\top \partial^2_{vv}L(x,y,v)\,\xi \ge \theta |\xi|^2
\end{align*}
for all $(x,y,v)\in U$ and all $\xi\in \mathbb R^n$.
[/definition]
This condition is stronger than the nondegeneracy condition used in Jacobi theory: the velocity Hessian must be positive definite, not merely invertible. It makes the momentum-velocity relation locally invertible and, in the standard convex mechanical examples, globally invertible. Once velocity can be recovered from $(x,y,p)$, the variational quantity to study is the dual energy obtained by subtracting the Lagrangian from $p\cdot v$. This motivates the following definition.
[definition: Hamiltonian]
Let $W\subset [a,b]\times \mathbb R^n\times \mathbb R^n$ be a set of triples $(x,y,p)$ for which there is a unique velocity $v=v(x,y,p)\in \mathbb R^n$ satisfying
\begin{align*}
p=\partial_v L(x,y,v).
\end{align*}
The Hamiltonian associated to $L$ on $W$ is the map $H:W\to \mathbb R$ defined by
\begin{align*}
H(x,y,p)=p\cdot v(x,y,p)-L(x,y,v(x,y,p)).
\end{align*}
[/definition]
The formula is the Legendre transform of $L$ in the velocity variable. To use $H$ in differential equations, we need to know how its derivatives are related to the derivatives of $L$; the important cancellation is that differentiating the hidden velocity $v(x,y,p)$ produces terms that cancel against $p=\partial_vL$. This motivates the following theorem.
[quotetheorem:6996]
[citeproof:6996]
The duality theorem gives the general identities, but its hypotheses are doing real work. The diffeomorphism assumption is what lets $p$ serve as a coordinate: if two different velocities give the same momentum, then $H(x,y,p)$ is not a single-valued function of $(x,y,p)$. The strengthened Legendre condition is a convenient convexity hypothesis ensuring that the velocity Hessian has a definite sign; for example, $L(v)=v^4/4$ has $p=v^3$, which is invertible, but the Hessian $\partial^2_{vv}L=3v^2$ vanishes at $v=0$, so the inverse $v=p^{1/3}$ fails to be differentiable at $p=0$. The theorem does not say that every Lagrangian admits a global Hamiltonian, nor does it address singular systems with constraints. It supplies the derivative identities needed next, where the Euler-Lagrange equation is rewritten as a first-order system.
A quadratic kinetic energy example shows how the formal construction recovers the usual mechanical Hamiltonian and gives a concrete model for the strengthened Legendre condition.
[example: Natural Mechanical Lagrangian]
Let $M$ be a symmetric positive definite $n\times n$ matrix and let
\begin{align*}
L(y,v)=\frac{1}{2}v^\top Mv-V(y).
\end{align*}
Since $M^\top=M$, differentiating the quadratic term in the velocity variable gives
\begin{align*}
\partial_v\left(\frac{1}{2}v^\top Mv\right)=\frac{1}{2}(M+M^\top)v=Mv.
\end{align*}
The potential term $V(y)$ has no velocity dependence, so the conjugate momentum is
\begin{align*}
p=\partial_vL(y,v)=Mv.
\end{align*}
Because $M$ is positive definite, it is invertible, and therefore the velocity is recovered from momentum by
\begin{align*}
v=M^{-1}p.
\end{align*}
Substituting this velocity into the Legendre-transform formula $H(y,p)=p\cdot v-L(y,v)$ gives
\begin{align*}
H(y,p)=p^\top M^{-1}p-\left(\frac{1}{2}(M^{-1}p)^\top M(M^{-1}p)-V(y)\right).
\end{align*}
Using $(M^{-1})^\top=M^{-1}$, we have
\begin{align*}
(M^{-1}p)^\top M(M^{-1}p)=p^\top M^{-1}MM^{-1}p=p^\top M^{-1}p.
\end{align*}
Hence
\begin{align*}
H(y,p)=p^\top M^{-1}p-\frac{1}{2}p^\top M^{-1}p+V(y)=\frac{1}{2}p^\top M^{-1}p+V(y).
\end{align*}
Also $\partial^2_{vv}L=M$, so the strengthened Legendre condition is exactly the statement that there is $\theta>0$ such that $\xi^\top M\xi\ge \theta|\xi|^2$ for every $\xi\in\mathbb R^n$. Thus the Hamiltonian splits into kinetic energy written in momentum variables plus potential energy.
[/example]
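The Legendre transform in this example can be carried out numerically and compared with the closed form. The sketch below assumes $n=2$, a particular symmetric positive definite matrix, and a hypothetical quadratic potential; none of these specific values come from the course text.

```python
import math

M = [[2.0, 0.5],
     [0.5, 1.0]]                      # symmetric positive definite (assumption)

def V(y):
    """Hypothetical potential, used only for illustration."""
    return 0.5 * (y[0] ** 2 + y[1] ** 2)

def matvec(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def inverse2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def dot(u, w):
    return u[0] * w[0] + u[1] * w[1]

def lagrangian(y, v):
    return 0.5 * dot(v, matvec(M, v)) - V(y)

def hamiltonian_via_legendre(y, p):
    v = matvec(inverse2(M), p)        # recover velocity from p = M v
    return dot(p, v) - lagrangian(y, v)

def hamiltonian_closed_form(y, p):
    return 0.5 * dot(p, matvec(inverse2(M), p)) + V(y)

# Smallest eigenvalue of M: a valid constant theta in the strengthened Legendre condition.
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
theta = 0.5 * (tr - math.sqrt(tr * tr - 4.0 * det))

y, p = [0.3, -0.7], [1.5, 2.0]
H_legendre = hamiltonian_via_legendre(y, p)
H_closed = hamiltonian_closed_form(y, p)
print(theta, H_legendre, H_closed)
```

The two Hamiltonian values should coincide, and `theta` is positive, which is the concrete content of the strengthened Legendre condition for a constant mass matrix.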
The example makes the duality identities look like the usual conversion between velocity and momentum. In the general theory, those same identities are the computational engine behind Hamilton's equations: differentiating $H$ with respect to momentum gives velocity, while differentiating with respect to position gives the negative force term from the Euler-Lagrange equation.
## Hamilton's Canonical Equations
The next question is whether the Hamiltonian system contains the same information as the Euler-Lagrange equation. Under the strengthened Legendre condition, a second-order equation for $y$ becomes a first-order system for $(y,p)$.
[definition: Hamiltonian Trajectory]
Let $H:[a,b]\times \mathbb R^n\times \mathbb R^n\to \mathbb R$ be a function with $H\in C^2([a,b]\times \mathbb R^n\times \mathbb R^n;\mathbb R)$. A Hamiltonian trajectory is a $C^1$ curve $(y,p):[a,b]\to \mathbb R^n\times \mathbb R^n$ satisfying
\begin{align*}
y'(x)=\partial_p H(x,y(x),p(x))
\end{align*}
and
\begin{align*}
p'(x)=-\partial_y H(x,y(x),p(x)).
\end{align*}
[/definition]
These are [Hamilton's canonical equations](/theorems/3515). The independent variable is still denoted $x$ to match the variational problem; in mechanics it is usually time $t$. The next theorem verifies that the new first-order system is equivalent to the old Euler-Lagrange equation whenever the Legendre transform is valid.
[quotetheorem:6997]
[citeproof:6997]
The theorem shows that Hamilton's formulation is not a different variational problem at this level. It is a change of coordinates from tangent variables $(y,v)$ to cotangent variables $(y,p)$, valid where the Legendre map is invertible. The invertibility hypothesis cannot be removed: if $L$ is independent of $v$, then $p=0$ for every velocity, so the momentum variable loses the entire velocity information and no equivalent first-order phase-space system can be recovered from $(y,p)$. Even nondegenerate but non-global Legendre maps may give only local Hamiltonian descriptions, so the theorem is not a global existence statement for all solutions. Its role is to justify transferring variational equations into phase space before studying geometry and conservation laws.
[example: Simple Pendulum]
For a pendulum of length $\ell$ and mass $m$, use the angle $\theta$ as coordinate and take
\begin{align*}
L(\theta,\dot{\theta})=\frac{1}{2}m\ell^2\dot{\theta}^{2}-mg\ell(1-\cos\theta).
\end{align*}
Differentiating with respect to the velocity variable $\dot{\theta}$ gives
\begin{align*}
p=\partial_{\dot{\theta}}L(\theta,\dot{\theta})=\partial_{\dot{\theta}}\left(\frac{1}{2}m\ell^2\dot{\theta}^{2}\right)-\partial_{\dot{\theta}}\left(mg\ell(1-\cos\theta)\right)=m\ell^2\dot{\theta}.
\end{align*}
Since $m\ell^2>0$, this relation can be inverted:
\begin{align*}
\dot{\theta}=\frac{p}{m\ell^2}.
\end{align*}
Substituting this velocity into $H(\theta,p)=p\dot{\theta}-L(\theta,\dot{\theta})$ gives
\begin{align*}
H(\theta,p)=p\frac{p}{m\ell^2}-\left(\frac{1}{2}m\ell^2\left(\frac{p}{m\ell^2}\right)^2-mg\ell(1-\cos\theta)\right).
\end{align*}
The quadratic term is
\begin{align*}
\frac{1}{2}m\ell^2\left(\frac{p}{m\ell^2}\right)^2=\frac{1}{2}m\ell^2\frac{p^2}{m^2\ell^4}=\frac{p^2}{2m\ell^2}.
\end{align*}
Therefore
\begin{align*}
H(\theta,p)=\frac{p^2}{m\ell^2}-\frac{p^2}{2m\ell^2}+mg\ell(1-\cos\theta)=\frac{p^2}{2m\ell^2}+mg\ell(1-\cos\theta).
\end{align*}
Hamilton's first equation gives
\begin{align*}
\dot{\theta}=\partial_pH(\theta,p)=\partial_p\left(\frac{p^2}{2m\ell^2}+mg\ell(1-\cos\theta)\right)=\frac{p}{m\ell^2}.
\end{align*}
Hamilton's second equation gives
\begin{align*}
\dot p=-\partial_\theta H(\theta,p)=-\partial_\theta\left(\frac{p^2}{2m\ell^2}+mg\ell(1-\cos\theta)\right)=-mg\ell\sin\theta.
\end{align*}
Differentiating $\dot{\theta}=p/(m\ell^2)$ with respect to time gives
\begin{align*}
\ddot{\theta}=\frac{\dot p}{m\ell^2}.
\end{align*}
Substituting $\dot p=-mg\ell\sin\theta$ yields
\begin{align*}
\ddot{\theta}=-\frac{mg\ell\sin\theta}{m\ell^2}=-\frac{g}{\ell}\sin\theta.
\end{align*}
Multiplying by $m\ell^2$ gives
\begin{align*}
m\ell^2\ddot{\theta}+mg\ell\sin\theta=0.
\end{align*}
Thus the Hamiltonian first-order system recovers the usual second-order pendulum equation.
[/example]
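Hamilton's equations for the pendulum can be integrated directly as a first-order system. This is a sketch assuming $m=\ell=1$, $g=9.81$, and a leapfrog (kick-drift-kick) scheme chosen here for convenience; it tracks how far $H$ drifts from its initial value and compares that value with the energy $2mg\ell$ of the unstable equilibrium at $\theta=\pi$.

```python
import math

m, ell, g = 1.0, 1.0, 9.81            # illustrative parameters (assumption)

def H(theta, p):
    return p * p / (2 * m * ell ** 2) + m * g * ell * (1 - math.cos(theta))

def step(theta, p, dt):
    """One leapfrog step for theta' = p/(m l^2), p' = -m g l sin(theta)."""
    p -= 0.5 * dt * m * g * ell * math.sin(theta)
    theta += dt * p / (m * ell ** 2)
    p -= 0.5 * dt * m * g * ell * math.sin(theta)
    return theta, p

theta, p = 1.0, 0.0                   # released from rest at 1 radian
H0 = H(theta, p)
separatrix_energy = 2 * m * g * ell   # value of H at theta = pi, p = 0
oscillates = H0 < separatrix_energy

max_drift, max_angle = 0.0, 0.0
for _ in range(10000):
    theta, p = step(theta, p, 1e-3)
    max_drift = max(max_drift, abs(H(theta, p) - H0))
    max_angle = max(max_angle, abs(theta))

print(oscillates, max_drift, max_angle)
```

Because the initial energy lies below the separatrix value, the trajectory oscillates and the angle never leaves a neighbourhood of $[-1,1]$. The small residual drift in $H$ is an artefact of the discretisation, not of the exact Hamiltonian flow.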
The Hamiltonian form also separates the roles of position and momentum in phase portraits. For the pendulum, level sets of $H$ distinguish oscillating trajectories from rotating trajectories, with separatrices passing through the unstable equilibrium at $\theta=\pi$.
## Symplectic Structure and Canonical Transformations
The coordinate equations above hide a geometric structure. The central question is which changes of phase-space coordinates preserve the Hamiltonian form of the equations.
[definition: Canonical Symplectic Form]
On phase space $\mathbb R^n_y\times \mathbb R^n_p=\mathbb R^{2n}$, the canonical symplectic form is the differential $2$-form $\omega\in \Omega^2(\mathbb R^{2n})$ defined by
\begin{align*}
\omega=\sum_{i=1}^n dy_i\wedge dp_i.
\end{align*}
For each $z\in\mathbb R^{2n}$, the value of $\omega$ at $z$ is the alternating bilinear map
\begin{align*}
\omega_z:T_z\mathbb R^{2n}\times T_z\mathbb R^{2n}\to \mathbb R
\end{align*}
determined by the displayed formula.
[/definition]
The symplectic form pairs changes in position with changes in momentum. In matrix language, if $z=(y,p)\in \mathbb R^{2n}$, Hamilton's equations can be written using
\begin{align*}
J=\begin{pmatrix}0&I_n\cr -I_n&0\end{pmatrix}
\end{align*}
as
\begin{align*}
z'=J\nabla_z H(x,z).
\end{align*}
With the convention $\omega=\sum_i dy_i\wedge dp_i$, this matrix equation is equivalent to
\begin{align*}
\omega(z',\zeta)=dH(\zeta)
\end{align*}
for every tangent vector $\zeta\in \mathbb R^{2n}$. This structure suggests a precise test for an acceptable change of phase-space coordinates: it should preserve $\omega$, since $\omega$ is the object that converts $dH$ into the Hamiltonian vector field. This motivates the following definition.
[definition: Canonical Transformation]
A $C^1$ diffeomorphism $\Phi:U\subset \mathbb R^{2n}\to V\subset \mathbb R^{2n}$ is canonical if it preserves the canonical symplectic form:
\begin{align*}
\Phi^*\omega=\omega.
\end{align*}
[/definition]
This condition is the phase-space analogue of an orthogonal change of variables preserving a Euclidean [inner product](/page/Inner%20Product), except that the preserved object is skew-symmetric rather than symmetric. The reason for imposing it is dynamical: in coordinates where the transformation is a sufficiently regular symplectic diffeomorphism and the transformed Hamiltonian is written by composition with the inverse coordinate map, Hamilton's equations keep their canonical form. The canonical hypothesis is essential because an arbitrary diffeomorphism may distort $\omega$ and therefore change the rule that turns $dH$ into a vector field. For instance, if $y$ is fixed and $p$ is replaced uniformly by $\lambda p$ with $\lambda\ne 1$, then each term $dy_i\wedge dp_i$ is multiplied by $\lambda$. The resulting dynamics is not determined by the original Hamiltonian form unless the Hamiltonian and the coordinate change are adjusted so that the symplectic equation is preserved. This observation does not classify all canonical transformations; it only gives the invariance principle needed for Hamiltonian dynamics. This is why phase space, rather than configuration space alone, is the natural setting for the geometric theory.
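For a linear map $z\mapsto Sz$ the condition $\Phi^*\omega=\omega$ reduces to the matrix identity $S^\top JS=J$, since $\omega(u,w)=u^\top Jw$ with the matrix $J$ above. This reduction is a standard fact that the text does not derive, so treat it as an assumption of the following sketch, which tests three linear maps for $n=1$. The function name `is_canonical` is illustrative.

```python
import math

J = [[0.0, 1.0],
     [-1.0, 0.0]]                     # matrix of omega = dy ^ dp for n = 1

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def is_canonical(S, tol=1e-12):
    """Check S^T J S = J entrywise, up to a rounding tolerance."""
    P = matmul(transpose(S), matmul(J, S))
    return all(abs(P[i][j] - J[i][j]) <= tol
               for i in range(2) for j in range(2))

c, s = math.cos(0.7), math.sin(0.7)
rotation = [[c, s], [-s, c]]          # mixes y and p
scaling = [[1.0, 0.0], [0.0, 2.0]]    # y fixed, p -> 2p (the lambda example)
squeeze = [[2.0, 0.0], [0.0, 0.5]]    # y -> 2y, p -> p/2

print(is_canonical(rotation), is_canonical(scaling), is_canonical(squeeze))
```

The momentum rescaling fails the test, in line with the factor $\lambda$ picked up by $dy_i\wedge dp_i$, while the compensated squeeze passes because the two scalings cancel.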
Canonical transformations preserve the symplectic form itself. Hamiltonian flows have the same coordinate-level preservation property. For an autonomous Hamiltonian $H(y,p)$, the Hamiltonian vector field in canonical coordinates is
\begin{align*}
X_H(y,p)=\left(\partial_pH(y,p),-\partial_yH(y,p)\right).
\end{align*}
Its ordinary Euclidean divergence is
\begin{align*}
\operatorname{div}X_H
=\sum_{i=1}^n \frac{\partial^2H}{\partial y_i\partial p_i}
-\sum_{i=1}^n \frac{\partial^2H}{\partial p_i\partial y_i}
=0,
\end{align*}
assuming the mixed partial derivatives commute. Thus the local phase-space volume does not expand or contract under the Hamiltonian flow. The Hamiltonian hypothesis is essential: a general first-order system such as $z'=-z$ has negative divergence and contracts volume. This invariant volume behaviour is one of the basic qualitative features distinguishing Hamiltonian dynamics from gradient-like dynamics.
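The volume statement can be probed numerically. The sketch below is an illustration under stated assumptions: it uses a pendulum Hamiltonian $H=p^2/2-\cos y$ (chosen here, not taken from the notes), a hand-written RK4 integrator, and a central finite-difference Jacobian of the time-one flow map. The determinant should be close to $1$ for the Hamiltonian field and close to $e^{-2}$ for the contracting field $z'=-z$ in the plane.

```python
import math

def rk4_step(f, z, h):
    k1 = f(z)
    k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)])
    k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)])
    k4 = f([zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h * (a + 2 * b + 2 * c + d) / 6
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def flow(f, z0, t, steps=2000):
    """Approximate time-t flow map of z' = f(z) started at z0."""
    z, h = list(z0), t / steps
    for _ in range(steps):
        z = rk4_step(f, z, h)
    return z

def pendulum(z):
    # Illustrative Hamiltonian H = p^2/2 - cos(y); X_H = (H_p, -H_y) = (p, -sin y).
    y, p = z
    return [p, -math.sin(y)]

def contracting(z):
    # The non-Hamiltonian comparison system z' = -z in the plane.
    return [-z[0], -z[1]]

def jacobian_det(f, z0, t, eps=1e-5):
    """Determinant of the flow-map Jacobian, by central differences."""
    cols = []
    for j in range(2):
        zp, zm = list(z0), list(z0)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = flow(f, zp, t), flow(f, zm, t)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    # cols[j][i] approximates d(phi_i)/d(z_j).
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

det_hamiltonian = jacobian_det(pendulum, [0.7, 0.3], 1.0)
det_contracting = jacobian_det(contracting, [0.7, 0.3], 1.0)
print(det_hamiltonian, det_contracting)
```

The integrator is not symplectic, so the first determinant equals $1$ only up to discretisation and finite-difference error; the point is the contrast with the second value.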
## Cyclic Coordinates and Conservation Laws
The final question is how the conservation laws from Noether's theorem look after passing to Hamiltonian variables. The simplest case is a coordinate that does not appear in the Hamiltonian.
[definition: Cyclic Coordinate]
Let $W\subset [a,b]\times \mathbb R^n\times \mathbb R^n$ and let $H:W\to \mathbb R$ be a Hamiltonian with coordinates $(x,y,p)$, where $y=(y_1,\dots,y_n)$. The coordinate $y_i$ is cyclic if $H$ is independent of $y_i$.
[/definition]
A cyclic coordinate removes one force term from Hamilton's second equation. The relevant obstruction is explicit dependence on $y_i$: when $H$ varies with that coordinate, Hamilton's equation gives $p_i'=-\partial_{y_i}H$, so a force term can change the corresponding momentum. If the dependence is absent, the same equation predicts that the momentum component is constant along every Hamiltonian trajectory.
For a Hamiltonian $H(y,p)$ on an open phase-space domain, if the coordinate $y_i$ is cyclic, then $\partial H/\partial y_i=0$. Hamilton's momentum equation therefore gives
\begin{align*}
\frac{d p_i}{dx}(x)=0.
\end{align*}
Consequently $p_i$ is constant along every Hamiltonian trajectory on which the cyclic-coordinate hypothesis holds. This is a direct but important use of the exact Hamiltonian equations. Its hypothesis is necessary: if $H$ depends on $y_i$, then $p_i'=-\partial_{y_i}H$ usually produces a nonzero force term, so the corresponding momentum need not be conserved. The calculation also does not claim that every conserved momentum comes from a visible coordinate independence; more general symmetries require the broader Noether viewpoint. A second basic conservation law comes from independence of the independent variable itself: if the system is autonomous, then the Hamiltonian has no explicit $x$ dependence and should be preserved along its own flow. This motivates the following theorem.
[quotetheorem:6842]
[citeproof:6842]
The autonomy hypothesis is the whole point of the energy conservation theorem. If $H=H(x,y,p)$ has explicit $x$ dependence, then differentiating along a Hamiltonian trajectory gives $\frac{d}{dx}H=\partial_xH+\partial_yH\cdot y'+\partial_pH\cdot p'=\partial_xH$, because the last two terms cancel by Hamilton's equations; the Hamiltonian value is therefore generally not constant. The theorem also concerns conservation along solutions, not arbitrary curves in phase space, and it does not by itself imply that level sets are compact or that the motion is periodic. These conservation laws are especially efficient in central-force problems, where angular coordinates are cyclic and the Hamiltonian is autonomous.
[example: Kepler Problem in Hamiltonian Form]
For a particle of mass $m>0$ moving in $\mathbb R^3\setminus\{0\}$ under the attractive potential $V(r)=-\kappa/r$, with $r=|q|$ and $\kappa>0$, take
\begin{align*}
H(q,p)=\frac{|p|^2}{2m}-\frac{\kappa}{|q|}.
\end{align*}
The momentum derivative is
\begin{align*}
\partial_pH(q,p)=\partial_p\left(\frac{p\cdot p}{2m}\right)-\partial_p\left(\frac{\kappa}{|q|}\right)=\frac{1}{2m}(2p)-0=\frac{p}{m}.
\end{align*}
For the position derivative, write $|q|=(q\cdot q)^{1/2}$. Then
\begin{align*}
\partial_{q_i}|q|^{-1}=-|q|^{-2}\partial_{q_i}|q|=-|q|^{-2}\frac{q_i}{|q|}=-\frac{q_i}{|q|^3}.
\end{align*}
Hence
\begin{align*}
\partial_{q_i}H(q,p)=-\kappa\partial_{q_i}|q|^{-1}=\frac{\kappa q_i}{|q|^3}.
\end{align*}
Therefore
\begin{align*}
\partial_qH(q,p)=\frac{\kappa q}{|q|^3}.
\end{align*}
Hamilton's equations are consequently
\begin{align*}
\dot q=\partial_pH(q,p)=\frac{p}{m}
\end{align*}
and
\begin{align*}
\dot p=-\partial_qH(q,p)=-\frac{\kappa q}{|q|^3}.
\end{align*}
Along any solution, the Hamiltonian derivative is
\begin{align*}
\frac{d}{dt}H(q(t),p(t))=\partial_qH\cdot \dot q+\partial_pH\cdot \dot p.
\end{align*}
Substituting the two Hamilton equations gives
\begin{align*}
\frac{d}{dt}H(q(t),p(t))=\frac{\kappa q}{|q|^3}\cdot \frac{p}{m}+\frac{p}{m}\cdot\left(-\frac{\kappa q}{|q|^3}\right)=0.
\end{align*}
Thus the total energy $H$ is constant.
The angular momentum is
\begin{align*}
L_{\mathrm{ang}}=q\times p.
\end{align*}
Differentiating and using the product rule for the cross product gives
\begin{align*}
\frac{d}{dt}(q\times p)=\dot q\times p+q\times \dot p.
\end{align*}
Substituting $\dot q=p/m$ and $\dot p=-\kappa q/|q|^3$ gives
\begin{align*}
\frac{d}{dt}(q\times p)=\frac{p}{m}\times p+q\times\left(-\frac{\kappa q}{|q|^3}\right).
\end{align*}
Since a vector crosses itself to zero,
\begin{align*}
\frac{p}{m}\times p=\frac{1}{m}(p\times p)=0
\end{align*}
and
\begin{align*}
q\times\left(-\frac{\kappa q}{|q|^3}\right)=-\frac{\kappa}{|q|^3}(q\times q)=0.
\end{align*}
Therefore
\begin{align*}
\frac{d}{dt}(q\times p)=0.
\end{align*}
The Kepler Hamiltonian is autonomous, giving [conservation of energy](/theorems/1335), and its force is always parallel to $q$, giving conservation of angular momentum; in spherical coordinates this is the same cyclic-coordinate mechanism expressed through angular variables.
[/example]
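The two conservation laws of the Kepler example can be observed along a numerically integrated orbit. This is a minimal sketch, assuming units with $m=\kappa=1$, an illustrative bound initial condition, and a plain RK4 integrator; none of these choices come from the notes, and RK4 conserves the invariants only up to discretisation error.

```python
import math

M, KAPPA = 1.0, 1.0  # illustrative units

def rhs(z):
    # z = (q1, q2, q3, p1, p2, p3); Hamilton's equations for H = |p|^2/(2m) - kappa/|q|.
    q, p = z[:3], z[3:]
    r3 = math.sqrt(sum(c * c for c in q)) ** 3
    return [c / M for c in p] + [-KAPPA * c / r3 for c in q]

def rk4_step(z, h):
    def shifted(k, s):
        return [zi + s * ki for zi, ki in zip(z, k)]
    k1 = rhs(z)
    k2 = rhs(shifted(k1, 0.5 * h))
    k3 = rhs(shifted(k2, 0.5 * h))
    k4 = rhs(shifted(k3, h))
    return [zi + h * (a + 2 * b + 2 * c + d) / 6
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def energy(z):
    q, p = z[:3], z[3:]
    return sum(c * c for c in p) / (2 * M) - KAPPA / math.sqrt(sum(c * c for c in q))

def angular_momentum(z):
    q, p = z[:3], z[3:]
    return [q[1] * p[2] - q[2] * p[1],
            q[2] * p[0] - q[0] * p[2],
            q[0] * p[1] - q[1] * p[0]]

z0 = [1.0, 0.0, 0.0, 0.0, 0.8, 0.3]  # negative energy, so the orbit is bound
z = list(z0)
for _ in range(5000):                # integrate to t = 5 with step 1e-3
    z = rk4_step(z, 1e-3)

energy_drift = abs(energy(z) - energy(z0))
angular_drift = max(abs(a - b)
                    for a, b in zip(angular_momentum(z), angular_momentum(z0)))
print(energy_drift, angular_drift)
```

Both drifts should be far smaller than the size of the quantities themselves, while the position and momentum change by order one over the same interval.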
The Hamiltonian viewpoint closes the circle from the Euler-Lagrange equation back to conservation laws. The Legendre transform converts tangent dynamics into phase-space dynamics; the symplectic form identifies the invariant geometric structure; cyclic coordinates and autonomy turn symmetry into conserved quantities in a compact first-order language.
The Hamilton–Jacobi equation is a first-order PDE in which a single unknown function, the action, encodes the information needed to integrate the canonical equations. Rather than solving a system of first-order ODEs directly, Hamilton–Jacobi theory solves one PDE whose characteristics recover the extremals and whose complete integrals reduce the integration of Hamilton's equations to quadratures.
# 9. Hamilton–Jacobi Theory
Hamilton-Jacobi theory recasts the variational problem in a form where a single first-order partial differential equation carries the information of Hamilton's canonical equations. Chapters 4 through 8 used extremals, second variation, conjugate points, conserved quantities, and Hamiltonian phase space to understand a given stationary curve. Here the viewpoint is global: instead of solving for one extremal at a time, we seek a family of extremals encoded by a generating function whose derivatives produce momenta.
The chapter has two linked aims. First, it explains how the Hamilton-Jacobi equation arises from Hamilton's principal function. Second, it shows how a sufficiently rich solution, called a complete integral, reduces Hamilton's system to quadratures. This is the classical bridge between variational calculus, mechanics, and integrable systems.
## The Hamilton-Jacobi Equation
The problem is to replace the system of canonical ordinary differential equations by a first-order PDE for one scalar function. Suppose the extremals of a functional have already been transformed by the Legendre correspondence into Hamiltonian form, with independent variable $x$, dependent variable $y \in \mathbb R^n$, momentum $p \in \mathbb R^n$, and Hamiltonian $H: U \times V \times \mathbb R^n \to \mathbb R$, where $U \subset \mathbb R$ and $V \subset \mathbb R^n$ are open.
[definition: Hamiltonian System]
Let $H \in C^2(U \times V \times \mathbb R^n)$. A curve $(y,p): I \to V \times \mathbb R^n$ is a solution of Hamilton's equations for $H$ if, for each $i=1,\dots,n$ and all $x \in I$,
\begin{align*}
\frac{dy_i}{dx} = \frac{\partial H}{\partial p_i}(x,y,p)
\end{align*}
and
\begin{align*}
\frac{dp_i}{dx} = -\frac{\partial H}{\partial y_i}(x,y,p).
\end{align*}
[/definition]
Hamilton's equations describe a curve in phase space. The next step is to ask whether such curves can lie on the graph of a gradient $p = \nabla_y S(x,y)$, so that the momentum is derived from one scalar function $S$ rather than chosen independently. This requirement is exactly what forces the Hamilton-Jacobi equation.
[definition: Hamilton-Jacobi Equation]
Let $H \in C^1(U \times V \times \mathbb R^n)$. A function $S \in C^1(U \times V)$ satisfies the Hamilton-Jacobi equation for $H$ if
\begin{align*}
\frac{\partial S}{\partial x}(x,y) + H\bigl(x,y,\nabla_y S(x,y)\bigr) = 0
\end{align*}
for all $(x,y) \in U \times V$.
[/definition]
The shorthand $S_x + H(x,y,S_y)=0$ abbreviates this formula. When $n=1$, $S_y$ is the ordinary partial derivative with respect to the dependent coordinate; when $n>1$, it denotes the gradient with respect to $y$.
[example: Free Particle Hamilton-Jacobi Equation]
Consider the free-particle Hamiltonian $H(p)=|p|^2/(2m)$ on $\mathbb R^n$, with $m>0$. Since this Hamiltonian has no explicit dependence on $x$ or $y$, substituting it into the Hamilton-Jacobi equation gives
\begin{align*}
S_x(x,y)+\frac{|\nabla_yS(x,y)|^2}{2m}=0.
\end{align*}
Fix a constant vector $a\in\mathbb R^n$ and define
\begin{align*}
S(x,y;a)=a\cdot y-\frac{|a|^2}{2m}x.
\end{align*}
Differentiating with respect to $x$ gives
\begin{align*}
S_x(x,y;a)=-\frac{|a|^2}{2m}.
\end{align*}
For each coordinate $y_i$, differentiating $a\cdot y=\sum_{k=1}^n a_k y_k$ gives
\begin{align*}
\frac{\partial S}{\partial y_i}(x,y;a)=a_i.
\end{align*}
Hence
\begin{align*}
\nabla_yS(x,y;a)=a.
\end{align*}
Substituting these two derivatives into the Hamilton-Jacobi equation gives
\begin{align*}
S_x+\frac{|\nabla_yS|^2}{2m}=-\frac{|a|^2}{2m}+\frac{|a|^2}{2m}=0.
\end{align*}
The generated momentum is therefore
\begin{align*}
p=\nabla_yS=a.
\end{align*}
Hamilton's equations for this Hamiltonian are
\begin{align*}
\frac{dy}{dx}=\nabla_pH(p)=\frac{p}{m}
\end{align*}
and
\begin{align*}
\frac{dp}{dx}=0,
\end{align*}
because $H$ is independent of $y$. With $p=a$, this becomes
\begin{align*}
\frac{dy}{dx}=\frac{a}{m}.
\end{align*}
Integrating componentwise gives
\begin{align*}
y(x)=y_0+\frac{a}{m}(x-x_0).
\end{align*}
Thus each parameter $a$ labels a constant momentum, and the scalar functions $S(x,y;a)$ encode the corresponding straight-line free-particle trajectories.
[/example]
The free particle shows that a solution $S$ can produce a momentum field. The variational question is whether any solution of the Hamilton-Jacobi equation has this property, and whether the resulting curves satisfy the canonical equations already obtained from the Euler-Lagrange theory.
[quotetheorem:3517]
[citeproof:3517]
This theorem turns the PDE into a device for generating characteristic curves. It also explains why the Hamilton-Jacobi equation gives a sufficient condition for extremals: once the graph $p=\nabla_y S$ is known, the canonical equations follow from solving only the first-order system for $y$. The $C^2$ hypotheses are used exactly where the proof differentiates the Hamilton-Jacobi equation in the $y$ variables and then compares the result with the derivative of $p=\nabla_yS$ along the curve. If $S$ is only $C^1$, the equation may still make classical pointwise sense, but this calculation no longer justifies Hamilton's momentum equation; weaker theories require viscosity or weak-solution methods not used in these notes. The theorem also does not assert that every Hamiltonian trajectory lies on a single-valued gradient graph $p=\nabla_yS$; caustics and crossing projections can prevent such a graph from existing globally.
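A concrete instance may help. The sketch below assumes the oscillator Hamiltonian $H=p^2/2+y^2/2$ and the function $S(x,y)=-\tfrac12 y^2\tan x$, which satisfies $S_x+\tfrac12 S_y^2+\tfrac12 y^2=0$ for $|x|<\pi/2$; both are illustrative choices, not taken from the notes. It integrates only the first-order equation $y'=\partial_pH=S_y(x,y)$ and then compares $(y,S_y)$ with the exact Hamiltonian solution $y=\cos x$, $p=-\sin x$ that starts at $(y,p)=(1,0)$.

```python
import math

def S_y(x, y):
    # Gradient of the illustrative solution S(x, y) = -(y**2 / 2) * tan(x).
    return -y * math.tan(x)

def integrate_y(x_end, y0, steps=1000):
    """RK4 for y' = H_p(x, y, S_y) = S_y(x, y), where H = p**2/2 + y**2/2."""
    h = x_end / steps
    x, y = 0.0, y0
    for _ in range(steps):
        k1 = S_y(x, y)
        k2 = S_y(x + h / 2, y + h * k1 / 2)
        k3 = S_y(x + h / 2, y + h * k2 / 2)
        k4 = S_y(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

x1 = 1.0
y1 = integrate_y(x1, 1.0)   # position obtained from the first-order system alone
p1 = S_y(x1, y1)            # momentum read off from the gradient graph

print(y1, math.cos(x1))     # the two positions should agree
print(p1, -math.sin(x1))    # the two momenta should agree
```

The restriction $|x|<\pi/2$ is not cosmetic: every curve $y=C\cos x$ of this field passes through $y=0$ at $x=\pi/2$, so the gradient graph cannot be continued through that focal point.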
[remark: Geometric Reading]
The graph of $d_yS$ is a Lagrangian submanifold of phase space in the symplectic language. Hamilton-Jacobi theory asks for such a graph to be invariant under the Hamiltonian flow, with the extra $S_x$ term accounting for the independent variable. The course uses this geometric picture only as guidance; the proofs here are coordinate computations.
[/remark]
The theorem gives a way to pass from $S$ to trajectories, but it does not yet explain how to find enough solutions $S$. The [method of characteristics](/page/Method%20of%20Characteristics) supplies the local construction and connects the PDE back to Hamilton's equations.
## Complete Integrals and Characteristics
The central question now is not whether a single solution $S$ generates some Hamiltonian motion, but whether a family of solutions generates all nearby motions. A first-order PDE typically needs a family of characteristic curves, and for Hamilton-Jacobi this family should carry $n$ independent constants matching the dimension of configuration space.
[definition: Complete Integral]
Let $H \in C^1(U \times V \times \mathbb R^n)$. A $C^2$ function
\begin{align*}
S: U \times V \times A \to \mathbb R,
\end{align*}
where $A \subset \mathbb R^n$ is open, is a complete integral of the Hamilton-Jacobi equation if, for each $a \in A$, the function $(x,y)\mapsto S(x,y,a)$ satisfies
\begin{align*}
S_x(x,y,a)+H(x,y,\nabla_yS(x,y,a))=0,
\end{align*}
and the matrix
\begin{align*}
\left(\frac{\partial^2 S}{\partial y_i\partial a_j}(x,y,a)\right)_{i,j=1}^n
\end{align*}
is invertible at the points under consideration.
[/definition]
The non-degeneracy condition says that the parameters $a$ genuinely move the momenta. Without it, the family may contain repeated or dependent solutions and cannot parametrize an $n$-parameter family of phase-space initial data.
[example: Complete Integral for the Free Particle]
For the free-particle Hamiltonian $H(p)=|p|^2/(2m)$ with $m>0$, consider the family
\begin{align*}
S(x,y,a)=a\cdot y-\frac{|a|^2}{2m}x
\end{align*}
with parameter $a=(a_1,\dots,a_n)\in\mathbb R^n$. Since
\begin{align*}
a\cdot y=\sum_{k=1}^n a_k y_k
\end{align*}
and
\begin{align*}
|a|^2=\sum_{k=1}^n a_k^2,
\end{align*}
differentiating with respect to $x$ gives
\begin{align*}
S_x(x,y,a)=-\frac{|a|^2}{2m}.
\end{align*}
For each $i=1,\dots,n$, differentiating with respect to $y_i$ gives
\begin{align*}
\frac{\partial S}{\partial y_i}(x,y,a)=a_i.
\end{align*}
Thus
\begin{align*}
\nabla_y S(x,y,a)=a.
\end{align*}
Substituting these expressions into the Hamilton-Jacobi equation gives
\begin{align*}
S_x(x,y,a)+H(\nabla_yS(x,y,a))=-\frac{|a|^2}{2m}+\frac{|a|^2}{2m}=0.
\end{align*}
So each fixed value of $a$ gives a solution of the Hamilton-Jacobi equation.
It remains to check the non-degeneracy condition in the definition of complete integral. Since
\begin{align*}
\frac{\partial S}{\partial y_i}(x,y,a)=a_i,
\end{align*}
differentiating this identity with respect to $a_j$ gives
\begin{align*}
\frac{\partial^2S}{\partial y_i\partial a_j}(x,y,a)=\frac{\partial a_i}{\partial a_j}.
\end{align*}
The coordinate derivative satisfies $\partial a_i/\partial a_j=1$ when $i=j$ and $\partial a_i/\partial a_j=0$ when $i\ne j$, so
\begin{align*}
\left(\frac{\partial^2S}{\partial y_i\partial a_j}\right)_{i,j=1}^n=I_n.
\end{align*}
The identity matrix is invertible, hence $S$ is a complete integral on $\mathbb R^n$ in the parameter $a$. The constants $a$ are exactly the generated momenta $p=\nabla_yS=a$, while the remaining constants are obtained from Jacobi's equations $S_a=b$.
[/example]
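As a worked continuation of the free-particle example (an added step, written out under the same notation), Jacobi's equations can be solved explicitly. Differentiating $S(x,y,a)=a\cdot y-\frac{|a|^2}{2m}x$ with respect to $a_j$ gives
\begin{align*}
\frac{\partial S}{\partial a_j}(x,y,a)=y_j-\frac{a_j}{m}x,
\end{align*}
so the equations $S_a=b$ with a constant vector $b\in\mathbb R^n$ read
\begin{align*}
y(x)=b+\frac{a}{m}x.
\end{align*}
These are the straight lines with constant momentum $p=a$, and $b$ is the position at $x=0$.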
The free-particle complete integral illustrates how parameters label momenta, but it does not yet identify the differential equations along which $S$ is transported. To build complete integrals locally, we need the characteristic system attached to the Hamilton-Jacobi PDE.
[quotetheorem:3517]
[citeproof:3517]
The last equation is the action identity. Since $p\cdot y'-H$ is the Legendre transform expression for the Lagrangian, $S$ changes along a characteristic by the Lagrangian action density. This statement is conditional on already being on a characteristic with $p=\nabla_yS(x,y(x))$; it does not by itself construct a solution $S$ on an [open set](/page/Open%20Set). If two projected characteristics meet with different momenta, a single $C^2$ function $S$ cannot represent both branches near the crossing, because the gradient at the same point would have to take two values.
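For the free-particle complete integral $S(x,y,a)=a\cdot y-\frac{|a|^2}{2m}x$ the action identity can be checked by hand; the following is an added verification, not part of the quoted proof. Along a characteristic $y(x)=y_0+\frac{a}{m}(x-x_0)$ with $p=a$,
\begin{align*}
\frac{d}{dx}S(x,y(x),a)=S_x+\nabla_yS\cdot y'=-\frac{|a|^2}{2m}+a\cdot\frac{a}{m}=\frac{|a|^2}{2m},
\end{align*}
while
\begin{align*}
p\cdot y'-H(p)=\frac{|a|^2}{m}-\frac{|a|^2}{2m}=\frac{|a|^2}{2m}.
\end{align*}
Both sides equal the kinetic energy, which is the free-particle Lagrangian evaluated on the characteristic.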
[explanation: Characteristics as Extremals]
When the Legendre transform between $y'$ and $p$ is regular, the characteristic curves project to Euler-Lagrange extremals in configuration space. Thus the Hamilton-Jacobi PDE does not introduce a different variational problem; it reorganises the same extremals into wavefronts $S=\text{constant}$ and gradient graphs $p=\nabla_yS$. The advantage is that constants of integration may be found through partial derivatives of $S$ rather than by integrating Hamilton's equations directly.
[/explanation]
A complete integral contains enough parameters to recover trajectories by differentiation and algebraic elimination. This is Jacobi's theorem.
## Jacobi's Theorem and Integration of Hamilton's Equations
The next problem is to turn a complete integral into explicit solutions of Hamilton's equations. The guiding principle is that the parameters $a$ in $S(x,y,a)$ should be paired with conjugate constants $b$, obtained by differentiating $S$ with respect to $a$.
[quotetheorem:6998]
[citeproof:6998]
Jacobi's theorem is an integration method because the solution is obtained from equations $S_a=b$ rather than from solving the canonical ODEs directly. The constants $a$ and $b$ provide the expected $2n$ constants for a first-order Hamiltonian system in $2n$ phase variables. The mixed Hessian condition is essential: if $S_{ya}$ is singular, the equations $S_a=b$ may fail to determine $y$ as a function of $x$, or may determine fewer than $n$ independent branches. The conclusion is therefore local in both configuration space and parameter space; near turning points, caustics, or singular separated coordinates, the same motion may need several compatible generating functions rather than one global complete integral.
[example: Harmonic Oscillator]
For the one-dimensional harmonic oscillator with $m>0$ and $\omega>0$,
\begin{align*}
H(y,p)=\frac{p^2}{2m}+\frac{m\omega^2y^2}{2}.
\end{align*}
Fix an energy $E>0$ and work on an interval where $2mE-m^2\omega^2y^2>0$. Seek a separated solution $S(x,y,E)=W(y,E)-Ex$. The Hamilton-Jacobi equation $S_x+H(y,S_y)=0$ becomes
\begin{align*}
-E+\frac{1}{2m}\left(\frac{dW}{dy}\right)^2+\frac{m\omega^2y^2}{2}=0.
\end{align*}
Equivalently,
\begin{align*}
\left(\frac{dW}{dy}\right)^2=2mE-m^2\omega^2y^2.
\end{align*}
Choosing the positive momentum branch gives
\begin{align*}
\frac{dW}{dy}=\sqrt{2mE-m^2\omega^2y^2}.
\end{align*}
Thus, with any fixed lower endpoint $y_\ast$ in the same interval,
\begin{align*}
W(y,E)=\int_{y_\ast}^y \sqrt{2mE-m^2\omega^2s^2}\,ds.
\end{align*}
For this choice,
\begin{align*}
S_x(x,y,E)=-E.
\end{align*}
By the fundamental theorem of calculus,
\begin{align*}
S_y(x,y,E)=W_y(y,E)=\sqrt{2mE-m^2\omega^2y^2}.
\end{align*}
Substituting into the Hamiltonian gives
\begin{align*}
H(y,S_y)=\frac{2mE-m^2\omega^2y^2}{2m}+\frac{m\omega^2y^2}{2}.
\end{align*}
The first term is
\begin{align*}
\frac{2mE-m^2\omega^2y^2}{2m}=E-\frac{m\omega^2y^2}{2},
\end{align*}
so
\begin{align*}
H(y,S_y)=E-\frac{m\omega^2y^2}{2}+\frac{m\omega^2y^2}{2}=E.
\end{align*}
Therefore
\begin{align*}
S_x+H(y,S_y)=-E+E=0,
\end{align*}
so the separated expression satisfies the Hamilton-Jacobi equation on the chosen branch.
Jacobi's equation is $S_E=b$. Since
\begin{align*}
\frac{\partial}{\partial E}\sqrt{2mE-m^2\omega^2s^2}=\frac{m}{\sqrt{2mE-m^2\omega^2s^2}},
\end{align*}
differentiating under the integral sign gives
\begin{align*}
S_E(x,y,E)=\int_{y_\ast}^y \frac{m\,ds}{\sqrt{2mE-m^2\omega^2s^2}}-x.
\end{align*}
Thus $S_E=b$ is equivalent to
\begin{align*}
x+b=\int_{y_\ast}^y \frac{m\,ds}{\sqrt{2mE-m^2\omega^2s^2}}.
\end{align*}
Set
\begin{align*}
A=\sqrt{\frac{2E}{m\omega^2}}.
\end{align*}
Then
\begin{align*}
2mE-m^2\omega^2s^2=m^2\omega^2(A^2-s^2),
\end{align*}
so the phase relation becomes
\begin{align*}
x+b=\frac{1}{\omega}\int_{y_\ast}^y \frac{ds}{\sqrt{A^2-s^2}}.
\end{align*}
Since
\begin{align*}
\frac{d}{ds}\arcsin\left(\frac{s}{A}\right)=\frac{1}{\sqrt{A^2-s^2}},
\end{align*}
we obtain
\begin{align*}
\omega(x+b)=\arcsin\left(\frac{y}{A}\right)-\arcsin\left(\frac{y_\ast}{A}\right).
\end{align*}
Solving for $y$ gives
\begin{align*}
y(x)=A\sin\left(\omega(x+b)+\arcsin\left(\frac{y_\ast}{A}\right)\right).
\end{align*}
Thus the energy fixes the amplitude $A=\sqrt{2E/(m\omega^2)}$, while the constant $b$ fixes the phase of the sinusoidal motion.
[/example]
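The quadrature in the oscillator example can be checked numerically. This is a minimal sketch, assuming illustrative parameter values and a composite Simpson rule (the helper `simpson` is written here, not taken from any library); it compares the integral in Jacobi's equation with the arcsine closed form and then recovers the endpoint from the sinusoidal formula.

```python
import math

m, omega, E = 2.0, 1.5, 3.0              # illustrative values
A = math.sqrt(2 * E / (m * omega**2))    # amplitude fixed by the energy
y_star, y_end = 0.0, 0.5                 # both strictly inside (-A, A)

def integrand(s):
    return m / math.sqrt(2 * m * E - (m * omega * s) ** 2)

def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Left: the quadrature for x + b. Right: the closed form derived in the example.
quadrature = simpson(integrand, y_star, y_end)
closed_form = (math.asin(y_end / A) - math.asin(y_star / A)) / omega

# Insert x + b = quadrature into y(x) = A sin(omega (x + b) + arcsin(y_star / A)).
y_back = A * math.sin(omega * quadrature + math.asin(y_star / A))
print(quadrature, closed_form, y_back)
```

The check is restricted to an interval away from the turning points $y=\pm A$, where the integrand becomes singular and the chosen momentum branch ends.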
The oscillator computation also shows the local nature of the construction. The square root vanishes at the turning points $y=\pm A$, where the momentum $p=\pm\sqrt{2mE-m^2\omega^2y^2}$ switches branch, so the complete integral is usually built in coordinate patches and then interpreted through the full phase portrait.
[remark: Complete Integrals Are Not Unique]
Adding a function of the parameters $a$ to $S(x,y,a)$ changes the constants $b=S_a$ but not the momenta $p=S_y$. Thus different complete integrals may generate the same family of phase-space curves with different labels. This freedom is often useful when choosing coordinates adapted to separation.
[/remark]
Jacobi's theorem gives the abstract integration method. The remaining question is how a complete integral can be found in examples beyond the free particle and oscillator. The classical answer is separation of variables.
## Separation of Variables and Action-Angle Coordinates
The practical question in Hamilton-Jacobi theory is whether the PDE can be reduced to ordinary differential equations. Separation of variables looks for a complete integral as a sum of terms, each depending on only one coordinate and on suitable separation constants.
[definition: Additive Separation]
Let $Q=Q_1\times\cdots\times Q_n\subset\mathbb R^n$ be an open coordinate domain with coordinates $q=(q_1,\dots,q_n)$, let $U\subset\mathbb R$ be an open interval for the independent variable $x$, and let $H:Q\times\mathbb R^n\to\mathbb R$ be a $C^1$ autonomous Hamiltonian. The time-independent Hamilton-Jacobi equation separates additively on $Q$ if there is an open parameter domain $A\subset\mathbb R^n$ and a complete integral $S:U\times Q\times A\to\mathbb R$ of the form
\begin{align*}
S(x,q,a)=W(q,a)-a_1x, \qquad W(q,a)=W_1(q_1,a)+\cdots+W_n(q_n,a),
\end{align*}
where each $W_i:Q_i\times A\to\mathbb R$ is $C^2$, as required of a complete integral, and $a=(a_1,\dots,a_n)$ are separation constants.
[/definition]
After separation, the Hamilton-Jacobi PDE becomes a chain of ordinary differential equations for the functions $W_i$. The constant $a_1$ is often the energy, while the remaining constants arise from hidden symmetries or special coordinates.
[example: Central Force Separation]
For a particle in the plane, take polar coordinates $(r,\theta)$ and Hamiltonian
\begin{align*}H(r,\theta,p_r,p_\theta)=\frac{p_r^2}{2m}+\frac{p_\theta^2}{2mr^2}+V(r).\end{align*}
Try a separated function
\begin{align*}S(x,r,\theta;E,\ell)=W_r(r;E,\ell)+\ell\theta-Ex.\end{align*}
Its partial derivatives are
\begin{align*}S_x=-E,\qquad S_r=\frac{dW_r}{dr},\qquad S_\theta=\ell.\end{align*}
Substituting $p_r=S_r$ and $p_\theta=S_\theta$ into the Hamilton-Jacobi equation $S_x+H(r,\theta,S_r,S_\theta)=0$ gives
\begin{align*}-E+\frac{1}{2m}\left(\frac{dW_r}{dr}\right)^2+\frac{\ell^2}{2mr^2}+V(r)=0.\end{align*}
Moving $E$ to the other side gives the radial equation
\begin{align*}\frac{1}{2m}\left(\frac{dW_r}{dr}\right)^2+\frac{\ell^2}{2mr^2}+V(r)=E.\end{align*}
Multiplying by $2m$ and isolating the square gives
\begin{align*}\left(\frac{dW_r}{dr}\right)^2=2m(E-V(r))-\frac{\ell^2}{r^2}.\end{align*}
On an interval where
\begin{align*}2m(E-V(r))-\frac{\ell^2}{r^2}>0,\end{align*}
one branch of the radial action is therefore
\begin{align*}W_r(r;E,\ell)=\sigma\int_{r_\ast}^{r}\sqrt{2m(E-V(s))-\frac{\ell^2}{s^2}}\,ds,\qquad \sigma\in\{1,-1\}.\end{align*}
For this branch,
\begin{align*}p_r=S_r=\sigma\sqrt{2m(E-V(r))-\frac{\ell^2}{r^2}}\end{align*}
and
\begin{align*}p_\theta=S_\theta=\ell.\end{align*}
Thus the parameter $\ell$ is exactly the angular momentum, and the radial momentum is determined by the energy equation.
Jacobi's equations are $S_E=b_E$ and $S_\ell=b_\ell$. Since
\begin{align*}\frac{\partial}{\partial E}\sqrt{2m(E-V(s))-\frac{\ell^2}{s^2}}=\frac{m}{\sqrt{2m(E-V(s))-\frac{\ell^2}{s^2}}},\end{align*}
the first equation becomes
\begin{align*}b_E=\sigma\int_{r_\ast}^{r}\frac{m\,ds}{\sqrt{2m(E-V(s))-\ell^2/s^2}}-x.\end{align*}
Equivalently,
\begin{align*}x+b_E=\sigma\int_{r_\ast}^{r}\frac{m\,ds}{\sqrt{2m(E-V(s))-\ell^2/s^2}}.\end{align*}
This relation determines $r$ as a function of $x$ on any interval where the right-hand side is locally invertible.
For the angular equation,
\begin{align*}\frac{\partial}{\partial \ell}\sqrt{2m(E-V(s))-\frac{\ell^2}{s^2}}=-\frac{\ell}{s^2\sqrt{2m(E-V(s))-\ell^2/s^2}},\end{align*}
so
\begin{align*}b_\ell=\theta-\sigma\int_{r_\ast}^{r}\frac{\ell\,ds}{s^2\sqrt{2m(E-V(s))-\ell^2/s^2}}.\end{align*}
Equivalently,
\begin{align*}\theta-b_\ell=\sigma\int_{r_\ast}^{r}\frac{\ell\,ds}{s^2\sqrt{2m(E-V(s))-\ell^2/s^2}}.\end{align*}
Thus separation reduces the central-force motion to two one-variable integrals: one gives the radial motion $r(x)$, and the other gives the angle $\theta$ as a function of $r$.
[/example]
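For the Kepler potential $V(r)=-\kappa/r$ the angular quadrature can be compared with the standard conic-section orbit $r=\dfrac{\ell^2/(m\kappa)}{1+e\cos\theta}$, with eccentricity $e=\sqrt{1+2E\ell^2/(m\kappa^2)}$ and $\theta$ measured from perihelion. The sketch below assumes illustrative parameter values, the outgoing branch $\sigma=1$, and a hand-written Simpson rule; the conic formula is a classical fact quoted here for comparison and is not derived in the notes.

```python
import math

m, kappa, ell, E = 1.0, 1.0, 1.0, -0.3   # illustrative bound-orbit values
r_star, r_end = 0.8, 1.5                 # both inside the allowed radial interval

def radicand(s):
    # 2m(E - V(s)) - ell^2 / s^2 with V(s) = -kappa / s
    return 2 * m * (E + kappa / s) - ell**2 / s**2

def angle_integrand(s):
    return ell / (s**2 * math.sqrt(radicand(s)))

def simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

ecc = math.sqrt(1 + 2 * E * ell**2 / (m * kappa**2))

def conic_angle(r):
    """Angle from perihelion on the outgoing branch of the conic orbit."""
    return math.acos((ell**2 / (m * kappa * r) - 1) / ecc)

delta_theta_quadrature = simpson(angle_integrand, r_star, r_end)
delta_theta_conic = conic_angle(r_end) - conic_angle(r_star)
print(delta_theta_quadrature, delta_theta_conic)
```

Agreement of the two numbers illustrates how the relation $\theta-b_\ell=\sigma\int\cdots$ encodes the shape of the orbit without reference to the independent variable $x$.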
The central-force example shows that separated constants can be interpreted as conserved quantities. For periodic motion, a more intrinsic way to record such constants is to measure the phase-space area enclosed by a cycle, which leads to the action variable.
[definition: Action Variable]
Let $J,K\subset\mathbb R$ be open intervals with phase coordinates $(y,p)\in J\times K$, and let $H:J\times K\to\mathbb R$ be a $C^1$ one-degree-of-freedom Hamiltonian. If $\Gamma:S^1\to J\times K$ is a closed $C^1$ phase curve of a periodic solution, oriented by the Hamiltonian flow, the action variable associated with $\Gamma$ is
\begin{align*}
I=\frac{1}{2\pi}\oint_\Gamma p\,dy.
\end{align*}
[/definition]
The action variable is designed to be paired with an angle coordinate measuring position along the periodic orbit. This raises the structural question behind complete integrability: if enough such pairs exist, how simple do Hamilton's equations become? In local action-angle coordinates $(I,\theta)\in A\times\mathbb T^n$, when the Hamiltonian has the form $H=H(I)$, Hamilton's equations reduce to
\begin{align*}
\dot I=0,\qquad \dot\theta=\nabla_IH(I).
\end{align*}
Thus the motion is linear on invariant tori. The force of this normal form lies in its hypothesis. Action-angle coordinates are not automatic for an arbitrary Hamiltonian system; they encode a strong integrability assumption and may exist only locally on suitable regular invariant tori. Once such coordinates are available, the equations become immediate: the Hamiltonian has no dependence on the angle variables, so the actions are constant and the angles advance with frequency vector $\nabla_IH(I)$. This explains why Hamilton-Jacobi separation is powerful: finding enough separation constants is the analytic shadow of finding action variables.
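For the harmonic oscillator the action variable can be computed explicitly. The sketch below assumes illustrative parameter values and parametrises the energy-$E$ orbit as $y=A\sin\varphi$, $p=m\omega A\cos\varphi$, which follows the Hamiltonian flow as $\varphi$ increases; the function name `action` is hypothetical. The loop integral should come out as $E/\omega$, so that $H(I)=\omega I$ and the angle advances with frequency $\partial H/\partial I=\omega$.

```python
import math

m, omega, E = 2.0, 1.5, 3.0              # illustrative values
A = math.sqrt(2 * E / (m * omega**2))    # amplitude of the energy-E orbit

def action(n=400):
    """(1 / 2 pi) * closed integral of p dy, by the periodic trapezoid rule."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        phi = k * h
        p = m * omega * A * math.cos(phi)   # momentum on the orbit
        dy_dphi = A * math.cos(phi)         # derivative of y = A sin(phi)
        total += p * dy_dphi * h
    return total / (2 * math.pi)

action_value = action()
print(action_value, E / omega)  # the two numbers should agree
```

Geometrically, $2\pi I$ is the area of the ellipse with semi-axes $A$ and $m\omega A$ in the $(y,p)$-plane.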
[example: Jacobi's Ellipsoid Problem]
Let the ellipsoid have semiaxes $a_1>a_2>a_3>0$, and write $\alpha_i=a_i^2$. In ellipsoidal coordinates $(\lambda,\mu)$ on the ellipsoid, adapted to the confocal quadrics
\begin{align*}
\sum_{i=1}^3 \frac{x_i^2}{\alpha_i-\rho}=1,
\end{align*}
the geodesic Hamiltonian has the orthogonal form
\begin{align*}
H(\lambda,\mu,p_\lambda,p_\mu)
=\frac{2(\alpha_1-\lambda)(\alpha_2-\lambda)(\alpha_3-\lambda)}{\lambda-\mu}p_\lambda^2
+\frac{2(\alpha_1-\mu)(\alpha_2-\mu)(\alpha_3-\mu)}{\mu-\lambda}p_\mu^2.
\end{align*}
Seek a separated solution
\begin{align*}
S(x,\lambda,\mu;E,k)=W_\lambda(\lambda;E,k)+W_\mu(\mu;E,k)-Ex.
\end{align*}
Then
\begin{align*}
S_x=-E,\qquad S_\lambda=\frac{dW_\lambda}{d\lambda},\qquad S_\mu=\frac{dW_\mu}{d\mu}.
\end{align*}
Substituting these derivatives into $S_x+H(\lambda,\mu,S_\lambda,S_\mu)=0$ gives
\begin{align*}
-E+\frac{2(\alpha_1-\lambda)(\alpha_2-\lambda)(\alpha_3-\lambda)}{\lambda-\mu}\left(\frac{dW_\lambda}{d\lambda}\right)^2
+\frac{2(\alpha_1-\mu)(\alpha_2-\mu)(\alpha_3-\mu)}{\mu-\lambda}\left(\frac{dW_\mu}{d\mu}\right)^2=0.
\end{align*}
Multiplying by $\lambda-\mu$ gives
\begin{align*}
-E(\lambda-\mu)+2(\alpha_1-\lambda)(\alpha_2-\lambda)(\alpha_3-\lambda)\left(\frac{dW_\lambda}{d\lambda}\right)^2
-2(\alpha_1-\mu)(\alpha_2-\mu)(\alpha_3-\mu)\left(\frac{dW_\mu}{d\mu}\right)^2=0.
\end{align*}
Rearranging separates the $\lambda$-terms from the $\mu$-terms:
\begin{align*}
2(\alpha_1-\lambda)(\alpha_2-\lambda)(\alpha_3-\lambda)\left(\frac{dW_\lambda}{d\lambda}\right)^2-E\lambda
=
2(\alpha_1-\mu)(\alpha_2-\mu)(\alpha_3-\mu)\left(\frac{dW_\mu}{d\mu}\right)^2-E\mu.
\end{align*}
The left side depends only on $\lambda$, while the right side depends only on $\mu$, so on a rectangle in the $(\lambda,\mu)$-plane both sides must equal a constant $k$. Hence
\begin{align*}
2(\alpha_1-\lambda)(\alpha_2-\lambda)(\alpha_3-\lambda)\left(\frac{dW_\lambda}{d\lambda}\right)^2-E\lambda=k
\end{align*}
and
\begin{align*}
2(\alpha_1-\mu)(\alpha_2-\mu)(\alpha_3-\mu)\left(\frac{dW_\mu}{d\mu}\right)^2-E\mu=k.
\end{align*}
Solving each equation for the square of the derivative gives
\begin{align*}
\left(\frac{dW_\lambda}{d\lambda}\right)^2=
\frac{E\lambda+k}{2(\alpha_1-\lambda)(\alpha_2-\lambda)(\alpha_3-\lambda)}
\end{align*}
and
\begin{align*}
\left(\frac{dW_\mu}{d\mu}\right)^2=
\frac{E\mu+k}{2(\alpha_1-\mu)(\alpha_2-\mu)(\alpha_3-\mu)}.
\end{align*}
On coordinate intervals where the displayed quotients are positive, choosing signs $\sigma_\lambda,\sigma_\mu\in\{1,-1\}$ gives
\begin{align*}
W_\lambda(\lambda;E,k)=\sigma_\lambda\int_{\lambda_\ast}^{\lambda}
\sqrt{\frac{Es+k}{2(\alpha_1-s)(\alpha_2-s)(\alpha_3-s)}}\,ds
\end{align*}
and
\begin{align*}
W_\mu(\mu;E,k)=\sigma_\mu\int_{\mu_\ast}^{\mu}
\sqrt{\frac{Es+k}{2(\alpha_1-s)(\alpha_2-s)(\alpha_3-s)}}\,ds.
\end{align*}
Thus the Hamilton-Jacobi equation has been reduced to two one-variable quadratures. The constant $E$ fixes the speed of the geodesic, while the second separation constant $k$ labels the confocal quadric tangent to the geodesic; Jacobi's reconstruction equations $S_E=b_E$ and $S_k=b_k$ then recover the geodesic from these separated integrals.
[/example]
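The separation algebra of the ellipsoid example can be sanity-checked at a single point. This sketch assumes illustrative values $\alpha=(3,2,1)$, $E=1$, $k=-2$ and a point with $\alpha_3<\mu<\alpha_2<\lambda<\alpha_1$ at which both displayed quotients are positive; it only tests that the separated momenta reproduce $H=E$ for the Hamiltonian as displayed in the example, and makes no claim beyond that.

```python
import math

alpha = (3.0, 2.0, 1.0)   # illustrative squared semiaxes, alpha_1 > alpha_2 > alpha_3
E, k = 1.0, -2.0          # illustrative separation constants
lam, mu = 2.5, 1.5        # a point with alpha_3 < mu < alpha_2 < lam < alpha_1

def cubic(s):
    return (alpha[0] - s) * (alpha[1] - s) * (alpha[2] - s)

def w_prime_squared(s):
    # Square of dW/ds from the separated equations in the example.
    return (E * s + k) / (2 * cubic(s))

def hamiltonian(lam, mu, p_lam, p_mu):
    # The orthogonal form of H displayed in the example.
    return (2 * cubic(lam) / (lam - mu)) * p_lam**2 \
         + (2 * cubic(mu) / (mu - lam)) * p_mu**2

p_lam = math.sqrt(w_prime_squared(lam))   # branch sigma_lambda = +1
p_mu = math.sqrt(w_prime_squared(mu))     # branch sigma_mu = +1
h_value = hamiltonian(lam, mu, p_lam, p_mu)
print(h_value)  # should reproduce E
```

Changing either sign $\sigma_\lambda,\sigma_\mu$ leaves the value unchanged, since only the squares of the momenta enter $H$.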
Hamilton-Jacobi theory therefore completes the classical arc of the course. The Euler-Lagrange equation finds stationary curves, the second variation and Jacobi fields test them locally, Noether's theorem identifies conserved quantities, and the Hamilton-Jacobi equation organises sufficiently many conserved quantities into a method for integrating the whole Hamiltonian system.
The final chapter assembles the conditions that decide whether a stationary curve is a minimum: the Euler–Lagrange equation (necessary), Legendre's condition (necessary), the Weierstrass excess function (strong local sufficiency), Jacobi's no-conjugate-point criterion (weak local sufficiency), and the Hamilton–Jacobi equation (global structure). Together, these five perspectives provide a complete sufficiency theory that validates the stationary curves found by the first variation.
# 10. Sufficiency via the Hamilton–Jacobi Equation
The final chapter turns the necessary conditions developed earlier into a sufficiency theory. The Euler-Lagrange equation, Legendre condition, Weierstrass excess condition, Jacobi equation, and Hamilton-Jacobi equation each test a candidate curve from a different angle; this chapter packages them into a single sufficiency construction. The central question is: when does a stationary curve stop being merely a candidate and become a genuine minimiser among all nearby admissible curves?
## From Fields to the Hamilton-Jacobi Equation
The Weierstrass theory gave a local sufficiency criterion once an extremal field was available. The remaining problem is to recognise when such a field exists without constructing every extremal separately. Hamilton-Jacobi theory answers this by replacing the family of extremals by one scalar generating function.
A field should assign to each point in a region a direction in which the unique extremal through that point travels. The following definition records the geometric object whose existence made the Weierstrass sufficiency proof work.
[definition: Field of Extremals]
Let $L: [a,b] \times U \times \mathbb R^n \to \mathbb R$ be $C^2$, where $U \subset \mathbb R^n$ is open, and let the action functional
\begin{align*}
J:C^1([a,b],U) \to \mathbb R
\end{align*}
be defined by
\begin{align*}
J[y]=\int_a^b L(x,y(x),y'(x))\,dx.
\end{align*}
A field of extremals on a region $D \subset [a,b] \times U$ is a $C^1$ map $p: D \to \mathbb R^n$ such that every integral curve $y:I\to U$ satisfying $y'(x)=p(x,y(x))$ is an extremal for the restricted action on each subinterval $I\subset [a,b]$ on which its graph lies in $D$.
[/definition]
A field turns the variational problem into comparison against a calibrated family of stationary curves. To connect this with a PDE, we must pass from velocities to momenta, since the Hamilton-Jacobi function differentiates to momentum rather than velocity.
[definition: Legendre Transform for a Regular Lagrangian]
Let $L: [a,b] \times U \times \mathbb R^n \to \mathbb R$ be $C^2$. The fibre derivative of $L$ is the map
\begin{align*}
\mathcal{F}L:[a,b]\times U\times \mathbb R^n \to [a,b]\times U\times (\mathbb R^n)^*.
\end{align*}
It is given by
\begin{align*}
\mathcal{F}L(x,y,v)=(x,y,\partial_v L(x,y,v)).
\end{align*}
Given a region $D\subset [a,b]\times U$ and an open velocity set $V\subset \mathbb R^n$, the Lagrangian is regular on $D \times V$ if, for each $(x,y)\in D$, the map $v \mapsto \partial_v L(x,y,v)$ has nonsingular Jacobian matrix $\partial^2_{vv}L(x,y,v)$ for every $v\in V$.
[/definition]
Regularity lets us solve momentum for velocity, so it makes the Hamiltonian a genuine function rather than a multivalued relation. The next definition introduces the energy-like quantity whose characteristic equations reproduce the Euler-Lagrange equation and whose appearance in the Hamilton-Jacobi PDE will encode the field.
[definition: Hamiltonian Associated to a Regular Lagrangian]
Let $L$ be regular on $D\times V$, and suppose the fibre derivative restricts to a diffeomorphism from $D\times V$ onto an open momentum region $D\times P$, where $P\subset (\mathbb R^n)^*$. The inverse velocity map is
\begin{align*}
w:D\times P\to V
\end{align*}
defined by the relation
\begin{align*}
\pi=\partial_v L(x,y,w(x,y,\pi)).
\end{align*}
The Hamiltonian associated to $L$ is the function $H:D\times P\to \mathbb R$ defined by
\begin{align*}
H(x,y,\pi)=\pi\cdot w(x,y,\pi)-L(x,y,w(x,y,\pi)).
\end{align*}
[/definition]
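The definition can be exercised numerically. The following is a minimal sketch, assuming the illustrative Lagrangian $L(v)=\sqrt{1+v^2}$ in one dimension (chosen here only because $\partial^2_{vv}L=(1+v^2)^{-3/2}>0$ makes it regular for every $v$, with momentum region $P=(-1,1)$). The helper names `inverse_velocity`, `hamiltonian`, and `dH_dpi` are hypothetical. The inverse velocity map $w$ is found by bisection, and the result is compared with the closed form $H(\pi)=-\sqrt{1-\pi^2}$ that follows from $w(\pi)=\pi/\sqrt{1-\pi^2}$.

```python
import math

def lagrangian(v):
    # Illustrative regular Lagrangian (an assumption of this sketch).
    return math.sqrt(1.0 + v * v)

def dL_dv(v):
    # Fibre derivative: the momentum pi = dL/dv.
    return v / math.sqrt(1.0 + v * v)

def inverse_velocity(pi, lo=-1e6, hi=1e6, iters=200):
    # Solve pi = dL_dv(w) for w by bisection. This works because dL_dv is
    # strictly increasing (the second derivative in v is positive).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dL_dv(mid) < pi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hamiltonian(pi):
    # H(pi) = pi * w(pi) - L(w(pi)), as in the definition.
    w = inverse_velocity(pi)
    return pi * w - lagrangian(w)

def dH_dpi(pi, h=1e-5):
    # Central difference; should reproduce the velocity w(pi).
    return (hamiltonian(pi + h) - hamiltonian(pi - h)) / (2.0 * h)

if __name__ == "__main__":
    pi = 0.6
    print(hamiltonian(pi), -math.sqrt(1.0 - pi * pi))
    print(dH_dpi(pi), inverse_velocity(pi))
```

The second printed pair illustrates that differentiating $H$ in the momentum returns the velocity $w$, which is the property the Hamilton-Jacobi construction relies on.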
The Hamiltonian is designed so that differentiating with respect to momentum recovers velocity, $\partial_\pi H=w$, while differentiating with respect to position gives $\partial_y H=-\partial_y L$, which turns the Euler-Lagrange equation into an equation for the momentum. We now seek a scalar function whose spatial gradient gives the momentum field; imposing compatibility with the Hamiltonian produces the Hamilton-Jacobi equation.
[definition: Hamilton-Jacobi Equation]
For a Hamiltonian $H: [a,b]\times U\times \mathbb R^n\to \mathbb R$, the Hamilton-Jacobi equation for an unknown function $S: [a,b]\times U\to \mathbb R$ is
\begin{align*}
\partial_x S(x,y)+H(x,y,\nabla_y S(x,y))=0.
\end{align*}
[/definition]
Here $x$ is the independent curve parameter, not a spatial coordinate inside $y$. The gradient $\nabla_y S$ supplies the momentum field, and the Legendre transform converts that momentum field into the velocity field. The next theorem states that this is not only a useful construction: under the exactness condition needed to integrate the field form, it is equivalent to the field method.
[quotetheorem:7000]
[citeproof:7000]
This theorem explains why Hamilton-Jacobi theory is not an unrelated PDE method. It is the same field theory expressed through an exact differential form. The simply connected hypothesis is not cosmetic: on a punctured region a closed one-form may have nonzero period around the hole, so it need not be the differential of a single-valued action function $S$. The closedness of the full Hilbert form is also essential, since matching only the spatial momentum components does not control the $dx$ component that becomes the Hamilton-Jacobi equation. Regularity is the bridge between momentum and velocity; if the Legendre transform is singular, the same momentum may correspond to more than one velocity, and a scalar generating function no longer determines a unique extremal field.
[example: Quadratic Kinetic Lagrangian]
Take $L(x,y,v)=\frac12|v|^2-V(y)$ on $U\subset\mathbb R^n$, with the Euclidean pairing used to identify momenta with vectors. Its fibre derivative is
\begin{align*}
\partial_vL(x,y,v)=v.
\end{align*}
Thus the momentum variable $\pi$ satisfies $\pi=v$, so the inverse velocity map is
\begin{align*}
w(x,y,\pi)=\pi.
\end{align*}
Substituting this into the definition of the Hamiltonian gives
\begin{align*}
H(x,y,\pi)=\pi\cdot w(x,y,\pi)-L(x,y,w(x,y,\pi)).
\end{align*}
Since $w(x,y,\pi)=\pi$, this becomes
\begin{align*}
H(x,y,\pi)=\pi\cdot\pi-\left(\frac12|\pi|^2-V(y)\right).
\end{align*}
Using $\pi\cdot\pi=|\pi|^2$, we obtain
\begin{align*}
H(x,y,\pi)=|\pi|^2-\frac12|\pi|^2+V(y).
\end{align*}
Therefore
\begin{align*}
H(x,y,\pi)=\frac12|\pi|^2+V(y).
\end{align*}
The Hamilton-Jacobi equation $\partial_xS(x,y)+H(x,y,\nabla_yS(x,y))=0$ therefore becomes
\begin{align*}
\partial_xS(x,y)+\frac12|\nabla_yS(x,y)|^2+V(y)=0.
\end{align*}
The characteristic velocity is obtained by differentiating $H$ with respect to momentum:
\begin{align*}
\partial_\pi H(x,y,\pi)=\pi.
\end{align*}
With $\pi=\nabla_yS(x,y)$, the generated extremals satisfy
\begin{align*}
y'(x)=\nabla_yS(x,y(x)).
\end{align*}
Thus, for this kinetic-minus-potential Lagrangian, the Hamilton-Jacobi solution turns the mechanical trajectories into the characteristic curves of the first-order PDE.
[/example]
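As a numerical sanity check of this example, the sketch below assumes the simplest case $n=1$, $V=0$, and the candidate action $S(x,y)=(y-y_0)^2/(2(x-x_0))$ for $x>x_0$. This particular $S$ is an illustrative choice, not one taken from the text. It estimates the residual $\partial_xS+\frac12(\partial_yS)^2$ by central differences and evaluates the field velocity $\partial_yS$, which for this choice should be the slope of the straight line from $(x_0,y_0)$.

```python
def S(x, y, x0=0.0, y0=0.0):
    # Candidate action for V = 0 (assumed for illustration).
    return (y - y0) ** 2 / (2.0 * (x - x0))

def hj_residual(x, y, h=1e-5):
    # Central differences for S_x and S_y, then S_x + (1/2) * S_y**2.
    S_x = (S(x + h, y) - S(x - h, y)) / (2.0 * h)
    S_y = (S(x, y + h) - S(x, y - h)) / (2.0 * h)
    return S_x + 0.5 * S_y ** 2

def field_velocity(x, y, h=1e-5):
    # Characteristic velocity y' = S_y.
    return (S(x, y + h) - S(x, y - h)) / (2.0 * h)

if __name__ == "__main__":
    for point in [(1.0, 0.7), (2.5, -1.3)]:
        print(point, hj_residual(*point), field_velocity(*point))
```

The characteristics of this $S$ are the straight lines through $(x_0,y_0)$, which are the extremals of the free Lagrangian.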
The example shows the characteristic meaning of the equation. Sufficiency needs one more ingredient: the Hamilton-Jacobi solution must calibrate competitors so that its endpoint values control the action.
[quotetheorem:3518]
[citeproof:3518]
This is the sufficiency theorem in its most useful form. It turns a global PDE solution into a certificate that no admissible curve inside the field region has smaller action. The Weierstrass condition is the inequality that prevents a competitor from gaining action by using a different velocity at the same point; without it, the endpoint term supplied by $S$ would still be fixed, but the excess could be negative. The requirement that every competitor have graph in $D$ is equally important, because outside $D$ the function $S$ and the field velocity may not exist or may become multivalued after characteristics cross. Regularity keeps the comparison single-valued: if the Legendre transform degenerates, the gradient of $S$ may fail to determine the velocity used in the Hilbert identity.
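The endpoint-plus-excess splitting behind this certificate can be checked numerically in the simplest case. The sketch below assumes the free Lagrangian $L=\frac12v^2$ in one dimension, the action function $S(x,y)=y^2/(2x)$ on $x>0$ with field velocity $p=y/x$, and a hypothetical competitor $y(x)=x+0.3\sin(\pi(x-1))$ joining $(1,1)$ to $(2,2)$. All three are illustrative assumptions. For this Lagrangian the Weierstrass excess is $E=\frac12(v-p)^2$.

```python
import math

def S(x, y):
    # Assumed Hamilton-Jacobi solution for L = v**2 / 2 on x > 0.
    return y * y / (2.0 * x)

def field_velocity(x, y):
    # p(x, y) = S_y(x, y): straight lines through the origin.
    return y / x

def competitor(x):
    # Hypothetical admissible curve with the same endpoints as y = x.
    return x + 0.3 * math.sin(math.pi * (x - 1.0))

def competitor_slope(x):
    return 1.0 + 0.3 * math.pi * math.cos(math.pi * (x - 1.0))

def midpoint_integral(f, a, b, n=20000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integrand(x):
    return 0.5 * competitor_slope(x) ** 2

def excess(x):
    # Weierstrass excess E = (v - p)**2 / 2 for this quadratic Lagrangian.
    v = competitor_slope(x)
    p = field_velocity(x, competitor(x))
    return 0.5 * (v - p) ** 2

J = midpoint_integral(integrand, 1.0, 2.0)
endpoint_term = S(2.0, competitor(2.0)) - S(1.0, competitor(1.0))
excess_term = midpoint_integral(excess, 1.0, 2.0)

if __name__ == "__main__":
    print(J, endpoint_term + excess_term)
```

The endpoint term equals the action of the field extremal $y=x$, so the positive excess integral is exactly the amount by which the competitor loses.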
## Jacobi Fields and Breakdown of Hamilton-Jacobi Solutions
The previous theorem depends on a smooth field covering the competitors. The natural failure mode is that nearby extremals cross, because then a point no longer determines a unique velocity in the field. Jacobi fields measure this crossing infinitesimally.
[definition: Jacobi Field Along an Extremal]
Let $L:[a,b]\times U\times\mathbb R^n\to\mathbb R$ be $C^2$, and let
\begin{align*}
J:C^1([a,b],U)\to\mathbb R
\end{align*}
be defined by
\begin{align*}
J[y]=\int_a^b L(x,y(x),y'(x))\,dx.
\end{align*}
Let $y_0\in C^2([a,b],U)$ be an extremal for $J$. A Jacobi field along $y_0$ is a function $h\in C^2([a,b],\mathbb R^n)$ satisfying the linearised Euler-Lagrange equation obtained by differentiating the Euler-Lagrange equation at $y_0$ in the variation direction $h$.
[/definition]
Jacobi fields arise from differentiating a family of extremals, so they describe first-order changes in endpoints under first-order changes in the initial direction. When a nonzero such field vanishes at two parameter values, the family has lost local invertibility at the second endpoint; this motivates the precise notion of conjugacy.
[definition: Conjugate Point]
Let $L:[a,b]\times U\times\mathbb R^n\to\mathbb R$ be $C^2$, let $J:C^1([a,b],U)\to\mathbb R$ be the associated action functional, and let $y_0\in C^2([a,b],U)$ be an extremal for $J$. A point $c\in(a,b]$ is conjugate to $a$ along $y_0$ if there exists a nonzero Jacobi field $h\in C^2([a,b],\mathbb R^n)$ along $y_0$ such that $h(a)=0$ and $h(c)=0$.
[/definition]
The definition is the infinitesimal version of two nearby extremals meeting again, so it marks the place where a field may cease to be a graph. Hamilton-Jacobi theory relies on assigning a single momentum to each nearby point through a smooth field of extremals. A conjugate point is precisely the warning sign that the endpoint map from initial data has lost local invertibility, so the characteristic family can fold and cease to define such a single-valued field.
[quotetheorem:3533]
[citeproof:3533]
The theorem is the Hamilton-Jacobi interpretation of the Jacobi necessary condition. Before the first conjugate point, extremals can form a field; at a conjugate point, the characteristic construction folds. The regularity of the field is part of the conclusion, not a technical decoration: if the velocity assignment is discontinuous or multi-valued, the inverse-function-theorem argument has no smooth map to invert. The neighbourhood assumption also matters, because a conjugate point outside the region on which $S$ is defined says nothing about a local calibration that never reaches it. Conversely, the absence of conjugate points only removes this one obstruction; it does not by itself construct a global Hamilton-Jacobi solution or rule out topological obstructions to exactness.
[example: Harmonic Oscillator Caustic]
For $\omega>0$, the Euler-Lagrange equation for $L(q,v)=\frac12v^2-\frac12\omega^2q^2$ is obtained from
\begin{align*}
\partial_vL(q,v)=v
\end{align*}
and
\begin{align*}
\partial_qL(q,v)=-\omega^2q.
\end{align*}
Thus
\begin{align*}
\frac{d}{dt}\partial_vL(q,q')-\partial_qL(q,q')=q''+\omega^2q,
\end{align*}
so extremals satisfy $q''+\omega^2q=0$.
Linearising this equation in a variation direction $h$ gives
\begin{align*}
\frac{d}{ds}\bigl((q+sh)''+\omega^2(q+sh)\bigr)\big|_{s=0}=h''+\omega^2h.
\end{align*}
Hence the Jacobi equation is $h''+\omega^2h=0$. Its general solution is
\begin{align*}
h(t)=A\cos(\omega t)+B\sin(\omega t).
\end{align*}
The condition $h(0)=0$ gives $A=0$, and
\begin{align*}
h'(t)=B\omega\cos(\omega t)
\end{align*}
so $h'(0)=1$ gives $B=1/\omega$. Therefore
\begin{align*}
h(t)=\omega^{-1}\sin(\omega t).
\end{align*}
At $t=\pi/\omega$,
\begin{align*}
h(\pi/\omega)=\omega^{-1}\sin(\pi)=0.
\end{align*}
The characteristic family starting from $q(0)=q_0$ with initial velocity $\lambda$ is
\begin{align*}
Q(t,\lambda)=q_0\cos(\omega t)+\frac{\lambda}{\omega}\sin(\omega t).
\end{align*}
The endpoint map $F_t(\lambda)=Q(t,\lambda)$ has derivative
\begin{align*}
D_\lambda F_t(\lambda)=\frac1\omega\sin(\omega t).
\end{align*}
This derivative is nonzero for $0<t<\pi/\omega$ and equals $0$ at $t=\pi/\omega$. Thus the endpoint parametrisation is locally invertible before $t=\pi/\omega$, but at $t=\pi/\omega$ nearby initial velocities focus at the same endpoint to first order, so the Hamilton-Jacobi characteristic family based at $t=0$ develops a caustic there.
[/example]
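The first conjugate time can also be located without using the closed-form solution. The following sketch integrates the Jacobi equation $h''+\omega^2h=0$ with $h(0)=0$, $h'(0)=1$ by a classical fourth-order Runge-Kutta step and reports the first sign change of $h$. The step size, the time horizon, and the function names are assumptions of this sketch.

```python
import math

def jacobi_rhs(state, omega):
    # First-order form of h'' + omega**2 * h = 0 with state = (h, h').
    h, dh = state
    return (dh, -omega * omega * h)

def rk4_step(state, dt, omega):
    def shifted(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = jacobi_rhs(state, omega)
    k2 = jacobi_rhs(shifted(state, k1, 0.5 * dt), omega)
    k3 = jacobi_rhs(shifted(state, k2, 0.5 * dt), omega)
    k4 = jacobi_rhs(shifted(state, k3, dt), omega)
    return (
        state[0] + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
        state[1] + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]),
    )

def first_conjugate_time(omega, dt=1e-4, t_max=10.0):
    # Integrate from h(0) = 0, h'(0) = 1 and return the first t > 0 at which
    # h crosses zero, refined by linear interpolation inside the step.
    t, state = 0.0, (0.0, 1.0)
    while t < t_max:
        new_state = rk4_step(state, dt, omega)
        if state[0] > 0.0 and new_state[0] <= 0.0:
            frac = state[0] / (state[0] - new_state[0])
            return t + frac * dt
        t, state = t + dt, new_state
    return None  # no conjugate point found before t_max

if __name__ == "__main__":
    omega = 2.0
    print(first_conjugate_time(omega), math.pi / omega)
```

The same loop applies to a Jacobi equation with variable coefficients, where no closed form is available; only `jacobi_rhs` would change.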
This example gives a concrete picture of the general obstruction. The Jacobi equation is not merely a second-variation test; it predicts where the Hamilton-Jacobi sufficiency certificate stops being available.
## Brachistochrone Revisited
The brachistochrone was introduced as a model problem where the Euler-Lagrange equation identifies a cycloid. At the end of the course, we can return to it and ask for a sufficiency proof, not just a derivation of the candidate.
For a bead sliding without friction under gravity, with vertical coordinate chosen so that $y>0$ measures downward distance from the release point, the travel-time functional can be written in the form
\begin{align*}
J[y]=\int_{x_A}^{x_B}\sqrt{\frac{1+(y')^2}{2gy}}\,dx.
\end{align*}
The integrand is convex in $y'$ for $y>0$, but the singularity at $y=0$ and the endpoint geometry make the global argument delicate.
[quotetheorem:3518]
[citeproof:3518]
This proof has a different status from the Euler-Lagrange calculation. The cycloid is not only stationary; inside a cycloidal field before the crossing of extremals, it is certified by a calibration. The restriction $y\ge \varepsilon>0$ avoids the singular release point: if an endpoint lies at $y=0$, then $L(y,v)$ is singular and neither regularity nor the Hamilton-Jacobi comparison is available at that endpoint. The nonintersection assumption is separate: once two cycloids cross inside the swept region, a point may carry two possible velocities, so there is no field. The admissible-class restriction is also separate from both of these failures, because the calibration only compares curves whose graphs remain in $D$; a curve that leaves $D$ is not measured by the same endpoint-plus-excess identity. Finally, closedness of the Hilbert form is the exactness input: if this one-form has nonzero [exterior derivative](/theorems/1525) or a nonzero period on the comparison region, the endpoint contribution cannot be represented by a single-valued action potential $S$.
[example: Cycloid as a Calibrated Curve]
Parametrise a cycloid by
\begin{align*}x(\theta)=r(\theta-\sin\theta),\qquad y(\theta)=r(1-\cos\theta).\end{align*}
For $0<\theta<2\pi$, its slope is obtained from
\begin{align*}\frac{dy}{dx}=\frac{dy/d\theta}{dx/d\theta}=\frac{r\sin\theta}{r(1-\cos\theta)}=\frac{\sin\theta}{1-\cos\theta}.\end{align*}
Thus, along the cycloid,
\begin{align*}1+\left(\frac{dy}{dx}\right)^2=1+\frac{\sin^2\theta}{(1-\cos\theta)^2}=\frac{(1-\cos\theta)^2+\sin^2\theta}{(1-\cos\theta)^2}.\end{align*}
Using $\sin^2\theta+\cos^2\theta=1$, the numerator is
\begin{align*}(1-\cos\theta)^2+\sin^2\theta=1-2\cos\theta+\cos^2\theta+\sin^2\theta=2(1-\cos\theta).\end{align*}
Therefore
\begin{align*}1+\left(\frac{dy}{dx}\right)^2=\frac{2}{1-\cos\theta}.\end{align*}
Since $y=r(1-\cos\theta)$, the brachistochrone integrand becomes
\begin{align*}L(y,y')=\sqrt{\frac{1+(y')^2}{2gy}}=\sqrt{\frac{2/(1-\cos\theta)}{2gr(1-\cos\theta)}}=\frac{1}{\sqrt{gr}(1-\cos\theta)}.\end{align*}
The Beltrami quantity for $L(y,v)=\sqrt{(1+v^2)/(2gy)}$ is
\begin{align*}L-v\partial_vL=\frac{\sqrt{1+v^2}}{\sqrt{2gy}}-\frac{v^2}{\sqrt{2gy}\sqrt{1+v^2}}=\frac{1}{\sqrt{2gy}\sqrt{1+v^2}}.\end{align*}
Along the cycloid this gives
\begin{align*}L-y'\partial_vL=\frac{1}{\sqrt{2gr(1-\cos\theta)}\sqrt{2/(1-\cos\theta)}}=\frac{1}{2\sqrt{gr}}.\end{align*}
Hence the first integral is constant along the cycloid, and its value $1/(2\sqrt{gr})$ fixes $r$; conversely, the member of the cycloidal field through the prescribed endpoint determines this constant.
If $S$ is the Hamilton-Jacobi potential for this field and $p(x,y)$ is the cycloidal velocity through $(x,y)$, then the Hilbert identity expresses the integrand at every competitor velocity $v$ as
\begin{align*}L(y,v)=\partial_xS(x,y)+\partial_yS(x,y)v+E(x,y,p(x,y),v).\end{align*}
Along the cycloid itself, $v=p(x,y)$, so $E(x,y,p,p)=0$ because
\begin{align*}E(x,y,p,p)=L(y,p)-L(y,p)-\partial_vL(y,p)(p-p)=0.\end{align*}
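For a competitor $y$ with graph in the field region, setting $v=y'(x)$ turns the first two terms into the total derivative $\frac{d}{dx}S(x,y(x))$, so integrating the identity gives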
\begin{align*}J[y]=S(x_B,y_B)-S(x_A,y_A)+\int_{x_A}^{x_B}E(x,y(x),p(x,y(x)),y'(x))\,dx.\end{align*}
The cycloidal arc has the same endpoint term and zero excess, while convexity of $v\mapsto L(y,v)$ for $y>0$ gives nonnegative excess for competitors inside the calibrated field. Thus the travel time splits into a fixed endpoint contribution and a nonnegative variational remainder.
[/example]
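The two computations in this example can be checked numerically. The sketch below assumes $g=9.81$ and $r=1$, and restricts to the arc $0.5\le\theta\le2.5$ so that $y$ stays away from the singular release point; these values and the function names are illustrative choices. It evaluates the Beltrami quantity along the cycloid and compares the travel time on the cycloidal arc with the travel time on the straight chord joining the same endpoints, using the same integrand $\sqrt{(1+(y')^2)/(2gy)}$ for both.

```python
import math

G = 9.81  # assumed gravitational acceleration

def cycloid_point(r, theta):
    return r * (theta - math.sin(theta)), r * (1.0 - math.cos(theta))

def cycloid_slope(theta):
    return math.sin(theta) / (1.0 - math.cos(theta))

def beltrami(y, v, g=G):
    # L - v * dL/dv for L = sqrt((1 + v**2) / (2 g y)).
    return 1.0 / (math.sqrt(2.0 * g * y) * math.sqrt(1.0 + v * v))

def cycloid_time(r, theta0, theta1, n=20000, g=G):
    # Midpoint rule in theta, using dx = r (1 - cos(theta)) d(theta).
    h = (theta1 - theta0) / n
    total = 0.0
    for i in range(n):
        th = theta0 + (i + 0.5) * h
        _, y = cycloid_point(r, th)
        v = cycloid_slope(th)
        L = math.sqrt((1.0 + v * v) / (2.0 * g * y))
        total += L * r * (1.0 - math.cos(th)) * h
    return total

def chord_time(r, theta0, theta1, n=20000, g=G):
    # Same functional on the straight segment between the arc's endpoints.
    x0, y0 = cycloid_point(r, theta0)
    x1, y1 = cycloid_point(r, theta1)
    m = (y1 - y0) / (x1 - x0)
    h = (x1 - x0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * h
        y = y0 + m * (x - x0)
        total += math.sqrt((1.0 + m * m) / (2.0 * g * y)) * h
    return total

if __name__ == "__main__":
    r = 1.0
    for th in (0.5, 1.5, 2.5):
        _, y = cycloid_point(r, th)
        print(beltrami(y, cycloid_slope(th)), 1.0 / (2.0 * math.sqrt(G * r)))
    print(cycloid_time(r, 0.5, 2.5), chord_time(r, 0.5, 2.5))
```

Since the integrand along the cycloid reduces to $\sqrt{r/g}\,d\theta$, the arc time should equal $\sqrt{r/g}\,(\theta_1-\theta_0)$, and the chord should be slower.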
The brachistochrone also illustrates the limitation of the method. If the chosen endpoint lies beyond the caustic of the cycloidal family, the field is no longer single-valued and the same Hamilton-Jacobi certificate cannot be used on that larger region.
## Minimal Surfaces and Global Comparison
The minimal surface of revolution gives a second test case. The Euler-Lagrange equation produces catenoids, but the Goldschmidt phenomenon shows that a smooth stationary surface need not be the global minimiser.
For a surface of revolution generated by a positive curve $r=r(x)$ between two coaxial circles, the area functional is
\begin{align*}
A[r]=2\pi\int_a^b r\sqrt{1+(r')^2}\,dx.
\end{align*}
The catenary solutions are extremals, and locally they can be organised into fields as long as the corresponding family does not fold.
[quotetheorem:3518]
[citeproof:3518]
This theorem deliberately has a restricted conclusion. It explains why a catenoid can be locally sufficient while still losing to a discontinuous competitor outside the smooth field class. The restriction to $D$ keeps the comparison inside the region where the catenary family is a single-valued field; after the family folds, the Hamilton-Jacobi potential cannot be used as a global action coordinate. The condition $r>0$ is also structural, because the area integrand and the surface-of-revolution interpretation degenerate at the rotation axis. The nonnegative excess comes from convexity in $v$, so if a different surface functional lost this convexity, the same endpoint calibration would not prevent a competitor direction with smaller area.
[example: Goldschmidt Competition]
Take two equal coaxial circles of radius $R$ in the planes $x=-a$ and $x=a$, and consider the symmetric catenoid
\begin{align*}
r(x)=c\cosh(x/c)
\end{align*}
with boundary condition
\begin{align*}
R=c\cosh(a/c).
\end{align*}
Put $\alpha=a/c$. Then $R=c\cosh\alpha$. Since
\begin{align*}
r'(x)=\sinh(x/c)
\end{align*}
and
\begin{align*}
1+(r'(x))^2=1+\sinh^2(x/c)=\cosh^2(x/c),
\end{align*}
the area of the catenoid is
\begin{align*}
A_{\mathrm{cat}}=2\pi\int_{-a}^{a}c\cosh(x/c)\cosh(x/c)\,dx.
\end{align*}
With $u=x/c$, so $dx=c\,du$ and the limits become $-\alpha$ and $\alpha$, this is
\begin{align*}
A_{\mathrm{cat}}=2\pi c^2\int_{-\alpha}^{\alpha}\cosh^2u\,du.
\end{align*}
Using $\cosh^2u=(1+\cosh(2u))/2$, we get
\begin{align*}
\int_{-\alpha}^{\alpha}\cosh^2u\,du=\alpha+\frac12\sinh(2\alpha).
\end{align*}
Therefore
\begin{align*}
A_{\mathrm{cat}}=2\pi c^2\left(\alpha+\frac12\sinh(2\alpha)\right).
\end{align*}
The disconnected pair of disks has area
\begin{align*}
A_{\mathrm{disk}}=2\pi R^2=2\pi c^2\cosh^2\alpha.
\end{align*}
Thus the catenoid beats the disk pair exactly when
\begin{align*}
\alpha+\frac12\sinh(2\alpha)<\cosh^2\alpha.
\end{align*}
Equivalently, since $\frac12\sinh(2\alpha)=\sinh\alpha\cosh\alpha$, the difference is
\begin{align*}
A_{\mathrm{disk}}-A_{\mathrm{cat}}=2\pi c^2\left(\cosh^2\alpha-\sinh\alpha\cosh\alpha-\alpha\right).
\end{align*}
At $\alpha=0$ the bracket equals $1$. For larger $\alpha$, the term $\cosh^2\alpha-\sinh\alpha\cosh\alpha=\cosh\alpha(\cosh\alpha-\sinh\alpha)=e^{-\alpha}\cosh\alpha=\frac12(1+e^{-2\alpha})$ stays bounded, so the term $-\alpha$ dominates and the bracket becomes negative. Since $\frac12(1+e^{-2\alpha})-\alpha$ is strictly decreasing in $\alpha$, the areas cross exactly once as $\alpha$ grows. Past that crossing, the smooth catenoid can still be stationary, but it is not the global minimiser in the broader class that allows the disk-pair collapse. The Hamilton-Jacobi certificate only compares competitors that remain inside the smooth catenary field region, so it does not rule out this discontinuous competitor.
[/example]
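The crossing value of $\alpha$ has no elementary closed form, but it is easy to approximate. The following sketch applies bisection to the bracket $\cosh^2\alpha-\sinh\alpha\cosh\alpha-\alpha$ on the interval $[0,2]$, where it changes sign; the interval and the function names are assumptions of this sketch. It also reports the corresponding ratio $a/R=\alpha/\cosh\alpha$ of half-separation to ring radius.

```python
import math

def bracket(alpha):
    # Proportional to A_disk - A_cat; positive while the catenoid wins.
    return math.cosh(alpha) ** 2 - math.sinh(alpha) * math.cosh(alpha) - alpha

def crossing(lo=0.0, hi=2.0, iters=100):
    # bracket(0) = 1 > 0 and bracket(2) < 0, so bisection brackets a root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bracket(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha_star = crossing()
ratio_star = alpha_star / math.cosh(alpha_star)  # a / R at the crossing

if __name__ == "__main__":
    print(alpha_star, ratio_star)
```

For $\alpha$ below `alpha_star` this catenoid has less area than the pair of disks, and for larger $\alpha$ the disks win.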
The minimal-surface example warns against interpreting sufficiency as an absolute statement detached from the admissible class. Hamilton-Jacobi theory certifies minimisers relative to the region and class on which the calibration exists.
## Unifying the Classical Conditions
The course has developed several tests, and their relationship is now visible. The Euler-Lagrange equation identifies stationary curves; Legendre, Jacobi, and Weierstrass refine stationarity; Hamilton-Jacobi turns the strongest local information into a global comparison argument.
[explanation: Necessary and Sufficient Conditions as a Hierarchy]
The Legendre condition is infinitesimal in the velocity variable. It asks whether the second variation can be nonnegative against highly localised variations, so it is a necessary condition for a weak minimum.
The Jacobi condition is infinitesimal along the whole curve. It studies the second variation through the Jacobi equation and detects conjugate points, which signal the loss of local minimality and the breakdown of the extremal field.
The Weierstrass condition is pointwise but nonlinear in the competing velocity. It strengthens the second-order Legendre information into a finite comparison between the field velocity and arbitrary velocities.
The Hamilton-Jacobi condition is global on a region. A solution $S$ is a calibration: it converts the action into an endpoint term plus nonnegative excess, giving sufficiency for all competitors contained in the calibrated region.
[/explanation]
This hierarchy also clarifies the role of regularity, because each layer assumes enough smooth structure to pass from local equations to fields and then to a calibrated comparison. It is useful to collect the hypotheses into a single final theorem, since this is the practical checklist for applying the classical method to a new variational problem.
[quotetheorem:7001]
[citeproof:7001]
The theorem is the endpoint of the classical smooth theory. It shows that sufficiency is not a separate trick but the culmination of the Euler-Lagrange, Weierstrass, Jacobi, and Hamilton-Jacobi viewpoints. It does not claim that every extremal is a minimiser: for the harmonic oscillator, an extremal segment past the first conjugate time no longer comes from an invertible endpoint parametrisation, so the field needed for the Hamilton-Jacobi certificate has folded. The Weierstrass hypothesis is also independent: a smooth nonconvex integrand such as $L(v)=v^4-v^2$ has a stationary velocity $v=0$, but the excess of a competitor velocity $v$ relative to the field velocity $0$ equals $v^4-v^2$, which is negative for small nonzero $v$, so the endpoint term alone cannot prove minimality. Regularity can fail separately; for instance $L(v)=v^4$ has $\partial^2_{vv}L(0)=0$, so the Legendre transform is singular at zero velocity and a momentum gradient does not determine a unique regular characteristic by the [inverse function theorem](/theorems/51). Finally, the theorem does not claim global minimality beyond the chosen class, as the catenoid versus disk-pair comparison demonstrates. In modern language, the function $S$ is a calibration, and the theorem proves minimality exactly on the region where that calibration and its nonnegative excess identity are valid.
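The two counterexamples in this paragraph can be checked directly. The sketch below evaluates the Weierstrass excess $E(p,v)=L(v)-L(p)-L'(p)(v-p)$ for the velocity-only integrands $L(v)=v^4-v^2$ and $L(v)=v^4$; the function names are hypothetical.

```python
def excess(L, dL, p, v):
    # Weierstrass excess of competitor velocity v against field velocity p.
    return L(v) - L(p) - dL(p) * (v - p)

def L_nonconvex(v):
    return v ** 4 - v ** 2

def dL_nonconvex(v):
    return 4.0 * v ** 3 - 2.0 * v

def L_quartic(v):
    return v ** 4

def dL_quartic(v):
    return 4.0 * v ** 3

def d2L_quartic(v):
    # Second derivative of v**4; it vanishes at v = 0, so regularity fails there.
    return 12.0 * v ** 2

if __name__ == "__main__":
    print(excess(L_nonconvex, dL_nonconvex, 0.0, 0.5))  # negative excess
    print(excess(L_quartic, dL_quartic, 0.0, 0.5))      # nonnegative excess
    print(d2L_quartic(0.0))                             # degenerate Legendre matrix
```

The first integrand fails the Weierstrass hypothesis at $p=0$, while the second satisfies it but fails regularity, so the two hypotheses are independent.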
## Outlook Toward Direct Methods
The final question is what happens when the hypotheses above fail. Many important variational problems do not have smooth minimisers, do not have a single-valued extremal field, or have minimising sequences that leave every compact smooth class.
Classical methods begin with a candidate and test it. Direct methods reverse the order: first prove that a minimiser exists in a complete function space, then study its regularity and derive equations it satisfies. This shift is forced by examples such as the Goldschmidt discontinuous solution, cavitation, Lavrentiev gaps, and weak limits of oscillating sequences.
[remark: Why Smooth Sufficiency Is Not Enough]
Hamilton-Jacobi sufficiency is powerful when a calibration exists, but it does not itself produce compactness of minimising sequences. It also does not decide whether the correct admissible class should contain corners, discontinuities, Sobolev maps, measures, or currents. These questions belong to the direct method, where lower semicontinuity, coercivity, weak compactness, and relaxation replace the construction of smooth extremal fields.
[/remark]
This is the transition to Calculus of Variations II. The classical theory remains indispensable: it predicts the equations, conserved quantities, second-variation tests, and calibrations that any smooth minimiser should satisfy. The direct theory explains when minimisers exist, what regularity they possess, and how variational problems behave beyond the smooth regime.
## Beyond and Connected Topics
The closest continuation is the direct method in the calculus of variations. The classical [Euler-Lagrange Equation](/theorems/3504) predicts the equation a smooth minimiser should satisfy, while direct methods add coercivity, lower semicontinuity, compactness, and relaxation to prove that a minimiser exists before differentiating the functional.
A second continuation is geometric variational theory. The geodesic [First Variation Formula](/theorems/2728) and [Second Variation Formula](/theorems/2729) are not substitutes for the scalar integral formulas used in this course; they are the Riemannian analogue, where variations are vector fields along curves and curvature enters the second variation. Minimal-surface theory gives a higher-dimensional analogue through area functionals, stability inequalities, and calibration arguments.
Hamiltonian mechanics is another connected branch. The Legendre transform and Hamilton-Jacobi equation turn the Lagrangian viewpoint into phase-space dynamics, where conservation laws such as [Noether's Theorem](/theorems/3513) and Hamiltonian first integrals organise the same variational information symplectically.
Finally, weak and nonsmooth problems require different function spaces. Sobolev spaces, [weak convergence](/page/Weak%20Convergence), convexity, quasiconvexity, and measure-valued limits explain why smooth extremals can fail to exist or fail to minimise globally. These tools preserve the insight of the classical tests while replacing smooth extremal fields with compactness and lower-semicontinuity arguments.