Differential geometry is the study of smooth manifolds—curved spaces that locally look like Euclidean space—and the geometric structures defined on them. Unlike the classical extrinsic approach of studying curves and surfaces embedded in higher-dimensional Euclidean space, differential geometry emphasizes intrinsic properties: those measurements and invariants that depend only on the manifold itself, not on any ambient space. The central realization is that curvature, arising naturally from connections on tangent bundles, provides the language to measure how space curves and is the key to understanding topology through analysis. By developing calculus on manifolds, we learn that differentiation is not an absolute operation but requires a choice of connection—a way of comparing tangent spaces at nearby points.
The course traces a systematic path from foundations to applications. We begin with smooth manifolds and tangent bundles, establishing the local and global coordinate structures that make calculus possible. Differential forms then provide a coordinate-free calculus, leading naturally to curvature as the obstruction to integrability. Connections formalize the notion of parallel transport and covariant differentiation on vector bundles, revealing that curvature is a fundamental invariant of a connection. Three classical geometric structures (Riemannian metrics, symplectic forms, and complex structures) exemplify how to impose additional constraints on a manifold, each with distinct geometric meaning and topological consequences. Geodesics emerge as the locally shortest paths on a Riemannian manifold, and their existence and properties reflect the underlying curvature. Finally, the Yang-Mills equation arises from a variational principle in gauge theory, showing how connections themselves become dynamical objects satisfying equations of motion, unifying geometry with physics.
Each chapter builds essential machinery for the next: manifolds provide the setting; differential forms give us a calculus; connections supply the notion of differentiation; geometric structures impose regularity; geodesics reveal the interplay of local curvature and global shape; and Yang-Mills demonstrates that these structures obey natural dynamical laws. Together, they form a unified framework in which local infinitesimal invariants determine global geometric and topological properties.
# 1. Smooth Manifolds
Differential geometry begins with the question: what is the right setting for doing calculus on curved spaces? Euclidean space $\mathbb{R}^n$ is the model, but many natural geometric objects — spheres, projective spaces, Lie groups — are not flat. A smooth manifold is the answer: a topological space that locally looks like $\mathbb{R}^n$, equipped with enough structure to make calculus well-defined. This chapter builds that structure from the ground up, introduces the three equivalent constructions of the tangent space, develops the language of vector bundles, and proves the Frobenius integrability theorem, which tells us precisely when a family of tangent directions can be integrated into submanifolds.
## Topological and Smooth Manifolds
The first step is to identify what topological properties are needed to do geometry sensibly.
[definition: Topological Manifold]
A **topological manifold** is a Hausdorff, second countable topological space $X$ that is locally homeomorphic to $\mathbb{R}^n$ for some fixed $n$, called the **dimension** of $X$.
Second countability means there exists a countable basis $\{U_i\}_{i \in \mathbb{N}}$ for the topology of $X$. Hausdorff means distinct points can be separated by disjoint open sets.
[/definition]
The Hausdorff and second countability conditions are not merely technical: without Hausdorff, limits are not unique and the geometry becomes pathological; without second countability, partitions of unity (a key tool) need not exist.
A **chart** centred at $p \in X$ is a pair $(U, \varphi)$ where $U$ is open in $X$, $p \in U$, and
\begin{align*}
\varphi : (U, p) \xrightarrow{\;\sim\;} (B^n, 0)
\end{align*}
is a homeomorphism onto the open unit ball $B^n = B(0,1) \subset \mathbb{R}^n$. A choice of chart near $p$ gives **local coordinates** $\{x_1, \ldots, x_n\}$ on $X$ near $p$, via the pullback of standard coordinates on $\mathbb{R}^n$. Given two charts $(U, \varphi)$ and $(V, \psi)$ at $p$, the **transition function** is
\begin{align*}
\psi \circ \varphi^{-1} : \varphi(U \cap V) \longrightarrow \psi(U \cap V),
\end{align*}
which is a homeomorphism between open subsets of $\mathbb{R}^n$. Since the target and domain lie in $\mathbb{R}^n$, it makes sense to ask whether this transition function is smooth. Requiring smoothness is precisely what promotes a topological manifold to a smooth manifold.
[definition: Smooth Manifold]
A **smooth (differentiable) manifold** is a topological manifold $X$ equipped with an **atlas** of charts $\{(U_\alpha, \varphi_\alpha)\}_{\alpha \in A}$ — an open cover of $X$ — such that all transition functions
\begin{align*}
\varphi_{\alpha\beta} = \varphi_\beta \circ \varphi_\alpha^{-1} : \varphi_\alpha(U_\alpha \cap U_\beta) \longrightarrow \varphi_\beta(U_\alpha \cap U_\beta)
\end{align*}
are smooth diffeomorphisms for all $\alpha, \beta$. A choice of such an atlas is called a **differentiable structure** on $X$.
[/definition]
Any smooth atlas is contained in a unique **maximal atlas**, obtained by including every chart whose transition maps with the charts of the given atlas are smooth. The construction is direct and needs no appeal to Zorn's lemma: two charts that are each compatible with the given atlas are automatically compatible with each other, so the enlarged collection is again a smooth atlas. Two atlases that determine the same maximal atlas define the same differentiable structure. Remarkably, the same topological manifold can admit distinct differentiable structures: Milnor's exotic spheres are the classical example.
We write $M^n$ for a smooth $n$-dimensional manifold.
[definition: Smooth Map]
If $M, N$ are smooth manifolds, a map $f : M \to N$ is **smooth at** $p \in M$ if for any chart $(U, \varphi)$ at $p$ and chart $(V, \psi)$ at $f(p) \in N$, the composition
\begin{align*}
\psi \circ f \circ \varphi^{-1} : \varphi(U) \longrightarrow \psi(V)
\end{align*}
is smooth (in the classical sense, as a map between open subsets of Euclidean spaces). We say $f$ is **smooth** if it is smooth at every $p \in M$. This notion is independent of the choice of charts, precisely because the transition maps are smooth.
[/definition]
With smooth maps defined, the correct equivalence relation on smooth manifolds is diffeomorphism: a smooth bijection with smooth inverse. This is the natural isomorphism in the category of smooth manifolds — two diffeomorphic manifolds are geometrically indistinguishable.
[definition: Diffeomorphism]
Smooth manifolds $M, N$ are **diffeomorphic** if there exists a smooth bijection $f : M \to N$ with smooth inverse.
[/definition]
A diffeomorphism preserves not just the topology but the entire differentiable structure. The examples below illustrate the most basic smooth manifolds and will serve as recurring test cases throughout the chapter.
[example: $\mathbb{R}^n$ and $S^n$ as Smooth Manifolds]
$\mathbb{R}^n$ is a smooth manifold with a single-chart atlas $(\mathbb{R}^n, \mathrm{id})$.
The sphere $S^n = \{x \in \mathbb{R}^{n+1} : |x| = 1\}$ is a smooth manifold. The atlas has two charts: stereographic projection $\varphi_N : S^n \setminus \{N\} \to \mathbb{R}^n$ from the north pole, and the analogous projection $\varphi_S$ from the south pole. The single non-degenerate transition function works out to be
\begin{align*}
(x_0, x_1, \ldots, x_{n-1}) \longmapsto \left(\frac{x_0}{|x|^2}, \ldots, \frac{x_{n-1}}{|x|^2}\right),
\end{align*}
which is smooth on $\mathbb{R}^n \setminus \{0\}$ and is in fact self-inverse. This confirms $S^n$ is a smooth manifold.
[/example]
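The transition formula for the two stereographic charts can be checked numerically. The sketch below assumes the usual conventions, which the text does not spell out: $N = (0, \ldots, 0, 1)$, $\varphi_N(x) = (x_0, \ldots, x_{n-1})/(1 - x_n)$ and $\varphi_S(x) = (x_0, \ldots, x_{n-1})/(1 + x_n)$. All function names are illustrative.

```python
def phi_north(x):
    """Stereographic projection from N = (0, ..., 0, 1); x is a point of S^n other than N."""
    *head, last = x
    return [c / (1.0 - last) for c in head]


def phi_south(x):
    """Stereographic projection from S = (0, ..., 0, -1); x is a point of S^n other than S."""
    *head, last = x
    return [c / (1.0 + last) for c in head]


def phi_north_inverse(y):
    """Inverse of phi_north: sends y in R^n to the corresponding point of S^n."""
    r2 = sum(c * c for c in y)
    return [2.0 * c / (r2 + 1.0) for c in y] + [(r2 - 1.0) / (r2 + 1.0)]


def inversion(y):
    """The claimed transition map y -> y / |y|^2 on R^n minus the origin."""
    r2 = sum(c * c for c in y)
    return [c / r2 for c in y]


def transition(y):
    """phi_south composed with the inverse of phi_north."""
    return phi_south(phi_north_inverse(y))


def close(u, v, tol=1e-12):
    """Componentwise comparison of two coordinate lists up to a tolerance."""
    return len(u) == len(v) and all(abs(a - b) <= tol for a, b in zip(u, v))


y = [0.3, -1.2, 2.0]  # a sample point in the chart for S^3
print(close(transition(y), inversion(y)))   # transition agrees with y / |y|^2
print(close(inversion(inversion(y)), y))    # the inversion is self-inverse
```

Under these conventions both checks should succeed for any non-zero `y`; a numerical check at sample points is of course no substitute for the algebraic computation.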
The sphere is the prototypical compact manifold. Projective space provides a more subtle example: it cannot be realised as an obvious subspace of $S^n$, and its construction as a quotient by a discrete group illustrates a general technique that will recur when we discuss quotient manifolds and Grassmannians.
[example: Real Projective Space $\mathbb{R}P^n$]
Define $\mathbb{R}P^n$ to be the set of lines through the origin in $\mathbb{R}^{n+1}$:
\begin{align*}
\mathbb{R}P^n := \{v \in \mathbb{R}^{n+1} \setminus \{0\}\}/{\sim} \;=\; S^n/\{\pm 1\},
\end{align*}
where $v \sim \lambda v$ for all $\lambda \in \mathbb{R} \setminus \{0\}$. Points are denoted by homogeneous coordinates $[x_0 : \cdots : x_n]$.
For $i = 0, \ldots, n$, let $U_i = \{x_i \neq 0\}$ with chart map $\varphi_i : U_i \to \mathbb{R}^n$ given by
\begin{align*}
\varphi_i([x_0 : \cdots : x_n]) = \frac{1}{x_i}(x_0, \ldots, \hat{x}_i, \ldots, x_n)
\end{align*}
(omit $x_i$, divide by it). These are homeomorphisms, showing $\mathbb{R}P^n$ is a topological manifold. The transition function $\varphi_j \circ \varphi_i^{-1}$, which sends
\begin{align*}
(y_1, \ldots, y_n) \longmapsto \frac{1}{y_j}(y_1, \ldots, y_{i-1}, 1, y_i, \ldots, \hat{y}_j, \ldots, y_n),
\end{align*}
is smooth on the domain $\{y_j \neq 0\}$, so $\mathbb{R}P^n$ is a smooth manifold.
[/example]
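As a concrete illustration of the charts on $\mathbb{R}P^n$, the following sketch represents a point by any non-zero homogeneous coordinate vector and indexes coordinates from $0$, as in $[x_0 : \cdots : x_n]$. The function names are illustrative, and the inverse chart picks the representative with $x_i = 1$.

```python
def chart(i, x):
    """phi_i: drop the i-th homogeneous coordinate and divide the others by it."""
    if x[i] == 0:
        raise ValueError("point lies outside U_i")
    return [x[k] / x[i] for k in range(len(x)) if k != i]


def chart_inverse(i, y):
    """phi_i^{-1}: insert a 1 in slot i to get a homogeneous representative."""
    return list(y[:i]) + [1.0] + list(y[i:])


def transition(i, j, y):
    """phi_j composed with phi_i^{-1}; defined where the j-th homogeneous coordinate is non-zero."""
    return chart(j, chart_inverse(i, y))


# phi_0 does not depend on the representative of [2 : 4 : 6] = [-1 : -2 : -3].
print(chart(0, [2.0, 4.0, 6.0]), chart(0, [-1.0, -2.0, -3.0]))

# Transition between the charts U_0 and U_1 on RP^2, and back again.
y = [2.0, 3.0]
z = transition(0, 1, y)
print(z, transition(1, 0, z))
```

The transition is a rational function of the chart coordinates with non-vanishing denominator on its domain, which is the smoothness claim made in the example.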
## Immersions, Submersions, and Submanifolds
With smooth maps defined, we can study their local linear approximation via the rank of the coordinate Jacobian. This rank is independent of the choice of charts (by the chain rule), so the following definitions make sense.
[definition: Immersion and Submersion]
Let $f : M^n \to N^k$ be smooth. For charts $(U, \varphi)$ at $p$ and $(V, \psi)$ at $f(p)$, let $D(\psi \circ f \circ \varphi^{-1})|_{\varphi(p)}$ denote the Jacobian matrix. Then $f$ is:
- an **immersion at** $p$ if this Jacobian is injective (rank $n$, so $n \leq k$);
- a **submersion at** $p$ if this Jacobian is surjective (rank $k$, so $k \leq n$).
[/definition]
Immersions and submersions are the two extreme rank conditions on the differential $df_p$. An immersion maps $M$ into $N$ without collapsing any tangent direction at $p$; a submersion looks locally like a projection of $M$ onto $N$, with full rank at each point. Neither condition alone guarantees global injectivity or surjectivity, which are stronger requirements. The condition for a subset of a manifold to be itself a manifold is captured by the notion of a submanifold.
[definition: Submanifold]
If $i : N^k \hookrightarrow M^n$ is a smooth injective map, then $i(N)$ is a **submanifold** of $M$ if $i$ is an immersion and a homeomorphism onto its image (i.e., the manifold topology on $N$ agrees with the subspace topology on $i(N)$ inherited from $M$).
[/definition]
The homeomorphism condition matters. Consider the torus $T^2 = \mathbb{R}^2/\mathbb{Z}^2$ and the map $i : \mathbb{R} \to T^2$ given by $t \mapsto [t, \alpha t]$ for some irrational $\alpha$. This map is injective and an immersion everywhere, but the subspace topology on $i(\mathbb{R})$ is not the standard topology on $\mathbb{R}$ (the image is a dense line wrapping around the torus without closing up). The two topologies are incompatible, so this is an immersion but not a submanifold.
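The failure of the homeomorphism condition can be seen numerically. At an integer time $m$ the curve sits at $i(m) = [0, \alpha m]$, so its distance to $i(0) = [0, 0]$ in $T^2$ is at most the distance from $\alpha m$ to the nearest integer. The sketch below takes $\alpha = \sqrt{2}$ (an arbitrary choice of irrational) and searches for the integer time at which the curve returns closest to its starting point; the function names are illustrative.

```python
import math


def distance_to_nearest_integer(x):
    """Distance from x to the nearest integer, i.e. the distance to 0 in R/Z."""
    return abs(x - round(x))


def closest_integer_return(alpha, max_time):
    """Integer time m in 1..max_time at which [0, alpha*m] is closest to [0, 0]."""
    return min(
        range(1, max_time + 1),
        key=lambda m: distance_to_nearest_integer(alpha * m),
    )


alpha = math.sqrt(2.0)
best_time = closest_integer_return(alpha, 500)
gap = distance_to_nearest_integer(alpha * best_time)
print(best_time, gap)
```

The points $i(0)$ and $i(m)$ are then very close in the subspace topology of $T^2$ even though $0$ and $m$ are far apart in $\mathbb{R}$. Allowing larger times makes the gap smaller still, which is the numerical shadow of the image being dense.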
[definition: Regular Value]
Let $f : M^n \to N^k$ be smooth. A point $q \in N$ is a **regular value** of $f$ if for every $p \in f^{-1}(q)$, $f$ is a submersion at $p$ (i.e., the Jacobian at $\varphi(p)$ has rank $k$). Points $p$ with this property are called **regular points**.
[/definition]
The Pre-Image Theorem is the primary tool for constructing submanifolds:
[quotetheorem:1512]
[citeproof:1512]
In practice, to verify that a set $S \subset M$ is a submanifold, write $S = f^{-1}(q)$ for some smooth $f$ and check that $q$ is a regular value. The regularity hypothesis is essential. Consider $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = x^2 + y^2$ at $q = 0$: here $Df|_{(0,0)} = 0$ is not surjective, and the level set $f^{-1}(0) = \{(0,0)\}$ is a single point rather than the $(n-k)$-dimensional (here $1$-dimensional) submanifold that the theorem provides at a regular value. The theorem says nothing about critical values: $f^{-1}(q)$ may still be a manifold, as it is in this example, but there is no guarantee from the theorem alone, and its dimension need not be $n - k$.
[example: The Orthogonal Group $O(n)$]
Consider $f : \mathrm{Mat}_n(\mathbb{R}) \to \mathrm{Sym}_n(\mathbb{R})$ defined by $f(A) = A^\top A$. Its derivative is $Df|_A(H) = H^\top A + A^\top H$. For $I$ to be a regular value, $Df|_A$ must be surjective onto symmetric matrices at every $A \in f^{-1}(I)$, not only at $A = I$. Given $A$ with $A^\top A = I$ and any symmetric $S$, take $H = AS/2$ and compute $Df|_A(AS/2) = \tfrac{1}{2} S^\top A^\top A + \tfrac{1}{2} A^\top A S = \tfrac{1}{2} S + \tfrac{1}{2} S = S$. (At $A = I$ this reduces to $Df|_I(H) = H + H^\top$ with $H = S/2$.) Thus the identity matrix $I$ is a regular value of $f$, and $f^{-1}(I) = O(n)$ is a smooth submanifold of $\mathrm{Mat}_n(\mathbb{R}) \cong \mathbb{R}^{n^2}$.
[/example]
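The surjectivity of $Df|_A$ at a point $A \in O(n)$ can be checked numerically. For orthogonal $A$ and symmetric $S$, the choice $H = AS/2$ gives $H^\top A + A^\top H = \tfrac{1}{2} S A^\top A + \tfrac{1}{2} A^\top A S = S$. The sketch below tests this for one rotation matrix in $O(2)$; the helper names and the sample matrices are illustrative choices, not part of the notes.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * a for a in row] for row in A]


def max_abs_difference(A, B):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def derivative(A, H):
    """Df|_A(H) = H^T A + A^T H for f(A) = A^T A."""
    return add(matmul(transpose(H), A), matmul(transpose(A), H))


def preimage_direction(A, S):
    """For orthogonal A and symmetric S, the direction H = A S / 2."""
    return scale(0.5, matmul(A, S))


theta = 0.7
A = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
S = [[1.0, 2.0], [2.0, -3.0]]
H = preimage_direction(A, S)
print(max_abs_difference(derivative(A, H), S))  # should be at rounding-error level
```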
The dimension of $O(n)$ is $\dim \mathrm{Mat}_n - \dim \mathrm{Sym}_n = n^2 - n(n+1)/2 = n(n-1)/2$. With the Pre-Image Theorem established, it is natural to ask: how abundant are regular values? Sard's Theorem gives the answer: the set of critical values has measure zero, so regular values are dense.
[remark: Sard's Theorem]
Sard's Theorem states that if $f : M^n \to N^k$ is smooth, then the set of regular values of $f$ is dense in $N^k$. In particular, regular values always exist. This is quoted without proof here; it is a foundational result of differential topology.
[/remark]
## Tangent Vectors and Tangent Bundles
Having defined smooth manifolds, the next task is to define tangent vectors intrinsically — without embedding the manifold in any ambient Euclidean space. There are three equivalent constructions, each with its own advantages. The course presents all three.
### Tangent Vectors via the Embedding (Provisional)
When $\Sigma \subset \mathbb{R}^N$ is a smooth submanifold, the tangent space at $p \in \Sigma$ has a concrete meaning: it is the set of velocity vectors of smooth curves through $p$. Formally, with embedding $i : \Sigma \hookrightarrow \mathbb{R}^N$ and a chart $\varphi : (U,p) \to (B^k, 0)$,
\begin{align*}
T_p\Sigma = \operatorname{Image}\!\left(D(i \circ \varphi^{-1})\big|_{\varphi(p)}\right) \subset \mathbb{R}^N.
\end{align*}
This is a $k$-dimensional linear subspace of $\mathbb{R}^N$ (the image of an injective linear map) and is independent of the choice of chart (since transition maps have invertible derivatives). Globally, the **tangent bundle** is
\begin{align*}
T\Sigma := \bigsqcup_{p \in \Sigma} T_p\Sigma = \{(p, v) \in \mathbb{R}^N \times \mathbb{R}^N : p \in \Sigma,\; v \in T_p\Sigma\},
\end{align*}
topologised as a subspace of $\mathbb{R}^N \times \mathbb{R}^N$. This construction has two key properties: there is a canonical projection $\pi : T\Sigma \to \Sigma$ with fibre $\pi^{-1}(p) \cong \mathbb{R}^k$, and $\pi$ admits local product structure: given a chart $(U, \varphi)$, one has $T\Sigma|_U \cong U \times \mathbb{R}^k$.
This provisional definition depends on the embedding. Our goal is an intrinsic definition that works for any smooth manifold. The embedding-based construction also motivates the abstract definition of a vector bundle.
[definition: Smooth Vector Bundle]
If $M$ is a smooth manifold, a **smooth vector bundle of rank $k$** is a smooth $(n+k)$-manifold $E$ together with a submersion $\pi : E \to M$ such that:
- each fibre $E_p = \pi^{-1}(p)$ is a $k$-dimensional vector space;
- $\pi$ is **locally split as a product**: for every $p \in M$, there exists an open neighbourhood $U \ni p$ and a diffeomorphism $\Phi : \pi^{-1}(U) \xrightarrow{\sim} U \times \mathbb{R}^k$ that restricts to a linear isomorphism $E_q \to \{q\} \times \mathbb{R}^k$ for each $q \in U$.
[/definition]
The tangent bundle $T\Sigma$ of a submanifold $\Sigma \subset \mathbb{R}^N$ is a smooth vector bundle of rank $k = \dim \Sigma$.
### Tangent Vectors via Germs of Curves
How should we describe a tangent vector to an abstract manifold, without appealing to an ambient Euclidean space in which velocities live? The most geometric answer is to say a tangent vector is the initial velocity of a curve through $p$. Two curves should represent the same tangent vector if they agree to first order at $p$ in every chart, and the tangent space should then be the set of equivalence classes. The first step in making this precise is to localise: only the behaviour of the curve arbitrarily close to $p$ is relevant.
[definition: Germ of a Curve]
A **germ of a curve** on $M$ at $p$ is a smooth map $\gamma : (-\varepsilon, \varepsilon) \to M$ with $\gamma(0) = p$, considered up to the equivalence $\gamma \sim \tau$ if there exists $\delta$ with $0 < \delta < \min(\varepsilon_\gamma, \varepsilon_\tau)$ such that $\gamma \equiv \tau$ on $(-\delta, \delta)$. Only the behaviour of the curve near $p$ matters.
[/definition]
Two germs encode the same tangent direction precisely when they agree infinitesimally in every coordinate chart. The germ-based definition is geometrically transparent, but working effectively with the tangent space requires a technical tool: bump functions allow one to localise derivations and prove that the derivation space is exactly $n$-dimensional.
The **tangent space** $T_pM$ is then defined as equivalence classes of germs, where
\begin{align*}
\gamma_1 \sim \gamma_2 \iff \text{for all charts } (U,\varphi) \text{ at } p, \quad (\varphi \circ \gamma_1)'(0) = (\varphi \circ \gamma_2)'(0).
\end{align*}
This condition says the two curves have the same velocity in every coordinate system. The addition is defined by $[\gamma_1] + [\gamma_2] = [\delta]$, where $\delta$ satisfies $(\varphi \circ \delta)'(0) = (\varphi \circ \gamma_1)'(0) + (\varphi \circ \gamma_2)'(0)$ in all charts.
Given a chart $\varphi : (U,p) \to (B^n, 0)$, the map $\alpha : TM|_U \to U \times \mathbb{R}^n$ sending $(q, [\gamma]) \mapsto (q, (\varphi \circ \gamma)'(0))$ is an isomorphism, showing $T_pM \cong \mathbb{R}^n$. This makes $TM$ into a smooth vector bundle of rank $n$.
### Tangent Vectors via Derivations
Germs of curves give a geometric picture of tangent vectors but are awkward to compute with: even adding two tangent vectors requires producing a third curve with the right velocity in every chart. A cleaner approach observes that a tangent vector should act on smooth functions by directional differentiation, producing a number from each function. What algebraic properties characterise such an action intrinsically, without mentioning curves at all? Linearity and the Leibniz rule suffice; the tangent space is then the space of all such first-order operators at $p$.
[definition: Derivation at a Point]
Let $C^\infty(M)$ denote the algebra of smooth functions $M \to \mathbb{R}$. A **derivation at** $p \in M$ is a linear map $\alpha_p : C^\infty(M) \to \mathbb{R}$ satisfying the Leibniz rule:
\begin{align*}
\alpha_p(f \cdot g) = f(p) \cdot \alpha_p(g) + \alpha_p(f) \cdot g(p)
\end{align*}
for all $f, g \in C^\infty(M)$. The set of all derivations at $p$ is a vector space denoted $\mathrm{Der}_p$.
[/definition]
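A first consequence of the Leibniz rule, worked out here as an illustration: every derivation annihilates constant functions. Writing $1$ for the constant function with value one and taking $f = g = 1$ in the Leibniz rule gives
\begin{align*}
\alpha_p(1) = \alpha_p(1 \cdot 1) = 1 \cdot \alpha_p(1) + \alpha_p(1) \cdot 1 = 2\,\alpha_p(1),
\end{align*}
so $\alpha_p(1) = 0$, and by linearity $\alpha_p(c) = c\,\alpha_p(1) = 0$ for every constant function $c$. This is the behaviour one expects of a directional derivative.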
The utility of derivations depends on a foundational existence result.
[quotetheorem:1513]
[citeproof:1513]
[remark: Bump Functions and Complex Manifolds]
Bump functions exist on smooth manifolds but have no analogue on complex manifolds: by the identity theorem, a holomorphic function that vanishes on a non-empty open subset of a connected complex manifold vanishes identically, so there are no non-zero compactly supported holomorphic functions. This is a fundamental difference between real differential geometry and complex geometry: many constructions in this course use bump functions extensively and have no direct complex analogue.
[/remark]
Bump functions imply locality of derivations: if two smooth functions agree near $p$, then any derivation at $p$ gives the same value on both. This locality means a derivation depends only on the germ of a function at $p$, not on its global behaviour, and reduces the dimension count to a coordinate calculation. The following theorem confirms that derivations give exactly $n$ independent directions at each point.
[quotetheorem:1514]
[citeproof:1514]
The theorem tells us that the derivation definition of tangent space is both correct and computable: the partial derivative operators $\partial/\partial x_i|_p$ in any coordinate chart form an explicit basis. The argument relies crucially on smoothness: the expansion $f(x) = f(0) + \sum_i x_i g_i(x)$ uses the fundamental theorem of calculus and produces a $g_i$ that is smooth only if $f$ is smooth. On a $C^k$ manifold with $k$ finite, the $g_i$ is only $C^{k-1}$, and the Leibniz calculation loses a derivative; for $C^1$ manifolds the space of derivations can in fact be infinite-dimensional, so the derivation definition collapses. Likewise, at points where $f$ is only finitely differentiable the Taylor expansion used above fails, and the identification of derivations with partial derivative operators breaks down. This is why the theory of smooth manifolds — $C^\infty$, not $C^k$ — is the natural setting. This motivates the following official definition.
[definition: Tangent Space via Derivations]
We define $T_pM$, the **tangent space of $M$ at $p$**, to be $\mathrm{Der}_p$. In a local coordinate chart $(U, \varphi)$ near $p$ with coordinates $\{x_1, \ldots, x_n\}$,
\begin{align*}
T_pM = \operatorname{Span}\!\left\langle \frac{\partial}{\partial x_1}\bigg|_p, \ldots, \frac{\partial}{\partial x_n}\bigg|_p \right\rangle.
\end{align*}
[/definition]
This definition gives an explicit basis. If $\{y_1, \ldots, y_n\}$ are different local coordinates at $p$, the chain rule gives the coordinate change formula:
\begin{align*}
\frac{\partial}{\partial x_i}\bigg|_p = \sum_{j=1}^n \frac{\partial y_j}{\partial x_i}\bigg|_p \cdot \frac{\partial}{\partial y_j}\bigg|_p.
\end{align*}
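As an illustration of this formula, take polar coordinates $(r, \theta)$ in the role of the $x_i$ and Cartesian coordinates $(u, v) = (r\cos\theta, r\sin\theta)$ in the role of the $y_j$, on the plane with a closed ray from the origin removed (the names $u, v$ are chosen here only to avoid a clash with the $x_i$). Then
\begin{align*}
\frac{\partial}{\partial r} &= \frac{\partial u}{\partial r}\,\frac{\partial}{\partial u} + \frac{\partial v}{\partial r}\,\frac{\partial}{\partial v} = \cos\theta\,\frac{\partial}{\partial u} + \sin\theta\,\frac{\partial}{\partial v}, \\
\frac{\partial}{\partial \theta} &= \frac{\partial u}{\partial \theta}\,\frac{\partial}{\partial u} + \frac{\partial v}{\partial \theta}\,\frac{\partial}{\partial v} = -r\sin\theta\,\frac{\partial}{\partial u} + r\cos\theta\,\frac{\partial}{\partial v}.
\end{align*}
The coefficient matrix is the Jacobian of $(r, \theta) \mapsto (u, v)$, with determinant $r \neq 0$, so both pairs are bases of the tangent space at every point of the domain.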
The two definitions are equivalent: a germ of a curve $\gamma$ through $p$ yields a derivation via $f \mapsto (f \circ \gamma)'(0)$, and this correspondence is a vector space isomorphism.
### Tangent Vectors via Cocycles
The germ and derivation definitions describe tangent vectors at a single point $p$; assembling the fibres into a smooth bundle still requires verifying compatibility across chart overlaps. Can we turn this compatibility data into the definition itself — build the tangent bundle directly from how the fibres transform under changes of chart? This is the cocycle perspective: specify the bundle by recording the invertible linear maps that glue local products $U_\alpha \times \mathbb{R}^n$ together, subject to a consistency condition on triple overlaps. The construction is less geometric than curves or derivations but vastly more flexible, because every linear-algebraic operation on fibres (dual, tensor product, exterior power, symmetric square) is encoded by a corresponding operation on the gluing data.
From the local triviality of a vector bundle $E \to M$, whenever two trivialising neighbourhoods $U_\alpha, U_\beta$ overlap, the two trivialisations differ by a smooth map $\psi_{\alpha\beta} : U_\alpha \cap U_\beta \to GL_k(\mathbb{R})$, where $\psi_{\alpha\beta}(q)$ converts fibre coordinates over $U_\alpha$ into fibre coordinates over $U_\beta$. These transition functions satisfy the **cocycle condition**: $\psi_{\alpha\alpha} = \mathrm{id}$ and $\psi_{\gamma\alpha}\,\psi_{\beta\gamma}\,\psi_{\alpha\beta} = \mathrm{id}$ (pointwise matrix product) on all triple overlaps.
Conversely, any collection of smooth maps $\psi_{\alpha\beta} : U_\alpha \cap U_\beta \to GL_k(\mathbb{R})$ satisfying the cocycle condition determines a vector bundle: take the disjoint union $\bigsqcup_\alpha (U_\alpha \times \mathbb{R}^k)$ and identify $(q,v) \in U_\alpha \times \mathbb{R}^k$ with $(q, \psi_{\alpha\beta}(q) v) \in U_\beta \times \mathbb{R}^k$ for $q \in U_\alpha \cap U_\beta$.
For the tangent bundle $TM$, the cocycles are simply the Jacobians of the transition functions: given an atlas $\{(U_\alpha, \varphi_\alpha)\}$, define
\begin{align*}
\varphi_{\alpha\beta} : U_\alpha \cap U_\beta \to GL_n(\mathbb{R}), \quad p \mapsto D(\varphi_\beta \circ \varphi_\alpha^{-1})\big|_{\varphi_\alpha(p)}.
\end{align*}
The cocycle condition is then just the chain rule. This construction recovers the same tangent bundle as the other two definitions, and the coefficients $\partial y_j / \partial x_i$ in the coordinate change formula for $\partial/\partial x_i|_p$ given earlier are exactly the entries of this Jacobian.
[remark: Usefulness of the Cocycle Approach]
The cocycle approach makes it easy to construct new bundles from old ones. Given the cocycles $\{\psi_{\alpha\beta}\}$ for $E$, one constructs the dual bundle $E^*$ from $\{\psi_{\alpha\beta}^{-\top}\}$, the tensor product $E \otimes F$ from $\{\psi_{\alpha\beta} \otimes \chi_{\alpha\beta}\}$, and so on. This is much more transparent than trying to define the bundle structure directly.
[/remark]
The following meta-theorem shows these three constructions are the only reasonable notion of tangent bundle:
[quotetheorem:1515]
The proof is given in Spivak's *Comprehensive Introduction to Differential Geometry*, Chapter 3. The theorem says that our three constructions all give the same object, and it is the unique such object. Each axiom is genuinely needed: dropping functoriality allows pathological associations, dropping the $\mathbb{R}^n$ normalisation allows rescaling the fibres, and dropping compatibility with restriction allows charts to give inconsistent local identifications.
## Vector Fields, Flows, and Completeness
Having identified the tangent bundle as the fundamental object, the next question is: what does it mean to differentiate a vector field? A vector field assigns a tangent vector to each point; differentiating it requires comparing vectors in different fibres, which are a priori unrelated. The most natural approach — flowing along one field and watching how another changes — leads to integral curves, flows, and ultimately the Lie bracket. This section develops that theory.
[definition: Section and Vector Field]
A **(smooth) section** of a smooth vector bundle $\pi : E \to M$ is a smooth map $s : M \to E$ with $\pi \circ s = \mathrm{id}_M$, i.e., $s(p) \in E_p$ for all $p$. The vector space of sections of $E$ is denoted $\Gamma(E)$.
A **vector field** is a section of the tangent bundle, $X \in \Gamma(TM)$. In local coordinates $\{x_1, \ldots, x_n\}$, it has the form $X = \sum_i X_i \partial_{x_i}$ where $X_i \in C^\infty(U)$.
[/definition]
A vector field $X$ acts on smooth functions by $X \cdot f := \sum_i X_i \partial f/\partial x_i$, giving a derivation at each point. Geometrically, a vector field assigns a direction at each point of $M$, and one can ask to "follow" that direction: this is the idea of a flow.
[quotetheorem:1516]
[citeproof:1516]
The slogan is: tangent vectors are derivations infinitesimally, while vector fields are derivations globally.
The Existence of Flows theorem deserves care. The smoothness hypothesis is essential: if the coefficients $a_i$ are only Lipschitz (not smooth), the flow exists but need not be smooth in $p$; for merely continuous $a_i$ uniqueness can fail entirely (Peano's theorem gives existence but not uniqueness in general). Looking forward: the local existence theorem gives flows only for small time, and the question of whether they extend to all of $\mathbb{R}$ — completeness — is addressed below.
[definition: Integral Curve]
An **integral curve** of a vector field $X$ is a smooth curve $\gamma$ such that $\dot{\gamma}(t) = X_{\gamma(t)}$ for all $t$ (the tangent to $\gamma$ at each point equals the vector field there).
[/definition]
[example: Translation Flow on the Open Disc]
Let $M = B(1) \subset \mathbb{R}^n$ and $X = \partial_{x_1}$ (the constant unit vector field in the first coordinate direction). The flow is translation: $\varphi_t(a) = a + t e_1$. This is only locally defined (for $t$ small enough that the flow stays within $M = B(1)$).
[/example]
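For the translation flow on the open unit ball, the set of times for which $\varphi_t(a)$ is defined can be written down explicitly: $a + t e_1$ stays in $B(1)$ exactly when $(a_1 + t)^2 < 1 - (a_2^2 + \cdots + a_n^2)$. The sketch below computes that open interval; the function name is illustrative.

```python
import math


def flow_time_interval(a):
    """Open interval of times t for which a + t*e_1 stays in the open unit ball."""
    rest_squared = sum(c * c for c in a[1:])
    if a[0] * a[0] + rest_squared >= 1.0:
        raise ValueError("starting point is not in the open unit ball")
    half_width = math.sqrt(1.0 - rest_squared)
    return (-half_width - a[0], half_width - a[0])


print(flow_time_interval([0.0, 0.0]))   # centre of the disc
print(flow_time_interval([0.5, 0.0]))   # a point closer to the exit side
print(flow_time_interval([0.0, 0.6]))   # a point off the axis
```

The interval depends on the starting point and shrinks to nothing as $a$ approaches the boundary sphere, so no single $t \neq 0$ works for every point of $M$.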
The example exhibits a plain obstruction to global flows: the trajectory eventually escapes the domain. On a compact manifold without boundary there is nowhere to escape, but on non-compact manifolds the obstruction is real. A sufficient condition for a global flow is that the vector field has compact support: outside a compact set it vanishes identically, so the flow is the identity there and the local pieces patch together for all time.
[quotetheorem:1517]
[citeproof:1517]
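Compact support is sufficient but not necessary for completeness: the constant field $\partial_{x_1}$ on $\mathbb{R}^n$ generates the translation flow $\varphi_t(a) = a + t e_1$ for all $t \in \mathbb{R}$, yet its support is all of $\mathbb{R}^n$. On a compact manifold without boundary every vector field has compact support, so every vector field is complete. The following definition captures the general notion.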
[definition: Complete Vector Field]
A vector field $X$ is **complete** if it defines a flow for all time, giving a 1-parameter subgroup $\mathbb{R} \to \mathrm{Diff}(M)$. Compactly supported vector fields are automatically complete.
[/definition]
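A standard example of a smooth vector field that is not complete, included here as an illustration, is $X = x^2\,\partial_x$ on $M = \mathbb{R}$. Its integral curve through $x_0$ solves $\dot{x} = x^2$ with $x(0) = x_0$, and separation of variables gives
\begin{align*}
x(t) = \frac{x_0}{1 - x_0 t}.
\end{align*}
For $x_0 > 0$ this curve escapes to infinity as $t \to 1/x_0$, so the flow is not defined for all time, even though $\mathbb{R}$ has no boundary to run into. The blow-up time depends on the starting point and tends to $0$ as $x_0 \to \infty$.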
Complete vector fields generate flows in $\mathrm{Diff}(M)$ that run for all time. A striking consequence is that on a connected manifold the group $\mathrm{Diff}(M)$ is as large as possible — any point can be moved to any other by a diffeomorphism. This is a vast generalisation of the transitivity of the Euclidean isometry group on $\mathbb{R}^n$.
[quotetheorem:1518]
[citeproof:1518]
Connectedness is essential — on a disconnected manifold, each component is its own orbit — and the construction via bump functions implicitly uses that compactly supported vector fields are complete, so the smooth structure (not just the topology) enters the argument. The theorem does not, however, say that $\mathrm{Diff}(M)$ acts transitively on ordered tuples of distinct points: moving $(p_1, p_2)$ to $(q_1, q_2)$ simultaneously requires a further argument (typically a second application of the same bump-function trick in a chart avoiding the already-placed point). More strikingly, the transitivity says that $M$ is, from the point of view of $\mathrm{Diff}(M)$, homogeneous: every point looks like every other. This is a precursor of the theory of homogeneous spaces $G/H$, where a Lie group acts transitively with stabiliser $H$; the present theorem shows that even without any group structure on $M$, its full diffeomorphism group — an enormous, infinite-dimensional object, never itself a finite-dimensional Lie group except in trivial cases — is always transitive.
## Partitions of Unity and the Whitney Embedding Theorem
A central question in differential geometry is whether local data can be assembled into global objects. Metrics, connections, and tensor fields are all defined locally in coordinate charts; the problem is gluing these local pieces together consistently. The technical tool that makes this possible is a partition of unity: a family of non-negative smooth functions adapted to an open cover, summing to one everywhere, that allows local constructions to be averaged into globally-defined objects. As a first major application we prove Whitney's embedding theorem, which shows every abstract smooth manifold can be concretely realised as a submanifold of some Euclidean space.
[definition: Partition of Unity]
Let $M$ be a manifold and $\{V_\alpha\}_{\alpha \in A}$ an open cover. A **partition of unity subordinate to the cover** is a collection $\{\phi_\alpha\}_{\alpha \in A}$ of smooth functions $\phi_\alpha : M \to \mathbb{R}$ such that:
- $\phi_\alpha \geq 0$ for all $\alpha$;
- $\mathrm{supp}(\phi_\alpha) \subset V_\alpha$ for every $\alpha \in A$;
- supports are locally finite: for every $x \in M$, there is an open $U \ni x$ meeting only finitely many $\mathrm{supp}(\phi_\alpha)$;
- $\sum_\alpha \phi_\alpha \equiv 1$ on $M$.
[/definition]
The local finiteness condition ensures the sum in the last condition is finite at each point and hence defines a smooth function.
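A minimal sketch of a partition of unity on $M = \mathbb{R}$, subordinate to the cover $V_k = (k-1, k+1)$, $k \in \mathbb{Z}$. It assumes the standard bump profile $\exp\!\big(-1/(r^2 - x^2)\big)$ with radius $r = 0.75$, chosen so that the integer translates overlap while each support $[k - r, k + r]$ stays inside $V_k$. The names and the radius are illustrative choices.

```python
import math

RADIUS = 0.75  # supports [k - 0.75, k + 0.75] lie inside V_k = (k - 1, k + 1)


def bump(x):
    """Smooth function, positive on (-RADIUS, RADIUS) and zero elsewhere."""
    if abs(x) >= RADIUS:
        return 0.0
    return math.exp(-1.0 / (RADIUS * RADIUS - x * x))


def partition_function(k, x):
    """phi_k(x): the translate bump(x - k) divided by the sum of all integer translates."""
    # Only translates centred within RADIUS of x contribute, so four terms suffice.
    base = math.floor(x)
    total = sum(bump(x - j) for j in range(base - 1, base + 3))
    return bump(x - k) / total


x = 2.3
values = [partition_function(k, x) for k in range(0, 6)]
print(values, sum(values))
```

At each point only finitely many $\phi_k$ are non-zero (here at most two), which is the local finiteness condition in action; the denominator is positive everywhere because every real number lies within $0.5$ of some integer.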
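Every open cover of a smooth manifold admits a subordinate partition of unity. The existence proof rests on paracompactness, which follows from the Hausdorff and second countability assumptions together with local compactness; bump functions then supply the smooth functions. The point-set part of the argument is left as an exercise.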
Partitions of unity are indispensable: they allow local constructions (defined in coordinate charts) to be assembled into global objects on $M$.
[quotetheorem:1519]
[citeproof:1519]
This theorem justifies the provisional definition of tangent space via embeddings: every smooth manifold can be embedded in some $\mathbb{R}^N$.
The dimension bound $2n$ is sharp: $\mathbb{R}P^2$ does not embed smoothly in $\mathbb{R}^3$ (a closed surface embedded in $\mathbb{R}^3$ is necessarily orientable, and $\mathbb{R}P^2$ is not), so the bound cannot be improved to $2n - 1$ in general. The distinction between immersion and embedding is important: Whitney's immersion theorem separately gives immersions into $\mathbb{R}^{2n-1}$ for $n \geq 2$, one dimension less, while embeddings (requiring global injectivity, not just infinitesimal injectivity) need the extra room to avoid self-intersections. The surprising consequence of Whitney is that abstract manifolds are secretly subsets of Euclidean space all along.
## Cotangent Bundles, Lie Algebras, and Lie Groups
### The Cotangent Bundle and 1-Forms
Vector fields are sections of the tangent bundle and encode directions of motion on the manifold. But many quantities in geometry and analysis live in the dual: they accept a tangent direction and return a number. The rate of change of a smooth function $f$ in a direction $v$ is the pairing of $df$ with $v$, so $df$ is naturally a covector rather than a vector. The cotangent bundle formalises this dual perspective and provides the natural domain for integration on manifolds. A key advantage of 1-forms over vector fields is their canonical pullback: while vector fields push forward only along diffeomorphisms, 1-forms pull back along any smooth map.
[definition: Cotangent Bundle]
The **cotangent bundle** $T^*M$ of a smooth manifold $M$ is the dual bundle of $TM$: its fibre at $p$ is $T_p^*M = (T_pM)^*$. Sections of $T^*M$ are called **differential 1-forms**, and their space is denoted $\Omega^1(M)$.
[/definition]
In a coordinate chart $(U, \varphi)$ with coordinates $\{x_1, \ldots, x_n\}$, the **dual basis** $\{dx_i|_q\}$ of the basis $\{\partial_{x_i}|_q\}$ of $T_qM$ is characterised by $dx_i|_q(\partial_{x_j}|_q) = \delta_{ij}$. These span $T_q^*M$ at each $q \in U$.
Any smooth function $f : M \to \mathbb{R}$ defines a canonical global section $df \in \Omega^1(M)$ via $df(X) := X \cdot f$ for $X \in \Gamma(TM)$. In local coordinates:
\begin{align*}
df_q = \sum_i \frac{\partial f}{\partial x_i}\bigg|_q dx_i|_q \in T_q^*M.
\end{align*}
More generally, for a smooth map $f : M \to N$, the differential $df_p : T_pM \to T_{f(p)}N$ (also written $f_* = df$) acts on a derivation $v \in T_pM$ and a function $g \in C^\infty(N)$ by $(df_p(v))(g) := v(g \circ f)$. The dual maps $(df_p)^* : T_{f(p)}^*N \to T_p^*M$ assemble into the **pullback** $f^* : \Omega^1(N) \to \Omega^1(M)$, which acts on 1-forms by $(f^*\alpha)_p = \alpha_{f(p)} \circ df_p$.
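A small worked pullback may help fix the definition. The map and the 1-form below are illustrative choices, not taken from the text: let $f : \mathbb{R} \to \mathbb{R}^2$, $f(t) = (\cos t, \sin t)$, and $\alpha = x\,dy - y\,dx \in \Omega^1(\mathbb{R}^2)$. Substituting $x = \cos t$, $y = \sin t$ gives
\begin{align*}
f^*\alpha = \cos t \, d(\sin t) - \sin t \, d(\cos t) = \cos^2 t \, dt + \sin^2 t \, dt = dt.
\end{align*}
Note that $f$ is not a diffeomorphism (it is not even injective), yet the pullback is perfectly well defined.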
### The Lie Bracket and Lie Algebras
Vector fields form a vector space under pointwise addition, but they also interact with one another through differentiation: given $X$ and $Y$, one can ask how $Y$ changes as one flows along $X$. The naive computation produces second-order derivative terms in any coordinate expression — but the remarkable fact is that all second-order terms cancel, leaving a first-order operator $[X,Y]$ called the Lie bracket. This bracket turns the infinite-dimensional space $\Gamma(TM)$ into a Lie algebra. The abstract axioms of a Lie algebra capture precisely the algebraic structure that survives after all coordinate choices are eliminated.
[definition: Lie Algebra]
A **Lie algebra** is a vector space $\mathfrak{g}$ equipped with a bilinear map $[\cdot, \cdot] : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$ (the **Lie bracket**) satisfying:
- **skew-commutativity**: $[x,y] = -[y,x]$ for all $x,y \in \mathfrak{g}$;
- **Jacobi identity**: $[x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0$ for all $x,y,z \in \mathfrak{g}$.
[/definition]
The Jacobi identity is the infinitesimal shadow of the associativity of a group. For $\mathfrak{gl}_n = \mathrm{Mat}_n(\mathbb{R})$ with bracket $[A,B] = AB - BA$, the Jacobi identity follows immediately from associativity of matrix multiplication. The following definition specialises the bracket to vector fields on a manifold; the theorem after it verifies this is genuinely a Lie algebra.
The vector space of vector fields $\Gamma(TM)$ carries a natural Lie algebra structure via the commutator:
[definition: Lie Bracket of Vector Fields]
If $X, Y \in \Gamma(TM)$, their **Lie bracket** (or commutator) $[X,Y] \in \Gamma(TM)$ is defined by
\begin{align*}
[X,Y] \cdot f := X \cdot (Y \cdot f) - Y \cdot (X \cdot f) \qquad \text{for all } f \in C^\infty(M).
\end{align*}
[/definition]
[quotetheorem:1520]
[citeproof:1520]
The cancellation of second-order terms is the key point: the Lie bracket measures the failure of $X$ and $Y$ to commute as differential operators, and this failure is itself a first-order operator. This is precisely why $[X,Y]$ is a vector field rather than a second-order differential operator.
The Lie bracket satisfies the expected properties: skew-commutativity, the Jacobi identity, the module rule $[fX, gY] = fg[X,Y] + f(Xg)Y - g(Yf)X$, and naturality: if $\varphi : M \to N$ is a diffeomorphism, then $\varphi_*[X_1, X_2] = [\varphi_* X_1, \varphi_* X_2]$, so diffeomorphisms induce Lie algebra isomorphisms on vector fields (for a general smooth map the corresponding statement holds for $\varphi$-related fields). This makes $(\Gamma(TM), [\cdot,\cdot])$ an infinite-dimensional Lie algebra.
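The cancellation of second-order terms can be made explicit in a chart. This is a sketch that may overlap with the cited proof; the component names $a_i$, $b_j$ are introduced here. Writing $X = \sum_i a_i \partial_{x_i}$ and $Y = \sum_j b_j \partial_{x_j}$,
\begin{align*}
X \cdot (Y \cdot f) &= \sum_{i,j} a_i \frac{\partial b_j}{\partial x_i} \frac{\partial f}{\partial x_j} + \sum_{i,j} a_i b_j \frac{\partial^2 f}{\partial x_i \partial x_j}, \\
[X,Y] &= \sum_{j} \left( \sum_i a_i \frac{\partial b_j}{\partial x_i} - b_i \frac{\partial a_j}{\partial x_i} \right) \partial_{x_j}.
\end{align*}
The second-order sum is symmetric in $X$ and $Y$, so it drops out of the commutator. For instance, on $\mathbb{R}^2$ with $X = \partial_{x_1}$ and $Y = x_1 \partial_{x_2}$, the formula gives $[X,Y] = \partial_{x_2}$.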
### Lie Groups
Many of the most important symmetry groups in mathematics — the rotation group $SO(3)$, the unitary groups $U(n)$, the symplectic groups $Sp(2n)$ — are not just groups but carry a smooth structure compatible with their group operations. The objects that encode this structure are called Lie groups. The central insight is that a Lie group can be studied infinitesimally via its Lie algebra: a finite-dimensional object that captures the local structure of the group near the identity, even when the group itself is non-compact.
[definition: Lie Group]
A **Lie group** $G$ is a smooth manifold that is also a group, with the property that multiplication $m : G \times G \to G$ and inversion $g \mapsto g^{-1}$ are smooth maps.
[/definition]
The combination of smooth structure and group structure imposes strong rigidity on the topology of $G$: for instance, every Lie group has a parallelisable tangent bundle, which is a strong topological constraint. The fundamental examples come from linear algebra.
[example: $GL_n(\mathbb{R})$ and $O(n)$]
$GL_n(\mathbb{R}) \subset \mathrm{Mat}_n(\mathbb{R}) \cong \mathbb{R}^{n^2}$ is an open subset of Euclidean space, hence a smooth manifold, and a group under matrix multiplication. The orthogonal group $O(n) = f^{-1}(I)$, for $f(A) = A^\top A$ regarded as a map from $\mathrm{Mat}_n(\mathbb{R})$ to the symmetric matrices, is a closed Lie subgroup of $GL_n(\mathbb{R})$ (by the Pre-Image Theorem applied earlier, since $I$ is a regular value of $f$).
[/example]
Lie groups have a remarkable global property: they are parallelisable, meaning that the tangent bundle is globally trivial as a vector bundle.
For $g \in G$, the left multiplication $L_g : G \to G$, $h \mapsto gh$, is a diffeomorphism. Its derivative at $e$ gives an isomorphism $dL_g|_e : T_eG \xrightarrow{\sim} T_gG$. Doing this for all $g$ defines a global trivialisation $TG \cong G \times T_eG$. Parallelisability is a strong topological constraint: for instance, among the spheres $S^n$ with $n \geq 1$, only $S^1$, $S^3$, and $S^7$ are parallelisable, these being the unit spheres in the normed division algebras $\mathbb{C}$, $\mathbb{H}$, and $\mathbb{O}$. The global trivialisation singles out a distinguished class of vector fields: those invariant under all left multiplications.
[definition: Left-Invariant Vector Field]
A vector field $X \in \Gamma(TG)$ is **left-invariant** if $dL_g(X_h) = X_{gh}$ for all $g, h \in G$. The space of left-invariant vector fields $\mathrm{Vect}_L(G)$ is in bijection with $T_eG$: each $v \in T_eG$ determines a unique left-invariant vector field $X_v$ with $X_v(e) = v$.
[/definition]
Since left-invariant vector fields are closed under the Lie bracket (because $dL_g$ preserves brackets), $\mathrm{Vect}_L(G)$ is a finite-dimensional Lie algebra of dimension $\dim(T_eG)$. The key point is that left-invariance provides a canonical identification of all tangent spaces with the single space $T_eG$, collapsing the infinite-dimensional algebra $\Gamma(TG)$ down to a finite-dimensional object.
[definition: Lie Algebra of a Lie Group]
The **Lie algebra** of a Lie group $G$ is $\mathfrak{g} := T_eG \cong \mathrm{Vect}_L(G)$.
[/definition]
The Lie algebra is finite-dimensional but encodes much of the local structure of $G$. The remarkable fact proved below is that each element of $\mathfrak{g}$ generates a flow that runs for all time, giving a 1-parameter subgroup $\mathbb{R} \to G$. These 1-parameter subgroups are precisely what the exponential map parametrises.
[quotetheorem:1521]
[citeproof:1521]
The completeness of left-invariant fields is a special feature of Lie groups, not shared by arbitrary vector fields on non-compact manifolds. Left-invariance is the key: the group structure lets us translate any finite-time trajectory to start anywhere in $G$, so no finite time can be a maximal time of existence. A general smooth vector field on $G$ need not be complete. Each complete left-invariant flow defines a 1-parameter subgroup $t \mapsto \varphi_t(e)$ in $G$, and the exponential map packages all these subgroups simultaneously.
[definition: Exponential Map]
The **exponential map** $\exp : \mathfrak{g} \to G$ sends $\xi \in \mathfrak{g}$ to $\gamma_\xi(1)$, where $\gamma_\xi$ is the integral curve of $X_\xi$ starting at $e$. It satisfies $d(\exp)|_0 = \mathrm{id}$ and is a local diffeomorphism near $0 \in \mathfrak{g}$.
[/definition]
The exponential map provides local coordinates on $G$ near the identity (the exponential chart) and is the canonical bridge between the Lie algebra and the group. For matrix groups the formula is entirely explicit, confirming that the abstract construction agrees with the familiar matrix power series.
[example: Matrix Exponential]
For $G = GL_n(\mathbb{R})$ with Lie algebra $\mathfrak{gl}_n \cong \mathrm{Mat}_n(\mathbb{R})$, the exponential map is the matrix exponential: $\exp(A) = e^A = I + A + A^2/2! + \cdots$. The curve $t \mapsto e^{tA}$ is the unique 1-parameter subgroup with tangent vector $A$ at the identity.
[/example]
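To see the power series at work, consider the following illustrative case (the matrix $J$ is notation introduced here). Take $J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \in \mathfrak{gl}_2$, which satisfies $J^2 = -I$. Splitting the series into even and odd powers gives
\begin{align*}
e^{tJ} = \sum_{m \ge 0} \frac{(-1)^m t^{2m}}{(2m)!}\, I + \sum_{m \ge 0} \frac{(-1)^m t^{2m+1}}{(2m+1)!}\, J = \cos t \, I + \sin t \, J = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}.
\end{align*}
So the 1-parameter subgroup generated by $J$ is the rotation group $SO(2) \subset GL_2(\mathbb{R})$, and the homomorphism property $e^{(s+t)J} = e^{sJ}e^{tJ}$ is the angle-addition formula.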
If $\varphi : G \to H$ is a Lie group homomorphism, then $\varphi \circ \exp_G = \exp_H \circ\, d\varphi_e$, i.e., the square formed by the two exponential maps, $\varphi$, and $d\varphi_e : \mathfrak{g} \to \mathfrak{h}$ commutes. This naturality confirms that the exponential map is the canonical passage from the infinitesimal to the global, not an ad hoc construction.
## Tensors and the Lie Derivative
### Tensor Bundles
With the tangent bundle $TM$ and its dual $T^*M$ in hand, we can build all multilinear objects over each point by taking tensor products of these two spaces. This generates metrics (symmetric $(2,0)$-tensors in the convention below) and differential forms (alternating $(k,0)$-tensors), both of which are sections of a tensor bundle. The algebraic operations of contraction, symmetrisation, and alternation then become global operations on sections, giving the language needed to write the fundamental equations of Riemannian and symplectic geometry.
[definition: Tensor Field]
A **(k,l)-tensor on $M$** is a section of the bundle $(T^*M)^{\otimes k} \otimes (TM)^{\otimes l}$, where $V^{\otimes k} = \bigotimes_{i=1}^k V$. A $(1,0)$-tensor is a 1-form; a $(0,1)$-tensor is a vector field.
[/definition]
### The Lie Derivative
Given a tensor field $\tau$ and a vector field $X$, one wants to differentiate $\tau$ in the direction of $X$. The naive approach — taking partial derivatives in a coordinate direction — is not intrinsic and depends on the chart. The correct notion is the Lie derivative: pull $\tau$ back along the flow $\varphi_t$ of $X$, then differentiate with respect to $t$ at $t=0$. The result is again a tensor field of the same type as $\tau$, defined without any choice of connection or coordinates.
[definition: Lie Derivative]
The **Lie derivative** of a tensor $\tau$ along a vector field $X$ is
\begin{align*}
\mathcal{L}_X \tau := \frac{d}{dt}\bigg|_{t=0} (\varphi_t)^* \tau,
\end{align*}
where $\varphi_t$ is the flow of $X$ and $(\varphi_t)^*$ acts on a $(k,l)$-tensor by pulling back each covariant factor along $\varphi_t$ and pushing forward each contravariant factor along $\varphi_{-t} = \varphi_t^{-1}$.
[/definition]
The Lie derivative measures how $\tau$ changes as one flows along $X$. Some key properties:
- $\mathcal{L}_X f = X \cdot f$ for functions $f$ (recovering the earlier formula);
- $\mathcal{L}_X Y = [X,Y]$ for vector fields $Y$ — this is exactly the content of the commutator characterisation in terms of flows;
- $\mathcal{L}_X [Y,Z] = [\mathcal{L}_X Y, Z] + [Y, \mathcal{L}_X Z]$ (a Jacobi-type identity for the Lie derivative).
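The identity $\mathcal{L}_X Y = [X,Y]$ can be checked by hand in a simple case. The fields below are illustrative choices on $\mathbb{R}^2$: take $X = \partial_{x_1}$, whose flow is the translation $\varphi_t(x_1, x_2) = (x_1 + t, x_2)$, and $Y = x_1 \partial_{x_2}$. Since $d\varphi_{-t}$ is the identity in these coordinates,
\begin{align*}
\big((\varphi_t)^* Y\big)_{(x_1,x_2)} &= d\varphi_{-t}\big(Y_{\varphi_t(x_1,x_2)}\big) = (x_1 + t)\, \partial_{x_2}, \\
\mathcal{L}_X Y &= \frac{d}{dt}\bigg|_{t=0} (x_1 + t)\, \partial_{x_2} = \partial_{x_2}, \\
[X,Y] \cdot f &= \partial_{x_1}\big(x_1 \partial_{x_2} f\big) - x_1 \partial_{x_2} \partial_{x_1} f = \partial_{x_2} f.
\end{align*}
Both computations give $\partial_{x_2}$, as the second property predicts.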
## Integrability and the Frobenius Theorem
### Distributions and Involutivity
A **distribution** is a subbundle $E \subset TM$ of rank $k$: a smooth assignment of a $k$-dimensional subspace $E_p \subset T_pM$ at each point. For a single vector field (rank 1), we can always integrate it into integral curves; the Frobenius theorem asks when a higher-rank distribution can be integrated into submanifolds.
[definition: Involutive Distribution and Integrability]
A distribution $E \subset TM$ is **involutive** if $[X,Y] \in \Gamma(E)$ for all $X, Y \in \Gamma(E)$ (it is closed under the Lie bracket).
A submanifold $N \subset M$ is an **integral submanifold** of $E$ if $T_pN = E_p$ for all $p \in N$. The distribution $E$ is **integrable** if every point of $M$ has an integral submanifold through it.
[/definition]
[example: A Non-Integrable Distribution]
Consider the distribution $E \subset T\mathbb{R}^3$ spanned at $(x,y,z)$ by $\partial_y$ and $\partial_x + y\,\partial_z$. This is a rank-2 subbundle of $T\mathbb{R}^3$. To check involutivity in coordinates, compute the Lie bracket of the two spanning sections:
\begin{align*}
[\partial_y,\, \partial_x + y\,\partial_z] = \partial_z,
\end{align*}
which is not in $E$ (since $\partial_z \notin \mathrm{span}\{\partial_y, \partial_x + y\partial_z\}$). So $E$ is not involutive, hence not integrable by the Frobenius theorem below. Concretely: if there were a surface $\Sigma \subset \mathbb{R}^3$ with $0 \in \Sigma$ such that $T_p\Sigma = E_p$ for all $p \in \Sigma$, then $\Gamma(T\Sigma)$ would be closed under the Lie bracket — but $\partial_y$ and $\partial_x + y\partial_z$ are sections of $E = T\Sigma$ while their bracket $\partial_z$ is not, a contradiction. This is the contact distribution, and its non-integrability is a fundamental fact in contact geometry.
[/example]
The example shows the Lie bracket functioning as an obstruction detector: a single bracket computation certifies that the distribution cannot be flattened into coordinate planes. The Frobenius theorem says this is the only obstruction — involutivity is both necessary and sufficient for integrability.
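For completeness, the bracket in the example can be verified by applying both sides to a test function $f \in C^\infty(\mathbb{R}^3)$:
\begin{align*}
[\partial_y,\, \partial_x + y\,\partial_z] \cdot f &= \partial_y\big(\partial_x f + y\,\partial_z f\big) - (\partial_x + y\,\partial_z)\big(\partial_y f\big) \\
&= \partial_y \partial_x f + \partial_z f + y\, \partial_y \partial_z f - \partial_x \partial_y f - y\, \partial_z \partial_y f \\
&= \partial_z f.
\end{align*}
The second-order terms cancel by symmetry of mixed partial derivatives, and the only survivor comes from differentiating the coefficient $y$.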
### The Frobenius Integrability Theorem
For a rank-1 distribution — a line field spanned by a nowhere-vanishing vector field $X$ — the ODE existence theorem already provides integral curves through every point, so the line field is automatically integrable. The question becomes interesting for higher rank: given a rank-$k$ distribution $E \subset TM$, when does there exist, through each point, a $k$-dimensional submanifold whose tangent spaces agree with $E$? The contact distribution on $\mathbb{R}^3$ spanned by $\partial_y$ and $\partial_x + y\,\partial_z$ shows that integrability can fail even in a smooth, nowhere-degenerate example: no surface can have that distribution as its tangent spaces, and the obstruction is detected by a single Lie bracket, $[\partial_y, \partial_x + y\,\partial_z] = \partial_z \notin E$.
This suggests that closure under the Lie bracket — involutivity — is at least a necessary condition for integrability, since the bracket of two sections of $TN$ for a submanifold $N$ is again a section of $TN$. The Frobenius theorem is the remarkable statement that involutivity is also sufficient, and in the strongest possible form: an involutive distribution is not merely integrable, it can be flattened into the coordinate planes of a chart. Every obstruction to integrability lives in a single bracket.
[quotetheorem:1522]
[citeproof:1522]
The converse also holds: any integrable distribution is involutive, since the tangent bundle of any submanifold is closed under brackets. So involutive $\iff$ integrable. The contact distribution recalled above is the canonical demonstration of what fails when involutivity is dropped: $[\partial_y, \partial_x + y\partial_z] = \partial_z$ is transverse to $E$, so alternately following the flows of the two spanning fields moves the base point off every candidate integral surface. This distinction between distributions that integrate to foliations and those that do not is exactly the holonomic / non-holonomic split in mechanics: holonomic constraints cut out configuration submanifolds and can be solved by restricting coordinates, while non-holonomic constraints (such as the rolling-without-slipping condition on a ball) cannot. They are genuine constraints on velocities but allow the system to reach every configuration through a sequence of admissible motions, precisely because the bracket generates directions outside the distribution.
The proof of the Frobenius theorem reduces to the case of commuting flows. The following lemma makes this reduction precise, characterising the vanishing of $[X,Y]$ in terms of the flows of $X$ and $Y$.
[quotetheorem:1523]
[citeproof:1523]
The smoothness hypothesis enters both in the Taylor expansions used to derive the flow formula for $[X,Y]$ and in the Picard–Lindelöf step producing the flows in the first place: with merely continuous $X, Y$ the flows may fail to exist uniquely and the limit defining the bracket need not converge. The theorem also relies on the manifold being Hausdorff — on the line with doubled origin the flow of a vector field crossing the double point is not even well-defined as a single map, so neither side of the equivalence makes literal sense. In mechanics the commutativity of flows is the geometric content of Poisson-commuting Hamiltonians: if $\{H_1, H_2\} = 0$, their Hamiltonian vector fields commute, their flows commute, and $H_2$ becomes a conserved quantity along the $H_1$-flow — exactly the setup for Liouville integrability, where $n$ commuting Hamiltonians on a $2n$-dimensional phase space foliate it into invariant tori. The same machinery, applied to time-dependent vector fields, yields the theory of non-autonomous flows via the associated autonomous field on $M \times \mathbb{R}$, where commuting with $\partial_t$ recovers time-translation invariance.
### Applications of Frobenius
With involutivity established as the single obstruction to integrability, the Frobenius theorem becomes a powerful constructive device: any time a problem reduces to producing a submanifold with prescribed tangent directions, one checks a bracket condition and the submanifold is delivered. The three applications below — passing from Lie subalgebras to Lie subgroups, constructing quotient manifolds by group actions, and recognising Grassmannians as smooth manifolds — are archetypal. Each builds a new smooth object (a subgroup, a quotient, a moduli space) by assembling local integral data that Frobenius guarantees fit together coherently.
[remark: Lie Subalgebras and Connected Subgroups]
If $G$ is a Lie group and $\mathfrak{h} \subset \mathfrak{g} = T_eG$ is a Lie subalgebra, then the left-invariant vector fields generated by $\mathfrak{h}$ define an involutive distribution on $G$. By the Frobenius theorem, this distribution integrates to a foliation of $G$. The leaf through $e$ is a connected Lie subgroup $H \subset G$ with $T_eH = \mathfrak{h}$.
[/remark]
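As a concrete instance of the remark (an illustrative choice, not taken from the text), take $G = GL_n(\mathbb{R})$ and let $\mathfrak{h} \subset \mathfrak{gl}_n$ be the skew-symmetric matrices. For $A^\top = -A$ and $B^\top = -B$,
\begin{align*}
[A,B]^\top = (AB - BA)^\top = B^\top A^\top - A^\top B^\top = BA - AB = -[A,B],
\end{align*}
so $\mathfrak{h}$ is closed under the bracket and is a Lie subalgebra. The connected subgroup it integrates to is $SO(n)$, the identity component of $O(n)$.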
This subalgebra-to-subgroup correspondence underlies Lie's fundamental theorems. With Frobenius providing the integrability, we turn to the question of when a group action on a manifold produces a quotient manifold.
[quotetheorem:1524]
[citeproof:1524]
The quotient manifold theorem is the primary device for constructing smooth manifolds from group actions. The Grassmannian is the canonical example.
[example: Grassmannians]
The Grassmannian $G_k(\mathbb{R}^n) = \{k\text{-dimensional subspaces of }\mathbb{R}^n\}$ satisfies
\begin{align*}
G_k(\mathbb{R}^n) \cong \frac{O(n)}{O(k) \times O(n-k)},
\end{align*}
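obtained from the transitive action of $O(n)$ on $k$-dimensional subspaces: the stabiliser of $\mathbb{R}^k \oplus 0$ is $O(k) \times O(n-k)$, which acts freely on $O(n)$ by right multiplication (changing the orthonormal bases chosen for the subspace and for its complement). By the quotient manifold theorem, $G_k(\mathbb{R}^n)$ is a smooth manifold. For $k=1$, this recovers $G_1(\mathbb{R}^n) \cong \mathbb{R}P^{n-1}$.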
[/example]
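The quotient description also yields the dimension of the Grassmannian. The count below is a sketch: it assumes, as is standard for free quotients of this kind, that the dimension of the quotient is the difference of the dimensions, and it uses $\dim O(m) = \tfrac{1}{2}m(m-1)$.
\begin{align*}
\dim G_k(\mathbb{R}^n) = \frac{n(n-1)}{2} - \frac{k(k-1)}{2} - \frac{(n-k)(n-k-1)}{2} = k(n-k).
\end{align*}
For $k = 1$ this gives $n - 1$, in agreement with $\dim \mathbb{R}P^{n-1}$.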
Grassmannians appear throughout geometry and topology as classifying spaces for vector bundles. The quotient construction makes their smooth structure manifest. The next example shows what goes wrong when the compactness hypothesis of the quotient theorem fails, even for a free action.
[example: Non-Hausdorff Quotient]
Let $G = \mathbb{R}$ act on $T^2 = S^1 \times S^1$ via $t \cdot (\theta_1, \theta_2) = (\theta_1 + c_1 t, \theta_2 + c_2 t)$ in angle coordinates, where $c_1/c_2 \notin \mathbb{Q}$. The action is smooth and free, but $G$ is not compact and every orbit is dense in $T^2$. Consequently the only open subsets of the quotient $T^2/G$ are the empty set and the whole space, so $T^2/G$ is not Hausdorff and cannot be a manifold. This illustrates why the compactness hypothesis in the quotient theorem is genuinely necessary: freeness alone is not enough.
[/example]
Now that we have the precise language of smooth manifolds and their quotients, we turn to calculus. Differential forms are the natural objects to integrate over these curved spaces, and the exterior derivative is the fundamental operation on them, connecting much of analysis on manifolds.
# 2. Differential Forms and Curvature
Chapter 1 built the foundational language of smooth manifolds: charts, tangent bundles, cotangent bundles, and the Frobenius integrability theorem. This chapter develops the calculus that lives on top of that structure. The central objects are differential forms — the correct notion of "things you can integrate on a manifold" — and the algebraic machinery of the exterior derivative, which organises them into a cochain complex. The cohomology of that complex, the de Rham cohomology, turns out to encode deep topological information about the manifold. We then examine what it means to integrate and orient a manifold, culminating in the general form of Stokes' theorem. The chapter closes with Moser's theorem, a beautiful application of the Lie derivative showing that any two volume forms of equal total volume on a closed manifold are equivalent up to diffeomorphism.
## Tensors and the Exterior Algebra
What does it mean to "measure area" on a curved surface, or to detect whether a map reverses orientation? Ordinary multilinear maps $T: V^q \to \mathbb{R}$ are not sufficient: they do not automatically capture the signed, coordinate-independent character of area and volume elements. The determinant hints at the right structure — it is multilinear and changes sign when two arguments are swapped. Formalising this observation leads to alternating tensors and the exterior algebra, the correct linear-algebraic setting for differential forms. The key construction is the tensor product.
[definition: Tensor Product]
Given vector spaces $U$ and $V$, there is a vector space $U \otimes V$, called the **tensor product** of $U$ and $V$, together with a bilinear map $\pi: U \times V \to U \otimes V$, uniquely characterised (up to isomorphism) by the following universal property: for every vector space $W$ and every bilinear map $\alpha: U \times V \to W$, there exists a unique linear map $\hat{\alpha}: U \otimes V \to W$ such that $\alpha = \hat{\alpha} \circ \pi$.
In other words, the tensor product is the universal recipient of bilinear maps: every bilinear map on $U \times V$ factors uniquely through $U \otimes V$.
[/definition]
There are several equivalent ways to construct $U \otimes V$. The most concrete, valid in finite dimensions, takes bases $\{u_i\}$ and $\{v_j\}$ for $U$ and $V$ respectively and declares $U \otimes V$ to be the vector space with basis $\{u_i \otimes v_j\}_{i,j}$, so $\dim(U \otimes V) = \dim(U)\dim(V)$. The element $u \otimes v$ is then just $\pi(u,v)$, and bilinearity of $\pi$ forces
\begin{align*}
\left(\sum_i a_i u_i\right) \otimes \left(\sum_j b_j v_j\right) = \sum_{i,j} a_i b_j \, u_i \otimes v_j.
\end{align*}
From now on we work exclusively with finite-dimensional vector spaces, since all tangent spaces are finite-dimensional.
The tensor product satisfies several important structural identities: $U \otimes V \cong V \otimes U$, $(U \otimes V)^* \cong V^* \otimes U^*$, and $\operatorname{Hom}(U,V) \cong U^* \otimes V$.
[definition: Tensor of Type $(p,q)$]
Given a smooth manifold $M$, the vector bundle $(TM)^{\otimes p} \otimes (T^*M)^{\otimes q} \to M$ is well-defined by taking tensor products fibrewise. A section of this bundle is called a **tensor of type $(p,q)$** on $M$.
[/definition]
The preceding definition packages contravariant and covariant tensor fields into a single graded object, but the notation is most useful when we recognise which familiar geometric quantities it subsumes. Note that the contravariant rank $p$ is listed first here, so a 1-form has type $(0,1)$; this is the opposite ordering to the $(k,l)$ convention used in Chapter 1. Before proceeding to alternating tensors, it is worth pausing to see how standard objects (covector fields, bilinear forms, determinants) fall out as special cases of the $(p,q)$ formalism. This keeps the abstract construction tethered to objects already familiar from linear algebra and Riemannian geometry.
[remark: Recovering familiar objects]
A $(0,1)$-tensor is just a covector field, i.e. a section of $T^*M$. A $(0,2)$-tensor is a bilinear form on each tangent space; the Riemannian metric, for instance, is a symmetric $(0,2)$-tensor. The determinant, regarded as a function of the $k$ columns of a $k \times k$ matrix, is an alternating $(0,k)$-tensor on $V = \mathbb{R}^k$.
[/remark]
Among all $(0,q)$-tensors, the alternating ones play a privileged role. The reason is geometric: when we eventually integrate over a manifold, the change-of-variables formula contributes a determinant, and the determinant is itself an alternating function of its column vectors. More fundamentally, alternating tensors are the objects that transform correctly under orientation-reversing maps (picking up a sign) rather than collapsing all orientation information. They also vanish whenever an argument is repeated, $T(\dots, v, \dots, v, \dots) = 0$, which encodes the intuition that a degenerate parallelepiped, one with a repeated edge direction, has zero volume. It is this combination of signed multilinearity and vanishing on repeated arguments that makes alternating tensors the correct building blocks for integration on manifolds. A $(0,q)$-tensor $T: V^q \to \mathbb{R}$ is alternating if swapping any two arguments changes the sign:
[definition: Alternating Tensor]
Let $V$ be a finite-dimensional real vector space. A **$(0,q)$-tensor** $T: V^q \to \mathbb{R}$ is called **alternating** (or antisymmetric) if
\begin{align*}
T(v_1,\dots,v_i,\dots,v_j,\dots,v_q) = -T(v_1,\dots,v_j,\dots,v_i,\dots,v_q)
\end{align*}
for all $1 \le i < j \le q$ and all vectors $v_1,\dots,v_q \in V$.
[/definition]
We can project any tensor onto its alternating part. For $\pi \in \operatorname{Sym}_q$ (the symmetric group) with sign $(-1)^\pi$, define the $\pi$-permutation of $T$ by $T^\pi(v_1,\dots,v_q) := T(v_{\pi(1)},\dots,v_{\pi(q)})$. Then
\begin{align*}
\operatorname{Alt}(T) := \frac{1}{q!} \sum_{\pi \in \operatorname{Sym}_q} (-1)^\pi T^\pi
\end{align*}
is alternating, and $T$ is already alternating if and only if $\operatorname{Alt}(T) = T$.
We can multiply alternating tensors together using the **wedge product**: if $\alpha$ is an alternating $(0,k)$-tensor and $\beta$ is an alternating $(0,l)$-tensor, set
\begin{align*}
\alpha \wedge \beta := \frac{(k+l)!}{k!\,l!} \operatorname{Alt}(\alpha \otimes \beta),
\end{align*}
where $(\alpha \otimes \beta)(v_1,\dots,v_{k+l}) := \alpha(v_1,\dots,v_k)\beta(v_{k+1},\dots,v_{k+l})$.
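The lowest-degree case shows what the normalising factor achieves. For two covectors $\alpha, \beta \in V^*$ (so $k = l = 1$) and vectors $v, w \in V$,
\begin{align*}
\operatorname{Alt}(\alpha \otimes \beta) = \tfrac{1}{2}\big(\alpha \otimes \beta - \beta \otimes \alpha\big), \qquad (\alpha \wedge \beta)(v, w) = \alpha(v)\beta(w) - \alpha(w)\beta(v) = \det \begin{pmatrix} \alpha(v) & \alpha(w) \\ \beta(v) & \beta(w) \end{pmatrix}.
\end{align*}
With this convention, if $\phi_1, \phi_2$ is the basis dual to $e_1, e_2$, then $(\phi_1 \wedge \phi_2)(e_1, e_2) = 1$, with no stray factor of $\tfrac{1}{2}$.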
[definition: Exterior Powers]
Write $\Lambda^k V^*$ for the space of alternating $(0,k)$-tensors on $V$. The **exterior algebra** of $V^*$ is
\begin{align*}
\Lambda^*(V^*) := \bigoplus_{q \ge 0} \Lambda^q(V^*),
\end{align*}
where $\Lambda^0(V^*) = \mathbb{R}$ and $\Lambda^1(V^*) = V^*$, equipped with the wedge product $\wedge$.
[/definition]
Having defined $\Lambda^*(V^*)$ as a graded algebra, we now need to understand how its multiplication behaves. Like the tensor product, the wedge product is not commutative; unlike the tensor product, its failure to commute is completely controlled by a sign rule governed by the degrees of the factors. The next remark records this graded commutativity together with the associativity that is required to make $\Lambda^*(V^*)$ a genuine associative algebra.
[remark: Graded commutativity]
The wedge product is graded commutative: for $\alpha \in \Lambda^k(V^*)$ and $\beta \in \Lambda^l(V^*)$,
\begin{align*}
\alpha \wedge \beta = (-1)^{kl}\, \beta \wedge \alpha.
\end{align*}
In particular, $\alpha \wedge \alpha = 0$ whenever $\alpha$ is a 1-form (or has odd degree). Wedge is also associative: for $\alpha \in \Lambda^k(V^*)$, $\beta \in \Lambda^l(V^*)$, $\gamma \in \Lambda^m(V^*)$, one verifies $(\alpha \wedge \beta) \wedge \gamma = \alpha \wedge (\beta \wedge \gamma)$ directly from the definition $\alpha \wedge \beta = \tfrac{(k+l)!}{k!l!}\operatorname{Alt}(\alpha \otimes \beta)$: both sides equal $\tfrac{(k+l+m)!}{k!l!m!}\operatorname{Alt}(\alpha \otimes \beta \otimes \gamma)$, using the identity $\operatorname{Alt}(\operatorname{Alt}(S) \otimes T) = \operatorname{Alt}(S \otimes T) = \operatorname{Alt}(S \otimes \operatorname{Alt}(T))$, a slight strengthening of the projection property $\operatorname{Alt} \circ \operatorname{Alt} = \operatorname{Alt}$.
[/remark]
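The bookkeeping behind associativity is short enough to write out. The sketch assumes the identity $\operatorname{Alt}(\operatorname{Alt}(S) \otimes T) = \operatorname{Alt}(S \otimes T)$, which is not proved here. For $\alpha \in \Lambda^k(V^*)$, $\beta \in \Lambda^l(V^*)$, $\gamma \in \Lambda^m(V^*)$,
\begin{align*}
(\alpha \wedge \beta) \wedge \gamma &= \frac{(k+l+m)!}{(k+l)!\,m!} \operatorname{Alt}\big((\alpha \wedge \beta) \otimes \gamma\big) \\
&= \frac{(k+l+m)!}{(k+l)!\,m!} \cdot \frac{(k+l)!}{k!\,l!} \operatorname{Alt}\big(\operatorname{Alt}(\alpha \otimes \beta) \otimes \gamma\big) \\
&= \frac{(k+l+m)!}{k!\,l!\,m!} \operatorname{Alt}(\alpha \otimes \beta \otimes \gamma).
\end{align*}
The same computation with the brackets placed on the right yields the identical expression, because the final coefficient is symmetric in how the factors are grouped.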
A crucial structural fact determines the dimension of $\Lambda^k V^*$: if $\{\phi_i\}$ is a basis of $V^*$, then for each multi-index $I = \{i_1 < \cdots < i_k\}$, the element $\phi_I := \phi_{i_1} \wedge \cdots \wedge \phi_{i_k}$ is well-defined, and the collection $\{\phi_I\}$ forms a basis for $\Lambda^k V^*$. Therefore
\begin{align*}
\dim(\Lambda^k V^*) = \binom{\dim V}{k}.
\end{align*}
In particular, $\Lambda^n V^* \cong \mathbb{R}$ when $n = \dim V$ (one-dimensional, spanned by any $\phi_1 \wedge \cdots \wedge \phi_n$), and $\Lambda^k V^* = 0$ for $k > \dim V$.
A linear map $\alpha: U \to V$ induces, for each $k$, a map $\Lambda^k \alpha: \Lambda^k U \to \Lambda^k V$ by $u_1 \wedge \cdots \wedge u_k \mapsto \alpha(u_1) \wedge \cdots \wedge \alpha(u_k)$. For an endomorphism $\alpha: U \to U$, tracing through the definition shows that $\Lambda^{\dim U}\alpha$, a linear map from the one-dimensional space $\Lambda^{\dim U} U \cong \mathbb{R}$ to itself, is multiplication by $\det(\alpha)$. This motivates the **determinant bundle**: for a vector bundle $E \to M$ of rank $k$, one sets $\det(E) := \Lambda^k E$, a canonical line bundle over $M$.
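The appearance of the determinant is easy to confirm in dimension two. The basis $e_1, e_2$ and the entries $a, b, c, d$ are notation introduced for this check: let $U = \mathbb{R}^2$ and let $\alpha$ be the endomorphism with $\alpha(e_1) = a e_1 + c e_2$ and $\alpha(e_2) = b e_1 + d e_2$. Then
\begin{align*}
(\Lambda^2 \alpha)(e_1 \wedge e_2) = (a e_1 + c e_2) \wedge (b e_1 + d e_2) = ad \, e_1 \wedge e_2 + cb \, e_2 \wedge e_1 = (ad - bc)\, e_1 \wedge e_2,
\end{align*}
where the terms $e_1 \wedge e_1$ and $e_2 \wedge e_2$ vanish and $e_2 \wedge e_1 = -e_1 \wedge e_2$.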
## Differential Forms and the Exterior Derivative
On a manifold $M$, the differential $d: C^\infty(M) \to \Omega^1(M)$ sends a function to its 1-form derivative. The question is: what global operator plays the role of $d$ for higher-degree forms? We want an operator that extends $d$ on functions, obeys a signed Leibniz rule, and satisfies $d^2 = 0$, and we need it to be intrinsic, independent of the choice of local coordinates. It is not obvious that such an operator exists globally, or that it is unique. The answer, the exterior derivative, turns out to be uniquely characterised by these properties alone.
With the exterior algebra established pointwise, we globalise to manifolds.
[definition: Differential Forms]
Let $M$ be a smooth $n$-manifold. The vector space of **differential $i$-forms** on $M$ is
\begin{align*}
\Omega^i(M) := \Gamma\!\left(\Lambda^i(T^*M)\right),
\end{align*}
the space of smooth sections of $\Lambda^i(T^*M)$. The full **de Rham algebra** is $\Omega^*(M) = \bigoplus_{i=0}^n \Omega^i(M)$ with multiplication given by $\wedge$.
[/definition]
So $\Omega^0(M) = C^\infty(M)$, and $\Omega^1(M) = \Gamma(T^*M)$ are the familiar 1-forms. Note that $\Omega^i(M) = 0$ for $i < 0$ or $i > n$.
[definition: Pullback of Differential Forms]
Let $f: M \to N$ be a smooth map. The **pullback** $f^*: \Omega^k(N) \to \Omega^k(M)$ is defined by
\begin{align*}
(f^*\omega)_p(v_1,\dots,v_k) := \omega_{f(p)}(df_p(v_1),\dots,df_p(v_k))
\end{align*}
for $\omega \in \Omega^k(N)$, $p \in M$, and $v_1,\dots,v_k \in T_pM$. The pullback $f^*: \Omega^*(N) \to \Omega^*(M)$ respects the grading and satisfies $f^*(\omega \wedge \theta) = f^*\omega \wedge f^*\theta$ and $(f \circ g)^* = g^* \circ f^*$. Locally, if $f: U \to V$ with coordinates $x_1,\dots,x_k$ on $U$ and $y_1,\dots,y_l$ on $V$, and if $\omega = \sum_I a_I \, dy_I \in \Omega^r(V)$, then
\begin{align*}
f^*\omega = \sum_I (a_I \circ f) \, df_{i_1} \wedge \cdots \wedge df_{i_r}, \quad \text{where } f_i = y_i \circ f \text{ and } df_i = \sum_j \frac{\partial f_i}{\partial x_j} dx_j.
\end{align*}
[/definition]
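A standard computation illustrates the local formula. The polar-coordinate map is an illustrative choice: let $f : (0,\infty) \times \mathbb{R} \to \mathbb{R}^2$, $f(r,\theta) = (r\cos\theta, r\sin\theta)$, and $\omega = dy_1 \wedge dy_2$. With $f_1 = r\cos\theta$ and $f_2 = r\sin\theta$,
\begin{align*}
f^*\omega &= df_1 \wedge df_2 = (\cos\theta \, dr - r\sin\theta \, d\theta) \wedge (\sin\theta \, dr + r\cos\theta \, d\theta) \\
&= r\cos^2\theta \, dr \wedge d\theta - r\sin^2\theta \, d\theta \wedge dr = r \, dr \wedge d\theta.
\end{align*}
The coefficient $r$ is the Jacobian determinant of $f$, which is how the change-of-variables formula for integrals emerges from the pullback.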
We already know how to differentiate functions: $d: \Omega^0(M) \to \Omega^1(M)$ sends $f \mapsto df$, where $df(X) = X \cdot f$ for vector fields $X$. The following proposition shows this extends uniquely to all forms.
[quotetheorem:1525]
[citeproof:1525]
With the abstract characterisation of $d$ now in hand, it is useful to see the operator at work in a concrete coordinate setting. The following example illustrates two features at once: the mechanical computation of $d\omega$ from the local formula $d(\sum_I \omega_I dx_I) = \sum_I d\omega_I \wedge dx_I$, and the way the antisymmetry relation $dx_i \wedge dx_i = 0$ causes many terms to vanish automatically. This is the kind of calculation one performs constantly in practice.
[example: Exterior derivative in coordinates]
On $M = \mathbb{R}^3$ with coordinates $(x_1, x_2, x_3)$, let $\omega = x_1 x_2 \, dx_1 \wedge dx_2$. Then
\begin{align*}
d\omega &= d(x_1 x_2) \wedge dx_1 \wedge dx_2 = (x_2 \, dx_1 + x_1 \, dx_2) \wedge dx_1 \wedge dx_2 \\
&= x_2 \, dx_1 \wedge dx_1 \wedge dx_2 + x_1 \, dx_2 \wedge dx_1 \wedge dx_2 = 0,
\end{align*}
since $dx_1 \wedge dx_1 = 0$ and $dx_2 \wedge dx_2 = 0$. On the other hand, for $\alpha = x_1 \, dx_2$, one computes $d\alpha = dx_1 \wedge dx_2$, which equals $\omega/(x_1 x_2)$ wherever $x_1 x_2 \ne 0$. The form $\omega$ itself is exact on all of $\mathbb{R}^3$: for $\beta = \tfrac{1}{2} x_1^2 x_2 \, dx_2$ one finds $d\beta = x_1 x_2 \, dx_1 \wedge dx_2 = \omega$.
[/example]
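A second coordinate computation, included here only as an illustration, shows how $d$ on 1-forms recovers the classical curl. For smooth functions $P, Q, R$ on $\mathbb{R}^3$ (names chosen for this sketch) and $\omega = P\,dx_1 + Q\,dx_2 + R\,dx_3$, the local formula gives
\begin{align*}
d\omega &= dP \wedge dx_1 + dQ \wedge dx_2 + dR \wedge dx_3 \\
&= \left(\frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2}\right) dx_1 \wedge dx_2 + \left(\frac{\partial R}{\partial x_1} - \frac{\partial P}{\partial x_3}\right) dx_1 \wedge dx_3 + \left(\frac{\partial R}{\partial x_2} - \frac{\partial Q}{\partial x_3}\right) dx_2 \wedge dx_3.
\end{align*}
Up to the ordering and sign conventions for the basis 2-forms, the three coefficients are the components of $\operatorname{curl}(P,Q,R)$.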
The exterior derivative also admits an explicit formula in terms of its values on vector fields:
[quotetheorem:1526]
[citeproof:1526]
This formula provides an intrinsic, coordinate-free definition of the exterior derivative and will be used repeatedly when computing curvature. Each axiom in the Exterior Derivative theorem plays a role: dropping $\mathbb{R}$-linearity (1) would allow $d$ to depend on a choice of connection; dropping Leibniz (2) would break compatibility with the ring structure of $\Omega^*(M)$; the condition $d^2 = 0$ (3) carries the most content, since it is equivalent to the commutativity of mixed partial derivatives and is what makes cohomology possible; dropping naturality (4) would mean $d$ is not intrinsic to the manifold but depends on an embedding. The vector-field formula is intrinsic in that it references only Lie brackets and actions on functions, with no coordinates needed, but it is less convenient for computation than the coordinate formula. The tradeoff is that the intrinsic formula makes many proofs (especially those involving naturality and Lie derivatives) transparent, while the coordinate formula is better suited to explicit calculations.
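For 1-forms, and assuming the common normalisation in which $(dx \wedge dy)(\partial_x, \partial_y) = 1$, the vector-field formula reads $d\omega(X,Y) = X(\omega(Y)) - Y(\omega(X)) - \omega([X,Y])$. The following check on $\mathbb{R}^2$ uses a form and vector fields chosen only for illustration: $\omega = x\,dy$, $X = \partial_x$, $Y = x\,\partial_y$, so that $[X,Y] = \partial_y$ and $d\omega = dx \wedge dy$.
\begin{align*}
d\omega(X,Y) &= dx(X)\,dy(Y) - dx(Y)\,dy(X) = 1 \cdot x - 0 = x, \\
X(\omega(Y)) - Y(\omega(X)) - \omega([X,Y]) &= \partial_x(x^2) - 0 - x = 2x - x = x.
\end{align*}
The bracket term is needed here: without it the right-hand side would be $2x$, which is not $C^\infty$-linear in $Y$.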
## de Rham Cohomology
What topological information is detected by closed-mod-exact forms? A 1-form $\omega$ on $\mathbb{R}^2 \setminus \{0\}$ illustrates the issue concretely. Consider the angle form
\begin{align*}
\omega = d\theta = \frac{-y\,dx + x\,dy}{x^2+y^2}.
\end{align*}
One computes $d\omega = 0$ (it is closed), but $\int_{S^1} \omega = 2\pi \ne 0$, so $\omega$ cannot equal $df$ for any smooth function $f$ on $\mathbb{R}^2 \setminus \{0\}$: if it did, the integral over the closed curve $S^1$ would be zero by the fundamental theorem of calculus. The obstruction is the hole at the origin: there is no smooth branch of $\theta$ on all of $\mathbb{R}^2 \setminus \{0\}$. The quotient $\ker d / \operatorname{Im} d$ measures precisely this kind of topological obstruction. In positive degrees it is trivial on contractible manifolds (Poincaré lemma) and can be nontrivial when there are holes or handles.
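Both claims about the angle form can be verified by hand. Writing $\omega = P\,dx + Q\,dy$ with $P = -y/(x^2+y^2)$ and $Q = x/(x^2+y^2)$ (the names $P$, $Q$ are introduced only for this computation),
\begin{align*}
\frac{\partial Q}{\partial x} = \frac{y^2 - x^2}{(x^2+y^2)^2} = \frac{\partial P}{\partial y}, \qquad\text{so}\qquad d\omega = \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx \wedge dy = 0.
\end{align*}
For the integral, parametrise the unit circle by $\gamma(t) = (\cos t, \sin t)$, $t \in [0, 2\pi]$:
\begin{align*}
\gamma^*\omega = \frac{-\sin t\,(-\sin t\,dt) + \cos t\,(\cos t\,dt)}{\cos^2 t + \sin^2 t} = dt, \qquad \int_{S^1} \omega = \int_0^{2\pi} dt = 2\pi.
\end{align*}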
Since $d^2 = 0$, the sequence
\begin{align*}
\Omega^0(M) \xrightarrow{d} \Omega^1(M) \xrightarrow{d} \Omega^2(M) \xrightarrow{d} \cdots
\end{align*}
is a cochain complex. This cochain complex encodes global topological information.
[definition: Closed and Exact Forms]
A form $\omega \in \Omega^i(M)$ is **closed** if $d\omega = 0$, and **exact** if $\omega = d\theta$ for some $\theta \in \Omega^{i-1}(M)$. Since $d^2 = 0$, every exact form is closed.
[/definition]
The inclusion $\{\text{exact}\} \subseteq \{\text{closed}\}$ is automatic from $d^2 = 0$, but the reverse inclusion can fail — and when it fails, it does so in a way that reflects the global topology of $M$, as the angle-form example on $\mathbb{R}^2 \setminus \{0\}$ already indicated. The quotient of closed forms by exact forms packages this failure into an honest invariant. That quotient is the central definition of this section.
[definition: de Rham Cohomology]
The **$i$-th de Rham cohomology group** of $M$ is
\begin{align*}
H^i_{dR}(M) := \frac{\ker(d: \Omega^i(M) \to \Omega^{i+1}(M))}{\operatorname{Im}(d: \Omega^{i-1}(M) \to \Omega^i(M))}.
\end{align*}
The full de Rham cohomology is $H^*_{dR}(M) = \bigoplus_{i=0}^n H^i_{dR}(M)$.
[/definition]
By naturality of $d$, a smooth map $f: M \to N$ induces pullback maps $f^*: H^i_{dR}(N) \to H^i_{dR}(M)$, so $H^*_{dR}$ is a contravariant functor. In particular, $H^*_{dR}(M)$ is a diffeomorphism invariant of $M$.
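The lowest-degree group can be computed directly from the definition, which makes a useful first sanity check. Since $\Omega^{-1}(M) = 0$, there are no nonzero exact 0-forms, so
\begin{align*}
H^0_{dR}(M) = \ker\left(d: \Omega^0(M) \to \Omega^1(M)\right) = \{ f \in C^\infty(M) : df = 0 \} = \{\text{locally constant functions on } M\}.
\end{align*}
If $M$ has finitely many connected components, say $c$ of them, a locally constant function is determined by one real value per component, so $H^0_{dR}(M) \cong \mathbb{R}^c$. In particular $H^0_{dR}(M) \cong \mathbb{R}$ when $M$ is connected.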
The most basic computation comes from the Poincaré lemma, which tells us that closed forms are locally exact.
[quotetheorem:832]
[citeproof:832]
The Poincaré lemma has an important corollary for general manifolds: every closed form of positive degree is locally exact. This means the de Rham cohomology measures a purely global (topological) obstruction to exactness. Some topological hypothesis on the domain is needed: on $\mathbb{R}^2 \setminus \{0\}$, which is not star-shaped (indeed not contractible), the angle form $d\theta = \frac{-y\,dx+x\,dy}{x^2+y^2}$ is closed but not exact, giving $H^1_{dR}(\mathbb{R}^2 \setminus \{0\}) \ne 0$. This is the simplest example of a topological invariant detected by de Rham cohomology.
[remark: Comparison theorem]
It is a fundamental theorem (which we quote without proof, as it requires tools from algebraic topology) that
\begin{align*}
H^*_{dR}(M) \cong H^*_{\operatorname{sing}}(M;\mathbb{R}),
\end{align*}
i.e. de Rham cohomology is naturally isomorphic to singular cohomology with real coefficients. This is the de Rham theorem, and it shows that $H^*_{dR}(M)$ is always finite-dimensional when $M$ is compact.
[/remark]
## Orientation and Integration
To integrate a function on a manifold, one would naively use local charts and piece together with a partition of unity. The problem is that under an orientation-reversing coordinate change, the integral picks up a sign, so the result is not intrinsic. Not every manifold can be oriented: the Möbius band, for instance, is a non-orientable 2-manifold with boundary. Concretely, the Möbius band is obtained from the square $[0,1] \times [0,1]$ by identifying $(0,t) \sim (1,1-t)$; any top form defined near one boundary edge comes back with the opposite sign after traversing the band, so no nowhere-vanishing 2-form can exist globally. The projective plane $\mathbb{R}P^2$ is a closed non-orientable surface for the same reason. The solution, for orientable manifolds, is to integrate differential forms rather than functions, since forms carry the orientation information in their wedge structure.
Recall the concrete transformation law: if $f: U \to V$ with $U,V \subset \mathbb{R}^n$ open, then
\begin{align*}
f^*(dy_1 \wedge \cdots \wedge dy_n) = \det(Df)\, dx_1 \wedge \cdots \wedge dx_n.
\end{align*}
The absolute value of $\det(Df)$ appears in the classical change-of-variables formula, but if $\det(Df) > 0$ everywhere, then no absolute value is needed. This observation motivates orientation.
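In dimension $n = 2$ the transformation law can be checked in one line, which shows where the determinant comes from. Writing $f = (f_1, f_2)$ and using $dx_i \wedge dx_i = 0$ and $dx_2 \wedge dx_1 = -dx_1 \wedge dx_2$,
\begin{align*}
f^*(dy_1 \wedge dy_2) &= \left(\frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2\right) \wedge \left(\frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2\right) \\
&= \left(\frac{\partial f_1}{\partial x_1}\frac{\partial f_2}{\partial x_2} - \frac{\partial f_1}{\partial x_2}\frac{\partial f_2}{\partial x_1}\right) dx_1 \wedge dx_2 = \det(Df)\, dx_1 \wedge dx_2.
\end{align*}
For instance, the reflection $f(x_1, x_2) = (x_2, x_1)$ has $\det(Df) = -1$ and pulls $dy_1 \wedge dy_2$ back to $-dx_1 \wedge dx_2$.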
[definition: Orientation]
An $n$-manifold $M$ is **orientable** if it admits a nowhere-vanishing $n$-form $\omega \in \Omega^n(M)$, called a **volume form**. An **orientation** of $M$ is a choice of volume form up to multiplication by everywhere-positive smooth functions. Equivalently, orientability asks that the line bundle $\Lambda^n T^*M$ (the dual of the determinant line bundle $\det(TM) = \Lambda^n(TM)$) admits a nowhere-vanishing global section; this is exactly the condition that $\Lambda^n T^*M$ is a trivial line bundle $M \times \mathbb{R}$. When $M$ is connected and not orientable, its orientation double cover is a connected, orientable manifold that covers $M$ two-to-one.
[/definition]
The global definition of orientation via a nowhere-vanishing $n$-form is conceptually clean but not always the most convenient to check in practice. It is often easier to work chart-by-chart, asking only that local coordinate systems fit together coherently. The following theorem gives the precise equivalence: orientability is the same as the existence of an atlas whose transition maps are all orientation-preserving, and this is the criterion one actually verifies when proving that a specific manifold is orientable.
[quotetheorem:1527]
[citeproof:1527]
The Möbius band witnesses that orientability is a genuine restriction: no atlas on the Möbius band has all transition maps with positive Jacobian determinant, and correspondingly $\Lambda^2 T^*(\text{Möbius})$ is a nontrivial line bundle. The non-orientable closed surface $\mathbb{R}P^2$ provides another example: $H^2_{dR}(\mathbb{R}P^2; \mathbb{R}) = 0$, consistent with the absence of a global volume form.
The boundary of an oriented manifold inherits a canonical orientation, sometimes called the Stokes orientation. In low dimensions, the convention can be read off directly: for the interval $[0,1]$ as a 1-manifold with boundary $\{0,1\}$, the outward normal at $1$ is $+\partial_x$ and the outward normal at $0$ is $-\partial_x$. The induced orientation at $1$ is $+1$ (positive) and at $0$ is $-1$ (negative), which is why $\int_{[0,1]} df = f(1) - f(0)$ — the sign difference between the endpoints is exactly the Stokes orientation.
[quotetheorem:1528]
[citeproof:1528]
Now we define integration. For non-orientable manifolds, the definition fails: without a consistent orientation, the local integrals can cancel rather than add. One can recover a notion of integration on non-orientable manifolds by replacing differential forms with **densities** (objects that transform by $|\det(Df)|$ rather than $\det(Df)$), but this is beyond the present chapter. On non-compact manifolds, one requires that $\omega$ has compact support or imposes absolute integrability conditions; otherwise the partition-of-unity sum may fail to converge. Finally, for $k$-forms with $k < n$ on an $n$-manifold $M$, integration over $M$ is not defined: one can only integrate $k$-forms over oriented $k$-dimensional submanifolds. The key point is that an $n$-form on an oriented $n$-manifold can be integrated intrinsically: orientation-preserving coordinate changes have $\det(Df) > 0$, so the factor $\det(Df)$ produced by the pullback agrees with the factor $|\det(Df)|$ in the change-of-variables formula, and the partition-of-unity construction is well-defined.
[definition: Compactly Supported Forms]
Let $\Omega^*_c(M)$ denote the subspace of $\Omega^*(M)$ consisting of forms with compact support. For compact $M$, $\Omega^*_c(M) = \Omega^*(M)$.
[/definition]
Compact support is the technical ingredient that makes the partition-of-unity construction converge: only finitely many $\rho_\alpha$ are nonzero on $\operatorname{supp}(\omega)$, so the sum defining the integral is actually finite at each stage. With this hypothesis in place, we can now state the main theorem of this section, which asserts that the local Euclidean integrals assemble into a single well-defined linear functional on $\Omega^n_c(M)$.
[quotetheorem:1529]
[citeproof:1529]
The abstract construction is best digested through a concrete computation on the simplest non-trivial oriented manifold. The circle $S^1$ admits a natural orientation and a canonical closed 1-form, the angle form $d\theta$, whose integral detects the non-triviality of $H^1_{dR}(S^1)$. The following example carries out the integration explicitly and foreshadows the cohomological significance of the answer.
[example: Integration on $S^1$]
Identify $S^1 \subset \mathbb{R}^2$ as the unit circle oriented counterclockwise. The standard angle coordinate $\theta \in (0, 2\pi)$ provides a chart on $S^1 \setminus \{(1,0)\}$. Although $\theta$ is not a globally defined function, the 1-form $\omega = d\theta$ is globally defined, and it is closed because $\Omega^2(S^1) = 0$. Since the omitted point has measure zero, the integral can be computed in this single chart: $\int_{S^1} d\theta = \int_0^{2\pi} d\theta = 2\pi \ne 0$, so $\omega$ is not exact on $S^1$. Its class generates $H^1_{dR}(S^1) \cong \mathbb{R}$.
[/example]
### Stokes' Theorem
The culmination of the integration theory is the general form of Stokes' theorem, which unifies the divergence theorem, Green's theorem, and the classical Stokes' theorem in $\mathbb{R}^3$.
[quotetheorem:1530]
[citeproof:1530]
The compactness-of-support hypothesis is essential. On the punctured closed disc $M = \{(x,y) : 0 < x^2+y^2 \le 1\}$, a non-compact manifold with boundary $\partial M = S^1$, consider the 1-form $\eta = \frac{x\,dy - y\,dx}{x^2+y^2}$. One computes $d\eta = 0$ on $M$, so $\int_M d\eta = 0$. But $\int_{\partial M} \eta = \int_{S^1} \eta = 2\pi \ne 0$. There is no contradiction: $\eta$ does not have compact support in $M$, so Stokes' theorem in the compact-support form does not apply. To apply Stokes on a compact region like $\{1 \le r \le 2\}$, one must account for both boundary circles, and the contributions cancel correctly. A practical guide: to evaluate $\int_M d\omega$ using Stokes' theorem, first check that $\operatorname{supp}(\omega)$ is compact (or that $M$ is compact), then read off $\int_{\partial M} \omega$ from the boundary restriction.
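The cancellation on the annulus can be made explicit. Let $A = \{1 \le r \le 2\}$ with the standard orientation of $\mathbb{R}^2$, and write $C_r$ for the circle of radius $r$ traversed counterclockwise (notation introduced for this computation). In polar coordinates $\eta = \frac{x\,dy - y\,dx}{x^2+y^2}$ is the angle form, so $\int_{C_r} \eta = 2\pi$ for every $r > 0$. The Stokes orientation on $\partial A$ traverses the outer circle counterclockwise and the inner circle clockwise, hence
\begin{align*}
\int_{\partial A} \eta = \int_{C_2} \eta - \int_{C_1} \eta = 2\pi - 2\pi = 0 = \int_A d\eta,
\end{align*}
in agreement with Stokes' theorem on the compact manifold with boundary $A$.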
[remark: Closed manifolds]
When $M$ is a closed $n$-manifold (compact, without boundary), Stokes' theorem gives $\int_M d\omega = 0$ for all $\omega \in \Omega^{n-1}(M)$. In particular, the integral $\int_M: \Omega^n(M) \to \mathbb{R}$ descends to a map on cohomology $\int_M: H^n_{dR}(M) \to \mathbb{R}$, since integrating an exact form gives 0.
[/remark]
The fact that $\int_M$ descends to a map $H^n_{dR}(M) \to \mathbb{R}$ on a closed manifold is already a non-trivial structural consequence of Stokes' theorem. The natural follow-up question is whether this map is injective, surjective, or an isomorphism. For connected compact oriented manifolds, the answer is the best possible: the integration map is an isomorphism, pinning down the top cohomology to a single real number.
[quotetheorem:1531]
[citeproof:1531]
This result, combined with the $H^0_{dR}(M) \cong \mathbb{R}$ result for connected $M$, shows that the de Rham cohomology of compact oriented manifolds is nontrivial in both degree 0 and degree $n$. Orientability is essential: $H^2_{dR}(\mathbb{R}P^2; \mathbb{R}) = 0$, so the top cohomology of a non-orientable manifold can vanish. This is the first hint of Poincaré duality, which will assert that $H^k_{dR}(M) \cong (H^{n-k}_{dR}(M))^*$ for compact oriented $M$ — orientability is required throughout.
## Manifold Type and Finite Dimensionality
The Mayer-Vietoris sequence allows one to compute $H^*_{dR}(M)$ inductively if $M$ can be covered by open sets whose cohomologies are already known. But why should the resulting cohomology groups be finite-dimensional? Poincaré's lemma tells us that contractible pieces have vanishing cohomology in positive degree, and Mayer-Vietoris assembles pieces into a long exact sequence. The obstacle is that a general manifold might require infinitely many pieces, and the long exact sequences might produce infinitely-generated groups at each step. What is needed is a finite covering with contractible intersections — this is the notion of finite type, and compactness supplies it.
To show that de Rham cohomology is always finite-dimensional for compact manifolds, we develop a framework for controlling the complexity of a manifold.
[definition: Manifold Type]
A smooth manifold $M$ has **type $k$** if it admits a covering by $k$ open sets $U_1,\dots,U_k$ such that each $U_i \cong \mathbb{R}^n$ (diffeomorphic to an open ball), and for every multi-index $I = \{i_1 < \cdots < i_m\}$, the intersection $U_I := U_{i_1} \cap \cdots \cap U_{i_m}$ is either empty or diffeomorphic to $\mathbb{R}^n$. We say $M$ has **finite type** if it has type $k$ for some $k \ge 1$.
[/definition]
The definition is clean, but its verifiability is not immediate: even for simple compact manifolds it is not obvious that a good cover — one whose finite intersections are all diffeomorphic to $\mathbb{R}^n$ — actually exists. The sphere $S^2$ provides an instructive test case, because the most natural two-chart atlas (stereographic projection) fails the finite-type condition. Analysing why it fails, and how to repair it, points to the general tool that handles every compact manifold: geodesically convex covers.
[example: Sphere]
$S^2$ does not have type 2 from stereographic projection alone, because the intersection of the two stereographic charts is diffeomorphic to $\mathbb{R}^2 \setminus \{0\}$, which is not contractible. To realise $S^2$ as finite type, one instead uses a cover by geodesically convex open balls with respect to a Riemannian metric: any finite intersection of convex balls is convex, hence star-shaped in normal coordinates, hence diffeomorphic to $\mathbb{R}^2$. For the round metric, three such balls cannot suffice, since three open hemispheres never cover $S^2$; but four open spherical caps of angular radius slightly less than $\pi/2$, centred at the vertices of an inscribed regular tetrahedron, do cover $S^2$ and witness that $S^2$ has finite type. The same convexity argument applies to any compact manifold.
[/example]
The sphere example is the model for the general construction: convexity of small geodesic balls with respect to an arbitrary Riemannian metric gives a uniform supply of contractible charts with contractible intersections, and compactness ensures that finitely many of them suffice. Promoting this argument from the sphere to a general compact manifold yields the following theorem.
[quotetheorem:1532]
To compute de Rham cohomology from a finite-type cover, the main algebraic tool is Mayer-Vietoris.
[quotetheorem:1533]
[citeproof:1533]
Combining Mayer-Vietoris with the Snake Lemma (which produces a long exact sequence in cohomology from a short exact sequence of complexes) yields the **Mayer-Vietoris sequence**:
\begin{align*}
\cdots \to H^i_{dR}(M) \to H^i_{dR}(U) \oplus H^i_{dR}(V) \to H^i_{dR}(U \cap V) \xrightarrow{\delta} H^{i+1}_{dR}(M) \to \cdots
\end{align*}
This long exact sequence is the practical tool for computing de Rham cohomology inductively. For manifolds of finite type, an induction on type using Mayer-Vietoris and the rank-nullity theorem shows:
[quotetheorem:1534]
[citeproof:1534]
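As an illustration of how the Mayer-Vietoris sequence is used in practice, here is the standard computation for the circle. The cover is an assumption of this sketch: take $U$ and $V$ to be two overlapping open arcs covering $S^1$, each diffeomorphic to $\mathbb{R}$, so that $U \cap V$ is a disjoint union of two open arcs. By the Poincaré lemma all the pieces have vanishing $H^1_{dR}$, and the sequence reduces to
\begin{align*}
0 \to H^0_{dR}(S^1) \to H^0_{dR}(U) \oplus H^0_{dR}(V) \to H^0_{dR}(U \cap V) \xrightarrow{\delta} H^1_{dR}(S^1) \to 0,
\end{align*}
that is, $0 \to \mathbb{R} \to \mathbb{R}^2 \to \mathbb{R}^2 \to H^1_{dR}(S^1) \to 0$. In an exact sequence of finite-dimensional vector spaces the alternating sum of dimensions vanishes, so
\begin{align*}
1 - 2 + 2 - \dim H^1_{dR}(S^1) = 0, \qquad\text{hence}\qquad H^1_{dR}(S^1) \cong \mathbb{R}.
\end{align*}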
For non-finite-type manifolds — such as an infinite disjoint union of circles $\bigsqcup_{n=1}^\infty S^1$ — the cohomology in degree 1 is infinite-dimensional ($H^1$ has one $\mathbb{R}$ factor per component). Even for connected non-compact manifolds, cohomology can be infinite-dimensional: for example, a manifold with infinitely many handles has $H^1$ of infinite rank.
## Moser's Theorem
When are two volume forms on a manifold equivalent under diffeomorphism? A necessary condition is obvious: if $\psi^*\omega_1 = \omega_0$ for an orientation-preserving diffeomorphism $\psi$, then $\int_M \omega_0 = \int_M \psi^*\omega_1 = \int_M \omega_1$. The surprising consequence of Moser's theorem is that on a closed, connected, oriented manifold, equal total volume is also sufficient: the entire classification of volume forms up to diffeomorphism collapses to a single real number. This fails on non-compact manifolds (where total volume can be infinite or depend on behaviour at infinity) and fails when the total volumes differ (for instance, on $S^2$, a form with total area $1$ cannot be pulled back to one with total area $2$ by any diffeomorphism). The proof is a beautiful instance of Moser's method: instead of constructing the diffeomorphism directly, one differentiates the desired condition and reduces it to an ODE for a time-dependent vector field.
The final section of this chapter demonstrates the power of the Lie derivative and Cartan's formulas in this striking geometric result. We first recall the necessary tools.
[definition: Interior Product]
Let $X \in \Gamma(TM)$ be a vector field and $\omega \in \Omega^r(M)$ an $r$-form. The **interior product** (or contraction) $\iota_X \omega \in \Omega^{r-1}(M)$ is defined by
\begin{align*}
(\iota_X \omega)(Y_1,\dots,Y_{r-1}) := \omega(X,Y_1,\dots,Y_{r-1})
\end{align*}
for vector fields $Y_1,\dots,Y_{r-1}$. For $f \in \Omega^0(M) = C^\infty(M)$, set $\iota_X f = 0$.
[/definition]
The interior product is an antiderivation of degree $-1$ on $\Omega^*(M)$: it satisfies $\iota_X \circ \iota_X = 0$ (by antisymmetry of forms) and the signed Leibniz rule $\iota_X(\omega \wedge \omega') = (\iota_X\omega) \wedge \omega' + (-1)^{|\omega|}\omega \wedge (\iota_X\omega')$.
[definition: Lie Derivative]
For a vector field $X \in \Gamma(TM)$ with flow $\varphi_t: M \to M$ (defined for small $t$), the **Lie derivative** of a tensor field $T$ along $X$ is
\begin{align*}
\mathcal{L}_X T := \frac{d}{dt}\bigg|_{t=0} \varphi_t^* T.
\end{align*}
For $\omega \in \Omega^k(M)$, $\mathcal{L}_X \omega \in \Omega^k(M)$. For functions, $\mathcal{L}_X f = X \cdot f$; for vector fields, $\mathcal{L}_X Y = [X,Y]$. The Lie derivative, viewed as an operator on tensor fields of a fixed type, is $\mathcal{L}_X: \Gamma((TM)^{\otimes p} \otimes (T^*M)^{\otimes q}) \to \Gamma((TM)^{\otimes p} \otimes (T^*M)^{\otimes q})$, and it extends to all tensor fields as a derivation: $\mathcal{L}_X(T \otimes S) = (\mathcal{L}_X T) \otimes S + T \otimes (\mathcal{L}_X S)$.
[/definition]
The definition of $\mathcal{L}_X$ via the flow $\varphi_t$ is conceptually transparent but computationally unwieldy: it asks us to integrate the vector field and pull back along the resulting diffeomorphism. For differential forms specifically, there is a far more efficient description that expresses $\mathcal{L}_X$ purely in terms of $d$ and the interior product $\iota_X$, with no flow or integration involved. This is Cartan's magic formula, and it is the identity that makes the proof of Moser's theorem tractable.
[quotetheorem:1535]
[citeproof:1535]
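Cartan's formula $\mathcal{L}_X = d \circ \iota_X + \iota_X \circ d$ can be checked against the flow definition in a simple case. The choices below are for illustration only: on $\mathbb{R}^2$ take the radial field $X = x\,\partial_x + y\,\partial_y$ and $\omega = dx \wedge dy$. The flow of $X$ is $\varphi_t(x,y) = (e^t x, e^t y)$, so
\begin{align*}
\varphi_t^*\omega = d(e^t x) \wedge d(e^t y) = e^{2t}\, dx \wedge dy, \qquad \mathcal{L}_X \omega = \frac{d}{dt}\bigg|_{t=0} e^{2t}\,\omega = 2\,\omega.
\end{align*}
On the other side, $d\omega = 0$ and
\begin{align*}
\iota_X \omega = x\,dy - y\,dx, \qquad d(\iota_X \omega) + \iota_X(d\omega) = (dx \wedge dy - dy \wedge dx) + 0 = 2\,dx \wedge dy,
\end{align*}
which agrees with the flow computation.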
Now we come to the main theorem.
[quotetheorem:1536]
[citeproof:1536]
The method used in this proof — finding a vector field at each time to kill an infinitesimal obstruction, then integrating — is called **Moser's method**. It is a powerful technique that reappears in symplectic geometry (where it classifies symplectic manifolds locally) and in the study of other geometric structures.
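To see Moser's method at work, here is a sketch on the circle, assuming the standard form of the argument (sign conventions in the proof above may differ). The two volume forms are chosen for illustration: $\omega_0 = d\theta$ and $\omega_1 = (1 + \tfrac12\cos\theta)\,d\theta$, both of total volume $2\pi$. Interpolate linearly and write the difference as an exact form:
\begin{align*}
\omega_t = (1-t)\,\omega_0 + t\,\omega_1 = \left(1 + \tfrac{t}{2}\cos\theta\right) d\theta, \qquad \omega_1 - \omega_0 = d\beta, \quad \beta = \tfrac12 \sin\theta.
\end{align*}
If $\psi_t$ is the flow of a time-dependent vector field $X_t$, then by Cartan's formula and $d\omega_t = 0$,
\begin{align*}
\frac{d}{dt}\left(\psi_t^*\omega_t\right) = \psi_t^*\left(\mathcal{L}_{X_t}\omega_t + \omega_1 - \omega_0\right) = \psi_t^*\, d\left(\iota_{X_t}\omega_t + \beta\right).
\end{align*}
This vanishes if $\iota_{X_t}\omega_t = -\beta$, which can be solved because $\omega_t$ is nowhere zero:
\begin{align*}
X_t = -\frac{\tfrac12 \sin\theta}{1 + \tfrac{t}{2}\cos\theta}\,\partial_\theta.
\end{align*}
Since $S^1$ is compact, the flow $\psi_t$ exists for all $t \in [0,1]$, and $\psi_1^*\omega_1 = \psi_0^*\omega_0 = \omega_0$.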
The necessity of the equal-volume condition is witnessed concretely: on $S^2$, a volume form of total area $4\pi$ (the round metric) cannot be pulled back to one of total area $2\pi$ by any diffeomorphism, since diffeomorphisms preserve total volume. On non-compact manifolds (say $M = \mathbb{R}^2$), the theorem fails even with equal total volume: one can construct volume forms of equal (infinite) total volume that are not equivalent under compactly-supported diffeomorphisms, because the argument in the proof uses compactness of $M$ to integrate the vector field $X_t$ globally and ensure $\psi_t$ is defined for all $t \in [0,1]$.
[remark: Implication for volume-preserving diffeomorphisms]
Moser's theorem implies that, for a closed connected oriented manifold $M$ with volume form $\omega$, the group $\operatorname{Diff}_{\rm vol}(M, \omega) = \{f \in \operatorname{Diff}(M) : f^*\omega = \omega\}$ of volume-preserving diffeomorphisms is "large" in the homotopy-theoretic sense: the inclusion $\operatorname{Diff}_{\rm vol}(M,\omega) \hookrightarrow \operatorname{Diff}^+(M)$ into the group of orientation-preserving diffeomorphisms is a weak homotopy equivalence. The key step is that any orientation-preserving diffeomorphism $f$ can be isotoped to a volume-preserving one, by the proof above applied to the pair $(\omega, f^*\omega)$, whose total volumes agree.
[/remark]
The machinery of differential forms and the exterior derivative provides the algebraic foundation for understanding curvature, but to measure how geometry actually bends, we need to differentiate vector fields in a compatible way. Connections are exactly this: a way to prescribe covariant derivatives that reduce to ordinary derivatives in flat spaces and reveal the intrinsic curvature of any manifold.
# 3. Connections
The first two chapters built the foundational language of manifolds, vector bundles, differential forms, de Rham cohomology, and exterior differentiation. A recurring theme was the exterior derivative $d$, which gives a canonical way to differentiate differential forms on $M$. The natural question for this chapter is: what happens when we want to differentiate sections of an arbitrary vector bundle $E \to M$, not just differential forms? The exterior derivative alone does not generalise, because it exploits the special structure of the bundles $\Lambda^i(T^*M)$. A **connection** is precisely the additional data needed to differentiate sections of a general bundle, and the obstruction to making this differentiation "trivial" is the **curvature**. The chapter closes with two major applications: Chern-Weil theory, which extracts topological invariants of bundles from curvature, and torsion, which measures the asymmetry of connections on the tangent bundle itself.
## Bundle-Valued Differential Forms
Before defining connections, we fix the target space for differentiation. Recall that if $E \to M$ is a smooth vector bundle of rank $k$, then:
- $\Omega^0(E) = \Gamma(E)$ is the space of global sections of $E$, i.e. smooth $E$-valued functions on $M$,
- $\Omega^i(E) = \Gamma(E \otimes \Lambda^i(T^*M))$ is the space of $E$-valued differential $i$-forms.
Concretely, an $E$-valued $i$-form at a point $p \in M$ is an element of $(E \otimes \Lambda^i(T^*M))_p = E_p \otimes \Lambda^i(T^*_p M)$. Since $E_p \cong \mathbb{R}^k$ via a local trivialisation, such an element is a finite sum of terms of the form $v \otimes \omega$ with $v \in E_p$ and $\omega \in \Lambda^i(T^*_p M)$, which we can think of as an alternating multilinear map $T_p M \times \cdots \times T_p M \to E_p$ evaluating as $(v \otimes \omega)(x_1, \ldots, x_i) = v \cdot \omega(x_1, \ldots, x_i)$. When $E = M \times \mathbb{R}$ is the product line bundle, $E$-valued $i$-forms are just the usual differential $i$-forms, since each fibre is a copy of $\mathbb{R}$.
## Connections and the Connection Matrix
With forms in hand, we can define the central object of the chapter.
[motivation]
### Why Connections Are Needed
The exterior derivative $d: \Omega^i(M) \to \Omega^{i+1}(M)$ is canonical — it exists on every manifold without any additional choice. However, if $E \to M$ is a general vector bundle, there is no canonical way to differentiate sections. The issue is local-to-global: we can differentiate locally in a trivialisation, but different trivialisations give different answers, and there is no intrinsic way to reconcile them. A connection is exactly a choice of how to identify nearby fibres, consistently enough to make differentiation well-defined.
The Leibniz (product) rule is the minimal compatibility condition between differentiation and the $C^\infty(M)$-module structure of $\Gamma(E)$: differentiating $fs$ should behave like $d(fs) = (df) \cdot s + f \cdot ds$. Any $\mathbb{R}$-linear operator $\Omega^0(E) \to \Omega^1(E)$ satisfying this rule is a connection, and any two connections differ by a $C^\infty(M)$-linear operator, that is, an endomorphism-valued 1-form.
[/motivation]
With this motivation in hand, we can now state the definition precisely. The single axiom — the Leibniz rule — turns out to be exactly the right condition: it is strong enough to ensure the operator behaves like differentiation, yet weak enough that connections exist in abundance and form an affine space.
[definition: Connection]
Let $E \to M$ be a smooth vector bundle. A **connection** $A$ on $E$ is a linear operator
\begin{align*}
d_A : \Omega^0(E) \to \Omega^1(E)
\end{align*}
satisfying the **Leibniz rule**: for every $f \in C^\infty(M)$ and $s \in \Gamma(E)$,
\begin{align*}
d_A(f \cdot s) = s \otimes df + f \cdot d_A(s).
\end{align*}
[/definition]
The Leibniz rule is the only axiom. A connection is a generalisation of the exterior derivative to bundle-valued functions: it satisfies the same product rule, but without the higher-order properties that $d$ enjoys (such as $d^2 = 0$).
### The Connection Matrix
To understand how connections look in practice, fix a trivialising open set $U \subset M$ for $E$, so $E|_U \cong U \times \mathbb{R}^k$, and let $e_1, \ldots, e_k$ be a local basis of sections of $E|_U$. Since $d_A(e_i) \in \Omega^1(E|_U)$, we can expand it as:
\begin{align*}
d_A(e_i) = \sum_{j=1}^k e_j \otimes \theta_{ji}
\end{align*}
for some 1-forms $\theta_{ji} \in \Omega^1(U)$. The matrix $\theta = (\theta_{ji})$ is called the **connection matrix** of $A$ in the open set $U$. Since $d_A$ is linear and determined by its values on a basis, $\theta$ determines $d_A$ completely on $U$.
For a general local section $s = \sum_{i=1}^k s_i e_i$ with $s_i \in C^\infty(U)$, the Leibniz rule gives:
\begin{align*}
d_A(s) = \sum_i d_A(s_i e_i) = \sum_i \left(e_i \otimes ds_i + s_i\, d_A(e_i)\right) = \sum_i e_i \otimes \left(ds_i + \sum_j \theta_{ij}\, s_j\right).
\end{align*}
This shows that locally, the connection acts as $d_A = d + \theta$, where $\theta$ is the matrix of 1-forms that "corrects" the exterior derivative to account for the non-triviality of the bundle.
### Change of Frame and the Transformation Law
If $e'_1, \ldots, e'_k$ is another local basis of sections over $U$, related to the first by $e'_j|_p = \sum_i \psi_{ji} e_i|_p$ for some smooth map $\psi: U \to GL_k(\mathbb{R})$, the connection matrix must transform consistently.
[quotetheorem:1537]
[citeproof:1537]
The transformation law $\theta' = d\psi \cdot \psi^{-1} + \psi\theta\psi^{-1}$ has a fundamental consequence for gauge theory: the inhomogeneous term $d\psi \cdot \psi^{-1}$ is unavoidable and prevents the connection matrix from transforming as a tensor. This is not a defect but a feature — it is precisely this inhomogeneous behaviour that allows $\theta$ to represent the gauge field (the vector potential in physics), whose transformation under gauge changes $\psi$ mirrors the transformation of a connection. A genuine tensor would transform homogeneously and could not encode the physical degrees of freedom of a gauge theory.
[remark: Local Principle for Connections]
Essentially everything with connections revolves around working locally in a trivialising chart, where $d_A = d + \theta$. The transformation law above is then the global gluing condition ensuring these local pieces define a genuine global operator.
[/remark]
## Existence and the Space of Connections
Having defined connections, two natural questions arise: do they exist, and what is the structure of the space of all connections?
[quotetheorem:1538]
[citeproof:1538]
Part (ii) is a key structural result: there is no "preferred" connection, but any two connections differ by a globally defined endomorphism-valued 1-form. The affine structure is analogous to how affine subspaces of $\mathbb{R}^3$ (for example, planes not through the origin) are not vector subspaces but still have a well-defined notion of difference.
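The reason the difference of two connections is tensorial is a one-line computation with the Leibniz rule. For connections $A$ and $A'$ on $E$, $f \in C^\infty(M)$ and $s \in \Gamma(E)$,
\begin{align*}
(d_A - d_{A'})(f \cdot s) = \left(s \otimes df + f \cdot d_A(s)\right) - \left(s \otimes df + f \cdot d_{A'}(s)\right) = f \cdot (d_A - d_{A'})(s).
\end{align*}
The inhomogeneous terms $s \otimes df$ cancel, so $d_A - d_{A'}$ is $C^\infty(M)$-linear and is therefore given by an element $a \in \Omega^1(\mathrm{End}(E))$. In a local frame with connection matrices $\theta$ and $\theta'$ this element is simply $a = \theta - \theta'$.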
## Extension to Higher Forms
Just as the exterior derivative extends from 0-forms to $i$-forms, any connection extends to a differential operator on all $E$-valued forms.
[quotetheorem:1539]
[citeproof:1539]
The crucial difference from the exterior derivative is that $d_A^2 \neq 0$ in general: the composition $\Omega^i(E) \xrightarrow{d_A} \Omega^{i+1}(E) \xrightarrow{d_A} \Omega^{i+2}(E)$ can be non-zero. This failure is precisely the curvature.
## Curvature
With the extension of $d_A$ to higher-degree forms established, we observed that $d_A^2$ need not vanish. This failure is not a defect to be corrected but a fundamental invariant to be measured: it asks how far the connection is from being a cochain differential, i.e. from satisfying $d_A^2 = 0$. Since $d_A^2$ is $C^\infty(M)$-linear (the non-$C^\infty$-linear terms cancel by the Leibniz rule), it must be given by wedging with a globally defined 2-form valued in $\mathrm{End}(E)$. This object deserves a name.
[definition: Curvature]
Let $A$ be a connection on $E \to M$. The map $\Omega^i(E) \to \Omega^{i+2}(E)$ defined by $\alpha \mapsto d_A(d_A(\alpha))$ has the form
\begin{align*}
\alpha \mapsto F_A \wedge \alpha
\end{align*}
for a uniquely determined global section $F_A \in \Omega^2(\mathrm{End}(E))$, called the **curvature** of the connection $A$. In other words, $d_A^2 = F_A \wedge$.
[/definition]
The curvature measures the failure of $d_A^2 = 0$. When $E = M \times \mathbb{R}$ and $A = d$ is the trivial connection, then $F_A = d^2 = 0$. Non-zero curvature at a point signals that the connection admits no local frame of parallel sections near that point, so it cannot be brought to the form $d_A = d$ by any local change of frame.
### Computing Curvature Locally
[quotetheorem:1540]
[citeproof:1540]
The local formula $F_A = d\theta + \theta \wedge \theta$ reveals the essential non-linearity of curvature. The quadratic term $\theta \wedge \theta$ arises because applying $d_A$ twice involves composing the correction $\theta \wedge$ with itself; in an abelian setting (e.g. a line bundle where $\theta$ is a scalar 1-form), $\theta \wedge \theta = 0$ by antisymmetry of the wedge product, so the curvature reduces to the linear term $d\theta$. For non-abelian structure groups, $\theta$ takes values in a non-commutative Lie algebra and $\theta \wedge \theta$ is in general non-zero, which is why non-abelian gauge theories are genuinely more complex. Despite $\theta$ not being tensorial (it transforms with an inhomogeneous term), $F_A$ is tensorial: the inhomogeneous terms cancel in $d\theta + \theta \wedge \theta$ under a change of frame, a fact confirmed by the overlap computation in the proof.
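A small worked case shows the quadratic term producing curvature on its own. The setup is an assumption made for illustration: on the trivial rank-2 bundle over $\mathbb{R}^2$ with coordinates $(x,y)$, take $\theta = A\,dx + B\,dy$ with constant matrices $A, B$, and use the convention $(\theta \wedge \theta)_{ij} = \sum_k \theta_{ik} \wedge \theta_{kj}$. Then $d\theta = 0$ and
\begin{align*}
(\theta \wedge \theta)_{ij} = \sum_k (A_{ik}\,dx + B_{ik}\,dy) \wedge (A_{kj}\,dx + B_{kj}\,dy) = \sum_k \left(A_{ik}B_{kj} - B_{ik}A_{kj}\right) dx \wedge dy,
\end{align*}
so $F_A = d\theta + \theta \wedge \theta = [A,B]\,dx \wedge dy$. For example,
\begin{align*}
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad\Longrightarrow\quad F_A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} dx \wedge dy \ne 0,
\end{align*}
whereas commuting $A$ and $B$ give a flat connection.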
[example: Flat Connection in Local Coordinates]
If $\theta_\alpha = 0$ on $U_\alpha$ (i.e. $A$ is the product connection in this chart), then $(F_A)_\alpha = 0$. More generally, if $E = M \times \mathbb{R}^k$ is trivial and $A$ is the product connection $d_A = d$, then $F_A \equiv 0$ because $d^2 = 0$. A connection with $F_A \equiv 0$ is called **flat**.
[/example]
[remark: Flat Bundles with Nontrivial Holonomy]
The example above gives a flat connection on a trivial bundle, where flatness is unsurprising. A more instructive example is the following. On $M = \mathbb{R}^2 \setminus \{0\}$, consider the trivial rank-1 bundle $E = M \times \mathbb{R}$ with connection matrix $\theta = \frac{x_2\,dx_1 - x_1\,dx_2}{x_1^2 + x_2^2}$. One computes $d\theta = 0$ (since $\theta = -d\phi$ is minus the angle 1-form on the punctured plane), so $F_A = 0$ and $A$ is flat. However, $\theta$ is not exact: along the unit circle $\gamma(t) = (\cos 2\pi t, \sin 2\pi t)$ one finds $\gamma^*\theta = -2\pi\,dt$, so $\int_\gamma \theta = -2\pi$. A parallel section along $\gamma$ solves $ds + \theta s = 0$, so parallel transport once around the circle multiplies by $e^{-\int_\gamma \theta} = e^{2\pi} \neq 1$. Hence there is no global non-zero parallel section, even though the bundle is trivial and the connection is flat. In the same way, on the trivial complex line bundle the connection matrix $\theta = ic\,d\phi$ with $c \in \mathbb{R}$ is flat with holonomy $e^{-2\pi i c}$ around $\gamma$, giving a holonomy representation $\pi_1(M) \to U(1)$ that is nontrivial unless $c \in \mathbb{Z}$. This shows flatness is a local condition that does not force the existence of a global parallel frame: the obstruction is the holonomy representation of $\pi_1(M)$.
[/remark]
### Induced Connections on Associated Bundles
A connection $A$ on $E$ induces connections on all bundles built from $E$.
The connection on the dual bundle $E^*$ is defined by requiring a product rule for pairings: for $\xi \in \Gamma(E^*)$ and $s \in \Gamma(E)$,
\begin{align*}
(d_{A^*}(\xi))(s) := d(\xi(s)) - \xi(d_A(s)).
\end{align*}
In terms of local bases: if $\{e_j\}$ is a local basis for $E$ with dual basis $\{e^*_j\}$ for $E^*$, the condition $e^*_j(e_i) = \delta_{ij}$ and the Leibniz rule force the connection matrix for $A^*$ to be $-\theta^\top_\alpha$ (the negative transpose of the connection matrix for $A$).
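The negative transpose can be read off in one line. The following sketch assumes the index convention $d_A e_i = \sum_k \theta_{ki}\, e_k$ for the connection matrix; with the opposite convention the indices below swap but the conclusion is the same. Applying the defining product rule to the constant function $e^*_j(e_i) = \delta_{ij}$:
\begin{align*}
0 = d\big(e^*_j(e_i)\big) &= (d_{A^*} e^*_j)(e_i) + e^*_j(d_A e_i) = (d_{A^*} e^*_j)(e_i) + \theta_{ji}, \\
d_{A^*} e^*_j &= -\sum_i \theta_{ji}\, e^*_i.
\end{align*}
The coefficient of $e^*_i$ in $d_{A^*} e^*_j$ is therefore $-\theta_{ji} = (-\theta^\top)_{ij}$.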
For direct sums and tensor products, the induced connections are:
\begin{align*}
d_{A \oplus B}(s, t) &= (d_A s, d_B t), \\
d_{A \otimes B}(s \otimes t) &= d_A(s) \otimes t + s \otimes d_B(t).
\end{align*}
In particular, the connection on $\mathrm{End}(E) = E^* \otimes E$ is given by $d_{A^* \otimes A}$.
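A useful consequence is that curvature is additive under tensor products. The computation below is a sketch in a local frame, assuming that the connection matrix of $A \otimes B$ in a product frame $e_i \otimes f_j$ is $\theta \otimes 1 + 1 \otimes \Theta$, which is what the Leibniz formula above gives:
\begin{align*}
F_{A \otimes B} &= d(\theta \otimes 1 + 1 \otimes \Theta) + (\theta \otimes 1 + 1 \otimes \Theta) \wedge (\theta \otimes 1 + 1 \otimes \Theta) \\
&= (d\theta + \theta \wedge \theta) \otimes 1 + 1 \otimes (d\Theta + \Theta \wedge \Theta) = F_A \otimes 1 + 1 \otimes F_B.
\end{align*}
The cross terms $(\theta \otimes 1) \wedge (1 \otimes \Theta)$ and $(1 \otimes \Theta) \wedge (\theta \otimes 1)$ cancel because the matrix parts commute while the 1-form parts anticommute. For a line bundle $L$ with connection, this says the induced connection on $L^{\otimes n}$ has curvature $n$ times that of $L$.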
## The Bianchi Identity
Having defined curvature, we ask: does $F_A$ satisfy any canonical differential identity? Recall that for the exterior derivative, $d^2 = 0$ immediately implies $d^3 = 0$. For a connection, $d_A^2 = F_A \wedge$ is not zero, so the analogue of $d^3 = 0$ is not immediate — yet it holds, in the form of the Bianchi identity. This identity governs how the curvature itself is differentiated by the induced connection on $\mathrm{End}(E)$, and it is the key input for Chern-Weil theory.
[quotetheorem:1541]
The informal content is: since $F_A = d_A^2$, the Bianchi identity says "$d_A^3 = 0$", an analogue of $d^3 = 0$ for the exterior derivative.
[proof]
For a section $\varphi \in \Omega^s(\mathrm{Hom}(E, F))$, the induced connection on $\mathrm{Hom}(E, F) = E^* \otimes F$ satisfies, locally:
\begin{align*}
d_{A^* \otimes B}(\varphi)_\alpha = d\varphi_\alpha + \Theta_\alpha \wedge \varphi_\alpha - (-1)^{|\varphi|} \varphi_\alpha \wedge \theta_\alpha,
\end{align*}
where $\theta_\alpha$ and $\Theta_\alpha$ are the connection matrices for $A$ and $B$ respectively. Taking $A = B$ and $\varphi = F_A$, with $(F_A)_\alpha = d\theta_\alpha + \theta_\alpha \wedge \theta_\alpha$:
\begin{align*}
d_{A^* \otimes A}(F_A)_\alpha &= d(d\theta_\alpha + \theta_\alpha \wedge \theta_\alpha) + \theta_\alpha \wedge (d\theta_\alpha + \theta_\alpha \wedge \theta_\alpha) - (d\theta_\alpha + \theta_\alpha \wedge \theta_\alpha) \wedge \theta_\alpha \\
&= d(\theta_\alpha \wedge \theta_\alpha) + \theta_\alpha \wedge d\theta_\alpha - d\theta_\alpha \wedge \theta_\alpha = 0,
\end{align*}
using $d^2 = 0$ and the Leibniz property of $d$.
[/proof]
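The cancellation in the last step of the proof can be spelled out. This is only an expansion of the computation above, writing $\theta = \theta_\alpha$ and using the graded Leibniz rule for a matrix of 1-forms, $d(\theta \wedge \theta) = d\theta \wedge \theta - \theta \wedge d\theta$:
\begin{align*}
d(\theta \wedge \theta) + \theta \wedge d\theta - d\theta \wedge \theta &= (d\theta \wedge \theta - \theta \wedge d\theta) + \theta \wedge d\theta - d\theta \wedge \theta = 0, \\
\theta \wedge (\theta \wedge \theta) - (\theta \wedge \theta) \wedge \theta &= 0 \quad \text{(associativity of matrix-valued wedge)}.
\end{align*}
In the abelian case, where $F_A = d\theta$, the Bianchi identity reduces to $d(d\theta) = 0$.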
## Covariant Constancy and Flat Connections
A section $s \in \Gamma(E)$ is called **horizontal** or **parallel** with respect to $A$ if it does not change under covariant differentiation — that is, if $d_A s = 0$. Geometrically, a parallel section is one that is constant along every path in $M$ when transported by $A$. Understanding when parallel sections exist, and how many, is the central question of this section. The answer turns out to be completely governed by the curvature: the local existence of a full basis of parallel sections is equivalent to flatness $F_A = 0$.
[definition: Covariant Differentiation and Parallel Sections]
The operator $d_A$ is called **covariant differentiation** with respect to the connection $A$. A section $\varphi \in \Omega^0(E) = \Gamma(E)$ satisfying $d_A \varphi = 0$ is called **covariant constant** (or **parallel**).
[/definition]
The question of when there exists a local basis of parallel sections is answered completely by the curvature.
[quotetheorem:1542]
The curvature is therefore the only obstruction to finding a local trivialisation in which the connection matrix vanishes.
[proof]
($\Leftarrow$) If $s_1, \ldots, s_k$ is a local basis with $d_A s_i = 0$ for all $i$, then the connection matrix in this basis is $\theta = 0$, so $(F_A)_\alpha = d\theta + \theta \wedge \theta = 0$.
($\Rightarrow$) Take local coordinates $x_1, \ldots, x_n$ on $M$ and extend to coordinates $x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k$ on the total space of $E$. Define 1-forms on $E$ by:
\begin{align*}
\psi_i = d\lambda_i + \sum_{j=1}^k \theta_{ij} \lambda_j.
\end{align*}
A direct computation using $F_A = 0$ gives $d\psi_i = \sum_j \psi_j \wedge \theta_{ij}$, so each $d\psi_i$ lies in the ideal generated by $\psi_1, \ldots, \psi_k$. By the Involutivity Criterion (the lemma stated after this proof, which characterises involutive distributions through their annihilating 1-forms), the rank-$n$ distribution $V = \{v \in TE : \psi_i(v) = 0\; \forall i\}$ is involutive. By the Frobenius Integrability Theorem (Chapter 1), there are local coordinates $y_1, \ldots, y_{n+k}$ on $E$ such that $V = \mathrm{span}\{\partial_{y_1}, \ldots, \partial_{y_n}\}$.
Fix constants $y_{n+1} = a_1, \ldots, y_{n+k} = a_k$; this defines a local integral submanifold $W \subset E$ for $V$. The projection $\pi|_W: W \to M$ is a local diffeomorphism: $\dim W = n = \dim M$, and any $v \in \ker(d\pi) \cap TW$ is vertical with $\psi_i(v) = d\lambda_i(v) = 0$ for all $i$, forcing $v = 0$. Its local inverse is a section of $E$ whose fibre coordinates are functions $\gamma_1, \ldots, \gamma_k$ on $M$, and then $s = \sum_i \gamma_i s_i$ (with $\{s_i\}$ the local frame in which $\theta$ is computed) satisfies $d_A s = 0$. Varying the choice of constants $a_j$ produces a full basis of parallel sections.
[/proof]
The theorem as stated is local — it guarantees a basis of parallel sections over a small neighbourhood of every point, but makes no claim about global parallel frames. This distinction matters: a flat bundle on a simply-connected manifold is globally trivialised by parallel transport, but on a manifold with nontrivial fundamental group (such as the punctured plane $\mathbb{R}^2 \setminus \{0\}$), there can exist flat connections with nontrivial holonomy that obstruct the existence of a global parallel frame despite local flatness at every point.
The key tool used in the forward direction is the following lemma, which characterises involutivity of a distribution in terms of the exterior derivatives of its annihilating forms.
[quotetheorem:1543]
[citeproof:1543]
The Involutivity Criterion is a theorem about when a system of 1-forms defines an integrable distribution: the condition $d\theta_i \equiv 0 \pmod{\theta_1, \ldots, \theta_m}$ is both necessary and sufficient for involutivity. The necessity direction is elementary; the sufficiency direction requires constructing local coordinates, and it is precisely this construction that is invoked in the proof of Flatness Characterises Parallel Frames above. The criterion also has a life of its own as the key step in the Frobenius Integrability Theorem (Chapter 1), which reappears in Chapter 4 in the study of polarisations and their Lagrangian foliations.
## Holonomy
The flatness theorem has an important interpretation in terms of parallel transport along curves.
[remark: Holonomy from Parallel Transport]
Let $E \to M$ be a bundle with connection $A$ and $\gamma: [0,1] \to M$ a smooth path. The pullback $\gamma^* E \to [0,1]$ carries the pullback connection $\gamma^* A$. Since $\Lambda^2([0,1]) = 0$, the curvature of $\gamma^* A$ vanishes automatically. By the flatness theorem there are local parallel frames near every point of $[0,1]$, and because $[0,1]$ is an interval these can be matched on overlaps by constant changes of basis and patched together, so there exist global covariant constant sections $\sigma_1, \ldots, \sigma_k$ of $\gamma^* E$ forming a basis in every fibre. These sections provide a consistent way to transport vectors in the fibres along $\gamma$.
[/remark]
Parallel transport along $\gamma$ is the mechanism that gives a concrete meaning to "moving a vector in $E_{\gamma(0)}$ to $E_{\gamma(1)}$ without rotating or stretching it" — where "without rotating" is measured by $A$. We now make this definition precise.
[definition: Parallel Transport]
Let $\gamma: [0,1] \to M$ be a smooth path and $v_0 \in E_{\gamma(0)}$. Writing $v_0 = \sum_{i=1}^k c_i \sigma_i(0)$ in the basis of parallel sections, define $v_t = \sum_{i=1}^k c_i \sigma_i(t) \in E_{\gamma(t)}$. The vector $v_1 \in E_{\gamma(1)}$ is called the **parallel transport** of $v_0$ along $\gamma$.
[/definition]
Parallel transport defines a linear isomorphism $E_{\gamma(0)} \to E_{\gamma(1)}$ (with inverse given by transport along the reversed curve $\tilde\gamma(t) = \gamma(1-t)$). When $\gamma: S^1 \to M$ is a smooth loop based at $m = \gamma(0)$, parallel transport gives an automorphism of $E_m$, the **holonomy** of $A$ along $\gamma$. Varying the loop based at $m$ yields the **holonomy group** at $m$, the image of the map
\begin{align*}
\{\gamma \in C^\infty(S^1, M) : \gamma(0) = m\} \to GL(E_m).
\end{align*}
When $M$ is connected, changing the base point $m$ (or the identification $E_m \cong \mathbb{R}^k$) conjugates the holonomy group, so the corresponding conjugacy class of subgroups of $GL_k(\mathbb{R})$ is well-defined. If $F_A \equiv 0$, the holonomy of a loop depends only on its homotopy class, so the holonomy factors through $\pi_1(M, m)$ and the holonomy group is a quotient of the fundamental group.
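In a local frame, parallel transport becomes a linear ODE, which may make the definition more concrete. The following is a sketch assuming the convention $d_A e_j = \sum_i \theta_{ij}\, e_i$ for the connection matrix and that $\gamma$ stays inside the domain of the frame $\{e_i\}$. Writing $\sigma(t) = \sum_i v_i(t)\, e_i(\gamma(t))$, the condition that $\sigma$ is covariant constant for $\gamma^* A$ reads
\begin{align*}
\frac{dv_i}{dt} + \sum_{j=1}^k \theta_{ij}\big(\dot\gamma(t)\big)\, v_j(t) = 0, \qquad i = 1, \ldots, k.
\end{align*}
For a line bundle ($k = 1$) this integrates to $v(t) = \exp\!\big(-\int_0^t \theta(\dot\gamma(u))\, du\big)\, v(0)$, so the holonomy around a loop is $\exp(-\oint_\gamma \theta)$. For $k > 1$ the matrices $\theta(\dot\gamma(t))$ at different times need not commute, and the solution is a path-ordered exponential rather than an ordinary one.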
## Chern-Weil Theory
We have seen that the curvature $F_A \in \Omega^2(\mathrm{End}(E))$ depends on the choice of connection. A fundamental observation is that certain combinations of the curvature define cohomology classes that are independent of this choice — these are **characteristic classes** of $E$.
### Powers of the Curvature
Combining the wedge product on differential forms with composition of endomorphisms, we obtain a product on $\Omega^*(\mathrm{End}(E))$. Given $\varphi \otimes s \in \Omega^i(\mathrm{End}(E))$ with $\varphi \in \Gamma(\mathrm{End}(E))$ and $s \in \Gamma(\Lambda^i(T^*M))$, define:
\begin{align*}
(\varphi \otimes s)^m := (\varphi \circ \cdots \circ \varphi) \cdot (s \wedge \cdots \wedge s) \in \Omega^{mi}(\mathrm{End}(E)).
\end{align*}
Taking the trace of the endomorphism part gives a differential form: $\mathrm{tr}(F_A^m) \in \Omega^{2m}(M)$.
[quotetheorem:1544]
[citeproof:1544]
The closedness of $\mathrm{tr}(F_A^m)$ rests on two ingredients: the Bianchi identity $d_{A^* \otimes A}(F_A) = 0$ and the Ad-invariance of the trace, which ensures that $d(\mathrm{tr}(\varphi)) = \mathrm{tr}(d_{A^* \otimes A}(\varphi))$ for any endomorphism-valued form $\varphi$. Ad-invariance does two jobs: it makes $\mathrm{tr}(\varphi)$ independent of the local frame, hence a well-defined form on $M$, and it lets $d$ pass through the trace, because the graded-commutator terms involving $\theta$ have zero trace. Bianchi, combined with the Leibniz rule, then gives $d_{A^* \otimes A}(F_A^m) = 0$, so $d\,\mathrm{tr}(F_A^m) = 0$. For a complex bundle, the resulting cohomology classes $[\mathrm{tr}(F_A^m)]$ are, up to normalisation, the components of the Chern character, and the Chern classes are polynomials in them; they are the fundamental topological invariants detected by curvature. The independence of these classes from the choice of connection is established next.
[quotetheorem:1545]
[citeproof:1545]
The proof shows that $\mathrm{tr}(F_A^m) - \mathrm{tr}(F_B^m)$ is always exact, so the cohomology class is genuinely independent of the choice of connection. This is remarkable: the curvature $F_A$ depends very much on $A$, but the trace of its powers does not (in cohomology). The proof uses the interpolation trick (a one-parameter family of connections), the Bianchi identity for $F_t$ at each $t$, and Ad-invariance of the trace. Without any one of these three ingredients, the argument would fail.
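The case $m = 1$ can be checked by hand, which may help in following the general proofs. This is a sketch using only the local formula $F_A = d\theta + \theta \wedge \theta$ and the standard fact that any two connections differ by a global form $a \in \Omega^1(\mathrm{End}(E))$; the symbol $a$ is introduced here just for the illustration. Since $\mathrm{tr}(\theta \wedge \theta) = \sum_{i,j} \theta_{ij} \wedge \theta_{ji} = -\sum_{i,j} \theta_{ji} \wedge \theta_{ij} = -\mathrm{tr}(\theta \wedge \theta)$, the trace of the quadratic term vanishes, and likewise $\mathrm{tr}(a \wedge a) = 0$. Hence
\begin{align*}
\mathrm{tr}(F_A) &= d\,\mathrm{tr}(\theta_\alpha) \ \text{ on } U_\alpha, \quad\text{so}\quad d\,\mathrm{tr}(F_A) = 0, \\
\mathrm{tr}(F_{A+a}) - \mathrm{tr}(F_A) &= \mathrm{tr}\big(d_{A^* \otimes A}\, a + a \wedge a\big) = d\,\mathrm{tr}(a).
\end{align*}
The first line shows that $\mathrm{tr}(F_A)$ is closed; it is only locally exact, because $\mathrm{tr}(\theta_\alpha)$ is not a globally defined form. The second line shows that the difference for two connections is globally exact, since $\mathrm{tr}(a)$ is a global 1-form.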
The map $E \mapsto [\mathrm{tr}(F_A^m)] \in H^{2m}_{dR}(M)$ is a **characteristic class**: it is natural under pullbacks, meaning if $f: M \to N$ is smooth and $E \to N$, then $[\mathrm{tr}(F_{f^*A}^m)] = f^*([\mathrm{tr}(F_A^m)])$ in $H^{2m}_{dR}(M)$.
[remark: Trivial Bundles Have Vanishing Characteristic Classes]
If $E \cong M \times \mathbb{R}^k$ is a trivial bundle, the product connection $d_A = d$ has $F_A = d^2 = 0$, so $\mathrm{tr}(F_A^m) = 0$ for all $m \geq 1$. Therefore:
\begin{align*}
[\mathrm{tr}(F_A^m)] \neq 0 \text{ for some } m \implies E \not\cong \text{trivial bundle.}
\end{align*}
Taking $E = TM$, one obtains canonical cohomology classes in $H^{2m}_{dR}(M)$ for any smooth manifold $M$, which are preserved by the diffeomorphism group. This is one of the central applications of Chern-Weil theory to the study of manifolds.
[/remark]
The converse of this remark fails: if all characteristic classes $[\mathrm{tr}(F_A^m)]$ vanish, the bundle is not necessarily trivial (there are non-trivial bundles detected only by torsion invariants invisible to de Rham cohomology). A non-vanishing class, however, is a definitive obstruction to triviality. For a complex bundle, the classes $[\mathrm{tr}(F_A^m)]$ for $m = 1, 2, \ldots$ generate all Chern classes in de Rham cohomology under appropriate normalisation.
### Example: The Tautological Line Bundle over $\mathbb{C}P^1$
The following example, though non-examinable, illustrates Chern-Weil theory in a concrete computation.
[example: Curvature of the Tautological Bundle]
Write $\mathbb{C}P^1 = \mathbb{C}_1 \cup \mathbb{C}_2$, where $\mathbb{C}_1$ has coordinate $z$ and $\mathbb{C}_2$ has coordinate $w$, related by $w = 1/z$ on the overlap. The tautological line bundle $L \to \mathbb{C}P^1$ has transition function $\psi_{12}(z) = 1/z$.
A Hermitian metric on $L$ is specified by smooth functions $H_1, H_2 > 0$ on $\mathbb{C}_1$ and $\mathbb{C}_2$ respectively, satisfying $H_2 = |z|^{-2} H_1$ on the overlap. Setting $H_1 = 1 + |z|^2$ and $H_2 = 1 + |w|^2$ satisfies this compatibility.
The Chern connection (the unique connection compatible with the metric and the holomorphic structure) has local connection matrix $\theta_1 = H_1^{-1} \partial H_1$ in the $z$-chart. Computing:
\begin{align*}
\theta_1 = \frac{1}{1+|z|^2} \cdot \frac{\partial}{\partial z}(1+|z|^2)\, dz = \frac{\bar{z}\, dz}{1+|z|^2}.
\end{align*}
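Since $L$ is a line bundle, $\theta_1$ is a scalar-valued 1-form, so its self-wedge vanishes: $\theta_1 \wedge \theta_1 = 0$. Moreover $\theta_1$ has type $(1,0)$ on a one-dimensional complex manifold, so $\partial \theta_1 = 0$ and $d\theta_1 = \bar\partial \theta_1$. Thus: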
\begin{align*}
F_A = d\theta_1 = \bar\partial\theta_1 = \frac{\partial}{\partial \bar z}\!\left(\frac{\bar z}{1+z\bar z}\right) d\bar z \wedge dz = \frac{1}{(1+|z|^2)^2}\, d\bar z \wedge dz.
\end{align*}
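In polar coordinates $z = re^{i\phi}$, we have $d\bar z \wedge dz = 2i\, dx \wedge dy = 2ir\, dr \wedge d\phi$, so, using $\int_0^\infty \frac{r\, dr}{(1+r^2)^2} = \frac{1}{2}$:
\begin{align*}
\int_{\mathbb{C}P^1} F_A = \int_{\mathbb{C}} \frac{2ir\, dr \wedge d\phi}{(1+r^2)^2} = 2i \cdot 2\pi \cdot \frac{1}{2} = 2\pi i.
\end{align*}
Since this integral is non-zero, $[F_A] \neq 0$ in $H^2_{dR}(\mathbb{C}P^1)$, confirming that $L$ is not the trivial bundle. With the common normalisation $c_1 = \frac{i}{2\pi}[F_A]$, this gives $\int_{\mathbb{C}P^1} c_1(L) = -1$, the expected degree of the tautological bundle. For the bundle $L^{\otimes n}$ with metric $H_1 = (1+|z|^2)^n$, one finds $\int_{\mathbb{C}P^1} F_A = 2\pi i n$.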
[/example]
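As a consistency check on this example, one can compare the two local connection forms on the overlap. The sketch below uses only the data $H_2 = 1 + |w|^2$, $w = 1/z$ and $\psi_{12} = 1/z$ from the example, together with the assumption that the same formula $\theta_2 = H_2^{-1} \partial H_2$ holds in the $w$-chart:
\begin{align*}
\theta_2 &= \frac{\bar w\, dw}{1+|w|^2} = \frac{1}{\bar z} \cdot \left(-\frac{dz}{z^2}\right) \cdot \frac{|z|^2}{1+|z|^2} = -\frac{dz}{z\,(1+|z|^2)}, \\
\theta_2 - \theta_1 &= -\frac{(1 + |z|^2)\, dz}{z\,(1+|z|^2)} = -\frac{dz}{z} = \psi_{12}^{-1}\, d\psi_{12}.
\end{align*}
The two connection forms differ by the closed form $\psi_{12}^{-1}\, d\psi_{12}$, so $d\theta_1 = d\theta_2$ on the overlap and the curvature $F_A$ is a globally defined 2-form on $\mathbb{C}P^1$, even though neither $\theta_1$ nor $\theta_2$ extends globally.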
## Torsion
So far we have studied connections on general vector bundles. When $E = TM$ is the tangent bundle, the connection has additional structure coming from the interaction between $TM$ and itself.
[definition: Covariant Derivative Along a Vector Field]
Let $A$ be a connection on a vector bundle $E \to M$ and $X \in \Gamma(TM)$ a vector field. The composition
\begin{align*}
\Gamma(E) \xrightarrow{d_A} \Gamma(E \otimes T^*M) \xrightarrow{\iota_X} \Gamma(E)
\end{align*}
where $\iota_X$ is the interior product, defines the **covariant derivative of $A$ along $X$**:
\begin{align*}
\nabla_X := \iota_X \circ d_A : \Omega^0(E) \to \Omega^0(E).
\end{align*}
[/definition]
The covariant derivative along a vector field packages the connection into a more familiar form: rather than a map from sections to 1-forms, $\nabla_X$ takes a section to a section, giving the rate of change in the direction $X$. This is the language familiar from Riemannian geometry. When $E = TM$, this operation can interact with the Lie bracket of vector fields in a non-trivial way, leading to the notion of torsion.
[definition: Affine Connection]
A connection on $TM$ is called an **affine** (or **Koszul**) **connection** on $M$. An affine connection $A$ gives an operator $\nabla_X: \Gamma(TM) \to \Gamma(TM)$ for each vector field $X$.
[/definition]
The induced connection $d_{A^*}$ on $T^*M$ (whose sections are the 1-forms $\Omega^1(M)$) makes it natural to compose:
\begin{align*}
\Gamma(T^*M) \xrightarrow{d_{A^*}} \Gamma(T^*M \otimes T^*M) \xrightarrow{\pi} \Gamma(\Lambda^2(T^*M)) = \Omega^2(M),
\end{align*}
where $\pi(a \otimes b) = a \wedge b$ is the antisymmetrisation map. A calculation shows that $\pi \circ d_{A^*} + d$ is $C^\infty(M)$-linear as an operator $\Omega^1(M) \to \Omega^2(M)$, hence it is a bundle map.
[definition: Torsion]
The **torsion** of an affine connection $A$ is the bundle homomorphism:
\begin{align*}
\tau_A = \pi \circ d_{A^*} + d : \Omega^1(M) \to \Omega^2(M).
\end{align*}
A connection with $\tau_A = 0$ is called **torsion-free**.
[/definition]
To see torsion in coordinates, let $(U, \varphi)$ be a chart with coordinates $(x_1, \ldots, x_n)$ and let $\Gamma^k_{ij}$ denote the Christoffel symbols of the connection, defined by $\nabla_{\partial_{x_i}} \partial_{x_j} = \sum_k \Gamma^k_{ij} \partial_{x_k}$. Since $[\partial_{x_i}, \partial_{x_j}] = 0$ in a coordinate chart, the torsion tensor $T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y]$ (whose relation to $\tau_A$ is made precise in the next subsection) evaluates as:
\begin{align*}
T(\partial_{x_i}, \partial_{x_j}) = \nabla_{\partial_{x_i}} \partial_{x_j} - \nabla_{\partial_{x_j}} \partial_{x_i} = \sum_k (\Gamma^k_{ij} - \Gamma^k_{ji})\,\partial_{x_k}.
\end{align*}
Thus torsion-free ($\tau_A = 0$) is equivalent to the symmetry $\Gamma^k_{ij} = \Gamma^k_{ji}$ of the Christoffel symbols in their lower indices. This is why the Levi-Civita connection of a Riemannian metric — which is defined to be torsion-free and metric-compatible — has symmetric Christoffel symbols.
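For a concrete instance, consider an illustrative connection (not one taken from the text): let $\nabla$ be the connection on $\mathbb{R}^2$ whose only non-zero Christoffel symbol is $\Gamma^1_{12} = 1$. Then
\begin{align*}
T(\partial_{x_1}, \partial_{x_2}) = (\Gamma^1_{12} - \Gamma^1_{21})\, \partial_{x_1} = \partial_{x_1} \neq 0,
\end{align*}
so this connection has torsion. Replacing the Christoffel symbols by their symmetrisation $\tilde\Gamma^k_{ij} = \tfrac{1}{2}(\Gamma^k_{ij} + \Gamma^k_{ji})$ gives a torsion-free connection. The symmetrised symbols again define a connection because they differ from the original ones by the components of the tensor $\tfrac{1}{2} T$, and a connection plus a tensor is a connection.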
The torsion is a first-order measure of how the connection fails to be symmetric. It is an intrinsic invariant of the affine connection — unlike curvature, it is not defined for connections on general bundles, only on $TM$.
### Torsion and Curvature via the Covariant Derivative
The torsion $\tau_A$ and curvature $F_A$ each have equivalent formulations directly in terms of the covariant derivative $\nabla$.
[quotetheorem:1546]
[citeproof:1546]
The Lie bracket term $-[X, Y]$ in both $T(X,Y)$ and $K(X,Y)$ is what ensures tensoriality. The naive expression $\nabla_X Y - \nabla_Y X$ is not $C^\infty(M)$-linear in $X$ or $Y$: replacing $Y$ by $fY$ for a function $f$ gives $\nabla_X(fY) - \nabla_{fY} X = f(\nabla_X Y - \nabla_Y X) + X(f)\,Y$, and the extra term $X(f)\,Y$ does not cancel. Subtracting the bracket $[X, fY] = f[X,Y] + X(f)\,Y$ removes exactly this term, so $T(X, fY) = f\,T(X,Y)$; the argument for $X$ is symmetric. The same mechanism operates for $K(X,Y)$: the combination $\nabla_X \nabla_Y - \nabla_Y \nabla_X - \nabla_{[X,Y]}$ is $C^\infty$-linear in $X$, $Y$, and in the section acted upon, making it a genuine tensor. In a coordinate frame, where all brackets vanish, $T = 0$ reduces to the classical condition that the Christoffel symbols are symmetric, $\Gamma^k_{ij} = \Gamma^k_{ji}$.
[remark: Memorising the Curvature Formula]
The curvature tensor is compactly written as $K(X,Y) = [\nabla_X, \nabla_Y] - \nabla_{[X,Y]}$. This says: the curvature measures how much the commutator of covariant derivatives $\nabla_X$ and $\nabla_Y$ fails to equal the covariant derivative along the Lie bracket $[X,Y]$. When $K = 0$ (flat connection), these agree — the connection is compatible with the Lie algebra structure of vector fields.
[/remark]
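The tensoriality of $K$ in its first slot can be verified directly from the compact formula $K(X,Y) = [\nabla_X, \nabla_Y] - \nabla_{[X,Y]}$. The check below is a sketch using only the rules $\nabla_{fX} = f\nabla_X$, the Leibniz rule for $\nabla_Y$, and the identity $[fX, Y] = f[X,Y] - Y(f)\,X$, for a function $f$ and a section $s$:
\begin{align*}
K(fX, Y)s &= \nabla_{fX}\nabla_Y s - \nabla_Y(f\,\nabla_X s) - \nabla_{[fX, Y]} s \\
&= f\,\nabla_X\nabla_Y s - Y(f)\,\nabla_X s - f\,\nabla_Y\nabla_X s - f\,\nabla_{[X,Y]} s + Y(f)\,\nabla_X s \\
&= f\,K(X,Y)s.
\end{align*}
The two terms $\mp\, Y(f)\,\nabla_X s$ cancel, which is exactly the role of the correction $\nabla_{[X,Y]}$.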
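With connections and curvature in hand, we now ask: which connections are the most natural ones for a given manifold? Geometric structures single out preferred connections: a Riemannian metric determines a unique torsion-free compatible connection, the Levi-Civita connection, while symplectic forms and complex structures constrain the admissible connections without necessarily fixing one. The interplay of these structures with curvature, captured in the Riemannian case by the Ricci tensor and scalar curvature, opens deep geometric and topological insights.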
# 4. Geometric Structures
The first three chapters built the foundational language of smooth manifolds: the tangent and cotangent bundles, differential forms, connections on vector bundles, curvature, and torsion. This chapter puts that machinery to work by studying three of the most important geometric structures a manifold can carry — affine, symplectic, and Riemannian — and explores what each structure implies, both locally and globally. The chapter culminates in the Riemann curvature tensor and the Einstein condition, which connects the geometry of a Riemannian manifold to its Ricci tensor in a way that appears, in a different signature, in Einstein's field equations of general relativity.
## Affine Structures
An affine structure is the simplest geometric structure one can place on a manifold: one asks for coordinates whose transition maps are as rigid as possible — not merely smooth, but drawn from a specific group of transformations.
[definition: Affine Group]
The **affine group** $\mathrm{Aff}(\mathbb{R}^n)$ is the group of affine transformations of $\mathbb{R}^n$:
\begin{align*}
\mathrm{Aff}(\mathbb{R}^n) = \{ M : \mathbb{R}^n \to \mathbb{R}^n : M(x) = Ax + b \text{ for some } A \in GL_n(\mathbb{R}),\ b \in \mathbb{R}^n \}.
\end{align*}
Here $A$ is the linear part (an arbitrary invertible linear map, so it may include rotations, dilations, shears and reflections), while $b$ is the translation component.
[/definition]
[definition: Affine Structure]
An **affine structure** on a smooth manifold $M$ is an atlas of charts such that all transition maps are restrictions of affine transformations of $\mathbb{R}^n$.
[/definition]
The $n$-torus $\mathbb{T}^n = \mathbb{R}^n / \mathbb{Z}^n$ carries a natural affine structure: it is covered by $\mathbb{R}^n$ via the projection, and all deck transformations are translations, hence affine.
The connection between affine structures and connections comes from the following key theorem, which shows that a flat, torsion-free connection is exactly what is needed to build an affine atlas.
[quotetheorem:1547]
[citeproof:1547]
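This theorem has two notable consequences. First, analytic continuation of affine charts produces a developing map $\widetilde{M} \to \mathbb{R}^n$ from the universal cover, which is a local diffeomorphism and is equivariant for a holonomy homomorphism $\pi_1(M) \to \mathrm{Aff}(\mathbb{R}^n)$; when the developing map is a diffeomorphism onto an open set $U \subseteq \mathbb{R}^n$, this exhibits $M$ as the quotient of $U$ by a discrete group of affine transformations. Second, in the special case that $M$ is compact and all transition maps are Euclidean isometries (linear part $A \in O(n)$), a theorem of Bieberbach guarantees that $M$ has a finite cover which is a torus (see the theory of crystallographic groups).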
[remark: Chern's Conjecture]
Chern's Conjecture (circa 1955) predicts that if $M$ is a closed affine manifold, then $\chi(M) = 0$. This remains open in general.
[/remark]
## Symplectic Structures
Symplectic geometry is the study of an additional structure on a manifold: a non-degenerate closed 2-form. The motivation comes from classical mechanics, where the phase space of a Hamiltonian system carries exactly this structure, with position and momentum coordinates playing symmetric roles.
[definition: Symplectic Form]
A **symplectic form** on a smooth manifold $M$ is a 2-form $\omega \in \Omega^2(M)$ satisfying:
- $\omega$ is **non-degenerate**: for each $x \in M$, if $\omega_x(u, v) = 0$ for all $u \in T_xM$, then $v = 0$.
- $\omega$ is **closed**: $d\omega = 0$.
A manifold equipped with a symplectic form is called a **symplectic manifold**.
[/definition]
Non-degeneracy of $\omega$ means precisely that the map $TM \to T^*M$ defined fibrewise by $v \mapsto \omega_x(v, \cdot)$ for $v \in T_xM$ is a bundle isomorphism. This natural isomorphism between the tangent and cotangent bundle is one of the distinguishing features of symplectic geometry.
### Linear Symplectic Algebra
The Darboux theorem says that every symplectic manifold looks locally like $\mathbb{R}^{2n}$ with its standard symplectic form — the manifold-level counterpart of the Standard Form Theorem we establish below. Before proving the Standard Form Theorem, we need to understand what symplectic forms look like on a single vector space: what is the normal form for a non-degenerate skew-symmetric bilinear form? This question reduces the manifold-level problem to a linear-algebra problem, and the answer — the Standard Form Theorem — is the foundation on which Darboux is built.
[quotetheorem:1548]
[citeproof:1548]
The Standard Form Theorem has an immediate consequence for dimension: since the induction builds pairs $(u, v)$, the dimension of $V$ must be even. Non-degeneracy forces even dimension: a skew-symmetric form on an odd-dimensional space is always degenerate, because its matrix $W$ satisfies $\det W = \det(W^\top) = \det(-W) = (-1)^{\dim V} \det W$, which forces $\det W = 0$ when $\dim V$ is odd. This dimension constraint is the simplest obstruction from symplectic geometry, and it already shows that $S^{2n+1}$ (which is odd-dimensional) cannot be a symplectic manifold. At the manifold level, the Darboux theorem promotes this vector-space normal form to a local coordinate statement: near any point of a symplectic manifold, coordinates $(x_1, \ldots, x_n, y_1, \ldots, y_n)$ exist in which $\omega = \sum_i dx_i \wedge dy_i$. This is a remarkable flexibility: symplectic manifolds of the same dimension are locally indistinguishable, in contrast with Riemannian manifolds, where curvature is a local invariant.
The Standard Form Theorem is a purely algebraic statement about a single vector space, but its utility for global symplectic geometry depends on being able to detect non-degeneracy in a way that interacts with calculus on manifolds. The criterion we have — that $\omega(\cdot, v) \neq 0$ for every non-zero $v$ — is pointwise and analytic, so it does not immediately connect to the de Rham cohomology of the ambient manifold. What we need is an equivalent formulation expressed in terms of wedge products, because wedge products are precisely the operations that behave well under $d$ and hence descend to cohomology. The next theorem provides exactly this reformulation, replacing non-degeneracy with the non-vanishing of the top exterior power $\omega^n$.
[quotetheorem:1549]
[citeproof:1549]
The equivalence between non-degeneracy and $\omega^n \neq 0$ is more than a curiosity: it converts a pointwise algebraic condition (a bilinear form is non-degenerate) into a statement about differential forms ($\omega^n$ is a nowhere-vanishing top-degree form), which on a closed manifold becomes cohomological. This is the key to the topological obstruction theorem below: when $M$ is closed, $\omega^n$ is a volume form with $\int_M \omega^n \neq 0$, so the hypothesis that $(M, \omega)$ is symplectic produces a non-trivial cohomology class $[\omega^n] \in H^{2n}_{\mathrm{dR}}(M)$, which one can then use with Stokes' theorem to derive constraints on the topology of $M$.
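The criterion is easy to test in the lowest interesting dimension. As an illustrative sketch, take $\mathbb{R}^4$ with coordinates $(x_1, y_1, x_2, y_2)$ and compare the standard form $\omega$ with the form $\omega'$ obtained by dropping one summand ($\omega'$ is introduced here only for the comparison):
\begin{align*}
\omega &= dx_1 \wedge dy_1 + dx_2 \wedge dy_2 &&\Longrightarrow\quad \omega \wedge \omega = 2\, dx_1 \wedge dy_1 \wedge dx_2 \wedge dy_2 \neq 0, \\
\omega' &= dx_1 \wedge dy_1 &&\Longrightarrow\quad \omega' \wedge \omega' = 0.
\end{align*}
Correspondingly, $\omega$ is non-degenerate, while $\omega'$ is degenerate because $\partial_{x_2}$ pairs to zero with every vector. In general the standard form on $\mathbb{R}^{2n}$ satisfies $\omega^n = n!\; dx_1 \wedge dy_1 \wedge \cdots \wedge dx_n \wedge dy_n$.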
### Topological Obstructions from Symplectic Geometry
Linear symplectic algebra tells us about a single vector space, but a symplectic form on a manifold weaves these pointwise structures into a globally coherent object via the closedness condition $d\omega = 0$. When the manifold is also closed (compact without boundary), Stokes' theorem becomes available and transforms algebraic statements about $\omega^n$ into topological statements about $M$. The next theorem extracts the two simplest consequences — orientability and the non-triviality of even-degree cohomology — from the combination of the closedness of $\omega$ and the non-vanishing of its top power. These are genuine constraints: they rule out vast classes of manifolds as candidates for symplectic structures.
[quotetheorem:1550]
[citeproof:1550]
This gives a clean obstruction: on a closed symplectic manifold $M^{2n}$, every even-degree de Rham cohomology group $H^{2k}_{\mathrm{dR}}(M)$ with $0 \leq k \leq n$ must be non-trivial. In particular, $S^{2n}$ for $n \geq 2$ admits no symplectic structure, since $H^2_{\mathrm{dR}}(S^{2n}) = 0$.
Having seen what symplectic forms forbid at the level of whole manifolds, we now turn to the internal structure that symplectic forms single out: the distinguished subspaces and submanifolds on which $\omega$ behaves as simply as possible. The most important of these are the Lagrangian objects, which are maximal isotropic — the form vanishes on them entirely, and they are of the largest possible dimension consistent with this vanishing. Lagrangian submanifolds play the role of "solutions" in symplectic geometry: they are the graphs of Hamiltonian flows, the supports of WKB states in geometric quantization, and the leaves of integrable systems. The following definitions set up the linear algebra required to study them.
### Lagrangian Subspaces and Submanifolds
[definition: Lagrangian Subspace]
A **Lagrangian subspace** of a symplectic vector space $(V, \omega)$ is a subspace $L \subseteq V$ with $\dim_{\mathbb{R}}(L) = \frac{1}{2}\dim_{\mathbb{R}}(V)$ on which $\omega$ vanishes completely: $\omega|_{L \times L} \equiv 0$.
[/definition]
In the standard normal form, $V = \mathbb{R}^{2n}$ with basis vectors $x_1, y_1, \ldots, x_n, y_n$, the subspaces $\mathrm{span}\langle x_1, \ldots, x_n \rangle$ and $\mathrm{span}\langle y_1, \ldots, y_n \rangle$ are both Lagrangian. By contrast, $\mathrm{span}\langle x_1, y_1 \rangle$ is a **symplectic subspace** — the restriction of $\omega$ to it remains non-degenerate.
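A larger family of Lagrangian subspaces comes from graphs. The following sketch assumes the standard form is normalised so that $\omega(x_i, y_j) = \delta_{ij}$ and $\omega(x_i, x_j) = \omega(y_i, y_j) = 0$, and writes a vector as a pair $(u, a)$ with $u$ the $x$-components and $a$ the $y$-components; the matrix $S$ is a hypothetical $n \times n$ real matrix introduced for the example. Then $\omega\big((u,a),(v,b)\big) = u \cdot b - a \cdot v$, and on the graph $L_S = \{(u, Su) : u \in \mathbb{R}^n\}$:
\begin{align*}
\omega\big((u, Su), (v, Sv)\big) = u \cdot Sv - Su \cdot v = u \cdot (S - S^\top)\, v.
\end{align*}
Since $\dim L_S = n$, the graph $L_S$ is Lagrangian exactly when $S$ is symmetric. The case $S = 0$ recovers $\mathrm{span}\langle x_1, \ldots, x_n \rangle$.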
[definition: Lagrangian Submanifold]
A submanifold $L \subseteq (M^{2n}, \omega)$ of a symplectic manifold is a **Lagrangian submanifold** if $\dim(L) = n$ and for every $p \in L$, the tangent space $T_pL$ is a Lagrangian subspace of $(T_pM, \omega_p)$, i.e. $i^*\omega = 0$ where $i : L \hookrightarrow M$ is the inclusion.
[/definition]
### The Cotangent Bundle as a Symplectic Manifold
The canonical example of a symplectic manifold is the cotangent bundle of any smooth manifold, a fact that underlies the phase space formulation of classical mechanics.
[example: Cotangent Bundle]
Let $Q$ be any smooth manifold. The cotangent bundle $T^*Q$, with projection $p : T^*Q \to Q$, is canonically a symplectic manifold.
There is a tautological 1-form $\theta \in \Omega^1(T^*Q)$, defined as follows. For $m \in T^*Q$ and $X \in T_m(T^*Q)$, set
\begin{align*}
\theta_m(X) = \xi(dp_m(X)),
\end{align*}
where $m = (p(m), \xi)$ with $\xi \in T^*_{p(m)}Q$ the covector part of $m$.
In local coordinates $x_1, \ldots, x_n$ on $Q$ and dual fibre coordinates $y_1, \ldots, y_n$ (the components of a covector in the basis $dx_i$, so that $\xi = \sum_i y_i\, dx_i$; here $y$ represents momentum and $x$ position), the projection satisfies $dp_m(\partial_{x_i}) = \partial_{x_i}$ and $dp_m(\partial_{y_i}) = 0$. Computing $\theta$ in local coordinates gives
\begin{align*}
\theta = \sum_{i=1}^n y_i \, dx_i,
\end{align*}
and so
\begin{align*}
d\theta = \sum_{i=1}^n dy_i \wedge dx_i,
\end{align*}
which, up to the sign convention, is the standard symplectic form on $\mathbb{R}^{2n}$ at each point; in particular it is closed and non-degenerate. Hence $(T^*Q, d\theta)$ is a symplectic manifold. Note that $\omega = d\theta$ is globally exact, since $\theta$ is a globally defined 1-form. This does not contradict the earlier obstruction, which rules out exact symplectic forms only on closed manifolds: $T^*Q$ is non-compact whenever $\dim Q > 0$.
The zero section $Q \hookrightarrow T^*Q$ (given by $q \mapsto 0 \in T^*_q Q$) satisfies $\theta|_Q \equiv 0$ and $d\theta|_Q = 0$, so it is Lagrangian. Each cotangent fibre $T^*_q Q$ satisfies $dx_i|_{T^*_qQ} = 0$, so $\theta|_{T^*_qQ} = 0$ and $\omega|_{T^*_qQ} = 0$, making every cotangent fibre Lagrangian. The cotangent fibres form a family of Lagrangian submanifolds sweeping all of $T^*Q$.
[/example]
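The zero section is one of a larger family of Lagrangian submanifolds of $T^*Q$. As a sketch, using the coordinate formula $\theta = \sum_i y_i\, dx_i$ from the example, view a 1-form $\alpha = \sum_i \alpha_i(x)\, dx_i$ on $Q$ as a section $\alpha : Q \to T^*Q$, whose image is the graph $\{y_i = \alpha_i(x)\}$. Pulling back along this section:
\begin{align*}
\alpha^*\theta = \sum_{i=1}^n \alpha_i(x)\, dx_i = \alpha, \qquad \alpha^*(d\theta) = d(\alpha^*\theta) = d\alpha.
\end{align*}
The graph of $\alpha$ is an $n$-dimensional submanifold, so it is Lagrangian exactly when $d\alpha = 0$. The zero section is the case $\alpha = 0$. Locally every closed 1-form is $df$ for some function $f$ (a name introduced here for illustration), so Lagrangian graphs are locally graphs of differentials.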
## Lagrangian Foliations
A polarisation is a global organisation of Lagrangian submanifolds into a foliation of a symplectic manifold.
The cotangent bundle example points to a phenomenon more general than a single Lagrangian submanifold: on $T^*Q$ the fibres $T^*_qQ$ fit together smoothly as $q$ varies, sweeping out the whole manifold as a family of pairwise disjoint Lagrangian leaves. This is exactly the structure that a polarisation abstracts. A polarisation records not one Lagrangian submanifold but a coherent choice of Lagrangian tangent subspace at every point, assembled into a subbundle of $TM$ that is both Lagrangian fibrewise and integrable as a distribution. The next definition isolates this notion precisely; after it we will see how the Frobenius integrability theorem turns the infinitesimal data of a polarisation into a genuine foliation by Lagrangian submanifolds.
[definition: Polarisation]
A **polarisation** of a $2n$-dimensional symplectic manifold $(M, \omega)$ is a rank-$n$ subbundle $E \subset TM$ that is involutive and satisfies $\omega|_{E \times E} \equiv 0$, i.e. an involutive Lagrangian distribution.
[/definition]
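In $T^*Q$, the vertical subbundle $E_m = T_m(T^*_{p(m)}Q)$ (the subbundle of vectors tangent to the cotangent fibres) is a polarisation. Since each fibre is Lagrangian, $\omega|_{E \times E} = 0$, and involutivity holds because $E$ is tangent to the foliation of $T^*Q$ by its fibres: in local coordinates $E$ is spanned by the commuting fields $\partial_{y_1}, \ldots, \partial_{y_n}$.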
By the Frobenius integrability theorem (applied to the involutive distribution $E$), locally there exist coordinates $x_1, \ldots, x_n, y_1, \ldots, y_n$ on $M$ such that $E = \mathrm{span}\langle \partial_{y_1}, \ldots, \partial_{y_n} \rangle$. The integral submanifolds of $E$ are the level sets $\{x = \mathrm{const}\}$, which are Lagrangian.
The following theorem shows that the integral submanifolds of any polarisation automatically carry flat affine connections, a deep interaction between symplectic and affine geometry.
[quotetheorem:1551]
[citeproof:1551]
Combining this with the Affine Structure from Flat Connection theorem above, we conclude that the integral submanifolds of any polarisation are affine manifolds.
If $E$ is not involutive, that is, if there exist sections $X, Y \in \Gamma(E)$ whose Lie bracket $[X,Y]$ does not lie in $E$, then the hypothesis of the Frobenius integrability theorem fails and $E$ does not integrate to a foliation at all. In that case the construction above breaks down at the first step: the covariant derivative $\nabla_X \xi = \beta([X, \tilde{\xi}])$ is not well-defined, because the ambiguity in the lift $\tilde{\xi}$ cannot be corrected using $E$-sections alone. From the perspective of geometric quantization, involutivity of the polarisation is precisely the condition that ensures the quantum Hilbert space (sections of a line bundle covariantly constant along $E$) is well-defined; dropping it produces a system with no honest quantization.
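A concrete Lagrangian distribution that fails to be involutive can be written down on $\mathbb{R}^4$ with $\omega = dy_1 \wedge dx_1 + dy_2 \wedge dx_2$. The following is an illustrative sketch rather than an example from the course; the vector fields $X$ and $Y$ are choices made only for this illustration.
\begin{align*}
X &= \partial_{y_1}, \qquad Y = \partial_{y_2} + y_1 \, \partial_{x_2}, \\
\omega(X, Y) &= dy_1(X)\,dx_1(Y) - dy_1(Y)\,dx_1(X) + dy_2(X)\,dx_2(Y) - dy_2(Y)\,dx_2(X) = 0, \\
[X, Y] &= \partial_{y_1}(y_1)\, \partial_{x_2} = \partial_{x_2}.
\end{align*}
The fields $X$ and $Y$ are pointwise linearly independent, so $E = \mathrm{span}\langle X, Y \rangle$ is a rank-2 subbundle with $\omega|_{E \times E} = 0$. However, any combination $aX + bY$ with a non-zero $\partial_{x_2}$ component also has a non-zero $\partial_{y_2}$ component, so $[X, Y] = \partial_{x_2}$ does not lie in $E$. Under these assumptions $E$ is Lagrangian but not involutive, and hence not a polarisation.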
[remark: Lagrangian Foliations and Integrable Systems]
A related situation studied on Example Sheet 3 is when $(M, \omega)$ is foliated by Lagrangian tori (i.e. $E$ is abelian, not merely involutive). This corresponds to the setting of integrable systems in classical mechanics, where the Liouville–Arnold theorem guarantees that compact connected regular level sets of a maximal family of Poisson-commuting conserved quantities are tori.
[/remark]
## Riemannian Structures
A Riemannian metric is the second fundamental type of geometric structure on a manifold. While a symplectic form is skew-symmetric and non-degenerate, a Riemannian metric is symmetric and positive definite — it assigns a notion of length and angle to each tangent space.
[definition: Metric on a Vector Bundle]
A **metric** on a smooth vector bundle $E \to M$ is a smoothly varying family of inner products on the fibres, given by $g \in \Gamma(E^* \otimes E^*)$ that is symmetric and fibrewise positive definite.
When $E = TM$, $g$ is called a **Riemannian metric** on $M$, and $(M, g)$ is a **Riemannian manifold**.
[/definition]
In local coordinates $\{x_i\}$ on $M$, a Riemannian metric takes the form $g = \sum_{i,j} g_{ij}\, dx_i \otimes dx_j$ where the matrix $(g_{ij})$ is symmetric and positive definite. This shows Riemannian metrics exist locally; by a partition of unity argument, they exist globally on any smooth manifold.
It is worth comparing symplectic forms and Riemannian metrics: a symplectic form is a non-degenerate (and closed) element of $\Gamma(\Lambda^2 T^*M)$, while a Riemannian metric is a positive-definite, and in particular non-degenerate, element of $\Gamma(S^2(T^*M))$, the second symmetric power. These are the two natural non-degenerate 2-tensors on a manifold.
Not every connection on $TM$ interacts well with a metric. To see what can go wrong, take $M = \mathbb{R}^n$ with $g = \sum_i dx_i \otimes dx_i$ the flat Euclidean metric, and define a connection by $\nabla_X Y = X \cdot Y + \alpha(X)\, AY$ for a fixed non-zero 1-form $\alpha$ (say $\alpha = dx_1$) and a fixed matrix $A \in \mathfrak{gl}_n(\mathbb{R})$ that is not skew-symmetric; the factor $\alpha(X)$ is what makes $\nabla_X Y$ linear over $C^\infty(M)$ in $X$. Then $X \cdot g(Y, Z) - g(\nabla_X Y, Z) - g(Y, \nabla_X Z) = -\alpha(X)\bigl(g(AY, Z) + g(Y, AZ)\bigr) = -\alpha(X) \sum_{i,j} (A + A^\top)_{ij} Y_i Z_j$, which is non-zero for suitable $X, Y, Z$ because $A + A^\top \neq 0$, so the Leibniz rule fails. Consequently parallel transport does not preserve the metric: for $A = \mathrm{id}$ and $\alpha = dx_1$, a parallel field along the $x_1$-axis satisfies $Y' = -Y$, so its length decays like $e^{-t}$. Metric compatibility is the condition that rules this out.
[definition: Metric Compatibility]
A connection $A$ on $E$ is **compatible with a metric** $g$ on $E$ if $d_{A^* \otimes A^*}(g) = 0$, i.e. $g$ is covariant constant with respect to the induced connection on $E^* \otimes E^*$.
Equivalently, for all $u, v \in \Gamma(E)$:
\begin{align*}
d(g(u, v)) = g(d_A u, v) + g(u, d_A v) \in \Gamma(T^*M),
\end{align*}
which on evaluating at $X \in \Gamma(TM)$ says $X \cdot g(u, v) = g(\nabla_X u, v) + g(u, \nabla_X v)$. For $E = TM$ this reads $X \cdot g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z)$ for all vector fields $X, Y, Z$.
[/definition]
This is the natural analogue of the product rule for derivatives: the connection is metric compatible when it differentiates inner products by the Leibniz rule.
### The Levi-Civita Connection
The central theorem of Riemannian geometry is that a Riemannian manifold has a unique torsion-free connection compatible with its metric. Without the torsion-freeness condition, uniqueness fails: if $\nabla$ is the Levi-Civita connection and $B \in \Gamma(T^*M \otimes \mathrm{End}(TM))$ is any section for which each $B(X, \cdot)$ is skew-adjoint, meaning $g(B(X, Y), Z) = -g(Y, B(X, Z))$, then $\tilde{\nabla}_X Y = \nabla_X Y + B(X, Y)$ is again metric-compatible (since adding a skew endomorphism preserves the metric pairing), but its torsion is $T^{\tilde{\nabla}}(X, Y) = B(X, Y) - B(Y, X)$, which is non-zero whenever $B \neq 0$. Lowering an index with $g$ identifies such $B$ with sections of $T^*M \otimes \Lambda^2 T^*M$, so dropping torsion-freeness opens up a family of metric-compatible connections parametrised by $\Gamma(T^*M \otimes \Lambda^2 T^*M)$, and uniqueness is lost. Requiring torsion-freeness selects exactly one.
[quotetheorem:1552]
[citeproof:1552]
The proof is elegant but opaque: it does not tell us what $A_{LC}$ actually looks like. The alternative viewpoint below provides an explicit formula.
### The Koszul Formula
Starting from the two defining properties of the Levi-Civita connection — metric compatibility and torsion-freeness — one derives the **Koszul formula** that uniquely determines $\nabla_X Y$:
\begin{align*}
g(\nabla_X Y, Z) = \frac{1}{2}\bigl[X \cdot g(Y,Z) + Y \cdot g(Z,X) - Z \cdot g(X,Y) + g([X,Y],Z) - g([X,Z],Y) - g([Y,Z],X)\bigr].
\end{align*}
Since the right-hand side depends only on $X, Y, Z$ and $g$, non-degeneracy of $g$ determines $\nabla_X Y$ uniquely — this is another proof of uniqueness. Existence follows by taking this as a definition and verifying the four properties: linearity over $C^\infty(M)$ in $X$, the Leibniz rule in $Y$, torsion-freeness $\nabla_X Y - \nabla_Y X = [X,Y]$, and metric compatibility.
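As a sanity check on the last two properties, one can antisymmetrise and symmetrise the Koszul formula directly. The following is a sketch of that verification; the shorthand $K(X,Y,Z)$ for the right-hand side of the Koszul formula is introduced here only for convenience.
\begin{align*}
K(X,Y,Z) - K(Y,X,Z) &= \tfrac{1}{2}\bigl[g([X,Y],Z) - g([Y,X],Z)\bigr] = g([X,Y],Z), \\
K(X,Y,Z) + K(X,Z,Y) &= \tfrac{1}{2}\bigl[2\, X \cdot g(Y,Z) - g([Y,Z],X) - g([Z,Y],X)\bigr] = X \cdot g(Y,Z).
\end{align*}
In the first line, the three derivative terms and the last two bracket terms of $K$ are unchanged under $X \leftrightarrow Y$ and cancel, leaving $g(\nabla_X Y - \nabla_Y X, Z) = g([X,Y], Z)$, which is torsion-freeness. In the second line, every term except $X \cdot g(Y,Z)$ cancels in pairs, leaving $g(\nabla_X Y, Z) + g(Y, \nabla_X Z) = X \cdot g(Y,Z)$, which is metric compatibility.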
In local coordinates $x_1, \ldots, x_n$ with $\partial_i = \partial_{x_i}$, writing $g_{ab} = g(\partial_a, \partial_b)$ and using $[\partial_i, \partial_j] = 0$, the Koszul formula gives:
\begin{align*}
g(\nabla_{\partial_i} \partial_j, \partial_k) = \frac{1}{2}\bigl[\partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij}\bigr].
\end{align*}
The coefficients $\Gamma^k_{ij}$ defined by $\nabla_{\partial_i} \partial_j = \sum_k \Gamma^k_{ij} \partial_k$ are the **Christoffel symbols**, and satisfy $\Gamma^k_{ij} = \frac{1}{2}\sum_m g^{km}(\partial_i g_{jm} + \partial_j g_{im} - \partial_m g_{ij})$ where $(g^{km})$ is the inverse of $(g_{km})$.
**How to compute Christoffel symbols in practice.** Given metric coefficients $g_{ij}$ in a coordinate chart $(U, x_1, \ldots, x_n)$:
1. Compute $(g^{km})$ by inverting the matrix $(g_{km})$.
2. For each triple $(k, i, j)$, form the combination $\partial_i g_{jm} + \partial_j g_{im} - \partial_m g_{ij}$ for each $m$.
3. Contract with $g^{km}$ and multiply by $\tfrac{1}{2}$:
\begin{align*}
\Gamma^k_{ij} = \frac{1}{2}\sum_{m=1}^n g^{km}\bigl(\partial_i g_{jm} + \partial_j g_{im} - \partial_m g_{ij}\bigr).
\end{align*}
For example, on $\mathbb{R}^2$ in polar coordinates $(r, \theta)$ with $g = dr \otimes dr + r^2 d\theta \otimes d\theta$, so $g_{rr} = 1$, $g_{\theta\theta} = r^2$, $g_{r\theta} = 0$, the only non-zero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = \tfrac{1}{r}$, as computed directly from the formula above.
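The polar-coordinate values can be reproduced by following the three steps literally. This is a worked sketch using only the metric coefficients stated above; the inverse metric is $g^{rr} = 1$, $g^{\theta\theta} = r^{-2}$, $g^{r\theta} = 0$, and the only non-zero derivative of a metric coefficient is $\partial_r g_{\theta\theta} = 2r$.
\begin{align*}
\Gamma^r_{\theta\theta} &= \tfrac{1}{2}\, g^{rr}\bigl(\partial_\theta g_{\theta r} + \partial_\theta g_{\theta r} - \partial_r g_{\theta\theta}\bigr) = \tfrac{1}{2}(0 + 0 - 2r) = -r, \\
\Gamma^\theta_{r\theta} &= \tfrac{1}{2}\, g^{\theta\theta}\bigl(\partial_r g_{\theta\theta} + \partial_\theta g_{r\theta} - \partial_\theta g_{r\theta}\bigr) = \tfrac{1}{2} \cdot \frac{1}{r^2} \cdot 2r = \frac{1}{r}.
\end{align*}
Every other symbol vanishes because its defining combination contains no $\partial_r g_{\theta\theta}$ term that survives contraction with the diagonal inverse metric.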
### Flat Riemannian Manifolds
The Levi-Civita connection brings its own curvature tensor $F_{LC}$. When this curvature vanishes, the manifold is locally indistinguishable from Euclidean space.
[quotetheorem:1553]
[citeproof:1553]
The argument uses two ingredients in an essential way: flatness of the connection to produce covariant-constant coordinate fields, and torsion-freeness to ensure those fields integrate to genuine coordinates. If one drops torsion-freeness, the covariant-constant sections need not commute (their Lie bracket need not vanish), and the forms $d\tilde{x}_i$ need not be closed, so the local coordinate construction fails. More geometrically, the obstruction to local flatness in the torsion-free case is measured by a single tensor — the Riemann curvature — which we now define. Sectional curvature, the two-dimensional version, can be thought of as the rate at which geodesics emanating from a point spread apart or converge, and it vanishes precisely when the manifold is locally Euclidean.
## Riemann Curvature
What intrinsic invariant of $(M, g)$ measures the failure of parallel transport around small loops to be trivial? In Chapter 3, we defined the curvature of an arbitrary connection on a vector bundle: $F_A(X, Y) = \nabla_X \nabla_Y - \nabla_Y \nabla_X - \nabla_{[X,Y]}$, and showed it measured the holonomy of infinitesimal loops. On a Riemannian manifold the Levi-Civita connection plays the role of $A$, and its curvature, the Riemann curvature tensor, is the fundamental local invariant of the metric. Unlike the symplectic case (where Darboux's theorem says there are no local invariants at all), Riemannian geometry is genuinely locally non-trivial: the Riemann curvature vanishes identically exactly when the metric is locally Euclidean, and understanding its symmetries is the first step toward computing it.
Let $(M, g)$ be a Riemannian manifold with Levi-Civita connection $A_{LC}$ and associated covariant derivatives $\nabla_X$.
[definition: Riemann Curvature]
The **Riemann curvature** of $(M, g)$ is the curvature $F_{LC}$ of the Levi-Civita connection:
\begin{align*}
F_{LC} \in \Omega^2(\mathrm{End}(TM)) = \Gamma(TM \otimes T^*M \otimes \Lambda^2 T^*M).
\end{align*}
For vector fields $X, Y$, the curvature operator is $R(X, Y) = \nabla_X \nabla_Y - \nabla_Y \nabla_X - \nabla_{[X,Y]}$.
[/definition]
In local coordinates $x_1, \ldots, x_n$ with basis $\partial_i = \partial_{x_i}$, the curvature is recorded by the **(1,3)-curvature coefficients** $R^i_{jkl}$ defined by
\begin{align*}
R(\partial_k, \partial_l)(\partial_j) = \sum_i R^i_{jkl} \partial_i.
\end{align*}
Lowering the first index with the metric, the **(0,4)-curvature tensor** is $R_{ijkl} := g(R(\partial_k, \partial_l)\partial_j, \partial_i) = \sum_m g_{mi} R^m_{jkl}$.
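To see these index conventions in action, consider the unit round sphere $S^2$ with coordinates $(\varphi, \theta)$ and metric $g = d\varphi \otimes d\varphi + \sin^2\!\varphi \, d\theta \otimes d\theta$. This is an illustrative sketch, not an example taken from the course; the Christoffel symbols used are the ones produced by the coordinate formula for $\Gamma^k_{ij}$, namely $\Gamma^\varphi_{\theta\theta} = -\sin\varphi\cos\varphi$ and $\Gamma^\theta_{\varphi\theta} = \Gamma^\theta_{\theta\varphi} = \cot\varphi$, with all others zero.
\begin{align*}
\nabla_{\partial_\varphi} \nabla_{\partial_\theta} \partial_\theta &= \nabla_{\partial_\varphi}\bigl(-\sin\varphi\cos\varphi \, \partial_\varphi\bigr) = -(\cos^2\!\varphi - \sin^2\!\varphi)\, \partial_\varphi, \\
\nabla_{\partial_\theta} \nabla_{\partial_\varphi} \partial_\theta &= \nabla_{\partial_\theta}\bigl(\cot\varphi \, \partial_\theta\bigr) = \cot\varphi \,(-\sin\varphi\cos\varphi)\, \partial_\varphi = -\cos^2\!\varphi \, \partial_\varphi, \\
R(\partial_\varphi, \partial_\theta)\partial_\theta &= \sin^2\!\varphi \, \partial_\varphi .
\end{align*}
Hence $R^\varphi_{\theta\varphi\theta} = \sin^2\!\varphi$ and, lowering the index with $g_{\varphi\varphi} = 1$, $R_{\varphi\theta\varphi\theta} = \sin^2\!\varphi$. Dividing by $g_{\varphi\varphi}\, g_{\theta\theta} = \sin^2\!\varphi$ gives the expected constant value $1$.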
### Symmetries of the Curvature Tensor
The $(0,4)$-tensor $R_{ijkl}$ has four index slots, each ranging over $\{1, \ldots, n\}$, so naively it carries $n^4$ independent components. For $n = 4$ (relevant to general relativity), this is $256$ numbers. How much information does $R$ really carry? The answer comes from the symmetries of the curvature tensor, which cut the independent component count dramatically — for $n = 4$ down to $20$ independent components — and which express deep geometric facts about the Levi-Civita connection.
The curvature tensor satisfies the following symmetries:
**(i)** $R_{ijkl} = -R_{ijlk}$ — skew-symmetry in the last two indices, since $R(\partial_k, \partial_l) = -R(\partial_l, \partial_k)$.
**(ii)** $R_{ijkl} = -R_{jikl}$, that is, skew-symmetry in the first two indices. This follows from metric compatibility: each curvature operator $R(X, Y)$ is a skew-symmetric endomorphism of $(TM, g)$.
**(iii) First Bianchi Identity**: $R_{ijkl} + R_{iklj} + R_{iljk} = 0$, or more compactly $R_{i[jkl]} = 0$. This follows from the vanishing torsion: since $[\partial_k, \partial_l] = 0$ and torsion-freeness gives $\nabla_{\partial_k}\partial_j = \nabla_{\partial_j}\partial_k$, expanding $R(\partial_k, \partial_l)\partial_j + R(\partial_l, \partial_j)\partial_k + R(\partial_j, \partial_k)\partial_l$ shows all terms cancel.
**(iv)** $R_{ijkl} = R_{klij}$ — the tensor is symmetric under exchange of the two pairs of indices. This follows from the other three symmetries by writing out the first Bianchi identity for all orderings of the indices and combining.
**(v) Second Bianchi Identity**: at the centre $p$ of geodesic normal coordinates, $\partial_i(R^m_{ljk}) + \partial_j(R^m_{lki}) + \partial_k(R^m_{lij}) = 0$. In arbitrary coordinates the same identity holds with covariant derivatives in place of partial derivatives, written $R^m_{l[ij;k]} = 0$ (the semicolon denoting covariant differentiation). This comes from the second Bianchi identity $d_{A^* \otimes A}(F_A) = 0$ for any connection. In geodesic normal coordinates the Christoffel symbols vanish at $p$, so at $p$ the covariant exterior derivative of $F_{LC}$ reduces to the ordinary exterior derivative of its coefficient matrix, giving the stated relation.
[remark: Geodesic Normal Coordinates]
On any Riemannian manifold $(M, g)$, there exist local coordinates at any point $p$ in which all Christoffel symbols of $A_{LC}$ vanish at $p$ (geodesic normal coordinates, constructed in Chapter 5 via geodesics). The second Bianchi identity in the partial-derivative form above is valid at the point $p$ in these coordinates.
[/remark]
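The count of $20$ independent components for $n = 4$ can be recovered by a standard counting argument, sketched here; the symbol $N$ is introduced only for this computation. Symmetries (i), (ii) and (iv) say that $R$ is a symmetric bilinear form on the space of 2-forms, which has dimension $N = \tfrac{n(n-1)}{2}$. The first Bianchi identity (iii) imposes a new constraint only when all four indices are distinct, which gives $\binom{n}{4}$ further relations.
\begin{align*}
\#\{\text{independent components}\} &= \frac{N(N+1)}{2} - \binom{n}{4} = \frac{n^2(n^2-1)}{12}, \\
n = 2: \quad & 1 - 0 = 1, \qquad n = 3: \quad 6 - 0 = 6, \qquad n = 4: \quad 21 - 1 = 20 .
\end{align*}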
### Ricci and Scalar Curvature
The Riemann tensor encodes every local curvature datum of $(M, g)$, but for many purposes (geometric analysis on manifolds, the study of Einstein metrics, comparison geometry) one wants a coarser invariant that still captures meaningful curvature information. Tracing the Riemann tensor against the metric produces such invariants: a $(0,2)$-tensor called the Ricci curvature, and a scalar function called the scalar curvature. These traced objects lose the directional refinement of the full Riemann tensor, but they retain enough to formulate global curvature hypotheses like $\mathrm{Ric} \geq 0$ or $S > 0$ and to write down the Einstein equations. The next two definitions introduce Ricci and scalar curvature; the symmetry of Ricci (forced by the pair symmetry (iv) of the curvature tensor) is what makes them useful.
[definition: Ricci Curvature]
The **Ricci curvature** $\mathrm{Ric} \in \Gamma(T^*M \otimes T^*M)$ is the trace of the Riemann curvature tensor:
\begin{align*}
\mathrm{Ric}(e_i, e_j) = \sum_k R_{kikj}
\end{align*}
for $\{e_i\}$ a local orthonormal basis of $T_pM$ with respect to $g$. In a general (non-orthonormal) local basis $\{\partial_i\}$:
\begin{align*}
\mathrm{Ric}(\partial_i, \partial_j) = \sum_{l,m} g^{lm} R_{limj}.
\end{align*}
[/definition]
The pair symmetry (iv), itself a consequence of the first Bianchi identity (iii) together with (i) and (ii), shows that $\mathrm{Ric}$ is always symmetric, i.e. $\mathrm{Ric}(\partial_i, \partial_j) = \mathrm{Ric}(\partial_j, \partial_i)$. In particular $\mathrm{Ric}$ is a symmetric bilinear form of the same type as $g$, a fact used repeatedly below.
Ricci curvature is a $(0,2)$-tensor, a pointwise bilinear form on each tangent space, and it still carries directional information — the Ricci curvature in a particular direction $v$ measures the average sectional curvature of the 2-planes containing $v$. To obtain a purely scalar quantity at each point, one takes one more trace, this time pairing Ricci against the metric to contract away both remaining indices. The result is the scalar curvature, a single real-valued function on $M$ that captures the "total" curvature at each point. Scalar curvature is the crudest but most accessible curvature invariant, and it features prominently in the Yamabe problem, the prescribed scalar curvature equation, and the positive mass theorem.
[definition: Scalar Curvature]
The **scalar curvature** is the trace of the Ricci curvature:
\begin{align*}
S = \sum_{i=1}^n \mathrm{Ric}(e_i, e_i),
\end{align*}
for $\{e_i\}$ a local orthonormal basis. It is a smooth function $S : M \to \mathbb{R}$.
[/definition]
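As an illustration of both definitions, take the unit round sphere $S^2$ with $g = d\varphi \otimes d\varphi + \sin^2\!\varphi \, d\theta \otimes d\theta$. The sketch below assumes the standard value $R_{\varphi\theta\varphi\theta} = \sin^2\!\varphi$ for this metric (in the index convention $R_{ijkl} = g(R(\partial_k, \partial_l)\partial_j, \partial_i)$), from which $R_{\theta\varphi\theta\varphi} = \sin^2\!\varphi$ follows by symmetries (i) and (ii). Using the coordinate formula for $\mathrm{Ric}$ with $g^{\varphi\varphi} = 1$ and $g^{\theta\theta} = 1/\sin^2\!\varphi$:
\begin{align*}
\mathrm{Ric}(\partial_\varphi, \partial_\varphi) &= g^{\theta\theta} R_{\theta\varphi\theta\varphi} = 1 = g_{\varphi\varphi}, \\
\mathrm{Ric}(\partial_\theta, \partial_\theta) &= g^{\varphi\varphi} R_{\varphi\theta\varphi\theta} = \sin^2\!\varphi = g_{\theta\theta}, \\
S &= \mathrm{Ric}(e_1, e_1) + \mathrm{Ric}(e_2, e_2) = 1 + 1 = 2,
\end{align*}
where $e_1 = \partial_\varphi$ and $e_2 = \partial_\theta / \sin\varphi$ form an orthonormal frame. So on this example $\mathrm{Ric} = g$ and the scalar curvature is the constant $2$.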
### Einstein Manifolds
Both $\mathrm{Ric}$ and $g$ are symmetric $(0,2)$-tensors — elements of $\Gamma(T^*M \otimes T^*M)$. When these two tensors are proportional, the manifold carries a particularly uniform curvature distribution.
[definition: Einstein Manifold]
A Riemannian manifold $(M, g)$ is **Einstein** if there exists $\lambda \in C^\infty(M)$ such that $\mathrm{Ric} = \lambda g$.
[/definition]
Since both sides are symmetric, the Einstein condition is a system of $\frac{n(n+1)}{2}$ scalar equations on the metric. A priori $\lambda$ is an arbitrary smooth function, but the next theorem shows that on any connected manifold of dimension $\geq 3$ it must be a constant. Before reading that theorem, note the following physical interpretation.
The Einstein condition is not merely a piece of differential-geometric bookkeeping; it was the guiding equation in Einstein's original formulation of general relativity, and the Riemannian version studied here is the mathematical analogue in positive-definite signature. In the Lorentzian setting, the same equation — suitably interpreted, and with matter sources included via the stress–energy tensor — governs the curvature of spacetime itself. The Riemannian case is easier to study rigorously and has its own rich theory (Einstein metrics on compact manifolds, moduli of Einstein structures, existence and rigidity results), but it is illuminating to pause and note the physical pedigree of the condition before moving on.
[remark: General Relativity]
In Lorentzian signature $(+,-,-,-)$, the Einstein condition $\mathrm{Ric} = \lambda g$ (with $\lambda$ related to the cosmological constant) is precisely Einstein's vacuum field equation from general relativity.
[/remark]
The following proposition shows that the proportionality function $\lambda$ cannot vary — it is forced to be a constant on any connected Einstein manifold of dimension at least 3.
[quotetheorem:1554]
[citeproof:1554]
In particular, every connected Einstein manifold of dimension $\geq 3$ has constant scalar curvature $S = n\lambda$.
The constancy of $\lambda$ reduces the Einstein condition from a variable-coefficient identity $\mathrm{Ric} = \lambda(x) g$ to a rigid equation $\mathrm{Ric} = cg$ with a single numerical parameter, but it does not explain where Einstein metrics come from in practice. Constructing explicit examples directly from the partial differential equation $\mathrm{Ric} = cg$ is hard, since this is a non-linear second-order system for the metric. The most productive strategy is instead to use symmetry: if a manifold has a sufficiently large isometry group, the group action forces enough algebraic constraints on $\mathrm{Ric}$ that it must be proportional to $g$. The following subsection develops this circle of ideas and yields large families of Einstein manifolds — round spheres, compact Lie groups with bi-invariant metrics, and many homogeneous spaces — directly from a representation-theoretic hypothesis on the isotropy action.
### Constructing Einstein Manifolds from Group Symmetry
A powerful source of Einstein manifolds comes from Riemannian manifolds with large isometry groups.
[quotetheorem:1555]
[citeproof:1555]
Irreducibility of the $H$-action is the key hypothesis. It rules out any non-trivial $H$-invariant splitting of $T_mM$ into a direct sum of subspaces on which $\theta$ could take different eigenvalues. To see why reducibility can break the conclusion, consider $M = S^2 \times S^2$ with a product metric $g = g_1 + g_2$ in which the two factors are round spheres of radii $r_1$ and $r_2$, together with the product action of $G = O(3) \times O(3)$. This action is isometric and transitive, but the stabiliser of a point $(e_1, e_1)$ is $O(2) \times O(2)$, which acts on $T_{(e_1,e_1)}M \cong \mathbb{R}^2 \oplus \mathbb{R}^2$ reducibly (preserving each $\mathbb{R}^2$ factor). The Ricci curvature is proportional to the metric on each factor with its own constant: $\mathrm{Ric} = \lambda_1 g_1 + \lambda_2 g_2$ with $\lambda_i = 1/r_i^2$. Scaling the two sphere metrics independently so that $r_1 \neq r_2$ gives $\lambda_1 \neq \lambda_2$, which is not an Einstein metric on the product even though the isometry group acts transitively.
[example: Round Sphere]
The orthogonal group $O(n+1)$ acts on $S^n \subset \mathbb{R}^{n+1}$ isometrically and transitively when $S^n$ carries the metric induced from the Euclidean metric on $\mathbb{R}^{n+1}$. The stabiliser of $e_1 = (1, 0, \ldots, 0)$ is $H \cong O(n)$, which acts on $T_{e_1}S^n \cong \mathbb{R}^n$ by the standard representation. Since $O(n)$ acts irreducibly on $\mathbb{R}^n$, the theorem implies $(S^n, g_{\mathrm{round}})$ is Einstein.
[/example]
The round sphere illustrates one large family of Einstein manifolds: homogeneous spaces $G/H$ where $H$ acts irreducibly on the tangent space. The orthogonal group itself provides a second example, where the relevant action is the adjoint representation on the Lie algebra.
[example: Orthogonal Group with Bi-Invariant Metric]
Let $M = O(n)$. The tangent space at the identity is $T_e O(n) = \mathfrak{g} = \{\text{skew-symmetric } n \times n \text{ matrices}\}$. Define an inner product $g_e(A, B) = -\mathrm{tr}(AB)$ on $\mathfrak{g}$ (one checks this is positive definite). Extend $g$ to a left-invariant metric on $O(n)$ by requiring $\langle L_h \alpha, L_h \beta \rangle_h = \langle \alpha, \beta \rangle_e$ for all $h \in O(n)$. Since the trace is conjugation-invariant, $g$ is also right-invariant, hence bi-invariant. Set $G = O(n) \times O(n) \leq \mathrm{Isom}(O(n), g)$ acting by $(h, k) \cdot m = hmk^{-1}$. This action is transitive (given $m \in O(n)$, take $h = m$, $k = e$). The stabiliser of $e$ is $\{(h, h) : h \in O(n)\} \cong O(n)$, which acts on $T_e O(n) = \mathfrak{g}$ by conjugation $A \mapsto hAh^{-1}$. Since $O(n)$ acts irreducibly on $\mathfrak{g}$ by this adjoint action, the Irreducible Isometry Action Implies Einstein theorem above implies $(O(n), g)$ is Einstein.
[/example]
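The two facts left to the reader in the orthogonal group example, positive definiteness of $g_e$ and its invariance under conjugation, are short computations. The following sketch writes a skew-symmetric matrix as $A = (a_{ij})$ with $A^\top = -A$:
\begin{align*}
g_e(A, A) &= -\mathrm{tr}(A^2) = \mathrm{tr}(A^\top A) = \sum_{i,j} a_{ij}^2 \geq 0, \quad \text{with equality only for } A = 0, \\
g_e(hAh^{-1}, hBh^{-1}) &= -\mathrm{tr}(hABh^{-1}) = -\mathrm{tr}(AB) = g_e(A, B) \quad \text{for all } h \in O(n).
\end{align*}
The second line is the statement that the stabiliser acts on $\mathfrak{g}$ by isometries of $g_e$, which is what allows the left-invariant extension to be right-invariant as well.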
The symmetry-based construction produces many Einstein manifolds but is ultimately a positive tool: it tells us when an Einstein metric exists, not when one cannot exist. For the opposite direction — excluding Einstein metrics on a given smooth manifold — we need topological obstructions that depend only on the diffeomorphism type of $M$, not on any particular metric. Chern–Weil theory provides exactly such obstructions by extracting characteristic classes from the curvature of the Levi-Civita connection and integrating them against $M$. On a closed oriented 4-manifold, this produces a striking constraint: the sign of the Euler characteristic controls whether Einstein metrics can exist at all, giving a purely topological obstruction to Einstein geometry.
### An Obstruction from Chern–Weil Theory
Chern–Weil theory produces characteristic classes in $H^*_{\mathrm{dR}}(M)$ from invariant polynomials in the curvature (traces of powers and, for oriented bundles of even rank, the Pfaffian). The Euler class $e(M) \in H^n_{\mathrm{dR}}(M)$ of a closed oriented $n$-manifold satisfies $\int_M e(M) = \chi(M)$.
For a closed, oriented 4-manifold $M^4$ with Riemannian metric $g$, this gives:
\begin{align*}
\chi(M) = \frac{1}{8\pi^2} \int_M \bigl(|R|^2 - |z|^2\bigr) \, \mathrm{vol}_g,
\end{align*}
where $R$ and $z$ are expressions in the curvature $F_{LC}$, and $z = 0 \iff g$ is Einstein. Since $|R|^2 \geq 0$, it follows that:
\begin{align*}
(M, g) \text{ is Einstein} \implies z = 0 \implies \chi(M) \geq 0.
\end{align*}
This provides a topological obstruction: if $\chi(M) < 0$, then $M$ admits no Einstein metric at all. For instance, the 4-torus $T^4$ has $\chi(T^4) = 0$, so this obstruction does not rule it out (and indeed the flat metric on $T^4$ is Einstein with $\lambda = 0$). By contrast, a product such as $S^2 \times \Sigma_2$, with $\Sigma_2$ a closed oriented surface of genus 2, has $\chi = 2 \cdot (-2) = -4 < 0$ and therefore carries no Einstein metric.
The Riemannian and curvature structures developed so far are not merely abstract; they govern the behavior of curves. Geodesics are the objects that capture what "straight" means on a curved manifold, and their properties — existence, length-minimization, stability — reveal how local geometry controls global topology through the Ricci and sectional curvatures.
# 5. Geodesics
Chapter 4 developed the Riemannian and Riemann curvature structures on a manifold, culminating in the Riemann curvature tensor and the Levi-Civita connection. Chapter 5 puts these tools to work by studying **geodesics** — the curves that play the role of straight lines on a curved manifold. The central questions are: what does "straightness" mean intrinsically, how do we construct such curves, and what do they reveal about the global shape of $(M, g)$? The answers touch the exponential map, canonical coordinate systems, the energy functional and its variations, and ultimately Myers' Theorem, which converts a curvature bound into a topological conclusion.
## Covariant Differentiation Along a Curve
The geodesic equation will read $\nabla_{\partial_t} \gamma'(t) = 0$, so before we can even write it down we need a meaning for $\nabla_{\partial_t}$ applied to a vector field defined only along the curve $\gamma : (a,b) \to M$. The naive attempt is to extend the field $Y$ along $\gamma$ (for instance $Y = \gamma'$) to a vector field $X$ on a neighbourhood in $M$ and set $\nabla_{\partial_t} Y := \nabla^{\mathrm{aff}}_{\gamma'} X$. This fails in general because such an extension need not exist: the curve may cross itself with different velocities, or it may have dense image. For instance, an irrational line on the torus passes arbitrarily close to every point, and a field along it such as $Y(t) = t\,\gamma'(t)$ takes arbitrarily large values at points of the curve arbitrarily close to $\gamma(t)$, so it admits no smooth (or even continuous) extension to a neighbourhood of $\gamma(t)$. What we need is an operator intrinsic to the curve itself, one that differentiates sections of the pullback bundle $\gamma^* TM$ (whose fibre over $t$ is $T_{\gamma(t)}M$) without ever leaving the curve. The rest of this section constructs such an operator and pins it down uniquely by two natural axioms.
Given any affine connection $\nabla^{\mathrm{aff}}$ on $M$ (which always exists; in particular the Levi-Civita connection always exists by the Fundamental Theorem of Riemannian Geometry in Chapter 4, and connections on general bundles were studied in Chapter 3), we want an operator that differentiates sections of $\gamma^* TM$ with respect to $t$.
[quotetheorem:1556]
[citeproof:1556]
The two axioms that pin down $\nabla_{\partial_t}$ are not redundant: dropping either one destroys uniqueness. Without the Leibniz rule (i), there is no way to tell the operator how to act on general sections, since the local formula above is precisely what (i) extracts from the action on a basis; one could multiply $Y$ by any function $f$ without changing the output. Without consistency with $\nabla^{\mathrm{aff}}$ (ii), the operator is only constrained to behave like a derivation in $t$, and any choice of $\Gamma^i_{jk}$ — including the zero connection — would satisfy (i) alone. It is precisely the interplay of the two axioms that forces the Christoffel correction and thereby ties covariant differentiation along $\gamma$ to the ambient geometry. The next remark addresses a further subtlety about why (ii) cannot simply be used as a definition on its own.
[remark: Why Not Just Use (ii) as a Definition]
One might hope to define $\nabla_{\partial_t} Y$ by simply choosing a local extension $X$ of $Y$ to all of $M$ and setting $\nabla_{\partial_t} Y = \nabla^{\mathrm{aff}}_{\gamma'} X$. The obstruction is that $Y$ need not extend locally: if $\gamma$ has dense image (e.g. a curve winding densely on a torus), no such $X$ can be found near most points of $(a,b)$. The proof above circumvents this by working directly in a chart, where the coordinate vector fields always extend.
[/remark]
The formula
\begin{align*}
\nabla_{\partial_t} Y\big|_t = \sum_i \Bigl( Y_i'(t) + \sum_{j,k} \Gamma^i_{jk}\, x_k'(t)\, Y_j(t) \Bigr) \partial_{x_i}
\end{align*}
is fundamental: it says that differentiating a vector field along $\gamma$ involves not just the rate of change of its components $Y_i$, but also how the ambient connection "rotates" the basis as one moves along the curve. On flat $\mathbb{R}^n$ all $\Gamma^i_{jk} = 0$ and the formula reduces to ordinary differentiation of components.
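A small worked case shows the Christoffel correction doing real work. The sketch below uses the polar-coordinate Christoffel symbols on $\mathbb{R}^2$ from Chapter 4, $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$; the circle of radius $r_0$ and the test field are choices made for this illustration. Along $\gamma(t) = (r, \theta) = (r_0, t)$ we have $r' = 0$ and $\theta' = 1$, so for $Y = Y_r \partial_r + Y_\theta \partial_\theta$:
\begin{align*}
\nabla_{\partial_t} Y = \bigl(Y_r' - r_0\, Y_\theta\bigr)\, \partial_r + \Bigl(Y_\theta' + \frac{1}{r_0}\, Y_r\Bigr)\, \partial_\theta .
\end{align*}
The constant Cartesian field $\partial_x = \cos\theta \, \partial_r - \frac{\sin\theta}{r}\, \partial_\theta$ restricts to $Y_r = \cos t$, $Y_\theta = -\frac{\sin t}{r_0}$, and substituting gives
\begin{align*}
Y_r' - r_0\, Y_\theta = -\sin t + \sin t = 0, \qquad Y_\theta' + \frac{1}{r_0}\, Y_r = -\frac{\cos t}{r_0} + \frac{\cos t}{r_0} = 0 .
\end{align*}
So the field is parallel along the circle, as it must be, even though its polar components are not constant.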
## Geodesics and the Geodesic Equation
A geodesic should be a curve that "goes straight": one whose velocity vector is parallel-transported along itself. This is precisely what the vanishing of $\nabla_{\partial_t} \gamma'$ means.
[definition: Geodesic]
Let $M$ carry an affine connection $\nabla^{\mathrm{aff}}$. A smooth curve $\gamma : (a,b) \to M$ is a **geodesic** for $\nabla^{\mathrm{aff}}$ if
\begin{align*}
\nabla_{\partial_t} \gamma'(t) = 0 \quad \text{for all } t \in (a,b),
\end{align*}
where $\gamma' = \gamma_*(\partial_t)$ is the velocity vector field along $\gamma$.
[/definition]
Substituting $Y_i(t) = x_i'(t)$ (since $\gamma' = \sum_j x_j'(t)\, \partial_{x_j}|_{\gamma(t)}$) into the formula for $\nabla_{\partial_t}$ yields the **geodesic equations** in local coordinates:
\begin{align*}
x_i''(t) + \sum_{j,k} \Gamma^i_{jk}\, x_j'(t)\, x_k'(t) = 0 \qquad \text{for each } i = 1, \dots, n.
\end{align*}
This is a system of $n$ second-order ODEs in $n$ unknowns $(x_1(t), \dots, x_n(t))$. Standard ODE theory (existence and uniqueness of solutions with prescribed initial data) then gives:
[quotetheorem:1557]
The theorem is a direct application of the Picard–Lindelöf theorem to the geodesic ODE system (rewritten as a first-order system in the positions and velocities), which is why smoothness of the metric (and hence of the $\Gamma^i_{jk}$) matters: Picard–Lindelöf needs the right-hand side of the ODE to be locally Lipschitz, and smooth Christoffel symbols give that for free. Its significance is that **every point admits a geodesic in every direction**. Note, however, that the conclusion is strictly local: the interval $(-\varepsilon, \varepsilon)$ on which $\gamma$ is defined depends on $m$ and $v$, and it may shrink as $|v|$ grows (rescaling $v$ by $c > 0$ rescales the parameter interval by $1/c$) or as $m$ approaches a point missing from the manifold. The theorem gives **no** global existence; a geodesic may fail to extend for all time, as happens for instance on $\mathbb{R}^2 \setminus \{0\}$, where a geodesic aimed at the missing origin runs off the manifold in finite time. The question of when all geodesics extend globally is exactly the content of geodesic completeness, taken up in the Hopf–Rinow theorem below.
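The geodesic equations become concrete in polar coordinates on $\mathbb{R}^2$. This is a sketch using the Christoffel symbols $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ from Chapter 4; the two test curves are chosen only for illustration.
\begin{align*}
r'' - r\,(\theta')^2 = 0, \qquad \theta'' + \frac{2}{r}\, r'\, \theta' = 0 .
\end{align*}
The vertical line $x = 1$, $y = t$ has $r = \sqrt{1 + t^2}$ and $\theta = \arctan t$, so $r' = t/r$, $r'' = 1/r^3$, $\theta' = 1/r^2$ and $\theta'' = -2t/r^4$. Substituting,
\begin{align*}
r'' - r\,(\theta')^2 = \frac{1}{r^3} - \frac{r}{r^4} = 0, \qquad \theta'' + \frac{2}{r}\, r'\, \theta' = -\frac{2t}{r^4} + \frac{2}{r} \cdot \frac{t}{r} \cdot \frac{1}{r^2} = 0,
\end{align*}
so the straight line solves the system, as expected. The radial curve $r(t) = 1 - t$, $\theta(t) = \theta_0$ is also a solution, and it reaches $r = 0$ at $t = 1$: on $\mathbb{R}^2 \setminus \{0\}$ this geodesic cannot be extended past that time.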
## The Exponential Map and Distinguished Coordinates
Having secured local existence and uniqueness of geodesics, a natural question arises: is there a canonical coordinate chart around a point $p \in M$ in which geodesics through $p$ are simply straight lines through the origin? Such a chart would convert the nonlinear geodesic ODE, for geodesics through $p$, into the simple linear equation $x_i''(t) = 0$, and would let us transfer the linear-algebraic structure of $T_p M$ onto the manifold itself in a geometrically faithful way. The construction is forced on us by the theorem just proved: each tangent vector $v \in T_p M$ generates a unique geodesic $\gamma_v$, and we can package all of them into a single map by sending $v$ to the point reached at parameter time $1$. This map, the exponential map, turns out to be a local diffeomorphism near the origin, and the resulting coordinates (geodesic normal coordinates) are the natural generalisation of Cartesian coordinates to a Riemannian manifold. From now on we restrict to Riemannian manifolds $(M, g)$ and take $\nabla^{\mathrm{aff}}$ to be the Levi-Civita connection. This is the setting of most interest.
Let $\Omega \subset TM$ be the open neighbourhood of the zero section $0_M \subset TM$ consisting of all pairs $(m, v)$ for which the geodesic $\gamma_v$ with $\gamma_v(0) = m$, $\gamma_v'(0) = v$ is defined on some interval $(-\varepsilon, 1 + \delta)$ with $\delta > 0$ (that is, the geodesic survives past time $1$).
[definition: Geodesic Completeness]
A Riemannian manifold $(M, g)$ is **geodesically complete** if $\Omega = TM$, i.e. if every geodesic exists for all time.
[/definition]
Compact manifolds are always geodesically complete, since their metric spaces $(M, d)$ are always complete (see the Hopf–Rinow theorem below).
[definition: Exponential Map]
The **exponential map** is the smooth map $\exp : \Omega \to M$ sending $(m, v) \mapsto \gamma_v(1)$, where $\gamma_v$ is the unique geodesic through $m$ with initial velocity $v$.
For each $m \in M$, write $\exp_m := \exp|_{\Omega \cap T_m M}$ for the restriction to the tangent space at $m$.
[/definition]
[example: Lie Groups]
If $G$ is a Lie group carrying a bi-invariant metric $g$, then the exponential map $\exp_e : T_e G \cong \mathfrak{g} \to G$ in the Riemannian sense coincides with the classical Lie group exponential map $\exp : \mathfrak{g} \to G$ seen earlier in Chapter 1, because the geodesics of $g$ through $e$ are precisely the one-parameter subgroups. (Rescaling $g$ by a positive constant does not change the Levi-Civita connection, so the normalisation of $g$ plays no role.)
[/example]
Smooth dependence of ODE solutions on their initial data and the inverse function theorem combine to give the key properties of $\exp$:
(i) $\exp$ is smooth.
(ii) The map $\Phi : \Omega \to M \times M$, $\Phi(m, v) = (m, \exp_m(v))$, is a local diffeomorphism from a neighbourhood $W$ of $0_M$ to a neighbourhood of the diagonal $\Delta \subset M \times M$. This follows because $D(\exp_m)|_0 = \mathrm{id}_{T_m M}$, so $D\Phi_{(m,0)}$ is invertible.
(iii) For each $m \in M$ there exist a neighbourhood $U \ni m$ and $\varepsilon > 0$ such that for any $x, y \in U$ there is a unique $v \in T_x M$ with $g(v,v) \leq \varepsilon$ and $\exp_x(v) = y$. In other words, nearby points can always be joined by a unique short geodesic.
Think of $\exp_m$ as mapping a small ball in the vector space $(T_m M, g_m)$ diffeomorphically onto a "round" neighbourhood of $m$ in $M$, with the straight rays through the origin corresponding to geodesics through $m$.
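To make this picture concrete, here is a minimal numerical sketch on the unit sphere $S^2 \subset \mathbb{R}^3$. It assumes the standard closed form $\exp_p(v) = \cos|v|\, p + \sin|v|\, v/|v|$ for the round metric, which is not derived in these notes, and the function name `exp_sphere` is purely illustrative. The sketch checks by finite differences that $D(\exp_p)|_0$ acts as the identity on $T_p S^2$, and that a tangent vector of length $\pi$ is sent to the antipodal point.

```python
import math

def exp_sphere(p, v):
    """Exponential map of the unit sphere S^2 in R^3 (closed form, assumed).

    p: unit vector in R^3; v: tangent vector at p, i.e. dot(p, v) == 0.
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return tuple(p)                      # exp_p(0) = p
    return tuple(math.cos(norm) * pc + math.sin(norm) * vc / norm
                 for pc, vc in zip(p, v))

north = (0.0, 0.0, 1.0)

# Columns of D(exp_p)|_0 by a forward difference along the tangent
# directions e1 = (1, 0, 0) and e2 = (0, 1, 0) at the north pole.
h = 1e-6
d_e1 = tuple((a - b) / h for a, b in zip(exp_sphere(north, (h, 0.0, 0.0)), north))
d_e2 = tuple((a - b) / h for a, b in zip(exp_sphere(north, (0.0, h, 0.0)), north))

# A tangent vector of length pi is sent to the antipodal point.
south = exp_sphere(north, (math.pi, 0.0, 0.0))
```

Up to discretisation error, `d_e1` and `d_e2` reproduce $e_1$ and $e_2$, in line with $D(\exp_m)|_0 = \mathrm{id}$ in property (ii). The antipodal point is reached by every tangent vector of length $\pi$, so on the sphere $\exp_p$ stops being injective at radius $\pi$.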
### Geodesic Normal Coordinates
Take $m \in M$ and $\varepsilon > 0$ small enough that $\exp_m : B_\varepsilon(0) \to M$ is a diffeomorphism onto its image, where $B_\varepsilon(0) \subset (T_m M, g_m)$ is the metric ball. Fix an orthonormal basis $e_1, \dots, e_n$ of $T_m M$; the map $\exp_m$ then defines coordinates $(x_1, \dots, x_n)$ on a neighbourhood of $m$.
In these coordinates, the geodesic through $m$ in the direction $v = \sum_i v_i e_i$ is the straight line
\begin{align*}
\gamma_v(t) = (tv_1, \dots, tv_n).
\end{align*}
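Substituting into the geodesic equation $x_i'' + \sum_{j,k} \Gamma^i_{jk} x_j' x_k' = 0$ and noting $x_i'(t) = v_i$ (constant), one finds $\sum_{j,k} \Gamma^i_{jk}(\gamma_v(t))\, v_j v_k = 0$ for all $v$, all $i$, and all small $t$. Evaluating at $t = 0$ gives $\sum_{j,k} \Gamma^i_{jk}\big|_m\, v_j v_k = 0$ for every $v \in T_m M$. A quadratic form that vanishes identically has vanishing symmetric part, and $\Gamma^i_{jk}$ is symmetric in $j, k$ because the Levi-Civita connection is torsion-free, so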
\begin{align*}
\Gamma^i_{jk}\big|_m = 0 \quad \text{for all } i, j, k.
\end{align*}
These are the **geodesic normal coordinates**: the Christoffel symbols of the Levi-Civita connection vanish at the chosen base point $m$. This is a powerful simplification that allows one to verify tensor identities at a point by working as if the connection were flat.
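One consequence that is used constantly can be spelled out in a line. In any coordinates, compatibility of the Levi-Civita connection with $g$ reads $\partial_k g_{ij} = \sum_l \left(\Gamma^l_{ki}\, g_{lj} + \Gamma^l_{kj}\, g_{il}\right)$. In geodesic normal coordinates built from an orthonormal basis, $D(\exp_m)|_0 = \mathrm{id}$ gives $g_{ij}(m) = \delta_{ij}$, and the vanishing of the Christoffel symbols at $m$ then gives
\begin{align*}
g_{ij}(m) = \delta_{ij}, \qquad \partial_k g_{ij}\big|_m = \sum_l \left(\Gamma^l_{ki}\, g_{lj} + \Gamma^l_{kj}\, g_{il}\right)\Big|_m = 0.
\end{align*}
So the metric agrees with the Euclidean metric to first order at $m$. The second derivatives of $g_{ij}$ at $m$ do not vanish in general; they carry the curvature.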
### Geodesic Polar Coordinates
Instead of Cartesian coordinates on $T_m M \cong \mathbb{R}^n$, one may use polar coordinates $(r, \theta_1, \dots, \theta_{n-1})$, where $r$ is the radial distance and $\theta_i$ are coordinates on $S^{n-1} \subset T_m M$. The exponential map then gives a coordinate system on a punctured neighbourhood of $m$ via $(r, \omega) \mapsto \exp_m(r\omega)$ for $\omega \in S^{n-1}$.
The geometry of these coordinates is captured by the following fundamental lemma.
## Gauss' Lemma and Length Minimisation
[quotetheorem:1558]
The lemma does **not** assert that the metric is a product metric on $S^{n-1} \times (0, \varepsilon)$: the coefficients $g_{\alpha\beta}$ may depend on $r$. What it asserts is that the radial direction is everywhere a unit vector field and is orthogonal to every angular direction.
[proof]
The integral curves of $\partial_r$ are the images under $\exp_m$ of radial straight lines in $B_\varepsilon(0) \subset T_m M$, hence are geodesics. For any geodesic $\gamma$, compatibility of the Levi-Civita connection with $g$ and the geodesic equation $\nabla_{\partial_t} \gamma' = 0$ give $\frac{d}{dt} g(\gamma', \gamma') = 2\, g(\nabla_{\partial_t} \gamma', \gamma') = 0$, so geodesics have constant speed. Applied to the radial geodesics, this gives $g(\partial_r, \partial_r) = \mathrm{const} = 1$ (the value at $m$, where the radial direction is a unit vector in $(T_m M, g_m)$).
For the cross terms, fix $\alpha$ and consider $\frac{\partial}{\partial r}\bigl(g(\partial_{\theta_\alpha}, \partial_r)\bigr)$ along a radial geodesic. Using compatibility of the connection:
\begin{align*}
\frac{\partial}{\partial r} g\!\left(\frac{\partial}{\partial \theta_\alpha}, \frac{\partial}{\partial r}\right) = g\!\left(\nabla_{\partial_r} \frac{\partial}{\partial \theta_\alpha}, \frac{\partial}{\partial r}\right) + g\!\left(\frac{\partial}{\partial \theta_\alpha}, \nabla_{\partial_r} \frac{\partial}{\partial r}\right).
\end{align*}
The second term vanishes because $\nabla_{\partial_r}(\partial_r) = 0$ (radial curves are geodesics). Since the Levi-Civita connection is torsion-free, $\nabla_{\partial_r}(\partial_{\theta_\alpha}) - \nabla_{\partial_{\theta_\alpha}}(\partial_r) = [\partial_r, \partial_{\theta_\alpha}] = 0$ (coordinate vector fields commute). So the first term equals $g(\nabla_{\partial_{\theta_\alpha}}(\partial_r), \partial_r) = \frac{1}{2} \frac{\partial}{\partial \theta_\alpha} g(\partial_r, \partial_r) = 0$, because $g(\partial_r, \partial_r) \equiv 1$. Hence $g(\partial_{\theta_\alpha}, \partial_r)$ is constant in $r$. As $r \to 0$ this quantity tends to $0$: $|\partial_r|_g = 1$, while $\partial_{\theta_\alpha}$ at $\exp_m(r\omega)$ is the image under $D\exp_m$ of a vector of length $O(r)$ in $T_m M$, so $|\partial_{\theta_\alpha}|_g \to 0$ and Cauchy–Schwarz applies. So $g(\partial_{\theta_\alpha}, \partial_r) = 0$ everywhere.
[/proof]
[explanation: Length Minimisation via Gauss' Lemma]
Gauss' Lemma has an immediate and important consequence: **radial geodesics minimise length among all curves joining the centre to a boundary point of a geodesic ball**.
Let $U = \exp_m(B_\varepsilon(0))$ and let $q \in U$ with $q \neq m$. Let $\tilde{\gamma}$ be any smooth curve from the centre $m$ to $q$. Write $\tilde{\gamma}$ in geodesic polar coordinates as $(r(t), \theta(t))$.
If $\tilde{\gamma} \subset U$, then by Gauss' Lemma:
\begin{align*}
L(\tilde{\gamma}) = \int \sqrt{\dot{r}^2 + \sum_{\alpha,\beta} g_{\alpha\beta}\, \dot{\theta}_\alpha \dot{\theta}_\beta}\, dt \geq \int |\dot{r}|\, dt \geq r(q),
\end{align*}
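with equality if and only if $\tilde{\gamma}$ is, up to reparameterisation, the radial geodesic (i.e. $\dot{\theta} = 0$ and $\dot{r} \geq 0$ throughout). If $\tilde{\gamma}$ leaves $U$, it must first cross the geodesic sphere $\exp_m(\partial B_{r(q)}(0))$; applying the estimate above to the initial segment of $\tilde{\gamma}$ up to that first crossing, which lies in $U$, shows that this segment alone already has length at least $r(q)$.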
This shows that the distance function $d(m, \cdot)$ in the geodesic ball equals the radial coordinate $r$, and that the unique length-minimising path from $m$ to any $q \in U$ is the radial geodesic.
A further convexity argument shows that for $\varepsilon > 0$ sufficiently small, the ball $\exp_m(B_\varepsilon(0))$ is **geodesically convex**: any two points $p, q$ in the ball can be joined by a length-minimising geodesic that remains entirely in the ball. The key step is to show that the function $F(t) = r(\gamma(t))^2$, where $\gamma$ is any geodesic and $r$ is the distance from $m$, satisfies $\frac{d^2 F}{dt^2}\big|_{(0,m,v)} = 2|v|^2 > 0$ at $t = 0$ for the geodesic with $\gamma(0) = m$ and $\gamma'(0) = v$, so by continuity $r^2$ is strictly convex along geodesics near $m$. This prevents $r$ from having interior local maxima along short geodesics not based at $m$.
[/explanation]
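The length estimate above can be checked numerically in a simple case. The sketch below assumes the standard form $g = dr^2 + \sin^2 r\, d\theta^2$ of the round metric on the unit sphere $S^2$ in geodesic polar coordinates about a point (an illustrative choice, not taken from the notes). It compares the length of the radial geodesic from the centre to a point $q$ with the length of a curve to the same endpoint that wanders in the angular direction. The helper `polar_length` and the particular test curves are hypothetical.

```python
import math

def polar_length(r, theta, n=20000):
    """Length of t -> (r(t), theta(t)), t in [0, 1], for the metric
    g = dr^2 + sin(r)^2 dtheta^2, approximated by summing n short chords."""
    total, h = 0.0, 1.0 / n
    for k in range(n):
        t0, t1 = k * h, (k + 1) * h
        dr = r(t1) - r(t0)
        dtheta = theta(t1) - theta(t0)
        r_mid = r(0.5 * (t0 + t1))
        total += math.sqrt(dr ** 2 + math.sin(r_mid) ** 2 * dtheta ** 2)
    return total

R = 1.0   # radial coordinate r(q) of the endpoint q

# Radial geodesic: theta is constant, r increases linearly from 0 to R.
radial = polar_length(lambda t: R * t, lambda t: 0.0)

# Competitor with the same endpoints that swings out in the angular direction.
wiggly = polar_length(lambda t: R * t, lambda t: 0.5 * math.sin(math.pi * t))
```

Here `radial` agrees with $r(q) = 1$ up to rounding, while `wiggly` is strictly larger, as the inequality $L(\tilde{\gamma}) \geq \int |\dot{r}|\, dt \geq r(q)$ predicts.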
## The Hopf–Rinow Theorem
The local picture — every point has a geodesically convex neighbourhood — raises the question of whether global length-minimising geodesics always exist. The answer for complete manifolds is yes.
[definition: Riemannian Distance]
For $p, q \in M$, define
\begin{align*}
d(p, q) := \inf \{ L(\gamma) : \gamma \text{ is a smooth curve from } p \text{ to } q \},
\end{align*}
where $L(\gamma) = \int_a^b |\gamma'(t)|_g\, dt$ is the length of $\gamma$. For connected $M$ (so that the infimum is taken over a non-empty set) this defines a metric on $M$ that induces the manifold topology.
[/definition]
[quotetheorem:1559]
[citeproof:1559]
The Hopf–Rinow theorem has a particularly clean consequence for compact manifolds: any compact Riemannian manifold is automatically geodesically complete (its metric space is compact, hence complete), so global length-minimising geodesics always exist between any two points. Completeness cannot be dropped from the hypotheses — the punctured plane $\mathbb{R}^2 \setminus \{0\}$ with its Euclidean metric is a standard counterexample: it is a Riemannian manifold, but the straight-line geodesic from $(-1, 0)$ to $(1, 0)$ would have to pass through the missing origin, and no minimising geodesic exists. Uniqueness of the minimiser is also subtle even in the complete case: antipodal points $p$ and $-p$ on the sphere $S^2$ are joined by an entire great-circle family of length-minimising geodesics, so Hopf–Rinow guarantees existence but not uniqueness. A clean structural corollary is that on a complete Riemannian manifold, closed bounded sets are compact (the Heine–Borel property), which in particular means that any two points in the same connected component are joined by a minimising geodesic and that bounded-diameter complete manifolds are automatically compact — a fact we will exploit in the proof of Myers' Theorem.
[explanation: Geodesics on the Sphere and in the Hyperbolic Plane]
It is useful to see how the machinery of this chapter plays out on two familiar surfaces, since the same calculations recur whenever one uses the geodesic equation in practice.
**How to use the geodesic equation.** Write down the metric in coordinates, read off the $g_{ij}$, compute the Christoffel symbols $\Gamma^i_{jk} = \frac{1}{2} \sum_l g^{il}(\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk})$, plug into $x_i'' + \sum_{j,k} \Gamma^i_{jk} x_j' x_k' = 0$, and solve the resulting ODE system with the initial conditions $\gamma(0) = p$, $\gamma'(0) = v$.
**Great circles on $S^2$.** In spherical coordinates $(\theta, \varphi)$ on $S^2 \subset \mathbb{R}^3$ with the round metric $g = d\theta^2 + \sin^2\theta\, d\varphi^2$, the non-zero Christoffel symbols are $\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi_{\theta\varphi} = \Gamma^\varphi_{\varphi\theta} = \cot\theta$. The geodesic equation becomes
\begin{align*}
\theta'' - \sin\theta\cos\theta\,(\varphi')^2 = 0, \qquad \varphi'' + 2\cot\theta\,\theta'\varphi' = 0.
\end{align*}
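An equatorial solution $\theta \equiv \pi/2$, $\varphi = ct$ satisfies both equations directly. By rotational symmetry, every great circle parameterised at constant speed is a geodesic. Conversely, through every point and in every tangent direction there passes a great circle, so by the uniqueness part of the local existence theorem every geodesic is an arc of a great circle. Such an arc is length-minimising exactly when its length is at most $\pi$, and the minimiser between two points is unique except for antipodal pairs.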
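The Christoffel symbols quoted above can be cross-checked numerically. The following sketch evaluates the formula $\Gamma^i_{jk} = \frac{1}{2} \sum_l g^{il}(\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk})$ with central finite differences in place of exact derivatives. The helper names are illustrative, the approach is a plain finite-difference approximation rather than a symbolic computation, and the matrix inverse is hard-coded for dimension 2.

```python
import math

def sphere_metric(x):
    """Round metric g = dtheta^2 + sin(theta)^2 dphi^2 at x = (theta, phi)."""
    theta, _phi = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def christoffel(metric, x, h=1e-5):
    """Return gamma with gamma[i][j][k] approximating Gamma^i_{jk} at x."""
    n = len(x)
    dg = []                                   # dg[l][j][k] approximates d_l g_{jk}
    for l in range(n):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[j][k] - gm[j][k]) / (2 * h) for k in range(n)]
                   for j in range(n)])
    ginv = inverse_2x2(metric(x))
    return [[[0.5 * sum(ginv[i][l] * (dg[j][l][k] + dg[k][l][j] - dg[l][j][k])
                        for l in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]

theta0 = 1.0
gamma = christoffel(sphere_metric, [theta0, 0.3])   # index 0 = theta, 1 = phi
```

At $\theta_0 = 1$, `gamma[0][1][1]` approximates $-\sin\theta_0\cos\theta_0$ and `gamma[1][0][1]` approximates $\cot\theta_0$, with the remaining independent components numerically zero.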
**Geodesics in the upper half-plane model of $H^2$.** The hyperbolic plane in the upper half-plane model is $\{(x, y) : y > 0\}$ with $g = (dx^2 + dy^2)/y^2$, so $g_{11} = g_{22} = 1/y^2$ and $g^{11} = g^{22} = y^2$. The non-zero Christoffel symbols are $\Gamma^1_{12} = \Gamma^1_{21} = -1/y$, $\Gamma^2_{11} = 1/y$, $\Gamma^2_{22} = -1/y$. Solving $x'' - 2x'y'/y = 0$ and $y'' + ((x')^2 - (y')^2)/y = 0$, the unit-speed solutions are either vertical rays $x = \mathrm{const}$, $y = e^t$, or semicircles $(x - a)^2 + y^2 = r^2$ orthogonal to the real axis. This is exactly the classical picture of hyperbolic geodesics used throughout non-Euclidean geometry.
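The semicircle picture can be tested by integrating the two geodesic equations numerically. The sketch below uses a hand-written classical Runge–Kutta (RK4) step, chosen only for illustration, and starts at $(0, 1)$ with unit horizontal velocity. Under these initial conditions the geodesic should trace the unit semicircle $x^2 + y^2 = 1$, and the exact unit-speed solution is $x = \tanh t$, $y = 1/\cosh t$, which can be verified by substitution into the two equations.

```python
import math

def rhs(state):
    """Geodesic system x'' = 2 x' y' / y, y'' = ((y')^2 - (x')^2) / y,
    written as a first-order system in state = (x, y, vx, vy)."""
    x, y, vx, vy = state
    return (vx, vy, 2.0 * vx * vy / y, (vy * vy - vx * vx) / y)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * h))
    k3 = rhs(shifted(state, k2, 0.5 * h))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (0.0, 1.0, 1.0, 0.0)      # start at (0, 1) with unit horizontal velocity
steps, t_end = 2000, 2.0
for _ in range(steps):
    state = rk4_step(state, t_end / steps)

x, y, vx, vy = state
radius_sq = x * x + y * y                    # stays 1 on the unit semicircle
speed = math.sqrt(vx * vx + vy * vy) / y     # hyperbolic speed |gamma'|_g
```

After integrating to $t = 2$, `radius_sq` and `speed` both remain equal to $1$ up to integration error, and `x` matches $\tanh 2$. The point approaches the real axis but does not reach it in finite time, consistent with completeness of $H^2$.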
[/explanation]
## Energy, Variations, and the Index Form
Since length is what we care about geometrically, one might expect to study geodesics by examining the first and second variations of the length functional $L$. There is a serious obstacle: $L$ is invariant under reparameterisation, so any geodesic $\gamma$ sits on an infinite-dimensional orbit of reparameterisations on which $L$ is constant. The second variation therefore has a non-trivial kernel — variations tangent to the reparameterisation orbit — and the standard variational machinery (positive-definiteness arguments, Morse indices, Jacobi field analysis) fails because the relevant bilinear form is only positive semi-definite. The energy functional $E(\gamma) = \int |\gamma'|_g^2\, dt$ breaks this symmetry: it is **not** reparameterisation-invariant (rescaling speed changes $E$ even though it leaves length fixed), and its critical points among fixed-endpoint variations are precisely the **constant-speed** geodesics. This rigidification is exactly what makes $E$ the right object to differentiate twice, and the second variation of $E$ will yield the index form — the quadratic form on vector fields along $\gamma$ that governs the local minimality of geodesics.
[definition: Energy of a Path]
The **energy** is the map
\begin{align*}
E : C^\infty([a,b], M) \longrightarrow \mathbb{R}
\end{align*}
that assigns to a smooth curve $\gamma : [a, b] \to M$ the non-negative real number
\begin{align*}
E(\gamma) = \int_a^b g(\gamma'(t), \gamma'(t))\, dt = \int_a^b |\gamma'(t)|_g^2\, dt.
\end{align*}
[/definition]
Unlike length, energy depends on the parameterisation: rescaling speed changes the energy but not the length. A Cauchy–Schwarz argument shows that for fixed endpoints and a fixed parameter interval $[a,b]$, the energy-minimising curves are exactly the length-minimising geodesics parameterised at constant speed (not necessarily unit speed, since the interval $[a,b]$ is fixed): a curve minimises $E$ if and only if it minimises $L$ and travels at constant speed.
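The Cauchy–Schwarz step can be written out explicitly. Applying the inequality to the functions $1$ and $|\gamma'|_g$ on $[a, b]$ gives
\begin{align*}
L(\gamma)^2 = \left(\int_a^b 1 \cdot |\gamma'(t)|_g\, dt\right)^{2} \leq \left(\int_a^b 1^2\, dt\right)\left(\int_a^b |\gamma'(t)|_g^2\, dt\right) = (b - a)\, E(\gamma),
\end{align*}
with equality if and only if $|\gamma'|_g$ is constant. Hence for any curve $\gamma : [a, b] \to M$ from $p$ to $q$,
\begin{align*}
E(\gamma) \geq \frac{L(\gamma)^2}{b - a} \geq \frac{d(p, q)^2}{b - a},
\end{align*}
and both inequalities are equalities exactly when $\gamma$ has constant speed and realises the distance $d(p, q)$.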
### First Variation of Energy
Let $c_0(t)$ be a curve in $M$ and let $c(t, s) = c_s(t)$ be a smooth variation with $c_0 = c|_{s=0}$. Write $\dot{c} = \partial_t c$ and $c' = \partial_s c$. Using compatibility of the Levi-Civita connection with $g$ and the torsion-free property (which gives $\nabla_{\partial_s} \dot{c} = \nabla_{\partial_t} c'$):
\begin{align*}
\frac{d}{ds} E(c_s) = 2\, g(c', \dot{c})\big|_{t=a}^{t=b} - 2 \int_a^b g\!\left(c', \nabla_{\partial_t} \dot{c}\right) dt.
\end{align*}
This is the **first variation formula**. It shows that $c_0$ is a critical point of $E$ among variations with fixed endpoints if and only if $\nabla_{\partial_t} \dot{c}\big|_{s=0} = 0$ for all $t$, i.e. $c_0$ is a geodesic.
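For reference, the intermediate steps behind the first variation formula are as follows. Differentiating under the integral sign, then using metric compatibility, the symmetry $\nabla_{\partial_s} \dot{c} = \nabla_{\partial_t} c'$, and metric compatibility once more to integrate by parts:
\begin{align*}
\frac{d}{ds} E(c_s) &= \int_a^b \frac{\partial}{\partial s}\, g(\dot{c}, \dot{c})\, dt = 2\int_a^b g\!\left(\nabla_{\partial_s} \dot{c}, \dot{c}\right) dt = 2\int_a^b g\!\left(\nabla_{\partial_t} c', \dot{c}\right) dt \\
&= 2\int_a^b \left[\frac{\partial}{\partial t}\, g(c', \dot{c}) - g\!\left(c', \nabla_{\partial_t} \dot{c}\right)\right] dt = 2\, g(c', \dot{c})\big|_{t=a}^{t=b} - 2\int_a^b g\!\left(c', \nabla_{\partial_t} \dot{c}\right) dt.
\end{align*}
For fixed endpoints $c'$ vanishes at $t = a$ and $t = b$, so only the integral term survives, and it vanishes for every variation field $c'$ precisely when $\nabla_{\partial_t} \dot{c} = 0$.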
### Second Variation and the Index Form
When $c_0$ is a geodesic, the first variation vanishes and the second variation governs stability. Differentiating the first variation formula again and using the curvature tensor $R$ (which appears from commuting $\nabla_{\partial_s}$ and $\nabla_{\partial_t}$):
\begin{align*}
\frac{1}{2} \frac{d^2}{ds^2} E(c_s)\bigg|_{s=0} = \int_a^b g\!\left(\nabla_{\partial_t} c', \nabla_{\partial_t} c'\right) dt - \int_a^b g\!\left(R(\dot{c}, c') c', \dot{c}\right) dt + g\!\left(\nabla_{\partial_s} c', \dot{c}\right)\bigg|_{(a,0)}^{(b,0)}.
\end{align*}
For a variation with fixed endpoints, the curves $s \mapsto c(a, s)$ and $s \mapsto c(b, s)$ are constant, so both $c'$ and $\nabla_{\partial_s} c'$ vanish at $t = a$ and $t = b$, and the boundary term vanishes. Let $\mathcal{V}_0(c_0)$ denote the space of smooth vector fields along $c_0$ that vanish at the endpoints $t = a$ and $t = b$. The resulting symmetric bilinear form is the map
\begin{align*}
I : \mathcal{V}_0(c_0) \times \mathcal{V}_0(c_0) \longrightarrow \mathbb{R}, \qquad I(Y, Z) := \int_a^b \left[ g\!\left(\nabla_{\partial_t} Y, \nabla_{\partial_t} Z\right) - g\!\left(R(\dot{c}, Y) Z, \dot{c}\right) \right] dt,
\end{align*}
known as the **index form**; it is the Hessian of the energy functional at the geodesic $c_0$. If $c_0$ is a length-minimising geodesic (parameterised at constant speed, so that it also minimises $E$), then $I(Y, Y) \geq 0$ for every $Y \in \mathcal{V}_0(c_0)$.
## Myers' Theorem
The index form is the key to proving Myers' Theorem, one of the central results connecting curvature to topology in Riemannian geometry.
[quotetheorem:1560]
The proof is a direct application of the second variation machinery to a length-minimising geodesic between two points near opposite ends of the diameter, combined with a clever choice of variation fields built from parallel transport and a sine bump. The curvature assumption enters only through the Ricci-trace of the index form, which is why a lower bound on $\mathrm{Ric}$ — a much weaker hypothesis than a bound on sectional curvature — suffices.
[proof]
Completeness implies geodesic completeness by Hopf–Rinow. Fix any $L < \mathrm{diam}(M, g)$; then there exist $p, q \in M$ with $d(p, q) = L$, and by Hopf–Rinow there is a length-minimising geodesic $\gamma : [0, L] \to M$ parameterised by arc length.
Let $e_1 = \gamma'(0), e_2, \dots, e_n$ be an orthonormal basis of $T_p M$. Parallel transport along $\gamma$ extends this to an orthonormal frame $X_1(t) = \gamma'(t), X_2(t), \dots, X_n(t)$ of vector fields along $\gamma$. For each $i = 2, \dots, n$, define
\begin{align*}
Y_i(t) = \sin\!\left(\frac{\pi t}{L}\right) X_i(t).
\end{align*}
Each $Y_i$ vanishes at $t = 0$ and $t = L$. Since $\gamma$ is length-minimising, $I(Y_i, Y_i) \geq 0$.
Since the $X_i$ are parallel, $\nabla_{\partial_t} X_i = 0$, so $\nabla_{\partial_t} Y_i = \frac{\pi}{L} \cos(\pi t/L)\, X_i$. Computing:
\begin{align*}
I(Y_i, Y_i) = \int_0^L \sin^2\!\left(\frac{\pi t}{L}\right)\! \left(\frac{\pi^2}{L^2} - R(\gamma', X_i, \gamma', X_i)\right) dt
\end{align*}
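(using $\int_0^L g(\nabla_{\partial_t} Y_i, \nabla_{\partial_t} Y_i)\, dt = -\int_0^L g(Y_i, \nabla_{\partial_t} \nabla_{\partial_t} Y_i)\, dt = \frac{\pi^2}{L^2} \int_0^L \sin^2(\pi t/L)\, dt$ after integration by parts, where the boundary terms vanish because $Y_i(0) = Y_i(L) = 0$, and writing $R(\gamma', X_i, \gamma', X_i) := g(R(\gamma', X_i) X_i, \gamma')$). The corresponding term for $X_1 = \gamma'$ vanishes by the antisymmetry of $R$, so summing over $i = 2, \dots, n$ produces the full Ricci trace: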
\begin{align*}
\sum_{i=2}^n I(Y_i, Y_i) = \int_0^L \sin^2\!\left(\frac{\pi t}{L}\right)\! \left((n-1)\frac{\pi^2}{L^2} - \mathrm{Ric}(\gamma', \gamma')\right) dt.
\end{align*}
Since $I(Y_i, Y_i) \geq 0$ and the curvature bound gives $\mathrm{Ric}(\gamma', \gamma') \geq (n-1)/r^2$ (as $|\gamma'|_g = 1$), we obtain:
\begin{align*}
0 \leq \int_0^L \sin^2\!\left(\frac{\pi t}{L}\right)\!\left(\frac{(n-1)\pi^2}{L^2} - \frac{n-1}{r^2}\right) dt.
\end{align*}
Since the bracketed factor is constant and $\int_0^L \sin^2(\pi t/L)\, dt = L/2 > 0$, non-negativity of the integral forces
\begin{align*}
\frac{(n-1)\pi^2}{L^2} \geq \frac{n-1}{r^2} \implies L \leq \pi r.
\end{align*}
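As $L < \mathrm{diam}(M,g)$ was arbitrary, $\mathrm{diam}(M, g) \leq \pi r < \infty$. A complete Riemannian manifold of finite diameter is compact: $M$ is a closed and bounded subset of itself, and by Hopf–Rinow closed bounded subsets of a complete Riemannian manifold are compact. For the fundamental group, let $\widetilde{M} \to M$ be the universal cover, equipped with the pullback $\tilde{g}$ of $g$. The covering map is a local isometry, so $(\widetilde{M}, \tilde{g})$ is complete and satisfies the same Ricci lower bound, and the argument above shows that $\widetilde{M}$ is compact. Each fibre of the covering is a discrete subset of the compact space $\widetilde{M}$, hence finite, and the fibres are in bijection with $\pi_1(M)$. Hence $\pi_1(M)$ is finite.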
[/proof]
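The role of the sine bump in the proof can be seen on the model case. On the round sphere $S^n(r)$ every sectional curvature equals $1/r^2$ (consistent with $\mathrm{Ric} = (n-1)/r^2$), so for a unit-speed geodesic of length $L$ and $Y = \sin(\pi t/L)\, X$ with $X$ parallel and orthogonal to $\gamma'$, the index form is $I(Y, Y) = \int_0^L \left[\frac{\pi^2}{L^2}\cos^2(\pi t/L) - \frac{1}{r^2}\sin^2(\pi t/L)\right] dt$. The sketch below evaluates this integral with a midpoint rule; the function name is illustrative.

```python
import math

def index_form_sine_bump(L, r, n=20000):
    """Midpoint-rule value of I(Y, Y) for Y = sin(pi t / L) X along a
    unit-speed geodesic of length L on the round sphere of radius r."""
    h = L / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        grad_sq = (math.pi / L) ** 2 * math.cos(math.pi * t / L) ** 2
        curvature_term = math.sin(math.pi * t / L) ** 2 / r ** 2
        total += (grad_sq - curvature_term) * h
    return total

r = 2.0
short = index_form_sine_bump(0.9 * math.pi * r, r)      # shorter than pi * r
critical = index_form_sine_bump(math.pi * r, r)         # exactly pi * r
long_ = index_form_sine_bump(1.1 * math.pi * r, r)      # longer than pi * r
```

The values match the closed form $\frac{L}{2}\left(\frac{\pi^2}{L^2} - \frac{1}{r^2}\right)$: positive for $L < \pi r$, zero at $L = \pi r$, and negative for $L > \pi r$. A negative value means the geodesic is no longer minimising, which is exactly the contradiction exploited in the proof.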
[remark: Curvature Comparison]
The sphere $S^n(r)$ of radius $r$ has constant Ricci curvature $(n-1)/r^2$ and diameter $\pi r$. Myers' Theorem says: if $(M,g)$ has Ricci curvature bounded below by $(n-1)/r^2$, then $M$ is no "bigger" than $S^n(r)$ in terms of diameter. The assumption is a uniform positive lower bound on the Ricci curvature, and pointwise positivity is not enough: the paraboloid $z = x^2 + y^2$ in $\mathbb{R}^3$ is complete with strictly positive curvature at every point, yet it is non-compact, because its curvature tends to $0$ at infinity and so no $r > 0$ works.
[/remark]
The sphere saturates the Myers bound, so the conclusion $\mathrm{diam}(M) \le \pi r$ is sharp — there is no room for improvement without strengthening the hypothesis. A standard application illustrates how the compactness conclusion bites even when one only knows a qualitative Ricci lower bound.
[example: Application to Positive Ricci Curvature]
If $(M, g)$ is a connected complete Riemannian manifold of dimension $n$ with $\mathrm{Ric} \geq \kappa > 0$ uniformly, then taking $r = \sqrt{(n-1)/\kappa}$, Myers' Theorem gives $\mathrm{diam}(M) \leq \pi\sqrt{(n-1)/\kappa} < \infty$, so $M$ is compact. In particular, no complete non-compact manifold can have Ricci curvature bounded below by a positive constant. For instance, $\mathbb{R}^n$ admits no complete metric with $\mathrm{Ric} \geq \kappa > 0$; this is consistent with the flat metric, for which $\mathrm{Ric} = 0$.
[/example]
Myers' Theorem is a beautiful illustration of the philosophy running through Riemannian geometry: curvature controls geometry, and geometry controls topology. The positivity of Ricci curvature forces the manifold to be compact with finite fundamental group — a conclusion that seems far removed from a local differential equation, yet follows directly from the second variation of energy and the index form inequality.
The Yang-Mills equation crowns the course by unifying all these threads: connections on bundles, curvature, variational methods, and the interplay of analysis with topology. This is where differential geometry meets mathematical physics and becomes a tool for understanding the deep structure of gauge theories and the topology of manifolds themselves.
# 6. The Yang-Mills Equation
This chapter marks the culmination of the course, bringing together the theory of connections, curvature, and Riemannian geometry to study a profound variational problem rooted in mathematical physics. The Yang-Mills functional measures the $L^2$ size of the curvature of a connection on a bundle $E \to M$, and the connections that minimise it satisfy the Yang-Mills equation — a nonlinear PDE whose study has had striking consequences for 4-manifold topology. The highlight is Donaldson's theorem, proved in the 1980s, which uses the geometry of anti-self-dual connections to constrain the topology of 4-manifolds.
[remark: Non-examinable]
This chapter is non-examinable. It is presented to illustrate how the machinery developed throughout the course — differential forms, connections, curvature, and Riemannian metrics — converges on a single beautiful geometric equation with deep topological consequences.
[/remark]
To orient ourselves, recall the two broad strategies developed in this course. For differential forms, we studied the algebra $\Omega^\bullet(M)$ through its de Rham cohomology $H^\bullet_{\mathrm{dR}}(M)$, a topological invariant, and we also studied specific forms (symplectic forms, volume forms) with their own geometry. For connections, we mostly studied $d_A$ for a fixed connection $A$ — for example, computing holonomy — or we studied the Levi-Civita connection $d_{LC}$ on a Riemannian manifold $(M, g)$. In this chapter we take a new perspective: we vary $A$ across the entire affine space of connections on $E$ and ask which connections are optimal, in the sense of minimising the Yang-Mills functional.
## Metrics on Bundles and Forms
To formulate a variational principle for connections, we need a way to measure the size of curvature 2-forms. The curvature $F_A$ of a connection is a 2-form valued in $\operatorname{End}(E)$, so integrating $|F_A|^2$ requires both a metric on the form-degree (coming from a Riemannian metric on $M$) and a metric on $\operatorname{End}(E)$ (coming from a metric on the bundle $E$). The question is: given $g$ on $M$ and $g_E$ on $E$, how do these combine to produce a well-defined inner product on $\Omega^2(\operatorname{End}(E))$? This section constructs that metric and uses it to define the Yang-Mills functional.
Let $E \to M$ be a smooth vector bundle equipped with a metric $g_E$ (a smoothly varying inner product on each fibre). A connection $A$ on $E$ is compatible with $g_E$ if
\begin{align*}
d(g_E(s, t)) = g_E(d_A s, t) + g_E(s, d_A t) \quad \forall s, t \in \Gamma(E).
\end{align*}
For such metric-compatible connections, the connection matrices $\theta_\alpha$ in any local orthonormal frame are skew-symmetric. Since the curvature is locally
\begin{align*}
(F_A)_\alpha = d\theta_\alpha + \theta_\alpha \wedge \theta_\alpha,
\end{align*}
skew-symmetry of $\theta_\alpha$ implies that $F_A$ is itself skew-symmetric, so $F_A \in \Omega^2(\operatorname{Skew}(\operatorname{End}(E)))$. This parallels the fact that inside $\operatorname{Mat}_n(\mathbb{R})$, the skew-symmetric matrices form the Lie algebra of $O(n)$.
A metric on $E$ induces a metric on $\operatorname{End}(E)$ via
\begin{align*}
|S|^2 = \operatorname{tr}(SS^*) = \sum_{i,j} |S_{ij}|^2,
\end{align*}
which on $\operatorname{Skew}(\operatorname{End}(E))$ reduces to $-\operatorname{tr}(S^2)$, since $S^* = -S$ for skew-endomorphisms. More generally, a metric on a vector space $V$ induces metrics on all exterior powers $\Lambda^k V$: if $\{e_1, \ldots, e_n\}$ is an orthonormal basis for $V$, then we declare the basis $\{e_I = e_{i_1} \wedge \cdots \wedge e_{i_k} : i_1 < \cdots < i_k\}$ to be an orthonormal basis for $\Lambda^k V$.
In particular, a Riemannian metric $g$ on $M$ induces metrics on all bundles $\Lambda^k T^*M$, and hence on all spaces of differential forms.
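The induced metric on $\Lambda^k V$ can also be described without choosing a basis: on decomposable elements it is the Gram determinant $g(v_1 \wedge \cdots \wedge v_k,\, w_1 \wedge \cdots \wedge w_k) = \det\bigl(g(v_i, w_j)\bigr)$, a standard formula that is assumed here rather than proved. The sketch below, with illustrative helper names, checks for $V = \mathbb{R}^4$ and $k = 2$ that this formula makes the six elements $e_i \wedge e_j$ ($i < j$) orthonormal, in agreement with the declaration above.

```python
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def wedge_inner(vs, ws):
    """Inner product of v_1 ^ ... ^ v_k and w_1 ^ ... ^ w_k as a Gram determinant."""
    return det([[dot(v, w) for w in ws] for v in vs])

n, k = 4, 2
basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
index_sets = list(combinations(range(n), k))          # the multi-indices I

# Gram matrix of the elements e_I = e_{i_1} ^ ... ^ e_{i_k}.
gram = [[wedge_inner([basis[i] for i in I], [basis[j] for j in J])
         for J in index_sets] for I in index_sets]

# For two vectors the formula reads |u ^ v|^2 = |u|^2 |v|^2 - (u . v)^2.
area_sq = wedge_inner([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
                      [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
```

The matrix `gram` is the $6 \times 6$ identity, and `area_sq` is the squared area of the parallelogram spanned by the two vectors.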
[definition: Volume Form]
Let $(M^n, g)$ be an oriented Riemannian $n$-manifold, and let $\omega$ be a positively oriented nowhere-zero $n$-form on $M$. The **volume form** is
\begin{align*}
\operatorname{vol}_g := \frac{\omega}{\sqrt{g(\omega, \omega)}},
\end{align*}
the unique unit-length nowhere-zero $n$-form compatible with the orientation.
[/definition]
With the volume form in hand, we can define an $L^2$ inner product on the space of differential forms.
[definition: $L^2$ Inner Product on Forms]
Let $(M, g)$ be a compact oriented Riemannian manifold. The **$L^2$ inner product** on $\Omega^k(M)$ is
\begin{align*}
\langle \alpha, \beta \rangle_{L^2} := \int_M g(\alpha(x), \beta(x))\, \operatorname{vol}_g \in \mathbb{R},
\end{align*}
with corresponding norm $\|\alpha\|_{L^2} := \sqrt{\langle \alpha, \alpha \rangle_{L^2}}$.
[/definition]
Given $(M, g)$ and a bundle $E \to M$ with a metric on $E$, the metrics on $\Omega^2(M)$ and $\operatorname{End}(E)$ combine to give a metric on $\Omega^2(M) \otimes \operatorname{End}(E)$: for $\eta \in \Omega^2(M)$ and a section $S$ of $\operatorname{End}(E)$ (written $S$ to avoid a clash with the connection $A$), we set
\begin{align*}
|\eta \otimes S|^2 := g(\eta, \eta)\,|S|^2.
\end{align*}
This extends to a metric on the entire space $\Omega^2(\operatorname{End}(E))$.
[definition: Yang-Mills Functional]
Let $(M, g)$ be a compact oriented Riemannian manifold, let $E \to M$ be a smooth vector bundle with a metric, and let $\mathcal{A}_E$ denote the affine space of all metric-compatible connections on $E$. The **Yang-Mills functional** $\mathrm{YM}_E : \mathcal{A}_E \to \mathbb{R}$ is
\begin{align*}
\mathrm{YM}_E(A) := \|F_A\|_{L^2}^2 = \int_M |F_A|^2\, \operatorname{vol}_g.
\end{align*}
[/definition]
The Yang-Mills functional assigns to each connection the total $L^2$ energy of its curvature. Flat connections (with $F_A = 0$) have zero Yang-Mills energy; in general, topological obstructions prevent the curvature from vanishing, and the Yang-Mills equation identifies the connections of minimal energy.
## The Euler-Lagrange Equations
Since $\mathcal{A}_E$ is an affine space modelled on $\Omega^1(\operatorname{Skew}(\operatorname{End}(E)))$ (the difference of two metric-compatible connections is a skew-valued 1-form), any path $A + ta$ (with $a \in \Omega^1(\operatorname{Skew}(\operatorname{End}(E)))$ and $t \in \mathbb{R}$) is a valid deformation of a connection $A$. The critical points of $\mathrm{YM}_E$ are the connections $A$ for which this functional is stationary in every direction.
Recall from Chapter 3 that
\begin{align*}
F_{A+ta} = F_A + t\, d_{A \otimes A^*}(a) + t^2\, a \wedge a,
\end{align*}
where $d_{A \otimes A^*} : \Omega^1(\operatorname{End}(E)) \to \Omega^2(\operatorname{End}(E))$ is the connection-twisted exterior derivative. Differentiating $\mathrm{YM}_E(A + ta)$ at $t = 0$ gives
\begin{align*}
0 = \frac{d}{dt}\bigg|_{t=0} \|F_{A+ta}\|_{L^2}^2 = 2\int_M g(F_A,\, d_{A \otimes A^*}(a))\, \operatorname{vol}_g.
\end{align*}
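Both formulas can be verified in a local trivialisation. If $A$ has connection matrix $\theta$, then $A + ta$ has connection matrix $\theta + ta$, and expanding the local curvature formula gives
\begin{align*}
F_{A+ta} = d(\theta + ta) + (\theta + ta) \wedge (\theta + ta) = \underbrace{d\theta + \theta \wedge \theta}_{F_A} + t\, \underbrace{\left(da + \theta \wedge a + a \wedge \theta\right)}_{d_{A \otimes A^*}(a)} + t^2\, a \wedge a.
\end{align*}
Inserting this into the $L^2$ norm and collecting powers of $t$,
\begin{align*}
\|F_{A+ta}\|_{L^2}^2 = \|F_A\|_{L^2}^2 + 2t\, \langle F_A,\, d_{A \otimes A^*}(a) \rangle_{L^2} + O(t^2),
\end{align*}
so the quadratic term $t^2\, a \wedge a$ plays no role at first order, and the coefficient of $t$ is the integral appearing in the stationarity condition displayed above.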
To extract what condition on $A$ this places, we invoke the following analytical fact.
[quotetheorem:1561]
The proof that such a formal adjoint exists uses integration by parts on a compact manifold — compactness (or compactly supported sections) is essential because integration by parts generates a boundary term that vanishes only when $\partial M = \varnothing$ or the sections have compact support. The formal adjoint is what allows us to rewrite the stationarity condition as a pointwise PDE: rather than demanding that a global integral vanishes, we can move $d_{A \otimes A^*}$ off the test section $a$ and onto $F_A$, translating the variational condition into the Yang-Mills equation $(d_{A \otimes A^*})^*(F_A) = 0$. This is precisely the pattern from Hodge theory, where the formal adjoint $d^*$ of the de Rham differential defines harmonic forms as the simultaneous kernel of $d$ and $d^*$; here the Yang-Mills equation plays the role of the $d^*$-harmonic condition for the curvature.
Applying this adjoint to our stationarity condition gives
\begin{align*}
0 = 2\int_M g\!\left((d_{A \otimes A^*})^*(F_A),\, a\right)\operatorname{vol}_g \quad \text{for all } a \in \Omega^1(\operatorname{Skew}(\operatorname{End}(E))).
\end{align*}
Since this holds for all $a$, we may take $a = (d_{A \otimes A^*})^*(F_A)$, which is skew-valued because $A$ is metric-compatible. The integral is then $\|(d_{A \otimes A^*})^*(F_A)\|_{L^2}^2$, so $(d_{A \otimes A^*})^*(F_A)$ vanishes identically. This is the Euler-Lagrange equation.
[definition: Yang-Mills Equation]
A connection $A$ on $E \to M$ is called a **Yang-Mills connection** if it satisfies the **Yang-Mills equation**:
\begin{align*}
(d_{A \otimes A^*})^*(F_A) = 0.
\end{align*}
[/definition]
The Yang-Mills equation is a second-order nonlinear PDE in $A$. Before proceeding to special solutions in dimension 4, it is instructive to see what breaks when metric-compatibility is dropped.
[example: Failure Without Metric-Compatibility]
Suppose $E \to M$ is a complex line bundle with a Hermitian metric $h$, and let $A$ be any $h$-compatible connection, so its local connection 1-form $\theta$ in a unitary frame is purely imaginary: $\theta = i\phi$ for a real 1-form $\phi$. The curvature is then $F_A = i\,d\phi$, which is imaginary-valued, consistent with the skew-Hermitian condition $F_A \in \Omega^2(i\mathbb{R})$. Now consider a connection $A'$ whose local 1-form is $\theta' = \psi + i\phi$ for some real 1-form $\psi$ with $d\psi \neq 0$; this connection fails to be $h$-compatible. On a line bundle $\theta' \wedge \theta' = 0$, so its curvature is $F_{A'} = d\theta' + \theta' \wedge \theta' = d\psi + i\,d\phi$, which acquires a real (non-skew-Hermitian) component $d\psi$. The pointwise norm $|F_{A'}|^2$ is no longer given by $-\operatorname{tr}(F_{A'}^2)$ (the formula valid for skew-Hermitian endomorphisms), so the Yang-Mills functional cannot be defined via the fibre metric on $\operatorname{Skew}(\operatorname{End}(E))$. Metric-compatibility is thus not a convenience but a prerequisite for the variational setup itself.
[/example]
[remark: Comparison with the Bianchi Identity]
The second Bianchi identity says $d_{A \otimes A^*}(F_A) = 0$ for every connection $A$, without exception. The Yang-Mills equation $(d_{A \otimes A^*})^*(F_A) = 0$ is the analogous equation with $d_{A \otimes A^*}$ replaced by its formal adjoint. The fact that these two equations look so similar — one being automatic, the other a non-trivial PDE — underscores the depth of the variational problem.
[/remark]
[remark: Harmonic Forms]
There is a classical analogue in de Rham theory: the exterior derivative $d : \Omega^k(M) \to \Omega^{k+1}(M)$ has a formal adjoint $d^*$, defined via the Hodge-star operator. A form $\alpha \in \Omega^k(M)$ is **harmonic** if $d\alpha = 0$ and $d^*\alpha = 0$; these are precisely the Euler-Lagrange equations for the functional
\begin{align*}
\alpha \mapsto \|\alpha\|_{L^2}^2
\end{align*}
restricted to closed forms in a fixed cohomology class $[\alpha] \in H^k_{\mathrm{dR}}(M)$. For compact $M$, harmonic representatives exist and are unique in each class, providing an analytic realisation of de Rham cohomology as the space of energy-minimising forms.
[/remark]
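The Euler-Lagrange computation behind this remark is short enough to record. Within a fixed class $[\alpha]$ the competitors are $\alpha + t\, d\beta$ with $\beta \in \Omega^{k-1}(M)$, and
\begin{align*}
\frac{d}{dt}\bigg|_{t=0} \|\alpha + t\, d\beta\|_{L^2}^2 = 2\, \langle \alpha, d\beta \rangle_{L^2} = 2\, \langle d^*\alpha, \beta \rangle_{L^2},
\end{align*}
which vanishes for every $\beta$ if and only if $d^*\alpha = 0$. When $d^*\alpha = 0$ the cross term drops out of the full expansion,
\begin{align*}
\|\alpha + d\beta\|_{L^2}^2 = \|\alpha\|_{L^2}^2 + 2\, \langle d^*\alpha, \beta \rangle_{L^2} + \|d\beta\|_{L^2}^2 = \|\alpha\|_{L^2}^2 + \|d\beta\|_{L^2}^2 \geq \|\alpha\|_{L^2}^2,
\end{align*}
with equality only when $d\beta = 0$. So a harmonic representative is the unique minimiser in its class. The Yang-Mills derivation follows the same pattern, with $d_{A \otimes A^*}$ in place of $d$ and an additional quadratic term in the variation.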
With the Yang-Mills equation understood as a critical-point condition for the $L^2$ energy of the curvature, the next step is to ask whether dimension imposes additional structure. In dimension 4, the Hodge star on 2-forms turns out to have an unexpected property that is absent in all other dimensions.
## The Hodge Star in Dimension 4
What makes dimension 4 special for Yang-Mills theory? In any dimension $n$, the Hodge star $\star : \Omega^k(M) \to \Omega^{n-k}(M)$ maps $k$-forms to $(n-k)$-forms. On 2-forms, $\star$ maps $\Omega^2 \to \Omega^{n-2}$, and this is a self-map only when $n - 2 = 2$, i.e., $n = 4$. It is precisely this self-map property that allows $\star$ to be diagonalised on $\Omega^2$, splitting 2-forms into $\pm 1$ eigenspaces. This splitting is what produces the topological lower bound on the Yang-Mills energy and ultimately powers Donaldson's theorem. The question that motivates this section is: what is the structure of 2-forms in dimension 4, and how does the Hodge star act on them?
[definition: Hodge Star on $\Lambda^2 V$]
Let $V$ be an oriented 4-dimensional inner product space with orthonormal basis $\{e_1, e_2, e_3, e_4\}$. The **Hodge-star operator** $\star : \Lambda^2 V \to \Lambda^2 V$ is defined on basis elements by
\begin{align*}
\star(e_{\sigma(1)} \wedge e_{\sigma(2)}) = \operatorname{sign}(\sigma) \cdot e_{\sigma(3)} \wedge e_{\sigma(4)}
\end{align*}
for each permutation $\sigma \in S_4$.
[/definition]
A direct computation on basis elements shows that $\star \circ \star = \operatorname{id}_{\Lambda^2 V}$, so $\star$ has eigenvalues $\pm 1$. It is worth emphasising that this self-map property is dimension-specific. In dimension 6, for instance, the Hodge star maps $\star : \Omega^2 \to \Omega^4$ and so does not preserve $\Omega^2$ at all. The only degree it preserves there is the middle degree 3, where $\star^2 = (-1)^{3(6-3)} = -1$, so $\star$ has no real eigenvalues on $\Omega^3$. The splitting into $\pm 1$ eigenspaces on 2-forms is therefore a feature unique to dimension 4, and it has no analogue for 2-forms in dimensions 6, 8, or any other dimension.
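The claim $\star \circ \star = \operatorname{id}$ can be checked mechanically. The following is a minimal sketch in plain Python, assuming the ordered basis $e_i \wedge e_j$ ($i < j$) of $\Lambda^2 \mathbb{R}^4$; the helper names `perm_sign`, `hodge_matrix` and `matmul` are illustrative and not part of the notes.

```python
from itertools import combinations

def perm_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    sign = 1
    for a in range(len(p)):
        for b in range(a + 1, len(p)):
            if p[a] > p[b]:
                sign = -sign
    return sign

# Ordered basis of Lambda^2 R^4: the pairs (i, j) with i < j stand for e_i ^ e_j.
BASIS = list(combinations(range(1, 5), 2))

def hodge_matrix():
    """6x6 matrix of the Hodge star on Lambda^2 R^4 with respect to BASIS."""
    index = {pair: n for n, pair in enumerate(BASIS)}
    star = [[0] * 6 for _ in range(6)]
    for (i, j) in BASIS:
        k, l = sorted(set(range(1, 5)) - {i, j})
        # star(e_i ^ e_j) = sign(i, j, k, l) * e_k ^ e_l
        star[index[(k, l)]][index[(i, j)]] = perm_sign((i, j, k, l))
    return star

def matmul(a, b):
    """Product of two square integer matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

STAR = hodge_matrix()
IDENTITY = [[int(r == c) for c in range(6)] for r in range(6)]
star_squared_is_identity = matmul(STAR, STAR) == IDENTITY

# Since star^2 = id, the trace equals dim(+1 eigenspace) - dim(-1 eigenspace).
trace = sum(STAR[n][n] for n in range(6))
dim_plus = (6 + trace) // 2
dim_minus = (6 - trace) // 2
```

Each column of `STAR` records the image of one basis 2-form, so printing it column by column is a quick way to read off the action of $\star$.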
[definition: Self-Dual and Anti-Self-Dual 2-Forms]
Let $(M, g)$ be an oriented Riemannian 4-manifold. The $\pm 1$ eigenspaces of the Hodge-star $\star : \Omega^2(M) \to \Omega^2(M)$ give the orthogonal decomposition
\begin{align*}
\Omega^2(M) = \Omega^2_+(M) \oplus \Omega^2_-(M).
\end{align*}
Elements of $\Omega^2_+(M)$ are called **self-dual** 2-forms ($\star\alpha = \alpha$); elements of $\Omega^2_-(M)$ are called **anti-self-dual** 2-forms ($\star\alpha = -\alpha$).
[/definition]
The decomposition $\Omega^2 = \Omega^2_+ \oplus \Omega^2_-$ is orthogonal with respect to the $L^2$ inner product and depends on both the metric $g$ and the orientation of $M$; reversing orientation swaps $\Omega^2_+$ and $\Omega^2_-$.
To understand these spaces concretely, one can check directly that
\begin{align*}
(\Lambda^2 V)_+ &= \operatorname{span}\{e_1 \wedge e_2 + e_3 \wedge e_4,\; e_1 \wedge e_3 - e_2 \wedge e_4,\; e_1 \wedge e_4 + e_2 \wedge e_3\}, \\
(\Lambda^2 V)_- &= \operatorname{span}\{e_1 \wedge e_2 - e_3 \wedge e_4,\; e_1 \wedge e_3 + e_2 \wedge e_4,\; e_1 \wedge e_4 - e_2 \wedge e_3\}.
\end{align*}
Each space is 3-dimensional, and they are orthogonal with respect to the wedge product: if $\alpha \in (\Lambda^2 V)_+$ and $\beta \in (\Lambda^2 V)_-$, then $\alpha \wedge \beta = 0$.
The Hodge star satisfies a key identity relating the wedge product and the inner product:
\begin{align*}
\alpha \wedge (\star \beta) = g(\alpha, \beta)\cdot \operatorname{vol}_g
\end{align*}
for all $\alpha, \beta \in \Omega^2(M)$. Since $\alpha \wedge (\star\beta)$ is a top-degree form, it must be a scalar multiple of $\operatorname{vol}_g$, and checking on basis elements verifies the constant is $g(\alpha, \beta)$.
[explanation: The Wedge Square Identity]
For any $\alpha \in \Omega^2(M)$, decompose $\alpha = \alpha_+ + \alpha_-$ with $\alpha_\pm = \frac{1}{2}(\alpha \pm \star\alpha) \in \Omega^2_\pm(M)$. Then:
\begin{align*}
\alpha \wedge \alpha &= (\alpha_+ + \alpha_-) \wedge (\alpha_+ + \alpha_-) \\
&= \alpha_+ \wedge \alpha_+ + \alpha_- \wedge \alpha_-.
\end{align*}
The cross terms vanish because $\alpha_+ \wedge \alpha_- = 0$ (self-dual and anti-self-dual forms are $\wedge$-orthogonal). Now, since $\alpha_+ = \star \alpha_+$ and $\alpha_- = -\star\alpha_-$, the identity $\alpha \wedge \star\beta = g(\alpha,\beta)\operatorname{vol}_g$ gives
\begin{align*}
\alpha_+ \wedge \alpha_+ &= \alpha_+ \wedge \star\alpha_+ = |\alpha_+|^2\, \operatorname{vol}_g, \\
\alpha_- \wedge \alpha_- &= -\alpha_- \wedge \star\alpha_- = -|\alpha_-|^2\, \operatorname{vol}_g.
\end{align*}
Therefore
\begin{align*}
\alpha \wedge \alpha = (|\alpha_+|^2 - |\alpha_-|^2)\,\operatorname{vol}_g.
\end{align*}
Combined with $|\alpha|^2 = |\alpha_+|^2 + |\alpha_-|^2$, this shows that $\alpha$ is anti-self-dual (i.e., $\alpha_+ = 0$) if and only if $-\alpha \wedge \alpha = |\alpha|^2\,\operatorname{vol}_g$, a condition that forces $\alpha \wedge \alpha$ to be a pointwise non-positive multiple of $\operatorname{vol}_g$.
[/explanation]
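The wedge square identity can be confirmed on a concrete 2-form. The sketch below works on $\mathbb{R}^4$ with the standard metric and orientation and uses exact rational arithmetic. The sample coefficients and the helper names (`hodge`, `wedge`, `inner`) are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import combinations

BASIS = list(combinations(range(1, 5), 2))  # (i, j) stands for e_i ^ e_j, i < j

def perm_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    sign = 1
    for a in range(len(p)):
        for b in range(a + 1, len(p)):
            if p[a] > p[b]:
                sign = -sign
    return sign

def complement(pair):
    """The complementary index pair (k, l) with k < l."""
    return tuple(sorted(set(range(1, 5)) - set(pair)))

def hodge(alpha):
    """Hodge star of a 2-form stored as {pair: coefficient}."""
    return {complement(p): perm_sign(p + complement(p)) * alpha[p] for p in BASIS}

def wedge(alpha, beta):
    """Coefficient of e1^e2^e3^e4 in alpha ^ beta."""
    return sum(perm_sign(p + complement(p)) * alpha[p] * beta[complement(p)]
               for p in BASIS)

def inner(alpha, beta):
    """Inner product in which the e_i ^ e_j (i < j) are orthonormal."""
    return sum(alpha[p] * beta[p] for p in BASIS)

# Hypothetical sample 2-form; any coefficients would do.
alpha = dict(zip(BASIS, (1, 2, -3, 4, 0, 5)))
star_alpha = hodge(alpha)
alpha_plus = {p: Fraction(alpha[p] + star_alpha[p], 2) for p in BASIS}
alpha_minus = {p: Fraction(alpha[p] - star_alpha[p], 2) for p in BASIS}

cross_term = wedge(alpha_plus, alpha_minus)   # wedge of the two parts
wedge_square = wedge(alpha, alpha)            # coefficient of vol in alpha ^ alpha
norm_difference = inner(alpha_plus, alpha_plus) - inner(alpha_minus, alpha_minus)
norm_sum = inner(alpha_plus, alpha_plus) + inner(alpha_minus, alpha_minus)
```

Here `cross_term` should vanish, `wedge_square` should agree with `norm_difference`, and `norm_sum` should agree with `inner(alpha, alpha)`, matching the three facts used in the explanation.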
## Anti-Self-Dual Connections and the Yang-Mills Minimum
For a connection $A$ on a bundle $E \to M^4$, the curvature decomposes as
\begin{align*}
F_A \in \Omega^2(\operatorname{End}(E)) = \Gamma\!\left(\operatorname{End}(E) \otimes \Lambda^2_+\right) \oplus \Gamma\!\left(\operatorname{End}(E) \otimes \Lambda^2_-\right),
\end{align*}
giving components $F_A^+$ (the self-dual part) and $F_A^-$ (the anti-self-dual part).
[quotetheorem:1562]
[citeproof:1562]
[remark: When the Lower Bound is Negative]
The 4-form $\operatorname{tr}(F_A \wedge F_A)$ is closed, and its class in $H^4_{\mathrm{dR}}(M) \cong \mathbb{R}$ (for connected, compact, oriented $M$) does not depend on $A$, so the topological term $\int_M \operatorname{tr}(F_A \wedge F_A)$ is a characteristic number of $E$. If this characteristic number is negative, then the lower bound is negative while the Yang-Mills energy is non-negative, so equality cannot hold. In this case, the bundle $E$ admits no anti-self-dual connection: no connection achieves the topological minimum.
[/remark]
The energy bound deserves closer examination. Dimension 4 is essential here: the self-dual/anti-self-dual splitting of $\Omega^2$ exists only because $\star$ is a self-map on 2-forms precisely when $\dim M = 4$, so the decomposition $\mathrm{YM}_E(A) = \|F_A^+\|^2 + \|F_A^-\|^2$ has no counterpart in other dimensions. The roles of the two eigenspaces are also tied to the orientation. If the topological class of $E$ satisfies $\int_M \operatorname{tr}(F_A \wedge F_A) > 0$, then anti-self-dual connections are minimisers on $M$ with its given orientation; reversing the orientation flips the sign of this integral and swaps $\Omega^2_+$ with $\Omega^2_-$, so the same connections appear as self-dual minimisers ($F_A^- = 0$) on $\bar{M}$ (the same manifold with reversed orientation). A third caveat is that metric-compatibility is genuinely required: without it, $F_A$ need not be skew-Hermitian, and $-\operatorname{tr}(F_A^2)$ need not be non-negative, so it does not define a fibre norm $|F_A|^2$. These observations point forward to the moduli space of anti-self-dual connections, where the interplay between topology and analysis is most fully realised.
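The pointwise algebra behind the bound can be illustrated at a single point. The sketch below takes an $\mathfrak{su}(2)$-valued 2-form on $\mathbb{R}^4$ and assumes the normalisation $|F|^2 = -\sum_{i<j} \operatorname{tr}(F_{ij}^2)$, chosen to match the fibre norm $-\operatorname{tr}(F_A^2)$ mentioned above. The sample coefficients and helper names are hypothetical.

```python
from itertools import combinations

BASIS = list(combinations(range(1, 5), 2))  # (i, j) stands for e_i ^ e_j, i < j

def perm_sign(p):
    """Sign of a permutation given as a tuple of distinct integers."""
    sign = 1
    for a in range(len(p)):
        for b in range(a + 1, len(p)):
            if p[a] > p[b]:
                sign = -sign
    return sign

def complement(pair):
    return tuple(sorted(set(range(1, 5)) - set(pair)))

def su2(a, b, c):
    """Traceless skew-Hermitian 2x2 matrix with real parameters a, b, c."""
    return ((1j * a, b + 1j * c), (-b + 1j * c, -1j * a))

def mat_mul(x, y):
    return tuple(tuple(sum(x[r][t] * y[t][c] for t in range(2))
                       for c in range(2)) for r in range(2))

def mat_comb(s, x, t, y):
    """The linear combination s*x + t*y of two 2x2 matrices."""
    return tuple(tuple(s * x[r][c] + t * y[r][c] for c in range(2))
                 for r in range(2))

def trace(x):
    return x[0][0] + x[1][1]

def hodge(F):
    """Hodge star acting on the form part of a matrix-valued 2-form."""
    return {complement(p): mat_comb(perm_sign(p + complement(p)), F[p], 0, F[p])
            for p in BASIS}

def norm_sq(F):
    """|F|^2 = - sum over i<j of tr(F_ij^2) (assumed normalisation)."""
    return sum(-trace(mat_mul(F[p], F[p])).real for p in BASIS)

def tr_wedge_square(F):
    """Coefficient of vol in tr(F ^ F)."""
    total = sum(perm_sign(p + complement(p)) * trace(mat_mul(F[p], F[complement(p)]))
                for p in BASIS)
    return total.real

# Hypothetical curvature coefficients F_ij at one point.
F = dict(zip(BASIS, (su2(1, 0, 2), su2(0, 1, -1), su2(2, 1, 0),
                     su2(-1, 0, 1), su2(0, 3, 0), su2(1, 1, 1))))
star_F = hodge(F)
F_plus = {p: mat_comb(0.5, F[p], 0.5, star_F[p]) for p in BASIS}
F_minus = {p: mat_comb(0.5, F[p], -0.5, star_F[p]) for p in BASIS}

energy = norm_sq(F)
topological = tr_wedge_square(F)
norm_plus, norm_minus = norm_sq(F_plus), norm_sq(F_minus)
```

Under these conventions one expects `energy == norm_plus + norm_minus` and `topological == norm_minus - norm_plus`, so `energy - topological` equals `2 * norm_plus`, which vanishes exactly when the self-dual part does.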
[example: ASD Connections on the Round $S^4$]
The prototypical example of an anti-self-dual connection is the basic instanton on the Hopf bundle. Consider the $\mathrm{SU}(2)$ bundle $E \to S^4$ associated to the Hopf fibration $S^7 \to S^4$ with fibre $S^3 \cong \mathrm{SU}(2)$. With the round metric on $S^4$, the natural connection on this bundle (constructed via the quaternionic structure on $\mathbb{R}^4 \cong \mathbb{H}$) has curvature $F_A$ that is purely anti-self-dual: $F_A^+ = 0$. This connection achieves the topological lower bound $\mathrm{YM}_E(A) = \int_{S^4} \operatorname{tr}(F_A \wedge F_A) = 8\pi^2$. The moduli space of all ASD connections on this bundle over $S^4$, taken modulo gauge equivalence, is 5-dimensional, and this 5-manifold together with its compactification is the model for the moduli spaces used in Donaldson's proof of his theorem.
[/example]
Whenever they exist, anti-self-dual connections are therefore absolute minimisers of the Yang-Mills functional among all connections on the bundle $E$. Their defining equation $F_A^+ = 0$ is a first-order PDE (as opposed to the second-order Yang-Mills equation), and crucially it is **elliptic** modulo gauge transformations. Ellipticity makes the moduli space of solutions (modulo gauge equivalence) finite-dimensional, and for a generic metric it is a smooth manifold away from reducible connections. This finiteness is what makes anti-self-dual connections tractable tools for topology.
## Donaldson's Theorem
What topological constraints does the existence of a smooth structure impose on a 4-manifold? In the topological category, Freedman's theorem (1982) shows that every unimodular symmetric bilinear form over $\mathbb{Z}$ is realised as the intersection form of some simply connected topological 4-manifold. The smooth category is far more restrictive: Donaldson used the geometry of anti-self-dual connections to show that, in the smooth setting, a positive-definite intersection form must be diagonalisable over $\mathbb{Z}$. The contrast (every unimodular form is topologically realisable, but among definite forms only the standard diagonal ones are smoothly realisable) is one of the most striking discoveries in modern geometry.
[quotetheorem:1563]
[citeproof:1563]
[remark: Topological 4-Manifolds and the $E_8$ Form]
The theorem has real content because there exist definite unimodular forms over $\mathbb{Z}$ that are not diagonalisable, most famously the $E_8$ form. Remarkably, Freedman had proved (in 1982) that every unimodular symmetric bilinear form arises as the intersection form of some simply connected topological 4-manifold. Donaldson's theorem then implies that the $E_8$ topological 4-manifold admits no smooth structure: it is a topological manifold that cannot be smoothed. The assumption that the intersection form is definite is also essential: $S^2 \times S^2$ is smooth, yet its intersection form is indefinite and not diagonalisable over $\mathbb{Z}$. The gap between topological and smooth manifolds revealed here is particularly dramatic in dimension 4.
[/remark]
Several hypotheses in Donaldson's theorem are indispensable to the argument. Simple-connectedness ($\pi_1(M) = 1$) is used to ensure that gauge transformations have a tractable homotopy theory, so that the moduli space $\mathcal{M}_{\mathrm{ASD}}$ is a manifold rather than an orbifold; without it, the cobordism argument breaks down. Compactness of $M$ is needed to guarantee that the Yang-Mills functional achieves its infimum and that the moduli space is compact after adding ideal-instanton boundary components. The theorem also assumes positive definiteness: the indefinite case (where the intersection form has both positive and negative eigenvalues) is not covered by this argument, and indefinite forms behave differently. Every odd indefinite unimodular form is diagonalisable over $\mathbb{Z}$ and hence smoothly realised by a connected sum of copies of $\mathbb{C}P^2$ and $\overline{\mathbb{C}P^2}$, while even indefinite forms are subject to further constraints such as Rokhlin's theorem. Freedman's topological realisation theorem is the essential complement: together, the two results give a near-complete picture of which unimodular forms arise as smooth versus topological intersection forms in dimension 4.
Donaldson's theorem is the culmination of the course: it shows how the abstract machinery of connections, curvature, and the Yang-Mills functional — developed systematically over the preceding chapters — can yield concrete, non-trivial information about the topology of manifolds. The interplay between analysis (ellipticity, moduli spaces of solutions to PDEs) and topology (cobordism arguments, intersection forms) exemplified here is one of the defining features of modern differential geometry.
## References
- Minter, P. *Differential Geometry (Part III)*. Lecture notes, Cambridge University, Michaelmas 2017. Lectures by Ivan Smith.
Created by admin on 4/24/2026 | Last updated on 4/24/2026