Graph theory studies the combinatorial structure of networks through vertices and edges, asking fundamental questions about connectivity, colouring, and extremal properties. This course develops the core theory underlying modern combinatorics, beginning with essential objects like trees and planar graphs before advancing to deep structural results. The Cambridge II curriculum emphasises three interconnected themes: min-max duality (where upper and lower bounds on graph properties meet in equilibrium), the probabilistic method (where existence proofs avoid explicit construction), and the surprising power of algebraic invariants in capturing combinatorial phenomena.
The course progresses from local to global understanding. Early chapters establish foundations: how connectivity and matching relate through Hall's theorem and Menger's theorem, and how the chromatic number — a fundamental measure of graph structure — appears across colouring, extremal theory, and random graph models. The extremal results of Chapter 4 answer the question "how many edges can a graph have while avoiding a given subgraph," a line of inquiry culminating in Turán's theorem and feeding directly into Ramsey theory. Chapters 5 and 6 reveal the probabilistic method as a tool for proving existence results that constructive approaches cannot reach, yielding both Ramsey numbers and the surprising fact that random graphs can simultaneously have large girth and chromatic number.
The final chapter reframes graph theory through linear algebra and spectral methods, showing that eigenvalues of the adjacency matrix encode deep information about graph structure. This algebraic perspective not only unifies earlier results but opens connections to geometry and algebra that extend far beyond the combinatorial plane. Throughout, the course demonstrates how the same graph-theoretic objects — trees, colourings, dense subgraphs — appear across vastly different contexts, each yielding new insights into what makes a graph "typical," "extremal," or "structured."
# 1. Introduction
This chapter lays the combinatorial foundations of graph theory. We introduce the basic objects — graphs, their substructures, and the structural properties that distinguish them — and study three fundamental classes: trees, bipartite graphs, and planar graphs. The central questions are: what does connectivity look like locally and globally, how do acyclic structures characterise spanning subgraphs, and what constraints does planarity impose on edge counts?
## Basic Definitions and Graph Families
[motivation]
### Why graphs?
Many problems in combinatorics, computer science, and geometry reduce to questions about pairwise relationships between objects. A graph is the minimal abstraction capturing exactly this: a set of objects (vertices) and a set of pairs (edges). By stripping away everything except the adjacency structure, graph theory isolates the combinatorial essence of connectivity, reachability, and layout problems.
The power of the abstraction lies in what it ignores. A graph does not know about the physical positions of its vertices, the lengths of its edges, or any weighting. Two graphs are the same combinatorial object if there is a bijection between their vertex sets preserving adjacency — a graph isomorphism. This means we study structure, not representation.
[/motivation]
We fix some notation that will be used throughout the course. Write $[n] = \{1, \ldots, n\}$ for the set of the first $n$ positive integers. For a set $X$ and $k \in \mathbb{N}$, write $X^{(k)} = \{Y \subseteq X \mid |Y| = k\}$ for the collection of all $k$-element subsets of $X$.
[definition: Graph]
A **graph** is a pair $G = (V, E)$, where $V$ is a set of **vertices** and $E \subseteq V^{(2)}$ is a set of **edges** — that is, $E$ is a collection of 2-element subsets of $V$. We write $V(G)$ for the vertex set and $E(G)$ for the edge set. The **order** of $G$ is $|G| = |V(G)|$, and the **size** is $e(G) = |E(G)|$.
[/definition]
Edges are unordered pairs: $\{u, v\} = \{v, u\}$. There are no loops (edges from a vertex to itself) and no multiple edges between the same pair of vertices — this is a **simple** graph. We sometimes write $uv$ as shorthand for the edge $\{u, v\}$, so for instance $13$ denotes the edge $\{1, 3\}$. Most graphs in this course are finite, meaning $|V(G)|$ is finite.
Before listing families, we note that $K_3$ and $C_3$ are the same graph by definition — the cycle $C_3$ is the complete graph $K_3$. This is a first sign that the same combinatorial object can arise through different constructions, and that what matters is the adjacency structure, not the label attached to it. The formal notion making this precise is graph isomorphism, defined below.
The following families serve as running examples and benchmarks throughout the course. Their names encode specific structural features that we will recognise again and again.
**Complete graph $K_n$**: the graph with $V = [n]$ and $E = V^{(2)}$, so every pair of vertices is joined by an edge. It has $\binom{n}{2}$ edges. The complete graph is the densest simple graph on $n$ vertices.
**Empty graph $\overline{K}_n$**: the graph with $V = [n]$ and $E = \varnothing$. No two vertices are adjacent. It sits at the opposite extreme from $K_n$.
**Path $P_n$**: the graph with vertex set $V = [n+1]$ and edge set $E = \{\{1,2\}, \{2,3\}, \ldots, \{n, n+1\}\}$. It has $n$ edges connecting $n+1$ vertices in a line. Paths are the simplest trees; they will appear as the extremal case in several results.
**Cycle $C_n$**: the graph with vertex set $V = [n]$ and edge set $E = \{\{1,2\}, \{2,3\}, \ldots, \{n-1,n\}, \{n,1\}\}$. It has $n$ vertices and $n$ edges forming a closed loop. Even and odd cycles behave very differently — odd cycles are the fundamental obstruction to bipartiteness.
These four families are not merely illustrations: $K_n$ maximises edges, $P_n$ and $C_n$ are the simplest connected and cyclic structures, and their parity (even vs. odd) turns out to govern deep properties of the graphs that contain them.
[definition: Neighbourhood and Degree]
Let $G$ be a graph and $x \in V(G)$. The **neighbourhood** of $x$ in $G$ is
\begin{align*}
N_G(x) = \{y \in V(G) \mid \{x, y\} \in E(G)\}.
\end{align*}
If $y \in N_G(x)$, we say $x$ and $y$ are **adjacent** and write $x \sim y$. The **degree** function is the map $\deg: V(G) \to \mathbb{N}$ assigning to each vertex $x$ the size of its neighbourhood: $\deg(x) = |N_G(x)|$. We write $\delta(G)$ for the minimum degree and $\Delta(G)$ for the maximum degree of $G$.
[/definition]
Note that $\sim$ is irreflexive ($x \not\sim x$ since there are no loops) and symmetric, but not transitive in general — adjacency does not propagate along paths.
Once we have a language for individual graphs and their local structure, a natural organising question arises: when should we regard two graphs as being "really the same"? The vertex labels are arbitrary — any two drawings of what we would informally call "the same graph" differ only in how the vertices happen to be named. To make this idea precise, we need a bijection between vertex sets that preserves the adjacency relation. This leads to the central notion of graph isomorphism, which underpins the rest of the course: statements about graphs are always statements about isomorphism classes, never about particular labellings.
[definition: Graph Isomorphism]
Let $G, H$ be graphs. A **graph isomorphism** is a bijection $\varphi: V(G) \to V(H)$ such that
\begin{align*}
\{u, v\} \in E(G) \iff \{\varphi(u), \varphi(v)\} \in E(H).
\end{align*}
We say $G$ and $H$ are **isomorphic**, written $G \cong H$, if such a bijection exists.
[/definition]
Isomorphism captures the idea that two graphs are "the same" up to relabelling of vertices. The complete graph $K_n$ is, up to isomorphism, unique for each $n$ — the labelling of vertices is irrelevant to the combinatorial structure.
[definition: Subgraph and Induced Subgraph]
A graph $H$ is a **subgraph** of $G$, written $H \subseteq G$, if $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$.
For $X \subseteq V(G)$, the **induced subgraph** $G[X]$ is the graph with vertex set $X$ and edge set $\{xy \in E(G) \mid x \in X, y \in X\}$ — it retains all edges of $G$ with both endpoints in $X$.
For $x \in V(G)$, write $G - x = G[V(G) \setminus \{x\}]$. For an edge $xy \in E(G)$, write $G - xy = (V(G), E(G) \setminus \{xy\})$, and for $x, y \in V(G)$ with $xy \notin E(G)$, write $G + xy = (V(G), E(G) \cup \{xy\})$.
[/definition]
## Walks, Paths, and Connectivity
To reason about reachability in a graph, we need to distinguish between sequences of vertices that may revisit points (walks) and those that do not (paths).
[definition: Walk and Path]
Let $x, y \in V(G)$. A **walk** from $x$ to $y$ in $G$ is a sequence of vertices $(x = v_0, v_1, \ldots, v_k = y)$ such that $\{v_i, v_{i+1}\} \in E(G)$ for all $0 \le i < k$. The **length** of the walk is $k$ (the number of edges traversed).
A **path** from $x$ to $y$ is a walk in which all vertices are distinct.
[/definition]
The concatenation of two walks $P$ and $P'$ (when the terminal vertex of $P$ equals the initial vertex of $P'$) is written $PP'$. The concatenation of two walks is again a walk. The concatenation of two paths need not be a path, since the two paths may share a vertex.
[quotetheorem:2007]
[citeproof:2007]
This proposition is small but essential: it lets us pass freely from the existence of a walk to the existence of a path, which is a much more structured object. The hypothesis $x \neq y$ is genuinely needed here — the one-vertex walk at a single vertex $x$ is not a path of positive length, and the statement would be false for $x = y$.
The connected components of a graph partition its vertices into maximal subsets between which paths exist.
[definition: Connected Graph]
A graph $G$ is **connected** if for every pair of vertices $x, y \in V(G)$, there exists an $x$–$y$ path in $G$.
[/definition]
Once we know two vertices can be connected by some path, it is natural to ask how efficiently — leading to the notion of distance.
[definition: Distance]
Let $G$ be a connected graph and $x, y \in V(G)$. The **distance** $d(x, y)$ is the length of the shortest $x$–$y$ path in $G$.
[/definition]
One checks that $d$ satisfies the metric axioms: $d(x,x) = 0$, $d(x,y) = d(y,x)$, and the triangle inequality $d(x,z) \le d(x,y) + d(y,z)$. Thus the vertex set of a connected graph carries a natural metric structure.
## Trees
Given a connected graph, one can ask: what is the minimal edge set that still preserves connectivity? Removing any edge from such a minimal structure disconnects the graph — every edge is a bridge. These minimal connected graphs are precisely the trees, and they turn out to be characterised simultaneously by minimality of connectivity and maximality of acyclicity.
Trees are important both intrinsically and as a tool: every connected graph contains a spanning tree, which provides a framework for inductive arguments. The absence of cycles in a tree is what makes induction so clean — removing a leaf reduces the problem by a single vertex without breaking connectivity.
[definition: Acyclic Graph and Tree]
A graph $G$ is **acyclic** if it contains no cycle $C_k$ as a subgraph. A graph $G$ is a **tree** if it is both acyclic and connected.
[/definition]
Trees can be characterised in several equivalent ways, each illuminating a different facet of their structure.
[quotetheorem:2008]
[citeproof:2008]
[remark: Interpretation of the Characterisations]
The three characterisations illuminate trees from different perspectives. Minimal connectivity says a tree is the thriftiest connected graph — remove any single edge and connectivity is lost. Maximal acyclicity says a tree fills in all possible connections that do not create cycles. Together they show trees occupy a precise boundary between "not enough edges" and "too many edges."
[/remark]
The next step is to understand what trees look like locally. Every tree with at least two vertices has at least one vertex of degree 1 — a vertex connected to the rest of the tree by a single edge. These are the leaves.
[definition: Leaf]
Let $T$ be a tree. A **leaf** of $T$ is a vertex $v \in V(T)$ with $\deg(v) = 1$.
[/definition]
Leaves are more than just a structural curiosity — they are the induction handles that make proofs about trees tractable. The following result guarantees their existence.
[quotetheorem:2009]
[citeproof:2009]
[illustration:trees-have-leaves-longest-path]
The hypothesis $|T| \ge 2$ is essential: a single-vertex tree has no edges and its unique vertex has degree 0, so it cannot have a leaf in the sense above. The bound "at least two" is tight — a path $P_n$ has exactly two leaves (its endpoints), so there is no room to improve the lower bound. The importance of this result goes beyond the count: the leaf $x_k$ identified in the proof is the induction anchor for the next theorem, where removing a leaf reduces the tree to a smaller tree.
[quotetheorem:2010]
[citeproof:2010]
The edge count $n - 1$ is a sharp invariant of trees. For contrast: any connected graph on $n$ vertices with a cycle has at least $n$ edges, because each independent cycle adds one edge beyond what is needed for connectivity. The difference $|E(G)| - (n - 1)$ counts the number of independent cycles in $G$, a quantity known as the **cycle rank** or first Betti number of $G$. The edge count of trees is therefore the base case of a more general formula; for trees the cycle rank is 0, and every additional edge beyond $n - 1$ creates exactly one new independent cycle.
[definition: Spanning Tree]
Let $G$ be a connected graph. A **spanning tree** of $G$ is a subgraph $T \subseteq G$ with $V(T) = V(G)$ that is a tree.
[/definition]
A spanning tree retains all vertices but only enough edges to maintain connectivity — it discards all redundant (cyclic) edges while keeping the graph together.
[quotetheorem:2011]
[citeproof:2011]
[remark: Use of Spanning Trees]
The existence of spanning trees is the basis for many inductive arguments in graph theory. When proving a property of all connected graphs, it often suffices to prove it for trees (by induction on $n - 1 = |E(T)|$) and then argue that adding edges back preserves or can be handled separately.
The removal argument in the proof above is precisely the naive cycle-breaking algorithm: find any cycle, delete any edge from it, repeat until no cycles remain. Equivalently, one can grow a spanning tree from the bottom up — start with a single vertex and repeatedly add an edge that connects a new vertex without creating a cycle. This growth template is the skeleton of both Kruskal's and Prim's algorithms for minimum spanning trees, which refine the choice of "next edge" by edge weight.
If we drop the connectedness hypothesis, every connected component gets its own spanning tree, and their disjoint union is a **spanning forest** of $G$ — a maximal acyclic subgraph containing every vertex, with exactly $|V(G)| - c$ edges where $c$ is the number of components.
Spanning trees are not unique: a connected graph typically has many of them. The number of spanning trees of a graph $G$ is given by the **Matrix-Tree theorem**, which expresses it as any cofactor of the graph's Laplacian matrix. For the complete graph $K_n$ this specialises to **Cayley's formula** $n^{n-2}$ — a result we will meet again when we study enumerative aspects of trees.
[/remark]
## Bipartite Graphs
Can we assign one of two "colours" to every vertex of a graph so that no two adjacent vertices share a colour? This 2-colouring question arises naturally in scheduling (grouping tasks so that conflicting tasks fall on different days), matching problems (pairing workers to jobs with no worker assigned two jobs), and the study of graph cycles. The graphs that admit such a colouring are precisely the bipartite graphs, and the obstruction turns out to be purely structural: a graph fails to be 2-colourable if and only if it contains an odd cycle.
[definition: Bipartite Graph]
A graph $G = (V, E)$ is **bipartite** if $V = A \cup B$ with $A \cap B = \varnothing$ such that every edge $\{x, y\} \in E$ satisfies $x \in A, y \in B$ or $x \in B, y \in A$. The pair $(A, B)$ is called a **bipartition** of $G$.
[/definition]
The sets $A$ and $B$ are independent sets: no two vertices within the same part are adjacent. The complete bipartite graph $K_{n,m}$ is the bipartite graph with $|A| = n$, $|B| = m$, and all $nm$ possible edges between $A$ and $B$.
[example: Even and Odd Cycles]
The cycle $C_{2n}$ (even length) is bipartite: partition vertices into those at even distance from vertex 1 and those at odd distance. The cycle $C_{2n+1}$ (odd length) is not bipartite. To see why, suppose we try a bipartition and colour vertex 1 with colour $A$. Traversing the cycle forces alternating colours $A, B, A, B, \ldots$; after $2n+1$ steps we return to vertex 1 having been forced to use colour $B$ — a contradiction. This observation generalises: odd cycles are the obstruction to bipartiteness.
[/example]
The connection between bipartiteness and odd cycles is made precise by one of the fundamental theorems of graph theory.
[definition: Circuit]
A **circuit** in $G$ is a sequence of vertices $x_1, x_2, \ldots, x_\ell, x_1$ (returning to the start) such that $x_i x_{i+1} \in E(G)$ for each $i$, with $x_{\ell+1} = x_1$. This is a **closed walk**. Its **length** is $\ell$. A circuit is **odd** if $\ell$ is odd and **even** if $\ell$ is even.
[/definition]
A circuit is the natural closed object produced when two paths from a common source to a common destination are concatenated: if $P$ is an $x$–$y$ path and $Q$ is a $y$–$x$ path, then $PQ$ is a circuit (a closed walk), even if it repeats vertices. The distinction between circuits and cycles — the latter requiring all vertices distinct — is important in the next proof, where we extract a genuine cycle from a closed walk that may have repeated vertices.
[quotetheorem:2012]
[citeproof:2012]
This result functions as a black box inside the bipartite characterisation proof: we do not need to construct an odd cycle directly from the BFS paths; we only need to produce an odd closed walk (circuit), and the theorem above guarantees an odd cycle exists inside it.
The "odd" hypothesis is essential. If we drop it, the conclusion can fail: an even closed walk need not contain any even cycle. Consider two triangles sharing a single vertex $v$ (a "bowtie" graph). Traversing the first triangle and then the second gives a closed walk of length 6, yet the only cycles in this graph are the two triangles themselves, both of odd length 3. More generally, the parity of a closed walk is the parity of the sum of the lengths of the cycles that make up its decomposition (with the walk traversing each cycle some number of times and possibly repeating edges). An odd circuit forces at least one summand of odd length, hence at least one odd cycle; an even circuit imposes no such constraint, since an even sum can be built from any number of odd or even summands.
[quotetheorem:2013]
[citeproof:2013]
[remark: Distance Parity and Bipartitions]
The proof gives an explicit construction of the bipartition: sort vertices by the parity of their distance from a fixed root. This "BFS-level parity" perspective is useful in algorithmic graph theory, and it translates directly into a linear-time bipartiteness test. Run a breadth-first search from any vertex $x_0$, assigning each vertex a colour according to the parity of its BFS depth (even depth to $V_0$, odd depth to $V_1$). Because BFS depth equals graph distance $d(x, x_0)$, tree edges always connect vertices of opposite parity; we only need to inspect non-tree edges. If every non-tree edge also goes between the two colour classes, the partition witnesses bipartiteness. If some non-tree edge joins two vertices of the same colour, the proof above produces an odd closed walk — and by the previous theorem, an explicit odd cycle, which is a certificate of non-bipartiteness.
[/remark]
[illustration:bipartite-bfs-levels]
## Planar Graphs
When laying out an electrical circuit or drawing a network, a fundamental question is whether the connections can be routed without any two wires crossing. This is not merely an aesthetic concern: crossings in circuit boards can cause electrical interference, and in graph drawings they obscure structure. The graphs that admit crossing-free drawings are the planar graphs, and their study reveals a striking interplay between topology and combinatorics.
[definition: Plane Graph and Planar Graph]
A **plane graph** is a drawing of a graph in the plane $\mathbb{R}^2$ representing edges as piecewise-linear curves, with no two edges crossing except possibly at shared endpoints.
A graph $G$ is **planar** if it admits a plane graph representation.
[/definition]
[illustration:k4-planar-embedding]
[example: Planar and Non-planar Graphs]
The graphs $K_1, K_2, K_3$ are planar in the degenerate sense — their vertices can be placed anywhere with edges as straight lines and no crossings arise. For $K_4$: place three vertices at the corners of a triangle, and the fourth vertex at the center of the triangle. Connect each corner to the center with three straight edges running through the interior. The three edges of the outer triangle and the three edges to the center give all $\binom{4}{2} = 6$ edges of $K_4$ without crossings.
The path $P_n$ is planar for all $n \in \mathbb{N}$. The complete bipartite graph $K_{n,2}$ is planar: place the two vertices of the 2-part, say $a$ and $b$, at the top and bottom of the drawing. Place the $n$ vertices of the other part in a horizontal row between them. Draw the $n$ edges from $a$ straight down to each vertex, and the $n$ edges from $b$ straight up to each vertex. All edges go between layers with no crossings.
The complete graph $K_5$ is not planar, as we will see from Euler's formula. Surprisingly, $K_{3,3}$ is also not planar, but this does not follow from the same edge-count argument — we need a refined bound.
[/example]
[definition: Faces of a Plane Graph]
Let $G$ be a plane graph. The complement $\mathbb{R}^2 \setminus G$ has finitely many connected components, each called a **face** of $G$. Exactly one face is unbounded, called the **outer face**. The **boundary** of a face $F$ is the set $\partial F$ of vertices and edges of $G$ that lie on the boundary of the region.
[/definition]
For example, $K_4$ drawn as in the example above has four faces: three inner triangular regions and one outer (unbounded) face.
[remark: Faces and Graph Structure]
The boundary of a face need not be a cycle, and need not be connected. Two different plane drawings of the same graph can have topologically different face structures. However, the *number* of faces is determined by the graph's combinatorial structure via Euler's formula.
[/remark]
The number of faces, edges, and vertices of a connected plane graph are not independent — they are tied together by Euler's formula, one of the oldest and deepest results in combinatorics.
[quotetheorem:2014]
[citeproof:2014]
Connectedness is essential in Euler's formula. For a plane graph with $c$ connected components, the correct statement is $|V(G)| - |E(G)| + F = 1 + c$: the formula fails for disconnected graphs because different components contribute separate face counts that do not combine as in the inductive step. A further observation: although the number of faces can vary between different plane drawings of the same graph, Euler's formula guarantees that $F = 2 - |V(G)| + |E(G)|$ is a fixed combinatorial invariant — any two plane drawings of the same connected graph have the same number of faces. The quantity $|V| - |E| + F$ is the **Euler characteristic** of the drawing surface; for drawings in the plane (equivalently, on the sphere $S^2$), it equals 2, reflecting the topology of the 2-sphere.
Euler's formula has immediate corollaries bounding edge density for planar graphs.
[quotetheorem:2015]
[citeproof:2015]
The hypothesis $|G| \ge 3$ is needed because for $|G| \le 2$ every face has degree at most 2 (a single edge bounds the same face on both sides), so the face-degree estimate $3F \le 2e(G)$ breaks down. The bound is tight: triangulations — plane graphs where every face (including the outer face) is a triangle — achieve $e(G) = 3|G| - 6$. Adding the edge $xy$ to a non-adjacent pair $x, y$ in a triangulation would necessarily create a crossing, since every face is already a triangle with no room for a chord. The argument rests on the key lemma "each face has degree at least 3," which captures the combinatorial content of $|G| \ge 3$; any relaxation of this would allow faces of degree 1 or 2, widening the bound.
[remark: Non-planarity of K5]
For $K_5$: $e(K_5) = 10$ and $3|K_5| - 6 = 9$. Since $10 > 9$, $K_5$ violates the bound and is therefore not planar.
For $K_{3,3}$: $e(K_{3,3}) = 9$ and $3|K_{3,3}| - 6 = 12$. The bound is not violated! A stronger version is needed.
[/remark]
To handle $K_{3,3}$, we use the fact that it is bipartite and hence triangle-free — every face must have degree at least 4.
[quotetheorem:2016]
[citeproof:2016]
For $K_{3,3}$: $e(K_{3,3}) = 9$ and $2(|K_{3,3}| - 2) = 8$. Since $K_{3,3}$ is bipartite it has no triangles, yet $9 > 8$ — so $K_{3,3}$ is not planar.
The bound $2n - 4$ is tight: it is saturated by triangle-free planar graphs in which every face has degree exactly 4 — so-called **quadrangulations**. A canonical example is the 3-dimensional hypercube $Q_3$, with $n = 8$ vertices, $12$ edges, and six 4-cycle faces; indeed $12 = 2 \cdot 8 - 4$. This bound is strictly weaker than the general planar bound $3n - 6$ because forbidding triangles forces the shortest face boundary to grow from 3 to 4, which in turn cuts down the edge-to-face ratio. The pattern generalises to any girth constraint: a planar graph with girth at least $g$ (smallest cycle length $\ge g$) satisfies
\begin{align*}
e(G) \le \frac{g}{g-2}(|G| - 2).
\end{align*}
For $g = 3$ this recovers $3n - 6$; for $g = 4$ (triangle-free) it gives $2n - 4$; for $g = 5$ (triangle- and square-free) it gives $\frac{5}{3}(n - 2)$. Higher girth squeezes the edge count tighter, reflecting the topological cost of large girth in a planar embedding.
These edge-count arguments show that certain graphs cannot be planar, but they cannot directly characterise planarity for all graphs. The complete characterisation is given by Kuratowski's theorem.
[definition: Subdivision]
A **subdivision** of a graph $G$ is a graph obtained by replacing some edges of $G$ with internally vertex-disjoint paths. Formally, each edge $\{u,v\}$ that is subdivided is replaced by a path $u = w_0, w_1, \ldots, w_k = v$ where $w_1, \ldots, w_{k-1}$ are new vertices not in $V(G)$.
[/definition]
[remark: Subdivisions and Planarity]
A subdivision of a planar graph is planar (inserting extra vertices along edges preserves any plane drawing). Conversely, a subdivision of a non-planar graph is non-planar. In particular, if $G$ contains a subgraph that is a subdivision of $K_5$ or $K_{3,3}$, then $G$ is not planar.
[/remark]
The remarkable converse — that these two obstructions are the *only* obstructions — is Kuratowski's theorem.
[quotetheorem:2017]
The forward direction (planar implies no such subdivision) follows because subdivisions of $K_5$ and $K_{3,3}$ are themselves non-planar. The reverse direction — that the absence of these two obstructions is sufficient for planarity — is the deep part of the theorem. The proof uses techniques from topological graph theory and is not covered in this course.
[remark: Kuratowski as a Finite Obstruction Theorem]
Kuratowski's theorem is an instance of a finite obstruction result: planarity is a property that fails precisely because of two specific "bad" substructures. This philosophy — characterising a graph property by a finite set of minimal obstructions — appears throughout graph theory and is related to the Robertson–Seymour graph minors theorem.
[/remark]
The characterisation of graph properties by finite sets of minimal obstructions that we glimpsed through Kuratowski's theorem now becomes central to understanding graph structure. The next chapter develops two fundamental themes — matching and connectivity — that reveal how to decompose and robustly preserve graph structure, providing both conceptual frameworks and efficient algorithms for what are among the most practically important problems in the field.
# 2. Connectivity and Matching
This chapter develops two central structural themes in graph theory: when and how you can match the vertices of one set bijectively into another, and how robustly a graph stays connected under the removal of vertices or edges. These questions — seemingly different — are unified by an underlying min-max philosophy: the obstruction to a large structure (matching, path system) is always a small witness (neighbourhood deficiency, separator). Building on the basic graph-theoretic definitions from Chapter 1, the chapter culminates in Hall's theorem for matchings and Menger's theorem for connectivity, both cornerstones of combinatorics with wide applications.
## Matchings in Bipartite Graphs
To see why matching is a non-trivial problem, consider three vertices $x_1, x_2$ on the left and a single vertex $y$ on the right, with both $x_1$ and $x_2$ adjacent only to $y$. Any matching from $\{x_1, x_2\}$ to $\{y\}$ would need to assign a distinct right-neighbour to each of $x_1$ and $x_2$, but there is only one right-vertex available. More precisely, the set $A = \{x_1, x_2\}$ satisfies $|N(A)| = 1 < 2 = |A|$: the neighbourhood of $A$ is too small to absorb all of $A$. This simple "two men, one woman" obstruction turns out to be the only possible obstruction to a matching, a fact made precise by Hall's theorem below.
A **bipartite graph** $G = (X \sqcup Y, E)$ has vertex set partitioned into two disjoint classes $X$ and $Y$, with every edge running between the two sides. The natural question is: can we assign to each vertex in $X$ a distinct neighbour in $Y$?
[definition: Matching From X to Y]
Let $G = (X \sqcup Y, E)$ be a bipartite graph. A **matching from $X$ to $Y$** is a set of edges $E' \subseteq E$ of the form $\{xy_x \mid x \in X\}$ such that the assignment $x \mapsto y_x$ is injective — every vertex in $X$ is matched to a distinct vertex of $Y$.
[/definition]
The existence of such a matching is impossible if some subset $A \subseteq X$ has too few neighbours: if $|N(A)| < |A|$, there are not enough vertices in $Y$ near $A$ to accommodate all of $A$ in the matching. Hall's theorem says this obstruction is the only obstruction.
To state it precisely, we record the neighbourhood notation:
[definition: Neighbourhood of a Set]
Let $G$ be a graph and $A \subseteq V(G)$. The **neighbourhood** of $A$ is
\begin{align*}
N_G(A) = \bigcup_{x \in A} N(x),
\end{align*}
the set of all vertices adjacent to at least one vertex in $A$. When the graph is clear from context we write $N(A)$.
[/definition]
The neighbourhood condition is the key tool for verifying (or ruling out) matchings. Before stating the main theorem, it is worth noting that checking Hall's condition in general requires examining all $2^{|X|}$ subsets of $X$, which is computationally expensive. Hall's theorem nonetheless gives a complete characterisation of when a matching exists.
[quotetheorem:2018]
[citeproof:2018]
Hall's theorem has a natural quantitative extension. When a full matching is impossible, we want to know the best we can do.
[definition: Matching of Deficiency d]
A **matching of deficiency $d$** from $X$ to $Y$ is a matching from some $X' \subseteq X$ to $Y$ where $|X'| = |X| - d$. In other words, at most $d$ vertices of $X$ are left unmatched.
[/definition]
The Defect Hall theorem tells us exactly when a matching of deficiency $d$ exists. The key idea is to reduce this to standard Hall by appending $d$ dummy vertices on the $Y$-side, each connected to everything in $X$: the dummy vertices act as "wildcards" that can absorb up to $d$ unmatched vertices from $X$. This reduction is robust — the dummy-vertex trick works for any $d \geq 0$, and the case $d = 0$ recovers Hall's theorem exactly. Notice that when $d = |X|$, the condition becomes $|A| \leq |N(A)| + |X|$, which holds for every $A$ automatically, so a matching of deficiency $|X|$ always exists (the empty matching). This is consistent: the deficiency condition is non-vacuous only when $d < |X|$.
[quotetheorem:2019]
[citeproof:2019]
The dummy-vertex trick above is the prototype of a broader technique: embed a constrained matching problem into a larger graph where standard Hall applies. The defect version connects naturally to König's theorem, which characterises the maximum matching size in a bipartite graph via the minimum vertex cover, and which can also be deduced from Defect Hall by optimising over $d$.
### Regular Bipartite Graphs Always Have a Matching
Checking Hall's condition requires examining all $2^{|X|}$ subsets — impractical for even moderate graphs. Are there natural structural conditions that guarantee a matching without the full condition-check? Regularity provides one such condition.
[definition: Regular Graph]
The **maximum degree** $\Delta(G)$ and **minimum degree** $\delta(G)$ of a graph $G$ are the maximum and minimum degrees taken over all vertices. The graph $G$ is **$k$-regular** if every vertex has degree exactly $k$, i.e.\ $\delta(G) = \Delta(G) = k$.
[/definition]
[quotetheorem:2020]
[citeproof:2020]
Several remarks sharpen this result. First, $k \geq 1$ is genuinely necessary: the empty graph ($k = 0$) has no edges, so certainly no matching. Second, $k$-regularity forces $|X| = |Y|$: counting the $k|X|$ edges from the $X$-side and the $k|Y|$ edges from the $Y$-side gives $k|X| = e(G) = k|Y|$, so the matching is actually a perfect matching. Third, regularity cannot be dropped entirely: $K_{1,2}$ (one vertex on the left adjacent to two on the right) is bipartite but not regular; the two-element set $A = \{$ both right-vertices $\}$ has $|N(A)| = 1 < 2 = |A|$, so Hall's condition fails and no matching from $Y$ to $X$ exists. Fourth, this result connects forward to König's edge-colouring theorem: a $k$-regular bipartite graph has a proper $k$-edge-colouring (its edges can be partitioned into $k$ perfect matchings), a fact that requires finding a perfect matching and then proceeding by induction on $k$.
### A Group-Theoretic Application
Hall's theorem has a striking application to coset representatives in finite group theory.
[example: Common Coset Representatives]
Let $\Gamma$ be a finite group and $H \leq \Gamma$ a subgroup of index $n$. Let $L_1, \ldots, L_n$ be the left cosets and $R_1, \ldots, R_n$ the right cosets of $H$. We claim there exist elements $g_1, \ldots, g_n \in \Gamma$ such that $g_1 H, \ldots, g_n H$ are the $n$ left cosets and $H g_1, \ldots, H g_n$ are the $n$ right cosets simultaneously — a **simultaneous system of coset representatives**.
Construct the bipartite graph $G = (\{L_1, \ldots, L_n\} \sqcup \{R_1, \ldots, R_n\}, E)$ where $L_i \sim R_j$ if and only if $L_i \cap R_j \neq \varnothing$. A matching in $G$ gives the desired representatives: each matched pair $(L_i, R_j)$ contributes any $g \in L_i \cap R_j$, which simultaneously represents $L_i$ as a left coset and $R_j$ as a right coset.
To verify Hall's condition, let $A = \{L_{i_1}, \ldots, L_{i_k}\} \subseteq \{L_1, \ldots, L_n\}$. The union $L_{i_1} \cup \cdots \cup L_{i_k}$ has size $k|H|$. Since the right cosets partition $\Gamma$ into classes of size $|H|$, at least $k$ right cosets must intersect this union — more precisely, $|\bigcup_\ell L_{i_\ell}| = k|H|$ and each right coset has exactly $|H|$ elements, so we need at least $k$ right cosets. Hence $|N(A)| \geq k = |A|$, and Hall's condition is satisfied.
In fact, $G$ is $|H|$-regular: each left coset $gH$ of size $|H|$ meets exactly $|H|$ distinct right cosets of $H$, because each element $g' \in gH$ lies in a unique right coset $Hg'$, and as $g'$ ranges over the $|H|$ elements of $gH$ these cosets are all distinct (if $Hg' = Hg''$ then $g'^{-1}g'' \in H$, but both $g',g'' \in gH$ means $g'^{-1}g'' \in H$ exactly when $g' = g''$). So the corollary on regular bipartite graphs also applies directly.
[/example]
## Vertex Connectivity
With matchings in hand, the chapter turns to connectivity: how many vertices must be removed to disconnect a graph, or to reduce it to a single vertex? This question captures the robustness of a network.
Let $S \subseteq V(G)$. We write $G - S = G[V(G) \setminus S]$ for the subgraph obtained by deleting all vertices in $S$ and all edges incident to them.
[definition: Connectivity]
Let $G$ be a graph with $|G| \geq 1$. The **connectivity** $\kappa(G)$ is
\begin{align*}
\kappa(G) = \min\{|S| \mid S \subseteq V(G),\; G - S \text{ is disconnected or a single vertex}\}.
\end{align*}
We say $G$ is **$k$-connected** if $k \leq \kappa(G)$, equivalently, if for every set $S$ of at most $k-1$ vertices the graph $G - S$ is connected and has at least two vertices.
[/definition]
[example: Connectivity of Familiar Graphs]
Several basic examples illustrate the range of $\kappa$:
- For any tree $T$, $\kappa(T) = 1$: removing a leaf's unique neighbour disconnects the tree (or leaves a single vertex).
- For the cycle $C_n$ with $n \geq 3$, $\kappa(C_n) = 2$: deleting any single vertex leaves a path, which is connected, but deleting two non-adjacent vertices can disconnect it.
- For the complete graph $K_n$, $\kappa(K_n) = n - 1$: removing any $n-2$ vertices leaves an edge, which is connected, and removing all $n-1$ other vertices leaves a single vertex.
- The Petersen graph satisfies $\kappa = 3$: it is vertex-transitive and 3-regular, so $\kappa \leq \delta = 3$; removing the three neighbours of any vertex leaves an isolated vertex separated from the rest, so $\kappa \leq 3$; checking that no two-element set disconnects it follows from its high symmetry and girth 5.
[/example]
Two basic inequalities relate connectivity to degree and stability under vertex deletion:
[remark: Basic Connectivity Inequalities]
For any graph $G$: $\kappa(G) \leq \delta(G)$, since removing all neighbours of a minimum-degree vertex disconnects $G$ or leaves it a single vertex. Also, $\kappa(G - x) \geq \kappa(G) - 1$ for any vertex $x$: any separator in $G - x$ can be extended by $x$ to give a separator in $G$. The inequality $\kappa(G) \leq \delta(G)$ can be strict: for example, take two disjoint copies of $K_4$ and identify one vertex from each — the resulting graph has $\delta = 3$ but $\kappa = 1$, since the identified vertex is a cut vertex. Thus the chain $\kappa(G) \leq \lambda(G) \leq \delta(G)$ (where $\lambda$ is the edge connectivity, defined in the next section) can exhibit strict inequalities at both positions.
[/remark]
### Disjoint Paths and Menger's Theorem
The key structural insight for $k$-connectivity is that it is equivalent to the existence of many internally disjoint paths between every pair of vertices.
[definition: Internally Disjoint Paths]
Let $G$ be a graph and $a, b \in V(G)$. We say that $a$–$b$ paths $P_1, \ldots, P_k$ are **internally disjoint** (or simply **disjoint** in this context) if $P_i \cap P_j = \{a, b\}$ for all $i \neq j$ — the paths share only their endpoints.
[/definition]
Internally disjoint paths and separators are dual objects: any separator must contain at least one internal vertex from each disjoint path, so $k$ disjoint paths force any separator to have size at least $k$. The content of Menger's theorem is that this lower bound is always achieved — there is no gap between the maximum number of disjoint paths and the minimum separator size.
[definition: a–b Separator]
Let $G$ be a graph and $a \neq b \in V(G)$ with $a \not\sim b$. A set $S \subseteq V(G) \setminus \{a, b\}$ is an **$a$–$b$ separator** if $G - S$ has $a$ and $b$ in different connected components.
[/definition]
The relationship between separators and disjoint paths is captured by one of the most fundamental theorems in graph theory:
[quotetheorem:2021]
[citeproof:2021]
This local form of Menger's theorem, first proved by Karl Menger in 1927, is one of the foundational results in structural graph theory and network flow theory. The non-adjacency hypothesis $a \not\sim b$ is essential: if $a \sim b$, the edge $ab$ itself is not broken by any $\{a,b\}$-free separator, so the statement would need to count the direct path $a,b$ separately (as is done in the global form). Historically, the induction on $e(G)$ is the natural minimality parameter: fewer edges means a simpler graph, while the path structure between $a$ and $b$ is preserved enough to apply the inductive hypothesis.
The global form, characterising $k$-connectivity in terms of disjoint paths for all vertex pairs, follows readily from the local form via a gadget argument:
[quotetheorem:2022]
[citeproof:2022]
The deduction from local to global uses the following gadget: given a $k$-connected graph and any pair $a, b$ with $a \sim b$, add a new vertex $c$ of degree $k$ joined to $k$ specified vertices, turning the adjacency constraint into a non-adjacency problem to which the local form applies. If $G$ is disconnected then $\kappa(G) = 0$ and there are zero disjoint paths between vertices in different components, so the statement holds vacuously. On the algorithmic side, the global Menger theorem means that computing $\kappa(G)$ reduces to computing max-flow in a related network, giving polynomial-time algorithms for connectivity. A classical consequence is Dirac's theorem: in a $k$-connected graph, any $k$ vertices lie on a common cycle — the many disjoint paths guaranteed by $k$-connectivity can be assembled into such a cycle.
## Edge Connectivity
Vertex connectivity measures robustness against vertex failures. A companion notion measures robustness against edge failures.
[definition: Edge Connectivity]
Let $G$ be a graph. The **edge connectivity** $\lambda(G)$ is
\begin{align*}
\lambda(G) = \min\{|W| \mid W \subseteq E(G),\; G - W \text{ is disconnected}\},
\end{align*}
the minimum number of edges whose removal disconnects $G$. We say $G$ is **$k$-edge-connected** if $k \leq \lambda(G)$.
[/definition]
[example: Edge vs Vertex Connectivity]
These two connectivity parameters can differ substantially:
- For the cycle $C_n$ ($n \geq 3$), both $\kappa(C_n) = 2$ and $\lambda(C_n) = 2$: two vertices or two edges must be removed to disconnect it.
- For the "theta graph" obtained by taking two copies of $K_n$ sharing exactly one vertex $v$, we have $\kappa = 1$ (removing $v$ disconnects the graph) but $\lambda = n - 1$ (one must remove all edges incident to $v$ in one of the $K_n$ halves to disconnect). This shows $\lambda$ can be much larger than $\kappa$.
[/example]
In general, the three parameters satisfy $\kappa(G) \leq \lambda(G) \leq \delta(G)$.
[definition: Edge-Disjoint Paths]
Paths $P_1, \ldots, P_k$ are **edge-disjoint** if $E(P_i) \cap E(P_j) = \varnothing$ for $i \neq j$ — they share no edge (though they may share internal vertices).
[/definition]
There is an edge analogue of Menger's theorem. The proof strategy reduces it to the vertex version via the line graph construction. Before stating the theorem, we need the line graph, which converts an edge-connectivity problem into a vertex-connectivity problem by encoding edges as vertices.
[definition: Line Graph]
Let $G$ be a graph. The **line graph** $L(G)$ has vertex set $V(L(G)) = E(G)$, and two vertices $e, f \in E(G)$ are adjacent in $L(G)$ if and only if $e$ and $f$ share an endpoint in $G$.
[/definition]
[illustration:line-graph-construction]
The line graph construction can introduce multi-edges when $G$ itself has vertices of high degree, and the reduction from edge-connectivity to vertex-connectivity requires some care: one must check that the path translation is exact, and that edge-disjoint walks in $G$ can be rerouted into edge-disjoint paths (the rerouting lemma). With these points in hand, the argument goes through cleanly.
[quotetheorem:2023]
[citeproof:2023]
[quotetheorem:2024]
[citeproof:2024]
The edge version of Menger's theorem is in fact equivalent to the max-flow min-cut theorem for unit-capacity networks. One direction is immediate from the proof above. For the other, assign capacity 1 to each edge and observe that a maximum $a$–$b$ flow decomposes into edge-disjoint paths; the min-cut corresponds exactly to the minimum edge separator. This connection places Menger's theorem squarely within the theory of network flows and gives polynomial-time algorithms for computing $\lambda(G)$ via max-flow.
[remark: The Min-Max Principle]
Both Hall's theorem and Menger's theorem are instances of a general **min-max duality**: in each case, the maximum size of a certain structure (matching, set of disjoint paths) equals the minimum size of an obstacle (neighbourhood deficiency bound, separator). This duality is the combinatorial counterpart of LP duality and permeates all of combinatorics. Menger's theorem for edges is in fact a special case of the max-flow min-cut theorem from network flow theory, a perspective that reveals the deep unity underlying these results. Algorithmically, both the matching problem (via Hopcroft-Karp, running in $O(E\sqrt{V})$ time for bipartite graphs) and the connectivity problem (via max-flow for $\kappa$ and $\lambda$) admit efficient polynomial-time solutions, making the theory practically applicable at scale.
[/remark]
Having solved the matching and connectivity problems through polynomial-time algorithms, we now turn to a problem that appears simple to state but proves far more subtle: can we colour vertices with a bounded palette such that no two adjacent vertices share a colour? This classical question reveals deep connections between local constraints and global structure, and will lead us to one of the great theorems of twentieth-century mathematics.
# 3. Colouring
This chapter takes up one of the central problems in graph theory: given a graph $G$, how many colours are needed to colour its vertices so that no two adjacent vertices share a colour? This question connects combinatorics with topology (via planar graphs and surfaces), algebra (via the chromatic polynomial), and discrete optimisation. The chapter develops the theory from the basic greedy bound through Brooks' theorem, the colouring of planar graphs, the chromatic polynomial, Vizing's theorem on edge colourings, and finally the Heawood bound for graphs on surfaces of higher genus.
## Vertex Colourings and the Chromatic Number
Suppose you must schedule 100 university examinations into time slots. Two exams conflict if some student is enrolled in both — they cannot be sat simultaneously. Draw a graph with an exam at each vertex and an edge between every conflicting pair. Assigning time slots is the same as colouring the vertices so that adjacent vertices receive different colours: how few slots suffice? More concretely, the exam-conflict graph of a large faculty might have maximum degree 20, but many exams have far fewer conflicts; the answer depends not just on the maximum degree but on the graph's actual structure. The chromatic number of the conflict graph is exactly the minimum number of time slots required.
[definition: Proper Colouring]
Let $G$ be a graph. A **proper $k$-colouring** of $G$ is a function $c : V(G) \to \{1, \ldots, k\}$ such that $x \sim y \implies c(x) \neq c(y)$. The **chromatic number** of $G$, denoted $\chi(G)$, is the minimum $k$ for which a proper $k$-colouring exists.
[/definition]
A $k$-colouring partitions the vertex set into at most $k$ independent sets (the **colour classes**). The chromatic number measures how far $G$ is from being a union of independent sets.
[example: Basic Chromatic Numbers]
Several fundamental examples illustrate the range of chromatic numbers:
- A path $P_n$ has $\chi(P_n) = 2$: alternate colours along the path.
- More generally, a graph is bipartite if and only if $\chi(G) = 2$ (equivalently, $G$ contains no odd cycle).
- An even cycle $C_{2k}$ has $\chi(C_{2k}) = 2$, since it is bipartite.
- An odd cycle $C_{2k+1}$ has $\chi(C_{2k+1}) = 3$: any 2-colouring must alternate colours around the cycle, but the odd length forces a collision at the last vertex, so three colours are needed.
- A tree has $\chi = 2$, since trees are bipartite.
- The complete graph $K_n$ has $\chi(K_n) = n$: all $n$ vertices are mutually adjacent, so every vertex requires a distinct colour, and $n$ colours suffice.
- Chromatic number can far exceed clique number. The Mycielski construction produces triangle-free graphs with arbitrarily large chromatic number: starting from $K_2$ and iterating a specific doubling construction yields graphs with no triangle ($\omega = 2$) but $\chi$ equal to any prescribed value. This shows $\chi(G) \gg \omega(G)$ is possible.
[/example]
The first general upper bound comes from a greedy algorithm that processes vertices one at a time.
[quotetheorem:2025]
[citeproof:2025]
The argument above gives a universal upper bound, but it uses only a single numerical invariant of $G$. Before we try to improve it, it is worth recording what information the bound actually ignores, because that will tell us where refinement is possible.
[remark: Greedy Colouring]
This bound uses only $\Delta(G)$, discarding all structural information about the graph. A single high-degree vertex can force the bound upward even if the rest of the graph is sparse: if $G$ is a star $K_{1,d}$, then $\Delta = d$ and the bound gives $\chi \leq d + 1$, but $\chi(K_{1,d}) = 2$. The bound is tight at $K_n$ (where $\chi = n = \Delta + 1$), but for graphs with many low-degree vertices the bound is far from sharp. This motivates Brooks' theorem, which shows the bound can almost always be improved by one by exploiting structural constraints rather than just maximum degree.
[/remark]
## Colouring Planar Graphs
The greedy bound $\Delta(G) + 1$ is nearly vacuous for planar graphs — $K_5$ with its clique of size 5 is already unattainable in the plane, yet planar graphs in general can have arbitrarily large maximum degree (a planar star $K_{1,d}$ has $\Delta = d$). Is there a uniform chromatic bound for the entire class of planar graphs, one that ignores $\Delta$ altogether? And if so, what is the smallest such bound? The answer is yes, and we will work our way down to it in stages, exploiting a structural feature specific to planar graphs: their edge density is at most linear in the number of vertices, which forces their minimum degree to be small.
[quotetheorem:2026]
[citeproof:2026]
The hypothesis $|G| \geq 3$ is essential: the Euler-based edge bound $e(G) \leq 3n - 6$ requires enough vertices for a face structure to exist (for $n \leq 2$ the bound fails — $K_2$ has $e = 1 > 3(2) - 6 = 0$, and the inequality does not meaningfully constrain minimum degree on such small graphs). The bound $\delta \leq 5$ is tight: the icosahedron is a 5-regular planar triangulation, and no planar graph achieves $\delta = 6$ (that would force $e \geq 3n$, contradicting $e \leq 3n - 6$). This minimum-degree fact is the engine that drives every inductive argument in this section: it guarantees a low-degree vertex we can remove, colour the rest by induction, and reinsert. We exploit it first for the six-colour theorem, then refine the argument for the five-colour theorem.
[quotetheorem:2027]
[citeproof:2027]
The 6-colour theorem is not sharp: the argument works because every planar graph has a vertex of degree at most 5, but having $\delta \leq 5$ is strictly weaker than requiring $\chi \leq 5$. A non-planar graph with $\delta \leq 5$ (such as a blow-up of $K_6$ with subdivided edges) can have $\chi > 6$, so the same inductive argument does not apply to arbitrary low-degree graphs — planarity is doing essential work. Kempe's chain method, which we use next, refines the argument by allowing a recolouring step when we run out of free colours, and this extra flexibility reduces the bound from 6 to 5.
The greedy argument extends to five colours with a more careful recolouring step. This requires the notion of a **Kempe chain**: given a 5-colouring and two colours $i$ and $j$, the $\{i,j\}$-components are the connected components of the subgraph induced by vertices coloured $i$ or $j$. Swapping colours $i$ and $j$ within a single $\{i,j\}$-component produces another valid 5-colouring.
[quotetheorem:2028]
[citeproof:2028]
[remark: The Four-Colour Theorem and the Role of Planarity]
Every planar graph admits a 4-colouring; this is the **four-colour theorem**, proven by Appel and Haken in 1976 using a computer-aided search after reducing the problem to checking thousands of specific local configurations. The Kempe-chain method used above does not extend to 4-colourings: swapping colours in a component to free up a colour for the new vertex may inadvertently create a conflict elsewhere, and there is no analogue of the planarity crossing argument to derive a contradiction in the third case. The four-colour bound is sharp since $K_4$ is planar and requires four colours. Planarity is essential throughout: $K_6$ has $\chi = 6$, so without planarity the five-colour bound fails, and the argument above says nothing about higher-genus surfaces (a theme we return to in the final section on the Heawood bound).
[/remark]
[illustration:five-colour-kempe-chains]
## Colouring Non-Planar Graphs
For general graphs, the $\Delta(G) + 1$ bound from the greedy algorithm can often be tightened. The key observation is that if not all vertices have degree $\Delta(G)$, we can exploit the low-degree vertex.
[quotetheorem:2029]
[citeproof:2029]
The connectedness hypothesis is necessary: a disconnected graph with one component equal to $K_{\Delta+1}$ satisfies $\delta < \Delta$ globally but still requires $\Delta + 1$ colours. The condition $\delta < \Delta$ is also genuinely needed — if every vertex has degree exactly $\Delta$, the ordering trick fails and the greedy bound cannot be improved by this argument alone. Brooks' theorem handles precisely that remaining case, completing the picture: together, the two results show that $\chi(G) = \Delta(G) + 1$ is possible only for complete graphs and odd cycles.
[quotetheorem:2030]
[citeproof:2030]
[remark: Sharpness of Brooks' Theorem]
Combined with the greedy upper bound, Brooks' theorem says $\chi(G) = \Delta(G) + 1$ if and only if $G$ is an odd cycle or a complete graph. For all other connected graphs, the chromatic number is at most $\Delta(G)$.
[/remark]
Brooks' theorem is a structural classification result, not just a bound. Several important consequences deserve attention:
(a) **Connectedness is necessary.** The disjoint union $K_5 + K_5$ has $\Delta = 4$ and $\chi = 5 > \Delta$.
(b) **The excluded cases are genuine.** Both $K_{\Delta+1}$ and odd cycles achieve $\chi = \Delta + 1$; neither can be coloured with just $\Delta$ colours.
(c) **Degenerate cases.** For $\Delta \leq 2$ the theorem is either vacuous ($\Delta = 0$: isolated vertices) or classical ($\Delta = 1$: matchings, $\Delta = 2$: paths and cycles); the nontrivial content begins at $\Delta \geq 3$.
(d) **Forward connections.** The list-colouring analogue (Vizing's list-colouring conjecture), the NP-hardness of computing $\chi$ in general, and a linear-time constructive algorithm achieving the Brooks bound all build on this result.
## The Chromatic Polynomial
Knowing $\chi(G) = k$ tells us that a graph can be $k$-coloured, but says nothing about how many distinct $k$-colourings exist. For $K_3$ there are exactly $k(k-1)(k-2)$ proper $k$-colourings; for $C_4$ there are $(k-1)^4 + (k-1)$. Both are polynomials in $k$ — a fact that at first looks like a coincidence for these simple graphs. The surprising truth is that this is always so: the number of proper $k$-colourings of any graph is a polynomial in $k$. That a function defined only on non-negative integers $k$ should be a polynomial is not immediate; the deletion-contraction recursion will give us the structural reason.
[definition: Chromatic Polynomial]
Let $G$ be a graph. The **chromatic polynomial** of $G$ is the function $P_G : \mathbb{Z}_{\geq 0} \to \mathbb{Z}_{\geq 0}$ where $P_G(t)$ is the number of proper $t$-colourings of $G$.
[/definition]
Before developing the polynomial structure itself, we record a basic observation: the chromatic number is not a separate invariant but a feature of $P_G$ readable directly off its values at small integers.
[remark: Chromatic Number from the Polynomial]
The chromatic number $\chi(G)$ is the smallest positive integer $t$ for which $P_G(t) > 0$. The chromatic polynomial thus encodes the chromatic number as a zero: $\chi(G) = \min\{t \in \mathbb{Z}_{>0} : P_G(t) > 0\}$. One can also recover the number of colourings at the chromatic number — a meaningful quantity in applications like schedule counting — directly from the polynomial.
[/remark]
With the definition and its relation to $\chi$ in hand, it is useful to compute the polynomial in the simplest cases, both to see concrete formulas and to suggest patterns.
[example: Chromatic Polynomials of Basic Graphs]
Several standard examples illustrate the chromatic polynomial:
- **Empty graph on $n$ vertices** (no edges): every assignment of colours is proper, so $P_G(t) = t^n$.
- **Complete graph $K_n$**: colour the vertices one at a time; the $i$th vertex has $t - (i-1)$ choices, giving
\begin{align*}
P_{K_n}(t) = t(t-1)(t-2)\cdots(t-n+1) = n!\binom{t}{n}.
\end{align*}
- **Path $P_n$**: the first vertex has $t$ choices; each subsequent vertex has $t-1$ choices (any colour except its unique predecessor's colour), giving $P_{P_n}(t) = t(t-1)^{n-1}$.
- **Tree on $n$ vertices**: the same argument applies by induction — colour a leaf (which has a unique neighbour), remove it, and colour the rest. Since every leaf has exactly one neighbour, $P_T(t) = t(t-1)^{n-1}$ for any tree $T$ on $n$ vertices.
- **Non-isomorphic graphs with the same polynomial.** The chromatic polynomial does not determine a graph up to isomorphism. A concrete instance: the path $P_4$ and the star $K_{1,3}$ are both trees on 4 vertices, so both have polynomial $t(t-1)^3$, yet they are non-isomorphic (their degree sequences differ). More strikingly, there exist non-isomorphic graphs with the same chromatic polynomial that are not even trees — two graphs on 6 vertices constructed by different triangulations can share their polynomial — showing that the polynomial retains structural information but falls short of a complete invariant.
[/example]
The key tool for computing chromatic polynomials is a deletion-contraction relation. Recall that for an edge $e = xy \in E(G)$, the **deletion** $G - e$ removes $e$ while keeping all vertices, and the **contraction** $G/e$ identifies $x$ and $y$ into a single new vertex $a$, removing the edge $e$ but retaining all other adjacencies (with $a$ inheriting all neighbours of both $x$ and $y$, with any resulting multi-edges simplified to single edges).
[quotetheorem:2031]
[citeproof:2031]
The identity above is so central that it has acquired a suggestive name, and its recursive use gives a practical (though not always efficient) procedure for computing chromatic polynomials.
[remark: Cut-Fuse Relation]
This identity is sometimes called the **cut-fuse** (or deletion-contraction) relation. It is the standard recursive method for computing chromatic polynomials: repeatedly delete and contract edges until only empty graphs remain, at which point $P_G(t) = t^{|G|}$.
The recursion has exponential worst-case blowup: at each step the problem branches into two subproblems, and applying deletion-contraction to all $m$ edges produces a recursion tree of depth $m$ and up to $2^m$ leaves. This makes deletion-contraction a structural tool for proving theorems rather than a practical algorithm for large graphs; matching-tree-based algorithms achieve exponentially better performance in practice by exploiting the graph's tree-width.
[/remark]
Structural tool or not, the recursion is powerful enough to settle the basic polynomial-ness question.
[quotetheorem:2032]
[citeproof:2032]
The polynomial-ness of $P_G$ is more surprising than it first appears. The chromatic polynomial is initially defined only as a function on non-negative integers: for each $k \in \mathbb{Z}_{\geq 0}$ we count a finite collection of colourings. The theorem shows this function extends uniquely to a polynomial on all of $\mathbb{C}$, enabling evaluation at non-integer inputs. One meaningful such evaluation: for planar graphs, $P_G(2 + \phi) \neq 0$ where $\phi = \frac{1+\sqrt{5}}{2}$ is the golden ratio, a result related to the Beraha numbers and deep properties of planar structures. Another: the roots of $P_G$ encode phase-transition behaviour in the Potts model from statistical mechanics. These analytic-continuation uses of the polynomial would be entirely invisible from the integer-counting definition alone.
Having established that $P_G$ is a polynomial of degree $|G|$, we next inspect its leading coefficients, which turn out to encode the number of edges.
[quotetheorem:2033]
[citeproof:2033]
The two leading coefficients count vertices and edges; remarkably, the lower-order coefficients continue to encode combinatorial invariants of $G$ in a precise and sometimes deep way.
[remark: Higher Coefficients]
The coefficients of $P_G$ encode further structural information. For instance, the coefficient of $t^{n-2}$ equals $\binom{m}{2}$ minus the number of triangles in $G$. The coefficients $c_0, \ldots, c_n$ of $P_G$ (written with alternating signs as $P_G(t) = \sum_{i=0}^n (-1)^{n-i} c_i\, t^i$) are known to be **log-concave** by a deep result of Huh (2012): $c_i^2 \geq c_{i-1} c_{i+1}$ for all $i$. Moreover, if $G$ is planar, then $P_G(2 + \frac{1+\sqrt{5}}{2}) \neq 0$, a fact related to the golden ratio and the structure of planar graphs.
[/remark]
[example: Deletion-Contraction for $P_{C_4}(t)$]
We compute the chromatic polynomial of the 4-cycle $C_4$ by deletion-contraction on any single edge $e$. Deleting $e$ turns $C_4$ into the path $P_4$; contracting $e$ merges the two endpoints of $e$ into one vertex, producing the triangle $K_3$. Using $P_{P_n}(t) = t(t-1)^{n-1}$ and $P_{K_n}(t) = t(t-1)\cdots(t-n+1)$:
\begin{align*}
P_{C_4}(t) &= P_{P_4}(t) - P_{K_3}(t) \\
&= t(t-1)^3 - t(t-1)(t-2) \\
&= t(t-1)\bigl[(t-1)^2 - (t-2)\bigr] \\
&= t(t-1)\bigl[t^2 - 2t + 1 - t + 2\bigr] \\
&= t(t-1)(t^2 - 3t + 3).
\end{align*}
As a sanity check: $P_{C_4}(2) = 2 \cdot 1 \cdot (4 - 6 + 3) = 2 \cdot 1 \cdot 1 = 2 > 0$, so $\chi(C_4) \leq 2$, and since $C_4$ has edges we also have $\chi(C_4) \geq 2$; hence $\chi(C_4) = 2$, as expected for a bipartite graph. The integer roots of $P_{C_4}$ are exactly $t = 0, 1$; the remaining quadratic factor $t^2 - 3t + 3$ has complex roots $\frac{3 \pm i\sqrt{3}}{2}$, which explains why no further integers below $\chi(C_4)$ appear as zeros.
[/example]
## Edge Colouring and Vizing's Theorem
The line graph $L(G)$ — whose vertices are the edges of $G$, with two vertices of $L(G)$ adjacent whenever the corresponding edges of $G$ share an endpoint — lets us reduce edge colouring to vertex colouring: a proper edge colouring of $G$ is exactly a proper vertex colouring of $L(G)$. But this trade is costly in density. At a vertex $v$ of degree $d$ in $G$, the $d$ edges incident to $v$ form a clique of size $d$ in $L(G)$, so $\chi(L(G)) \geq \Delta(G)$. Can we do no better? The answer is almost tight: Vizing's theorem shows that only one additional colour beyond $\Delta(G)$ ever suffices, making $\Delta(G) + 1$ a universal ceiling.
[definition: Edge Colouring and Chromatic Index]
Let $G$ be a graph. A **proper $k$-edge colouring** is a function $c : E(G) \to \{1, \ldots, k\}$ such that $c(e) \neq c(f)$ whenever $e$ and $f$ share an endpoint. The **chromatic index** (or **edge chromatic number**) $\chi'(G)$ is the minimum $k$ for which a proper $k$-edge colouring exists.
[/definition]
The definition stands on its own, but it is useful to record the line-graph reformulation explicitly — it translates every edge-colouring question into a vertex-colouring question on a related graph, and tells us in advance how tightly $\chi'$ is controlled by $\Delta$.
[remark: Edge Colouring as Vertex Colouring of the Line Graph]
An edge colouring of $G$ corresponds exactly to a vertex colouring of the **line graph** $L(G)$, whose vertices are the edges of $G$ with two vertices of $L(G)$ adjacent whenever the corresponding edges of $G$ share an endpoint. Thus $\chi'(G) = \chi(L(G))$. The key difference from vertex colouring is that $\chi'(G)$ is always within one of $\Delta(G)$ — a much tighter range than the general $\chi$ vs $\Delta$ relationship.
[/remark]
Before attacking Vizing's theorem itself, concrete cases show both the lower bound $\chi'(G) \geq \Delta(G)$ and the fact that this bound is not always tight.
[example: Edge Chromatic Numbers]
A few examples show that the chromatic index can differ from the chromatic number:
- An even cycle $C_{2k}$ has $\chi'(C_{2k}) = 2$ (colour alternate edges), and an odd cycle $C_{2k+1}$ has $\chi'(C_{2k+1}) = 3$. Cycles are their own line graphs.
- Every vertex of degree $\Delta(G)$ forces all its incident edges to receive different colours, so $\chi'(G) \geq \Delta(G)$.
- The star $K_{1,t}$ has $\chi(K_{1,t}) = 2$ but $\chi'(K_{1,t}) = t$, so $\chi$ and $\chi'$ can be arbitrarily far apart.
- The Petersen graph $P$ is 3-regular, so $\chi'(P) \geq 3$. However, $P$ is a *snark*: although $P$ has perfect matchings (in fact exactly six of them), no three perfect matchings of $P$ are pairwise edge-disjoint, so its edge set cannot be partitioned into three perfect matchings. A 3-edge-colouring of a 3-regular graph would give exactly such a partition (each colour class a perfect matching, the three classes pairwise edge-disjoint by definition), so no 3-edge-colouring exists, and $\chi'(P) > 3$. Vizing's theorem gives $\chi'(P) \leq \Delta(P) + 1 = 4$. Hence $\chi'(P) = 4$, making $P$ a Class 2 graph.
[/example]
The greedy bound $\chi'(G) \leq 2\Delta(G) - 1$ follows from a direct count (each edge has at most $2\Delta(G) - 2$ other edges sharing an endpoint), but Vizing's theorem shows $\chi'(G)$ is always either $\Delta(G)$ or $\Delta(G) + 1$.
Before stating the proof, we introduce the key structure. Given an edge colouring $c : E(G) \to \{1, \ldots, k\}$, define the **colour classes** $C_i = \{e \in E(G) : c(e) = i\}$. For two colours $i, j$, the subgraph $(V(G), C_i \cup C_j)$ is a disjoint union of paths, even cycles, and isolated vertices; its connected components are called **$\{i,j\}$-components**. A colour $c$ is **missing** at a vertex $y$ if no edge incident to $y$ has colour $c$.
[quotetheorem:2034]
[citeproof:2034]
With the dichotomy established, it is natural to name the two classes it creates and record what we can say about distinguishing them.
[remark: Class 1 and Class 2 Graphs]
Graphs with $\chi'(G) = \Delta(G)$ are called **class 1**; those with $\chi'(G) = \Delta(G) + 1$ are called **class 2**. Bipartite graphs are always class 1 (König's theorem). Determining which class a given graph belongs to is NP-complete in general.
The lower bound $\chi'(G) \geq \Delta(G)$ is witnessed at every max-degree vertex: the $\Delta$ edges incident there must all receive distinct colours. The upper bound $\chi' \leq \Delta + 1$ is constructive in principle (the Vizing proof gives a procedure), though the proof is existential rather than directly algorithmic. Vizing's theorem gives no constructive criterion for whether a specific graph is class 1 or class 2. A concrete class 2 example: any odd cycle $C_{2k+1}$ has $\Delta = 2$ but $\chi' = 3$ (the $2k+1$ edges must be coloured with colours $a$ and $b$ alternating, but the odd length forces two adjacent edges to collide, requiring a third colour).
[/remark]
## Graphs on Surfaces and the Heawood Bound
$K_7$ has chromatic number 7, yet it is not planar — it cannot be drawn in the plane without edge crossings. Where does it live naturally? On the torus: all $\binom{7}{2} = 21$ edges of $K_7$ can be drawn on a torus without crossings, using a well-known hexagonal embedding. This is no coincidence. The four-colour theorem governs the sphere (genus 0); the torus (genus 1) supports $K_7$ and requires up to 7 colours. In general, the genus of a surface controls the maximum chromatic number of graphs embeddable on it — a bound due to Heawood — and the four-colour theorem is the first case of this genus-chromatic number correspondence.
The sphere $S^2$ is a compact orientable surface of genus $0$ (Euler characteristic $\mathcal{E} = 2$), and the torus $T^2$ has genus $1$ ($\mathcal{E} = 0$). More generally, for any $g \in \mathbb{N}$, attaching $g$ handles to a sphere produces the compact orientable surface of genus $g$ with Euler characteristic $\mathcal{E} = 2 - 2g$. The Euler formula $|G| - e(G) + f = \mathcal{E}$ holds for connected graphs on such surfaces (with equality for connected graphs, and $\leq \mathcal{E}$ for disconnected ones, since we can add edges to connect components).
Just as for planar graphs, the inequality $3f \leq 2e(G)$ gives $e(G) \leq 3(|G| - \mathcal{E})$ for any graph with at least one edge on a surface of characteristic $\mathcal{E}$.
[quotetheorem:2035]
[citeproof:2035]
The bound proven above is an upper bound in principle; a natural next question is whether the complete graphs witnessing sharpness actually embed on the surfaces in question.
[remark: Sharpness of the Heawood Bound]
The Heawood bound is sharp for all surfaces with $\mathcal{E} \leq 0$: for each such surface, the complete graph $K_{H(\mathcal{E})}$ embeds on it. For the torus ($\mathcal{E} = 0$, $H(0) = 7$), the complete graph $K_7$ embeds with all 21 edges drawn without crossings — this is the canonical embedding referenced in the section opening. For the double torus ($\mathcal{E} = -2$, $H(-2) = 8$), $K_8$ embeds on the double torus. The sharpness for all $\mathcal{E} \leq 0$ was established by Ringel and Youngs (1968) using a sequence of intricate current-graph constructions for each residue class of $H(\mathcal{E})$.
Note that $H(2) = 4$, which would immediately imply the four-colour theorem if the Heawood argument applied to the sphere. However, the step that uses $\mathcal{E} \leq 0$ to bound $\delta(G)$ breaks down for $\mathcal{E} = 2$: the inequality $6 - 6\mathcal{E}/k \geq k-1$ need not hold when $\mathcal{E} > 0$. The four-colour theorem for the sphere therefore required a separate, far harder proof (Appel–Haken 1976), and the Heawood theory does not subsume it.
[/remark]
The chromatic polynomial and the theory of graph colouring have shown how local constraints propagate through a graph's structure, culminating in the four-colour theorem's striking assertion about planarity. Extremal graph theory reframes this intuition: instead of colouring existing graphs, it asks the inverse question — how much structure must a graph contain before it is forced to exhibit a particular property?
# 4. Extremal Graph Theory
Extremal graph theory asks: how many edges can a graph on $n$ vertices have before it is forced to contain a particular substructure? This question leads to some of the most beautiful results in combinatorics, where tight algebraic bounds meet elegant counting arguments. Chapter 3 developed the chromatic number as a measure of graph complexity; here we use those ideas — and introduce new ones — to understand the edge-density thresholds that force Hamiltonian cycles, long paths, triangles, cliques, and complete bipartite subgraphs. The culminating result, the Erdős–Stone theorem, unifies these questions into a single asymptotic formula governed entirely by chromatic number.
## Hamiltonian Cycles and Minimum Degree
The question of when a graph must contain a Hamiltonian cycle is central to combinatorics and has deep computational implications. A graph might have many edges yet fail to be Hamiltonian — think of a path, which has $n - 1$ edges but no cycle at all. So merely counting edges is not enough; we need a more refined condition. The right condition turns out to be a lower bound on the minimum degree $\delta(G)$.
[definition: Hamiltonian Graph]
A graph $G$ is **Hamiltonian** if it contains a cycle passing through every vertex exactly once. Such a cycle is called a **Hamilton cycle**. A path passing through every vertex exactly once is called a **Hamiltonian path**.
[/definition]
The existence of a Hamilton cycle is a global property of the graph, and there is no known efficient algorithm to decide it in general. However, a strong minimum-degree condition guarantees Hamiltonicity, and the proof gives an illuminating model of the "rotation-extension" technique.
[quotetheorem:2036]
[citeproof:2036]
The bound $\delta(G) \geq \frac{n}{2}$ is sharp in both parities. When $n$ is even, two disjoint copies of $K_{n/2}$ give a graph with $\delta(G) = \frac{n}{2} - 1$ that is not even connected, let alone Hamiltonian. When $n$ is odd, two copies of $K_{(n+1)/2}$ that share a single vertex yield a connected graph with $\delta(G) = \frac{n-1}{2}$ that has no Hamilton cycle, since any cycle must alternate between the two cliques but can visit the shared vertex only once. The minimum-degree condition is therefore essentially the weakest possible global condition guaranteeing Hamiltonicity.
It is worth noting that no condition on the number of edges alone can force Hamiltonicity. The graph $K_{n-1}$ augmented by a single vertex connected to $K_{n-1}$ by one edge has $\binom{n-1}{2} + 1$ edges but is not Hamiltonian — the leaf vertex has degree 1 and cannot be part of any cycle. This rules out edge-count theorems of the simple form $e(G) \geq k \Rightarrow G$ is Hamiltonian.
## Paths of a Given Length
Having pinned down the threshold for Hamiltonian cycles, we turn to a related but more flexible question: when must a graph contain a path of a prescribed length $k$, which may be much smaller than $n$? Here both minimum-degree and edge-count conditions yield sharp results.
[quotetheorem:2037]
[citeproof:2037]
The hypotheses here are all necessary. If $k \geq n$, then $K_n$ itself has no path of length $k$ (only $n - 1$ vertices to work with). If $G$ is disconnected, taking $\lfloor n/k \rfloor$ disjoint complete graphs each of size $k$ gives $\delta(G) = k - 1 \geq k/2$ (for $k \geq 2$) but no path longer than $k - 1$. The sharpness of $\delta \geq k/2$ is witnessed by collections of copies of $K_{(k+1)/2}$ that all share a single vertex, giving minimum degree $\frac{k-1}{2}$ and no path of length $k$.
The above lemma addresses the minimum-degree regime. For the edge-count regime, a different inductive argument applies.
[quotetheorem:2038]
[citeproof:2038]
When $k \mid n$, a collection of $n/k$ disjoint copies of $K_k$ has $\frac{n}{k} \cdot \binom{k}{2} = \frac{n(k-1)}{2}$ edges and no path of length $k$, so the bound is sharp. The two theorems together reveal a pleasing duality: minimum degree controls long paths in connected graphs via a local-to-global argument, while edge count does the same via a global averaging argument.
## Forcing Triangles: Mantel's Theorem
The simplest forbidden subgraph question asks: how many edges can a triangle-free graph have? A triangle-free graph might still have many edges — the complete bipartite graph $K_{\lfloor n/2 \rfloor, \lceil n/2 \rceil}$ is triangle-free and has about $n^2/4$ edges. Mantel's theorem says this is the best possible.
Before stating the main result, we recall a classical inequality that will drive the proof.
[quotetheorem:515]
The proof is the standard proof by induction using the definition of convexity. It is worth pausing to note the equality condition: if $f$ is strictly convex, equality holds if and only if all $x_i$ are equal. For a general convex $f$, equality holds if and only if $f$ is affine on the convex hull of $\{x_1, \ldots, x_n\}$. This condition will matter when we identify the extremal graphs: equality in Jensen's forces all degrees to be equal, which pins down the structure of the extremal example. The same template — apply Jensen's to $f(t) = t^2$ or to $f(t) = \binom{t}{r}$, square the degrees, bound via the edge count — recurs in both the Mantel proof and the Kővári–Sós–Turán bound below, making it one of the most reused tools in this chapter.
The averaging form of Jensen's that we use is: for a strictly convex $f$,
\begin{align*}
\sum_{i=1}^n f(x_i) \geq n \cdot f\!\left(\frac{1}{n}\sum_{i=1}^n x_i\right),
\end{align*}
with equality iff all $x_i$ are equal. Applying this to the degrees of a graph will convert the structural constraint (no triangles means adjacent degrees sum to at most $n$) into a numerical bound.
[quotetheorem:2039]
[citeproof:2039]
The complete bipartite graph $K_{n/2, n/2}$ (for $n$ even) has exactly $\frac{n^2}{4}$ edges and is triangle-free, so the bound is sharp. Mantel's theorem is the case $r = 2$ of Turán's theorem, which handles all clique sizes. The argument via Jensen's inequality — encoding a structural constraint (no triangles means adjacent degrees sum to at most $n$) into a numerical bound via convexity — is the template for the Zarankiewicz problem in Section 4.5.
## Forcing Cliques: Turán's Theorem
Mantel's theorem raises the natural generalisation: what is the maximum number of edges in a $K_{r+1}$-free graph? The answer is given by Turán's theorem, one of the foundational results of extremal graph theory. The extremal graphs are the Turán graphs, a canonical family of complete multipartite graphs.
[definition: Complete Multipartite Graph]
Given natural numbers $n_1, \ldots, n_r$, the **complete $r$-partite graph** $K_{n_1, \ldots, n_r}$ is the graph whose vertex set is partitioned into $r$ parts of sizes $n_1, \ldots, n_r$, with all edges between distinct parts and no edges within any part. A graph is **$r$-partite** if it admits such a partition, equivalently if $\chi(G) \leq r$.
[/definition]
Among complete multipartite graphs, the balanced partitions occupy a special place. Intuitively, dividing vertices evenly maximises the number of edges: since every edge crosses between two different parts, making one part very large at the expense of another reduces the number of crossing pairs. More precisely, if part $V_i$ has $a$ vertices and part $V_j$ has $b$ vertices with $a > b + 1$, then moving one vertex from $V_i$ to $V_j$ changes the edge count between these two parts from $ab$ to $(a-1)(b+1) = ab + a - b - 1 > ab$ (since $a > b + 1$). Balancing the parts therefore strictly increases the edge count. The Turán graph captures this optimality.
[definition: Turán Graph]
The **Turán graph** $T_{r,n}$ is the complete $r$-partite graph $K_{n_1, \ldots, n_r}$ with $\sum_{i=1}^r n_i = n$ and $n_1, \ldots, n_r$ differing by at most one (the parts are as equal as possible).
[/definition]
The edge count of $T_{r,n}$ is worth computing explicitly, as it is the benchmark against which all $K_{r+1}$-free graphs are measured. Write $n = qr + s$ with $0 \leq s < r$, so $s$ parts have size $q+1$ and $r - s$ parts have size $q$. Every edge of $T_{r,n}$ goes between two distinct parts, so
\begin{align*}
e(T_{r,n}) = \frac{1}{2}\left[\left(\sum_{i=1}^r n_i\right)^2 - \sum_{i=1}^r n_i^2\right] = \frac{n^2 - \sum_{i=1}^r n_i^2}{2}.
\end{align*}
The sum $\sum n_i^2$ is minimised precisely when the $n_i$ are as equal as possible (by convexity of $t^2$), confirming that $T_{r,n}$ maximises edges. When $r \mid n$, all parts have size $n/r$ and
\begin{align*}
e(T_{r,n}) = \binom{r}{2} \cdot \frac{n^2}{r^2} = \left(1 - \frac{1}{r}\right)\frac{n^2}{2}.
\end{align*}
The Turán graph contains no $(r+1)$-clique, since any $r+1$ vertices must include two from the same part, which are non-adjacent. This makes $T_{r,n}$ the natural extremal candidate: it is $K_{r+1}$-free and has as many edges as possible among all complete $r$-partite graphs on $n$ vertices.
[quotetheorem:2040]
[citeproof:2040]
This first form tells us when a clique must be present, but it only provides a bound on the number of edges — it does not tell us which graph achieves equality. The induction argument traces through an $r$-clique and the remaining graph, but the recursive structure loses the global information about how the parts fit together. In particular, many non-isomorphic graphs could in principle match the edge count $\left(1-1/r\right)n^2/2$ without being Turán graphs. To pin down the extremal graph uniquely, a second form of the theorem is needed.
[quotetheorem:2041]
[citeproof:2041]
The second form is the sharp version: Turán graphs are the unique extremal graphs for this problem. The proof, using a graph-transformation to canonicalise the extremal structure, is an early instance of the "stability" technique that pervades modern extremal graph theory.
Among complete $r$-partite graphs, the Turán graph is the edge-maximiser. But why should we study $r$-partite graphs separately, before turning to all $K_{r+1}$-free graphs? The answer is that $r$-partite graphs are the "canonical" $K_{r+1}$-free class: by definition, a complete $r$-partite graph cannot contain $K_{r+1}$ since any $r+1$ vertices include two in the same part. Every $K_{r+1}$-free graph is contained in an $r$-partite graph (by Turán's second form, the extremal graph is complete $r$-partite). So determining the maximum edge count among $r$-partite graphs is the first, irreducible step — it tells us what the "best possible" extremal structure looks like before we ask whether any $K_{r+1}$-free graph can beat it.
[quotetheorem:2042]
[citeproof:2042]
The key convexity argument here deserves emphasis: if $|V_i| > |V_j| + 1$, moving a vertex from $V_i$ to $V_j$ strictly increases the edge count between those two parts. This is exactly the discrete version of the statement that $xy$ is maximised (for fixed $x + y$) when $x = y$, which itself is an instance of the AM-GM inequality. The result says, in particular, that among $r$-partite graphs the Turán graph is the unique maximiser. It says nothing directly about graphs that are NOT $r$-partite — those are handled by Turán's theorem proper, which shows no non-$r$-partite $K_{r+1}$-free graph can beat $T_{r,n}$ either.
Together, these results establish Turán's theorem as a tight, complete answer to the clique-forcing problem. The extremal number $\mathrm{ex}(n, K_{r+1}) = e(T_{r,n})$ is realised precisely by the Turán graphs.
## The Zarankiewicz Problem
A natural companion to Turán's theorem asks about complete bipartite subgraphs: how many edges can a bipartite graph on vertex sets $X$ and $Y$ (each of size $n$) have before it is forced to contain $K_{t,t}$? This is the Zarankiewicz problem, and unlike the clique problem its answer is not yet fully understood.
The question is the bipartite analogue of the clique-forcing problem. Just as Turán's theorem asks how dense a graph can be before it contains a clique (a "balanced complete subgraph on $r+1$ vertices"), the Zarankiewicz problem asks how dense a bipartite graph can be before it contains a complete balanced bipartite subgraph $K_{t,t}$. In the bipartite setting, $K_{t,t}$ plays the role that $K_{r+1}$ plays in the clique setting: it is the "forbidden subgraph" whose absence we enforce while maximising edges. The two sides of the bipartition correspond to two independent structures (points and lines, or vertices and hyperedges, in combinatorial applications), making $K_{t,t}$-free conditions natural in incidence geometry and algebraic combinatorics.
A product bound is natural to expect: if $G$ has $e$ edges in $X \times Y$ and no $K_{t,t}$, then for any set of $t$ vertices in $X$, their common neighbourhood in $Y$ has size at most $t-1$. There are $\binom{n}{t}$ such sets, and each $y \in Y$ is counted $\binom{\deg y}{t}$ times. The tension between these two counts — one measuring the forbidden configuration, the other tracking the degree distribution — produces the upper bound.
[definition: Zarankiewicz Number]
The **Zarankiewicz number** $Z(n, t)$ is the maximum number of edges in a bipartite graph $G = (X \sqcup Y, E)$ with $|X| = |Y| = n$ that contains no $K_{t,t}$ as a subgraph.
[/definition]
The Zarankiewicz problem asks for the asymptotic growth of $Z(n,t)$ as $n \to \infty$ with $t$ fixed. The upper bound comes from a Jensen's-inequality argument that generalises the Mantel proof.
[quotetheorem:2043]
[citeproof:2043]
The bound shows that $Z(n,t) = O(n^{2 - 1/t})$, meaning the exponent of $n$ is strictly less than 2. This is in sharp contrast to the Turán bound $e(T_{r,n}) \sim (1 - 1/r) \frac{n^2}{2}$, which is quadratic. For matching lower bounds, finite geometry provides the key construction.
[example: Projective Plane Incidence Graphs Achieve $Z(n,2)$ Lower Bound]
For $t = 2$, the Kővári–Sós–Turán bound gives $Z(n, 2) \leq cn^{3/2}$ for a constant $c$. A matching lower bound comes from projective planes. Let $q$ be a prime power and let $\Pi_q$ be the projective plane of order $q$. This plane has $q^2 + q + 1$ points and $q^2 + q + 1$ lines; every point lies on exactly $q + 1$ lines, and every line contains exactly $q + 1$ points.
Form the bipartite incidence graph $G$ with $X$ = points, $Y$ = lines, and an edge $\{p, \ell\}$ whenever point $p$ lies on line $\ell$. Then $|X| = |Y| = q^2 + q + 1 =: n$. The graph is $(q+1)$-regular on both sides, giving $e(G) = (q+1)n$.
The graph contains no $K_{2,2}$: two distinct points determine at most one line, so any two points $p_1, p_2 \in X$ have $|N(p_1) \cap N(p_2)| \leq 1 < 2$. Hence $G$ is $K_{2,2}$-free, giving
\begin{align*}
Z(n, 2) \geq e(G) = (q+1)n \sim q \cdot n \sim n^{1/2} \cdot n = n^{3/2},
\end{align*}
since $q \sim n^{1/2}$ as $q \to \infty$. More precisely, $q + 1 = \Theta(n^{1/2})$ so $e(G) = \Theta(n^{3/2})$. This matches the Kővári–Sós–Turán upper bound up to the constant factor, showing $Z(n, 2) = \Theta(n^{3/2})$.
[/example]
For $Z(n, 3)$, a similar construction using the incidence graph of a finite geometry (the polarity graph of a projective plane) yields $Z(n, 3) \geq cn^{5/3}$, again matching the upper bound in exponent. For $t \geq 4$, determining the correct exponent in $Z(n, t)$ remains an open problem.
[illustration:ktt-bipartite-subgraph]
## The Extremal Number and the Erdős–Stone Theorem
The results so far — Turán's theorem for cliques, the Kővári–Sós–Turán bound for complete bipartite graphs, and the path theorem — are all instances of a single question: for a fixed graph $H$, how large can a graph on $n$ vertices be without containing $H$ as a subgraph?
[definition: Extremal Number]
Let $H$ be a fixed graph and $n \in \mathbb{N}$. The **extremal number** (or **Turán number**) of $H$ is
\begin{align*}
\mathrm{ex}(n, H) = \max\{e(G) \mid |G| = n,\, G \text{ contains no copy of } H\}.
\end{align*}
[/definition]
The results developed above give several values and bounds:
- $\mathrm{ex}(n, K_{r+1}) = e(T_{r,n}) \leq \left(1 - \frac{1}{r}\right)\frac{n^2}{2}$, by Turán's theorem.
- $\mathrm{ex}(n, P_k) = \frac{n(k-1)}{2}$, by the path theorem (the extremal graph is a disjoint union of $K_k$'s when $k \mid n$).
- $\mathrm{ex}(n, K_{t,t}) \leq t^{1/t} n^{2 - 1/t} + tn$, by Kővári–Sós–Turán.
These examples suggest that the chromatic number of $H$ plays a decisive role. A graph $H$ with $\chi(H) \geq 3$ cannot be embedded in any bipartite graph, so the Turán graph $T_{\chi(H)-1, n}$ (which is $(\chi(H)-1)$-partite) contains no copy of $H$, giving $\mathrm{ex}(n, H) \geq e(T_{\chi(H)-1, n})$. The Erdős–Stone theorem says this lower bound is asymptotically tight.
[quotetheorem:2044]
The proof proceeds via the Erdős–Stone–Simonovits theorem: for any graph $H$ with chromatic number $r + 1$ and any $\varepsilon > 0$, every graph on $n$ vertices with more than $\left(1 - \frac{1}{r} + \varepsilon\right)\binom{n}{2}$ edges contains a copy of $H$, provided $n$ is sufficiently large (depending on $H$ and $\varepsilon$). The proof uses Szemerédi's regularity lemma, which partitions the vertices of any large graph into a bounded number of parts such that edges between most pairs of parts are distributed nearly randomly (in a precise quantitative sense). Given this partition, one can embed $H$ into the nearly-regular pieces by matching its chromatic structure to the multipartite structure of the partition. We do not prove the regularity lemma or the full Erdős–Stone theorem here; see Bollobás, *Modern Graph Theory*, Chapter VI, or Diestel, *Graph Theory*, Chapter 7.
The lower bound $\mathrm{ex}(n, H) \geq \left(1 - \frac{1}{\chi(H)-1}\right)\binom{n}{2}$ is immediate: the Turán graph $T_{\chi(H)-1, n}$ contains no copy of $H$ because any copy of $H$ would require a monochromatic edge (as $\chi(H) > \chi(H) - 1$), and Turán graphs have no edges within parts.
When $\chi(H) \geq 3$, the Erdős–Stone theorem determines $\mathrm{ex}(n, H)$ up to lower-order terms: the leading behaviour is $\left(1 - \frac{1}{\chi(H)-1}\right)\frac{n^2}{2}$. In particular, any non-bipartite graph $H$ (meaning $\chi(H) \geq 3$) has $\mathrm{ex}(n, H) \sim c n^2$ for a constant $c$ depending only on $\chi(H)$.
When $\chi(H) = 2$, the theorem gives $\mathrm{ex}(n, H) = o(n^2)$: a bipartite graph $H$ can be forbidden with subquadratically many edges. This is consistent with the Zarankiewicz bound, since any bipartite graph $H$ is a subgraph of $K_{t,t}$ for some $t$, and $\mathrm{ex}(n, K_{t,t}) \lesssim n^{2-1/t}$. The Erdős–Stone theorem does not, however, determine the correct exponent for bipartite graphs — this remains the central open problem in extremal graph theory.
[example: Erdős–Stone for Specific Graphs]
Let $H = C_5$, the cycle of length 5. Since $\chi(C_5) = 3$, the Erdős–Stone theorem gives
\begin{align*}
\lim_{n \to \infty} \frac{\mathrm{ex}(n, C_5)}{\binom{n}{2}} = 1 - \frac{1}{2} = \frac{1}{2}.
\end{align*}
So $\mathrm{ex}(n, C_5) \sim \frac{n^2}{4}$. The extremal graphs are approximately the Turán graphs $T_{2,n}$, i.e., complete bipartite graphs with equal parts, which have $\frac{n^2}{4}$ edges and no $C_5$ (being bipartite, they have no odd cycles at all). More precisely, any graph on $n$ vertices with more than $\left(\frac{1}{2} + \varepsilon\right)\binom{n}{2}$ edges and $n$ sufficiently large must contain a $C_5$.
Now let $H = K_4$. Since $\chi(K_4) = 4$, the Erdős–Stone theorem gives
\begin{align*}
\lim_{n \to \infty} \frac{\mathrm{ex}(n, K_4)}{\binom{n}{2}} = 1 - \frac{1}{3} = \frac{2}{3},
\end{align*}
and in fact $\mathrm{ex}(n, K_4) = e(T_{3,n})$ exactly, by Turán's theorem. The extremal graph is the complete 3-partite graph with equal parts.
[/example]
The Erdős–Stone theorem is a landmark unification: the single parameter $\chi(H)$ controls the asymptotic edge density of $H$-free graphs. The chromatic number, introduced in Chapter 3 as a colouring invariant, now appears as the decisive parameter governing the extremal structure. This connection between colouring and edge density is one of the deepest themes in graph theory, and the questions it opens — determining $\mathrm{ex}(n, H)$ for bipartite $H$, establishing stability results, and understanding the structure of near-extremal graphs — remain active areas of research.
In extremal graph theory, the chromatic number emerges as a decisive governing parameter, controlling how densely a graph can be packed with edges before forbidden substructures appear. Ramsey theory takes this theme in a new direction, asking not about edges in a single graph but about inevitable monochromatic structures that must appear whenever we partition any sufficiently large combinatorial object.
# 5. Ramsey Theory
Ramsey theory sits at the boundary between combinatorics, logic, and number theory, unified by a single philosophical principle: in any sufficiently large and complex combinatorial system, complete disorder is impossible. No matter how one attempts to colour, partition, or arrange the elements of a large enough structure, some organised sub-pattern is guaranteed to emerge. This chapter develops the foundational theorems of Ramsey theory in the context of graphs: the finite Ramsey theorem for complete graphs, its generalisation to infinite graphs and to colourings of higher-order sets, and then the quantitative question of how large "sufficiently large" must be — with both upper and lower bounds on Ramsey numbers.
The chapter connects naturally to the extremal graph theory of Chapter 4. There, we asked how many edges force a particular subgraph; here, we ask how many vertices force a monochromatic clique under any edge colouring. The probabilistic method, introduced in Chapter 4 for the Zarankiewicz problem and developed further in Chapter 6, appears again here in Erdős's landmark lower bound for Ramsey numbers.
## The Warm-Up: Six Vertices and Two Colours
Before developing general theory, it is worth asking what the simplest non-trivial instance of Ramsey's principle looks like. Given only six vertices and two colours, can we always force a monochromatic triangle regardless of how we colour the edges?
[quotetheorem:2045]
[citeproof:2045]
This argument is characteristic of Ramsey-type reasoning: a pigeonhole step produces a structured neighbourhood, and examining that neighbourhood yields the desired monochromatic substructure. The proof is short, but it illustrates exactly the pattern that the general theorem will formalise. The number six is tight here: a careful construction on $K_5$ — colouring the edges of a 5-cycle red and the remaining edges blue — produces a 2-colouring with no monochromatic triangle, so $R(3,3) = 6$.
## Ramsey Numbers and the Existence Theorem
The proposition above raises a precise quantitative question: for each pair of clique sizes $s$ and $t$, what is the smallest $n$ such that every 2-colouring of $K_n$ contains a red $K_s$ or a blue $K_t$? It is not at all obvious, a priori, that a finite such $n$ exists for every $s$ and $t$. The central theorem of this section establishes both existence and a recursive upper bound.
[definition: Ramsey Number R(s,t)]
Let $s, t \geq 2$. The **Ramsey number** $R(s,t)$ is the minimal $n \in \mathbb{N}$ such that every 2-edge colouring of $K_n$ contains either a red $K_s$ or a blue $K_t$.
The **diagonal Ramsey number** is $R(s) := R(s,s)$, the smallest $n$ such that every 2-colouring of $K_n$ contains a monochromatic $K_s$.
[/definition]
The symmetry $R(s,t) = R(t,s)$ follows immediately from swapping the role of the two colours. The base cases are also easy to identify: $R(2,t) = t$ for all $t \geq 2$, since forcing a red $K_2$ means forcing any red edge at all, which happens as soon as the graph is large enough to force a monochromatic $K_t$ otherwise.
With the Ramsey number now defined as the threshold we are searching for, the natural next question is not how to compute it, but whether it is finite at all. The following theorem answers this existence question: for every $s$ and $t$, a finite threshold does exist, and the recursive bound tells us concretely how the thresholds for $(s,t)$ depend on the smaller cases $(s-1,t)$ and $(s,t-1)$.
[quotetheorem:2046]
[citeproof:2046]
The recursion $R(s,t) \leq R(s-1,t) + R(s,t-1)$ mirrors Pascal's identity for binomial coefficients, and it is this structural parallel that leads to the binomial upper bound derived in the next section. The theorem tells us only that Ramsey numbers are finite — it says nothing about their actual values, which are notoriously difficult to compute. The argument works by finding a vertex with a large monochromatic neighbourhood and recursing: this is essentially the same pigeonhole strategy used in the warm-up proposition, now made inductive.
[remark: Known Ramsey Numbers]
Very few Ramsey numbers are known exactly. We have $R(3) = R(3,3) = 6$, established above and confirmed by the $K_5$ construction showing $R(3) > 5$. The value $R(4) = 18$ is known. The value of $R(5)$ remains unknown; current bounds are $43 \leq R(5) \leq 48$. The difficulty of computing even $R(5)$ illustrates that while Ramsey numbers exist and are finite, the recursion provides upper bounds that are far from sharp.
[/remark]
The near-total ignorance about specific Ramsey numbers, despite the elegance of the existence proof, reflects a deeper theme: Ramsey theory is fundamentally asymptotic. The existence theorem answers "does a threshold exist?" — the hard problem is determining where that threshold lies.
## Multicolour Ramsey Theory
The 2-colour theorem generalises immediately to any fixed number of colours, via a bootstrapping argument that reduces $k$ colours to 2 colours via one additional application of the two-colour theorem.
[definition: Multicolour Ramsey Number]
Let $k \geq 2$ and $s_1, \ldots, s_k \geq 2$. The **multicolour Ramsey number** $R_k(s_1, \ldots, s_k)$ is the minimal $n$ such that every $k$-edge colouring of $K_n$ contains a $K_{s_i}$ coloured with colour $i$ for some $i \in \{1, \ldots, k\}$.
[/definition]
The definition extends the two-colour framework to a genuine $k$-way competition: we are asking for the smallest graph that forces a monochromatic clique no matter how many colours are used. Before proving this number is finite, it is worth appreciating why this is not immediate from the two-colour case alone. Knowing that $K_N$ forces a red or blue clique under any 2-colouring does not directly tell us what happens under a 3-colouring, because the structure of three colour classes is fundamentally different from two, and no simple substitution collapses them. The induction below circumvents this by merging colours rather than separating them.
[quotetheorem:2047]
[citeproof:2047]
The theorem guarantees that $R_k(s_1,\ldots,s_k)$ is finite, but it is important to understand what it does not say. It gives no sharp bounds, and the merge-colours trick is genuinely lossy: when we merge $k$ colours into 2, we apply the two-colour theorem once to obtain a clique of size $R_{k-1}(s_2,\ldots,s_k)$, then recurse. At each stage the clique sizes are themselves Ramsey numbers from the previous level, so the resulting bound has tower-type dependence on $k$. Concretely, $R_3(s,s,s) \leq R(s, R(s,s))$, and since $R(s,s) \leq 4^s$, this gives a bound on the order of $4^{4^s}$ — a tower of height 2 in $s$. But the actual multicolour Ramsey number for three colours could grow much more slowly; the merge argument cannot detect this because it discards all structural information about the individual colour classes the moment they are lumped together. Whether the true growth of $R_k(s,\ldots,s)$ for fixed $k$ and $s \to \infty$ is much slower than tower$(k)$ is unknown in general. No explicit, derandomisable construction matching the probabilistic lower bounds is known for $k \geq 3$.
The reduction from $k$ colours to 2 is elegant but potentially wasteful, and the resulting bound is a tower-type bound that grows very rapidly with $k$. For practical purposes, the multicolour theorem is most useful in the $k=3$ case, where it says that among enough people any two of whom share a favourite colour, three will share the same colour.
## Infinite Ramsey Theory
The finite Ramsey theorem tells us that for any $s$, there exists a threshold $n$ beyond which every 2-colouring of $K_n$ forces a monochromatic $K_s$. A natural question is what happens if we allow $n = \infty$: does every 2-colouring of the countably infinite complete graph contain an infinite monochromatic clique?
[quotetheorem:2048]
[citeproof:2048]
The subtlety highlighted by the course is that the finite Ramsey theorem does not directly give this result. One cannot simply "take a limit" of the finite Ramsey statements: the finite version guarantees a clique of size $s$ for each fixed $s$, but these cliques need not be nested or converge to an infinite monochromatic set. The infinite theorem requires its own direct construction.
[remark: Extension to Finitely Many Colours]
The same proof works for any finite number of colours $k$: at each step, $x_i$ has infinitely many neighbours in $S_{i-1}$, and at least one colour class must be infinite. The inductive structure is identical to the multicolour reduction in the finite case.
[/remark]
The interplay between the finite and infinite versions runs deeper: one can also recover the finite Ramsey theorem from the infinite version via compactness, though the course notes the relationship without developing it in full. The infinite Ramsey theorem is thus not merely a strengthening of the finite case — it is a logically related but genuinely distinct result.
[example: Arithmetic Colourings]
Consider the colouring of $\mathbb{N}^{(2)}$ defined by $c(\{i,j\}) = v_2(i+j) \pmod{2}$, where $v_2(m)$ denotes the 2-adic valuation — the largest $n$ such that $2^n \mid m$. The set $X = \{2^2, 2^4, 2^6, \ldots\}$ of even powers of 2 is an infinite monochromatic clique under this colouring: for $i = 2^{2a}$ and $j = 2^{2b}$ with $a \neq b$, say $a < b$, we have $i + j = 2^{2a}(1 + 2^{2(b-a)})$, so $v_2(i+j) = 2a$, which is always even. Hence all edges in $X$ receive colour $0$.
By contrast, the colouring defined by $c(\{i,j\}) = \omega(i+j) \pmod{2}$, where $\omega(m)$ denotes the number of distinct prime factors of $m$, also has an infinite monochromatic clique by the infinite Ramsey theorem — but determining which colour class contains it is an open problem in analytic number theory.
[/example]
The second colouring in the example is a genuine illustration of how opaque the Ramsey theorem can be: existence is guaranteed, but explicit identification of the monochromatic set may depend on unresolved number-theoretic questions.
## Ramsey Theory for r-Sets
Both the finite and the infinite Ramsey theorems above concern colourings of 2-element subsets — the edges of a complete graph. A natural generalisation asks what happens when we colour $r$-element subsets for arbitrary $r \geq 2$.
To make the generalisation concrete before defining it, consider $r = 3$. Take $[6] = \{1,2,3,4,5,6\}$ and colour each 3-element subset $\{i,j,k\}$ by $(i+j+k) \pmod{2}$: a triple gets colour 0 if its element-sum is even, and colour 1 if odd. Does there exist a 4-element set $S \subseteq [6]$ whose six 3-subsets are all the same colour? One can check: $S = \{1,3,5,7\}$ (once we allow a larger ground set) would have all 3-subsets with odd sum, forming a monochromatic set under colour 1. The question of how large the ground set must be to guarantee such a monochromatic 4-element set is exactly the finite Ramsey question for $r=3$, generalised below.
[definition: r-Set Colouring]
For $r \geq 1$ and a set $\mathbb{N}$, write $\mathbb{N}^{(r)}$ for the collection of all $r$-element subsets of $\mathbb{N}$. A **2-colouring of $r$-sets** is a function $c : \mathbb{N}^{(r)} \to \{\text{red}, \text{blue}\}$. A set $X \subseteq \mathbb{N}$ is **monochromatic** for this colouring if all $r$-element subsets of $X$ receive the same colour.
[/definition]
With this language in place, we can ask the infinite version of the $r$-set question directly: does every 2-colouring of $\mathbb{N}^{(r)}$ admit an infinite monochromatic set? The answer is yes, and the proof proceeds by induction on $r$, reducing the $r$-set problem to the $(r-1)$-set problem via a clever freezing of the leading element.
[quotetheorem:2049]
[citeproof:2049]
This theorem is a cornerstone of combinatorics with consequences far beyond graph theory. It implies, for instance, that any infinite sequence of real numbers has an infinite monotone subsequence (Dilworth's theorem in the simplest case), and it underlies the combinatorial part of the Paris–Harrington theorem, which gives a Ramsey-type statement that is true but unprovable in Peano arithmetic (Combinatorics: Ancient and Modern, Wilson and Watkins, eds., Oxford University Press, 2013).
The finite analogue of this result — the existence of $R^{(r)}(s,t)$ — is a separate question that requires its own argument. The finite case does not follow from the infinite one by a naive restriction, because the infinite proof constructs the monochromatic set by an unbounded inductive process that has no natural finite analogue. To handle the finite case we define the relevant Ramsey numbers and prove their existence directly via double induction.
[definition: r-Set Ramsey Number]
Let $r \geq 1$ and $s, t \geq 1$. The **$r$-set Ramsey number** $R^{(r)}(s,t)$ is the minimal $n$ such that every 2-colouring of $\{1,\ldots,n\}^{(r)}$ contains either a red set $S$ with $|S| = s$ (all $r$-subsets of $S$ red) or a blue set $T$ with $|T| = t$ (all $r$-subsets of $T$ blue).
[/definition]
The definition mirrors the standard Ramsey number $R(s,t) = R^{(2)}(s,t)$ but lifts it to arbitrary $r$. Before stating the existence theorem, it is worth recording the boundary values, which will serve as base cases in the induction.
[remark: Base Cases of r-Set Ramsey Numbers]
The $r=1$ case: $R^{(1)}(s,t) = s + t - 1$, since colouring $n$ elements with two colours forces $s$ of one colour or $t$ of the other precisely when $n \geq s + t - 1$. The $r=2$ case recovers the standard Ramsey number: $R^{(2)}(s,t) = R(s,t)$. The boundary case $R^{(r)}(r,t) = t = R^{(r)}(t,r)$: any $r$-element set is automatically a "red $K_r$", so the constraint is vacuous on that side.
[/remark]
The recurrence for $R^{(r)}$ is more complex than the one for $R(s,t)$, but the structure is analogous:
[quotetheorem:2050]
[citeproof:2050]
The double induction on both $r$ and $s+t$ is not a technicality that could be replaced by a simpler single induction on $r$ alone. The reason is structural: the recurrence $R^{(r)}(s,t) \leq R^{(r-1)}(R^{(r)}(s-1,t), R^{(r)}(s,t-1)) + 1$ has $R^{(r)}$ on both sides — the right-hand side depends on $R^{(r)}$ at smaller values of $s+t$. So to know the value of $R^{(r-1)}$ needed at the inductive step for fixed $r$, one must already have established all values of $R^{(r)}(s',t')$ with $s'+t' < s+t$. A single induction on $r$ cannot provide this, because at the step $r \mapsto r$ it would require knowing $R^{(r)}$ in advance. The double induction resolves this by first establishing $R^{(r)}$ for all $s+t$ via the inner induction, and only then advancing $r$ in the outer induction.
The bound $R^{(r)}(s,t) \leq f_r(s+t)$ for appropriately defined rapidly-growing tower functions $f_r$ emerges from iterating this recurrence. Define $f_1(x) = 2x$ and $f_n(x) = f_{n-1}^x(x)$ (iterate $f_{n-1}$ on itself $x$ times). Then $f_2(x) \sim 2^x$, $f_3(x)$ is a tower of exponentials, and so on. The $r$-set Ramsey number $R^{(r)}(s,t)$ grows asymptotically on the order of $f_r(s+t)$. This hyper-rapid growth explains why even the $r=2$ case (standard Ramsey numbers) already outpaces our computational abilities for moderate values of $s$.
## Upper Bounds on Ramsey Numbers
The recursion $R(s,t) \leq R(s-1,t) + R(s,t-1)$ is powerful enough to produce an explicit closed-form upper bound via the Pascal-identity structure of binomial coefficients.
[quotetheorem:2051]
[citeproof:2051]
The bound $R(s) \leq 4^s$ should be interpreted as saying that Ramsey numbers grow at most exponentially in $s$. This is a genuine structural insight: it rules out superexponential growth, and the true growth rate of $R(s)$ is one of the most important open problems in combinatorics. The current best upper bound on $R(s,s)$ has the form $R(s,s) \leq (4 - \varepsilon)^s$ for a small explicit $\varepsilon > 0$ (Campos, Griffiths, Morris, Sahasrabudhe, 2023), a major recent breakthrough that had resisted improvement for sixty years.
## Lower Bounds: The Probabilistic Method
Upper bounds on Ramsey numbers come from explicit combinatorial constructions (like the recursive argument above). Lower bounds require demonstrating the existence of colourings with no large monochromatic clique. For small values, explicit constructions suffice.
[quotetheorem:2052]
[citeproof:2052]
This construction is elementary but only quadratic. The true lower bound on $R(s)$ is exponential, and establishing it required Erdős's introduction of the probabilistic method to combinatorics — arguably one of the most influential proof techniques of the twentieth century.
[quotetheorem:2053]
[citeproof:2053]
The Erdős lower bound is a landmark not just because of its quantitative content but because of the method it introduces. The proof derives the existence of a good colouring entirely from a probabilistic calculation — no colouring is ever written down explicitly. This is the hallmark of the probabilistic method: random objects are shown to have a desired property because the probability of failure is less than 1, which forces the complementary probability to be positive, guaranteeing existence. Seeing this argument for the first time, the natural reaction is scepticism: how can probability tell us anything about a deterministic combinatorial object? The answer is that probability here is merely a proof device. The random colouring model is a mathematical construct; what the calculation actually shows is that the average number of monochromatic cliques — averaged over all possible colourings — is less than 1. If the average is less than 1, at least one colouring must achieve 0 monochromatic cliques.
[remark: The Probabilistic Method]
The Erdős lower bound is **nonconstructive**: the proof guarantees that a good colouring exists without exhibiting one. Providing an explicit (deterministic) construction of a colouring of $K_n$ for $n = \lfloor 2^{s/2} \rfloor$ with no monochromatic $K_s$ is a major open problem in combinatorics. It is also open to improve the constant: any explicit construction showing $R(s) > (1+\varepsilon)^s$ for some $\varepsilon > 0$ would be a significant breakthrough.
[/remark]
Taken together, the upper and lower bounds give exponential growth: $2^{s/2} \leq R(s) \leq 4^s$. The true base of the exponential remains unknown, though it lies somewhere in the interval $[\sqrt{2}, 4]$. The gap between these bounds — a factor of $(\sqrt{2})^s$ in the exponent — captures the depth of our ignorance about Ramsey numbers, and closing even part of this gap has required some of the most sophisticated tools in modern combinatorics.
The Ramsey bounds we have derived — the exponential lower and upper bounds on diagonal Ramsey numbers — hint at a vast and largely unexplored landscape where precise values remain tantalisingly unknown. The probabilistic method, the subject of the next chapter, offers a powerful technique for navigating this landscape: instead of constructing graphs explicitly, we use probability to prove existence without revealing explicit structure.
# 6. Random Graphs
The probabilistic method is one of the most powerful tools in combinatorics: to prove that a graph with certain properties exists, one constructs a suitable probability space on graphs and shows the desired property holds with positive probability. This chapter develops that method in depth, beginning with a return to the Zarankiewicz problem from Chapter 4, then proving Erdős's celebrated theorem on graphs of simultaneously high chromatic number and high girth, and finally building the theory of binomial random graphs $G(n,p)$ — studying the sharp threshold for connectivity and the second moment method for subgraph containment.
## Probabilistic Lower Bounds for Zarankiewicz Numbers
Recall from Chapter 4 the Zarankiewicz number $Z(n,t)$: the maximum number of edges in a bipartite graph on vertex classes of size $n$ that contains no complete bipartite subgraph $K_{t,t}$. The Kővári–Sós–Turán theorem established the upper bound $Z(n,t) \leq 2n^{2-1/t}$, but no construction was given to show this is close to sharp. The question is: can we match this upper bound with a lower bound of the same order?
[quotetheorem:2054]
[citeproof:2054]
The lower bound $Z(n,t) \geq \frac{1}{2}n^{2-2/(t+1)}$ and the upper bound $Z(n,t) \leq 2n^{2-1/t}$ do not match in exponent: the exponents $2-2/(t+1)$ and $2-1/t$ differ unless one looks at $t = 1$ (where both collapse). For $t = 2$ the bounds are $n^{4/3}$ and $n^{3/2}$ respectively, and the true asymptotics remain open. The probabilistic argument here is a prototype for the **probabilistic deletion method**: construct a random object that is close to what you want, then remove a bounded amount of structure to fix the remaining violations.
## Graphs of High Girth and High Chromatic Number
A natural question in colouring theory is whether girth and chromatic number can be simultaneously large. Short cycles force a graph to "look locally like a tree," which might seem to force small chromatic number, since trees are 2-colourable. The remarkable content of Erdős's theorem is that this intuition is false: there is no constraint preventing both invariants from being large simultaneously.
[definition: Girth]
The **girth** of a graph $G$ is the length of a shortest cycle in $G$. If $G$ has no cycles, its girth is defined to be $\infty$.
[/definition]
A graph of girth at least $g$ contains no cycle of length $3, 4, \ldots, g-1$. Trees have infinite girth. The cycle $C_n$ has girth $n$. Before the probabilistic proof, we record a useful lower bound on chromatic number in terms of independence number.
[quotetheorem:2055]
[citeproof:2055]
This bound is sharp for vertex-transitive graphs (where all vertices are equivalent under automorphisms), but can be very loose in general. Its utility here is that controlling $\alpha(G)$ from above and $|G|$ from below gives a lower bound on $\chi(G)$ — the strategy Erdős uses below.
To see that the bound can be far from tight: the Kneser graph $\mathrm{KG}_{n,k}$, whose vertices are $k$-element subsets of $[n]$ with edges between disjoint sets, has independence number $\binom{n-1}{k-1}$ but chromatic number $n - 2k + 2$ by Lovász's theorem. For fixed $k$ and large $n$, the independence number grows polynomially while $n - 2k + 2$ grows linearly, so the ratio $|G|/\alpha(G)$ falls well short of $\chi(G)$.
[quotetheorem:2056]
[citeproof:2056]
This result is considered one of the jewels of the probabilistic method. The key conceptual step is that girth and chromatic number are controlled by different scales: short cycles are controlled by the first moment at scale $p \sim n^{-1+1/g}$, while independence number is controlled by the exponential decay of the probability that a fixed set is independent. The local structure (cycles) and the global structure (colouring) can be tuned independently. It is worth noting that the deletion step at the end is delicate: we delete vertices rather than edges, so that no new short cycles are introduced.
## The Binomial Random Graph Model
The probabilistic method of the previous sections used random graphs as a tool for proving deterministic existence. But the random graph itself is a rich object: what does a typical graph look like? What properties does it almost surely have for a given edge density? For small $p$ most graphs are forests; for $p = 1/2$ almost every graph has diameter 2. The central question becomes: for which $p$ does a given property $\mathcal{Q}$ hold with high probability, and how sharply does the transition occur? These questions have precise and beautiful answers for many natural properties.
[definition: Binomial Random Graph]
The **binomial random graph** $G(n,p)$ is the probability space on graphs with vertex set $\{1, \ldots, n\}$ in which each of the $\binom{n}{2}$ potential edges is included independently with probability $p \in [0,1]$.
[/definition]
The parameter $p$ may depend on $n$, and the central questions concern the limiting behaviour as $n \to \infty$. We write $G \sim G(n,p)$ for a random graph drawn from this model. The notation $a_n \ll b_n$ means $a_n/b_n \to 0$ as $n \to \infty$.
The density of $G(n,p)$ scales like $p\binom{n}{2} \sim pn^2/2$, so $p \sim c/n$ corresponds to an average degree $\sim c$. When $p$ grows faster than $1/n$, the graph becomes increasingly dense.
[explanation: Technique Recipe for Thresholds]
The same recipe underlies all threshold arguments in this chapter. To compute a threshold for a property $\mathcal{Q}$:
(i) **Define a counting variable.** Let $X$ count copies of the relevant structure (a subgraph, an isolated vertex, etc.) in $G \sim G(n,p)$.
(ii) **Compute $\mathbb{E}[X]$.** Identify the scale of $p$ at which $\mathbb{E}[X]$ transitions from $0$ to $\infty$.
(iii) **Upper regime (property fails a.a.s.).** Apply Markov's inequality: $\mathbb{P}(X \geq 1) \leq \mathbb{E}[X] \to 0$ shows that $X = 0$ with high probability, so the property is absent.
(iv) **Lower regime (property holds a.a.s.).** Compute $\operatorname{Var}(X)$ and apply the second moment method: if $\operatorname{Var}(X)/(\mathbb{E}[X])^2 \to 0$, then $\mathbb{P}(X = 0) \to 0$, so the property is present with high probability.
The threshold $p^*$ is where $\mathbb{E}[X] \asymp 1$, and the second moment method succeeds whenever the copies of the structure are "nearly independent," i.e., most pairs share no edges.
[/explanation]
### Triangles: First and Second Moment Methods
Triangles are the simplest non-trivial subgraph, and studying their appearance in $G(n,p)$ is the natural first test for the probabilistic toolkit. The key transition occurs near $p = 1/n$.
Let $X$ denote the number of triangles in $G \sim G(n,p)$. By linearity of expectation,
\begin{align*}
\mathbb{E}[X] = \binom{n}{3} p^3 = \frac{n(n-1)(n-2)}{6} p^3.
\end{align*}
For $n$ large this is approximately $\frac{n^3 p^3}{6} = \frac{(np)^3}{6}$. We see that $\mathbb{E}[X] \to 0$ when $p \ll 1/n$, and $\mathbb{E}[X] \to \infty$ when $p \gg 1/n$.
When $p \ll 1/n$ (so $np \to 0$), Markov's inequality $\mathbb{P}(X \geq 1) \leq \mathbb{E}[X] \to 0$ immediately gives that $G$ contains no triangle with high probability. The case $p \gg 1/n$ requires a more refined argument: the first moment showing $\mathbb{E}[X] \to \infty$ does not by itself imply that $\mathbb{P}(X \geq 1) \to 1$. We need to control variance.
[quotetheorem:514]
This is the simplest first moment bound, and it is tight when $X$ concentrates all its mass at one point. For our purposes it gives upper bounds: when $\mathbb{E}[X]$ is small, $X$ is unlikely to be positive. The nonnegativity hypothesis is essential: a random variable taking values $\pm n$ with equal probability has $\mathbb{E}[X] = 0$, so the inequality $\mathbb{P}(X \geq a) \leq \mathbb{E}[X]/a = 0$ would be false for any $a > 0$ — but of course $\mathbb{P}(X \geq n) = 1/2 > 0$. The bound only applies when $X \geq 0$ so that large values genuinely cost probability.
[quotetheorem:1126]
Chebyshev's inequality requires finite variance: a random variable like the Cauchy distribution has no finite mean or variance, and no analogue of this bound holds for it. The bound is tight for the two-point distribution that puts mass $1/2$ at $\mathbb{E}[X] \pm \sigma$ (where $\sigma^2 = \operatorname{Var}(X)$): setting $t = \sigma$ gives $\mathbb{P}(|X - \mathbb{E}[X]| \geq \sigma) = 1 = \sigma^2/\sigma^2$. Without a finite variance assumption, even a sum of indicators with bounded mean can have arbitrary tail behaviour: if $X_n$ concentrates all its probability at $0$ and $n$ with weights $1 - 1/n$ and $1/n$, then $\mathbb{E}[X_n] = 1$ but $\operatorname{Var}(X_n) = n - 1 \to \infty$, and the Chebyshev bound gives no useful information. The key mechanism for the second moment method is combining Chebyshev with $\mathbb{E}[X] \to \infty$: when the variance is small relative to the squared mean, the distribution is concentrated near its mean and therefore positive with high probability.
[quotetheorem:2057]
[citeproof:2057]
The second moment method reduces the problem $\mathbb{P}(X = 0) \to 0$ to showing $\operatorname{Var}(X)/(\mathbb{E}[X])^2 \to 0$. This ratio equals $\mathbb{E}[X^2]/(\mathbb{E}[X])^2 - 1$, which explains why the method is sometimes stated as "show $\mathbb{E}[X^2] \sim (\mathbb{E}[X])^2$." The variance is small relative to the mean squared precisely when the indicator random variables are nearly uncorrelated, which happens when most pairs of copies of the subgraph share no edges. Note that the method can fail even when $\mathbb{E}[X] \to \infty$: if $\operatorname{Var}(X) = \omega((\mathbb{E}[X])^2)$, the ratio diverges and we learn nothing. For example, taking $X \sim \mathrm{Binomial}(n, 1/n)$ gives $\mathbb{E}[X] = 1$ and $\operatorname{Var}(X) \sim 1$, so $\operatorname{Var}(X)/(\mathbb{E}[X])^2 \sim 1$ and the bound gives only $\mathbb{P}(X = 0) \leq 1$, which is useless; indeed $\mathbb{P}(X = 0) = (1 - 1/n)^n \to e^{-1} \approx 0.37 > 0$.
[quotetheorem:2058]
[citeproof:2058]
The threshold $p \sim 1/n$ is called a **coarse threshold** for triangle containment. The terminology distinguishes it from the sharp threshold encountered in the next section. At the boundary $p = \lambda/n$ for fixed $\lambda > 0$, the limiting probability is $1 - e^{-\lambda^3/6}$, but establishing this requires the method of moments or Poisson approximation and falls outside this course.
[example: The Triangle Threshold at $p = c/n$]
Let $G \sim G(n,p)$ with $p = c/n$ for a fixed constant $c > 0$, and let $X$ count triangles. We compute:
\begin{align*}
\mathbb{E}[X] = \binom{n}{3}\left(\frac{c}{n}\right)^3 = \frac{n(n-1)(n-2)}{6} \cdot \frac{c^3}{n^3} \to \frac{c^3}{6}.
\end{align*}
So the expected number of triangles converges to the finite limit $c^3/6$.
To see what happens to $\operatorname{Var}(X)$, note that for $p = c/n$ the variance computation from the Triangle Threshold proof gives
\begin{align*}
\operatorname{Var}(X) \leq \mathbb{E}[X] + 3n^4 \cdot \frac{c^5}{n^5} = \frac{c^3}{6} + \frac{3c^5}{n} \to \frac{c^3}{6}.
\end{align*}
The variance and mean both converge to the same constant $c^3/6$. This is characteristic of a Poisson limit: a random variable $X$ with $\mathbb{E}[X] \to \lambda$ and $\operatorname{Var}(X) \to \lambda$ and whose distribution concentrates on small integer values converges in distribution to $\mathrm{Poisson}(\lambda)$. A careful application of the method of moments (computing all factorial moments) confirms that $X \xrightarrow{d} \mathrm{Poisson}(c^3/6)$. In particular,
\begin{align*}
\mathbb{P}(G \text{ contains no triangle}) \to \mathbb{P}(\mathrm{Poisson}(c^3/6) = 0) = e^{-c^3/6}.
\end{align*}
For $c < (6 \log 2)^{1/3} \approx 1.75$, this probability exceeds $1/2$, so most graphs at this density are triangle-free; for $c > (6 \log 2)^{1/3}$, most graphs contain triangles.
[/example]
[example: The Threshold for $K_4$ Containment]
We determine the threshold $p^*$ such that $K_4 \subset G(n,p)$ holds with high probability if and only if $p \gg p^*$. Let $X$ count copies of $K_4$ in $G \sim G(n,p)$. Since $K_4$ has 4 vertices and 6 edges,
\begin{align*}
\mathbb{E}[X] = \binom{n}{4} p^6 \approx \frac{n^4}{24} p^6.
\end{align*}
Setting $\mathbb{E}[X] \asymp 1$ gives $n^4 p^6 \asymp 1$, so $p \asymp n^{-4/6} = n^{-2/3}$.
This exponent $-2/3$ is precisely $-1/m(K_4)$ where $m(K_4) = e(K_4)/|K_4| = 6/4 = 3/2$ is the edge density of $K_4$. The general principle (the densest subgraph controls the threshold) applies here because $K_4$ is itself the densest subgraph of $K_4$ — every proper subgraph $H \subsetneq K_4$ satisfies $e(H)/|H| < 3/2$. For instance, a triangle has $e/v = 3/3 = 1 < 3/2$, and a single edge has $e/v = 1/2$.
To confirm both directions: when $p \ll n^{-2/3}$, $\mathbb{E}[X] \to 0$ and Markov gives $\mathbb{P}(K_4 \subset G) \to 0$. When $p \gg n^{-2/3}$, the second moment method applies because $K_4$ is balanced (all subgraphs have smaller density), so $\operatorname{Var}(X)/(\mathbb{E}[X])^2 \to 0$ and $\mathbb{P}(K_4 \subset G) \to 1$. Thus the threshold for $K_4$ is $p^* = n^{-2/3}$.
[/example]
[remark: Densest Subgraph Controls the Threshold]
The triangle threshold is not representative of all subgraph thresholds. Consider the graph $H$ consisting of a triangle together with $1000$ additional isolated vertices. The expected number of copies of $H$ in $G(n,p)$ grows like $\binom{n}{1003}p^3 \approx n^{1003} p^3/1003!$, which becomes large when $p = n^{-1000/3} \ll 1/n$. At this value of $p$, though, $G(n,p)$ contains no triangles with high probability, so certainly no copies of $H$.
The general principle is: the threshold for $H \subset G$ is governed by the **densest subgraph** of $H$, the subgraph $K$ that maximises the ratio $e(K)/|K|$ over all subgraphs $K$ with at least one edge. When the expected number of copies of $K$ in $G(n,p)$ diverges, the probability that $G$ contains a copy of $H$ approaches $1$.
[/remark]
The significance of this remark is that the "bottleneck" for a graph $H$ appearing in $G(n,p)$ is the part of $H$ that is hardest to embed, which is its densest subgraph. For balanced graphs (where every subgraph has the same density as $H$ itself, like the triangle), the threshold is simply $p \sim n^{-1/m(H)}$ where $m(H) = e(H)/|H|$.
## Connectivity and Sharp Thresholds
The emergence of connectivity in $G(n,p)$ illustrates a qualitatively different phenomenon: a **sharp threshold**, where the transition from the property being absent to it being present occurs over a much narrower window of $p$ than the coarse threshold $p \sim 1/n$ seen for triangles.
The obstacle to connectivity is the existence of isolated vertices. An isolated vertex $v$ has no edges to the rest of the graph, so it cannot be connected to any other vertex. When $p$ is below the connectivity threshold, isolated vertices are the primary obstruction.
[quotetheorem:2059]
[citeproof:2059]
The critical scale $p \sim \frac{\log n}{n}$ is conspicuously larger than $1/n$. The reason is that for vertex $v$ to become isolated, all $n-1$ potential edges must fail to appear; the probability of this is $(1-p)^{n-1} \approx e^{-pn}$, so we need $e^{-pn} \approx 1/n$ for the expected number of isolated vertices to be order 1, giving $pn \approx \log n$.
[remark: What the Isolated Vertex Threshold Does Not Say]
The theorem above gives a clean dichotomy at the scale $p \sim \log n / n$, but it leaves open several finer questions. It does not describe the distribution of the number of isolated vertices at the critical scale $p = (\log n + c)/n$ for fixed $c \in \mathbb{R}$: at this scale, the expected number of isolated vertices is $e^{-c}$, and the distribution converges to $\mathrm{Poisson}(e^{-c})$. It also does not describe the width of the transition window: the transition from "a.a.s. no isolated vertices" to "a.a.s. isolated vertices exist" occurs over a window of width $O(1/n)$ around $p = \log n/n$, which is much sharper than the $\Theta(1/n)$ threshold window for triangles.
[/remark]
[quotetheorem:2060]
[citeproof:2060]
The connectivity threshold and the isolated vertex threshold are in fact the same: a graph is disconnected if and only if it has an isolated vertex at this scale. More precisely, the bottleneck to connectivity at $p \sim \log n/n$ is exactly the isolated vertices — once all vertices have at least one edge, the graph is connected with high probability. This is a genuinely remarkable coincidence of two thresholds.
[remark: Sharp vs Coarse Thresholds]
The connectivity threshold illustrates a **sharp threshold**: the transition from "disconnected with high probability" to "connected with high probability" occurs over a window of width $o(\log n/n)$ around $p = \log n/n$. By contrast, the triangle threshold is **coarse**: the transition occurs over an interval of width $\sim 1/n$, and there is an entire range $p \sim c/n$ where neither $\mathbb{P}(K_3 \subset G) \to 0$ nor $\mathbb{P}(K_3 \subset G) \to 1$.
Heuristically, sharp thresholds tend to arise for global properties (connectivity, Hamiltonicity) that depend on the entire graph in an all-or-nothing way. Coarse thresholds tend to arise for local properties (triangle containment, fixed-subgraph containment) that depend only on a constant-size subset of vertices.
[/remark]
[remark: What the Connectivity Threshold Does Not Say]
The connectivity theorem says nothing about the fine structure of the transition. At the critical point $p = (\log n + c)/n$, the limiting probability that $G$ is connected equals the limiting probability that $G$ has no isolated vertices, and both converge to $e^{-e^{-c}}$. The theorem as stated does not capture this quantitative window, nor does it describe the order in which small components (paths, trees, unicyclic graphs) disappear as $p$ increases toward the connectivity threshold. These finer questions involve Poisson limits and exact enumeration of small components.
[/remark]
The interplay between threshold phenomena and graph properties is one of the central themes of modern probabilistic combinatorics. The techniques developed here — first moment (Markov), second moment (Chebyshev), and the union bound — are the fundamental tools; more sophisticated questions require finer methods such as the Lovász Local Lemma, martingale concentration inequalities, and Fourier analysis on the Boolean hypercube.
The probabilistic method and its refinements — moment methods, concentration inequalities, and the Lovász Local Lemma — have shown how randomness can be used to prove existence results and establish tight bounds on random graph properties. To push these techniques further and to unlock the deepest structures in graph theory, we now introduce the algebraic perspective: the adjacency matrix and its spectrum provide a gateway to linear algebra, offering computational power and structural insight far beyond what counting arguments alone can deliver.
# 7. Algebraic Graph Theory
This chapter brings linear algebra to bear on graph theory, giving access to tools far more powerful than purely combinatorial counting arguments. The central object is the adjacency matrix of a graph, a symmetric matrix whose eigenvalues encode deep structural properties of the graph. Along the way we encounter Moore graphs — extremal objects whose existence is tied to a remarkable integrality condition — and strongly regular graphs, a class so constrained by spectral arithmetic that only finitely many degree sequences are possible for each order.
## Diameter Constraints and Moore Graphs
How many vertices can a graph have if every pair of vertices is connected by a short path? This is not merely a curiosity: it underlies the design of efficient communication networks, where low diameter means fast routing. The simplest non-trivial case is diameter at most 2, and the answer turns out to hinge on the maximum degree alone.
[definition: Diameter]
Let $G$ be a connected graph. The **diameter** of $G$ is
\begin{align*}
\operatorname{diam}(G) = \max\{d(x, y) \mid x, y \in V(G)\},
\end{align*}
where $d(x,y)$ denotes the length of a shortest path between $x$ and $y$.
[/definition]
The diameter equals $1$ precisely when $G$ is complete, in which case $G$ has $\binom{n}{2}$ edges. For diameter $2$, every vertex is reachable in at most two steps from any other vertex, which forces a tight relationship with the maximum degree $\Delta(G)$.
[quotetheorem:2061]
[citeproof:2061]
The bound $\Delta(G)^2 + 1$ is tight only for very special graphs. Equality forces the neighbourhood structure to be as efficient as possible: every vertex has exactly $\Delta(G)$ neighbours, and every pair of non-adjacent vertices has exactly one common neighbour. This motivates the following definition.
[definition: Moore Graph]
A connected graph $G$ with $\operatorname{diam}(G) \leq 2$ is a **Moore graph** if $|G| = \Delta(G)^2 + 1$.
[/definition]
Moore graphs are necessarily regular: if any vertex had degree less than $\Delta(G)$, the counting in the proof above would be strict, preventing equality. They also contain no triangles: if $x \sim y \sim z \sim x$, then $y$ and $z$ are both neighbours of $x$ who share $x$ as a common neighbour, but they are also adjacent, giving two paths of length at most $2$ between them — contradicting the uniqueness property characterising Moore graphs.
[remark: Characterisation of Moore Graphs]
A graph $G$ is a Moore graph if and only if it is connected with $\operatorname{diam}(G) \leq 2$ and every distinct pair $x, y \in V(G)$ has a **unique** path of length at most $2$ between them. This uniqueness characterisation is equivalent to equality in the counting bound.
[/remark]
The two smallest Moore graphs are well-known: the 5-cycle $C_5$ is a Moore graph with $\Delta = 2$, and the Petersen graph is a Moore graph with $\Delta = 3$. Determining which degrees permit Moore graphs requires the spectral machinery developed in the following sections.
## Adjacency Matrices and Their Eigenvalues
Given a graph, how do we encode its structure in an algebraic object we can compute with? The adjacency matrix is the natural answer — it records which pairs of vertices are adjacent — but its real power comes from treating the associated eigenvalue problem seriously.
[definition: Adjacency Matrix]
Let $G$ be a graph on vertex set $\{1, \ldots, n\}$. The **adjacency matrix** of $G$ is the $n \times n$ matrix $A_G$ with entries
\begin{align*}
(A_G)_{xy} = \mathbb{1}_{xy \in E(G)}.
\end{align*}
[/definition]
The matrix $A_G$ is symmetric with zero diagonal, so $\operatorname{tr}(A_G) = 0$. Since $A_G$ is symmetric, it is diagonalisable over $\mathbb{R}$ and all its eigenvalues are real.
[quotetheorem:2062]
[citeproof:2062]
This proposition has immediate applications: the diagonal entry $(A_G^3)_{xx}$ counts closed walks of length $3$ through $x$, which equals twice the number of triangles through $x$. The trace $\operatorname{tr}(A_G^2) = \sum_{xy} (A_G)_{xy}^2 = 2|E(G)|$ recovers the edge count.
[example: Adjacency Matrix of $C_4$]
Consider the 4-cycle $C_4$ on vertices $\{1, 2, 3, 4\}$ with edges $12, 23, 34, 41$. The adjacency matrix is
\begin{align*}
A_{C_4} = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}.
\end{align*}
For the vector $x = (1, 2, -2, 3)^\top$, the product $A_{C_4} x$ has $i$-th entry equal to the sum of $x_j$ over all $j \sim i$:
\begin{align*}
(A_{C_4} x)_1 &= x_2 + x_4 = 2 + 3 = 5, \\
(A_{C_4} x)_2 &= x_1 + x_3 = 1 + (-2) = -1, \\
(A_{C_4} x)_3 &= x_2 + x_4 = 2 + 3 = 5, \\
(A_{C_4} x)_4 &= x_1 + x_3 = 1 + (-2) = -1.
\end{align*}
So $A_{C_4} x = (5, -1, 5, -1)^\top$.
To find the eigenvalues, note first that $\mathbf{1} = (1,1,1,1)^\top$ satisfies $A_{C_4}\mathbf{1} = 2\mathbf{1}$, so $\lambda_1 = 2$ is an eigenvalue. The matrix $A_{C_4}$ has rank $2$ (the row space is spanned by $(0,1,0,1)$ and $(1,0,1,0)$), giving two zero eigenvalues. Since eigenvalues sum to $\operatorname{tr}(A_{C_4}) = 0$, we need $2 + 0 + 0 + \lambda_4 = 0$, giving $\lambda_4 = -2$. The vector $(1,-1,1,-1)^\top$ confirms this: $A_{C_4}(1,-1,1,-1)^\top = (-2, 2, -2, 2)^\top = -2(1,-1,1,-1)^\top$.
[/example]
The spectral properties of symmetric matrices now give us powerful tools. By the spectral theorem, $A_G$ has an orthonormal basis of eigenvectors $u_1, \ldots, u_n$ with real eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Since $\operatorname{tr}(A_G) = 0$, the eigenvalues sum to zero, so if $G$ is non-empty then $\lambda_1 > 0$ and $\lambda_n < 0$.
The largest and smallest eigenvalues admit a variational characterisation that will be essential for the key bounds.
[quotetheorem:2063]
The expression $\langle x, Ax \rangle / \langle x, x \rangle$ is called the **Rayleigh quotient** of $x$. The theorem states that the extremal eigenvalues are exactly the extremal values of the Rayleigh quotient, and the extrema are attained at the corresponding eigenvectors. This result follows from the spectral decomposition of $A$ and will be used extensively below. The proof uses standard linear algebra from Part IB.
The following proposition collects the fundamental spectral bounds for graph eigenvalues, connecting them to the minimum and maximum degrees of $G$.
[quotetheorem:2064]
[citeproof:2064]
Part (ii) of this theorem has a striking implication: in a connected graph, the all-ones vector is an eigenvector exactly when the graph is regular. Part (iii) characterises bipartite regular graphs by the symmetry of their spectrum: the eigenvalue $-\Delta$ is the "mirror" of $\Delta$, reflecting the bipartition structure. Part (iv) provides a lower bound on the largest eigenvalue purely from the degree sequence, with equality when the graph is regular.
## Strongly Regular Graphs
The spectral analysis of adjacency matrices becomes especially powerful when applied to graphs with a high degree of symmetry. Strongly regular graphs are defined by three parameters controlling the neighbourhood intersection counts, and the integrality of eigenvalue multiplicities imposes a severe arithmetic constraint on which parameter triples are feasible.
[definition: Strongly Regular Graph]
A graph $G$ on $n$ vertices is **$(k, a, b)$-strongly regular** if:
(i) $G$ is $k$-regular;
(ii) every pair of adjacent vertices has exactly $a$ common neighbours: $|N(x) \cap N(y)| = a$ for all $x \sim y$;
(iii) every pair of distinct non-adjacent vertices has exactly $b$ common neighbours: $|N(x) \cap N(y)| = b$ for all $x \not\sim y$, $x \neq y$.
[/definition]
The uniformity of neighbourhood intersections is a very strong condition: it means the graph looks "the same" from every edge and from every non-edge, a discrete analogue of vertex-transitivity. The cycle $C_4$ is $(2,0,2)$-strongly regular (adjacent vertices share no common neighbour, non-adjacent vertices share both remaining vertices as common neighbours), while $C_5$ is $(2,0,1)$-strongly regular. Any Moore graph is $(k, 0, 1)$-strongly regular, since adjacent vertices in a triangle-free graph share no common neighbour, and non-adjacent vertices must have a unique common neighbour by the path-of-length-2 uniqueness property.
The power of the strongly regular framework lies in what the adjacency matrix equation forces about the eigenvalue spectrum. Since the number of common neighbours between $x$ and $y$ is $(A_G^2)_{xy}$, the conditions (ii) and (iii) translate directly into a matrix polynomial identity.
[quotetheorem:2065]
[citeproof:2065]
The integrality condition in this theorem is genuinely powerful. It says that for most triples $(k, a, b)$, no strongly regular graph with those parameters can exist — the arithmetic simply fails to produce integers. This severely restricts which regular graphs with uniform neighbourhood intersections can appear.
Returning to Moore graphs, their spectral parameters are forced.
[quotetheorem:2066]
[citeproof:2066]
The cases $k = 2$ and $k = 3$ are realised by $C_5$ and the Petersen graph. The case $k = 7$ is realised by the Hoffman–Singleton graph, a remarkable 50-vertex graph constructed explicitly. The case $k = 57$ remains open: it is not known whether a Moore graph of degree $57$ exists. No proof of non-existence has been found, but no construction is known either. This unresolved case stands as one of the more tantalising open problems in algebraic combinatorics.
[remark: The $k = 57$ Problem]
The existence of a Moore graph of degree $57$ (which would have $57^2 + 1 = 3250$ vertices) is unknown. Hoffman and Singleton proved in 1960 that these four degrees are the only possibilities; subsequent work has shown that if such a graph exists, it cannot be vertex-transitive. The problem has resisted all attempts at resolution for over sixty years.
[/remark]
## References
- Bollobás, B. *Modern Graph Theory*. Springer, 1998.
- Diestel, R. *Graph Theory*, 5th ed. Springer, 2017.
- Alon, N. and Spencer, J. H. *The Probabilistic Method*, 4th ed. Wiley, 2016.
- Cambridge Part II Mathematical Tripos: Graph Theory (2024 lecture notes).