Combinatorial optimisation studies how to choose the best object from a finite collection subject to discrete constraints. These notes develop the certificate viewpoint behind exact algorithms: matchings, flows, matroids, submodular functions, and polyhedra are useful because they let a proposed optimum be checked by an independently verifiable structure.
# 1. Optimisation Problems on Finite Structures
Combinatorial optimisation studies how to choose the best object from a finite but often enormous collection. The objects may be paths in a graph, subsets of a ground set, matchings between two populations, or integer vectors satisfying linear constraints. The point of the subject is not only to compute an optimum, but to explain why the computation is correct through structures such as exchanges, augmenting paths, cuts, and dual certificates.
This first chapter sets up the language used throughout the course. We describe finite optimisation problems, the role of weights, the meaning of feasibility, and the basic forms of certificate that allow an algorithm to stop with a proof of optimality. The prerequisites are basic discrete mathematics, elementary graph theory, and enough linear algebra to read vectors, matrices, and linear inequalities. The examples are deliberately classical: shortest paths and minimum spanning trees return as running certificate examples, while assignment problems return in Chapter 3 as weighted bipartite matching.
## Optimising Over Finite Families
What does it mean to optimise when the search space is finite but too large to list? The starting point is to separate the data of a problem from its feasible solutions and objective function. This separation lets us compare graph problems, set-system problems, and integer problems using the same vocabulary. The definition below names this common format so later arguments can talk about feasibility, objective values, and certificates without reintroducing the surrounding model each time.
[definition: Finite Optimisation Problem]
A finite optimisation problem consists of a finite feasible set $\mathcal F$, an objective function $c: \mathcal F \to \mathbb R$, and a choice of whether to minimise or maximise $c$ over $\mathcal F$.
[/definition]
The feasible set is rarely given by an explicit list. In combinatorial optimisation it is usually described implicitly, for example as all paths between two vertices, all spanning trees of a graph, or all integer points satisfying a collection of constraints. The first example shows this implicit description in a graph.
[example: Shortest Paths As Optimisation]
Let $G=(V,E)$ be a directed graph, let $s,t\in V$, and let $w:E\to\mathbb R$ assign a length to each directed edge. A feasible solution is a directed $s$-$t$ path
\begin{align*}
P=(v_0v_1,\ v_1v_2,\ \dots,\ v_{k-1}v_k),
\end{align*}
where $v_0=s$, $v_k=t$, and $v_{i-1}v_i\in E$ for each $i=1,\dots,k$. Its objective value is the sum of the edge lengths along the path:
\begin{align*}
w(P)=\sum_{e\in P}w(e).
\end{align*}
For this path, the same sum written in vertex order is
\begin{align*}
w(P)=w(v_0v_1)+w(v_1v_2)+\cdots+w(v_{k-1}v_k).
\end{align*}
The shortest path problem is therefore the minimisation problem of finding a feasible directed $s$-$t$ path $P$ whose value $w(P)$ is as small as possible. The feasible family is described implicitly by the graph and the endpoints $s,t$, so this is a finite optimisation problem even when the individual paths are not listed one by one.
[/example]
The path example separates the feasible family from the numerical edge lengths. This motivates a distinction that will recur whenever a structural existence problem is refined into an optimisation problem with costs or profits.
[definition: Weighted And Unweighted Problem]
A finite optimisation problem is unweighted if every feasible solution is compared only by an intrinsic size or by feasibility itself. It is weighted if the input includes a finite index set $I$, a weight function $w:I\to\mathbb R$, and an objective function $c:\mathcal F\to\mathbb R$ whose value on each feasible solution is computed from the weights $w_i$.
[/definition]
This distinction matters because the same feasible family can support very different questions. Sometimes feasibility or cardinality is the whole problem; in other cases the structure only describes the candidates, while the weights decide which candidate is best.
For graph problems, weights are often assigned to edges or vertices. For set systems, weights are assigned to elements of the ground set. For integer-point problems, weights appear as coefficients in a linear objective. The spanning tree problem shows how weights can create the real optimisation task.
[example: Minimum Spanning Tree]
Let $G=(V,E)$ be a connected undirected graph with edge weights $w:E\to \mathbb R$. A feasible solution is a spanning tree $T\subset E$, meaning that the subgraph with edge set $T$ is connected, contains every vertex of $G$, and has no cycle. Its objective value is the total weight
\begin{align*}
w(T)=\sum_{e\in T}w(e).
\end{align*}
Thus, if $T=\{e_1,e_2,\dots,e_m\}$, then the same value written edge by edge is
\begin{align*}
w(T)=w(e_1)+w(e_2)+\cdots+w(e_m).
\end{align*}
The minimum spanning tree problem is the minimisation problem
\begin{align*}
\text{minimise } \sum_{e\in T}w(e)
\quad\text{over all spanning trees }T\subset E.
\end{align*}
In the unweighted version, all feasible solutions have the same intrinsic size: every spanning tree on the vertex set $V$ has $|V|-1$ edges. Therefore counting edges cannot distinguish one spanning tree from another, while the weighted objective can distinguish them through the values $w(e)$. This is why the weights create the meaningful optimisation problem.
[/example]
Spanning trees are subsets of the edge set satisfying a feasibility rule. To compare them with matchings, independent sets, and selected jobs, we need a neutral term for a finite ground set together with a family of allowed subsets. This terminology also prepares the later exchange and matroid chapters, where the shape of the feasible family is more important than the particular graph that produced it.
[definition: Set System]
A set system is a pair $(E,\mathcal F)$ where $E$ is a finite ground set and $\mathcal F\subset 2^E$ is a family of feasible subsets.
[/definition]
Many graph problems are set-system problems after choosing $E$ to be an edge set or vertex set. Spanning trees, matchings, independent sets, and cuts can all be encoded this way, although the resulting families have very different algorithmic behaviour. Matchings give the form that will dominate the next chapter.
[example: Assignment As Matching]
Let $W$ be a finite set of workers and $J$ a finite set of jobs. In the bipartite graph $G=(W\cup J,E)$, an edge $wj\in E$ means that worker $w$ can perform job $j$. A feasible assignment is a matching $M\subset E$: this means that no worker is used by two edges of $M$, and no job is used by two edges of $M$.
If profits $p:E\to\mathbb R$ are given, then the value of the assignment $M$ is
\begin{align*}
p(M)=\sum_{e\in M}p(e).
\end{align*}
Writing the matching as
\begin{align*}
M=\{w_1j_1,\ w_2j_2,\ \dots,\ w_kj_k\},
\end{align*}
where the workers $w_1,\dots,w_k$ are distinct and the jobs $j_1,\dots,j_k$ are distinct, the same objective is
\begin{align*}
p(M)=p(w_1j_1)+p(w_2j_2)+\cdots+p(w_kj_k).
\end{align*}
Thus the weighted assignment problem is the maximisation problem
\begin{align*}
\text{maximise } \sum_{e\in M}p(e)
\quad\text{over all matchings }M\subset E,
\end{align*}
with the optional additional constraint that every job is assigned. Under that extra requirement, for each $j\in J$ there must be exactly one edge $wj\in M$, so the matching selects one compatible worker for every job while never assigning one worker to two selected jobs.
[/example]
The matching example can be treated as a family of subsets, but algorithms and duality often need algebraic constraints as well. This motivates encoding choices by integer variables so that feasibility becomes a system of equations and inequalities. The point is not to replace the combinatorics, but to create a language in which relaxations, dual variables, and polyhedral certificates can later be compared directly with the discrete problem.
[definition: Integer Formulation]
An integer formulation of a finite optimisation problem consists of a feasible set
\begin{align*}
X=\{x\in \mathbb Z^n : x \text{ satisfies the stated constraints}\}
\end{align*}
and an objective function $f:X\to \mathbb R$, often of the form $f(x)=c^\top x$ for some $c\in \mathbb R^n$.
[/definition]
For many subset problems, the vector $x$ is an incidence vector: $x_e=1$ if element $e$ is chosen and $x_e=0$ otherwise. This encoding turns combinatorial choices into algebraic constraints, which is the bridge to relaxations and dual certificates.
[example: Incidence Vector Of A Matching]
Let $G=(W\cup J,E)$ be a bipartite graph, and let $M\subseteq E$ be a matching. Its incidence vector is the vector $x\in\{0,1\}^E$ defined by setting $x_e=1$ when $e\in M$ and $x_e=0$ when $e\notin M$. For a vertex $v\in W\cup J$, the sum
\begin{align*}
\sum_{e\ni v}x_e
\end{align*}
counts the number of selected edges incident to $v$, because each incident edge in $M$ contributes $1$ and each incident edge outside $M$ contributes $0$. Since $M$ is a matching, at most one selected edge is incident to $v$, so
\begin{align*}
\sum_{e\ni v}x_e\le 1 \qquad\text{for every }v\in W\cup J.
\end{align*}
Conversely, if $x\in\{0,1\}^E$ satisfies these inequalities, define
\begin{align*}
M_x=\{e\in E:x_e=1\}.
\end{align*}
If two distinct selected edges $e,f\in M_x$ shared a vertex $v$, then both would appear in the sum over edges incident to $v$, and therefore
\begin{align*}
\sum_{g\ni v}x_g\ge x_e+x_f=1+1=2.
\end{align*}
This contradicts the constraint $\sum_{g\ni v}x_g\le 1$, so no two selected edges share a vertex. Hence $M_x$ is a matching.
If each edge $e$ has profit $p_e$, then the linear objective selects exactly the profits of the edges in the matching. Indeed, terms with $e\in M$ contribute $p_e\cdot 1=p_e$, while terms with $e\notin M$ contribute $p_e\cdot 0=0$, so
\begin{align*}
\sum_{e\in E}p_e x_e=\sum_{e\in M}p_e.
\end{align*}
Thus the binary constraints encode the local non-conflict rule at every worker and job, while the linear objective records the total profit of the selected edges.
[/example]
## Feasibility, Local Moves, And Certificates
How can an algorithm know that a candidate solution is not only feasible but optimal? A brute-force check compares it with every feasible solution, but that defeats the purpose when the finite set is exponentially large. The central idea is to use a certificate: a short mathematical object whose validity proves a bound matching the value of the candidate solution.
[definition: Feasible Solution]
A feasible solution is an element of the feasible family $\mathcal F$ or, in an integer formulation, an integer vector satisfying all constraints of the problem.
[/definition]
Feasibility is the first condition an algorithm must maintain, but it is rarely enough to find an optimum by inspection. Many algorithms move through feasible solutions by small changes, so the next concept records the kind of permitted transformation an algorithm may use and why such transformations can become evidence that no improvement remains.
[definition: Local Move]
Let $\mathcal F$ be the feasible family of a finite optimisation problem. A local move is a map
\begin{align*}
m:\mathcal D \to \mathcal F
\end{align*}
defined on a subset $\mathcal D\subseteq \mathcal F$, where $m(F)$ is the feasible solution reached from $F$ by the move.
[/definition]
Local moves are useful when their absence has a global meaning. In applications, $m(F)$ often differs from $F$ only on a small part of the underlying finite structure. For matchings this is the role of augmenting paths; for spanning trees it is the role of exchanges along cycles and cuts. The tree case gives the cleanest first model.
[example: Tree Exchange Move]
Let $T$ be a spanning tree of a connected graph $G=(V,E)$, and let $e=uv\in E\setminus T$. Since $T$ is connected, there is a $u$-$v$ path in $T$:
\begin{align*}
u=v_0,\ v_1,\ \dots,\ v_k=v.
\end{align*}
Adding the edge $e=uv$ to this path gives the cycle
\begin{align*}
v_0v_1,\ v_1v_2,\ \dots,\ v_{k-1}v_k,\ v_kv_0.
\end{align*}
No other cycle is created: any cycle in $T\cup\{e\}$ must use $e$, because $T$ itself has no cycle, and after deleting $e$ from such a cycle the remaining edges form a $u$-$v$ path in $T$; the uniqueness of paths in a tree forces that path to be the one displayed above.
Now let $f$ be an edge on this cycle with $f\ne e$, and define
\begin{align*}
T'=T+e-f=(T\cup\{e\})\setminus\{f\}.
\end{align*}
The graph $T\cup\{e\}$ has exactly one cycle, and removing an edge from that cycle breaks the cycle while leaving every vertex still connected to the rest of the cycle through the complementary route around it. Hence $T'$ is connected and has no cycle. Its number of edges is
\begin{align*}
|T'|=|T|+1-1=|T|.
\end{align*}
Since $T$ is a spanning tree, $T'$ contains all vertices of $G$ and is again a spanning tree. Thus the exchange replaces one edge of $T$ by one nearby edge $e$, preserving feasibility while changing only the edges on the unique cycle.
[/example]
The exchange example explains how to move between feasible solutions, but optimality also needs a way to rule out all better solutions. For a minimisation problem, that ruling-out device is a lower-bound certificate.
[definition: Minimisation Certificate]
For a minimisation problem with feasible set $\mathcal F$ and objective $c$, a minimisation certificate for a number $L$ is data proving that $c(F)\ge L$ for every $F\in \mathcal F$.
[/definition]
For maximisation the obstruction is reversed: a feasible solution with large value is not certified by a lower bound, because other feasible solutions might still be larger. To rule out all improvements, the certificate has to put a universal ceiling on every feasible value.
[definition: Maximisation Certificate]
For a maximisation problem with feasible set $\mathcal F$ and objective $c$, a maximisation certificate for a number $U$ is data proving that $c(F)\le U$ for every $F\in \mathcal F$.
[/definition]
A bound alone and a candidate solution alone each leave a gap: the bound may be unattained, and the candidate may be improvable. This raises the basic certificate question for any optimisation argument: what exact comparison is enough to stop searching? The stopping rule used throughout the page is elementary: a feasible object is optimal once its value agrees with a valid certificate from the opposite side.
For a minimisation problem, if $F_0\in\mathcal F$ is feasible and a certificate proves $c(F)\ge L$ for every feasible $F$, then equality $c(F_0)=L$ proves that $F_0$ is optimal. For a maximisation problem, if $F_0\in\mathcal F$ is feasible and a certificate proves $c(F)\le U$ for every feasible $F$, then equality $c(F_0)=U$ proves that $F_0$ is optimal.
Each hypothesis in the certificate comparison is doing real work. Feasibility is needed because a non-feasible object can match a bound without being an admissible answer; for example, in a shortest-path problem a disconnected list of edges might have the same length as a lower bound but still fail to be a path from $s$ to $t$. Certificate validity is needed because an asserted number $L$ or $U$ is only useful if it bounds every feasible solution in the correct direction; in a minimisation problem, choosing $L=c(F_0)$ for an arbitrary feasible $F_0$ proves nothing unless all other feasible values are at least $L$. Equality is also essential: a feasible minimisation solution with $c(F_0)>L$ may be strictly above the optimum, and a feasible maximisation solution with $c(F_0)<U$ may leave room for a better feasible solution. Thus a certificate argument stops the search only when the candidate, the universal bound, and equality are all present together.
Certificates become powerful when they are much smaller than the set of feasible solutions. A shortest path can be certified by vertex potentials, which turn many path inequalities into edge inequalities.
[example: Path Certificate From Potentials]
Let $G=(V,E)$ be a directed graph with edge lengths $w:E\to \mathbb R$, and let $s,t\in V$. Suppose vertex numbers $(y_v)_{v\in V}$ satisfy $y_s=0$ and
\begin{align*}
y_v-y_u \le w(uv)
\end{align*}
for every directed edge $uv\in E$. We show that $y_t$ is a lower bound on the length of every directed $s$-$t$ path.
Let
\begin{align*}
P=(v_0v_1,\ v_1v_2,\ \dots,\ v_{k-1}v_k)
\end{align*}
be any directed $s$-$t$ path, so $v_0=s$ and $v_k=t$. Applying the edge inequality to the edge $v_{i-1}v_i$ gives
\begin{align*}
y_{v_i}-y_{v_{i-1}}\le w(v_{i-1}v_i)
\end{align*}
for each $i=1,\dots,k$. Adding these $k$ inequalities gives
\begin{align*}
\sum_{i=1}^k (y_{v_i}-y_{v_{i-1}})\le \sum_{i=1}^k w(v_{i-1}v_i).
\end{align*}
The left-hand side telescopes because every internal vertex term appears once with sign $+$ and once with sign $-$:
\begin{align*}
\sum_{i=1}^k (y_{v_i}-y_{v_{i-1}})=y_{v_k}-y_{v_0}.
\end{align*}
Since $v_k=t$, $v_0=s$, and $y_s=0$, this becomes
\begin{align*}
y_{v_k}-y_{v_0}=y_t-y_s=y_t.
\end{align*}
The right-hand side is the path length:
\begin{align*}
\sum_{i=1}^k w(v_{i-1}v_i)=w(P).
\end{align*}
Therefore
\begin{align*}
y_t\le w(P)
\end{align*}
for every directed $s$-$t$ path $P$. Hence $y_t$ is a lower-bound certificate for the shortest path value, and any directed $s$-$t$ path with length exactly $y_t$ attains this lower bound and is shortest.
[/example]
The potential certificate proves a path bound by summing along a path. Graphs also have certificates built from separating a vertex set, so we next name the boundary that every crossing path or connected spanning subgraph must meet.
[definition: Cut In A Graph]
Let $G=(V,E)$ be an undirected graph. For $S\subset V$, the cut determined by $S$ is
\begin{align*}
\delta(S)=\{uv\in E : u\in S,\ v\in V\setminus S\}.
\end{align*}
[/definition]
A cut separates vertices, so any connected subgraph spanning both sides must use an edge of the cut. This gives lower-bound reasoning for connectivity problems and upper-bound reasoning for packing problems.
[example: Cut Certificate For Connectivity]
Let $G=(V,E)$ be connected, and let $S\subset V$ be nonempty with $S\ne V$. We show that every spanning tree $T$ of $G$ contains at least one edge of the cut
\begin{align*}
\delta(S)=\{uv\in E:u\in S,\ v\in V\setminus S\}.
\end{align*}
Let $T$ be a spanning tree of $G$. Since $S$ is nonempty, choose $x\in S$. Since $S\ne V$, choose $y\in V\setminus S$. Because $T$ is spanning, both $x$ and $y$ are vertices of $T$, and because $T$ is connected, there is an $x$-$y$ path in $T$:
\begin{align*}
x=v_0,\ v_1,\ \dots,\ v_k=y,
\end{align*}
with $v_{i-1}v_i\in T$ for each $i=1,\dots,k$.
The path starts in $S$ and ends outside $S$. Let $r$ be the smallest index such that $v_r\in V\setminus S$. Then $r\ge 1$, $v_{r-1}\in S$, and
\begin{align*}
v_{r-1}v_r\in T.
\end{align*}
Since one endpoint of this edge lies in $S$ and the other lies in $V\setminus S$, the definition of the cut gives
\begin{align*}
v_{r-1}v_r\in \delta(S).
\end{align*}
Thus
\begin{align*}
T\cap \delta(S)\ne\varnothing.
\end{align*}
So the cut condition is a certificate that any connected spanning solution must cross the partition between $S$ and $V\setminus S$.
[/example]
## Polynomial Time And Input Size
When should an algorithm count as efficient? Since finite optimisation problems can have exponentially many feasible solutions, efficiency is measured in the size of the input description rather than in the number of feasible solutions. This is why implicit structure matters: an algorithm may inspect a graph with $m$ edges without enumerating all paths, trees, or matchings.
[definition: Polynomial-Time Algorithm]
An algorithm is polynomial-time for a class of finite optimisation problems if there is a polynomial $p$ such that, for every input of length $N$ in the class, the algorithm terminates after at most $p(N)$ elementary operations.
[/definition]
The precise encoding of numbers can matter when weights are large. In this course, the emphasis is on combinatorial algorithms whose running time is polynomial in the natural input size, such as $|V|+|E|$ for graphs together with the bit-length needed to store weights.
[remark: Implicit Versus Explicit Search]
A graph with $n$ vertices may contain exponentially many simple paths or spanning trees. A polynomial-time algorithm for shortest paths or minimum spanning trees succeeds because it navigates the structure of the feasible family directly. The algorithm is not polynomial in the number of feasible solutions; it is polynomial in the input description.
[/remark]
This viewpoint explains why certificates are part of algorithm design. A fast algorithm should output not only a solution but also enough information to verify that no hidden feasible solution is better.
[example: Verifying A Shortest Path Certificate]
Let $P$ be the proposed directed $s$-$t$ path, written as
\begin{align*}
P=(v_0v_1,\ v_1v_2,\ \dots,\ v_{k-1}v_k),
\end{align*}
with $v_0=s$ and $v_k=t$. To verify the potential certificate, first check each directed edge $uv\in E$ and confirm that
\begin{align*}
y_v-y_u\le w(uv).
\end{align*}
There is one inequality for each edge, so this part of the verification inspects $|E|$ inequalities.
Next compute the length of the displayed path by adding its edge lengths:
\begin{align*}
w(P)=w(v_0v_1)+w(v_1v_2)+\cdots+w(v_{k-1}v_k).
\end{align*}
This uses exactly the $k$ edges of $P$. The path agrees with the potential bound precisely when
\begin{align*}
w(v_0v_1)+w(v_1v_2)+\cdots+w(v_{k-1}v_k)=y_t-y_s.
\end{align*}
If the potentials were normalized with $y_s=0$, this equality becomes
\begin{align*}
w(P)=y_t-y_s=y_t-0=y_t.
\end{align*}
The edge checks prove that every directed $s$-$t$ path has length at least $y_t-y_s$, while the displayed path has exactly that length. Therefore the proposed path and the potentials together verify optimality by inspecting the edges of $G$ and the edges of $P$, without enumerating the other feasible paths.
[/example]
## Greedy, Augmenting, And Primal-Dual Paradigms
Which proof patterns turn finite structure into algorithms? Three paradigms recur throughout the course. Greedy methods build a solution by irreversible local choices, augmenting methods repeatedly improve a feasible solution, and primal-dual methods build a solution and certificate together.
[definition: Greedy Algorithm]
A greedy algorithm constructs a feasible solution by repeatedly making a locally preferred choice according to a specified rule, without revising earlier choices.
[/definition]
A greedy algorithm needs a proof that local choices can be made compatible with some optimum. The standard proof is an exchange argument: compare the greedy partial solution with an optimal solution and replace part of the optimum without worsening its value.
[quotetheorem:5774]
[citeproof:5774]
The theorem does not say that every locally preferred choice is valid. Its force lies entirely in the exchange hypothesis: without it, a greedy prefix may exclude all optimal solutions, as happens in many knapsack instances where taking the best single value-to-weight item can block the best pair of items. The completion hypothesis is separate because being contained in an optimum is not enough unless the final prefix already determines a feasible answer of the right size or form. Concrete problems supply these missing facts from their own structure: spanning trees use cycle and cut exchange, matroids use basis exchange, and interval scheduling uses an ordering argument showing that the earliest finishing compatible interval can replace the first interval of some optimum.
[example: Greedy Choice In Minimum Spanning Tree]
Let $G=(V,E)$ be a connected graph with edge weights $w:E\to\mathbb R$. In Kruskal's algorithm, edges are considered in nondecreasing weight order, and an edge is added exactly when it does not create a cycle with the edges already chosen. Suppose the algorithm has already chosen a forest $F$, and let $e=uv$ be the next edge it adds. Since $e$ does not create a cycle with $F$, the vertices $u$ and $v$ lie in different connected components of $F$. Let $S$ be the vertex set of the component of $F$ containing $u$. Then $u\in S$, $v\notin S$, and therefore
\begin{align*}
e\in \delta(S).
\end{align*}
We show the exchange step: the greedy edge $e$ can be included in some minimum spanning tree extending the current forest. Let $T$ be a minimum spanning tree containing all edges of $F$. If $e\in T$, there is nothing to change. Assume $e\notin T$. Since $T$ is connected, it contains a $u$-$v$ path. This path starts in $S$ and ends in $V\setminus S$, so it contains at least one edge crossing the cut $\delta(S)$. Choose such an edge and call it $f$. Then
\begin{align*}
f\in T\cap \delta(S).
\end{align*}
The edge $f$ joins a vertex of $S$ to a vertex outside $S$, so adding $f$ to the current forest $F$ would also connect two different components of $F$ and would not create a cycle. Thus $f$ was eligible when Kruskal chose $e$. Since Kruskal considers eligible edges in nondecreasing weight order, we have
\begin{align*}
w(e)\le w(f).
\end{align*}
Now define
\begin{align*}
T'=(T\cup\{e\})\setminus\{f\}.
\end{align*}
Adding $e$ to the tree $T$ creates the unique cycle consisting of $e$ together with the unique $u$-$v$ path in $T$. The edge $f$ lies on that path, so removing $f$ breaks that cycle. The graph $T'$ still contains every vertex, is connected, and has no cycle, hence $T'$ is a spanning tree.
Its weight is obtained by removing the contribution of $f$ from $T$ and adding the contribution of $e$:
\begin{align*}
w(T')=\sum_{g\in T'}w(g).
\end{align*}
Since $T'=(T\setminus\{f\})\cup\{e\}$ and $e\notin T$, this becomes
\begin{align*}
w(T')=\sum_{g\in T\setminus\{f\}}w(g)+w(e).
\end{align*}
The sum over $T\setminus\{f\}$ is the sum over $T$ with $w(f)$ removed:
\begin{align*}
\sum_{g\in T\setminus\{f\}}w(g)=\sum_{g\in T}w(g)-w(f).
\end{align*}
Therefore
\begin{align*}
w(T')=w(T)-w(f)+w(e).
\end{align*}
Using $w(e)\le w(f)$ gives
\begin{align*}
w(T')\le w(T).
\end{align*}
Because $T$ was a minimum spanning tree and $T'$ is also a spanning tree, $T'$ must be minimum as well. Thus the crossing edge $f$ can be exchanged for the greedy edge $e$ without increasing total weight, which is the local exchange behind Kruskal's correctness proof.
[/example]
The greedy example builds a solution by accepting choices permanently. Some problems are better approached by starting with any feasible solution and repeatedly improving it, which leads to augmenting methods.
[definition: Augmenting Method]
An augmenting method is an algorithmic paradigm in which a feasible solution is repeatedly replaced by a better feasible solution through a structured local or global move called an augmentation.
[/definition]
For matchings, the augmentation is an alternating path that increases the size of the matching by one. For flows, the augmentation is an $s$-$t$ path in the residual network. In both cases, the certificate of optimality is a barrier object: a vertex cover or a cut.
[example: Augmenting A Matching]
Let $M$ be a matching in a graph, and let an $M$-augmenting path be written in vertex order as
\begin{align*}
P:\quad v_0,\ v_1,\ \dots,\ v_{2r+1},
\end{align*}
where $v_0$ and $v_{2r+1}$ are unmatched by $M$, and the edges alternate so that
\begin{align*}
v_0v_1,\ v_2v_3,\ \dots,\ v_{2r}v_{2r+1}\notin M
\end{align*}
and
\begin{align*}
v_1v_2,\ v_3v_4,\ \dots,\ v_{2r-1}v_{2r}\in M.
\end{align*}
Define the new edge set
\begin{align*}
M'=\bigl(M\setminus \{v_1v_2,\ v_3v_4,\ \dots,\ v_{2r-1}v_{2r}\}\bigr)
\cup \{v_0v_1,\ v_2v_3,\ \dots,\ v_{2r}v_{2r+1}\}.
\end{align*}
The path contains $r$ matching edges and $r+1$ non-matching edges selected for insertion. Hence
\begin{align*}
|M'|=|M|-r+(r+1)=|M|+1.
\end{align*}
It remains to check that $M'$ is still a matching. Outside the path, no edge of $M$ is changed, so no two unchanged edges share a vertex because $M$ was a matching. Along the path, the new selected edges are
\begin{align*}
v_0v_1,\ v_2v_3,\ \dots,\ v_{2r}v_{2r+1},
\end{align*}
and these are pairwise vertex-disjoint because they use the disjoint vertex pairs
\begin{align*}
\{v_0,v_1\},\ \{v_2,v_3\},\ \dots,\ \{v_{2r},v_{2r+1}\}.
\end{align*}
The endpoints $v_0$ and $v_{2r+1}$ were unmatched in $M$, and every internal path vertex $v_i$ with $1\le i\le 2r$ was incident in $M$ to the deleted matching edge adjacent to it on $P$. Therefore no vertex of an inserted edge is also incident to an unchanged edge of $M$.
Thus $M'$ is a matching and has one more edge than $M$. An augmenting path is therefore an explicit improvement step, so the existence of such a path proves that the current matching is not maximum.
[/example]
Augmenting methods improve the primal object until no improvement remains, but they still need a reason that no hidden better solution exists. One way to organise that reason is to maintain the candidate solution together with opposite-side bounding data, so progress on the object and progress on the certificate are coupled.
[definition: Primal-Dual Paradigm]
A primal-dual algorithm maintains a candidate feasible solution for an optimisation problem together with dual data that provide an opposite bound on the optimal value.
[/definition]
Keeping both objects feasible is not by itself enough: the candidate value may still fall short of the certificate bound, leaving room for improvement.
The useful stopping test for a primal-dual method is therefore a numerical one. We need a precise principle saying that once the primal candidate and the dual bound have the same value, feasibility on both sides turns that equality into optimality.
[quotetheorem:5775]
[citeproof:5775]
This stopping principle is the conceptual bridge to the rest of the course, but it also points to a specific difficulty in primal-dual algorithm design. Maintaining a feasible primal object and a feasible dual certificate is not enough unless their objective values meet; approximate equality gives an approximation guarantee rather than exact optimality. In linear-programming language, the missing equality is usually enforced through [complementary slackness](/theorems/2559), and those conditions do not appear automatically from feasibility alone. Bipartite matching will produce augmenting paths and vertex-cover certificates; flows will produce path augmentations and cut certificates; polyhedral methods will explain these certificates as dual feasible solutions.
[example: Dual Bound In Assignment]
In a weighted perfect assignment problem, let $W$ be the worker set, let $J$ be the job set, assume $|W|=|J|$, and let $p_{wj}$ be the profit of assigning worker $w$ to job $j$ along an allowed edge $wj$. Suppose numbers $a_w$ for workers and $b_j$ for jobs satisfy
\begin{align*}
a_w+b_j \ge p_{wj}
\end{align*}
for every allowed edge $wj$. We show that
\begin{align*}
\sum_{w\in W}a_w+\sum_{j\in J}b_j
\end{align*}
is an upper bound on the profit of every perfect matching.
Let $M$ be any perfect matching. For each edge $wj\in M$, the assumed inequality gives
\begin{align*}
p_{wj}\le a_w+b_j.
\end{align*}
Adding these inequalities over all edges of $M$ gives
\begin{align*}
\sum_{wj\in M}p_{wj}\le \sum_{wj\in M}(a_w+b_j).
\end{align*}
By splitting the sum over each edge into its worker part and its job part, we have
\begin{align*}
\sum_{wj\in M}(a_w+b_j)=\sum_{wj\in M}a_w+\sum_{wj\in M}b_j.
\end{align*}
Because $M$ is perfect, every worker $w\in W$ is incident to exactly one edge of $M$, so each term $a_w$ appears exactly once in the sum $\sum_{wj\in M}a_w$. Hence
\begin{align*}
\sum_{wj\in M}a_w=\sum_{w\in W}a_w.
\end{align*}
Similarly, every job $j\in J$ is incident to exactly one edge of $M$, so each term $b_j$ appears exactly once in the sum $\sum_{wj\in M}b_j$, and therefore
\begin{align*}
\sum_{wj\in M}b_j=\sum_{j\in J}b_j.
\end{align*}
Substituting these identities gives
\begin{align*}
\sum_{wj\in M}p_{wj}\le \sum_{w\in W}a_w+\sum_{j\in J}b_j.
\end{align*}
Thus the numbers $a_w$ and $b_j$ give an upper-bound certificate for every perfect assignment. If some perfect matching $M^\ast$ satisfies
\begin{align*}
\sum_{wj\in M^\ast}p_{wj}=\sum_{w\in W}a_w+\sum_{j\in J}b_j,
\end{align*}
then no perfect matching can have larger profit than $M^\ast$, so $M^\ast$ is an optimal assignment. This is the first glimpse of the dual variables used in the Hungarian algorithm.
[/example]
The first chapter has now set up the language of feasible solutions, objective values, and certificates. The next step is to apply that language to bipartite graphs, where augmenting paths provide the first clean optimisation mechanism.
# 2. Matchings in Bipartite Graphs
This chapter studies the first major augmenting method of the course: maximum matching in bipartite graphs. Building on Chapter 1's language of feasible solutions, local moves, and certificates, it uses only finite graphs, paths, cycles, subgraphs, and basic set operations. The guiding question is how to recognise that a current matching is optimal, and if it is not optimal, how to improve it by a local operation. Bipartite matchings are the model case where an algorithm, a structural theorem, and a dual certificate all describe the same phenomenon.
## Alternating Structure Around a Matching
Given a graph and a partial assignment of vertices to partners, what information is contained in the edges that are not yet used? The useful viewpoint is not to inspect unused edges in isolation, but to compare them with the matched edges and look for paths that alternate between the two types.
[definition: Bipartite Graph]
A bipartite graph is a graph $G=(X\cup Y,E)$ whose vertex set is the disjoint union of two sets $X$ and $Y$, and every edge in $E$ has one endpoint in $X$ and one endpoint in $Y$.
[/definition]
The two parts $X$ and $Y$ represent two types of objects, such as workers and jobs. This separation is what makes matching duality especially clean, because every edge crosses from one side to the other.
[example: Complete Bipartite Graphs]
Let $G=K_{m,n}$ have bipartition $X\cup Y$, with $|X|=m$ and $|Y|=n$. We compute the largest possible size of a matching in $G$.
If $M$ is any matching, then each edge of $M$ has one endpoint in $X$ and one endpoint in $Y$. Because no two edges of a matching share an endpoint, the map sending each edge of $M$ to its endpoint in $X$ is injective, so $|M|\le |X|=m$. The same argument using endpoints in $Y$ gives $|M|\le |Y|=n$. Hence
\begin{align*}
|M|\le m
\quad\text{and}\quad
|M|\le n,
\end{align*}
so
\begin{align*}
|M|\le \min\{m,n\}.
\end{align*}
Let $r=\min\{m,n\}$. Choose distinct vertices $x_1,\dots,x_r$ in $X$ and distinct vertices $y_1,\dots,y_r$ in $Y$. Since $G=K_{m,n}$ is complete bipartite, each edge $x_i y_i$ belongs to $E$ for $1\le i\le r$. The set
\begin{align*}
M=\{x_1y_1,\dots,x_ry_r\}
\end{align*}
is a matching: if $i\ne j$, then $x_i\ne x_j$ and $y_i\ne y_j$, so the edges $x_iy_i$ and $x_jy_j$ share no endpoint. Therefore $|M|=r=\min\{m,n\}$. Since every matching has size at most $\min\{m,n\}$ and this matching has exactly that size, the maximum matching size in $K_{m,n}$ is $\min\{m,n\}$.
[/example]
The example shows that the obstruction to matching more vertices is often a shortage of available neighbours. Before making this obstruction precise, we need the language for a current partial solution.
[definition: Matching]
A matching in a graph $G=(V,E)$ is a set $M\subset E$ such that no two edges of $M$ share an endpoint.
[/definition]
A vertex incident with an edge of $M$ is called matched by $M$; a vertex not incident with any edge of $M$ is called exposed by $M$. The optimisation problem is to maximise $|M|$ over all matchings $M$ in the graph.
[definition: Vertex Cover]
A vertex cover in a graph $G=(V,E)$ is a set $C\subset V$ such that every edge of $G$ has at least one endpoint in $C$.
[/definition]
Every matching gives a lower bound on every vertex cover: each matched edge needs a distinct covering vertex. This is the first certificate pattern in the chapter, and the bipartite case will later show that the best such lower bound is always tight.
[remark: Matching-Cover Certificate]
If $M$ is a matching and $C$ is a vertex cover in a graph $G$, then $|M|\le |C|$. Indeed, every edge of $M$ must be covered by at least one vertex of $C$, and different edges of $M$ cannot be covered by the same endpoint of the matching.
[/remark]
The matching hypothesis is essential here: in the star with centre $v$ and three leaves, the three incident edges are not a matching, and the cover $\{v\}$ covers all three of them. The vertex-cover hypothesis is equally specific: if $G$ has edges $ab$ and $cd$, the matching $\{ab,cd\}$ has size $2$, while the set $\{a\}$ meets one matched edge but is not a vertex cover. The matching-cover certificate gives only a lower bound on the size of vertex covers; it does not say that a given matching and cover have equal size, since a path of length two has a matching of size $1$ and the cover consisting of both endpoints of the path has size $2$. The main achievement of the bipartite theory later in the chapter is to identify when the lower bound can be made sharp and how to find the certificate.
[Weak duality](/theorems/2549) gives a way to prove optimality: if we find a matching $M$ and a vertex cover $C$ with $|M|=|C|$, then both are optimal. To find larger matchings when equality has not been certified, we need paths along which the current matching can be switched without breaking the matching condition.
[definition: Alternating Path]
Let $M$ be a matching in a graph $G$. An alternating path with respect to $M$ is a path whose edges alternate between edges in $M$ and edges in $E\setminus M$.
[/definition]
Alternation is designed so that switching the status of all edges on the path preserves the matching condition when the endpoint pattern is favourable. The improving case is the one where the path begins and ends at vertices currently missed by the matching, which motivates the following definition.
[definition: Augmenting Path]
Let $M$ be a matching in a graph $G$. An augmenting path with respect to $M$ is an alternating path whose first and last vertices are exposed by $M$.
[/definition]
Along an augmenting path, the unmatched edges outnumber the matched edges by one, so switching along the path should increase the matching size. To state that switch compactly, we use the set operation that toggles membership in the path edge set.
[definition: Symmetric Difference]
Let $U$ be a set. The symmetric difference operator on subsets of $U$ is the map $\triangle:\mathcal{P}(U)\times\mathcal{P}(U)\to \mathcal{P}(U)$ defined by
\begin{align*}
A\triangle B=(A\setminus B)\cup(B\setminus A)
\end{align*}
for all $A,B\subset U$.
[/definition]
Symmetric difference is the algebraic notation for switching. If $P$ is an augmenting path, then $M\triangle E(P)$ is the matching obtained by deleting the matched edges of $P$ and adding the unmatched edges of $P$.
[example: Repeated Augmentation]
Consider the bipartite graph with $X=\{a,b,c\}$, $Y=\{1,2,3\}$, and edge set
\begin{align*}
E=\{a1,a2,b2,b3,c3\}.
\end{align*}
Start with
\begin{align*}
M=\{a1,b2\}.
\end{align*}
The edges $a1$ and $b2$ share no endpoint, so $M$ is a matching. The vertices incident with edges of $M$ are $a,1,b,2$, hence the exposed vertices are $c$ and $3$.
The one-edge path $c3$ is alternating with respect to $M$, because its only edge satisfies $c3\in E\setminus M$. Its two endpoints $c$ and $3$ are exposed, so $c3$ is an augmenting path. Switching along this path means taking the symmetric difference with its edge set. Since $c3\notin M$, we have $M\setminus\{c3\}=M=\{a1,b2\}$ and $\{c3\}\setminus M=\{c3\}$. Therefore
\begin{align*}
M\triangle \{c3\}=(M\setminus \{c3\})\cup(\{c3\}\setminus M).
\end{align*}
Substituting the two set differences gives
\begin{align*}
M\triangle \{c3\}=\{a1,b2\}\cup\{c3\}.
\end{align*}
Thus
\begin{align*}
M\triangle \{c3\}=\{a1,b2,c3\}.
\end{align*}
The three edges $a1,b2,c3$ have pairwise distinct endpoints in $X$ and pairwise distinct endpoints in $Y$, so the new set is a matching of size $3$.
No matching in this graph can have more than $3$ edges, because every matched edge uses a distinct vertex of $X$ and $|X|=3$. Thus the augmented matching $\{a1,b2,c3\}$ is maximum, and it matches every vertex of $X$.
[/example]
This small example ends with a perfect matching on the left side. In general, the absence of an augmenting path is the exact condition for maximum size.
## Augmenting Paths and the Hungarian Search Tree
If a matching is not maximum, where should an improvement be found? Berge's theorem answers this by comparing the current matching to a larger one and examining the components of their symmetric difference.
[quotetheorem:5776]
[citeproof:5776]
The exposed endpoints are what make the switch improve the matching rather than merely rearrange it: starting and ending outside $M$ creates one more added edge than deleted edge. Finiteness has two roles here. First, the comparison subgraph $M\triangle N$ decomposes into finite alternating paths and cycles, so the counting argument can locate a component whose imbalance is visible in ordinary integer counts. Second, a strictly increasing sequence of augmentations cannot continue forever, so the theorem supports an algorithmic termination argument. Without finiteness, the same local operation need not produce a terminating algorithm. For example, on the one-way infinite path $v_0v_1v_2\cdots$, take $M=\{v_1v_2,v_3v_4,\dots\}$. The shifted matching $\{v_0v_1,v_2v_3,\dots\}$ improves the exposed status of $v_0$ by switching along an infinite alternating ray, but no finite augmenting path performs this switch. Thus finite augmenting paths no longer capture every useful infinite rearrangement, and a procedure that searches only for finite improvements may not reach a final optimum after finitely many steps. By itself, however, Berge's theorem does not specify an efficient way to find the path; it only says that such a path exists whenever the current matching is not maximum. Bipartiteness becomes useful because it imposes a fixed left-right parity on alternating paths, allowing the search to be organised as a tree or forest.
Berge's theorem turns optimality into a search problem. In bipartite graphs, this search can be organised from the exposed vertices on one side by growing an alternating tree.
[definition: Hungarian Search Tree]
Let $G=(X\cup Y,E)$ be a bipartite graph and let $M$ be a matching. A Hungarian search tree rooted at an exposed vertex $x_0\in X$ is a tree whose paths from $x_0$ are alternating paths, using unmatched edges from $X$ to $Y$ and matched edges from $Y$ to $X$.
[/definition]
The tree explores all vertices reachable from exposed vertices in $X$ by alternating paths of the prescribed parity. The next question is what the completed search certifies: either it reaches an exposed vertex in $Y$ and gives an augmentation, or its failure to do so proves that no such augmentation starts on the left.
[quotetheorem:5777]
[citeproof:5777]
The statement is deliberately one-sided: it starts from exposed vertices in $X$, so its failed search only rules out augmenting paths whose first exposed endpoint lies on the left. That one-sided hypothesis is the part needed when the goal is to match all of $X$, because any improvement for that goal must begin at an exposed left vertex. For maximum cardinality matching, the relevant additional condition is that the search is run from all currently exposed left vertices in the chosen orientation; otherwise a right-starting augmenting path could be missed. The clean rule "unmatched from $X$ to $Y$, matched from $Y$ to $X$" depends on bipartiteness. For example, in the triangle with matching consisting of one edge, the remaining vertex is exposed, but there is no fixed left-right parity that tells the search which side an unmatched edge must enter next; the odd cycle destroys the two-colour orientation used by the Hungarian tree. Thus the theorem is an invariant for this particular bipartite search procedure, not a standalone description of every possible matching algorithm.
This is the algorithmic form of Berge's theorem for bipartite graphs: repeatedly search, augment if possible, and stop when the search cannot find an exposed vertex on the other side. Each augmentation increases the matching size by one, so the finite process terminates.
[example: Constructing a Maximum Matching by Search]
Let $X=\{a,b,c,d\}$ and $Y=\{1,2,3,4\}$, with
\begin{align*}
E=\{a1,a2,b1,b3,c2,c4,d3\}.
\end{align*}
Begin with
\begin{align*}
M=\{a1,b3,c4\}.
\end{align*}
The three edges in $M$ have distinct endpoints in $X$ and distinct endpoints in $Y$, so $M$ is a matching. The vertices matched by $M$ are $a,b,c$ in $X$ and $1,3,4$ in $Y$, hence the exposed vertices are $d$ and $2$.
Starting from the exposed vertex $d$, the edge $d3$ is not in $M$, so the search may follow it from $d$ to $3$. Since $3$ is matched by the edge $b3\in M$, the alternating search then follows $b3$ from $3$ to $b$. From $b$, the edge $b1$ is not in $M$, so the search follows $b1$ from $b$ to $1$. Since $1$ is matched by the edge $a1\in M$, the search follows $a1$ from $1$ to $a$. Finally, the edge $a2$ is not in $M$, so the search follows $a2$ from $a$ to $2$. Thus the path
\begin{align*}
P=d3,b3,b1,a1,a2
\end{align*}
has edge set
\begin{align*}
E(P)=\{d3,b3,b1,a1,a2\}.
\end{align*}
Its first, third, and fifth edges are outside $M$:
\begin{align*}
d3\notin M,\quad b1\notin M,\quad a2\notin M.
\end{align*}
Its second and fourth edges are in $M$:
\begin{align*}
b3\in M,\quad a1\in M.
\end{align*}
Therefore the edges of $P$ alternate between edges outside $M$ and edges in $M$. Since the endpoints $d$ and $2$ are exposed, $P$ is an augmenting path.
Switching along $P$ means replacing $M$ by $M\triangle E(P)$. The matched edges on $P$ are
\begin{align*}
M\cap E(P)=\{a1,b3\}.
\end{align*}
The unmatched edges on $P$ are
\begin{align*}
E(P)\setminus M=\{d3,b1,a2\}.
\end{align*}
Removing from $M$ the edges of $P$ leaves only $c4$:
\begin{align*}
M\setminus E(P)=\{c4\}.
\end{align*}
By the definition of symmetric difference,
\begin{align*}
M\triangle E(P)=(M\setminus E(P))\cup(E(P)\setminus M).
\end{align*}
Substituting the two set differences gives
\begin{align*}
M\triangle E(P)=\{c4\}\cup\{d3,b1,a2\}.
\end{align*}
Thus
\begin{align*}
M\triangle E(P)=\{d3,b1,c4,a2\}.
\end{align*}
The four edges $d3,b1,c4,a2$ have distinct endpoints in $X$ and distinct endpoints in $Y$, so the new set is a matching. It has size $4$, and no matching can have more than $4$ edges because each matched edge uses a distinct vertex of $X$ and $|X|=4$. Hence $\{d3,b1,c4,a2\}$ is a maximum matching, and it matches every vertex of $X$.
[/example]
The search tree does more than find paths. When no augmenting path exists, the reached and unreached vertices encode a minimum vertex cover.
[illustration:co-alternating-search-cover]
## Hall's Condition and Vertex-Cover Duality
What stops a bipartite graph from matching every vertex on the left? The only possible obstruction is that some set of left vertices has too few neighbours, and Hall's theorem says that this local counting obstruction is also sufficient.
[definition: Neighbourhood]
Let $G=(X\cup Y,E)$ be a bipartite graph. The neighbourhood operator from the left side to the right side is the map $N:\mathcal{P}(X)\to \mathcal{P}(Y)$ defined by
\begin{align*}
N(S)=\{y\in Y: xy\in E \text{ for some } x\in S\}
\end{align*}
for all $S\subset X$.
[/definition]
If a matching covers every vertex of $X$, then the vertices of $S$ must be matched to distinct vertices of $N(S)$. The central question is whether these inequalities for all subsets $S\subset X$ are enough to build a matching covering $X$.
[quotetheorem:2018]
[citeproof:2018]
The condition must be checked over all subsets, not only single vertices. For instance, if $X=\{a,b\}$ and $Y=\{1\}$ with edges $a1$ and $b1$, then each individual left vertex has a neighbour, but $N(\{a,b\})=\{1\}$ has size $1<2$, so no matching can cover $X$. The finite hypothesis explains why the course proof can use a maximum matching and a terminating search forest. It also prevents a common misreading of Hall's condition: checking only finite subsets is not the same as checking all subsets when $X$ is infinite. For example, take an uncountable set $X$, a countably infinite set $Y$, and all possible edges between them. Every finite $S\subset X$ has $|N(S)|=|Y|\ge |S|$, but $N(X)=Y$ has smaller cardinality than $X$, so no matching can cover $X$. Hall's theorem does not count the number of possible matchings, nor does it choose one canonically; it gives the exact feasibility criterion. The augmenting-path proof is important because it turns a failed attempt at matching into the deficient set $S$ that explains the failure.
Hall's theorem is often used as an existence theorem rather than an algorithm. It gives a finite checklist over subsets, while the augmenting-path proof explains why failure of the checklist produces an explicit obstruction.
[example: Unit-Time Jobs with Deadlines]
Let the right side be the set of time slots $\{1,\dots,D\}$, where $D=\max_{j\in J} d_j$, and join job $j$ to exactly the slots $1,\dots,d_j$. A schedule assigns to each job $j$ a slot $\sigma(j)$ with $1\le \sigma(j)\le d_j$, and no two jobs may receive the same slot. From such a schedule, form the edge set
\begin{align*}
M_\sigma=\{j\sigma(j):j\in J\}.
\end{align*}
There is exactly one edge of $M_\sigma$ incident with each job $j$, and if $j\ne k$, then $\sigma(j)\ne \sigma(k)$, so the edges $j\sigma(j)$ and $k\sigma(k)$ do not share a time-slot endpoint. Hence $M_\sigma$ is a matching covering $J$.
Conversely, suppose $M$ is a matching covering $J$. For each job $j\in J$, there is exactly one edge of $M$ incident with $j$, say $jt_j$. Since the graph contains edges from $j$ only to slots at most $d_j$, we have
\begin{align*}
1\le t_j\le d_j.
\end{align*}
If $j\ne k$, then $t_j\ne t_k$, because otherwise the two matched edges $jt_j$ and $kt_k$ would share the same time-slot endpoint. Thus assigning job $j$ to slot $t_j$ gives a feasible schedule. Therefore feasible schedules are exactly matchings covering $J$.
By *Hall Marriage Theorem*, such a matching exists exactly when
\begin{align*}
|N(S)|\ge |S|
\end{align*}
for every set $S\subset J$. In this graph, the neighbours of $S$ are precisely the slots that meet at least one deadline in $S$:
\begin{align*}
N(S)=\{t: jt\in E \text{ for some } j\in S\}.
\end{align*}
Since $jt\in E$ means $1\le t\le d_j$, this is the same set as
\begin{align*}
N(S)=\{t: 1\le t\le d_j \text{ for some } j\in S\}.
\end{align*}
Thus Hall's inequality says that every collection $S$ of jobs must collectively have at least $|S|$ available slots. If some $S$ has fewer than $|S|$ available slots, then the jobs in $S$ cannot be assigned distinct slots before their deadlines; if no such $S$ exists, Hall's theorem supplies a matching covering $J$, hence a feasible schedule.
[/example]
The scheduling example translates feasibility into Hall's subset inequalities. For maximum cardinality matching, the corresponding certificate is not only a failed Hall set but a vertex cover whose size matches the current matching; this motivates the following min-max theorem.
[quotetheorem:5778]
[citeproof:5778]
Bipartiteness is essential. In a triangle, the maximum matching has size $1$, while every vertex cover has size at least $2$, so the equality in Konig's theorem fails outside the bipartite setting. The theorem is also a min-max certificate rather than, by itself, a complete implementation: the alternating search supplies the constructive route to the matching and the cover. This certificate viewpoint is the combinatorial shadow of linear programming duality, and it also aligns with the max-flow/min-cut perspective developed later in many courses.
Konig's theorem is the certificate theorem for bipartite matching: a maximum matching can be checked by exhibiting a cover of the same size. This explains why the Hungarian search tree is both an algorithmic object and a duality object.
[example: A Minimum Cover from a Search Forest]
Let $G$ have $X=\{a,b,c\}$, $Y=\{1,2,3\}$, and
\begin{align*}
E=\{a1,a2,b2,c2\}.
\end{align*}
Take
\begin{align*}
M=\{a1,b2\}.
\end{align*}
The edges $a1$ and $b2$ share no endpoint, so $M$ is a matching. The vertices matched by $M$ are $a,b,1,2$, hence the exposed vertices are $c$ in $X$ and $3$ in $Y$.
Start the alternating search from the exposed left vertex $c$. The edge $c2$ is not in $M$, so the search follows the unmatched edge $c2$ from $c$ to $2$. The vertex $2$ is matched by the edge $b2\in M$, so the search then follows the matched edge $b2$ from $2$ to $b$. From $b$, the only incident edge is $b2$, and this edge has already been used as the matched return edge, so there is no unmatched edge from $b$ to a new vertex of $Y$. Thus the search stops with reached sets
\begin{align*}
S=\{c,b\}\subset X
\end{align*}
and
\begin{align*}
T=\{2\}\subset Y.
\end{align*}
The cover extracted from the search forest is
\begin{align*}
C=(X\setminus S)\cup T.
\end{align*}
Since
\begin{align*}
X\setminus S=\{a,b,c\}\setminus\{c,b\}=\{a\},
\end{align*}
we get
\begin{align*}
C=\{a\}\cup\{2\}=\{a,2\}.
\end{align*}
This set covers every edge of $G$: the edges $a1$ and $a2$ are incident with $a$, while the edges $b2$ and $c2$ are incident with $2$.
Finally,
\begin{align*}
|C|=|\{a,2\}|=2
\end{align*}
and
\begin{align*}
|M|=|\{a1,b2\}|=2.
\end{align*}
No vertex cover can have size less than $2$, because the two matched edges $a1$ and $b2$ are disjoint, so one vertex cannot cover both of them. Therefore $C=\{a,2\}$ is a minimum vertex cover, and its size is certified by the matching $M$ of the same size.
[/example]
The chapter therefore ends with three equivalent ways to understand bipartite matching. An augmenting path is a local certificate that the matching is not optimal; Hall's condition is the subset-counting certificate for matching all vertices on one side; and Konig's theorem gives the exact dual certificate for maximum cardinality matching. Chapter 3 keeps the same bipartite matching framework but adds costs, replacing vertex covers by row and column potentials as the dual certificate.
# 3. Weighted Bipartite Matching and the Assignment Problem
This chapter turns bipartite matching into an optimisation problem with numerical costs. In Chapter 2, augmenting paths certified whether a matching could be enlarged and vertex covers certified maximum cardinality; now we ask which perfect matching is cheapest, and what kind of dual object certifies that no cheaper matching has been missed. The answer is the assignment problem, where potentials on the two sides of a bipartite graph play the same role as prices, and the Hungarian algorithm adjusts these prices until an optimal perfect matching appears in the equality graph.
## Perfect Matchings, Cost Matrices, and Potentials
The basic question is how to assign $n$ workers to $n$ jobs so that every worker receives one job, every job is filled once, and the total cost is minimal. The unweighted matching language records feasibility, but an optimisation algorithm must also compare feasible perfect matchings numerically.
[definition: Assignment Instance]
Let $n \in \mathbb N$. An assignment instance is a cost matrix $C = (c_{ij}) \in \mathbb R^{n \times n}$, where $c_{ij}$ is the cost of assigning row vertex $i$ to column vertex $j$.
[/definition]
The matrix is a compact way of writing a complete bipartite graph $K_{n,n}$ with edge weights. Since a feasible assignment uses exactly one edge incident with each row vertex and each column vertex, the next definition records the numerical objective attached to the corresponding permutation.
[definition: Assignment Cost]
Let $S_n$ denote the set of permutations of $\{1,\dots,n\}$. For an assignment instance $C\in\mathbb R^{n\times n}$, the assignment cost map sends a permutation $\pi\in S_n$ to
\begin{align*}
\operatorname{cost}_C(\pi)=\sum_{i=1}^{n} c_{i,\pi(i)}.
\end{align*}
[/definition]
The assignment problem is to minimise $\operatorname{cost}_C(\pi)$ over all $\pi \in S_n$.
This formulation already suggests a finite search problem, but $n!$ possibilities are too many for an algorithmic theory. To bring linear programming and dual certificates into the picture, we encode each assignment by its incidence matrix.
[definition: Permutation Matrix]
A permutation matrix is a matrix $X=(x_{ij}) \in \{0,1\}^{n\times n}$ such that each row sum is $1$ and each column sum is $1$.
[/definition]
The entry $x_{ij}=1$ means that row $i$ is matched to column $j$. The linear-programming relaxation is obtained by taking the natural convex region defined by the same row and column equations together with non-negativity, so we next name that feasible region.
[definition: Assignment Polytope]
The assignment polytope is
\begin{align*}
\mathcal A_n = \{X\in \mathbb R^{n\times n}: x_{ij}\ge 0,\ \sum_{j=1}^{n}x_{ij}=1\text{ for all }i,\ \sum_{i=1}^{n}x_{ij}=1\text{ for all }j\}.
\end{align*}
[/definition]
The polytope allows fractional matrices, so at first sight it looks like a weaker problem: a minimum of the relaxed linear problem might have been attained at a matrix whose entries split a row between several columns. The central polyhedral fact says that this feared failure does not occur for linear objectives, because the corners of the feasible region are exactly the original integral assignments.
[remark: Assignment Polytope Integrality]
The Birkhoff-von Neumann theorem says that the vertices of $\mathcal A_n$ are exactly the permutation matrices. Equivalently, every doubly stochastic matrix is a convex combination of permutation matrices.
[/remark]
Here the result is used as background for assignment certificates rather than as a theorem to prove inside this chapter. It relies on the full row and column equations together with non-negativity; if the row equations alone were imposed, the matrix with every row equal to $(1/2,1/2,0,\dots,0)$ would be feasible but would not encode a matching. It also does not say that every doubly stochastic matrix is itself an assignment, only that linear optimisation over the polytope can be certified at a permutation vertex. This is the bridge from finite matching language to the linear-programming duality used next.
[example: Three By Three Assignment Instance]
Consider the cost matrix whose first row is $(4,1,3)$, whose second row is $(2,0,5)$, and whose third row is $(3,2,2)$. For the assignment $\pi=(2,1,3)$, row $1$ is assigned to column $2$, row $2$ to column $1$, and row $3$ to column $3$, so
\begin{align*}
\operatorname{cost}_C(\pi)=c_{1,2}+c_{2,1}+c_{3,3}=1+2+2=5.
\end{align*}
To check all assignments in $S_3$, list the six possible permutations and their costs:
\begin{align*}
\operatorname{cost}_C(1,2,3)=c_{1,1}+c_{2,2}+c_{3,3}=4+0+2=6.
\end{align*}
\begin{align*}
\operatorname{cost}_C(1,3,2)=c_{1,1}+c_{2,3}+c_{3,2}=4+5+2=11.
\end{align*}
\begin{align*}
\operatorname{cost}_C(2,1,3)=c_{1,2}+c_{2,1}+c_{3,3}=1+2+2=5.
\end{align*}
\begin{align*}
\operatorname{cost}_C(2,3,1)=c_{1,2}+c_{2,3}+c_{3,1}=1+5+3=9.
\end{align*}
\begin{align*}
\operatorname{cost}_C(3,1,2)=c_{1,3}+c_{2,1}+c_{3,2}=3+2+2=7.
\end{align*}
\begin{align*}
\operatorname{cost}_C(3,2,1)=c_{1,3}+c_{2,2}+c_{3,1}=3+0+3=6.
\end{align*}
The smallest value in this list is $5$, so $\pi=(2,1,3)$ is a minimum-cost assignment. Later sections replace this enumeration by a dual certificate proving the same lower bound for all assignments at once.
[/example]
The example raises the certificate question. If a matching has cost $5$, a short proof of optimality should give a lower bound of $5$ for every possible assignment. Potentials provide exactly this kind of lower bound by distributing a price across rows and columns.
[definition: Row And Column Potential]
For an assignment instance $C\in \mathbb R^{n\times n}$, a pair of vectors $u,v\in \mathbb R^n$ is called a pair of row and column potentials if
\begin{align*}
u_i+v_j \le c_{ij}
\end{align*}
for all $1\le i,j\le n$.
[/definition]
Potentials are lower bounds distributed across rows and columns. Since every assignment uses each row and each column once, their total value bounds every assignment cost from below.
[quotetheorem:5780]
[citeproof:5780]
A lower bound is useful only when we know how it might become sharp. The permutation hypothesis in the weak bound is essential: if a row or column could be used twice, the sum of the chosen $u_i$ and $v_j$ terms would no longer be exactly the dual total. The result also does not identify an optimal assignment; it only gives a certificate format. To locate assignments attaining the potential bound, we must measure the remaining slack edge by edge and isolate the edges on which the bound is tight.
[definition: Reduced Cost And Equality Graph]
Given feasible potentials $u,v\in \mathbb R^n$, the reduced cost of edge $(i,j)$ is
\begin{align*}
\bar c_{ij}=c_{ij}-u_i-v_j.
\end{align*}
The equality graph is the bipartite graph on row vertices $R=\{1,\dots,n\}$ and column vertices $J=\{1,\dots,n\}$ with edge $(i,j)$ whenever $\bar c_{ij}=0$.
[/definition]
Feasibility of the potentials is exactly the assertion that all reduced costs are non-negative. Zero reduced cost edges are the candidates for a matching that achieves the dual lower bound.
[example: Tight Edges Need Not Certify Optimality]
For the cost matrix with row entries $(4,1,3)$, $(2,0,5)$, and $(3,2,2)$, take $u=(1,0,2)$ and $v=(1,0,0)$. The nine sums $u_i+v_j$ are
\begin{align*}
u_1+v_1=2,\quad u_1+v_2=1,\quad u_1+v_3=1,\quad u_2+v_1=1,\quad u_2+v_2=0,\quad u_2+v_3=0,\quad u_3+v_1=3,\quad u_3+v_2=2,\quad u_3+v_3=2.
\end{align*}
Comparing these with the corresponding costs gives
\begin{align*}
2\le 4,\quad 1\le 1,\quad 1\le 3,\quad 1\le 2,\quad 0\le 0,\quad 0\le 5,\quad 3\le 3,\quad 2\le 2,\quad 2\le 2,
\end{align*}
so the potentials are feasible. Their total value is
\begin{align*}
\sum_{i=1}^{3}u_i+\sum_{j=1}^{3}v_j=(1+0+2)+(1+0+0)=3+1=4.
\end{align*}
The reduced costs $c_{ij}-u_i-v_j$ are
\begin{align*}
\bar c_{11}=4-2=2,\quad \bar c_{12}=1-1=0,\quad \bar c_{13}=3-1=2,\quad \bar c_{21}=2-1=1,\quad \bar c_{22}=0-0=0,\quad \bar c_{23}=5-0=5,\quad \bar c_{31}=3-3=0,\quad \bar c_{32}=2-2=0,\quad \bar c_{33}=2-2=0.
\end{align*}
Thus the equality edges are exactly $(1,2)$, $(2,2)$, $(3,1)$, $(3,2)$, and $(3,3)$. No perfect matching can use only these edges: row $1$ has only the tight edge $(1,2)$, so column $2$ would already be used, while row $2$ also has only the tight edge $(2,2)$ and would require column $2$ a second time. The potentials certify the lower bound $4$, but the tight subgraph does not yet certify optimality because it contains no perfect matching.
[/example]
## Linear Programming Duality for the Assignment Problem
The certificate supplied by potentials is not an accident: it is the linear programming dual of the fractional assignment problem. The question is whether the best lower bound from potentials always reaches the value of the cheapest assignment.
[quotetheorem:5781]
[citeproof:5781]
Together with the Birkhoff-von Neumann theorem, this proves that the dual optimum equals the minimum over actual assignments, not merely over fractional matrices. Feasibility and boundedness are not cosmetic hypotheses in this use of strong duality: without primal feasibility there is no assignment to compare against, and without a lower bound a minimisation problem need not have a finite certificate. The theorem also does not tell us which edges an optimal assignment uses; it only guarantees that some best lower bound exists. For a checkable assignment certificate, we need the stronger local condition that every chosen edge is tight for the displayed row and column potentials.
[remark: Tight-Edge Assignment Certificate]
Let $c_{ij}$ be the cost of assigning row $i$ to column $j$. Suppose row potentials $u_1,\dots,u_n$ and column potentials $v_1,\dots,v_n$ satisfy $u_i+v_j\le c_{ij}$ for all $i,j$. If a perfect matching $M$ uses only edges with $u_i+v_j=c_{ij}$, then $M$ has minimum possible cost.
[/remark]
This is the main certificate: a perfect matching inside the equality graph, together with the potentials defining that graph, proves optimality. The condition $x_{ij}>0$ only on zero reduced-cost edges is essential; if a matched edge has positive reduced cost, the primal objective exceeds the dual value by exactly the weighted slack contribution on that edge. The theorem is only a sufficient criterion as stated here, because it does not promise that a given feasible matching can be repaired without changing the potentials. The Hungarian algorithm is designed to change the matching and the potentials together until this zero-slack situation is reached.
[example: Certificate For The Three By Three Instance]
For the cost matrix with rows $(4,1,3)$, $(2,0,5)$, and $(3,2,2)$, take row potentials $u=(2,1,2)$ and column potentials $v=(1,-1,0)$. The feasibility inequalities are checked entry by entry:
\begin{align*}
u_1+v_1=2+1=3\le 4.
\end{align*}
\begin{align*}
u_1+v_2=2+(-1)=1\le 1.
\end{align*}
\begin{align*}
u_1+v_3=2+0=2\le 3.
\end{align*}
\begin{align*}
u_2+v_1=1+1=2\le 2.
\end{align*}
\begin{align*}
u_2+v_2=1+(-1)=0\le 0.
\end{align*}
\begin{align*}
u_2+v_3=1+0=1\le 5.
\end{align*}
\begin{align*}
u_3+v_1=2+1=3\le 3.
\end{align*}
\begin{align*}
u_3+v_2=2+(-1)=1\le 2.
\end{align*}
\begin{align*}
u_3+v_3=2+0=2\le 2.
\end{align*}
Thus $u,v$ are feasible potentials, and their total value is
\begin{align*}
\sum_{i=1}^{3}u_i+\sum_{j=1}^{3}v_j=(2+1+2)+(1+(-1)+0)=5+0=5.
\end{align*}
Now consider the matching $M=\{(1,2),(2,1),(3,3)\}$. Its cost is
\begin{align*}
c_{1,2}+c_{2,1}+c_{3,3}=1+2+2=5.
\end{align*}
The three matched edges are tight:
\begin{align*}
u_1+v_2=2+(-1)=1=c_{1,2}.
\end{align*}
\begin{align*}
u_2+v_1=1+1=2=c_{2,1}.
\end{align*}
\begin{align*}
u_3+v_3=2+0=2=c_{3,3}.
\end{align*}
For any assignment $\pi$, feasibility gives $u_i+v_{\pi(i)}\le c_{i,\pi(i)}$ for each row $i$, so summing over the three rows gives
\begin{align*}
5=\sum_{i=1}^{3}u_i+\sum_{j=1}^{3}v_j\le \sum_{i=1}^{3}c_{i,\pi(i)}.
\end{align*}
The matching $M$ attains this lower bound with cost $5$, so it is a minimum-cost assignment.
[/example]
## The Hungarian Algorithm as a Primal-Dual Method
We now need an algorithm that finds both parts of the certificate: feasible potentials and a perfect matching in the equality graph. The key idea is to run the usual alternating-tree search only on equality edges; when the search gets stuck, adjust the potentials so that at least one new equality edge appears while no old matched equality edge is lost.
[definition: Hungarian Search Tree]
Given feasible potentials $u,v$ and a matching $M$ in the equality graph, a Hungarian search tree is an alternating tree rooted at an unmatched row vertex, using unmatched equality edges from rows to columns and matched equality edges from columns to rows.
[/definition]
The search tree has the same combinatorial shape as in the unweighted bipartite matching algorithm. To describe the moment when the search gets stuck, we need notation for the two sets reached by alternating equality paths.
[definition: Reachable Sets In Hungarian Search]
During a Hungarian search, let $S$ be the set of row vertices reachable from the root by alternating equality paths, and let $T$ be the set of column vertices reachable from the root by alternating equality paths.
[/definition]
If there is an equality edge from some row in $S$ to an unmatched column outside $T$, we augment the matching. If every equality edge from $S$ stays inside $T$, the ordinary augmenting-path search has no move even though the current matching may be far from perfect. Progress then requires creating a new tight edge while preserving dual feasibility and without breaking the matched tight edges already used by the search tree; this is the purpose of the potential update.
[definition: Hungarian Potential Update]
Let $C\in\mathbb R^{n\times n}$ be an assignment instance, and write $J=\{1,\dots,n\}$ for the set of column vertices. The Hungarian potential update is the transformation
\begin{align*}
\Phi_C:\{(u,v,S,T): u,v\in\mathbb R^n\text{ are feasible potentials, and }S,T\text{ arise from a maximal Hungarian search tree with }J\setminus T\ne\varnothing\}\to \{(p,q)\in\mathbb R^n\times\mathbb R^n:p_i+q_j\le c_{ij}\text{ for all }i,j\}
\end{align*}
defined as follows. The maximality condition means that there is no equality edge from $S$ to $J\setminus T$. For an input $(u,v,S,T)$ in this domain, set
\begin{align*}
\delta = \min\{c_{ij}-u_i-v_j : i\in S,\ j\notin T\}.
\end{align*}
Then $\Phi_C(u,v,S,T)=(u',v')$, where $u'_i=u_i+\delta$ for $i\in S$, $u'_i=u_i$ for $i\notin S$, $v'_j=v_j-\delta$ for $j\in T$, and $v'_j=v_j$ for $j\notin T$.
[/definition]
The definition makes sense because $S$ contains at least one row and, if the matching is not perfect, there is at least one column outside $T$. The following lemma checks that this algebraic move has exactly the graph-theoretic effect required by the search.
[quotetheorem:5782]
[citeproof:5782]
This update is precisely where the weighted algorithm differs from ordinary augmenting-path search. The assumptions on $S$ and $T$ matter: if $S$ were not the set of rows reachable by alternating equality paths, a matched edge from $T$ could lead outside $S$, and the update might destroy a matched equality edge. The lemma also does not say that one update always augments the matching; it says that the equality graph expands without losing the current matching. The next theorem says that repeated use of this expansion cannot continue forever without producing the desired optimal certificate.
[quotetheorem:5783]
[citeproof:5783]
For a concrete starting point in the minimisation version, take $u_i=\min_j c_{ij}$ and $v_j=0$. This makes every row incident with at least one equality edge and gives feasible potentials immediately. The correctness theorem assumes finite complete assignment data; it does not address sparse forbidden-edge variants unless those are first modelled with sufficiently large costs or a separate feasibility layer. Its forward role is to turn the dual certificate into an actual polynomial-time method rather than a merely existential optimality test.
[example: Equality Graph Update]
For the three by three cost matrix with rows $(4,1,3)$, $(2,0,5)$, and $(3,2,2)$, start with row potentials $u=(1,0,2)$ and column potentials $v=(0,0,0)$. The reduced costs $\bar c_{ij}=c_{ij}-u_i-v_j$ are
\begin{align*}
\bar c_{11}=4-1-0=3,\quad \bar c_{12}=1-1-0=0,\quad \bar c_{13}=3-1-0=2.
\end{align*}
\begin{align*}
\bar c_{21}=2-0-0=2,\quad \bar c_{22}=0-0-0=0,\quad \bar c_{23}=5-0-0=5.
\end{align*}
\begin{align*}
\bar c_{31}=3-2-0=1,\quad \bar c_{32}=2-2-0=0,\quad \bar c_{33}=2-2-0=0.
\end{align*}
Thus the equality graph has exactly the zero-reduced-cost edges $(1,2)$, $(2,2)$, $(3,2)$, and $(3,3)$. Take the matching $M=\{(1,2),(3,3)\}$, whose two edges are both equality edges.
Root the Hungarian search at the unmatched row $2$. From row $2$, the equality edge $(2,2)$ reaches column $2$, so column $2$ enters $T$. Since column $2$ is matched to row $1$ by $(1,2)\in M$, the alternating-tree rule then adds row $1$, giving
\begin{align*}
S=\{1,2\},\qquad T=\{2\}.
\end{align*}
The columns outside $T$ are $1$ and $3$. The four slacks from rows in $S$ to columns outside $T$ are
\begin{align*}
c_{11}-u_1-v_1=4-1-0=3,\quad c_{13}-u_1-v_3=3-1-0=2.
\end{align*}
\begin{align*}
c_{21}-u_2-v_1=2-0-0=2,\quad c_{23}-u_2-v_3=5-0-0=5.
\end{align*}
Therefore
\begin{align*}
\delta=\min\{3,2,2,5\}=2.
\end{align*}
The Hungarian update increases $u_i$ by $2$ for $i\in S$, leaves $u_3$ fixed, decreases $v_2$ by $2$, and leaves $v_1$ and $v_3$ fixed:
\begin{align*}
u'=(1+2,0+2,2)=(3,2,2).
\end{align*}
\begin{align*}
v'=(0,0-2,0)=(0,-2,0).
\end{align*}
The matched edge $(1,2)$ remains tight because
\begin{align*}
c_{12}-u'_1-v'_2=1-3-(-2)=1-3+2=0.
\end{align*}
The edge $(2,1)$ becomes tight because
\begin{align*}
c_{21}-u'_2-v'_1=2-2-0=0.
\end{align*}
The edge $(1,3)$ also becomes tight because
\begin{align*}
c_{13}-u'_1-v'_3=3-3-0=0.
\end{align*}
In particular, the new equality edge $(2,1)$ reaches the unmatched column $1$. Augmenting along this edge adds $(2,1)$ to the existing matching $M=\{(1,2),(3,3)\}$ and gives the perfect matching
\begin{align*}
\{(1,2),(2,1),(3,3)\}.
\end{align*}
This update shows the intended effect of the Hungarian step: the equality graph gains new usable tight edges while the matched equality edges already in the search tree remain available.
[/example]
## Optimality Certificates and How to Read Them
The practical output of the algorithm is not only a matching; it is a matching plus potentials. The question to ask at the end of any computation is whether the two objective values agree, because that equality is the certificate.
[definition: Assignment Optimality Certificate]
An assignment optimality certificate consists of a perfect matching $M$ and feasible potentials $u,v\in\mathbb R^n$ such that $u_i+v_j=c_{ij}$ for every edge $(i,j)\in M$.
[/definition]
Given such a certificate, the cost of the matching equals the potential total:
\begin{align*}
\sum_{(i,j)\in M} c_{ij}=\sum_{i=1}^{n}u_i+\sum_{j=1}^{n}v_j.
\end{align*}
The left side is a feasible primal value and the right side is a feasible dual value, so no gap remains. This gives a final, compact criterion that can be checked independently of how the certificate was found.
[remark: Final Assignment Criterion]
For the assignment problem with costs $c_{ij}$, a perfect matching $M$ and feasible potentials $u,v$ certify optimality exactly when every matched edge is tight: $u_i+v_j=c_{ij}$ for all $(i,j)\in M$. The equality of the matching cost with the potential total is the certificate.
[/remark]
This criterion is often easier to check than to find. The perfect-matching hypothesis is essential: the earlier potentials $u=(1,0,2)$ and $v=(1,0,0)$ had tight edges, but no tight perfect matching, so they certified only the weaker lower bound $4$. The criterion also does not assert uniqueness of the optimum; if several tight perfect matchings exist, all have the same optimal cost. This is why the final data worth recording are a tight perfect matching and any compatible potentials, not the particular sequence of Hungarian updates that produced them.
[remark: Shifting Potentials]
Potentials are not unique. If $a\in\mathbb R$, then replacing $u_i$ by $u_i+a$ for every $i$ and $v_j$ by $v_j-a$ for every $j$ leaves every sum $u_i+v_j$ unchanged. Reduced costs, equality edges, and the dual objective value are therefore unchanged.
[/remark]
This freedom explains why different implementations of the Hungarian algorithm may display different row and column labels while certifying the same assignment. The invariant data are the reduced costs and the tight matching.
[example: Potential Normalisation Does Not Affect The Certificate]
Suppose a final computation for the three by three instance gives row potentials $u=(2,1,2)$ and column potentials $v=(1,-1,0)$, with matched edges $(1,2)$, $(2,1)$, and $(3,3)$. Shift the potentials by $a=-1$, meaning that each row potential is replaced by $u_i+a$ and each column potential is replaced by $v_j-a$. The shifted row potentials are
\begin{align*}
u'=(2+(-1),1+(-1),2+(-1))=(1,0,1).
\end{align*}
The shifted column potentials are
\begin{align*}
v'=(1-(-1),-1-(-1),0-(-1))=(2,0,1).
\end{align*}
The row total changes from
\begin{align*}
u_1+u_2+u_3=2+1+2=5
\end{align*}
to
\begin{align*}
u'_1+u'_2+u'_3=1+0+1=2.
\end{align*}
The column total changes from
\begin{align*}
v_1+v_2+v_3=1+(-1)+0=0
\end{align*}
to
\begin{align*}
v'_1+v'_2+v'_3=2+0+1=3.
\end{align*}
The total potential value is unchanged, since
\begin{align*}
(u'_1+u'_2+u'_3)+(v'_1+v'_2+v'_3)=2+3=5.
\end{align*}
The tight sums on the matched entries are also unchanged. For $(1,2)$,
\begin{align*}
u'_1+v'_2=1+0=1=c_{1,2}.
\end{align*}
For $(2,1)$,
\begin{align*}
u'_2+v'_1=0+2=2=c_{2,1}.
\end{align*}
For $(3,3)$,
\begin{align*}
u'_3+v'_3=1+1=2=c_{3,3}.
\end{align*}
Thus the shifted potentials certify the same matching with the same total value $5$; the split between row labels and column labels has changed, but the tight sums on the matching edges have not.
[/example]
The chapter has moved from feasibility certificates for unweighted matchings to optimality certificates for weighted perfect matchings. The same pattern will reappear in Chapter 4 for flows and cuts: a primal object supplies a feasible solution, a dual object supplies a bound, and an algorithm succeeds when it makes the two meet.
# 4. Network Flows and Cuts
Network flow problems ask how much of a commodity can be pushed from a source to a sink through directed edges with limited capacities. This chapter assumes the basic language of finite directed graphs, paths, reachability, and elementary algorithmic complexity from the opening chapters, together with the primal-dual certificate pattern used for matchings and assignments. The chapter develops the algorithmic and dual viewpoints together: augmenting paths explain how to improve a flow, while cuts explain how to certify that no further improvement is possible. The central result is the [max-flow min-cut theorem](/theorems/2568), which is both a structural theorem about directed networks and the correctness proof behind the Ford-Fulkerson method.
## Directed Networks and Feasible Flows
How should we model transport through a directed graph when every edge has a maximum allowed throughput and every intermediate vertex must neither create nor destroy material? The first step is to separate the combinatorial object, namely the directed graph with capacities, from the numerical object, namely an assignment of flow values to edges.
[definition: Directed Network]
A directed network is a tuple $(G, s, t, c)$ where $G=(V,E)$ is a finite directed graph, $s,t \in V$ are distinct vertices called the source and sink, and $c:E \to \mathbb R_{\ge 0}$ is a capacity function.
[/definition]
A directed network gives the permitted routes and their upper limits, but it does not yet say how much traffic is actually sent. To discuss optimisation, we need a second object assigning numbers to edges while enforcing both capacity bounds and conservation at every intermediate vertex.
[definition: Feasible Flow]
A feasible flow in a directed network $(G,s,t,c)$ is a function $f:E\to \mathbb R_{\ge 0}$ such that:
1. $0\le f(e)\le c(e)$ for every $e\in E$.
2. For every $v\in V\setminus\{s,t\}$,
\begin{align*}
\sum_{u:uv\in E} f(uv)=\sum_{w:vw\in E} f(vw).
\end{align*}
[/definition]
A feasible flow is a candidate solution, but an optimisation problem also needs an objective value. Since the source may have several outgoing and incoming edges, the right scalar to maximise is the net amount leaving the source.
[definition: Value of a Flow]
For a directed network $(G,s,t,c)$, let $\mathcal F(G,s,t,c)$ denote the set of feasible flows. The value map is
\begin{align*}
\operatorname{val}:\mathcal F(G,s,t,c)\to \mathbb R,\qquad f\mapsto |f|,
\end{align*}
where $|f|$ is defined by
\begin{align*}
|f|=\sum_{v:sv\in E}f(sv)-\sum_{u:us\in E}f(us).
\end{align*}
[/definition]
This definition is made at the source, while the intended interpretation is delivery to the sink. If the network contains edges entering $s$ or leaving $t$, the displayed net value is a signed quantity; in the usual transport applications those edges can be deleted or assigned zero flow without reducing a maximum. We therefore need a conservation identity showing that the source-side value equals the sink-side net inflow for every feasible flow.
[quotetheorem:5784]
[citeproof:5784]
This identity justifies treating $|f|$ as the amount delivered by the flow rather than as a source-side convention. The conservation hypothesis is essential: without it, an internal vertex could create extra outgoing flow, so source outflow and sink inflow would no longer agree. The result does not use capacities, only conservation, and this is why the same summation argument will reappear when we compare flows with cuts. The next example shows how the definition behaves in a small network and why conservation is the essential restriction.
[example: Two Route Network]
Let $V=\{s,a,b,t\}$, let $E=\{sa,sb,ab,at,bt\}$, and let the capacities be $c(sa)=3$, $c(sb)=2$, $c(ab)=1$, $c(at)=2$, and $c(bt)=3$. Define $f(sa)=2$, $f(sb)=2$, $f(ab)=0$, $f(at)=2$, and $f(bt)=2$. The capacity bounds hold edge by edge:
\begin{align*}
0\le f(sa)=2\le 3=c(sa).
\end{align*}
\begin{align*}
0\le f(sb)=2\le 2=c(sb).
\end{align*}
\begin{align*}
0\le f(ab)=0\le 1=c(ab).
\end{align*}
\begin{align*}
0\le f(at)=2\le 2=c(at).
\end{align*}
\begin{align*}
0\le f(bt)=2\le 3=c(bt).
\end{align*}
It remains to check conservation at the internal vertices $a$ and $b$. At $a$, the incoming flow is
\begin{align*}
f(sa)=2,
\end{align*}
and the outgoing flow is
\begin{align*}
f(ab)+f(at)=0+2=2.
\end{align*}
Thus incoming flow equals outgoing flow at $a$. At $b$, the incoming flow is
\begin{align*}
f(sb)+f(ab)=2+0=2,
\end{align*}
and the outgoing flow is
\begin{align*}
f(bt)=2.
\end{align*}
Thus conservation also holds at $b$, so $f$ is a feasible flow.
Since no edge enters $s$ in this network, the value is
\begin{align*}
|f|=f(sa)+f(sb)=2+2=4.
\end{align*}
The edge $ab$ currently carries no flow, but it remains part of the network; later, after other flow values change, it may appear in a residual network and therefore in an augmenting path.
[/example]
Feasible flows form a continuous optimisation problem, but the graph structure gives a discrete way to improve them. To describe possible improvements, we need to record not only unused forward capacity but also the possibility of undoing flow already sent along an edge.
[illustration:co-residual-network-augmenting-path]
The need for reverse directions is not cosmetic. If one unit has been sent through $u\to v$, a later improvement may have to cancel part of that decision by moving one unit in the residual direction $v\to u$; recording only unused forward capacity would miss such corrections and could falsely declare a non-maximum flow stuck.
[definition: Residual Capacity]
Let $f$ be a feasible flow in a directed network $(G,s,t,c)$. The residual arc set is
\begin{align*}
R_f^+=\{(u,v,uv,+):uv\in E,\ f(uv)<c(uv)\}.
\end{align*}
The reverse residual arcs are
\begin{align*}
R_f^-=\{(v,u,uv,-):uv\in E,\ f(uv)>0\}.
\end{align*}
The full residual arc set is
\begin{align*}
R_f=R_f^+\cup R_f^-.
\end{align*}
The residual capacity function is the map
\begin{align*}
c_f:R_f\to \mathbb R_{>0}
\end{align*}
defined on forward residual arcs by
\begin{align*}
c_f(u,v,uv,+)=c(uv)-f(uv).
\end{align*}
It is defined on reverse residual arcs by
\begin{align*}
c_f(v,u,uv,-)=f(uv).
\end{align*}
[/definition]
A positive residual capacity means that the flow can be modified in the corresponding direction: a $+$ residual arc uses spare capacity on its underlying edge, and a $-$ residual arc cancels existing flow on its underlying edge. The tag is part of the residual arc data. This matters when both $uv$ and $vu$ are original edges, because spare capacity on $uv$ and cancellable flow on $vu$ are two different legal perturbations even though they point in the same residual direction. This motivates collecting all residual arcs into a graph, because reachability in that graph will identify multi-edge improvements.
[definition: Residual Network]
The residual network of a feasible flow $f$ is the directed graph with possible parallel residual arcs $G_f=(V,R_f)$. A residual arc $(x,y,e,\sigma)\in R_f$ has tail $x$ and head $y$.
\begin{align*}
E_f=\{(x,y):\text{there exists }(x,y,e,\sigma)\in R_f\}
\end{align*}
is its underlying reachability relation.
[/definition]
The residual network is not a new optimisation problem with its own source and sink; it is a local description of legal perturbations of the current flow. Keeping the residual arcs tagged avoids merging two different operations that happen to have the same tail and head. This motivates singling out residual paths that begin at the source and end at the sink, because those paths change the objective value rather than merely circulating flow.
[definition: Augmenting Path]
Let $f$ be a feasible flow in a directed network $(G,s,t,c)$. An augmenting path for $f$ is a directed path from $s$ to $t$ in the residual network $G_f$, written as a sequence of residual arcs
\begin{align*}
P=(a_1,\dots,a_k),\qquad a_i\in R_f,
\end{align*}
whose heads and tails concatenate from $s$ to $t$.
[/definition]
An augmenting path gives directions along which to push more value, but the path can only be used up to its weakest residual arc. This motivates defining the limiting amount before defining the update operation.
[definition: Bottleneck Capacity]
Let $\mathcal P_f$ be the set of augmenting paths for a feasible flow $f$. The bottleneck capacity map is
\begin{align*}
\Delta_f:\mathcal P_f\to \mathbb R_{>0}
\end{align*}
defined by
\begin{align*}
\Delta_f(P)=\min_{a\in P} c_f(a).
\end{align*}
[/definition]
The bottleneck capacity is positive, so it is a legitimate candidate improvement size. We next need to check that pushing exactly this amount along the residual path preserves feasibility and increases the flow value.
[quotetheorem:5785]
[citeproof:5785]
The lemma converts the search for a better flow into a graph reachability question. Its hypotheses matter in two ways: the path must be an $s$-$t$ path, otherwise the perturbation may only create a circulation, and the step size must be at most the bottleneck, otherwise some edge would violate a capacity or nonnegativity constraint. What the lemma does not say is that repeated augmentation is efficient or even terminates for arbitrary real capacities. If a path exists, augment; if no path exists, the set of vertices reachable from $s$ will become a cut certificate.
## Ford-Fulkerson and Edmonds-Karp
How can repeated local improvements be organised into an algorithm, and when does the algorithm terminate efficiently? The Ford-Fulkerson method is the most direct expression of the augmentation lemma: keep finding residual $s$-$t$ paths until none remain.
[definition: Ford Fulkerson Method]
The Ford-Fulkerson method starts with the zero flow and repeatedly performs the following step: if the residual network contains an augmenting path $P$, augment the current flow by the bottleneck capacity of $P$; otherwise stop.
[/definition]
This is a method rather than a fully specified algorithm, because it does not say which augmenting path to choose. With integer capacities, any choice raises the value by a positive integer, so integrality is the condition that turns repeated augmentation into finite termination.
[quotetheorem:5786]
[citeproof:5786]
This theorem explains why integer input data produces an integer optimum, a fact that is stronger than the existence of a real-valued optimum. The integer hypothesis cannot simply be dropped from the termination argument: with irrational capacities, poorly chosen augmenting paths can produce an infinite sequence of improvements that converges without stopping. The standard construction has three central vertices joined so that successive augmentations alternately cancel flow across a shared middle edge; choosing the paths in that alternating order makes the remaining residual value shrink by a fixed irrational factor, so the method approaches the optimum through infinitely many positive augmentations. Even for integer capacities, the theorem does not give a polynomial running-time bound: if two $s$-$t$ routes share a unit bottleneck in opposite residual directions while a parallel route has capacity $M$, alternating bad augmenting paths can raise the value by only $1$ at a time for $\Theta(M)$ iterations. The next example interprets the result in terms of paths rather than commodities.
[example: Edge Disjoint Paths]
Let $G$ be a directed graph with distinguished vertices $s$ and $t$, and give every edge capacity $1$. If $P_1,\dots,P_r$ are pairwise edge-disjoint directed $s$-$t$ paths, define
\begin{align*}
f(e)=\#\{i:e\in P_i\}.
\end{align*}
Edge-disjointness gives $f(e)\in\{0,1\}$ for every edge $e$, so $0\le f(e)\le 1$. At each internal vertex, every path that enters also leaves, and every path that neither uses the vertex contributes $0$ to both sides of the conservation equation. Hence $f$ is a feasible flow, and its value is
\begin{align*}
|f|=\sum_{v:sv\in E}f(sv)-\sum_{u:us\in E}f(us)=r-0=r,
\end{align*}
because each of the $r$ paths uses exactly one first edge leaving $s$ and no path enters $s$ before starting.
Conversely, let $f$ be an integer feasible flow in this unit-capacity network. Since $0\le f(e)\le 1$ and $f(e)$ is an integer, every edge has $f(e)=0$ or $f(e)=1$. Starting from $s$, follow an outgoing edge with value $1$; at every internal vertex reached, conservation says
\begin{align*}
\sum_{w:vw\in E}f(vw)=\sum_{u:uv\in E}f(uv),
\end{align*}
so if one unused positive-flow edge has brought us into $v\ne t$, there is a positive-flow edge leaving $v$ unless the walk has closed a directed cycle. Because the graph is finite, this process either reaches $t$ and produces a directed $s$-$t$ path, or repeats a vertex and produces a directed cycle. Delete one unit of flow from every edge of the path or cycle found. The deletion preserves nonnegativity, preserves conservation at all internal vertices, and in the path case lowers the flow value by exactly $1$.
Repeating until no positive-flow edge remains decomposes the original integer flow into directed $s$-$t$ paths and directed cycles. Cycles lower neither source net outflow nor sink net inflow, so a flow of value $k$ yields exactly $k$ edge-disjoint directed $s$-$t$ paths. Thus, in the unit-capacity network, maximum flow value is exactly the maximum number of pairwise edge-disjoint directed paths from $s$ to $t$.
[/example]
The edge-disjoint path example shows the power of arbitrary augmentations when capacities are integral, but it does not provide a polynomial bound in terms of the graph size. This motivates specifying a canonical rule for choosing augmenting paths, so that running time can be bounded by graph parameters rather than capacity magnitudes.
[definition: Edmonds Karp Algorithm]
The Edmonds-Karp algorithm is the Ford-Fulkerson method in which each augmenting path is chosen to have minimum number of edges in the residual network.
[/definition]
Choosing shortest paths makes the residual networks evolve in a controlled way, but this control must be expressed as a graph invariant. This motivates studying the vector of shortest residual distances from the source, which cannot move backwards under Edmonds-Karp augmentation.
[quotetheorem:5787]
[citeproof:5787]
The monotonicity theorem limits how often distances can reset, but it is deliberately only a statement about shortest-path layers, not about the value of the flow. It uses the Edmonds-Karp choice of a shortest augmenting path; arbitrary Ford-Fulkerson augmentations need not have this monotone-distance behaviour. For instance, take vertices $s,a,b,t$ with edges $sa,ab,bt,sb,at$, where $sa,ab,bt$ have capacity $1$, while $sb$ and $at$ have capacity $2$. If an arbitrary augmentation first uses the longer path $s\to a\to b\to t$, then the residual network gains the reverse arc $b\to a$ and still has spare capacity on $s\to b$ and $a\to t$. A later augmentation can use the residual path $s\to b\to a\to t$, cancelling the middle edge while increasing the total value. This shows the mechanism that the shortest-path rule forbids: a non-shortest augmentation can create reverse residual arcs that later participate in shorter reachability patterns. Edmonds-Karp avoids this by always augmenting along the current breadth-first-search layers. To obtain a running-time bound, we need a countable event tied to augmentations. We count how often a directed residual arc becomes a bottleneck arc on a shortest augmenting path.
[quotetheorem:5788]
[citeproof:5788]
This bound is the first point in the course where a primal improvement algorithm is proved polynomial by tracking a dual-looking quantity, namely residual distance. The theorem depends on the shortest-path rule, not just on augmenting paths, and it bounds augmentations in terms of $n$ and $m$ rather than the size of the capacities. It does not claim Edmonds-Karp is the fastest maximum-flow algorithm; its role here is to show how a simple local improvement method becomes polynomial once the right invariant is tracked. We next make the dual certificate explicit by introducing cuts.
## Cuts and Conservation
What certificate can prove that a feasible flow cannot be improved? A cut separates the source from the sink, and its capacity bounds the value of every feasible flow because all delivered material must cross the separation somewhere.
[definition: Source Sink Cut]
A source-sink cut in a directed network $(G,s,t,c)$ is a subset $S\subset V$ such that $s\in S$ and $t\notin S$. Its outgoing edge set is
\begin{align*}
\delta^+(S)=\{uv\in E:u\in S,\ v\notin S\}.
\end{align*}
Its incoming edge set is
\begin{align*}
\delta^-(S)=\{uv\in E:u\notin S,\ v\in S\}.
\end{align*}
[/definition]
A source-sink cut identifies a boundary, but the optimisation certificate needs a number attached to that boundary. Because flow is trying to move from $S$ toward $V\setminus S$, only edges directed out of $S$ contribute to the cut capacity.
[definition: Capacity of a Cut]
Let $\mathcal C(G,s,t)$ be the set of source-sink cuts in a directed network $(G,s,t,c)$. The cut-capacity map is
\begin{align*}
c:\mathcal C(G,s,t)\to \mathbb R_{\ge 0}
\end{align*}
defined by
\begin{align*}
c(S)=\sum_{uv\in \delta^+(S)}c(uv).
\end{align*}
[/definition]
The capacity of a cut is meant to upper-bound flow value, and that requires relating $|f|$ to actual flow crossing the boundary. This relation comes from summing conservation over all vertices on the source side of the cut.
[quotetheorem:5789]
[citeproof:5789]
The cut identity expresses value as outgoing boundary flow minus incoming boundary flow. The source-sink condition on $S$ is necessary: if $s$ and $t$ lie on the same side, the same summation would not measure transport from source to sink. The identity also keeps the incoming boundary term, which is important because flow may cross a cut in both directions in a directed network. This motivates the weak max-flow cut bound, because incoming boundary flow can only reduce this expression and outgoing boundary flow is capacity-bounded.
[quotetheorem:5790]
[citeproof:5790]
Weak duality is already enough to certify optimality when a flow and cut have equal value. The theorem gives only an upper bound for each fixed cut; it does not by itself prove that any cut is tight, or that a maximum flow exists. Its strength is certificate-based: once a feasible flow and a cut have the same value, no hidden local improvement can beat that value. The residual network explains how to find such a cut at termination.
[example: Residual Cut Certifying Optimality]
Suppose the feasible flow $f$ has no augmenting path, and let $S$ be the set of vertices reachable from $s$ in the residual network $G_f$. The source is reachable from itself, so $s\in S$. Since there is no residual directed path from $s$ to $t$, we have $t\notin S$, and therefore $S$ is a source-sink cut.
Take an edge $uv\in\delta^+(S)$. Then $u\in S$ and $v\notin S$. If $f(uv)<c(uv)$, the definition of residual capacity puts the forward residual arc $(u,v,uv,+)$ in $G_f$. Since $u$ is reachable from $s$, appending this arc would make $v$ reachable from $s$, contradicting $v\notin S$. Hence $f(uv)\not<c(uv)$. Capacity feasibility gives $f(uv)\le c(uv)$, so
\begin{align*}
f(uv)=c(uv).
\end{align*}
Thus every edge in $\delta^+(S)$ is saturated.
Now take an edge $uv\in\delta^-(S)$. Then $u\notin S$ and $v\in S$. If $f(uv)>0$, the definition of residual capacity puts the reverse residual arc $(v,u,uv,-)$ in $G_f$. Since $v$ is reachable from $s$, appending this arc would make $u$ reachable from $s$, contradicting $u\notin S$. Hence $f(uv)\not>0$. Nonnegativity gives $f(uv)\ge 0$, so
\begin{align*}
f(uv)=0.
\end{align*}
Thus every edge in $\delta^-(S)$ carries zero flow.
Applying the flow-across-a-cut identity gives
\begin{align*}
|f|=\sum_{uv\in\delta^+(S)}f(uv)-\sum_{uv\in\delta^-(S)}f(uv).
\end{align*}
Using $f(uv)=c(uv)$ on $\delta^+(S)$ and $f(uv)=0$ on $\delta^-(S)$, this becomes
\begin{align*}
|f|=\sum_{uv\in\delta^+(S)}c(uv)-\sum_{uv\in\delta^-(S)}0.
\end{align*}
The second sum is $0$, so
\begin{align*}
|f|=\sum_{uv\in\delta^+(S)}c(uv)=c(S).
\end{align*}
The weak max-flow cut bound says that every feasible flow has value at most $c(S)$, so the equality $|f|=c(S)$ proves that $f$ is maximum and that $S$ is a minimum cut.
[/example]
This example is the whole duality proof in miniature: absence of a residual path produces a reachable set, and the boundary of that set is tight in both directions. The remaining point is to make this certificate argument into a theorem about the displayed maximum and minimum. For integer capacities, Ford-Fulkerson termination can supply a flow with no augmenting path; for arbitrary real capacities, the theorem must also know that an optimal feasible flow exists before applying the residual-path test. We now state the main theorem with this existence issue included in the proof.
## Max-Flow Min-Cut Duality and Applications
Why do augmenting algorithms know when to stop? The answer is that the obstruction to an augmenting path is exactly a cut whose capacity matches the current flow value.
[quotetheorem:2568]
[citeproof:2568]
The theorem turns algorithmic termination into a certificate pair: a primal object, the flow, and a dual object, the cut. The finiteness hypothesis is doing real work here: it makes the feasible region a compact polytope in $\mathbb R^E$, so real capacities still have an attained maximum rather than merely a supremum. This is also the combinatorial shadow of linear programming duality: feasible flows form the primal optimisation problem, while cuts give integral dual certificates for this special network matrix. The residual-path condition is a useful optimality test, since it converts the global question of maximality into one graph search. This certificate structure is especially powerful when the network is built to encode another combinatorial problem.
[example: Bipartite Matching as Maximum Flow]
Let $G=(A\cup B,E)$ be a bipartite graph. Form a directed network by adding a source $s$ and sink $t$, adding edges $sa$ for every $a\in A$, directing each original bipartite edge from $A$ to $B$, and adding edges $bt$ for every $b\in B$. Give every edge capacity $1$.
First suppose $f$ is an integer feasible flow of value $k$ in this network, and define
\begin{align*}
M_f=\{ab\in E:f(ab)=1\}.
\end{align*}
Since every edge has capacity $1$ and $f$ is integer-valued, each edge flow is either $0$ or $1$. For any $a\in A$, conservation at $a$ gives
\begin{align*}
f(sa)=\sum_{b:ab\in E}f(ab).
\end{align*}
Because $0\le f(sa)\le 1$, the sum on the right is at most $1$, so at most one selected edge of $M_f$ is incident with $a$. For any $b\in B$, conservation at $b$ gives
\begin{align*}
\sum_{a:ab\in E}f(ab)=f(bt).
\end{align*}
Because $0\le f(bt)\le 1$, at most one selected edge of $M_f$ is incident with $b$. Hence $M_f$ is a matching.
The number of selected bipartite edges is computed by summing over their left endpoints:
\begin{align*}
|M_f|=\sum_{ab\in E}f(ab).
\end{align*}
Re-indexing this sum by vertices of $A$ gives
\begin{align*}
\sum_{ab\in E}f(ab)=\sum_{a\in A}\sum_{b:ab\in E}f(ab).
\end{align*}
By conservation at each $a\in A$,
\begin{align*}
\sum_{a\in A}\sum_{b:ab\in E}f(ab)=\sum_{a\in A}f(sa).
\end{align*}
No edge enters $s$ in the constructed network, so the definition of flow value gives
\begin{align*}
\sum_{a\in A}f(sa)=|f|.
\end{align*}
Combining these equalities and using $|f|=k$ gives
\begin{align*}
|M_f|=k.
\end{align*}
Thus an integer flow of value $k$ gives a matching of size $k$.
Conversely, let $M$ be a matching of size $k$. Define $f(sa)=1$ if $a$ is matched in $M$, and define $f(sa)=0$ otherwise. For each directed bipartite edge $ab$, define $f(ab)=1$ if $ab\in M$, and define $f(ab)=0$ otherwise. Finally, define $f(bt)=1$ if $b$ is matched in $M$, and define $f(bt)=0$ otherwise. Every value is $0$ or $1$, so all capacity constraints hold.
It remains to check conservation. If $a\in A$ is unmatched, then no edge of $M$ is incident with $a$, so
\begin{align*}
f(sa)=0=\sum_{b:ab\in E}f(ab).
\end{align*}
If $a$ is matched to $b_0$, then no other edge incident with $a$ belongs to $M$, and hence
\begin{align*}
\sum_{b:ab\in E}f(ab)=f(ab_0)=1=f(sa).
\end{align*}
Thus conservation holds at every $a\in A$. Similarly, if $b\in B$ is unmatched, then
\begin{align*}
\sum_{a:ab\in E}f(ab)=0=f(bt).
\end{align*}
If $b$ is matched to $a_0$, then no other edge incident with $b$ belongs to $M$, so
\begin{align*}
\sum_{a:ab\in E}f(ab)=f(a_0b)=1=f(bt).
\end{align*}
Thus conservation holds at every internal vertex, and $f$ is a feasible flow.
Its value is
\begin{align*}
|f|=\sum_{a\in A}f(sa).
\end{align*}
The sum counts exactly the matched vertices in $A$, so
\begin{align*}
\sum_{a\in A}f(sa)=|M|.
\end{align*}
Since $|M|=k$, we get
\begin{align*}
|f|=k.
\end{align*}
Thus matchings of size $k$ are exactly the integer flows of value $k$ in this constructed unit-capacity network. By the *[Integral Flow Theorem](/theorems/5786)*, a maximum flow may be chosen integer-valued, so computing a maximum flow recovers a maximum matching.
[/example]
The matching reduction shows how max-flow min-cut converts a packing problem into a separating certificate. In bipartite matching this is the same packing-versus-covering pattern that appears in Konig's theorem: a maximum matching is paired with a minimum vertex cover after reading the minimum cut in the constructed network. Applying the same idea to unit-capacity networks gives the edge form of Menger's theorem, where the packed objects are directed paths and the separating certificate is an edge cut.
[quotetheorem:5791]
[citeproof:5791]
This result is a useful model for the whole chapter: a maximum packing of disjoint objects equals a minimum separating certificate. Network flows provide a systematic way to turn that principle into algorithms and proofs. Chapter 5 keeps the same cut and max-flow machinery but uses it to decide feasibility under lower bounds and node balances rather than to maximise a single source-sink value.
# 5. Circulations, Lower Bounds, and Feasibility
This chapter extends the flow theory of Chapter 4 from maximum value questions to feasibility questions. Directed networks, cuts, maximum flows, residual capacities, and the max-flow min-cut theorem are reused as feasibility tests rather than as value optimisation tools. In applications, an arc may have to carry at least a prescribed amount as well as at most a prescribed amount, and a vertex may have external supply or demand. The main theme is that these constraints can still be tested using cuts and ordinary maximum flow, once the lower bounds and imbalances have been encoded correctly.
## Lower and Upper Bounds on Arcs
When a network models traffic, production, or assignment, the constraint on an arc is rarely just an upper capacity. A contract might require at least three units to be shipped, or a machine might have to run above a minimum load. The first problem is therefore: how should the usual capacity data be changed when every arc has both a compulsory minimum and a permitted maximum?
[definition: Lower Bounded Network]
Let $D=(V,A)$ be a directed graph. A lower bounded network consists of functions $l,u:A\to \mathbb Z$ satisfying $0\le l(a)\le u(a)$ for every $a\in A$.
[/definition]
The lower bound $l(a)$ is the compulsory amount on arc $a$, while $u(a)$ is the usual capacity. Once the arc data have two bounds, the next problem is to state conservation at vertices without losing track of the value sent from $s$ to $t$.
[definition: Feasible Lower Bounded Flow]
Let $D=(V,A)$ be a directed graph with lower and upper capacities $l,u:A\to \mathbb Z$. For distinct vertices $s,t\in V$, a feasible lower bounded flow from $s$ to $t$ of value $\nu$ is a function $f:A\to \mathbb R$ such that $l(a)\le f(a)\le u(a)$ for all $a\in A$, flow is conserved at every $v\in V\setminus\{s,t\}$, the net outflow from $s$ is $\nu$, and the net inflow into $t$ is $\nu$.
[/definition]
This definition separates two kinds of restrictions: local arc restrictions and vertex balance restrictions. The next example shows why lower bounds cannot be ignored and then repaired after computing an ordinary maximum flow.
[example: Mandatory Shipment]
Consider the directed graph with arcs $st$, $sx$, and $xt$, where $l(st)=u(st)=5$, $l(sx)=l(xt)=0$, and $u(sx)=u(xt)=10$. For any feasible lower bounded flow $f$ of value $5$, the bounds on the direct arc give
\begin{align*}
5=l(st)\le f(st)\le u(st)=5.
\end{align*}
Hence $f(st)=5$. Since the net outflow from $s$ must be $5$,
\begin{align*}
f(st)+f(sx)=5.
\end{align*}
Substituting $f(st)=5$ gives
\begin{align*}
5+f(sx)=5.
\end{align*}
Therefore
\begin{align*}
f(sx)=0.
\end{align*}
Flow conservation at $x$ gives $f(sx)=f(xt)$, so
\begin{align*}
f(xt)=0.
\end{align*}
Thus every feasible lower bounded flow of value $5$ sends all five units on the compulsory direct arc $st$.
If the lower bounds are ignored and only the upper capacities are kept, define an ordinary flow $g$ by $g(st)=0$, $g(sx)=5$, and $g(xt)=5$. The upper-capacity inequalities hold because
\begin{align*}
0\le g(st)=0\le 5.
\end{align*}
Also,
\begin{align*}
0\le g(sx)=5\le 10.
\end{align*}
And similarly,
\begin{align*}
0\le g(xt)=5\le 10.
\end{align*}
The net outflow from $s$ is
\begin{align*}
g(st)+g(sx)=0+5=5.
\end{align*}
Flow is conserved at $x$ because
\begin{align*}
g(sx)=5=g(xt).
\end{align*}
The net inflow into $t$ is
\begin{align*}
g(st)+g(xt)=0+5=5.
\end{align*}
So $g$ is a valid ordinary value-$5$ flow for the upper-capacity problem, but it violates the original lower bound on the direct arc because
\begin{align*}
g(st)=0<5=l(st).
\end{align*}
The lower bound is therefore not a cosmetic capacity adjustment: it forces the direct shipment to occur.
[/example]
The standard transformation removes lower bounds by paying them in advance. If we write $g(a)=f(a)-l(a)$, then $g$ has nonnegative lower bound and upper capacity $u(a)-l(a)$, but the vertex balances are shifted. The next problem is to measure that shift at each vertex.
[definition: Imbalance Created by Lower Bounds]
Let $D=(V,A)$ have lower capacities $l:A\to \mathbb Z$. The imbalance created by the lower bounds is the function $b_l:V\to \mathbb Z$ defined by
\begin{align*}
b_l(v)=\sum_{a\in \delta^-(v)} l(a)-\sum_{a\in \delta^+(v)} l(a)
\end{align*}
for each $v\in V$.
[/definition]
A positive value of $b_l(v)$ means the preloaded lower bounds send net flow into $v$, so the remaining variable part must send net flow out of $v$ to restore balance. This bookkeeping is the bridge from lower-bounded flows to circulations with vertex demands.
## Circulations, Demands, and Node Balances
The next question is not to maximise the amount from one source to one sink, but to decide whether all local requirements can be satisfied at once. Circulations are the right language because every vertex is treated uniformly: external supply and demand appear as prescribed node balances rather than as special source and sink vertices.
[definition: Circulation with Node Balance]
Let $D=(V,A)$ be a directed graph with lower and upper capacities $l,u:A\to \mathbb Z$ and a balance function $b:V\to \mathbb Z$. A feasible circulation with node balance $b$ is a function $f:A\to \mathbb R$ such that $l(a)\le f(a)\le u(a)$ for all $a\in A$, and for every $v\in V$,
\begin{align*}
\sum_{a\in \delta^-(v)} f(a)-\sum_{a\in \delta^+(v)} f(a)=b(v).
\end{align*}
[/definition]
With this sign convention, $b(v)>0$ means that $v$ demands net inflow, and $b(v)<0$ means that $v$ supplies net outflow. We need a first feasibility test before considering the more delicate capacity constraints, because the prescribed balances must cancel over the whole graph.
[quotetheorem:5792]
[citeproof:5792]
The total balance condition is necessary because summing all node equations cancels every arc contribution, so no choice of capacities can repair a nonzero total demand. The hypothesis that a feasible circulation exists is also essential: if $V=\{p,q\}$, $b(p)=-1$, $b(q)=1$, and the only arc is $qp$ with capacity $0$, then the total balance is $0$ but the demander $q$ cannot receive flow from the supplier $p$. Thus the theorem gives only a global conservation test. It does not say that supplies can reach demands, that lower bounds are compatible with upper bounds, or that any particular cut has enough entering capacity. The missing information is local separation information: every group of vertices must be able to import enough flow, after accounting for compulsory flow that leaves it. The next problem is to identify the exact obstruction caused by a cut separating some vertices from the rest of the network.
[quotetheorem:5793]
[citeproof:5793]
The cut condition has the same flavour as the max-flow min-cut theorem: infeasibility has a certificate, namely a set $X$ whose required net demand exceeds the maximum possible net inflow. The total balance hypothesis is still needed, because taking $X=V$ makes both cut terms vanish and forces $\sum_{v\in V}b(v)\le 0$, while taking complements forces the opposite inequality. The lower and upper capacity hypotheses are also essential: if an arc has $l(a)>u(a)$, the cut inequalities may not be the right obstruction because the single arc is already internally impossible. A concrete capacity obstruction is a supplier $p$ with $b(p)=-2$, a demander $q$ with $b(q)=2$, and one arc $pq$ of upper capacity $1$; the set $X=\{q\}$ violates the inequality and certifies infeasibility. What the theorem gives is a certificate and a characterisation of feasibility, not usually the most convenient hand procedure for checking every possible subset $X\subseteq V$. The theorem is also algorithmic through its proof, and the next problem is to turn that proof into an explicit construction that a maximum-flow subroutine can run.
[example: Transportation with Supply and Demand]
Let the permitted shipping arcs be $A\subseteq \{p_iq_j:1\le i\le m,\ 1\le j\le n\}$, and write $x_{ij}=f(p_iq_j)$ when $p_iq_j\in A$. Since each route has lower bound $0$ and upper capacity $u_{ij}$, the arc bounds are exactly
\begin{align*}
0\le x_{ij}\le u_{ij}
\end{align*}
for every permitted route $p_iq_j$.
At a factory $p_i$, there are no incoming shipping arcs and the balance is $b(p_i)=-s_i$. The circulation balance equation becomes
\begin{align*}
0-\sum_{p_iq_j\in A}x_{ij}=-s_i.
\end{align*}
Multiplying both sides by $-1$ gives
\begin{align*}
\sum_{p_iq_j\in A}x_{ij}=s_i.
\end{align*}
Thus factory $p_i$ ships exactly its supply. At a store $q_j$, there are no outgoing shipping arcs and the balance is $b(q_j)=d_j$, so
\begin{align*}
\sum_{p_iq_j\in A}x_{ij}-0=d_j.
\end{align*}
Hence
\begin{align*}
\sum_{p_iq_j\in A}x_{ij}=d_j.
\end{align*}
Thus store $q_j$ receives exactly its demand. Conversely, any numbers $x_{ij}$ satisfying these route-capacity, factory-supply, and store-demand equations define a feasible circulation by setting $f(p_iq_j)=x_{ij}$, because the displayed equations verify every arc bound and every node balance.
The total balance condition is compatible with the supply-demand hypothesis:
\begin{align*}
\sum_{v\in V}b(v)=\sum_{i=1}^m(-s_i)+\sum_{j=1}^n d_j.
\end{align*}
Since $\sum_i s_i=\sum_j d_j$, this gives
\begin{align*}
\sum_{v\in V}b(v)=-\sum_{i=1}^m s_i+\sum_{j=1}^n d_j=0.
\end{align*}
Now apply *[Hoffman's Circulation Theorem](/theorems/5793)*. If $S$ is a set of factories and $T$ is a set of stores, take $X=S\cup T$. Since all lower bounds are $0$, the cut inequality becomes
\begin{align*}
\sum_{a\in\delta^-(X)}u(a)\ge \sum_{v\in X}b(v).
\end{align*}
The right-hand side is
\begin{align*}
\sum_{v\in X}b(v)=\sum_{q_j\in T}d_j-\sum_{p_i\in S}s_i.
\end{align*}
The arcs entering $X$ are precisely the permitted arcs $p_iq_j$ with $p_i\notin S$ and $q_j\in T$, so the cut inequality is
\begin{align*}
\sum_{\{p_iq_j\in A:\ p_i\notin S,\ q_j\in T\}}u_{ij}\ge \sum_{q_j\in T}d_j-\sum_{p_i\in S}s_i.
\end{align*}
Adding $\sum_{p_i\in S}s_i$ to both sides gives the equivalent form
\begin{align*}
\sum_{p_i\in S}s_i+\sum_{\{p_iq_j\in A:\ p_i\notin S,\ q_j\in T\}}u_{ij}\ge \sum_{q_j\in T}d_j.
\end{align*}
So every group of stores must be coverable by the supply of factories placed on the same side of the cut together with the route capacity entering from the remaining factories; this is the cut obstruction behind transportation feasibility.
[/example]
In this form the theorem is useful for certification, but for computation it is often simpler to use the auxiliary maximum-flow construction directly. The next section records this construction as a reusable reduction.
## Reducing Feasibility to Maximum Flow
The algorithmic problem is: given lower bounds, upper bounds, and balances, how do we test feasibility using only a maximum-flow subroutine? The reduction has two steps: remove lower bounds, then force the remaining deficits and surpluses to be satisfied by arcs from a super-source and into a super-sink.
[definition: Residual Balance After Lower Bounds]
Let $D=(V,A)$ have lower capacities $l:A\to \mathbb Z$ and node balance $b:V\to \mathbb Z$. The residual balance is the function $\hat b:V\to \mathbb Z$ defined by
\begin{align*}
\hat b(v)=b(v)-\sum_{a\in \delta^-(v)}l(a)+\sum_{a\in \delta^+(v)}l(a)
\end{align*}
for each $v\in V$.
[/definition]
The residual balance is the balance that must be achieved by the extra variables $g(a)=f(a)-l(a)$. Thus vertices with $\hat b(v)>0$ need net inflow in the residual network, while vertices with $\hat b(v)<0$ must send out surplus.
[quotetheorem:5794]
[citeproof:5794]
This theorem is the practical version of the feasible circulation theorem, and each hypothesis has a job. The inequalities $l(a)\le u(a)$ are needed before the reduction starts: a single arc with $l(a)=2$ and $u(a)=1$ is already impossible, even if all vertex balances are zero. The total balance condition is also part of the test rather than a cosmetic assumption: with one vertex $v$, no arcs, and $b(v)=1$, there is residual demand but no residual supply, so source-side saturation alone would be vacuous while the original balance equation is impossible. The balance convention is needed because the auxiliary arcs are oriented from residual suppliers to residual demanders; if a vertex with $\hat b(v)<0$ were connected to $t^*$ instead of from $s^*$, even a two-vertex instance with one unit of supply and one unit of demand would route the wrong sign. The value condition is necessary because any missing unit of the required $B$ units leaves either residual supply unrouted or residual demand unmet. For example, if $\hat b(p)=-1$, $\hat b(q)=1$, and there is no path from $p$ to $q$ in the residual network, the maximum flow has value $0<B$ and no feasible circulation exists. The theorem does not say that a lower-value maximum flow is useful, and it does not by itself promise integrality for fractional input data. It turns feasibility into a single maximum-flow computation, after which the actual feasible circulation is recovered by adding the lower bounds back to the flow on each original arc.
[example: Checking Feasibility by an Auxiliary Network]
Suppose the residual balances are $\hat b(a)=-4$, $\hat b(b)=1$, and $\hat b(c)=3$. Their total is
\begin{align*}
\hat b(a)+\hat b(b)+\hat b(c)=-4+1+3=0.
\end{align*}
Thus the residual surplus equals the residual demand:
\begin{align*}
\sum_{\hat b(v)<0}-\hat b(v)=-\hat b(a)=-(-4)=4,
\end{align*}
and
\begin{align*}
\sum_{\hat b(v)>0}\hat b(v)=\hat b(b)+\hat b(c)=1+3=4.
\end{align*}
So the auxiliary target value is $B=4$.
By the construction in *[Super-Source Super-Sink Feasibility Criterion for Circulations with Lower Bounds](/theorems/5794)*, the vertex $a$ has residual supply because $\hat b(a)<0$, so we add
\begin{align*}
s^*a\quad\text{with capacity}\quad -\hat b(a)=4.
\end{align*}
The vertices $b$ and $c$ have residual demand because $\hat b(b)>0$ and $\hat b(c)>0$, so we add
\begin{align*}
bt^*\quad\text{with capacity}\quad \hat b(b)=1
\end{align*}
and
\begin{align*}
ct^*\quad\text{with capacity}\quad \hat b(c)=3.
\end{align*}
Each original arc $e$ receives residual capacity $u(e)-l(e)$.
The only arc leaving $s^*$ is $s^*a$, and its capacity is $4$, so any $s^*$-$t^*$ flow has value at most $4$. The original constraints are feasible exactly when the auxiliary maximum flow has value
\begin{align*}
B=4.
\end{align*}
Equivalently, feasibility holds exactly when the maximum flow uses all capacity on $s^*a$, that is, when
\begin{align*}
F(s^*a)=4.
\end{align*}
In that case the four units supplied at $a$ can be routed through the residual network to meet the one-unit demand at $b$ and the three-unit demand at $c$.
[/example]
The auxiliary construction handles node balances directly, while many earlier flow problems were stated with a source, a sink, and a prescribed value. The next problem is to connect these two languages without introducing a new algorithm.
[quotetheorem:5795]
[citeproof:5795]
This reduction is useful in two directions, and its hypotheses prevent several different failures. The artificial arc must have both lower and upper capacity equal to $\nu$; if it only had upper capacity $\nu$, a zero-balance circulation could send less than $\nu$ around the return arc and would correspond to a smaller $s$-$t$ flow. For instance, in a graph with a single original arc $st$ of capacity $10$, asking for value $5$ but adding a return arc $ts$ with bounds $0\le f(ts)\le 5$ would allow the zero circulation, which does not encode a value-$5$ flow. The orientation of the artificial arc also matters: adding $st$ instead of $ts$ doubles the imbalance at the terminals rather than cancelling it. The assumption that $s$ and $t$ are distinct is part of the source-sink model, since when $s=t$ net outflow from the source and net inflow into the sink are no longer separate constraints. The theorem does not optimise over $\nu$ and does not decide which values are attainable in one step; it tests one prescribed value by forcing that value onto the return arc.
## Integral Circulations and Discrete Feasibility
Combinatorial optimisation usually asks for integral objects: whole jobs, whole edges, whole matches, or whole units of supply. Since the auxiliary network has integer capacities, the next problem is to decide whether a fractional feasible circulation ever needs to be kept fractional.
[quotetheorem:5796]
[citeproof:5796]
This result is a typical integrality transfer. The integer-data hypothesis is essential: with one arc forced to satisfy $l(a)=u(a)=1/2$, any feasible solution has $f(a)=1/2$, so no integral circulation can exist. Similarly, a fractional node balance can force fractional net inflow even when all arc capacities are large enough. What the theorem does not say is that every feasible circulation is integral, or that an arbitrary fractional solution can be rounded arc-by-arc while preserving all balances. It only says that the feasible set contains at least one integral point whenever it is nonempty and the data are integral. This distinction is important in modelling: fractional relaxations may have many fractional feasible points, but the auxiliary max-flow proof supplies an integral representative. That is the step that turns the following models into discrete existence results: in $b$-matching the integral representative chooses edges rather than fractions of edges, and in the tournament model it chooses a winner of each game rather than splitting a win. The next examples use that representative property to turn fractional feasibility tests into discrete choices.
[example: Bipartite b-Matching as a Circulation]
Let $G=(P\cup Q,E)$ be bipartite, and let $r:P\cup Q\to \mathbb Z_{\ge 0}$ prescribe the required degree at each vertex. Direct every edge from $P$ to $Q$, so an edge with endpoints $p\in P$ and $q\in Q$ becomes an arc $pq$. Give every arc lower bound $0$ and upper bound $1$, and define $\beta(p)=-r(p)$ for $p\in P$ and $\beta(q)=r(q)$ for $q\in Q$.
Suppose first that $f$ is an integral feasible circulation with node balance $\beta$. For each arc $pq$, the capacity bounds give
\begin{align*}
0\le f(pq)\le 1.
\end{align*}
Since $f(pq)$ is an integer, this implies
\begin{align*}
f(pq)\in\{0,1\}.
\end{align*}
Let $M=\{pq\in E:f(pq)=1\}$. For a vertex $p\in P$, no directed edge enters $p$, so the balance equation is
\begin{align*}
0-\sum_{pq\in E}f(pq)=\beta(p).
\end{align*}
Using $\beta(p)=-r(p)$ gives
\begin{align*}
-\sum_{pq\in E}f(pq)=-r(p).
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
\sum_{pq\in E}f(pq)=r(p).
\end{align*}
Because $f(pq)=1$ exactly on the chosen edges in $M$, vertex $p$ is incident with exactly $r(p)$ chosen edges. For a vertex $q\in Q$, no directed edge leaves $q$, so the balance equation is
\begin{align*}
\sum_{pq\in E}f(pq)-0=\beta(q).
\end{align*}
Using $\beta(q)=r(q)$ gives
\begin{align*}
\sum_{pq\in E}f(pq)=r(q).
\end{align*}
Thus $q$ is also incident with exactly $r(q)$ chosen edges.
Conversely, suppose $M\subseteq E$ is a bipartite $b$-matching with exactly $r(v)$ chosen edges incident with each vertex $v$. Define $f(pq)=1$ when $pq\in M$, and define $f(pq)=0$ when $pq\notin M$. Then for every edge $pq$,
\begin{align*}
0\le f(pq)\le 1.
\end{align*}
At each $p\in P$, exactly $r(p)$ outgoing arcs have value $1$, so
\begin{align*}
0-\sum_{pq\in E}f(pq)=0-r(p)=-r(p)=\beta(p).
\end{align*}
At each $q\in Q$, exactly $r(q)$ incoming arcs have value $1$, so
\begin{align*}
\sum_{pq\in E}f(pq)-0=r(q)-0=r(q)=\beta(q).
\end{align*}
Therefore $f$ is an integral feasible circulation with node balance $\beta$.
Integral feasible circulations in this network are exactly bipartite $b$-matchings meeting the prescribed degrees. By *Integral Circulation Theorem*, if the corresponding circulation instance is feasible even fractionally, then it has an integral feasible circulation, so fractional feasibility already implies the existence of the desired bipartite $b$-matching.
[/example]
The previous example shows how degree constraints can be encoded as node balances. The same modelling idea also describes score-sequence problems, where the choices are outcomes of games rather than selected graph edges.
[example: Tournament Score Sequences]
For a tournament on vertices $1,\dots,n$, suppose the proposed scores $r_1,\dots,r_n$ are nonnegative integers satisfying
\begin{align*}
\sum_{i=1}^n r_i=\binom{n}{2}.
\end{align*}
Create one match node $m_{ij}$ for each unordered pair $\{i,j\}$, add arcs $m_{ij}i$ and $m_{ij}j$, and give each such arc lower bound $0$ and upper capacity $1$. Give each match node balance $b(m_{ij})=-1$, and give each player node balance $b(i)=r_i$. The total balance is zero because there are $\binom{n}{2}$ match nodes, so
\begin{align*}
\sum_v b(v)=\sum_{1\le i<j\le n}(-1)+\sum_{i=1}^n r_i.
\end{align*}
Using the hypothesis on the proposed scores gives
\begin{align*}
\sum_v b(v)=-\binom{n}{2}+\binom{n}{2}=0.
\end{align*}
Suppose $f$ is an integral feasible circulation in this network. At a match node $m_{ij}$ there are no incoming arcs, and the only outgoing arcs are $m_{ij}i$ and $m_{ij}j$, so the balance equation gives
\begin{align*}
0-\bigl(f(m_{ij}i)+f(m_{ij}j)\bigr)=-1.
\end{align*}
Multiplying by $-1$ gives
\begin{align*}
f(m_{ij}i)+f(m_{ij}j)=1.
\end{align*}
The arc bounds give
\begin{align*}
0\le f(m_{ij}i)\le 1
\end{align*}
and
\begin{align*}
0\le f(m_{ij}j)\le 1.
\end{align*}
Since both values are integers and their sum is $1$, exactly one of them is $1$ and the other is $0$. Declare $i$ to beat $j$ when $f(m_{ij}i)=1$, and declare $j$ to beat $i$ when $f(m_{ij}j)=1$.
At player node $i$, there are no outgoing arcs, and the incoming arcs are precisely the arcs from match nodes involving $i$. Hence the balance equation is
\begin{align*}
\sum_{j<i} f(m_{ji}i)+\sum_{i<j} f(m_{ij}i)-0=r_i.
\end{align*}
The left-hand side counts exactly the matches assigned as wins to player $i$, so player $i$ receives exactly $r_i$ wins.
Conversely, given a tournament with score sequence $(r_1,\dots,r_n)$, define $f(m_{ij}i)=1$ and $f(m_{ij}j)=0$ when $i$ beats $j$, and define $f(m_{ij}i)=0$ and $f(m_{ij}j)=1$ when $j$ beats $i$. Then every arc value is either $0$ or $1$, so all bounds $0\le f\le 1$ hold. Each match node sends exactly one unit out, so its balance is
\begin{align*}
0-\bigl(f(m_{ij}i)+f(m_{ij}j)\bigr)=0-1=-1=b(m_{ij}).
\end{align*}
Each player $i$ receives one unit for each of its wins and has no outgoing arcs, so its balance is
\begin{align*}
\sum_{j<i} f(m_{ji}i)+\sum_{i<j} f(m_{ij}i)-0=r_i=b(i).
\end{align*}
Thus integral feasible circulations in this network are exactly tournaments with score sequence $(r_1,\dots,r_n)$.
The cut inequalities explain the usual subset restrictions. If $S$ is a set of $k$ players and $X$ consists of the players in $S$ together with all match nodes $m_{ij}$ with $i,j\in S$, then
\begin{align*}
\sum_{v\in X}b(v)=\sum_{i\in S}r_i+\sum_{\{i,j\}\subseteq S}(-1).
\end{align*}
There are $\binom{k}{2}$ match nodes with both endpoints in $S$, so this becomes
\begin{align*}
\sum_{v\in X}b(v)=\sum_{i\in S}r_i-\binom{k}{2}.
\end{align*}
The arcs entering $X$ are exactly the arcs from match nodes with one endpoint outside $S$ to the player endpoint inside $S$, and there are $k(n-k)$ such arcs, each of capacity $1$. Since all lower bounds are $0$, *Hoffman's Circulation Theorem* gives
\begin{align*}
k(n-k)\ge \sum_{i\in S}r_i-\binom{k}{2}.
\end{align*}
Adding $\binom{k}{2}$ to both sides gives
\begin{align*}
\sum_{i\in S}r_i\le \binom{k}{2}+k(n-k).
\end{align*}
Applying the same upper bound to the complement of $S$ gives
\begin{align*}
\sum_{i\notin S}r_i\le \binom{n-k}{2}+k(n-k).
\end{align*}
Using $\sum_{i=1}^n r_i=\binom{n}{2}$, we have
\begin{align*}
\sum_{i\in S}r_i=\binom{n}{2}-\sum_{i\notin S}r_i.
\end{align*}
Therefore
\begin{align*}
\sum_{i\in S}r_i\ge \binom{n}{2}-\binom{n-k}{2}-k(n-k).
\end{align*}
Expanding the binomial coefficients gives
\begin{align*}
\binom{n}{2}-\binom{n-k}{2}-k(n-k)=\frac{n(n-1)}{2}-\frac{(n-k)(n-k-1)}{2}-k(n-k).
\end{align*}
Putting the terms over the common denominator $2$ gives
\begin{align*}
\frac{n(n-1)-(n-k)(n-k-1)-2k(n-k)}{2}.
\end{align*}
Now
\begin{align*}
n(n-1)=n^2-n
\end{align*}
and
\begin{align*}
(n-k)(n-k-1)=n^2-2nk+k^2-n+k.
\end{align*}
Substituting these expansions into the numerator gives
\begin{align*}
n^2-n-\bigl(n^2-2nk+k^2-n+k\bigr)-2kn+2k^2=k^2-k.
\end{align*}
Hence
\begin{align*}
\binom{n}{2}-\binom{n-k}{2}-k(n-k)=\frac{k^2-k}{2}=\binom{k}{2}.
\end{align*}
So
\begin{align*}
\sum_{i\in S}r_i\ge \binom{k}{2}.
\end{align*}
The circulation model records that every set of $k$ players receives at least the $\binom{k}{2}$ wins from games played inside the set, and at most those internal wins together with the $k(n-k)$ games against players outside the set.
[/example]
This example illustrates the broader point of the chapter. Feasibility problems that appear to be about very different discrete structures can often be expressed as circulation problems; once this is done, cuts certify infeasibility, maximum flow tests feasibility, and integrality turns fractional routing into combinatorial choices. In this sense, the circulation theorem is a network-flow version of the same certificate principle behind Hall's condition from Chapter 2, transportation models from this chapter, and the linear-programming dual certificates developed later in the polyhedral chapters. For example, in the transportation model, choosing $X$ to contain a set of stores and the forbidden or insufficiently connected factories gives the condition that the stores' total demand cannot exceed the capacity entering them; this is the same shape as Hall's condition that every set of applicants has enough available jobs. The common lesson is that a global feasible object exists exactly when every local obstruction has enough capacity to be overcome.
# 6. Minimum-Cost Flows
Minimum-cost flow adds prices to the feasible-flow world developed in Chapters 4 and 5. Max-flow theory asked whether enough material could be sent through a capacitated network; now we ask how to send a prescribed amount at minimum total cost. The central new idea is that feasibility is still governed by residual networks, while optimality is governed by the absence of cost-improving residual cycles or, equivalently, by a system of vertex potentials.
## Costed Networks and Residual Costs
What extra information is needed when several feasible flows have the same value? Capacities describe which routings are possible, but they do not distinguish a cheap route from an expensive one. This motivates adding a cost to each unit of flow on each arc.
[definition: Costed Network]
A costed network is a finite directed graph $D=(V,A)$ with source $s\in V$, sink $t\in V$, capacity function $u:A\to \mathbb R_{\ge 0}$, and cost function $c:A\to \mathbb R$.
[/definition]
The arc set $A$ may contain parallel arcs. Formally, an arc is an element $a\in A$ with a tail $\operatorname{tail}(a)$ and head $\operatorname{head}(a)$; notation such as $xy$ means an arc from $x$ to $y$ when no ambiguity is possible. Capacities, costs, and residual reverse arcs are attached to the arc itself, not merely to the ordered pair of endpoints.
The costed network records the data, but it does not yet specify the optimisation problem: we must decide the required amount of flow and the objective to minimise. This motivates defining a minimum-cost flow of a fixed value.
[definition: Minimum-Cost Flow Problem]
Let $(D,s,t,u,c)$ be a costed network and let $b\ge 0$. The minimum-cost flow problem of value $b$ is to find an $s$-$t$ flow $f:A\to \mathbb R_{\ge 0}$ satisfying $0\le f(a)\le u(a)$ for all $a\in A$ and $|f|=b$ such that
\begin{align*}
\operatorname{cost}(f)=\sum_{a\in A} c(a)f(a)
\end{align*}
is minimised. Here $\operatorname{cost}$ is the map from feasible flows of value $b$ to $\mathbb R$ defined by this formula.
[/definition]
This formulation separates the feasibility question from the cost question. If no flow of value $b$ exists, the optimisation problem is infeasible; if a feasible flow exists, finite capacities ensure that a minimum is attained.
[example: Transportation Problem]
Suppose factories $p_1,\dots,p_m$ have supplies $a_i\ge 0$ and warehouses $q_1,\dots,q_n$ have demands $d_j\ge 0$, with
\begin{align*}
\sum_{i=1}^m a_i=\sum_{j=1}^n d_j=:B.
\end{align*}
Build a network with arcs $s p_i$ of capacity $a_i$ and cost $0$, arcs $p_i q_j$ of capacity $B$ and cost $c_{ij}$, and arcs $q_j t$ of capacity $d_j$ and cost $0$. We show that feasible flows of value $B$ are exactly transportation plans, with the same objective value.
Let $f$ be a feasible flow of value $B$, and set
\begin{align*}
x_{ij}=f(p_iq_j).
\end{align*}
Since the only arcs leaving $s$ are the arcs $s p_i$, the value condition gives
\begin{align*}
B=|f|=\sum_{i=1}^m f(sp_i).
\end{align*}
Also $f(sp_i)\le a_i$ for every $i$, and $\sum_{i=1}^m a_i=B$, so
\begin{align*}
0=\sum_{i=1}^m a_i-\sum_{i=1}^m f(sp_i)=\sum_{i=1}^m \bigl(a_i-f(sp_i)\bigr).
\end{align*}
Each term $a_i-f(sp_i)$ is non-negative, so each term must be $0$. Hence
\begin{align*}
f(sp_i)=a_i \text{ for every } i.
\end{align*}
Flow conservation at $p_i$ gives
\begin{align*}
\sum_{j=1}^n x_{ij}=\sum_{j=1}^n f(p_iq_j)=f(sp_i)=a_i.
\end{align*}
Similarly, since the only arcs entering $t$ are the arcs $q_jt$,
\begin{align*}
B=|f|=\sum_{j=1}^n f(q_jt).
\end{align*}
The inequalities $f(q_jt)\le d_j$ and the identity $\sum_{j=1}^n d_j=B$ give
\begin{align*}
0=\sum_{j=1}^n d_j-\sum_{j=1}^n f(q_jt)=\sum_{j=1}^n \bigl(d_j-f(q_jt)\bigr).
\end{align*}
Each term $d_j-f(q_jt)$ is non-negative, so
\begin{align*}
f(q_jt)=d_j \text{ for every } j.
\end{align*}
Flow conservation at $q_j$ therefore gives
\begin{align*}
\sum_{i=1}^m x_{ij}=\sum_{i=1}^m f(p_iq_j)=f(q_jt)=d_j.
\end{align*}
Thus the numbers $x_{ij}$ ship all factory supply and meet all warehouse demand.
Conversely, suppose non-negative numbers $x_{ij}$ satisfy the transportation constraints
\begin{align*}
\sum_{j=1}^n x_{ij}=a_i \text{ for every } i.
\end{align*}
\begin{align*}
\sum_{i=1}^m x_{ij}=d_j \text{ for every } j.
\end{align*}
Define a flow by
\begin{align*}
f(sp_i)=a_i,\quad f(p_iq_j)=x_{ij},\quad f(q_jt)=d_j.
\end{align*}
The source arcs and sink arcs satisfy their capacities by construction. For each middle arc,
\begin{align*}
0\le x_{ij}\le \sum_{j'=1}^n x_{ij'}=a_i\le \sum_{i'=1}^m a_{i'}=B,
\end{align*}
so the capacity $B$ on $p_iq_j$ is respected. Flow conservation holds at every $p_i$ and $q_j$ by the two displayed transportation constraints, and the value is
\begin{align*}
|f|=\sum_{i=1}^m f(sp_i)=\sum_{i=1}^m a_i=B.
\end{align*}
Finally, the flow cost is
\begin{align*}
\operatorname{cost}(f)=\sum_{i=1}^m 0\cdot f(sp_i)+\sum_{i=1}^m\sum_{j=1}^n c_{ij}f(p_iq_j)+\sum_{j=1}^n 0\cdot f(q_jt).
\end{align*}
Since $f(p_iq_j)=x_{ij}$ and the other two groups of arcs have cost $0$,
\begin{align*}
\operatorname{cost}(f)=\sum_{i=1}^m\sum_{j=1}^n c_{ij}x_{ij}.
\end{align*}
So the transportation problem is exactly the value-$B$ minimum-cost flow problem on this finite-capacity network.
[/example]
The transportation model shows that costs select among many feasible routings. To test whether a chosen routing can still be improved, we need the same local-move language used in max-flow theory. This motivates a residual network that also remembers the cost of every permissible change.
[definition: Residual Network with Costs]
Let $f$ be a feasible flow in a costed network $(D,s,t,u,c)$. The residual network $D_f=(V,A_f)$ has residual capacity function $u_f:A_f\to \mathbb R_{>0}$ and residual cost function $c_f:A_f\to\mathbb R$ defined as follows. It contains a forward arc $xy$ with $u_f(xy)=u(xy)-f(xy)$ and $c_f(xy)=c(xy)$ whenever $xy\in A$ and $f(xy)<u(xy)$. It contains a backward arc $yx$ with $u_f(yx)=f(xy)$ and $c_f(yx)=-c(xy)$ whenever $xy\in A$ and $f(xy)>0$.
[/definition]
A directed cycle in $D_f$ gives a value-preserving perturbation of $f$: augment along the forward residual arcs and cancel along the backward residual arcs. The sign of the total residual cost tells us whether that perturbation improves the objective. This motivates isolating cycles of negative residual cost.
[definition: Negative Residual Cycle]
Let $f$ be a feasible flow in a costed network. A negative residual cycle is a directed cycle $C$ in $D_f$ such that
\begin{align*}
c_f(C)=\sum_{a\in C} c_f(a)<0,
\end{align*}
where $c_f$ denotes residual cost.
[/definition]
Negative residual cycles are the cost analogue of augmenting paths. An augmenting path changes the value of a flow; a negative residual cycle keeps the value fixed and improves the objective. This motivates the fundamental optimality test: there are no further cost improvements exactly when no such cycle remains.
[remark: Negative-Cycle Optimality Criterion]
For a feasible flow $f$ of a fixed value in a finite-capacity costed network, $f$ is minimum-cost among flows of that same value if and only if its residual network $D_f$ contains no directed cycle of negative residual cost.
[/remark]
This criterion converts global optimality into a graph-search condition, but only after feasibility and the target value have already been fixed. Negative residual cycles preserve the value of the flow, so they cannot detect whether a flow of the required value exists or whether a larger value could be sent. For example, take two parallel $s$-$t$ arcs, one of capacity $1$ and cost $0$, and one of capacity $1$ and cost $-5$. The zero flow has no residual cycle, but it is not relevant to the value-$1$ problem because it has the wrong value; the criterion only tests optimality inside the fixed-value feasible set.
The finite-capacity assumption keeps the feasible region bounded and rules out pathologies in which cost can be driven down without a minimum being attained. If an uncapacitated negative-cost directed cycle is reachable in the residual sense, adding more circulation around it can reduce cost without changing the $s$-$t$ value, so there is no minimum. A boundary case to keep in mind is a residual network with a negative $s$-$t$ path but no negative cycle: using that path would change the value, so it is irrelevant to optimality among flows of the same fixed value. The theorem therefore suggests an algorithm for a fixed-value problem: keep finding negative residual cycles and cancel them until none remain.
[example: A Cost-Improving Cycle]
Consider a feasible flow with one unit on the path $s a t$, where $c(sa)=4$ and $c(at)=4$. Suppose the residual network contains arcs $s b$, $b a$, and $a s$ with residual costs $c_f(sb)=1$, $c_f(ba)=1$, and $c_f(as)=-4$. For the directed cycle $C=s b a s$, the residual cost is
\begin{align*}
c_f(C)=c_f(sb)+c_f(ba)+c_f(as).
\end{align*}
Substituting the three residual costs gives
\begin{align*}
c_f(C)=1+1+(-4)=-2.
\end{align*}
Choose $\varepsilon>0$ no larger than the residual capacity of any arc on $C$. Augmenting by $\varepsilon$ around $C$ increases the flow on the residual arcs $s b$ and $b a$ by $\varepsilon$, while the residual arc $a s$ cancels $\varepsilon$ units of flow on the original arc $s a$. At $s$, the outgoing flow on $s a$ decreases by $\varepsilon$ and the outgoing flow on $s b$ increases by $\varepsilon$, so the net outflow from $s$ is unchanged. At $b$, the added inflow $\varepsilon$ on $s b$ equals the added outflow $\varepsilon$ on $b a$. At $a$, the added inflow $\varepsilon$ on $b a$ is offset by the decrease of $\varepsilon$ in inflow from $s a$, and the remaining arc $a t$ is unchanged. Thus the perturbation preserves flow conservation and keeps the same $s$-$t$ value.
The change in total cost is the augmentation amount times the residual cost of the cycle:
\begin{align*}
\varepsilon c_f(C)=\varepsilon(-2)=-2\varepsilon<0.
\end{align*}
So this value-preserving residual cycle strictly lowers the cost, which shows that optimality cannot be tested only by looking for cheaper $s$-$t$ paths.
[/example]
## Cycle-Canceling and Successive Shortest Augmenting Paths
How do we turn the optimality criterion into an algorithm? There are two complementary approaches. One starts with any feasible flow of the required value and repeatedly improves it; the other builds the flow from zero while making the cheapest available augmentation at each step.
The preceding example shows a single local improvement. To make this systematic, we repeatedly choose a negative residual cycle and push as much flow as possible around it. This motivates the cycle-canceling algorithm.
[definition: Cycle-Canceling Algorithm]
The cycle-canceling algorithm starts with a feasible flow $f$ of the required value. While $D_f$ contains a negative residual cycle $C$, it augments around $C$ by the largest amount permitted by the residual capacities on $C$.
[/definition]
The largest permitted augmentation saturates at least one residual arc on the chosen cycle, so for integral data the algorithm moves between integral flows. The remaining question is whether stopping has the desired meaning. This motivates the correctness theorem.
[quotetheorem:5797]
[citeproof:5797]
The termination clause is essential for arbitrary real capacities: the theorem proves correctness of the stopping condition, not a useful running-time bound for every possible cycle-selection rule. If a rule keeps choosing cycles whose bottleneck capacities are $1/2,1/4,1/8,\dots$ in a network with real capacities, the total improvement may have an infinite convergent sum and the counting argument for integral flows gives no reason for the algorithm to stop. Under the stated integral-capacity and integral-cost hypotheses, however, every augmentation from an integral flow uses a positive integer bottleneck and decreases the integral total cost by at least $1$. Since there are only finitely many integral flows of value $b$, this gives termination for the usual finite integral setting; the theorem is stated conditionally because the algorithm still needs a rule for finding negative cycles and no polynomial running-time guarantee is being claimed.
The hypothesis that the algorithm starts from a feasible flow of value $b$ is also indispensable. In a network with a single arc $s t$ of capacity $1$ and target $b=2$, there is no feasible starting flow; cycle canceling cannot repair the missing unit because cycles preserve value. Even when a feasible flow exists, the theorem does not claim that every possible cycle-selection rule is efficient: it only says that if the process stops, the absence of a negative residual cycle has the optimality meaning supplied by the previous theorem. This limitation motivates a more constructive method: start at zero and repeatedly send flow along a cheapest residual $s$-$t$ path.
[definition: Successive Shortest Augmenting Path Algorithm]
The successive shortest augmenting path algorithm starts with the zero flow. While the current value is less than $b$, it first requires the residual network to have no reachable negative cycle that makes shortest $s$-$t$ path cost undefined. It then finds a minimum-cost directed $s$-$t$ path in the residual network and augments along that path by the minimum of the residual capacity of the path and the remaining required value.
[/definition]
The word shortest refers to cost, not number of arcs. Negative residual arcs may appear, so shortest path computation must be interpreted in a residual network whose negative cycles are controlled. This motivates the theorem that the shortest-augmentation choice preserves optimality as the value increases.
[quotetheorem:5798]
[citeproof:5798]
The no-negative-cycle hypothesis is what prevents a shortest augmentation from being undermined by a separate value-preserving improvement. For a concrete failure, suppose the zero flow can send one unit along an $s$-$t$ arc of cost $0$, and the residual network also contains a directed cycle of capacity $1$ and total cost $-10$ disjoint from that path. The cheapest $s$-$t$ path has cost $0$, but after sending the unit the flow is not minimum-cost among value-$1$ flows, because canceling the negative cycle lowers the cost to $-10$ without changing the value.
The theorem also assumes that a residual $s$-$t$ path exists whenever the current value is below $b$. If the only arc from $s$ to $t$ has capacity $1$ and the target is $b=2$, the algorithm reaches value $1$ and then has no residual $s$-$t$ path; the issue is infeasibility of the target value, not a failure of the shortest-path choice. Thus the induction proves optimality only under the stated feasibility and residual-cycle assumptions. In implementation, potentials are used to keep reduced residual costs non-negative, making Dijkstra-style shortest path searches available after an initial Bellman-Ford computation.
[example: Minimum-Cost Bipartite Matching]
Let $G=(X\cup Y,E)$ be a bipartite graph with $|X|=|Y|=n$ and edge costs $c_{xy}$. Build a costed network by adding a source $s$ and sink $t$, arcs $sx$ of capacity $1$ and cost $0$ for $x\in X$, arcs $xy$ of capacity $1$ and cost $c_{xy}$ for $xy\in E$, and arcs $yt$ of capacity $1$ and cost $0$ for $y\in Y$. We show that integral flows of value $n$ are exactly perfect matchings, with the same objective value.
Let $f$ be an integral feasible flow of value $n$. Since the only arcs leaving $s$ are the arcs $sx$ and each has capacity $1$,
\begin{align*}
n=|f|=\sum_{x\in X} f(sx)\le \sum_{x\in X}1=n.
\end{align*}
Thus equality holds throughout. Each term satisfies $0\le f(sx)\le 1$, so equality in the sum forces $f(sx)=1$ for every $x\in X$. Flow conservation at $x$ gives
\begin{align*}
\sum_{y:\,xy\in E} f(xy)=f(sx)=1.
\end{align*}
Because $f$ is integral and each middle arc has capacity $1$, every $f(xy)$ is either $0$ or $1$, so exactly one arc leaving each $x$ has value $1$.
Similarly, the only arcs entering $t$ are the arcs $yt$, and each has capacity $1$, so
\begin{align*}
n=|f|=\sum_{y\in Y} f(yt)\le \sum_{y\in Y}1=n.
\end{align*}
Again equality forces $f(yt)=1$ for every $y\in Y$. Flow conservation at $y$ gives
\begin{align*}
\sum_{x:\,xy\in E} f(xy)=f(yt)=1.
\end{align*}
Thus exactly one selected middle arc enters each $y$. Therefore
\begin{align*}
M=\{xy\in E:f(xy)=1\}
\end{align*}
contains one edge incident with each vertex of $X$ and one edge incident with each vertex of $Y$, so $M$ is a perfect matching.
Conversely, let $M$ be a perfect matching in $G$. Define $f(sx)=1$ for every $x\in X$, define $f(yt)=1$ for every $y\in Y$, and define the middle-arc values by setting $f(xy)=1$ if $xy\in M$ and $f(xy)=0$ if $xy\notin M$. Every arc value is either $0$ or $1$, so all capacity constraints are satisfied. For each $x\in X$, exactly one matching edge leaves $x$, hence
\begin{align*}
\sum_{y:\,xy\in E} f(xy)=1=f(sx).
\end{align*}
For each $y\in Y$, exactly one matching edge enters $y$, hence
\begin{align*}
\sum_{x:\,xy\in E} f(xy)=1=f(yt).
\end{align*}
Thus flow conservation holds at all non-terminal vertices, and the value is
\begin{align*}
|f|=\sum_{x\in X} f(sx)=\sum_{x\in X}1=n.
\end{align*}
For either construction, the source arcs and sink arcs have cost $0$, while the middle arc $xy$ has cost $c_{xy}$. Therefore
\begin{align*}
\operatorname{cost}(f)=\sum_{x\in X}0\cdot f(sx)+\sum_{xy\in E}c_{xy}f(xy)+\sum_{y\in Y}0\cdot f(yt).
\end{align*}
The zero-cost terms vanish, so
\begin{align*}
\operatorname{cost}(f)=\sum_{xy\in E}c_{xy}f(xy).
\end{align*}
Since $f(xy)=1$ exactly on the selected matching edges and $f(xy)=0$ on the other middle arcs,
\begin{align*}
\operatorname{cost}(f)=\sum_{xy\in M}c_{xy}.
\end{align*}
So, once integrality is supplied by the augmenting-path construction or by *Integrality of Minimum-Cost Flows*, the minimum-cost value-$n$ flow problem is exactly the minimum-cost perfect matching problem. Successive shortest augmenting paths are therefore the weighted flow version of augmenting along alternating paths.
[/example]
## Reduced Costs and Potentials
How can we certify that a residual network has no negative cycle without listing all cycles? The answer is to assign a number to each vertex, called a potential, and use it to reweight every residual arc. This reweighting changes path costs by endpoint terms and leaves cycle costs unchanged.
The matching example links min-cost flow to weighted augmenting paths. To use shortest path algorithms repeatedly, we want arc costs that are non-negative without changing which paths are cheapest. This motivates potentials and reduced costs.
[definition: Potential and Reduced Cost]
Let $D=(V,A)$ be a directed graph with arc cost $c:A\to\mathbb R$. A potential is a function $\pi:V\to\mathbb R$. The reduced cost function with respect to $\pi$ is the map $c^\pi:A\to\mathbb R$ defined by
\begin{align*}
c^\pi(xy)=c(xy)+\pi(x)-\pi(y)
\end{align*}
for every arc $xy\in A$.
[/definition]
Reduced costs preserve the cost of every directed cycle because the potential terms telescope. For a path $P$ from $u$ to $v$, the reduced cost differs from the original cost by $\pi(u)-\pi(v)$, so shortest paths between fixed endpoints are not changed by reweighting. This motivates recording the cycle invariance as a theorem.
[quotetheorem:5799]
[citeproof:5799]
The hypothesis that $C$ is a cycle is essential because only then do the endpoint potential terms cancel. For a path from $u$ to $v$, the cost changes by $\pi(u)-\pi(v)$, so potentials preserve comparisons only among paths with the same endpoints. They do not preserve arbitrary path-order comparisons across different endpoints, and a potential on the original graph alone is not an optimality certificate. The certification statement concerns residual arcs: if all available residual moves have non-negative reduced cost, then every residual cycle has non-negative original cost. As a concrete counterexample, let $P_1$ be a single arc $s x$ of cost $0$ and let $P_2$ be a single arc $s y$ of cost $1$. With potentials $\pi(s)=0$, $\pi(x)=-100$, and $\pi(y)=0$, the reduced costs become $100$ and $1$, so reweighting reverses the comparison between paths ending at different vertices. No contradiction arises, because there is no closed walk on which the endpoint terms must cancel. This is precisely why potentials are useful for optimality only after passing to the residual network. This motivates the reduced-cost optimality theorem, which says that every optimum has such a certificate and every such certificate proves optimality.
[remark: Reduced-Cost Optimality Certificate]
For a feasible fixed-value flow $f$, a potential $\pi:V\to\mathbb R$ certifies optimality if every residual arc $xy\in A_f$ has non-negative reduced residual cost
\begin{align*}
c_f(xy)+\pi(x)-\pi(y)\ge 0.
\end{align*}
Then every residual cycle has non-negative original residual cost by telescoping of the potential terms, so the negative-cycle optimality criterion proves that $f$ is minimum-cost among flows of the same value.
[/remark]
This certificate is the min-cost-flow analogue of complementary slackness from linear programming duality: residual arcs with positive remaining freedom play the role of variables that can still move, and non-negative reduced costs say that no allowed cycle move has negative dual slack. Feasibility and the fixed value remain part of the certificate. For instance, in a network with one arc $s t$ of capacity $1$ and cost $0$, the zero flow has residual reduced costs non-negative under the zero potential, but it is not a solution to the value-$1$ problem because it sends no flow. Reduced costs rule out cost-improving changes of an already feasible fixed-value flow; they do not by themselves prove capacities, conservation, or the required value.
The condition is imposed only on residual arcs, because those are exactly the infinitesimal changes still available from $f$. Testing only original arcs, or testing original arcs without respecting residual direction, can miss the decisive improvement. Consider two parallel paths from $s$ to $t$: $s a t$ has total cost $10$ and currently carries one unit, while $s b t$ has total cost $2$ and has one unit of spare capacity. In the residual network there is a cycle $s b t a s$, using the forward cheap path and the backward arcs of the expensive path, with total residual cost $2-10=-8$. Looking only at the original forward arcs does not see the backward cancellation that moves the existing unit from the expensive path to the cheap path. If some residual arc has negative reduced cost, optimality is not automatically false, since a single negative arc need not lie on a negative cycle, but the clean certificate has failed. When all residual reduced costs are non-negative, every available cycle improvement is excluded, and the negative-cycle criterion completes the argument.
[example: Shortest Path Potentials in a Residual Network]
Suppose $D_f$ has no negative residual cycle. Add a new root $r$ and zero-cost arcs $rv$ to every vertex $v$, and let $d(v)$ be the shortest path distance from $r$ to $v$ in this enlarged residual network. The added arc $rv$ gives a path of cost $0$ to each vertex, so every vertex is reachable from $r$; the no-negative-cycle hypothesis ensures that these distances are finite [real numbers](/page/Real%20Numbers) rather than undefined by arbitrarily cheap walks.
For any residual arc $xy$, take a shortest path from $r$ to $x$ and then traverse $xy$. This gives an $r$-$y$ walk of cost $d(x)+c_f(xy)$, so the definition of $d(y)$ as the minimum $r$-$y$ cost gives
\begin{align*}
d(y)\le d(x)+c_f(xy).
\end{align*}
Moving $d(y)$ to the right-hand side gives
\begin{align*}
0\le d(x)+c_f(xy)-d(y).
\end{align*}
Reordering the terms gives
\begin{align*}
0\le c_f(xy)+d(x)-d(y).
\end{align*}
Thus, with the potential $\pi(v)=d(v)$, every residual arc $xy$ satisfies
\begin{align*}
c_f^\pi(xy)=c_f(xy)+\pi(x)-\pi(y)=c_f(xy)+d(x)-d(y)\ge 0.
\end{align*}
By the *Reduced-Cost Optimality Certificate*, these shortest path distances give an optimality certificate for the current fixed-value flow.
The same calculation explains the potential update in successive shortest path methods. If the current potential is $\pi$ and $\delta(v)$ is the newly computed shortest path distance using the current reduced costs $c_f^\pi$, then every residual arc $xy$ satisfies
\begin{align*}
\delta(y)\le \delta(x)+c_f^\pi(xy).
\end{align*}
Equivalently,
\begin{align*}
c_f^\pi(xy)+\delta(x)-\delta(y)\ge 0.
\end{align*}
For the updated potential $\pi'(v)=\pi(v)+\delta(v)$, we compute
\begin{align*}
c_f^{\pi'}(xy)=c_f(xy)+\pi'(x)-\pi'(y).
\end{align*}
Substituting $\pi'(x)=\pi(x)+\delta(x)$ and $\pi'(y)=\pi(y)+\delta(y)$ gives
\begin{align*}
c_f^{\pi'}(xy)=c_f(xy)+\pi(x)+\delta(x)-\pi(y)-\delta(y).
\end{align*}
Grouping the current reduced cost terms gives
\begin{align*}
c_f^{\pi'}(xy)=c_f^\pi(xy)+\delta(x)-\delta(y).
\end{align*}
The previous inequality therefore gives
\begin{align*}
c_f^{\pi'}(xy)\ge 0.
\end{align*}
So shortest path distances are not only a certificate at one step; they are exactly the update that keeps later reduced residual costs non-negative.
[/example]
## Integrality and Modelling Consequences
Why do min-cost flow algorithms solve discrete problems such as matching without producing fractional answers? The feasible region permits real-valued flows, but the network constraints have an integral structure. Augmenting-path and cycle-canceling algorithms expose this structure directly.
The reduced-cost theorem provides certificates for optimality, while the examples show that many discrete optimisation problems can be written as min-cost flows. The remaining modelling question is whether the continuous flow formulation returns integral objects when the data are integral. This motivates the integrality theorem.
[quotetheorem:5800]
[citeproof:5800]
This theorem is often the reason for building a network model. It lets us solve an integer optimisation problem through a linear flow formulation while retaining at least one integral optimal solution. The hypotheses cannot simply be dropped: if an arc capacity is $1/2$ or the prescribed value is $b=1/2$, then a feasible optimum may necessarily carry fractional flow. The conclusion also does not say that the optimum is unique or that every optimum is integral; when several optimal flows exist, convex combinations of integral optima can be fractional and still have the same cost. What the theorem guarantees is existence of an integral optimum whenever the capacity data and required value have the integral structure.
[remark: Costs Need Not Be Integral for Integrality]
The integrality conclusion depends on capacities and required value, not on integrality of costs. Costs determine which integral feasible flow is cheapest, but they do not affect the integral structure of the feasible-flow polytope.
[/remark]
The transportation and bipartite matching examples now acquire stronger conclusions. With integral supplies, demands, and matching capacities, there are optimal solutions that ship integer amounts and choose whole edges. This is the bridge from continuous linear optimisation back to finite combinatorial objects, and it prepares the later polyhedral viewpoint where the same integrality phenomenon is expressed through integral polyhedra and total unimodularity.
The same language also connects this chapter to linear programming and economics. Potentials are dual variables for conservation constraints, reduced costs are dual slacks for available residual moves, and the absence of negative residual cycles is the complementary-slackness condition expressed in graph form. In operations research this interpretation underlies transportation pricing, assignment markets, and circulation models; algorithmically it explains why shortest-path methods and dual certificates appear together rather than as separate techniques.
# 7. General Matchings
This chapter returns to the matching problem of Chapter 2 and removes the bipartite assumption, studying matching in arbitrary finite graphs. The new obstruction is the presence of odd cycles: an alternating search can return to the same side of the search tree and create a structure that bipartite methods have no way to represent. The central idea is Edmonds' blossom contraction, which turns this obstruction into a smaller graph without losing the existence of augmenting paths.
## Odd Cycles and the Failure of Bipartite Search
What breaks when the two colour classes disappear? In a bipartite graph, every alternating search tree has a fixed parity structure: matched and unmatched edges alternate, and edges between two even vertices cannot occur across the same search tree. In a general graph, such an edge creates an odd alternating cycle, and this is the first sign that the search tree must remember more information than a bipartite tree does.
[definition: Matching]
Let $G=(V,E)$ be a finite graph. A matching in $G$ is a set $M \subset E$ such that no two edges of $M$ share an endpoint. A vertex $v \in V$ is matched by $M$ if it is incident with an edge of $M$, and exposed if it is not incident with any edge of $M$.
[/definition]
A matching records the current feasible solution, but optimisation needs a controlled move from one feasible solution to a larger one. The relevant move is not an arbitrary edge swap: it is a path whose matched and unmatched edges are arranged so that the symmetric difference is still a matching.
[definition: Alternating and Augmenting Path]
Let $M$ be a matching in a graph $G$. An $M$-alternating path is a path whose edges alternately lie outside $M$ and inside $M$. An $M$-augmenting path is an $M$-alternating path whose two endpoints are exposed and whose first and last edges are not in $M$.
[/definition]
If $P$ is an $M$-augmenting path, replacing $M$ by $M \triangle E(P)$ increases the size of the matching by $1$. The obstruction in general graphs is not this exchange operation, but finding the right path when the search encounters an odd alternating cycle.
[example: Triangle Obstruction]
Let $G$ be the triangle with edge set $\{ab,bc,ca\}$ and let $M=\{ab\}$. The vertex $a$ is matched because it is incident with $ab\in M$, the vertex $b$ is matched for the same reason, and $c$ is exposed because neither $bc$ nor $ca$ lies in $M$. An $M$-augmenting path must have two exposed endpoints, but this graph has only one exposed vertex, namely $c$, so no such path can exist.
Now attach a new exposed vertex $r$ by the edge $ra$ and keep the matching $M=\{ab\}$. The path
\begin{align*}
r,a,b,c
\end{align*}
has edge sequence
\begin{align*}
ra\notin M,\quad ab\in M,\quad bc\notin M.
\end{align*}
Its endpoints $r$ and $c$ are exposed, so this is an $M$-augmenting path. During the alternating search from $r$, the triangle is not behaving like a bipartite layer structure: the unmatched edge $ac\notin M$ closes the odd cycle
\begin{align*}
a,b,c,a,
\end{align*}
whose edge statuses are
\begin{align*}
ab\in M,\quad bc\notin M,\quad ca\notin M.
\end{align*}
Thus the extra triangle edge is not a harmless cross-edge; it records an odd alternating obstruction attached to the search, which is exactly the phenomenon later packaged as a blossom.
[/example]
The triangle example shows why assigning every reached vertex a permanent left-or-right parity is not valid. To continue the search, the odd alternating cycle must be named as a single object with a distinguished attachment point to the current forest. This motivates the following definition.
[definition: Blossom]
Let $M$ be a matching in a graph $G$. A blossom with respect to $M$ consists of an odd cycle $C$ and a distinguished vertex $b \in V(C)$, called the base, such that $C \setminus \{b\}$ is matched by $M$ along alternating edges of $C$.
[/definition]
For any vertex $v \in V(C)$, there are two simple arcs in $C$ from $b$ to $v$. Since $C$ has odd length and $C \setminus \{b\}$ is matched along the cycle, exactly one of these arcs has the parity needed in a given alternating-path lift. The base is the point through which the blossom communicates with the search. Since blossoms are discovered while growing paths from exposed vertices, we need a search object that records both reachability and alternating parity. This motivates the following definition.
[definition: Alternating Forest]
Let $M$ be a matching in $G$. An alternating forest is a forest rooted at exposed vertices such that every root-to-vertex path is $M$-alternating, every edge from an even-level vertex to a child is not in $M$, and every edge from an odd-level vertex to a child lies in $M$.
[/definition]
In bipartite graphs this forest either reaches another exposed root, giving an augmenting path, or certifies that no such path exists in the explored region. Before building the general search algorithm, we need the matching-theoretic certificate that makes augmenting paths the right target in every finite graph.
[quotetheorem:5801]
[citeproof:5801]
The finiteness of $G$ is used to make the symmetric-difference decomposition a finite collection of paths and cycles, and the matching condition is what bounds every degree by $2$. The theorem does not describe how to find an augmenting path; it only says that such a path is the exact obstruction to optimality. Thus Berge's criterion separates the correctness question from the search question: once a search procedure is known to find an augmenting path whenever one exists, repeated augmentation is enough to reach a maximum matching. The next section explains the operation that makes this search possible in the presence of odd cycles.
## Shrinking Blossoms and Lifting Paths
How can an odd alternating cycle be ignored without losing the path we are looking for? The answer is to contract the blossom to a single pseudovertex, search in the contracted graph, and then expand the pseudovertex using the alternating structure of the cycle.
[illustration:co-blossom-contraction]
[definition: Blossom Contraction]
Let $B \subset V(G)$ be the vertex set of a blossom in $G$. The contracted graph $G/B$ is obtained by replacing all vertices of $B$ by a single vertex $\beta$, deleting loops, and preserving edges from $V(G) \setminus B$ to $B$ as edges incident with $\beta$. In these notes graphs are treated as simple graphs after contraction: if several edges between the same outside vertex and $\beta$ arise, they are merged into one edge while remembering an original representative edge for path lifting. The contracted matching $M/B$ consists of the edges of $M$ not contained in the blossom, with endpoints in $B$ replaced by $\beta$ when necessary.
[/definition]
Contraction reduces the number of vertices, but reduction alone would be useless if it changed whether an augmenting path exists. The key point is that the odd alternating cycle has enough internal flexibility to route any path through the correct vertex while preserving alternation. Thus the next theorem is the local correctness statement on which every recursive or iterative blossom search depends.
[quotetheorem:5802]
[citeproof:5802]
This theorem is local but not merely cosmetic: the oddness of the cycle and the exposed role of the base are precisely what make the parity choice possible. It does not say that an arbitrary odd cycle may be contracted during a matching search; the cycle must be the blossom produced by two even-level vertices in the current alternating forest. For instance, take a triangle $abc$ with $M=\{ab\}$ and attach an exposed vertex $r$ to $c$. The path $r,c,b,a$ is an augmenting path, but contracting the triangle as an unbased odd cycle can turn the contracted matching into one where the pseudovertex is treated as matched internally and the edge from $r$ is no longer represented as the first edge of an augmenting path. The missing information is the base and stem parity, not the triangle itself. An efficient algorithm must therefore remember enough contraction history to lift paths back through several nested blossoms. With that caveat, shrinking turns the difficult search state into an ordinary vertex in a smaller graph, which is the mechanism used by Edmonds' algorithm.
[example: Path Lifted From a Contracted Blossom]
Consider the blossom whose cycle is
\begin{align*}
b,v_1,v_2,v_3,v_4,b
\end{align*}
and whose matching edges on the cycle are $v_1v_2$ and $v_3v_4$. The five cycle edges therefore have matching status
\begin{align*}
bv_1\notin M,\quad v_1v_2\in M,\quad v_2v_3\notin M,\quad v_3v_4\in M,\quad v_4b\notin M.
\end{align*}
Suppose an augmenting path in the contracted graph contains the segment
\begin{align*}
x,\beta,t,
\end{align*}
where the edge $x\beta$ represents the original matched edge $xb\in M$, and the edge $\beta t$ represents the original unmatched edge $v_3t\notin M$, with $t$ exposed.
To expand $\beta$, we need a path inside the blossom from $b$ to $v_3$ whose first edge is outside $M$ and whose last edge is inside $M$. There are two simple routes from $b$ to $v_3$ along the cycle:
\begin{align*}
b,v_1,v_2,v_3
\end{align*}
and
\begin{align*}
b,v_4,v_3.
\end{align*}
The first route has edge statuses
\begin{align*}
bv_1\notin M,\quad v_1v_2\in M,\quad v_2v_3\notin M,
\end{align*}
so it ends with an unmatched edge and therefore cannot be followed by the unmatched outside edge $v_3t\notin M$ in an alternating path. The second route has edge statuses
\begin{align*}
bv_4\notin M,\quad v_4v_3\in M,
\end{align*}
so the expanded edge sequence is
\begin{align*}
xb\in M,\quad bv_4\notin M,\quad v_4v_3\in M,\quad v_3t\notin M.
\end{align*}
These four statuses alternate. Hence replacing the contracted segment $x,\beta,t$ by
\begin{align*}
x,b,v_4,v_3,t
\end{align*}
preserves the alternating-path structure and gives the lifted augmenting path in the original graph.
[/example]
This example illustrates the parity choice inside the blossom. Since the cycle has odd length, exactly one of the two routes from the base to the entry vertex has the parity needed for the path being lifted. Repeating this local decision inside an alternating-forest search gives the full algorithm, so we now state the algorithmic conclusion.
[quotetheorem:5803]
[citeproof:5803]
The algorithm relies on three hypotheses: the graph is finite, the current set of edges is a matching, and contractions are performed only on blossoms found from even-level vertices in the alternating forest. Finiteness guarantees termination, because each shrink reduces the current graph and each augmentation increases $|M|$; between augmentations there can be only finitely many forest extensions and contractions. The theorem does not assert a particular running time or data structure, only the correctness of the search-and-shrink paradigm. Its failure certificate is meaningful because, in the final contracted graph, every edge incident with the reachable even set has already been classified: an edge to an unreached vertex would extend the forest, an edge to an even vertex in another tree would augment, and an edge to an even vertex in the same tree would create another blossom. Thus no usable edge means no augmenting path in the contracted graph, and shrinking correctness transfers that conclusion back to the original graph.
## Tutte and Berge Formula Viewpoints
What does a maximum matching look like when no augmenting path remains? The algorithmic viewpoint says that the search fails. The structural viewpoint turns failure into inequalities involving odd components after deleting vertices.
[definition: Odd Component Count]
For a finite graph $G$ and a set $S \subset V(G)$, let $o(G-S)$ denote the number of connected components of the induced graph $G-S$ whose vertex sets have odd cardinality.
[/definition]
Odd components matter because a perfect matching can match at most one vertex of each odd component to a vertex outside that component. This gives a necessary obstruction, and Tutte's theorem says that this obstruction is the only one.
[quotetheorem:5804]
[citeproof:5804]
The hypotheses, the direction of the implication, and the quantifier in Tutte's theorem should be kept separate. The theorem is a finite-graph result: finiteness enters both the edge-maximal enlargement and the component count, so the statement is not an infinite matching theorem. It is also a perfect-matching theorem, not a recipe for constructing a matching or for counting all perfect matchings. The condition is necessary because odd components need outside partners, and it is sufficient only because the proof can enlarge the graph and then recover a violating deletion set when no perfect matching exists.
The requirement to test every $S$ is essential. Taking $S=\varnothing$ detects the coarse parity obstruction: any graph with an odd number of vertices has $o(G)=1>|S|$. But many failures are hidden until the right vertices are deleted. A star $K_{1,3}$ passes the empty-set test, since it has one odd component, but for $S$ equal to the centre, the graph $G-S$ has three odd isolated vertices and $3>|S|=1$. Similarly, two disjoint triangles joined through a single cut vertex have no perfect matching because deleting the cut vertex leaves two odd components competing for one outside vertex. Compared with Hall's theorem, the obstruction is no longer a deficient neighbourhood on one side of a bipartition, but a surplus of odd components relative to the vertices available to connect them. This limitation is also the next structural clue: in the maximum-matching formula, the worst surplus does not merely obstruct perfection; it measures exactly how many vertices must remain unmatched.
[example: Perfect Matching in a Small Non-Bipartite Graph]
Let $G$ have edge set
\begin{align*}
\{12,23,34,45,51,13,56\}.
\end{align*}
The vertices $1,2,3$ form a triangle, so $G$ is not bipartite. The three edges
\begin{align*}
56,\quad 12,\quad 34
\end{align*}
are pairwise disjoint and cover all six vertices, so they form a perfect matching.
We verify Tutte's inequality for this graph using that matching. Fix any set $S\subset V(G)$, and let $C$ be an odd component of $G-S$. Since $C$ has odd cardinality, the perfect matching edges cannot all pair vertices of $C$ with other vertices of $C$: internal matching edges cover vertices two at a time, so an odd number of vertices leaves at least one vertex of $C$ matched to a vertex outside $C$. No vertex of $C$ is adjacent in $G-S$ to a vertex outside $C$, because $C$ is a component of $G-S$. Therefore the matching edge leaving $C$ must go from $C$ to a vertex of $S$.
Different odd components of $G-S$ receive different vertices of $S$, because the matching edges are disjoint and no vertex of $S$ can be matched to two components. Thus the map sending each odd component to the endpoint in $S$ of its crossing matching edge is injective. Hence
\begin{align*}
o(G-S)\le |S|
\end{align*}
for every $S\subset V(G)$. This confirms, in this concrete non-bipartite graph, that the triangle does not prevent a perfect matching and that every odd component created by deleting $S$ is paid for by a distinct deleted vertex.
[/example]
The same obstruction count extends from perfect matchings to maximum matchings, but the question changes from feasibility to deficiency. If $o(G-S)>|S|$, then after using all vertices of $S$ to connect to odd components, at least $o(G-S)-|S|$ odd components still have to contain an exposed vertex. The maximum-matching formula says that this elementary lower bound is also sharp: the worst such deletion accounts for exactly the minimum possible number of exposed vertices.
[remark: Berge Formula]
For a finite graph $G$, the maximum cardinality of a matching is
\begin{align*}
\frac{1}{2}\min_{S \subset V(G)} \left(|V(G)| + |S| - o(G-S)\right).
\end{align*}
Equivalently, the minimum possible number of exposed vertices in a matching is
\begin{align*}
\max_{S \subset V(G)} \left(o(G-S)-|S|\right).
\end{align*}
[/remark]
The formula packages Tutte's odd-component obstruction into the exact deficiency of a maximum matching. In the perfect-matching case the maximum on the second line is at most $0$, which is precisely Tutte's condition; in the general case its positive value counts the unavoidable exposed vertices. The Gallai-Edmonds sets below give a structural interpretation of the set $S$ where this maximum is attained.
[definition: Gallai-Edmonds Sets]
Let $G$ be a finite graph. Let $D$ be the set of vertices which are exposed by at least one maximum matching of $G$. Let $A$ be the set of vertices in $V(G) \setminus D$ adjacent to at least one vertex of $D$. Let $C=V(G)\setminus(A\cup D)$.
[/definition]
These three sets describe the stable shape of all maximum matchings. The vertices in $D$ live in pieces that are one vertex short of being perfectly matchable, the vertices in $A$ choose which pieces to attach to, and $C$ has an ordinary perfect matching structure. The theorem below records this organisation precisely.
[remark: Gallai-Edmonds Decomposition]
For a finite graph $G$ with Gallai-Edmonds sets $D,A,C$, every component of $G[D]$ is factor-critical, $G[C]$ has a perfect matching, and every maximum matching is built from a perfect matching of $G[C]$, near-perfect matchings in the components of $G[D]$, and a matching that saturates $A$ into distinct components of $G[D]$. The unmatched vertices in any maximum matching lie in components of $G[D]$ not matched to $A$.
[/remark]
The content of the decomposition is a canonical description of where unmatched vertices can live, but it should not be read as a statement about an arbitrary matching or an arbitrary partition of the vertex set. The finiteness and maximum-matching hypotheses are essential. If the matching used to define exposed vertices is not maximum, the sets can give the wrong picture: in the path $P_4$ with matching consisting of only its middle edge, the two endpoints are exposed, although the graph has a perfect matching and should have no deficient part. If $D$ were defined using vertices exposed by one chosen maximum matching rather than by at least one maximum matching, the result would also fail: in the path $P_3$, either endpoint may be left unmatched by a maximum matching, and the factor-critical component is the whole three-vertex path, not a single selected endpoint.
The theorem also has definite limits. It does not say that every odd component of a graph is factor-critical, nor that the same vertex is unmatched in a component of $G[D]$ for every maximum matching. The assertion that components of $G[D]$ are factor-critical is a structural conclusion forced by the family of all maximum matchings; the path $P_5$ is an odd connected graph but is not factor-critical, since deleting the neighbour of an endpoint leaves an isolated vertex and a three-vertex path. The set $C$ behaves like the already-perfectly-matchable part of the graph, while $A$ acts as the interface selecting which difficult components are repaired by outside matching edges. What remains in $D$ is not arbitrary odd debris: each component is as close as possible to having a perfect matching, in the sense that deleting any one vertex removes the obstruction. This is why the theorem is the structural companion to Berge's formula: the components of $G[D]$ account for the unavoidable exposed vertices, while the matching from $A$ into distinct components records exactly which deficiencies are repaired. To interpret that phrase precisely, we isolate the matching property carried by the odd components in $D$.
[definition: Factor-Critical Graph]
A graph $H$ is factor-critical if, for every vertex $v \in V(H)$, the graph $H-v$ has a perfect matching.
[/definition]
The smallest factor-critical graphs are odd cycles. Blossoms are algorithmic traces of factor-critical behaviour discovered during an alternating search.
[example: Odd Cycle as a Factor-Critical Piece]
Let $H=C_{2k+1}$, with vertices written cyclically as
\begin{align*}
u_0,u_1,\ldots,u_{2k},u_0.
\end{align*}
Fix a vertex $v=u_i$. After deleting $u_i$, the remaining vertices occur in the path
\begin{align*}
u_{i+1},u_{i+2},\ldots,u_{i+2k},
\end{align*}
where the indices are read modulo $2k+1$. This path has exactly $2k$ vertices, and its consecutive edges are
\begin{align*}
u_{i+1}u_{i+2},\quad u_{i+2}u_{i+3},\quad \ldots,\quad u_{i+2k-1}u_{i+2k}.
\end{align*}
Now take the $k$ edges
\begin{align*}
u_{i+1}u_{i+2},\quad
u_{i+3}u_{i+4},\quad
\ldots,\quad
u_{i+2k-1}u_{i+2k}.
\end{align*}
These edges are pairwise disjoint, because their endpoints are the disjoint pairs
\begin{align*}
\{u_{i+1},u_{i+2}\},\quad
\{u_{i+3},u_{i+4}\},\quad
\ldots,\quad
\{u_{i+2k-1},u_{i+2k}\}.
\end{align*}
Together they cover all $2k$ vertices of $H-u_i$. Hence they form a perfect matching of $H-u_i$.
Since the deleted vertex $u_i$ was arbitrary, $H-v$ has a perfect matching for every $v\in V(H)$. Therefore $C_{2k+1}$ is factor-critical. In a larger graph, such an odd component can be internally matched except for one chosen vertex; an outside matching edge into the component is exactly what removes that remaining deficiency.
[/example]
The chapter's message is that general matching theory has the same augmenting-path heart as the bipartite theory, but its certificates must account for odd sets. Edmonds' shrinking gives the algorithmic mechanism, while Tutte, Berge, and Gallai-Edmonds describe the structure that the algorithm is uncovering. The next chapter changes focus from matchings to matroids, where exchange replaces augmenting paths as the certificate behind greedy optimisation.
# 8. Matroids and Greedy Optimisation
This chapter assumes the earlier material on finite set systems, forests, spanning trees, and Kruskal's algorithm from Chapter 1, together with basic finite set notation and the linear-algebra notion of [linear independence](/page/Linear%20Independence) over a field. Matroids isolate the exact exchange structure behind the greedy algorithms seen earlier in the course. For shortest paths and flows, optimality came from potentials, cuts, and augmenting paths; here the certificate is that local exchanges among independent sets are always possible. The chapter asks when repeatedly taking the best available element produces a globally optimal feasible solution, and why this succeeds for spanning trees, representatives under quotas, and linear independence but fails for arbitrary set systems.
## From Independence Systems to Matroids
A greedy algorithm needs a family of feasible partial solutions that can be built element by element. The first question is which set systems allow this incremental view at all, before imposing the stronger exchange property that makes greedy optimisation work.
[definition: Independence System]
Let $E$ be a finite set. An independence system on $E$ is a non-empty family $\mathcal I \subseteq 2^E$ such that if $I \in \mathcal I$ and $J \subseteq I$, then $J \in \mathcal I$.
[/definition]
The downward-closed condition says that deleting chosen elements cannot destroy feasibility. To see why this condition alone is insufficient for greedy optimisation, we first examine a system where maximal feasible sets have incompatible sizes.
[example: Feasible Sets With Unequal Maximal Sizes]
Let $E=\{a,b,c\}$ and let
\begin{align*}
\mathcal I=\{\varnothing,\{a\},\{b\},\{c\},\{a,b\}\}.
\end{align*}
We verify that $\mathcal I$ is an independence system by checking downward closure. The subsets of $\varnothing$ are only $\varnothing$; the subsets of $\{a\}$ are $\varnothing,\{a\}$; the subsets of $\{b\}$ are $\varnothing,\{b\}$; the subsets of $\{c\}$ are $\varnothing,\{c\}$; and the subsets of $\{a,b\}$ are
\begin{align*}
\varnothing,\{a\},\{b\},\{a,b\}.
\end{align*}
All of these sets belong to $\mathcal I$, so every subset of an independent set is again independent.
The set $\{c\}$ is maximal under inclusion because the only larger subsets of $E$ containing it are
\begin{align*}
\{a,c\},\quad \{b,c\},\quad \{a,b,c\},
\end{align*}
and none of these belongs to $\mathcal I$. The set $\{a,b\}$ is also maximal because the only larger subset of $E$ containing it is $\{a,b,c\}$, which is not in $\mathcal I$. However,
\begin{align*}
|\{c\}|=1
\qquad\text{and}\qquad
|\{a,b\}|=2.
\end{align*}
Thus this independence system has maximal feasible sets of different sizes, so reaching a maximal feasible set need not mean reaching a largest feasible set.
[/example]
The example shows why heredity alone is too weak: a greedy construction can get stuck at a maximal feasible set that is smaller than another feasible set. To make local extension meaningful, the set system needs an augmentation rule saying that whenever one feasible set is smaller than another, some element of the larger set can be added without losing feasibility.
[definition: Matroid]
A matroid is a pair $M=(E,\mathcal I)$ where $E$ is a finite set and $\mathcal I \subseteq 2^E$ satisfies:
1. $\varnothing \in \mathcal I$.
2. If $I \in \mathcal I$ and $J \subseteq I$, then $J \in \mathcal I$.
3. If $I,J \in \mathcal I$ and $|I|<|J|$, then there exists $e \in J\setminus I$ such that $I\cup\{e\}\in \mathcal I$.
[/definition]
The third axiom is the exchange axiom. To connect it with familiar algebra, the following example shows that ordinary linear independence already has this augmentation behaviour.
[example: Linear Independence]
Let $k$ be a field, let $n\in \mathbb N$, let $E$ be finite, and choose a vector $v_e\in k^n$ for each $e\in E$. Let $\mathcal I$ be the family of subsets $I\subseteq E$ for which the family $(v_e)_{e\in I}$ is linearly independent. The empty set belongs to $\mathcal I$, since the only linear relation among no vectors is the empty relation.
If $I\in\mathcal I$ and $J\subseteq I$, then $J\in\mathcal I$. Indeed, suppose
\begin{align*}
\sum_{e\in J}\alpha_e v_e=0.
\end{align*}
Define coefficients $\beta_e$ for $e\in I$ by $\beta_e=\alpha_e$ when $e\in J$ and $\beta_e=0$ when $e\in I\setminus J$. Then
\begin{align*}
\sum_{e\in I}\beta_e v_e
=
\sum_{e\in J}\alpha_e v_e+\sum_{e\in I\setminus J}0\cdot v_e
=
0+0
=
0.
\end{align*}
Since $(v_e)_{e\in I}$ is linearly independent, every $\beta_e=0$, and hence every $\alpha_e=0$. Thus $(v_e)_{e\in J}$ is linearly independent.
Now let $I,J\in\mathcal I$ with $|I|<|J|$. We show that some $e\in J\setminus I$ can be added to $I$. Suppose instead that
\begin{align*}
v_e\in \operatorname{span}\{v_i:i\in I\}
\qquad\text{for every }e\in J\setminus I.
\end{align*}
If $e\in J\cap I$, then also $v_e\in \operatorname{span}\{v_i:i\in I\}$. Therefore
\begin{align*}
\operatorname{span}\{v_e:e\in J\}
\subseteq
\operatorname{span}\{v_i:i\in I\}.
\end{align*}
Because both listed families are linearly independent,
\begin{align*}
\dim\operatorname{span}\{v_e:e\in J\}=|J|
\qquad\text{and}\qquad
\dim\operatorname{span}\{v_i:i\in I\}=|I|.
\end{align*}
The inclusion of spans gives
\begin{align*}
|J|
=
\dim\operatorname{span}\{v_e:e\in J\}
\le
\dim\operatorname{span}\{v_i:i\in I\}
=
|I|,
\end{align*}
contradicting $|I|<|J|$. Hence there is some $e\in J\setminus I$ with
\begin{align*}
v_e\notin \operatorname{span}\{v_i:i\in I\}.
\end{align*}
For this $e$, the set $I\cup\{e\}$ is independent: if
\begin{align*}
\alpha_e v_e+\sum_{i\in I}\alpha_i v_i=0,
\end{align*}
then $\alpha_e\ne 0$ would imply
\begin{align*}
v_e=-\alpha_e^{-1}\sum_{i\in I}\alpha_i v_i\in \operatorname{span}\{v_i:i\in I\},
\end{align*}
which is impossible. Thus $\alpha_e=0$, and then
\begin{align*}
\sum_{i\in I}\alpha_i v_i=0.
\end{align*}
Since $(v_i)_{i\in I}$ is linearly independent, every $\alpha_i=0$. So $I\cup\{e\}\in\mathcal I$, proving the augmentation axiom. Linear independence therefore gives an independence system with exactly the exchange behaviour required of a matroid.
[/example]
This example gives a model for what complete feasible solutions should mean: they are independent sets that cannot be enlarged. This motivates the terminology of bases, which will become the feasible objects in the weighted optimisation problem.
[definition: Basis]
Let $M=(E,\mathcal I)$ be a matroid. A basis of $M$ is an independent set $B\in \mathcal I$ that is maximal under inclusion among independent subsets of $E$.
[/definition]
Bases are the feasible objects optimised by the greedy algorithm in this chapter. For this to be a well-posed fixed-size optimisation problem, we need to know that all bases have the same cardinality.
[quotetheorem:5807]
[citeproof:5807]
The conclusion depends on the exchange axiom, not just on downward closure. In the earlier independence system, $\{c\}$ and $\{a,b\}$ are both maximal independent sets but have different sizes, so there is no single notion of "the" size of a complete feasible solution. The theorem does not say that every independent set is maximum, nor that every set of the common basis size is independent; it only says that maximal independent sets have a common size once exchange is available. This common cardinality of the bases is the rank of the whole matroid. To study restrictions of the same problem to a subset of elements, we need a rank function for every subset, not only for $E$.
[definition: Rank Function]
Let $M=(E,\mathcal I)$ be a matroid. The rank function of $M$ is the map $r:2^E\to \mathbb N\cup\{0\}$ defined by
\begin{align*}
r(A)=\max\{|I|: I\subseteq A,\ I\in \mathcal I\}
\end{align*}
for every $A\subseteq E$.
[/definition]
The rank function records the best independent content of each subset, but for optimisation it must also behave predictably when sets overlap.
The next structural question is whether rank has diminishing returns: can two overlapping subsets both claim more independent content than their union and intersection allow? Submodularity is the formal inequality that rules this out and turns exchange into a usable set-function property.
[quotetheorem:5808]
[citeproof:5808]
Submodularity is a structural shadow of exchange: adding elements to a larger set gives no more rank increase than adding them to a smaller set. The matroid hypothesis is doing real work here; for a general independence system, the function $r(A)=\max\{|I|:I\subseteq A, I\in\mathcal I\}$ need not be submodular. In the earlier system with independent sets $\varnothing$, the singletons, and $\{a,b\}$, taking $A=\{a,c\}$ and $B=\{b,c\}$ gives $r(A)=r(B)=1$, while $r(A\cup B)=2$ and $r(A\cap B)=1$, contradicting submodularity. Conversely, submodularity of a set function by itself is not the greedy theorem: one also needs it to be the rank function of the feasible family, with independent sets governed by $|I|=r(I)$. For greedy optimisation, we need an even more local form of exchange between complete feasible solutions.
[quotetheorem:5809]
[citeproof:5809]
The lemma says that a basis can be modified in one coordinate while staying complete. Its hypotheses cannot be weakened to arbitrary maximal feasible sets: in the independence system with maximal sets $\{c\}$ and $\{a,b\}$, there is no element of $\{a,b\}$ that can replace $c$ while preserving the size of a complete feasible set. Even in a matroid, the lemma does not say that every proposed replacement works; it guarantees the existence of at least one suitable element on the other basis. This is the set-theoretic replacement for exchanging edges across a cut in spanning tree proofs, and it is the main tool for the greedy theorem.
## Greedy Optimisation over Bases
We now ask the algorithmic question: if every element $e\in E$ has a weight $w(e)$, when does the rule "take the heaviest feasible remaining element" find a maximum-weight basis? The answer is exactly the matroid exchange property.
[definition: Weight of a Set]
Let $E$ be a finite set and let $w:E\to \mathbb R$ be a weight function. The induced set-weight map $w_{\mathrm{set}}:2^E\to \mathbb R$ is defined by
\begin{align*}
w_{\mathrm{set}}(A)=\sum_{e\in A} w(e)
\end{align*}
for every $A\subseteq E$.
[/definition]
When no confusion can arise, we write $w(A)$ for $w_{\mathrm{set}}(A)$.
This additive objective is the natural setting for a greedy rule, since each local decision has a numerical score independent of the current partial solution. We next formalise the algorithm whose correctness will be proved by basis exchange.
[definition: Greedy Algorithm for a Matroid Basis]
Let $M=(E,\mathcal I)$ be a matroid and let $w:E\to \mathbb R$ be a weight function. Order the elements as $e_1,\dots,e_m$ so that $w(e_1)\ge w(e_2)\ge\cdots\ge w(e_m)$. Starting with $G_0=\varnothing$, define $G_i$ for $1\le i\le m$ by taking $G_i=G_{i-1}\cup\{e_i\}$ if $G_{i-1}\cup\{e_i\}\in \mathcal I$, and taking $G_i=G_{i-1}$ otherwise. The output is $G_m$.
[/definition]
The output is independent, and it is maximal because every skipped element remains infeasible after adding more elements. The central question is whether this maximal independent set has maximum possible weight among all bases.
[quotetheorem:5810]
[citeproof:5810]
The correctness template is to compare the greedy solution with an optimal solution, then exchange in the greedy choice without lowering objective value. Each hypothesis matters. If the feasible family is only an independence system, the earlier weights $w(c)=3$ and $w(a)=w(b)=2$ make greedy stop at $\{c\}$ even though $\{a,b\}$ is better. If the objective is not additive, local element weights no longer encode the marginal value of a choice; for instance a bonus for selecting $a$ and $b$ together can make the pair optimal even when neither element looks best individually. The theorem also does not assert uniqueness of the optimal basis, nor does it choose among equal-weight elements except through the specified ordering. The most familiar application is minimum spanning tree, where minimising cost is the same as maximising negative cost.
[example: Minimum Spanning Tree as Graphic Matroid Optimisation]
Let $H=(V,E)$ be a connected graph and let $c:E\to \mathbb R$ be an edge-cost function. In the graphic matroid of $H$, independent sets are forests. Since $H$ is connected, the bases are exactly the spanning trees: a maximal forest cannot have two connected components, because a path in $H$ between those components contains an edge joining two different components of the forest, and adding such an edge would not create a cycle.
Define $w:E\to\mathbb R$ by $w(e)=-c(e)$. For any spanning tree $T\subseteq E$,
\begin{align*}
w(T)=\sum_{e\in T}w(e)=\sum_{e\in T}(-c(e))=-\sum_{e\in T}c(e)=-c(T).
\end{align*}
Thus, for spanning trees $T_1$ and $T_2$, the inequality $w(T_1)\ge w(T_2)$ is equivalent to $-c(T_1)\ge -c(T_2)$, which is equivalent to $c(T_1)\le c(T_2)$. Therefore a maximum-weight basis for $w$ is exactly a minimum-cost spanning tree for $c$.
By *[Matroid Greedy Theorem](/theorems/5810)*, the greedy algorithm for the graphic matroid returns a maximum-weight basis. Sorting edges by non-increasing $w(e)$ is the same as sorting them by non-decreasing $c(e)$, since $w(e_1)\ge w(e_2)$ is equivalent to $-c(e_1)\ge -c(e_2)$, hence to $c(e_1)\le c(e_2)$. Accepting an edge exactly when independence is preserved means accepting it exactly when adding it does not create a cycle. So the matroid greedy algorithm becomes: consider edges from cheapest to most expensive, add an edge precisely when it does not create a cycle, and stop with a minimum-cost spanning tree. This is Kruskal's algorithm.
[/example]
The theorem also explains why greedy algorithms may be sensitive to the feasibility system rather than the numerical weights. To expose the missing hypothesis, we return to an independence system that is not a matroid.
[example: Failure of Greedy Outside Matroids]
Let $E=\{a,b,c\}$ and
\begin{align*}
\mathcal I=\{\varnothing,\{a\},\{b\},\{c\},\{a,b\}\}.
\end{align*}
Assign weights
\begin{align*}
w(c)=3,\qquad w(a)=2,\qquad w(b)=2.
\end{align*}
The greedy rule considers an element of largest weight first. Since
\begin{align*}
w(c)=3>2=w(a)=w(b),
\end{align*}
it selects $c$ first, giving the current feasible set $\{c\}$.
Now the only remaining elements are $a$ and $b$. Adding $a$ would give
\begin{align*}
\{c\}\cup\{a\}=\{a,c\},
\end{align*}
and adding $b$ would give
\begin{align*}
\{c\}\cup\{b\}=\{b,c\}.
\end{align*}
Neither $\{a,c\}$ nor $\{b,c\}$ belongs to $\mathcal I$, so the greedy algorithm cannot add either remaining element. It stops at $\{c\}$, whose weight is
\begin{align*}
w(\{c\})=w(c)=3.
\end{align*}
But $\{a,b\}\in\mathcal I$, and its weight is
\begin{align*}
w(\{a,b\})=w(a)+w(b)=2+2=4.
\end{align*}
Since
\begin{align*}
4>3,
\end{align*}
the feasible set $\{a,b\}$ has larger total weight than the greedy output. Thus this independence system does not support the greedy theorem: without the exchange axiom, a locally best first choice can trap the algorithm in a smaller, lower-weight maximal feasible set.
[/example]
This example is not a defect of a particular tie-breaking rule. It identifies the missing augmentation axiom: the smaller maximal set $\{c\}$ cannot be augmented toward the larger feasible set $\{a,b\}$.
## Standard Families of Matroids
The theory is useful because the same axioms appear in several concrete settings. The next question is how to recognise matroids already present in graph theory, selection under quotas, finite-dimensional linear algebra, and cardinality constraints.
[definition: Graphic Matroid]
Let $H=(V,E)$ be a finite graph. The graphic matroid of $H$ is the pair $M(H)=(E,\mathcal I)$ where $F\subseteq E$ belongs to $\mathcal I$ iff the subgraph $(V,F)$ is a forest.
[/definition]
In this matroid, circuits are graph cycles and bases are spanning forests with one tree in each connected component. We need to verify that the graph-theoretic notion of forest really satisfies the abstract augmentation axiom.
[quotetheorem:5811]
[citeproof:5811]
The familiar cut argument for spanning trees reappears here in matroid language. The hypotheses are specific to forests in a graph: the component count gives a numerical certificate for when an edge can be added without creating a cycle. If we used an arbitrary graph-theoretic family instead, such as matchings or connected edge sets, heredity or augmentation can fail; a maximal matching can have smaller cardinality than another matching, and a connected edge set need not remain connected after deleting an edge. Thus the theorem does not make arbitrary graph feasibility constraints greedy-solvable; it says only that acyclicity has exactly the exchange behaviour needed for greedy optimisation. A different family comes from selection problems with quotas across disjoint classes, so the next definition records that constraint system.
[definition: Partition Matroid]
Let $E$ be a finite set partitioned as $E=E_1\cup\cdots\cup E_t$, and let $b_1,\dots,b_t\in \mathbb N\cup\{0\}$. The partition matroid has independent sets
\begin{align*}
\mathcal I=\{I\subseteq E: |I\cap E_j|\le b_j\text{ for }1\le j\le t\}.
\end{align*}
[/definition]
Partition matroids model choosing representatives subject to category quotas. Since the greedy theorem applies only after the matroid axioms have been verified, the quota model still needs a formal exchange check rather than only an algorithmic description. The exchange axiom comes from the fact that if one feasible set is smaller than another, some category has unused capacity relative to the larger set.
[quotetheorem:5812]
[citeproof:5812]
This verification turns the quota model into an instance of the greedy theorem rather than a separate algorithm. The next example translates the theorem into the language of representatives.
[example: Choosing Representatives Under Partition Constraints]
Suppose the departments $E_1,\dots,E_t$ are disjoint and a feasible selection is a set $S\subseteq E$ satisfying $|S\cap E_j|\le b_j$ for every $1\le j\le t$. Order the applicants as $e_1,\dots,e_m$ so that $w(e_1)\ge w(e_2)\ge \cdots \ge w(e_m)$, and let $G$ be the current greedy selection when applicant $e_i\in E_j$ is considered.
Adding $e_i$ affects only the count in department $E_j$. Since $e_i\notin G$ at the moment it is considered,
\begin{align*}
|(G\cup\{e_i\})\cap E_j|=|(G\cap E_j)\cup\{e_i\}|=|G\cap E_j|+1.
\end{align*}
For any $\ell\ne j$, disjointness of the departments gives $e_i\notin E_\ell$, so
\begin{align*}
(G\cup\{e_i\})\cap E_\ell=G\cap E_\ell.
\end{align*}
Thus the quota inequalities for departments other than $E_j$ are unchanged, and applicant $e_i$ is accepted exactly when
\begin{align*}
|G\cap E_j|+1\le b_j.
\end{align*}
Equivalently, the greedy rule accepts precisely those applicants whose own department still has unused quota.
By *[Partition Matroids Are Matroids](/theorems/5812)*, these quota-feasible sets form a matroid, and by *Matroid Greedy Theorem*, the greedy output is a maximum-weight basis. Therefore, for every feasible selection $B$ of the same maximal feasible size,
\begin{align*}
w(G)=\sum_{e\in G}w(e)\ge \sum_{e\in B}w(e)=w(B).
\end{align*}
So sorting all applicants by score and accepting an applicant exactly when that applicant's department is not full produces a highest-total-score selection among all maximal quota-feasible selections.
[/example]
The quota example suggests an even simpler constraint: ignore categories and impose only a total cardinality cap. This motivates the uniform matroid.
[definition: Uniform Matroid]
Let $E$ be a finite set and let $0\le k\le |E|$. The uniform matroid $U_{k,E}$ has independent sets
\begin{align*}
\mathcal I=\{I\subseteq E: |I|\le k\}.
\end{align*}
[/definition]
The uniform matroid is the partition matroid with one part, but it is useful to see the exchange axiom directly. If $I,J\subseteq E$ satisfy $|I|\le k$, $|J|\le k$, and $|I|<|J|$, then any $e\in J\setminus I$ gives $|I\cup\{e\}|\le |J|\le k$. Thus $U_{k,E}$ is a matroid. For $U_{k,E}$, every basis is a $k$-element subset of $E$. The greedy algorithm therefore returns the $k$ elements of largest weight, showing that ordinary top-$k$ selection is the degenerate case of matroid optimisation.
A different source of matroids comes from linear algebra rather than quotas. To make that source precise for combinatorial optimisation, we label each ground-set element by a vector and declare a subset feasible exactly when its labels are linearly independent; this keeps parallel labels and zero labels visible as combinatorial features.
[definition: Linear Matroid]
Let $E$ be a finite set, let $k$ be a field, let $n\in \mathbb N$, and let $v_e\in k^n$ be assigned to each $e\in E$. The associated linear matroid has independent sets
\begin{align*}
\mathcal I=\{I\subseteq E: (v_e)_{e\in I}\text{ is linearly independent over }k\}.
\end{align*}
[/definition]
Linear matroids include graphic matroids after choosing an oriented incidence matrix for a graph. The following small example shows how dependence among labelled vectors controls the bases.
[example: A Small Linear Matroid]
Let $E=\{1,2,3\}$ over $\mathbb R$, with
\begin{align*}
v_1=(1,0),\qquad v_2=(0,1),\qquad v_3=(1,1).
\end{align*}
The empty set is independent. Each singleton is independent because none of the three vectors is zero:
\begin{align*}
v_1\ne 0,\qquad v_2\ne 0,\qquad v_3\ne 0.
\end{align*}
We check the two-element subsets. For $\{1,2\}$, if
\begin{align*}
\alpha v_1+\beta v_2=0,
\end{align*}
then
\begin{align*}
\alpha(1,0)+\beta(0,1)=(\alpha,\beta)=(0,0),
\end{align*}
so $\alpha=0$ and $\beta=0$. For $\{1,3\}$, if
\begin{align*}
\alpha v_1+\beta v_3=0,
\end{align*}
then
\begin{align*}
\alpha(1,0)+\beta(1,1)=(\alpha+\beta,\beta)=(0,0).
\end{align*}
Thus $\beta=0$, and then $\alpha+\beta=0$ gives $\alpha=0$. For $\{2,3\}$, if
\begin{align*}
\alpha v_2+\beta v_3=0,
\end{align*}
then
\begin{align*}
\alpha(0,1)+\beta(1,1)=(\beta,\alpha+\beta)=(0,0).
\end{align*}
Thus $\beta=0$, and then $\alpha+\beta=0$ gives $\alpha=0$. Hence all three two-element subsets are independent.
The full set is dependent because
\begin{align*}
v_1+v_2-v_3
=
(1,0)+(0,1)-(1,1)
=
(1,1)-(1,1)
=
(0,0),
\end{align*}
and the coefficients $1,1,-1$ are not all zero. Therefore the independent sets are exactly $\varnothing$, the three singletons, and the three two-element subsets.
Since every two-element independent set can only be enlarged to the dependent set $\{1,2,3\}$, every two-element subset is a basis. In a non-increasing weight order, the greedy algorithm accepts the first labelled vector, accepts the second because every pair is independent, and rejects the third because the full set is dependent. Thus greedy chooses the two largest-weight labelled vectors in the chosen order.
[/example]
These families explain the breadth of the greedy theorem. Whenever feasibility is governed by acyclicity, quotas, dimension, or pure cardinality, the same exchange proof gives the same optimisation conclusion. Other constructions, such as transversal matroids from matching representatives and the polymatroid rank functions introduced in Chapter 10, extend the same exchange-and-diminishing-returns viewpoint beyond the minimum-spanning-tree setting.
# 9. Matroid Intersection and Matroid Union
This chapter brings the matroid point of view from Chapter 8 back to the augmenting-path methods from Chapters 2 and 7. Greedy algorithms solve optimisation over a single matroid, but many natural constraints ask for a set to be independent in two matroids at once. The main result is Edmonds' matroid intersection theorem, which gives both an augmenting algorithm and a min-max certificate for the largest common independent set. We finish by stating matroid union, the complementary operation in which several independent sets are packed together to cover or build a larger set.
## Common Independent Sets and Exchange Graphs
Suppose a finite ground set $E$ carries two matroids $M_1=(E,\mathcal I_1)$ and $M_2=(E,\mathcal I_2)$. The basic optimisation problem is to find a largest set $I\subset E$ such that $I\in\mathcal I_1\cap\mathcal I_2$. For one matroid, an element can be added whenever independence remains; for two matroids, an attempted addition in one matroid may require deleting a different element to restore independence in the other. The exchange graph records exactly these possible repairs.
[definition: Common Independent Set]
Let $M_1=(E,\mathcal I_1)$ and $M_2=(E,\mathcal I_2)$ be matroids on the same finite ground set $E$. A common independent set is a subset $I\subset E$ such that $I\in\mathcal I_1$ and $I\in\mathcal I_2$.
[/definition]
The definition isolates the feasible sets for the intersection problem. The next object is not another feasibility condition, but a directed graph built relative to a current feasible set; its arcs say which single exchange preserves independence in one of the two matroids.
[definition: Exchange Graph]
Let $M_1=(E,\mathcal I_1)$ and $M_2=(E,\mathcal I_2)$ be matroids, and let $I$ be a common independent set. The exchange graph $D(I)$ is the directed graph with vertex set $E$ and the following arcs: for $x\in I$ and $y\in E\setminus I$, include $x\to y$ if $(I\setminus\{x\})\cup\{y\}\in\mathcal I_1$; include $y\to x$ if $(I\setminus\{x\})\cup\{y\}\in\mathcal I_2$.
[/definition]
The two orientations are chosen so that a directed path alternates between fixing the first matroid and fixing the second. To begin or end such a path, we also need to know which outside elements may be inserted without any deletion.
[definition: Source and Sink Sets]
Let $I$ be a common independent set for matroids $M_1=(E,\mathcal I_1)$ and $M_2=(E,\mathcal I_2)$. Define $S_I=\{y\in E\setminus I : I\cup\{y\}\in\mathcal I_1\}$ and $T_I=\{y\in E\setminus I : I\cup\{y\}\in\mathcal I_2\}$.
[/definition]
An element in $S_I\cap T_I$ immediately augments $I$. The harder case is when no single element can be added while preserving both matroids, but a sequence of exchanges might still make room. The next theorem is needed because it turns that informal repair chain into an exact optimality test: augmentation is possible precisely when the exchange graph contains the right directed path.
[quotetheorem:5813]
[citeproof:5813]
This is the exact analogue of [Berge's augmenting path theorem](/theorems/5776) for matchings, but its hypotheses are doing real work. The set $I$ must already be common independent, because the arcs describe repairs after a single exchange from a feasible starting point; if $I$ violates one matroid, the path language no longer records all violations that must be fixed. Finiteness ensures that reachability and shortest paths are ordinary finite graph notions and that the augmentation algorithm terminates after finitely many successful augmentations. The theorem also does not claim that every directed path gives an augmentation: one uses a shortest path from $S_I$ to $T_I$, so that hidden dependencies cannot reappear among the exchanged elements. Algorithmically, the template is now precise: start with $I=\varnothing$, construct $D(I)$, search from $S_I$ to $T_I$, augment along a shortest path if one exists, and stop when there is no such path.
[example: Assigning Tasks With Two Independent Constraint Systems]
Let $E$ be a finite set of possible task assignments. Suppose $M_1=(E,\mathcal I_1)$ records the worker-side qualification or capacity constraint, and $M_2=(E,\mathcal I_2)$ records the project-side compatibility constraint. A set $I\subset E$ is feasible exactly when
\begin{align*}
I\in\mathcal I_1\cap\mathcal I_2.
\end{align*}
Thus a common independent set is precisely a partial assignment satisfying both constraint systems.
Fix a current feasible partial assignment $I$. An outside assignment $y\in E\setminus I$ lies in $S_I$ when
\begin{align*}
I\cup\{y\}\in\mathcal I_1,
\end{align*}
so $y$ can be added without breaking the worker-side constraint. It lies in $T_I$ when
\begin{align*}
I\cup\{y\}\in\mathcal I_2,
\end{align*}
so $y$ can be added without breaking the project-side constraint. If $x\in I$ and $y\in E\setminus I$, the arc $x\to y$ records the worker-side repair
\begin{align*}
(I\setminus\{x\})\cup\{y\}\in\mathcal I_1,
\end{align*}
while the arc $y\to x$ records the project-side repair
\begin{align*}
(I\setminus\{x\})\cup\{y\}\in\mathcal I_2.
\end{align*}
Therefore a directed path
\begin{align*}
y_0,x_1,y_1,\ldots,x_k,y_k
\end{align*}
from $S_I$ to $T_I$ names the reassignment chain explicitly: insert the outside assignments $y_0,\ldots,y_k$ and remove the existing assignments $x_1,\ldots,x_k$. The new set is
\begin{align*}
I'=(I\setminus\{x_1,\ldots,x_k\})\cup\{y_0,\ldots,y_k\}.
\end{align*}
The path has $k$ removed assignments and $k+1$ inserted assignments, so
\begin{align*}
|I'|=|I|-k+(k+1).
\end{align*}
Since $-k+(k+1)=1$, this gives
\begin{align*}
|I'|=|I|+1.
\end{align*}
So an augmenting path is not just evidence that improvement is possible; it is a concrete sequence of task removals and insertions that increases the number of completed tasks by one while preserving both constraint systems.
[/example]
This example is deliberately abstract because the same exchange mechanism appears in many models. When the two matroids are partition matroids, the exchange graph collapses to the familiar alternating graph for bipartite matching.
## The Matroid Intersection Theorem
The augmentation criterion says when a current solution is not optimal, but a theorem of optimisation should also explain why the algorithm has stopped. The certificate is a min-max formula involving ranks of the two matroids on complementary parts of the ground set.
[quotetheorem:5814]
[citeproof:5814]
The formula is useful because the right-hand side is a certificate of optimality. If an algorithm outputs $I$ and a set $A$ with $|I|=r_1(A)+r_2(E\setminus A)$, then optimality is verified without inspecting all common independent sets. The shared ground set is essential: if the two matroids live on different sets, there is no single subset whose independence can be tested in both systems, and the expression $E\setminus A$ has no common meaning. The matroid hypotheses are also essential, because the proof uses exchange and closure; for instance, an arbitrary downward-[closed set](/page/Closed%20Set) system can fail to admit any local exchange path even when a larger feasible set exists. The theorem is a cardinality theorem for finite matroids, not a direct weighted optimisation statement and not a statement about arbitrary independence systems.
[remark: Rank Certificate]
The set $A$ in Edmonds' theorem plays the role of a cut or vertex cover. It is not a feasible solution of the original problem; it is a dual certificate showing that every common independent set has size at most the value already achieved.
[/remark]
This rank certificate explains why matroid intersection belongs beside max-flow min-cut and Konig's theorem in the course. We now check that the classical bipartite matching theorem is contained in this framework.
[example: Matching as Intersection of Two Partition Matroids]
Let $G=(U\cup V,E)$ be a bipartite graph, and use the edge set $E$ as the common ground set. For each $u\in U$, let
\begin{align*}
E(u)=\{e\in E : e \text{ is incident with } u\}.
\end{align*}
Define $M_U$ to be the partition matroid whose parts are the sets $E(u)$ for $u\in U$, each with capacity $1$. Thus $F\subset E$ is independent in $M_U$ exactly when
\begin{align*}
|F\cap E(u)|\le 1 \quad \text{for every } u\in U.
\end{align*}
Equivalently, no two edges of $F$ share an endpoint in $U$. Similarly, for each $v\in V$, let
\begin{align*}
E(v)=\{e\in E : e \text{ is incident with } v\},
\end{align*}
and define $M_V$ by the condition
\begin{align*}
|F\cap E(v)|\le 1 \quad \text{for every } v\in V.
\end{align*}
Therefore $F$ is common independent in $M_U$ and $M_V$ exactly when it satisfies the $U$-endpoint capacity inequalities
\begin{align*}
|F\cap E(u)|\le 1 \quad \text{for every } u\in U
\end{align*}
and the $V$-endpoint capacity inequalities
\begin{align*}
|F\cap E(v)|\le 1 \quad \text{for every } v\in V.
\end{align*}
The first condition says that no two selected edges share a left endpoint, and the second says that no two selected edges share a right endpoint. Since every edge of a bipartite graph has one endpoint in $U$ and one endpoint in $V$, the two conditions together say exactly that no two selected edges share any endpoint. Hence the common independent sets of $M_U$ and $M_V$ are precisely the matchings of $G$.
Now fix a matching $F$. An outside edge $y\in E\setminus F$ lies in $S_F$ exactly when $F\cup\{y\}$ still satisfies all $U$-endpoint capacities, which means the $U$-endpoint of $y$ is unmatched by $F$. It lies in $T_F$ exactly when $F\cup\{y\}$ still satisfies all $V$-endpoint capacities, which means the $V$-endpoint of $y$ is unmatched by $F$. If $x\in F$ and $y\in E\setminus F$, then the arc $x\to y$ appears exactly when replacing $x$ by $y$ preserves the $U$-endpoint capacities, and the arc $y\to x$ appears exactly when replacing $x$ by $y$ preserves the $V$-endpoint capacities. Thus a directed path from $S_F$ to $T_F$ alternates between edges outside $F$ and edges inside $F$, begins at an edge whose $U$-endpoint is unmatched, and ends at an edge whose $V$-endpoint is unmatched. This is exactly the usual alternating augmenting path for a bipartite matching, so matroid intersection recovers the standard matching augmentation procedure.
[/example]
The example shows that matroid intersection generalises bipartite matching without changing the logic of augmentation. It also produces less graphical applications, such as choosing edges of a graph subject to several colour and acyclicity constraints.
[example: Colorful Spanning Forests]
Let $G=(V,E)$ be a graph whose edges are coloured. For each colour $\alpha$, let
\begin{align*}
C_\alpha=\{e\in E : e \text{ has colour } \alpha\}.
\end{align*}
Let $M_1$ be the graphic matroid of $G$, so
\begin{align*}
F\in\mathcal I_1 \quad \text{if and only if } F \text{ is a forest}.
\end{align*}
Let $M_2$ be the partition matroid with parts $C_\alpha$ and capacity $1$ on each part, so
\begin{align*}
F\in\mathcal I_2
\quad \text{if and only if} \quad
|F\cap C_\alpha|\le 1 \text{ for every colour } \alpha.
\end{align*}
Therefore $F$ is common independent in $M_1$ and $M_2$ exactly when
\begin{align*}
F\in\mathcal I_1\cap\mathcal I_2
\end{align*}
exactly when
\begin{align*}
F \text{ is a forest}
\quad \text{and} \quad
|F\cap C_\alpha|\le 1 \text{ for every colour } \alpha.
\end{align*}
The second condition says that no two edges of $F$ have the same colour, so the common independent sets are precisely the rainbow forests of $G$.
Applying *[Edmonds Matroid Intersection Theorem](/theorems/5814)* to $M_1$ and $M_2$ therefore finds a common independent set of maximum cardinality, hence a largest rainbow forest. If such a forest $F$ has size $|V|-1$, then a forest on vertex set $V$ with $c$ connected components has $|V|-c$ edges, so
\begin{align*}
|V|-1=|F|=|V|-c.
\end{align*}
Subtracting $|V|$ from both sides gives
\begin{align*}
-1=-c,
\end{align*}
and multiplying by $-1$ gives
\begin{align*}
c=1.
\end{align*}
Thus $F$ is connected as well as acyclic, so it is a spanning tree, and the colour condition makes it a colourful spanning tree.
[/example]
The graphic-partition example is a typical reason matroids are valuable in optimisation. Acyclicity alone is greedy, colour capacity alone is greedy, but their conjunction needs the intersection machinery.
## Matroid Union and Covering Interpretations
Intersection asks for one set satisfying two independent-set systems at the same time. Union asks a different question: which sets can be decomposed into several pieces, each independent in its own matroid? This turns packing independent structures into a new matroid and leads to covering formulations.
[definition: Matroid Union]
Let $M_i=(E,\mathcal I_i)$ for $1\le i\le k$ be matroids on the same finite ground set $E$. Their union is the set system $M_1\vee\cdots\vee M_k=(E,\mathcal I)$, where
\begin{align*}
\mathcal I=\{I\subset E : I=I_1\cup\cdots\cup I_k \text{ for some } I_i\in\mathcal I_i\}.
\end{align*}
[/definition]
The definition says that a set is feasible if it can be coloured with colours $1,\dots,k$ so that the elements of colour $i$ form an independent set of $M_i$. The theorem is that this construction preserves the matroid axioms and has a rank formula parallel to the intersection theorem.
[remark: Matroid Union Theorem]
If $M_i=(E,\mathcal I_i)$ are matroids with rank functions $r_i$ for $1\le i\le k$, then the union $M_1\vee\cdots\vee M_k$ is a matroid. Its rank function is
\begin{align*}
r_{M_1\vee\cdots\vee M_k}(X)=\min_{A\subset X} \left(|X\setminus A|+\sum_{i=1}^k r_i(A)\right)
\end{align*}
for every $X\subset E$.
[/remark]
The finiteness and common-ground-set assumptions again matter. A set in the union is a subset of the same $E$, decomposed into labelled independent pieces; without a shared ground set, the formula could not compare $X\setminus A$ with the ranks $r_i(A)$. The matroid hypotheses are what make the decomposable sets satisfy exchange, so the union is not merely a union of arbitrary downward-closed families. The rank formula also states the maximum size of a decomposable subset of $X$, not that a particular decomposition is unique. Its main use in the next examples is to turn decomposition questions into inequalities over all subsets $A\subset X$.
[example: Covering a Set by Independent Sets]
Let $M=(E,\mathcal I)$ be a matroid with rank function $r$, and fix $k\in\mathbb N$. A subset $X\subset E$ can be covered by $k$ independent sets of $M$ exactly when there are sets $I_1,\ldots,I_k\in\mathcal I$ such that
\begin{align*}
X\subset I_1\cup\cdots\cup I_k.
\end{align*}
Equivalently, $X$ is independent in the union of $k$ copies of $M$, because the union matroid consists of the sets that can be written as a union of one independent set from each copy.
By the *[Matroid Union Theorem](/theorems/5815)*, the rank of $X$ in this union is
\begin{align*}
r_{\vee^k M}(X)
=
\min_{A\subset X}\left(|X\setminus A|+\sum_{i=1}^k r(A)\right).
\end{align*}
Since all $k$ copies are the same matroid $M$, each summand is $r(A)$, so
\begin{align*}
\sum_{i=1}^k r(A)=k r(A),
\end{align*}
and therefore
\begin{align*}
r_{\vee^k M}(X)
=
\min_{A\subset X}\left(|X\setminus A|+k r(A)\right).
\end{align*}
The set $X$ is independent in the union matroid exactly when its rank there is $|X|$, so this is equivalent to
\begin{align*}
|X|\le |X\setminus A|+k r(A)
\quad \text{for every } A\subset X.
\end{align*}
For each $A\subset X$, the sets $A$ and $X\setminus A$ are disjoint and have union $X$, hence
\begin{align*}
|X|=|A|+|X\setminus A|.
\end{align*}
Substituting this into the inequality gives
\begin{align*}
|A|+|X\setminus A|
\le
|X\setminus A|+k r(A).
\end{align*}
Subtracting $|X\setminus A|$ from both sides gives
\begin{align*}
|A|\le k r(A).
\end{align*}
Thus $X$ can be covered by $k$ independent sets of $M$ exactly when
\begin{align*}
|A|\le k r(A)
\quad \text{for every } A\subset X.
\end{align*}
The obstruction is therefore concrete: a subset $A\subset X$ with $|A|>k r(A)$ contains more elements than $k$ independent sets can cover inside a set of rank $r(A)$.
[/example]
This is a covering interpretation: instead of selecting one large independent set, we ask how many independent sets are needed to cover a prescribed set. For a graphic matroid, this becomes the question of decomposing a graph's edge set into forests.
[example: Decomposing Edges Into Forests]
Let $G=(V,E)$ and let $M(G)$ be its graphic matroid, so the independent sets of $M(G)$ are exactly the forests in $G$. A subset $X\subset E$ can be covered by $k$ forests exactly when $X$ is independent in the union of $k$ copies of $M(G)$: if
\begin{align*}
X\subset F_1\cup\cdots\cup F_k
\end{align*}
with each $F_i$ a forest, then assigning each edge of $X$ to one forest that contains it gives
\begin{align*}
X=X_1\cup\cdots\cup X_k
\end{align*}
with $X_i\subset F_i$, and each $X_i$ is again a forest because every subset of a forest is a forest.
By the *Matroid Union Theorem*, the rank of $X$ in the union of $k$ copies of $M(G)$ is
\begin{align*}
r_{\vee^k M(G)}(X)
=
\min_{A\subset X}\left(|X\setminus A|+\sum_{i=1}^k r(A)\right).
\end{align*}
Since all $k$ summands are the same graphic-matroid rank $r(A)$,
\begin{align*}
\sum_{i=1}^k r(A)=k r(A),
\end{align*}
so
\begin{align*}
r_{\vee^k M(G)}(X)
=
\min_{A\subset X}\left(|X\setminus A|+k r(A)\right).
\end{align*}
The set $X$ is independent in the union matroid exactly when
\begin{align*}
r_{\vee^k M(G)}(X)=|X|.
\end{align*}
The term with $A=\varnothing$ is
\begin{align*}
|X\setminus\varnothing|+k r(\varnothing)=|X|+k\cdot 0=|X|,
\end{align*}
so the minimum is equal to $|X|$ exactly when every term is at least $|X|$:
\begin{align*}
|X|\le |X\setminus A|+k r(A)
\quad \text{for every } A\subset X.
\end{align*}
For each $A\subset X$, the sets $A$ and $X\setminus A$ are disjoint and have union $X$, hence
\begin{align*}
|X|=|A|+|X\setminus A|.
\end{align*}
Substituting this into the inequality gives
\begin{align*}
|A|+|X\setminus A|
\le
|X\setminus A|+k r(A).
\end{align*}
Subtracting $|X\setminus A|$ from both sides gives
\begin{align*}
|A|\le k r(A)
\quad \text{for every } A\subset X.
\end{align*}
It remains to write this condition in graph language. For $A\subset E$, let $V(A)$ be the set of vertices incident with an edge of $A$, and let $c(A)$ be the number of connected components of the graph with vertex set $V(A)$ and edge set $A$. In each connected component with vertex set $W$, a spanning tree has $|W|-1$ edges, so a largest forest in the subgraph $(V(A),A)$ has
\begin{align*}
\sum_{j=1}^{c(A)} (|W_j|-1)
=
\sum_{j=1}^{c(A)} |W_j|-\sum_{j=1}^{c(A)}1
=
|V(A)|-c(A)
\end{align*}
edges. Therefore
\begin{align*}
r(A)=|V(A)|-c(A).
\end{align*}
Substituting this rank value into $|A|\le k r(A)$ gives the explicit sparsity condition
\begin{align*}
|A|\le k\bigl(|V(A)|-c(A)\bigr)
\quad \text{for every } A\subset X.
\end{align*}
Thus $X$ can be decomposed into $k$ forests exactly when no edge subset $A\subset X$ has more edges than $k$ times the maximum size of a forest inside that subset. This is the matroidal form of an arboricity-type covering criterion.
[/example]
Matroid union therefore complements matroid intersection. Intersection models simultaneous constraints; union models decomposition into independent layers. Together they show that matroids do not merely explain greedy algorithms, but also support dual certificates and augmenting methods for more global combinatorial optimisation problems. Chapter 10 abstracts one more common feature of matroid rank, graph cuts, and related certificates: submodularity.
# 10. Submodular Functions and Discrete Convexity
Submodular functions are the set-function analogue of convex functions in continuous optimisation. Earlier chapters used cuts, matchings, flows, matroids, and duality-style certificates to obtain exact optimisation results; this chapter isolates a common diminishing-returns structure behind several of those examples. The main questions are how to recognise submodularity, why graph cuts and matroid ranks satisfy it, and why minimising a submodular function is an exact polynomial-time optimisation problem even when no graph is visible.
## Diminishing Returns for Set Functions
Many combinatorial optimisation problems attach a value to each subset of a finite ground set. The first question is: what algebraic inequality captures the idea that adding an item is less valuable after many items have already been chosen?
[definition: Set Function]
Let $E$ be a finite set. A set function on $E$ is a function $f:2^E \to \mathbb R$.
[/definition]
A set function is flexible enough to encode costs, ranks, capacities, penalties, and utilities. To compare the effect of adding one element in different contexts, we need a local measurement of how much the function changes under a single-element extension.
[definition: Marginal Value]
Let $f:2^E \to \mathbb R$, let $A \subset E$, and let $e \in E \setminus A$. The marginal value of $e$ at $A$ is
\begin{align*}
\Delta_f(e \mid A) = f(A \cup \{e\}) - f(A).
\end{align*}
[/definition]
Marginal values by themselves do not impose any consistency between small and large sets. The discrete convexity condition we need says that once more elements are already present, the same new element cannot become more valuable.
[definition: Submodular Set Function]
Let $E$ be a finite set. A set function $f:2^E \to \mathbb R$ is submodular if for all $A,B \subset E$,
\begin{align*}
f(A) + f(B) \ge f(A \cup B) + f(A \cap B).
\end{align*}
[/definition]
The inequality says that the overlap between $A$ and $B$ has been counted twice on the left, and submodularity prevents the union from gaining more than the two separate sets justify. Since the definition quantifies over all pairs of subsets, we next need a single-element test that is easier to apply in examples.
[quotetheorem:5816]
[citeproof:5816]
This theorem is the discrete counterpart of decreasing derivative for concave functions, but here it is used inside minimisation theory. The requirement $e\notin B$ is part of the marginal-value setup: it compares the same genuine single-element extension in two contexts. If it were dropped, the expression $\Delta_f(e\mid B)$ would no longer be defined by the preceding definition; forcing it to be zero by convention would make the test depend on an artificial convention rather than on the submodular inequality. The sign convention also matters because the reversed inequality describes increasing returns, so we also name the opposite class.
[definition: Supermodular Set Function]
Let $E$ be a finite set. A set function $f:2^E \to \mathbb R$ is supermodular if for all $A,B \subset E$,
\begin{align*}
f(A) + f(B) \le f(A \cup B) + f(A \cap B).
\end{align*}
[/definition]
Thus $f$ is supermodular exactly when $-f$ is submodular. To calibrate both notions, we need the boundary case where the inequality has no curvature at all.
[definition: Modular Set Function]
Let $E$ be a finite set. A set function $f:2^E \to \mathbb R$ is modular if there are constants $c \in \mathbb R$ and $w_e \in \mathbb R$ for each $e \in E$ such that
\begin{align*}
f(A)=c+\sum_{e\in A} w_e
\end{align*}
for every $A \subset E$.
[/definition]
Modular functions satisfy the submodular inequality with equality for every pair $A,B$. They are useful test cases because submodular functions can be viewed as functions whose increments are allowed to fall below, but not rise above, the modular pattern.
[example: Checking A Three Element Function]
Let $E=\{1,2,3\}$ and define $f(A)=\min\{|A|,2\}$. If $e\in E\setminus A$, then $|A\cup\{e\}|=|A|+1$, so the marginal value is
\begin{align*}
\Delta_f(e\mid A)=f(A\cup\{e\})-f(A)=\min\{|A|+1,2\}-\min\{|A|,2\}.
\end{align*}
For the three possible sizes of $A$ with $e\notin A$, this gives
\begin{align*}
|A|=0:\quad \Delta_f(e\mid A)=\min\{1,2\}-\min\{0,2\}=1-0=1.
\end{align*}
\begin{align*}
|A|=1:\quad \Delta_f(e\mid A)=\min\{2,2\}-\min\{1,2\}=2-1=1.
\end{align*}
\begin{align*}
|A|=2:\quad \Delta_f(e\mid A)=\min\{3,2\}-\min\{2,2\}=2-2=0.
\end{align*}
Now let $A\subset B\subset E$ and $e\in E\setminus B$. Since $E$ has three elements, $|B|\le 2$. If $|B|=0$ or $|B|=1$, then $\Delta_f(e\mid B)=1$, and because $A\subset B$, the displayed cases give $\Delta_f(e\mid A)=1$ as well. If $|B|=2$, then $\Delta_f(e\mid B)=0$, while $\Delta_f(e\mid A)$ is either $1$ or $0$. Hence in every allowed case,
\begin{align*}
\Delta_f(e\mid A)\ge \Delta_f(e\mid B).
\end{align*}
Therefore $f$ is submodular by *Diminishing Returns Characterisation*. This example models a utility that gains one unit from each of the first two selected items and then saturates.
[/example]
The saturation example is a finite version of a recurring modelling pattern: coverage, access, and diversity objectives often have high early returns and smaller later returns. The next example gives this pattern a facility-location interpretation.
[example: Facility Location Diminishing Returns]
Let $E$ be a set of possible facility sites and let $C$ be a finite set of customers. For each customer $c\in C$ and site $e\in E$, let $a_{ce}\ge 0$ be the benefit of serving $c$ from site $e$, and define
\begin{align*}
f(A)=\sum_{c\in C}\max_{x\in A} a_{cx},
\end{align*}
with the maximum over the empty set taken to be $0$. For a fixed customer $c$, write
\begin{align*}
M_c(A)=\max_{x\in A}a_{cx},
\end{align*}
again with $M_c(\varnothing)=0$. If $e\notin A$, then the best value for customer $c$ after adding $e$ is
\begin{align*}
M_c(A\cup\{e\})=\max\{M_c(A),a_{ce}\}.
\end{align*}
Hence the contribution of customer $c$ to the marginal value of adding $e$ is
\begin{align*}
M_c(A\cup\{e\})-M_c(A)=\max\{0,a_{ce}-M_c(A)\}.
\end{align*}
Summing over customers gives
\begin{align*}
\Delta_f(e\mid A)=\sum_{c\in C}\max\{0,a_{ce}-M_c(A)\}.
\end{align*}
Now let $A\subset B\subset E$ and let $e\in E\setminus B$. For each customer $c\in C$, every site in $A$ is also in $B$, so
\begin{align*}
M_c(A)\le M_c(B).
\end{align*}
Therefore
\begin{align*}
a_{ce}-M_c(A)\ge a_{ce}-M_c(B).
\end{align*}
Since $t\mapsto \max\{0,t\}$ is monotone increasing on $\mathbb R$, this implies
\begin{align*}
\max\{0,a_{ce}-M_c(A)\}\ge \max\{0,a_{ce}-M_c(B)\}.
\end{align*}
After summing over all $c\in C$, we obtain
\begin{align*}
\Delta_f(e\mid A)\ge \Delta_f(e\mid B).
\end{align*}
Thus $f$ is submodular by *Diminishing Returns Characterisation*. Also, if $A\subset B$, then $M_c(A)\le M_c(B)$ for every $c$, and therefore
\begin{align*}
f(A)\le f(B).
\end{align*}
So $f$ is monotone increasing as well: adding facilities never hurts, but each new facility matters only to the extent that it improves a customer's current best available service.
[/example]
## Cut Functions as Submodular Functions
Cuts were already central in flows and connectivity. The question here is whether their familiar uncrossing behaviour is an instance of the same submodular inequality.
[definition: Undirected Cut Function]
Let $G=(V,E)$ be a finite undirected graph with edge capacities $c_e\ge 0$. The cut capacity function is the set function $\delta:2^V\to \mathbb R$ defined, for $S\subset V$, by
\begin{align*}
\delta(S)=\sum_{uv\in E: |\{u,v\}\cap S|=1} c_{uv}.
\end{align*}
[/definition]
A cut counts edges crossing from a set to its complement. To connect this definition with the abstract theory, we need to prove that the uncrossing inequality for two cuts is exactly the submodular inequality.
[quotetheorem:5817]
[citeproof:5817]
This theorem explains the algebra behind cut uncrossing. The non-negativity of the capacities is essential for this weighted-sum proof: with a single negative-capacity edge, multiplying the edgewise submodular inequality by a negative number reverses the inequality. The theorem does not say that every submodular function is a graph cut function; cuts form a special symmetric class inside the larger theory. Its use in optimisation is that if two candidate cuts cross, replacing them by their union and intersection cannot increase the combined cut value, which is the local move behind many uncrossing arguments.
[example: Two Crossing Cuts In A Four Vertex Graph]
Let $G=K_4$ on vertices $\{1,2,3,4\}$ with every edge capacity equal to $1$, and take $A=\{1,2\}$ and $B=\{2,3\}$. By the definition of the cut function, $\delta(S)$ counts the edges with exactly one endpoint in $S$. For $A$, the crossing edges are
\begin{align*}
13,\ 14,\ 23,\ 24,
\end{align*}
so
\begin{align*}
\delta(A)=1+1+1+1=4.
\end{align*}
For $B$, the crossing edges are
\begin{align*}
12,\ 13,\ 24,\ 34,
\end{align*}
so
\begin{align*}
\delta(B)=1+1+1+1=4.
\end{align*}
Now
\begin{align*}
A\cup B=\{1,2,3\},\qquad A\cap B=\{2\}.
\end{align*}
The cut of $A\cup B$ consists of the edges from $\{1,2,3\}$ to $\{4\}$, namely
\begin{align*}
14,\ 24,\ 34,
\end{align*}
and therefore
\begin{align*}
\delta(A\cup B)=1+1+1=3.
\end{align*}
The cut of $A\cap B=\{2\}$ consists of
\begin{align*}
12,\ 23,\ 24,
\end{align*}
so
\begin{align*}
\delta(A\cap B)=1+1+1=3.
\end{align*}
Thus the submodular inequality for this pair reads
\begin{align*}
\delta(A)+\delta(B)=4+4=8\ge 3+3=\delta(A\cup B)+\delta(A\cap B).
\end{align*}
The strict gap is $8-6=2$: it is exactly the two counts of edge $13$, which crosses both $A$ and $B$ but crosses neither $A\cup B$ nor $A\cap B$.
[/example]
Directed cuts require a choice of orientation convention. The outgoing cut function is still submodular, but the symmetry $\delta(S)=\delta(V\setminus S)$ no longer holds.
[definition: Directed Outgoing Cut Function]
Let $D=(V,A)$ be a finite directed graph with arc capacities $c_a\ge 0$. The outgoing cut capacity function is the set function $\delta^+:2^V\to \mathbb R$ defined, for $S\subset V$, by
\begin{align*}
\delta^+(S)=\sum_{uv\in A: u\in S,\ v\notin S} c_{uv}.
\end{align*}
[/definition]
The proof is the same arcwise membership check, now with ordered endpoints. This version is the set-function shadow of the $s$-$t$ cut constraints that appeared in max-flow/min-cut duality.
[quotetheorem:5818]
[citeproof:5818]
Again, non-negative capacities are part of the statement rather than decoration; a negative arc weight would reverse its single-arc contribution and can destroy submodularity. The theorem does not restore the symmetry of undirected cuts, since generally $\delta^+(S)$ and $\delta^+(V\setminus S)$ measure different sets of arcs. What survives is the union-intersection inequality, and that is the part needed when directed cut constraints are uncrossed in flow and closure models.
## Matroid Rank and Discrete Convexity
Greedy algorithms for matroids worked because independence has an exchange property. The question now is how that exchange property appears as curvature of a rank function.
[definition: Matroid Rank Function]
Let $M=(E,\mathcal I)$ be a matroid. The rank function of $M$ is the set function $r:2^E\to \mathbb Z$ defined by
\begin{align*}
r(A)=\max\{|I|: I\subset A,\ I\in\mathcal I\}.
\end{align*}
[/definition]
The rank $r(A)$ records the largest independent part of $A$. Adding an element either raises the rank by one or leaves it unchanged, and exchange says that the set of elements still capable of raising rank shrinks as the ambient set grows.
[quotetheorem:5819]
[citeproof:5819]
Rank submodularity is the algebraic form of the exchange axiom. The matroid hypothesis is essential: take the downward-closed set system on $E=\{a,b,c\}$ with independent sets $\varnothing,\{a\},\{b\},\{c\},\{b,c\}$. Its rank function satisfies $r(\{a\})=r(\{a,b\})=r(\{a,c\})=1$ but $r(E)=2$, so
\begin{align*}
r(\{a,b\})+r(\{a,c\})=2<3=r(E)+r(\{a\}),
\end{align*}
violating submodularity. This is exactly the exchange failure: the independent set $\{a\}$ cannot be augmented by either element of the larger independent set $\{b,c\}$. The theorem does not say that every integer-valued submodular function is a matroid rank function; rank functions also have normalisation, monotonicity, and singleton increments bounded by $1$. It gives another way to recognise why greedy algorithms are possible for weighted matroid optimisation but not for arbitrary set systems.
[example: Rank In A Graphic Matroid]
Let $G=(V,E)$ be a finite graph, and let $M(G)$ be its graphic matroid, so a set of edges is independent exactly when it is a forest. For $F\subset E$, write $k(F)$ for the number of connected components of the spanning subgraph $(V,F)$. The rank is
\begin{align*}
r(F)=|V|-k(F).
\end{align*}
We verify diminishing returns for this rank function. Let $F\subset H\subset E$, and let $e=uv\in E\setminus H$. Adding the edge $e$ to $(V,F)$ has two possible effects. If $u$ and $v$ lie in different connected components of $(V,F)$, then $e$ merges those two components and no other component changes, so
\begin{align*}
k(F\cup\{e\})=k(F)-1.
\end{align*}
If $u$ and $v$ lie in the same connected component of $(V,F)$, then adding $e$ creates a cycle inside that component and does not change the component count, so
\begin{align*}
k(F\cup\{e\})=k(F).
\end{align*}
Therefore
\begin{align*}
\Delta_r(e\mid F)=r(F\cup\{e\})-r(F)=\bigl(|V|-k(F\cup\{e\})\bigr)-\bigl(|V|-k(F)\bigr)=k(F)-k(F\cup\{e\}).
\end{align*}
Thus $\Delta_r(e\mid F)=1$ exactly when $u$ and $v$ are in different components of $(V,F)$, and $\Delta_r(e\mid F)=0$ exactly when they are in the same component of $(V,F)$. The same calculation with $H$ in place of $F$ gives the corresponding rule for $\Delta_r(e\mid H)$.
If $\Delta_r(e\mid H)=1$, then $u$ and $v$ lie in different connected components of $(V,H)$. Since $F\subset H$, every path in $(V,F)$ is also a path in $(V,H)$, so $u$ and $v$ cannot be connected in $(V,F)$. Hence $\Delta_r(e\mid F)=1$. If $\Delta_r(e\mid H)=0$, then $\Delta_r(e\mid F)$ is either $0$ or $1$, and therefore it is still at least $\Delta_r(e\mid H)$. In all cases,
\begin{align*}
\Delta_r(e\mid F)\ge \Delta_r(e\mid H).
\end{align*}
Thus the graphic matroid rank function is submodular by *Diminishing Returns Characterisation*: once more edges have already been chosen, a fixed edge has fewer opportunities to merge two components.
[/example]
The graphic example keeps the integer matroid setting, where each element contributes at most one unit of rank. Many optimisation models need the same exchange-shaped inequalities for divisible resources or weighted capacity, so we introduce a broader rank-function definition that preserves normalisation, monotonicity, and submodularity.
[definition: Polymatroid Rank Function]
Let $E$ be a finite set. A polymatroid rank function is a set function $f:2^E\to \mathbb R$ such that $f(\varnothing)=0$, $f$ is monotone increasing, and $f$ is submodular.
[/definition]
Matroid rank functions are integer-valued polymatroid rank functions with all singleton ranks at most $1$. The general notion allows fractional resources while keeping the same diminishing-returns geometry.
## The Lovasz Extension
A set function lives on the vertices of the hypercube $\{0,1\}^E$. The question is how to see submodularity as an ordinary convexity property on the whole cube $[0,1]^E$.
[definition: Lovasz Extension]
Let $E=\{1,\dots,n\}$ and let $f:2^E\to \mathbb R$ satisfy $f(\varnothing)=0$. For $x\in \mathbb R^n$, choose an ordering $i_1,\dots,i_n$ such that
\begin{align*}
x_{i_1}\ge x_{i_2}\ge \dots \ge x_{i_n},
\end{align*}
and set $S_k=\{i_1,\dots,i_k\}$ for $1\le k\le n$, with $S_0=\varnothing$. The Lovasz extension of $f$ is the function $\hat f:\mathbb R^n\to \mathbb R$ defined by
\begin{align*}
\hat f(x)=\sum_{k=1}^n x_{i_k}\bigl(f(S_k)-f(S_{k-1})\bigr).
\end{align*}
[/definition]
Although the definition uses a sorted order, equal coordinates do not create an ambiguity: exchanging tied adjacent elements does not change the value when the same level set is represented. The construction is linear on each region where the coordinate order is fixed, so a natural question is exactly when these linear regions fit together to form a convex function.
[example: Lovasz Extension Of A Cardinality Cap]
Let $E=\{1,2,3\}$ and let $f(A)=\min\{|A|,2\}$. For $x\in\mathbb R^3$, choose an ordering $i_1,i_2,i_3$ such that
\begin{align*}
x_{i_1}\ge x_{i_2}\ge x_{i_3}.
\end{align*}
Then $S_0=\varnothing$, $S_1=\{i_1\}$, $S_2=\{i_1,i_2\}$, and $S_3=E$. The three increments in the Lovasz extension are computed from the sizes of these sets:
\begin{align*}
f(S_1)-f(S_0)=\min\{1,2\}-\min\{0,2\}=1-0=1.
\end{align*}
\begin{align*}
f(S_2)-f(S_1)=\min\{2,2\}-\min\{1,2\}=2-1=1.
\end{align*}
\begin{align*}
f(S_3)-f(S_2)=\min\{3,2\}-\min\{2,2\}=2-2=0.
\end{align*}
Substituting these increments into the definition of the Lovasz extension gives
\begin{align*}
\hat f(x)=x_{i_1}\cdot 1+x_{i_2}\cdot 1+x_{i_3}\cdot 0.
\end{align*}
Thus
\begin{align*}
\hat f(x)=x_{i_1}+x_{i_2}.
\end{align*}
Since $x_{i_1}$ and $x_{i_2}$ are the two largest coordinates of $x$, this says that $\hat f(x)$ is the sum of the two largest coordinates. Equivalently,
\begin{align*}
\hat f(x)=\max\{x_1+x_2,\ x_1+x_3,\ x_2+x_3\},
\end{align*}
because choosing the two largest coordinates is the same as omitting the smallest coordinate.
Each function $x\mapsto x_i+x_j$ is linear. To see convexity directly, let $0\le \lambda\le 1$ and $x,y\in\mathbb R^3$. For each pair $i<j$,
\begin{align*}
\lambda(x_i+x_j)+(1-\lambda)(y_i+y_j)\le \lambda\max_{p<q}(x_p+x_q)+(1-\lambda)\max_{p<q}(y_p+y_q).
\end{align*}
Taking the maximum over $i<j$ on the left gives
\begin{align*}
\hat f(\lambda x+(1-\lambda)y)\le \lambda\hat f(x)+(1-\lambda)\hat f(y).
\end{align*}
So the Lovasz extension is a convex piecewise-linear function, reflecting that the cardinality cap has decreasing marginal increments $1,1,0$.
[/example]
The example raises a useful diagnostic question: can the discrete diminishing-returns condition be detected by ordinary convexity after extending the function from subsets to the cube? If so, submodular minimisation can be compared with convex minimisation instead of treated as arbitrary search over all subsets.
[quotetheorem:5820]
[citeproof:5820]
This criterion is best viewed as geometric intuition in this course rather than as an algorithm. The normalisation $f(\varnothing)=0$ fixes the homogeneous extension; without it, the displayed formula would need an additional constant term and the support-function interpretation would be misstated. The theorem does not say that every convex piecewise-linear function comes from a set function, only that this particular extension detects submodularity. It explains why minimising submodular functions resembles convex minimisation more than arbitrary search over $2^E$.
[illustration:co-lovasz-extension-two-elements]
## Submodular Minimisation
A convex function can often be minimised without enumerating all possible points. The discrete analogue asks whether a submodular function can be minimised over all subsets of a finite ground set without checking all $2^{|E|}$ subsets.
[definition: Submodular Minimisation Problem]
Let $E$ be a finite set and let $f:2^E\to \mathbb R$ be submodular. The submodular minimisation problem is to find a subset $A\subset E$ such that
\begin{align*}
f(A)=\min_{S\subset E} f(S).
\end{align*}
[/definition]
The problem is not restricted to monotone functions. In fact, a monotone increasing submodular function has the immediate minimiser $\varnothing$, so the interesting cases include cuts with terminal constraints encoded by penalties, symmetric functions, and functions formed by adding modular terms.
[remark: Submodular Minimisation Theorem]
There is a polynomial-time algorithm which, given a finite set $E$ and value-oracle access to an integer-valued submodular function $f:2^E\to\mathbb Z$ whose values have polynomially bounded encoding length, returns a minimiser of $f$ over $2^E$.
[/remark]
This theorem is used here as a structural result linking discrete convexity to exact optimisation, before Chapter 11 recasts several exact optimisation results as statements about integral polyhedra. The named polynomial algorithms of Groetschel-Lovasz-Schrijver, Iwata-Fleischer-Fujishige, and Schrijver show that the result is algorithmic, not merely existential. The integer-valued and polynomial encoding assumptions matter for the oracle statement, since an oracle returning values with unbounded bit-length would not give a polynomial input model. The theorem does not imply that maximising a submodular function is polynomial-time solvable, and it does not give the specialised speed of max-flow algorithms for graph cuts. Its proof uses the same philosophy as the ellipsoid method and linear optimisation over separation oracles: submodularity supplies enough inequality structure to certify optimality without listing all subsets.
[example: Minimum Graph Cut As Submodular Minimisation]
Let $G=(V,E)$ be an undirected graph with non-negative capacities $c_e\ge 0$, and let $s,t\in V$ be distinct. For the cut function $\delta$, the sets $\varnothing$ and $V$ have no edge with exactly one endpoint inside the set, so
\begin{align*}
\delta(\varnothing)=0,\qquad \delta(V)=0.
\end{align*}
For every $S\subset V$, each summand in $\delta(S)$ is a non-negative capacity, hence
\begin{align*}
\delta(S)=\sum_{uv\in E:\ |\{u,v\}\cap S|=1} c_{uv}\ge 0.
\end{align*}
Therefore the unconstrained cut function is minimised by $\varnothing$ and $V$, which is why an $s$-$t$ cut requires the constraints $s\in S$ and $t\notin S$ or an equivalent penalty.
One way to encode those constraints is to choose
\begin{align*}
C=\sum_{e\in E}c_e
\end{align*}
and take a number $M>C$. Define
\begin{align*}
p(S)=M\mathbf 1_{\{s\notin S\}}+M\mathbf 1_{\{t\in S\}}.
\end{align*}
Since
\begin{align*}
\mathbf 1_{\{s\notin S\}}=1-\mathbf 1_{\{s\in S\}},
\end{align*}
we can rewrite
\begin{align*}
p(S)=M-M\mathbf 1_{\{s\in S\}}+M\mathbf 1_{\{t\in S\}}.
\end{align*}
Thus $p$ is modular: it is a constant plus one weight attached to membership of $s$ and one weight attached to membership of $t$. Now set
\begin{align*}
F(S)=\delta(S)+p(S).
\end{align*}
The function $\delta$ is submodular by *[Submodularity of the Cut Capacity Function](/theorems/5817)*, and adding the modular function $p$ preserves submodularity by *[Basic Closure Properties of Submodular Functions](/theorems/5822)*, so $F$ is submodular.
If $S$ satisfies $s\in S$ and $t\notin S$, then $p(S)=0$, so
\begin{align*}
F(S)=\delta(S)\le C.
\end{align*}
For instance, the feasible set $\{s\}$ gives $F(\{s\})=\delta(\{s\})\le C$. If $S$ violates at least one of the two conditions, then $p(S)\ge M$, and since $\delta(S)\ge 0$,
\begin{align*}
F(S)=\delta(S)+p(S)\ge M>C.
\end{align*}
Thus every minimiser of $F$ must satisfy $s\in S$ and $t\notin S$, and among such sets minimising $F$ is exactly the same as minimising $\delta$. This shows how the minimum $s$-$t$ cut problem fits the submodular minimisation framework: the cut term supplies submodularity, while the modular penalty removes the unconstrained minimisers $\varnothing$ and $V$.
[/example]
Submodular minimisation is a unifying theorem because it includes examples that do not present themselves as graph cuts. The next example shows how a small table can be checked directly before invoking the theorem.
[example: Minimising A Small Submodular Table]
Let $E=\{a,b,c\}$ and define
\begin{align*}
g(S)=\min\{|S|,2\},\qquad m(S)=-2\mathbf 1_{\{b\in S\}}.
\end{align*}
Then $m$ is modular, because it can be written as
\begin{align*}
m(S)=0+0\cdot \mathbf 1_{\{a\in S\}}-2\cdot \mathbf 1_{\{b\in S\}}+0\cdot \mathbf 1_{\{c\in S\}}.
\end{align*}
For the eight subsets of $E$, the values of $g(S)+m(S)$ are:
\begin{align*}
g(\varnothing)+m(\varnothing)=\min\{0,2\}+0=0.
\end{align*}
\begin{align*}
g(\{a\})+m(\{a\})=\min\{1,2\}+0=1.
\end{align*}
\begin{align*}
g(\{b\})+m(\{b\})=\min\{1,2\}-2=1-2=-1.
\end{align*}
\begin{align*}
g(\{c\})+m(\{c\})=\min\{1,2\}+0=1.
\end{align*}
\begin{align*}
g(\{a,b\})+m(\{a,b\})=\min\{2,2\}-2=2-2=0.
\end{align*}
\begin{align*}
g(\{a,c\})+m(\{a,c\})=\min\{2,2\}+0=2.
\end{align*}
\begin{align*}
g(\{b,c\})+m(\{b,c\})=\min\{2,2\}-2=2-2=0.
\end{align*}
\begin{align*}
g(E)+m(E)=\min\{3,2\}-2=2-2=0.
\end{align*}
Thus the displayed table is exactly the set function
\begin{align*}
f(S)=g(S)+m(S)=\min\{|S|,2\}-2\mathbf 1_{\{b\in S\}}.
\end{align*}
The cardinality-cap function $g$ is submodular by the preceding cardinality-cap example, and adding the modular function $m$ preserves submodularity by *Basic Closure Properties of Submodular Functions*. Hence $f$ is submodular.
To minimise $f$, compare the listed values. The singleton $\{b\}$ has
\begin{align*}
f(\{b\})=-1,
\end{align*}
while every other subset has value $0$, $1$, or $2$. Therefore the minimum value is
\begin{align*}
\min_{S\subset E} f(S)=-1,
\end{align*}
and it is attained uniquely at $S=\{b\}$. The modular preference for including $b$ moves the minimiser away from $\varnothing$, while the submodular curvature is preserved.
[/example]
The theorem is a minimisation result, not a maximisation result. Maximising a monotone submodular function under constraints is a different topic, where greedy algorithms usually give approximation guarantees rather than exact optima.
[remark: Minimisation Versus Maximisation]
Submodular minimisation is polynomial-time solvable in broad oracle models, while submodular maximisation includes hard coverage-type problems. This contrast matches continuous optimisation: convex minimisation is the tractable direction, and concave maximisation over convex feasible regions is the parallel tractable direction after changing signs.
[/remark]
## Closure Properties and Modelling Patterns
To use submodularity in a new problem, we need rules for building new examples from known ones. Which operations preserve the diminishing-returns inequality?
The examples above combine several ingredients: graph boundary terms, rank-like terms, and linear preferences. Before building larger models, we need closure rules showing that these combinations keep the same discrete convexity.
[quotetheorem:5822]
[citeproof:5822]
These closure properties explain why examples in applications are often sums of graph terms, rank terms, and linear penalties. The non-negativity of $\alpha$ and $\beta$ is essential: multiplying a submodular function by a negative scalar turns it into a supermodular function instead. The result does not say that arbitrary nonlinear transformations preserve submodularity; even squaring a non-negative submodular function can create increasing marginal returns. The closure rules give a quick way to test whether a proposed model has the right curvature before invoking a minimisation theorem.
[example: Cut Plus Linear Preference]
Let $G=(V,E)$ have non-negative capacities, and let $w_v\in\mathbb R$ for each $v\in V$. Define
\begin{align*}
f(S)=\delta(S)+\sum_{v\in S}w_v.
\end{align*}
Write
\begin{align*}
m(S)=\sum_{v\in S}w_v.
\end{align*}
Then $m$ is modular, since it has the form
\begin{align*}
m(S)=0+\sum_{v\in S}w_v.
\end{align*}
We verify the submodular inequality. Let $A,B\subset V$. First,
\begin{align*}
f(A)+f(B)=\delta(A)+m(A)+\delta(B)+m(B).
\end{align*}
Reordering the four real numbers gives
\begin{align*}
f(A)+f(B)=\delta(A)+\delta(B)+m(A)+m(B).
\end{align*}
By *Submodularity of the Cut Capacity Function*,
\begin{align*}
\delta(A)+\delta(B)\ge \delta(A\cup B)+\delta(A\cap B).
\end{align*}
For the modular term, expand the two sides by vertex membership. A vertex in $A\cap B$ contributes $2w_v$ to $m(A)+m(B)$, a vertex in exactly one of $A$ and $B$ contributes $w_v$, and a vertex outside $A\cup B$ contributes $0$. The same contributions occur in $m(A\cup B)+m(A\cap B)$: vertices in $A\cap B$ appear once in $A\cup B$ and once in $A\cap B$, vertices in exactly one of $A$ and $B$ appear only in $A\cup B$, and vertices outside $A\cup B$ appear in neither set. Hence
\begin{align*}
m(A)+m(B)=m(A\cup B)+m(A\cap B).
\end{align*}
Combining the cut inequality with this equality gives
\begin{align*}
f(A)+f(B)\ge \delta(A\cup B)+\delta(A\cap B)+m(A\cup B)+m(A\cap B).
\end{align*}
Using $f(T)=\delta(T)+m(T)$ for $T=A\cup B$ and $T=A\cap B$, this becomes
\begin{align*}
f(A)+f(B)\ge f(A\cup B)+f(A\cap B).
\end{align*}
Thus $f$ is submodular.
This model balances a boundary cost against vertex preferences: negative weights can reward including selected vertices, so a minimum may choose a non-empty proper set even though the cut function alone is minimised by $\varnothing$ and $V$.
[/example]
A practical check is to search for increasing marginal returns. Finding one violation rules out submodularity, while proving every marginal decreases certifies it.
[example: A Non-Submodular Table]
Let $E=\{a,b\}$ and define
\begin{align*}
f(\varnothing)=0,\qquad f(\{a\})=0,\qquad f(\{b\})=0,\qquad f(\{a,b\})=1.
\end{align*}
For the submodular inequality with $A=\{a\}$ and $B=\{b\}$, we have
\begin{align*}
A\cup B=\{a,b\},\qquad A\cap B=\varnothing.
\end{align*}
Thus
\begin{align*}
f(A)+f(B)=f(\{a\})+f(\{b\})=0+0=0,
\end{align*}
while
\begin{align*}
f(A\cup B)+f(A\cap B)=f(\{a,b\})+f(\varnothing)=1+0=1.
\end{align*}
So
\begin{align*}
f(A)+f(B)=0<1=f(A\cup B)+f(A\cap B),
\end{align*}
which violates the submodular inequality. Hence $f$ is not submodular.
The same failure is visible through marginal values. Adding $b$ to the empty set gives
\begin{align*}
\Delta_f(b\mid \varnothing)=f(\{b\})-f(\varnothing)=0-0=0,
\end{align*}
but adding $b$ after $a$ is already present gives
\begin{align*}
\Delta_f(b\mid \{a\})=f(\{a,b\})-f(\{a\})=1-0=1.
\end{align*}
The marginal value has increased from $0$ to $1$ after enlarging the starting set from $\varnothing$ to $\{a\}$. For the supermodular inequality, the only incomparable pair is again $\{a\}$ and $\{b\}$, and there the displayed calculation gives $0\le 1$; when one set contains the other, both sides are equal. Thus $f$ is supermodular, representing complementarity between $a$ and $b$.
[/example]
The chapter's organising principle is that submodularity is the discrete convexity condition most compatible with exact minimisation. Cuts, matroid ranks, saturated coverage utilities, and modular perturbations all fit into this framework, giving a common language for uncrossing, exchange, and polynomial-time optimisation over subsets.
# 11. Polyhedral Combinatorics
Polyhedral combinatorics translates finite optimisation problems into questions about convex sets in Euclidean space. Earlier chapters used augmenting paths, cuts, potentials, and exchange arguments as certificates; here those certificates are seen as linear programming dual objects and faces of polyhedra. The guiding question is when an integer optimisation problem can be solved by its linear relaxation without losing integrality.
## Convex Hulls of Feasible Solutions
How can a discrete family of feasible solutions be represented so that linear programming can see the exact optimum? The basic move is to encode each feasible solution by an incidence vector and then take the convex hull of all such vectors.
[definition: Incidence Vector]
Let $E$ be a finite set and let $S \subset E$. The incidence vector of $S$ is the vector $\mathbf{1}_S \in \mathbb R^E$ whose $e$-coordinate is $1$ if $e \in S$ and $0$ if $e \notin S$.
[/definition]
Incidence vectors allow a set system to become a subset of $\mathbb R^E$. To make this encoding useful for linear programming, we need the smallest convex set containing exactly the encoded feasible solutions.
[definition: Feasible Solution Polytope]
Let $E$ be a finite set and let $\mathcal F \subset 2^E$ be a family of feasible subsets. The feasible solution polytope of $\mathcal F$ is
\begin{align*} P(\mathcal F) = \operatorname{conv}\{\mathbf{1}_S : S \in \mathcal F\} \subset \mathbb R^E. \end{align*}
[/definition]
The polytope $P(\mathcal F)$ is the exact linear programming model for the combinatorial problem, but exactness has to mean more than containing the feasible incidence vectors. The essential question is whether a linear objective can improve by moving to a convex combination of feasible vectors instead of staying at one actual feasible solution.
[quotetheorem:5823]
[citeproof:5823]
This theorem explains why convex hulls are conceptually perfect formulations: they preserve every linear objective because linear functions cannot gain by passing to convex combinations. The hypothesis that we use the actual convex hull is essential; replacing it by a larger set can introduce fractional points with better objective value than any feasible incidence vector, as the triangle matching example later shows. The theorem does not provide an efficient description of $P(\mathcal F)$, since the convex hull may have exponentially many facets or may be computationally hard to separate over. In practice we therefore replace $P(\mathcal F)$ by a larger polyhedron described by simple inequalities, and then ask whether the enlargement changes the optimum.
[definition: Linear Relaxation]
Let $\mathcal F \subset 2^E$ and let $Q \subset \mathbb R^E$ be a polyhedron. The polyhedron $Q$ is a linear relaxation of $P(\mathcal F)$ if
\begin{align*} P(\mathcal F) \subset Q. \end{align*}
[/definition]
A relaxation is useful when it is compact enough to optimise over and tight enough to preserve the integer optimum for the objective functions of interest. Without an integrality condition, even the one-variable relaxation
\begin{align*}
2x &\le 1, &
x &\ge 0
\end{align*}
has the fractional optimum $x=1/2$ for the objective $\max x$, despite having integral coefficients and an integral right-hand side. To state the strongest such guarantee, we use the convex-hull definition of an integral polyhedron.
[definition: Integral Polyhedron]
A polyhedron $P \subset \mathbb R^n$ is integral if
\begin{align*} P = \operatorname{conv}(P \cap \mathbb Z^n). \end{align*}
[/definition]
Equivalently, every finite linear optimum over $P$ has an integral optimal solution. For rational polytopes this is the same as saying that every vertex is integral; for unbounded rational polyhedra, the convex-hull formulation is the clean way to include faces and recession directions without forcing every point of a positive-dimensional face to be integral.
[example: Shortest Path Relaxation]
For non-negative arc costs $c_a\ge 0$, the shortest-path relaxation minimizes
\begin{align*}
\sum_{a\in A} c_a x_a
\end{align*}
over all $x\in\mathbb R^A$ satisfying the following balances: at $s$ the net outflow is $1$, at $t$ the net outflow is $-1$, at every other vertex the net outflow is $0$, and $x_a\ge 0$ for every arc $a$. Explicitly,
\begin{align*}
\sum_{a\in \delta^+(s)}x_a-\sum_{a\in \delta^-(s)}x_a=1.
\end{align*}
\begin{align*}
\sum_{a\in \delta^+(t)}x_a-\sum_{a\in \delta^-(t)}x_a=-1.
\end{align*}
\begin{align*}
\sum_{a\in \delta^+(v)}x_a-\sum_{a\in \delta^-(v)}x_a=0 \quad\text{for every }v\in V\setminus\{s,t\}.
\end{align*}
\begin{align*}
x_a\ge 0 \quad\text{for every }a\in A.
\end{align*}
Let $x$ be an integral feasible solution. Since the net outflow at $s$ is $1$, some outgoing arc from $s$ has positive $x$-value unless there is compensating inflow, in which case the total outgoing flow is still positive. Starting at $s$, follow an outgoing arc $a$ with $x_a>0$. Whenever the walk enters a vertex $v\ne t$, the balance equation gives
\begin{align*}
\sum_{a\in \delta^+(v)}x_a=\sum_{a\in \delta^-(v)}x_a.
\end{align*}
Because at least one incoming arc used by the walk has positive value, the right-hand side is positive, so the left-hand side is positive and there is an outgoing arc from $v$ with positive value.
Since $V$ is finite, this tracing process either reaches $t$ or repeats a vertex. If it repeats a vertex, the repeated segment is a directed cycle $C$ and every arc of $C$ has $x_a\ge 1$, because $x$ is integral and positive on those arcs. Replacing $x$ by $x-\mathbf 1_C$ preserves every balance equation: at each vertex of $C$ one unit is removed from exactly one incoming arc and exactly one outgoing arc, while vertices outside $C$ are unchanged. The new vector is still non-negative, and its cost changes by
\begin{align*}
\sum_{a\in A}c_a(x_a-\mathbf 1_C(a))-\sum_{a\in A}c_a x_a=-\sum_{a\in C}c_a\le 0.
\end{align*}
Each cycle deletion reduces $\sum_{a\in A}x_a$ by $|C|$, so finitely many deletions remove all positive directed cycles. In the remaining integral feasible solution, tracing positive arcs from $s$ cannot repeat a vertex, hence must reach $t$ and produce a directed $s$-$t$ path $P$. The incidence vector $\mathbf 1_P$ has net outflow $1$ at $s$, net outflow $-1$ at $t$, and net outflow $0$ at every internal vertex of $P$ and every vertex outside $P$.
Subtract $\mathbf 1_P$ from the remaining cycle-free solution. The result is a non-negative integral vector with net outflow $0$ at every vertex, since both vectors have the same $s$-$t$ balance. If this residual vector were non-zero, following a positive arc from any vertex incident with positive residual flow would eventually repeat a vertex, producing a positive directed cycle. That contradicts the cycle-free construction, so the residual vector is $0$. Thus the cycle-free solution is exactly $\mathbf 1_P$.
Therefore any integral feasible solution can be converted, without increasing cost, into the incidence vector of an $s$-$t$ path. When the relaxation has an integral optimum, the linear program recovers an actual shortest path rather than only a fractional flow.
[/example]
## Total Unimodularity and Integral Polyhedra
Which simple matrix conditions force all linear relaxations of the form $Ax \le b$ to have integral vertices whenever $b$ is integral? Total unimodularity is the main answer used in this course.
[definition: Totally Unimodular Matrix]
A matrix $A \in \mathbb Z^{m \times n}$ is totally unimodular if every square submatrix of $A$ has determinant in $\{-1,0,1\}$.
[/definition]
The definition is restrictive but stable under many operations that occur in combinatorial models, including taking submatrices, duplicating rows with a sign change, and adjoining identity matrices. Its power comes from [Cramer's rule](/theorems/3305) at a basic feasible solution.
[quotetheorem:5824]
[citeproof:5824]
The theorem turns determinant checks into integrality proofs: once the constraint matrix is totally unimodular, integral right-hand sides cannot create fractional basic feasible solutions. The integrality of $b$ is necessary, since even the constraint $x \le 1/2$ has a fractional unique optimum for maximizing $x$. Total unimodularity is also a sufficient condition for this particular inequality form, not a claim that every integral polyhedron must be presented by a matrix whose total unimodularity is visible from the written constraints. We next record the standard structural criterion for network matrices, since incidence matrices are the source of most examples in matching and flow.
[quotetheorem:5825]
[citeproof:5825]
This proof is a model for recognising total unimodularity: find a structural reason that every determinant either expands to a smaller one or vanishes. The sign pattern in each column is essential; an undirected vertex-edge incidence matrix can contain a triangle submatrix with determinant $2$, which is exactly the obstruction later visible in non-bipartite matching. The theorem does not say that every matrix built from a graph is totally unimodular, only that directed incidence matrices have the special cancellation property needed by the determinant test. The general theorem below is often quoted as the complete bridge between total unimodularity and integral right-hand sides.
[remark: Hoffman-Kruskal Theorem]
Let $A \in \mathbb Z^{m \times n}$. Then $A$ is totally unimodular if and only if, for every integral vector $b$ for which the polyhedron is non-empty,
\begin{align*}
\{x \in \mathbb R^n : Ax \le b,\ x \ge 0\}
\end{align*}
is integral.
[/remark]
The converse direction is more delicate than the basic Cramer's rule implication and is usually treated in a full course on integer programming. In this course we use the statement as a recognition principle: total unimodularity is exactly what makes all integral right-hand sides behave well for these standard inequalities. The quantifier over all integral right-hand sides is part of the theorem; it is stronger than checking that one particular polyhedron happens to be integral under one chosen presentation. A single integral polyhedron can often be written in many different ways, and Hoffman-Kruskal concerns the matrix form $Ax\le b$, $x\ge 0$ uniformly over integral choices of $b$.
[example: A Non-Totally Unimodular Constraint]
The single-row matrix $A=(2)$ is not totally unimodular: its only square submatrix is the $1\times 1$ matrix $(2)$, and
\begin{align*}
\det(2)=2\notin\{-1,0,1\}.
\end{align*}
Take the integral right-hand side $b=1$. The relaxation consists of the inequalities $2x\le 1$ and $x\ge 0$. Dividing $2x\le 1$ by the positive number $2$ gives
\begin{align*}
x\le \frac12.
\end{align*}
Together with $x\ge 0$, this says that the feasible region is the interval
\begin{align*}
0\le x\le \frac12.
\end{align*}
The endpoint $x=\frac12$ is a vertex of this interval. Equivalently, it is the unique feasible point where the upper constraint is tight, since
\begin{align*}
2x=1 \quad\Longleftrightarrow\quad x=\frac12.
\end{align*}
Thus an integral matrix and an integral right-hand side can still produce a fractional vertex when the matrix is not totally unimodular. The coefficient $2$ appears in the denominator of the basic solution, which is exactly the obstruction that total unimodularity rules out.
[/example]
## Matching Polytopes
How does the polyhedral viewpoint distinguish bipartite matching, where the simple degree constraints suffice, from general matching, where odd cycles create fractional vertices? This section revisits matching through its LP relaxation.
[definition: Matching Polytope Relaxation]
Let $G=(V,E)$ be a finite simple undirected graph. The standard matching relaxation is the polytope
\begin{align*} Q_M(G)=\left\{x \in \mathbb R^E : \sum_{e \in \delta(v)} x_e \le 1\text{ for all }v \in V,\ x_e \ge 0\text{ for all }e \in E\right\}. \end{align*}
[/definition]
The inequalities say that at most one chosen edge touches each vertex. For an integral vector, this is exactly the condition of being a matching, so the next issue is whether fractional vertices can occur.
[quotetheorem:5827]
[citeproof:5827]
This recovers the LP explanation for why bipartite matching has strong duality with vertex cover and admits exact polynomial-time algorithms. Bipartiteness is not cosmetic: it is what lets the degree matrix be converted into a directed incidence matrix by changing signs of rows. In a triangle the same degree constraints have the fractional point with all three edge variables equal to $1/2$, so the theorem does not extend to arbitrary graphs. The obstruction outside the bipartite case is already visible on that triangle, and examining it tells us what kind of inequality is missing.
[example: Fractional Triangle Matching]
Let $G=K_3$ with vertices $1,2,3$ and edges $e_{12},e_{23},e_{31}$. Define $x\in\mathbb R^E$ by
\begin{align*}
x_{e_{12}}=x_{e_{23}}=x_{e_{31}}=\frac12 .
\end{align*}
The degree constraint at vertex $1$ is
\begin{align*}
x_{e_{12}}+x_{e_{31}}=\frac12+\frac12=1\le 1.
\end{align*}
At vertex $2$,
\begin{align*}
x_{e_{12}}+x_{e_{23}}=\frac12+\frac12=1\le 1.
\end{align*}
At vertex $3$,
\begin{align*}
x_{e_{23}}+x_{e_{31}}=\frac12+\frac12=1\le 1.
\end{align*}
Also $x_e=1/2\ge 0$ for every edge $e$, so $x\in Q_M(K_3)$.
For unit weights $w_e=1$, the relaxation value at this point is
\begin{align*}
w\cdot x=x_{e_{12}}+x_{e_{23}}+x_{e_{31}}=\frac12+\frac12+\frac12=\frac32.
\end{align*}
Every two distinct edges of $K_3$ share a vertex, so a matching in $K_3$ can contain at most one edge. Hence every matching incidence vector has unit-weight value at most $1$. The relaxation therefore has a feasible fractional point with value $3/2>1$, so the standard matching relaxation is not integral for this non-bipartite graph.
[/example]
The triangle shows that degree constraints do not see the parity obstruction inside an odd set. We therefore add inequalities expressing that an odd set of vertices cannot contain a matching that covers more than all but one of its vertices.
[definition: Odd-Set Matching Inequalities]
Let $G=(V,E)$ be a finite simple undirected graph. For every odd set $S \subset V$ with $|S| \ge 3$, the odd-set matching inequality is
\begin{align*} \sum_{e \in E(S)} x_e \le \frac{|S|-1}{2}, \end{align*}
where $E(S)$ is the set of edges with both endpoints in $S$.
[/definition]
These inequalities cut off the fractional triangle point, since they impose $x_{e_{12}}+x_{e_{23}}+x_{e_{31}} \le 1$ on $S=V$. They also point beyond total unimodularity: non-bipartite matching is integral, but its compact description requires a richer family of inequalities.
[remark: Edmonds Matching Polytope Theorem]
For a finite simple graph $G=(V,E)$, the convex hull of incidence vectors of matchings in $G$ is described by the degree inequalities $\sum_{e \in \delta(v)}x_e \le 1$ for all $v\in V$, non-negativity $x_e\ge 0$ for all $e\in E$, and all odd-set inequalities
\begin{align*}
\sum_{e \in E(S)}x_e \le \frac{|S|-1}{2}
\end{align*}
for odd $S\subset V$ with $|S|\ge 3$.
[/remark]
The derivation is not part of this chapter's core route; it uses the blossom structure underlying Edmonds' matching algorithm. We use the theorem to understand what the triangle example was signalling: bipartiteness eliminates odd-set obstructions, while general graphs require blossom inequalities.
[example: Adding the Triangle Odd-Set Inequality]
For $K_3$, write
\begin{align*}
x_{12}=x_{e_{12}},\qquad x_{23}=x_{e_{23}},\qquad x_{31}=x_{e_{31}}.
\end{align*}
The fractional point from the standard relaxation has
\begin{align*}
x_{12}+x_{23}+x_{31}
=\frac12+\frac12+\frac12
=\frac32,
\end{align*}
so it violates the odd-set inequality for $S=V$:
\begin{align*}
x_{12}+x_{23}+x_{31}\le 1.
\end{align*}
With this inequality and non-negativity, the feasible points satisfy
\begin{align*}
x_{12}\ge 0,\qquad x_{23}\ge 0,\qquad x_{31}\ge 0,\qquad
x_{12}+x_{23}+x_{31}\le 1.
\end{align*}
These inequalities already imply the three degree constraints, because
\begin{align*}
x_{12}+x_{31}\le x_{12}+x_{23}+x_{31}\le 1,
\end{align*}
and the same argument gives
\begin{align*}
x_{12}+x_{23}\le 1,\qquad x_{23}+x_{31}\le 1.
\end{align*}
Thus the feasible region is the simplex
\begin{align*}
\operatorname{conv}\{(0,0,0),(1,0,0),(0,1,0),(0,0,1)\}.
\end{align*}
Indeed, for any feasible $x$, setting
\begin{align*}
\lambda_{12}=x_{12},\qquad
\lambda_{23}=x_{23},\qquad
\lambda_{31}=x_{31},\qquad
\lambda_0=1-x_{12}-x_{23}-x_{31}
\end{align*}
gives non-negative coefficients with
\begin{align*}
\lambda_0+\lambda_{12}+\lambda_{23}+\lambda_{31}=1,
\end{align*}
and
\begin{align*}
x=\lambda_0(0,0,0)+\lambda_{12}(1,0,0)+\lambda_{23}(0,1,0)+\lambda_{31}(0,0,1).
\end{align*}
Therefore the only vertices are the zero vector and the three unit edge vectors. The triangle odd-set inequality removes the fractional point $(1/2,1/2,1/2)$ and leaves exactly the incidence vectors of matchings in $K_3$.
[/example]
## Flow Polytopes
Why do max-flow algorithms return integral flows when all capacities are integral? The polyhedral answer is that the constraint matrix is built from a directed incidence matrix, so the relevant flow polyhedron is integral.
[definition: Flow Polyhedron]
Let $D=(V,A)$ be a directed graph with source $s$, sink $t$, and capacities $u_a \in \mathbb Z_{\ge 0}$ for $a \in A$. The flow polyhedron from $s$ to $t$ consists of pairs $(x,\nu) \in \mathbb R^A \times \mathbb R$ satisfying flow conservation with value $\nu$, namely net outflow $\nu$ at $s$, net outflow $-\nu$ at $t$, net outflow $0$ at every other vertex, and capacity constraints $0 \le x_a \le u_a$ for all $a \in A$.
[/definition]
The scalar $\nu$ records the value of the flow. Maximising $\nu$ over this polyhedron is the linear programming version of the maximum flow problem, so integrality of this polyhedron would explain the integral outputs of max-flow algorithms.
[quotetheorem:5829]
[citeproof:5829]
This theorem is the LP form of the integrality statement behind Ford-Fulkerson and the max-flow min-cut theorem. Integral capacities are necessary for the stated conclusion: a single arc of capacity $1/2$ has maximum value $1/2$. The result also does not assert that every feasible flow is integral; it says that an integral optimal representative exists at a basic feasible solution. This is a useful bridge to linear programming duality, where the dual variables become cut certificates and the equality of max flow and min cut becomes strong duality with integral primal and dual optima.
[example: Incidence Matrix of a Directed Graph]
Consider the directed path $s\to v\to t$, with arcs $a=(s,v)$ and $b=(v,t)$. With rows ordered as $s,v,t$, the node-arc incidence matrix has
\begin{align*}
(M_{s,a},M_{v,a},M_{t,a})=(1,-1,0)
\end{align*}
and
\begin{align*}
(M_{s,b},M_{v,b},M_{t,b})=(0,1,-1),
\end{align*}
because each arc contributes $1$ at its tail, $-1$ at its head, and $0$ at vertices not incident with it.
The $1\times 1$ square subdeterminants are just the entries $1,0,-1,1,0,-1$, so each lies in $\{-1,0,1\}$. For the $2\times 2$ square submatrix using rows $s$ and $v$, the determinant is
\begin{align*}
M_{s,a}M_{v,b}-M_{s,b}M_{v,a}=1\cdot 1-0\cdot(-1)=1.
\end{align*}
Using rows $s$ and $t$ gives
\begin{align*}
M_{s,a}M_{t,b}-M_{s,b}M_{t,a}=1\cdot(-1)-0\cdot 0=-1.
\end{align*}
Using rows $v$ and $t$ gives
\begin{align*}
M_{v,a}M_{t,b}-M_{v,b}M_{t,a}=(-1)(-1)-1\cdot 0=1.
\end{align*}
Thus every square subdeterminant is $0$, $1$, or $-1$, in agreement with *[Total Unimodularity of Directed Node-Arc Incidence Matrices](/theorems/5825)*.
For a flow of value $\nu$, the balance equation at $s$ is
\begin{align*}
x_a=\nu.
\end{align*}
At $v$, conservation says
\begin{align*}
x_b-x_a=0.
\end{align*}
At $t$, the net outflow condition is
\begin{align*}
-x_b=-\nu.
\end{align*}
The equation $x_b-x_a=0$ gives $x_b=x_a$, and $x_a=\nu$ then gives
\begin{align*}
x_b=x_a=\nu.
\end{align*}
The sink equation becomes $-\nu=-\nu$, so it is consistent with the first two equations.
If the capacities are $u_a$ and $u_b$, feasibility requires
\begin{align*}
\nu=x_a\le u_a
\end{align*}
and
\begin{align*}
\nu=x_b\le u_b.
\end{align*}
Hence every feasible value satisfies $\nu\le \min\{u_a,u_b\}$, while setting
\begin{align*}
x_a=x_b=\min\{u_a,u_b\}
\end{align*}
satisfies the balance equations and both capacity constraints. Therefore the maximum flow value is $\min\{u_a,u_b\}$, which is an integer whenever the capacities are integral.
[/example]
## Matroid Polytopes
What is the polyhedral shadow of the exchange axiom from the matroid chapters? Matroid independence is another setting where a combinatorial optimisation problem has a clean linear description, and the greedy algorithm is reflected in the faces of a polytope.
[definition: Matroid Independence Polytope]
Let $M=(E,\mathcal I)$ be a matroid with rank function $r:2^E \to \mathbb Z_{\ge 0}$. The matroid independence polytope is
\begin{align*} P_I(M)=\left\{x \in \mathbb R^E : \sum_{e \in S}x_e \le r(S)\text{ for all }S \subset E,\ x_e \ge 0\text{ for all }e \in E\right\}. \end{align*}
[/definition]
The rank inequalities say that no subset receives more chosen elements than a matroid independent set could contain inside that subset. The content of the theorem is that these inequalities describe the convex hull exactly.
[quotetheorem:5830]
[citeproof:5830]
This result places matroids alongside flows and bipartite matchings: a combinatorial certificate, here exchange-greedy optimality, becomes an integrality statement. The rank axioms are essential; for an arbitrary set function $r$, the same inequalities need not describe the convex hull of any natural independence family. For instance, on $E=\{1,2\}$ set $r(\varnothing)=0$, $r(\{1\})=r(\{2\})=0$, and $r(E)=1$. The resulting inequalities force $x_1\le 0$ and $x_2\le 0$, so the polyhedron contains only the zero incidence vector, whereas the value $r(E)=1$ suggests that a one-element independent choice should be possible. This failure is exactly what matroid monotonicity rules out. The theorem also does not make the description automatically efficient, since there is one inequality for every subset of $E$ unless a separation oracle or special structure is available. Conceptually, it connects polyhedral combinatorics with submodular optimisation: the rank function is submodular, and the greedy proof is a polyhedral expression of that submodularity.
[example: Uniform Matroid Polytope]
For the uniform matroid $U_{k,n}$ on $E=\{1,\dots,n\}$, the rank function is $r(S)=\min\{|S|,k\}$. The rank inequality for $S=E$ is
\begin{align*}
\sum_{i=1}^n x_i \le r(E)=\min\{n,k\}=k,
\end{align*}
and the singleton inequalities give $x_i\le r(\{i\})\le 1$ for every $i$. Together with non-negativity, every point in the independence polytope satisfies
\begin{align*}
0\le x_i\le 1 \quad\text{for }1\le i\le n,
\qquad
\sum_{i=1}^n x_i\le k.
\end{align*}
Conversely, if these inequalities hold, then for every $S\subset E$ we have
\begin{align*}
\sum_{i\in S}x_i\le \sum_{i\in S}1=|S|
\end{align*}
and also
\begin{align*}
\sum_{i\in S}x_i\le \sum_{i=1}^n x_i\le k.
\end{align*}
Hence
\begin{align*}
\sum_{i\in S}x_i\le \min\{|S|,k\}=r(S),
\end{align*}
so the full system of rank inequalities is exactly the system above.
The vertices of this polytope are precisely the $0$-$1$ vectors with at most $k$ ones. First, any such vector is feasible, and it is fixed uniquely by the $n$ active constraints $x_i=0$ or $x_i=1$ according to its coordinates, so it is a vertex. Conversely, let $x$ be feasible and suppose some coordinate is fractional. If $\sum_i x_i<k$, choose a fractional coordinate $j$ and choose $\varepsilon>0$ small enough that $0\le x_j-\varepsilon<x_j<x_j+\varepsilon\le 1$ and $\sum_i x_i+\varepsilon\le k$. Then $x$ is the midpoint of the two distinct feasible points obtained by replacing $x_j$ by $x_j-\varepsilon$ and $x_j+\varepsilon$, so $x$ is not a vertex. If instead $\sum_i x_i=k$, then because $k$ is an integer there must be at least two fractional coordinates $p,q$: one fractional coordinate alone would make the total sum non-integral after adding the integral coordinates. Choose $\varepsilon>0$ small enough that both $x_p+\varepsilon$ and $x_q-\varepsilon$ remain in $[0,1]$. The two points obtained by changing $(x_p,x_q)$ to $(x_p+\varepsilon,x_q-\varepsilon)$ and to $(x_p-\varepsilon,x_q+\varepsilon)$ are distinct, feasible, and have midpoint $x$. Thus no fractional feasible point is a vertex.
Therefore the uniform matroid rank inequalities collapse to the familiar cardinality constraint, and the polytope is the convex hull of the incidence vectors of subsets of size at most $k$.
[/example]
Polyhedral combinatorics therefore gives a common language for several earlier parts of the course. Total unimodularity explains the integrality of network and bipartite models; odd-set inequalities explain the extra structure needed for non-bipartite matching; matroid rank inequalities encode exchange. The recurring pattern is that a good combinatorial theorem often has an equivalent statement saying that a natural relaxation has integral faces. The final chapter uses this positive picture as the boundary line for NP-hardness and approximation.
# 12. NP-Hardness Boundaries and Approximation Glimpses
This chapter closes the course by locating the boundary between the exact polynomial-time methods developed in the previous chapters and the integer optimisation problems where such methods should not be expected. The guiding question is structural: which combinatorial constraints admit short certificates of optimality, and which small changes make the same style of problem NP-hard? We then look at approximation and relaxation as weaker substitutes for exact optimisation certificates.
## Exact Polynomial Structure Versus Hard Integer Optimisation
What feature made the earlier algorithms exact? In each case, there was a compact certificate: an augmenting path for non-optimality, a cut or dual potential for optimality, or an exchange property ensuring that a greedy step could be repaired. NP-hardness results show that such certificates should not be expected for every natural integer formulation unless major complexity-theoretic collapses occur.
[definition: Polynomial-Time Many-One Reduction]
Let $A$ and $B$ be decision problems over finite encodings. A polynomial-time many-one reduction from $A$ to $B$ is a map $f:\operatorname{Enc}(A)\to\operatorname{Enc}(B)$ computable in time polynomial in the input size.
[/definition]
For the map to transfer a decision problem, it must preserve the answer to every instance:
\begin{align*}
I \text{ is a yes-instance of } A
\quad \Longleftrightarrow \quad
f(I) \text{ is a yes-instance of } B.
\end{align*}
This preservation property is the mechanism for transferring hardness. If a known hard problem reduces to a new problem, then an efficient exact algorithm for the new problem would also solve the old one efficiently; to use this idea for optimisation, we need to specify how the decision threshold is extracted from an optimum value.
[definition: NP-Hard Optimisation Problem]
An optimisation problem is NP-hard if there is an NP-hard decision problem that reduces in polynomial time to the decision version of that optimisation problem.
[/definition]
The decision version usually asks whether there is a feasible solution with value at most $K$ for a minimisation problem, or at least $K$ for a maximisation problem. This lets us compare exact optimisation algorithms with yes-or-no reductions, and it prepares the first boundary: pairwise matching constraints have polynomial structure, while three-way matching constraints encode hard choices.
[example: Matching And Three-Dimensional Matching]
Bipartite matching asks for disjoint edges in a graph with two vertex classes. Three-dimensional matching has three disjoint ground sets $X,Y,Z$ and a family $T\subset X\times Y\times Z$ of triples, and it asks for a subfamily of triples no two of which agree in any coordinate.
Take $X=Y=Z=\{1,2\}$ and
\begin{align*}
T=\{(1,1,1),(1,2,2),(2,1,2),(2,2,1)\}.
\end{align*}
A perfect three-dimensional matching would have size $2$, because each chosen triple covers one element of $X$, one element of $Y$, and one element of $Z$, while each of $X,Y,Z$ has two elements. Write
\begin{align*}
t_1=(1,1,1),\quad t_2=(1,2,2),\quad t_3=(2,1,2),\quad t_4=(2,2,1).
\end{align*}
The pair $t_1,t_2$ shares the $X$-coordinate $1$. The pair $t_1,t_3$ shares the $Y$-coordinate $1$. The pair $t_1,t_4$ shares the $Z$-coordinate $1$. The pair $t_2,t_3$ shares the $Z$-coordinate $2$. The pair $t_2,t_4$ shares the $Y$-coordinate $2$. The pair $t_3,t_4$ shares the $X$-coordinate $2$.
There are
\begin{align*}
\binom{4}{2}=6
\end{align*}
pairs of triples in $T$, and the six checks above list all of them. Hence every pair of triples conflicts in at least one coordinate, so no matching can contain two triples. Therefore this instance has no perfect matching of size $2$. The example shows that moving from pair constraints to triple constraints creates simultaneous conflicts that cannot be repaired by a single alternating-path move.
[/example]
The example gives a concrete obstruction, but the course needs a complexity statement explaining that the obstruction is not accidental. The next theorem records the boundary: pairwise incidence in bipartite graphs is controlled by augmenting paths and integral flows, while three-way incidence is already NP-complete.
[quotetheorem:5831]
[citeproof:5831]
This theorem explains why the course repeatedly isolates special matrices and exchange axioms. They are not decorative hypotheses; they are the structures that keep the integer hull accessible. The bipartite hypothesis is essential: with two vertex classes, an unmatched vertex can be connected to another by an alternating path and the local repair has a global certificate, but with triples there is no single alternating path that records all conflicts among three coordinates. The four-triple example above is a miniature version of this failure, since each local-looking choice blocks every possible second triple. The theorem does not say that every matching-like problem is hard, or that all hypergraph matching instances are hopeless; it says that the general three-way incidence formulation is expressive enough to encode NP-complete exact-cover choices. Later reductions should be read in this same way: the point is not the vocabulary of the problem, but the loss of a structural certificate strong enough to control the integer hull.
## Reductions From Vertex Cover, Independent Set, And Travelling Salesman Variants
How do we prove that a problem near our exact toolkit is hard? We encode a familiar hard choice inside the new problem, while preserving the threshold value of the optimum. The reductions in this section are templates rather than isolated tricks.
[definition: Vertex Cover]
Let $G=(V,E)$ be a finite undirected graph. A vertex cover is a set $C \subset V$ such that every edge $uv \in E$ has $u \in C$ or $v \in C$.
[/definition]
Vertex cover is a minimisation problem: choose as few vertices as possible while covering all edges. To relate it to a maximisation problem on the same graph, we introduce the complementary notion of selecting vertices that avoid all edges between them.
[definition: Independent Set]
Let $G=(V,E)$ be a finite undirected graph. An independent set is a set $S \subset V$ such that no edge of $G$ has both endpoints in $S$.
[/definition]
The two definitions now pose a natural question: does covering every edge by $C$ mean that the vertices not chosen by $C$ are mutually non-adjacent? The answer gives a reduction template between a minimisation threshold and a maximisation threshold.
[quotetheorem:5832]
[citeproof:5832]
This complementarity is exact, but it does not make either problem easy. It depends on taking complements inside the same ambient vertex set: if we changed the ground set or moved to a hypergraph covering problem, the arithmetic $|S|=|V|-|C|$ would no longer by itself encode independence. The theorem also does not identify which of the two problems is algorithmically simpler; it only says that their threshold decision versions carry the same information after replacing $k$ by $|V|-k$. It tells us that an algorithmic breakthrough for one would immediately give the same breakthrough for the other, and a small graph shows the arithmetic of the complement relation.
[example: A Small Cover And Independent Set Pair]
Let $K_3$ have vertex set $V=\{a,b,c\}$ and edge set
\begin{align*}
E=\{ab,ac,bc\}.
\end{align*}
A one-vertex set cannot be a vertex cover: $\{a\}$ misses the edge $bc$, $\{b\}$ misses the edge $ac$, and $\{c\}$ misses the edge $ab$. Hence every vertex cover has size at least $2$. The set $\{a,b\}$ is a vertex cover because $ab$ has endpoint $a$, $ac$ has endpoint $a$, and $bc$ has endpoint $b$, so the minimum vertex-cover value is exactly $2$.
For independent sets, every two-vertex subset contains an edge:
\begin{align*}
\{a,b\}\text{ contains }ab,\quad
\{a,c\}\text{ contains }ac,\quad
\{b,c\}\text{ contains }bc.
\end{align*}
Thus no independent set has size $2$. Each one-vertex set is independent because it contains no pair of distinct adjacent vertices, so the maximum independent-set value is exactly $1$. Therefore the two optimum values satisfy
\begin{align*}
2+1=3=|V|.
\end{align*}
This is the complement relation in its smallest nontrivial form: choosing two vertices to cover all edges leaves exactly one vertex that is independent.
[/example]
The triangle example is local, while many hardness reductions force a global cyclic structure. To encode such a structure as an optimisation question, we need a problem where a feasible solution must visit every vertex exactly once and where the total length can distinguish graph edges from non-edges.
[definition: Travelling Salesman Decision Problem]
An instance consists of a finite set of cities, a non-negative integer distance $d(i,j)$ for each ordered pair of distinct cities, and an integer $K$. The question is whether there is a cyclic ordering of all cities whose total length is at most $K$.
[/definition]
The decision form is useful because Hamiltonian cycle can be embedded by assigning cheap lengths to graph edges and expensive lengths to non-edges. The threshold then detects whether the tour used only original graph edges.
[quotetheorem:5833]
[citeproof:5833]
The distance construction is doing essential work: the threshold $K=n$ separates tours that use only graph edges from tours that pay for at least one non-edge. If the expensive distance were not separated from the cheap distance, the tour length would no longer faithfully record Hamiltonian cycles. The theorem is a hardness statement for exact decision, not a claim that every metric or geometric travelling-salesman instance behaves the same way, and it does not rule out approximation guarantees. This proof also shows a common warning: a problem can look like a shortest path or assignment problem but become hard when the feasible object must visit every vertex exactly once.
The reductions so far explain why exact polynomial algorithms cannot be expected in full generality. The next question is more modest and more useful in hard regimes: can we still compute a feasible solution together with a certificate that its value is close to optimal? This shifts the role of cuts, matchings, and relaxations from proving exact optimality to proving quantitative bounds.
## Simple Approximation And Relaxation Ideas
When exact optimisation is hard, what kind of guarantee can still be proved? Approximation algorithms compare a polynomial-time solution with the unknown optimum, while relaxations replace an integer feasible region by a larger tractable one. Both ideas use the same certificate philosophy as the earlier chapters, but the certificate now bounds the optimum rather than identifying it exactly.
[definition: Approximation Ratio]
For a minimisation problem, an algorithm has approximation ratio $\rho \ge 1$ if for every instance it returns a feasible solution of value at most $\rho\operatorname{OPT}$. For a maximisation problem, it has approximation ratio $\rho \le 1$ if it returns a feasible solution of value at least $\rho\operatorname{OPT}$.
[/definition]
The definition asks for a computable solution together with a universal comparison to the optimum. For vertex cover, a maximal matching gives both a candidate cover and a lower bound, so it is the first approximation theorem to prove.
[quotetheorem:5834]
[citeproof:5834]
This is an approximation result rather than an exact theorem because the chosen cover may be twice the optimum. The maximality hypothesis is the feasibility certificate: a non-maximal matching could leave an uncovered edge whose endpoints are not selected. The disjointness of matching edges is the lower-bound certificate, since one vertex can cover at most one edge of the matching. The theorem does not say that every maximal matching gives a minimum cover, nor that the factor $2$ is intrinsic to vertex cover itself; it only certifies what this simple algorithm can guarantee from this particular witness. The matching still acts as a dual witness, but only to within a factor, and the next example shows the factor can occur for this algorithm.
[example: A Tight Example For The Matching Cover Algorithm]
Let $G$ have vertex set $\{u,v\}$ and edge set $\{uv\}$. The matching
\begin{align*}
M=\{uv\}
\end{align*}
is maximal, because there is no edge of $G$ outside $M$ that could be added to it. The matching-cover algorithm therefore selects all endpoints of edges in $M$, so it returns
\begin{align*}
C=\{u,v\}
\end{align*}
and hence
\begin{align*}
|C|=2.
\end{align*}
The set $\{u\}$ is a vertex cover because the only edge $uv$ has endpoint $u$, and the set $\{v\}$ is a vertex cover because the only edge $uv$ has endpoint $v$. No vertex cover can have size $0$, since the edge $uv$ would then have neither endpoint selected. Therefore
\begin{align*}
\operatorname{OPT}=1.
\end{align*}
The approximation ratio attained on this instance is
\begin{align*}
\frac{|C|}{\operatorname{OPT}}
=
\frac{2}{1}
=
2.
\end{align*}
Thus the factor-$2$ analysis is tight for this algorithm: even on one edge, the algorithm may return twice as many vertices as an optimum cover.
[/example]
The tight example exposes the missing ingredient: the matching lower bound does not describe the whole integer feasible region. This motivates a linear programming relaxation for vertex cover, where we keep the edge-covering inequalities but allow fractional choices to obtain a computable lower bound.
[definition: Vertex Cover Linear Programming Relaxation]
For a graph $G=(V,E)$, the vertex cover LP relaxation minimises $\sum_{v\in V}x_v$ subject to $x_u+x_v\ge 1$ for every edge $uv\in E$, and $x_v\ge 0$ for every vertex $v\in V$.
[/definition]
The integer programme also has constraints $x_v\in\{0,1\}$. Removing those constraints can lower the optimum, so the fractional optimum is a lower bound for the true minimum cover size.
[example: Fractional And Integer Optima On A Triangle]
Let $K_3$ have vertices $1,2,3$ and edges $12,13,23$. An integer vertex cover cannot have size $0$ or $1$: size $0$ covers no edge, and a single chosen vertex misses the edge between the other two vertices. The set $\{1,2\}$ covers $12$ by vertex $1$, covers $13$ by vertex $1$, and covers $23$ by vertex $2$, so the integer optimum is $2$.
For the LP relaxation, assign $x_1=x_2=x_3=\frac{1}{2}$. The edge $12$ has
\begin{align*}
x_1+x_2=\frac{1}{2}+\frac{1}{2}=1.
\end{align*}
The edge $13$ has
\begin{align*}
x_1+x_3=\frac{1}{2}+\frac{1}{2}=1.
\end{align*}
The edge $23$ has
\begin{align*}
x_2+x_3=\frac{1}{2}+\frac{1}{2}=1.
\end{align*}
Thus all three edge constraints are satisfied. The objective value of this fractional solution is
\begin{align*}
x_1+x_2+x_3=\frac{1}{2}+\frac{1}{2}+\frac{1}{2}=\frac{3}{2}.
\end{align*}
To prove that no feasible fractional solution has smaller value, take any feasible LP solution. The three edge inequalities are $x_1+x_2\ge 1$, $x_1+x_3\ge 1$, and $x_2+x_3\ge 1$. Adding the left sides and the right sides gives
\begin{align*}
(x_1+x_2)+(x_1+x_3)+(x_2+x_3)\ge 1+1+1.
\end{align*}
Collecting like terms gives
\begin{align*}
2x_1+2x_2+2x_3\ge 3.
\end{align*}
Factoring the left side gives
\begin{align*}
2(x_1+x_2+x_3)\ge 3.
\end{align*}
Dividing by $2$ gives
\begin{align*}
x_1+x_2+x_3\ge \frac{3}{2}.
\end{align*}
Since the assignment $x_1=x_2=x_3=\frac{1}{2}$ attains this lower bound, the LP optimum is exactly $\frac{3}{2}$.
The integer optimum is $2$ and the fractional optimum is $\frac{3}{2}$, so the integrality gap on this instance is
\begin{align*}
\frac{2}{3/2}=2\cdot \frac{2}{3}=\frac{4}{3}.
\end{align*}
The triangle is the smallest example here where the LP lower bound is sharp fractionally but still below the true integer cover value.
[/example]
The triangle computation gives a fractional lower bound that is not integral, so solving the relaxation does not directly produce a feasible vertex cover.
The algorithmic question is how much quality is lost if we round such a fractional solution into an actual cover. The edge inequalities suggest a threshold rule at $1/2$; the next result shows that this rule always preserves coverage and increases the objective by at most a factor of two.
[quotetheorem:5835]
[citeproof:5835]
The threshold
\begin{align*}
\frac{1}{2}
\end{align*}
is forced by the two-variable edge constraints: if both endpoints were below
\begin{align*}
\frac{1}{2},
\end{align*}
the edge inequality would fail. This is why the proof is special to vertex cover and similar pairwise covering systems; a constraint involving many variables need not contain any variable as large as
\begin{align*}
\frac{1}{2}.
\end{align*}
The theorem also separates the two roles of the LP relaxation: the fractional optimum gives a lower bound, while rounding gives a feasible integer solution whose cost is controlled by that lower bound. To see how the same ideas behave with more general covering constraints, we move to set cover, where each chosen set can cover an arbitrary subset of the universe and a single uncovered element may be covered by many overlapping choices.
[definition: Set Cover]
Let $U$ be a finite universe and let $\mathcal S\subset 2^U$ be a family of subsets whose union is $U$. A set cover is a subfamily $\mathcal C\subset \mathcal S$ such that $\bigcup_{S\in\mathcal C}S=U$.
[/definition]
The greedy heuristic repeatedly chooses a set covering the largest number of currently uncovered elements. Its logarithmic approximation analysis uses a charging argument: each newly covered element is charged a share of the chosen set, and the total charge can be bounded by comparing the remaining uncovered elements with an optimal cover.
[example: Greedy Set Cover Heuristic]
Let
\begin{align*}
U=\{1,2,3,4,5,6\}
\end{align*}
and write $S_1=\{1,2,3\}$, $S_2=\{3,4,5\}$, $S_3=\{5,6\}$, and $S_4=\{4,6\}$. At the first greedy step, the uncovered contribution of each set is just its size:
\begin{align*}
|S_1|=3,\quad |S_2|=3,\quad |S_3|=2,\quad |S_4|=2.
\end{align*}
Thus greedy may choose either $S_1$ or $S_2$. Suppose the tie is broken by choosing $S_1$. The uncovered elements are then
\begin{align*}
U\setminus S_1=\{1,2,3,4,5,6\}\setminus \{1,2,3\}=\{4,5,6\}.
\end{align*}
The second greedy step must recompute contributions using only $\{4,5,6\}$. For $S_1$ we have
\begin{align*}
S_1\cap \{4,5,6\}=\varnothing,
\end{align*}
so its current contribution is $0$. For $S_2$ we have
\begin{align*}
S_2\cap \{4,5,6\}=\{4,5\},
\end{align*}
so its current contribution is $2$. For $S_3$ we have
\begin{align*}
S_3\cap \{4,5,6\}=\{5,6\},
\end{align*}
so its current contribution is $2$. For $S_4$ we have
\begin{align*}
S_4\cap \{4,5,6\}=\{4,6\},
\end{align*}
so its current contribution is $2$. Hence the next choice depends on tie-breaking among $S_2,S_3,S_4$.
If greedy chooses $S_2$, then the uncovered set becomes
\begin{align*}
\{4,5,6\}\setminus \{4,5\}=\{6\},
\end{align*}
and one more set containing $6$, namely $S_3$ or $S_4$, completes the cover. If greedy chooses $S_3$, then the uncovered set becomes
\begin{align*}
\{4,5,6\}\setminus \{5,6\}=\{4\},
\end{align*}
and one more set containing $4$, namely $S_2$ or $S_4$, completes the cover. If greedy chooses $S_4$, then the uncovered set becomes
\begin{align*}
\{4,5,6\}\setminus \{4,6\}=\{5\},
\end{align*}
and one more set containing $5$, namely $S_2$ or $S_3$, completes the cover.
Thus every greedy continuation after the first choice produces a cover of size $3$, but the actual subfamily returned can differ. The example shows that the greedy certificate at each step is local: it records the largest current uncovered contribution, not a unique globally determined cover.
[/example]
Relaxations also explain why exact integer optimisation is harder than linear optimisation. The linear programme gives a bound quickly when its feasible region is tractable, but the gap between fractional and integral optima measures the price of losing discreteness.
[example: Comparing Integer And Fractional Optima]
Let the star graph have centre $c$, leaves $l_1,\dots,l_m$ with $m\ge 1$, and edges $cl_i$ for $1\le i\le m$. The set $\{c\}$ is a vertex cover because every edge $cl_i$ has endpoint $c$, so the integer optimum is at most $1$. No vertex cover has size $0$, since the edge $cl_1$ would then have neither endpoint selected. Hence the minimum vertex-cover value is exactly $1$.
For the LP relaxation, take
\begin{align*}
x_c=1,\qquad x_{l_i}=0\quad\text{for }1\le i\le m.
\end{align*}
For each edge $cl_i$,
\begin{align*}
x_c+x_{l_i}=1+0=1,
\end{align*}
so every edge constraint is satisfied. The objective value is
\begin{align*}
x_c+\sum_{i=1}^m x_{l_i}
=
1+\sum_{i=1}^m 0
=
1+0
=
1.
\end{align*}
Conversely, any feasible LP solution satisfies, for the edge $cl_1$,
\begin{align*}
x_c+x_{l_1}\ge 1.
\end{align*}
Since all variables are nonnegative,
\begin{align*}
x_c+\sum_{i=1}^m x_{l_i}
\ge
x_c+x_{l_1}
\ge
1.
\end{align*}
Thus the LP optimum on the star is exactly $1$, the same as the integer optimum.
Now let $C_n$ be an odd cycle with vertices $v_1,\dots,v_n$, where $n=2q+1$, and edges $v_iv_{i+1}$ for $1\le i<n$ together with $v_nv_1$. The fractional assignment
\begin{align*}
x_{v_i}=\frac{1}{2}\quad\text{for }1\le i\le n
\end{align*}
is feasible because every edge $v_iv_j$ has
\begin{align*}
x_{v_i}+x_{v_j}
=
\frac{1}{2}+\frac{1}{2}
=
1.
\end{align*}
Its objective value is
\begin{align*}
\sum_{i=1}^n x_{v_i}
=
\sum_{i=1}^n \frac{1}{2}
=
\frac{n}{2}.
\end{align*}
To see that this is the LP optimum, add all $n$ edge inequalities. The left side counts each vertex variable twice, since each vertex of a cycle is incident to exactly two edges:
\begin{align*}
(x_{v_1}+x_{v_2})+(x_{v_2}+x_{v_3})+\cdots+(x_{v_n}+x_{v_1})
\ge
\underbrace{1+\cdots+1}_{n\text{ terms}}.
\end{align*}
Collecting like terms gives
\begin{align*}
2\sum_{i=1}^n x_{v_i}\ge n.
\end{align*}
Dividing by $2$ gives
\begin{align*}
\sum_{i=1}^n x_{v_i}\ge \frac{n}{2}.
\end{align*}
Since the all-$\frac12$ assignment attains this lower bound, the fractional optimum is exactly $\frac{n}{2}$.
The integer optimum on $C_n$ is $\frac{n+1}{2}$. Indeed, if $C$ is a vertex cover, then the vertices not in $C$ contain no adjacent pair; otherwise an edge between two unchosen vertices would be uncovered. Around the cycle, each unchosen vertex therefore needs a chosen vertex after it, so the number of unchosen vertices is at most the number of chosen vertices. Since $n=2q+1$, this gives at most $q$ unchosen vertices and hence at least
\begin{align*}
n-q=(2q+1)-q=q+1=\frac{n+1}{2}
\end{align*}
chosen vertices. This bound is attained by choosing
\begin{align*}
\{v_1,v_3,v_5,\dots,v_{2q+1}\},
\end{align*}
because every edge either has one endpoint in this set, or in the case of $v_{2q+1}v_1$, has both endpoints in this set. Therefore the integer optimum is exactly $\frac{n+1}{2}$.
Thus the star has no integrality gap, while the odd cycle has integer optimum $\frac{n+1}{2}$ and fractional optimum $\frac{n}{2}$. The same relaxation can therefore be exact on one graph family and strictly inexact on another.
[/example]
The chapter ends the course by reframing the earlier positive results. Polynomial-time solvability came from augmenting paths, cuts, potentials, exchange, duality, submodularity, and total unimodularity; NP-hardness begins when those structures no longer control the integer feasible region. Approximation and relaxation keep part of the certificate philosophy alive: even when the exact optimum is out of reach, we can still prove that a computed solution is close to the best possible or that a fractional bound explains what remains difficult.
## Beyond and Connected Topics
For adjacent Androma material, compare this course with [Cambridge IB Optimisation](/page/Cambridge%20IB%20Optimisation) as a neighbouring optimisation syllabus. The natural specialised continuations are Network Flows for augmenting paths and cut certificates, Matching Theory for bipartite and general matching algorithms, Polyhedral Combinatorics for linear descriptions and total dual integrality, Submodular Functions for greedy and polymatroid methods, Integer Programming for formulations and integrality gaps, and Approximation Algorithms for NP-hard optimisation problems.
The same certificate pattern appears throughout discrete mathematics. Shortest-path potentials, max-flow/min-cut certificates, matching covers, row-column potentials for assignments, matroid exchange certificates, and submodular optimality conditions all turn a proposed solution into something independently checkable. The differences between these examples are not cosmetic: each certificate works because the feasible region has a special structure such as total unimodularity, exchange, residual augmentation, or submodularity. Small changes to constraints can remove that structure and push the problem into approximation or hardness theory.
## References
- Alexander Schrijver, *Combinatorial Optimization: Polyhedra and Efficiency*, Springer, 2003.
- Bernhard Korte and Jens Vygen, *Combinatorial Optimization: Theory and Algorithms*, Springer, 2018.
- William J. Cook, William H. Cunningham, William R. Pulleyblank, and Alexander Schrijver, *Combinatorial Optimization*, Wiley, 1998.
- Christos H. Papadimitriou and Kenneth Steiglitz, *Combinatorial Optimization: Algorithms and Complexity*, Dover, 1998.
- Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin, *Network Flows: Theory, Algorithms, and Applications*, Prentice Hall, 1993.
Contents
- 1. Optimisation Problems on Finite Structures
- Optimising Over Finite Families
- Feasibility, Local Moves, And Certificates
- Polynomial Time And Input Size
- Greedy, Augmenting, And Primal-Dual Paradigms
- 2. Matchings in Bipartite Graphs
- Alternating Structure Around a Matching
- Augmenting Paths and the Hungarian Search Tree
- Hall's Condition and Vertex-Cover Duality
- 3. Weighted Bipartite Matching and the Assignment Problem
- Perfect Matchings, Cost Matrices, and Potentials
- Linear Programming Duality for the Assignment Problem
- The Hungarian Algorithm as a Primal-Dual Method
- Optimality Certificates and How to Read Them
- 4. Network Flows and Cuts
- Directed Networks and Feasible Flows
- Ford-Fulkerson and Edmonds-Karp
- Cuts and Conservation
- Max-Flow Min-Cut Duality and Applications
- 5. Circulations, Lower Bounds, and Feasibility
- Lower and Upper Bounds on Arcs
- Circulations, Demands, and Node Balances
- Reducing Feasibility to Maximum Flow
- Integral Circulations and Discrete Feasibility
- 6. Minimum-Cost Flows
- Costed Networks and Residual Costs
- Cycle-Canceling and Successive Shortest Augmenting Paths
- Reduced Costs and Potentials
- Integrality and Modelling Consequences
- 7. General Matchings
- Odd Cycles and the Failure of Bipartite Search
- Shrinking Blossoms and Lifting Paths
- Tutte and Berge Formula Viewpoints
- 8. Matroids and Greedy Optimisation
- From Independence Systems to Matroids
- Greedy Optimisation over Bases
- Standard Families of Matroids
- 9. Matroid Intersection and Matroid Union
- Common Independent Sets and Exchange Graphs
- The Matroid Intersection Theorem
- Matroid Union and Covering Interpretations
- 10. Submodular Functions and Discrete Convexity
- Diminishing Returns for Set Functions
- Cut Functions as Submodular Functions
- Matroid Rank and Discrete Convexity
- The Lovasz Extension
- Submodular Minimisation
- Closure Properties and Modelling Patterns
- 11. Polyhedral Combinatorics
- Convex Hulls of Feasible Solutions
- Total Unimodularity and Integral Polyhedra
- Matching Polytopes
- Flow Polytopes
- Matroid Polytopes
- 12. NP-Hardness Boundaries and Approximation Glimpses
- Exact Polynomial Structure Versus Hard Integer Optimisation
- Reductions From Vertex Cover, Independent Set, And Travelling Salesman Variants
- Simple Approximation And Relaxation Ideas
- Beyond and Connected Topics
- References
Combinatorial Optimisation
Content
Problems
History
Created by admin on 6/7/2026 | Last updated on 6/7/2026
Prerequisites (0/1 completed)
Log in to track your prerequisite progress.
Prerequisites Graph
Interactive dependency map showing prerequisite concepts
Loading dependency graph...
Theorem
Definition
Current
Requires
Rate this page
★
★
★
★
★
Poor
Excellent