Optimisation is the mathematical science of making the best possible choice from a set of available alternatives. Whether minimising cost, maximising profit, or achieving some other objective subject to constraints, optimisation problems arise across engineering, economics, operations research, and the sciences. This course develops both the theoretical foundations and practical methods for solving optimisation problems, from small-scale applications to large systems of industrial importance. We begin by establishing the conceptual and mathematical preliminaries needed to formulate and analyse optimisation problems rigorously, then progress through increasingly sophisticated techniques for finding optimal solutions.
The course centres on several interconnected themes. First is the role of constraints: most real-world optimisation involves restrictions on feasible solutions, and understanding how constraints affect optimality is crucial. Second is the distinction between continuous and discrete optimisation, illustrated through polynomial-time algorithms for linear programs versus the combinatorial complexity of network routing. Third is the interplay between single-agent optimisation and game-theoretic settings, where multiple decision-makers compete or cooperate. Throughout, we emphasise both necessary conditions for optimality (such as the Lagrange multiplier conditions) and algorithmic methods for computing optimal solutions.
The chapters build a natural progression. We establish foundations in Chapter 1, then introduce Lagrange multipliers in Chapter 2 — the cornerstone technique for handling equality and inequality constraints. Chapter 3 treats linear programming, where constraints are affine and objectives linear, revealing the geometric structure of feasible regions and the simplex method. Chapter 4 shifts perspective to game theory, where optimisation becomes strategic: each player optimises their own objective given others' choices, leading to equilibrium concepts rather than global optima. Finally, Chapter 5 applies these ideas to network problems — routing, flow, matching — where graph structure combines with optimisation to solve canonical operational problems.
# Introduction and Preliminaries
This opening chapter lays the algebraic and analytic groundwork for the entire course. The central subject of Operations Research is optimisation under constraints: given an objective function and a set of feasibility conditions, find the decision variables that achieve the best objective value. Before developing algorithmic machinery, the course establishes what the most general form of such a problem looks like, introduces the special but fundamental case of linear programming, and recalls the classical theory of unconstrained optimisation — in particular, the role of convexity — that motivates everything that follows.
## Constrained Optimisation and Its General Form
Every optimisation problem in this course can be written in a single canonical form. To see why this is worth doing, consider a factory that wants to minimise production cost. Minimising over all of $\mathbb{R}^n$ with no restrictions is meaningless — producing nothing would minimise cost without any practical meaning. Practical problems always come with constraints, and the art of the subject is to handle those constraints systematically.
[definition: Constrained Optimisation Problem]
The **general constrained optimisation problem** is:
\begin{align*}
\text{minimise} \quad & f(x) \\
\text{subject to} \quad & h(x) = b, \quad x \in X,
\end{align*}
where $x \in \mathbb{R}^n$ is the **vector of decision variables**, $f: \mathbb{R}^n \to \mathbb{R}$ is the **objective function**, $h: \mathbb{R}^n \to \mathbb{R}^m$ and $b \in \mathbb{R}^m$ define the **functional constraints**, and $X \subseteq \mathbb{R}^n$ is the **regional constraint**.
[/definition]
This formulation is genuinely the most general one. Several apparent variants reduce to it:
- **Maximisation.** To maximise $f$, minimise $-f$.
- **Inequality functional constraints.** To handle $h(x) \geq b$, introduce a **slack variable** $z \geq 0$ and rewrite the constraint as $h(x) - z = b$, with $z \geq 0$ appended to the regional constraint.
- **Unconstrained variables.** If a variable $x_j$ may take any sign, write $x_j = x_j^+ - x_j^-$ with $x_j^+, x_j^- \geq 0$. Both parts are then governed by the standard regional constraint.
Note: throughout this course almost every object is a vector, so vectors are not bolded — context always makes the distinction clear.
## Linear Programming
What special structure can we exploit when the objective and all constraints are linear? This question is not merely academic: linearity turns out to be the key to both theoretical tractability and practical solvability.
**Linear programming** is the special case of constrained optimisation in which both the objective function and all constraints are linear. Linearity buys two decisive advantages. First, the feasible region — the intersection of finitely many half-spaces — is a convex polytope with only finitely many vertices. Second, it is a classical fact that if a finite optimum exists, it is always attained at one of these vertices. This means the search for an optimum over an infinite feasible set reduces, in principle, to checking finitely many candidate points. In practice the simplex method (Chapter 3) exploits this geometry to move efficiently from vertex to vertex, and interior-point methods (Chapter 4) achieve polynomial worst-case complexity. Neither guarantee is available for general nonlinear programs.
A general linear program can involve constraints of mixed types ($\geq$, $\leq$, $=$) and variables with mixed sign restrictions. One can reduce all of these, via the slack variable technique above, to two canonical forms.
[definition: General and Standard Forms of a Linear Program]
The **general form** of a linear program is:
\begin{align*}
\text{minimise} \quad & c^\top x \\
\text{subject to} \quad & Ax \geq b, \quad x \geq 0.
\end{align*}
The **standard form** is:
\begin{align*}
\text{minimise} \quad & c^\top x \\
\text{subject to} \quad & Ax = b, \quad x \geq 0.
\end{align*}
[/definition]
The two forms are equivalent: introduce slack variables $z \geq 0$ with $Ax - z = b$ to pass from general to standard form. The standard form is preferred when developing theoretical results, because equality constraints are easier to work with algebraically. In practice, the conversion can increase the number of variables, which is a computational trade-off rather than a theoretical one.
[example: A Two-Dimensional Linear Program]
Minimise $-(x_1 + x_2)$ subject to
\begin{align*}
x_1 + 2x_2 &\leq 6, \\
x_1 - x_2 &\leq 3, \\
x_1, x_2 &\geq 0.
\end{align*}
Because this is a two-dimensional problem, the feasible region — the set of $(x_1, x_2)$ satisfying all constraints — is a convex polygon. Its corners are found by solving pairs of binding constraints. The origin $(0,0)$ is the intersection of $x_1 = 0$ and $x_2 = 0$. The point $(3,0)$ is the intersection of $x_1 - x_2 = 3$ and $x_2 = 0$: substituting $x_2 = 0$ gives $x_1 = 3$. The point $(0,3)$ is the intersection of $x_1 + 2x_2 = 6$ and $x_1 = 0$: substituting $x_1 = 0$ gives $x_2 = 3$. The point $(4,1)$ is the intersection of the two non-trivial constraints $x_1 + 2x_2 = 6$ and $x_1 - x_2 = 3$: subtracting the second from the first gives $3x_2 = 3$, so $x_2 = 1$ and $x_1 = 4$. Thus the four corners are $(0,0)$, $(3,0)$, $(4,1)$, and $(0,3)$.
The objective function $-(x_1 + x_2)$ defines level curves $x_1 + x_2 = k$ (lines with slope $-1$), and its gradient (the cost vector $c = (-1,-1)^\top$) points in the direction of steepest decrease of the objective. To minimise $-(x_1 + x_2)$, we push the level curve $x_1 + x_2 = k$ as far as possible in the direction of $c$, i.e.\ we want the largest value of $x_1 + x_2$. The objective values at the corners are: $(0,0) \mapsto 0$, $(3,0) \mapsto -3$, $(4,1) \mapsto -5$, $(0,3) \mapsto -3$. The minimum value $-5$ is attained at $x^* = (4,1)$.
[/example]
[illustration:lp-feasible-region-level-curves]
The example already suggests the key structural fact about linear programs: the optimum is always achieved at a corner of the feasible region. This observation drives the entire algorithmic development in subsequent chapters.
## Unconstrained Optimisation and Convexity
Before attacking constrained problems directly, it helps to recall what we know about minimising a smooth function with no constraints, since the constrained theory will later reduce to this case via Lagrange multipliers.
Let $f: \mathbb{R}^n \to \mathbb{R}$ be differentiable. A necessary condition for $x^* \in \mathbb{R}^n$ to be a local minimum of $f$ is
\begin{align*}
\nabla f(x^*) = 0, \qquad \nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)^\top.
\end{align*}
This condition alone is not sufficient: a critical point can be a maximum, a minimum, or a saddle. The failure can be dramatic: consider $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$. Here $f'(0) = 0$, so $x = 0$ is a critical point, yet $f(-1) = -1 < 0 = f(0)$, so $x = 0$ is not a global minimum — it is an inflection point. The gradient vanishing at a point tells us nothing about whether that point is better or worse than distant points. To guarantee a global minimum, one invokes **convexity**.
The geometric idea behind convexity is simple: a function is convex when its graph has no "dips" — it curves upward everywhere. On such a function, any valley you find is the only valley, and any local minimum is automatically global. This is what makes convexity so powerful for optimisation: it eliminates the possibility of getting trapped in a local-but-not-global minimum.
[definition: Convex Set]
A set $S \subseteq \mathbb{R}^n$ is **convex** if, for every $x, y \in S$ and every $\delta \in [0,1]$,
\begin{align*}
\delta x + (1-\delta)y \in S.
\end{align*}
Equivalently, the line segment joining any two points of $S$ lies entirely within $S$.
[/definition]
[illustration:convex-vs-nonconvex-sets]
The convexity of the domain is not merely a technical hypothesis — it ensures that the "mixing" operation $\delta x + (1-\delta)y$ stays inside $S$, so that the function values $f(\delta x + (1-\delta)y)$ and $\delta f(x) + (1-\delta)f(y)$ are both well-defined and comparable. Without convexity of $S$, the defining inequality for a convex function would have no meaning at certain interpolation points.
[definition: Convex and Concave Function]
A function $f: S \to \mathbb{R}$, where $S \subseteq \mathbb{R}^n$ is convex, is **convex** if for all $x, y \in S$ and $\delta \in [0,1]$,
\begin{align*}
f(\delta x + (1-\delta)y) \leq \delta f(x) + (1-\delta)f(y).
\end{align*}
Geometrically, the chord joining the graph at $(x, f(x))$ and $(y, f(y))$ lies above the graph. A function is **concave** if $-f$ is convex. A function can be neither.
[/definition]
[illustration:convex-function-chord-inequality]
Deciding in practice whether a given function is convex can be done directly from the definition, but for smooth functions the Hessian provides a far more convenient criterion. The Hessian of $f$ at $x$, denoted $H_f(x)$, is the matrix of second partial derivatives $(H_f)_{ij}(x) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$; it encodes the local curvature of $f$ in every direction. The function is convex if and only if this curvature is non-negative in every direction, which is precisely the condition that $H_f(x)$ be positive semidefinite.
[definition: Positive Semidefinite Matrix]
A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is **positive semidefinite** (PSD) if $v^\top A v \geq 0$ for all $v \in \mathbb{R}^n$. It is **positive definite** if $v^\top A v > 0$ for all $v \neq 0$.
[/definition]
The quadratic form $v^\top A v$ measures the "curvature" contributed by $A$ in the direction $v$. A PSD matrix has non-negative curvature in every direction; a positive definite matrix has strictly positive curvature. This distinction will matter below: a critical point of a convex function need not be a strict minimum (consider $f(x) = 0$ everywhere).
[quotetheorem:2545]
This result is not proved in the course; a proof via the second-order Taylor expansion and properties of PSD matrices can be found in Rockafellar, *Convex Analysis* (Princeton, 1970), Theorem 4.5.
Notice that the Hessian criterion is an **if and only if**: $H_f(x) \succeq 0$ everywhere is both necessary and sufficient for convexity. However, this does not mean that every convex function has a strict minimum. The criterion tells us about the shape of the function, not about whether it achieves its infimum. There are also limitations: the criterion requires twice differentiability. For functions that are only Lipschitz, one must work with subgradients instead — a topic beyond the scope of this course.
[example: Checking Convexity via the Hessian]
Consider two functions $f, g: \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x_1, x_2) = x_1^2 + x_2^2$ and $g(x_1, x_2) = x_1^2 - x_2^2$.
For $f$: the partial derivatives are $\frac{\partial f}{\partial x_1} = 2x_1$ and $\frac{\partial f}{\partial x_2} = 2x_2$, giving the Hessian
\begin{align*}
H_f = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.
\end{align*}
For any $v = (v_1, v_2)^\top$, we have $v^\top H_f v = 2v_1^2 + 2v_2^2 \geq 0$, with equality only when $v = 0$. Thus $H_f$ is positive definite everywhere, so $f$ is convex (in fact strictly convex).
For $g$: the partial derivatives are $\frac{\partial g}{\partial x_1} = 2x_1$ and $\frac{\partial g}{\partial x_2} = -2x_2$, giving the Hessian
\begin{align*}
H_g = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.
\end{align*}
Taking $v = (0,1)^\top$ gives $v^\top H_g v = -2 < 0$, so $H_g$ is not positive semidefinite anywhere and $g$ is not convex. The function $g = x_1^2 - x_2^2$ is a saddle surface: it curves upward along the $x_1$-axis and downward along the $x_2$-axis. Its unique critical point at the origin is neither a maximum nor a minimum.
[/example]
The connection to minimisation is now immediate:
[quotetheorem:2546]
This result follows from the standard first- and second-order optimality theory for smooth functions; a proof using the second-order Taylor expansion is given in Nocedal and Wright, *Numerical Optimization*, 2nd ed. (Springer, 2006), Chapter 2.
The two hypotheses are both necessary. The convexity of $X$ ensures that line segments between points stay in the domain, which is needed for the proof to pass between any $x \in X$ and the candidate minimiser $x^*$. The condition that $H_f$ be positive semidefinite *everywhere* on $X$ (not just at $x^*$) ensures that $f$ is globally convex on $X$, so that the vanishing gradient at $x^*$ is a global rather than merely local condition. Dropping either hypothesis can make the theorem fail: without convexity of $X$, two points in $X$ might not be connected by a path that stays in $X$; without global PSD of the Hessian, the function might dip below $f(x^*)$ in some direction away from $x^*$.
[example: Practical Convexity Check]
Suppose you are given an objective function and need to decide whether to trust that any critical point you find is a global minimum. The practical workflow is:
1. Compute $H_f(x)$ symbolically.
2. Check whether $H_f(x) \succeq 0$ for all $x \in X$ — often by verifying that all leading principal minors are non-negative (Sylvester's criterion).
3. If yes, any critical point $x^*$ with $\nabla f(x^*) = 0$ is a global minimum. Solve $\nabla f(x) = 0$ to find it.
4. If no, the function may have saddle points or local minima that are not global. More sophisticated methods (branch-and-bound, global search) may be needed.
For instance, a production cost function $f(x_1, x_2) = x_1^2 + x_1 x_2 + x_2^2$ has Hessian $H_f = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$. The leading $1 \times 1$ minor is $2 > 0$ and the determinant is $4 - 1 = 3 > 0$, so $H_f$ is positive definite everywhere and the function is strictly convex. Any critical point found by solving $\nabla f = 0$ will be the unique global minimum.
[/example]
[remark: Linear Functions Are Both Convex and Concave]
A linear function $f(x) = c^\top x$ satisfies $H_f = 0$ everywhere, so it is both convex and concave. This means the sufficient condition above applies to linear programs: any critical point of the Lagrangian (introduced in the next chapter) with appropriate convexity properties will be a global minimum.
[/remark]
The key limitation is that real problems impose constraints — the feasible set is $X(b) = \{x \in X : h(x) = b\}$, a proper subset of $X$. The unconstrained condition $\nabla f(x^*) = 0$ is too strong: it demands that all directional derivatives vanish, but in a constrained problem only the directions tangent to the feasible set are relevant. The next chapter introduces the **method of Lagrange multipliers**, which converts constrained optimisation back into an unconstrained problem by penalising violations of the constraints.
But converting constraints into Lagrange multipliers is only the beginning. When we move from general nonlinear programs to the special case where everything is linear — objective and constraints alike — the structure becomes so rigid that entirely new algorithmic techniques become possible.
# 2. The Method of Lagrange Multipliers
Chapter 1 established the general framework for constrained optimisation and recalled the tools of unconstrained optimisation. The central difficulty identified there was that we know how to minimise a function freely — by setting the gradient to zero — but the feasible region of a constrained problem prevents us from moving in all directions. This chapter develops the method of Lagrange multipliers, which resolves that tension by encoding the constraints directly into a modified objective function. We prove the key sufficiency theorem, interpret the multipliers as rates of change of the optimal value (shadow prices), and develop the duality theory that underlies the entire chapter, culminating in a condition guaranteeing that strong duality holds.
## The Lagrangian and the Sufficiency Theorem
The central idea is disarmingly simple: if a candidate point $x^*$ is feasible and simultaneously minimises a modified function that penalises constraint violations, then $x^*$ must be optimal for the original problem. The modified function is called the Lagrangian.
Throughout this chapter, the primal problem $(P)$ is:
\begin{align*}
\text{minimise} \quad & f(x) \quad \text{subject to} \quad h(x) = b, \quad x \in X,
\end{align*}
where $x \in \mathbb{R}^n$, $f: \mathbb{R}^n \to \mathbb{R}$, $h: \mathbb{R}^n \to \mathbb{R}^m$, $b \in \mathbb{R}^m$, and $X \subseteq \mathbb{R}^n$ is the regional constraint.
[definition: Lagrangian]
The **Lagrangian** of problem $(P)$ is the function $L: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ defined by
\begin{align*}
L(x, \lambda) = f(x) - \lambda^\top (h(x) - b),
\end{align*}
where $\lambda \in \mathbb{R}^m$ is called the **Lagrange multiplier** (or multiplier vector).
[/definition]
The sign convention is a matter of choice: using $f(x) + \lambda^\top(h(x) - b)$ instead merely replaces $\lambda^*$ by $-\lambda^*$ at the optimum, so neither convention is more correct. The key observation is that when the constraint $h(x) = b$ is satisfied, the penalty term vanishes and $L(x, \lambda) = f(x)$ regardless of $\lambda$.
[quotetheorem:2547]
The proof hinges on two observations: minimising $L$ over all of $X$ is easier than minimising $f$ over only the feasible subset $X(b)$, and on the feasible set the Lagrangian and the objective coincide.
[proof]
Let $X(b) = \{x \in X : h(x) = b\}$. For any $x \in X(b)$, we have $h(x) - b = 0$, so $f(x) = L(x, \lambda^*)$. Therefore:
\begin{align*}
\inf_{x \in X(b)} f(x) &= \inf_{x \in X(b)} L(x, \lambda^*) \\
&\geq \inf_{x \in X} L(x, \lambda^*) \quad \text{(minimising over a larger set can only decrease the infimum)} \\
&= L(x^*, \lambda^*) \quad \text{(by hypothesis)} \\
&= f(x^*) - \lambda^{*\top}(h(x^*) - b) = f(x^*). \quad \text{(since $x^*$ is feasible)}
\end{align*}
Since $x^* \in X(b)$ and $f(x^*) \geq \inf_{x \in X(b)} f(x)$, we conclude $f(x^*) = \inf_{x \in X(b)} f(x)$.
[/proof]
The sufficiency theorem is powerful precisely because of what it does not require: no differentiability, no convexity, no constraint qualification. It is a pure algebraic fact about infima. Its limitation is equally clear: we must already possess a pair $(x^*, \lambda^*)$ that jointly minimises the Lagrangian and satisfies the constraint. The theorem gives no algorithm for finding such a pair — it only confirms that once found, the pair certifies optimality. The task of searching for candidates is where the structure of $f$ and $h$ (differentiability, convexity) becomes essential.
[remark: How to Find the Multiplier]
The theorem does not tell us how to find $\lambda^*$ and $x^*$. To locate them, we need both $\nabla_x L = 0$ and $h(x) = b$. Expanding the gradient condition gives the system
\begin{align*}
\nabla f(x) &= \lambda \nabla h(x), \\
h(x) &= b,
\end{align*}
where $\nabla h(x) \in \mathbb{R}^{n \times m}$ denotes the Jacobian matrix of $h$ (each column is the gradient of one component $h_i$).
[/remark]
### Geometric Interpretation
The condition $\nabla f = \lambda \nabla h$ has an illuminating geometric meaning. In unconstrained minimisation, every directional derivative must vanish at a minimum, so $\nabla f = 0$. Under the constraint $h(x) = b$, the variable $x$ is restricted to the level set $\{x : h(x) = b\}$, a surface of dimension $n - m$. A minimum on this surface only requires that $\nabla f$ vanish in directions tangent to the surface — not in all directions.
Now the gradient $\nabla h$ is perpendicular to the level set of $h$ (it points in the normal direction). The condition $\nabla f = \lambda \nabla h$ therefore says exactly that $\nabla f$ is normal to the constraint surface. In other words, there is no "free" movement along the feasible set that reduces $f$.
[illustration:lagrange-multiplier-gradient-geometry]
[example: Minimisation with Two Equality Constraints]
Minimise $f(x) = x_1 - x_2 - 2x_3$ subject to
\begin{align*}
h_1(x) = x_1 + x_2 + x_3 &= 5, \\
h_2(x) = x_1^2 + x_2^2 &= 4.
\end{align*}
**Setting up the Lagrangian.** With multipliers $\lambda_1, \lambda_2 \in \mathbb{R}$:
\begin{align*}
L(x, \lambda) &= x_1 - x_2 - 2x_3 - \lambda_1(x_1 + x_2 + x_3 - 5) - \lambda_2(x_1^2 + x_2^2 - 4) \\
&= \bigl((1 - \lambda_1)x_1 - \lambda_2 x_1^2\bigr) + \bigl((-1 - \lambda_1)x_2 - \lambda_2 x_2^2\bigr) + (-2 - \lambda_1)x_3 + 5\lambda_1 + 4\lambda_2.
\end{align*}
**Identifying valid multipliers.** For $L(\cdot, \lambda)$ to have a finite infimum over $\mathbb{R}^3$, each term must be bounded below. Since $x_3$ appears linearly with coefficient $-2 - \lambda_1$, and $x_3$ is free, we need $\lambda_1 = -2$. The quadratic terms in $x_1$ and $x_2$ are bounded below only if $\lambda_2 < 0$ (so that the quadratic has a positive leading coefficient). Define the set of admissible multipliers $Y = \{\lambda \in \mathbb{R}^2 : \lambda_1 = -2,\, \lambda_2 < 0\}$.
**Finding the critical point.** With $\lambda_1 = -2$, the stationarity conditions $\partial L / \partial x_1 = 0$ and $\partial L / \partial x_2 = 0$ give:
\begin{align*}
3 - 2\lambda_2 x_1 = 0 &\implies x_1 = \frac{3}{2\lambda_2}, \\
1 - 2\lambda_2 x_2 = 0 &\implies x_2 = \frac{1}{2\lambda_2}.
\end{align*}
To confirm this is a minimum (not a saddle point), note that the Hessian of $L$ in the $(x_1, x_2)$-directions is $-2\lambda_2 I$, which is positive definite when $\lambda_2 < 0$, as required.
**Enforcing the functional constraints.** The second constraint $x_1^2 + x_2^2 = 4$ gives:
\begin{align*}
\frac{9}{4\lambda_2^2} + \frac{1}{4\lambda_2^2} = 4 \implies \frac{10}{4\lambda_2^2} = 4 \implies \lambda_2^2 = \frac{5}{8} \implies \lambda_2 = -\sqrt{\frac{5}{8}},
\end{align*}
where we take the negative root since $\lambda_2 < 0$. The first constraint then determines $x_3 = 5 - x_1 - x_2$.
**Optimal solution.** Substituting back:
\begin{align*}
x_1 = -3\sqrt{\frac{2}{5}}, \quad x_2 = -\sqrt{\frac{2}{5}}, \quad x_3 = 5 + 4\sqrt{\frac{2}{5}}.
\end{align*}
By the Lagrangian Sufficiency Theorem, this is optimal for the original problem.
[/example]
## Inequality Constraints and Slack Variables
What goes wrong when the constraints are inequalities rather than equalities? The gradient condition $\nabla f = \lambda \nabla h$ was derived under the assumption that the feasible set is locally a smooth manifold — a level set of $h$. An inequality $h(x) \leq b$ does not determine a manifold: it defines a region, and the feasible set may have corners, faces, and interior points where no constraint is active. At an interior optimum no constraints bind, so optimality is just $\nabla f = 0$. At a boundary point some constraints are active and behave like equalities; others are inactive and impose no restriction on the direction of movement. The Lagrangian approach handles this by introducing slack variables that convert inequalities to equalities — but at the cost of carrying extra variables that interact with the multipliers through complementary slackness.
Concretely, we convert the inequality $h(x) \leq b$ to an equality by introducing a **slack variable** $z \geq 0$: the constraint $h(x) \leq b$ becomes $h(x) + z = b$ with $z \in \mathbb{R}^m$, $z \geq 0$.
The augmented Lagrangian is then:
\begin{align*}
L(x, z, \lambda) = f(x) - \lambda^\top(h(x) + z - b).
\end{align*}
The procedure becomes:
1. Introduce slack variables to write $h(x) + z = b$ with $z \geq 0$.
2. Form $L(x, z, \lambda)$ as above.
3. Find $Y = \{\lambda : \inf_{x \in X,\, z \geq 0} L(x, z, \lambda) > -\infty\}$.
4. For each $\lambda \in Y$, find $x^*(\lambda)$ and $z^*(\lambda)$ minimising $L(\cdot, \cdot, \lambda)$ over $X \times \mathbb{R}^m_+$.
5. Find $\lambda^* \in Y$ such that $h(x^*(\lambda^*)) + z^*(\lambda^*) = b$.
By the Sufficiency Theorem, $x^*(\lambda^*)$ is then optimal for the original inequality-constrained problem.
### Complementary Slackness
The introduction of slack variables yields a useful structural constraint on the optimal multipliers. Since $z_j$ does not appear in the original objective $f(x)$, minimising $L$ over $z_j \geq 0$ reduces to minimising $-\lambda_j z_j$ over $z_j \geq 0$. This has a finite minimum only if $\lambda_j \leq 0$, and the minimum is:
- $z_j^* = 0$ if $\lambda_j < 0$ (the multiplier is nonzero, so the slack is forced to zero, meaning the constraint holds with equality);
- $z_j^*$ is free (any value $\geq 0$) if $\lambda_j = 0$.
In either case, $\lambda_j z_j^* = 0$. This is the **complementary slackness** condition.
[definition: Complementary Slackness]
Given an optimal slack $z^*(\lambda)$ for a Lagrangian with inequality constraints, **complementary slackness** is the condition
\begin{align*}
\lambda_j \cdot z^*(\lambda)_j = 0 \quad \text{for all } j = 1, \ldots, m.
\end{align*}
Equivalently, for each $j$, either the $j$-th constraint is active (equality holds, $z_j = 0$) or the $j$-th multiplier is zero ($\lambda_j = 0$), or both.
[/definition]
Complementary slackness significantly reduces the search space: instead of jointly optimising over all $\lambda$ and $z$, we can enumerate the $2^m$ combinations of which constraints are active and which multipliers are zero, and solve a much simpler problem in each case.
[example: Complementary Slackness in Practice]
Maximise $f(x) = x_1 - 3x_2$ subject to $x_1^2 + x_2^2 \leq 4$ and $x_1 + x_2 \leq 2$.
Introducing slack variables $z_1, z_2 \geq 0$: $x_1^2 + x_2^2 + z_1 = 4$ and $x_1 + x_2 + z_2 = 2$. Since we are maximising, equivalently minimise $-x_1 + 3x_2$. The Lagrangian is:
\begin{align*}
L(x, z, \lambda) = \bigl((-1 - \lambda_2)x_1 - \lambda_1 x_1^2\bigr) + \bigl((3 - \lambda_2)x_2 - \lambda_1 x_2^2\bigr) - \lambda_1 z_1 - \lambda_2 z_2 + 4\lambda_1 + 2\lambda_2.
\end{align*}
For $L$ to be bounded below, we need $\lambda_1, \lambda_2 \leq 0$.
Complementary slackness requires $\lambda_1 z_1 = 0$ and $\lambda_2 z_2 = 0$, giving four cases.
**Case 1: $\lambda_1 = 0$, $\lambda_2 = 0$.** The Lagrangian reduces to $L = -x_1 + 3x_2$, which is unbounded below over $\mathbb{R}^2$ (take $x_2 \to +\infty$). This case is infeasible in the dual.
**Case 2: $\lambda_1 = 0$, $z_2 = 0$ (second constraint active).** With $\lambda_1 = 0$, the Lagrangian is $(-1-\lambda_2)x_1 + (3-\lambda_2)x_2 + 2\lambda_2$. For this to be bounded below over $\mathbb{R}^2$, both linear coefficients must be zero: $\lambda_2 = -1$ and $\lambda_2 = 3$. These are contradictory, so this case yields no solution.
**Case 3: $z_1 = 0$, $\lambda_2 = 0$ (first constraint active).** With $\lambda_2 = 0$, stationarity in $x_1$ and $x_2$ gives $-1 - 2\lambda_1 x_1 = 0$ and $3 - 2\lambda_1 x_2 = 0$, so $x_1 = -\frac{1}{2\lambda_1}$ and $x_2 = \frac{3}{2\lambda_1}$. Substituting into $x_1^2 + x_2^2 = 4$:
\begin{align*}
\frac{1}{4\lambda_1^2} + \frac{9}{4\lambda_1^2} = 4 \implies \frac{10}{4\lambda_1^2} = 4 \implies \lambda_1^2 = \frac{5}{8} \implies \lambda_1 = -\sqrt{\frac{5}{8}},
\end{align*}
taking the negative root since $\lambda_1 \leq 0$. Then $x_1 = \frac{1}{2\sqrt{5/8}} = \sqrt{\frac{2}{5}}$ and $x_2 = -3\sqrt{\frac{2}{5}}$. Check: $x_1 + x_2 = \sqrt{\frac{2}{5}} - 3\sqrt{\frac{2}{5}} = -2\sqrt{\frac{2}{5}} < 2$, so the second constraint is indeed slack ($z_2 > 0$), consistent with $\lambda_2 = 0$. The maximum value of $f$ in this case is:
\begin{align*}
x_1 - 3x_2 = \sqrt{\frac{2}{5}} + 9\sqrt{\frac{2}{5}} = 10\sqrt{\frac{2}{5}} = 2\sqrt{10}.
\end{align*}
**Case 4: $z_1 = 0$, $z_2 = 0$ (both constraints active).** The feasible point must satisfy $x_1^2 + x_2^2 = 4$ and $x_1 + x_2 = 2$ simultaneously. Substituting $x_2 = 2 - x_1$ into the circle: $x_1^2 + (2-x_1)^2 = 4$, giving $2x_1^2 - 4x_1 = 0$, so $x_1 = 0$ or $x_1 = 2$. These give the candidate points $(0, 2)$ and $(2, 0)$, with objective values $0 - 6 = -6$ and $2 - 0 = 2$ respectively.
**Conclusion.** Comparing across cases, the maximum is $f = 2\sqrt{10} \approx 6.32$, achieved at $\bigl(\sqrt{2/5},\, -3\sqrt{2/5}\bigr)$.
[/example]
## Shadow Prices
We have seen that the multiplier $\lambda^*$ determines which direction the gradient of $f$ must point at an optimum. But $\lambda^*$ has a second, equally important interpretation: it measures how sensitive the optimal value is to changes in the constraint right-hand side $b$.
Define the **value function** $\phi: \mathbb{R}^m \to \mathbb{R}$ by
\begin{align*}
\phi(b) = \inf_{x \in X(b)} f(x),
\end{align*}
so $\phi(b)$ is the optimal objective value when the constraint is $h(x) = b$.
[quotetheorem:2548]
The proof is a careful application of the chain rule and the envelope theorem; it is omitted from this course.
The interpretation is economic and direct. Suppose $h(x) = b$ represents resource constraints: for instance, $x$ is a production plan, $h(x)$ is the vector of resources consumed, and $b$ is the vector of available resources. Then $\phi(b)$ is the minimum cost achievable given those resources. The derivative $\partial \phi / \partial b_i$ measures how much the optimal cost changes per unit increase in resource $i$ — precisely the **shadow price** of that resource. The Lagrange multiplier $\lambda_i^*$ is this marginal value.
For inequality constraints, complementary slackness integrates naturally: if the $i$-th constraint is not active at the optimum (there is slack), then tightening or relaxing it marginally has no effect on $\phi$, so $\partial \phi / \partial b_i = 0$. But complementary slackness says $\lambda_i^* = 0$ in that case. The two conditions are therefore consistent.
## Lagrange Duality
The Lagrangian gives rise to a natural **dual problem** that provides a lower bound on the primal optimal value — and, under favourable conditions, an exact match.
[definition: Dual Function and Dual Problem]
Given the primal problem $(P)$ with Lagrangian $L(x, \lambda) = f(x) - \lambda^\top(h(x) - b)$, the **dual function** $g: \mathbb{R}^m \to \mathbb{R} \cup \{-\infty\}$ is
\begin{align*}
g(\lambda) = \inf_{x \in X} L(x, \lambda).
\end{align*}
The set of **dual-feasible** multipliers is $Y = \{\lambda \in \mathbb{R}^m : g(\lambda) > -\infty\}$. The **dual problem** $(D)$ is
\begin{align*}
\sup_{\lambda \in Y} g(\lambda).
\end{align*}
[/definition]
The dual function $g(\lambda)$ encodes a fundamental trade-off: for each fixed $\lambda$, it gives the best (lowest) value achievable by any $x \in X$ when the constraint violation is penalised at rate $\lambda$. As we vary $\lambda$, we tune how heavily constraints are penalised, and the supremum over $\lambda$ gives the tightest lower bound this family of penalised problems can provide. Whether this lower bound is tight — whether it actually equals the primal optimal value — is the question of strong duality.
[quotetheorem:2549]
[citeproof:2549]
Weak duality says the dual optimal value is always a lower bound on the primal optimal value. The gap between these two values is called the **duality gap**. When the gap is zero, we have strong duality. The theorem holds without any convexity or smoothness assumption — it is a consequence of the definition alone. Its limitation is that it gives no information about how large the duality gap might be. In non-convex problems, the gap can be substantial: the dual may provide a lower bound that is far from the true primal optimal value, making the dual useless as a computational tool for solving the primal. We will see a concrete illustration of this shortly.
[definition: Strong Duality]
Problems $(P)$ and $(D)$ satisfy **strong duality** if
\begin{align*}
\sup_{\lambda \in Y} g(\lambda) = \inf_{x \in X(b)} f(x).
\end{align*}
[/definition]
Strong duality is not automatic — it is a structural property of the problem. When it holds, solving the dual (which may be easier) gives the primal optimal value. Moreover, it is precisely the condition under which the method of Lagrange multipliers succeeds: a dual optimizer $\lambda^*$ paired with the corresponding $x^*$ witnesses primal optimality via the Sufficiency Theorem.
[example: A Problem with a Duality Gap]
Consider the non-convex primal problem: minimise $f(x) = -x$ subject to $x^2 \leq 1$ (equivalently, $-1 \leq x \leq 1$). The primal optimum is $f^* = -1$ at $x^* = 1$.
Introducing a slack variable $z \geq 0$: $x^2 + z = 1$, the Lagrangian is
\begin{align*}
L(x, z, \lambda) = -x - \lambda(x^2 + z - 1).
\end{align*}
For $L$ to be bounded below over $(x, z) \in \mathbb{R} \times \mathbb{R}_+$, we need $\lambda \leq 0$ (so that $-\lambda z$ is bounded below for $z \geq 0$) and $\lambda \geq 0$ for the quadratic $-\lambda x^2$ to be bounded below in $x$. Both conditions force $\lambda = 0$.
With $\lambda = 0$, the dual function is $g(0) = \inf_{x} (-x) = -\infty$, so $Y = \{0\}$ is not useful: the dual problem gives $\sup_{\lambda \in Y} g(\lambda) = -\infty$. The duality gap is $-1 - (-\infty)$ — infinite.
The root cause is non-convexity: the constraint set $\{x : x^2 \leq 1\}$ is convex, but in more complex non-convex problems the dual function systematically underestimates the primal optimum. In integer programming, for example, where $x$ is restricted to integer values, the dual of the relaxed (continuous) problem always provides a lower bound but the gap can be large. Strong duality fails precisely because the Lagrangian relaxation cannot "see" the discrete structure of the feasible set.
[/example]
[example: Computing the Dual Function]
Return to the problem of minimising $x_1 - x_2 - 2x_3$ subject to $x_1 + x_2 + x_3 = 5$ and $x_1^2 + x_2^2 = 4$. From the earlier example, the set of admissible multipliers is $Y = \{\lambda \in \mathbb{R}^2 : \lambda_1 = -2,\, \lambda_2 < 0\}$, and for each $\lambda \in Y$, the Lagrangian is minimised at:
\begin{align*}
x^*(\lambda) = \left(\frac{3}{2\lambda_2},\, \frac{1}{2\lambda_2},\, 5 - \frac{4}{2\lambda_2}\right)^\top.
\end{align*}
Substituting back into $L$, the dual function is:
\begin{align*}
g(\lambda) = \frac{10}{4\lambda_2} + 4\lambda_2 - 10.
\end{align*}
The dual problem is to maximise $g$ over $\lambda_2 < 0$. Setting $\frac{d g}{d\lambda_2} = -\frac{10}{4\lambda_2^2} + 4 = 0$ gives $\lambda_2 = -\sqrt{5/8}$, matching the value found from the primal constraints. Computing $f(x^*)$ and $g(\lambda^*)$ confirms they are equal, so strong duality holds for this problem.
[/example]
## Supporting Hyperplanes and Conditions for Strong Duality
We now develop geometric conditions that guarantee strong duality. The connection between duality and geometry passes through the concept of a supporting hyperplane to the value function $\phi$.
[definition: Supporting Hyperplane]
A linear function $\alpha: \mathbb{R}^m \to \mathbb{R}$ is a **supporting hyperplane** to $\phi$ at $b$ if:
- $\alpha(b) = \phi(b)$ (the hyperplane touches $\phi$ at $b$), and
- $\alpha(c) \leq \phi(c)$ for all $c \in \mathbb{R}^m$ (the hyperplane lies below $\phi$ everywhere).
[/definition]
<!-- illustration-needed: plot of the value function phi(c) as a curved function of c in R^1, with a supporting hyperplane alpha(c) = phi(b) + lambda*(c-b) touching at b and lying below phi everywhere -->
A supporting hyperplane exists at $b$ whenever $\phi$ behaves well enough near $b$ — specifically, whenever $\phi$ is locally convex near $b$. Geometrically, the hyperplane is a "flat underpinning" for the graph of $\phi$: it touches the graph from below at $b$ and never pierces above it. The link to duality is that the slope $\lambda$ of this hyperplane is exactly the dual variable that closes the duality gap: as the proof below makes precise, having such a hyperplane is equivalent to there being a dual feasible point that achieves $g(\lambda) = \phi(b)$.
[quotetheorem:2550]
[citeproof:2550]
The theorem characterises strong duality completely in terms of the geometry of $\phi$. Notice, however, that the supporting hyperplane condition is a condition on $\phi$ — a derived object that depends on the entire family of optimisation problems parametrised by $b$. Checking it directly requires knowing $\phi$ in a neighbourhood of $b$, which is generally as hard as solving the original problem. The practical value of the theorem comes from identifying structural properties of $f$, $h$, and $X$ that guarantee $\phi$ is convex and hence (by the supporting hyperplane theorem below) that a supporting hyperplane exists.
## Convexity and the Guarantee of Strong Duality
When does a supporting hyperplane exist? The question reduces to asking when the value function $\phi$ is convex near $b$. If $\phi$ is convex everywhere, then by the supporting hyperplane theorem from convex analysis, a supporting hyperplane exists at every point in the interior of the domain, and strong duality holds throughout.
[quotetheorem:2551]
This result follows from the definition of convexity and is quoted without proof.
The theorem guarantees existence but is not constructive: it does not give the value of $\lambda$ defining the supporting hyperplane, only that one exists. The hypothesis that $b$ lies in the interior of the domain is essential — at a boundary point of the domain, $\phi$ may be convex but no supporting hyperplane exists (the function simply is not defined on one side). In optimisation, this corresponds to the case where the constraints are barely satisfiable: there exists an optimal solution but no dual certificate.
It remains to identify when $\phi$ itself is convex. The following theorem gives a practical sufficient condition.
[quotetheorem:2552]
[citeproof:2552]
The convexity of $\phi$ follows directly from the convexity of $f$, $h$, and $X$: the proof is essentially the observation that the convex combination of two feasible points is again feasible, and the objective at that combination is bounded by the convex combination of objectives. The hypotheses are tight: if $f$ is non-convex, the infimum over the feasible set need not respect convex combinations, and $\phi$ can fail to be convex even when $X$ and $h$ are convex. If $h$ is non-convex, the feasible region $\{x : h(x) \leq b\}$ may not contain convex combinations of feasible points from different parameter values $b_1$ and $b_2$.
[remark: The Equality Constraint Case]
The equality constraint $h(x) = b$ is equivalent to $h(x) \leq b$ and $-h(x) \leq -b$ simultaneously. For both $h$ and $-h$ to be convex, $h$ must be linear. Thus the convexity argument applies to equality-constrained problems when $h$ is linear — which is exactly the setting of linear programming.
[/remark]
In the equality constraint setting, the value function $\phi$ is convex whenever $f$ is convex and $h$ is linear. The supporting hyperplane theorem then guarantees a supporting hyperplane at any interior feasible parameter value $b$. Assembling this with the equivalence theorem gives us that strong duality holds automatically for equality-constrained convex programs with linear constraints — and, as a special case, for all linear programs.
[quotetheorem:2553]
The argument is direct: in a linear program, $f$, $h$, and the feasible region $X$ are all convex (in fact linear). By the convexity theorem, $\phi$ is convex. By the supporting hyperplane theorem, $\phi$ has a supporting hyperplane at any interior point $b$. By the equivalence theorem, strong duality holds.
This result is the theoretical cornerstone that justifies the entire Lagrangian approach to linear programs. It guarantees that the dual problem is not merely a lower bound gadget — it achieves the same optimal value as the primal, and solving either problem gives complete information about both. The result also reveals why non-convex and integer programs are harder: without convexity, neither $\phi$ nor $g$ need be well-behaved, the duality gap can be positive, and the dual provides only a lower bound rather than an exact characterisation of the primal optimum.
The duality theorem and shadow prices reveal the economic meaning hidden in the constraint structure of a linear program. Yet solving a linear program computationally is a different challenge: we must find the optimal vertex of a potentially high-dimensional polytope. This is where algorithmic techniques like the simplex method become essential.
# 3. Solutions of Linear Programs
## Linear Programs
Chapter 2 developed the Lagrangian framework for constrained optimisation in full generality: given any objective function $f$ and constraints $h(x) = b$, the Lagrangian sufficiency theorem tells us when a candidate point is globally optimal, and the complementary slackness conditions characterise optimality at the multiplier level. That framework applies to nonlinear problems, but it does not by itself give us an efficient algorithm — finding the right $\lambda^*$ can be just as hard as solving the original problem. This chapter turns to the special case where both the objective and the constraints are linear. This restriction buys us enormous structure: the feasible region becomes a convex polyhedron, and optimal solutions, when they exist, are always found at its corners. This geometric fact is the engine behind the simplex method, and the sections that follow make it precise and computationally actionable.
## Standard Form and the Feasible Region
Recall from Chapter 1 that any linear program can be converted into standard form. Introducing slack variables to turn inequality constraints into equalities, and splitting unconstrained variables into their positive and negative parts, we arrive at the canonical problem:
[definition: Standard Form LP]
The **standard form** of a linear program is
\begin{align*}
\text{maximise} \quad & c^\top x \\
\text{subject to} \quad & Ax = b, \quad x \geq 0,
\end{align*}
where $x \in \mathbb{R}^n$ is the vector of decision variables, $c \in \mathbb{R}^n$ is the cost vector, $A \in \mathbb{R}^{m \times n}$ is the constraint matrix, and $b \in \mathbb{R}^m$ is the right-hand side. The feasible region is the set $X(b) = \{x \in \mathbb{R}^n : Ax = b,\, x \geq 0\}$.
[/definition]
Throughout this chapter we work with the maximisation form. Minimising $c^\top x$ is equivalent to maximising $-c^\top x$, so no generality is lost.
The feasible region $X(b)$ is the intersection of the affine subspace $\{x : Ax = b\}$ with the non-negative orthant $\{x \geq 0\}$. Because each of these sets is convex, their intersection is convex. In two dimensions, $X(b)$ is a convex polygon — a finite collection of line segments bounding a shaded region — and the corners of that polygon are the key objects of interest.
[example: Feasible Region in Two Dimensions]
Consider the problem
\begin{align*}
\text{maximise} \quad & x_1 + x_2 \\
\text{subject to} \quad & x_1 + 2x_2 \leq 6, \\
& x_1 - x_2 \leq 3, \\
& x_1, x_2 \geq 0.
\end{align*}
Introducing slack variables $z_1, z_2 \geq 0$, this becomes
\begin{align*}
x_1 + 2x_2 + z_1 &= 6, \\
x_1 - x_2 + z_2 &= 3, \\
x_1, x_2, z_1, z_2 &\geq 0,
\end{align*}
which is in standard form with $n = 4$ variables and $m = 2$ constraints. The feasible region in the $(x_1, x_2)$-plane is the convex quadrilateral with vertices $(0,0)$, $(3,0)$, $(4,1)$, and $(0,3)$.
To maximise the objective $x_1 + x_2$, we want to move as far as possible in the direction of the cost vector $c = (1,1)^\top$. The level sets $\{x_1 + x_2 = k\}$ are lines of slope $-1$; as $k$ increases, these lines shift to the upper right. The maximum is achieved at the corner $(4, 1)$, giving the optimal value $5$.
[/example]
<!-- illustration-needed: the 2D feasible region example above — shade the quadrilateral with vertices (0,0), (3,0), (4,1), (0,3), draw the two constraint lines, and show the objective direction vector c = (1,1) along with several level-set lines x_1 + x_2 = k sweeping toward the optimal corner (4,1) -->
This picture suggests the central principle of the chapter: **the optimal value is attained at a corner of the feasible region**. But we need to handle an edge case. What if the objective is constant along an entire edge of the polygon — that is, $c$ is orthogonal to one of the boundary lines? In that case, a whole edge consists of optimal points, not just a single corner. However, since both endpoints of that edge are themselves corners, at least one corner is still optimal. Concretely: if an interior point $A$ of an edge achieves the maximum, we can slide $A$ along the edge until it reaches a corner vertex without changing the objective value. So a corner is always optimal.
This geometric reasoning generalises beyond two dimensions, but the argument requires a precise algebraic notion of "corner." That is the subject of the next section.
[remark: Why Corners Suffice]
The observation that optimal solutions occur at corners is the reason a finite algorithm for linear programs exists at all. A general convex set can have uncountably many extreme points (for instance, a disk has infinitely many boundary points). But the feasible region of a linear program is a convex polyhedron, which has only finitely many corners — at most $\binom{n}{m}$ by the counting argument in the next section. This finiteness is what makes exhaustive search, and ultimately the simplex algorithm, theoretically tractable.
[/remark]
## Basic Solutions
The geometry of the previous section revealed that optimal solutions to linear programs tend to occur at corners of the feasible polytope. This section translates that geometric intuition into precise algebraic language. The key idea is to characterise corners — called extreme points — directly from the constraint matrix $A$, without needing to plot anything. This algebraic characterisation is what makes the simplex method computable.
Throughout this section we work with a linear program in standard form:
\begin{align*}
\text{maximise} \quad c^\top x \quad \text{subject to} \quad Ax = b, \quad x \geq 0,
\end{align*}
where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. We assume the rows of $A$ are linearly independent; if they are not, redundant rows can be discarded without changing the feasible set. We also assume, for the discussion here, that any $m$ columns of $A$ are linearly independent — a genericity condition that rules out certain degeneracies and can always be arranged by a small perturbation.
### Extreme Points
Before defining basic solutions, we need the geometric notion they correspond to.
[definition: Extreme Point]
Let $S \subset \mathbb{R}^n$ be a convex set. A point $x \in S$ is an **extreme point** of $S$ if it cannot be written as a convex combination of two distinct points in $S$: that is, whenever $y, z \in S$ and $\delta \in (0, 1)$ satisfy
\begin{align*}
x = \delta y + (1 - \delta) z,
\end{align*}
it follows that $x = y = z$.
[/definition]
[explanation: What Extreme Points Capture]
An extreme point is a point that lies on no open line segment entirely contained in $S$. For a convex polygon in $\mathbb{R}^2$, the extreme points are exactly the vertices — the corner points. Interior points and points on edges (but not at corners) are never extreme, because any such point can be expressed as a midpoint of two nearby distinct points still inside $S$.
For the feasible polytope of a linear program, the extreme points are exactly the "corners" that Section 1 identified geometrically as candidates for the optimal solution. The definition above gives a coordinate-free characterisation that works in any dimension.
[/explanation]
### Basic Solutions and the Basis
The algebraic counterpart to extreme points is built around the idea of forcing $n - m$ of the $n$ variables to be zero. Since $Ax = b$ gives $m$ equations in $n$ unknowns, a solution is underdetermined — there is generally a family of solutions. Restricting to solutions with at most $m$ non-zero entries picks out a finite number of candidates.
[definition: Basic Solution and Basis]
A vector $x \in \mathbb{R}^n$ satisfying $Ax = b$ is a **basic solution** if it has at most $m$ non-zero entries: there exists a set $B \subseteq \{1, 2, \ldots, n\}$ with $|B| = m$ such that $x_i = 0$ whenever $i \notin B$. The set $B$ is called the **basis**, and the variables $x_i$ for $i \in B$ are the **basic variables**. The remaining variables $x_i$ for $i \notin B$ are the **non-basic variables**.
[/definition]
[remark: Basic Does Not Mean Feasible]
Being a basic solution means satisfying $Ax = b$ with at most $m$ non-zero entries. It says nothing about the sign of those entries. The constraint $x \geq 0$ is a separate requirement. A basic solution can have negative basic variable values, in which case it is not a feasible point of the linear program.
[/remark]
[definition: Non-Degenerate Basic Solution]
A basic solution is **non-degenerate** if it has exactly $m$ non-zero entries — that is, if all $m$ basic variables are strictly positive ($x_i > 0$ for $i \in B$, after accounting for feasibility). Equivalently, none of the basic variables equals zero.
[/definition]
Degeneracy occurs when a basic variable takes the value zero. It complicates the simplex method and is discussed further in that context.
[definition: Basic Feasible Solution]
A basic solution $x$ is a **basic feasible solution** (BFS) if it additionally satisfies $x \geq 0$.
[/definition]
The number of basic solutions is finite: it is at most $\binom{n}{m}$, the number of ways to choose which $m$ columns of $A$ form the basis. This finiteness is what makes it possible, in principle, to find an optimal solution by checking all vertices.
### A Worked Example
[example: Basic Feasible Solutions for a 2D LP]
Consider the linear program: maximise $f(x) = x_1 + x_2$ subject to
\begin{align*}
x_1 + 2x_2 + z_1 &= 6 \\
x_1 - x_2 + z_2 &= 3 \\
x_1, x_2, z_1, z_2 &\geq 0,
\end{align*}
where $z_1$ and $z_2$ are slack variables introduced to convert the original inequality constraints to equality form. Here $n = 4$ variables and $m = 2$ constraints, so a basic solution requires exactly $2$ non-zero entries (setting $n - m = 2$ variables to zero).
There are $\binom{4}{2} = 6$ ways to choose which two variables are non-zero. For each choice, solve the $2 \times 2$ system to find the corresponding values:
| Point | $x_1$ | $x_2$ | $z_1$ | $z_2$ | $f(x)$ | Feasible? |
|-------|--------|--------|--------|--------|---------|-----------|
| $A$ | $0$ | $0$ | $6$ | $3$ | $0$ | Yes |
| $B$ | $0$ | $3$ | $0$ | $6$ | $3$ | Yes |
| $C$ | $4$ | $1$ | $0$ | $0$ | $5$ | Yes |
| $D$ | $3$ | $0$ | $3$ | $0$ | $3$ | Yes |
| $E$ | $6$ | $0$ | $0$ | $-4$ | $6$ | No |
| $F$ | $0$ | $-3$ | $12$ | $0$ | $-3$ | No |
Points $E$ and $F$ are basic solutions but not feasible: $E$ has $z_2 = -4 < 0$ and $F$ has $x_2 = -3 < 0$. The four basic feasible solutions are $A$, $B$, $C$, $D$.
To verify point $C$: setting $z_1 = z_2 = 0$ reduces the system to $x_1 + 2x_2 = 6$ and $x_1 - x_2 = 3$. Subtracting gives $3x_2 = 3$, so $x_2 = 1$ and $x_1 = 4$. Both are non-negative, confirming $C$ is a BFS.
To verify point $E$: setting $x_2 = z_1 = 0$ gives $x_1 = 6$ from the first equation and $x_1 - z_2 = 3$ from the second, yielding $z_2 = 3 - 6 = -4 < 0$, which fails feasibility.
[/example]
[illustration:bfs-feasible-infeasible-vertices]
The diagram makes the correspondence explicit: $A$, $B$, $C$, $D$ are the four vertices of the feasible polytope. The infeasible basic solutions $E$ and $F$ lie on the constraint lines extended beyond the polytope.
### Basic Feasible Solutions Are the Extreme Points
The example strongly suggests that basic feasible solutions and extreme points are the same thing. This is indeed true, and is the central theorem of this section.
[quotetheorem:2554]
The course states this theorem without proof. The proof proceeds by showing both directions: if $x$ is a BFS, the linear independence of the columns indexed by $B$ prevents any non-trivial convex decomposition; conversely, if $x$ has more than $m$ non-zero entries, the resulting column dependence can be used to construct a convex decomposition, showing $x$ is not extreme.
[explanation: Why This Theorem Matters]
This theorem justifies the entire simplex method strategy. Geometrically, we want to search the vertices of the feasible polytope for the optimum. Algebraically, we can enumerate vertices by varying the basis $B$ — choosing which $m$ columns of $A$ define the basic variables — and solving the resulting system. The simplex method is an efficient way to navigate from one basis to an adjacent one, increasing the objective value at each step, rather than checking all $\binom{n}{m}$ possible bases.
The theorem also explains why the assumption that any $m$ columns of $A$ are linearly independent is useful: it ensures that each choice of $m$-element basis $B$ yields a unique basic solution, so different bases give different points in the feasible set.
[/explanation]
[remark: What Breaks Without Row Independence]
The hypothesis that the rows of $A$ are linearly independent is essential for the BFS–extreme-point correspondence to work cleanly. If the rows are dependent, then some constraints are redundant — they are implied by the others — and the linear system $Ax = b$ has a lower-dimensional solution set than expected. In that case, a choice of $m$ columns may fail to produce an invertible square submatrix $A_B$, so the formula $x_B = A_B^{-1} b$ is not well-defined. More concretely, two different choices of basis $B$ and $B'$ could give rise to the same extreme point of $X(b)$, or a vector with fewer than $m$ nonzero entries could fail to be identifiable as a BFS via the standard column-selection procedure. The correct remedy is to detect and remove redundant rows first, reducing $A$ to a matrix whose rows are genuinely linearly independent, before applying the theorem. In practice, row reduction (Gaussian elimination on $A$) accomplishes this.
[/remark]
### Counting and Finding Basic Solutions
Given a basis $B \subseteq \{1, \ldots, n\}$ with $|B| = m$, let $A_B \in \mathbb{R}^{m \times m}$ denote the submatrix of $A$ formed by the columns indexed by $B$. Under the assumption that any $m$ columns of $A$ are linearly independent, $A_B$ is invertible. The basic solution corresponding to $B$ is then:
\begin{align*}
x_B = A_B^{-1} b, \qquad x_i = 0 \text{ for } i \notin B.
\end{align*}
This is a basic feasible solution precisely when $A_B^{-1} b \geq 0$ (componentwise). Checking feasibility thus reduces to computing $A_B^{-1} b$ and verifying the sign of each entry.
[remark: Connection to the Simplex Method]
In the simplex method, the basis $B$ plays the role of a "current position" in the search. The algorithm moves from one basis to another by swapping one column into the basis (the entering variable) and one column out (the leaving variable), always maintaining feasibility. The decomposition $A = (A_B \mid A_N)$ and $x = (x_B \mid x_N)^\top$ into basis and non-basis components is the foundation of the simplex tableau, developed in detail in a later section.
[/remark]
## Extreme Points and Optimal Solutions
The earlier 2D picture already told us where to look for optimal solutions: at the corners of the feasible polygon. The fundamental theorem of this section says that this geometric intuition is exact, and extends to any number of dimensions. If a linear program has an optimal solution at all, it has one that is a basic feasible solution — an extreme point of the feasible set. This is not merely convenient; it is the reason the simplex method can search through a finite list of candidates rather than an infinite set.
## Optima Always Occur at Extreme Points
Recall the standard form maximization problem $(P)$: maximize $c^\top x$ subject to $Ax = b$, $x \geq 0$, where $A \in \mathbb{R}^{m \times n}$. The feasible set is
\begin{align*}
X(b) = \{x \in \mathbb{R}^n : Ax = b,\, x \geq 0\}.
\end{align*}
We say $(P)$ is **feasible** if $X(b) \neq \varnothing$, and **bounded** if the objective $c^\top x$ does not grow without bound over $X(b)$.
[quotetheorem:2555]
The statement is saying something subtle: it does not claim that every optimal solution is basic, only that at least one basic optimal solution exists. When the objective is constant along an edge of the feasible polytope — the degenerate case in which $c$ is orthogonal to a face — there may be an entire line segment of optimal points. But one endpoint of that segment, being a corner, is basic and feasible.
[proof]
Let $x^*$ be any optimal solution of $(P)$. If $x^*$ has at most $m$ nonzero entries, it is already a basic feasible solution and we are done.
Suppose instead that $x^*$ has $r > m$ nonzero entries. Because $x^*$ has more nonzero entries than a basic solution requires, it cannot be an extreme point of $X(b)$ (a result established when we identified basic feasible solutions with extreme points). So there exist distinct $y, z \in X(b)$ and $\delta \in (0,1)$ such that $x^* = \delta y + (1-\delta) z$.
By optimality of $x^*$, we have $c^\top x^* \geq c^\top y$ and $c^\top x^* \geq c^\top z$. But $c^\top x^* = \delta c^\top y + (1-\delta) c^\top z$, and a strict inequality in either term would contradict this equality. Hence $c^\top x^* = c^\top y = c^\top z$: both $y$ and $z$ are also optimal.
Since $y \geq 0$ and $z \geq 0$, the identity $x^* = \delta y + (1-\delta) z$ forces $y_i = z_i = 0$ whenever $x^*_i = 0$. In other words, the support of $y$ and $z$ is contained in the support of $x^*$, so $y$ and $z$ each have at most $r$ nonzero entries. If either has strictly fewer than $r$, we have found an optimal solution with fewer nonzero entries and we are done by induction.
Otherwise, consider the one-parameter family $x_{\hat\delta} = \hat\delta\, y + (1 - \hat\delta)\, z = z + \hat\delta(y - z)$. Since $y$ and $z$ are both optimal and feasible, so is $x_{\hat\delta}$ for every $\hat\delta \in \mathbb{R}$ (optimality is preserved by the linearity of $c^\top$; feasibility requires checking that $x_{\hat\delta} \geq 0$, which we arrange by choice of $\hat\delta$). The difference $y - z \neq 0$ has all its nonzero entries in positions where $x^*$ is nonzero. We can therefore choose $\hat\delta$ to drive exactly one coordinate of $x_{\hat\delta}$ to zero while keeping all others nonnegative, producing an optimal feasible point with strictly fewer than $r$ nonzero entries.
Repeating this argument finitely many times (the number of nonzero entries decreases strictly at each step), we eventually reach an optimal solution with at most $m$ nonzero entries — a basic feasible solution.
[/proof]
[remark: The Sliding Argument]
The inductive step has a concrete geometric meaning. When $c$ is orthogonal to an edge of the feasible polytope, the optimum is attained along the entire edge. Moving along the line $x_{\hat\delta} = z + \hat\delta(y-z)$ corresponds to "sliding" a point along this edge until it hits a corner. One endpoint of the edge is always a basic feasible solution.
[/remark]
<!-- illustration-needed: the sliding argument — show a 2D feasible polygon with an edge parallel to c; a point A in the interior of that edge; arrows indicating sliding toward the two endpoint corners, one of which is the basic feasible solution that achieves the optimum -->
## Why Finiteness Makes This Powerful
The theorem reduces the search for an optimum over the infinitely many points of $X(b)$ to a search over the finitely many basic feasible solutions. How many are there? A basic solution is determined by choosing which $m$ columns of $A$ form the basis matrix $A_B$. The number of ways to choose $m$ columns from $n$ is $\binom{n}{m}$, and each such choice yields at most one basic solution (since $A_B x_B = b$ has a unique solution when $A_B$ is invertible). So the feasible set has at most $\binom{n}{m}$ extreme points.
[remark: Finiteness Does Not Mean Tractability]
The bound $\binom{n}{m}$ can be astronomically large for realistic problem sizes. With $n = 200$ variables and $m = 100$ constraints, $\binom{200}{100} \approx 10^{58}$. Enumerating all basic feasible solutions is not a viable algorithm. The power of the simplex method — which is what comes next — is that it moves between adjacent basic feasible solutions in a directed way, reaching the optimum after far fewer steps in practice.
[/remark]
[example: Enumerating Basic Feasible Solutions]
Return to the standard form problem
\begin{align*}
\text{maximize} \quad & x_1 + x_2 \\
\text{subject to} \quad & x_1 + 2x_2 + z_1 = 6 \\
& x_1 - x_2 + z_2 = 3 \\
& x_1, x_2, z_1, z_2 \geq 0.
\end{align*}
We have $n = 4$ variables and $m = 2$ constraints, so there are $\binom{4}{2} = 6$ basic solutions. Each is found by setting two variables to zero and solving for the remaining two. The table below lists all six, together with feasibility and objective value:
| Label | $x_1$ | $x_2$ | $z_1$ | $z_2$ | Feasible? | $f(x)$ |
|-------|--------|--------|--------|--------|-----------|---------|
| $A$ | $0$ | $0$ | $6$ | $3$ | Yes | $0$ |
| $B$ | $0$ | $3$ | $0$ | $6$ | Yes | $3$ |
| $C$ | $4$ | $1$ | $0$ | $0$ | Yes | $5$ |
| $D$ | $3$ | $0$ | $3$ | $0$ | Yes | $3$ |
| $E$ | $6$ | $0$ | $0$ | $-4$ | No ($z_2 < 0$) | — |
| $F$ | $0$ | $-3$ | $12$ | $0$ | No ($x_2 < 0$) | — |
The basic feasible solutions are $A, B, C, D$, corresponding to the four corners of the feasible polygon. The maximum objective value is $5$, attained at $C = (4, 1, 0, 0)$.
To verify that $C$ is basic feasible: it satisfies both constraints ($4 + 2(1) = 6$ and $4 - 1 = 3$), it is nonnegative, and it has exactly $m = 2$ nonzero entries (specifically $x_1 = 4$ and $x_2 = 1$, with $z_1 = z_2 = 0$). The basis is $B = \{1, 2\}$, corresponding to the columns of $A$ for $x_1$ and $x_2$.
To see why $E$ fails: setting $x_2 = z_1 = 0$ gives $x_1 = 6$ from the first constraint ($6 = 6$), but then $z_2 = 3 - x_1 = 3 - 6 = -4 < 0$. This violates nonnegativity, so $E$ is not feasible.
[/example]
## The Broader Principle: Convex Functions on Compact Convex Sets
The result for linear programs is a special case of a more general principle. A linear objective $c^\top x$ is simultaneously both convex and concave; the LP theorem follows from convexity alone.
[quotetheorem:2556]
The proof uses the Krein–Milman theorem, which guarantees that every compact convex set is the convex hull of its extreme points. We state the key steps without proof, as the full argument requires tools from functional analysis.
The idea is as follows. Every $x \in X$ can be written as a convex combination of extreme points:
\begin{align*}
x = \sum_{i=1}^{k} \delta_i x_i, \quad \delta_i \geq 0, \quad \sum_{i=1}^{k} \delta_i = 1, \quad x_i \in \operatorname{ext}(X).
\end{align*}
By convexity of $f$:
\begin{align*}
f(x) \leq \sum_{i=1}^{k} \delta_i f(x_i) \leq \max_{1 \leq i \leq k} f(x_i) \leq \max_{x' \in \operatorname{ext}(X)} f(x').
\end{align*}
Since this holds for every $x \in X$, the maximum over $X$ is achieved at an extreme point.
[explanation: Why Convexity of $f$ Is the Right Condition]
It is important that we are maximizing a convex function (or equivalently, minimizing a concave function). For the minimum of a convex function, interior points can be strictly better than extreme points — that is exactly what convex optimization exploits. Linear programming sits at the intersection: a linear function is both convex and concave, so both maxima and minima lie at extreme points.
The practical upshot for operations research is this: whenever a problem can be cast as maximizing a convex (or minimizing a concave) function over a polytope, we can restrict attention to the vertices. For the special case of a linear objective, the vertices are exactly the basic feasible solutions, and their finite number makes algorithmic search possible.
[/explanation]
[remark: Why Compactness Cannot Be Dropped]
The compactness hypothesis is not merely a technical convenience — without it, the theorem fails. Consider $f(x) = x$ on the non-compact convex set $X = [0, \infty)$: the maximum does not exist at all, let alone at an extreme point. More subtly, consider a closed but unbounded convex set such as $X = \{(x_1, x_2) : x_1 \geq 0,\, x_2 \geq 0\}$ with $f(x_1, x_2) = x_1 + x_2$: again $f$ is unbounded on $X$ and no maximum exists. Compactness (closed and bounded) is precisely the condition that guarantees the maximum is attained somewhere on $X$; once existence is secured, the convexity argument then places the attained maximum at an extreme point. For linear programs, the "bounded" assumption in the theorem on existence of an extreme-point optimum plays exactly this role.
[/remark]
## Connection to Basic Feasible Solutions
Combining the results of this section and the previous one, we now have a clean correspondence:
**extreme points of $X(b)$** $\longleftrightarrow$ **basic feasible solutions of $Ax = b$, $x \geq 0$**.
Together with the theorem that an optimum is always achieved at an extreme point (when one exists), this gives the algorithmic blueprint:
1. The feasible set $X(b)$ has finitely many extreme points.
2. At least one extreme point is optimal.
3. Extreme points correspond exactly to basic feasible solutions, which are determined by choosing $m$ columns of $A$.
This is why the simplex method — which we develop in the next section — works by moving from one basic feasible solution to an adjacent one, improving the objective at each step. The theorem above guarantees that this search will terminate at a global optimum.
## Linear Programming Duality
Having established in the previous sections that optimal solutions to linear programs always occur at basic feasible solutions — the extreme points of the feasible polytope — we now develop a theory that lets us certify optimality without exhaustively examining all vertices. This is LP duality, and it is the LP-specific instantiation of the Lagrangian duality framework introduced in Chapter 2. The central objects are a pair of problems — the primal and the dual — whose optimal values coincide, and whose solutions are linked by a precise condition called complementary slackness.
### Deriving the Dual of a Linear Program
The derivation proceeds through the Lagrangian. Consider the linear program in general form, written with explicit slack variables $z \geq 0$ to convert inequality constraints into equalities:
[definition: Primal Linear Program in General Form]
The **primal linear program in general form** is
\begin{align*}
\text{minimize} \quad & c^\top x \\
\text{subject to} \quad & Ax - z = b, \\
& x, z \geq 0,
\end{align*}
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, and the feasible set for the primal variables is $X = \{(x, z) : x \geq 0,\, z \geq 0\} \subseteq \mathbb{R}^{n+m}$.
[/definition]
The Lagrangian for this problem, with multiplier $\lambda \in \mathbb{R}^m$ for the equality constraint $Ax - z = b$, is
\begin{align*}
L(x, z, \lambda) &= c^\top x - \lambda^\top(Ax - z - b) \\
&= (c^\top - \lambda^\top A)x + \lambda^\top z + \lambda^\top b.
\end{align*}
Since $x$ and $z$ are constrained only to be non-negative, we compute the Lagrangian dual function $g(\lambda) = \inf_{(x,z) \in X} L(x, z, \lambda)$. The infimum over $x \geq 0$ of the term $(c^\top - \lambda^\top A)x$ is $0$ if $c^\top - \lambda^\top A \geq 0$ (achieved at $x = 0$) and $-\infty$ otherwise; similarly, the infimum over $z \geq 0$ of $\lambda^\top z$ is $0$ if $\lambda \geq 0$ and $-\infty$ otherwise. Therefore $g(\lambda)$ is finite if and only if $A^\top \lambda \leq c$ and $\lambda \geq 0$, in which case $g(\lambda) = \lambda^\top b$. The dual problem — maximizing $g(\lambda)$ — is thus:
[definition: Dual Linear Program in General Form]
The **dual linear program** corresponding to the primal above is
\begin{align*}
\text{maximize} \quad & \lambda^\top b \\
\text{subject to} \quad & A^\top \lambda \leq c, \\
& \lambda \geq 0.
\end{align*}
The variables $\lambda \in \mathbb{R}^m$ are called the **dual variables** or **shadow prices**.
[/definition]
There is a structural symmetry worth noting immediately: the primal has $n$ variables and $m$ constraints (besides sign constraints), while the dual has $m$ variables and $n$ constraints. The roles of $b$ (the RHS of the primal constraints) and $c$ (the primal objective coefficients) are exchanged.
### Weak Duality
Before proving the main strong duality result, we establish the fundamental inequality that constrains the relationship between primal and dual objective values.
[quotetheorem:2557]
[citeproof:2557]
[explanation: When Is the Weak Duality Bound Tight vs. Loose?]
Weak duality says $\lambda^\top b \leq c^\top x$ for every feasible pair, but it does not say the gap $c^\top x - \lambda^\top b$ is zero at optimality. Understanding when the gap is tight and when it is strict is the central question that motivates strong duality.
For linear programs, strong duality (established below) guarantees the gap is zero at optimality: if both the primal and dual attain their optima, the optimal values coincide. So for LP, the weak duality bound is always achievable — it is tight.
For general nonlinear programs, the gap can be strictly positive even at the optimum. To see why, consider a primal problem that is non-convex or has a constraint qualification failure: the Lagrangian dual $g(\lambda) = \inf_x L(x,\lambda)$ can be strictly less than the primal infimum, because the dual function captures the "concave envelope" of the infimum rather than the infimum itself. The gap $\inf_x c^\top x - \sup_\lambda g(\lambda) \geq 0$ is called the **duality gap**, and it is precisely the gap measured by weak duality. For LP, convexity and the linear structure eliminate this gap entirely.
The weak duality bound is also useful when the gap is non-zero: even without strong duality, it certifies that the primal objective cannot be better than the dual objective. This one-sided bound is sufficient to prune search trees in branch-and-bound algorithms for integer programs, where strong duality does not hold in general.
[/explanation]
### Strong Duality
Weak duality gives an inequality; strong duality says the gap closes at optimality. This is the deepest result in LP theory and distinguishes linear programming from general nonlinear optimisation.
[quotetheorem:2553]
The proof of strong duality follows from the general Lagrangian duality theory of Chapter 2 applied to the LP setting: linearity of all functions ensures that the minimax identity holds without requiring constraint qualifications. Alternatively, it follows from the Simplex Method, which terminates at a primal–dual pair achieving equal objective values (this will be demonstrated in the next section). The proof is not reproduced here; the key point is that for LPs, unlike for general convex programs, there is no duality gap.
[explanation: Significance of Zero Duality Gap]
Strong duality is the deepest structural property of linear programming, and it has several important consequences. First, it converts a one-sided bound into an equality: the dual is not merely a relaxation of the primal but an exact reformulation. The optimal dual value precisely equals the optimal primal value, so solving either problem gives complete information about both.
Second, the zero duality gap is specific to LP and does not hold for general nonlinear programs — even convex ones. A convex quadratic program satisfies strong duality under Slater's condition (existence of a strictly feasible point), but without such a regularity condition, the duality gap can be positive even for convex objectives. LP's linearity means that Slater's condition is automatically satisfied whenever the problem is feasible and bounded, so no additional assumptions are needed.
Third, strong duality gives an algorithmic completeness guarantee: the simplex method terminates exactly when it has simultaneously constructed a primal optimal BFS and a dual feasible solution with equal objective values. This verifiable certificate of optimality — sometimes called a primal-dual certificate — is what makes LP practically checkable.
[/explanation]
[remark: Infeasibility and Unboundedness]
Strong duality also implies: if the primal is unbounded (optimal value $-\infty$), then the dual is infeasible; if the primal is infeasible, then either the dual is infeasible or the dual is unbounded. The case where both primal and dual are infeasible can occur.
[/remark]
### The Dual of the Dual
A natural question is whether taking the dual of the dual recovers the primal. For LPs, the answer is yes.
[quotetheorem:2558]
[citeproof:2558]
The self-duality of the LP dual construction means that the primal and dual stand in a completely symmetric relationship: the primal is a dual of the dual, and both problems contain exactly the same information about the optimal value. There is no asymmetry between the two — calling one "primal" and the other "dual" is a naming convention, not a structural distinction. This symmetry also has a practical consequence: if one problem is easy to solve (say, the dual has fewer variables than the primal), we can solve it instead and recover primal information via complementary slackness.
### Complementary Slackness
Strong duality tells us optimal primal and dual values coincide, but does not say how to recognize an optimal pair. Complementary slackness provides a precise characterisation: a pair $(x, \lambda)$ is optimal if and only if the slack in each dual constraint times the corresponding primal variable is zero, and vice versa.
[quotetheorem:2559]
[citeproof:2559]
[explanation: Reading Complementary Slackness]
The two conditions have a clean economic interpretation. The condition $(c^\top - \lambda^\top A)x = 0$ says: if a primal variable $x_j > 0$ (i.e., the $j$-th primal activity is positive), then the corresponding dual constraint must be tight, $(\lambda^\top A)_j = c_j$. Conversely, if the $j$-th dual constraint has slack, then $x_j = 0$ (that activity is not used at optimality). The condition $\lambda^\top(Ax - b) = 0$ says: if a dual variable $\lambda_i > 0$ (the $i$-th constraint has a nonzero shadow price), then the $i$-th constraint must be binding, $(Ax)_i = b_i$. Resources with positive shadow price are fully consumed at the optimum.
These are not just theoretical observations — they are the engine behind the Simplex Method's termination test. When we reach a basic feasible solution for the primal, we check whether the induced dual solution is feasible; if so, complementary slackness is automatically satisfied and we stop.
[/explanation]
### A Worked Example: Primal-Dual Correspondence
[example: Primal-Dual Pair and Complementary Slackness]
Consider the following primal problem (already in standard form with slack variables):
\begin{align*}
\text{maximize} \quad & 3x_1 + 2x_2 \\
\text{subject to} \quad & 2x_1 + x_2 + z_1 = 4, \\
& 2x_1 + 3x_2 + z_2 = 6, \\
& x_1, x_2, z_1, z_2 \geq 0.
\end{align*}
Here $m = 2$, $n = 4$ (including slacks), $c^\top = (3, 2, 0, 0)$, and $A = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 2 & 3 & 0 & 1 \end{pmatrix}$.
**Constructing the dual.** The dual variables are $\lambda = (\lambda_1, \lambda_2)^\top$. The dual constraints $A^\top \lambda \leq c$ give, with slack variables $\mu_1, \mu_2 \geq 0$:
\begin{align*}
\text{minimize} \quad & 4\lambda_1 + 6\lambda_2 \\
\text{subject to} \quad & 2\lambda_1 + 2\lambda_2 - \mu_1 = 3, \\
& \lambda_1 + 3\lambda_2 - \mu_2 = 2, \\
& \lambda_1, \lambda_2, \mu_1, \mu_2 \geq 0.
\end{align*}
**Complementary slackness conditions.** For a primal–dual pair $(x_1, x_2, z_1, z_2)$ and $(\lambda_1, \lambda_2, \mu_1, \mu_2)$ to be optimal, we need:
\begin{align*}
\lambda_1 z_1 = 0, \quad \lambda_2 z_2 = 0, \quad \mu_1 x_1 = 0, \quad \mu_2 x_2 = 0.
\end{align*}
**Enumerating basic solutions.** Since each problem has 2 constraints and 4 variables, basic solutions are obtained by setting 2 variables to zero. The table below shows all six basic solutions of the primal, the dual solution forced by complementary slackness, and their objective values:
| Vertex | $x_1$ | $x_2$ | $z_1$ | $z_2$ | $f(x)$ | $\lambda_1$ | $\lambda_2$ | $\mu_1$ | $\mu_2$ | $g(\lambda)$ |
|--------|--------|--------|--------|--------|---------|-------------|-------------|---------|---------|--------------|
| $A$ | $0$ | $0$ | $4$ | $6$ | $0$ | $0$ | $0$ | $-3$ | $-2$ | $0$ |
| $B$ | $2$ | $0$ | $0$ | $2$ | $6$ | $\frac{3}{2}$ | $0$ | $0$ | $-\frac{1}{2}$ | $6$ |
| $C$ | $3$ | $0$ | $-2$ | $0$ | $9$ | $0$ | $\frac{3}{2}$ | $0$ | $\frac{5}{2}$ | $9$ |
| $D$ | $\frac{3}{2}$ | $1$ | $0$ | $0$ | $\frac{13}{2}$ | $\frac{5}{4}$ | $\frac{1}{4}$ | $0$ | $0$ | $\frac{13}{2}$ |
| $E$ | $0$ | $2$ | $2$ | $0$ | $4$ | $0$ | $\frac{2}{3}$ | $-\frac{5}{3}$ | $0$ | $4$ |
| $F$ | $0$ | $4$ | $0$ | $-6$ | $8$ | $2$ | $0$ | $1$ | $0$ | $8$ |
To verify vertex $D$: primal has $z_1 = z_2 = 0$, so complementary slackness forces $\lambda_1, \lambda_2$ unrestricted; primal has $x_1, x_2 > 0$, so $\mu_1 = \mu_2 = 0$. Solving $2\lambda_1 + 2\lambda_2 = 3$ and $\lambda_1 + 3\lambda_2 = 2$ gives $\lambda_1 = \frac{5}{4}$, $\lambda_2 = \frac{1}{4}$.
**Reading the table.** Among the six vertices:
- $C$, $E$, $F$ are primal infeasible (negative entries in the primal variables).
- $A$, $B$ are primal feasible but dual infeasible (negative $\mu_i$, meaning dual constraints are violated).
- $D$ is the unique vertex at which both primal and dual solutions are feasible, and we can read off $f(x) = g(\lambda) = \frac{13}{2}$. By weak duality, this confirms $D$ is optimal without any further computation.
The primal feasible region is the polygon with vertices $A$, $B$, $D$, $E$ (the vertices with $x_1, x_2, z_1, z_2 \geq 0$). The dual feasible region in $(\lambda_1, \lambda_2)$-space is bounded by $2\lambda_1 + 2\lambda_2 \leq 3$ and $\lambda_1 + 3\lambda_2 \leq 2$ with $\lambda_1, \lambda_2 \geq 0$.
[/example]
<!-- illustration-needed: side-by-side diagrams of the primal feasible region in (x_1, x_2)-space and the dual feasible region in (lambda_1, lambda_2)-space, with corresponding vertices A-F labeled on each, and the optimal vertex D highlighted in both -->
### Shadow Prices and Sensitivity
The dual variables $\lambda_i$ carry economic meaning that justifies the name *shadow price*. They measure the rate of change of the optimal objective value with respect to perturbations in the right-hand side $b$.
[definition: Shadow Price]
Let $p^*(b)$ denote the optimal primal objective value as a function of the RHS vector $b \in \mathbb{R}^m$. If $p^*$ is differentiable at $b$, the **shadow price** corresponding to constraint $i$ is
\begin{align*}
\lambda_i^* = \frac{\partial p^*}{\partial b_i}.
\end{align*}
At the optimal dual solution $\lambda^*$, the dual variable $\lambda_i^*$ equals this partial derivative.
[/definition]
[explanation: Interpreting Shadow Prices]
In the example above, the optimal dual solution at vertex $D$ is $\lambda_1^* = \frac{5}{4}$, $\lambda_2^* = \frac{1}{4}$. This says: if we increase the RHS of the first constraint from $4$ to $5$ (relaxing that resource limit by one unit), the optimal primal value increases by $\frac{5}{4}$. Similarly, relaxing the second constraint by one unit increases the optimum by $\frac{1}{4}$.
The complementary slackness condition $\lambda_i^*(A x^* - b)_i = 0$ has a direct interpretation in this light: if constraint $i$ is not binding at optimality ($(Ax^*)_i > b_i$, so there is slack), then $\lambda_i^* = 0$. Relaxing a non-binding constraint cannot improve the optimum — the constraint was not the bottleneck. Resources with positive shadow price are the binding constraints: they are fully consumed, and any additional unit would strictly increase the objective.
For constraint $i$ with $\lambda_i^* > 0$: the constraint is tight at the optimum, and $(Ax^*)_i = b_i$. For primal variable $j$ with $x_j^* > 0$: the dual constraint for variable $j$ is tight, $(A^\top \lambda^*)_j = c_j$. These shadow price interpretations make LP duality a powerful tool in resource allocation, production planning, and economic equilibrium analysis.
[/explanation]
### Summary of LP Duality
The theory developed in this section can be summarised in a compact correspondence table. Given the primal in general form (minimise $c^\top x$ subject to $Ax \geq b$, $x \geq 0$), the dual (maximise $\lambda^\top b$ subject to $A^\top \lambda \leq c$, $\lambda \geq 0$) satisfies:
- **Weak duality**: $\lambda^\top b \leq c^\top x$ for any feasible primal $x$ and dual $\lambda$.
- **Strong duality**: if one is feasible and bounded, both attain the same optimal value.
- **Complementary slackness**: $(x^*, \lambda^*)$ is a primal–dual optimal pair if and only if $(c^\top - \lambda^{*\top} A)x^* = 0$ and $\lambda^{*\top}(Ax^* - b) = 0$.
- **Self-duality**: the dual of the dual is the primal.
With these tools in hand, we turn to the Simplex Method, which implements a systematic traversal of vertices of the primal feasible region, using complementary slackness as both a termination criterion and a pivot rule.
## The Simplex Method
The previous sections established the theoretical bedrock: basic feasible solutions coincide with extreme points, an optimal solution always occurs at an extreme point, and primal and dual feasibility together with complementary slackness characterise optimality. The simplex method converts these theoretical guarantees into a concrete, efficient algorithm. Starting from any basic feasible solution of the primal, it systematically moves to adjacent extreme points — each step strictly improving the objective — until the dual feasibility conditions are met and optimality is confirmed.
The structure of this section mirrors the source: we begin with an introductory example to build intuition, then develop the general theory of the simplex tableau, and finally codify the algorithm step by step.
## The Simplex Tableau
Before writing down the general setup, it helps to see the method in action on a small problem.
[example: Introductory Simplex Pivot]
Consider the problem: maximise $x_1 + x_2$ subject to
\begin{align*}
x_1 + 2x_2 + z_1 &= 6 \\
x_1 - x_2 + z_2 &= 3 \\
x_1, x_2, z_1, z_2 &\geq 0.
\end{align*}
The variables $z_1, z_2$ are slack variables introduced to convert the inequality constraints to equalities. We record all coefficients in the **simplex tableau**, placing the constraint rows above a separator and the objective row below it, with the right-hand side values in the final column:
| | $x_1$ | $x_2$ | $z_1$ | $z_2$ | RHS |
|---|---|---|---|---|---|
| Row 1 | $1$ | $2$ | $1$ | $0$ | $6$ |
| Row 2 | $1$ | $-1$ | $0$ | $1$ | $3$ |
| Objective | $1$ | $1$ | $0$ | $0$ | $0$ |
The columns of $z_1$ and $z_2$ form the $2 \times 2$ identity matrix. This identity block immediately gives a basic feasible solution: set the non-basic variables $x_1 = x_2 = 0$ and read off the basic variables $z_1 = 6$, $z_2 = 3$. The objective value is $c^\top x = 0$.
This solution is not optimal because the objective row has positive entries in the $x_1$ and $x_2$ columns, meaning the objective can be improved by increasing either variable. We pick $x_2$ as the **entering variable** (its objective coefficient $1 > 0$ signals potential improvement). To bring $x_2$ into the basis, we need to pivot on a row in the $x_2$ column. We select the row with the smallest ratio $\text{RHS}/a_{i,x_2}$ among rows with $a_{i,x_2} > 0$:
\begin{align*}
\text{Row 1}: \quad \frac{6}{2} = 3, \qquad \text{Row 2}: \quad \frac{3}{-1} < 0 \text{ (skip, coefficient non-positive)}.
\end{align*}
Row 1 is the unique choice, making $x_2$ the entering variable and $z_1$ the **leaving variable**. We divide Row 1 by $2$ to make the pivot element equal to $1$, then eliminate $x_2$ from every other row (including the objective row) by subtracting appropriate multiples of the new Row 1:
| | $x_1$ | $x_2$ | $z_1$ | $z_2$ | RHS |
|---|---|---|---|---|---|
| Row 1 | $\tfrac{1}{2}$ | $1$ | $\tfrac{1}{2}$ | $0$ | $3$ |
| Row 2 | $\tfrac{3}{2}$ | $0$ | $\tfrac{1}{2}$ | $1$ | $6$ |
| Objective | $\tfrac{1}{2}$ | $0$ | $-\tfrac{1}{2}$ | $0$ | $-3$ |
The new basic feasible solution is $x_2 = 3$, $z_2 = 6$, $x_1 = z_1 = 0$, with objective value $3$. Observe that the bottom-right entry records $-f(x)$, so we read $f(x) = 3$. The objective row still has a positive entry ($\tfrac{1}{2}$ in the $x_1$ column), so the algorithm continues.
[/example]
This example illustrates the two core steps: identifying an entering variable from a positive objective-row entry, and determining a leaving variable via the minimum-ratio test. We now develop the matrix algebra that makes these steps precise for an LP in standard form.
### The Basis Decomposition
Work with the LP in standard form: maximise $c^\top x$ subject to $Ax = b$, $x \geq 0$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
Let $B \subseteq \{1, 2, \ldots, n\}$ with $|B| = m$ be a basis index set. Rearranging columns so that basis columns come first, write
\begin{align*}
A &= \begin{pmatrix} A_B & A_N \end{pmatrix}, \qquad
x = \begin{pmatrix} x_B \\ x_N \end{pmatrix}, \qquad
c = \begin{pmatrix} c_B \\ c_N \end{pmatrix},
\end{align*}
where $A_B \in \mathbb{R}^{m \times m}$ is the (invertible) basis submatrix, $A_N \in \mathbb{R}^{m \times (n-m)}$ holds the non-basic columns, $x_B \in \mathbb{R}^m$ are the basic variables, and $x_N \in \mathbb{R}^{n-m}$ are the non-basic variables. The constraint $Ax = b$ becomes
\begin{align*}
A_B x_B + A_N x_N &= b,
\end{align*}
which gives
\begin{align*}
x_B &= A_B^{-1}(b - A_N x_N).
\end{align*}
Setting $x_N = 0$ yields the basic feasible solution $x_B = A_B^{-1} b$, $x_N = 0$, provided $A_B^{-1} b \geq 0$.
The objective function decomposes as
\begin{align*}
f(x) = c^\top x &= c_B^\top x_B + c_N^\top x_N \\
&= c_B^\top A_B^{-1}(b - A_N x_N) + c_N^\top x_N \\
&= c_B^\top A_B^{-1} b + \bigl(c_N^\top - c_B^\top A_B^{-1} A_N\bigr) x_N.
\end{align*}
This expression separates the objective into a constant $c_B^\top A_B^{-1} b$ (the value at the current basic feasible solution) and a linear function of $x_N$. The row vector $c_N^\top - c_B^\top A_B^{-1} A_N$ is called the **reduced cost** of the non-basic variables.
[definition: Reduced Costs]
Let $B$ be a basis for the LP $\max\{c^\top x : Ax = b,\, x \geq 0\}$. The **reduced cost** of the non-basic variable $x_j$ (for $j \notin B$) is
\begin{align*}
\bar{c}_j &= c_j - c_B^\top A_B^{-1} a_j,
\end{align*}
where $a_j$ is the $j$-th column of $A$. Collecting all non-basic reduced costs gives the row vector $\bar{c}_N^\top = c_N^\top - c_B^\top A_B^{-1} A_N$.
[/definition]
The reduced cost has a transparent interpretation: it measures the net change in the objective when $x_j$ is increased by one unit, accounting for the compensating changes in the basic variables required to maintain feasibility.
### The General Tableau
The simplex tableau organises all the algebraic information needed to execute one pivot step. After multiplying the constraint equations by $A_B^{-1}$, the tableau takes the following form:
| Basis columns | Non-basis columns | RHS |
|---|---|---|
| $I$ | $A_B^{-1} A_N$ | $A_B^{-1} b$ |
| $0$ | $c_N^\top - c_B^\top A_B^{-1} A_N$ | $-c_B^\top A_B^{-1} b$ |
The identity block $I$ on the left makes the basic variables readable directly: the $i$-th basic variable equals the $i$-th entry of $A_B^{-1} b$. The bottom row records the reduced costs $\bar{c}_N^\top$ for each non-basic variable and the negative of the current objective value in the bottom-right cell.
[remark: Orientation of the Objective Row]
The bottom-right entry of the tableau is $-c_B^\top A_B^{-1} b = -f(x_B, 0)$. When reading the current objective value, remember to negate this entry. This sign convention arises because the objective row is derived by subtracting $c_B^\top$ times the constraint rows from the objective equation, effectively tracking $-f$ rather than $f$.
[/remark]
### The Optimality Test
The optimality criterion follows immediately from the expression $f(x) = c_B^\top A_B^{-1} b + \bar{c}_N^\top x_N$.
[quotetheorem:2560]
[citeproof:2560]
[explanation: Multiple Optima and the Geometric Meaning]
The criterion $\bar{c}_N \leq 0$ is sufficient for optimality, but the case where some reduced costs are exactly zero (rather than strictly negative) is worth examining. If $\bar{c}_j = 0$ for some non-basic $j$, then increasing $x_j$ does not change the objective value — there is a direction in which the objective is flat. This means the current BFS is optimal, but it is not the unique optimum: there exists a ray (or edge) of the feasible polytope along which the objective is constant and equal to $f(x^*)$. Any point along that edge, including the adjacent BFS obtained by pivoting on $j$, is also optimal. In this situation the LP has infinitely many optimal solutions forming a face of the feasible polytope; however, at least two BFSs (the two endpoints of the optimal edge) are optimal, and the simplex method will stop at one of them.
Geometrically, the condition $\bar{c}_N \leq 0$ says that the objective cannot be improved by moving away from the current vertex along any edge of the feasible polytope. Each non-basic variable $x_j$ corresponds to an edge direction leaving the current vertex: a positive reduced cost $\bar{c}_j > 0$ means that edge points toward a better objective value, while a non-positive reduced cost $\bar{c}_j \leq 0$ means the edge goes sideways or downhill. The optimality criterion is therefore a local condition — "no improving edge exists" — which, by the convexity of the feasible polytope and the linearity of the objective, is also a global condition.
[/explanation]
### Pivoting: Entering and Leaving Variables
When some reduced cost $\bar{c}_j > 0$, the current solution is not optimal and we can improve by increasing $x_j$. As $x_j$ increases from $0$, the basic variables adjust according to
\begin{align*}
x_B &= A_B^{-1} b - (A_B^{-1} a_j)\, x_j,
\end{align*}
where $a_j$ is the $j$-th column of $A$ and $A_B^{-1} a_j$ is the column of $A_B^{-1} a_j$ displayed in the tableau. To keep $x_B \geq 0$, the increase in $x_j$ is limited by
\begin{align*}
x_j \leq \min\left\{\frac{(A_B^{-1} b)_i}{(A_B^{-1} a_j)_i} : (A_B^{-1} a_j)_i > 0\right\}.
\end{align*}
This is the **minimum-ratio test**. The basic variable achieving the minimum (say $(x_B)_r$) drops to zero and **leaves** the basis, while $x_j$ **enters**. If $A_B^{-1} a_j \leq 0$ componentwise, the objective is unbounded above and the algorithm terminates with that conclusion.
[definition: Pivot Element]
Given a pivot column $j$ (entering variable) and a pivot row $r$ (determined by the minimum-ratio test), the **pivot element** is $a_{rj} = (A_B^{-1} a_j)_r > 0$. A **pivot step** consists of dividing row $r$ by $a_{rj}$ to normalise the pivot element to $1$, then adding $-a_{kj}/a_{rj}$ times the new row $r$ to every other row $k \neq r$ (including the objective row). This eliminates all entries in column $j$ except the pivot, and the new basis replaces $(x_B)_r$ with $x_j$.
[/definition]
<!-- illustration-needed: the simplex tableau pivot — show a 4×5 tableau with the pivot element highlighted, arrows indicating which rows are updated, and the resulting tableau after the pivot step, emphasising the entering variable now forming a unit column -->
## The Simplex Algorithm
The above steps combine into a systematic procedure.
[definition: The Simplex Method]
Given an LP $\max\{c^\top x : Ax = b,\, x \geq 0\}$ with an initial basic feasible solution, the **simplex method** proceeds as follows.
1. **Initial tableau.** Write the simplex tableau for the current basis $B$, with $A_B^{-1} A$ in the constraint rows and reduced costs $\bar{c}^\top = c^\top - c_B^\top A_B^{-1} A$ in the objective row.
2. **Optimality check.** If all objective-row entries $\bar{c}_j \leq 0$ for every non-basic $j$, the current basic feasible solution is optimal. Stop.
3. **Choose entering variable.** Select a non-basic index $j$ with $\bar{c}_j > 0$. A common rule is Bland's rule (choose the smallest such $j$) to avoid cycling in degenerate problems.
4. **Minimum-ratio test.** Let $d = A_B^{-1} a_j$ be the pivot column in the current tableau. If $d \leq 0$ componentwise, the LP is unbounded. Stop. Otherwise, compute
\begin{align*}
r &= \operatorname{arg\,min}\left\{\frac{(A_B^{-1} b)_i}{d_i} : d_i > 0\right\}.
\end{align*}
The basic variable in row $r$ is the **leaving variable**. If the minimum is achieved by more than one row, the problem is **degenerate** and care is needed to prevent cycling.
5. **Pivot.** Perform a pivot step on element $a_{rj}$: divide row $r$ by $a_{rj}$, then eliminate all other entries in column $j$. The new basis is $B' = (B \setminus \{r\text{-th basis element}\}) \cup \{j\}$.
6. **Return to step 2.**
[/definition]
Each pivot step moves to an adjacent vertex of the feasible polytope with a strictly larger (or equal, if degenerate) objective value. Since there are finitely many vertices, the algorithm terminates in finite time provided degeneracy is handled (for example, via Bland's rule or a perturbation argument).
[remark: Degeneracy and Cycling]
If two or more rows tie in the minimum-ratio test, the leaving variable can be chosen arbitrarily, but the new basic feasible solution will have a zero basic variable — it is **degenerate**. Subsequent pivot steps may achieve no improvement, and in theory the algorithm could revisit the same vertex repeatedly (**cycling**). Bland's rule (always choose the smallest available index) provably prevents cycling but can be slower in practice than heuristics like Dantzig's rule (choose the most positive reduced cost).
[/remark]
## A Complete Example
We now carry out the full simplex algorithm on the introductory problem to see convergence.
[example: Full Simplex Execution]
Maximise $x_1 + x_2$ subject to $x_1 + 2x_2 + z_1 = 6$, $x_1 - x_2 + z_2 = 3$, $x_1, x_2, z_1, z_2 \geq 0$.
**Initial tableau.** The slack variables $z_1, z_2$ form an identity basis, giving the initial BFS $z_1 = 6$, $z_2 = 3$, $x_1 = x_2 = 0$, $f = 0$.
| | $x_1$ | $x_2$ | $z_1$ | $z_2$ | RHS |
|---|---|---|---|---|---|
| (1) | $1$ | $2$ | $1$ | $0$ | $6$ |
| (2) | $1$ | $-1$ | $0$ | $1$ | $3$ |
| Obj | $1$ | $1$ | $0$ | $0$ | $0$ |
**Iteration 1.** Both $x_1$ and $x_2$ have positive reduced costs. Choose $x_2$ (column with entry $1$) as the entering variable. Minimum-ratio test: only Row (1) has a positive entry in the $x_2$ column ($2 > 0$), giving ratio $6/2 = 3$. Row (2) has $-1 < 0$, excluded. Pivot element: $a_{1,x_2} = 2$.
Pivot: divide Row (1) by $2$, subtract the new Row (1) from the objective row (coefficient $1$), and add $\tfrac{1}{2} \times$ new Row (1) to Row (2) to eliminate $x_2$:
| | $x_1$ | $x_2$ | $z_1$ | $z_2$ | RHS |
|---|---|---|---|---|---|
| (1) | $\tfrac{1}{2}$ | $1$ | $\tfrac{1}{2}$ | $0$ | $3$ |
| (2) | $\tfrac{3}{2}$ | $0$ | $\tfrac{1}{2}$ | $1$ | $6$ |
| Obj | $\tfrac{1}{2}$ | $0$ | $-\tfrac{1}{2}$ | $0$ | $-3$ |
New BFS: $x_2 = 3$, $z_2 = 6$, $x_1 = z_1 = 0$, $f = 3$.
**Iteration 2.** The only positive reduced cost is $\tfrac{1}{2}$ in column $x_1$. Enter $x_1$. Minimum-ratio test on positive $x_1$-column entries: Row (1) gives $3 / \tfrac{1}{2} = 6$ and Row (2) gives $6 / \tfrac{3}{2} = 4$. Row (2) achieves the minimum, so $z_2$ leaves the basis. Pivot element: $a_{2,x_1} = \tfrac{3}{2}$.
Pivot: divide Row (2) by $\tfrac{3}{2}$, subtract $\tfrac{1}{2} \times$ new Row (2) from Row (1), subtract $\tfrac{1}{2} \times$ new Row (2) from the objective row:
\begin{align*}
\text{New Row (2):} \quad& x_1 = 1, \; x_2 = 0, \; z_1 = \tfrac{1}{3}, \; z_2 = \tfrac{2}{3}, \; \text{RHS} = 4. \\
\text{New Row (1):} \quad& x_1 = 0, \; x_2 = 1, \; z_1 = \tfrac{1}{3}, \; z_2 = -\tfrac{1}{3}, \; \text{RHS} = 1. \\
\text{New Obj:} \quad& x_1 = 0, \; x_2 = 0, \; z_1 = -\tfrac{2}{3}, \; z_2 = -\tfrac{1}{3}, \; \text{RHS} = -5.
\end{align*}
| | $x_1$ | $x_2$ | $z_1$ | $z_2$ | RHS |
|---|---|---|---|---|---|
| (1) | $0$ | $1$ | $\tfrac{1}{3}$ | $-\tfrac{1}{3}$ | $1$ |
| (2) | $1$ | $0$ | $\tfrac{1}{3}$ | $\tfrac{2}{3}$ | $4$ |
| Obj | $0$ | $0$ | $-\tfrac{2}{3}$ | $-\tfrac{1}{3}$ | $-5$ |
**Optimality check.** All objective-row entries are non-positive. The algorithm terminates.
The optimal basic feasible solution is $x_1 = 4$, $x_2 = 1$, $z_1 = z_2 = 0$, with optimal value $f = 5$. (Recall: the bottom-right entry is $-f = -5$, so $f = 5$.)
We can verify: $x_1 + x_2 = 5$, $x_1 + 2x_2 = 6$ (Row 1 tight), $x_1 - x_2 = 3$ (Row 2 tight). Both constraints are active, confirming that $x^* = (4, 1)$ is a vertex of the feasible polytope. This matches point $C$ in the basic solutions enumerated in Section 2.
[/example]
The algorithm terminated in exactly two pivot steps for this small problem. In higher dimensions the path through the polytope may be longer, but the guarantee from Section 3 — that an optimal vertex exists — ensures the search always ends.
[explanation: Connecting Simplex to Duality]
The reduced cost vector $\bar{c}_N^\top = c_N^\top - c_B^\top A_B^{-1} A_N$ has a direct dual interpretation. Define $\lambda^\top = c_B^\top A_B^{-1}$; this is a candidate dual solution. The optimality condition $\bar{c}_N \leq 0$ then reads $c_N \leq A_N^\top \lambda$, which is exactly the dual feasibility condition $A^\top \lambda \leq c$. Moreover, at the optimal tableau the basic variables $x_B = A_B^{-1} b$ satisfy $A_B^\top \lambda = c_B$ with equality, so complementary slackness holds automatically. The simplex method thus simultaneously constructs a primal optimal BFS and a dual feasible solution $\lambda$, and their equal objective values certify optimality through strong duality. This is the deep reason the termination criterion (all reduced costs non-positive) is both necessary and sufficient.
[/explanation]
## The Two-Phase Simplex Method
The simplex method developed in the previous section assumed we could read off an initial basic feasible solution directly from the tableau — specifically, that the constraint matrix already contained an identity submatrix, so the slack variables themselves formed a basis with non-negative values. This happy situation arises when every constraint is a $\leq$ inequality, because adding a slack variable to each row produces exactly such an identity block. But when some constraints are $\geq$ inequalities, the slack variables enter with a negative coefficient, and setting all structural variables to zero gives negative values for those slacks — not a feasible solution at all. The question of how to find a starting BFS, or determine that no BFS exists (which would mean the LP itself is infeasible), is the initialisation problem. The two-phase simplex method is the standard answer.
## Artificial Variables and the Phase I Problem
The core idea is to manufacture a guaranteed feasible solution by introducing **artificial variables** — non-negative variables added one per problematic constraint, which are set to the right-hand side value to satisfy each constraint immediately without any structural variables. The trick is that these artificial variables must then be driven to zero before we can trust the solution: a BFS for the augmented problem with positive artificial variables is not a BFS for the original problem. Phase I solves an auxiliary LP whose sole purpose is to minimise the sum of all artificial variables. If the minimum of this auxiliary objective is zero, all artificial variables are zero at the optimum, we have found a BFS for the original problem, and we can move to Phase II. If the minimum is strictly positive, no BFS exists and the original LP is infeasible.
To make this concrete, consider transforming a minimisation problem into maximisation form (by negating the objective), then adding slacks. For a $\geq$ constraint of the form
\begin{align*}
a_1 x_1 + \cdots + a_n x_n \geq b_i,
\end{align*}
we subtract a slack variable $z_i \geq 0$ to obtain the equality
\begin{align*}
a_1 x_1 + \cdots + a_n x_n - z_i = b_i.
\end{align*}
Setting all structural and slack variables to zero would require $0 = b_i$, which fails whenever $b_i > 0$. We therefore add an artificial variable $y_i \geq 0$:
\begin{align*}
a_1 x_1 + \cdots + a_n x_n - z_i + y_i = b_i.
\end{align*}
Now setting $y_i = b_i$ and all other variables to zero is a valid (if artificial) BFS. The Phase I problem minimises $\sum_i y_i$ subject to all the augmented constraints, using this as the starting BFS.
[definition: Artificial Variable]
Given a linear program in standard form $Ax = b$, $x \geq 0$, an **artificial variable** $y_i \geq 0$ is introduced for each constraint row $i$ that does not already have a unit vector as its basis column. The **Phase I problem** is
\begin{align*}
\text{minimise} \quad & \sum_i y_i \\
\text{subject to} \quad & [\,A \mid I\,]\begin{pmatrix} x \\ y \end{pmatrix} = b, \quad x, y \geq 0,
\end{align*}
where the artificial variables $y = (y_1, \ldots, y_m)^\top$ form the initial identity basis.
[/definition]
[remark: Phase I Feasibility]
The Phase I problem is always feasible: setting $y = b$ (assuming $b \geq 0$, which can be arranged by negating any row with $b_i < 0$) and $x = 0$ gives a BFS for the augmented system with Phase I objective value $\mathbf{1}^\top b \geq 0$. Since the objective is bounded below by $0$, Phase I always terminates.
[/remark]
## The Two-Phase Method in Practice
The tableau for the two-phase method carries **two objective rows simultaneously**: the Phase I row (minimising $\sum y_i$, equivalently maximising $-\sum y_i$) and the original objective row. During Phase I we optimise using the Phase I row only; the original objective row is updated by the same pivot operations so that it is ready for Phase II.
One subtlety deserves attention before running the simplex iterations: when the artificial variables form the initial basis, the Phase I objective row will contain non-zero entries in the columns of the basis variables $y_i$. These entries must be cleared before the tableau is in proper form. This is done by row-reducing: for each basis artificial variable $y_i$, add the corresponding constraint row to the Phase I objective row in order to zero out the $-1$ coefficient beneath $y_i$.
Once Phase I terminates with all artificial variables equal to zero, we drop the artificial variable columns and the Phase I objective row, retaining the constraint rows and the original objective row. This Phase II tableau already has all its basis columns zero-ed out in the objective row (because the pivot operations during Phase I maintained this invariant), so we can run the standard simplex directly.
[example: Two-Phase Method — Minimisation with Greater-Than Constraints]
Consider the problem
\begin{align*}
\text{minimise} \quad & 6x_1 + 3x_2 \\
\text{subject to} \quad & x_1 + x_2 \geq 1 \\
& 2x_1 - x_2 \geq 1 \\
& 3x_2 \leq 2 \\
& x_1, x_2 \geq 0.
\end{align*}
**Standardisation.** Minimising $6x_1 + 3x_2$ is equivalent to maximising $-6x_1 - 3x_2$. We introduce slack variables: subtract $z_1, z_2 \geq 0$ from the first two constraints and add $z_3 \geq 0$ to the third:
\begin{align*}
x_1 + x_2 - z_1 &= 1, \\
2x_1 - x_2 - z_2 &= 1, \\
3x_2 + z_3 &= 2, \\
x_1, x_2, z_1, z_2, z_3 &\geq 0.
\end{align*}
Setting all variables to zero would require $z_1 = z_2 = -1$, which is infeasible. So we add artificial variables $y_1, y_2 \geq 0$ to the first two rows (the third row already has $z_3$ as a natural basis column):
\begin{align*}
x_1 + x_2 - z_1 + y_1 &= 1, \\
2x_1 - x_2 - z_2 + y_2 &= 1, \\
3x_2 + z_3 &= 2.
\end{align*}
The initial BFS is $y_1 = 1, y_2 = 1, z_3 = 2$, all others zero. The Phase I objective is to maximise $-y_1 - y_2$.
**Initial tableau.** Both objective rows are recorded. The original objective row has coefficients $(-6, -3, 0, 0, 0, 0, 0)$ for $(x_1, x_2, z_1, z_2, z_3, y_1, y_2)$ and RHS $0$. The Phase I row has $( 0, 0, 0, 0, 0, -1, -1)$ and RHS $0$.
| $x_1$ | $x_2$ | $z_1$ | $z_2$ | $z_3$ | $y_1$ | $y_2$ | RHS |
|-------|-------|-------|-------|-------|-------|-------|-----|
| $1$ | $1$ | $-1$ | $0$ | $0$ | $1$ | $0$ | $1$ |
| $2$ | $-1$ | $0$ | $-1$ | $0$ | $0$ | $1$ | $1$ |
| $0$ | $3$ | $0$ | $0$ | $1$ | $0$ | $0$ | $2$ |
| $-6$ | $-3$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ |
| $0$ | $0$ | $0$ | $0$ | $0$ | $-1$ | $-1$ | $0$ |
The basis variables are $y_1, y_2, z_3$, but the Phase I row has $-1$ under each basis variable $y_1$ and $y_2$. We clear these by adding rows 1 and 2 to the Phase I row:
| $x_1$ | $x_2$ | $z_1$ | $z_2$ | $z_3$ | $y_1$ | $y_2$ | RHS |
|-------|-------|-------|-------|-------|-------|-------|-----|
| $1$ | $1$ | $-1$ | $0$ | $0$ | $1$ | $0$ | $1$ |
| $2$ | $-1$ | $0$ | $-1$ | $0$ | $0$ | $1$ | $1$ |
| $0$ | $3$ | $0$ | $0$ | $1$ | $0$ | $0$ | $2$ |
| $-6$ | $-3$ | $0$ | $0$ | $0$ | $0$ | $0$ | $0$ |
| $3$ | $0$ | $-1$ | $-1$ | $0$ | $0$ | $0$ | $2$ |
**Phase I, Iteration 1.** The Phase I row has positive entries in the $x_1$ column (value $3$) and no other positive entries in the non-basis columns (columns $x_2, z_1, z_2$ have $0, -1, -1$ respectively). So we choose $x_1$ as the pivot column. To choose the pivot row, we apply the minimum ratio test to rows with positive $x_1$ entries: row 1 gives $1/1 = 1$ and row 2 gives $1/2$. The minimum is $1/2$, so row 2 is the pivot row. We perform the pivot:
| $x_1$ | $x_2$ | $z_1$ | $z_2$ | $z_3$ | $y_1$ | $y_2$ | RHS |
|-------|-------|-------|-------|-------|-------|-------|-----|
| $0$ | $\tfrac{3}{2}$ | $-1$ | $\tfrac{1}{2}$ | $0$ | $1$ | $-\tfrac{1}{2}$ | $\tfrac{1}{2}$ |
| $1$ | $-\tfrac{1}{2}$ | $0$ | $-\tfrac{1}{2}$ | $0$ | $0$ | $\tfrac{1}{2}$ | $\tfrac{1}{2}$ |
| $0$ | $3$ | $0$ | $0$ | $1$ | $0$ | $0$ | $2$ |
| $0$ | $-6$ | $0$ | $-3$ | $0$ | $0$ | $3$ | $3$ |
| $0$ | $\tfrac{3}{2}$ | $-1$ | $\tfrac{1}{2}$ | $0$ | $0$ | $-\tfrac{3}{2}$ | $\tfrac{1}{2}$ |
The basis is now $\{y_1, x_1, z_3\}$. The Phase I row still has a positive entry, under $x_2$ (value $\tfrac{3}{2}$). We choose $x_2$ as the next pivot column.
**Phase I, Iteration 2.** Among rows with positive $x_2$ coefficient: row 1 gives $\tfrac{1/2}{3/2} = \tfrac{1}{3}$, row 3 gives $\tfrac{2}{3}$. The minimum is $\tfrac{1}{3}$, so row 1 is the pivot row. The pivot element is $\tfrac{3}{2}$. We divide row 1 by $\tfrac{3}{2}$ to normalise, then eliminate $x_2$ from all other rows by subtracting appropriate multiples of the new row 1:
\begin{align*}
\text{New Row 1} &= \text{Row 1} \div \tfrac{3}{2}: \quad [0,\; 1,\; -\tfrac{2}{3},\; \tfrac{1}{3},\; 0,\; \tfrac{2}{3},\; -\tfrac{1}{3},\; \tfrac{1}{3}]. \\
\text{New Row 2} &= \text{Row 2} + \tfrac{1}{2} \times \text{New Row 1}: \quad [1,\; 0,\; -\tfrac{1}{3},\; -\tfrac{1}{3},\; 0,\; \tfrac{1}{3},\; \tfrac{1}{3},\; \tfrac{2}{3}]. \\
\text{New Row 3} &= \text{Row 3} - 3 \times \text{New Row 1}: \quad [0,\; 0,\; 2,\; -1,\; 1,\; -2,\; 1,\; 1]. \\
\text{New Orig Obj} &= \text{Orig Obj} + 6 \times \text{New Row 1}: \quad [0,\; 0,\; -4,\; -1,\; 0,\; 4,\; 1,\; 5]. \\
\text{New Phase I} &= \text{Phase I} - \tfrac{3}{2} \times \text{New Row 1}: \quad [0,\; 0,\; 0,\; 0,\; 0,\; -1,\; -1,\; 0].
\end{align*}
| $x_1$ | $x_2$ | $z_1$ | $z_2$ | $z_3$ | $y_1$ | $y_2$ | RHS |
|-------|-------|-------|-------|-------|-------|-------|-----|
| $0$ | $1$ | $-\tfrac{2}{3}$ | $\tfrac{1}{3}$ | $0$ | $\tfrac{2}{3}$ | $-\tfrac{1}{3}$ | $\tfrac{1}{3}$ |
| $1$ | $0$ | $-\tfrac{1}{3}$ | $-\tfrac{1}{3}$ | $0$ | $\tfrac{1}{3}$ | $\tfrac{1}{3}$ | $\tfrac{2}{3}$ |
| $0$ | $0$ | $2$ | $-1$ | $1$ | $-2$ | $1$ | $1$ |
| $0$ | $0$ | $-4$ | $-1$ | $0$ | $4$ | $1$ | $5$ |
| $0$ | $0$ | $0$ | $0$ | $0$ | $-1$ | $-1$ | $0$ |
The Phase I objective row is now $(0, 0, 0, 0, 0, -1, -1 \mid 0)$. Since every non-basis column has a non-positive Phase I coefficient, the Phase I optimum has been reached. The RHS of the Phase I row is $0$, confirming that the Phase I minimum value of $-\sum y_i = 0$ has been achieved. The artificial variables $y_1$ and $y_2$ are not in the basis, so they take value $0$.
**Transition to Phase II.** We drop the $y_1, y_2$ columns and the Phase I row, retaining only the constraint rows and the original objective row. The current basis is $\{x_2, x_1, z_3\}$, reading from the identity columns in the constraint rows:
| $x_1$ | $x_2$ | $z_1$ | $z_2$ | $z_3$ | RHS |
|-------|-------|-------|-------|-------|-----|
| $0$ | $1$ | $-\tfrac{2}{3}$ | $\tfrac{1}{3}$ | $0$ | $\tfrac{1}{3}$ |
| $1$ | $0$ | $-\tfrac{1}{3}$ | $-\tfrac{1}{3}$ | $0$ | $\tfrac{2}{3}$ |
| $0$ | $0$ | $2$ | $-1$ | $1$ | $1$ |
| $0$ | $0$ | $-4$ | $-1$ | $0$ | $5$ |
The current BFS is $x_2 = \tfrac{1}{3}$, $x_1 = \tfrac{2}{3}$, $z_3 = 1$. The objective row $(0, 0, -4, -1, 0 \mid 5)$ has no positive entries in the non-basis columns ($z_1$ and $z_2$), so the optimality criterion is already satisfied. Phase II terminates immediately without any further pivots.
The optimal value is read from the objective row RHS: the row tracks $-(- 6x_1 - 3x_2)$, so the RHS of $5$ means $-6x_1 - 3x_2 = -5$, giving the minimum of $6x_1 + 3x_2 = \mathbf{5}$, achieved at $(x_1, x_2) = (\tfrac{2}{3}, \tfrac{1}{3})$.
[/example]
## Infeasibility Detection
The Phase I objective value at termination is always $\geq 0$. If it equals $0$, we have driven all artificial variables out of the basis (or at least to zero value) and the original LP is feasible. If the Phase I optimal value is strictly positive, no matter how many pivots we perform, we cannot achieve $\sum y_i = 0$. This means there is no point satisfying all the original constraints simultaneously, and the original LP is **infeasible**.
[quotetheorem:2561]
[citeproof:2561]
[remark: Artificial Variables in the Basis at Phase I Optimum]
It is possible for a Phase I optimum with value $0$ to be reached while some artificial variable $y_i$ is still in the basis, but with value $y_i = 0$ (a degenerate BFS). In this case, $y_i$ is a non-contributing basis variable. Before moving to Phase II, we must pivot $y_i$ out of the basis by performing a pivot in the $y_i$ row, choosing any non-zero entry in that row that corresponds to an original variable as the pivot element. If no such entry exists, the constraint corresponding to $y_i$ is redundant and the row can be dropped.
[/remark]
## Degeneracy and Cycling
The simplex algorithm as described so far is guaranteed to terminate provided each pivot strictly increases the objective value. But a subtlety arises when the current BFS is **degenerate**: a basic feasible solution is degenerate if at least one basic variable takes the value $0$. Recall from Section 3.2 that a basic solution is called non-degenerate if it has exactly $m$ non-zero entries; a degenerate BFS has some basic variable equal to zero even though it is in the basis.
In a degenerate situation, the minimum-ratio test may produce a tie: two or more rows have the same minimum ratio $a_{i0}/a_{ij}$, which means the leaving variable reaches zero while the entering variable gains no positive value. The result is a **degenerate pivot**: we switch to a different basis but stay at the same vertex of the feasible polytope, and the objective value does not increase. This is not immediately a problem, but it opens the door to **cycling**: in a degenerate case, the simplex method might execute a sequence of degenerate pivots that returns to a basis it has previously visited, looping indefinitely without making progress.
Cycling is not merely a theoretical curiosity. Examples have been constructed showing that the simplex method does cycle when degenerate pivots are chosen carelessly. The remedy is a **pivot selection rule** that guarantees termination.
[definition: Bland's Rule]
**Bland's rule** is the following pivot selection strategy for the simplex method:
1. Among all non-basis columns $j$ with positive objective coefficient $a_{0j} > 0$, choose the **smallest index** $j$ as the pivot column.
2. Among all rows $i$ with $a_{ij} > 0$ that achieve the minimum ratio $\min_i a_{i0}/a_{ij}$, choose the row $i$ with the **smallest index** of the leaving basic variable.
[/definition]
The key property of Bland's rule is that it prevents cycling:
[quotetheorem:2562]
[citeproof:2562]
[explanation: Why Degeneracy is the Enemy of Termination]
It is worth pausing to understand intuitively why degeneracy causes trouble. In the non-degenerate case, every pivot strictly improves the objective value. Since there are finitely many bases — at most $\binom{n}{m}$ — and the objective value strictly increases at each step, no basis can be revisited. The simplex method must therefore terminate.
In the degenerate case, a pivot may leave the objective value unchanged. The concern is that a sequence of such zero-improvement pivots might traverse a cycle of bases: $B_0 \to B_1 \to \cdots \to B_k \to B_0$. Each basis in the cycle is different, but they all represent the same vertex of the feasible polytope (since all the degenerate basic variables are zero). The simplex method cannot detect that it has returned to the same geometric point — it only tracks the basis, not the solution values. Without a tie-breaking rule like Bland's, this can indeed happen.
Bland's rule prevents cycling by imposing a global priority ordering on variables: index order. This means that within any cycle, there is always a "highest-priority mismatch" that creates the contradiction in the termination proof. In practice, most implementations use more sophisticated rules (e.g., the lexicographic rule or perturbation methods) which also guarantee termination and tend to be more efficient than Bland's rule.
[/explanation]
[remark: Practical Performance]
In practice, degeneracy is extremely common — most real LP instances have degenerate optimal BFSs — but cycling is extraordinarily rare. For everyday computation, one simply runs the simplex method without special anti-cycling measures and it terminates in practice. Bland's rule (or perturbation) matters for theoretical completeness: to prove that the simplex method is a finite algorithm, one must handle the degenerate case.
[/remark]
## Summary of the Two-Phase Method
The full two-phase simplex procedure for a linear program $\max c^\top x$ subject to $Ax = b$, $x \geq 0$ (where $b \geq 0$ can be arranged by sign-flipping) is:
**Phase I:** Introduce artificial variables $y_i$ for each row lacking a unit basis column. Solve the auxiliary problem $\min \mathbf{1}^\top y$ (equivalently $\max -\mathbf{1}^\top y$) using the simplex method with $y$ as the initial basis. The initial Phase I objective row is cleared of basis-column entries before the first iteration.
- If Phase I terminates with $\mathbf{1}^\top y > 0$: the original LP is **infeasible**. Stop.
- If Phase I terminates with $\mathbf{1}^\top y = 0$: all $y_i = 0$, we have a BFS for the original problem.
**Phase II:** Drop the artificial variable columns and the Phase I objective row. The retained constraint rows and original objective row form a valid simplex tableau with an identified BFS. Run the standard simplex method (using any pivot rule, with Bland's rule for cycle safety) until either the optimal is found or unboundedness is detected.
This two-phase approach completes the simplex framework: Phase I handles initialisation and feasibility detection; Phase II handles optimisation. Together with the termination guarantee provided by Bland's rule, the simplex method is now a complete, provably finite algorithm for linear programming.
<!-- illustration-needed: the two-phase structure — a diagram showing the feasible polytope for a problem where the origin is infeasible, indicating how Phase I finds the BFS vertex on the boundary, and Phase II then travels along edges to the optimum -->
The simplex method gives us a mechanical procedure for navigating the vertices of a linear feasible region until we reach optimality. But linear programs model only a narrow slice of strategic and logistical problems. What happens when we step outside the realm of linear objectives and enter the domain of strategic interaction, where each decision-maker's payoff depends on the choices of others?
# 4. Non-cooperative Games
This chapter takes a brief but illuminating digression from the algorithmic theory of linear programming into **game theory**. The central question is: how should rational agents behave when their payoffs depend not just on their own decisions, but on the simultaneous decisions of others? The answer — the concept of an equilibrium — turns out to connect directly to the linear programming duality developed in the preceding chapters. We focus primarily on two-player games and the special case of zero-sum games, where LP duality gives a complete and beautiful picture.
## Games and Strategies
When two people interact strategically, what does it even mean to make a rational decision? Unlike a standard optimisation problem, one player's optimal choice depends on what the other player does — and vice versa. Before we can reason about rational play, we need a precise mathematical model of the situation.
[definition: Bimatrix Game]
A **two-player game**, or **bimatrix game**, is specified by two matrices $P, Q \in \mathbb{R}^{m \times n}$. Player 1, the **row player**, simultaneously chooses a row $i \in \{1, \ldots, m\}$, while Player 2, the **column player**, chooses a column $j \in \{1, \ldots, n\}$. Each player selects without knowledge of the other's choice. The resulting payoffs are $P_{ij}$ to the row player and $Q_{ij}$ to the column player.
[/definition]
The two-matrix format is complete, but it is cumbersome: the matrices alone do not show which rows and columns correspond to which actions. In practice, one writes the game as a **payoff table**, where each cell contains the pair $(P_{ij}, Q_{ij})$, rows are labelled by the row player's actions, and columns by the column player's.
[example: Rock-Paper-Scissors]
In rock-paper-scissors, the row player's payoff matrix is
\begin{align*}
P &= \begin{pmatrix} 0 & -1 & 1\\ 1 & 0 & -1\\ -1 & 1 & 0 \end{pmatrix},
\end{align*}
where rows and columns are ordered Rock, Paper, Scissors, and $Q = -P$. A win gives payoff $1$, a loss $-1$, and a draw $0$.
As a payoff table this reads:
| | R | P | S |
|---|---|---|---|
| **R** | $(0,0)$ | $(-1,1)$ | $(1,-1)$ |
| **P** | $(1,-1)$ | $(0,0)$ | $(-1,1)$ |
| **S** | $(-1,1)$ | $(1,-1)$ | $(0,0)$ |
By convention the first entry of each pair is the row player's payoff. Note that $P + Q = 0$ at every entry — this is the zero-sum property, which we will exploit heavily later.
[/example]
### Mixed Strategies and Expected Payoffs
In the examples so far players choose a single action with certainty. But rational players will often want to **randomise** — for instance, always playing Rock is easily exploited. The correct framework allows probabilistic play.
[definition: Mixed Strategy]
A **mixed strategy** for the row player is a vector
\begin{align*}
x \in X := \left\{ x \in \mathbb{R}^m : x \geq 0,\ \sum_{i=1}^m x_i = 1 \right\},
\end{align*}
where $x_i$ is the probability of choosing row $i$. Analogously, a mixed strategy for the column player is
\begin{align*}
y \in Y := \left\{ y \in \mathbb{R}^n : y \geq 0,\ \sum_{j=1}^n y_j = 1 \right\}.
\end{align*}
A strategy profile $(x, y) \in X \times Y$ induces a joint lottery. The **expected payoff** to the row player is
\begin{align*}
p(x, y) := x^\top P y = \sum_{i=1}^m \sum_{j=1}^n x_i P_{ij} y_j.
\end{align*}
A strategy $x$ (or $y$) is called **pure** if $x_i = 1$ for some $i$ (all probability concentrated on a single action).
[/definition]
The sets $X$ and $Y$ are probability simplices. Every pure strategy is a vertex of the corresponding simplex, and every mixed strategy is a convex combination of pure strategies. The bilinearity of $p(x, y)$ in $x$ and $y$ separately is the key algebraic property that makes the theory tractable.
### Dominant Strategies and the Prisoner's Dilemma
Some games have a clear solution without any need for the equilibrium concept: one action is simply best regardless of the opponent's play.
[example: Prisoner's Dilemma]
Alice and Bob are each independently offered a plea bargain: remain silent ($S$) or testify ($T$). The outcomes are:
- Both silent: 2 years each.
- One testifies, one stays silent: the informant goes free (payoff 3), the other gets 10 years (payoff 0).
- Both testify: 5 years each (payoff 1 each).
Higher payoff is better (longer sentence = lower payoff). The payoff table is:
| | $S$ | $T$ |
|---|---|---|
| **$S$** | $(2,2)$ | $(0,3)$ |
| **$T$** | $(3,0)$ | $(1,1)$ |
Whatever Bob does, Alice is strictly better off testifying: if Bob plays $S$, she gets 3 vs 2; if Bob plays $T$, she gets 1 vs 0. The same reasoning applies to Bob. We call $T$ a **dominant strategy** for each player.
The resulting outcome $(1,1)$ is **Pareto dominated** by $(2,2)$: both players would prefer mutual silence, but individual rationality drives them to a worse collective outcome. This is the famous paradox of the Prisoner's Dilemma. Payoffs are interpreted relatively, so only ordinal comparisons matter — replacing the entry $3$ with $100$ would not change any of the analysis.
[/example]
### The Chicken Game and the Maximin Strategy
Not every game has a dominant strategy. The Chicken game illustrates why one needs a more refined solution concept.
[example: Chicken and the Maximin Strategy]
Two drivers race toward each other. Each may swerve ($C$ for chicken) or hold course ($D$ for drive). If both hold, they crash (payoff 0); if one swerves and the other holds, the one who swerves looks cowardly (payoff 1) while the other looks bold (payoff 3); if both swerve, both look mildly silly (payoff 2).
| | $C$ | $D$ |
|---|---|---|
| **$C$** | $(2,2)$ | $(1,3)$ |
| **$D$** | $(3,1)$ | $(0,0)$ |
Neither $C$ nor $D$ is dominant. One natural fallback is the **maximin strategy**: choose $x \in X$ to maximise the worst-case expected payoff
\begin{align*}
\max_{x \in X} \min_{y \in Y} p(x, y) = \max_{x \in X} \min_{j \in \{1,\ldots,n\}} \sum_{i=1}^m x_i P_{ij}.
\end{align*}
The inner minimum over $y$ is attained at a pure strategy (a vertex of $Y$), which is why the last equality holds.
This optimisation problem can be formulated as a linear program: introduce $v \in \mathbb{R}$ as a lower bound on the expected payoff for any pure strategy of the opponent, then
maximize $v$ subject to
\begin{align*}
\sum_{i=1}^m x_i P_{ij} &\geq v \quad \text{for all } j = 1, \ldots, n, \\
\sum_{i=1}^m x_i &= 1, \\
x &\geq 0.
\end{align*}
For the Chicken game, the row player's payoff matrix is $P = \begin{pmatrix} 2 & 1 \\ 3 & 0 \end{pmatrix}$, with rows/columns indexed by $C$ and $D$. Writing $x = (x_C, x_D)$ with $x_C + x_D = 1$, the two constraints become:
\begin{align*}
2x_C + 3x_D &\geq v, \\
x_C + 0 \cdot x_D &\geq v.
\end{align*}
Substituting $x_D = 1 - x_C$ into the first constraint gives $2x_C + 3(1-x_C) = 3 - x_C \geq v$. Since we want to maximise $v$, both constraints bind at the optimum. Setting $3 - x_C = x_C$ yields $x_C = \tfrac{3}{2}$, which violates $x_C \leq 1$. This means the first constraint is not binding at the optimum; the binding constraint is the second one. With $x_C \geq v$ and $x_D = 1 - x_C \geq 0$, the maximum $v = x_C$ is achieved by taking $x_C$ as large as possible. Since the first constraint gives $v \leq 3 - x_C$, we need $x_C \leq 3 - x_C$, so $x_C \leq \tfrac{3}{2}$. But also $x_C \leq 1$. The first constraint is $3 - x_C \geq x_C$ for $x_C \leq 1$, which holds for all $x_C \leq 1$. Thus the second constraint $v = x_C$ dominates, and $v$ is maximised at $x_C = 1$, i.e., the row player always swerves. The maximin strategy is $C$ (always swerve), with guaranteed payoff $v^* = 1$.
However, the maximin solution is not fully satisfying: if both players swerve, one player can profitably deviate to $D$. The maximin strategy protects against the worst case, but it is not stable under mutual reasoning.
[/example]
<!-- illustration-needed: best-response diagram for Chicken — show two best-response curves in the $(x_C, y_C)$ unit square, where $x_C$ is the probability player 1 swerves and $y_C$ is the probability player 2 swerves. The three Nash equilibria appear as intersections: $(x_C, y_C) = (0,1)$ (pure $(D,C)$), $(1,0)$ (pure $(C,D)$), and $(\tfrac{1}{2}, \tfrac{1}{2})$ (mixed). Each player's best-response correspondence should be drawn as a step function. -->
## Nash Equilibrium
The maximin strategy guards against the worst case, but it ignores information about the opponent's actual incentives. The correct notion of stability requires that neither player has an incentive to deviate unilaterally — not just against the worst opponent, but against the specific strategy the opponent is playing.
[definition: Best Response and Equilibrium]
A strategy $x \in X$ is a **best response** to $y \in Y$ if
\begin{align*}
p(x, y) \geq p(x', y) \quad \text{for all } x' \in X.
\end{align*}
That is, $x$ maximises the row player's expected payoff given $y$.
A strategy profile $(x, y) \in X \times Y$ is a **(Nash) equilibrium** if $x$ is a best response to $y$ and $y$ is a best response to $x$.
[/definition]
At an equilibrium, neither player can improve their expected payoff by switching strategy, holding the other player's strategy fixed. This is the natural notion of a stable outcome: if both players reason correctly about the other's play, they will end up at an equilibrium.
[example: Equilibria in Chicken]
In the Chicken game, there are two pure-strategy equilibria: $(D, C)$ with payoffs $(3,1)$, and $(C, D)$ with payoffs $(1,3)$. In each case, the player who holds course is responding optimally (switching to $C$ lowers their payoff), and the player who swerves is also responding optimally (switching to $D$ crashes the game).
There is also a **mixed-strategy equilibrium** in which each player swerves with equal probability $\tfrac{1}{2}$. To verify: if the column player plays $C$ and $D$ each with probability $\tfrac{1}{2}$, the row player's expected payoff from $C$ is $\tfrac{1}{2}(2) + \tfrac{1}{2}(1) = \tfrac{3}{2}$, and from $D$ is $\tfrac{1}{2}(3) + \tfrac{1}{2}(0) = \tfrac{3}{2}$. Since both pure strategies yield the same expected payoff, the row player is indifferent and any mixed strategy is a best response — in particular, mixing equally is a best response. By symmetry, the same holds for the column player.
[/example]
A fundamental theorem guarantees that equilibria always exist.
[quotetheorem:2563]
The proof uses a fixed-point argument: one applies Kakutani's fixed-point theorem to the **best-response correspondence**, which maps each strategy profile $(x, y)$ to the set of all best-response pairs $(x^*, y^*)$. Kakutani's theorem requires a compact convex domain and an upper hemicontinuous correspondence with convex values — all of which hold for the simplices $X$ and $Y$. The full proof belongs to a course on fixed-point theory or mathematical economics rather than this course.
[explanation: What Nash's Theorem Does and Does Not Give You]
The existence theorem is powerful, but it leaves several crucial questions open. First, there is **no uniqueness**: the Chicken game already shows three equilibria (two pure, one mixed), and in general the number of equilibria can be exponential. Second, there is **no efficiency guarantee**: the Prisoner's Dilemma shows that the unique equilibrium can be Pareto dominated — both players could be better off at a non-equilibrium outcome, but no individual player has an incentive to deviate. Third, finding an equilibrium is computationally hard in a precise complexity-theoretic sense: the problem of computing a Nash equilibrium is **PPAD-complete**, a complexity class that captures problems believed to require exponential time in the worst case even though a solution is guaranteed to exist. The game of **Matching Pennies** — where each player chooses Heads or Tails, the row player wins if they match and loses if they differ — has payoff matrix $P = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, and no pure Nash equilibrium exists (each pure profile gives one player an incentive to deviate). This shows that the theorem's recourse to mixed strategies is genuinely necessary.
[/explanation]
## Zero-Sum Games and the Minimax Theorem
In a general bimatrix game $(P, Q)$ the two players have partly aligned and partly opposed interests. This makes equilibrium analysis complex: as we saw in Chicken, equilibria can be multiple and inefficient, and there is no canonical way to select among them. The situation simplifies dramatically when the players' interests are directly opposed.
[definition: Zero-Sum Game]
A bimatrix game $(P, Q)$ is a **zero-sum game** (or **matrix game**) if $Q_{ij} = -P_{ij}$ for all $i, j$. Equivalently, every outcome distributes a fixed total payoff of $0$: what the row player gains, the column player loses.
[/definition]
The zero-sum assumption eliminates the coordination problem entirely. There are no Pareto improvements, no reason for the players to communicate or cooperate, and no ambiguity about what each player wants: the row player wants to maximise $x^\top P y$ while the column player wants to minimise it. Since $Q = -P$, we need only specify $P$ to describe a zero-sum game completely.
[remark: Rock-Paper-Scissors is Zero-Sum]
In the rock-paper-scissors example, every outcome transfers payoff from one player to the other: $Q = -P$, so the sum of payoffs in every cell of the table is $0$.
[/remark]
In a zero-sum game, maximising $p(x, y) = x^\top P y$ is the row player's goal, while the column player seeks to minimise the same quantity. This adversarial structure means the maximin concept becomes exactly the right solution notion.
The row player's maximin value is $\max_{x \in X} \min_{y \in Y} p(x, y)$. Symmetrically, the column player minimises the row player's payoff, so their optimal strategy achieves $\min_{y \in Y} \max_{x \in X} p(x, y)$. A moment's thought gives the inequality
\begin{align*}
\max_{x \in X} \min_{y \in Y} p(x, y) \leq \min_{y \in Y} \max_{x \in X} p(x, y):
\end{align*}
whatever strategy the row player commits to first, the column player can respond to make things worse than if the row player had moved second. Without an additional theorem, it is not obvious these two quantities should be equal. Indeed, for non-zero-sum games they need not be: the row player's guaranteed payoff may genuinely depend on who moves first. The remarkable content of the Minimax Theorem is that for zero-sum games the order of play is irrelevant.
To see why equality can fail without the zero-sum assumption, consider what happens with pure strategies alone. In any matrix game, the **pure-strategy maximin** $\max_i \min_j P_{ij}$ can be strictly less than the **pure-strategy minimax** $\min_j \max_i P_{ij}$. For instance, take rock-paper-scissors: $\max_i \min_j P_{ij} = -1$ (every row contains a $-1$ entry) while $\min_j \max_i P_{ij} = 1$ (every column contains a $1$ entry). The gap between $-1$ and $1$ is precisely what mixed strategies close: under mixed strategies, both the maximin and minimax values equal $0$. The content of the Minimax Theorem is that for zero-sum games, mixed strategies always close this gap.
[quotetheorem:2564]
[citeproof:2564]
[explanation: Historical Significance and Connection to LP Duality]
Von Neumann proved the minimax theorem in 1928, predating the development of linear programming by nearly two decades. At the time, the proof used a topological fixed-point argument. The LP duality proof we gave here is historically later and, many would argue, more transparent: it reveals that the minimax equality is not a mysterious fact about probability but an instance of the algebraic symmetry between a primal LP and its dual.
The compactness of $X$ and $Y$ is essential. If the strategy sets were replaced by infinite-dimensional or non-compact sets, strong duality could fail (the LP might be infeasible or unbounded), and the minimax equality would break down. For matrix games, compactness is automatic because the simplices $X \subseteq \mathbb{R}^m$ and $Y \subseteq \mathbb{R}^n$ are bounded and closed.
The connection goes deeper: the minimax theorem and LP strong duality are in fact equivalent — each can be derived from the other in a few lines. This means that game theory and linear programming are not merely analogous; they are two faces of the same underlying duality.
[/explanation]
### The Value of a Matrix Game
The minimax theorem justifies the following definition.
[definition: Value of a Matrix Game]
The **value** of a zero-sum game with payoff matrix $P$ is
\begin{align*}
v^* := \max_{x \in X} \min_{y \in Y} p(x,y) = \min_{y \in Y} \max_{x \in X} p(x,y).
\end{align*}
[/definition]
The value $v^*$ is what the row player can guarantee regardless of the column player's strategy — and simultaneously, $v^*$ is the best the column player can hold the row player down to. The minimax theorem guarantees that both of these characterisations give the same number, so the game has a well-defined notion of "worth" to the row player.
The equilibria of a zero-sum game are completely characterised by the optimisers of the two LP problems.
[quotetheorem:2565]
To see why, suppose $(x, y)$ is a Nash equilibrium. Then $x$ is a best response to $y$, so $p(x, y) \geq p(x', y)$ for all $x' \in X$, which gives $p(x, y) = \max_{x'} p(x', y)$. But also $y$ is a best response to $x$ (minimising the row player's payoff), so $p(x, y) \leq p(x, y')$ for all $y' \in Y$, giving $p(x, y) = \min_{y'} p(x, y')$. Therefore $\min_{y'} p(x, y') = p(x, y) = \max_{x'} p(x', y)$, which forces $x$ to achieve the maximin value and $y$ to achieve the minimax value. Conversely, if $x$ and $y$ are optimal for the maximin and minimax programs respectively, then $\min_{y'} p(x, y') = v^* = \max_{x'} p(x', y)$, which implies neither player can profitably deviate.
[explanation: Solving Zero-Sum Games via LP]
The characterisation theorem has a decisive practical consequence: finding a Nash equilibrium of a zero-sum game reduces exactly to solving a linear program. One formulates the maximin LP (primal), solves it with the simplex method or an interior-point algorithm, and reads off the equilibrium strategy and game value directly from the optimal solution. The dual solution gives the column player's equilibrium strategy.
This is in sharp contrast to general-sum bimatrix games. For a general-sum game, finding a Nash equilibrium is PPAD-complete — no polynomial-time algorithm is known. The zero-sum structure, by collapsing the two-player problem into a single LP, places it firmly in polynomial time. This explains why zero-sum games occupy a privileged position in algorithmic game theory: they are the tractable case.
[/explanation]
[remark: Connection to LP Duality]
The minimax theorem is not just analogous to LP strong duality — it is a direct consequence of it. The two LP problems (row player's primal and column player's dual) form an exact primal-dual pair, and strong duality yields the equality of their optimal values. This is a striking instance where a theorem from combinatorial geometry (LP duality) has a substantive game-theoretic interpretation.
[/remark]
## Solving a Zero-Sum Game via LP: A Worked Example
To see all the moving parts in action, we solve rock-paper-scissors from first principles using the maximin LP.
[example: Rock-Paper-Scissors via LP]
The payoff matrix for the row player is
\begin{align*}
P = \begin{pmatrix} 0 & -1 & 1\\ 1 & 0 & -1\\ -1 & 1 & 0 \end{pmatrix},
\end{align*}
where rows and columns are ordered Rock ($R$), Paper ($P_a$), Scissors ($S_c$). We write the row player's mixed strategy as $x = (x_R, x_{P_a}, x_{S_c})^\top \in X$.
**Setting up the LP.** The maximin program is:
maximize $v$ subject to
\begin{align*}
0 \cdot x_R - 1 \cdot x_{P_a} + 1 \cdot x_{S_c} &\geq v \quad \text{(opponent plays } R\text{)}, \\
1 \cdot x_R + 0 \cdot x_{P_a} - 1 \cdot x_{S_c} &\geq v \quad \text{(opponent plays } P_a\text{)}, \\
-1 \cdot x_R + 1 \cdot x_{P_a} + 0 \cdot x_{S_c} &\geq v \quad \text{(opponent plays } S_c\text{)}, \\
x_R + x_{P_a} + x_{S_c} &= 1, \quad x \geq 0.
\end{align*}
**Finding the optimal solution.** By the symmetry of the game (the matrix is skew-symmetric: $P^\top = -P$), the row player's optimal strategy should treat all three actions equally. Testing $x = (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})^\top$:
- Expected payoff if opponent plays $R$: $\tfrac{1}{3}(0) + \tfrac{1}{3}(-1) + \tfrac{1}{3}(1) = 0$.
- Expected payoff if opponent plays $P_a$: $\tfrac{1}{3}(1) + \tfrac{1}{3}(0) + \tfrac{1}{3}(-1) = 0$.
- Expected payoff if opponent plays $S_c$: $\tfrac{1}{3}(-1) + \tfrac{1}{3}(1) + \tfrac{1}{3}(0) = 0$.
All three constraints are satisfied with $v = 0$. Moreover, this is optimal: $v > 0$ would require the sum of all three constraints to give $0 \cdot (x_R + x_{P_a} + x_{S_c}) > 0$, which is impossible since the three rows of $P$ sum to the zero vector (each column of $P$ sums to zero by skew-symmetry). Therefore $v^* = 0$.
**Reading off the equilibrium.** The unique optimal solution to the maximin LP is $x^* = (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})^\top$, $v^* = 0$. By symmetry, the column player's LP gives $y^* = (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})^\top$ as well. The Nash equilibrium of rock-paper-scissors is for both players to mix uniformly over all three actions, yielding expected payoff $0$ to each player.
**Verification.** At this equilibrium, the row player is indifferent among all three pure strategies (each gives expected payoff $0$ against $y^*$). Any deviation to a pure strategy still yields $0$, so there is no profitable unilateral deviation. The equilibrium is unique: any asymmetric mixture by the row player can be exploited by the column player concentrating probability on the pure strategy that beats the row player's most-likely action.
[/example]
This worked example illustrates the general algorithm: formulate the maximin LP, find the optimal $(x^*, v^*)$, and verify that all constraints are tight (or identify which are). The dual solution $y^*$ emerges from the column player's analogous LP, and together $(x^*, y^*)$ form the Nash equilibrium of the game.
Zero-sum games, Nash equilibria, and mixed strategies all reduce to a common computational problem: finding a distribution over strategies that maximises one player's payoff given that the opponent plays optimally. This reduction to linear programming is remarkable, but real-world systems involve more than two-player conflicts. Many involve flows of goods, information, or resources through networks where optimisation must account for physical constraints simultaneously.
# 5. Network Problems
## Definitions
The earlier chapters of this course developed LP theory in a fairly abstract setting: we studied feasibility, duality, and the simplex algorithm for problems written in matrix form $\min c^\top x$ subject to $Ax = b$, $x \geq 0$. In this chapter we turn to a class of LPs whose constraint matrix $A$ has an exceptionally special structure — it arises from the incidence structure of a directed graph. These are the *network flow problems*, and their combinatorial backbone makes it possible to solve them far more efficiently than generic LPs. Before we can state the problems, we need to establish the language of directed graphs.
## Directed Graphs and Networks
Why do we need graph structure at all? The LP framework of earlier chapters treats a problem as a matrix of coefficients with no further geometry. But many practical problems — routing goods through a logistics network, scheduling flows along pipelines, allocating bandwidth across a communications grid — have a spatial structure: locations connected by links along which flow moves in one direction. Encoding that directionality is what graphs are for. The directed graph is the minimal mathematical object that captures the essentials: which locations exist, which links connect them, and in which direction each link can carry flow. Without it, we would have no way to write down flow conservation at a vertex or distinguish permissible from forbidden routes.
A network is at heart a directed graph: a set of locations (vertices) connected by directed links (edges). The direction on each edge matters, because it encodes a permissible direction of flow — goods, data, current, traffic — along that link.
[definition: Directed Graph]
A **directed graph** (or **network**) is a pair $G = (V, E)$, where $V$ is a finite set of **vertices** and $E \subseteq V \times V$ is a set of **directed edges**. An element $(u, v) \in E$ is an edge *from* $u$ *to* $v$; we call $u$ the **tail** and $v$ the **head** of the edge.
[/definition]
The direction on an edge is a genuine constraint: a road may be one-way, a pipeline may only pump in one direction. In an undirected graph we could traverse any edge freely; in a directed graph, we must respect the arrow. However, for connectivity questions it is sometimes useful to ask whether two vertices are connected if we are allowed to walk against the arrows, which motivates the degree count and the undirected walk below.
The degree of a vertex counts how many edges touch it, regardless of direction.
[definition: Degree]
The **degree** of a vertex $u \in V$ is the number of vertices $v \in V$ such that $(u, v) \in E$ or $(v, u) \in E$.
[/definition]
To describe how flow moves through the network — along which sequences of edges it can travel — we need a precise vocabulary for graph traversal. The basic concept is a walk; the important refinements are paths (no repeated vertex) and cycles (returning to the start).
[definition: Walk]
A **walk** from $u \in V$ to $v \in V$ is a sequence of vertices $u = v_1, v_2, \ldots, v_k = v$ such that $(v_i, v_{i+1}) \in E$ for all $i = 1, \ldots, k-1$. An **undirected walk** relaxes this to allow either $(v_i, v_{i+1}) \in E$ or $(v_{i+1}, v_i) \in E$ at each step — that is, we are permitted to traverse an edge against its direction.
[/definition]
The distinction between directed and undirected walks matters throughout the chapter. Directed walks follow the arrows and model actual flow paths. Undirected walks ignore arrow directions and are used to discuss structural properties of the graph, such as connectivity. Both notions are needed.
[definition: Path]
A **path** from $u$ to $v$ is a walk $v_1, \ldots, v_k$ in which the vertices $v_1, \ldots, v_k$ are pairwise distinct.
[/definition]
[definition: Cycle]
A **cycle** is a walk $v_1, v_2, \ldots, v_k$ in which $v_1 = v_k$ and the vertices $v_1, \ldots, v_{k-1}$ are pairwise distinct.
[/definition]
Cycles deserve special attention in network flow. A directed cycle is a route along which flow can circulate indefinitely; if that cycle has negative total cost, we can save money by pushing more flow around it, so detecting and eliminating negative-cost cycles is central to the minimum-cost flow algorithm. Undirected cycles matter for a different reason: the presence or absence of undirected cycles characterises whether a subgraph is a tree, which (as we will see) is precisely what distinguishes a basis from a non-basis in the network simplex method.
These structural notions lead to the two most important classes of special graphs in network flow theory.
[definition: Connected Graph]
A directed graph $G = (V, E)$ is **connected** if for every pair of vertices $u, v \in V$ there exists an undirected path from $u$ to $v$.
[/definition]
Note the use of *undirected* path in the connectivity definition. A graph in which every pair of vertices is connected by a directed path is called strongly connected and is a much stronger condition. Connectedness in the sense above — reachability ignoring directions — is the minimal structural assumption we need for the network to be "all one piece" and for flow conservation to be non-trivially coupled across vertices.
[definition: Tree]
A **tree** is a connected graph that contains no undirected cycle.
[/definition]
Trees are the combinatorial backbone of the network simplex method that we will develop in the next section. The critical property is the edge count: a tree on $n$ vertices has exactly $n - 1$ edges. This follows from the fact that a connected graph on $n$ vertices needs at least $n-1$ edges to be connected (a path uses exactly $n-1$), and every additional edge beyond $n-1$ creates a cycle. A tree is precisely the minimal connected graph — enough edges to connect all vertices, not so many that any cycle forms. This edge count translates directly into the LP framework: the flow conservation constraints have $n$ equations, but one is redundant (the supply-balance condition), leaving $n - 1$ independent equations, exactly as many as the edges in a spanning tree.
[definition: Spanning Tree]
A **spanning tree** of a graph $G = (V, E)$ is a tree $(V, E')$ with $E' \subseteq E$ — that is, a tree that uses every vertex of $G$ and only edges already present in $G$.
[/definition]
<!-- illustration-needed: a small directed graph on 5–6 vertices, with one spanning tree highlighted in a different colour, showing which edges belong to the tree and which do not -->
[remark: Spanning trees and bases]
The connection between spanning trees and LP bases will become concrete in the next section. The spanning tree will play exactly the role of a basis $B$ in the simplex tableau: a set of $n - 1$ edges that determines all basic variables, just as a basis matrix determines all basic variables in the general simplex method.
[/remark]
## Capacities, Supplies, and Flow
A directed graph becomes a *network flow model* once we attach numerical data to its vertices and edges. There are three kinds of data: how much flow each vertex injects or absorbs (the supply/demand vector), how much it costs to move a unit along each edge (the cost matrix), and how much flow each edge can carry (the capacity bounds).
[definition: Capacitated Network]
Let $G = (V, E)$ be a directed graph with $|V| = n$. A **capacitated network** associates to each edge $(i, j) \in E$:
- a **cost** $c_{ij} \in \mathbb{R}$, the cost of routing one unit of flow along $(i, j)$,
- a **lower bound** $\underline{m}_{ij} \geq 0$, the minimum flow required on $(i, j)$,
- an **upper bound** $\overline{m}_{ij} \geq \underline{m}_{ij}$, the maximum flow permitted on $(i, j)$.
We collect these into matrices $C, \underline{M}, \overline{M} \in \mathbb{R}^{n \times n}$, setting entries to $0$ for pairs $(i, j) \notin E$.
[/definition]
Each vertex is also assigned a **supply/demand value** $b_i$. Intuitively, $b_i > 0$ means vertex $i$ *produces* $b_i$ units of flow (a **source**), while $b_i < 0$ means it *consumes* $|b_i|$ units (a **sink**). We encode these as a vector $b \in \mathbb{R}^n$.
[definition: Flow]
A **flow** on the capacitated network $(G, C, \underline{M}, \overline{M}, b)$ is a matrix $x \in \mathbb{R}^{n \times n}$ (with $x_{ij} = 0$ for $(i,j) \notin E$) satisfying:
**Capacity constraints:** for all $(i, j) \in E$,
\begin{align*}
\underline{m}_{ij} \leq x_{ij} \leq \overline{m}_{ij}.
\end{align*}
**Flow conservation:** for each vertex $i \in V$,
\begin{align*}
b_i + \sum_{j:\,(j,i)\in E} x_{ji} = \sum_{j:\,(i,j)\in E} x_{ij}.
\end{align*}
[/definition]
The flow conservation constraint deserves careful reading. For a vertex $i$, the left-hand side counts the flow *into* $i$: both the externally supplied amount $b_i$ and all flow arriving along incoming edges. The right-hand side counts the flow *out of* $i$ along outgoing edges. Equality says that nothing is created or destroyed inside a vertex — everything that arrives must leave.
[remark: Sign convention for $b$]
Some texts write flow conservation as $\sum_{j} x_{ij} - \sum_{j} x_{ji} = b_i$, which is the outflow minus inflow equals supply. This is algebraically equivalent to the definition above. The present sign convention — supply adds to inflow — makes the formulation symmetric with the matrix form $Ax = b$ that arises when we write the problem as a general LP.
[/remark]
A necessary condition for any feasible flow to exist is that the total supply equals the total demand:
\begin{align*}
\sum_{i \in V} b_i = 0.
\end{align*}
This follows immediately from summing the flow conservation equations over all vertices: each flow variable $x_{ij}$ appears once with a positive sign (at the outflow of $i$) and once with a negative sign (at the inflow of $j$), so all flow terms cancel and only $\sum_i b_i$ remains. If this sum were nonzero, the system would be inconsistent — more flow would be entering the network globally than leaving, or vice versa.
[example: A small network]
Consider a network with four vertices $V = \{1, 2, 3, 4\}$ and edges
\begin{align*}
E = \{(1,2),\,(1,3),\,(2,4),\,(3,4)\},
\end{align*}
with supply vector $b = (10, 0, 0, -10)^\top$ (vertex 1 is a source, vertex 4 is a sink, vertices 2 and 3 are intermediate). Suppose all lower bounds are $0$ and upper bounds are $\overline{m}_{ij} = 8$ for every edge. A flow is a choice of $x_{12}, x_{13}, x_{24}, x_{34} \in [0, 8]$ satisfying the conservation equations:
- Vertex 1: $10 = x_{12} + x_{13}$
- Vertex 2: $x_{12} = x_{24}$
- Vertex 3: $x_{13} = x_{34}$
- Vertex 4: $x_{24} + x_{34} = 10$
Setting $x_{12} = 6$, $x_{13} = 4$, $x_{24} = 6$, $x_{34} = 4$ satisfies all constraints and lies within the capacity bounds. Setting $x_{12} = 9$ violates the upper bound $\overline{m}_{12} = 8$, so it is not a valid flow.
What would happen if the supply-balance condition failed? Suppose instead $b = (10, 0, 0, -8)^\top$, so that the source produces 10 units but the sink only absorbs 8. The conservation equations then sum to $\sum_i b_i = 2 \neq 0$, and the system has no solution — there is nowhere for the surplus 2 units to go. In practice, one handles this by introducing a dummy sink with demand $2$ and zero-cost edges from every node, absorbing the surplus without affecting the optimal cost structure.
[/example]
The three network problems treated in subsequent sections — minimum-cost flow, transportation, and maximum flow — all ask us to choose a flow satisfying the above constraints while optimising some objective. Each is a linear program whose constraint matrix inherits special structure from the graph, and that structure drives the efficient algorithms we will develop.
## Minimum-cost Flow Problem
The minimum-cost flow problem is the central optimisation problem in network theory. It unifies a wide family of practical questions — shipping goods between factories and warehouses, routing data through a communications network, scheduling flows in a pipeline — under a single mathematical framework. Before tackling specialised variants such as transportation or maximum flow, it pays to understand the general formulation: what constraints a valid flow must satisfy, how to write the problem as a linear program, and how duality theory furnishes optimality conditions that can be checked without solving the full LP.
## The Cost Formulation
The central question of this section is: given a network where flow is produced at some vertices and consumed at others, how do we minimise the total cost of routing flow from producers to consumers? We need to write this as an LP, which means identifying the decision variables (how much flow on each edge), the objective (total cost), and the constraints (capacity bounds and flow conservation at every vertex).
Let $G = (V, E)$ be a directed graph with $|V| = n$ vertices. Every vertex $i \in V$ carries a **supply-demand value** $b_i \in \mathbb{R}$. The interpretation is simple: if $b_i > 0$, vertex $i$ is a source that injects $b_i$ units of flow into the network (think of a factory producing goods). If $b_i < 0$, vertex $i$ is a sink that absorbs $|b_i|$ units. If $b_i = 0$, the vertex is a pure transit point.
[definition: Supply-Demand Vector]
Let $G = (V, E)$ be a directed network. A **supply-demand vector** is $b \in \mathbb{R}^n$ with components $b_i$ for each $i \in V$. Vertex $i$ is called a **source** if $b_i > 0$ and a **sink** if $b_i < 0$.
[/definition]
For the flow to be globally consistent, total production must equal total consumption:
\begin{align*}
\sum_{i \in V} b_i = 0.
\end{align*}
This is a necessary condition: if more flow enters the network than leaves, there is nowhere for the surplus to go.
Each edge $(i, j) \in E$ carries three parameters:
- $c_{ij} \in \mathbb{R}$: the cost of routing one unit of flow along the edge,
- $\underline{m}_{ij} \geq 0$: the minimum amount of flow that must use the edge (lower capacity bound),
- $\overline{m}_{ij} \geq 0$: the maximum amount of flow the edge can carry (upper capacity bound).
We collect these into matrices $C, \underline{M}, \overline{M} \in \mathbb{R}^{n \times n}$, setting entries to $0$ for pairs $(i, j) \notin E$. The decision variable is $x \in \mathbb{R}^{n \times n}$, where $x_{ij}$ is the amount of flow sent along edge $(i, j)$.
[definition: Minimum-Cost Flow Problem]
Given a directed network $G = (V, E)$ with supply-demand vector $b \in \mathbb{R}^n$, cost matrix $C \in \mathbb{R}^{n \times n}$, and capacity bounds $\underline{M}, \overline{M} \in \mathbb{R}^{n \times n}$, the **minimum-cost flow problem** is
\begin{align*}
\text{minimise} \quad & \sum_{(i,j) \in E} c_{ij} x_{ij} \\
\text{subject to} \quad & b_i + \sum_{j:\, (j,i) \in E} x_{ji} = \sum_{j:\, (i,j) \in E} x_{ij} \quad \text{for each } i \in V, \\
& \underline{m}_{ij} \leq x_{ij} \leq \overline{m}_{ij} \quad \text{for all } (i,j) \in E.
\end{align*}
A feasible solution $x$ satisfying all constraints is called a **feasible flow**, and an optimal feasible flow is called a **minimum-cost flow**.
[/definition]
The equality constraints are **flow conservation**: at each vertex, the supply $b_i$ plus total inflow equals total outflow. Flow is neither created nor destroyed in transit.
[remark: LP Structure]
The minimum-cost flow problem is a linear program. To put it in standard form $\min\{c^\top x : Ax = b,\, \underline{m} \leq x \leq \overline{m}\}$, one takes $A$ to be the **node-arc incidence matrix**: the entry $A_{ik}$ equals $+1$ if edge $k$ starts at vertex $i$, $-1$ if edge $k$ ends at vertex $i$, and $0$ otherwise. The resulting matrix $A$ is large — it has $n$ rows and $|E|$ columns — but exploiting its network structure leads to far more efficient algorithms than applying the general simplex method to $A$ directly.
[/remark]
## Reductions: Circulation and Uncapacitated Problems
Two standard simplifications arise frequently enough to deserve names.
[definition: Circulation Problem]
A **circulation problem** is a minimum-cost flow problem in which $b_i = 0$ for all $i \in V$. Every vertex is a pure transit point.
[/definition]
Any minimum-cost flow problem can be reduced to a circulation problem: introduce one additional vertex $s^*$, and for each vertex $i$ with $b_i < 0$ (a sink), add an edge from $i$ to $s^*$ with supply $|b_i|$, cost $0$, and capacity $|b_i|$; for each source with $b_i > 0$, add an edge from $s^*$ to $i$ with the corresponding supply. The extra vertex $s^*$ absorbs all imbalances, making every original vertex a transit node. This reduction is useful for theoretical arguments and sometimes simplifies algorithm design.
[definition: Uncapacitated Flow Problem]
A **uncapacitated flow problem** is the case where $\underline{m}_{ij} = 0$ and $\overline{m}_{ij} = +\infty$ for all $(i, j) \in E$.
[/definition]
An uncapacitated problem is either unbounded (flow can increase indefinitely along a negative-cost cycle) or has an optimal solution. In the latter case, if all costs are non-negative, the optimal solution uses at most $\sum_{i \in V} |b_i|$ units in total, so it is equivalent to a problem with finite capacity $\overline{m}_{ij} = \sum_{i \in V} |b_i|$ on every edge. Thus, for problems with non-negative costs, one may always pass to a problem with finite capacities without changing the set of optimal solutions.
## The LP Formulation and Node Potentials
Because the minimum-cost flow problem is a linear program, the theory of LP duality applies directly. The dual variables associated with the flow conservation constraints are called **node potentials**, and they play the same role as Lagrange multipliers in a continuous optimisation problem.
[definition: Node Potentials]
Given a feasible flow $x$ for the minimum-cost flow problem on $G = (V, E)$, a vector $\pi \in \mathbb{R}^n$ is called a **node potential** (or **price vector**) if $\pi_i$ is the dual variable corresponding to the flow conservation constraint at vertex $i \in V$.
[/definition]
The Lagrangian of the LP relaxation with node potentials $\pi$ is
\begin{align*}
L(x, \pi) = \sum_{(i,j) \in E} c_{ij} x_{ij} + \sum_{i \in V} \pi_i \left( b_i + \sum_{j:\,(j,i)\in E} x_{ji} - \sum_{j:\,(i,j)\in E} x_{ij} \right).
\end{align*}
Grouping terms by edge, the coefficient of $x_{ij}$ in $L$ is $c_{ij} + \pi_i^{\text{head}} - \pi_i^{\text{tail}}$, where "tail" and "head" refer to the endpoints of $(i,j)$. More precisely, for edge $(i,j)$:
\begin{align*}
c_{ij} - \pi_i + \pi_j.
\end{align*}
This quantity is so important it receives its own name.
[definition: Reduced Cost]
Given node potentials $\pi \in \mathbb{R}^n$, the **reduced cost** of edge $(i,j) \in E$ is
\begin{align*}
\bar{c}_{ij} = c_{ij} - \pi_i + \pi_j.
\end{align*}
[/definition]
The reduced cost has a natural interpretation: $\pi_i$ is the "price" at node $i$, and $\bar{c}_{ij}$ is the net cost of shipping one unit along $(i,j)$ after accounting for the value gained by arriving at $j$ and the value lost by leaving $i$.
[remark: Potential Invariance of Cost]
The total cost of any feasible flow $x$ satisfies
\begin{align*}
\sum_{(i,j)\in E} c_{ij} x_{ij} = \sum_{(i,j)\in E} \bar{c}_{ij} x_{ij} + \sum_{i \in V} \pi_i b_i,
\end{align*}
since the potential terms telescope to zero for any flow satisfying conservation. Consequently, for fixed $b$, minimising cost over feasible flows is equivalent to minimising reduced cost.
[/remark]
## Optimality Conditions
The LP structure of the minimum-cost flow problem means that complementary slackness provides necessary and sufficient conditions for optimality. These translate directly into conditions on reduced costs.
[quotetheorem:2566]
[citeproof:2566]
To understand why these conditions make intuitive sense, think about what would happen if some edge $(i, j)$ with $x^*_{ij} < \overline{m}_{ij}$ had $\bar{c}_{ij} < 0$. Then sending one additional unit along $(i,j)$ would decrease total cost (after adjusting for potentials), contradicting optimality. The conditions simply say: at an optimum, no improvement can be obtained by reallocating flow along any edge.
[example: Checking Optimality with Reduced Costs]
Consider a network with three vertices $\{1, 2, 3\}$ and edges $(1,2)$, $(1,3)$, $(2,3)$ with costs $c_{12} = 4$, $c_{13} = 6$, $c_{23} = 1$. Suppose $b_1 = 2$, $b_2 = 0$, $b_3 = -2$, and all capacities are finite with $\overline{m}_{ij} = 5$ and $\underline{m}_{ij} = 0$.
Consider the candidate flow $x^*_{12} = 2$, $x^*_{23} = 2$, $x^*_{13} = 0$, which routes all supply from vertex $1$ to vertex $3$ via vertex $2$. Verify conservation: vertex $1$ has outflow $2$, matching $b_1 = 2$; vertex $2$ has inflow $2$ and outflow $2$, matching $b_2 = 0$; vertex $3$ has inflow $2$, matching $|b_3| = 2$.
To check optimality, choose potentials $\pi_1 = 0$, $\pi_2 = -4$, $\pi_3 = -5$. Then:
\begin{align*}
\bar{c}_{12} &= 4 - 0 + (-4) = 0, \\
\bar{c}_{23} &= 1 - (-4) + (-5) = 0, \\
\bar{c}_{13} &= 6 - 0 + (-5) = 1 \geq 0.
\end{align*}
For edges $(1,2)$ and $(2,3)$: $x^*_{ij} > 0$ and $\bar{c}_{ij} = 0$, so complementary slackness holds. For edge $(1,3)$: $x^*_{13} = 0$ and $\bar{c}_{13} = 1 \geq 0$, again consistent. Since all optimality conditions are satisfied, this flow is optimal, with total cost $4 \cdot 2 + 1 \cdot 2 = 10$.
Note that routing all flow directly along $(1, 3)$ would cost $6 \cdot 2 = 12$, confirming that the path $1 \to 2 \to 3$ is indeed cheaper.
[/example]
The optimality conditions translate directly into an algorithm: start with any feasible flow, find an edge where the conditions are violated, and update the flow to eliminate the violation. This leads to the network simplex method, a specialised simplex algorithm that exploits the spanning tree structure of basic feasible solutions and is in practice far more efficient than applying the general simplex to the node-arc incidence matrix. The transportation problem, studied in the next section, illustrates this approach in the bipartite special case where the tableau structure makes the computation explicit and tractable.
<!-- illustration-needed: the reduced cost diagram — show a small network with nodes labeled by potentials pi_i, edges labeled by original cost c_ij and reduced cost c_ij - pi_i + pi_j, illustrating how potentials shift edge costs while preserving the optimal flow decision -->
## The Transportation Problem
The minimum-cost flow problem, developed in the previous sections, admits a particularly clean and practically important special case: the **transportation problem**. Here the network has a bipartite structure — all edges run from a set of suppliers to a set of consumers — and this structure makes the algorithm considerably more concrete. The transportation problem is one of the oldest and most-studied models in operations research, and its tableau-based algorithm prefigures many ideas in modern network optimisation.
A transportation problem arises when goods must be shipped from $n$ supply nodes (factories, warehouses) to $m$ demand nodes (retailers, customers), and we want to minimise total shipping cost. The network is bipartite: every edge runs from some supplier $i \in \{1, \ldots, n\}$ to some consumer $j \in \{1, \ldots, m\}$.
[definition: Transportation Problem]
A **transportation problem** with $n$ suppliers and $m$ consumers is the linear program:
\begin{align*}
\text{minimise} \quad & \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{m} x_{ij} = s_i, \quad i = 1, \ldots, n, \\
& \sum_{i=1}^{n} x_{ij} = d_j, \quad j = 1, \ldots, m, \\
& x_{ij} \geq 0 \quad \text{for all } i, j.
\end{align*}
Here $s \in \mathbb{R}^n$ with $s_i > 0$ is the **supply** vector, $d \in \mathbb{R}^m$ with $d_j > 0$ is the **demand** vector satisfying the balance condition $\sum_{i=1}^n s_i = \sum_{j=1}^m d_j$, and $c \in \mathbb{R}^{n \times m}$ is the **cost matrix**, with $c_{ij}$ the cost of shipping one unit from supplier $i$ to consumer $j$.
[/definition]
The balance condition $\sum s_i = \sum d_j$ is essential: total supply must equal total demand for the constraint system to be feasible. In practice, if they differ, one introduces a dummy supplier or dummy consumer with zero shipping cost to absorb the slack.
### Reduction from Min-Cost Flow
The transportation problem is not just a convenient special case — every bounded min-cost flow problem is equivalent to some transportation problem. This means all the theory developed for the transportation problem applies, via reduction, to the general min-cost flow setting.
[quotetheorem:2567]
[citeproof:2567]
<!-- illustration-needed: the reduction gadget — show an edge (i,j) with capacity m-bar_ij being replaced by a supplier node "ij" sending flow to consumer node i (cost 0) and consumer node j (cost c_ij), with the intuition that unused capacity goes back to i -->
When is this reduction useful? The main benefit is theoretical: results proved for the transportation problem — finite-time termination of the tableau algorithm, integer optimality for integer data, the spanning-tree BFS characterisation — automatically transfer to all bounded min-cost flow problems via the reduction. In practice, however, the reduction inflates the problem size considerably: a network with $|E|$ edges becomes a transportation instance with $|E|$ suppliers and $n + |E|$ consumers. For small networks this is manageable, but for large sparse graphs the reduction may be slower in practice than a direct network simplex implementation. The reduction also has a representational cost: the original network structure disappears inside the bipartite graph, so any solution to the transportation instance must be decoded back through the construction to recover the flow values on original edges. The right approach depends on the problem at hand: use the reduction for theoretical purposes and for very small instances; prefer a direct network simplex for larger problems where sparsity matters.
## The Transportation Tableau
What goes wrong if we simply apply the general simplex method to the transportation LP? In principle, nothing — the transportation problem is a valid LP and simplex works. But in practice, a transportation instance with $n$ suppliers and $m$ consumers has $nm$ decision variables and $n + m$ constraints, so the full simplex tableau has $O(nm)$ cells and each pivot takes $O(nm)$ work. For even moderately sized instances — 50 suppliers and 50 consumers gives 2500 variables — this is slow and opaque. The transportation tableau exploits the bipartite structure to reduce each pivot to a cycle-finding operation on a graph with only $n + m - 1$ basic edges, making the bookkeeping both efficient and transparent.
The transportation problem has $n + m$ equality constraints (one per supplier, one per consumer), but only $n + m - 1$ of them are independent, because the balance condition makes any one constraint redundant. A **basic feasible solution** (BFS) therefore has exactly $n + m - 1$ basic variables.
There is a natural bookkeeping device for tracking a BFS and its dual variables simultaneously: the transportation tableau.
[definition: Transportation Tableau]
The **transportation tableau** for an $n \times m$ transportation problem is a grid with $n$ rows (one per supplier) and $m$ columns (one per consumer). Each cell $(i,j)$ holds:
- the cost $c_{ij}$ in a corner box, and
- the current flow $x_{ij}$ (left blank if $x_{ij} = 0$).
Outside the grid, the row margins record the supplies $s_i$ and the column margins record the demands $d_j$. The dual variables $\lambda_i$ (one per supplier) and $\mu_j$ (one per consumer) are displayed along the row and column headers respectively. Each cell also carries the **reduced cost** $\bar{c}_{ij} = \lambda_i - \mu_j$, which is compared against $c_{ij}$ to test optimality.
[/definition]
<!-- illustration-needed: a generic 3x4 transportation tableau showing the positions of c_ij, x_ij, lambda_i, mu_j, and the reduced cost lambda_i - mu_j in each cell -->
The key structural fact linking the tableau to linear programming is the following: the basic edges of a BFS form a spanning tree of the bipartite graph.
[explanation: BFS and Spanning Trees]
In the transportation problem with $n$ suppliers and $m$ consumers, the bipartite graph has $n + m$ vertices. A spanning tree of this graph has exactly $n + m - 1$ edges. A basic feasible solution assigns nonzero flow to exactly $n + m - 1$ edges (in the non-degenerate case), and these edges form precisely a spanning tree. This is the transportation analogue of the general fact that a BFS of an LP corresponds to a basis of the constraint matrix.
The spanning-tree structure is what makes dual variable computation tractable. Given a spanning tree $T$ of basic edges, the complementary slackness condition
\begin{align*}
(c_{ij} - \lambda_i + \mu_j)\, x_{ij} = 0
\end{align*}
forces $\lambda_i - \mu_j = c_{ij}$ for every basic edge $(i,j) \in T$. This is a system of $n + m - 1$ equations in $n + m$ unknowns, so we fix one variable freely (typically $\lambda_1 = 0$) and propagate the remaining values along the tree.
[/explanation]
## Dual Variables and Optimality
What do we need in order to check whether a given feasible flow for the transportation problem is optimal, without solving the full LP from scratch? The answer is a pair of dual multipliers — one per supplier, one per consumer — that certify optimality via complementary slackness. Given these multipliers, checking optimality reduces to scanning the non-basic cells of the tableau, comparing each cell's cost against the corresponding multiplier combination. The challenge is computing the multipliers efficiently; the spanning-tree structure of the BFS is precisely what makes this tractable.
To apply the simplex method to the transportation problem, we derive the dual from first principles via the Lagrangian. Introducing multipliers $\lambda_i$ for the supply constraints and $\mu_j$ for the demand constraints (with a sign convention chosen so that the optimality condition takes a symmetric form), the Lagrangian is:
\begin{align*}
L(x, \lambda, \mu) &= \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij} x_{ij} + \sum_{i=1}^{n} \lambda_i \left(s_i - \sum_{j=1}^{m} x_{ij}\right) - \sum_{j=1}^{m} \mu_j \left(d_j - \sum_{i=1}^{n} x_{ij}\right).
\end{align*}
Rearranging, this becomes:
\begin{align*}
L(x, \lambda, \mu) &= \sum_{i=1}^{n}\sum_{j=1}^{m} (c_{ij} - \lambda_i + \mu_j)\, x_{ij} + \sum_{i=1}^{n} \lambda_i s_i - \sum_{j=1}^{m} \mu_j d_j.
\end{align*}
Since $x \geq 0$, the Lagrangian is bounded below in $x$ if and only if $c_{ij} - \lambda_i + \mu_j \geq 0$ for all $i, j$. This is the **dual feasibility condition**.
[definition: Dual Feasibility and Optimality Conditions]
A pair $(\lambda, \mu) \in \mathbb{R}^n \times \mathbb{R}^m$ is **dual feasible** if
\begin{align*}
\lambda_i - \mu_j \leq c_{ij} \quad \text{for all } i = 1,\ldots,n,\ j = 1,\ldots,m.
\end{align*}
A primal feasible solution $x$ and dual feasible $(\lambda, \mu)$ together satisfy the **complementary slackness conditions**:
\begin{align*}
(c_{ij} - \lambda_i + \mu_j)\, x_{ij} = 0 \quad \text{for all } i, j.
\end{align*}
When both primal feasibility, dual feasibility, and complementary slackness hold simultaneously, $x$ is optimal.
[/definition]
For a BFS corresponding to spanning tree $T$: since $x_{ij} > 0$ for $(i,j) \in T$, complementary slackness forces $c_{ij} - \lambda_i + \mu_j = 0$, i.e., $\lambda_i - \mu_j = c_{ij}$, for all basic edges. This determines all dual variables once one is fixed. Dual feasibility is then checked for the non-basic cells: if $\lambda_i - \mu_j \leq c_{ij}$ holds everywhere, the current BFS is optimal. If some non-basic cell $(i,j)$ has $\lambda_i - \mu_j > c_{ij}$, we can reduce cost by routing flow through that cell, and we perform a pivot.
## Pivoting on a Cycle
When dual feasibility is violated at a non-basic cell $(p,q)$ — meaning $\lambda_p - \mu_q > c_{pq}$ — we bring edge $(p,q)$ into the basis. Adding this edge to the spanning tree creates exactly one cycle. We increase the flow $x_{pq}$ by $\delta > 0$ and adjust flows along the cycle to maintain feasibility, alternating between increasing and decreasing edges as the cycle is traversed.
The pivot rule is: increase $\delta$ as far as possible without violating non-negativity, i.e., set $\delta = \min\{x_{ij} : (i,j) \text{ is a decreasing edge on the cycle}\}$. The edge achieving this minimum drops to zero and leaves the basis, restoring the spanning-tree structure for the next iteration. This is the transportation analogue of the simplex pivot.
[remark: Why the Cycle is Unique]
Adding any single non-basic edge to a spanning tree creates exactly one cycle — this is a basic fact about trees. The cycle alternates between edges that need to increase and edges that need to decrease in order to maintain the row and column sums. This alternating structure means the pivot is entirely determined once the entering edge is chosen.
[/remark]
## A Fully Worked Example
[example: Transportation Algorithm — Three Suppliers, Four Consumers]
We have three suppliers with supplies $s_1 = 8$, $s_2 = 10$, $s_3 = 9$, and four consumers with demands $d_1 = 6$, $d_2 = 5$, $d_3 = 8$, $d_4 = 8$. Note $8 + 10 + 9 = 6 + 5 + 8 + 8 = 27$, so the balance condition holds. The cost matrix is:
\begin{align*}
c = \begin{pmatrix} 5 & 3 & 4 & 6 \\ 2 & 7 & 4 & 1 \\ 5 & 6 & 2 & 4 \end{pmatrix}.
\end{align*}
**Step 1: Find an initial BFS (northwest corner method).**
We fill the tableau greedily, starting from cell $(1,1)$ and satisfying supply/demand constraints in order. Supplier 1 has supply 8. Consumer 1 needs 6, so set $x_{11} = 6$; supplier 1 has 2 remaining. Consumer 2 needs 5, so take $x_{12} = 2$; supplier 1 is exhausted, consumer 2 still needs 3. Move to supplier 2: set $x_{22} = 3$; supplier 2 has 7 remaining. Consumer 3 needs 8, so set $x_{23} = 7$; supplier 2 is exhausted, consumer 3 needs 1 more. Move to supplier 3: set $x_{33} = 1$; supplier 3 has 8 remaining. Consumer 4 needs 8, so $x_{34} = 8$.
The initial allocation is:
\begin{align*}
x_{11} = 6,\quad x_{12} = 2,\quad x_{22} = 3,\quad x_{23} = 7,\quad x_{33} = 1,\quad x_{34} = 8.
\end{align*}
This gives $n + m - 1 = 3 + 4 - 1 = 6$ basic variables, forming a spanning tree of the bipartite graph. The initial cost is $6 \cdot 5 + 2 \cdot 3 + 3 \cdot 7 + 7 \cdot 4 + 1 \cdot 2 + 8 \cdot 4 = 30 + 6 + 21 + 28 + 2 + 32 = 119$.
**Step 2: Compute dual variables.**
For basic edges, we need $\lambda_i - \mu_j = c_{ij}$. Fix $\lambda_1 = 0$. Then:
- $(1,1)$: $\lambda_1 - \mu_1 = c_{11} = 5$, so $\mu_1 = -5$.
- $(1,2)$: $\lambda_1 - \mu_2 = c_{12} = 3$, so $\mu_2 = -3$.
- $(2,2)$: $\lambda_2 - \mu_2 = c_{22} = 7$, so $\lambda_2 = 7 + \mu_2 = 4$.
- $(2,3)$: $\lambda_2 - \mu_3 = c_{23} = 4$, so $\mu_3 = \lambda_2 - 4 = 0$.
- $(3,3)$: $\lambda_3 - \mu_3 = c_{33} = 2$, so $\lambda_3 = 2$.
- $(3,4)$: $\lambda_3 - \mu_4 = c_{34} = 4$, so $\mu_4 = -2$.
So $(\lambda_1, \lambda_2, \lambda_3) = (0, 4, 2)$ and $(\mu_1, \mu_2, \mu_3, \mu_4) = (-5, -3, 0, -2)$.
**Step 3: Check dual feasibility (optimality test).**
For each non-basic cell $(i,j)$, compute $\lambda_i - \mu_j$ and compare to $c_{ij}$:
- $(1,3)$: $0 - 0 = 0 \leq 4 = c_{13}$. OK.
- $(1,4)$: $0 - (-2) = 2 \leq 6 = c_{14}$. OK.
- $(2,1)$: $4 - (-5) = 9 > 2 = c_{21}$. **Violation.**
- $(2,4)$: $4 - (-2) = 6 > 1 = c_{24}$. Violation.
- $(3,1)$: $2 - (-5) = 7 > 5 = c_{31}$. Violation.
- $(3,2)$: $2 - (-3) = 5 \leq 6 = c_{32}$. OK.
We have violations, so the current solution is not optimal. The most violated cell is $(2,1)$ with $\lambda_2 - \mu_1 = 9 > 2 = c_{21}$.
**Step 4: Pivot on cell (2,1).**
We bring edge $(2,1)$ into the basis. Adding it to the spanning tree $\{(1,1),(1,2),(2,2),(2,3),(3,3),(3,4)\}$ creates the cycle $(2,1) \to (1,1) \to (1,2) \to (2,2) \to (2,1)$. Traversing this cycle: we increase $x_{21}$ and $x_{12}$ by $\delta$, and decrease $x_{11}$ and $x_{22}$ by $\delta$.
The decreasing edges are $(1,1)$ with $x_{11} = 6$ and $(2,2)$ with $x_{22} = 3$. The maximum $\delta$ is $\min(6, 3) = 3$. Edge $(2,2)$ leaves the basis.
After the pivot: $x_{21} = 3$, $x_{11} = 3$, $x_{12} = 5$, $x_{22} = 0$ (leaves basis). The new BFS is:
\begin{align*}
x_{11} = 3,\quad x_{12} = 5,\quad x_{21} = 3,\quad x_{23} = 7,\quad x_{33} = 1,\quad x_{34} = 8.
\end{align*}
The new cost is $3 \cdot 5 + 5 \cdot 3 + 3 \cdot 2 + 7 \cdot 4 + 1 \cdot 2 + 8 \cdot 4 = 15 + 15 + 6 + 28 + 2 + 32 = 98$. The improvement is $119 - 98 = 21 = 3 \cdot (c_{21} - (\lambda_2 - \mu_1)\text{ deficit}) = 3 \cdot (9 - 2) = 21$, which confirms the computation.
**Step 5: Recompute dual variables and re-check.**
Fix $\lambda_1 = 0$. The new spanning tree is $\{(1,1),(1,2),(2,1),(2,3),(3,3),(3,4)\}$.
- $(1,1)$: $\mu_1 = -5$.
- $(1,2)$: $\mu_2 = -3$.
- $(2,1)$: $\lambda_2 = c_{21} + \mu_1 = 2 + (-5) = -3$.
- $(2,3)$: $\mu_3 = \lambda_2 - c_{23} = -3 - 4 = -7$.
- $(3,3)$: $\lambda_3 = c_{33} + \mu_3 = 2 + (-7) = -5$.
- $(3,4)$: $\mu_4 = \lambda_3 - c_{34} = -5 - 4 = -9$.
Now check non-basic cells:
- $(1,3)$: $\lambda_1 - \mu_3 = 0-(-7) = 7 > 4 = c_{13}$. Violation.
- $(1,4)$: $0-(-9) = 9 > 6$. Violation.
- $(2,2)$: $-3-(-3) = 0 \leq 7$. OK.
- $(2,4)$: $-3-(-9) = 6 > 1$. Violation.
- $(3,1)$: $-5-(-5) = 0 \leq 5$. OK.
- $(3,2)$: $-5-(-3) = -2 \leq 6$. OK.
The most violated cell is $(2,4)$ with $\lambda_2 - \mu_4 = 6 > 1 = c_{24}$.
**Step 6: Pivot on cell (2,4).**
Adding edge $(2,4)$ to the tree $\{(1,1),(1,2),(2,1),(2,3),(3,3),(3,4)\}$ creates the cycle $(2,4) \to (3,4) \to (3,3) \to (2,3) \to (2,4)$. We increase $x_{24}$ and $x_{33}$ by $\delta$, and decrease $x_{34}$ and $x_{23}$ by $\delta$.
Decreasing edges: $(3,4)$ with $x_{34} = 8$ and $(2,3)$ with $x_{23} = 7$. Maximum $\delta = \min(8, 7) = 7$. Edge $(2,3)$ leaves the basis.
After the pivot: $x_{24} = 7$, $x_{33} = 8$, $x_{34} = 1$, $x_{23} = 0$ (leaves). The new BFS is:
\begin{align*}
x_{11} = 3,\quad x_{12} = 5,\quad x_{21} = 3,\quad x_{24} = 7,\quad x_{33} = 8,\quad x_{34} = 1.
\end{align*}
New cost: $3 \cdot 5 + 5 \cdot 3 + 3 \cdot 2 + 7 \cdot 1 + 8 \cdot 2 + 1 \cdot 4 = 15 + 15 + 6 + 7 + 16 + 4 = 63$.
**Step 7: Final optimality check.**
Recompute dual variables with spanning tree $\{(1,1),(1,2),(2,1),(2,4),(3,3),(3,4)\}$, fixing $\lambda_1 = 0$:
- $(1,1)$: $\mu_1 = -5$.
- $(1,2)$: $\mu_2 = -3$.
- $(2,1)$: $\lambda_2 = c_{21} + \mu_1 = 2 - 5 = -3$.
- $(2,4)$: $\mu_4 = \lambda_2 - c_{24} = -3 - 1 = -4$.
- $(3,4)$: $\lambda_3 = c_{34} + \mu_4 = 4 + (-4) = 0$.
- $(3,3)$: $\mu_3 = \lambda_3 - c_{33} = 0 - 2 = -2$.
Check non-basic cells:
- $(1,3)$: $0-(-2) = 2 \leq 4$. OK.
- $(1,4)$: $0-(-4) = 4 \leq 6$. OK.
- $(2,2)$: $-3-(-3) = 0 \leq 7$. OK.
- $(2,3)$: $-3-(-2) = -1 \leq 4$. OK.
- $(3,1)$: $0-(-5) = 5 \leq 5$. OK.
- $(3,2)$: $0-(-3) = 3 \leq 6$. OK.
All non-basic cells satisfy dual feasibility. The current BFS is **optimal**, with objective value $\mathbf{63}$.
To summarise: two pivots reduced the cost from 119 to 98 to 63. The optimal plan ships 3 units from supplier 1 to consumer 1, 5 units to consumer 2; 3 units from supplier 2 to consumer 1, 7 units to consumer 4; and 8 units from supplier 3 to consumer 3, 1 unit to consumer 4.
[/example]
## The Algorithm in Summary
The transportation algorithm is structurally the simplex method applied to the transportation LP, exploiting the bipartite structure to make each step explicit.
[explanation: The Transportation Algorithm]
Given a balanced transportation problem ($\sum s_i = \sum d_j$):
1. **Find an initial BFS.** The northwest corner method (or Vogel's approximation for a better starting point): fill cells greedily from $(1,1)$, exhausting supply or demand at each step. This produces exactly $n + m - 1$ basic variables forming a spanning tree.
2. **Compute dual variables.** Fix one multiplier, say $\lambda_1 = 0$. For each basic edge $(i,j)$ in the spanning tree, propagate using $\lambda_i - \mu_j = c_{ij}$.
3. **Optimality test.** For every non-basic cell $(i,j)$, check whether $\lambda_i - \mu_j \leq c_{ij}$. If all cells pass, the current BFS is optimal; stop.
4. **Select an entering edge.** Choose any non-basic $(p,q)$ with $\lambda_p - \mu_q > c_{pq}$ (e.g., the most violated cell).
5. **Pivot.** Adding $(p,q)$ to the spanning tree creates a unique cycle. Increase $x_{pq}$ by $\delta$, adjusting flows alternately $\pm\delta$ around the cycle. Set $\delta = \min\{x_{ij} : (i,j) \text{ decreasing on the cycle}\}$. The minimising edge leaves the basis.
6. **Return to step 2** with the updated BFS and spanning tree.
The algorithm terminates in finite time (assuming non-degeneracy) because each pivot strictly reduces the objective, and there are finitely many spanning trees.
[/explanation]
[remark: Degeneracy]
If the minimum in the pivot step is achieved by two or more edges simultaneously, the leaving edge choice is ambiguous and the new BFS is degenerate (some basic variable is zero). Degenerate pivots do not change the objective value, so cycling is theoretically possible. In practice, a tie-breaking rule (e.g., lexicographic rule) prevents cycling and guarantees finite termination.
[/remark]
The transportation algorithm is one of the most efficient methods for its class. Because the constraint matrix of the transportation LP is totally unimodular, integer supply and demand values always yield integer optimal solutions — a fact that makes the transportation problem invaluable in combinatorial optimisation. The next section extends these ideas to a very different kind of flow problem: maximising the total flow through a network, rather than minimising its cost.
## The Maximum Flow Problem
The maximum flow problem is among the most natural problems in network optimisation. Given a directed network with a single source and a single sink, and with capacity constraints on the edges but no transportation costs, the question is simply: how much total flow can we push from the source to the sink? Despite its simplicity, this problem has deep structure — its LP dual is a combinatorial object called a cut, and the duality relationship between flow and cuts gives one of the most elegant theorems in the field.
## The Maximum Flow LP
What is the most flow we can push from a single source to a single sink in a network with capacity-limited edges? This question models a range of practical situations: the throughput of a communications link, the peak volume a traffic junction can handle, or the production rate of an assembly line with bottleneck stages. Unlike the minimum-cost flow problem, there are no costs to minimise — only capacity constraints to respect — and the objective is simply to maximise total throughput. Formulating this as a linear program is straightforward, and the LP dual will turn out to be the minimum cut problem.
Let $G = (V, E)$ be a directed network with source vertex $1$ and sink vertex $n$. Each edge $(i, j) \in E$ carries a capacity $C_{ij} \geq 0$. There are no costs. We seek to route as much flow as possible from $1$ to $n$ subject to the capacity constraints and flow conservation at every intermediate vertex.
[definition: Maximum Flow Problem]
The **maximum flow problem** on a network $G = (V, E)$ with capacities $C_{ij}$ is the linear program
\begin{align*}
\text{maximise} \quad & \delta \\
\text{subject to} \quad & \sum_{j:\,(i,j)\in E} x_{ij} - \sum_{j:\,(j,i)\in E} x_{ji} = \begin{cases} \delta & i = 1 \\ -\delta & i = n \\ 0 & \text{otherwise} \end{cases} \quad \text{for each } i \in V, \\
& 0 \leq x_{ij} \leq C_{ij} \quad \text{for each } (i,j) \in E.
\end{align*}
Here $\delta$ is the total flow from the source to the sink, and $x_{ij}$ is the flow on edge $(i,j)$.
[/definition]
This is a legitimate LP, so in principle one could apply the simplex method. The minimum-cost flow framework from the previous section also covers it: add a return arc from $n$ back to $1$ with cost $-1$ and infinite capacity, set all other costs to zero, and minimising the resulting cost function is equivalent to maximising $\delta$. However, there is a more direct approach that exploits the combinatorial structure of the problem.
## Cuts and the Flow Bound
An algorithm that finds a large flow is useful, but how do we know when to stop? Without an upper bound, we cannot tell whether we have achieved the maximum or merely a local optimum. This is where cuts come in: a cut is a partition of the vertex set that separates the source from the sink, and every edge crossing the cut boundary is a potential bottleneck. Any flow from source to sink must cross the boundary, so the total capacity of the crossing edges is an upper bound on what can be achieved. If we can find a cut whose capacity equals the value of a flow we have already constructed, the two together certify that the flow is optimal — no further improvement is possible.
The key notion that will allow us to certify optimality — and ultimately to find it — is that of a cut.
[definition: Cut and Cut Capacity]
Let $G = (V, E)$ be a network with capacities $C_{ij}$. A **cut** is a partition of $V$ into two disjoint sets $(S, V \setminus S)$ with $1 \in S$ and $n \in V \setminus S$.
The **capacity** of the cut $(S, V \setminus S)$ is
\begin{align*}
C(S) = \sum_{(i,j)\,\in\,(S \times (V \setminus S))\,\cap\, E} C_{ij},
\end{align*}
the total capacity of all edges directed from $S$ into $V \setminus S$.
[/definition]
Intuitively, a cut separates the source from the sink by drawing a boundary through the network. Any flow from $1$ to $n$ must cross this boundary, so the cut capacity is an upper bound on the achievable flow. The following computation makes this precise.
Given a feasible flow $x$ sending $\delta$ units from $1$ to $n$, define the net flow from a set $X \subseteq V$ to a set $Y \subseteq V$ as
\begin{align*}
f_x(X, Y) = \sum_{(i,j)\,\in\,(X \times Y)\,\cap\, E} x_{ij}.
\end{align*}
For any cut $(S, V \setminus S)$ with $1 \in S$ and $n \notin S$, summing the flow conservation constraint over all $i \in S$ gives
\begin{align*}
\delta = \sum_{i \in S} \left( \sum_{j:\,(i,j)\in E} x_{ij} - \sum_{j:\,(j,i)\in E} x_{ji} \right),
\end{align*}
because the conservation constraint contributes $0$ for every interior node and $\delta$ for the source. Collecting terms by whether the adjacent vertex is in $S$ or in $V \setminus S$:
\begin{align*}
\delta &= f_x(S, V) - f_x(V, S) \\
&= f_x(S, S) + f_x(S, V \setminus S) - f_x(V \setminus S, S) - f_x(S, S) \\
&= f_x(S, V \setminus S) - f_x(V \setminus S, S).
\end{align*}
Since $f_x(V \setminus S, S) \geq 0$ and $x_{ij} \leq C_{ij}$:
\begin{align*}
\delta \leq f_x(S, V \setminus S) \leq C(S).
\end{align*}
So every feasible flow is bounded above by the capacity of every cut. The remarkable fact — the max-flow min-cut theorem — is that the maximum flow exactly equals the minimum cut capacity. There is no duality gap.
<!-- illustration-needed: the cut boundary concept — show a small directed network with source 1 and sink n, a dashed cut line separating S from V\setminus S, with arrows crossing the boundary labelled by capacity and flow values -->
## The Max-Flow Min-Cut Theorem
[quotetheorem:2568]
[citeproof:2568]
[remark: Optimality Certificate]
The max-flow min-cut theorem provides an optimality certificate: to prove that a given flow $\delta$ is maximum, it suffices to exhibit a cut of capacity $\delta$. This is the analogue of exhibiting a dual feasible solution in LP duality. Finding the cut requires no additional work — the proof constructs it directly from the optimal flow via the reachability argument.
[/remark]
The theorem is powerful, but it is worth being explicit about what it does and does not say. It guarantees that the maximum flow and minimum cut values are equal — a result that might seem surprising, since the maximum flow is a continuous optimisation problem and the minimum cut is a combinatorial one. However, the theorem does not say anything directly about computational complexity, and this matters: finding the maximum flow requires an algorithm that runs in polynomial time, and the proof gives a correctness certificate but not an efficient algorithm by itself. The Ford-Fulkerson approach (described next) is correct but may be slow or even non-terminating for irrational capacities. Efficient algorithms — such as Edmonds-Karp, Dinic, or push-relabel — require additional ideas beyond the theorem.
The theorem also has a notable limitation with irrational capacities. If capacities are irrational, the Ford-Fulkerson algorithm may select augmenting paths whose bottleneck is irrational and arbitrarily small, causing the sequence of flow values to converge to a non-optimal limit. The max-flow min-cut theorem itself remains valid (the optimal value still equals the minimum cut capacity), but the algorithmic consequence — that we can find the optimum by augmenting — is only guaranteed when capacities are rational or integer.
## Augmenting Paths and the Ford-Fulkerson Algorithm
Before developing the augmenting-path algorithm, it is worth seeing why the naive greedy strategy fails. Suppose we push flow along paths one at a time, always choosing a directed path from source to sink and saturating it as much as possible. Consider a simple four-node network: source $1$, sink $4$, and two intermediate nodes $2$ and $3$, with edges $1 \to 2$ (capacity 2), $1 \to 3$ (capacity 2), $2 \to 4$ (capacity 2), $3 \to 4$ (capacity 2), and a cross edge $2 \to 3$ (capacity 1). The maximum flow is $4$. But suppose the greedy algorithm first chooses the path $1 \to 2 \to 3 \to 4$ and sends $1$ unit. Now the cross edge $2 \to 3$ is saturated. The remaining paths are $1 \to 2 \to 4$ (residual 1) and $1 \to 3 \to 4$ (residual 2), giving a total of only $3$ units — the greedy forward approach got stuck at a suboptimal flow. The fix is to allow backward steps: we need the option to "undo" flow sent along a previously chosen path by routing new flow backward along it. This is the insight that drives the augmenting-path algorithm and the residual graph construction below.
The proof above not only certifies optimality but also suggests an algorithm for finding the optimum. As long as an augmenting path exists from $1$ to $n$, the current flow is not optimal, and we can strictly improve it. The Ford-Fulkerson algorithm makes this the entire strategy.
[definition: Augmenting Path]
Given a feasible flow $x$ in a network $G = (V, E)$ with capacities $C_{ij}$, an **augmenting path** is an undirected path $v_0, v_1, \ldots, v_k$ from $1$ to $n$ such that for each $i = 1, \ldots, k$:
\begin{align*}
x_{v_{i-1}v_i} < C_{v_{i-1}v_i} \quad &\text{(forward edge not at capacity),} \\
\text{or} \quad x_{v_i v_{i-1}} > 0 \quad &\text{(backward edge with positive flow).}
\end{align*}
The **bottleneck capacity** of the path is $\varepsilon = \min_i r_i$, where $r_i = C_{v_{i-1}v_i} - x_{v_{i-1}v_i}$ on a forward edge and $r_i = x_{v_i v_{i-1}}$ on a backward edge.
[/definition]
Sending $\varepsilon$ units along an augmenting path — increasing $x_{v_{i-1}v_i}$ by $\varepsilon$ on forward edges and decreasing $x_{v_i v_{i-1}}$ by $\varepsilon$ on backward edges — produces a new feasible flow with value $\delta + \varepsilon$. The backward-edge step is essential: it allows the algorithm to "reroute" flow that was previously sent along a suboptimal path.
[definition: Residual Graph]
Given a feasible flow $x$ in $G = (V, E)$ with capacities $C_{ij}$, the **residual graph** $G_x = (V, E_x)$ has:
- a **forward arc** $(i, j)$ with residual capacity $C_{ij} - x_{ij}$ for each $(i,j) \in E$ with $x_{ij} < C_{ij}$,
- a **backward arc** $(j, i)$ with residual capacity $x_{ij}$ for each $(i,j) \in E$ with $x_{ij} > 0$.
An augmenting path from $1$ to $n$ in $G$ corresponds precisely to a directed path from $1$ to $n$ in $G_x$.
[/definition]
The residual graph makes the notion of augmentation concrete: to find an augmenting path, one simply performs a graph search (breadth-first or depth-first) in $G_x$ from the source $1$ looking for a directed path to the sink $n$.
**Ford-Fulkerson Algorithm.** Starting from any feasible flow $x$ (for example $x = 0$):
1. If no directed path from $1$ to $n$ exists in the residual graph $G_x$, output $x$ as optimal.
2. Find a directed path $P$ from $1$ to $n$ in $G_x$. Let $\varepsilon = \min_{(i,j) \in P} r_{ij}$ be the path's bottleneck capacity.
3. Send $\varepsilon$ units along $P$: increase $x_{ij}$ by $\varepsilon$ on forward arcs of $P$ and decrease $x_{ij}$ by $\varepsilon$ on backward arcs of $P$. Return to step 1.
Correctness of the algorithm follows directly from the max-flow min-cut theorem: it terminates only when no augmenting path exists, which the proof shows is precisely when the current flow is optimal, with the reachable set $S$ forming a minimum cut.
[remark: Termination and Complexity]
When all capacities are integers, the bottleneck on each augmenting path is a positive integer, so the flow value increases by at least 1 each iteration. The algorithm terminates in at most $\delta^*$ iterations, where $\delta^*$ is the optimal flow value. For rational capacities a similar argument applies after scaling. For irrational capacities, Ford-Fulkerson may not terminate; variants such as Edmonds-Karp (which use BFS to find shortest augmenting paths) guarantee termination in $O(|V| |E|^2)$ time regardless of capacities.
[/remark]
## Worked Example
[example: Six-Node Max-Flow Network]
Consider the network with source $1$, sink $n$, and four intermediate nodes arranged in two layers. The edges and their capacities are:
\begin{align*}
&1 \to A: 5, \quad A \to B: 1, \quad B \to n: 1, \\
&1 \to C: 5, \quad C \to D: 2, \quad D \to n: 5, \\
&A \to D: 4.
\end{align*}
where $A, B$ form the upper layer and $C, D$ the lower layer.
[illustration:max-flow-network-six-nodes]
We initialise with $x = 0$ and augment greedily. The residual graph initially equals the original graph.
**Iteration 1.** Take the upper path $1 \to A \to B \to n$. The bottleneck is $\min(5, 1, 1) = 1$. Send $\varepsilon = 1$. Updated flows: $x_{1A} = 1$, $x_{AB} = 1$, $x_{Bn} = 1$. Total flow: $\delta = 1$.
**Iteration 2.** Take the lower path $1 \to C \to D \to n$. The bottleneck is $\min(5, 2, 5) = 2$. Send $\varepsilon = 2$. Updated flows: $x_{1C} = 2$, $x_{CD} = 2$, $x_{Dn} = 2$. Total flow: $\delta = 3$.
**Iteration 3.** The path $1 \to A \to D \to n$ is available, since $x_{1A} = 1 < 5 = C_{1A}$, $x_{AD} = 0 < 4 = C_{AD}$, and $x_{Dn} = 2 < 5 = C_{Dn}$. The bottleneck is $\min(4, 4, 3) = 3$. Send $\varepsilon = 3$. Updated flows: $x_{1A} = 4$, $x_{AD} = 3$, $x_{Dn} = 5$. Total flow: $\delta = 6$.
**No further augmentation.** In the residual graph $G_x$ after iteration 3, the only edges out of node $D$ are the backward arc $D \to C$ (capacity $2$) and the backward arc $D \to A$ (capacity $3$); the forward arc $D \to n$ is saturated ($x_{Dn} = 5 = C_{Dn}$). The only edge into $n$ remaining is from $B$ with residual capacity $0$ (saturated). No directed path from $1$ to $n$ exists in $G_x$.
**Certifying optimality.** Let $S = \{1, A, C, B, D\}$ and $V \setminus S = \{n\}$. Edges from $S$ to $V \setminus S$ are $B \to n$ with capacity $1$ and $D \to n$ with capacity $5$, giving $C(S) = 1 + 5 = 6 = \delta$. The cut capacity equals the flow, confirming optimality.
The final flow assignment is: $x_{1A} = 4$, $x_{AB} = 1$, $x_{Bn} = 1$, $x_{AD} = 3$, $x_{1C} = 2$, $x_{CD} = 2$, $x_{Dn} = 5$. One can verify flow conservation at every intermediate node: node $A$ receives $4$ and sends $1 + 3 = 4$; node $B$ receives $1$ and sends $1$; node $C$ receives $2$ and sends $2$; node $D$ receives $3 + 2 = 5$ and sends $5$.
[/example]
## The Maximum Flow Problem as a Linear Program
Stepping back, the maximum flow problem is a special case of the minimum-cost flow problem, which is itself an LP. The max-flow min-cut theorem is therefore a special case of LP strong duality. To see this directly: the dual of the maximum flow LP assigns a multiplier $y_{ij} \geq 0$ to each capacity constraint $x_{ij} \leq C_{ij}$ and a multiplier $\pi_i$ (unconstrained) to each flow conservation constraint. Working through the dual LP, the dual variables $y_{ij}$ take the value $1$ on edges crossing the minimum cut and $0$ elsewhere, and the dual objective evaluates to $C(S)$. Strong duality then asserts $\delta^* = C(S^*)$ for the minimum cut $S^*$.
This connection places the max-flow min-cut theorem in the LP landscape studied throughout this course: a primal optimum (maximum flow) is certified by a dual solution (minimum cut), and the two optimal values coincide.
[remark: Connections Across the Course]
The network flow problems in this chapter form a natural hierarchy within linear programming. The minimum-cost flow LP is an instance of the general LP from Chapters 1-3, with a constraint matrix that is totally unimodular — a key structural property that guarantees optimal solutions are always integer-valued when the data is integer. The transportation problem reduces to minimum-cost flow, and maximum flow is a further special case. The Ford-Fulkerson algorithm can be viewed as a specialised form of the simplex method operating directly on network structure rather than on the full tableaux. In this sense, the efficient algorithms for network flows reflect the LP duality theory and complementary slackness conditions established in the first half of the course, applied to graphs with their rich combinatorial geometry.
[/remark]
[remark: Choosing the Right Model]
In practice, deciding which network model to use depends on what aspects of the problem are variable and what the objective is. Use the **transportation problem** when the network is bipartite — fixed sets of suppliers and consumers, no intermediate routing decisions, and you want to minimise total shipping cost. The tableau algorithm is particularly fast and interpretable for this structure. Use **minimum-cost flow** when the network has intermediate nodes, when flow must be routed through a graph rather than directly from suppliers to consumers, or when lower bounds on edge flow are present. The min-cost flow framework handles all of these, and the reduction theorem shows that transportation is just a special case. Use **maximum flow** when costs are irrelevant and the objective is purely to maximise throughput — for example, finding the peak capacity of a communications network or certifying that a given routing is optimal by exhibiting a saturated cut. The Ford-Fulkerson or Edmonds-Karp algorithms are then the right tools, and the min-cut certificate provides the guarantee.
[/remark]
## References
References will be added upon publication.
Contents
- Introduction and Preliminaries
- Constrained Optimisation and Its General Form
- Linear Programming
- Unconstrained Optimisation and Convexity
- 2. The Method of Lagrange Multipliers
- The Lagrangian and the Sufficiency Theorem
- Geometric Interpretation
- Inequality Constraints and Slack Variables
- Complementary Slackness
- Shadow Prices
- Lagrange Duality
- Supporting Hyperplanes and Conditions for Strong Duality
- Convexity and the Guarantee of Strong Duality
- 3. Solutions of Linear Programs
- Linear Programs
- Standard Form and the Feasible Region
- Basic Solutions
- Extreme Points
- Basic Solutions and the Basis
- A Worked Example
- Basic Feasible Solutions Are the Extreme Points
- Counting and Finding Basic Solutions
- Extreme Points and Optimal Solutions
- Optima Always Occur at Extreme Points
- Why Finiteness Makes This Powerful
- The Broader Principle: Convex Functions on Compact Convex Sets
- Connection to Basic Feasible Solutions
- Linear Programming Duality
- Deriving the Dual of a Linear Program
- Weak Duality
- Strong Duality
- The Dual of the Dual
- Complementary Slackness
- A Worked Example: Primal-Dual Correspondence
- Shadow Prices and Sensitivity
- Summary of LP Duality
- The Simplex Method
- The Simplex Tableau
- The Basis Decomposition
- The General Tableau
- The Optimality Test
- Pivoting: Entering and Leaving Variables
- The Simplex Algorithm
- A Complete Example
- The Two-Phase Simplex Method
- Artificial Variables and the Phase I Problem
- The Two-Phase Method in Practice
- Infeasibility Detection
- Degeneracy and Cycling
- Summary of the Two-Phase Method
- 4. Non-cooperative Games
- Games and Strategies
- Mixed Strategies and Expected Payoffs
- Dominant Strategies and the Prisoner's Dilemma
- The Chicken Game and the Maximin Strategy
- Nash Equilibrium
- Zero-Sum Games and the Minimax Theorem
- The Value of a Matrix Game
- Solving a Zero-Sum Game via LP: A Worked Example
- 5. Network Problems
- Definitions
- Directed Graphs and Networks
- Capacities, Supplies, and Flow
- Minimum-cost Flow Problem
- The Cost Formulation
- Reductions: Circulation and Uncapacitated Problems
- The LP Formulation and Node Potentials
- Optimality Conditions
- The Transportation Problem
- Reduction from Min-Cost Flow
- The Transportation Tableau
- Dual Variables and Optimality
- Pivoting on a Cycle
- A Fully Worked Example
- The Algorithm in Summary
- The Maximum Flow Problem
- The Maximum Flow LP
- Cuts and the Flow Bound
- The Max-Flow Min-Cut Theorem
- Augmenting Paths and the Ford-Fulkerson Algorithm
- Worked Example
- The Maximum Flow Problem as a Linear Program
- References
Cambridge IB Optimisation
Content
Problems
History
Created by admin on 4/25/2026 | Last updated on 4/25/2026
Prerequisites
No prerequisites required for this page.
Rate this page
★
★
★
★
★
Poor
Excellent