Combinatorics is the study of finite structures and the patterns that govern their organization. This course explores some of the deepest theorems in extremal combinatorics — results that ask: how large can a set system be if it satisfies certain constraints? How must objects be distributed if they obey given restrictions? The seven chapters that follow trace an arc from classical matching theory through modern algebraic and geometric methods, revealing how a single constraint can force dramatic structural consequences.
The course begins with Hall's Theorem, a foundational result about matchings in bipartite graphs that characterizes when perfect pairings exist. From there, we move to Sperner Systems — antichains of sets under inclusion — and ask how many sets we can choose without any containment relations. The Kruskal–Katona Theorem refines this by identifying the extremal families that minimize the shadow, and Isoperimetric Inequalities extend these ideas to the discrete cube. The middle chapters shift focus to additive structure: Sum Sets examines how sumsets grow under arithmetic operations, while Projections studies dimension reduction and volume bounds. Finally, Alon's Combinatorial Nullstellensatz brings the polynomial method into the picture, using algebraic tools to solve problems that seem purely combinatorial.
Each chapter builds on the previous one not through direct logical dependence, but through a deepening toolkit and philosophical maturity. We begin by counting and matching, progress to understanding antichain structure and optimality, and culminate in recognizing that many combinatorial problems yield to algebraic attack.
# Introduction
# 1. Hall's Theorem
This chapter opens the course by revisiting a classical result from graph theory: Hall's Marriage Theorem. The theorem gives a clean necessary and sufficient condition for a complete matching to exist in a bipartite graph, and its consequences extend surprisingly far — to systems of distinct representatives, regular bipartite graphs, and ultimately to chain decompositions of the Boolean lattice that will recur throughout the course.
## Bipartite Graphs and Complete Matchings
Suppose you are organising a research conference and must assign each applicant to a project supervisor, with every applicant matched to exactly one supervisor willing to take them. Can this always be done? Not necessarily: if five applicants all want the same single supervisor, at most one of them can be placed. The obstacle is a local bottleneck — a group of applicants whose collective preferences point to too few supervisors. Formalising this bottleneck condition is the first step toward Hall's theorem.
The natural setting is a bipartite graph, whose two sides represent the two groups being matched.
[definition: Bipartite Graph]
A **bipartite graph** $G = (X, Y; E)$ with bipartition $X$ and $Y$ is a graph on vertex set $X \sqcup Y$ in which every edge has one endpoint in $X$ and one endpoint in $Y$.
Such a graph is called **$(k, \ell)$-regular** if every vertex in $X$ has degree $k$ and every vertex in $Y$ has degree $\ell$. A bipartite graph that is $(k, \ell)$-regular for some $k, \ell \geq 1$ is called **biregular**.
[/definition]
A natural question is: when can we match every vertex in $X$ to a distinct neighbour in $Y$? This is formalised as a complete matching.
[definition: Complete Matching]
Let $G = (X, Y; E)$ be a bipartite graph. A **complete matching from $X$ to $Y$** is an injection $f: X \to Y$ such that $xf(x)$ is an edge for every $x \in X$.
[/definition]
Before stating Hall's theorem, it helps to think about what a complete matching forces. If $f: X \to Y$ is a complete matching and $S \subseteq X$ is any subset, then $\{f(x) : x \in S\}$ is a set of $|S|$ distinct neighbours of $S$. Writing $\Gamma(S)$ for the set of all neighbours of $S$ in $Y$, we get $|\Gamma(S)| \geq |S|$. Hall's theorem says this obvious necessary condition is also sufficient.
## Hall's Theorem and Its Proof
The necessary condition for a complete matching — that every subset $S \subseteq X$ must have at least $|S|$ neighbours — is easy to see. The remarkable content of Hall's theorem is that this obvious necessary condition is already sufficient. There is no additional obstruction; if no local bottleneck exists, a global matching can always be assembled.
[quotetheorem:2018]
[citeproof:2018]
The strategy behind this proof is itself worth examining, as it illustrates a powerful technique that appears throughout combinatorics.
[remark: Edge-Minimality Strategy]
This proof is a clean example of a minimality argument. Rather than building a matching greedily or via augmenting paths, it deduces what the minimal structure satisfying Hall's condition must look like. The contradiction in Step 1 is the combinatorial core: two tight sets whose intersection is too small for Hall's condition to hold.
[/remark]
Beyond the proof technique, it is natural to ask how sharp the condition is and whether the proof gives an algorithm for finding a matching.
[remark: Sharpness and Non-Constructiveness]
Hall's condition is tight: if any subset $S \subseteq X$ satisfies $|\Gamma(S)| = |S| - 1$, no complete matching exists. The proof is non-constructive — it establishes existence of a matching but gives no algorithm to find one. In practice, complete matchings are found via augmenting-path algorithms (e.g., the Hopcroft–Karp algorithm), but Hall's theorem answers the existence question without exhibiting a matching explicitly.
[/remark]
Hall's theorem is stated for bipartite graphs, but the condition it characterises arises naturally in any context where one selects one element from each of several sets without repeating. The next section makes this precise.
## Systems of Distinct Representatives
Imagine you must choose one element from each set in a family $\mathcal{A} = \{A_1, \ldots, A_m\}$ such that no element is chosen twice — a representative system where every set gets a distinct voice. Such a selection is called a system of distinct representatives, and the question of when one exists is precisely Hall's theorem in disguise. The distinctness requirement is what makes the problem non-trivial: without it, one could always pick any element from each set independently.
Given a finite family $\mathcal{A} = \{A_1, \ldots, A_m\}$ of finite sets, a **system of distinct representatives (SDR)** is a collection of distinct elements $a_1, \ldots, a_m$ with $a_i \in A_i$ for each $i$.
If an SDR exists, then for any subcollection indexed by $I \subseteq [m] := \{1, \ldots, m\}$, the elements $\{a_i : i \in I\}$ are $|I|$ distinct members of $\bigcup_{i \in I} A_i$, forcing $|\bigcup_{i \in I} A_i| \geq |I|$. Hall's theorem gives the converse.
[quotetheorem:2579]
[citeproof:2579]
The reduction to Hall's theorem is immediate, but the condition it yields deserves a closer look.
[remark: What Hall's Condition Captures for SDRs]
The condition $|\bigcup_{B \in \mathcal{B}} B| \geq |\mathcal{B}|$ for every subcollection $\mathcal{B}$ is the precise obstruction: if some $k$ sets collectively contain fewer than $k$ distinct elements, those $k$ sets cannot all be represented distinctly. The theorem says this is the only obstruction — if every local cluster has enough elements to go around, a global selection exists. Note also that the theorem gives no recipe for constructing the SDR; it merely guarantees existence.
[/remark]
The following example shows how to check Hall's condition and how a single violated subcollection certifies failure.
[example: Checking Hall's Condition for an SDR]
Let $A_1 = \{1, 2\}$, $A_2 = \{2, 3\}$, $A_3 = \{1, 3\}$. For every subcollection of size $k$, the union has size at least $k$: singletons have union size $2 \geq 1$; any two sets share exactly one element, so their union has size $3 \geq 2$; the full collection has union $\{1,2,3\}$ with $|\mathcal{B}| = 3$. Hall's condition holds throughout. An SDR is $a_1 = 2$, $a_2 = 3$, $a_3 = 1$.
Now suppose instead $A_1 = A_2 = \{1\}$ and $A_3 = \{1, 3\}$. The subcollection $\{A_1, A_2\}$ has union $\{1\}$ with size $1 < 2 = |\{A_1, A_2\}|$. Hall's condition fails, and no SDR exists: both $A_1$ and $A_2$ can only be represented by $1$, but the two representatives must be distinct.
[/example]
## Matchings in Regular Bipartite Graphs
Hall's theorem is sometimes difficult to verify directly because one must check all $2^{|X|}$ subsets. Before turning to degree conditions that make verification automatic, consider a case where Hall's condition fails in a simple way. Take $X = \{x_1, x_2, x_3\}$, $Y = \{y_1, y_2\}$, and edges $x_1 y_1$, $x_2 y_1$, $x_3 y_2$. Then $\Gamma(\{x_1, x_2\}) = \{y_1\}$, so $|\Gamma(S)| = 1 < 2 = |S|$ for $S = \{x_1, x_2\}$. No complete matching from $X$ to $Y$ exists — $x_1$ and $x_2$ are in direct competition for the single vertex $y_1$. The problem is a degree imbalance: vertices in $X$ have degree $1$ while $y_1$ has degree $2$, so too many $X$-vertices are forced to share a single $Y$-neighbour. The following theorem shows that when degrees are balanced in the right way, this pathology cannot arise.
[quotetheorem:2580]
[citeproof:2580]
The special case of regular bipartite graphs is worth stating explicitly.
[quotetheorem:2581]
This is immediate: $d(x) = k \geq \ell = d(y)$ for all $x \in X$ and $y \in Y$, so the degree condition applies. The degree condition is sharp in a simple sense: if $k < \ell$, the graph may fail to have a complete matching — take $|X| = 2$, $|Y| = 2$, every vertex of $X$ with degree $1$ and every vertex of $Y$ with degree $1$, but with both $X$-vertices connected to the same $Y$-vertex. Then $\Gamma(X) = 1 < 2 = |X|$.
## Expansion in Biregular Graphs
Knowing that a complete matching exists tells us nothing about how "spread out" the matching is. A more refined question is: if we take any subset $A \subseteq X$, how large must its neighbourhood $\Gamma(A)$ be relative to the whole of $Y$? Can a biregular graph concentrate the neighbourhood of a large subset of $X$ into a small corner of $Y$? The answer is no: biregularity enforces a proportional expansion law.
[quotetheorem:2582]
[citeproof:2582]
Informally: in a biregular graph, no subset of $X$ gets "squeezed" — its image in $Y$ is proportionally at least as large. As a corollary:
[quotetheorem:2583]
[citeproof:2583]
In particular, any biregular graph always has a complete matching from whichever side has fewer vertices. This is a remarkably clean conclusion: biregularity alone forces the smaller side to be completely matchable, regardless of the graph's size or structure.
[remark: Sharpness of the Expansion Bound]
The expansion ratio $|\Gamma(A)|/|Y| \geq |A|/|X|$ is sharp: for the complete bipartite graph $K_{k,k}$, every $A \subseteq X$ has $\Gamma(A) = Y$, so equality holds only when $A = X$. For a more informative example, consider a cycle $C_4$ with $X = \{x_1, x_2\}$, $Y = \{y_1, y_2\}$, edges $x_1 y_1$, $x_1 y_2$, $x_2 y_1$, $x_2 y_2$ (a $2$-regular bipartite graph). Then $|\Gamma(\{x_1\})|/|Y| = 1 = |\{x_1\}|/|X|$, so the bound is achieved with equality for a singleton set.
[/remark]
The expansion theorem has a further consequence beyond matchings: it constrains how the Boolean lattice is layered, leading directly to the chain decompositions used in the next chapter.
## Matchings Between Layers of the Boolean Lattice
The Boolean lattice $\mathcal{P}(X)$ is ordered by inclusion, and its layers $X^{(r)}$ are the sets of fixed size $r$. A natural question is: can every $r$-element subset of $X$ be extended to an $s$-element subset, with the assignment injective? Equivalently, does there exist a matching that sends each layer $r$ set into a distinct layer $s$ set containing it? This question determines the "vertical connectivity" of the Boolean lattice and is the combinatorial engine behind Sperner-type theorems.
We close the chapter by applying these ideas to the Boolean lattice. For a finite set $X$ with $|X| = n$, write $X^{(r)}$ for the family of all $r$-element subsets of $X$, and similarly $X^{(\leq r)}$, $X^{(\geq r)}$. We have $|X^{(r)}| = \binom{n}{r}$.
For $r < s$, form the bipartite graph $G = (X^{(r)}, X^{(s)}; E)$ where $A \in X^{(r)}$ is adjacent to $B \in X^{(s)}$ if $A \subseteq B$. Each $A \in X^{(r)}$ is contained in exactly $\binom{n-r}{s-r}$ members of $X^{(s)}$, and each $B \in X^{(s)}$ contains exactly $\binom{s}{r}$ members of $X^{(r)}$. So $G$ is biregular, and the expansion and matching results apply.
[example: Layer Matching in the Boolean Lattice for $n = 4$]
Let $X = \{1,2,3,4\}$, so $n = 4$. Take $r = 1$ and $s = 2$. Then $X^{(1)} = \{\{1\}, \{2\}, \{3\}, \{4\}\}$ has size $4 = \binom{4}{1}$ and $X^{(2)}$ has size $6 = \binom{4}{2}$. Each $1$-element set is contained in $\binom{3}{1} = 3$ sets of size $2$, and each $2$-element set contains $\binom{2}{1} = 2$ sets of size $1$. So $G$ is $(3,2)$-regular. Since $|X^{(1)}| = 4 \leq 6 = |X^{(2)}|$, the smaller side is $X^{(1)}$, and by the complete matching theorem for biregular graphs there is an injection $f: X^{(1)} \to X^{(2)}$ with $A \subseteq f(A)$ for all $A$. One explicit matching: $f(\{1\}) = \{1,2\}$, $f(\{2\}) = \{2,3\}$, $f(\{3\}) = \{3,4\}$, $f(\{4\}) = \{1,4\}$. For the injection from layer $2$ into layer $3$: $|X^{(2)}| = 6 > 4 = |X^{(3)}|$, so we instead inject from $X^{(3)}$ into $X^{(2)}$: each $3$-element set $B$ maps to a $2$-element subset $g(B) \subsetneq B$.
[/example]
The example illustrates that the direction of the injection depends on which layer is smaller. The following theorem states the general result for arbitrary layers.
[quotetheorem:2584]
[citeproof:2584]
These layer injections are the technical foundation for the results in the next chapter. To prove Sperner's theorem — that the largest antichain in $\mathcal{P}(X)$ has size $\binom{n}{\lfloor n/2 \rfloor}$ — one decomposes $\mathcal{P}(X)$ into chains by composing the layer injections: push every element toward the middle layer from above and below. Each resulting chain intersects any antichain in at most one set. Since the number of chains equals $\binom{n}{\lfloor n/2 \rfloor}$, no antichain can be larger. The key point to carry forward: Hall's theorem, through biregular expansion, is what makes this chain decomposition possible.
Hall's theorem reveals how to find perfect matchings in bipartite graphs through careful expansion arguments. This structural insight now shifts focus to a broader question: how do we understand and count families of subsets themselves, independent of any matching structure?
# 2. Sperner Systems
This chapter begins the course's central investigation: understanding families of subsets of a finite set $X$ through the lens of the power set $\mathcal{P}(X)$. The key structural insight is that $\mathcal{P}(X)$ carries a natural graded poset structure, and the fundamental question is how large an antichain can be. We prove Sperner's theorem — the definitive answer — by two distinct methods: a chain-partition argument and the LYM inequality. We then abstract both methods to general graded posets and close with the Littlewood–Offord problem, where the antichain-size bound reappears in a striking analytic setting.
## Chains and Antichains in the Power Set
What structure in $\mathcal{P}(X)$ could possibly prevent a cleverly mixed antichain from beating the middle level? Before we can answer this, we need the language that makes the question precise. A partially ordered set, or **poset**, is a set equipped with a reflexive, antisymmetric, transitive relation. The power set $\mathcal{P}(X)$, ordered by $A \leq B$ if $A \subseteq B$, is the central example throughout these notes. Within any poset, two kinds of subsets are of fundamental importance: those where every pair of elements is related (chains), and those where no pair is related (antichains).
Before committing to this language, it is worth seeing concretely why a mixed-level family cannot beat the middle level in a small example. Take $X = \{1, 2, 3\}$ with $n = 3$. The middle-level antichain $X^{(1)} = \{\{1\}, \{2\}, \{3\}\}$ has size $\binom{3}{1} = 3$. Can we do better by mixing levels? Suppose we try $\mathcal{A} = \{\{1, 2\}, \{1, 3\}, \{2, 3\}\}$ — this is $X^{(2)}$, also of size $3$. What if we try a genuinely mixed family, say $\mathcal{A}' = \{\{1\}, \{2, 3\}, \{1, 2\}\}$? The set $\{1\} \subset \{1, 2\}$, so $\{1\}$ and $\{1, 2\}$ are comparable: $\mathcal{A}'$ is not an antichain. Any attempt to mix levels in $n = 3$ runs into this containment problem. The structure that encodes *why* this always happens is the grading of $\mathcal{P}(X)$.
[definition: Chain]
Let $(S, \leq)$ be a poset. A subset $C \subseteq S$ is a **chain** if any two elements of $C$ are comparable, i.e., for all $x, y \in C$ either $x \leq y$ or $y \leq x$.
[/definition]
The other extreme — no two elements comparable — captures exactly the families we are trying to maximise.
[definition: Antichain]
A subset $A \subseteq S$ of a poset $(S, \leq)$ is an **antichain** if no two distinct elements of $A$ are comparable: for all $x \neq y$ in $A$, neither $x \leq y$ nor $y \leq x$.
[/definition]
In $\mathcal{P}(X)$, a chain is a nested sequence of subsets $A_1 \subsetneq A_2 \subsetneq \cdots \subsetneq A_k$, while an antichain is a family where no set contains another. The central question of this chapter is: what is the largest antichain in $\mathcal{P}(X)$?
A natural candidate for a large antichain is $X^{(r)}$, the family of all $r$-element subsets of $X$, since subsets of the same size cannot contain one another. Writing $|X| = n$, the family $X^{(\lfloor n/2 \rfloor)}$ has size $\binom{n}{\lfloor n/2 \rfloor}$, which is the largest binomial coefficient. Can one do better with a cleverly chosen antichain that mixes elements from different levels?
To answer this question, it helps to recognise that $\mathcal{P}(X)$ has extra structure beyond being a poset: it is **graded**.
[definition: Graded Poset]
A poset $\mathcal{P} = (S, <)$ is a **graded poset** if we can write $S$ as a disjoint union
\begin{align*}
S = \coprod_{i=0}^{n} S_i
\end{align*}
such that:
- each $S_i$ is an antichain; and
- $x < y$ if and only if there exist elements $x = z_i < z_{i+1} < \cdots < z_j = y$ with $z_h \in S_h$ for each $h$.
The set $S_i$ is called the **$i$-th level** of the poset. The integer $n$ is the **rank** of the poset.
[/definition]
The power set $\mathcal{P}(X)$ with $|X| = n$ is a graded poset of rank $n$, with $S_i = X^{(i)}$ for $0 \leq i \leq n$. The level $S_0 = \{\varnothing\}$ and $S_n = \{X\}$ are singletons, while the middle level $S_{\lfloor n/2 \rfloor} = X^{(\lfloor n/2 \rfloor)}$ is the largest.
## Sperner's Theorem
Sperner's 1928 theorem answers the antichain question definitively: the middle level is best.
[quotetheorem:2585]
[citeproof:2585]
The chain-partition argument is non-constructive in the sense that it uses Hall's theorem to guarantee the injections, without writing down a specific partition. For small $n$ we can exhibit the partition explicitly, which also makes the bound immediate.
[example: Sperner's Theorem for n=4]
Let $X = \{1, 2, 3, 4\}$, so $n = 4$ and $\lfloor n/2 \rfloor = 2$. The theorem says the largest antichain has size $\binom{4}{2} = 6$. Indeed, $X^{(2)} = \{12, 13, 14, 23, 24, 34\}$ is an antichain of size $6$. To see the bound is tight, note that any antichain containing, say, $\{1\}$ cannot also contain $\{1, 2\}$, $\{1, 3\}$, or $\{1, 4\}$. The chain partition forces the count to $6$.
[/example]
Two natural questions arise. First, is the extremal antichain unique? For even $n$ there is a unique level of maximum size, so the only antichain of size $\binom{n}{n/2}$ is $X^{(n/2)}$ itself. For odd $n$, both $X^{(\lfloor n/2 \rfloor)}$ and $X^{(\lceil n/2 \rceil)}$ achieve the maximum, but no antichain mixes elements from different levels and still achieves it — this uniqueness statement follows from the equality condition in the LYM inequality (Section 4 below). Second, what happens for near-maximal antichains, those of size $\binom{n}{\lfloor n/2 \rfloor} - k$ for small $k$? The chain-partition proof gives no information about this: it only pins the maximum. The LYM approach, being quantitative, opens the door to such stability questions; we return to this perspective when we study the Kruskal–Katona theorem.
## The LYM Inequality
A second, more quantitative approach to Sperner's theorem comes from measuring the "weight" an antichain places on $\mathcal{P}(X)$. The resulting inequality, due to Lubell, Yamamoto, and Meshalkin, is strictly stronger than Sperner's theorem.
[quotetheorem:2586]
[citeproof:2586]
The double-counting argument is entirely explicit: there are no hidden choices, only two ways to count the same set of pairs. This transparency is why the LYM inequality admits an immediate characterisation of equality.
[remark: Sharpness of LYM]
Equality holds in the LYM inequality if and only if $\mathcal{A} \subseteq X^{(r)}$ for some fixed $r$, meaning the antichain is concentrated on a single level. This means the bound $\binom{n}{\lfloor n/2 \rfloor}$ is achieved only by subfamilies of the middle level (or both middle levels when $n$ is odd).
[/remark]
A concrete example shows both how the weight sum accumulates and how the LYM inequality rules out extensions that Sperner's theorem alone could not.
[example: LYM Weight Computation]
Let $X = \{1, 2, 3, 4\}$ ($n = 4$) and $\mathcal{A} = \{\{1, 2\}, \{1, 3\}, \{3, 4\}, \{2, 4\}\}$. All four sets have size $2$, so the LYM weight is
\begin{align*}
w(\mathcal{A}) = \frac{|\mathcal{A} \cap X^{(2)}|}{\binom{4}{2}} = \frac{4}{6} = \frac{2}{3} \leq 1.
\end{align*}
Now try to extend $\mathcal{A}$ while keeping it an antichain. Could we add $\{1, 4\}$? Yes, since $\{1, 4\}$ is incomparable with all four sets (it shares exactly one element with each). Adding it gives $|\mathcal{A}| = 5$, with weight $5/6$. Could we add $\{1, 2, 3\}$ on top of these five? The set $\{1, 2, 3\}$ contains $\{1, 2\}$, $\{1, 3\}$, and $\{1, 4\} \not\subset \{1,2,3\}$ — but $\{1,2\} \subset \{1,2,3\}$, so no. The LYM inequality confirms: if we had $\{1, 2, 3\}$ in $\mathcal{A}$, the weight contribution from $X^{(3)}$ would be $1/\binom{4}{3} = 1/4$, and with five sets at level $2$ already contributing $5/6 > 3/4$, the total would exceed $1$, a contradiction.
[/example]
The LYM inequality is strictly stronger than Sperner's theorem: it bounds the entire "weighted coverage" of an antichain, not just its size. The generalisation to arbitrary graded posets replaces $\binom{n}{r}$ with $|S_r|$, and the key property that makes the bound work is that the poset expands — a large sub-family at one level has an even larger shadow at the next. We now develop this.
## Generalisation to Graded Posets
Both proofs of Sperner's theorem admit far-reaching generalisations. The key structural property that makes the LYM-type argument work in $\mathcal{P}(X)$ is that $\mathcal{P}(X)$ "expands" as one moves between levels. To formalise this, we need the notion of a shadow.
[definition: Shadow]
Let $P = (S, <)$ be a graded poset with level decomposition $S = \coprod_{i=0}^{n} S_i$. For a subset $\mathcal{A} \subseteq S_i$, the **shadow** of $\mathcal{A}$ at level $i-1$ is
\begin{align*}
\partial \mathcal{A} = \{x \in S_{i-1} : x < y \text{ for some } y \in \mathcal{A}\}.
\end{align*}
[/definition]
In $\mathcal{P}(X)$, the shadow of a family $\mathcal{A} \subseteq X^{(r)}$ is the set of all $(r-1)$-element subsets obtained by removing one element from some set in $\mathcal{A}$.
[definition: Downward-Expanding Poset]
A graded poset $P = (S, <)$ is **downward-expanding** if for all $i$ and all $\mathcal{A} \subseteq S_i$,
\begin{align*}
\frac{|\partial \mathcal{A}|}{|S_{i-1}|} \geq \frac{|\mathcal{A}|}{|S_i|}.
\end{align*}
The poset is **upward-expanding** if the analogous inequality holds for shadows going upward, and **expanding** if it is both upward and downward expanding.
[/definition]
Intuitively, downward-expansion means that any sub-family at level $i$ occupies at least as large a proportion of level $i-1$ (via its shadow) as it does of level $i$ itself. The power set $\mathcal{P}(X)$ is expanding; this follows from the biregularity of the inclusion graph between $X^{(r)}$ and $X^{(r-1)}$.
To state the generalised LYM inequality, we introduce a natural weight function on subsets of a graded poset.
[definition: Weight of a Subset]
Let $P = (S, <)$ be a graded poset with levels $S_0, \ldots, S_n$. The **weight** of a set $\mathcal{A} \subseteq S$ is
\begin{align*}
w(\mathcal{A}) = \sum_{i=0}^{n} \frac{|\mathcal{A} \cap S_i|}{|S_i|}.
\end{align*}
[/definition]
The LYM inequality in $\mathcal{P}(X)$ says precisely $w(\mathcal{A}) \leq 1$ for any antichain $\mathcal{A}$, since $|S_i| = \binom{n}{i}$. The same conclusion holds for any downward-expanding poset. It does not hold in general: if the poset fails to expand, antichains can have weight greater than $1$. Consider the "two-level" poset with $S_0 = \{a\}$ (one element) and $S_1 = \{b, c, d\}$ (three elements), where $a < b$ only (so $b$ has a shadow but $c$ and $d$ do not). Take $\mathcal{A} = \{b, c, d\}$, an antichain at level $1$. The shadow is $\partial\mathcal{A} = \{a\}$, so $|\partial\mathcal{A}|/|S_0| = 1$ while $|\mathcal{A}|/|S_1| = 1$ — just barely expanding. But modify $S_1$ to have four elements $\{b, c, d, e\}$ with only $b$ above $a$: now $|\partial\mathcal{A}|/|S_0| = 1/1 = 1$ but $|\mathcal{A}|/|S_1| = 4/4 = 1$, so the weight is $w(\mathcal{A}) = 4/4 = 1$. Remove the edge $a < b$ entirely, making $S_0$ isolated: then $|\partial\mathcal{A}| = 0 < |\mathcal{A}| = 4$, the poset is not downward-expanding, and any subset of $S_1$ is an antichain, so we can have $w(\mathcal{A}) = 4/4 = 1$ (or larger if $|S_1| < |\mathcal{A}|$ were possible). The point is that without expansion, the inductive "shadow-replacement" step in the proof breaks down.
[quotetheorem:2587]
[citeproof:2587]
The expanding condition is the minimal hypothesis needed for this shadow-replacement argument to go through. Every regular poset is expanding: if every element of $S_i$ lies above exactly $r_i$ elements of $S_{i-1}$ and the bipartite graph between $S_i$ and $S_{i-1}$ is $(r_i, s_{i-1})$-biregular, then a double-counting gives $r_i |S_i| = s_{i-1} |S_{i-1}|$, and the expansion inequality $|\partial\mathcal{A}|/|S_{i-1}| \geq |\mathcal{A}|/|S_i|$ follows from the biregularity by Hall's theorem. However, the converse is false: a poset can be expanding without being regular. The Boolean lattice $\mathcal{P}(X)$ is both regular and expanding, which is why two independent proofs (chain partition and LYM) exist.
### Regular Posets
There is a second generalisation, corresponding to the chain-partition proof of Sperner's theorem. The structural property used in that proof was not directly that $\mathcal{P}(X)$ is expanding, but rather a stronger uniformity: every element at a given level relates to the same number of elements one level down and one level up. This is the notion of regularity.
[definition: Regular Poset]
A graded poset $(S, <)$ is **regular** if for each level $i$, there exist constants $r_i$ and $s_i$ such that every $x \in S_i$ lies above exactly $r_i$ elements of $S_{i-1}$ and below exactly $s_i$ elements of $S_{i+1}$.
[/definition]
In $\mathcal{P}(X)$ with $|X| = n$, the element $A \in X^{(i)}$ lies above $i$ elements of $X^{(i-1)}$ (the $(i-1)$-element subsets of $A$) and below $n - i$ elements of $X^{(i+1)}$. So $\mathcal{P}(X)$ is regular with $r_i = i$ and $s_i = n - i$.
[explanation: Why Regular Implies Weight Bound]
The weight-at-most-$1$ conclusion holds for regular posets by a maximal chain argument that mirrors the LYM proof for $\mathcal{P}(X)$. In a regular poset, we can count the number $M$ of maximal chains of length $n+1$. The regularity condition forces every element $x \in S_k$ to lie in exactly
\begin{align*}
m(x) = \prod_{i=1}^{k} r_i \cdot \prod_{i=k}^{n-1} s_i
\end{align*}
maximal chains, and crucially this number depends only on the level $k$, not on the specific element $x \in S_k$. This is because regularity provides uniform "branching" both below and above every element. Since every maximal chain passes through a unique element of each $S_i$, we get $M = |S_i| \cdot m(x)$ for any $x \in S_i$, giving $m(x) = M/|S_i|$. An antichain meets each maximal chain in at most one element, so $M \geq \sum_{x \in \mathcal{A}} m(x) = M \sum_i |\mathcal{A} \cap S_i|/|S_i| = M \cdot w(\mathcal{A})$, hence $w(\mathcal{A}) \leq 1$.
[/explanation]
This motivating argument already contains the proof; the theorem statement records the conclusion.
[quotetheorem:2588]
[citeproof:2588]
The weight bound $w(\mathcal{A}) \leq 1$ is the same conclusion as the Generalised LYM inequality, but derived from regularity rather than expansion. Since every regular poset is expanding (as argued above), the Generalised LYM inequality is the stronger result — it applies to a strictly larger class of posets. What the regularity approach adds is a different proof technique: counting maximal chains rather than iterating shadow replacements. This maximal-chain count is the key idea in the chain-partition proof of Sperner's theorem, generalised. Both approaches yield the same Sperner-type bound $|\mathcal{A}| \leq \max_i |S_i|$, which goes beyond Sperner's theorem for $\mathcal{P}(X)$ by giving the same conclusion for a wide class of combinatorially natural posets (products of chains, the face lattice of a polytope, etc.).
[example: Symmetric Chain Decomposition for n=3]
We verify the symmetric chain decomposition of $\mathcal{P}([3])$ and extract its profile. Starting from the decomposition of $\mathcal{P}([2])$: the two symmetric chains are $\{\varnothing, \{1\}, \{1,2\}\}$ (length $3$) and $\{\{2\}\}$ (length $1$, a singleton at the middle level). Applying the inductive rule with $n = 3$:
From the chain $\mathcal{C}_1 = \{\varnothing, \{1\}, \{1,2\}\}$, the top element is $\{1,2\}$, so:
\begin{align*}
\mathcal{C}_1^{(0)} &= \{\varnothing,\, \{1\},\, \{1,2\},\, \{1,2,3\}\}, \\
\mathcal{C}_1^{(1)} &= \{\{3\},\, \{1,3\}\}.
\end{align*}
From the chain $\mathcal{C}_2 = \{\{2\}\}$, the single (top) element is $\{2\}$, so:
\begin{align*}
\mathcal{C}_2^{(0)} &= \{\{2\},\, \{2,3\}\}, \\
\mathcal{C}_2^{(1)} &= \varnothing \quad (\text{dropped, since } |\mathcal{C}_2| = 1).
\end{align*}
This gives three symmetric chains: $\mathcal{C}_1^{(0)}$ of length $4$ (levels $0$–$3$), $\mathcal{C}_1^{(1)}$ of length $2$ (levels $1$–$2$), and $\mathcal{C}_2^{(0)}$ of length $2$ (levels $1$–$2$). Every element of $\mathcal{P}([3])$ appears exactly once, and the total number of chains is $3 = \binom{3}{\lfloor 3/2 \rfloor} = \binom{3}{1}$, confirming the bound.
[/example]
## The Littlewood–Offord Problem
We now turn to a striking application of antichain ideas to a problem of analytic flavour. Given vectors $x_1, \ldots, x_n$ in a normed space, each of norm at least $1$, and a subset $A \subseteq [n]$, form the subset sum $x_A = \sum_{i \in A} x_i$. How large can a family $\mathcal{A} \subseteq \mathcal{P}([n])$ be if all subset sums $x_A$ for $A \in \mathcal{A}$ are within distance $1$ of each other?
The motivating example: take $x_i = 1$ for all $i$ (real scalars). Then $x_A = |A|$, and the condition $|x_A - x_B| < 1$ forces $A$ and $B$ to have the same cardinality. The largest such family is $[n]^{(\lfloor n/2 \rfloor)}$ with $\binom{n}{\lfloor n/2 \rfloor}$ sets. This suggests the Sperner-type bound should hold in general.
Erdős proved this bound for real scalars in 1945, and Kleitman extended it to arbitrary normed spaces in 1970 by a beautiful inductive argument exploiting the structure of symmetric decompositions.
[quotetheorem:2589]
[citeproof:2589]
The real-variable case reduces neatly to Sperner via the antichain observation: the condition $|x_i| \geq 1$ forces the ordering $x_A < x_B$ whenever $A \subsetneq B$, so any family with all pairwise differences less than $1$ must already be an antichain. The bound is sharp — the middle-level family $[n]^{(\lfloor n/2 \rfloor)}$ with $x_i = 1$ achieves it — and the proof reveals something conceptual: the real-line constraint is equivalent to antichain-ness, so the Littlewood–Offord bound in one dimension is exactly Sperner's theorem in disguise. The real-variable proof also shows that sharpness is achieved if and only if $\mathcal{A} = [n]^{(r)}$ for some $r$ (with $|r - n/2|$ minimal), by the same equality condition as LYM.
The case of general normed spaces is more subtle because the notion of "ordering by sum" no longer applies. Kleitman's 1970 proof handles this by replacing the chain partition of Sperner's proof with a partition into **sparse** families.
[definition: Sparse Family]
Given vectors $x_1, \ldots, x_n$ in a normed space, a family $\mathcal{F} \subseteq \mathcal{P}([n])$ is **sparse** if $\|x_E - x_F\| \geq 1$ for all distinct $E, F \in \mathcal{F}$.
[/definition]
If $\mathcal{F}$ is sparse and $\mathcal{A}$ is a family with all pairwise distances less than $1$, then $|\mathcal{F} \cap \mathcal{A}| \leq 1$: the family $\mathcal{A}$ can contain at most one element from each sparse set, just as an antichain can contain at most one element from each chain. So if we can partition $\mathcal{P}([n])$ into $\binom{n}{\lfloor n/2 \rfloor}$ sparse families, we are done.
The right notion of "profile" for this partition comes from symmetric chain decompositions. Recall that a **symmetric chain** in $\mathcal{P}([n])$ is a chain $\{C_i, C_{i+1}, \ldots, C_{n-i}\}$ with $|C_j| = j$. We established that $\mathcal{P}([n])$ decomposes into symmetric chains by induction on $n$.
[quotetheorem:2590]
[citeproof:2590]
The decomposition produced by the inductive rule above is not unique: different inductive constructions can yield different partitions into symmetric chains (for example, the Greene–Kleitman algorithm and the parenthesisation algorithm both work). What is canonical is the **profile** — the number of chains of each length depends only on $n$ and not on which valid symmetric chain decomposition is chosen, because each chain of length $n + 1 - 2i$ must contain exactly one element at the middle level, and the level sizes determine the count. Symmetric chain decompositions of graded posets beyond $\mathcal{P}([n])$ exist for several natural examples: products of chains $[k_1] \times \cdots \times [k_r]$, the face lattice of the Boolean algebra, and the subgroup lattice of certain $p$-groups. Whether a given regular graded poset admits a symmetric chain decomposition is in general an open question.
The key statistic we extract from symmetric chain decompositions is their **profile**: for $0 \leq i \leq n/2$, the number of chains with $n + 1 - 2i$ elements is
\begin{align*}
\ell(n, i) = \binom{n}{i} - \binom{n}{i-1}.
\end{align*}
A partition $\mathcal{P}([n]) = \mathcal{F}_1 \cup \cdots \cup \mathcal{F}_t$ is called **symmetric** if its profile matches this, i.e., the number of families $\mathcal{F}_j$ with $n + 1 - 2i$ elements is $\ell(n, i)$. A symmetric partition has total number of parts equal to $\sum_i \ell(n, i) = \binom{n}{\lfloor n/2 \rfloor}$.
[quotetheorem:2591]
[citeproof:2591]
The induction works for vectors in any Banach space without modification, and the use of a linear functional in place of an order on the reals is the key abstraction. The Hahn–Banach theorem is the tool that makes this possible.
[remark: Role of Hahn-Banach]
The key step — finding a linear functional that "separates" $x_n$ from the rest — is precisely where the Hahn–Banach theorem enters. In real Euclidean space one could instead use the standard inner product with $x_n/\|x_n\|$, but Hahn–Banach allows the same argument to work in any Banach space, including infinite-dimensional ones, with no changes.
[/remark]
The Sperner systems chapter established which families of subsets can coexist without containment. The Kruskal–Katona theorem now asks a more refined question: among all r-element subsets, which specific families minimize the size of their lower shadows?
# 3. The Kruskal–Katona Theorem
This chapter addresses one of the central questions in extremal set theory: given a family $\mathcal{A}$ of $r$-element subsets of $[n]$, how small can the lower shadow $\partial \mathcal{A}$ be? The Kruskal–Katona theorem gives a complete and sharp answer, identifying the initial segments of the colex order as the extremal configurations. The proof strategy introduced here — defining compressions that reduce the shadow without shrinking the family, then characterising the fixed points of all compressions as exactly the colex initial segments — is a model for a wide class of extremal arguments in combinatorics.
## The Shadow Problem and Motivating Examples
Recall from Chapter 2 that for a family $\mathcal{A} \subseteq X^{(r)}$, the **lower shadow** is
\begin{align*}
\partial \mathcal{A} = \{ B \in X^{(r-1)} : B \subseteq A \text{ for some } A \in \mathcal{A} \}.
\end{align*}
Each set $A \in \mathcal{A}$ contributes at most $r$ sets to $\partial \mathcal{A}$ (its $r$-element $(r-1)$-subsets), and each $(r-1)$-element set $B$ can appear in at most $n - (r-1)$ members of $\mathcal{A}$ (by adding any of the remaining $n - r + 1$ elements). A crude double-counting gives
\begin{align*}
|\partial \mathcal{A}| \cdot (n - r + 1) \geq |\mathcal{A}| \cdot r,
\end{align*}
from which we can derive
\begin{align*}
|\partial \mathcal{A}| \geq |\mathcal{A}| \cdot \frac{r}{n-r+1}.
\end{align*}
This lower bound is weak, however — it does not depend on the structure of $\mathcal{A}$ at all, only on its size. The question is whether a sharper bound can be obtained, and whether there is a clean description of which families achieve the minimum shadow.
[example: Scattered vs Bunched Families]
Take $n = 6$ and $r = 3$. Consider first the scattered family
\begin{align*}
\mathcal{A} = \{123,\, 456,\, 124,\, 256\}.
\end{align*}
Its lower shadow is
\begin{align*}
\partial \mathcal{A} = \{12, 13, 23, 45, 46, 56, 14, 24, 25, 26\},
\end{align*}
which has $10$ elements.
Now consider the bunched family
\begin{align*}
\mathcal{A}' = \{123,\, 124,\, 134,\, 234\}.
\end{align*}
These are exactly all $3$-element subsets of $\{1,2,3,4\}$. The lower shadow is
\begin{align*}
\partial \mathcal{A}' = \{12, 13, 14, 23, 24, 34\},
\end{align*}
which has only $6$ elements. Both families have size $4$, but the bunched family has a substantially smaller shadow.
[/example]
The key observation is that $\mathcal{A}' = [4]^{(3)}$, the complete $3$-uniform family on $4$ elements. This suggests a general principle: whenever $|\mathcal{A}| = \binom{k}{r}$, the optimal choice is $\mathcal{A} = [k]^{(r)}$, which achieves $|\partial \mathcal{A}| = \binom{k}{r-1}$. For other values of $|\mathcal{A}|$ — when $|\mathcal{A}|$ is not a binomial coefficient — we need a way to identify the "most bunched" families of arbitrary size. This requires a total ordering on $r$-element sets.
## The Colex Order and Initial Segments
When $|\mathcal{A}| = \binom{k}{r}$, the complete family $[k]^{(r)}$ achieves shadow size $\binom{k}{r-1}$, and there is no ambiguity about which family to use. The difficulty arises for arbitrary $m$: we need to identify the "most bunched" family of size $m$ for any $m$, not just binomial coefficient sizes. This requires a total ordering on $X^{(r)}$ so that we can speak of the first $m$ sets. But which ordering? The lexicographic order is a natural first guess, and it turns out to be wrong: lex initial segments do not minimise the shadow in general. The colexicographic order is the right one, and the proof of this is the substance of the theorem. There are two natural total orders on $X^{(r)}$ and more generally on $\mathbb{N}^{(r)}$:
[definition: Lex and Colex Orders]
Let $A, B \in \mathbb{N}^{(r)}$ with $A \neq B$. Write $A \triangle B = (A \setminus B) \cup (B \setminus A)$ for the symmetric difference.
- The **lexicographic order** (lex) sets $A <_{\mathrm{lex}} B$ if $\min(A \triangle B) \in A$.
- The **colexicographic order** (colex) sets $A <_{\mathrm{colex}} B$ if $\max(A \triangle B) \in B$.
[/definition]
In lex order, $A$ comes first if the smallest element distinguishing $A$ from $B$ belongs to $A$. In colex order, $B$ comes last if the largest distinguishing element belongs to $B$. Both are genuine total orders on $\mathbb{N}^{(r)}$.
[example: Colex Order on 3-Element Sets]
For $r = 3$, the elements of $\mathbb{N}^{(3)}$ in colex order begin
\begin{align*}
\{1,2,3\},\; \{1,2,4\},\; \{1,3,4\},\; \{2,3,4\},\; \{1,2,5\},\; \{1,3,5\},\; \{2,3,5\},\; \{1,4,5\},\; \{2,4,5\},\; \{3,4,5\},\; \{1,2,6\},\; \ldots
\end{align*}
[/example]
Writing elements as sorted strings for brevity ($\{1,2,3\} = 123$), the pattern is clear: the colex order groups together all sets with a fixed maximum, listing those with maximum $k$ before moving to those with maximum $k+1$. More precisely, to compare $A$ and $B$, colex looks at the largest element in the symmetric difference and gives precedence to the set that does *not* contain it.
The critical structural property of colex is that its initial segments in $\mathbb{N}^{(r)}$ are exactly the families $[n]^{(r)}$ for each $n \geq r$. The initial segment of size $\binom{n}{r}$ is $[n]^{(r)}$ — the complete $r$-uniform hypergraph on $[n]$. This makes colex a natural candidate for the "right" order: its initial segments are precisely the bunched families $[k]^{(r)}$ that we expect to minimise the shadow.
## Simple Compressions
The proof strategy has three steps: define operators that push a family towards the beginning of the colex order without increasing the shadow; characterise the fixed points of all such operators; conclude that fixed points are initial segments. The simplest operators are the $(i,j)$-compressions.
[definition: Simple Compression]
Let $X$ be a finite set and $i, j \in X$ with $i \neq j$. For $A \in X^{(r)}$, define
\begin{align*}
C_{ij}(A) = \begin{cases} (A \setminus \{j\}) \cup \{i\} & \text{if } j \in A \text{ and } i \notin A, \\ A & \text{otherwise.} \end{cases}
\end{align*}
For a family $\mathcal{A} \subseteq X^{(r)}$, define
\begin{align*}
C_{ij}(\mathcal{A}) = \{ C_{ij}(A) : A \in \mathcal{A} \} \cup \{ A \in \mathcal{A} : C_{ij}(A) \in \mathcal{A} \}.
\end{align*}
[/definition]
The operation $C_{ij}$ on a set replaces $j$ with $i$ when possible (i.e., when $j$ is present but $i$ is not). The operation on a family keeps the translated version $C_{ij}(A)$ when it is not already in $\mathcal{A}$, but retains $A$ itself when $C_{ij}(A)$ is already present (to avoid duplicates). The two cases in the family definition ensure $|C_{ij}(\mathcal{A})| = |\mathcal{A}|$.
Intuitively, if we think of $\mathbb{N}^{(r)}$ as arranged in a grid where each set $B \cup \{j\}$ sits above $B \cup \{i\}$ (for $i < j$), then $C_{ij}$ pushes elements down in this grid — it replaces a higher index with a lower one wherever possible. The crucial property of this operation is that it does not increase the shadow:
[quotetheorem:2592]
The inclusion $\partial C_{ij}(\mathcal{A}) \subseteq C_{ij}(\partial \mathcal{A})$ says that every $(r-1)$-set in the shadow of the compressed family is the $C_{ij}$-image of some set already in the shadow of $\mathcal{A}$. Since $C_{ij}$ is injective (as a map on $X^{(r-1)}$) and $C_{ij}(\partial \mathcal{A})$ has the same size as $\partial \mathcal{A}$, the size bound follows.
To see that the shadow can strictly decrease: take $\mathcal{A} = \{\{1,3,5\}, \{2,3,5\}\}$ with $r = 3$, $i = 1$, $j = 3$. Then $C_{13}(\mathcal{A}) = \{\{1,2,5\}, \{1,3,5\}\}$ (the set $\{2,3,5\}$ maps to $\{1,2,5\}$, and $\{1,3,5\}$ is unchanged since $1 \in \{1,3,5\}$). The shadow of $\mathcal{A}$ consists of $\{1,3\}, \{1,5\}, \{3,5\}, \{2,3\}, \{2,5\}$ — five sets. The shadow of $C_{13}(\mathcal{A})$ consists of $\{1,2\}, \{1,5\}, \{2,5\}, \{1,3\}, \{3,5\}$ — again five, but the two sets $\{1,2,5\}$ and $\{1,3,5\}$ share the $(r-1)$-subset $\{1,5\}$, whereas the original two sets shared only $\{3,5\}$. Increasing overlap between sets in the family causes the shadow to shrink, which is precisely what compressions are designed to produce.
This theorem gives us a machine for reducing the shadow, and we want to apply it systematically. But simple compressions alone are not enough: even after applying $C_{ij}$ for every $i < j$, the resulting family need not be a colex initial segment, as the next example shows.
[definition: Left-Compressed Family]
A family $\mathcal{A} \subseteq X^{(r)}$ is called **left-compressed** if $C_{ij}(\mathcal{A}) = \mathcal{A}$ for all $i < j$. Equivalently, whenever $A \in \mathcal{A}$, $j \in A$, $i \notin A$, and $i < j$, the set $(A \setminus \{j\}) \cup \{i\}$ is also in $\mathcal{A}$.
[/definition]
By repeatedly applying simple compressions (for all pairs $i < j$), any family can be made left-compressed without increasing its shadow. Since initial segments of colex are left-compressed, one might hope that the converse holds — that left-compressed families are exactly the initial segments. Unfortunately, this is not the case.
[example: Left-Compressed but Not an Initial Segment]
The family $\{123, 124, 125, 126\}$ (writing elements as sorted strings) is left-compressed: no set can be compressed further, since $1$ and $2$ are already the two smallest elements in every set. However, it is not an initial segment of the colex order on $\mathbb{N}^{(3)}$: the set $134$ precedes $125$ in colex (since the largest element in $\{1,3,4\} \triangle \{1,2,5\} = \{3,4,2,5\}$ is $5 \in \{1,2,5\}$, so $\{1,3,4\} <_{\mathrm{colex}} \{1,2,5\}$), but $134 \notin \mathcal{A}$ while $125 \in \mathcal{A}$.
[/example]
This shows that simple compressions are insufficient to characterise colex initial segments, and we need a more powerful family of compression operators.
## General Compressions and Initial Segments
To capture the full structure of the colex order, we generalise from single-element swaps to simultaneous swaps of $s$-element sets.
[definition: General Compression]
Let $U, V \subseteq X$ with $|U| = |V| = s$ and $U \cap V = \varnothing$. For $A \subseteq X$, define
\begin{align*}
C_{UV}(A) = \begin{cases} (A \setminus V) \cup U & \text{if } A \cap (U \cup V) = V, \\ A & \text{otherwise.} \end{cases}
\end{align*}
For $\mathcal{A} \subseteq X^{(r)}$, define
\begin{align*}
C_{UV}(\mathcal{A}) = \{ C_{UV}(A) : A \in \mathcal{A} \} \cup \{ A \in \mathcal{A} : C_{UV}(A) \in \mathcal{A} \}.
\end{align*}
We say $\mathcal{A}$ is **$(U,V)$-compressed** if $C_{UV}(\mathcal{A}) = \mathcal{A}$.
[/definition]
The condition $A \cap (U \cup V) = V$ means that $A$ contains all of $V$ and none of $U$. In this case, $C_{UV}(A)$ replaces $V$ with $U$ inside $A$. The family-level operation again retains $A$ when $C_{UV}(A)$ is already in the family, ensuring the size is preserved. Note that simple compressions $C_{ij}$ correspond to the special case $s = 1$, $U = \{i\}$, $V = \{j\}$; general compressions swap entire blocks simultaneously.
The behaviour of $(U,V)$-compressions on shadows is more delicate than for simple compressions. The following lemma gives the key condition under which the shadow does not increase.
[quotetheorem:2593]
The proof proceeds by induction on $s = |U| = |V|$. The base case $s = 1$ is the simple compression theorem above. For the inductive step, given a set $B \in \partial C_{UV}(\mathcal{A})$, one shows that $B \in C_{UV}(\partial \mathcal{A})$ by considering whether $B$ intersects $U$ or $V$, using the inductive hypothesis on smaller compressions to handle each case. The details require careful case analysis on how $B$ relates to $U$ and $V$, which is why the hypothesis that all smaller compressions are already applied is essential. This theorem is stated without full proof here; the argument is carried out in Bollobás, *Combinatorics* (1986), Chapter 8, and Anderson, *Combinatorics of Finite Sets* (1987).
The hypothesis asks for a weaker form of compression already to be in place: for each element of $U$, if we remove it and a matching element from $V$, the smaller compression should already be applied. This inductive structure on the size $s = |U| = |V|$ is what makes the argument work. The condition $\max V > \max U$ is the crucial directional requirement: it ensures that applying $C_{UV}$ moves sets earlier in the colex order (since colex comparisons are determined by the maximum element of the symmetric difference, and replacing the higher-indexed block $V$ with the lower-indexed block $U$ strictly reduces a set's position). Without this condition, compressions could push families in the wrong direction, away from colex initial segments rather than towards them.
The next result is the key characterisation: being $(U,V)$-compressed for all valid pairs is equivalent to being a colex initial segment.
[quotetheorem:2594]
[citeproof:2594]
We have now established the characterisation: a family is a colex initial segment if and only if it is closed under all valid compressions. The condition $\max V > \max U$ is not merely a technicality — it is exactly what distinguishes the colex direction from the reverse. A family closed under all compressions with $\max V > \max U$ is precisely one that has absorbed everything colex-smaller than any of its members, which is the definition of an initial segment. This gives us the target: to minimise the shadow, we want to reach a family that is closed under all such compressions.
We have now reduced the problem to showing that any family can be $(U,V)$-compressed for all valid pairs without increasing the shadow.
[quotetheorem:2595]
[citeproof:2595]
The potential function $\Sigma(\mathcal{B}) = \sum_{B \in \mathcal{B}} \sum_{i \in B} 2^i$ is a weighted sum that assigns larger weight to sets containing larger elements. The key property is that applying any valid compression $C_{UV}$ with $\max V > \max U$ strictly decreases $\Sigma$ whenever the compression is nontrivial, because we replace elements of $V$ — which contribute high weights $2^v$ — with elements of $U$ — which contribute lower weights $2^u < 2^v$ since $u < v$. This makes $\Sigma$ a genuine potential that decreases at every compression step, and the minimiser must therefore be a fixed point of all compressions, i.e., fully compressed. The minimiser is also the unique colex initial segment of size $|\mathcal{A}|$: the colex order on $\mathbb{N}^{(r)}$ is a total order, so there is exactly one initial segment of each cardinality.
## The Kruskal–Katona Theorem
The machinery is now in place. Any family $\mathcal{A}$ can be transformed, without increasing its shadow, into a fully-compressed family $\mathcal{B}$ of the same size. By the characterisation theorem, $\mathcal{B}$ is a colex initial segment. Colex initial segments are uniquely determined by their cardinality, and they achieve the minimum shadow. Putting this together:
[quotetheorem:2596]
[citeproof:2596]
This is a remarkable result: it says that among all $r$-uniform families of a fixed size, the colex initial segments have the smallest possible shadow. The theorem gives a complete solution to the shadow minimisation problem.
Why colex and not lex? The failure of lex can be seen concretely: for $r = 2$ and $m = 3$, the lex initial segment is $\{\{1,2\}, \{1,3\}, \{1,4\}\}$ with shadow $\{1, 2, 3, 4\}$ of size $4$, while the colex initial segment is $\{\{1,2\}, \{1,3\}, \{2,3\}\}$ with shadow $\{1, 2, 3\}$ of size $3$. The lex order bunches all sets around the smallest element, spreading the higher indices and enlarging the shadow; the colex order concentrates sets on a small ground set, causing the boundary to stay small. The extremal family is unique: since the colex order on $\mathbb{N}^{(r)}$ is a total order, there is exactly one initial segment of each cardinality, and this is the unique minimiser. The Kruskal–Katona theorem is one of the foundational results of discrete isoperimetry: in continuous settings, balls minimise surface area for a given volume; here, colex initial segments play the role of balls, the shadow plays the role of surface, and family size plays the role of volume. Many subsequent results in extremal combinatorics — including the Frankl union-closed conjecture approach and the Ahlswede–Khachatrian complete intersection theorem — rely on compression arguments in the same spirit.
## The Shadow Function and the Cascade Representation
The Kruskal–Katona theorem identifies the colex initial segment as the extremal family, but it does not immediately tell us the size of that shadow. Given $m$ sets, we know the minimum shadow is achieved by the colex initial segment of size $m$ — but what is the numerical value of $|\partial \mathcal{C}|$? To answer this we need an explicit description of the colex initial segment of size $m$ and a formula for its boundary. The key is a representation of $m$ as a sum of binomial coefficients that encodes the last set in the colex ordering, making the boundary count immediate. Define the **shadow function**
\begin{align*}
\partial^{(r)}(m) = \min \{ |\partial \mathcal{A}| : \mathcal{A} \subseteq X^{(r)},\, |\mathcal{A}| = m \}
\end{align*}
for $m \leq \binom{n}{r}$. By the theorem, this minimum is achieved by the colex initial segment of size $m$, and the value is the size of its shadow. Since every colex initial segment is determined by its last element, to compute $\partial^{(r)}(m)$ we need an explicit description of the initial segment of size $m$.
[example: Size of a Colex Initial Segment]
Take $r = 4$ and consider the initial segment ending in the set $\{3,4,7,9\}$ (written $3479$ for brevity). Any set $\{a_1 < a_2 < a_3 < a_4\}$ precedes $\{3,4,7,9\}$ in colex iff either $a_4 < 9$, or $a_4 = 9$ and $a_3 < 7$, or $a_4 = 9$, $a_3 = 7$, and $a_2 < 4$, or $a_4 = 9$, $a_3 = 7$, $a_2 = 4$, and $a_1 < 3$, or $a_4 = 9$, $a_3 = 7$, $a_2 = 4$, $a_1 = 3$ (the set itself). Counting:
\begin{align*}
\binom{8}{4} + \binom{6}{3} + \binom{3}{2} + \binom{2}{1} + 1.
\end{align*}
The pattern emerging is that each term is a binomial coefficient $\binom{m_j}{j}$ where the $m_j$ are derived from the elements of the last set.
[/example]
This example motivates the **cascade** (or Macaulay) representation of integers. Every positive integer $m$ can be written uniquely in the form
\begin{align*}
m = \binom{m_r}{r} + \binom{m_{r-1}}{r-1} + \cdots + \binom{m_s}{s}
\end{align*}
for some integers $m_r > m_{r-1} > \cdots > m_s \geq s \geq 1$. This is the **$r$-cascade representation** of $m$.
The shadow of the colex initial segment with this cascade representation satisfies a strikingly simple formula:
\begin{align*}
\partial^{(r)}\!\left(\sum_{i=s}^{r} \binom{m_i}{i}\right) = \sum_{i=s}^{r} \binom{m_i}{i-1}.
\end{align*}
In particular, $\partial^{(r)}\!\left(\binom{n}{r}\right) = \binom{n}{r-1}$, which is the familiar count: the shadow of the complete family $[n]^{(r)}$ consists of all $(r-1)$-element subsets of $[n]$.
## The Lovász Bound
Working with the cascade representation of $m$ requires finding the unique decomposition of $m$ into binomial coefficients, which can be mildly tedious. Lovász gave a clean reformulation that avoids this.
[quotetheorem:2597]
[citeproof:2597]
The Lovász bound reduces the cascade computation to solving a single real-valued equation. Its elegance lies in the fact that the generalised binomial coefficient $\binom{x}{r}$ is a smooth, strictly increasing function of $x$ for fixed $r$, so given $m$ one can solve $\binom{x}{r} = m$ uniquely for $x \geq r$ and read off the lower bound $\binom{x}{r-1}$ directly, without computing the cascade representation of $m$.
[remark: Sharpness of the Lovász Bound]
The bound $|\partial \mathcal{A}| \geq \binom{x}{r-1}$ is tight at integer values $x$: taking $\mathcal{A} = [x]^{(r)}$ gives $|\mathcal{A}| = \binom{x}{r}$ and $|\partial \mathcal{A}| = \binom{x}{r-1}$. For non-integer $x$, the bound is not always achieved as an equality, but it remains a valid and often sharp lower bound that is far simpler to apply than the exact cascade formula.
[/remark]
The Lovász bound is particularly useful in applications: given $|\mathcal{A}|$, one solves $\binom{x}{r} = |\mathcal{A}|$ for $x$ (treating this as an equation in a real variable) and reads off the lower bound $\binom{x}{r-1}$ directly. This avoids computing the cascade representation entirely. The Kruskal–Katona theorem, together with this clean real-valued reformulation, provides the foundation for many subsequent isoperimetric results in combinatorics.
The Kruskal–Katona theorem provides precise extremal bounds for subset families through a greedy ordering principle. These discrete bounds now find their continuous analog: isoperimetric inequalities that govern the behavior of volumes and surface areas in Euclidean space.
# 4. Isoperimetric Inequalities
This chapter extends the isoperimetric theme of Chapter 3. The Kruskal–Katona theorem established there answered how small the lower shadow of a uniform family $\mathcal{A} \subseteq X^{(r)}$ can be: the colex initial segment achieves the minimum. The natural next question is to drop the uniformity constraint, allow $\mathcal{A} \subseteq \mathcal{P}(X)$ to contain sets of mixed sizes, and ask how small the set of all *neighbours* of $\mathcal{A}$ can be. The answer turns out to be provided by Hamming balls: the initial segments of the simplicial order. The chapter then turns to a variant, the edge isoperimetric inequality, where the object to minimise is the number of edges crossing from $\mathcal{A}$ to its complement rather than the number of vertices outside $\mathcal{A}$ adjacent to it. Here the optimal sets are subcubes, and the relevant order is the binary order.
## The Discrete Cube and Its Isoperimetric Problems
To work with $\mathcal{P}(X)$ as a combinatorial object, we turn it into a graph by connecting subsets that differ in exactly one element.
[definition: Discrete Cube]
Given a finite set $X$ with $|X| = n$, the **discrete cube** $Q_n$ is the graph with vertex set $\mathcal{P}(X)$ in which two vertices $x, y$ are adjacent if and only if $|x \triangle y| = 1$, i.e., $x = y \cup \{a\}$ for some $a \notin y$, or $y = x \cup \{a\}$ for some $a \notin x$.
[/definition]
Identifying each subset $A \subseteq X$ with its indicator vector $\mathbf{1}_A \in \{0,1\}^n$ (where the $i$-th coordinate is $1$ if $i \in A$ and $0$ otherwise), the graph $Q_n$ is exactly the $n$-dimensional unit hypercube in $\mathbb{R}^n$: two $\{0,1\}$-vectors are adjacent when they differ in exactly one coordinate.
[example: The Cube $Q_3$]
The vertices of $Q_3$ are the eight subsets of $\{1,2,3\}$, arranged in four levels by size: $\varnothing$ at the bottom, then $\{1\}, \{2\}, \{3\}$, then $\{1,2\}, \{1,3\}, \{2,3\}$, and $\{1,2,3\}$ at the top. Two vertices are joined when they differ by one element. Under the indicator-vector identification, $\{1,3\} \leftrightarrow 101$ and $\{1,2,3\} \leftrightarrow 111$ are adjacent because they differ in the coordinate for element $2$. The resulting graph is precisely the combinatorial skeleton of a $3$-dimensional cube.
[/example]
[illustration:discrete-cube-q3]
With the graph structure in hand, we can define the two central notions of boundary.
[definition: Boundary and Neighbourhood]
Let $G$ be a graph and $A \subseteq V(G)$. The **boundary** of $A$ is
\begin{align*}
b(A) = \{ x \in V(G) \setminus A : x \text{ is adjacent to some vertex in } A \}.
\end{align*}
The **neighbourhood** of $A$ is $N(A) = A \cup b(A)$, i.e., the closed neighbourhood.
[/definition]
Note that $|b(A)| = |N(A)| - |A|$. It is often more convenient to prove isoperimetric bounds in terms of $|N(A)|$ rather than $|b(A)|$, since the neighbourhood has cleaner combinatorial descriptions.
An **isoperimetric inequality** on $G$ is a lower bound of the form $|b(A)| \geq f(|A|)$ for all $A \subseteq V(G)$. The classical continuous analogue is well-known: among all regions of fixed area in the plane, the disc has the smallest perimeter; among regions of fixed volume in $\mathbb{R}^3$, the ball has the smallest surface area; and among regions of fixed area on $S^2$, the spherical cap has the smallest perimeter.
In each continuous case the optimal shape is a *ball*, and the same turns out to be true in $Q_n$: the sets minimising $|N(A)|$ for a given $|A|$ are the discrete analogues of balls.
A natural first guess is to try a different kind of structured set. Subcubes — sets of the form $\{A \subseteq X : A \subseteq S\}$ for some fixed $S \subseteq X$ — are geometrically appealing, since they are exactly the faces of the hypercube. But they fail the vertex boundary test. In $Q_3$, take the subcube $A' = \{\varnothing, \{1\}, \{2\}, \{1,2\}\}$ (all subsets of $\{1,2\}$, a $2$-dimensional face). Its vertex boundary consists of $\{3\}, \{1,3\}, \{2,3\}, \{1,2,3\}$ — all four remaining vertices — giving $|b(A')| = 4$. On the other hand, the set $\{\varnothing, \{1\}, \{2\}, \{3\}\}$ has boundary $\{\{1,2\}, \{1,3\}, \{2,3\}\}$, so $|b(A)| = 3 < 4$. Subcubes are not optimal for the vertex boundary problem; the ball-like set does better.
[definition: Hamming Ball]
The **Hamming ball** of radius $r$ in $Q_n$ centred at $\varnothing$ is $X^{(\leq r)} = \bigcup_{k=0}^{r} X^{(k)}$, i.e., the family of all subsets of $X$ of size at most $r$.
More generally, a set of the form $A = X^{(\leq r)} \cup B$ with $B \subseteq X^{(r+1)}$ is called a **Hamming ball** (or **ball in simplicial order**).
[/definition]
The neighbourhood of $X^{(\leq r)}$ is $X^{(\leq r+1)}$ — every subset of size $\leq r$ is already in $A$, and its only neighbours not in $A$ are sets of size $r+1$. So Hamming balls have neighbourhoods that are themselves Hamming balls. The question is whether these are minimal.
## Compressions and the Simplicial Order
The space $\mathcal{P}(X)$ has $2^n$ subsets, and a brute-force comparison of all families of a given size is completely infeasible — there are $\binom{2^n}{m}$ families of size $m$, which grows faster than any polynomial. To find the minimum of $|N(A)|$ we need a structural argument: a way to continuously improve any family toward an optimal one without ever increasing the neighbourhood. The compression method, which we already saw in the Kruskal–Katona proof, provides exactly this. The key steps are: identify the correct total order on $\mathcal{P}(X)$ so that initial segments capture the "ball-like" structure, then define compression operators that move sets toward initial segments without increasing $|N(A)|$.
[definition: Simplicial Ordering]
The **simplicial ordering** on $Q_n$ is the total order defined by: $x < y$ if either $|x| < |y|$, or $|x| = |y|$ and $x < y$ in lex order (where the lex order on subsets of the same size compares by the smallest element in their symmetric difference).
[/definition]
The simplicial order lists $\mathcal{P}(X)$ by going level-by-level in lex within each level:
\begin{align*}
\varnothing,\; \{1\},\; \{2\},\; \{3\},\; \ldots,\; \{n\},\; \{1,2\},\; \{1,3\},\; \ldots,\; X.
\end{align*}
An initial segment of length $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{r} + m$ (where $0 \leq m \leq \binom{n}{r+1}$) is precisely the Hamming ball $X^{(\leq r)}$ together with the first $m$ elements of $X^{(r+1)}$ in lex order. By the Kruskal–Katona theorem applied to $B \subseteq X^{(r+1)}$, the lex initial segment minimises the upper shadow $\partial^+ B$. Since
\begin{align*}
N(X^{(\leq r)} \cup B) = X^{(\leq r+1)} \cup \partial^+ B,
\end{align*}
minimising $|\partial^+ B|$ is equivalent to minimising $|N(A)|$ when $A$ contains $X^{(\leq r)}$. This motivates why the simplicial order is the right one.
Now we define the compression operators. For $A \subseteq Q_n$ and $1 \leq i \leq n$, define the **$i$-sections** of $A$ as subsets of $\mathcal{P}(X \setminus \{i\})$:
\begin{align*}
A_-^{(i)} &= \{ x \in A : i \notin x \}, \\
A_+^{(i)} &= \{ x \setminus \{i\} : x \in A,\; i \in x \}.
\end{align*}
Thus $A_-^{(i)}$ is the collection of sets in $A$ not containing $i$, and $A_+^{(i)}$ is obtained from the sets in $A$ that do contain $i$ by removing $i$.
[definition: $i$-Compression]
The **$i$-compression** $C_i(A)$ of $A \subseteq Q_n$ is the set in $Q_n$ whose $i$-sections are the initial segments of $\mathcal{P}(X \setminus \{i\})$ in the simplicial order of sizes $|A_+^{(i)}|$ and $|A_-^{(i)}|$:
\begin{align*}
C_i(A)_+^{(i)} &= \text{first } |A_+^{(i)}| \text{ elements of } \mathcal{P}(X \setminus \{i\}) \text{ in simplicial order}, \\
C_i(A)_-^{(i)} &= \text{first } |A_-^{(i)}| \text{ elements of } \mathcal{P}(X \setminus \{i\}) \text{ in simplicial order}.
\end{align*}
We say $A$ is **$i$-compressed** if $C_i(A) = A$, and **compressed** if it is $i$-compressed for all $i$.
[/definition]
Note that $|C_i(A)| = |A|$ since $|C_i(A)_+^{(i)}| + |C_i(A)_-^{(i)}| = |A_+^{(i)}| + |A_-^{(i)}| = |A|$.
[example: A Compression in $Q_4$]
Work in $Q_4$ with $X = \{1,2,3,4\}$. Suppose $A$ has sections $A_+^{(4)} = \{\{1,2,3\}, \{1\}, \{2,3\}\}$ and $A_-^{(4)} = \{\{1\}, \{2\}, \{1,2\}, \{3\}\}$ in $Q_3$. Applying the $4$-compression replaces $A_+^{(4)}$ by the first $3$ elements of $\mathcal{P}(\{1,2,3\})$ in simplicial order, which are $\varnothing, \{1\}, \{2\}$, and replaces $A_-^{(4)}$ by the first $4$ elements, which are $\varnothing, \{1\}, \{2\}, \{3\}$. The resulting set $C_4(A)$ is "more like" an initial segment than $A$ was, with the heavy elements pushed down.
[/example]
The fundamental property of $i$-compressions is that they do not increase the neighbourhood.
[quotetheorem:2598]
[citeproof:2598]
The theorem tells us that a single compression step — replacing the $i$-sections of $A$ with the "most efficient" initial segments of the same sizes — can only help. Why does the cube structure matter here? The key is that the decomposition of $N(A)$ over the $i$-coordinate, splitting into contributions from the $+$ and $-$ sections, is valid precisely because in $Q_n$ each vertex has exactly one neighbour differing in coordinate $i$. On a general graph this decomposition breaks down. Furthermore, the nesting property of simplicial initial segments (smaller initial segments are contained in larger ones) is what allows the max formula $|N(B)| = \max\{|N(B_+^{(i)})|, |B_-^{(i)}|\} + \max\{|N(B_-^{(i)})|, |B_+^{(i)}|\}$ to produce a clean upper bound. The cube's product structure is doing real work.
One could apply compressions in any order — $C_1$, then $C_2$, and so on. Does the final result depend on the order? In general it does: after applying $C_1$ followed by $C_2$, the set may no longer be $1$-compressed, so one must iterate. But what matters for Harper's theorem is only that *some* compressed set of the same size exists with smaller neighbourhood, and the following reduction shows this is always achievable regardless of the order in which compressions are applied.
[quotetheorem:2599]
[citeproof:2599]
## Harper's Theorem
The compression lemma reduces the problem to understanding compressed sets. Why does the termination argument work? Each application of $C_i$ that actually changes $A$ strictly decreases the total weight $\sum_{x \in A} |x|$: elements are moved from larger sets to smaller ones in the simplicial order, reducing the sum of cardinalities. Since this weight is a non-negative integer, the process must stop in finitely many steps. The resulting set $B$ is $i$-compressed for every $i$ simultaneously.
One might now hope that every compressed set is an initial segment in the simplicial order — this would immediately finish the proof. But this hope is false.
[example: Compressed but Not an Initial Segment]
In $Q_3$, the set $\{\varnothing, \{1\}, \{2\}, \{1,2\}\}$ is compressed (any $i$-compression either leaves it unchanged or produces the same set), but it is not the simplicial initial segment of size $4$. The simplicial initial segment of size $4$ is $\{\varnothing, \{1\}, \{2\}, \{3\}\}$, which is obtained by replacing $\{1,2\}$ with $\{3\}$.
In $Q_4$, the set $A = \{\varnothing, \{1\}, \{2\}, \{3\}, \{4\}, \{1,2\}, \{1,3\}, \{2,3\}\}$ of size $8 = 2^{4-1}$ is compressed but not an initial segment; the simplicial initial segment of size $8$ would replace $\{2,3\}$ with $\{1,4\}$.
[/example]
These examples reveal a common pattern: in both cases, the non-initial-segment compressed set is obtained from the initial segment by swapping one element with its complement. The existence of these anomalies is deeply tied to the complement symmetry of $Q_n$. Since complementation $x \mapsto x^c$ reverses the simplicial order and $|Q_n| = 2^n$ is even, there is a unique pair of consecutive elements that are complements of each other — situated exactly at the midpoint $2^{n-1}$ of the order. Any compressed set that deviates from an initial segment must do so precisely at this pair. The following lemma makes this precise.
[quotetheorem:2600]
[citeproof:2600]
A direct computation shows that the anomalous compressed sets of size $2^{n-1}$ have a strictly larger neighbourhood than the simplicial initial segment of the same size. (This is verified by checking that $|N(B)| > |N(C)|$ when $C$ is the initial segment.) Combining this with the compression reduction, we obtain Harper's theorem.
[quotetheorem:2601]
[citeproof:2601]
[remark: Interpreting Harper's Theorem]
Harper's theorem is the combinatorial analogue of the classical isoperimetric theorem for the sphere: among all subsets of $Q_n$ of fixed size, the Hamming balls minimise the neighbourhood. The simplicial initial segments of size $\sum_{i=0}^{r} \binom{n}{i}$ are exactly $X^{(\leq r)}$, whose neighbourhood is $X^{(\leq r+1)}$ with size $\sum_{i=0}^{r+1} \binom{n}{i}$. For sizes between successive Hamming balls, the optimal sets are obtained by filling $X^{(\leq r)}$ and then taking a lex initial segment within $X^{(r+1)}$, which by Kruskal–Katona is again optimal.
[/remark]
[explanation: Necessity, Uniqueness, and Scope of Harper's Theorem]
Is the minimiser unique? For sizes that are exactly $\sum_{i=0}^{r} \binom{n}{i}$ — i.e., complete Hamming balls — the minimiser $X^{(\leq r)}$ is unique: the anomalous compressed set of size $2^{n-1}$ has strictly larger neighbourhood, and any other family of the same size has neighbourhood at least as large. For intermediate sizes, the minimiser is the unique simplicial initial segment of that size (the lex initial segment within level $r+1$ is uniquely determined by its size).
Does Harper's theorem hold on graphs other than the cube? Not in general. The cube is special: it has a product structure that makes the $i$-section decomposition exact, and initial segments of the simplicial order are nested in a way that keeps the compression argument tight. On a general vertex-transitive graph, there is no reason why the analogue of compressions should preserve the neighbourhood. Results of this type — isoperimetric inequalities on product graphs — require graph-specific arguments.
Finally, a stability question: if $|N(A)|$ is close to $|N(C)|$ for the optimal $C$, must $A$ be close to a Hamming ball? This is the realm of stability theorems, which are an active area of research. The compression proof as given does not yield stability directly, since compressions could in principle make many small changes before reaching the minimum.
[/explanation]
## The Edge Isoperimetric Inequality
The vertex isoperimetric inequality minimises the number of vertices *outside* $A$ adjacent to $A$. A different question asks: given $A \subseteq V(Q_n)$, how small can the number of *edges* between $A$ and its complement be? This is the edge isoperimetric problem.
[definition: Edge Boundary]
Let $G = (V, E)$ be a graph and $A \subseteq V$. The **edge boundary** of $A$ is
\begin{align*}
\partial_e A = \{ xy \in E : x \in A,\; y \notin A \}.
\end{align*}
[/definition]
Unlike the vertex boundary, the edge boundary is not minimised by Hamming balls. To see this, compare two sets in $Q_{2k+1}$ of size $2^{2k}$: the Hamming ball $X^{(\leq k)}$ and the bottom face $\{A \subseteq X : n \notin A\}$ (a subcube of dimension $n-1$).
[example: Hamming Ball vs Subcube in $Q_3$]
In $Q_3$, consider $|A| = 4$. The Hamming ball (simplicial initial segment) $\{\varnothing, \{1\}, \{2\}, \{3\}\}$ has neighbourhood $\{\{1,2\}, \{1,3\}, \{2,3\}\}$, contributing $3 \times 2 = 6$ edges in $\partial_e A$ (each of the three singletons sends one edge to each of the three pairs, but only the relevant ones count). Counting directly: from $\varnothing$ three edges go to the singletons outside $A$ — but all singletons are in $A$. From each singleton there are edges to the three pairs; since the three pairs are all outside $A$, each singleton contributes $2$ edges. From $\varnothing$ there are no edges leaving $A$ since all neighbours $\{1\},\{2\},\{3\}$ are in $A$. So $|\partial_e A| = 3 \cdot 2 = 6$.
Now take the subcube $A' = \{\varnothing, \{1\}, \{2\}, \{1,2\}\}$, the bottom face with $3 \notin x$ for all $x \in A'$. The edges from $A'$ to its complement go from each element of $A'$ to the element obtained by adding $3$: $\varnothing \to \{3\}$, $\{1\} \to \{1,3\}$, $\{2\} \to \{2,3\}$, $\{1,2\} \to \{1,2,3\}$. So $|\partial_e A'| = 4$, which is strictly smaller than $6$.
[/example]
This example shows subcubes outperform Hamming balls for the edge isoperimetric problem. More generally, a $k$-dimensional subcube in $Q_n$ has $2^k$ vertices and edge boundary of size $2^k(n-k)$ (each vertex in the subcube has exactly $n-k$ edges leaving the subcube, one for each of the $n-k$ coordinates not in the subcube). For the vertex isoperimetric problem the ball shape was optimal; for the edge isoperimetric problem the cube shape is optimal.
The correct order for the edge isoperimetric inequality is the binary order, which assigns to each set $A \subseteq X$ the natural number $\sum_{i \in A} 2^{i-1}$ (treating elements as bit positions).
[definition: Binary Order]
The **binary order** on $Q_n \cong \mathcal{P}(X)$ is defined by $x < y$ if $\max(x \triangle y) \in y$. Equivalently, define $\varphi: \mathcal{P}(X) \to \mathbb{N}$ by
\begin{align*}
\varphi(x) = \sum_{i \in x} 2^i,
\end{align*}
and set $x < y$ iff $\varphi(x) < \varphi(y)$.
[/definition]
The binary order avoids large elements: the first few sets in binary order are
\begin{align*}
\varnothing,\; \{1\},\; \{2\},\; \{1,2\},\; \{3\},\; \{1,3\},\; \{2,3\},\; \{1,2,3\},\; \ldots
\end{align*}
The initial segment of size $2^k$ in binary order is exactly the $k$-dimensional subcube $\{A \subseteq \{k+1, k+2, \ldots, n\}^c : A \subseteq \{1,\ldots,k\}\}$. More precisely, the first $2^k$ elements are all subsets of $\{1, \ldots, k\}$, forming a $k$-dimensional subcube.
The edge isoperimetric inequality for the cube is proved by the same compression strategy: define binary compressions $D_i$ that replace the $i$-sections of $A$ with binary initial segments of the same sizes, show these do not increase $|\partial_e A|$, argue that repeated compressions terminate at a fully compressed set, and classify fully compressed sets.
[quotetheorem:2602]
[citeproof:2602]
[remark: Comparing the Two Inequalities]
The vertex and edge isoperimetric inequalities on the cube are parallel in structure but give different answers. For the vertex problem, Hamming balls (simplicial initial segments) are optimal: they minimise the number of vertices just outside the set. For the edge problem, subcubes (binary initial segments) are optimal: they minimise the number of edges crossing the boundary. The proofs share the same compression skeleton — define section-wise compressions, show they do not increase the quantity of interest, and classify the fixed points — but the precise compressions and orders differ. This compression framework is one of the most powerful tools in extremal combinatorics, and its versatility is evident from its success in both problems.
[/remark]
[explanation: Sharpness, Uniqueness, and Extensions]
Sharpness: the inequality $|\partial_e C| \leq |\partial_e A|$ is sharp whenever $|A| = 2^k$ for some $k$, since the $k$-dimensional subcube achieves equality. For other sizes, the binary initial segment is again the unique minimiser.
Uniqueness: the argument shows that any fully compressed set is either the binary initial segment or the single exception $\tilde{B}$ of size $2^{n-1}$, and direct computation shows $\tilde{B}$ has a larger edge boundary than the binary initial segment of the same size. So the minimiser is unique at every size.
Extensions to other graph products: the edge isoperimetric inequality extends naturally to the Cartesian product of paths (the grid graph $P_{n_1} \times P_{n_2} \times \cdots \times P_{n_k}$) and more generally to Cartesian products of arbitrary graphs. The binary order generalises to a lexicographic product order, and the compression strategy applies verbatim. This makes the edge isoperimetric framework genuinely flexible in a way that the vertex isoperimetric framework is not, since the vertex problem on product graphs is significantly harder to analyse.
[/explanation]
Isoperimetric inequalities reveal how geometry constrains the growth of sets. The focus now shifts to an algebraic setting: when we add two finite sets element-by-element in an abelian group, what can we say about the size of their sumset?
# 5. Sum Sets
This chapter investigates what happens when you add two finite subsets of an abelian group element by element. The central question is: how large or small can the resulting sum set be? In $\mathbb{R}$ a clean lower bound follows from the ordering, but in a finite group there is no order, and new phenomena arise. The Cauchy–Davenport theorem gives the sharp lower bound for prime-order groups, and the chapter closes with the Erdős–Ginzburg–Ziv theorem, which answers a related zero-sum question using Cauchy–Davenport as its key tool.
## Sum Sets in Ordered Groups
The central question of additive combinatorics is: how small can $A + B$ be? Given finite sets $A$ and $B$ in an abelian group, we can always pair up every element of $A$ with every element of $B$ to produce $|A| \cdot |B|$ sums, but many of those sums will coincide. The interesting lower bound question asks: at minimum, how many distinct sums must appear? In an ordered group like $\mathbb{R}$, the order gives a clean answer.
Let $G$ be an abelian group and $A, B \subseteq G$ finite non-empty subsets. The sum set is the natural starting point for additive combinatorics.
[definition: Sum Set]
Let $G$ be an abelian group and let $A, B \subseteq G$. The **sum set** of $A$ and $B$ is
\begin{align*}
A + B = \{a + b : a \in A,\, b \in B\}.
\end{align*}
[/definition]
The upper bound $|A + B| \leq |A| \cdot |B|$ is immediate since there are at most $|A| \cdot |B|$ pairs $(a, b)$, and this bound is achieved when all pairwise sums are distinct. The more interesting question is how small $|A + B|$ can be.
[quotetheorem:2603]
[citeproof:2603]
The ordering of $\mathbb{R}$ is essential here: it lets us exhibit a strictly increasing chain of $|A| + |B| - 1$ distinct sums. The bound $|A| + |B| - 1$ is sharp: take $A = \{0, 1, \ldots, n-1\}$ and $B = \{0, 1, \ldots, m-1\}$, where $A + B = \{0, 1, \ldots, n + m - 2\}$ is again an arithmetic progression of exactly $n + m - 1$ elements. This sharpness is not a coincidence: arithmetic progressions are precisely the sets that minimize sum set size, and a theme that sits at the heart of Freiman's theorem in additive combinatorics. In a group without order the argument collapses: in $(\mathbb{Z}/6\mathbb{Z}, +)$, take $A = B = \{0, 2, 4\}$; then $|A| = |B| = 3$ but $A + B = \{0, 2, 4\} = A$, so $|A + B| = 3 < 3 + 3 - 1 = 5$. The culprit is the subgroup $\{0, 2, 4\} \leq \mathbb{Z}/6\mathbb{Z}$.
[example: Sum Set Computation in $\mathbb{Z}_7$]
Let $A = \{0, 1, 3\}$ and $B = \{0, 2, 5\}$ in $\mathbb{Z}_7$. We compute $A + B$ directly by listing all nine pairwise sums and reducing modulo 7:
\begin{align*}
0 + 0 &= 0, & 0 + 2 &= 2, & 0 + 5 &= 5, \\
1 + 0 &= 1, & 1 + 2 &= 3, & 1 + 5 &= 6, \\
3 + 0 &= 3, & 3 + 2 &= 5, & 3 + 5 &= 1.
\end{align*}
The distinct values are $\{0, 1, 2, 3, 5, 6\}$, so $|A + B| = 6$. Cauchy–Davenport predicts $|A + B| \geq \min(7, 3 + 3 - 1) = 5$, and the actual sum set exceeds this lower bound. Note that $4$ is the only element of $\mathbb{Z}_7$ not achieved: to get $4$ we would need $a + b \equiv 4$, i.e., the pairs $(b, a) \in \{(1,3), (3,1), (4,0), (0,4), (6,5), (5,6)\}$, none of which lie in $A \times B$.
[/example]
## Sum Sets in Finite Groups
The situation in a finite group $G$ is fundamentally different. The ordering argument breaks down, and the bound $|A + B| \geq |A| + |B| - 1$ cannot hold universally: if $H \leq G$ is any subgroup, then $H + H = H$, so $|H + H| = |H|$ while $|H| + |H| - 1 = 2|H| - 1 > |H|$ whenever $|H| > 1$. The obstruction is precisely the existence of subgroups.
To eliminate this obstruction we work in a group with no proper non-trivial subgroups, namely $\mathbb{Z}_p$ for $p$ prime.
[quotetheorem:2604]
[citeproof:2604]
Primality is not a technicality — it is indispensable. In $\mathbb{Z}_6$, take $A = B = \{0, 2, 4\}$; then $|A| + |B| - 1 = 5$ but $A + B = \{0, 2, 4\}$ has only 3 elements. The subgroup $\{0, 2, 4\}$ is closed under addition and swallows the sum set entirely. More broadly, whenever $A$ is a non-trivial subgroup of $\mathbb{Z}_n$ (which exists precisely when $n$ is composite), the bound $|A + A| \geq 2|A| - 1$ fails. The proper generalisation to composite moduli is Kneser's theorem: if $|A + B| < |A| + |B| - 1$ in any abelian group $G$, then there exists a non-trivial subgroup $H \leq G$ such that $A + B + H = A + B$ and $|A + B| = |A + H| + |B + H| - |H|$. Kneser's theorem thus identifies subgroups as the only obstruction to the Cauchy–Davenport bound and underpins much of modern additive combinatorics.
[remark: Sharpness of Cauchy–Davenport]
The bound is sharp. Take $A = \{0, 1, \ldots, r-1\}$ and $B = \{0, 1, \ldots, s-1\}$ in $\mathbb{Z}_p$ with $r + s - 1 \leq p$; then $A + B = \{0, 1, \ldots, r+s-2\}$ has exactly $r + s - 1$ elements. The condition $|A| + |B| \leq p + 1$ ensures the bound does not exceed $p = |\mathbb{Z}_p|$.
[/remark]
The theorem extends immediately to sums of more than two sets by iterated application.
[quotetheorem:2605]
[citeproof:2605]
The size condition $\sum_{i=1}^{k} |A_i| \leq p + k - 1$ deserves attention. When it fails — that is, when the sets are collectively large relative to $p$ — the immediate bound $|A_1 + \cdots + A_k| \geq p$ already holds, since the sum set fills all of $\mathbb{Z}_p$. The condition is therefore not a restriction on the theorem's applicability but a bookkeeping convention: the two cases ($\sum |A_i| \leq p + k - 1$ and $\sum |A_i| > p + k - 1$) together cover all situations. The iterated form is exactly the tool needed in the Erdős–Ginzburg–Ziv proof below, where $k = p$ sets of size $2$ (plus one singleton) are combined to force the sum set to cover all of $\mathbb{Z}_p$.
## Zero-Sum Problems
A natural companion to sum set questions is the zero-sum problem: given a sequence of elements in $\mathbb{Z}_n$, when must a non-empty subsequence sum to zero? This connects to Cauchy–Davenport through a remarkable theorem of Erdős, Ginzburg, and Ziv.
We first establish the simpler consecutive-sum version as a warm-up.
[quotetheorem:2606]
[citeproof:2606]
The harder question is whether we can guarantee a zero sub-sum of a **prescribed** length, not just some consecutive sub-sum. The Erdős–Ginzburg–Ziv theorem answers this with a remarkably clean result.
[quotetheorem:2607]
[citeproof:2607]
[example: EGZ Instance in $\mathbb{Z}_3$]
Take $n = 3$, so we have $2n - 1 = 5$ elements of $\mathbb{Z}_3$. Let the sequence be $a_1, \ldots, a_5 = 0, 1, 2, 0, 2$. We must find a 3-element subset summing to $0 \pmod{3}$.
Following the prime case of the proof with $p = 3$: sort the sequence to get $0, 0, 1, 2, 2$. No three consecutive equal terms appear. Form the pairs
\begin{align*}
A_1 = \{a_1, a_3\} = \{0, 1\}, \quad A_2 = \{a_2, a_4\} = \{0, 2\}, \quad A_3 = \{a_5\} = \{2\}.
\end{align*}
The iterated Cauchy–Davenport bound gives $|A_1 + A_2 + A_3| \geq (2 + 2 + 1) - 3 + 1 = 3$, so $A_1 + A_2 + A_3 = \mathbb{Z}_3$.
To find the zero-sum subset explicitly, we need $s_1 + s_2 + s_3 \equiv 0 \pmod{3}$ with $s_i \in A_i$. Try $s_1 = 0 \in A_1$, $s_2 = 0 \in A_2$, $s_3 = 2 \in A_3$: sum is $0 + 0 + 2 = 2 \not\equiv 0$. Try $s_1 = 1 \in A_1$, $s_2 = 2 \in A_2$, $s_3 = 2 \in A_3$: sum is $1 + 2 + 2 = 5 \equiv 2 \not\equiv 0$. Try $s_1 = 0 \in A_1$, $s_2 = 2 \in A_2$, $s_3 = 2 \in A_3$: sum is $0 + 2 + 2 = 4 \equiv 1 \not\equiv 0$. Finally, $s_1 = 1 \in A_1$, $s_2 = 0 \in A_2$, $s_3 = 2 \in A_3$: sum is $1 + 0 + 2 = 3 \equiv 0$. This corresponds to selecting $a_3 = 1$ (from $A_1$), $a_2 = 0$ (from $A_2$), $a_5 = 2$ (from $A_3$) in the original sequence. Indeed $\{a_2, a_3, a_5\} = \{1, 0, 2\}$ and $0 + 1 + 2 = 3 \equiv 0 \pmod{3}$.
[/example]
[explanation: Why the Proof Works]
The prime case and the general case each exploit a different structure. In the prime case, Cauchy–Davenport is the engine: by forming two-element sets $A_i = \{a_i, a_{i+p-1}\}$, we reduce the problem to showing that the sum set $A_1 + \cdots + A_p$ covers all of $\mathbb{Z}_p$, which the iterated theorem guarantees. The key point is that we are free to choose **one** element from each pair, and having $p$ pairs of size $2$ plus one singleton is exactly enough to saturate $\mathbb{Z}_p$.
In the composite case, the induction works by a two-level strategy: first reduce to a smaller group $\mathbb{Z}_p$ by forming blocks of size $m$ (using the hypothesis for $m$), then apply the prime case to finish. The bookkeeping with $2n - 1 = 2pm - 1$ terms is designed precisely so that both levels of induction have exactly the right number of elements to work with.
[/explanation]
Sum set inequalities expose the limitations of growth in discrete groups through careful counting arguments. The final shift takes us from discrete addition to continuous projection: what is preserved when we project a bounded set onto lower-dimensional subspaces?
# 6. Projections
This chapter marks a transition from discrete combinatorics to a continuous setting. The central question is: given a bounded open body $K \subseteq \mathbb{R}^n$, can we bound the volume $|K|$ in terms of the Lebesgue measures of its coordinate projections? We develop increasingly powerful answers — from a three-dimensional inequality proved by compression, to the general Uniform Cover Inequality, and finally to the Bollobás–Thomason Box Theorem, which subsumes all these results.
## Projections and the Three-Dimensional Inequality
Let $K \subseteq \mathbb{R}^n$ be a bounded open set. For a subset $A \subseteq [n] = \{1, \ldots, n\}$, we define the **coordinate projection** of $K$ onto the coordinates indexed by $A$ as
[definition: Coordinate Projection]
Let $K \subseteq \mathbb{R}^n$ be a bounded open set and $A \subseteq [n]$. The **coordinate projection** of $K$ onto $A$ is
\begin{align*}
K_A = \{(x_i)_{i \in A} : \exists\, y \in K \text{ with } y_i = x_i \text{ for all } i \in A\} \subseteq \mathbb{R}^A.
\end{align*}
We write $|K_A|$ for the Lebesgue measure of $K_A$ as a subset of $\mathbb{R}^A$.
[/definition]
A first bound comes from partitions. If $A_1 \cup \cdots \cup A_m = [n]$ is a partition, then
\begin{align*}
|K| \leq \prod_{i=1}^m |K_{A_i}|,
\end{align*}
since the map $(x_i)_{i \in A_1} \times \cdots \times (x_i)_{i \in A_m} \mapsto (x_1, \ldots, x_n)$ surjects $K_{A_1} \times \cdots \times K_{A_m}$ onto $K$. This is entirely elementary and follows from the observation that every point of $K$ is determined by its restrictions to the $A_j$.
The more interesting question is what can be said when the sets $A_i$ overlap. For instance, in $\mathbb{R}^3$, can we bound $|K|$ from the three pairwise projections $K_{12}$, $K_{13}$, and $K_{23}$? It is not possible to bound $|K|$ from just two of the three pairwise projections — for example, the box $\left(0, \frac{1}{n}\right) \times (0, n) \times (0, n)$ has fixed $|K_{12}|$ and $|K_{13}|$ as $n \to \infty$ while $|K| \to \infty$.
With all three pairwise projections in hand, however, we do get a bound.
[quotetheorem:2608]
[citeproof:2608]
The exponent $2$ in the inequality $|K|^2 \leq |K_{12}| \cdot |K_{13}| \cdot |K_{23}|$ is sharp: the unit cube $K = (0,1)^3$ gives equality, since $|K| = 1$ and each pairwise projection has measure $1$. More generally, equality holds for boxes. The three-dimensional inequality is sometimes called the **Loomis–Whitney inequality** in dimension $3$; Loomis and Whitney proved the $n$-dimensional analogue that $|K|^{n-1} \leq \prod_{i=1}^n |K_{[n] \setminus \{i\}}|$, bounding volume by the measures of the $(n-1)$-dimensional projections onto coordinate hyperplanes. Our inequality corresponds to pairwise projections in $\mathbb{R}^3$, which is a different and in some ways harder configuration, since the projections overlap rather than partition the coordinates.
## Uniform Covers and the Uniform Cover Inequality
The three-dimensional inequality only handles the specific configuration of pairwise projections in $\mathbb{R}^3$. Moving to higher dimensions, or to overlapping projection families where each coordinate is covered more than twice, we need a more flexible language to describe which projections appear and how many times each coordinate is represented. The right concept is that of a **uniform cover** of $[n]$.
[definition: Uniform Cover]
A family $A_1, \ldots, A_r \subseteq [n]$ (with repetitions allowed) is called a **cover** of $[n]$ if $\bigcup_{i=1}^r A_i = [n]$. The family is a **uniform $k$-cover** if every element $i \in [n]$ belongs to exactly $k$ of the sets $A_1, \ldots, A_r$.
[/definition]
[example: Uniform Covers of $[3]$]
With $n = 3$:
- The singletons $\{1\}, \{2\}, \{3\}$ form a uniform $1$-cover.
- $\{1\}, \{2,3\}$ is also a uniform $1$-cover, since $1$ appears in $\{1\}$ and each of $2, 3$ appears in $\{2,3\}$.
- $\{1,2\}, \{1,3\}, \{2,3\}$ form a uniform $2$-cover: element $1$ appears in $\{1,2\}$ and $\{1,3\}$, element $2$ appears in $\{1,2\}$ and $\{2,3\}$, and element $3$ appears in $\{1,3\}$ and $\{2,3\}$.
- $\{1,2\}$ and $\{2,3\}$ do **not** form a uniform cover of $[3]$, since element $1$ appears once but element $2$ appears twice.
- With repetitions: $\{1\}, \{1\}, \{2,3\}, \{2\}, \{3\}$ is a uniform $2$-cover.
[/example]
The uniform $2$-cover $\{12\}, \{13\}, \{23\}$ of $[3]$ is precisely the setting of the three-dimensional inequality above: the theorem says $|K|^2 \leq |K_{12}| \cdot |K_{13}| \cdot |K_{23}|$. The general uniform cover inequality extends this to all dimensions and all uniform covers.
[example: Applying the Three-Dimensional Inequality to a Cylinder]
Let $K = \{(x_1, x_2, x_3) : x_1^2 + x_2^2 < r^2,\, 0 < x_3 < h\}$ be the open solid cylinder of radius $r > 0$ and height $h > 0$.
We compute the three pairwise projections. The projection $K_{12}$ is the open disk $\{(x_1, x_2) : x_1^2 + x_2^2 < r^2\}$, so $|K_{12}| = \pi r^2$. The projection $K_{13}$ is the open rectangle $\{(x_1, x_3) : |x_1| < r,\, 0 < x_3 < h\}$, so $|K_{13}| = 2rh$. Similarly, $K_{23} = \{(x_2, x_3) : |x_2| < r,\, 0 < x_3 < h\}$ gives $|K_{23}| = 2rh$.
The theorem asserts $|K|^2 \leq |K_{12}| \cdot |K_{13}| \cdot |K_{23}|$. The left-hand side is $(\pi r^2 h)^2$. The right-hand side is $\pi r^2 \cdot 2rh \cdot 2rh = 4\pi r^4 h^2$. Indeed,
\begin{align*}
(\pi r^2 h)^2 = \pi^2 r^4 h^2 \leq 4\pi r^4 h^2,
\end{align*}
since $\pi^2 < 4\pi$, i.e. $\pi < 4$. The bound is not tight for the cylinder: the ratio of the right-hand side to the left-hand side is $4/\pi > 1$. Equality would require the cylinder's cross-section to be a square rather than a disk, which matches the observation that boxes saturate the inequality.
[/example]
[quotetheorem:2609]
[citeproof:2609]
The uniformity condition — that every element of $[n]$ appears in exactly $k$ sets — is essential to the argument in two ways. Structurally, it ensures the induction works cleanly: peeling off the last coordinate splits $\mathcal{A}$ into $\mathcal{A}_-$ and $\mathcal{A}_+$ in a way that leaves a uniform $k$-cover of $[n-1]$. Metrically, it is what allows Hölder's inequality to be applied with exponent $k$ across the $k$ sets in $\mathcal{A}_+$. The bound is tight: for any box $L = \prod_{i=1}^n (0, \ell_i)$ and any uniform $k$-cover, equality holds because the product $\prod_{A \in \mathcal{A}} |L_A| = \prod_{A \in \mathcal{A}} \prod_{i \in A} \ell_i = \prod_{i=1}^n \ell_i^k = |L|^k$ telescopes exactly. The question now is whether the inequality can be extended beyond uniform covers, and what the most powerful reformulation looks like.
## Irreducible Covers and the Box Theorem
The Uniform Cover Inequality is already powerful, but the Bollobás–Thomason Box Theorem is stronger: for every body $K$, there exists a box with the same volume whose projections are no larger than those of $K$. The Uniform Cover Inequality then follows as an immediate corollary.
[definition: Reducible and Irreducible Covers]
A uniform $k$-cover is **reducible** if it can be written as the disjoint union (as multisets) of two uniform covers. Otherwise, it is **irreducible**.
[/definition]
The key structural fact, which makes the Box Theorem proof work, is the following finiteness result.
[quotetheorem:2610]
[citeproof:2610]
Finiteness is not merely a curiosity — it is precisely the ingredient that makes the Box Theorem proof possible. The proof of the Box Theorem constructs a minimal array of non-negative reals satisfying one constraint per irreducible cover. If there were infinitely many irreducible covers, the minimisation argument would break down, since an infinite system of inequality constraints need not have a minimal solution. Finiteness ensures the constraint system is finite, so the minimisation is just an optimisation over a closed bounded region in $\mathbb{R}^{2^n}$.
[quotetheorem:2611]
[citeproof:2611]
## Extensions to Non-Uniform Covers
The Uniform Cover Inequality requires that each element of $[n]$ belongs to exactly $k$ of the sets in the cover. What if we only know that each element belongs to at least $k$ sets — a weaker, non-uniform condition? For general bodies, the inequality can fail: a non-uniform cover allows some coordinates to appear more often than others, and the projection measures may not compensate. However, for a special class of bodies — those built from translates of the unit cube — the inequality extends. The unit-cube hypothesis is not merely technical; it enforces a monotonicity property on projection measures that is needed to pass from a $k$-cover to a uniform $k$-cover without losing control of the right-hand side.
[definition: $k$-Cover]
A family $\mathcal{A}$ of subsets of $[n]$ is a **$k$-cover** if every element $i \in [n]$ belongs to at least $k$ members of $\mathcal{A}$.
[/definition]
[quotetheorem:2612]
[citeproof:2612]
The hypothesis that $K$ is a union of unit cubes is essential for the monotonicity $|K_B| \leq |K_A|$ when $B \subseteq A$. For a general bounded open body this monotonicity can fail: projecting onto fewer coordinates need not produce a smaller measure, because the projection onto $B$ may compress dimensions that were well-spread in $K$. For example, a thin diagonal strip in $\mathbb{R}^2$ has projections onto each coordinate axis that are much larger than its area, but the projection onto the full set $\{1,2\}$ is just the strip itself. Without the unit-cube structure, passing from a $k$-cover to a uniform $k$-cover by replacing sets with subsets would potentially increase the right-hand side, invalidating the argument.
[remark: Relationship Between Results]
The logical structure of the chapter is worth noting. The three-dimensional inequality, the Uniform Cover Inequality, and the Box Theorem form a chain: each subsumes the previous, yet each was proved using tools of increasing sophistication. The Box Theorem provides the most powerful statement, asserting that volume is controlled entirely by the geometry of the coordinate projections.
[/remark]
Projections show how volumes decrease under orthogonal projection to lower dimensions. The Combinatorial Nullstellensatz now applies polynomial methods to attack discrete problems, using algebraic structure rather than geometric intuition.
# 7. Alon's Combinatorial Nullstellensatz
This chapter introduces Alon's Combinatorial Nullstellensatz, a polynomial method with surprisingly wide combinatorial reach. The theorem itself is elementary to state — it is a non-vanishing criterion for a multivariate polynomial over a finite grid — but its consequences are striking: with it one can prove the Cauchy–Davenport theorem, the Erdős–Heilbronn conjecture, and several results about zero-sum sequences in just a few lines each. The chapter develops the necessary polynomial lemmas, proves the main theorem, and works through five distinct applications.
## The Polynomial Framework
In one variable, root-counting is straightforward: a nonzero polynomial of degree $n$ has at most $n$ roots, and a polynomial vanishing at $n$ points is divisible by their minimal polynomial. In several variables the situation is more delicate — a nonzero multivariate polynomial can vanish on an infinite set, and the naive degree bound fails entirely. To recover controlled root-counting over finite grids, one needs to track degrees variable by variable, not just in total. The three lemmas in this section build that framework, generalising each one-variable fact to several variables. Throughout this chapter $R$ denotes a commutative ring with identity, $\mathbb{F}$ a field, and $\mathbb{F}_q$ the unique field of order $q = p^k$ for a prime $p$. Write $R[X] = R[X_1, \ldots, X_n]$ for the polynomial ring in $n$ variables, and $\deg_{X_i} f$ for the degree of $f$ in the variable $X_i$ alone.
The starting point is the division algorithm in one variable, which extends to rings as long as the divisor is monic.
[quotetheorem:2613]
This is the usual polynomial long division; the monic hypothesis on $g$ ensures the leading coefficient is invertible (equal to $1$), so the algorithm runs without requiring a field.
The first lemma lifts this to several variables by dividing successively in each variable.
[quotetheorem:2614]
[citeproof:2614]
The second lemma generalises the fact that a polynomial vanishing at all points of a finite set $S \subset \mathbb{F}$ must be the zero polynomial if its degree is less than $|S|$.
[quotetheorem:2615]
[citeproof:2615]
The third lemma generalises the one-variable divisibility statement: a polynomial vanishing at every element of a finite set $S$ is divisible by $\prod_{s \in S}(X - s)$.
[quotetheorem:2616]
[citeproof:2616]
These three results together provide the algebraic scaffolding for the Nullstellensatz. To see why the degree condition matters, observe that without it the lemmas would give no contradiction: a polynomial of high enough degree can freely vanish on any finite grid while carrying high-degree monomials, since the $h_j g_j$ decomposition can absorb them. The point of requiring $\deg f = \sum d_i$ exactly is to make the top-degree monomial $X_1^{d_1} \cdots X_n^{d_n}$ too heavy for the decomposition to handle — a constraint that will be made precise in the main theorem.
## Alon's Combinatorial Nullstellensatz
[motivation]
The three lemmas show that a polynomial vanishing on a grid $S_1 \times \cdots \times S_n$ must decompose as $\sum h_i g_i$, with the total degree of each $h_i$ bounded by $\deg f - |S_i|$. The Nullstellensatz turns this into a non-vanishing criterion: if $f$ contains a monomial $X_1^{d_1} \cdots X_n^{d_n}$ with nonzero coefficient and $|S_i| = d_i + 1$, then the decomposition is impossible — no $h_j g_j$ can contribute that monomial — so $f$ cannot vanish everywhere on the grid.
[/motivation]
[quotetheorem:2617]
[citeproof:2617]
[remark: Sharpness of the Degree Condition]
The hypothesis $\deg f = \sum_{i=1}^n d_i$ is essential. It forces the designated monomial $X_1^{d_1} \cdots X_n^{d_n}$ to be a top-degree term, making it impossible for the $h_j g_j$ decomposition to reproduce it. If $f$ had higher total degree, the bound on $\deg h_j$ would relax and the argument breaks down.
[/remark]
[example: Degree Condition is Tight]
Take $n = 2$, $S_1 = S_2 = \{0, 1\}$ (so $d_1 = d_2 = 1$), and consider $f(X_1, X_2) = X_1^2 - X_1$ over any field. Then $f$ vanishes on all four points of $S_1 \times S_2$: we have $0^2 - 0 = 0$ and $1^2 - 1 = 0$, so $f(x_1, x_2) = 0$ for every $(x_1, x_2) \in \{0,1\}^2$. Yet the coefficient of $X_1^1 X_2^1$ in $f$ is zero — the monomial $X_1 X_2$ does not appear. This does not contradict the Nullstellensatz: the hypothesis of the theorem is simply not met, since $f$ contains no monomial $X_1^{d_1} X_2^{d_2} = X_1 X_2$ with nonzero coefficient. The polynomial $f = X_1(X_1 - 1) = X_1 \cdot g_1(X_1)$ where $g_1 = (X_1 - 0)(X_1 - 1)$ is the vanishing polynomial for $S_1$; in the language of the Membership Lemma, $f$ already decomposes as $h_1 g_1$ with $h_1 = X_1$. The example illustrates that the nonzero-coefficient condition is the operative hypothesis, not merely a technical assumption: a polynomial can vanish on the full grid even when its degree equals $\sum d_i$, as long as it carries no monomial of the form $X_1^{d_1} \cdots X_n^{d_n}$ with nonzero coefficient.
[/example]
## Applications in Finite Fields: Chevalley–Warning
Having proved the Nullstellensatz, we now demonstrate its reach across three different areas of combinatorics. The applications share a common template — encode a combinatorial condition as polynomial vanishing, identify a distinguished monomial, apply the theorem — but each requires a different polynomial and a different coefficient computation.
The most immediate application is to systems of polynomial equations over finite fields. A key tool is the **indicator function identity** in $\mathbb{F}_q$. Since the multiplicative group $\mathbb{F}_q^\times$ has order $q - 1$, Fermat's little theorem gives for any $x \in \mathbb{F}_q$:
\begin{align*}
x^{q-1} = \begin{cases} 1 & x \neq 0, \\ 0 & x = 0. \end{cases}
\end{align*}
Consequently $1 - f(x)^{q-1}$ equals $1$ when $f(x) = 0$ and $0$ otherwise — it is the indicator function for the zero set of $f$, built entirely from polynomial operations.
[quotetheorem:2618]
[citeproof:2618]
[quotetheorem:2619]
[citeproof:2619]
These two results are collectively called the **Chevalley–Warning theorems**. Chevalley's theorem is the qualitative statement (no unique zero), while Warning's theorem is the quantitative sharpening (the zero count is divisible by $p$). Warning's theorem is proved by direct computation without the Nullstellensatz, and neither result implies the other in full generality.
## Additive Combinatorics: Cauchy–Davenport and Sumsets
The Nullstellensatz gives a short proof of the Cauchy–Davenport theorem. The idea is to encode the condition "$A + B$ is small" as the vanishing of a polynomial on $A \times B$, then derive a contradiction from the Nullstellensatz.
[quotetheorem:2604]
[citeproof:2604]
[example: Concrete Binomial Coefficient Check]
Take $A = \{0, 1, 2\}$ and $B = \{0, 1, 2, 3\} \subseteq \mathbb{Z}_{11}$. The Cauchy–Davenport bound gives $|A + B| \geq 6$. If we supposed $|A + B| = 5$ and chose $|C| = 5$, the polynomial $f(X,Y) = \prod_{c \in C}(X + Y - c)$ has degree $5$. The coefficient of $X^2 Y^3$ is $\binom{5}{2} = 10 \equiv -1 \not\equiv 0 \pmod{11}$, so the Nullstellensatz applies and $f$ cannot vanish on $A \times B$.
[/example]
[explanation: Polynomial Method vs Direct Induction]
The polynomial proof is worth comparing to the direct induction proof of Cauchy–Davenport given in Chapter 5. There, one proceeds by induction on $|A| + |B|$, using a two-step reduction: first handle the case when $A$ or $B$ is a full residue system, then reduce to smaller sets by passing to a prime $q < p$ dividing $|A + B|$. The argument is elementary but requires careful case analysis, and the induction hypothesis applies to a modified pair of sets whose structure can be hard to track.
The polynomial proof avoids all of this. Instead of constructing the sumset by induction, it encodes the assumption "$A + B$ is small" directly as the vanishing condition $f|_{A \times B} = 0$, then derives a contradiction in one shot from the Nullstellensatz. The key insight is that the polynomial $f(X,Y) = \prod_{c \in C}(X + Y - c)$ is perfectly tailored: it vanishes on $A \times B$ if and only if $A + B \subseteq C$, and its degree is exactly $|C|$, which by hypothesis equals $|A| + |B| - 2$. The monomial $X^{|A|-1} Y^{|B|-1}$ then carries a nonzero binomial coefficient, and the Nullstellensatz fires. The polynomial method does not just reprove Cauchy–Davenport — it reveals why it is true: the sumset must be large because the associated polynomial has a distinguished high-degree monomial that cannot be killed by the vanishing ideal of a small set.
[/explanation]
## Restricted Sumsets and the Erdős–Heilbronn Conjecture
Cauchy–Davenport concerns unrestricted sums $A + B$. A natural and harder variant imposes a constraint on which pairs $(a, b)$ are allowed to contribute. For $A, B \subseteq \mathbb{Z}_p$, the **restricted sumset** is
\begin{align*}
A \mathbin{\dot{+}} B = \{a + b : a \in A,\, b \in B,\, a \neq b\}.
\end{align*}
The restriction $a \neq b$ makes counting harder. For example,
\begin{align*}
[n] \mathbin{\dot{+}} [m] &= \{3, 4, \ldots, m + n\} \quad (n \neq m), \\
[n] \mathbin{\dot{+}} [n] &= \{3, 4, \ldots, 2n - 1\},
\end{align*}
showing that $|A \mathbin{\dot{+}} A|$ can be as small as $2|A| - 3$ when $|A| \geq 2$.
In 1964, Erdős and Heilbronn conjectured that this is sharp: if $2|A| \leq p + 3$, then $|A \mathbin{\dot{+}} A| \geq 2|A| - 3$. The conjecture remained open for thirty years until Dias da Silva and Hamidoune proved it in 1994, using the representation theory of $\mathrm{GL}_2(\mathbb{F}_p)$ — a highly non-elementary tool. Alon, Nathanson, and Ruzsa gave a much simpler proof in 1996 using the Nullstellensatz. We prove the asymmetric version first, since the symmetric case follows from it by a short trick.
[quotetheorem:2620]
[citeproof:2620]
[quotetheorem:2621]
[citeproof:2621]
[explanation: Significance of the Erdős–Heilbronn Result]
The Erdős–Heilbronn conjecture is significant for two reasons. First, it resolves a thirty-year-old problem whose difficulty stems from the restriction $a \neq b$: the unrestricted sumset $A + A$ has $|A + A| \geq 2|A| - 1$ by Cauchy–Davenport, but the restriction breaks the additive symmetry used in that proof, and no elementary induction argument was found for three decades. Second, the asymmetric version proved here — $|A \mathbin{\dot{+}} B| \geq |A| + |B| - 2$ when $|A| < |B|$ — is strictly stronger than the symmetric conjecture. It gives the sharp bound even when the two sets differ in size, and the symmetric case $A = B$ is merely a corollary via the splitting trick. This asymmetric formulation is due to Alon, Nathanson, and Ruzsa; earlier approaches to the symmetric conjecture did not naturally suggest this generality.
[/explanation]
## Seating Arrangements at a Circular Table
The polynomial method is not limited to additive number theory. The following result shows it can also handle combinatorial geometry problems where objects must avoid each other. Suppose $2n + 1$ people sit at a circular table with positions labelled by $\mathbb{Z}_{2n+1}$. A host occupies position $0$; the remaining $2n$ positions are for $n$ couples. The host specifies a distance $d_i \in \{1, \ldots, n\}$ for couple $i$, meaning the pair should occupy positions $\{x_i, x_i + d_i\}$ for some $x_i \in \mathbb{Z}_{2n+1}$. The question is whether such a seating always exists.
[quotetheorem:2622]
[citeproof:2622]
[example: Circular Seating with Three Couples]
Take $p = 7$, so $n = 3$ couples at a table of $7$ seats with the host at position $0 \in \mathbb{Z}_7$. Suppose the prescribed distances are $d_1 = 1$, $d_2 = 2$, $d_3 = 3$. We need three starting positions $x_1, x_2, x_3 \in \{1, 2, 3, 4, 5, 6\}$ such that the six occupied positions $x_1, x_1 + 1, x_2, x_2 + 2, x_3, x_3 + 3$ are all distinct in $\mathbb{Z}_7$.
Try $x_1 = 4, x_2 = 1, x_3 = 6$. Couple 1 occupies $\{4, 5\}$, couple 2 occupies $\{1, 3\}$, couple 3 occupies $\{6, 2\}$. The six positions are $\{4, 5, 1, 3, 6, 2\}$: these are six distinct nonzero elements of $\mathbb{Z}_7$, so the seating is valid.
To check: $4 + 1 = 5$ (mod 7), $1 + 2 = 3$ (mod 7), $6 + 3 = 9 \equiv 2$ (mod 7). The set $\{4, 5, 1, 3, 6, 2\} = \mathbb{Z}_7 \setminus \{0\}$, so the six non-host seats are exactly filled. The theorem guarantees such a seating must exist, and this example shows one explicitly for $p = 7$ and distances $(1, 2, 3)$.
[/example]
[explanation: Why Primality is Essential]
The proof relies critically on $p = 2n + 1$ being prime. The argument uses the Nullstellensatz over $\mathbb{Z}_p$, which is a field only when $p$ is prime. For composite $m = 2n + 1$, the ring $\mathbb{Z}_m$ has zero-divisors: for instance, if $m = 9$, then $3 \cdot 3 \equiv 0 \pmod{9}$. The polynomial $f$ constructed in the proof is a product of linear factors, and when those factors can be zero-divisors the argument that $f(x) = 0 \iff$ one factor vanishes breaks down — a product of nonzero elements can still be zero.
More concretely, the Nullstellensatz requires working over a field, so one needs $|S_i| = d_i + 1$ elements in a field. With $m$ composite there is no field of order $m$, so the entire strategy fails. One can ask whether the seating theorem remains true for composite $m$, but it need not: the argument provides no guarantee in that case, and explicit counterexamples exist.
Primality is also used implicitly in Wilson's theorem: the fact that $(p-1)! \equiv -1 \pmod{p}$ holds precisely for $p$ prime, and the computation of the leading coefficient of $f$ depends on this. If one tried to prescribe other distances — say allowing $d_i \in \{1, \ldots, 2n\}$ instead of $d_i \in \{1, \ldots, n\}$ — the polynomial $f$ would change, and different conditions on $p$ might be needed to ensure the required monomial has nonzero coefficient.
[/explanation]
## Latin Transversals
The polynomial method applies equally well to questions about latin squares and complete systems of representatives. Our final application addresses the following question in $\mathbb{Z}_p$. Given any two enumerations $a_1, \ldots, a_p$ and $c_1, \ldots, c_p$ of $\mathbb{Z}_p$, setting $b_i = c_i - a_i$ gives $\sum_i b_i = 0$. Is the converse true?
[quotetheorem:2623]
[citeproof:2623]
[remark: The Polynomial Method in Combinatorics]
All five applications share the same structure: encode a combinatorial property as the nonvanishing of a polynomial, build a polynomial that vanishes on the full grid if the desired combinatorial object does not exist, locate a top-degree monomial $X_1^{d_1} \cdots X_n^{d_n}$ with nonzero coefficient (typically by a binomial or multinomial coefficient calculation), and apply the Nullstellensatz to reach a contradiction. The craft lies in choosing the polynomial $f$ so that the coefficient computation works out — often a delightful combinatorial identity in its own right.
[/remark]
## References