What brings you to Androma?

Additive combinatorics studies how arithmetic structure emerges from addition in sets, groups, rings, and fields. The central objects are sumsets, doubling phenomena, additive energy, and the tension between randomness and regularity: when a set behaves like a genuine progression, when it expands rapidly under addition, and when it hides unexpectedly many additive relations. This course develops the basic vocabulary and then moves toward the major structural results that explain why sets with small additive growth must resemble highly organized objects. The chapters are arranged to build this theory step by step. They begin with first examples of sumset growth and popular sums, then develop the toolkit of Ruzsa calculus and Plünnecke theory for controlling iterated addition. From there the course turns to generalized arithmetic progressions, Freiman’s theorem, and the Balog-Szemerédi-Gowers theorem, which together identify approximate structure from additive concentration. Fourier methods provide a complementary analytic viewpoint, while the later chapters broaden the scope to sum-product phenomena, growth and expansion, and the appearance of long arithmetic progressions. The final chapters place these ideas into deeper frameworks, culminating in the Green-Tao theorem and its synthesis of combinatorial, analytic, and ergodic methods, including the correspondence principles that connect finite additive problems to dynamical systems. # Introduction Additive combinatorics asks how much algebraic structure is forced by unusually many additive coincidences. Instead of studying a set only through its elements, we study the size and shape of the sets produced from it by addition, subtraction, and sometimes multiplication. The course begins with finite subsets of abelian groups and then develops tools that convert quantitative hypotheses, such as small doubling or large additive energy, into structural conclusions. The guiding tension is that additive structure is rigid, but it can appear in disguised forms that require combinatorial, Fourier-analytic, and geometric methods to detect. These notes are written for readers who have seen undergraduate algebra, elementary number theory, real analysis, basic Fourier analysis, probability, and graph theory. Chapters 1 and 2 use finite-set counting and the group law; Chapters 3 through 10 introduce graph-theoretic refinements, inverse theorems, incidence estimates, and applications to arithmetic progressions. Throughout the course, constants matter: a statement with polynomial dependence on a doubling parameter is often much stronger than a statement with exponential dependence. ## The Central Question What can be inferred from the fact that a finite set does not expand much under addition? Let $G$ be an abelian group and let $A \subset G$ be finite. The sumset $A+A$ records all possible pairwise sums, and the comparison between $|A+A|$ and $|A|$ measures how far $A$ is from behaving like an additively closed object. [definition: Sumset] Let $G$ be an abelian group and let $A,B \subset G$ be finite subsets. The sumset of $A$ and $B$ is \begin{align*} A+B := \{a+b : a \in A,\ b \in B\}. \end{align*} [/definition] The definition is deliberately elementary, but it already separates structured sets from generic ones. If $A$ is an arithmetic progression in $\mathbb Z$, then $A+A$ has about twice as many elements as $A$. If $A$ is a random set in a large ambient group, then repeated sums usually collide rarely, so the sumset is much larger. [definition: Doubling Constant] Let $G$ be an abelian group and let $A \subset G$ be a nonempty finite subset. The doubling constant of $A$ is \begin{align*} K(A) := \frac{|A+A|}{|A|}. \end{align*} [/definition] Small doubling is the first quantitative signal of additive structure. Much of the course studies what can be concluded from a bound $K(A) \le K$, where $K$ is treated as a parameter and $|A|$ may be very large. [example: Arithmetic Progression Has Small Doubling] Let $A = \{x, x+d, \dots, x+(n-1)d\} \subset \mathbb Z$ with $d \neq 0$. Then \begin{align*} A+A = \{2x, 2x+d, \dots, 2x+2(n-1)d\}, \end{align*} so $|A+A| = 2n-1$ and $K(A)=2-1/n$. This example shows that one-dimensional additive patterns have bounded doubling independent of their length. [/example] The converse direction is much deeper. A finite set with small doubling need not be an arithmetic progression, but the philosophy of Freiman theory says that it must be controlled by a higher-dimensional progression or by an appropriate coset-progression analogue. ## The Course Viewpoint How do we turn a numerical inequality into a structural statement about a set? The course repeatedly uses the same pattern: count additive coincidences, isolate a dense or popular part of the relevant object, and then upgrade this density into algebraic control. This viewpoint explains why the subject lies between number theory, combinatorics, harmonic analysis, and geometry. [definition: Difference Set] Let $G$ be an abelian group and let $A,B \subset G$ be finite subsets. The difference set of $A$ and $B$ is \begin{align*} A-B := \{a-b : a \in A,\ b \in B\}. \end{align*} [/definition] Difference sets are often more stable than sumsets because they contain zero and are symmetric when $A=B$. They allow the Ruzsa distance framework, covering arguments, and the comparison of several sets through inequalities involving $A-B$, $A+C$, and $B+C$. Set size alone forgets how often different pairs produce the same sum. To connect growth estimates with counting arguments, we need a statistic that measures the multiplicity of additive coincidences rather than only the number of resulting sums. [definition: Additive Energy] Let $G$ be an abelian group and let $A,B \subset G$ be finite subsets. The additive energy of $A$ and $B$ is \begin{align*} E^+(A,B) := |\{(a_1,a_2,b_1,b_2) \in A^2 \times B^2 : a_1+b_1=a_2+b_2\}|. \end{align*} When $A=B$, write $E^+(A):=E^+(A,A)$. [/definition] Energy counts how many ways sums can collide. Small doubling forces many collisions by the pigeonhole principle and Cauchy-Schwarz, while high energy can often be converted into small doubling after passing to large subsets. A natural first question is whether a small sumset must, on its own, force the energy to be large. If $A+B$ is small, then the $|A|\,|B|$ pairs $(a,b)$ are distributed among the few available sums, so some sums must be represented many times; the task is to turn this heuristic into a clean quantitative lower bound on $E^+(A,B)$ in terms of $|A|$, $|B|$, and $|A+B|$. The following theorem does exactly this with a single application of the Cauchy--Schwarz inequality. [quotetheorem:4565] [citeproof:4565] The finiteness and non-emptiness hypotheses are essential here: the proof uses a finite representation function and divides by $|A+B|$, so empty sets or infinite sets require different formulations. The bound is sharp in the extreme structured case where every element of $A+B$ has the same number of representations, but it does not by itself describe the shape of $A$ or $B$. For instance, the theorem only says that small $|A+B|$ forces high energy; it does not say that high energy forces small doubling for the original set without further work. That gap is one of the reasons Chapter 2 introduces popularity arguments and Chapter 5 introduces the Balog-Szemeredi-Gowers theorem. This short proof is therefore a model for many later arguments: begin with an average statement, identify where a large contribution must lie, and then study the popular part rather than the whole object. ## Model Examples Which finite sets should be regarded as the basic structured examples? The first lectures compare arithmetic progressions, boxes in $\mathbb Z^d$, subspaces of $\mathbb F_p^n$, random sets, and geometric progressions viewed through addition. These examples set the scale for the main theorems. [definition: Generalised Arithmetic Progression] Let $G$ be an abelian group. A generalised arithmetic progression of rank $d$ is a set \begin{align*} P = \{x_0+n_1x_1+\dots+n_dx_d : 0 \le n_i < L_i\text{ for }1 \le i \le d\}, \end{align*} where $x_0,x_1,\dots,x_d \in G$ and $L_1,\dots,L_d \in \mathbb N$. [/definition] Generalised arithmetic progressions are the natural multidimensional replacement for ordinary arithmetic progressions. They are central to inverse theorems because their sumsets grow only by changing side lengths. [example: Boxes In A Lattice] Let $A=\{0,1,\dots,L-1\}^d \subset \mathbb Z^d$. Then \begin{align*} A+A=\{0,1,\dots,2L-2\}^d, \end{align*} so $|A|=L^d$ and $|A+A|=(2L-1)^d$. Hence $K(A)=(2-1/L)^d$, showing that rank contributes exponentially to the doubling constant. [/example] The previous example fixes one test case for the idea. The next example, Subspaces Over A Finite Field, changes the setting so the same mechanism can be recognized from another angle. [example: Subspaces Over A Finite Field] Let $V \le \mathbb F_p^n$ be a vector subspace. Since $V+V=V$, the doubling constant is $K(V)=1$. Cosets of subspaces have the same property after translation, making them the exact finite-field analogue of additively closed sets. [/example] Random sets provide the opposite benchmark. If $A$ is sparse inside a large finite group, most sums have few representations and $|A+A|$ is typically close to the maximum allowed by counting. This contrast explains why inverse theorems are meaningful: small doubling is a strong restriction, not a generic phenomenon. ## Main Tools And Milestones Which tools let us pass from examples to theorems with useful quantitative bounds? The course builds a toolkit in stages, beginning with sharp sumset inequalities and then moving toward inverse and robust inverse results. Each stage answers a slightly different version of the same question: how much structure is forced by additive expansion constraints? [remark: First Direct Inequalities] The early part of the course studies Cauchy-Davenport in $\mathbb Z/p\mathbb Z$ and Kneser theorem in abelian groups. These results give lower bounds for $|A+B|$ and identify the obstruction caused by periodicity. They are direct theorems: they say how large a sumset must be under given hypotheses. [/remark] Once lower bounds are understood, the next question is persistence: if one addition is controlled, how many further additions and subtractions remain controlled? [remark: Pluennecke-Ruzsa Theory] Pluennecke-Ruzsa theory controls iterated sumsets and difference sets from a small-doubling hypothesis. A typical conclusion has the form $|mA-nA| \le K^{m+n}|A|$ under appropriate assumptions. The strength is that repeated addition and subtraction remain controlled once the first doubling is controlled. [/remark] Control of iterated sumsets still does not identify the underlying shape of the set. The inverse results supply this missing structural description. [remark: Inverse Theorems] Freiman-type inverse theorems classify finite sets with small doubling up to containment in structured objects such as generalised arithmetic progressions or coset progressions. The exact formulation depends strongly on the ambient group. In $\mathbb Z$, the theorem has a progression flavour; in $\mathbb F_p^n$, the finite-field geometry becomes more visible. [/remark] In applications the input is often not exact small doubling but a large number of additive coincidences. The robust theory explains how to recover a small-doubling subset from this weaker information. [remark: Robust Structure From Energy] The Balog-Szemeredi-Gowers theorem starts not from small doubling but from many additive quadruples. It says, in a quantitative form, that large additive energy contains a large subset with small doubling. This makes energy a robust substitute for exact small-doubling information. [/remark] These milestones are not isolated results. They form a pipeline: lower bounds reveal obstructions, Pluennecke-Ruzsa theory propagates small doubling, inverse theorems describe the resulting structure, and Balog-Szemeredi-Gowers lets the theory tolerate noise. ## Additive And Multiplicative Structure Can a finite set of numbers be structured for both addition and multiplication? Sum-product theory says that, over many fields and rings, the answer is usually no unless the set is forced into a special algebraic environment. This is where additive combinatorics connects to incidence geometry and growth in groups. [definition: Product Set] Let $R$ be a ring and let $A,B \subset R$ be finite subsets. The product set of $A$ and $B$ is \begin{align*} AB := \{ab : a \in A,\ b \in B\}. \end{align*} [/definition] The product set lets us compare additive and multiplicative expansion on the same finite set. In a field, a set with both $|A+A|$ and $|AA|$ small would have to resemble both an additive progression and a multiplicative progression, and those templates are usually incompatible. [example: Geometric Progression Viewed Additively] Let $A=\{1,2,4,\dots,2^{n-1}\} \subset \mathbb Z$. Multiplicatively, $A$ is highly structured, since $AA=\{2^j:0 \le j \le 2n-2\}$ has size $2n-1$. Additively, the binary expansion viewpoint shows that most sums $2^i+2^j$ are distinct up to symmetry, so $|A+A|$ is on the order of $n^2$. [/example] The example also points to a boundary of the idea. The following remark, Incidence Geometry, records that interpretation before the construction is used again. [remark: Incidence Geometry] Incidence estimates count intersections between points and geometric objects such as lines or curves. They enter the course because additive and multiplicative equations can often be rephrased as incidence problems. This provides bounds for sum-product estimates and for expansion phenomena that are hard to see through additive energy alone. [/remark] ## Applications And Endpoints What are the main mathematical payoffs of the theory? The final part of the course uses the developed machinery to study arithmetic progressions and related configurations. Additive combinatorics is powerful because it gives flexible structural input for problems that at first look purely enumerative. [definition: Arithmetic Progression] Let $G$ be an abelian group. A $k$-term arithmetic progression in $G$ is a sequence \begin{align*} x,\ x+d,\ x+2d,\ \dots,\ x+(k-1)d, \end{align*} where $x,d \in G$. [/definition] Arithmetic progressions are both examples of small-doubling sets and configurations whose existence inside dense sets is a central problem. This dual role explains why the course ends with applications to progression-finding theorems. [remark: Progressions As Configurations] Finding a long arithmetic progression inside a set is different from showing that the whole set is progression-like. The former is a configuration problem, while the latter is an inverse problem. Additive combinatorics supplies methods that move between these perspectives by locating structured subsets or structured ambient models. [/remark] The course therefore has two complementary aims. The first is to understand finite sets with constrained additive growth. The second is to use that understanding in problems where additive configurations must be found, counted, or ruled out. ## How To Read These Notes What should the reader track while moving through the course? The most important habit is to distinguish direct results, inverse results, and robust inverse results. Direct results lower-bound expansion; inverse results classify near-extremisers; robust inverse results extract structure from noisy or averaged hypotheses. The second habit is to follow the dependence on parameters. A theorem with conclusion $K^{O(1)}|A|$ behaves very differently from one with conclusion $\exp(K^{O(1)})|A|$. These quantitative distinctions become decisive in applications. The third habit is to keep the model examples in mind. Arithmetic progressions, boxes, subspaces, random sets, and geometric progressions reappear in Chapters 1, 2, 4, and 7 as tests for definitions and theorems. A good proof in this subject usually explains both why the model examples satisfy the hypothesis and why the conclusion has the right shape for them. The introduction has emphasized the role of model examples and the need for statements that capture their additive behavior. The next chapter begins that program in earnest by asking how small a sumset can be and which familiar families already exhibit the extremal patterns. # 1. Sumsets, Doubling, and First Examples This opening chapter fixes the basic language of additive combinatorics and tests it against the examples that will guide the rest of the course. The central question is: when is the sumset $A+A$ much smaller than a generic set of that size would suggest? Arithmetic progressions, multidimensional boxes, subspaces, random sets, and geometric progressions give different answers, and the contrast between them is the first sign that small doubling is a structural condition. ## Measuring Additive Growth If $A$ and $B$ are finite subsets of an abelian group $G$, the first problem is to measure how much larger the set of pairwise sums is than the original sets. Addition may create many collisions, and additive combinatorics begins by recording those collisions in a way that retains both their algebraic and their counting content. [definition: Sumset And Difference Set] Let $G$ be an abelian group and let $A,B \subset G$ be finite subsets. The sumset and difference set are \begin{align*} A+B &= \{a+b : a \in A,\ b \in B\},\\ A-B &= \{a-b : a \in A,\ b \in B\}. \end{align*} [/definition] The notation is set-valued, not multiset-valued: repeated representations of the same group element do not create new elements of $A+B$. However, the number of representations is often as important as the set itself. Many arguments add the same set repeatedly rather than just once. To state growth questions across several addition steps, we need notation that records all sums obtained from $k$ allowed choices, including repeated choices from the same set. [definition: Iterated Sumset] Let $G$ be an abelian group, let $A \subset G$ be finite, and let $k \in \mathbb N$. The $k$-fold sumset of $A$ is \begin{align*} kA = \{a_1+\cdots+a_k : a_1,\dots,a_k \in A\}. \end{align*} [/definition] This convention allows repetitions. Thus $2A=A+A$, while $A-A$ is a different object because signs matter. The set $A+B$ records which sums occur, but it does not record whether a sum is rare or highly repeated. Energy estimates and pigeonhole arguments need this finer information, so we introduce a function that counts the number of ordered representations of each group element as a sum from $A$ and $B$. [definition: Representation Function] Let $G$ be an abelian group and let $A,B \subset G$ be finite. The representation function of $A+B$ is the function $r_{A+B}:G\to \mathbb N\cup\{0\}$ defined by \begin{align*} r_{A+B}(x)=|\{(a,b)\in A\times B : a+b=x\}|. \end{align*} [/definition] The basic identity behind later energy arguments is \begin{align*} \sum_{x\in G} r_{A+B}(x)=|A||B|. \end{align*} It says that every ordered pair contributes to exactly one sum. When $A+B$ is small, the same total mass must be concentrated on fewer values of $x$, so many sums have many representations. To compare different finite sets on a common scale, we normalize sumset size by the size of the original set. This produces a dimensionless measure of additive growth, distinguishing sets whose self-sumset is genuinely small from sets that are merely small in absolute cardinality. [definition: Doubling Constant] Let $G$ be an abelian group. The doubling constant is the map \begin{align*} K_G:\{A\subset G:A\text{ finite and nonempty}\}&\to \mathbb Q_{>0},\\ A&\mapsto \frac{|A+A|}{|A|}. \end{align*} [/definition] When the ambient group is clear, we write $K(A)$ instead of $K_G(A)$. Small doubling means $K(A)$ is bounded independently of $|A|$, or at least much smaller than the generic size scale suggested by $|A|$. The whole course develops consequences of this hypothesis and converse statements saying that small doubling forces approximate algebraic structure. [example: Interval Doubling] Let $A=\{0,1,\dots,N-1\}\subset \mathbb Z$. Then \begin{align*} A+A=\{0,1,\dots,2N-2\}, \end{align*} so $|A+A|=2N-1$ and $K(A)=2-1/N$. The point is that an interval has the smallest possible additive growth in $\mathbb Z$ among sets of comparable shape: adjacent elements fill every intermediate sum. [/example] This computation also shows which features of the set are irrelevant to doubling. The exact location and scale of an interval do not matter; what matters is the additive pattern among its elements. [remark: Translation And Dilation] In $\mathbb Z$, translating $A$ by an integer does not change $|A+A|$, and multiplying $A$ by a nonzero integer does not change $|A+A|$ either. These operations preserve all additive relations among elements of $A$. [/remark] ## First Structured Examples The next problem is to identify finite sets whose additive growth is small for a visible reason. These examples are not merely illustrations; later inverse theorems say, in different settings, that they are close to the only possibilities. [definition: Arithmetic Progression] Let $G$ be an abelian group. A finite arithmetic progression in $G$ is a set of the form \begin{align*} P=\{a+jd : 0\le j\le N-1\}, \end{align*} where $a,d\in G$ and $N\in\mathbb N$. [/definition] When the elements $a,a+d,\dots,a+(N-1)d$ are distinct, the progression has size $N$. Its sumset is another progression with length approximately doubled, so arithmetic progressions are the model one-dimensional small-doubling sets. [example: Arithmetic Progressions] Let $P=\{a+jd:0\le j\le N-1\}\subset \mathbb Z$ with $d \neq 0$. Then \begin{align*} P+P=\{2a+md:0\le m\le 2N-2\}, \end{align*} and hence $|P+P|=2N-1$. This gives $K(P)=2-1/N$, the same doubling constant as an interval after translation and dilation. [/example] The multidimensional analogue replaces an interval by a box. The growth then depends on the number of independent directions, and this dimension dependence is the first reason that small doubling does not mean one-dimensionality. To use boxes inside an arbitrary abelian group, we must separate the integer parameters from the group elements they generate. The next formal object records a base point, several additive directions, and finite ranges for the coefficients, giving the standard model for multidimensional small-doubling sets. [definition: Generalised Arithmetic Progression] Let $G$ be an abelian group. A generalised arithmetic progression of rank $d$ is a set \begin{align*} P=\{a+n_1v_1+\cdots+n_dv_d : 0\le n_i\le N_i-1\text{ for }1\le i\le d\}, \end{align*} where $a,v_1,\dots,v_d\in G$ and $N_1,\dots,N_d\in\mathbb N$. [/definition] If the displayed parametrisation is injective, then $|P|=N_1\cdots N_d$. In that case $P+P$ is parametrised by the same directions but with each side length changing from $N_i$ to $2N_i-1$. [example: Two-Dimensional Progression] Let \begin{align*} P=\{(i,j)\in \mathbb Z^2:0\le i\le M-1,\ 0\le j\le N-1\}. \end{align*} Then \begin{align*} P+P=\{(u,v)\in \mathbb Z^2:0\le u\le 2M-2,\ 0\le v\le 2N-2\}, \end{align*} so \begin{align*} |P+P|=(2M-1)(2N-1). \end{align*} For large $M,N$, the doubling constant is close to $4$, reflecting the two independent additive directions. [/example] The same coordinatewise calculation is the right way to think about higher-rank progressions. Each independent direction contributes its own factor to the growth, so the doubling constant records dimension rather than just size. [example: Boxes In Higher Rank] For a box $P=\prod_{i=1}^d\{0,1,\dots,N_i-1\}\subset \mathbb Z^d$, direct coordinatewise addition gives \begin{align*} |P+P|=\prod_{i=1}^d(2N_i-1). \end{align*} If all side lengths are large, $K(P)$ is close to $2^d$. Thus bounded rank progressions have bounded doubling, but the bound records the rank. [/example] Subspaces over finite fields are an even more rigid example. They have no growth at all because they are already closed under addition. [example: Subspaces Over Finite Fields] Let $V\le \mathbb F_p^n$ be a vector subspace. Then $V+V=V$, so $K(V)=1$. More generally, if $A=x+V$ is a coset, then $A+A=2x+V$ and $|A+A|=|A|$. [/example] These examples show that the smallest possible doubling in a torsion group may be $1$, unlike in $\mathbb Z$, where a finite nonempty set satisfies $|A+A|\ge 2|A|-1$. The difference comes from the presence of finite subgroups. ## Random And Multiplicative Examples The next problem is to understand what small doubling is not. A random set usually has very large doubling, while a geometric progression is multiplicatively structured but additively unstructured. [example: Random Sets In A Cyclic Group] Let $G=\mathbb Z/N\mathbb Z$ and choose a random subset $A\subset G$ by including each element independently with probability $\delta$. The expected size is $\delta N$. For a fixed $x\in G$, the probability that $x\notin A+A$ is approximately $(1-\delta^2)^N$ when dependencies are ignored at first order, so for moderately dense $A$ the sumset is typically close to all of $G$. Thus $|A+A|$ is usually governed by ambient saturation rather than by additive structure. [/example] For sparse random subsets of a large ambient group, most ordered pairs have distinct sums. In that regime $|A+A|$ is close to $|A|^2/2$, which is the opposite of small doubling. [example: Geometric Progressions Viewed Additively] Let $A=\{1,2,4,\dots,2^{N-1}\}\subset \mathbb Z$. If $0\le i\le j\le N-1$, then $2^i+2^j$ determines the pair $\{i,j\}$ by binary expansion. Hence the unordered sums are all distinct, and \begin{align*} |A+A|=\frac{N(N+1)}{2}. \end{align*} Although $A$ is highly structured multiplicatively, its additive doubling is of order $N$. [/example] This example is the first warning that additive and multiplicative structure are different. Later sum-product estimates make this tension quantitative: over fields, a set cannot usually have both $A+A$ and $A\cdot A$ small unless forced by an algebraic obstruction. ## Lower Bounds In Prime Cyclic Groups A fundamental problem is to find universal lower bounds for $|A+B|$. In $\mathbb Z$, intervals show that $|A+B|\ge |A|+|B|-1$ is the right scale. In a finite cyclic group, wraparound prevents a bound larger than the size of the group. [quotetheorem:2604] [citeproof:2604/additive-combinatorics] The theorem says that prime cyclic groups behave like the integers until the sumset fills the whole group. The primality hypothesis is essential: in $\mathbb Z/6\mathbb Z$, the subgroup $H=\{0,3\}$ satisfies $H+H=H$, so $|H+H|=2$ while $2|H|-1=3$. Applying Cauchy-Davenport with $A=B$ gives \begin{align*} |A+A|\ge \min\{p,2|A|-1\}. \end{align*} Thus in $\mathbb Z/p\mathbb Z$, a set with $|A|<p/2$ cannot have doubling below approximately $2$ unless it is extremely small. The theorem is only a lower bound: it does not classify the sets for which equality or near-equality occurs. Those classification questions lead to inverse additive theory, where small sumsets are used to recover progressions, cosets, or higher-rank approximate structure. [example: Sharpness Of Cauchy Davenport] Let $A=\{0,1,\dots,m-1\}$ and $B=\{0,1,\dots,n-1\}$ in $\mathbb Z/p\mathbb Z$, with $m+n-1\le p$. Then \begin{align*} A+B=\{0,1,\dots,m+n-2\}, \end{align*} so equality holds in Cauchy-Davenport. If $m+n-1>p$, the lower bound becomes $p$, and intervals already wrap around enough to fill the whole group in many endpoint configurations. [/example] Cauchy-Davenport is special to prime cyclic groups. In composite cyclic groups, nontrivial subgroups give counterexamples to the naive lower bound: if $H\le G$ is a finite subgroup, then $H+H=H$, not a set of size $2|H|-1$. ## Stabilizers And Kneser's Theorem The final problem in this chapter is to state the correct lower bound in an arbitrary abelian group. The obstruction to Cauchy-Davenport is periodicity: a sumset may be invariant under translation by a nontrivial subgroup. [definition: Stabilizer Of A Set] Let $G$ be an abelian group and let $S\subset G$ be finite. The stabilizer, or period group, of $S$ is \begin{align*} \operatorname{Stab}(S)=\{h\in G:S+h=S\}. \end{align*} [/definition] The stabilizer is a subgroup of $G$. If $H=\operatorname{Stab}(S)$, then $S$ is a union of cosets of $H$, so $H$ measures the exact periodicity present in $S$. With periodicity now measured precisely, we can ask what the correct replacement for $|A+B|\ge |A|+|B|-1$ should be in an arbitrary abelian group. The integer bound fails exactly when $A+B$ is invariant under a nontrivial subgroup, so the right statement ought to subtract a correction governed by the stabilizer of the sumset, and ought to reduce to the classical inequality whenever that stabilizer is trivial. Kneser's theorem makes this correction precise and is sharp. [quotetheorem:4566] [citeproof:4566] This is the sharp abelian-group version of the basic sumset lower bound. The hypotheses allow arbitrary abelian groups, but finiteness and nonemptiness of $A$ and $B$ are essential because the displayed cardinalities must be finite and the stabilizer correction must be meaningful. The theorem does not by itself describe the internal shape of $A$ and $B$; it explains exactly how periodicity in $A+B$ changes the lower bound. In later arguments, Kneser's theorem is the bridge between numerical small doubling and structural information, because a large stabilizer forces coset structure while a trivial stabilizer recovers the integer-like lower bound. The course uses it as a structural theorem rather than proving it here; the proof belongs to the deeper theory of sumsets in groups and is usually developed through Dyson transforms, Kemperman theory, or modern isoperimetric methods. [example: Kneser For Subgroups] Let $A=B=H$, where $H$ is a finite subgroup of an abelian group $G$. Then $A+B=H$ and $\operatorname{Stab}(A+B)=H$. Kneser's lower bound reads \begin{align*} |H|\ge |H|+|H|-|H|, \end{align*} which is equality. This explains exactly why the uncorrected lower bound $|A+B|\ge |A|+|B|-1$ fails in groups with finite subgroups. [/example] The subgroup example is the cleanest obstruction, but periodicity can appear in less obvious ways after taking the sumset. The next example separates the visible periods of the summands from the actual stabilizer of their sum. [example: A Periodic Sumset] Let $G=\mathbb Z/12\mathbb Z$, let $H=\{0,4,8\}$, and set $A=\{0,1\}+H$, $B=\{0,2\}+H$. Then $A+H=A$ and $B+H=B$, while \begin{align*} A+B=\{0,1,2,3\}+H=G. \end{align*} Here $\operatorname{Stab}(A+B)=G$, so Kneser's bound gives $|A+B|\ge |G|$. The example shows that the stabilizer must be taken from the sumset itself, not guessed from the visible periods of $A$ and $B$. [/example] ## What The Examples Teach The final problem is to synthesize the examples into a working taxonomy of additive growth. Progressions and boxes have small doubling because their sums remain in the same bounded-dimensional coordinate system. Subspaces and cosets have doubling $1$ because the ambient group contains finite additive structure. Random sets and geometric progressions usually have large additive doubling because their pairwise sums collide rarely. The next chapter replaces the size of $A+A$ by a more refined statistic: the number of additive quadruples $a_1+a_2=a_3+a_4$. Representation functions already contain this information, and the transition from doubling to additive energy is the first step toward robust versions of small-doubling structure. The first chapter set up sumsets and doubling as the basic language for measuring additive structure. To go beyond the size of $A+A$, we now count how often the same sum occurs, since those repeated representations reveal a finer and more flexible notion of structure. # 2. Additive Energy and Popular Sums The first chapter measured additive structure by the size of a sumset. This chapter assumes the basic language of finite subsets of abelian groups, sumsets, difference sets, and Cauchy--Schwarz, and introduces a second measurement: the number of ways in which the same sum can be represented. Additive energy turns repeated additive coincidences into a quantitative object, while popular sums isolate the part of a sumset where those coincidences concentrate. The guiding question is: when $A+B$ is not small enough to be structurally decisive, can many repeated sums still force a dense, usable piece of additive structure? ## Counting Additive Coincidences How should we distinguish a sumset in which every element has about one representation from a sumset in which many elements have many representations? The set $A+B$ records which sums occur, but it forgets how often they occur. Additive energy keeps this multiplicity data and is the main bridge between sumset estimates, graph density, and structural extraction. For finite subsets $A,B$ of an abelian group $G$, define the representation function as the map \begin{align*} r_{A+B}:G&\to \mathbb Z_{\ge 0}, & x&\mapsto |\{(a,b) \in A \times B : a+b=x\}|. \end{align*} It is often useful to view this as a convolution of indicator functions. Let $\mathcal F_{\mathrm{fin}}(G)$ denote the set of finitely supported complex-valued functions on $G$. The convolution operator is the map \begin{align*} *: \mathcal F_{\mathrm{fin}}(G) \times \mathcal F_{\mathrm{fin}}(G)&\to \mathcal F_{\mathrm{fin}}(G), & (f,g)&\mapsto f*g, \end{align*} where \begin{align*} (f*g)(x)=\sum_{y \in G} f(y)g(x-y). \end{align*} With this notation, \begin{align*} r_{A+B}=\mathbf{1}_A * \mathbf{1}_B. \end{align*} [definition: Additive Energy] Let $G$ be an abelian group, and let $\operatorname{Fin}(G)$ denote the set of finite subsets of $G$. The additive energy functional is the map \begin{align*} E^+:\operatorname{Fin}(G)\times \operatorname{Fin}(G)&\to \mathbb Z_{\ge 0}, & (A,B)&\mapsto |\{(a_1,a_2,b_1,b_2) \in A^2 \times B^2 : a_1+b_1=a_2+b_2\}|. \end{align*} The additive energy of $A$ is $E^+(A)=E^+(A,A)$. [/definition] The equation $a_1+b_1=a_2+b_2$ says that two edges in the complete bipartite addition table land on the same sum. Thus $E^+(A,B)$ is large when the fibres of the sum map $A \times B \to A+B$ are uneven or large. Counting quadruples directly is awkward in applications, so we look for equivalent expressions for $E^+(A,B)$ that tie it to objects we can already estimate. Two reformulations are useful: one writes the energy as the $L^2$ mass of the representation function $r_{A+B}$, and the other rewrites it through the differences $a-a'$ and $b-b'$, exposing the overlaps $A\cap(A+t)$ that drive later arguments. The next theorem records both languages and shows they agree. [quotetheorem:4567] [citeproof:4567] The first formula says that energy is the $L^2$ mass of the representation function. Finiteness is needed here: for an infinite subgroup $H$, the analogous representation counts and sums need not be finite numbers. The theorem does not say that large energy forces $A$ and $B$ themselves to be subgroups or progressions; it only converts additive coincidences into two equivalent counting languages. The second language, in terms of differences, is what the Katz--Koester transform later in this chapter, and the Ruzsa calculus of Chapter 3, use to bring overlap information such as $A\cap(A+t)$ into the argument. [example: Interval Energy] Let $A=\{1,2,\dots,N\}\subset \mathbb Z$. Then $r_{A+A}(x)$ increases from $1$ to $N$ and then decreases from $N-1$ to $1$. Therefore \begin{align*} E^+(A)=\sum_{m=1}^{N}m^2+\sum_{m=1}^{N-1}m^2=\frac{2N^3+N}{3}. \end{align*} This example shows that an interval has energy of order $|A|^3$, which is the largest possible scale up to constants for a set with $|A|$ elements in a torsion-free group. [/example] ## Energy and the Size of the Sumset What lower bound on energy is forced just by knowing that $A+B$ is small? Since the total number of representations is fixed, \begin{align*} \sum_{x \in G} r_{A+B}(x)=|A||B|, \end{align*} a small support for $r_{A+B}$ forces its second moment to be large. This is the basic energy-sumset inequality. [quotetheorem:4568] [citeproof:4568] The inequality should be read as a compression principle: if many ordered pairs $(a,b)$ land in comparatively few sums, then many pairs of ordered pairs collide. The nonempty hypothesis prevents division by $|A+B|=0$, and Cauchy--Schwarz is sharp only when the representation counts are evenly distributed on their support, as happens for finite subgroups. The converse is not a direct statement about the whole sumset: high energy may be concentrated on a useful part of $A+B$ rather than spread across all sums. This limitation is why popular sums become important. [remark: Normalised Energy] If $|A|=|B|=N$, it is common to write $E^+(A,B)=\alpha N^3$ with $0\le \alpha\le 1$ in many ambient groups where the natural maximum scale is $N^3$. The parameter $\alpha$ measures the density of additive coincidences relative to a maximally structured configuration. Small doubling gives $\alpha \ge 1/K$ when $|A+B|\le KN$, after adjusting for unequal sizes in the general case. [/remark] The remark clarifies how the previous construction should be read. The next example, Random Set Energy, returns to a concrete case where that interpretation can be checked directly. [example: Random Set Energy] Let $A$ be a random subset of $\{1,\dots,M\}$ obtained by keeping each element independently with probability $p$, and suppose $N=pM$ is large. Most sums have about $p^2M$ possible representations in the middle of the interval, so the expected energy is on the order of $p^4M^3+N^2$. In terms of $N$, this is about $N^4/M+N^2$. When $N$ is much smaller than $M$, the collision count is far below the interval scale $N^3$, reflecting the absence of strong additive structure. [/example] The previous example fixes one test case for the idea. The next example, Subgroup Energy, changes the setting so the same mechanism can be recognized from another angle. [example: Subgroup Energy] Let $H\le \mathbb F_p^n$ be a subgroup, viewed as an additive subspace. Since $H+H=H$, every $x\in H$ has exactly $|H|$ representations as $x=h_1+h_2$. Hence \begin{align*} E^+(H)=\sum_{x\in H}|H|^2=|H|^3. \end{align*} Subgroups are exact models of maximal additive energy: the sumset has no expansion, and every sum is equally popular. [/example] ## Popular Sums If the energy is large, must many sums be popular, or could the energy be carried by a small exceptional set of sums? Popular sums answer this by passing from a weighted object, the representation function, to a dense graph inside $A\times B$. This viewpoint is central in the Balog--Szemerédi--Gowers theorem of Chapter 5 and in arguments where we keep only the edges that participate in many additive coincidences. [definition: Popular Sum Set] Let $A,B$ be finite subsets of an abelian group $G$, and let $\tau>0$. The set of $\tau$-popular sums is \begin{align*} P_\tau(A,B)=\{x\in A+B:r_{A+B}(x)\ge \tau\}. \end{align*} [/definition] After defining $P_\tau(A,B)$, we often choose $\tau$ as a constant multiple of the average representation count $|A||B|/|A+B|$ or as a dyadic level on which the energy has substantial mass. The difficulty with energy is that it is spread across all multiplicities at once, from sums that occur a single time to sums hit nearly $\min\{|A|,|B|\}$ times. To exploit it we would like to isolate one scale $\tau$ on which the representation function is essentially constant, so that the popular sums at that level behave like a regular bipartite graph rather than a weighted one. The question is whether such a level must always carry a fixed proportion of the energy. The following dyadic pigeonhole lemma guarantees that it does, at the cost of only a logarithmic factor. [quotetheorem:4569] [citeproof:4569] The dyadic lemma lets us replace a high-energy representation function by a nearly regular collection of popular sums. The finiteness of $A$ and $B$ is needed because the relevant representation counts lie in the finite range $1\le r_{A+B}(x)\le \min\{|A|,|B|\}$. The logarithmic loss cannot be ignored in iterative arguments, and the lemma does not identify a canonical level; it only guarantees that at least one level carries substantial energy. Once a popular level has been chosen, it is useful to forget the weights and remember only which pairs land in that level. This turns the incidence relation $a+b=x$ into a bipartite graph, allowing graph-density language to express how many additive pairs are being retained. [definition: Sumset Graph] Let $A,B$ be finite subsets of an abelian group $G$, and let $P\subset A+B$. The sumset graph associated to $P$ is the bipartite graph $\Gamma_P\subset A\times B$ with edge set \begin{align*} E(\Gamma_P)=\{(a,b)\in A\times B:a+b\in P\}. \end{align*} [/definition] The graph $\Gamma_P$ packages popular sums as edges rather than as elements of $G$. If $P$ is a popular level, then the number of edges is approximately $|P|\tau$, while the number of additive quadruples supported on that level is approximately $|P|\tau^2$. ## The Katz--Koester Transform How can a large overlap such as $A\cap(A+t)$ be converted into containment information about sumsets and difference sets? The Katz--Koester transform is a simple but powerful device: instead of only recording that two translates overlap, it records that the overlap has sumsets lying inside two different ambient sumsets at once. [quotetheorem:4570] [citeproof:4570] The transform turns popular differences into structured intersections of sumsets. The assumption $t\in A-B$ ensures that $A_t$ is the fibre of an actual overlap; if $t\notin A-B$, the set $A_t$ is empty and the containment carries no information. The result is only a containment, not an equality: even for intervals, the intersection $(A+A)\cap(A+A+t)$ can contain sums not produced by $(A\cap(A+t))+A$ with the chosen witnesses. Later, this is used to show that if many differences $t$ have large $A\cap(A+t)$, then many translates of $A+A$ overlap substantially, which is a key input in energy increment and covering arguments. [example: Katz Koester for an Interval] Let $A=\{1,\dots,N\}\subset\mathbb Z$ and $t$ satisfy $0\le t<N$. Then $A\cap(A+t)=\{t+1,\dots,N\}$ has size $N-t$. Adding $A$ gives an interval of length about $2N-t$, and it lies inside the overlap of $A+A$ with $A+A+t$. This example shows how a large translate intersection of $A$ forces a large translate intersection of the sumset. [/example] ## Dense Bipartite Graphs and Dependent Random Choice Once popular sums have been converted into a dense bipartite graph, what graph-theoretic mechanism extracts a subset with many shared neighbours? Dependent random choice gives such a mechanism. In additive combinatorics, shared neighbours translate into many pairs having controlled sums or differences, and this is the graph-theoretic core behind several structure-from-energy arguments. [quotetheorem:4571] [citeproof:4571] The value of the lemma is not the exact constants, but the conversion of edge density into common-neighbour richness. The assumptions that both vertex classes are finite and that the edge density is positive are essential for the random sampling expectations; a sparse matching, for instance, has no large subset whose pairs have many common neighbours. The theorem does not say that every pair in $X'$ is good unless the threshold and parameters make the exceptional term small. In an additive graph, common neighbours of $a_1,a_2\in A$ are elements $b\in B$ for which both $a_1+b$ and $a_2+b$ lie in the chosen popular sum set. Thus many pairs in $A'$ inherit additive constraints from the same collection of sums. [explanation: Energy Increment Viewpoint] Large energy says that many pairs of edges in $A\times B$ share the same sum. Popular sums select a part of the graph where this collision count is not spread too thinly. Dependent random choice then extracts a subset of vertices for which many pairs have many common witnesses, and the Katz--Koester transform converts those witnesses into containment relations among sumsets. This is called an energy increment viewpoint because the argument repeatedly replaces a diffuse high-energy statement by a denser statement on a better organised object. The new object might be a popular sum level, a dense subgraph, or a subset with many shared neighbours. Each replacement loses constants or logarithms, but gains a form of structure that can be used by later inverse theorems. [/explanation] ## Comparing the Model Examples What is the correct upper scale for additive energy, and how should we interpret examples that nearly attain it? Without an upper benchmark, the lower bound from small doubling has no natural meaning: saying that $E^+(A)$ is large only becomes informative once we know how large it could be. The three basic examples from this chapter sit at different points on the energy spectrum. Intervals and subgroups have energy of order $|A|^3$, but for different reasons: intervals have linearly varying representation counts, while subgroups have perfectly flat representation counts on $A+A$. Random sets in a much larger ambient interval have much smaller energy, because most additive coincidences are accidental rather than structurally forced. [quotetheorem:4572] [citeproof:4572] This upper bound shows why the interval and subgroup examples are extremal up to constants. Finiteness is essential, since the quadruple count is otherwise not a finite combinatorial quantity, and the cancellation in the ambient abelian group is what makes the fourth element determined by the first three. The theorem does not classify near-extremisers: cosets of finite subgroups, intervals, and other structured sets can all have energy on the $|A|^3$ scale for different reasons. The lower bound from Cauchy--Schwarz shows that small doubling pushes energy toward this extremal scale, while the popular-sums machinery begins the reverse direction: high energy can be refined into dense additive structure even when the whole sumset is not small. Additive energy shows that small doubling and many repeated sums are two faces of the same phenomenon. The next chapter turns this observation into a calculus for manipulating sumsets, difference sets, and iterated combinations of a set, giving tools that can be reused throughout the course. # 3. Ruzsa Calculus and Plünnecke Theory After Chapter 2 converted small sumsets into energy and popular-sum information, Ruzsa calculus turns one small-sumset hypothesis into many other estimates. The motivating question is quantitative: if a finite set $A$ has small doubling, how much control does this give over $A-A$, $2A-2A$, or more general sets $mA-nA$? This chapter develops the basic metric-like inequalities of Ruzsa, then uses Pluennecke theory to obtain the sharp exponential dependence on the number of summands. ## Comparing Different Sumsets Suppose $A$ and $B$ are finite non-empty subsets of an abelian group $G$. If $|A+B|$ is small, should $A$ and $B$ be thought of as close to each other? Ruzsa's answer is to compare the size of $A-B$ with the geometric mean of $|A|$ and $|B|$. [definition: Ruzsa Distance] Let $G$ be an abelian group, and let $A,B \subset G$ be finite non-empty sets. The Ruzsa distance between $A$ and $B$ is \begin{align*} d(A,B) := \log \frac{|A-B|}{|A|^{1/2}|B|^{1/2}}. \end{align*} [/definition] The word distance is used with care: $d(A,A)$ need not be zero, since $|A-A|$ is usually larger than $|A|$. The reason this still behaves like useful geometry is that additive comparisons can be routed through intermediate sets: if $A$ is close to $B$ and $B$ is close to $C$, then $A$ should be close to $C$ up to the same logarithmic bookkeeping. This is the structural property that makes Ruzsa distance a device for chaining several small-sumset estimates into one controlled bound. [quotetheorem:4573] [citeproof:4573] The theorem is the main bookkeeping tool in this chapter. It allows a difficult set such as $A-A$ or $A+B-C$ to be bounded by stepping through intermediate sets whose pairwise sumsets are already known to be small. [example: Difference Set From Small Doubling] Let $A\subset G$ be finite non-empty and suppose $|A+A|\le K|A|$. Apply the Ruzsa triangle inequality with $(A,B,C)=(A,-A,A)$ to obtain \begin{align*} |A-A| \le \frac{|A+A|^2}{|A|} \le K^2|A|. \end{align*} This example shows that small doubling controls the difference set, although with a quadratic loss in $K$. [/example] ## Covering by Few Translates If $A+B$ is small, then $B$ cannot point in many independent additive directions relative to $A$. The covering question asks for a concrete replacement for this intuition: can most or all of $B$ be covered by few translates of $A-A$? [quotetheorem:4574] [citeproof:4574] The covering lemma is weaker than saying that $B$ is contained in a subgroup or progression, but it is much more flexible. It converts small expansion by $A$ into a covering statement by a controlled number of translates of the symmetric set $A-A$. [example: Approximate Closure Under Addition] Assume $|A+A|\le K|A|$. Applying the Ruzsa covering lemma with $B=A$ gives a set $X\subset A$ with $|X|\le K$ and \begin{align*} A \subset X+A-A. \end{align*} Adding $A$ to both sides gives $A+A\subset X+2A-A$, so repeated use of the same containment expresses iterated sumsets in terms of boundedly many translates of iterated difference sets. This is a first structural shadow of small doubling. [/example] ## Basic Ruzsa Calculus Inequalities The next problem is to control all expressions built from $A$ by adding and subtracting it several times. A useful estimate should depend polynomially on the doubling constant $K=|A+A|/|A|$ when the numbers of summands are fixed. [quotetheorem:4575] [citeproof:4575] This theorem is often enough for qualitative arguments: fixed iterated sum-difference sets have size at most a fixed power of $K$ times $|A|$. Plünnecke theory, developed below, improves the exponent and gives the canonical $K^{m+n}$ behaviour. [example: Polynomial Bounds for Fixed Iterates] Let $|A+A|\le K|A|$. Taking $m=n=2$ in the basic sum-difference bound gives \begin{align*} |2A-2A|\le K^7|A|. \end{align*} For any fixed $r$, the same theorem bounds every set formed from $r$ copies of $A$ and $-A$ by $K^{O(r)}|A|$. Thus small doubling prevents any fixed-order additive expression from having size comparable to $|A|^r$. [/example] ## Magnification Ratios and Addition Graphs Ruzsa calculus gives robust estimates, but it loses powers of $K$. Pluennecke's theory asks a sharper question: if adding $B$ to $A$ expands by a factor at most $K$, can repeated addition of $B$ be made to expand by at most $K^k$? [definition: Addition Graph] Let $A,B$ be finite subsets of an abelian group $G$. In contrast with the bipartite sumset graphs of Chapter 2, the addition graph generated by $B$ from $A$ is the layered directed graph with vertex layers \begin{align*} V_i := A+iB \qquad (i\ge 0), \end{align*} and directed edges $x\to x+b$ from $V_i$ to $V_{i+1}$ for every $x\in V_i$ and $b\in B$. [/definition] The graph records all ways of adding one element of $B$ at each step. Its expansion behaviour is measured not only on the whole bottom layer but on every subset of it. A global bound on $|A+B|/|A|$ can hide a smaller part of $A$ that expands much less, and Pluennecke's method is designed to find the least-expanding part. The right invariant therefore minimizes expansion over all nonempty subsets of the bottom layer after a fixed number of addition steps. To state that invariant, we need notation for the set reached from a chosen subset after exactly $j$ additions. The resulting ratio measures the best possible expansion after $j$ layers, rather than the expansion of the whole graph. [definition: Magnification Ratio] Let $\nabla^j(X)$ denote the set of vertices reachable from $X\subset V_0$ in $j$ steps in a layered addition graph. The $j$th magnification ratio is \begin{align*} D_j := \min_{\varnothing\neq X\subset V_0} \frac{|\nabla^j(X)|}{|X|}. \end{align*} [/definition] The definition isolates the least expanding part of $A$. Pluennecke's insight is that the sequence $D_j$ has a multiplicative convexity property. Granting these definitions, we want to convert a single hypothesis on $D_1$ into control over $D_j$ for every $j$, since $D_1$ records the expansion of $A$ under one copy of $B$ while $D_j$ governs the $j$-fold sumset. The danger is that expansion might degrade at each step, letting $D_j$ grow faster than any power of $D_1$. Pluennecke's commutativity theorem rules this out by establishing a multiplicative convexity for the magnification ratios. Concretely, we need an inequality that bounds every $D_j$ in terms of $D_1$ alone, so that a single doubling hypothesis propagates to all higher sumsets without compounding loss. The following theorem supplies exactly this: it shows the magnification ratios are log-convex in $j$, so $D_j$ can never exceed the power $D_1^{j}$ predicted by the first step. [quotetheorem:4576] [citeproof:4576] The convexity statement is the engine behind every Pluennecke-type bound in these notes: it converts a hypothesis on $D_1$, that is on the doubling $|A+B|/|A|$, into control of all higher magnification ratios with no additional loss. Because each $D_j$ is a minimum over nonempty subsets of $V_0$, the inequality bounds the worst-expanding part of $A$ rather than its average behaviour, which is precisely the robustness later iterations require. The price is that the global ratio can be dominated by a small obstinate subset, and the next statement extracts the complementary, well-expanding part explicitly. For these notes, the important consequence is the asymmetric form below. It says that even if all of $A$ does not expand optimally under repeated addition of $B$, some large enough non-empty part of $A$ does. [quotetheorem:4577] [citeproof:4577] The subset $A'$ is essential in this asymmetric statement. Without it a uniform bound over all of $A$ would simply be false, since one part of $A$ can expand far faster than the average while another part stagnates, so no single exponent fits every point. The theorem instead isolates a nonempty portion $A'$ whose higher-order expansion obeys the expected power law. The size of $A'$ is not part of this quoted statement; the point for the next step is that a carefully chosen subset with controlled growth is enough to drive the Pluennecke--Ruzsa inequality. ## Pluennecke-Ruzsa Inequality The final goal of the chapter is a clean estimate for mixed sums and differences. The sharp Pluennecke statement must be read in its good-subset form: small doubling lets us pass to a nonempty part of $A$ whose mixed sums and differences grow with the expected exponent. For the whole original set, the earlier Ruzsa-calculus bounds still give polynomial control, but generally with weaker exponents. The preceding theorem is therefore the main Pluennecke input used in these notes. It says that from $|A+A|\le K|A|$ one may extract a nonempty $A'\subset A$ such that $A'+mA-nA$ has size at most $K^{m+n}|A'|$. This is the form that survives iteration cleanly: the base set is chosen to be the part of $A$ with optimal expansion. [example: Small Doubling Controls Good-Subset Sum Difference Sets] Suppose $|A+A|\le K|A|$. The signed Pluennecke estimate gives a nonempty subset $A'\subset A$ such that \begin{align*} |A'+3A-2A|\le K^5|A'|, \qquad |A'+4A-4A|\le K^8|A'|. \end{align*} This is stronger than a bare polynomial estimate for many iterative arguments, because it keeps the optimal exponent after choosing the right base subset. If one insists on bounding $mA-nA$ for the original $A$, one falls back on Ruzsa calculus and obtains polynomial bounds in $K$ with less sharp exponents. [/example] The example also points to a boundary of the idea. The following remark, Role in the Course, records that interpretation before the construction is used again. [remark: Role in the Course] Ruzsa calculus supplies the algebraic manipulation rules for sumsets, while Pluennecke theory supplies the sharp growth estimate. Later chapters use these tools as black boxes: Balog-Szemeredi-Gowers arguments first locate a set with small doubling, and inverse theorems then apply Pluennecke-Ruzsa bounds to control the ambient additive structure. [/remark] Ruzsa calculus and Plünnecke theory show that one small-doubling hypothesis controls a whole family of related additive expressions. With those quantitative bounds in hand, the next chapter asks the inverse question: which sets must look like generalized progressions if their sumsets are small? # 4. Generalized Arithmetic Progressions and Freiman's Theorem Small doubling asks for a converse to the elementary observation that structured sets have small sumsets. Arithmetic progressions, multidimensional boxes, and finite-field subspaces all satisfy $|A+A| \le K|A|$ for moderate $K$; this chapter studies how much of that structure is forced by the inequality itself. The guiding problem is inverse: given only the size of $A+A$, recover a low-complexity object that contains $A$ or a large part of $A$. The central objects are generalized arithmetic progressions, which are additive images of boxes in integer lattices. They provide a flexible language for describing finite sets that behave like bounded-dimensional chunks of abelian groups, and Freiman homomorphisms record exactly the additive relations that matter for sumset questions. ## Progressions as Models for Small Doubling What should replace an ordinary arithmetic progression when a set has several independent additive directions? A two-dimensional box such as $\{a+mx+ny:0\le m<L_1,0\le n<L_2\}$ has small doubling for the same reason an interval does: adding two points only doubles each side length. Generalized arithmetic progressions formalise this phenomenon while separating the combinatorial box from its image in the ambient group. [definition: Generalized Arithmetic Progression] Let $G$ be an abelian group. A generalized arithmetic progression, or GAP, of rank $d$ in $G$ is a set \begin{align*} P = \{x_0 + n_1x_1 + \cdots + n_dx_d : 0 \le n_i < L_i \text{ for } 1 \le i \le d\}, \end{align*} where $x_0,x_1,\dots,x_d \in G$ and $L_1,\dots,L_d \in \mathbb N$. [/definition] The parameters define a homomorphism from the integer box $B=\prod_{i=1}^d \{0,\dots,L_i-1\}$ into $G$, followed by translation by $x_0$. The rank counts the number of independent directions in the parametrisation, not necessarily the dimension of the image. Because different parameter choices may land on the same group element, the size of the parameter box and the size of the resulting set need not agree. We keep track of the parameter-box size separately, since it is the natural quantity in counting estimates before any collisions in $G$ are identified. [definition: Volume of a Progression] For a generalized arithmetic progression $P$ with side lengths $L_1,\dots,L_d$, its volume is \begin{align*} \operatorname{vol}(P)=L_1\cdots L_d. \end{align*} [/definition] The volume is the number of parameter choices before identifying collisions in the ambient group. When no collisions occur, the volume equals $|P|$; otherwise it overcounts the set. For structural theorems, this distinction matters: a progression with many collisions may look large in its parameters while representing a much smaller set. The cleanest model is therefore a progression whose parametrisation is injective, so that counting in the box is the same as counting in the ambient group. Later inverse results need to treat a progression as both a parametrised box and an actual subset of the ambient group. To make that possible, we isolate the condition under which the parameters give honest coordinates rather than multiple names for the same element. [definition: Proper Progression] A generalized arithmetic progression \begin{align*} P = \{x_0 + n_1x_1 + \cdots + n_dx_d : 0 \le n_i < L_i\} \end{align*} is proper if every element of $P$ has a unique representation with $0 \le n_i < L_i$ for all $i$. [/definition] Properness is the condition that the parametrising box embeds without folding. Non-proper progressions are still useful, but proper progressions give the cleanest counting estimates. [example: Ordinary Progressions and Boxes] In $\mathbb Z$, the set $P=\{a,a+r,\dots,a+(L-1)r\}$ is a rank $1$ GAP of volume $L$. It is proper when $r \neq 0$, and $P+P$ has size $2L-1$. In $\mathbb Z^2$, the rectangle $P=\{(m,n):0\le m<L_1,0\le n<L_2\}$ is a proper rank $2$ GAP, and a direct computation gives \begin{align*} |P+P|=(2L_1-1)(2L_2-1)<4|P|. \end{align*} This shows why rank $d$ progressions have doubling bounded in terms of $2^d$ when the side lengths are large. [/example] The rectangle example shows the ideal behaviour: the parameter box and the progression have the same size, and doubling only enlarges the box in each coordinate. The next example isolates the obstruction that must be excluded when we want counting arguments to reflect the parametrisation. [example: A Non-Proper Progression] In $\mathbb Z/6\mathbb Z$, take $P=\{n\cdot 2:0\le n<4\}$. The four parameter values give residues $0,2,4,0$, so $P=\{0,2,4\}$ has volume $4$ but cardinality $3$. The failure of properness is caused by the relation $3\cdot 2=0$ in the ambient group. [/example] This collision phenomenon explains why the next counting theorem assumes properness. Without unique parametrisation, the volume of the box is no longer the cardinality of the progression, so a box-counting proof would be measuring the wrong object. [quotetheorem:4579] [citeproof:4579] This theorem supplies the forward direction: low rank implies small doubling. Properness is used only to identify $|P|$ with the number of parameter choices; the containment of $P+P$ in the doubled box holds even for non-proper progressions, but then $|P|$ may be much smaller than the box volume. The non-proper example in $\mathbb Z/6\mathbb Z$ also shows that small doubling of a GAP does not force properness, since torsion can collapse many parameters. Freiman theory asks for partial converses, with the rank and volume controlled by the doubling constant rather than by an arbitrary parametrisation. ## Freiman Homomorphisms and Additive Relations Which features of a set are preserved when we move it into a more convenient group? For sumset problems, the relevant data are not all group operations, but additive relations of bounded length. Freiman homomorphisms isolate these finite-order additive patterns. [definition: Freiman Homomorphism] Let $A\subset G$ and $B\subset H$ be subsets of abelian groups. A map $\phi:A\to B$ is a Freiman $s$-homomorphism if whenever $a_1,\dots,a_s,a'_1,\dots,a'_s\in A$ satisfy \begin{align*} a_1+\cdots+a_s=a'_1+\cdots+a'_s, \end{align*} we have \begin{align*} \phi(a_1)+\cdots+\phi(a_s)=\phi(a'_1)+\cdots+\phi(a'_s). \end{align*} [/definition] A Freiman $2$-homomorphism preserves all additive quadruple relations $a_1+a_2=a'_1+a'_2$. This is the level needed for doubling and additive energy. A one-way preservation statement is not enough for modelling, because it may collapse distinct additive configurations or create a model that satisfies fewer relations than the original set. To transport estimates back and forth between two ambient groups, the preservation must hold in both directions, which requires a bijective map whose inverse has the same Freiman property. [definition: Freiman Isomorphism] A bijection $\phi:A\to B$ is a Freiman $s$-isomorphism if $\phi$ and $\phi^{-1}$ are both Freiman $s$-homomorphisms. [/definition] Freiman isomorphism means that the two sets have the same additive relations up to order $s$. It allows us to replace a set by a model in $\mathbb Z^d$, $\mathbb Z/N\mathbb Z$, or a finite-field vector space without changing the relevant additive combinatorics. [example: Modelling an Interval Modulo a Large Prime] Let $A=\{0,1,\dots,L-1\}\subset \mathbb Z$ and let $p>2L-2$ be prime. Reduction modulo $p$ gives a Freiman $2$-isomorphism from $A$ onto its image in $\mathbb F_p$. Indeed, any equality of two sums in $\mathbb F_p$ comes from an integer congruence between numbers in $[0,2L-2]$, and the chosen bound on $p$ prevents wraparound. [/example] The point of the bound $p>2L-2$ is that all sums of two elements of $A$ still have their integer representatives visible after reduction. This is the model situation for a Freiman isomorphism: the ambient group changes, but no additive quadruple relation is created or destroyed. This raises the general question of exactly what is preserved when a set is transported across a Freiman $s$-isomorphism. If modelling is to be useful, the cardinalities of sumsets, and more generally of iterated sums and differences up to order $s$, must be invariant under the map, so that an estimate proved in a convenient model transfers back to the original set. The next theorem confirms that a Freiman $s$-isomorphism preserves precisely these quantities, making any additive statement of complexity at most $s$ checkable in the model. [quotetheorem:4580] [citeproof:4580] This justifies changing ambient groups when the replacement is Freiman-isomorphic to the original set. Bijectivity and the inverse Freiman condition are both essential: the quotient map $\mathbb Z\to\mathbb Z/2\mathbb Z$ is a one-way homomorphism on $\{0,1,2\}$, but it collapses distinct sums such as $0+2$ and $1+1$ after reduction. Thus a one-way Freiman homomorphism can only guarantee a well-defined map of sumsets, not equality of their sizes. Many inverse theorems are proved by first modelling $A$ efficiently and then finding structure in the model. ## The Sharp Small-Doubling Threshold in the Integers What happens at the first nontrivial threshold beyond ordinary progressions? In $\mathbb Z$, the inequality $|A+A|<3|A|-4$ is strong enough to force $A$ into a one-dimensional arithmetic progression. Freiman's $3k-4$ theorem is the sharp form of this principle. [definition: Additive Diameter] For a finite nonempty set $A\subset \mathbb Z$, its additive diameter is \begin{align*} \operatorname{diam}(A)=\max A-\min A. \end{align*} [/definition] After translating, a finite set of integers may be viewed as a subset of $\{0,\dots,n\}$. If the sumset is very small, there cannot be too many gaps in this interval. The diameter lets us pose the sharp one-dimensional question: how small must $|A+A|$ be before $A$ is forced to sit inside a short arithmetic progression, and exactly how short is that progression? A single arithmetic progression already has $|A+A|=2|A|-1$, so any useful threshold must lie above this value, and the real content is to pin down both the precise constant and the exact length it forces. Freiman's $3k-4$ theorem answers both at once. [quotetheorem:4581] [citeproof:4581] The theorem is stronger than a qualitative inverse result: it gives the exact progression length forced by the sumset size. It is a containment theorem, not a classification theorem; many different subsets of the containing progression can have the same or smaller doubling behaviour. The threshold is close to sharp because examples with two separated directions can push the sumset size to about $3k$ while no single short arithmetic progression captures the geometry. This is the first place in the course where one-dimensional inverse structure begins to give way to higher-rank progressions. [example: Classifying Sets with Very Small Doubling] Let $A\subset\mathbb Z$ have $|A|=k\ge 3$ and $|A+A|<3k-4$. The theorem gives an arithmetic progression $P$ containing $A$ with length at most $|A+A|-k+1<2k-3$. Thus $A$ occupies more than half of $P$ up to a bounded endpoint correction. For instance, if $k=5$ and $|A+A|\le 10$, then $A$ lies in an arithmetic progression of length at most $6$, so after an affine change it is a five-element subset of $\{0,1,2,3,4,5\}$. [/example] This example demonstrates the strength of the theorem but also its limitation: it produces a short ambient progression, not a canonical normal form for $A$. The next remark explains why such a conclusion cannot persist once the doubling threshold is relaxed. [remark: Sharpness of the Threshold] The threshold cannot be raised much while keeping a one-dimensional conclusion. Sets resembling a two-dimensional corner, such as $\{0,1,\dots,m-1\}\cup\{N,2N,\dots,rN\}$ with separated scales, create additional directions while keeping the doubling comparable to $3|A|$. These examples explain why general Freiman theorems must allow higher-rank progressions once the doubling constant moves beyond the $3k-4$ range. [/remark] ## Freiman-Ruzsa Structure in Torsion-Free Groups If $|A+A|\le K|A|$ with $K$ fixed but not near $2$, what replaces the containing arithmetic progression? Ordinary progressions alone cannot be the answer: a dense subset of a two-dimensional box $\{mx+ny:0\le m<N,0\le n<M\}$ may have small doubling while projecting nontrivially in two independent directions. Nor can one demand that $A$ itself be a progression, since deleting a bounded proportion of points usually preserves small doubling but destroys exact box structure. The correct conclusion in torsion-free settings is containment in a generalized arithmetic progression whose rank and relative volume depend only on $K$. This is the Freiman-Ruzsa theorem over the integers and, more generally, over torsion-free abelian groups. [quotetheorem:4582] [citeproof:4582] This theorem is a qualitative inverse theorem: bounded doubling in $\mathbb Z$ is equivalent, up to constants depending on $K$, to containment in a bounded-rank progression. The nonempty hypothesis only removes the degenerate empty-set case, while the finiteness hypothesis is essential because the whole statement compares cardinalities. The conclusion does not say that $A$ is dense in every side direction of $P$, nor that the rank obtained is unique; it gives one bounded-complexity container. The constants in the classical proof are not polynomial in $K$, which motivates sharper conjectures later in the chapter. [example: Dense Subsets of a Two-Dimensional Progression] Let $P=\{nx+\ell y:0\le n<N,0\le \ell<M\}\subset\mathbb Z$ be a proper rank $2$ GAP, and suppose $A\subset P$ has density at least $1/10$. Then a direct containment gives $A+A\subset P+P$, so \begin{align*} |A+A|\le |P+P|\le 4|P|\le 40|A|. \end{align*} This example shows the reusable mechanism behind many small-doubling constructions: dense subsets of low-rank progressions inherit small doubling even when they are not themselves progressions. [/example] The example also explains why the inverse theorem must tolerate bounded losses. From the inequality alone we should expect to recover a structured container comparable to $A$, not an exact description of the points of $A$ inside that container. We can now ask for the full converse in general: if $A\subset\mathbb Z$ has doubling bounded by $K$, must $A$ be contained in a generalised arithmetic progression whose rank and size are controlled by $K$ alone? The dense-subset examples show we cannot hope for more than containment in such a progression, but they leave open whether containment is always available. Freiman's theorem, in its torsion-free form, asserts that it is. [quotetheorem:4583] [citeproof:4583] The torsion-free hypothesis matters because finite subgroups themselves have doubling $1$ but need not resemble a progression of bounded rank and comparable size. For instance, a large vector subspace in $\mathbb F_2^n$ has $A+A=A$, so any theorem in a torsion group must allow subgroup components. The theorem also does not give uniform constants independent of $K$; all rank and size bounds are permitted to worsen as $K$ increases. In groups with torsion, the corresponding inverse objects are coset progressions, where a finite subgroup is added to a GAP. ## Finite-Field Models and Bogolyubov's Lemma How does the inverse problem change in a vector space over a finite field? In $\mathbb F_p^n$, subspaces play the role of progressions: if $V\le \mathbb F_p^n$, then $V+V=V$. There is also a new obstruction: a dense set need not contain any large subspace at all, since random dense subsets typically miss many specified subspaces. The finite-field theory therefore seeks large subspaces inside iterated sumsets such as $2A-2A$, where repeated addition has smoothed out the random-looking gaps. [definition: Density in a Finite Vector Space] Let $G=\mathbb F_p^n$ and let $A\subset G$. The density of $A$ is \begin{align*} \alpha=\frac{|A|}{|G|}. \end{align*} [/definition] The Fourier transform on finite abelian groups measures how far a set is from being uniformly distributed across characters. Large Fourier coefficients identify directions along which density can be incremented; iterating this idea produces a subspace contained in an iterated sumset. The question this raises is whether the smoothing effect of repeated addition is strong enough to guarantee genuine structure: given only that $A$ has density $\alpha$, must the fourfold sumset $2A-2A$ contain a subspace whose codimension is controlled purely by $\alpha$? Bogolyubov's lemma answers yes, and the next statement records the precise codimension bound that later finite-field arguments will rely on. [quotetheorem:4584] [citeproof:4584] This lemma is a finite-field analogue of the principle that dense sets have structured iterated sumsets. The density hypothesis is necessary in this form: a very sparse set can have $2A-2A$ far too small to contain a large subspace. It does not say that $A$ itself contains a large subspace, and random dense examples show why that stronger conclusion would be false. What it provides is structure after four additive operations, which is exactly the input needed for finite-field Freiman-Ruzsa arguments. [example: Dense Subsets of $\mathbb F_2^n$] Let $A\subset\mathbb F_2^n$ have density $\alpha$. Since $-A=A$ in characteristic $2$, Bogolyubov's lemma says that $4A$ contains a subspace of codimension $O(\alpha^{-2})$. If $A$ is already a coset of a subspace $V$, then $4A=V$, so the lemma recovers the expected model case with codimension equal to that of $V$. [/example] This model case should be read as the finite-field analogue of a proper progression: the structured object has doubling exactly $1$. The general theorem below says that every small-doubling set is contained in a comparably sized version of this model, after allowing constants depending on $p$ and $K$. [quotetheorem:4585] [citeproof:4585] Finite-field inverse theorems are often cleaner than their integer counterparts because subspaces replace progressions and there are no boundary effects from intervals. The nonempty hypothesis again removes a degenerate case, while finiteness is automatic in $\mathbb F_p^n$ but is still part of the cardinality statement. The theorem does not say that $A$ is itself close to a subspace in symmetric difference; it gives containment in one coset of controlled size. The price for this clean structural form is that constants may depend on the characteristic. ## Polynomial Freiman-Ruzsa and Quantitative Inverse Theory How efficient should Freiman's theorem be as $K$ grows? The classical Freiman-Ruzsa theorem proves bounded-rank structure for fixed $K$, but many applications need polynomial dependence on $K$. The obstruction is cumulative loss: if an iterative argument applies an inverse theorem at several scales, exponential dependence on $K$ can turn a useful density increment into a vacuous bound. The Polynomial Freiman-Ruzsa philosophy predicts that small-doubling sets are controlled by structured objects with losses $K^{O(1)}$ rather than exponential or worse. [definition: Control by a Structured Set] Let $A,B$ be finite subsets of an abelian group and let $K\ge 1$. We say that $B$ $K$-controls $A$ if \begin{align*} |B|\le K|A| \end{align*} and there exists a set $X$ with $|X|\le K$ such that \begin{align*} A\subset B+X. \end{align*} [/definition] Control is weaker than containment in a single progression of comparable size, but it is robust under covering arguments. It captures the idea that $A$ is covered by few translates of a structured object. With this language in place we can ask for the strongest structural prediction available: if $A$ has doubling $K$ in a torsion-free group, can it always be controlled by a progression whose complexity and covering loss grow only polynomially in $K$? This is the central open problem of the subject, so it must be presented as a conjectural benchmark rather than as a theorem card. [remark: Polynomial Freiman-Ruzsa Conjecture] The conjectural torsion-free form predicts that there is an absolute constant $C>0$ such that every finite nonempty $A\subset\mathbb Z$ with $|A+A|\le K|A|$ is $K^C$-controlled by a proper generalized arithmetic progression of rank at most $C\log K$. [/remark] Its importance is quantitative: it predicts that the complexity of the model grows logarithmically in $K$ and that the covering loss is polynomial. The conjecture does not claim that $A$ lies inside one progression of size $K^C|A|$ without translates; control allows a bounded set of shifts, which is necessary for stability under covering arguments. The torsion-free setting is also essential to this exact formulation, since finite subgroups force the structured object to include subgroup components in torsion groups. It is natural to ask whether the conjecture becomes tractable when progressions are replaced by their cleanest possible analogue. In a finite vector space the structured objects are subspaces, and one can hope that a small-doubling set is controlled by a subspace of comparable size with only polynomial loss. The finite-field version below isolates exactly this model case, which is where the strongest partial results are known. [quotetheorem:4587] [citeproof:4587] The finite-field version is a model case for the general conjecture because the structured objects are subspaces. It does not assert that $A$ is contained in a subspace of size $K^C|A|$ directly; it asserts control, so $A$ may be covered by polynomially many translates of a polynomially sized subspace. This distinction matters in examples built from several cosets of the same subspace, where no single small subspace contains all of $A$. The conjecture is closely connected with approximate homomorphisms, property testing in theoretical computer science, and quantitative forms of Bogolyubov's lemma. [remark: Why Polynomial Bounds Matter] Many applications of additive combinatorics use inverse theorems inside iterative arguments. Exponential losses in $K$ can make the final bound unusable, while polynomial losses often survive iteration. This is why Freiman-type statements are not judged only by their qualitative structure, but also by how rank, codimension, and covering constants depend on the doubling constant. [/remark] The chapter's main message is that generalized arithmetic progressions and subspaces are the universal models for small doubling in the settings considered here. Freiman homomorphisms explain why these models may be transferred between groups, sharp integer theorems describe the first threshold, and Freiman-Ruzsa theory gives the bounded-complexity structure that underlies later results such as Balog-Szemeredi-Gowers and higher-order inverse theorems. Freiman's theorem explains the structure forced by small doubling, at least up to bounded complexity. The next chapter pushes further by showing how to pass from dense additive statistics to actual large structured subsets, which is the bridge from inverse results to approximate groups. # 5. Balog-Szemerédi-Gowers and Approximate Groups Chapters 2 and 3 developed additive energy, popular sums, and Plünnecke-Ruzsa theory as quantitative tools for measuring additive structure. This chapter explains how to pass from a statistical statement, such as many additive quadruples or many edges in an additive graph, to an honest large subset with small doubling. The central mechanism is the Balog-Szemerédi-Gowers theorem, and the structural language that emerges from it is the language of approximate groups. The guiding question is: when a set behaves additively on many pairs but not necessarily on all pairs, can we discard a controlled portion and recover genuine small-doubling structure? BSG gives an affirmative answer with polynomial quantitative losses. This is the bridge between energy estimates, which are often produced by Fourier analysis or incidence geometry, and inverse theorems, which require small sumsets. ## Dense Additive Graphs and Restricted Sumsets Suppose $A$ is a finite subset of an abelian group $G$, and many pairs $(a,a') \in A \times A$ have sums belonging to a comparatively small set. The full sumset $A+A$ may still be large, because a small exceptional set of pairs can produce many distinct sums. The first problem is to formulate a version of small doubling that only sees the popular or permitted pairs. [definition: Additive Graph] Let $A$ and $B$ be finite subsets of an abelian group $G$. An additive graph between $A$ and $B$ is a bipartite graph $\Gamma \subset A \times B$. [/definition] The graph is not extra algebraic structure on $A$ and $B$; it is a bookkeeping device for remembering which pairs are regarded as reliable. If a dense set of edges has only a few possible sums, then the addition table contains a large structured region even though the whole Cartesian product may still behave badly. To measure this partial structure, the ordinary sumset is too crude because it includes sums from pairs we have deliberately discarded. We instead need a sumset formed only from the permitted edges, so that its size reflects the additive behaviour on the dense reliable part of the graph. [definition: Restricted Sumset] Let $A$ and $B$ be finite subsets of an abelian group $G$, and let $\Gamma \subset A \times B$ be an additive graph. The restricted sumset along $\Gamma$ is \begin{align*} A +_{\Gamma} B := \{a+b : (a,b) \in \Gamma\}. \end{align*} [/definition] The graph is useful when $|\Gamma|$ is dense in $A \times B$ while $|A+_{\Gamma}B|$ is small. This says that many allowed pairs collapse into few sums, even if the disallowed pairs hide large additive noise. [example: A Dense Graph Inside An Arithmetic Progression] Let $A=B=\{1,\dots,N\}\subset \mathbb Z$ and let \begin{align*} \Gamma=\{(a,b)\in A\times B:a+b\le N+1\}. \end{align*} For each $a\in A$ there are $N+1-a$ possible values of $b$, so \begin{align*} |\Gamma|=N+(N-1)+\cdots+1=\frac{N(N+1)}2. \end{align*} Thus $\Gamma$ has density $(N+1)/(2N)>1/2$ in $A\times B$. The restricted sums are exactly \begin{align*} A+_\Gamma B=\{2,3,\dots,N+1\}, \end{align*} which has only $N$ elements. The graph keeps a positive proportion of the addition table while exposing only the triangular structured part of the sums. [/example] The previous example fixes one test case for the idea. The next example, Dense Relations Hidden In A Noisy Set, changes the setting so the same mechanism can be recognized from another angle. [example: Dense Relations Hidden In A Noisy Set] Let $P=\{1,\dots,N\}$, let $R\subset \mathbb Z$ satisfy $|R|=N$ and $R\cap P=\varnothing$, and put $A=P\cup R$. Take \begin{align*} \Gamma=P\times P\subset A\times A. \end{align*} Then $|A|=2N$ and $|\Gamma|=N^2$, so $\Gamma$ has density $1/4$ in $A\times A$. Along this graph the restricted sumset is just \begin{align*} A+_\Gamma A=P+P=\{2,3,\dots,2N\}, \end{align*} so $|A+_\Gamma A|=2N-1$ even if the sums involving $R$ are completely unstructured. This is the basic BSG situation: a noisy set may still contain a dense graph of additive relations with a small visible sumset. [/example] The point of BSG is that examples of this kind are not accidental. Whenever a dense graph has a small restricted sumset, a large subset with small full sumset can be extracted. ## The Balog-Szemerédi-Gowers Theorem The next question is whether dense additive relations force a structured subset without assuming the structure is already visible. The Balog-Szemerédi-Gowers theorem answers this by upgrading restricted small doubling to ordinary small doubling on large subsets. [quotetheorem:4588] [citeproof:4588] The two hypotheses play different roles. The density condition prevents the theorem from seeing only a tiny structured corner of $A \times B$: for instance, a single edge always has a restricted sumset of size $1$, but it carries no information about large subsets of $A$ or $B$. The restricted-sumset condition prevents the opposite failure, where many edges are present but their sums are essentially all distinct; a random dense graph on a random-looking set typically has a restricted sumset comparable to the full sumset and gives no small-doubling conclusion. The theorem also has a built-in limitation. It does not say that all of $A$ or all of $B$ has small doubling, and it does not identify the extracted subsets explicitly. The conclusion is a conversion principle: after discarding a polynomially controlled part of the original sets, the statistical relation becomes ordinary small-doubling structure. The exact polynomial exponent depends on the version of the theorem and on how efficiently the graph-cleaning step is implemented; in this course the important feature is polynomial dependence on $K$. In applications one rarely starts from a graph with a controlled restricted sumset; instead one starts from a single scalar statistic, the additive energy $E^+(A)$, which counts the quadruples $a_1+a_2=a_3+a_4$. The practical question is therefore whether a lower bound on this energy alone is enough to extract a small-doubling subset, with no graph hypothesis to verify. The following energy formulation of BSG answers this directly: it converts the bound $E^+(A)\ge |A|^3/K$ into a large subset $A'$ whose sumset $|A'+A'|$ is polynomially controlled, which is the form most Fourier and incidence arguments actually produce. [quotetheorem:4589] [citeproof:4589] This form is often the one used in applications. Fourier or incidence arguments tend to produce many solutions of $a_1+a_2=a_3+a_4$, while inverse theorems require a set with small $A'+A'$. The threshold $E^+(A) \ge |A|^3/K$ is essential because $|A|^3$ is the natural scale of high energy. A random subset of a large ambient group normally has energy close to the random baseline, far below this scale, and there is then no reason for any positive-density subset to have small doubling. At the other extreme, every finite set has at least the diagonal additive quadruples, but those diagonal coincidences alone cannot force structure. The conclusion should also be read carefully. BSG extracts some large structured subset; it does not say which subset it is, that it is unique, or that it resembles a progression. Those identifications belong to the inverse theory that starts after BSG has supplied the small-doubling input. [example: Extracting A Structured Subset From Energy] Let $A=P\cup R\subset \mathbb Z$, where $P=\{1,\dots,N\}$, $N\ge 1$, and $R$ is a set of $N$ integers disjoint from $P$. We compute the part of $E^+(A)$ forced by additive quadruples lying entirely in $P^4$. For $s\in\{2,\dots,2N\}$, write \begin{align*} r_{P+P}(s)=|\{(p,p')\in P^2:p+p'=s\}|. \end{align*} If $2\le s\le N+1$, then $p'=s-p$, and the conditions $p\in P$ and $p'\in P$ are \begin{align*} 1\le p\le N \qquad\text{and}\qquad 1\le s-p\le N. \end{align*} The second double inequality is equivalent to \begin{align*} s-N\le p\le s-1. \end{align*} Since $s\le N+1$, we have $s-N\le 1$, and since $s\ge 2$, we have $s-1\ge 1$. Also $s-1\le N$ because $s\le N+1$. Hence the possible values of $p$ are exactly \begin{align*} p=1,2,\dots,s-1, \end{align*} so \begin{align*} r_{P+P}(s)=s-1. \end{align*} If $N+2\le s\le 2N$, then again $p'=s-p$, and \begin{align*} 1\le p\le N, \qquad 1\le s-p\le N \end{align*} is equivalent to \begin{align*} 1\le p\le N, \qquad s-N\le p\le s-1. \end{align*} Since $s\ge N+2$, we have $s-N\ge 2$, and since $s\le 2N$, we have $s-N\le N$. Thus the possible values of $p$ are exactly \begin{align*} p=s-N,s-N+1,\dots,N. \end{align*} The number of integers in this interval is \begin{align*} N-(s-N)+1=2N+1-s, \end{align*} so \begin{align*} r_{P+P}(s)=2N+1-s. \end{align*} Grouping ordered pairs in $P^2$ by their sum gives \begin{align*} E^+(P) &=\sum_{s=2}^{2N} r_{P+P}(s)^2\\ &=\sum_{s=2}^{N+1}(s-1)^2+\sum_{s=N+2}^{2N}(2N+1-s)^2. \end{align*} In the first sum put $t=s-1$, so $t$ runs from $1$ to $N$. In the second sum put $t=2N+1-s$, so $t$ runs from $N-1$ down to $1$. Therefore \begin{align*} E^+(P) &=\sum_{t=1}^{N}t^2+\sum_{t=1}^{N-1}t^2\\ &=N^2+\sum_{t=1}^{N-1}t^2+\sum_{t=1}^{N-1}t^2\\ &=N^2+2\sum_{t=1}^{N-1}t^2\\ &=N^2+2\cdot \frac{(N-1)N(2N-1)}{6}\\ &=N^2+\frac{(N-1)N(2N-1)}{3}. \end{align*} Expanding the numerator, \begin{align*} (N-1)N(2N-1) &=(N^2-N)(2N-1)\\ &=2N^3-N^2-2N^2+N\\ &=2N^3-3N^2+N. \end{align*} Hence \begin{align*} E^+(P) &=N^2+\frac{2N^3-3N^2+N}{3}\\ &=\frac{3N^2}{3}+\frac{2N^3-3N^2+N}{3}\\ &=\frac{2N^3+N}{3}. \end{align*} Since $P\subset A$, every quadruple $(p_1,p_2,p_3,p_4)\in P^4$ satisfying \begin{align*} p_1+p_2=p_3+p_4 \end{align*} is also a quadruple in $A^4$ satisfying the same equation. Therefore \begin{align*} E^+(A) &\ge E^+(P)\\ &=\frac{2N^3+N}{3}\\ &\ge \frac{2N^3}{3}. \end{align*} Because $P\cap R=\varnothing$, $|P|=N$, and $|R|=N$, we have \begin{align*} |A| &=|P\cup R|\\ &=|P|+|R|\\ &=N+N\\ &=2N. \end{align*} Thus \begin{align*} \frac{2N^3}{3} &=\frac{2}{3}N^3\\ &=\frac{2}{3}\cdot \frac{(2N)^3}{8}\\ &=\frac{(2N)^3}{12}\\ &=\frac{|A|^3}{12}. \end{align*} Combining the inequalities gives \begin{align*} E^+(A)\ge \frac{|A|^3}{12}. \end{align*} Thus the *Energy Form Of Balog-Szemerédi-Gowers* applies with the absolute constant $K=12$. It produces a subset $A'\subset A$ such that \begin{align*} |A'|\ge 12^{-O(1)}|A| \end{align*} and \begin{align*} |A'+A'|\le 12^{O(1)}|A'|. \end{align*} Since $12$ is fixed independently of $N$, this means $|A'|\gg |A|$ and $|A'+A'|\ll |A'|$. The calculation shows that the progression $P$ alone already forces large additive energy in $A$, and BSG extracts a large small-doubling subset without being told which elements of $A$ came from that structured progression. [/example] ## Ruzsa Covering and a Weak Freiman Theorem Once BSG has produced a large small-doubling subset, the next problem is to convert small doubling into more recognisable algebraic structure. In this chapter we prove a weak Freiman-type conclusion: a set with many additive relations is controlled by a small number of translates of a difference set. [quotetheorem:4574] [citeproof:4574] The covering lemma says that a small-doubling set is controlled by one of its difference sets. The hypothesis $|A+B| \le K|B|$ is what makes the greedy maximal family small: without it, the disjoint translates $x+B$ can be numerous, and no bounded-size set $X$ can cover $A$ by translates of $B-B$. For example, if $B=\{0\}$, the conclusion becomes $A \subset X$, so $|X|$ must be at least $|A|$ unless $A$ is already small. The lemma is therefore a covering result, not a classification theorem. It shows that $A$ lies inside a controlled union of translates of $B-B$, but it does not describe the internal structure of $B-B$. We can now combine the two tools to handle a set described only by its additive statistics. The aim is to start from a high-energy set, with no doubling hypothesis assumed in advance, and produce a positive-density subset that is controlled by a single small difference set. The next theorem carries this out: BSG first extracts a small-doubling piece, and Ruzsa covering then packages it as a bounded number of translates of a difference set. [quotetheorem:4591] [citeproof:4591] The energy hypothesis is doing real work. If $A$ is a random set in a large interval or in a large finite group, its additive quadruples are too sparse to force a positive-density structured component, and the conclusion can fail for any polynomially large subset. High energy rules out this random behaviour by ensuring that many pairs of elements of $A$ participate in repeated sums. The two conclusions have separate meanings. The lower bound on $|A'|$ says that the structured part is not negligible, while the bound on $|A'-A'|$ says that the controlling difference object has small size relative to $A'$. The covering statement $A' \subset X+(A'-A')$ is weaker than saying that $A'$ is a progression or coset progression, because $A'-A'$ may itself still need structural analysis. Thus the theorem is mainly conceptual: BSG supplies the small-doubling set, and Ruzsa covering supplies control by a bounded-complexity difference set. ## Approximate Groups Small doubling is a statement about one product or sumset. In noncommutative settings, and even in abelian groups when repeated sums matter, it is more stable to ask whether a set is closed under multiplication up to boundedly many translates. This motivates approximate groups. [definition: Approximate Group] Let $G$ be a group and let $K \ge 1$. A finite set $A \subset G$ is a $K$-approximate group if the following conditions hold: 1. $A$ is symmetric: $e \in A$ and $A=A^{-1}$. 2. There exists a set $X \subset G$ with $|X| \le K$ such that \begin{align*} A \cdot A \subset X \cdot A. \end{align*} [/definition] In an abelian group written additively, this says that $A$ is symmetric around $0$ and $A+A$ is covered by at most $K$ translates of $A$. The condition is stronger than merely $|A+A|\le K|A|$, because it gives a reusable covering statement rather than a single cardinality estimate. This matters under iteration: knowing only that one sumset is small does not by itself specify where $3A,4A,\dots$ sit, while approximate closure gives a stable language for repeated sums and products. Ruzsa covering often allows one to pass between these viewpoints after replacing $A$ by a related set such as $A-A$. [example: Symmetric Arithmetic Progressions] Let \begin{align*} P=\{-N,-N+1,\dots,N\}\subset \mathbb Z. \end{align*} Then $0\in P$ and $P=-P$, so $P$ is symmetric. Moreover \begin{align*} P+P=\{-2N,-2N+1,\dots,2N\}. \end{align*} If $X=\{-N,N\}$, then \begin{align*} X+P=(-N+P)\cup(N+P)=\{-2N,\dots,2N\}, \end{align*} with the evident one-element modification when $N=0$. Hence $P+P\subset X+P$ and $|X|\le2$, so a symmetric arithmetic progression is a $2$-approximate group in additive notation. [/example] The interval example has approximate closure because adding two elements only enlarges the allowed coefficient range by a bounded factor. In higher rank, the same phenomenon occurs for finite boxes in several additive directions, but we need terminology that records the base point, directions, and side lengths before discussing symmetric or centred variants. [definition: Generalized Arithmetic Progression] As in Chapter 4, let $G$ be an abelian group. A generalized arithmetic progression of rank $d$ is a set \begin{align*} P = \{x_0+n_1v_1+\cdots+n_dv_d : 0 \le n_i < L_i \text{ for } 1\le i\le d\}, \end{align*} where $x_0,v_1,\dots,v_d \in G$ and $L_1,\dots,L_d \in \mathbb N$. [/definition] The base point $x_0$ allows progressions to be translated, while the directions $v_i$ and lengths $L_i$ record the ordered additive parameters. For approximate groups, the most useful progressions are centred and symmetric, because they contain $0$, are closed under negation, and their doubling can be covered by a bounded number of translates of themselves. This is the additive model behind the abstract definition of approximate closure. [example: Approximate Groups From Boxes] Let \begin{align*} P=\{n_1v_1+\cdots+n_dv_d: |n_i|\le L_i \text{ for } 1\le i\le d\} \end{align*} in an abelian group $G$. The set contains $0$ and is closed under negation. If $x,y\in P$, then the coefficient of $v_i$ in $x+y$ lies between $-2L_i$ and $2L_i$. Choose a sign vector $\varepsilon=(\varepsilon_1,\dots,\varepsilon_d)\in\{-1,1\}^d$ so that each coefficient of $x+y$ can be written as $\varepsilon_iL_i+c_i$ with $|c_i|\le L_i$. Then \begin{align*} x+y\in \left(\sum_{i=1}^d\varepsilon_iL_iv_i\right)+P. \end{align*} Thus \begin{align*} P+P\subset X+P, \qquad X=\left\{\sum_{i=1}^d\varepsilon_iL_iv_i: \varepsilon_i\in\{-1,1\}\right\}, \end{align*} and $|X|\le2^d$. A centred rank-$d$ box is therefore a $2^d$-approximate group. [/example] Approximate groups also arise from small-doubling sets by symmetrisation. If $|A+A|\le K|A|$ in an abelian group, then Plünnecke-Ruzsa estimates imply that $A-A$ has polynomially controlled doubling, and a bounded translate cover turns a related symmetric set into a $K^{O(1)}$-approximate group. ## Coset Progressions and the Structure Theorem The final question is what approximate groups look like. In abelian groups, the expected answer is: finite subgroups, progressions, and finite sums of these two phenomena. In general groups, progressions must be replaced by nilpotent progressions, but the slogan remains that approximate closure forces algebraic structure. [definition: Coset Progression] Let $G$ be an abelian group. A coset progression is a set of the form $H+P$, where $H \le G$ is a finite subgroup and $P$ is a generalised arithmetic progression in $G$. [/definition] The subgroup part accounts for torsion, while the progression part accounts for ordered additive directions. For example, in $\mathbb F_p^n$, subspaces are already exact additive objects, while in $\mathbb Z^d$ boxes are progression-like objects. The decisive question is whether coset progressions are not merely examples of small-doubling sets but the only possibility. We want to know that every set in an arbitrary abelian group with $|A+A|\le K|A|$ is forced to live inside a coset progression of bounded rank and comparable size, so that the two phenomena above genuinely exhaust the abelian sources of small doubling. The structure theorem below confirms this, giving the general-group conclusion that the earlier integer and finite-field results only previewed. [quotetheorem:4592] [citeproof:4592] The small-doubling hypothesis is indispensable. A random finite subset of an abelian group may have $|A+A|$ close to $|A|^2$, and such a set cannot be controlled by a bounded-rank progression of comparable size. The theorem says that once this random expansion is ruled out, the only remaining abelian sources of small doubling are progression-like directions together with finite subgroup structure. There are still limitations. The theorem is qualitative at the level stated here: the constants and ranks may be large functions of $K$, and the controlling coset progression need not be unique. Its role in the course is to mark the next stage after BSG: BSG produces small doubling, while Freiman-Ruzsa theory explains the shape of sets with small doubling. The abelian theorem does not cover multiplicative growth in noncommutative groups. There, a set can be closed under multiplication up to a few translates without resembling a subgroup or an abelian progression, because commutators create new directions. The right replacement for a coset progression is therefore a coset nilprogression: a bounded-rank nilpotent model that records the controlled failure of commutativity. The next theorem is the structural endpoint for approximate groups in this broader setting. [quotetheorem:4593] [citeproof:4593] The approximate-group hypotheses are necessary here for the same reason as in the abelian setting, but noncommutativity adds another obstruction: a set can have many accidental products without its iterated products being organised by boundedly many translates. Symmetry, containment of the identity, and the covering condition $A \cdot A \subset X \cdot A$ ensure that repeated products remain controlled enough for structure theory to apply. The theorem is not a practical classification of every element of $A$. It gives control by a coset nilprogression with rank and step bounded in terms of $K$, but the bounds are not the main concern in this course and the controlling object may be far from unique. Its importance is conceptual: once a statistical source produces many multiplicative or additive relations, BSG-type arguments give approximate closure, and the structure theorem identifies nilpotent algebraic structure as the endpoint. [explanation: Why BSG Is The Bridge] Many arguments in additive combinatorics begin with a global inequality that detects an abnormal number of coincidences. Additive energy counts coincidences of sums; multiplicative energy counts coincidences of products; incidence estimates often produce dense relation graphs. These outputs are statistical, because they say that many pairs behave well, not that every pair in the set behaves well. Inverse theorems usually require a different input. They start from a set with small doubling or from an approximate group, where every product or sum is controlled up to boundedly many translates. BSG is the mechanism that changes the type of information: it passes from many good relations to a large domain on which all relations are controlled. The cost is quantitative. The subset may be smaller by a polynomial factor in $K$, and the doubling constant may grow polynomially in $K$. For the qualitative structure theory developed in this course, this cost is acceptable; for sharp quantitative applications, improving the exponents becomes a separate problem. [/explanation] The chapter main lesson is that additive structure appears in two forms. Energy and dense graphs reveal statistical structure, while approximate groups and coset progressions describe genuine algebraic structure. The Balog-Szemerédi-Gowers theorem is the conversion principle connecting the two. Balog-Szemerédi-Gowers converts many additive coincidences into genuine structure, while approximate groups provide a language for that structure once it has been found. The next chapter shifts from combinatorial counting to harmonic analysis, where Fourier methods give a new way to detect and exploit additive regularity. # 6. Fourier Methods on Finite Abelian Groups Fourier analysis on finite abelian groups is the point where additive combinatorics turns additive structure into analytic information and then back again. Earlier chapters measured additive structure through doubling, energy, and covering arguments; here we learn to diagonalise convolution and to detect additive structure through large Fourier coefficients. The main questions are: how much of a set is controlled by its large spectrum, and how can that control force subspaces or Bohr sets inside iterated sumsets such as the $2A-2A$ sets already controlled combinatorially in Chapter 3? ## Characters and Fourier Transform on Finite Groups How can we choose coordinates in which addition becomes multiplication? On a finite abelian group, the correct coordinates are characters, because they convert convolution into pointwise multiplication. The two model groups in this chapter are the vector space $\mathbb F_p^n$ and the cyclic group $\mathbb Z/N\mathbb Z$. [definition: Character] Let $G$ be a finite abelian group. A character of $G$ is a group homomorphism $\gamma: G \to S^1$, where $S^1=\{z\in\mathbb C: |z|=1\}$. [/definition] The set of all characters is denoted $\widehat G$ and is itself a finite abelian group under pointwise multiplication. For $G=\mathbb F_p^n$, each $\xi\in\mathbb F_p^n$ defines a character \begin{align*} \gamma_\xi(x)=\exp\left(\frac{2\pi i}{p} x\cdot \xi\right), \end{align*} where $x\cdot\xi=\sum_{j=1}^n x_j\xi_j$ in $\mathbb F_p$. For $G=\mathbb Z/N\mathbb Z$, each $r\in\mathbb Z/N\mathbb Z$ defines \begin{align*} \gamma_r(x)=\exp\left(\frac{2\pi i rx}{N}\right). \end{align*} Characters are the basic oscillations on the group, so the next step is to measure how strongly a function correlates with each one. The Fourier transform packages all of these correlations into a function on the dual group. [definition: Fourier Transform on a Finite Abelian Group] Let $G$ be a finite abelian group and let $f:G\to\mathbb C$. The Fourier transform of $f$ is the function $\widehat f:\widehat G\to\mathbb C$ defined by \begin{align*} \widehat f(\gamma)=\mathbb E_{x\in G} f(x)\overline{\gamma(x)}=\frac{1}{|G|}\sum_{x\in G} f(x)\overline{\gamma(x)}. \end{align*} [/definition] This normalisation is adapted to density arguments: if $A\subseteq G$ and $\mathbb{1}_A$ is its indicator function, write \begin{align*} \alpha=\frac{|A|}{|G|}. \end{align*} Then \begin{align*} \widehat{\mathbb{1}_A}(1)=\frac{|A|}{|G|}=\alpha, \end{align*} where $1$ is the principal character. Thus the zero-frequency coefficient records the density of $A$. Before we can use these coefficients to reconstruct a function or to count additive quadruples, we need to know that the characters behave like an orthonormal system: that averaging one character against the conjugate of another sees only whether the two agree. Everything that follows, from Fourier inversion to Parseval, rests on this single averaging identity, which the next theorem establishes for an arbitrary finite abelian group. [quotetheorem:4594] [citeproof:4594] Orthogonality is the finite analogue of an orthonormal basis calculation: distinct characters do not correlate under the uniform average. The finiteness of $G$ is essential here because the proof uses an actual uniform average over all group elements, not a limiting or measure-theoretic integral. The abelian hypothesis ensures that the full set of characters is large enough to form the frequency side used below; nonabelian groups require matrix-valued representations instead. The next problem is reconstruction. If the Fourier coefficients are supposed to replace the original function, then every value $f(x)$ must be recoverable from the list of numbers $\widehat f(\gamma)$. Orthogonality supplies exactly the cancellation needed: when the candidate reconstruction is averaged against a character, all wrong frequencies disappear and the desired coefficient remains. Fourier inversion is the theorem that makes this no-loss passage precise. [quotetheorem:4595] [citeproof:4595] Fourier inversion says that no information is lost when we pass from $f$ to $\widehat f$, provided we keep every character. The formula depends on the chosen normalisation: because the Fourier coefficient used an average over $G$, inversion has no additional factor of $|G|$. The result is special to finite abelian groups in this elementary form; for nonabelian finite groups, reconstruction uses irreducible representations and matrix coefficients. The first structural example is a subspace, where the transform is concentrated exactly on the dual object that annihilates it. [example: Fourier Transform of a Subspace] Let $G=\mathbb F_p^n$ and let $V\le G$ be a subspace. The Fourier transform of $1_V$ is supported on the annihilator \begin{align*} V^\perp=\{\xi\in\mathbb F_p^n: x\cdot\xi=0\text{ for all }x\in V\}. \end{align*} A direct average gives $\widehat{1_V}(\xi)=|V|/|G|$ for $\xi\in V^\perp$ and $0$ otherwise. Thus perfect additive structure produces Fourier support on a dual subspace. [/example] ## Parseval, Convolution, and Additive Energy How do additive configurations become Fourier inequalities? The answer is that the Fourier transform turns convolution into multiplication, while Parseval converts counting problems into squared Fourier coefficients. This is the analytic form of the energy method from earlier chapters. [definition: Normalised Convolution] Let $G$ be a finite abelian group. Normalised convolution is the operation \begin{align*} *:\mathbb C^G\times\mathbb C^G\to\mathbb C^G \end{align*} defined, for $f,g:G\to\mathbb C$, by \begin{align*} (f*g)(x)=\mathbb E_{y\in G} f(y)g(x-y). \end{align*} [/definition] For indicator functions, the unnormalised representation function is recovered by $r_{A+B}(x)=|G|(\mathbb{1}_A*\mathbb{1}_B)(x)$. The normalised version keeps Fourier identities free from extra powers of $|G|$. The reason convolution is worth introducing at all is that it is the additive operation we ultimately want to analyse: representation counts for $A+B$ are convolutions of indicator functions, and these counts are hard to control directly in physical space. The standard finite Fourier identity for the normalised convention is \begin{align*} \widehat{f*g}(\gamma)=\widehat f(\gamma)\widehat g(\gamma) \end{align*} for every character $\gamma\in\widehat G$. The normalisation is doing real work: no extra factor of $|G|$ appears. This identity is the bridge from additive representation counts to algebraic expressions in Fourier coefficients. Multiplying coefficients is only useful if we can also relate sizes computed in physical space to sizes computed in frequency space; otherwise the transform would tell us nothing quantitative about $A$. The Parseval identity for this finite Fourier normalisation is \begin{align*} \mathbb E_{x\in G}|f(x)|^2=\sum_{\gamma\in\widehat G}|\widehat f(\gamma)|^2. \end{align*} This is the Pythagoras theorem for the finite Fourier basis. It converts a physical-space average into a frequency-space sum where large coefficients can be isolated. The next example applies this identity to additive energy, replacing a count of quadruples by a fourth moment of Fourier coefficients. [example: Energy in Fourier Variables] Let $A\subseteq G$ have density \begin{align*} \alpha=\frac{|A|}{|G|}. \end{align*} Since $\mathbb{1}_A*\mathbb{1}_A$ records normalised representations of sums, Parseval and the convolution identity give \begin{align*} \mathbb E_x |(\mathbb{1}_A*\mathbb{1}_A)(x)|^2=\sum_{\gamma\in\widehat G}|\widehat{\mathbb{1}_A}(\gamma)|^4. \end{align*} Multiplying by $|G|^3$ gives the usual additive energy $E^+(A)=|G|^3\sum_\gamma |\widehat{\mathbb{1}_A}(\gamma)|^4$. A large energy lower bound therefore says that the fourth powers of the Fourier coefficients have substantial mass. [/example] ## Large Spectrum and Dissociated Sets Which frequencies carry useful information about a set? A coefficient $\widehat{1_A}(\gamma)$ is large when $1_A$ has a strong bias in the direction of the character $\gamma$. The large spectrum records these biases, and the Chang lemma says that the large spectrum cannot have too many independent directions. [definition: Large Spectrum] Let $G$ be a finite abelian group, let $A\subseteq G$ have density \begin{align*} \alpha=\frac{|A|}{|G|}, \end{align*} and let $0<\rho\le1$. The $\rho$-large spectrum of $A$ is \begin{align*} \operatorname{Spec}_\rho(A)=\{\gamma\in\widehat G: |\widehat{\mathbb{1}_A}(\gamma)|\ge \rho\alpha\}. \end{align*} [/definition] Parseval immediately bounds the size of the spectrum by $|\operatorname{Spec}_\rho(A)|\le \rho^{-2}\alpha^{-1}$, but this estimate ignores additive relations between frequencies. The Chang argument shows that after passing to the span generated by a small set of independent frequencies, the whole spectrum is controlled. To make that statement precise, we need a notion of independence for characters. The relevant obstruction is not equality of two frequencies, but the existence of short signed multiplicative relations among them. [definition: Dissociated Set] Let $G$ be a finite abelian group. A subset $\Lambda\subseteq\widehat G$ is dissociated if the only choice of coefficients $\varepsilon_\lambda\in\{-1,0,1\}$ satisfying \begin{align*} \prod_{\lambda\in\Lambda}\lambda^{\varepsilon_\lambda}=1 \end{align*} is $\varepsilon_\lambda=0$ for every $\lambda\in\Lambda$. [/definition] In groups of exponent $2$, such as $\mathbb F_2^n$, dissociation is the same as linear independence over $\mathbb F_2$. In cyclic groups, it is a stronger condition than ordinary distinctness: it rules out short signed additive relations among frequencies. Parseval already bounds how many large coefficients the spectrum can contain, but it says nothing about how those frequencies relate to one another additively. The decisive question is whether the large spectrum must be expensive to describe, spread across many independent directions, or whether it has to sit inside the span of only a few frequencies. If the latter holds, then the entire spectrum is captured by a small dissociated generating set whose size grows only logarithmically as the density shrinks. The following lemma of Chang establishes precisely this compression. [quotetheorem:4596] [citeproof:4596] The density parameter matters because a coefficient of size $\rho\alpha$ is large relative to the main Fourier coefficient, not necessarily large in absolute terms. The threshold $\rho$ controls the tradeoff between information and complexity: lowering $\rho$ includes more frequencies but increases the number of generators needed. Dissociation is the obstruction to compression, since independent signed frequency relations cannot be generated from fewer directions. The lemma controls where large coefficients can live, but it does not determine their phases and it does not by itself produce a structured subset of $G$; that requires passing to annihilators or Bohr sets. [example: Spectrum of a Subspace Coset] Let $A=x_0+V$ be a coset of a subspace $V\le\mathbb F_p^n$. The density is $\alpha=|V|/|G|$, and the nonzero Fourier coefficients occur exactly on $V^\perp$. Each such coefficient has modulus $\alpha$, so $\operatorname{Spec}_1(A)=V^\perp$. The Chang lemma is sharp in form here: the spectrum is a subspace of dimension $\log_p(1/\alpha)$, so it is generated by $O(\log(1/\alpha))$ frequencies. [/example] ## From Fourier Bias to Subspaces and Bohr Sets How can a large Fourier coefficient force additive structure in the original group? The guiding principle is that if many frequencies are controlled, then the common level set on which those characters are almost constant is a structured region. In finite vector spaces this region is a subspace; in cyclic groups it is a Bohr set. [definition: Annihilator] Let $G=\mathbb F_p^n$ and let $\Gamma\subseteq\widehat G$ be identified with a subset of $\mathbb F_p^n$. The annihilator of $\Gamma$ is \begin{align*} \Gamma^\perp=\{x\in G: \gamma(x)=1\text{ for every }\gamma\in\Gamma\}. \end{align*} [/definition] If $\Gamma$ has dimension $d$, then $\Gamma^\perp$ has codimension $d$. This is why the Chang lemma is especially powerful in finite-field models: a logarithmic bound on the dimension of the spectrum becomes a polynomial-density subspace in the physical group. In cyclic groups there is usually no exact annihilator of comparable size: requiring all characters to equal $1$ can be too rigid. The problem is not that the characters give no structure, but that exact equality throws away too many points. We need a useful substitute: an approximate common level set, where each relevant character is allowed a small error from $1$. This keeps enough points to be large while still giving a region on which the selected frequencies are controlled. [definition: Bohr Set] Let $G=\mathbb Z/N\mathbb Z$, let $\Gamma\subseteq\widehat G$ be finite, and let $0<\delta\le1$. The Bohr set with frequency set $\Gamma$ and radius $\delta$ is \begin{align*} B(\Gamma,\delta)=\{x\in G: |1-\gamma(x)|\le\delta\text{ for every }\gamma\in\Gamma\}. \end{align*} [/definition] Bohr sets play the role of approximate subspaces in cyclic groups. They are not usually closed under addition, but if the radius is small enough then adding elements from a smaller Bohr set keeps all relevant characters close to $1$. With Bohr sets available as the cyclic-group analogue of subspaces, we can ask what additive structure a positive-density set is forced to contain. The natural place to look is the fourfold sum-difference set $2A-2A$, because there the Fourier coefficients enter squared and hence with a definite sign, so the cancellation that obstructs structure in $A$ itself is no longer possible. The next theorem of Bogolyubov turns this sign positivity into a concrete conclusion: positive density alone guarantees a large Bohr set, or in finite-field models an honest subspace, inside $2A-2A$. [quotetheorem:4597] [citeproof:4597] The positive-density hypothesis is essential: a very sparse set need not force any large neighbourhood of $0$ inside $2A-2A$ without additional small-doubling assumptions. The theorem also does not say that $A$ itself contains a subspace or Bohr set; the structure appears only after taking the fourfold sum-difference, where Fourier cancellation is strong enough to control all points of the neighbourhood. In vector spaces over a fixed finite field the conclusion is genuinely linear, while in cyclic groups Bohr sets are the correct approximate substitute. This distinction is why the next step of the course separates finite-field Bogolyubov-Ruzsa arguments from cyclic-group inverse theory. [example: A Positive Density Set in $\mathbb F_2^n$] Let $A\subseteq\mathbb F_2^n$ have density $\alpha>0$. Applying the finite-field form of the Bogolyubov argument gives a subspace $V\le\mathbb F_2^n$ of codimension $O(\alpha^{-2})$ such that $V\subseteq2A-2A$. Since subtraction equals addition in $\mathbb F_2^n$, this says $V\subseteq4A$. The proof identifies the large spectrum at threshold comparable to $\alpha^{3/2}$, takes its annihilator, and checks by Fourier inversion that every point of the annihilator has a positive number of representations as $a_1+a_2-a_3-a_4$ with $a_i\in A$. [/example] ## Bogolyubov-Ruzsa Arguments What improves when $A$ has small doubling rather than positive density in the whole ambient group? The Ruzsa covering philosophy lets us move to a dense model inside a smaller approximate ambient object, and the Bogolyubov Fourier argument then produces structure inside an iterated sumset. The conclusion is strongest in finite-field settings, where the structured object is an actual subspace. [quotetheorem:4598] [citeproof:4598] The lemma is a bridge between Fourier analysis and inverse additive theory. The small-doubling hypothesis is doing more than providing density: without it, a sparse random set has no reason to place a large subspace inside $2A-2A$. The conclusion also does not classify $A$ itself; it only finds a large subspace in a bounded iterated sumset, and further inverse-theorem arguments are needed to describe $A$ by cosets or models. Quantitatively, the proof depends on the strongest available finite-field Bogolyubov-Ruzsa machinery rather than on the elementary positive-density Bogolyubov theorem alone. It says that polynomial or quasi-polynomial additive control forces genuine linear structure inside a bounded iterated sumset, and this is the input used in many proofs of Freiman-type theorems over finite fields. [example: From Small Doubling to a Large Subspace] Suppose $A\subseteq\mathbb F_2^n$ satisfies $|A+A|\le K|A|$. The finite-field Bogolyubov-Ruzsa lemma gives a subspace $V\subseteq2A-2A$ whose size is at least $\exp(-C(\log K)^4)|A|$. For fixed $K$, this is a positive proportion of $A$ up to a constant depending only on $K$, so bounded doubling forces a large linear piece inside the fourfold sumset. This example shows why $2A-2A$ is more flexible than $A+A$: the extra subtraction stabilises the set enough for Fourier methods to detect a subspace. [/example] ## How the Fourier Chapter Fits the Course What has Fourier analysis added to the combinatorial methods already developed? Energy and covering arguments show that small sumsets produce many additive quadruples; Fourier analysis reveals which frequencies organise those quadruples. The Chang lemma compresses the large spectrum into few directions, the Bogolyubov argument turns those directions into subspaces or Bohr sets, and the Bogolyubov-Ruzsa lemma packages the method for sets of small doubling. The next inverse theorems use these structured pieces as the first approximation to a full description of sets with small sumset. Fourier analysis reframes additive structure in terms of frequency concentration and convolution. Once that analytic viewpoint is available, it becomes natural to compare additive behavior with multiplicative behavior, leading to the tension at the heart of sum-product phenomena. # 7. Sum-Product Phenomena over Real and Finite Fields After Chapters 1 and 4 treated arithmetic progressions as additive models, this chapter asks why a finite set of numbers cannot usually behave like an arithmetic progression and a geometric progression at the same time. Additive structure makes $A+A$ small, while multiplicative structure makes $AA$ small; the sum-product problem measures the tension between these two possibilities. The real-number theory is driven by incidence geometry, and the finite-field theory has to contend with genuine algebraic obstructions coming from subfields. ## The Sum-Product Question What is the smallest possible size of either $A+A$ or $AA$ when $A$ is a finite set of numbers? The examples from the opening chapters suggest two competing models: intervals have small sumset and large product set, while geometric progressions have small product set and large sumset. The sum-product philosophy says that, away from algebraic degeneracies, these are the only ways to make one operation economical. [definition: Sumset And Product Set] Let $F$ be a field and let $A \subset F$ be finite. The sumset and product set of $A$ are \begin{align*} A+A &= \{a+b : a,b \in A\}, & AA &= \{ab : a,b \in A\}. \end{align*} [/definition] The central quantity is $\max(|A+A|,|AA|)$. If this maximum is much larger than $|A|$, then at least one of addition and multiplication expands $A$ significantly. The obstruction to a uniform statement is that the amount of forced expansion depends on the ambient field and on which sets are allowed. Linear growth is the baseline scale, so the meaningful question is whether every admissible set must beat that scale by a fixed power. We package that lower bound as an exponent, making different sum-product estimates comparable. [definition: Sum-Product Exponent] Let $F$ be a field. A number $\delta > 0$ is a sum-product exponent over $F$ if there is a constant $c_\delta > 0$ such that every finite set $A \subset F$ in the relevant range satisfies \begin{align*} \max(|A+A|,|AA|) \ge c_\delta |A|^{1+\delta}. \end{align*} [/definition] Over $\mathbb R$, the relevant range is all finite non-empty sets. Over a finite field, the range must exclude sets that are too large and sets that are concentrated in subfields. [example: Arithmetic And Geometric Progressions] Let $A=\{1,2,\dots,n\} \subset \mathbb R$. Then $A+A=\{2,3,\dots,2n\}$, so $|A+A|=2n-1$, while $AA$ contains many distinct products and in particular has size much larger than $n$; a crude lower bound is obtained from the products $ab$ with $1 \le a \le n$ and $b$ prime in $(n/2,n]$. By contrast, if $G=\{2^0,2^1,\dots,2^{n-1}\}$, then $GG=\{2^k:0 \le k \le 2n-2\}$ has size $2n-1$, while $G+G$ is large because binary expansions distinguish most sums. These examples show that either addition or multiplication can be small, but not both in these model cases. [/example] The two model computations pull in opposite directions: the arithmetic progression keeps the sumset linear while forcing the product set to grow, and the geometric progression does exactly the reverse. The obstruction to having both sets small is that additive structure, modeled by an arithmetic progression, and multiplicative structure, modeled by a geometric progression, are incompatible templates, and no finite set of reals can imitate both at once. The natural conjecture is therefore that at least one of $|A+A|$ and $|AA|$ must exceed $|A|$ by a definite power of $|A|$; quantifying that power is the content of the following theorem. [quotetheorem:4599] [citeproof:4599] The theorem is qualitative from the modern point of view: it gives expansion by some positive power of $|A|$, but the exponent produced by the original argument is tiny and far from the conjectured $1-\varepsilon$. The range of validity matters as well: over $\mathbb R$ the statement holds for every finite set, whereas over a finite field one must exclude sets close to the field size or trapped in a subfield, where both operations can stay small. The reliance on the reals enters through the order structure, which is what makes the elementary proof possible. Its role in the course is to introduce the idea that additive combinatorial structure can be contradicted using geometry, a theme the incidence arguments below will make quantitative. ## Incidences as a Sum-Product Engine How can equations in $A+A$ and $AA$ be converted into geometry? The key observation is that a product-sum identity can be interpreted as a point lying on a line. A set with both small sumset and small product set then forces a large family of incidences. [definition: Point-Line Incidence] Let $P$ be a finite set of points in $\mathbb R^2$ and let $L$ be a finite set of affine lines in $\mathbb R^2$. The number of incidences between $P$ and $L$ is \begin{align*} I(P,L) = |\{(p,\ell) \in P \times L : p \in \ell\}|. \end{align*} [/definition] This terminology is useful because many additive equations can be reinterpreted as asking how often points lie on lines. The obstruction is that a purely algebraic count may hide geometry: if too many solutions occur, then either many points concentrate on few lines or many lines pass through special point configurations. The incidence theorem below is the quantitative rule that prevents uncontrolled concentration in the plane. [quotetheorem:4600] [citeproof:4600] The course uses Szemerédi-Trotter as a black-box incidence theorem. Its hypotheses are deliberately finite: without finite point and line sets the incidence count would not be a combinatorial quantity, and without working over the real plane the crossing and cell-decomposition proof strategy is no longer available in this form. The exponent is the important feature for applications, because it rules out the possibility that a large grid of algebraically defined lines can all pass through too many recorded points. This is the geometric bottleneck that turns algebraic incidence constructions into expansion estimates. [example: Incidences From A Product-Sum Equation] Let $A \subset \mathbb R$ be finite and suppose $0 \notin A$ for simplicity. For $a,b \in A$, consider the line \begin{align*} \ell_{a,b} = \{(x,y) \in \mathbb R^2 : y = ax+b\}. \end{align*} If $x \in A$, then $ax \in AA$ and $ax+b \in AA+A$. Thus the line $\ell_{a,b}$ passes through $|A|$ points of the set $P=A \times (AA+A)$ after recording the pairs $(x, ax+b)$. Since there are $|A|^2$ such lines, this gives $|A|^3$ incidences, which can be compared with Szemerédi-Trotter to force a lower bound on $|AA+A|$. [/example] The incidence count just produced is not special to the particular lines chosen. Every finite real set yields about $|A|^3$ incidences between the $|A|^2$ lines $\ell_{a,b}$ and the point set $A \times (AA+A)$, simply because each such line already passes through $|A|$ of the recorded points. Szemerédi-Trotter caps from above the number of incidences a system of points and lines can have, so confronting the algebraic lower bound with the geometric upper bound forces $|AA+A|$, and hence $\max(|A+A|,|AA|)$, to be large. Elekes packaged this confrontation into the clean lower bound stated next. [quotetheorem:4601] [citeproof:4601] Elekes' argument is the first point where the course uses geometry to prove an additive-combinatorial expansion statement, and the exponent it yields already improves on the bare Erdős-Szemerédi bound. The precise incidence model is less important than the pattern it exhibits: construct many incidences from algebra, then bound them geometrically. The Szemerédi-Trotter input is also where the dependence on the reals enters, since the sharp incidence bound can fail over finite fields, and this is exactly why the finite-field sum-product theorems below require a different mechanism. The next section sharpens the same point set by using the order structure of $\mathbb R$. ## Solymosi Refinement over the Reals Can the incidence method be sharpened by using the order structure of $\mathbb R$? Solymosi improvement uses the geometry of lines through the origin and the fact that products and sums arrange themselves in ordered cones. This method is specific to ordered fields and is not available in the same form over finite fields. [definition: Multiplicative Energy] Let $A,B \subset \mathbb R$ be finite. The multiplicative energy of $A$ and $B$ is \begin{align*} E^{\times}(A,B) = |\{(a_1,b_1,a_2,b_2) \in A \times B \times A \times B : a_1b_1=a_2b_2\}|. \end{align*} For $A=B$, write $E^{\times}(A)=E^{\times}(A,A)$. [/definition] Multiplicative energy measures the failure of the product map $A \times A \to AA$ to be close to injective. By Cauchy-Schwarz, \begin{align*} E^{\times}(A) \ge \frac{|A|^4}{|AA|}. \end{align*} Thus a small product set forces many pairs of points of $A \times A$ to lie on the same line through the origin. This concentration along lines through the origin is exactly the structure that the order of $\mathbb R$ can exploit. If we list the slopes in increasing order, then consecutive lines bound thin angular sectors, and a sum of one point from each of two adjacent lines lands in a region essentially disjoint from the regions produced by other pairs of slopes. Counting these nearly disjoint contributions converts multiplicative energy into a bound on $|A+A|$, and carrying out this ordered argument yields Solymosi's inequality. What we still need is the trade-off stated as a single usable inequality. The question is how strong a sum-product bound the ordered-slope count actually delivers: it should tie the multiplicative energy $E^{\times}(A)$ directly to the sizes of $AA$ and $A+A$, so that a small product set is forced to be accompanied by a large sum set. The following theorem records exactly that relation in the sharp form the argument produces. [quotetheorem:4602] [citeproof:4602] The next example uses Solymosi's inequality as a black-box estimate and turns it into a direct lower bound for the larger of the sumset and product set. [example: Deriving Expansion From Solymosi Inequality] Let $M=\max(|A+A|,|AA|)$ for a finite non-empty set $A \subset \mathbb R$. Solymosi inequality gives \begin{align*} |A|^4 \le C|AA||A+A|^2\log |A| \le CM^3\log |A|. \end{align*} Therefore \begin{align*} M \ge C^{-1/3}\frac{|A|^{4/3}}{(\log |A|)^{1/3}}. \end{align*} This example illustrates how an energy estimate becomes a direct sum-product lower bound after the two expanding quantities are dominated by a single parameter. [/example] The example also points to a boundary of the idea. The following remark, Current Perspective, records that interpretation before the construction is used again. [remark: Current Perspective] The exponent $4/3$ up to logarithmic loss is not the conjectured truth over $\mathbb R$. The Erdős-Szemerédi conjecture predicts that for every $\varepsilon>0$ there is $c_\varepsilon>0$ such that \begin{align*} \max(|A+A|,|AA|) \ge c_\varepsilon |A|^{2-\varepsilon}. \end{align*} The known exponents have improved beyond the bounds presented here, but Elekes and Solymosi remain the conceptual entry point because they explain why incidence geometry is relevant. [/remark] ## Finite Fields and Subfield Obstructions What changes when $\mathbb R$ is replaced by a finite field? The first issue is saturation: if $A$ is a positive proportion of $\mathbb F_p$, then $A+A$ and $AA$ cannot be larger than $p$. The second issue is algebraic: in a field with proper subfields, a subfield is closed under both addition and multiplication. [example: Subfields Prevent Sum-Product Expansion] Let $F$ be a finite field and let $K \subset F$ be a proper subfield. If $A=K$, then \begin{align*} A+A &= K, & AA &= K. \end{align*} Hence $\max(|A+A|,|AA|)=|A|$. Any finite-field sum-product theorem must exclude sets contained in, or strongly concentrated near, cosets of subfields. [/example] The example displays the obstruction before the terminology is introduced. A set can fail sum-product expansion not because it is random-looking, but because it is concentrated inside an affine copy of a smaller field, where both operations remain almost closed. A usable finite-field theorem therefore needs a quantitative condition that prevents large intersections with all such affine subfields. [definition: Subfield Avoidance] Let $F$ be a finite field and let $A \subset F$. We say that $A$ satisfies subfield avoidance at scale $\eta>0$ if there is a constant $C_\eta>0$ such that for every proper subfield $K \subset F$ and every $x,y \in F$ with $y \neq 0$, \begin{align*} |A \cap (x+yK)| \le C_\eta |K|^{1/2}|A|^{1/2-\eta}. \end{align*} [/definition] This condition is one way to formalise the idea that $A$ is not mostly sitting inside an affine copy of a smaller field. Different statements of finite-field sum-product theorems use slightly different avoidance hypotheses, but they all rule out the same obstruction. [quotetheorem:4603] [citeproof:4603] The course states this theorem without proof because the mechanism is substantially deeper than the real incidence arguments above. The subfield-avoidance hypothesis is essential: if $A$ is close to a subfield or one of its affine copies, both $A+A$ and $AA$ can remain small for structural reasons. The size hypotheses keep the statement away from saturation at the full field size and away from tiny exceptional sets where asymptotic expansion has no content. Conceptually, the theorem says that once field-like obstructions are removed, addition and multiplication cannot both preserve a finite set. [remark: Prime Fields] When $F=\mathbb F_p$, there are no proper subfields. In this case the main hypotheses reduce to a size condition such as $|A|<p^{1-\eta}$, together with a lower-size condition excluding bounded sets. The absence of subfields is why prime-field sum-product estimates are technically cleaner than estimates over general finite fields. [/remark] ## How the Chapter Fits the Course The previous chapters developed tools for sets with small additive doubling, such as energy, popular sums, and structural approximations. Sum-product estimates provide a complementary obstruction: a set cannot maintain strong additive structure while also maintaining strong multiplicative structure, except in field-like examples. This is the point where additive combinatorics begins to interact with incidence geometry and with the algebraic structure of the ambient field. [explanation: From Additive Structure To Expansion] If $|A+A|$ is small, earlier tools suggest that $A$ has additive organisation: many additive quadruples, many popular sums, and often a progression-like model. The sum-product theorem says that such organisation forces multiplicative expansion over $\mathbb R$ and over finite fields without subfield concentration. Conversely, if $|AA|$ is small, multiplicative energy is large, and the geometry of ratios or slopes can be used to force additive expansion. This duality is a recurring theme in later applications. Incidence bounds will reappear when estimating configurations, while finite-field sum-product estimates enter arguments where expansion in rings or fields substitutes for Fourier uniformity. The key lesson is not just the inequality itself, but the method: algebraic equations are counted by additive-combinatorial energy, then constrained by geometry or field structure. [/explanation] Sum-product estimates show that addition and multiplication cannot both be highly structured for most finite sets. The next chapter applies that principle to growth, expansion, and arithmetic progressions, where the same ideas are used to force new structure or derive contradiction. # 8. Applications to Growth, Expansion, and Arithmetic Progressions Building on the sum-product estimates of Chapter 7 and the Fourier density-increment tools of Chapter 6, this chapter uses the machinery of additive combinatorics in two directions. The first direction asks how addition and multiplication compete: a finite set cannot usually have both a small sumset and a small product set, and this tension produces expansion. The second direction asks how a dense set with no arithmetic progression can exist: Fourier analysis and density increments show that the absence of progressions forces density to rise on a structured subspace, and iteration gives a contradiction. ## Expansion from Addition and Multiplication What does it mean for a map to expand a finite set? In this setting, expansion means that a natural algebraic map sends a Cartesian product to a set substantially larger than the original input. The guiding example is that the two maps $(a,b) \mapsto a+b$ and $(a,b) \mapsto ab$ cannot both behave like near-constant-width maps on the same finite set. [definition: Additive And Multiplicative Expansion] Let $F$ be a field and let $A \subset F$ be finite. The additive expansion and multiplicative expansion of $A$ are measured by \begin{align*} A+A &= \{a+b : a,b \in A\},\\ A \cdot A &= \{ab : a,b \in A\}. \end{align*} The associated expansion parameters are $|A+A|/|A|$ and $|A \cdot A|/|A|$. [/definition] A set with small additive expansion resembles the structured examples from the early chapters: arithmetic progressions, boxes, or approximate groups. A set with small multiplicative expansion resembles a multiplicative progression or a subgroup. Sum-product estimates say that, away from degenerate cases, a single set cannot strongly resemble both types of structure. Over a finite field this dichotomy must be stated with care, since the field itself is simultaneously an additive group and, after removing $0$, a multiplicative group, and so violates any naive sum-product bound. The right question is whether a set that is small compared with the field size, and not concentrated inside a proper subfield, is still forced to expand under one of the two operations. The next theorem answers this affirmatively in $\mathbb F_p$, where there are no proper subfields to obstruct the conclusion. [quotetheorem:4604] [citeproof:4604] This theorem is used as an expansion statement rather than as a precise numerical estimate. It says that any set below the field-size barrier must expand under at least one of the two basic algebraic operations. The upper bound $|A| < p^{1-\varepsilon}$ is essential in finite fields, since a positive-density subset of $\mathbb F_p$ already has $A+A$ close to the whole field. The lower bound $1<|A|$ excludes singleton sets, for which neither operation can show genuine growth. The theorem does not identify which of the two sets expands, and it does not give an effective structural classification of the exceptional near-extremal examples. Its role in the rest of the chapter is to supply a black-box obstruction to simultaneous additive and multiplicative structure. The next example shows the two pure model behaviours separately. Arithmetic progressions make addition small but multiplication large, while geometric progressions make multiplication small but addition large. [example: Arithmetic Progression Versus Geometric Progression] Let $A = \{1,2,\dots,N\} \subset \mathbb Z$. Then $|A+A| = 2N-1$, so addition has linear expansion. Multiplication behaves differently: products $ab$ with $1 \le a,b \le N$ include many distinct integers, and standard divisor estimates give growth larger than linear by a logarithmic factor. For a geometric progression $G = \{2^0,2^1,\dots,2^{N-1}\}$, the product set has size $2N-1$, while the sumset has roughly $N^2$ elements because binary expansions distinguish most sums. These two examples show why additive and multiplicative structure pull in different directions. [/example] The same principle can be packaged as expansion of a graph or a map. If a bipartite graph has left and right vertex sets both equal to $A$ and sends $(a,b)$ to $a+b$ or $ab$, small image means many collisions. Sum-product estimates rule out having many collisions for both maps at once. This viewpoint suggests asking when a single polynomial map must have a large image on every admissible input set. The definition below records that property as an expansion condition for polynomials, rather than for addition and multiplication separately. [definition: Algebraic Expander] Let $F$ be a field and let $P \in F[x,y]$, viewed as the induced map \begin{align*} P:F\times F\to F,\qquad (x,y)\mapsto P(x,y). \end{align*} The polynomial $P$ is an algebraic expander on a class of finite sets if there is a constant $c>0$ such that for every set $A$ in the class, \begin{align*} |P(A,A)| \ge |A|^{1+c}, \end{align*} where $P(A,A)=\{P(a,b):a,b\in A\}$. [/definition] The polynomials $x+y$ and $xy$ are not expanders on their own, because arithmetic and geometric progressions defeat them respectively. The lesson of sum-product theory is that the pair of maps forms a robust expander: at least one of them expands every non-degenerate input. A direct transfer from finite fields to subsets of the real line fails because a set can concentrate inside a tiny interval and look essentially one-dimensional at every relevant scale. The discretized theorem below resolves this obstruction by requiring quantitative non-concentration across all intermediate intervals. [quotetheorem:4605] [citeproof:4605] This is the continuous-scale analogue of finite sum-product expansion. The lower bound on $N_\delta(A)$ rules out sets too small for a scale-uniform expansion statement, while the upper bound keeps the conclusion below the ambient one-dimensional covering ceiling. The interval non-concentration condition is essential: without it, a set concentrated in a short interval can have both $A+A$ and $A\cdot A$ controlled at the relevant scale. The theorem does not describe extremisers or give an optimal exponent; it only guarantees some power improvement depending on the scale-distribution parameter. It is quoted here because its proof uses a deeper multi-scale incidence and projection argument, and in applications the theorem is usually used as a black box feeding projection, spectral-gap, and growth problems. [remark: Why Discretized Expansion Matters] Discretized sum-product estimates are a bridge between additive combinatorics, harmonic analysis, and dynamics. They feed into results on restricted projections, spectral gaps, and growth in matrix groups, where the objects are not finite subsets of a field but approximate sets observed at a finite resolution. [/remark] ## Density Increments for Three-Term Progressions How can the absence of a pattern force more structure? Roth-type arguments begin with a dense set $A$ and count the forbidden configurations. If the expected number of configurations is missing, then $A$ must have a large Fourier coefficient or another structured imbalance. That imbalance is converted into a density increment on a subspace or long progression. [definition: Three Term Arithmetic Progression] Let $G$ be an abelian group. A three-term arithmetic progression in $G$ is a triple $(x,x+d,x+2d)$ with $x,d \in G$. [/definition] The progression is non-degenerate when the three terms are distinct. In groups of odd characteristic, this is equivalent to $d \neq 0$. The case $G=\mathbb F_p^n$ is especially clean because subspaces replace long arithmetic progressions as the structured sets on which density is incremented. A density increment argument needs a numerical way to say that $A$ has become more concentrated after passing to a structured region. The comparison is always between the ambient proportion of $A$ and its proportion inside a chosen subset, so we fix notation for both measurements before stating any increment step. [definition: Density] Let $G$ be a finite set and let $A \subset G$. The density of $A$ in $G$ is \begin{align*} \alpha = \frac{|A|}{|G|}. \end{align*} If $V \subset G$ is nonempty, the density of $A$ on $V$ is $|A \cap V|/|V|$. [/definition] The central counting object is the average number of progressions weighted by the indicator function of $A$. Over $\mathbb F_p^n$, Fourier analysis diagonalises this count. For that diagonalisation we need the finite-vector-space version of the Fourier transform, with frequencies identified with vectors in the same space. This notation turns correlations with affine hyperplanes into ordinary Fourier coefficients. [definition: Fourier Transform On A Finite Vector Space] Let $G=\mathbb F_p^n$. The Fourier transform on $G$ is the map from functions $f:G\to \mathbb C$ to functions $\hat f:G\to \mathbb C$ defined by \begin{align*} \hat f(\xi) = \mathbb E_{x \in G} f(x) e^{-2\pi i \xi \cdot x/p} \end{align*} for each $\xi \in G$. [/definition] With this normalisation, the constant coefficient is the mean of $f$. Large nonzero Fourier coefficients detect correlation with the level sets of a character, and those level sets are affine hyperplanes. The reason for computing this transform is that the three-term progression count is an average of $\hat f$ over the frequency relation $\xi+\xi=2\xi$, so a set with no progressions cannot have all of its nonzero coefficients small. Some character must therefore correlate strongly with the set, and because its level sets are affine hyperplanes, the set must be denser on one of them than on the whole space. Iterating this density increment until the density can no longer grow is what bounds the size of a progression-free set, and the following theorem records that bound. [quotetheorem:4606] [citeproof:4606] This is the finite-vector-space model of Roth's theorem. The odd-prime hypothesis is needed so that nonzero common difference gives three distinct terms and the equation $x,x+d,x+2d$ behaves as expected. The conclusion does not claim the best possible dependence on $n$, and it does not classify progression-free sets near the upper bound. Its purpose here is to isolate the density-increment mechanism in a form where the algebra is transparent, before comparing it with the stronger cap-set method. The next example uses the theorem's density-increment mechanism in one concrete Fourier step, showing how a large coefficient produces a denser affine hyperplane. [example: One Density Increment Step In Finite Vector Space] Let $G=\mathbb F_p^n$ with $p$ odd, and suppose $A\subset G$ has density $\alpha$ and a balanced function $f=\mathbf{1}_A-\alpha$ satisfying $|\hat f(\xi)|\ge \eta$ for some nonzero $\xi\in G$. Decompose $G$ into the $p$ affine hyperplanes $H_t=\{x\in G:\xi\cdot x=t\}$ for $t\in\mathbb F_p$. The Fourier coefficient is a weighted average of the density deviations $|A\cap H_t|/|H_t|-\alpha$. If all hyperplane densities were less than $\alpha+\eta/2$, cancellation could not produce a coefficient of size $\eta$; hence some $H_t$ has density at least $\alpha+c_p\eta$. In the Roth argument, $\eta\gtrsim_p \alpha^2$, so this gives a density increment of order $\alpha^2$ on a codimension-one affine subspace. [/example] The point of the example is that the density increment is not a qualitative compactness argument. It is a quantitative replacement of a missing progression count by a specific subspace on which $A$ is denser. ## From Roth To Cap Sets How far can Fourier density increments go? They prove strong theorems for three-term progressions, but their bounds lose power as the required structures become more rigid or the ambient group changes. In $\mathbb F_3^n$, density increments still show that progression-free sets cannot have fixed positive density, but they do not by themselves give the sharp exponential saving. The cap-set problem asks for the largest subset with no three distinct elements summing to zero, which is the same as having no non-degenerate three-term progression. [definition: Cap Set] A cap set is a subset $A \subset \mathbb F_3^n$ containing no three distinct elements $x,y,z \in A$ with \begin{align*} x+y+z=0. \end{align*} [/definition] In characteristic $3$, the condition $x,y,z$ form a three-term progression is equivalent to $x+y+z=0$. This turns an additive-combinatorial question into a problem about tri-colored additive relations. [quotetheorem:4607] [citeproof:4607] This theorem is a modern strengthening of the finite-field Roth phenomenon for $p=3$. The restriction to $\mathbb F_3^n$ is not cosmetic: in characteristic $3$, the three-term progression equation becomes the symmetric relation $x+y+z=0$, which is what the slice-rank method exploits. The theorem gives no exact value for the optimal constant and does not describe the largest cap sets structurally. Its forward role is methodological: it shows where polynomial-method arguments can outperform density increments. The proof is not included in this course; it uses the polynomial method, especially the slice-rank method of Ellenberg and Gijswijt following Croot, Lev, and Pach. [remark: Comparison With Density Increment Bounds] A density-increment proof gives bounds that are polynomially smaller than the ambient space size in finite-vector-space Roth problems. The cap-set theorem gives an exponential saving over $3^n$. This difference marks a genuine change of method: Fourier analysis finds a denser affine subspace, while slice rank bounds the possible size of a progression-free set directly. [/remark] ## Additive Combinatorics As A Route To Arithmetic Progressions Why do sumsets, energy, and structure theorems help with arithmetic progressions? A progression-free set has fewer additive coincidences than a random set of the same density. When this deficit is translated into Fourier or energy language, it produces either a density increment or a structured model. The earlier chapters supply the vocabulary for making this translation quantitative. [quotetheorem:4608] [citeproof:4608] This alternative is the operational heart of the proof. The odd characteristic assumption again prevents the progression equation from degenerating, and the finite-vector-space setting is what makes the increment occur on an affine hyperplane rather than on a long arithmetic progression. The statement does not say that every dense set has a large Fourier coefficient; it says that failure of the expected progression count forces one. Repeatedly applying it inside the hyperplane obtained at the previous step either finds the desired progression or raises the density on a smaller affine space. The iteration is where the quantitative bound enters. The following example records the bookkeeping: density can only increase finitely many times, while each increment spends one dimension. [example: Iterating The Increment] Suppose $A\subset \mathbb F_p^n$ has density $\alpha_0$ and no non-degenerate three-term progression. Define $\alpha_{k+1}\ge \alpha_k+c_p\alpha_k^2$ whenever the density increment alternative is applied. Since $\alpha_k\ge\alpha_0$, after $m$ steps the density has increased by at least $m c_p\alpha_0^2$. The density cannot exceed $1$, so $m\le (1-\alpha_0)/(c_p\alpha_0^2)$. Because each step lowers the dimension by one, the process is possible only when $n$ is at least a constant multiple of $\alpha_0^{-2}$ under this rough bookkeeping; sharper bookkeeping yields the stated $O_p(\alpha^{-1})$ threshold in the Fourier proof. [/example] The final message of the chapter is that additive combinatorics gives two complementary kinds of application. Sum-product theory turns structural incompatibility into expansion, feeding problems where growth is the desired conclusion. Density-increment theory turns the absence of arithmetic patterns into stronger local density, feeding problems where eventual contradiction is the desired conclusion. Both applications use the same philosophy: a finite set with too many additive or multiplicative coincidences must reveal structure somewhere. The applications chapter showed how additive-combinatorial tools drive expansion arguments and density increment schemes. Those ideas culminate in the Green-Tao theorem, where the goal is to synthesize structure, uniformity, and dense-model arguments into a single proof strategy. # 9. The Green-Tao Theorem as a Structural Synthesis This chapter explains how the previous parts of the course assemble into the proof strategy behind the Green-Tao theorem. Chapters 6 through 8 developed two recurring ideas: dense sets contain arithmetic structure, and sparse sets can sometimes be studied by replacing them with dense models. The primes are too sparse for Szemerédi's theorem to apply directly, but they are not sparse in an arbitrary way; after removing small-prime congruence obstructions, they behave randomly enough for a transference argument. The guiding question is: how can a theorem about dense subsets of intervals be transferred to a subset of the primes, whose natural density in the integers is zero? The answer has three components. First, replace counting measure by a pseudorandom majorant. Second, prove a relative version of Szemerédi's theorem inside that majorant. Third, verify that the primes, after the $W$-trick and normalisation, have positive relative density inside such a majorant. ## Why Direct Density Arguments Fail for the Primes The first obstruction is numerical: the primes are not a dense subset of $\{1,\dots,N\}$. The prime number theorem gives \begin{align*} |\mathcal P \cap \{1,\dots,N\}| \sim \frac{N}{\log N}, \end{align*} so the density tends to $0$. Szemerédi's theorem only applies to subsets whose density is bounded below independently of $N$, and so it cannot be inserted directly into the primes. [definition: Arithmetic Progression of Length k] Let $k \ge 1$. An arithmetic progression of length $k$ in $\mathbb Z$ is a set of the form \begin{align*} \{a, a+d, a+2d, \dots, a+(k-1)d\} \end{align*} with $a,d \in \mathbb Z$ and $d \neq 0$. [/definition] The sign of $d$ is immaterial for existence questions, but in finite intervals it is common to take $d>0$. Because the primes are too sparse for Szemerédi's theorem to apply to them directly, we first isolate the dense statement we ultimately want to transport to the sparse setting. The question it settles is whether positive density alone, with no arithmetic information about which integers lie in the set, already forces arbitrarily long progressions. The following theorem of Szemerédi says that it does, and it is the dense engine that the later transference principle will run on top of a sparse pseudorandom model. [quotetheorem:4609] [citeproof:4609] In this course the theorem is used as a structural input rather than reproved in full. Its hypotheses matter in two separate ways: the lower density bound excludes sparse counterexamples such as a set with only $N/\log N$ elements, and the largeness condition on $N$ allows finite initial exceptions depending on $k$ and $\delta$. The theorem does not say that every sparse set contains progressions, nor does it supply useful information about the primes by itself. For our purposes, Szemerédi's theorem is the dense model result: it says that density alone forces progressions, and this is the dense statement that transference will later imitate inside a pseudorandom majorant. [example: Density Zero of the Primes] Let $\mathcal P$ denote the set of primes. In $\{1,\dots,N\}$ the relative size of $\mathcal P$ is \begin{align*} \frac{|\mathcal P \cap \{1,\dots,N\}|}{N} \sim \frac{1}{\log N}. \end{align*} Thus for any fixed $\delta>0$, the inequality $|\mathcal P \cap \{1,\dots,N\}| \ge \delta N$ fails for all sufficiently large $N$. This calculation shows that Szemerédi's theorem cannot be applied to the primes using ordinary density in the interval. [/example] The failure is not merely a technicality. A sparse set with density on the scale \begin{align*} \frac{1}{\log N} \end{align*} may have no long progressions at all, so the conclusion must rely on special distributional information about the primes. ## The Transference Problem The transference problem asks whether a dense theorem remains true when counting measure is replaced by a sparse measure that behaves randomly with respect to the configurations being counted. In the Green-Tao argument, the dense subset is not a subset of $\{1,\dots,N\}$ in the usual sense, but a function bounded above by a majorant $\nu$ whose total mass is comparable to $N$. [definition: Measure on a Finite Cyclic Group] Let $N \in \mathbb N$. A measure on $\mathbb Z/N\mathbb Z$ is a function $\nu: \mathbb Z/N\mathbb Z \to [0,\infty)$ satisfying \begin{align*} \mathbb E_{x \in \mathbb Z/N\mathbb Z} \nu(x) = 1 + o(1) \end{align*} for the limiting parameter under discussion. [/definition] This normalisation makes $\nu$ comparable to the constant function $1$ in average size, even if it is large on a sparse set and small elsewhere. A function $f$ with $0 \le f \le \nu$ is then interpreted as a relatively dense subset of the support of $\nu$ when $\mathbb E f$ is bounded below. [definition: Positive Relative Density] Let $\nu: \mathbb Z/N\mathbb Z \to [0,\infty)$ be a measure. A function $f: \mathbb Z/N\mathbb Z \to [0,\infty)$ has relative density at least $\delta$ inside $\nu$ if \begin{align*} 0 \le f(x) \le \nu(x) \quad \text{for all } x, \qquad \mathbb E_{x \in \mathbb Z/N\mathbb Z} f(x) \ge \delta. \end{align*} [/definition] The point is that $f$ may represent a subset of a sparse ambient set, yet still have average bounded below because $\nu$ has been scaled to have mean about $1$. [example: Relative Density After Normalisation] Suppose $S \subset \{1,\dots,N\}$ has size about $N/\log N$, and define \begin{align*} \nu(x) = (\log N)\mathbb{1}_S(x). \end{align*} Then $\mathbb E \nu \approx 1$. If $A \subset S$ has $|A| \ge c|S|$ and we put $f(x)=(\log N)\mathbb{1}_A(x)$, then $0 \le f \le \nu$ and \begin{align*} \mathbb E f \approx c. \end{align*} Thus $A$ has density zero in $\{1,\dots,N\}$ but positive relative density inside the weighted ambient measure $\nu$. [/example] This example is the model for the primes. The measure $\nu$ cannot be the rescaled prime indicator itself, because it must satisfy strong pseudorandomness estimates that are easier to prove for a smoother enveloping sieve weight. [definition: Pseudorandom Majorant] Let $k \ge 1$ be fixed. A pseudorandom majorant for $k$-term arithmetic progressions is a family of measures $\nu_N: \mathbb Z/N\mathbb Z \to [0,\infty)$ such that $\mathbb E \nu_N=1+o(1)$ and such that the linear forms estimates needed to count $k$-term arithmetic progressions in functions bounded by $\nu_N$ hold with the same main terms as for the constant measure $1$. [/definition] The phrase "linear forms estimates" is doing real work: it means that products of $\nu_N$ evaluated at several affine-linear forms have average $1+o(1)$ whenever the forms are the finite collection required by the counting and dense model steps. With a pseudorandom majorant in hand we can finally pose the question the whole construction was built to answer: does positive density measured relative to $\nu_N$, rather than relative to the uniform measure, still force long progressions? An affirmative answer would let us replace the constant ambient measure in Szemerédi's theorem by the sparse enveloping weight, and thereby reach sets as thin as the primes. The relative Szemerédi theorem supplies exactly this conclusion, provided the majorant satisfies the linear forms condition just isolated. [quotetheorem:4610] [citeproof:4610] This theorem is quoted here as the transference input. The domination hypothesis $0 \le f_N \le \nu_N$ is what lets the sparse function be compared with a dense model; without it, a large average alone gives no control over the exceptional spikes of $f_N$. The pseudorandomness hypothesis is also essential: a measure such as $2\mathbb{1}_{2\mid x}$ has mean $1$ but distorts progression-counting forms. The conclusion does not assert that every sparse set of positive relative density contains progressions; it applies only when the ambient measure passes the required linear forms tests. The average includes all $r$, including $r=0$, but diagonal progressions contribute only $O(1/N)$ to the normalised count when $f_N$ is controlled by a pseudorandom majorant, so the positive lower bound forces nontrivial progressions for large $N$. Relative Szemerédi is the formal bridge between dense combinatorics and the primes. Once the primes are placed under a suitable pseudorandom majorant with positive relative density, the theorem supplies progressions. ## Gowers Norms and Arithmetic Regularity The next question is what kind of randomness a measure or function must satisfy so that progression counts are stable. For arithmetic progressions of length $k$, the relevant uniformity is measured by the Gowers $U^{k-1}$ norm. [definition: Gowers Uniformity Norm] Let $G$ be a finite abelian group and let $s \ge 1$. The Gowers uniformity norm is the map \begin{align*} \|\cdot\|_{U^s(G)}: \mathbb C^G \to [0,\infty) \end{align*} defined for $f:G \to \mathbb C$ by \begin{align*} \|f\|_{U^s(G)}^{2^s} = \mathbb E_{x,h_1,\dots,h_s \in G} \prod_{\omega \in \{0,1\}^s} \mathcal C^{|\omega|} f(x+\omega_1h_1+\cdots+\omega_s h_s), \end{align*} where $\mathcal C$ denotes complex conjugation. [/definition] For real-valued indicator-type functions, the norm measures how much the function correlates with additive cube patterns. Large $U^1$ norm detects a nonzero mean; large $U^2$ norm detects large Fourier coefficients; higher norms detect higher-order additive structure. The reason these norms are introduced at all is that we want a quantity that controls how many arithmetic progressions a function carries. The question is whether smallness in a single uniformity norm is enough to render a factor invisible to progression counts, regardless of the other factors present. The next theorem answers this: a bounded function with small $U^{k-1}$ norm contributes negligibly to the count of $k$-term progressions whenever it occupies any one slot. [quotetheorem:4611] [citeproof:4611] This principle is one of the places where the exact choice of uniformity norm is forced. The boundedness hypothesis prevents a single large spike from dominating the progression average, and the use of $U^{k-1}$ is sharp for this Cauchy-Schwarz method because a $k$-term progression has $k-1$ independent shift directions after differencing. The theorem does not count progressions by itself; it only says that any term containing a uniform factor is negligible. This is why later arguments first decompose a function into structured and uniform parts before counting configurations. [example: Three-Term Progressions and Fourier Uniformity] For $k=3$, the controlling norm is $U^2$. On $G=\mathbb Z/N\mathbb Z$, a function with small $U^2(G)$ norm has small Fourier coefficients, so its contribution to \begin{align*} \mathbb E_{x,r} f_0(x)f_1(x+r)f_2(x+2r) \end{align*} is small whenever it appears in one slot. This recovers the familiar Fourier-analytic proof pattern for Roth-type results: split a function into structured and uniform parts, count progressions in the structured part, and bound all terms containing a uniform factor. [/example] Arithmetic regularity packages this decomposition systematically. A bounded function is decomposed into a structured part, a uniform error, and a small exceptional error; the structured part is measurable with respect to a factor that captures the relevant additive information. [remark: Arithmetic Regularity Requires Formal Data] The generalized von Neumann inequality tells us that uniform factors can be discarded, but it does not by itself produce the splitting into structured and uniform parts. A fully formal arithmetic regularity lemma must specify what "structured" means, what complexity measure is being controlled, which uniformity threshold is required, and how small the exceptional $L^2$ error must be. Different settings use different structured-factor models, such as character factors, polynomial phase factors, nilspace factors, or nilsequence factors. In these notes, the regularity lemma should therefore be read as a methodological principle rather than as a single theorem card: once the structured model and quantitative parameters are fixed, it supplies the structured-plus-uniform decomposition on which the counting strategy depends. [/remark] The boundedness hypothesis keeps the $L^2$ energy finite and normalised, while the finite ambient group gives a setting in which conditional expectations onto finite-complexity factors are well behaved. The regularity viewpoint does not identify the structured part uniquely, and in outline form it hides the quantitative dependence of the complexity bound on the threshold and error tolerance. In the Green-Tao proof, the same philosophy appears through dense model and counting lemmas rather than as a single elementary decomposition. The logic persists: uniform components do not affect progression counts, while structured components are controlled by dense combinatorics. ## Inverse Theorems and Linear Forms Estimates Uniformity norms are useful only when large norm has a structural explanation. The inverse problem asks: if $\|f\|_{U^s}$ is large, what object must $f$ correlate with? [remark: Inverse Theorem for Gowers Norms] The Green-Tao-Ziegler inverse theorem answers this question in the finite-interval setting: after fixing the normalisation of the $U^s([N])$ norm, a bounded function with large $U^s$ norm correlates quantitatively with an $(s-1)$-step nilsequence of bounded complexity. The full statement requires nilmanifolds and quantitative equidistribution theory, so these notes use it as a black-box structural principle rather than as a self-contained theorem proved here. [/remark] The role of the inverse theorem is conceptual: high Gowers norm is not mysterious randomness failure; it is correlation with controlled algebraic structure. Linear forms estimates are the complementary pseudorandomness input. Instead of identifying structure in a dense function, they assert that the majorant does not bias the finite systems of affine-linear forms used to count progressions. The issue is that having average value $1$ is not enough: a weight may still correlate with the same patterns the proof is trying to count. The next condition rules out those correlations by testing the weight simultaneously along every affine-linear system needed in the argument. [definition: Linear Forms Condition] Let $\nu_N: \mathbb Z/N\mathbb Z \to [0,\infty)$ be a family of measures. Fix a finite collection of systems. A system in the collection consists of an integer $d \ge 1$ and affine-linear maps \begin{align*} \psi_i:(\mathbb Z/N\mathbb Z)^d \to \mathbb Z/N\mathbb Z, \qquad 1 \le i \le m. \end{align*} The family satisfies the linear forms condition for this collection if, for every such system, \begin{align*} \mathbb E_{y \in (\mathbb Z/N\mathbb Z)^d} \prod_{i=1}^{m} \nu_N(\psi_i(y)) = 1+o(1). \end{align*} [/definition] This condition says that the measure behaves like the constant function $1$ on the exact configurations that the proof tests. It is stronger than merely having mean $1$, because correlations between several shifts of $\nu_N$ must also have the expected size. [example: Mean One Does Not Imply Pseudorandomness] Let $N$ be even and define $\nu(x)=2\mathbf{1}_{2\mid x}$ on $\mathbb Z/N\mathbb Z$. Then $\mathbb E \nu=1$, but \begin{align*} \mathbb E_{x,r} \nu(x) \nu(x+r) \nu(x+2r) \end{align*} is biased by parity constraints. Three-term progressions with odd common difference alternate parity, so the measure does not behave like $1$ on progression-counting forms. This example shows why a majorant for the primes must remove local congruence obstructions before any randomness assertion can be true. [/example] The same issue is severe for the primes: except for the prime $2$, all primes are odd, and modulo any small prime $p$ they avoid the residue class $0$. The $W$-trick removes these deterministic local obstructions. ## The W-Trick and the Prime Majorant Before the primes can be treated as pseudorandom, one isolates a residue class with no small-prime obstruction. Let \begin{align*} W = \prod_{p \le w} p, \end{align*} where $w=w(N)$ grows slowly with $N$. Instead of studying primes $n$ directly, study numbers of the form $Wn+b$, where $\gcd(b,W)=1$. [definition: W-Tricked von Mangoldt Function] Let $W=\prod_{p\le w}p$ and let $b$ satisfy $\gcd(b,W)=1$. The $W$-tricked von Mangoldt function is the map \begin{align*} \Lambda_{W,b}:\mathbb N \to \mathbb R_{\ge 0}, \qquad n \mapsto \frac{\varphi(W)}{W}\Lambda(Wn+b), \end{align*} where $\Lambda$ is the von Mangoldt function and $\varphi$ is Euler's totient function. [/definition] The factor $\varphi(W)/W$ renormalises the expected average. After restricting to a reduced residue class modulo $W$, the most visible congruence biases have been divided out. [example: Positive Relative Density of the Primes in a Sieve Majorant] Assume a sieve weight $\nu_N$ majorises a normalised prime weight on $\{1,\dots,N\}$, for instance $0 \le c\Lambda_{W,b}(n) \le \nu_N(n)$ after harmless truncation. The prime number theorem in arithmetic progressions gives \begin{align*} \mathbb E_{1\le n\le N} \Lambda_{W,b}(n) \approx 1. \end{align*} Thus the prime weight has positive relative density inside $\nu_N$. Applying relative Szemerédi to $f(n)=c\Lambda_{W,b}(n)$ yields many progressions in the $n$-variable, and these become progressions $Wn+b$ in the primes whenever the von Mangoldt weight is supported on prime values. [/example] The actual Green-Tao proof uses an enveloping sieve majorant rather than the prime weight itself. The sieve majorant is designed to dominate the primes while satisfying the necessary linear forms condition. With the dense input, the transference principle, and the sieve majorant assembled, the question is whether positive relative density inside a pseudorandom majorant is genuinely enough to inherit Szemerédi's conclusion. We want a statement that runs the dense counting argument inside the sparse universe cut out by the majorant. The following theorem records this transference result and, applied to the primes, yields arbitrarily long progressions. [quotetheorem:4614] [citeproof:4614] The theorem is a synthesis rather than a single new density argument. The condition that the majorant satisfy linear forms estimates is essential; arbitrary sparse subsets of the integers do not inherit Szemerédi's conclusion. The theorem also does not provide practically sized effective bounds for the first $k$-term prime progression. Its role here is structural: Szemerédi supplies progressions in dense sets, transference lets dense arguments run inside a pseudorandom sparse universe, and analytic number theory supplies the pseudorandom majorant for the primes. ## How the Pieces Fit Together The final question is how to remember the architecture without losing the role of each technical ingredient. The proof has a dense combinatorial side and an analytic number theory side, joined by transference. [explanation: Structural Synthesis] The dense side begins with Szemerédi's theorem: positive density in an interval forces arithmetic progressions. Gowers norms explain which kind of uniformity controls progression counts, while arithmetic regularity and inverse theorems explain how a function decomposes into structured and uniform pieces. The sparse side begins with the primes and their local congruence biases. The $W$-trick removes the fixed small-prime obstructions, and the enveloping sieve constructs a majorant with average $1+o(1)$. Linear forms estimates certify that this majorant behaves like the constant measure for the configurations used in the dense argument. The bridge is the relative Szemerédi theorem. It says that if $f$ is bounded by a pseudorandom measure and has positive average, then $f$ contains the same kind of arithmetic configurations as a dense subset of a uniform ambient space. In this sense, Green-Tao is not a direct strengthening of Szemerédi's theorem to all sparse sets; it is a theorem about sparse sets that admit sufficiently strong pseudorandom majorants. [/explanation] The explanation gives the organizing idea; the example now supplies a test case. This keeps the next construction tied to computations rather than only to terminology. [example: A Sparse Set Where Transference Would Fail] Let $S$ be the set of even numbers in $\{1,\dots,N\}$ and put $\nu=2\mathbf{1}_S$. The set $S$ has density $1/2$ in the usual interval, so it contains long arithmetic progressions for elementary reasons, but $\nu$ does not model the uniform measure on all linear forms because it forces parity. If a sparse set concentrates on residue classes that distort the progression-counting forms, a relative theorem with respect to that measure cannot be applied without first factoring out the obstruction. The $W$-trick is precisely this factoring-out step for the primes. [/example] The chapter therefore closes the course by connecting its main themes: additive configurations are controlled by uniformity norms; lack of uniformity is structural; structure can be regularised; and sparse number-theoretic sets can inherit dense combinatorial conclusions when their ambient measure is pseudorandom enough. The Green-Tao theorem chapter brought together energies, Fourier analysis, regularity, and density increments as a finitary route to arithmetic patterns. The final chapter broadens that perspective by showing how ergodic ideas and correspondence principles package the same conclusions in a different language, linking combinatorics back to measure-theoretic dynamics. # 10. Ergodic Connections and Correspondence Principles Chapters 2 through 9 developed finitary tools for detecting additive structure: energies, covering, Fourier analysis, regularity, transference, and density increments. This chapter explains a different route to the same qualitative theorems, in which a dense subset of the integers is encoded as a measurable set inside a dynamical system. The payoff is conceptual: recurrence in dynamics becomes arithmetic progression structure in number theory, and Szemeredi's theorem appears as a shadow of multiple recurrence. The point is not that ergodic methods replace finitary arguments in quantitative additive combinatorics. Rather, they isolate the limiting mechanism behind density increment proofs: a set of positive density cannot avoid returning to itself along several correlated shifts forever. We will build the correspondence principle, state Furstenberg's multiple recurrence theorem, and see how it implies Szemeredi's theorem and its multidimensional extensions. ## From Dense Sets to Dynamical Systems How can a subset of $\mathbb N$ become an object to which measure theory applies? A finite density statement only sees long intervals, so the construction records all translates of the indicator sequence and takes a compact closure. Positive upper density then becomes positive measure for a cylinder event. [definition: Upper Density] Let $A \subset \mathbb N$. The upper density of $A$ is \begin{align*} \overline{d}(A) = \limsup_{N \to \infty} \frac{|A \cap \{1,\dots,N\}|}{N}. \end{align*} [/definition] A set has positive upper density when it occupies a positive proportion of infinitely many initial intervals. The correspondence principle will use a sequence of such intervals on which the density approaches the limsup. The difficulty is that upper density is only an asymptotic statistic on finite intervals, while recurrence theorems require events inside a probability space that can be shifted without changing their measure. The correspondence principle resolves this by replacing translates of the original set with a shift action on a limiting space, so we first name the dynamical structure that makes such recurrence statements meaningful. [definition: Measure-Preserving System] A measure-preserving system is a quadruple $(X, \mathcal B, \mu, T)$ where $(X, \mathcal B, \mu)$ is a probability space and $T:X \to X$ is measurable with \begin{align*} \mu(T^{-1}E)=\mu(E) \end{align*} for every $E \in \mathcal B$. [/definition] Here $T$ should be thought of as time evolution. In the correspondence principle, $T$ is the left shift on binary sequences, and recurrence of a measurable set under $T$ records simultaneous containment of several shifted copies of $A$. [example: The Shift System of a Dense Set] Let $A \subset \mathbb N$ satisfy $\overline d(A)>0$, and let $x_A \in \{0,1\}^{\mathbb Z}$ be a two-sided sequence with $x_A(n)=\mathbf{1}_A(n)$ for $n \ge 1$, extended arbitrarily to $n \le 0$. Let $T:\{0,1\}^{\mathbb Z}\to \{0,1\}^{\mathbb Z}$ be the shift $(Tx)(n)=x(n+1)$, and let $X$ be the closure of $\{T^m x_A:m\in\mathbb Z\}$ in the product topology. If $B=\{x\in X:x(0)=1\}$, then visits of $x_A$ to \begin{align*} B \cap T^{-h_1}B \cap \cdots \cap T^{-h_k}B \end{align*} correspond to integers $n$ for which $n,n+h_1,\dots,n+h_k$ all lie in $A$. A limiting empirical measure along intervals where $A$ has near-maximal density gives a $T$-invariant probability measure $\mu$ on $X$ with $\mu(B)=\overline d(A)$ along the chosen subsequence. [/example] The example hides a compactness step. The space $\{0,1\}^{\mathbb Z}$ is compact by Tychonoff's theorem, so empirical averages have weak* limit points. These limit points are shift-invariant because boundary errors vanish when intervals become long. Having built the shift system and its invariant measure, we need to know what dynamical recurrence buys back in the integers. The question is whether positive-measure recurrence in $(X,\mu,T)$ can be translated into actual combinatorial configurations inside the original set $A$. The correspondence principle records this translation in the direction that applications require. [quotetheorem:4615] [citeproof:4615] The inequality is the direction needed for applications: if the dynamical system has positive measure recurrence for a pattern, then the original set has positive upper density of translates of that pattern. In particular, existence of one recurrent shift gives at least one corresponding arithmetic configuration in $A$. Only this implication is used; the reverse passage from combinatorics to dynamics was already supplied by the orbit-closure construction, so the principle functions as a bridge rather than an equivalence. The positive upper density hypothesis on $A$ is essential, since it is exactly what guarantees that the empirical measures do not escape to a trivial limit with $\mu(B)=0$. The statement is also purely qualitative: it transfers existence of recurrence but carries no quantitative bound on how often the configuration appears. This is why the next step strengthens single recurrence to multiple recurrence, which is the property that detects full arithmetic progressions rather than a single shifted pair. ## Multiple Recurrence as the Dynamical Core What dynamical statement corresponds to finding long arithmetic progressions? A $k$-term arithmetic progression in $A$ is an integer $n$ and a common difference $r$ such that $n,n+r,\dots,n+(k-1)r$ lie in $A$. In the shift system this is exactly the non-emptiness, and in fact positive measure, of \begin{align*} B\cap T^{-r}B\cap \cdots \cap T^{-(k-1)r}B. \end{align*} The correspondence principle reduces the combinatorial problem to finding a positive-measure simultaneous return of $B$ to itself under the powers $T^r,T^{2r},\dots,T^{(k-1)r}$. Ordinary recurrence only gives one return of $B$ and is too weak to force all positions of a progression at once. The needed dynamical input is multiple recurrence: positive measure sets must return along a common time in all these shifted copies simultaneously. [quotetheorem:4616] [citeproof:4616] The theorem is qualitative: it asserts existence of a return time, not an efficient bound for how far one must search. That is why it proves Szemeredi's theorem in its qualitative form but does not reproduce the quantitative information obtained from density-increment or regularity methods. [example: Three-Term Recurrence] Take $k=3$. If $\mu(B)>0$, multiple recurrence gives $r\ge 1$ with \begin{align*} \mu(B\cap T^{-r}B\cap T^{-2r}B)>0. \end{align*} For a shift system arising from $A\subset\mathbb N$, this means that the set of $n$ for which $n,n+r,n+2r\in A$ has positive upper density along the corresponding subsequence. Thus the dynamical recurrence statement is exactly the three-term progression statement after translation through the correspondence principle. [/example] ## Szemeredi's Theorem from Correspondence Why does recurrence imply a theorem about dense subsets of integers rather than only a theorem about limiting systems? The correspondence principle preserves enough finite intersection data to pull a positive-measure recurrent intersection back to a positive-density intersection of integer shifts. [quotetheorem:4617] [citeproof:4617] This proof is short because the hard work is concentrated in multiple recurrence. Finitary proofs distribute the work differently: they search inside a large interval for a density increment on a subprogression or a structured Bohr-type set, then iterate until the desired pattern is forced. [remark: Qualitative Versus Quantitative Information] The ergodic proof gives an existence theorem for every set of positive upper density. It does not give practical bounds for the least $N$ such that every subset of $\{1,\dots,N\}$ of density at least $\delta$ contains a $k$-term progression. Quantitative bounds require finitary replacements for compactness, recurrence, and characteristic-factor arguments. [/remark] ## Ergodic Proofs and Density-Increment Proofs What is the conceptual relation between Furstenberg's proof and Roth-style density increments? Both arguments divide behaviour into uniform and structured components. A uniform component cannot correlate with the pattern-counting operator, while a structured obstruction either already produces many patterns or yields a place where the set has increased density. [explanation: Translating the Dictionary] In finitary additive combinatorics, a set $A\subset\{1,\dots,N\}$ with too few progressions has a large Fourier coefficient, a large energy statistic, or a non-random factor detected by a regularity lemma. That obstruction is then used to pass to a subprogression, Bohr set, or factor on which the relative density of $A$ increases. In ergodic theory, the same obstruction appears as a nontrivial factor of the measure-preserving system. Conditional expectation onto that factor is the structured part, and the orthogonal complement is the uniform part. Recurrence is proved after showing that the uniform part contributes nothing to the limiting multiple averages and that the structured factor has enough compactness or algebraic recurrence. The correspondence principle explains why these two languages prove the same qualitative phenomenon. Finitary proofs are effective because every step occurs in a large but finite ambient set. Ergodic proofs are often shorter because compactness and limiting objects absorb the bookkeeping. [/explanation] The explanation gives the organizing idea; the example now supplies a test case. This keeps the next construction tied to computations rather than only to terminology. [example: Density Increment as a Finite Recurrence Mechanism] Suppose $A\subset\{1,\dots,N\}$ has density $\delta$ and contains no three-term arithmetic progression. Roth's Fourier-analytic argument finds a nonzero frequency on which $\mathbf{1}_A-\delta\mathbf{1}_{[N]}$ has large Fourier coefficient. This frequency identifies a long progression $P\subset\{1,\dots,N\}$ on which the density of $A$ is at least $\delta+c\delta^2$ for an absolute constant $c>0$, after standard smoothing and pigeonholing. Iterating this procedure cannot continue indefinitely because the density is bounded above by $1$, so a progression must eventually appear. The ergodic analogue is that a non-random component is captured by a factor rather than by a single finite subprogression. [/example] ## Multidimensional Correspondence What changes when the patterns live in $\mathbb Z^d$ rather than in $\mathbb N$? The shift is replaced by a commuting $\mathbb Z^d$-action, and arithmetic progressions are replaced by finite configurations generated by several translation vectors. [definition: Commuting Measure-Preserving Action] A commuting measure-preserving $\mathbb Z^d$-action on a probability space $(X,\mathcal B,\mu)$ is a family $(T^n)_{n\in\mathbb Z^d}$ of measurable maps $T^n:X\to X$ such that \begin{align*} T^{m+n}=T^m\circ T^n,\qquad T^0=\operatorname{id}_X, \end{align*} and each $T^n$ preserves $\mu$. [/definition] The same orbit-closure construction works for subsets of $\mathbb Z^d$ with positive upper Banach density, using boxes instead of intervals. Recurrence for commuting actions then becomes a theorem about homothetic copies of finite configurations. This raises the question of whether the single-transformation recurrence theorem has a genuine several-variable analogue. We want a statement that, given several commuting measure-preserving actions, still forces a positive-measure set to return simultaneously along a common scaling parameter, so that homothetic copies of a fixed finite configuration are detected. The following theorem provides exactly this multidimensional multiple recurrence. [quotetheorem:4618] [citeproof:4618] The theorem is a recurrence statement rather than a quantitative counting theorem: it guarantees a common return time $n$ but does not bound how large the first such $n$ must be. The commutativity of the actions is essential, because the translated copies of a finite configuration must be assembled consistently from several coordinate directions. Its role here is to turn the correspondence principle into combinatorial content: once a dense set has been encoded as a measure-preserving system, multidimensional recurrence produces the grid-shaped patterns that the next example spells out concretely. [example: Corners in a Dense Subset of the Grid] Let $E\subset\mathbb Z^2$ have positive upper Banach density, and take \begin{align*} F=\{(0,0),(1,0),(0,1)\}. \end{align*} The multidimensional theorem gives $a=(a_1,a_2)$ and $r\ge 1$ such that \begin{align*} (a_1,a_2),\qquad (a_1+r,a_2),\qquad (a_1,a_2+r) \end{align*} all lie in $E$. This is the corner configuration, the two-dimensional analogue of a three-term arithmetic progression. [/example] ## What the Correspondence Principle Does Not Preserve Which information is lost when passing from a finite set to a limiting dynamical system? The construction preserves positivity of densities and finite intersection patterns, but it discards numerical bounds and much of the finite geometry of the original ambient interval. [remark: Loss of Bounds] The limiting system remembers that some subsequence of intervals sees density close to $\overline d(A)$. It does not remember a usable bound for the first interval on which a progression must occur. This is the main reason ergodic proofs are powerful for qualitative recurrence but less suited to explicit estimates. [/remark] The first remark settles one point of interpretation. The following remark, Dependence on Subsequences, records a second boundary case before the notes move on. [remark: Dependence on Subsequences] Different choices of density-realising intervals can produce different invariant measures. The correspondence principle only requires one such limiting system, because any positive recurrence in that system transfers back to a positive upper-density statement for $A$. [/remark] The chapter closes the course by placing additive combinatorics next to ergodic theory. The previous chapters supplied finite mechanisms for structure, randomness, and iteration; this chapter shows that, at the level of existence, those mechanisms are manifestations of recurrence in measure-preserving dynamics.

Created by admin on 5/31/2026 | Last updated on 6/1/2026

Sign in to Androma

Check your inbox

One last step

Additive Combinatorics

Prerequisites (0/3 completed)

Prerequisites Graph

Rate this page