This course develops the analytic tools used to study additive problems in number theory, with a focus on exponential sums, Fourier-analytic methods, and the circle method. The central goal is to understand how arithmetic structure in integers, primes, and polynomial sequences is reflected in oscillatory sums, and how that structure can be exploited to count representations, bound error terms, and prove quantitative distribution results.
The early chapters build the basic language: additive characters and finite Fourier analysis provide the framework for decomposing arithmetic functions into frequency components, while Weyl sums and polynomial phases introduce the main class of oscillatory sums that arise in Diophantine questions. From there, van der Corput estimates supply key cancellation techniques, and Vinogradov mean value estimates sharpen the control needed for higher-degree polynomial problems. These ideas culminate in the Hardy-Littlewood circle method, which turns Fourier analysis into a systematic machine for counting solutions to additive equations.
Later chapters apply this machinery to classical and modern problems. Waring’s problem and additive problems with prime variables show how the method handles representations by powers and primes, while additive energy, density, and Fourier control connect exponential-sum methods to combinatorial structure. The final chapter on fractional parts and Diophantine applications shows how the same analytic principles govern distribution modulo one and related approximation problems, tying together the course’s analytic, combinatorial, and Diophantine themes.
# Introduction
This course studies how finite Fourier analysis converts additive questions about integers into estimates for oscillatory sums. The guiding problem is to count solutions of equations such as
\begin{align*}
x_1^k+\cdots+x_s^k=n,
\end{align*}
or to understand when a structured set of integers has the expected number of additive representations. The main technical lesson is that cancellation in exponential sums is the mechanism that separates a main term from an error term.
The course sits after a first course in analytic number theory. We use congruences, elementary asymptotic notation, real and complex analysis, Fourier analysis on the circle, and basic measure and integration. Dirichlet series, sieve methods, and $L$-functions may appear as context, but the core method throughout is additive Fourier analysis.
## What Is Being Counted
How can an equation in integers be turned into something Fourier analytic? The common pattern is to encode a set or sequence by a generating exponential sum, multiply such sums to impose an additive relation, and then extract the desired coefficient by orthogonality. This viewpoint is useful because the size of an exponential sum measures how much arithmetic structure remains after oscillation.
The first kind of object is a representation function. It records how many ways an integer can be assembled from a prescribed list of summands, and it is the bridge between additive number theory and Fourier coefficients.
[definition: Representation Function]
Let $A_1,\dots,A_s$ be finite subsets of $\mathbb Z$. The representation function associated to $(A_1,\dots,A_s)$ is the function $R_{A_1,\dots,A_s}:\mathbb Z\to\mathbb N\cup\{0\}$ defined by
\begin{align*}
R_{A_1,\dots,A_s}(n)=\#\{(a_1,\dots,a_s)\in A_1\times\cdots\times A_s:a_1+\cdots+a_s=n\}.
\end{align*}
[/definition]
This definition is deliberately finite: it lets us count exactly before taking limits. The later analytic work asks whether $R_{A_1,\dots,A_s}(n)$ is close to a predicted main term when the sets vary with a parameter such as $N$.
[example: Sum Of Two Intervals]
Let $A=B=\{1,\dots,N\}$, with $N\ge 1$. For a fixed integer $n$, the value $R_{A,B}(n)$ is the number of integers $a$ such that $1\le a\le N$ and $1\le n-a\le N$. The second condition is equivalent to $n-N\le a\le n-1$, so the admissible values of $a$ are exactly the integers in the interval
\begin{align*}
\max(1,n-N)\le a\le \min(N,n-1).
\end{align*}
If $n<2$, then $a+b\ge 1+1=2$, so there are no pairs. If $n>2N$, then $a+b\le N+N=2N$, so again there are no pairs.
Now assume $2\le n\le 2N$. If $2\le n\le N+1$, then $n-N\le 1$ and $n-1\le N$, so the admissible interval is $1\le a\le n-1$. Hence
\begin{align*}
R_{A,B}(n)=(n-1)-1+1=n-1.
\end{align*}
If $N+1\le n\le 2N$, then $n-N\ge 1$ and $n-1\ge N$, so the admissible interval is $n-N\le a\le N$. Hence
\begin{align*}
R_{A,B}(n)=N-(n-N)+1=2N+1-n.
\end{align*}
Combining the two ranges gives
\begin{align*}
R_{A,B}(n)=\min(n-1,2N+1-n)
\end{align*}
for $2\le n\le 2N$, and $R_{A,B}(n)=0$ otherwise. Thus the representation function rises by one at each step until $n=N+1$, then falls by one at each step, giving a triangular large-scale shape before any cancellation estimate enters.
[/example]
The finite interval example is too regular to reveal the main difficulty. In problems such as Waring's problem, the summands are powers rather than consecutive integers, and the representation function is controlled by sums with polynomial phases.
## Exponential Sums As Measuring Devices
Where does cancellation enter the count? If the terms in a complex sum point in nearly the same direction, the sum is large; if they rotate sufficiently, the sum is smaller than its number of terms. Analytic number theory turns this geometric idea into quantitative bounds.
[definition: Exponential Notation]
The exponential notation is the map
\begin{align*}
e:\mathbb R&\to\mathbb C, & x&\mapsto \exp(2\pi i x).
\end{align*}
[/definition]
The normalization makes $e(x)$ periodic with period $1$, so it is adapted to arithmetic modulo $q$ and to integration over $\mathbb R/\mathbb Z$. Once this notation is fixed, we can package a weighted arithmetic sequence together with an oscillating phase and ask whether the resulting complex sum cancels.
[definition: Exponential Sum]
Let $(a_n)_{n\in I}$ be a finite complex sequence indexed by a finite set $I\subset\mathbb Z$, and let $\phi:I\to\mathbb R$ be a phase function. The associated exponential sum is
\begin{align*}
S=\sum_{n\in I}a_n e(\phi(n)).
\end{align*}
[/definition]
The phase may be linear, polynomial, rational, or defined by a congruence. The weights $a_n$ allow smooth cutoffs, arithmetic functions, or indicator functions of sets to be included without changing the basic formalism.
[example: Complete Linear Sum]
Let $a\in\mathbb Z$ and $q\ge 1$, and set
\begin{align*}
S=\sum_{x=1}^{q}e\left(\frac{ax}{q}\right).
\end{align*}
If $q\mid a$, then $a/q\in\mathbb Z$, so for each $x$ the number $ax/q$ is an integer and $e(ax/q)=1$. Hence
\begin{align*}
S=\sum_{x=1}^{q}1=q.
\end{align*}
Now suppose $q\nmid a$. Put $\zeta=e(a/q)$. Then $\zeta\ne 1$, because $e(t)=1$ exactly when $t\in\mathbb Z$, and $a/q\notin\mathbb Z$. Also
\begin{align*}
\zeta^q=e(a)=1,
\end{align*}
since $a$ is an integer. The sum becomes
\begin{align*}
S=\sum_{x=1}^{q}\zeta^x=\zeta+\zeta^2+\cdots+\zeta^q.
\end{align*}
Multiplying by $1-\zeta$ gives
\begin{align*}
(1-\zeta)S=(\zeta+\zeta^2+\cdots+\zeta^q)-(\zeta^2+\zeta^3+\cdots+\zeta^{q+1}).
\end{align*}
All middle terms cancel, so
\begin{align*}
(1-\zeta)S=\zeta-\zeta^{q+1}.
\end{align*}
Because $\zeta^q=1$, we have $\zeta^{q+1}=\zeta$, and therefore
\begin{align*}
(1-\zeta)S=0.
\end{align*}
Since $1-\zeta\ne 0$, it follows that $S=0$. Thus a complete linear phase modulo $q$ either has no oscillation and contributes $q$, or completes a full nontrivial period and cancels exactly.
[/example]
Exact cancellation is rare in the later chapters, but this example gives the model for what bounds should imitate. The major parts of the course develop methods that produce partial cancellation for incomplete sums and polynomial phases.
## Orthogonality And Fourier Extraction
How can a counting condition such as $a_1+\cdots+a_s=n$ be imposed analytically? The answer is orthogonality: averaging additive characters kills all nonzero frequencies and keeps the zero frequency. This is the finite analogue of extracting a Fourier coefficient.
[quotetheorem:4594]
[citeproof:4594]
This theorem is the first extraction tool. It converts congruence restrictions into character sums, which can then be estimated by algebraic or analytic methods. The averaging over a complete residue system modulo $q$ is essential: if only part of the residue system is used, the geometric progression need not complete a full period and the exact cancellation can fail. The condition being tested is divisibility by $q$, not equality in $\mathbb Z$, so the theorem detects congruence information only. This limitation is also its strength, because it lets a modular condition be inserted into a finite count without changing the original variables.
[example: Solutions To A Linear Congruence]
Fix $q\ge 1$ and let $A,B,C\subset\mathbb Z/q\mathbb Z$. Choose integer representatives when writing $e(ra/q)$; this is independent of the choice, since replacing $a$ by $a+q$ changes $ra/q$ by the integer $r$ and $e(x+r)=e(x)$.
We compute the number $T$ of triples $(a,b,c)\in A\times B\times C$ satisfying $a+b\equiv c\pmod q$. By *Orthogonality Of Additive Characters Modulo q*, applied to $m=a+b-c$, the indicator of this congruence is
\begin{align*}
1_{a+b\equiv c\pmod q}=\frac{1}{q}\sum_{r=0}^{q-1}e\left(\frac{r(a+b-c)}{q}\right).
\end{align*}
Therefore
\begin{align*}
T=\sum_{a\in A}\sum_{b\in B}\sum_{c\in C}\frac{1}{q}\sum_{r=0}^{q-1}e\left(\frac{r(a+b-c)}{q}\right).
\end{align*}
All sums are finite, so we may move the $r$-sum outside:
\begin{align*}
T=\frac{1}{q}\sum_{r=0}^{q-1}\sum_{a\in A}\sum_{b\in B}\sum_{c\in C}e\left(\frac{r(a+b-c)}{q}\right).
\end{align*}
Using $e(x+y)=e(x)e(y)$ and $e(-x)=e(x)^{-1}$, each summand factors as
\begin{align*}
e\left(\frac{r(a+b-c)}{q}\right)=e\left(\frac{ra}{q}\right)e\left(\frac{rb}{q}\right)e\left(-\frac{rc}{q}\right).
\end{align*}
Hence, by finite distributivity,
\begin{align*}
T=\frac{1}{q}\sum_{r=0}^{q-1}\left(\sum_{a\in A}e\left(\frac{ra}{q}\right)\right)\left(\sum_{b\in B}e\left(\frac{rb}{q}\right)\right)\left(\sum_{c\in C}e\left(-\frac{rc}{q}\right)\right).
\end{align*}
For $r=0$, every exponential factor is $1$, so that contribution is
\begin{align*}
\frac{1}{q}\left(\sum_{a\in A}1\right)\left(\sum_{b\in B}1\right)\left(\sum_{c\in C}1\right)=\frac{|A||B||C|}{q}.
\end{align*}
The remaining terms, with $1\le r\le q-1$, are the nonzero Fourier frequencies; they vanish only when the three sets have enough cancellation in their additive character sums, so they measure the deviation from the uniform main contribution.
[/example]
The formula separates counting into a main frequency and nonzero frequencies. Much of the course asks how small the nonzero-frequency contribution is in structured but non-random problems.
## Major Arcs, Minor Arcs, And Additive Problems
Why is rational approximation central when phases are polynomial? A phase such as $\alpha n^k$ behaves differently when $\alpha$ is close to a rational number with small denominator than when it is poorly approximated. If all frequencies are treated uniformly, the large contributions near small-denominator rationals are either overestimated as error or lost inside a bound too weak for the remaining frequencies. The circle method turns this distinction into a partition of the unit interval.
[definition: Major And Minor Arcs]
Let $N\ge 1$ and let $\mathfrak M\subset\mathbb R/\mathbb Z$ be a union of intervals around rational points $a/q$ with $1\le q\le Q$, where $Q$ is a parameter. The set $\mathfrak M$ is called the set of major arcs, and its complement $\mathfrak m=(\mathbb R/\mathbb Z)\setminus\mathfrak M$ is called the set of minor arcs.
[/definition]
The exact width of the intervals depends on the problem, so the definition records the architecture rather than a universal choice. Major arcs usually produce the main term, while minor arcs are controlled by cancellation estimates for the polynomial exponential sums introduced next.
[definition: Weyl Sum]
Let $f\in\mathbb Z[x]$ and $N\ge 1$. The Weyl sum associated to $f$ is the map
\begin{align*}
S_f(\cdot;N):\mathbb R/\mathbb Z&\to\mathbb C, & \alpha&\mapsto \sum_{n=1}^{N}e(\alpha f(n)).
\end{align*}
[/definition]
Weyl sums are the basic objects used to study polynomial additive problems. To see why they are not merely estimates in isolation, we next derive the identity that makes a power of a Weyl sum equal to a representation-counting integral.
[quotetheorem:9043]
[citeproof:9043]
This theorem is the analytic form of Waring's problem. Once the representation count is written as an integral, the task is to estimate it by splitting the domain into major and minor arcs. The finiteness of the sum over $1\le n\le N$ is what permits the expansion and interchange with the integral without convergence issues. The integer condition on $m$ is also essential for the orthogonality step as stated: the integral of $e(\alpha t)$ over $[0,1]$ detects whether the integer $t$ is zero, not whether an arbitrary real parameter vanishes. The formula itself gives no positivity or asymptotic lower bound; those require separate major-arc analysis and minor-arc cancellation estimates.
[example: The Main-Term Error Split]
Let
\begin{align*}
F(\alpha)=S_k(\alpha;N)^s e(-m\alpha).
\end{align*}
By *[Fourier Formula For Waring Representation Counts](/theorems/9043)*, the number of tuples $(n_1,\dots,n_s)\in\{1,\dots,N\}^s$ with $n_1^k+\cdots+n_s^k=m$ is
\begin{align*}
\int_0^1 F(\alpha)\,d\alpha.
\end{align*}
Identifying $[0,1]$ with $\mathbb R/\mathbb Z$, and using that $\mathfrak M$ and $\mathfrak m$ form a partition of $\mathbb R/\mathbb Z$, finite additivity of the integral gives
\begin{align*}
\int_0^1 F(\alpha)\,d\alpha=\int_{\mathfrak M}F(\alpha)\,d\alpha+\int_{\mathfrak m}F(\alpha)\,d\alpha.
\end{align*}
Substituting the definition of $F$ into both terms gives
\begin{align*}
\int_0^1 S_k(\alpha;N)^s e(-m\alpha)\,d\alpha=\int_{\mathfrak M}S_k(\alpha;N)^s e(-m\alpha)\,d\alpha+\int_{\mathfrak m}S_k(\alpha;N)^s e(-m\alpha)\,d\alpha.
\end{align*}
The major-arc integral is the candidate main contribution, where frequencies near rationals with small denominator encode the expected local congruence and real-density factors. The minor-arc integral is the error term: if one proves a sufficiently strong upper bound for $|S_k(\alpha;N)|$ on $\mathfrak m$, then the absolute value of this error is controlled by
\begin{align*}
\left|\int_{\mathfrak m}S_k(\alpha;N)^s e(-m\alpha)\,d\alpha\right|\le \int_{\mathfrak m}|S_k(\alpha;N)|^s\,d\alpha,
\end{align*}
because $|e(-m\alpha)|=1$. Thus the split turns the representation count into a main-term analysis on $\mathfrak M$ plus a cancellation estimate on $\mathfrak m$.
[/example]
This split explains the progression of the lectures. We first build Fourier tools, then prove cancellation estimates, and only afterward apply them to representation problems.
## Shape Of The Course
What sequence of tools is needed before the circle method can be used responsibly? The course starts with finite Fourier analysis because every later argument relies on orthogonality and completion. It then studies Weyl sums, rational approximation, and differencing, before assembling the major-arc and minor-arc analysis for additive equations.
The first block concerns additive characters modulo $q$, finite Fourier inversion, completion of sums, summation by parts, and smooth dyadic cutoffs. These tools explain how short sums can be compared with complete sums and how rough cutoff functions can be replaced by smoother weights.
The second block concerns Weyl sums with polynomial phases. Weyl differencing lowers degree at the cost of introducing correlations, while rational approximation separates frequencies into those near small-denominator rationals and those that are not.
The third block applies these estimates to additive problems. Waring's problem is the central model: the [generating function](/page/Generating%20Function) is a power of a Weyl sum, the main term comes from major arcs, and the minor arcs require enough cancellation to be negligible relative to the expected answer.
The final perspective is methodological. Exponential sums are not auxiliary estimates placed after the counting problem; they are the language in which the counting problem is expressed. Chapter 1 develops the finite Fourier language behind this translation; Chapters 2--4 prove cancellation estimates for polynomial phases; Chapters 5--7 apply them to Waring's problem and prime variables; and Chapters 8--9 return to additive structure and distribution modulo one.
The introduction has set up the course’s central theme: arithmetic questions become analytic ones once they are encoded by exponential sums. Chapter 1 develops the finite Fourier language needed to make that encoding precise and usable.
# 1. Additive characters and finite Fourier analysis
This opening chapter sets up the finite Fourier language used throughout the course. The guiding problem is how to turn arithmetic constraints, such as congruences or additive equations, into exponential sums whose cancellation can be estimated. We begin on the finite group $\mathbb Z/q\mathbb Z$, then pass to completion and summation by parts, and end by viewing representation functions as Fourier coefficients of generating sums.
## Detecting Congruences by Oscillation
How can a congruence condition be inserted into a sum without carrying the condition explicitly? The answer is to average additive characters: the oscillation vanishes unless the desired congruence holds. This mechanism is the finite analogue of extracting a Fourier coefficient on the circle.
The basic notation packages the periodic exponential into a single symbol. It is used so often in analytic number theory that writing the full exponential would hide the additive structure of the argument.
[definition: Exponential Notation]
Define the map $e: \mathbb R \to \mathbb C^\times$ by
\begin{align*}
e(x) := \exp(2\pi i x).
\end{align*}
[/definition]
The function $e(x)$ has period $1$, so $e(a/q)$ depends only on the residue class of $a$ modulo $q$. To detect congruences by averaging phases, we need the homomorphisms from the additive group of residues into complex units; these are the finite frequencies on $\mathbb Z/q\mathbb Z$.
[definition: Additive Character Modulo Q]
Let $q \in \mathbb N$. An additive character modulo $q$ is a [group homomorphism](/page/Group%20Homomorphism)
\begin{align*}
\chi : \mathbb Z/q\mathbb Z \to \mathbb C^\times
\end{align*}
from the additive group $\mathbb Z/q\mathbb Z$ to the multiplicative group $\mathbb C^\times$.
[/definition]
The definition identifies the objects we want to average, but it does not yet say how to enumerate them. The next theorem is needed because Fourier analysis modulo $q$ requires a concrete frequency parameter, and it proves that the residues $a \bmod q$ provide exactly that parameter.
[quotetheorem:9044]
[citeproof:9044]
The hypothesis that the group is the cyclic additive group $\mathbb Z/q\mathbb Z$ is doing real work: a character is forced by its value on the generator $1$. For a non-cyclic finite abelian group, one would need one frequency parameter for each cyclic factor, so a single residue $a$ would not enumerate all characters. For instance, on $\mathbb Z/2\mathbb Z \times \mathbb Z/2\mathbb Z$ a character is determined by two independent signs, one on each generator, rather than by one residue class modulo $2$. The theorem does not say anything about multiplicative Dirichlet characters; those are homomorphisms on units and detect a different structure. In this chapter the point is that congruence conditions are additive, so additive characters are the right frequencies.
The classification means that averaging over all characters is the same as averaging over the residue $a$ modulo $q$. What remains is to turn a congruence condition into something that can be inserted inside a sum. The obstruction is that the condition $q\mid n$ is a discontinuous yes-or-no test, while Fourier manipulations need expressions that multiply and factor. Averaging all additive phases is the mechanism that produces this exact detector.
[quotetheorem:4594]
[citeproof:4594]
The normalising factor $1/q$ is essential: without it the detector would return $q$ rather than $1$ on multiples of $q$. The congruence condition also matters; if one replaced $e(an/q)$ by an arbitrary periodic phase, the geometric-series cancellation could fail. The theorem detects only divisibility by the fixed modulus $q$, not equality of integers, so later integer problems will require an integral over the circle rather than a finite average. This identity is the finite replacement for a Kronecker delta. Instead of solving a congruence directly, we may average a phase over the dual variable $a$, and the following computation shows how a counting problem factorises after the congruence is detected.
[example: Counting Solutions To A Linear Congruence]
Fix $q \in \mathbb N$. For residues $x,y,z \bmod q$, *Orthogonality of Additive Characters* gives
\begin{align*}
\frac{1}{q}\sum_{a \bmod q} e\left(\frac{a(x+y-z)}{q}\right)=1
\end{align*}
when $x+y-z\equiv 0\pmod q$, and gives $0$ otherwise. Therefore the number of triples $(x,y,z)\in(\mathbb Z/q\mathbb Z)^3$ satisfying $x+y=z$ is
\begin{align*}
\sum_{x,y,z \bmod q} \frac{1}{q}\sum_{a \bmod q} e\left(\frac{a(x+y-z)}{q}\right).
\end{align*}
Since all sums are finite, we may interchange the order of summation:
\begin{align*}
\sum_{x,y,z \bmod q} \frac{1}{q}\sum_{a \bmod q} e\left(\frac{a(x+y-z)}{q}\right)=\frac{1}{q}\sum_{a \bmod q}\sum_{x,y,z \bmod q} e\left(\frac{ax}{q}\right)e\left(\frac{ay}{q}\right)e\left(-\frac{az}{q}\right).
\end{align*}
The variables now separate, so this is
\begin{align*}
\frac{1}{q}\sum_{a \bmod q}\left(\sum_{x \bmod q} e\left(\frac{ax}{q}\right)\right)\left(\sum_{y \bmod q} e\left(\frac{ay}{q}\right)\right)\left(\sum_{z \bmod q} e\left(-\frac{az}{q}\right)\right).
\end{align*}
For a fixed $a$, orthogonality in the summation variable gives
\begin{align*}
\sum_{x \bmod q} e\left(\frac{ax}{q}\right)=q
\end{align*}
if $a\equiv 0\pmod q$, and gives $0$ otherwise. The same statement holds for the $y$-sum, and the $z$-sum has the same value because $-a\equiv 0\pmod q$ exactly when $a\equiv 0\pmod q$. Hence every nonzero frequency contributes $0$, while the frequency $a\equiv 0\pmod q$ contributes
\begin{align*}
\frac{1}{q}\cdot q\cdot q\cdot q=q^2.
\end{align*}
Thus the count is $q^2$, matching the elementary fact that $x$ and $y$ may be chosen freely and then determine $z\equiv x+y\pmod q$.
[/example]
The example uses orthogonality to count solutions, but the same mechanism should also recover arbitrary functions on the residue group. This motivates defining Fourier coefficients of a function by correlating it with each additive character.
[definition: Finite Fourier Transform Modulo Q]
Let $q \in \mathbb N$ and let $f: \mathbb Z/q\mathbb Z \to \mathbb C$. Its finite [Fourier transform](/page/Fourier%20Transform) is the function $\hat f: \mathbb Z/q\mathbb Z \to \mathbb C$ defined by
\begin{align*}
\hat f(a) := \sum_{x \bmod q} f(x)e\left(-\frac{ax}{q}\right).
\end{align*}
[/definition]
The sign convention is chosen so that reconstruction uses a positive phase. The central question after defining the transform is whether these finitely many coefficients retain all the information in $f$, and orthogonality gives the answer.
[quotetheorem:4595]
[citeproof:4595]
The factor $1/q$ cannot be omitted, since the orthogonality sum has total mass $q$ before normalisation. The theorem also depends on knowing all $q$ Fourier coefficients; a partial set of frequencies need not determine an arbitrary function on $\mathbb Z/q\mathbb Z$. This is the obstruction to a naive approach that keeps only the average value of $f$: two functions can have the same zero-frequency coefficient but differ everywhere else. Fourier inversion becomes useful once the nonzero Fourier transform is small away from the zero frequency. To quantify the total size of the coefficients and prepare for later error estimates, we need the finite Fourier form of [Parseval's identity](/theorems/434), with the same normalization convention as the transform above.
[remark:Finite Parseval identity]
For functions $f,g: \mathbb Z/q\mathbb Z \to \mathbb C$ with $\hat f(a)=\sum_{x \bmod q} f(x)e(-ax/q)$ and similarly for $\hat g$, orthogonality gives
\begin{align*}
\sum_{x \bmod q} f(x)\overline{g(x)} = \frac{1}{q}\sum_{a \bmod q} \hat f(a)\overline{\hat g(a)}.
\end{align*}
In particular, taking $g=f$ shows that the average square size of the Fourier coefficients is exactly tied to the square mass of the original function.
[/remark]
The conjugate on $g$ is part of this finite Fourier identity, not a cosmetic convention: without it the identity would not be positive when $g=f$. A concrete failure is obtained modulo $q=4$ with $f(0)=1$, $f(1)=i$, and $f(2)=f(3)=0$; then $\sum_x f(x)^2=0$, while the square mass should be $2$. The normalising factor is also forced: taking $f=g=\mathbb{1}_{\{0\}}$ gives $\hat f(a)=1$ for every $a$, so the right-hand side equals $1$ only after division by $q$. The hypothesis that the functions live on the finite residue group matters as well, since replacing the finite sums by infinite sums over $\mathbb Z$ would not give an identity without extra convergence and transform conventions. Parseval is an $L^2$ statement, so it controls average Fourier size but does not by itself give a pointwise bound for each coefficient. For example, a function concentrated at one residue has all Fourier coefficients of modulus $1$, while a constant function has only the zero-frequency coefficient; the identity accounts for both behaviours through total square mass. Parseval shows how square averages of coefficients reflect the original function, but we also need concrete examples where nonzero phases cancel. The next theorem is needed as the first non-linear model: a complete quadratic sum has size $p^{1/2}$, not size $p$, and this square-root scale guides later estimates.
[quotetheorem:9045]
[citeproof:9045]
The primality and oddness assumptions are not harmless: the change of variables $u=x-y$, $v=x+y$ uses that $2$ is invertible modulo $p$, and the Legendre symbol is defined for odd primes. For composite moduli, quadratic sums factor or degenerate according to the arithmetic of the modulus, and the square-root size may require extra coprimality hypotheses. A concrete failure occurs modulo $4$:
\begin{align*}
\sum_{x \bmod 4} e\left(\frac{x^2}{4}\right)=1+i+1+i=2+2i,
\end{align*}
whose absolute value is $2\sqrt{2}$ rather than $4^{1/2}$. The theorem is therefore a model estimate rather than a universal bound for every quadratic phase. This example introduces a recurring theme: complete sums often have algebraic structure unavailable to short incomplete sums. The next section explains how to transfer an incomplete sum into complete sums plus a controllable weight.
## Completing Short Sums
Suppose a sum runs over an interval $I \subset \mathbb Z$ rather than over all residue classes modulo $q$. How can we exploit information about complete sums modulo $q$? Completion rewrites the interval condition by Fourier inversion modulo $q$, separating the arithmetic phase from the shape of the interval.
To name the distinction, we separate sums over all residue classes from sums over a finite interval of integer representatives. This language is needed because later estimates will often prove cancellation only for complete sums and then use completion to handle incomplete sums.
[definition: Complete And Incomplete Exponential Sums]
Let $q \in \mathbb N$. The complete-sum functional is the map
\begin{align*}
S(\cdot;q):\{F: \mathbb Z/q\mathbb Z \to \mathbb C\}\to \mathbb C,
\qquad
S(F;q) := \sum_{x \bmod q} F(x).
\end{align*}
For each finite interval $I \subset \mathbb Z$, the incomplete-sum functional over $I$ is the map
\begin{align*}
S_I(\cdot;q):\{F: \mathbb Z/q\mathbb Z \to \mathbb C\}\to \mathbb C,
\qquad
S_I(F;q) := \sum_{n \in I} F(n \bmod q).
\end{align*}
[/definition]
The indicator of an interval is not periodic as a function on the integers, but modulo $q$ it has a finite Fourier expansion. Applying inversion to that indicator is the completion step, and the following identity records the exact form of the completed sum.
[quotetheorem:9046]
[citeproof:9046]
The sign convention matters here, but the two signs must be tracked as a pair. With the transform convention $\hat f(a)=\sum_x f(x)e(-ax/q)$, the interval coefficient entering the final formula is $C(a)=\sum_{n\in I}e(an/q)$ because the transform of the residue-counting function is $C(-a)$ and Fourier inversion then reindexes $a$ to $-a$. If the interval coefficient were defined with the opposite sign while keeping the same transform convention, the displayed identity would have to be reindexed accordingly. For example, with $q=3$, $I=\{1\}$, and $F(x)=e(x/3)$, the stated formula returns $e(1/3)$, while replacing $C(a)$ by $\sum_{n\in I}e(-an/3)$ in the same displayed formula would return $e(-1/3)$. The interval being finite is also essential, since taking $I=\mathbb Z$ would make $C(0)$ divergent and the formula would no longer be a finite identity. Completion does not itself prove cancellation: if $F$ is the constant function $1$, then the formula returns $|I|$, with no saving. It transfers the problem to complete sums weighted by the explicitly computable coefficients $C(a)$, and any estimate must still control the frequency sum that has been introduced.
[example: Completing A Short Linear Phase]
Let $I=\{M+1,\dots,M+N\}$ with $1\le N\le q$, and let $F(x)=e(bx/q)$ for some $b\in\mathbb Z$. The incomplete sum is $S_I(F;q)$, namely
\begin{align*}
S_I(F;q)=\sum_{n=M+1}^{M+N} e\left(\frac{bn}{q}\right).
\end{align*}
For the completion formula, set
\begin{align*}
C(a)=\sum_{n=M+1}^{M+N} e\left(\frac{an}{q}\right).
\end{align*}
The finite Fourier transform of $F$ is
\begin{align*}
\hat F(a)=\sum_{x\bmod q} e\left(\frac{bx}{q}\right)e\left(-\frac{ax}{q}\right)=\sum_{x\bmod q} e\left(\frac{(b-a)x}{q}\right).
\end{align*}
By *Orthogonality of Additive Characters* applied in the $x$-variable,
\begin{align*}
\hat F(a)=q
\end{align*}
when $a\equiv b\pmod q$, and
\begin{align*}
\hat F(a)=0
\end{align*}
when $a\not\equiv b\pmod q$.
Now *Completion Of A Short Sum Modulo Q* gives
\begin{align*}
S_I(F;q)=\frac{1}{q}\sum_{a\bmod q} C(a)\hat F(a).
\end{align*}
All terms vanish except the residue class $a\equiv b\pmod q$, so
\begin{align*}
S_I(F;q)=\frac{1}{q}C(b)\,q=C(b).
\end{align*}
Substituting the definition of $C(b)$ gives
\begin{align*}
S_I(F;q)=\sum_{n=M+1}^{M+N} e\left(\frac{bn}{q}\right),
\end{align*}
so in this single-frequency case completion recovers exactly the original geometric progression; later estimates enter only when several completed frequencies contribute.
[/example]
The example has no loss because a single frequency is present. For a general interval, we need a uniform estimate for its Fourier coefficients to control how expensive the completion formula can be.
[quotetheorem:9047]
[citeproof:9047]
The minimum in the bound records two different obstructions: a very short interval cannot have more than $N$ total mass, while a frequency close to an integer can make the geometric denominator small. The estimate does not imply strong cancellation when $a/q$ is close to $0$ in $\mathbb R/\mathbb Z$; in that regime the sharp interval still behaves almost like a constant phase. For instance, $a\equiv 0\pmod q$ gives $C(a)=N$, so no square-root saving is available from the interval alone. The coefficient bound explains the endpoint cost of a sharp interval: the indicator of an interval has Fourier coefficients that decay only like the reciprocal distance to an integer frequency, so summing over many frequencies can introduce logarithmic losses. This motivates the definition of smooth dyadic cutoffs, which reduce endpoint effects and allow estimates to be organised one scale at a time.
[definition: Smooth Dyadic Cutoff]
A smooth dyadic cutoff is a function $w \in C_c^\infty((0,\infty))$ for which there exist constants $0<c<C$ such that
\begin{align*}
\operatorname{supp} w \subset [c,C].
\end{align*}
For $N>0$, the associated weight on positive integers is the map $w_N:\mathbb Z_{>0}\to \mathbb C$ defined by
\begin{align*}
w_N(n):=w(n/N).
\end{align*}
[/definition]
Smooth dyadic cutoffs isolate the scale $n \asymp N$ while avoiding sharp endpoints. To use them effectively, we need a summation formula that converts bounds for unweighted partial sums into bounds for weighted sums.
[quotetheorem:9048]
[citeproof:9048]
The differentiability of $f$ is the regularity that makes the error term controllable through $f'$. If the weight has a jump discontinuity, this formula must be interpreted with endpoint contributions, and those jumps are exactly the sharp cutoffs that completion was designed to manage. The theorem does not create cancellation in $A(t)$; it only preserves whatever cancellation is already known for partial sums while introducing a derivative cost. This formula turns a bound for the unweighted partial sums $A(t)$ into a bound for smoothly weighted sums. The derivative of the weight controls the loss, so smooth weights interact well with cancellation estimates.
[example: Smoothing A Dyadic Exponential Sum]
Let $a_n=e(\alpha n)$, let $w$ be a smooth dyadic cutoff supported in $[1,2]$, and put
\begin{align*}
W_N=\sum_{n\ge 1} e(\alpha n)w(n/N).
\end{align*}
Because $w(t)=0$ for $t\notin[1,2]$, the summand vanishes unless $N\le n\le 2N$. Choose an integer $L\ge 2N$, define
\begin{align*}
A(t)=\sum_{1\le n\le t} e(\alpha n),
\end{align*}
and set $f(t)=w(t/N)$. Then $f'(t)=N^{-1}w'(t/N)$ by the chain rule, and $f(L)=w(L/N)=0$ since $L/N\ge 2$ and $w$ is supported in $[1,2]$.
Applying *Partial Summation For Exponential Sums* to the finite sum up to $L$ gives
\begin{align*}
W_N=A(L)f(L)-\int_1^L A(t)f'(t)\,dt.
\end{align*}
Substituting $f(L)=0$ and $f'(t)=N^{-1}w'(t/N)$ gives
\begin{align*}
W_N=-\frac{1}{N}\int_1^L A(t)w'(t/N)\,dt.
\end{align*}
Since $w'(t/N)=0$ unless $1\le t/N\le 2$, the integral may be restricted to $N\le t\le 2N$:
\begin{align*}
W_N=-\frac{1}{N}\int_N^{2N} A(t)w'(t/N)\,dt.
\end{align*}
Taking absolute values and using $|w'(t/N)|\le \|w'\|_\infty$ gives
\begin{align*}
|W_N|\le \frac{\|w'\|_\infty}{N}\int_N^{2N}|A(t)|\,dt.
\end{align*}
Thus, if $|A(t)|\le B$ for all $t\le 2N$, then
\begin{align*}
|W_N|\le \frac{\|w'\|_\infty}{N}\int_N^{2N}B\,dt=B\|w'\|_\infty.
\end{align*}
The smoothed dyadic sum is therefore controlled by the same partial-sum bound over $t\le 2N$, with only the derivative size of the fixed cutoff entering the constant.
[/example]
The course will repeatedly move between sharp, completed, and smoothed sums. The formal identities above are exact; the analytic work lies in choosing the form where the available cancellation is strongest.
## Representation Functions And Generating Sums
Additive problems ask how many ways an integer or residue can be represented as a sum of structured elements. Fourier analysis converts this counting problem into multiplication of generating sums. The product structure is the main reason exponential sums enter additive number theory.
To make this precise in the finite setting, we first name the function that records the number of additive representations of each residue. This object is the finite-group version of the representation functions later studied over the integers.
[definition: Representation Function Modulo Q]
Let $q \in \mathbb N$ and let $A,B \subset \mathbb Z/q\mathbb Z$. The representation function for sums from $A$ and $B$ is the map $r_{A+B}:\mathbb Z/q\mathbb Z\to \mathbb Z_{\ge 0}$ defined by
\begin{align*}
r_{A+B}(z):=|\{(a,b)\in A\times B : a+b\equiv z \pmod q\}|.
\end{align*}
[/definition]
This function is a convolution on the finite group. Counting representations directly keeps the constraint $a+b\equiv z\pmod q$ coupled across two variables, so it is not yet clear how the two sets $A$ and $B$ contribute separately.
The first structural question is whether Fourier transform turns this coupled representation count into separate information about $A$ and $B$. If it does not, the transform has only repackaged the same counting problem; if it does, additive convolution becomes multiplication frequency by frequency.
[quotetheorem:9049]
[citeproof:9049]
The product rule moves the additive constraint into the frequency side. The hypotheses are tied to finite subsets of the same residue group: if $A=\{0\}\subset \mathbb Z/2\mathbb Z$ and $B=\{0\}\subset \mathbb Z/3\mathbb Z$, then the expression $a+b\equiv z\pmod q$ has no single modulus $q$ until both sets are placed in a common group. The indicator-function hypothesis is also what makes $r_{A+B}$ a convolution; replacing $\mathbb{1}_A$ and $\mathbb{1}_B$ by unrelated functions would prove a convolution product for those functions, not a representation count for sets. The theorem is an exact structural identity rather than an equidistribution statement, since large Fourier coefficients may still occur. The next theorem is needed to recover the actual number of representations of a given residue and to separate the zero-frequency main term from the nonzero-frequency error.
[quotetheorem:9050]
[citeproof:9050]
The zero-frequency term is the random-model prediction: if sums from $A$ and $B$ were equidistributed modulo $q$, each residue would receive about $|A||B|/q$ representations. The formula does not guarantee this prediction is accurate; if $A=B=\{0\}$, the nonzero-frequency contribution is large enough to concentrate all mass at $z=0$. Thus the obstruction to uniform representation is exactly the presence of large nonzero Fourier coefficients. This formula is the prototype for the Hardy-Littlewood circle method: major contributions come from structured frequencies, and minor frequencies are controlled by cancellation. The following three-set count shows how the same transform language handles equations with a variable on each side.
[example: Three-Term Congruence Count]
Let $A,B,C \subset \mathbb Z/q\mathbb Z$, and let $T$ be the number of triples $(a,b,c)\in A\times B\times C$ satisfying $a+b\equiv c\pmod q$. For each residue $z$, the value $r_{A+B}(z)$ counts the pairs $(a,b)\in A\times B$ with $a+b\equiv z\pmod q$, while $\mathbb{1}_C(z)$ is $1$ exactly when $z\in C$. Hence
\begin{align*}
T=\sum_{z \bmod q} r_{A+B}(z)\mathbb{1}_C(z).
\end{align*}
Since $\mathbb{1}_C(z)$ is real-valued, $\mathbb{1}_C(z)=\overline{\mathbb{1}_C(z)}$, so
\begin{align*}
T=\sum_{z \bmod q} r_{A+B}(z)\overline{\mathbb{1}_C(z)}.
\end{align*}
Applying *[Parseval Identity](/theorems/248) Modulo Q* with $f=r_{A+B}$ and $g=\mathbb{1}_C$ gives
\begin{align*}
T=\frac{1}{q}\sum_{h \bmod q}\widehat{r_{A+B}}(h)\overline{\widehat{\mathbb{1}_C}(h)}.
\end{align*}
By *Fourier Transform Of A Representation Function*,
\begin{align*}
\widehat{r_{A+B}}(h)=\widehat{\mathbb{1}_A}(h)\widehat{\mathbb{1}_B}(h).
\end{align*}
Substituting this identity into the previous display gives
\begin{align*}
T=\frac{1}{q}\sum_{h \bmod q}\widehat{\mathbb{1}_A}(h)\widehat{\mathbb{1}_B}(h)\overline{\widehat{\mathbb{1}_C}(h)}.
\end{align*}
At the zero frequency,
\begin{align*}
\widehat{\mathbb{1}_A}(0)=\sum_{x\bmod q}\mathbb{1}_A(x)=|A|.
\end{align*}
Similarly, $\widehat{\mathbb{1}_B}(0)=|B|$ and $\widehat{\mathbb{1}_C}(0)=|C|$. Therefore the $h=0$ term contributes
\begin{align*}
\frac{1}{q}|A||B|\overline{|C|}=\frac{|A||B||C|}{q}.
\end{align*}
Separating this term from the remaining frequencies yields
\begin{align*}
T=\frac{|A||B||C|}{q}+\frac{1}{q}\sum_{h \bmod q,\, h\ne 0}\widehat{\mathbb{1}_A}(h)\widehat{\mathbb{1}_B}(h)\overline{\widehat{\mathbb{1}_C}(h)}.
\end{align*}
Thus the expected uniform contribution is the zero-frequency term, while every deviation from that prediction is carried by the nonzero Fourier coefficients.
[/example]
The finite congruence count points toward integer additive problems, where the target equation is an equality rather than a congruence. This motivates the definition of a generating exponential sum, the circle analogue of a finite Fourier transform for a finite set of integers.
[definition: Generating Exponential Sum]
Let $A$ be a finite subset of $\mathbb Z$. Its generating exponential sum is the map $S_A:\mathbb R/\mathbb Z\to \mathbb C$ defined by
\begin{align*}
S_A(\alpha):=\sum_{n\in A}e(\alpha n).
\end{align*}
[/definition]
The definition packages a finite set into a periodic function on $\mathbb R/\mathbb Z$. For integer equations, however, a finite congruence average is no longer enough: we must isolate the exact coefficient whose frequency equals the target integer $m$.
The obstruction is that the product of generating sums records every possible total simultaneously. To recover only the representations of one integer $m$, we need an exact coefficient-extraction identity on the circle, analogous to finite character orthogonality but suited to integer equality.
[quotetheorem:9051]
[citeproof:9051]
The finiteness of the sets $A_i$ ensures that the product is a finite trigonometric polynomial, so expansion and integration can be interchanged without convergence issues. A concrete failure mode appears if $A_1=A_2=\mathbb N$ were allowed: the sums $S_{A_i}(\alpha)$ would not be finite functions on the circle, and the displayed integral would not be defined as an ordinary Riemann or [Lebesgue integral](/page/Lebesgue%20Integral). The formula counts exact integer equalities, unlike the finite orthogonality formula earlier, which counted congruences modulo a fixed $q$; replacing equality by a congruence would require the finite average from the first section. It does not estimate $R(m)$ on its own; all analytic difficulty is transferred to bounding or asymptotically evaluating the integral. This final identity is the bridge from finite Fourier analysis to the later circle method. The rest of the course develops estimates for generating sums, then inserts those estimates into representation formulas of this kind.
Chapter 1 showed how orthogonality turns counting problems into Fourier coefficients and representation integrals. The next step is to understand how much cancellation a polynomial phase can generate, and that is the subject of Weyl sums.
# 2. Weyl sums and polynomial phases
Building on Chapter 1's Fourier extraction formulas, this chapter begins the main analytic mechanism of the course: estimating finite sums whose phases are polynomial. In Chapter 1, orthogonality converted counting problems into exponential sums; here we learn how oscillation in a polynomial phase forces cancellation unless the leading coefficient is well approximated by a rational number with small denominator. The three guiding questions are: what structure does a polynomial phase have under finite differences, how does rational approximation control the size of the sum, and how do these estimates imply equidistribution?
## Polynomial Exponential Sums
Finite polynomial phases cannot be treated as ordinary Fourier coefficients of a fixed periodic function: the summation length changes, the arithmetic of the polynomial coefficients matters, and the relevant frequency may move with the additive problem. We therefore need notation that keeps the arithmetic part, the length of summation, and the frequency parameter visible, because later circle method arguments will integrate such sums over $\alpha \in \mathbb R/\mathbb Z$.
[definition: Weyl Sum]
Let $f \in \mathbb R[x]$ and let $N \in \mathbb N$. The Weyl sum associated to $f$ is the function
\begin{align*}
S_f(\cdot;N):\mathbb R \longrightarrow \mathbb C
\end{align*}
defined by
\begin{align*}
S_f(\alpha;N):=\sum_{1 \le n \le N} e(\alpha f(n)),
\end{align*}
where $e(x):=e^{2\pi i x}$.
[/definition]
The value of $S_f(\alpha;N)$ depends only on $\alpha f(n)$ modulo $1$. For integral polynomials it is often more natural to absorb $\alpha$ into the leading coefficients and write $\sum_{n \le N} e(P(n))$ for a real polynomial $P$.
[example: Linear Phases]
Let $P(n)=\theta n+\theta_0$ with $\theta,\theta_0\in\mathbb R$. Since $e(x+y)=e(x)e(y)$, we have
\begin{align*}
\sum_{1\le n\le N}e(P(n))=e(\theta_0)\sum_{1\le n\le N}e(\theta n).
\end{align*}
Write $z=e(\theta)$. If $\theta\in\mathbb Z$, then $z=1$ and the sum is exactly $Ne(\theta_0)$. If $\theta\notin\mathbb Z$, then $z\ne 1$, and
\begin{align*}
(1-z)\sum_{1\le n\le N}z^n=(z+z^2+\cdots+z^N)-(z^2+z^3+\cdots+z^{N+1})=z-z^{N+1}.
\end{align*}
Therefore
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|=\left|e(\theta_0)\frac{z-z^{N+1}}{1-z}\right|\le \frac{|z|+|z|^{N+1}}{|1-z|}=\frac{2}{|1-e(\theta)|}.
\end{align*}
If $d=\|\theta\|_{\mathbb R/\mathbb Z}$, then $0<d\le 1/2$ and
\begin{align*}
|1-e(\theta)|=|1-e^{2\pi i\theta}|=2|\sin(\pi\theta)|=2\sin(\pi d)\ge 4d,
\end{align*}
using $\sin t\ge 2t/\pi$ for $0\le t\le \pi/2$. Hence
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|\le \frac{1}{2\|\theta\|_{\mathbb R/\mathbb Z}}.
\end{align*}
Thus the linear sum can be large only when $\theta$ is very close to an integer, meaning the phase changes by almost an integer from one term to the next.
[/example]
The linear example gives a complete answer because the ratio between consecutive terms is fixed. For higher-degree phases that ratio changes with $n$, so we need an operation that compares nearby terms and exposes a new phase of smaller degree.
[definition: Difference Operator]
For an integer $h$, the multiplicative difference operator is the map
\begin{align*}
\Delta_h:\mathbb C^{\mathbb Z} \longrightarrow \mathbb C^{\mathbb Z}
\end{align*}
defined by
\begin{align*}
(\Delta_h F)(n):=F(n+h)\overline{F(n)}.
\end{align*}
The additive polynomial difference operator is the map
\begin{align*}
\delta_h:\mathbb R[x] \longrightarrow \mathbb R[x]
\end{align*}
defined by
\begin{align*}
(\delta_h P)(n):=P(n+h)-P(n).
\end{align*}
[/definition]
With this notation, $\Delta_h(e(P(n)))=e((\delta_h P)(n))$.
The multiplicative difference operator is suited to exponential sums, while the polynomial difference operator records the new phase. If $P$ has degree $k$, then $\delta_h P$ has degree at most $k-1$ in $n$, and its leading coefficient is $k h$ times the leading coefficient of $P$.
[example: Quadratic Difference]
For $P(n)=\alpha n^2$ and an integer shift $h$, the additive difference is
\begin{align*}
\delta_h P(n)=P(n+h)-P(n)
\end{align*}
so substituting $P(t)=\alpha t^2$ gives
\begin{align*}
\delta_h P(n)=\alpha(n+h)^2-\alpha n^2.
\end{align*}
Expanding the square,
\begin{align*}
\alpha(n+h)^2-\alpha n^2=\alpha(n^2+2hn+h^2)-\alpha n^2.
\end{align*}
Distributing $\alpha$ and cancelling the two $\alpha n^2$ terms,
\begin{align*}
\alpha n^2+2\alpha hn+\alpha h^2-\alpha n^2=2\alpha h n+\alpha h^2.
\end{align*}
Therefore
\begin{align*}
\delta_h P(n)=2\alpha h n+\alpha h^2.
\end{align*}
Thus one differencing step converts the quadratic phase $e(\alpha n^2)$ into the linear phase $e(2\alpha h n+\alpha h^2)$. The constant term $\alpha h^2$ only contributes the fixed factor $e(\alpha h^2)$ to the inner sum, while the coefficient $2\alpha h$ controls the geometric ratio between consecutive terms; if $2\alpha h$ is far from an integer, those linear terms cancel geometrically.
[/example]
## Weyl Differencing
The immediate problem is that taking differences introduces shifts, boundary terms, and many possible values of $h$. Weyl's method packages these effects into an inequality: a large polynomial sum forces many lower-degree difference sums to be large.
[quotetheorem:9052]
[citeproof:9052]
This lemma does not require $P$ to be a polynomial, and this generality is useful because it isolates the averaging argument from any algebraic property of the phase. The averaging parameter $H$ is not constrained by $H\le N$ in the stated lemma; what matters conceptually is that $H$ is a positive differencing scale to keep the translate average meaningful on the interval of summation; if $H$ were much larger than $N$, the boundary terms would dominate the estimate rather than the differences. The lemma by itself does not prove cancellation: for example, if $P$ is constant, every difference sum is as large as its length. Its force in the polynomial case is that each differencing step reduces degree, so the argument can be iterated until only linear sums remain.
[definition: Iterated Polynomial Difference]
For $h_1,\dots,h_r\in\mathbb Z$, the iterated polynomial difference operator is the map
\begin{align*}
\delta_{h_1,\dots,h_r}:\mathbb R[x] \longrightarrow \mathbb R[x]
\end{align*}
defined recursively by
\begin{align*}
\delta_{h_1,\dots,h_r}P:=\delta_{h_r}(\delta_{h_1,\dots,h_{r-1}}P),\qquad (\delta_{h_1}P)(n):=P(n+h_1)-P(n).
\end{align*}
[/definition]
After $r$ differences, a degree $k$ polynomial has degree at most $k-r$. To apply the linear estimate at the end of the iteration, we also need the exact leading coefficient that survives after these differences, since that coefficient is where the rational approximation to the original leading coefficient enters.
[quotetheorem:9053]
[citeproof:9053]
The condition $h_1\cdots h_r\ne 0$ is needed for the displayed leading coefficient: if one shift is $0$, that differencing step annihilates the polynomial difference and no degree drop of the stated form remains. The theorem gives only the top coefficient, not a full description of all lower terms, because Weyl's method ultimately needs the terminal linear frequency rather than every coefficient generated along the way. A simple counterpoint is a polynomial with rational leading coefficient but irrational lower coefficient; the largest irrational coefficient, not necessarily the original degree, controls qualitative equidistribution. The next task is to sum over all possible shift products and convert the resulting linear estimates into a single upper bound for the original polynomial sum; the rational approximation to the leading coefficient is the datum that makes this possible.
[quotetheorem:9054]
[citeproof:9054]
The coprimality condition $(a,q)=1$ makes $q$ the true denominator of the rational approximation; without it, the terms $1/q$ and $q/N^k$ would measure the wrong arithmetic scale. The approximation hypothesis $|\alpha-a/q|\le q^{-2}$ is not a harmless decoration, because the proof counts near-integer multiples using the denominator $q$ and would not give this form from an arbitrary rational approximation. The estimate is strongest on minor arcs: if $q$ is neither too small nor too large relative to $N^k$, then the parenthesis is a negative power of $N$ and the sum saves a power over the bound $|S|\le N$. It is also limited near small denominators, as $\alpha=a/q$ with small $q$ can make the phase nearly periodic on residue classes, which is why the circle method treats such points separately.
[example: Bounding A Quadratic Weyl Sum]
Let $P(n)=\alpha n^2$, and suppose $(a,q)=1$ and $|\alpha-a/q|\le q^{-2}$. Applying *Weyl Inequality For Polynomial Phases* with $k=2$ and leading coefficient $\alpha$, the exponent in the Weyl factor is
\begin{align*}
2^{1-k}=2^{1-2}=2^{-1}=\frac{1}{2}.
\end{align*}
Therefore
\begin{align*}
\left|\sum_{1\le n\le N}e(\alpha n^2)\right|\lesssim_{\varepsilon}N^{1+\varepsilon}\left(\frac{1}{q}+\frac{1}{N}+\frac{q}{N^2}\right)^{1/2}.
\end{align*}
To see the size of the bound when $q$ is comparable to $N$, assume for example that $cN\le q\le CN$ for fixed constants $c,C>0$. From $q\ge cN$ we get
\begin{align*}
\frac{1}{q}\le \frac{1}{cN}.
\end{align*}
From $q\le CN$ we get
\begin{align*}
\frac{q}{N^2}\le \frac{CN}{N^2}=\frac{C}{N}.
\end{align*}
Hence
\begin{align*}
\frac{1}{q}+\frac{1}{N}+\frac{q}{N^2}\le \left(\frac{1}{c}+1+C\right)\frac{1}{N}.
\end{align*}
Substituting this estimate into the Weyl bound gives
\begin{align*}
\left|\sum_{1\le n\le N}e(\alpha n^2)\right|\lesssim_{\varepsilon,c,C}N^{1+\varepsilon}\left(\frac{1}{N}\right)^{1/2}.
\end{align*}
Since
\begin{align*}
N^{1+\varepsilon}N^{-1/2}=N^{1/2+\varepsilon},
\end{align*}
we obtain
\begin{align*}
\left|\sum_{1\le n\le N}e(\alpha n^2)\right|\lesssim_{\varepsilon,c,C}N^{1/2+\varepsilon}.
\end{align*}
Compared with the elementary estimate $|\sum_{1\le n\le N}e(\alpha n^2)|\le N$, this gains almost a factor of $N^{1/2}$. If $q$ is small, the term $1/q$ is not small, so the Weyl factor need not contribute a negative power of $N$; this is the quantitative sign that small-denominator rationals are the structured points where cancellation can weaken.
[/example]
The quadratic example also warns us that the rational approximation data cannot be ignored. When $\alpha$ is very close to a rational number with small denominator, the phase may be nearly periodic over residue classes and cancellation can fail.
## Rational Approximation And Arcs
The next problem is to separate the values of $\alpha$ where a single Weyl sum has structured behaviour from those where Weyl's inequality gives cancellation. Dirichlet approximation supplies a rational $a/q$ near every $\alpha$, so the distinction is not whether an approximation exists, but whether its denominator is small enough to matter at scale $N$.
[quotetheorem:5516]
[citeproof:5516]
The parameter $Q$ is necessary because it fixes the allowed denominator size; without such a cutoff the statement would not distinguish useful approximations from very complicated ones. The coprimality condition is a normalization rather than an extra approximation miracle: if the pigeonhole argument gives $a/q$ with $d=(a,q)>1$, then $a/q=(a/d)/(q/d)$ and the reduced denominator $q/d$ is no larger, while the same real number is being used to approximate $\alpha$. This matters later because an unreduced fraction such as $2/6$ would falsely present denominator $6$ even though its arithmetic period is governed by denominator $3$. The estimate is essentially a pigeonhole bound, so it guarantees existence but not uniqueness, and different reduced fractions can compete when arcs overlap. For rational $\alpha$ the theorem may recover the exact rational once $Q$ is large enough, while for badly approximable irrational numbers it is close to the best possible order. The circle method turns this [universal approximation](/theorems/1994) statement into a partition of the frequency circle into regions near rationals with small denominators and the remaining complementary region.
[definition: Major And Minor Arcs For A Single Polynomial Sum]
As foreshadowed by the major/minor arc split in Chapter 0, fix $k\ge 2$, $N\ge 1$, and parameters $Q,R\ge 1$. The major arcs are the set of $\alpha\in\mathbb R/\mathbb Z$ for which there exist integers $a,q$ with $1\le q\le Q$, $(a,q)=1$, and
\begin{align*}
\left|\alpha-\frac{a}{q}\right|\le \frac{R}{N^k}.
\end{align*}
The minor arcs are the complement of this set in $\mathbb R/\mathbb Z$.
[/definition]
The parameters are chosen according to the additive problem under study. For a single sum, the definition records the main principle: near a small-denominator rational, arithmetic periodicity may dominate; away from such rationals, Weyl's inequality gives cancellation.
[example: Failure Of Cancellation Near A Rational]
Take $P(n)=\alpha n^k$ with $\alpha=a/q$, where $a,q\in\mathbb Z$ and $q\ge 1$. For every integer $n$,
\begin{align*}
(n+q)^k=\sum_{j=0}^k\binom{k}{j}n^{k-j}q^j
\end{align*}
so
\begin{align*}
(n+q)^k-n^k=\sum_{j=1}^k\binom{k}{j}n^{k-j}q^j=q\sum_{j=1}^k\binom{k}{j}n^{k-j}q^{j-1}.
\end{align*}
Thus $(n+q)^k-n^k$ is divisible by $q$, and hence
\begin{align*}
e\left(\frac{a(n+q)^k}{q}\right)=e\left(\frac{a n^k}{q}\right)e\left(\frac{a((n+q)^k-n^k)}{q}\right)=e\left(\frac{a n^k}{q}\right),
\end{align*}
because $a((n+q)^k-n^k)/q$ is an integer. Therefore $e(a n^k/q)$ is periodic with period $q$.
Write
\begin{align*}
G(a,q):=\sum_{r=1}^q e\left(\frac{a r^k}{q}\right).
\end{align*}
If $N=Mq+s$ with integers $M\ge 0$ and $0\le s<q$, then the first $Mq$ terms split into $M$ complete periods, so
\begin{align*}
\sum_{1\le n\le Mq}e\left(\frac{a n^k}{q}\right)=M G(a,q).
\end{align*}
The remaining endpoint contribution has at most $s$ terms, each of absolute value $1$, so
\begin{align*}
\left|\sum_{Mq<n\le Mq+s}e\left(\frac{a n^k}{q}\right)\right|\le s<q.
\end{align*}
Consequently,
\begin{align*}
\left|\sum_{1\le n\le N}e\left(\frac{a n^k}{q}\right)\right|\ge M|G(a,q)|-q.
\end{align*}
In particular, if $|G(a,q)|\ge c q$ for some fixed $c>0$ and $q$ is fixed, then along multiples $N=Mq$ we get
\begin{align*}
\left|\sum_{1\le n\le N}e\left(\frac{a n^k}{q}\right)\right|=M|G(a,q)|\ge cN.
\end{align*}
Thus small-denominator rational points can produce sums of order $N$, so a bound saving a fixed power of $N$ cannot hold uniformly over all such rational phases.
[/example]
The major arcs are not bad; they are structured. Later chapters exploit them by replacing a long sum by a product of a complete exponential sum modulo $q$ and a continuous oscillatory integral involving the small real displacement $\beta=\alpha-a/q$.
[remark: Major Arc Factorisation]
If $\alpha=a/q+\beta$ and $P(n)=n^k$, then
\begin{align*}
e(\alpha n^k)=e(a n^k/q)e(\beta n^k).
\end{align*}
The first factor is periodic modulo $q$, while the second varies slowly when $|\beta|N^k$ is bounded. This is the local model behind singular series and singular integrals in the Hardy--Littlewood method.
[/remark]
## Weyl Equidistribution
The final question in this chapter is qualitative rather than quantitative: when do the fractional parts of a polynomial sequence spread uniformly around the circle? Exponential sums give exactly the right test, because intervals can be approximated by finite [Fourier series](/page/Fourier%20Series).
[definition: Equidistribution Modulo One]
A sequence $(x_n)_{n\ge 1}$ in $\mathbb R$ is equidistributed modulo $1$ if for every interval $I\subset [0,1)$,
\begin{align*}
\lim_{N\to\infty}\frac{1}{N}|\{1\le n\le N: \{x_n\}\in I\}|=|I|.
\end{align*}
[/definition]
The definition is geometric and speaks about visits to intervals. The Fourier bridge behind the next result is Weyl's criterion: a sequence $(x_n)$ is uniformly distributed modulo $1$ exactly when, for every non-zero integer $m$, the averages
\begin{align*}
\frac1N\sum_{n\le N} e(mx_n)
\end{align*}
tend to $0$ as $N\to\infty$. Testing every non-zero integer $m$ is necessary because a sequence may cancel at one frequency but still retain periodic bias visible at another; for instance, alternation between $0$ and $1/2$ is invisible to even characters but not to odd ones. This criterion is qualitative: it proves uniform distribution but gives no rate unless the exponential sums are bounded quantitatively.
For polynomial phases, the criterion combines with Weyl differencing to give a more specialized theorem. If $P(n)$ has at least one irrational non-constant coefficient, then each non-zero Fourier mode $mP(n)$ still has an irrational non-constant coefficient, so polynomial cancellation applies to every Fourier frequency that Weyl's criterion asks us to test.
The theorem card below is this polynomial consequence, not Weyl's criterion itself. The structural question is: exactly which coefficients force the fractional parts of $P(n)$ to spread evenly through the unit interval? The result isolates the answer in terms of irrational non-constant coefficients, separating the harmless constant shift from the coefficients that control oscillation.
[quotetheorem:3434]
[citeproof:3434]
The irrationality hypothesis is necessary: if all non-constant coefficients are rational, then after multiplying by a common denominator the sequence becomes periodic modulo $1$, and a nonconstant periodic sequence need not visit intervals with the correct limiting proportions. The constant coefficient does not matter because adding $\alpha_0$ only translates the sequence modulo $1$, or equivalently multiplies every exponential sum by the fixed unit complex number $e(m\alpha_0)$. This polynomial equidistribution theorem is qualitative rather than quantitative; it does not say how large $N$ must be before the distribution looks uniform. It is the uniform-distribution counterpart of Weyl's inequality: irrational polynomial phases eventually cancel at every fixed non-zero Fourier frequency, while the quantitative inequality measures how much cancellation is visible at a finite scale.
[example: Irrational Monomial Sequences]
Let $k\ge 1$ and let $\alpha\in\mathbb R\setminus\mathbb Q$. For $P(n)=\alpha n^k$, the coefficient of $n^k$ is $\alpha$, and $\alpha$ is irrational by assumption. Every other non-constant coefficient is $0$, because
\begin{align*}
P(n)=\alpha n^k+0\cdot n^{k-1}+\cdots+0\cdot n
\end{align*}
with the obvious omission of terms when $k=1$. Thus $P$ has at least one irrational non-constant coefficient, so *Weyl Equidistribution For Polynomial Sequences* applies and shows that $(P(n))_{n\ge 1}=(\alpha n^k)_{n\ge 1}$ is equidistributed modulo $1$.
By the definition of equidistribution modulo $1$, this means that for every interval $I\subset[0,1)$,
\begin{align*}
\lim_{N\to\infty}\frac{1}{N}|\{1\le n\le N:\{\alpha n^k\}\in I\}|=|I|.
\end{align*}
So the fractional parts of the monomial sequence $\alpha n^k$ eventually visit each interval with exactly the interval's length as their limiting frequency.
[/example]
The rational case has the opposite behaviour: if all non-constant coefficients are rational, then $P(n)$ modulo $1$ is periodic after clearing denominators. Thus polynomial equidistribution is a precise irrationality phenomenon, and Weyl sums measure how far a finite segment is from that limiting distribution.
The Weyl-sum estimates of Chapter 2 make oscillation quantitative for polynomial phases, but they rely on differencing and smoothing ideas that work more broadly. Chapter 3 turns those ideas into the van der Corput estimates for general one-variable exponential sums.
# 3. van der Corput estimates
The previous chapter developed Weyl differencing, rational approximation, and equidistribution for polynomial exponential sums. This chapter develops the first systematic estimates for such sums when the phase is a smooth real function of one variable. The guiding principle is that monotonic change in the derivative forces cancellation, and differencing converts harder sums into shorter sums whose phases have more visible oscillation.
We write $e(x)=e^{2\pi i x}$ and study sums of the form
\begin{align*}
S = \sum_{a < n \le b} e(f(n)),
\end{align*}
where $f:[a,b]\to \mathbb R$ is sufficiently differentiable. The estimates below are the entry point to exponent pairs, a compact language for recording how much cancellation can be extracted from a family of exponential sums.
## First and Second Derivative Tests
How can local information about a smooth phase $f$ control a discrete sum over integers? The first derivative test treats the case where $f'$ stays away from integers, so consecutive terms rotate at a definite rate. The [second derivative test](/theorems/8607) covers the more flexible situation where $f'$ may pass near an integer, but does so at controlled speed.
The first tool measures cancellation using the distance from a real number to the nearest integer.
[definition: Distance To The Nearest Integer]
Define the map $\|\cdot\|:\mathbb R\to[0,1/2]$ by
\begin{align*}
\|x\| := \min_{m\in\mathbb Z}|x-m|.
\end{align*}
[/definition]
This notation is not a norm; it records how close the additive character $e(x)$ is to $1$. It enters through the finite geometric progression estimate, which is the model calculation behind the first derivative test.
[example: Geometric Progression With Nonintegral Frequency]
Let $\alpha\in\mathbb R$ and $M\in\mathbb N$, and set
\begin{align*}
G_M(\alpha)=\sum_{1\le n\le M} e(\alpha n).
\end{align*}
Since $|e(x)|=1$ for real $x$, we always have
\begin{align*}
|G_M(\alpha)|\le \sum_{1\le n\le M}|e(\alpha n)|=M.
\end{align*}
Now suppose $\alpha\notin\mathbb Z$, so $e(\alpha)\ne 1$. Multiplying by $1-e(\alpha)$ gives
\begin{align*}
(1-e(\alpha))G_M(\alpha)=\sum_{1\le n\le M}e(\alpha n)-\sum_{1\le n\le M}e(\alpha(n+1)).
\end{align*}
After reindexing the second sum as $\sum_{2\le n\le M+1}e(\alpha n)$, the terms with $2\le n\le M$ cancel, leaving
\begin{align*}
(1-e(\alpha))G_M(\alpha)=e(\alpha)-e(\alpha(M+1)).
\end{align*}
Factoring out $e(\alpha)$ gives the geometric-series identity
\begin{align*}
G_M(\alpha)=e(\alpha)\frac{1-e(M\alpha)}{1-e(\alpha)}.
\end{align*}
Taking absolute values and using $|e(\alpha)|=1$ and $|1-e(M\alpha)|\le 2$, we obtain
\begin{align*}
|G_M(\alpha)|\le \frac{2}{|1-e(\alpha)|}.
\end{align*}
Choose $m\in\mathbb Z$ with $|\alpha-m|=\|\alpha\|$, and put $\delta=\|\alpha\|$. Since $e(\alpha)=e(\alpha-m)$ and $|1-e(-\delta)|=|1-e(\delta)|$, we have
\begin{align*}
|1-e(\alpha)|=2|\sin(\pi\delta)|.
\end{align*}
Here $0<\delta\le 1/2$, because $\alpha\notin\mathbb Z$. The inequality $\sin u\ge 2u/\pi$ for $0\le u\le \pi/2$ gives
\begin{align*}
|1-e(\alpha)|=2\sin(\pi\delta)\ge 4\delta=4\|\alpha\|.
\end{align*}
Therefore, for $\alpha\notin\mathbb Z$,
\begin{align*}
|G_M(\alpha)|\le \min\left(M,\frac{1}{2\|\alpha\|}\right).
\end{align*}
Thus a fixed nonintegral frequency gives cancellation unless $\alpha$ is very close to an integer.
[/example]
The geometric progression estimate handles a linear phase, and the next question is how much of it survives when the frequency changes with $n$. Monotonicity of $f'$ prevents the changing frequency from revisiting the same integer neighbourhood many times. The lower bound on $\|f'(x)\|$ is the hypothesis that keeps all local linear models away from resonance.
[quotetheorem:9055]
[citeproof:9055]
The first derivative estimate gives a complete answer when the phase stays away from resonance. In many arithmetic sums the derivative crosses an integer inside the summation range, so the previous theorem no longer applies near that point. Curvature still limits how long the phase can remain nearly resonant, and the next estimate turns that limitation into a square-root type bound.
[quotetheorem:9056]
[citeproof:9056]
This theorem explains why square-root cancellation is often the natural target for curved phases. The lower bound $|f''|\ge\lambda$ prevents the derivative from lingering near a resonant integer for too long, while the fixed sign of $f''$ gives the monotonicity of $f'$ needed to count those resonant windows without repeated returns. The upper bound $|f''|\le\Lambda$ controls how many derivative intervals are swept out across the summation range, so both sides of the curvature hypothesis contribute to the final estimate.
The estimate is not automatically stronger than the bound by length alone. In the dyadic form it is useful when $N\lambda^{1/2}+\lambda^{-1/2}$ is smaller than $N$; when $N^2\lambda$ is small, the term $\lambda^{-1/2}$ can exceed the length of the sum. Thus the result is best viewed as a curvature test, not as a universal cancellation theorem. It also gives useful bounds even when the phase is not a polynomial.
[example: Logarithmic Phase]
Let $t\in\mathbb R$ with $|t|\ge 1$, and consider
\begin{align*}
S(N,t)=\sum_{N<n\le 2N} e(t\log n).
\end{align*}
Set $f(x)=t\log x$ on $[N,2N]$. Then
\begin{align*}
f'(x)=\frac{t}{x}
\end{align*}
and
\begin{align*}
f''(x)=-\frac{t}{x^2}.
\end{align*}
Since $N\le x\le 2N$, we have $N^2\le x^2\le 4N^2$, hence
\begin{align*}
\frac{|t|}{4N^2}\le |f''(x)|=\frac{|t|}{x^2}\le \frac{|t|}{N^2}.
\end{align*}
Also $f''$ has constant sign on $[N,2N]$, because $-t/x^2$ is always positive when $t<0$ and always negative when $t>0$.
Apply *[Second Derivative](/page/Second%20Derivative) Estimate* with $\lambda=|t|/(4N^2)$, $\Lambda=|t|/N^2$, and interval length $2N-N=N$. This gives
\begin{align*}
|S(N,t)|\lesssim N\left(\frac{|t|}{N^2}\right)^{1/2}+\left(\frac{|t|}{4N^2}\right)^{-1/2}.
\end{align*}
The first term is
\begin{align*}
N\left(\frac{|t|}{N^2}\right)^{1/2}=N\frac{|t|^{1/2}}{N}=|t|^{1/2}.
\end{align*}
The second term is
\begin{align*}
\left(\frac{|t|}{4N^2}\right)^{-1/2}=\left(\frac{4N^2}{|t|}\right)^{1/2}=2N|t|^{-1/2}.
\end{align*}
Absorbing the factor $2$ into the absolute implicit constant, we obtain
\begin{align*}
|S(N,t)|\lesssim |t|^{1/2}+N|t|^{-1/2}.
\end{align*}
Thus the curvature of $t\log x$ gives a nontrivial dyadic estimate in the range where the logarithmic phase changes enough to create oscillation but the curvature term $|t|^{1/2}$ has not yet exceeded the length scale $N$.
[/example]
Polynomial phases provide a direct bridge back to the Weyl sums of the previous chapter, where cancellation was measured against the size of a leading coefficient rather than against an abstract curvature parameter. The next example tests the second derivative estimate on the simplest genuinely curved polynomial, a cubic, and tracks how the curvature parameter $|f''|$ depends jointly on the coefficient $\alpha$ and the dyadic scale $M$. Working this out makes explicit how the smooth-phase machinery of this chapter recovers and reorganises the polynomial estimates obtained earlier, and it sets up the differencing computation for the same cubic that appears later in the chapter.
[example: Cubic Phase]
Let $\alpha\in\mathbb R$ and
\begin{align*}
S(N,\alpha)=\sum_{1\le n\le N} e(\alpha n^3).
\end{align*}
On the dyadic block $M<n\le 2M$, take $f(x)=\alpha x^3$. If $\alpha=0$, then $e(\alpha n^3)=e(0)=1$ for every $n$, so the block is bounded by its length. Thus the curvature estimate is only interesting when $\alpha\ne 0$. In that case
\begin{align*}
f'(x)=3\alpha x^2
\end{align*}
and
\begin{align*}
f''(x)=6\alpha x.
\end{align*}
For $M\le x\le 2M$, we have
\begin{align*}
6|\alpha|M\le |f''(x)|=6|\alpha|x\le 12|\alpha|M.
\end{align*}
Also $f''(x)=6\alpha x$ has constant sign on $[M,2M]$, since $x>0$ and $\alpha$ is fixed.
Applying *Second Derivative Estimate* with $\lambda=6|\alpha|M$, $\Lambda=12|\alpha|M$, and interval length $M$ gives
\begin{align*}
\left|\sum_{M<n\le 2M}e(\alpha n^3)\right|\lesssim M(12|\alpha|M)^{1/2}+(6|\alpha|M)^{-1/2}.
\end{align*}
For the first term,
\begin{align*}
M(12|\alpha|M)^{1/2}=12^{1/2}M(|\alpha|M)^{1/2}.
\end{align*}
For the second term,
\begin{align*}
(6|\alpha|M)^{-1/2}=6^{-1/2}(|\alpha|M)^{-1/2}.
\end{align*}
Absorbing the numerical constants into the implicit constant, we obtain
\begin{align*}
\left|\sum_{M<n\le 2M}e(\alpha n^3)\right|\lesssim M(|\alpha|M)^{1/2}+(|\alpha|M)^{-1/2}.
\end{align*}
The second term is smaller than the length $M$ exactly when $|\alpha|M^3>1$, while the first term is smaller than $M$ when $|\alpha|M<1$. Thus this curvature estimate is useful on dyadic blocks where the cubic phase has accumulated visible oscillation across the block but the local curvature is not so large that the first term already exceeds the length scale.
[/example]
## Differencing and the A-Process
What should we do when a derivative test is too weak because the phase has high degree or slowly varying curvature? The van der Corput A-process replaces the original sum by a family of shifted correlation sums. The gain is that the differenced phase $f(n+h)-f(n)$ often has one less effective degree, so derivative tests become stronger after one round of averaging.
The basic inequality is a Hilbert-space estimate for finite sequences. It is independent of number theory; the arithmetic enters when we interpret the correlations.
[quotetheorem:9057]
[citeproof:9057]
The inequality isolates the correlation sums that must be estimated next. Both hypotheses earn their place: the normalisation $|z_n|\le 1$ is what keeps the discarded boundary terms at size $O(H)$ and pins the diagonal contribution at $N^2/H$, since unbounded $z_n$ would let those error terms swamp the gain, while the range $1\le H\le N$ balances the diagonal term $N^2/H$ against the number of correlation sums summed over $h$ — pushing $H$ up to $N$ collapses the diagonal saving to $O(N)$ and recovers nothing beyond the bound by length. For exponential sequences those correlations again have phase form, so it is useful to name the shifted phase that appears. This notation lets the A-process be stated as a direct consequence of the inequality rather than as a new analytic argument.
[definition: Differenced Phase]
This is the smooth-phase analogue of the polynomial difference operator $\delta_h$ from Chapter 2. Let $f:I\to\mathbb R$ be a function on an interval $I\subset\mathbb R$. For an integer $h$, set
\begin{align*}
I_h:=\{x\in I:x+h\in I\}.
\end{align*}
The differencing operator $\Delta_h$ sends such an $f$ to the function $\Delta_h f:I_h\to\mathbb R$ defined by
\begin{align*}
\Delta_h f(x):=f(x+h)-f(x).
\end{align*}
[/definition]
The differenced phase records exactly the oscillation of a shifted product $e(f(n+h))\overline{e(f(n))}$. This matters because the original phase may be too curved to estimate directly, while correlations between nearby terms can have a simpler derivative structure.
The missing estimate is a rigorous way to replace a single hard sum by a controlled average of these shifted correlations. The A-process supplies that comparison, so that simpler differenced phases can be used without losing track of the original sum.
The point of the next estimate is therefore not merely to define $\Delta_h f$, but to make differencing usable in bounds. It gives a quantitative inequality that turns cancellation in the original sum into cancellation in shifted correlation sums over a controlled range of shifts.
[quotetheorem:9058]
[citeproof:9058]
The derivative of the differenced phase is an average of $f''$ over an interval of length $h$. For smooth phases this means that the A-process converts second derivative information about $f$ into first derivative information about $\Delta_h f$.
[example: Differencing A Cubic Phase]
Let $f(x)=\alpha x^3$. For an integer shift $h$, the definition of the differenced phase gives
\begin{align*}
\Delta_h f(n)=f(n+h)-f(n)=\alpha(n+h)^3-\alpha n^3.
\end{align*}
Expanding the cube,
\begin{align*}
(n+h)^3=(n+h)(n+h)^2=(n+h)(n^2+2nh+h^2).
\end{align*}
Multiplying out the last product,
\begin{align*}
(n+h)(n^2+2nh+h^2)=n^3+2n^2h+nh^2+hn^2+2nh^2+h^3.
\end{align*}
Combining like terms gives
\begin{align*}
(n+h)^3=n^3+3hn^2+3h^2n+h^3.
\end{align*}
Therefore
\begin{align*}
\Delta_h f(n)=\alpha(n^3+3hn^2+3h^2n+h^3)-\alpha n^3.
\end{align*}
Distributing $\alpha$ and cancelling $\alpha n^3-\alpha n^3$,
\begin{align*}
\Delta_h f(n)=3\alpha h n^2+3\alpha h^2 n+\alpha h^3.
\end{align*}
Thus the first difference of the cubic phase is a quadratic polynomial in $n$. If we difference once more with shift $k$, the quadratic part becomes linear:
\begin{align*}
\Delta_k(\Delta_h f)(n)=3\alpha h((n+k)^2-n^2)+3\alpha h^2((n+k)-n).
\end{align*}
Since $(n+k)^2-n^2=2kn+k^2$ and $(n+k)-n=k$, this is
\begin{align*}
\Delta_k(\Delta_h f)(n)=6\alpha hk\,n+3\alpha h k^2+3\alpha h^2k.
\end{align*}
The second difference is therefore a linear phase in $n$, so its exponential sum is governed by the geometric progression estimate. This is the algebraic mechanism behind Weyl differencing for cubic sums.
[/example]
The A-process is often described as a norm reduction because it replaces an $L^1$-type sum by square roots of averaged correlations. Each use loses some length through Cauchy's inequality but gains smoother or lower-degree phases.
## The B-Process and Exponent Pairs
The A-process reduces phases by differencing; the B-process analyses sums by locating stationary points. The guiding question is how a sum over $n$ can be transformed into a dual sum over the possible integer values of $f'(x)$. This is the discrete analogue of the stationary phase principle for oscillatory integrals.
The following formulation records the shape of the transformation rather than the sharpest endpoint constants.
[quotetheorem:9059]
[citeproof:9059]
The B-process gives a conceptual explanation for the second derivative estimate: a sum over length $N$ with curvature about $\lambda$ has roughly $N\lambda$ stationary frequencies, each of size about $\lambda^{-1/2}$. This predicts a transformed length $N\lambda$ and total size at most $N\lambda^{1/2}$, with endpoint terms accounting for the additional $\lambda^{-1/2}$.
[example: Square-Root Phase]
Consider
\begin{align*}
S(N)=\sum_{N<n\le 2N}e(\sqrt n),
\end{align*}
and set $f(x)=\sqrt{x}=x^{1/2}$ on $[N,2N]$. Differentiating gives
\begin{align*}
f'(x)=\frac{1}{2}x^{-1/2}=\frac{1}{2\sqrt{x}}.
\end{align*}
Differentiating once more gives
\begin{align*}
f''(x)=\frac{1}{2}\left(-\frac{1}{2}\right)x^{-3/2}=-\frac{1}{4x^{3/2}}.
\end{align*}
Since $f''(x)<0$ on $[N,2N]$, the derivative $f'$ is decreasing, so its image is
\begin{align*}
f'([N,2N])=\left[\frac{1}{2\sqrt{2N}},\frac{1}{2\sqrt N}\right].
\end{align*}
The length of this derivative interval is
\begin{align*}
\frac{1}{2\sqrt N}-\frac{1}{2\sqrt{2N}}=\frac{1-1/\sqrt2}{2\sqrt N}.
\end{align*}
For $N\ge 1$, this length is less than $1$, and the whole interval lies in $(0,1/2]$. Hence there is no integer $m$ with $m\in f'([N,2N])$, except for the endpoint possibility $f'(1)=1/2$ when $N=1$, which still is not an integer. Thus the main stationary-frequency sum predicted by *Van Der Corput B-Process* is empty on this dyadic block.
The same bounds make the first derivative estimate effective. For $x\in[N,2N]$ we have
\begin{align*}
0<f'(x)\le \frac{1}{2}.
\end{align*}
Therefore the distance from $f'(x)$ to the nearest integer is $f'(x)$, with a tie only when $f'(x)=1/2$, and so
\begin{align*}
\|f'(x)\|=f'(x)\ge \frac{1}{2\sqrt{2N}}.
\end{align*}
Applying *First Derivative Estimate* with $\lambda=1/(2\sqrt{2N})$ gives
\begin{align*}
|S(N)|\lesssim \lambda^{-1}=2\sqrt{2N}.
\end{align*}
Absorbing the numerical factor into the implicit constant,
\begin{align*}
|S(N)|\lesssim N^{1/2}.
\end{align*}
Thus the square-root phase has no sizeable B-process stationary family on a large dyadic block, while its monotone derivative stays far enough from integers to give square-root-sized cancellation compared with the length bound $|S(N)|\le N$.
[/example]
The B-process has converted a geometric idea into a transformation of estimates. To apply such transformations without restating derivative hypotheses every time, the course records admissible bounds using exponent pairs. This notation keeps track of how much dependence on the derivative parameter is traded for how much dependence on the interval length.
[definition: Exponent Pair]
A pair $(\kappa,\lambda)\in\mathbb R^2$ is an exponent pair if, for every smooth real phase $f$ on an interval of length $N$ satisfying the standard one-parameter derivative conditions with parameter $T$, the bound
\begin{align*}
\sum_{n\in I}e(f(n))\lesssim T^\kappa N^\lambda
\end{align*}
holds uniformly in the allowed range of $T$ and $N$, with an implicit constant depending only on the admissible derivative constants.
[/definition]
The precise derivative hypotheses vary slightly between texts, but the transformation rules are stable. The basic pair $(0,1)$ records the estimate by length alone, while further pairs arise from applying A and B to previous pairs. The next theorem gives the algebraic rules that make exponent pairs useful as a calculus of estimates.
[quotetheorem:9060]
[citeproof:9060]
These transformations allow a small set of analytic estimates to generate many consequences, but the calculus has built-in restrictions. An exponent pair is meaningful only relative to the derivative hypotheses under which the A- and B-processes are valid, so applying the algebraic formulas silently assumes the same smoothness, monotonicity, and one-parameter control needed in the preceding estimates. The transformations also do not describe every possible admissible pair: one usually closes the known pairs under convexity, interpolation with the length bound, and repeated A/B operations, and the resulting convex hull records the estimates actually available from this method.
The next example shows the classical first nonlength pair obtained from the length bound.
[example: The Classical Exponent Pair One Sixth Two Thirds]
Start with the length-bound exponent pair $(0,1)$. By *Exponent Pair Transformations*, the B-process sends an exponent pair $(\kappa,\lambda)$ to
\begin{align*}
B(\kappa,\lambda)=\left(\lambda-\frac{1}{2},\kappa+\frac{1}{2}\right).
\end{align*}
Substituting $\kappa=0$ and $\lambda=1$ gives
\begin{align*}
B(0,1)=\left(1-\frac{1}{2},0+\frac{1}{2}\right).
\end{align*}
Since $1-\frac{1}{2}=\frac{1}{2}$ and $0+\frac{1}{2}=\frac{1}{2}$, we get
\begin{align*}
B(0,1)=\left(\frac{1}{2},\frac{1}{2}\right).
\end{align*}
Now apply the A-process to $(1/2,1/2)$. By *Exponent Pair Transformations* again,
\begin{align*}
A(\kappa,\lambda)=\left(\frac{\kappa}{2\kappa+2},\frac{\kappa+\lambda+1}{2\kappa+2}\right).
\end{align*}
With $\kappa=\frac{1}{2}$ and $\lambda=\frac{1}{2}$, the common denominator is
\begin{align*}
2\kappa+2=2\cdot\frac{1}{2}+2=1+2=3.
\end{align*}
The first coordinate is
\begin{align*}
\frac{\kappa}{2\kappa+2}=\frac{1/2}{3}=\frac{1}{2}\cdot\frac{1}{3}=\frac{1}{6}.
\end{align*}
The numerator of the second coordinate is
\begin{align*}
\kappa+\lambda+1=\frac{1}{2}+\frac{1}{2}+1=1+1=2.
\end{align*}
Therefore the second coordinate is
\begin{align*}
\frac{\kappa+\lambda+1}{2\kappa+2}=\frac{2}{3}.
\end{align*}
Combining the two coordinates gives
\begin{align*}
A(B(0,1))=A\left(\frac{1}{2},\frac{1}{2}\right)=\left(\frac{1}{6},\frac{2}{3}\right).
\end{align*}
The pair $(1/6,2/3)$ is the first classical nonlength pair obtained from the length-bound pair by applying $B$ once and then $A$ once.
[/example]
The chapter's main message is that derivative tests, differencing, and stationary phase are not separate tricks. They are three expressions of the same idea: cancellation becomes visible after choosing the right representation of the exponential sum. The following chapters use these estimates on minor arcs, where saving a small power of the length is often enough to make the circle method converge.
Chapter 3 organized cancellation through differencing and smooth phase estimates, showing how lower-order structure emerges after repeated transformations. Chapter 4 globalizes the same philosophy by studying mean values of exponential sums, where the collective behavior of many sums reveals the bounds needed later in the circle method.
# 4. Vinogradov mean values
Chapters 2 and 3 developed differencing as a way to replace a difficult exponential sum by lower-degree or smoother correlation sums. This chapter explains what happens when those differencing identities are organized globally rather than applied to a single sum. The central object is the Vinogradov mean value, which counts solutions of a system of power-sum equations and turns analytic moment estimates into arithmetic information for the minor-arc estimates used in Chapters 5 and 6.
The main question is how large a Weyl sum can be when its leading coefficient is poorly approximated by rationals with small denominator. Mean values do not answer this pointwise question directly, but they give strong average control; Markov, Hölder, and interpolation arguments then convert average estimates into bounds usable on minor arcs. [Hua's lemma](/theorems/9065) is the bridge from this moment technology to Waring-type applications.
## The Vinogradov System
The first problem is to understand the even moments of the basic polynomial Weyl sum
$S_k(\theta;X)=\sum_{1\le n\le X} e(\theta_1 n+\cdots+\theta_k n^k)$, where $\theta=(\theta_1,\dots,\theta_k)\in[0,1)^k$. Expanding $|S_k(\theta;X)|^{2s}$ and integrating term-by-term turns cancellation into a count of exact Diophantine coincidences.
[definition: Vinogradov System]
Let $s,k\in\mathbb N$ and $X\ge 1$. The Vinogradov system of degree $k$ with $2s$ variables is the system
\begin{align*}
x_1^j+\cdots+x_s^j=y_1^j+\cdots+y_s^j,\qquad 1\le j\le k,
\end{align*}
in integer variables $1\le x_i,y_i\le X$.
[/definition]
The equations ask that two multisets of $s$ integers have the same first $k$ power sums. For $s\le k$, Newton's identities force equality of the two multisets, but for larger $s$ there may be genuinely off-diagonal solutions. Counting both types is the purpose of the mean value.
[definition: Vinogradov Mean Value]
For $s,k\in\mathbb N$, the Vinogradov mean value of degree $k$ and moment parameter $s$ is the function
\begin{align*}
J_{s,k}:[1,\infty)\to\mathbb N_0
\end{align*}
defined by
\begin{align*}
J_{s,k}(X)=\#\left\{(x_1,\dots,x_s,y_1,\dots,y_s)\in\{1,\dots,\lfloor X\rfloor\}^{2s}:\sum_{i=1}^s x_i^j=\sum_{i=1}^s y_i^j\text{ for }1\le j\le k\right\}.
\end{align*}
[/definition]
This definition is arithmetic, but its power comes from its analytic representation as a moment. The system contains $k$ simultaneous equations, so a one-dimensional orthogonality integral would detect only one constraint and lose the rest. Integrating over the full $k$-torus supplies one frequency variable for each power-sum equation and turns the whole system into a single moment identity.
[quotetheorem:9061]
[citeproof:9061]
The identity shows why Vinogradov mean values are central: an upper bound for $J_{s,k}(X)$ is an $L^{2s}$ bound for polynomial exponential sums uniform in all coefficients. The full $k$-dimensional integration is essential: integrating over only the leading coefficient would impose only the top-degree equation and would miss the lower power-sum constraints. A concrete failure occurs already for $k=2$ and $s=2$: the tuple $(x_1,x_2,y_1,y_2)=(1,7,5,5)$ has
\begin{align*}
1^2+7^2=5^2+5^2,
\end{align*}
but $1+7\ne 5+5$. An integral over only $\theta_2$ would count this tuple, while $J_{2,2}(X)$ must exclude it because the linear power-sum equation fails. Thus the identity packages simultaneous cancellation in all coefficients into one arithmetic count, which is exactly what later minor-arc arguments need.
The diagonal solutions already give a lower bound, while dimension-counting gives another obstruction.
[remark: Expected Size]
There are at least $s!X^s+O_s(X^{s-1})$ diagonal solutions obtained by permuting the $y_i$ among the $x_i$. A scaling heuristic also predicts about $X^{2s-k(k+1)/2}$ solutions when the $k$ equations behave independently. Thus the expected upper bound has the shape
\begin{align*}
J_{s,k}(X)\lesssim_{s,k,\varepsilon} X^\varepsilon\left(X^s+X^{2s-k(k+1)/2}\right).
\end{align*}
[/remark]
The expected size raises the decisive question for applications: can the two natural lower-bound mechanisms be the only obstructions to a sharp upper bound? Without an estimate of this strength, a moment argument could lose too many powers of $X$ to prove that the minor arcs are smaller than the main term; the diagonal count alone gives no information about possible high-density families of off-diagonal solutions. The modern [Vinogradov mean value theorem](/theorems/9062) answers this question and supplies the uniform moment estimate needed later on minor arcs.
[quotetheorem:9062]
The staged theorem manifest records this card as a statement-only external input for these notes; its proof belongs to the modern Vinogradov mean value theory rather than to the present course.
This deep result of Bourgain, Demeter, and Guth, independently in efficient congruencing form by Wooley, is used here as a black-box input. The hypotheses are deliberately uniform in all coefficient vectors, so the theorem gives no direct pointwise information about a specified sum until it is combined with a large-value or interpolation argument. We use it as a replacement for the weaker classical estimates whenever the sharp exponent matters.
[example: Elementary Computation Of J Two Two]
Take $s=2$ and $k=2$, and write $N=\lfloor X\rfloor$. The system counted by $J_{2,2}(X)$ is
\begin{align*}
x_1+x_2=y_1+y_2
\end{align*}
together with
\begin{align*}
x_1^2+x_2^2=y_1^2+y_2^2.
\end{align*}
Squaring the first equation gives
\begin{align*}
(x_1+x_2)^2=(y_1+y_2)^2.
\end{align*}
Using $(a+b)^2=a^2+b^2+2ab$ on both sides gives
\begin{align*}
x_1^2+x_2^2+2x_1x_2=y_1^2+y_2^2+2y_1y_2.
\end{align*}
Subtracting the second equation from this identity leaves
\begin{align*}
2x_1x_2=2y_1y_2,
\end{align*}
so
\begin{align*}
x_1x_2=y_1y_2.
\end{align*}
Thus the ordered pair $(y_1,y_2)$ has the same sum and product as $(x_1,x_2)$. Equivalently, $y_1$ and $y_2$ are the two roots of
\begin{align*}
t^2-(x_1+x_2)t+x_1x_2=0.
\end{align*}
But
\begin{align*}
t^2-(x_1+x_2)t+x_1x_2=(t-x_1)(t-x_2),
\end{align*}
so $(y_1,y_2)$ is a permutation of $(x_1,x_2)$.
There are $N^2$ choices for the ordered pair $(x_1,x_2)$. For the $N$ choices with $x_1=x_2$, there is exactly one corresponding ordered pair $(y_1,y_2)$, namely $(x_1,x_1)$. For the remaining $N^2-N$ choices with $x_1\ne x_2$, there are exactly two corresponding ordered pairs, namely $(x_1,x_2)$ and $(x_2,x_1)$. Therefore
\begin{align*}
J_{2,2}(X)=N+2(N^2-N)=2N^2-N=2\lfloor X\rfloor^2-\lfloor X\rfloor.
\end{align*}
This shows that in the case $s=k=2$ all solutions are diagonal, and the mean value has size $J_{2,2}(X)\asymp X^2$.
[/example]
The example illustrates the diagonal regime $s\le k$. Once $s$ is larger, off-diagonal solutions appear and the second term in the Vinogradov [mean value theorem](/theorems/186) begins to control the answer.
## From Moments To Pointwise Weyl Sum Bounds
The next problem is to turn an integral estimate into a bound for an individual Weyl sum. A single large value of a function can hide inside a high moment unless the set on which it is large has enough measure, so the analytic task is to manufacture such a set from Diophantine approximation and local stability of the phase.
[quotetheorem:514]
[citeproof:514]
This is the cleanest large-value estimate, but by itself it only bounds the measure of exceptional coefficient vectors. The hypothesis $F\in L^{2s}$ is doing real work: without a finite moment the right-hand side may be infinite, and the statement gives no exclusion of large values. For example, on $(0,1)$ the function $F(t)=t^{-1/(2s)}$ is not in $L^{2s}$, and for every $T\ge 1$ the large-value set has measure $T^{-2s}$ while the formal right-hand side is infinite. The positive threshold is also necessary: if $T=0$ and $F$ vanishes only on a null set, the expression $T^{-2s}$ is meaningless rather than a bound. For a pointwise Weyl bound, we combine Markov's estimate with information saying that if the sum is large at one point, it remains large on a small box around that point.
[quotetheorem:9063]
[citeproof:9063]
The stability box supplies the missing measure lower bound. Its anisotropic side lengths are forced by the derivatives: changing $\theta_k$ by the same amount as $\theta_1$ would move the phase by about $X^k$ times more. This cannot be replaced by an isotropic box of the same side length in every direction. At $\theta=0$ one has $S_k(0;X)=\lfloor X\rfloor$, but moving only in the top coefficient to $\theta_k'=1/(2X^k)$ changes the phase of the terms with $n$ near $X$ by order $1$, so a box with $\theta_k$-side comparable to $AX^{-2}$ would be far too large when $k\ge 2$ and $A\asymp X$. The lower bound $A/2$ is also tied to the small constant in the side lengths; increasing that constant without restriction can move the sum past a point of substantial cancellation. The estimate is local in coefficient space, so it does not identify which rational approximations create large values; it only says that a large value cannot be supported on a set of arbitrarily small measure. The next step is to compare that lower bound with Markov's upper bound.
[quotetheorem:9064]
[citeproof:9064]
This conversion is intentionally crude but logically precise: it gives a global pointwise bound from a global moment bound and local Lipschitz stability alone. Both inputs are necessary. A finite moment bound without stability cannot control a point value: a function on $[0,1]$ may have height $N$ on an interval of length $N^{-4s}$ and height $0$ elsewhere, giving a small $L^{2s}$ norm but a very large supremum. Conversely, stability without the assumed moment bound only says a large value occupies a box; it gives no contradiction unless the total $2s$-moment is small enough. The displayed exponent must also be read against the value at $\theta=0$, where $|S_k(0;X)|=\lfloor X\rfloor$; if the assumed exponent $s+\Delta$ is too weak, the theorem returns a bound no better than the elementary estimate $|S_k|\le X$. To obtain the sharper minor-arc estimates used in the circle method, one adds a separate rational-approximation analysis that rules out major-arc boxes and usually improves the stability-volume comparison. The theorem here isolates the mechanism without building an undefined major-arc condition into the statement.
[example: Converting An L Two S Mean Value Into A Pointwise Estimate]
Assume the moment estimate
\begin{align*}
\int_{[0,1)^k}|S_k(\phi;X)|^{2s}\,d\phi\le C_\varepsilon X^{s+\varepsilon}.
\end{align*}
Fix $\theta\in[0,1)^k$ and put $A=|S_k(\theta;X)|$. If $A<1$, the elementary bound is already stronger than any positive power of $X$, so assume $A\ge 1$. By *[Local Stability Of Polynomial Weyl Sums](/theorems/9063)*, there is a box $B$ centred at $\theta$ such that $|S_k(\phi;X)|\ge A/2$ for every $\phi\in B$, and the side length in the $\phi_j$ direction is at least a constant multiple of $AX^{-j-1}$. Hence
\begin{align*}
\operatorname{vol}(B)\ge c_k\prod_{j=1}^k AX^{-j-1}.
\end{align*}
The product of the $A$ factors is $A^k$, and
\begin{align*}
\sum_{j=1}^k(j+1)=\sum_{j=1}^k j+\sum_{j=1}^k1=\frac{k(k+1)}2+k=\frac{k(k+3)}2.
\end{align*}
Therefore
\begin{align*}
\operatorname{vol}(B)\ge c_k A^kX^{-k(k+3)/2}.
\end{align*}
Let $E=\{\phi\in[0,1)^k:|S_k(\phi;X)|\ge A/2\}$. Since $B\subseteq E$,
\begin{align*}
c_k A^kX^{-k(k+3)/2}\le \mathcal L^k(E).
\end{align*}
By *Markov Bound For Large Values* and the assumed moment estimate,
\begin{align*}
\mathcal L^k(E)\le (A/2)^{-2s}\int_{[0,1)^k}|S_k(\phi;X)|^{2s}\,d\phi\le 2^{2s}C_\varepsilon A^{-2s}X^{s+\varepsilon}.
\end{align*}
Combining the lower and upper bounds for $\mathcal L^k(E)$ gives
\begin{align*}
c_k A^kX^{-k(k+3)/2}\le 2^{2s}C_\varepsilon A^{-2s}X^{s+\varepsilon}.
\end{align*}
Multiplying by $A^{2s}X^{k(k+3)/2}/c_k$ yields
\begin{align*}
A^{2s+k}\le C'_{s,k,\varepsilon}X^{s+k(k+3)/2+\varepsilon}.
\end{align*}
Taking the $(2s+k)$-th root gives
\begin{align*}
A\le C''_{s,k,\varepsilon}X^{\frac{s+k(k+3)/2}{2s+k}+\varepsilon}.
\end{align*}
This exponent is strictly smaller than $1$ exactly when
\begin{align*}
s+\frac{k(k+3)}2<2s+k,
\end{align*}
which is equivalent to
\begin{align*}
s>\frac{k(k+1)}2.
\end{align*}
Thus, in this idealized moment situation, sufficiently high moments force a power saving over the elementary estimate $|S_k(\theta;X)|\le X$; in an actual minor-arc argument, the separate major-arc exclusion identifies the coefficient vectors where this saving is useful.
[/example]
## Hua's Lemma And Waring-Type Minor Arcs
The third problem is more concrete: Waring's problem needs estimates for sums $\sum_{n\le X}e(\alpha n^k)$ with one coefficient, not the full $k$-parameter Weyl sum. Hua's lemma gives a family of one-dimensional moment bounds strong enough to control minor arcs after Weyl differencing.
[quotetheorem:9065]
[citeproof:9065]
Hua's lemma predates the sharp Vinogradov mean value theorem and is weaker than what modern VMVT can imply in many ranges. Its hypotheses are tailored to pure $k$-th power sums; if lower-degree coefficients are allowed to vary independently, the full Vinogradov mean value theorem is the natural replacement. The purity hypothesis is not cosmetic. If the phase is replaced by $\frac{1}{2}(n^k-n)$, then every term has phase an integer because $n^k-n$ is even, so the sum has no oscillation at all, while the differencing proof above no longer tracks only the single highest monomial in the same way. Likewise, dropping the integration over a full period would break the orthogonality step: integrating over a short interval around $0$ would see $f_k(\alpha;X)\approx X$ throughout that interval and would not produce the same power saving. Its advantage is that it is one-dimensional and fits exactly into the classical circle method treatment of sums of $k$-th powers.
[example: Cubic Minor Arc Bound From Hua]
Let $f_3(\alpha;X)=\sum_{n\le X}e(\alpha n^3)$, and let $\mathfrak m\subseteq[0,1)$ be a minor-arc set on which the Weyl bound
\begin{align*}
\sup_{\alpha\in\mathfrak m}|f_3(\alpha;X)|\le C_\delta X^{1-\delta}
\end{align*}
holds for some $\delta>0$. By *Hua's Lemma* with $k=3$ and $j=3$,
\begin{align*}
\int_0^1|f_3(\alpha;X)|^8\,d\alpha\le C_\varepsilon X^{5+\varepsilon}.
\end{align*}
For any $r\ge 8$, write the integrand as
\begin{align*}
|f_3(\alpha;X)|^r=|f_3(\alpha;X)|^{r-8}|f_3(\alpha;X)|^8.
\end{align*}
Since $\alpha\in\mathfrak m$ implies $|f_3(\alpha;X)|^{r-8}\le (C_\delta X^{1-\delta})^{r-8}$, we get
\begin{align*}
\int_{\mathfrak m}|f_3(\alpha;X)|^r\,d\alpha\le (C_\delta X^{1-\delta})^{r-8}\int_{\mathfrak m}|f_3(\alpha;X)|^8\,d\alpha.
\end{align*}
Because $\mathfrak m\subseteq[0,1)$,
\begin{align*}
\int_{\mathfrak m}|f_3(\alpha;X)|^8\,d\alpha\le \int_0^1|f_3(\alpha;X)|^8\,d\alpha\le C_\varepsilon X^{5+\varepsilon}.
\end{align*}
Therefore
\begin{align*}
\int_{\mathfrak m}|f_3(\alpha;X)|^r\,d\alpha\le C_{\delta}^{r-8}C_\varepsilon X^{(1-\delta)(r-8)+5+\varepsilon}.
\end{align*}
Expanding the exponent gives
\begin{align*}
(1-\delta)(r-8)+5+\varepsilon=r-3-\delta(r-8)+\varepsilon.
\end{align*}
Thus Hua's eighth moment controls the first eight factors, while the minor-arc Weyl saving supplies an additional factor $X^{-\delta}$ for each of the remaining $r-8$ powers.
[/example]
The cubic example shows how moment bounds are spent inside a minor-arc integral, but it still leaves a pointwise question: where does the saving $|f_k(\alpha;X)|\le X^{1-\delta}$ come from? The classical answer is Vinogradov's polynomial Weyl-sum bound, which converts a rational approximation to the leading coefficient into an explicit cancellation estimate.
[quotetheorem:9066]
[citeproof:9066]
This is the classical pointwise estimate behind many minor-arc arguments. The coprimality and approximation hypotheses are essential for the spacing estimate. If the coprimality condition is dropped, the same rational number can be represented with an inflated denominator: $\alpha_k=0$ may be written as $a/q=0/q$ with large $q$ and $(a,q)\ne 1$, which would falsely predict cancellation although $\sum_{n\le X}e(P(n))$ can equal $X$ when all coefficients are integral. The approximation hypothesis is also necessary: choosing an unrelated denominator $q$ for a coefficient with no nearby rational $a/q$ gives no spacing information about $\|h\alpha_k\|$. If $q$ is tiny and genuinely approximates $\alpha_k$, the leading coefficient is close to a major-arc rational and the bound may have no useful saving. The factor is small when $q$ is neither tiny nor too close to $X^k$, which is exactly the minor-arc regime after Dirichlet approximation has assigned each $\alpha_k$ a rational approximant.
[example: Deriving A Minor Arc Bound For Cubic Weyl Sums]
Let $f_3(\alpha;X)=\sum_{n\le X}e(\alpha n^3)$. By *[Vinogradov Bound For Polynomial Weyl Sums](/theorems/9066)* with $k=3$, if $|\alpha-a/q|\le q^{-2}$ and $(a,q)=1$, then
\begin{align*}
|f_3(\alpha;X)|\lesssim_\varepsilon X^{1+\varepsilon}\left(q^{-1}+X^{-1}+qX^{-3}\right)^{1/4}.
\end{align*}
Assume that $\alpha$ lies on a cubic minor arc for which
\begin{align*}
X^\eta\le q\le X^{3-\eta}
\end{align*}
with fixed $\eta>0$. From $q\ge X^\eta$ we have
\begin{align*}
q^{-1}\le X^{-\eta}.
\end{align*}
From $q\le X^{3-\eta}$ we have
\begin{align*}
qX^{-3}\le X^{3-\eta}X^{-3}=X^{-\eta}.
\end{align*}
Also $X^{-1}$ is bounded by $X^{-\eta'}$ if $\eta'=\min\{\eta,1\}$. Since $X^{-\eta}\le X^{-\eta'}$, the three terms in the Weyl factor satisfy
\begin{align*}
q^{-1}+X^{-1}+qX^{-3}\le 3X^{-\eta'}.
\end{align*}
Substituting this estimate into the cubic Weyl bound gives
\begin{align*}
|f_3(\alpha;X)|\lesssim_\varepsilon X^{1+\varepsilon}(3X^{-\eta'})^{1/4}.
\end{align*}
The factor $3^{1/4}$ is absolute, so it is absorbed into the implicit constant:
\begin{align*}
|f_3(\alpha;X)|\lesssim_\varepsilon X^{1-\eta'/4+\varepsilon}.
\end{align*}
Thus, choosing $\varepsilon<\eta'/4$ gives a fixed power saving over the elementary bound $|f_3(\alpha;X)|\le \lfloor X\rfloor\le X$ on these minor arcs, which is the pointwise input needed in the minor-arc integral estimates.
[/example]
The chapter therefore leaves us with two complementary tools. The Vinogradov mean value theorem supplies near-optimal global moment control for complete polynomial coefficient vectors, while Hua's lemma and Vinogradov's bound give the one-dimensional estimates most directly inserted into Waring-type circle method integrals. The next chapter uses these estimates to separate major arcs, where singular series and singular integrals appear, from minor arcs, where the savings proved here dominate.
The mean-value estimates of Chapter 4 provide the quantitative input that makes minor-arc analysis effective. Chapter 5 uses them to assemble the Hardy-Littlewood circle method into a general framework for turning representation problems into major-arc asymptotics and minor-arc error bounds.
# 5. The Hardy-Littlewood circle method
Chapters 2--4 developed the Weyl, van der Corput, and Vinogradov mean-value estimates that separate structured rational phases from genuinely oscillatory phases. This chapter turns those estimates into an asymptotic counting machine. The Hardy-Littlewood circle method rewrites a representation problem as a Fourier coefficient, splits the unit circle into major and minor arcs, and then interprets the main term as the product of a real density and a family of congruence densities.
The guiding example is Waring's problem: count solutions of
\begin{align*}
x_1^k+\dots+x_s^k=N,\qquad 1\le x_i\le P,\qquad P\asymp N^{1/k}.
\end{align*}
The same architecture applies to many additive equations, but kth powers expose all the central features: generating functions, rational approximations, complete exponential sums, singular integrals, singular series, and local obstructions.
## Generating Functions and Representation Integrals
How can an additive equation be converted into an analytic object that remembers the number of representations? The first step is to encode the allowed summands into a trigonometric polynomial and recover the desired coefficient by orthogonality on the circle.
[definition: Circle Exponential]
The circle exponential is the function $e:\mathbb R\to\mathbb C$ defined by
\begin{align*}
e(x)=e^{2\pi i x}.
\end{align*}
[/definition]
This notation makes additive characters on $\mathbb R/\mathbb Z$ look like monomials in an ordinary generating function. To count sums of kth powers, we need a trigonometric polynomial whose frequencies are precisely the allowed powers $x^k$.
[definition: Kth-Power Weyl Sum]
Let $k\ge 2$ and let $P\ge 1$. The kth-power Weyl sum of length $P$ is the function $S_k(\cdot;P):\mathbb R\to\mathbb C$ defined by
\begin{align*}
S_k(\alpha;P)=\sum_{1\le x\le P} e(\alpha x^k).
\end{align*}
[/definition]
The power $S_k(\alpha;P)^s$ expands into a sum over $s$ variables, so it contains all possible ordered sums $x_1^k+\dots+x_s^k$. The counting problem is to keep only those terms whose total frequency is exactly $N$ and discard all others. Circle orthogonality supplies that filter: multiplying by $e(-N\alpha)$ shifts the target frequency to zero, and integration keeps precisely the zero-frequency terms.
[quotetheorem:9067]
[citeproof:9067]
This identity is the point at which Diophantine counting becomes harmonic analysis. Its hypotheses matter in two ways: the variables are positive integers in a finite interval, and $N$ is an integer frequency, so the orthogonality integral is exactly a Kronecker delta. If the summation set were weighted or infinite, the same displayed formula would need convergence or smoothing hypotheses before the interchange of summation and integration could be justified. The identity gives no asymptotic information by itself; it only transfers the problem to estimating the integral, which is why the rest of the chapter is devoted to identifying where that integral has mass.
[example: Representations as Five Squares]
For $k=2$ and $s=5$, take $P=\lfloor N^{1/2}\rfloor$ and define $S_2(\alpha;P)=\sum_{1\le x\le P}e(\alpha x^2)$. Expanding the fifth power gives
\begin{align*}
S_2(\alpha;P)^5=\sum_{1\le x_1,\dots,x_5\le P}e(\alpha(x_1^2+x_2^2+x_3^2+x_4^2+x_5^2)).
\end{align*}
Multiplying by $e(-N\alpha)$ and using $e(u)e(v)=e(u+v)$, we get
\begin{align*}
S_2(\alpha;P)^5e(-N\alpha)=\sum_{1\le x_1,\dots,x_5\le P}e(\alpha(x_1^2+x_2^2+x_3^2+x_4^2+x_5^2-N)).
\end{align*}
For an integer $m$, the orthogonality integral is
\begin{align*}
\int_0^1 e(m\alpha)\,d\alpha=1
\end{align*}
when $m=0$, while for $m\ne 0$ it is
\begin{align*}
\int_0^1 e(m\alpha)\,d\alpha=\frac{e(m)-1}{2\pi i m}=0
\end{align*}
because $e(m)=e^{2\pi i m}=1$. Therefore
\begin{align*}
\int_0^1 S_2(\alpha;P)^5e(-N\alpha)\,d\alpha=\#\{(x_1,\dots,x_5)\in\{1,\dots,P\}^5:x_1^2+\cdots+x_5^2=N\}=R_{5,2}(N;P).
\end{align*}
Since $P\asymp N^{1/2}$, the circle-method main term predicted later has real scale $P^{5-2}=P^3\asymp N^{3/2}=N^{5/2-1}$, multiplied by the singular series and singular integral; proving the asymptotic then amounts to showing that the minor-arc integral is $o(N^{3/2})$.
[/example]
The example shows the main tension. Near rationals with small denominator, the phase $\alpha x^k$ has visible arithmetic structure; away from such rationals, Weyl estimates should force cancellation.
## Major Arcs, Minor Arcs, and Farey Dissection
Where on the unit circle can the integral have a main contribution? The answer is governed by rational approximation: points close to $a/q$ with small $q$ behave like finite congruence sums modulo $q$ times a smooth real oscillatory integral.
[definition: Major Arcs for Kth Powers]
Let $Q\ge 1$, let $P\ge 1$, and let $k\ge 2$. For coprime integers $a,q$ with $1\le q\le Q$ and $0\le a<q$, define
\begin{align*}
\mathfrak M(q,a):=\left\{\alpha\in[0,1):\left|\alpha-\frac{a}{q}\right|\le \frac{Q}{qP^k}\right\}.
\end{align*}
The major arcs are
\begin{align*}
\mathfrak M:=\bigcup_{1\le q\le Q}\ \bigcup_{0\le a<q,\ \gcd(a,q)=1}\mathfrak M(q,a),
\end{align*}
and the minor arcs are $\mathfrak m:=[0,1)\setminus\mathfrak M$.
[/definition]
The cutoff $Q$ is a balancing parameter. It must be large enough to include all rational phases that contribute to the main term, but small enough that the arcs remain essentially disjoint. We next need a finite exponential sum that records the arithmetic oscillation at the rational centre $a/q$.
[definition: Complete Kth-Power Sum]
For $k\ge 2$, the complete kth-power sum is the function $C_k:\mathbb N\times\mathbb Z\to\mathbb C$ defined by
\begin{align*}
C_k(q,a)=\sum_{r=1}^{q} e\left(\frac{a r^k}{q}\right).
\end{align*}
[/definition]
Complete sums isolate the arithmetic part of a major arc. Once this modular oscillation is separated, we still need a continuous object that measures how the phase changes as $\alpha$ moves away from $a/q$.
[definition: Local Major Arc Integral]
For $k\ge 2$ and $P\ge 1$, the local major arc integral is the function $V_k(\cdot;P):\mathbb R\to\mathbb C$ defined by
\begin{align*}
V_k(\beta;P)=\int_0^P e(\beta t^k)\,dt.
\end{align*}
[/definition]
The complete sum and the local integral have been introduced because the major arcs should factor into arithmetic and real pieces. Near a rational point $a/q$, the summation variable has two simultaneous roles: its residue class modulo $q$ controls the arithmetic oscillation, while its size controls the slowly varying real phase $e(\beta x^k)$.
The key obstacle is to justify replacing a discrete Weyl sum by the product of these two pieces. A useful major-arc formula must separate residue-class oscillation from continuous variation and still give an error term small enough to be inserted into the circle-method integral.
The next approximation is the local bridge between the discrete and continuous objects. It explains exactly when the Weyl sum near $a/q$ is governed by the product $q^{-1}S(q,a)V_k(\beta;P)$, and it records the size of the error that later major-arc integrations must absorb.
[quotetheorem:9068]
[citeproof:9068]
This approximation evaluates a single Weyl sum near one rational centre. The coprimality condition ensures that $a/q$ is a reduced rational centre; without it the same point would be counted repeatedly with different denominators, which would distort the complete-sum factor. The restriction $|\beta|\le Q/(qP^k)$ is also structural: outside this range the continuous factor is no longer slowly varying on residue classes modulo $q$, and the displayed error may be as large as the expected main term. The theorem is therefore a local model, not a global estimate for all $\alpha$, and the next step is to insert it only on the arcs where its hypotheses are designed to hold.
[quotetheorem:9069]
[citeproof:9069]
This theorem names the main strategy of the method, but it also exposes its main limitation: a decomposition is only useful when the proposed main term is nonzero and the two error estimates are genuinely relative to it. If local congruence obstructions force $M(N;P)=0$, then a statement of the form $o(M(N;P))$ has no asymptotic meaning, and one must first identify the obstruction rather than estimate minor arcs. The partition into $\mathfrak M$ and $\mathfrak m$ is flexible, so the quality of the final result depends on choosing rational centres finely enough to capture structure but sparsely enough to keep the major arc analysis controlled. The next organizational issue is therefore how to choose those centres without ambiguity, especially when many nearby fractions compete to approximate the same point of the circle.
[remark: Farey Dissection]
A Farey dissection chooses rational centres $a/q$ with $q\le Q$ according to the Farey sequence of order $Q$. This gives a canonical partition of $[0,1)$ into intervals attached to reduced fractions, with endpoints at mediants of neighbouring Farey fractions. In many treatments, the major arcs are the central parts of these Farey intervals and the minor arcs are the remaining parts.
[/remark]
The Farey viewpoint is useful because it avoids overlaps and records the best small-denominator approximation to each point. In applications, the more flexible overlapping definition above is often enough once $Q$ is chosen so that overlaps are harmless or can be accounted for.
## Singular Integrals and Singular Series
What remains after the major arc approximation is inserted into the representation integral? The main term separates into a real density measuring solutions over $\mathbb R$ and an arithmetic density measuring solutions modulo powers of primes. Without the real density, the method would predict the same order of magnitude for every target size, even though the hypersurface $u_1^k+\dots+u_s^k=\tau$ has a different real volume as $\tau$ varies and may have no positive real points for some signs or ranges.
[definition: Singular Integral for Kth Powers]
Let $k,s\in\mathbb N$ with $k\ge 2$. The singular integral for kth powers is the partially defined function $J_{s,k}:(0,\infty)\to\mathbb C$ defined by
\begin{align*}
J_{s,k}(\tau)=\int_{-\infty}^{\infty}\left(\int_0^1 e(\gamma t^k)\,dt\right)^s e(-\tau\gamma)\,d\gamma,
\end{align*}
where the value is defined whenever the improper integral exists.
[/definition]
For the equation $x_1^k+\dots+x_s^k=N$ with $P=N^{1/k}$, scaling $t=Pu$ in $V_k(\beta;P)$ produces the power $P^{s-k}=N^{s/k-1}$ and the parameter $\tau=1$.
[example: Real Singular Integral for Kth Powers]
Set $P=N^{1/k}$, so $N=P^k$, and write $\gamma=\beta P^k$. In the local major arc integral, make the change of variables $t=Pu$, so $dt=P\,du$ and $t^k=P^k u^k$. Then
\begin{align*}
V_k(\beta;P)=\int_0^P e(\beta t^k)\,dt=P\int_0^1 e(\beta P^k u^k)\,du.
\end{align*}
Since $\gamma=\beta P^k$, this becomes
\begin{align*}
V_k(\beta;P)=P\int_0^1 e(\gamma u^k)\,du.
\end{align*}
Now compute the full real factor using the same substitution $\gamma=\beta P^k$. We have $\beta=\gamma/P^k$, $d\beta=P^{-k}\,d\gamma$, and
\begin{align*}
e(-N\beta)=e(-P^k\gamma/P^k)=e(-\gamma).
\end{align*}
Therefore
\begin{align*}
V_k(\beta;P)^s=P^s\left(\int_0^1 e(\gamma u^k)\,du\right)^s.
\end{align*}
Substituting these identities into the integral gives
\begin{align*}
\int_{-\infty}^{\infty}V_k(\beta;P)^s e(-N\beta)\,d\beta=P^{s-k}\int_{-\infty}^{\infty}\left(\int_0^1 e(\gamma u^k)\,du\right)^s e(-\gamma)\,d\gamma.
\end{align*}
By the definition of $J_{s,k}(1)$, the last integral is $J_{s,k}(1)$, so
\begin{align*}
\int_{-\infty}^{\infty}V_k(\beta;P)^s e(-N\beta)\,d\beta=P^{s-k}J_{s,k}(1).
\end{align*}
Since $P=N^{1/k}$, the scaling factor is
\begin{align*}
P^{s-k}=(N^{1/k})^{s-k}=N^{(s-k)/k}=N^{s/k-1}.
\end{align*}
Thus, once the truncated major arc range has been extended to the full real line with a lower-order tail error, the real singular integral contributes on the scale $N^{s/k-1}$; for $s>k$ this scale grows with $N$.
[/example]
The real factor alone cannot decide whether integer representations exist, because congruence restrictions may forbid them. We therefore need an arithmetic companion to the singular integral, built from the complete sums at all rational centres.
[definition: Singular Series for Kth Powers]
Let $k,s\in\mathbb N$ with $k\ge 2$. The singular series for kth powers is the partially defined function $\mathfrak S_{s,k}:\mathbb Z\to\mathbb C$ defined by
\begin{align*}
\mathfrak S_{s,k}(N)=\sum_{q=1}^{\infty}q^{-s}\sum_{0\le a<q,\ \gcd(a,q)=1}C_k(q,a)^s e\left(-\frac{aN}{q}\right),
\end{align*}
where the value is defined whenever the series converges.
[/definition]
This expression arises by summing the complete-sum contribution over all major arc centres. To interpret it arithmetically, we need a modular representation count for the same equation over $\mathbb Z/q\mathbb Z$.
[definition: Modular Representation Count]
For $k,s\in\mathbb N$ with $k\ge 2$ and $q\in\mathbb N$, the modular representation count is the function $M_{s,k}(\cdot;q):\mathbb Z\to\mathbb N\cup\{0\}$ defined by
\begin{align*}
M_{s,k}(N;q)=\#\{(x_1,\dots,x_s)\in(\mathbb Z/q\mathbb Z)^s:x_1^k+\dots+x_s^k\equiv N\pmod q\}.
\end{align*}
[/definition]
The modular count translates local solubility into a number that can be compared across prime powers. The analytic singular series is written as a sum of complete exponential sums, so its local meaning is not visible until those sums are converted back into congruence counts. Additive character orthogonality performs this conversion for each modulus, and the [Chinese remainder theorem](/theorems/734) is what lets the resulting information split into prime-power factors.
[quotetheorem:9070]
[citeproof:9070]
The theorem explains the phrase local-to-global heuristic. The absolute convergence hypothesis is not cosmetic: without it, rearranging the singular series into an Euler product can change the value or fail to define a product at all. The prime-power limits also encode a real limitation of the method, since a single missing congruence class modulo $p^h$ forces the corresponding local factor to vanish no matter how strong the analytic estimates are. The real density handles the archimedean place, while the product over primes handles all congruence classes modulo $p^h$; to use the asymptotic for existence, we therefore need to know when this product is positive rather than zero.
[quotetheorem:9071]
[citeproof:9071]
This criterion turns the singular series from a formal analytic object into a diagnostic tool. The positivity of each individual local factor is not enough unless the infinite product is controlled, because a product of positive numbers can still tend to zero. The second direction gives a concrete counterexample pattern: if a congruence obstruction appears at even one prime power, then no amount of real-variable analysis can produce integer solutions in the forbidden residue class. Before estimating minor arcs, one should first check that the target equation has solutions modulo every relevant prime power and that the surviving local densities do not decay away in the product.
[example: A Local Obstruction Modulo Nine]
For each residue $r\in\{0,1,\dots,8\}$, compute $r^3$ modulo $9$:
\begin{align*}
0^3\equiv 0,\quad 1^3\equiv 1,\quad 2^3=8\equiv 8,\quad 3^3=27\equiv 0,\quad 4^3=64\equiv 1\pmod 9.
\end{align*}
\begin{align*}
5^3=125\equiv 8,\quad 6^3=216\equiv 0,\quad 7^3=343\equiv 1,\quad 8^3=512\equiv 8\pmod 9.
\end{align*}
Thus every cube modulo $9$ lies in $\{0,1,8\}$, and $8\equiv -1\pmod 9$.
Now add two possible cube residues. The sums are
\begin{align*}
0+0\equiv 0,\quad 0+1\equiv 1,\quad 0+8\equiv 8,\quad 1+1\equiv 2,\quad 1+8\equiv 0,\quad 8+8\equiv 16\equiv 7\pmod 9.
\end{align*}
Therefore a sum of two cubes modulo $9$ can lie only in
\begin{align*}
\{0,1,2,7,8\}\pmod 9.
\end{align*}
The missing residue classes are
\begin{align*}
\{3,4,5,6\}\pmod 9.
\end{align*}
So if $N\equiv 3,4,5,$ or $6\pmod 9$, there are no residue classes $(x,y)\in(\mathbb Z/9\mathbb Z)^2$ with $x^3+y^3\equiv N\pmod 9$. Hence
\begin{align*}
M_{2,3}(N;9)=0.
\end{align*}
Any integer solution $x^3+y^3=N$ would reduce modulo $9$ to such a forbidden congruence solution, so the obstruction is genuine. In the Euler-product interpretation of the singular series, the local factor at $p=3$ is then zero, and the corresponding singular series vanishes.
[/example]
The obstruction is not a failure of analysis; it is genuine arithmetic information. The circle method predicts asymptotics only for integers passing the local tests encoded by the singular series.
## The Standard Asymptotic Shape
What does the method ultimately try to prove for Waring-type problems? Once the major arcs have been evaluated and the minor arcs bounded, the answer takes a universal form: a power of $N$, multiplied by the singular integral and singular series.
[quotetheorem:9072]
[citeproof:9072]
The theorem is a template rather than a single universal result because the necessary lower bound on $s$ depends on the available exponential-sum technology. Each hypothesis blocks a different possible failure: weak minor arc bounds can swamp the main term, divergent singular series can destroy the arithmetic factor, and a zero local density can make the predicted main term vanish. For example, a Waring-type equation in a residue class forbidden modulo a prime power cannot satisfy the positive-factor hypothesis, even if the real singular integral is nonzero. Stronger Weyl-type estimates and mean value theorems enlarge the range of additive problems accessible to the method, so the remaining analytic work is to prove the displayed minor arc estimate in as small a number of variables as possible.
[remark: What the Minor Arcs Must Prove]
The minor arc task is not to estimate $S_k(\alpha;P)$ pointwise at every point with the same strength. Often one combines a pointwise Weyl bound with an $L^u$ mean value estimate, using Hölder's inequality to control the $s$th moment on $\mathfrak m$. This is why the earlier chapters on Weyl differencing and Vinogradov-type mean values feed directly into the circle method.
[/remark]
At the end of this chapter, the circle method has been reduced to three verifications for any proposed additive problem: compute the real density, analyse the singular series and its local obstructions, and prove that the minor arcs are smaller than the predicted main term.
Chapter 5 reduced the circle method to checking local densities, singular series, and minor-arc saving. Chapter 6 applies that framework to Waring’s problem, where the abstract machinery becomes a concrete representation theorem for sums of powers.
# 6. Waring's problem
Waring's problem asks how efficiently the positive integers can be built from fixed powers. Chapters 2--5 developed Weyl sums, major and minor arcs, singular series, singular integrals, and mean-value estimates as tools for additive representation functions. Here those tools meet their first large-scale test: counting solutions of $n=x_1^k+\cdots+x_s^k$ with $x_i\in\mathbb N$, and separating existence questions from asymptotic counting questions.
## Exact and Asymptotic Forms of Waring's Problem
The first issue is not how to estimate an exponential sum, but what question the estimate is meant to answer. Waring's original problem asks for a uniform finite number of summands, while the Hardy-Littlewood method naturally predicts an asymptotic formula once the number of summands is large enough.
[definition: Waring Number]
Let $\mathbb N_{\ge 2}=\{k\in\mathbb N:k\ge 2\}$. The Waring number is the function
\begin{align*}
g:\mathbb N_{\ge 2}\to\mathbb N,\qquad k\mapsto g(k),
\end{align*}
defined by letting $g(k)$ be the least integer $s\in\mathbb N$ such that every positive integer is a sum of $s$ non-negative $k$-th powers, when such an integer exists.
[/definition]
The word "every" makes $g(k)$ sensitive to small exceptional integers. This is a mismatch with the circle method, whose estimates usually become effective only after $n$ is large and cannot by themselves repair finitely many small failures. To measure the part of Waring's problem controlled by large-$n$ analysis, we separate the eventual representation question from the exact one.
[definition: Asymptotic Waring Number]
The asymptotic Waring number is the function
\begin{align*}
G:\mathbb N_{\ge 2}\to\mathbb N,\qquad k\mapsto G(k),
\end{align*}
defined by letting $G(k)$ be the least integer $s\in\mathbb N$ such that every sufficiently large positive integer is a sum of $s$ non-negative $k$-th powers, when such an integer exists.
[/definition]
The inequality $G(k)\le g(k)$ follows from the definitions after restricting the exact statement to the sufficiently large range. In the analytic part of the chapter we often count representations by positive variables, because the circle method naturally sums over $1\le x\le P$; such a result is stronger than needed for the non-negative definition of $G(k)$. The analytic task in this chapter is to bound $G(k)$, and in stronger forms to count representations with an asymptotic main term.
[example: Four Squares as an Atypical Case]
For $k=2$, *Lagrange's Four-Square Theorem* says that every positive integer is a sum of four non-negative squares, so $g(2)\le 4$. To see that three squares do not suffice, reduce squares modulo $8$: if $a$ is even then $a^2\equiv 0$ or $4\pmod 8$, and if $a$ is odd then $a^2\equiv 1\pmod 8$. Thus each square has residue in $\{0,1,4\}$ modulo $8$, and the possible sums of three such residues are
\begin{align*}
\{0,1,2,3,4,5,6\}\pmod 8.
\end{align*}
The residue $7$ is missing, so no integer congruent to $7\pmod 8$, for instance $7$ itself, is a sum of three squares. Hence $g(2)\ge 4$, and together with [Lagrange's theorem](/theorems/782) this gives $g(2)=4$.
This exact answer is atypical: it comes from special quadratic structure and congruence restrictions for squares, not from the large-$n$ circle-method estimates that dominate higher powers. The circle method can still count square representations, but the equality $g(2)=4$ should not be treated as a model for larger $k$, where small exceptional integers strongly affect $g(k)$ while analytic estimates primarily control $G(k)$.
[/example]
The general existence theorem is older than the circle method. In this course it is used as a conceptual starting point, while the quantitative bounds below come from exponential sums.
[quotetheorem:9073]
[citeproof:9073]
Hilbert's theorem answers existence for each fixed exponent $k$; the hypothesis $k\ge 2$ places the problem beyond the additive case $k=1$, where no analytic machinery is needed. The theorem does not give a usable value of $g(k)$, does not count representations, and does not distinguish finitely many exceptional small integers from the large-$n$ behaviour that the circle method can see. Its role here is therefore structural: it guarantees that Waring's problem has a finite exact answer, while the rest of the chapter asks for quantitative asymptotic information. To count representations, choose $P=n^{1/k}$ up to harmless integer parts and introduce the Weyl sum
\begin{align*}
f(\alpha;P)=\sum_{1\le x\le P} e(\alpha x^k), \qquad e(t)=e^{2\pi i t}.
\end{align*}
This motivates the following definition of the representation function whose value the integral will recover.
[definition: Waring Representation Function]
For integers $k\ge 2$ and $s\ge 1$, the Waring representation function is the map
\begin{align*}
R_{s,k}:\mathbb N\to\{0,1,2,\dots\},\qquad n\mapsto R_{s,k}(n),
\end{align*}
defined by
\begin{align*}
R_{s,k}(n)=\#\{(x_1,\dots,x_s)\in\mathbb N^s:x_1^k+\cdots+x_s^k=n\}.
\end{align*}
[/definition]
This definition fixes positive variables. The next step is to express $R_{s,k}(n)$ as the zeroth Fourier coefficient of a generating sum, because that expression is what the major/minor arc decomposition can estimate.
[quotetheorem:9074]
[citeproof:9074]
This identity is the point where exact and asymptotic questions diverge. The choice $P=\lfloor n^{1/k}\rfloor$ is forced by the positivity of the variables: any represented tuple has $x_i^k\le n$, so larger summation ranges add no new solutions and smaller ranges may lose solutions. The identity itself does not estimate the integral and does not prove positivity; it only transfers the counting problem into Fourier form. It also counts positive variables, so it is adapted to $R_{s,k}(n)$ rather than directly to the non-negative convention used in the definition of $G(k)$, though positive representations are sufficient for asymptotic Waring once $n$ is large.
A concrete failure occurs if the range is shortened. For $k=2$, $s=1$, and $n=m^2$, the representation $n=m^2$ is counted by $R_{1,2}(n)$, but the integral with $P=m-1$ omits the only contributing term and gives $0$. A different failure occurs if non-negative variables are counted with the same positive Weyl sum: for $n=0$ there is one non-negative representation by any number of variables, while the integral above gives $0$ because no positive variable can appear. These examples explain why the range and positivity convention are part of the statement rather than cosmetic choices.
## Minor Arc Estimates from Mean Values
The obstacle to an asymptotic formula is the part of the integral where $\alpha$ is not well approximated by rationals with small denominator. On those minor arcs, the Weyl sum should exhibit cancellation; the course's mean-value estimates provide a way to turn average cancellation into a pointwise-saving integral bound.
[definition: Major and Minor Arcs for Waring]
Let $k\ge 2$, $P\ge 1$, and let $Q$ be a parameter with $1\le Q\le P^k$. For $1\le q\le Q$ and $1\le a\le q$ with $\gcd(a,q)=1$, put
\begin{align*}
\mathfrak M(q,a;Q)=\left\{\alpha\in[0,1):\left|\alpha-\frac{a}{q}\right|\le \frac{Q}{qP^k}\right\}.
\end{align*}
The major arcs are $\mathfrak M(Q)=\bigcup_{q,a}\mathfrak M(q,a;Q)$ over these pairs $(q,a)$, and the minor arcs are $\mathfrak m(Q)=[0,1)\setminus\mathfrak M(Q)$.
[/definition]
The parameter $Q$ is chosen so that the major arcs are narrow enough to be almost disjoint but wide enough to contain the rational approximations responsible for the main term. The minor arc estimate must beat the scale $P^{s-k}$, so we next need an averaged estimate strong enough to supply this saving.
[quotetheorem:9075]
[citeproof:9075]
Hua's lemma is useful because a small power moment can control a larger integral once a pointwise minor arc bound is available. To compare the classical method with the modern theory, we also record the stronger mean-value input controlling the full system of powers $1,\dots,k$.
The range $1\le j\le k$ matters because each differencing step removes one layer of degree from the phase, and after $k$ steps the polynomial structure has been exhausted. Taking $j<k$ gives a weaker but sometimes cheaper estimate; taking $j>k$ is not part of Hua's differencing lemma and would require different information rather than more repetitions of the same argument. The bound is also an average estimate over the full unit interval, so by itself it gives no pointwise minor arc cancellation and no major arc main term. Its value comes from being combined with a separate pointwise estimate on $\mathfrak m(Q)$.
The hypotheses also prevent several false readings of the estimate. If $j>k$, the displayed exponent would predict further saving after the degree has already been killed; for the linear phase $k=1$, repeated differencing past the first step has no degree left to remove, and the proposed range would not reflect the actual orthogonality behaviour of $\sum_{x\le P}e(\alpha x)$. The fixed phase $x^k$ is equally important: replacing it by a constant phase gives $f(\alpha;P)=P$, so the moment is $P^{2^j}$ rather than $P^{2^j-j+\varepsilon}$. Thus the theorem is a degree-sensitive statement about a genuine $k$-th power Weyl sum, not a general inequality for arbitrary exponential sums of length $P$.
[quotetheorem:9076]
The staged theorem manifest records this estimate as a statement-only external input for these notes; it is quoted to compare the modern mean-value theory with the classical Hua-lemma route.
For comparison with Hua's lemma, we treat this estimate as a modern mean-value input: it controls the full system of equations in the powers $1,\dots,k$ and can replace repeated differencing in sharper versions of the Waring argument.
The classical route to Waring's problem combines Hua's mean value estimate with Weyl-type minor arc bounds. The following form records the output needed for $G(k)$, not the best possible numerical constants.
[quotetheorem:9077]
[citeproof:9077]
This theorem is not numerically competitive with modern refinements, but it displays the structure of the method. The hypothesis $s\ge 2^k+1$ is tied to using Hua's $2^k$-th moment together with one additional factor from a pointwise minor arc saving; with fewer variables this particular argument does not leave enough room to beat the expected main scale $P^{s-k}$. The proof sketch also hides delicate local information: positivity of the singular series is a separate arithmetic assertion, not a consequence of the minor arc bound. The conclusion is therefore a classical sufficient bound for $G(k)$, not a sharp determination of the asymptotic Waring number.
The threshold in the definition of $G(k)$ cannot be removed. For positive variables, $R_{s,k}(n)=0$ whenever $n<s$, because each summand is at least $1$; therefore an asymptotic representation statement for positive variables can only begin after a size threshold. Local arithmetic can also destroy a proposed main term unless the singular series is known to be positive. For example, modulo $8$ every square is $0$, $1$, or $4$, so an integer $n\equiv 7\pmod 8$ cannot be represented as a sum of three squares; the corresponding local density is zero, and a positive singular-series main term for three squares would be impossible. The same warning applies inside the large-variable method: the analytic minor arc estimate alone would still produce only a candidate main term, and the theorem becomes an existence result only after the congruence densities for $s\ge 2^k+1$ have been proved positive and uniformly bounded below. Waring's large-$s$ theorem works because enough variables remove such local obstructions in the eventual range, not because the obstructions are irrelevant.
[example: Minor Arcs for Cubes]
For $k=3$, put
\begin{align*}
f(\alpha;P)=\sum_{1\le x\le P}e(\alpha x^3).
\end{align*}
The circle-method identity gives
\begin{align*}
R_{s,3}(n)=\int_0^1 f(\alpha;P)^s e(-n\alpha)\,d\alpha,
\end{align*}
so the minor arc contribution is
\begin{align*}
I_{\mathfrak m}=\int_{\mathfrak m(Q)} f(\alpha;P)^s e(-n\alpha)\,d\alpha.
\end{align*}
Assume that on $\mathfrak m(Q)$ one has the Weyl-type bound $|f(\alpha;P)|\le P^{1-\delta}$ for some $\delta>0$. Since $|e(-n\alpha)|=1$, the triangle inequality gives
\begin{align*}
|I_{\mathfrak m}|\le \int_{\mathfrak m(Q)} |f(\alpha;P)|^s\,d\alpha.
\end{align*}
For $s>8$, split $|f|^s=|f|^{s-8}|f|^8$ and use the pointwise minor arc bound on the first factor:
\begin{align*}
|f(\alpha;P)|^{s-8}\le \left(P^{1-\delta}\right)^{s-8}=P^{s-8}P^{-\delta(s-8)}.
\end{align*}
Therefore
\begin{align*}
|I_{\mathfrak m}|\le P^{s-8}P^{-\delta(s-8)}\int_{\mathfrak m(Q)}|f(\alpha;P)|^8\,d\alpha.
\end{align*}
Extending the integral from $\mathfrak m(Q)$ to $[0,1]$ can only increase it, and *[Hua Mean Value Bound](/theorems/9075)* with $j=3$ gives
\begin{align*}
\int_0^1 |f(\alpha;P)|^8\,d\alpha \lesssim_{\varepsilon} P^{8-3+\varepsilon}=P^{5+\varepsilon}.
\end{align*}
Hence
\begin{align*}
|I_{\mathfrak m}|\lesssim_{\varepsilon} P^{s-8}P^{-\delta(s-8)}P^{5+\varepsilon}=P^{s-3-\delta(s-8)+\varepsilon}.
\end{align*}
The expected main scale for cubes is $P^{s-3}$. Dividing the minor arc bound by this scale gives
\begin{align*}
\frac{|I_{\mathfrak m}|}{P^{s-3}}\lesssim_{\varepsilon} P^{-\delta(s-8)+\varepsilon}.
\end{align*}
Choosing, for example, $\varepsilon=\delta(s-8)/2$ makes the exponent negative, so the minor arc contribution is $o(P^{s-3})$. This illustrates the standard mechanism: Hua's eighth moment controls eight factors of the Weyl sum, while the remaining $s-8$ factors convert pointwise minor arc cancellation into a saving over the main term.
[/example]
The example shows the standard pattern: isolate enough factors to use a mean value, and use the remaining factors to harvest pointwise cancellation. The major arc calculation now has to identify the main term and prove that it is positive.
## Major Arcs and the Hardy-Littlewood Formula
The major arcs encode the arithmetic of congruences and the real density of solutions. The guiding question is whether the local information near rationals $a/q$ assembles into a product of local densities and a real singular integral.
Near a reduced rational $a/q$, write $\alpha=a/q+\beta$. The Weyl sum separates into a complete exponential sum modulo $q$ and a continuous oscillatory integral, so we first name the finite sum that carries the congruence information.
[definition: Complete Waring Sum]
For integers $k\ge 2$ and $q\ge 1$, the complete Waring sum modulo $q$ is the map
\begin{align*}
S(q,\cdot):\mathbb Z\to\mathbb C,\qquad a\mapsto S(q,a),
\end{align*}
defined by
\begin{align*}
S(q,a)=\sum_{r=1}^q e\left(\frac{ar^k}{q}\right).
\end{align*}
[/definition]
The complete sum contains the finite congruence information for one modulus. Major arcs require these contributions to be assembled over many reduced rational centres, and the resulting arithmetic factor must remember every congruence obstruction at once. The singular series is the object that packages those complete-sum contributions across all moduli when the infinite sum converges.
[definition: Waring Singular Series]
For integers $k\ge 2$ and $s\ge 1$, the Waring singular series is the partially defined map
\begin{align*}
\mathfrak S_{s,k}:\mathbb N\to\mathbb C,\qquad n\mapsto \mathfrak S_{s,k}(n),
\end{align*}
given, for those $n$ for which the series converges, by
\begin{align*}
\mathfrak S_{s,k}(n)=\sum_{q=1}^{\infty}\sum_{a \in A(q)}q^{-s}S(q,a)^s e\left(-\frac{an}{q}\right),
\end{align*}
where $A(q)=\{a\in\{1,\dots,q\}:\gcd(a,q)=1\}$.
[/definition]
The singular series is the arithmetic factor in the answer. The other factor comes from replacing the discrete variables $x_i$ by real variables after scaling by $P$, so we also need the corresponding continuous density.
[definition: Waring Singular Integral]
The Waring singular integral is the map
\begin{align*}
\mathfrak J:\{(s,k)\in\mathbb N^2:k\ge 2,\ s>k\}\to\mathbb C,\qquad (s,k)\mapsto \mathfrak J(s,k),
\end{align*}
defined by
\begin{align*}
\mathfrak J(s,k)=\mathfrak J_{s,k}=\int_{-\infty}^{\infty}\left(\int_0^1 e(\beta t^k)\,dt\right)^s e(-\beta)\,d\beta.
\end{align*}
[/definition]
The singular integral is the real density of the hypersurface $t_1^k+\cdots+t_s^k=1$ in the positive cube. With both local factors identified, this motivates the following Hardy-Littlewood formula for the full representation function.
[quotetheorem:9078]
[citeproof:9078]
This theorem is the asymptotic form of Waring's problem. The requirement that $s$ be large is doing several jobs at once: it gives enough minor arc cancellation, makes the singular series converge in the needed sense, and ensures that the real singular integral is finite and positive. Local solubility is also essential; if the congruence $x_1^k+\cdots+x_s^k\equiv n\pmod {p^h}$ fails for some prime power, then the corresponding local density is zero and no positive main term can hold. Even when solutions exist modulo every prime power, one needs enough non-singular solutions to obtain stable positive densities and uniform lower bounds for the singular series in the asymptotic range. Thus the formula should be read as a large-$s$ local-global theorem with an error term, not as a statement that every small value of $s$ behaves regularly.
[example: Positivity of the Singular Series]
Fix a prime $p$, and write
\begin{align*}
N_p(h)=\#\{x\in(\mathbb Z/p^h\mathbb Z)^s:x_1^k+\cdots+x_s^k\equiv n\pmod {p^h}\}.
\end{align*}
Assume that for some $h_0$ there is a solution $a=(a_1,\dots,a_s)$ modulo $p^{h_0}$ with at least one derivative non-zero modulo $p$:
\begin{align*}
k a_i^{k-1}\not\equiv 0\pmod p
\end{align*}
for some index $i$. If $a$ is lifted to $a+p^{h}y$ modulo $p^{h+1}$, then the first-order expansion of $F(x)=x_1^k+\cdots+x_s^k-n$ gives
\begin{align*}
F(a+p^h y)\equiv F(a)+p^h\sum_{j=1}^s k a_j^{k-1}y_j\pmod {p^{h+1}}.
\end{align*}
Since one coefficient $k a_i^{k-1}$ is invertible modulo $p$, the linear congruence in $y_1,\dots,y_s$ has exactly $p^{s-1}$ solutions modulo $p$: choose the other $s-1$ variables freely, then solve uniquely for $y_i$. Thus each non-singular solution modulo $p^h$ lifts to $p^{s-1}$ solutions modulo $p^{h+1}$.
Iterating from level $h_0$ to level $h$ gives at least
\begin{align*}
p^{(h-h_0)(s-1)}
\end{align*}
solutions modulo $p^h$ above the original non-singular solution. Therefore
\begin{align*}
p^{h(1-s)}N_p(h)\ge p^{h(1-s)}p^{(h-h_0)(s-1)}=p^{-h_0(s-1)}.
\end{align*}
The right-hand side is positive and independent of $h$, so the local density
\begin{align*}
\sigma_p(n)=\lim_{h\to\infty}p^{h(1-s)}N_p(h)
\end{align*}
is positive whenever the limit exists and the non-singular lifting persists. In the large-$s$ range of the *[Hardy-Littlewood Asymptotic Formula for Waring's Problem](/theorems/9078)*, the singular series is the convergent Euler product
\begin{align*}
\mathfrak S_{s,k}(n)=\prod_p \sigma_p(n).
\end{align*}
Thus positive local densities at every prime give a positive singular series, provided the convergent product is bounded away from zero rather than losing all mass through infinitely many small factors.
[/example]
The role of local solubility is worth separating from the analytic estimates. Minor arcs prove that no large unstructured cancellation is missing from the integral; major arcs identify the candidate main term. Positivity of the singular series is the final arithmetic check that the candidate main term does not vanish.
[remark: Why the Asymptotic Problem Ignores Some Exact Obstructions]
The value of $g(k)$ can be forced upward by finitely many integers whose representations require many summands. The value of $G(k)$ ignores those finitely many exceptions and is governed by estimates for large $n$. This is why the circle method naturally gives bounds for $G(k)$ and asymptotic formulae for $R_{s,k}(n)$ before it gives sharp exact information about $g(k)$.
[/remark]
The chapter therefore completes the course's first full circle-method problem. Weyl sums express the representation function, minor arcs are controlled through mean values, major arcs supply the singular series and singular integral, and the resulting formula turns analytic cancellation into additive representation theorems.
Waring’s problem shows how Weyl sums, major arcs, and mean-value bounds combine into a complete additive theorem. Chapter 7 keeps the same circle-method architecture but replaces polynomial variables with primes, where the generating functions and local analysis become more delicate.
# 7. Additive problems with prime variables
This chapter brings the circle method to additive problems in which the variables are prime. Chapters 1, 5, and 6 converted representation functions into integrals of exponential sums; here the relevant generating function is no longer a polynomial Weyl sum but a weighted sum over primes. The von Mangoldt weight is used because it interacts well with Dirichlet convolution, prime distribution in arithmetic progressions, and bilinear decompositions.
The central question is whether the Fourier mass of the primes is concentrated near rational points with small denominator, as happens for many structured integer sequences. On the major arcs, this is supplied by prime number theory in arithmetic progressions. On the minor arcs, the course uses Vinogradov's method, Vaughan's identity, and Type I/Type II bilinear estimates to obtain cancellation.
## Weighted Prime Sums as Circle-Method Inputs
The circle method asks for a generating function whose Fourier coefficients encode the objects being counted. For prime variables, the unweighted indicator of the primes is arithmetically rigid and awkward under convolution, while the von Mangoldt function has the exact convolution identities needed to expose cancellation.
[definition: Weighted Prime Exponential Sum]
Let $e: \mathbb R/\mathbb Z \to \mathbb C$ be the character induced by
\begin{align*}
e(x):=e^{2\pi i x}.
\end{align*}
For $N\ge 1$, the weighted prime exponential sum is the map $S_N: \mathbb R/\mathbb Z\to \mathbb C$ defined by
\begin{align*}
S_N(\alpha):=\sum_{n\le N}\Lambda(n)e(\alpha n).
\end{align*}
[/definition]
The weight $\Lambda(n)$ gives logarithmic weight to prime powers, but the prime powers contribute a smaller error in the applications of this chapter. Thus $S_N(\alpha)$ behaves as a Fourier transform of the primes while retaining algebraic structure under convolution.
[example: Prime Powers Have Lower Total Weight]
Let $N\ge 2$. Since $\Lambda(n)=\log p$ when $n=p^k$ is a prime power and $\Lambda(n)=0$ otherwise, we have
\begin{align*}
\sum_{n\le N}\Lambda(n)=\sum_{\substack{p^k\le N; k\ge 1}}\log p.
\end{align*}
The terms with $k=1$ give $\sum_{p\le N}\log p$, so the difference between the von Mangoldt-weighted sum and the prime-weighted sum is
\begin{align*}
\sum_{n\le N}\Lambda(n)-\sum_{p\le N}\log p=\sum_{\substack{p^k\le N; k\ge 2}}\log p.
\end{align*}
If $p^k\le N$ and $k\ge 2$, then $p\le N^{1/k}$ and $2^k\le p^k\le N$, so $k\le \log_2 N$. Therefore
\begin{align*}
\sum_{\substack{p^k\le N; k\ge 2}}\log p \le \sum_{2\le k\le \log_2 N}\sum_{p\le N^{1/k}}\log p.
\end{align*}
For each such $k$, there are at most $N^{1/k}$ primes $p\le N^{1/k}$, and each satisfies $\log p\le \log N$, hence
\begin{align*}
\sum_{p\le N^{1/k}}\log p\le N^{1/k}\log N.
\end{align*}
Since $k\ge 2$ gives $N^{1/k}\le N^{1/2}$, we get
\begin{align*}
\sum_{\substack{p^k\le N; k\ge 2}}\log p \le (\log_2 N)N^{1/2}\log N.
\end{align*}
Finally $\log_2 N=(\log N)/(\log 2)$, so
\begin{align*}
\sum_{\substack{p^k\le N; k\ge 2}}\log p \ll N^{1/2}(\log N)^2.
\end{align*}
Thus the extra weight introduced by replacing primes with $\Lambda$ comes only from higher prime powers, and that contribution is lower order compared with the main scale $N$ of prime number theory.
[/example]
The example justifies using a weighted prime sum rather than the raw prime indicator. The next task is to connect this weighted sum to an additive representation function, so that estimates for $S_N(\alpha)$ have direct arithmetic consequences.
[quotetheorem:9079]
[citeproof:9079]
The hypotheses in the identity are bookkeeping hypotheses, but they are still necessary. The cut-off $n_i\le N$ in $S_N$ is harmless because a summand with $n_1+n_2+n_3=N$ and $n_i\ge 1$ automatically has each $n_i\le N$; if the same integral used a shorter cut-off, such as $S_{N/3}$, it would miss representations with one summand larger than $N/3$. The positivity condition excludes non-prime variables: allowing $n_i=0$ would add terms involving an undefined $\Lambda(0)$, while allowing negative variables would no longer match the finite Fourier polynomial $S_N$.
The identity itself proves no positivity and gives no asymptotic formula for $R_3(N)$. For instance, the formula remains true for even $N$, but the main ternary odd-prime asymptotic later has a parity obstruction modulo $2$. Its role is to transfer the problem to analysis on $\mathbb R/\mathbb Z$: the major arcs must produce a positive main term, and the minor arcs must be shown to be smaller.
[example: Ternary Goldbach as Positivity of a Weighted Count]
For an odd integer $N$, write the weighted count as
\begin{align*}
R_3(N)=\sum_{\substack{n_1+n_2+n_3=N; n_i\ge 1}}\Lambda(n_1)\Lambda(n_2)\Lambda(n_3).
\end{align*}
Each summand is nonnegative. Hence if $R_3(N)>0$, at least one triple $(n_1,n_2,n_3)$ has $\Lambda(n_1)\Lambda(n_2)\Lambda(n_3)>0$, so $\Lambda(n_i)>0$ for each $i$. By the definition of the von Mangoldt function, each such $n_i$ is a prime power.
Let $P_3(N)$ denote the part of this sum in which all three variables are prime:
\begin{align*}
P_3(N)=\sum_{\substack{p_1+p_2+p_3=N; p_i\ \mathrm{prime}}}(\log p_1)(\log p_2)(\log p_3).
\end{align*}
Then the remaining contribution is
\begin{align*}
R_3(N)-P_3(N)=\sum_{\substack{n_1+n_2+n_3=N; n_i\ge 1; \text{ at least one }n_i=p^k,\ k\ge 2}}\Lambda(n_1)\Lambda(n_2)\Lambda(n_3).
\end{align*}
Thus an asymptotic formula
\begin{align*}
R_3(N)=\frac{1}{2}\mathfrak S_3(N)N^2+o(N^2)
\end{align*}
together with $\mathfrak S_3(N)\ge c>0$ for odd $N$ gives
\begin{align*}
R_3(N)\ge \frac{c}{4}N^2
\end{align*}
for all sufficiently large odd $N$. If the prime-power contribution is $o(N^2)$, then for sufficiently large odd $N$ it is at most $cN^2/8$, and therefore
\begin{align*}
P_3(N)\ge \frac{c}{4}N^2-\frac{c}{8}N^2=\frac{c}{8}N^2>0.
\end{align*}
Since every term in $P_3(N)$ is nonnegative, this positivity gives at least one triple of primes $p_1+p_2+p_3=N$. The circle method therefore seeks the weighted asymptotic first, because positivity of the weighted count survives after the lower-order prime-power terms are removed.
[/example]
Binary Goldbach has the same formal Fourier identity, but the minor arcs are harder because there are only two copies of $S_N$ rather than three. The loss of one averaging factor is the main analytic difference.
[example: Why Binary Goldbach Is Harder]
For the binary Goldbach problem the weighted count is
\begin{align*}
R_2(N)=\int_0^1 S_N(\alpha)^2e(-N\alpha)\,d\alpha.
\end{align*}
Indeed, expanding $S_N(\alpha)^2$ gives
\begin{align*}
S_N(\alpha)^2e(-N\alpha)=\sum_{n_1\le N}\sum_{n_2\le N}\Lambda(n_1)\Lambda(n_2)e((n_1+n_2-N)\alpha).
\end{align*}
Integrating term by term over $[0,1]$, the integral of $e((n_1+n_2-N)\alpha)$ is $1$ when $n_1+n_2=N$ and is $0$ otherwise, so the integral equals
\begin{align*}
\sum_{\substack{n_1+n_2=N; n_i\ge 1}}\Lambda(n_1)\Lambda(n_2).
\end{align*}
The ternary minor-arc estimate has an extra copy of $S_N$ available:
\begin{align*}
\int_{\mathfrak m}|S_N(\alpha)|^3\,d\alpha\le \sup_{\alpha\in\mathfrak m}|S_N(\alpha)|\int_0^1|S_N(\alpha)|^2\,d\alpha.
\end{align*}
The mean-square factor is
\begin{align*}
\int_0^1|S_N(\alpha)|^2\,d\alpha=\sum_{n\le N}\Lambda(n)^2\ll N\log N,
\end{align*}
so a logarithmic pointwise saving for $\sup_{\mathfrak m}|S_N|$ makes the ternary minor-arc integral $o(N^2)$, smaller than the ternary main term of size $N^2$.
For the binary integral, the same separation gives only
\begin{align*}
\int_{\mathfrak m}|S_N(\alpha)|^2\,d\alpha\le \int_0^1|S_N(\alpha)|^2\,d\alpha\ll N\log N.
\end{align*}
The expected binary main term has size $N$, so this bound is larger than the main term by a logarithmic factor. If one extracts a supremum from one factor instead, then Cauchy-Schwarz gives
\begin{align*}
\int_{\mathfrak m}|S_N(\alpha)|^2\,d\alpha\le \sup_{\alpha\in\mathfrak m}|S_N(\alpha)|\int_0^1|S_N(\alpha)|\,d\alpha.
\end{align*}
Since $[0,1]$ has measure $1$,
\begin{align*}
\int_0^1|S_N(\alpha)|\,d\alpha\le \left(\int_0^1|S_N(\alpha)|^2\,d\alpha\right)^{1/2}\ll (N\log N)^{1/2}.
\end{align*}
Even with the pointwise bound $\sup_{\mathfrak m}|S_N(\alpha)|\ll_A N/(\log N)^A$, this gives a bound of order $N^{3/2}$ up to logarithms, still too large compared with the desired scale $N$. Thus the binary problem loses exactly the averaging factor that makes the ternary minor-arc argument close.
[/example]
## Major Arcs from Prime Distribution Estimates
The first major question is what $S_N(\alpha)$ looks like when $\alpha$ lies close to a rational number $a/q$ with small $q$. Near such a rational, the exponential phase is nearly periodic modulo $q$, so the answer depends on how primes are distributed among reduced residue classes modulo $q$.
[definition: Major Arcs for Prime Sums]
Let $Q,L \ge 1$ be parameters with $Q L \le N$. The major arcs are the union
\begin{align*}
\mathfrak M(Q,L) := \bigcup_{1 \le q \le Q}\bigcup_{\substack{1 \le a \le q;\ \gcd(a,q)=1}}\left\{\alpha \in \mathbb R/\mathbb Z : \left|\alpha-\frac{a}{q}\right| \le \frac{L}{qN}\right\}.
\end{align*}
Here $|\alpha-a/q|$ denotes the distance on $\mathbb R/\mathbb Z$ from $\alpha$ to the residue class of $a/q$, namely $\min_{k\in\mathbb Z}|\tilde\alpha-a/q-k|$ for any real representative $\tilde\alpha$ of $\alpha$.
[/definition]
The complement $\mathfrak m(Q,L)=(\mathbb R/\mathbb Z)\setminus\mathfrak M(Q,L)$ is the minor arc set. The restriction $QL\le N$ keeps the arcs at a scale where different reduced rationals are separated well enough for the major/minor decomposition to be useful. The next theorem provides the uniform prime-distribution input needed to approximate $S_N(a/q+\beta)$ on these arcs, because it counts the von Mangoldt weight in each reduced residue class modulo $q$.
[quotetheorem:9080]
The staged theorem manifest records this card as a statement-only external input for these notes; it supplies a standard prime-distribution theorem rather than a result proved in this course.
This is the Siegel-Walfisz form of the [prime number theorem](/theorems/1692) in arithmetic progressions. When it is applied later with $x\le N$, the modulus condition is checked at the scale of the summatory variable $x$, not only at the outer scale $N$. The condition $q\le (\log x)^A$ is a genuine limitation of this quoted input: it gives uniformity for logarithmic moduli, not for moduli as large as a power of $x$. The coprimality hypothesis is also necessary, since if $\gcd(a,q)>1$ then primes in the class $a\pmod q$ are forced to divide $q$ except for finitely many exceptional terms, so the main term $x/\varphi(q)$ would be false.
The quoted theorem supplies the distribution statement, but the major-arc calculation also requires summing over all reduced residues when the arcs around $a/q$ are assembled. After the prime weights are replaced by their progression averages, the remaining dependence on the target integer is the finite phase sum over reduced residue classes. Naming this sum keeps the later singular series from hiding the arithmetic contribution of each denominator $q$.
[definition: Ramanujan Sum]
For $q\in\mathbb N$, the Ramanujan sum is the map $c_q:\mathbb Z\to\mathbb C$ defined by
\begin{align*}
c_q(m):=\sum_{\substack{1\le a\le q;\ \gcd(a,q)=1}}e\left(\frac{am}{q}\right).
\end{align*}
[/definition]
Ramanujan sums appear because primes not dividing $q$ are asymptotically equidistributed among reduced residue classes modulo $q$. Near a rational point $a/q$, the prime exponential sum has two sources of oscillation: the residue class of each prime modulo $q$, and the small real displacement $\beta$ from the rational centre.
The analytic problem is that the von Mangoldt weight is not evenly distributed residue by residue unless the modulus is small enough for a [prime number theorem](/theorems/1742) in progressions. The next major-arc estimate is needed precisely to separate the reduced-residue contribution from the real oscillation while recording the range where that distribution input is valid.
[quotetheorem:9081]
[citeproof:9081]
The hypotheses mark exactly where this particular major-arc approximation is valid. If $q$ is too large, for example $q=N^{1/10}$, the quoted Siegel-Walfisz input gives no uniform asymptotic for primes in residue classes modulo $q$. If $\gcd(a,q)>1$, take $q=2$ and $a=0$: apart from the prime $2$ and its powers, primes do not lie in that residue class, so the factor $\mu(q)/\varphi(q)$ is the wrong model. If $|\beta|$ is outside the stated range, the approximation by the single real factor $\sum_{n\le N}e(\beta n)$ is no longer controlled by the same partial-summation error; one then needs a different estimate that keeps sharper track of the oscillatory integral over each progression.
The theorem also does not say that all nearby rational points are major arcs in a useful sense. It is a local formula under logarithmic modulus and narrow width assumptions, not a substitute for distribution of primes in progressions to polynomial moduli. Within its valid range, it turns the major-arc part of the ternary integral into a product of local congruence densities and one real integral. To state that product compactly, we package the finite congruence contributions into the singular series.
[definition: Ternary Goldbach Singular Series]
The ternary Goldbach singular series is the map $\mathfrak S_3:\mathbb N\to\mathbb R$ defined by the absolutely convergent series
\begin{align*}
\mathfrak S_3(N):=\sum_{q=1}^{\infty}\frac{\mu(q)^3}{\varphi(q)^3}c_q(-N).
\end{align*}
[/definition]
The series can be rewritten as an Euler product, making the congruence obstructions visible. For odd $N$, there is no local obstruction to writing $N$ as a sum of three odd primes, so the singular series is positive.
[quotetheorem:9082]
[proofunderconstruction:9082]
The oddness of $N$ is the local condition that removes the obstruction modulo $2$: three odd primes have odd sum, while an even target such as $N=2M$ cannot be represented as a sum of three odd primes and would require one summand to be $2$. The logarithmic choice of $Q$ and $L$ reflects the available Siegel-Walfisz uniformity; taking $Q=N^{1/10}$ would demand prime-distribution information in arithmetic progressions that has not been assumed. The lower bound for $\mathfrak S_3(N)$ records that no further local congruence obstruction survives for odd $N$.
This theorem is only the major-arc evaluation. It does not estimate the integral over $\mathfrak m$, does not prove the final representation theorem by itself, and says nothing useful for even $N$ in the three-odd-prime model. The rest of the proof is the minor-arc estimate, where rational approximation is deliberately poor.
## Vaughan Identity and Bilinear Decomposition
On the minor arcs, direct estimates for $S_N(\alpha)$ are too weak because $\Lambda$ is supported on primes. The key move is to replace $\Lambda$ by sums that separate variables, producing bilinear forms where Cauchy-Schwarz, completion, and Diophantine approximation can be applied.
[definition: Type I and Type II Prime Bilinear Sums]
A Type I sum has the shape
\begin{align*}
\sum_{m\le M}a_m\sum_{n\le N_m}e(\alpha mn),
\end{align*}
where one variable is short or has simple coefficients. A Type II sum has the shape
\begin{align*}
\sum_{m\sim M}a_m\sum_{n\sim K}b_n e(\alpha mn),
\end{align*}
where both variables have comparable nontrivial ranges and $MK$ is the original scale.
[/definition]
Type I sums are controlled by estimating inner geometric progressions. Type II sums require more averaging, usually through Cauchy-Schwarz and bounds for sums of phases involving differences. This is the organizational purpose of Vaughan's identity: it does not estimate primes directly, but rewrites the von Mangoldt weight $\Lambda$ into pieces whose shapes match the two analytic estimates just described.
[quotetheorem:7192]
[citeproof:7192]
The displayed identity should be read as a bookkeeping device for divisor structure. Its parameters decide how much of $\Lambda$ is treated as a short, directly controlled part and how much is left in [bilinear form](/page/Bilinear%20Form). The exact summation ranges and coefficients in the theorem are what make the decomposition balance these roles without changing the original arithmetic weight.
The identity is an algebraic decomposition, not a cancellation estimate by itself. Bad choices of the splitting parameters can make the resulting sums analytically useless: taking them too small leaves a long bilinear remainder, while taking them too large destroys the short-variable structure needed in Type I estimates. With balanced parameters, some pieces lead to Type I expressions after summing against $e(\alpha n)$, and the remaining bilinear pieces are designed for Type II estimates.
[example: Type I Estimate by Geometric Sums]
Consider
\begin{align*}
T_I=\sum_{m\le M}a_m\sum_{n\le K}e(\alpha mn)
\end{align*}
with $|a_m|\le 1$. Fix $m$ and put $x=\alpha m$. If $x\in\mathbb Z$, then $e(xn)=1$ for every $n$, so
\begin{align*}
\left|\sum_{n\le K}e(xn)\right|=K.
\end{align*}
If $x\notin\mathbb Z$, the finite geometric-sum formula gives
\begin{align*}
\sum_{n\le K}e(xn)=e(x)\frac{1-e(Kx)}{1-e(x)}.
\end{align*}
Since $|e(x)|=1$ and $|1-e(Kx)|\le 2$, while
\begin{align*}
|1-e(x)|=2|\sin(\pi x)|\ge 4\|x\|_{\mathbb R/\mathbb Z},
\end{align*}
we obtain
\begin{align*}
\left|\sum_{n\le K}e(xn)\right|\le \frac{1}{2\|x\|_{\mathbb R/\mathbb Z}}.
\end{align*}
Combining this estimate with the termwise bound
\begin{align*}
\left|\sum_{n\le K}e(xn)\right|\le \sum_{n\le K}|e(xn)|=K
\end{align*}
gives
\begin{align*}
\left|\sum_{n\le K}e(\alpha mn)\right|\le \min\left(K,\frac{1}{2\|\alpha m\|_{\mathbb R/\mathbb Z}}\right).
\end{align*}
Using $|a_m|\le 1$ and the triangle inequality,
\begin{align*}
|T_I|\le \sum_{m\le M}\min\left(K,\frac{1}{2\|\alpha m\|_{\mathbb R/\mathbb Z}}\right).
\end{align*}
Thus the estimate improves on the absolute bound $MK$ whenever the residues $\alpha m$ are usually separated from integers. For example, if $\|\alpha m\|_{\mathbb R/\mathbb Z}\ge \eta$ for every $m\le M$, then
\begin{align*}
|T_I|\le M\min\left(K,\frac{1}{2\eta}\right).
\end{align*}
Conversely, if $\|\alpha m\|_{\mathbb R/\mathbb Z}\le \eta$, then for some integer $r$,
\begin{align*}
|\alpha m-r|\le \eta.
\end{align*}
Dividing by $m$ gives
\begin{align*}
\left|\alpha-\frac{r}{m}\right|\le \frac{\eta}{m}.
\end{align*}
So many values of $m$ with $\alpha m$ very close to an integer would force $\alpha$ to have many good rational approximations with denominators at most $M$. The minor-arc condition is used precisely to rule out that concentration, so the geometric-sum bound is much smaller than $K$ for enough values of $m$ to produce cancellation relative to the size $MK$.
[/example]
The Type II case is subtler because neither inner sum is short enough to handle termwise. Cauchy-Schwarz transfers the problem to correlations of additive characters.
[example: Bounding a Type II Bilinear Sum after Cauchy-Schwarz]
Let
\begin{align*}T_{II}=\sum_{m\sim M}a_m\sum_{n\sim K}b_n e(\alpha mn),\end{align*}
where $|a_m|,|b_n|\le 1$ and $MK\asymp N$. Put
\begin{align*}A_m=\sum_{n\sim K}b_n e(\alpha mn).\end{align*}
Cauchy-Schwarz in the $m$-variable gives
\begin{align*}|T_{II}|^2=\left|\sum_{m\sim M}a_mA_m\right|^2\le \left(\sum_{m\sim M}|a_m|^2\right)\left(\sum_{m\sim M}|A_m|^2\right).\end{align*}
Since $|a_m|\le 1$ and the dyadic range $m\sim M$ contains $O(M)$ integers,
\begin{align*}|T_{II}|^2\ll M\sum_{m\sim M}\left|\sum_{n\sim K}b_n e(\alpha mn)\right|^2.\end{align*}
For each fixed $m$, expanding the square gives
\begin{align*}\left|\sum_{n\sim K}b_n e(\alpha mn)\right|^2=\sum_{n_1\sim K}\sum_{n_2\sim K}b_{n_1}\overline{b_{n_2}}e(\alpha m(n_1-n_2)).\end{align*}
Substituting this expansion and interchanging the finite sums,
\begin{align*}|T_{II}|^2\ll M\sum_{n_1\sim K}\sum_{n_2\sim K}b_{n_1}\overline{b_{n_2}}\sum_{m\sim M}e(\alpha m(n_1-n_2)).\end{align*}
Taking absolute values and using $|b_{n_1}\overline{b_{n_2}}|\le 1$ yields
\begin{align*}|T_{II}|^2\ll M\sum_{n_1\sim K}\sum_{n_2\sim K}\left|\sum_{m\sim M}e(\alpha m(n_1-n_2))\right|.\end{align*}
On the diagonal $n_1=n_2$, the inner phase is $e(0)=1$, so
\begin{align*}\left|\sum_{m\sim M}e(\alpha m(n_1-n_2))\right|\ll M.\end{align*}
There are $O(K)$ diagonal pairs, hence the diagonal contribution to $|T_{II}|^2$ is
\begin{align*}\ll M\cdot K\cdot M=M^2K.\end{align*}
For the off-diagonal terms, write $h=n_1-n_2\ne 0$. Then $|h|\ll K$, and the inner sum is a finite geometric progression in $m$. If $\alpha h\in\mathbb Z$, its absolute value is $O(M)$. If $\alpha h\notin\mathbb Z$, the finite geometric-sum bound gives
\begin{align*}\left|\sum_{m\sim M}e(\alpha hm)\right|\ll \min\left(M,\frac{1}{\|\alpha h\|_{\mathbb R/\mathbb Z}}\right).\end{align*}
For each fixed nonzero $h$ with $|h|\ll K$, there are $O(K)$ pairs $(n_1,n_2)$ in the dyadic range satisfying $n_1-n_2=h$. Therefore
\begin{align*}|T_{II}|^2\ll M^2K+MK\sum_{0<|h|\ll K}\min\left(M,\frac{1}{\|\alpha h\|_{\mathbb R/\mathbb Z}}\right).\end{align*}
Thus Cauchy-Schwarz has converted the bilinear sum into diagonal terms of size $M^2K$ and off-diagonal geometric sums controlled by the distances $\|\alpha h\|_{\mathbb R/\mathbb Z}$. On minor arcs, the point is that many nonzero differences $h$ cannot make $\alpha h$ extremely close to an integer, so the off-diagonal sum gains cancellation compared with the full size suggested by counting all pairs without oscillation.
[/example]
These examples describe the two analytic mechanisms behind Vinogradov's estimate for prime exponential sums. Vaughan's identity is the device that ensures every term has one of these two forms.
## Vinogradov's Minor-Arc Estimate
The minor-arc problem is to prove that $S_N(\alpha)$ is smaller than its possible size $N$ when $\alpha$ is not close to a rational with small denominator. The available saving does not need to be dramatic for the ternary problem; it only needs to beat the major-arc main term after integration.
[quotetheorem:9083]
[citeproof:9083]
The estimate is stated in a logarithmic form because that is the version needed for Vinogradov's theorem. The constants are ordered: after the desired saving exponent $A$ is fixed, the major-arc parameter $B$ is chosen large enough to make the Type I and Type II losses harmless. Reversing this order would be false as a uniform statement, since a fixed small $B$ may leave points with good rational approximations outside the declared major arcs.
The exclusion from $\mathfrak M((\log N)^B,(\log N)^B)$ is essential. At $\alpha=0$, the sum $S_N(0)=\sum_{n\le N}\Lambda(n)$ is asymptotic to $N$, so no logarithmic saving is possible; more generally, points close to rationals with denominator at most $(\log N)^B$ are exactly the structured frequencies assigned to the major arcs. The theorem is therefore not a global bound for prime exponential sums, and it is not invariant under changing the major-arc definition without rechecking the Diophantine estimates. It is a cancellation statement for this specific minor-arc setup.
[remark: Analytic Input Isolated]
For the proof of the three-primes theorem, the major-arc asymptotic and the minor-arc estimate are the only analytic inputs about primes needed after the Fourier identity is established. This separation is the structural advantage of the circle method: local distribution of primes supplies the main term, while cancellation of prime exponential sums supplies the error term.
[/remark]
The remark identifies the two analytic estimates needed for the final theorem. To use the pointwise minor-arc estimate inside the ternary integral, we combine it with the mean-square identity for $S_N$, leaving two factors to be handled by Parseval.
[quotetheorem:9084]
[citeproof:9084]
The hypotheses are matched to the proof. The set $\mathfrak m$ must be the complement of the same logarithmic major arcs used in Vinogradov's estimate; if one enlarged the minor arcs by removing only neighbourhoods of $0$ and not of other rationals $a/q$ with small $q$, then points near $1/3$ or $1/5$ could still carry major-arc structure and the pointwise bound would not apply. The choice of $B(A)$ is also necessary because Parseval costs a factor $\sum_{n\le N}\Lambda(n)^2\ll N\log N$, so the pointwise estimate must be requested with enough spare logarithmic saving.
This theorem does not identify the minor-arc integral asymptotically; it only proves that it is smaller than the major term by an arbitrary logarithmic factor. It also explains why three variables are accessible: two powers of $S_N$ can be absorbed by Parseval, while the remaining factor is controlled pointwise on the minor arcs. The same calculation shows the boundary of the method: if only two prime variables are present, Parseval accounts for both factors and leaves no extra small supremum factor to force a logarithmic saving in the final integral.
The final theorem now has two independent ingredients with compatible sizes. The major arcs produce a positive term of order $N^2$ for odd $N$, while the minor arcs are smaller by an arbitrary logarithmic factor. What remains is to translate positivity of the weighted von Mangoldt count into positivity of an actual prime representation, which requires checking that prime powers have not supplied all the weight.
## Vinogradov's Three-Primes Theorem
The final question is how the major-arc and minor-arc estimates combine to produce an additive theorem about primes. The answer is that the weighted count has a positive asymptotic for every sufficiently large odd $N$.
[quotetheorem:9085]
[citeproof:9085]
The phrase "sufficiently large" is unavoidable for this proof because the argument proves an asymptotic inequality, not an exact finite verification; it supplies no effective numerical threshold in the form stated here. Oddness is also not a technical artefact: apart from representations involving the prime $2$, a sum of three odd primes is odd, and the singular series encodes this local parity condition. The theorem says nothing about small odd integers below the threshold, and it does not imply binary Goldbach, where the minor-arc problem has one fewer averaging factor.
The conclusion is an existence theorem, not a uniform formula for the number of representations of every $N$. The weighted asymptotic in the proof gives the route to positivity for large odd $N$, while separate estimates remove the lower-order contribution of prime powers. The theorem is the prototype for additive prime problems in the circle method: encode the representation count by a prime exponential sum, compute the major arcs using prime distribution in arithmetic progressions, and control the minor arcs through bilinear decompositions of $\Lambda$. Chapter 8 shifts from weighted sequences such as $\Lambda$ to subsets of finite groups, where Fourier coefficients measure additive structure rather than prime distribution.
[example: Shape of the Final Asymptotic]
For large odd $N$, the Fourier identity for $R_3(N)$, the major-arc main term, and the minor-arc bound give
\begin{align*}
R_3(N)=\frac{1}{2}\mathfrak S_3(N)N^2+O_C\left(\frac{N^2}{(\log N)^C}\right).
\end{align*}
Since $C$ may be chosen fixed and positive, the error satisfies $N^2/(\log N)^C=o(N^2)$, so this is the weighted asymptotic
\begin{align*}
R_3(N)=\frac{1}{2}\mathfrak S_3(N)N^2+o(N^2).
\end{align*}
The factor $N^2/2$ is the continuous-scale version of the elementary count of positive integer triples. Indeed, for each $n_1$ with $1\le n_1\le N-2$, the variable $n_2$ can take the values $1,\dots,N-n_1-1$, and then $n_3=N-n_1-n_2$ is forced. Hence
\begin{align*}
\#\{(n_1,n_2,n_3)\in\mathbb N^3:n_1+n_2+n_3=N\}=\sum_{n_1=1}^{N-2}(N-n_1-1).
\end{align*}
Putting $r=N-n_1-1$ turns this into
\begin{align*}
\sum_{n_1=1}^{N-2}(N-n_1-1)=\sum_{r=1}^{N-2}r.
\end{align*}
Using the finite arithmetic-series formula,
\begin{align*}
\sum_{r=1}^{N-2}r=\frac{(N-2)(N-1)}{2}.
\end{align*}
Expanding the numerator gives
\begin{align*}
\frac{(N-2)(N-1)}{2}=\frac{N^2-3N+2}{2}.
\end{align*}
Thus the leading real density is
\begin{align*}
\frac{N^2-3N+2}{2}=\frac{N^2}{2}+o(N^2).
\end{align*}
The singular series $\mathfrak S_3(N)$ modifies this real density by the local congruence densities. For odd $N$, the major-arc theorem gives a constant $c>0$ with $\mathfrak S_3(N)\ge c$, so the main term is at least $cN^2/2$. For sufficiently large $N$, the $o(N^2)$ error has absolute value at most $cN^2/4$, and therefore
\begin{align*}
R_3(N)\ge \frac{c}{2}N^2-\frac{c}{4}N^2=\frac{c}{4}N^2>0.
\end{align*}
This positivity is the bridge from the asymptotic formula to existence: once the lower-order prime-power contribution is removed, a positive weighted count forces at least one representation of $N$ as a sum of three primes.
[/example]
Chapter 7 established that prime-variable representation problems can still be attacked by Fourier analysis and exponential-sum estimates. Chapter 8 returns to finite cyclic groups and asks a different but related question: how Fourier mass, additive energy, and density control the additive structure of a set.
# 8. Additive energy, density, and Fourier control
This chapter returns to the finite Fourier identities of Chapter 1 and turns them into quantitative tools for additive problems in finite cyclic groups. The guiding question is how much information about a set is contained in the size and location of its Fourier coefficients. Additive energy measures the number of additive coincidences in a set, while density increments convert a single strong frequency into a smaller region where the set is denser. Three-term arithmetic progressions provide the model problem because their count is both a Fourier expression and a test of additive structure.
## Additive Energy on Finite Cyclic Groups
The first problem is to measure whether a subset of a finite group behaves like a random set or contains unusually many additive coincidences. A naive count of pairs or single elements cannot see additive structure: an interval and a random set may have the same density, but the interval has many more repeated sums. The ambient group for this chapter is $G = \mathbb{Z}/N\mathbb{Z}$, written additively, and sums over $G$ always mean sums over residue classes modulo $N$.
For $r,x \in G$, write
\begin{align*}
e_N(rx) = \exp(2\pi i rx/N).
\end{align*}
The Fourier transform records correlation with additive characters, so it is the correct language for translating additive equations into products.
[definition: Fourier Transform on a Finite Cyclic Group]
Let $f:G\to \mathbb C$. Its Fourier transform is the function $\hat f:G\to \mathbb C$ defined by
\begin{align*}
\hat f(r)=\sum_{x\in G} f(x)e_N(-rx).
\end{align*}
[/definition]
The definition supplies the coefficients, but the first obstruction is possible loss of information: a list of correlations with characters is useful only if it can reconstruct the original function. Without reconstruction, later Fourier expansions would give auxiliary statistics rather than exact counts.
Finite character orthogonality resolves this obstruction by providing an inversion formula with the correct normalising factor. This is the point at which Fourier coefficients become a genuine coordinate system for functions on $G$.
[quotetheorem:4595]
[citeproof:4595]
The factor $1/N$ and the chosen normalisation are essential: without them, the orthogonality sum would recover $Nf(x)$ rather than $f(x)$. Finiteness is also being used in a strong way, since all sums can be interchanged without convergence hypotheses. Inversion says that no information has been lost, but it does not say whether the information is concentrated in a useful place; a function may be exactly recoverable while all of its non-zero Fourier coefficients are too small to isolate by hand. Density arguments therefore require quantitative accounting of the total size of a function. The next theorem gives that accounting by comparing physical-side $L^2$ mass with Fourier-side square mass.
[quotetheorem:434]
[citeproof:434]
Parseval is the basic accounting principle behind the large-coefficient arguments later in the chapter. Its hypotheses are deliberately modest: no smoothness or decay is needed, because $G$ is finite and the characters form an orthogonal basis. The limitation is that Parseval controls only a second moment, so it cannot by itself distinguish one coefficient of size $\eta N$ from many coefficients whose squares add to the same total. To detect repeated additive coincidences, we need a statistic whose Fourier expression involves a higher moment.
The next definition packages the number of ways in which two sums from $A$ can be equal. It is large for highly structured sets, such as intervals or subgroups, and close to the random-set scale when additive coincidences are scarce. This is needed because a naive density comparison misses repeated-sum structure: two sets with the same cardinality can have very different numbers of solutions to $a_1+a_2=a_3+a_4$.
[definition: Additive Energy]
Let $A\subset G$. The additive energy functional is the map $E: \mathcal{P}(G)\to \mathbb{N}$ defined by
\begin{align*}
E(A)=|\{(a_1,a_2,a_3,a_4)\in A^4:a_1+a_2=a_3+a_4\}|.
\end{align*}
[/definition]
The definition is combinatorial, but it hides which frequencies are responsible for repeated sums. Directly counting quadruples can show that energy is large, yet it does not reveal whether that largeness comes from one strong bias or from many smaller biases.
To connect energy with Fourier uniformity, we need an identity that rewrites this quadruple count as a fourth moment of the Fourier coefficients. That identity is what makes additive energy a usable diagnostic for structure.
[quotetheorem:9086]
[citeproof:9086]
The theorem uses the exact equality condition $a_1+a_2=a_3+a_4$ in the group $G$; changing the equation or working in an interval without controlling wraparound would change the representation function. The fourth power is also a limitation as well as a strength: it detects global concentration of Fourier mass, but it does not identify which coefficient should be used for a density increment. Thus energy is a structural diagnostic, while later arguments still need a way to turn one large coefficient into a smaller denser problem.
The identity separates additive structure by moments: a set with Fourier mass concentrated on a few frequencies has larger energy than a set whose non-zero mass is spread thinly. The following example illustrates this distinction at the level of scale rather than exact constants.
[example: Random And Structured Energies]
Let $A\subset G$ be chosen by independent Bernoulli variables $X_x=\mathbf{1}_A(x)$ with $\mathbb E X_x=\alpha$. Then
\begin{align*}
\mathbb E[E(A)]=\sum_{a_1+a_2=a_3+a_4}\mathbb E[X_{a_1}X_{a_2}X_{a_3}X_{a_4}].
\end{align*}
There are exactly $N^3$ additive quadruples, since $a_1,a_2,a_3$ determine $a_4=a_1+a_2-a_3$. Among them, all but $O(N^2)$ have four distinct entries: if two of $a_1,a_2,a_3,a_4$ are forced equal, then at most two independent group elements remain to be chosen. For a quadruple with four distinct entries,
\begin{align*}
\mathbb E[X_{a_1}X_{a_2}X_{a_3}X_{a_4}]=\mathbb E X_{a_1}\mathbb E X_{a_2}\mathbb E X_{a_3}\mathbb E X_{a_4}=\alpha^4,
\end{align*}
by independence. The repeated-entry quadruples contribute at most $O(\alpha^2N^2)$ when $\alpha N\ge 1$, because their number is $O(N^2)$ and their expectations are bounded by a constant multiple of $\alpha^2$ except for the $N$ quadruples $(x,x,x,x)$, whose total contribution $\alpha N$ is then at most $\alpha^2N^2$. Hence
\begin{align*}
\mathbb E[E(A)] = \alpha^4N^3+O(\alpha^2N^2).
\end{align*}
Now let $A=\{0,1,\dots,L-1\}\subset \mathbb Z/N\mathbb Z$ with $L=\alpha N$ and assume $2L<N$ to avoid wraparound in the sums below. For $0\le t\le L-1$, the equation $a+b=t$ has the $t+1$ solutions
\begin{align*}
(a,b)=(0,t),(1,t-1),\dots,(t,0).
\end{align*}
For $L-1\le t\le 2L-2$, it has the $2L-1-t$ solutions with $0\le a,b\le L-1$. Therefore
\begin{align*}
E(A)=\sum_t R_A(t)^2\ge \sum_{t=0}^{L-1}(t+1)^2.
\end{align*}
Since $\sum_{m=1}^{L}m^2=L(L+1)(2L+1)/6$, this gives
\begin{align*}
E(A)\ge \frac{L(L+1)(2L+1)}{6}\ge \frac{L^3}{3}.
\end{align*}
Also $R_A(t)\le L$ for every $t$ and at most $2L-1$ values of $t$ occur, so
\begin{align*}
E(A)=\sum_t R_A(t)^2\le (2L-1)L^2\le 2L^3.
\end{align*}
Thus the interval has $E(A)$ comparable to $L^3=\alpha^3N^3$, larger than the random main scale $\alpha^4N^3$ when $\alpha$ is small. Fourier-theoretically, this says that random sets distribute their non-zero Fourier mass across many frequencies, while intervals place substantial mass on low frequencies.
[/example]
This comparison motivates the central dichotomy of the chapter. Either the Fourier mass is spread out, in which case additive counts resemble their random main terms, or some coefficient is large, in which case the set has detectable structure.
## Large Fourier Coefficients and Density Increments
The next problem is to turn a large Fourier coefficient into a usable structural statement. A coefficient $\widehat{\mathbf{1}_A}(r)$ measures bias of $A$ against the character $x\mapsto e_N(rx)$, but density increment arguments need a concrete region, such as a progression, where the set has higher density.
For a set $A\subset G$ with density $\alpha$, the balanced function removes the zero-frequency main term. A direct Fourier transform of $\mathbf{1}_A$ is misleading at frequency $0$, since $\widehat{\mathbf{1}_A}(0)=|A|$ is large for every dense set and says nothing about additive irregularity. The balanced function is the object whose non-zero Fourier coefficients measure deviations from uniformity.
[definition: Balanced Function]
Let $A\subset G$ have density $\alpha=|A|/N$. The balanced function associated to $A$ is the map $f_A:G\to \mathbb C$ defined by
\begin{align*}
f_A(x)=\mathbf{1}_A(x)-\alpha.
\end{align*}
[/definition]
The zero-frequency coefficient of $f_A$ vanishes, and Parseval gives
\begin{align*}
\sum_{r\in G}|\widehat{f_A}(r)|^2=N|A|(1-\alpha).
\end{align*}
Thus a large non-zero coefficient is a genuine irregularity rather than a restatement of the density of $A$.
[example: A Set With One Large Fourier Coefficient]
Let $N>12$, and write each residue $x\in G$ by the representative $m\in\{0,1,\dots,N-1\}$. The condition $\operatorname{Re}(e_N(x))\ge 0$ is the condition
\begin{align*}
\cos(2\pi m/N)\ge 0.
\end{align*}
This holds on an arc of angular length $\pi$, so the number of allowed residues differs from $N/2$ by at most $2$. Hence $|A|=N/2+O(1)$ and the density $\alpha=|A|/N$ satisfies $\alpha=1/2+O(1/N)$.
We compute the coefficient at frequency $1$. Since $1\ne 0$ in $G$,
\begin{align*}
\sum_{x\in G}e_N(-x)=0.
\end{align*}
Therefore
\begin{align*}
\widehat{f_A}(1)=\sum_{x\in G}(\mathbf{1}_A(x)-\alpha)e_N(-x)=\sum_{x\in A}e_N(-x)-\alpha\sum_{x\in G}e_N(-x)=\sum_{x\in A}e_N(-x).
\end{align*}
Taking real parts gives
\begin{align*}
|\widehat{f_A}(1)|\ge \operatorname{Re}\widehat{f_A}(1)=\sum_{x\in A}\cos(2\pi x/N).
\end{align*}
Now let
\begin{align*}
B=\{x\in G:\text{some representative }m\text{ of }x\text{ satisfies }-N/6\le m\le N/6\}.
\end{align*}
For every $x\in B$, we have $|2\pi m/N|\le \pi/3$, so
\begin{align*}
\cos(2\pi m/N)\ge \cos(\pi/3)=1/2.
\end{align*}
Thus $B\subset A$, and $|B|\ge N/3-2$. Since every term $\cos(2\pi x/N)$ with $x\in A$ is non-negative,
\begin{align*}
\sum_{x\in A}\cos(2\pi x/N)\ge \sum_{x\in B}\cos(2\pi x/N)\ge \frac{|B|}{2}\ge \frac{N}{6}-1.
\end{align*}
For $N>12$, this is at least $N/12$, so
\begin{align*}
|\widehat{f_A}(1)|\ge N/12.
\end{align*}
Thus the balanced function has a Fourier coefficient of size comparable to $N$ at the single frequency $r=1$, coming from a geometric bias toward one half of the unit circle rather than from subgroup structure.
[/example]
The example shows what a large coefficient looks like, but a large global bias is not yet a smaller set on which the density has increased. The obstruction is geometric: the values of a non-zero character live around the unit circle, while an additive-combinatorial iteration needs an arithmetic progression inside the original group.
The next question is therefore not merely whether the Fourier coefficient is large, but whether its phase information can be straightened into an interval-like region of the group. For a prime [cyclic group](/page/Cyclic%20Group), multiplication by a non-zero frequency is invertible, so an arc of favorable phases can be pulled back to a long arithmetic progression. A density-increment lemma is the formal tool that makes this conversion quantitative: it turns one large non-zero coefficient into a structured progression on which $A$ has genuinely larger relative density.
[quotetheorem:9087]
[citeproof:9087]
This lemma is the engine of Roth-type arguments. The prime-modulus hypothesis is not cosmetic: in a composite group $\mathbb Z/N\mathbb Z$, a non-zero frequency $r$ with $\gcd(r,N)>1$ does not define an invertible dilation, so the pullback of a phase interval can split into several residue classes rather than a single long proper progression. The non-zero-frequency hypothesis is also necessary because the zero coefficient only records $\sum_x f_A(x)=0$ and cannot point to a denser region. The largeness threshold matters as well: a coefficient of size $o(N)$ may express a small oscillation spread across the whole group and need not produce a density increase of fixed size on any long progression. The conclusion is weaker than finding a subgroup, since $\mathbb Z/N\mathbb Z$ has few subgroups when $N$ is prime, but an arithmetic progression is enough because the original problem can be rescaled onto that progression.
[remark: Normalising the Ambient Modulus]
In applications to subsets of $\{1,\dots,N\}$, one often embeds the interval into $\mathbb Z/M\mathbb Z$ with $M$ comparable to $N$ and with no wraparound for the additive configurations being counted. Taking $M$ prime is convenient when dilation by a non-zero frequency must be invertible. The loss in density and length is absorbed into absolute constants.
[/remark]
The contrapositive viewpoint is just as important. If a set has no useful density increment, then all its non-zero Fourier coefficients are small, and Fourier expansions of additive counts should be dominated by their zero-frequency terms.
## Three-Term Arithmetic Progressions
The model additive problem is to count triples $(x,x+d,x+2d)$ lying in a set. The equation is linear, translation-invariant, and simple enough that the full Fourier expansion can be written in one line, yet strong enough to detect additive structure. Counting only pairs would miss the first genuinely additive obstruction: a dense set can have many pairs while still avoiding non-degenerate three-term progressions.
[definition: Three-Term Arithmetic Progression Count]
The three-term progression-counting functional is the trilinear map $\Lambda_3: \mathbb C^G\times \mathbb C^G\times \mathbb C^G\to \mathbb C$ defined by
\begin{align*}
\Lambda_3(f_0,f_1,f_2)=\frac{1}{N^2}\sum_{x,d\in G} f_0(x)f_1(x+d)f_2(x+2d).
\end{align*}
For a set $A\subset G$, write $\Lambda_3(A)=\Lambda_3(\mathbf{1}_A,\mathbf{1}_A,\mathbf{1}_A)$.
[/definition]
The normalisation is chosen so that a random set of density $\alpha$ has expected value about $\alpha^3$. To prove this rigorously under Fourier-uniformity hypotheses, we need the exact expansion of this average in frequency space.
[quotetheorem:9088]
[citeproof:9088]
The expansion isolates the zero-frequency contribution and leaves a non-zero-frequency error term. The identity itself is valid for every cyclic modulus; no oddness assumption is needed to derive the displayed frequency relation. Oddness enters later, when a uniformity estimate treats the factor $\widehat{f_1}(-2r)$ as a genuinely non-zero frequency whenever $r\ne 0$. Thus the expansion becomes powerful only when paired with a uniformity hypothesis or with a contrapositive argument forcing a large coefficient. The following example shows how a uniform bound for the balanced Fourier coefficients controls that error.
[example: Counting Progressions by Fourier Expansion]
Let $A\subset G$ have density $\alpha$, and write $\mathbf{1}_A=\alpha+f_A$. By *Fourier Expansion for Three-Term Progressions*,
\begin{align*}
\Lambda_3(A)=\frac{1}{N^3}\sum_{r\in G}\widehat{\mathbf{1}_A}(r)^2\widehat{\mathbf{1}_A}(-2r).
\end{align*}
The term $r=0$ is
\begin{align*}
\frac{1}{N^3}\widehat{\mathbf{1}_A}(0)^2\widehat{\mathbf{1}_A}(0)=\frac{1}{N^3}|A|^3=\frac{1}{N^3}(\alpha N)^3=\alpha^3.
\end{align*}
Thus
\begin{align*}
\Lambda_3(A)-\alpha^3=\frac{1}{N^3}\sum_{r\ne 0}\widehat{\mathbf{1}_A}(r)^2\widehat{\mathbf{1}_A}(-2r).
\end{align*}
For $r\ne 0$, the constant function $\alpha$ has Fourier coefficient
\begin{align*}
\sum_{x\in G}\alpha e_N(-rx)=\alpha\sum_{x\in G}e_N(-rx)=0
\end{align*}
by character orthogonality, so $\widehat{\mathbf{1}_A}(r)=\widehat{f_A}(r)$. If $N$ is odd, then $r\ne 0$ also implies $-2r\ne 0$ in $G$, and hence $\widehat{\mathbf{1}_A}(-2r)=\widehat{f_A}(-2r)$. Therefore
\begin{align*}
\Lambda_3(A)-\alpha^3=\frac{1}{N^3}\sum_{r\ne 0}\widehat{f_A}(r)^2\widehat{f_A}(-2r).
\end{align*}
Taking absolute values and using $\sup_{s\ne 0}|\widehat{f_A}(s)|\le \eta N$ gives
\begin{align*}
|\Lambda_3(A)-\alpha^3|\le \frac{\eta N}{N^3}\sum_{r\ne 0}|\widehat{f_A}(r)|^2.
\end{align*}
By *Parseval Identity on a Finite Cyclic Group*,
\begin{align*}
\sum_{r\in G}|\widehat{f_A}(r)|^2=N\sum_{x\in G}|f_A(x)|^2.
\end{align*}
Since $f_A(x)=1-\alpha$ on $A$ and $f_A(x)=-\alpha$ off $A$,
\begin{align*}
\sum_{x\in G}|f_A(x)|^2=|A|(1-\alpha)^2+(N-|A|)\alpha^2=\alpha N(1-\alpha)^2+(1-\alpha)N\alpha^2=\alpha(1-\alpha)N.
\end{align*}
Hence
\begin{align*}
\sum_{r\in G}|\widehat{f_A}(r)|^2=\alpha(1-\alpha)N^2\le \alpha N^2.
\end{align*}
Substituting this into the error bound yields
\begin{align*}
|\Lambda_3(A)-\alpha^3|\le \frac{\eta N}{N^3}\alpha N^2=\eta\alpha.
\end{align*}
Thus, under the stated uniform Fourier bound, the three-term progression count differs from the random main value $\alpha^3$ by at most $\eta\alpha$.
[/example]
The previous example shows that small non-zero Fourier coefficients force the three-term progression count to stay near its random main term. The obstruction to a progression-free dense set is therefore quantitative: if the count is missing, the error terms in the Fourier expansion must have cancelled a main term of size about $\alpha^3$.
The large-spectrum step identifies where that cancellation can live. It shows that a progression-free set must have a non-zero Fourier coefficient large enough to feed the density-increment mechanism.
The theorem below is the contrapositive form needed for Roth's argument. Rather than estimating a known large coefficient, it starts from the absence of non-degenerate three-term progressions and forces a large Fourier mode to appear, turning a counting failure into usable structure.
[quotetheorem:9089]
[citeproof:9089]
The lower bound on $N$ removes the degenerate progressions $d=0$ from the main comparison; without it, the diagonal triples could be too large relative to $\alpha^3$. The oddness condition ensures that the Fourier expansion treats $r$ and $-2r$ without losing information through the doubling map. The theorem is a trigger rather than a conclusion: it does not locate a progression directly, but it produces exactly the large coefficient needed by the density increment lemma.
The large-spectrum theorem gives a trigger, and the density-increment lemma moves the problem to a denser subprogression. The remaining obstruction is termination: an iteration could in principle continue while the ambient progression shrinks, so one must balance the density gain against the loss of length.
Roth's argument closes this loop by showing that, above a certain density threshold, the iteration cannot avoid producing a non-degenerate three-term progression. The next theorem packages the whole iteration into a final density bound: it identifies a scale at which any denser set must either already contain a progression or have exhausted the room needed for further density increments.
[quotetheorem:9090]
[citeproof:9090]
This proves Roth's qualitative conclusion with a weak quantitative bound. The hypothesis that the progression be non-degenerate matters because every element gives a degenerate triple with $d=0$, so those triples cannot certify additive structure. The proof also shows where the quantitative weakness enters: each step keeps only one Fourier coefficient and pays a length factor polynomial in the current density. The remaining question is how to retain more spectral information at each stage, which points toward the stronger large-spectrum and Bohr-set methods that later courses develop.
[remark: Why This Roth Bound Is Weak]
The argument uses only a single large Fourier coefficient at each stage, so it wastes information contained in the whole large spectrum. Stronger proofs combine many coefficients through Bohr sets, almost-periodicity, or energy increment methods. The value of the weak argument is that it isolates the fundamental Fourier dichotomy: uniformity counts progressions, non-uniformity raises density somewhere smaller.
[/remark]
## Energy, Uniformity, and the Course Perspective
The final question is how this chapter fits into the surrounding analytic number theory. Earlier chapters used exponential sums to control error terms in counting problems; here the same philosophy is applied to subsets rather than weighted sequences. The Fourier transform measures uniformity, Parseval turns that measurement into exact identities, and density increments convert failure of uniformity into new structure.
[example: Energy Versus Progression Uniformity]
Let $A\subset G$ have density $\alpha$, and write
\begin{align*}
M=\sup_{r\ne 0}|\widehat{\mathbf{1}_A}(r)|.
\end{align*}
By *Fourier Formula for Additive Energy*,
\begin{align*}
E(A)=\frac{1}{N}\sum_{r\in G}|\widehat{\mathbf{1}_A}(r)|^4.
\end{align*}
The zero-frequency term is
\begin{align*}
\frac{1}{N}|\widehat{\mathbf{1}_A}(0)|^4=\frac{1}{N}|A|^4=\frac{1}{N}(\alpha N)^4=\alpha^4N^3.
\end{align*}
Thus the remaining energy is the fourth-moment sum
\begin{align*}
E(A)-\alpha^4N^3=\frac{1}{N}\sum_{r\ne 0}|\widehat{\mathbf{1}_A}(r)|^4.
\end{align*}
This quantity can be sizeable because many non-zero frequencies may each contribute a medium-sized fourth power; it is not determined only by the largest coefficient $M$.
Progression counting uses a different Fourier expression. By *Fourier Expansion for Three-Term Progressions*,
\begin{align*}
\Lambda_3(A)=\frac{1}{N^3}\sum_{r\in G}\widehat{\mathbf{1}_A}(r)^2\widehat{\mathbf{1}_A}(-2r).
\end{align*}
The zero-frequency term is
\begin{align*}
\frac{1}{N^3}\widehat{\mathbf{1}_A}(0)^2\widehat{\mathbf{1}_A}(0)=\frac{1}{N^3}(\alpha N)^3=\alpha^3.
\end{align*}
When $N$ is odd, the non-zero-frequency error has the form
\begin{align*}
\Lambda_3(A)-\alpha^3=\frac{1}{N^3}\sum_{r\ne 0}\widehat{\mathbf{1}_A}(r)^2\widehat{\mathbf{1}_A}(-2r),
\end{align*}
so it is controlled by a mixed cubic expression rather than by the fourth moment alone.
Conversely, suppose some non-zero frequency satisfies $|\widehat{\mathbf{1}_A}(r_0)|\ge \eta N$. Its single contribution to the energy sum is
\begin{align*}
\frac{1}{N}|\widehat{\mathbf{1}_A}(r_0)|^4\ge \frac{1}{N}(\eta N)^4=\eta^4N^3.
\end{align*}
For the balanced function this same kind of bound, $|\widehat{f_A}(r_0)|\ge \eta N$, is exactly the input to *[Fourier Density Increment Lemma](/theorems/9087)*, which produces a progression on which the density of $A$ increases by a constant multiple of $\eta$. Thus energy is a global measure of additive structure, while a large coefficient is a local Fourier witness that can drive an iteration.
[/example]
The main lesson is a reusable three-step method. First, express an additive count through Fourier expansion. Second, separate the zero-frequency main term from the non-zero-frequency error. Third, if the error is too large, locate a large Fourier coefficient and pass to a denser subproblem. This method parallels the circle-method decomposition from Chapters 5--7: zero frequencies provide main terms, non-zero frequencies provide errors, and large errors reveal structure. Chapter 9 applies the same Fourier-control principle to distribution modulo one.
Chapter 8 showed that nontrivial Fourier information detects additive structure and can be used to pass to denser subproblems. Chapter 9 carries that same philosophy into distribution modulo one, where cancellation in nonzero Fourier modes becomes a test for equidistribution and Diophantine behavior.
# 9. Fractional parts and Diophantine applications
We now turn from estimating exponential sums for their own sake to using them as measuring devices for distribution modulo $1$. The bridge is the principle that a sequence is well distributed when all its non-zero Fourier modes have cancellation. Throughout this chapter, $e(t):=e^{2\pi i t}$. The prerequisites are Weyl differencing from Chapter 2, polynomial exponential-sum bounds from Chapters 3--4, and the elementary Fourier approximation of interval indicators used in Chapter 8. This chapter makes the bridge quantitative, first through Weyl's criterion and the Erdos-Turan inequality, then through Diophantine applications to small fractional parts of polynomial sequences.
## Measuring Distribution Modulo One
How can cancellation in sums of $e(hx_n)$ say that the points $x_n$ spread evenly around the unit circle? The right language is discrepancy: it compares the empirical count in intervals with the interval length, so it records the largest visible failure of uniform distribution.
[definition: Fractional Part And Distance To The Nearest Integer]
The fractional-part map is
\begin{align*}
\{\cdot\}:\mathbb R&\to [0,1), & x&\mapsto x-\lfloor x\rfloor.
\end{align*}
The distance-to-the-nearest-integer map is
\begin{align*}
\|\cdot\|:\mathbb R&\to [0,1/2], & x&\mapsto \min_{m\in\mathbb Z}|x-m|.
\end{align*}
[/definition]
The notation $\|x\|$ is special to this chapter and always means distance to $\mathbb Z$, not a vector norm. It is suited to Diophantine inequalities because conditions such as $\|\alpha n^k\|<\delta$ ask whether the phase $\alpha n^k$ lies near an integer.
[example: Fractional Parts Of A Rational Progression]
Let $\alpha=a/q$ with $\gcd(a,q)=1$ and $q\ge 2$. For each $n$, write $an=qm_n+r_n$ with $m_n\in\mathbb Z$ and $0\le r_n<q$. Then
\begin{align*}\{\alpha n\}=\left\{\frac{an}{q}\right\}=\left\{m_n+\frac{r_n}{q}\right\}=\frac{r_n}{q}.\end{align*}
Hence every fractional part belongs to $\{0,1/q,\dots,(q-1)/q\}$.
Write $N=Lq+r$ with $L=\lfloor N/q\rfloor$ and $0\le r<q$. Among the integers $1,\dots,N$, each residue class modulo $q$ occurs either $L$ or $L+1$ times: in the first $Lq$ terms each class occurs exactly $L$ times, and the remaining $r$ terms add one occurrence to $r$ classes. Since $\gcd(a,q)=1$, multiplication by $a$ permutes the residue classes modulo $q$, so the residues $an\bmod q$ also occur either $\lfloor N/q\rfloor$ or $\lceil N/q\rceil$ times.
Thus the progression is periodic modulo $1$ and evenly distributed among the $q$ lattice points. It is not uniformly distributed in $[0,1)$: the interval $[1/(2q),1/q)$ has length $1/(2q)$ but contains none of the points $0,1/q,\dots,(q-1)/q$, so its empirical count is always $0$.
[/example]
The rational example shows why irrationality must enter the distribution theory: periodicity can distribute mass evenly among finitely many points while still missing most intervals. To compare arbitrary finite samples with the uniform measure on $[0,1)$, we introduce a single quantity that records the worst interval-counting error.
[definition: Discrepancy]
For $N\ge 1$, the discrepancy functional
\begin{align*}
D_N:\mathbb R^N\to [0,1]
\end{align*}
is defined by
\begin{align*}
D_N(x_1,\dots,x_N):=\sup_{0\le a<b\le 1}\left|\frac{1}{N}\#\{1\le n\le N:\{x_n\}\in [a,b)\}-(b-a)\right|.
\end{align*}
[/definition]
Small discrepancy means that every interval receives approximately its expected proportion of points. The supremum over intervals is important: controlling only a few fixed intervals would miss clustering on shorter scales. The limiting notion used in applications asks for this interval-counting error to disappear as the number of sampled points grows.
[definition: Uniform Distribution Modulo One]
A sequence $(x_n)_{n\ge 1}$ in $\mathbb R$ is uniformly distributed modulo $1$ if, for every interval $[a,b)\subset [0,1)$,
\begin{align*}
\lim_{N\to\infty}\frac{1}{N}\#\{1\le n\le N:\{x_n\}\in [a,b)\}=b-a.
\end{align*}
[/definition]
Uniform distribution is stated interval by interval, while discrepancy gives one finite-$N$ number. We need to know that these two viewpoints agree, because later estimates will bound discrepancy but the conceptual goal is equidistribution.
[quotetheorem:9091]
[citeproof:9091]
This characterisation is useful conceptually, but it is not yet adapted to exponential sums. The hypothesis that all intervals are tested cannot be reduced to checking only a few preferred intervals. For instance, testing only the two dyadic halves would not detect the sequence that alternates between the small intervals $[0,1/10)$ and $[1/2,3/5)$ with equal frequencies: each half receives the right total mass, but the first half is still concentrated near $0$ and misses most of its subintervals. The theorem also does not give a rate, so it cannot distinguish a sequence with discrepancy $N^{-1/10}$ from one with discrepancy $1/\log N$ at the level of mere convergence. The next section replaces interval indicators by their Fourier approximations, which is the step that makes quantitative estimates possible.
## Weyl's Criterion With Error Terms
Weyl's criterion is often stated for an arbitrary sequence $(x_n)$ in $\mathbb R/\mathbb Z$: equidistribution is equivalent to cancellation of every nonzero Fourier average $N^{-1}\sum_{n\le N}e(hx_n)$. In this course, the criterion is used through the polynomial case, where $x_n=f(n)$ and the needed Fourier averages are precisely Weyl sums. The theorem quoted below is therefore the polynomial specialization of the criterion, with quantitative error terms rather than only a qualitative limit statement.
Which Fourier modes must be controlled to force equidistribution? Since interval indicators have Fourier series with non-zero frequencies, the relevant test objects are the exponential sums
\begin{align*}
S_h(N):=\sum_{n\le N} e(hx_n),\qquad h\in\mathbb Z.
\end{align*}
The zero mode only records the number of terms, while the non-zero modes detect bias.
[quotetheorem:3434]
[citeproof:3434]
Weyl's criterion is qualitative: it tells us which sums matter, but it does not say how much cancellation is enough at a fixed value of $N$. The condition for every non-zero $h$ is essential. For example, if the frequency $q$ is not tested, then the sequence $x_n=n/q$ has $e(qx_n)=1$ for every $n$, so it is trapped on a rational lattice and fails to be uniformly distributed, even though other selected frequencies may show cancellation over complete periods. The theorem also gives no usable finite error term, since convergence for each fixed $h$ may be very slow and not uniform in $h$. Additive applications need finite bounds, so we truncate the Fourier expansion and keep track of the truncation loss and the contribution of each retained frequency.
[quotetheorem:9092]
[citeproof:9092]
The theorem is the working quantitative form of Weyl's criterion used throughout this chapter. It says that to prove distribution at scale $N$, it suffices to estimate finitely many sums, with low frequencies weighted most strongly. Each hypothesis has a visible role. The zero mode is excluded because $N^{-1}\sum_{n\le N}e(0x_n)=1$ for every sequence and carries no distributional information. The finite cutoff cannot simply be removed at finite $N$: taking $H$ too large adds many poorly controlled frequencies, while taking $H$ too small leaves a visible smoothing error of size $1/H$. For instance, a sequence concentrated in an interval of length much smaller than $1/H$ can evade the retained low-frequency tests more easily than a sequence with the same concentration tested at frequencies comparable with the reciprocal length. Thus the theorem is useful only together with an explicit choice of $H$ matched to the available exponential-sum estimates.
[example: Discrepancy Of An Irrational Linear Sequence]
Let $x_n=\alpha n$ with $\alpha\in\mathbb R\setminus\mathbb Q$. Fix a non-zero integer $h$ and put $z=e(h\alpha)$. Since $h\alpha\notin\mathbb Z$, we have $z\ne 1$, and therefore
\begin{align*}
(1-z)\sum_{n=1}^{N}z^n=\sum_{n=1}^{N}z^n-\sum_{n=1}^{N}z^{n+1}=z-z^{N+1}.
\end{align*}
Dividing by $1-z$ gives
\begin{align*}
\sum_{n\le N}e(h\alpha n)=\sum_{n=1}^{N}z^n=\frac{z-z^{N+1}}{1-z}=e(h\alpha)\frac{1-e(h\alpha N)}{1-e(h\alpha)}.
\end{align*}
Since $|e(t)|=1$ for every real $t$, we have $|e(h\alpha)|=1$ and $|1-e(h\alpha N)|\le |1|+|e(h\alpha N)|=2$, so
\begin{align*}
\left|\sum_{n\le N}e(h\alpha n)\right|\le \frac{2}{|1-e(h\alpha)|}.
\end{align*}
Now fix $H\ge 1$. By the *[Quantitative Weyl Criterion](/theorems/9092)*,
\begin{align*}
D_N(\alpha,\dots,\alpha N)\le C\left(\frac{1}{H}+\sum_{1\le |h|\le H}\frac{1}{|h|}\left|\frac{1}{N}\sum_{n\le N}e(h\alpha n)\right|\right).
\end{align*}
Using the bound just proved for each of the finitely many non-zero integers with $|h|\le H$,
\begin{align*}
D_N(\alpha,\dots,\alpha N)\le C\left(\frac{1}{H}+\frac{1}{N}\sum_{1\le |h|\le H}\frac{2}{|h|\,|1-e(h\alpha)|}\right).
\end{align*}
For fixed $H$, the finite sum is a constant depending only on $\alpha$ and $H$, so
\begin{align*}
\limsup_{N\to\infty}D_N(\alpha,\dots,\alpha N)\le \frac{C}{H}.
\end{align*}
Letting $H\to\infty$ gives $D_N(\alpha,\dots,\alpha N)\to 0$, and the *Discrepancy Characterisation Of Uniform Distribution* then shows that $(\alpha n)_{n\ge 1}$ is uniformly distributed modulo $1$.
The proof also shows the finite-$N$ obstruction: if some tested frequency satisfies $\|h\alpha\|$ very small, then $e(h\alpha)$ is close to $1$, the denominator $|1-e(h\alpha)|$ is small, and the geometric sum may remain large over ranges where the phases $h\alpha n$ have not yet rotated much around the circle.
[/example]
The linear case is special because the exponential sum is geometric. Polynomial sequences require the Weyl sum estimates developed earlier in the course.
## The Erdos-Turan Inequality
Can the preceding quantitative criterion be made sharp enough to use directly in counting problems? The Erdos-Turan inequality is the standard form: it is a discrepancy bound with an explicit finite frequency cutoff and the optimal harmonic weights for this method.
[quotetheorem:9093]
[citeproof:9093]
This inequality is a clean example of the course philosophy: an arithmetic counting question is converted into finitely many exponential sums. The cutoff $H$ is a genuine part of the statement, not a technical decoration, because interval indicators have infinitely many Fourier modes and a finite sample cannot control all of them at once. The weights $1/h$ reflect the Fourier coefficients of intervals, so low-frequency bias is more dangerous than high-frequency oscillation. If all exponential sums were replaced by the bound $N$, the inequality would only give
\begin{align*}
D_N(x_1,\dots,x_N)\le \frac{3}{H+1}+3\sum_{h=1}^{H}\frac{1}{h},
\end{align*}
which is larger than a constant for moderate $H$ and gives no meaningful discrepancy information. Taking $H=1$ is also too coarse for sequences whose first Fourier mode is small but whose mass is concentrated at a finer scale. The next example shows how a cancellation-producing Weyl-sum estimate feeds directly into discrepancy.
[example: Quadratic Polynomial Discrepancy]
Let $x_n=\alpha n^2$, and suppose that for every $1\le h\le H$ there are coprime integers $a_h,q_h$ with
\begin{align*}\left|h\alpha-\frac{a_h}{q_h}\right|\le \frac{1}{q_h^2}.\end{align*}
For this fixed $h$, the phase $h x_n$ is $h\alpha n^2$, so the corresponding exponential sum is
\begin{align*}\sum_{n\le N}e(hx_n)=\sum_{n\le N}e(h\alpha n^2).\end{align*}
By the *Quadratic Weyl Inequality*, there is an absolute constant $A>0$ such that
\begin{align*}\left|\sum_{n\le N}e(h\alpha n^2)\right|\le A\left(Nq_h^{-1/2}+N^{1/2}q_h^{1/2}+q_h^{1/2}\right).\end{align*}
Applying the *[Erdos-Turan Discrepancy Inequality](/theorems/9093)* to the points $\alpha,\alpha 2^2,\dots,\alpha N^2$ gives
\begin{align*}D_N(\alpha,\alpha 2^2,\dots,\alpha N^2)\le \frac{3}{H+1}+\frac{3}{N}\sum_{h=1}^{H}\frac{1}{h}\left|\sum_{n\le N}e(h\alpha n^2)\right|.\end{align*}
Substituting the displayed Weyl bound into each summand gives
\begin{align*}D_N(\alpha,\alpha 2^2,\dots,\alpha N^2)\le \frac{3}{H+1}+\frac{3A}{N}\sum_{h=1}^{H}\frac{Nq_h^{-1/2}+N^{1/2}q_h^{1/2}+q_h^{1/2}}{h}.\end{align*}
Dividing the three terms in the numerator by $N$ yields
\begin{align*}D_N(\alpha,\alpha 2^2,\dots,\alpha N^2)\le \frac{3}{H+1}+3A\sum_{h=1}^{H}\frac{q_h^{-1/2}+N^{-1/2}q_h^{1/2}+N^{-1}q_h^{1/2}}{h}.\end{align*}
Thus the discrepancy of the quadratic fractional parts $\{\alpha n^2\}_{n\le N}$ is controlled explicitly by the cutoff error $3/(H+1)$ and by the rational approximation denominators $q_h$ for the frequencies $h\alpha$.
[/example]
The example also explains why minor arc information is useful. If the frequencies $h\alpha$ avoid rationals with small denominators, then the quadratic sums are small, and discrepancy follows.
[remark: Choice Of The Cutoff]
The parameter $H$ balances Fourier truncation against exponential-sum quality. Large $H$ reduces the approximation error $1/H$, but it also introduces more sums and may include frequencies for which $h\alpha$ has poorer Diophantine behaviour. In applications, $H$ is chosen after inserting the available Weyl or Vinogradov estimate.
[/remark]
The inequality is also flexible enough for weighted counts, but the unweighted form is the version needed for the Diophantine applications below. From this point on, the arithmetic input will be estimates for polynomial phases rather than abstract sequences. The guiding question is whether the oscillation forced by an irrational coefficient is strong enough to make the fractional parts behave like independent samples from the unit interval. That question naturally leads from general discrepancy bounds to Weyl's equidistribution theorem for polynomial sequences.
## Polynomial Sequences With Irrational Leading Coefficient
When does a polynomial sequence $P(n)$ become equidistributed modulo $1$? Lower-degree rational terms can create periodic perturbations, but an irrational coefficient in the highest degree forces oscillation after enough differencing.
[quotetheorem:3434]
[citeproof:3434]
The irrational nonconstant coefficient hypothesis is necessary. If all nonconstant coefficients are rational, then $P(n)$ is periodic modulo $1$ after passing to a common denominator, so the sequence can visit only finitely many fractional parts and cannot be uniformly distributed in $[0,1)$. The theorem is also qualitative: it proves cancellation for each fixed frequency but does not give a finite discrepancy rate. For this chapter the main quantitative case is an irrational leading coefficient, since the polynomial Weyl inequalities already proved read the answer from rational approximations to that coefficient.
[quotetheorem:9094]
[citeproof:9094]
The formula is less important than its shape: discrepancy is small when the leading coefficient avoids good rational approximations with denominator too small or too large relative to $N^k$. The statement depends on the explicitly chosen approximations $a_h/q_h$; a poor choice can weaken the displayed bound even when better approximations exist. A concrete obstruction is obtained by taking a Liouville number $\alpha$ with infinitely many approximations $|q_j\alpha-a_j|\le q_j^{-j}$ and choosing $N_j$ much smaller than a fixed power of $q_j$. Along such ranges the phase $\alpha n^k$ is very close to $(a_j/q_j)n^k$ for $n\le N_j$, so low-frequency sums can behave almost like rational polynomial sums rather than genuinely minor-arc sums. This is the quantitative form behind many finite equidistribution arguments, where the main work is to choose the approximations and the cutoff $H$ coherently.
[example: Cubic Sequence With Diophantine Leading Coefficient]
[claim]Under the Diophantine hypothesis $\|r\alpha\|\ge cr^{-\tau}$, the cubic sequence $P(n)=\alpha n^3$ has discrepancy $D_N(P(1),\dots,P(N))\ll N^{-\kappa}$ for some $\kappa>0$ after choosing the cutoff $H=N^\eta$ with $\eta>0$ sufficiently small.[/claim]
[proof]Fix $0<\eta<3/(2\tau)$ and put $H=\lfloor N^\eta\rfloor$. For each $1\le h\le H$, apply *Dirichlet's approximation theorem* with $Q=\lfloor N^{3/2}\rfloor$ to $h\alpha$. Thus there are coprime integers $a_h,q_h$ with $1\le q_h\le Q$ and
\begin{align*}|h\alpha-a_h/q_h|\le 1/(q_hQ).\end{align*}
Since $q_h\le Q$, we have $1/(q_hQ)\le 1/q_h^2$, so these approximations are admissible for *Quantitative Polynomial Equidistribution*.
The same approximation gives
\begin{align*}\|q_hh\alpha\|\le |q_hh\alpha-a_h|=q_h|h\alpha-a_h/q_h|\le 1/Q.\end{align*}
The Diophantine hypothesis applied with $r=q_hh$ gives
\begin{align*}c(q_hh)^{-\tau}\le \|q_hh\alpha\|.\end{align*}
Combining the two inequalities,
\begin{align*}c(q_hh)^{-\tau}\le 1/Q.\end{align*}
Multiplying by $Q(q_hh)^\tau/c$ and then taking $\tau$-th roots gives
\begin{align*}q_h\ge c^{1/\tau}Q^{1/\tau}h^{-1}.\end{align*}
Therefore
\begin{align*}q_h^{-1}\le c^{-1/\tau}hQ^{-1/\tau}.\end{align*}
Also $q_h\le Q$, so
\begin{align*}q_hN^{-3}\le QN^{-3}\le N^{-3/2}\end{align*}
for all sufficiently large $N$.
For cubic phases, *Quantitative Polynomial Equidistribution* gives, for every $\varepsilon>0$,
\begin{align*}D_N(P(1),\dots,P(N))\le \frac{3}{H+1}+C_\varepsilon N^\varepsilon\sum_{h=1}^{H}\frac{1}{h}(q_h^{-1}+N^{-1}+q_hN^{-3})^{1/4}.\end{align*}
Using $Q=\lfloor N^{3/2}\rfloor$ and $h\le H\le N^\eta$, the three terms inside the fourth root satisfy
\begin{align*}q_h^{-1}+N^{-1}+q_hN^{-3}\ll hN^{-3/(2\tau)}+N^{-1}+N^{-3/2}\ll N^{\eta-3/(2\tau)}+N^{-1}.\end{align*}
Choose $\sigma>0$ with
\begin{align*}\sigma<\min\{1,3/(2\tau)-\eta\}.\end{align*}
Then the preceding bound is $\ll N^{-\sigma}$, and hence
\begin{align*}(q_h^{-1}+N^{-1}+q_hN^{-3})^{1/4}\ll N^{-\sigma/4}.\end{align*}
Substituting this into the discrepancy estimate gives
\begin{align*}D_N(P(1),\dots,P(N))\ll N^{-\eta}+N^{\varepsilon-\sigma/4}\sum_{h=1}^{H}\frac{1}{h}.\end{align*}
Since $\sum_{h=1}^{H}h^{-1}\le 1+\log H\le 1+\eta\log N$, choose $\varepsilon<\sigma/8$ and absorb the logarithm into a smaller power of $N$. Then
\begin{align*}D_N(P(1),\dots,P(N))\ll N^{-\kappa}\end{align*}
for some $\kappa>0$ depending only on $c,\tau,\eta$, and the Weyl-inequality constant.[/proof]
Thus the Diophantine lower bound prevents the rational approximations used by the cubic Weyl bound from having denominators too small, and Erdos-Turan turns the resulting exponential-sum saving into a power-saving discrepancy estimate.
[/example]
The exact exponent is usually secondary in additive applications. What matters is that a power saving in exponential sums becomes a power saving in discrepancy after a controlled frequency summation.
## Small Fractional Parts And Counting Inequalities
Distribution modulo $1$ becomes a Diophantine tool when the interval is a small neighbourhood of $0$. We want to count solutions of
\begin{align*}
\|P(n)\|<\delta,
\end{align*}
where $1\le n\le N$ and $0<\delta\le 1/2$.
[quotetheorem:9095]
[citeproof:9095]
This result is deliberately elementary: all the analytic work is hidden in the discrepancy estimate. The factor $2$ in front of $ND_N$ is not an artefact of the proof with the present definition of discrepancy, because the near-integer window is represented by two intervals rather than a single interval in $[0,1)$. The restriction $\delta\le 1/2$ prevents these intervals from overlapping and keeps their total length equal to $2\delta$. The endpoint convention is part of the statement: discrepancy controls half-open intervals, while a strict inequality may exclude a boundary point with arbitrarily large multiplicity.
[quotetheorem:9096]
[citeproof:9096]
The inequality is the form most often used in Diophantine applications: if the exponential sums are small, then the expected number $2\delta N$ is the right scale for upper bounds. Its usefulness depends on making both error terms smaller than the main term, so choosing $H$ is tied to the size of $\delta$ as well as to the available cancellation. The result is also only as strong as the unweighted Erdos-Turan input; without a special symmetric-interval inequality or circular discrepancy, the factor $2$ from the two arcs near $0$ remains. The next example records the standard way this estimate is combined with a uniform Weyl or Vinogradov bound over the retained frequencies.
[example: Bounding Solutions To A Monomial Inequality]
Let $x_n=\alpha n^k$, where $k\ge 2$ and $0<\delta\le 1/2$. Assume that there are numbers $H\ge 1$ and $\rho>0$ such that, for every integer $h$ with $1\le h\le H$,
\begin{align*}\left|\sum_{n\le N}e(h\alpha n^k)\right|\le N^{1-\rho}.\end{align*}
We apply *[Exponential Sum Bound For Small Fractional Parts](/theorems/9096)* to the points $x_1,\dots,x_N$. Since $\|x_n\|=\|\alpha n^k\|$, it gives
\begin{align*}\#\{1\le n\le N:\|\alpha n^k\|<\delta\}\le 2\delta N+\frac{6N}{H+1}+6\sum_{h=1}^{H}\frac{1}{h}\left|\sum_{n\le N}e(h\alpha n^k)\right|.\end{align*}
For each summand, the assumed Weyl-type bound gives
\begin{align*}\frac{1}{h}\left|\sum_{n\le N}e(h\alpha n^k)\right|\le \frac{N^{1-\rho}}{h}.\end{align*}
Summing this inequality over $1\le h\le H$ yields
\begin{align*}\sum_{h=1}^{H}\frac{1}{h}\left|\sum_{n\le N}e(h\alpha n^k)\right|\le N^{1-\rho}\sum_{h=1}^{H}\frac{1}{h}.\end{align*}
Substituting into the small-fractional-parts bound gives
\begin{align*}\#\{1\le n\le N:\|\alpha n^k\|<\delta\}\le 2\delta N+\frac{6N}{H+1}+6N^{1-\rho}\sum_{h=1}^{H}\frac{1}{h}.\end{align*}
If $H=\lfloor N^\eta\rfloor$ for some fixed $\eta>0$, then $H+1\ge N^\eta$, so
\begin{align*}\frac{6N}{H+1}\le 6N^{1-\eta}.\end{align*}
Also,
\begin{align*}\sum_{h=1}^{H}\frac{1}{h}\le 1+\log H\le 1+\eta\log N.\end{align*}
Thus the error terms are bounded by $6N^{1-\eta}+6N^{1-\rho}(1+\eta\log N)$, so exponential-sum cancellation up to frequency $H$ turns directly into a quantitative upper bound for the number of solutions to $\|\alpha n^k\|<\delta$.
[/example]
The example gives an upper bound for how often a polynomial phase can lie close to an integer once exponential-sum cancellation is known. The complementary question is whether such a near-integer value must occur at least once. A discrepancy lower bound answers this after shrinking the target radius slightly, so that the half-open window it controls lies inside the strict inequality being sought.
[quotetheorem:9097]
[citeproof:9097]
This final theorem is the conceptual endpoint of the chapter. Exponential sums control discrepancy; discrepancy controls the number of near-integer values; and near-integer values are the Diophantine information needed in additive problems. The auxiliary radius $\eta<\delta$ keeps the endpoint convention from deciding the strict inequality. At the boundary where $2\eta N$ is comparable with the discrepancy error, the method may fail even though a solution happens to exist for accidental arithmetic reasons. Thus the theorem should be read as a sufficient criterion, not as a characterisation of when small fractional parts occur.
[remark: Link With Vinogradov Methods]
In Vinogradov-style arguments, the sums $\sum_{n\le N}e(\alpha n^k)$ are estimated differently on major and minor arcs. Major arcs describe structured rational approximations, while minor arcs give cancellation. The results of this chapter explain how those estimates also imply distribution statements for polynomial fractional parts, beyond their role in counting additive representations.
[/remark]
## Beyond and Connections
The circle method viewpoint developed here connects this note to several Androma topics. The Fourier transform on $\mathbb Z/q\mathbb Z$ is the finite-group model behind the later major-arc expansions, while Parseval-type square-mass identities explain why $L^2$ control can be useful even when pointwise cancellation remains difficult. The Gauss sum and complete-sum estimates point toward algebraic exponential sums, character sums, and the use of orthogonality in finite fields.
The major-arc analysis also connects to Dirichlet characters, arithmetic progressions, and prime number theorems in residue classes: these are the mechanisms that turn local congruence information into main terms. The minor-arc estimates connect to Vinogradov's method, bilinear decompositions, and bounds for exponential sums over primes. Together, these topics form the bridge from finite Fourier analysis to additive prime problems such as Goldbach-type questions and Waring-type problems.
Several internal links are especially natural after this course. For the finite Fourier material, see [Fourier Inversion Formula for Finite Abelian Groups](/theorems/4595) and [Orthogonality Relations for Characters of a Finite Abelian Group](/theorems/4594). For distribution modulo one and approximation, see [Weyl Equidistribution Theorem](/theorems/3434), [Weyl Equidistribution for Irrational Rotations](/theorems/3443), and [Dirichlet Approximation Theorem](/theorems/5516). For prime and additive applications, see [Basic Properties of Dirichlet Characters](/theorems/4365), [Prime Number Theorem for Arithmetic Progressions](/theorems/4383), [Bombieri-Vinogradov Theorem](/theorems/7189), [Sieve Upper Bound for Goldbach Representations](/theorems/7196), [Sieve Lower Bound for Goldbach with an Almost Prime](/theorems/7197), [Dirichlet's theorem on primes in arithmetic progressions](/theorems/1625), and [Analytic Number Theory II: Sieve Methods](/page/Analytic%20Number%20Theory%20II%3A%20Sieve%20Methods).
## References
- H. Davenport, *Multiplicative Number Theory*, 3rd ed., Springer, 2000.
- G. H. Hardy and E. M. Wright, *An Introduction to the Theory of Numbers*, 6th ed., Oxford University Press, 2008.
- H. Iwaniec and E. Kowalski, *Analytic Number Theory*, American Mathematical Society, 2004.
- H. L. Montgomery and R. C. Vaughan, *Multiplicative Number Theory I: Classical Theory*, Cambridge University Press, 2007.
- R. C. Vaughan, *The Hardy-Littlewood Method*, 2nd ed., Cambridge University Press, 1997.
Analytic Number Theory III: Exponential Sums and Additive Problems
Also known as: ["Analytic Number Theory III","Exponential Sums and Additive Problems","Exponential Sums","Additive Number Theory","Waring problem and exponential sums","Circle method and additive problems"]