This course develops the main sieve methods of analytic number theory, with an emphasis on how one estimates the size of sets of integers after removing those divisible by many small primes. The central problem is to turn arithmetic information about congruences and divisibility into upper and lower bounds for primes, almost primes, and other sparse sets. Along the way, the course builds a working vocabulary for sifting problems, inclusion-exclusion, and the balance between combinatorial structure and analytic estimates.
The first chapters set up the basic framework by formulating sifting problems and comparing naive inclusion-exclusion with Brun’s pure sieve and related combinatorial ideas. From there, the course moves to Selberg’s upper bound sieve and the linear sieve, which refine the method and make it effective for almost-prime results. The large sieve then introduces a different but complementary analytic tool, leading to results on distribution in arithmetic progressions and culminating in the [Bombieri-Vinogradov theorem](/theorems/7189). The final chapters apply these methods to classical problems such as twin primes, Goldbach-type questions, and almost-prime representations, showing how sieve theory provides both a flexible method and a deep set of consequences in prime number theory.
# Introduction
Sieve methods begin with a simple obstruction: primes are rare, but exact primality is too rigid to detect directly by inclusion-exclusion. The central move of the course is to replace the question "is $n$ prime?" by the more flexible question "has $n$ avoided all small prime divisors?" This chapter fixes the language for that move, explains the scale of the main terms and errors, and gives the first identities that later chapters will sharpen.
The course sits between elementary number theory and analytic estimates for arithmetic progressions. We use Dirichlet convolution, the Mobius function, asymptotic notation, and distribution results for primes as inputs; the new material is the systematic design of weights that count sifted integers from above or below.
## The Problem Sieve Methods Address
How can we count elements of a finite arithmetic set after removing all elements divisible by small primes? If $A$ is a finite set of integers and $P(z)$ is the product of primes below $z$, the rough count is governed by divisibility conditions modulo many primes at once. The difficulty is that exact inclusion-exclusion has too many terms once $z$ is large.
[definition: Sifting Product]
The sifting product is the function
\begin{align*}
P : [2,\infty) \to \mathbb N, \qquad z \mapsto P(z) := \prod_{p < z} p,
\end{align*}
where the product is over primes $p$.
[/definition]
The product $P(z)$ records the primes by which we intend to sift. To turn this product into a counting problem, we need notation for the elements of $A$ that survive all divisibility tests by primes below $z$.
[definition: Sifting Function]
Let $A \subset \mathbb Z$ be finite. The sifting function attached to $A$ is the function
\begin{align*}
S(A,P,\cdot) : [2,\infty) \to \mathbb N \cup \{0\}, \qquad z \mapsto S(A, P, z) := |\{a \in A : \gcd(a, P(z)) = 1\}|.
\end{align*}
[/definition]
This definition includes the classical example $A = \{1,2,\dots,x\}$, where $S(A,P,z)$ counts integers up to $x$ with no prime factor below $z$. It also includes arithmetic sequences such as $A = \{n(n+2) : n \le x\}$, which appear when studying twin-prime-type questions.
[example: Rough Integers Up To x]
Let $A=\{1,2,\dots,x\}$ with $z\le x$. By the definition of $S(A,P,z)$, an integer $n\le x$ is counted exactly when $\gcd(n,P(z))=1$, which means that no prime $p<z$ divides $n$; equivalently, every prime factor of $n$ is at least $z$.
For a fixed prime $p<z$, the multiples of $p$ in $\{1,\dots,x\}$ are
\begin{align*}
p,2p,3p,\dots,\left\lfloor \frac{x}{p}\right\rfloor p,
\end{align*}
so their number is $\lfloor x/p\rfloor$ and their proportion is $\lfloor x/p\rfloor/x$. Since
\begin{align*}
\frac{x}{p}-1 < \left\lfloor \frac{x}{p}\right\rfloor \le \frac{x}{p},
\end{align*}
we get
\begin{align*}
\frac{1}{p}-\frac{1}{x} < \frac{1}{x}\left\lfloor \frac{x}{p}\right\rfloor \le \frac{1}{p}.
\end{align*}
Thus the proportion divisible by $p$ is $1/p+O(1/x)$, and the proportion not divisible by $p$ is $1-1/p+O(1/x)$. The independence heuristic discards the small endpoint errors and multiplies these local survival proportions over the primes $p<z$, giving
\begin{align*}
S(A,P,z) \approx x\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
By *Mertens' theorem*,
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right) \sim \frac{e^{-\gamma}}{\log z},
\end{align*}
so the predicted main scale is
\begin{align*}
x\prod_{p<z}\left(1-\frac{1}{p}\right) \sim \frac{e^{-\gamma}x}{\log z}.
\end{align*}
This is the basic local-product prediction: sieving by primes below $z$ should leave about a constant multiple of $x/\log z$ integers.
[/example]
This heuristic explains why a sieve should produce a product of local factors. It does not explain how to control the accumulated error from many residue classes; that is the main technical issue behind the course.
## Inclusion-Exclusion as the Starting Point
What happens if we try to count the sifted set exactly? The condition $\gcd(a,P(z))=1$ can be written using the Mobius function, so the first formula is an exact identity rather than an estimate.
[quotetheorem:7160]
[citeproof:7160]
The identity reduces sifting to the task of estimating the divisibility counts $|A_d|$, but it does not by itself give a usable estimate. The hypothesis $d \mid P(z)$ is doing real work: it restricts the sum to squarefree moduli built from the primes being sieved, so the Mobius coefficient is exactly the inclusion-exclusion sign for the chosen local conditions. If one summed over unrelated divisors, for instance over moduli involving a prime $q \ge z$, the divisibility condition $q \mid a$ would impose a test that is not part of the event $\gcd(a,P(z))=1$, and the resulting expression would no longer be the sifted-set indicator. Thus the theorem is exact bookkeeping for a fixed set of forbidden primes, not an estimate for how often the corresponding residue classes occur.
To turn the identity into an asymptotic statement, each divisibility count must be split into a predicted density and an error. The obstruction is that inclusion-exclusion asks for $|A_d|$ simultaneously for many squarefree $d$, so a single estimate for $|A|$ or for divisibility by one prime is not enough. The page therefore needs a bookkeeping device that records the expected local density for every relevant squarefree modulus and separates it from the part that must later be bounded.
[definition: Local Density Model]
Let $A \subset \mathbb Z$ be finite and let $X \in \mathbb R_{+}$. A local density model for $A$ consists of a multiplicative function
\begin{align*}
g : \{d \in \mathbb N : d \text{ squarefree}\} \to \mathbb R
\end{align*}
and a remainder function
\begin{align*}
r : \{d \in \mathbb N : d \text{ squarefree}\} \to \mathbb R, \qquad d \mapsto r_d,
\end{align*}
such that
\begin{align*}
|A_d| = g(d)X + r_d
\end{align*}
for every squarefree $d$.
[/definition]
The function $g(d)$ describes the expected proportion of elements divisible by $d$, while $r_d$ measures the discrepancy. Sieve theory is powerful when the remainders can be controlled on average over a suitable range of $d$.
[example: Integers Divisible By d]
For $A=\{1,2,\dots,x\}$ and squarefree $d$, an element of $A_d$ has the form $kd$ with $k\in\mathbb N$ and $kd\le x$. This is equivalent to $1\le k\le x/d$, so the admissible values of $k$ are $1,2,\dots,\lfloor x/d\rfloor$. Hence
\begin{align*}
|A_d|=\left\lfloor \frac{x}{d}\right\rfloor.
\end{align*}
Writing
\begin{align*}
r_d:=\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d},
\end{align*}
we have $-1<r_d\le 0$, because $y-1<\lfloor y\rfloor\le y$ for every real $y$. Therefore
\begin{align*}
|A_d|=\frac{x}{d}+r_d,
\end{align*}
with $|r_d|\le 1$. In the notation of the local density model, this means $X=x$, $g(d)=1/d$, and $r_d=O(1)$.
Keeping only the main terms in the [exact sifting identity](/theorems/7160) gives
\begin{align*}
\sum_{d\mid P(z)}\mu(d)g(d)X=x\sum_{d\mid P(z)}\frac{\mu(d)}{d}.
\end{align*}
Since $P(z)=\prod_{p<z}p$ is squarefree, each divisor $d\mid P(z)$ is obtained by choosing a subset of the primes $p<z$. Expanding the finite product gives
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right)=\sum_{d\mid P(z)}\frac{\mu(d)}{d},
\end{align*}
because choosing the term $-1/p$ for exactly the primes dividing $d$ contributes $(-1)^{\omega(d)}/d=\mu(d)/d$. Thus the main-term contribution is
\begin{align*}
x\sum_{d\mid P(z)}\frac{\mu(d)}{d}=x\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
This example shows that the natural density model for ordinary integers is $g(d)=1/d$, and that the Euler product arises from expanding the inclusion-exclusion main terms.
[/example]
This example also reveals the first obstruction. Even if each individual error is small, summing $O(1)$ over all $d \mid P(z)$ is too large when $z$ grows.
## Truncation and the Shape of a Sieve
How can exact inclusion-exclusion be shortened without losing the sign of the estimate? Brun's idea is to stop the alternating sum at a controlled level, obtaining upper and lower bounds rather than an equality. Chapter 2 first makes this precise through Brun's alternating truncation, and Chapters 3 and 4 replace it by more efficient combinatorial and Selberg weights.
[definition: Sieve Weights]
Let $D \ge 1$. A family of sieve weights of level $D$ is a function
\begin{align*}
\lambda : \{d \in \mathbb N : d \text{ squarefree}\} \to \mathbb R, \qquad d \mapsto \lambda_d,
\end{align*}
such that $\lambda_d=0$ for $d>D$ and $\lambda_1=1$.
[/definition]
Weights provide a short substitute for the full Mobius sum. To use them for upper bounds, we need a pointwise majorisation of the sifted-set indicator before summing over the sequence $A$.
[definition: Upper Bound Sieve Weights]
A family of sieve weights $\lambda : \{d \in \mathbb N : d \text{ squarefree}\} \to \mathbb R$ is an upper bound sieve for $P(z)$ if, for every $n \in \mathbb Z$,
\begin{align*}
\mathbb{1}_{\{m \in \mathbb Z : \gcd(m,P(z))=1\}}(n) \le \sum_{d \mid \gcd(n,P(z))} \lambda_d.
\end{align*}
[/definition]
Upper bound weights are designed so that summing over $n \in A$ gives an inequality in the useful direction. The next principle records the algebraic step that converts pointwise majorisation into a bound involving the local densities and the remainder sum.
[quotetheorem:7161]
[citeproof:7161]
This principle is the template for most of the course, but its upper-bound hypothesis is essential. If the pointwise majorisation fails for even one integer $n$ with $\gcd(n,P(z))=1$, summing over a set $A$ concentrated on such integers can make the right-hand side smaller than the true sifted count, so the inequality has no reason to survive averaging. The result also does not say that the chosen weights are sharp: a valid upper-bound sieve may have a main term much larger than the expected order, or a remainder term too large to be useful. Later chapters therefore have two separate tasks: construct weights with a good local-product main term, and prove distribution estimates strong enough to control the weighted remainder sum in applications such as primes in progressions, twin-prime-type sequences, and Goldbach-type sums.
## Sieve Dimension and Local Obstructions
Why do twin-prime-type problems behave differently from counting primes? The answer is that the number of forbidden residue classes modulo $p$ may vary with the problem. This local count determines the sieve dimension.
[definition: Sieve Dimension]
Let $A$ have local density function $g$ on squarefree integers. The sieve dimension is a number $\kappa \ge 0$ such that, in the range relevant to the sieve,
\begin{align*}
\sum_{p<z} g(p)\log p \sim \kappa \log z.
\end{align*}
[/definition]
For $A=\{1,2,\dots,x\}$ with $g(p)=1/p$, the dimension is $1$. For twin-prime-type sequences, two residue classes are usually removed modulo each odd prime, so the dimension is $2$.
[example: Twin Prime Local Density]
Let $A=\{n(n+2):1\le n\le x\}$, and fix a prime $p$. The divisibility condition is
\begin{align*}
p\mid n(n+2)
\end{align*}
which is equivalent, since $p$ is prime, to
\begin{align*}
n\equiv 0 \pmod p \quad \text{or} \quad n+2\equiv 0 \pmod p.
\end{align*}
For odd $p$, the second congruence is $n\equiv -2\pmod p$, and the two classes $0$ and $-2$ are distinct because $p\nmid 2$. Thus exactly two residue classes modulo $p$ are removed, so the local density is
\begin{align*}
g(p)=\frac{2}{p}\qquad (p\text{ odd}).
\end{align*}
At $p=2$, the two congruence classes coincide because $-2\equiv 0\pmod 2$. Equivalently, $n$ and $n+2$ have the same parity, so $2\mid n(n+2)$ exactly when $n$ is even. Hence the special local density is
\begin{align*}
g(2)=\frac{1}{2}.
\end{align*}
The sieve dimension comes from the prime sum
\begin{align*}
\sum_{p<z}g(p)\log p=\frac{1}{2}\log 2+\sum_{2<p<z}\frac{2\log p}{p}.
\end{align*}
The term $\frac{1}{2}\log 2$ is constant in $z$, and *Mertens' prime sum estimate* gives $\sum_{p<z}(\log p)/p\sim \log z$, so
\begin{align*}
\sum_{p<z}g(p)\log p\sim 2\log z.
\end{align*}
Thus this sequence has sieve dimension $2$.
The corresponding local survival product is
\begin{align*}
\left(1-\frac{1}{2}\right)\prod_{2<p<z}\left(1-\frac{2}{p}\right).
\end{align*}
For each odd prime $p$,
\begin{align*}
1-\frac{2}{p}=\frac{p-2}{p}.
\end{align*}
Also
\begin{align*}
\left(1-\frac{1}{p}\right)^2=\frac{(p-1)^2}{p^2}.
\end{align*}
Dividing the first expression by the second gives
\begin{align*}
\frac{(p-2)/p}{(p-1)^2/p^2}=\frac{p(p-2)}{(p-1)^2}=1-\frac{1}{(p-1)^2}.
\end{align*}
Therefore
\begin{align*}
1-\frac{2}{p}=\left(1-\frac{1}{p}\right)^2\left(1-\frac{1}{(p-1)^2}\right).
\end{align*}
Multiplying over odd primes below $z$ gives
\begin{align*}
\left(1-\frac{1}{2}\right)\prod_{2<p<z}\left(1-\frac{2}{p}\right)=\frac{1}{2}\prod_{2<p<z}\left(1-\frac{1}{p}\right)^2\prod_{2<p<z}\left(1-\frac{1}{(p-1)^2}\right).
\end{align*}
The last product tends to the twin-prime singular-series factor, while *Mertens' theorem* gives
\begin{align*}
\prod_{2<p<z}\left(1-\frac{1}{p}\right)\asymp \frac{1}{\log z}.
\end{align*}
Hence the expected sieve product has size comparable to
\begin{align*}
\frac{1}{(\log z)^2},
\end{align*}
with the constant modified by the special factor at $2$ and by the singular series.
[/example]
The exceptional behaviour at small primes is not cosmetic. It is encoded in the singular series, which records whether local congruence conditions make the target problem impossible or merely change the leading constant.
## The Main Course Themes
Which parts of a sieve are combinatorial, and which parts require analytic number theory? The course separates three inputs: the construction of weights, the estimation of local products, and the distribution estimates that bound the remainders.
[explanation: Architecture of the Course]
The first part develops the basic sifting function, exact inclusion-exclusion, the Legendre sieve, Buchstab's identity, and Brun's pure sieve. These tools show how elementary cancellation in alternating sums already proves finite results such as Brun's theorem on twin primes.
The second part introduces the Selberg sieve and the larger linear-algebra viewpoint behind optimal upper bounds. Here [Hilbert space](/page/Hilbert%20Space) inequalities enter through quadratic forms in the weights, and the problem becomes one of minimising a positive definite expression subject to the normalisation at $d=1$.
The third part brings in the large sieve and the Bombieri-Vinogradov theorem. This is where distribution in arithmetic progressions replaces pointwise information, allowing sieve methods to reach ranges that are inaccessible to elementary remainder estimates.
[/explanation]
These themes reappear in each application. For primes, twin primes, Goldbach-type sums, and almost-prime values of polynomials, the same three questions must be answered: what residue classes are forbidden, how should the weights be chosen, and how far can the remainder estimates be summed?
[remark: What Sieve Methods Can And Cannot Prove]
Classical sieve methods are very effective at proving upper bounds and at producing almost primes. They usually cannot by themselves prove that a sequence contains infinitely many primes, because the lower-bound sieve loses information near the parity barrier. The parity barrier is the phenomenon that many sieve weights cannot distinguish numbers with an even number of prime factors from numbers with an odd number of prime factors.
[/remark]
The parity barrier explains why the course does not present a proof of the twin prime conjecture or Goldbach's conjecture. Instead it develops the strongest unconditional sieve statements available from the chosen inputs, such as Brun-type upper bounds, almost-prime results, and average distribution theorems.
## A Working Notational Summary
What notation will remain fixed throughout the course? We use $p$ for primes, $d$ for squarefree divisors, $z$ for the sifting threshold, and $D$ for the level of the sieve. The finite sequence being sifted is denoted by $A$, its expected size by $X$, and the sifted count by $S(A,P,z)$.
[definition: Remainder Sum]
Given $z \ge 2$, sieve weights $\lambda : \{d \in \mathbb N : d \text{ squarefree}\} \to \mathbb R$, and a local density model $|A_d|=g(d)X+r_d$, the associated remainder sum is the quantity
\begin{align*}
R(A,\lambda,z) := \sum_{d\mid P(z)} \lambda_d r_d \in \mathbb R.
\end{align*}
[/definition]
The size of $R(A,\lambda)$ is the test of whether a formal sieve identity has become an analytic theorem. In elementary sieves it is bounded by direct estimates for $r_d$; in the large sieve part it is controlled by mean-square inequalities and average equidistribution.
[example: The Square-Root Barrier]
For $A=\{1,2,\dots,x\}$ and squarefree $d$, the preceding computation gives
\begin{align*}
|A_d|=\left\lfloor \frac{x}{d}\right\rfloor=\frac{x}{d}+r_d
\end{align*}
with
\begin{align*}
-1<r_d\le 0.
\end{align*}
Hence $|r_d|\le 1$. Substituting this local density model into the exact sifting identity gives
\begin{align*}
S(A,P,z)=x\sum_{d\mid P(z)}\frac{\mu(d)}{d}+\sum_{d\mid P(z)}\mu(d)r_d.
\end{align*}
The second sum is the accumulated endpoint error. Since $|\mu(d)|\le 1$ and $|r_d|\le 1$ for every $d$, we have
\begin{align*}
\left|\sum_{d\mid P(z)}\mu(d)r_d\right|\le \sum_{d\mid P(z)}|\mu(d)|\,|r_d|.
\end{align*}
Also
\begin{align*}
\sum_{d\mid P(z)}|\mu(d)|\,|r_d|\le \sum_{d\mid P(z)}1.
\end{align*}
Because $P(z)=\prod_{p<z}p$ is squarefree, each divisor $d\mid P(z)$ is obtained by choosing a subset of the primes $p<z$. Therefore
\begin{align*}
\sum_{d\mid P(z)}1=2^{\pi(z)}.
\end{align*}
Thus exact inclusion-exclusion can accumulate one bounded error term for every subset of the sieving primes, which is far too many terms once $z$ grows.
If instead the sieve uses only moduli $d\le D$ and the coefficients satisfy $|\lambda_d|\le 1$, the same estimate gives
\begin{align*}
\left|\sum_{d\le D,\ d\mid P(z)}\lambda_d r_d\right|\le \sum_{d\le D,\ d\mid P(z)}|\lambda_d|\,|r_d|.
\end{align*}
Since each summand on the right is at most $1$,
\begin{align*}
\sum_{d\le D,\ d\mid P(z)}|\lambda_d|\,|r_d|\le \sum_{1\le d\le D}1=D.
\end{align*}
For this remainder to be smaller than the expected main term
\begin{align*}
x\prod_{p<z}\left(1-\frac{1}{p}\right),
\end{align*}
one needs
\begin{align*}
D=o\left(x\prod_{p<z}\left(1-\frac{1}{p}\right)\right).
\end{align*}
In the ordinary dimension-one case this main scale is comparable to $x/\log z$, so an elementary sieve with only individual bounds $r_d=O(1)$ cannot afford to sum over too many moduli. This restriction is the elementary form of the square-root barrier.
[/example]
Chapter 1 begins with the basic sifting problem in detail. It rederives the exact inclusion-exclusion formula in the notation with an explicit prime set $P$, introduces Buchstab's decomposition, and shows how the prime-counting problem already contains the central tension between main terms and remainders.
Introduction has now fixed the main objects and the basic tension between main terms and remainders. The next chapter begins the actual sifting problem by making inclusion-exclusion explicit for a finite prime set and showing how Buchstab's decomposition organizes the exact formulas that every sieve later approximates.
# 1. Sifting Problems and Inclusion-Exclusion
Sieve methods begin with a simple question: how many elements of a finite set remain after removing those divisible by small primes? The first chapter sets up the common language for this question, explains the exact inclusion-exclusion formula behind every elementary sieve, and shows why exactness becomes unusable in analytic problems. The Legendre sieve for primes is the guiding example: it is perfectly exact, but its error term already reveals the square-root barrier that later chapters work to overcome.
## Sifted Sets and Local Densities
The basic problem is to count elements of a finite arithmetic set after imposing many congruence exclusions. Let $A$ be a finite set of integers, let $P$ be a set of primes, and let $z \ge 2$. We want to count those $a \in A$ which avoid all prime divisors $p \in P$ with $p < z$.
[definition: Sifting Product]
Let $P$ be a set of primes. The sifting product associated to $P$ is the map
\begin{align*}
P(\cdot):[2,\infty) \to \mathbb N
\end{align*}
defined by
\begin{align*}
P(z):=\prod_{p \in P,\ p < z} p.
\end{align*}
[/definition]
The notation deliberately overloads $P$: the same letter denotes the set of primes, while $P(z)$ denotes the integer obtained by multiplying the primes of $P$ below $z$. This convention is standard in sieve theory, but the distinction matters because divisibility conditions are imposed using the integer $P(z)$, not the set $P$ itself. The product $P(z)$ packages all forbidden prime divisors below the level $z$, and the condition $\gcd(a,P(z))=1$ says that $a$ has survived the removal of residue classes modulo all primes in $P$ below $z$.
[definition: Sifting Function]
Let $\mathcal F_{\mathrm{fin}}(\mathbb Z)$ denote the finite subsets of $\mathbb Z$, and let $\mathcal P$ denote the collections of primes. The sifting function is the map
\begin{align*}
S:\mathcal F_{\mathrm{fin}}(\mathbb Z)\times \mathcal P\times [2,\infty) \to \mathbb Z_{\ge 0}
\end{align*}
defined by
\begin{align*}
S(A,P,z):=|\{a \in A : \gcd(a,P(z))=1\}|.
\end{align*}
[/definition]
The notation is deliberately flexible. In different applications $A$ may be an interval, a polynomial sequence, a set of values of a linear form, or a sequence with weights. The set counted by $S(A,P,z)$ is called the sifted set at level $z$.
[example: Integers With No Small Prime Factor]
Let $A=\{1,2,\dots,\lfloor x\rfloor\}$ and let $P$ be the set of all primes. By the definitions of the sifting product and sifting function,
\begin{align*}
S(A,P,z)=|\{n\in\{1,\dots,\lfloor x\rfloor\}:\gcd(n,\prod_{p<z}p)=1\}|.
\end{align*}
For an integer $n$, the condition $\gcd(n,\prod_{p<z}p)=1$ is equivalent to saying that no prime $p<z$ divides $n$, because a prime divides the product $\prod_{p<z}p$ exactly when it is one of the primes appearing in that product. Hence
\begin{align*}
S(A,P,z)=|\{n\le x:p\nmid n\text{ for every prime }p<z\}|.
\end{align*}
This counts integers up to $x$ with no prime factor below $z$. If $z=\sqrt{x}$ and a surviving integer $n>1$ is composite, let $q$ be its least prime factor. Then $n=qm$ with $q\le m$, so
\begin{align*}
q^2\le qm=n\le x.
\end{align*}
Thus $q\le \sqrt{x}$. Since $n$ survives, no prime below $\sqrt{x}$ divides $n$, so $q\ge \sqrt{x}$. Therefore $q=\sqrt{x}$, and then $q^2=x$ and $n=x$. Thus the only possible composite survivor at the boundary is a square of a prime equal to $\sqrt{x}$; taking the level just above $\sqrt{x}$ removes this boundary case and leaves only $1$ and primes.
[/example]
The first example shows the sifted set, but it does not yet explain how large such a set should be. A sieve needs an expected density of the forbidden congruence classes, so the next object records how often divisibility by a given prime occurs inside $A$.
[definition: Local Density]
Let $A$ be a finite set of integers and let $p$ be a prime. A local density at $p$ is a number $\omega(p)$ such that the count of elements of $A$ divisible by $p$ is modelled as
\begin{align*}
|\{a \in A : p \mid a\}| = \frac{\omega(p)}{p}X + r_p,
\end{align*}
where $X$ is the main scale of $A$ and $r_p$ is a remainder term.
[/definition]
Here $X$ is usually comparable to $|A|$, but in weighted problems it is the total mass of the sequence. The ratio $\omega(p)/p$ is the expected fraction removed by the prime $p$, while $r_p$ measures the deviation from this local model.
[example: Interval Local Density]
Let $A=\{1,2,\dots,x\}$ with $x\in \mathbb N$, and let $X=x$. An element of $A$ is divisible by $p$ exactly when it has the form $kp$ with $k\in\mathbb N$ and $kp\le x$, equivalently $1\le k\le x/p$. Hence the multiples of $p$ in $A$ are
\begin{align*}
p,2p,\dots,\left\lfloor\frac{x}{p}\right\rfloor p,
\end{align*}
so
\begin{align*}
|\{a\in A:p\mid a\}|=\left\lfloor\frac{x}{p}\right\rfloor.
\end{align*}
By the defining property of the floor function,
\begin{align*}
\left\lfloor\frac{x}{p}\right\rfloor\le \frac{x}{p}<\left\lfloor\frac{x}{p}\right\rfloor+1.
\end{align*}
Subtracting $x/p$ throughout gives
\begin{align*}
-1<\left\lfloor\frac{x}{p}\right\rfloor-\frac{x}{p}\le 0.
\end{align*}
Thus, with
\begin{align*}
r_p:=\left\lfloor\frac{x}{p}\right\rfloor-\frac{x}{p},
\end{align*}
we have $|r_p|\le 1$ and
\begin{align*}
|\{a\in A:p\mid a\}|=\frac{1}{p}x+r_p.
\end{align*}
In the local-density notation this is $\omega(p)=1$ with a uniformly bounded remainder $r_p=O(1)$, so the interval is the basic one-dimensional sieve model.
[/example]
The interval example has one forbidden residue class modulo each prime. Many sieve problems remove several residue classes per prime, and the average number of removed classes controls the scale of the final answer. This motivates a dimension parameter, which records the logarithmic average of the local densities.
[definition: Sieve Dimension]
Let $P$ be a set of primes, let $\mathcal D(P)$ be the set of squarefree positive integers whose prime factors all lie in $P$, and let $\omega:\mathcal D(P)\to \mathbb R$ be a multiplicative local density function. The sieve dimension is a number $\kappa \ge 0$ for which
\begin{align*}
\sum_{p \in P,\ p<z} \frac{\omega(p)\log p}{p} = \kappa \log z + O(1)
\end{align*}
as $z \to \infty$.
[/definition]
The parameter $\kappa$ measures the average number of residue classes removed per prime. Dimension $1$ is the ordinary problem of excluding divisibility by primes; dimension $2$ appears, for instance, when sieving $n(n+2)$ for twin-prime type questions.
[example: Rough Size of a One-Dimensional Sifted Set]
For $A=\{1,2,\dots,x\}$ with $x\in\mathbb N$ and $P$ the set of all primes, the interval local-density computation gives one forbidden residue class modulo each prime $p$: the multiples of $p$ have density $1/p$ up to a bounded remainder. Thus the model predicts that sieving by a single prime $p$ keeps the fraction
\begin{align*}
1-\frac{1}{p}
\end{align*}
of the set. Applying this prime by prime for all $p<z$ gives the expected main term
\begin{align*}
S(A,P,z)\approx x\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
By *Mertens' product theorem*,
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right)\sim \frac{e^{-\gamma}}{\log z}.
\end{align*}
Substituting this asymptotic into the density model gives
\begin{align*}
x\prod_{p<z}\left(1-\frac{1}{p}\right)\sim x\cdot \frac{e^{-\gamma}}{\log z}.
\end{align*}
Since $e^{-\gamma}$ is a positive constant independent of $x$ and $z$, this main term has order $x/\log z$. The example shows that a dimension-one sieve naturally produces the same logarithmic scale that appears in prime counting.
[/example]
## Exact Inclusion-Exclusion and the Need for Truncation
The next question is whether the heuristic product can be replaced by an exact formula. For a finite set of forbidden primes the answer is yes: inclusion-exclusion counts the sifted set exactly by adding and subtracting divisibility conditions. The issue is that exactness requires summing over every squarefree divisor of $P(z)$.
[definition: Divisibility Subset]
Let $A$ be a finite set of integers and let $d \in \mathbb N$. Define
\begin{align*}
A_d := \{a \in A : d \mid a\}.
\end{align*}
[/definition]
The sets $A_d$ turn a simultaneous avoidance problem into many divisibility counts. The Möbius function supplies the signs that select integers coprime to the sifting product.
[quotetheorem:751]
[citeproof:751]
This theorem is the algebraic core of the elementary sieve because it turns a coprimality condition into divisibility counts. Each hypothesis has a specific role. The finiteness of $A$ makes both sides finite sums; if $A=\mathbb N$ and $P=\{2\}$, then $S(A,P,3)$ and $|A_2|$ are infinite cardinalities, so the displayed subtraction is no longer a numerical identity. The restriction to the integer product $P(z)$ matters because the proof uses the divisor identity $\sum_{d\mid n}\mu(d)$; replacing the divisors $d\mid P(z)$ by arbitrary moduli would not test the same coprimality condition. The use of $p<z$ is also part of the convention: if $P=\{2\}$, $z=2$, and $A=\{2\}$, then $P(z)=1$ and the element is counted, whereas changing the convention to $p\le z$ would remove it.
The theorem does not estimate $S(A,P,z)$, and it does not say that the terms $\mu(d)|A_d|$ are individually small or useful. It is exact only because it pays for every squarefree divisor of $P(z)$, so the number of terms grows exponentially in the number of sieving primes. To compare the exact formula with the probabilistic product from the previous section, we need local densities not only for primes but also for simultaneous divisibility by squarefree products of primes.
[definition: Multiplicative Sieve Density]
Let $P$ be a set of primes, and let $\mathcal D(P)$ be the set of squarefree positive integers whose prime factors all lie in $P$. A multiplicative sieve density is a map
\begin{align*}
\omega:\mathcal D(P)\to \mathbb R
\end{align*}
such that
\begin{align*}
\omega(d)=\prod_{p\mid d}\omega(p)
\end{align*}
for every $d\in\mathcal D(P)$.
[/definition]
With this convention, the local model for divisibility by $d$ is $|A_d|=(\omega(d)/d)X+r_d$. The point that still has to be checked is whether exact inclusion-exclusion preserves the prime-by-prime density prediction once all squarefree divisors of $P(z)$ are summed. If the main divisor sum factors, the only obstruction left is the accumulated remainder; if it does not factor, even the predicted main term has the wrong shape. This is the first place where multiplicativity turns local congruence data into an Euler product and exposes the precise error term to be controlled.
[quotetheorem:7162]
The formula identifies both the promise and the danger of exact inclusion-exclusion. Multiplicativity of $\omega$ is exactly what makes the main divisor sum factor into an Euler product. For a concrete failure, take $P=\{2,3\}$ and define $\omega(1)=1$, $\omega(2)=\omega(3)=1$, but $\omega(6)=0$. Then
\begin{align*}
\sum_{d\mid 6}\mu(d)\frac{\omega(d)}{d}=1-\frac{1}{2}-\frac{1}{3}+0=\frac{1}{6},
\end{align*}
whereas
\begin{align*}
\prod_{p\in\{2,3\}}\left(1-\frac{\omega(p)}{p}\right)=\frac{1}{3}.
\end{align*}
Thus the prime-by-prime product is not justified without the squarefree multiplicativity hypothesis. The assumption that the displayed formula for $|A_d|$ holds for every $d\mid P(z)$ is also necessary for exactness; if it is known only for primes, then the term $|A_6|$ in the preceding example remains uncontrolled. The remainders $r_d$ are unavoidable because arithmetic sets rarely distribute perfectly modulo every squarefree modulus.
This expansion does not by itself give an asymptotic formula. Even if each $r_d$ is bounded, the sum contains all divisors $d\mid P(z)$, and their number may be too large for the total remainder to be small. Chapters 2 and 3 design such replacements: Brun's alternating truncation first keeps only bounded layers of the divisor lattice, and the combinatorial sieve then packages the same idea as upper and lower weights supported only on a feasible range of moduli.
[example: Why the Full Remainder Sum Is Too Large]
Let $A=\{1,2,\dots,x\}$ with $x\in\mathbb N$. For a squarefree divisor $d\mid P(z)$, the elements of $A_d$ are
\begin{align*}
d,2d,\dots,\left\lfloor\frac{x}{d}\right\rfloor d.
\end{align*}
Hence
\begin{align*}
|A_d|=\left\lfloor\frac{x}{d}\right\rfloor.
\end{align*}
Writing
\begin{align*}
r_d:=\left\lfloor\frac{x}{d}\right\rfloor-\frac{x}{d},
\end{align*}
the defining property of the floor function gives $-1<r_d\le 0$, so $|r_d|\le 1$ and
\begin{align*}
|A_d|=\frac{x}{d}+r_d.
\end{align*}
Substituting this into the *Exact Sieve Expansion With Remainders* gives
\begin{align*}
S(A,P,z)=x\prod_{p<z}\left(1-\frac{1}{p}\right)+\sum_{d\mid P(z)}\mu(d)r_d.
\end{align*}
The absolute value of the remainder is bounded by the triangle inequality:
\begin{align*}
\left|\sum_{d\mid P(z)}\mu(d)r_d\right|\le \sum_{d\mid P(z)}|\mu(d)|\,|r_d|.
\end{align*}
Since every divisor of $P(z)$ is squarefree, $|\mu(d)|=1$ for each $d\mid P(z)$, and since $|r_d|\le 1$,
\begin{align*}
\sum_{d\mid P(z)}|\mu(d)|\,|r_d|\le \sum_{d\mid P(z)}1.
\end{align*}
If $\nu(z)$ denotes the number of primes $p<z$, then $P(z)$ is the product of $\nu(z)$ distinct primes. Each divisor is obtained by choosing either to include or exclude each of these primes, so
\begin{align*}
\sum_{d\mid P(z)}1=2^{\nu(z)}\le 2^{\pi(z)}.
\end{align*}
Thus exact inclusion-exclusion gives the error bound
\begin{align*}
\left|\sum_{d\mid P(z)}\mu(d)r_d\right|\le 2^{\pi(z)}.
\end{align*}
For example, at the Legendre level $z=\lfloor\sqrt{x}\rfloor+1$, this bound is exponential in the number of primes up to $\sqrt{x}$, while the expected main term has size comparable to $x/\log z$. The calculation shows that bounded individual errors do not remain harmless after summing over all divisors of $P(z)$; a useful sieve must truncate or smooth the inclusion-exclusion sum.
[/example]
Truncation is the central design choice in sieve theory. Beginning with Brun's pure sieve in Chapter 2 and continuing with the combinatorial sieve in Chapter 3, we construct coefficients $\lambda_d$ supported on $d<D$ so that
\begin{align*}
S(A,P,z) \approx \sum_{d\mid P(z)}\lambda_d |A_d|,
\end{align*}
with enough sign control to give upper or lower bounds. The parameter $D$ is called the level of the sieve and must stay within the range where the remainder terms are manageable.
## The Legendre Sieve and the Square-Root Barrier
The first exact application is prime counting. The problem is to express $\pi(x)$ using a sifting function and then inspect the size of the resulting remainder. This gives a useful identity, but it also shows why exact inclusion-exclusion alone is not a satisfactory asymptotic method.
[definition: Prime Counting Function]
The prime counting function is the map $\pi:[1,\infty)\to \mathbb Z_{\ge 0}$ defined by
\begin{align*}
\pi(x):=|\{p \le x : p \text{ is prime}\}|.
\end{align*}
[/definition]
To detect primes up to $x$, we remove all integers divisible by primes below $\sqrt{x}$. A composite number $n \le x$ has a prime factor at most $\sqrt{x}$, so the survivors are $1$ and the primes between $\sqrt{x}$ and $x$.
[quotetheorem:1752]
[citeproof:1752]
The identity is exact and historically important, but each convention is doing work. The condition $x\ge 2$ avoids the degenerate range where the phrase "primes between $y$ and $x$" no longer matches the displayed correction term. The choice $y=\lfloor\sqrt{x}\rfloor$ uses the elementary fact that every composite $n\le x$ has a prime factor at most $\sqrt{x}$; if we took $y<\sqrt{x}$, then composites such as $n=pq$ with $y<p\le q$ could survive the sieve. For instance, with $x=49$ and $y=5$, the number $49$ is not removed by primes at most $5$.
The level convention is also part of the statement. Since $S(A,P,z)$ removes primes $p<z$, choosing $z=y+1$ removes exactly the primes at most $y$. If instead the formula used $z=y$, then primes equal to $y$ would not be removed; at $x=25$, this would leave $25$ among the survivors. If one used $\sqrt{x}+1$ directly with the convention $p<z$, primes lying between $\sqrt{x}$ and $\sqrt{x}+1$ would be removed from the sifted set when $x$ is not a square, changing the correction term.
The identity does not estimate $\pi(x)$ efficiently and does not approximate the [prime number theorem](/theorems/1692). It merely rewrites prime counting as a sifted-set count at the square-root threshold. To test the strength of the inclusion-exclusion machinery developed above, we now expand this particular sifting function by the Möbius formula; the result is Legendre's sieve formula.
[quotetheorem:7163]
[citeproof:7163]
This formula is the point where exact inclusion-exclusion becomes computationally expensive rather than conceptually difficult. The hypotheses come directly from the preceding identity. The lower bound $x\ge 2$ keeps $y\ge 1$ and makes the correction $\pi(y)-1$ meaningful in the survivor count. The choice $y=\lfloor\sqrt{x}\rfloor$ is necessary for exact prime detection; with $x=49$ and $y=5$, the same formula would count $49$ as a survivor if the sifting level were based on $5$ rather than $7$. The floor functions are necessary because the interval count is $|A_d|=\lfloor x/d\rfloor$; replacing them by $x/d$ is not an identity, as $x=10$ and $d=3$ already gives $\lfloor 10/3\rfloor=3$ while $10/3$ is not an integer count.
The formula also depends on using all squarefree divisors of $P(y+1)$. If terms are omitted, the cancellations that remove composites fail: for $x=30$, leaving out the divisor $6$ would mishandle numbers divisible by both $2$ and $3$. What the formula does not provide is a usable asymptotic estimate for $\pi(x)$. Replacing $\lfloor x/d\rfloor$ by $x/d$ gives a product main term, but the accumulated floor errors over all $d\mid P(y+1)$ dominate. This is the square-root barrier in its most elementary form and motivates the truncated sieves introduced later.
[remark: Square-Root Barrier]
In the Legendre sieve, the sifting level is $z=\lfloor\sqrt{x}\rfloor+1$, so the exact inclusion-exclusion sum contains $2^{\pi(\lfloor\sqrt{x}\rfloor)}$ terms. The local estimate $\lfloor x/d\rfloor=x/d+O(1)$ is individually strong, but the total $O(1)$ error over all $d\mid P(z)$ is too large for prime-counting asymptotics. Later sieves lower the combinatorial cost by using only selected divisors $d<D$, at the price of producing bounds rather than exact identities.
[/remark]
The square-root barrier comes from attempting to sieve to a high level in one exact step. A more flexible approach changes the level gradually, recording which prime first enters the factorisation of an integer that has just failed the stronger sifting condition. This leads to Buchstab's decomposition.
[quotetheorem:7164]
[citeproof:7164]
Buchstab's decomposition is the first recursive tool of the course. The finiteness of $A$ ensures that all counts and the prime-indexed sum are finite; if $A=\mathbb N$ and $P$ contains all primes, the displayed terms are infinite and the subtraction is not a numerical identity. The strict inequality $z_1<z_2$ gives a genuine interval of newly forbidden primes. If $z_1=z_2$, the sum is empty and the statement reduces to the tautology $S(A,P,z_1)=S(A,P,z_1)$, so no decomposition has occurred. The half-open interval $z_1\le p<z_2$ must match the convention in $S(A,P,z)$; using $z_1<p<z_2$ would miss elements whose least newly forbidden prime is exactly $z_1$, for example $A=\{2\}$, $P=\{2\}$, $z_1=2$, and $z_2=3$.
The theorem deliberately sifts $A_p$ as a subset of $A$; in special cases such as intervals we may reparametrise multiples $a=pm$, but that extra identification is not part of the abstract statement. It does not estimate any summand and does not say that the recursive process converges without further analytic input. Its value is structural: it lowers a sifting level by accounting for the first prime factor introduced in the interval of levels, and it is the starting point for recursive estimates of rough numbers.
[example: First Buchstab Step for Rough Integers]
Let $A=\{1,2,\dots,\lfloor x\rfloor\}$ and let $P$ be the set of all primes. For $2\le z_1<z_2$, *[Buchstab Decomposition](/theorems/7164)* gives
\begin{align*}
S(A,P,z_2)=S(A,P,z_1)-\sum_{z_1\le p<z_2}S(A_p,P,p).
\end{align*}
For a prime $p$ in the sum,
\begin{align*}
A_p=\{a\in A:p\mid a\}=\{pm:m\in\mathbb N,\ pm\le \lfloor x\rfloor\}.
\end{align*}
The condition $pm\le \lfloor x\rfloor$ is equivalent to
\begin{align*}
m\le \frac{\lfloor x\rfloor}{p}.
\end{align*}
Since $m$ is an integer, this is equivalent to
\begin{align*}
m\le \left\lfloor\frac{\lfloor x\rfloor}{p}\right\rfloor=\left\lfloor\frac{x}{p}\right\rfloor.
\end{align*}
Thus multiplication by $p$ gives a bijection
\begin{align*}
\{1,2,\dots,\lfloor x/p\rfloor\}\longrightarrow A_p,\qquad m\mapsto pm.
\end{align*}
Now compute the sifting condition in the summand. By the definition of $S(A_p,P,p)$,
\begin{align*}
S(A_p,P,p)=|\{a\in A_p:\gcd(a,P(p))=1\}|.
\end{align*}
Writing $a=pm$ and using
\begin{align*}
P(p)=\prod_{q<p}q,
\end{align*}
we have $p\nmid P(p)$, and every prime divisor of $P(p)$ is a prime $q<p$. Hence
\begin{align*}
\gcd(pm,P(p))=1
\end{align*}
holds exactly when no prime $q<p$ divides $m$, equivalently when
\begin{align*}
\gcd(m,P(p))=1.
\end{align*}
Therefore
\begin{align*}
S(A_p,P,p)=|\{m\le x/p:\gcd(m,P(p))=1\}|=S(\{1,2,\dots,\lfloor x/p\rfloor\},P,p).
\end{align*}
So the first Buchstab step rewrites the count at level $z_2$ in terms of the easier count at level $z_1$ and correction terms that count integers $m\le x/p$ with no prime factor below $p$; the recursion is organized by the least prime factor newly removed between $z_1$ and $z_2$.
[/example]
## The Fundamental Sieve Problem
Using the notation for $A_d$, $\omega(d)$, and the remainder terms introduced above, the chapter ends by isolating the general problem that every later sieve tries to solve. We are given local information about divisibility by squarefree integers and want upper or lower estimates for the sifted set using only a controlled range of moduli.
[definition: Fundamental Sieve Problem]
Let $A$ be a finite sequence of integers with main scale $X$, let $P$ be a set of primes, and let $z\ge 2$. Suppose that for squarefree $d\mid P(z)$ one has
\begin{align*}
|A_d|=\frac{\omega(d)}{d}X+r_d.
\end{align*}
The fundamental sieve problem is to estimate
\begin{align*}
S(A,P,z)
\end{align*}
in terms of $X$, the product
\begin{align*}
V(z):=\prod_{p\in P,\ p<z}\left(1-\frac{\omega(p)}{p}\right),
\end{align*}
and the available bounds for the remainders $r_d$.
[/definition]
The expected answer has the shape $S(A,P,z)\approx XV(z)$, but the symbol $\approx$ hides the main work of the course. To make it meaningful, one must specify the sieve dimension, the level of distribution for the remainders, and whether the desired result is an upper bound, a lower bound, or an asymptotic formula.
[remark: What Later Sieves Change]
Legendre's identity uses perfect coefficients $\mu(d)$ but pays for every divisor of $P(z)$. Brun's sieve keeps the inclusion-exclusion signs but truncates the order. Selberg's sieve chooses weights optimally for upper bounds, while the large sieve supplies distribution estimates strong enough to average the remainders over many moduli. These methods all begin from the same formula in this chapter and differ in how they control the remainder sum.
[/remark]
Chapter 1 made clear that exact inclusion-exclusion is conceptually clean but quickly becomes unwieldy once all divisors of $P(z)$ are present. The next chapter keeps the same notation and shows how Eratosthenes' and Brun's ideas recover useful information by truncating the expansion and retaining only one-sided bounds.
# 2. Eratosthenes and Brun's Pure Sieve
Chapter 1 formulated a sifting problem as the task of counting elements of a finite set $A$ that avoid all primes below a level $z$, and it identified exact inclusion-exclusion as too expensive once all divisors of $P(z)$ are used. This chapter keeps that notation and turns to the first remedy: stopping inclusion-exclusion while preserving one-sided inequalities. This chapter returns to the oldest sieve, the sieve of Eratosthenes, and treats it in two complementary ways: as a finite algorithm for finding primes, and as a template for asymptotic counting. The central difficulty is that exact inclusion-exclusion has too many terms when $z$ is large, so Brun's idea is to stop the alternating sum at a controlled point while preserving one-sided inequalities. The payoff is the first nontrivial upper bound for twin primes and the convergence of the sum of reciprocals of twin primes.
## Eratosthenes as Counting by Deletion
How much information is retained if we delete multiples of small primes without trying to identify the remaining integers one by one? The algorithmic sieve of Eratosthenes starts with the integers up to $x$ and removes composite numbers by crossing out multiples of each prime $p \le \sqrt{x}$. In asymptotic sieve theory the same deletion process is encoded by a product of local conditions, but the errors from overlaps between congruence classes become the main object of study.
[definition: Sifted Set]
As in Chapter 1, let $A$ be a finite set of integers, let $P(z)=\prod_{p<z}p$, and let $d$ range over squarefree positive integers. The sifted set at level $z$ is
\begin{align*}
S(A,z)=\{a\in A: \gcd(a,P(z))=1\}.
\end{align*}
The associated counting function is the map
\begin{align*}
S(\,\cdot\,,P,z):\{B\subset \mathbb Z:B\text{ finite}\}\to \mathbb N\cup\{0\},
\end{align*}
defined by
\begin{align*}
S(B,P,z)=|\{b\in B:\gcd(b,P(z))=1\}|.
\end{align*}
[/definition]
Here $P$ in $S(A,P,z)$ is part of the standard sieve notation and means the prime product $P(z)$, not an independent third datum. This definition packages the deletion rule into a single coprimality condition. To calibrate the size of a sifted set, we first test it on the basic sequence of all integers up to $x$, where the only forbidden residue class modulo each prime is $0$.
[example: Rough Integers Up To X]
Let $A=\{1,2,\dots,\lfloor x\rfloor\}$. For $n\in A$, the condition $\gcd(n,P(z))=1$ means that no prime $p<z$ divides $n$; equivalently, every prime divisor of $n$ is at least $z$.
For a fixed prime $p<z$, the deleted integers are
\begin{align*}
p,2p,3p,\dots,\left\lfloor\frac{\lfloor x\rfloor}{p}\right\rfloor p,
\end{align*}
so their number is $\lfloor \lfloor x\rfloor/p\rfloor$. Since
\begin{align*}
\frac{x}{p}-1<\left\lfloor\frac{\lfloor x\rfloor}{p}\right\rfloor\le \frac{x}{p},
\end{align*}
the deleted proportion is $1/p$ up to an endpoint error of size $O(1/x)$. If the divisibility conditions by distinct primes behaved independently, the predicted surviving proportion would be the product of the individual surviving proportions:
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
Multiplying by the scale $x$ gives the heuristic main term
\begin{align*}
x\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
By *Mertens' theorem*,
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right)\asymp \frac{1}{\log z},
\end{align*}
so the heuristic size is comparable to $x/\log z$. The estimate becomes a theorem only after the overlaps between the deleted classes, such as integers divisible by both $p$ and $q$, are controlled.
[/example]
The example displays the basic tension of the subject: the local product is simple, but the exact count depends on all simultaneous divisibility conditions. To state those simultaneous conditions in a form that can be estimated for general sequences, we record the density of the set cut out by each squarefree divisor.
[definition: Local Density]
Let $A$ be a finite set of integers. For a squarefree integer $d$, write
\begin{align*}
A_d=\{a\in A:d\mid a\}.
\end{align*}
A local density function is a multiplicative function
\begin{align*}
\omega:\{d\in\mathbb N:d\text{ squarefree}\}\to \mathbb R,
\end{align*}
such that, for a scale parameter $X>0$,
\begin{align*}
|A_d|=\frac{\omega(d)}{d}X+r_d.
\end{align*}
[/definition]
The quantity $\omega(p)$ measures how many residue classes modulo $p$ are forbidden by the sequence, while $r_d$ records the finite endpoint error left after replacing an exact count by its expected density. This separation is the point of the notation: the sieve manipulates the multiplicative main terms uniformly in $d$, and all arithmetic irregularity is pushed into the remainders. Before returning to inclusion-exclusion, it is useful to see that this definition reproduces the familiar density $1/d$ for ordinary divisibility.
[example: Local Density For Integers]
For $A=\{1,2,\dots,\lfloor x\rfloor\}$ and $X=x$, fix a squarefree integer $d$. The integers $n\in A$ with $d\mid n$ are exactly
\begin{align*}
d,2d,3d,\dots,\left\lfloor\frac{\lfloor x\rfloor}{d}\right\rfloor d.
\end{align*}
There are therefore $\lfloor \lfloor x\rfloor/d\rfloor$ such integers. Since $\lfloor x\rfloor\le x<\lfloor x\rfloor+1$, division by $d$ gives
\begin{align*}
\frac{x}{d}-\frac{1}{d}<\frac{\lfloor x\rfloor}{d}\le \frac{x}{d}.
\end{align*}
Taking floors, this shows that
\begin{align*}
|A_d|=\left\lfloor\frac{\lfloor x\rfloor}{d}\right\rfloor=\frac{x}{d}+\left(\left\lfloor\frac{\lfloor x\rfloor}{d}\right\rfloor-\frac{x}{d}\right).
\end{align*}
Comparing this with the local-density form
\begin{align*}
|A_d|=\frac{\omega(d)}{d}X+r_d
\end{align*}
and using $X=x$, we may take
\begin{align*}
\omega(d)=1
\end{align*}
and
\begin{align*}
r_d=\left\lfloor\frac{\lfloor x\rfloor}{d}\right\rfloor-\frac{x}{d}.
\end{align*}
For each prime $p$, the local factor is therefore $1-\omega(p)/p=1-1/p$, so the associated sieve product is
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
Thus ordinary divisibility by primes has local sieve dimension $1$: each prime removes one residue class, namely $0\pmod p$.
[/example]
The local-density notation separates the expected main term from the arithmetic error $r_d$. The next step is to insert this decomposition into the exact Eratosthenes count, which identifies both the Euler product and the remainder sum that later force Brun's truncation. We write $\mu$ for the Mobius function, so $\mu(d)=(-1)^r$ when $d$ is a product of $r$ distinct primes and $\mu(d)=0$ when $d$ is divisible by a square.
[quotetheorem:751]
[citeproof:751]
This identity explains why the exact sieve is powerful for small $z$ and weak for large $z$. The squarefree restriction matters because $P(z)$ itself is squarefree: for example, if $p<z$ and $p\mid a$, then the conditions $p\mid a$ and $p^2\mid a$ do not describe the same deletion event, while inclusion-exclusion over prime deletion events only needs to know whether the prime $p$ occurs at least once. Adding non-squarefree divisors would therefore count divisibility strength rather than membership in the union of prime obstruction sets.
The finiteness of $A$ is used so that all sums may be interchanged without convergence questions. If $A$ were an infinite set such as all positive integers, the expression $\sum_{d\mid P(z)}\mu(d)|A_d|$ would contain infinite cardinalities and would no longer be a numerical identity until a finite cutoff or a limiting density had been specified. The coprimality condition with $P(z)$ is also tied to deleting prime obstructions: replacing it by unrelated congruence conditions would require a different family of sets $A_d$ and a different inclusion-exclusion lattice.
The multiplicativity of $\omega$ is also essential for the Euler product. If $P(z)=pq$ and the local data were $\omega(p)=\omega(q)=1$ but $\omega(pq)=0$, then the main term
\begin{align*}
X\left(1-\frac{1}{p}-\frac{1}{q}\right)
\end{align*}
would not equal $X(1-1/p)(1-1/q)$; the missing $X/(pq)$ term is exactly the failure of prime-by-prime independence at the level of the model density. Let $\pi(y)=|\{p\le y:p\text{ prime}\}|$ denote the prime-counting function. There are $2^{\pi(z-)}$ squarefree divisors of $P(z)$, where $\pi(z-)$ counts primes $p<z$, so the remainder sum soon contains far more terms than the original problem can tolerate; the theorem gives an exact formula, but it does not by itself give a useful estimate when the accumulated errors $r_d$ are uncontrolled. Brun's truncation keeps the same deletion events and the same local data, but replaces equality by inequalities whose remainders involve only the first few layers of this divisor lattice.
## Brun's Alternating Truncation
If exact inclusion-exclusion is too long, can we stop it while keeping a genuine upper or lower bound? Brun's pure sieve answers this by exploiting the signs of the inclusion-exclusion series. The price for truncation is that the main term is no longer an exact Euler product, but it remains close enough to the product when the truncation level is chosen with care.
[definition: Brun Alternating Sums]
Let $A$ be a finite set of integers and let $P(z)=\prod_{p<z}p$. For $m\ge 0$, define
\begin{align*}
S_m(\,\cdot\,,P,z):\{B\subset \mathbb Z:B\text{ finite}\}\to \mathbb Z
\end{align*}
by
\begin{align*}
S_m(A,P,z)=\sum_{\substack{d\mid P(z),\, \nu(d)\le m}}\mu(d)|A_d|,
\end{align*}
where $\nu(d)$ is the number of prime factors of the squarefree integer $d$.
[/definition]
The integer $m$ is the truncation depth. To use these finite sums as a sieve rather than as approximations, we need a sign principle showing that even and odd depths bound the true sifted count from opposite sides.
[quotetheorem:1111]
[citeproof:1111]
The theorem is the structural heart of Brun's method. The parity of the truncation depth is not cosmetic: if an element is divisible by exactly two sieving primes, then the exact indicator is $0$, while the odd truncation at depth $1$ gives $1-2=-1$ and the even truncation at depth $2$ gives $1-2+1=0$. Reversing the parity in the theorem would assert an upper bound from $S_1$, which fails in this example because $-1$ is not an upper bound for $0$. The truncated sum is therefore not an approximation whose error has an uncontrolled sign; it is a signed one-sided bound whose direction is determined by the last included parity.
The hypotheses are deliberately sparse, and each one serves a specific role. The set $A$ is finite so that the argument reduces to a finite sum of indicators; for an infinite sequence the inequality must first be applied to a finite truncation such as $1\le n\le x$. The integer condition $k\ge 0$ ensures that the upper and lower truncation depths are nonnegative; there is no depth $-1$ inequality to compare with the sifted indicator. The product $P(z)$ is a squarefree product of distinct primes, so an element contributes according to the number of distinct sieving primes dividing it; if repeated prime powers were included as separate events, the binomial calculation in the proof would no longer describe the contribution of a fixed element.
The statement is purely combinatorial and says nothing about the size of the truncated sums. A counterexample to any automatic strength is obtained by taking a finite set in which every element lies in many of the deleted residue classes: the inequality remains true, but $S_{2k}(A,P,z)$ can be much larger than the actual sifted count. Bad arithmetic information about the sets $A_d$ can therefore make the resulting bound weak. Brun's method converts incomplete inclusion-exclusion into rigorous one-sided estimates, so the task becomes bounding the truncated main term and the truncated remainder.
[definition: Pure Brun Sieve Remainder]
Assume that $|A_d|=\omega(d)X/d+r_d$ for squarefree $d$. The order-$m$ pure Brun remainder is
\begin{align*}
R_m(A,z)=\sum_{\substack{d\mid P(z),\, \nu(d)\le m}}\mu(d)r_d.
\end{align*}
[/definition]
This remainder is manageable when $m$ is fixed or grows slowly, because the number of included divisors is polynomial in $\pi(z)$ for fixed $m$. The definition is still only bookkeeping: it does not say that cancellation occurs, and it does not decide how large the truncated density sum is. We now isolate the simplest upper-bound consequence, where all loss from the remainders is paid for by counting how many truncated divisor terms occur.
[quotetheorem:7165]
This statement is intentionally rough, but it shows the mechanism: a bounded number of intersections gives a bound with a controlled combinatorial error. The fixed-$k$ hypothesis is part of the estimate, not a harmless decoration. If $k$ is allowed to grow with $z$, the number of retained divisors is no longer bounded by a constant multiple of $(\pi(z))^{2k}$ with a fixed implicit constant, and the combinatorial error can overwhelm the intended saving. The multiplicativity of $\omega$ is needed to interpret the truncated density sum as the first layers of a product; with $\omega(p)=\omega(q)=1$ but $\omega(pq)=0$, the two-prime contribution has the wrong sign and size compared with the model product. The assumption $|r_d|\le 1$ is a model for sequences counted by residue classes in intervals; without it the conclusion can lose all force. For instance, if the remainders for all $d$ with $\nu(d)\le 2k$ were allowed to have the same positive size $X$, then the truncated remainder would be of order $X(\pi(z))^{2k}$, larger than the expected sieve main term in the ranges where the sieve is useful.
The condition $\omega(p)<p$ says that at least one residue class modulo each sieving prime survives. If $\omega(p)=p$ for some $p<z$, the local factor $1-\omega(p)/p$ vanishes, and if $\omega(p)>p$, the local factor becomes negative, so the density model no longer describes a nonnegative survival proportion. The theorem also gives only a truncated main term, not a full Euler product estimate; extra analytic work is needed to compare the finite alternating sum with the product and to choose $k$ as a function of the sieve level. To obtain prime-gap applications, one chooses $z$ as a small power of $x$ and then optimises $k$ slowly with $x$.
[example: Two-Level Truncation]
Taking $k=1$ in the *Brun [Bonferroni inequalities](/theorems/1111)* gives the even-depth upper bound
\begin{align*}
S(A,P,z)\le S_2(A,P,z).
\end{align*}
By the definition of the truncated Brun sum,
\begin{align*}
S_2(A,P,z)=\sum_{\substack{d\mid P(z),\,\nu(d)\le 2}}\mu(d)|A_d|.
\end{align*}
The divisor $d=1$ contributes $\mu(1)|A_1|=|A|$, because $A_1=\{a\in A:1\mid a\}=A$. The divisors with $\nu(d)=1$ are exactly the primes $p<z$, and each has $\mu(p)=-1$, so their total contribution is
\begin{align*}
-\sum_{p<z}|A_p|.
\end{align*}
The divisors with $\nu(d)=2$ are exactly the products $pq$ with distinct primes $p<q<z$, and each has $\mu(pq)=(-1)^2=1$, so their total contribution is
\begin{align*}
\sum_{p<q<z}|A_{pq}|.
\end{align*}
Combining the three layers gives
\begin{align*}
S(A,P,z)\le |A|-\sum_{p<z}|A_p|+\sum_{p<q<z}|A_{pq}|.
\end{align*}
Thus the first layer subtracts all elements hit by one prime obstruction, the second layer restores the elements subtracted twice because they are hit by two distinct prime obstructions, and the even truncation guarantees that ignoring the remaining higher intersections still leaves an upper bound.
[/example]
## Twin Primes and the Singular Series
How does the abstract sifting set-up detect prime pairs? A pair $n,n+2$ can be prime only when the product $n(n+2)$ has no small prime divisor. Thus Brun applies the sieve not to the primes directly, but to the sequence of products whose forbidden congruence classes encode the two prime conditions.
[definition: Twin-Prime Sifting Sequence]
Define the family
\begin{align*}
A:[1,\infty)\to \{B\subset \mathbb Z:B\text{ finite}\}
\end{align*}
by
\begin{align*}
A(x)=\{n(n+2):1\le n\le x\}.
\end{align*}
The associated sifted count is
\begin{align*}
S(A(x),P,z)=|\{1\le n\le x:\gcd(n(n+2),P(z))=1\}|.
\end{align*}
[/definition]
If both $n$ and $n+2$ are prime and $n>z$, then $n(n+2)$ is counted by the sifted set. Conversely, sifted elements may include pairs in which one or both entries are composite with all prime factors at least $z$.
[example: Local Density For Twin Primes]
For the twin-prime sifting sequence, a prime $p$ deletes those $n$ for which
\begin{align*}
p\mid n(n+2).
\end{align*}
Since $p$ is prime, this divisibility is equivalent to
\begin{align*}
n\equiv 0\pmod p \quad\text{or}\quad n+2\equiv 0\pmod p.
\end{align*}
The second congruence is the same as
\begin{align*}
n\equiv -2\pmod p.
\end{align*}
If $p>2$, then the two residue classes $0\pmod p$ and $-2\pmod p$ are distinct, because equality would give $2\equiv 0\pmod p$, which is impossible for an odd prime. Thus exactly two residue classes modulo $p$ are forbidden, so
\begin{align*}
\omega(p)=2 \quad (p>2).
\end{align*}
For $p=2$, the classes coincide because
\begin{align*}
-2\equiv 0\pmod 2,
\end{align*}
so only one residue class modulo $2$ is forbidden and
\begin{align*}
\omega(2)=1.
\end{align*}
Therefore the prime-by-prime local survival factors are
\begin{align*}
1-\frac{\omega(2)}{2}=1-\frac{1}{2}
\end{align*}
and, for each odd prime $p<z$,
\begin{align*}
1-\frac{\omega(p)}{p}=1-\frac{2}{p}.
\end{align*}
Multiplying these factors over all primes below $z$ gives
\begin{align*}
\left(1-\frac{1}{2}\right)\prod_{2<p<z}\left(1-\frac{2}{p}\right).
\end{align*}
This product records the local obstruction pattern for twin primes: one class is removed modulo $2$, while two classes are removed modulo every odd sieving prime.
[/example]
The local-density example shows that the twin-prime product decays like a constant times $(\log z)^{-2}$, but that constant contains arithmetic information not present in two independent random prime conditions. A naive independence model would remove two unrelated classes modulo every prime, including modulo $2$, and would miss the fact that the two forbidden classes $0$ and $-2$ coincide when $p=2$. That failure at the smallest prime changes the global constant rather than the power of $\log z$. This motivates the following definition, which names the correction factor obtained by comparing the true local survival probabilities with the square of the ordinary prime survival probability.
[definition: Twin-Prime Singular Series]
The twin-prime singular series is
\begin{align*}
\mathfrak{S}_2=2\prod_{p>2}\frac{1-2/p}{(1-1/p)^2}.
\end{align*}
[/definition]
The factor $2$ accounts for the obstruction modulo $2$: among two adjacent odd/even positions, a prime pair of gap $2$ must have both entries odd except for the exceptional pair $(3,5)$. For odd primes the quotient compares the true local survival probability with the model of two independent prime conditions.
The final twin-prime estimate needs a quantitative form of Brun's pure sieve, stronger than the crude fixed-depth bound above. The following theorem is quoted here as a standard dimension-$2$ output of the pure sieve; Chapter 4 will show how Selberg's upper-bound sieve reaches the same twin-prime scale with sharper logarithmic loss.
[quotetheorem:7166]
The restrictions in this quoted theorem are part of the statement. The upper range $u\le c\log\log x$ keeps the sieve level high enough for the logarithmic saving while leaving enough averaging room to absorb the residue-class errors. The dimension-$2$ local data are also fixed: changing the number of forbidden classes changes the product and therefore changes the final power of $\log z$.
This quoted result is not yet a theorem about twin primes, because it bounds the sifted count $S(A(x),P,z)$ rather than the counting function for prime pairs. The remaining step is to connect a genuine pair $p,p+2$ with survival in the sifted set, choose $z$ in the admissible range, and evaluate the local product by Mertens' theorem. This conversion is needed: if $z$ were chosen outside the range of the pure sieve, the sifted set could contain too many composite almost-prime pairs, while if the sequence had dimension $1$ local data such as $\omega(p)=1$, the same argument would lose the second logarithmic saving. The next theorem performs this conversion and records the first quantitative consequence of Brun's pure sieve for the twin-prime problem.
[quotetheorem:7167]
The definition of $\pi_2(x)$ counts first components $p\le x$, while the sifting sequence above uses $1\le n\le x$ and tests $n(n+2)$. These differ only by the harmless endpoint convention that $p+2$ may exceed $x$; changing to $p+2\le x$ alters the count by at most $O(1)$ after shifting $x$ by $2$.
This theorem does not prove that infinitely many twin primes exist. It is an upper bound only: the sifted set still contains almost-prime pairs such as $n$ and $n+2$ both composite but with no small prime factors. The proof also relies on the finite interval $1\le n\le x$; without this cutoff the congruence counts $|A_d|$ would not be finite. The restriction $z=x^{1/u}$ with $u$ in the quoted range is needed because taking $z$ too small leaves too many composites in the sifted set, while taking $z$ too large makes the pure-sieve remainder uncontrollable by this theorem.
The special local-density hypotheses are necessary for the exponent $2$ in the logarithm. If the sequence were $A=\{n:1\le n\le x\}$ with the single obstruction $p\mid n$, then $\omega(p)=1$ and the same product has size comparable to $1/\log z$, giving dimension-$1$ behaviour rather than a twin-prime bound. If instead the two residue classes failed to be distinct for many odd primes, as happens for a fixed gap $h$ at primes $p\mid h$, the local density would no longer be $\omega(p)=2$ at those primes and the singular series normalization would change. The residue estimate $|A_d|=\omega(d)x/d+O(\omega(d))$ is also essential: if the errors were as large as $x/d$ with the same sign over many retained $d$, the quoted pure-sieve theorem would not apply and the logarithmic upper bound would not follow. Its significance is that it improves the naive order $x/\log x$ to almost the conjectural order $x/(\log x)^2$, up to a slowly growing factor.
[example: Reading The Upper Bound]
By the *[Prime Number Theorem](/theorems/1742)*, the number of possible first components $p\le x$ is heuristically of size $x/\log x$. The [Brun Upper Bound For Twin Primes] theorem gives a constant $C>0$ such that
\begin{align*}
\pi_2(x)\le C\frac{x(\log\log x)^2}{(\log x)^2}.
\end{align*}
Rewrite the right-hand side by separating the prime-counting scale:
\begin{align*}
C\frac{x(\log\log x)^2}{(\log x)^2}=C\left(\frac{x}{\log x}\right)\left(\frac{(\log\log x)^2}{\log x}\right).
\end{align*}
Thus, relative to the $x/\log x$ scale for choosing $p$ prime, Brun's theorem saves the additional factor
\begin{align*}
\frac{(\log\log x)^2}{\log x}.
\end{align*}
The conjectural twin-prime scale would replace this factor by a constant multiple of $1/\log x$, so the extra multiplier $(\log\log x)^2$ is the cost of using pure truncated inclusion-exclusion rather than a predicted feature of the primes.
[/example]
The same upper bound is strong enough to prove Brun's theorem on reciprocal twin primes. A count of order $x/\log x$, like the primes themselves, would not suffice for convergence by the usual [integral test](/theorems/196); the extra factor of roughly $1/\log x$ is exactly what changes the behaviour of reciprocal sums. This is a qualitative result: even if there are infinitely many twin primes, they form a sparse enough set that their reciprocal sum converges.
[quotetheorem:7168]
[citeproof:7168]
This result is historically striking because it separates twin primes from the full set of primes, whose reciprocal sum diverges. The proof uses positivity only through the counting function $\pi_2(x)$ and partial summation; without Brun's logarithmic saving over $x/\log x$, the final integral would diverge. This dependence is sharp at the level of the argument: a hypothetical set with counting function comparable to $x/\log x$ would have reciprocal sum comparable to $\int dt/(t\log t)$, which diverges, while Brun's bound replaces this by an integrable majorant. The theorem also requires summing over genuine positive terms; cancellation in a signed series would not imply sparsity of the underlying set. The theorem still does not decide whether the set of twin primes is finite or infinite, because a finite sum and a convergent infinite sum are both compatible with the conclusion. Brun's method therefore proves a real sparsity theorem for prime pairs without resolving whether infinitely many such pairs exist.
## What Brun's Pure Sieve Achieves
What remains after replacing exact counting by one-sided inequalities? The chapter began with Eratosthenes as exact deletion and ended with Brun's replacement for exact deletion: bounded-depth alternating sums. This transition is the first major theme of sieve theory and also a general lesson in combinatorics and probability: local exclusion rules may be exact, but useful global estimates often come from controlled approximations to inclusion-exclusion. Brun's truncation sacrifices equality to obtain upper and lower bounds with feasible error terms.
[remark: Dimension Two Behaviour]
For the twin-prime sequence, the local density satisfies $\omega(p)=2$ for odd primes. This is why the main term behaves like a dimension $2$ sieve and why the conjectural order is $x/(\log x)^2$ rather than $x/\log x$.
[/remark]
Chapter 3 replaces the pure alternating truncation by abstract upper and lower combinatorial weights, and Chapter 4 then introduces Selberg's quadratic upper-bound weights. These weights retain the one-sided nature of Brun's inequalities while producing sharper constants, better ranges of uniformity, and more flexible applications to almost-primes.
Chapter 2 showed how truncating inclusion-exclusion already produces meaningful upper and lower estimates, but at the cost of crude constants and limited flexibility. The next chapter abstracts that pattern into the combinatorial sieve, where weights and the notion of sieve dimension separate the formal structure from any particular example.
# 3. The Combinatorial Sieve and Dimension One
Chapters 1 and 2 treated sifting as exact and then truncated inclusion-exclusion: choose a set of primes, delete the residue classes they forbid, and use the Bonferroni signs to retain one-sided information after truncation. This chapter abstracts that process into the combinatorial sieve. The point is to separate the arithmetic input, recorded in local densities and remainder terms, from the universal combinatorics of the weights. In dimension one this abstraction already recovers Brun-type upper and lower bounds and gives the first form of the fundamental lemma of sieve theory.
## The Sieve Axioms
What information about a sequence is needed before a sieve can be run? The exact elements of the sequence matter less than the number of elements satisfying each finite set of congruence restrictions. The sieve axioms package this information into a main term, controlled by a multiplicative density function, and an error term.
To allow multiplicities, we now write $\mathcal A$ for the finite sequence previously denoted by $A$. Let $\mathcal A$ be a finite sequence of integers, or more generally a finite multiset of integers, with total size parameter $X$. Let $\mathcal P$ be a set of primes, let
\begin{align*}
P(z) = \prod_{p < z,\ p \in \mathcal P} p,
\end{align*}
and write
\begin{align*}
S(\mathcal A, \mathcal P, z) = \#\{a \in \mathcal A : \gcd(a, P(z)) = 1\}.
\end{align*}
The quantity $S(\mathcal A, \mathcal P, z)$ is the number of elements of $\mathcal A$ which survive after all prime divisors $p < z$ from $\mathcal P$ have been sieved out. To evaluate it by inclusion-exclusion, the first required data are the counts satisfying simultaneous divisibility conditions.
[definition: Sieve Congruence Counts]
For each squarefree positive integer $d$ with prime factors in $\mathcal P$, define
\begin{align*}
\mathcal A_d = \{a \in \mathcal A : d \mid a\},
\end{align*}
and
\begin{align*}
A_d = |\mathcal A_d|.
\end{align*}
[/definition]
These counts are the data seen by inclusion-exclusion. If $d$ is a product of sieving primes, then $A_d$ measures how many elements are removed by all congruence conditions attached to the primes dividing $d$. The next issue is to replace each $A_d$ by a predictable main term plus a remainder that can later be controlled on average.
[definition: Sieve Axioms]
Let $\mathcal D(\mathcal P)$ be the set of squarefree positive integers whose prime factors all lie in $\mathcal P$. A sequence $\mathcal A$, a prime set $\mathcal P$, and a parameter $X > 0$ satisfy the sieve axioms with local density
\begin{align*}
\omega: \mathcal D(\mathcal P) \to \mathbb R
\end{align*}
and remainders $(r_d)_{d \in \mathcal D(\mathcal P)}$ if, for every $d \in \mathcal D(\mathcal P)$,
\begin{align*}
A_d = \frac{\omega(d)}{d}X + r_d,
\end{align*}
where $\omega$ is multiplicative on $\mathcal D(\mathcal P)$ and $0 \le \omega(p) < p$ for all $p \in \mathcal P$.
[/definition]
The factor
\begin{align*}
\frac{\omega(d)}{d}
\end{align*}
is the predicted probability that an element of $\mathcal A$ lies in all residue classes removed by primes dividing $d$. Multiplicativity expresses independence of local congruence restrictions at distinct primes, while $r_d$ measures the failure of perfect equidistribution. Once these local probabilities are known, the next quantity to estimate is the product of the probabilities of surviving each prime sieve.
[definition: Sifting Density Product]
The sifting density product is the function
\begin{align*}
V: [2,\infty) \to \mathbb R_{>0}
\end{align*}
defined by
\begin{align*}
z \mapsto V(z) = \prod_{p < z,\ p \in \mathcal P} \left(1 - \frac{\omega(p)}{p}\right).
\end{align*}
[/definition]
The product $V(z)$ is the expected surviving density after removing the forbidden classes modulo all primes below $z$. In dimension one it behaves like a constant multiple of
\begin{align*}
\frac{1}{\log z},
\end{align*}
so a sieve bound of size $X V(z)$ has the scale expected for prime-like objects. We therefore need a definition that records when the product has exactly this logarithmic decay.
[definition: Dimension One Sieve]
A sieve satisfying the above axioms has dimension one if there are constants $C_1, C_2 > 0$ such that for all $2 \le w < z$,
\begin{align*}
C_1\frac{\log w}{\log z}
\le
\prod_{w \le p < z}\left(1 - \frac{\omega(p)}{p}\right)
\le
C_2\frac{\log w}{\log z}.
\end{align*}
[/definition]
This condition says that the accumulated loss from sieving primes between $w$ and $z$ matches the loss from deleting one residue class modulo each prime. The model case is $\omega(p)=1$, where Mertens' theorem gives
\begin{align*}
V(z) \asymp \frac{1}{\log z}.
\end{align*}
[example: Rough Integers]
Let $\mathcal A=\{1,2,\dots,\lfloor x\rfloor\}$, let $\mathcal P$ be the set of all primes, and take $X=x$. If $d$ is squarefree, then $\mathcal A_d$ is the set of multiples of $d$ among the integers $1,\dots,\lfloor x\rfloor$. These multiples are exactly $d,2d,\dots,md$, where $m$ is the largest integer with $md\le x$, so
\begin{align*}
A_d=\left\lfloor \frac{x}{d}\right\rfloor .
\end{align*}
For every real number $y$, $\lfloor y\rfloor\le y<\lfloor y\rfloor+1$, hence $-1<\lfloor y\rfloor-y\le 0$. Applying this with $y=x/d$ gives
\begin{align*}
-1<\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d}\le 0.
\end{align*}
Thus
\begin{align*}
A_d=\frac{x}{d}+r_d
\end{align*}
with
\begin{align*}
r_d=\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d}
\end{align*}
and $|r_d|<1$. Therefore the sieve axioms hold with $\omega(d)=1$ for every squarefree $d$: this function is multiplicative on squarefree integers, and for every prime $p$ one has $0\le \omega(p)=1<p$.
The sifted quantity is
\begin{align*}
S(\mathcal A,\mathcal P,z)=\#\{1\le n\le \lfloor x\rfloor:\gcd(n,P(z))=1\}.
\end{align*}
Since $P(z)$ is the product of all primes $p<z$, the condition $\gcd(n,P(z))=1$ means exactly that no prime $p<z$ divides $n$. In this model case,
\begin{align*}
V(z)=\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
By *Mertens' theorem*, this product has order $1/\log z$, so the expected sieve scale is
\begin{align*}
xV(z)\asymp \frac{x}{\log z}.
\end{align*}
Thus rough integers in this range are predicted to occur with density comparable to $1/\log z$, as long as the chosen sieve level is strong enough to keep the accumulated remainders under control.
[/example]
The example has very small individual remainders, but a sieve uses many values of $d$ at once. For this reason the relevant hypothesis is not merely a pointwise estimate for $r_d$. The next definition measures the aggregate remainder across the values of $d$ that the sieve weights will use.
[definition: Level Of Distribution]
For a fixed sieving parameter $z \ge 2$, the total remainder function is
\begin{align*}
R_z: [1,\infty) \to \mathbb R_{\ge 0}
\end{align*}
defined by
\begin{align*}
D \mapsto R_z(D)= \sum_{d < D,\ d \mid P(z)} |r_d|.
\end{align*}
A number $D \ge 1$ is a level of distribution with tolerance $\eta \ge 0$ for this sieve problem at $z$ if
\begin{align*}
R_z(D) \le \eta X V(z).
\end{align*}
[/definition]
The tolerance $\eta$ depends on the theorem being applied: an upper bound may allow $\eta=O(1)$, while an asymptotic formula requires $\eta=o(1)$ along the limiting range. In the rest of the chapter we write $R(D)$ for $R_z(D)$ when $z$ is fixed by context. The combinatorial sieve will give bounds whose error term contains $R(D)$, so the arithmetic problem decides how large $D$ may be.
## Upper and Lower Combinatorial Weights
How can inclusion-exclusion be truncated without losing the inequality in the wrong direction? The answer is to replace the full Möbius coefficients by finite collections of weights which majorise or minorise the condition $(n,P(z))=1$. These weights are universal: after they are constructed, the same weights apply to any sequence satisfying the sieve axioms.
For any integer $n$, exact inclusion-exclusion gives
\begin{align*}
\mathbb{1}_{\gcd(n,P(z))=1}
=
\sum_{d \mid \gcd(n,P(z))}\mu(d).
\end{align*}
A truncated sum cannot usually be equal to the indicator, so the goal is to construct coefficients $\lambda_d^+$ and $\lambda_d^-$ supported on $d < D$ such that they bound this divisor sum from above and below.
[definition: Upper And Lower Sieve Weights]
Let $D,z \ge 2$. Upper and lower sieve weights of level $D$ are [real numbers](/page/Real%20Numbers) $(\lambda_d^+)_{d \mid P(z)}$ and $(\lambda_d^-)_{d \mid P(z)}$ such that $\lambda_1^+ = \lambda_1^- = 1$, $\lambda_d^+ = \lambda_d^- = 0$ for $d \ge D$, and for every positive integer $n$,
\begin{align*}
\sum_{d \mid \gcd(n,P(z))}\lambda_d^-
\le
\mathbb{1}_{\gcd(n,P(z))=1}
\le
\sum_{d \mid \gcd(n,P(z))}\lambda_d^+.
\end{align*}
[/definition]
The inequalities turn the sifting problem into two divisor sums. After summing over $a \in \mathcal A$, the divisor condition $d \,|\, a$ produces $A_d$, so the sieve axioms become directly usable.
[quotetheorem:7169]
[citeproof:7169]
The theorem reduces the sieve to two tasks: construct weights with good main terms, and prove that the remainders are small up to the chosen level. Each hypothesis is doing visible work. If the pointwise weight inequalities fail, summing them over $\mathcal A$ no longer gives a valid upper or lower bound for $S(\mathcal A,\mathcal P,z)$; a single integer $n$ for which the proposed upper weight sum is smaller than $\mathbb{1}_{\gcd(n,P(z))=1}$ would already invalidate the upper estimate. If the support is not restricted to $d<D$, then the error term would involve remainders beyond the range where the arithmetic problem has supplied control. The theorem also does not estimate the weighted main sums themselves, so without a separate analysis of the chosen weights it gives only a formal reduction, not a numerical sieve bound.
To compare different choices of $D$ and $z$, the next definition records the single scale parameter on which the universal main terms depend. The level $D$ says how far inclusion-exclusion may be trusted, while $z$ says which primes have been removed. Their logarithmic ratio is the natural scale because a divisor made from primes below $z$ has size below $D$ only when the total logarithmic mass of its prime factors is below $\log D$.
[definition: Sieve Parameter]
The sieve parameter is the function
\begin{align*}
s: (1,\infty)^2 \to \mathbb R_{>0}
\end{align*}
defined by
\begin{align*}
(D,z) \mapsto s(D,z)=\frac{\log D}{\log z}.
\end{align*}
[/definition]
The value $s(D,z)$ measures how far the weights can see relative to the primes being sieved. The identity
\begin{align*}
D=z^{s(D,z)}
\end{align*}
shows that larger $s(D,z)$ means deeper inclusion-exclusion and a better approximation to the full product $V(z)$. This motivates the next theorem: Brun's estimates quantify how much information upper and lower weights recover from a given value of $s(D,z)$.
[quotetheorem:7170]
[citeproof:7170]
This statement is the practical form of Brun's pure sieve, but its hypotheses mark real limitations. The dimension-one product estimate is essential: if $\omega(p)=2$ for most sieving primes, then $V(z)$ has order $(\log z)^{-2}$ rather than $(\log z)^{-1}$, so the functions $F$ and $f$ for dimension one give the wrong scale. The level restriction is also essential: if $R(D)$ is comparable with $X V(z)$, the error term can swamp the main term even when the combinatorial weights are optimal. Thus Brun's theorem supplies the universal sieve factor once the arithmetic remainders are controlled; it does not by itself prove a distribution estimate or distinguish primes from products of two large primes.
[example: Linear Forms]
Let $L(n)=an+b$ with $a,b\in\mathbb Z$, $a\ge 1$, and $\gcd(a,b)=1$, and let $\mathcal A=\{L(n):1\le n\le x\}$. If $p\nmid a$, then $a$ has an inverse modulo $p$, so
\begin{align*}
p\mid an+b \Longleftrightarrow an\equiv -b \pmod p \Longleftrightarrow n\equiv -b a^{-1}\pmod p.
\end{align*}
Thus exactly one residue class modulo $p$ is removed. If $p\mid a$, then $\gcd(a,b)=1$ gives $p\nmid b$, and hence
\begin{align*}
an+b\equiv b\not\equiv 0\pmod p.
\end{align*}
So no residue class is removed for primes dividing $a$. Therefore
\begin{align*}
\omega(p)=1 \text{ if } p\nmid a,\qquad \omega(p)=0 \text{ if } p\mid a.
\end{align*}
For squarefree $d$, multiplicativity gives
\begin{align*}
\omega(d)=1_{\gcd(d,a)=1}.
\end{align*}
If $\gcd(d,a)=1$, then $a$ is invertible modulo $d$, and the divisibility condition is the single congruence
\begin{align*}
d\mid an+b \Longleftrightarrow n\equiv -b a^{-1}\pmod d.
\end{align*}
Among $1\le n\le x$, one residue class modulo $d$ contributes
\begin{align*}
A_d=\frac{x}{d}+O(1).
\end{align*}
If $\gcd(d,a)>1$, choose a prime $q\mid \gcd(d,a)$. Then $q\mid a$ and $q\nmid b$, so $an+b\equiv b\not\equiv 0\pmod q$, which makes $d\mid an+b$ impossible. Hence $A_d=0$ in this case. Thus the sieve axioms hold with
\begin{align*}
A_d=\frac{\omega(d)}{d}x+r_d
\end{align*}
and remainders bounded uniformly by $O(1)$.
The sifted set is
\begin{align*}
S(\mathcal A,\mathcal P,z)=\#\{1\le n\le x:\gcd(an+b,P(z))=1\}.
\end{align*}
Its density product is
\begin{align*}
V(z)=\prod_{p<z,\ p\nmid a}\left(1-\frac{1}{p}\right).
\end{align*}
For fixed $a$, the missing primes $p\mid a$ change the full Mertens product by only a fixed non-zero factor, so *Mertens' theorem* gives
\begin{align*}
V(z)\asymp \frac{1}{\log z}.
\end{align*}
With $D$ below the available distribution level, the dimension-one upper sieve therefore bounds the number of values $an+b$ with no allowed prime factor below $z$ on the prime-like scale
\begin{align*}
xV(z)\asymp \frac{x}{\log z}.
\end{align*}
[/example]
The linear-form example is the prototype for detecting almost primes in arithmetic progressions. The sieve does not prove that many values are prime, but it can prove that many values have no small prime factor, and hence are products of few large primes once $z$ is chosen as a power of $x$.
## Rosser-Iwaniec Weights In Dimension One
Can the Brun weights be made both explicit and efficient enough for later applications? The Rosser-Iwaniec construction gives dimension-one weights with strong uniformity and with coefficients bounded in magnitude by $1$. This boundedness is valuable because it prevents the remainder term from being amplified by large coefficients.
[definition: Rosser-Iwaniec Weights]
Rosser-Iwaniec upper and lower weights of level $D$ are sieve weights $\lambda_d^+$ and $\lambda_d^-$ supported on squarefree divisors $d$ of $P(z)$ with $d<D$, satisfying
\begin{align*}
|\lambda_d^+| \le 1,
\end{align*}
and
\begin{align*}
|\lambda_d^-| \le 1,
\end{align*}
and the upper and lower pointwise sieve inequalities.
[/definition]
The construction orders the prime factors of $d$ and admits only products satisfying alternating size restrictions. The detailed recursive definition is technical, but the only properties needed in the first pass are support below $D$, bounded coefficients, and the quality of the main term in dimension one.
[quotetheorem:7171]
These weights are the standard dimension-one input for later chapters because they give a clean interface: prove a level of distribution, choose $D$, compute $s$, and insert the universal functions $F$ and $f$. The assumptions cannot be removed harmlessly. If the sieve has dimension two rather than dimension one, for example when $\omega(p)=2$ for all sufficiently large primes outside a finite exceptional set, Mertens' theorem gives
\begin{align*}
\prod_{p<z}\left(1-\frac{2}{p}\right)\asymp \frac{1}{(\log z)^2},
\end{align*}
after deleting the finitely many primes for which the local factor is not positive. The same weighted sums are then governed by dimension-two sieve functions, so the displayed dimension-one main terms have the wrong size. If the weights were allowed to use divisors beyond $D$, the main term might improve formally but the accompanying remainder would require information not available from a level-$D$ distribution estimate; a sequence such as $\mathcal A=\{n\le x:n\equiv 0\pmod q\}$ with $q>D$ can look harmless to divisors below $D$ while behaving very differently at the missing modulus. If the coefficients were unbounded, an error contribution of the form $\sum |\lambda_d r_d|$ could be much larger than $R(D)$; taking many coefficients of size $M$ would magnify even remainders of size $1$ by the same factor $M$. The theorem therefore does not say that every dimension-one sieve has an asymptotic formula, nor that deeper unsupported weights are free. Its role in Chapter 5 is narrower and more useful: once Bombieri-Vinogradov from Chapter 7, or a comparable distribution theorem, supplies a level $D$, the Rosser-Iwaniec bounds convert that level into almost-prime estimates with no further redesign of the sieve weights.
[remark: The Parity Barrier]
The lower-bound threshold $s>2$ reflects the parity obstruction in elementary sieve methods. A sieve based only on divisibility by small primes has difficulty distinguishing primes from products of two primes when both have comparable size. This is why lower-bound sieves usually produce almost primes rather than primes without extra analytic input.
[/remark]
The parity barrier will reappear in applications to twin-prime-type problems. It is not a defect of a particular set of weights, but a structural limitation of what the local divisibility data can detect.
## The Fundamental Lemma In Dimension One
How close can the sieve weights get to exact inclusion-exclusion when the level is much larger than the sieving range? The fundamental lemma answers this by showing that, in dimension one, the upper and lower weighted main terms both approximate $V(z)$ very closely once
\begin{align*}
s=\frac{\log D}{\log z}
\end{align*}
is large. This is the theorem that turns combinatorial sieving into a flexible asymptotic tool.
The exponential error term is a stronger statement than the coarse Rosser-Iwaniec main-term estimate recorded above. It requires the usual linear-sieve regularity hypotheses, not only the two-sided order of magnitude
\begin{align*}
V(z)\asymp \frac{1}{\log z}.
\end{align*}
These hypotheses give uniform control of prime products in short ranges and keep the discretisation of the continuous sieve functions within the same exponential scale.
[quotetheorem:7172]
Combining the fundamental lemma with the [combinatorial sieve inequalities](/theorems/7169) gives the working estimate
\begin{align*}
S(\mathcal A,\mathcal P,z)
=
X V(z)\{1+O(e^{-s})\}+O(R(D))
\end{align*}
whenever both upper and lower estimates are available, $s$ tends to infinity in the permitted range, the regularity hypotheses of the fundamental lemma hold, and $R(D)$ is smaller than the main term. For fixed $s$ the theorem gives the sharper linear-sieve factors $F(s)$ and $f(s)$, not a relative $o(1)$ asymptotic. The condition $s\ge 2+\eta$ for the lower estimate is not cosmetic: it gives a non-zero lower sieve function away from the parity threshold, but the approximation $f(s)=1+O(e^{-s})$ is a large-$s$ statement. The remainder hypothesis is equally necessary. For instance, if the sequence is concentrated in residue classes that the proposed density $\omega$ treats as equidistributed, the main term may still have the formal shape $X V(z)$ while the accumulated errors $R(D)$ are of that same size and the sieve cannot force an asymptotic. This formula should therefore be read as a conditional asymptotic: the combinatorial part supplies the factor $V(z)$ only after the sieve has enough logarithmic depth, while the arithmetic problem supplies the admissible level $D$ and the bound for $R(D)$.
[example: Rough Numbers With A Power Level]
For $\mathcal A=\{1,\dots,\lfloor x\rfloor\}$, take $\mathcal P$ to be the set of all primes and $\omega(p)=1$. Let
\begin{align*}
D=x^\theta
\end{align*}
and
\begin{align*}
z=x^u
\end{align*}
with $0<u<\theta<1$. Since $\log D=\theta\log x$ and $\log z=u\log x$, the sieve parameter is
\begin{align*}
s=\frac{\log D}{\log z}=\frac{\theta\log x}{u\log x}=\frac{\theta}{u}.
\end{align*}
For squarefree $d$, the rough-integer computation gives
\begin{align*}
A_d=\left\lfloor \frac{x}{d}\right\rfloor=\frac{x}{d}+r_d
\end{align*}
with
\begin{align*}
r_d=\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d}.
\end{align*}
Because $-1<\lfloor y\rfloor-y\le 0$ for every real $y$, applying this to $y=x/d$ gives $|r_d|<1$. Therefore
\begin{align*}
R(D)=\sum_{d<D,\ d\mid P(z)}|r_d|\le \sum_{d<D,\ d\mid P(z)}1\le \sum_{1\le d<D}1\le D=x^\theta.
\end{align*}
The lower estimate in the fundamental lemma is available when $s>2$ away from the endpoint. Here
\begin{align*}
s=\frac{\theta}{u}>2
\end{align*}
is exactly the condition
\begin{align*}
u<\frac{\theta}{2}.
\end{align*}
Under this condition, the upper and lower sieve estimates from the *Fundamental Lemma Of The Dimension-One Sieve* give
\begin{align*}
S(\mathcal A,\mathcal P,z)=xV(z)\{1+O(e^{-s})\}+O(R(D)).
\end{align*}
Substituting $s=\theta/u$ and $R(D)\le x^\theta$ yields
\begin{align*}
S(\mathcal A,\mathcal P,z)=xV(z)\{1+O(e^{-\theta/u})\}+O(x^\theta).
\end{align*}
In the fixed-power choice $z=x^u$, the number $e^{-\theta/u}$ is independent of $x$, so this estimate is not a relative $o(1)$ asymptotic as $x\to\infty$. To force the sieve error from the fundamental lemma to vanish, choose $u=u(x)$ with $u(x)\to 0$ and $z=x^{u(x)}\to\infty$. Then
\begin{align*}
s=\frac{\log x^\theta}{\log x^{u(x)}}=\frac{\theta\log x}{u(x)\log x}=\frac{\theta}{u(x)}\to\infty,
\end{align*}
so
\begin{align*}
e^{-s}=e^{-\theta/u(x)}\to 0.
\end{align*}
If in addition
\begin{align*}
x^\theta=o(xV(z)),
\end{align*}
then the remainder contribution $O(x^\theta)$ is also $o(xV(z))$. Thus the sieve gives
\begin{align*}
S(\mathcal A,\mathcal P,z)=xV(z)\{1+o(1)\}.
\end{align*}
For $\omega(p)=1$,
\begin{align*}
V(z)=\prod_{p<z}\left(1-\frac{1}{p}\right),
\end{align*}
and *Mertens' theorem* gives $V(z)\asymp 1/\log z$. Hence the main term has the familiar rough-number scale
\begin{align*}
xV(z)\asymp \frac{x}{\log z}.
\end{align*}
[/example]
This example shows the tradeoff that governs the rest of the course. Raising $z$ removes more small prime factors and makes the sifted elements more prime-like, but it lowers $s$ unless the arithmetic input gives a correspondingly larger level $D$.
[example: Almost-Prime Values Of A Linear Form]
Let $L(n)=an+b$ with $\gcd(a,b)=1$ and $1\le n\le x$, and suppose the values under consideration are positive. Write $\Omega(m)$ for the number of prime factors of $m$, counted with multiplicity. Assume that $L(n)$ has no prime factor below
\begin{align*}
z=x^u
\end{align*}
with $u>0$, and that there is a constant $C>0$ such that
\begin{align*}
L(n)\le Cx
\end{align*}
for all $1\le n\le x$. If $\Omega(L(n))=k$, then every prime factor of $L(n)$ is at least $z$, so
\begin{align*}
L(n)\ge z^k.
\end{align*}
Combining this lower bound with $L(n)\le Cx$ gives
\begin{align*}
x^{uk}=z^k\le L(n)\le Cx.
\end{align*}
Taking logarithms gives
\begin{align*}
uk\log x\le \log C+\log x.
\end{align*}
Dividing by $u\log x>0$ gives
\begin{align*}
k\le \frac{1}{u}+\frac{\log C}{u\log x}.
\end{align*}
Since $k$ is an integer, for fixed $u$ and fixed $C$ this implies, once $x$ is large enough, that
\begin{align*}
\Omega(L(n))\le \left\lfloor \frac{1}{u}\right\rfloor.
\end{align*}
Now take a usable sieve level
\begin{align*}
D=x^\theta
\end{align*}
with $\theta>0$. For the lower-bound dimension-one sieve, the basic lower range requires the sieve parameter
\begin{align*}
s=\frac{\log D}{\log z}
\end{align*}
to be larger than $2$. Substituting $D=x^\theta$ and $z=x^u$ gives
\begin{align*}
s=\frac{\log x^\theta}{\log x^u}=\frac{\theta\log x}{u\log x}=\frac{\theta}{u}.
\end{align*}
Thus the condition $s>2$ is exactly
\begin{align*}
\frac{\theta}{u}>2,
\end{align*}
which is equivalent to
\begin{align*}
u<\frac{\theta}{2}.
\end{align*}
Under this inequality, the lower-bound sieve produces many values of $L(n)$ with no prime factor below $x^u$, and the preceding factor-count argument turns each such value into an $R$-almost-prime with
\begin{align*}
R=\left\lfloor \frac{1}{u}\right\rfloor
\end{align*}
for large $x$.
[/example]
The conclusion is weaker than primality, but it is often the natural output of a combinatorial sieve. To go beyond it, Chapters 6 and 7 add analytic information through the large sieve, Vaughan-type bilinear estimates, and Bombieri-Vinogradov distribution results.
## What This Chapter Provides For Later Sieves
The chapter has isolated the two independent components of a sieve argument. The universal component is the construction of upper and lower weights, culminating in the dimension-one fundamental lemma. The arithmetic component is the verification of the axioms and the estimation of $R(D)$ for the sequence being studied.
In applications the procedure is now standard. First identify the forbidden residue classes and compute $\omega(p)$. Next verify that the product $V(z)$ has dimension one. Then prove a level of distribution $D$ by estimating the remainders $r_d$ on average. Finally insert the resulting value of
\begin{align*}
s=\frac{\log D}{\log z}
\end{align*}
into the upper bound, lower bound, or fundamental lemma according to the desired conclusion.
The combinatorial sieve clarified how upper and lower bounds arise from weighted truncations, and how the parameter $s=\log D/\log z$ governs their strength. Selberg's upper-bound sieve now goes further by choosing the weights optimally, turning that abstract framework into a sharper quadratic minimization problem.
# 4. Selberg's Upper Bound Sieve
Chapter 3 developed the combinatorial sieve by abstracting the Brun truncation of Chapter 2 into upper and lower weights. Selberg's idea is different: instead of choosing a truncation pattern first, we build a non-negative majorant and then minimize its average size. This turns the upper-bound sieve into a finite quadratic optimization problem, where the arithmetic information enters through local densities and the analytic work is concentrated in estimating a main quadratic form.
## From Inclusion-Exclusion to Quadratic Weights
The first question is how to count elements avoiding small prime divisors without relying on alternating signs. Returning from the multiset notation $\mathcal A$ of Chapter 3 to the simpler symbol $A$, let $A$ be a finite set of integers or an arithmetic sequence, let $P(z)=\prod_{p<z}p$, and write
\begin{align*}
S(A,P,z)=|\{a\in A:(a,P(z))=1\}|.
\end{align*}
Inclusion-exclusion represents the condition $(a,P(z))=1$ using $\sum_{d\mid (a,P(z))}\mu(d)$, but truncating this sum destroys exact positivity. Selberg replaces the signed indicator by the square of a divisor sum.
[definition: Selberg Weight]
Let $D\ge 1$. A Selberg weight of level $D$ is a function
\begin{align*}
\lambda: \mathbb N \to \mathbb R, \qquad d\mapsto \lambda_d,
\end{align*}
such that $\lambda_1=1$ and $\lambda_d=0$ unless $d\mid P(z)$ and $d<D$.
[/definition]
The normalization $\lambda_1=1$ explains why these weights interact correctly with sifted elements: if $(a,P(z))=1$, then the only divisor $d\mid (a,P(z))$ with $d\mid P(z)$ is $d=1$. The remaining problem is pointwise rather than average: before any information about residue classes can be used, the weighted expression must dominate the actual sifted-set indicator for each individual element of $A$. Squaring the divisor sum is designed to make all unsifted contributions non-negative while forcing sifted elements to contribute exactly one, which is the majorant that Selberg's later optimization can safely sum.
[quotetheorem:7173]
[citeproof:7173]
This theorem contains no arithmetic estimate yet: it is a purely pointwise comparison between the sifted indicator and a non-negative square. The hypothesis $\lambda_1=1$ is essential because it forces the square to equal $1$ on every element with $(a,P(z))=1$; without it the majorant could miss sifted elements. The squaring is equally essential, since unsifted elements may have divisor sums of either sign, and positivity is what keeps their contribution harmless. The next step is to expand the square and use information about how often elements of $A$ lie in residue classes divisible by $d$.
[definition: Sifting Remainder]
Let
\begin{align*}
\mathcal D_z=\{d\in\mathbb N:d\mid P(z)\}.
\end{align*}
A sifting model consists of a size parameter $X>0$, a function
\begin{align*}
g:\mathcal D_z\to [0,1],
\end{align*}
which is multiplicative on squarefree divisors of $P(z)$, and a remainder function
\begin{align*}
r:\mathcal D_z\to \mathbb R, \qquad d\mapsto r_d,
\end{align*}
such that, for each $d\in\mathcal D_z$,
\begin{align*}
|\{a\in A:d\mid a\}|=Xg(d)+r_d.
\end{align*}
[/definition]
The density $g(d)$ records the expected proportion of elements divisible by $d$. In the standard applications $g(p)$ is small, and multiplicativity reflects the independence of congruence conditions modulo distinct primes.
[example: Integers With Small Prime Divisors Removed]
Let $A=\{1,\dots,x\}$, and for a squarefree integer $d$ write $A_d=|\{n\le x:d\mid n\}|$. The multiples of $d$ in this interval are $d,2d,\dots,\lfloor x/d\rfloor d$, hence
\begin{align*}
A_d=\left\lfloor \frac{x}{d}\right\rfloor .
\end{align*}
Writing the floor as its integer part plus its error gives
\begin{align*}
\left\lfloor \frac{x}{d}\right\rfloor=\frac{x}{d}+\left(\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d}\right).
\end{align*}
Since $-1<\lfloor x/d\rfloor-x/d\le 0$, this is a sifting model with
\begin{align*}
X=x,\qquad g(d)=\frac1d,\qquad r_d=\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d}=O(1).
\end{align*}
Therefore, when the square in the Selberg majorant is expanded, the main contribution from the pair of divisors $d,e$ is
\begin{align*}
xg([d,e])=\frac{x}{[d,e]}.
\end{align*}
Thus in the basic integer model the Selberg optimization is governed by the quadratic main term $\sum_{d,e<D}\lambda_d\lambda_e/[d,e]$, with only bounded floor-function errors left in the remainder.
[/example]
This example is the prototype for dimension $1$. Prime tuples, already visible in the twin-prime calculation of Chapter 2, require replacing $1/d$ by the proportion of forbidden residue classes modulo $d$.
## The Selberg Optimization Problem
The main problem is now finite-dimensional: among all admissible $\lambda_d$, which choice minimizes the main term of the majorant? For positive integers $d$ and $e$, write $[d,e]$ for their least common multiple. Expanding the square gives
\begin{align*}
\sum_{a\in A}\left(\sum_{d\mid (a,P(z))}\lambda_d\right)^2=\sum_{d,e<D}\lambda_d\lambda_e A_{[d,e]}.
\end{align*}
With $A_{[d,e]}=Xg([d,e])+r_{[d,e]}$, this becomes
\begin{align*}
X\sum_{d,e<D}\lambda_d\lambda_e g([d,e])+\sum_{d,e<D}\lambda_d\lambda_e r_{[d,e]}.
\end{align*}
The Selberg sieve chooses $\lambda_d$ to minimize the positive quadratic form
\begin{align*}
Q(\lambda)=\sum_{d,e<D}\lambda_d\lambda_e g([d,e])
\end{align*}
subject to $\lambda_1=1$.
[definition: Selberg Dual Coefficients]
Let
\begin{align*}
\mathcal D_z=\{d\in\mathbb N:d\mid P(z)\}.
\end{align*}
Assume $g:\mathcal D_z\to [0,1)$ is multiplicative. The Selberg dual coefficient function is
\begin{align*}
h:\mathcal D_z\to \mathbb R_{\ge 0}, \qquad d\mapsto h(d)=\prod_{p\mid d}\frac{g(p)}{1-g(p)}.
\end{align*}
[/definition]
The function $h$ is the reciprocal-density correction which appears after diagonalizing the quadratic form by Möbius inversion. After the square is expanded, the choice of the coefficients $\lambda_d$ is no longer a matter of sign but a finite quadratic minimization problem constrained by $\lambda_1=1$ and by the level $D$. The obstruction is that arbitrary weights may give a valid upper bound but a very poor one; to obtain the strongest Selberg bound, one must identify the unique density-corrected choice that minimizes the main quadratic form.
[quotetheorem:7174]
The exact formula is less important than the scale of $G(D,z)$, but the hypotheses explain why the formula is usable. The restriction $0\le g(p)<1$ makes each local factor $g(p)/(1-g(p))$ finite and non-negative; if $g(p)=1$, a congruence condition removes every residue class modulo $p$ and the sieve has no meaningful surviving local mass. Multiplicativity on squarefree divisors is what lets the divisor-lattice quadratic form diagonalize into the $h(m)$-sum; without it the minimization is still finite-dimensional but no longer has this Euler-product structure. The support condition $d<D$ is also not cosmetic: it keeps the optimized weights inside the range where the remainders $r_{[d,e]}$ can be controlled. This leads to the general upper-bound sieve, where the main term is explicit and every remaining difficulty is placed into a remainder sum.
[quotetheorem:7175]
[citeproof:7175]
The theorem separates the sieve into two independent tasks: compute or estimate $G(D,z)$, and prove that the remainder sum is smaller than the main term. The multiplicativity and local bounds on $g$ are needed for the optimized main term $X/G(D,z)$ to have the stated form; if the local model is wrong, the quadratic minimization optimizes the wrong average. The error term also shows the limitation of increasing the level $D$: larger $D$ usually improves $G(D,z)$, but it also introduces lcm moduli $[d,e]$ as large as about $D^2$, where residue-class estimates may no longer be available. Brun's sieve entangles these two issues through the combinatorics of alternating sums, while Selberg's method makes the tradeoff visible in a single formula.
## Dimension and the Size of the Main Term
The next question is how $G(D,z)$ behaves. The answer is controlled by sieve dimension: the cumulative local density over primes below $w$ should look like $\kappa\log\log w$.
[definition: Sieve Dimension]
A sifting model has dimension $\kappa>0$ if its local density function satisfies
\begin{align*}
\sum_{p<w} g(p)\log p=\kappa\log w+O(1)
\end{align*}
for $w\ge 2$, with the implicit constant independent of $w$.
[/definition]
This condition says that the product of local survival probabilities has order $(\log z)^{-\kappa}$. For a single integer avoiding divisibility by primes, $\kappa=1$; for a pair such as $n$ and $n+2$, there are usually two forbidden residue classes modulo $p$, so $\kappa=2$.
[example: Dimension of a Shifted Pair]
Fix a non-zero even integer $h$, and test the product $n(n+h)$ for small prime factors. For a prime $p$, the forbidden residue classes modulo $p$ are the solutions of
\begin{align*}
n(n+h)\equiv 0\pmod p.
\end{align*}
This congruence holds exactly when $n\equiv 0\pmod p$ or $n\equiv -h\pmod p$. If $p\nmid h$, then $0\not\equiv -h\pmod p$, so there are two forbidden classes and $g(p)=2/p$. If $p\mid h$, then $-h\equiv 0\pmod p$, so the two classes coincide and there is one forbidden class, giving $g(p)=1/p$.
Thus
\begin{align*}
\sum_{p<w}g(p)\log p=\sum_{\substack{p<w\\p\nmid h}}\frac{2\log p}{p}+\sum_{\substack{p<w\\p\mid h}}\frac{\log p}{p}.
\end{align*}
Rewrite the first sum by starting from all primes and subtracting the primes dividing $h$:
\begin{align*}
\sum_{p<w}g(p)\log p=2\sum_{p<w}\frac{\log p}{p}-\sum_{\substack{p<w\\p\mid h}}\frac{\log p}{p}.
\end{align*}
The standard prime harmonic estimate gives
\begin{align*}
\sum_{p<w}\frac{\log p}{p}=\log w+O(1).
\end{align*}
Since $h$ is fixed, the correction satisfies
\begin{align*}
0\le \sum_{\substack{p<w\\p\mid h}}\frac{\log p}{p}\le \sum_{p\mid h}\frac{\log p}{p}=O_h(1).
\end{align*}
Therefore
\begin{align*}
\sum_{p<w}g(p)\log p=2\log w+O_h(1).
\end{align*}
So this shifted-pair model has sieve dimension $2$; the primes dividing $h$ change only the bounded local correction, not the leading coefficient of $\log w$.
[/example]
The dimension example identifies the power of the logarithm that should appear in the final bound. What remains is to connect that local prediction to the actual Selberg inequality: the Euler product must have the predicted survival size, the optimized denominator $G(D,z)$ must grow with the same logarithmic dimension, and the lcm-remainder sum must stay smaller than the main term. Without these three checks, the formal dimension can be correct while the upper bound still has no asymptotic force.
[quotetheorem:7176]
[citeproof:7176]
The dimension hypotheses are doing two jobs. The estimate for the Euler product says that the expected local survival probability is of order $(\log z)^{-\kappa}$, while the lower bound for $G(D,z)$ says that the optimized Selberg weights actually capture this saving at level $D$. If $G(D,z)$ is smaller than $(\log D)^\kappa$, the main term deteriorates; if the accumulated remainder is comparable to $X$, the sieve has no power even when the formal local dimension is correct. Compared with Brun's sieve, this method gives a cleaner upper-bound mechanism and better constants in many settings. It does not by itself solve the parity problem: because the majorant is a square, it is designed for upper bounds and cannot distinguish primes from integers with an even number of prime factors at the level needed for lower bounds.
## Prime Tuples and Twin Prime Upper Bounds
The central application in this chapter is to sets where several linear forms are required to be prime. The sieve first counts integers for which none of the forms has a small prime factor, and then chooses $z$ so that actual prime tuples are included among the sifted objects.
[definition: Prime Tuple Local Density]
Let
\begin{align*}
L_i:\mathbb Z\to\mathbb Z, \qquad n\mapsto a_i n+b_i
\end{align*}
for $1\le i\le k$, with integer coefficients and no fixed prime divisor of $\prod_i L_i(n)$. The local-counting function is
\begin{align*}
\nu:\{p:p\text{ prime}\}\to\mathbb N_0,
\end{align*}
where $\nu(p)\in\{0,\dots,p\}$ is the number of residue classes $n\pmod p$ for which
\begin{align*}
\prod_{i=1}^k L_i(n)\equiv 0\pmod p.
\end{align*}
The Selberg local density is
\begin{align*}
g:\{p:p\text{ prime}\}\to [0,1], \qquad p\mapsto \frac{\nu(p)}{p}.
\end{align*}
[/definition]
The condition excluding a fixed prime divisor is essential. If some prime divides the product for every $n$, then the sifted set is empty for $z$ beyond that prime, and the singular series vanishes. The following theorem packages the dimension-$k$ Selberg sieve with these local corrections and gives the course's main prime-tuple upper bound.
[quotetheorem:7177]
[proofunderconstruction:7177]
This is the archetypal upper-bound theorem: it has the same logarithmic order and singular series as the expected Hardy-Littlewood asymptotic, but only in the upper-bound direction. The condition of no fixed prime divisor is necessary because a fixed local obstruction makes the singular series vanish and leaves at most exceptional prime values. Distinctness prevents two forms from representing the same congruence obstruction throughout, while the positivity assumption keeps the phrase "is prime" tied to ordinary positive primes on the whole counting range. Collisions of residue classes modulo finitely many primes do not destroy the estimate; they are exactly what the local factors $\nu(p)$ and the singular series record.
[example: Twin Prime Upper Bound]
Take $L_1(n)=n$ and $L_2(n)=n+2$, and count only $2\le n\le x$; the omitted endpoint $n=1$ contributes at most $1$ and is harmless in an upper bound. The forbidden residue classes modulo $p$ are the solutions of
\begin{align*}
n(n+2)\equiv 0\pmod p.
\end{align*}
This congruence is equivalent to $n\equiv 0\pmod p$ or $n\equiv -2\pmod p$. If $p>2$, then $0\not\equiv -2\pmod p$, so there are two forbidden classes and $\nu(p)=2$. If $p=2$, then $-2\equiv 0\pmod 2$, so the two classes coincide and $\nu(2)=1$.
For these two forms the singular series from *Selberg Bound for Prime K-Tuples* is
\begin{align*}
\mathfrak S(n,n+2)=\prod_p\left(1-\frac1p\right)^{-2}\left(1-\frac{\nu(p)}p\right).
\end{align*}
The factor at $p=2$ is
\begin{align*}
\left(1-\frac12\right)^{-2}\left(1-\frac12\right)=2.
\end{align*}
For $p>2$, using $\nu(p)=2$ gives
\begin{align*}
\left(1-\frac1p\right)^{-2}\left(1-\frac2p\right)=\frac{p^2}{(p-1)^2}\cdot \frac{p-2}{p}.
\end{align*}
Multiplying the fractions gives
\begin{align*}
\frac{p^2}{(p-1)^2}\cdot \frac{p-2}{p}=\frac{p(p-2)}{(p-1)^2}.
\end{align*}
Since $p(p-2)=p^2-2p$ and $(p-1)^2=p^2-2p+1$, this is
\begin{align*}
\frac{p(p-2)}{(p-1)^2}=1-\frac{1}{(p-1)^2}.
\end{align*}
Hence
\begin{align*}
\mathfrak S(n,n+2)=2\prod_{p>2}\left(1-\frac{1}{(p-1)^2}\right),
\end{align*}
which is the twin-prime singular series with the conventional leading factor $2$.
Applying *Selberg Bound for Prime K-Tuples* with $k=2$ gives
\begin{align*}
|\{n\le x:n\text{ and }n+2\text{ are prime}\}|\ll \mathfrak S(n,n+2)\frac{x}{(\log x)^2}+1.
\end{align*}
The singular series is a fixed positive constant for this pair of forms, and the endpoint $+1$ is absorbed into the same upper-bound scale for $x\ge 2$, so
\begin{align*}
|\{n\le x:n\text{ and }n+2\text{ are prime}\}|\ll \frac{x}{(\log x)^2}.
\end{align*}
Compared with the earlier Brun-style estimate $\ll x(\log\log x)^2/(\log x)^2$, the Selberg bound has the same denominator but no extra factor $(\log\log x)^2$ in the numerator.
[/example]
The course also uses the same argument for any fixed even gap. The local factor changes only at primes dividing the gap, which changes the singular series but not the logarithmic order.
[example: Prime Pairs With Gap H]
Fix an even integer $h\ne 0$, and consider the two linear forms $L_1(n)=n$ and $L_2(n)=n+h$. For a prime $p$, the forbidden residue classes are the solutions of
\begin{align*}
n(n+h)\equiv 0\pmod p.
\end{align*}
Since a product is $0$ modulo the prime $p$ exactly when one of its factors is $0$ modulo $p$, this is equivalent to
\begin{align*}
n\equiv 0\pmod p \quad \text{or}\quad n\equiv -h\pmod p.
\end{align*}
If $p\nmid h$, then $h\not\equiv 0\pmod p$, so $0\not\equiv -h\pmod p$ and there are two forbidden residue classes. If $p\mid h$, then $-h\equiv 0\pmod p$, so the two classes coincide and there is one forbidden class. Thus $\nu_h(p)=2$ for $p\nmid h$, while $\nu_h(p)=1$ for $p\mid h$.
The singular series attached to the pair $(n,n+h)$ is
\begin{align*}
\mathfrak S_h=\prod_p\left(1-\frac1p\right)^{-2}\left(1-\frac{\nu_h(p)}p\right).
\end{align*}
For primes $p\nmid h$, the local factor is
\begin{align*}
\left(1-\frac1p\right)^{-2}\left(1-\frac2p\right)=\frac{p^2}{(p-1)^2}\cdot\frac{p-2}{p}.
\end{align*}
Multiplying the fractions gives
\begin{align*}
\frac{p^2}{(p-1)^2}\cdot\frac{p-2}{p}=\frac{p(p-2)}{(p-1)^2}.
\end{align*}
Since $p(p-2)=p^2-2p$ and $(p-1)^2=p^2-2p+1$, this equals
\begin{align*}
\frac{p(p-2)}{(p-1)^2}=1-\frac{1}{(p-1)^2}.
\end{align*}
For primes $p\mid h$, the local factor is
\begin{align*}
\left(1-\frac1p\right)^{-2}\left(1-\frac1p\right)=\left(1-\frac1p\right)^{-1}.
\end{align*}
Equivalently,
\begin{align*}
\left(1-\frac1p\right)^{-1}=\frac{p}{p-1}.
\end{align*}
Only finitely many primes divide the fixed integer $h$, so these exceptional local factors change the singular series by a constant depending on $h$.
Applying *Selberg Bound for Prime K-Tuples* with $k=2$ gives
\begin{align*}
|\{n\le x:n,n+h\text{ prime}\}|\ll \mathfrak S_h\frac{x}{(\log x)^2}.
\end{align*}
Once $h$ is fixed, $\mathfrak S_h$ is a fixed positive constant depending only on $h$, and therefore
\begin{align*}
|\{n\le x:n,n+h\text{ prime}\}|\ll_h \frac{x}{(\log x)^2}.
\end{align*}
If $h$ is odd, then $n$ and $n+h$ have opposite parity. Two primes of opposite parity can both be prime only when one of them is $2$, so the only possible values are $n=2$ or $n=2-h$; this gives at most two values of $n$, independently of $x$.
[/example]
Selberg's upper bound sieve is therefore the first method in the course that naturally reaches the conjectural logarithmic order for prime-pair upper bounds. Its limitation is equally important: the positivity that makes the upper bound robust is also what prevents it from producing matching lower bounds for primes.
Selberg's sieve delivers the strongest general upper bounds in the one-dimensional setting, but its positivity also prevents it from proving matching lower bounds for primes. The next chapter specializes the same machinery to the linear sieve, where the goal shifts to controlling almost primes and extracting the best possible one-sided asymptotics.
# 5. The Linear Sieve and Almost Primes
Chapters 3 and 4 built the combinatorial and Selberg weighted sieve machinery needed to estimate sifted sets. This chapter specialises that machinery to the most important one-dimensional case, where the local density behaves like $1/p$ and the main term has size
\begin{align*}
\frac{X}{\log z}.
\end{align*}
The linear sieve gives nearly optimal upper and lower bounds for such problems, but it also exposes a structural obstruction: ordinary sieve weights see primes and products of two primes in almost the same way. The payoff is a robust theory of almost primes, which is often the strongest conclusion available from distributional information alone.
## The Dimension-One Sieve Problem
The guiding question is how well a sieve can count elements of a sequence after all prime divisors below a threshold have been removed. In dimension one, the expected loss at each prime $p$ is a factor close to $1 - 1/p$, so Mertens' theorem predicts a main scale $X / \log z$. The linear sieve asks for uniform upper and lower constants multiplying this scale, with the constants depending only on the ratio between the available level of distribution and the sifting range.
In this chapter we keep the symbol $A$ for the finite sequence or multiset, with total expected size $X$. For a squarefree integer $d$, write
\begin{align*}
A_d = \{a \in A : d \mid a\}, \qquad |A_d| = \frac{\omega(d)}{d}X + R_d,
\end{align*}
where $\omega$ is multiplicative on squarefree integers. The linear case is the case where $\omega(p)$ is close to $1$ on average over primes.
To isolate the primes used for sieving, set
\begin{align*}
P(z) = \prod_{p < z} p, \qquad S(A, z) = |\{a \in A : (a, P(z)) = 1\}|.
\end{align*}
The main term coming from the local densities is
\begin{align*}
V(z) = \prod_{p < z}\left(1 - \frac{\omega(p)}{p}\right).
\end{align*}
The target estimate is therefore $S(A,z) \approx X V(z)$, with an error controlled by the remainders $R_d$ and the level up to which they are known.
The next definition packages the hypotheses under which the linear sieve is usually stated. It separates the arithmetic input, contained in the distribution estimate for $A_d$, from the universal sieve constants. Without this separation, two different failures would be mixed together: the sequence might have the wrong local density, or it might have the right density but only up to a level too small for the desired sieve range.
[definition: Linear Sieve Data]
Let $A$ be a finite sequence, let $X > 0$, let $2 \le z \le D$, and let $\omega: \{d \in \mathbb N : d \text{ is squarefree}\} \to \mathbb R$ be a multiplicative function. The tuple $(A, X, \omega, z, D)$ is linear sieve data if
\begin{align*}
|A_d| = \frac{\omega(d)}{d}X + R_d
\end{align*}
for every squarefree $d \mid P(z)$ with $d \le D$, and if $0 \le \omega(p) < p$ for every prime $p < z$.
[/definition]
The parameter $D$ is the level of distribution available to the sieve. The ratio
\begin{align*}
s = \frac{\log D}{\log z}
\end{align*}
measures how deeply the sequence is distributed compared with the range of primes being removed. Large $s$ means that the sieve has enough information to control many products of small primes; small $s$ means that the truncation still leaves a large uncertainty.
The local density must also satisfy a one-dimensional normalisation. The following condition is the analytic replacement for the informal statement that each prime removes one residue class.
[definition: Dimension-One Sieve Condition]
Linear sieve data have dimension one with constant $C \ge 1$ if, for all $2 \le w < z$,
\begin{align*}
\prod_{w \le p < z}\left(1 - \frac{\omega(p)}{p}\right)^{-1} \le C \frac{\log z}{\log w}.
\end{align*}
[/definition]
This condition is stable under the examples considered in the course. It is a multiplicative Mertens-type estimate, and it supplies the correct logarithmic scale for the sifted set.
[example: Integers with Small Prime Divisors Removed]
Take $A=\{1,2,\dots,x\}$ and set $\omega(p)=1$ for every prime $p<z$. Since $\omega$ is multiplicative on squarefree integers, every squarefree $d\mid P(z)$ has $\omega(d)=1$. The set $A_d$ is the set of multiples of $d$ up to $x$, so
\begin{align*}
|A_d|=\left\lfloor \frac{x}{d}\right\rfloor.
\end{align*}
Writing
\begin{align*}
R_d=\left\lfloor \frac{x}{d}\right\rfloor-\frac{x}{d},
\end{align*}
we have $-1<R_d\le 0$, hence $|R_d|\le 1$, and therefore
\begin{align*}
|A_d|=\frac{x}{d}+R_d=\frac{\omega(d)}{d}x+R_d.
\end{align*}
For these local densities,
\begin{align*}
V(z)=\prod_{p<z}\left(1-\frac{\omega(p)}{p}\right)=\prod_{p<z}\left(1-\frac{1}{p}\right).
\end{align*}
By *Mertens' product theorem*,
\begin{align*}
\prod_{p<z}\left(1-\frac{1}{p}\right)\sim \frac{e^{-\gamma}}{\log z}.
\end{align*}
Thus the expected main term is
\begin{align*}
xV(z)\sim \frac{e^{-\gamma}x}{\log z},
\end{align*}
so it has order $x/\log z$. Finally,
\begin{align*}
S(A,z)=|\{n\le x:(n,P(z))=1\}|
\end{align*}
counts precisely the integers $n\le x$ with no prime factor below $z$. The level $D$ determines how many of the errors $R_d$, each bounded by $1$, must be controlled in the summed remainder over squarefree $d\le D$ with $d\mid P(z)$.
[/example]
This example is the model for the whole chapter: the sieve does not try to identify primes directly, but instead counts integers with no small prime factor. If $z$ is below $x^{1/2}$, such integers may be prime or may be products of a few large primes, which is why almost primes naturally enter later.
## The Linear Sieve Functions
The problem now is to find the best universal constants in the estimates for $S(A,z)$. These constants are encoded by two functions, $F(s)$ for upper bounds and $f(s)$ for lower bounds. They do not depend on the particular sequence $A$; all sequence-specific information is concentrated in $X$, $V(z)$, and the remainder sum.
The linear sieve functions arise from optimising upper and lower sieve weights. In dimension one, the optimisation is governed by a delay differential system rather than by a finite-dimensional linear programme.
[definition: Linear Sieve Functions]
The linear sieve upper and lower functions are continuous maps
\begin{align*}
F,f : (0,\infty) \to \mathbb R
\end{align*}
determined by
\begin{align*}
sF(s) = 2e^{\gamma} \quad \text{for } 0 < s \le 3.
\end{align*}
They also satisfy
\begin{align*}
sf(s) = 0 \quad \text{for } 0 < s \le 2.
\end{align*}
For $s>3$,
\begin{align*}
(sF(s))' = f(s-1).
\end{align*}
For $s>2$,
\begin{align*}
(sf(s))' = F(s-1).
\end{align*}
[/definition]
The lower function begins at zero because no positive lower bound is possible below the first real threshold. Once $s>2$, the lower-bound weights have enough room to force a positive main term, and the two functions then feed into each other through the delay system.
[remark: Limiting Behaviour of the Sieve Functions]
The functions satisfy $F(s) \to 1$ and $f(s) \to 1$ as $s \to \infty$. Thus the linear sieve becomes asymptotically sharp when the level of distribution is much larger than the sifting range. For bounded $s$, the constants $F(s)$ and $f(s)$ quantify the loss caused by truncating inclusion-exclusion.
[/remark]
We next state the central theorem in the form used for applications. The error term is expressed through the total remainder over the squarefree moduli actually touched by the sieve weights.
[quotetheorem:7178]
The theorem is powerful because the difficult analytic work is now reduced to bounding the remainders. Each hypothesis has a concrete failure mode. If the dimension-one condition is removed, the sequence of integers divisible by values of $k$ independent residue conditions modulo $p$ has local factor roughly $1-k/p$, so $V(z)$ has scale $(\log z)^{-k}$ rather than $(\log z)^{-1}$ and the linear sieve functions are the wrong constants. If the remainder sum is not small, a sequence can be modified on one residue class for every small modulus so that the accumulated $R_d$ is as large as $X V(z)$, and then the displayed main term no longer controls the sign or size of the answer. If the finite sequence is not tied to the stated local density model, the same notation $S(A,z)$ only counts elements with no small prime divisor and says nothing about their arithmetic type. Thus the theorem does not identify primes; it only counts elements not divisible by primes below $z$, which is why the next step is to interpret these sifted elements as almost primes rather than as primes themselves.
[example: A Level of Distribution Hypothesis]
Suppose $A$ has dimension-one local density and, for some $\vartheta>0$ and every $\varepsilon>0$,
\begin{align*}
\sum_{\substack{d \le X^{\vartheta-\varepsilon},\ d \mid P(z)}} |R_d| = o(XV(z))
\end{align*}
uniformly for the relevant choices of $z$. Fix $u>0$ and take
\begin{align*}
D=X^{\vartheta-\varepsilon}
\end{align*}
and
\begin{align*}
z=X^{1/u}.
\end{align*}
Then
\begin{align*}
\log D=\log\left(X^{\vartheta-\varepsilon}\right)=(\vartheta-\varepsilon)\log X
\end{align*}
and
\begin{align*}
\log z=\log\left(X^{1/u}\right)=\frac{1}{u}\log X.
\end{align*}
Therefore the sieve parameter is
\begin{align*}
s=\frac{\log D}{\log z}=\frac{(\vartheta-\varepsilon)\log X}{(1/u)\log X}=u(\vartheta-\varepsilon).
\end{align*}
If $u\vartheta>2$, choose $\varepsilon>0$ so small that $u(\vartheta-\varepsilon)>2$. For this fixed choice, the lower-bound linear sieve gives
\begin{align*}
S(A,X^{1/u}) \ge XV(X^{1/u})\{f(u(\vartheta-\varepsilon))+o(1)\}-o(XV(X^{1/u})).
\end{align*}
Since $u(\vartheta-\varepsilon)>2$, the lower sieve function is positive at this value of $s$, so set
\begin{align*}
c=\frac{1}{2}f(u(\vartheta-\varepsilon))>0.
\end{align*}
For all sufficiently large $X$, the two $o(XV(X^{1/u}))$ terms together have absolute value at most $cXV(X^{1/u})$, and hence
\begin{align*}
S(A,X^{1/u}) \ge cXV(X^{1/u}).
\end{align*}
Thus the level of distribution up to $X^{\vartheta-\varepsilon}$ is deep enough, relative to the sifting threshold $X^{1/u}$, to leave a positive main-term number of elements of $A$ with no prime factor below $X^{1/u}$.
[/example]
The inequality in the example is the standard bridge from analytic distribution results to almost-prime conclusions. It says that if the sequence is distributed in residue classes deeply enough relative to the sieving threshold, then the lower-bound sieve leaves a positive number of sifted elements.
## The Parity Problem
The next question is whether the linear sieve can be sharpened enough to detect primes themselves. The obstruction is not just technical. Ordinary sieve weights built from divisibility by small primes have a built-in parity blindness: they have difficulty separating integers with an odd number of prime factors from those with an even number.
This limitation can already be felt in the shape of the linear sieve. The lower bound begins only when $s>2$, which corresponds to sieving below a square-root type threshold. But an integer with no prime factor below $x^{1/2}$ is either $1$ or prime when it is at most $x$, while an integer with no prime factor below $x^{1/3}$ may also be a product of two large primes. The information needed to distinguish these cases is tied to parity of the number of remaining prime factors.
[definition: Parity-Blind Divisor-Sum Functional]
Fix $z \ge 2$ and $D \ge 1$. A parity-blind divisor-sum functional at level $D$ is a map $\mathcal L$ from finite sequences of positive integers to $\mathbb R$ for which there exist coefficients $\lambda_d \in \mathbb R$, supported on squarefree integers $d \le D$ with $d \mid P(z)$, such that
\begin{align*}
\mathcal L(A) = \sum_{\substack{d \le D,\ d \mid P(z)}} \lambda_d |A_d|.
\end{align*}
Two finite sequences $A$ and $B$ are equivalent for such functionals at level $D$ if $|A_d|=|B_d|$ for every squarefree $d \le D$ with $d \mid P(z)$.
[/definition]
This definition makes explicit what information an ordinary divisor-sum sieve is allowed to use. If two sequences are equivalent at level $D$, every such functional gives them the same value, even if their elements have different numbers of prime factors outside the sieving range. The parity problem begins from this loss of information: improving an upper-bound sieve into a prime-counting asymptotic requires input not expressible only through small-prime divisibility.
Write $\Omega(n)$ for the total number of prime factors of $n$, counted with multiplicity. The parity obstruction concerns the parity of this integer after the small-prime divisibility data have already been fixed.
[quotetheorem:7179]
The parity principle explains why sieve methods excel at upper bounds and almost-prime lower bounds but stall at prime lower bounds. Its hypotheses are also sharp in specific ways. If the restriction to divisor sums is dropped, then Type II bilinear estimates or Vaughan-type identities can compare factorizations in two variables and may distinguish primes from products of two large primes. If the matching-local-data assumption is dropped, for instance by taking $A$ to be odd integers and $B$ to be even integers, the divisor count at $d=2$ separates the two sequences immediately. If the squarefree support condition is replaced by information involving large prime factors themselves, such as direct access to $\Omega(n)$ or to a von Mangoldt weight, then the parity conclusion no longer describes the available data. Thus the principle is not saying that primes are indistinguishable from semiprimes in every analytic framework; it says that local congruence data alone do not contain enough information to separate them. Later tools, including Type I/II decompositions, Chen's switching arguments, and Bombieri's asymptotic sieve, are designed to add precisely the extra information that ordinary one-dimensional divisor sums lack.
[example: Twin Primes and the Parity Barrier]
For the twin-prime pattern, the sifted sequence is built from
\begin{align*}
a_n=n(n+2).
\end{align*}
The condition $(a_n,P(z))=1$ means that, for every prime $p<z$, neither factor is divisible by $p$. Equivalently,
\begin{align*}
p\nmid n(n+2)
\end{align*}
if and only if
\begin{align*}
n\not\equiv 0 \pmod p \quad \text{and} \quad n\not\equiv -2 \pmod p.
\end{align*}
Thus for odd $p$ the sieve removes the two residue classes $0$ and $-2$ modulo $p$, while for $p=2$ these two classes coincide because $-2\equiv 0 \pmod 2$.
This local condition is necessary for both $n$ and $n+2$ to be prime, except for the small exceptional case where one of the factors equals a prime below $z$. It is not sufficient. For example, if $z=5$, then $P(z)=2\cdot 3=6$, and
\begin{align*}
77\cdot 79
\end{align*}
is coprime to $6$ because
\begin{align*}
77\equiv 5 \pmod 6,\qquad 79\equiv 1 \pmod 6.
\end{align*}
But
\begin{align*}
77=7\cdot 11,
\end{align*}
so the sifted condition has allowed a value where one member of the pair is a product of two primes larger than the sieving primes.
The same obstruction remains in the divisor-sum language. A parity-blind sieve expression has the form
\begin{align*}
\sum_{\substack{d\le D,\ d\mid P(z)}}\lambda_d\,|\{n:d\mid n(n+2)\}|.
\end{align*}
Each term only records divisibility by products of primes below $z$; it does not record whether the remaining large prime factors make $n$ or $n+2$ prime, semiprime, or a product with more factors. Therefore the sieve can bound or count many $n$ for which $n(n+2)$ has no small prime factor, and under distribution hypotheses it can produce bounded-almost-prime conclusions, but those same local congruence data alone cannot force both $n$ and $n+2$ to be prime.
[/example]
This does not make the sieve ineffective. It identifies the exact point at which further arithmetic input is required, and it still supplies the almost-prime results that are often the first meaningful approximation to a prime pattern.
## Almost Primes from the Lower-Bound Sieve
Since primality is out of reach for ordinary sieves in many problems, the natural replacement is to count elements with only a bounded number of prime factors. The lower-bound linear sieve is designed for precisely this purpose. If an integer has no small prime factor and is not too large, then it cannot have many prime factors counted with multiplicity.
[definition: Almost Prime]
For $r \in \mathbb N$, an integer $n \ge 2$ is called a $P_r$ if $n$ has at most $r$ prime factors counted with multiplicity.
[/definition]
The phrase counted with multiplicity means, for example, that $12=2^2\cdot 3$ is a $P_3$ but not a $P_2$. In applications the elements of $A$ often have size at most $X^\alpha$ for some fixed $\alpha$, and the absence of prime factors below $X^{1/u}$ forces a bound on the total number of prime factors.
[quotetheorem:7180]
[citeproof:7180]
This theorem turns a lower bound for $S(A,X^{1/u})$ into a lower bound for almost primes in the sequence. The size bound is necessary: for any $m$ one may take a product of $m$ primes all exceeding $X^{1/u}$, and this integer has no small prime factor but is not a $P_{m-1}$; without an upper bound comparable to $X^\alpha$, no fixed $r$ follows. The lower threshold condition is also doing real work. If the sieve only removed primes below a smaller or unspecified threshold, many prime factors just above that weaker threshold could fit below $X^\alpha$, and the claimed bound on $\Omega(a)$ would fail. The theorem does not say that the sifted elements are prime, nor does it give the optimal number of prime factors; it gives the elementary conversion that lets the linear sieve produce a concrete $P_r$ conclusion once $u$ has been chosen. The exact value of $r$ depends on the size of the sequence elements and on how large $u$ must be to make $s=u\vartheta$ exceed $2$.
[example: Almost Primes Under Level of Distribution]
Assume $A$ consists of positive integers $a \le X^\alpha$, has dimension-one local density, and has level of distribution $D=X^{\vartheta-o(1)}$ with negligible remainder sum. Choose $u>2/\vartheta$. Since $u\vartheta>2$, choose $\varepsilon>0$ such that
\begin{align*}
u(\vartheta-\varepsilon)>2.
\end{align*}
For all sufficiently large $X$, the hypothesis $D=X^{\vartheta-o(1)}$ allows the sieve level
\begin{align*}
D_\varepsilon=X^{\vartheta-\varepsilon}.
\end{align*}
With the sifting threshold $z=X^{1/u}$, the sieve parameter is
\begin{align*}
s=\frac{\log D_\varepsilon}{\log z}.
\end{align*}
The numerator is
\begin{align*}
\log D_\varepsilon=\log\left(X^{\vartheta-\varepsilon}\right)=(\vartheta-\varepsilon)\log X.
\end{align*}
The denominator is
\begin{align*}
\log z=\log\left(X^{1/u}\right)=\frac{1}{u}\log X.
\end{align*}
Therefore
\begin{align*}
s=\frac{(\vartheta-\varepsilon)\log X}{(1/u)\log X}=u(\vartheta-\varepsilon)>2.
\end{align*}
By the lower-bound part of *[Linear Sieve Upper and Lower Bounds](/theorems/7178)*, the negligible remainder sum gives
\begin{align*}
S(A,X^{1/u}) \ge X V(X^{1/u})\{f(u(\vartheta-\varepsilon))+o(1)\}-o(XV(X^{1/u})).
\end{align*}
Since $u(\vartheta-\varepsilon)>2$, the lower sieve function is positive at this point. Set
\begin{align*}
c=\frac{1}{2}f(u(\vartheta-\varepsilon))>0.
\end{align*}
For all sufficiently large $X$, the two error terms together have absolute value at most $cXV(X^{1/u})$, so
\begin{align*}
S(A,X^{1/u}) \ge cXV(X^{1/u}).
\end{align*}
Thus at least $cXV(X^{1/u})$ elements of $A$ have no prime factor below $X^{1/u}$.
Now let $a$ be one of these sifted elements. If $a$ had $r+1$ prime factors counted with multiplicity, then each of those prime factors would be at least $X^{1/u}$, so
\begin{align*}
a \ge \left(X^{1/u}\right)^{r+1}.
\end{align*}
Expanding the power gives
\begin{align*}
\left(X^{1/u}\right)^{r+1}=X^{(r+1)/u}.
\end{align*}
If $r\ge \alpha u$, then $r+1>\alpha u$, hence
\begin{align*}
\frac{r+1}{u}>\alpha.
\end{align*}
Therefore
\begin{align*}
a\ge X^{(r+1)/u}>X^\alpha,
\end{align*}
contradicting the assumed size bound $a\le X^\alpha$. Hence every sifted element is a $P_r$ for every integer $r\ge \alpha u$, so the lower-bound sieve produces many $P_r$ values in $A$.
[/example]
The choice of $u$ is a trade-off. Larger $u$ makes the lower-bound sieve easier to apply because $s=u\vartheta$ grows, but it also weakens the almost-prime conclusion because the permitted number $r$ of prime factors increases.
[remark: Optimising the Almost-Prime Parameter]
In a first application, one usually chooses $u$ just above $2/\vartheta$, giving approximately $r > 2\alpha/\vartheta$. Refinements improve the constants by using weighted sieves, switching arguments, or extra information about the sizes and locations of prime factors. The linear sieve supplies the baseline result against which these refinements are measured.
[/remark]
The lower-bound sieve also clarifies why Bombieri-Vinogradov type input is so valuable. A level of distribution close to $1/2$ often gives $s$ just above $2$ when sieving at a range near $X^{1/u}$ with $u$ near $4$, leading naturally to $P_r$ conclusions rather than prime conclusions.
## The Linear Sieve as a Template
The final point of the chapter is methodological. A linear sieve application has a repeated structure: identify the sequence, compute local densities, prove a level of distribution, apply the universal upper or lower bound, then interpret the sifted elements. This template is flexible enough to handle primes in arithmetic progressions, polynomial sequences, shifted primes, and many additive problems, provided the required distribution estimates are available.
[example: Template for a Lower-Bound Application]
Let $A$ be a sequence of positive integers with $a\le X^\alpha$ for every $a\in A$. Assume the squarefree divisibility counts have the form
\begin{align*}
|A_d|=\frac{\omega(d)}{d}X+R_d
\end{align*}
with $\omega(p)$ satisfying the dimension-one sieve condition, and assume that the remainder sum is negligible up to level $X^{\vartheta-o(1)}$. Fix $u>2/\vartheta$. Since $u\vartheta>2$, choose $\varepsilon>0$ such that
\begin{align*}
u(\vartheta-\varepsilon)>2.
\end{align*}
Take
\begin{align*}
D_\varepsilon=X^{\vartheta-\varepsilon}
\end{align*}
and
\begin{align*}
z=X^{1/u}.
\end{align*}
Then
\begin{align*}
\log D_\varepsilon=\log\left(X^{\vartheta-\varepsilon}\right)=(\vartheta-\varepsilon)\log X.
\end{align*}
Also
\begin{align*}
\log z=\log\left(X^{1/u}\right)=\frac{1}{u}\log X.
\end{align*}
Therefore the sieve parameter is
\begin{align*}
s=\frac{\log D_\varepsilon}{\log z}=\frac{(\vartheta-\varepsilon)\log X}{(1/u)\log X}=u(\vartheta-\varepsilon)>2.
\end{align*}
By the lower-bound part of *Linear Sieve Upper and Lower Bounds*, and by the assumed negligible remainder sum at level $D_\varepsilon$, we get
\begin{align*}
S(A,X^{1/u})\ge XV(X^{1/u})\{f(u(\vartheta-\varepsilon))+o(1)\}-o(XV(X^{1/u})).
\end{align*}
Since $u(\vartheta-\varepsilon)>2$, the lower sieve function is positive at this value. Put
\begin{align*}
c=\frac{1}{2}f(u(\vartheta-\varepsilon)).
\end{align*}
Then $c>0$, and for all sufficiently large $X$ the two error terms together have size at most $cXV(X^{1/u})$. Hence
\begin{align*}
S(A,X^{1/u})\ge cXV(X^{1/u}).
\end{align*}
Thus at least $cXV(X^{1/u})$ elements of $A$ have no prime factor below $X^{1/u}$.
Now let $a$ be one of these sifted elements, and let $r$ be an integer with $r\ge \alpha u$. If $a$ had at least $r+1$ prime factors counted with multiplicity, then each of those prime factors would be at least $X^{1/u}$, so
\begin{align*}
a\ge \left(X^{1/u}\right)^{r+1}=X^{(r+1)/u}.
\end{align*}
Since $r\ge \alpha u$, we have
\begin{align*}
r+1>\alpha u.
\end{align*}
Dividing by $u>0$ gives
\begin{align*}
\frac{r+1}{u}>\alpha.
\end{align*}
Therefore
\begin{align*}
a\ge X^{(r+1)/u}>X^\alpha,
\end{align*}
contradicting the size bound $a\le X^\alpha$. Hence each sifted element is a $P_r$ for every integer $r\ge \alpha u$, so the template produces a positive main-term number of almost-prime values in $A$.
[/example]
This pattern should be read together with the parity principle. The same calculation may be strong enough to prove infinitely many almost primes while still being structurally unable to prove infinitely many primes. Chapters 6 and 7 supply the main analytic inputs for this attempt: large-sieve mean-square control and Bombieri-Vinogradov distribution estimates, which are then used in Chapter 8 for Chen-type almost-prime conclusions.
The linear sieve completes the combinatorial side of the story by giving the right scale for sifted sets and almost-prime counts, while also exposing the parity barrier. To go beyond local truncation arguments, the next chapter introduces the large sieve, which measures how residue-class errors behave on average across many moduli.
# 6. The Large Sieve
The large sieve is the point at which the course moves from local combinatorial sieving to global mean-square control. Earlier chapters measured how much of a sequence survives congruence restrictions prime by prime, with the total remainder $R(D)$ limiting the usable sieve level. Here we ask how large a family of congruence obstructions can be when tested all at once. The main idea is that fractions with distinct reduced denominators cannot cluster too much on the unit circle, and this spacing forces an almost-orthogonality inequality.
The chapter has three goals. First, we prove the additive large sieve in a Hilbert-space form that isolates the underlying duality argument. Second, we convert it into the multiplicative large sieve for Dirichlet characters. Third, we use the character form to obtain mean-square estimates for primes and general sequences in arithmetic progressions by mean-square orthogonality rather than by locating zeros of Dirichlet $L$-functions.
## Additive Spacing and the Hilbert-Space Inequality
The first question is how to bound many exponential sums at once. If $(a_n)$ is supported on an interval and the points $\theta_r \in \mathbb R/\mathbb Z$ are well separated, the sums $\sum_n a_n e(n\theta_r)$ cannot all be large unless the coefficients themselves have large square norm. This is the analytic content of the additive large sieve.
[definition: Exponential Notation]
Define the function $e:\mathbb R\to\mathbb C$ by
\begin{align*} e(t) := e^{2\pi i t}. \end{align*}
[/definition]
Since $e(t+m)=e(t)$ for every $m\in\mathbb Z$, the same notation also denotes the induced function $e:\mathbb R/\mathbb Z\to\mathbb C$.
This notation turns additive characters on $\mathbb R/\mathbb Z$ into short expressions. To make a family of such characters behave like an orthogonal family, we need a quantitative rule saying that the frequencies do not bunch together.
[definition: Delta-Spaced Set]
Let $\Delta > 0$. A finite set $\Theta \subset \mathbb R/\mathbb Z$ is $\Delta$-spaced if, for distinct $\theta,\theta' \in \Theta$,
\begin{align*} \|\theta-\theta'\|_{\mathbb R/\mathbb Z} \ge \Delta, \end{align*}
where $\|x\|_{\mathbb R/\mathbb Z} := \min_{m\in\mathbb Z}|x-m|$.
[/definition]
The guiding arithmetic family is the set of reduced rational points with bounded denominator. Before proving the general inequality, it is worth checking that this family has exactly the spacing needed for the large sieve.
[example: Reduced Fractions Are Spaced]
Let
\begin{align*} \Theta_Q := \left\{\frac{a}{q} \bmod 1 : 1 \le q \le Q,\ 1 \le a \le q,\ \gcd(a,q)=1\right\}. \end{align*}
We show that any two distinct points of $\Theta_Q$ are separated by at least $Q^{-2}$ in $\mathbb R/\mathbb Z$. Take two representatives $a/q$ and $b/r$ with $q,r\le Q$, $\gcd(a,q)=\gcd(b,r)=1$, and suppose that $a/q\not\equiv b/r\pmod 1$. By the definition of distance on $\mathbb R/\mathbb Z$,
\begin{align*} \left\|\frac{a}{q}-\frac{b}{r}\right\|_{\mathbb R/\mathbb Z}=\min_{m\in\mathbb Z}\left|\frac{a}{q}-\frac{b}{r}-m\right|. \end{align*}
For each $m\in\mathbb Z$,
\begin{align*} \frac{a}{q}-\frac{b}{r}-m=\frac{ar-bq-mqr}{qr}. \end{align*}
Since $a/q\not\equiv b/r\pmod 1$, the integer $ar-bq-mqr$ is never $0$. Hence $|ar-bq-mqr|\ge 1$ for every $m\in\mathbb Z$, so
\begin{align*} \left|\frac{a}{q}-\frac{b}{r}-m\right|\ge \frac{1}{qr}. \end{align*}
Taking the minimum over $m$ gives
\begin{align*} \left\|\frac{a}{q}-\frac{b}{r}\right\|_{\mathbb R/\mathbb Z}\ge \frac{1}{qr}. \end{align*}
Because $q\le Q$ and $r\le Q$, we have $qr\le Q^2$, and therefore
\begin{align*} \frac{1}{qr}\ge \frac{1}{Q^2}. \end{align*}
Thus $\Theta_Q$ is $Q^{-2}$-spaced in $\mathbb R/\mathbb Z$. The reduced residue classes with moduli $q\le Q$ can therefore be treated as a separated family of points on the circle.
[/example]
This spacing example explains what the theorem must prove: separation of frequencies should force the vectors $(e(n\theta))_n$ to be almost orthogonal. The next result gives the precise Hilbert-space estimate, with one term measuring the length of the coefficient interval and the other measuring how many separated points fit on the circle.
[quotetheorem:7181]
[citeproof:7181]
The theorem is useful because reduced fractions supply a ready-made spaced set. Its hypotheses are also close to sharp: if many frequencies were allowed inside an interval of length much smaller than $N^{-1}$, the corresponding exponential vectors would point in nearly the same direction and the left-hand side could grow like the number of repeated tests. The term $N-1$ is the interval-length contribution, while $\Delta^{-1}$ is the geometric cost of how many separated points can fit around the circle. Substituting the spacing bound for $\Theta_Q$ gives the form that will later interact with congruences modulo many $q$ at once.
[quotetheorem:7182]
[citeproof:7182]
The reducedness condition is essential for this clean form, since unreduced fractions repeat the same point on $\mathbb R/\mathbb Z$ many times and would destroy the spacing hypothesis. The constant $Q^2$ is the cost of considering all denominators up to $Q$, not a bound coming from any single modulus. This is the first place where the large sieve differs from elementary estimates for one congruence class: it trades pointwise control for averaged control over a two-dimensional family of fractions. The estimate is best read as a mean-square statement rather than as a bound for any single exponential sum. The following simple choice of coefficients shows the scale of the result.
[example: Bounding Sums Over Reduced Residue Classes]
Take $a_n=1$ on the interval $1\le n\le N$, so here $M=0$ and
\begin{align*} \sum_{1\le n\le N}|a_n|^2=\sum_{1\le n\le N}1=N. \end{align*}
Substituting these coefficients into the *Additive Large Sieve for Reduced Fractions* gives
\begin{align*} \sum_{q\le Q}\sum_{1\le a\le q,\ \gcd(a,q)=1}\left|\sum_{1\le n\le N} e\left(\frac{an}{q}\right)\right|^2 \le (N-1+Q^2)N. \end{align*}
For example, the term with $q=1$ and $a=1$ is
\begin{align*} \left|\sum_{1\le n\le N} e(n)\right|^2=\left|\sum_{1\le n\le N}1\right|^2=N^2, \end{align*}
because $e(n)=e^{2\pi i n}=1$ for every integer $n$. Thus the inequality is not claiming that every individual exponential sum is small. It says that, after summing over all reduced fractions with denominator at most $Q$, the total square mass is bounded by $(N-1+Q^2)N$, so large sums can occur only sparsely within this separated family of frequencies.
[/example]
## Duality and the Operator Viewpoint
The next problem is that applications often present the large sieve in the opposite direction: instead of summing coefficient sequences against many frequencies, we start with weights attached to the frequencies and ask how large their combined exponential sum can be. The duality principle explains why these two forms are the same estimate.
[quotetheorem:7183]
[citeproof:7183]
Applied to the additive large sieve, this lemma replaces coefficients indexed by integers with coefficients indexed by frequencies. The finite-dimensional hypothesis avoids any topological subtlety: all spaces here are ordinary Euclidean Hilbert spaces with finitely many coordinates. Outside that setting, the statement needs bounded operators on their whole Hilbert spaces. For a concrete failure mode, take $H=K=\ell^2(\mathbb N)$ and define $Te_m=m e_m$ on the finite-support sequences. There is no finite constant $C$ with $\|Tx\|_K^2\le C\|x\|_H^2$, and the formal adjoint is also only defined on a proper dense domain. The common constant $C$ is essential; if the adjoint were allowed a larger constant, the passage would no longer preserve the sharp large-sieve scale. Without this dual viewpoint, one would have to reprove the same almost-orthogonality estimate after interchanging the roles of integers and frequencies. The next theorem states the adjoint estimate explicitly, since it is the version used when congruence conditions are expanded into additive characters.
[quotetheorem:7184]
[citeproof:7184]
The dual form is not a separate phenomenon; it is the same operator bound viewed from the other side. The spacing hypothesis is still doing all the arithmetic work, because the adjoint estimate would fail just as badly if the frequencies repeated or clustered: taking two identical points with weights $1$ and $-1$ would create exact cancellation for one choice of signs, while taking equal signs would double the same exponential vector without paying twice in spacing. The interval length $N$ is also unavoidable, since a single frequency with coefficient $1$ contributes $N$ to the left-hand side. In applications, this form is often the more natural one: residue-class weights are first expanded into characters, and only afterwards are the integer coefficients summed. This perspective is useful enough to record before moving from additive characters to multiplicative ones.
[remark: Why Hilbert Space Language Matters]
The duality principle prevents duplication of proofs. Once the large sieve is proved for one operator, every estimate for the adjoint is already available with the same constant. This is the mechanism behind the passage from additive exponential sums to character sums.
[/remark]
## The Multiplicative Large Sieve
We now ask how to control arithmetic progressions multiplicatively rather than additively. Dirichlet characters diagonalise residue classes in $(\mathbb Z/q\mathbb Z)^\times$, just as additive characters diagonalise residue classes in $\mathbb Z/q\mathbb Z$. The multiplicative large sieve is the resulting mean-square estimate over all characters with modulus at most $Q$.
[definition: Dirichlet Character]
A Dirichlet character modulo $q$ is a function $\chi:\mathbb Z\to\mathbb C$ such that $\chi(n+q)=\chi(n)$ for all $n\in\mathbb Z$, $\chi(mn)=\chi(m)\chi(n)$ for all $m,n\in\mathbb Z$, $\chi(n)=0$ iff $\gcd(n,q)>1$, and $\chi(n)$ is a root of unity whenever $\chi(n)\ne 0$.
[/definition]
Characters are useful because they form the Fourier basis of the finite abelian group $(\mathbb Z/q\mathbb Z)^\times$. To use them for residue classes, we need the exact orthogonality relation that isolates a single reduced class.
[quotetheorem:4338]
[citeproof:4338]
The orthogonality relation explains why character sums detect distribution among reduced residue classes. It says exactly three things for later use. First, it recovers indicators only on the unit group $(\mathbb Z/q\mathbb Z)^\times$; outside that group Dirichlet characters have been extended by zero and cannot distinguish non-unit residue classes. For example, modulo $4$ the class $2\pmod 4$ is non-unit, so every Dirichlet character $\chi\bmod 4$ has $\chi(2)=0$, although the indicator of $n\equiv 2\pmod 4$ is non-zero at $n=2$. Second, the principal character supplies the constant Fourier mode on the reduced residue classes; removing it measures discrepancy from the reduced-class average rather than the full class indicator. Third, the theorem is a statement at one fixed modulus, so it contains no averaging over $q$. The large sieve supplies the missing ingredient: a single inequality that controls the resulting character sums simultaneously for every modulus $q\le Q$, in the same way that the additive large sieve controlled all reduced additive characters.
[quotetheorem:7185]
[citeproof:7185]
The primitive restriction is not cosmetic. If all characters modulo $q$ were included with the same constant, the principal characters alone would produce a contribution of size comparable with $QN^2$ for the coefficients $a_n=1$ on a long interval, while the right-hand side would be only of size $N^2$ when $N$ is much larger than $Q^2$. Imprimitive characters require separate conductor bookkeeping, so the clean sharp statement belongs to primitive characters. For residue-class discrepancies below we will therefore use character orthogonality together with the additive large sieve, rather than applying this primitive theorem directly to all characters.
[remark: Primitive Versus Imprimitive Characters]
The version above includes only primitive characters modulo $q\le Q$. This is the form with the sharp $N-1+Q^2$ constant and no extra conductor multiplicities. All-character estimates are possible, but they must account for how many moduli are induced from the same primitive conductor and cannot be obtained by simply inserting imprimitive characters into the displayed inequality.
[/remark]
The primitive theorem can look abstract because it averages over characters rather than residue classes. A useful test is the coefficient sequence that contains no cancellation of its own: all coefficients equal to $1$ on the interval. In that case the estimate says that primitive characters collectively supply the oscillation, and it also separates the primitive statement from a false all-character analogue, where principal characters would dominate the mean square.
[example: A Mean Square for Primitive Character Sums]
Let $a_n=1$ for every integer $n$ with $M<n\le M+N$. Then
\begin{align*} \sum_{M<n\le M+N}|a_n|^2=\sum_{M<n\le M+N}1=N. \end{align*}
Substituting this into the *Multiplicative Large Sieve for Primitive Dirichlet Characters* gives
\begin{align*} \sum_{q\le Q}\frac{q}{\varphi(q)}\sum_{\chi\bmod q}^{*}\left|\sum_{M<n\le M+N}\chi(n)\right|^2 \le (N-1+Q^2)N. \end{align*}
The principal character modulo $q$ has conductor $1$, so it is primitive only when $q=1$. For $q=1$, the unique character satisfies $\chi(n)=1$ for every $n$, and hence
\begin{align*} \left|\sum_{M<n\le M+N}\chi(n)\right|^2=\left|\sum_{M<n\le M+N}1\right|^2=N^2. \end{align*}
For every modulus $q>1$, the primitive sum contains no principal character term. Thus the displayed mean square is dominated by non-principal primitive characters except for the single unavoidable modulus-$1$ contribution, and this is why the primitive restriction is compatible with the sharp large-sieve scale.
[/example]
## Arithmetic Progressions and Discrepancy Estimates
The final problem is to turn mean-square control of characters into information about arithmetic progressions. The large sieve supplies a direct average estimate for discrepancies after subtracting the expected main term among reduced residue classes, so the argument remains a Hilbert-space mean-square argument throughout.
[definition: Discrepancy in Reduced Residue Classes]
Let $(a_n)$ be a complex sequence supported on $M<n\le M+N$. Let
\begin{align*}
D&:=\{(q,a)\in\mathbb N\times\mathbb Z:\gcd(a,q)=1\}.
\end{align*}
Define the map $A:D\to\mathbb C$ by
\begin{align*} A(q,a):=\sum_{\substack{M<n\le M+N,\ n\equiv a\pmod q}}a_n \end{align*}
for $(q,a)\in D$, define the map $A_0:\mathbb N\to\mathbb C$ by
\begin{align*} A_0(q):=\sum_{\substack{M<n\le M+N,\ \gcd(n,q)=1}}a_n, \end{align*}
and define the map $E:D\to\mathbb C$ by
\begin{align*} E(q,a):=A(q,a)-\frac{A_0(q)}{\varphi(q)}. \end{align*}
[/definition]
The subtraction removes the principal character contribution, so the remaining term should be controlled by non-principal character sums. The next formula makes this exact and prepares the discrepancy for the multiplicative large sieve.
[quotetheorem:7186]
[citeproof:7186]
The formula has converted a distribution question into a square mean of character sums. The principal character has disappeared because the expected average over reduced residue classes has already been subtracted; without that subtraction, the constant sequence $a_n=1$ would leave a large principal term measuring the size of the interval rather than the variation between residue classes. The hypothesis $\gcd(a,q)=1$ is also structural, since characters modulo $q$ vanish on non-units and therefore cannot isolate a non-reduced class by this orthogonality relation. Additive characters alone can detect every residue class modulo $q$, but they do not separate the reduced-class average from the non-unit classes without this projection step. The next estimate should not be read as a consequence of the primitive-character large sieve above, since the formula involves all non-principal characters modulo $q$. Instead, the proof returns to the additive large sieve through the equivalent Fourier expansion of residue-class indicators.
[quotetheorem:7187]
[citeproof:7187]
The estimate applies to any finite weighted set. Its strength is that no cancellation hypothesis is imposed on the coefficients; all arithmetic information enters through the square norm $\sum |a_n|^2$. The limitation is equally important: this is an averaged result over many moduli and classes, not a pointwise equidistribution theorem for a chosen progression. In sieve applications, this lets us measure how evenly the sifted sequence is distributed in many residue classes at once.
[example: Discrepancies for a Finite Set of Integers]
Let $a_n=\mathbb{1}_{\{n\in\mathcal A\}}$ for a finite set $\mathcal A\subset (M,M+N]$. For a reduced residue class $a\pmod q$, the definition of $A(q,a)$ gives
\begin{align*} A(q,a)=\sum_{\substack{M<n\le M+N,\ n\equiv a\pmod q}}\mathbb{1}_{\{n\in\mathcal A\}}. \end{align*}
Each summand is $1$ exactly when $n\in\mathcal A$ and $n\equiv a\pmod q$, and is $0$ otherwise, so $A(q,a)$ is the number of elements of $\mathcal A$ in the class $a\pmod q$. Similarly,
\begin{align*} A_0(q)=\sum_{\substack{M<n\le M+N,\ \gcd(n,q)=1}}\mathbb{1}_{\{n\in\mathcal A\}} \end{align*}
counts the elements of $\mathcal A$ that are coprime to $q$. Thus
\begin{align*} E(q,a)=A(q,a)-\frac{A_0(q)}{\varphi(q)} \end{align*}
is the deviation of the count in the reduced class $a\pmod q$ from the average count among the $\varphi(q)$ reduced residue classes.
The square norm of the coefficient sequence is
\begin{align*} \sum_{M<n\le M+N}|a_n|^2=\sum_{M<n\le M+N}\left|\mathbb{1}_{\{n\in\mathcal A\}}\right|^2. \end{align*}
Since $\left|\mathbb{1}_{\{n\in\mathcal A\}}\right|^2=\mathbb{1}_{\{n\in\mathcal A\}}$ for every $n$, this becomes
\begin{align*} \sum_{M<n\le M+N}|a_n|^2=\sum_{M<n\le M+N}\mathbb{1}_{\{n\in\mathcal A\}}=|\mathcal A|. \end{align*}
Applying the mean-square discrepancy bound to these coefficients therefore gives
\begin{align*} \sum_{q\le Q}\sum_{a\bmod q,\ \gcd(a,q)=1}|E(q,a)|^2 \le (N-1+Q^2)|\mathcal A|. \end{align*}
When $|\mathcal A|$ is much smaller than $N$, the right-hand side is controlled by the actual number of selected integers rather than by the full length of the ambient interval.
[/example]
The finite-set example shows how the large sieve measures distribution for arbitrary coefficients. The next important coefficient sequence is $\Lambda(n)$, because it packages primes and prime powers; to apply the same formalism, we need the corresponding notion of discrepancy for primes in reduced residue classes.
[definition: Prime Progression Discrepancy]
Let
\begin{align*}
D_\Lambda&:=\{(x,q,a)\in[2,\infty)\times\mathbb N\times\mathbb Z:\gcd(a,q)=1\}.
\end{align*}
Define the map $\psi:D_\Lambda\to\mathbb C$ by
\begin{align*} \psi(x;q,a):=\sum_{\substack{n\le x,\ n\equiv a\pmod q}}\Lambda(n) \end{align*}
for $(x,q,a)\in D_\Lambda$, define the map $\psi_0:[2,\infty)\times\mathbb N\to\mathbb C$ by
\begin{align*} \psi_0(x;q):=\sum_{\substack{n\le x,\ \gcd(n,q)=1}}\Lambda(n), \end{align*}
and define the map $E_\Lambda:D_\Lambda\to\mathbb C$ by
\begin{align*} E_\Lambda(x;q,a):=\psi(x;q,a)-\frac{\psi_0(x;q)}{\varphi(q)}. \end{align*}
[/definition]
This definition turns the general discrepancy bound into a statement about primes in arithmetic progressions. The result below is self-contained once we use the elementary second-moment estimate for $\Lambda$.
[quotetheorem:7188]
[citeproof:7188]
The theorem should be read as an average result over many moduli and residue classes. The hypothesis is deliberately weak: only the elementary second moment of $\Lambda$ is used, so no prime number theorem in arithmetic progressions is being smuggled into the argument. Its limitation is that the centre of the discrepancy is $\psi_0(x;q)/\varphi(q)$, not the asymptotic main term $x/\varphi(q)$. The restriction to reduced classes is necessary because primes dividing $q$ occupy non-unit classes in a non-uniform way; including those classes would mix local divisibility obstructions with the distribution question. It does not claim that every individual class has a small discrepancy, but it prevents large discrepancies from occurring too often.
[example: Reading the Prime Discrepancy Estimate]
Suppose $Q\le x^{1/2}$. The preceding large-sieve estimate gives an absolute constant $C$ such that
\begin{align*} \sum_{q\le Q}\sum_{a\bmod q,\ \gcd(a,q)=1}|E_\Lambda(x;q,a)|^2 \le C(x+Q^2)x\log x. \end{align*}
Since $Q\le x^{1/2}$, we have $Q^2\le x$, so
\begin{align*} x+Q^2\le x+x=2x. \end{align*}
Substituting this into the bound gives
\begin{align*} \sum_{q\le Q}\sum_{a\bmod q,\ \gcd(a,q)=1}|E_\Lambda(x;q,a)|^2 \le 2C x^2\log x. \end{align*}
The number of reduced classes with moduli $q\le Q$ is
\begin{align*} R(Q):=\sum_{q\le Q}\varphi(q). \end{align*}
Using the standard summatory estimate $\sum_{q\le Q}\varphi(q)\asymp Q^2$, the average squared discrepancy over these reduced classes is
\begin{align*} \frac{1}{R(Q)}\sum_{q\le Q}\sum_{a\bmod q,\ \gcd(a,q)=1}|E_\Lambda(x;q,a)|^2 \le \frac{2C x^2\log x}{R(Q)} \ll \frac{x^2\log x}{Q^2}. \end{align*}
Thus the large sieve controls the average square of the prime-progression discrepancy at scale $x^2\log x/Q^2$ per reduced class when $Q\le x^{1/2}$. This is not yet the Bombieri--Vinogradov theorem, because the estimate is centered at $\psi_0(x;q)/\varphi(q)$ rather than at $x/\varphi(q)$, but it supplies the mean-square input that later combines with deeper information about primes.
[/example]
This limitation is important for the later sieve chapters. The large sieve supplies robust mean-square control for arbitrary coefficients, while sharper prime distribution theorems require additional input about the main terms themselves.
[remark: What the Large Sieve Does Not Prove Here]
The estimates in this chapter are distributional mean-square bounds for arbitrary coefficients and for $\Lambda(n)$. They do not replace the prime number theorem in arithmetic progressions, because the main term $\psi_0(x;q)/\varphi(q)$ has not been identified with $x/\varphi(q)$. Chapter 7 adds precisely that averaged prime-distribution input through Bombieri-Vinogradov. Their role in the sieve method is to guarantee that residue-class errors cannot be large for too many moduli at once.
[/remark]
The large sieve supplies the mean-square control needed when one cannot track each remainder individually. The next chapter applies that philosophy to primes in arithmetic progressions, adding Bombieri-Vinogradov as the averaged distribution input that determines how far the sieve can be pushed unconditionally.
# 7. Level of Distribution and Bombieri-Vinogradov
Chapters 3 through 5 developed the sieve as a way of turning local congruence information into global upper and lower bounds, and Chapter 6 supplied the large-sieve mean-square tool for controlling many congruence tests at once. The remaining obstruction is the quality of the remainders in arithmetic progressions: a sieve can only exploit congruence classes up to the range in which primes, or weighted primes, are well distributed. This chapter introduces the level of distribution, states the Bombieri-Vinogradov theorem, and explains why it acts as an averaged substitute for the Generalised Riemann Hypothesis in sieve arguments.
## Error Terms in Arithmetic Progressions
The basic question is how uniformly the primes occupy reduced residue classes modulo $q$ as $q$ grows with $x$. Dirichlet's theorem gives an asymptotic for each fixed modulus, but sieve applications require a statement with many moduli at once. The von Mangoldt weight is the natural version for analytic estimates, because it connects prime distribution to characters and Dirichlet series.
[definition: Chebyshev Function in an Arithmetic Progression]
Define the map $\psi:[2,\infty)\times \mathbb N\times \mathbb Z\to \mathbb R$ by
\begin{align*}
\psi:(x,q,a)\longmapsto \psi(x;q,a):=\sum_{n \le x,\, n \equiv a \pmod q} \Lambda(n).
\end{align*}
[/definition]
This function records the prime-power mass in one congruence class. To use it in estimates, we need to subtract the expected main term and isolate the part that should be treated as an error. That comparison is meaningful for reduced residue classes, where the prime number theorem in arithmetic progressions predicts equal sharing among the $\varphi(q)$ available classes.
[definition: Prime Number Theorem Error Term in Progressions]
Let
\begin{align*}
\mathcal D_E:=\{(x,q,a)\in [2,\infty)\times \mathbb N\times \mathbb Z:\gcd(a,q)=1\}.
\end{align*}
Define the map $E:\mathcal D_E\to \mathbb R$ by
\begin{align*}
E:(x,q,a)\longmapsto E(x;q,a):=\psi(x;q,a)-\frac{x}{\varphi(q)}.
\end{align*}
[/definition]
Small $E(x;q,a)$ for each individual pair $(q,a)$ is stronger than most sieves need. The sieve usually sums over many moduli, so cancellation or averaging over $q$ can be enough. In a typical linear sieve, $V(z)$ denotes the local density factor coming from the primes $p<z$ that have been removed, $\lambda_d$ denotes the sieve weight attached to the modulus $d$, and $a_d \pmod d$ denotes the residue class forced by the congruence condition being sieved. The main term then has size about $xV(z)$, while the error contains a weighted expression of the form
\begin{align*}
\sum_{d\le D}\lambda_d E(x;d,a_d).
\end{align*}
If this remainder is only bounded by summing individual errors of main-term size, it can dominate the expected saving. This motivates a scale parameter recording how far the moduli may extend before the total error becomes too large.
[definition: Level of Distribution]
Let $\theta \in [0,1]$. The primes have level of distribution $\theta$ if, for every $A>0$, there is $B=B(A)>0$ such that
\begin{align*}
\sum_{q \le x^\theta/(\log x)^B} \max_{\gcd(a,q)=1} |E(x;q,a)| \lesssim_A \frac{x}{(\log x)^A}
\end{align*}
for all sufficiently large $x$.
[/definition]
The implicit constant may depend on $A$. When $\theta=0$, the range $q\le (\log x)^{-B}$ is eventually empty for every $B>0$, so the definition is literally vacuous in this normalisation; the meaningful cases in this chapter have $\theta>0$. The logarithmic loss in the range is part of the definition used in this course, because it is the form supplied by the Bombieri-Vinogradov theorem and is flexible enough for sieve applications.
[remark: Individual Versus Averaged Distribution]
A pointwise estimate of the shape $E(x;q,a) \lesssim x^{1/2}q^{1/2}(\log x)^C$ up to $q \le x^{1/2}$ would resemble what the Generalised Riemann Hypothesis gives. The level of distribution definition asks for a sum over $q$ instead. It is weaker at a single modulus but often strong enough after the sieve weights have been summed.
[/remark]
The next example turns the definition into the exact estimate that enters later sieve arguments. Its point is not to improve the prime number theorem for one preferred modulus, but to let an argument choose one residue class for every modulus up to $Q$ and still keep the total discrepancy below any prescribed power of $\log x$. This is the scale on which sieve weights operate: they assemble many local congruence restrictions before the analytic error is finally estimated.
[example: Averaged Prime Number Theorem in Progressions]
Fix $A>0$ and suppose the primes have level of distribution $\theta$. Choose $B=B(A)$ from the definition of level of distribution, and put $Q=x^\theta/(\log x)^B$. Then, for all sufficiently large $x$,
\begin{align*}
\sum_{q \le Q} \max_{\gcd(a,q)=1}\left|\psi(x;q,a)-\frac{x}{\varphi(q)}\right| \lesssim_A \frac{x}{(\log x)^A}.
\end{align*}
Equivalently, this total discrepancy is $O_A(x(\log x)^{-A})$.
Now choose one reduced residue class $a_q \pmod q$ for each $q\le Q$. Since $\gcd(a_q,q)=1$, the definition of $E(x;q,a)$ gives
\begin{align*}
\psi(x;q,a_q)=\frac{x}{\varphi(q)}+E(x;q,a_q).
\end{align*}
Summing this identity over $q\le Q$ gives
\begin{align*}
\sum_{q\le Q}\psi(x;q,a_q)=x\sum_{q\le Q}\frac{1}{\varphi(q)}+\sum_{q\le Q}E(x;q,a_q).
\end{align*}
For each $q\le Q$,
\begin{align*}
|E(x;q,a_q)|\le \max_{\gcd(a,q)=1}|E(x;q,a)|.
\end{align*}
Therefore the triangle inequality and the level of distribution bound give
\begin{align*}
\left|\sum_{q\le Q}E(x;q,a_q)\right|\le \sum_{q\le Q}|E(x;q,a_q)|\le \sum_{q \le Q}\max_{\gcd(a,q)=1}|E(x;q,a)|\lesssim_A \frac{x}{(\log x)^A}.
\end{align*}
Substituting this bound into the summed identity yields
\begin{align*}
\sum_{q \le Q} \psi(x;q,a_q)=x\sum_{q \le Q}\frac{1}{\varphi(q)}+O_A\left(\frac{x}{(\log x)^A}\right).
\end{align*}
Thus the level of distribution estimate controls not just one modulus, but any simultaneous choice of one reduced residue class for every modulus up to $Q$, which is exactly the averaged form needed when sieve weights select many congruence classes at once.
[/example]
## Bombieri-Vinogradov as an Averaged GRH
The key problem is that GRH-level pointwise information is unavailable in unconditional sieve arguments. Bombieri-Vinogradov supplies the averaged range $q\le x^{1/2}$, up to logarithmic factors, with a power saving in the average error. This is the main analytic input behind many dimension-one sieve applications involving primes.
[quotetheorem:7189]
[citeproof:7189]
The theorem says that the primes have level of distribution $1/2$. The hypotheses in the maximum are restricted to reduced residue classes because non-reduced classes give an immediate counterexample to the stated main term: if a prime $p\mid q$ and $a\equiv 0\pmod p$, then every prime $n\equiv a\pmod q$ is either $p$ itself or absent, so $\psi(x;q,a)$ is bounded by $O(\log q)$ rather than by $x/\varphi(q)$. The logarithmic factor in the modulus range is also essential in this formulation. If the range were enlarged to all $q\le x^{1/2}$ while still demanding an arbitrary saving $x(\log x)^{-A}$ from this method, the large-sieve term $Q^2$ would be of size $x$ with no logarithmic margin left to absorb divisor weights and dyadic decompositions.
It should be read as an average replacement for GRH in this exact sense: GRH would give a comparable square-root barrier for each modulus separately, while Bombieri-Vinogradov gives the range after summing over moduli. The theorem does not rule out a bad exceptional modulus, nor does it provide a uniform pointwise estimate for every $q\le x^{1/2}$; it says that such bad behaviour cannot contribute too much in total. This is precisely the form needed later, because sieve remainders appear after summing over many moduli with controlled weights rather than after fixing one modulus in isolation.
[remark: The Square-Root Barrier]
The exponent $1/2$ is not an artefact of notation. In the large sieve, the term $Q^2$ competes with the length $N$ of the sequence being summed, and the balance $Q^2\approx x$ is the point at which the method stops giving the desired total error. Breaking this barrier is the content of stronger hypotheses such as Elliott-Halberstam.
[/remark]
This barrier is also the place where this chapter connects to the modern theory of prime patterns. The Elliott-Halberstam conjecture predicts a level of distribution close to $1$, and that stronger input would make sieve weights detect prime constellations far more efficiently. The bounded gaps theorem of Maynard and Tao still works unconditionally because Bombieri-Vinogradov supplies enough averaged distribution at level $1/2$ for a refined multidimensional sieve, while stronger distribution hypotheses improve the numerical strength and allow sharper conclusions about primes in admissible tuples.
Chapter 8 uses Bombieri-Vinogradov in places where a sieve remainder has the form of a weighted average of prime-counting errors over moduli. The following example records the typical admissible range.
[example: Admissible Moduli in Sieve Applications]
Let the sieve remainder be bounded in absolute value by
\begin{align*}
R(Q):=\left|\sum_{q\le Q} c_q \max_{\gcd(a,q)=1}|E(x;q,a)|\right|,
\end{align*}
where $|c_q|\le (\log x)^C$ for every $q\le Q$ and $C>0$ is fixed. To obtain a final saving $x(\log x)^{-A}$, apply the *Bombieri-Vinogradov Theorem* with parameter $A+C$. It gives a number $B=B(A+C)>0$ such that, for
\begin{align*}
Q=\frac{x^{1/2}}{(\log x)^B},
\end{align*}
one has
\begin{align*}
\sum_{q\le Q}\max_{\gcd(a,q)=1}|E(x;q,a)|\lesssim_{A,C}\frac{x}{(\log x)^{A+C}}.
\end{align*}
Now use the coefficient bound term by term. Since each maximum is nonnegative,
\begin{align*}
R(Q)\le \sum_{q\le Q}|c_q|\max_{\gcd(a,q)=1}|E(x;q,a)|.
\end{align*}
The hypothesis $|c_q|\le(\log x)^C$ gives
\begin{align*}
R(Q)\le(\log x)^C\sum_{q\le Q}\max_{\gcd(a,q)=1}|E(x;q,a)|.
\end{align*}
Substituting the Bombieri-Vinogradov bound gives
\begin{align*}
R(Q)\lesssim_{A,C}(\log x)^C\frac{x}{(\log x)^{A+C}}.
\end{align*}
Since $(\log x)^C(\log x)^{-(A+C)}=(\log x)^{-A}$, this becomes
\begin{align*}
R(Q)\lesssim_{A,C}\frac{x}{(\log x)^A}.
\end{align*}
Thus logarithmic coefficient growth of fixed degree only costs the same number of logarithmic powers in the Bombieri-Vinogradov input. This is the precise sense in which a sieve may take moduli up to $x^{1/2}$ with only logarithmic losses, often summarized informally as a level $D\le x^{1/2-o(1)}$.
[/example]
## Mean Squares and the Barban-Davenport-Halberstam Theorem
A second way to measure distribution is not by the largest residue class error for each modulus, but by the mean square over all reduced classes. This is often technically cleaner, because square averages interact naturally with orthogonality. It also shows that the typical error is smaller than what a worst-case estimate might suggest.
[quotetheorem:7190]
[citeproof:7190]
This theorem is not a direct replacement for Bombieri-Vinogradov, because it gives square-average information over residue classes rather than a maximum over $a$. The distinction matters: a small mean square permits a small set of residue classes to have unusually large errors, while Bombieri-Vinogradov controls the worst reduced class for each modulus before summing over $q$. For example, if for each modulus $q\le Q$ there were one reduced class $a_q$ with error about $M_q$ and all other classes had much smaller error, the mean-square contribution of these exceptional classes would be only $\sum_{q\le Q}M_q^2$. The BDH bound allows this provided $\sum_{q\le Q}M_q^2\lesssim xQ\log x$, whereas Bombieri-Vinogradov would require the stronger linear control $\sum_{q\le Q}M_q\lesssim_A x(\log x)^{-A}$ in the range $Q\le x^{1/2}(\log x)^{-B}$. Taking, say, $M_q\asymp (x\log x)^{1/2}$ for about $Q$ moduli is compatible with the mean-square scale but would give a total maximum error about $Q(x\log x)^{1/2}$, far too large at $Q=x^{1/2}$ for a logarithmic-saving Bombieri-Vinogradov conclusion. Conversely, the mean-square formulation is the right measurement when the argument naturally averages over residue classes, since orthogonality turns that average into a second moment over characters.
The reduced-class condition is needed here for the same concrete reason as in Bombieri-Vinogradov: if $a$ is not coprime to $q$, the class cannot contain a positive-density share of the primes, so subtracting $x/\varphi(q)$ would manufacture a large error unrelated to equidistribution. The restriction $Q\le x$ reflects the range in which the variance statement is meaningful for moduli being compared with primes up to $x$; for $q>x$, many residue classes contain at most one integer $n\le x$, so the progression count is governed by sparsity rather than by the prime number theorem in a class. The extra term $x^2(\log x)^{-A}$ protects the estimate in small ranges of $Q$: for instance, when $Q=1$ or $Q$ is fixed, the variance term $xQ\log x$ is not the right mechanism for absorbing all exceptional low-modulus behaviour with an arbitrary logarithmic saving. Later uses choose $Q$ large enough, or combine the theorem with other estimates, so that the variance term is the governing contribution.
[example: Typical Size from the Mean Square Bound]
To isolate the scale predicted by the variance term, take $Q\le x^{1/2}$ and suppress the logarithmic-saving term in the *Barban-Davenport-Halberstam Mean-Square Theorem*. Then the total square error over reduced residue classes satisfies
\begin{align*}
\sum_{q\le Q}\sum_{\substack{a\pmod q,\, \gcd(a,q)=1}}\left|\psi(x;q,a)-\frac{x}{\varphi(q)}\right|^2=O(xQ\log x).
\end{align*}
The number of reduced classes being averaged over is
\begin{align*}
N(Q):=\sum_{q\le Q}\varphi(q).
\end{align*}
Using the standard average order $\sum_{q\le Q}\varphi(q)\asymp Q^2$, the mean square error per reduced class has size
\begin{align*}
\frac{O(xQ\log x)}{N(Q)}=O\left(\frac{xQ\log x}{Q^2}\right)=O\left(\frac{x\log x}{Q}\right).
\end{align*}
Taking the square root gives the root mean square scale
\begin{align*}
\left(\frac{x\log x}{Q}\right)^{1/2}.
\end{align*}
At the endpoint $Q=x^{1/2}$, this becomes
\begin{align*}
\left(\frac{x\log x}{x^{1/2}}\right)^{1/2}=\left(x^{1/2}\log x\right)^{1/2}=x^{1/4}(\log x)^{1/2}.
\end{align*}
Thus the mean-square theorem predicts that a typical reduced residue class has error on the scale $x^{1/4}(\log x)^{1/2}$ when $Q=x^{1/2}$, far below the main term $x/\varphi(q)$ for most moduli in that range.
[/example]
## The Large Sieve and Vaughan's Identity
The remaining question is where the exponent $1/2$ actually enters the proof. The Bombieri-Vinogradov theorem is not proved by estimating each $L$-function separately. Instead, the proof combines an identity that breaks $\Lambda$ into bilinear pieces with an inequality that controls many character sums at once.
[quotetheorem:7191]
[citeproof:7191]
The factor $q/\varphi(q)$ is a genuine part of the hypothesis package, not a harmless normalisation. Primitive characters modulo $q$ form an orthogonal family only after the conductor density has been accounted for, and this weight keeps moduli with unusually small $\varphi(q)/q$ from being undercharged. The primitive-character restriction is likewise not cosmetic: imprimitive characters are induced from smaller conductors, and counting the same conductor at every multiple modulus duplicates information that should be charged only once. The principal character of conductor $1$ illustrates the obstruction. If a formulation counts all induced appearances without grouping by conductor, then the coefficient choice $a_n=1$ on $1\le n\le N$ contributes repeated copies of the same sum of size about $N$ from many moduli. The correct all-character versions therefore use conductor-aware weights or decompose characters by primitive conductor before applying the displayed inequality. The primitive statement above is the clean form needed for Bombieri-Vinogradov, where the conductor bookkeeping is absorbed before the large sieve is invoked.
The support interval of length $N$ is also a hypothesis of the estimate. The right-hand side measures the energy of coefficients on one interval, so $N$ must record the whole range over which the coefficients may be nonzero. If the same coefficient sequence were allowed to occupy many disjoint intervals without increasing $N$, translating a fixed block of nonzero coefficients would produce repeated independent contributions on the left while the right-hand side would still measure only one block. The term $Q^2+N$ contains both the strength and the limitation of the estimate. When $Q^2\ll N$, the inequality behaves almost like orthogonality over a family of size $Q^2$; when $Q^2$ exceeds $N$, the spacing of fractions rather than the length of the sequence controls the estimate. This is the analytic source of the square-root barrier that later appears in Bombieri-Vinogradov.
The large sieve is powerful for arbitrary coefficients, but the coefficients $\Lambda(n)$ still need more structure before the inequality can be applied with useful savings. This motivates a decomposition theorem for $\Lambda$ that separates sums with a short variable from genuinely bilinear sums. Vaughan's identity is the standard tool used for that purpose in this proof.
[quotetheorem:7192]
[citeproof:7192]
The point of the decomposition is structural rather than cosmetic. Without the cutoffs $U$ and $V$, the convolution formula for $\Lambda$ would mix very short, very long, and balanced variables in one expression, and the large sieve would be applied in ranges where either $Q^2$ or a divisor-type coefficient loss overwhelms the desired logarithmic saving. If, for example, $V$ were omitted, the part containing $\Lambda(m)$ would include both prime powers $m$ near $1$ and $m$ near $x$ in the same sum; the former gives no bilinear averaging, while the latter leaves the companion variable too short for the large sieve to gain anything. If $U$ is taken too large, the first divisor sum has too many possible $d$ and loses the logarithmic saving to divisor-type coefficients; if $U$ is taken too small, too much of $\mu*1=\varepsilon$ remains in the final coefficient and the Type II term is no longer balanced. Type I estimates exploit cancellation in characters after one variable has been fixed; Type II estimates exploit bilinear structure and Cauchy-Schwarz before applying the large sieve.
The identity itself does not create cancellation, and it does not estimate $\sum_{n\le x}\Lambda(n)\chi(n)$ by itself. Its role is to put the von Mangoldt weight into forms where the large sieve sees either a long one-dimensional sum or a balanced bilinear family. Together these estimates control the character sums that appear after orthogonality transforms the progression errors.
[explanation: Proof Architecture of Bombieri-Vinogradov]
The progression error is first written as
\begin{align*}
E(x;q,a)=\frac{1}{\varphi(q)}\sum_{\chi \pmod q,\, \chi\ne \chi_0}\overline{\chi(a)}\sum_{n\le x}\Lambda(n)\chi(n) + \text{small principal-character adjustment}.
\end{align*}
Thus an average over $q$ and a maximum over $a$ reduce to bounding many character sums with the von Mangoldt weight.
Vaughan's identity replaces $\Lambda(n)$ by sums whose coefficients have controlled divisor-type size and whose variables lie in ranges suited to the large sieve. For Type I sums, one variable is short, so after fixing it the remaining character sum has enough length for averaging over characters. For Type II sums, Cauchy-Schwarz separates the two variables, and the large sieve bounds the resulting second moments. The estimates close when $Q^2$ is no larger than $x$ up to logarithmic losses, which is the source of the distribution exponent $1/2$.
[/explanation]
This analytic input feeds back into the sieve in a simple way. A sieve of level $D$ can only be as strong as the distributional information available for the remainders $r_d$. For primes in arithmetic progressions, Bombieri-Vinogradov permits $D$ near $x^{1/2}$, and that is the unconditional range underlying the applications in Chapter 8.
Bombieri-Vinogradov gives the distributional range that makes the later applications possible, especially where prime tuples or almost-prime configurations are involved. The final chapter uses the sieve framework, the singular series, and the available level of distribution to explain how these ingredients combine in problems such as twin primes, Goldbach-type questions, and [Chen's theorem](/theorems/7195).
# 8. Applications to Twin Primes, Goldbach, and Almost-Prime Problems
The previous chapters supplied the ingredients: upper-bound sieves, the parity obstruction, the large sieve, and Bombieri--Vinogradov. The point here is not to prove the deepest modern forms of these theorems. Instead we isolate the standard sieve setup, the role of the singular series, and the way distribution estimates such as Brun--Titchmarsh and Bombieri--Vinogradov enter. This chapter therefore functions as a map from abstract sieve inequalities to concrete arithmetic applications.
## Prime Pairs and Admissible Tuples
The twin prime problem asks whether there are infinitely many $n$ such that both $n$ and $n+2$ are prime. A sieve cannot prove this directly, but it can count how many candidates survive all small prime obstructions and it gives the conjectural main local factor. The same setup applies to any finite set of shifts.
[definition: Admissible Tuple]
Let $\nu_1,\dots,\nu_k$ be distinct integers. The tuple $\mathcal H=\{\nu_1,\dots,\nu_k\}$ is admissible if for every prime $p$ the set of residue classes $\{-\nu_1,\dots,-\nu_k\}\pmod p$ does not contain all residue classes modulo $p$.
[/definition]
Admissibility is the local condition that no prime $p$ divides $\prod_{i=1}^k(n+\nu_i)$ for every integer $n$. For $\mathcal H=\{0,2\}$, the condition holds because modulo $2$ both shifts give the same forbidden residue class, while for odd $p$ only two residue classes are forbidden. To turn these local restrictions into a predicted global count, we need to multiply the local correction factors over all primes.
[definition: Singular Series of a Prime Tuple]
The singular series of a prime tuple is the map from admissible finite tuples of distinct integers to positive real numbers defined as follows. For an admissible tuple $\mathcal H=\{\nu_1,\dots,\nu_k\}$, let $\omega(p)$ be the number of distinct residue classes modulo $p$ occupied by $-\nu_1,\dots,-\nu_k$. Then
\begin{align*}
\mathfrak S(\mathcal H)=\prod_p \left(1-\frac{1}{p}\right)^{-k}\left(1-\frac{\omega(p)}{p}\right).
\end{align*}
[/definition]
The factor at $p$ compares the true local density of avoiding all forbidden classes modulo $p$ with the naive density expected if $k$ independent prime conditions were imposed. This product converges for admissible tuples because $\omega(p)=k$ for all sufficiently large $p$ and the logarithmic first-order terms cancel.
[example: Twin Prime Singular Series]
For $\mathcal H=\{0,2\}$, the forbidden residue classes modulo $p$ are $-\nu_1\equiv 0\pmod p$ and $-\nu_2\equiv -2\pmod p$. When $p=2$, these coincide because $-2\equiv 0\pmod 2$, so $\omega(2)=1$. When $p>2$, the classes $0$ and $-2$ are distinct modulo $p$, since $0\equiv -2\pmod p$ would imply $p\mid 2$, impossible for an odd prime. Thus $\omega(p)=2$ for every odd prime $p$.
Substituting $k=2$, $\omega(2)=1$, and $\omega(p)=2$ for $p>2$ into the definition of the singular series gives
\begin{align*}
\mathfrak S(\{0,2\})=\left(1-\frac12\right)^{-2}\left(1-\frac12\right)\prod_{p>2}\left(1-\frac1p\right)^{-2}\left(1-\frac2p\right).
\end{align*}
The prime $2$ factor is
\begin{align*}
\left(1-\frac12\right)^{-2}\left(1-\frac12\right)=\left(\frac12\right)^{-2}\cdot\frac12=4\cdot\frac12=2.
\end{align*}
For each odd prime $p$,
\begin{align*}
\left(1-\frac1p\right)^{-2}\left(1-\frac2p\right)=\left(\frac{p-1}{p}\right)^{-2}\cdot\frac{p-2}{p}.
\end{align*}
Since $\left(\frac{p-1}{p}\right)^{-2}=\left(\frac{p}{p-1}\right)^2=\frac{p^2}{(p-1)^2}$, this factor equals
\begin{align*}
\frac{p^2}{(p-1)^2}\cdot\frac{p-2}{p}=\frac{p(p-2)}{(p-1)^2}.
\end{align*}
Therefore
\begin{align*}
\mathfrak S(\{0,2\})=2\prod_{p>2}\frac{p(p-2)}{(p-1)^2}.
\end{align*}
This is the local correction factor in the Hardy--Littlewood prediction for twin primes, whose expected count up to $x$ is asymptotic to $\mathfrak S(\{0,2\})x/(\log x)^2$.
[/example]
The example identifies the correct local constant, but it does not yet give a theorem. The remaining obstruction is exactly the one Selberg's sieve is suited for: proving that both $n$ and $n+h$ avoid small prime factors gives an upper bound for prime pairs, even though it cannot prove that both numbers are prime. Applying the upper-bound sieve to the sequence $n(n+h)$ therefore gives the strongest conclusion available here, namely the Hardy--Littlewood order of magnitude as an upper bound with the singular series as the local factor.
[quotetheorem:7193]
[citeproof:7193]
This result should be read as a limit of the method as much as a success. The assumption that $h$ is even is forced by a local obstruction: if $h$ is odd, then for every integer $n$ one of $n$ and $n+h$ is even, so the only possible prime pair has one member equal to $2$ and there are only finitely many such cases. The condition $h\ne 0$ also matters, since $h=0$ would count primes once rather than a genuine two-prime pattern. In that degenerate case the expected order is
\begin{align*}
\frac{x}{\log x},
\end{align*}
not
\begin{align*}
\frac{x}{(\log x)^2}.
\end{align*}
For nonzero even $h$, the local obstructions are exactly the ones encoded in $\mathfrak S(\{0,h\})$, but the proof still does not distinguish the event that $n+h$ is prime from the event that $n+h$ has an even number of prime factors.
[example: Sifting Twin-Prime Candidates]
Let $z\le x^{1/2}$, and let $S(x,z)$ denote the number of integers $n\le x$ for which no prime $p<z$ divides $n(n+2)$. For a prime $p$, the divisibility condition $p\mid n(n+2)$ is equivalent to $n\equiv 0\pmod p$ or $n\equiv -2\pmod p$. Hence $\omega(2)=1$, because $0\equiv -2\pmod 2$, while $\omega(p)=2$ for every odd prime $p$, because $0\equiv -2\pmod p$ would imply $p\mid 2$.
The local density model therefore predicts
\begin{align*}
S(x,z)\approx x\prod_{p<z}\left(1-\frac{\omega(p)}p\right).
\end{align*}
Separating the prime $2$ from the odd primes gives
\begin{align*}
\prod_{p<z}\left(1-\frac{\omega(p)}p\right)=\frac12\prod_{2<p<z}\left(1-\frac2p\right).
\end{align*}
For each odd prime $p$,
\begin{align*}
1-\frac2p=\frac{p-2}{p}.
\end{align*}
Also
\begin{align*}
\left(1-\frac1p\right)^2\frac{p(p-2)}{(p-1)^2}=\frac{(p-1)^2}{p^2}\cdot\frac{p(p-2)}{(p-1)^2}=\frac{p-2}{p}.
\end{align*}
Thus
\begin{align*}
\prod_{p<z}\left(1-\frac{\omega(p)}p\right)=\frac12\prod_{2<p<z}\left(1-\frac1p\right)^2\frac{p(p-2)}{(p-1)^2}.
\end{align*}
Since
\begin{align*}
\mathfrak S(\{0,2\})=2\prod_{p>2}\frac{p(p-2)}{(p-1)^2},
\end{align*}
the finite product over $2<p<z$ is bounded above and below by positive constant multiples of $\mathfrak S(\{0,2\})$. By the *Mertens product estimate*,
\begin{align*}
\prod_{p<z}\left(1-\frac1p\right)^2\asymp \frac1{(\log z)^2}.
\end{align*}
Therefore the predicted sifted count has size
\begin{align*}
x\prod_{p<z}\left(1-\frac{\omega(p)}p\right)\asymp \mathfrak S(\{0,2\})\frac{x}{(\log z)^2},
\end{align*}
with constants independent of $x$ and $z$ in the usual sieve range. These are rough twin-prime candidates: surviving the sieve only says that $n$ and $n+2$ have no prime factor below $z$, not that both numbers are prime.
[/example]
## Brun--Titchmarsh and Primes in Arithmetic Progressions
Many sieve applications require bounds for primes in arithmetic progressions where the modulus is too large for the prime number theorem in arithmetic progressions to give a uniform asymptotic. The [Brun--Titchmarsh inequality](/theorems/7194) supplies a robust upper bound in this range. It is weaker than the expected asymptotic, but it is uniform and needs no exceptional-zero analysis.
[definition: Prime Counting in a Progression]
For integers $q\ge 1$ and $a$ with $\gcd(a,q)=1$, the prime-counting function in the progression $a\pmod q$ is the map
\begin{align*}
\pi(\cdot;q,a):[0,\infty)\to \mathbb N_0,
\qquad
\pi(x;q,a)=\#\{p\le x:p\text{ prime and }p\equiv a\pmod q\}.
\end{align*}
[/definition]
This notation packages a variable counting problem into a single function, so that uniformity in $q$ and $a$ can be stated cleanly. The central question is how large $\pi(x;q,a)$ can be when $q$ is comparable with a power of $x$. Dirichlet's theorem gives infinitude for fixed $q$, while the prime number theorem in arithmetic progressions gives asymptotics only with uniformity restrictions. We write $\varphi(q)=|\{1\le b\le q:\gcd(b,q)=1\}|$ for Euler's totient function.
[quotetheorem:7194]
[proofunderconstruction:7194]
The inequality is often used not as a theorem about primes for its own sake, but as a uniform input inside another sieve. The hypothesis $\gcd(a,q)=1$ is necessary because if $\gcd(a,q)>1$, then every term of the progression shares a fixed common divisor with $q$; for instance, the class $0\pmod q$ contributes only the prime divisors of $q$, so the usual density $1/\varphi(q)$ is not the right scale. The range $x>q+1$ keeps the logarithmic denominator positive. When $x$ is only slightly larger than $q$, the bound may be numerically weak because $\log(x/q)$ is small; this is a limitation of the stated uniform estimate, not evidence for an asymptotic formula. The constant $2$ also reflects the sieve square-root barrier: the prime number theorem in arithmetic progressions predicts a leading constant $1$ in ranges where a genuine asymptotic is available, but Brun--Titchmarsh is designed to remain valid far beyond those asymptotic ranges. Whenever a sequence contains a prime variable constrained by a congruence depending on a second variable, Brun--Titchmarsh prevents a small collection of moduli from contributing too many primes.
[example: Bounding Prime Fibres in a Congruence]
Fix $x\ge 4$, let $q\le x^{1/2}$, and let $a\pmod q$ be reduced. By *Brun--Titchmarsh*,
\begin{align*}
\#\{p\le x:p\equiv a\pmod q\}\le \frac{2x}{\varphi(q)\log(x/q)}.
\end{align*}
Since $q\le x^{1/2}$, we have $x/q\ge x^{1/2}$, and therefore
\begin{align*}
\log(x/q)\ge \log(x^{1/2})=\frac12\log x.
\end{align*}
Taking reciprocals gives
\begin{align*}
\frac{1}{\log(x/q)}\le \frac{2}{\log x}.
\end{align*}
Substituting this into the Brun--Titchmarsh bound yields the uniform estimate
\begin{align*}
\#\{p\le x:p\equiv a\pmod q\}\le \frac{4x}{\varphi(q)\log x}.
\end{align*}
Now suppose a sieve decomposition produces one reduced residue class $a_q\pmod q$ for each $q\le Q$, with $Q\le x^{1/2}$. Summing the preceding bound gives
\begin{align*}
\sum_{q\le Q}\#\{p\le x:p\equiv a_q\pmod q\}\le \sum_{q\le Q}\frac{4x}{\varphi(q)\log x}.
\end{align*}
Factoring out the terms independent of $q$ gives
\begin{align*}
\sum_{q\le Q}\#\{p\le x:p\equiv a_q\pmod q\}\le \frac{4x}{\log x}\sum_{q\le Q}\frac{1}{\varphi(q)}.
\end{align*}
Thus any available estimate for the average size of $1/\varphi(q)$ immediately converts Brun--Titchmarsh into a bound for the total contribution of all these prime fibres. This is useful precisely because it gives uniform upper control even when no asymptotic formula is known for the individual progressions.
[/example]
## Chen-Type Almost-Prime Conclusions
The parity barrier prevents a pure sieve from isolating primes in many lower-bound problems, but it can often isolate integers with at most two prime factors. Chen's theorem is the central model: the twin prime problem remains open, yet there are infinitely many primes $p$ for which $p+2$ is either prime or a product of two primes.
[definition: Almost Prime of Order Two]
An integer $m\ge 2$ is denoted $P_2$ if $m$ has at most two prime factors counted with multiplicity.
[/definition]
This definition is tailored to the point at which the sieve can still deliver a lower bound. The replacement of "prime" by "$P_2$" is not cosmetic; it is the precise relaxation that bypasses the obstruction created by parity.
[quotetheorem:7195]
The strategy is a direct continuation of the course's methods, with additional weighted-sieve and bilinear estimates used to make the lower bound positive.
[explanation: Proof Strategy for Chen's Theorem]
The starting sequence is $\mathcal A=\{p+2:p\le x,\ p\text{ prime}\}$. A lower-bound sieve removes small prime divisors from $p+2$, but the sequence is indexed by primes, so the required distribution estimates concern primes in arithmetic progressions. Bombieri--Vinogradov supplies average distribution up to level $x^{1/2}$, while switching arguments and bilinear forms handle additional ranges needed to count semiprime outcomes. The sieve weights are arranged so that survivors either have no small prime factor or have a controlled factorisation with two prime factors. A positive lower bound of order $x/(\log x)^2$ for such primes $p$ remains after subtracting the exceptional configurations.
[/explanation]
Chen's theorem does not assert that $p+2$ is prime infinitely often; it allows $p+2$ to be a product of two primes, counted with multiplicity. This is a concrete weakening of the twin-prime target: twin primes would require both $p$ and $p+2$ to be prime, while Chen's conclusion keeps the first number prime and relaxes only the second. The theorem therefore belongs to the sieve tradition, but it also marks the point where the available distribution information stops short of detecting prime parity exactly.
[example: Setting Up the Sieve for $N=p+m$]
Let $N$ be a large even integer and write $N=p+m$, where $p$ is prime and $m$ is intended to be $P_2$. The sequence to be sifted is
\begin{align*}
\mathcal A=\{N-p:p\le N,\ p\text{ prime}\}.
\end{align*}
For a positive integer $d$, the elements of $\mathcal A$ divisible by $d$ are counted by
\begin{align*}
\#\{m\in\mathcal A:d\mid m\}=\#\{p\le N:p\text{ prime and }d\mid N-p\}.
\end{align*}
The congruence $d\mid N-p$ means $N-p\equiv 0\pmod d$, and adding $p$ to both sides gives
\begin{align*}
N\equiv p\pmod d.
\end{align*}
Equivalently,
\begin{align*}
p\equiv N\pmod d.
\end{align*}
Thus, when $\gcd(N,d)=1$, this count is exactly the prime-counting function in one reduced residue class:
\begin{align*}
\#\{m\in\mathcal A:d\mid m\}=\pi(N;d,N).
\end{align*}
If $\gcd(N,d)>1$, then any prime $p\equiv N\pmod d$ is divisible by $\gcd(N,d)$, so such terms are exceptional local contributions coming from primes dividing $N$.
The sieve problem is therefore converted into a distribution problem for primes in the residue class $N\pmod d$ as $d$ varies. *Brun--Titchmarsh* gives an upper bound for each individual modulus $d$, while *Bombieri--Vinogradov* supplies the averaged error control over many moduli that a lower-bound sieve needs. This is the arithmetic reason Chen-type arguments study $N=p+m$ through primes in progressions rather than through $m$ directly.
[/example]
## Goldbach Through the Sieve Lens
Goldbach's problem asks whether every sufficiently large even integer is a sum of two primes. Sieve methods alone do not prove this binary form, but they give strong upper bounds for the number of representations and lower bounds when one prime is replaced by an almost prime. The problem is an ideal place to see how additive harmonic analysis supplies information that divisibility sieves do not contain.
[definition: Goldbach Representation Function]
The Goldbach representation function is the map
\begin{align*}
r:2\mathbb N_{\ge 2}\to \mathbb N_0,
\qquad
r(N)=\#\{p\le N:p\text{ prime and }N-p\text{ prime}\}.
\end{align*}
[/definition]
The expected main term is governed by local congruence restrictions in exactly the same way as prime-pair problems. For a fixed even $N$, the obstruction at primes dividing $N$ differs from the obstruction at primes not dividing $N$: modulo an odd prime $q\nmid N$, the two conditions $p\not\equiv 0\pmod q$ and $N-p\not\equiv 0\pmod q$ exclude two distinct classes, while for $q\mid N$ they exclude the same class. This residue-level distinction is the reason the Goldbach singular series depends on the prime divisors of $N$.
[definition: Goldbach Singular Series]
The Goldbach singular series is the map from even integers $N\ge 4$ to positive real numbers defined by
\begin{align*}
\mathfrak S(N)=2\prod_{p>2}\left(1-\frac{1}{(p-1)^2}\right)\prod_{\substack{p\mid N,\ p>2}}\frac{p-1}{p-2}.
\end{align*}
[/definition]
This factor measures the local density of pairs $(p,N-p)$ avoiding divisibility obstructions. It is larger when $N$ has many small odd prime factors, because the congruence $p\equiv N\pmod q$ interacts differently with the excluded class $p\equiv 0\pmod q$. Once the local factor is in place, the natural sieve question is whether it controls the representation function from above.
[quotetheorem:7196]
[citeproof:7196]
The upper bound has the same order as the Hardy--Littlewood prediction. The restriction to even $N$ is essential for the binary Goldbach problem: if $N>2$ is odd, then one of two prime summands must be $2$, so there are at most the exceptional representations $N=2+p$ rather than a family of size $N/(\log N)^2$. The local dependence in $\mathfrak S(N)$ records whether an odd prime $q$ divides $N$; if $q\mid N$, the two forbidden congruences for $p$ and $N-p$ collapse, while if $q\nmid N$ they are distinct. A matching lower bound for $r(N)$ is a different matter: a lower-bound sieve cannot distinguish primes from products of two primes sharply enough, and the binary additive structure requires circle-method estimates.
[quotetheorem:7197]
[citeproof:7197]
This theorem is a Chen-type conclusion, not a consequence of Bombieri--Vinogradov in isolation. Bombieri--Vinogradov supplies cancellation on average for primes in progressions up to level about $N^{1/2}$, which is enough to run the first lower-bound sieve but not enough to identify $N-p$ as prime. Without this level of distribution the accumulated remainder terms over the moduli $d$ would be comparable with the expected main term, so positivity could be lost before the almost-prime conclusion is reached. Even with Bombieri--Vinogradov, the parity obstruction leaves values of $N-p$ with two prime factors indistinguishable from primes at the level of ordinary sieve weights; Chen's switching and bilinear estimates are what keep the three-factor contribution under control.
[remark: Where Additive Methods Enter]
The circle method studies exponential sums over primes and can exploit cancellation in the equation $p_1+p_2=N$ directly. Sieve methods instead control divisibility by small primes and distribution in residue classes. The two viewpoints overlap in their singular series, but they obtain lower bounds from different sources: sieves use positivity and distribution estimates, while additive methods use Fourier cancellation on major and minor arcs.
[/remark]
## The Parity Obstruction and the Limits of Sieve Information
The applications in this chapter have a common shape: upper bounds for prime patterns are accessible, while lower bounds for prime patterns resist direct sieving. The underlying reason is the parity obstruction. Sieve weights based only on divisibility by small primes have difficulty distinguishing integers with an odd number of prime factors from those with an even number.
[explanation: Parity Obstruction]
For an integer $n\ge 1$, let $\Omega(n)$ denote the number of prime factors of $n$ counted with multiplicity. A combinatorial sieve estimates a sifted set by combining divisibility information over squarefree moduli. Such data is insensitive to the Liouville-type sign $(-1)^{\Omega(n)}$ unless additional cancellation is available. As a result, a sieve that proves a lower bound for numbers with no small prime factors often also admits semiprimes or products with the wrong parity of prime factors. This is why Brun's method proves convergence of the twin-prime reciprocal sum and upper bounds for twin-prime candidates, while Chen's theorem reaches primes paired with $P_2$ values rather than proving twin primes.
[/explanation]
The obstruction does not make sieve theory weak. It identifies the boundary between local divisibility information and genuinely prime-detecting information, echoing the parity principle from Chapter 5, and it explains why the strongest applications combine sieves with distribution theorems, bilinear forms, or additive harmonic analysis.
[example: Comparing Prime and Semiprime Detection]
Suppose $\mathcal A$ has the expected local density modulo every squarefree $d\le D$, so the sieve has reliable information about conditions of the form $d\mid a$ for $a\in\mathcal A$. If $a$ survives sifting by all primes $p<z$, then
\begin{align*}
p\mid a \text{ and } p<z \quad \text{never occur.}
\end{align*}
Equivalently, every prime divisor of $a$ is at least $z$.
This condition does not force $a$ to be prime. If $a=qr$ with primes $q,r\ge z$, then $a$ also survives the sieve, because neither prime factor is below $z$. More generally, if
\begin{align*}
a=q_1q_2\cdots q_j
\end{align*}
with all $q_i$ prime and all $q_i\ge z$, then no prime $p<z$ divides $a$, so $a$ is counted by the same sifted set. The sieve condition therefore separates integers according to the size of their prime factors, not according to the exact value of $\Omega(a)$.
The distinction between primes and semiprimes is a parity distinction in this notation: a prime has $\Omega(a)=1$, while a product of two primes has $\Omega(a)=2$. Divisibility data modulo squarefree $d$ records which small primes divide $a$, but once all prime factors are larger than $z$, the same local data is compatible with $\Omega(a)=1$, $\Omega(a)=2$, and larger values. Thus replacing a prime target by a $P_2$ target is stable in Chen-type arguments: the sieve can force the absence of small prime factors and then control the remaining factorization range, but it cannot by itself isolate the odd-parity case $\Omega(a)=1$.
[/example]
The chapter closes the course by showing the sieve method at full strength and at its natural boundary. Singular series encode the local constraints with remarkable precision, upper-bound sieves give the correct scale for prime patterns, and average distribution theorems push lower bounds into the almost-prime range. The remaining gap to twin primes and binary Goldbach is not a failure of setup; it is the place where the sieve must be supplemented by deeper information about primes.
## Beyond and Connections
Sieve methods sit between elementary multiplicative number theory and the analytic theory of primes in arithmetic progressions. The exact identities and Brun-type inequalities in the early chapters connect directly to multiplicative functions, Mobius inversion, Euler products, and Tauberian estimates. The Selberg sieve chapters point toward quadratic forms, Hilbert-space duality, and large-sieve inequalities, while the final applications connect the local singular-series viewpoint with additive problems such as Goldbach and prime pairs.
The natural next step is to study distribution theorems for primes in arithmetic progressions in more depth. Bombieri--Vinogradov supplies the average uniformity that makes Chen-type lower bounds possible, but it still stops at the square-root barrier. Stronger conjectural input, such as Elliott--Halberstam type distribution, would change what the sieve can prove. On the additive side, the circle method studies the same singular series from a Fourier-analytic viewpoint and explains why binary Goldbach needs information beyond divisibility by small primes.
From the structural side, the parity obstruction is the main lesson to carry forward. A sieve can encode local exclusions and optimize weights with great precision, but ordinary divisor-sum data alone cannot reliably distinguish primes from products of two primes. This is why modern arguments combine sieves with bilinear forms, dispersion estimates, exponential sums, or additive harmonic analysis: each supplies information that pure local divisibility data does not contain.
## References
- Viggo Brun, "Le crible d'Eratosthene et le theoreme de Goldbach", Videnskabsselskabets Skrifter, 1920.
- Heini Halberstam and Hans-Egon Richert, Sieve Methods, Academic Press, 1974.
- Henryk Iwaniec and Emmanuel Kowalski, Analytic Number Theory, American Mathematical Society, 2004.
- Hugh L. Montgomery and Robert C. Vaughan, Multiplicative Number Theory I: Classical Theory, Cambridge University Press, 2006.
- J. R. Chen, "On the representation of a large even integer as the sum of a prime and the product of at most two primes", Scientia Sinica, 1973.
Contents
- Introduction
- The Problem Sieve Methods Address
- Inclusion-Exclusion as the Starting Point
- Truncation and the Shape of a Sieve
- Sieve Dimension and Local Obstructions
- The Main Course Themes
- A Working Notational Summary
- 1. Sifting Problems and Inclusion-Exclusion
- Sifted Sets and Local Densities
- Exact Inclusion-Exclusion and the Need for Truncation
- The Legendre Sieve and the Square-Root Barrier
- The Fundamental Sieve Problem
- 2. Eratosthenes and Brun's Pure Sieve
- Eratosthenes as Counting by Deletion
- Brun's Alternating Truncation
- Twin Primes and the Singular Series
- What Brun's Pure Sieve Achieves
- 3. The Combinatorial Sieve and Dimension One
- The Sieve Axioms
- Upper and Lower Combinatorial Weights
- Rosser-Iwaniec Weights In Dimension One
- The Fundamental Lemma In Dimension One
- What This Chapter Provides For Later Sieves
- 4. Selberg's Upper Bound Sieve
- From Inclusion-Exclusion to Quadratic Weights
- The Selberg Optimization Problem
- Dimension and the Size of the Main Term
- Prime Tuples and Twin Prime Upper Bounds
- 5. The Linear Sieve and Almost Primes
- The Dimension-One Sieve Problem
- The Linear Sieve Functions
- The Parity Problem
- Almost Primes from the Lower-Bound Sieve
- The Linear Sieve as a Template
- 6. The Large Sieve
- Additive Spacing and the Hilbert-Space Inequality
- Duality and the Operator Viewpoint
- The Multiplicative Large Sieve
- Arithmetic Progressions and Discrepancy Estimates
- 7. Level of Distribution and Bombieri-Vinogradov
- Error Terms in Arithmetic Progressions
- Bombieri-Vinogradov as an Averaged GRH
- Mean Squares and the Barban-Davenport-Halberstam Theorem
- The Large Sieve and Vaughan's Identity
- 8. Applications to Twin Primes, Goldbach, and Almost-Prime Problems
- Prime Pairs and Admissible Tuples
- Brun--Titchmarsh and Primes in Arithmetic Progressions
- Chen-Type Almost-Prime Conclusions
- Goldbach Through the Sieve Lens
- The Parity Obstruction and the Limits of Sieve Information
- Beyond and Connections
- References
Analytic Number Theory II: Sieve Methods
Content
Problems
History
Created by admin on 6/15/2026 | Last updated on 6/15/2026
Prerequisites (0/4 completed)
Log in to track your prerequisite progress.
Prerequisites Graph
Interactive dependency map showing prerequisite concepts
Loading dependency graph...
Theorem
Definition
Current
Requires
Rate this page
★
★
★
★
★
Poor
Excellent