Number theory is the study of the integers and their fundamental properties, a field that has captivated mathematicians for centuries with both its elegant theorems and its deep connections to practical problems in cryptography and computation. This course explores the classical foundations of the subject, beginning with the Euclidean algorithm — a deceptively simple procedure that reveals profound insights into divisibility and the structure of the integers. From there, the course develops the theory of congruences, which provides a systematic way to study divisibility properties and forms the algebraic backbone for understanding modular arithmetic. The interplay between these elementary concepts and more advanced topics illustrates how basic observations about integers lead to surprisingly rich mathematical structures.
The course progressively builds toward understanding two central questions that have driven number theory for millennia: which numbers are prime, and how are the primes distributed among the integers? Quadratic residues and binary quadratic forms provide powerful tools for addressing these questions, revealing subtle patterns in which numbers can be expressed as sums of squares and how primes factor in quadratic extensions of the integers. These chapters form the classical heart of number theory, where specific calculations and concrete examples intertwine with abstract theory. Continued fractions offer another perspective on approximating real numbers by rationals, with applications to both theoretical questions in Diophantine approximation and practical algorithms.
The final chapters shift toward computational and algorithmic aspects, culminating in primality testing and factorisation methods that have become essential in cryptography and computer science. These computational topics are not mere applications of earlier theory but represent a modern dimension of classical problems: given a large integer, how can we determine whether it is prime, and if not, how can we find its factors? Throughout the course, the various threads — divisibility, congruences, quadratic forms, and the distribution of primes — reinforce each other, demonstrating that number theory is a unified subject where different techniques and perspectives illuminate the same fundamental phenomena.
# 1. Introduction
Number theory is one of the oldest branches of mathematics, concerned with the properties of the integers $\mathbb{Z} = \{0, \pm 1, \pm 2, \dots\}$ and the rationals $\mathbb{Q} = \left\{\frac{m}{n} \mid m, n \in \mathbb{Z},\ n \neq 0\right\}$. Despite — or perhaps because of — the elementary nature of its objects, number theory is a source of problems that are easy to state but extraordinarily difficult to resolve. It has always had an experimental character: one computes, notices patterns, formulates conjectures, and then struggles to prove them. The proofs, when they come, often require machinery from algebra, analysis, and geometry that seems far removed from the original question.
This chapter sets the scene by describing three representative open problems. Their purpose is not to provide content for the course proper, but to illustrate the character of the subject: accessible statements, deep structure, and connections to the rest of mathematics. The bulk of the course then develops the foundational tools — divisibility, congruences, quadratic reciprocity, arithmetic of number fields — that underpin modern number theory.
## Three Open Problems
The following problems have been chosen to illustrate both the richness of number theory and the fact that its hardest questions resist elementary attack.
[motivation]
### Why These Problems?
Number theory would be easy to underestimate. The integers are the first mathematical objects we encounter, and the questions one can ask about them seem simple: which integers are sums of two squares? Are there infinitely many twin primes? Yet centuries of effort by some of the finest mathematicians have not resolved these questions. Part of what makes them so hard is that the integers have two interacting structures — additive and multiplicative — and managing both simultaneously is the central technical challenge. The three problems below each highlight a different facet of this tension.
[/motivation]
### The Congruent Number Problem
Here is a problem a medieval scholar could have posed: take a right-angled triangle with rational side lengths — the simplest kind of triangle, the one every school pupil knows. What areas can it have? The surprising answer is: we do not know. Determining which positive integers can arise as the area of such a triangle is one of the oldest unsolved problems in number theory.
[definition: Congruent Number]
A positive integer $N$ is a **congruent number** if there exists a right-angled triangle with all three sides of rational length and area equal to $N$.
[/definition]
For instance, $N = 6$ is congruent, witnessed by the 3-4-5 right triangle (area $= \frac{1}{2} \cdot 3 \cdot 4 = 6$). But determining which $N$ are congruent is far from straightforward; the problem reduces to asking whether a certain elliptic curve has a rational point of infinite order, which connects it to deep questions in algebraic geometry.
[remark: Congruent Number Conjecture]
It is conjectured that every positive integer $N$ of the form $8n+5$, $8n+6$, or $8n+7$ (for $n \geq 0$) is a congruent number. This is a consequence of the Birch–Swinnerton-Dyer conjecture, one of the Millennium Prize Problems, which relates the arithmetic of elliptic curves to the behaviour of their associated $L$-functions. The general problem of determining which integers are congruent remains open.
[/remark]
The congruent number problem is a vivid example of a pattern that recurs throughout these notes: a question about discrete objects (integers, triangles with rational sides) turns out to be controlled by the continuous geometry of an algebraic curve. The bridge between the two worlds is the machinery of elliptic curves, which the course does not develop fully, but which will appear in context as the theory of number fields is built up. The next problem shows a different face of the same tension.
### The Twin Prime Conjecture
Look at the primes: $2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, \dots$ The gaps between consecutive primes seem to grow on average, yet pairs separated by just 2 keep appearing: $(3,5)$, $(5,7)$, $(11,13)$, $(17,19)$, $(29,31)$. Does this ever stop, or do such pairs recur infinitely often? The answer is unknown.
[definition: Twin Prime]
A **twin prime** is a prime $p$ such that $p+2$ is also prime. A **twin prime pair** is a pair $(p, p+2)$ with both entries prime.
[/definition]
The twin prime conjecture asserts that there are infinitely many twin prime pairs. A landmark result of Zhang (2013) established for the first time that there exist infinitely many pairs of primes with bounded gap — that is, there exists a finite $H$ such that $p_{n+1} - p_n \leq H$ for infinitely many $n$. Subsequent work of Maynard and the Polymath8b project refined this to a gap of at most 246, giving the current best bound: there are infinitely many primes $p$ such that the interval $[p, p+246]$ contains another prime.
[remark: Why Gaps Are Hard]
The difficulty is that the distribution of primes, while understood statistically via the prime number theorem, has a highly irregular fine structure. The additive and multiplicative properties of the integers interact in ways that are difficult to control simultaneously. To see why a naive approach fails, consider trying to force two nearby integers to both be prime: primes (apart from 2) are odd, so $p$ and $p+2$ are both odd — parity is not an obstruction. But among any three consecutive odd numbers one is divisible by 3, and sieve methods formalize the difficulty of avoiding all such divisibilities simultaneously for gaps as small as 2. The Brun sieve shows the sum $\sum_{(p, p+2) \text{ twin}} \left(\frac{1}{p} + \frac{1}{p+2}\right)$ converges — a much weaker statement than infinitude, and one that sieves can prove — but establishing infinitude itself remains out of reach.
[/remark]
The result of Zhang and Maynard represents the first genuine asymptotic progress on prime gaps in decades. Its proof uses a quantitative form of the large sieve and Fourier analysis over the integers — tools developed later in the course. The third problem brings yet a different kind of machinery into play: complex analysis.
### The Riemann Hypothesis and the Prime Counting Function
How are the primes distributed among the integers? If you count up to a large number $x$, roughly how many primes do you find? The answer involves one of the deepest unsolved problems in mathematics.
The **prime counting function** $\pi: \mathbb{Z}_{>0} \to \mathbb{Z}_{\geq 0}$ is defined by $\pi(x) = \#\{p \leq x : p \text{ prime}\}$. Early data suggests $\pi(x)$ grows like $x / \log x$: for $x = 10^6$ there are $78{,}498$ primes, while $x/\log x \approx 72{,}382$. Yet a better approximation exists.
[quotetheorem:1692]
[citeproof:1692]
The use of $\mathrm{li}(x)$ rather than the cruder $x/\log x$ is not cosmetic: $\mathrm{li}(x) - x/\log x \sim x/(\log x)^2$, which grows without bound, so the two approximations diverge significantly for large $x$. The logarithmic integral implicitly sums infinitely many correction terms that $x/\log x$ ignores, and the prime number theorem in the form $\pi(x) \sim \mathrm{li}(x)$ is strictly stronger.
It is important to note what the prime number theorem, in this form, does not provide: there is no effective error bound. The statement $\pi(x) \sim \mathrm{li}(x)$ gives the leading-order asymptotic, but non-vanishing of $\zeta$ on $\operatorname{Re}(s) = 1$ alone is not enough to extract a quantitative rate of convergence. Obtaining an explicit bound on $|\pi(x) - \mathrm{li}(x)|$ requires additional input — namely, a zero-free region for $\zeta$ that extends into the critical strip. Without such quantitative information, the theorem tells us how primes are distributed in the limit but not how quickly the limit is approached.
A quantitative form of the error is equivalent to the Riemann hypothesis.
[quotetheorem:1693]
[citeproof:1693]
The Riemann hypothesis is a Millennium Prize Problem, and its resolution would have sweeping consequences throughout number theory and beyond.
[explanation: The Role of Complex Analysis]
For $\operatorname{Re}(s) > 1$, define $\zeta: \{s \in \mathbb{C} : \operatorname{Re}(s) > 1\} \to \mathbb{C}$ by $\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$. This series extends to a meromorphic function on all of $\mathbb{C}$ with a simple pole at $s = 1$. The connection between $\zeta(s)$ and the distribution of primes comes from the Euler product identity, stated next.
[/explanation]
The Euler product expresses $\zeta$ as an infinite product indexed by primes, turning the additive structure of $\sum 1/n^s$ into a multiplicative one. It is this bridge between addition (summing over all positive integers) and multiplication (indexing over primes) that makes zeta a tool for prime-counting at all.
[quotetheorem:1694]
[citeproof:1694]
This identity encodes the fundamental theorem of arithmetic (unique factorisation) in analytic form: the product over all primes converges to the same function as the sum over all integers.
The restriction $\operatorname{Re}(s) > 1$ is essential on both sides. The series $\sum 1/n^s$ converges absolutely precisely when $\operatorname{Re}(s) > 1$; the product converges (in the sense required for the identity to hold) because its convergence is equivalent to absolute convergence of $\sum p^{-s}$, which in turn is controlled by the absolute convergence of the series over $n$. On the boundary $\operatorname{Re}(s) = 1$ the series diverges (it contains the harmonic series at $s = 1$), and the product becomes conditionally non-convergent. This failure reflects that the identity depends fundamentally on unique factorisation in $\mathbb{Z}$. In rings of integers where unique factorisation fails — for instance $\mathbb{Z}[\sqrt{-5}]$, where $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ admits two distinct prime factorisations — the naive analogue of the Euler product breaks down, and restoring it requires passing to prime ideals rather than prime elements. The zeros of $\zeta(s)$ in the critical strip $0 < \operatorname{Re}(s) < 1$ govern oscillatory error terms in $\pi(x)$. This is why the Riemann hypothesis — a statement about where these zeros can lie — controls the quality of the prime number theorem approximation.
To understand why $\sqrt{x}$ is the critical threshold in the error bound, consider what would happen if $\zeta$ had a zero at $s_0 = 1/2 + \delta + it_0$ for some $\delta > 0$. The explicit formula for $\pi(x)$ contains a term $x^{s_0}/s_0$ for each non-trivial zero $s_0$; such a zero would contribute an error of size $x^{1/2 + \delta}$, larger than $\sqrt{x} \log x$. The Riemann hypothesis is precisely the assertion that no such zeros exist, which caps all error contributions at $O(\sqrt{x} \log x)$. Unconditionally, zero-free regions for $\zeta$ yield only weaker error bounds of the form $O(x \exp(-c\sqrt{\log x}))$, which is larger than any fixed power of $x$ but smaller than $x$ itself.
[example: Why the Error Bound Cannot Be Trivially Improved]
The bound $O(\sqrt{x} \log x)$ in the Riemann hypothesis equivalent is essentially sharp. Littlewood proved in 1914 that the error $\pi(x) - \mathrm{li}(x)$ changes sign infinitely often, and quantitative lower bounds (Ingham, Skewes) show it grows at least as fast as a constant times $\sqrt{x}(\log \log \log x)/\log x$ for infinitely many $x$. In particular, the naive hypothesis $\pi(x) < \mathrm{li}(x)$ for all large $x$, suggested by every computer calculation ever performed (Gauss and others believed it), is false. This shows that the $\sqrt{x}$ threshold in the RH-equivalent is not a conservative estimate: the true fluctuations are of exactly this order of magnitude, and no unconditional method can improve it without resolving the location of the zeros.
[/example]
## Sophistication Behind Simple Statements
Why do these three problems, none of them using advanced notions in their statement, require such heavy machinery? A medieval scholar could pose the congruent number problem; a schoolchild can state the twin prime conjecture; anyone who can count can ask about the distribution of primes. Yet the tools that have made progress — elliptic curves, sieve theory, complex analysis of zeta — lie far from the elementary language of the questions themselves. There is a pattern worth naming: the congruent number problem requires the theory of elliptic curves, the prime number theorem requires complex analysis, and twin primes require sieve theory. Number theory has never been self-contained in method; it has always drawn on whatever machinery is powerful enough.
[remark: Scope of This Course]
This course develops the classical foundations of number theory, progressing from divisibility and congruences through quadratic reciprocity and the structure of $(\mathbb{Z}/n\mathbb{Z})^\times$, to the arithmetic of number fields, $p$-adic numbers, and the proof of the prime number theorem. Along the way, algebraic and analytic tools are introduced as they are needed. The three open problems above will serve as reference points — partial results will appear as the course develops the relevant machinery.
[/remark]
The chapter that follows begins with the basic language of divisibility and primes — the definitions that make precise what it means for one integer to divide another, and what distinguishes a prime from a composite. These are elementary concepts, but stating them carefully is the first step towards the remarkable theorems that follow.
With the fundamentals of number theory established, we turn to the algorithmic heart of the subject: Euclid's elegant method for finding greatest common divisors, which illuminates the structure of integers and naturally leads to questions of unique factorisation.
# 2. Euclid's Algorithm and Factoring
This chapter lays the algebraic groundwork for all of number theory that follows. The central question is deceptively simple: given two integers, what is their greatest common divisor, and how do we find it efficiently? Euclid's algorithm answers this in polynomial time, and the structure it reveals — that every set of integer linear combinations is a principal ideal — has consequences that reach from basic divisibility all the way to the unique factorisation of integers into primes. We close the chapter by examining the computational landscape of factoring, which turns out to be far harder than computing GCDs, and prove Euclid's theorem that primes are infinite.
## The Division Algorithm and the Ideal Structure of $\mathbb{Z}$
Every calculation in number theory ultimately rests on the ability to divide with remainder. The division algorithm makes this precise.
[quotetheorem:725]
[citeproof:725]
Two remarks on the hypotheses are in order. The requirement $b > 0$ is essential: if $b = 0$ the inequality $0 \leq r < b$ becomes nonsensical, and for $b < 0$ the correct analogue replaces $b$ with $|b|$ and demands $0 \leq r < |b|$. Uniqueness of $(q, r)$ also follows from the argument: if $(q', r')$ is another pair satisfying the conditions, then $b(q - q') = r' - r$, so $b$ divides $r - r'$; since $|r - r'| < b$, this forces $r = r'$ and hence $q = q'$.
When $r = 0$, we say $b$ divides $a$ and write $b \mid a$; otherwise we write $b \nmid a$.
The division algorithm is the engine behind a beautiful structural fact about $\mathbb{Z}$. This fact is special to $\mathbb{Z}$: in other rings, not every ideal is generated by a single element. For example, in the polynomial ring $\mathbb{Z}[x]$, the ideal $(2, x) = \{2f(x) + xg(x) \mid f, g \in \mathbb{Z}[x]\}$ is not principal — no single polynomial generates it. Similarly, in $\mathbb{Z}[\sqrt{-5}]$, the ideal $(2, 1 + \sqrt{-5})$ is not principal. The fact that $\mathbb{Z}$ avoids these complications is a non-trivial structural property. Given integers $a_1, \dots, a_n$ not all zero, form the set of all integer linear combinations:
\begin{align*}
I = \{\lambda_1 a_1 + \dots + \lambda_n a_n \mid \lambda_i \in \mathbb{Z}\}.
\end{align*}
Students of Galois/Ring/Module theory will recognise this as the ideal of $\mathbb{Z}$ generated by $\{a_1, \dots, a_n\}$. The key structural fact is that $\mathbb{Z}$ is a principal ideal domain.
[quotetheorem:1695]
[citeproof:1695]
This theorem has an immediate payoff: the number $d$ is the greatest common divisor. The same argument fails in $\mathbb{Z}[x]$: the ideal $(2, x)$ contains no single generator because $\gcd(2, x) = 1$ as polynomials over $\mathbb{Q}$, yet $1 \notin (2, x)$ as an ideal in $\mathbb{Z}[x]$.
[definition: Greatest Common Divisor]
Given $a_1, \dots, a_n \in \mathbb{Z}$ not all zero, their **greatest common divisor** $\gcd(a_1, \dots, a_n)$, also written $(a_1, \dots, a_n)$, is the generator $d$ of the ideal $I$ above. It satisfies:
- $d \mid a_i$ for all $i$ (since each $a_i \in I = d\mathbb{Z}$);
- if $e \mid a_i$ for all $i$, then $e \mid d$ (since $d \in I$ is an integer linear combination of the $a_i$, so any common divisor of the $a_i$ also divides $d$).
[/definition]
The gcd captures precisely the arithmetic content shared by a collection of integers. It will reappear constantly: in the criterion for solvability of Diophantine equations, in the structure of units modulo $n$, and in the proof of unique factorisation.
[remark: GCD as Linear Combination]
The definition immediately implies that $(a_1, \dots, a_n) = \lambda_1 a_1 + \dots + \lambda_n a_n$ for some $\lambda_i \in \mathbb{Z}$. This is Bezout's identity. The definition is non-constructive; Euclid's algorithm provides the construction.
[/remark]
A first application is to linear Diophantine equations — equations in integers. The simplest case is two variables.
[quotetheorem:1696]
[citeproof:1696]
The solution is never unique when it exists. If $(x_0, y_0)$ is one solution, then every solution has the form
\begin{align*}
(x, y) = \left(x_0 + \frac{b}{d} t,\; y_0 - \frac{a}{d} t\right), \quad t \in \mathbb{Z},
\end{align*}
where $d = \gcd(a, b)$: the shifts $(b/d, -a/d)$ span the kernel of $(x, y) \mapsto ax + by$ in $\mathbb{Z}^2$. The same circle of ideas generalises to $n$ variables: the equation $a_1 x_1 + \dots + a_n x_n = c$ has an integer solution if and only if $\gcd(a_1, \dots, a_n) \mid c$, and the solution set is an affine lattice of rank $n - 1$.
The divisibility condition is sharp. For example, $2x + 4y = 3$ has no integer solution: $\gcd(2, 4) = 2$ does not divide $3$, so no combination of even multiples of $2$ and $4$ can sum to an odd number.
## Euclid's Algorithm
The definition of $\gcd$ via ideals tells us the gcd exists but gives no computational procedure. Euclid's algorithm fills this gap. The idea is to reduce the problem size repeatedly using the division algorithm.
[quotetheorem:1697]
[citeproof:1697]
A few remarks on edge cases and efficiency. The hypothesis $a > b > 0$ is mild: for general integer inputs, replace $a$ and $b$ by $|a|$ and $|b|$, and note that $\gcd(a, 0) = |a|$ so the algorithm need only be run when both inputs are nonzero. The termination bound is in fact quantitative — the number of steps is $O(\log b)$, with the worst case attained on consecutive Fibonacci pairs (Lame's theorem), which we illustrate below. Bezout coefficients come for free if we track linear combinations alongside the remainders.
[remark: Extended Euclidean Algorithm]
By the GCD-as-linear-combination result, $(a, b) = ra + sb$ for some $r, s \in \mathbb{Z}$. Euclid's algorithm can be run in "extended" form to compute these Bezout coefficients $r, s$ simultaneously, by tracking how each remainder is expressed as a linear combination of $a$ and $b$.
[/remark]
The extended algorithm is particularly useful for computing modular inverses: if $(a, n) = 1$, the Bezout relation $ra + sn = 1$ gives $ra \equiv 1 \pmod{n}$, so $r$ is the inverse of $a$ modulo $n$. This construction will reappear when we study the group of units $(\mathbb{Z}/n\mathbb{Z})^\times$ in Chapter 3.
[example: Extended Euclidean Algorithm for $\gcd(34, 25)$]
We compute $\gcd(34, 25)$ and find Bezout coefficients. Track the coefficients $(x, y)$ such that the current remainder equals $34x + 25y$:
\begin{align*}
34 &= 1 \cdot 25 + 9 & \text{gives remainder } 9 = 34 \cdot 1 + 25 \cdot (-1),\\
25 &= 2 \cdot 9 + 7 & \text{gives remainder } 7 = 34 \cdot (-2) + 25 \cdot 3,\\
9 &= 1 \cdot 7 + 2 & \text{gives remainder } 2 = 34 \cdot 3 + 25 \cdot (-4),\\
7 &= 3 \cdot 2 + 1 & \text{gives remainder } 1 = 34 \cdot (-11) + 25 \cdot 15.
\end{align*}
So $\gcd(34, 25) = 1$ and $1 = -11 \cdot 34 + 15 \cdot 25$.
[/example]
## Primes and the Fundamental Theorem of Arithmetic
It would be easy to assume every integer greater than 1 has only one prime factorisation — but that claim needs proof. What could go wrong if we simply wrote down factors in some order? Nothing in the definition of divisibility prevents two genuinely different lists of primes from multiplying to the same number; ruling that out requires an argument. The key ingredient turns out to be a special property of primes that composite numbers lack entirely.
[definition: Prime]
An integer $n > 1$ is **prime** if its only positive divisors are $1$ and $n$. An integer $n > 1$ that is not prime is **composite**; equivalently, $n$ is composite if there exist integers $1 < a, b < n$ with $n = ab$.
[/definition]
[quotetheorem:1698]
[citeproof:1698]
This is the key step in proving unique factorisation. Without it, uniqueness fails — and it does fail in rings where the analogous lemma breaks down. The canonical example is $\mathbb{Z}[\sqrt{-5}]$, where $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ are two genuinely different factorisations into irreducibles. Here $2$ divides the product $(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6$, yet $2 \nmid (1 + \sqrt{-5})$ and $2 \nmid (1 - \sqrt{-5})$ in $\mathbb{Z}[\sqrt{-5}]$, so the prime divisibility lemma fails for the element $2$. The reason is that the ideal $(2, 1 + \sqrt{-5})$ is not principal, so the Bezout argument used above cannot be carried out.
[quotetheorem:730]
[citeproof:730]
A few boundary cases deserve attention. The theorem is stated for $n > 1$: the integer $1$ is the empty product of primes (a valid but vacuous factorisation), and we exclude $1$ from being prime by convention so that factorisations are unique — if $1$ were prime, we could insert arbitrarily many factors of $1$. The integer $0$ has no prime factorisation at all (every prime divides $0$, so no finite product of primes equals $0$). Negative integers require a sign convention: one approach is to allow a factor of $-1$, though the theorem is most cleanly stated for positive integers. In rings that lack the prime divisibility lemma — such as $\mathbb{Z}[\sqrt{-5}]$ — the uniqueness proof breaks down at the cancellation step, and unique factorisation fails in those rings.
The fundamental theorem allows us to express the GCD in terms of prime factorisations. If $a = \prod_i p_i^{\alpha_i}$ and $b = \prod_i p_i^{\beta_i}$ (with $\alpha_i, \beta_i \geq 0$), then
\begin{align*}
(a, b) = \prod_i p_i^{\min(\alpha_i, \beta_i)}.
\end{align*}
This is an elegant formula, but it is not a practical algorithm: to use it, one must first factor $a$ and $b$, which as we discuss next is computationally much harder than computing GCDs.
## Polynomial Time and the Complexity of Factoring
The efficiency of Euclid's algorithm stands in stark contrast to the difficulty of factoring. To see how stark the contrast is, consider the most obvious factoring method: trial division. To factor a 100-digit integer $N$, trial division tests every candidate divisor up to $\sqrt{N} \approx 10^{50}$. At an optimistic $10^{10}$ elementary operations per second, this would take roughly $10^{40}$ seconds — unimaginably longer than the $\sim 4 \times 10^{17}$ seconds that have elapsed since the Big Bang. This is the obstacle: the naive approach scales with $\sqrt{N}$, which grows exponentially in the number of digits of $N$, while Euclid's algorithm handles inputs of the same size in milliseconds. To make this distinction precise — and to articulate what "polynomial" means in the first place — we need a notion of computational complexity.
[definition: Polynomial Time Algorithm]
An algorithm with a single integer input $N > 0$ is **polynomial time** if there exist constants $b, c > 0$ such that the algorithm completes in at most $c(\log N)^b$ elementary operations (such as digit additions or multiplications in a fixed base). For $k$ integer inputs $N_1, \dots, N_k$, the bound is $c(\max_i \log N_i)^b$.
[/definition]
The logarithm appears because $\log N$ is the number of digits of $N$ — the actual size of the input as written down. A polynomial-time algorithm is one whose running time grows polynomially in the input size.
[example: Fibonacci Numbers and the Worst Case for Euclid's Algorithm]
Fibonacci numbers exhibit the worst-case behaviour of Euclid's algorithm. Define $F_1 = 1$, $F_2 = 1$, and $F_{n+1} = F_n + F_{n-1}$. Computing $\gcd(F_{n+1}, F_n)$ takes exactly $n-1$ division steps, since each step produces
\begin{align*}
F_{n+1} = 1 \cdot F_n + F_{n-1},
\end{align*}
so the remainders walk back through the Fibonacci sequence: $F_n, F_{n-1}, \dots, F_2, F_1, 0$. Since $F_{n+1} \sim \varphi^{n+1}/\sqrt{5}$ where $\varphi = (1 + \sqrt{5})/2 \approx 1.618$, the number of steps is proportional to $\log F_{n+1}$. This is Lame's theorem: on any input $(a, b)$ with $a > b > 0$, Euclid's algorithm terminates in at most $\log_\varphi(b) + O(1)$ steps, and the Fibonacci pairs show this bound is sharp. Consequently, Euclid's algorithm runs in polynomial time: the number of steps is at most proportional to the number of digits of the input.
[/example]
### Factoring is Hard
What about factoring $N$ into prime factors? The most straightforward method is trial division: test all integers $d \leq \sqrt{N}$ as potential factors. If $N = pq$ with $p, q \approx \sqrt{N}$, this requires approximately $\sqrt{N}$ divisions. Since $\sqrt{N} = e^{(\log N)/2}$ grows exponentially in $\log N$, trial division is not polynomial time.
To see how dramatic this difference is: if $N$ has $100$ digits, $\sqrt{N}$ is approximately $10^{50}$. At $2^9 \approx 500$ divisions per second, trial division would take roughly $10^{50}/500 \approx 10^{47}$ seconds, or about $6 \times 10^{39}$ years. By contrast, Euclid's algorithm computes $\gcd$ of two 100-digit numbers in milliseconds.
There are substantially better factoring algorithms than trial division — we will encounter some later in the course — but as of today, no polynomial-time factoring algorithm is known. This presumed hardness is the basis for the security of RSA encryption and related cryptographic protocols. The current record is factoring a 232-digit number, achieved using thousands of computers over several months.
## The Infinitude of Primes
How do we know primes do not simply run out — that there is not some largest prime beyond which every integer is composite? The answer is one of the oldest arguments in mathematics.
[quotetheorem:724]
[citeproof:724]
[remark: Inefficiency of Euclid's Construction]
The proof is constructive — it explicitly produces a prime larger than any given $N$ — but the construction is inefficient. For example, taking $N = 1000$, the number $M$ already has over 400 digits. This approach cannot realistically generate large primes for practical use.
[/remark]
### Finding Large Primes in Practice
Euclid's proof that primes are infinite is existential: it guarantees a prime exists beyond any bound, but the construction produces unwieldy numbers. For practical applications — cryptography, pseudorandom generation, hash functions — one needs large primes that can actually be found and worked with. How do we locate them efficiently?
For numbers of reasonable size (fewer than 1000 digits), a practical approach is to choose large integers at random and test them for primality using one of the polynomial-time primality tests. Since primes are relatively common (by the prime number theorem, roughly $1/\log N$ of integers near $N$ are prime), a random search succeeds quickly.
For very large numbers of special forms, faster tests exist. The most notable example is Mersenne numbers of the form $N = 2^q - 1$. Dedicated algorithms such as the Lucas–Lehmer test apply to these and have been used to find the current record primes. At the time of Scholl's lectures, it had been established that $2^{74207281} - 1$ is prime.
Having mastered the mechanics of divisibility and factorisation, we now explore what happens when we identify integers that differ by a fixed modulus, developing the theory of congruences that unifies many number-theoretic phenomena under a common algebraic framework.
# 3. Congruences
Congruences are the central organisational tool of elementary number theory. The preceding chapter established the division algorithm and unique factorisation, giving us a way to decompose integers multiplicatively. This chapter asks a different question: what does arithmetic look like when we care only about remainders? We introduce the ring $\mathbb{Z}/n\mathbb{Z}$, study which elements are invertible, and develop two major structural theorems — the Chinese Remainder Theorem and the cyclicity of $(\mathbb{Z}/p\mathbb{Z})^\times$ — that together determine the arithmetic of residue rings completely.
## The Ring of Residues and Its Units
Given an integer $n \geq 1$, we want to do arithmetic modulo $n$: to treat two integers as the same whenever they differ by a multiple of $n$. The question is whether this is consistent — whether addition and multiplication respect this identification.
[definition: Congruence]
Fix an integer $n \geq 1$ called the modulus. For $a, b \in \mathbb{Z}$, we say $a$ is **congruent to $b$ modulo $n$**, written $a \equiv b \pmod{n}$, if $n \mid a - b$.
[/definition]
Congruence modulo $n$ is an equivalence relation on $\mathbb{Z}$. The equivalence class of $a$ is the coset $a + n\mathbb{Z} = \{a + kn : k \in \mathbb{Z}\}$. We write $\mathbb{Z}/n\mathbb{Z}$ for the set of all such cosets. Because $n\mathbb{Z}$ is an ideal in the ring $\mathbb{Z}$, addition and multiplication of cosets are well-defined: $(a + n\mathbb{Z}) + (b + n\mathbb{Z}) = (a+b) + n\mathbb{Z}$ and similarly for products. This makes $\mathbb{Z}/n\mathbb{Z}$ a commutative ring with identity, called the **quotient ring**.
The additive group $(\mathbb{Z}/n\mathbb{Z}, +)$ is cyclic of order $n$, generated by the class of $1$. But which elements are invertible under multiplication? The answer involves the gcd from Chapter 2.
[quotetheorem:1699]
[citeproof:1699]
This characterisation lets us define the most important substructure of $\mathbb{Z}/n\mathbb{Z}$.
[definition: Group of Units]
For $n > 1$, the **group of units** of $\mathbb{Z}/n\mathbb{Z}$ is
\begin{align*}
(\mathbb{Z}/n\mathbb{Z})^\times &= \{ a + n\mathbb{Z} : (a, n) = 1 \},
\end{align*}
with the operation of multiplication modulo $n$.
[/definition]
[definition: Euler $\varphi$-function]
The **Euler $\varphi$-function** is
\begin{align*}
\varphi(n) &\coloneqq \#(\mathbb{Z}/n\mathbb{Z})^\times = \#\{ a \in \mathbb{Z} : 1 \leq a \leq n,\, (a,n) = 1 \}.
\end{align*}
[/definition]
[remark: When is $\mathbb{Z}/n\mathbb{Z}$ a field?]
For $n > 1$, the ring $\mathbb{Z}/n\mathbb{Z}$ is a field iff every non-zero element has a multiplicative inverse, iff $n$ is prime, iff $\varphi(n) = n - 1$. When $n$ is composite, the non-unit, non-zero elements are precisely those classes $(a + n\mathbb{Z})$ with $1 < (a,n) < n$; these are zero-divisors, obstructing field structure.
[/remark]
With the group $(\mathbb{Z}/n\mathbb{Z})^\times$ in hand, the first major theorem of the chapter follows from a single application of Lagrange's theorem.
[quotetheorem:1700]
[citeproof:1700]
The hypothesis $(a,n) = 1$ is essential. If $a$ and $n$ share a common factor, say $p \mid (a,n)$, then $a^k$ is divisible by $p$ for every $k \geq 1$, so $a^{\varphi(n)}$ cannot be $\equiv 1 \pmod{n}$ (since $1$ is not divisible by $p$). The theorem says nothing about the size of the order of $a$ — it could be any divisor of $\varphi(n)$ — only that the $\varphi(n)$-th power returns to $1$.
[quotetheorem:731]
[citeproof:731]
Fermat's Little Theorem is the case $n = p$ prime and $\varphi(p) = p-1$. Its formulation $a^p \equiv a$ is pleasingly symmetric: it holds for all $a$, not just those coprime to $p$, because the prime case ensures $(a,p) \in \{1, p\}$ and the latter collapses both sides to zero. This theorem is the engine behind primality testing: if $a^{n-1} \not\equiv 1 \pmod{n}$ for some $a$ coprime to $n$, then $n$ is not prime.
The following lemma connects the $\varphi$-function to the structure of cyclic groups, and will be needed for the proof of cyclicity of $(\mathbb{Z}/p\mathbb{Z})^\times$.
[quotetheorem:1701]
[citeproof:1701]
## Simultaneous Congruences and the Chinese Remainder Theorem
Suppose we have two clocks: one ticks mod $10$ and one ticks mod $13$. If we know both readings, can we reconstruct the time mod $130$? What goes wrong if the moduli share a common factor?
The question of solving a system of simultaneous congruences with distinct moduli is one of the oldest in number theory. Brute-force search is feasible for small moduli, but the Chinese Remainder Theorem gives a complete, constructive solution.
[example: Simultaneous congruences mod 10 and 13]
Find all $x \in \mathbb{Z}$ with $x \equiv 7 \pmod{10}$ and $x \equiv 3 \pmod{13}$.
The strategy is to find "indicator" elements $u, v \in \mathbb{Z}$ satisfying
\begin{align*}
u &\equiv 1 \pmod{10}, & u &\equiv 0 \pmod{13}, \\
v &\equiv 0 \pmod{10}, & v &\equiv 1 \pmod{13}.
\end{align*}
Then $x = 7u + 3v$ automatically satisfies both congruences.
Since $(10, 13) = 1$, Bezout gives $r, s \in \mathbb{Z}$ with $10r + 13s = 1$. The Euclidean algorithm yields $r = 4$, $s = -3$. Setting $u = 13s = -39$ and $v = 10r = 40$, we compute:
\begin{align*}
x &= 7(-39) + 3(40) = -273 + 120 = -153 \equiv 107 \pmod{130}.
\end{align*}
The solution set is $\{x \in \mathbb{Z} : x \equiv 107 \pmod{130}\}$.
Note that the modulus of the combined solution is $130 = 10 \times 13$, the product of the two coprime moduli. This is not a coincidence.
[/example]
[quotetheorem:734]
[citeproof:734]
The pairwise coprimality hypothesis cannot be dropped. If $(m_1, m_2) = d > 1$, the system $x \equiv 0 \pmod{m_1}$, $x \equiv 1 \pmod{m_2}$ has no solution when $d \nmid (0 - 1)$, i.e. when $d \nmid 1$. In that case the two constraints are mutually inconsistent. More precisely, a system with moduli $m_1, m_2$ having $(m_1, m_2) = d$ is solvable iff $d \mid (a_1 - a_2)$, and the solution is then unique mod $\text{lcm}(m_1, m_2)$.
The Chinese Remainder Theorem has an elegant algebraic restatement. Define the product ring $\mathbb{Z}/m_1\mathbb{Z} \times \cdots \times \mathbb{Z}/m_k\mathbb{Z}$ with componentwise addition and multiplication.
[definition: Product Ring]
Given rings $R_1, \ldots, R_k$, their **product ring** is
\begin{align*}
R_1 \times \cdots \times R_k &= \{ (r_1, \ldots, r_k) : r_i \in R_i \}
\end{align*}
with componentwise operations: $(r_i) + (s_i) = (r_i + s_i)$ and $(r_i)(s_i) = (r_i s_i)$. The zero element is $(0,\ldots,0)$ and the identity is $(1,\ldots,1)$.
[/definition]
The Chinese Remainder Theorem, formulated in terms of product rings, becomes a structural statement: the residue ring modulo a product of pairwise coprime integers is canonically isomorphic to the product of the individual residue rings. This reformulation turns congruence arithmetic into componentwise arithmetic and exposes the full algebraic content of the theorem.
[quotetheorem:1702]
[citeproof:1702]
This isomorphism shows that $\mathbb{Z}/M\mathbb{Z}$ decomposes completely when $M$ is squarefree with coprime prime factors. When $M = p_1^{k_1} \cdots p_r^{k_r}$, the theorem applies with $m_i = p_i^{k_i}$ since distinct prime powers are always coprime.
An immediate consequence is a multiplicativity formula for $\varphi$.
[quotetheorem:1703]
[citeproof:1703]
## Multiplicative Functions and the Structure of $\varphi$
The multiplicativity of $\varphi$ established above is not a coincidence — it reflects a general pattern among number-theoretic functions. Many natural functions of $n$ depend only on the prime factorisation of $n$, and factor multiplicatively over coprime inputs.
[definition: Multiplicative Function]
A function $f : \mathbb{N} \to \mathbb{C}$ is **multiplicative** if
\begin{align*}
(m, n) = 1 \implies f(mn) = f(m)f(n).
\end{align*}
It is **totally multiplicative** if $f(mn) = f(m)f(n)$ for all $m, n \in \mathbb{N}$, without the coprimality condition.
[/definition]
The distinction matters: $\varphi$ is multiplicative but not totally multiplicative ($\varphi(p^2) = p(p-1) \neq (p-1)^2 = \varphi(p)^2$ in general). A multiplicative function is determined by its values on prime powers.
[example: Key Multiplicative Functions]
The following are multiplicative functions $\mathbb{N} \to \mathbb{C}$:
- $\varphi(n)$, the Euler phi-function,
- $\tau(n) = \#\{d \in \mathbb{N} : d \mid n\}$, the number of positive divisors,
- $\sigma(n) = \sum_{d \mid n} d$, the sum of positive divisors,
- more generally, $\sigma_k(n) = \sum_{d \mid n} d^k$ for any $k \geq 0$, with $\sigma_0 = \tau$ and $\sigma_1 = \sigma$.
The multiplicativity of these functions can be verified directly, or deduced from the lemma below.
[/example]
Many naturally occurring multiplicative functions are divisor sums of simpler multiplicative functions: $\tau(n) = \sum_{d \mid n} 1$, $\sigma(n) = \sum_{d \mid n} d$, and more generally $\sigma_k(n) = \sum_{d \mid n} d^k$. The next result explains why this operation preserves multiplicativity.
[quotetheorem:1704]
[citeproof:1704]
For instance, taking $f(n) = n^k$ (totally multiplicative) yields $g(n) = \sigma_k(n)$. The lemma will be used again in later chapters when we study Dirichlet series and the Riemann zeta function: the identity $\zeta(s)\zeta(s-k) = \sum_n \sigma_k(n) n^{-s}$ is a consequence of this divisor convolution structure.
We can now compute $\varphi$ explicitly.
[quotetheorem:1705]
[citeproof:1705]
The identity $\sum_{d \mid n} \varphi(d) = n$ has a direct combinatorial explanation: the $n$ cosets of $\langle 1 \rangle$ in $\mathbb{Z}/n\mathbb{Z}$ partition according to the order of their element; an element has order $d$ iff $d \mid n$ and $(a/d, n/d) = 1$, so there are $\varphi(d)$ elements of each order $d \mid n$.
## Polynomial Congruences and Roots Modulo Primes
How many solutions can a polynomial equation have modulo a prime? Over the integers or rationals, a degree-$n$ polynomial has at most $n$ roots. Does this bound survive when we reduce modulo $n$?
The answer depends critically on whether $n$ is prime. Over $\mathbb{Z}/8\mathbb{Z}$, the polynomial $f(X) = X^2 - 1$ has four roots: $1, 3, 5, 7$. This is a failure mode that cannot occur over a field, and isolating the condition that rules it out is the first task of this section.
We work with polynomial rings. For a commutative ring $R$ with $1$, the ring $R[X]$ consists of all formal polynomials $\sum_{i=0}^n a_i X^i$ with $a_i \in R$, with the usual addition and multiplication. The word "formal" matters: two polynomials in $R[X]$ are equal iff all their coefficients agree, regardless of whether they define the same function $R \to R$.
[example: Polynomial $\neq$ function in $\mathbb{Z}/p\mathbb{Z}$]
Let $p$ be prime and $R = \mathbb{Z}/p\mathbb{Z}$. The polynomial $f(X) = X^p - X \in R[X]$ is non-zero (it has degree $p$), but Fermat's Little Theorem says $f(a) = a^p - a = 0$ in $R$ for every $a \in R$. So $f$ and the zero polynomial define the same function $R \to R$, even though they are different elements of $R[X]$. This phenomenon cannot happen over an infinite integral domain.
[/example]
To make precise statements about roots of polynomials we need the basic structural apparatus: degree, polynomial division, and the notion of an integral domain. These are collected below.
[definition: Degree]
For $f = \sum_{i=0}^n a_i X^i \in R[X]$, the **degree** $\deg(f)$ is the largest index $i$ with $a_i \neq 0$, and $\deg(f) = -\infty$ if $f = 0$. We have $\deg(fg) \leq \deg(f) + \deg(g)$.
[/definition]
Polynomial long division works in $R[X]$ under a mild hypothesis on the leading coefficient of the divisor, and produces a unique quotient and remainder. This is the foundation for every result about roots that follows.
[quotetheorem:1706]
[citeproof:1706]
The leading coefficient unit condition is essential. In $\mathbb{Z}[X]$, we cannot generally divide by a polynomial whose leading coefficient is $2$, since $2$ has no inverse in $\mathbb{Z}$. Over fields — such as $\mathbb{Z}/p\mathbb{Z}$ for $p$ prime — every non-zero leading coefficient is a unit, so division always works.
[quotetheorem:1707]
[citeproof:1707]
The question of how many roots $f$ can have reduces to the algebraic condition that distinguishes $\mathbb{Z}/p\mathbb{Z}$ from $\mathbb{Z}/8\mathbb{Z}$: whether the ring has zero-divisors.
[definition: Integral Domain]
A nonzero commutative ring $R$ is an **integral domain** if for all $a, b \in R$, $ab = 0$ implies $a = 0$ or $b = 0$.
[/definition]
[example: Integral domains among residue rings]
$\mathbb{Z}$ and $\mathbb{Q}$ are integral domains. Among residue rings: $\mathbb{Z}/n\mathbb{Z}$ is an integral domain iff $n$ is prime. When $n = p$ is prime, $\mathbb{Z}/p\mathbb{Z}$ is in fact a field. When $n$ is composite, say $n = ab$ with $1 < a, b < n$, then $a$ and $b$ are non-zero elements of $\mathbb{Z}/n\mathbb{Z}$ with $ab \equiv 0 \pmod{n}$.
[/example]
The integral-domain hypothesis is exactly what is needed to recover the classical bound on the number of roots of a polynomial. Without it, the failure mode from the start of this section ($X^2 - 1$ having four roots in $\mathbb{Z}/8\mathbb{Z}$) is permitted.
[quotetheorem:1708]
[citeproof:1708]
This theorem fails over non-integral-domains. The polynomial $X^2 - 1 \in (\mathbb{Z}/8\mathbb{Z})[X]$ has roots $1, 3, 5, 7$: four roots for a degree-$2$ polynomial. The ring $\mathbb{Z}/8\mathbb{Z}$ has zero-divisors ($2 \cdot 4 = 0$), which is why the argument breaks down — from $(\beta - 1)(\beta + 1) = 0$ in $\mathbb{Z}/8\mathbb{Z}$ we cannot conclude $\beta = 1$ or $\beta = -1$.
For our main application, the corollary restricts to prime moduli.
[quotetheorem:1709]
[citeproof:1709]
The condition "$p$ does not divide the leading coefficient" cannot be dropped: if $f(X) = pX + 1$, then $\bar{f}(X) = 1$ has no roots, but $f$ itself behaves quite differently over $\mathbb{Z}/p^2\mathbb{Z}$.
[example: Wilson's Theorem via Polynomial Counting]
Take $p$ prime and set
\begin{align*}
f(X) &= X^{p-1} - 1 - \prod_{a=1}^{p-1}(X - a) \in (\mathbb{Z}/p\mathbb{Z})[X].
\end{align*}
By Fermat's Little Theorem, $a^{p-1} \equiv 1 \pmod{p}$ for $a = 1, \ldots, p-1$, so every such $a$ is a root of $f$. But $\deg f \leq p - 2$, and Lagrange's theorem says $f$ can have at most $p - 2$ roots — yet we found $p - 1$ roots. The only resolution is $f \equiv 0$ in $(\mathbb{Z}/p\mathbb{Z})[X]$, i.e. $f$ is identically zero mod $p$.
Evaluating the identity $X^{p-1} - 1 = \prod_{a=1}^{p-1}(X - a)$ at $X = 0$ gives
\begin{align*}
-1 &\equiv \prod_{a=1}^{p-1}(0 - a) = (-1)^{p-1}(p-1)! \pmod{p}.
\end{align*}
Since $p$ is odd, $(-1)^{p-1} = 1$, yielding **Wilson's Theorem**: $(p-1)! \equiv -1 \pmod{p}$.
This is a boundary-revealing example: the polynomial counting argument does not merely give a bound, but forces an identity when the bound is saturated.
[/example]
## Primitive Roots and the Cyclicity of $(\mathbb{Z}/p\mathbb{Z})^\times$
We know $(\mathbb{Z}/n\mathbb{Z}, +)$ is cyclic. The multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$ need not be cyclic for general $n$ — for $n = 8$, the group $\{\pm 1, \pm 3\}$ has exponent $2$ and is not cyclic. The question is whether the multiplicative group mod $p$, for $p$ prime, admits a single generator.
The existence of such a generator is far from obvious. The additive group is cyclic because $\mathbb{Z}/n\mathbb{Z}$ is generated by $1$ as a ring. No such structural reason gives us a generator of the multiplicative group — we must actually prove it.
[quotetheorem:1710]
[citeproof:1710]
The proof uses Lagrange's polynomial theorem in an essential way: without the bound on the number of roots of $X^d - 1$, we could not force $N_d \leq \varphi(d)$. This is the point at which the primality of $p$ enters — over $\mathbb{Z}/n\mathbb{Z}$ for composite $n$, the polynomial root bound fails, and indeed $(\mathbb{Z}/n\mathbb{Z})^\times$ need not be cyclic.
[definition: Primitive Root]
A **primitive root modulo $p$** is a generator of the cyclic group $(\mathbb{Z}/p\mathbb{Z})^\times$, that is, an element $g \in \mathbb{Z}$ with $(g, p) = 1$ such that every unit mod $p$ is a power of $g$.
[/definition]
By the cyclicity theorem, primitive roots always exist modulo a prime $p$, and there are exactly $\varphi(p-1)$ of them.
[example: Primitive root modulo 19]
Take $p = 19$. The order of $2$ in $(\mathbb{Z}/19\mathbb{Z})^\times$ divides $p - 1 = 18 = 2 \cdot 3^2$. We check:
- $2^6 \equiv 64 \equiv 7 \not\equiv 1 \pmod{19}$, so $\text{ord}(2) \nmid 6$,
- $2^9 \equiv 512 \equiv -1 \pmod{19}$, so $\text{ord}(2) \nmid 9$.
The only divisors of $18$ not dividing $6$ or $9$ is $18$ itself, so $\text{ord}(2) = 18$. Thus $2$ is a primitive root mod $19$.
[/example]
Primitive roots are both theoretically important and computationally significant — discrete logarithms base a primitive root are the foundation of many public-key cryptographic protocols. Yet many basic questions remain open:
**Artin's Conjecture**: For any integer $g \geq 2$ that is not $-1$ or a perfect square, there are infinitely many primes $p$ for which $g$ is a primitive root mod $p$. Even the case $g = 2$ is unsolved. Conditionally on the Generalised Riemann Hypothesis, Hooley proved the conjecture. Unconditionally, it is known (Heath-Brown, 1986) that at least one of $2, 3, 5$ is a primitive root for infinitely many primes.
**Size of the smallest primitive root**: The smallest primitive root mod $p$ tends to infinity with $p$. It is bounded above by $C p^{1/4 + \varepsilon}$ for any $\varepsilon > 0$, but is expected to be much smaller — conjecturally $O((\log p)^2)$.
## Primitive Roots Modulo Prime Powers
The cyclicity theorem settles the case of prime moduli. What about $(\mathbb{Z}/p^n\mathbb{Z})^\times$ for $n > 1$? For odd primes, the answer is the same.
Before the main theorem, we examine a potential obstacle: a primitive root mod $p$ might fail to be a primitive root mod $p^2$ because $g^{p-1} \equiv 1 \pmod{p^2}$. When this happens, the order of $g$ in $(\mathbb{Z}/p^2\mathbb{Z})^\times$ divides $p - 1$, which is too small to generate the group of order $p(p-1)$. The key lemma shows this is the only obstruction, and it can always be circumvented.
[quotetheorem:1711]
[citeproof:1711]
Part (i) fails for $p = 2$, $k = 1$: $(1 + 2)^2 = 9 \equiv 1 \pmod{8}$, but we would need $9 \equiv 1 + 4 = 5 \pmod{8}$. This is why $p = 2$ requires separate treatment.
[quotetheorem:1712]
[citeproof:1712]
[quotetheorem:1713]
[citeproof:1713]
This settles the cyclic structure for every odd prime power. The exceptional case $p = 2$ requires separate analysis, which we record for completeness below.
[remark: Complete Structure of $(\mathbb{Z}/2^n\mathbb{Z})^\times$]
The prime $p = 2$ is exceptional. We have already seen $(\mathbb{Z}/8\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$, which is not cyclic. For $n \geq 3$, $(\mathbb{Z}/2^n\mathbb{Z})^\times$ is generated by $-1$ (of order $2$) and $5$ (of order $2^{n-2}$), giving
\begin{align*}
(\mathbb{Z}/2^n\mathbb{Z})^\times &\cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2^{n-2}\mathbb{Z}.
\end{align*}
The Power Lifting Lemma fails at $p = 2$, $k = 1$ — this is the precise point where the proof breaks down and the group structure bifurcates.
[/remark]
Combining the cyclicity theorem for odd prime powers with the CRT ring isomorphism gives a complete description of all groups of units. The building blocks are exactly the cyclic groups identified above, assembled componentwise according to the prime factorisation of the modulus.
[remark: Structure of $(\mathbb{Z}/N\mathbb{Z})^\times$ in General]
For $N = \prod_{i=1}^r p_i^{k_i}$, the Chinese Remainder Theorem gives a ring isomorphism $\mathbb{Z}/N\mathbb{Z} \cong \prod_i \mathbb{Z}/p_i^{k_i}\mathbb{Z}$, which restricts to a group isomorphism
\begin{align*}
(\mathbb{Z}/N\mathbb{Z})^\times &\cong (\mathbb{Z}/p_1^{k_1}\mathbb{Z})^\times \times \cdots \times (\mathbb{Z}/p_r^{k_r}\mathbb{Z})^\times.
\end{align*}
Each factor is cyclic for odd $p_i$, and $(\mathbb{Z}/2^{k_i}\mathbb{Z})^\times \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2^{k_i - 2}\mathbb{Z}$ for $k_i \geq 3$. This completely classifies the group $(\mathbb{Z}/N\mathbb{Z})^\times$ for any $N$.
[/remark]
This classification has immediate consequences for the structure of the Dirichlet character group, which will appear prominently when we study $L$-functions in later chapters. A Dirichlet character mod $N$ is a group homomorphism $(\mathbb{Z}/N\mathbb{Z})^\times \to \mathbb{C}^\times$, and knowing the group structure determines the number and nature of all such characters.
Congruences provide a powerful language for studying divisibility, but certain questions — particularly whether an integer is a perfect square modulo a prime — deserve special attention, bringing us to the rich theory of quadratic residues and the quadratic reciprocity law.
# 4. Quadratic Residues
This chapter takes up a fundamental question about quadratic equations modulo a prime: given $a$ and $p$, does $x^2 \equiv a \pmod{p}$ have a solution? The answer organises itself around a clean algebraic invariant — the Legendre symbol — whose properties lead surprisingly quickly to one of the deepest theorems in elementary number theory, the Law of Quadratic Reciprocity. The chapter then extends these ideas to composite moduli via the Jacobi symbol, and closes with a glimpse of higher reciprocity laws and an analytic proof of the infinitude of primes $\equiv 3 \pmod{4}$.
## Quadratic Residues
Fix an odd prime $p$. The equation $x^2 \equiv a \pmod{p}$ can have zero, one, or two solutions depending on $a$. For $a \equiv 0 \pmod{p}$ there is exactly one solution, $x \equiv 0$. For $a \not\equiv 0 \pmod{p}$, if $x$ is a solution then so is $-x$; since $p$ is odd and $a \not\equiv 0$, we have $x \not\equiv 0$, so $x \not\equiv -x \pmod{p}$, giving either zero or two distinct solutions. This dichotomy — solvable or not, with no middle ground — is the starting point.
[definition: Quadratic Residue]
Let $p$ be an odd prime and $a \in \mathbb{Z}$ with $a \not\equiv 0 \pmod{p}$. We say $a$ is a **quadratic residue** mod $p$ if $x^2 \equiv a \pmod{p}$ is soluble, and a **quadratic non-residue** mod $p$ if not.
[/definition]
Equivalently, $a$ is a quadratic residue mod $p$ if and only if the class of $a$ in $(\mathbb{Z}/p\mathbb{Z})^\times$ is a square. Since $(\mathbb{Z}/p\mathbb{Z})^\times$ is a group of order $p-1$, this is a condition on the index of $a$ with respect to any primitive root.
[example: Quadratic Residues Mod 7]
Computing squares mod $7$:
\begin{align*}
\begin{array}{c|cccccc}
x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
x^2 \bmod 7 & 1 & 4 & 2 & 2 & 4 & 1
\end{array}
\end{align*}
The quadratic residues mod $7$ are $\{1, 2, 4\}$. Notice that squaring is exactly two-to-one: each residue appears as the square of exactly two elements ($x$ and $p-x$). This is not a coincidence — it is the content of the next lemma.
[/example]
The example above reveals a pattern: there are exactly $\frac{p-1}{2}$ quadratic residues. This is a general fact with two illuminating proofs.
[quotetheorem:1714]
[citeproof:1714]
The primitive root proof makes the structure especially transparent: the quadratic residues are precisely the even powers of $g$, and the non-residues are the odd powers. The oddness of $p$ is essential here: it ensures $p - 1$ is even, so the split between even and odd powers is a genuine dichotomy with equal halves. Were $p = 2$, the group $(\mathbb{Z}/2\mathbb{Z})^\times$ would be trivial and every element would be vacuously a square. From a group-theoretic perspective, the subgroup of quadratic residues is the unique index-$2$ subgroup of $(\mathbb{Z}/p\mathbb{Z})^\times$ — the kernel of the squaring-parity homomorphism — and its existence depends on $p - 1$ being divisible by $2$, i.e., on $p$ being an odd prime. This index-$2$ structure suggests encoding the residue/non-residue distinction as $+1/-1$ according to the parity of the discrete logarithm — which is exactly what the Legendre symbol does.
## The Legendre Symbol and Euler's Criterion
How should one efficiently determine whether a given $a$ is a quadratic residue mod $p$? Computing $x^2$ for all $x \in \{1, \ldots, \frac{p-1}{2}\}$ works in principle but is slow. The right tool is the Legendre symbol, which packages the answer as an arithmetic invariant.
[definition: Legendre Symbol]
For an odd prime $p$, the **Legendre symbol** $\left(\frac{\cdot}{p}\right) : \mathbb{Z} \to \{0, +1, -1\}$ is defined by
\begin{align*}
\left(\frac{a}{p}\right) = \begin{cases} 0 & \text{if } p \mid a, \\ +1 & \text{if } a \text{ is a quadratic residue mod } p, \\ -1 & \text{if } a \text{ is a quadratic non-residue mod } p. \end{cases}
\end{align*}
[/definition]
The Legendre symbol takes only three values, but it is far from a crude tool: Euler's criterion gives a closed-form formula for it in terms of modular exponentiation, making it computable in polynomial time.
[quotetheorem:1715]
[citeproof:1715]
The key point ensuring the congruence determines the symbol completely is that $0, +1, -1$ are three distinct integers modulo $p$ when $p > 2$. The practical upshot: computing $a^{\frac{p-1}{2}} \bmod p$ by repeated squaring — writing $\frac{p-1}{2}$ in binary, computing successive squares, and multiplying selected powers — determines $\left(\frac{a}{p}\right)$ in $O((\log p)^2)$ bit operations. This is polynomial time.
A first and important consequence of Euler's criterion is that the Legendre symbol is multiplicative in its numerator.
[quotetheorem:1716]
[citeproof:1716]
Multiplicativity has a clean group-theoretic restatement: the map $(\mathbb{Z}/p\mathbb{Z})^\times \to \{\pm 1\}$ sending $a \mapsto \left(\frac{a}{p}\right)$ is a group homomorphism. Its kernel is precisely the subgroup of quadratic residues, which has index $2$. As a consequence: the product of two residues is a residue; the product of a residue and a non-residue is a non-residue; and the product of two non-residues is a residue.
With multiplicativity in hand, one can evaluate Legendre symbols for specific values of $a$ as corollaries. The case $a = -1$ is both elegant and useful.
[quotetheorem:1717]
[citeproof:1717]
This has a beautiful consequence for the arithmetic of odd primes.
[quotetheorem:1718]
The forward direction uses $\left(\frac{-1}{p}\right) = +1$ to show $-1$ is a square modulo $p$, which in turn forces $p$ to factor nontrivially in the Gaussian integers as $p = (x + iy)(x - iy)$. A full proof requires unique factorisation in $\mathbb{Z}[i]$, developed in the chapter on algebraic number theory; the connection to higher reciprocity laws is discussed at the end of this chapter.
## Gauss's Lemma and the Legendre Symbol of 2
The Legendre symbol of $2$ modulo $p$ is not immediately accessible from Euler's criterion alone — computing $2^{\frac{p-1}{2}} \bmod p$ directly does not simplify to a formula in $p$ without more work. Gauss's Lemma provides a combinatorial approach that handles all cases at once.
The idea comes from imitating the proof of Fermat's Little Theorem. In that proof, one multiplies together all elements $\{1, 2, \ldots, p-1\}$ and observes that $\{a \cdot 1, a \cdot 2, \ldots, a \cdot (p-1)\}$ is the same set. Here, instead, one works with only the first half $\{1, \ldots, \frac{p-1}{2}\}$ and counts how many multiples $aj$ land in the upper half-interval $\{\frac{p+1}{2}, \ldots, p-1\}$ modulo $p$.
[quotetheorem:1719]
[citeproof:1719]
The lemma gives a combinatorial description of $\left(\frac{a}{p}\right)$: it is $+1$ or $-1$ according to whether an even or odd number of the products $a, 2a, \ldots, \frac{p-1}{2} \cdot a$ reduce to the upper half of $\{1, \ldots, p-1\}$ modulo $p$. Applied to $a = -1$, one finds $\mu = \frac{p-1}{2}$ (since $-j$ is always in the upper half), recovering the formula for $\left(\frac{-1}{p}\right)$. The more interesting case is $a = 2$.
[example: Legendre Symbol of 2 via Gauss's Lemma]
For $a = 2$, the set $\{2j : 1 \leq j \leq \frac{p-1}{2}\} = \{2, 4, \ldots, p-1\}$. A term $2j$ lands in $\{\frac{p+1}{2}, \ldots, p-1\}$ iff $j > \frac{p}{4}$. So $\mu = \#\{j \in \mathbb{Z} : \frac{p}{4} < j < \frac{p}{2}\} = \lfloor \frac{p}{2} \rfloor - \lfloor \frac{p}{4} \rfloor$.
Splitting into cases modulo $8$:
\begin{align*}
p = 8k+1 &\implies \mu = 4k - 2k = 2k, & (-1)^\mu &= +1, \\
p = 8k+3 &\implies \mu = (4k+1) - 2k = 2k+1, & (-1)^\mu &= -1, \\
p = 8k+5 &\implies \mu = (4k+2) - (2k+1) = 2k+1, & (-1)^\mu &= -1, \\
p = 8k+7 &\implies \mu = (4k+3) - (2k+1) = 2k+2, & (-1)^\mu &= +1.
\end{align*}
The pattern depends only on $p \bmod 8$, and $(-1)^\mu$ can be written as $(-1)^{\frac{p^2-1}{8}}$, since $8 \mid p^2 - 1$ for all odd $p$ and $16 \mid p^2 - 1$ iff $p \equiv \pm 1 \pmod{8}$.
[/example]
[quotetheorem:1720]
[citeproof:1720]
The formula $\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8}$ is natural rather than coincidental. The exponent $(p^2-1)/8$ is always an integer for odd $p$ (since $p^2 - 1 = (p-1)(p+1)$ is divisible by $8$ when $p$ is odd), and its parity is controlled entirely by $p \bmod 8$: writing $p = 8k + r$ with $r \in \{1, 3, 5, 7\}$, one checks $(p^2-1)/8 \equiv (r^2-1)/8 \pmod{2}$, which depends only on $r$. The oddness of $p$ is again essential: for $p = 2$ the formula gives $(4-1)/8$, which is not an integer, reflecting that the Legendre symbol is only defined for odd primes. As an application, primes of the form $p = 8k + 1$ satisfy $\left(\frac{2}{p}\right) = +1$, so there exists $u \in (\mathbb{Z}/p\mathbb{Z})^\times$ with $u^2 \equiv 2$, a fact useful in constructing elements of prescribed order in primality-testing algorithms.
For a general numerator $a \geq 3$, Gauss's Lemma still applies: one counts, for each $k$, the number of $j \in \{1, \ldots, \frac{p-1}{2}\}$ with $(k - \frac{1}{2})\frac{p}{a} < j < \frac{kp}{a}$. The sum over $k$ gives $\mu$. This analysis leads directly to the proof of Quadratic Reciprocity in the next section.
## Quadratic Reciprocity
The Legendre symbol of $a = 3, 5, 7, \ldots$ modulo $p$ depends on $p$ in a way that is not immediately obvious from the definition. What relates $\left(\frac{q}{p}\right)$ to $\left(\frac{p}{q}\right)$ for two distinct odd primes $p$ and $q$? Legendre observed empirically that the two symbols are almost always equal, with a precise sign correction. This is the Law of Quadratic Reciprocity, first proved by Gauss.
[quotetheorem:1721]
[citeproof:1721]
The proof has a striking geometric flavor: the set $S$ consists of lattice points in a rectangle, and the involution $(j,k) \mapsto (j', k')$ pairs up the points in $A$ with those in $B$, showing $\#A = \#B$ without computing either. The main term $\mu + \nu$ is determined by what remains.
Several remarks on the scope of the theorem are worth recording. First, the hypothesis that $p$ and $q$ are **distinct** odd primes is necessary. If $p = q$, the left side becomes $\left(\frac{p}{p}\right)^2 = 0$ while the right side need not be $0$, so the formula breaks down. If $p = 2$ or $q = 2$, the Legendre symbol $\left(\frac{2}{p}\right)$ is handled separately by the formula in the previous section, and the reciprocity law as stated does not apply. Second, the theorem tells us the **product** $\left(\frac{p}{q}\right)\left(\frac{q}{p}\right)$, not the individual values: knowing the product equals $+1$ tells us the two symbols agree, but not which value they take — one must reduce one of them by hand (as in the examples below) to pin down the answer. Third, the theorem concerns prime denominators; extending to composite moduli requires the Jacobi symbol, which carries the same reciprocity law but loses the direct connection to quadratic residuosity.
Quadratic Reciprocity transforms Legendre symbol computations into a recursive process resembling the Euclidean algorithm: one reduces $\left(\frac{q}{p}\right)$ to $\pm\left(\frac{p}{q}\right)$, then reduces the numerator modulo the new denominator, and repeats. The following examples show the pattern.
[example: Evaluating Legendre Symbols via Reciprocity]
**Example (i):** For $\left(\frac{3}{p}\right)$, apply reciprocity. Since $3 \equiv 3 \pmod{4}$, the sign depends on whether $p \equiv 3 \pmod{4}$:
\begin{align*}
\left(\frac{3}{p}\right) &= (-1)^{\frac{p-1}{2}} \left(\frac{p}{3}\right).
\end{align*}
Now $\left(\frac{p}{3}\right) = +1$ if $p \equiv 1 \pmod{3}$ and $-1$ if $p \equiv 2 \pmod{3}$. Combining conditions via CRT modulo $12$:
\begin{align*}
\left(\frac{3}{p}\right) = +1 &\iff p \equiv \pm 1 \pmod{12}, \\
\left(\frac{3}{p}\right) = -1 &\iff p \equiv \pm 5 \pmod{12}.
\end{align*}
**Example (ii):** Since $19 \equiv 73 \equiv 1 \pmod{4}$, reciprocity gives $\left(\frac{19}{73}\right) = \left(\frac{73}{19}\right) = \left(\frac{73 \bmod 19}{19}\right) = \left(\frac{16}{19}\right) = \left(\frac{4^2}{19}\right) = 1$.
**Example (iii):** $\left(\frac{34}{97}\right) = \left(\frac{2}{97}\right)\left(\frac{17}{97}\right) = (+1) \cdot \left(\frac{97}{17}\right) = \left(\frac{97 \bmod 17}{17}\right) = \left(\frac{12}{17}\right) = \left(\frac{3}{17}\right)\left(\frac{4}{17}\right) = \left(\frac{3}{17}\right)$. Since $17 \equiv 5 \pmod{12}$, we get $\left(\frac{3}{17}\right) = -1$, so $\left(\frac{34}{97}\right) = -1$.
[/example]
Notice that in Example (ii), knowing whether $73$ is prime was needed to apply reciprocity (which requires prime denominators). This is a practical obstacle: to compute $\left(\frac{7411}{9283}\right)$ by reducing the numerator step by step, one must verify primality of intermediate moduli and factor intermediate numerators. The Jacobi symbol in the next section removes this obstacle.
## The Jacobi Symbol and Reciprocity for Composite Moduli
The Legendre symbol requires a prime denominator. In practice — particularly in algorithms — one wants to evaluate something like $\left(\frac{a}{n}\right)$ for composite $n$ without factoring $n$ first. The Jacobi symbol extends the Legendre symbol to all odd positive integers, while preserving the key reciprocity law.
[definition: Jacobi Symbol]
For an odd integer $n \geq 1$ with prime factorisation $n = p_1 p_2 \cdots p_k$ (primes not necessarily distinct), the **Jacobi symbol** $\left(\frac{\cdot}{n}\right) : \mathbb{Z} \to \{0, +1, -1\}$ is defined by
\begin{align*}
\left(\frac{a}{n}\right) = \prod_{j=1}^k \left(\frac{a}{p_j}\right),
\end{align*}
where each factor on the right is a Legendre symbol. When $(a,n) > 1$ the product contains a factor $0$ so $\left(\frac{a}{n}\right) = 0$; when $n = 1$ the empty product is $1$.
[/definition]
The Jacobi symbol inherits many properties of the Legendre symbol, but with one critical caveat: $\left(\frac{a}{n}\right) = 1$ does not imply $a$ is a quadratic residue mod $n$. The symbol being $+1$ is necessary but not sufficient for solvability when $n$ is composite.
[example: Jacobi Symbol Without Quadratic Residuosity]
Consider $\left(\frac{2}{15}\right) = \left(\frac{2}{3}\right)\left(\frac{2}{5}\right) = (-1)(-1) = +1$. Yet $2$ is not a quadratic residue mod $3$ (since $1^2 \equiv 1$ and $2^2 \equiv 1$ mod $3$, neither gives $2$), so $2$ is not a square mod $15$ either. The Jacobi symbol is $+1$ because both Legendre factors happen to be $-1$, and their product is $+1$ — but the cancellation of signs obscures the obstruction.
For $n = pq$ with $p \neq q$ odd primes and $(a, pq) = 1$, the Chinese Remainder Theorem gives: $a$ is a square mod $pq$ iff $\left(\frac{a}{p}\right) = \left(\frac{a}{q}\right) = 1$. But $\left(\frac{a}{pq}\right) = 1$ iff $\left(\frac{a}{p}\right) = \left(\frac{a}{q}\right) = +1$ or $\left(\frac{a}{p}\right) = \left(\frac{a}{q}\right) = -1$. These are different conditions.
[/example]
The failure of the converse has cryptographic significance: determining whether $a$ is a square mod $n = pq$ — given only $\left(\frac{a}{n}\right) = 1$ — appears to require knowing the factorisation of $n$. This is the basis of certain cryptographic protocols, since factoring $n$ is believed to be computationally hard.
The Jacobi symbol satisfies formal analogues of all the Legendre symbol properties.
[quotetheorem:1722]
[citeproof:1722]
Properties (iii) and (iv) are remarkable: they extend the Legendre symbol formulas for $-1$ and $2$ to all odd $n$, replacing prime-specific conditions ($p \equiv 1 \pmod{4}$, $p \equiv \pm 1 \pmod{8}$) with the same exponent formulas applied to general odd $n$.
[quotetheorem:1723]
[citeproof:1723]
The Jacobi symbol reciprocity law is the key to an efficient algorithm: to compute $\left(\frac{a}{n}\right)$ one does not need to factor either $a$ or $n$. Instead, apply reciprocity to flip the symbol (adjusting signs by Properties (iii) and (iv)), reduce the new numerator modulo the new denominator, and repeat — exactly as in Euclid's algorithm. For instance:
\begin{align*}
\left(\frac{33}{73}\right) = \left(\frac{73}{33}\right) = \left(\frac{7}{33}\right) = \left(\frac{33}{7}\right) = \left(\frac{5}{7}\right) = -1,
\end{align*}
where the first flip uses $33 \equiv 1 \pmod{4}$ (no sign change), and subsequent steps reduce modulo the running denominator. At worst one must extract factors of $2$ using Property (iv) when the numerator is even, but no factorisation of large integers is required.
## Higher Reciprocity and the Broader Context
The Law of Quadratic Reciprocity asks about squares modulo primes. A natural question is: what happens for higher powers — cubes, fourth powers, and beyond?
For $p \equiv 2 \pmod{3}$, the map $x \mapsto x^3$ is a bijection on $(\mathbb{Z}/p\mathbb{Z})^\times$ (since $3 \nmid p-1$), so every element is a cube; the cubic analogue of residue/non-residue distinction is trivial. The interesting case is $p \equiv 1 \pmod{3}$, where $3 \mid p-1$ and one can define a cubic Legendre symbol taking values in $\{0, 1, \omega, \omega^2\}$, where $\omega$ is a primitive cube root of unity. A cubic reciprocity law holds, but its natural setting requires the Eisenstein integers $\mathbb{Z}[\omega]$. More generally, a reciprocity law for $n$-th powers (for any $n \geq 2$) was established as a crowning achievement of 19th-century algebraic number theory.
A striking consequence of the formula $\left(\frac{-1}{p}\right) = +1$ iff $p \equiv 1 \pmod{4}$ is the following chain of equivalences: for an odd prime $p$,
\begin{align*}
\left(\frac{-1}{p}\right) = 1 \iff p \equiv 1 \pmod{4} \iff p = x^2 + y^2 \text{ for some } x, y \in \mathbb{Z} \iff p = (x+iy)(x-iy) \text{ in } \mathbb{Z}[i].
\end{align*}
This describes exactly how $p$ factors in the Gaussian integers $\mathbb{Z}[i]$. Artin's Reciprocity Law — a highlight of early 20th-century mathematics — vastly generalises this: it tells us how rational primes factor in arbitrary number fields, and the answer is governed by the values of generalised characters (Hecke characters and Artin $L$-functions) that subsume all the classical reciprocity laws.
## An Analytic Proof of the Infinitude of Primes Congruent to 3 mod 4
The Legendre and Jacobi symbols can be used as arithmetic weights in Dirichlet series, giving an analytic approach to questions about the distribution of primes in residue classes. As a demonstration, we give a proof — using the Jacobi symbol for $\left(\frac{-1}{\cdot}\right)$ — that there are infinitely many primes $p \equiv 3 \pmod{4}$.
[quotetheorem:1724]
[citeproof:1724]
This argument is a precursor to Dirichlet's theorem on primes in arithmetic progressions: for any $a$ with $\gcd(a, q) = 1$, there are infinitely many primes $p \equiv a \pmod{q}$. The proof of Dirichlet's theorem uses $L$-functions built from all Dirichlet characters modulo $q$, not just the character $\left(\frac{-1}{\cdot}\right)$, and requires deeper analytic input (non-vanishing of $L(1, \chi)$ for $\chi \neq \chi_0$). The argument above captures the essential idea for the single case $q = 4$, $a = 3$.
The question of which numbers can be represented by quadratic expressions modulo a prime generalises naturally to asking which integers are represented by quadratic polynomials over the integers themselves, a problem that motivates the deeper study of binary quadratic forms.
# 5. Binary Quadratic Forms
Which integers can be written as a sum of two squares? The answer, it turns out, depends entirely on the prime factorisation of $n$ — specifically, on whether the primes $p \equiv 3 \pmod{4}$ divide $n$ to an even or odd power. This chapter develops the machinery needed to answer this question and its natural generalisations: the theory of binary quadratic forms. We classify forms up to a natural equivalence relation, define a canonical representative for each class, and count how many classes share a given discriminant. The final sections characterise which integers are properly represented by a given form, using Legendre symbols and Hensel's lemma as the key tools.
## Sums of Two Squares and the Problem of Representation
The motivating question is concrete: for which $n \geq 1$ does the equation $x^2 + y^2 = n$ have an integer solution?
[quotetheorem:1725]
[citeproof:1725]
This theorem captures the full arithmetic of $x^2 + y^2$. Euler showed that similar clean characterisations hold for $x^2 + 2y^2$ and $x^2 + 3y^2$. However, $x^2 + 6y^2$ is more subtle — it shares a discriminant with $2x^2 + 3y^2$, and the two forms are inequivalent, meaning the question of which $n$ each form represents cannot be separated by simple congruence conditions alone. The reason for this subtlety will emerge once we develop the class number.
The right level of generality is to study all expressions $f(x,y) = ax^2 + bxy + cy^2$ with integer coefficients and ask what sets of integers they represent. This is the theory of binary quadratic forms.
[definition: Binary Quadratic Form]
A **binary quadratic form (BQF)** is a polynomial $f(x,y) = ax^2 + bxy + cy^2$ with coefficients $a, b, c \in \mathbb{Z}$. We use the shorthand $(a, b, c)$ for this form.
[/definition]
[definition: Representation]
A BQF $f$ **represents** $n \in \mathbb{Z}$ if there exist $x, y \in \mathbb{Z}$ with $f(x,y) = n$.
[/definition]
Every BQF $(a, b, c)$ can be written in matrix form:
\begin{align*}
ax^2 + bxy + cy^2 = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.
\end{align*}
This matrix representation is the bridge between the arithmetic of the form and linear algebra over $\mathbb{Z}$.
[example: Forms of Discriminant $-4$ and $-16$]
The forms $(1, 0, 1)$ and $(4, 12, 10)$ are related by the change of variables $X = 2x + 3y$, $Y = y$: indeed $(4,12,10)(x,y) = (2x+3y)^2 + y^2 = (1,0,1)(X,Y)$. So $g$ sends $(x,y)$ to $(X,Y) = (1,0,1)(2x+3y, y)$.
However, the change of variables matrix $\begin{pmatrix} 2 & 3 \\ 0 & 1 \end{pmatrix}$ has determinant $2$, so this is not an invertible integer substitution. As a consequence, $g$ represents only even integers (since $4x^2 + 12xy + 10y^2 \equiv 0 \pmod{2}$), while $f = x^2 + y^2$ represents odd numbers like $1$ and $5$. The two forms are not equivalent in the sense that matters.
[/example]
This example motivates the precise notion of equivalence that preserves the representation set.
## Equivalence of Forms and the Group $\mathrm{SL}_2(\mathbb{Z})$
How should we relate two forms that represent the same integers via an invertible integer change of variables?
[definition: Unimodular Substitution and Equivalence]
A **unimodular substitution** is a change of variables of the form
\begin{align*}
X = \alpha x + \gamma y, \quad Y = \beta x + \delta y
\end{align*}
with $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ and $\alpha\delta - \beta\gamma = 1$. In matrix notation, $\begin{pmatrix} X \\ Y \end{pmatrix} = A^\top \begin{pmatrix} x \\ y \end{pmatrix}$ where $A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z})$.
Two BQFs $f$ and $g$ are **equivalent**, written $f \sim g$, if $f(X, Y) = g(x, y)$ for some unimodular substitution; equivalently, if there exists $A \in \mathrm{SL}_2(\mathbb{Z})$ such that the matrix of $f$ equals $A (\text{matrix of } g) A^\top$.
[/definition]
The key point about the determinant condition $\alpha\delta - \beta\gamma = 1$ is that $A^{-1}$ also has integer entries, so the substitution is genuinely two-sided: if $f \sim g$ then $g \sim f$.
[remark: Equivalence as a Group Action]
The set $\mathrm{SL}_2(\mathbb{Z}) = \left\{ A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \,\middle|\, \alpha, \beta, \gamma, \delta \in \mathbb{Z},\ \alpha\delta - \gamma\beta = 1 \right\}$ is a group under matrix multiplication. It acts on the set of BQFs by
\begin{align*}
A : f(x,y) \longmapsto f(\alpha x + \gamma y,\, \beta x + \delta y).
\end{align*}
In terms of the associated symmetric matrix $M_f$, this sends $M_f \mapsto A M_f A^\top$. The equivalence classes of BQFs are precisely the orbits of this action. The identity $I_2$ fixes every form, and associativity $(AB)f = A(Bf)$ follows from $(AB)M_f(AB)^\top = A(BM_fB^\top)A^\top$.
[/remark]
Since $A^{-1} \in \mathrm{SL}_2(\mathbb{Z})$ whenever $A \in \mathrm{SL}_2(\mathbb{Z})$, equivalent forms represent the same integers — and more: if $(x,y)$ is a solution for $f$, the image under $A^\top$ gives a solution for $g$. This preservation property makes equivalence the right notion for studying representation sets.
## The Discriminant as an Invariant
Equivalent forms represent the same integers, so any invariant of the form must be shared by equivalent forms. The most important invariant is the discriminant.
[definition: Discriminant]
The **discriminant** of the BQF $f = (a, b, c)$ is
\begin{align*}
\operatorname{disc}(f) = b^2 - 4ac.
\end{align*}
This equals $-4 \det M_f$ where $M_f = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$ is the associated symmetric matrix.
[/definition]
[example: Computing Discriminants]
\begin{align*}
\operatorname{disc}(1, 0, 1) &= 0 - 4 = -4, \\
\operatorname{disc}(4, 12, 10) &= 144 - 160 = -16.
\end{align*}
[/example]
[quotetheorem:1726]
[citeproof:1726]
The converse of this invariance fails: the discriminant does not determine the equivalence class. The forms $f = (1, 0, 6) = x^2 + 6y^2$ and $g = (2, 0, 3) = 2x^2 + 3y^2$ both have discriminant $-24$. But $f(1, 0) = 1$ while $g(x,y) \geq 2$ for all $(x,y) \neq (0,0)$, so $f$ represents $1$ and $g$ does not. Thus $f \not\sim g$. This phenomenon — multiple inequivalent classes sharing a discriminant — is measured by the class number, introduced later.
Before passing to the classification, we characterise which integers can be discriminants.
[quotetheorem:1727]
[citeproof:1727]
## Positive Definite Forms and Definiteness
We have been treating BQFs purely algebraically, but their geometry depends crucially on whether they take all positive values, all negative values, or both signs.
[definition: Definite and Indefinite Forms]
A real quadratic form $f(\vec{x}) = \sum_{1 \leq i \leq j \leq n} a_{ij} x_i x_j$ with $a_{ij} \in \mathbb{R}$ is:
- **positive definite** if $f(\vec{x}) > 0$ for all $\vec{x} \in \mathbb{R}^n \setminus \{0\}$,
- **negative definite** if $f(\vec{x}) < 0$ for all $\vec{x} \in \mathbb{R}^n \setminus \{0\}$,
- **indefinite** if there exist $\vec{x}, \vec{x}' \in \mathbb{R}^n$ with $f(\vec{x}) > 0 > f(\vec{x}')$.
[/definition]
For binary forms, the discriminant completely determines the type.
[quotetheorem:1728]
[citeproof:1728]
This result reveals a key structural feature: the sign of the discriminant is the dividing line between two qualitatively different theories. Positive definite forms (discriminant negative, leading coefficient positive) have bounded integer representations $f(x,y) = n$ — the level set is an ellipse, so only finitely many integer points lie on it. Indefinite forms (discriminant positive) can have infinitely many integer solutions to $f(x,y) = n$, requiring different methods (continued fractions, Pell equations).
[remark: Sign of Coefficients Does Not Determine Type]
The signs of the coefficients $a$, $b$, $c$ alone do not determine definiteness. The form $(1, 3, 1)$ has $d = 9 - 4 = 5 > 0$, so it is indefinite despite all coefficients being positive. Conversely, $(1, -1, 2)$ has $d = 1 - 8 = -7 < 0$, so it is positive definite despite $b < 0$.
[/remark]
From this point on we restrict to **positive definite** BQFs: those with $a, c > 0$ and $d = b^2 - 4ac < 0$. The goal is to identify a canonical representative in each equivalence class.
## Reduced Forms and the Reduction Algorithm
Within each equivalence class of positive definite BQFs, which form is the "simplest"? Two elementary unimodular substitutions generate all of $\mathrm{SL}_2(\mathbb{Z})$ and give an effective reduction procedure.
[example: Reducing $(10, 34, 29)$]
The form $(10, 34, 29)$ has discriminant $d = 34^2 - 4 \cdot 10 \cdot 29 = 1156 - 1160 = -4$. We aim to find an equivalent form with smaller coefficients.
Replacing $x$ by $x + \lambda y$ gives $(a, b, c) \sim (a, b + 2\lambda a, \lambda^2 a + \lambda b + c)$. Taking $\lambda = -1$ repeatedly:
\begin{align*}
(10, 34, 29) \sim (10, 14, 5) \sim (10, -6, 1).
\end{align*}
Applying the substitution $(x, y) \mapsto (y, -x)$, i.e. $S : (a,b,c) \mapsto (c, -b, a)$:
\begin{align*}
(10, -6, 1) \sim (1, 6, 10) \sim (1, 4, 5) \sim (1, 2, 2) \sim (1, 0, 1).
\end{align*}
The form $(1, 0, 1) = x^2 + y^2$ is the simplest representative of this class.
[/example]
The two fundamental substitutions used above are:
\begin{align*}
T_\pm = \begin{pmatrix} 1 & 0 \\ \pm 1 & 1 \end{pmatrix} &: (a,b,c) \longmapsto (a, b \pm 2a, a \pm b + c), \\
S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} &: (a,b,c) \longmapsto (c, -b, a).
\end{align*}
Applying $T_\pm$ repeatedly can reduce $|b|$ without changing $a$. Applying $S$ swaps $a$ and $c$, reducing the leading coefficient. Together they guarantee termination.
[definition: Reduced Form]
A positive definite BQF $(a, b, c)$ is **reduced** if
\begin{align*}
-a < b \leq a < c \quad \text{or} \quad 0 \leq b \leq a = c.
\end{align*}
In either case, $|b| \leq a \leq c$.
[/definition]
The two cases handle the boundary: when $a < c$ we allow $b$ up to $a$ (but not $b = -a$ since that is equivalent to $b = a$), and when $a = c$ we require $b \geq 0$.
[quotetheorem:1729]
[citeproof:1729]
This proves existence of a reduced representative. The proof also shows that $S$ and $T_\pm$ generate $\mathrm{SL}_2(\mathbb{Z})$: any unimodular substitution can be written as a product of these elementary ones.
## Finiteness and the Reduced Form Bounds
The reduction algorithm finds a reduced form, but how many reduced forms can there be for a given discriminant?
[quotetheorem:1730]
[citeproof:1730]
[remark: Finiteness of Reduced Forms]
For fixed $d$, the bound $|b| \leq a \leq \sqrt{|d|/3}$ leaves only finitely many choices for $(a, b)$, and then $c = (b^2 - d)/(4a)$ is determined. So there are only finitely many reduced positive definite forms of discriminant $d$.
[/remark]
[example: Reduced Forms of Discriminant $-4$]
We need $|b| \leq a \leq \sqrt{4/3} < 2$ and $b \equiv 0 \pmod{2}$. So $a = 1$ and $b = 0$, giving $c = (0 + 4)/4 = 1$. The only reduced positive definite form of discriminant $-4$ is $(1, 0, 1) = x^2 + y^2$.
This single form will be used momentarily to prove the hard direction of the sum of two squares theorem: if $p \equiv 1 \pmod{4}$, then $p = x^2 + y^2$.
[/example]
Having established finiteness, we can prove that the reduced form in each equivalence class is unique.
## Uniqueness of the Reduced Representative
The reduction algorithm produces a reduced form from any input, but different orderings of moves might plausibly yield different reduced representatives. Is the reduced form attached to an equivalence class unique, or does the choice depend on the path taken? Settling this requires a second invariant: the smallest positive integers that the form represents primitively. We begin with the notion of proper representation, which is stronger than mere representation.
[definition: Proper Representation]
A BQF $f$ **properly represents** $n$ if there exist $x, y \in \mathbb{Z}$ with $f(x, y) = n$ and $\gcd(x, y) = 1$.
[/definition]
Proper representation is preserved under equivalence: if $(x, y)$ is a primitive pair and the substitution has determinant $1$, then the image $(X, Y)$ also satisfies $\gcd(X, Y) = \gcd(x, y) = 1$. (The inverse substitution is also integral, so the $\gcd$ is unchanged.)
[quotetheorem:1731]
[citeproof:1731]
With this ordering of represented values in hand, uniqueness follows.
[quotetheorem:1732]
[citeproof:1732]
[remark: The Motivation for the Definition of Reduced]
The definition of "reduced" was engineered to make this uniqueness theorem true. The two cases in the definition ($a < c$ versus $a = c$) are exactly what is needed to break the symmetry $b \mapsto -b$ and ensure that each equivalence class has exactly one representative.
[/remark]
[example: Reduced Forms of Discriminant $-24$]
We need $|b| \leq a \leq \sqrt{8} < 3$ and $b \equiv 0 \pmod{2}$.
For $a = 1$: $b = 0$, $c = (0 + 24)/4 = 6$. Form: $(1, 0, 6)$.
For $a = 2$: $b \in \{0, \pm 2\}$. For $b = \pm 2$: $c = (4 + 24)/8 = 28/8 \notin \mathbb{Z}$. For $b = 0$: $c = 3$. Form: $(2, 0, 3)$.
So there are exactly two reduced forms, $(1, 0, 6) = x^2 + 6y^2$ and $(2, 0, 3) = 2x^2 + 3y^2$. As noted at the start, they are inequivalent despite sharing discriminant $-24$.
[/example]
## Primes Represented as Sums of Two Squares
The work above now pays off. We return to the promise from the first section and prove the "serious" direction of the sum of two squares theorem.
We first need a lemma that connects proper representation to equivalence.
[quotetheorem:1733]
[citeproof:1733]
Now the promised proof.
[example: Proof that Every Prime $p \equiv 1 \pmod{4}$ is a Sum of Two Squares]
Since $p \equiv 1 \pmod{4}$, the Legendre symbol $\left(\frac{-1}{p}\right) = 1$, so there exists $u \in \mathbb{Z}$ with $u^2 \equiv -1 \pmod{p}$, i.e. $u^2 = -1 + pk$ for some $k \in \mathbb{Z}$.
Define $f = (p, 2u, k)$. Then $f(1, 0) = p$, and $\operatorname{disc}(f) = 4u^2 - 4pk = 4(-1 + pk) - 4pk = -4$.
Since the only reduced positive definite form of discriminant $-4$ is $(1, 0, 1) = x^2 + y^2$, the reduction theorem implies $f \sim x^2 + y^2$. Therefore, since $f$ represents $p$ (at $(1,0)$), so does $x^2 + y^2$: there exist $x, y \in \mathbb{Z}$ with $p = x^2 + y^2$.
[/example]
This is one of the most elegant applications of the theory: an abstract classification theorem directly implies a classical arithmetical fact.
## The Class Number
The discriminant is a coarse invariant: we have already seen that forms of the same discriminant can be genuinely inequivalent (the pair $(1, 0, 6)$ and $(2, 0, 3)$ of discriminant $-24$). How badly does the discriminant fail to determine the equivalence class? A single integer measures exactly this failure. Small values correspond to tame arithmetic — representability is controlled by congruence conditions; large values correspond to arithmetic that congruences alone cannot see.
Since every positive definite BQF is equivalent to a unique reduced form, and there are only finitely many reduced forms of a given discriminant, we can count equivalence classes.
[definition: Class Number]
The **class number** $h(d)$ is the number of positive definite reduced BQFs of discriminant $d$ (equivalently, the number of equivalence classes of positive definite BQFs of discriminant $d$).
[/definition]
[example: Computing Small Class Numbers]
\begin{align*}
h(-4) &= 1 & & \text{(only } x^2 + y^2 \text{)} \\
h(-7) &= 1 & & \text{(only } x^2 + xy + 2y^2\text{)} \\
h(-24) &= 2 & & \text{(forms } x^2 + 6y^2 \text{ and } 2x^2 + 3y^2\text{)}
\end{align*}
[/example]
How does $h(d)$ grow as $d \to -\infty$? This is a deep question settled only by methods from analytic number theory. The following results record the main theorems; each proof uses techniques developed in later courses on $L$-functions and algebraic number theory.
[quotetheorem:1734]
Heilbronn's 1934 proof is unusual: it establishes the conclusion under GRH and under the negation of GRH by completely different arguments, so the result follows regardless of which alternative holds.
[quotetheorem:1735]
Mertens established this formula in 1874, well before the general theory of $L$-functions was developed.
[quotetheorem:1736]
Conjectured around 1900, this was proved (after an initial gap-ridden attempt) by Heegner in 1952, and independently by Baker and Stark in 1967.
[quotetheorem:1737]
The class number is not merely a combinatorial count: for fundamental discriminants $d$, Gauss showed that equivalence classes of positive definite BQFs of discriminant $d$ form an abelian group under composition, the **class group**. This group turns out to be the ideal class group of the quadratic field $\mathbb{Q}(\sqrt{d})$ — the connection to algebraic number theory runs deep.
## Which Integers Does a Form Represent?
The classification into reduced forms gives us a handle on which integers a given form represents. The key theorem relates proper representation to congruence solubility.
[quotetheorem:1738]
[citeproof:1738]
The solubility condition $x^2 \equiv d \pmod{4n}$ breaks into local conditions by the Chinese Remainder Theorem. Writing $n = \prod_p p^{e_p}$:
\begin{align*}
x^2 \equiv d \pmod{4n} \text{ soluble} \iff \begin{cases} x^2 \equiv d \pmod{p^{e_p}} & \text{soluble for all odd } p, \\ x^2 \equiv d \pmod{2^{e_2 + 2}} & \text{soluble.} \end{cases}
\end{align*}
For odd primes $p$ not dividing $d$, solubility mod $p$ is determined by the Legendre symbol. The next result says that solubility mod $p$ already implies solubility mod all higher powers.
[quotetheorem:1739]
[citeproof:1739]
This lifting theorem shows that for odd primes $p \nmid d$, whether $x^2 \equiv d \pmod{p^e}$ is soluble depends only on the Legendre symbol $\left(\frac{d}{p}\right)$, not on $e$. The prime $p = 2$ requires the additional condition $a \equiv 1 \pmod{8}$; the Legendre symbol alone does not suffice at $p = 2$.
## Representation by Forms of Class Number One
When $h(d) = 1$, the theory becomes especially clean: there is a unique positive definite form of discriminant $d$, so a question about which integers are properly represented is entirely determined by congruence conditions.
[example: Proper Representation by $x^2 + xy + 2y^2$]
The form $f = (1, 1, 2)$ has discriminant $-7$. Since $h(-7) = 1$ (one checks that the only reduced form of discriminant $-7$ with $|b| \leq a \leq \sqrt{7/3}$ is $(1, 1, 2)$), every positive definite form of discriminant $-7$ is equivalent to $f$.
For a prime $p \neq 2, 7$, the theorem on proper representation gives: $p$ is properly represented by $f$ if and only if $x^2 \equiv -7 \pmod{4p}$ is soluble, which (by CRT and since $-7 \equiv 1 \pmod{4}$, so solubility mod $4$ is automatic) is equivalent to $\left(\frac{-7}{p}\right) = 1$.
By quadratic reciprocity and the properties of the Legendre symbol:
\begin{align*}
\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = \left(\frac{p}{7}\right),
\end{align*}
which equals $1$ when $p \equiv 1, 2, 4 \pmod{7}$ and $-1$ when $p \equiv 3, 5, 6 \pmod{7}$.
Additionally, $f(0, 1) = 2$ and $f(1, -2) = 1 - 2 + 8 = 7$, so $2$ and $7$ are also represented.
For general $n = 2^{e_2} 7^{e_7} \prod_{p \equiv 1,2,4 \pmod{7}} p^{e_p}$, the Hensel lifting theorem (and direct computation for $p = 7$: $x^2 \equiv -7 \pmod{49}$ is insoluble) gives:
$f$ properly represents $n$ if and only if $e_7 \in \{0, 1\}$ (and all other prime factors satisfy the Legendre condition). The integers represented (not necessarily properly) are exactly those $n$ such that every prime $p \not\equiv 1, 2, 4 \pmod{7}$ divides $n$ to an even power.
[/example]
When $h(d) > 1$, the theorem on proper representation only tells us that $n$ is represented by some form of discriminant $d$, not which equivalence class represents it. For $d = -24$, with $h(-24) = 2$, the two forms $x^2 + 6y^2$ and $2x^2 + 3y^2$ each represent different subsets of integers, and congruence conditions alone cannot distinguish which form represents a given $n$. For $x^2 + 23y^2$ (with $h(-23) = 3$), the situation is even more subtle: representability by this specific form cannot be described by any finite set of congruence conditions on $n$. This is a deep fact related to non-abelian reciprocity laws and is connected to class field theory.
[definition: Fundamental Discriminant]
A discriminant $d \equiv 0$ or $1 \pmod{4}$ is a **fundamental discriminant** if $d \neq k^2 d'$ for any integer $k \geq 2$ and any $d' \equiv 0$ or $1 \pmod{4}$.
[/definition]
For fundamental discriminants $d < 0$, Gauss defined a composition law on equivalence classes of positive definite BQFs of discriminant $d$, making this set into an abelian group — the class group. This turns out to coincide with the ideal class group of $\mathbb{Q}(\sqrt{d})$, a central object in algebraic number theory.
## Indefinite Forms and Higher-Dimensional Generalisations
The theory of positive definite forms has a clean structure because the level sets $f(x,y) = n$ are ellipses, containing only finitely many integer points. For indefinite forms, the level sets are hyperbolas, and the situation changes drastically.
For an indefinite form, $\#\{(x,y) \in \mathbb{Z}^2 : f(x,y) = n\}$ can be infinite. The equation $x^2 - 2y^2 = 1$ is a striking example. Starting from the solution $(1,0)$, one can check $(3, 2)$ is another. The composition identity $(x^2 - 2y^2)(z^2 - 2w^2) = (xz + 2yw)^2 - 2(xw + yz)^2$ allows one to multiply solutions, generating infinitely many. The study of $x^2 - dy^2 = 1$ (Pell's equation) is a separate and rich chapter, handled via the theory of continued fractions.
In higher dimensions, one asks which integers are represented by $x_1^2 + x_2^2 + \cdots + x_k^2$.
[quotetheorem:1740]
The proof is not given in this course, but there is a formula for the number of representations of $n$ as a sum of four squares, involving divisor sums, that follows from the theory of modular forms.
[quotetheorem:1741]
The easy direction ($\Rightarrow$) uses the observation that $x^2 \equiv 0, 1, 4 \pmod{8}$, so $x^2 + y^2 + z^2 \equiv 0, 1, 2, 3, 4, 5, 6 \pmod{8}$, but never $7$. The converse is hard, requiring quadratic reciprocity and Dirichlet's theorem on primes in arithmetic progressions. Both directions of Legendre's theorem are outside the scope of this course.
The contrast with the four-square theorem is instructive: sums of three squares have an obstruction from local conditions at $2$ (the residue class $7 \pmod{8}$), while sums of four squares have no such obstruction — every positive integer is a sum of four squares without exception.
Binary quadratic forms encode arithmetic information about integers, and their behaviour is intimately connected to how primes distribute throughout the number line, revealing that the apparent randomness of prime occurrence obeys profound regularities.
# 6. Distribution of the Primes
The primes $2, 3, 5, 7, 11, 13, 17, \ldots$ form an infinite sequence — this is Euclid's theorem from antiquity. But knowing that the sequence is infinite leaves two natural questions entirely open: how rapidly does it grow, and how regular or erratic is its spacing? This chapter develops the principal tools for answering these questions, moving from elementary counting arguments through the Riemann $\zeta$-function and Dirichlet series to Chebyshev's theorem and Bertrand's postulate.
## How Fast Do the Primes Grow?
How many primes are there up to a given bound $X$, and what can we say about the size of the $n$-th prime?
The fundamental result governing prime growth is the Prime Number Theorem, which gives an asymptotic count of primes up to $X$.
[definition: Prime Counting Function]
For a real number $X \geq 1$, define
\begin{align*}
\pi(X) := \#\{p \leq X : p \text{ prime}\}.
\end{align*}
[/definition]
[quotetheorem:1742]
The proof of the Prime Number Theorem uses complex analysis — specifically properties of the Riemann $\zeta$-function on the line $\operatorname{Re}(s) = 1$, which we develop later in this chapter. For now, we prove a weaker but elementary lower bound.
[quotetheorem:1743]
[citeproof:1743]
This bound shows $\pi(x) \to \infty$, but grows only logarithmically. It is far from the true asymptotic $x / \log x$, yet the argument is entirely elementary, using only the fundamental theorem of arithmetic.
## Gaps Between Consecutive Primes
If $p_n$ denotes the $n$-th prime, how large or small can the gap $p_{n+1} - p_n$ be?
By the Prime Number Theorem, the average gap near $p_n$ is around $\log p_n$. But the gaps oscillate dramatically, and understanding both extremes — small gaps and large gaps — is one of the central problems of analytic number theory.
**Small gaps.** The twin prime conjecture asserts that $p_{n+1} = p_n + 2$ for infinitely many $n$. This remains open. A landmark recent result (Zhang, 2013; improved by the Polymath project) shows there exists a constant $C$ such that $p_{n+1} \leq p_n + C$ for infinitely many $n$; the best currently known value is $C = 246$.
**Large gaps.** The Prime Number Theorem implies that $p_{n+1} - p_n > c \log p_n$ for infinitely many $n$. More precisely, there are infinitely many $n$ for which
\begin{align*}
p_{n+1} - p_n > c \frac{\log p_n \log \log p_n \log \log \log \log p_n}{\log \log \log p_n},
\end{align*}
while $(\log p_n)^2$ is conjectured to be the true order. For upper bounds, analytic methods show $p_{n+1} - p_n \leq p_n^{0.525}$ for all sufficiently large $n$.
A classical and elementary result gives the best uniform upper bound accessible by elementary means:
[quotetheorem:1744]
The proof appears later in this chapter, once the central binomial coefficient $\binom{2n}{n}$ and its prime factorisation have been developed as tools. The bound $2n$ is essentially sharp: it cannot be tightened to $(2 - \varepsilon)n$ for any $\varepsilon > 0$, since for infinitely many $n$ the interval $(n, (2-\varepsilon)n)$ contains no primes. Bertrand's postulate should be contrasted with Legendre's much stronger conjecture — that a prime exists in $[n^2, (n+1)^2]$ — which remains unresolved.
## Divergence of the Sum of Reciprocals of Primes
Can the primes be so sparse that $\sum_p 1/p$ converges? Euler showed that they cannot.
[quotetheorem:1745]
[citeproof:1745]
This result says something qualitative about prime distribution: the primes are not too sparse. The harmonic series $\sum 1/n$ diverges, and the partial sums $\sum_{n \leq x} 1/n \sim \log \log x$ as $x \to \infty$ — a doubly logarithmic rate. Mertens' theorem refines the estimate for $\sum_{p \leq x} 1/p$ to the same order: $\sum_{p \leq x} 1/p \sim \log \log x$.
## The Riemann $\zeta$-Function
To go further, we need analytic tools. How can we encode information about all primes into a single analytic function?
Euler had studied $\sum_{n \geq 1} n^{-s}$ for real $s > 1$ and found its connection to the primes via a product formula. Riemann's insight was to treat $s$ as a complex variable.
[definition: Riemann Zeta Function]
For a complex variable $s$ with $\operatorname{Re}(s) > 1$, the **Riemann $\zeta$-function** is
\begin{align*}
\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}.
\end{align*}
[/definition]
[quotetheorem:1746]
[citeproof:1746]
Absolute convergence together with the theory of uniformly convergent series of analytic functions implies that $\zeta(s)$ is an analytic function on the half-plane $\{\operatorname{Re}(s) > 1\}$. The $\zeta$-function connects to the primes through its Euler product.
[quotetheorem:1747]
[citeproof:1747]
The Euler product encodes, via unique factorisation, the complete multiplicative structure of the integers into the analytic properties of $\zeta(s)$. Part (ii) is not a curiosity — the non-vanishing of $\zeta(s)$ on the line $\operatorname{Re}(s) = 1$ is exactly the key ingredient that forces the Prime Number Theorem, as we sketch below.
## Analytic Continuation of the Zeta Function
The series $\sum n^{-s}$ only converges for $\operatorname{Re}(s) > 1$, but the $\zeta$-function has a natural extension past this boundary. Can we make sense of $\zeta(s)$ for $\operatorname{Re}(s) \leq 1$, and what happens at $s = 1$ where the harmonic series diverges?
[quotetheorem:1748]
[citeproof:1748]
A more refined integral representation extends the analysis further: $\zeta(s) - \frac{1}{s-1}$ is in fact entire — analytic on all of $\mathbb{C}$ — and that $\zeta(s)$ satisfies a functional equation. Setting
\begin{align*}
\xi(s) := \pi^{-s/2} \Gamma\!\left(\frac{s}{2}\right) \zeta(s),
\end{align*}
where $\Gamma(s) = \int_0^{\infty} e^{-y} y^{s-1} \, dy$ is the Gamma function (which interpolates the factorial: $\Gamma(n) = (n-1)!$), one has $\xi(s) = \xi(1-s)$. This functional equation relates the behaviour of $\zeta$ in the half-plane $\operatorname{Re}(s) > 1$ to its behaviour in $\operatorname{Re}(s) < 0$.
A second important remark concerns the Prime Number Theorem. The PNT is equivalent to showing that $\Psi(x) := \sum_{n \leq x} \Lambda(n) \sim x$, where $\Lambda$ is the von Mangoldt function defined below. The key analytical ingredient is that $\zeta(s) \neq 0$ on the line $\operatorname{Re}(s) = 1$ — one cannot deduce this from the Euler product alone, and it requires a separate argument.
## Dirichlet Series and Multiplicative Functions
How can we systematically encode arithmetic functions as analytic objects, and how does the Möbius function arise as a tool for inverting summatory relations?
The Dirichlet series formalism generalises the $\zeta$-function to arbitrary arithmetic sequences.
[definition: Dirichlet Series]
Given a sequence $(a_n)_{n \geq 1}$ of complex numbers, the **Dirichlet series** with coefficients $(a_n)$ is the series
\begin{align*}
\sum_{n=1}^{\infty} \frac{a_n}{n^s}
\end{align*}
regarded as a function of the complex variable $s$.
[/definition]
If $|a_n| \leq A n^k$ for some constants $A, k$, the series converges absolutely for $\operatorname{Re}(s) > k + 1$. The key algebraic property is that Dirichlet series multiply in a way that mirrors Dirichlet convolution of their coefficients: if $\sum a_m m^{-s}$ and $\sum b_n n^{-s}$ converge absolutely, their product is $\sum_N c_N N^{-s}$ where
\begin{align*}
c_N = \sum_{d \mid N} a_d b_{N/d}.
\end{align*}
[example: Sigma Function as a Dirichlet Product]
The divisor sum $\sigma(n) = \sum_{d \mid n} d$ arises as a Dirichlet product:
\begin{align*}
\zeta(s-1) \cdot \zeta(s) = \left(\sum_{n \geq 1} \frac{n}{n^s}\right)\left(\sum_{n \geq 1} \frac{1}{n^s}\right) = \sum_{n \geq 1} \frac{\sigma(n)}{n^s},
\end{align*}
since the coefficient of $N^{-s}$ in the product is $\sum_{d \mid N} d \cdot 1 = \sigma(N)$.
[/example]
Before introducing the Möbius function, we recall a key lemma from the multiplicative functions chapter: if $f : \mathbb{N} \to \mathbb{C}$ is multiplicative, then so is $g(n) = \sum_{d \mid n} f(d)$. The question of how to recover $f$ from $g$ leads naturally to the Möbius function.
[example: Recovering $f$ from $g$]
Consider $n = 6$. Then $g(6) = f(1) + f(2) + f(3) + f(6)$, $g(3) = f(1) + f(3)$, $g(2) = f(1) + f(2)$, $g(1) = f(1)$. Subtracting gives $f(6) = g(6) - g(2) - g(3) + g(1)$. The pattern is clear: the coefficients $1, -1, -1, +1$ correspond to the Möbius values $\mu(1), \mu(2), \mu(3), \mu(6)$.
[/example]
[definition: Möbius Function]
The **Möbius function** $\mu : \mathbb{N} \to \{0, \pm 1\}$ is defined by
\begin{align*}
\mu(n) =
\begin{cases}
1 & \text{if } n = 1, \\
(-1)^k & \text{if } n = p_1 \cdots p_k \text{ is a product of } k \text{ distinct primes,} \\
0 & \text{if } p^2 \mid n \text{ for some prime } p.
\end{cases}
\end{align*}
[/definition]
The Möbius function is multiplicative. Applying the earlier lemma to $\mu$: let $\nu(n) = \sum_{d \mid n} \mu(d)$. Since $\mu$ is multiplicative, so is $\nu$, and one computes directly on prime powers that $\nu(p^k) = \mu(1) + \mu(p) = 1 - 1 = 0$ for $k \geq 1$ and $\nu(1) = 1$. By multiplicativity,
\begin{align*}
\nu(n) = \sum_{d \mid n} \mu(d) =
\begin{cases}
1 & \text{if } n = 1, \\
0 & \text{if } n > 1.
\end{cases}
\end{align*}
This is the crucial identity that makes Möbius inversion work.
[quotetheorem:1749]
[citeproof:1749]
In terms of Dirichlet series, Möbius inversion takes the form $\frac{1}{\zeta(s)} = \sum_{n \geq 1} \mu(n) n^{-s}$, expressing the multiplicative inverse of the $\zeta$-function.
## The von Mangoldt Function and Primes in Dirichlet Series
The derivative $\zeta'(s)$ also carries information about primes. Taking the logarithmic derivative of the Euler product produces a new arithmetic function.
[definition: Von Mangoldt Function]
The **von Mangoldt function** $\Lambda : \mathbb{N} \to \mathbb{R}_{\geq 0}$ is defined by
\begin{align*}
\Lambda(n) =
\begin{cases}
\log p & \text{if } n = p^k \text{ for some prime } p \text{ and integer } k \geq 1, \\
0 & \text{otherwise.}
\end{cases}
\end{align*}
[/definition]
[quotetheorem:1750]
[citeproof:1750]
Define the Chebyshev $\Psi$-function $\Psi(x) := \sum_{n \leq x} \Lambda(n)$. Partial summation yields $\pi(x) \sim \Psi(x)/\log x$, so the Prime Number Theorem ($\pi(x) \sim x/\log x$) is equivalent to $\Psi(x) \sim x$. The proof of the PNT proceeds by integrating $\frac{x^{s+1}}{s(s+1)} \frac{\zeta'(s)}{\zeta(s)}$ around suitable contours, using the non-vanishing of $\zeta$ on $\operatorname{Re}(s) = 1$ to control the contributions.
## Dirichlet's Theorem on Primes in Arithmetic Progressions
Are primes evenly distributed among residue classes? If $\gcd(a, N) = 1$, are there infinitely many primes $p \equiv a \pmod{N}$?
[quotetheorem:1751]
[citeproof:1751]
The characters needed here are Dirichlet characters, which are group homomorphisms from $(\mathbb{Z}/N\mathbb{Z})^\times$ to $\mathbb{C}^\times$ extended to all of $\mathbb{N}$.
[definition: Dirichlet Character]
A **Dirichlet character** modulo $N$ is a function $\chi : \mathbb{N} \to \mathbb{C}$ satisfying:
(i) $\chi(1) = 1$ and $\chi(n) = 0$ if $\gcd(n, N) > 1$;
(ii) $\chi(n + N) = \chi(n)$ for all $n$ (periodicity modulo $N$);
(iii) $\chi(mn) = \chi(m)\chi(n)$ for all $m, n$ (complete multiplicativity).
[/definition]
Equivalently, $\chi$ restricted to $(\mathbb{Z}/N\mathbb{Z})^\times$ is a group homomorphism into $\mathbb{C}^\times$, extended to all of $\mathbb{N}$ by declaring $\chi(n) = 0$ when $\gcd(n, N) > 1$. The Legendre and Jacobi symbols are classical examples.
[example: Dirichlet Characters Modulo 3]
For $N = 3$, there are two Dirichlet characters. The principal character $\chi_0$ takes the value $1$ on $n$ coprime to $3$ and $0$ otherwise:
\begin{align*}
\chi_0(n) = \begin{cases} 1 & \gcd(n,3) = 1, \\ 0 & \gcd(n,3) > 1. \end{cases}
\end{align*}
The non-principal character $\chi_1$ takes values $1$ on $n \equiv 1 \pmod{3}$, $-1$ on $n \equiv -1 \pmod{3}$, and $0$ when $3 \mid n$. The linear combinations $(\chi_0 \pm \chi_1)/2$ are then the indicator functions of the residue classes $\pm 1 \pmod 3$, showing that $F_{a,3}(s)$ can be expressed in terms of $L(\chi_0, s)$ and $L(\chi_1, s)$.
[/example]
The $L$-functions $L(\chi, s) = \sum_{n \geq 1} \chi(n) n^{-s}$ inherit many properties from $\zeta(s)$: they have Euler products, analytic continuations, and functional equations. Their non-vanishing at $s = 1$ for non-principal $\chi$ is the key fact that drives Dirichlet's theorem.
## The Sieve of Eratosthenes and Legendre's Formula
Given a bound $X$, the Sieve of Eratosthenes removes from $\{2, 3, \ldots, X\}$ all multiples of primes $\leq \sqrt{X}$, leaving exactly the primes in $(\sqrt{X}, X]$. The Möbius function gives a precise formula for what survives.
[quotetheorem:1752]
[citeproof:1752]
Legendre's formula converts the sieve into an exact inclusion-exclusion. In practice the sum is difficult to evaluate because $P$ has exponentially many divisors. Refined sieves (Brun's sieve, Selberg's sieve) replace this exact formula with upper and lower bounds that are more tractable, and they underlie many results in the direction of the twin prime conjecture.
## The $p$-adic Valuation and Divisibility of Binomial Coefficients
What is the exact prime factorisation of $\binom{2n}{n}$, and how can this be used to locate primes in short intervals?
[definition: $p$-Adic Valuation]
For a prime $p$ and an integer $n \geq 1$, the **$p$-adic valuation** $\nu_p(n)$ is the largest power of $p$ dividing $n$: $\nu_p(n) = k$ where $p^k \mid n$ but $p^{k+1} \nmid n$. Thus $n = p^{\nu_p(n)} n_0$ with $\gcd(n_0, p) = 1$.
[/definition]
The valuation is completely additive: $\nu_p(mn) = \nu_p(m) + \nu_p(n)$. In particular, $\nu_p(n!) = \sum_{i \geq 1} \lfloor n/p^i \rfloor$ (Legendre's formula for factorials). The central binomial coefficient $N = \binom{2n}{n}$ is the key object for proving both Chebyshev's theorem and Bertrand's postulate.
[quotetheorem:1753]
[citeproof:1753]
These four properties are the engine behind both Chebyshev's theorem and Bertrand's postulate. Property (ii) tells us that each prime just above $n$ appears exactly once in $N$, giving a lower bound on $N$ from their product. Property (iv) gives an upper bound on the prime-power contribution of small primes.
## Chebyshev's Theorem
Can we prove, by elementary means, that $\pi(x)$ is of order $x/\log x$? The Prime Number Theorem says $\pi(x) \sim x/\log x$, but a weaker statement with explicit constants is within reach.
[quotetheorem:1754]
[citeproof:1754]
Chebyshev himself obtained the sharper constants $0.92 < \pi(x)/(x/\log x) < 1.11$ for all sufficiently large $x$, using more refined versions of the same method. The theorem confirms that $\pi(x)$ grows at the rate $x/\log x$, settling question (i) from the introduction up to constants — the exact constant $1$ in the asymptotic is the content of the Prime Number Theorem.
## Bertrand's Postulate
Is there always a prime between $n$ and $2n$? This question, raised by Bertrand in 1845 and proved by Chebyshev, has a beautiful elementary proof due to Erdős.
[quotetheorem:1755]
[citeproof:1755]
The proof uses the bound $\Theta(x) \leq 4^x$, which we now establish.
[quotetheorem:1756]
[citeproof:1756]
Bertrand's postulate has a pleasing consequence: the sequence of primes cannot have gaps larger than a factor of $2$. Combined with Chebyshev's theorem, it paints a coherent picture of how the primes grow and how they are distributed, one that anticipates — without reaching — the full precision of the Prime Number Theorem.
The distribution of primes, though fundamentally tied to multiplicative properties, finds unexpected expression through the theory of continued fractions — an additive technique that provides both a tool for approximating irrationals and insight into the deeper structure of real numbers.
# 7. Continued Fractions
The rational numbers are dense in $\mathbb{R}$, so every real number can be approximated by rationals — but this observation says nothing about how efficiently. The central question of this chapter is: given $\theta \in \mathbb{R}$, which fractions $p/q$ approximate $\theta$ most closely relative to the size of their denominators? A small denominator that achieves a tiny error is genuinely remarkable, and the continued fraction algorithm makes such approximations systematic. The chapter develops the algorithm, studies the convergents it produces, characterises exactly when a real number has an eventually periodic expansion (the answer is Lagrange's theorem: precisely the quadratic irrationals), and applies the theory to solve Pell's equation.
## The Continued Fraction Algorithm
How can we extract successively better rational approximations from an arbitrary real number $\theta$, in a canonical and reproducible way?
The idea is to generalise the Euclidean algorithm. Just as the Euclidean algorithm for integers extracts a sequence of quotients by repeated division, the continued fraction algorithm applies the same stripping process to $\theta$: take the integer part, invert the remainder, and repeat.
[definition: Continued Fraction Algorithm]
Let $\theta \in \mathbb{R}$. Define a (possibly finite) sequence of integers $a_0, a_1, \ldots$ with $a_n \geq 1$ for $n \geq 1$ as follows.
**Step 0.** Set $a_0 = \lfloor \theta \rfloor$. If $\theta = a_0$, stop. Otherwise $0 < \theta - a_0 < 1$; define $\theta_1 = \frac{1}{\theta - a_0} > 1$, so that $\theta = a_0 + \frac{1}{\theta_1}$.
**Step $n$.** Given $\theta_n > 1$, set $a_n = \lfloor \theta_n \rfloor$. If $\theta_n = a_n$, stop. Otherwise define $\theta_{n+1} = \frac{1}{\theta_n - a_n} > 1$, so that $\theta_n = a_n + \frac{1}{\theta_{n+1}}$.
If the process does not stop, then for every $n$,
\begin{align*}
\theta &= a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{\theta_n}}}}
\end{align*}
written more compactly as $\theta = [a_0, a_1, \ldots, a_{n-1}, \theta_n]$. When the process terminates, $\theta = [a_0, a_1, \ldots, a_n]$. The integers $a_0, a_1, \ldots$ are called the **partial quotients** of $\theta$, and the expression $[a_0, a_1, \ldots]$ is the **continued fraction expansion** of $\theta$.
[/definition]
The notation $[a_0, a_1, \ldots]$ for an infinite sequence is justified once we know the corresponding rational approximations converge — this is established by Theorem 6.4 below. A finite continued fraction $[a_0, \ldots, a_n]$ is simply the rational number obtained by evaluating the nested expression from the bottom up.
[example: Continued Fraction of 59/13]
Take $\theta = \frac{59}{13}$. The algorithm runs:
\begin{align*}
\frac{59}{13} &= 4 + \frac{7}{13}, & a_0 &= 4, \quad \theta_1 = \frac{13}{7}, \\
\frac{13}{7} &= 1 + \frac{6}{7}, & a_1 &= 1, \quad \theta_2 = \frac{7}{6}, \\
\frac{7}{6} &= 1 + \frac{1}{6}, & a_2 &= 1, \quad \theta_3 = 6.
\end{align*}
Since $a_3 = 6 = \theta_3$, the algorithm terminates and $\frac{59}{13} = [4, 1, 1, 6]$.
Expanding the notation explicitly:
\begin{align*}
[4,1,1,6] &= 4 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{6}}}.
\end{align*}
Notice the partial quotients $4, 1, 1, 6$ are exactly the quotients appearing in Euclid's algorithm applied to $(59, 13)$:
\begin{align*}
59 &= 4 \cdot 13 + 7, \\
13 &= 1 \cdot 7 + 6, \\
7 &= 1 \cdot 6 + 1, \\
6 &= 6 \cdot 1 + 0.
\end{align*}
This connection is not coincidental — the continued fraction algorithm is a direct generalisation of the Euclidean algorithm to arbitrary reals.
[/example]
The Euclidean algorithm terminates precisely for rational inputs. The same is true here.
[quotetheorem:1757]
[citeproof:1757]
[remark: Connection to Euclid]
The relation $b_{i+1} = c_i < b_i$ can be rewritten as $b_i = a_i b_{i+1} + b_{i+2}$, which is exactly the step of Euclid's algorithm applied to the pair $(b_0, b_1) = (b_0, c_0)$. This gives the terminology "partial quotient": each $a_i$ is literally a quotient in the Euclidean algorithm.
[/remark]
## Convergents and Their Recurrence
Once we have the sequence of partial quotients $a_0, a_1, \ldots$, how do we efficiently compute the rational approximations they define?
At each stage $n$ of the algorithm, the number $[a_0, \ldots, a_n]$ is a rational number. Computing it by evaluating the fraction from the inside out becomes increasingly cumbersome. The key observation is that these rationals satisfy a linear recurrence in $n$, which makes them easy to tabulate and analyse.
[definition: Convergents]
Given integers $a_0, a_1, \ldots$ with $a_n \geq 1$ for $n \geq 1$, define sequences of integers $(p_n)_{n \geq -1}$ and $(q_n)_{n \geq -1}$ by the recurrences
\begin{align*}
p_{-1} &= 1, & p_0 &= a_0, & p_n &= a_n p_{n-1} + p_{n-2} \quad (n \geq 1), \\
q_{-1} &= 0, & q_0 &= 1, & q_n &= a_n q_{n-1} + q_{n-2} \quad (n \geq 1).
\end{align*}
Equivalently, $p_1 = a_1 a_0 + 1$ and $q_1 = a_1$. If $[a_0, a_1, \ldots]$ is the continued fraction expansion of some $\theta$, then $p_n/q_n$ is called the **$n$-th convergent** to $\theta$.
Since $a_i \geq 1$ for $i \geq 1$, the denominators satisfy $1 = q_0 \leq q_1 < q_2 < q_3 < \cdots$, an increasing sequence of positive integers.
[/definition]
The auxiliary values $p_{-1} = 1$ and $q_{-1} = 0$ are purely notational: they allow the recurrence $p_n = a_n p_{n-1} + p_{n-2}$ to hold without exception for all $n \geq 1$.
The first key lemma identifies the convergents with the continued fraction truncations and establishes a fundamental interpolation property.
[quotetheorem:1758]
[citeproof:1758]
Part (ii) is particularly useful: it says that inserting any positive real number $\beta$ as the "last partial quotient" produces a value lying between the last two convergents. This is exactly the interpolation property that will allow us to control how $\theta$ sits relative to its convergents.
## The Determinant Identity and Convergence
The recurrence for convergents implies a remarkable determinant relation between consecutive convergents, which in turn controls the error in approximating $\theta$.
[quotetheorem:1759]
[citeproof:1759]
These identities have several important corollaries. Part (i) says $\gcd(p_n, q_n) = 1$, so every convergent is automatically in lowest terms. Dividing (i) by $q_n q_{n-1}$ gives
\begin{align*}
\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} &= \frac{(-1)^{n-1}}{q_n q_{n-1}},
\end{align*}
so consecutive convergents alternate above and below, and their difference tends to zero as $q_n \to \infty$. Part (ii) gives $p_n/q_n - p_{n-2}/q_{n-2} = (-1)^n a_n / (q_n q_{n-2})$, from which the even-indexed convergents $p_0/q_0 < p_2/q_2 < p_4/q_4 < \cdots$ form an increasing sequence and the odd-indexed convergents $p_1/q_1 > p_3/q_3 > p_5/q_5 > \cdots$ form a decreasing sequence, with every even convergent less than every odd convergent.
Since $\theta = [a_0, \ldots, a_n, \theta_{n+1}]$ and $\theta_{n+1} > 0$, the interpolation property from the previous theorem places $\theta$ strictly between $p_n/q_n$ and $p_{n+1}/q_{n+1}$ for every $n$. Combined with the gap estimate, this gives convergence.
[quotetheorem:1760]
[citeproof:1760]
This theorem also justifies writing $\theta = [a_0, a_1, \ldots]$ as an equality rather than merely a notation: the continued fraction is defined to be the limit of its convergents, and that limit exists and equals $\theta$.
[remark: Bijection for Irrationals]
Theorem 6.4 shows that the continued fraction expansion defines a bijection between $\mathbb{R} \setminus \mathbb{Q}$ and the set of sequences $(a_0, a_1, \ldots)$ with $a_0 \in \mathbb{Z}$ and $a_n \geq 1$ for all $n \geq 1$. Every such sequence arises as the expansion of a unique irrational, and every irrational has a unique such expansion.
[/remark]
## Rational Approximations and Best Approximation
We now address the original question directly: are convergents the best rational approximations to $\theta$?
The answer is yes, in a strong sense. Not only do convergents approximate $\theta$ well, but no other fraction with a comparably small denominator can do better. The next theorem makes this precise.
[quotetheorem:1761]
[citeproof:1761]
The theorem is stated in terms of $|q\theta - p|$ rather than $|\theta - p/q|$ because the former behaves better under the linear algebra argument. Dividing by $q$ immediately gives the approximation quality: if $|\theta - p/q| < |\theta - p_n/q_n|$ then $|q\theta - p| < (q/q_n)|q_n\theta - p_n| \leq |q_n\theta - p_n|$, which by the theorem forces $q \geq q_{n+1} > q_n$.
[quotetheorem:1762]
[citeproof:1762]
Part (ii) is the striking converse: fractions satisfying the $1/(2q^2)$ bound are completely characterised as convergents. The threshold $1/(2q^2)$ is essentially sharp — it cannot be improved to $1/(2q^2)$ for all irrationals simultaneously, though for specific $\theta$ (notably $\theta = (\sqrt{5}-1)/2$, the golden ratio) even $1/(\sqrt{5}\, q^2)$ is the best possible constant.
## Quadratic Irrationals and Periodic Expansions
For which real numbers does the continued fraction algorithm produce a periodic sequence of partial quotients?
The Euclidean algorithm terminates for rationals; for irrationals it runs forever, producing an infinite sequence. The question is when this sequence is periodic. A periodic sequence is the simplest possible infinite structure, and it turns out to characterise exactly the quadratic irrationals — real numbers satisfying a quadratic equation over $\mathbb{Q}$.
Before Lagrange's theorem, we need the relevant definitions.
[definition: Eventually Periodic Continued Fraction]
A continued fraction $[a_0, a_1, \ldots]$ is **eventually periodic** if there exist integers $m \geq 0$ and $k \geq 1$ such that $a_{n+k} = a_n$ for all $n \geq m$. This is written $[a_0, \ldots, a_{m-1}, \overline{a_m, \ldots, a_{m+k-1}}]$, where the bar indicates the repeated block. The least such $k$ is the **period**. If $m = 0$ the expansion is **purely periodic**, written $[\overline{a_0, \ldots, a_{k-1}}]$.
[/definition]
[definition: Quadratic Irrational]
A real number $\theta$ is a **quadratic irrational** if $\theta = r + s\sqrt{d}$ for some $r, s \in \mathbb{Q}$ with $s \neq 0$ and $d > 1$ a non-square integer. Equivalently, $\theta$ is irrational and satisfies $a\theta^2 + b\theta + c = 0$ for some $a, b, c \in \mathbb{Z}$ with $a \neq 0$.
[/definition]
The equivalence between the two characterisations of quadratic irrational follows by completing the square: $\theta = r + s\sqrt{d}$ satisfies $(\theta - r)^2 = s^2 d$, which is a quadratic over $\mathbb{Q}$.
[example: Continued Fraction of sqrt(6)]
Take $\theta = \sqrt{6}$. Running the algorithm:
\begin{align*}
\theta &= \sqrt{6} = 2 + (\sqrt{6} - 2), & a_0 &= 2, \\
\theta_1 &= \frac{1}{\sqrt{6}-2} = \frac{\sqrt{6}+2}{2} = 2 + \frac{\sqrt{6}-2}{2}, & a_1 &= 2, \\
\theta_2 &= \frac{2}{\sqrt{6}-2} = \sqrt{6}+2 = 4 + (\sqrt{6}-2), & a_2 &= 4, \\
\theta_3 &= \frac{1}{\sqrt{6}-2} = \theta_1.
\end{align*}
Since $\theta_3 = \theta_1$, the sequence of $\theta_i$ is periodic from index 1 onward, giving partial quotients that repeat: $a_3 = a_1 = 2$, $a_4 = a_2 = 4$, and so on. Thus $\sqrt{6} = [2, \overline{2, 4}]$, an eventually periodic continued fraction with period 2.
[/example]
The example illustrates the mechanism: $\theta_n$ is always of the form $(P + \sqrt{d})/Q$ for integers $P, Q$, and there are only finitely many such expressions (once we bound the size of $P$ and $Q$), so the sequence $(\theta_n)$ must eventually repeat.
[quotetheorem:1763]
[citeproof:1763]
Lagrange's theorem is one of the most satisfying results in elementary number theory: it gives a complete, algebraic characterisation of an analytic property. The period itself, however, is not determined by the theorem — it can vary wildly. For $\sqrt{6}$ the period is 2, while $\sqrt{889} = [29, \overline{1, 4, 2, 3, \ldots, 1, 58}]$ has period 42.
### Galois's Theorem on Purely Periodic Expansions
Within the class of eventually periodic expansions, purely periodic ones are further characterised by a condition on the Galois conjugate.
[quotetheorem:1764]
The proof uses the interpolation properties of convergents and a careful analysis of when $\theta_1$ satisfies the same purely periodic condition: the argument proceeds by showing that the conjugate condition $\theta > 1$, $-1 < \theta' < 0$ is preserved at each step of the algorithm.
[remark: Application to Square Roots]
For $\theta = \sqrt{d}$ with $d > 1$ not a square, $a_0 = \lfloor \sqrt{d} \rfloor$ and $\theta_1 = 1/(\sqrt{d} - a_0)$. Then $\theta_1 > 1$ and $\theta_1' = 1/(-\sqrt{d} - a_0) \in (-1, 0)$. By Galois's theorem, $\theta_1$ has a purely periodic expansion, and thus $\sqrt{d} = [a_0, \overline{a_1, \ldots, a_m}]$ for some period block $a_1, \ldots, a_m$. Every square root of a non-square integer has eventually periodic continued fraction starting from a non-repeating first partial quotient.
[/remark]
## Pell's Equation
The theory of continued fractions yields a completely constructive proof that Pell's equation always has nontrivial solutions.
Pell's equation $x^2 - dy^2 = 1$ asks for integer points on a hyperbola. For $d$ a perfect square this factors over $\mathbb{Z}$ and the only solutions with $y \neq 0$ are trivial. For $d$ non-square, the equation has infinitely many solutions, and the continued fraction expansion of $\sqrt{d}$ produces all of them.
[quotetheorem:1765]
[citeproof:1765]
The proof reveals something deeper: the solutions to Pell's equation are exactly the convergents of $\sqrt{d}$ at specific indices. Every positive integer solution $(x, y)$ is of the form $(p_m, q_m)$ for some $m$ — this follows from the best-approximation property, since $x^2 - dy^2 = 1$ implies $|x/y - \sqrt{d}| < 1/(2y^2)$, forcing $x/y$ to be a convergent.
[remark: Group Structure of Solutions]
The solutions to $x^2 - dy^2 = 1$ form a group under the composition law
\begin{align*}
(x_1, y_1) \cdot (x_2, y_2) &= (x_1 x_2 + dy_1 y_2,\ x_1 y_2 + y_1 x_2),
\end{align*}
which corresponds to multiplying $(x_1 + y_1\sqrt{d})(x_2 + y_2\sqrt{d}) = X + Y\sqrt{d}$ in the ring $\mathbb{Z}[\sqrt{d}]$. The product of two solutions is again a solution: $(X + Y\sqrt{d})(X - Y\sqrt{d}) = (x_1 + y_1\sqrt{d})(x_1 - y_1\sqrt{d}) \cdot (x_2 + y_2\sqrt{d})(x_2 - y_2\sqrt{d}) = 1 \cdot 1 = 1$. The fundamental solution (the convergent at the smallest valid index) generates all others under this group operation. This group is isomorphic to the group of units in the ring of integers of $\mathbb{Q}(\sqrt{d})$, a connection explored further in the theory of algebraic number fields.
[/remark]
Continued fractions offer a classical approach to approximating numbers, but the problem of identifying which large integers are prime demands more direct computational methods, leading finally to modern primality tests and factorisation algorithms that synthesise techniques from the entire course.
# 8. Primality Testing and Factorisation
This chapter addresses two fundamental computational problems that underpin modern public-key cryptography: deciding whether a large integer is prime, and decomposing a composite integer into its prime factors. The RSA cryptosystem rests on the asymmetry between these two tasks — primality testing is fast, whereas factorisation appears to be hard. The prime number theorem tells us that among integers near $M$, roughly one in $\log M$ is prime, so generating a large prime by testing candidates is cheap if primality testing itself is cheap. The deeper question is what makes a number prime and how efficiently we can detect it.
## Fermat's Test and Its Limitations
How can we efficiently certify that a given large integer $N$ is prime? The obvious approach — trial division by all primes up to $\sqrt{N}$ — requires exponentially many steps in the bit-length of $N$. We need a property that primes satisfy and that can be checked in polynomial time.
The natural starting point is Fermat's Little Theorem, which provides a necessary condition for primality that can be verified quickly using repeated squaring.
[quotetheorem:731]
The contrapositive gives a practical test: if $a^{N-1} \not\equiv 1 \pmod{N}$, then $N$ is definitely composite. Computing $a^{N-1} \bmod N$ by repeated squaring takes $O(\log N)$ multiplications modulo $N$, so the test runs in polynomial time.
[example: Fermat Witness Detection]
For $N = 15$: compute $2^{14} = (2^4)^3 \cdot 2^2 = 16^3 \cdot 4 \equiv 1^3 \cdot 4 = 4 \pmod{15}$, which is not $1$, so $15$ is composite. For $N = 91$: one checks $3^{90} \equiv 1 \pmod{91}$, which gives no information, while $2^{90} \equiv 64 \pmod{91} \neq 1$, exposing $91 = 7 \times 13$ as composite.
[/example]
The example illustrates a key difficulty: a single base may fail to detect compositeness. We may need to try several values of $a$. This raises the question: if $N$ is composite, can we always find a base $a$ that exposes it?
[definition: Fermat Pseudoprime]
Let $N$ be an odd composite integer and $b \geq 1$ with $\gcd(b, N) = 1$. Say $N$ is a **Fermat pseudoprime to the base $b$** if
\begin{align*}
b^{N-1} \equiv 1 \pmod{N}.
\end{align*}
(Since this depends only on $b \bmod N$, one typically restricts to $1 \leq b < N$.)
[/definition]
The integer $91$ is a Fermat pseudoprime to the base $3$. The set of bases for which $N$ is a pseudoprime has a pleasant algebraic structure: if $N$ is a pseudoprime to bases $b_1$ and $b_2$, then $(b_1 b_2)^{N-1} = b_1^{N-1} b_2^{N-1} \equiv 1 \pmod{N}$, so $N$ is also a pseudoprime to base $b_1 b_2$. Similarly, every composite $N$ is a pseudoprime to base $1$. Therefore, the set $H = \{1 \leq b < N : \gcd(b, N) = 1,\, N \text{ is a pseudoprime to base } b\}$ is a subgroup of $(\mathbb{Z}/N\mathbb{Z})^\times$.
[quotetheorem:1766]
[citeproof:1766]
This result would give an effective probabilistic test if every composite had at least one non-pseudoprime base. Unfortunately, there exist composite integers that fool the Fermat test for every coprime base.
[definition: Carmichael Number]
An odd composite integer $N > 1$ is a **Carmichael number** if it is a Fermat pseudoprime to every base $b$ with $\gcd(b, N) = 1$.
[/definition]
The first few Carmichael numbers are $561 = 3 \times 11 \times 17$, $1105$, $1729$. A classical criterion (Korselt's criterion, proved on the example sheets) characterises them: $N = p_1 \cdots p_k$ is a Carmichael number iff it is squarefree and $(p_i - 1) \mid (N - 1)$ for each $i$. Erdős and Granville–Pomerance–Spiro proved that there are infinitely many Carmichael numbers, so the Fermat test alone cannot be made unconditionally reliable. A stronger test is needed.
## The Euler and Strong Pseudoprime Tests
The Fermat condition $a^{N-1} \equiv 1 \pmod{N}$ can be refined by examining intermediate square roots. For a prime $p$ with $\gcd(a, p) = 1$, we know $a^{(p-1)/2} \equiv \left(\frac{a}{p}\right) \pmod{p}$, where $\left(\frac{a}{p}\right)$ is the Legendre symbol. This is a sharper demand than merely asking that $a^{p-1} \equiv 1$.
[definition: Euler Pseudoprime]
Let $N$ be an odd composite and $b$ with $\gcd(b, N) = 1$. Say $N$ is an **Euler pseudoprime to the base $b$** if
\begin{align*}
b^{(N-1)/2} \equiv \left(\frac{b}{N}\right) \pmod{N},
\end{align*}
where $\left(\frac{b}{N}\right)$ is the Jacobi symbol.
[/definition]
Squaring the Euler condition gives $b^{N-1} \equiv 1 \pmod{N}$, so every Euler pseudoprime to base $b$ is also a Fermat pseudoprime to base $b$. The Euler condition is strictly stronger. Moreover, the set of bases for which $N$ is an Euler pseudoprime is again a subgroup of $(\mathbb{Z}/N\mathbb{Z})^\times$, by the multiplicativity of both $b^{(N-1)/2}$ and the Jacobi symbol. The analogue of the earlier proposition holds:
[quotetheorem:1767]
[citeproof:1767]
The key advantage of the Euler test over the Fermat test is that Carmichael numbers are no longer an obstacle.
[quotetheorem:1768]
[citeproof:1768]
This theorem gives the **Solovay–Strassen primality test**: given odd $N > 1$, pick $b > 1$ at random. If $\gcd(b, N) > 1$, output composite. Otherwise, check whether $b^{(N-1)/2} \equiv \left(\frac{b}{N}\right) \pmod{N}$ (the left side by repeated squaring, the right side via quadratic reciprocity), each computable in polynomial time. If the check fails, $N$ is definitely composite. If it passes, the theorem and the subgroup argument show $N$ is composite with conditional probability at most $\frac{1}{2}$. Repeating $t$ times independently drives the error probability below $2^{-t}$.
## The Miller–Rabin Test
The Euler condition can be refined further by examining the entire sequence of square roots arising from $a^{N-1} \equiv 1$. Write $N - 1 = 2^s t$ with $s \geq 1$ and $t$ odd. If $p$ is an odd prime and $a^t \equiv 1 \pmod{p}$, we are done. If $a^t \not\equiv 1$, the sequence $a^t, a^{2t}, a^{4t}, \ldots, a^{2^s t}$ must contain $-1$ at some step before it reaches $1$, since $p$ is prime and the only square roots of $1$ modulo $p$ are $\pm 1$.
[definition: Strong Test]
Let $N > 1$ be odd and write $N - 1 = 2^s t$ with $s \geq 1$ and $t$ odd. Say $N$ **passes the strong test to base $b$** (where $\gcd(b, N) = 1$) if either
\begin{align*}
b^t \equiv 1 \pmod{N}, \quad \text{or} \quad b^{2^r t} \equiv -1 \pmod{N} \text{ for some } 0 \leq r < s.
\end{align*}
A composite $N$ that passes the strong test to base $b$ is called a **strong pseudoprime to base $b$**.
[/definition]
Every prime passes the strong test to every coprime base, by the argument above. The strong pseudoprimes to a fixed base do exist — for instance, $N = 65$ with $b = 8$ satisfies $N - 1 = 64 = 2^6$, and $8^2 = 64 \equiv -1 \pmod{65}$, so $65$ passes the strong test to base $8$. However, checking $b = 2$ shows $2^1, 2^2, 2^4, 2^8, 2^{16}, 2^{32}$ are all non-$(\pm 1)$ modulo $65$, exposing $65 = 5 \times 13$ as composite.
The relationship between the three tests is hierarchical.
[quotetheorem:1769]
[citeproof:1769]
The critical quantitative result that makes the strong test into a practical algorithm is the following bound on strong pseudoprimes.
[quotetheorem:1770]
[citeproof:1770]
This bound is tight: there exist composites for which exactly $\frac{1}{4}\varphi(N)$ bases are strong liars. The factor of $\frac{1}{4}$ — better than the $\frac{1}{2}$ of the Euler test — makes the **Miller–Rabin test** the standard probabilistic primality test in practice. The algorithm: choose $k$ random bases $b_1, \ldots, b_k$ coprime to $N$; if $N$ fails the strong test for any $b_i$, output composite; otherwise output probably prime. If $N$ is composite, the probability of error is at most $(1/4)^k$, which falls below $2^{-100}$ for $k = 50$.
Under the Generalised Riemann Hypothesis, one can do better: any composite $N$ fails the strong test for some base $b < 2(\log N)^2$, giving a deterministic polynomial-time primality test that checks only $O((\log N)^2)$ bases. This conditional algorithm is fully rigorous given GRH.
In 2002, Agrawal, Kayal, and Saxena discovered the **AKS algorithm**, an unconditional deterministic primality test running in $O((\log N)^{6+\varepsilon})$ time. This resolved the long-standing open question of whether PRIMES lies in P. In practice, AKS is slower than Miller–Rabin for the key sizes used in cryptography, so probabilistic testing remains the workhorse. The upshot is that generating large primes is computationally cheap: test random candidates until one passes Miller–Rabin with sufficiently many bases.
## Factorisation Algorithms
Primality testing is fast; factorisation is another matter entirely. Given an odd composite $N$, we want to find a nontrivial divisor efficiently. No polynomial-time classical algorithm is known, and this apparent hardness underpins RSA. We survey the main classical strategies, from naive to sophisticated.
The simplest method is **trial division**: test whether $N$ is divisible by $3, 5, 7, \ldots$ in sequence. If $N = pq$ with $p \leq q$, then $p \leq \sqrt{N}$, so trial division up to $\sqrt{N}$ always works. The cost is $O(N^{1/2})$ steps, which is acceptable when $N$ has a small factor but completely infeasible when $N = pq$ with $p, q \approx N^{1/2}$.
### Fermat Factorisation
Can we do better when $N = ab$ with $|a - b|$ small? Since $a$ and $b$ are both odd (for odd $N$), we may write
\begin{align*}
N = ab = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2 = x^2 - y^2,
\end{align*}
with $x = (a+b)/2 \geq \sqrt{N}$ and $y = (a-b)/2 \geq 0$. Fermat's method searches for a representation $N + y^2 = x^2$: try $x = \lfloor\sqrt{N}\rfloor + 1, \lfloor\sqrt{N}\rfloor + 2, \ldots$ and check whether $x^2 - N$ is a perfect square.
[example: Fermat Factorisation]
Take $N = 200819$. We have $\lfloor\sqrt{200819}\rfloor = 448$. Check $449^2 - N = 201601 - 200819 = 782$, not a square. Then $450^2 - N = 202500 - 200819 = 1681 = 41^2$. So $N = 450^2 - 41^2 = (450+41)(450-41) = 491 \times 409$.
[/example]
The running time depends on how close together the factors are. If $x = \lfloor\sqrt{N}\rfloor + t$, we need $t$ steps to reach $x$. With $N = ab$ and $x = (a+b)/2$:
\begin{align*}
t \approx \frac{a+b}{2} - \sqrt{ab} = \frac{(\sqrt{a} - \sqrt{b})^2}{2}.
\end{align*}
When $a$ and $b$ are close, $t$ is small and the algorithm is fast. But if $N = pq$ with $p \approx 10^k$ and $q \approx 2 \times 10^k$, then $t \approx \frac{(\sqrt{2}-1)^2}{2} \cdot 10^k \approx 0.08 \times 10^k$, which is better than trial division by a constant factor but remains exponential in the bit-length.
### The Factor Base Method
A more powerful approach exploits the following observation. If we can find integers $x, y$ with $x^2 \equiv y^2 \pmod{N}$ but $x \not\equiv \pm y \pmod{N}$, then $N$ has a nontrivial factor.
[quotetheorem:1771]
[citeproof:1771]
How do we find such a congruence $x^2 \equiv y^2 \pmod{N}$? Direct search is too slow. Instead we look for many integers $x_i$ such that $x_i^2 \bmod N$ is divisible only by small primes, then combine them to produce a product that is a perfect square.
For an integer $a$ and modulus $N$, write $\langle a \rangle$ for the unique integer in $(-N/2, N/2]$ congruent to $a$ modulo $N$.
[definition: Factor Base and B-Numbers]
A **factor base** is a set $B = \{-1, p_1, \ldots, p_r\}$ where $p_1, \ldots, p_r$ are primes. A positive integer $x$ is a **$B$-number** if every prime factor of $\langle x^2 \rangle$ belongs to $B$.
[/definition]
The key combinatorial lemma guarantees that among sufficiently many $B$-numbers, we can always find a subset whose product of $\langle x_i^2 \rangle$ values is a perfect square.
[quotetheorem:1772]
[citeproof:1772]
The **factor base method** to factor $N$ then proceeds as follows. Choose a factor base $B$. Generate $B$-numbers $x_1, \ldots, x_k$ (so each $\langle x_i^2 \rangle$ is $B$-smooth). Find a nonempty subset $I$ such that $\prod_{i \in I} \langle x_i^2 \rangle = y^2$. Then with $x = \prod_{i \in I} x_i$, we have $x^2 \equiv y^2 \pmod{N}$. Apply the congruence-of-squares theorem: if $x \not\equiv \pm y \pmod{N}$, we obtain a nontrivial factor; otherwise, find more $B$-numbers and retry.
The probability of success at each attempt is at least $1 - 2^{1-t}$ when $N$ has $t$ distinct prime factors, since $(\mathbb{Z}/N\mathbb{Z})^\times$ has $2^t$ square roots of $1$, only two of which are $\pm 1$. For $t \geq 2$ this is positive, so the method terminates.
[example: Factor Base Factorisation of 1829]
Take $N = 1829$ and factor base $B = \{-1, 2, 3, 5, 7, 11, 13, 17, 19\}$. We compute $\langle x_i^2 \rangle$ for candidate $B$-numbers and check $B$-smoothness:
| $x_i$ | $\langle x_i^2 \rangle$ | factorisation | $B$-number? |
|--------|--------------------------|---------------|-------------|
| $42$ | $-65$ | $-5 \cdot 13$ | yes |
| $43$ | $20$ | $2^2 \cdot 5$ | yes |
| $60$ | $-58$ | $-2 \cdot 29$ | no |
| $61$ | $63$ | $3^2 \cdot 7$ | yes |
| $74$ | $-11$ | $-11$ | yes |
| $75$ | $138$ | $2 \cdot 3 \cdot 23$ | no |
| $85$ | $-91$ | $-7 \cdot 13$ | yes |
By inspection, the product $(-65)(20)(63)(-91) = (5 \cdot 13)(2^2 \cdot 5)(3^2 \cdot 7)(7 \cdot 13) = (2 \cdot 3 \cdot 5 \cdot 7 \cdot 13)^2$ is a perfect square. So
\begin{align*}
(42 \cdot 43 \cdot 61 \cdot 85)^2 \equiv (2 \cdot 3 \cdot 5 \cdot 7 \cdot 13)^2 \pmod{N},
\end{align*}
which gives $1459^2 \equiv 901^2 \pmod{1829}$. We find $\gcd(1829, 1459 + 901) = \gcd(1829, 2360) = 59$ and $\gcd(1829, 1459 - 901) = \gcd(1829, 558) = 31$. Indeed $N = 31 \times 59$.
[/example]
Note that in general $N$ need not equal $\gcd(N, x+y) \cdot \gcd(N, x-y)$; the GCDs may share a common factor with other prime components of $N$.
### Generating B-Numbers via Continued Fractions
The efficiency of the factor base method depends on generating many $B$-numbers quickly. A key source comes from convergents to the continued fraction expansion of $\sqrt{N}$.
[quotetheorem:1773]
[citeproof:1773]
This means $\langle p_n^2 \rangle = \langle p_n^2 \bmod N \rangle$ has absolute value at most $2\sqrt{N}$, which is much smaller than $N$. Small absolute value makes $B$-smoothness far more likely than for a random residue. Moreover, to run the method we only need $p_n \bmod N$ and can compute successive $p_n$ via the recurrence $p_{n+1} = a_{n+1} p_n + p_{n-1}$, working modulo $N$ throughout.
[example: Continued Fraction Factorisation of 12403]
Take $N = 12403$ with $\sqrt{N} = [111; \overline{2, 1, 2, 2, 7, 1, \ldots}]$ of period 16. The first several convergents give:
| $p_n \bmod N$ | $\langle p_n^2 \rangle$ | factorisation | $B$-number? |
|----------------|--------------------------|---------------|-------------|
| $111$ | $-82$ | $-2 \cdot 41$ | no |
| $223$ | $117$ | $3^2 \cdot 13$ | yes |
| $334$ | $-71$ | $-71$ | no |
| $891$ | $89$ | $89$ | no |
| $2116$ | $-27$ | $-3^3$ | yes |
| $3309$ | $166$ | $2 \cdot 83$ | no |
| $5416$ | $-39$ | $-3 \cdot 13$ | yes |
The product $(117)(-27)(-39) = (3^2 \cdot 13)(3^3)(3 \cdot 13) = 3^6 \cdot 13^2$ is a perfect square. So
\begin{align*}
(223 \times 2116 \times 5416)^2 \equiv (3^3 \cdot 13)^2 \pmod{N},
\end{align*}
giving $11341^2 \equiv 351^2 \pmod{12403}$. Then $\gcd(12403, 10990) = 157$ and $\gcd(12403, 11692) = 79$, so $12403 = 79 \times 157$.
[/example]
### Linear Algebra over $\mathbb{F}_2$ and Running Time
Finding the subset $I$ by exhaustive search over all $2^k$ subsets is completely impractical. Instead, record for each $B$-number $x_i$ with $\langle x_i^2 \rangle = (-1)^{a_{i,0}} \prod_{j=1}^r p_j^{a_{ij}}$ its parity vector $(a_{i,0}, a_{i,1}, \ldots, a_{i,r}) \in \mathbb{F}_2^{r+1}$. Finding a subset $I$ such that $\prod_{i \in I} \langle x_i^2 \rangle$ is a square is equivalent to finding a nonzero vector $\gamma = (\gamma_1, \ldots, \gamma_k) \in \mathbb{F}_2^k$ satisfying
\begin{align*}
\sum_{i=1}^k \gamma_i a_{ij} \equiv 0 \pmod{2} \quad \text{for all } j = 0, 1, \ldots, r.
\end{align*}
This is a linear algebra problem over $\mathbb{F}_2$, solvable by Gaussian elimination in $O(k^3)$ operations. Once $k > r + 1$, a nontrivial solution is guaranteed to exist.
The total expected running time of the continued fraction factor base method (CFRAC) is $C \cdot e^{2\sqrt{\log N \cdot \log \log N}}$. This is subexponential: it grows faster than any polynomial $(\log N)^A$ but slower than any power $N^\varepsilon$. For context, the more refined **quadratic sieve** achieves $e^{\sqrt{\log N \cdot \log \log N}}$ and the **number field sieve** (the current champion for general integers) achieves $e^{b (\log N)^{1/3} (\log \log N)^{2/3}}$ with $b = (64/9)^{1/3} \approx 1.92$.
### Pollard's $(p-1)$ Method
For numbers $N$ that have a prime factor $p$ with $p - 1$ smooth (i.e.\ all prime factors of $p-1$ are small), a more targeted method applies. Fermat's Little Theorem gives $a^{p-1} \equiv 1 \pmod{p}$ for $\gcd(a, p) = 1$. If all prime powers dividing $p-1$ are at most $m$, then $(p-1) \mid k$ where $k = \operatorname{lcm}(1, 2, \ldots, m)$, so $a^k \equiv 1 \pmod{p}$ and hence $p \mid a^k - 1$.
The method: choose random $a$ with $\gcd(a, N) = 1$ (e.g.\ $a = 2$), compute $a^k \bmod N$ by repeated squaring, then compute $d = \gcd(a^k - 1, N)$. If $p-1$ is $m$-smooth, then $p \mid \gcd(a^k - 1, N)$, and unless $a^k \equiv 1 \pmod{N}$ (which is unlikely), $d$ is a nontrivial factor.
[example: Pollard $(p-1)$ Factorisation]
Take $N = 540143$ and try $k = \operatorname{lcm}(1, \ldots, 7) = 2^3 \cdot 3 \cdot 5 \cdot 7 = 840$. With $a = 2$: compute $2^{840} \bmod N$ to get $53047$, then $\gcd(540143, 53047 - 1) = \gcd(540143, 53046) = 421$. So $N = 421 \times 1283$. One checks $p - 1 = 420 = 2^2 \cdot 3 \cdot 5 \cdot 7$, confirming that $p - 1$ was indeed $7$-smooth.
[/example]
The method works only when $p - 1$ happens to be smooth. Two extensions broaden the scope: the **$(p+1)$ method**, which applies when $p + 1$ is smooth and uses the multiplicative group of $\mathbb{F}_{p^2}$ (of order $p^2 - 1$) in place of $\mathbb{F}_p^\times$; and the **elliptic curve method (ECM)**, which replaces the multiplicative group of a finite field with the group of points on an elliptic curve over $\mathbb{F}_p$. Since different elliptic curves give groups of different orders, ECM can succeed even when both $p-1$ and $p+1$ are rough. The expected running time of ECM for finding a factor $p$ is $O(e^{b\sqrt{\log p \cdot \log \log p}})$, making it particularly effective when $N$ has a factor $p$ of moderate size (say $p \approx 2^{100}$ while $N \approx 2^{1000}$).
Finally, Shor's algorithm gives a polynomial-time quantum algorithm for factoring: on a sufficiently large fault-tolerant quantum computer, it factors $N$ in $O((\log N)^3)$ quantum gate operations. As of the time of this course, the largest number factored by Shor's algorithm on actual quantum hardware is $21 = 3 \times 7$. The relevance of quantum computation to the security of RSA remains an active research and engineering challenge.
## References
- A. J. Scholl, *Number Theory*, Cambridge Part II lecture notes, Michaelmas 2017.
- G. H. Hardy and E. M. Wright, *An Introduction to the Theory of Numbers*, 6th ed., Oxford University Press, 2008.
- K. Ireland and M. Rosen, *A Classical Introduction to Modern Number Theory*, 2nd ed., Springer, 1990.
Contents
- 1. Introduction
- Three Open Problems
- Why These Problems?
- The Congruent Number Problem
- The Twin Prime Conjecture
- The Riemann Hypothesis and the Prime Counting Function
- Sophistication Behind Simple Statements
- 2. Euclid's Algorithm and Factoring
- The Division Algorithm and the Ideal Structure of $\mathbb{Z}$
- Euclid's Algorithm
- Primes and the Fundamental Theorem of Arithmetic
- Polynomial Time and the Complexity of Factoring
- Factoring is Hard
- The Infinitude of Primes
- Finding Large Primes in Practice
- 3. Congruences
- The Ring of Residues and Its Units
- Simultaneous Congruences and the Chinese Remainder Theorem
- Multiplicative Functions and the Structure of $\varphi$
- Polynomial Congruences and Roots Modulo Primes
- Primitive Roots and the Cyclicity of $(\mathbb{Z}/p\mathbb{Z})^\times$
- Primitive Roots Modulo Prime Powers
- 4. Quadratic Residues
- Quadratic Residues
- The Legendre Symbol and Euler's Criterion
- Gauss's Lemma and the Legendre Symbol of 2
- Quadratic Reciprocity
- The Jacobi Symbol and Reciprocity for Composite Moduli
- Higher Reciprocity and the Broader Context
- An Analytic Proof of the Infinitude of Primes Congruent to 3 mod 4
- 5. Binary Quadratic Forms
- Sums of Two Squares and the Problem of Representation
- Equivalence of Forms and the Group $\mathrm{SL}_2(\mathbb{Z})$
- The Discriminant as an Invariant
- Positive Definite Forms and Definiteness
- Reduced Forms and the Reduction Algorithm
- Finiteness and the Reduced Form Bounds
- Uniqueness of the Reduced Representative
- Primes Represented as Sums of Two Squares
- The Class Number
- Which Integers Does a Form Represent?
- Representation by Forms of Class Number One
- Indefinite Forms and Higher-Dimensional Generalisations
- 6. Distribution of the Primes
- How Fast Do the Primes Grow?
- Gaps Between Consecutive Primes
- Divergence of the Sum of Reciprocals of Primes
- The Riemann $\zeta$-Function
- Analytic Continuation of the Zeta Function
- Dirichlet Series and Multiplicative Functions
- The von Mangoldt Function and Primes in Dirichlet Series
- Dirichlet's Theorem on Primes in Arithmetic Progressions
- The Sieve of Eratosthenes and Legendre's Formula
- The $p$-adic Valuation and Divisibility of Binomial Coefficients
- Chebyshev's Theorem
- Bertrand's Postulate
- 7. Continued Fractions
- The Continued Fraction Algorithm
- Convergents and Their Recurrence
- The Determinant Identity and Convergence
- Rational Approximations and Best Approximation
- Quadratic Irrationals and Periodic Expansions
- Galois's Theorem on Purely Periodic Expansions
- Pell's Equation
- 8. Primality Testing and Factorisation
- Fermat's Test and Its Limitations
- The Euler and Strong Pseudoprime Tests
- The Miller–Rabin Test
- Factorisation Algorithms
- Fermat Factorisation
- The Factor Base Method
- Generating B-Numbers via Continued Fractions
- Linear Algebra over $\mathbb{F}_2$ and Running Time
- Pollard's $(p-1)$ Method
- References
Cambridge II Number Theory
Content
Problems
History
Created by admin on 4/24/2026 | Last updated on 4/24/2026
Prerequisites
No prerequisites required for this page.
Rate this page
★
★
★
★
★
Poor
Excellent