Cambridge IB Groups, Rings and Modules - Content Verification
Current Content
Debug: Found 1993 attribution entries
First Attribution: Source: create, Text length: 837, Start: N/A, End: N/A
Page content length: 100129
Algebra, at its core, is the study of structure: the hidden patterns that make collections of objects behave in regular, predictable ways. Where analysis asks how fast does this change?, algebra asks what is preserved? Three layers of structure are the subject of these notes. Groups capture the pure essence of symmetry — the transformations of an object that leave it unchanged. Rings generalize the integers, equipping a set with two compatible operations and asking when the arithmetic of $\mathbb{Z}$ — divisibility, primality, factorization — extends to this broader setting. Modules generalize vector spaces by allowing scalars to come from a ring rather than a field; the reward is a unified theory that simultaneously classifies finite abelian groups and explains why every complex matrix is similar to a block-diagonal one.
The common thread through all three is the same methodology: identify the correct notion of subobject, form the quotient by identifying elements related by that subobject, and use the isomorphism theorems to read off the relationship between original, subobject, and quotient. Learning this methodology for groups — where it first appears in its cleanest form — is what makes the transition to rings and modules almost automati
c
.
These notes follow the Cambridge Part IB course in Groups, Rings, and Modules. Each chapter is self-contained, but the progression is deliberate: groups provide the language, rings provide the arithmetic, and modules provide the classification machinery that ties everything toget
her.
G
roups
A group is the mathematical distillation of the concept of symmetry. This chapter begins with the basic definitions — groups, subgroups, cosets — and the first deep result, Lagrange's theorem. It then builds the structural theory: the notion of a normal subgroup and quotient group, the three isomorphism theorems, group actions and the orbit-stabilizer theorem, conjugacy classes and the class equation, Sylow's theorems, and finally the classification of finite abelian groups and the simplicity of the alternating groups. Each of these topics answers a progressively sharper version of the same question: how much can we determine about a group's structure from limit
ed data?
Symmetry and the G
ro
up Axioms
What
Is a Group?
Before writing down an axiom, consider what we want to capture. The symmetries of an equilateral triangle are the six transformations that map the triangle to itself: three rotations ($0°$, $120°$, $240°$) and three reflections. These transformations can be composed — "rotate, then reflect" is itself a transformation of the triangle. This composition is associative, there is a "do nothing" transformation, and every transformation can be undone. The six symmetries, together with composition, form the dihedral group $D_6$. What makes this interesting is not the triangle, but the algebraic structure of the six-element set itself — and a group is precisely the abstraction of t
hat structure.
[d
efinition:Group]
A group is a triple $(G, \cdot, e)$ where $G$ is a set, $\cdot : G \times G \to G$ is a binary operation, and $e \in G$ is an ele
ment, satisfyin
g:
\begin{align*}
&\text{(Associativity)} \quad (a \cdot b) \cdot c = a \cdot (b \cdot c) \quad \text{for all
} a, b, c \in G, \
&\text{(Identity)} \quad a \cdot e = e \cdot a = a \quad \text{f
or all } a \in G, \
&\text{(Inverses)} \quad \text{for all } a \in G \text{ there exists } a^{-1} \in G \text{ with } a \cdot a^{-1}
= a^{-1} \cdo
t a = e.
\end
{
align*}
[/definition]
We usually suppress the operation and write $ab$ for $a \cdot b$. Observe that neither the identity nor inverses are assumed to be unique in the definition — but both turn out to be. If $b$ is also an identity, then $e = eb = b$; if $b$ is also an inverse of $a$, then $b = be = b(aa^{-1}) = (ba)a^{-1} = ea^{-1} = a^{-1}$. So the structure is more rig
i
d than it first appears.
[definition:Abelian Group]
A group $G$ is abelian if $ab = ba$ for all $a, b \in G$.
[/definition]
The distinction between abelian and non-abelian groups is fundamental: in an abelian group, the order of composition is irrelevant, while in a non-abelian group it is not. The symmetry group of the equilateral triangle is non-abelian: rotating by $120°$ and then reflecting across a fixed axis gives a different result than reflecting first and then rotating.
[example:Foundational Examples of Groups]
The following are the core examples that will recur throughout the course.
Additive number groups. The sets $(\mathbb{Z}, +, 0)$, $(\mathbb{Q}, +, 0)$, $(\mathbb{R}, +, 0)$, $(\mathbb{C}, +, 0)$ are all abelian groups. The key point is that subtraction is always possible: $a + (-a) = 0$.
Symmetric group. The symmetric group $S_n$ is the group of all bijections $\{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ under composition. It has order $n!$. We write permutations in disjoint cycle notation: $(1\ 2\ 3)(4\ 5)$ denotes the permutation sending $1 \mapsto 2 \mapsto 3 \mapsto 1$ and $4 \mapsto 5 \mapsto 4$, with $6, 7, \ldots$ fixed. Since permutations are functions, composition is right-to-left: to compute $(1\ 2\ 3) \circ (1\ 2)$, first apply $(1\ 2)$, then $(1\ 2\ 3)$. Tracing each element: $1 \overset{(1\ 2)}{\mapsto} 2 \overset{(1\ 2\ 3)}{\mapsto} 3$, then $2 \overset{(1\ 2)}{\mapsto} 1 \overset{(1\ 2\ 3)}{\mapsto} 2$, then $3 \overset{(1\ 2)}{\mapsto} 3 \overset{(1\ 2\ 3)}{\mapsto} 1$. The composition sends $1 \mapsto 3$, $2 \mapsto 2$, $3 \mapsto 1$, which is the transposition $(1\ 3)$. This right-to-left convention is essential to keep in mind.
General linear group. The set $\mathrm{GL}_n(\mathbb{R})$ of invertible $n \times n$ real matrices forms a group under matrix multiplication. It is non-abelian for $n \geq 2$.
Cyclic group. For $n \geq 1$, the cyclic group $C_n = \mathbb{Z}/n\mathbb{Z}$ is the group of integers modulo $n$ under addition. It is abelian of order $n$.
[/example]
Subgroups
Not every subset of a group is itself a group under the inherited operation. We want those subsets that are closed under the operation and under taking inverses — that is, subsets which are groups in their own right.
[definition:Subgroup]
Let $(G, \cdot, e)$ be a group. A subset $H \subseteq G$ is a subgroup, written $H \leq G$, if:
\begin{align*}
&\text{(i) } e \in H, \\
&\text{(ii) } a, b \in H \implies ab \in H, \\
&\text{(iii) } a \in H \implies a^{-1} \in H.
\end{align*}
[/definition]
A convenient criterion collapses these three conditions into one. If $H$ is non-empty and closed under the operation $h_1 h_2^{-1}$, then $H$ is a subgroup: taking $h_1 = h_2$ gives $e \in H$; then $h_1 = e$ gives $h_2^{-1} \in H$ for any $h_2 \in H$; and combining, $H$ is closed under products. This "subgroup lemma" is used constantly.
[example:A Subgroup Computation in $S_3$]
Consider $S_3$, the symmetric group on three elements. Its six elements in cycle notation are
\begin{align*}
e,\ (1\ 2),\ (1\ 3),\ (2\ 3),\ (1\ 2\ 3),\ (1\ 3\ 2).
\end{align*}
Let $H = \{e, (1\ 2\ 3), (1\ 3\ 2)\}$. We verify $H \leq S_3$. The set is non-empty. Computing products: $(1\ 2\ 3)(1\ 2\ 3) = (1\ 3\ 2)$ and $(1\ 2\ 3)(1\ 3\ 2) = e$, so $H$ is closed under products. Inverses: $(1\ 2\ 3)^{-1} = (1\ 3\ 2)$ and vice versa. So $H \leq S_3$. This is the alternating group $A_3 \cong C_3$.
Now let $K = \{e, (1\ 2)\}$. Then $K \leq S_3$ as well, since $(1\ 2)^{-1} = (1\ 2)$ (a transposition is its own inverse) and $(1\ 2)(1\ 2) = e \in K$. But $\{(1\ 2), (1\ 3)\}$ is not a subgroup, since $(1\ 2)(1\ 3) = (1\ 3\ 2) \notin \{(1\ 2), (1\ 3)\}$.
[/example]
Cosets and the Counting Principle
Lagrange's theorem is the first genuinely surprising result of group theory. It says that the order of any subgroup must divide the order of the whole group — a powerful constraint that immediately rules out many potential subgroup orders. The proof is elegant and rests on a single observation about how a subgroup partitions its parent group.
Before stating it, we make precise the notion of "size" for both groups and elements.
[definition:Order of a Group]
The order of a group $G$, written $|G|$, is the cardinality of the underlying set.
[/definition]
[definition:Order of an Element]
The order of an element $g \in G$, written $\mathrm{ord}(g)$, is the smallest positive integer $n$ such that $g^n = e$. If no such $n$ exists, $g$ has infinite order.
[/definition]
The cosets of a subgroup are the key to Lagrange's proof. They partition the group into pieces of equal size.
[definition:Left Coset]
Let $H \leq G$ and $g \in G$. The left coset of $H$ with representative $g$ is
\begin{align*}
gH = \{gh : h \in H\} \subseteq G.
\end{align*}
The collection of all left $H$-cosets is written $G/H = \{gH : g \in G\}$, and the number of left cosets is the index $|G : H|$.
[/definition]
Two cosets are either identical or completely disjoint: if $g_1H \cap g_2H \neq \varnothing$, pick a common element $g_1h_1 = g_2h_2$, so $g_2^{-1}g_1 = h_2h_1^{-1} \in H$, and one verifies $g_1H = g_2H$. Since every $g \in G$ belongs to the coset $gH$ (as $e \in H$), the cosets partition $G$. Moreover, the map $h \mapsto gh$ is a bijection $H \to gH$, so all cosets have the same size $|H|$. Together, these two observations give Lagrange's theorem.
[quotetheorem:841]
The power of Lagrange's Theorem is its scope: it applies to every finite group and every subgroup, with no additional hypotheses. Its first consequence is that the order of any element divides the group order, since the cyclic subgroup $\langle g \rangle = \{e, g, g^2, \ldots, g^{n-1}\}$ has order $\mathrm{ord}(g)$, which must divide $|G|$. Less obviously, it implies that every group of prime order $p$ is cyclic: any non-identity element has order dividing $p$, hence order $p$, so it generates the whole group.
What Lagrange's theorem does not say is the converse: if $k \mid |G|$, there need not be a subgroup of order $k$. The group $A_4$ (alternating group on four letters, order $12$) has no subgroup of order $6$, even though $6 \mid 12$. The Sylow theorems, proved later, give the sharpest partial converse: subgroups of prime-power order always exist.
[quoteproof:841]
[example:Coset Partition of $S_3$ by $A_3$]
We compute the left $A_3$-cosets in $S_3$ explicitly. Here $A_3 = \{e, (1\ 2\ 3), (1\ 3\ 2)\}$ and $|S_3 : A_3| = 6/3 = 2$.
The two cosets are:
\begin{align*}
e \cdot A_3 &= \{e, (1\ 2\ 3), (1\ 3\ 2)\} = A_3, \\
(1\ 2) \cdot A_3 &= \{(1\ 2), (1\ 2)(1\ 2\ 3), (1\ 2)(1\ 3\ 2)\}.
\end{align*}
We compute each product using right-to-left composition. For $(1\ 2)(1\ 2\ 3)$: apply $(1\ 2\ 3)$ first, then $(1\ 2)$. Tracing elements: $1 \overset{(1\ 2\ 3)}{\mapsto} 2 \overset{(1\ 2)}{\mapsto} 1$, then $2 \overset{(1\ 2\ 3)}{\mapsto} 3 \overset{(1\ 2)}{\mapsto} 3$, then $3 \overset{(1\ 2\ 3)}{\mapsto} 1 \overset{(1\ 2)}{\mapsto} 2$. The composition maps $1 \mapsto 1$, $2 \mapsto 3$, $3 \mapsto 2$, which is the transposition $(2\ 3)$.
For $(1\ 2)(1\ 3\ 2)$: apply $(1\ 3\ 2)$ first, then $(1\ 2)$. Tracing: $1 \overset{(1\ 3\ 2)}{\mapsto} 3 \overset{(1\ 2)}{\mapsto} 3$, then $2 \overset{(1\ 3\ 2)}{\mapsto} 1 \overset{(1\ 2)}{\mapsto} 2$, then $3 \overset{(1\ 3\ 2)}{\mapsto} 2 \overset{(1\ 2)}{\mapsto} 1$. The result is $(1\ 3)$.
Thus:
\begin{align*}
(1\ 2) \cdot A_3 = \{(1\ 2), (2\ 3), (1\ 3)\}.
\end{align*}
The two cosets are exactly the even and odd permutations, partitioning $S_3$ into two equal halves of size $3$.
[/example]
Normal Subgroups and Quotient Groups
The Problem with Cosets
The coset partition $G/H = \{gH : g \in G\}$ is a beautiful object. A natural question is: does it form a group in its own right, with multiplication defined by
\begin{align*}
(g_1 H) \cdot (g_2 H) = g_1 g_2 H?
\end{align*}
The answer is: not always. The formula must be well-defined — it must give the same answer regardless of which representative we choose for each coset. Changing $g_2$ to $g_2' = g_2 h$ (another representative of $g_2 H$) gives no trouble: $g_1 g_2 h H = g_1 g_2 H$ since $h \in H$. But changing $g_1$ to $g_1' = g_1 h$ causes a problem:
\begin{align*}
(g_1 h)(g_2) H = g_1 (h g_2) H.
\end{align*}
For this to equal $g_1 g_2 H$, we need $g_2^{-1} h g_2 \in H$. This must hold for every $h \in H$ and every $g_2 \in G$. Subgroups with this property are called normal.
[definition:Normal Subgroup]
A subgroup $H \leq G$ is normal, written $H \trianglelefteq G$, if for every $h \in H$ and every $g \in G$,
\begin{align*}
g^{-1} h g \in H.
\end{align*}
Equivalently, $gH = Hg$ for all $g \in G$, i.e. left and right cosets coincide.
[/definition]
Every subgroup of an abelian group is normal, since $g^{-1}hg = g^{-1}gh = h \in H$ trivially. In non-abelian groups, normality is a genuine constraint. In $S_3$, the subgroup $A_3 = \{e, (1\ 2\ 3), (1\ 3\ 2)\}$ is normal (as we will see from the coset computation above — it equals its own right coset), but $\{e, (1\ 2)\}$ is not: one checks that $(1\ 3)^{-1}(1\ 2)(1\ 3) = (2\ 3) \notin \{e, (1\ 2)\}$.
[definition:Quotient Group]
If $H \trianglelefteq G$, the quotient group $G/H$ is the set of left $H$-cosets with multiplication
\begin{align*}
(g_1 H) \cdot (g_2 H) = g_1 g_2 H,
\end{align*}
identity element $eH = H$, and inverse $(gH)^{-1} = g^{-1}H$.
[/definition]
The normality condition is exactly what makes this well-defined. The group axioms are inherited from $G$: associativity follows from associativity in $G$, and the identity and inverse checks are immediate.
Homomorphisms and Isomorphisms
Subgroups and quotient groups describe the internal structure of a single group. We now want to understand how different groups relate to each other, and for that we need structure-preserving maps.
[definition:Group Homomorphism]
Let $(G, \cdot, e_G)$ and $(H, \ast, e_H)$ be groups. A function $\varphi : G \to H$ is a group homomorphism if
\begin{align*}
\varphi(g_1 \cdot g_2) = \varphi(g_1) \ast \varphi(g_2) \quad \text{for all } g_1, g_2 \in G.
\end{align*}
[/definition]
From the homomorphism property alone one can deduce $\varphi(e_G) = e_H$ (apply the condition with $g_1 = g_2 = e_G$, then cancel $\varphi(e_G)$) and $\varphi(g^{-1}) = \varphi(g)^{-1}$ (apply with $g_2 = g^{-1}$ and use $\varphi(e_G) = e_H$). So a homomorphism automatically preserves the entire group structure.
Every homomorphism carries two pieces of information: what it hits (the image) and what it collapses (the kernel). Understanding this decomposition — what is preserved versus what is identified — is the key to the isomorphism theorems.
[definition:Kernel of a Homomorphism]
The kernel of a homomorphism $\varphi : G \to H$ is
\begin{align*}
\ker(\varphi) = \{g \in G : \varphi(g) = e_H\}.
\end{align*}
[/definition]
[definition:Image of a Homomorphism]
The image of a homomorphism $\varphi : G \to H$ is
\begin{align*}
\operatorname{im}(\varphi) = \{h \in H : h = \varphi(g) \text{ for some } g \in G\}.
\end{align*}
[/definition]
The kernel is always a normal subgroup of $G$, and the image is always a subgroup of $H$ — both follow from straightforward computation. The kernel measures how far $\varphi$ is from being injective: $\varphi$ is injective if and only if $\ker(\varphi) = \{e_G\}$.
[definition:Group Isomorphism]
A group isomorphism is a bijective group homomorphism. Two groups $G$ and $H$ are isomorphic, written $G \cong H$, if there exists an isomorphism between them.
[/definition]
Isomorphic groups are, for all algebraic purposes, identical: they have the same order, the same subgroup lattice, the same element orders. We regard them as the same group, presented differently.
The Isomorphism Theorems
The three isomorphism theorems are the primary tools for understanding quotient groups. Their common theme: given a homomorphism, the quotient of the domain by the kernel is isomorphic to the image. This single fact, once understood deeply, makes most computations with quotient groups routine.
The problem that the first isomorphism theorem solves is this: we have a homomorphism $\varphi : G \to H$, and we want to understand $\operatorname{im}(\varphi)$ — but $\operatorname{im}(\varphi)$ lives inside $H$, which may be complicated. The theorem says we can instead study $G/\ker(\varphi)$, which lives inside $G$ and is often simpler. More importantly, it gives an explicit isomorphism between the two.
[quotetheorem:842]
The First Isomorphism Theorem for Groups is used in virtually every computation involving quotient groups. Its key feature is that it transforms questions about subgroups of $H$ (the target) into questions about quotients of $G$ (the source), where we have more control. Notice that the theorem does not require $\varphi$ to be surjective — if it is, then $\operatorname{im}(\varphi) = H$ and we get $G/\ker(\varphi) \cong H$ directly. The injectivity of the induced map $\bar{\varphi} : G/\ker(\varphi) \to \operatorname{im}(\varphi)$ is automatic: two cosets are identified precisely when they have the same image, which is exactly the coset condition.
[quoteproof:842]
[example:The Exponential Isomorphism]
Consider the map
\begin{align*}
\varphi : (\mathbb{C}, +) &\to (\mathbb{C} \setminus \{0\}, \times) \\
z &\mapsto e^z.
\end{align*}
The identity $e^{z + w} = e^z e^w$ says exactly that $\varphi$ is a group homomorphism. What is the kernel? We need $e^z = 1$, which holds precisely when $z = 2\pi i k$ for some $k \in \mathbb{Z}$, so $\ker(\varphi) = 2\pi i \mathbb{Z}$. The image is all of $\mathbb{C} \setminus \{0\}$, since the complex logarithm exists (the exponential is surjective onto non-zero complex numbers). By the first isomorphism theorem:
\begin{align*}
(\mathbb{C}/2\pi i\mathbb{Z},\ +) \cong (\mathbb{C} \setminus \{0\},\ \times).
\end{align*}
This is a remarkable identification: the additive quotient of $\mathbb{C}$ by an arithmetic progression equals the multiplicative group of non-zero complex numbers. Geometrically, $\mathbb{C}/2\pi i \mathbb{Z}$ wraps the complex plane into a cylinder by identifying $z$ with $z + 2\pi i$, and $e^z$ wraps that cylinder further to fill $\mathbb{C} \setminus \{0\}$.
[/example]
The second isomorphism theorem addresses a subtler situation: we have two subgroups $H$ and $K$ of $G$, with $K$ normal, and we want to understand how their interaction $H \cap K$ relates to the coset spaces $HK/K$ and $H/(H \cap K)$.
[quotetheorem:843]
The Second Isomorphism Theorem for Groups is best understood as saying: the "$K$-shadow" of $H$ inside $G/K$ is $HK/K$, and the kernel of the restriction of the quotient map $G \to G/K$ to $H$ is exactly $H \cap K$. The proof works by writing down the obvious map $h \mapsto hK$ and applying the first isomorphism theorem — the content is entirely in identifying the kernel and image correctly. A key consequence: $|HK| = |H||K|/|H \cap K|$ (when $G$ is finite), which counts elements in a product of two subgroups.
[quoteproof:843]
The third isomorphism theorem is sometimes called the "cancellation rule" for quotients, and it says that quotienting in stages gives the same result as quotienting all at once.
[quotetheorem:844]
The Third Isomorphism Theorem for Groups is the group-theoretic analogue of the arithmetic identity $(n/m)/1 = n/m$: if we have already taken the quotient by $K$, and we further quotient by $L/K$, we get the same thing as quotienting $G$ by $L$ directly. The proof is a one-line application of the first isomorphism theorem to the obvious surjective map $G/K \to G/L$. Its main use is in induction arguments, where one wants to pass to a quotient group and then apply an inductive hypothesis.
[quoteproof:844]
Finally, the correspondence theorem records how the subgroup lattice of $G/K$ mirrors the part of the subgroup lattice of $G$ that lies above $K$.
[quotetheorem:854]
The Correspondence Theorem for Groups is indispensable whenever we need to enumerate or classify subgroups of a quotient group. Since $K \trianglelefteq G$, every normal subgroup of $G$ containing $K$ descends to a normal subgroup of $G/K$, and conversely. This will be used repeatedly in Sylow theory and in the classification of simple groups: to show $G/K$ is simple, we show $G$ has no normal subgroups strictly between $K$ and $G$.
[quoteproof:854]
Group Actions
Every abstract group arises in practice as a group of symmetries of something. A group action formalizes this: it is a way of letting $G$ "act on" a set $X$ by permuting its elements, compatibly with the group structure.
[definition:Group Action]
An action of a group $(G, \cdot)$ on a set $X$ is a function
\begin{align*}
\ast : G \times X &\to X \\
(g, x) &\mapsto g \ast x
\end{align*}
satisfyin
g:
\begin{align*}
&
\text{(i) } g_1 \ast (g_2 \ast x) = (g_1 \cdot g_2) \ast x \quad \text{for all } g_1, g_2 \in G,\ x \in X, \
&\text{(ii) } e \ast x = x \quad \text{for all } x \in X.
\end{align*}
[/definition]
There is a cleaner way to think about this: an action of $G$ on $X$ is the same as a group homomorphism $\varphi : G \to \mathrm{Sym}(X)$. Given an action, define $\varphi(g) = (x \mapsto g \ast x)$; this is a bijection (with inverse $\varphi(g^{-1})$), and condition (i) says $\varphi(g_1) \circ \varphi(g_2) = \varphi(g_1 g_2)$. Conversely, given $\varphi : G \to \mathrm{Sym}(X)$, define $g \ast x = \varphi(g)(x)$. The two constructions are inverse to each other.
[definition:Permutation Representation]
A permutation representation of $G$ is a group homomorphism $\varphi : G \to \mathrm{Sym}(X)$ for some set $X$. The kernel of the action is $\ker(\varphi)$ and the image is $\operatorname{im}(\varphi)$.
[/definition]
[definition:Orbit]
For a group $G$ acting on a set $X$ and an element $x \in X$, the orbit of $x$ is
\begin{align*}
G \cdot x = \{g \ast x : g \in G\} \subseteq X.
\end{align*}
[/definition]
[definition:Stabilizer]
The stabilizer (or isotropy group) of $x \in X$ under the action of $G$ is
\begin{align*}
G_x = \{g \in G : g \ast x = x\} \leq G.
\end{align*}
[/definition]
The orbits partition $X$ (they are the equivalence classes of the relation $x \sim y \iff y \in G \cdot x$). The stabilizer is always a subgroup of $G$. These two facts combine into the orbit-stabilizer theorem, which is the counting engine behind most applications of group actions.
[quotetheorem:845]
The Orbit-Stabilizer Theorem is the group-action version of Lagrange's theorem. The bijection $G/G_x \leftrightarrow G \cdot x$ says: elements of $G$ that send $x$ to the same image are exactly the elements of the same coset of $G_x$. The finite version $|G| = |G_x| \cdot |G \cdot x|$ is the key formula: to count the orbit, divide $|G|$ by the stabilizer size (which is computable if we understand which elements fix $x$). This will be used to count conjugacy classes, to count Sylow subgroups, and to determine the structure of $p$-groups.
[quoteproof:845]
Every group can be viewed as a group of symmetries — not just an abstract algebraic system — by letting it act on itself. This is Cayley's theorem.
[quotetheorem:846]
Cayley's Theorem has a philosophical message as much as a practical one: groups are not just abstract algebraic systems; they are all, in principle, groups of permutations. In practice, the embedding $G \hookrightarrow \mathrm{Sym}(G)$ is rarely the most efficient way to study $G$ (since $\mathrm{Sym}(G)$ has order $|G|!$, which grows much faster than $|G|$). But having a concrete realization of every group as a permutation group proves existence results and connects abstract group theory to the older tradition of studying permutations directly.
[quoteproof:846]
[example:Counting Symmetries of a Cube via Orbit-Stabilizer]
Let $G$ be the rotation group of the cube, and let $X$ be the set of four main diagonals of the cube (the lines connecting opposite vertices: there are four pairs of opposite vertices, giving four diagonals).
Every rotation of the cube permutes the diagonals, so $G$ acts on $X$, giving $\varphi : G \to \mathrm{Sym}(X) \cong S_4$.
We compute $|G|$ using orbit-stabilizer. Fix a diagonal $d$. The orbit $G \cdot d$ is all four diagonals (any two diagonals can be mapped to each other by a rotation of the cube), so $|G \cdot d| = 4$.
The stabilizer $G_d$ consists of rotations fixing the diagonal $d$. Such a rotation preserves the axis determined by $d$. The rotations fixing this axis are: the identity; rotations by $120°$ and $240°$ around $d$ (viewing $d$ as the rotation axis, the three vertices of the triangular cross-section perpendicular to $d$ are cyclically permuted). In addition, there are three rotations by $180°$ around axes that pass through midpoints of opposite edges and are perpendicular to $d$ — each of these swaps the two endpoints of $d$ while fixing $d$ as a set. This gives $|G_d| = 6$.
By the Orbit-Stabilizer Theorem:
\begin{align*}
|G| = |G_d| \cdot |G \cdot d| = 6 \times 4 = 24.
\end{align*}
As a consistency check, we can instead let $G$ act on the set of six faces. The orbit of any face $f$ has size $6$ (rotations can send any face to any other), and the stabilizer of $f$ consists of rotations by $0°, 90°, 180°, 270°$ around the axis through $f$ and the opposite face, giving $|G_f| = 4$. This confirms $|G| = 4 \times 6 = 24$.
Now $\varphi : G \to S_4$ is injective (a non-identity rotation must move at least one diagonal, so the kernel of the action on diagonals is trivial) and $|G| = 24 = |S_4|$, so $G \cong S_4$.
[/example]
Conjugacy and the Class Equation
Among all group actions, the action of a group on itself by conjugation — $g \ast a = gag^{-1}$ — stands out for its algebraic richness. The orbits are the conjugacy classes, which encode fundamental information about the group's structure.
[definition:Conjugacy Class]
The conjugacy class of $g \in G$ is
\begin{align*}
\mathrm{ccl}_G(g) = \{hgh^{-1} : h \in G\},
\end{align*}
i.e. the orbit of $g$ under the conjugation action of $G$ on itself.
[/definition]
[definition:Centralizer]
The centralizer of $g \in G$ is
\begin{align*}
C_G(g) = \{h \in G : hgh^{-1} = g\} = \{h \in G : hg = gh\},
\end{align*}
i.e. the stabilizer of $g$ under the conjugation action. It is a subgroup of $G$.
[/definition]
[definition:Centre]
The centre of $G$ is
\begin{align*}
Z(G) = \{h \in G : hg = gh \text{ for all } g \in G\} = \bigcap_{g \in G} C_G(g).
\end{align*}
[/definition]
Elements of $Z(G)$ commute with everything; they form conjugacy classes of size $1$ (since $hgh^{-1} = g$ for all $h$ iff $g \in Z(G)$). By the Orbit-Stabilizer Theorem, $|\mathrm{ccl}_G(g)| = |G|/|C_G(g)|$. Since the conjugacy classes partition $G$, we get the class equation:
\begin{align*}
|G| = |Z(G)| + \sum_{\substack{g \notin Z(G) \\ \text{one per class}}} \frac{|G|}{|C_G(g)|}.
\end{align*}
Every term in the sum on the right is greater than $1$ (since $g \notin Z(G)$ means the centralizer is a proper subgroup). This equation is the key to analyzing $p$-groups.
In symmetric groups, conjugacy classes are determined entirely by cycle type: two permutations in $S_n$ are conjugate if and only if they have the same cycle structure. This is because $\sigma (a_1\ a_2\ \cdots\ a_k) \sigma^{-1} = (\sigma(a_1)\ \sigma(a_2)\ \cdots\ \sigma(a_k))$ — conjugation simply relabels the elements being permuted.
[quotetheorem:848]
The p-Group Has Nontrivial Center theorem is one of the most consequential in all of group theory. Its immediate corollary is that a $p$-group of order $p^n$ with $n \geq 2$ is never simple: $Z(G)$ is a non-trivial normal subgroup. More subtly, it enables inductive arguments: since $Z(G)$ is a normal subgroup of $G$, we can form the quotient $G/Z(G)$, which is a strictly smaller $p$-group, and apply the theorem again. This "center-killing" induction is the basis for the classification of $p$-groups of small order. Notice the role of $p$-divisibility in the proof: the class equation forces $p \mid |Z(G)|$ because $p$ divides $|G|$ and $p$ divides every non-singleton conjugacy class size; this is where the prime-power hypothesis is used.
[quoteproof:848]
[example:Conjugacy Classes in a Non-Abelian Group of Order $p^3$]
Let $p$ be an odd prime and let $G$ be a non-abelian group of order $p^3$. We determine the complete conjugacy class structure of $G$.
Step 1: The centre has order $p$.
By p-Group Has Nontrivial Center, $|Z(G)| \geq p$. If $|Z(G)| = p^3$, then $G$ is abelian, contradicting our assumption. If $|Z(G)| = p^2$, then $|G/Z(G)| = p$, which is cyclic. But if $G/Z(G)$ is cyclic, then $G$ is abelian — a standard lemma (if every element of $G$ has the form $g^r z$ with $z \in Z(G)$, then any two elements commute) — contradiction again. So $|Z(G)| = p$.
Step 2: Centralizers of non-central elements.
For $g \notin Z(G)$, we have $Z(G) \subseteq C_G(g)$ (the centre commutes with everything) and $C_G(g) \subsetneq G$ (since $g \notin Z(G)$). By Lagrange, $|C_G(g)|$ divides $p^3$ and satisfies $p \leq |Z(G)| \leq |C_G(g)| < p^3$. The only possibility is $|C_G(g)| = p^2$.
Step 3: Class sizes.
By the orbit-stabilizer theorem, $|\mathrm{ccl}_G(g)| = |G|/|C_G(g)|$. For $g \in Z(G)$: $|C_G(g)| = |G| = p^3$, so the class has size $1$. For $g \notin Z(G)$: $|C_G(g)| = p^2$, so the class has size $p$.
Step 4: Counting.
The class equation gives:
\begin{align*}
p^3 = \underbrace{p}_{\text{elements of } Z(G)} + \lambda \cdot p,
\end{align*}
where $\lambda$ is the number of non-singleton conjugacy classes. Solving: $\lambda = (p^3 - p)/p = p^2 - 1$.
So $G$ has exactly $p$ conjugacy classes of size $1$ (the centre) and $p^2 - 1$ conjugacy classes of size $p$. The total number of conjugacy classes is $p + (p^2 - 1) = p^2 + p - 1$. For $p = 2$, this gives $2^2 + 2 - 1 = 5$ conjugacy classes in a non-abelian group of order $8$ (both $D_8$ and $Q_8$ have exactly $5$ conjugacy classes, consistent with this count).
[/example]
Sylow Theory
Lagrange's theorem tells us that the order of any subgroup divides $|G|$ — but it says nothing about which divisors actually arise. In general, the converse fails: $A_4$ has order $12$ but no subgroup of order $6$. The Sylow theorems provide the strongest result in the other direction: for every prime power $p^a$ dividing $|G|$ to its full extent, a subgroup of that order exists.
[definition:Sylow $p$-Subgroup]
Let $G$ be a finite group with $|G| = p^a m$ where $p$ is prime and $p \nmid m$. A Sylow $p$-subgroup of $G$ is a subgroup of order $p^a$. The set of all Sylow $p$-subgroups is denoted $\mathrm{Syl}_p(G)$, and $n_p = |\mathrm{Syl}_p(G)|$.
[/definition]
Why should such subgroups exist? The naive expectation from Lagrange would be that we need to build them up from smaller subgroups, but there is no obvious reason they should piece together correctly. The key insight in Sylow's proof is to act on the set of all $p^a$-element subsets of $G$ — a combinatorial object whose size is coprime to $p$ — and extract an invariant subset.
[quotetheorem:847]
The three parts of Sylow's Theorems answer three different questions about the Sylow $p$-subgroups. The first guarantees existence. The second — that all Sylow $p$-subgroups are conjugate — is the deeper result: it says the subgroup is essentially unique, up to the internal symmetry of $G$. The third gives arithmetic constraints on $n_p$: the congruence $n_p \equiv 1 \pmod{p}$ and the divisibility $n_p \mid m$ together severely restrict how many Sylow subgroups there can be. In practice, one combines $n_p \equiv 1 \pmod{p}$ with $n_p \mid m$ to narrow $n_p$ to a small list of candidates, and then either forces $n_p = 1$ (giving a unique, hence normal, Sylow subgroup) or derives a contradiction to prove simplicity or non-simplicity.
[quoteproof:847]
[example:No Simple Group of Order 1000]
Let $|G| = 1000 = 2^3 \cdot 5^3$. We show $G$ is not simple.
Apply Sylow's Theorems with $p = 5$. We need $n_5 \equiv 1 \pmod{5}$ and $n_5 \mid 2^3 = 8$. The divisors of $8$ that are congruent to $1 \pmod{5}$ are: $1$ (since $8 \equiv 3 \pmod{5}$, $4 \equiv 4 \pmod{5}$, $2 \equiv 2 \pmod{5}$, $1 \equiv 1 \pmod{5}$). The only option is $n_5 = 1$.
Since there is exactly one Sylow $5$-subgroup $P$, it must be normal in $G$: any conjugate $gPg^{-1}$ is again a Sylow $5$-subgroup (it has the same order $5^3 = 125$), and since there is only one, $gPg^{-1} = P$ for all $g \in G$. So $P \trianglelefteq G$ with $P \neq \{e\}$ and $P \neq G$ (since $|P| = 125 < 1000 = |G|$). Thus $G$ is not simple.
[/example]
[example:No Simple Group of Order 132]
Let $|G| = 132 = 2^2 \cdot 3 \cdot 11$. We show $G$ is not simple by deriving an element-counting contradiction if we assume it is.
Step 1: Sylow 11-subgroups. We have $n_{11} \equiv 1 \pmod{11}$ and $n_{11} \mid 12$ (since $132 = 11 \cdot 12$). The divisors of $12$ are $1, 2, 3, 4, 6, 12$. Those congruent to $1 \pmod{11}$ are: $1$ and $12$. If $G$ were simple, $n_{11} \neq 1$, so $n_{11} = 12$.
Step 2: Sylow 3-subgroups. We have $n_3 \equiv 1 \pmod 3$ and $n_3 \mid 44$ (since $132 = 3 \cdot 44$). Divisors of $44$: $1, 2, 4, 11, 22, 44$. Those $\equiv 1 \pmod 3$: $1, 4, 22$. Simplicity forces $n_3 \neq 1$, so $n_3 \in \{4, 22\}$. We claim $n_3 = 4$ leads to a contradiction even before the element count.
If $n_3 = 4$, then $G$ acts on $\mathrm{Syl}_3(G)$ by conjugation, giving a homomorphism $\varphi : G \to S_4$. Since $G$ is simple, $\ker \varphi = \{e\}$, so $G \cong \operatorname{im}(\varphi) \leq S_4$. But $|G| = 132 > 24 = |S_4|$, a contradiction. So $n_3 = 22$.
Step 3: Element count. Each Sylow $11$-subgroup has order $11$ (prime), so any two distinct Sylow $11$-subgroups intersect trivially. This gives
\begin{align*}
12 \times (11 - 1) = 120 \text{ elements of order } 11.
\end{align*}
Each Sylow $3$-subgroup has order $3$ (prime), so distinct ones also intersect trivially:
\begin{align*}
22 \times (3 - 1) = 44 \text{ elements of order } 3.
\end{align*}
Total elements accounted for: $120 + 44 = 164$. But $|G| = 132 < 164$. Contradiction.
Since assuming $G$ is simple leads to a contradiction, $G$ cannot be simple. $\square$
[/example]
Simple Groups and the Classification of Finite Abelian Groups
The Atoms of Group Theory
The Sylow theorems let us decompose many groups by finding normal Sylow subgroups. The groups that resist any such decomposition — groups with no proper non-trivial normal subgroups — are called simple. They are the "atoms" of group theory, in the same way that primes are the atoms of number theory.
[definition:Simple Group]
A non-trivial group $G$ is simple if its only normal subgroups are $\{e\}$ and $G$ itself.
[/definition]
Among abelian groups, the simple ones are easy to characterize: $C_p$ for prime $p$ is simple (since its only subgroups have order $1$ or $p$, and all subgroups are normal), and conversely, any abelian simple group must be cyclic of prime order (a non-cyclic abelian group has proper non-trivial subgroups, and an infinite cyclic group $\mathbb{Z}$ is not simple since $2\mathbb{Z} \trianglelefteq \mathbb{Z}$). Non-abelian simple groups are far more subtle. The smallest is $A_5$, the alternating group on five letters, which has order $60$.
Why do we care about simple groups? Finite group theory has a decomposition theorem: every finite group $G$ has a composition series — a chain $\{e\} = G_0 \trianglelefteq G_1 \trianglelefteq \cdots \trianglelefteq G_k = G$ in which each quotient $G_{i+1}/G_i$ is simple. The simple groups are therefore the building blocks, and the Jordan-Hölder theorem guarantees the list of quotients is an isomorphism invariant of $G$. Classifying all finite simple groups — the Classification of Finite Simple Groups (CFSG), completed in the early 2000s — is one of the greatest achievements of twentieth-century mathematics.
The infinite family of non-abelian simple groups most relevant to this course is the alternating groups $A_n$ for $n \geq 5$.
[quotetheorem:849]
The proof of the Alternating Groups Are Simple theorem uses a beautiful three-step strategy. First, one shows $A_n$ is generated by 3-cycles. Second, one shows any normal subgroup containing any 3-cycle must contain all 3-cycles, hence all of $A_n$ — this uses conjugation to transport a 3-cycle to any other 3-cycle, with the $n \geq 5$ hypothesis needed to make the conjugating permutation even. Third, one shows that any non-trivial normal subgroup must contain a 3-cycle, by examining all possible cycle structures and deriving a 3-cycle in each case. The condition $n \geq 5$ appears twice in the proof, both times to ensure there are enough indices to construct the needed permutations. The result fails for $n = 4$: the group $V = \{e, (1\ 2)(3\ 4), (1\ 3)(2\ 4), (1\ 4)(2\ 3)\}$ is a normal subgroup of $A_4$.
[quoteproof:849]
Classifying Finite Abelian Groups
While non-abelian simple groups are enormously complex, finite abelian groups have a complete and explicit classification. The key insight is that every finite abelian group can be written uniquely as a product of cyclic groups, with the order of each factor dividing the next.
[quotetheorem:850]
The Classification of Finite Abelian Groups reduces the study of finite abelian groups to purely combinatorial data: the invariant factors $d_1, \ldots, d_r$ with $d_1 \mid d_2 \mid \cdots \mid d_r$. Its proof (which uses the Structure Theorem for Finitely Generated Modules over Euclidean Domains from Chapter 3, applied to $\mathbb{Z}$-modules) is one of the finest examples of how algebraic machinery can produce a complete classification. For now, the theorem should be taken as given; Chapter 3 will prove it.
What the theorem does not say is that the decomposition into cyclic groups is unique in all forms — only in the specific invariant factor form where $d_1 \mid d_2 \mid \cdots \mid d_r$. There is also a primary decomposition, where each $C_{d_i}$ is further broken into prime-power cyclic pieces via the Chinese remainder theorem; that form is not unique in terms of the ordering, but the multiset of prime powers that appears is unique.
[quoteproof:850]
[example:Classifying Finite Abelian Groups of Small Order]
We list all finite abelian groups of order $\leq 16$ using the classification theorem.
Order $1$: only $\{e\}$.
Order $2$: only $C_2$.
Order $3$: only $C_3$.
Order $4$: $d_1 \mid d_2$ with $d_1 d_2 = 4$. Options: $(d_1, d_2) = (2, 2)$ or a single factor $(d_1) = (4)$. This gives $C_2 \times C_2$ and $C_4$. These are distinct: $C_4$ has an element of order $4$; $C_2 \times C_2$ does not.
Order $8$: single factor $C_8$; two factors with $d_1 \mid d_2$, $d_1 d_2 = 8$: $(2, 4)$ giving $C_2 \times C_4$; three factors: $(2, 2, 2)$ giving $C_2 \times C_2 \times C_2$.
Order $12$: $C_{12}$; and $C_2 \times C_6$ (note $2 \mid 6$). These are the only two since $(2, 6)$ is the only factorization with $d_1 \mid d_2$ and $d_1 d_2 = 12$ and $d_1 > 1$, $d_2 > 1$. (The option $(3, 4)$ fails: $3 \nmid 4$.) So there are exactly two abelian groups of order $12$.
Order $16$: $C_{16}$; $C_2 \times C_8$ (since $2 \mid 8$); $C_4 \times C_4$ (since $4 \mid 4$); $C_2 \times C_2 \times C_4$ (since $2 \mid 2 \mid 4$); $C_2 \times C_2 \times C_2 \times C_2$. That is five abelian groups of order $16$, corresponding to the five partitions of $4$: $(4), (1,3), (2,2), (1,1,2), (1,1,1,1)$ translated as exponents in the prime-power decomposition at $p = 2$.
[/example]
Rings
Groups are powerful, but they model only one operation. The integers $\mathbb{Z}$ support two: addition and multiplication, linked by distributivity. A ring is the abstraction of this two-operation structure. The central question driving ring theory is the same one that makes number theory rich: when, and in what form, does factorization work? The integers have unique prime factorization — but most rings do not, and understanding exactly which rings do, and why, will occupy this entire chapter.
The path goes: rings → ideals (the ring-theoretic analogue of normal subgroups) → quotient rings and isomorphism theorems → integral domains and fields of fractions → the hierarchy of increasingly well-behaved rings (Euclidean domains, PIDs, UFDs) → polynomial rings and their factorization theory → Noetherian rings and the Hilbert basis theorem. Each step generalizes the integers in a precise direction, revealing which properties of $\mathbb{Z}$ are robust and which are special.
Rings and Their Arithmetic
The Definition and First Examples
A ring keeps both operations of $\mathbb{Z}$ but drops the requirement that multiplication have inverses. This is deliberate: it is the absence of multiplicative inverses that makes divisibility interesting.
[definition:Ring]
A ring is a quintuple $(R, +, \cdot, 0_R, 1_R)$ where $R$ is a set, $+, \cdot : R \times R \to R$ are binary operations, and $0_R, 1_R \in R$, satisfying:
\begin{align*}
&\text{(i) } (R, +, 0_R) \text{ is an abelian group}, \\
&\text{(ii) multiplication is associative: } a(bc) = (ab)c, \text{ and } 1_R \cdot r = r \cdot 1_R = r, \\
&\text{(iii) multiplication distributes over addition: } r(s + t) = rs + rt \text{ and } (r+s)t = rt + st.
\end{align*}
[/definition]
Note that $0_R \neq 1_R$ unless $R = \{0\}$ (the zero ring). Indeed, if $1_R = 0_R$ and $r \in R$, then $r = r \cdot 1_R = r \cdot 0_R = 0_R$, so every ring in which $1_R = 0_R$ is the trivial one-element ring. One basic consequence of the axioms: $r \cdot 0_R = 0_R$ for all $r$, since $r \cdot 0_R = r \cdot (0_R + 0_R) = r \cdot 0_R + r \cdot 0_R$, and cancelling gives $r \cdot 0_R = 0_R$.
From now on, all rings are commutative: $ab
= ba$ for all
$a, b \in R$. This is the case in almost all rings arising in number theory and geometry, and the commutativity hypothesis is essential for the ideal theory developed below.
[definition:Subring]
A subset $S \subseteq R$ is a subring, written $S \leq R$, if $0_R, 1_R \in S$ and $S$ is closed under addition, subtraction, and multiplication.
[/definition]
[example:The Ring Hierarchy]
The familiar number systems form a nested chain of subrings:
\begin{
align*}
\mathbb{Z} \leq \mathbb{Q} \leq \mathbb{R} \leq \mathbb{C},
\end{align*}
under the usual $0, 1, +, \cdot$. The Gaussian integers $\mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\} \leq \mathbb{C}$ are another subring. Notice that $\mathbb{Z}[i]$ contains elements (like $1 + i$) that have no multiplicative inverse within $\mathbb{Z}[i]$, just as $2 \in \mathbb{Z}$ has no inverse in $\mathbb{Z}$.
[/example]
The elements that do have multiplicative inverses are called units. They are the "small" elements from the perspective of divisibility, analogous to $\pm 1$ in $\mathbb{Z}$.
[definition:Unit]
An element $u \in R$ is a unit if there exists $v \in R$ with $uv = 1_R$. The set of all units forms a group $R^\times$ under multiplication.
[/definition]
[definition:Field]
A non-zero ring $R$ is a field if every non-zero element is a unit.
[/definition]
The fields $\mathbb{Q}, \mathbb{R}, \mathbb{C}$ are exactly those number systems where division is always possible (by non-zero elements). The integers $\mathbb{Z}$ are not a field — only $\pm 1$ are units. This is what makes $\mathbb{Z}$ arithmetically rich: not everything divides everything else, so divisibility has content.
Rings support a natural polynomial construction, essential for everything that follows.
[definition:Polynomial Ring]
Let $R$ be a ring. The polynomial ring $R[X]$ is the set of all expressions $f = a_0 + a_1 X + \cdots + a_n X^n$ with $a_i \in R$, where $X$ is a formal symbol. Addition and multiplication are defined by the usual rules for polynomials. The degree $\deg f$ is the largest $n$ with $a_n \neq 0$; $f$ is monic if its leading coefficient is $1$.
[/definition]
A crucial subtlety: a polynomial is a sequence of coefficients, not a function. In the ring $\mathbb{Z}/2\mathbb{Z}$, the polynomial $X^2 + X$ is non-zero (its coefficients are not all zero), but $f(0) = f(1) = 0$, so it defines the zero function. Identifying polynomials with functions would collapse this distinction and lose information.
[definition:Power Series]
The ring $R[[X]]$ of formal power series over $R$ consists of infinite expressions $f = \sum_{n=0}^\infty a_n X^n$ with $a_n \in R$, with the obvious addition and Cauchy-product multiplication. The polynomial $1 - X$ is not a unit in $R[X]$, but it is a unit in $R[[X]]$: $(1-X)(1 + X + X^2 + \cdots) = 1$.
[/definition]
Ideals and Quotient Rings
Why Not Just Subrings?
In group theory, to form a quotient $G/H$ we needed $H$ to be normal — the condition that makes coset multiplication well-defined. In ring theory, the analogous condition is being an ideal. The difference from a subring is telling: a subring is closed under multiplication by its own elements; an ideal must be closed under multiplication by any element of the ambient ring. This extra "absorption" property is what allows the quotient to inherit a well-defined ring structure.
[definition:Ideal]
A subset $I \subseteq R$ is an ideal, written $I \trianglelefteq R$, if:
\begin{align*}
&\text{(i) } I \text{ is an additive subgroup of } (R, +, 0_R), \\
&\text{(ii) if } a \in I \text{ and } b \in R, \text{ then } ab \in I. \quad \text{(strong closure)}
\end{align*}
$I$ is a proper ideal if $I \neq R$.
[/definition]
Condition (ii) is strictly stronger than being a subring: a subring requires closure under multiplication of two elements both from $S$, whereas an ideal requires closure even when only one factor is from $I$. An immediate consequence: if a proper ideal $I$ contained a unit $u$, then $1_R = u^{-1}u \in I$ (by strong closure), so $r = r \cdot 1_R \in I$ for all $r$, giving $I = R$ — a contradiction. So proper ideals never contain units, and in particular $1_R \notin I$.
[definition:Generated Ideal]
For $a \in R$, the principal ideal $(a) = aR = \{ar : r \in R\}$ is the ideal generated by $a$. More generally, for $a_1, \ldots, a_k \in R$:
\begin{align*}
(a_1, \ldots, a_k) = \{a_1 r_1 + \cdots + a_k r_k : r_i \in R\}.
\end{align*}
[/definition]
[example:Ideals in $\mathbb{Z}$]
Every ideal of $\mathbb{Z}$ is principal. Given $I \trianglelefteq \mathbb{Z}$, if $I = \{0\}$ then $I = (0)$. Otherwise, let $n$ be the smallest positive element of $I$. For any $m \in I$, write $m = qn + r$ with $0 \leq r < n$; since $r = m - qn \in I$ and $r < n$, minimality forces $r = 0$, so $n \mid m$. Thus $I = n\mathbb{Z} = (n)$.
The ring $\mathbb{Z}[X]$ is not like this: the ideal $(2, X) = \{2f + Xg : f, g \in \mathbb{Z}[X]\}$ is not principal. Suppose $(2, X) = (h)$. Since $2 \in (h)$, we have $h \mid 2$, so $h$ is a constant $\pm 1$ or $\pm 2$. If $h = \pm 1$, then $(h) = \mathbb{Z}[X]$, but $1 \notin (2, X)$ (any element of $(2, X)$ evaluated at $0$ is even). If $h = \pm 2$, then $X \in (2, X) = (\pm 2)$ would require $2 \mid X$ in $\mathbb{Z}[X]$, which is false. Contradiction.
[/example]
[definition:Quotient Ring]
Let $I \trianglelefteq R$. The quotient ring $R/I$ is the set of additive cosets $\{r + I : r \in R\}$ with operations
\begin{align*}
(r_1 + I) + (r_2 + I) &= (r_1 + r_2) + I, \\
(r_1 + I)(r_2 + I) &= r_1 r_2 + I.
\end{align*}
The zero is $0_R + I = I$ and the one is $1_R + I$.
[/definition]
Multiplication is well-defined precisely because of the strong closure property of $I$: if $r_1' = r_1 + a_1$ and $r_2' = r_2 + a_2$ with $a_1, a_2 \in I$, then $r_1' r_2' = r_1 r_2 + r_1 a_2 + a_1 r_2 + a_1 a_2$, and the last three terms all lie in $I$ by strong closure. Just as in group theory, the condition we imposed on $I$ is exactly the condition needed to make the quotient well-defined.
The quotient map $\pi : R \to R/I$ sending $r \mapsto r + I$ is a surjective ring homomorphism with kernel $I$. This is the ring-theoretic analogue of the quotient group map.
The Isomorphism Theorems for Rings
The isomorphism theorems carry over from groups to rings almost verbatim, since rings are abelian groups u
nder additi
on, and the add
i
tional multiplicative structure is preserved by the same constructions.
[quotetheorem:851]
The First Isomorphism Theorem for Rings has the same shape as its gr
oup
-theoretic counterpart: a ring homomorphism $\varphi : R \to S$ factors as a surjection onto its image $\operatorname{im}(\varphi)$, followed by an isomorphism from $R/\ker(\varphi)$. The proof is the same as for groups, with one additional check: the quotient map respects multiplication, which follows immediately from the homomorphism property of $\varphi$. The theorem is used constantly to identify quotient rings: to show $R/I \cong T$, it suffices to find a surjective ring homomorphism $R \to T$ with kernel $I$.
[quoteproof:851]
[example:Polynomial Quotients]
The evaluation homomorphism $\varphi : \mathbb{R}[X] \to \mathbb{C}$ defined by $\varphi(f) = f(i)$ (where $i = \sqrt{-1}$) is a surjective ring homomorphism, since every complex number $a + bi = \varphi(a + bX)$. Its kernel consists of all $f \in \mathbb{R}[X]$ with $f(i) = 0$, i.e. all polynomials divisible by the minimal polynomial of $i$ over $\mathbb{R}$, which is $X^2 + 1$. So $\ker(\varphi) = (X^2 + 1)$. The first isomorphism theorem gives:
\begin{align*}
\mathbb{R}[X]/(X^2 + 1) \cong \mathbb{C}.
\end{align*}
More explicitly, every element of $\mathbb{R}[X]/(X^2 + 1)$ has a unique representative $a + bX$ (reduce modulo $X^2 + 1$ using the Euclidean algorithm in $\mathbb{R}[X]$), and multiplication in this ring satisfies $X^2 \equiv -1$, recovering the standard multiplication rule for complex numbers. The abstract machinery has constructed $\mathbb{C}$ from $\mathbb{R}$ purely algebraically, with no appeal to geometry.
[/example]
The correspondence between ideals of $R/I$ and ideals of $R$ containing $I$ holds in rings exactly as it held for normal subgroups of quotient groups.
There is also a useful parallel to the characteristic of a ring. For any ring $R$, the unique ring homomorphism $\iota : \mathbb{Z} \to R$ (sending $1 \mapsto 1_R$) has kernel $n\mathbb{Z}$ for some unique $n \geq 0$.
[definition:Characteristic]
The characteristic $\operatorname{char}(R)$ of a ring $R$ is the unique non-negative in
teger $n$ such
that $\ker(\iota : \mathbb{Z} \to R) = n\mathbb{Z}$. Equivalently, it is the smallest positive $n$ with $n \cdot 1_R = 0_R$, or $0$ if no such $n$ exists.
[/definition]
The rings $\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ all have characteristic $0$. The ring $\mathbb{Z}/n\mathbb{Z}$ has characteristic $n$. When $R$ is an integral domain, its characteristic is either $0$ or a prime (since $\mathbb{Z}/\operatorname{char}(R)\mathbb{Z}$ embeds into $R$ and must be an integral domain, forcing $\operatorname{char}(R)\mathbb{Z}$ to be prime).
Integral Domains and Fields of Fractions
Most rings that arise naturally — $\mathbb{Z}$, $\mathbb{Z}[i]$, polynomial rings over fields — share a key property of $\mathbb{Z}$: the product of two non-zero elements is non-zero. Rings with this property are called integral domains, and they are the setting in which a meaningful theory of divisibility can be developed.
[definition:Zero Divisor]
An element $x \in R$ (with $x \neq 0$) is a zero divisor if there exists $y \neq 0$ in $R$ with $xy = 0$.
[/definition]
[definition:Integral Domain]
A non-zero commutative ring $R$ is an integral domain if it has no zero divisors: whenever $ab = 0$ in $R$, either $a = 0$ or $b = 0$.
[/definition]
The rings $\mathbb{Z}/6\mathbb{Z}$ fail this: $2 \cdot 3 = 6 = 0$ in $\mathbb{Z}/6\mathbb{Z}$, so $2$ and $3$ are zero divisors. But $\mathbb{Z}/p\mathbb{Z}$ for prime $p$ is an integral domain (in fact a field). The polynomial ring $R[X]$ over an integral domain $R$ is again an integral domain, since the leading coefficient of $fg$ is the product of the leading coefficients of $f$ and $g$, which is non-zero if both factors are non-zero.
An integral domain satisfies the cancellation law: if $ba = bc$ and $b \neq 0$, then $a = c$ (since $b(a-c) = 0$ and $b \neq 0$ forces $a - c = 0$). This is the algebraic form of "dividing both sides by $b$" — valid in integral domains but not in general rings.
Every field is an integral domain: if $ab = 0$ and $b \neq 0$, then $a = a \cdot (bb^{-1}) = (ab)b^{-1} = 0$. The converse fails in general ($\mathbb{Z}$ is an integral domain but not a field), but holds for finite integral domains: any finite integral domain is a field, since the map $x \mapsto ax$ (for $a \neq 0$) is injective (by cancellation) and hence bijective on a finite set, giving $ab = 1$ for some $b$.
The construction of $\mathbb{Q}$ from $\mathbb{Z}$ — taking formal fractions $a/b$ and identifying $a/b = c/d$ when $ad = bc$ — generalizes to any integral domain.
[quotetheorem:866]
The field of fractions construction is one of the most powerful tools in ring theory: it lets us embed any integral domain into a field, unlocking the full arsenal of field techniques (the Euclidean algorithm in $F[X]$, factorization in $F[X]$ using roots, etc.) for problems in $R$. For instance, to study factorization of polynomials in $\mathbb{Z}[X]$, we often pass to $\mathbb{Q}[X]$ (the field of fractions of $\mathbb{Z}$ is $\mathbb{Q}$), where the Euclidean algorithm is available, and then use Gauss's lemma to pull information back to $\mathbb{Z}[X]$. The transitivity property of the field of fractions is: if $R \leq S$ is a subring of an integral domain $S$, then the field of fractions of $R$ embeds into that of $S$.
[quoteproof:866]
[example:Fields of Fractions]
The field of fractions of $\mathbb{Z}$ is $\mathbb{Q}$. The field of fractions of $\mathbb{Z}[i]$ is $\mathbb{Q}(i) = \{a + bi : a, b \in \mathbb{Q}\}$. The field of fractions of $\mathbb{C}[X]$ is $\mathbb{C}(X)$, the field of rational functions $p(X)/q(X)$ with $p, q \in \mathbb{C}[X]$ and $q \neq 0$. These rational functions are not the same as holomorphic functions — they are purely algebraic objects, and two non-zero polynomials $q$ and $q'$ can have the same rational function even if they differ as polynomials (they cannot, but this illustrates that the construction is formal).
[/example]
Prime and Maximal Ideals
Not all ideals are alike. Two special classes — prime ideals and maximal ideals — control the arithmetic of the ring in complementary ways. Both are best understood through their quotient rings.
[definition:Prime Ideal]
An ideal $I \trianglelefteq R$ is prime if $I \neq R$ and whenever $ab \in I$ for $a, b \in R$, either $a \in I$ or $b \in I$.
[/definition]
[definition:Maximal Ideal]
An ideal $I \trianglelefteq R$ is maximal if $I \neq R$ and there is no ideal $J$ with $I \subsetneq J \subsetneq R$.
[/definition]
The prime ideal condition is a ring-theoretic generalization of the defining property of prime numbers in $\mathbb{Z}$: $p \mid ab \implies p \mid a$ or $p \mid b$. The maximal ideal condition says there is no proper ideal strictly larger than $I$, which is the analogue of a minimal prime in some sense — though the terminology is the other way round. The key theorems characterize both conditions via quotient rings.
[quotetheorem:852]
The Maximal Ideal Criterion says that $I$ is maximal if and only if $R/I$ is a field. The proof goes through the ideal correspondence theorem: ideals of $R/I$ correspond to ideals of $R$ containing $I$, so $R/I$ has no proper non-zero ideals iff $I$ is maximal, and a ring with no proper non-zero ideals is a field (every non-zero element $r$ generates the whole ring, so $(r) = R$, giving $sr = 1_R$ for some $s$, i.e. $r$ is a unit). This is one of the most useful criteria in ring theory: to check $I$ is maximal, it suffices to show $R/I$ is a field.
[quoteproof:852]
[quotetheorem:853]
The Prime Ideal Criterion gives an equally clean characterization: $I$ is prime iff $R/I$ is an integral domain. Since every field is an integral domain, every maximal ideal is prime. The converse fails: in $\mathbb{Z}[X]$, the ideal $(X)$ is prime (since $\mathbb{Z}[X]/(X) \cong \mathbb{Z}$, which is an integral domain) but not maximal (the ideal $(X, 2)$ is strictly larger, and $\mathbb{Z}[X]/(X, 2) \cong \mathbb{Z}/2\mathbb{Z}$, a field, so $(X, 2)$ is maximal). More strikingly, in $\mathbb{Z}$, every non-zero prime ideal $(p)$ is also maximal, since $\mathbb{Z}/(p) = \mathbb{Z}/p\mathbb{Z}$ is a field. The coincidence of prime and maximal for $\mathbb{Z}$ is a special property of PIDs.
[quoteproof:853]
[example:Prime and Maximal Ideals in $\mathbb{Z}[X]$]
We survey the ideal landscape of $\mathbb{Z}[X]$.
The zero ideal $(0)$ is prime (since $\mathbb{Z}[X]$ is an integral domain) but not maximal.
For a prime $p \in \mathbb{Z}$, the ideal $(p)$ is prime: $\mathbb{Z}[X]/(p) \cong (\mathbb{Z}/p\mathbb{Z})[X]$, which is an integral domain (since $\mathbb{Z}/p\mathbb{Z}$ is a field, hence $(\mathbb{Z}/p\mathbb{Z})[X]$ is an integral domain). But $(p)$ is not maximal, since $(p, X)$ is strictly larger.
The ideal $(p, f)$ where $f$ is irreducible modulo $p$ is maximal: $\mathbb{Z}[X]/(p, f) \cong (\mathbb{Z}/p\mathbb{Z})[X]/(f)$, which is a field (since $f$ is irreducible over $\mathbb{Z}/p\mathbb{Z}$, so $(f)$ is maximal in $(\mathbb{Z}/p\mathbb{Z})[X]$).
For example, $(2, X^2 + X + 1)$ is a maximal ideal in $\mathbb{Z}[X]$, with quotient isomorphic to $\mathbb{F}_4$, the field of four elements.
[/example]
Factorization in Integral Domains
The integers have two remarkable factorization properties: every non-zero non-unit factors into primes, and this factorization is unique. Most integral domains do not share both properties. Understanding which do — and why — is the heart of ring-theoretic arithmetic.
[definition:Divisibility and Associates]
For $a, b \in R$ (an integral domain), $a$ divides $b$ (written $a \mid b$) if $b = ac$ for some $c \in R$. Equivalently, $(b) \subseteq (a)$. Elements $a, b$ are associates if $a = bu$ for some unit $u$; equivalently, $(a) = (b)$.
[/definition]
[definition:Irreducible Element]
A non-zero non-unit $a \in R$ is irreducible if whenever $a = bc$, either $b$ or $c$ is a unit.
[/definition]
[definition:Prime Element]
A non-zero non-unit $a \in R$ is prime if whenever $a \mid bc$, either $a \mid b$ or $a \mid c$.
[/definition]
In $\mathbb{Z}$, these coincide: an integer is irreducible iff it is prime iff it is $\pm p$ for some prime number $p$. But in general integral domains, primes and irreducibles can diverge. In $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$, the factorization $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ shows two distinct factorizations into irreducibles. One verifies using the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$ that $2, 3, 1 \pm \sqrt{-5}$ are all irreducible (there is no element of norm $2$ or $3$ in $\mathbb{Z}[\sqrt{-5}]$), yet they are not all prime: $2 \nmid 1 + \sqrt{-5}$ and $2 \nmid 1 - \sqrt{-5}$ (since $N(2) = 4 \nmid N(1 \pm \sqrt{-5}) = 6$), so $2$ is irreducible but not prime. The failure of unique factorization and the failure of irreducible $\Leftrightarrow$ prime are two sides of the same coin.
To restore well-behaved arithmetic, we impose progressively stronger conditions.
[definition:Euclidean Domain]
An integral domain $R$ is a Euclidean domain (ED) if there is a function $\varphi : R \setminus \{0\} \to \mathbb{Z}_{\geq 0}$ (the Euclidean function) such that:
\begin{align*}
&\text{(i) } \varphi(ab) \geq \varphi(b) \text{ for all } a, b \neq 0,
\
&\te
xt{(ii) for any } a, b \in R \text{ with } b \neq 0, \text{ there exist } q, r \in R \text{ with } a = bq + r \text{ and } r = 0 \text{ or } \varphi(r) < \varphi(b).
\end{align*}
[/definition]
[definition:Principal Ideal Domain]
An integral domain $R$ is a principal ideal domain (PID) if every ideal is principal.
[/definition]
[definition:Unique Factorization Domain]
An integral domain $R$ is a unique factorization domain (UFD) if every non-zero non-unit factors into irreducibles, and this factorization is unique up to order and associates.
[/definition]
The hierarchy is strict: $\mathrm{ED} \implies \mathrm{PID} \implies \mathrm{UFD} \implies \mathrm{ID}$, and none of the implications reverse. The integers $\mathbb{Z}$ with $\varphi(n) = |n|$ and the polynomial ring $F[X]$ over a field $F$ with $\varphi(f) = \deg f$ are Euclidean domains. The Gaussian integers $\mathbb{Z}[i]$ with $\varphi(z) = |z|^2 = a^2 + b^2$ are Euclidean: given $a, b \in \mathbb{Z}[i]$ with $b \neq 0$, the complex number $a/b \in \mathbb{C}$ lies within distance $\sqrt{2}/2 < 1$ of some Gaussian integer $q$, and setting $r = a - bq$ gives $\varphi(r) = |b|^2 |a/b - q|^2 < |b|^2 = \varphi(b)$. The ring $\mathbb{Z}[\sqrt{-5}]$ is none of the above.
[quotetheorem:855]
Euclidean Domains Are Principal Ideal Domains is the ring-theoretic analogue of the argument showing every ideal of $\mathbb{Z}$ is of the form $n\mathbb{Z}$: pick the element of smallest $\varphi$-value in the ideal, and the division algorithm forces every other element to be a multiple. The proof works word-for-word, replacing $|\cdot|$ with $\varphi$.
[quoteproof:855]
In a PID, being irreducible and being prime are equivalent — a fact that fails dramatically in rings like $\mathbb{Z}[\sqrt{-5}]$.
[quotetheorem:856]
The proof of In PIDs Irreducible Elements Are Prime is a Bézout argument. In $\mathbb{Z}$, if $p \nmid a$ then $\gcd(p, a) = 1$, so $rp + sa = 1$ for some integers $r, s$. Multiplying by $b$ gives $b = rpb + sab$; if $p \mid ab$ then both terms on the right are divisible by $p$, so $p \mid b$. In a PID, the ideal $(p, a)$ is principal: $(p, a) = (d)$. Irreducibility of $p$ forces either $d \sim p$ (meaning $p \mid a$, contradicting $p \nmid a$) or $d$ is a unit (meaning $(d) = R$, giving the Bézout relation). The rest is identical to the $\mathbb{Z}$ argument.
[quoteproof:856]
The grand payoff is that PIDs have unique factorization:
[quotetheorem:867]
Principal Ideal Domains Are Unique Factorization Domains. The proof has two independent parts: existence (using the ascending chain condition — PIDs are Noetherian, since every ideal is finitely generated by a single element, so any ascending chain stabilises) and uniqueness (using that irreducibles are prime, then cancelling one factor at a time, just as in $\mathbb{Z}$). The combination of these two properties — ACC and prime equals irreducible — is what characterizes UFDs among integral domains.
[quoteproof:867]
[example:The Ring $\mathbb{Z}[\sqrt{-5}]$ Is Not a UFD]
The failure of unique factorization in $\mathbb{Z}[\sqrt{-5}]$ is now fully explained. The norm function $N(a + b\sqrt{-5}) = a^2 + 5b^2$ satisfies $N(xy) = N(x)N(y)$, so units have norm $1$: only $N(a + b\sqrt{-5}) = 1$ with $a^2 + 5b^2 = 1$ has the solution $(\pm 1, 0)$. One verifies that $2, 3, 1 \pm \sqrt{-5}$ are all irreducible (no element of norm $2$ or $3$ exists), yet $6 = 2 \cdot 3 = (1+\sqrt{-5})(1-\sqrt{-5})$ gives two distinct factorizations into irreducibles. The irreducible $2$ is not prime: $2 \mid (1+\sqrt{-5})(1-\sqrt{-5})$ but $2 \nmid 1 \pm \sqrt{-5}$ (since otherwise $\frac{1 \pm \sqrt{-5}}{2}$ would be a Gaussian integer, but its norm is $6/4 \notin \mathbb{Z}$). So this ring fails: irreducible $\centernot\Rightarrow$ prime, and unique factorization fails simultaneously.
[/example]
Factorization in Polynomial Rings
Polynomial rings over fields are Euclidean domains, hence PIDs and UFDs. But polynomial rings over $\mathbb{Z}$ — like $\mathbb{Z}[X]$ — are UFDs that are not PIDs. For these, Gauss's lemma provides the essential link between factorization in $R[X]$ and factorization in $F[X]$, where $F$ is the field of fractions.
[definition:Content of a Polynomial]
Let $R$ be a UFD and $f = a_0 + a_1 X + \cdots + a_n X^n \in R[X]$. The content of $f$ is $c(f) = \gcd(a_0, a_1, \ldots, a_n) \in R$ (well-defined up to a unit). The polynomial $f$ is primitive if $c(f)$ is a unit, i.e. if the coefficients are coprime.
[/definition]
Every polynomial $f \in R[X]$ factors as $f = c(f) \cdot f_1$ where $f_1$ is primitive. So factorization in $R[X]$ splits into two independent problems: factorizing the content in $R$, and factorizing the primitive part in $R[X]$ (or equivalently, in $F[X]$, by Gauss's lemma).
[quotetheorem:858]
Gauss's Lemma is the bridge between $R[X]$ and $F[X]$. The forward direction (reducible over $R$ implies reducible over $F$) is trivial. The reverse direction is the content: if $f = gh$ in $F[X]$, we can clear denominators to get $abf = (ag)(bh)$ in $R[X]$, then compare contents. Since $f$ is primitive, $c(abf) = ab$, and $c(ag)c(bh) = ab$ up to a unit, allowing us to reassemble $f = g_1 h_1$ with $g_1, h_1 \in R[X]$ primitive, thus non-units. The elegance of the argument is that content is the right invariant to track: multiplicativity of content ($c(fg) \sim c(f)c(g)$) does all the work.
[quoteproof:858]
[example:Irreducibility via Gauss's Lemma]
The polynomial $f = X^3 + X + 1 \in \mathbb{Z}[X]$ is primitive (content $= 1$). By Gauss's lemma, $f$ is irreducible in $\mathbb{Q}[X]$ iff it is irreducible in $\mathbb{Z}[X]$. A degree-$3$ polynomial over $\mathbb{Q}$ is reducible iff it has a rational root. By the rational root theorem, any rational root of $f$ has the form $\pm 1$ (numerator divides the constant term $1$, denominator divides the leading coefficient $1$). But $f(1) = 3 \neq 0$ and $f(-1) = -1 \neq 0$. So $f$ has no rational roots, hence is irreducible over both $\mathbb{Z}$ and $\mathbb{Q}$.
[/example]
When there is no rational root to check, Eisenstein's criterion detects irreducibility by a single prime.
[quotetheorem:859]
Eisenstein's Criterion is one of the most efficient irreducibility tests available. Its proof is a clean divisibility argument: the Eisenstein prime $p$ divides $a_0$ but not $a_n = 1$ (since $f$ is primitive), so exactly one of the constant terms of the two hypothetical factors is divisible by $p$; a traveling-index argument then shows $p$ must divide all coefficients of one factor except the leading one, forcing that factor to have degree $n$ — contradicting the factorization being non-trivial. Notice Eisenstein requires working in $R[X]$, not $F[X]$ — the prime $p$ plays no role over a field, since it is a unit there.
[quoteproof:859]
[example:Cyclotomic Polynomials Are Irreducible]
Let $p$ be prime and consider the polynomial
\begin{align*}
f = X^{p-1} + X^{p-2} + \cdots + X + 1 = \frac{X^p - 1}{X - 1} \in \mathbb{Z}[X].
\end{align*}
Eisenstein does not apply directly to $f$. The standard trick is to substitute $Y = X - 1$:
\begin{align*}
\hat{f}(Y) = f(Y+1) = \frac{(Y+1)^p - 1}{Y} = Y^{p-1} + \binom{p}{1}Y^{p-2} + \cdots + \binom{p}{p-1}.
\end{align*}
Now apply Eisenstein with the prime $p$: $p \mid \binom{p}{k}$ for $1 \leq k \leq p-1$ (a standard binomial coefficient fact), and $p^2 \nmid \binom{p}{p-1} = p$. So $\hat{f}$ is irreducible in $\mathbb{Z}[Y]$ by Eisenstein. Since a factorization $f(X) = g(X)h(X)$ in $\mathbb{Z}[X]$ gives $\hat{f}(Y) = g(Y+1)h(Y+1)$ in $\mathbb{Z}[Y]$, irreducibility of $\hat{f}$ implies irreducibility of $f$.
[/example]
Noetherian Rings
The Hilbert basis theorem is one of the pivotal results of nineteenth-century algebra. Before Hilbert, invariant theorists labored to exhibit finite generating sets for rings of symmetries by hand. Hilbert proved in one stroke that any ideal in a polynomial ring over a Noetherian ring is finitely generated — ending the laborious case-by-case approach.
A ring is Noetherian if ideals cannot grow indefinitely. The definition is equivalent to requiring all ideals to be finitely generated — and that equivalence is itself a useful theorem.
[definition:Ascending Chain Condition]
A ring $R$ satisfies the ascending chain condition (ACC) if every ascending chain of ideals $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ eventually stabilises: there exists $N$ with $I_n = I_N$ for all $n \geq N$.
[/definition]
[definition:Noetherian Ring]
A ring $R$ is Noetherian if it satisfies the ACC. Equivalently, every ideal of $R$ is finitely generated.
[/definition]
Every PID is Noetherian: all its ideals are principal, hence generated by a single element. Every field is Noetherian (only two ideals). Every quotient of a Noetherian ring is Noetherian (ideals of $R/I$ pull back to ideals of $R$ containing $I$, which are finitely generated, and their images in $R/I$ are then finitely generated by the images of the generators). A non-example: $\mathbb{Z}[X_1, X_2, X_3, \ldots]$ (infinitely many variables) is not Noetherian — the chain $(X_1) \subsetneq (X_1, X_2) \subsetneq (X_1, X_2, X_3) \subsetneq \cdots$ never stabilises.
The crucial closure property is that Noetherian-ness passes to polynomial rings:
[quotetheorem:860]
The Hilbert Basis Theorem is the reason polynomial rings are tractable. Every ideal $I \trianglelefteq R[X]$ is determined by finitely many polynomial equations — a foundational fact for algebraic geometry, where ideals of $\mathbb{R}[X_1, \ldots, X_n]$ correspond to polynomial systems whose solution sets are algebraic varieties. The theorem says that any such system, though potentially given by infinitely many equations, is determined by finitely many of them.
The proof works by extracting leading coefficients at each degree to form an ascending chain of ideals in $R$. The Noetherian hypothesis on $R$ forces this chain to stabilise at some level $N$, and the finitely many generating polynomials (one at each degree $0 \leq n \leq N$ for each generator of the corresponding ideal in $R$) then suffice to generate all of $I$ by an induction on degree argument.
[quoteproof:860]
[example:Applications of Noetherian Rings]
Let $F$ be a field and consider any system of polynomial equations $f_\alpha(x_1, \ldots, x_n) = 0$ for $\alpha$ ranging over some (possibly infinite) index set. Let $I = (\{f_\alpha\})$ be the ideal generated by all these polynomials in $F[X_1, \ldots, X_n]$.
Since $F$ is Noetherian, and $F[X_1, \ldots, X_n]$ is Noetherian by iterated application of the Hilbert basis theorem, the ideal $I$ is finitely generated: $I = (f_1, \ldots, f_k)$ for some finite list $f_1, \ldots, f_k$. A point $a = (a_1, \ldots, a_n)$ satisfies all the equations $f_\alpha(a) = 0$ if and only if it satisfies $f_1(a) = \cdots = f_k(a) = 0$ (since every $f_\alpha$ is a combination $\sum r_i f_i$, so vanishing of $f_1, \ldots, f_k$ forces vanishing of all $f_\alpha$). Thus the solution set of an arbitrary polynomial system equals the solution set of a finite polynomial system — a remarkable compactness statement that requires no topology, only Noetherian algebra.
[/example]
Modules
If rings generalise the integers by keeping two operations, then modules generalise vector spaces by relaxing the requirement that scalars form a field. A vector space over $\mathbb{R}$ or $\mathbb{C}$ is geometrically intuitive but algebraically rigid — bases always exist, dimension is well-defined, and every subspace has a complement. When we allow scalars from a ring $R$, this rigidity dissolves. Not every module has a basis, not every submodule is a direct summand, and the structure of a module depends sensitively on the ring $R$. This loss of rigidity is not a weakness; it is where the richness comes from.
The payoff for working with modules over a ring $R$ rather than vector spaces over a field is the structure theorem: every finitely generated module over a Euclidean domain decomposes into a direct sum of cyclic modules, classified by invariant factors. Applied with $R = \mathbb{Z}$, this immediately classifies all finite abelian groups — the result stated without proof at the end of Chapter 1. Applied with $R = \mathbb{F}[X]$, it produces the rational canonical form and Jordan normal form for matrices, giving a purely algebraic proof of results that linear algebra usually handles by more computational means.
Modules and Submodules
The Definition
A module over a ring $R$ is an abelian group on which $R$ acts by scalar multiplication, compatibly with both the ring structure of $R$ and the group structure of the module.
[definition: Module]
Let $R$ be a commutative ring. An $R$-module is a quadruple $(M, +, 0_M, \cdot)$ where $(M, +, 0_M)$ is an abelian group and $\cdot : R \times M \to M$ is a scalar multiplication satisfying, for all $r, s \in R$ and $m, n \in M$:
\begin{align*}
&\text{(i) } (r + s) \cdot m = r \cdot m + s \cdot m, \\
&\text{(ii) } r \cdot (m + n) = r \cdot m + r \cdot n, \\
&\text{(iii) } r \cdot (s \cdot m) = (rs) \cdot m, \\
&\text{(iv) } 1_R \cdot m = m.
\end{align*}
[/definition]
The axioms say that $R$ acts on $M$ by ring homomorphisms: each $r \in R$ gives an additive endomorphism $m \mapsto rm$ of $M$, and the map $r \mapsto (m \mapsto rm)$ is itself a ring homomorphism $R \to \mathrm{End}(M)$. This is the coordinate-free way to think about modules: a module is an abelian group together with a ring action on it.
[example: The Canonical Examples of Modules]
Vector spaces. If $\mathbb{F}$ is a field, an $\mathbb{F}$-module is exactly an $\mathbb{F}$-vector space. Every result in this chapter specialises to a (usually easier) statement about vector spaces.
Abelian groups as $\mathbb{Z}$-modules. Every abelian group $(A, +)$ is a $\mathbb{Z}$-module via $n \cdot a = a + \cdots + a$ ($n$ times), extended to negative integers and zero in the obvious way. This action is forced: $1 \cdot a = a$ by axiom (iv), and the rest follows by distributivity. Conversely, every $\mathbb{Z}$-module is an abelian group. So $\mathbb{Z}$-modules and abelian groups are the same thing.
Ideals and quotients. Any ideal $I \trianglelefteq R$ is an $R$-module under the ring multiplication. The quotient ring $R/I$ is also an $R$-module via $r \cdot (a + I) = ra + I$.
$R^n$. For any ring $R$ and $n \geq 1$, the direct product $R^n = R \times \cdots \times R$ is an $R$-module via $r \cdot (r_1, \ldots, r_n) = (rr_1, \ldots, rr_n)$. This is the module-theoretic analogue of $\mathbb{F}^n$.
$\mathbb{F}[X]$-modules. Let $\mathbb{F}$ be a field, $V$ an $\mathbb{F}$-vector space, and $\alpha : V \to V$ a linear map. Then $V$ becomes an $\mathbb{F}[X]$-module via $f \cdot v = f(\alpha)(v)$. Different choices of $\alpha$ give different $\mathbb{F}[X]$-module structures on the same abelian group $V$. This example is the gateway to normal forms for matrices.
[/example]
[definition: Submodule]
Let $M$ be an $R$-module. A subset $N \subseteq M$ is an $R$-submodule, written $N \leq M$, if $N$ is a subgroup of $(M, +)$ and $rn \in N$ for all $r \in R$, $n \in N$.
[/definition]
[definition: Quotient Module]
If $N \leq M$ is an $R$-submodule, the quotient module $M/N$ is the set of additive cosets $\{m + N : m \in M\}$ with the $R$-action $r \cdot (m + N) = rm + N$.
[/definition]
Modules differ from groups in a notable way: in groups, we distinguished subgroups from normal subgroups, and only the latter allowed quotienting. In modules, every submodule is automatically "normal" as an abelian group (since $M$ is abelian), so we can always form the quotient. This uniformity makes the quotient theory of modules cleaner than that of groups.
[definition: Annihilator]
Let $M$ be an $R$-module and $S \subseteq M$ a subset. The annihilator of $S$ is
\begin{align*}
\operatorname{Ann}(S) = {r \in R : r \cdot m = 0 \text{ for all } m \in S}.
\end{
align*}
Th
is is always an ideal of $R$. For a single element $m \in M$, $\operatorname{Ann}(m)$ is the ideal of scalars that kill $m$.
[/definition]
[definition: Torsion]
An element $m \in M$ is a torsion element if $\operatorname{Ann}(m) \neq 0$, i.e. if there exists a non-zero $r \in R$ with $rm = 0$. The module $M$ is a torsion module if every element is torsion, and torsion-free if the only torsion element is $0$.
[/definition]
In a $\mathbb{Z}$-module (abelian group), torsion elements are precisely the elements of finite order. In an $\mathbb{F}$-vector space ($\mathbb{F}$ a field), there are no torsion elements other than $0$
, sin
ce $\mathbb{F}$ has no zero divisors and only $0$ is annihilated by a non-zero scalar. Torsion and free parts are the two ingredients in the structure theorem.
[definition: Finitely Generated Module]
An $R$-module $M$ is finitely generated if there exist $m_1, \ldots, m_k \in M$ such that
\begin{align*}
M = Rm_1 + \cdots + Rm_k = \{r_1 m_1 + \cdots + r_k m_k : r_i \in R\}.
\end{align*}
Equivalently, $M$ is finitely generated iff there is a surjective $R$-module homomorphism $R^k \twoheadrightarrow M$ for some $k$.
[/definition]
The equivalence with surjections from $R^k$ is useful: it means every finitely generated module is a quotient of a free module $R^k$. The kernel of that surjection is itself a submodule of $R^k$, and understanding the kernel — via the Smith normal form of its generator matrix — is exactly what the structu
re th
eorem does.
Homomorphisms and the Isomorphism Theorems for Modules
[definition: Module Homomorphism]
Let $M$ and $N$ be $R$-modules. A function $f : M \to N$ is an $R$-module homomorphism if $f(m_1 + m_2) = f(m_1) + f(m_2)$ and $f(rm) = rf(m)$ for all $r \in R$, $m, m_1, m_2 \in M$. A bijective homomorphism is an isomorphism.
[/definition]
The kernel $\ker f = \{m \in M : f(m) = 0\}$ is a submodule of $M$, and the image $\operatorname{im} f$ is a submodule of $N$. The three isomorphism theorems hold for modules with the same proofs as for groups, since both rely only on the underlying abelian group structure supplemented by the scalar action.
[quotetheorem:862]
The First Isomorphism Theorem for Modules is the foundation for identifying modules via surjective homomorphisms. To show $M \cong N$, exhibit a surjective $R$-module homomorphism $\varphi : M \to N$ and identify its kernel. As with groups, the key work is always in computing $\ker \varphi$ and verifying surjectivity; the isomorphism itself is then automatic. This theorem is what converts the Smith normal form computation (which identifies the kernel of a surjection $R^m \to M$) into the structure theorem decomposition.
[quoteproof:862]
[example: The Cyclic Module]
For any $m \in M$, the map $\varphi : R \to M$ defined by $\varphi(r) = rm$ is an $R$-module homomorphism with image $Rm = \{rm : r \in R\}$ (the submodule generated by $m$) and kernel $\operatorname{Ann}(m)$. By the First Isomorphism Theorem for Modules:
\begin{align*}
Rm \cong R/\operatorname{Ann}(m).
\end{align*}
This is the fundamental example of a cyclic module. When $R = \mathbb{Z}$ and $M = \mathbb{Z}/n\mathbb{Z}$, the element $m = 1$ has $\operatorname{Ann}(m) = n\mathbb{Z}$, and $\mathbb{Z} \cdot 1 = \mathbb{Z}/n\mathbb{Z}$ is the whole module. When $R = \mathbb{F}[X]$ and $M = V_\alpha$ is a cyclic $\mathbb{F}[X]$-module, $\operatorname{Ann}(v)$ is the ideal generated by the minimal polynomial of $v$ with respect to $\alpha$.
[/example]
Free Modules and Linear Independence
The nicest modules are those with a basis — a linearly independent generating set. In vector spaces, every generating set contains a basis and every basis has the same size. Neither statement holds in general for modules over rings, which is one of the main differences between module theory and linear algebra.
[definition: Linear Independence]
Elements $m_1, \ldots, m_k \in M$ are linearly independent (over $R$) if $\sum_{i=1}^k r_i m_i = 0$ with $r_i \in R$ implies $r_1 = \cdots = r_k = 0$.
[/definition]
[definition: Free Module and Basis]
An $R$-module $M$ is free if it has a basis: a subset $S \subseteq M$ that generates $M$ and is linearly independent. If $S = \{m_1, \ldots, m_n\}$ is finite, then $M \cong R^n$.
[/definition]
Free modules over a ring behave like vector spaces: any function from a basis to another module extends uniquely to a homomorphism. However, unlike vector spaces, not every module is free. The $\mathbb{Z}$-module $\mathbb{Z}/2\mathbb{Z}$ is not free: any supposed basis element $m$ satisfies $2m = 0$, so the set $\{m\}$ is not linearly independent over $\mathbb{Z}$ (since $2 \neq 0$ in $\mathbb{Z}$ but $2 \cdot m = 0$). More subtly, even for free modules, the rank (size of a basis) need not be well-defined without additional hypotheses. For modules over a non-zero commutative ring, rank is well-defined: if $R^m \cong R^n$ then $m = n$ (proved by passing to $R^m / \mathfrak{m} R^m \cong (R/\mathfrak{m})^m$ for a maximal ideal $\mathfrak{m}$, which is a vector space). This is the invariance of rank.
[example: Free and Non-Free Modules]
The module $R^n$ is free of rank $n$ for any ring $R$, with the standard basis $\{e_1, \ldots, e_n\}$.
The ideal $(2, X)
\triangleleft
eq \mathbb{Z}[X]$ is a submodule of $\mathbb{Z}[X]$ (which is free of rank $1$) but is not free of rank $1$: it cannot be generated by a single element, as shown in Chapter 2. It is generated by $2$ and $X$, but these are not independent: $X \cdot 2 = 2 \cdot X$ in $\mathbb{Z}[X]$, so the generators satisfy a relation. This example shows that submodules of free modules need not be free — unless the ring is a PID (where they always are, as a consequence of the structure theorem). [/example] ## Smith Normal Form The Smith normal form is a normal form for matrices over a Euclidean domain, analogous to the row-echelon form over a field but more refined. Over a field, any matrix can be reduced to a block of $1$s followed by $0$s. Over a Euclidean domain, the best we can do is a diagonal matrix with a divisibility condition. This turns out to be exactly what we need to classify finitely generated modules. [definition: Elementary Row and Column Operations] Over a ring $R$, the **elementary row operations** on a matrix $A$ are: (i) adding $c \in R$ times one row to another, (ii) swapping two rows, (iii) multiplying a row by a unit of $R$. **Elementary column operations** are defined analogously. Two matrices are **equivalent** if one can be obtained from the other by a sequence of elementary row and column operations; equivalently, $B = PAQ$ for some invertible matrices $P, Q$. [/definition] [definition: Fitting Ideals] For an $m \times n$ matrix $A$ over $R$, the **$k$th Fitting ideal** $\mathrm{Fit}_k(A) \trianglelefteq R$ is the ideal generated by all $k \times k$ minors of $A$. Equivalent matrices have the same Fitting ideals. [/definition] The Fitting ideals are the key invariants: they are preserved by row and column operations, so they are genuinely attached to the equivalence class of $A$. For the Smith normal form $D = \mathrm{diag}(d_1, \ldots, d_r, 0, \ldots, 0)$, one computes $\mathrm{Fit}_k(D) = (d_1 d_2 \cdots d_k)$, which shows the invariant factors $d_k$ are uniquely determined (as the ratio of consecutive Fitting ideal generators) and gives the uniqueness part of the Smith normal form theorem. [quotetheorem:861] The [Smith Normal Form Theorem](/theorems/861) is the engine behind the entire classification theory of this chapter. The algorithm is clean: bring the smallest-$\varphi$-value entry to the top-left corner, use the division algorithm to clear the rest of the first row and column, then handle off-diagonal entries in the remaining block by the same method. The divisibility condition $d_1 \mid d_2 \mid \cdots \mid d_r$ emerges automatically from the algorithm, and uniqueness follows from the Fitting ideal computation. [quoteproof:861] [example: Computing a Smith Normal Form over $\mathbb{Z}$] We reduce the matrix MATHENVkqogiwP0END to Smith normal form. First bring the $1$ in position $(2,1)$ to position $(1,1)$ by swapping rows $1$ and $2$: MATHENVkqogiwP1END Clear the first row by subtracting multiples of column $1$ from columns $2$ and $3$: MATHENVkqogiwP2END Clear the first column similarly: MATHENVkqogiwP3END Now work on the $2 \times 2$ block. The entry $-2$ is not divisible by $10$, so use the division algorithm: $10 = (-5)(-2) + 0$, so subtract $-5$ times column $3$ from column $2$ (or note $\gcd(10, -2) = 2$). Instead, swap columns $2$ and $3$ and negate to bring $2$ to position $(2,2)$: MATHENVkqogiwP4END Now $10 = 5 \cdot 2 + 0$ and $8 = 4 \cdot 2 + 0$, so column operations clear the second row, and row operations clear the second column: MATHENVkqogiwP5END We verify the Fitting ideals: $\mathrm{Fit}_1(A) = (1)$ (the entry $1$ generates $\mathbb{Z}$), $\mathrm{Fit}_2(A) = (d_1 d_2) = (2)$ (the $2\times 2$ minor from the first two rows and columns of $A$ equals $\det\begin{pmatrix}3&7\\1&-1\end{pmatrix} = -10$, and others; $\gcd = 2$), and $\mathrm{Fit}_3(A) = (\det A) = (34)$. So $d_1 = 1$, $d_2 = 2$, $d_3 = 17 = 34/2$. Indeed $1 \mid 2 \mid 17$. [/example] ## The Structure Theorem With the Smith normal form established, the classification of finitely generated modules over a Euclidean domain is a single step: write a module as the cokernel of a presentation matrix, put that matrix in Smith normal form, and read off the decomposition. [quotetheorem:857] The [Structure Theorem for Finitely Generated Modules over Euclidean Domains](/theorems/857) is the culmination of everything in this chapter. It says: once you know the ring is Euclidean, every finitely generated module is completely determined, up to isomorphism, by a finite sequence of invariant factors. The free part $R^s$ captures the torsion-free part of $M$; the summands $R/(d_i)$ capture the torsion. The two parts are cleanly separated because $R$ is an integral domain: a module is torsion-free iff it has no cyclic summands $R/(d)$ with $d \neq 0$. The proof strategy is elegant in its economy. Since $M$ is finitely generated, there is a surjecti
on $\varphi :
R^m \to M$. The kernel $\ker\varphi$ is a submodule of $R^m$, hence finitely generated (by at most $m$ elements, since $R$ is a PID). Arrange the generators of $\ker\varphi$ as columns of an $m \times n$ matrix $A$. The Smith normal form theorem turns $A$ into a diagonal matrix via row and column operations. Row operations correspond to change of basis in $R^m$; column operations correspond to change of generators for $\ker\varphi$. Reading off the diagonal entries gives the claimed decomposition.
[quoteproof:857]
[example: Classifying an Abelian Group from Generators and Relations]
Let $A$ be the abelian group generated by $a, b, c$ with relations
MATHENVuwsvadP0END
As a $\mathbb{Z}$-module, $A = \mathbb{Z}^3 / N$ where $N$ is the submodule generated by the rows of the relation matrix (or equivalently, the cokernel of the matrix of relations). The presentation matrix, written with the relations as columns, is:
MATHENVuwsvadP1END
We compute Fitting ideals to find the Smith normal form. Since $(A_{\text{pres}}){31} = 1$, we have $\mathrm{Fit}_1(A{\text{pres}}) = (1)$, so $d_1 = 1$. The $2 \times 2$ minor from rows $1,2$ and columns $1,2$ is $\det\begin{pmatrix}2&1\\3&2\end{pmatrix} = 1$, so $\mathrm{Fit}2 = (1)$ and $d_2 = 1$. Finally $\det(A{\text{pres}}) = 2(14-0) - 1(21-6) + 5(0-2) = 28 - 15 - 10 = 3$, so $\mathrm{Fit}_3 = (3)$ and $d_3 = 3$.
The Smith normal form is $\mathrm{diag}(1, 1, 3)$. Therefore:
\begin{align*}
A \cong \frac{\mathbb{Z}}{(1)} \oplus \frac{\mathbb{Z}}{(1)} \oplus \frac{\mathbb{Z}}{(3)} \cong {0} \oplus {0} \oplus C_3 \cong C_3.
\end{
align*}
Th
e
group is cyclic of order $3$. The two summands $\mathbb{Z}/(1) = 0$ vanish because $d_1 = d_2 = 1$ are units.
[/example]
[example: Classification of Finitely Generated Abelian Groups, Revisited]
As a special case of the structure theorem with $R = \mathbb{Z}$: every finitely generated abelian group is isomorphic to
\begin{align*}
C_{d_1} \times C_{d_2} \times \cdots \times C_{d_r} \times \mathbb{Z}^s,
\end{align*}
with $d_1 \mid d_2 \mid \cdots \mid d_r$ and $s \geq 0$. This is the Classification of Finite Abelian Groups stated in Chapter 1, now fully proved. The invariant factors $d_i$ and the rank $s$ are uniquely determined by the group — they are computed from the Fitting ideals of any pre
sentat
ion matrix.
For example, all abelian groups of order $360 = 2^3 \cdot 3^2 \cdot 5$ (with no free part, since the group is finite) are enumerated by sequences $d_1 \mid d_2 \mid \cdots \mid d_r$ with $\prod d_i = 360$. These are: $C_{360}$; $C_2 \times C_{180}$ (since $2 \mid 180$); $C_6 \times C_{60}$ (since $6 \mid 60$); $C_2 \times C_2 \times C_{90}$ (since $2 \mid 2 \mid 90$); $C_6 \times C_6 \times C_{10}$ (since $6 \mid 6 \mid 10$); and $C_2 \times C_6 \times C_{30}$ (since $2 \mid 6 \mid 30$). So there are six non-isomorphic abelian groups of order $360$.
[/example]
Normal Forms for Matrices
The most striking application of the structure theorem is to linear algebra: it gives a complete classification of linear maps $\alpha : V \to V$ up to conjugacy (i.e. up to change of basis), producing the rational canonical form and the Jordan normal form as two ways of presenting the same classification.
Setting Up the $\mathbb{F}[X]$-Module
Let $\mathbb{F}$ be a field and $V$ a finite-dimensional $\mathbb{F}$-vector space of dimension $n$, and let $\alpha : V \to V$ be a linear map. Turn $V$ into an $\mathbb{F}[X]$-module $V_\alpha$ by defining the action of the polynomial $f(X) = a_0 + a_1 X + \cdots + a_k X^k$ as
\begin{align*}
f \cdot v = f(\alpha)(v) = a_0 v + a_1 \alpha(v) + a_2 \alpha^2(v) + \cdots + a_k \alpha^k(v).
\end{align*}
Since $\mathbb{F} \subseteq \mathbb{F}[X]$, any $\mathbb{F}$-basis of $V$ generates $V_\alpha$ as an $\mathbb{F}[X]$-module, so $V_\alpha$ is finitely generated. Since $V$ is finite-dimensional over $\mathbb{F}$, the module $V_\alpha$ has no free $\mathbb{F}[X]$-summand (a free summand $\mathbb{F}[X]$ is infinite-dimensional over $\mathbb{F}$). By the Cayley–Hamilton theorem, the characteristic polynomial $\chi_\alpha$ annihilates $V$, so $\operatorname{Ann}(V_\alpha) \neq 0$.
The crucial observation is that an $\mathbb{F}$-linear change of basis $\alpha \mapsto P^{-1}\alpha P$ changes the $\mathbb{F}[X]$-module structure of $V$ to an isomorphic one (with the same underlying set $V$ but a new action defined by the new $\alpha$). Conversely, two isomorphic $\mathbb{F}[X]$-module structures on $V$ correspond to conjugate linear maps. So classifying linear maps on $V$ up to conjugacy is the same as classifying $\mathbb{F}[X]$-module structures on $V$ up to isomorphism.
Rational Canonical Form
Applying the structure theorem to $V_\alpha$ (with $R = \mathbb{F}[X]$, which is Euclidean) gives:
[quotetheorem:863]
The Rational Canonical Form is the direct output of the structure theorem for $\mathbb{F}[X]$-modules. Each cyclic summand $\mathbb{F}[X]/(f_i)$ has a preferred basis $\{1, X, X^2, \ldots, X^{\deg f_i - 1}\}$ modulo $(f_i)$, in which the action of $\alpha$ (multiplication by $X$) is represented by the companion matrix $c(f_i)$. The divisibility $f_1 \mid f_2 \mid \cdots \mid f_s$ is the divisibility of the corresponding invariant factors of the p
resentat
ion matrix of $V_\alpha$.
Three important read-offs from the rational canonical form: the minimal polynomial of $\alpha$ is $f_s$ (the largest invariant factor, which annihilates every summand since $f_i \mid f_s$, and is minimal since $f_s$ is the annihilator of the last summand); the characteristic polynomial is $f_1 f_2 \cdots f_s$ (the product of all invariant factors); and the form is genuinely canonical — the invariant factors are uniquely determined, unlike the Jordan form which is canonical only up to block ordering.
[quoteproof:863]
[example: Computing the Rational Canonical Form]
Let $\alpha : \mathbb{Q}^3 \to \mathbb{Q}^3$ be the linear map with matrix
\begin{align
}
A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix}.
\end{align}
The characteristic polynomial is $\chi_A = \det(XI - A) = X^3 - X^2 + X - 1 = (X-1)(X^2+1)$.
To find the invariant factors, compute the Smith normal form of $XI - A \in \mathbb{Q}[X]^{3\times 3}$:
\be
gin{a
lign}
XI - A = \begin{pmatrix} X & 0 & -1 \\ -1 & X & 1 \\ 0 & -1 & X-1 \end{pmatrix}.
\end{align}
The $\gcd$ of all entries (the generator of $\mathrm{Fit}_1$) is $1$, so $d_1 = 1$. The $\gcd$ of all $2 \times 2$ minors (the generator of $\mathrm{Fit}_2$, divided by $d_1 = 1$) is $1$, so $d_2 = 1$. The generator of $\mathrm{Fit}_3$ is $\det(XI - A) = (X-1)(X^2+1)$, so $d_3 = (X-1)(X^2+1)$.
Thus $V_\alpha \cong \mathbb{Q}[X]/(1) \oplus \mathbb{Q}[X]/(1) \oplus \mathbb{Q}[X]/((X-1)(X^2+1))$, which simplifies to $\mathbb{Q}[X]/((X-1)(X^2+1))$. The single invariant factor $f_1 = (X-1)(X^2+1) = X^3 - X^2 + X - 1$ gives one $3 \times 3$ companion block:
\begin{alig
n}
c(f_1) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix},
\end{align}
which is just $A$ itself — a happy coincidence showing $A$ is already in rational canonical form.
[/example]
Jordan Normal Form
Over $\mathbb{C}$, every polynomial factors into linear factors. This means the invariant factors $f_i$ of $V_\alpha$ factor completely into factors $(X - \lambda)^k$, and the Chinese remainder theorem for modules ($R/(ab) \cong R/(a) \oplus R/(b)$ when $\gcd(a,b) = 1$) further decomposes each summand $\mathbb{C}[X]/(f_i)$ into primary pieces $\mathbb{C}[X]/((X-\lambda)^k)$.
[quotetheorem:864]
The Jordan Normal Form is the prime decomposition version of the rational canonical form, available over algebraically closed fields. Each piece $\mathbb{C}[X]/((X-\lambda)^k)$ has basis $\{1, (X-\lambda), \ldots, (X-\lambda)^{k-1}\}$ modulo $((X-\lambda)^k)$, in which the action of $X$ (i.e. of $\alpha$) is: $(X-\lambda)^j \mapsto (X-\lambda)^{j+1}$ for $j < k-1$, and the identity $(X-\lambda)^{k-1} \mapsto 0$ (the term $(X-\lambda)^k$ vanishes). This means $\alpha$ acts as $\lambda \cdot \mathrm{id}$ plus a nilpotent shift — exactly the Jordan block $J_k(\lambda)$.
The minimal polynomial of $\alpha$ reads off as $\prod_\lambda (X-\lambda)^{a_\lambda}$ where $a_\lambda$ is the size of the largest $\lambda$-block; the characteristic polynomial is $\prod_\lambda (X-\lambda)^{b_\lambda}$ where $b_\lambda$ is the sum of all $\lambda$-block sizes.
[quoteproof:864]
[example: Jordan Normal Form — A Complete Computation]
Let $\alpha : \mathbb{C}^4 \to \mathbb{C}^4$ have characteristic polynomial $(X-2)^3(X+1)$ and minimal polynomial $(X-2)^2(X+1)$.
The minimal polynomial tells us: the largest Jordan $2$-block has size $2$, and the $(-1)$-block has size $1$. The characteristic polynomial tells us: the $2$-eigenspace contributes blocks totalling size $3$, and the $(-1)$-eigenspace contributes blocks totalling size $1$.
For eigenvalue $\lambda = 2$, total block size $3$, largest block size $2$: the only possibility is one $2$-block and one $1$-block (sizes $2, 1$, sum $= 3$, max $= 2$). For $\lambda = -1$: total size $1$, so one $1$-block.
The Jordan form is therefore:
\begin{align*}
J = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.
\end{align*}
(Blocks on the diagonal: $J_2(2)$, then $J_1(2)$, then $J_1(-1)$, with subdiagonal entries within each block.) The module decomposition is $V_\alpha \cong \mathbb{C}[X]/((X-2)^2) \oplus \mathbb{C}[X]/(X-2) \oplus \mathbb{C}[X]/(X+1)$.
[/example]
Cayley–Hamilton
Both normal forms give an immediate proof of the Cayley–Hamilton theorem, which in naive formulations ("a matrix satisfies its own characteristic polynomial") looks like it should be straightforward but is actually subtle to prove without the module machinery.
[quotetheorem:865]
The Cayley-Hamilton Theorem is an immediate corollary of the rational canonical form. The characteristic polynomial $\chi_\alpha = f_1 f_2 \cdots f_s$ divides $f_s^s$ (since $f_i \mid f_s$ for all $i$), but more directly: every summand $V_i \cong \mathbb{F}[X]/(f_i)$ is annihilated by $f_i$, and since $f_i \mid \chi_\alpha$, it is also annihilated by $\chi_\alpha$. Since $V$ is the direct sum of the $V_i$, the whole space is annihilated by $\chi_\alpha$. The theorem holds over any field and even over any
comm
utative ring (with the appropriate generalization of characteristic polyno
m
ial via the adjugate matrix), but the field case from the rational canonical form is the clearest.
[quotep
roof:865]
[example: Cayley-Hamilton in Practice]
Let $A = MATHENVmunv0sP0END$ over $\mathbb{Q}$. The characteristic polynomial is $\chi_A = (X-1)(X-3) = X^2 - 4X + 3$. Cayley-Hamilton
a
sserts $A^2 - 4A + 3I = 0$.
Computing directly:
\begin{align*}
A^2 = \begin{pmatrix} 1 & 8 \\ 0 & 9 \end{pmatrix}, \qquad
4A = \begin{pmatrix} 4 & 8 \\ 0 & 12 \end{pmatrix}, \qquad
3I = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}.
\end{align*}
So $A^2 - 4A + 3I = MATHENVadqz5qP1END = MATHENVadqz5qP2END$. Confirmed.
The theorem has important practical consequences: any polynomial in $A$ of degree $\geq n$ can be reduced to one of degree $< n$ using the relation $\chi_A(A) = 0$. In particular, $A^{-1}$ (when it exists) can be expressed as a polynomial in $A$ of degree $< n$ with coefficients in $\m
ath
bb{Q}$,
readable from the adjugate formula: $A^{-1}
=
\frac{1}{\det A}
(
(\mathrm{tr},
A)I - A) = \frac{1}{3}(4I - A) = \begin{pmatrix} 1 & -
2
\in (h)$, we
have $h \mid 2$, so $h$ is a constant $\pm 1$ or $\pm 2$. If $h = \pm 1$, then $(h) = \mathbb{Z}[X]$, but $1 \notin (2, X)$ (any element of $(2, X)$ evaluated at $0$ is even). If $h = \pm 2$, then $X \in (2, X) = (\pm 2)$ would require $2 \mid X$ in $\mathbb{Z}[X]$, which is false. Contradiction.
[/example]
[definition:Quotient Ring]
Let $I \trianglelefteq R$. The quotient ring $R/I$ is the set of additive cosets $\{r + I : r \in R\}$ with operations
\begin{align*}
(r_1 + I) + (r_2 + I) &= (r_1 + r_2) + I, \\
(r_1 + I)(r_2 + I) &= r_1 r_2 + I.
\end{align*}
The zero is $0_R + I = I$ and the one is $1_R + I$.
[/definition]
Multiplication is well-defined precisely because of the strong closure property of $I$: if $r_1' = r_1 + a_1$ and $r_2' = r_2 + a_2$ with $a_1, a_2 \in I$, then $r_1' r_2' = r_1 r_2 + r_1 a_2 + a_1 r_2 + a_1 a_2$, and the last three terms all lie in $I$ by strong closure. Just as in group theory, the condition we imposed on $I$ is exactly the condition needed to make the quotient well-defined.
The quotient map $\pi : R \to R/I$ sending $r \mapsto r + I$ is a surjective ring homomorphism with kernel $I$. This is the ring-theoretic analogue of the quotient group map.
The Isomorphism Theorems for Rings
The isomorphism theorems carry over from groups to rings almost verbatim, since rings are abelian groups under addition, and the additional multiplicative structure is preserved by the same constructions.
[quotetheorem:851]
The First Isomorphism Theorem for Rings has the same shape as its group-theoretic counterpart: a ring homomorphism $\varphi : R \to S$ factors as a surjection onto its image $\operatorname{im}(\varphi)$, followed by an isomorphism from $R/\ker(\varphi)$. The proof is the same as for groups, with one additional check: the quotient map respects multiplication, which follows immediately from the homomorphism property of $\varphi$. The theorem is used constantly to identify quotient rings: to show $R/I \cong T$, it suffices to find a surjective ring homomorphism $R \to T$ with kernel $I$.
[quoteproof:851]
[example:Polynomial Quotients]
The evaluation homomorphism $\varphi : \mathbb{R}[X] \to \mathbb{C}$ defined by $\varphi(f) = f(i)$ (where $i = \sqrt{-1}$) is a surjective ring homomorphism, since every complex number $a + bi = \varphi(a + bX)$. Its kernel consists of all $f \in \mathbb{R}[X]$ with $f(i) = 0$, i.e. all polynomials divisible by the minimal polynomial of $i$ over $\mathbb{R}$, which is $X^2 + 1$. So $\ker(\varphi) = (X^2 + 1)$. The first isomorphism theorem gives:
\begin{align*}
\mathbb{R}[X]/(X^2 + 1) \cong \mathbb{C}.
\end{align*}
More explicitly, every element of $\mathbb{R}[X]/(X^2 + 1)$ has a unique representative $a + bX$ (reduce modulo $X^2 + 1$ using the Euclidean algorithm in $\mathbb{R}[X]$), and multiplication in this ring satisfies $X^2 \equiv -1$, recovering the standard multiplication rule for complex numbers. The abstract machinery has constructed $\mathbb{C}$ from $\mathbb{R}$ purely algebraically, with no appeal to geometry.
[/example]
The correspondence between ideals of $R/I$ and ideals of $R$ containing $I$ holds in rings exactly as it held for normal subgroups of quotient groups.
There is also a useful parallel to the characteristic of a ring. For any ring $R$, the unique ring homomorphism $\iota : \mathbb{Z} \to R$ (sending $1 \mapsto 1_R$) has kernel $n\mathbb{Z}$ for some unique $n \geq 0$.
[definition:Characteristic]
The characteristic $\operatorname{char}(R)$ of a ring $R$ is the unique non-negative integer $n$ such that $\ker(\iota : \mathbb{Z} \to R) = n\mathbb{Z}$. Equivalently, it is the smallest positive $n$ with $n \cdot 1_R = 0_R$, or $0$ if no such $n$ exists.
[/definition]
The rings $\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ all have characteristic $0$. The ring $\mathbb{Z}/n\mathbb{Z}$ has characteristic $n$. When $R$ is an integral domain, its characteristic is either $0$ or a prime (since $\mathbb{Z}/\operatorname{char}(R)\mathbb{Z}$ embeds into $R$ and must be an integral domain, forcing $\operatorname{char}(R)\mathbb{Z}$ to be prime).
Integral Domains and Fields of Fractions
Most rings that arise naturally — $\mathbb{Z}$, $\mathbb{Z}[i]$, polynomial rings over fields — share a key property of $\mathbb{Z}$: the product of two non-zero elements is non-zero. Rings with this property are called integral domains, and they are the setting in which a meaningful theory of divisibility can be developed.
[definition:Zero Divisor]
An element $x \in R$ (with $x \neq 0$) is a zero divisor if there exists $y \neq 0$ in $R$ with $xy = 0$.
[/definition]
[definition:Integral Domain]
A non-zero commutative ring $R$ is an integral domain if it has no zero divisors: whenever $ab = 0$ in $R$, either $a = 0$ or $b = 0$.
[/definition]
The rings $\mathbb{Z}/6\mathbb{Z}$ fail this: $2 \cdot 3 = 6 = 0$ in $\mathbb{Z}/6\mathbb{Z}$, so $2$ and $3$ are zero divisors. But $\mathbb{Z}/p\mathbb{Z}$ for prime $p$ is an integral domain (in fact a field). The polynomial ring $R[X]$ over an integral domain $R$ is again an integral domain, since the leading coefficient of $fg$ is the product of the leading coefficients of $f$ and $g$, which is non-zero if both factors are non-zero.
An integral domain satisfies the cancellation law: if $ba = bc$ and $b \neq 0$, then $a = c$ (since $b(a-c) = 0$ and $b \neq 0$ forces $a - c = 0$). This is the algebraic form of "dividing both sides by $b$" — valid in integral domains but not in general rings.
Every field is an integral domain: if $ab = 0$ and $b \neq 0$, then $a = a \cdot (bb^{-1}) = (ab)b^{-1} = 0$. The converse fails in general ($\mathbb{Z}$ is an integral domain but not a field), but holds for finite integral domains: any finite integral domain is a field, since the map $x \mapsto ax$ (for $a \neq 0$) is injective (by cancellation) and hence bijective on a finite set, giving $ab = 1$ for some $b$.
The construction of $\mathbb{Q}$ from $\mathbb{Z}$ — taking formal fractions $a/b$ and identifying $a/b = c/d$ when $ad = bc$ — generalizes to any integral domain.
[quotetheorem:866]
The field of fractions construction is one of the most powerful tools in ring theory: it lets us embed any integral domain into a field, unlocking the full arsenal of field techniques (the Euclidean algorithm in $F[X]$, factorization in $F[X]$ using roots, etc.) for problems in $R$. For instance, to study factorization of polynomials in $\mathbb{Z}[X]$, we often pass to $\mathbb{Q}[X]$ (the field of fractions of $\mathbb{Z}$ is $\mathbb{Q}$), where the Euclidean algorithm is available, and then use Gauss's lemma to pull information back to $\mathbb{Z}[X]$. The transitivity property of the field of fractions is: if $R \leq S$ is a subring of an integral domain $S$, then the field of fractions of $R$ embeds into that of $S$.
[quoteproof:866]
[example:Fields of Fractions]
The field of fractions of $\mathbb{Z}$ is $\mathbb{Q}$. The field of fractions of $\mathbb{Z}[i]$ is $\mathbb{Q}(i) = \{a + bi : a, b \in \mathbb{Q}\}$. The field of fractions of $\mathbb{C}[X]$ is $\mathbb{C}(X)$, the field of rational functions $p(X)/q(X)$ with $p, q \in \mathbb{C}[X]$ and $q \neq 0$. These rational functions are not the same as holomorphic functions — they are purely algebraic objects, and two non-zero polynomials $q$ and $q'$ can have the same rational function even if they differ as polynomials (they cannot, but this illustrates that the construction is formal).
[/example]
Prime and Maximal Ideals
Not all ideals are alike. Two special classes — prime ideals and maximal ideals — control the arithmetic of the ring in complementary ways. Both are best understood through their quotient rings.
[definition:Prime Ideal]
An ideal $I \trianglelefteq R$ is prime if $I \neq R$ and whenever $ab \in I$ for $a, b \in R$, either $a \in I$ or $b \in I$.
[/definition]
[definition:Maximal Ideal]
An ideal $I \trianglelefteq R$ is maximal if $I \neq R$ and there is no ideal $J$ with $I \subsetneq J \subsetneq R$.
[/definition]
The prime ideal condition is a ring-theoretic generalization of the defining property of prime numbers in $\mathbb{Z}$: $p \mid ab \implies p \mid a$ or $p \mid b$. The maximal ideal condition says there is no proper ideal strictly larger than $I$, which is the analogue of a minimal prime in some sense — though the terminology is the other way round. The key theorems characterize both conditions via quotient rings.
[quotetheorem:852]
The Maximal Ideal Criterion says that $I$ is maximal if and only if $R/I$ is a field. The proof goes through the ideal correspondence theorem: ideals of $R/I$ correspond to ideals of $R$ containing $I$, so $R/I$ has no proper non-zero ideals iff $I$ is maximal, and a ring with no proper non-zero ideals is a field (every non-zero element $r$ generates the whole ring, so $(r) = R$, giving $sr = 1_R$ for some $s$, i.e. $r$ is a unit). This is one of the most useful criteria in ring theory: to check $I$ is maximal, it suffices to show $R/I$ is a field.
[quoteproof:852]
[quotetheorem:853]
The Prime Ideal Criterion gives an equally clean characterization: $I$ is prime iff $R/I$ is an integral domain. Since every field is an integral domain, every maximal ideal is prime. The converse fails: in $\mathbb{Z}[X]$, the ideal $(X)$ is prime (since $\mathbb{Z}[X]/(X) \cong \mathbb{Z}$, which is an integral domain) but not maximal (the ideal $(X, 2)$ is strictly larger, and $\mathbb{Z}[X]/(X, 2) \cong \mathbb{Z}/2\mathbb{Z}$, a field, so $(X, 2)$ is maximal). More strikingly, in $\mathbb{Z}$, every non-zero prime ideal $(p)$ is also maximal, since $\mathbb{Z}/(p) = \mathbb{Z}/p\mathbb{Z}$ is a field. The coincidence of prime and maximal for $\mathbb{Z}$ is a special property of PIDs.
[quoteproof:853]
[example:Prime and Maximal Ideals in $\mathbb{Z}[X]$]
We survey the ideal landscape of $\mathbb{Z}[X]$.
The zero ideal $(0)$ is prime (since $\mathbb{Z}[X]$ is an integral domain) but not maximal.
For a prime $p \in \mathbb{Z}$, the ideal $(p)$ is prime: $\mathbb{Z}[X]/(p) \cong (\mathbb{Z}/p\mathbb{Z})[X]$, which is an integral domain (since $\mathbb{Z}/p\mathbb{Z}$ is a field, hence $(\mathbb{Z}/p\mathbb{Z})[X]$ is an integral domain). But $(p)$ is not maximal, since $(p, X)$ is strictly larger.
The ideal $(p, f)$ where $f$ is irreducible modulo $p$ is maximal: $\mathbb{Z}[X]/(p, f) \cong (\mathbb{Z}/p\mathbb{Z})[X]/(f)$, which is a field (since $f$ is irreducible over $\mathbb{Z}/p\mathbb{Z}$, so $(f)$ is maximal in $(\mathbb{Z}/p\mathbb{Z})[X]$).
For example, $(2, X^2 + X + 1)$ is a maximal ideal in $\mathbb{Z}[X]$, with quotient isomorphic to $\mathbb{F}_4$, the field of four elements.
[/example]
Factorization in Integral Domains
The integers have two remarkable factorization properties: every non-zero non-unit factors into primes, and this factorization is unique. Most integral domains do not share both properties. Understanding which do — and why — is the heart of ring-theoretic arithmetic.
[definition:Divisibility and Associates]
For $a, b \in R$ (an integral domain), $a$ divides $b$ (written $a \mid b$) if $b = ac$ for some $c \in R$. Equivalently, $(b) \subseteq (a)$. Elements $a, b$ are associates if $a = bu$ for some unit $u$; equivalently, $(a) = (b)$.
[/definition]
[definition:Irreducible Element]
A non-zero non-unit $a \in R$ is irreducible if whenever $a = bc$, either $b$ or $c$ is a unit.
[/definition]
[definition:Prime Element]
A non-zero non-unit $a \in R$ is prime if whenever $a \mid bc$, either $a \mid b$ or $a \mid c$.
[/definition]
In $\mathbb{Z}$, these coincide: an integer is irreducible iff it is prime iff it is $\pm p$ for some prime number $p$. But in general integral domains, primes and irreducibles can diverge. In $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$, the factorization $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ shows two distinct factorizations into irreducibles. One verifies using the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$ that $2, 3, 1 \pm \sqrt{-5}$ are all irreducible (there is no element of norm $2$ or $3$ in $\mathbb{Z}[\sqrt{-5}]$), yet they are not all prime: $2 \nmid 1 + \sqrt{-5}$ and $2 \nmid 1 - \sqrt{-5}$ (since $N(2) = 4 \nmid N(1 \pm \sqrt{-5}) = 6$), so $2$ is irreducible but not prime. The failure of unique factorization and the failure of irreducible $\Leftrightarrow$ prime are two sides of the same coin.
To restore well-behaved arithmetic, we impose progressively stronger conditions.
[definition:Euclidean Domain]
An integral domain $R$ is a Euclidean domain (ED) if there is a function $\varphi : R \setminus \{0\} \to \mathbb{Z}_{\geq 0}$ (the Euclidean function) such that:
\begin{align*}
&\text{(i) } \varphi(ab) \geq \varphi(b) \text{ for all } a, b \neq 0, \\
&\text{(ii) for any } a, b \in R \text{ with } b \neq 0, \text{ there exist } q, r \in R \text{ with } a = bq + r \text{ and } r = 0 \text{ or } \varphi(r) < \varphi(b).
\end{align*}
[/definition]
[definition:Principal Ideal Domain]
An integral domain $R$ is a principal ideal domain (PID) if every ideal is principal.
[/definition]
[definition:Unique Factorization Domain]
An integral domain $R$ is a unique factorization domain (UFD) if every non-zero non-unit factors into irreducibles, and this factorization is unique up to order and associates.
[/definition]
The hierarchy is strict: $\mathrm{ED} \implies \mathrm{PID} \implies \mathrm{UFD} \implies \mathrm{ID}$, and none of the implications reverse. The integers $\mathbb{Z}$ with $\varphi(n) = |n|$ and the polynomial ring $F[X]$ over a field $F$ with $\varphi(f) = \deg f$ are Euclidean domains. The Gaussian integers $\mathbb{Z}[i]$ with $\varphi(z) = |z|^2 = a^2 + b^2$ are Euclidean: given $a, b \in \mathbb{Z}[i]$ with $b \neq 0$, the complex number $a/b \in \mathbb{C}$ lies within distance $\sqrt{2}/2 < 1$ of some Gaussian integer $q$, and setting $r = a - bq$ gives $\varphi(r) = |b|^2 |a/b - q|^2 < |b|^2 = \varphi(b)$. The ring $\mathbb{Z}[\sqrt{-5}]$ is none of the above.
[quotetheorem:855]
Euclidean Domains Are Principal Ideal Domains is the ring-theoretic analogue of the argument showing every ideal of $\mathbb{Z}$ is of the form $n\mathbb{Z}$: pick the element of smallest $\varphi$-value in the ideal, and the division algorithm forces every other element to be a multiple. The proof works word-for-word, replacing $|\cdot|$ with $\varphi$.
[quoteproof:855]
In a PID, being irreducible and being prime are equivalent — a fact that fails dramatically in rings like $\mathbb{Z}[\sqrt{-5}]$.
[quotetheorem:856]
The proof of In PIDs Irreducible Elements Are Prime is a Bézout argument. In $\mathbb{Z}$, if $p \nmid a$ then $\gcd(p, a) = 1$, so $rp + sa = 1$ for some integers $r, s$. Multiplying by $b$ gives $b = rpb + sab$; if $p \mid ab$ then both terms on the right are divisible by $p$, so $p \mid b$. In a PID, the ideal $(p, a)$ is principal: $(p, a) = (d)$. Irreducibility of $p$ forces either $d \sim p$ (meaning $p \mid a$, contradicting $p \nmid a$) or $d$ is a unit (meaning $(d) = R$, giving the Bézout relation). The rest is identical to the $\mathbb{Z}$ argument.
[quoteproof:856]
The grand payoff is that PIDs have unique factorization:
[quotetheorem:867]
Principal Ideal Domains Are Unique Factorization Domains. The proof has two independent parts: existence (using the ascending chain condition — PIDs are Noetherian, since every ideal is finitely generated by a single element, so any ascending chain stabilises) and uniqueness (using that irreducibles are prime, then cancelling one factor at a time, just as in $\mathbb{Z}$). The combination of these two properties — ACC and prime equals irreducible — is what characterizes UFDs among integral domains.
[quoteproof:867]
[example:The Ring $\mathbb{Z}[\sqrt{-5}]$ Is Not a UFD]
The failure of unique factorization in $\mathbb{Z}[\sqrt{-5}]$ is now fully explained. The norm function $N(a + b\sqrt{-5}) = a^2 + 5b^2$ satisfies $N(xy) = N(x)N(y)$, so units have norm $1$: only $N(a + b\sqrt{-5}) = 1$ with $a^2 + 5b^2 = 1$ has the solution $(\pm 1, 0)$. One verifies that $2, 3, 1 \pm \sqrt{-5}$ are all irreducible (no element of norm $2$ or $3$ exists), yet $6 = 2 \cdot 3 = (1+\sqrt{-5})(1-\sqrt{-5})$ gives two distinct factorizations into irreducibles. The irreducible $2$ is not prime: $2 \mid (1+\sqrt{-5})(1-\sqrt{-5})$ but $2 \nmid 1 \pm \sqrt{-5}$ (since otherwise $\frac{1 \pm \sqrt{-5}}{2}$ would be a Gaussian integer, but its norm is $6/4 \notin \mathbb{Z}$). So this ring fails: irreducible $\centernot\Rightarrow$ prime, and unique factorization fails simultaneously.
[/example]
Factorization in Polynomial Rings
Polynomial rings over fields are Euclidean domains, hence PIDs and UFDs. But polynomial rings over $\mathbb{Z}$ — like $\mathbb{Z}[X]$ — are UFDs that are not PIDs. For these, Gauss's lemma provides the essential link between factorization in $R[X]$ and factorization in $F[X]$, where $F$ is the field of fractions.
[definition:Content of a Polynomial]
Let $R$ be a UFD and $f = a_0 + a_1 X + \cdots + a_n X^n \in R[X]$. The content of $f$ is $c(f) = \gcd(a_0, a_1, \ldots, a_n) \in R$ (well-defined up to a unit). The polynomial $f$ is primitive if $c(f)$ is a unit, i.e. if the coefficients are coprime.
[/definition]
Every polynomial $f \in R[X]$ factors as $f = c(f) \cdot f_1$ where $f_1$ is primitive. So factorization in $R[X]$ splits into two independent problems: factorizing the content in $R$, and factorizing the primitive part in $R[X]$ (or equivalently, in $F[X]$, by Gauss's lemma).
[quotetheorem:858]
Gauss's Lemma is the bridge between $R[X]$ and $F[X]$. The forward direction (reducible over $R$ implies reducible over $F$) is trivial. The reverse direction is the content: if $f = gh$ in $F[X]$, we can clear denominators to get $abf = (ag)(bh)$ in $R[X]$, then compare contents. Since $f$ is primitive, $c(abf) = ab$, and $c(ag)c(bh) = ab$ up to a unit, allowing us to reassemble $f = g_1 h_1$ with $g_1, h_1 \in R[X]$ primitive, thus non-units. The elegance of the argument is that content is the right invariant to track: multiplicativity of content ($c(fg) \sim c(f)c(g)$) does all the work.
[quoteproof:858]
[example:Irreducibility via Gauss's Lemma]
The polynomial $f = X^3 + X + 1 \in \mathbb{Z}[X]$ is primitive (content $= 1$). By Gauss's lemma, $f$ is irreducible in $\mathbb{Q}[X]$ iff it is irreducible in $\mathbb{Z}[X]$. A degree-$3$ polynomial over $\mathbb{Q}$ is reducible iff it has a rational root. By the rational root theorem, any rational root of $f$ has the form $\pm 1$ (numerator divides the constant term $1$, denominator divides the leading coefficient $1$). But $f(1) = 3 \neq 0$ and $f(-1) = -1 \neq 0$. So $f$ has no rational roots, hence is irreducible over both $\mathbb{Z}$ and $\mathbb{Q}$.
[/example]
When there is no rational root to check, Eisenstein's criterion detects irreducibility by a single prime.
[quotetheorem:859]
Eisenstein's Criterion is one of the most efficient irreducibility tests available. Its proof is a clean divisibility argument: the Eisenstein prime $p$ divides $a_0$ but not $a_n = 1$ (since $f$ is primitive), so exactly one of the constant terms of the two hypothetical factors is divisible by $p$; a traveling-index argument then shows $p$ must divide all coefficients of one factor except the leading one, forcing that factor to have degree $n$ — contradicting the factorization being non-trivial. Notice Eisenstein requires working in $R[X]$, not $F[X]$ — the prime $p$ plays no role over a field, since it is a unit there.
[quoteproof:859]
[example:Cyclotomic Polynomials Are Irreducible]
Let $p$ be prime and consider the polynomial
\begin{align*}
f = X^{p-1} + X^{p-2} + \cdots + X + 1 = \frac{X^p - 1}{X - 1} \in \mathbb{Z}[X].
\end{align*}
Eisenstein does not apply directly to $f$. The standard trick is to substitute $Y = X - 1$:
\begin{align*}
\hat{f}(Y) = f(Y+1) = \frac{(Y+1)^p - 1}{Y} = Y^{p-1} + \binom{p}{1}Y^{p-2} + \cdots + \binom{p}{p-1}.
\end{align*}
Now apply Eisenstein with the prime $p$: $p \mid \binom{p}{k}$ for $1 \leq k \leq p-1$ (a standard binomial coefficient fact), and $p^2 \nmid \binom{p}{p-1} = p$. So $\hat{f}$ is irreducible in $\mathbb{Z}[Y]$ by Eisenstein. Since a factorization $f(X) = g(X)h(X)$ in $\mathbb{Z}[X]$ gives $\hat{f}(Y) = g(Y+1)h(Y+1)$ in $\mathbb{Z}[Y]$, irreducibility of $\hat{f}$ implies irreducibility of $f$.
[/example]
Noetherian Rings
The Hilbert basis theorem is one of the pivotal results of nineteenth-century algebra. Before Hilbert, invariant theorists labored to exhibit finite generating sets for rings of symmetries by hand. Hilbert proved in one stroke that any ideal in a polynomial ring over a Noetherian ring is finitely generated — ending the laborious case-by-case approach.
A ring is Noetherian if ideals cannot grow indefinitely. The definition is equivalent to requiring all ideals to be finitely generated — and that equivalence is itself a useful theorem.
[definition:Ascending Chain Condition]
A ring $R$ satisfies the ascending chain condition (ACC) if every ascending chain of ideals $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ eventually stabilises: there exists $N$ with $I_n = I_N$ for all $n \geq N$.
[/definition]
[definition:Noetherian Ring]
A ring $R$ is Noetherian if it satisfies the ACC. Equivalently, every ideal of $R$ is finitely generated.
[/definition]
Every PID is Noetherian: all its ideals are principal, hence generated by a single element. Every field is Noetherian (only two ideals). Every quotient of a Noetherian ring is Noetherian (ideals of $R/I$ pull back to ideals of $R$ containing $I$, which are finitely generated, and their images in $R/I$ are then finitely generated by the images of the generators). A non-example: $\mathbb{Z}[X_1, X_2, X_3, \ldots]$ (infinitely many variables) is not Noetherian — the chain $(X_1) \subsetneq (X_1, X_2) \subsetneq (X_1, X_2, X_3) \subsetneq \cdots$ never stabilises.
The crucial closure property is that Noetherian-ness passes to polynomial rings:
[quotetheorem:860]
The Hilbert Basis Theorem is the reason polynomial rings are tractable. Every ideal $I \trianglelefteq R[X]$ is determined by finitely many polynomial equations — a foundational fact for algebraic geometry, where ideals of $\mathbb{R}[X_1, \ldots, X_n]$ correspond to polynomial systems whose solution sets are algebraic varieties. The theorem says that any such system, though potentially given by infinitely many equations, is determined by finitely many of them.
The proof works by extracting leading coefficients at each degree to form an ascending chain of ideals in $R$. The Noetherian hypothesis on $R$ forces this chain to stabilise at some level $N$, and the finitely many generating polynomials (one at each degree $0 \leq n \leq N$ for each generator of the corresponding ideal in $R$) then suffice to generate all of $I$ by an induction on degree argument.
[quoteproof:860]
[example:Applications of Noetherian Rings]
Let $F$ be a field and consider any system of polynomial equations $f_\alpha(x_1, \ldots, x_n) = 0$ for $\alpha$ ranging over some (possibly infinite) index set. Let $I = (\{f_\alpha\})$ be the ideal generated by all these polynomials in $F[X_1, \ldots, X_n]$.
Since $F$ is Noetherian, and $F[X_1, \ldots, X_n]$ is Noetherian by iterated application of the Hilbert basis theorem, the ideal $I$ is finitely generated: $I = (f_1, \ldots, f_k)$ for some finite list $f_1, \ldots, f_k$. A point $a = (a_1, \ldots, a_n)$ satisfies all the equations $f_\alpha(a) = 0$ if and only if it satisfies $f_1(a) = \cdots = f_k(a) = 0$ (since every $f_\alpha$ is a combination $\sum r_i f_i$, so vanishing of $f_1, \ldots, f_k$ forces vanishing of all $f_\alpha$). Thus the solution set of an arbitrary polynomial system equals the solution set of a finite polynomial system — a remarkable compactness statement that requires no topology, only Noetherian algebra.
[/example]
Modules
If rings generalise the integers by keeping two operations, then modules generalise vector spaces by relaxing the requirement that scalars form a field. A vector space over $\mathbb{R}$ or $\mathbb{C}$ is geometrically intuitive but algebraically rigid — bases always exist, dimension is well-defined, and every subspace has a complement. When we allow scalars from a ring $R$, this rigidity dissolves. Not every module has a basis, not every submodule is a direct summand, and the structure of a module depends sensitively on the ring $R$. This loss of rigidity is not a weakness; it is where the richness comes from.
The payoff for working with modules over a ring $R$ rather than vector spaces over a field is the structure theorem: every finitely generated module over a Euclidean domain decomposes into a direct sum of cyclic modules, classified by invariant factors. Applied with $R = \mathbb{Z}$, this immediately classifies all finite abelian groups — the result stated without proof at the end of Chapter 1. Applied with $R = \mathbb{F}[X]$, it produces the rational canonical form and Jordan normal form for matrices, giving a purely algebraic proof of results that linear algebra usually handles by more computational means.
Modules and Submodules
The Definition
A module over a ring $R$ is an abelian group on which $R$ acts by scalar multiplication, compatibly with both the ring structure of $R$ and the group structure of the module.
[definition: Module]
Let $R$ be a commutative ring. An $R$-module is a quadruple $(M, +, 0_M, \cdot)$ where $(M, +, 0_M)$ is an abelian group and $\cdot : R \times M \to M$ is a scalar multiplication satisfying, for all $r, s \in R$ and $m, n \in M$:
\begin{align*}
&\text{(i) } (r + s) \cdot m = r \cdot m + s \cdot m, \\
&\text{(ii) } r \cdot (m + n) = r \cdot m + r \cdot n, \\
&\text{(iii) } r \cdot (s \cdot m) = (rs) \cdot m, \\
&\text{(iv) } 1_R \cdot m = m.
\end{align*}
[/definition]
The axioms say that $R$ acts on $M$ by ring homomorphisms: each $r \in R$ gives an additive endomorphism $m \mapsto rm$ of $M$, and the map $r \mapsto (m \mapsto rm)$ is itself a ring homomorphism $R \to \mathrm{End}(M)$. This is the coordinate-free way to think about modules: a module is an abelian group together with a ring action on it.
[example: The Canonical Examples of Modules]
Vector spaces. If $\mathbb{F}$ is a field, an $\mathbb{F}$-module is exactly an $\mathbb{F}$-vector space. Every result in this chapter specialises to a (usually easier) statement about vector spaces.
Abelian groups as $\mathbb{Z}$-modules. Every abelian group $(A, +)$ is a $\mathbb{Z}$-module via $n \cdot a = a + \cdots + a$ ($n$ times), extended to negative integers and zero in the obvious way. This action is forced: $1 \cdot a = a$ by axiom (iv), and the rest follows by distributivity. Conversely, every $\mathbb{Z}$-module is an abelian group. So $\mathbb{Z}$-modules and abelian groups are the same thing.
Ideals and quotients. Any ideal $I \trianglelefteq R$ is an $R$-module under the ring multiplication. The quotient ring $R/I$ is also an $R$-module via $r \cdot (a + I) = ra + I$.
$R^n$. For any ring $R$ and $n \geq 1$, the direct product $R^n = R \times \cdots \times R$ is an $R$-module via $r \cdot (r_1, \ldots, r_n) = (rr_1, \ldots, rr_n)$. This is the module-theoretic analogue of $\mathbb{F}^n$.
$\mathbb{F}[X]$-modules. Let $\mathbb{F}$ be a field, $V$ an $\mathbb{F}$-vector space, and $\alpha : V \to V$ a linear map. Then $V$ becomes an $\mathbb{F}[X]$-module via $f \cdot v = f(\alpha)(v)$. Different choices of $\alpha$ give different $\mathbb{F}[X]$-module structures on the same abelian group $V$. This example is the gateway to normal forms for matrices.
[/example]
[definition: Submodule]
Let $M$ be an $R$-module. A subset $N \subseteq M$ is an $R$-submodule, written $N \leq M$, if $N$ is a subgroup of $(M, +)$ and $rn \in N$ for all $r \in R$, $n \in N$.
[/definition]
[definition: Quotient Module]
If $N \leq M$ is an $R$-submodule, the quotient module $M/N$ is the set of additive cosets $\{m + N : m \in M\}$ with the $R$-action $r \cdot (m + N) = rm + N$.
[/definition]
Modules differ from groups in a notable way: in groups, we distinguished subgroups from normal subgroups, and only the latter allowed quotienting. In modules, every submodule is automatically "normal" as an abelian group (since $M$ is abelian), so we can always form the quotient. This uniformity makes the quotient theory of modules cleaner than that of groups.
[definition: Annihilator]
Let $M$ be an $R$-module and $S \subseteq M$ a subset. The annihilator of $S$ is
\begin{align*}
\operatorname{Ann}(S) = \{r \in R : r \cdot m = 0 \text{ for all } m \in S\}.
\end{align*}
This is always an ideal of $R$. For a single element $m \in M$, $\operatorname{Ann}(m)$ is the ideal of scalars that kill $m$.
[/definition]
[definition: Torsion]
An element $m \in M$ is a torsion element if $\operatorname{Ann}(m) \neq 0$, i.e. if there exists a non-zero $r \in R$ with $rm = 0$. The module $M$ is a torsion module if every element is torsion, and torsion-free if the only torsion element is $0$.
[/definition]
In a $\mathbb{Z}$-module (abelian group), torsion elements are precisely the elements of finite order. In an $\mathbb{F}$-vector space ($\mathbb{F}$ a field), there are no torsion elements other than $0$, since $\mathbb{F}$ has no zero divisors and only $0$ is annihilated by a non-zero scalar. Torsion and free parts are the two ingredients in the structure theorem.
[definition: Finitely Generated Module]
An $R$-module $M$ is finitely generated if there exist $m_1, \ldots, m_k \in M$ such that
\begin{align*}
M = Rm_1 + \cdots + Rm_k = \{r_1 m_1 + \cdots + r_k m_k : r_i \in R\}.
\end{align*}
Equivalently, $M$ is finitely generated iff there is a surjective $R$-module homomorphism $R^k \twoheadrightarrow M$ for some $k$.
[/definition]
The equivalence with surjections from $R^k$ is useful: it means every finitely generated module is a quotient of a free module $R^k$. The kernel of that surjection is itself a submodule of $R^k$, and understanding the kernel — via the Smith normal form of its generator matrix — is exactly what the structure theorem does.
Homomorphisms and the Isomorphism Theorems for Modules
[definition: Module Homomorphism]
Let $M$ and $N$ be $R$-modules. A function $f : M \to N$ is an $R$-module homomorphism if $f(m_1 + m_2) = f(m_1) + f(m_2)$ and $f(rm) = rf(m)$ for all $r \in R$, $m, m_1, m_2 \in M$. A bijective homomorphism is an isomorphism.
[/definition]
The kernel $\ker f = \{m \in M : f(m) = 0\}$ is a submodule of $M$, and the image $\operatorname{im} f$ is a submodule of $N$. The three isomorphism theorems hold for modules with the same proofs as for groups, since both rely only on the underlying abelian group structure supplemented by the scalar action.
[quotetheorem:862]
The First Isomorphism Theorem for Modules is the foundation for identifying modules via surjective homomorphisms. To show $M \cong N$, exhibit a surjective $R$-module homomorphism $\varphi : M \to N$ and identify its kernel. As with groups, the key work is always in computing $\ker \varphi$ and verifying surjectivity; the isomorphism itself is then automatic. This theorem is what converts the Smith normal form computation (which identifies the kernel of a surjection $R^m \to M$) into the structure theorem decomposition.
[quoteproof:862]
[example: The Cyclic Module]
For any $m \in M$, the map $\varphi : R \to M$ defined by $\varphi(r) = rm$ is an $R$-module homomorphism with image $Rm = \{rm : r \in R\}$ (the submodule generated by $m$) and kernel $\operatorname{Ann}(m)$. By the First Isomorphism Theorem for Modules:
\begin{align*}
Rm \cong R/\operatorname{Ann}(m).
\end{align*}
This is the fundamental example of a cyclic module. When $R = \mathbb{Z}$ and $M = \mathbb{Z}/n\mathbb{Z}$, the element $m = 1$ has $\operatorname{Ann}(m) = n\mathbb{Z}$, and $\mathbb{Z} \cdot 1 = \mathbb{Z}/n\mathbb{Z}$ is the whole module. When $R = \mathbb{F}[X]$ and $M = V_\alpha$ is a cyclic $\mathbb{F}[X]$-module, $\operatorname{Ann}(v)$ is the ideal generated by the minimal polynomial of $v$ with respect to $\alpha$.
[/example]
Free Modules and Linear Independence
The nicest modules are those with a basis — a linearly independent generating set. In vector spaces, every generating set contains a basis and every basis has the same size. Neither statement holds in general for modules over rings, which is one of the main differences between module theory and linear algebra.
[definition: Linear Independence]
Elements $m_1, \ldots, m_k \in M$ are linearly independent (over $R$) if $\sum_{i=1}^k r_i m_i = 0$ with $r_i \in R$ implies $r_1 = \cdots = r_k = 0$.
[/definition]
[definition: Free Module and Basis]
An $R$-module $M$ is free if it has a basis: a subset $S \subseteq M$ that generates $M$ and is linearly independent. If $S = \{m_1, \ldots, m_n\}$ is finite, then $M \cong R^n$.
[/definition]
Free modules over a ring behave like vector spaces: any function from a basis to another module extends uniquely to a homomorphism. However, unlike vector spaces, not every module is free. The $\mathbb{Z}$-module $\mathbb{Z}/2\mathbb{Z}$ is not free: any supposed basis element $m$ satisfies $2m = 0$, so the set $\{m\}$ is not linearly independent over $\mathbb{Z}$ (since $2 \neq 0$ in $\mathbb{Z}$ but $2 \cdot m = 0$). More subtly, even for free modules, the rank (size of a basis) need not be well-defined without additional hypotheses. For modules over a non-zero commutative ring, rank is well-defined: if $R^m \cong R^n$ then $m = n$ (proved by passing to $R^m / \mathfrak{m} R^m \cong (R/\mathfrak{m})^m$ for a maximal ideal $\mathfrak{m}$, which is a vector space). This is the invariance of rank.
[example: Free and Non-Free Modules]
The module $R^n$ is free of rank $n$ for any ring $R$, with the standard basis $\{e_1, \ldots, e_n\}$.
The ideal $(2, X) \trianglelefteq \mathbb{Z}[X]$ is a submodule of $\mathbb{Z}[X]$ (which is free of rank $1$) but is not free of rank $1$: it cannot be generated by a single element, as shown in Chapter 2. It is generated by $2$ and $X$, but these are not independent: $X \cdot 2 = 2 \cdot X$ in $\mathbb{Z}[X]$, so the generators satisfy a relation. This example shows that submodules of free modules need not be free — unless the ring is a PID (where they always are, as a consequence of the structure theorem).
[/example]
Smith Normal Form
The Smith normal form is a normal form for matrices over a Euclidean domain, analogous to the row-echelon form over a field but more refined. Over a field, any matrix can be reduced to a block of $1$s followed by $0$s. Over a Euclidean domain, the best we can do is a diagonal matrix with a divisibility condition. This turns out to be exactly what we need to classify finitely generated modules.
[definition: Elementary Row and Column Operations]
Over a ring $R$, the elementary row operations on a matrix $A$ are: (i) adding $c \in R$ times one row to another, (ii) swapping two rows, (iii) multiplying a row by a unit of $R$. Elementary column operations are defined analogously. Two matrices are equivalent if one can be obtained from the other by a sequence of elementary row and column operations; equivalently, $B = PAQ$ for some invertible matrices $P, Q$.
[/definition]
[definition: Fitting Ideals]
For an $m \times n$ matrix $A$ over $R$, the $k$th Fitting ideal $\mathrm{Fit}_k(A) \trianglelefteq R$ is the ideal generated by all $k \times k$ minors of $A$. Equivalent matrices have the same Fitting ideals.
[/definition]
The Fitting ideals are the key invariants: they are preserved by row and column operations, so they are genuinely attached to the equivalence class of $A$. For the Smith normal form $D = \mathrm{diag}(d_1, \ldots, d_r, 0, \ldots, 0)$, one computes $\mathrm{Fit}_k(D) = (d_1 d_2 \cdots d_k)$, which shows the invariant factors $d_k$ are uniquely determined (as the ratio of consecutive Fitting ideal generators) and gives the uniqueness part of the Smith normal form theorem.
[quotetheorem:861]
The Smith Normal Form Theorem is the engine behind the entire classification theory of this chapter. The algorithm is clean: bring the smallest-$\varphi$-value entry to the top-left corner, use the division algorithm to clear the rest of the first row and column, then handle off-diagonal entries in the remaining block by the same method. The divisibility condition $d_1 \mid d_2 \mid \cdots \mid d_r$ emerges automatically from the algorithm, and uniqueness follows from the Fitting ideal computation.
[quoteproof:861]
[example: Computing a Smith Normal Form over $\mathbb{Z}$]
We reduce the matrix
\begin{align*}
A = \begin{pmatrix} 3 & 7 & 4 \\ 1 & -1 & 2 \\ 3 & 5 & 1 \end{pmatrix}
\end{align*}
to Smith normal form. First bring the $1$ in position $(2,1)$ to position $(1,1)$ by swapping rows $1$ and $2$:
\begin{align*}
\begin{pmatrix} 1 & -1 & 2 \\ 3 & 7 & 4 \\ 3 & 5 & 1 \end{pmatrix}.
\end{align*}
Clear the first row by subtracting multiples of column $1$ from columns $2$ and $3$:
\begin{align*}
\begin{pmatrix} 1 & 0 & 0 \\ 3 & 10 & -2 \\ 3 & 8 & -5 \end{pmatrix}.
\end{align*}
Clear the first column similarly:
\begin{align*}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 10 & -2 \\ 0 & 8 & -5 \end{pmatrix}.
\end{align*}
Now work on the $2 \times 2$ block. The entry $-2$ is not divisible by $10$, so use the division algorithm: $10 = (-5)(-2) + 0$, so subtract $-5$ times column $3$ from column $2$ (or note $\gcd(10, -2) = 2$). Instead, swap columns $2$ and $3$ and negate to bring $2$ to position $(2,2)$:
\begin{align*}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 10 \\ 0 & 5 & 8 \end{pmatrix}.
\end{align*}
Now $10 = 5 \cdot 2 + 0$ and $8 = 4 \cdot 2 + 0$, so column operations clear the second row, and row operations clear the second column:
\begin{align*}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -17 \end{pmatrix} \xrightarrow{\times(-1)} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 17 \end{pmatrix}.
\end{align*}
We verify the Fitting ideals: $\mathrm{Fit}_1(A) = (1)$ (the entry $1$ generates $\mathbb{Z}$), $\mathrm{Fit}_2(A) = (d_1 d_2) = (2)$ (the $2\times 2$ minor from the first two rows and columns of $A$ equals $\detMATHENV4al3yvP15END = -10$, and others; $\gcd = 2$), and $\mathrm{Fit}_3(A) = (\det A) = (34)$. So $d_1 = 1$, $d_2 = 2$, $d_3 = 17 = 34/2$. Indeed $1 \mid 2 \mid 17$.
[/example]
The Structure Theorem
With the Smith normal form established, the classification of finitely generated modules over a Euclidean domain is a single step: write a module as the cokernel of a presentation matrix, put that matrix in Smith normal form, and read off the decomposition.
[quotetheorem:857]
The Structure Theorem for Finitely Generated Modules over Euclidean Domains is the culmination of everything in this chapter. It says: once you know the ring is Euclidean, every finitely generated module is completely determined, up to isomorphism, by a finite sequence of invariant factors. The free part $R^s$ captures the torsion-free part of $M$; the summands $R/(d_i)$ capture the torsion. The two parts are cleanly separated because $R$ is an integral domain: a module is torsion-free iff it has no cyclic summands $R/(d)$ with $d \neq 0$.
The proof strategy is elegant in its economy. Since $M$ is finitely generated, there is a surjection $\varphi : R^m \to M$. The kernel $\ker\varphi$ is a submodule of $R^m$, hence finitely generated (by at most $m$ elements, since $R$ is a PID). Arrange the generators of $\ker\varphi$ as columns of an $m \times n$ matrix $A$. The Smith normal form theorem turns $A$ into a diagonal matrix via row and column operations. Row operations correspond to change of basis in $R^m$; column operations correspond to change of generators for $\ker\varphi$. Reading off the diagonal entries gives the claimed decomposition.
[quoteproof:857]
[example: Classifying an Abelian Group from Generators and Relations]
Let $A$ be the abelian group generated by $a, b, c$ with relations
\begin{align*}
2a + 3b + c = 0, \qquad a + 2b = 0, \qquad 5a + 6b + 7c = 0.
\end{align*}
As a $\mathbb{Z}$-module, $A = \mathbb{Z}^3 / N$ where $N$ is the submodule generated by the rows of the relation matrix (or equivalently, the cokernel of the matrix of relations). The presentation matrix, written with the relations as columns, is:
\begin{align*}
A_{\text{pres}} = \begin{pmatrix} 2 & 1 & 5 \\ 3 & 2 & 6 \\ 1 & 0 & 7 \end{pmatrix}.
\end{align*}
We compute Fitting ideals to find the Smith normal form. Since $(A_{\text{pres}})_{31} = 1$, we have $\mathrm{Fit}_1(A_{\text{pres}}) = (1)$, so $d_1 = 1$. The $2 \times 2$ minor from rows $1,2$ and columns $1,2$ is $\detMATHENV4al3yvP18END = 1$, so $\mathrm{Fit}_2 = (1)$ and $d_2 = 1$. Finally $\det(A_{\text{pres}}) = 2(14-0) - 1(21-6) + 5(0-2) = 28 - 15 - 10 = 3$, so $\mathrm{Fit}_3 = (3)$ and $d_3 = 3$.
The Smith normal form is $\mathrm{diag}(1, 1, 3)$. Therefore:
\begin{align*}
A \cong \frac{\mathbb{Z}}{(1)} \oplus \frac{\mathbb{Z}}{(1)} \oplus \frac{\mathbb{Z}}{(3)} \cong \{0\} \oplus \{0\} \oplus C_3 \cong C_3.
\end{align*}
The group is cyclic of order $3$. The two summands $\mathbb{Z}/(1) = 0$ vanish because $d_1 = d_2 = 1$ are units.
[/example]
[example: Classification of Finitely Generated Abelian Groups, Revisited]
As a special case of the structure theorem with $R = \mathbb{Z}$: every finitely generated abelian group is isomorphic to
\begin{align*}
C_{d_1} \times C_{d_2} \times \cdots \times C_{d_r} \times \mathbb{Z}^s,
\end{align*}
with $d_1 \mid d_2 \mid \cdots \mid d_r$ and $s \geq 0$. This is the Classification of Finite Abelian Groups stated in Chapter 1, now fully proved. The invariant factors $d_i$ and the rank $s$ are uniquely determined by the group — they are computed from the Fitting ideals of any presentation matrix.
For example, all abelian groups of order $360 = 2^3 \cdot 3^2 \cdot 5$ (with no free part, since the group is finite) are enumerated by sequences $d_1 \mid d_2 \mid \cdots \mid d_r$ with $\prod d_i = 360$. These are: $C_{360}$; $C_2 \times C_{180}$ (since $2 \mid 180$); $C_6 \times C_{60}$ (since $6 \mid 60$); $C_2 \times C_2 \times C_{90}$ (since $2 \mid 2 \mid 90$); $C_6 \times C_6 \times C_{10}$ (since $6 \mid 6 \mid 10$); and $C_2 \times C_6 \times C_{30}$ (since $2 \mid 6 \mid 30$). So there are six non-isomorphic abelian groups of order $360$.
[/example]
Normal Forms for Matrices
The most striking application of the structure theorem is to linear algebra: it gives a complete classification of linear maps $\alpha : V \to V$ up to conjugacy (i.e. up to change of basis), producing the rational canonical form and the Jordan normal form as two ways of presenting the same classification.
Setting Up the $\mathbb{F}[X]$-Module
Let $\mathbb{F}$ be a field and $V$ a finite-dimensional $\mathbb{F}$-vector space of dimension $n$, and let $\alpha : V \to V$ be a linear map. Turn $V$ into an $\mathbb{F}[X]$-module $V_\alpha$ by defining the action of the polynomial $f(X) = a_0 + a_1 X + \cdots + a_k X^k$ as
\begin{align*}
f \cdot v = f(\alpha)(v) = a_0 v + a_1 \alpha(v) + a_2 \alpha^2(v) + \cdots + a_k \alpha^k(v).
\end{align*}
Since $\mathbb{F} \subseteq \mathbb{F}[X]$, any $\mathbb{F}$-basis of $V$ generates $V_\alpha$ as an $\mathbb{F}[X]$-module, so $V_\alpha$ is finitely generated. Since $V$ is finite-dimensional over $\mathbb{F}$, the module $V_\alpha$ has no free $\mathbb{F}[X]$-summand (a free summand $\mathbb{F}[X]$ is infinite-dimensional over $\mathbb{F}$). By the Cayley–Hamilton theorem, the characteristic polynomial $\chi_\alpha$ annihilates $V$, so $\operatorname{Ann}(V_\alpha) \neq 0$.
The crucial observation is that an $\mathbb{F}$-linear change of basis $\alpha \mapsto P^{-1}\alpha P$ changes the $\mathbb{F}[X]$-module structure of $V$ to an isomorphic one (with the same underlying set $V$ but a new action defined by the new $\alpha$). Conversely, two isomorphic $\mathbb{F}[X]$-module structures on $V$ correspond to conjugate linear maps. So classifying linear maps on $V$ up to conjugacy is the same as classifying $\mathbb{F}[X]$-module structures on $V$ up to isomorphism.
Rational Canonical Form
Applying the structure theorem to $V_\alpha$ (with $R = \mathbb{F}[X]$, which is Euclidean) gives:
[quotetheorem:863]
The Rational Canonical Form is the direct output of the structure theorem for $\mathbb{F}[X]$-modules. Each cyclic summand $\mathbb{F}[X]/(f_i)$ has a preferred basis $\{1, X, X^2, \ldots, X^{\deg f_i - 1}\}$ modulo $(f_i)$, in which the action of $\alpha$ (multiplication by $X$) is represented by the companion matrix $c(f_i)$. The divisibility $f_1 \mid f_2 \mid \cdots \mid f_s$ is the divisibility of the corresponding invariant factors of the presentation matrix of $V_\alpha$.
Three important read-offs from the rational canonical form: the minimal polynomial of $\alpha$ is $f_s$ (the largest invariant factor, which annihilates every summand since $f_i \mid f_s$, and is minimal since $f_s$ is the annihilator of the last summand); the characteristic polynomial is $f_1 f_2 \cdots f_s$ (the product of all invariant factors); and the form is genuinely canonical — the invariant factors are uniquely determined, unlike the Jordan form which is canonical only up to block ordering.
[quoteproof:863]
[example: Computing the Rational Canonical Form]
Let $\alpha : \mathbb{Q}^3 \to \mathbb{Q}^3$ be the linear map with matrix
\begin{align*}
A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\chi_A = \det(XI - A) = X^3 - X^2 + X - 1 = (X-1)(X^2+1)$.
To find the invariant factors, compute the Smith normal form of $XI - A \in \mathbb{Q}[X]^{3\times 3}$:
\begin{align*}
XI - A = \begin{pmatrix} X & 0 & -1 \\ -1 & X & 1 \\ 0 & -1 & X-1 \end{pmatrix}.
\end{align*}
The $\gcd$ of all entries (the generator of $\mathrm{Fit}_1$) is $1$, so $d_1 = 1$. The $\gcd$ of all $2 \times 2$ minors (the generator of $\m
athrm{Fit}_2$, divided by $d_1 = 1$) is $1$, so $d_2 = 1$. The generator of $\mathrm{Fit}_3$ is $\det(XI - A) = (X-1)(X^2+1)$, so $d_3 = (X-1)(X^2+1)$. Thus $V_\alpha \cong \mathbb{Q}[X]/(1) \oplus \mathbb{Q}[X]/(1) \oplus \mathbb{Q}[X]/((X-1)(X^2+1))$, which simplifies to $\mathbb{Q}[X]/((X-1)(X^2+1))$. The single invariant factor $f_1 = (X-1)(X^2+1) = X^3 - X^2 + X - 1$ gives one $3 \times 3$ companion block: MATHENVcnclqeP0END which is just $A$ itself — a happy coincidence showing $A$ is al
r
eady in rational canonical form.
[/example]
Jordan Normal Form
Over $\mathbb{C}$, every polynomial factors into linear factors. This means the invariant factors $f_i$ of $V_\alpha$ factor completely into factors $(X - \lambda)^k$, and the Chinese remainder theorem for mo
d
ules ($R/(ab) \co
n
g R/(a) \oplus R/(b)$ when $\gcd(a,b) = 1$) further dec
omposes each summand $\mathbb{C}[X]/(f_i)$ into primary pieces $\mathbb{C}[X]/((X-\lambda)^k)$.
[quotetheorem:864]
The Jordan Normal Form is the prime decomposition version of the rational canonical form, available over algebraically closed fields. Each piece $\mathbb{C}[X]/((X-\lambda)^k)$ has basis $\{1, (X-\lambda), \ldots, (X-\lambda)^{k-1}\}$ modulo $((X-\lambda)^k)$, in which the action of $X$ (i.e. of $\alpha$) is: $(X-\lambda)^j \mapsto (X-\lambda)^{j+1}$ for $j < k-1$, and the identity $(X-\lambda)^{k-1} \mapsto 0$ (the term $(X-\lambda)^k$ vanishes). This means $\alpha$ acts as $\lambda \cdot \mathrm{id}$ plus a nilpotent shift — exactly the Jordan block $J_k(\lambda)$.
The minimal polynomial of $\alpha$ reads off as $\prod_\lambda (X-\lambda)^{a_\lambda}$ where $a_\lambda$ is the size of the largest $\lambda$-block; the characteristic polynomial is $\prod_\lambda (X-\lambda)^{b_\lambda}$ where $b_\lambda$ is the sum of all $\lambda$-block sizes.
[quoteproof:864]
[example: Jordan Normal Form — A Complete Computation]
Let $\alpha : \mathbb{C}^4 \to \mathbb{C}^4$ have characteristic polynomial $(X-2)^3(X+1)$ and minimal polynomial $(X-2)^2(X+1)$.
The minimal polynomial tells us: the largest Jordan $2$-block has size $2$, and the $(-1)$-block has size $1$. The characteristic polynomial tells us: the $2$-eigenspace contributes blocks totalling size $3$, and the $(-1)$-eigenspace contributes blocks totalling size $1$.
For eigenvalue $\lambda = 2$, total block size $3$, largest block size $2$: the only possibility is one $2$-block and one $1$-block (sizes $2, 1$, sum $= 3$, max $= 2$). For $\lambda = -1$: total size $1$, so one $1$-block.
The Jordan form is therefore:
\begin{align*}
J = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.
\end{align*}
(Blocks on the diagonal: $J_2(2)$, then $J_1(2)$, then $J_1(-1)$, with subdiagonal entries within each block.) The module decomposition is $V_\alpha \cong \mathbb{C}[X]/((X-2)^2) \oplus \mathbb{C}[X]/(X-2) \oplus \mathbb{C}[X]/(X+1)$.
[/example]
Cayley–Hamilton
Both normal forms give an immediate proof of the Cayley–Hamilton theorem, which in naive formulations ("a matrix satisfies its own characteristic polynomial") looks like it should be straightforward but is actually subtle to prove without the module machinery.
[quotetheorem:865]
The Cayley-Hamilton Theorem is an immediate corollary of the rational canonical form.
The characteristic
polynomial $\chi_\alpha = f_1 f_2 \cdots f_s$ divides $f_s^s$ (since $f_i \mid f_s$ for all $i$), but more directly: every summand $V_i \cong \mathbb{F}[X]/(f_i)$ is annihilated by $f_i$, and since $f_i \mid \chi_\alpha$, it is also annihilated by $\chi_\alpha$. Since $V$ is the direct sum of the $V_i$, the whole space is annihilated by $\chi_\alpha$. The theorem holds over any field and even over any commutative ring (with the appropriate generalization of characteristic polynomial via the adjugate matrix), but the field case from the rational ca
nonical for
m is the clearest.
[quoteproof:865]
[example: Cayley-Hamilton in Practice]
Let $A = MATHENVv5jbuwP0END$
over $\mat
hbb{Q}$. The characteristic poly
n
omial is $\chi_A = (X-1)(X-3) = X^2 - 4X + 3$. Cayl
e
y-Hamilton asserts $A^2 - 4A + 3I = 0$.
Computing directly:
\begin{align*}
A^2 = \begin{pmatrix} 1 & 8 \\ 0 & 9 \end{pmatrix}, \qquad
4A = \begin{pmatrix} 4 & 8 \\ 0 & 12 \end{pmatrix}, \qquad
3I = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}.
\end{align*}
So $A^2 - 4A + 3I = MATHENVhokuadP1END = MATHENVhokuadP2END$. Confirmed.
The theorem has important practic
al
cons
e
quences: any polynomial in $A$ of degree $\geq n$ can be reduced to one of degree $< n$ using the relation $\chi_A(A) = 0$. In particular, $A^{-1}$ (when it exists) can be expressed as a polynomial in $A$ of degree $< n$ with coefficients in $\mathbb{Q}$, readable from the adjugate formula: $A^{-1} = \frac{1}{\det A}((\mathrm{tr}\, A)I - A) = \frac{1}{3}(4I - A) = MATHENVdna6umP0END$.
[/exa
mp
le]
Attribution Debug Info:
Total segments: 115
Attributed segments: 60
Non-attributed segments: 55