Cambridge IB Linear Algebra

Introduction

Linear algebra is the study of vector spaces and the linear maps between them. It is, in a sense, the simplest and most complete branch of algebra: the objects are tame enough to admit a full classification (every finite-dimensional vector space is determined by a single number, its dimension), yet rich enough to underpin vast swathes of mathematics, physics, and computation.

The subject has a dual character. On one hand, it is a concrete toolkit: systems of linear equations, matrix arithmetic, Gaussian elimination, eigenvalue computations. These tools are indispensable in applications ranging from data science and engineering to quantum mechanics and economics. On the other hand, linear algebra is an abstract structural theory: vector spaces, linear maps, duality, and spectral decomposition form a language that organises and clarifies ideas across all of mathematics. The interplay between the concrete and the abstract — between matrices and the maps they represent — is the central theme of this course.

The development begins with the axioms of a vector space and the fundamental notions of linear independence, span, and dimension. We then study linear maps, their kernels and images, and the rank-nullity theorem — the first major structural result of the theory. Next, the dual space provides a second perspective on every vector space, revealing a hidden symmetry that connects subspaces to systems of linear equations. Bilinear forms extend linearity to functions of two variables, leading naturally to determinants (which detect invertibility and measure volume distortion) and to the classification of symmetric and Hermitian forms via Sylvester's law of inertia. The theory of endomorphisms — linear maps from a space to itself — culminates in the Jordan normal form, a complete classification up to similarity over algebraically closed fields. Finally, inner product spaces specialise the theory to the positive-definite case, where algebra meets geometry: orthogonality, projection, and the spectral theorem for self-adjoint and unitary maps.

Throughout, we work over an arbitrary field $\mathbb{F}$ unless a specific choice (typically $\mathbb{R}$ or $\mathbb{C}$) is required. The treatment is rigorous and proof-driven, with each result motivated by the questions it answers and the tools it provides for later developments.

Vector Spaces

When studying systems of linear equations, a striking pattern emerges: the set of all solutions to a homogeneous system is closed under addition and scalar multiplication. If $u$ and $v$ both solve $Ax = \mathbf{0}$, then so does $\lambda u + \mu v$ for any scalars $\lambda, \mu$. This algebraic closure is not special to solution sets of matrices — it appears equally in the space of all continuous functions on an interval, the space of polynomials of bounded degree, and the space of solutions to a linear differential equation. The goal of this chapter is to isolate the abstract structure shared by all these examples, develop the language of linear independence and bases, and prove that every finite-dimensional vector space is completely determined (up to isomorphism) by a single number: its dimension.

Throughout, $\mathbb{F}$ denotes an arbitrary field (such as $\mathbb{R}$, $\mathbb{C}$, or a finite field $\mathbb{F}_p$). All vector spaces are defined over this fixed field unless otherwise specified.

Core Definitions

The Vector Space Axioms

The examples above — solution sets, function spaces, polynomial spaces — all share the same algebraic operations: we can add elements and multiply by scalars, and these operations obey familiar rules (commutativity, associativity, distributivity). Rather than verify these properties separately in each context, we axiomatise them once and develop the theory in full generality. This is the power of abstraction: any result proved for an abstract vector space applies simultaneously to every concrete instance.

[definition:Vector Space]
Let $\mathbb{F}$ be a field. An $\mathbb{F}$-vector space is an abelian group $(V, +)$ equipped with a scalar multiplication map
\begin{align*} \mathbb{F} \times V &\to V \\ (\lambda, v) &\mapsto \lambda v, \end{align*}
satisfying the following axioms for all $\lambda, \mu \in \mathbb{F}$ and all $u, v \in V$:

  1. $\lambda(\mu v) = (\lambda \mu) v$ (associativity of scalar multiplication)
  2. $\lambda(u + v) = \lambda u + \lambda v$ (distributivity over vector addition)
  3. $(\lambda + \mu)v = \lambda v + \mu v$ (distributivity over field addition)
  4. $1_{\mathbb{F}} v = v$ (identity element of scalar multiplication)
[/definition]

Two points deserve emphasis. First, the definition does not require $V$ to be finite-dimensional or to carry any notion of distance, angle, or convergence — those are additional structures (norms, inner products) that we will introduce much later. Second, the requirement that $(V, +)$ is an abelian group already encodes the existence of a zero vector $\mathbf{0}$ and additive inverses $-v$ for every $v \in V$; there is no need to include these as separate axioms.

Having stated the axioms, a natural question is whether familiar arithmetic properties — such as $0 \cdot v = \mathbf{0}$ — follow from them or need to be assumed separately. The following result confirms that they are consequences, not additional assumptions.

[quotetheorem:369]

This result is more than a sanity check. It confirms that the vector space axioms are self-consistent and sufficiently strong: the familiar arithmetic of vectors is not an extra assumption but a logical consequence. Part (i) is used implicitly throughout linear algebra whenever we need to recognise a zero vector from a scalar multiplication. Part (ii) ensures that the group-theoretic additive inverse and the scalar-multiplication inverse agree, so notation like $u - v$ is unambiguous.

[citeproof:369]

Subspaces

Given a vector space $V$, we often want to work with "smaller" vector spaces sitting inside $V$. For instance, the set of solutions to a single linear equation $a_1 x_1 + \cdots + a_n x_n = 0$ is a subset of $\mathbb{F}^n$ that is itself closed under the vector space operations. Rather than checking every vector space axiom from scratch each time, the following definition provides a more efficient criterion: we need only check closure under addition and scalar multiplication, plus the presence of the zero vector.

[definition:Subspace]
Let $V$ be an $\mathbb{F}$-vector space. A subset $U \subseteq V$ is a subspace of $V$ if:

  1. $\mathbf{0} \in U$
  2. $u, v \in U$ implies $u + v \in U$ (closure under addition)
  3. $u \in U$, $\lambda \in \mathbb{F}$ implies $\lambda u \in U$ (closure under scalar multiplication)
[/definition]

Equivalently, a non-empty subset $U$ is a subspace if and only if $\lambda u + \mu v \in U$ for all $\lambda, \mu \in \mathbb{F}$ and $u, v \in U$. The condition $\mathbf{0} \in U$ is not redundant if we do not assume $U$ is non-empty, since the empty set trivially satisfies conditions (2) and (3) but is not a vector space (it has no zero vector).

To see these definitions in action with a less trivial example than a solution set of a matrix equation, consider function spaces:

[example:Functions With Finite Support]
Let $X$ be a set and consider the $\mathbb{F}$-vector space $\mathbb{F}^X$ of all functions $f: X \to \mathbb{F}$, with pointwise addition and scalar multiplication. The support of $f \in \mathbb{F}^X$ is defined as $\mathrm{supp}(f) = \{x \in X : f(x) \neq 0\}$. We claim that the subset

\begin{align*} \mathbb{F}^{(X)} = \{f \in \mathbb{F}^X : |\mathrm{supp}(f)| < \infty\} \end{align*}

of functions with finite support is a subspace of $\mathbb{F}^X$.

Verification of (1): The zero function satisfies $\mathrm{supp}(0) = \emptyset$, which is finite, so $0 \in \mathbb{F}^{(X)}$.

Verification of (2): If $f, g \in \mathbb{F}^{(X)}$, then $\mathrm{supp}(f + g) \subseteq \mathrm{supp}(f) \cup \mathrm{supp}(g)$. Since both supports are finite, their union is finite, so $f + g \in \mathbb{F}^{(X)}$.

Verification of (3): If $f \in \mathbb{F}^{(X)}$ and $\lambda \in \mathbb{F}$, then $\mathrm{supp}(\lambda f) \subseteq \mathrm{supp}(f)$ (with equality when $\lambda \neq 0$). Since $\mathrm{supp}(f)$ is finite, $\lambda f \in \mathbb{F}^{(X)}$.

When $X$ is infinite, this is a proper subspace of $\mathbb{F}^X$ — for example, the function $f(x) = 1$ for all $x \in X$ has $\mathrm{supp}(f) = X$, which is infinite. This space $\mathbb{F}^{(X)}$ plays an important role in the theory of free vector spaces and will reappear when we discuss bases of infinite-dimensional spaces.
[/example]
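The three verifications above can be spot-checked on a small instance. The following is a minimal sketch, assuming a finitely supported function is stored as a dictionary from points of $X$ to nonzero values in $\mathbb{Q}$; the representation and the helper names `normalise`, `add`, `scale` and `supp` are illustrative choices, not part of the course text.

```python
from fractions import Fraction

# Assumed representation: a finitely supported function X -> Q is a dict
# mapping each point of its support to a nonzero value; absent keys mean 0.

def normalise(f):
    """Drop zero entries so that the dict's keys are exactly supp(f)."""
    return {x: c for x, c in f.items() if c != 0}

def add(f, g):
    """Pointwise sum of two finitely supported functions."""
    keys = set(f) | set(g)
    return normalise({x: f.get(x, 0) + g.get(x, 0) for x in keys})

def scale(lam, f):
    """Pointwise scalar multiple lam * f."""
    return normalise({x: lam * c for x, c in f.items()})

def supp(f):
    """The support of f, as a set."""
    return set(normalise(f))

f = {"a": Fraction(1), "b": Fraction(2)}
g = {"b": Fraction(-2), "c": Fraction(5)}

h = add(f, g)                                   # the values at "b" cancel
assert supp(h) <= supp(f) | supp(g)             # supp(f + g) lies in the union
assert supp(h) == {"a", "c"}                    # the containment can be strict
assert supp(scale(Fraction(3), f)) == supp(f)   # equality when lam != 0
assert supp(scale(Fraction(0), f)) == set()     # lam = 0 gives the zero function
```

The cancellation at `"b"` shows why the argument only claims containment $\mathrm{supp}(f + g) \subseteq \mathrm{supp}(f) \cup \mathrm{supp}(g)$ rather than equality.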

Constructing New Subspaces

Sums and Intersections

Having established the basic objects — vector spaces and subspaces — we now need tools for building new subspaces from existing ones. Two natural set-theoretic operations present themselves: intersection and union. The intersection $U \cap W$ of two subspaces is always a subspace (closure under the operations is inherited from both $U$ and $W$). The union $U \cup W$, however, is almost never a subspace — for instance, in $\mathbb{R}^2$, the $x$-axis and $y$-axis are both subspaces, but their union is not closed under addition ($(1,0) + (0,1) = (1,1)$ lies in neither axis). The correct replacement for the union is the sum $U + W$, which is the smallest subspace containing both $U$ and $W$.

[definition:Sum Of Subspaces]
Let $V$ be an $\mathbb{F}$-vector space and let $U, W \subseteq V$ be subspaces. The sum of $U$ and $W$ is

\begin{align*} U + W = \{u + w : u \in U,\, w \in W\}. \end{align*}
[/definition]

The claim that these constructions actually produce subspaces — and that the sum is minimal among subspaces containing $U$ and $W$ — requires proof.

[quotetheorem:370]

The minimality statement in part (ii) is what makes the sum the "right" algebraic replacement for the union: $U + W$ is the smallest subspace containing $U \cup W$. This is analogous to the fact that, in an abelian group, the subgroup generated by two subgroups is the set of all sums of their elements. The intersection result in part (i) extends, by the same argument, to arbitrary collections of subspaces: if $\{U_\alpha\}_{\alpha \in A}$ is any family of subspaces of $V$, then $\bigcap_{\alpha \in A} U_\alpha$ is a subspace of $V$.

[citeproof:370]
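The contrast between union and sum can be checked exhaustively in a tiny case. The sketch below works in $\mathbb{F}_2^2$ (an assumed example, chosen because every subset can be enumerated) and tests the three subspace conditions directly; `is_subspace` and `subspace_sum` are hypothetical helper names.

```python
P = 2  # work over the field F_2; an illustrative choice, not from the text

def vec_add(u, v):
    """Componentwise addition modulo P."""
    return tuple((a + b) % P for a, b in zip(u, v))

def vec_scale(lam, v):
    """Scalar multiplication modulo P."""
    return tuple((lam * a) % P for a in v)

def is_subspace(S, n):
    """Check the three subspace conditions for a finite subset S of F_P^n."""
    zero = (0,) * n
    return (zero in S
            and all(vec_add(u, v) in S for u in S for v in S)
            and all(vec_scale(lam, v) in S for lam in range(P) for v in S))

def subspace_sum(U, W):
    """The set {u + w : u in U, w in W}."""
    return {vec_add(u, w) for u in U for w in W}

U = {(0, 0), (1, 0)}   # the "x-axis" in F_2^2
W = {(0, 0), (0, 1)}   # the "y-axis" in F_2^2

print(is_subspace(U & W, 2))               # True: the intersection is {0}
print(is_subspace(U | W, 2))               # False: (1,0) + (0,1) = (1,1) is missing
print(is_subspace(subspace_sum(U, W), 2))  # True: the sum is all of F_2^2
```

This mirrors the $\mathbb{R}^2$ example of the two axes: the union fails closure under addition, while the sum repairs exactly that failure.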

Quotient Spaces

Sometimes we want to "collapse" a subspace to a single point and study the resulting structure. For instance, if $U \subseteq V$ is a subspace, we might want to consider two vectors $v$ and $w$ as "equivalent" whenever their difference lies in $U$. This equivalence relation $v \sim w \iff v - w \in U$ partitions $V$ into cosets, and the key insight is that the vector space operations descend to these cosets in a well-defined way.

[definition:Quotient Space]
Let $V$ be an $\mathbb{F}$-vector space and let $U \subseteq V$ be a subspace. The quotient space $V/U$ is the set of cosets

\begin{align*} V/U = \{v + U : v \in V\}, \quad \text{where } v + U = \{v + u : u \in U\}, \end{align*}

with addition $(v + U) + (w + U) = (v + w) + U$ and scalar multiplication $\lambda(v + U) = \lambda v + U$.
[/definition]

Well-definedness of these operations must be checked: if $v + U = v' + U$ and $w + U = w' + U$, then $v - v' \in U$ and $w - w' \in U$, so $(v + w) - (v' + w') = (v - v') + (w - w') \in U$ (by closure of $U$ under addition), hence $(v + w) + U = (v' + w') + U$. Similarly, $\lambda v - \lambda v' = \lambda(v - v') \in U$ (by closure of $U$ under scalar multiplication), so $\lambda v + U = \lambda v' + U$. With these operations, $V/U$ is an $\mathbb{F}$-vector space with zero element $\mathbf{0} + U = U$.

The intuition behind the quotient space is geometric: if $V = \mathbb{R}^3$ and $U$ is a line through the origin, then $V/U$ is the set of all lines parallel to $U$. Each coset $v + U$ is one such parallel line, and the quotient space captures the "directions transverse to $U$."
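Over a finite field the cosets can be listed explicitly, which makes the well-definedness check concrete. The following sketch assumes $V = \mathbb{F}_2^3$ and a particular one-dimensional subspace $U$ (both illustrative choices); it enumerates the cosets and verifies by brute force that the coset of a sum does not depend on the chosen representatives.

```python
from itertools import product

def add(u, v):
    """Componentwise addition in F_2^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

V = list(product(range(2), repeat=3))   # all 8 vectors of F_2^3
U = {(0, 0, 0), (1, 1, 0)}              # a 1-dimensional subspace (assumed example)

def coset(v):
    """The coset v + U, as a frozenset so cosets can be compared and hashed."""
    return frozenset(add(v, u) for u in U)

cosets = {coset(v) for v in V}
print(len(cosets))   # 4: the 8 vectors split into 4 cosets of size 2

# Well-definedness of addition: replacing v and w by other representatives
# v2, w2 of the same cosets never changes the coset of the sum.
well_defined = all(
    coset(add(v, w)) == coset(add(v2, w2))
    for v in V for w in V
    for v2 in coset(v) for w2 in coset(w)
)
print(well_defined)  # True
```

Scalar multiplication needs no separate check here, since the only scalars in $\mathbb{F}_2$ are $0$ and $1$.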

Linear Independence and Bases

Span and Linear Independence

The central question in understanding a vector space is: what is the minimal amount of information needed to describe every vector in the space? In $\mathbb{R}^3$, the three standard basis vectors suffice — every vector is a unique linear combination of them. But which subsets of a general vector space have this property? To answer this, we need two concepts: spanning (can we reach every vector?) and linear independence (is each vector in the set contributing something that the others cannot?).

[definition:Span]
Let $V$ be an $\mathbb{F}$-vector space and let $S \subseteq V$. The span of $S$ is the set of all finite linear combinations of elements of $S$:

\begin{align*} \langle S \rangle = \left\{\sum_{i=1}^{n} \lambda_i s_i : n \in \mathbb{N},\, \lambda_i \in \mathbb{F},\, s_i \in S\right\}. \end{align*}

By convention, $\langle \emptyset \rangle = \{\mathbf{0}\}$ (the empty linear combination gives the zero vector).
[/definition]

The requirement that only finitely many terms appear in each linear combination is essential — without it, the definition would require a notion of convergence, which we do not have in a bare vector space. The span $\langle S \rangle$ is always a subspace of $V$ (in fact, it is the smallest subspace containing $S$), but it may equal $V$ even when $S$ is infinite, provided every vector can be reached by some finite combination.

Having captured the notion of "reaching every vector," we now need its counterpart: a condition ensuring that no element of the set is redundant.

[definition:Linear Independence]
Let $V$ be an $\mathbb{F}$-vector space. A subset $S \subseteq V$ is linearly independent if for every finite collection of distinct elements $s_1, \dots, s_n \in S$ and scalars $\lambda_1, \dots, \lambda_n \in \mathbb{F}$,

\begin{align*} \sum_{i=1}^n \lambda_i s_i = \mathbf{0} \implies \lambda_i = 0 \text{ for all } i \in \{1, \dots, n\}. \end{align*}

A subset that is not linearly independent is called linearly dependent.
[/definition]

The definition says: the only way to produce the zero vector from elements of $S$ is the trivial way (all coefficients zero). The following result gives a more concrete and often more useful way to think about dependence.

[quotetheorem:371]

This characterisation is the workhorse for detecting linear dependence in practice. It says that a set is dependent precisely when it contains a "redundant" element — one that lies in the span of the others. Conversely, a set is independent precisely when no element is a linear combination of the remaining ones. This perspective is essential for the Steinitz Exchange Lemma below, where we will systematically replace redundant spanning vectors with independent ones.

[citeproof:371]
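For vectors in $\mathbb{Q}^n$, dependence can be detected mechanically. The sketch below is one possible approach, not a method taken from the course: it runs Gaussian elimination on the vectors written as rows and counts pivots, using the standard fact that the vectors are independent exactly when no row is eliminated to zero. The function names `rank` and `is_independent` are hypothetical, and exact arithmetic is done with `fractions.Fraction`.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots after Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_independent(vectors):
    """Vectors are independent iff elimination leaves no zero row."""
    return rank(vectors) == len(vectors)

print(is_independent([(1, 2), (3, 4)]))                   # True
print(is_independent([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))  # False
```

In the second call the third vector is the sum of the first two, which is precisely the "redundant element" described by the characterisation above.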

Bases and Unique Representation

A subset that simultaneously spans the space and contains no redundant elements provides the most efficient possible description: every vector can be written as a linear combination in exactly one way. This is the concept of a basis.

[definition:Basis]
Let $V$ be an $\mathbb{F}$-vector space. A subset $B \subseteq V$ is a basis for $V$ if $B$ is linearly independent and $\langle B \rangle = V$.
[/definition]

The power of a basis lies in the fact that it provides a coordinate system for the space. The following result makes this precise: a basis is characterised by the existence of unique representations.

[quotetheorem:372]

This is the foundation for matrix representations of linear maps: once we fix a basis $\{e_1, \dots, e_n\}$ for $V$, every vector $v$ is uniquely determined by its coordinate vector $(\lambda_1, \dots, \lambda_n) \in \mathbb{F}^n$. The map $v \mapsto (\lambda_1, \dots, \lambda_n)$ is an isomorphism $V \cong \mathbb{F}^n$, reducing the study of abstract finite-dimensional vector spaces to the concrete study of column vectors. The uniqueness clause is what distinguishes a basis from a mere spanning set — a spanning set that is not a basis will have multiple representations for some vectors, making the coordinate map ill-defined.

[citeproof:372]

The Steinitz Exchange Lemma and Dimension

The Exchange Lemma

The fundamental question of the theory is now: do all bases of a given vector space have the same size? If they do, then "dimension" is well-defined; if not, the concept would be meaningless. The key to answering this is the Steinitz Exchange Lemma, which provides a precise relationship between linearly independent sets and spanning sets. The idea is that independent vectors can be "swapped in" for spanning vectors one at a time, without losing the spanning property — and this process must terminate before we run out of spanning vectors to replace.

[quotetheorem:373]

This result is the engine of finite-dimensional linear algebra. The inequality $n \leq m$ is the crucial part: it says that no linearly independent set can be larger than any spanning set. Applied with both sets being bases (which are simultaneously independent and spanning), it forces all bases to have the same size. The exchange mechanism in part (ii) is equally important in practice — it gives a constructive procedure for extending independent sets to bases and for extracting bases from spanning sets.

[citeproof:373]

Well-Definedness of Dimension

With the Steinitz Exchange Lemma in hand, the theory of dimension follows rapidly. The next result collects the fundamental consequences: all bases have the same size (so dimension is well-defined), and there are several equivalent ways to recognise a basis.

[quotetheorem:374]

Parts (i)–(iii) are the theoretical payoff: dimension is well-defined, and to verify that a set is a basis, it suffices to check either independence or spanning (not both), provided the set has the right number of elements. Parts (iv) and (v) are the practical payoff: they guarantee that bases can always be found, either by trimming a spanning set or by extending an independent set. These tools are used constantly — for instance, part (v) is the key step in proving the dimension formula for quotient spaces later in this chapter.

[citeproof:374]

With well-definedness established, we can now introduce dimension as a formal concept.

[definition:Dimension]
Let $V$ be an $\mathbb{F}$-vector space possessing a finite basis. The dimension of $V$, denoted $\dim_{\mathbb{F}} V$ (or simply $\dim V$ when the field is clear), is the number of elements in any basis for $V$.
[/definition]

The subscript $\mathbb{F}$ is sometimes essential: the complex numbers $\mathbb{C}$ form a $1$-dimensional vector space over $\mathbb{C}$ (with basis $\{1\}$), but a $2$-dimensional vector space over $\mathbb{R}$ (with basis $\{1, i\}$). The dimension depends on which field of scalars we use, not just on the underlying set.

Dimension Formulae

Dimension converts questions about subspaces into arithmetic. The following results express the dimensions of subspaces, sums, and quotients in terms of one another, providing the principal computational tools of the theory.

The first result shows that passing to a subspace cannot increase dimension, and that the dimension strictly decreases for proper subspaces. This has an important consequence: in a finite-dimensional space, a subspace of the same dimension must be the whole space, a fact used repeatedly when verifying that a linear map is surjective.

[quotetheorem:375]

The strict inequality for proper subspaces is perhaps the most frequently used dimension argument in linear algebra. Its contrapositive — if $U \subseteq V$ is a subspace with $\dim U = \dim V$, then $U = V$ — provides the standard technique for proving that a subspace is in fact the whole space: simply exhibit enough linearly independent vectors.

[citeproof:375]

The next result is the inclusion-exclusion principle for dimensions, analogous to the formula $|A \cup B| = |A| + |B| - |A \cap B|$ for finite sets. It controls how dimensions interact when we combine subspaces.

[quotetheorem:376]

The formula has an elegant geometric interpretation in $\mathbb{R}^3$: if $U$ and $W$ are two distinct planes through the origin, their sum is $\mathbb{R}^3$ (dimension $3$) and their intersection is a line (dimension $1$), so the formula reads $3 = 2 + 2 - 1$. If instead $U$ and $W$ are a plane and a line not contained in it, the intersection is $\{\mathbf{0}\}$ (dimension $0$), and the formula gives $3 = 2 + 1 - 0$. This connects directly to the theory of direct sums below: $V = U \oplus W$ precisely when $U + W = V$ and $\dim(U \cap W) = 0$.

[citeproof:376]
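The "two planes" computation can be replayed over a finite field, where dimensions can be read off by counting. This is a sketch under stated assumptions: it works in $\mathbb{F}_2^3$, uses the fact that a $d$-dimensional subspace over $\mathbb{F}_2$ has exactly $2^d$ elements, and the helpers `span` and `dim` are hypothetical names.

```python
from itertools import product

def add(u, v):
    """Componentwise addition in F_2^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def span(generators, n):
    """All F_2-linear combinations of the generators in F_2^n."""
    result = set()
    for coeffs in product(range(2), repeat=len(generators)):
        v = (0,) * n
        for c, g in zip(coeffs, generators):
            if c:
                v = add(v, g)
        result.add(v)
    return result

def dim(S):
    """A subspace of F_2^n of dimension d has exactly 2**d elements."""
    return len(S).bit_length() - 1

U = span([(1, 0, 0), (0, 1, 0)], 3)     # a "plane" in F_2^3
W = span([(0, 1, 0), (0, 0, 1)], 3)     # a different "plane"
U_plus_W = {add(u, w) for u in U for w in W}

print(dim(U), dim(W), dim(U & W), dim(U_plus_W))          # 2 2 1 3
print(dim(U_plus_W) == dim(U) + dim(W) - dim(U & W))      # True
```

The output reproduces the identity $3 = 2 + 2 - 1$ from the geometric discussion above.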

Finally, the quotient space dimension formula relates the dimension of a space to the dimensions of a subspace and its quotient. This is the prototype for the rank-nullity theorem, which will appear in the next chapter on linear maps.

[quotetheorem:377]

The formula $\dim V = \dim U + \dim(V/U)$ says that the "degrees of freedom" in $V$ split into those "along $U$" and those "transverse to $U$." When we study linear maps in the next chapter, the same idea — applied to the kernel and image of a map — will yield the rank-nullity theorem. The proof constructs an explicit basis for $V/U$ by taking a basis for $U$, extending it to a basis for $V$, and projecting the extension vectors into the quotient.

[citeproof:377]

Direct Sums

Internal Direct Sums

The dimension formula for sums simplifies dramatically when the two subspaces have trivial intersection: in that case, $\dim(U + W) = \dim U + \dim W$, and every vector in $U + W$ has a unique decomposition into a $U$-component and a $W$-component. This situation is so important that it deserves its own name.

[definition:Internal Direct Sum]
Let $V$ be an $\mathbb{F}$-vector space and let $U, W \subseteq V$ be subspaces. We say that $V$ is the internal direct sum of $U$ and $W$, and write $V = U \oplus W$, if:

  1. $U + W = V$
  2. $U \cap W = \{\mathbf{0}\}$
[/definition]

The two conditions together are equivalent to requiring that every $v \in V$ can be written uniquely as $v = u + w$ with $u \in U$ and $w \in W$. Indeed, condition (1) gives existence, and condition (2) gives uniqueness: if $u_1 + w_1 = u_2 + w_2$, then $u_1 - u_2 = w_2 - w_1 \in U \cap W = \{\mathbf{0}\}$. By the Dimension Formula for Subspace Sums, when $V = U \oplus W$, we have $\dim V = \dim U + \dim W$.

[example:Direct Sum Decomposition Of R3]
Consider $U = \langle (1,0,0), (0,1,0) \rangle$ (the $xy$-plane) and $W = \langle (0,0,1) \rangle$ (the $z$-axis) in $\mathbb{R}^3$. Every vector $(x,y,z) \in \mathbb{R}^3$ decomposes uniquely as

\begin{align*} (x,y,z) = \underbrace{(x,y,0)}_{\in\, U} + \underbrace{(0,0,z)}_{\in\, W}. \end{align*}

Since $U + W = \mathbb{R}^3$ and $U \cap W = \{(0,0,0)\}$, we have $\mathbb{R}^3 = U \oplus W$. This is consistent with the dimension formula: $3 = 2 + 1$.
[/example]

External Direct Sums

The internal direct sum starts with subspaces of an ambient space and decomposes that space. The external direct sum goes in the other direction: it constructs a new vector space from two given spaces.

[definition:External Direct Sum]
Let $U$ and $W$ be $\mathbb{F}$-vector spaces. Their external direct sum is the Cartesian product

\begin{align*} U \oplus W = \{(u, w) : u \in U,\, w \in W\}, \end{align*}

equipped with componentwise operations: $(u_1, w_1) + (u_2, w_2) = (u_1 + u_2, w_1 + w_2)$ and $\lambda(u, w) = (\lambda u, \lambda w)$.
[/definition]

The two notions of direct sum are related as follows. Within the external direct sum $U \oplus W$, the subsets $\hat{U} = \{(u, \mathbf{0}) : u \in U\}$ and $\hat{W} = \{(\mathbf{0}, w) : w \in W\}$ are subspaces with $\hat{U} \cong U$ and $\hat{W} \cong W$, and $U \oplus W$ is the internal direct sum of $\hat{U}$ and $\hat{W}$. Conversely, whenever $V = U \oplus W$ internally, the map $(u, w) \mapsto u + w$ is an isomorphism from the external direct sum to $V$. The two constructions are thus two perspectives on the same idea: decomposing a space into independent pieces.

Both definitions extend naturally to finite collections $U_1, \dots, U_k$ of subspaces. For the internal direct sum, we require $U_1 + \cdots + U_k = V$ and $U_j \cap (U_1 + \cdots + U_{j-1} + U_{j+1} + \cdots + U_k) = \{\mathbf{0}\}$ for each $j$. Direct sum decompositions will play a central role in the theory of endomorphisms, where we decompose a space into eigenspaces or generalised eigenspaces.

A vector space is an algebraic structure — a set with operations. To understand vector spaces, we need to understand the maps between them that respect this structure. Just as group theory studies groups through their homomorphisms, and ring theory studies rings through their ideals and homomorphisms, linear algebra studies vector spaces through linear maps. The next chapter develops this perspective, revealing that the entire theory of systems of linear equations is a special case of a single structural theorem: the rank-nullity theorem.

Linear Maps

Having established the foundational structure of vector spaces — subspaces, bases, and dimension — we now ask: how can we relate one vector space to another in a way that respects their linear structure? In analysis and topology, spaces are often studied through the maps between them (continuous functions, homeomorphisms, smooth maps). Analogously, in linear algebra, the appropriate maps are those that preserve vector addition and scalar multiplication. These are the linear maps, and they form the backbone of the theory: they link abstract vector spaces to concrete matrix computations, provide the language for solving systems of linear equations, and enable powerful structural tools like the rank-nullity theorem.

The study of linear maps requires only the vector space axioms from the previous chapter. No additional structure — such as inner products or norms — is assumed. Our goal is to understand how linear maps encode the essential features of vector spaces, how they can be represented by matrices once bases are chosen, and how their kernel and image reveal deep structural information about the spaces they connect.

Core Definitions

Linear Maps and Isomorphisms

The examples from the previous chapter — solution sets of linear systems, function spaces, polynomial spaces — all admit natural maps between them: projection onto a subspace, differentiation of polynomials, evaluation at a point. What these maps share is that they preserve the operations of addition and scalar multiplication. Rather than treat each example separately, we axiomatise this property.

[definition:Linear Map]
Let $U$ and $V$ be vector spaces over a field $\mathbb{F}$. A function $\alpha: U \to V$ is a linear map if for all $u_1, u_2 \in U$ and all $\lambda, \mu \in \mathbb{F}$,

\begin{align*} \alpha(\lambda u_1 + \mu u_2) = \lambda \alpha(u_1) + \mu \alpha(u_2). \end{align*}

We denote the set of all linear maps from $U$ to $V$ by $\mathcal{L}(U, V)$.
[/definition]

The single condition above is equivalent to requiring both $\alpha(u_1 + u_2) = \alpha(u_1) + \alpha(u_2)$ and $\alpha(\lambda u) = \lambda \alpha(u)$ separately. The definition does not require $\alpha$ to be injective, surjective, or invertible — those are additional properties with important consequences. Also, the field $\mathbb{F}$ matters: complex conjugation $z \mapsto \bar{z}$ on $\mathbb{C}$ is $\mathbb{R}$-linear but not $\mathbb{C}$-linear, since $\overline{iz} = -i\bar{z} \neq i\bar{z}$ in general.
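The remark about complex conjugation can be checked numerically on sample values. The snippet below is only a spot check, not a proof; the particular numbers `z` and `w` are arbitrary choices, and Python's built-in `complex` type is used for the arithmetic.

```python
def conj(z):
    """Complex conjugation z -> z-bar."""
    return z.conjugate()

z, w = 1 + 2j, 3 - 1j

# Additivity and homogeneity for a real scalar both hold.
assert conj(z + w) == conj(z) + conj(w)
assert conj(2.5 * z) == 2.5 * conj(z)

# Homogeneity fails for the complex scalar i: conj(i z) = -i conj(z).
assert conj(1j * z) == -1j * conj(z)
assert conj(1j * z) != 1j * conj(z)
```

So conjugation satisfies the linearity condition when scalars are restricted to $\mathbb{R}$, but violates it as soon as the scalar $i$ is allowed.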

When two vector spaces are related by an invertible linear map, they are structurally identical from the perspective of linear algebra. This is the strongest form of equivalence between vector spaces.

[definition:Isomorphism]
A linear map $\alpha: U \to V$ is an isomorphism if there exists a linear map $\beta: V \to U$ such that $\alpha \circ \beta = \mathrm{id}_V$ and $\beta \circ \alpha = \mathrm{id}_U$. If such an isomorphism exists, we say $U$ and $V$ are isomorphic and write $U \cong V$.
[/definition]

The definition requires the inverse to be linear, but this turns out to be automatic. If a linear map is bijective (as a function of sets), then its set-theoretic inverse is automatically linear. This is the content of the following result, which reduces the abstract notion of isomorphism to the concrete, verifiable condition of bijectivity.

[quotetheorem:378]

The key content is in the reverse direction: a bijective linear map automatically has a linear inverse, so there is no need to check linearity of the inverse separately. The same is true of bijective homomorphisms of groups and rings, but it is not a universal feature of structure-preserving maps: in topology, for instance, a continuous bijection need not have a continuous inverse. The proof works by showing that for any linear combination in $V$, the preimage of that combination is the same linear combination of the preimages, which follows from linearity of $\alpha$ and the fact that bijectivity lets us move freely between $U$ and $V$.

[citeproof:378]

Kernel and Image

Given a linear map $\alpha: U \to V$, two subsets of the domain and codomain arise naturally: the set of vectors that $\alpha$ sends to zero, and the set of vectors that $\alpha$ actually hits. These are the kernel and image, and they are the primary tools for understanding the structure of $\alpha$.

[definition:Kernel]
Let $\alpha: U \to V$ be a linear map. The kernel of $\alpha$ is

\begin{align*} \ker \alpha = \{u \in U : \alpha(u) = \mathbf{0}\}. \end{align*}
[/definition]

[definition:Image]
Let $\alpha: U \to V$ be a linear map. The image of $\alpha$ is

\begin{align*} \mathrm{im}\,\alpha = \{\alpha(u) : u \in U\}. \end{align*}
[/definition]

Both $\ker\alpha$ and $\mathrm{im}\,\alpha$ are subspaces — of $U$ and $V$ respectively — as a direct consequence of linearity. The kernel measures the "failure of injectivity": $\alpha$ is injective if and only if $\ker\alpha = \{\mathbf{0}\}$. The image measures the "reach" of $\alpha$: $\alpha$ is surjective if and only if $\mathrm{im}\,\alpha = V$.

[example:Kernel And Image Of A Matrix Map]
Let $A \in \mathrm{Mat}_{m,n}(\mathbb{F})$ and define $\alpha: \mathbb{F}^n \to \mathbb{F}^m$ by $\alpha(v) = Av$. Then $\mathrm{im}\,\alpha$ is the column space of $A$ (the span of the columns), and the system $Ax = b$ has a solution if and only if $b \in \mathrm{im}\,\alpha$. The kernel $\ker\alpha$ is the solution space (null space) of the homogeneous system $Ax = \mathbf{0}$, so the general solution to $Ax = b$ (when it exists) is a coset $x_0 + \ker\alpha$ for any particular solution $x_0$.
[/example]
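The description of $\ker\alpha$ and $\mathrm{im}\,\alpha$ for a matrix map can be made computational. The following sketch assumes $\mathbb{F} = \mathbb{Q}$ and uses row reduction, which is one standard technique rather than something prescribed by the text; `rref`, `kernel_basis` and `apply` are hypothetical helper names. Each non-pivot column yields one kernel basis vector, and the pivot columns of $A$ span the image.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over Q; returns (matrix, pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in A]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]           # scale the pivot to 1
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(A):
    """One basis vector of ker(A) per free (non-pivot) column."""
    m, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in zip(m, pivots):               # pivot rows come first
            v[pc] = -row[free]
        basis.append(v)
    return basis

def apply(A, v):
    """The matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 3],
     [2, 4, 6]]                                      # assumed example matrix

_, pivots = rref(A)
K = kernel_basis(A)
print(len(pivots), len(K))                           # 1 2
print(all(apply(A, k) == [0, 0] for k in K))         # True
```

Here the image is one-dimensional and the kernel two-dimensional, and $1 + 2 = 3$ is the dimension of the domain, in line with the rank-nullity theorem anticipated earlier. Adding any kernel vector to the particular solution $x_0 = (1, 0, 0)$ of $Ax = (1, 2)$ gives another solution, illustrating the coset $x_0 + \ker\alpha$.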

Linear Maps and Linear Structure

Preservation of Independence and Spanning

A fundamental question is: how do linear maps interact with the structural features of vector spaces — linear independence, spanning, and bases? The answer is clean and reflects the injection/surjection dichotomy. Injective maps cannot create new dependence relations (they preserve independence), surjective maps cannot lose spanning (they preserve spanning), and isomorphisms preserve everything.

[quotetheorem:379]

Part (iii) is the most important consequence for the theory: isomorphisms preserve bases, and hence dimension. This means that isomorphic spaces have the same dimension. The converse is also true, and is the content of the classification theorem below. The proof of part (i) is instructive: if $\sum \lambda_i \alpha(s_i) = \mathbf{0}$, linearity gives $\alpha(\sum \lambda_i s_i) = \mathbf{0}$, and injectivity forces $\sum \lambda_i s_i = \mathbf{0}$, whence independence of $S$ gives $\lambda_i = 0$.

[citeproof:379]

Classification of Finite-Dimensional Spaces

The previous result shows that isomorphic spaces have the same dimension. The remarkable fact is that the converse holds: dimension is a complete invariant for finite-dimensional vector spaces. Two spaces of the same dimension are always isomorphic, regardless of how different they might look concretely.

[quotetheorem:380]

This theorem is pivotal: it tells us that, up to isomorphism, there is exactly one $n$-dimensional vector space over $\mathbb{F}$ for each $n$, namely $\mathbb{F}^n$. Every abstract $n$-dimensional vector space (polynomials of degree at most $n-1$, the diagonal $n \times n$ matrices, the solution space of a homogeneous linear differential equation of order $n$) is just $\mathbb{F}^n$ in disguise. Note that the dimension, not the appearance, is what matters: the space of $n \times n$ symmetric matrices has dimension $n(n+1)/2$, so it is a copy of $\mathbb{F}^{n(n+1)/2}$ rather than of $\mathbb{F}^n$. The isomorphism is not canonical (it depends on a choice of basis), but its existence is guaranteed. The proof constructs one explicitly: pick a basis for each space and map the $i$th basis vector of one to the $i$th basis vector of the other.

[citeproof:380]

Linear Maps and Matrices

Determination by Values on a Basis

One of the most useful features of linear maps is that they are completely determined by their values on a basis. Since every vector in the domain is a unique linear combination of basis vectors, the image of any vector is forced once we know the images of the basis. This means that to specify a linear map between finite-dimensional spaces, we need only specify finitely many vectors — one for each basis element.

[quotetheorem:381]

This is the theoretical foundation for representing linear maps as matrices: a linear map is an infinite amount of data (the image of every vector in the domain), but by this theorem, it is equivalent to a finite amount (the images of basis vectors). The proof constructs the extension by the only possible formula — $\alpha(\sum \lambda_i e_i) = \sum \lambda_i v_i$ — and verifies that this is well-defined (by uniqueness of basis representations from Unique Representation by a Basis) and linear.

[citeproof:381]

The Matrix–Linear Map Correspondence

With the extension theorem in hand, we can make the matrix representation precise. Fixing bases for both the domain and codomain, every linear map corresponds to a unique matrix (whose columns are the coordinate vectors of the images of the domain basis), and every matrix corresponds to a unique linear map. This correspondence is not just a bijection — it is itself a linear map, making the space of linear maps isomorphic to a matrix space.

[quotetheorem:382]

The column convention deserves emphasis: the $i$th column of $A$ encodes the coordinates of $\alpha(e_i)$ in the codomain basis $(f_j)$. This is the standard convention in most of the literature (though some sources use the transpose convention). The dimension formula $\dim\mathcal{L}(U,V) = (\dim U)(\dim V)$ is an immediate consequence: it is the dimension of the matrix space $\mathrm{Mat}_{n,m}(\mathbb{F})$, where $m = \dim U$ and $n = \dim V$.

[citeproof:382]

Composition and Matrix Multiplication

The matrix representation converts the algebraic operation of composition into the computational operation of matrix multiplication. This is not a coincidence but a fundamental structural property: the representation map respects composition as well as addition, and for endomorphisms of a single space $V$ (with one fixed basis) it is a ring homomorphism from $\mathcal{L}(V,V)$ to $\mathrm{Mat}_n(\mathbb{F})$.

[quotetheorem:383]

This result is what makes matrix multiplication the "right" operation on matrices — it is defined precisely so that the matrix of a composition is the product of the matrices. The order reversal ($BA$, not $AB$) reflects the order of function composition: $\beta \circ \alpha$ means "first apply $\alpha$, then $\beta$." This is the source of the non-commutativity of matrix multiplication. The proof is a direct computation: evaluate $(\beta \circ \alpha)(e_i)$, expand using linearity, and recognise the coefficients as the entries of the matrix product.

[citeproof:383]
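As a minimal numerical sketch of this correspondence, the snippet below checks that applying $\alpha$ and then $\beta$ to a vector agrees with multiplying by the single matrix $BA$. The matrices and the vector are arbitrary illustrative choices (not taken from the text), and `matmul` and `apply` are hypothetical helper names.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

# Hypothetical example: alpha: F^3 -> F^2 with matrix A, beta: F^2 -> F^2 with matrix B.
A = [[1, 2, 0],
     [0, 1, 3]]
B = [[0, 1],
     [1, 1]]

BA = matmul(B, A)                  # matrix of "beta after alpha"
v = [1, -1, 2]

composed = apply(B, apply(A, v))   # first alpha, then beta
direct = apply(BA, v)              # one multiplication by BA
print(BA)                          # [[0, 1, 3], [1, 3, 3]]
print(composed, direct)            # [5, 4] [5, 4]
```

In this example the product in the other order, $AB$, is not even defined, since $A$ has three columns and $B$ has two rows; the order of the factors is forced by the order of composition.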

The First Isomorphism Theorem and Rank-Nullity

The First Isomorphism Theorem

The kernel and image of a linear map are linked by a deep structural result that is the linear-algebraic analogue of the first isomorphism theorem for groups. Every linear map factors as: first, project onto the quotient by the kernel (collapsing each fibre to a point), then map the quotient isomorphically onto the image. The quotient step loses exactly the information in the kernel, and what remains is a perfect copy of the image.

[quotetheorem:384]

The factorisation $\alpha = \iota \circ \vartheta \circ \pi$ (where $\pi: U \to U/\ker\alpha$ is the quotient map, $\vartheta: U/\ker\alpha \to \mathrm{im}\,\alpha$ is the induced isomorphism, and $\iota: \mathrm{im}\,\alpha \hookrightarrow V$ is the inclusion) shows that every linear map decomposes into a surjection followed by an isomorphism followed by an inclusion. This is the universal factorisation of linear maps: the quotient $U/\ker\alpha$ is the "essential" part of the domain, the part that $\alpha$ actually uses. The proof verifies well-definedness of $\vartheta$ (two representatives of the same coset have the same image under $\alpha$, since their difference lies in the kernel), then checks injectivity (the only coset mapping to zero is the kernel itself) and surjectivity (every element of the image is hit by definition).

[citeproof:384]

Rank-Nullity

When the domain is finite-dimensional, the first isomorphism theorem immediately yields a dimension formula. Since $U/\ker\alpha \cong \mathrm{im}\,\alpha$, the dimensions of these spaces are equal. The quotient dimension formula from the previous chapter then converts this into an equation relating the dimension of the domain, the kernel, and the image.

[definition:Rank And Nullity]
Let $\alpha: U \to V$ be a linear map with $U$ finite-dimensional. The rank of $\alpha$ is $r(\alpha) = \dim\mathrm{im}\,\alpha$, and the nullity is $n(\alpha) = \dim\ker\alpha$.
[/definition]

The rank measures the "output dimension" of $\alpha$ — how much of the codomain it actually reaches. The nullity measures the "wasted dimension" — how much of the domain is collapsed to zero. The rank-nullity theorem says these two quantities account for all of $\dim U$, with no overlap and no remainder.

[quotetheorem:385]

This is the cornerstone of finite-dimensional linear algebra. It quantifies how the dimension of the domain is split: each dimension is either collapsed into the kernel or carried faithfully into the image, so a larger kernel forces a smaller image and vice versa. For a map $\alpha: \mathbb{F}^m \to \mathbb{F}^n$ represented by a matrix $A$, the rank is the number of pivots in any row echelon form of $A$, and the nullity is the number of free variables in the homogeneous system $Ax = \mathbf{0}$. The proof is short: combine the First Isomorphism Theorem ($U/\ker\alpha \cong \mathrm{im}\,\alpha$) with the Rank-Nullity for Quotient Spaces ($\dim U = \dim\ker\alpha + \dim(U/\ker\alpha)$).

[citeproof:385]

[example:Applications Of Rank Nullity]
Computing the dimension of a solution space. Let $W = \{x \in \mathbb{R}^5 : x_1 + x_2 + x_3 = 0,\; x_3 - x_4 - x_5 = 0\}$. Define $\alpha: \mathbb{R}^5 \to \mathbb{R}^2$ by

\begin{align*} \alpha(x) = \begin{pmatrix} x_1 + x_2 + x_3 \\ x_3 - x_4 - x_5 \end{pmatrix}. \end{align*}

Then $\ker\alpha = W$. To find $r(\alpha)$, observe that $\alpha(1,0,0,0,0) = (1,0)$ and $\alpha(0,0,1,0,0) = (1,1)$. Since $(1,0)$ and $(1,1)$ are linearly independent in $\mathbb{R}^2$, we have $\mathrm{im}\,\alpha = \mathbb{R}^2$, so $r(\alpha) = 2$. By rank-nullity, $\dim W = 5 - 2 = 3$.

Recovering the dimension formula for subspace sums. For subspaces $U, W \subseteq V$ (finite-dimensional), define $\alpha: U \oplus W \to V$ by $\alpha(u, w) = u + w$. Then $\mathrm{im}\,\alpha = U + W$ and $\ker\alpha = \{(u, -u) : u \in U \cap W\} \cong U \cap W$. The rank-nullity theorem gives $\dim U + \dim W = \dim(U + W) + \dim(U \cap W)$, recovering the Dimension Formula for Subspace Sums.
[/example]
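The first computation in the example can be checked mechanically. The sketch below is one possible way to do so, not a prescribed method: it counts pivots by Gaussian elimination over the rationals (the function name `rank` is an illustrative choice) and then reads off the nullity from rank-nullity.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows): the number of pivots found by elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Matrix of alpha: R^5 -> R^2, one row per defining equation of W.
A = [[1, 1, 1, 0, 0],
     [0, 0, 1, -1, -1]]

r = rank(A)
nullity = len(A[0]) - r    # rank-nullity: dim(domain) minus rank
print(r, nullity)          # 2 3
```

The output agrees with $r(\alpha) = 2$ and $\dim W = 3$.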

Injective iff Surjective in Equal Dimensions

In the "square" case — when $\dim U = \dim V$ — rank-nullity implies a striking simplification: injectivity, surjectivity, and bijectivity are all equivalent. This is a distinctly finite-dimensional phenomenon; in infinite dimensions, the left shift operator on $\ell^2$ is surjective but not injective, and the right shift is injective but not surjective.

[quotetheorem:386]

This equivalence is used constantly in practice. To show that a linear map $\alpha: V \to V$ (an endomorphism) is an isomorphism, it suffices to check either injectivity or surjectivity — whichever is easier. For instance, to verify that a matrix $A \in \mathrm{Mat}_n(\mathbb{F})$ is invertible, it is enough to show that $Ax = \mathbf{0}$ implies $x = \mathbf{0}$. The proof is a direct application of rank-nullity: if $n(\alpha) = 0$ then $r(\alpha) = \dim U = \dim V$, forcing the image to be all of $V$.

[citeproof:386]

Change of Basis and Matrix Equivalence

The Change of Basis Formula

The matrix representing a linear map depends on the choice of bases for both the domain and codomain. Changing bases transforms the matrix in a precise and predictable way: the new matrix is obtained from the old by multiplying on the left and right by the (invertible) change-of-basis matrices. This is the theoretical foundation for the theory of normal forms — choosing bases to make the matrix as simple as possible.

[quotetheorem:387]

The formula $B = Q^{-1}AP$ has a clean interpretation in terms of composition. The change-of-basis matrix $P$ converts coordinates in the new domain basis to coordinates in the old domain basis, and $Q^{-1}$ converts from the old codomain basis to the new one. The composition $Q^{-1}AP$ says: convert to old coordinates, apply $\alpha$, convert back to new coordinates. When $U = V$ and we use the same basis change for both domain and codomain (so $P = Q$), the formula becomes $B = P^{-1}AP$, which is the similarity relation for square matrices.

[citeproof:387]
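As a small worked illustration (the map and the bases are chosen here for the example and do not come from the text), let $\alpha: \mathbb{R}^2 \to \mathbb{R}^2$ have matrix $A$ with respect to the standard basis, and take the new basis $e_1' = (1,0)$, $e_2' = (1,1)$ for both domain and codomain, so that $P = Q$ has these vectors as its columns. Then

\begin{align*} A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}, \quad P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \quad B = P^{-1}AP = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}. \end{align*}

The new matrix is diagonal because $\alpha(e_1') = (2,0) = 2e_1'$ and $\alpha(e_2') = (3,3) = 3e_2'$: the columns of $B$ record the coordinates of $\alpha(e_i')$ in the new basis.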

Matrix Equivalence and Canonical Form

The change-of-basis formula defines a natural equivalence relation on matrices: two matrices are equivalent if they represent the same linear map with respect to different bases.

[definition:Equivalent Matrices]
Matrices $A, B \in \mathrm{Mat}_{n,m}(\mathbb{F})$ are equivalent if there exist invertible matrices $P \in \mathrm{GL}_m(\mathbb{F})$ and $Q \in \mathrm{GL}_n(\mathbb{F})$ such that $B = Q^{-1}AP$.
[/definition]

Two equivalent matrices represent the same linear map, merely described in different coordinate systems. The natural question is: what is the simplest matrix in each equivalence class? The answer is given by the canonical form theorem, which shows that every matrix is equivalent to one with an identity block $I_r$ in the top-left corner and zeros everywhere else.

[quotetheorem:388]

The canonical form is as simple as a matrix can possibly be while retaining the essential information — the rank $r$. It immediately implies that two matrices are equivalent if and only if they have the same rank, providing a complete invariant for the equivalence relation. The proof is constructive: starting from the linear map $\alpha$ represented by $A$, choose a basis for $\ker\alpha$, extend to a basis for $U$, and show that the images of the extension vectors give a basis for $\mathrm{im}\,\alpha$. In these carefully chosen bases, the matrix takes the canonical form.

[citeproof:388]

An elegant corollary of the canonical form is the equality of row rank and column rank — a result that is not at all obvious from the definition, since rows and columns live in different spaces.

[quotetheorem:389]

The proof reduces both notions of rank to the single number $r$ appearing in the canonical form. Left-multiplication by an invertible matrix preserves the row space (it replaces the rows by invertible combinations of themselves) and maps the column space isomorphically onto the new column space, so it changes neither rank. Symmetrically, right-multiplication by an invertible matrix preserves the column space and the dimension of the row space. Hence equivalent matrices have the same row rank and the same column rank, and since the canonical form has both equal to $r$, the result follows.

[citeproof:389]

Elementary Matrix Operations

Elementary Matrices

The canonical form theorem guarantees the existence of invertible matrices $P$ and $Q$ achieving the reduction, but it does not say how to find them algorithmically. The answer is provided by elementary row and column operations — the basic moves of Gaussian elimination — which correspond to multiplication by simple invertible matrices.

[definition:Elementary Matrices]
The elementary matrices in $\mathrm{Mat}_n(\mathbb{F})$ are:

  1. Row swap $S_{ij}^n$: the identity matrix with rows $i$ and $j$ interchanged.
  2. Row shear $E_{ij}^n(\lambda)$: the identity matrix with $\lambda \in \mathbb{F}$ added to the $(i,j)$-entry, where $i \neq j$.
  3. Row scaling $T_i^n(\lambda)$: the identity matrix with the $(i,i)$-entry replaced by $\lambda \neq 0$.
[/definition]

Left-multiplication by an elementary matrix performs the corresponding row operation on a matrix; right-multiplication performs the corresponding column operation. Each elementary matrix is invertible (swaps are self-inverse, shears are inverted by negating $\lambda$, scalings by replacing $\lambda$ with $\lambda^{-1}$), so elementary operations are reversible.

The constructive content of the canonical form theorem is that the invertible matrices $P$ and $Q$ can always be decomposed as products of elementary matrices. This gives an explicit algorithm — a sequence of row and column operations — for reducing any matrix to its canonical form.

[quotetheorem:390]

This result underpins the standard direct methods of computational linear algebra: Gaussian elimination for solving linear systems, LU factorisation, computing rank, finding inverses. The algorithm processes columns left to right, using pivoting (swaps), normalisation (scaling), and elimination (shears) to create the identity block in the top-left corner and clear everything else. Each step is encoded as multiplication by an elementary matrix, so the accumulated product gives the change-of-basis matrices $P$ and $Q$ explicitly.

[citeproof:390]

[example:Elementary Reduction Of A Matrix]
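Consider the matrix $A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix} \in \mathrm{Mat}_3(\mathbb{R})$. We reduce it to canonical form by elementary row operations.

Step 1: Subtract row 1 from row 3: $E_{31}^3(-1) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & -1 & 1 \end{pmatrix}$.

Step 2: Add row 2 to row 3: $E_{32}^3(1) E_{31}^3(-1) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}$.

Step 3: Scale row 3 by $\frac{1}{2}$: $T_3^3(\tfrac{1}{2}) E_{32}^3(1) E_{31}^3(-1) A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$.

Step 4: Subtract row 3 from row 2: $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

Step 5: Subtract row 2 from row 1: $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I_3$.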

Since the canonical form is $I_3$, the rank is $3$. By the Rank-Nullity Theorem, any linear map with this matrix has nullity $0$, so it is injective and hence (by Injective iff Surjective in Finite Dimensions) an isomorphism.
[/example]
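The reduction in this example can be replayed mechanically. The sketch below assumes the matrix $A$ with rows $(1,1,0)$, $(0,1,1)$, $(1,0,1)$, which is consistent with the five operations listed and with the final result $I_3$; the helper names `shear`, `scale` and `matmul` are illustrative. Each row operation is carried out by left-multiplying with the corresponding elementary matrix, and the accumulated product is recorded.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def shear(n, i, j, lam):
    """E_ij(lam): identity with lam added to the (i, j) entry (1-based, i != j)."""
    E = identity(n)
    E[i - 1][j - 1] += Fraction(lam)
    return E

def scale(n, i, lam):
    """T_i(lam): identity with the (i, i) entry replaced by lam != 0 (1-based)."""
    T = identity(n)
    T[i - 1][i - 1] = Fraction(lam)
    return T

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Assumed matrix, consistent with the five row operations of the example.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]

steps = [
    shear(3, 3, 1, -1),            # subtract row 1 from row 3
    shear(3, 3, 2, 1),             # add row 2 to row 3
    scale(3, 3, Fraction(1, 2)),   # scale row 3 by 1/2
    shear(3, 2, 3, -1),            # subtract row 3 from row 2
    shear(3, 1, 2, -1),            # subtract row 2 from row 1
]

M = [[Fraction(x) for x in row] for row in A]
Q = identity(3)
for E in steps:
    M = matmul(E, M)     # left-multiplication performs the row operation
    Q = matmul(E, Q)     # accumulate the product of elementary matrices

print(M == identity(3))              # True
print(matmul(Q, A) == identity(3))   # True
```

Since the accumulated product `Q` satisfies $QA = I_3$, it is the inverse of $A$, expressed as a product of elementary matrices.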

[problem]
Let $\alpha: \mathbb{R}^4 \to \mathbb{R}^3$ be the linear map defined by

\begin{align*} \alpha(x_1, x_2, x_3, x_4) = (x_1 + x_2 - x_3, \; 2x_1 + x_2 + x_4, \; x_1 - x_2 + x_3 + x_4). \end{align*}

Find bases for $\ker\alpha$ and $\mathrm{im}\,\alpha$, and verify the rank-nullity theorem.
[/problem]

[solution]
Step 1: Find the matrix of $\alpha$.

Computing $\alpha$ on the standard basis vectors:

\begin{align*} \alpha(e_1) = (1, 2, 1), \quad \alpha(e_2) = (1, 1, -1), \quad \alpha(e_3) = (-1, 0, 1), \quad \alpha(e_4) = (0, 1, 1). \end{align*}

The matrix, whose columns are these images, is $A = \begin{pmatrix} 1 & 1 & -1 & 0 \\ 2 & 1 & 0 & 1 \\ 1 & -1 & 1 & 1 \end{pmatrix}$.

Step 2: Row-reduce to find the rank.

Apply $R_2 \to R_2 - 2R_1$ and $R_3 \to R_3 - R_1$:

\begin{align*} \begin{pmatrix} 1 & 1 & -1 & 0 \\ 0 & -1 & 2 & 1 \\ 0 & -2 & 2 & 1 \end{pmatrix}. \end{align*}

Apply $R_3 \to R_3 - 2R_2$:

\begin{align*} \begin{pmatrix} 1 & 1 & -1 & 0 \\ 0 & -1 & 2 & 1 \\ 0 & 0 & -2 & -1 \end{pmatrix}. \end{align*}

There are three pivots (in columns 1, 2, 3), so $r(\alpha) = 3$.

Step 3: Find $\ker\alpha$.

By rank-nullity, $n(\alpha) = 4 - 3 = 1$. Solve $Ax = \mathbf{0}$ using back-substitution from the row echelon form. Setting $x_4 = t$ (the free variable):

From row 3: $-2x_3 - t = 0$, so $x_3 = -t/2$.

From row 2: $-x_2 + 2(-t/2) + t = 0$, so $-x_2 - t + t = 0$, giving $x_2 = 0$.

From row 1: $x_1 + 0 - (-t/2) = 0$, so $x_1 = -t/2$.

Taking $t = -2$: $\ker\alpha = \langle (1, 0, 1, -2) \rangle$.

Step 4: Find a basis for $\mathrm{im}\,\alpha$.

The pivot columns are columns 1, 2, 3. The corresponding columns of the original matrix $A$ form a basis for $\mathrm{im}\,\alpha$:

\begin{align*} \mathrm{im}\,\alpha = \langle (1, 2, 1), \; (1, 1, -1), \; (-1, 0, 1) \rangle. \end{align*}

Since these three vectors are linearly independent (they come from the pivot columns) and $\mathbb{R}^3$ has dimension 3, they span $\mathbb{R}^3$. Hence $\mathrm{im}\,\alpha = \mathbb{R}^3$ and $\alpha$ is surjective.

Step 5: Verify rank-nullity.

$r(\alpha) + n(\alpha) = 3 + 1 = 4 = \dim\mathbb{R}^4$. $\checkmark$
[/solution]
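The answers in this solution are easy to double-check numerically. The sketch below (helper names are illustrative) applies the matrix of $\alpha$ to the claimed kernel vector and tests independence of the first three columns. The independence test uses a $3 \times 3$ determinant, a tool developed only later in the course, purely as a convenient shortcut.

```python
def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Matrix of alpha: its columns are alpha(e_1), ..., alpha(e_4).
A = [[1, 1, -1, 0],
     [2, 1, 0, 1],
     [1, -1, 1, 1]]

kernel_vector = [1, 0, 1, -2]
print(apply(A, kernel_vector))      # [0, 0, 0]

# Columns 1 to 3 of A; a nonzero determinant means they are linearly independent.
pivot_block = [row[:3] for row in A]
print(det3(pivot_block))            # 2
```

The zero output confirms $(1, 0, 1, -2) \in \ker\alpha$, and the nonzero determinant confirms that the three pivot columns form a basis of $\mathbb{R}^3$.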

Linear maps transform vectors — they are "active" operations on a vector space. But there is a complementary "passive" perspective: instead of transforming vectors, we can measure them. A linear functional $f: V \to \mathbb{F}$ assigns a scalar to each vector, respecting the linear structure. The collection of all such functionals forms a vector space in its own right — the dual space $V^*$ — and the interplay between $V$ and $V^*$ reveals a hidden symmetry that connects subspaces to systems of equations, row operations to column operations, and kernels to images.

Duality

Having explored vector spaces and linear maps, we now ask: how can we probe the structure of a vector space using functions that extract scalar information from its vectors? In many areas of mathematics — from differential equations to quantum mechanics — it is not the vectors themselves but their interactions with linear scalar-valued functions that reveal essential features. This leads us to consider the space of all linear maps from a vector space $V$ to its underlying field $\mathbb{F}$, which forms a new vector space in its own right. This dual perspective not only enriches our understanding of linear algebra but also provides the foundation for more advanced topics such as bilinear forms, tensor products, and functional analysis.

The tools required are those already developed: the definition of a vector space and the notion of a linear map. No topology or additional structure is assumed — duality here is purely algebraic.

Core Definitions

The Dual Space

To capture the idea of "measuring" vectors via linear scalar outputs, we introduce the dual space.

[definition:Dual Space]
Let $V$ be a vector space over a field $\mathbb{F}$. The dual space of $V$, denoted $V^*$, is the set of all linear maps $\theta: V \to \mathbb{F}$:
$V^* = \mathcal{L}(V, \mathbb{F}) = \{\theta: V \to \mathbb{F} \mid \theta \text{ is linear}\}.$
Elements of $V^*$ are called linear functionals or linear forms.
[/definition]

This definition isolates the algebraic notion of a linear probe on $V$. Note that we do not require continuity, boundedness, or any topological condition: this is a purely algebraic construction. By convention, vectors in $V$ are denoted by Roman letters ($v, w$), while functionals in $V^*$ are denoted by Greek letters ($\theta, \phi$).

The dual space carries a natural vector space structure via pointwise operations: $(\theta + \phi)(v) = \theta(v) + \phi(v)$ and $(c\theta)(v) = c\,\theta(v)$. This is simply the vector space structure on $\mathcal{L}(V, \mathbb{F})$ inherited from the codomain $\mathbb{F}$.

The Annihilator

To relate subspaces of $V$ to conditions on functionals, we introduce the annihilator — the set of functionals that "see" a subspace as zero.

[definition:Annihilator]
Let $V$ be an $\mathbb{F}$-vector space and let $U \subseteq V$ be a subset. The annihilator of $U$ is $U^0 = \{\theta \in V^* \mid \theta(u) = 0 \text{ for all } u \in U\}$. If $W \subseteq V^*$, its annihilator is $W^0 = \{v \in V \mid \theta(v) = 0 \text{ for all } \theta \in W\}$.
[/definition]

The annihilator $U^0$ is always a subspace of $V^*$, even if $U$ is not a subspace of $V$ (it is the intersection of kernels $\bigcap_{u \in U} \ker(\mathrm{ev}_u)$, where each $\mathrm{ev}_u: V^* \to \mathbb{F}$ is linear).

The Dual Basis

The fundamental result relating a finite-dimensional vector space to its dual is the existence of a dual basis — a mirror-image coordinate system on $V^*$.

[quotetheorem:414]

The dual basis provides explicit coordinates for $V^*$: every functional $\theta \in V^*$ can be written as $\theta = \sum_{i=1}^n \theta(e_i)\,\varepsilon_i$, where the coefficients are simply the values of $\theta$ on the basis vectors. When $V = \mathbb{F}^n$ with the standard basis, the dual basis elements are the coordinate projections $\varepsilon_i(x) = x_i$, and a general functional $\theta$ corresponds to a row vector acting by left multiplication.

[citeproof:414]

[quotetheorem:415]

This is immediate from the Dual Basis: $V^*$ has a basis of $n = \dim V$ elements. Note that this equality fails in infinite dimensions, where $\dim V^* > \dim V$ (algebraic dimension).

[citeproof:415]

[example:Linear Functionals In Coordinates]
Let $V = \mathbb{R}^3$ with the standard basis $(e_1, e_2, e_3)$, and let $\theta(x_1, x_2, x_3) = 2x_1 - x_3$. In terms of the dual basis: $\theta = 2\varepsilon_1 + 0\varepsilon_2 - \varepsilon_3$, corresponding to the row vector $(2, 0, -1)$. The annihilator of $U = \langle e_1, e_2 \rangle$ is $U^0 = \langle \varepsilon_3 \rangle$ — the functionals that vanish on both $e_1$ and $e_2$. Note $\dim U + \dim U^0 = 2 + 1 = 3 = \dim V$.
[/example]
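For a basis other than the standard one, the dual basis is no longer given by the coordinate projections. As a small illustration (this particular basis is chosen here for the example), take $V = \mathbb{R}^2$ with basis $e_1 = (1,1)$, $e_2 = (1,-1)$. Solving the conditions $\varepsilon_i(e_j) = \delta_{ij}$ gives

\begin{align*} \varepsilon_1(x_1, x_2) = \tfrac{1}{2}(x_1 + x_2), \qquad \varepsilon_2(x_1, x_2) = \tfrac{1}{2}(x_1 - x_2). \end{align*}

Indeed $\varepsilon_1(e_1) = 1$, $\varepsilon_1(e_2) = 0$, $\varepsilon_2(e_1) = 0$ and $\varepsilon_2(e_2) = 1$. As row vectors, $\varepsilon_1$ and $\varepsilon_2$ are the rows of the inverse of the matrix whose columns are $e_1$ and $e_2$:

\begin{align*} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}. \end{align*}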

Change of Basis in the Dual Space

[quotetheorem:416]

This contravariance — the dual basis changes by the inverse transpose — is the algebraic origin of the distinction between "vectors" and "covectors" in differential geometry and physics. Vectors transform by $P$; covectors transform by $(P^{-1})^\top$.

[citeproof:416]

Dual Maps

Given a linear map $\alpha: V \to W$, we can pull back functionals on $W$ to functionals on $V$: if $\theta$ measures vectors in $W$, then $\theta \circ \alpha$ measures vectors in $V$.

[definition:Dual Map]
Let $\alpha \in \mathcal{L}(V, W)$. The dual map (or transpose) $\alpha^*: W^* \to V^*$ is defined by $\alpha^*(\theta) = \theta \circ \alpha$ for all $\theta \in W^*$.
[/definition]

Note the reversal of direction: $\alpha$ goes from $V$ to $W$, but $\alpha^*$ goes from $W^*$ to $V^*$. This makes the dual a contravariant construction.

[quotetheorem:417]

The verification is a direct computation using the pointwise definitions of the vector space operations on $V^*$ and $W^*$.

[citeproof:417]

The Matrix of the Dual Map

[quotetheorem:418]

This is one of the most satisfying results in duality theory: the abstract operation of precomposition (which reverses direction) is represented concretely by the familiar matrix transpose (which swaps rows and columns). As an immediate corollary, $\mathrm{rank}(\alpha) = \mathrm{rank}(\alpha^*)$, since the rank of a matrix equals the rank of its transpose (row rank equals column rank). Conversely, the equality $\mathrm{rank}(\alpha) = \mathrm{rank}(\alpha^*)$ can be proved without matrices, using annihilators, and read in that direction it gives an elegant second proof that row rank equals column rank.

[citeproof:418]
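The identity $\alpha^*(\theta)(v) = \theta(\alpha(v))$ and its matrix form can be checked on a concrete case. In the sketch below the matrix, the functional and the vector are arbitrary illustrative choices in standard bases, and a functional is stored by its coordinates in the dual basis; the helper names are hypothetical.

```python
def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical alpha: R^3 -> R^2 with matrix A in the standard bases.
A = [[1, 2, 0],
     [0, 1, 3]]

theta = [4, -1]      # functional on R^2: theta(y) = 4*y1 - y2
v = [1, -1, 2]

pulled_back = apply(transpose(A), theta)   # dual-basis coordinates of alpha^*(theta)
lhs = dot(pulled_back, v)                  # (alpha^* theta)(v)
rhs = dot(theta, apply(A, v))              # theta(alpha(v))
print(pulled_back, lhs, rhs)               # [4, 7, -3] -9 -9
```

The coordinates of $\alpha^*(\theta)$ are obtained by multiplying the coordinate column of $\theta$ by $A^\top$, which is exactly the statement that $\alpha^*$ is represented by the transpose.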

Annihilator Theory

The Dimension Formula

[quotetheorem:420]

The proof is clean: consider the restriction map $\rho: V^* \to U^*$, $\rho(\theta) = \theta|_U$. Its kernel is $U^0$ and it is surjective (any functional on $U$ extends to $V$ by the Basis Extension Theorem), so Rank-Nullity gives $\dim V^* = \dim U^0 + \dim U^*$.

[citeproof:420]

Kernel, Image, and Annihilators

The dual map and the annihilator interact in a remarkably clean way: duality swaps kernels with annihilators of images, and images with annihilators of kernels.

[quotetheorem:419]

Part (1) is essentially definitional: $\theta \in \ker\alpha^*$ iff $\theta \circ \alpha = 0$ iff $\theta$ vanishes on $\mathrm{im}\,\alpha$. Part (3) follows from part (1) and the Dimension of Annihilator. Part (2) requires both a containment argument and a dimension count.

[citeproof:419]

Annihilators of Sums and Intersections

[quotetheorem:421]

These formulas are the algebraic analogues of De Morgan's laws: taking the annihilator converts $+$ to $\cap$ and $\cap$ to $+$. Part (1) is a direct element chase. Part (2) uses the dimension formula and part (1) to establish equality by dimension counting.

[citeproof:421]

The Canonical Isomorphism to the Double Dual

While $V$ and $V^*$ are isomorphic in finite dimensions (they have the same dimension), no natural isomorphism exists between them — any such isomorphism requires choosing a basis. However, there is a canonical map from $V$ to its double dual $V^{**} = (V^*)^*$ that requires no choices at all.

[definition:Evaluation Map]
The evaluation map $\mathrm{ev}: V \to V^{**}$ is defined by $\mathrm{ev}(v)(\theta) = \theta(v)$ for all $v \in V$ and $\theta \in V^*$.
[/definition]

The idea is simple: a vector $v$ determines a functional on $V^*$ by "evaluating at $v$". This reversal — treating the vector as the function and the functional as the argument — is the heart of the construction.

[quotetheorem:422]

[citeproof:422]

[quotetheorem:423]

The key step is injectivity: if $\mathrm{ev}(v) = 0$ then $\theta(v) = 0$ for every $\theta \in V^*$. But if $v \neq \mathbf{0}$, we can extend $e_1 = v$ to a basis $(e_1, \dots, e_n)$ of $V$, and the Dual Basis construction then produces a functional $\varepsilon_1$ with $\varepsilon_1(v) = 1$, a contradiction. So $v = \mathbf{0}$, and $\mathrm{ev}$ is injective. Since $\dim V = \dim V^{**}$, injectivity implies surjectivity.

[citeproof:423]

Involutivity of Duality

Under the canonical identification $V \cong V^{**}$, applying duality twice returns us to where we started.

[quotetheorem:424]

Part (1) says the annihilator is an involution on the lattice of subspaces: $U \mapsto U^0$ is a bijection from subspaces of $V$ to subspaces of $V^*$, with inverse $W \mapsto W^0$. Part (2) says the double dual functor is naturally isomorphic to the identity functor.

[citeproof:424]

[problem]
Let $V = \mathbb{R}^4$ and let $U = \{(x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \mid x_1 + x_2 = 0 \text{ and } x_3 - x_4 = 0\}$. Find a basis for $U$ and a basis for $U^0$, and verify that $\dim U + \dim U^0 = 4$.
[/problem]

[solution]
Step 1: Find $U$.

The conditions $x_1 = -x_2$ and $x_3 = x_4$ give a $2$-dimensional subspace: $U = \langle (-1, 1, 0, 0), (0, 0, 1, 1) \rangle$.

Step 2: Find $U^0$.

A functional $\theta = a_1\varepsilon_1 + a_2\varepsilon_2 + a_3\varepsilon_3 + a_4\varepsilon_4 \in V^*$ lies in $U^0$ iff $\theta(-1,1,0,0) = -a_1 + a_2 = 0$ and $\theta(0,0,1,1) = a_3 + a_4 = 0$. So $a_2 = a_1$ and $a_4 = -a_3$, giving $U^0 = \langle \varepsilon_1 + \varepsilon_2,\; \varepsilon_3 - \varepsilon_4 \rangle$.

Step 3: Verify.

$\dim U + \dim U^0 = 2 + 2 = 4 = \dim V$, consistent with the Dimension of Annihilator. Note also that the defining equations of $U$ (as row vectors $(1,1,0,0)$ and $(0,0,1,-1)$) are precisely the coefficient vectors of the basis of $U^0$.
[/solution]

Duality reveals that every vector space has a "shadow" — its dual — and that linear maps, subspaces, and systems of equations all have dual counterparts. But duality pairs a vector with a linear functional, producing a scalar. What happens when we pair two vectors directly? A bilinear form $\psi: V \times W \to \mathbb{F}$ is a function linear in each argument, and it opens the door to geometry: inner products, quadratic forms, and symmetric matrices all arise as special cases. The next chapter develops the general theory; later chapters specialise to symmetric, Hermitian, and positive-definite forms.

Bilinear Forms I

Having explored vector spaces, linear maps, and the dual space construction, we now ask: what structures arise when we consider functions that are linear in two variables simultaneously? Such functions — bilinear forms — naturally encode interactions between pairs of vectors. They appear throughout mathematics: the dot product in Euclidean geometry, the evaluation pairing between a space and its dual, and the representation of quadratic expressions via polarisation. The duality framework developed in the previous chapter provides the essential language, since a bilinear form $\psi: V \times W \to \mathbb{F}$ can be "curried" into a linear map $V \to W^*$ (or $W \to V^*$), linking bilinearity directly to duality.

This chapter develops the algebraic theory of bilinear forms on finite-dimensional vector spaces over an arbitrary field $\mathbb{F}$. We establish the matrix representation, derive the change-of-basis formula (which differs fundamentally from that for linear maps), and characterise non-degeneracy in terms of matrix invertibility. The more specialised theory of symmetric and alternating forms — including diagonalisation results — is deferred to a later chapter.

Core Definitions

Bilinear Forms and Their Matrices

In the study of linear maps, we considered functions $\alpha: V \to W$ that are linear in a single variable. Many natural constructions, however, involve functions of two variables that are linear in each argument separately. The dot product $x \cdot y = \sum x_i y_i$ on $\mathbb{R}^n$ is the prototypical example: it is linear in $x$ for fixed $y$ and linear in $y$ for fixed $x$, but not linear as a function of the pair $(x, y)$ jointly (for instance, scaling both arguments by $\lambda$ multiplies the output by $\lambda^2$, not $\lambda$). To capture this structure abstractly, we isolate the property of separate linearity.

[definition:Bilinear Form]
Let $V$ and $W$ be vector spaces over a field $\mathbb{F}$. A function $\psi: V \times W \to \mathbb{F}$ is a bilinear form if:

  1. For each fixed $v \in V$, the map $w \mapsto \psi(v, w)$ is a linear map from $W$ to $\mathbb{F}$.
  2. For each fixed $w \in W$, the map $v \mapsto \psi(v, w)$ is a linear map from $V$ to $\mathbb{F}$.
[/definition]

The definition does not require symmetry ($\psi(v, w) = \psi(w, v)$), skew-symmetry, positivity, or even that $V = W$. These are additional properties that single out important subclasses — symmetric bilinear forms, alternating forms, inner products — which we will study in later chapters. For now, bilinearity is the only constraint.

Just as a linear map between finite-dimensional spaces is completely determined by its action on a basis (by the Extension from Basis theorem), a bilinear form is completely determined by its values on all pairs of basis vectors. These values naturally assemble into a matrix.

[definition:Matrix Of A Bilinear Form]
Let $V$ and $W$ be finite-dimensional $\mathbb{F}$-vector spaces with bases $(e_1, \dots, e_n)$ and $(f_1, \dots, f_m)$ respectively. The matrix of a bilinear form $\psi: V \times W \to \mathbb{F}$ with respect to these bases is the matrix $A \in \mathrm{Mat}_{n,m}(\mathbb{F})$ defined by

\begin{align*} A_{ij} = \psi(e_i, f_j). \end{align*}
[/definition]

Given this matrix, the value of $\psi$ on arbitrary vectors is recovered by a matrix product. If $v = \sum_{i=1}^n x_i e_i$ and $w = \sum_{j=1}^m y_j f_j$, then bilinearity gives

\begin{align*} \psi(v, w) = \sum_{i=1}^n \sum_{j=1}^m x_i A_{ij} y_j = x^\top A y, \end{align*}

where $x = (x_1, \dots, x_n)^\top$ and $y = (y_1, \dots, y_m)^\top$ are the coordinate vectors. Conversely, every matrix $A \in \mathrm{Mat}_{n,m}(\mathbb{F})$ determines a bilinear form on $\mathbb{F}^n \times \mathbb{F}^m$ via $(x, y) \mapsto x^\top A y$. This establishes a bijection between bilinear forms on $V \times W$ and matrices in $\mathrm{Mat}_{n,m}(\mathbb{F})$ (once bases are fixed).
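A minimal sketch of this correspondence in Python follows. The bilinear form `psi`, the helper names `gram_matrix` and `evaluate`, and the sample vectors are all illustrative assumptions. The code builds the matrix $A_{ij} = \psi(e_i, f_j)$ from the values on basis pairs and checks that $x^\top A y$ reproduces $\psi$.

```python
def gram_matrix(psi, basis_v, basis_w):
    """Matrix A with A[i][j] = psi(e_i, f_j)."""
    return [[psi(e, f) for f in basis_w] for e in basis_v]

def evaluate(A, x, y):
    """Compute x^T A y for coordinate vectors x and y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Hypothetical bilinear form on R^2 x R^3.
def psi(v, w):
    return v[0] * w[0] + 2 * v[0] * w[2] - v[1] * w[1]

E = [(1, 0), (0, 1)]                      # standard basis of R^2
F = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]     # standard basis of R^3
A = gram_matrix(psi, E, F)
print(A)                                  # [[1, 0, 2], [0, -1, 0]]

x, y = (3, -1), (2, 5, 1)
print(psi(x, y), evaluate(A, x, y))       # 17 17
```

Note that $A$ has $n = \dim V = 2$ rows and $m = \dim W = 3$ columns, matching the convention $A \in \mathrm{Mat}_{n,m}(\mathbb{F})$.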

[example:Evaluation Pairing]
Let $V$ be a finite-dimensional $\mathbb{F}$-vector space. The evaluation pairing $\mathrm{ev}: V \times V^* \to \mathbb{F}$ defined by $\mathrm{ev}(v, \theta) = \theta(v)$ is a bilinear form. For a fixed basis $(e_1, \dots, e_n)$ of $V$ with dual basis $(\varepsilon_1, \dots, \varepsilon_n)$, the matrix of $\mathrm{ev}$ is the $n \times n$ identity matrix $I_n$, since $\mathrm{ev}(e_i, \varepsilon_j) = \varepsilon_j(e_i) = \delta_{ij}$. This is the prototypical non-degenerate bilinear form: the left map $\mathrm{ev}_L: V \to (V^*)^*$ is the canonical evaluation map to the double dual, and the right map $\mathrm{ev}_R: V^* \to V^*$ is the identity.
[/example]

Induced Linear Maps and Kernels

The Left and Right Maps

A bilinear form $\psi: V \times W \to \mathbb{F}$ can be viewed from two perspectives. Fixing the first argument $v \in V$ gives a linear functional on $W$ (an element of $W^*$); fixing the second argument $w \in W$ gives a linear functional on $V$ (an element of $V^*$). These two "currying" operations produce linear maps that encode the bilinear form entirely and connect it to the duality theory from the previous chapter.

[definition:Left Map]
Let $\psi: V \times W \to \mathbb{F}$ be a bilinear form. The left map of $\psi$ is the function

\begin{align*} \psi_L: V &\to W^* \\ v &\mapsto \psi(v, \cdot), \end{align*}

where $\psi(v, \cdot)$ denotes the linear functional $w \mapsto \psi(v, w)$.
[/definition]

[definition:Right Map]
Let $\psi: V \times W \to \mathbb{F}$ be a bilinear form. The right map of $\psi$ is the function

\begin{align*} \psi_R: W &\to V^* \\ w &\mapsto \psi(\cdot, w), \end{align*}

where $\psi(\cdot, w)$ denotes the linear functional $v \mapsto \psi(v, w)$.
[/definition]

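Both $\psi_L$ and $\psi_R$ are linear maps. For $\psi_L$, linearity of $\psi$ in the second argument shows that each $\psi_L(v) = \psi(v, \cdot)$ is a linear functional on $W$, while linearity of $\psi$ in the first argument shows that $v \mapsto \psi_L(v)$ is itself linear; the argument for $\psi_R$ is symmetric. The bilinear form $\psi$ is recoverable from either map: $\psi(v, w) = \psi_L(v)(w) = \psi_R(w)(v)$.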

The connection to matrices is direct. If $A$ is the matrix of $\psi$ with respect to bases $(e_i)$ and $(f_j)$ with dual bases $(\varepsilon_i)$ and $(\eta_j)$, then the matrix of $\psi_L: V \to W^*$ with respect to $(e_i)$ and $(\eta_j)$ is $A^\top$, and the matrix of $\psi_R: W \to V^*$ with respect to $(f_j)$ and $(\varepsilon_i)$ is $A$. This follows from the definitions: the coefficient of $\eta_j$ in $\psi_L(e_i)$ is $\psi_L(e_i)(f_j) = \psi(e_i, f_j) = A_{ij}$. Since the $i$th column of the matrix of $\psi_L$ lists the coordinates of $\psi_L(e_i)$, its $(j,i)$-entry is $A_{ij}$, so the matrix is $A^\top$. Similarly, the coefficient of $\varepsilon_i$ in $\psi_R(f_j)$ is $\psi_R(f_j)(e_i) = \psi(e_i, f_j) = A_{ij}$, so the $(i,j)$-entry of the matrix of $\psi_R$ is $A_{ij}$ and the matrix is $A$.

Kernels and Non-Degeneracy

The kernels of the left and right maps measure which vectors are "invisible" to the bilinear form — they pair to zero with every vector in the other space.

[definition:Left Kernel]
The left kernel of a bilinear form $\psi: V \times W \to \mathbb{F}$ is

\begin{align*} \ker\psi_L = \{v \in V : \psi(v, w) = 0 \text{ for all } w \in W\}. \end{align*}
[/definition]

[definition:Right Kernel]
The right kernel of a bilinear form $\psi: V \times W \to \mathbb{F}$ is

\begin{align*} \ker\psi_R = \{w \in W : \psi(v, w) = 0 \text{ for all } v \in V\}. \end{align*}
[/definition]

In terms of the representing matrix $A$, the left kernel is the null space of $A^\top$ (identified with a subspace of $V$ via the coordinate isomorphism) and the right kernel is the null space of $A$ (identified with a subspace of $W$). In particular, by the Rank-Nullity Theorem and the equality of row and column rank (Row Rank Equals Column Rank):

\begin{align*} \dim\ker\psi_L = n - \mathrm{rank}(A) = \dim\ker\psi_R + (n - m), \end{align*}

where $n = \dim V$ and $m = \dim W$. When $n = m$, the two kernels have the same dimension.
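
The dimension count above can be sketched in code. The following Python example computes the rank of a matrix by Gaussian elimination over the rationals and then reads off the dimensions of both kernels; the helper names `rank` and `kernel_dimensions` and the sample matrix are assumptions made for illustration.

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    M = [[Fraction(entry) for entry in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        # Look for a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def kernel_dimensions(A):
    """Return (dim ker psi_L, dim ker psi_R) for the form with n-by-m matrix A."""
    n, m = len(A), len(A[0])
    rk = rank(A)
    return n - rk, m - rk

# Hypothetical 2x3 matrix of rank 1 (second row is twice the first).
A = [[1, 2, 3],
     [2, 4, 6]]
left_dim, right_dim = kernel_dimensions(A)
```

Here $n = 2$, $m = 3$ and the rank is $1$, so the two dimensions differ by exactly $n - m = -1$, as the formula predicts.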

The most important special case is when both kernels are trivial — the form detects every non-zero vector.

[definition:Non Degenerate Bilinear Form]
A bilinear form $\psi: V \times W \to \mathbb{F}$ is non-degenerate if $\ker\psi_L = \{\mathbf{0}\}$ and $\ker\psi_R = \{\mathbf{0}\}$. Otherwise, it is degenerate.
[/definition]

Non-degeneracy means that the left and right maps are both injective: no non-zero vector in $V$ is annihilated by all of $W$, and vice versa. When $\psi$ is non-degenerate, $\psi_L: V \to W^*$ and $\psi_R: W \to V^*$ are injective linear maps, and by dimension counting they must be isomorphisms (since non-degeneracy forces $\dim V = \dim W$, as we will see below).

[definition:Rank Of A Bilinear Form]
Let $V$ and $W$ be finite-dimensional and let $\psi: V \times W \to \mathbb{F}$ be a bilinear form. The rank of $\psi$ is $\mathrm{rank}(\psi) = \mathrm{rank}(A)$, where $A$ is the matrix of $\psi$ with respect to any choice of bases.
[/definition]

This is well-defined because the Change of Basis for Bilinear Forms shows that $B = P^\top A Q$ with $P, Q$ invertible, so $\mathrm{rank}(B) = \mathrm{rank}(A)$. The rank of $\psi$ equals the rank of both $\psi_L$ and $\psi_R$, and $\psi$ is non-degenerate if and only if $\mathrm{rank}(\psi) = \dim V = \dim W$.

[example:Degenerate And Non Degenerate Forms]
A non-degenerate form. Define $\psi: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by

\begin{align*} \psi\!\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right) = x_1 y_2 - x_2 y_1. \end{align*}

The matrix in the standard basis is $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, which has $\det A = 1 \neq 0$, so $A$ is invertible and $\psi$ is non-degenerate. This is the standard symplectic form on $\mathbb{R}^2$.

A degenerate form. Define $\phi: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by $\phi(x, y) = x_1 y_1$. The matrix is $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which has rank $1$. The left kernel is $\ker\phi_L = \langle (0,1)^\top \rangle$ and the right kernel is $\ker\phi_R = \langle (0,1)^\top \rangle$: the vector $(0,1)^\top$ is invisible to $\phi$ from both sides.
[/example]

Change of Basis

The Transformation Law

The matrix of a linear map $\alpha: V \to W$ transforms as $B = Q^{-1}AP$ under a change of basis (by the Change of Basis Formula). For bilinear forms, the transformation law is different — and the difference is fundamental. The matrix of a linear map involves one "input" basis (for the domain) and one "output" basis (for the codomain), while the matrix of a bilinear form involves two "input" bases (one for each argument). This structural difference manifests in the appearance of transposes rather than inverses.

[quotetheorem:391]

The formula $B = P^\top A Q$ differs from the linear map formula $B = Q^{-1}AP$ in a crucial way: the left-hand factor is a transpose, $P^\top$, rather than an inverse. This reflects the fact that a bilinear form is covariant in both arguments (it transforms like a tensor of type $(0,2)$), whereas a linear map is a tensor of type $(1,1)$: covariant in the domain and contravariant in the codomain. In the important special case $V = W$ with the same basis change applied to both arguments (so $P = Q$), the formula becomes $B = P^\top A P$, which defines the congruence relation on matrices. The proof is a direct computation: expand $\psi(v_i, w_j)$ using bilinearity and the change-of-basis relations, and recognise the result as the $(i,j)$-entry of $P^\top A Q$.

[citeproof:391]

The transformation law defines a natural equivalence relation on matrices. Two matrices $A, B \in \mathrm{Mat}_{n,m}(\mathbb{F})$ represent the same bilinear form in different bases if and only if $B = R^\top A S$ for some invertible $R \in \mathrm{GL}_n(\mathbb{F})$ and $S \in \mathrm{GL}_m(\mathbb{F})$. Since transposition is a bijection of $\mathrm{GL}_n(\mathbb{F})$ onto itself, this is the same as the equivalence relation $B = R A S$ for invertible $R, S$. By the Canonical Form for Linear Maps, the equivalence class is determined by the rank alone: every bilinear form of rank $r$ can be represented by the matrix $\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$ in some pair of bases.

Non-Degeneracy and Invertibility

The characterisation of non-degeneracy in matrix terms follows from the relationship between the kernels and the null spaces of $A$ and $A^\top$.

[quotetheorem:392]

This result has several important consequences. First, non-degenerate bilinear forms can only pair spaces of equal dimension — an asymmetric pairing necessarily has a non-trivial kernel on the larger side. Second, when $\psi: V \times W \to \mathbb{F}$ is non-degenerate, the left map $\psi_L: V \to W^*$ and the right map $\psi_R: W \to V^*$ are both isomorphisms (since they are injective linear maps between spaces of equal dimension, by Injective iff Surjective in Finite Dimensions). This gives a concrete way to identify $V$ with $W^*$: every non-degenerate bilinear form provides a "dictionary" translating vectors in $V$ to linear functionals on $W$ and vice versa. The proof reduces non-degeneracy to the condition that both $A$ and $A^\top$ have trivial null spaces, which forces $A$ to be square (by rank-nullity) and invertible.

[citeproof:392]

[example:Change Of Basis Computation]
Consider the bilinear form $\psi: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ defined by $\psi(x, y) = 3x_1 y_1 - x_1 y_2 - x_2 y_1 + 4x_2 y_2$. Its matrix in the standard basis is

\begin{align*} A = \begin{pmatrix} 3 & -1 \\ -1 & 4 \end{pmatrix}. \end{align*}

Since $\det A = 12 - 1 = 11 \neq 0$, the form is non-degenerate. Now change to the basis $g_1 = (1,1)^\top$, $g_2 = (1,-1)^\top$ for both copies of $\mathbb{R}^2$. The change-of-basis matrix is $P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. By the Change of Basis for Bilinear Forms:

\begin{align*} B &= P^\top A P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ -1 & 4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \\ &= \begin{pmatrix} 2 & 3 \\ 4 & -5 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 5 & -1 \\ -1 & 9 \end{pmatrix}. \end{align*}

We can verify: $\psi(g_1, g_1) = 3(1)(1) - 1(1)(1) - 1(1)(1) + 4(1)(1) = 5 = B_{11}$. $\checkmark$
[/example]
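
The computation in this example can be reproduced in a few lines. The sketch below builds $B = P^\top A P$ by matrix multiplication and compares it with the entries $\psi(g_i, g_j)$ evaluated directly from the formula for $\psi$; the helper names `transpose`, `matmul` and `psi` are illustrative choices.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product of X (p-by-q) and Y (q-by-r), both lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def psi(x, y):
    """The form from the example: 3 x1 y1 - x1 y2 - x2 y1 + 4 x2 y2."""
    return 3 * x[0] * y[0] - x[0] * y[1] - x[1] * y[0] + 4 * x[1] * y[1]

A = [[3, -1],
     [-1, 4]]
g = [[1, 1], [1, -1]]   # the new basis vectors g_1 and g_2
P = transpose(g)        # columns of P are g_1 and g_2

# Matrix of psi in the new basis, computed two ways.
B = matmul(matmul(transpose(P), A), P)
B_direct = [[psi(g[i], g[j]) for j in range(2)] for i in range(2)]
```

Both routes give the same matrix, which checks all four entries of $B$ rather than only $B_{11}$.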

The Space of Bilinear Forms

Bilinear Forms as a Vector Space

The set of all bilinear forms on $V \times W$ is itself a vector space under pointwise operations: for bilinear forms $\psi, \phi: V \times W \to \mathbb{F}$ and $\lambda \in \mathbb{F}$, define $(\psi + \phi)(v, w) = \psi(v, w) + \phi(v, w)$ and $(\lambda\psi)(v, w) = \lambda\,\psi(v, w)$. The zero form $\psi(v, w) = 0$ serves as the additive identity. We denote this space by $\mathrm{Bil}(V, W; \mathbb{F})$.

The matrix representation provides an isomorphism $\mathrm{Bil}(V, W; \mathbb{F}) \cong \mathrm{Mat}_{n,m}(\mathbb{F})$ (once bases are fixed), since the map $\psi \mapsto A$ is linear and bijective. In particular, $\dim\mathrm{Bil}(V, W; \mathbb{F}) = (\dim V)(\dim W)$. This is the same dimension as $\mathcal{L}(V, W^*)$, which is no coincidence: the left map $\psi \mapsto \psi_L$ is a vector space isomorphism $\mathrm{Bil}(V, W; \mathbb{F}) \cong \mathcal{L}(V, W^*)$.

[problem]
Let $\psi: \mathbb{R}^3 \times \mathbb{R}^2 \to \mathbb{R}$ be defined by $\psi(v, w) = v_1 w_1 + v_2 w_2 - v_3 w_1$. Find the matrix of $\psi$, its rank, and describe the left and right kernels.
[/problem]

[solution]
Step 1: Find the matrix of $\psi$.

Evaluating on standard basis pairs:

\begin{align*} A_{ij} = \psi(e_i, f_j), \quad \text{giving} \quad A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \end{pmatrix}. \end{align*}

Explicitly: $\psi(e_1, f_1) = 1$, $\psi(e_1, f_2) = 0$, $\psi(e_2, f_1) = 0$, $\psi(e_2, f_2) = 1$, $\psi(e_3, f_1) = -1$, $\psi(e_3, f_2) = 0$.

Step 2: Compute the rank.

Row-reduce $A$: the matrix already has pivots in positions $(1,1)$ and $(2,2)$, so $\mathrm{rank}(A) = 2$. Since $A$ is $3 \times 2$ and not square, $\psi$ is necessarily degenerate by the Non-Degeneracy and Invertibility theorem.

Step 3: Find the right kernel.

Solve $Ay = \mathbf{0}$: $y_1 = 0$, $y_2 = 0$, and $-y_1 = 0$ (redundant). So $\ker\psi_R = \{\mathbf{0}\}$ — the right kernel is trivial.

Step 4: Find the left kernel.

Solve $A^\top x = \mathbf{0}$, where $A^\top = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}$. This gives $x_1 = x_3$ and $x_2 = 0$, with $x_3$ free. So $\ker\psi_L = \langle (1, 0, 1)^\top \rangle$, which has dimension $1$.

Step 5: Verify dimensions.

By the Rank-Nullity Theorem: $\dim\ker\psi_L = 3 - \mathrm{rank}(A^\top) = 3 - 2 = 1$ and $\dim\ker\psi_R = 2 - \mathrm{rank}(A) = 2 - 2 = 0$. Both agree with our computations. $\checkmark$
[/solution]

Bilinear forms pair vectors to produce scalars — but some pairings are more special than others. A bilinear form on $V \times V$ (same space twice) that is alternating — $\psi(v, v) = 0$ for all $v$ — turns out to encode a fundamental scalar invariant of square matrices: the determinant. The determinant detects invertibility, measures volume distortion, and connects the algebraic structure of linear maps to the geometric structure of the spaces they act on. The next chapter develops this theory from the perspective of multilinear algebra, revealing the determinant as the unique alternating $n$-linear form on the columns of a matrix, normalised to give $\det I = 1$.

Determinants

Having explored bilinear forms and their relationship to linear maps and duality, we now confront a fundamental question: how can we assign a single scalar to a square matrix that detects invertibility, measures how linear maps distort volume, and behaves multiplicatively under composition? The answer is the determinant — an alternating multilinear function of the columns (or rows) of a matrix. The determinant ties together every major theme of the course so far: it governs invertibility (linking to the rank-nullity theorem), it transforms predictably under composition (linking to matrix multiplication), and it arises naturally from the theory of multilinear forms (extending the bilinear theory of the previous chapter to $n$ variables).

We define the determinant via the Leibniz formula, establish its characterisation as the unique normalised volume form, and derive its key structural properties: transpose invariance, multiplicativity, and the invertibility criterion. We then develop the computational tools — cofactor expansion, the adjugate matrix, and the block triangular formula — that make determinants practical. Throughout, we work over an arbitrary field $\mathbb{F}$ and assume familiarity with permutations and the sign homomorphism $\varepsilon: S_n \to \{+1, -1\}$.

Core Definitions

Volume Forms and the Determinant

The geometric motivation for determinants comes from the notion of oriented volume. The volume of the parallelepiped spanned by $n$ vectors in $\mathbb{R}^n$ should be linear in each edge (doubling one edge doubles the volume), and it should vanish when two edges coincide (the parallelepiped is degenerate). These two properties — multilinearity and the alternating condition — characterise the determinant up to a scalar.

[definition:Volume Form]
A volume form on $\mathbb{F}^n$ is a function $d: (\mathbb{F}^n)^n \to \mathbb{F}$ satisfying:

  1. Multilinearity: for each $i \in \{1, \dots, n\}$, the map $v \mapsto d(v_1, \dots, v_{i-1}, v, v_{i+1}, \dots, v_n)$ is linear.
  2. Alternating: if $v_i = v_j$ for some $i \neq j$, then $d(v_1, \dots, v_n) = 0$.
[/definition]

The alternating condition implies antisymmetry: swapping two arguments changes the sign. To see this, expand $d(\dots, v_i + v_j, \dots, v_i + v_j, \dots) = 0$ using multilinearity. This gives $d(\dots, v_i, \dots, v_j, \dots) + d(\dots, v_j, \dots, v_i, \dots) = 0$ (the two "diagonal" terms vanish by the alternating property), so each swap introduces a factor of $-1$.

The determinant is defined by the Leibniz formula, which sums over all permutations:

[definition:Determinant]
Let $A \in \mathrm{Mat}_n(\mathbb{F})$. The determinant of $A$ is

\begin{align*} \det A = \sum_{\sigma \in S_n} \varepsilon(\sigma) \prod_{i=1}^n A_{i,\sigma(i)}, \end{align*}

where the sum runs over all permutations $\sigma$ in the symmetric group $S_n$, and $\varepsilon(\sigma) \in \{+1, -1\}$ is the sign of $\sigma$.
[/definition]

Each term in the sum picks one entry from each row and each column (the entry in row $i$ and column $\sigma(i)$) and weights the product by the sign of the permutation; the sum runs over all $n!$ such selections. For small matrices, this is directly computable: $\det(a) = a$ for $1 \times 1$, $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$ for $2 \times 2$, and the familiar six-term formula for $3 \times 3$.
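
The Leibniz formula translates almost literally into code. The following Python sketch sums over all permutations, computing the sign by counting inversions; the function names `sign` and `det_leibniz` are illustrative, and the approach is only practical for small $n$ because of the $n!$ terms.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images,
    computed from the parity of its number of inversions."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Determinant of a square matrix (list of rows) via the Leibniz formula."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]   # entry in row i, column sigma(i)
        total += term
    return total

# For a 2x2 matrix this reduces to ad - bc.
two_by_two = det_leibniz([[1, 2], [3, 4]])   # 1*4 - 2*3
```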

The Universal Property

Characterisation of the Determinant

The Leibniz formula defines the determinant, but the question remains: why this particular formula? The answer is that the determinant is the unique volume form normalised so that $\det(e_1, \dots, e_n) = 1$. More precisely, every volume form is a scalar multiple of the determinant. This is the universal property that governs all subsequent theory.

[quotetheorem:393]

This result has three important consequences. First, the space of alternating multilinear forms on $(\mathbb{F}^n)^n$ is one-dimensional, spanned by $\det$. Second, the determinant is the only alternating multilinear function of the columns of a matrix that sends the identity to $1$; this can be taken as an axiomatic definition. Third, for any linear map $\alpha: \mathbb{F}^n \to \mathbb{F}^n$ with matrix $A$, and any volume form $d$, we have $d(\alpha(v_1), \dots, \alpha(v_n)) = (\det A) \cdot d(v_1, \dots, v_n)$. This says that $\det A$ measures how $\alpha$ rescales oriented volume, independent of the choice of volume form. The proof expands each column in the standard basis using multilinearity, eliminates repeated indices using the alternating property, and recognises the surviving terms as the Leibniz formula.

[citeproof:393]

[example:Upper Triangular Determinant]
If $A$ is upper triangular (i.e., $A_{ij} = 0$ for $i > j$), then $\det A = \prod_{i=1}^n A_{ii}$. In the Leibniz sum, any permutation $\sigma \neq \mathrm{id}$ has some $i$ with $\sigma(i) < i$, making $A_{i,\sigma(i)} = 0$ (since $\sigma(i) < i$ means we are below the diagonal). Only the identity permutation contributes, giving the product of diagonal entries. The same holds for lower triangular matrices (by transpose invariance, established below).
[/example]

Structural Properties

Transpose Invariance

The determinant treats rows and columns symmetrically: every statement about column operations has a dual statement about row operations, and vice versa. This is because the determinant is invariant under transposition.

[quotetheorem:394]

Transpose invariance means that the determinant is simultaneously multilinear and alternating in the rows and in the columns. All results proved using column properties have immediate row analogues. The proof reindexes the Leibniz sum using $\sigma \mapsto \sigma^{-1}$, exploiting the fact that inversion is a sign-preserving bijection on $S_n$.

[citeproof:394]

Multiplicativity

The most important structural property of the determinant is multiplicativity: the determinant of a product is the product of the determinants. This makes $\det: \mathrm{GL}_n(\mathbb{F}) \to \mathbb{F}^\times$ a group homomorphism from the general linear group to the multiplicative group of the field.

[quotetheorem:395]

Multiplicativity has far-reaching consequences. It implies that similar matrices have the same determinant (if $B = P^{-1}AP$, then $\det B = \det A$), so the determinant is an invariant of the underlying linear map, not just the matrix. It also implies $\det(A^{-1}) = (\det A)^{-1}$ for invertible $A$ and $\det(A^k) = (\det A)^k$ for every integer $k \geq 0$. The proof defines an auxiliary alternating multilinear form $d(b_1, \dots, b_n) = \det(Ab_1 \mid \cdots \mid Ab_n)$, applies the Determinant as Universal Volume Form to identify $d$ as $(\det A) \cdot \det$, and evaluates on the columns of $B$.

[citeproof:395]
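
As a quick sanity check of multiplicativity, the following Python sketch compares $\det(AB)$ with $(\det A)(\det B)$ for a pair of $2 \times 2$ matrices. The matrices and helper names are illustrative assumptions; a numerical check of one example is of course not a proof.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(M):
    """Determinant of a 2x2 matrix: ad - bc."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Two sample matrices that do not commute.
A = [[2, 1],
     [0, 3]]
B = [[4, 5],
     [1, 2]]

AB = matmul(A, B)
BA = matmul(B, A)
lhs = det2(AB)
rhs = det2(A) * det2(B)
```

Although $AB \neq BA$ for this pair, both products have the same determinant, since $\det(AB) = (\det A)(\det B) = \det(BA)$.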

The Invertibility Criterion

The determinant provides a one-number test for invertibility: a square matrix is invertible if and only if its determinant is non-zero. This connects the abstract multilinear construction to the core question of the linear maps chapter.

[quotetheorem:396]

The equivalence of invertibility, non-zero determinant, and full rank unifies three different perspectives: algebraic (existence of an inverse), multilinear (non-vanishing of a volume form), and dimensional (the rank-nullity theorem). The proof uses multiplicativity for (1) $\Rightarrow$ (2), linear dependence of columns for (2) $\Rightarrow$ (3), and the Injective iff Surjective in Finite Dimensions theorem for (3) $\Rightarrow$ (1).

[citeproof:396]

[example:Determinant Detects Invertibility]
The matrix $A = MATHENVgxq8ujP42END$ has $\det A = 1(0 - 24) - 2(0 - 20) + 3(0 - 5) = -24 + 40 - 15 = 1 \neq 0$, so $A$ is invertible. By contrast, the matrix $B = MATHENVgxq8ujP43END$ has $\det B = 1(45 - 48) - 2(36 - 42) + 3(32 - 35) = -3 + 12 - 9 = 0$, so $B$ is singular. Indeed, $R_3 = 2R_2 - R_1$ (check: $(7,8,9) = 2(4,5,6) - (1,2,3)$), confirming linear dependence.
[/example]

Computational Tools

Cofactor Expansion

The Leibniz formula involves $n!$ terms, which grows far too rapidly for direct computation. Cofactor expansion reduces an $n \times n$ determinant to $n$ determinants of size $(n-1) \times (n-1)$, providing a recursive algorithm. Combined with row reduction (which introduces known factors), this yields efficient determinant computation.

[quotetheorem:398]

Cofactor expansion is the standard method for hand computation of small determinants ($n \leq 4$). The strategy is to expand along a row or column with many zeros, minimising the number of non-trivial minors. The proof uses multilinearity in one column to distribute, then evaluates each resulting determinant (which has a standard basis vector in one position) by row and column swaps to reduce to a minor.

[citeproof:398]
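
The recursive structure of cofactor expansion can be sketched as follows. This Python example always expands along the first row and skips zero entries; the names `minor` and `det_cofactor` are illustrative, and no attempt is made to choose the best row or column.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # With 0-based column index j, the cofactor sign (-1)^(1 + (j+1)) equals (-1)^j.
    return sum((-1) ** j * A[0][j] * det_cofactor(minor(A, 0, j))
               for j in range(n)
               if A[0][j] != 0)

# The matrix with rows (1,2,3), (4,5,6), (7,8,9) has linearly dependent rows.
singular = det_cofactor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Skipping zero entries is what makes expansion along a sparse row cheap: each zero removes an entire recursive call.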

The Adjugate and Explicit Inversion

The cofactors appearing in cofactor expansion assemble into a matrix, the adjugate, that provides an explicit formula for the inverse. While computationally expensive (on the order of $n \cdot n!$ operations if each of the $n^2$ minors is expanded by cofactors, or $O(n^5)$ if each minor is computed by row reduction at $O(n^3)$ apiece), the adjugate formula is theoretically important: it shows that matrix inversion is a rational operation in the entries of $A$, with $\det A$ in the denominator.

[definition:Adjugate Matrix]
Let $A \in \mathrm{Mat}_n(\mathbb{F})$. The adjugate (or classical adjoint) of $A$ is the matrix $\mathrm{adj}(A) \in \mathrm{Mat}_n(\mathbb{F})$ defined by

\begin{align*} (\mathrm{adj}\, A)_{ij} = (-1)^{i+j} \det \hat{A}_{ji}, \end{align*}

where $\hat{A}_{ji}$ is the $(n-1) \times (n-1)$ matrix obtained by deleting row $j$ and column $i$ from $A$. Note the transposition: the $(i,j)$-entry of $\mathrm{adj}(A)$ involves the $(j,i)$-cofactor.
[/definition]

[quotetheorem:397]

The identity $A \cdot \mathrm{adj}(A) = (\det A) I_n$ is remarkable: it holds for all square matrices, not just invertible ones. When $\det A = 0$, it says that $A \cdot \mathrm{adj}(A) = \mathbf{0}$, which means every column of $\mathrm{adj}(A)$ lies in $\ker A$. When $\det A \neq 0$, dividing by $\det A$ gives the explicit inverse formula $A^{-1} = \frac{1}{\det A} \mathrm{adj}(A)$; applied to a linear system $Ax = b$, this formula yields Cramer's rule. The proof computes the $(i,j)$-entry of $A \cdot \mathrm{adj}(A)$: on the diagonal ($i = j$) it reduces to cofactor expansion; off the diagonal ($i \neq j$) it equals the determinant of a matrix with two identical rows, hence zero.

[citeproof:397]
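
The adjugate identity can be verified on a concrete matrix. The following Python sketch builds $\mathrm{adj}(A)$ entry by entry from the definition and multiplies it by $A$; the helper names and the choice of sample matrix are assumptions made for illustration.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row.
    The empty (0x0) matrix has determinant 1 by convention."""
    if len(A) == 0:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adjugate(A):
    """Entry (i, j) is the (j, i)-cofactor of A: note the transposition."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Sample invertible matrix (illustrative choice).
A = [[1, 3, 2],
     [4, 1, 3],
     [2, 5, 2]]
adjA = adjugate(A)
product = matmul(A, adjA)   # should be det(A) times the identity
```

Because only integer arithmetic is used, the identity $A \cdot \mathrm{adj}(A) = (\det A) I_n$ holds exactly here, with no division until the final step of forming $A^{-1}$.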

Block Triangular Matrices

A useful computational shortcut applies to matrices with block triangular structure. When the lower-left (or upper-right) block is zero, the determinant factors as a product of the determinants of the diagonal blocks. This generalises the elementary fact that the determinant of a triangular matrix is the product of its diagonal entries.

[quotetheorem:399]

This result is used constantly in practice — whenever a matrix has a natural block decomposition, the determinant can be computed block by block. The proof analyses the Leibniz sum and shows that any permutation contributing a non-zero term must preserve the block structure (mapping indices within each block to indices within the same block), so the sum factors as a product of two independent Leibniz sums.

[citeproof:399]

[example:Block Determinant Computation]
Consider the $4 \times 4$ matrix

\begin{align*} M = \begin{pmatrix} 2 & 1 & 3 & 7 \\ 0 & 3 & -1 & 2 \\ 0 & 0 & 4 & 5 \\ 0 & 0 & 1 & 2 \end{pmatrix}. \end{align*}

This is block upper triangular with $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 4 & 5 \\ 1 & 2 \end{pmatrix}$. By the Block Triangular Determinant:

\begin{align*} \det M = (\det A)(\det B) = (2 \cdot 3 - 1 \cdot 0)(4 \cdot 2 - 5 \cdot 1) = 6 \cdot 3 = 18. \end{align*}
[/example]

Determinants and Linear Maps

The Determinant as a Volume Distortion Factor

The universal property established earlier gives the determinant a clean geometric interpretation: for any linear map $\alpha: \mathbb{F}^n \to \mathbb{F}^n$ with matrix $A$, and any vectors $v_1, \dots, v_n \in \mathbb{F}^n$:

\begin{align*} \det(\alpha(v_1), \dots, \alpha(v_n)) = (\det A) \cdot \det(v_1, \dots, v_n). \end{align*}

The determinant of $A$ is the factor by which $\alpha$ scales the oriented volume of every parallelepiped. When $\det A > 0$ (in an ordered field), $\alpha$ preserves orientation; when $\det A < 0$, it reverses orientation; when $\det A = 0$, every parallelepiped is collapsed to a lower-dimensional object.

Multiplicativity of the determinant (Determinant Multiplicativity) is the algebraic expression of this geometric fact: composing two linear maps multiplies their volume distortion factors. If $\alpha$ scales volume by $\det A$ and $\beta$ scales volume by $\det B$, then $\beta \circ \alpha$ scales volume by $(\det B)(\det A) = \det(BA)$.

Basis Independence

The determinant of a linear map $\alpha: V \to V$ (an endomorphism) can be defined without choosing a basis. If $A$ is the matrix of $\alpha$ with respect to some basis and $B = P^{-1}AP$ is the matrix with respect to another basis, then by multiplicativity:

\begin{align*} \det B = \det(P^{-1}AP) = (\det P^{-1})(\det A)(\det P) = \det A. \end{align*}

We may therefore write $\det\alpha = \det A$ unambiguously. This makes the determinant an invariant of the endomorphism itself, not just its matrix representation. This observation will be essential when we study eigenvalues and the characteristic polynomial in the chapter on endomorphisms.

[problem]
Compute the determinant of the matrix

\begin{align*} A = \begin{pmatrix} 1 & 3 & 2 \\ 4 & 1 & 3 \\ 2 & 5 & 2 \end{pmatrix} \end{align*}

using cofactor expansion along the first row and verify using row reduction.
[/problem]

[solution]
Method 1: Cofactor expansion along row 1.

By Cofactor Expansion:

\begin{align*} \det A &= 1 \cdot (-1)^{1+1} \det\begin{pmatrix} 1 & 3 \\ 5 & 2 \end{pmatrix} + 3 \cdot (-1)^{1+2} \det\begin{pmatrix} 4 & 3 \\ 2 & 2 \end{pmatrix} + 2 \cdot (-1)^{1+3} \det\begin{pmatrix} 4 & 1 \\ 2 & 5 \end{pmatrix} \\ &= 1(2 - 15) - 3(8 - 6) + 2(20 - 2) \\ &= -13 - 6 + 36 = 17. \end{align*}

Method 2: Row reduction.

Apply $R_2 \to R_2 - 4R_1$ and $R_3 \to R_3 - 2R_1$ (these are shear operations, which do not change the determinant):

\begin{align*} \begin{pmatrix} 1 & 3 & 2 \\ 0 & -11 & -5 \\ 0 & -1 & -2 \end{pmatrix}. \end{align*}

Apply $R_2 \to R_2 - 11R_3$:

\begin{align*} \begin{pmatrix} 1 & 3 & 2 \\ 0 & 0 & 17 \\ 0 & -1 & -2 \end{pmatrix}. \end{align*}

Swap $R_2 \leftrightarrow R_3$ (introduces a factor of $-1$):

\begin{align*} \begin{pmatrix} 1 & 3 & 2 \\ 0 & -1 & -2 \\ 0 & 0 & 17 \end{pmatrix}. \end{align*}

This is upper triangular with diagonal product $1 \cdot (-1) \cdot 17 = -17$. Accounting for the single row swap, $\det A = (-1) \cdot (-17) = 17$. $\checkmark$
[/solution]
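
The row-reduction method generalises to an algorithm: shears leave the determinant unchanged, each swap flips its sign, and the answer is the signed product of the pivots. The Python sketch below implements this over the rationals; the function name `det_row_reduction` is illustrative, and because it always picks the first available pivot it follows a different sequence of operations from the hand computation above.

```python
from fractions import Fraction

def det_row_reduction(A):
    """Determinant via reduction to upper triangular form.
    Shears do not change the determinant; each row swap flips its sign."""
    M = [[Fraction(entry) for entry in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)   # no pivot in this column: the matrix is singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]
    return result

A = [[1, 3, 2],
     [4, 1, 3],
     [2, 5, 2]]
d = det_row_reduction(A)
```

Using `Fraction` keeps the arithmetic exact, so the result agrees with the cofactor expansion without rounding error.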

Determinants detect invertibility and measure volume distortion — but they are coarse invariants. Two matrices with the same determinant (and even the same trace) may behave completely differently: $I_2$ and $MATHENVgxq8ujP56END$ both have determinant $1$ and trace $2$, yet one is the identity and the other is a shear. To classify linear maps up to similarity — the natural equivalence for maps from a space to itself — we need finer tools: the characteristic polynomial, eigenvalues, and ultimately the Jordan normal form. The next chapter develops this classification.

Endomorphisms

Having explored linear maps between different spaces, we now specialise to the richest case: linear maps from a finite-dimensional vector space to itself. An endomorphism $\alpha: V \to V$ can be composed with itself, evaluated on polynomials, and studied through invariants — trace, determinant, eigenvalues, characteristic and minimal polynomials — that depend only on the map, not on any choice of basis. Because the domain and codomain coincide, the change-of-basis formula simplifies from $B = Q^{-1}AP$ (with two independent basis changes) to $B = P^{-1}AP$ (a single conjugation), and the resulting equivalence relation — similarity — is the organising principle of the chapter.

Our goal is the classification of endomorphisms up to similarity. Over an algebraically closed field, this classification is complete: every endomorphism has a Jordan normal form, a canonical block-diagonal matrix that is unique up to the ordering of the blocks. The path to this result passes through eigenvalues and eigenspaces, the characteristic and minimal polynomials, diagonalisability criteria, the Cayley–Hamilton theorem, and the generalised eigenspace decomposition. All the tools we need — determinants, rank-nullity, the dimension formula for direct sums — are already in place.

Core Definitions

Endomorphisms and Similarity

An endomorphism is a linear map from a vector space to itself. The crucial difference from a general linear map $\alpha: U \to V$ is that we can compose $\alpha$ with itself: $\alpha^2 = \alpha \circ \alpha$, $\alpha^3 = \alpha \circ \alpha \circ \alpha$, and so on. This makes $\mathrm{End}(V)$ not just a vector space but a (non-commutative) algebra.

[definition:Endomorphism]
Let $V$ be a finite-dimensional vector space over a field $\mathbb{F}$. An endomorphism of $V$ is a linear map $\alpha: V \to V$. The set of all endomorphisms of $V$ is denoted $\mathrm{End}(V)$, which forms an $\mathbb{F}$-algebra under pointwise addition, scalar multiplication, and composition. The identity map is denoted $\mathrm{id}_V$ (or simply $\mathrm{id}$). The invertible endomorphisms form the general linear group $\mathrm{GL}(V)$.
[/definition]

When representing an endomorphism by a matrix, we use the same basis for both the domain and the codomain. This convention is essential: it means that changing the basis conjugates the matrix rather than applying a general equivalence transformation.

[definition:Similar Matrices]
Two matrices $A, B \in \mathrm{Mat}_n(\mathbb{F})$ are similar (or conjugate), written $A \sim B$, if there exists $P \in \mathrm{GL}_n(\mathbb{F})$ with $B = P^{-1}AP$. Similarity is an equivalence relation on $\mathrm{Mat}_n(\mathbb{F})$.
[/definition]

The similarity classes are precisely the orbits of the conjugation action of $\mathrm{GL}_n(\mathbb{F})$ on $\mathrm{Mat}_n(\mathbb{F})$, and they correspond to the distinct endomorphisms of an $n$-dimensional space (up to choice of basis). The classification problem is to find a canonical representative in each class.

The Change-of-Basis Formula

[quotetheorem:400]

This is the special case of the general Change of Basis Formula where domain and codomain are the same space. Because the same basis change $P$ is applied to both, the two independent matrices $P$ and $Q$ in the general formula collapse to a single $P$, giving conjugation $B = P^{-1}AP$ rather than equivalence $B = Q^{-1}AP$.

[citeproof:400]

Similarity Invariants

Trace and Determinant

A similarity invariant is a quantity that takes the same value on all matrices in a similarity class — equivalently, a property of the endomorphism itself, not its matrix representation. The trace and determinant are the most basic examples.

[definition:Trace]
The trace of a matrix $A \in \mathrm{Mat}_n(\mathbb{F})$ is $\mathrm{tr}\, A = \sum_{i=1}^n A_{ii}$.
[/definition]

[quotetheorem:401]

The cyclic property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ is a simple but powerful identity. It immediately implies trace invariance under similarity: $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(A)$. Determinant invariance follows from the Determinant Multiplicativity theorem. These invariances justify defining the trace and determinant of an endomorphism $\alpha$ as $\mathrm{tr}\,\alpha = \mathrm{tr}\, A$ and $\det\alpha = \det A$ for any representing matrix $A$.

[citeproof:401]
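
The cyclic property is easy to test numerically. The following Python sketch compares $\mathrm{tr}(AB)$ and $\mathrm{tr}(BA)$ for two sample matrices; the matrices and helper names are illustrative assumptions.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [5, -2]]

AB = matmul(A, B)
BA = matmul(B, A)
trace_AB = trace(AB)
trace_BA = trace(BA)
```

The two products are different matrices, yet their traces coincide, which is all the similarity-invariance argument requires.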

The Characteristic Polynomial

The trace and determinant are the simplest similarity invariants, but they are far from sufficient to distinguish similarity classes. (For instance, the identity matrix $I_2$ and the matrix $MATHENVgxq8ujP57END$ both have trace $2$ and determinant $1$, but are not similar: the only matrix similar to $I_2$ is $I_2$ itself, since $P^{-1} I_2 P = I_2$ for every invertible $P$.) The characteristic polynomial packages all eigenvalue information into a single invariant.

[definition:Eigenvalue And Eigenvector]
Let $\alpha \in \mathrm{End}(V)$. A scalar $\lambda \in \mathbb{F}$ is an eigenvalue of $\alpha$ if there exists a non-zero vector $v \in V$ with $\alpha(v) = \lambda v$. Such a vector is an eigenvector corresponding to $\lambda$. The $\lambda$-eigenspace is $E_\alpha(\lambda) = \ker(\alpha - \lambda\,\mathrm{id})$.
[/definition]

[definition:Characteristic Polynomial]
The characteristic polynomial of $\alpha \in \mathrm{End}(V)$ is $\chi_\alpha(t) = \det(t\,\mathrm{id} - \alpha)$. This is a monic polynomial of degree $n = \dim V$, and its roots (in $\mathbb{F}$ or an extension) are exactly the eigenvalues of $\alpha$.
[/definition]

[quotetheorem:402]

Well-definedness of $\chi_\alpha$ follows immediately from the Change of Basis for Endomorphisms: $tI - B = P^{-1}(tI - A)P$, so $\det(tI - B) = \det(tI - A)$ by Determinant Multiplicativity.

[citeproof:402]

The coefficients of $\chi_\alpha(t) = t^n - (\mathrm{tr}\,\alpha)\,t^{n-1} + \cdots + (-1)^n \det\alpha$ recover the trace (as the sum of eigenvalues) and the determinant (as the product of eigenvalues). The characteristic polynomial is a strictly finer invariant: it determines both the trace and the determinant, but not vice versa.
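For a concrete check of the coefficient formula, the $2 \times 2$ case can be expanded by hand. This is a sketch for a matrix $A$ with rows $(a, b)$ and $(c, d)$; the entry names are introduced here for illustration.

\begin{align*} \chi_A(t) = \det\begin{pmatrix} t - a & -b \\ -c & t - d \end{pmatrix} = (t-a)(t-d) - bc = t^2 - (a+d)\,t + (ad - bc) = t^2 - (\mathrm{tr}\,A)\,t + \det A. \end{align*}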

[example:Trace And Determinant Invariance]
Let $A = MATHENVgxq8ujP58END$ and $P = MATHENVgxq8ujP59END$, so $P^{-1} = MATHENVgxq8ujP60END$. Then $B = P^{-1}AP = MATHENVgxq8ujP61END$. We verify: $\mathrm{tr}\, A = 5 = \mathrm{tr}\, B$ and $\det A = -2 = \det B$. The characteristic polynomial is $\chi_A(t) = t^2 - 5t - 2 = \chi_B(t)$.
[/example]

Eigenspaces and Diagonalisability

Linear Independence of Eigenvectors

The first structural result about eigenvectors is that eigenvectors for distinct eigenvalues are automatically linearly independent. This places an upper bound on the number of eigenvalues ($\leq \dim V$) and is the first step toward diagonalisation.

[quotetheorem:403]

The proof is by induction on the number of distinct eigenvalues: apply $\alpha$ to a dependence relation to shift all coefficients by their eigenvalues, then subtract to eliminate one eigenspace and apply the inductive hypothesis.
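The inductive step is easiest to see for two eigenvalues. As a sketch, suppose $v_1, v_2$ are eigenvectors for distinct eigenvalues $\lambda_1 \neq \lambda_2$ and $a_1 v_1 + a_2 v_2 = \mathbf{0}$ (the coefficient names $a_1, a_2$ are introduced here for illustration). Applying $\alpha$ to the relation, and separately multiplying it by $\lambda_2$, gives

\begin{align*} \lambda_1 a_1 v_1 + \lambda_2 a_2 v_2 &= \mathbf{0}, \\ \lambda_2 a_1 v_1 + \lambda_2 a_2 v_2 &= \mathbf{0}, \end{align*}

and subtracting leaves $(\lambda_1 - \lambda_2)\,a_1 v_1 = \mathbf{0}$. Since $\lambda_1 \neq \lambda_2$ and $v_1 \neq \mathbf{0}$, this forces $a_1 = 0$, and then $a_2 v_2 = \mathbf{0}$ forces $a_2 = 0$.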

[citeproof:403]

Characterisations of Diagonalisability

An endomorphism is diagonalisable if it can be represented by a diagonal matrix — equivalently, if $V$ has a basis of eigenvectors. Not every endomorphism is diagonalisable (the $2 \times 2$ Jordan block $MATHENVgxq8ujP62END$ has only a one-dimensional eigenspace), but there are clean criteria for when it is.

[definition:Diagonalisable Endomorphism]
An endomorphism $\alpha \in \mathrm{End}(V)$ is diagonalisable if there exists a basis of $V$ consisting of eigenvectors of $\alpha$.
[/definition]

[quotetheorem:404]

The four equivalent conditions give different approaches to checking diagonalisability: find an explicit eigenvector basis (2), verify the eigenspace decomposition spans $V$ (3), or simply compare dimensions (4). The proof connects them via Linear Independence of Eigenvectors: since the eigenspace sum is always direct, the only question is whether the dimensions add up to $\dim V$.

[citeproof:404]

The Minimal Polynomial

Existence and the Divisibility Property

While the characteristic polynomial is defined via the determinant, the minimal polynomial is defined purely in terms of the algebraic structure of $\mathrm{End}(V)$. It is the "smallest" polynomial that annihilates $\alpha$, and it generates the ideal of all annihilating polynomials.

[definition:Minimal Polynomial]
The minimal polynomial $M_\alpha(t)$ of $\alpha \in \mathrm{End}(V)$ is the unique monic polynomial of least degree satisfying $M_\alpha(\alpha) = 0$.
[/definition]

[quotetheorem:405]

Existence relies on finite-dimensionality: $\mathrm{End}(V)$ has dimension $n^2$, so the $n^2 + 1$ endomorphisms $\mathrm{id}, \alpha, \alpha^2, \dots, \alpha^{n^2}$ are linearly dependent. The divisibility property — $p(\alpha) = 0$ iff $M_\alpha \mid p$ — follows from the division algorithm in $\mathbb{F}[t]$, which is a Euclidean domain.
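The existence argument suggests a direct way to compute a minimal polynomial: look for the first power of the matrix that is a linear combination of the lower powers. The following is a minimal sketch of that idea in exact arithmetic; the function names and the test matrices are assumptions made here for illustration, and no attempt is made at efficiency.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def flatten(X):
    """View an n x n matrix as a vector of length n^2."""
    return [x for row in X for x in row]

def solve(columns, target):
    """Solve sum_k c_k * columns[k] = target exactly; None if impossible.

    Assumes the columns are linearly independent, which holds for
    I, A, ..., A^(d-1) as long as no earlier dependence was found.
    """
    m, d = len(target), len(columns)
    rows = [[Fraction(columns[k][i]) for k in range(d)] + [Fraction(target[i])]
            for i in range(m)]
    pivot_row = 0
    for col in range(d):
        pr = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if pr is None:
            return None
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot for x in rows[pivot_row]]
        for r in range(m):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # A leftover row reading 0 = nonzero means the system is inconsistent.
    if any(rows[r][d] != 0 for r in range(pivot_row, m)):
        return None
    return [rows[k][d] for k in range(d)]

def minimal_polynomial(A):
    """Coefficients of the monic minimal polynomial, lowest degree first."""
    n = len(A)
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    while True:
        nxt = matmul(powers[-1], A)
        coeffs = solve([flatten(P) for P in powers], flatten(nxt))
        if coeffs is not None:
            # A^d = sum c_k A^k, so M(t) = t^d - sum c_k t^k.
            return [-c for c in coeffs] + [Fraction(1)]
        powers.append(nxt)

# Illustrative inputs: the identity and a 2x2 Jordan block.
assert minimal_polynomial([[1, 0], [0, 1]]) == [-1, 1]        # t - 1
assert minimal_polynomial([[1, 1], [0, 1]]) == [1, -2, 1]     # (t - 1)^2
```

The loop terminates because the powers of $A$ live in a space of dimension $n^2$, exactly as in the existence argument.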

[citeproof:405]

Diagonalisability via the Minimal Polynomial

The minimal polynomial gives the cleanest criterion for diagonalisability: check whether it has repeated roots.

[quotetheorem:406]

This reduces the geometric question "does $V$ have an eigenvector basis?" to the algebraic question "does $M_\alpha$ split into distinct linear factors?" The forward direction is straightforward: if $\alpha$ is diagonalisable, the product of distinct linear factors annihilates $\alpha$. The reverse uses the coprimality of the factors and Bézout's identity to construct a direct sum decomposition of $V$ into eigenspaces.

[citeproof:406]

[example:Minimal Polynomial And Diagonalisability]
Consider $A = MATHENVgxq8ujP63END$ and $B = MATHENVgxq8ujP64END$. For $A$: $A - I = 0$, so $M_A(t) = t - 1$ (distinct linear factor) and $A$ is diagonalisable. For $B$: $B - I = MATHENVgxq8ujP65END \neq 0$ but $(B - I)^2 = 0$, so $M_B(t) = (t-1)^2$ (repeated root) and $B$ is not diagonalisable.
[/example]

The Cayley–Hamilton Theorem

Statement and Proof

The Cayley–Hamilton theorem is the bridge between the characteristic polynomial (defined via the determinant) and the minimal polynomial (defined via annihilation). It says that every endomorphism is annihilated by its own characteristic polynomial.

[quotetheorem:407]

This is one of the most important results in linear algebra. It implies $M_\alpha \mid \chi_\alpha$ (by the divisibility property of the minimal polynomial), giving the bound $\deg M_\alpha \leq n$. Combined with the fact that $M_\alpha$ and $\chi_\alpha$ have the same roots (see the Multiplicity Inequalities below), it tightly constrains the relationship between the two polynomials. The proof works in the polynomial matrix ring $\mathrm{Mat}_n(\mathbb{F}[t])$: apply the Adjugate Identity to $tI - A$, equate polynomial coefficients, multiply by appropriate powers of $A$, and observe that the left side telescopes to zero while the right side gives $\chi_A(A)$.
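In the $2 \times 2$ case the theorem can be verified directly, without the adjugate argument. A sketch, with generic entries $a, b, c, d$ introduced here for illustration: for $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we have $\chi_A(t) = t^2 - (a+d)\,t + (ad - bc)$, and

\begin{align*} A^2 - (a+d)A + (ad-bc)I = \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix} - \begin{pmatrix} a^2 + ad & ab + bd \\ ac + cd & ad + d^2 \end{pmatrix} + \begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix} = 0. \end{align*}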

[citeproof:407]

Triangulability

When Can We Triangulate?

Before tackling the full Jordan normal form, we consider the simpler question: when can an endomorphism be represented by an upper triangular matrix? This is a weaker condition than diagonalisability but still reveals the eigenvalues (they appear on the diagonal).

[definition:Triangulable Endomorphism]
An endomorphism $\alpha \in \mathrm{End}(V)$ is triangulable if there exists a basis of $V$ in which $\alpha$ is represented by an upper triangular matrix.
[/definition]

[quotetheorem:408]

Over $\mathbb{C}$ (or any algebraically closed field), every polynomial splits, so every endomorphism is triangulable. Over $\mathbb{R}$, an endomorphism is triangulable if and only if $\chi_\alpha$ splits into linear factors over $\mathbb{R}$, that is, all of its complex roots are real. The proof uses induction on $\dim V$: extract an eigenvalue (which exists because $\chi_\alpha$ has a root), find an eigenvector $v$, pass to the quotient $V/\langle v \rangle$, and apply the inductive hypothesis.

[citeproof:408]

Multiplicities and the Jordan Normal Form

Algebraic, Geometric, and Minimal Polynomial Multiplicities

Each eigenvalue carries three natural measures of its "importance": its multiplicity as a root of $\chi_\alpha$ (algebraic), the dimension of its eigenspace (geometric), and its multiplicity as a root of $M_\alpha$ (minimal polynomial multiplicity). These three quantities are tightly constrained.

[definition:Algebraic And Geometric Multiplicity]
Let $\alpha \in \mathrm{End}(V)$ with eigenvalue $\lambda$. The algebraic multiplicity $a_\lambda$ is the multiplicity of $\lambda$ as a root of $\chi_\alpha(t)$. The geometric multiplicity $g_\lambda$ is $\dim E_\alpha(\lambda) = \dim\ker(\alpha - \lambda\,\mathrm{id})$. The minimal polynomial multiplicity $c_\lambda$ is the multiplicity of $\lambda$ as a root of $M_\alpha(t)$.
[/definition]

[quotetheorem:409]

The inequalities $1 \leq g_\lambda \leq a_\lambda$ and $1 \leq c_\lambda \leq a_\lambda$ constrain the possible Jordan block structures. The proof uses basis extension for $g_\lambda \leq a_\lambda$ (choose an eigenvector basis for $E_\alpha(\lambda)$, extend, and use the Block Triangular Determinant) and Cayley–Hamilton for $c_\lambda \leq a_\lambda$.

[citeproof:409]

Diagonalisability Criteria via Multiplicities

[quotetheorem:410]

This unifies three perspectives on diagonalisability. Condition (2) says the eigenspaces are "as large as possible" — each eigenspace accounts for the full algebraic multiplicity. Condition (3) says the minimal polynomial is "as small as possible" — each eigenvalue appears with power exactly $1$. The proof combines the Characterisations of Diagonalisability with the Multiplicity Inequalities and the Diagonalisability via Minimal Polynomial criterion.

[citeproof:410]

The Generalised Eigenspace Decomposition

When $\alpha$ is not diagonalisable, the ordinary eigenspaces $E_\alpha(\lambda_i)$ do not span $V$. The fix is to enlarge them: instead of $\ker(\alpha - \lambda_i\,\mathrm{id})$, take $\ker((\alpha - \lambda_i\,\mathrm{id})^{c_i})$, where $c_i$ is the multiplicity of $\lambda_i$ as a root of $M_\alpha$. This captures not just eigenvectors but "generalised eigenvectors": vectors that are sent to $\mathbf{0}$ by some power of $\alpha - \lambda_i\,\mathrm{id}$.

[definition:Generalised Eigenspace]
Let $\alpha \in \mathrm{End}(V)$ with eigenvalue $\lambda$ and minimal polynomial multiplicity $c_\lambda$. The generalised eigenspace for $\lambda$ is $V_\lambda = \ker((\alpha - \lambda\,\mathrm{id})^{c_\lambda})$.
[/definition]

[quotetheorem:411]

This is the key structural result underlying the Jordan normal form. It decomposes $V$ into $\alpha$-invariant subspaces on each of which $\alpha$ has a single eigenvalue — reducing the general problem to the study of nilpotent endomorphisms (since $\alpha - \lambda_i\,\mathrm{id}$ is nilpotent on $V_i$). The proof constructs explicit projection operators using Bézout's identity applied to the pairwise coprime factors of $M_\alpha$.

[citeproof:411]

The Jordan Normal Form

[definition:Jordan Block]
A Jordan block of size $m$ with eigenvalue $\lambda$ is the $m \times m$ matrix $J_m(\lambda) = \lambda I_m + N_m$, where $N_m$ has ones on the superdiagonal and zeros elsewhere:

\begin{align*} J_m(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}. \end{align*}

A matrix is in Jordan normal form if it is block-diagonal with Jordan blocks along the diagonal.
[/definition]

[quotetheorem:412]

The Jordan normal form theorem solves the similarity classification problem over algebraically closed fields: two matrices are similar if and only if they have the same Jordan form (up to block ordering). The proof reduces to the nilpotent case via the Generalised Eigenspace Decomposition, then constructs Jordan bases for nilpotent endomorphisms by induction on dimension, using chains of the form $w, \nu(w), \dots, \nu^{r-1}(w)$.

[citeproof:412]

Uniqueness via Nullities

[quotetheorem:413]

This result proves uniqueness and also gives a practical method for determining the Jordan form without computing a Jordan basis: compute $\dim\ker(\alpha - \lambda\,\mathrm{id})^r$ for $r = 1, 2, 3, \dots$ until it stabilises, and read off the block sizes from the differences. The number of blocks of size exactly $r$ is $(n_r - n_{r-1}) - (n_{r+1} - n_r)$, where $n_r = \dim\ker(\alpha - \lambda\,\mathrm{id})^r$.
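The nullity method translates directly into a short computation. The following is a minimal sketch in exact arithmetic; the function names are assumptions made here, and the routine expects the eigenvalue `lam` to be supplied rather than computing it.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(X):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in X]
    r = 0
    for col in range(len(rows[0])):
        pr = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def jordan_block_counts(A, lam):
    """Map block size r -> number of Jordan blocks of size exactly r for lam."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    nullities = [0]                      # n_0 = 0
    power = N
    while True:
        nullities.append(n - rank(power))   # n_r = dim ker (A - lam I)^r
        if nullities[-1] == nullities[-2]:
            break                            # the nullities have stabilised
        power = matmul(power, N)
    counts = {}
    for r in range(1, len(nullities) - 1):
        exact = ((nullities[r] - nullities[r - 1])
                 - (nullities[r + 1] - nullities[r]))
        if exact:
            counts[r] = exact
    return counts

# A matrix already in Jordan form: one 2x2 block for 1, one 1x1 block for 2.
J = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
assert jordan_block_counts(J, 1) == {2: 1}
assert jordan_block_counts(J, 2) == {1: 1}
```

If `lam` is not an eigenvalue, the first nullity is already zero and the function returns an empty dictionary.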

[citeproof:413]

[example:Computing A Jordan Form]
Let $A = MATHENVgxq8ujP67END$. The characteristic polynomial is $\chi_A(t) = (t-1)^2(t-2)$. For $\lambda = 1$: $A - I = MATHENVgxq8ujP68END$ has rank $2$, so $g_1 = n(A - I) = 1 < a_1 = 2$ and $A$ is not diagonalisable. We compute $(A-I)^2$: this has rank $1$, so $n((A-I)^2) = 2$. Writing $n_r = n((A-I)^r)$ and $d_r = n_r - n_{r-1}$ for the number of Jordan blocks of size $\geq r$, we get $d_1 = 1 - 0 = 1$ and $d_2 = 2 - 1 = 1$. Moreover $n_2 = a_1$, so the nullities have stabilised and $d_3 = 0$. Hence there is exactly one block for $\lambda = 1$, and the number of blocks of size exactly $2$ is $d_2 - d_3 = 1$: it is a single $2 \times 2$ Jordan block. For $\lambda = 2$: $a_2 = 1$, so there is a single $1 \times 1$ block. The Jordan form is:

\begin{align*} J = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \end{align*}
[/example]

[problem]
Let $\alpha: \mathbb{R}^3 \to \mathbb{R}^3$ be defined by the matrix $A = MATHENVgxq8ujP70END$. Find the characteristic polynomial, the minimal polynomial, all eigenspaces, and determine whether $\alpha$ is diagonalisable.
[/problem]

[solution]
Step 1: Characteristic polynomial.

$A$ is upper triangular, so $\chi_A(t) = (t - 2)^3$. The only eigenvalue is $\lambda = 2$ with algebraic multiplicity $a_2 = 3$.

Step 2: Eigenspace.

$A - 2I = MATHENVgxq8ujP71END$, which has rank $2$. So $g_2 = 3 - 2 = 1$ and $E_\alpha(2) = \langle (1, 0, 0)^\top \rangle$.

Step 3: Diagonalisability.

Since $g_2 = 1 < 3 = a_2$, $\alpha$ is not diagonalisable by Diagonalisability Criteria via Multiplicities.

Step 4: Minimal polynomial.

Check: $(A - 2I) \neq 0$, $(A - 2I)^2 = MATHENVgxq8ujP72END \neq 0$, $(A - 2I)^3 = 0$. So $M_A(t) = (t - 2)^3 = \chi_A(t)$.

The minimal polynomial equals the characteristic polynomial, confirming non-diagonalisability (since $(t-2)^3$ has repeated roots). The Jordan form is $J_3(2)$ — a single $3 \times 3$ Jordan block — and $A$ is already in Jordan normal form.
[/solution]

The Jordan normal form classifies endomorphisms completely over algebraically closed fields — but the classification is purely algebraic, making no reference to lengths, angles, or geometry. To bring geometry into the picture, we return to bilinear forms. The first chapter on bilinear forms treated the general case; we now specialise to symmetric and Hermitian forms, where the additional structure of symmetry ($\psi(u, v) = \psi(v, u)$) enables diagonalisation and a complete classification by signature. This is the algebraic preparation for inner product spaces, where positive definiteness will add geometry to algebra.

Bilinear Forms II: Symmetric and Hermitian Forms

Having established the general theory of bilinear forms and developed the necessary tools from determinants and endomorphisms, we now turn to the special and highly structured case of symmetric bilinear forms. These arise naturally in geometry, optimisation, and physics — most notably through their intimate connection with quadratic forms, which encode notions like distance, energy, and curvature. The central question driving this section is: to what extent can a symmetric bilinear form be simplified by a change of basis? Unlike the similarity transformations $P^{-1}AP$ that govern endomorphisms, bilinear forms transform under congruence $P^\top AP$, leading to a distinct classification problem. Over the real and complex numbers, this classification yields powerful canonical forms, culminating in Sylvester's law of inertia.

We also treat the complex analogue: Hermitian forms, where conjugate-linearity in one argument replaces plain linearity. The classification theory runs in parallel, with the conjugate transpose $P^\dagger$ replacing the transpose $P^\top$.

Core Definitions

Symmetric Bilinear Forms and Quadratic Forms

[definition:Symmetric Bilinear Form]
Let $V$ be a vector space over a field $\mathbb{F}$. A bilinear form $\phi: V \times V \to \mathbb{F}$ is symmetric if $\phi(v, w) = \phi(w, v)$ for all $v, w \in V$.
[/definition]

Symmetry is the algebraic abstraction of the idea that "the interaction between $v$ and $w$ is the same as between $w$ and $v$". The standard dot product $x \cdot y$ on $\mathbb{R}^n$ is the prototypical example. More broadly, symmetric bilinear forms appear whenever one has a notion of "angle" or "orthogonality" — in Euclidean geometry, Riemannian geometry, and the Lorentzian geometry of special relativity.

[definition:Quadratic Form]
A function $q: V \to \mathbb{F}$ is a quadratic form if there exists a bilinear form $\phi$ such that $q(v) = \phi(v, v)$ for all $v \in V$.
[/definition]

While bilinear forms take two inputs, a quadratic form "specialises" to the diagonal: it measures the form's value on a vector paired with itself. In coordinates, a quadratic form on $\mathbb{R}^n$ is simply a homogeneous degree-2 polynomial — expressions like $3x^2 + 2xy - z^2$ are exactly the objects studied here. Quadratic forms are ubiquitous: in optimisation, the second-order Taylor expansion of a smooth function at a critical point is a quadratic form (the Hessian), and determining whether that critical point is a minimum, maximum, or saddle reduces to classifying the associated form. In physics, the kinetic energy $\frac{1}{2}m|v|^2$ and the spacetime interval $c^2 t^2 - x^2 - y^2 - z^2$ are both quadratic forms.

[definition:Congruent Matrices]
Two matrices $A, B \in \mathrm{Mat}_n(\mathbb{F})$ are congruent if there exists an invertible $P$ with $B = P^\top A P$. This is the transformation rule for bilinear form matrices under change of basis (as established in Bilinear Forms I).
[/definition]

Congruence is the correct equivalence relation for bilinear forms, just as similarity ($P^{-1}AP$) is the correct one for endomorphisms. The distinction matters: a symmetric matrix can have very different eigenvalues from a congruent matrix, but the two represent the same geometric object. The classification programme for symmetric bilinear forms therefore asks: what are the congruence classes?

Definiteness

For real symmetric forms, the sign behaviour of the associated quadratic form classifies the geometry. This is directly connected to the shape of level sets: a positive definite form has ellipsoidal level sets, an indefinite form has hyperbolic level sets, and the degenerate cases correspond to cylindrical or planar level sets. In the language of multivariable calculus, the Hessian at a critical point determines whether it is a local minimum (positive definite), local maximum (negative definite), or saddle point (indefinite).

[definition:Positive/Negative Definiteness]
Let $\phi$ be a symmetric bilinear form on a real vector space $V$.

  • $\phi$ is positive definite if $\phi(v, v) > 0$ for all $v \neq \mathbf{0}$.
  • $\phi$ is positive semi-definite if $\phi(v, v) \geq 0$ for all $v$.
  • $\phi$ is negative definite if $\phi(v, v) < 0$ for all $v \neq \mathbf{0}$.
  • $\phi$ is negative semi-definite if $\phi(v, v) \leq 0$ for all $v$.
  • $\phi$ is indefinite if it takes both positive and negative values.
[/definition]

Symmetry and Matrices

The following result translates the abstract algebraic condition of symmetry into a concrete and checkable matrix condition. It also ensures that the property of being symmetric is intrinsic to the form itself, not an artefact of a particular basis choice.

[quotetheorem:425]

The proof is immediate: symmetry of $\phi$ forces $M_{ij} = \phi(e_i, e_j) = \phi(e_j, e_i) = M_{ji}$, and conversely, since a scalar equals its own transpose, $M = M^\top$ gives $\phi(v, w) = v^\top M w = (v^\top M w)^\top = w^\top M^\top v = w^\top M v = \phi(w, v)$. Symmetry of $M$ is preserved under congruence: $(P^\top M P)^\top = P^\top M^\top P = P^\top M P$.

[citeproof:425]

[example:Standard Dot Product]
On $\mathbb{R}^n$, the standard dot product $\phi(x, y) = x^\top y$ has matrix $I_n$, which is symmetric. The Lorentzian form $\phi(x, y) = x_1 y_1 - x_2 y_2 - x_3 y_3 - x_4 y_4$ has matrix $\mathrm{diag}(1, -1, -1, -1)$, also symmetric — this is the Minkowski metric of special relativity, and its indefiniteness reflects the fundamental difference between time and space directions.
[/example]

The Polarisation Identity

The polarisation identity is the key bridge between the one-variable world of quadratic forms and the two-variable world of bilinear forms. Its significance is both theoretical and practical: it tells us that knowing $q(v)$ for all $v$ is exactly the same as knowing $\phi(v, w)$ for all $v, w$ — no information is lost in passing from the bilinear form to the quadratic form, and vice versa. This means that classifying quadratic forms (homogeneous degree-2 polynomials) and classifying symmetric bilinear forms are genuinely the same problem.

[quotetheorem:426]

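The formula $\phi(v, w) = \frac{1}{2}(q(v+w) - q(v) - q(w))$ recovers the bilinear form from the quadratic form by "linearising": expanding $q(v+w)$ and isolating the cross-term. The characteristic restriction $\mathrm{Char}\,\mathbb{F} \neq 2$ is essential. Over $\mathbb{F}_2$, the non-zero symmetric form $\phi((x_1,y_1),(x_2,y_2)) = x_1 y_2 + x_2 y_1$ on $\mathbb{F}_2^2$ satisfies $\phi(v, v) = 2 x_1 y_1 = 0$ for every $v = (x_1, y_1)$, so its quadratic form is identically zero and cannot be told apart from that of the zero form. Conversely, the quadratic form $q(x,y) = xy$ arises from the non-symmetric form $x_1 y_2$ but from no symmetric bilinear form, since a symmetric form with matrix entries $a, b, b, c$ gives $ax^2 + 2bxy + cy^2 = ax^2 + cy^2$.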
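The expansion behind the identity fits on one line. Assuming only that $\phi$ is bilinear and symmetric, with $q(v) = \phi(v, v)$:

\begin{align*} q(v + w) = \phi(v, v) + \phi(v, w) + \phi(w, v) + \phi(w, w) = q(v) + 2\,\phi(v, w) + q(w), \end{align*}

so $2\,\phi(v, w) = q(v+w) - q(v) - q(w)$, and dividing by $2$ is possible exactly when $2 \neq 0$ in $\mathbb{F}$.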

[citeproof:426]

Diagonalisation

The central structural result for symmetric bilinear forms is that they can always be diagonalised by congruence — a much stronger statement than mere triangulability. Where diagonalising an endomorphism (finding eigenvectors) can fail, diagonalising a symmetric bilinear form always succeeds in characteristic $\neq 2$. This is remarkable: it says that, after a suitable change of basis, the cross-terms in any quadratic form can be completely eliminated, reducing it to a sum of squares $d_1 x_1^2 + d_2 x_2^2 + \cdots + d_n x_n^2$. The constructive proof is essentially the method of "completing the square" carried out systematically.

[quotetheorem:427]

The proof is constructive and proceeds by induction. The key step is finding a non-isotropic vector $v$ (one with $\phi(v, v) \neq 0$), taking its orthogonal complement $v^\perp$, verifying the direct sum decomposition $V = \langle v \rangle \oplus v^\perp$, and applying the inductive hypothesis to the restriction $\phi|_{v^\perp}$. Finding the non-isotropic vector requires $\mathrm{Char}\,\mathbb{F} \neq 2$: if all diagonal entries $\phi(e_i, e_i)$ vanish but some off-diagonal entry $\phi(e_i, e_j) \neq 0$, then $\phi(e_i + e_j, e_i + e_j) = 2\phi(e_i, e_j) \neq 0$.
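The inductive argument can be run as an algorithm: paired row and column operations realise the change of basis $M \mapsto P^\top M P$. The following is a minimal sketch over the rationals (so characteristic $0$); the function names are assumptions made here, and the routine returns only the diagonal entries, not the basis.

```python
from fractions import Fraction

def diagonalise_by_congruence(M):
    """Diagonal entries of some P^T M P, for a symmetric rational matrix M."""
    n = len(M)
    D = [[Fraction(x) for x in row] for row in M]

    def add_multiple(target, source, c):
        """Replace basis vector f_target by f_target + c * f_source."""
        for j in range(n):                    # row operation
            D[target][j] += c * D[source][j]
        for i in range(n):                    # matching column operation
            D[i][target] += c * D[i][source]

    for k in range(n):
        if D[k][k] == 0:
            # Prefer a later basis vector that is itself non-isotropic.
            j = next((j for j in range(k + 1, n) if D[j][j] != 0), None)
            if j is not None:
                D[k], D[j] = D[j], D[k]       # swap f_k and f_j (rows) ...
                for row in D:                 # ... and the matching columns
                    row[k], row[j] = row[j], row[k]
            else:
                j = next((j for j in range(k + 1, n) if D[k][j] != 0), None)
                if j is None:
                    continue                  # f_k pairs to zero with the rest
                # All later diagonal entries vanish, so the new value is
                # phi(f_k + f_j, f_k + f_j) = 2 * phi(f_k, f_j) != 0.
                add_multiple(k, j, 1)
        for i in range(k + 1, n):
            if D[i][k] != 0:
                add_multiple(i, k, -D[i][k] / D[k][k])
    return [D[i][i] for i in range(n)]

def count_signs(diagonal):
    """Numbers of positive and negative diagonal entries."""
    return (sum(d > 0 for d in diagonal), sum(d < 0 for d in diagonal))

# Matrix of q(x, y, z) = 2x^2 + 4xy + y^2 - z^2.
M = [[2, 2, 0], [2, 1, 0], [0, 0, -1]]
assert diagonalise_by_congruence(M) == [2, -1, -1]
assert count_signs(diagonalise_by_congruence(M)) == (1, 2)
```

The diagonal entries themselves depend on the choices made along the way; only the counts of positive, negative, and zero entries are canonical over $\mathbb{R}$.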

[citeproof:427]

[example:Diagonalising by Completing the Square]
Consider $q(x,y,z) = 2x^2 + 4xy + y^2 - z^2$ on $\mathbb{R}^3$. The matrix is $M = MATHENVgxq8ujP73END$. Since $M_{11} = 2 \neq 0$, set $f_1 = e_1$ and orthogonalise: $f_2 = e_2 - \frac{\phi(e_2, e_1)}{\phi(e_1, e_1)}e_1 = e_2 - e_1$. Then $\phi(f_2, f_2) = \phi(e_2 - e_1, e_2 - e_1) = 1 - 4 + 2 = -1$. With $f_3 = e_3$ (already orthogonal to both), the matrix becomes $\mathrm{diag}(2, -1, -1)$: one positive and two negative diagonal entries, that is, signature $(p,q) = (1,2)$.
[/example]

Classification over Algebraically Closed Fields

Over an algebraically closed field (such as $\mathbb{C}$), the classification of symmetric bilinear forms collapses to something remarkably simple. Since every non-zero element has a square root, each non-zero diagonal entry can be rescaled to $1$, and the only information that survives is how many non-zero entries there are — i.e., the rank. This means that over $\mathbb{C}$, there is essentially no interesting geometry of quadratic forms: all non-degenerate forms of the same dimension look the same. The contrast with the real case, where the signs of the diagonal entries matter, is striking.

[quotetheorem:428]

[citeproof:428]

Sylvester's Law of Inertia

Over $\mathbb{R}$, we cannot take square roots of negative numbers, so positive and negative diagonal entries cannot be merged. This gives a richer classification than the algebraically closed case, and is one of the most important results in the theory of quadratic forms.

Sylvester's law tells us that the signature $(p, q)$ — the number of positive and negative entries in any diagonal representation — is a complete invariant of a real symmetric bilinear form under congruence. The word "inertia" comes from the fact that these numbers are inert (unchanging) under all possible basis changes. The practical consequence is powerful: to determine whether two real quadratic forms are equivalent, one need only diagonalise each and count signs.

[quotetheorem:429]

The existence part follows from diagonalisation plus rescaling by $|d_i|^{-1/2}$. The deep content is uniqueness: $p$ and $q$ are basis-independent. The proof of invariance is elegant — if $W$ is any subspace on which $\phi$ is positive definite, and $N_0$ is the span of the negative and zero directions, then $W \cap N_0 = \{0\}$ (a non-zero vector cannot have both $\phi(w,w) > 0$ and $\phi(w,w) \leq 0$). The Dimension Formula then gives $\dim W \leq p$.
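Spelled out, the dimension count runs as follows. This is a sketch under the assumption that $N_0$ is spanned by the $n - p$ vectors of a diagonalising basis whose diagonal entries are negative or zero, so that $\dim N_0 = n - p$:

\begin{align*} \dim W + (n - p) = \dim W + \dim N_0 = \dim(W + N_0) + \dim(W \cap N_0) = \dim(W + N_0) \leq n, \end{align*}

hence $\dim W \leq p$. Applying this to the subspace spanned by the positive directions of a second diagonalising basis, with $p'$ positive entries (a symbol introduced here), gives $p' \leq p$, and by symmetry $p \leq p'$.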

[citeproof:429]

[definition:Signature]
The signature of a real symmetric bilinear form is the pair $(p, q)$ from Sylvester's law. The integer $p - q$ is sometimes also called the signature. The inertia is the triple $(p, q, n - p - q)$.
[/definition]

The signature is a powerful invariant with direct geometric meaning. For instance, a form is positive definite if and only if $(p,q) = (n, 0)$, which corresponds to the geometry of ellipsoids and Euclidean space. The Lorentzian signature $(1, n-1)$ (or equivalently $(n-1, 1)$) gives rise to the geometry of special relativity, where the single "positive" direction corresponds to time. The signature $(p, p)$ with $n = 2p$ arises in symplectic-like contexts and the theory of split forms.

[example:Classifying Small Real Quadratic Forms]
Up to congruence, the symmetric bilinear forms on $\mathbb{R}^2$ are classified by $(p,q)$:

  • $(2, 0)$: positive definite, e.g., $x^2 + y^2$. Matrix $I_2$.
  • $(1, 1)$: indefinite, e.g., $x^2 - y^2$. Matrix $\mathrm{diag}(1,-1)$.
  • $(0, 2)$: negative definite, e.g., $-x^2 - y^2$. Matrix $-I_2$.
  • $(1, 0)$: degenerate positive semi-definite, e.g., $x^2$. Matrix $\mathrm{diag}(1,0)$.
  • $(0, 1)$: degenerate negative semi-definite, e.g., $-x^2$. Matrix $\mathrm{diag}(-1,0)$.
  • $(0, 0)$: the zero form. Matrix $0$.

That gives 6 congruence classes, compared to just 3 over $\mathbb{C}$ (classified by rank alone).
[/example]

Hermitian Forms

For complex vector spaces, symmetric bilinear forms are inadequate for defining any useful notion of "length" or "positivity". The essential problem is that $\phi(iv, iv) = i^2 \phi(v, v) = -\phi(v, v)$, so the "squared length" of a vector changes sign under multiplication by $i$ — meaning no non-zero symmetric bilinear form on a complex space can be positive definite. To develop an inner product theory on complex spaces (as needed throughout quantum mechanics, functional analysis, and signal processing), one must introduce conjugation into the definition.

[definition:Sesquilinear Form]
Let $V$ be a complex vector space. A sesquilinear form is a function $\phi: V \times V \to \mathbb{C}$ that is conjugate-linear in the first argument and linear in the second: $\phi(\lambda v, w) = \bar\lambda\,\phi(v, w)$ and $\phi(v, \lambda w) = \lambda\,\phi(v, w)$, with additivity in both arguments.
[/definition]

The prefix "sesqui-" means "one and a half" — referring to the fact that the form is "one and a half times linear" (linear in one argument, conjugate-linear in the other). The conjugation in the first argument is precisely what cancels the problematic sign: $\phi(iv, iv) = \bar{i} \cdot i \cdot \phi(v, v) = |i|^2 \phi(v, v) = \phi(v, v)$.
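The contrast between the two kinds of form is easy to see numerically. The following is a minimal sketch on $\mathbb{C}^2$ using Python's built-in complex numbers; the vector `v` and the function names are illustrative choices made here.

```python
def bilinear(z, w):
    """Symmetric bilinear form sum_i z_i w_i (no conjugation)."""
    return sum(a * b for a, b in zip(z, w))

def hermitian(z, w):
    """Standard Hermitian form sum_i conj(z_i) w_i."""
    return sum(a.conjugate() * b for a, b in zip(z, w))

v = [1 + 2j, 3 - 1j]            # illustrative vector
iv = [1j * a for a in v]        # the same vector multiplied by i

# Without conjugation, multiplying by i flips the sign of the "squared length",
# which need not even be real.
assert bilinear(iv, iv) == -bilinear(v, v)
assert bilinear(v, v) == 5 - 2j

# With conjugation in the first slot, the value is real and unchanged.
assert hermitian(iv, iv) == hermitian(v, v) == 15
```

The entries are small Gaussian integers, so the floating-point arithmetic here is exact and the equalities hold literally.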

[definition:Hermitian Form]
A sesquilinear form $\phi$ is Hermitian if $\phi(v, w) = \overline{\phi(w, v)}$ for all $v, w \in V$. Note that this forces $\phi(v, v) = \overline{\phi(v, v)}$, so $\phi(v, v) \in \mathbb{R}$ for all $v$, enabling a theory of definiteness.
[/definition]

The Hermitian condition is the natural complex analogue of symmetry: it ensures that the "diagonal values" $\phi(v, v)$ are always real, which is essential for defining positivity and for applications throughout mathematics and physics. The standard inner product $\langle z, w \rangle = \sum_i \bar{z}_i w_i$ on $\mathbb{C}^n$ is the prototypical Hermitian form, and in quantum mechanics, the Hermitian inner product on a Hilbert space underlies the entire probabilistic structure of the theory: expectation values $\langle \psi | A | \psi \rangle$ of an observable $A$ are real precisely because $A$ is a Hermitian operator.

A Hermitian form has matrix $A$ satisfying $A = A^\dagger$ (i.e., $A_{ij} = \overline{A_{ji}}$), the Hermitian analogue of symmetry.

Change of Basis

The following result shows how the matrix of a Hermitian form transforms under change of basis. The conjugate transpose $P^\dagger$ replaces the ordinary transpose $P^\top$, reflecting the conjugate-linearity in the first argument.

[quotetheorem:430]

The proof is a direct computation: writing $f_i = \sum_k P_{ki} e_k$, we get $B_{ij} = \phi(f_i, f_j) = \sum_{k,\ell} \overline{P_{ki}}\,A_{k\ell}\,P_{\ell j} = (P^\dagger A P)_{ij}$. Consequently, two Hermitian matrices are $\ast$-congruent ($B = P^\dagger A P$ for some invertible $P$) if and only if they represent the same Hermitian form with respect to two bases.

[citeproof:430]

Sylvester's Law for Hermitian Forms

The classification of Hermitian forms over $\mathbb{C}$ parallels the real case exactly: the signature $(p, q)$ is again the complete congruence invariant. This is perhaps surprising at first — one might expect the complex numbers to offer more room for simplification (as they did for symmetric forms over $\mathbb{C}$, where rank alone sufficed). But the conjugation built into $\ast$-congruence prevents one from absorbing signs, so the real and Hermitian classification problems have the same answer.

[quotetheorem:431]

The diagonalisation argument adapts to the Hermitian setting: if all $\phi(e_i, e_i) = 0$ but $\phi(e_i, e_j) = re^{i\theta} \neq 0$, set $v = e_i + e^{-i\theta}e_j$ to get $\phi(v, v) = 2r \neq 0$. After diagonalisation, the entries are real (since $\phi(v, v) \in \mathbb{R}$), and the uniqueness argument from the real case carries over verbatim.
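The value $2r$ comes from expanding $\phi(v, v)$ with the conventions above (conjugate-linear in the first argument, linear in the second). A sketch of the computation, using $\phi(e_j, e_i) = \overline{\phi(e_i, e_j)} = re^{-i\theta}$ and $\phi(e_i, e_i) = \phi(e_j, e_j) = 0$:

\begin{align*} \phi(v, v) = \phi(e_i, e_i) + e^{-i\theta}\,\phi(e_i, e_j) + \overline{e^{-i\theta}}\,\phi(e_j, e_i) + |e^{-i\theta}|^2\,\phi(e_j, e_j) = 0 + r + r + 0 = 2r. \end{align*}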

[citeproof:431]

[problem]
Let $\phi$ be the symmetric bilinear form on $\mathbb{R}^3$ with matrix $M = MATHENVgxq8ujP74END$. Find the signature $(p, q)$ and determine whether $\phi$ is positive definite, negative definite, or indefinite.
[/problem]

[solution]
Step 1: Diagonalise by orthogonal complement.

Since $M_{11} = 1 \neq 0$, set $f_1 = e_1$. Orthogonalise: $f_2 = e_2 - \frac{\phi(e_2, e_1)}{\phi(e_1, e_1)}e_1 = e_2 - e_1$. Then $\phi(f_2, f_2) = \phi(e_2, e_2) - 2\phi(e_2, e_1) + \phi(e_1, e_1) = 1 - 2 + 1 = 0$.

Since $\phi(f_2, f_2) = 0$, the vector $f_2$ is isotropic and cannot serve as the next basis vector, so we look for a non-isotropic vector elsewhere in $f_1^\perp$. Compute $f_3 = e_3 - \frac{\phi(e_3, e_1)}{\phi(e_1, e_1)}e_1 = e_3$. Then $\phi(f_2, f_3) = \phi(e_2 - e_1, e_3) = 1 - 0 = 1 \neq 0$, and $\phi(f_3, f_3) = 1$. Since $\phi(f_2, f_2) = 0$ but $\phi(f_2, f_3) = 1$, try $g_2 = f_2 + f_3$: $\phi(g_2, g_2) = 0 + 2(1) + 1 = 3$. Next orthogonalise $f_3$ against $g_2$: since $\phi(f_3, g_2) = \phi(f_3, f_2) + \phi(f_3, f_3) = 2$, set $g_3 = f_3 - \frac{\phi(f_3, g_2)}{\phi(g_2, g_2)}g_2 = f_3 - \frac{2}{3}g_2 = f_3 - \frac{2}{3}(f_2 + f_3) = -\frac{2}{3}f_2 + \frac{1}{3}f_3$. Then $\phi(g_3, g_3) = \frac{4}{9}\phi(f_2, f_2) - \frac{4}{9}\phi(f_2, f_3) + \frac{1}{9}\phi(f_3, f_3) = 0 - \frac{4}{9} + \frac{1}{9} = -\frac{1}{3}$.

Step 2: Read off the signature.

The diagonal entries are $\phi(f_1, f_1) = 1$, $\phi(g_2, g_2) = 3$, $\phi(g_3, g_3) = -\frac{1}{3}$. So $(p, q) = (2, 1)$.

Verification: $\det M = 1(1 - 1) - 1(1 - 0) + 0 = -1 < 0$. Since $\det(P^\top M P) = (\det P)^2 \det M$, the product $d_1 \cdot d_2 \cdot d_3 = 1 \cdot 3 \cdot (-\frac{1}{3}) = -1$ must have the same sign as $\det M$, and it does. ✓

The form is indefinite (takes both positive and negative values).
[/solution]

Sylvester's law shows that symmetric bilinear forms over $\mathbb{R}$ are classified by signature $(p, q)$, a pair of non-negative integers. This is a clean algebraic result, but it says nothing about geometry: lengths, angles, orthogonal projections, and "closest points" all require a form that is not merely non-degenerate but positive definite. The next chapter specialises to this case, inner product spaces, where algebra meets geometry. The payoff is the spectral theorem: every self-adjoint endomorphism of a finite-dimensional inner product space admits an orthonormal eigenbasis, a result that unifies diagonalisation, orthogonal projection, and the principal axes of quadratic forms.

Inner Product Spaces

A vector space, on its own, has no notion of length, angle, or distance. A bilinear form adds some structure — it pairs vectors together — but without further constraints the resulting "geometry" may be degenerate or indefinite: vectors can have negative "length," and distinct nonzero vectors may be invisible to the form. The theory of Bilinear Forms I and Bilinear Forms II developed this general framework; we now ask what happens when we impose the strongest possible non-degeneracy condition: positive-definiteness.

The answer is the inner product, and the payoff is enormous. Positive-definiteness gives every vector a genuine length, every pair of vectors a meaningful angle, and every subspace a clean orthogonal complement. These are not merely aesthetic additions: they supply the tools needed for approximation (projecting onto subspaces to find closest points), decomposition (splitting a space into orthogonal pieces), and spectral theory (diagonalising operators in a canonical way). The applications range from least-squares fitting and Fourier analysis to the foundations of quantum mechanics.

The prerequisites are the theory of vector spaces and linear maps developed in earlier pages, together with the general theory of symmetric and Hermitian bilinear forms from Bilinear Forms II. We specialise throughout to the positive-definite case.

Inner Products and Their Induced Norm

The central definition of this page combines the structure of a symmetric (or Hermitian) bilinear form with a decisive analytic condition — positive-definiteness — that anchors the algebra to genuine metric geometry.

[definition:Inner Product]
Let $V$ be a vector space over $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$. An inner product on $V$ is a map $(\cdot\,,\cdot): V \times V \to \mathbb{F}$ satisfying:

  1. Conjugate symmetry: $(v, w) = \overline{(w, v)}$ for all $v, w \in V$.
  2. Linearity in the second argument: $(v, \lambda w_1 + \mu w_2) = \lambda(v, w_1) + \mu(v, w_2)$.
  3. Positive-definiteness: $(v, v) \geq 0$ for all $v$, with equality iff $v = \mathbf{0}$.

When $\mathbb{F} = \mathbb{R}$, this is a positive-definite symmetric bilinear form. When $\mathbb{F} = \mathbb{C}$, it is a positive-definite Hermitian sesquilinear form (conjugate-linear in the first argument, linear in the second).
[/definition]

It is worth pausing on why positive-definiteness is so consequential. An arbitrary symmetric bilinear form can have a nontrivial radical (vectors $v$ with $\phi(v, w) = 0$ for all $w$), and its "length function" $v \mapsto \phi(v, v)$ can take negative values or vanish on nonzero vectors. Positive-definiteness eliminates both pathologies at once: the radical is trivial (so the form is non-degenerate), and the quantity $\sqrt{(v, v)}$ behaves as a genuine length. This is what allows us to measure distances, define orthogonality, and do geometry.

[definition:Inner Product Space]
A vector space $V$ over $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$ equipped with an inner product is an inner product space. The induced norm is $\|v\| = \sqrt{(v, v)}$.
[/definition]

[example:Standard Inner Products]
The standard inner product on $\mathbb{F}^n$ is $(x, y) = \sum_{i=1}^n \bar{x}_i y_i$. On $C([0,1], \mathbb{F})$, the $L^2$ inner product $(f, g) = \int_0^1 \overline{f(t)}\,g(t)\,dt$ makes the space of continuous functions into an inner product space. More generally, any positive-definite symmetric matrix $A$ defines an inner product $(x, y)_A = x^\top A y$ on $\mathbb{R}^n$.
[/example]

The third example illustrates an important point: there are many different inner products on the same vector space, each defining a different geometry. Two vectors that are "orthogonal" with respect to one inner product may not be orthogonal with respect to another. The choice of inner product is additional data, not intrinsic to the vector space.
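
The dependence of orthogonality on the chosen inner product can be checked numerically. The sketch below is illustrative only: the matrix `A` and the vector `u` are hypothetical choices, not taken from the text, and `A` is positive definite because its leading minors are 2 and 3.

```python
def standard_inner(x, y):
    """Standard inner product on R^n: sum of x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

def weighted_inner(x, y, A):
    """Inner product (x, y)_A = x^T A y for a symmetric positive-definite A."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

# Hypothetical positive-definite symmetric matrix (leading minors 2 and 3).
A = [[2, 1],
     [1, 2]]

e1, e2 = [1, 0], [0, 1]
u = [1, -2]

print(standard_inner(e1, e2))     # 0: orthogonal in the standard geometry
print(weighted_inner(e1, e2, A))  # 1: not orthogonal with respect to A
print(weighted_inner(e1, u, A))   # 0: orthogonal with respect to A
print(standard_inner(e1, u))      # 1: not orthogonal in the standard geometry
```

The pair $(e_1, e_2)$ is orthogonal for the standard inner product but not for $(\cdot\,,\cdot)_A$, while the pair $(e_1, u)$ behaves the other way round.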

The Cauchy–Schwarz Inequality and the Norm

Having defined the inner product, we need to verify that the induced "norm" $\|v\| = \sqrt{(v, v)}$ genuinely deserves the name — specifically, that it satisfies the triangle inequality. This is not at all obvious from the definition, and the path to it runs through the single most important inequality in inner product space theory.

[quotetheorem:432]

The Cauchy–Schwarz inequality says, in geometric terms, that over $\mathbb{R}$ the "cosine" $(v, w)/(\|v\|\|w\|)$ always lies in $[-1, 1]$ (over $\mathbb{C}$ the same quotient has modulus at most $1$). It is the inequality that makes the notion of angle well-defined, and it controls everything that follows: the triangle inequality, continuity of the inner product, and the convergence arguments in projection theory.

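The proof is a beautiful one-line optimisation. For $w \neq \mathbf{0}$, expand $\|v - tw\|^2 \geq 0$ and choose the scalar $t = (w, v)/\|w\|^2$ that minimises the expression. Over $\mathbb{R}$ this is the same as $(v, w)/\|w\|^2$; over $\mathbb{C}$ the order matters, because the inner product is conjugate-linear in its first argument. The non-negative remainder is $\|v\|^2 - |(v, w)|^2/\|w\|^2 \geq 0$, which rearranges to the desired inequality.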

[citeproof:432]
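
The optimisation behind Cauchy–Schwarz can be checked on a concrete pair of complex vectors. This is a minimal sketch with hypothetical sample vectors, assuming the standard inner product on $\mathbb{C}^2$ with the convention of this page (conjugate-linear in the first argument). Under that convention the minimising scalar is $t = (w, v)/\|w\|^2$, which agrees with $(v, w)/\|w\|^2$ over $\mathbb{R}$.

```python
import math

def inner(v, w):
    """Standard inner product on C^n, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(v, w))

def norm(v):
    """Induced norm sqrt((v, v))."""
    return math.sqrt(inner(v, v).real)

# Hypothetical sample vectors in C^2.
v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]

# Minimising scalar for ||v - t w||^2 under this convention.
t = inner(w, v) / norm(w) ** 2
residual = [a - t * b for a, b in zip(v, w)]

# The remainder ||v - t w||^2 equals ||v||^2 - |(v, w)|^2 / ||w||^2.
lhs = norm(residual) ** 2
rhs = norm(v) ** 2 - abs(inner(v, w)) ** 2 / norm(w) ** 2

print(abs(lhs - rhs) < 1e-9)                       # remainder formula holds
print(abs(inner(v, w)) <= norm(v) * norm(w))       # Cauchy-Schwarz
print(abs(inner(w, residual)) < 1e-9)              # residual is orthogonal to w
```

The last check shows why this choice of $t$ is the minimiser: it makes $v - tw$ orthogonal to $w$.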

With Cauchy–Schwarz established, the triangle inequality follows by a short computation:

[quotetheorem:433]

The argument expands $\|v + w\|^2 = \|v\|^2 + 2\,\mathrm{Re}(v, w) + \|w\|^2$, bounds $\mathrm{Re}(v, w) \leq |(v, w)| \leq \|v\|\|w\|$ (the second step is Cauchy–Schwarz), and takes square roots.

[citeproof:433]

Together with absolute homogeneity $\|cv\| = |c|\,\|v\|$ and positive-definiteness $\|v\| = 0 \Leftrightarrow v = \mathbf{0}$, this confirms that $\|\cdot\|$ is a norm. Every inner product space is therefore a normed space, and hence a metric space with distance $d(v, w) = \|v - w\|$. The full power of metric-space topology — open sets, convergence, continuity — is now available.

Orthogonality and Orthonormal Bases

The notion of orthogonality is the geometric heart of inner product space theory. In $\mathbb{R}^2$ or $\mathbb{R}^3$, perpendicularity is a familiar geometric idea; the inner product generalises it to arbitrary dimension and to abstract vector spaces where no ambient "picture" exists.

[definition:Orthogonal Vectors]
Vectors $v, w \in V$ are orthogonal (written $v \perp w$) if $(v, w) = 0$.
[/definition]

Why does orthogonality matter so much? Because orthogonal vectors are, in a precise sense, independent in the strongest possible way. Linear independence says that no vector in a set is a linear combination of the others; orthogonality says more — the "directions" are completely decoupled. This decoupling simplifies almost every computation: inner products reduce to coordinate-wise sums, projections have explicit formulas, and operators can be analysed one eigenspace at a time.

[definition:Orthonormal Basis]
A basis $(e_1, \dots, e_n)$ for $V$ is orthonormal if $(e_i, e_j) = \delta_{ij}$. An orthonormal set is always linearly independent (if $\sum c_i e_i = \mathbf{0}$, take the inner product with $e_j$ to get $c_j = 0$).
[/definition]

The fundamental advantage of an orthonormal basis is that coordinates become trivial to compute — they are just inner products — and the inner product itself reduces to a sum of products of coordinates. This is the content of the following identity, which is the finite-dimensional version of Parseval's theorem from Fourier analysis:

[quotetheorem:434]

The proof is immediate from expanding $(v, w)$ using linearity in the second argument, conjugate-linearity in the first, and orthonormality. The special case $\|v\|^2 = \sum |(e_i, v)|^2$ is the Pythagorean theorem in $n$ dimensions: the square of the "hypotenuse" is the sum of the squares of the components along orthogonal axes.

[citeproof:434]

The Gram–Schmidt Process

The theory so far has assumed the existence of orthonormal bases. This existence is not obvious — it requires a constructive procedure that converts an arbitrary basis into an orthonormal one without changing the span.

[quotetheorem:435]

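The construction proceeds inductively. Set $v_1 = e_1/\|e_1\|$. For each subsequent vector, subtract its projections onto the previous orthonormal vectors (removing the components that are "already accounted for") and normalise what remains: $u_k = e_k - \sum_{i=1}^{k-1} (v_i, e_k)\,v_i$ and $v_k = u_k/\|u_k\|$. The vector $u_k$ is nonzero because $e_k \notin \langle e_1, \dots, e_{k-1} \rangle = \langle v_1, \dots, v_{k-1} \rangle$.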

[citeproof:435]
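
The inductive construction can be sketched in a few lines for the standard inner product on $\mathbb{R}^n$. This is an illustrative implementation under stated assumptions, not the text's own: the function name `gram_schmidt`, the tolerance `1e-12`, and the sample basis are all hypothetical.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(basis):
    """Turn a linearly independent list of vectors into an orthonormal list.

    At every stage the first k output vectors span the same subspace as the
    first k input vectors.
    """
    ortho = []
    for e in basis:
        u = list(e)
        # Subtract the projection of e onto each earlier orthonormal vector.
        for v in ortho:
            c = inner(v, e)
            u = [ui - c * vi for ui, vi in zip(u, v)]
        length = math.sqrt(inner(u, u))
        if length < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        ortho.append([ui / length for ui in u])
    return ortho

# Hypothetical basis of R^3.
basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
vs = gram_schmidt(basis)

# Gram matrix of the output; it should be the identity up to rounding.
gram = [[inner(a, b) for b in vs] for a in vs]
for row in gram:
    print([round(x, 10) for x in row])
```

In floating-point arithmetic this classical form can lose orthogonality for nearly dependent inputs; subtracting projections from the running vector `u` rather than from `e` (the modified variant) is a common remedy.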

An immediate corollary: any orthonormal set in $V$ can be extended to an orthonormal basis. Extend the set to a full basis using the Basis Extension Theorem, keeping the original vectors first, then apply Gram–Schmidt. The process leaves the original orthonormal vectors untouched, since they are already mutually orthogonal and of unit length.

Orthogonal Complements and Projections

Having established that every inner product space admits orthonormal bases, we now develop the structural theory that makes inner product spaces geometrically much richer than plain vector spaces. The key idea is that every subspace $W \leq V$ determines a complementary subspace $W^\perp$ — the set of all vectors perpendicular to $W$ — and this decomposition is canonical: it depends only on the inner product, not on any choice of basis.

[definition:Orthogonal Complement]
For a subspace $W \leq V$, the orthogonal complement is $W^\perp = \{v \in V : (v, w) = 0 \text{ for all } w \in W\}$.
[/definition]

This definition should remind the reader of the annihilator $W^0 \subseteq V^*$ from Duality. The annihilator consists of all linear functionals that vanish on $W$; the orthogonal complement consists of all vectors that pair to zero with $W$ under the inner product. The inner product provides a canonical identification between $V$ and $V^*$ (each vector $u$ determines the functional $v \mapsto (u, v)$; over $\mathbb{C}$ this identification is conjugate-linear rather than linear), and under this identification, $W^\perp$ corresponds exactly to $W^0$. The advantage of $W^\perp$ is that it lives in $V$ itself, not in the dual: we can add and subtract vectors in $V$ and $W^\perp$ directly.

[quotetheorem:436]

This is the orthogonal decomposition theorem: every vector splits uniquely into a component in $W$ and a component in $W^\perp$. The proof uses two ingredients. First, $W \cap W^\perp = \{0\}$ by positive-definiteness: if $v \in W \cap W^\perp$ then $(v, v) = 0$, so $v = \mathbf{0}$. Second, $\dim W^\perp = n - k$ by Rank-Nullity applied to the map $v \mapsto ((e_1, v), \dots, (e_k, v))$, whose kernel is $W^\perp$. Together, $\dim(W + W^\perp) = n$.

[citeproof:436]

Orthogonal Projection and Best Approximation

The decomposition $V = W \oplus W^\perp$ means that every vector $v$ has a unique "shadow" in $W$ — the component $\pi(v) \in W$ obtained by discarding the $W^\perp$ part. This shadow is not just any element of $W$; it is the closest element of $W$ to $v$. This best-approximation property is what makes orthogonal projection the engine behind least-squares methods, Fourier approximation, and, in infinite dimensions, the variational formulation of PDEs.

[quotetheorem:437]

The projection formula $\pi(v) = \sum_{i=1}^k (e_i, v)\,e_i$ decomposes $v$ into its component in $W$ and its component in $W^\perp$. The best approximation property follows from the Pythagorean theorem: for any $w \in W$, write $v - w = (v - \pi(v)) + (\pi(v) - w)$, where the first term lies in $W^\perp$ and the second in $W$. Since these are orthogonal, $\|v - w\|^2 = \|v - \pi(v)\|^2 + \|\pi(v) - w\|^2$. This is minimised precisely when $w = \pi(v)$.

[citeproof:437]

[example:Least-Squares Approximation]
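Let $V = C([-\pi, \pi], \mathbb{R})$ with inner product $(f, g) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)g(t)\,dt$, and let $W = \langle 1, \cos t, \sin t \rangle$. The three spanning functions are mutually orthogonal, with $\|1\|^2 = 2$ and $\|\cos t\|^2 = \|\sin t\|^2 = 1$. The best approximation to $f(t) = t^2$ from $W$ is therefore $\pi(f) = a_0 + a_1 \cos t + b_1 \sin t$ where $a_0 = (1, f)/\|1\|^2$, $a_1 = (\cos t, f)/\|\cos t\|^2$ and $b_1 = (\sin t, f)/\|\sin t\|^2$. Evaluating the integrals gives $a_0 = \pi^2/3$, $a_1 = -4$ and $b_1 = 0$ (in these values $\pi$ denotes the constant, not the projection). This is the beginning of Fourier approximation.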
[/example]
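
The coefficients in this example can be approximated numerically. The sketch below assumes a midpoint-rule quadrature with a hypothetical number of sample points `n`; the helper names `inner` and `coefficient` are illustrative. It relies on the fact that $1$, $\cos t$ and $\sin t$ are mutually orthogonal for this inner product, so each coefficient is $(e, f)/\|e\|^2$.

```python
import math

def inner(f, g, n=20000):
    """Approximate (f, g) = (1/pi) * integral of f*g over [-pi, pi] (midpoint rule)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = -math.pi + (k + 0.5) * h
        total += f(t) * g(t)
    return total * h / math.pi

def coefficient(e, f):
    """Coefficient of f along e; valid because the spanning functions are orthogonal."""
    return inner(e, f) / inner(e, e)

def one(t):
    return 1.0

def f(t):
    return t * t

a0 = coefficient(one, f)        # close to pi^2 / 3
a1 = coefficient(math.cos, f)   # close to -4
b1 = coefficient(math.sin, f)   # close to 0, since t^2 * sin(t) is odd

print(a0, a1, b1)
```

The resulting approximation $a_0 + a_1 \cos t + b_1 \sin t$ is the closest element of $W$ to $t^2$ in the norm induced by this inner product.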

This example illustrates the broader principle: orthogonal projection does not require the ambient space to be finite-dimensional — only the target subspace $W$ needs a finite orthonormal basis. The formula $\pi(f) = \sum (e_i, f)\, e_i$ works in any inner product space. In the infinite-dimensional setting, taking $W$ to be ever-larger subspaces of trigonometric polynomials leads to Fourier series, and the question of whether $\pi(f) \to f$ as $\dim W \to \infty$ becomes the convergence theory of Hilbert Spaces.

Adjoints

In the general theory of Endomorphisms and Duality, the dual map $\alpha^*: V^* \to V^*$ encodes the "transpose" of a linear map, but it lives in the dual space. When $V$ carries an inner product, we can do better: the canonical identification $V \cong V^*$ allows us to bring the dual map back into $V$ itself. The result is the adjoint $\alpha^*$, which is the unique endomorphism satisfying $(\alpha(v), w) = (v, \alpha^*(w))$.

The adjoint is fundamental because it relates the "input" and "output" sides of a linear map in a way that respects the geometry. Properties of the map — whether it stretches, rotates, or projects — are encoded in the relationship between $\alpha$ and $\alpha^*$.

[quotetheorem:438]

The construction uses orthonormal bases: if $\alpha$ has matrix $A$ with respect to an orthonormal basis, then $\alpha^*$ has matrix $A^\dagger = \bar{A}^\top$. The adjoint property $(\alpha(v), w) = (v, \alpha^*(w))$ follows from a direct computation. Uniqueness holds by non-degeneracy of the inner product: if $\beta$ also satisfies $(\alpha(v), w) = (v, \beta(w))$, then $(v, \alpha^*(w) - \beta(w)) = 0$ for all $v$ and $w$, so $\alpha^*(w) = \beta(w)$ for every $w$, that is, $\alpha^* = \beta$.

[citeproof:438]
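
The defining identity of the adjoint can be verified directly on $\mathbb{C}^2$ with the standard inner product, where the standard basis is orthonormal and the adjoint is the conjugate transpose. The matrix `A` and the vectors `v`, `w` below are hypothetical sample data.

```python
def inner(v, w):
    """Standard inner product on C^n, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(v, w))

def apply(A, v):
    """Matrix-vector product."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dagger(A):
    """Conjugate transpose of a matrix."""
    rows, cols = len(A), len(A[0])
    return [[A[j][i].conjugate() for j in range(rows)] for i in range(cols)]

# Hypothetical sample data.
A = [[1 + 1j, 2],
     [0, 3 - 2j]]
v = [1j, 2]
w = [1 - 1j, 1 + 3j]

lhs = inner(apply(A, v), w)          # (alpha(v), w)
rhs = inner(v, apply(dagger(A), w))  # (v, alpha^*(w))
print(abs(lhs - rhs) < 1e-12)
```

Replacing `dagger(A)` with the plain transpose breaks the identity for this complex `A`, which is why the conjugation is essential over $\mathbb{C}$.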

Two classes of endomorphisms are singled out by their relationship to the adjoint. Each captures a fundamental geometric property.

[definition:Self-Adjoint Map]
An endomorphism $\alpha \in \mathrm{End}(V)$ is self-adjoint (or symmetric over $\mathbb{R}$, Hermitian over $\mathbb{C}$) if $\alpha = \alpha^*$, i.e., $(\alpha(v), w) = (v, \alpha(w))$ for all $v, w$.
[/definition]

In an orthonormal basis, $\alpha$ is self-adjoint iff its matrix satisfies $A = A^\dagger$: real symmetric when $\mathbb{F} = \mathbb{R}$, Hermitian when $\mathbb{F} = \mathbb{C}$. Self-adjoint maps are the "real-valued" endomorphisms of inner product space theory: their eigenvalues are always real, their eigenvectors for distinct eigenvalues are orthogonal, and they admit a particularly clean spectral decomposition (see below). They arise naturally whenever a bilinear form is represented as an operator: if $\phi(v, w) = (v, \alpha(w))$ for a symmetric form $\phi$, then $\alpha$ is self-adjoint.

Isometries: Orthogonal and Unitary Maps

The second distinguished class consists of maps that preserve the inner product — the "rigid motions" of the space.

[definition:Isometry]
An endomorphism $\alpha \in \mathrm{End}(V)$ is an isometry if it preserves the inner product: $(\alpha(v), \alpha(w)) = (v, w)$ for all $v, w$. Over $\mathbb{R}$, these are called orthogonal maps; over $\mathbb{C}$, unitary maps.
[/definition]

An isometry preserves all lengths and angles, and in particular sends orthonormal bases to orthonormal bases. These are the symmetries of the inner product space — the maps under which the geometry is completely invariant. The following theorem collects equivalent characterisations:

[quotetheorem:439]

The key step is that preserving the inner product forces $\alpha^*\alpha = \mathrm{id}$ by non-degeneracy: $(v, \alpha^*\alpha(w)) = (\alpha(v), \alpha(w)) = (v, w)$ for all $v, w$. In finite dimensions, injectivity of $\alpha$ (which follows from $\alpha^*\alpha = \mathrm{id}$) implies bijectivity, giving $\alpha^* = \alpha^{-1}$.

[citeproof:439]
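
As a small numerical illustration, a rotation of $\mathbb{R}^2$ satisfies both characterisations: its matrix has orthonormal columns (so $A^\top A = I$) and it preserves inner products. The angle and the test vectors are hypothetical choices.

```python
import math

def inner(v, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def apply(A, v):
    """Matrix-vector product."""
    return [inner(row, v) for row in A]

def rotation(theta):
    """Matrix of the rotation of R^2 by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

Q = rotation(0.7)                 # hypothetical angle
v, w = [3.0, -1.0], [2.0, 5.0]    # hypothetical vectors

# Entries of Q^T Q are the inner products of the columns of Q.
cols = [list(col) for col in zip(*Q)]
gram = [[inner(ci, cj) for cj in cols] for ci in cols]

preserved = abs(inner(apply(Q, v), apply(Q, w)) - inner(v, w)) < 1e-12
print(gram, preserved)
```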

[definition:Orthogonal and Unitary Groups]
The orthogonal group $\mathrm{O}(V) = \{\alpha \in \mathrm{End}(V) : \alpha^*\alpha = \mathrm{id}\}$ (over $\mathbb{R}$) and the unitary group $\mathrm{U}(V) = \{\alpha \in \mathrm{End}(V) : \alpha^*\alpha = \mathrm{id}\}$ (over $\mathbb{C}$) are the groups of isometries. In matrix form: $\mathrm{O}(n) = \{A \in \mathrm{GL}_n(\mathbb{R}) : A^\top A = I\}$ and $\mathrm{U}(n) = \{A \in \mathrm{GL}_n(\mathbb{C}) : A^\dagger A = I\}$.
[/definition]

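These groups are fundamental objects in their own right. The orthogonal group $\mathrm{O}(n)$ consists of rotations and reflections of $\mathbb{R}^n$; the unitary group $\mathrm{U}(n)$ plays the same role for $\mathbb{C}^n$. The condition $\alpha^*\alpha = \mathrm{id}$ forces $|\det \alpha| = 1$, and in matrix form it says that the columns of $A$ are orthonormal, so the matrix entries are bounded. Since the condition is also closed, $\mathrm{O}(n)$ and $\mathrm{U}(n)$ are compact subgroups of the general linear group, a fact with deep consequences in representation theory and physics. (The determinant condition alone would not suffice: matrices of determinant $1$ form an unbounded set.)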

Spectral Theory

The general theory of Endomorphisms shows that an arbitrary endomorphism need not be diagonalisable: the characteristic polynomial may lack roots over the ground field, or eigenspaces may fail to span the full space. The inner product changes this picture dramatically for two classes of maps: self-adjoint maps and (over $\mathbb{C}$) unitary maps. For these, diagonalisability is guaranteed, and the eigenvectors can be chosen to form an orthonormal basis — a much stronger conclusion than mere diagonalisability.

The Spectral Theorem for Self-Adjoint Maps

[quotetheorem:440]

This is the most important structure theorem in inner product space theory, and the finite-dimensional precursor of the spectral theory that pervades functional analysis and quantum mechanics. It says that every self-adjoint map, when viewed in the right basis, is just scalar multiplication on each axis — the geometry is entirely determined by the eigenvalues.

The proof proceeds by induction on dimension. Three key ingredients make it work, each exploiting the inner product in an essential way:

(i) Eigenvalues are real. If $\alpha(v) = \lambda v$ with $v \neq \mathbf{0}$, then $\lambda\|v\|^2 = (v, \alpha(v)) = (\alpha(v), v) = \bar\lambda\|v\|^2$, forcing $\lambda = \bar\lambda \in \mathbb{R}$. This is the step that fails for general endomorphisms.

(ii) Eigenvectors for distinct eigenvalues are orthogonal. If $\alpha(v) = \lambda v$ and $\alpha(w) = \mu w$ with $\lambda \neq \mu$, then $\lambda(v, w) = (\alpha(v), w) = (v, \alpha(w)) = \mu(v, w)$, where the first equality uses that $\lambda$ is real by (i). Hence $(\lambda - \mu)(v, w) = 0$, forcing $(v, w) = 0$. The eigenspaces are automatically perpendicular, so no Gram–Schmidt is needed.

(iii) Orthogonal complements of eigenspaces are invariant. If $e_1$ is a unit eigenvector with eigenvalue $\lambda$, then for any $w \in \langle e_1 \rangle^\perp$, we have $(e_1, \alpha(w)) = (\alpha(e_1), w) = \lambda(e_1, w) = 0$. So $\alpha$ maps $\langle e_1 \rangle^\perp$ to itself, and we may apply induction.

Over $\mathbb{R}$, the existence of a real eigenvalue requires an additional argument (the complexified characteristic polynomial has a root, which must be real by step (i)). Over $\mathbb{C}$, eigenvalues exist automatically.

[citeproof:440]

In matrix terms: every real symmetric matrix can be written as $A = Q D Q^\top$ where $Q$ is orthogonal and $D$ is diagonal with real entries; every Hermitian matrix can be written as $A = U D U^\dagger$ where $U$ is unitary and $D$ is real diagonal.

A useful corollary connects back to the theory of Bilinear Forms II: if $\phi$ is a positive-definite symmetric bilinear form and $\psi$ is any symmetric bilinear form on a real vector space, we can use $\phi$ as an inner product and apply the spectral theorem to the self-adjoint map $\alpha$ defined by $\psi(v, w) = \phi(v, \alpha(w))$, yielding a basis that simultaneously diagonalises both forms.

The Spectral Theorem for Unitary Maps

Over $\mathbb{C}$, the same orthogonal-diagonalisation phenomenon occurs for unitary maps, though with a different constraint on the eigenvalues.

[quotetheorem:441]

The argument parallels the self-adjoint case, with the key difference that eigenvalues now satisfy $|\lambda| = 1$ rather than $\lambda \in \mathbb{R}$. This follows from the norm-preserving property: $\|\alpha(v)\| = \|v\|$ forces $|\lambda| = 1$. Eigenvectors for distinct eigenvalues are again orthogonal (the proof is similar), and orthogonal complements of eigenspaces are invariant because $\alpha^* = \alpha^{-1}$.

[citeproof:441]

This result is specific to $\mathbb{C}$: a real orthogonal map need not be diagonalisable over $\mathbb{R}$. For example, a rotation of $\mathbb{R}^2$ by angle $\theta \notin \{0, \pi\}$ has no real eigenvalues. The classification of real orthogonal maps — showing they decompose into $2 \times 2$ rotation blocks and $\pm 1$ entries — requires a more refined analysis.

Worked Example

[problem]
Let $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$. Find an orthogonal matrix $Q$ such that $Q^\top A Q$ is diagonal.
[/problem]

[solution]
Step 1: Find eigenvalues. $\chi_A(t) = (2-t)^2 - 1 = t^2 - 4t + 3 = (t-1)(t-3)$. Eigenvalues: $\lambda_1 = 1$, $\lambda_2 = 3$.

Step 2: Find eigenvectors. For $\lambda_1 = 1$: $(A - I)v = 0$ gives $v_1 = \frac{1}{\sqrt{2}}(-1, 1)^\top$. For $\lambda_2 = 3$: $(A - 3I)v = 0$ gives $v_2 = \frac{1}{\sqrt{2}}(1, 1)^\top$.

Step 3: Form $Q$. $Q = \frac{1}{\sqrt{2}}\begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix}$, with columns $v_1$ and $v_2$. Check: $Q^\top Q = I$ and $Q^\top A Q = \mathrm{diag}(1, 3)$.

This illustrates the Spectral Theorem for Self-Adjoint Maps: $A$ is real symmetric, so it is orthogonally diagonalisable with real eigenvalues. Note that the eigenvectors are automatically orthogonal (as the theorem guarantees), so normalisation suffices — no Gram–Schmidt is needed.
[/solution]
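
The two checks in Step 3 can be carried out numerically. In the sketch below the matrix `A` is an assumption: it is the symmetric matrix with entries $2, 1, 1, 2$, which is the one consistent with the characteristic polynomial $(2-t)^2 - 1$ and the eigenvectors found in the solution. The columns of `Q` are $v_1$ and $v_2$.

```python
import math

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]

# Assumed matrix, consistent with the eigen-data in the solution.
A = [[2.0, 1.0],
     [1.0, 2.0]]

s = 1 / math.sqrt(2)
Q = [[-s, s],
     [s, s]]   # columns are v_1 = (-1, 1)/sqrt(2) and v_2 = (1, 1)/sqrt(2)

QtQ = matmul(transpose(Q), Q)               # should be the identity
D = matmul(matmul(transpose(Q), A), Q)      # should be diag(1, 3)

print([[round(x, 10) for x in row] for row in QtQ])
print([[round(x, 10) for x in row] for row in D])
```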

The spectral theorem is the summit of the course — the point where algebra (eigenvalues, diagonalisation) and geometry (orthogonality, projection) meet perfectly. The remaining section collects several results that synthesise ideas from across the course, applying the full toolkit to decomposition problems and structural questions.

Further Topics in Linear Algebra

This section collects several results that synthesise ideas from across the course: the decomposition of matrix spaces by symmetry and trace, the theory of idempotent operators, and the stabilisation of rank sequences under iteration. These topics connect the algebraic theory of endomorphisms to the geometric theory of bilinear forms, and provide computational tools for working with Jordan normal forms.

Decomposition of the Matrix Algebra

Symmetric and Skew-Symmetric Parts

Every square matrix can be uniquely split into a symmetric part and a skew-symmetric part — the matrix analogue of decomposing a bilinear form into its symmetric and alternating components.

[definition:Symmetric and Skew-Symmetric Matrices]
A matrix $A \in M_n(\mathbb{F})$ is symmetric if $A^\top = A$ and skew-symmetric if $A^\top = -A$. Note that if $\mathrm{Char}\,\mathbb{F} = 2$, these conditions coincide.
[/definition]

[quotetheorem:442]

The proof is immediate: set $C = \frac{1}{2}(A + A^\top)$ and $B = \frac{1}{2}(A - A^\top)$. Division by 2 requires $\mathrm{Char}\,\mathbb{F} \neq 2$. Uniqueness follows by transposing $A = C + B$ to get $A^\top = C - B$, and solving.

[citeproof:442]

[example:Symmetric-Skew Decomposition]
For $A = MATHENVgxq8ujP77END$: $C = \frac{1}{2}(A + A^\top) = MATHENVgxq8ujP78END$, $B = \frac{1}{2}(A - A^\top) = MATHENVgxq8ujP79END$.
[/example]

This lifts to a vector space decomposition:

[quotetheorem:443]

The proof verifies $S_n \cap \mathcal{A}_n = \{0\}$ (if $A = A^\top = -A$ then $2A = 0$) and that every matrix lies in the sum. The dimensions $\dim S_n = \frac{n(n+1)}{2}$ and $\dim \mathcal{A}_n = \frac{n(n-1)}{2}$ sum to $n^2 = \dim M_n$.

[citeproof:443]

Refining by Trace

Within the symmetric matrices, we can further isolate the scalar (trace) part:

[quotetheorem:444]

The decomposition $B = cI_n + F$ with $c = \frac{1}{n}\mathrm{tr}(B)$ and $\mathrm{tr}(F) = 0$ is unique. The condition $\mathrm{Char}\,\mathbb{F} \nmid n$ ensures divisibility by $n$.

[citeproof:444]

Combining these two decompositions yields a three-way splitting:

[quotetheorem:445]

Every matrix $A \in M_n(\mathbb{F})$ decomposes uniquely as $A = cI_n + F + B$ where $c = \frac{1}{n}\mathrm{tr}(A)$ is the scalar part, $F$ is traceless symmetric, and $B$ is skew-symmetric. This decomposition is important in physics and representation theory: for $n = 3$ over $\mathbb{R}$, the three summands have dimensions $1 + 5 + 3 = 9$, corresponding to the irreducible representations of $\mathrm{O}(3)$ on the space of $3 \times 3$ matrices.

[citeproof:445]
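
The three-way splitting is easy to compute exactly over $\mathbb{Q}$. The following sketch uses Python's `fractions` module so that division by $2$ and by $n$ is exact; the function name `decompose` and the sample matrix are hypothetical, and integer entries are assumed.

```python
from fractions import Fraction

def decompose(A):
    """Split a square integer matrix as c*I + F + B.

    c is the scalar part tr(A)/n, F is traceless symmetric, B is skew-symmetric.
    """
    n = len(A)
    c = Fraction(sum(A[i][i] for i in range(n)), n)
    sym = [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]
    skew = [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]
    # Remove the scalar part from the symmetric part to make it traceless.
    traceless = [[sym[i][j] - (c if i == j else 0) for j in range(n)]
                 for i in range(n)]
    return c, traceless, skew

# Hypothetical sample matrix.
A = [[1, 2, 0],
     [4, 3, 5],
     [6, 1, 2]]

c, F, B = decompose(A)
n = len(A)
rebuilt = [[(c if i == j else 0) + F[i][j] + B[i][j] for j in range(n)]
           for i in range(n)]

print(c)             # 2
print(rebuilt == A)  # True
```

For $n = 3$ the three pieces have $1$, $5$ and $3$ free parameters respectively, matching the dimension count $1 + 5 + 3 = 9$.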

Idempotent Endomorphisms

An idempotent is an endomorphism that equals its own square — the algebraic abstraction of a projection.

[definition:Idempotent]
An endomorphism $\alpha \in \mathrm{End}(V)$ is idempotent if $\alpha^2 = \alpha$. The only eigenvalues of an idempotent are $0$ and $1$ (if $\alpha(v) = \lambda v$ with $v \neq \mathbf{0}$, then $\lambda v = \alpha(v) = \alpha^2(v) = \lambda^2 v$, so $\lambda^2 = \lambda$).
[/definition]

[quotetheorem:446]

The proof is elementary: write $v = (v - \alpha(v)) + \alpha(v)$, check that $v - \alpha(v) \in \ker(\alpha)$ (since $\alpha(v - \alpha(v)) = \alpha(v) - \alpha^2(v) = 0$), and verify $\ker(\alpha) \cap \mathrm{im}(\alpha) = \{0\}$ (if $v = \alpha(u) \in \ker(\alpha)$ then $v = \alpha(u) = \alpha^2(u) = \alpha(v) = 0$). Moreover, $\alpha$ acts as the identity on $\mathrm{im}(\alpha)$ and as zero on $\ker(\alpha)$.

[citeproof:446]

This result holds without any finite-dimensionality assumption. In an inner product space, an idempotent is an orthogonal projection if and only if it is self-adjoint ($\alpha = \alpha^*$).

[example:Complementary Idempotents]
If $\alpha$ is idempotent, then so is $\beta = \mathrm{id} - \alpha$, and $V = \mathrm{im}(\alpha) \oplus \mathrm{im}(\beta) = \mathrm{im}(\alpha) \oplus \ker(\alpha)$. The pair $(\alpha, \beta)$ gives complementary projections with $\alpha + \beta = \mathrm{id}$ and $\alpha\beta = \beta\alpha = 0$. More generally, a family of idempotents $\alpha_1, \dots, \alpha_k$ with $\sum \alpha_i = \mathrm{id}$ and $\alpha_i \alpha_j = 0$ for $i \neq j$ gives a direct sum decomposition $V = \bigoplus_{i=1}^k \mathrm{im}(\alpha_i)$.
[/example]
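
A concrete pair of complementary idempotents on $\mathbb{R}^2$ makes these identities tangible. The matrix `P` below is a hypothetical choice: it projects onto the first coordinate axis along the direction $(1, -1)$, so it is idempotent but not symmetric, and hence not an orthogonal projection for the standard inner product.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

I2 = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# Hypothetical idempotent: projection onto the x-axis along (1, -1).
P = [[1, 1],
     [0, 0]]
# Complementary idempotent id - P.
B = [[I2[i][j] - P[i][j] for j in range(2)] for i in range(2)]

print(matmul(P, P) == P)     # P is idempotent
print(matmul(B, B) == B)     # so is id - P
print(matmul(P, B) == ZERO)  # the two projections annihilate each other
print(matmul(B, P) == ZERO)
```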

Stabilisation of Rank Sequences

The sequence of ranks $r_k = \mathrm{rank}(\alpha^k)$ encodes deep structural information about an endomorphism, particularly its nilpotent part and Jordan block sizes.

[quotetheorem:447]

Part (1) is immediate: $\mathrm{im}(\alpha^{k+1}) \subseteq \mathrm{im}(\alpha^k)$. Part (2) uses the observation that $d_k = r_k - r_{k+1} = \dim(\ker(\alpha) \cap \mathrm{im}(\alpha^k))$, and $\mathrm{im}(\alpha^{k+1}) \subseteq \mathrm{im}(\alpha^k)$ gives $d_{k+1} \leq d_k$. Part (3): if $r_k = r_{k+1}$ then $\alpha$ is injective on $\mathrm{im}(\alpha^k)$, so the image stabilises from step $k$ onward.

[citeproof:447]

Application: Computing Jordan Block Sizes

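For a nilpotent endomorphism $\alpha$ with $\alpha^m = 0$ and $\alpha^{m-1} \neq 0$ on an $n$-dimensional space, set $r_0 = n$ and compute the sequence $(r_0, r_1, \dots, r_{m-1}, 0)$, with $r_k = 0$ for all $k \geq m$. The number of Jordan blocks of size at least $j$ is $r_{j-1} - r_j$, so the number of blocks of size exactly $j$ is $(r_{j-1} - r_j) - (r_j - r_{j+1}) = r_{j-1} - 2r_j + r_{j+1}$.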

[example:Jordan Blocks from Rank Sequence]
Let $\alpha: \mathbb{R}^5 \to \mathbb{R}^5$ with $\alpha^3 = 0$, $\alpha^2 \neq 0$. The rank sequence $(5, r_1, r_2, 0)$ must have non-increasing differences: $5 - r_1 \geq r_1 - r_2 \geq r_2 - 0 = r_2$. Also $r_2 \geq 1$ (since $\alpha^2 \neq 0$).

If $(r_1, r_2) = (3, 1)$: differences are $2, 2, 1$ — non-increasing ✓. Blocks: one $3 \times 3$ block and one $2 \times 2$ block (sizes summing to 5, with largest block size 3).

If $(r_1, r_2) = (2, 1)$: differences are $3, 1, 1$ — non-increasing ✓. Blocks: one $3 \times 3$ block and two $1 \times 1$ zero blocks.

The case $(r_1, r_2) = (4, 1)$ would give differences $1, 3, 1$ — not non-increasing ✗.
[/example]
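
The block count can be automated. The sketch below assumes the rank sequence of a nilpotent map is given as a list ending in $0$, and uses the count $r_{j-1} - 2r_j + r_{j+1}$ for blocks of size exactly $j$ (the difference between the number of blocks of size at least $j$ and of size at least $j+1$). The function name `jordan_block_counts` is hypothetical.

```python
def jordan_block_counts(ranks):
    """Return {block size: number of blocks} for a nilpotent endomorphism.

    ranks = [r_0, r_1, ..., r_m] with r_0 = n and r_m = 0.
    Raises ValueError if the sequence cannot be a rank sequence.
    """
    r = list(ranks) + [0]  # ranks stay 0 beyond the nilpotency index
    counts = {}
    for j in range(1, len(ranks)):
        count = r[j - 1] - 2 * r[j] + r[j + 1]
        if count < 0:
            raise ValueError("differences of the rank sequence must be non-increasing")
        if count:
            counts[j] = count
    return counts

print(jordan_block_counts([5, 3, 1, 0]))  # {2: 1, 3: 1}
print(jordan_block_counts([5, 2, 1, 0]))  # {1: 2, 3: 1}

try:
    jordan_block_counts([5, 4, 1, 0])
except ValueError as err:
    print("rejected:", err)
```

The first two calls reproduce the two admissible cases of the example, and the third is rejected because its differences $1, 3, 1$ are not non-increasing.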

Failure of Distributive Laws for Subspaces

We close with a cautionary example showing that subspace operations do not satisfy all set-theoretic identities.

[example:Failure of Distributivity]
In $\mathbb{R}^3$, let $T = \langle (1,0,0) \rangle$, $U = \langle (0,1,1) \rangle$, $W = \langle (1,1,1) \rangle$. Then $U \cap W = \{0\}$ (since $(0,1,1)$ and $(1,1,1)$ are not proportional, any non-zero vector in both spans would need to be a scalar multiple of both, which is impossible). So $T + (U \cap W) = T$.

But $(T + U) \cap (T + W) \supseteq \langle (1,1,1) \rangle \not\subseteq T$, so $T + (U \cap W) \subsetneq (T + U) \cap (T + W)$.

The correct general statement is only an inclusion: $T + (U \cap W) \subseteq (T + U) \cap (T + W)$. Equality holds when $T \supseteq U$ or $T \supseteq W$, and also when $T \subseteq U$ or $T \subseteq W$; the latter case is the modular law, which the lattice of subspaces always satisfies.
[/example]
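
The strict inclusion can be confirmed by comparing dimensions, using $\dim(X \cap Y) = \dim X + \dim Y - \dim(X + Y)$. The sketch below is illustrative: the helper names `rank` and `dim_intersection` are hypothetical, and ranks are computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def dim_intersection(X, Y):
    """dim(X ∩ Y) for subspaces given by spanning lists X and Y."""
    return rank(X) + rank(Y) - rank(X + Y)

T = [(1, 0, 0)]
U = [(0, 1, 1)]
W = [(1, 1, 1)]

dim_UW = dim_intersection(U, W)          # 0, so T + (U ∩ W) = T
dim_left = rank(T)                       # dim of T + (U ∩ W), valid since U ∩ W = {0}
dim_right = dim_intersection(T + U, T + W)

print(dim_UW, dim_left, dim_right)       # 0 1 2
```

Here `T + U` is list concatenation, which gives a spanning list for the sum of the two subspaces.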

[problem]
Let $\alpha: \mathbb{R}^6 \to \mathbb{R}^6$ be nilpotent with $\alpha^4 = 0$, $\alpha^3 \neq 0$, $\mathrm{rank}(\alpha) = 4$, and $\mathrm{rank}(\alpha^2) = 2$. Determine the Jordan normal form of $\alpha$.
[/problem]

[solution]
The rank sequence is $(6, 4, 2, r_3, 0)$ with $r_3 \geq 1$ (since $\alpha^3 \neq 0$). The differences are $2, 2, 2 - r_3, r_3$. For non-increasing differences: $2 \geq 2 \geq 2 - r_3 \geq r_3$. From $2 - r_3 \geq r_3$: $r_3 \leq 1$. Since $r_3 \geq 1$: $r_3 = 1$.

Differences: $d_k = r_k - r_{k+1}$ gives $(d_0, d_1, d_2, d_3) = (2, 2, 1, 1)$, and $d_{k-1}$ is the number of blocks of size at least $k$. Equivalently, with $n_k = \dim\ker(\alpha^k) = n - r_k$, the kernel dimensions are $(n_0, \dots, n_4) = (0, 2, 4, 5, 6)$ and $d_{k-1} = n_k - n_{k-1}$. The number of blocks of size exactly $k$ is $d_{k-1} - d_k$ (with $d_4 = 0$): size 1: $2 - 2 = 0$, size 2: $2 - 1 = 1$, size 3: $1 - 1 = 0$, size 4: $1 - 0 = 1$.

So the Jordan form has one $4 \times 4$ block and one $2 \times 2$ block (total dimension $4 + 2 = 6$ ✓).
[/solution]

Attribution Summary

admin

Contributions: 1
Sources: create
Last Modified: 4/4/2026