Cambridge IA Vectors and Matrices
Cambridge IA Vectors and Matrices — Verification - Androma

Current Content

Debug: Found 2 attribution entries

First Attribution: Source: create, Text length: 60349, Start: N/A, End: N/A

Page content length: 64565

How do you solve a system of ten linear equations in ten unknowns? How do you describe all the symmetries of a cube, or the long-term behaviour of a population model? How do you classify all the conic sections — ellipses, parabolas, hyperbolas — in a single unified framework? These are the questions that drive this course, and the answer to all of them turns out to be the same body of theory: linear algebra.

The course begins with two concrete algebraic systems — complex numbers and vectors in $\mathbb{R}^3$ — that arise independently in algebra and geometry. Complex numbers extend the real line to an algebraically closed field where every polynomial has roots; vectors in three dimensions provide the language of forces, velocities, and electromagnetic fields. The first surprise of the course is that these apparently unrelated systems share a common algebraic structure: both support addition and scalar multiplication satisfying the same axioms. Abstracting this structure gives the notion of a vector space, and with it comes a coordinate-free theory of dimension, bases, and linear independence that applies uniformly to $\mathbb{R}^n$, polynomial spaces, function spaces, and matrix spaces.

The second half of the course studies linear maps — the structure-preserving transformations between vector spaces — and their matrix representations. The central question becomes: given a linear map, can we choose coordinates in which it takes the simplest possible form? This is the diagonalisation problem, and its solution requires the theory of eigenvalues, determinants, and the characteristic polynomial. The course culminates in the spectral theorem for real symmetric matrices, which guarantees not just diagonalisation but an orthonormal eigenbasis, with applications to quadratic forms, coupled differential equations, and the classification of conics.

Prerequisites are limited to fluency with real numbers, basic set notation, and elementary algebra. No prior knowledge of complex numbers, abstract algebra, or analysis is assumed.

Complex Numbers

The real number line, for all its power, has a fundamental algebraic gap: not every polynomial equation has a solution. The equation $x^2 + 1 = 0$ demands a number whose square is $-1$, and no real number has this property — the square of any real number is non-negative. This is not merely an inconvenience. Without roots of $x^2 + 1 = 0$, the factorisation $x^4 - 1 = (x^2 - 1)(x^2 + 1) = (x-1)(x+1)(x^2+1)$ terminates prematurely: four roots are expected from a degree-four polynomial, but only two are visible. To restore the algebraic completeness that the real numbers lack, we adjoin a single new element $i$ satisfying $i^2 = -1$ and extend the field operations accordingly.

The Complex Field

The key observation is that if we want $i^2 = -1$ and we want the usual laws of arithmetic to hold, then every expression built from real numbers and $i$ simplifies to the form $x + iy$ with $x, y \in \mathbb{R}$, since higher powers of $i$ cycle: $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, and so on. This motivates the following definition.

[definition:Complex Number]
A complex number is an ordered pair $(x, y) \in \mathbb{R}^2$, written $z = x + iy$, where $i$ is a formal symbol satisfying $i^2 = -1$. The real number $x = \operatorname{Re}(z)$ is the real part and $y = \operatorname{Im}(z)$ is the imaginary part. The set of all complex numbers is denoted $\mathbb{C}$. The real numbers $\mathbb{R}$ embed in $\mathbb{C}$ via the identification $x \mapsto x + i \cdot 0$.
[/definition]

Note that the imaginary part of $z = 3 + 2i$ is $2$, not $2i$ — the imaginary part is a real number.

The arithmetic operations on $\mathbb{C}$ are forced by the requirement that the usual distributive, commutative, and associative laws hold together with $i^2 = -1$.

[definition:Complex Addition]
For $z = x + iy$ and $w = u + iv$ in $\mathbb{C}$, the sum is
\begin{align*} z + w = (x + u) + i(y + v). \end{align*}
[/definition]

[definition:Complex Multiplication]
For $z = x + iy$ and $w = u + iv$ in $\mathbb{C}$, the product is
\begin{align*} zw = (xu - yv) + i(xv + yu). \end{align*}
[/definition]

This multiplication rule is not arbitrary — it is the unique formula obtained by expanding $(x + iy)(u + iv) = xu + ixv + iyu + i^2yv$ and using $i^2 = -1$. With these operations, $\mathbb{C}$ forms a field: every nonzero complex number has a multiplicative inverse.

How do we compute $1/z$? The naive approach of dividing by $x + iy$ is meaningless, but there is a trick: multiply numerator and denominator by the conjugate of the denominator.

[definition:Complex Conjugate]
For $z = x + iy \in \mathbb{C}$, the complex conjugate is $\overline{z} = x - iy$.
[/definition]

The conjugate reverses the sign of the imaginary part while preserving the real part. Its key algebraic property is that $z\overline{z} = x^2 + y^2$ is always a non-negative real number: the imaginary cross-terms cancel. This immediately yields the formula for division:
\begin{align*} \frac{1}{z} = \frac{\overline{z}}{z\overline{z}} = \frac{x - iy}{x^2 + y^2}, \end{align*}
valid whenever $z \neq 0$ (i.e., not both $x = 0$ and $y = 0$). The quantity $x^2 + y^2$ appearing in the denominator deserves its own name.

[definition:Modulus]
For $z = x + iy \in \mathbb{C}$, the modulus is $|z| = \sqrt{x^2 + y^2} = \sqrt{z\overline{z}}$.
[/definition]

Geometrically, the modulus is the distance from $z$ to the origin in the Argand diagram (the complex plane, where the horizontal axis represents real parts and the vertical axis represents imaginary parts). The conjugate $\overline{z}$ is the reflection of $z$ across the real axis.

[example:Division of Complex Numbers]
To compute $\frac{3 + 4i}{1 - 2i}$, multiply numerator and denominator by the conjugate of the denominator:
\begin{align*} \frac{3 + 4i}{1 - 2i} = \frac{(3 + 4i)(1 + 2i)}{(1 - 2i)(1 + 2i)} = \frac{3 + 6i + 4i + 8i^2}{1 + 4} = \frac{-5 + 10i}{5} = -1 + 2i. \end{align*}
Verification: $(-1 + 2i)(1 - 2i) = -1 + 2i + 2i - 4i^2 = -1 + 4i + 4 = 3 + 4i$. $\checkmark$
[/example]

Polar Form and De Moivre's Theorem

The Cartesian representation $z = x + iy$ is natural for addition but clumsy for multiplication: the product formula $(xu - yv) + i(xv + yu)$ obscures what multiplication does geometrically. To see the geometric picture, we need a different representation.

Any nonzero complex number $z = x + iy$ determines a point in the plane at distance $r = |z|$ from the origin and at angle $\theta$ from the positive real axis. This gives $x = r\cos\theta$ and $y = r\sin\theta$, so $z = r(\cos\theta + i\sin\theta)$.

[definition:Argument]
For $z \in \mathbb{C} \setminus \{0\}$, the argument $\arg(z)$ is any real number $\theta$ such that $z = |z|(\cos\theta + i\sin\theta)$. The principal argument $\operatorname{Arg}(z)$ is the unique value satisfying $-\pi < \operatorname{Arg}(z) \leq \pi$.
[/definition]

The argument is only defined up to multiples of $2\pi$: the angles $\theta$ and $\theta + 2\pi$ determine the same point in the plane. This multi-valuedness, though a nuisance in elementary calculations, becomes a deep structural feature in complex analysis (branch cuts, Riemann surfaces).

Now we can see what multiplication does geometrically. If $z = r(\cos\theta + i\sin\theta)$ and $w = s(\cos\phi + i\sin\phi)$, then a direct computation using the angle addition formulae shows that multiplication amounts to multiplying moduli and adding arguments. This is not at all obvious from the Cartesian product formula — only the polar representation reveals it.

[quotetheorem:911]

This is a remarkable simplification: multiplication of complex numbers is rotation and scaling. Multiplying by $w$ rotates by $\arg(w)$ and scales by $|w|$. This geometric insight — invisible in the Cartesian formula — is the fundamental reason complex numbers appear throughout physics and engineering whenever rotations, oscillations, or waves are involved. The proof is a direct application of the angle addition identities from trigonometry.

[citeproof:911]

An immediate consequence is De Moivre's Theorem, which follows by repeated application of the polar multiplication rule. The question it answers is: what happens when we raise $\cos\theta + i\sin\theta$ to an integer power? Since each multiplication adds $\theta$ to the argument, the answer is simply $n\theta$.

[quotetheorem:912]

De Moivre's theorem converts a difficult algebraic problem — computing high powers of complex numbers — into a trivial trigonometric one: simply multiply the angle. It also provides a systematic method for finding $n$th roots of any complex number. The inductive step uses the Multiplication of Complex Numbers in Polar Form, and the negative exponent case follows from the formula for the reciprocal of a unit complex number.

[citeproof:912]

Roots of Unity and the Fundamental Theorem of Algebra

Consider the equation $z^n = 1$. Writing $z = \cos\theta + i\sin\theta$ (since $|z|^n = 1$ forces $|z| = 1$), De Moivre's theorem gives $\cos(n\theta) + i\sin(n\theta) = 1$, so $n\theta = 2\pi k$ for integer $k$. This yields $n$ distinct solutions:

[example:Roots of Unity]
The $n$th roots of unity are
\begin{align*} \omega_k = \cos\frac{2\pi k}{n} + i\sin\frac{2\pi k}{n}, \qquad k = 0, 1, \ldots, n-1. \end{align*}
They are equally spaced on the unit circle, forming the vertices of a regular $n$-gon.

For $n = 3$: the cube roots of unity are $1$, $\omega = \cos(2\pi/3) + i\sin(2\pi/3) = -\frac{1}{2} + \frac{\sqrt{3}}{2}i$, and $\omega^2 = -\frac{1}{2} - \frac{\sqrt{3}}{2}i$. Note that $\omega^2 = \overline{\omega}$ and $1 + \omega + \omega^2 = 0$ — the sum of all $n$th roots of unity always vanishes for $n \geq 2$, since they are the roots of $z^n - 1 = 0$ and the coefficient of $z^{n-1}$ is zero (Vieta's formula).

For $n = 4$: the fourth roots of unity are $1, i, -1, -i$. These satisfy $z^4 - 1 = (z-1)(z+1)(z-i)(z+i)$, which provides the factorisation into linear factors over $\mathbb{C}$ that was impossible over $\mathbb{R}$.
[/example]

This example illustrates a deep fact: over $\mathbb{C}$, every polynomial of degree $n$ factors into exactly $n$ linear factors. This is the Fundamental Theorem of Algebra — the original motivation for introducing complex numbers. Its proof requires tools from analysis (the simplest proofs use Liouville's theorem or the winding number), but its consequence is purely algebraic: $\mathbb{C}$ is algebraically closed, meaning no further field extension is needed to solve polynomial equations. The real numbers, by contrast, are not algebraically closed — which is precisely the deficiency that $\mathbb{C}$ was designed to repair.

Euler's Formula

The notation $\cos\theta + i\sin\theta$ is unwieldy. Euler's formula provides a compact alternative by connecting the complex exponential to trigonometry:
\begin{align*} e^{i\theta} = \cos\theta + i\sin\theta. \end{align*}
This can be motivated by substituting $i\theta$ into the power series $e^x = \sum_{n=0}^{\infty} x^n/n!$ and separating real and imaginary parts (the real terms give the Taylor series for $\cos\theta$, the imaginary terms give $\sin\theta$). With this notation, the polar form becomes $z = re^{i\theta}$, multiplication becomes $zw = rse^{i(\theta + \phi)}$, and De Moivre's Theorem becomes the obvious identity $(e^{i\theta})^n = e^{in\theta}$.

Euler's formula also yields the remarkable identity $e^{i\pi} + 1 = 0$, connecting five fundamental constants of mathematics in a single equation.

Vectors in Three Dimensions

Many physical quantities — forces, velocities, electric fields — are not adequately described by a single number. A force has both a magnitude and a direction; specifying the wind speed without the wind direction is insufficient for navigation. The mathematical object that captures "magnitude plus direction" is a vector. This chapter develops the algebra and geometry of vectors in $\mathbb{R}^3$, building toward two fundamental operations — the dot product and the cross product — that encode angles, lengths, areas, and volumes.

Vectors and the Dot Product

A vector $a \in \mathbb{R}^3$ is an ordered triple $a = (a_1, a_2, a_3)$ with $a_i \in \mathbb{R}$. Vectors add componentwise: $a + b = (a_1 + b_1, a_2 + b_2, a_3 + b_3)$. Scalar multiplication scales each component: $\lambdaa = (\lambda a_1, \lambda a_2, \lambda a_3)$.

The first operation we need is one that extracts the angle between two vectors. Given vectors $a$ and $b$, the cosine rule from Euclidean geometry gives
\begin{align*} |a - b|^2 = |a|^2 + |b|^2 - 2|a||b|\cos\theta, \end{align*}
where $\theta$ is the angle between them. Expanding the left side componentwise:
\begin{align*} |a - b|^2 = \sum_i (a_i - b_i)^2 = \sum_i a_i^2 - 2\sum_i a_i b_i + \sum_i b_i^2. \end{align*}
Comparing the two expressions gives $\sum_i a_i b_i = |a||b|\cos\theta$. This motivates the following definition.

[definition:Dot Product]
For $a, b \in \mathbb{R}^3$, the dot product (or scalar product) is
\begin{align*} a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3. \end{align*}
[/definition]

The derivation above immediately gives the geometric interpretation: $a \cdot b = |a||b|\cos\theta$, where $\theta$ is the angle between the vectors. This formula encodes two important special cases: orthogonality ($a \cdot b = 0$ if and only if $\theta = \pi/2$) and length ($a \cdot a = |a|^2$). The dot product also gives the projection of one vector onto another: the component of $b$ in the direction of $a$ is $(b \cdot \hat{a})\hat{a}$, where $\hat{a} = a/|a|$ is the unit vector.

[example:Angle Between Vectors]
Let $a = (1, 2, 2)$ and $b = (3, -4, 0)$. Then $a \cdot b = 3 - 8 + 0 = -5$, $|a| = \sqrt{1 + 4 + 4} = 3$, $|b| = \sqrt{9 + 16} = 5$, so
\begin{align*} \cos\theta = \frac{-5}{3 \cdot 5} = -\frac{1}{3}, \qquad \theta = \arccos(-1/3) \approx 109.5°. \end{align*}
The negative dot product indicates an obtuse angle — the vectors point "away" from each other.
[/example]

The Cross Product

The dot product takes two vectors and returns a scalar. But in three dimensions, there is a natural operation that takes two vectors and returns a vector: the cross product. The motivation comes from area and orientation.

Given two non-parallel vectors $a$ and $b$ in $\mathbb{R}^3$, they span a parallelogram. The area of this parallelogram is $|a||b|\sin\theta$, and the parallelogram determines a plane with two possible normal directions. The cross product $a \times b$ encodes both: it is the vector perpendicular to the plane of $a$ and $b$, with magnitude equal to the area, and direction determined by the right-hand rule.

[definition:Cross Product]
For $a, b \in \mathbb{R}^3$, the cross product is the vector
\begin{align*} a \times b = (a_2 b_3 - a_3 b_2, \; a_3 b_1 - a_1 b_3, \; a_1 b_2 - a_2 b_1). \end{align*}
[/definition]

The cross product is antisymmetric: $a \times b = -b \times a$. In particular, $a \times a = \mathbf{0}$. This antisymmetry reflects the fact that swapping $a$ and $b$ reverses the orientation of the parallelogram, flipping the normal direction.

A critical property that distinguishes the cross product from ordinary multiplication is non-associativity: in general, $a \times (b \times c) \neq (a \times b) \times c$.

[example:Failure of Associativity]
Let $e_1 = (1,0,0)$, $e_2 = (0,1,0)$, $e_3 = (0,0,1)$ be the standard basis. Then:
\begin{align*} e_1 \times (e_1 \times e_2) &= e_1 \times e_3 = -e_2, \\ (e_1 \times e_1) \times e_2 &= \mathbf{0} \times e_2 = \mathbf{0}. \end{align*}
The two expressions are not equal. The cross product is not associative, and parentheses cannot be moved freely. This is why the Triple Cross Product Identity is essential — it provides the correct expansion.
[/example]

The Scalar Triple Product and Volume

Just as the cross product captures area, the scalar triple product captures volume. Given three vectors $a, b, c \in \mathbb{R}^3$, the parallelepiped they span has volume equal to the absolute value of the scalar triple product.

[definition:Scalar Triple Product]
For $a, b, c \in \mathbb{R}^3$, the scalar triple product is
\begin{align*} [a, b, c] = a \cdot (b \times c) = \det\begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}. \end{align*}
[/definition]

The determinant representation reveals two important properties: cyclic symmetry ($[a, b, c] = [b, c, a] = [c, a, b]$, since cyclic permutations of rows preserve the determinant up to sign) and the degeneracy criterion ($[a, b, c] = 0$ if and only if the three vectors are coplanar).

The scalar triple product also provides a concrete test for linear independence in $\mathbb{R}^3$: three vectors are linearly independent if and only if $[a, b, c] \neq 0$ — precisely because linear dependence means the vectors are coplanar, which means the parallelepiped they span has zero volume.

Index Notation and the Levi-Civita Symbol

As vector identities become more complex, component-by-component verification becomes tedious and error-prone. Index notation provides a systematic algebraic framework. We write $a_i$ for the $i$th component of $a$, and adopt the Einstein summation convention: repeated indices in a term are summed over $\{1, 2, 3\}$. For example, $a \cdot b = a_i b_i$ (implicit sum over $i$).

The cross product requires a new symbol to handle the antisymmetry.

[definition:Kronecker Delta]
The Kronecker delta $\delta_{ij}$ equals $1$ if $i = j$ and $0$ otherwise.
[/definition]

[definition:Levi-Civita Symbol]
The Levi-Civita symbol (or alternating symbol) $\varepsilon_{ijk}$ is defined by:
\begin{align*} \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i,j,k) \text{ is an even permutation of } (1,2,3), \\ -1 & \text{if } (i,j,k) \text{ is an odd permutation of } (1,2,3), \\ 0 & \text{if any two indices are equal.} \end{cases} \end{align*}
[/definition]

In this notation, the cross product becomes $(a \times b)_i = \varepsilon_{ijk} a_j b_k$, and the scalar triple product is $[a, b, c] = \varepsilon_{ijk} a_i b_j c_k$.

The power of index notation comes from the following identity, which reduces all vector identities to algebraic manipulations with Kronecker deltas. Without it, every vector identity would require a separate component-by-component verification; with it, the proofs become short algebraic calculations that can be carried out mechanically.

[quotetheorem:913]

The identity is the engine behind virtually every vector identity in three dimensions. Without it, proving an identity like the triple cross product formula would require expanding nine components and comparing them one by one — a tedious process that offers no insight into why the identity holds. With the epsilon-delta contraction, the same proof becomes a two-line algebraic manipulation. Its proof is a symmetry argument: both sides share the same antisymmetry properties and vanish unless the free indices form a permutation, so it suffices to verify a single non-trivial case.

[citeproof:913]

As a demonstration of its power, we derive the triple cross product formula. When computing $a \times (b \times c)$, the naive approach of expanding both cross products in components produces a wall of terms. The epsilon-delta identity collapses this to a two-line calculation.

[quotetheorem:914]

The mnemonic "BAC minus CAB" helps remember the formula: $a \times (b \times c) = b(a \cdot c) - c(a \cdot b)$. Notice that the result is a linear combination of $b$ and $c$ — never $a$ — which is geometrically forced: $b \times c$ is perpendicular to both $b$ and $c$, so $a \times (b \times c)$ is perpendicular to $b \times c$, hence lies in the plane of $b$ and $c$. The proof proceeds by writing both cross products in index notation and applying the Epsilon-Delta Identity.

[citeproof:914]

[example:Solving a Vector Equation]
Consider the equation $r + r \times d = c$ for unknown $r$, with constant vectors $d, c$.

Step 1: Extract the component along $d$. Take the dot product with $d$: $r \cdot d + (r \times d) \cdot d = c \cdot d$. Since $r \times d$ is perpendicular to $d$, the second term vanishes, giving $r \cdot d = c \cdot d$.

Step 2: Eliminate the cross product. Take the cross product with $d$: $r \times d + (r \times d) \times d = c \times d$. The original equation gives $r \times d = c - r$. For the double cross product, apply the Triple Cross Product Identity (with the outer vector on the right):
\begin{align*} (r \times d) \times d = (d \cdot r)d - (d \cdot d)r = (c \cdot d)d - |d|^2r, \end{align*}
using the result $r \cdot d = c \cdot d$ from Step 1. Substituting back: $(c - r) + (c \cdot d)d - |d|^2r = c \times d$.

Step 3: Solve. Collecting terms in $r$:
\begin{align*} (1 + |d|^2)r = c + (c \cdot d)d - c \times d, \end{align*}
so $r = \frac{1}{1 + |d|^2}\bigl(c + (c \cdot d)d - c \times d\bigr)$.

This solution is unique because $1 + |d|^2 \geq 1 > 0$. The key techniques were: dotting with $d$ to extract the parallel component, crossing with $d$ and using the BAC–CAB rule to eliminate the cross product.
[/example]

The tools developed in this chapter — dot products, cross products, the Levi-Civita calculus — are specific to $\mathbb{R}^3$. The cross product, for instance, only exists in three dimensions (and, curiously, in seven). But many of the ideas — linear combinations, spanning, independence — apply in any setting where addition and scalar multiplication make sense. To capture this generality, we need the abstract notion of a vector space.

Vector Spaces

Consider two apparently unrelated problems. First: the set of solutions to the homogeneous system $Ax = \mathbf{0}$, where $A$ is a $3 \times 5$ matrix. If $x$ and $y$ are both solutions, then $A(\lambdax + \muy) = \lambda Ax + \mu Ay = \mathbf{0}$, so every linear combination of solutions is again a solution. Second: the set of solutions to the homogeneous ODE $y'' + y = 0$. If $f$ and $g$ are both solutions, then $(\lambda f + \mu g)'' + (\lambda f + \mu g) = \lambda(f'' + f) + \mu(g'' + g) = 0$, so every linear combination of solutions is again a solution. The two proofs are identical in structure — only the notation differs. The first lives in $\mathbb{R}^5$, the second in a space of functions, but the algebraic mechanism (closure under linear combinations) is the same.

This pattern recurs throughout mathematics: polynomial spaces, matrix spaces, and spaces of continuous functions all support the same operations (addition and scalar multiplication) satisfying the same algebraic laws. Rather than re-proving the same structural results in each setting, we abstract the common features into a single definition and develop the theory once.

Definition and First Examples

The following definition captures the minimal algebraic structure shared by $\mathbb{R}^n$, polynomial spaces, and solution spaces of linear systems.

[definition:Real Vector Space]
A real vector space is a set $V$ together with two operations:

  • Addition: $V \times V \to V$, written $(u, v) \mapsto u + v$
  • Scalar multiplication: $\mathbb{R} \times V \to V$, written $(\lambda, v) \mapsto \lambdav$

satisfying the following axioms:

  1. $(V, +)$ is an abelian group: addition is associative, commutative, has an identity element $\mathbf{0}$, and every element has an additive inverse $-v$.
  2. $\lambda(u + v) = \lambdau + \lambdav$ for all $\lambda \in \mathbb{R}$, $u, v \in V$.
  3. $(\lambda + \mu)v = \lambdav + \muv$ for all $\lambda, \mu \in \mathbb{R}$, $v \in V$.
  4. $\lambda(\muv) = (\lambda\mu)v$ for all $\lambda, \mu \in \mathbb{R}$, $v \in V$.
  5. $1v = v$ for all $v \in V$.
    [/definition]

The definition says nothing about distance, angle, or inner product — these are additional structures (studied later in the chapter on inner product spaces) that may or may not be present.

[example:Standard Examples]

  • $\mathbb{R}^n$ with componentwise operations is a vector space.
  • The set $P_n$ of polynomials of degree at most $n$ with real coefficients, under the usual addition and scalar multiplication, is a vector space. The zero vector is the zero polynomial.
  • The set of all $m \times n$ real matrices, with entry-wise addition and scalar multiplication, is a vector space.
  • The set $C([a,b])$ of continuous functions $f: [a,b] \to \mathbb{R}$, with pointwise operations $(f+g)(x) = f(x) + g(x)$ and $(\lambda f)(x) = \lambda f(x)$, is a vector space. This is an infinite-dimensional example — it cannot be spanned by finitely many functions.
    [/example]

Not every set equipped with addition-like and scaling-like operations forms a vector space. The axioms are restrictions, and violating even one can destroy the entire theory.

[example:Failure of Vector Space Axioms]
Consider the set $\mathbb{R}_{>0}$ of positive real numbers with the operations $x \oplus y = xy$ (multiplication as "addition") and $\lambda \odot x = x^\lambda$ (exponentiation as "scalar multiplication"). This is actually a valid vector space: the "zero vector" is $1$ (since $x \cdot 1 = x$), the "additive inverse" of $x$ is $1/x$, and one can verify all axioms.

Now consider the variant where we keep $x \oplus y = xy$ but redefine scalar multiplication as $\lambda \odot x = \lambda x$ (ordinary multiplication). Then axiom 2 fails: $\lambda \odot (x \oplus y) = \lambda(xy)$, but $(\lambda \odot x) \oplus (\lambda \odot y) = (\lambda x)(\lambda y) = \lambda^2 xy \neq \lambda xy$ in general. The failure of a single distributive law prevents this from being a vector space, and results like dimension theory would not apply.
[/example]

Subspaces

Often the interesting vector spaces arise not as new constructions but as subsets of existing spaces that inherit the vector space structure. The solution set of a homogeneous linear system $Ax = \mathbf{0}$ is a subset of $\mathbb{R}^n$ that is closed under addition and scalar multiplication — if $x$ and $y$ are solutions, so is $\lambdax + \muy$.

[definition:Subspace]
A subspace of a vector space $V$ is a nonempty subset $U \subseteq V$ such that for all $u, v \in U$ and all $\lambda, \mu \in \mathbb{R}$, we have $\lambdau + \muv \in U$.
[/definition]

This single condition — closure under all linear combinations — is equivalent to requiring that $U$ is itself a vector space under the inherited operations. It automatically guarantees that $\mathbf{0} \in U$ (take $\lambda = \mu = 0$) and that $-v \in U$ for any $v \in U$ (take $\lambda = -1$, $\mu = 0$).

The subspace condition can fail in subtle ways, and it is worth seeing a concrete failure.

[example:Failure of the Subspace Condition]
In $\mathbb{R}^2$, the set $S = \{(x, y) : x \geq 0\}$ (the right half-plane) is not a subspace: it contains $(1, 0)$ but not $(-1)(1, 0) = (-1, 0)$, so it fails closure under scalar multiplication.

The set $T = \{(x, y) : x^2 + y^2 \leq 1\}$ (the unit disc) is not a subspace: $(1, 0) \in T$ and $(0, 1) \in T$, but $(1, 0) + (0, 1) = (1, 1) \notin T$ since $1^2 + 1^2 = 2 > 1$. Closure under addition fails.

The set $W = \{(x, y) : y = x + 1\}$ is not a subspace: it does not contain $\mathbf{0} = (0, 0)$ since $0 \neq 0 + 1$. This is a line that misses the origin — it is an affine subspace, not a linear one. All genuine subspaces must pass through the origin.
[/example]

Linear Independence, Spanning, and Bases

A central question in any vector space is: what is the smallest set of vectors from which all other vectors can be built by linear combinations? This leads to three interrelated concepts.

[definition:Linear Combination]
A linear combination of vectors $v_1, \ldots, v_r \in V$ is any expression $\lambda_1v_1 + \cdots + \lambda_rv_r$ with $\lambda_i \in \mathbb{R}$.
[/definition]

[definition:Span]
The span of vectors $v_1, \ldots, v_r \in V$ is the set of all linear combinations: $\operatorname{span}\{v_1, \ldots, v_r\} = \{\lambda_1v_1 + \cdots + \lambda_rv_r : \lambda_i \in \mathbb{R}\}$. This is the smallest subspace of $V$ containing all the $v_i$.
[/definition]

If we want our set of "building blocks" to be as small as possible, we need to eliminate redundancy. A vector $v_k$ is redundant if it can already be expressed as a linear combination of the others — equivalently, if removing it does not shrink the span. This motivates:

[definition:Linear Independence]
Vectors $v_1, \ldots, v_r \in V$ are linearly independent if the only solution to
\begin{align*} \lambda_1v_1 + \lambda_2v_2 + \cdots + \lambda_rv_r = \mathbf{0} \end{align*}
is $\lambda_1 = \lambda_2 = \cdots = \lambda_r = 0$. Otherwise, they are linearly dependent.
[/definition]

Equivalently, the vectors are linearly dependent if and only if at least one of them can be written as a linear combination of the others: if $\lambda_k \neq 0$ in a nontrivial relation, then $v_k = -\lambda_k^{-1}\sum_{i \neq k} \lambda_i v_i$.

[example:Linear Dependence from Geometry]
In $\mathbb{R}^3$, two vectors are linearly dependent if and only if they are parallel (one is a scalar multiple of the other). Three vectors are linearly dependent if and only if they are coplanar — and the scalar triple product $[a, b, c] = a \cdot (b \times c)$ provides the test: the vectors are linearly independent if and only if $[a, b, c] \neq 0$.

For instance, $a = (1, 0, 0)$, $b = (0, 1, 0)$, $c = (1, 1, 0)$ are coplanar (all lie in the $xy$-plane), and indeed $c = a + b$. The scalar triple product is $a \cdot (b \times c) = (1,0,0) \cdot (0,0,-1) = 0$.
[/example]

A basis is a set of vectors that is both spanning and linearly independent — the "just right" set of building blocks: enough to reach every vector, but with no redundancy.

[definition:Basis]
A basis for a vector space $V$ is a set $\mathcal{B} = \{e_1, \ldots, e_n\}$ that:

  1. Spans $V$: every $v \in V$ can be written as $v = v_1e_1 + \cdots + v_ne_n$.
  2. Is linearly independent.
    [/definition]

The combination of spanning and independence guarantees that the coefficients $v_1, \ldots, v_n$ are unique — if there were two distinct representations, their difference would give a nontrivial linear relation among the basis vectors, contradicting independence. These unique coefficients are the components of $v$ with respect to $\mathcal{B}$.

Dimension

A vector space can have many different bases. $\mathbb{R}^2$ has the standard basis $\{(1,0), (0,1)\}$ and also $\{(1,1), (1,-1)\}$ and infinitely many others. A natural question is: can different bases have different numbers of elements? If so, the notion of "size" of a vector space would be ambiguous. The Steinitz Exchange Lemma provides the tool to resolve this: if we have a spanning set and a linearly independent set, we can systematically swap elements, showing that the independent set can never be larger than the spanning set.

[quotetheorem:915]

This is a foundational result: it guarantees that the "size" of a vector space is an intrinsic property, independent of which basis we happen to choose. Without it, dimension would be ill-defined, and coordinates would be meaningless. The proof proceeds by the Steinitz exchange argument, systematically replacing basis vectors one at a time.

[citeproof:915]

This theorem justifies the following definition.

[definition:Dimension]
The dimension of a vector space $V$, denoted $\dim V$, is the number of vectors in any basis of $V$. By convention, the zero space $\{\mathbf{0}\}$ has dimension $0$.
[/definition]

Dimension imposes rigid constraints: in an $n$-dimensional space, any set of more than $n$ vectors is linearly dependent, any set of fewer than $n$ vectors cannot span, and any linearly independent set of exactly $n$ vectors is automatically a basis (no need to check spanning separately).

With vector spaces and their dimensions established, the next question is: what are the natural transformations between vector spaces? A rotation of $\mathbb{R}^3$, a differentiation operator on polynomials, and a matrix multiplication $x \mapsto Ax$ all preserve the linear structure — they respect addition and scaling. Understanding these structure-preserving maps is the central problem of linear algebra.

Linear Maps and Matrices

Having established the abstract framework of vector spaces, we now study the natural transformations between them. A rotation in $\mathbb{R}^3$, a differentiation operator on polynomials, and a matrix multiplication $x \mapsto Ax$ all share a common property: they preserve the vector space structure by respecting addition and scalar multiplication. These are the linear maps — the structure-preserving morphisms of vector spaces — and the study of their properties is the central goal of linear algebra.

Definition and Examples

[definition:Linear Map]
A linear map (or linear transformation) $T: V \to W$ between vector spaces $V$ and $W$ is a function satisfying
\begin{align*} T(\lambdax + \muy) = \lambda T(x) + \mu T(y) \end{align*}
for all $x, y \in V$ and all $\lambda, \mu \in \mathbb{R}$.
[/definition]

Setting $\lambda = \mu = 0$ shows that every linear map sends $\mathbf{0}$ to $\mathbf{0}$: $T(\mathbf{0}) = \mathbf{0}$. This immediately rules out many natural functions from being linear.

[example:Non-Linearity Detector]
The function $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x + 1$ is not linear: $f(0) = 1 \neq 0$. More generally, any function of the form $f(x) = Ax + b$ with $b \neq \mathbf{0}$ is affine, not linear — it translates the origin.

The function $g: \mathbb{R}^2 \to \mathbb{R}$ defined by $g(x, y) = xy$ is not linear: $g(1, 1) = 1$ but $g(2, 2) = 4 \neq 2 \cdot g(1, 1)$. Multiplication of coordinates is a bilinear operation, not a linear one.
[/example]

A linear map is completely determined by its values on a basis: if $\{e_1, \ldots, e_n\}$ is a basis for $V$ and we know $T(e_i)$ for each $i$, then for any $v = \sum_i v_i e_i$, linearity forces $T(v) = \sum_i v_i T(e_i)$. This is one of the most important facts in linear algebra: specifying a linear map requires only $n$ choices (the images of basis vectors), not one choice for every vector in $V$.

Kernel, Image, and the Rank-Nullity Theorem

Two subspaces naturally associated with any linear map encode how the map "compresses" and "expands" information.

[definition:Kernel]
The kernel (or null space) of a linear map $T: V \to W$ is
\begin{align*} \ker(T) = \{v \in V : T(v) = \mathbf{0}\}. \end{align*}
[/definition]

[definition:Image]
The image (or range) of $T: V \to W$ is
\begin{align*} \operatorname{Im}(T) = \{T(v) : v \in V\}. \end{align*}
[/definition]

Both are subspaces (the kernel of $V$, the image of $W$) — this follows immediately from linearity: if $T(u) = T(v) = \mathbf{0}$, then $T(\lambdau + \muv) = \lambda T(u) + \mu T(v) = \mathbf{0}$, so the kernel is closed under linear combinations; similarly for the image.

The kernel measures injectivity: $T$ is injective if and only if $\ker(T) = \{\mathbf{0}\}$ (since $T(u) = T(v)$ implies $T(u - v) = \mathbf{0}$). The image measures surjectivity: $T$ is surjective if and only if $\operatorname{Im}(T) = W$.

The dimensions of the kernel and image are given special names: $\operatorname{nullity}(T) = \dim\ker(T)$ and $\operatorname{rank}(T) = \dim\operatorname{Im}(T)$. A natural question is: how do these quantities relate to the dimension of the domain? Can a map have a large kernel and a large image, or must one come at the expense of the other? The answer is a rigid constraint:

[quotetheorem:916]

The rank-nullity theorem provides a complete accounting of dimensions: the $n$ "degrees of freedom" in $V$ split between the kernel (directions that $T$ collapses to zero) and the image (directions that $T$ maps onto). It has an immediate corollary: a linear map $T: V \to W$ between spaces of equal dimension is injective if and only if it is surjective — in finite dimensions, "one-to-one" and "onto" are equivalent for square maps. The proof constructs a basis for $V$ by extending a basis for $\ker(T)$, then shows the images of the extension vectors form a basis for $\operatorname{Im}(T)$.

[citeproof:916]

The theorem's power is predictive: it tells you the dimension of the solution space before you solve the system. Suppose $A$ is a $3 \times 5$ matrix with $\operatorname{rank}(A) = 3$ (the maximum possible, since $A$ has 3 rows). Then the Rank-Nullity Theorem gives $\operatorname{nullity}(A) = 5 - 3 = 2$: the solution space of $Ax = \mathbf{0}$ is 2-dimensional, meaning the general solution involves exactly 2 free parameters. You know this without row-reducing — the dimension alone determines it. Conversely, if $A$ is a $5 \times 3$ matrix with trivial kernel, then $\operatorname{rank}(A) = 3$, so the image is 3-dimensional inside $\mathbb{R}^5$ — the map cannot be surjective, and the system $Ax = b$ has no solution for most choices of $b$.

Matrix Representation

Once bases are chosen for the domain and codomain, a linear map can be represented by an array of numbers — a matrix.

[definition:Matrix of a Linear Map]
Let $T: V \to W$ be linear, with $\dim V = n$ and $\dim W = m$. Choose bases $\{e_1, \ldots, e_n\}$ for $V$ and $\{f_1, \ldots, f_m\}$ for $W$. The matrix of $T$ with respect to these bases is the $m \times n$ matrix $M$ with entries
\begin{align*} M_{ai} \text{ defined by } T(e_i) = \sum_{a=1}^m M_{ai}f_a. \end{align*}
The columns of $M$ are the coordinate vectors of $T(e_1), \ldots, T(e_n)$ in the basis $\{f_a\}$.
[/definition]

With this convention, if $v = \sum_i v_i e_i$ has component column vector $v$, then $T(v)$ has component vector $Mv$, where the matrix-vector product is defined by $(Mv)_a = \sum_i M_{ai} v_i$. The composition of linear maps corresponds to matrix multiplication.

[definition:Matrix Multiplication]
For an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$, the product $AB$ is the $m \times p$ matrix with entries
\begin{align*} (AB)_{ij} = \sum_{k=1}^n A_{ik}B_{kj}. \end{align*}
[/definition]

Matrix multiplication is associative ($(AB)C = A(BC)$) and distributive ($A(B + C) = AB + AC$), but not commutative: in general, $AB \neq BA$, even for square matrices. This reflects the fact that composition of linear maps is not commutative — rotating then reflecting differs from reflecting then rotating.

[example:Geometric Transformations]
The most concrete examples of linear maps on $\mathbb{R}^2$ and $\mathbb{R}^3$ are geometric transformations. Their matrices can be found by determining where the basis vectors go.

Rotation by $\theta$ in $\mathbb{R}^2$: The basis vector $e_1 = (1,0)$ maps to $(\cos\theta, \sin\theta)$ and $e_2 = (0,1)$ maps to $(-\sin\theta, \cos\theta)$. So the matrix is:
\begin{align*} R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \end{align*}

Reflection in a plane (in $\mathbb{R}^3$) with unit normal $\hat{n}$: A vector $v$ decomposes as $v = (v \cdot \hat{n})\hat{n} + (v - (v \cdot \hat{n})\hat{n})$. Reflection negates the normal component and preserves the tangential one, giving $H(v) = v - 2(v \cdot \hat{n})\hat{n}$. In index notation: $H_{ij} = \delta_{ij} - 2n_i n_j$.

Non-commutativity: Let $R = MATHENV6nnu96P23END$ (rotation by $90°$) and $S = MATHENV6nnu96P24END$ (reflection in the $x$-axis). Then:
\begin{align*} RS = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad SR = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}. \end{align*}
These are different transformations: $RS$ is reflection in $y = x$, while $SR$ is reflection in $y = -x$. The order of operations matters.
[/example]

The rank-nullity theorem tells us that a square matrix $A$ is invertible if and only if $\ker(A) = \{\mathbf{0}\}$, but checking this requires row-reducing $A$ — a computation that grows with the matrix size. Is there a single number that can detect invertibility directly? This question leads to the determinant.

Determinants

In the previous chapter, we saw that a linear map is invertible if and only if it is both injective and surjective, or equivalently, if and only if its kernel is trivial. But how do we test invertibility without computing the kernel? Every square matrix $A$ has a single numerical invariant — the determinant $\det(A)$ — that answers this question: $A$ is invertible if and only if $\det(A) \neq 0$. The determinant also measures signed volume, detects linear dependence, and governs the solvability of linear systems.

The motivation comes from $2 \times 2$ systems. Consider $a_{11}x + a_{12}y = b_1$, $a_{21}x + a_{22}y = b_2$. Solving by elimination: multiply the first equation by $a_{22}$ and the second by $a_{12}$, then subtract: $(a_{11}a_{22} - a_{12}a_{21})x = a_{22}b_1 - a_{12}b_2$. The expression $a_{11}a_{22} - a_{12}a_{21}$ determines whether we can solve for $x$: if it is nonzero, the system has a unique solution; if it is zero, the system is either inconsistent or has infinitely many solutions. This expression is the $2 \times 2$ determinant.

Definition and Properties

[definition:Determinant]
The determinant of an $n \times n$ matrix $A$ is defined recursively:

  • For $n = 1$: $\det(a_{11}) = a_{11}$.
  • For $n \geq 2$: expanding along the first row,
    \begin{align*} \det(A) = \sum_{j=1}^n (-1)^{1+j} a_{1j} \det(A_{1j}), \end{align*}
    where $A_{1j}$ is the $(n-1) \times (n-1)$ matrix obtained by deleting row $1$ and column $j$.
    [/definition]

Alternatively, using the Levi-Civita symbol: $\det(A) = \varepsilon_{i_1 i_2 \cdots i_n} a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}$ (summed over all permutations). This formula makes the algebraic properties transparent. The most important properties are collected in the following theorem, which we state without proof — the proofs follow from the permutation formula or by induction on the cofactor expansion.

[quotetheorem:917]

Property 5 is particularly powerful: it reduces the determinant of a product to the product of determinants, bypassing the need to compute $AB$ explicitly. Property 6 gives the fundamental connection between determinants and invertibility: the determinant is the single number that tells you whether a linear system has a unique solution. The proofs all follow from the permutation formula $\det(A) = \sum_\sigma \operatorname{sgn}(\sigma) \prod_i a_{i,\sigma(i)}$, which makes each property a statement about sums over the symmetric group.

[citeproof:917]

[example:Geometric Meaning of the Determinant]
For a $2 \times 2$ matrix $A = MATHENV6nnu96P27END$, $\det(A) = ad - bc$. Geometrically, $|\det(A)|$ is the area of the parallelogram spanned by the column vectors $(a, c)$ and $(b, d)$. The sign encodes orientation: $\det(A) > 0$ if the columns form a right-handed pair, $\det(A) < 0$ if left-handed.

For $3 \times 3$ matrices, $|\det(A)|$ is the volume of the parallelepiped spanned by the three column vectors, and the sign encodes the handedness of the triple. When $\det(A) = 0$, the columns are coplanar and the parallelepiped has zero volume — the map $A$ collapses a dimension.

This is why $\det(A) = 0$ indicates non-invertibility: a map that collapses volume to zero cannot be reversed.
[/example]

The determinant tells us whether a matrix is invertible, but it says nothing about how the matrix transforms space — which directions are stretched, compressed, or rotated. To understand the internal geometry of a linear map, we need to decompose it into its simplest components. This leads to eigenvalues and eigenvectors.

Eigenvalues and Diagonalisation

Consider the system of coupled differential equations
\begin{align*} \frac{dx}{dt} = 4x - 2y, \qquad \frac{dy}{dt} = x + y, \end{align*}
modelling, say, two interacting populations. In matrix form: $x' = Ax$ where $A = MATHENV6nnu96P29END$. If $A$ were diagonal — say $A = MATHENV6nnu96P30END$ — the system would decouple into $x' = \lambda_1 x$ and $y' = \lambda_2 y$, with solutions $x(t) = x_0 e^{\lambda_1 t}$ and $y(t) = y_0 e^{\lambda_2 t}$. The coupling makes this impossible as written. But what if we could find a change of coordinates $x = Pu$ that transforms $A$ into a diagonal matrix $D = P^{-1}AP$? Then $u' = Du$, the system decouples, and the solution is immediate.

The same issue arises in computing matrix powers. To find $A^{100}$, direct multiplication requires 99 matrix products. But if $A = PDP^{-1}$ with $D$ diagonal, then $A^{100} = PD^{100}P^{-1}$, and $D^{100}$ is trivial — just raise each diagonal entry to the 100th power.

Both problems reduce to the same question: can we choose a basis in which a linear map acts as simply as possible — just scaling each coordinate independently? The vectors in this special basis are the eigenvectors, the scaling factors are the eigenvalues, and the question of when such a basis exists is the diagonalisation problem.

Eigenvalues and Eigenvectors

[definition:Eigenvalue and Eigenvector]
Let $T: V \to V$ be a linear map on a finite-dimensional vector space $V$. A nonzero vector $v \in V$ is an eigenvector of $T$ with eigenvalue $\lambda$ if
\begin{align*} T(v) = \lambdav. \end{align*}
[/definition]

The requirement that $v \neq \mathbf{0}$ is essential: $T(\mathbf{0}) = \lambda\mathbf{0} = \mathbf{0}$ for every $\lambda$, so the zero vector would trivially satisfy the equation for any scalar, making the concept vacuous.

Geometrically, an eigenvector is a direction that $T$ preserves (up to scaling): $T$ sends $v$ to a scalar multiple of itself, rather than rotating it into a different direction. The eigenvalue $\lambda$ records the scaling factor. If $\lambda > 0$, the eigenvector is stretched (or compressed); if $\lambda < 0$, it is reversed; if $\lambda = 0$, the eigenvector is sent to $\mathbf{0}$ (it lies in the kernel).

Finding Eigenvalues: The Characteristic Polynomial

How do we find eigenvalues? The equation $T(v) = \lambdav$ rearranges to $(T - \lambda I)v = \mathbf{0}$. We need a nonzero solution, which means $T - \lambda I$ must be singular (non-invertible). By the Properties of the Determinant, singularity is equivalent to the determinant vanishing:

[definition:Characteristic Polynomial]
The characteristic polynomial of an $n \times n$ matrix $A$ is
\begin{align*} \chi_A(t) = \det(A - tI). \end{align*}
This is a polynomial of degree $n$ in $t$, with leading term $(-t)^n$.
[/definition]

The connection between eigenvalues and the characteristic polynomial is immediate: $\lambda$ is an eigenvalue if and only if $A - \lambda I$ is singular, which happens precisely when $\chi_A(\lambda) = 0$. This chain of equivalences — from the existence of a nonzero kernel element, through singularity, to the vanishing of the determinant — is what makes the characteristic polynomial so powerful.

[quotetheorem:918]

The theorem reduces the problem of finding eigenvalues to solving a polynomial equation. Over $\mathbb{C}$, the Fundamental Theorem of Algebra guarantees that $\chi_A(t)$ has exactly $n$ roots (counting multiplicity), so every complex matrix has at least one eigenvalue. Over $\mathbb{R}$, some roots may be complex: a rotation matrix in $\mathbb{R}^2$ by angle $\theta \notin \{0, \pi\}$ has eigenvalues $e^{\pm i\theta}$, which are not real — there is no real direction preserved by a nontrivial rotation. The proof is a direct chain of logical equivalences.

[citeproof:918]

[example:A Complete Eigenvalue Computation]
Consider $A = MATHENV6nnu96P33END$. The characteristic polynomial is:
\begin{align*} \chi_A(t) = \det\begin{pmatrix} 4 - t & -2 \\ 1 & 1 - t \end{pmatrix} = (4-t)(1-t) + 2 = t^2 - 5t + 6 = (t-2)(t-3). \end{align*}
The eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 3$.

Finding eigenvectors. For $\lambda_1 = 2$: solve $(A - 2I)v = \mathbf{0}$:
\begin{align*} \begin{pmatrix} 2 & -2 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \implies v_1 = v_2. \end{align*}
Eigenvectors: $v_1 = MATHENV6nnu96P36END$ (and all scalar multiples).

For $\lambda_2 = 3$: solve $(A - 3I)v = \mathbf{0}$:
\begin{align*} \begin{pmatrix} 1 & -2 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \implies v_1 = 2v_2. \end{align*}
Eigenvectors: $v_2 = MATHENV6nnu96P38END$.

Verification: $AMATHENV6nnu96P39END = MATHENV6nnu96P40END = 2MATHENV6nnu96P41END$ and $AMATHENV6nnu96P42END = MATHENV6nnu96P43END = 3MATHENV6nnu96P44END$. $\checkmark$
[/example]

Not every real matrix has real eigenvalues. When the characteristic polynomial has no real roots, the eigenvalues are complex, and no real eigenvectors exist.

[example:Failure of Real Eigenvalues]
The rotation matrix $R(\pi/3) = MATHENV6nnu96P45END$ rotates every vector in $\mathbb{R}^2$ by $60°$. Its characteristic polynomial is
\begin{align*} \chi(t) = t^2 - t + 1, \end{align*}
with discriminant $1 - 4 = -3 < 0$. The roots are $t = \frac{1 \pm i\sqrt{3}}{2} = e^{\pm i\pi/3}$ — complex numbers on the unit circle.

Geometrically, this is inevitable: a rotation by $60°$ sends every nonzero vector to a different direction, so no real direction is preserved. The eigenvalue equation $Rv = \lambdav$ demands a direction that is merely scaled, but rotation doesn't scale — it rotates. Over $\mathbb{C}$, eigenvectors do exist ($v = MATHENV6nnu96P47END$ with $\lambda = e^{i\pi/3}$), but they have complex entries and don't correspond to directions in $\mathbb{R}^2$. This example shows that working over $\mathbb{R}$ alone, diagonalisation can fail for reasons beyond the geometric/algebraic multiplicity gap — the eigenvalues themselves may leave the real numbers.
[/example]

We can now resolve the coupled ODE from the chapter opening. The matrix $A = MATHENV6nnu96P48END$ is the same one we just diagonalised: eigenvalues $\lambda_1 = 2$, $\lambda_2 = 3$, eigenvectors $v_1 = (1,1)^T$, $v_2 = (2,1)^T$. Setting $P = MATHENV6nnu96P49END$ and $x = Pu$, the system $x' = Ax$ transforms to $u' = P^{-1}APu = MATHENV6nnu96P50ENDu$, which decouples into $u_1' = 2u_1$, $u_2' = 3u_2$. The solution is $u_1(t) = c_1 e^{2t}$, $u_2(t) = c_2 e^{3t}$, so
\begin{align*} x(t) = c_1 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 e^{3t}\begin{pmatrix} 2 \\ 1 \end{pmatrix}. \end{align*}
The general solution is a superposition of exponential growth along each eigenvector direction, with the growth rate determined by the eigenvalue. The constants $c_1, c_2$ are fixed by the initial condition. Without diagonalisation, this solution would require the matrix exponential $e^{At}$ — a much heavier computation.

Eigenspaces and Multiplicities

For each eigenvalue $\lambda$, the set of all vectors scaled by $\lambda$ (including the zero vector) forms a subspace.

[definition:Eigenspace]
The eigenspace of $A$ for eigenvalue $\lambda$ is $E_\lambda = \ker(A - \lambda I) = \{v : Av = \lambdav\}$.
[/definition]

Two distinct notions of "how many times" an eigenvalue occurs play a crucial role in diagonalisation.

[definition:Algebraic Multiplicity]
The algebraic multiplicity $M_\lambda$ of an eigenvalue $\lambda$ is its multiplicity as a root of $\chi_A(t)$.
[/definition]

[definition:Geometric Multiplicity]
The geometric multiplicity $m_\lambda$ of an eigenvalue $\lambda$ is $\dim E_\lambda = \dim\ker(A - \lambda I)$.
[/definition]

The algebraic multiplicity counts how many times $\lambda$ appears in the factorisation of the characteristic polynomial; the geometric multiplicity counts how many linearly independent eigenvectors $\lambda$ has. These need not be equal, and the gap between them is the fundamental obstruction to diagonalisation. The relationship between them is governed by the following inequality, whose proof uses a basis adapted to the eigenspace to reveal a block-triangular structure in the matrix.

[quotetheorem:919]

The upper bound is proved by extending a basis for $E_\lambda$ to a basis for the full space: in this basis, $A$ has a block-triangular form with $\lambda I_{m_\lambda}$ in the upper-left corner, so $(\lambda - t)^{m_\lambda}$ divides $\chi_A(t)$, giving $M_\lambda \geq m_\lambda$. The block structure arises because $A$ maps every eigenvector in $E_\lambda$ to a multiple of itself, producing zero entries in the lower-left block.

[citeproof:919]

Diagonalisation

[definition:Diagonalisable Matrix]
An $n \times n$ matrix $A$ is diagonalisable if there exists an invertible matrix $P$ such that $P^{-1}AP = D$ is diagonal. Equivalently, $A$ is diagonalisable if and only if $\mathbb{R}^n$ (or $\mathbb{C}^n$) has a basis consisting entirely of eigenvectors of $A$.
[/definition]

When $A$ is diagonalisable, the columns of $P$ are the eigenvectors and the diagonal entries of $D$ are the corresponding eigenvalues. Every computation simplifies: $A^k = PD^kP^{-1}$, and $D^k$ is trivial (just raise each diagonal entry to the $k$th power).

The first step toward the diagonalisation criterion is showing that eigenvectors from different eigenspaces never interfere with each other. This is not obvious a priori — why should vectors satisfying $Av_i = \lambda_iv_i$ for distinct $\lambda_i$ be linearly independent? The proof uses a clever minimality argument: apply $A$ to a hypothetical linear relation and subtract a multiple of the original to reduce the number of terms.

[quotetheorem:920]

This theorem has an immediate corollary: if $A$ is $n \times n$ and has $n$ distinct eigenvalues, then $A$ is diagonalisable (the $n$ eigenvectors are independent by the theorem, and $n$ independent vectors in an $n$-dimensional space form a basis). But matrices with repeated eigenvalues may or may not be diagonalisable, depending on whether the geometric multiplicity matches the algebraic multiplicity. The proof's strategy of reducing the length of a relation by applying $A$ and subtracting is a technique that appears throughout algebra.

[citeproof:920]

The complete characterisation of diagonalisability follows from combining the independence result with the multiplicity inequality. The question reduces to a simple counting argument: do the eigenspaces collectively supply enough independent vectors to form a basis?

[quotetheorem:921]

The criterion makes it clear what can go wrong: diagonalisation fails when there are "too few" eigenvectors for some eigenvalue — when the eigenspace is smaller than the characteristic polynomial predicts. The proof is a clean dimension-counting argument: the total number of independent eigenvectors is $\sum m_\lambda$, which equals $n = \sum M_\lambda$ if and only if $m_\lambda = M_\lambda$ for every $\lambda$.

[citeproof:921]

[example:A Non-Diagonalisable Matrix]
The matrix $B = MATHENV6nnu96P52END$ has characteristic polynomial $\chi_B(t) = (2 - t)^2$, so $\lambda = 2$ is the only eigenvalue with $M_2 = 2$. Solving $(B - 2I)v = \mathbf{0}$:
\begin{align*} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \implies v_2 = 0. \end{align*}
The eigenspace is $E_2 = \operatorname{span}\{(1, 0)\}$, so $m_2 = 1 < 2 = M_2$. There is only one independent eigenvector, but we need two to form a basis for $\mathbb{R}^2$.

What goes wrong geometrically? The matrix $B$ is the identity plus the "shear" $MATHENV6nnu96P54END$: it fixes the $x$-axis and shifts every point by an amount proportional to its height. No basis of eigenvectors exists because there is no second invariant direction — the shear mixes the $y$-component into the $x$-component.

This raises a natural question: is there a class of matrices for which diagonalisation is guaranteed — where the pathology of deficient eigenspaces cannot occur? The answer, developed below, is that real symmetric (or Hermitian) matrices always have enough eigenvectors, and their eigenvectors are automatically orthogonal.
[/example]

Trace, Determinant, and Eigenvalues

The characteristic polynomial connects the eigenvalues to familiar matrix quantities. Since similar matrices have the same characteristic polynomial (and hence the same eigenvalues), trace and determinant must be expressible purely in terms of the eigenvalues. The following theorem makes this relationship precise.

[quotetheorem:922]

These identities provide a useful consistency check in eigenvalue computations: the sum of eigenvalues must equal the trace (the sum of diagonal entries), and the product must equal the determinant. They also reveal that trace and determinant are properties of the linear map, not just the matrix representing it — any two matrices related by similarity share the same eigenvalues, hence the same trace and determinant.

[citeproof:922]

The Cayley-Hamilton Theorem

A remarkable result states that every matrix satisfies its own characteristic equation. The proof uses the adjugate matrix: the identity $\operatorname{adj}(A - tI) \cdot (A - tI) = \chi_A(t) \cdot I$ holds as a polynomial identity in $t$, and equating coefficients of each power of $t$ then summing with appropriate matrix powers produces a telescoping cancellation.

[quotetheorem:923]

This is emphatically not the trivial statement that $\chi_A(\lambda) = 0$ for eigenvalues $\lambda$: we are substituting the matrix $A$ into the polynomial, not a scalar. The result is a matrix equation, and it holds for every square matrix, diagonalisable or not. One application: it implies that $A^{-1}$ (if it exists) can be written as a polynomial in $A$ of degree at most $n - 1$, without computing the inverse directly. The proof's telescoping argument — expanding the adjugate as a matrix polynomial in $t$, matching coefficients, multiplying by successive powers of $A$, and summing — is a beautiful example of algebraic bookkeeping.

[citeproof:923]

Symmetric and Hermitian Matrices

For general matrices, diagonalisation may fail (as we saw above). But for matrices with a symmetry property — real symmetric matrices or, more generally, complex Hermitian matrices — diagonalisation is guaranteed, with the further luxury that the eigenvectors can be chosen orthonormal.

[definition:Transpose and Hermitian Conjugate]
For an $m \times n$ matrix $M$: the transpose $M^T$ has $(M^T)_{ij} = M_{ji}$; the Hermitian conjugate $M^\dagger$ has $(M^\dagger)_{ij} = \overline{M_{ji}}$.
[/definition]

[definition:Symmetric and Hermitian Matrices]
A square matrix $M$ is symmetric if $M^T = M$ (real case) and Hermitian if $M^\dagger = M$ (complex case). Every real symmetric matrix is Hermitian.
[/definition]

Why should symmetry improve the eigenvalue theory? The key is that Hermitian matrices interact well with the inner product. If $A = A^\dagger$ and $Av = \lambdav$, then $\lambda\langlev, v\rangle = \langlev, Av\rangle = \langle Av, v\rangle = \overline{\lambda}\langlev, v\rangle$, so $\lambda = \overline{\lambda}$ — eigenvalues are real. This is impossible for a general matrix (rotations have complex eigenvalues), but symmetry forces it. The same inner product argument also forces eigenvectors for distinct eigenvalues to be orthogonal, a much stronger condition than mere linear independence.

[quotetheorem:924]

The reality of eigenvalues and orthogonality of eigenvectors are crucial in quantum mechanics (where observables are Hermitian operators) and in optimisation (where real eigenvalues determine the definiteness of quadratic forms). The proof exploits the defining property $A^\dagger = A$ to move $A$ between the two slots of the inner product, then uses the positivity $v^\daggerv > 0$ to draw conclusions.

[citeproof:924]

The culmination is the spectral theorem: every Hermitian matrix is not just diagonalisable, but orthogonally diagonalisable. The proof proceeds by induction on the matrix size: find one eigenvector, show that its orthogonal complement is invariant under $A$ (this is where $A^\dagger = A$ is used a second time), and apply the inductive hypothesis to the restricted operator.

[quotetheorem:925]

This is the strongest possible diagonalisation result: not only does a basis of eigenvectors exist, it can be chosen orthonormal. The inductive argument reveals why Hermitian symmetry is so powerful: it guarantees that the orthogonal complement of any eigenspace is itself invariant under $A$, allowing the matrix to be "peeled apart" one eigenvalue at a time. The Spectral Theorem for Hermitian Matrices is the foundation for principal component analysis, the classification of quadratic forms, quantum mechanics (observables are Hermitian operators), and much more.

[citeproof:925]

The hypothesis $A^\dagger = A$ cannot be dropped. The following example shows what goes wrong without it.

[example:Failure of the Spectral Theorem Without Symmetry]
The matrix $A = MATHENV6nnu96P55END$ is not symmetric ($A^T \neq A$). It has eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 3$ (read from the diagonal, since $A$ is upper triangular), with eigenvectors $v_1 = MATHENV6nnu96P56END$ and $v_2 = MATHENV6nnu96P57END$.

The matrix is diagonalisable (two distinct eigenvalues, so the Diagonalisation Criterion is satisfied). However, the eigenvectors are not orthogonal: $v_1 \cdot v_2 = 1 \neq 0$. No choice of eigenvectors can fix this — the eigenspaces themselves are not perpendicular. Contrast this with the symmetric matrix $MATHENV6nnu96P58END$, which has the same eigenvalues but whose eigenvectors are guaranteed orthogonal by the Spectral Properties of Hermitian Matrices.

The spectral theorem gives us three things that can each fail independently for non-symmetric matrices: eigenvalues may be complex (as in the rotation example above), eigenvectors may not be orthogonal (as here), and the matrix may not be diagonalisable at all (as in the shear example). Hermitian symmetry rules out all three pathologies simultaneously.
[/example]

Quadratic Forms

A major application of the spectral theorem is the classification of quadratic forms — homogeneous degree-2 polynomial functions.

[definition:Quadratic Form]
A quadratic form on $\mathbb{R}^n$ is a function $Q: \mathbb{R}^n \to \mathbb{R}$ of the form $Q(x) = x^T Ax$, where $A$ is a real symmetric $n \times n$ matrix.
[/definition]

By the Spectral Theorem for Hermitian Matrices, there exists an orthogonal matrix $P$ such that $P^T AP = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. In the rotated coordinates $y = P^Tx$:
\begin{align*} Q(x) = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2. \end{align*}
Every quadratic form reduces to a sum of squares. The signs of the eigenvalues determine the geometry: if all $\lambda_i > 0$, the level set $Q = 1$ is an ellipsoid (positive definite); if the signs are mixed, it is a hyperboloid (indefinite).

[example:Classification of a Conic]
Consider $Q(x_1, x_2) = 3x_1^2 - 2x_1 x_2 + 3x_2^2$. The corresponding symmetric matrix is $A = MATHENV6nnu96P60END$. The eigenvalues are $\lambda = 3 \pm 1$, i.e., $\lambda_1 = 2$ and $\lambda_2 = 4$, with orthonormal eigenvectors $u_1 = \frac{1}{\sqrt{2}}(1, 1)^T$ and $u_2 = \frac{1}{\sqrt{2}}(-1, 1)^T$. In the principal coordinate system (rotated by $45°$):
\begin{align*} Q = 2y_1^2 + 4y_2^2. \end{align*}
The level set $Q = 1$ is an ellipse with semi-axes $1/\sqrt{2}$ and $1/2$, tilted $45°$ from the original axes. Both eigenvalues are positive, confirming the form is positive definite — $Q(x) > 0$ for all $x \neq \mathbf{0}$.
[/example]

Attribution Debug Info:

Total segments: 1

Attributed segments: 0

Non-attributed segments: 1

Attribution Summary

admin

Contributions: 2
Sources: create
Last Modified: 2/22/2026