Suppose you need to compute $A^{100}$ for some $3 \times 3$ matrix $A$. Done naively, this requires ninety-nine matrix multiplications, each involving twenty-seven scalar multiplications. That is a great deal of arithmetic, and the cost keeps growing with the exponent. Now suppose instead that $A$ happened to be a diagonal matrix $D = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$. Then $D^{100} = \operatorname{diag}(\lambda_1^{100}, \lambda_2^{100}, \lambda_3^{100})$: three scalar exponentiations and you are done. Diagonalisation is the theory of when and how a general matrix can be reduced to a diagonal one by a change of basis, and it is one of the most powerful tools in linear algebra.
The core insight is this: if you can find a basis of $\mathbb{R}^n$ (or $\mathbb{C}^n$) consisting entirely of eigenvectors of a matrix $A$, then in that basis $A$ acts by simple scalar multiplication on each coordinate. The matrix becomes diagonal, and every computation — powers, exponentials, functions of $A$ — becomes a matter of scalar arithmetic. The central question of diagonalisation is: when does such a basis exist?
[example: Powers of a 2x2 Matrix]
Consider the matrix
\begin{align*}
A &= \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}.
\end{align*}
Computing $A^n$ directly requires tracking off-diagonal entries that grow with $n$. The $(1,2)$-entry of $A^2$ is $3 \cdot 1 + 1 \cdot 2 = 5$; the pattern is not immediately transparent.
Now suppose instead we find the eigenvectors. The characteristic polynomial is
\begin{align*}
\det(A - \lambda I) &= (3 - \lambda)(2 - \lambda) - 0 = (\lambda - 3)(\lambda - 2),
\end{align*}
giving eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$.
For $\lambda_1 = 3$: solve $(A - 3I)v = 0$, i.e., $\begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix} v = 0$, giving $v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
For $\lambda_2 = 2$: solve $(A - 2I)v = 0$, i.e., $\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} v = 0$, giving $v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$.
Form the matrix $P = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}$ whose columns are the eigenvectors. Then
\begin{align*}
P^{-1} &= \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix},
\end{align*}
and computing shows $P^{-1}AP = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix} =: D$. Therefore
\begin{align*}
A^n &= P D^n P^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 3^n & 0 \\ 0 & 2^n \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 3^n & 3^n - 2^n \\ 0 & 2^n \end{pmatrix}.
\end{align*}
The formula is now explicit for all $n \in \mathbb{N}$, and no iteration is required.
[/example]
This example illustrates the general mechanism: find eigenvalues, find eigenvectors, assemble them into an invertible matrix $P$, and conclude that $A = PDP^{-1}$. The remainder of this chapter makes this precise, identifies when it works, and explores the fundamental obstruction when it does not.
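The closed form from the example can be sanity-checked numerically. The sketch below, in plain Python with exact integers, compares it with repeated multiplication; the helper names `matmul`, `power_naive`, and `power_closed_form` are illustrative and not part of any library.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def power_naive(A, n):
    """Compute A^n for n >= 1 by n - 1 successive multiplications."""
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result


def power_closed_form(n):
    """Closed form for A = [[3, 1], [0, 2]] obtained from A^n = P D^n P^{-1}."""
    return [[3**n, 3**n - 2**n],
            [0, 2**n]]


A = [[3, 1], [0, 2]]
for n in (1, 2, 5, 100):
    # Exact integer arithmetic, so equality can be tested directly.
    assert power_naive(A, n) == power_closed_form(n)
```

The naive routine performs $n - 1$ matrix products, while the closed form needs only two scalar powers, which is the saving the example describes.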
## Definition
To define diagonalisability, we first need the notion of eigenvalue and eigenvector.
When we apply a linear map $T: V \to V$ to a vector, the result $T(v)$ is generally a new vector pointing in a different direction. The special nonzero vectors that $T$ merely scales, so that $T(v)$ stays on the line through $v$, are the eigenvectors, and they encode the intrinsic geometry of $T$.
[definition: Eigenvalue and Eigenvector]
Let $V$ be a vector space over a field $\mathbb{F}$ (we work primarily with $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$), and let $T: V \to V$ be a linear map. A scalar $\lambda \in \mathbb{F}$ is an **eigenvalue** of $T$ if there exists a nonzero vector $v \in V$ satisfying
\begin{align*}
T(v) &= \lambda v.
\end{align*}
Any such nonzero $v$ is called an **eigenvector** of $T$ corresponding to $\lambda$. The zero vector is explicitly excluded: requiring $v \neq 0$ ensures that every eigenvalue is associated with a genuinely nontrivial direction.
[/definition]
[remark: Matrix vs. Linear Map]
For a matrix $A \in \mathbb{F}^{n \times n}$, the eigenvalue equation is $Av = \lambda v$ with $v \in \mathbb{F}^n$, $v \neq 0$. A matrix and the linear map it represents share the same eigenvalues; eigenvalues are intrinsic to the linear map, not to the choice of basis. Different bases give different matrices representing the same map, but these matrices are similar to one another, so they have the same eigenvalues and are either all diagonalisable or all not.
[/remark]
The set of all eigenvectors for a given eigenvalue, together with the zero vector, forms a subspace. Understanding these subspaces is essential for counting how many linearly independent eigenvectors exist.
[definition: Eigenspace]
Let $T: V \to V$ be a linear map and $\lambda \in \mathbb{F}$ an eigenvalue of $T$. The **eigenspace** of $T$ corresponding to $\lambda$ is
\begin{align*}
E_\lambda &= \ker(T - \lambda \operatorname{Id}) = \{v \in V : T(v) = \lambda v\}.
\end{align*}
[/definition]
This is a nonzero subspace of $V$, since $\lambda$ is an eigenvalue precisely when $E_\lambda \neq \{0\}$.
Now we can state the central definition of the chapter.
The idea behind diagonalisability is that the matrix becomes as simple as possible — a diagonal matrix — once we express it in the right basis. That "right basis" consists entirely of eigenvectors.
[definition: Diagonalisable Matrix]
A matrix $A \in \mathbb{F}^{n \times n}$ is **diagonalisable** over $\mathbb{F}$ if there exists an invertible matrix $P \in \mathbb{F}^{n \times n}$ and a diagonal matrix $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ such that
\begin{align*}
A &= P D P^{-1}.
\end{align*}
[/definition]
Equivalently, $A$ is diagonalisable if and only if $\mathbb{F}^n$ has a basis consisting of eigenvectors of $A$. In this case, the columns of $P$ are the eigenvectors and the diagonal entries of $D$ are the corresponding eigenvalues.
[explanation: Why Eigenvector Basis Gives Diagonalisation]
Let us unpack the equivalence. Suppose $v_1, \ldots, v_n$ is a basis of $\mathbb{F}^n$ with $Av_i = \lambda_i v_i$ for each $i$. Let $P$ be the matrix with $v_i$ as its $i$-th column. Then
\begin{align*}
AP &= A \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} Av_1 & \cdots & Av_n \end{pmatrix} = \begin{pmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{pmatrix} = P D,
\end{align*}
where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Since $v_1, \ldots, v_n$ form a basis, $P$ is invertible (its columns are linearly independent), and left-multiplying by $P^{-1}$ gives $P^{-1}AP = D$.
Conversely, if $P^{-1}AP = D$, then reading the equation $AP = PD$ column by column gives $Av_i = \lambda_i v_i$ for each column $v_i$ of $P$. The invertibility of $P$ forces $v_1, \ldots, v_n$ to be linearly independent, hence a basis of eigenvectors.
[/explanation]
## The Characteristic Polynomial
To find eigenvalues, we need a systematic method. The eigenvalue condition $Av = \lambda v$ with $v \neq 0$ says that the map $A - \lambda I$ is not injective — equivalently, not invertible. Over a finite-dimensional space, a linear map is non-injective if and only if its determinant is zero.
[definition: Characteristic Polynomial]
Let $A \in \mathbb{F}^{n \times n}$. The **characteristic polynomial** of $A$ is
\begin{align*}
p_A(\lambda) &= \det(\lambda I - A).
\end{align*}
This is a polynomial of degree $n$ in $\lambda$, with leading coefficient $1$.
[/definition]
The scalar $\lambda$ is an eigenvalue of $A$ if and only if $p_A(\lambda) = 0$.
[remark: Sign Convention]
Some texts define the characteristic polynomial as $\det(A - \lambda I)$. Both conventions give the same roots; the two polynomials differ by a factor of $(-1)^n$. The convention $\det(\lambda I - A)$ is preferred here because it yields a monic polynomial.
[/remark]
[example: Characteristic Polynomial of a 3x3 Matrix]
Let
\begin{align*}
A &= \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\end{align*}
The characteristic polynomial is
\begin{align*}
p_A(\lambda) &= \det(\lambda I - A) = \det \begin{pmatrix} \lambda - 2 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 0 & 0 & \lambda - 3 \end{pmatrix}.
\end{align*}
Since the matrix is lower triangular, its determinant is the product of its diagonal entries:
\begin{align*}
p_A(\lambda) &= (\lambda - 2)^2 (\lambda - 3).
\end{align*}
So the eigenvalues are $\lambda = 2$ (with multiplicity $2$) and $\lambda = 3$ (with multiplicity $1$). We will return to this matrix when we discuss algebraic and geometric multiplicities.
[/example]
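For larger matrices, expanding the determinant by hand becomes tedious. One way to obtain the coefficients of $p_A$ mechanically is the Faddeev-LeVerrier recursion. The text does not use this method; it is swapped in here purely as an illustration, in exact rational arithmetic, and the function name `char_poly` is hypothetical.

```python
from fractions import Fraction


def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(lambda*I - A),
    computed with the Faddeev-LeVerrier recursion."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    coeffs = [Fraction(1)]                       # leading coefficient c_n = 1
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        # c_{n-k} = -trace(A M_k) / k
        trace = sum(A[i][t] * M[t][i] for i in range(n) for t in range(n))
        coeffs.append(-trace / k)
    return coeffs


A = [[2, 0, 0],
     [1, 2, 0],
     [0, 0, 3]]
# Expanding (lambda - 2)^2 (lambda - 3) gives lambda^3 - 7 lambda^2 + 16 lambda - 12.
assert char_poly(A) == [1, -7, 16, -12]
```

The recursion only needs matrix products and traces, so it avoids symbolic determinants; it does not factor the polynomial, so the roots still have to be found separately.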
Here is the central obstruction to diagonalisation: a repeated root of $p_A$ does not guarantee a large eigenspace. Consider a matrix whose characteristic polynomial has $\lambda = 2$ as a double root. If the eigenspace $E_2$ is two-dimensional, we can find two linearly independent eigenvectors for $\lambda = 2$, and diagonalisation succeeds. But if $E_2$ is only one-dimensional — if there is just a single independent eigenvector despite the algebraic "weight" of $2$ — then we fall short by exactly one basis vector, and diagonalisation fails. To measure this potential gap precisely, we introduce two separate notions of multiplicity: one counting how many times $\lambda$ appears as a root of $p_A$, and one measuring the actual dimension of the eigenspace.
[definition: Algebraic and Geometric Multiplicity]
Let $\lambda$ be an eigenvalue of $A \in \mathbb{F}^{n \times n}$.
- The **algebraic multiplicity** of $\lambda$, denoted $m_a(\lambda)$, is the multiplicity of $\lambda$ as a root of the characteristic polynomial $p_A$.
- The **geometric multiplicity** of $\lambda$, denoted $m_g(\lambda)$, is the dimension of the eigenspace $E_\lambda = \ker(A - \lambda I)$.
[/definition]
The relationship between these two multiplicities is fundamental.
[quotetheorem:3276]
The lower bound is immediate: $\lambda$ is an eigenvalue, so $E_\lambda$ contains at least one nonzero vector, giving $m_g(\lambda) \ge 1$. For the upper bound, choose a basis $v_1, \ldots, v_k$ of $E_\lambda$ (where $k = m_g(\lambda)$) and extend it to a basis $v_1, \ldots, v_k, u_1, \ldots, u_{n-k}$ of $\mathbb{F}^n$. Since $Av_i = \lambda v_i$, the matrix of $A$ in this basis has the block form $\begin{pmatrix} \lambda I_k & B \\ 0 & C \end{pmatrix}$ for some matrices $B$ and $C$. The characteristic polynomial does not change under a change of basis, so $p_A(t) = (t - \lambda)^k \, p_C(t)$, and $\lambda$ is a root of $p_A$ of multiplicity at least $k$. This forces $m_a(\lambda) \ge k = m_g(\lambda)$.
## Criteria for Diagonalisability
Not every matrix is diagonalisable. Understanding precisely when diagonalisation fails is as important as knowing when it succeeds.
The cleanest sufficient condition is that all eigenvalues are distinct. If $A$ has $n$ distinct eigenvalues, we can always find $n$ linearly independent eigenvectors — one per eigenspace — and diagonalise. But distinct eigenvalues are not necessary: a matrix can have repeated eigenvalues and still be diagonalisable (the identity matrix $I$ has eigenvalue $1$ with algebraic multiplicity $n$, yet it is already diagonal).
[quotetheorem:920]
This theorem immediately gives the following corollary.
[quotetheorem:404]
But the converse fails: a matrix with repeated eigenvalues may or may not be diagonalisable. The precise criterion involves matching algebraic and geometric multiplicities.
[quotetheorem:3277]
[explanation: Why the Criterion Works]
The idea is to count dimensions. The total dimension of $\mathbb{F}^n$ is $n$, which equals $\sum_{i=1}^k m_a(\lambda_i)$ whenever the characteristic polynomial splits into linear factors over $\mathbb{F}$ (as it always does over $\mathbb{C}$). For $A$ to be diagonalisable, we need $n$ linearly independent eigenvectors, one basis vector per dimension. Eigenvectors from different eigenspaces are automatically linearly independent (by the theorem above), so the maximum number of linearly independent eigenvectors we can assemble is $\sum_{i=1}^k m_g(\lambda_i)$. This sum equals $n$ if and only if each $m_g(\lambda_i) = m_a(\lambda_i)$, since we always have $m_g(\lambda_i) \le m_a(\lambda_i)$.
If even one $m_g(\lambda_i) < m_a(\lambda_i)$, then $\sum m_g(\lambda_i) < n$, and we cannot fill out a basis with eigenvectors.
[/explanation]
Let us see both cases explicitly.
[example: A Diagonalisable Matrix with Repeated Eigenvalue]
Consider
\begin{align*}
A &= \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\end{align*}
This matrix is not diagonal — the off-diagonal entries $A_{12} = A_{21} = 1$ couple the first two coordinates — yet we will see it is diagonalisable.
The characteristic polynomial expands as
\begin{align*}
p_A(\lambda) &= (\lambda - 3) \det \begin{pmatrix} \lambda - 2 & -1 \\ -1 & \lambda - 2 \end{pmatrix} = (\lambda - 3)[(\lambda-2)^2 - 1] = (\lambda - 3)(\lambda - 1)(\lambda - 3) = (\lambda - 1)(\lambda - 3)^2,
\end{align*}
so the eigenvalues are $\lambda = 1$ (multiplicity $1$) and $\lambda = 3$ (multiplicity $2$).
For $\lambda = 1$: $A - I = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$. Row-reducing gives $v_3 = 0$ and $v_1 + v_2 = 0$, so $\ker(A - I) = \operatorname{span}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}$, giving $m_g(1) = 1 = m_a(1)$.
For $\lambda = 3$: $A - 3I = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$. Row-reducing gives $v_1 = v_2$, with $v_3$ free, so
\begin{align*}
\ker(A - 3I) &= \operatorname{span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\},
\end{align*}
giving $m_g(3) = 2 = m_a(3)$.
Since all geometric multiplicities match algebraic multiplicities, $A$ is diagonalisable. Setting
\begin{align*}
P &= \begin{pmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix},
\end{align*}
one can verify $P^{-1}AP = D$. The repeated eigenvalue $\lambda = 3$ does not obstruct diagonalisation here because the eigenspace is genuinely two-dimensional.
[/example]
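The verification left to the reader in the example above can be done without inverting $P$: it is enough to check that $AP = PD$ and that $P$ is invertible. A worked version, using only the matrices from the example:
\begin{align*}
AP &= \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}, \\
PD &= \begin{pmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\end{align*}
The two products agree, and $\det P = 1 \cdot 1 - 1 \cdot (-1) = 2 \neq 0$, so $P$ is invertible and $P^{-1}AP = D$.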
[example: A Non-Diagonalisable Matrix]
Return to the matrix from the earlier example:
\begin{align*}
A &= \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\end{align*}
The characteristic polynomial is again $p_A(\lambda) = (\lambda - 2)^2(\lambda - 3)$, so $m_a(2) = 2$.
For $\lambda = 2$:
\begin{align*}
A - 2I &= \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\end{align*}
The system $(A - 2I)v = 0$ gives $v_1 = 0$ (from row 2) and $v_3 = 0$ (from row 3), with $v_2$ free. So $\ker(A - 2I) = \operatorname{span}(e_2)$, giving $m_g(2) = 1 < 2 = m_a(2)$.
Since the geometric multiplicity is strictly less than the algebraic multiplicity for $\lambda = 2$, the matrix $A$ is **not diagonalisable**. The eigenvalue $2$ supplies only one linearly independent eigenvector, but a basis of eigenvectors would need two from it, since the eigenvalue $3$ contributes just one more. The deficiency is $m_a(2) - m_g(2) = 1$, and this one "missing" eigenvector cannot be recovered.
[/example]
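The geometric multiplicities in the two examples above can be computed mechanically from the rank-nullity relation $m_g(\lambda) = n - \operatorname{rank}(A - \lambda I)$. The following is a minimal sketch using Gaussian elimination over the rationals; the function names `rank` and `geometric_multiplicity` are illustrative, and the eigenvalue is assumed to be known exactly.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def geometric_multiplicity(A, lam):
    """dim ker(A - lam*I) = n - rank(A - lam*I), by rank-nullity."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


diagonalisable = [[2, 1, 0], [1, 2, 0], [0, 0, 3]]
defective = [[2, 0, 0], [1, 2, 0], [0, 0, 3]]

assert geometric_multiplicity(diagonalisable, 3) == 2  # matches m_a(3) = 2
assert geometric_multiplicity(defective, 2) == 1       # falls short of m_a(2) = 2
```

Exact fractions are used so that the rank is not affected by rounding; with floating-point entries a tolerance would be needed to decide which pivots count as zero.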
[illustration:diagonalisable-vs-non-diagonalisable]
## Diagonalisation over $\mathbb{R}$ vs. $\mathbb{C}$
A matrix with real entries may fail to be diagonalisable over $\mathbb{R}$ simply because its eigenvalues are not real — yet it becomes diagonalisable once we work over $\mathbb{C}$. This is not a failure of the matrix; it is a limitation of the field.
[example: Rotation Matrix over R and C]
Let
\begin{align*}
A &= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},
\end{align*}
the matrix of counterclockwise rotation by $90^\circ$. The characteristic polynomial is
\begin{align*}
p_A(\lambda) &= \lambda^2 + 1.
\end{align*}
This polynomial has no real roots: $\lambda^2 + 1 > 0$ for all $\lambda \in \mathbb{R}$. So $A$ has no real eigenvalues and is not diagonalisable over $\mathbb{R}$.
Over $\mathbb{C}$, the roots are $\lambda = i$ and $\lambda = -i$. These are distinct, so $A$ is diagonalisable over $\mathbb{C}$.
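For $\lambda = i$: writing $v = \begin{pmatrix} a \\ b \end{pmatrix}$, the equation $(A - iI)v = 0$ reads $\begin{pmatrix} -i & -1 \\ 1 & -i \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = 0$. The first row gives $b = -ia$, so an eigenvector is $v_1 = \begin{pmatrix} 1 \\ -i \end{pmatrix}$.
For $\lambda = -i$: similarly, the first row of $(A + iI)v = 0$ gives $b = ia$, so an eigenvector is $v_2 = \begin{pmatrix} 1 \\ i \end{pmatrix}$.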
With $P = \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}$, one can verify $P^{-1}AP = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$.
[/example]
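The final verification in the rotation example can be carried out with Python's built-in complex numbers. This is a small sketch: `matmul` is an illustrative helper, and the check uses $AP = PD$ together with $\det P \neq 0$ instead of forming $P^{-1}$.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[0, -1], [1, 0]]        # rotation by 90 degrees
P = [[1, 1], [-1j, 1j]]      # columns are the eigenvectors for i and -i
D = [[1j, 0], [0, -1j]]

# All entries are small integer multiples of 1 and i, so the products are exact.
assert matmul(A, P) == matmul(P, D)

det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
assert det_P == 2j           # nonzero, so P is invertible over C
```

No real matrix could play the role of `P` here, which is the point of the example: the eigenvectors only exist once complex entries are allowed.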
The Fundamental Theorem of Algebra guarantees that every polynomial of degree $n$ over $\mathbb{C}$ has exactly $n$ roots (counted with multiplicity). This means the characteristic polynomial of any complex matrix splits completely into linear factors, so the only obstruction to diagonalisation over $\mathbb{C}$ is the gap between geometric and algebraic multiplicities — not the absence of eigenvalues.
[quotetheorem:3278]
Consequently, over $\mathbb{C}$, the diagonalisability criterion reduces to checking that each geometric multiplicity equals the corresponding algebraic multiplicity. Over $\mathbb{R}$, one must first check whether all eigenvalues are real.
## Symmetric Matrices and the Spectral Theorem
There is a large and important class of real matrices that are always diagonalisable over $\mathbb{R}$, with eigenvectors that are mutually orthogonal: the symmetric matrices. This is the content of the real Spectral Theorem, and it is one of the deepest results in finite-dimensional linear algebra.
Symmetry is a strong geometric condition that constrains how a matrix interacts with the inner product.
[definition: Symmetric Matrix]
A matrix $A \in \mathbb{R}^{n \times n}$ is **symmetric** if $A = A^\top$, i.e., $A_{ij} = A_{ji}$ for all $i, j$.
[/definition]
In terms of the standard inner product $\langle v, w \rangle = v^\top w$ on $\mathbb{R}^n$, symmetry means
\begin{align*}
\langle Av, w \rangle &= \langle v, Aw \rangle \quad \text{for all } v, w \in \mathbb{R}^n,
\end{align*}
since $\langle Av, w \rangle = (Av)^\top w = v^\top A^\top w = v^\top Aw = \langle v, Aw \rangle$.
This self-adjoint property has two striking consequences: all eigenvalues are real, and eigenvectors for distinct eigenvalues are orthogonal.
[quotetheorem:3279]
[explanation: Why Symmetry Forces Real Eigenvalues]
Suppose $\lambda \in \mathbb{C}$ is an eigenvalue with eigenvector $v \in \mathbb{C}^n$, $v \neq 0$. We work over $\mathbb{C}$ and use the Hermitian inner product $\langle v, w \rangle = \bar{v}^\top w$.
Since $Av = \lambda v$,
\begin{align*}
\lambda \langle v, v \rangle &= \langle v, \lambda v \rangle = \langle v, Av \rangle.
\end{align*}
The symmetry $A = A^\top$ (with real entries, so $\bar{A} = A$) gives $A^* = \bar{A}^\top = A^\top = A$, so $A$ is Hermitian. Therefore
\begin{align*}
\langle v, Av \rangle &= \langle Av, v \rangle = \langle \lambda v, v \rangle = \bar{\lambda} \langle v, v \rangle.
\end{align*}
Combining: $\lambda \langle v, v \rangle = \bar{\lambda} \langle v, v \rangle$. Since $v \neq 0$, we have $\langle v, v \rangle = |v|^2 > 0$, so $\lambda = \bar{\lambda}$, meaning $\lambda \in \mathbb{R}$.
[/explanation]
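In the $2 \times 2$ case the same conclusion can be read off from the quadratic formula. The following worked computation is only an illustration, with generic real entries $a, b, d$ introduced here for the purpose:
\begin{align*}
A &= \begin{pmatrix} a & b \\ b & d \end{pmatrix}, \quad \det(\lambda I - A) = \lambda^2 - (a + d)\lambda + (ad - b^2), \\
\Delta &= (a + d)^2 - 4(ad - b^2) = (a - d)^2 + 4b^2 \ge 0.
\end{align*}
Since the discriminant $\Delta$ is a sum of squares, it is never negative, so both roots of the characteristic polynomial are real.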
[quotetheorem:3280]
These two facts set the stage for the Spectral Theorem. Ordinary diagonalisation gives $A = PDP^{-1}$ for some invertible $P$, but the columns of $P$ need not be orthogonal — so computing $P^{-1}$ requires a full matrix inversion, and the decomposition does not respect the geometry of $\mathbb{R}^n$. Orthogonal diagonalisability upgrades this: the change-of-basis matrix is orthogonal, meaning $P^{-1} = P^\top$ (a free transpose instead of a costly inversion), and the eigenvectors form an orthonormal basis. This makes projections, spectral decompositions, and quadratic form computations both numerically stable and geometrically transparent.
[definition: Orthogonally Diagonalisable Matrix]
A matrix $A \in \mathbb{R}^{n \times n}$ is **orthogonally diagonalisable** if there exists an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$ (satisfying $Q^\top Q = I$, equivalently $Q^{-1} = Q^\top$) and a diagonal matrix $D$ such that
\begin{align*}
A &= Q D Q^\top.
\end{align*}
[/definition]
The columns of $Q$ form an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
[quotetheorem:925]
[remark: Orthogonal Diagonalisability is Stronger]
Ordinary diagonalisability only asks for an invertible $P$; orthogonal diagonalisability requires $P$ to be orthogonal ($P^{-1} = P^\top$). Not every diagonalisable matrix is orthogonally diagonalisable. For instance, the matrix $A = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$ from the opening example is diagonalisable (distinct eigenvalues) but is not symmetric, and its eigenvectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ are not orthogonal. Among real matrices, the orthogonally diagonalisable ones are precisely the symmetric ones: if $A = QDQ^\top$ with $Q$ orthogonal, then $A^\top = Q D^\top Q^\top = A$.
[/remark]
[example: Spectral Decomposition of a 2x2 Symmetric Matrix]
Let
\begin{align*}
A &= \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.
\end{align*}
The characteristic polynomial is
\begin{align*}
p_A(\lambda) &= (\lambda - 3)^2 - 1 = \lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2),
\end{align*}
giving eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 4$.
For $\lambda_1 = 2$: $(A - 2I)v = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} v = 0$, so $v_1 = -v_2$. The unit eigenvector is $q_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}$.
For $\lambda_2 = 4$: $(A - 4I)v = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} v = 0$, so $v_1 = v_2$. The unit eigenvector is $q_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
Note that $q_1 \cdot q_2 = \frac{1}{2}(1 \cdot 1 + (-1) \cdot 1) = 0$, confirming orthogonality. Set
\begin{align*}
Q &= \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}.
\end{align*}
Then $Q^\top Q = I$ (one verifies: each column has squared length $\frac{1}{2}(1^2 + (\pm 1)^2) = 1$, giving the diagonal entries, and the two columns have dot product $\frac{1}{2}(1 \cdot 1 + (-1) \cdot 1) = 0$, giving the off-diagonal entries), and $A = QDQ^\top$.
The spectral decomposition also gives $A = 2 q_1 q_1^\top + 4 q_2 q_2^\top$, which writes $A$ as a sum of scaled orthogonal projections:
\begin{align*}
A &= 2 \cdot \frac{1}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\begin{pmatrix} 1 & -1 \end{pmatrix} + 4 \cdot \frac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} + \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}. \checkmark
\end{align*}
[/example]
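The rank-one pieces in the example above can also be checked in code. The sketch below avoids square roots by using $q q^\top = \tfrac{1}{2} u u^\top$ for the unnormalised eigenvectors $u$, each of squared length $2$; the helper names `scaled_outer` and `combine` are illustrative.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def scaled_outer(c, u):
    """Return c * u u^T as a list of rows."""
    return [[c * ui * uj for uj in u] for ui in u]


def combine(c1, X, c2, Y):
    """Return the linear combination c1*X + c2*Y of two matrices."""
    return [[c1 * x + c2 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


u1, u2 = [1, -1], [1, 1]          # eigenvectors for 2 and 4, before normalising
half = Fraction(1, 2)
P1 = scaled_outer(half, u1)       # q1 q1^T, projection onto the first eigenline
P2 = scaled_outer(half, u2)       # q2 q2^T, projection onto the second eigenline

assert combine(2, P1, 4, P2) == [[3, 1], [1, 3]]      # A = 2 P1 + 4 P2
assert matmul(P1, P1) == P1 and matmul(P2, P2) == P2  # each piece is a projection
assert matmul(P1, P2) == [[0, 0], [0, 0]]             # the projections are orthogonal
assert combine(1, P1, 1, P2) == [[1, 0], [0, 1]]      # together they resolve the identity
```

The last two checks are what make the pieces orthogonal projections onto complementary eigenlines, not merely rank-one matrices.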
The spectral decomposition $A = \sum_{i=1}^n \lambda_i q_i q_i^\top$ is particularly powerful: it expresses $A$ as a sum of rank-one matrices, each a projection onto an eigendirection scaled by the corresponding eigenvalue.
## Applications: Matrix Functions and Quadratic Forms
One of the most productive consequences of diagonalisation is the ability to define and compute functions of matrices. If $A = PDP^{-1}$, then for any function $f: \mathbb{F} \to \mathbb{F}$, we can define
\begin{align*}
f(A) &= P \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n)) P^{-1},
\end{align*}
provided $f$ is defined at each eigenvalue. This is not merely a formula; it is the correct notion of applying $f$ to a matrix, consistent with the functional calculus.
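As an illustration of this formula, the sketch below applies two functions to the matrix from the opening example, using the $P$, $D$ and $P^{-1}$ found there. The helper name `apply_function` is hypothetical, and exact fractions are used so that the results can be compared with equality.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def apply_function(P, eigenvalues, P_inv, f):
    """Return P diag(f(l_1), ..., f(l_n)) P^{-1}."""
    n = len(eigenvalues)
    fD = [[f(eigenvalues[i]) if i == j else 0 for j in range(n)]
          for i in range(n)]
    return matmul(matmul(P, fD), P_inv)


A = [[3, 1], [0, 2]]
P = [[1, 1], [0, -1]]
P_inv = [[1, 1], [0, -1]]    # this particular P is its own inverse
eigenvalues = [3, 2]

# f(x) = x^2 reproduces the ordinary matrix square.
A_squared = apply_function(P, eigenvalues, P_inv, lambda x: x ** 2)
assert A_squared == matmul(A, A)

# f(x) = 1/x is defined at both eigenvalues and yields the inverse matrix.
A_inverse = apply_function(P, eigenvalues, P_inv, lambda x: Fraction(1, x))
assert matmul(A, A_inverse) == [[1, 0], [0, 1]]
```

Both checks reflect the same fact: for polynomial or rational $f$, the diagonalisation formula agrees with the value obtained by ordinary matrix arithmetic.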
Among all matrix functions, the matrix exponential occupies a special place. The system of linear ODEs $\dot{x} = Ax$ — governing everything from population dynamics to coupled oscillators to circuit networks — has the unique solution $x(t) = e^{tA} x(0)$, where $e^{tA}$ is the matrix exponential. If $A$ is diagonalisable, this exponential can be computed explicitly by exponentiating each eigenvalue separately, turning an operator-theoretic object into a concrete formula.
[definition: Matrix Exponential via Diagonalisation]
If $A \in \mathbb{F}^{n \times n}$ is diagonalisable with $A = PDP^{-1}$ and $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, the **matrix exponential** of $A$ is
\begin{align*}
e^A &:= P \operatorname{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n}) P^{-1}.
\end{align*}
This definition agrees with the series definition $e^A = \sum_{k=0}^\infty A^k / k!$ whenever $A$ is diagonalisable.
[/definition]
[example: Matrix Exponential and Systems of ODEs]
Consider the system of ODEs
\begin{align*}
\frac{d}{dt} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} &= \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}, \quad \begin{pmatrix} x_1(0) \\ x_2(0) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\end{align*}
The solution is $x(t) = e^{tA} x(0)$.
From the opening example, $A = PDP^{-1}$ with
\begin{align*}
P &= \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}, \quad D = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}.
\end{align*}
Therefore
\begin{align*}
e^{tA} &= P \begin{pmatrix} e^{3t} & 0 \\ 0 & e^{2t} \end{pmatrix} P^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} e^{3t} & 0 \\ 0 & e^{2t} \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}.
\end{align*}
Carrying out the multiplication: first $P \operatorname{diag}(e^{3t}, e^{2t}) = \begin{pmatrix} e^{3t} & e^{2t} \\ 0 & -e^{2t} \end{pmatrix}$, then multiplying by $P^{-1}$:
\begin{align*}
e^{tA} &= \begin{pmatrix} e^{3t} & e^{3t} - e^{2t} \\ 0 & e^{2t} \end{pmatrix}.
\end{align*}
The solution is therefore
\begin{align*}
\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} &= e^{tA} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} e^{3t} + (e^{3t} - e^{2t}) \\ e^{2t} \end{pmatrix} = \begin{pmatrix} 2e^{3t} - e^{2t} \\ e^{2t} \end{pmatrix}.
\end{align*}
One can verify: $x_1'(t) = 6e^{3t} - 2e^{2t} = 3(2e^{3t} - e^{2t}) + e^{2t} = 3x_1(t) + x_2(t)$, as required. Also $x_1(0) = 2 - 1 = 1$ and $x_2(0) = 1$, matching the initial condition.
[/example]
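The closed form for $e^{tA}$ in the example above can be compared against the series definition numerically. The sketch below truncates the Taylor series at 40 terms, an arbitrary cut-off that is ample for the small value of $t$ used here; the names `expm_series` and `expm_closed_form` are illustrative, and this is not a robust general-purpose algorithm for the matrix exponential.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def expm_series(A, t, terms=40):
    """Truncated Taylor series: sum of (tA)^k / k! for k = 0, ..., terms - 1."""
    n = len(A)
    tA = [[t * x for x in row] for row in A]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0
    total = [row[:] for row in term]
    for k in range(1, terms):
        # (tA)^k / k! = ((tA)^(k-1) / (k-1)!) * tA / k
        term = [[x / k for x in row] for row in matmul(term, tA)]
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total


def expm_closed_form(t):
    """Closed form from the example for A = [[3, 1], [0, 2]]."""
    e3, e2 = math.exp(3 * t), math.exp(2 * t)
    return [[e3, e3 - e2], [0.0, e2]]


A = [[3, 1], [0, 2]]
t = 0.5
series = expm_series(A, t)
closed = expm_closed_form(t)
assert all(abs(series[i][j] - closed[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```

For large $t$ or matrices with large entries, the truncated series loses accuracy, which is one reason the eigenvalue formula is preferable when a diagonalisation is available.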
A second major application is the classification of quadratic forms. A quadratic form $Q: \mathbb{R}^n \to \mathbb{R}$ is a function of the form $Q(v) = v^\top A v$ for a symmetric matrix $A$. The Spectral Theorem says we can always change to an orthonormal eigenbasis in which the form becomes purely diagonal: $Q(v) = \sum_{i=1}^n \lambda_i y_i^2$, where $y = U^\top v$ are the coordinates of $v$ in the eigenbasis and $U$ is the orthogonal matrix whose columns are orthonormal eigenvectors of $A$.
Why does this classification matter? In multivariable calculus, the second derivative test for a critical point $x^*$ of a smooth function $f: \mathbb{R}^n \to \mathbb{R}$ reduces to the definiteness of the Hessian matrix $H = D^2 f(x^*)$: $f$ has a local minimum at $x^*$ if $H$ is positive definite, a local maximum if $H$ is negative definite, and a saddle point if $H$ is indefinite. In geometry, the sign pattern of eigenvalues governs whether a conic section $Q(v) = c$ is an ellipse, a hyperbola, or a degenerate conic. In mechanics and control theory, when $A$ is symmetric, the equilibrium $x^* = 0$ of $\dot{x} = Ax$ is asymptotically stable exactly when $A$ is negative definite (for a general matrix the criterion is that every eigenvalue has negative real part). These are the stakes: definiteness is the bridge between the spectral theory of symmetric matrices and the qualitative behaviour of real systems.
[definition: Definiteness of a Quadratic Form]
Let $A \in \mathbb{R}^{n \times n}$ be symmetric. The quadratic form $Q(v) = v^\top A v$ (equivalently, the matrix $A$ itself) is called:
- **positive definite** if $Q(v) > 0$ for all $v \neq 0$;
- **positive semidefinite** if $Q(v) \ge 0$ for all $v$;
- **negative definite** if $Q(v) < 0$ for all $v \neq 0$;
- **negative semidefinite** if $Q(v) \le 0$ for all $v$;
- **indefinite** if $Q$ takes both positive and negative values.
[/definition]
The Spectral Theorem makes the connection to eigenvalues transparent.
[quotetheorem:3281]
[explanation: Why Eigenvalues Determine Definiteness]
Let $A = QDQ^\top$ be the spectral decomposition (here $Q$ denotes the orthogonal matrix of eigenvectors, not the quadratic form). For any $v \in \mathbb{R}^n$, set $w = Q^\top v$. Since $Q$ is orthogonal, $|w| = |v|$, and $w$ ranges over all of $\mathbb{R}^n$ as $v$ does. Then
\begin{align*}
v^\top A v &= v^\top Q D Q^\top v = (Q^\top v)^\top D (Q^\top v) = w^\top D w = \sum_{i=1}^n \lambda_i w_i^2.
\end{align*}
Label the eigenvalues so that $\lambda_1$ is the smallest. If $\lambda_1 > 0$ (that is, all $\lambda_i > 0$), then $\sum \lambda_i w_i^2 \ge \lambda_1 |w|^2 = \lambda_1 |v|^2 > 0$ for $v \neq 0$, so $A$ is positive definite. Conversely, if $\lambda_1 \le 0$, taking $v = q_1$ (the unit eigenvector for $\lambda_1$) gives $v^\top A v = \lambda_1 \le 0$, so $A$ is not positive definite. The other cases follow similarly.
[/explanation]
[example: Classifying a Quadratic Form]
Consider the quadratic form $Q(x, y) = 5x^2 + 4xy + 2y^2$. In matrix form:
\begin{align*}
Q(v) &= v^\top A v, \quad A = \begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix}, \quad v = \begin{pmatrix} x \\ y \end{pmatrix}.
\end{align*}
(Note: the coefficient of $xy$ in the quadratic form is $4 = A_{12} + A_{21} = 2 + 2$.)
The eigenvalues satisfy $p_A(\lambda) = (\lambda - 5)(\lambda - 2) - 4 = \lambda^2 - 7\lambda + 6 = (\lambda - 1)(\lambda - 6)$, giving $\lambda_1 = 1 > 0$ and $\lambda_2 = 6 > 0$.
Since both eigenvalues are positive, $A$ is positive definite, and $Q(x, y) > 0$ for all $(x, y) \neq (0, 0)$.
In the eigenbasis, $Q$ becomes $Q = u^2 + 6v^2$ where $(u, v)$ are coordinates along the eigenvectors. The level set $Q = 1$ is the set of points satisfying
\begin{align*}
u^2 + 6v^2 &= 1,
\end{align*}
which we rewrite in standard ellipse form as
\begin{align*}
\frac{u^2}{1^2} + \frac{v^2}{(1/\sqrt{6})^2} &= 1.
\end{align*}
This is an ellipse with semi-axis $1$ along the $u$-direction (the eigenvector for $\lambda_1 = 1$) and semi-axis $1/\sqrt{6}$ along the $v$-direction (the eigenvector for $\lambda_2 = 6$). The larger eigenvalue corresponds to the shorter semi-axis: the quadratic form grows fastest along the eigenvector for $\lambda_2 = 6$, so the level set is most compressed in that direction.
[/example]
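For $2 \times 2$ symmetric matrices the classification in the example above can be automated with the quadratic formula for the eigenvalues. The following is a minimal sketch: the function names and the tolerance `tol`, used to decide when a floating-point eigenvalue counts as zero, are assumptions made for illustration.

```python
import math


def eigenvalues_symmetric_2x2(a, b, d):
    """Eigenvalues of [[a, b], [b, d]], smallest first, via the quadratic formula."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius


def classify(a, b, d, tol=1e-12):
    """Classify v^T A v for A = [[a, b], [b, d]] by the signs of its eigenvalues."""
    lo, hi = eigenvalues_symmetric_2x2(a, b, d)
    if lo > tol:
        return "positive definite"
    if hi < -tol:
        return "negative definite"
    if lo >= -tol:
        # The zero matrix also lands here, although it is negative semidefinite too.
        return "positive semidefinite"
    if hi <= tol:
        return "negative semidefinite"
    return "indefinite"


# Q(x, y) = 5x^2 + 4xy + 2y^2 corresponds to a = 5, b = 2, d = 2.
lo, hi = eigenvalues_symmetric_2x2(5, 2, 2)
assert abs(lo - 1) < 1e-12 and abs(hi - 6) < 1e-12
assert classify(5, 2, 2) == "positive definite"
```

Only the signs of the smallest and largest eigenvalue matter for the classification, which is why the function never needs the eigenvectors.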
## What Comes Next
Diagonalisation is not the end of the story; it is the beginning. Two major generalisations carry the theory further when diagonalisation fails or when one wants a canonical form for all matrices.
When a matrix is not diagonalisable, the obstruction is precisely that some geometric multiplicity falls short of the algebraic multiplicity. In this case, there are not enough eigenvectors, but there are **generalised eigenvectors**: nonzero vectors $v$ satisfying $(A - \lambda I)^k v = 0$ for some $k \ge 1$ (ordinary eigenvectors are the case $k = 1$). These fill out the missing dimensions and assemble into **Jordan blocks**, matrices of the form $\lambda I + N$ where $N$ is a nilpotent shift. Every square matrix over $\mathbb{C}$ is similar to a matrix in **Jordan Normal Form**, a block-diagonal matrix of Jordan blocks. Jordan Normal Form is the definitive canonical form for matrices over $\mathbb{C}$: it reduces every matrix to the simplest possible structure, and diagonalisation corresponds to the special case where every Jordan block has size $1$.
For real matrices without the symmetry hypothesis, a different canonical decomposition is available: the **Singular Value Decomposition (SVD)**. Every $A \in \mathbb{R}^{m \times n}$ can be written as $A = U \Sigma V^\top$ where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma \in \mathbb{R}^{m \times n}$ is a rectangular diagonal matrix of non-negative **singular values**. Unlike diagonalisation, which requires $A$ to be square (and symmetric, for orthogonal diagonalisation), the SVD applies to every matrix. It reveals the geometric action of $A$ as an orthogonal map $V^\top$ (a rotation or reflection), followed by the scaling $\Sigma$, followed by another orthogonal map $U$, and it underlies a vast range of applications from principal component analysis to numerical rank computation.