Every student of linear algebra encounters the dream of diagonalisation: find a basis in which a linear operator acts by simply scaling each coordinate. When that dream is realised, the entire theory becomes transparent — powers of the matrix, exponentials, invariant subspaces all read off at a glance. But the dream fails more often than it succeeds. The rotation matrix
\begin{align*}
R &= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\end{align*}
over $\mathbb{R}$ has characteristic polynomial $\lambda^2 + 1$, which has no real roots, so $R$ admits no real eigenvalues at all. Even over $\mathbb{C}$, the matrix
\begin{align*}
N &= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\end{align*}
has $\lambda = 0$ as a double eigenvalue but only a one-dimensional eigenspace: $\ker N = \operatorname{span}\{e_1\}$. Any attempt to diagonalise $N$ fails because we simply do not have enough linearly independent eigenvectors to form a basis.
The Jordan Normal Form theorem says that over an algebraically closed field — most importantly $\mathbb{C}$ — every linear operator on a finite-dimensional vector space does have a canonical form, just not necessarily a diagonal one. Instead of diagonal matrices, the canonical form consists of **Jordan blocks**: triangular matrices with a single eigenvalue on the diagonal and ones on the superdiagonal. These blocks are as close to diagonal as the structure of the operator allows, and they encode every invariant of the operator up to similarity.
[example: Why Diagonalisation Fails for a Nilpotent Matrix]
Let $V = \mathbb{C}^3$ and let $T: V \to V$ be given by
\begin{align*}
T &= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\lambda^3$, so $\lambda = 0$ is the only eigenvalue with algebraic multiplicity $3$. The eigenspace is $\ker T = \{v : Tv = 0\}$. A vector $v = (a, b, c)^\top$ lies in $\ker T$ only when $Tv = (b, c, 0)^\top = 0$, which forces $b = c = 0$. Thus $\ker T = \operatorname{span}\{e_1\}$ has dimension $1$. The geometric multiplicity (dimension of the eigenspace) is $1$, but the algebraic multiplicity is $3$. There are not enough eigenvectors to span $V$, so $T$ is not diagonalisable.
Notice further that $T^2$ sends $e_3 \mapsto e_1$ and $T^3 = 0$, so the chain $e_3,\ T e_3 = e_2,\ T^2 e_3 = e_1$ forms a natural basis on which $T$ acts by shifting: $T e_3 = e_2$, $T e_2 = e_1$, $T e_1 = 0$. Listing the chain from its eigenvector back to its generator, that is, in the order $(e_1, e_2, e_3)$, the matrix of $T$ is exactly the $3 \times 3$ Jordan block $J_3(0)$, which is therefore the Jordan Normal Form of $T$.
[/example]
This example reveals the key idea: when eigenvectors are scarce, we extend them to chains. The Jordan Normal Form theorem is the precise statement that such chains always exist and together span the entire space.
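The multiplicity count for the $3 \times 3$ shift $T$ above can be checked mechanically. The following is a minimal sketch in plain Python; the helper names `rank` and `matmul` are illustrative choices, not part of the text, and the rank is computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # Look for a nonzero entry in this column at or below row `found`.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


T = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]

n = len(T)
geometric_multiplicity = n - rank(T)  # dim ker T, by rank-nullity
T2 = matmul(T, T)                     # nonzero: sends e_3 to e_1
T3 = matmul(T2, T)                    # the zero matrix

print(geometric_multiplicity, T2, T3)
```

The geometric multiplicity comes out as $1$ against an algebraic multiplicity of $3$, and the powers confirm that the chain starting at $e_3$ has length exactly $3$.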
[example: What Goes Wrong When You Assume Diagonalisability]
Suppose one naively assumes that every matrix over $\mathbb{C}$ is diagonalisable and tries to compute $e^{tA}$ for
\begin{align*}
A &= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $(\lambda - 1)^2$, so $\lambda = 1$ is a double eigenvalue. The eigenspace is $\ker(A - I) = \ker \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \operatorname{span}\{e_1\}$, which is one-dimensional. There is no basis of eigenvectors, so $A$ is not diagonalisable.
If one ignores this and writes $e^{tA} = e^t I$ (incorrectly treating $A$ as the scalar matrix $1 \cdot I$), the "solution" $x(t) = e^t x_0$ to $\dot{x} = Ax$ fails: differentiating gives $\dot{x}(t) = e^t x_0$, but $A x(t) = e^t A x_0$, and $e^t x_0 = e^t A x_0$ would require $x_0 = A x_0$ for all initial data — which is false since $A \neq I$.
The correct formula, obtained from the Jordan Normal Form $A = J_2(1)$, gives $e^{tA} = e^t \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$. For $x_0 = (0, 1)^\top$: $x(t) = e^t (t, 1)^\top$. The first component $x_1(t) = t e^t$ has linear growth in $t$ multiplying the exponential — a signature of a non-trivial Jordan block that diagonalisation completely misses.
[/example]
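The contrast between the naive guess and the correct exponential can also be seen numerically. The sketch below sums the defining power series for $e^{tA}$ (truncated at 30 terms, an arbitrary choice) and compares it with both candidates at $t = 0.5$; the function names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def exp_series(A, t, terms=30):
    """Partial sum of sum_k (tA)^k / k! using `terms` terms."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    term = identity
    for k in range(1, terms):
        # Turn (tA)^(k-1)/(k-1)! into (tA)^k/k!.
        term = [[t * x / k for x in row] for row in matmul(term, A)]
        total = [[s + x for s, x in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


def max_abs_diff(a, b):
    """Largest entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


A = [[1.0, 1.0],
     [0.0, 1.0]]
t = 0.5

series = exp_series(A, t)
closed_form = [[math.exp(t), t * math.exp(t)],
               [0.0, math.exp(t)]]
naive = [[math.exp(t), 0.0],
         [0.0, math.exp(t)]]  # the incorrect guess e^t I

print(max_abs_diff(series, closed_form))  # rounding error only
print(max_abs_diff(series, naive))        # the missing t e^t entry
```

The second difference is the off-diagonal entry $t e^t$, which is exactly the term the diagonal guess drops.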
## Definition and Basic Structure
Before stating the Jordan Normal Form theorem, we need to name its building blocks.
When a linear operator $T$ has an eigenvalue $\lambda$, not all behaviour at $\lambda$ is captured by the eigenspace $\ker(T - \lambda I)$. There may be vectors that are not eigenvectors but become eigenvectors after applying $T - \lambda I$ once, or twice, or finitely many times. This "generalised" structure is what the Jordan block encodes.
[definition: Jordan Block]
Let $\lambda \in \mathbb{C}$ and $m \in \mathbb{N}$. The **Jordan block** of size $m$ with eigenvalue $\lambda$ is the $m \times m$ matrix
\begin{align*}
J_m(\lambda) &= \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & & & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix} \in \mathrm{Mat}_m(\mathbb{C}).
\end{align*}
That is, $(J_m(\lambda))_{ij} = \lambda$ when $i = j$, $(J_m(\lambda))_{ij} = 1$ when $j = i + 1$, and $0$ otherwise.
[/definition]
The Jordan block $J_m(\lambda)$ can be written as $\lambda I_m + N_m$, where $N_m$ is the $m \times m$ nilpotent shift matrix with $1$s on the superdiagonal. This decomposition is fundamental: $\lambda I$ is the scalar part, and $N_m$ is the nilpotent "error" that prevents $J_m(\lambda)$ from being a scalar matrix when $m > 1$.
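A small sketch makes the decomposition $J_m(\lambda) = \lambda I_m + N_m$ concrete. The helpers `jordan_block` and `matpow` are hypothetical names introduced here, and the values $m = 4$, $\lambda = 7$ are arbitrary.

```python
def jordan_block(lam, m):
    """The m x m Jordan block J_m(lam) as a list of rows."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matpow(a, k):
    """a^k for k >= 0 by repeated multiplication."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, a)
    return result


m = 4
N = jordan_block(0, m)  # the nilpotent shift N_m
J = jordan_block(7, m)  # J_m(7) = 7 I + N_m

# Subtracting the scalar part recovers the shift.
scalar_removed = [[J[i][j] - (7 if i == j else 0) for j in range(m)]
                  for i in range(m)]
assert scalar_removed == N

# Each power of N moves the diagonal of ones up by one step,
# so N^(m-1) is nonzero while N^m vanishes.
for k in range(m + 1):
    expected = [[1 if j == i + k else 0 for j in range(m)] for i in range(m)]
    assert matpow(N, k) == expected
```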
[remark: Size-One Jordan Blocks]
A Jordan block of size $1$ is simply $J_1(\lambda) = (\lambda)$, a $1 \times 1$ scalar matrix. An operator is diagonalisable if and only if its Jordan Normal Form consists entirely of $1 \times 1$ blocks, one for each vector in a basis of eigenvectors.
[/remark]
[definition: Jordan Normal Form]
Let $V$ be a finite-dimensional vector space over $\mathbb{C}$ and let $T: V \to V$ be a linear operator. A **Jordan Normal Form** of $T$ is a block-diagonal matrix
\begin{align*}
J &= \begin{pmatrix} J_{m_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{m_k}(\lambda_k) \end{pmatrix}
\end{align*}
where $\lambda_1, \ldots, \lambda_k \in \mathbb{C}$ (not necessarily distinct) and $m_1 + \cdots + m_k = \dim V$, such that $T$ is similar to $J$: there exists an invertible $P$ with $P^{-1}TP = J$.
[/definition]
The scalars $\lambda_i$ are the eigenvalues of $T$, each listed once for every block it carries, and the $m_i$ are the sizes of the corresponding Jordan blocks; the sizes of the blocks at a given eigenvalue add up to its algebraic multiplicity. The Jordan Normal Form is unique up to permutation of the blocks, which is the content of the uniqueness assertion in the main theorem.
[quotetheorem:412]
The proof of this theorem — which we will build toward through generalised eigenspaces and cyclic decomposition — is the central goal of this chapter. We will prove it by understanding the structure of nilpotent operators first, then combining that with the primary decomposition theorem.
## Generalised Eigenspaces and the Primary Decomposition
The failure of $T$ to be diagonalisable stems from the gap between the algebraic multiplicity of an eigenvalue and the dimension of its eigenspace. The correct replacement for the eigenspace is the **generalised eigenspace**, which captures all vectors that are annihilated by some power of $T - \lambda I$.
An eigenspace $\ker(T - \lambda I)$ contains only vectors where $T$ acts exactly by scalar multiplication $\lambda$. But the nilpotent example shows there can be vectors $v$ where $Tv \neq \lambda v$, yet $(T - \lambda I)^2 v = 0$. Such vectors carry information about $T$ at $\lambda$ that the eigenspace misses.
[definition: Generalised Eigenspace]
Let $T: V \to V$ be linear and $\lambda \in \mathbb{C}$ an eigenvalue of $T$. The **generalised eigenspace** of $T$ at $\lambda$ is
\begin{align*}
V_\lambda &= \ker(T - \lambda I)^n,
\end{align*}
where $n = \dim V$.
[/definition]
Equivalently, $V_\lambda = \bigcup_{k=1}^\infty \ker(T - \lambda I)^k$.
The two descriptions agree because the ascending chain of subspaces $\ker(T - \lambda I) \subset \ker(T - \lambda I)^2 \subset \cdots$ must stabilise in a finite-dimensional space: once $\ker(T - \lambda I)^k = \ker(T - \lambda I)^{k+1}$, all subsequent kernels coincide. This stabilisation happens no later than step $n = \dim V$, because every strict inclusion in the chain raises the dimension by at least one.
[explanation: Stabilisation of the Kernel Chain]
Let $W_k = \ker(T - \lambda I)^k$. These form an ascending chain $W_1 \subset W_2 \subset \cdots$ of subspaces of $V$. If $W_k = W_{k+1}$ for some $k$, then $W_m = W_k$ for all $m \geq k$, by induction on $m$. Suppose $W_m = W_k$ for some $m \geq k$ and let $v \in W_{m+1}$. Then $(T - \lambda I)^m((T - \lambda I)v) = 0$, so $(T - \lambda I)v \in W_m = W_k$, hence $(T - \lambda I)^{k+1}v = (T - \lambda I)^k((T - \lambda I)v) = 0$, giving $v \in W_{k+1} = W_k$. Thus $W_{m+1} = W_k$ as well. The chain is therefore strictly increasing up to the first step at which it stalls and constant from then on. Each strict inclusion raises the dimension by at least one, and $\dim W_k \leq n$ for every $k$, so starting from $W_0 = 0$ at most $n$ strict inclusions can occur: the chain has stabilised by step $n$.
[/explanation]
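The stabilisation can be watched on a concrete matrix. The sketch below uses $A = J_3(1) \oplus J_1(1)$, a matrix chosen here purely for illustration, and lists $\dim \ker (A - I)^k$ for $k = 1, \ldots, 5$; the helper names are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
lam = 1
n = len(A)
shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

kernel_dims = []
power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # (A - lam I)^0
for k in range(1, n + 2):
    power = matmul(power, shifted)       # now (A - lam I)^k
    kernel_dims.append(n - rank(power))  # dim ker, by rank-nullity

print(kernel_dims)  # dims for k = 1, ..., 5
```

The dimensions grow strictly until they reach $4$ at $k = 3$ and are constant afterwards, well within the bound $k \leq n = 4$.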
The primary decomposition theorem says that the generalised eigenspaces do exactly what eigenspaces do for diagonalisable operators: they decompose $V$ into a direct sum of invariant subspaces. The difference is that $T$ no longer acts as a scalar on each piece — it acts as a scalar plus a nilpotent operator.
Before stating the primary decomposition, we record the algebraic fact that makes it work: the minimal polynomial factors over $\mathbb{C}$.
[definition: Minimal Polynomial]
The **minimal polynomial** $m_T(\lambda)$ of a linear operator $T: V \to V$ is the monic polynomial of least degree such that $m_T(T) = 0$.
[/definition]
Over $\mathbb{C}$, every nonconstant polynomial factors into linear factors. By the Cayley-Hamilton theorem $\chi_T(T) = 0$, so $m_T(\lambda)$ divides the characteristic polynomial $\chi_T(\lambda) = \det(\lambda I - T)$, which has degree $n$ and factors as $\prod_i (\lambda - \lambda_i)^{a_i}$ where $a_i$ is the algebraic multiplicity of $\lambda_i$. Every eigenvalue is also a root of $m_T$ (apply $m_T(T) = 0$ to an eigenvector), so the minimal polynomial takes the form $m_T(\lambda) = \prod_i (\lambda - \lambda_i)^{r_i}$ with $1 \leq r_i \leq a_i$.
[quotetheorem:411]
The Primary Decomposition Theorem reduces the Jordan Normal Form problem to each generalised eigenspace separately. On $V_{\lambda_i}$, the operator $T - \lambda_i I$ is nilpotent by definition. So it suffices to understand the Jordan Normal Form of a nilpotent operator — this is the core of the entire theory.
[illustration:primary-decomposition]
[illustration:jordan-chain-structure]
## Nilpotent Operators and Jordan Chains
Having reduced to nilpotent operators via the Primary Decomposition Theorem, we now study nilpotent operators directly. A nilpotent operator is one that becomes the zero operator after enough applications — the simplest possible departure from diagonalisability.
Nilpotent operators are poorly handled by eigenspace theory: the only eigenvalue is $0$, but the eigenspace $\ker T$ may be much smaller than $V$. What replaces eigenvectors in the nilpotent setting is the notion of a **Jordan chain**: a sequence of vectors $v, Tv, T^2v, \ldots, T^{m-1}v$ forming a basis for a cyclic invariant subspace.
[definition: Nilpotent Operator]
A linear operator $N: V \to V$ is **nilpotent** if $N^k = 0$ for some $k \in \mathbb{N}$. The smallest such $k$ is the **nilpotency index** of $N$.
[/definition]
[definition: Jordan Chain]
Let $N: V \to V$ be nilpotent. A **Jordan chain** for $N$ of length $m$ starting at $v$ is the sequence
\begin{align*}
v,\; Nv,\; N^2 v,\; \ldots,\; N^{m-1}v,
\end{align*}
where $N^{m-1}v \neq 0$ and $N^m v = 0$. The vector $N^{m-1}v$ at the end of the chain is the eigenvector (the element of $\ker N$), and $v$ is called the **generator** of the chain.
[/definition]
A Jordan chain of length $m$ spans a cyclic subspace $\operatorname{span}\{v, Nv, \ldots, N^{m-1}v\}$ on which $N$ acts as the $m \times m$ nilpotent shift. This is the building block for the Jordan Normal Form: on each Jordan block corresponding to eigenvalue $\lambda$, the operator $T - \lambda I$ acts as such a nilpotent shift.
[quotetheorem:3282]
To see why: suppose $c_0 v + c_1 Nv + \cdots + c_{m-1} N^{m-1}v = 0$. Apply $N^{m-1}$ to both sides: $c_0 N^{m-1}v = 0$, and since $N^{m-1}v \neq 0$, we get $c_0 = 0$. Then apply $N^{m-2}$ to the original equation with $c_0 = 0$: $c_1 N^{m-1}v = 0$, giving $c_1 = 0$. Continuing inductively, all coefficients are zero.
The full Jordan Normal Form theorem for nilpotent operators says that any nilpotent $N$ can be decomposed into a direct sum of cyclic subspaces spanned by Jordan chains.
[quotetheorem:3283]
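The existence part of this theorem is proved by induction on $\dim V$. Since $N$ is nilpotent, $\operatorname{Range}(N)$ is a proper $N$-invariant subspace of $V$ (for $V \neq 0$), so the inductive hypothesis supplies Jordan chains for the restriction of $N$ to it. Each of these chains is lengthened by one step by choosing a preimage of its generator, and the result is completed to a basis of $V$ by adding suitable vectors of $\ker N$; the rank-nullity identity $\dim \ker N + \dim \operatorname{Range}(N) = \dim V$ guarantees that the count comes out right. The uniqueness of the block sizes is established by counting: the number of blocks of size at least $k$ equals $\dim \ker N^k - \dim \ker N^{k-1}$, a quantity determined entirely by the ranks of powers of $N$.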
[example: Jordan Chains for a $4 \times 4$ Nilpotent Matrix]
Let $V = \mathbb{C}^4$ and let $N: V \to V$ be given by
\begin{align*}
N &= \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\end{align*}
We compute: $N e_2 = e_1$, $N e_1 = 0$, $N e_4 = e_3$, $N e_3 = 0$. Therefore:
- The chain starting at $e_2$: $e_2, N e_2 = e_1$ gives a Jordan chain of length $2$.
- The chain starting at $e_4$: $e_4, N e_4 = e_3$ gives a Jordan chain of length $2$.
These two chains span $V$. Listing each chain from its eigenvector to its generator gives the ordered basis $(e_1, e_2, e_3, e_4)$, which is the standard basis, so $N$ is already in Jordan Normal Form:
\begin{align*}
J &= \begin{pmatrix} J_2(0) & 0 \\ 0 & J_2(0) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\end{align*}
To verify: $\ker N = \operatorname{span}\{e_1, e_3\}$ has dimension $2$. The number of Jordan blocks is $\dim \ker N = 2$ and each has size $2$, consistent with $N^2 = 0$ but $N \neq 0$.
[/example]
The relationship between the Jordan block sizes and the dimensions of iterated kernels is worth recording carefully, since it is the key to reading off the Jordan Normal Form from a nilpotent matrix.
[remark: Reading Off Jordan Blocks from Kernel Dimensions]
For a nilpotent operator $N$ with Jordan block sizes $m_1 \geq m_2 \geq \cdots \geq m_r$, let $d_k = \dim \ker N^k$. Then:
- $d_0 = 0$, $d_1 = $ (number of Jordan chains) $ = r$.
- The number of Jordan blocks of size $\geq k$ equals $d_k - d_{k-1}$.
- The number of Jordan blocks of size exactly $k$ equals $(d_k - d_{k-1}) - (d_{k+1} - d_k) = 2d_k - d_{k-1} - d_{k+1}$.
In the example above, $d_0 = 0$, $d_1 = \dim \ker N = 2$, and $d_2 = d_3 = \dim V = 4$, since $N^2 = 0$. Blocks of size $\geq 1$: $d_1 - d_0 = 2$. Blocks of size $\geq 2$: $d_2 - d_1 = 2$. Blocks of size $\geq 3$: $d_3 - d_2 = 0$. So there are $2 - 2 = 0$ blocks of size exactly $1$ and $2 - 0 = 2$ blocks of size exactly $2$. The closed formula agrees: $2d_2 - d_1 - d_3 = 8 - 2 - 4 = 2$, confirming our computation.
[/remark]
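The counting rule in the remark above translates directly into a few lines of code. This is a sketch under the assumption that the kernel dimensions are already known and that the list is long enough to include one value past the point of stabilisation; `block_counts` is a hypothetical name.

```python
def block_counts(kernel_dims):
    """Map each block size k to the number of Jordan blocks of size exactly k.

    kernel_dims[k] must be d_k = dim ker N^k for k = 0, 1, ..., K + 1,
    where the sequence has stabilised by index K (so d_K == d_{K+1}).
    """
    d = kernel_dims
    counts = {}
    for k in range(1, len(d) - 1):
        exactly_k = 2 * d[k] - d[k - 1] - d[k + 1]
        if exactly_k:
            counts[k] = exactly_k
    return counts


# The 4 x 4 example: d_0, d_1, d_2, d_3 = 0, 2, 4, 4.
print(block_counts([0, 2, 4, 4]))

# A second, hypothetical input: d_k = 0, 2, 3, 4, 4.
print(block_counts([0, 2, 3, 4, 4]))
```

The first call reports two blocks of size $2$. The second input corresponds to one block of size $3$ and one of size $1$, which shows that $d_1$ alone (the number of blocks) does not fix the sizes.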
## Computing the Jordan Normal Form
Knowing that the Jordan Normal Form exists is one thing; computing it explicitly for a given matrix is another. The computation proceeds in two stages: first find the eigenvalues and generalised eigenspaces from the characteristic polynomial, then determine the Jordan block structure within each generalised eigenspace from the ranks of iterated matrices.
A natural question is: given a matrix $A$ with repeated eigenvalue $\lambda$, how large are the Jordan blocks for $\lambda$? The number of blocks equals $\dim \ker(A - \lambda I)$ (the geometric multiplicity), and the total size of all blocks equals the algebraic multiplicity — but knowing these two numbers still leaves the block structure ambiguous. How do we know whether we have one large block or several small ones?
The answer is in the rank sequence: the ranks of $(A - \lambda I)^k$ for $k = 1, 2, \ldots$ tell us exactly.
[quotetheorem:3284]
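This is the computational engine of the Jordan theory. Once the eigenvalues are known, the block sizes at each eigenvalue $\lambda$ are determined by the rank drops in the weakly decreasing sequence $r_0 \geq r_1 \geq \cdots$, where $r_k = \operatorname{rank}(A - \lambda I)^k$; the sequence is eventually constant.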
[example: Computing Jordan Normal Form of a $5 \times 5$ Matrix]
Let $A \in \mathrm{Mat}_5(\mathbb{C})$ have characteristic polynomial $\chi_A(\lambda) = (\lambda - 2)^3 (\lambda - 5)^2$, and suppose:
\begin{align*}
\operatorname{rank}(A - 2I)^1 &= 3, \\
\operatorname{rank}(A - 2I)^2 &= 2, \\
\operatorname{rank}(A - 2I)^3 &= 2.
\end{align*}
For the eigenvalue $\lambda = 2$ (algebraic multiplicity $3$): The dimension of $\ker(A - 2I)$ is $5 - 3 = 2$, so there are $2$ Jordan blocks for $\lambda = 2$. The dimension of $\ker(A - 2I)^2$ is $5 - 2 = 3$, which equals the algebraic multiplicity $3$, confirming $\ker(A - 2I)^3 = \ker(A - 2I)^2$.
Blocks of size $\geq 1$: $\dim \ker(A - 2I)^1 - \dim \ker(A - 2I)^0 = 2 - 0 = 2$.
Blocks of size $\geq 2$: $\dim \ker(A - 2I)^2 - \dim \ker(A - 2I)^1 = 3 - 2 = 1$.
Blocks of size $\geq 3$: $3 - 3 = 0$.
So there are $2$ blocks of size $\geq 1$, $1$ block of size $\geq 2$, and $0$ blocks of size $\geq 3$. Blocks of size exactly $1$: $2 - 1 = 1$. Blocks of size exactly $2$: $1 - 0 = 1$. The Jordan blocks for $\lambda = 2$ are $J_1(2)$ and $J_2(2)$.
For the eigenvalue $\lambda = 5$ (algebraic multiplicity $2$): Suppose $\operatorname{rank}(A - 5I)^1 = 3$. Then $\dim \ker(A - 5I) = 5 - 3 = 2$, which already equals the algebraic multiplicity $2$. This means there are $2$ Jordan blocks for $\lambda = 5$, each of size $1$ — the eigenvalue $\lambda = 5$ is semisimple (the restriction $A|_{V_5}$ is diagonalisable).
The Jordan Normal Form of $A$ is therefore:
\begin{align*}
J &= \begin{pmatrix} J_1(2) & & & \\ & J_2(2) & & \\ & & J_1(5) & \\ & & & J_1(5) \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}.
\end{align*}
[/example]
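As a consistency check on the example above, one can confirm that the resulting $J$ reproduces the rank data that was assumed for $A$. The example specifies $A$ only through its ranks, so the sketch below works with $J$ itself, which is legitimate because similar matrices have equal ranks of $(A - \lambda I)^k$. The helper names are illustrative.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def shifted_power_ranks(A, lam, max_power):
    """Ranks of (A - lam I)^k for k = 1, ..., max_power."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    ranks = []
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(max_power):
        power = matmul(power, shifted)
        ranks.append(rank(power))
    return ranks


J = [[2, 0, 0, 0, 0],
     [0, 2, 1, 0, 0],
     [0, 0, 2, 0, 0],
     [0, 0, 0, 5, 0],
     [0, 0, 0, 0, 5]]

print(shifted_power_ranks(J, 2, 3))  # ranks of (J - 2I)^k, k = 1, 2, 3
print(shifted_power_ranks(J, 5, 1))  # rank of J - 5I
```

The ranks at $\lambda = 2$ come out as $3, 2, 2$ and the rank at $\lambda = 5$ as $3$, matching the hypotheses of the example.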
## The Invariance Theorem and Similarity Invariants
The uniqueness part of the Jordan Normal Form theorem is arguably more surprising than the existence part: the block structure is entirely determined by $T$ and does not depend on any choices made during the construction. This makes the Jordan blocks **similarity invariants** — quantities that remain unchanged when you replace $T$ by $PTP^{-1}$ for any invertible $P$.
Two matrices are similar if and only if they have the same Jordan Normal Form. This gives a complete invariant for the similarity classification of matrices, solving the problem of deciding when two matrices represent the same linear operator in different bases.
Two matrices can have the same eigenvalues and yet differ fundamentally — the characteristic polynomial alone does not pin down an operator up to similarity. A stark example: the two matrices
\begin{align*}
A &= \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}
\end{align*}
both have characteristic polynomial $(\lambda - 2)^3$, so they share the single eigenvalue $2$ with algebraic multiplicity $3$. For $A$: the vectors $e_2,\ (A - 2I)e_2 = e_1$ form a Jordan chain of length $2$ at eigenvalue $2$, and $e_3$ is an independent eigenvector, giving Jordan blocks $J_2(2) \oplus J_1(2)$. For $B$: there is a single Jordan chain $e_3, e_2, e_1$ of length $3$, giving the block $J_3(2)$. So $A$ and $B$ are not similar: $\dim \ker(A - 2I) = 2$ while $\dim \ker(B - 2I) = 1$, and their Jordan forms differ. Their minimal polynomials differ as well, being $(\lambda - 2)^2$ for $A$ and $(\lambda - 2)^3$ for $B$. The characteristic polynomial fails to distinguish the two matrices; the geometric multiplicity (equivalently, the number of Jordan blocks) succeeds.
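The difference between $A$ and $B$ also shows up in the powers of $A - 2I$ and $B - 2I$. A short computation, using only the two matrices above:
\begin{align*}
(A - 2I)^2 &= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}^2 = 0, \\
(B - 2I)^2 &= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \neq 0, \\
(B - 2I)^3 &= 0.
\end{align*}
Since $A \neq 2I$, the smallest power of $(\lambda - 2)$ annihilating $A$ is the second, while for $B$ it is the third.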
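What we need is a complete set of invariants: quantities that are preserved under conjugation $T \mapsto PTP^{-1}$ and that together determine the Jordan Normal Form uniquely. The Jordan theory supplies three standard invariants: the characteristic polynomial, the minimal polynomial, and the Jordan block structure. The last of these is complete on its own, and the first two can be read off from it.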
[definition: Similar Matrices]
Two linear operators $T, S: V \to V$ are **similar** if there exists an invertible operator $P: V \to V$ with $S = PTP^{-1}$. Similarity is an equivalence relation on $\mathrm{End}(V)$; matrices representing the same linear operator in different bases are similar.
[/definition]
The classification problem for linear operators is precisely the problem of identifying all similarity classes. The Jordan theory answers this completely: the similarity class of $T$ is determined by a finite set of discrete data. The characteristic polynomial, minimal polynomial, and Jordan block structure are all preserved under similarity — and together they are complete invariants of the similarity class.
The three similarity invariants carry overlapping information at increasing levels of detail. The characteristic polynomial $\chi_T(\lambda) = \prod_i (\lambda - \lambda_i)^{a_i}$ records the eigenvalues and their algebraic multiplicities. The minimal polynomial $m_T(\lambda) = \prod_i (\lambda - \lambda_i)^{r_i}$ records the size of the largest Jordan block at each eigenvalue $\lambda_i$: on a block $J_m(\lambda_i)$ the power $(T - \lambda_i I)^k$ vanishes exactly when $k \geq m$, so $r_i$ is the largest block size at $\lambda_i$. The Jordan Normal Form encodes the complete block structure.
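As a caution on how far the first two invariants reach by themselves, here is a small worked comparison. The pair $M_1, M_2$ below is chosen for illustration and does not appear elsewhere in the text.
\begin{align*}
M_1 &= J_2(0) \oplus J_2(0), & M_2 &= J_2(0) \oplus J_1(0) \oplus J_1(0), \\
\chi_{M_1}(\lambda) &= \lambda^4, & \chi_{M_2}(\lambda) &= \lambda^4, \\
m_{M_1}(\lambda) &= \lambda^2, & m_{M_2}(\lambda) &= \lambda^2, \\
\dim \ker M_1 &= 2, & \dim \ker M_2 &= 3.
\end{align*}
The characteristic and minimal polynomials agree, yet the matrices are not similar because they have different numbers of blocks. The block structure is the invariant that separates them.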
[quotetheorem:3285]
[quotetheorem:3286]
An important consequence is a characterisation of diagonalisable operators.
[quotetheorem:406]
This characterisation is useful in practice: to test whether $A$ is diagonalisable, one checks whether all Jordan blocks have size $1$, which is equivalent to the minimal polynomial being squarefree.
[example: The Minimal Polynomial Detects Non-Diagonalisability]
Consider $A = J_3(0)$, the $3 \times 3$ nilpotent shift. Its characteristic polynomial is $\lambda^3$. Its minimal polynomial is also $\lambda^3$: we have $A \neq 0$, $A^2 \neq 0$ (since $A^2 e_3 = e_1 \neq 0$), but $A^3 = 0$. The minimal polynomial $\lambda^3$ has a repeated root at $\lambda = 0$, confirming $A$ is not diagonalisable.
Now consider $B = J_1(0) \oplus J_1(0) \oplus J_1(0) = 0$, the $3 \times 3$ zero matrix. Its characteristic polynomial is $\lambda^3$, the same as $A$. But its minimal polynomial is $\lambda$: substituting $B$ into $\lambda$ gives $B = 0$, and no monic polynomial of smaller degree works, since the only monic polynomial of degree $0$ is the constant $1$, which evaluates to $I \neq 0$. The minimal polynomial $\lambda$ has no repeated roots, confirming $B$ is diagonalisable; indeed it is already diagonal, with all entries zero.
The characteristic polynomials of $A$ and $B$ are identical, but their minimal polynomials differ, and they are not similar: $B = 0$ is similar only to itself, while $A$ is not similar to $0$.
[/example]
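For a nilpotent matrix the minimal polynomial is $\lambda^k$, where $k$ is the nilpotency index, so the comparison in the example above reduces to finding the first vanishing power. A minimal sketch, with `nilpotency_index` as a hypothetical helper name:

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def nilpotency_index(A):
    """Smallest k >= 1 with A^k = 0, or None if A is not nilpotent.

    An n x n nilpotent matrix satisfies A^n = 0, so n powers suffice.
    """
    n = len(A)
    zero = [[0] * n for _ in range(n)]
    power = A  # holds A^k at the start of iteration k
    for k in range(1, n + 1):
        if power == zero:
            return k
        power = matmul(power, A)
    return None


A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]  # J_3(0)
B = [[0, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]  # the zero matrix

print(nilpotency_index(A))  # exponent of lambda in the minimal polynomial of A
print(nilpotency_index(B))  # exponent of lambda in the minimal polynomial of B
```

An index greater than $1$ signals a repeated root in the minimal polynomial and hence a non-diagonalisable matrix.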
## Applications: Matrix Functions and Exponentials
The Jordan Normal Form is not just a classification tool — it is a powerful computational device. Any function of a matrix that can be defined via power series or polynomial interpolation becomes explicit once the Jordan Normal Form is known.
The most important application in analysis and differential equations is the matrix exponential $e^{tA}$. The need for such an object arises immediately when solving linear ODE systems $\dot{x} = Ax$: in the scalar case, the solution is $x(t) = e^{at} x_0$, and one wants an analogous formula for systems. For a diagonalisable $A = P \operatorname{diag}(\lambda_1, \ldots, \lambda_n) P^{-1}$, the formula $e^{tA} = P \operatorname{diag}(e^{t\lambda_1}, \ldots, e^{t\lambda_n}) P^{-1}$ is natural. But what if $A$ is not diagonalisable? The naive approach of exponentiating each diagonal entry breaks down. Instead, one defines $e^{tA}$ by the power series directly, which converges for every matrix and reduces to the diagonal formula when $A$ is diagonalisable.
[definition: Matrix Exponential]
For $A \in \mathrm{Mat}_n(\mathbb{C})$ and $t \in \mathbb{C}$, the **matrix exponential** is
\begin{align*}
e^{tA} &= \sum_{k=0}^\infty \frac{t^k A^k}{k!}.
\end{align*}
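This series converges absolutely in any submultiplicative matrix norm (and hence in every norm on $\mathrm{Mat}_n(\mathbb{C})$, since all norms on a finite-dimensional space are equivalent): the terms are bounded in norm by $\|tA\|^k / k!$ and $\sum_{k=0}^\infty \|tA\|^k / k! = e^{\|tA\|} < \infty$.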
[/definition]
This object governs the solution to the linear ODE system $\dot{x} = Ax$: the unique solution with $x(0) = x_0$ is $x(t) = e^{tA} x_0$, regardless of whether $A$ is diagonalisable.
The Jordan Normal Form makes $e^{tA}$ fully explicit. If $A = PJP^{-1}$ where $J$ is the Jordan Normal Form, then $e^{tA} = P e^{tJ} P^{-1}$. Since $J$ is block diagonal, $e^{tJ}$ is block diagonal with blocks $e^{t J_{m_i}(\lambda_i)}$. Writing $J_{m}(\lambda) = \lambda I + N$ where $N$ is the nilpotent shift, and using the fact that $\lambda I$ and $N$ commute:
\begin{align*}
e^{t J_m(\lambda)} &= e^{t\lambda I + tN} = e^{t\lambda} e^{tN} = e^{t\lambda} \sum_{k=0}^{m-1} \frac{t^k N^k}{k!},
\end{align*}
where the sum terminates at $k = m-1$ because $N^m = 0$. Since $(N^k)_{ij} = \mathbb{1}_{j = i + k}$, we get:
\begin{align*}
e^{t J_m(\lambda)} &= e^{t\lambda} \begin{pmatrix} 1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{m-1}}{(m-1)!} \\ 0 & 1 & t & \cdots & \frac{t^{m-2}}{(m-2)!} \\ 0 & 0 & 1 & \cdots & \vdots \\ \vdots & & & \ddots & t \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.
\end{align*}
This formula is the source of the well-known phenomenon in ODE theory: solutions of $\dot{x} = Ax$ involve terms of the form $t^k e^{\lambda t}$ (polynomials in $t$ times exponentials), and the degree $k$ is bounded by one less than the size of the largest Jordan block at $\lambda$.
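The closed form for $e^{t J_m(\lambda)}$ can be compared against a truncated power series. The sketch below does this for $m = 3$, $\lambda = -1$, $t = 0.7$, all arbitrary choices made for illustration; the function names are hypothetical and the truncation at 40 terms is an assumption that is ample for these values.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def exp_jordan_block(lam, m, t):
    """Closed form: entry (i, j) is e^{t lam} t^(j-i)/(j-i)! for j >= i, else 0."""
    scale = math.exp(t * lam)
    return [[scale * t ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(m)] for i in range(m)]


def exp_series(A, t, terms=40):
    """Partial sum of sum_k (tA)^k / k! using `terms` terms."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    term = identity
    for k in range(1, terms):
        term = [[t * x / k for x in row] for row in matmul(term, A)]
        total = [[s + x for s, x in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


lam, m, t = -1.0, 3, 0.7
J = [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(m)]
     for i in range(m)]

closed = exp_jordan_block(lam, m, t)
series = exp_series(J, t)
error = max(abs(closed[i][j] - series[i][j]) for i in range(m) for j in range(m))
print(error)  # rounding error only
```

The top-right entry of `closed` is $e^{\lambda t} t^2 / 2$, the highest-degree term $t^{m-1} e^{\lambda t}$ permitted by a block of size $3$.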
[example: Matrix Exponential via Jordan Normal Form]
Let $A = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = J_2(3)$. This is already in Jordan Normal Form with a single block of size $2$ at $\lambda = 3$. Using the formula above:
\begin{align*}
e^{tA} &= e^{3t} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.
\end{align*}
Verification: $e^{tA}$ at $t = 0$ is $I$. The derivative $\frac{d}{dt} e^{tA}|_{t=0}$ should be $A$. Indeed, $\frac{d}{dt}\left(e^{3t}\begin{pmatrix}1 & t \\ 0 & 1\end{pmatrix}\right)|_{t=0} = 3 \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix} + \begin{pmatrix}0 & 1 \\ 0 & 0\end{pmatrix} = \begin{pmatrix}3 & 1 \\ 0 & 3\end{pmatrix} = A$. The solution to $\dot{x} = Ax$ with initial condition $x(0) = x_0$ is $x(t) = e^{tA}x_0 = e^{3t}\begin{pmatrix}1 & t \\ 0 & 1\end{pmatrix}x_0$.
If $x_0 = (0, 1)^\top$, then $x(t) = e^{3t}(t, 1)^\top$: the first component grows as $t e^{3t}$, a polynomial-times-exponential behaviour arising directly from the Jordan block size $2$.
[/example]
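The check in the example above looks at the derivative only at $t = 0$. The same computation confirms the defining property $\frac{d}{dt} e^{tA} = A e^{tA}$ for every $t$:
\begin{align*}
\frac{d}{dt}\left( e^{3t} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} \right) &= e^{3t} \begin{pmatrix} 3 & 3t + 1 \\ 0 & 3 \end{pmatrix}, \\
A \, e^{tA} &= \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} e^{3t} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} = e^{3t} \begin{pmatrix} 3 & 3t + 1 \\ 0 & 3 \end{pmatrix}.
\end{align*}
Both sides agree, so $x(t) = e^{tA} x_0$ does solve $\dot{x} = Ax$ with $x(0) = x_0$.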
More generally, for any function $f$ that is analytic on a neighbourhood of the spectrum of $A$, the Jordan Normal Form gives the formula for $f(A)$ by replacing each Jordan block $J_m(\lambda)$ with the matrix whose $(i,j)$ entry (for $j \geq i$) is $\frac{f^{(j-i)}(\lambda)}{(j-i)!}$. This is the **Jordan functional calculus**.
[quotetheorem:3287]
The upper-triangular structure of $f(J_m(\lambda))$ reflects the Taylor coefficients of $f$ at $\lambda$: the operator $J_m(\lambda)$ "probes" $f$ at $\lambda$ through all derivatives up to order $m-1$.
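For a polynomial $f$ the entry formula $\frac{f^{(j-i)}(\lambda)}{(j-i)!}$ can be tested against direct matrix multiplication. The sketch below takes $f(x) = x^3$ and the block $J_3(2)$, both chosen here for illustration; `f_of_jordan_block` is a hypothetical helper that expects the derivative values to be supplied by the caller.

```python
from fractions import Fraction
from math import factorial


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def f_of_jordan_block(derivatives, m):
    """f(J_m(lam)) from the list [f(lam), f'(lam), ..., f^(m-1)(lam)].

    Entry (i, j) is f^(j-i)(lam) / (j-i)! for j >= i and 0 below the diagonal.
    """
    return [[Fraction(derivatives[j - i], factorial(j - i)) if j >= i else Fraction(0)
             for j in range(m)] for i in range(m)]


lam, m = 2, 3
J = [[lam if i == j else (1 if j == i + 1 else 0) for j in range(m)]
     for i in range(m)]

# f(x) = x^3: f(2) = 8, f'(2) = 3 * 2^2 = 12, f''(2) = 6 * 2 = 12.
via_formula = f_of_jordan_block([8, 12, 12], m)
via_product = matmul(matmul(J, J), J)  # J^3 computed directly

print(via_formula == via_product)
```

Both routes give the upper-triangular matrix with $8$ on the diagonal, $12$ on the first superdiagonal and $6$ in the corner, the Taylor coefficients of $x^3$ at $2$.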
## Invariant Subspaces and Cyclic Decomposition
The Jordan Normal Form reveals a rich structure of invariant subspaces. For each Jordan block $J_m(\lambda)$, the corresponding cyclic subspace $C = \operatorname{span}\{v, Tv, T^2v, \ldots, T^{m-1}v\}$ is $T$-invariant, and within it there is a complete nested chain of smaller invariant subspaces.
[remark: Invariant Flag Inside Each Cyclic Subspace]
For a Jordan block at eigenvalue $\lambda$, write $N = T - \lambda I$ and let $v, Nv, \ldots, N^{m-1}v$ be the corresponding Jordan chain of length $m$. The subspaces
\begin{align*}
0 \subset \operatorname{span}\{N^{m-1}v\} \subset \operatorname{span}\{N^{m-2}v, N^{m-1}v\} \subset \cdots \subset \operatorname{span}\{v, Nv, \ldots, N^{m-1}v\}
\end{align*}
form a complete $T$-invariant flag of length $m$ inside the cyclic subspace. Each member of the flag is spanned by a tail of the Jordan chain and is mapped into the next smaller subspace by $N = T - \lambda I$; it is $T$-invariant because $T = \lambda I + N$. This flag is a hallmark of a non-trivial Jordan block: for a $1 \times 1$ block (an eigenvector), the flag collapses to just $0 \subset \operatorname{span}\{v\}$.
[/remark]
Understanding the invariant subspaces of $T$ is important in many applications, from representation theory to control theory. The Jordan Normal Form gives a complete description of the $T$-invariant subspaces of $V$ in terms of the block structure.
When there is only one Jordan block at each eigenvalue (equivalently, when the geometric multiplicity of every eigenvalue is $1$), the structure is especially clean. What makes this case significant is not merely the sparseness of the block structure, but a striking generating property: a single vector $v$ then suffices to span all of $V$ by iterating $T$. This means the operator's action is completely determined by how it behaves on one starting point — the minimal polynomial and the characteristic polynomial coincide, so $T$ is characterised up to similarity by a single polynomial. This is the maximally non-degenerate situation in Jordan theory: one chain per eigenvalue, no redundancy, and the richest possible connection between the operator and its associated polynomial.
[definition: Cyclic Operator]
A linear operator $T: V \to V$ is **cyclic** (or **non-derogatory**) if there exists a single vector $v \in V$ such that $\{v, Tv, T^2v, \ldots, T^{n-1}v\}$ is a basis for $V$, where $n = \dim V$.
[/definition]
Equivalently, the minimal polynomial of $T$ equals its characteristic polynomial: $m_T = \chi_T$.
[explanation: Why Cyclic Operators Have Only One Jordan Block Per Eigenvalue]
If $T$ is cyclic with cyclic vector $v$, then the vectors $v, Tv, \ldots, T^{n-1}v$ are linearly independent, so no nonzero polynomial $p$ of degree less than $n$ satisfies $p(T)v = 0$. In particular $\deg m_T \geq n$, and since $m_T$ divides $\chi_T$, which has degree $n$, we get $m_T = \chi_T$. Conversely, if $T$ has only one Jordan block per eigenvalue, one can always find a single vector that generates $V$ under $T$: the sum of the generators of the Jordan chains, one per eigenvalue, is such a vector.
The condition $m_T = \chi_T$ is equivalent to having one block per eigenvalue: the exponent of $(\lambda - \lambda_0)$ in $m_T$ is the size of the largest Jordan block at $\lambda_0$, while its exponent in $\chi_T$ is the sum of the sizes of all blocks at $\lambda_0$. The two exponents agree exactly when there is a single block at $\lambda_0$.
[/explanation]
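Whether a given vector is cyclic can be tested by stacking $v, Tv, \ldots, T^{n-1}v$ and computing the rank. The sketch below is illustrative only: the helper names are hypothetical, and the test matrices are $J_3(0)$ and $J_2(2) \oplus J_1(2)$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def matvec(A, v):
    """Matrix-vector product with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_cyclic_vector(A, v):
    """True when v, Av, ..., A^(n-1) v are linearly independent."""
    n = len(A)
    krylov = [v]
    for _ in range(n - 1):
        krylov.append(matvec(A, krylov[-1]))
    return rank(krylov) == n


J3 = [[0, 1, 0],
      [0, 0, 1],
      [0, 0, 0]]
A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 2]]  # J_2(2) + J_1(2): two blocks at the same eigenvalue

print(is_cyclic_vector(J3, [0, 0, 1]))  # e_3 generates the whole chain
print(is_cyclic_vector(J3, [1, 0, 0]))  # e_1 is killed immediately
print(is_cyclic_vector(A, [1, 1, 1]))   # fails: two blocks at eigenvalue 2
```

The last check fails for this particular vector, and in fact it must fail for every vector, since the minimal polynomial of $A$ has degree $2 < 3$.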
[quotetheorem:3288]
The Cyclic Decomposition Theorem is essentially the Jordan Normal Form stated in terms of cyclic subspaces rather than blocks: each $C_i$ corresponds to a Jordan block, with $d_i$ being the block size. The uniqueness of $(d_1, \ldots, d_r)$ is the uniqueness of the Jordan Normal Form.
[remark: Module-Theoretic Perspective]
There is a clean algebraic framework that unifies the Jordan Normal Form theorem with other structure theorems in algebra. A linear operator $T: V \to V$ on a finite-dimensional $k$-vector space makes $V$ into a module over the polynomial ring $k[x]$ by declaring that $x \cdot v := Tv$. Under this identification, $T$-invariant subspaces of $V$ correspond to $k[x]$-submodules. Since $k[x]$ is a principal ideal domain (PID), the structure theorem for finitely generated modules over a PID applies: any finitely generated $k[x]$-module decomposes as a direct sum of cyclic modules $k[x] / (f_i(x))$. When $k = \mathbb{C}$ and $V$ is finite-dimensional, the cyclic summands with $f_i(x) = (x - \lambda)^m$ correspond exactly to the Jordan blocks $J_m(\lambda)$. The Jordan Normal Form theorem is thus a special case of the PID structure theorem — and the Rational Canonical Form (valid over any field, without requiring algebraic closure) is the other special case, where the $f_i$ need not split into linear factors. This perspective makes the uniqueness of the Jordan form a consequence of the uniqueness of the primary decomposition in module theory.
[/remark]