Consider the following situation: you have a linear operator $T: V \to V$ on a finite-dimensional vector space $V$ over a field $k$, and you want to understand its algebraic structure. The characteristic polynomial $\chi_T(\lambda) = \det(\lambda I - T)$ captures some information (its roots are the eigenvalues of $T$), but it has a fundamental flaw as a descriptor: its degree is always $\dim V$, and it records only the algebraic multiplicity of each eigenvalue, saying nothing about the geometric multiplicity or the sizes of the Jordan blocks. Two operators on spaces of the same dimension can have the same characteristic polynomial while behaving completely differently under iteration. Worse, $\chi_T$ does not tell you the simplest algebraic relation that $T$ satisfies.
The question that leads to the minimal polynomial is deceptively simple: what is the monic polynomial of smallest degree that $T$ satisfies? That is, what is the simplest nonzero $p \in k[\lambda]$ such that $p(T) = 0$, the zero operator? This polynomial, called the minimal polynomial of $T$, is the true algebraic fingerprint of the operator. It tells you exactly what polynomial relations $T$ obeys, and no more. It is sensitive to the structure of $T$ in ways the characteristic polynomial is not: a diagonalisable operator has a minimal polynomial with no repeated roots, while the minimal polynomial of a non-trivial Jordan block always has a repeated root. Operators on spaces of wildly different dimensions can share the same minimal polynomial, because the minimal polynomial cares about algebraic behaviour, not dimension.
[example: Motivating Contrast Between Characteristic and Minimal Polynomial]
Let $V = k^4$ and let $T$ be the operator whose matrix in the standard basis is
\begin{align*}
A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\chi_A(\lambda) = (\lambda - 2)^2(\lambda - 3)^2$. Now consider also the operator $B$ on $k^2$ with matrix
\begin{align*}
B = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.
\end{align*}
The characteristic polynomial of $B$ is $\chi_B(\lambda) = (\lambda - 2)(\lambda - 3)$, which is different from $\chi_A$. Yet both $A$ and $B$ are diagonalisable with the same eigenvalues $2$ and $3$. The polynomial $p(\lambda) = (\lambda - 2)(\lambda - 3)$ satisfies $p(A) = 0$ and $p(B) = 0$ — it is the minimal polynomial of both operators. The minimal polynomial sees through the dimension and captures what matters: both operators are diagonalisable with eigenvalues $2$ and $3$, and nothing more is needed to describe their algebraic behaviour.
Now contrast with the operator $C$ on $k^2$ with matrix
\begin{align*}
C = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\chi_C(\lambda) = (\lambda - 2)^2$. Does $p(\lambda) = \lambda - 2$ annihilate $C$? No: $C - 2I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \neq 0$. The minimal polynomial of $C$ is therefore $(\lambda - 2)^2$. The single Jordan block has no room to decouple; the repeated root in the minimal polynomial is an algebraic signature of the non-trivial nilpotent part.
[/example]
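The three checks in this example are easy to reproduce by machine. The following is a minimal sketch in plain Python; the helper names `matmul`, `poly_at`, and `is_zero` are illustrative choices rather than part of any library. Polynomials are written as coefficient lists, highest degree first, and evaluated at a matrix by Horner's rule.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_at(coeffs, A):
    """Evaluate a polynomial (coefficients, highest degree first) at the square matrix A."""
    n = len(A)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)          # Horner step: R <- R*A + c*I
        for i in range(n):
            R[i][i] += c
    return R

def is_zero(M):
    return all(x == 0 for row in M for x in row)

A = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 3]]
B = [[2, 0], [0, 3]]
C = [[2, 1], [0, 2]]

p = [1, -5, 6]                           # (λ - 2)(λ - 3) = λ^2 - 5λ + 6
print(is_zero(poly_at(p, A)))            # True
print(is_zero(poly_at(p, B)))            # True
print(is_zero(poly_at([1, -2], C)))      # False: λ - 2 does not kill C
print(is_zero(poly_at([1, -4, 4], C)))   # True: (λ - 2)^2 does
```

Integer arithmetic keeps every comparison with zero exact, which matters here because the question is whether a matrix vanishes identically.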
## Definition
To define the minimal polynomial precisely, we need to place $T$ inside the polynomial ring and identify which polynomials kill it.
Fix a field $k$ and a finite-dimensional $k$-vector space $V$. If $p(\lambda) = a_n \lambda^n + \cdots + a_1 \lambda + a_0 \in k[\lambda]$ is a polynomial with coefficients in $k$, and $T: V \to V$ is a linear operator, we define $p(T) \in \mathcal{L}(V, V)$ by substituting $T$ for $\lambda$:
\begin{align*}
p(T) = a_n T^n + \cdots + a_1 T + a_0 I,
\end{align*}
where $T^j$ denotes $j$-fold composition of $T$ with itself, and $T^0 = I$ is the identity. The set of polynomials that kill $T$,
\begin{align*}
\operatorname{Ann}(T) := \{ p \in k[\lambda] : p(T) = 0 \},
\end{align*}
is an ideal in $k[\lambda]$. Since $k[\lambda]$ is a principal ideal domain, this ideal is generated by a single polynomial — the one of smallest degree. We normalise it to be monic.
[definition: Minimal Polynomial of a Linear Operator]
Let $V$ be a finite-dimensional vector space over a field $k$, and let $T: V \to V$ be a linear operator. The **minimal polynomial** of $T$, denoted $m_T(\lambda) \in k[\lambda]$, is the unique monic polynomial of smallest positive degree such that $m_T(T) = 0$.
[/definition]
Equivalently, $m_T$ is the monic generator of the ideal $\operatorname{Ann}(T) = \{ p \in k[\lambda] : p(T) = 0 \}$ in $k[\lambda]$.
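The existence of a nonzero annihilating polynomial follows from a dimension count: $\mathcal{L}(V, V)$ has dimension $n^2$, where $n = \dim V$, so the $n^2 + 1$ operators $I, T, T^2, \ldots, T^{n^2}$ are linearly dependent, and any nontrivial dependence relation is a nonzero polynomial that kills $T$. The Cayley–Hamilton theorem, which we state below, sharpens this considerably by exhibiting an annihilating polynomial of degree exactly $n$. The uniqueness of the minimal polynomial follows from the fact that any two generators of a principal ideal in $k[\lambda]$ differ by a unit; the units of $k[\lambda]$ are the nonzero constants, and the only monic one is $1$.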
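One concrete way to find the monic generator of $\operatorname{Ann}(A)$ for a matrix $A$ is to look for the first power $A^d$ that is a linear combination of $I, A, \ldots, A^{d-1}$, treating each matrix as a vector of its entries. The sketch below does this with exact rational arithmetic; the function names `solve_combination` and `minimal_polynomial` are hypothetical helpers written for this illustration, and the test matrix is an arbitrary choice.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve_combination(basis, target):
    """Coefficients c with sum(c[j] * basis[j]) == target, or None if target
    is not in the span. The basis vectors must be linearly independent."""
    m, n = len(basis), len(target)
    rows = [[Fraction(basis[j][i]) for j in range(m)] + [Fraction(target[i])]
            for i in range(n)]
    pivot_row = 0
    for col in range(m):
        pr = next((r for r in range(pivot_row, n) if rows[r][col] != 0), None)
        if pr is None:
            raise ValueError("basis vectors are linearly dependent")
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        piv = rows[pivot_row][col]
        rows[pivot_row] = [x / piv for x in rows[pivot_row]]
        for r in range(n):
            if r != pivot_row and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    if any(rows[r][m] != 0 for r in range(pivot_row, n)):
        return None                      # inconsistent system: not in the span
    return [rows[j][m] for j in range(m)]

def minimal_polynomial(A):
    """Monic minimal polynomial of A as a coefficient list, highest degree first."""
    n = len(A)
    flat = lambda M: [x for row in M for x in row]
    powers = [[[int(i == j) for j in range(n)] for i in range(n)]]   # A^0 = I
    while True:
        nxt = matmul(powers[-1], A)
        c = solve_combination([flat(P) for P in powers], flat(nxt))
        if c is not None:
            # A^d = c_0 I + ... + c_{d-1} A^{d-1}  gives  λ^d - c_{d-1} λ^{d-1} - ... - c_0
            return [Fraction(1)] + [-x for x in reversed(c)]
        powers.append(nxt)

A = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]
print([int(x) for x in minimal_polynomial(A)])   # [1, -7, 16, -12], i.e. (λ - 2)^2 (λ - 3)
```

The loop always terminates because the space of $n \times n$ matrices is finite-dimensional. This brute-force search is meant to mirror the definition, not to be efficient.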
[remark: Minimal Polynomial of a Matrix]
When $T$ is represented by a matrix $A \in M_n(k)$ in some basis, the minimal polynomial of $T$ equals the minimal polynomial of $A$ (defined identically with $A$ in place of $T$). The minimal polynomial does not depend on the choice of basis, since $p(PAP^{-1}) = P\, p(A)\, P^{-1}$ for any invertible $P$.
[/remark]
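The identity $p(PAP^{-1}) = P\,p(A)\,P^{-1}$ behind this remark can be spot-checked on a small case. In the sketch below the matrices `A`, `P`, and the test polynomials are arbitrary illustrative choices, and `P_inv` is written out by hand rather than computed.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_at(coeffs, A):
    """Horner evaluation of a polynomial (highest degree first) at a square matrix."""
    n = len(A)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return R

A = [[2, 0], [0, 3]]
P = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]                  # inverse of P, entered by hand

conj = matmul(matmul(P, A), P_inv)         # P A P^{-1}
print(conj)                                # [[2, 1], [0, 3]]

for p in ([1, -5, 6], [1, 0, 0], [2, -1, 7]):
    lhs = poly_at(p, conj)
    rhs = matmul(matmul(P, poly_at(p, A)), P_inv)
    print(lhs == rhs)                      # True for each polynomial
```

In particular $p(A) = 0$ exactly when $p(PAP^{-1}) = 0$, so the two matrices have the same annihilator ideal and hence the same minimal polynomial.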
The fundamental relationship between the minimal polynomial and the characteristic polynomial is established by the Cayley–Hamilton theorem, which guarantees that the characteristic polynomial always annihilates $T$.
[quotetheorem:407]
An immediate consequence is that the minimal polynomial $m_T$ divides the characteristic polynomial $\chi_T$ in $k[\lambda]$. Since $m_T$ is the generator of $\operatorname{Ann}(T)$ and $\chi_T \in \operatorname{Ann}(T)$, divisibility follows directly.
[quotetheorem:3291]
The divisibility runs in only one direction: $m_T$ divides $\chi_T$, but $\chi_T$ need not equal $m_T$. The motivating example above shows that $m_A(\lambda) = (\lambda - 2)(\lambda - 3)$ while $\chi_A(\lambda) = (\lambda - 2)^2(\lambda - 3)^2$; the characteristic polynomial has higher multiplicity. What is always true is that $m_T$ and $\chi_T$ share the same irreducible factors over $k$ — a fact that will follow from the structure theory we develop next.
[quotetheorem:3292]
This theorem makes precise the sense in which $m_T$ sees every eigenvalue of $T$: no root is hidden from the minimal polynomial.
## Eigenvalues and the Annihilator Ideal
Before developing the structural theory, it is worth understanding concretely what it means for a polynomial to annihilate a linear operator.
If $v \in V$ is an eigenvector of $T$ with eigenvalue $\lambda_0$, then for any polynomial $p$, we have $p(T)v = p(\lambda_0)v$. This is a direct computation:
\begin{align*}
p(T) v &= (a_n T^n + \cdots + a_1 T + a_0 I) v \\
&= a_n T^n v + \cdots + a_1 T v + a_0 v \\
&= a_n \lambda_0^n v + \cdots + a_1 \lambda_0 v + a_0 v \\
&= p(\lambda_0) v.
\end{align*}
If $p(T) = 0$, then $p(T)v = 0$ for all $v$. Applied to an eigenvector $v \neq 0$ with eigenvalue $\lambda_0$, this gives $p(\lambda_0) v = 0$, which forces $p(\lambda_0) = 0$. This is why every eigenvalue of $T$ must be a root of $m_T$.
The converse direction — that every root of $m_T$ is an eigenvalue — is less immediate. Suppose $\lambda_0$ is a root of $m_T$, so $m_T(\lambda) = (\lambda - \lambda_0) q(\lambda)$ for some polynomial $q$ with $\deg q < \deg m_T$. Since $m_T$ is minimal, $q(T) \neq 0$, and there exists $w \in V$ with $q(T)w \neq 0$. Setting $v = q(T)w$, we compute:
\begin{align*}
T v = T(q(T)w) = (Tq(T))w.
\end{align*}
Since $m_T(T) = (T - \lambda_0 I) q(T) = 0$, we get $(T - \lambda_0 I)(q(T)w) = 0$, so $Tv = \lambda_0 v$. Thus $v \neq 0$ is an eigenvector with eigenvalue $\lambda_0$.
This argument shows that the roots of $m_T$ are exactly the eigenvalues of $T$. The theorem above is therefore proved.
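The construction $v = q(T)w$ in this argument is explicit enough to carry out by machine. The sketch below does so for an illustrative matrix whose minimal polynomial is assumed to be $(\lambda - 2)^2(\lambda - 3)$, taking the root $\lambda_0 = 2$ and $q(\lambda) = (\lambda - 2)(\lambda - 3)$. The helper names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def poly_at(coeffs, A):
    """Horner evaluation of a polynomial (highest degree first) at a square matrix."""
    n = len(A)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return R

A = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]
lam0 = 2
q = [1, -5, 6]        # q(λ) = m_A(λ) / (λ - 2), assuming m_A = (λ - 2)^2 (λ - 3)

Q = poly_at(q, A)     # nonzero, because deg q < deg m_A
n = len(A)
standard_basis = [[int(i == j) for i in range(n)] for j in range(n)]

# q(A) != 0, so some standard basis vector w has q(A) w != 0.
w = next(e for e in standard_basis if any(matvec(Q, e)))
v = matvec(Q, w)

print(w, v)                                       # [0, 1, 0] [-1, 0, 0]
print(matvec(A, v) == [lam0 * x for x in v])      # True: v is an eigenvector for 2
```

Searching the standard basis for $w$ suffices because a nonzero matrix has a nonzero column.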
[example: Computing the Minimal Polynomial]
Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be given by the matrix
\begin{align*}
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\chi_A(\lambda) = (\lambda - 1)^3$, so the minimal polynomial must be one of $(\lambda - 1)$, $(\lambda - 1)^2$, or $(\lambda - 1)^3$.
First check $(\lambda - 1)$: is $A - I = 0$? We compute $A - I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$, which is nonzero. So $m_A \neq (\lambda - 1)$.
Next check $(\lambda - 1)^2$: compute $(A - I)^2$.
\begin{align*}
(A - I)^2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \neq 0.
\end{align*}
So $m_A \neq (\lambda - 1)^2$ either.
Check $(\lambda - 1)^3$: compute $(A - I)^3$.
\begin{align*}
(A - I)^3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = 0.
\end{align*}
Therefore $m_A(\lambda) = (\lambda - 1)^3 = \chi_A(\lambda)$. The single Jordan block of size $3$ forces the minimal polynomial to equal the characteristic polynomial. This is a general fact: the minimal polynomial of a single $n \times n$ Jordan block with eigenvalue $\lambda_0$ is always $(\lambda - \lambda_0)^n$.
[/example]
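The general fact about Jordan blocks stated at the end of this example can be checked for small sizes. This is a minimal sketch: `jordan_block` and `annihilating_power` are hypothetical helper names, and the eigenvalue $1$ is an arbitrary choice.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def jordan_block(lam, n):
    """n x n Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(n)]
            for i in range(n)]

def annihilating_power(J, lam):
    """Smallest k >= 1 with (J - lam*I)^k = 0, for J whose only eigenvalue is lam."""
    n = len(J)
    N = [[J[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P, k = N, 1
    while any(x != 0 for row in P for x in row):
        if k >= n:
            raise ValueError("J - lam*I is not nilpotent")
        P, k = matmul(P, N), k + 1
    return k

print([annihilating_power(jordan_block(1, n), 1) for n in range(1, 6)])
# [1, 2, 3, 4, 5]
```

For each size $n$ the smallest exponent is $n$, matching $m(\lambda) = (\lambda - \lambda_0)^n$.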
[example: Misidentifying the Minimal Polynomial — What Goes Wrong]
Here is a concrete illustration of why one cannot simply guess the minimal polynomial without checking. Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ have matrix
\begin{align*}
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\chi_A(\lambda) = (\lambda - 2)^2(\lambda - 3)$. A naive guess might be that since the eigenvalues are $2$ and $3$, the minimal polynomial is $m_A(\lambda) = (\lambda - 2)(\lambda - 3)$ — the product of distinct linear factors corresponding to the eigenvalues.
This is wrong. To see why, test whether $(A - 2I)(A - 3I) = 0$:
\begin{align*}
A - 2I &= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad A - 3I = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\end{align*}
\begin{align*}
(A - 2I)(A - 3I) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \neq 0.
\end{align*}
So $(\lambda - 2)(\lambda - 3)$ does not annihilate $A$. The $2 \times 2$ Jordan block for eigenvalue $2$ — the $1$ in the $(1,2)$ position of $A$ — prevents $(A - 2I)$ from killing everything in the generalised eigenspace. The correct minimal polynomial must include $(\lambda - 2)^2$. Indeed, compute $(A - 2I)^2$:
\begin{align*}
(A - 2I)^2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\end{align*}
and then:
\begin{align*}
(A - 2I)^2(A - 3I) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = 0.
\end{align*}
Therefore $m_A(\lambda) = (\lambda - 2)^2(\lambda - 3)$. The lesson: knowing the eigenvalues is not enough. You must also know the size of the largest Jordan block at each eigenvalue — which is exactly the exponent that appears in $m_T$.
[/example]
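The trial-and-error procedure used in this example can be automated when the characteristic polynomial splits and its factorisation is already known. The sketch below tries exponent tuples in order of total degree and returns the first product that annihilates the matrix. The function `minimal_exponents` and its dictionary argument are assumptions made for this illustration.

```python
from itertools import product

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_zero(M):
    return all(x == 0 for row in M for x in row)

def minimal_exponents(A, multiplicities):
    """Exponents e_i of the minimal polynomial prod (λ - lam_i)^{e_i}.

    `multiplicities` maps each eigenvalue to its algebraic multiplicity; it is
    assumed to be known in advance, and chi_A is assumed to split.
    """
    n = len(A)
    lams = list(multiplicities)
    ranges = [range(1, multiplicities[lam] + 1) for lam in lams]
    for exps in sorted(product(*ranges), key=sum):    # lowest total degree first
        P = [[int(i == j) for j in range(n)] for i in range(n)]
        for lam, e in zip(lams, exps):
            shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                       for i in range(n)]
            for _ in range(e):
                P = matmul(P, shifted)
        if is_zero(P):
            return dict(zip(lams, exps))

A = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]
print(minimal_exponents(A, {2: 2, 3: 1}))   # {2: 2, 3: 1}
```

Every exponent starts at $1$ because each eigenvalue must be a root of the minimal polynomial, and the search stops at the characteristic polynomial at the latest.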
## Diagonalisability and the Minimal Polynomial
One of the most striking applications of the minimal polynomial is a clean algebraic criterion for diagonalisability. The question of whether an operator can be diagonalised is fundamental in linear algebra, and the minimal polynomial answers it definitively.
Recall that $T$ is diagonalisable over $k$ if $V$ has a basis of eigenvectors of $T$. Equivalently, $V$ decomposes as a direct sum of eigenspaces:
\begin{align*}
V = E_{\lambda_1} \oplus E_{\lambda_2} \oplus \cdots \oplus E_{\lambda_r},
\end{align*}
where $E_{\lambda_i} = \ker(T - \lambda_i I)$ is the eigenspace for eigenvalue $\lambda_i$, and $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues.
The key insight is that $T$ restricted to $E_{\lambda_i}$ satisfies $(T - \lambda_i I)|_{E_{\lambda_i}} = 0$, so the minimal polynomial of $T|_{E_{\lambda_i}}$ is $(\lambda - \lambda_i)$. When $V$ is a direct sum of such spaces, the minimal polynomial of $T$ is the least common multiple of the minimal polynomials on each summand. Since all these are distinct linear factors, their lcm has no repeated roots.
[quotetheorem:3277]
This criterion has a remarkable corollary: if the characteristic polynomial of $T$ has $n = \dim V$ distinct roots in $k$, then $T$ is automatically diagonalisable, since in that case $m_T$ must divide a polynomial with distinct linear factors.
[explanation: Why the Criterion Works — The Direct Sum Decomposition]
To understand why no repeated roots in $m_T$ implies diagonalisability, consider what a repeated factor $(\lambda - \lambda_0)^2$ in $m_T$ would mean. It would mean that $(T - \lambda_0 I)^2 = 0$ on some part of $V$, but $(T - \lambda_0 I) \neq 0$ there. A vector $v$ with $(T - \lambda_0 I)^2 v = 0$ but $(T - \lambda_0 I) v \neq 0$ is a generalised eigenvector of order $2$: it satisfies $Tv = \lambda_0 v + (T - \lambda_0 I)v$, and $(T - \lambda_0 I)v \neq 0$ is itself an eigenvector. Such a vector cannot be part of an eigenbasis, because $Tv \neq \lambda_0 v$. The repeated root captures the presence of a non-trivial Jordan block, which is precisely the obstruction to diagonalisability.
Conversely, if $m_T(\lambda) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_r)$ with distinct $\lambda_i$, then by the Chinese Remainder Theorem in $k[\lambda]$, we get an isomorphism
\begin{align*}
k[\lambda]/(m_T) \cong k[\lambda]/(\lambda - \lambda_1) \times \cdots \times k[\lambda]/(\lambda - \lambda_r) \cong k^r.
\end{align*}
Applying this to the module structure of $V$ over $k[\lambda]$ (where $\lambda$ acts as $T$), we get a corresponding decomposition of $V$ into a direct sum of eigenspaces — exactly the decomposition that witnesses diagonalisability.
[/explanation]
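Whether a given polynomial has a repeated factor can be decided without factoring it: in characteristic $0$, $m$ has a repeated irreducible factor exactly when $\gcd(m, m') \neq 1$. The sketch below implements this test over $\mathbb{Q}$ with the Euclidean algorithm; all function names are hypothetical. Note that the absence of repeated factors gives diagonalisability over $k$ only when $m$ also splits into linear factors over $k$, which this test does not check.

```python
from fractions import Fraction

def strip(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(a, b):
    """Remainder of a divided by b (coefficients highest degree first, b nonzero)."""
    a = [Fraction(x) for x in a]
    while len(a) >= len(b):
        f = a[0] / b[0]
        padded = list(b) + [0] * (len(a) - len(b))
        a = [x - f * y for x, y in zip(a, padded)][1:]   # leading term cancels
    return strip(a)

def poly_gcd(a, b):
    """Monic greatest common divisor by the Euclidean algorithm."""
    a, b = strip(list(a)), strip(list(b))
    while b:
        a, b = b, poly_rem(a, b)
    return [Fraction(x) / a[0] for x in a]

def derivative(p):
    d = len(p) - 1
    return [c * (d - i) for i, c in enumerate(p[:-1])]

def has_repeated_factor(m):
    """True if m has a repeated irreducible factor (valid in characteristic 0)."""
    return len(poly_gcd(m, derivative(m))) > 1

print(has_repeated_factor([1, -5, 6]))         # False: (λ - 2)(λ - 3)
print(has_repeated_factor([1, -7, 16, -12]))   # True:  (λ - 2)^2 (λ - 3)
print(has_repeated_factor([1, 0, 1]))          # False: λ^2 + 1, which has no rational roots
```

The last line shows why splitting must be checked separately: $\lambda^2 + 1$ passes the test but has no roots in $\mathbb{Q}$ or $\mathbb{R}$.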
[example: Failure of Diagonalisability via Minimal Polynomial]
Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ have matrix
\begin{align*}
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\end{align*}
The characteristic polynomial is $\chi_A(\lambda) = \lambda^2 + 1$, which has no roots in $\mathbb{R}$. Therefore $T$ has no real eigenvalues and cannot be diagonalised over $\mathbb{R}$. No polynomial of degree $1$ annihilates $A$, because $A$ is not a scalar multiple of $I$, while $A^2 + I = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 0$. Hence $m_A(\lambda) = \lambda^2 + 1$. This polynomial does not split over $\mathbb{R}$, so the diagonalisability criterion correctly predicts non-diagonalisability.
Over $\mathbb{C}$, $\lambda^2 + 1 = (\lambda - i)(\lambda + i)$ has distinct linear factors, so $T$ is diagonalisable over $\mathbb{C}$. This illustrates how the diagonalisability criterion is sensitive to the ground field.
[/example]
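Both halves of this example can be verified directly. The sketch below checks $A^2 + I = 0$ with integer arithmetic and then uses Python's built-in complex numbers to confirm a pair of complex eigenvectors. The eigenvectors $(1, -i)^\top$ and $(1, i)^\top$ are supplied by hand and are not taken from the text above.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[0, -1], [1, 0]]

# Over the reals: A^2 + I is the zero matrix.
A2 = matmul(A, A)
print([[A2[i][j] + int(i == j) for j in range(2)] for i in range(2)])   # [[0, 0], [0, 0]]

# Over the complex numbers: eigenvalues i and -i, each with an eigenvector.
eigenpairs = [(1j, [1, -1j]), (-1j, [1, 1j])]
for lam, v in eigenpairs:
    print(matvec(A, v) == [lam * x for x in v])                          # True, True
```

The two eigenvectors are linearly independent over $\mathbb{C}$, so they form an eigenbasis of $\mathbb{C}^2$, in line with the criterion.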
## Jordan Structure and the Primary Decomposition
The minimal polynomial does more than detect diagonalisability: when the characteristic polynomial splits completely over $k$, it records the size of the largest Jordan block at each eigenvalue of $T$. The bridge between the minimal polynomial and Jordan form is the primary decomposition theorem.
Suppose $\chi_T$ splits over $k$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$. The characteristic polynomial factors as
\begin{align*}
\chi_T(\lambda) = (\lambda - \lambda_1)^{a_1} \cdots (\lambda - \lambda_r)^{a_r}
\end{align*}
and the minimal polynomial factors as
\begin{align*}
m_T(\lambda) = (\lambda - \lambda_1)^{e_1} \cdots (\lambda - \lambda_r)^{e_r},
\end{align*}
where $1 \le e_i \le a_i$ for each $i$.
[quotetheorem:411]
The exponent $e_i$ in the minimal polynomial is the size of the largest Jordan block for eigenvalue $\lambda_i$. This is the deepest structural information the minimal polynomial carries: it sees the largest Jordan block at each eigenvalue, even though it cannot distinguish between operators with different numbers of Jordan blocks of the same size.
[explanation: What the Minimal Polynomial Sees and Does Not See]
Here is precisely what the minimal polynomial determines and what it does not. Suppose $T$ and $S$ are both operators on $k^n$ with the same minimal polynomial. They necessarily have the same eigenvalues with the same maximum Jordan block sizes at each eigenvalue. However, they can differ in the total number of Jordan blocks of each size.
For a concrete example, take $k = \mathbb{R}$, $n = 4$, and compare:
\begin{align*}
T &= \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
S = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\end{align*}
Both have minimal polynomial $(\lambda - 1)^2$: for $T$, $(T - I)^2 = 0$ because both Jordan blocks have size $2$; for $S$, $(S - I)^2 = 0$ because the largest Jordan block has size $2$; and neither $T - I$ nor $S - I$ is zero. Yet $T$ has two Jordan blocks of size $2$, while $S$ has one of size $2$ and two of size $1$. They are not similar as linear operators, since they have different Jordan normal forms, but they share the same minimal polynomial.
The characteristic polynomial does not distinguish these either: $\chi_T(\lambda) = (\lambda - 1)^4$ and $\chi_S(\lambda) = (\lambda - 1)^4$ are equal. So neither the minimal nor the characteristic polynomial alone determines the Jordan form, and even together they constrain it without determining it in general. The full Jordan structure requires the elementary divisors (equivalently, the sizes of all Jordan blocks), not just the minimal polynomial.
[/explanation]
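The data that the minimal polynomial misses can be recovered from ranks: for an eigenvalue $\lambda_0$ and $N = A - \lambda_0 I$, the number of Jordan blocks of size at least $j$ equals $\operatorname{rank}(N^{j-1}) - \operatorname{rank}(N^{j})$. This is a standard fact that is not derived in the text above. The sketch below applies it to the two matrices of this explanation, with hypothetical helper names and exact rational elimination.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in M]
    rk = 0
    for col in range(len(rows[0])):
        pr = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pr is None:
            continue
        rows[rk], rows[pr] = rows[pr], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def jordan_block_sizes(A, lam):
    """Sizes of the Jordan blocks of A at eigenvalue lam, largest first."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    ranks = [n]                                   # rank of N^0 = I
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    while True:
        P = matmul(P, N)
        ranks.append(rank(P))
        if ranks[-1] == ranks[-2]:                # ranks have stabilised
            break
    # at_least[j-1] = number of blocks of size >= j
    at_least = [ranks[j - 1] - ranks[j] for j in range(1, len(ranks))]
    sizes = []
    for j in range(1, len(at_least) + 1):
        larger = at_least[j] if j < len(at_least) else 0
        sizes += [j] * (at_least[j - 1] - larger)
    return sorted(sizes, reverse=True)

T = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
S = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(jordan_block_sizes(T, 1))   # [2, 2]
print(jordan_block_sizes(S, 1))   # [2, 1, 1]
```

In both cases the largest entry is $2$, the exponent in the shared minimal polynomial $(\lambda - 1)^2$, while the full lists differ.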
[illustration:jordan-block-structure]
## Minimal Polynomial and Cyclic Subspaces
There is a more computational approach to the minimal polynomial that illuminates how to find it in practice, via cyclic subspaces generated by a single vector.
Given $T: V \to V$ and a nonzero vector $v \in V$, the **cyclic subspace** generated by $v$ is the smallest $T$-invariant subspace containing $v$:
\begin{align*}
Z(v, T) := \operatorname{span}\{v, Tv, T^2v, \ldots\}.
\end{align*}
Since $V$ is finite-dimensional, this sequence must eventually become linearly dependent. Let $d$ be the smallest positive integer such that $T^d v \in \operatorname{span}\{v, Tv, \ldots, T^{d-1}v\}$, so that
\begin{align*}
T^d v + c_{d-1} T^{d-1} v + \cdots + c_1 Tv + c_0 v = 0
\end{align*}
for some scalars $c_0, \ldots, c_{d-1}$. The monic polynomial $m_{T,v}(\lambda) = \lambda^d + c_{d-1}\lambda^{d-1} + \cdots + c_0$ is the **minimal polynomial of $T$ relative to $v$**: it is the monic polynomial of smallest degree such that $m_{T,v}(T)v = 0$, and $d = \dim Z(v, T)$.
Why single out one vector? Because the global minimal polynomial $m_T$ is determined by how $T$ acts on individual vectors: if we track the polynomial relation that kills the orbit $\{v, Tv, T^2 v, \ldots\}$ for each $v$, we can reconstruct $m_T$ as their least common multiple. Isolating the annihilator ideal of a single vector is not a restriction — it is a computational lens that makes the structure of $m_T$ visible one cyclic orbit at a time.
[definition: Minimal Polynomial of T Relative to a Vector]
Let $T: V \to V$ be a linear operator and $v \in V$ a nonzero vector. The **minimal polynomial of $T$ relative to $v$**, denoted $m_{T,v}(\lambda)$, is the unique monic polynomial of smallest degree in $k[\lambda]$ such that $m_{T,v}(T) v = 0$.
[/definition]
Equivalently, $m_{T,v}$ is the monic generator of the ideal $\{ p \in k[\lambda] : p(T)v = 0 \}$ in $k[\lambda]$.
The relationship between $m_{T,v}$ and $m_T$ is: $m_{T,v}$ divides $m_T$ for every $v$, and $m_T$ is the least common multiple of all $m_{T,v}$ as $v$ ranges over $V$. This gives a strategy for computing $m_T$: find vectors $v$ that maximise $\deg m_{T,v}$. When $V$ is a cyclic $k[\lambda]$-module — i.e., when there exists a single vector $v$ with $Z(v,T) = V$ — then $m_{T,v} = m_T = \chi_T$.
[quotetheorem:3293]
The existence of a vector $v$ with $m_{T,v} = m_T$ is a useful tool: to compute the minimal polynomial, one can often find a well-chosen vector and iterate $T$ until the sequence becomes dependent. Such a $v$ is a cyclic vector in the strict sense, meaning $Z(v,T) = V$, only when $\deg m_T = \dim V$.
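As a hand-checkable illustration of the least-common-multiple description, consider the matrix below. It is an illustrative choice, with a Jordan block of size $2$ at the eigenvalue $2$ and a block of size $1$ at the eigenvalue $3$. Computing the relative minimal polynomial of each standard basis vector gives
\begin{align*}
A &= \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}, \\
m_{A,e_1}(\lambda) &= \lambda - 2 && \text{since } Ae_1 = 2e_1, \\
m_{A,e_2}(\lambda) &= (\lambda - 2)^2 && \text{since } (A - 2I)e_2 = e_1 \neq 0 \text{ and } (A - 2I)^2 e_2 = 0, \\
m_{A,e_3}(\lambda) &= \lambda - 3 && \text{since } Ae_3 = 3e_3, \\
\operatorname{lcm}\bigl(m_{A,e_1}, m_{A,e_2}, m_{A,e_3}\bigr) &= (\lambda - 2)^2(\lambda - 3) = m_A(\lambda).
\end{align*}
A single vector already attains this maximum. For $v = e_2 + e_3$ we have $(A - 2I)^2 v = e_3 \neq 0$ and $(A - 2I)(A - 3I)v = -e_1 \neq 0$. Every proper divisor of $(\lambda - 2)^2(\lambda - 3)$ divides $(\lambda - 2)^2$ or $(\lambda - 2)(\lambda - 3)$, so no proper divisor kills $v$, and $m_{A,v} = m_A$.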
[example: Computing Minimal Polynomial via a Cyclic Vector]
Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ have matrix
\begin{align*}
A = \begin{pmatrix} 0 & 0 & -2 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{pmatrix}.
\end{align*}
We use the cyclic vector method: pick a candidate vector $v$, iterate $T$, and find the minimal dependence relation among $\{v, Av, A^2v, \ldots\}$ to read off $m_{A,v}$. If $\deg m_{A,v} = \dim V = 3$, then $v$ is a cyclic vector and $m_{A,v} = m_A$.
Take $v = e_1 = (1, 0, 0)^\top$. Compute the orbit:
\begin{align*}
v &= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
Av = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad
A^2 v = A(Av) = A\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\end{align*}
The set $\{v, Av, A^2v\} = \{e_1, e_2, e_3\}$ is the standard basis of $\mathbb{R}^3$, hence linearly independent. So we continue to $A^3 v$:
\begin{align*}
A^3 v = A(A^2 v) = A\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 1 \\ 2 \end{pmatrix}.
\end{align*}
Now $A^3 v$ must lie in $\operatorname{span}\{v, Av, A^2v\}$; we find the unique coefficients $c_0, c_1, c_2$ such that $A^3 v = c_0 v + c_1 Av + c_2 A^2 v$:
\begin{align*}
\begin{pmatrix} -2 \\ 1 \\ 2 \end{pmatrix} = c_0 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_1 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
\end{align*}
giving $c_0 = -2$, $c_1 = 1$, $c_2 = 2$. The dependence relation is $A^3 v - 2A^2 v - Av + 2v = 0$, so the minimal polynomial of $A$ relative to $v = e_1$ is
\begin{align*}
m_{A, e_1}(\lambda) = \lambda^3 - 2\lambda^2 - \lambda + 2.
\end{align*}
Since $\deg m_{A,e_1} = 3 = \dim \mathbb{R}^3$, the vector $e_1$ is a cyclic vector for $A$, and $m_A = m_{A,e_1} = \lambda^3 - 2\lambda^2 - \lambda + 2$.
Factoring confirms: $\lambda^3 - 2\lambda^2 - \lambda + 2 = (\lambda - 2)(\lambda^2 - 1) = (\lambda - 2)(\lambda - 1)(\lambda + 1)$, three distinct linear factors. By the diagonalisability criterion, $A$ is diagonalisable over $\mathbb{R}$, and $m_A = \chi_A$.
[/example]
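The orbit computation in this example follows a fixed recipe: extend the list $v, Av, A^2 v, \ldots$ until the next vector lies in the span of the previous ones, then read the polynomial off the dependence relation. The following is a minimal sketch of that recipe with exact rational arithmetic; `solve_combination` and `relative_min_poly` are hypothetical helper names.

```python
from fractions import Fraction

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def solve_combination(basis, target):
    """Coefficients c with sum(c[j] * basis[j]) == target, or None if target
    is not in the span. The basis vectors must be linearly independent."""
    m, n = len(basis), len(target)
    rows = [[Fraction(basis[j][i]) for j in range(m)] + [Fraction(target[i])]
            for i in range(n)]
    pivot_row = 0
    for col in range(m):
        pr = next((r for r in range(pivot_row, n) if rows[r][col] != 0), None)
        if pr is None:
            raise ValueError("basis vectors are linearly dependent")
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        piv = rows[pivot_row][col]
        rows[pivot_row] = [x / piv for x in rows[pivot_row]]
        for r in range(n):
            if r != pivot_row and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    if any(rows[r][m] != 0 for r in range(pivot_row, n)):
        return None
    return [rows[j][m] for j in range(m)]

def relative_min_poly(A, v):
    """Monic minimal polynomial of A relative to a nonzero vector v,
    as a coefficient list with the highest degree first."""
    orbit = [v]
    while True:
        nxt = matvec(A, orbit[-1])
        c = solve_combination(orbit, nxt)
        if c is not None:
            # A^d v = c_0 v + ... + c_{d-1} A^{d-1} v
            return [Fraction(1)] + [-x for x in reversed(c)]
        orbit.append(nxt)

A = [[0, 0, -2], [1, 0, 1], [0, 1, 2]]
m = relative_min_poly(A, [1, 0, 0])
print([int(x) for x in m])          # [1, -2, -1, 2], i.e. λ^3 - 2λ^2 - λ + 2
print(len(m) - 1 == len(A))         # True: e_1 is a cyclic vector
```

When the degree comes out smaller than $\dim V$, the result is only a divisor of $m_A$, and further vectors have to be tried and combined by least common multiple.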
## Minimal Polynomial and Invariant Subspaces
The minimal polynomial interacts beautifully with $T$-invariant subspaces. If $W \subset V$ is a $T$-invariant subspace — meaning $T(W) \subset W$ — then $T|_W: W \to W$ is itself a linear operator, and we can ask for its minimal polynomial $m_{T|_W}$.
The minimal polynomial of the restriction divides that of the full operator: $m_{T|_W}$ divides $m_T$. This is because any annihilating polynomial for $T$ also annihilates $T|_W$. The ideal of polynomials killing $T|_W$ contains the ideal of polynomials killing $T$, so its generator $m_{T|_W}$ divides the generator $m_T$ of the smaller ideal.
[quotetheorem:3294]
This has a useful converse flavour: if you know $m_T$ and want to understand invariant subspaces, you know that the restriction of $T$ to any invariant subspace has a minimal polynomial that divides $m_T$, and is therefore a product of prime factors of $m_T$ with exponents no larger than those in $m_T$.
[explanation: How to Use the Restriction Theorem]
Suppose $m_T(\lambda) = (\lambda - 2)^3 (\lambda - 5)$. If $W$ is any $T$-invariant subspace, then $m_{T|_W}$ divides $(\lambda - 2)^3(\lambda - 5)$, so $m_{T|_W}$ is of the form $(\lambda - 2)^j (\lambda - 5)^k$ with $0 \le j \le 3$ and $0 \le k \le 1$. In particular, every $v \in W$ satisfies $(T - 2I)^3(T - 5I)v = 0$. If $W$ is nonzero and contained in the eigenspace for eigenvalue $2$, then $m_{T|_W} = (\lambda - 2)$, a single linear factor. The restriction theorem thus gives a constraint that limits how "complicated" the behaviour of $T$ can be on any invariant subspace.
[/explanation]
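To make the constraint concrete, here is a small worked case. Assume that $T$ acts on $\mathbb{R}^4$ by the matrix below, which has minimal polynomial $(\lambda - 2)^3(\lambda - 5)$, and take $W = \operatorname{span}\{e_1, e_2\}$. Both choices are illustrative. Since $Te_1 = 2e_1$ and $Te_2 = e_1 + 2e_2$, the subspace $W$ is $T$-invariant, and in the basis $e_1, e_2$ of $W$ we get
\begin{align*}
T &= \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}, &
T|_W &= \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \\
T|_W - 2I &= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \neq 0, &
(T|_W - 2I)^2 &= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\end{align*}
Hence $m_{T|_W}(\lambda) = (\lambda - 2)^2$, which is the case $j = 2$, $k = 0$ and does divide $(\lambda - 2)^3(\lambda - 5)$. Restricting further to $\operatorname{span}\{e_1\}$ gives $m = \lambda - 2$, and restricting to $\operatorname{span}\{e_4\}$ gives $m = \lambda - 5$.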
The restriction theorem is also key in proving the primary decomposition theorem. Each generalised eigenspace $K_i = \ker(T - \lambda_i I)^{e_i}$ is $T$-invariant, and the minimal polynomial of $T|_{K_i}$ is exactly $(\lambda - \lambda_i)^{e_i}$: it divides $(\lambda - \lambda_i)^{e_i}$ because $(T - \lambda_i I)^{e_i}$ vanishes on $K_i$ by definition, and it cannot be a smaller power because $(T - \lambda_i I)^{e_i - 1}|_{K_i} \neq 0$.