[proofplan]
We identify the unobservable initial states with the kernel of the finite observability map. If an output trajectory $t \mapsto C e^{tA}x$ vanishes identically, then differentiating at $t=0$ gives $CA^k x=0$ for every $k \ge 0$. Conversely, if the first $n$ quantities $CA^k x$ vanish, the [Cayley-Hamilton theorem](/theorems/865) makes all higher powers redundant, and the matrix exponential series then forces the whole output trajectory to vanish. Thus observability is exactly injectivity of $\mathcal{O}(C,A)$, which is equivalent to full rank.
[/proofplan]
[step:Define the unobservable subspace and identify the finite kernel]
Define the unobservable subspace
\begin{align*}
\mathcal{N}_o := \{x \in \mathbb{R}^n : C e^{tA}x = 0 \text{ for every } t \in \mathbb{R}\}.
\end{align*}
Since $\mathcal{O}(C,A)x=(CA^0x,CA^1x,\dots,CA^{n-1}x)$, its kernel is
\begin{align*}
\ker\mathcal{O}(C,A)=\bigcap_{k=0}^{n-1}\ker(CA^k).
\end{align*}
Therefore it remains to prove
\begin{align*}
\mathcal{N}_o=\ker\mathcal{O}(C,A).
\end{align*}
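As an illustration, with matrices chosen here only for the example (they are not part of the statement), take $n=2$, $m=1$, and
\begin{align*}
A=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad C=\begin{pmatrix}0&1\end{pmatrix}.
\end{align*}
Then $CA=\begin{pmatrix}0&0\end{pmatrix}$, so
\begin{align*}
\ker\mathcal{O}(C,A)=\ker(C)\cap\ker(CA)=\{x\in\mathbb{R}^2 : x_2=0\}.
\end{align*}
On the other hand, $A^2=0$ gives $e^{tA}=I_2+tA$, hence
\begin{align*}
Ce^{tA}x=Cx+t\,CAx=x_2
\end{align*}
for every $t\in\mathbb{R}$, so $\mathcal{N}_o=\{x\in\mathbb{R}^2 : x_2=0\}$ as well. In this small case the two subspaces agree, as the general identity predicts.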
[/step]
[step:Differentiate the zero output trajectory to obtain the finite kernel conditions]
Let $x \in \mathcal{N}_o$. Define
\begin{align*}
f_x: \mathbb{R} &\to \mathbb{R}^m
\end{align*}
by
\begin{align*}
f_x(t)=C e^{tA}x.
\end{align*}
The matrix exponential is smooth with $\frac{d}{dt}e^{tA}=Ae^{tA}$, so repeated differentiation gives
\begin{align*}
f_x^{(k)}(t)=C A^k e^{tA}x
\end{align*}
for every integer $k \ge 0$ and every $t \in \mathbb{R}$. Since $x \in \mathcal{N}_o$, the function $f_x$ is identically zero, so $f_x^{(k)}(0)=0$ for every $k \ge 0$. Hence
\begin{align*}
CA^k x=0
\end{align*}
for every $k \ge 0$, and in particular for $0 \le k \le n-1$. Thus $x \in \ker\mathcal{O}(C,A)$, so
\begin{align*}
\mathcal{N}_o \subseteq \ker\mathcal{O}(C,A).
\end{align*}
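A small sanity check of the derivative formula, using matrices assumed only for this example: let
\begin{align*}
A=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad C=\begin{pmatrix}1&0\end{pmatrix},\qquad x=\begin{pmatrix}x_1\\x_2\end{pmatrix}.
\end{align*}
Because $A^2=0$, we have $e^{tA}=I_2+tA$ and therefore
\begin{align*}
f_x(t)=Cx+t\,CAx=x_1+t\,x_2.
\end{align*}
Differentiating at $t=0$ gives
\begin{align*}
f_x(0)=x_1=CA^0x,\qquad f_x'(0)=x_2=CA^1x,\qquad f_x^{(k)}(0)=0=CA^kx \text{ for } k\ge 2.
\end{align*}
If $f_x$ vanishes identically, then $x_1=x_2=0$, which is exactly the condition $x\in\ker\mathcal{O}(C,A)$ for this pair.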
[/step]
[step:Use Cayley-Hamilton to propagate the first $n$ kernel conditions to all powers]
Let $x \in \ker\mathcal{O}(C,A)$. Then
\begin{align*}
CA^k x=0
\end{align*}
for every integer $k$ with $0 \le k \le n-1$.
By the [Cayley-Hamilton theorem](/theorems/923), $A^n$ is a real linear combination of $I_n,A,\dots,A^{n-1}$, and by induction on $r$ the same holds for every power $A^r$ with $r \ge n$. Therefore, for each $r \ge n$, there exist scalars $\alpha_{r,0},\dots,\alpha_{r,n-1} \in \mathbb{R}$ such that
\begin{align*}
A^r=\sum_{j=0}^{n-1}\alpha_{r,j}A^j.
\end{align*}
Multiplying this identity on the left by $C$ and on the right by $x$ gives
\begin{align*}
CA^r x=\sum_{j=0}^{n-1}\alpha_{r,j}CA^j x=0.
\end{align*}
Hence
\begin{align*}
CA^k x=0
\end{align*}
for every integer $k \ge 0$.
[guided]
We start from the finite information contained in $x \in \ker\mathcal{O}(C,A)$. By definition of the observability map, this means exactly that
\begin{align*}
CA^k x=0
\end{align*}
for every integer $k$ with $0 \le k \le n-1$.
Why should these first $n$ equations control all higher powers? The reason is that an $n \times n$ matrix satisfies its own characteristic polynomial. By the Cayley-Hamilton theorem, the power $A^n$ is a real linear combination of $I_n,A,\dots,A^{n-1}$. Multiplying that identity by $A^{r-n}$ expresses $A^r$ as a real linear combination of the lower powers $A^{r-n},\dots,A^{r-1}$, so induction on $r$ shows, for every integer $r \ge n$, that $A^r$ is also a real linear combination of $I_n,A,\dots,A^{n-1}$. Thus there are scalars $\alpha_{r,0},\dots,\alpha_{r,n-1} \in \mathbb{R}$ satisfying
\begin{align*}
A^r=\sum_{j=0}^{n-1}\alpha_{r,j}A^j.
\end{align*}
Now multiply by $C$ on the left and by $x$ on the right. This gives
\begin{align*}
CA^r x=\sum_{j=0}^{n-1}\alpha_{r,j}CA^j x.
\end{align*}
Each term $CA^j x$ on the right-hand side is zero because $0 \le j \le n-1$ and $x \in \ker\mathcal{O}(C,A)$. Therefore $CA^r x=0$ for every $r \ge n$. Combining this with the defining equations for $0 \le k \le n-1$, we obtain
\begin{align*}
CA^k x=0
\end{align*}
for every integer $k \ge 0$.
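To see the reduction of higher powers concretely, consider the following matrix, chosen here purely as an example:
\begin{align*}
A=\begin{pmatrix}0&1\\-2&-3\end{pmatrix},\qquad \det(\lambda I_2-A)=\lambda^2+3\lambda+2.
\end{align*}
The Cayley-Hamilton theorem gives $A^2=-3A-2I_2$. Multiplying by $A$ and substituting for $A^2$ each time it reappears yields
\begin{align*}
A^3&=-3A^2-2A=-3(-3A-2I_2)-2A=7A+6I_2,\\
A^4&=7A^2+6A=7(-3A-2I_2)+6A=-15A-14I_2.
\end{align*}
In the notation above, $\alpha_{3,0}=6$, $\alpha_{3,1}=7$, $\alpha_{4,0}=-14$, and $\alpha_{4,1}=-15$. Consequently, if $Cx=0$ and $CAx=0$ for some row vector $C$, then $CA^3x=7\,CAx+6\,Cx=0$ and $CA^4x=-15\,CAx-14\,Cx=0$.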
[/guided]
[/step]
[step:Expand the matrix exponential to show the output vanishes identically]
Let $x \in \ker\mathcal{O}(C,A)$. From the previous step, $CA^k x=0$ for every integer $k \ge 0$. For each $t \in \mathbb{R}$, the matrix exponential is given by the absolutely convergent series
\begin{align*}
e^{tA}=\sum_{k=0}^{\infty}\frac{t^k}{k!}A^k.
\end{align*}
Left multiplication by $C$ and right multiplication by $x$ are continuous linear operations on a finite-dimensional space, so they may be applied term by term to the convergent series. This gives
\begin{align*}
C e^{tA}x=\sum_{k=0}^{\infty}\frac{t^k}{k!}CA^k x=0.
\end{align*}
Thus $x \in \mathcal{N}_o$, so
\begin{align*}
\ker\mathcal{O}(C,A)\subseteq \mathcal{N}_o.
\end{align*}
Together with the reverse inclusion proved earlier,
\begin{align*}
\mathcal{N}_o=\ker\mathcal{O}(C,A).
\end{align*}
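The term-by-term expansion can be checked on a simple pair, assumed here only for illustration:
\begin{align*}
A=\begin{pmatrix}0&1\\-1&0\end{pmatrix},\qquad C=\begin{pmatrix}1&0\end{pmatrix},\qquad x=\begin{pmatrix}x_1\\x_2\end{pmatrix}.
\end{align*}
Since $A^2=-I_2$, the coefficients are $CA^{2j}x=(-1)^jx_1$ and $CA^{2j+1}x=(-1)^jx_2$ for every $j\ge 0$, so
\begin{align*}
Ce^{tA}x=x_1\sum_{j=0}^{\infty}\frac{(-1)^jt^{2j}}{(2j)!}+x_2\sum_{j=0}^{\infty}\frac{(-1)^jt^{2j+1}}{(2j+1)!}=x_1\cos t+x_2\sin t.
\end{align*}
If $Cx=x_1=0$ and $CAx=x_2=0$, every coefficient of the series is zero and the output vanishes for all $t$. Conversely, $x_1\cos t+x_2\sin t\equiv 0$ forces $x_1=x_2=0$, so for this pair $\mathcal{N}_o=\ker\mathcal{O}(C,A)=\{0\}$.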
[/step]
[step:Translate zero unobservable subspace into full rank]
By definition, $(C,A)$ is observable exactly when
\begin{align*}
\mathcal{N}_o=\{0\}.
\end{align*}
Since $\mathcal{N}_o=\ker\mathcal{O}(C,A)$, observability is equivalent to
\begin{align*}
\ker\mathcal{O}(C,A)=\{0\}.
\end{align*}
The map $\mathcal{O}(C,A):\mathbb{R}^n \to (\mathbb{R}^m)^n$ is linear. For a [linear map](/page/Linear%20Map) with domain $\mathbb{R}^n$, injectivity is equivalent to rank $n$ by the [rank-nullity theorem](/theorems/916). Therefore
\begin{align*}
(C,A)\text{ is observable}
\end{align*}
if and only if
\begin{align*}
\operatorname{rank}\mathcal{O}(C,A)=n.
\end{align*}
Using the identity
\begin{align*}
\ker\mathcal{O}(C,A)=\bigcap_{k=0}^{n-1}\ker(CA^k),
\end{align*}
this rank condition is equivalent to
\begin{align*}
\bigcap_{k=0}^{n-1}\ker(CA^k)=\{0\}.
\end{align*}
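As a worked instance of the rank test, with a pair assumed only for this example, let $\lambda_1,\lambda_2\in\mathbb{R}$ and
\begin{align*}
A=\begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix},\qquad C=\begin{pmatrix}1&1\end{pmatrix}.
\end{align*}
Writing $\mathcal{O}(C,A)$ as the matrix with rows $C$ and $CA$,
\begin{align*}
\mathcal{O}(C,A)=\begin{pmatrix}1&1\\\lambda_1&\lambda_2\end{pmatrix},\qquad \det\mathcal{O}(C,A)=\lambda_2-\lambda_1.
\end{align*}
Hence $\operatorname{rank}\mathcal{O}(C,A)=2$ exactly when $\lambda_1\neq\lambda_2$. When $\lambda_1=\lambda_2=\lambda$, the vector $x=(1,-1)^{\top}$ lies in $\ker(C)\cap\ker(CA)$, and indeed
\begin{align*}
Ce^{tA}x=e^{\lambda t}\cdot 1+e^{\lambda t}\cdot(-1)=0
\end{align*}
for every $t\in\mathbb{R}$, so this nonzero initial state cannot be distinguished from $0$ by the output.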
[/step]