[proofplan]
We prove that observability is equivalent to the absence of unobservable eigenvectors. One direction is immediate: an eigenvector killed by $C$ is killed by every block $CA^k$ in the observability map. Conversely, if the observability map has a nonzero kernel, then that kernel is a nonzero $A$-invariant subspace of unobservable vectors; restricting $A$ to this subspace gives an eigenvector still killed by $C$. Finally, the Popov-Belevitch-Hautus (PBH) rank condition is exactly the statement that, for each $\lambda\in\mathbb{C}$, the simultaneous equations $(\lambda I_n-A)v=0$ and $Cv=0$ have only the zero solution.
[/proofplan]
[step:Show that an eigenvector in $\ker C$ is unobservable]
Let $p$ denote the number of rows of $C$. Define the observability map $\mathcal{O}_{C,A}:\mathbb{C}^n\to\mathbb{C}^{pn}$ by
\begin{align*}
\mathcal{O}_{C,A}v=(Cv,CAv,\dots,CA^{n-1}v).
\end{align*}
Recall that, for this finite-dimensional system, the pair $(C,A)$ is observable exactly when $\ker\mathcal{O}_{C,A}=\{0\}$.
Assume there exist $\lambda \in \mathbb{C}$ and $v \in \mathbb{C}^n\setminus\{0\}$ such that $Av=\lambda v$ and $Cv=0$. For each integer $k$ with $0\leq k\leq n-1$, induction on $k$ gives $A^k v=\lambda^k v$. Therefore
\begin{align*}
CA^k v=C(\lambda^k v)=\lambda^k Cv=0.
\end{align*}
Hence $\mathcal{O}_{C,A}v=0$. Since $v\neq 0$, the observability map has nonzero kernel, so $(C,A)$ is not observable.
[guided]
Suppose $v$ is an eigenvector of $A$ with eigenvalue $\lambda$ and suppose also that $C$ cannot see this vector, meaning $Cv=0$. The point is that applying powers of $A$ to $v$ only rescales $v$, so applying $C$ afterwards still gives zero.
We first verify the power identity. For $k=0$, $A^0v=I_nv=v=\lambda^0v$. If $A^k v=\lambda^k v$ for some $k\geq 0$, then
\begin{align*}
A^{k+1}v=A(A^k v)=A(\lambda^k v)=\lambda^k Av=\lambda^k\lambda v=\lambda^{k+1}v.
\end{align*}
Thus $A^k v=\lambda^k v$ for every integer $k\geq 0$. In particular, for $0\leq k\leq n-1$,
\begin{align*}
CA^k v=C(\lambda^k v)=\lambda^k Cv=0.
\end{align*}
Let $p$ be the number of rows of $C$. The observability map is the [linear map](/page/Linear%20Map) $\mathcal{O}_{C,A}:\mathbb{C}^n\to\mathbb{C}^{pn}$ defined by stacking precisely these vectors:
\begin{align*}
\mathcal{O}_{C,A}v=(Cv,CAv,\dots,CA^{n-1}v).
\end{align*}
Every component of $\mathcal{O}_{C,A}v$ is one of the vectors $CA^k v$ with $0\leq k\leq n-1$, each of which was just shown to vanish, so $\mathcal{O}_{C,A}v=0$. By the definition of observability for this finite-dimensional system, $(C,A)$ is observable exactly when $\ker\mathcal{O}_{C,A}=\{0\}$. Since $v$ is nonzero, $\ker\mathcal{O}_{C,A}$ is nontrivial, and therefore $(C,A)$ is not observable.
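As a concrete illustration (the matrices below are chosen for this example and are not part of the original argument), take $n=2$, $p=1$, and
\begin{align*}
A=\begin{pmatrix}1&1\\0&2\end{pmatrix},\qquad C=\begin{pmatrix}0&1\end{pmatrix},\qquad v=\begin{pmatrix}1\\0\end{pmatrix}.
\end{align*}
Then $Av=v$, so $v$ is an eigenvector with $\lambda=1$, and $Cv=0$. Since $CA=\begin{pmatrix}0&2\end{pmatrix}$, we get
\begin{align*}
\mathcal{O}_{C,A}v=(Cv,CAv)=(0,0),
\end{align*}
so $v$ is a nonzero element of $\ker\mathcal{O}_{C,A}$ and this particular pair is not observable, in line with the step.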
[/guided]
[/step]
[step:Turn a nonzero unobservable vector into an unobservable eigenvector]
Assume $(C,A)$ is not observable. Define the subspace
\begin{align*}
W:=\ker\mathcal{O}_{C,A}=\bigcap_{k=0}^{n-1}\ker(CA^k)\subseteq \mathbb{C}^n.
\end{align*}
By non-observability, $W\neq\{0\}$.
We claim that $W$ is $A$-invariant. Let $w\in W$. For $0\leq k\leq n-2$,
\begin{align*}
CA^k(Aw)=CA^{k+1}w=0.
\end{align*}
For $k=n-1$, use the [Cayley-Hamilton Theorem](/theorems/865), which applies because $A$ is an $n\times n$ matrix over the field $\mathbb{C}$: if
\begin{align*}
\det(tI_n-A)=t^n+a_{n-1}t^{n-1}+\cdots+a_1t+a_0
\end{align*}
is the characteristic polynomial of $A$, then
\begin{align*}
A^n+a_{n-1}A^{n-1}+\cdots+a_1A+a_0I_n=0.
\end{align*}
Therefore
\begin{align*}
CA^{n-1}(Aw)=CA^n w=-\sum_{j=0}^{n-1}a_j CA^j w=0.
\end{align*}
Thus $Aw\in W$, so $W$ is $A$-invariant.
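To see the role of the Cayley-Hamilton step in the smallest nontrivial case, here is a sketch for $n=2$ (a special case written out for illustration only). The characteristic polynomial is $\det(tI_2-A)=t^2-\operatorname{tr}(A)\,t+\det(A)$, so $a_1=-\operatorname{tr}(A)$, $a_0=\det(A)$, and
\begin{align*}
A^2=\operatorname{tr}(A)\,A-\det(A)\,I_2.
\end{align*}
If $w\in W$, then $Cw=0$ and $CAw=0$, hence
\begin{align*}
C(Aw)=CAw=0,\qquad CA(Aw)=CA^2w=\operatorname{tr}(A)\,CAw-\det(A)\,Cw=0,
\end{align*}
which is exactly the statement $Aw\in W$ when $n=2$.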
Since $W$ is a nonzero finite-dimensional complex [vector space](/page/Vector%20Space) and $A|_W:W\to W$ is a complex linear map, the [Fundamental Theorem of Algebra](/theorems/347) applied to the characteristic polynomial of $A|_W$ gives an eigenvalue $\lambda\in\mathbb{C}$ and a nonzero vector $v\in W$ such that $A|_Wv=\lambda v$. Because $A|_W$ is the restriction of $A$, this says $Av=\lambda v$. Since $v\in W\subseteq\ker C$, we also have $Cv=0$. Hence there exists $v\neq 0$ with $Av=\lambda v$ and $Cv=0$.
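As a small worked instance of this construction (with matrices chosen here purely for illustration), let $n=2$ and
\begin{align*}
A=\begin{pmatrix}1&1\\0&2\end{pmatrix},\qquad C=\begin{pmatrix}0&1\end{pmatrix},\qquad CA=\begin{pmatrix}0&2\end{pmatrix}.
\end{align*}
Both $C$ and $CA$ vanish exactly on vectors whose second coordinate is zero, so
\begin{align*}
W=\ker C\cap\ker(CA)=\operatorname{span}\{e_1\},\qquad e_1=\begin{pmatrix}1\\0\end{pmatrix}.
\end{align*}
Here $Ae_1=e_1\in W$, so $W$ is $A$-invariant, $A|_W$ is the identity on $W$, and $v=e_1$ is an eigenvector of $A$ with $\lambda=1$ and $Cv=0$.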
[/step]
[step:Identify the rank condition with the absence of unobservable eigenvectors]
Fix $\lambda\in\mathbb{C}$. Since $C$ has $p$ rows, define the block map $P_\lambda:\mathbb{C}^n\to\mathbb{C}^{n+p}$ by
\begin{align*}
P_\lambda(v)=((\lambda I_n-A)v,Cv).
\end{align*}
The matrix of $P_\lambda$ is the block column matrix $\operatorname{col}(\lambda I_n-A,C)$.
By the [Rank-Nullity Theorem](/theorems/916) for the linear map $P_\lambda$, we have $\operatorname{rank}P_\lambda+\dim\ker P_\lambda=n$, so this block matrix has rank $n$ if and only if $\ker P_\lambda=\{0\}$. But
\begin{align*}
\ker P_\lambda=\{v\in\mathbb{C}^n:(\lambda I_n-A)v=0\text{ and }Cv=0\}.
\end{align*}
The equation $(\lambda I_n-A)v=0$ is exactly $Av=\lambda v$. Therefore the rank condition holds for every $\lambda\in\mathbb{C}$ if and only if there is no pair $(\lambda,v)$ with $\lambda\in\mathbb{C}$ and $v\in\mathbb{C}^n\setminus\{0\}$ such that $Av=\lambda v$ and $Cv=0$.
The first two steps together show that $(C,A)$ is observable if and only if no such pair exists. Combining them with this equivalence proves the Popov-Belevitch-Hautus observability criterion.
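The rank test can be checked by hand on a small example (the matrices are assumptions made for illustration, not taken from the statement being proved). Let $n=2$, $p=1$, and
\begin{align*}
A=\begin{pmatrix}1&1\\0&2\end{pmatrix},\qquad C=\begin{pmatrix}0&1\end{pmatrix},\qquad \operatorname{col}(\lambda I_2-A,C)=\begin{pmatrix}\lambda-1&-1\\0&\lambda-2\\0&1\end{pmatrix}.
\end{align*}
At $\lambda=1$ the first column is zero, so the rank is $1<2$ and the test fails; for $\lambda\neq 1$ the two columns are linearly independent and the rank is $2$. The failure at $\lambda=1$ corresponds to the eigenvector $(1,0)^{\mathsf T}$, which lies in $\ker C$. If instead $C'=\begin{pmatrix}1&0\end{pmatrix}$, then
\begin{align*}
\operatorname{col}(\lambda I_2-A,C')=\begin{pmatrix}\lambda-1&-1\\0&\lambda-2\\1&0\end{pmatrix},\qquad \det\begin{pmatrix}\lambda-1&-1\\1&0\end{pmatrix}=1\neq 0,
\end{align*}
so the rank is $2$ for every $\lambda\in\mathbb{C}$. This agrees with the direct check that $\mathcal{O}_{C',A}$ has matrix $\begin{pmatrix}1&0\\1&1\end{pmatrix}$, which is invertible.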
[/step]