[guided]The goal is to construct coordinates in which the output map and the dynamics take the canonical form $(C_o,A_o)$. Let $A\in\mathbb R^{n\times n}$ have characteristic polynomial $p(\lambda)=\lambda^n+a_{n-1}\lambda^{n-1}+\cdots+a_1\lambda+a_0$, and let $C:\mathbb R^n\to\mathbb R$ be a linear map, viewed as a $1\times n$ row vector, such that $(C,A)$ is observable. Define the observability matrix $\mathcal O(C,A)\in\mathbb R^{n\times n}$ as the matrix whose $k$-th row is $CA^{k-1}$ for $1\leq k\leq n$. For a single-output pair, observability means precisely that this square matrix has rank $n$, so $\mathcal O(C,A)$ is invertible. Set
\begin{align*}
P=\mathcal O(C,A)^{-1}.
\end{align*}
For a state vector $x\in\mathbb R^n$, the observable coordinate vector is $z=\mathcal O(C,A)x$, whose $k$-th entry is $CA^{k-1}x$; equivalently, $x=Pz$.
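The construction of $\mathcal O(C,A)$ and $P$ can be sketched in Python. This is a minimal sketch, assuming matrices are given as nested lists of integers or rationals; the helper names `observability_matrix`, `row_times_matrix` and `inverse` are hypothetical, and exact `Fraction` arithmetic is used so that the invertibility test is not affected by rounding. A singular $\mathcal O(C,A)$ is reported as a non-observable pair.

```python
from fractions import Fraction


def row_times_matrix(row, A):
    """Return the row vector row * A (row is 1 x n, A is n x n)."""
    n = len(A)
    return [sum(row[i] * A[i][j] for i in range(n)) for j in range(n)]


def observability_matrix(C, A):
    """Hypothetical helper: stack C, CA, ..., CA^{n-1} as the rows of O(C, A)."""
    rows = [[Fraction(c) for c in C]]
    for _ in range(len(A) - 1):
        # Each new row is the previous row multiplied on the right by A.
        rows.append(row_times_matrix(rows[-1], A))
    return rows


def inverse(M):
    """Hypothetical helper: exact Gauss-Jordan inverse over the rationals.

    Raises ValueError when M is singular; for M = O(C, A) this means
    the pair (C, A) is not observable.
    """
    n = len(M)
    # Augment M with the identity: [M | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the reduced augmented matrix is M^{-1}.
    return [row[n:] for row in aug]


# Illustrative example (not from the text): n = 2.
A = [[1, 2], [3, 4]]
C = [1, 1]
O = observability_matrix(C, A)  # rows C = (1, 1) and CA = (4, 6)
P = inverse(O)                  # P = O(C, A)^{-1}
```

Rational arithmetic is a deliberate choice here: with floating point, deciding whether $\mathcal O(C,A)$ has full rank would require a tolerance, which the exact statement in the text does not involve.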
We next compute the dynamics in these coordinates. Multiplying $\mathcal O(C,A)$ on the right by $A$ multiplies each of its rows by $A$, so the $k$-th row of $\mathcal O(C,A)A$ is $CA^k$. Hence the first $n-1$ rows of $\mathcal O(C,A)A$ are $CA,CA^2,\dots,CA^{n-1}$, which are exactly the second through $n$-th rows of $\mathcal O(C,A)$; right multiplication by $A$ shifts the rows up by one. The only row not obtained by this shift is the last row, $CA^n$. Since $p$ is the characteristic polynomial of the square matrix $A$, the [Cayley-Hamilton theorem](/theorems/921) gives $p(A)=0$, that is,
\begin{align*}
A^n+a_{n-1}A^{n-1}+\cdots+a_1A+a_0I_n=0.
\end{align*}
Multiplying this identity on the left by the linear map $C$ gives
\begin{align*}
CA^n=-a_{n-1}CA^{n-1}-\cdots-a_1CA-a_0C.
\end{align*}
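As a small illustration of this identity (the matrices are chosen here only as an example and are not part of the argument), take $n=2$, $A=\begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $C=\begin{pmatrix}1&1\end{pmatrix}$. Then $p(\lambda)=\lambda^2-5\lambda-2$, so $a_1=-5$ and $a_0=-2$. With $CA=\begin{pmatrix}4&6\end{pmatrix}$, both sides agree:
\begin{align*}
CA^2&=\begin{pmatrix}4&6\end{pmatrix}A=\begin{pmatrix}22&32\end{pmatrix},\\
-a_1CA-a_0C&=5\begin{pmatrix}4&6\end{pmatrix}+2\begin{pmatrix}1&1\end{pmatrix}=\begin{pmatrix}22&32\end{pmatrix}.
\end{align*}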
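Therefore the last row of $\mathcal O(C,A)A$ is the linear combination of the rows $C,CA,\dots,CA^{n-1}$ of $\mathcal O(C,A)$ with coefficients $-a_0,-a_1,\dots,-a_{n-1}$, respectively. Since $A_o$ has ones on its superdiagonal, last row $(-a_0,-a_1,\dots,-a_{n-1})$, and zeros elsewhere, the $k$-th row of $A_o\mathcal O(C,A)$ is the $(k+1)$-th row of $\mathcal O(C,A)$ for $k<n$, and its last row is exactly this linear combination. Comparing row by row, we obtain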
\begin{align*}
\mathcal O(C,A)A=A_o\mathcal O(C,A).
\end{align*}
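The identity $\mathcal O(C,A)A=A_o\mathcal O(C,A)$ can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it computes $a_0,\dots,a_{n-1}$ with the Faddeev-LeVerrier recursion (a technique chosen here for convenience, not one used in the text), builds $A_o$ with ones on the superdiagonal and last row $(-a_0,\dots,-a_{n-1})$, and compares both sides in exact rational arithmetic. The helper names and the example matrices are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def observability_matrix(C, A):
    """Hypothetical helper: rows C, CA, ..., CA^{n-1}."""
    rows = [[Fraction(c) for c in C]]
    for _ in range(len(A) - 1):
        rows.append(matmul([rows[-1]], A)[0])
    return rows


def char_poly_coeffs(A):
    """Faddeev-LeVerrier recursion for det(sI - A).

    Returns [a_0, ..., a_{n-1}] with p(s) = s^n + a_{n-1} s^{n-1} + ... + a_0.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]      # c[k] is the coefficient of s^k
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        # c_{n-k} = -tr(A M_k) / k
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c[:n]


def companion(a):
    """A_o: ones on the superdiagonal, last row (-a_0, ..., -a_{n-1})."""
    n = len(a)
    Ao = [[Fraction(int(j == i + 1)) for j in range(n)] for i in range(n)]
    Ao[n - 1] = [-x for x in a]
    return Ao


# Illustrative example (not from the text): n = 3.
A = [[2, 1, 0], [0, 1, 1], [1, 0, 1]]
C = [1, 0, 0]
O = observability_matrix(C, A)
Ao = companion(char_poly_coeffs(A))
print(matmul(O, A) == matmul(Ao, O))  # prints True
```

Because the comparison is made before any inversion, it holds for every $C$, observable or not; observability is needed only to pass from this identity to a similarity transformation.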
Multiplying on the right by $\mathcal O(C,A)^{-1}=P$, and noting that $\mathcal O(C,A)=P^{-1}$, we get
\begin{align*}
P^{-1}AP=A_o.
\end{align*}
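Finally, the first row of $\mathcal O(C,A)$ is $C$, that is, $C=e_1^{\top}\mathcal O(C,A)$, where $e_1^{\top}=(1,0,\dots,0)$ is the first row of $I_n$; this row is exactly $C_o$. Hence
\begin{align*}
CP=C\mathcal O(C,A)^{-1}=e_1^{\top}\mathcal O(C,A)\,\mathcal O(C,A)^{-1}=e_1^{\top}=C_o.
\end{align*}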
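As a concrete check (the pair below is chosen only as an example), take $A=\begin{pmatrix}1&2\\3&4\end{pmatrix}$ and $C=\begin{pmatrix}1&1\end{pmatrix}$, whose characteristic polynomial is $\lambda^2-5\lambda-2$, so $a_1=-5$ and $a_0=-2$. One finds
\begin{align*}
\mathcal O(C,A)&=\begin{pmatrix}1&1\\4&6\end{pmatrix},\qquad P=\begin{pmatrix}3&-\tfrac12\\-2&\tfrac12\end{pmatrix},\\
CP&=\begin{pmatrix}1&0\end{pmatrix},\qquad P^{-1}AP=\begin{pmatrix}0&1\\2&5\end{pmatrix},
\end{align*}
so the output picks out the first coordinate, and the last row of $P^{-1}AP$ is $(-a_0,-a_1)=(2,5)$.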
Thus, under the change of coordinates $x=Pz$, the pair $(C,A)$ becomes $(C_o,A_o)$. This proves that every observable single-output pair with characteristic polynomial $p$ is similar to the canonical pair $(C_o,A_o)$.[/guided]