[proofplan]
We compare the two transfer functions at infinity to obtain equality of all Markov parameters $CA^kB=\hat C\hat A^k\hat B$. Using these identities, we define a [linear map](/page/Linear%20Map) from reachable generators of the first state space to reachable generators of the second state space. Observability proves that this map preserves all linear relations, so it is well-defined, and reachability proves that it is an isomorphism. The intertwining identities for $A$, $B$, and $C$ then follow on reachable generators, and rewriting the isomorphism as $S^{-1}$ gives the required similarity.
[/proofplan]
[step:Extract equality of all Markov parameters from the transfer functions]
Let $X := \mathbb{R}^n$, $\hat X := \mathbb{R}^{\hat n}$, $U := \mathbb{R}^m$, and $Y := \mathbb{R}^p$ denote the state, hatted state, input, and output spaces. Let $I_n: X \to X$ and $I_{\hat n}: \hat X \to \hat X$ denote the identity matrices. Let $\|A\|_{\mathrm{op}}$ denote the operator norm of the linear map $A:X \to X$ induced by the Euclidean norm on $X$, and let $\|\hat A\|_{\mathrm{op}}$ denote the operator norm of the linear map $\hat A:\hat X \to \hat X$ induced by the Euclidean norm on $\hat X$.
For all complex numbers $z$ with $|z| > \max\{\|A\|_{\mathrm{op}},\|\hat A\|_{\mathrm{op}}\}$, the Neumann series for the resolvents gives
\begin{align*}
(zI_n-A)^{-1} = \sum_{k=0}^{\infty} z^{-k-1} A^k
\end{align*}
and
\begin{align*}
(zI_{\hat n}-\hat A)^{-1} = \sum_{k=0}^{\infty} z^{-k-1} \hat A^k.
\end{align*}
The equality of transfer functions holds on the common exterior domain where both resolvents are defined. Substituting these expansions into that equality and cancelling $D$ gives
\begin{align*}
\sum_{k=0}^{\infty} z^{-k-1} CA^kB = \sum_{k=0}^{\infty} z^{-k-1} \hat C\hat A^k\hat B.
\end{align*}
Define $w := z^{-1}$. Then the preceding identity becomes an equality of matrix-valued [power series](/page/Power%20Series) in $w$ on a neighbourhood of $0$:
\begin{align*}
\sum_{k=0}^{\infty} w^{k+1} CA^kB = \sum_{k=0}^{\infty} w^{k+1} \hat C\hat A^k\hat B.
\end{align*}
Applying uniqueness of power-series coefficients entrywise to each scalar component, for every integer $k \geq 0$,
\begin{align*}
CA^kB = \hat C\hat A^k\hat B.
\end{align*}
[/step]
[step:Define the candidate state-space map on reachable generators]
We use the standard meaning of minimality for finite-dimensional realizations: $(A,B)$ and $(\hat A,\hat B)$ are reachable, while $(C,A)$ and $(\hat C,\hat A)$ are observable. Define a linear map first on formal finite sums of reachable generators by sending each vector $A^kBu \in X$, with $k \geq 0$ and $u \in U$, to the vector $\hat A^k\hat Bu \in \hat X$. More explicitly, for a finite index set $F$, integers $k_i \geq 0$, and vectors $u_i \in U$, set
\begin{align*}
T\left(\sum_{i \in F} A^{k_i}Bu_i\right) := \sum_{i \in F} \hat A^{k_i}\hat Bu_i,
\end{align*}
provided this prescription is independent of the chosen representation. We now verify that independence.
Suppose
\begin{align*}
\sum_{i \in F} A^{k_i}Bu_i = 0
\end{align*}
in $X$. Define
\begin{align*}
\hat x := \sum_{i \in F} \hat A^{k_i}\hat Bu_i \in \hat X.
\end{align*}
For every integer $j \geq 0$, using linearity and the Markov-parameter identities from the previous step,
\begin{align*}
\hat C\hat A^j\hat x
=
\sum_{i \in F} \hat C\hat A^{j+k_i}\hat Bu_i.
\end{align*}
Thus
\begin{align*}
\hat C\hat A^j\hat x
=
\sum_{i \in F} CA^{j+k_i}Bu_i.
\end{align*}
Factoring $CA^j$ from the finite sum gives
\begin{align*}
\hat C\hat A^j\hat x = CA^j\left(\sum_{i \in F} A^{k_i}Bu_i\right)=0.
\end{align*}
Since $(\hat C,\hat A)$ is observable, the condition $\hat C\hat A^j\hat x=0$ for all $j \geq 0$ implies $\hat x=0$. Hence every linear relation among the vectors $A^kBu$ is carried to the corresponding linear relation among the vectors $\hat A^k\hat Bu$, so the displayed prescription defines a well-defined linear map
\begin{align*}
T:X \to \hat X.
\end{align*}
[guided]
The possible obstruction is that a vector in $X$ may have many representations as a finite sum of reachable generators. We must prove that the proposed value of $T$ does not depend on which representation is chosen.
So assume that a finite linear combination of reachable generators vanishes in the first realization:
\begin{align*}
\sum_{i \in F} A^{k_i}Bu_i = 0,
\end{align*}
where $F$ is a finite index set, $k_i \geq 0$, and $u_i \in U$. The corresponding vector in the hatted state space is
\begin{align*}
\hat x := \sum_{i \in F} \hat A^{k_i}\hat Bu_i \in \hat X.
\end{align*}
To prove well-definedness, we must show $\hat x=0$.
The only available mechanism for proving that a hatted state vector is zero is observability of $(\hat C,\hat A)$. Therefore we test $\hat x$ through every output map $\hat C\hat A^j$. For any integer $j \geq 0$, linearity gives
\begin{align*}
\hat C\hat A^j\hat x
=
\sum_{i \in F} \hat C\hat A^{j+k_i}\hat Bu_i.
\end{align*}
The Markov-parameter identity says that $\hat C\hat A^\ell \hat B = CA^\ell B$ for every $\ell \geq 0$. Applying it with $\ell=j+k_i$ for each $i \in F$ gives
\begin{align*}
\hat C\hat A^j\hat x
=
\sum_{i \in F} CA^{j+k_i}Bu_i.
\end{align*}
Now $CA^j$ is a fixed linear map from $X$ to $Y$, so it factors through the finite sum:
\begin{align*}
\hat C\hat A^j\hat x
=
CA^j\left(\sum_{i \in F} A^{k_i}Bu_i\right).
\end{align*}
The sum inside parentheses is zero by hypothesis, hence
\begin{align*}
\hat C\hat A^j\hat x = 0.
\end{align*}
This holds for every $j \geq 0$. Observability of $(\hat C,\hat A)$ means precisely that the only vector in $\hat X$ invisible to all maps $\hat C\hat A^j$ is the zero vector. Therefore $\hat x=0$.
Thus every relation among reachable generators in the first realization is preserved in the hatted realization. This proves that the assignment
\begin{align*}
A^kBu \mapsto \hat A^k\hat Bu
\end{align*}
extends to a well-defined linear map $T:X \to \hat X$.
[/guided]
[/step]
[step:Use reachability to prove that the candidate map is an isomorphism]
Because $(A,B)$ is reachable, the span of the set
\begin{align*}
\{A^kBu : k \geq 0,\ u \in U\}
\end{align*}
is all of $X$. Thus the map $T:X \to \hat X$ is defined on all of $X$ by the preceding step.
Because $(\hat A,\hat B)$ is reachable, every vector of $\hat X$ is a finite linear combination of vectors $\hat A^k\hat Bu$. Each such vector is the image under $T$ of $A^kBu$, so $T$ is surjective.
Constructing the same map in the reverse direction gives a well-defined linear map
\begin{align*}
R:\hat X \to X
\end{align*}
such that $R(\hat A^k\hat Bu)=A^kBu$ for all $k \geq 0$ and $u \in U$. On the spanning set of $X$ consisting of reachable generators,
\begin{align*}
(RT)(A^kBu)=R(\hat A^k\hat Bu)=A^kBu.
\end{align*}
Hence $RT=I_X$. Similarly, on the spanning set of $\hat X$ consisting of hatted reachable generators,
\begin{align*}
(TR)(\hat A^k\hat Bu)=T(A^kBu)=\hat A^k\hat Bu,
\end{align*}
so $TR=I_{\hat X}$. Therefore $T$ is invertible, $R=T^{-1}$, and in particular $n=\hat n$.
[/step]
[step:Prove that the isomorphism intertwines the state and input matrices]
For every $u \in U$,
\begin{align*}
TB u = T(A^0Bu)=\hat A^0\hat Bu=\hat Bu.
\end{align*}
Thus, as linear maps $U \to \hat X$,
\begin{align*}
TB=\hat B.
\end{align*}
Next let $x \in X$. Since $(A,B)$ is reachable, there exist a finite index set $F$, integers $k_i \geq 0$, and vectors $u_i \in U$ such that
\begin{align*}
x=\sum_{i \in F} A^{k_i}Bu_i.
\end{align*}
Using the definition of $T$ and linearity,
\begin{align*}
T(Ax)=T\left(\sum_{i \in F} A^{k_i+1}Bu_i\right)=\sum_{i \in F} \hat A^{k_i+1}\hat Bu_i.
\end{align*}
Factoring $\hat A$ gives
\begin{align*}
T(Ax)=\hat A\left(\sum_{i \in F} \hat A^{k_i}\hat Bu_i\right)=\hat A T x.
\end{align*}
Therefore
\begin{align*}
TA=\hat A T.
\end{align*}
[/step]
[step:Prove that the isomorphism intertwines the output matrices]
Let $x \in X$. Choose a finite representation
\begin{align*}
x=\sum_{i \in F} A^{k_i}Bu_i
\end{align*}
using reachability of $(A,B)$. Then, by the definition of $T$ and the Markov-parameter identities,
\begin{align*}
\hat CTx=\hat C\left(\sum_{i \in F} \hat A^{k_i}\hat Bu_i\right)=\sum_{i \in F} \hat C\hat A^{k_i}\hat Bu_i.
\end{align*}
Hence
\begin{align*}
\hat CTx=\sum_{i \in F} CA^{k_i}Bu_i=C\left(\sum_{i \in F} A^{k_i}Bu_i\right)=Cx.
\end{align*}
Since this holds for every $x \in X$,
\begin{align*}
\hat C T = C.
\end{align*}
[/step]
[step:Rewrite the intertwining identities as the desired similarity]
Set
\begin{align*}
S := T^{-1}:\hat X \to X.
\end{align*}
Since $T$ is invertible, $S$ is an invertible matrix in $\mathbb{R}^{n \times n}$ after identifying $n=\hat n$.
From $TA=\hat A T$, multiply on the left by $T^{-1}$ and on the right by $T^{-1}$ in the equivalent form $\hat A = TAT^{-1}$ to obtain
\begin{align*}
\hat A = S^{-1}AS.
\end{align*}
From $TB=\hat B$ and $T=S^{-1}$, we obtain
\begin{align*}
\hat B=S^{-1}B.
\end{align*}
From $\hat C T=C$ and $T=S^{-1}$, multiply on the right by $S$ to obtain
\begin{align*}
\hat C=CS.
\end{align*}
These are exactly the asserted similarity relations.
[/step]