[proofplan]
We first show that any non-reachable state directions can be removed without changing the transfer function, and likewise any non-observable state directions can be removed without changing the transfer function. We also record that the all-powers definitions of reachability and observability agree with the finite tests using powers $0,\dots,n-1$, by reducing higher powers through the characteristic polynomial of $A$. For the converse, equality of transfer functions is interpreted as equality of analytic matrix-valued functions near infinity, so equality of Laurent coefficients gives equality of all Markov parameters. These identities produce an injective intertwining map from a reachable and observable realization into any competing realization after observable reduction, forcing the desired dimension inequality and hence minimality.
[/proofplan]
[step:Delete unreachable state directions without changing the transfer function]
Let
\begin{align*}
\mathcal{R} := \operatorname{span}\{A^kBv : k \geq 0,\ v \in \mathbb{R}^m\} \subseteq \mathbb{R}^n
\end{align*}
be the reachable subspace of $(A,B)$, where $k$ ranges over the non-negative integers. Let
\begin{align*}
p_A(\lambda)=\lambda^n+a_{n-1}\lambda^{n-1}+\cdots+a_0
\end{align*}
be the characteristic polynomial of $A$. The matrix $A$ satisfies its characteristic polynomial, so $p_A(A)=0$, and therefore every power $A^k$ with $k\geq n$ is a linear combination of $I_n,A,\dots,A^{n-1}$ by induction on $k$. Hence
\begin{align*}
\mathcal{R}=\operatorname{span}\{A^kBv : 0\leq k\leq n-1,\ v\in\mathbb{R}^m\}.
\end{align*}
Thus this all-powers definition is equivalent to the finite reachability test in the statement. Since $A(A^kBv)=A^{k+1}Bv$ for every such generator, $A\mathcal{R} \subseteq \mathcal{R}$. Since $Bv=A^0Bv$, we also have $\operatorname{im} B \subseteq \mathcal{R}$. If $\mathcal{R} \neq \mathbb{R}^n$, set $r := \dim \mathcal{R} < n$ and choose a basis of $\mathbb{R}^n$ whose first $r$ vectors form a basis of $\mathcal{R}$. Writing $\mathbb{R}^n=\mathcal{R}\oplus W$ in these coordinates, there are linear maps $A_{11}:\mathcal{R}\to\mathcal{R}$, $A_{12}:W\to\mathcal{R}$, $A_{22}:W\to W$, $B_1:\mathbb{R}^m\to\mathcal{R}$, $C_1:\mathcal{R}\to\mathbb{R}^p$, and $C_2:W\to\mathbb{R}^p$ such that
\begin{align*}
A(x,y)=(A_{11}x+A_{12}y,A_{22}y)
\end{align*}
for $x\in\mathcal{R}$ and $y\in W$,
\begin{align*}
Bv=(B_1v,0)
\end{align*}
for $v\in\mathbb{R}^m$, and
\begin{align*}
C(x,y)=C_1x+C_2y.
\end{align*}
For every complex number $s$ for which $sI_n-A$ is invertible, the block matrix of $sI_n-A$ is upper triangular with diagonal blocks $sI_r-A_{11}$ and $sI_W-A_{22}$. Its determinant is the product of the determinants of these diagonal blocks, so $sI_r-A_{11}$ is invertible. Solving the two block equations for $(sI_n-A)^{-1}Bv$ gives
\begin{align*}
(sI_n-A)^{-1}Bv=((sI_r-A_{11})^{-1}B_1v,0)
\end{align*}
for every $v\in\mathbb{R}^m$. Therefore
\begin{align*}
C(sI_n-A)^{-1}B + D
=
C_1(sI_r-A_{11})^{-1}B_1 + D.
\end{align*}
Thus $(A_{11},B_1,C_1,D)$ is a realization of the same transfer function with smaller state dimension $r<n$. Hence a minimal realization must be reachable.
[guided]
The reachable subspace records exactly the state directions that can be generated from the input by repeated application of the dynamics. We define
\begin{align*}
\mathcal{R} := \operatorname{span}\{A^kBv : k \geq 0,\ v \in \mathbb{R}^m\},
\end{align*}
where $k$ ranges over the non-negative integers. This definition makes invariance under $A$ immediate: applying $A$ to a generator gives $A(A^kBv)=A^{k+1}Bv$, which is again a generator. Hence $A\mathcal{R} \subseteq \mathcal{R}$. Also, since $Bv=A^0Bv$, we have $\operatorname{im}B \subseteq \mathcal{R}$.
Assume $\mathcal{R} \neq \mathbb{R}^n$, and let $r=\dim \mathcal{R}$. Choose a complementary subspace $W\subseteq\mathbb{R}^n$ so that $\mathbb{R}^n=\mathcal{R}\oplus W$. In this decomposition, $A\mathcal{R}\subseteq \mathcal{R}$ means that there are linear maps $A_{11}:\mathcal{R}\to\mathcal{R}$, $A_{12}:W\to\mathcal{R}$, and $A_{22}:W\to W$ with
\begin{align*}
A(x,y)=(A_{11}x+A_{12}y,A_{22}y)
\end{align*}
for $x\in\mathcal{R}$ and $y\in W$. Also, $\operatorname{im}B\subseteq\mathcal{R}$ means that there is a [linear map](/page/Linear%20Map) $B_1:\mathbb{R}^m\to\mathcal{R}$ with
\begin{align*}
Bv=(B_1v,0)
\end{align*}
for $v\in\mathbb{R}^m$, and the output map has the form
\begin{align*}
C(x,y)=C_1x+C_2y
\end{align*}
for linear maps $C_1:\mathcal{R}\to\mathbb{R}^p$ and $C_2:W\to\mathbb{R}^p$.
Now solve the resolvent equation
\begin{align*}
(sI_n-A)x = Bv
\end{align*}
for a vector $v \in \mathbb{R}^m$. Writing $x=(x_1,x_2)$ according to the same direct-sum decomposition, the second component equation is
\begin{align*}
(sI_W-A_{22})x_2 = 0,
\end{align*}
where $I_W:W\to W$ is the identity map. Whenever $sI_n-A$ is invertible, this forces $x_2=0$. The block matrix of $sI_n-A$ is upper triangular with diagonal blocks $sI_{\mathcal{R}}-A_{11}$ and $sI_W-A_{22}$. Since the full block matrix is invertible, the determinant product for a block triangular matrix shows that $sI_{\mathcal{R}}-A_{11}$ is invertible. The first component equation becomes
\begin{align*}
(sI_{\mathcal{R}}-A_{11})x_1 = B_1v,
\end{align*}
where $I_{\mathcal{R}}:\mathcal{R}\to\mathcal{R}$ is the identity map, so
\begin{align*}
x_1=(sI_{\mathcal{R}}-A_{11})^{-1}B_1v.
\end{align*}
Since this holds for every $v$, we have
\begin{align*}
(sI_n-A)^{-1}Bv=((sI_{\mathcal{R}}-A_{11})^{-1}B_1v,0).
\end{align*}
Multiplying by $C$ gives
\begin{align*}
C(sI_n-A)^{-1}B + D
=
C_1(sI_{\mathcal{R}}-A_{11})^{-1}B_1 + D.
\end{align*}
So the unreachable component contributes nothing to the transfer function. Deleting it gives a realization of the same transfer function with dimension $r<n$, contradicting minimality. Therefore a minimal realization must be reachable.
[/guided]
[/step]
[step:Delete unobservable state directions without changing the transfer function]
Let
\begin{align*}
\mathcal{N} := \bigcap_{k \geq 0}\ker(CA^k)
\end{align*}
be the unobservable subspace of $(C,A)$, where $k$ ranges over the non-negative integers. With the characteristic polynomial $p_A(\lambda)=\lambda^n+a_{n-1}\lambda^{n-1}+\cdots+a_0$ as above, the identity $p_A(A)=0$ gives that every $A^k$ with $k\geq n$ is a linear combination of $I_n,A,\dots,A^{n-1}$. Hence
\begin{align*}
\mathcal{N}=\bigcap_{k=0}^{n-1}\ker(CA^k),
\end{align*}
so this all-powers definition is equivalent to the finite observability test in the statement. If $x\in\mathcal{N}$, then $CA^k(Ax)=CA^{k+1}x=0$ for every $k\geq0$, so $A\mathcal{N}\subseteq \mathcal{N}$. Since $\mathcal{N}\subseteq \ker C$, every state in $\mathcal{N}$ has zero direct output. If $\mathcal{N}\neq\{0\}$, set $q:=\dim\mathcal{N}>0$ and choose a basis of $\mathbb{R}^n$ whose last $q$ vectors form a basis of $\mathcal{N}$. Writing $\mathbb{R}^n=V\oplus\mathcal{N}$ in these coordinates, there are linear maps $A_{11}:V\to V$, $A_{21}:V\to\mathcal{N}$, $A_{22}:\mathcal{N}\to\mathcal{N}$, $B_1:\mathbb{R}^m\to V$, $B_2:\mathbb{R}^m\to\mathcal{N}$, and $C_1:V\to\mathbb{R}^p$ such that
\begin{align*}
A(x,y)=(A_{11}x,A_{21}x+A_{22}y)
\end{align*}
for $x\in V$ and $y\in\mathcal{N}$,
\begin{align*}
Bv=(B_1v,B_2v)
\end{align*}
for $v\in\mathbb{R}^m$, and
\begin{align*}
C(x,y)=C_1x.
\end{align*}
For every complex number $s$ for which $sI_n-A$ is invertible, the block matrix of $sI_n-A$ is lower triangular with diagonal blocks $sI_V-A_{11}$ and $sI_{\mathcal{N}}-A_{22}$. Its determinant is the product of the determinants of these diagonal blocks, so $sI_V-A_{11}$ is invertible. The first component of $(sI_n-A)^{-1}Bv$ is therefore $(sI_V-A_{11})^{-1}B_1v$ for every $v\in\mathbb{R}^m$, where $I_V:V\to V$ is the identity map. Hence
\begin{align*}
C(sI_n-A)^{-1}B + D
=
C_1(sI_V-A_{11})^{-1}B_1 + D.
\end{align*}
Thus $(A_{11},B_1,C_1,D)$ is a realization of the same transfer function with state dimension $n-q<n$. Hence a minimal realization must be observable.
[/step]
[step:Extract equality of Markov parameters from equality of transfer functions]
Let $(A,B,C,D)$ be reachable and observable with state dimension $n$. Let $(\widetilde{A},\widetilde{B},\widetilde{C},\widetilde{D})$ be another realization of the same transfer function with state dimension $\ell$. Here equality of transfer functions means equality as $p \times m$ rational matrix-valued functions on their common domain of definition. Equality of transfer functions gives
\begin{align*}
C(sI_n-A)^{-1}B + D
=
\widetilde{C}(sI_\ell-\widetilde{A})^{-1}\widetilde{B}+\widetilde{D}.
\end{align*}
Taking the limit as $|s|\to\infty$ gives $D=\widetilde{D}$. Let $\|A\|_{\mathrm{op}}$ and $\|\widetilde{A}\|_{\mathrm{op}}$ denote the Euclidean operator norms of $A$ and $\widetilde{A}$. For every complex number $s$ with $|s|>\max\{\|A\|_{\mathrm{op}},\|\widetilde{A}\|_{\mathrm{op}}\}$, the operators $s^{-1}A$ and $s^{-1}\widetilde{A}$ have operator norm strictly less than $1$. Hence the geometric series of operators converge absolutely in operator norm, and multiplication by $I_n-s^{-1}A$ and $I_\ell-s^{-1}\widetilde{A}$ gives
\begin{align*}
(sI_n-A)^{-1}=\sum_{k=0}^{\infty}s^{-k-1}A^k.
\end{align*}
Also,
\begin{align*}
(sI_\ell-\widetilde{A})^{-1}=\sum_{k=0}^{\infty}s^{-k-1}\widetilde{A}^k.
\end{align*}
The two transfer functions are rational matrix-valued functions, and the assumed equality holds on a nonempty open subset of the common resolvent domain outside both spectra. Therefore the equality holds as an identity of analytic matrix-valued functions near infinity. Equality of analytic Laurent expansions there is entrywise equality of coefficients, so comparing coefficients gives the Markov parameter identities
\begin{align*}
CA^kB = \widetilde{C}\widetilde{A}^k\widetilde{B}
\end{align*}
for every integer $k\geq 0$.
[/step]
[step:Construct an injective intertwining map from the reachable realization]
Before defining the comparison map, replace $(\widetilde{A},\widetilde{B},\widetilde{C},\widetilde{D})$, if necessary, by its observable reduction from the previous step. This does not change the transfer function and does not increase the dimension, so it is enough to prove the dimension bound for the reduced realization. Thus we may assume that $(\widetilde{C},\widetilde{A})$ is observable.
Define a linear map first on reachable generators by
\begin{align*}
T(A^kBv) := \widetilde{A}^k\widetilde{B}v
\end{align*}
for every integer $k\geq0$ and every $v\in\mathbb{R}^m$, and extend linearly to $\mathbb{R}^n$, which is possible once well-definedness is proved because $(A,B)$ is reachable.
To prove well-definedness, suppose
\begin{align*}
\sum_{j=1}^{N} A^{k_j}Bv_j = 0,
\end{align*}
where $N\in\mathbb{N}$, each $k_j$ is a non-negative integer, and $v_j\in\mathbb{R}^m$. Define
\begin{align*}
z := \sum_{j=1}^{N}\widetilde{A}^{k_j}\widetilde{B}v_j \in \mathbb{R}^{\ell}.
\end{align*}
For every integer $r\geq 0$, the Markov parameter identities give
\begin{align*}
\widetilde{C}\widetilde{A}^r z=\sum_{j=1}^{N}\widetilde{C}\widetilde{A}^{r+k_j}\widetilde{B}v_j.
\end{align*}
Using the Markov parameter identities once more,
\begin{align*}
\sum_{j=1}^{N}\widetilde{C}\widetilde{A}^{r+k_j}\widetilde{B}v_j=\sum_{j=1}^{N}CA^{r+k_j}Bv_j.
\end{align*}
By linearity of $CA^r$,
\begin{align*}
\sum_{j=1}^{N}CA^{r+k_j}Bv_j=CA^r\left(\sum_{j=1}^{N}A^{k_j}Bv_j\right)=0.
\end{align*}
Thus $z$ lies in the unobservable subspace of $(\widetilde{C},\widetilde{A})$. Since the reduced realization is observable, $z=0$. Hence $T$ is well-defined.
We next prove the intertwining identities. The identity $TB=\widetilde{B}$ follows by taking $k=0$ in the definition of $T$. To prove $TA=\widetilde{A}T$, it is enough by reachability to check it on vectors $A^kBv$ with $k\geq0$. For such a generator, the definition gives
\begin{align*}
TA(A^kBv)=T(A^{k+1}Bv)=\widetilde{A}^{k+1}\widetilde{B}v=\widetilde{A}T(A^kBv).
\end{align*}
This proves $TA=\widetilde{A}T$ on all of $\mathbb{R}^n$. Finally, for every generator $A^kBv$ with $k\geq0$, the Markov parameter identity gives
\begin{align*}
C(A^kBv)=\widetilde{C}\widetilde{A}^k\widetilde{B}v=\widetilde{C}T(A^kBv),
\end{align*}
so $C=\widetilde{C}T$ by reachability.
If $x\in\ker T$, then applying $C=\widetilde{C}T$ and $TA=\widetilde{A}T$ repeatedly gives, for every integer $r\geq0$,
\begin{align*}
CA^r x=\widetilde{C}TA^r x=\widetilde{C}\widetilde{A}^rTx=0.
\end{align*}
Since $(C,A)$ is observable, $x=0$. Hence $T$ is injective, so $n\leq \ell$.
[guided]
The delicate point is that $T$ is defined from a spanning set rather than from a basis, so relations among reachable generators must be respected. We first remove any unobservable part of the competing realization. This is legitimate because the previous reduction preserves the transfer function and cannot increase the state dimension; after this replacement, $(\widetilde{C},\widetilde{A})$ is observable.
Define $T$ on generators by
\begin{align*}
T(A^kBv)=\widetilde{A}^k\widetilde{B}v
\end{align*}
for every integer $k\geq0$ and every $v\in\mathbb{R}^m$. To check that this rule is independent of the chosen representation, assume a finite relation
\begin{align*}
\sum_{j=1}^{N} A^{k_j}Bv_j = 0,
\end{align*}
where $N\in\mathbb{N}$, each $k_j$ is a non-negative integer, and $v_j\in\mathbb{R}^m$. Let
\begin{align*}
z=\sum_{j=1}^{N}\widetilde{A}^{k_j}\widetilde{B}v_j.
\end{align*}
For every integer $r\geq0$, equality of Markov parameters gives
\begin{align*}
\widetilde{C}\widetilde{A}^r z=\sum_{j=1}^{N}\widetilde{C}\widetilde{A}^{r+k_j}\widetilde{B}v_j.
\end{align*}
The same Markov parameter identity changes each summand into the corresponding summand for the original realization:
\begin{align*}
\sum_{j=1}^{N}\widetilde{C}\widetilde{A}^{r+k_j}\widetilde{B}v_j=\sum_{j=1}^{N}CA^{r+k_j}Bv_j.
\end{align*}
Factoring out $CA^r$ from the finite sum gives
\begin{align*}
\sum_{j=1}^{N}CA^{r+k_j}Bv_j=CA^r\left(\sum_{j=1}^{N}A^{k_j}Bv_j\right)=0.
\end{align*}
Thus $\widetilde{C}\widetilde{A}^r z=0$ for every $r\geq0$. Since $(\widetilde{C},\widetilde{A})$ is observable, the only such vector is $0$, so $z=0$. This proves that the proposed rule for $T$ respects every linear relation among the generators, hence defines a linear map $T:\mathbb{R}^n\to\mathbb{R}^\ell$ because $(A,B)$ is reachable.
Now we justify the intertwining identity rather than merely naming it. The equality $TB=\widetilde{B}$ follows from the definition with $k=0$. For $TA=\widetilde{A}T$, take a generator $A^kBv$ with $k\geq0$. Then
\begin{align*}
TA(A^kBv)=T(A^{k+1}Bv)=\widetilde{A}^{k+1}\widetilde{B}v=\widetilde{A}T(A^kBv).
\end{align*}
Reachability extends this identity from generators to every vector in $\mathbb{R}^n$.
Finally, on each generator the Markov parameter identity at exponent $k$ gives
\begin{align*}
C(A^kBv)=\widetilde{C}\widetilde{A}^k\widetilde{B}v=\widetilde{C}T(A^kBv),
\end{align*}
so $C=\widetilde{C}T$ on all of $\mathbb{R}^n$. If $x\in\ker T$, then the two identities just proved imply
\begin{align*}
CA^r x=\widetilde{C}TA^r x=\widetilde{C}\widetilde{A}^rTx=0
\end{align*}
for every integer $r\geq0$. Observability of $(C,A)$ forces $x=0$. Hence $T$ is injective, and an injective linear map $T:\mathbb{R}^n\to\mathbb{R}^\ell$ gives $n\leq\ell$.
[/guided]
[/step]
[step:Conclude minimality and identify the McMillan degree]
The preceding step shows that every realization of the same transfer function has, after deleting its unobservable part if necessary, state dimension at least $n$. Since deletion cannot increase dimension, the original competing realization also has dimension at least $n$. Therefore a reachable and observable realization is minimal.
Combining this with the first two steps proves that a finite-dimensional realization is minimal if and only if it is reachable and observable. Since all minimal realizations of a fixed transfer function have the same least possible state dimension, this common dimension is exactly the McMillan degree of the transfer function. Thus, for a reachable and observable realization $(A,B,C,D)$, the state dimension $n$ equals the McMillan degree of its transfer function.
[/step]