[proofplan]
We prove both directions by comparing transfer functions through their Markov parameters $CA^kB$. If the realization is not reachable, we restrict the dynamics to the reachable subspace; if it is not observable, we pass to the quotient by the unobservable subspace. In either case the state dimension drops while the transfer function is unchanged. Conversely, for a reachable and observable realization, the infinite Hankel map built from the Markov parameters factors as the observability map composed with the reachability map, so its rank equals the state dimension $n$. Any competing realization factors the same Hankel map through its own state space, which forces its state dimension to be at least $n$.
[/proofplan]
[step:Show that equality of transfer functions is equality of Markov parameters]
Let $X := \mathbb{R}^n$, $U := \mathbb{R}^m$, and $Y := \mathbb{R}^p$. We regard $A: X \to X$, $B: U \to X$, $C: X \to Y$, and $D: U \to Y$ as linear maps represented by the given matrices. When evaluating the transfer function at a complex parameter $\lambda \in \mathbb{C}$, we extend these real linear maps complex-linearly to $X_{\mathbb{C}} := X \otimes_{\mathbb{R}} \mathbb{C}$, $U_{\mathbb{C}} := U \otimes_{\mathbb{R}} \mathbb{C}$, and $Y_{\mathbb{C}} := Y \otimes_{\mathbb{R}} \mathbb{C}$. Define the transfer function as the map
\begin{align*}
G_{A,B,C,D}: \{\lambda \in \mathbb{C} : \lambda I_n-A \text{ is invertible}\} \to \mathcal{L}(U_{\mathbb{C}},Y_{\mathbb{C}}).
\end{align*}
For each $\lambda$ in its domain, this map is given by
\begin{align*}
G_{A,B,C,D}(\lambda)=D+C(\lambda I_n-A)^{-1}B.
\end{align*}
The Markov parameters of the realization are the linear maps $CA^kB: U \to Y$ for $k \in \mathbb{N}\cup\{0\}$.
For $|\lambda|>\rho(A)$, where $\rho(A)$ denotes the spectral radius of $A$, set $w := \lambda^{-1}$. Then the spectral radius of $wA$ is $|w|\rho(A)<1$. Hence $I_n-wA$ is invertible and the Neumann series converges in operator norm to its inverse:
\begin{align*}
(I_n-wA)^{-1}=\sum_{k=0}^{\infty} w^k A^k.
\end{align*}
Since $\lambda I_n-A=\lambda(I_n-wA)$, it follows that
\begin{align*}
(\lambda I_n-A)^{-1}=w(I_n-wA)^{-1}
\end{align*}
and hence
\begin{align*}
G_{A,B,C,D}(\lambda)=D+\sum_{k=0}^{\infty} CA^kB\,\lambda^{-k-1}.
\end{align*}
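The expansion can be checked numerically. The following is a minimal sketch in plain Python, assuming exact rational arithmetic via the standard `fractions` module; the helper names (`matmul`, `solve`, `transfer`, `laurent_partial_sum`) and the example realization, a companion form of $1/((\lambda+1)(\lambda+2))$ with $\rho(A)=2$, are illustrative assumptions rather than part of the proof. It evaluates $G_{A,B,C,D}(\lambda)$ directly at $\lambda=10$ and compares it with the truncated series $D+\sum_{k<K} CA^kB\,\lambda^{-k-1}$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Exact product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, R):
    """Solve M Z = R exactly by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i]] + [Fraction(v) for v in R[i]] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def transfer(A, B, C, D, lam):
    """G(lam) = D + C (lam I - A)^{-1} B, evaluated exactly at a rational lam."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    CX = matmul(C, solve(M, B))
    return [[D[i][j] + CX[i][j] for j in range(len(D[0]))] for i in range(len(D))]

def laurent_partial_sum(A, B, C, D, lam, K):
    """D + sum_{k=0}^{K-1} C A^k B lam^{-k-1}: the expansion at infinity, truncated."""
    total = [[Fraction(v) for v in row] for row in D]
    AkB = B
    for k in range(K):
        CAkB = matmul(C, AkB)  # Markov parameter C A^k B
        total = [[total[i][j] + CAkB[i][j] / lam ** (k + 1)
                  for j in range(len(total[0]))] for i in range(len(total))]
        AkB = matmul(A, AkB)
    return total

# Hypothetical example: the eigenvalues of A are -1 and -2, so rho(A) = 2 < |lam| = 10.
A = [[0, 1], [-2, -3]]
B = [[0], [1]]
C = [[1, 0]]
D = [[0]]
lam = Fraction(10)

G = transfer(A, B, C, D, lam)
approx = laurent_partial_sum(A, B, C, D, lam, 40)
print(G[0][0])                        # 1/132
print(float(G[0][0] - approx[0][0]))  # truncation error of the 40-term sum
```

Since $|\lambda|=10$ exceeds $\rho(A)=2$, the tail after $K$ terms shrinks roughly like $(2/10)^K$, so the printed difference is negligible.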
By [uniqueness of Laurent series coefficients](/page/Laurent%20Series) for matrix-valued holomorphic functions on a punctured neighbourhood of $\lambda=\infty$, applied entrywise to the expansions above, if two finite-dimensional realizations have the same transfer function, then their Laurent expansions at $\lambda=\infty$ have the same coefficients. The constant coefficient is the direct term, and the coefficient of $\lambda^{-k-1}$ is the Markov parameter $CA^kB$. Hence, if
\begin{align*}
(\hat A,\hat B,\hat C,\hat D) \in \mathbb{R}^{\hat n \times \hat n} \times \mathbb{R}^{\hat n \times m} \times \mathbb{R}^{p \times \hat n} \times \mathbb{R}^{p \times m}
\end{align*}
has the same transfer function as $(A,B,C,D)$, then
\begin{align*}
D=\hat D
\end{align*}
and, for every $k \in \mathbb{N}\cup\{0\}$,
\begin{align*}
CA^kB=\hat C\hat A^k\hat B.
\end{align*}
Conversely, suppose two realizations have equal direct terms and equal Markov parameters $CA^kB$ for every $k \in \mathbb{N}\cup\{0\}$. Then the displayed Laurent expansions agree for all sufficiently large $|\lambda|$. By Cramer's rule, each entry of $(\lambda I_n-A)^{-1}$ is a rational function of $\lambda$ with denominator $\det(\lambda I_n-A)$, so each transfer function entry is rational and holomorphic away from the finitely many eigenvalues of the state matrix. The common pole-free domain of the two transfer functions is therefore the complement of a finite subset of $\mathbb{C}$; it is connected and contains all sufficiently large $|\lambda|$. By the identity theorem applied entrywise, the two transfer functions agree on the whole common domain.
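To illustrate the equivalence, the sketch below compares two different-looking realizations of $1/((\lambda+1)(\lambda+2))$, a companion form and a diagonal partial-fraction form, by their direct terms and Markov parameters. It is plain Python; the matrices and the helper names `matmul` and `markov_parameters` are hypothetical choices for illustration.

```python
def matmul(X, Y):
    """Product of two integer matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def markov_parameters(A, B, C, K):
    """Return the list [C B, C A B, ..., C A^(K-1) B]."""
    params, AkB = [], B
    for _ in range(K):
        params.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return params

# Companion-form realization of 1/((lam + 1)(lam + 2))  (hypothetical example).
A, B, C, D = [[0, 1], [-2, -3]], [[0], [1]], [[1, 0]], [[0]]
# Diagonal realization of the partial fractions 1/(lam + 1) - 1/(lam + 2).
Ah, Bh, Ch, Dh = [[-1, 0], [0, -2]], [[1], [1]], [[1, -1]], [[0]]

K = len(A) + len(Ah)  # n + n_hat Markov parameters
same = D == Dh and markov_parameters(A, B, C, K) == markov_parameters(Ah, Bh, Ch, K)
print(same)  # True
```

Checking the first $n+\hat n$ Markov parameters is enough: the difference $CA^kB-\hat C\hat A^k\hat B$ is the $k$th Markov parameter of the block-diagonal realization with state matrix $\operatorname{diag}(A,\hat A)$, input matrix obtained by stacking $B$ over $\hat B$, and output matrix $[C,\ -\hat C]$, so by the Cayley-Hamilton theorem it satisfies a linear recurrence of order $n+\hat n$.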
[/step]
[step:Reduce a nonreachable realization to its reachable subspace]
Define the reachable subspace
\begin{align*}
\mathcal{R}(A,B) := \operatorname{span}\{A^kBu : k \in \mathbb{N}\cup\{0\},\ u \in U\} \subset X.
\end{align*}
Assume $\mathcal{R}(A,B) \neq X$. Let $r := \dim \mathcal{R}(A,B)$, so $r<n$. The subspace $\mathcal{R}(A,B)$ is $A$-invariant because, for every generator $A^kBu$ with $k \in \mathbb{N}\cup\{0\}$ and $u \in U$,
\begin{align*}
A(A^kBu)=A^{k+1}Bu \in \mathcal{R}(A,B).
\end{align*}
It also contains $B(U)$ by taking $k=0$.
Define the restricted linear maps
\begin{align*}
A_{\mathcal{R}}: \mathcal{R}(A,B) \to \mathcal{R}(A,B), \qquad A_{\mathcal{R}}x := Ax,
\end{align*}
\begin{align*}
B_{\mathcal{R}}: U \to \mathcal{R}(A,B), \qquad B_{\mathcal{R}}u := Bu,
\end{align*}
and
\begin{align*}
C_{\mathcal{R}}: \mathcal{R}(A,B) \to Y, \qquad C_{\mathcal{R}}x := Cx.
\end{align*}
For every $k \in \mathbb{N}\cup\{0\}$ and $u \in U$, induction on $k$ gives
\begin{align*}
A_{\mathcal{R}}^kB_{\mathcal{R}}u=A^kBu.
\end{align*}
Applying $C_{\mathcal{R}}$ gives
\begin{align*}
C_{\mathcal{R}}A_{\mathcal{R}}^kB_{\mathcal{R}}u=CA^kBu.
\end{align*}
Choose a basis of the finite-dimensional space $\mathcal{R}(A,B)$ and represent $A_{\mathcal{R}}$, $B_{\mathcal{R}}$, and $C_{\mathcal{R}}$ by matrices with respect to that basis and the fixed bases of $U$ and $Y$. Since $C_{\mathcal{R}}A_{\mathcal{R}}^kB_{\mathcal{R}}$ and $CA^kB$ are the same linear map $U\to Y$, the reduced realization $(A_{\mathcal{R}},B_{\mathcal{R}},C_{\mathcal{R}},D)$ has the same Markov parameters and the same direct term $D$ as $(A,B,C,D)$. By the Laurent-expansion converse proved in the first step, the two transfer functions agree. The reduced realization has state dimension $r<n$, so the original realization is not minimal.
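The reduction can be carried out in exact arithmetic. This sketch assumes, as a standard consequence of the Cayley-Hamilton theorem, that the vectors $A^kBu$ with $k\le n-1$ already span $\mathcal{R}(A,B)$. It takes a basis $V$ of $\mathcal{R}(A,B)$ from independent columns of $[B, AB, \dots, A^{n-1}B]$ and solves $AV=VA_{\mathcal{R}}$ and $B=VB_{\mathcal{R}}$ for the coordinate matrices. All function names and the three-state example, in which the input never excites the third state, are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Exact product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def pivot_columns(M):
    """Indices of columns of M that form a basis of its column space."""
    R = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return pivots

def solve(M, R):
    """Solve M Z = R exactly by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i]] + [Fraction(v) for v in R[i]] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def coords(V, W):
    """Z with V Z = W, for V of full column rank and W in its column span."""
    Vt = transpose(V)
    return solve(matmul(Vt, V), matmul(Vt, W))

def markov_parameters(A, B, C, K):
    params, AkB = [], B
    for _ in range(K):
        params.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return params

def reachable_reduction(A, B, C):
    n = len(A)
    ctrb, AkB = [[] for _ in range(n)], B
    for _ in range(n):  # k <= n - 1 suffices, by Cayley-Hamilton
        ctrb = [row + new for row, new in zip(ctrb, AkB)]
        AkB = matmul(A, AkB)
    piv = pivot_columns(ctrb)
    V = [[row[j] for j in piv] for row in ctrb]  # columns: a basis of R(A, B)
    A_R = coords(V, matmul(A, V))  # A V = V A_R, since R(A, B) is A-invariant
    B_R = coords(V, B)             # B = V B_R, since B(U) lies in R(A, B)
    C_R = matmul(C, V)             # C restricted to R(A, B)
    return A_R, B_R, C_R

# Hypothetical non-reachable realization: the input never excites the third state.
A = [[0, 1, 0], [-2, -3, 0], [0, 0, -5]]
B = [[0], [1], [0]]
C = [[1, 0, 1]]
A_R, B_R, C_R = reachable_reduction(A, B, C)
print(len(A_R))  # 2
print(markov_parameters(A_R, B_R, C_R, 10) == markov_parameters(A, B, C, 10))  # True
```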
[/step]
[step:Reduce a nonobservable realization by quotienting the unobservable subspace]
Define the unobservable subspace
\begin{align*}
\mathcal{N}(C,A) := \{x \in X : CA^k x=0 \text{ for every } k \in \mathbb{N}\cup\{0\}\} \subset X.
\end{align*}
Assume $\mathcal{N}(C,A) \neq \{0\}$. Let $q: X \to X/\mathcal{N}(C,A)$ be the quotient map. The subspace $\mathcal{N}(C,A)$ is $A$-invariant: if $x \in \mathcal{N}(C,A)$, then for every $k \in \mathbb{N}\cup\{0\}$,
\begin{align*}
CA^k(Ax)=CA^{k+1}x=0.
\end{align*}
Therefore the induced [linear map](/page/Linear%20Map)
\begin{align*}
A_q: X/\mathcal{N}(C,A) \to X/\mathcal{N}(C,A), \qquad A_q(qx) := q(Ax)
\end{align*}
is well-defined.
Define
\begin{align*}
B_q: U \to X/\mathcal{N}(C,A), \qquad B_q u := q(Bu).
\end{align*}
Since $\mathcal{N}(C,A) \subseteq \ker C$ (take $k=0$), the map
\begin{align*}
C_q: X/\mathcal{N}(C,A) \to Y, \qquad C_q(qx) := Cx
\end{align*}
is well-defined: if $qx=qx'$, then $x-x' \in \mathcal{N}(C,A) \subseteq \ker C$, so $Cx=Cx'$.
For every $k \in \mathbb{N}\cup\{0\}$ and $u \in U$, induction gives
\begin{align*}
A_q^kB_q u=q(A^kBu).
\end{align*}
Applying $C_q$ gives
\begin{align*}
C_qA_q^kB_q u=CA^kBu.
\end{align*}
Choose a basis of the finite-dimensional quotient space $X/\mathcal{N}(C,A)$ and represent $A_q$, $B_q$, and $C_q$ by matrices with respect to that basis and the fixed bases of $U$ and $Y$. Since $C_qA_q^kB_q$ and $CA^kB$ are the same linear map $U\to Y$, the quotient realization $(A_q,B_q,C_q,D)$ has the same Markov parameters and the same direct term $D$ as $(A,B,C,D)$. By the Laurent-expansion converse proved in the first step, the two transfer functions agree. The quotient realization has state dimension
\begin{align*}
\dim(X/\mathcal{N}(C,A))=n-\dim\mathcal{N}(C,A)<n.
\end{align*}
So the original realization is not minimal.
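A similar exact sketch works for the quotient. Here $X/\mathcal{N}(C,A)$ is given coordinates through $q(x)=Wx$, where the rows of $W$ form a basis of the row space of the matrix obtained by stacking $C, CA, \dots, CA^{n-1}$, so that $\ker W=\mathcal{N}(C,A)$ (again using Cayley-Hamilton to stop at $n-1$). Then $A_q$ and $C_q$ are the unique matrices with $A_qW=WA$ and $C_qW=C$, and $B_q=WB$. The choice of $W$, the helper names, and the example, in which the third state is driven by the input but never seen at the output, are illustrative assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Exact product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def pivot_columns(M):
    """Indices of columns of M that form a basis of its column space."""
    R = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return pivots

def solve(M, R):
    """Solve M Z = R exactly by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i]] + [Fraction(v) for v in R[i]] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def coords(V, W):
    """Z with V Z = W, for V of full column rank and W in its column span."""
    Vt = transpose(V)
    return solve(matmul(Vt, V), matmul(Vt, W))

def markov_parameters(A, B, C, K):
    params, AkB = [], B
    for _ in range(K):
        params.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return params

def observable_quotient(A, B, C):
    n = len(A)
    obs, CAk = [], C
    for _ in range(n):  # k <= n - 1 suffices, by Cayley-Hamilton
        obs.extend(CAk)
        CAk = matmul(CAk, A)
    W = [obs[i] for i in pivot_columns(transpose(obs))]  # independent rows of obs
    Wt = transpose(W)
    # ker W = ker(obs) = N(C, A), so x -> W x realizes the quotient map q.
    A_q = transpose(coords(Wt, matmul(transpose(A), Wt)))  # A_q W = W A
    B_q = matmul(W, B)                                      # B_q u = q(B u)
    C_q = transpose(coords(Wt, transpose(C)))               # C_q W = C
    return A_q, B_q, C_q

# Hypothetical non-observable realization: the third state never reaches the output.
A = [[0, 1, 0], [-2, -3, 0], [0, 0, -5]]
B = [[0], [1], [1]]
C = [[1, 0, 0]]
A_q, B_q, C_q = observable_quotient(A, B, C)
print(len(A_q))  # 2
print(markov_parameters(A_q, B_q, C_q, 10) == markov_parameters(A, B, C, 10))  # True
```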
[/step]
[step:Factor the Hankel map through the state space]
Assume now that $\mathcal{R}(A,B)=X$ and $\mathcal{N}(C,A)=\{0\}$. Let
\begin{align*}
E := \{(u_j)_{j=0}^{\infty} : u_j \in U \text{ for every } j,\ u_j=0 \text{ for all but finitely many } j\}
\end{align*}
be the [vector space](/page/Vector%20Space) of finitely supported input sequences. Let
\begin{align*}
S := \{(y_i)_{i=0}^{\infty} : y_i \in Y \text{ for every } i\}
\end{align*}
be the vector space of output sequences.
Define the reachability map
\begin{align*}
\mathscr{R}: E \to X, \qquad \mathscr{R}((u_j)_{j=0}^{\infty}) := \sum_{j=0}^{\infty} A^jBu_j.
\end{align*}
The sum is finite because $(u_j)_{j=0}^{\infty}$ has finite support. The image of $\mathscr{R}$ is the span of all vectors $A^jBu$, which is $\mathcal{R}(A,B)=X$; hence $\mathscr{R}$ is surjective.
Define the observability map
\begin{align*}
\mathscr{O}: X \to S, \qquad \mathscr{O}(x) := (CA^i x)_{i=0}^{\infty}.
\end{align*}
Its kernel is $\mathcal{N}(C,A)=\{0\}$, so $\mathscr{O}$ is injective.
Define the Hankel map
\begin{align*}
\mathscr{H}: E \to S, \qquad \mathscr{H} := \mathscr{O}\mathscr{R}.
\end{align*}
For $u=(u_j)_{j=0}^{\infty} \in E$, the $i$th component of $\mathscr{H}(u)$ is
\begin{align*}
(\mathscr{H}(u))_i=\sum_{j=0}^{\infty} CA^{i+j}Bu_j.
\end{align*}
Thus $\mathscr{H}$ is exactly the block Hankel map with block entries $CA^{i+j}B$. Since $\mathscr{R}$ is surjective and $\mathscr{O}$ is injective,
\begin{align*}
\operatorname{rank}\mathscr{H}=\dim \mathscr{O}(X)=\dim X=n.
\end{align*}
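A finite section of the Hankel map can be formed and its rank computed exactly. The sketch below builds the $K\times K$ block section with blocks $CA^{i+j}B$ for a hypothetical reachable and observable realization with $n=2$; the helper names `rank`, `markov_parameters`, and `block_hankel` are assumptions for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Exact product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of M by exact forward elimination."""
    R = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r

def markov_parameters(A, B, C, K):
    params, AkB = [], B
    for _ in range(K):
        params.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return params

def block_hankel(A, B, C, K):
    """K x K block section of the Hankel map; block (i, j) is C A^(i+j) B."""
    h = markov_parameters(A, B, C, 2 * K - 1)
    p, m = len(C), len(B[0])
    return [[h[i + j][a][b] for j in range(K) for b in range(m)]
            for i in range(K) for a in range(p)]

# Reachable and observable companion realization with n = 2 (hypothetical example).
A, B, C = [[0, 1], [-2, -3]], [[0], [1]], [[1, 0]]
print([rank(block_hankel(A, B, C, K)) for K in range(1, 6)])  # [0, 2, 2, 2, 2]
```

The rank stays at $n=2$ once $K\ge n$, because the first $K$ block rows of $\mathscr{O}$ and the first $K$ block columns of $\mathscr{R}$ then already have full rank $n$; the $1\times 1$ section is too small to detect this.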
[guided]
The purpose of this step is to turn reachability and observability into one rank statement. We define the input-sequence space
\begin{align*}
E := \{(u_j)_{j=0}^{\infty} : u_j \in U \text{ for every } j,\ u_j=0 \text{ for all but finitely many } j\}
\end{align*}
and the output-sequence space
\begin{align*}
S := \{(y_i)_{i=0}^{\infty} : y_i \in Y \text{ for every } i\}.
\end{align*}
The finite-support condition on $E$ ensures that all sums below are finite algebraic sums, so no convergence issue is involved.
Define
\begin{align*}
\mathscr{R}: E \to X, \qquad \mathscr{R}((u_j)_{j=0}^{\infty}) := \sum_{j=0}^{\infty} A^jBu_j.
\end{align*}
This map records all state directions produced by finitely many input impulses. Its range is precisely
\begin{align*}
\operatorname{span}\{A^jBu : j \in \mathbb{N}\cup\{0\},\ u \in U\}=\mathcal{R}(A,B).
\end{align*}
Since the realization is reachable, $\mathcal{R}(A,B)=X$, so $\mathscr{R}$ is surjective.
Next define
\begin{align*}
\mathscr{O}: X \to S, \qquad \mathscr{O}(x) := (CA^i x)_{i=0}^{\infty}.
\end{align*}
This map sends an initial state $x$ to its entire future output history under zero input. Its kernel is
\begin{align*}
\ker \mathscr{O}=\{x \in X : CA^i x=0 \text{ for every } i \in \mathbb{N}\cup\{0\}\}=\mathcal{N}(C,A).
\end{align*}
Since the realization is observable, $\mathcal{N}(C,A)=\{0\}$, so $\mathscr{O}$ is injective.
Now define
\begin{align*}
\mathscr{H}: E \to S, \qquad \mathscr{H}:=\mathscr{O}\mathscr{R}.
\end{align*}
For $u=(u_j)_{j=0}^{\infty}\in E$, the $i$th component of $\mathscr{H}(u)$ is
\begin{align*}
(\mathscr{H}(u))_i=C A^i\left(\sum_{j=0}^{\infty} A^jBu_j\right).
\end{align*}
By linearity of $A^i$ and $C$, this becomes
\begin{align*}
(\mathscr{H}(u))_i=\sum_{j=0}^{\infty} CA^{i+j}Bu_j.
\end{align*}
Thus the matrix of $\mathscr{H}$ is the infinite block Hankel matrix whose $(i,j)$ block is $CA^{i+j}B$.
Because $\mathscr{R}$ is surjective, the image of $\mathscr{H}=\mathscr{O}\mathscr{R}$ is exactly $\mathscr{O}(X)$. Because $\mathscr{O}$ is injective, $\dim \mathscr{O}(X)=\dim X=n$. Therefore
\begin{align*}
\operatorname{rank}\mathscr{H}=n.
\end{align*}
This is the core minimality mechanism: reachability makes every state in $X$ available as $\mathscr{R}(u)$ for some input sequence $u$, and observability prevents two distinct states from producing the same output history, so $\mathscr{O}$ loses no dimension.
[/guided]
[/step]
[step:Compare every competing realization through the same Hankel map]
Let
\begin{align*}
(\hat A,\hat B,\hat C,\hat D) \in \mathbb{R}^{\hat n \times \hat n} \times \mathbb{R}^{\hat n \times m} \times \mathbb{R}^{p \times \hat n} \times \mathbb{R}^{p \times m}
\end{align*}
be any realization with the same transfer function as $(A,B,C,D)$. Let $\hat X := \mathbb{R}^{\hat n}$. Define
\begin{align*}
\widehat{\mathscr{R}}: E \to \hat X, \qquad \widehat{\mathscr{R}}((u_j)_{j=0}^{\infty}) := \sum_{j=0}^{\infty} \hat A^j\hat B u_j
\end{align*}
and
\begin{align*}
\widehat{\mathscr{O}}: \hat X \to S, \qquad \widehat{\mathscr{O}}(\hat x) := (\hat C\hat A^i\hat x)_{i=0}^{\infty}.
\end{align*}
The first step gives
\begin{align*}
CA^kB=\hat C\hat A^k\hat B
\end{align*}
for every $k \in \mathbb{N}\cup\{0\}$. Therefore, for every $u=(u_j)_{j=0}^{\infty}\in E$ and every $i \in \mathbb{N}\cup\{0\}$,
\begin{align*}
(\mathscr{H}(u))_i=\sum_{j=0}^{\infty} CA^{i+j}Bu_j=\sum_{j=0}^{\infty} \hat C\hat A^{i+j}\hat B u_j=(\widehat{\mathscr{O}}\widehat{\mathscr{R}}(u))_i.
\end{align*}
Hence
\begin{align*}
\mathscr{H}=\widehat{\mathscr{O}}\widehat{\mathscr{R}}.
\end{align*}
Since $\mathscr{H}=\widehat{\mathscr{O}}\widehat{\mathscr{R}}$, the image of $\mathscr{H}$ is contained in $\widehat{\mathscr{O}}(\hat X)$, which has dimension at most $\dim \hat X=\hat n$. Hence
\begin{align*}
\operatorname{rank}\mathscr{H}\leq \hat n.
\end{align*}
From the previous step, $\operatorname{rank}\mathscr{H}=n$. Therefore
\begin{align*}
n\leq \hat n.
\end{align*}
Thus no realization with the same transfer function has state dimension smaller than $n$.
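The bound $\operatorname{rank}\mathscr{H}\le\hat n$ can be seen concretely on a competing realization. In the sketch below (hypothetical matrices and helper names), a three-state realization with an unreachable extra state has the same Markov parameters as the two-state companion realization. Its finite Hankel section is computed as the product of the first $K$ block rows of $\widehat{\mathscr{O}}$ and the first $K$ block columns of $\widehat{\mathscr{R}}$, so it factors through $\mathbb{R}^3$, and it coincides with the section built from the minimal realization, whose rank is $2\le 3$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Exact product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of M by exact forward elimination."""
    R = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r

def markov_parameters(A, B, C, K):
    params, AkB = [], B
    for _ in range(K):
        params.append(matmul(C, AkB))
        AkB = matmul(A, AkB)
    return params

def block_hankel(A, B, C, K):
    """K x K block section of the Hankel map; block (i, j) is C A^(i+j) B."""
    h = markov_parameters(A, B, C, 2 * K - 1)
    p, m = len(C), len(B[0])
    return [[h[i + j][a][b] for j in range(K) for b in range(m)]
            for i in range(K) for a in range(p)]

def reachability(A, B, K):
    """[B, AB, ..., A^(K-1) B]: first K block columns of the reachability map."""
    rows, AkB = [[] for _ in A], B
    for _ in range(K):
        rows = [row + new for row, new in zip(rows, AkB)]
        AkB = matmul(A, AkB)
    return rows

def observability(A, C, K):
    """C, CA, ..., C A^(K-1) stacked: first K block rows of the observability map."""
    rows, CAk = [], C
    for _ in range(K):
        rows.extend(CAk)
        CAk = matmul(CAk, A)
    return rows

# Minimal realization (n = 2) and a non-minimal one (n_hat = 3) of the same transfer function.
A, B, C = [[0, 1], [-2, -3]], [[0], [1]], [[1, 0]]
Ah = [[0, 1, 0], [-2, -3, 0], [0, 0, -5]]
Bh = [[0], [1], [0]]
Ch = [[1, 0, 1]]

K = 4
H = block_hankel(A, B, C, K)
H_hat = matmul(observability(Ah, Ch, K), reachability(Ah, Bh, K))  # factors through R^3
print(H == H_hat, rank(H), len(Ah))  # True 2 3
```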
[/step]
[step:Conclude the equivalence]
If $(A,B,C,D)$ is not reachable or not observable, the preceding reduction steps construct a realization of the same transfer function with strictly smaller state dimension, so $(A,B,C,D)$ is not minimal.
Conversely, if $\mathcal{R}(A,B)=\mathbb{R}^n$ and $\mathcal{N}(C,A)=\{0\}$, then every realization of the same transfer function has state dimension at least $n$. Hence $(A,B,C,D)$ is minimal. This proves the equivalence.
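Combining the two conditions gives a finite test. The sketch below assumes the standard Kalman rank criterion: reachability is equivalent to $\operatorname{rank}[B, AB, \dots, A^{n-1}B]=n$, and observability to the matrix obtained by stacking $C, CA, \dots, CA^{n-1}$ having rank $n$, both of which follow from the Cayley-Hamilton theorem. The function `is_minimal` and the example matrices are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Exact product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank of M by exact forward elimination."""
    R = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r

def is_minimal(A, B, C):
    """Reachable and observable, tested by the two Kalman rank conditions."""
    n = len(A)
    ctrb, AkB = [[] for _ in range(n)], B
    obs, CAk = [], C
    for _ in range(n):  # powers up to n - 1 suffice, by Cayley-Hamilton
        ctrb = [row + new for row, new in zip(ctrb, AkB)]
        obs.extend(CAk)
        AkB, CAk = matmul(A, AkB), matmul(CAk, A)
    return rank(ctrb) == n and rank(obs) == n

A2 = [[0, 1], [-2, -3]]
A3 = [[0, 1, 0], [-2, -3, 0], [0, 0, -5]]
print(is_minimal(A2, [[0], [1]], [[1, 0]]))         # True
print(is_minimal(A3, [[0], [1], [0]], [[1, 0, 1]]))  # False: third state unreachable
print(is_minimal(A3, [[0], [1], [1]], [[1, 0, 0]]))  # False: third state unobservable
```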
[/step]