[proofplan]
We count admissible words and identify their exponential growth with the growth of the non-negative matrix powers of $M$. The word count is exactly the sum of all entries of $M^{n-1}$, so the entropy reduces to a finite-dimensional spectral-radius computation. We prove the required spectral-radius growth statement directly from [Jordan normal form](/theorems/864), avoiding any external citation. Finally, reducibility is handled by permuting the strongly connected components into block upper triangular form, where the spectrum is the union of the spectra of the diagonal blocks.
[/proofplan]
[step:Translate admissible words into entries of powers of the transition matrix]
Let $\mathcal{A} := \{1,\dots,k\}$ be the alphabet. For each $n \in \mathbb{N}$, define the set of admissible words of length $n$ by
\begin{align*}
\mathcal{W}_n(M) := \{(i_1,\dots,i_n) \in \mathcal{A}^n : M_{i_\ell i_{\ell+1}} = 1 \text{ for every } 1 \leq \ell \leq n-1\}.
\end{align*}
Let $S_n(M) \in \mathbb{N} \cup \{0\}$ denote the sum of all entries of $M^n$:
\begin{align*}
S_n(M) := \sum_{i=1}^{k}\sum_{j=1}^{k} (M^n)_{ij}.
\end{align*}
For $n \geq 2$, the entry $(M^{n-1})_{ij}$ counts admissible paths of length $n$ beginning at $i$ and ending at $j$. Therefore
\begin{align*}
|\mathcal{W}_n(M)| = S_{n-1}(M).
\end{align*}
For $n=1$, $|\mathcal{W}_1(M)| = k$, which does not affect the exponential growth rate.
[guided]
The matrix $M$ is the adjacency matrix of a directed graph with vertex set $\mathcal{A} = \{1,\dots,k\}$: there is an arrow $i \to j$ exactly when $M_{ij}=1$. A word $(i_1,\dots,i_n)$ is admissible precisely when it follows arrows in this graph.
For $n \in \mathbb{N}$, define
\begin{align*}
\mathcal{W}_n(M) := \{(i_1,\dots,i_n) \in \mathcal{A}^n : M_{i_\ell i_{\ell+1}} = 1 \text{ for every } 1 \leq \ell \leq n-1\}.
\end{align*}
Also define the total entry sum of $M^n$ by
\begin{align*}
S_n(M) := \sum_{i=1}^{k}\sum_{j=1}^{k} (M^n)_{ij}.
\end{align*}
Why does $M^{n-1}$ appear? Matrix multiplication expands $(M^{n-1})_{ij}$ as a sum over all intermediate vertices:
\begin{align*}
(M^{n-1})_{ij} = \sum_{i_2,\dots,i_{n-1}=1}^{k} M_{i i_2} M_{i_2 i_3}\cdots M_{i_{n-1} j}.
\end{align*}
Each product in this sum is either $0$ or $1$. It equals $1$ exactly when
\begin{align*}
i \to i_2 \to \cdots \to i_{n-1} \to j
\end{align*}
is an admissible path. Thus $(M^{n-1})_{ij}$ counts the admissible words of length $n$ whose first symbol is $i$ and whose last symbol is $j$. Summing over all possible first and last symbols gives
\begin{align*}
|\mathcal{W}_n(M)| = \sum_{i=1}^{k}\sum_{j=1}^{k} (M^{n-1})_{ij} = S_{n-1}(M)
\end{align*}
for every $n \geq 2$. The case $n=1$ gives $|\mathcal{W}_1(M)|=k$, and this isolated initial value has no effect on a limit of the form $\lim_{n \to \infty} n^{-1}\log |\mathcal{W}_n(M)|$.
[/guided]
[/step]
[step:Relate topological entropy to the exponential growth of admissible words]
For the topological Markov chain $\Sigma_M$ with the shift map $\sigma|_{\Sigma_M}$, topological entropy is the exponential growth rate of admissible cylinder words:
\begin{align*}
h_{\mathrm{top}}(\sigma|_{\Sigma_M}) = \lim_{n \to \infty} \frac{1}{n}\log |\mathcal{W}_n(M)|.
\end{align*}
Using $|\mathcal{W}_n(M)|=S_{n-1}(M)$ for $n \geq 2$, it is enough to prove
\begin{align*}
\lim_{m \to \infty} S_m(M)^{1/m} = \rho(M).
\end{align*}
[/step]
[step:Prove that entry sums grow at the spectral radius]
Let $\|\cdot\|_1$ denote the matrix norm on $\mathbb{R}^{k \times k}$ induced by the vector norm $|x|_1=\sum_{i=1}^{k}|x_i|$. Since $M^m$ has non-negative entries, for every $m \in \mathbb{N}$,
\begin{align*}
\|M^m\|_1 \leq S_m(M) \leq k\|M^m\|_1.
\end{align*}
Therefore $S_m(M)^{1/m}$ and $\|M^m\|_1^{1/m}$ have the same limit if the latter limit exists.
We now prove the finite-dimensional spectral-radius formula for the norm $\|\cdot\|_1$. Let $A \in \mathbb{C}^{k \times k}$ be arbitrary. Put $A=PJP^{-1}$ in Jordan normal form, where $P \in GL(k,\mathbb{C})$ and $J$ is block diagonal with Jordan blocks. If $\lambda$ is an eigenvalue of $A$ and $J_{\lambda,d}$ is a Jordan block of size $d$, then
\begin{align*}
J_{\lambda,d}^m = \sum_{\ell=0}^{d-1} \binom{m}{\ell}\lambda^{m-\ell}N^\ell,
\end{align*}
where $N \in \mathbb{C}^{d \times d}$ is the nilpotent matrix with ones on the superdiagonal and zeros elsewhere. Hence there is a constant $C_A>0$ such that
\begin{align*}
\|A^m\|_1 \leq C_A m^{k-1}(\rho(A)+\varepsilon)^m
\end{align*}
for every $\varepsilon>0$ and every $m \in \mathbb{N}$. Taking $m$-th roots and then letting $m \to \infty$ gives
\begin{align*}
\limsup_{m \to \infty}\|A^m\|_1^{1/m} \leq \rho(A)+\varepsilon.
\end{align*}
Since $\varepsilon>0$ is arbitrary,
\begin{align*}
\limsup_{m \to \infty}\|A^m\|_1^{1/m} \leq \rho(A).
\end{align*}
Conversely, if $\lambda$ is an eigenvalue of $A$ with $|\lambda|=\rho(A)$ and $v \in \mathbb{C}^k \setminus \{0\}$ is an eigenvector, then
\begin{align*}
\|A^m\|_1 |v|_1 \geq |A^m v|_1 = |\lambda|^m |v|_1 = \rho(A)^m |v|_1.
\end{align*}
Thus $\|A^m\|_1^{1/m} \geq \rho(A)$ for every $m$, and consequently
\begin{align*}
\lim_{m \to \infty}\|A^m\|_1^{1/m} = \rho(A).
\end{align*}
Applying this to $A=M$ and using the comparison between $S_m(M)$ and $\|M^m\|_1$ gives
\begin{align*}
\lim_{m \to \infty} S_m(M)^{1/m} = \rho(M).
\end{align*}
[/step]
[step:Use nonemptiness to force a directed cycle and positive spectral radius]
Because $\Sigma_M \neq \varnothing$, choose $x=(x_n)_{n \in \mathbb{N}} \in \Sigma_M$. The sequence $(x_n)_{n \in \mathbb{N}}$ takes values in the finite set $\{1,\dots,k\}$, so there exist integers $a,b \in \mathbb{N}$ with $a<b$ and $x_a=x_b$. The admissibility condition gives a directed cycle
\begin{align*}
x_a \to x_{a+1} \to \cdots \to x_{b-1} \to x_b=x_a.
\end{align*}
Let $\ell:=b-a$. Then $(M^\ell)_{x_a x_a} \geq 1$, so $S_\ell(M)\geq 1$. Repeating the same cycle $q$ times gives $S_{q\ell}(M)\geq 1$ for every $q \in \mathbb{N}$. Therefore
\begin{align*}
\rho(M) = \lim_{m \to \infty}S_m(M)^{1/m} \geq \limsup_{q \to \infty} S_{q\ell}(M)^{1/(q\ell)} \geq 1.
\end{align*}
In particular, $\rho(M)>0$.
[/step]
[step:Conclude the entropy formula]
Combining the word-count identity and the entry-sum growth formula, we obtain
\begin{align*}
h_{\mathrm{top}}(\sigma|_{\Sigma_M}) = \lim_{n \to \infty}\frac{1}{n}\log |\mathcal{W}_n(M)| = \lim_{n \to \infty}\frac{1}{n}\log S_{n-1}(M).
\end{align*}
Since $\rho(M)>0$ and $S_{n-1}(M)^{1/(n-1)} \to \rho(M)$,
\begin{align*}
\lim_{n \to \infty}\frac{1}{n}\log S_{n-1}(M) = \log \rho(M).
\end{align*}
Thus
\begin{align*}
h_{\mathrm{top}}(\sigma|_{\Sigma_M})=\log \rho(M).
\end{align*}
[/step]
[step:Identify the spectral radius in the reducible case]
Let $G_M$ be the directed graph with vertex set $\{1,\dots,k\}$ and edge set
\begin{align*}
E_M := \{(i,j) \in \{1,\dots,k\}^2 : M_{ij}=1\}.
\end{align*}
Let $C_1,\dots,C_r$ be the strongly connected components of $G_M$, ordered so that if there is a directed path from $C_a$ to $C_b$ with $a \neq b$, then $a \leq b$. Such an order exists because the condensation graph of strongly connected components is acyclic. Let $P \in \mathbb{R}^{k \times k}$ be the permutation matrix that lists the vertices component by component in this order. Let $B := PMP^{-1} \in \mathbb{R}^{k \times k}$. With respect to the ordered decomposition of the vertex set into $C_1,\dots,C_r$, the matrix $B$ is block upper triangular: its diagonal block on $C_j \times C_j$ is $M_j$, and its block on $C_a \times C_b$ is the zero block whenever $a>b$. Here each $M_j$ is the transition submatrix induced by the component $C_j$.
For every $\lambda \in \mathbb{C}$, the matrix $\lambda I-B$ is block upper triangular with diagonal blocks $\lambda I-M_j$. Hence
\begin{align*}
\det(\lambda I-B) = \prod_{j=1}^{r}\det(\lambda I-M_j).
\end{align*}
Since $B=PMP^{-1}$ is similar to $M$, it has the same characteristic polynomial as $M$. Therefore the eigenvalues of $M$ are exactly the union, with algebraic multiplicity, of the eigenvalues of the diagonal blocks $M_1,\dots,M_r$. Taking the maximum modulus of these eigenvalues gives
\begin{align*}
\rho(M)=\max_{1 \leq j \leq r}\rho(M_j).
\end{align*}
This proves the reducible statement and completes the proof.
[/step]