[proofplan]
We first verify directly from the Perron eigenvector identities that $P$ is stochastic and that $\pi$ is stationary, which proves shift-invariance of the Markov measure. We then compute the Markov entropy of $\mu_P$; the logarithm of the Parry transition probability splits into a constant term $\log \lambda$ and a coboundary term involving $\log r_i$, and the coboundary averages to zero by stationarity. The equality with topological entropy follows from the standard word-growth formula for irreducible shifts of finite type. Finally, for uniqueness, we compare the conditional transition law of an arbitrary invariant measure with the Parry transition law using the Gibbs inequality for conditional entropy; equality forces the same Markov transition probabilities and hence the same cylinder weights.
[/proofplan]
[step:Verify that the Parry transition matrix is stochastic and has stationary vector $\pi$]
For each $i \in \{1,\dots,N\}$, the $i$-th coordinate of $Mr=\lambda r$ gives
\begin{align*}
\sum_{j=1}^N M_{ij} r_j = \lambda r_i.
\end{align*}
Since $r_i>0$, this implies
\begin{align*}
\sum_{j=1}^N P_{ij} = \sum_{j=1}^N \frac{M_{ij} r_j}{\lambda r_i} = 1.
\end{align*}
Thus $P$ is a stochastic matrix. Moreover, for each $j \in \{1,\dots,N\}$, the $j$-th coordinate of $l^\top M=\lambda l^\top$ reads $\sum_{i=1}^N l_i M_{ij}=\lambda l_j$, and therefore
\begin{align*}
\sum_{i=1}^N \pi_i P_{ij} = \sum_{i=1}^N l_i r_i \frac{M_{ij} r_j}{\lambda r_i} = \frac{r_j}{\lambda}\sum_{i=1}^N l_i M_{ij} = l_j r_j = \pi_j.
\end{align*}
Hence $\pi P=\pi$. The normalization $\sum_i l_i r_i=1$ gives $\sum_i \pi_i=1$, so $\pi$ is a stationary probability vector for $P$.
Because $P_{ij}=0$ whenever $M_{ij}=0$, the stationary Markov measure with initial distribution $\pi$ and transition matrix $P$ is supported on $\Sigma_M$. The stationarity identity $\pi P=\pi$ implies that summing the cylinder weight $\pi_{x_0}P_{x_0x_1}\cdots P_{x_{n-1}x_n}$ over the first symbol $x_0$ yields $\pi_{x_1}P_{x_1x_2}\cdots P_{x_{n-1}x_n}$. Hence the finite-dimensional distributions are invariant under deleting the first coordinate, and the resulting Borel probability measure $\mu_P$ satisfies $\sigma_\#\mu_P=\mu_P$.
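As an illustration only, consider the golden mean shift, a standard example that is assumed here and is not part of the statement: $N=2$ and $M=\begin{pmatrix}1&1\\1&0\end{pmatrix}$, with $\lambda=\varphi:=(1+\sqrt5)/2$, so that $\varphi^2=\varphi+1$. One admissible choice of Perron eigenvectors is $r=(\varphi,1)^\top$ and $l=(\varphi,1)^\top/(\varphi^2+1)$, which satisfies $\sum_i l_i r_i=1$. The formulas $P_{ij}=M_{ij}r_j/(\lambda r_i)$ and $\pi_i=l_i r_i$ then give
\begin{align*}
P=\begin{pmatrix}1/\varphi & 1/\varphi^2\\ 1 & 0\end{pmatrix}, \qquad \pi=\left(\frac{\varphi^2}{\varphi^2+1},\ \frac{1}{\varphi^2+1}\right).
\end{align*}
The first row of $P$ sums to $(\varphi+1)/\varphi^2=1$, and
\begin{align*}
\pi_1P_{11}+\pi_2P_{21}=\frac{\varphi+1}{\varphi^2+1}=\pi_1, \qquad \pi_1P_{12}=\frac{1}{\varphi^2+1}=\pi_2,
\end{align*}
so $\pi P=\pi$ in this example.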
[/step]
[step:Compute the entropy of the Parry measure]
Since $\mu_P$ is a stationary finite-state Markov measure, its measure-theoretic entropy is
\begin{align*}
h_{\mu_P}(\sigma) = -\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}\log P_{ij},
\end{align*}
with the convention $0\log 0=0$. For every pair $(i,j)$ with $P_{ij}>0$, equivalently $M_{ij}=1$, the definition of $P$ gives
\begin{align*}
\log P_{ij} = \log r_j - \log \lambda - \log r_i.
\end{align*}
Therefore
\begin{align*}
h_{\mu_P}(\sigma) = -\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}(\log r_j - \log \lambda - \log r_i).
\end{align*}
Using $\sum_j P_{ij}=1$ and $\sum_i \pi_i=1$, the constant term contributes $\log\lambda$. The remaining two terms are
\begin{align*}
\sum_{i=1}^N \pi_i \log r_i - \sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}\log r_j.
\end{align*}
By stationarity, $\sum_i \pi_i P_{ij}=\pi_j$, so
\begin{align*}
\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}\log r_j = \sum_{j=1}^N \pi_j \log r_j.
\end{align*}
The two eigenvector terms cancel, and hence
\begin{align*}
h_{\mu_P}(\sigma)=\log\lambda.
\end{align*}
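As a sanity check of this cancellation, take the golden mean shift as an assumed example (it is not part of the statement): $M=\begin{pmatrix}1&1\\1&0\end{pmatrix}$, $\lambda=\varphi=(1+\sqrt5)/2$, and $r=(\varphi,1)^\top$. Then $P_{11}=1/\varphi$, $P_{12}=1/\varphi^2$, $P_{21}=1$, $P_{22}=0$, and $\pi_1=\varphi^2/(\varphi^2+1)$. Only the first row contributes to the Markov entropy formula, and
\begin{align*}
-\sum_{i=1}^2 \pi_i \sum_{j=1}^2 P_{ij}\log P_{ij} = \pi_1\left(\frac{1}{\varphi}\log\varphi+\frac{2}{\varphi^2}\log\varphi\right) = \frac{\varphi^2}{\varphi^2+1}\cdot\frac{\varphi+2}{\varphi^2}\,\log\varphi = \log\varphi,
\end{align*}
where the last equality uses $\varphi^2+1=\varphi+2$.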
[guided]
The entropy formula for a stationary finite-state Markov measure says that the entropy rate is the average uncertainty of one transition:
\begin{align*}
h_{\mu_P}(\sigma) = -\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}\log P_{ij}.
\end{align*}
Here the hypotheses are satisfied because $\mu_P$ was defined from a stationary probability vector $\pi$ and a finite stochastic matrix $P$.
The useful feature of the Parry transition probabilities is that their logarithm separates into a constant part and a telescoping part. If $P_{ij}>0$, then $M_{ij}=1$, and
\begin{align*}
P_{ij} = \frac{r_j}{\lambda r_i}.
\end{align*}
Taking logarithms gives
\begin{align*}
\log P_{ij} = \log r_j - \log \lambda - \log r_i.
\end{align*}
Substituting this into the entropy formula yields
\begin{align*}
h_{\mu_P}(\sigma) = -\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}(\log r_j - \log \lambda - \log r_i).
\end{align*}
Now separate the three terms, keeping track of the overall minus sign. Since each row of $P$ sums to $1$ and $\pi$ is a probability vector, the $\log\lambda$ term contributes exactly $\log\lambda$. The term involving $\log r_i$ contributes, with a plus sign,
\begin{align*}
\sum_{i=1}^N \pi_i \log r_i.
\end{align*}
The term involving $\log r_j$ contributes, with a minus sign, the quantity
\begin{align*}
\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}\log r_j.
\end{align*}
Stationarity is precisely the identity needed to rewrite this last expression:
\begin{align*}
\sum_{i=1}^N \pi_i \sum_{j=1}^N P_{ij}\log r_j = \sum_{j=1}^N \pi_j \log r_j.
\end{align*}
Thus the two eigenvector terms are the same number with opposite signs. They cancel, leaving
\begin{align*}
h_{\mu_P}(\sigma)=\log\lambda.
\end{align*}
[/guided]
[/step]
[step:Identify the topological entropy with the Perron eigenvalue]
Let $\mathcal{L}_n(\Sigma_M)$ denote the set of admissible words of length $n$ in $\Sigma_M$. For a shift of finite type, the topological entropy is the exponential word-growth rate
\begin{align*}
h_{\mathrm{top}}(\sigma|_{\Sigma_M}) = \lim_{n\to\infty}\frac{1}{n}\log |\mathcal{L}_n(\Sigma_M)|.
\end{align*}
Moreover,
\begin{align*}
|\mathcal{L}_n(\Sigma_M)| = \sum_{i,j=1}^N (M^{n-1})_{ij}.
\end{align*}
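By the Perron-Frobenius theorem for irreducible nonnegative matrices, $r$ is strictly positive, and $M^n r=\lambda^n r$ gives $\sum_{j}(M^n)_{ij}r_j=\lambda^n r_i$ for every $i$. Consequently there are constants $0<c\leq C$, depending only on $r$ and $N$, such that $c\lambda^n\leq\sum_{i,j}(M^n)_{ij}\leq C\lambda^n$ for all $n$. Hence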
\begin{align*}
\lim_{n\to\infty}\frac{1}{n}\log |\mathcal{L}_n(\Sigma_M)| = \log\lambda.
\end{align*}
Therefore
\begin{align*}
h_{\mathrm{top}}(\sigma|_{\Sigma_M})=\log\lambda.
\end{align*}
Here the external input is the standard Perron-Frobenius word-growth formula for irreducible shifts of finite type.
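For instance, in the golden mean shift (an assumed example with $N=2$ and $M=\begin{pmatrix}1&1\\1&0\end{pmatrix}$, so that the word $22$ is forbidden), the powers of $M$ are given by the Fibonacci numbers $F_0=0$, $F_1=F_2=1$, $F_{k+1}=F_k+F_{k-1}$. For $n\geq 1$,
\begin{align*}
M^{n}=\begin{pmatrix}F_{n+1}&F_n\\F_n&F_{n-1}\end{pmatrix},
\end{align*}
and for $n\geq 2$,
\begin{align*}
|\mathcal{L}_n(\Sigma_M)|=\sum_{i,j=1}^2(M^{n-1})_{ij}=F_n+2F_{n-1}+F_{n-2}=F_{n+2}.
\end{align*}
Thus $|\mathcal{L}_2(\Sigma_M)|=3$ and $|\mathcal{L}_3(\Sigma_M)|=5$. Since $F_n$ grows like $\varphi^n/\sqrt5$ with $\varphi=(1+\sqrt5)/2$, the limit equals $\log\varphi$, in agreement with $\lambda=\varphi$ for this matrix.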
[/step]
[step:Bound the entropy of an arbitrary invariant measure by $\log\lambda$]
Let $\nu$ be a $\sigma$-invariant Borel probability measure on $\Sigma_M$. Let $\widehat{\Sigma}_M \subset \{1,\dots,N\}^{\mathbb{Z}}$ be the two-sided shift space with the same transition matrix $M$, let $\widehat{\sigma}: \widehat{\Sigma}_M \to \widehat{\Sigma}_M$ be the two-sided shift, and let $\widehat{\nu}$ be the natural two-sided extension of $\nu$. For each $k \in \mathbb{Z}$, define the coordinate map
\begin{align*}
X_k: \widehat{\Sigma}_M \to \{1,\dots,N\}
\end{align*}
by
\begin{align*}
X_k(x):=x_k.
\end{align*}
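Let $\mathcal{F}^- := \sigma(X_0,X_{-1},X_{-2},\dots)$ be the past-and-present $\sigma$-algebra, that is, the $\sigma$-algebra generated by the coordinates $X_k$ with $k\leq 0$; in this expression $\sigma(\cdot)$ denotes the generated $\sigma$-algebra, not the shift map.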
For $\widehat{\nu}$-almost every $x \in \widehat{\Sigma}_M$, define the [conditional probability](/page/Conditional%20Probability) vector $q(x)=(q_1(x),\dots,q_N(x))$ by
\begin{align*}
q_j(x) := \widehat{\nu}(X_1=j \mid \mathcal{F}^-)(x).
\end{align*}
Also define the Parry transition vector $p(x)=(p_1(x),\dots,p_N(x))$ by
\begin{align*}
p_j(x) := P_{X_0(x),j}.
\end{align*}
Since both $\widehat{\nu}$ and the Parry transition rule are supported on admissible transitions, $q_j(x)=0$ whenever $p_j(x)=0$, for $\widehat{\nu}$-almost every $x$.
The Gibbs inequality for probability vectors gives, pointwise for $\widehat{\nu}$-almost every $x$,
\begin{align*}
-\sum_{j=1}^N q_j(x)\log q_j(x) \leq -\sum_{j=1}^N q_j(x)\log p_j(x).
\end{align*}
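For completeness, here is one standard derivation of this inequality, sketched for a fixed $x$ with the shorthand $q_j=q_j(x)$, $p_j=p_j(x)$, under the convention $0\log 0=0$ and the property $q_j=0$ whenever $p_j=0$. Using $\log t\leq t-1$ for $t>0$ and summing over the indices with $q_j>0$,
\begin{align*}
\sum_{j:\,q_j>0} q_j\log\frac{p_j}{q_j} \leq \sum_{j:\,q_j>0} q_j\left(\frac{p_j}{q_j}-1\right) = \sum_{j:\,q_j>0} p_j - 1 \leq 0,
\end{align*}
with equality throughout if and only if $p_j=q_j$ for every $j$. Rearranging the left-hand side gives the Gibbs inequality displayed above.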
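We now integrate this inequality with respect to $\widehat{\nu}$. The left-hand side integrates to the conditional entropy of $X_1$ given $\mathcal{F}^-$, which equals $h_{\widehat{\nu}}(\widehat{\sigma})=h_\nu(\sigma)$ because the coordinate partition is a generator and the natural extension preserves entropy. On the right-hand side, each $p_j$ is $\mathcal{F}^-$-measurable, so by the defining property of conditional probability the function $\sum_j q_j\log p_j$ has the same integral as $\log P_{X_0(x),X_1(x)}$. This gives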
\begin{align*}
h_\nu(\sigma) \leq -\int_{\widehat{\Sigma}_M}\log P_{X_0(x),X_1(x)}\, d\widehat{\nu}(x).
\end{align*}
For every admissible transition $X_0(x) \to X_1(x)$,
\begin{align*}
-\log P_{X_0(x),X_1(x)} = \log\lambda + \log r_{X_0(x)} - \log r_{X_1(x)}.
\end{align*}
Therefore
\begin{align*}
h_\nu(\sigma) \leq \log\lambda + \int_{\widehat{\Sigma}_M}\log r_{X_0(x)}\, d\widehat{\nu}(x) - \int_{\widehat{\Sigma}_M}\log r_{X_1(x)}\, d\widehat{\nu}(x).
\end{align*}
Since $\widehat{\nu}$ is $\widehat{\sigma}$-invariant, $X_0$ and $X_1$ have the same distribution under $\widehat{\nu}$. The two integrals cancel, so
\begin{align*}
h_\nu(\sigma)\leq \log\lambda.
\end{align*}
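To see the inequality in a concrete case, assume the golden mean shift $M=\begin{pmatrix}1&1\\1&0\end{pmatrix}$ with $\lambda=\varphi=(1+\sqrt5)/2$ and $r=(\varphi,1)^\top$, so that $P_{11}=1/\varphi$, $P_{12}=1/\varphi^2$, $P_{21}=1$. Let $\nu$ be the stationary Markov measure with transition matrix $Q=\begin{pmatrix}1/2&1/2\\1&0\end{pmatrix}$ and stationary vector $\rho=(2/3,1/3)$; this $\nu$ is a hypothetical choice made only for illustration. Then
\begin{align*}
h_\nu(\sigma) = -\sum_{i,j}\rho_i Q_{ij}\log Q_{ij} = \tfrac{2}{3}\log 2, \qquad -\sum_{i,j}\rho_i Q_{ij}\log P_{ij} = \tfrac{2}{3}\left(\tfrac12\log\varphi+\tfrac12\cdot 2\log\varphi\right) = \log\varphi.
\end{align*}
Numerically, $\tfrac23\log 2\approx 0.462$ and $\log\varphi\approx 0.481$, so the bound holds with strict inequality for this non-Parry measure.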
[/step]
[step:Show that equality forces the Parry transition rule]
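Assume now that $\nu$ is $\sigma$-invariant and satisfies $h_\nu(\sigma)=\log\lambda$. Then every inequality in the previous step is an equality; in particular, the two sides of the integrated Gibbs inequality have the same finite value. Since the pointwise Gibbs inequality holds $\widehat{\nu}$-almost everywhere, its two sides must agree for $\widehat{\nu}$-almost every $x$. Equality in the Gibbs inequality for probability vectors occurs exactly when the two probability vectors are equal, so we obtain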
\begin{align*}
\widehat{\nu}(X_1=j \mid \mathcal{F}^-)(x)=P_{X_0(x),j}
\end{align*}
for every $j \in \{1,\dots,N\}$ and for $\widehat{\nu}$-almost every $x \in \widehat{\Sigma}_M$.
Thus, under $\widehat{\nu}$, the conditional law of $X_1$ given the entire past depends only on $X_0$ and is given by the matrix $P$. Hence $\widehat{\nu}$ is a stationary Markov measure with transition matrix $P$. Let $\rho_i := \widehat{\nu}(X_0=i)$ denote its one-coordinate marginal. Since $\widehat{\nu}$ is shift-invariant, $\rho P=\rho$.
Define $v_i := \rho_i/r_i$ for $i \in \{1,\dots,N\}$. The stationarity equation gives, for every $j$,
\begin{align*}
\rho_j = \sum_{i=1}^N \rho_i \frac{M_{ij}r_j}{\lambda r_i}.
\end{align*}
Dividing by $r_j>0$ yields
\begin{align*}
v_j = \frac{1}{\lambda}\sum_{i=1}^N v_i M_{ij}.
\end{align*}
Equivalently,
\begin{align*}
v^\top M = \lambda v^\top.
\end{align*}
Since $\rho$ is a probability vector and $r$ is strictly positive, $v$ is nonnegative and nonzero. By the [Perron-Frobenius theorem for irreducible nonnegative matrices](/theorems/6787), the nonnegative left eigenvector for the Perron eigenvalue is unique up to scalar multiplication. Hence $v=c\,l$ for some $c>0$, and therefore $\rho_i=c\,l_i r_i=c\,\pi_i$. Since both $\rho$ and $\pi$ are probability vectors, $c=1$, so $\rho=\pi$.
The measure $\widehat{\nu}$ is therefore the stationary Markov measure with initial distribution $\pi$ and transition matrix $P$. Its one-sided projection is exactly $\mu_P$, so $\nu=\mu_P$.
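As an illustration of the eigenvector argument, assume again the golden mean shift $M=\begin{pmatrix}1&1\\1&0\end{pmatrix}$ with $\lambda=\varphi=(1+\sqrt5)/2$ and $r=(\varphi,1)^\top$. The equation $v^\top M=\varphi v^\top$ reads
\begin{align*}
v_1+v_2=\varphi v_1, \qquad v_1=\varphi v_2,
\end{align*}
and the first equation follows from the second because $\varphi^2=\varphi+1$. Hence $v=c\,(\varphi,1)$, so $\rho_i=v_i r_i$ gives $\rho=c\,(\varphi^2,1)$, and normalization forces $c=1/(\varphi^2+1)$. In this example the only possible one-coordinate marginal is therefore
\begin{align*}
\rho=\left(\frac{\varphi^2}{\varphi^2+1},\ \frac{1}{\varphi^2+1}\right),
\end{align*}
which is the vector $\pi_i=l_i r_i$ obtained from $l=(\varphi,1)^\top/(\varphi^2+1)$.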
[/step]
[step:Conclude uniqueness of the measure of maximal entropy]
The preceding steps show
\begin{align*}
h_{\mu_P}(\sigma)=\log\lambda=h_{\mathrm{top}}(\sigma|_{\Sigma_M})
\end{align*}
and also show that every $\sigma$-invariant Borel probability measure $\nu$ on $\Sigma_M$ satisfies
\begin{align*}
h_\nu(\sigma)\leq \log\lambda.
\end{align*}
If $\nu$ is a measure of maximal entropy, then
\begin{align*}
h_\nu(\sigma)=h_{\mathrm{top}}(\sigma|_{\Sigma_M})=\log\lambda.
\end{align*}
The equality case from the previous step forces $\nu=\mu_P$. Hence $\mu_P$ is the unique measure of maximal entropy on $\Sigma_M$.
[/step]