[step:Bound the entropy of an arbitrary invariant measure by $\log\lambda$]
Let $\nu$ be a $\sigma$-invariant Borel probability measure on $\Sigma_M$. Let $\widehat{\Sigma}_M \subset \{1,\dots,N\}^{\mathbb{Z}}$ be the two-sided shift space with the same transition matrix $M$, let $\widehat{\sigma}: \widehat{\Sigma}_M \to \widehat{\Sigma}_M$ be the two-sided shift, and let $\widehat{\nu}$ be the natural two-sided extension of $\nu$. For each $k \in \mathbb{Z}$, define the coordinate map
\begin{align*}
X_k: \widehat{\Sigma}_M \to \{1,\dots,N\}
\end{align*}
by
\begin{align*}
X_k(x):=x_k.
\end{align*}
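Let $\mathcal{F}^- := \sigma(X_0,X_{-1},X_{-2},\dots)$ be the past-and-present $\sigma$-algebra, that is, the $\sigma$-algebra generated by the coordinates $X_k$ with $k \leq 0$. Here $\sigma(\cdot)$ denotes the generated $\sigma$-algebra, not the shift map.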
For $\widehat{\nu}$-almost every $x \in \widehat{\Sigma}_M$, define the [conditional probability](/page/Conditional%20Probability) vector $q(x)=(q_1(x),\dots,q_N(x))$ by
\begin{align*}
q_j(x) := \widehat{\nu}(X_1=j \mid \mathcal{F}^-)(x).
\end{align*}
Also define the Parry transition vector $p(x)=(p_1(x),\dots,p_N(x))$ by
\begin{align*}
p_j(x) := P_{X_0(x),j}.
\end{align*}
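As a sanity check on these two definitions, consider the hypothetical case (an assumption made only for illustration) in which $\nu$ is a stationary Markov measure with stochastic matrix $Q$. The Markov property then makes the conditional law of $X_1$ given the whole past depend on $X_0$ alone:
\begin{align*}
q_j(x) = \widehat{\nu}(X_1=j \mid \mathcal{F}^-)(x) = Q_{X_0(x),j}.
\end{align*}
In particular, if $Q=P$ then $q(x)=p(x)$ almost everywhere, so the vectors $q$ and $p$ differ exactly to the extent that $\nu$ fails to follow the Parry transition rule.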
Since $\widehat{\nu}$ is supported on admissible sequences and $P_{i,j}>0$ for every admissible transition $i \to j$, we have $q_j(x)=0$ whenever $p_j(x)=0$, for $\widehat{\nu}$-almost every $x$.
The Gibbs inequality for probability vectors gives, pointwise for $\widehat{\nu}$-almost every $x$,
\begin{align*}
-\sum_{j=1}^N q_j(x)\log q_j(x) \leq -\sum_{j=1}^N q_j(x)\log p_j(x).
\end{align*}
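For completeness, here is a short sketch of why this holds, using only the elementary bound $\log t \leq t-1$ for $t>0$ and the convention $0\log 0 = 0$. Fix $x$, write $q_j=q_j(x)$ and $p_j=p_j(x)$, and let $J:=\{j : q_j>0\}$ (the index set $J$ is notation introduced only for this sketch). Every $j \in J$ has $p_j>0$, so the logarithms below are finite:
\begin{align*}
\sum_{j\in J} q_j \log\frac{p_j}{q_j} \leq \sum_{j\in J} q_j\left(\frac{p_j}{q_j}-1\right) = \sum_{j\in J} p_j - 1 \leq 0.
\end{align*}
Splitting the logarithm and rearranging gives the Gibbs inequality displayed above.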
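Integrate the Gibbs inequality with respect to $\widehat{\nu}$. The left side integrates to the conditional entropy $H_{\widehat{\nu}}(X_1 \mid \mathcal{F}^-)$, which equals $h_{\widehat{\nu}}(\widehat{\sigma}) = h_\nu(\sigma)$ because the time-zero partition is generating and passing to the natural extension preserves entropy. On the right side, each $p_j$ is $\mathcal{F}^-$-measurable, so the tower property gives $\int \sum_{j} q_j(x)\log p_j(x)\, d\widehat{\nu}(x) = \int \log P_{X_0(x),X_1(x)}\, d\widehat{\nu}(x)$. Hence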
\begin{align*}
h_\nu(\sigma) \leq -\int_{\widehat{\Sigma}_M}\log P_{X_0(x),X_1(x)}\, d\widehat{\nu}(x).
\end{align*}
For every admissible transition $X_0(x) \to X_1(x)$,
\begin{align*}
-\log P_{X_0(x),X_1(x)} = \log\lambda + \log r_{X_0(x)} - \log r_{X_1(x)}.
\end{align*}
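This identity can be traced back to the usual form of the Parry matrix, assumed here to match the definition used earlier: $P_{i,j} = M_{i,j}\, r_j/(\lambda r_i)$, where $r$ is a strictly positive right eigenvector with $Mr=\lambda r$. For an admissible transition $i \to j$ we have $M_{i,j}=1$, so
\begin{align*}
-\log P_{i,j} = -\log\frac{r_j}{\lambda r_i} = \log\lambda + \log r_i - \log r_j.
\end{align*}
As an illustrative example (not taken from the text), for the golden mean shift with $M=\begin{pmatrix}1&1\\1&0\end{pmatrix}$ one has $\lambda=\tfrac{1+\sqrt{5}}{2}$ and $r=(\lambda,1)$, which gives
\begin{align*}
P_{1,1}=\frac{1}{\lambda}, \qquad P_{1,2}=\frac{1}{\lambda^{2}}, \qquad P_{2,1}=1.
\end{align*}
The first row sums to $1$ because $\lambda^2=\lambda+1$, and for instance $-\log P_{1,2} = 2\log\lambda = \log\lambda + \log r_1 - \log r_2$, in agreement with the identity.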
Therefore
\begin{align*}
h_\nu(\sigma) \leq \log\lambda + \int_{\widehat{\Sigma}_M}\log r_{X_0(x)}\, d\widehat{\nu}(x) - \int_{\widehat{\Sigma}_M}\log r_{X_1(x)}\, d\widehat{\nu}(x).
\end{align*}
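Since $\widehat{\nu}$ is $\widehat{\sigma}$-invariant and $X_1 = X_0\circ\widehat{\sigma}$, the variables $X_0$ and $X_1$ have the same distribution under $\widehat{\nu}$. Both integrals are finite because $\log r$ takes only finitely many values, so they cancel and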
\begin{align*}
h_\nu(\sigma)\leq \log\lambda.
\end{align*}
[/step]