[proofplan]
We compute the entropy from the finite cylinder partitions generated by the coordinate at time $0$. The Markov cylinder formula factors every word probability into an initial stationary weight and transition probabilities, so the entropy of an $n$-block separates into one initial entropy term plus $n-1$ identical averaged transition entropy terms. Dividing by $n$ and passing to the limit removes the initial term and leaves precisely the $\pi$-weighted average of the row entropies of $P$.
[/proofplan]
[step:Introduce the coordinate partitions and reduce entropy to block entropies]
Let $\mathcal{B}(A^{\mathbb Z})$ denote the product $\sigma$-algebra on $A^{\mathbb Z}$, and use the left shift convention
\begin{align*}
(\sigma x)_r:=x_{r+1}
\end{align*}
for every $r\in\mathbb Z$.
For each integer $r \in \mathbb{Z}$ and each symbol $a \in A$, define the coordinate cylinder
\begin{align*}
C_r(a) := \{x \in A^{\mathbb{Z}} : x_r = a\}.
\end{align*}
Let $\mathcal{P}_r := \{C_r(a) : a \in A\}$ be the finite measurable partition of $A^{\mathbb{Z}}$ according to the value of the coordinate $x_r$. For $n \in \mathbb{N}$, define the $n$-block partition
\begin{align*}
\mathcal{Q}_n := \bigvee_{r=0}^{n-1} \mathcal{P}_r.
\end{align*}
Its atoms are the cylinders
\begin{align*}
C(a_0,\dots,a_{n-1}) := \{x \in A^{\mathbb{Z}} : x_0 = a_0,\dots,x_{n-1}=a_{n-1}\},
\end{align*}
where $(a_0,\dots,a_{n-1}) \in A^n$.
Because $\pi P=\pi$, the defining cylinder probabilities of the stationary Markov measure are invariant under the shift $\sigma:A^{\mathbb{Z}}\to A^{\mathbb{Z}}$, so $(A^{\mathbb{Z}},\mathcal{B}(A^{\mathbb{Z}}),\mu_{\pi,P},\sigma)$ is a probability-preserving system. Since $A$ is finite, $\mathcal{P}_0$ is a finite measurable partition. With the left shift convention, $\sigma^{-r}C_0(a)=C_r(a)$, so $\sigma^{-r}\mathcal{P}_0=\mathcal{P}_r$ for every $r \in \mathbb{Z}$ and $\mathcal{Q}_n=\bigvee_{r=0}^{n-1}\sigma^{-r}\mathcal{P}_0$. The atoms of the shifted partitions $\sigma^{-r}\mathcal{P}_0$, with $r \in \mathbb{Z}$, are all the one-coordinate cylinders, so these partitions generate the product $\sigma$-algebra on $A^{\mathbb{Z}}$. By the [Kolmogorov-Sinai generator theorem](/theorems/6726) for a finite generating partition, the Kolmogorov-Sinai entropy equals the block entropy rate
\begin{align*}
h_{\mu_{\pi,P}}(\sigma)= \lim_{n \to \infty} \frac{1}{n} H_{\mu_{\pi,P}}(\mathcal{Q}_n),
\end{align*}
where
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)
= -\sum_{(a_0,\dots,a_{n-1}) \in A^n}
\mu_{\pi,P}(C(a_0,\dots,a_{n-1}))
\log \mu_{\pi,P}(C(a_0,\dots,a_{n-1})).
\end{align*}
Terms with zero measure are omitted in the entropy sum.
[guided]
The goal is to compute entropy from finite symbolic data. We work on the product measurable space $(A^{\mathbb Z},\mathcal{B}(A^{\mathbb Z}))$, where $\mathcal{B}(A^{\mathbb Z})$ is the product $\sigma$-algebra, and we use the left shift
\begin{align*}
(\sigma x)_r:=x_{r+1}.
\end{align*}
For each time $r \in \mathbb{Z}$ and each symbol $a \in A$, the set
\begin{align*}
C_r(a) := \{x \in A^{\mathbb{Z}} : x_r = a\}
\end{align*}
records the event that the coordinate at time $r$ equals $a$. The finite partition
\begin{align*}
\mathcal{P}_r := \{C_r(a) : a \in A\}
\end{align*}
therefore records exactly one coordinate.
For $n \in \mathbb{N}$, the join
\begin{align*}
\mathcal{Q}_n := \bigvee_{r=0}^{n-1} \mathcal{P}_r
\end{align*}
records the word seen from time $0$ through time $n-1$. Its atoms are precisely the cylinders
\begin{align*}
C(a_0,\dots,a_{n-1}) := \{x \in A^{\mathbb{Z}} : x_0 = a_0,\dots,x_{n-1}=a_{n-1}\},
\end{align*}
with $(a_0,\dots,a_{n-1}) \in A^n$.
Because $\pi P=\pi$, the cylinder formula defining the stationary Markov measure is unchanged after shifting all time indices by $1$. Thus $\mu_{\pi,P}(\sigma^{-1}E)=\mu_{\pi,P}(E)$ first on cylinder sets $E$ and hence, by generation of the product $\sigma$-algebra by cylinders, on all measurable sets $E\subset A^{\mathbb{Z}}$. Therefore $(A^{\mathbb{Z}},\mathcal{B}(A^{\mathbb{Z}}),\mu_{\pi,P},\sigma)$ is a probability-preserving system.
Because the alphabet $A$ is finite, the one-coordinate partition $\mathcal{P}_0$ is finite. With the left shift convention, $\sigma^{-r}C_0(a)=\{x : (\sigma^r x)_0=a\}=C_r(a)$, so $\sigma^{-r}\mathcal{P}_0=\mathcal{P}_r$ for every $r \in \mathbb{Z}$ and $\mathcal{Q}_n=\bigvee_{r=0}^{n-1}\sigma^{-r}\mathcal{P}_0$. The atoms of these shifted partitions are all the one-coordinate cylinders, so the partitions $\sigma^{-r}\mathcal{P}_0$, with $r \in \mathbb{Z}$, generate the product $\sigma$-algebra on $A^{\mathbb{Z}}$. These are precisely the hypotheses needed for the Kolmogorov-Sinai generator theorem for a finite generating partition. Therefore the Kolmogorov-Sinai entropy of $\sigma$ is computed by the entropy rate of the finite block partitions $\mathcal{Q}_n$:
\begin{align*}
h_{\mu_{\pi,P}}(\sigma)= \lim_{n \to \infty} \frac{1}{n} H_{\mu_{\pi,P}}(\mathcal{Q}_n).
\end{align*}
Here the entropy of $\mathcal{Q}_n$ is
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)
= -\sum_{(a_0,\dots,a_{n-1}) \in A^n}
\mu_{\pi,P}(C(a_0,\dots,a_{n-1}))
\log \mu_{\pi,P}(C(a_0,\dots,a_{n-1})).
\end{align*}
As usual for Shannon entropy, atoms of measure $0$ contribute nothing, so terms with zero measure are omitted.
[/guided]
[/step]
[step:Compute the probability of each finite word]
Fix $n \in \mathbb{N}$ and a word $(a_0,\dots,a_{n-1}) \in A^n$. By the defining cylinder formula for the stationary Markov measure,
\begin{align*}
\mu_{\pi,P}(C(a_0,\dots,a_{n-1}))
= \pi_{a_0}\prod_{r=0}^{n-2} P_{a_r a_{r+1}}.
\end{align*}
When this probability is positive, each factor appearing in the product is positive, and hence
\begin{align*}
\log \mu_{\pi,P}(C(a_0,\dots,a_{n-1}))
= \log \pi_{a_0} + \sum_{r=0}^{n-2} \log P_{a_r a_{r+1}}.
\end{align*}
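As an illustration of the cylinder formula, consider a hypothetical two-state chain that is not part of the statement: $A=\{0,1\}$ with $P_{00}=P_{01}=\tfrac12$, $P_{10}=1$, $P_{11}=0$, and $\pi=(\tfrac23,\tfrac13)$, which satisfies $\pi P=\pi$. Under these assumptions, for $n=3$,
\begin{align*}
\mu_{\pi,P}(C(0,1,0)) &= \pi_0 P_{01} P_{10} = \tfrac23\cdot\tfrac12\cdot 1 = \tfrac13, \\
\mu_{\pi,P}(C(0,1,1)) &= \pi_0 P_{01} P_{11} = \tfrac23\cdot\tfrac12\cdot 0 = 0.
\end{align*}
The first word has positive measure, so $\log \mu_{\pi,P}(C(0,1,0)) = \log\tfrac23+\log\tfrac12+\log 1=-\log 3$. The second word has measure $0$, so the logarithmic factorization is not applied to it and $\log P_{11}$ is never evaluated.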
[/step]
[step:Separate the block entropy into initial and transition contributions]
Let $W_n \subset A^n$ denote the set of words $(a_0,\dots,a_{n-1})$ with
\begin{align*}
\pi_{a_0}\prod_{r=0}^{n-2} P_{a_r a_{r+1}} > 0.
\end{align*}
Using the logarithmic factorization from the previous step only on words in $W_n$, where every displayed logarithm is defined, we obtain
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)
={}& -\sum_{(a_0,\dots,a_{n-1}) \in W_n} \pi_{a_0}\prod_{r=0}^{n-2} P_{a_r a_{r+1}}\log \pi_{a_0} \\
& -\sum_{s=0}^{n-2}\sum_{(a_0,\dots,a_{n-1}) \in W_n} \pi_{a_0}\prod_{r=0}^{n-2} P_{a_r a_{r+1}}\log P_{a_s a_{s+1}}.
\end{align*}
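For the initial term, fix $a_0=i$ with $\pi_i>0$. Words outside $W_n$ carry zero weight, so the sum may be extended over all $(a_1,\dots,a_{n-1}) \in A^{n-1}$, and summing the transition products over these coordinates gives $1$ because every row of $P$ sums to $1$. Thus the initial contribution is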
\begin{align*}
-\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i.
\end{align*}
Fix $s \in \{0,\dots,n-2\}$ and a pair $(i,j) \in A \times A$. Summing the word probabilities over all words with $a_s=i$ and $a_{s+1}=j$ gives the joint probability of this adjacent pair. Explicitly, the coordinates before $s$ sum to the marginal $\pi_i$ because $\pi P^s=\pi$, the transition from $i$ to $j$ contributes $P_{ij}$, and the coordinates after $s+1$ sum to $1$ by repeated row-stochasticity of $P$. Hence
\begin{align*}
\mu_{\pi,P}(\{x \in A^{\mathbb{Z}} : x_s = i,\ x_{s+1}=j\})
= \pi_i P_{ij}.
\end{align*}
Therefore the $s$-th transition contribution is
\begin{align*}
-\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
This expression is independent of $s$, so
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)
= -\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i
-(n-1)\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
[guided]
The block probability is a product: one initial weight and then transition weights. Entropy involves the negative average of the logarithm of that probability, so the logarithm turns the product into a sum. This is the point where the Markov structure becomes an additive entropy formula.
Let $W_n \subset A^n$ be the set of words $(a_0,\dots,a_{n-1})$ with positive cylinder measure. From the previous step, for every word in $W_n$,
\begin{align*}
\log \mu_{\pi,P}(C(a_0,\dots,a_{n-1}))= \log \pi_{a_0} + \sum_{r=0}^{n-2} \log P_{a_r a_{r+1}}.
\end{align*}
The restriction to $W_n$ matters because it ensures that $\pi_{a_0}$ and each transition factor along the word are positive, so every logarithm written here is defined. Substituting this identity into the entropy sum gives an initial part
\begin{align*}
-\sum_{(a_0,\dots,a_{n-1}) \in W_n} \pi_{a_0}\prod_{r=0}^{n-2} P_{a_r a_{r+1}}\log \pi_{a_0}
\end{align*}
and transition parts
\begin{align*}
-\sum_{s=0}^{n-2}\sum_{(a_0,\dots,a_{n-1}) \in W_n} \pi_{a_0}\prod_{r=0}^{n-2} P_{a_r a_{r+1}}\log P_{a_s a_{s+1}}.
\end{align*}
We first simplify the initial part. Fix $a_0=i$. Because $P$ is row-stochastic, summing over all possible continuations $a_1,\dots,a_{n-1}$ gives
\begin{align*}
\sum_{(a_1,\dots,a_{n-1}) \in A^{n-1}}
\prod_{r=0}^{n-2} P_{a_r a_{r+1}}
= 1.
\end{align*}
Thus the initial contribution is
\begin{align*}
-\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i.
\end{align*}
Now fix a transition time $s \in \{0,\dots,n-2\}$. The term $\log P_{a_s a_{s+1}}$ depends only on the adjacent pair $(a_s,a_{s+1})$. Therefore summing over all other coordinates collapses the word probability to the probability that this adjacent pair equals $(i,j)$. The sum over past coordinates gives the time-$s$ marginal $\pi_i$ by stationarity, the selected transition contributes $P_{ij}$, and the sum over future continuations is $1$ because each row of $P$ sums to $1$. Thus
\begin{align*}
\mu_{\pi,P}(\{x \in A^{\mathbb{Z}} : x_s = i,\ x_{s+1}=j\})
= \pi_i P_{ij}.
\end{align*}
The equality uses stationarity: at every time $s$, the one-coordinate marginal is $\pi$, and then the Markov transition from $i$ to $j$ has probability $P_{ij}$.
Hence the contribution from this fixed transition time is
\begin{align*}
-\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
This does not depend on $s$. Since there are $n-1$ transition times in an $n$-block, the full block entropy is
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)
= -\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i
-(n-1)\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
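The block entropy formula can be checked by hand in a small case. As a sketch, take the hypothetical chain $A=\{0,1\}$ with $P_{00}=P_{01}=\tfrac12$, $P_{10}=1$, $P_{11}=0$, and $\pi=(\tfrac23,\tfrac13)$; this is an illustrative choice, not part of the statement, and one checks directly that $\pi P=\pi$. For $n=2$ the words of positive measure are $(0,0)$, $(0,1)$, and $(1,0)$, each of measure $\tfrac13$, while $(1,1)$ has measure $0$. Hence, directly from the definition,
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_2) = -3\cdot\tfrac13\log\tfrac13 = \log 3.
\end{align*}
On the other hand, the two sums in the formula are
\begin{align*}
-\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i &= -\tfrac23\log\tfrac23-\tfrac13\log\tfrac13 = \log 3-\tfrac23\log 2, \\
-\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij} &= \tfrac23\log 2+\tfrac13\cdot 0 = \tfrac23\log 2.
\end{align*}
With $n-1=1$ their sum is $\log 3$, in agreement with the direct computation.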
[/guided]
[/step]
[step:Divide by the block length and pass to the entropy rate]
Define the finite initial entropy constant
\begin{align*}
H(\pi) := -\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i.
\end{align*}
Define the averaged transition entropy
\begin{align*}
H(P \mid \pi) := -\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
The previous step gives
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)= H(\pi) + (n-1)H(P \mid \pi).
\end{align*}
Therefore
\begin{align*}
\frac{1}{n}H_{\mu_{\pi,P}}(\mathcal{Q}_n)= \frac{H(\pi)}{n} + \frac{n-1}{n}H(P \mid \pi).
\end{align*}
Since $A$ is finite, both $H(\pi)$ and $H(P \mid \pi)$ are finite. Passing to the limit as $n \to \infty$ gives
\begin{align*}
h_{\mu_{\pi,P}}(\sigma)= H(P \mid \pi)= -\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
This is the claimed entropy formula.
[guided]
The last step is only a normalization by the block length, but it is where the initial distribution disappears. Define
\begin{align*}
H(\pi) := -\sum_{i \in A : \pi_i > 0} \pi_i \log \pi_i.
\end{align*}
This is finite because $A$ has finitely many symbols. Also define
\begin{align*}
H(P \mid \pi) := -\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
This quantity is finite for the same reason: there are only finitely many pairs $(i,j) \in A \times A$, and terms with $P_{ij}=0$ are omitted.
The block entropy formula proved in the previous step is
\begin{align*}
H_{\mu_{\pi,P}}(\mathcal{Q}_n)= H(\pi) + (n-1)H(P \mid \pi).
\end{align*}
Dividing by $n$ gives
\begin{align*}
\frac{1}{n}H_{\mu_{\pi,P}}(\mathcal{Q}_n)= \frac{H(\pi)}{n} + \frac{n-1}{n}H(P \mid \pi).
\end{align*}
Now $H(\pi)$ is a fixed finite number, so $H(\pi)/n \to 0$. Also $(n-1)/n \to 1$. Therefore, using the block entropy rate formula from the first step,
\begin{align*}
h_{\mu_{\pi,P}}(\sigma)= H(P \mid \pi)= -\sum_{i \in A} \pi_i \sum_{j \in A : P_{ij} > 0} P_{ij}\log P_{ij}.
\end{align*}
This is exactly the asserted entropy formula for the stationary Markov measure.
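To see the normalization at work, consider the hypothetical chain $A=\{0,1\}$ with $P_{00}=P_{01}=\tfrac12$, $P_{10}=1$, $P_{11}=0$, and $\pi=(\tfrac23,\tfrac13)$, an illustrative choice that is not part of the statement and for which $\pi P=\pi$. Here $H(\pi)=\log 3-\tfrac23\log 2$ and $H(P \mid \pi)=\tfrac23\log 2$, because the row of state $0$ has entropy $\log 2$ and the row of state $1$ has entropy $0$. Under these assumptions,
\begin{align*}
\frac{1}{n}H_{\mu_{\pi,P}}(\mathcal{Q}_n)
= \frac{\log 3-\tfrac23\log 2}{n}+\frac{n-1}{n}\cdot\tfrac23\log 2
\longrightarrow \tfrac23\log 2
\end{align*}
as $n \to \infty$, which is approximately $0.462$ when $\log$ is the natural logarithm. The initial entropy affects only the $O(1/n)$ correction and not the limit.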
[/guided]
[/step]