[proofplan]
We first show that the coordinate partition is generating: the finite symmetric joins of its shifts determine longer and longer coordinate blocks, and the resulting cylinder sets generate the product $\sigma$-algebra. We then compute the entropy of the forward join $\bigvee_{j=0}^{n-1}T^{-j}\mathcal P$ by identifying its atoms with words of length $n$ and using the product structure of $\mu$. This entropy is exactly $nH(p)$, so the entropy rate of the partition is $H(p)$. Finally, since the coordinate partition is a finite generator in the finite-alphabet case and a countable generator of finite entropy in the countable-alphabet case, the [Kolmogorov-Sinai generator theorem](/theorems/6726) gives $h_\mu(T)=h_\mu(T,\mathcal P)=H(p)$.
[/proofplan]
[step:Show that coordinate cylinders generated by shifted coordinate partitions determine the product $\sigma$-algebra]
Let $X:=A^{\mathbb Z}$, let $\mu:=p^{\mathbb Z}$, and define the shift map $T:X\to X$ by $(Tx)_k=x_{k+1}$ for every $x\in X$ and $k\in\mathbb Z$; hence $(T^jx)_0=x_j$ for every $j\in\mathbb Z$. For each $a\in A$, define the coordinate atom $P_a:=\{x\in X:x_0=a\}$, so $\mathcal P=\{P_a:a\in A\}$.
For each integer $k\in\mathbb Z$, define the coordinate projection
\begin{align*}
\pi_k:X\to A
\end{align*}
by $\pi_k(x)=x_k$. Since $A$ is finite or countable and is equipped with the discrete $\sigma$-algebra of all subsets of $A$, every measurable subset of $A$ is a countable union of singletons, and the product $\sigma$-algebra on $X=A^{\mathbb Z}$ is
\begin{align*}
\mathcal B=\sigma(\pi_k^{-1}(\{a\}):k\in\mathbb Z,\ a\in A).
\end{align*}
For each $a\in A$ and $j\in\mathbb Z$,
\begin{align*}
T^{-j}P_a=\{x\in X:(T^jx)_0=a\}=\{x\in X:x_j=a\}=\pi_j^{-1}(\{a\}).
\end{align*}
Hence
\begin{align*}
\sigma\left(\bigcup_{j\in\mathbb Z}T^{-j}\mathcal P\right)=\sigma(\pi_j^{-1}(\{a\}):j\in\mathbb Z,\ a\in A)=\mathcal B.
\end{align*}
Thus $\mathcal P$ is a generator for $T$.
More concretely, for every $m\in\mathbb N$, the atoms of
\begin{align*}
\bigvee_{j=-m}^{m}T^{-j}\mathcal P
\end{align*}
are precisely the cylinder sets specifying the coordinates $x_{-m},x_{-m+1},\dots,x_m$. Every cylinder set constrains only finitely many coordinates, so for $m$ large enough it is a countable union of such atoms; hence these atoms, taken over all $m$, generate $\mathcal B$.
[guided]
The generator assertion means that the shifted partitions $T^{-j}\mathcal P$, $j\in\mathbb Z$, together generate the full product $\sigma$-algebra: recording, for every time $j\in\mathbb Z$, which atom of $\mathcal P$ contains $T^jx$ recovers every coordinate of $x$. We verify this directly.
Let $X:=A^{\mathbb Z}$, let $\mu:=p^{\mathbb Z}$, and define the shift map $T:X\to X$ by $(Tx)_k=x_{k+1}$ for every $x\in X$ and $k\in\mathbb Z$. Iterating this definition gives $(T^jx)_0=x_j$ for every $j\in\mathbb Z$. For each $a\in A$, define the coordinate atom $P_a:=\{x\in X:x_0=a\}$, so the coordinate partition is $\mathcal P=\{P_a:a\in A\}$.
For each integer $k\in\mathbb Z$, define the coordinate projection
\begin{align*}
\pi_k:X\to A
\end{align*}
by $\pi_k(x)=x_k$. Since $A$ is finite or countable and carries the discrete $\sigma$-algebra, every subset of $A$ is a countable union of singletons, so the product $\sigma$-algebra on $X=A^{\mathbb Z}$ is $\mathcal B=\sigma(\pi_k^{-1}(\{a\}):k\in\mathbb Z,\ a\in A)$.
Now compare these coordinate cylinders with shifted copies of the coordinate partition. For $a\in A$ and $j\in\mathbb Z$, the atom $P_a$ records whether the zeroth coordinate is $a$. Pulling it back by $T^j$ records whether the $j$th coordinate is $a$:
\begin{align*}
T^{-j}P_a=\{x\in X:(T^jx)_0=a\}=\{x\in X:x_j=a\}=\pi_j^{-1}(\{a\}).
\end{align*}
Therefore the $\sigma$-algebra generated by all shifted coordinate partitions is exactly
\begin{align*}
\sigma\left(\bigcup_{j\in\mathbb Z}T^{-j}\mathcal P\right)=\sigma(\pi_j^{-1}(\{a\}):j\in\mathbb Z,\ a\in A)=\mathcal B.
\end{align*}
This proves that $\mathcal P$ is a generator.
The finite joins make the same point in a more concrete way. The atom of
\begin{align*}
\bigvee_{j=-m}^{m}T^{-j}\mathcal P
\end{align*}
containing $x\in X$ is determined exactly by the finite coordinate block
\begin{align*}
(x_{-m},x_{-m+1},\dots,x_m).
\end{align*}
As $m$ increases, these blocks exhaust all finite coordinate information. Since finite-coordinate cylinder sets generate the product $\sigma$-algebra, the coordinate partition is generating.
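The identity $T^{-j}P_a=\pi_j^{-1}(\{a\})$ can be checked on a concrete point. The following Python sketch is illustrative only: it models a point of $A^{\mathbb Z}$ as a function from integers to symbols, and the names `shift`, `inverse_shift`, `iterate_shift`, `in_shifted_atom`, and the sample point `x` are assumptions made for this example rather than part of the proof.

```python
from typing import Callable

Point = Callable[[int], str]  # a point x of A^Z, modeled as k -> x_k


def shift(x: Point) -> Point:
    """Left shift T, defined by (Tx)_k = x_{k+1}."""
    return lambda k: x(k + 1)


def inverse_shift(x: Point) -> Point:
    """Inverse shift, defined by (T^{-1}x)_k = x_{k-1}."""
    return lambda k: x(k - 1)


def iterate_shift(x: Point, j: int) -> Point:
    """Return T^j x for any integer j (negative j uses the inverse shift)."""
    step = shift if j >= 0 else inverse_shift
    for _ in range(abs(j)):
        x = step(x)
    return x


def in_shifted_atom(x: Point, j: int, a: str) -> bool:
    """Decide whether x lies in T^{-j} P_a, i.e. whether (T^j x)_0 = a."""
    return iterate_shift(x, j)(0) == a


# Hypothetical sample point over the alphabet A = {"a", "b", "c"}.
def x(k: int) -> str:
    return "abc"[k % 3]


# Membership in T^{-j} P_a agrees with the coordinate test x_j = a.
agree = all(
    in_shifted_atom(x, j, a) == (x(j) == a)
    for j in range(-5, 6)
    for a in "abc"
)
```

Here `agree` compares the two descriptions of the same set on a window of times $j\in\{-5,\dots,5\}$; a finite check of this kind illustrates the identity but does not replace the argument above.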
[/guided]
[/step]
[step:Identify the atoms of the forward join with finite words]
Fix $n\in\mathbb N$. Define the forward $n$-block partition $\mathcal P_n:=\bigvee_{j=0}^{n-1}T^{-j}\mathcal P$. For a word $w=(a_0,\dots,a_{n-1})\in A^n$, define the cylinder atom $C_w:=\{x\in X:x_0=a_0,\ x_1=a_1,\dots,\ x_{n-1}=a_{n-1}\}$.
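The atoms of $\mathcal P_n$ are the intersections $\bigcap_{j=0}^{n-1}T^{-j}P_{a_j}$, and since $T^{-j}P_{a_j}=\{x\in X:x_j=a_j\}$, these are exactly the sets $C_w$ with $w\in A^n$. Every $C_w$ is nonempty, although it may have measure zero. By the definition of the product measure $\mu=p^{\mathbb Z}$,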
\begin{align*}
\mu(C_w)=\prod_{j=0}^{n-1}p_{a_j}.
\end{align*}
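As a small numerical illustration of the word atoms, the Python sketch below enumerates all words of length $3$ over a three-letter alphabet and assigns each the product measure from the formula above. The distribution `p` and the helper names `cylinder_measure` and `word_measures` are hypothetical choices made for this example.

```python
from itertools import product
from math import prod

# Hypothetical distribution p on the alphabet A = {"a", "b", "c"}.
p = {"a": 0.5, "b": 0.3, "c": 0.2}


def cylinder_measure(word, p):
    """mu(C_w): the product of p[a_j] over the letters a_j of the word w."""
    return prod(p[a] for a in word)


def word_measures(p, n):
    """Map each word w in A^n to the measure mu(C_w) of its cylinder atom."""
    return {w: cylinder_measure(w, p) for w in product(p, repeat=n)}


measures = word_measures(p, 3)
total = sum(measures.values())  # the atoms partition X, so this should be 1
```

The dictionary `measures` has one entry per word in $A^3$, and `total` equals $1$ up to floating-point rounding, reflecting that the cylinders $C_w$ partition $X$.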
[/step]
[step:Compute the entropy of each forward join by the product formula]
Let $H(\mathcal Q)$ denote the Shannon entropy of a finite or countable measurable partition $\mathcal Q$, namely $H(\mathcal Q):=-\sum_{Q\in\mathcal Q}\mu(Q)\log\mu(Q)$, again using $0\log 0:=0$. For $\mathcal P_n$, the preceding step gives
\begin{align*}
H(\mathcal P_n)=-\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{j=0}^{n-1}p_{a_j}\right)\log\left(\prod_{j=0}^{n-1}p_{a_j}\right).
\end{align*}
If some $p_{a_j}=0$, then the corresponding product measure is $0$, and the summand is interpreted as $0$. On the support of the product distribution, the logarithm of the product is the sum of logarithms, so
\begin{align*}
H(\mathcal P_n)=-\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{j=0}^{n-1}p_{a_j}\right)\sum_{j=0}^{n-1}\log p_{a_j}.
\end{align*}
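Each term $\left(\prod_{\ell=0}^{n-1}p_{a_\ell}\right)(-\log p_{a_j})$ is nonnegative, so the finite sum over $j$ may be interchanged with the sum over words by Tonelli's theorem for nonnegative series (in the finite-alphabet case this is just distributivity). The hypothesis $H(p)<\infty$ is needed only to ensure that the resulting value is finite. Hence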
\begin{align*}
H(\mathcal P_n)=\sum_{j=0}^{n-1}\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{\ell=0}^{n-1}p_{a_\ell}\right)(-\log p_{a_j}).
\end{align*}
For each fixed $j$, summing over each coordinate $a_\ell$ with $\ell\neq j$ contributes a factor $\sum_{a\in A}p_a=1$, hence
\begin{align*}
\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{\ell=0}^{n-1}p_{a_\ell}\right)(-\log p_{a_j})=\sum_{a\in A}p_a(-\log p_a)=H(p).
\end{align*}
Therefore
\begin{align*}
H(\mathcal P_n)=nH(p).
\end{align*}
[guided]
The partition $\mathcal P_n$ records the first $n$ coordinates, so its entropy should be the entropy of $n$ independent samples from $p$. We now compute that statement rather than cite it.
For a finite or countable measurable partition $\mathcal Q$, define its entropy by $H(\mathcal Q):=-\sum_{Q\in\mathcal Q}\mu(Q)\log\mu(Q)$, with $0\log 0:=0$. Applying this to the partition $\mathcal P_n$ and using the word atoms $C_w$ from the previous step gives
\begin{align*}
H(\mathcal P_n)=-\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{j=0}^{n-1}p_{a_j}\right)\log\left(\prod_{j=0}^{n-1}p_{a_j}\right).
\end{align*}
The product measure is essential here: the probability of seeing the word $(a_0,\dots,a_{n-1})$ is exactly the product $\prod_{j=0}^{n-1}p_{a_j}$. When this product is nonzero, the logarithm turns products into sums:
\begin{align*}
\log\left(\prod_{j=0}^{n-1}p_{a_j}\right)=\sum_{j=0}^{n-1}\log p_{a_j}.
\end{align*}
If the product is zero, the corresponding entropy contribution is $0$ by convention, so the same entropy computation is obtained by summing over the support of $p$ and then extending by zero to all of $A^n$. Thus
\begin{align*}
H(\mathcal P_n)=-\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{j=0}^{n-1}p_{a_j}\right)\sum_{j=0}^{n-1}\log p_{a_j}.
\end{align*}
We now interchange the finite sum over $j$ with the countable sum over words. In the finite-alphabet case this is just finite distributivity. In the countable-alphabet case, each term $\left(\prod_{\ell=0}^{n-1}p_{a_\ell}\right)(-\log p_{a_j})$ is nonnegative, so Tonelli's theorem for nonnegative series permits the rearrangement; the assumption $H(p)<\infty$ is used only to ensure that the resulting entropy is finite. Therefore
\begin{align*}
H(\mathcal P_n)=\sum_{j=0}^{n-1}\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{\ell=0}^{n-1}p_{a_\ell}\right)(-\log p_{a_j}).
\end{align*}
Fix one coordinate index $j$. The factor involving $a_j$ is $p_{a_j}(-\log p_{a_j})$. All other coordinates contribute total mass $1$, because $\sum_{a\in A}p_a=1$. Thus
\begin{align*}
\sum_{(a_0,\dots,a_{n-1})\in A^n}\left(\prod_{\ell=0}^{n-1}p_{a_\ell}\right)(-\log p_{a_j})=\sum_{a\in A}p_a(-\log p_a)=H(p).
\end{align*}
There are $n$ choices of $j$, each contributing the same value $H(p)$, and hence
\begin{align*}
H(\mathcal P_n)=nH(p).
\end{align*}
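The identity $H(\mathcal P_n)=nH(p)$ can also be checked by brute force for small $n$. The Python sketch below sums $-\mu(C_w)\log\mu(C_w)$ over all words directly, without using the interchange of sums; the distribution `p` and the function names `shannon_entropy` and `join_entropy` are assumptions for this example, and the zero-mass letter is included only to exercise the convention $0\log 0:=0$.

```python
from itertools import product
from math import log, prod


def shannon_entropy(probs):
    """H = -sum q log q over the given probabilities, with 0 log 0 := 0."""
    return -sum(q * log(q) for q in probs if q > 0)


def join_entropy(p, n):
    """Entropy of P_n, computed directly from all |A|^n word measures."""
    return shannon_entropy(prod(p[a] for a in w) for w in product(p, repeat=n))


# Hypothetical distribution; the letter "d" has zero mass on purpose.
p = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.0}
h = shannon_entropy(p.values())

# Pairs (H(P_n), n * H(p)) for n = 1, ..., 5; the two entries should agree.
comparison = [(join_entropy(p, n), n * h) for n in range(1, 6)]
```

Each pair in `comparison` agrees up to floating-point rounding. The enumeration grows like $|A|^n$, so this is a sanity check for small $n$ rather than a practical way to compute the entropy.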
[/guided]
[/step]
[step:Pass from join entropy to the entropy rate of the coordinate partition]
The metric entropy of $T$ relative to $\mathcal P$ is $h_\mu(T,\mathcal P):=\lim_{n\to\infty}\frac{1}{n}H\left(\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\right)$, where the limit exists in general because the sequence $n\mapsto H\left(\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\right)$ is subadditive. Since
\begin{align*}
H\left(\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\right)=H(\mathcal P_n)=nH(p)
\end{align*}
for every $n\in\mathbb N$, we obtain
\begin{align*}
h_\mu(T,\mathcal P)=\lim_{n\to\infty}\frac{nH(p)}{n}=H(p).
\end{align*}
[/step]
[step:Use the generator theorem to identify the system entropy]
In the finite-alphabet case, $\mathcal P$ is a finite generator for the invertible, $\mu$-preserving shift $T$. By the Kolmogorov-Sinai generator theorem for finite generating partitions, the entropy of the system equals the entropy rate of any finite generating partition:
\begin{align*}
h_\mu(T)=h_\mu(T,\mathcal P).
\end{align*}
Therefore
\begin{align*}
h_\mu(T)=H(p).
\end{align*}
In the countable-alphabet case, $\mathcal P$ is a countable generator and the hypothesis $H(p)<\infty$ gives $H(\mathcal P)=H(p)<\infty$. The countable-generator version of the Kolmogorov-Sinai generator theorem applies under this finite-entropy hypothesis, so again
\begin{align*}
h_\mu(T)=h_\mu(T,\mathcal P)=H(p).
\end{align*}
This proves the stated formula in both the finite and countable cases.
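For a concrete countable-alphabet instance, take the hypothetical distribution $p_k=2^{-k}$ on $A=\{1,2,3,\dots\}$, which is not part of the statement and is chosen only for illustration. Its entropy series is $\sum_{k\ge 1}k\,2^{-k}\log 2=2\log 2<\infty$, so the finite-entropy hypothesis holds and the formula gives $h_\mu(T)=2\log 2$. The Python sketch below evaluates partial sums of this series; the names `geometric_mass` and `truncated_entropy` are assumptions for the example.

```python
from math import log


def geometric_mass(k):
    """p_k = 2^{-k} for k = 1, 2, 3, ... (hypothetical example distribution)."""
    return 2.0 ** (-k)


def truncated_entropy(K):
    """Partial sum -sum_{k=1}^{K} p_k log p_k of the entropy series H(p)."""
    return -sum(
        geometric_mass(k) * log(geometric_mass(k)) for k in range(1, K + 1)
    )


limit = 2 * log(2)  # closed form of the full series
partial = [truncated_entropy(K) for K in (5, 10, 20, 60)]
```

The values in `partial` increase toward `limit`, which illustrates that the series defining $H(p)$ converges for this choice of $p$.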
[/step]