[proofplan]
The inequality $h_\mu(T)\ge h_\mu(T,\mathcal P)$ follows from the definition of the entropy of $T$ as a supremum over finite partitions, because finite coarsenings of $\mathcal P$ approximate $\mathcal P$ in entropy. For the reverse inequality, we fix an arbitrary finite partition $\mathcal Q$ and approximate it, in conditional entropy, by a finite time-window partition generated by $\mathcal P$. The entropy rate of that finite window is bounded by the entropy rate of $\mathcal P$, because the extra time indices form a boundary window of fixed length whose contribution vanishes after division by $n$. Taking the approximation error to zero gives $h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)$ for every finite $\mathcal Q$, hence the supremum $h_\mu(T)$ is no larger than $h_\mu(T,\mathcal P)$.
[/proofplan]
[step:Set the notation for time-window partitions and entropy rates]
For integers $a\le b$, define the countable time-window partition
\begin{align*}
\mathcal P_{[a,b]}:=\bigvee_{k=a}^{b}T^{-k}\mathcal P.
\end{align*}
For a finite measurable partition $\mathcal Q$, define similarly
\begin{align*}
\mathcal Q_{[0,n-1]}:=\bigvee_{k=0}^{n-1}T^{-k}\mathcal Q.
\end{align*}
The entropy rate of the partition $\mathcal Q$ is
\begin{align*}
h_\mu(T,\mathcal Q):=\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal Q_{[0,n-1]}),
\end{align*}
and the same definition applies to the countable finite-entropy partition $\mathcal P$. The limit exists because the sequence $a_n:=H_\mu(\mathcal P_{[0,n-1]})$ is finite and subadditive. Finiteness follows from $H_\mu(\mathcal P)<\infty$ and the entropy subadditivity bound
\begin{align*}
H_\mu(\mathcal P_{[0,n-1]})\le \sum_{k=0}^{n-1}H_\mu(T^{-k}\mathcal P)=nH_\mu(\mathcal P)<\infty.
\end{align*}
Subadditivity follows from invariance of $\mu$ and the [Chain Rule for Entropy](/theorems/1635):
\begin{align*}
H_\mu(\mathcal P_{[0,m+n-1]})\le H_\mu(\mathcal P_{[0,m-1]})+H_\mu(T^{-m}\mathcal P_{[0,n-1]})=H_\mu(\mathcal P_{[0,m-1]})+H_\mu(\mathcal P_{[0,n-1]}).
\end{align*}
Thus the finite-valued subadditive form of Fekete's lemma applies to $(a_n)_{n\ge1}$.
[guided]
The notation $\mathcal P_{[a,b]}$ records the atoms of $\mathcal P$ seen from time $a$ through time $b$. This is the partition whose atoms carry the finite names used in entropy computations. The entropy rate $h_\mu(T,\mathcal P)$ is defined from the averages $H_\mu(\mathcal P_{[0,n-1]})/n$, so we must first know that those averages have a limit.
For every $n$, the finite entropy assumption and [subadditivity of entropy](/theorems/1634) give
\begin{align*}
H_\mu(\mathcal P_{[0,n-1]})\le nH_\mu(\mathcal P)<\infty.
\end{align*}
If a block has length $m+n$, we split it into its first $m$ coordinates and its last $n$ coordinates:
\begin{align*}
\mathcal P_{[0,m+n-1]}=\mathcal P_{[0,m-1]}\vee T^{-m}\mathcal P_{[0,n-1]}.
\end{align*}
The chain rule for entropy bounds the entropy of the join by the sum of the two entropies, and $T$-invariance of $\mu$ preserves the atom measures of the shifted partition. Hence
\begin{align*}
H_\mu(\mathcal P_{[0,m+n-1]})\le H_\mu(\mathcal P_{[0,m-1]})+H_\mu(\mathcal P_{[0,n-1]}).
\end{align*}
Thus the sequence $a_n=H_\mu(\mathcal P_{[0,n-1]})$ is finite and subadditive. The finite-valued subadditive sequence argument then gives the existence of $\lim_{n\to\infty}a_n/n$, which is the entropy rate $h_\mu(T,\mathcal P)$.
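As a sketch of why finiteness and subadditivity suffice, here is the standard division argument behind Fekete's lemma. The symbols $q$, $s$, and $C_m$ are introduced only for this illustration, and we set $a_0:=0$. Fix $m\ge1$ and write $n=qm+s$ with integers $q\ge0$ and $0\le s<m$. Subadditivity applied repeatedly, together with $a_m\ge0$, gives
\begin{align*}
\frac{a_n}{n}\le\frac{q\,a_m+a_s}{n}\le\frac{a_m}{m}+\frac{C_m}{n},\qquad C_m:=\max_{0\le s<m}a_s.
\end{align*}
Letting $n\to\infty$ with $m$ fixed, and then taking the infimum over $m$, yields
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\le\inf_{m\ge1}\frac{a_m}{m}\le\liminf_{n\to\infty}\frac{a_n}{n},
\end{align*}
so the limit exists and equals $\inf_{m\ge1}a_m/m$.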
[/guided]
[/step]
[step:Approximate an arbitrary finite partition by a finite generator window]
Let $\mathcal Q=\{Q_1,\dots,Q_r\}$ be a finite measurable partition of $X$. For each $N\in\mathbb N$, define the finite-window $\sigma$-algebra
\begin{align*}
\mathcal F_N:=\sigma(\mathcal P_{[-N,N]}).
\end{align*}
For a sub-$\sigma$-algebra $\mathcal F\subseteq\mathcal A$, define the conditional entropy of the finite partition $\mathcal Q$ relative to $\mathcal F$ by
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal F):=\int_X -\sum_{i=1}^{r}\mu(Q_i\mid\mathcal F)(x)\log\mu(Q_i\mid\mathcal F)(x)\,d\mu(x),
\end{align*}
with the convention $0\log0=0$. Because $\mathcal P$ is a two-sided generator, the increasing family of completed $\sigma$-algebras generated by $(\mathcal F_N)_{N\ge1}$ generates the completion of $\mathcal A$ modulo $\mu$. We take conditional expectations with respect to these completed $\sigma$-algebras, which does not change their $\mu$-a.e. equivalence classes. The martingale convergence theorem for conditional expectations gives $\mu(Q_i\mid\mathcal F_N)\to\mathbb 1_{Q_i}$ in $L^1(X,\mathcal A,\mu)$ for each $i\in\{1,\dots,r\}$. Since the entropy function on the probability simplex in $\mathbb R^r$ is continuous and bounded, the [conditional probability](/page/Conditional%20Probability) vectors converge in measure to a vertex, the corresponding entropy integrands converge in measure to $0$, and bounded convergence along almost-everywhere convergent subsequences gives
\begin{align*}
\lim_{N\to\infty}H_\mu(\mathcal Q\mid\mathcal F_N)=0.
\end{align*}
Fix $\varepsilon>0$. By the limit above, we may choose $N\in\mathbb N$ such that
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon.
\end{align*}
[guided]
We fix the finite partition $\mathcal Q=\{Q_1,\dots,Q_r\}$ because $h_\mu(T)$ is obtained by taking the supremum over such partitions. The generator hypothesis says that observing all translates $T^{-k}\mathcal P$ for $k\in\mathbb Z$ recovers the whole $\sigma$-algebra $\mathcal A$ modulo null sets. Hence, as the symmetric window $[-N,N]$ grows, the finite-window information
\begin{align*}
\mathcal F_N:=\sigma(\mathcal P_{[-N,N]})
\end{align*}
recovers more and more of the information needed to decide which atom of $\mathcal Q$ contains a point.
The quantitative form of this approximation is conditional entropy convergence. For any sub-$\sigma$-algebra $\mathcal F\subseteq\mathcal A$, the conditional entropy of $\mathcal Q$ relative to $\mathcal F$ is
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal F):=\int_X -\sum_{i=1}^{r}\mu(Q_i\mid\mathcal F)(x)\log\mu(Q_i\mid\mathcal F)(x)\,d\mu(x),
\end{align*}
where $0\log0=0$. Each atom $Q_i$ is measurable modulo $\mu$ with respect to the completed $\sigma$-algebra generated by $\bigcup_{N\ge1}\mathcal F_N$. We therefore take conditional expectations relative to the completed $\mathcal F_N$; this only changes conditional expectations on $\mu$-null sets. The martingale convergence theorem for conditional expectations applies to the integrable function $\mathbb 1_{Q_i}:X\to\mathbb R$ and gives
\begin{align*}
\mu(Q_i\mid\mathcal F_N)\to\mathbb 1_{Q_i}
\end{align*}
in $L^1(X,\mathcal A,\mu)$ as $N\to\infty$. Since $\mathcal Q$ has finitely many atoms, these $L^1$ convergences imply [convergence in measure](/page/Convergence%20in%20Measure) of the conditional probability vector to a vertex of the probability simplex. The entropy function on that simplex is continuous and bounded, so the entropy integrands converge in measure to $0$ and are bounded above by $\log r$. Every subsequence has an almost-everywhere convergent further subsequence, and the [dominated convergence theorem](/theorems/4) applied to that further subsequence forces the integrals to converge to $0$; hence the whole sequence satisfies
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal F_N)\to0.
\end{align*}
Thus, for an arbitrary error tolerance $\varepsilon>0$, we may choose $N$ such that
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon.
\end{align*}
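To illustrate how small the entropy integrand becomes near a vertex, consider the hypothetical case in which, at some point $x$, the conditional probability vector puts mass $1-\delta$ on one atom and spreads the remaining mass $\delta\in(0,1)$ evenly over the other $r-1$ atoms. This assumes $r\ge2$, and the symbol $\delta$ is used only for this illustration. The integrand at $x$ is then
\begin{align*}
-(1-\delta)\log(1-\delta)-(r-1)\frac{\delta}{r-1}\log\frac{\delta}{r-1}=-(1-\delta)\log(1-\delta)-\delta\log\delta+\delta\log(r-1),
\end{align*}
which tends to $0$ as $\delta\downarrow0$, in line with the continuity of the entropy function at the vertices of the simplex.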
[/guided]
[/step]
[step:Bound the entropy rate of $\mathcal Q$ by the entropy rate of the finite window]
Let $\mathcal R$ denote the countable partition $\mathcal P_{[-N,N]}$. Since $\mathcal F_N=\sigma(\mathcal R)$, the preceding step gives
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal R)<\varepsilon.
\end{align*}
For every $n\in\mathbb N$, the chain rule for conditional entropy and monotonicity of conditional entropy give
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})\le\sum_{j=0}^{n-1}H_\mu(T^{-j}\mathcal Q\mid T^{-j}\mathcal R).
\end{align*}
The countable partition $\mathcal R$ has finite entropy because it is a finite join of translates of $\mathcal P$, so the conditional entropies above are well defined. Since $T$ preserves $\mu$, taking preimages under $T^{j}$ preserves the measures of all atoms of $\mathcal Q\vee\mathcal R$ and of $\mathcal R$. Therefore each summand equals $H_\mu(\mathcal Q\mid\mathcal R)<\varepsilon$, hence
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})<n\varepsilon.
\end{align*}
Using $H_\mu(\mathcal C)\le H_\mu(\mathcal C\vee\mathcal D)=H_\mu(\mathcal D)+H_\mu(\mathcal C\mid\mathcal D)$ for countable partitions $\mathcal C,\mathcal D$ with $H_\mu(\mathcal D)<\infty$, applied with $\mathcal C=\mathcal Q_{[0,n-1]}$ and $\mathcal D=\mathcal R_{[0,n-1]}$, we obtain
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon.
\end{align*}
[guided]
The partition $\mathcal R=\mathcal P_{[-N,N]}$ is the finite generator window chosen so that $\mathcal Q$ has small conditional entropy relative to $\mathcal R$. We compare the $n$-block name of $\mathcal Q$ with the $n$-block name of this window. The chain rule says that the uncertainty in $\mathcal Q_{[0,n-1]}$ is at most the uncertainty in $\mathcal R_{[0,n-1]}$ plus the remaining uncertainty in $\mathcal Q_{[0,n-1]}$ after $\mathcal R_{[0,n-1]}$ is known:
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]}).
\end{align*}
The conditional entropy of a join is bounded by the sum of the conditional entropies of its coordinates. Therefore
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})\le\sum_{j=0}^{n-1}H_\mu(T^{-j}\mathcal Q\mid T^{-j}\mathcal R).
\end{align*}
Because $T$ preserves $\mu$, each summand is computed from the same joint atom measures as $H_\mu(\mathcal Q\mid\mathcal R)$ and therefore equals it; this common value is less than $\varepsilon$. Hence the total extra uncertainty is less than $n\varepsilon$, and
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon.
\end{align*}
This is the entropy-rate comparison: after division by $n$, the cost of replacing $\mathcal Q$ by the generator window is at most $\varepsilon$.
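For completeness, the coordinatewise bound used above can be obtained in two steps. This is a sketch that assumes two standard facts: conditional entropy is subadditive over joins in its first argument, and it does not increase when the conditioning partition is refined. Since $\mathcal R_{[0,n-1]}$ refines $T^{-j}\mathcal R$ for each $0\le j\le n-1$,
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]}\mid\mathcal R_{[0,n-1]})\le\sum_{j=0}^{n-1}H_\mu(T^{-j}\mathcal Q\mid\mathcal R_{[0,n-1]})\le\sum_{j=0}^{n-1}H_\mu(T^{-j}\mathcal Q\mid T^{-j}\mathcal R)=nH_\mu(\mathcal Q\mid\mathcal R)<n\varepsilon.
\end{align*}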
[/guided]
[/step]
[step:Replace the finite window by a bounded enlargement of the original generator block]
By the definition of $\mathcal R=\mathcal P_{[-N,N]}$,
\begin{align*}
\mathcal R_{[0,n-1]}=\bigvee_{j=0}^{n-1}T^{-j}\mathcal P_{[-N,N]}=\mathcal P_{[-N,n-1+N]}.
\end{align*}
Since $T$ is invertible and preserves $\mu$, pulling this partition back by $T^N$ shifts every time index by $N$ without changing entropy: $T^{-N}\mathcal P_{[-N,n-1+N]}=\mathcal P_{[0,n-1+2N]}$, and the map $A\mapsto T^{-N}A$ is a bijection between atoms, after discarding null atoms, that preserves their $\mu$-measures because $\mu(T^{-N}A)=\mu(A)$ for every $A\in\mathcal A$. Hence
\begin{align*}
H_\mu(\mathcal P_{[-N,n-1+N]})=H_\mu(\mathcal P_{[0,n-1+2N]}).
\end{align*}
Therefore
\begin{align*}
\frac{1}{n}H_\mu(\mathcal Q_{[0,n-1]})\le\frac{1}{n}H_\mu(\mathcal P_{[0,n-1+2N]})+\varepsilon.
\end{align*}
Taking the limit superior as $n\to\infty$ gives
\begin{align*}
h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon,
\end{align*}
because the block $\mathcal P_{[0,n-1+2N]}$ has length $n+2N$, the ratio $(n+2N)/n$ tends to $1$, and the entropy rate of $\mathcal P$ exists.
[guided]
The partition $\mathcal R$ is not a new source of entropy; it is only a bounded time-window of the original generator. Its $n$-block is
\begin{align*}
\mathcal R_{[0,n-1]}=\mathcal P_{[-N,n-1+N]}.
\end{align*}
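This identity can be checked by tracking time indices; the indices $j$, $k$, $\ell$ below are bookkeeping only. Since taking preimages commutes with joins,
\begin{align*}
\bigvee_{j=0}^{n-1}T^{-j}\mathcal P_{[-N,N]}=\bigvee_{j=0}^{n-1}\bigvee_{k=-N}^{N}T^{-(j+k)}\mathcal P=\bigvee_{\ell=-N}^{n-1+N}T^{-\ell}\mathcal P,
\end{align*}
because $j+k$ takes every integer value from $-N$ to $n-1+N$ as $j$ ranges over $0,\dots,n-1$ and $k$ over $-N,\dots,N$. Repeated indices contribute the same partition and do not change the join.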
In the invertible case, pulling back by $T^N$ shifts this two-sided block to the forward block $\mathcal P_{[0,n-1+2N]}$ without changing atom measures, so it does not change entropy. Thus the previous step gives
\begin{align*}
\frac{1}{n}H_\mu(\mathcal Q_{[0,n-1]})\le\frac{1}{n}H_\mu(\mathcal P_{[0,n-1+2N]})+\varepsilon.
\end{align*}
The numerator on the right is the entropy of a block of $\mathcal P$ whose length is $n+2N$. Since $N$ is fixed while $n\to\infty$, the ratio $(n+2N)/n$ tends to $1$, and the existing entropy rate of $\mathcal P$ controls these averages. Taking the limit superior yields
\begin{align*}
h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon.
\end{align*}
This proves the desired upper bound for the single finite partition $\mathcal Q$, up to the arbitrary error $\varepsilon$.
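The passage to the limit can be written out as follows, with $a_n=H_\mu(\mathcal P_{[0,n-1]})$ as in the first step. The block $\mathcal P_{[0,n-1+2N]}$ has length $n+2N$, so
\begin{align*}
\frac{1}{n}H_\mu(\mathcal P_{[0,n-1+2N]})=\frac{n+2N}{n}\cdot\frac{a_{n+2N}}{n+2N}\xrightarrow[n\to\infty]{}1\cdot h_\mu(T,\mathcal P),
\end{align*}
since $N$ is fixed and $a_k/k\to h_\mu(T,\mathcal P)$ as $k\to\infty$.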
[/guided]
[/step]
[step:Take the supremum over finite partitions and recover the countable generator by finite coarsenings]
The number $\varepsilon>0$ was arbitrary, so the previous step gives
\begin{align*}
h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)
\end{align*}
for every finite measurable partition $\mathcal Q$ of $X$. Taking the supremum over all finite $\mathcal Q$ yields
\begin{align*}
h_\mu(T)\le h_\mu(T,\mathcal P).
\end{align*}
It remains to justify the reverse inequality for the countable partition $\mathcal P$. Enumerate the atoms of $\mathcal P$ as $(P_i)_{i\ge1}$, allowing empty atoms if necessary, and define the finite coarsening $\mathcal P_m$ by
\begin{align*}
\mathcal P_m:=\{P_1,\dots,P_m,X\setminus\bigcup_{i=1}^{m}P_i\}.
\end{align*}
For each fixed $m$, the partition $\mathcal P_m$ is finite, so
\begin{align*}
h_\mu(T,\mathcal P_m)\le h_\mu(T).
\end{align*}
Since $\mathcal P_m$ is coarser than $\mathcal P$, the chain rule for conditional entropy and monotonicity of conditional entropy give, for every $n\in\mathbb N$,
\begin{align*}
H_\mu(\mathcal P_{[0,n-1]})\le H_\mu((\mathcal P_m)_{[0,n-1]})+H_\mu(\mathcal P_{[0,n-1]}\mid(\mathcal P_m)_{[0,n-1]}).
\end{align*}
The conditional term is bounded by the sum of the one-time conditional entropies:
\begin{align*}
H_\mu(\mathcal P_{[0,n-1]}\mid(\mathcal P_m)_{[0,n-1]})\le\sum_{k=0}^{n-1}H_\mu(T^{-k}\mathcal P\mid T^{-k}\mathcal P_m)=nH_\mu(\mathcal P\mid\mathcal P_m),
\end{align*}
where the equality uses $T$-invariance of $\mu$. Dividing by $n$ and passing to the limit gives
\begin{align*}
h_\mu(T,\mathcal P)\le h_\mu(T,\mathcal P_m)+H_\mu(\mathcal P\mid\mathcal P_m)\le h_\mu(T)+H_\mu(\mathcal P\mid\mathcal P_m).
\end{align*}
Because $\mathcal P_m$ is a coarsening of $\mathcal P$, the chain rule gives
\begin{align*}
H_\mu(\mathcal P\mid\mathcal P_m)=H_\mu(\mathcal P)-H_\mu(\mathcal P_m).
\end{align*}
The finite entropy assumption $H_\mu(\mathcal P)<\infty$ implies $H_\mu(\mathcal P_m)\uparrow H_\mu(\mathcal P)$ as $m\to\infty$, since the partial entropy sums converge to the countable entropy. Hence $H_\mu(\mathcal P\mid\mathcal P_m)\to0$, and letting $m\to\infty$ yields
\begin{align*}
h_\mu(T,\mathcal P)\le h_\mu(T).
\end{align*}
Combining the two inequalities proves
\begin{align*}
h_\mu(T)=h_\mu(T,\mathcal P)=\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal P_{[0,n-1]}).
\end{align*}
[guided]
The previous step applies to every finite partition $\mathcal Q$. Since $\varepsilon>0$ was arbitrary, it gives $h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)$ for each finite $\mathcal Q$. Taking the supremum over finite partitions in the definition of $h_\mu(T)$ gives
\begin{align*}
h_\mu(T)\le h_\mu(T,\mathcal P).
\end{align*}
For the reverse inequality, approximate the countable partition $\mathcal P$ by finite coarsenings. Enumerate its atoms as $(P_i)_{i\ge1}$ and define
\begin{align*}
\mathcal P_m=\{P_1,\dots,P_m,X\setminus\bigcup_{i=1}^{m}P_i\}.
\end{align*}
Each $\mathcal P_m$ is finite, so its entropy rate is bounded by the supremum:
\begin{align*}
h_\mu(T,\mathcal P_m)\le h_\mu(T).
\end{align*}
The chain rule compares the $n$-block of $\mathcal P$ with the $n$-block of $\mathcal P_m$ and gives
\begin{align*}
H_\mu(\mathcal P_{[0,n-1]})\le H_\mu((\mathcal P_m)_{[0,n-1]})+nH_\mu(\mathcal P\mid\mathcal P_m).
\end{align*}
After division by $n$ and passage to the entropy-rate limit,
\begin{align*}
h_\mu(T,\mathcal P)\le h_\mu(T)+H_\mu(\mathcal P\mid\mathcal P_m).
\end{align*}
Because $H_\mu(\mathcal P)<\infty$, the finite coarsenings exhaust the countable entropy, so $H_\mu(\mathcal P\mid\mathcal P_m)=H_\mu(\mathcal P)-H_\mu(\mathcal P_m)\to0$. Letting $m\to\infty$ gives $h_\mu(T,\mathcal P)\le h_\mu(T)$. Combining the two inequalities proves the theorem in the invertible two-sided case.
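The convergence $H_\mu(\mathcal P\mid\mathcal P_m)\to0$ can also be seen from an explicit tail formula. This is a sketch in which $E_m:=X\setminus\bigcup_{i=1}^{m}P_i$ is notation introduced here, the enumeration $(P_i)_{i\ge1}$ is assumed to list every atom of $\mathcal P$, and the convention $0\log0=0$ is in force. Only the atom $E_m$ of $\mathcal P_m$ is split by $\mathcal P$, and it is split into the atoms $P_i$ with $i>m$, so
\begin{align*}
H_\mu(\mathcal P\mid\mathcal P_m)=-\sum_{i>m}\mu(P_i)\log\frac{\mu(P_i)}{\mu(E_m)}=-\sum_{i>m}\mu(P_i)\log\mu(P_i)+\mu(E_m)\log\mu(E_m)\le-\sum_{i>m}\mu(P_i)\log\mu(P_i).
\end{align*}
The right-hand side is the tail of the convergent series defining $H_\mu(\mathcal P)$, so it tends to $0$ as $m\to\infty$.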
[/guided]
[/step]
[step:Adapt the argument to one-sided non-invertible generators]
Assume now that $T$ is not necessarily invertible and that $\mathcal P$ is a one-sided generator of finite entropy. For a finite partition $\mathcal Q$, define
\begin{align*}
\mathcal F_N:=\sigma(\mathcal P_{[0,N]}).
\end{align*}
The one-sided generator hypothesis gives
\begin{align*}
H_\mu(\mathcal Q\mid\mathcal F_N)\to0.
\end{align*}
Choosing $N$ with $H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon$ and setting $\mathcal R:=\mathcal P_{[0,N]}$, the same conditional-entropy argument gives
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon.
\end{align*}
Here no inverse of $T$ is needed, because
\begin{align*}
\mathcal R_{[0,n-1]}=\mathcal P_{[0,n-1+N]}.
\end{align*}
Dividing by $n$ and passing to the limit superior gives
\begin{align*}
h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon.
\end{align*}
Letting $\varepsilon\downarrow0$ and then taking the supremum over finite $\mathcal Q$ proves $h_\mu(T)\le h_\mu(T,\mathcal P)$ in the one-sided setting. The reverse inequality $h_\mu(T,\mathcal P)\le h_\mu(T)$ follows from the finite-coarsening argument in the previous step, since that argument used only $T$-invariance of $\mu$ and the forward blocks $\mathcal P_{[0,n-1]}$, not invertibility. This proves the one-sided statement.
[guided]
In the non-invertible setting we cannot shift negative time indices forward, so we use only forward windows. The one-sided generator hypothesis says that the increasing $\sigma$-algebras
\begin{align*}
\mathcal F_N=\sigma(\mathcal P_{[0,N]})
\end{align*}
generate $\mathcal A$ modulo $\mu$. Repeating the conditional-entropy approximation from the two-sided case gives, for every finite partition $\mathcal Q$ and every $\varepsilon>0$, an $N$ such that $H_\mu(\mathcal Q\mid\mathcal F_N)<\varepsilon$.
Set $\mathcal R=\mathcal P_{[0,N]}$. The same chain-rule argument gives
\begin{align*}
H_\mu(\mathcal Q_{[0,n-1]})\le H_\mu(\mathcal R_{[0,n-1]})+n\varepsilon.
\end{align*}
Now the $n$-block of $\mathcal R$ is already a forward block:
\begin{align*}
\mathcal R_{[0,n-1]}=\mathcal P_{[0,n-1+N]}.
\end{align*}
Dividing by $n$ and taking the limit superior gives $h_\mu(T,\mathcal Q)\le h_\mu(T,\mathcal P)+\varepsilon$. Letting $\varepsilon$ tend to $0$ and taking the supremum over finite $\mathcal Q$ gives $h_\mu(T)\le h_\mu(T,\mathcal P)$. The finite-coarsening argument proving $h_\mu(T,\mathcal P)\le h_\mu(T)$ used only forward blocks and $T$-invariance, so it remains valid without invertibility. This proves the one-sided generator theorem.
[/guided]
[/step]