[proofplan]
We encode the entropies of the iterated joins as a numerical sequence $a_n=H_\mu(\mathcal P_0^{n-1})$. The join identity for the first $m+n$ iterates and the [entropy inequality](/theorems/6729) for joins give subadditivity, $a_{m+n}\le a_m+a_n$. A direct proof of the subadditive sequence lemma then shows that $a_n/n$ has a limit equal to its infimum.
[/proofplan]
[step:Define the entropy sequence and prove it is finite]
Let $N$ denote the number of atoms of the finite partition $\mathcal P$. For each integer $n\ge 1$, define the finite partition
\begin{align*}
\mathcal P_0^{n-1}=\bigvee_{k=0}^{n-1}T^{-k}\mathcal P
\end{align*}
and define the real number
\begin{align*}
a_n=H_\mu(\mathcal P_0^{n-1}).
\end{align*}
The partition $\mathcal P_0^{n-1}$ has at most $N^n$ atoms. Hence, with the convention $0\log 0=0$, its entropy satisfies $0\le a_n\le \log(N^n)=n\log N$. Thus every $a_n$ is a finite nonnegative real number.
[guided]
The finite partition $\mathcal P$ gives only finitely many names at each time. A point of $X$ belongs to one atom of $\mathcal P$, one atom of $T^{-1}\mathcal P$, and so on through $T^{-(n-1)}\mathcal P$, so it has at most $N^n$ possible $n$-names. Shannon entropy of a probability vector on at most $N^n$ atoms lies between $0$ and $\log(N^n)$, so $a_n=H_\mu(\mathcal P_0^{n-1})$ is finite for every $n$.
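The upper bound can be made explicit with Jensen's inequality. The following is a sketch in which $p_1,\ldots,p_K$ denote the measures of the atoms of $\mathcal P_0^{n-1}$ and $K\le N^n$ is their number (both symbols are introduced here only for illustration). Using concavity of $\log$ and discarding the atoms of measure zero,
\begin{align*}
H_\mu(\mathcal P_0^{n-1})=\sum_{p_i>0}p_i\log\frac{1}{p_i}\le\log\Bigl(\sum_{p_i>0}p_i\cdot\frac{1}{p_i}\Bigr)=\log\#\{i:p_i>0\}\le\log K\le n\log N.
\end{align*}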
[/guided]
[/step]
[step:Split the long join into two shorter joins]
Let $m,n\ge 1$ be integers. The partition recording the first $m+n$ iterates splits as
\begin{align*}
\mathcal P_0^{m+n-1}=\mathcal P_0^{m-1}\vee T^{-m}\mathcal P_0^{n-1}.
\end{align*}
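This identity can be checked directly from the definition. The computation below is a sketch that uses only the fact that preimages commute with intersections, so that $T^{-m}(\mathcal A\vee\mathcal C)=T^{-m}\mathcal A\vee T^{-m}\mathcal C$ for finite partitions $\mathcal A$ and $\mathcal C$:
\begin{align*}
\mathcal P_0^{m-1}\vee T^{-m}\mathcal P_0^{n-1}
&=\bigvee_{k=0}^{m-1}T^{-k}\mathcal P\ \vee\ T^{-m}\bigvee_{j=0}^{n-1}T^{-j}\mathcal P\\
&=\bigvee_{k=0}^{m-1}T^{-k}\mathcal P\ \vee\ \bigvee_{j=0}^{n-1}T^{-(m+j)}\mathcal P\\
&=\bigvee_{k=0}^{m+n-1}T^{-k}\mathcal P=\mathcal P_0^{m+n-1}.
\end{align*}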
For finite measurable partitions $\mathcal A$ and $\mathcal C$, the entropy inequality
\begin{align*}
H_\mu(\mathcal A\vee\mathcal C)\le H_\mu(\mathcal A)+H_\mu(\mathcal C)
\end{align*}
follows from the chain rule $H_\mu(\mathcal A\vee\mathcal C)=H_\mu(\mathcal A)+H_\mu(\mathcal C\mid\mathcal A)$ and the bound $H_\mu(\mathcal C\mid\mathcal A)\le H_\mu(\mathcal C)$. Applying this inequality with $\mathcal A=\mathcal P_0^{m-1}$ and $\mathcal C=T^{-m}\mathcal P_0^{n-1}$ gives
\begin{align*}
a_{m+n}\le a_m+H_\mu(T^{-m}\mathcal P_0^{n-1}).
\end{align*}
The atoms of $T^{-m}\mathcal P_0^{n-1}$ are the sets $T^{-m}A$ with $A$ an atom of $\mathcal P_0^{n-1}$, and $\mu(T^{-m}A)=\mu(A)$ because $T$ preserves $\mu$. If some preimage $T^{-m}A$ is empty, then $\mu(A)=0$ and $A$ contributes nothing to the entropy under the convention $0\log 0=0$. Therefore
\begin{align*}
H_\mu(T^{-m}\mathcal P_0^{n-1})=H_\mu(\mathcal P_0^{n-1})=a_n.
\end{align*}
Thus
\begin{align*}
a_{m+n}\le a_m+a_n.
\end{align*}
[guided]
The join $\mathcal P_0^{m+n-1}$ records the atoms seen at times $0,1,\ldots,m+n-1$. The first $m$ records form $\mathcal P_0^{m-1}$, and the remaining $n$ records form the pullback by $T^m$ of the partition $\mathcal P_0^{n-1}$. Entropy of a join is at most the sum of the entropies of the two joined partitions, so the only remaining point is to identify the second entropy. Since $T$ preserves $\mu$, pulling a finite partition back by $T^m$ preserves the list of atom measures, and Shannon entropy depends only on that list. Hence $H_\mu(T^{-m}\mathcal P_0^{n-1})=a_n$, giving $a_{m+n}\le a_m+a_n$.
[/guided]
[/step]
[step:Derive convergence from subadditivity]
Set
\begin{align*}
\alpha=\inf_{j\ge 1}\frac{a_j}{j}.
\end{align*}
The inequalities from the previous step imply that the sequence $(a_n)_{n\ge 1}$ is subadditive. Since $a_j\ge 0$ for all $j$, the number $\alpha$ lies in $[0,\infty)$.
Fix $\varepsilon>0$. By the definition of the infimum, choose an integer $r\ge 1$ such that
\begin{align*}
\frac{a_r}{r}\le \alpha+\varepsilon.
\end{align*}
For every integer $n\ge 1$, write $n=qr+s$ with integers $q\ge 0$ and $0\le s<r$. Repeated subadditivity gives
\begin{align*}
a_n\le q a_r+a_s,
\end{align*}
where the term $a_s$ is omitted when $s=0$. Let
\begin{align*}
C_r=\max\bigl(\{a_s:1\le s<r\}\cup\{0\}\bigr).
\end{align*}
Then $a_n\le q a_r+C_r$. Since $qr\le n$ and $a_r\ge 0$, dividing by $n$ gives
\begin{align*}
\frac{a_n}{n}\le \frac{q r}{n}\frac{a_r}{r}+\frac{C_r}{n}\le \alpha+\varepsilon+\frac{C_r}{n}.
\end{align*}
Because $C_r$ does not depend on $n$, taking the limit superior as $n\to\infty$ yields
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\le \alpha+\varepsilon.
\end{align*}
Since $\varepsilon>0$ was arbitrary, $\limsup_{n\to\infty}a_n/n\le\alpha$. The definition of $\alpha$ also gives $\alpha\le a_n/n$ for every $n$, so $\alpha\le\liminf_{n\to\infty}a_n/n$. Therefore
\begin{align*}
\lim_{n\to\infty}\frac{a_n}{n}=\alpha.
\end{align*}
[guided]
Subadditivity says that a long block can be covered by many blocks of a fixed length $r$, plus a remainder shorter than $r$. Choose $r$ so that $a_r/r$ is within $\varepsilon$ of the best possible average $\alpha$. If $n=qr+s$ with $0\le s<r$, subadditivity bounds $a_n$ by $q$ copies of $a_r$ and one remainder cost. After division by $n$, the remainder cost is at most $C_r/n$, which tends to $0$ because $C_r$ depends only on $r$. This proves that the upper limiting average is at most $\alpha+\varepsilon$, while every average is at least $\alpha$ by definition of the infimum. Hence the averages converge to $\alpha$.
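As a concrete instance of the covering argument, take block length $r=3$ and total length $11=3\cdot 3+2$ (these values are chosen only for illustration). Peeling off one block of length $3$ at a time gives
\begin{align*}
a_{11}\le a_3+a_8\le 2a_3+a_5\le 3a_3+a_2,
\end{align*}
so that
\begin{align*}
\frac{a_{11}}{11}\le\frac{9}{11}\cdot\frac{a_3}{3}+\frac{a_2}{11}\le\frac{a_3}{3}+\frac{C_3}{11}.
\end{align*}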
[/guided]
[/step]
[step:Identify the entropy rate]
By the definition of $a_n$, the convergence just proved is exactly
\begin{align*}
\lim_{n\to\infty}\frac{1}{n}H_\mu(\mathcal P_0^{n-1})=\alpha.
\end{align*}
The first step showed that $0\le a_n/n\le\log N$ for every $n\ge 1$, so $\alpha\in[0,\log N]\subset[0,\infty)$. Therefore the entropy rate $h_\mu(T,\mathcal P)$ exists in $[0,\infty)$ and is bounded above by $\log N$.
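For orientation, here is a standard example that is not part of the proof. Assume that $T$ is the shift on $\{0,1\}^{\mathbb N}$ with the product measure of weights $(p,1-p)$, and that $\mathcal P$ is the partition by the zeroth coordinate, so $N=2$. The atoms of $\mathcal P_0^{n-1}$ are then the cylinders fixing the first $n$ coordinates, and independence of the coordinates gives
\begin{align*}
a_n=n\bigl(-p\log p-(1-p)\log(1-p)\bigr),\qquad \frac{a_n}{n}=-p\log p-(1-p)\log(1-p)\le\log 2.
\end{align*}
Under these assumptions the sequence $a_n/n$ is constant, so the limit and the infimum coincide trivially.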
[/step]