[guided]The earlier compact-core construction fixed the bad set $B$ before $n$ was chosen. This is essential: the covering number $M$ is then fixed while the orbit length grows. The price is that an orbit segment is not required to avoid $B$ at every time; instead we control the set of points whose orbit visits $B$ too often.
For $n\in\mathbb N$, the function
\begin{align*}
N_n(x):=\sum_{k=0}^{n-1}\mathbb{1}_B(T^k x)
\end{align*}
counts the number of bad visits of $x$ during the first $n$ iterates. Since $T$ preserves $m_G$, its average is
\begin{align*}
\int_G N_n(x)\,d m_G(x)=\sum_{k=0}^{n-1}m_G(T^{-k}B)=n\,m_G(B)<n\alpha^2.
\end{align*}
Thus [Markov's inequality](/theorems/741) applied to the nonnegative measurable function $N_n$ with threshold $\alpha n$ gives a mostly good set
\begin{align*}
C_n:=\{x\in G:N_n(x)\leq\alpha n\}
\end{align*}
with
\begin{align*}
m_G(G\setminus C_n)<\alpha.
\end{align*}
Now fix one covering ball $B(z_\ell,\rho/2)$. If $x,y\in B(z_\ell,\rho/2)\cap C_n$, then $d(x,y)<\rho$, and the isometry property gives $d(T^k x,T^k y)<\rho$ for every $k$. Whenever both iterates avoid $B$, they lie in compact cores. Two different compact cores are at distance at least $2\rho$, so the two iterates must lie in the same core and therefore have the same partition symbol. Hence the two length-$n$ names can disagree only at times that are bad for $x$ or bad for $y$, at most $2\alpha n$ times in total.
Consequently, inside one covering ball, all names lie in a Hamming ball of radius $2\alpha n$ in the alphabet $\{1,\dots,r\}$. The number of such words is bounded by
\begin{align*}
\sum_{j=0}^{\lfloor 2\alpha n\rfloor}\binom{n}{j}r^j.
\end{align*}
We estimate this sum directly. Let $\beta(t)=-t\log t-(1-t)\log(1-t)$ be the binary entropy function. Stirling's inequality gives $\binom{n}{j}\leq\exp(n\beta(j/n))$ for $0\leq j\leq n$, and $\beta$ is increasing on $(0,1/2)$. Since $2\alpha<1/2$, every $0\leq j\leq\lfloor2\alpha n\rfloor$ satisfies
\begin{align*}
\binom{n}{j}r^j\leq\exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
There are at most $\lfloor2\alpha n\rfloor+1$ possible values of $j$, so
\begin{align*}
\sum_{j=0}^{\lfloor 2\alpha n\rfloor}\binom{n}{j}r^j\leq(\lfloor2\alpha n\rfloor+1)\exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
The extra factor is harmless in the entropy rate because its logarithm divided by $n$ tends to zero. Thus, multiplying by the fixed number $M$ of covering balls, the mostly good set $C_n$ meets at most
\begin{align*}
L_n:=M(\lfloor2\alpha n\rfloor+1)\exp(n\beta(2\alpha)+2\alpha n\log r)
\end{align*}
atoms of the join $\mathcal P_0^{n-1}$.
It remains to translate this atom count into entropy. Let
\begin{align*}
S_n:=\{C_n,G\setminus C_n\}
\end{align*}
be the two-set partition. The finite entropy decomposition with respect to $S_n$ says that the information in the join is bounded by the information in the two-set split, plus the conditional information after the side of the split is known:
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq H_{m_G}(S_n)+H_{m_G}(\mathcal P_0^{n-1}\mid S_n).
\end{align*}
This is the finite-partition identity obtained by grouping the atoms of $\mathcal P_0^{n-1}$ according to their intersection with $C_n$ and $G\setminus C_n$. On $C_n$, at most $L_n$ atoms occur, so the conditional entropy there is at most $\log L_n$. On $G\setminus C_n$, the mass is less than $\alpha$ and there are at most $r^n$ possible atoms, so the conditional contribution is at most $\alpha n\log r$. The binary entropy of $S_n$ is at most $\log 2$. Thus
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq \log 2+\log M+\log(\lfloor2\alpha n\rfloor+1)+n\beta(2\alpha)+2\alpha n\log r+\alpha n\log r.
\end{align*}
After dividing by $n$ and letting $n\to\infty$, the terms $\log 2$, $\log M$, and $\log(\lfloor2\alpha n\rfloor+1)$ divided by $n$ disappear, leaving
\begin{align*}
h_{m_G}(T,\mathcal P)\leq \beta(2\alpha)+3\alpha\log r.
\end{align*}
Finally let $\alpha\downarrow0$. Both terms on the right tend to zero, so $h_{m_G}(T,\mathcal P)=0$.[/guided]