[proofplan]
We prove a slightly more general fact: every measure-preserving isometry of a compact metric space equipped with a Borel probability measure has zero measure entropy. The key point is that, after discarding a set of small measure, a finite partition can be replaced by compact pieces separated by a positive distance. Since an isometry preserves distances along orbit segments, the symbolic names of orbits that rarely visit the discarded set fall into a number of small Hamming balls that is bounded independently of the orbit length. The discarded set contributes only an arbitrarily small entropy error per iterate, so every finite partition has zero entropy rate.
[/proofplan]
[step:Reduce the rotation to a measure-preserving isometry]
Let $e: G \times G \to [0,\infty)$ be any metric compatible with the topology of $G$. Define
\begin{align*}
d(x,y):=\sup_{b\in G} e(x+b,y+b)
\end{align*}
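for $x,y\in G$. The supremum is finite because $e$ is continuous, hence bounded, on the compact space $G\times G$. The map $d: G \times G \to [0,\infty)$ is a metric: symmetry and the triangle inequality pass from $e$ to the supremum, and $d(x,y)=0$ forces $e(x,y)=0$ (take $b=0$), hence $x=y$. It is translation-invariant because for every $c\in G$,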
\begin{align*}
d(x+c,y+c)=\sup_{b\in G} e(x+c+b,y+c+b)=\sup_{b\in G} e(x+b,y+b)=d(x,y),
\end{align*}
where the middle equality uses that $b\mapsto b+c$ is a bijection of $G$. The metric $d$ is compatible with the topology induced by $e$. The inequality $e(x,y)\leq d(x,y)$, obtained by taking $b=0$, gives one direction. Conversely, [uniform continuity](/page/Uniform%20Continuity) of the map $A:G\times G\to G$, $A(x,b)=x+b$, on the [compact space](/page/Compact%20Space) $G\times G$ implies that for every $\varepsilon>0$ there is $\eta>0$ such that $e(x,y)<\eta$ gives $e(x+b,y+b)<\varepsilon$ for every $b\in G$; hence $d(x,y)\leq\varepsilon$. Finally, the rotation $T:G\to G$, $T(x)=x+a$, is an isometry for $d$ by translation invariance, because
\begin{align*}
d(Tx,Ty)=d(x+a,y+a)=d(x,y).
\end{align*}
The Haar probability measure $m_G$ is translation-invariant, so for every Borel set $E \subset G$,
\begin{align*}
m_G(T^{-1}E)=m_G(E-a)=m_G(E).
\end{align*}
Hence $T$ is $m_G$-measure-preserving.
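As a concrete illustration, not part of the proof, consider the circle group $G=\mathbb R/\mathbb Z$ with the arc-length metric $e(x,y):=\min_{k\in\mathbb Z}|x-y+k|$; this choice of $G$ and $e$ is an assumption made only for the example. Here $e$ is already translation-invariant, so the construction returns the same metric:
\begin{align*}
d(x,y)=\sup_{b\in G}e(x+b,y+b)=\sup_{b\in G}\min_{k\in\mathbb Z}|(x+b)-(y+b)+k|=e(x,y).
\end{align*}
When $e$ is not translation-invariant, $d$ exceeds $e$ at some pair of points, and taking the supremum is what forces invariance.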
[guided]
We first choose a metric for which the rotation is visibly an isometry. Let $e:G\times G\to[0,\infty)$ be any metric compatible with the topology of $G$, and define $d:G\times G\to[0,\infty)$ by
\begin{align*}
d(x,y):=\sup_{b\in G} e(x+b,y+b).
\end{align*}
The supremum is finite because $G$ is compact and $e$ is bounded on $G\times G$. [Translation invariance](/theorems/4911) follows by reindexing the supremum: for $c\in G$, the map $b\mapsto b+c$ is a bijection of $G$, so
\begin{align*}
d(x+c,y+c)=\sup_{b\in G}e(x+c+b,y+c+b)=\sup_{b\in G}e(x+b,y+b)=d(x,y).
\end{align*}
The metric $d$ induces the same topology as $e$. Since $e(x,y)\leq d(x,y)$, every $d$-small pair is $e$-small. For the converse, use uniform continuity of the addition map $A:G\times G\to G$, $A(x,b)=x+b$, on compact $G\times G$: for each $\varepsilon>0$ there is $\eta>0$ such that $e(x,y)<\eta$ implies $e(x+b,y+b)<\varepsilon$ for all $b\in G$. Taking the supremum over $b\in G$ gives $d(x,y)\leq\varepsilon$. Therefore the $e$- and $d$-topologies agree.
Now the rotation $T:G\to G$, $T(x)=x+a$, is an isometry for $d$, since
\begin{align*}
d(Tx,Ty)=d(x+a,y+a)=d(x,y).
\end{align*}
It is also measure-preserving because Haar probability measure is translation-invariant. Thus for every Borel set $E\subset G$,
\begin{align*}
m_G(T^{-1}E)=m_G(E-a)=m_G(E).
\end{align*}
So the theorem reduces to proving that this measure-preserving isometry has zero entropy.
[/guided]
[/step]
[step:Approximate a finite partition by separated compact pieces]
Let $\mathcal P=\{P_1,\dots,P_r\}$ be a finite Borel partition of $G$, where $r \in \mathbb N$. If $r=1$, then every join $\mathcal P_0^{n-1}$ has one atom, so $H_{m_G}(\mathcal P_0^{n-1})=0$ for every $n\in\mathbb N$ and hence $h_{m_G}(T,\mathcal P)=0$. Thus, for the remaining argument, assume $r\geq 2$.
Let $\delta\in(0,1)$ be a parameter chosen later. By [inner regularity of finite Borel measures on compact metric spaces](/theorems/756), applied to the Borel probability measure $m_G$, for each $i \in \{1,\dots,r\}$ choose a compact set $K_i \subset P_i$ such that
\begin{align*}
m_G(P_i \setminus K_i) < \frac{\delta}{r}.
\end{align*}
Define the bad set
\begin{align*}
B := G \setminus \bigcup_{i=1}^r K_i.
\end{align*}
Then
\begin{align*}
m_G(B) \leq \sum_{i=1}^r m_G(P_i \setminus K_i) < \delta.
\end{align*}
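The inclusion behind this estimate can be written out explicitly. Since the atoms $P_i$ cover $G$ and $K_i\subset P_i$ for each $i$,
\begin{align*}
B=\Bigl(\bigcup_{i=1}^r P_i\Bigr)\setminus\bigcup_{j=1}^r K_j\subset\bigcup_{i=1}^r (P_i\setminus K_i),
\end{align*}
and finite subadditivity of $m_G$ then produces a sum of $r$ terms, each smaller than $\delta/r$.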
The compact sets $K_i$ are pairwise disjoint, because $K_i\subset P_i$ and the atoms $P_i$ are pairwise disjoint. Define the nonempty-core index set
\begin{align*}
I:=\{i\in\{1,\dots,r\}:K_i\neq\varnothing\}.
\end{align*}
Since $m_G(B)<\delta<1$, the set $I$ is nonempty. If $i,j\in I$ and $i\neq j$, define
\begin{align*}
\rho_{ij}:=\inf\{d(x,y):x\in K_i,\ y\in K_j\}.
\end{align*}
Since $K_i$ and $K_j$ are nonempty disjoint compact sets in a [metric space](/page/Metric%20Space), $\rho_{ij}>0$. If $|I|\geq 2$, define
\begin{align*}
\rho:=\frac{1}{2}\min\{\rho_{ij}:i,j\in I,\ i\neq j\}.
\end{align*}
If $|I|=1$, define
\begin{align*}
\rho:=1.
\end{align*}
Then $\rho>0$, and whenever $i,j\in I$ with $i\neq j$, every $x\in K_i$ and $y\in K_j$ satisfy $d(x,y)\geq 2\rho$.
By compactness of $G$, choose points $z_1,\dots,z_M \in G$ such that
\begin{align*}
G \subset \bigcup_{\ell=1}^M B(z_\ell,\rho/2),
\end{align*}
where $M \in \mathbb N$ and $B(z_\ell,\rho/2)=\{x\in G:d(x,z_\ell)<\rho/2\}$.
[guided]
We begin with an arbitrary finite measurable partition
\begin{align*}
\mathcal P=\{P_1,\dots,P_r\}.
\end{align*}
If $r=1$, then every dynamical join has a single atom, so the entropy rate of $\mathcal P$ is already zero. We therefore assume $r\geq2$. The atoms of $\mathcal P$ may have very rough boundaries, so we replace most of each atom by a compact core. For a parameter $\delta\in(0,1)$, [inner regularity of finite Borel measures on compact metric spaces](/theorems/756) gives compact sets $K_i \subset P_i$ satisfying
\begin{align*}
m_G(P_i \setminus K_i) < \frac{\delta}{r}.
\end{align*}
The part not covered by these compact cores is
\begin{align*}
B := G \setminus \bigcup_{i=1}^r K_i.
\end{align*}
Since the $P_i$ form a partition, the only points outside the compact cores lie in the removed pieces $P_i \setminus K_i$, and therefore
\begin{align*}
m_G(B) \leq \sum_{i=1}^r m_G(P_i \setminus K_i) < \delta.
\end{align*}
The reason for using compact cores is separation, but we must ignore empty cores: an empty core contains no orbit points, and the infimum defining its distance to another set would be taken over the empty set. Define
\begin{align*}
I:=\{i\in\{1,\dots,r\}:K_i\neq\varnothing\}.
\end{align*}
Since $m_G(B)<\delta<1$, at least one compact core is nonempty, so $I\neq\varnothing$. If $i,j\in I$ and $i\neq j$, then $K_i$ and $K_j$ are nonempty disjoint compact subsets of the metric space $G$. Hence their distance
\begin{align*}
\rho_{ij}:=\inf\{d(x,y):x\in K_i,\ y\in K_j\}
\end{align*}
is positive. If $|I|\geq 2$, define
\begin{align*}
\rho:=\frac{1}{2}\min\{\rho_{ij}:i,j\in I,\ i\neq j\}.
\end{align*}
If $|I|=1$, define
\begin{align*}
\rho:=1.
\end{align*}
Then $\rho>0$. This means that two points at distance less than $\rho$ cannot lie in two different nonempty compact cores $K_i$ and $K_j$. Finally, compactness of $G$ gives a finite cover by $\rho/2$-balls:
\begin{align*}
G \subset \bigcup_{\ell=1}^M B(z_\ell,\rho/2).
\end{align*}
The number $M$ of covering balls is fixed once the compact cores are fixed, and the radius $\rho/2$ ensures that any two points in one covering ball are less than $\rho$ apart.
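The distance claim for a single covering ball is just the triangle inequality. For $x,y\in B(z_\ell,\rho/2)$,
\begin{align*}
d(x,y)\leq d(x,z_\ell)+d(z_\ell,y)<\frac{\rho}{2}+\frac{\rho}{2}=\rho,
\end{align*}
whereas points taken from two different nonempty cores are at distance at least $2\rho$.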
[/guided]
[/step]
[step:Bound orbit names with few bad visits by Hamming balls]
Fix $\alpha\in(0,1/4)$ and apply the preceding step with $\delta:=\alpha^2$, so that the compact cores satisfy $m_G(B)<\alpha^2$. For $n\in\mathbb N$, define the bad-visit counting function $N_n:G\to\{0,1,\dots,n\}$ by
\begin{align*}
N_n(x):=\sum_{k=0}^{n-1}\mathbb{1}_B(T^k x).
\end{align*}
Since $T$ preserves $m_G$,
\begin{align*}
\int_G N_n(x)\,d m_G(x)=\sum_{k=0}^{n-1}m_G(T^{-k}B)=n\,m_G(B)<n\alpha^2.
\end{align*}
Define the mostly good set $C_n\subset G$ by
\begin{align*}
C_n:=\{x\in G:N_n(x)\leq \alpha n\}.
\end{align*}
[Markov's inequality](/theorems/741) applied to the nonnegative measurable function $N_n$ with threshold $\alpha n$ gives
\begin{align*}
m_G(G\setminus C_n)\leq \frac{1}{\alpha n}\int_G N_n(x)\,d m_G(x)<\alpha.
\end{align*}
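Spelled out, the chain of inequalities behind this bound is
\begin{align*}
m_G(G\setminus C_n)=m_G(\{N_n>\alpha n\})\leq m_G(\{N_n\geq\alpha n\})\leq\frac{1}{\alpha n}\int_G N_n(x)\,d m_G(x)<\frac{n\alpha^2}{\alpha n}=\alpha.
\end{align*}
Choosing the bad set with measure below $\alpha^2$, rather than below $\alpha$, is what leaves a factor $\alpha$ after dividing by the threshold.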
Let $\mathcal P_0^{n-1}$ denote the dynamical join
\begin{align*}
\mathcal P_0^{n-1}:=\bigvee_{k=0}^{n-1} T^{-k}\mathcal P.
\end{align*}
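Each atom of $\mathcal P_0^{n-1}$ corresponds to a word $(i_0,\dots,i_{n-1})\in\{1,\dots,r\}^n$, namely the atom $\bigcap_{k=0}^{n-1}T^{-k}P_{i_k}$; this word is the $\mathcal P_0^{n-1}$-name of every point of the atom.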
Fix $\ell\in\{1,\dots,M\}$. If $B(z_\ell,\rho/2)\cap C_n$ is nonempty, choose a point $x_\ell\in B(z_\ell,\rho/2)\cap C_n$ and let $w_\ell\in\{1,\dots,r\}^n$ be its $\mathcal P_0^{n-1}$-name. For any $y\in B(z_\ell,\rho/2)\cap C_n$, the triangle inequality gives $d(x_\ell,y)<\rho$. Hence for every $k\in\{0,\dots,n-1\}$,
\begin{align*}
d(T^k x_\ell,T^k y)=d(x_\ell,y)<\rho.
\end{align*}
At every time $k$ for which both $T^k x_\ell\notin B$ and $T^k y\notin B$, the points $T^k x_\ell$ and $T^k y$ lie in compact cores $K_i$ and $K_j$ with $i,j\in I$, and their partition symbols at time $k$ are $i$ and $j$ because $K_i\subset P_i$ and $K_j\subset P_j$. If $i\neq j$, then $d(T^k x_\ell,T^k y)\geq 2\rho$, contradicting the preceding inequality. Thus their partition symbols agree at every time outside the union of the two bad-time sets. Since both $x_\ell$ and $y$ belong to $C_n$, this union has cardinality at most $2\alpha n$.
Therefore every name appearing in $B(z_\ell,\rho/2)\cap C_n$ differs from $w_\ell$ in at most $2\alpha n$ positions. The number of such words is at most
\begin{align*}
\sum_{j=0}^{\lfloor 2\alpha n\rfloor}\binom{n}{j}r^j.
\end{align*}
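For a sense of scale, take the illustrative values $n=10$, $\alpha=1/10$, and $r=2$, chosen only for this example. Then $\lfloor 2\alpha n\rfloor=2$, and the bound reads
\begin{align*}
\sum_{j=0}^{2}\binom{10}{j}2^j=1+10\cdot 2+45\cdot 4=201,
\end{align*}
compared with $r^n=2^{10}=1024$ words in total. Each summand counts the ways to choose $j$ positions that differ from $w_\ell$, with at most $r$ symbols available at each chosen position.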
[/step]
[step:Estimate the entropy rate using the splitting by mostly good points]
Let $H_{m_G}(\mathcal Q)$ denote the Shannon entropy of a finite measurable partition $\mathcal Q$:
\begin{align*}
H_{m_G}(\mathcal Q):=-\sum_{Q\in\mathcal Q} m_G(Q)\log m_G(Q),
\end{align*}
with the convention $0\log 0=0$, which is also used in the next formula. Define the binary entropy function $\beta:[0,1]\to[0,\infty)$ by
\begin{align*}
\beta(t):=-t\log t-(1-t)\log(1-t).
\end{align*}
Since $2\alpha<1/2$, we use the elementary binomial entropy bound in the following form. For $0\leq j\leq n$, the coefficient estimate $\binom{n}{j}\leq \exp(n\beta(j/n))$ follows from Stirling's inequality, and $\beta$ is increasing on $[0,1/2]$. If $0\leq j\leq\lfloor2\alpha n\rfloor$, then $j/n\leq2\alpha<1/2$, so $\beta(j/n)\leq\beta(2\alpha)$, and $r^j\leq r^{2\alpha n}$ because $r\geq1$. Therefore, for every such $j$,
\begin{align*}
\binom{n}{j}r^j\leq \exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
Summing over the at most $\lfloor2\alpha n\rfloor+1$ possible values of $j$ gives
\begin{align*}
\sum_{j=0}^{\lfloor 2\alpha n\rfloor}\binom{n}{j}r^j\leq (\lfloor 2\alpha n\rfloor+1)\exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
The factor $\lfloor 2\alpha n\rfloor+1$ is subexponential, since $n^{-1}\log(\lfloor 2\alpha n\rfloor+1)\to0$ as $n\to\infty$.
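The coefficient estimate $\binom{n}{j}\leq\exp(n\beta(j/n))$ can also be checked without Stirling's formula. The following sketch uses only the binomial theorem and is offered as an alternative route. For $1\leq j\leq n-1$ put $p:=j/n$. Then
\begin{align*}
1=\sum_{k=0}^{n}\binom{n}{k}p^k(1-p)^{n-k}\geq\binom{n}{j}p^j(1-p)^{n-j}=\binom{n}{j}\exp\bigl(n[p\log p+(1-p)\log(1-p)]\bigr)=\binom{n}{j}\exp(-n\beta(j/n)).
\end{align*}
For $j\in\{0,n\}$ the binomial coefficient equals $1$ and $\beta(j/n)=0$ under the convention $0\log 0=0$, so the estimate holds with equality.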
Combining this estimate with the $M$ covering balls, the number of atoms of $\mathcal P_0^{n-1}$ that meet $C_n$ is at most
\begin{align*}
L_n:=M(\lfloor 2\alpha n\rfloor+1)\exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
We next prove the entropy splitting estimate used here. Let $\mathcal R_n$ be the collection of atoms of $\mathcal P_0^{n-1}$ that meet $C_n$. Then $|\mathcal R_n|\leq L_n$, while the whole join has at most $r^n$ atoms. Let $S_n$ denote the two-set partition
\begin{align*}
S_n:=\{C_n,G\setminus C_n\}.
\end{align*}
We use the elementary entropy decomposition with respect to the two-set partition $S_n$: the information needed to determine an atom of $\mathcal P_0^{n-1}$ is bounded by first specifying whether the point lies in $C_n$ or $G\setminus C_n$, and then specifying the atom inside that selected piece. Thus
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq H_{m_G}(S_n)+H_{m_G}(\mathcal P_0^{n-1}\mid S_n).
\end{align*}
Conditioning on $C_n$, at most $L_n$ atoms occur, so the conditional entropy is at most $\log L_n$. Conditioning on $G\setminus C_n$, at most $r^n$ atoms occur, so the conditional entropy is at most $n\log r$. Therefore
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq H_{m_G}(S_n)+m_G(C_n)\log L_n+m_G(G\setminus C_n)n\log r.
\end{align*}
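The conditional entropy term can be expanded as follows. Write $m_E(\cdot):=m_G(\cdot\cap E)/m_G(E)$ for the normalized restriction of $m_G$ to a Borel set $E$ of positive measure; this notation is introduced only for the present remark. Then
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1}\mid S_n)=m_G(C_n)\,H_{m_{C_n}}(\mathcal P_0^{n-1})+m_G(G\setminus C_n)\,H_{m_{G\setminus C_n}}(\mathcal P_0^{n-1}),
\end{align*}
where a term is omitted if the corresponding set has measure zero. Each restricted entropy is at most the logarithm of the number of atoms meeting the set in question, which gives $\log L_n$ and $n\log r$ respectively.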
Since $m_G(G\setminus C_n)<\alpha$ and $H_{m_G}(S_n)\leq\log 2$, this implies
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq \log 2+\log M+\log(\lfloor 2\alpha n\rfloor+1)+n\beta(2\alpha)+2\alpha n\log r+\alpha n\log r.
\end{align*}
Dividing by $n$ and taking the limit superior in the definition of the entropy rate of $\mathcal P$ yields
\begin{align*}
h_{m_G}(T,\mathcal P)\leq \beta(2\alpha)+3\alpha\log r.
\end{align*}
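Written out, the division by $n$ gives
\begin{align*}
\frac{1}{n}H_{m_G}(\mathcal P_0^{n-1})\leq\frac{\log 2+\log M}{n}+\frac{\log(\lfloor 2\alpha n\rfloor+1)}{n}+\beta(2\alpha)+3\alpha\log r,
\end{align*}
and the first two terms on the right tend to zero as $n\to\infty$ because $M$ does not depend on $n$.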
The parameter $\alpha\in(0,1/4)$ was arbitrary. Since $\beta(2\alpha)\to0$ and $3\alpha\log r\to0$ as $\alpha\downarrow0$, we obtain
\begin{align*}
h_{m_G}(T,\mathcal P)=0.
\end{align*}
[guided]
The earlier compact-core construction fixed the bad set $B$ before $n$ was chosen. This is essential: the covering number $M$ is then fixed while the orbit length grows. The price is that an orbit segment is not required to avoid $B$ at every time; instead we control the set of points whose orbit visits $B$ too often.
For $n\in\mathbb N$, the function
\begin{align*}
N_n(x):=\sum_{k=0}^{n-1}\mathbb{1}_B(T^k x)
\end{align*}
counts the number of bad visits of $x$ during the first $n$ iterates. Since $T$ preserves $m_G$, its average is
\begin{align*}
\int_G N_n(x)\,d m_G(x)=\sum_{k=0}^{n-1}m_G(T^{-k}B)=n\,m_G(B)<n\alpha^2.
\end{align*}
Thus [Markov's inequality](/theorems/741) applied to the nonnegative measurable function $N_n$ with threshold $\alpha n$ gives a mostly good set
\begin{align*}
C_n:=\{x\in G:N_n(x)\leq\alpha n\}
\end{align*}
with
\begin{align*}
m_G(G\setminus C_n)<\alpha.
\end{align*}
Now fix one covering ball $B(z_\ell,\rho/2)$. If $x,y\in B(z_\ell,\rho/2)\cap C_n$, then $d(x,y)<\rho$, and the isometry property gives $d(T^k x,T^k y)<\rho$ for every $k$. Whenever both iterates avoid $B$, they lie in compact cores. Two different compact cores are at distance at least $2\rho$, so the two iterates must lie in the same core and therefore have the same partition symbol. Hence the two length-$n$ names can disagree only at times that are bad for $x$ or bad for $y$, at most $2\alpha n$ times in total.
Consequently, inside one covering ball, all names lie in a Hamming ball of radius $2\alpha n$ in the alphabet $\{1,\dots,r\}$. The number of such words is bounded by
\begin{align*}
\sum_{j=0}^{\lfloor 2\alpha n\rfloor}\binom{n}{j}r^j.
\end{align*}
We estimate this sum directly. Let $\beta(t)=-t\log t-(1-t)\log(1-t)$, with the convention $0\log 0=0$, be the binary entropy function on $[0,1]$. Stirling's inequality gives $\binom{n}{j}\leq\exp(n\beta(j/n))$ for $0\leq j\leq n$, and $\beta$ is increasing on $[0,1/2]$. Since $2\alpha<1/2$, every $0\leq j\leq\lfloor2\alpha n\rfloor$ satisfies
\begin{align*}
\binom{n}{j}r^j\leq\exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
There are at most $\lfloor2\alpha n\rfloor+1$ possible values of $j$, so
\begin{align*}
\sum_{j=0}^{\lfloor 2\alpha n\rfloor}\binom{n}{j}r^j\leq(\lfloor2\alpha n\rfloor+1)\exp(n\beta(2\alpha)+2\alpha n\log r).
\end{align*}
The extra factor is harmless in the entropy rate because its logarithm divided by $n$ tends to zero. Thus, multiplying by the fixed number $M$ of covering balls, the mostly good set $C_n$ meets at most
\begin{align*}
L_n:=M(\lfloor2\alpha n\rfloor+1)\exp(n\beta(2\alpha)+2\alpha n\log r)
\end{align*}
atoms of the join $\mathcal P_0^{n-1}$.
It remains to translate this atom count into entropy. Let
\begin{align*}
S_n:=\{C_n,G\setminus C_n\}
\end{align*}
be the two-set partition. The finite entropy decomposition with respect to $S_n$ says that the information in the join is bounded by the information in the two-set split, plus the conditional information after the side of the split is known:
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq H_{m_G}(S_n)+H_{m_G}(\mathcal P_0^{n-1}\mid S_n).
\end{align*}
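This inequality follows from the finite-partition identity $H_{m_G}(\mathcal P_0^{n-1}\vee S_n)=H_{m_G}(S_n)+H_{m_G}(\mathcal P_0^{n-1}\mid S_n)$, obtained by grouping the atoms of $\mathcal P_0^{n-1}$ according to their intersection with $C_n$ and $G\setminus C_n$, together with the fact that the join $\mathcal P_0^{n-1}\vee S_n$ refines $\mathcal P_0^{n-1}$ and therefore has at least as much entropy. On $C_n$, at most $L_n$ atoms occur, so the conditional entropy there is at most $\log L_n$. On $G\setminus C_n$, the mass is less than $\alpha$ and there are at most $r^n$ possible atoms, so the conditional contribution is at most $\alpha n\log r$. The binary entropy of $S_n$ is at most $\log 2$. Thus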
\begin{align*}
H_{m_G}(\mathcal P_0^{n-1})\leq \log 2+\log M+\log(\lfloor2\alpha n\rfloor+1)+n\beta(2\alpha)+2\alpha n\log r+\alpha n\log r.
\end{align*}
After dividing by $n$ and letting $n\to\infty$, the terms $\log 2$, $\log M$, and $\log(\lfloor2\alpha n\rfloor+1)$ divided by $n$ disappear, leaving
\begin{align*}
h_{m_G}(T,\mathcal P)\leq \beta(2\alpha)+3\alpha\log r.
\end{align*}
Finally let $\alpha\downarrow0$. Both terms on the right tend to zero, so $h_{m_G}(T,\mathcal P)=0$.
[/guided]
[/step]
[step:Take the supremum over finite partitions]
The finite partition $\mathcal P$ in the preceding steps was arbitrary, so
\begin{align*}
h_{m_G}(T,\mathcal P)=0
\end{align*}
for every finite Borel partition $\mathcal P$ of $G$. The Kolmogorov-Sinai entropy of $T$ with respect to $m_G$ is, by definition, the supremum of the finite-partition entropy rates:
\begin{align*}
h_{m_G}(T)=\sup_{\mathcal P} h_{m_G}(T,\mathcal P),
\end{align*}
where the supremum is over all finite Borel partitions of $G$. Therefore
\begin{align*}
h_{m_G}(T)=0.
\end{align*}
This proves that every rotation of a compact metrizable abelian group has zero measure entropy with respect to Haar measure.
[guided]
We have proved that for every finite Borel partition $\mathcal P$ of $G$, the entropy rate satisfies
\begin{align*}
h_{m_G}(T,\mathcal P)=0.
\end{align*}
The Kolmogorov-Sinai entropy of $T$ with respect to $m_G$ is defined as the supremum of these finite-partition entropy rates:
\begin{align*}
h_{m_G}(T)=\sup_{\mathcal P}h_{m_G}(T,\mathcal P),
\end{align*}
where $\mathcal P$ ranges over all finite Borel partitions of $G$. Since each term in this supremum is zero and entropy is nonnegative, the supremum is zero. Therefore
\begin{align*}
h_{m_G}(T)=0.
\end{align*}
This is exactly the desired conclusion for the rotation $T(x)=x+a$ on the compact metrizable abelian group $G$.
[/guided]
[/step]