[proofplan]
We compare separated sets at an arbitrary small scale $r>0$ with separated sets at the fixed scale $\varepsilon$, which lies below the expansivity constant $\delta$. The bound of the fixed-scale growth rate by the entropy follows from monotonicity of separated sets as the scale decreases. For the reverse inequality, compactness and expansivity give a finite observation window: if two orbit segments stay $\delta$-close for that many iterates, then their initial points are within distance $r$. Applying this window at a time where two points of an $(n,r)$-separated set are more than $r$ apart shows that every $(n,r)$-separated set is automatically $(n+N,\varepsilon)$-separated, and the bounded loss of $N$ iterates disappears after dividing by $n$ and letting $n\to\infty$.
[/proofplan]
[step:Introduce the separated growth rate at each scale]
Let $\mathbb{N}_0:=\{0,1,2,\dots\}$ denote the set of nonnegative integers. For each $n\in\mathbb{N}$, define the Bowen orbit metric $d_n:X\times X\to[0,\infty)$ by
\begin{align*}
d_n(x,y)=\max_{0\leq k\leq n-1} d(T^k x,T^k y).
\end{align*}
Thus a subset $E\subset X$ is $(n,r)$-separated exactly when $d_n(x,y)>r$ for every distinct $x,y\in E$, and $s_n(r)$ denotes the maximal cardinality of such a set. For each $r>0$, define the scale-$r$ separated growth rate $a(r)\in[0,\infty]$ by
\begin{align*}
a(r)=\limsup_{n\to\infty}\frac{1}{n}\log s_n(r).
\end{align*}
In this theorem we use the separated-set definition of topological entropy on compact metric spaces, namely
\begin{align*}
h_{\mathrm{top}}(T)=\lim_{r\downarrow 0} a(r)=\sup_{r>0} a(r).
\end{align*}
The equality with the supremum follows because $a(r)$ is monotone nonincreasing in $r$: if $0<r_1<r_2$, then every $(n,r_2)$-separated set is $(n,r_1)$-separated, so $s_n(r_1)\geq s_n(r_2)$ for every $n\in\mathbb{N}$.
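As an illustration only (this example is an assumption and not part of the theorem), consider the one-sided full shift $\sigma$ on $X=\{0,1\}^{\mathbb{N}_0}$, given by $(\sigma x)_k=x_{k+1}$, with the metric $d(x,y)=2^{-\min\{k\in\mathbb{N}_0:\,x_k\neq y_k\}}$ for $x\neq y$ and $d(x,x)=0$. The values of $d$ lie in $\{0\}\cup\{2^{-k}:k\in\mathbb{N}_0\}$, and $d(\sigma^k x,\sigma^k y)=1$ exactly when $x_k\neq y_k$. Hence, at the scale $r=\tfrac12$,
\begin{align*}
d_n(x,y)>\tfrac12 \iff x_k\neq y_k \text{ for some } 0\leq k\leq n-1.
\end{align*}
Choosing one point for each word of length $n$ gives an $(n,\tfrac12)$-separated set, and two distinct points sharing their first $n$ coordinates are not $(n,\tfrac12)$-separated from each other, so under these assumptions
\begin{align*}
s_n\bigl(\tfrac12\bigr)=2^n,\qquad a\bigl(\tfrac12\bigr)=\limsup_{n\to\infty}\frac{1}{n}\log 2^n=\log 2.
\end{align*}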
[/step]
[step:Use monotonicity to obtain the lower bound]
Since $\varepsilon>0$ is an admissible separation scale in the supremum formula for entropy, we have
\begin{align*}
a(\varepsilon)\leq \sup_{r>0} a(r)=h_{\mathrm{top}}(T).
\end{align*}
Equivalently,
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log s_n(\varepsilon)\leq h_{\mathrm{top}}(T).
\end{align*}
[/step]
[step:Upgrade expansivity to a finite observation window]
Fix an arbitrary number $r>0$. We claim that there exists $N=N(r)\in\mathbb{N}$ such that for all $x,y\in X$,
\begin{align*}
\bigl(d(T^k x,T^k y)\leq \delta \text{ for every } 0\leq k\leq N\bigr) \implies d(x,y)<r.
\end{align*}
Suppose no such $N$ exists. Then for each $m\in\mathbb{N}$ there exist points $x_m,y_m\in X$ such that
\begin{align*}
d(T^k x_m,T^k y_m)\leq \delta \text{ for every } 0\leq k\leq m
\end{align*}
and
\begin{align*}
d(x_m,y_m)\geq r.
\end{align*}
Since $X$ is compact, there is a strictly increasing sequence of indices $(m_j)_{j\in\mathbb{N}}$ and points $x,y\in X$ such that $x_{m_j}\to x$ and $y_{m_j}\to y$ in $X$. Fix $k\in\mathbb{N}_0$. For all sufficiently large $j$, we have $m_j\geq k$, hence
\begin{align*}
d(T^k x_{m_j},T^k y_{m_j})\leq \delta.
\end{align*}
Because $T^k:X\to X$ and the metric $d$ are continuous, passing to the limit $j\to\infty$ gives
\begin{align*}
d(T^k x,T^k y)\leq \delta.
\end{align*}
Thus $d(T^k x,T^k y)\leq \delta$ for every $k\in\mathbb{N}_0$. By expansivity at scale $\delta$, we get $x=y$. On the other hand, continuity of $d:X\times X\to[0,\infty)$ gives
\begin{align*}
d(x,y)=\lim_{j\to\infty}d(x_{m_j},y_{m_j})\geq r,
\end{align*}
contradicting $x=y$. Therefore the finite observation window exists.
[guided]
Fix $r>0$. We want to convert the infinite-time definition of expansivity into a finite-time statement. Expansivity says that if two points remain $\delta$-close for every nonnegative iterate, then they are the same point. The finite statement we need is a uniform, quantitative version of this, and compactness is what provides it: there is some finite number $N=N(r)$, independent of the points, such that remaining $\delta$-close only up to time $N$ already forces the initial points to be within distance $r$.
Assume, toward a contradiction, that no such $N$ exists. Then for every $m\in\mathbb{N}$ we can find points $x_m,y_m\in X$ satisfying
\begin{align*}
d(T^k x_m,T^k y_m)\leq \delta \text{ for every } 0\leq k\leq m
\end{align*}
but still separated initially by
\begin{align*}
d(x_m,y_m)\geq r.
\end{align*}
The role of compactness is now to pass from longer and longer finite orbit agreements to an infinite orbit agreement. Since $X$ is compact, the sequence of pairs $(x_m,y_m)$ in the compact [metric space](/page/Metric%20Space) $X\times X$ has a convergent subsequence. Thus there exist a strictly increasing sequence $(m_j)_{j\in\mathbb{N}}$ and points $x,y\in X$ such that $x_{m_j}\to x$ and $y_{m_j}\to y$.
Now fix a single time $k\in\mathbb{N}_0$. Because $m_j\to\infty$, we have $m_j\geq k$ for all sufficiently large $j$. For those $j$, the defining property of $x_{m_j},y_{m_j}$ gives
\begin{align*}
d(T^k x_{m_j},T^k y_{m_j})\leq \delta.
\end{align*}
The map $T^k:X\to X$ is continuous because it is an iterate of the continuous map $T:X\to X$. Therefore $T^k x_{m_j}\to T^k x$ and $T^k y_{m_j}\to T^k y$, and continuity of the metric gives
\begin{align*}
d(T^k x,T^k y)\leq \delta.
\end{align*}
Since $k\in\mathbb{N}_0$ was arbitrary, the points $x$ and $y$ remain $\delta$-close for every nonnegative iterate. The expansivity hypothesis at scale $\delta$ therefore implies $x=y$.
But the initial separation also passes to the limit. Since $d:X\times X\to[0,\infty)$ is continuous,
\begin{align*}
d(x,y)=\lim_{j\to\infty}d(x_{m_j},y_{m_j})\geq r.
\end{align*}
This contradicts $x=y$, because $r>0$. Hence there must exist a finite $N=N(r)$ with the desired property.
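For a concrete feel, here is a sketch in an assumed example that is not part of the theorem: the one-sided full shift $\sigma$ on $\{0,1\}^{\mathbb{N}_0}$, $(\sigma x)_k=x_{k+1}$, with $d(x,y)=2^{-\min\{k:\,x_k\neq y_k\}}$ for $x\neq y$, taking $\delta=\tfrac34$ as the expansivity scale. Since $d$ only takes the values $0,1,\tfrac12,\tfrac14,\dots$, the condition $d(\sigma^k x,\sigma^k y)\leq\delta$ means exactly $x_k=y_k$. Thus
\begin{align*}
\bigl(d(\sigma^k x,\sigma^k y)\leq \delta \text{ for every } 0\leq k\leq N\bigr) \implies \bigl(x_k=y_k \text{ for every } 0\leq k\leq N\bigr) \implies d(x,y)\leq 2^{-(N+1)}.
\end{align*}
For $r=2^{-j}$ with $j\in\mathbb{N}$, the choice $N(r)=j$ therefore works in this example, because $2^{-(j+1)}<2^{-j}$; more generally, any $N$ with $2^{-(N+1)}<r$ will do.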
[/guided]
[/step]
[step:Compare small-scale separated sets with fixed-scale separated sets]
Fix $r>0$, and let $N=N(r)$ be the integer obtained in the previous step. We prove that for every $n\in\mathbb{N}$,
\begin{align*}
s_n(r)\leq s_{n+N}(\varepsilon).
\end{align*}
Let $E\subset X$ be any $(n,r)$-separated set. Take distinct points $x,y\in E$. Since $E$ is $(n,r)$-separated, there exists an index $i\in\{0,\dots,n-1\}$ such that
\begin{align*}
d(T^i x,T^i y)>r.
\end{align*}
If, contrary to what we need, $d_{n+N}(x,y)\leq \varepsilon$, where $d_{n+N}$ is the Bowen orbit metric defined in the first step, then for every $\ell\in\{0,\dots,N\}$ we would have $i+\ell\leq n+N-1$ and therefore
\begin{align*}
d(T^\ell(T^i x),T^\ell(T^i y))=d(T^{i+\ell}x,T^{i+\ell}y)\leq d_{n+N}(x,y)\leq \varepsilon<\delta.
\end{align*}
The finite observation window applied to the pair $T^i x,T^i y$ would imply
\begin{align*}
d(T^i x,T^i y)<r,
\end{align*}
contradicting the choice of $i$. Hence $d_{n+N}(x,y)>\varepsilon$ for every distinct $x,y\in E$, so $E$ is $(n+N,\varepsilon)$-separated. Taking maximal cardinalities gives $s_n(r)\leq s_{n+N}(\varepsilon)$.
[guided]
Fix $r>0$, and let $N=N(r)$ be the finite observation window obtained above. We want to prove that any set separated at the smaller scale $r$ for $n$ iterates is also separated at the fixed scale $\varepsilon$ once we allow the orbit comparison to run for $N$ additional iterates.
Let $E\subset X$ be an arbitrary $(n,r)$-separated set, and choose distinct points $x,y\in E$. By the definition of $(n,r)$-separation through the Bowen orbit metric $d_n$, there exists an index $i\in\{0,\dots,n-1\}$ such that
\begin{align*}
d(T^i x,T^i y)>r.
\end{align*}
We prove that $x$ and $y$ are also separated at scale $\varepsilon$ in the longer Bowen metric $d_{n+N}$.
Assume for contradiction that $d_{n+N}(x,y)\leq \varepsilon$. Then for every $\ell\in\{0,\dots,N\}$, the index $i+\ell$ satisfies $0\leq i+\ell\leq n+N-1$. Therefore the definition of $d_{n+N}$ gives
\begin{align*}
d(T^\ell(T^i x),T^\ell(T^i y))=d(T^{i+\ell}x,T^{i+\ell}y)\leq d_{n+N}(x,y)\leq \varepsilon<\delta.
\end{align*}
Thus the two points $T^i x$ and $T^i y$ remain $\delta$-close for the whole finite window $0\leq \ell\leq N$. The finite observation window applies exactly to this pair of initial points, so it yields
\begin{align*}
d(T^i x,T^i y)<r.
\end{align*}
This contradicts the choice of $i$, for which $d(T^i x,T^i y)>r$. Hence $d_{n+N}(x,y)>\varepsilon$ for every distinct $x,y\in E$. Since $E$ was arbitrary, every $(n,r)$-separated set is $(n+N,\varepsilon)$-separated, and taking maximal cardinalities gives
\begin{align*}
s_n(r)\leq s_{n+N}(\varepsilon).
\end{align*}
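As a sanity check in an assumed example outside the theorem, take the one-sided full shift $\sigma$ on $\{0,1\}^{\mathbb{N}_0}$ with $d(x,y)=2^{-\min\{k:\,x_k\neq y_k\}}$ for $x\neq y$, and the illustrative choices $\varepsilon=\tfrac12$, $\delta=\tfrac34$, $r=2^{-j}$ with $j\in\mathbb{N}$. In this setting $d_n(x,y)>2^{-j}$ holds exactly when $x$ and $y$ differ at some coordinate in $\{0,\dots,n+j-2\}$, so maximal $(n,2^{-j})$-separated sets correspond to words of length $n+j-1$. The window $N=j$ is admissible, since agreement on the coordinates $0,\dots,j$ forces $d(x,y)\leq 2^{-(j+1)}<r$. The comparison then reads
\begin{align*}
s_n\bigl(2^{-j}\bigr)=2^{\,n+j-1}\leq 2^{\,n+j}=s_{n+N}\bigl(\tfrac12\bigr),
\end{align*}
so in this example the estimate loses only a bounded factor of $2$, independent of $n$.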
[/guided]
[/step]
[step:Pass to limsups and remove the bounded time loss]
From the comparison $s_n(r)\leq s_{n+N}(\varepsilon)$, we obtain for every $n\in\mathbb{N}$,
\begin{align*}
\frac{1}{n}\log s_n(r)\leq \frac{1}{n}\log s_{n+N}(\varepsilon).
\end{align*}
Set $m=n+N$, so that $n=m-N$ and $m\to\infty$ as $n\to\infty$. Then
\begin{align*}
\frac{1}{n}\log s_{n+N}(\varepsilon)=\frac{m}{m-N}\cdot \frac{1}{m}\log s_m(\varepsilon).
\end{align*}
Set $b_m=\frac{1}{m}\log s_m(\varepsilon)$ for $m\in\mathbb{N}$. The sequence $(b_m)$ is nonnegative because $s_m(\varepsilon)\geq 1$, and the factors $c_m=\frac{m}{m-N}$, defined for $m>N$, satisfy $c_m\geq 1$ and $c_m\to 1$ as $m\to\infty$. The standard limsup rule for nonnegative sequences gives $\limsup_{m\to\infty} c_m b_m=\limsup_{m\to\infty} b_m$, with the equality still valid when the common value is $+\infty$: for every $\eta>0$, eventually $1\leq c_m\leq 1+\eta$, hence $b_m\leq c_m b_m\leq (1+\eta)b_m$, which squeezes the limsup in the finite case and keeps it infinite otherwise. Therefore taking limsups gives
\begin{align*}
a(r)\leq a(\varepsilon).
\end{align*}
Because $r>0$ was arbitrary, taking the supremum over all $r>0$ yields
\begin{align*}
h_{\mathrm{top}}(T)=\sup_{r>0}a(r)\leq a(\varepsilon).
\end{align*}
Together with the lower bound $a(\varepsilon)\leq h_{\mathrm{top}}(T)$, this proves
\begin{align*}
h_{\mathrm{top}}(T)=a(\varepsilon)=\limsup_{n\to\infty}\frac{1}{n}\log s_n(\varepsilon).
\end{align*}
[guided]
The comparison estimate says that, for the fixed $r>0$ and its associated integer $N=N(r)$,
\begin{align*}
s_n(r)\leq s_{n+N}(\varepsilon)
\end{align*}
for every $n\in\mathbb{N}$. Taking logarithms and dividing by $n$ gives
\begin{align*}
\frac{1}{n}\log s_n(r)\leq \frac{1}{n}\log s_{n+N}(\varepsilon).
\end{align*}
Now define $m=n+N$. Then $m\to\infty$ exactly when $n\to\infty$, and $n=m-N$. Hence
\begin{align*}
\frac{1}{n}\log s_{n+N}(\varepsilon)=\frac{m}{m-N}\cdot \frac{1}{m}\log s_m(\varepsilon).
\end{align*}
The only point requiring care is the extra factor $\frac{m}{m-N}$. It tends to $1$, so it should not change the exponential growth rate. Formally, set $b_m=\frac{1}{m}\log s_m(\varepsilon)$ and $c_m=\frac{m}{m-N}$. Since $s_m(\varepsilon)\geq 1$, the sequence $(b_m)$ is nonnegative. Since $c_m\to 1$, for every $\eta>0$ we have $1\leq c_m\leq 1+\eta$ for all sufficiently large $m$. Therefore
\begin{align*}
\limsup_{m\to\infty} c_m b_m\leq (1+\eta)\limsup_{m\to\infty} b_m
\end{align*}
for every $\eta>0$, which is trivially true when $\limsup_{m\to\infty} b_m=+\infty$. Conversely, $c_m\geq 1$ and $b_m\geq 0$ give $c_m b_m\geq b_m$, hence $\limsup_{m\to\infty} c_m b_m\geq \limsup_{m\to\infty} b_m$. Letting $\eta\downarrow 0$ in the upper bound therefore gives
\begin{align*}
\limsup_{m\to\infty} c_m b_m=\limsup_{m\to\infty} b_m.
\end{align*}
Thus
\begin{align*}
a(r)\leq a(\varepsilon).
\end{align*}
Because this holds for every $r>0$, taking the supremum over all positive $r$ gives
\begin{align*}
h_{\mathrm{top}}(T)=\sup_{r>0}a(r)\leq a(\varepsilon).
\end{align*}
The earlier monotonicity argument gave $a(\varepsilon)\leq h_{\mathrm{top}}(T)$. Combining the two inequalities proves
\begin{align*}
h_{\mathrm{top}}(T)=a(\varepsilon)=\limsup_{n\to\infty}\frac{1}{n}\log s_n(\varepsilon).
\end{align*}
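To make the phrase "for all sufficiently large $m$" explicit, one can solve the inequality for $c_m=\frac{m}{m-N}$ directly. The following is a routine computation, valid for $\eta>0$ and $m>N$:
\begin{align*}
\frac{m}{m-N}\leq 1+\eta \iff m\leq (1+\eta)(m-N) \iff (1+\eta)N\leq \eta m \iff m\geq \frac{(1+\eta)N}{\eta}.
\end{align*}
For instance, with the purely illustrative values $N=10$ and $\eta=\tfrac{1}{10}$, the bound $c_m\leq 1.1$ holds as soon as $m\geq 110$.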
[/guided]
[/step]