[proofplan]
We compute topological entropy using the standard symbolic metric on the subshift. At any fixed scale $0<\varepsilon<1$, points representing the distinct length-$n$ words form an $(n,\varepsilon)$-separated set, while points representing the length-$(n+q)$ words form an $(n,\varepsilon)$-spanning set, where $q$ depends only on $\varepsilon$. We then prove directly from subadditivity of $\log |\mathcal{L}_n(\Sigma)|$ that the word-growth limit exists. Finally, the two estimates bound the separated growth rate from below and the spanning growth rate from above by this same limit, which forces the entropy to equal it.
[/proofplan]
[step:Equip the subshift with the standard symbolic metric]
The theorem gives a one-sided subshift $\Sigma\subset A^{\mathbb Z_{\ge 0}}$ and its restricted shift map $\sigma:\Sigma\to\Sigma$. For $n\in\mathbb{N}$, define the language at length $n$ by
\begin{align*}
\mathcal{L}_n(\Sigma):=\{(x_0,\dots,x_{n-1})\in A^n:x\in\Sigma\}.
\end{align*}
Define $N:\Sigma \times \Sigma \to \mathbb{Z}_{\geq 0}\cup\{\infty\}$ by
\begin{align*}
N(x,y):=\inf\{k \in \mathbb{Z}_{\geq 0}: x_k \neq y_k\},
\end{align*}
with the convention that $\inf \varnothing=\infty$. Define the metric $d:\Sigma \times \Sigma \to [0,1]$ as follows. Set $d(x,x):=0$ for every $x\in\Sigma$, and for $x,y\in\Sigma$ with $x\neq y$ set
\begin{align*}
d(x,y):=2^{-N(x,y)}.
\end{align*}
This metric induces on $\Sigma$ the subspace topology inherited from the [product topology](/page/Product%20Topology) on $A^{\mathbb Z_{\ge 0}}$. For each $n \in \mathbb{N}$, define the Bowen metric $d_n:\Sigma \times \Sigma \to [0,1]$ by
\begin{align*}
d_n(x,y):=\max_{0\leq j<n} d(\sigma^j x,\sigma^j y).
\end{align*}
For $\varepsilon>0$, let $s_n(\varepsilon)$ denote the maximal cardinality of an $(n,\varepsilon)$-separated set in $\Sigma$, and let $\rho_n(\varepsilon)$ denote the minimal cardinality of an $(n,\varepsilon)$-spanning set in $\Sigma$, both computed with respect to $d_n$.
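As an illustration only, assume $A=\{0,1\}$ and that $\Sigma$ contains the two points $x=(0,0,0,0,\dots)$ and $y=(0,0,0,1,0,0,\dots)$, as happens for the full shift; these points are hypothetical and not part of the theorem. They first differ at index $3$, and each application of $\sigma$ moves that disagreement one place to the left, so
\begin{align*}
d(x,y)&=2^{-3}, & d(\sigma x,\sigma y)&=2^{-2}, \\
d(\sigma^2 x,\sigma^2 y)&=2^{-1}, & d(\sigma^3 x,\sigma^3 y)&=2^{0}=1.
\end{align*}
Consequently $d_1(x,y)=1/8$, $d_2(x,y)=1/4$, $d_3(x,y)=1/2$, and $d_n(x,y)=1$ for every $n\geq 4$. In this example the Bowen metric $d_n$ registers a disagreement at coordinate $3$ at full scale as soon as $n>3$.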
[/step]
[step:Construct separated sets from admissible words]
Fix $0<\varepsilon<1$. For each word $w=(w_0,\dots,w_{n-1}) \in \mathcal{L}_n(\Sigma)$, choose one point $x_w \in \Sigma$ such that $(x_w)_i=w_i$ for $0\leq i<n$. Define
\begin{align*}
E_n:=\{x_w : w \in \mathcal{L}_n(\Sigma)\}.
\end{align*}
If $w,v \in \mathcal{L}_n(\Sigma)$ are distinct, choose an index $i \in \{0,\dots,n-1\}$ such that $w_i \neq v_i$. Then the zeroth coordinates of $\sigma^i x_w$ and $\sigma^i x_v$ differ, so
\begin{align*}
d(\sigma^i x_w,\sigma^i x_v)=1.
\end{align*}
Therefore
\begin{align*}
d_n(x_w,x_v)\geq 1>\varepsilon.
\end{align*}
Thus $E_n$ is $(n,\varepsilon)$-separated. In particular, the points $x_w$ are pairwise distinct, so $|E_n|=|\mathcal{L}_n(\Sigma)|$ and
\begin{align*}
s_n(\varepsilon)\geq |\mathcal{L}_n(\Sigma)|.
\end{align*}
[guided]
The purpose of this step is to turn admissible words into genuinely separated orbit segments. Fix $0<\varepsilon<1$. For every word $w=(w_0,\dots,w_{n-1}) \in \mathcal{L}_n(\Sigma)$, the definition of the language says that there exists at least one point $x_w \in \Sigma$ whose first $n$ coordinates are exactly $w$. We choose one such point for each word and define
\begin{align*}
E_n:=\{x_w : w \in \mathcal{L}_n(\Sigma)\}.
\end{align*}
Now take two different words $w,v \in \mathcal{L}_n(\Sigma)$. Since they are different elements of $A^n$, there exists an index $i \in \{0,\dots,n-1\}$ with $w_i\neq v_i$. After applying the shift $i$ times, that disagreement moves to coordinate $0$. Hence $\sigma^i x_w$ and $\sigma^i x_v$ differ in their zeroth coordinate, so the first index at which they differ is $0$. By the definition of $d$,
\begin{align*}
d(\sigma^i x_w,\sigma^i x_v)=2^0=1.
\end{align*}
The Bowen metric $d_n$ is the maximum of these shifted distances over $0\leq j<n$, so
\begin{align*}
d_n(x_w,x_v)\geq d(\sigma^i x_w,\sigma^i x_v)=1>\varepsilon.
\end{align*}
Thus distinct length-$n$ words give distinct points that are $(n,\varepsilon)$-separated. Consequently,
\begin{align*}
s_n(\varepsilon)\geq |E_n|=|\mathcal{L}_n(\Sigma)|.
\end{align*}
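For a concrete instance, assume (hypothetically, not as part of the theorem) that $A=\{0,1\}$ and that $\Sigma$ is the golden mean shift, consisting of the sequences with no two consecutive $1$s. Then $\mathcal{L}_2(\Sigma)=\{00,01,10\}$, and one admissible choice of representatives is
\begin{align*}
x_{00}=(0,0,0,0,\dots),\qquad x_{01}=(0,1,0,0,\dots),\qquad x_{10}=(1,0,0,0,\dots).
\end{align*}
The point $x_{10}$ differs from the other two at coordinate $0$, while $x_{00}$ and $x_{01}$ first differ at coordinate $1$, so
\begin{align*}
d(x_{00},x_{10})=1,\qquad d(x_{01},x_{10})=1,\qquad d(\sigma x_{00},\sigma x_{01})=1.
\end{align*}
Hence $d_2$ equals $1$ on every pair of distinct points of $E_2$, and $s_2(\varepsilon)\geq 3$ for every $0<\varepsilon<1$ in this example.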
[/guided]
[/step]
[step:Construct spanning sets from longer admissible words]
Fix $0<\varepsilon<1$, and choose $q \in \mathbb{N}$ such that $2^{-q}<\varepsilon$. For each word $u \in \mathcal{L}_{n+q}(\Sigma)$, choose one point $y_u \in \Sigma$ whose first $n+q$ coordinates form $u$, and define
\begin{align*}
F_{n,q}:=\{y_u : u \in \mathcal{L}_{n+q}(\Sigma)\}.
\end{align*}
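Let $x \in \Sigma$. Its first $n+q$ coordinates form a word $u_x \in \mathcal{L}_{n+q}(\Sigma)$. By construction, $x$ and $y_{u_x}$ agree in coordinates $0,\dots,n+q-1$. Coordinate $i$ of $\sigma^j x$ is $x_{i+j}$, and $i+j\leq n+q-1$ whenever $0\leq j\leq n-1$ and $0\leq i\leq q$. Hence, for every $j \in \{0,\dots,n-1\}$, the points $\sigma^j x$ and $\sigma^j y_{u_x}$ agree in coordinates $0,\dots,q$, so they are either equal or first differ at an index of at least $q+1$. Therefore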
\begin{align*}
d(\sigma^j x,\sigma^j y_{u_x})\leq 2^{-(q+1)}<\varepsilon.
\end{align*}
It follows that
\begin{align*}
d_n(x,y_{u_x})<\varepsilon.
\end{align*}
Thus $F_{n,q}$ is $(n,\varepsilon)$-spanning, and
\begin{align*}
\rho_n(\varepsilon)\leq |\mathcal{L}_{n+q}(\Sigma)|.
\end{align*}
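To see the construction at work, assume for illustration only that $A=\{0,1\}$ and that $\Sigma$ is the golden mean shift (no two consecutive $1$s); this choice, and the points below, are hypothetical. Take $\varepsilon=1/2$, so $q=2$ satisfies $2^{-q}<\varepsilon$, and take $n=2$. For the point $x=(0,1,0,1,0,1,\dots)$ we get $u_x=0101$, and one admissible representative is $y_{u_x}=(0,1,0,1,0,0,0,\dots)$. These two points first differ at index $5$, so
\begin{align*}
d(x,y_{u_x})=2^{-5},\qquad d(\sigma x,\sigma y_{u_x})=2^{-4},\qquad d_2(x,y_{u_x})=2^{-4}\leq 2^{-(q+1)}<\varepsilon.
\end{align*}
Since $\mathcal{L}_4(\Sigma)$ has $8$ elements in this example, the bound reads $\rho_2(1/2)\leq 8$.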
[/step]
[step:Prove the word growth limit from subadditivity]
Define $a_n:=\log |\mathcal{L}_n(\Sigma)|$ for $n \in \mathbb{N}$. Since $\Sigma$ is nonempty and $A$ is finite, $1\leq |\mathcal{L}_n(\Sigma)|\leq |A|^n$, so $a_n$ is finite and nonnegative.
For $n,m \in \mathbb{N}$, every word in $\mathcal{L}_{n+m}(\Sigma)$ occurs as coordinates $0,\dots,n+m-1$ of some point $x\in\Sigma$. Its prefix of length $n$ belongs to $\mathcal{L}_n(\Sigma)$. Its suffix of length $m$ occurs as coordinates $0,\dots,m-1$ of $\sigma^n x$, and this point lies in $\Sigma$ because $\Sigma$ is shift-invariant; hence the suffix belongs to $\mathcal{L}_m(\Sigma)$. Therefore the map sending a length-$(n+m)$ word to its prefix-suffix pair is well defined from $\mathcal{L}_{n+m}(\Sigma)$ to $\mathcal{L}_n(\Sigma)\times \mathcal{L}_m(\Sigma)$, and it is injective because a word is the concatenation of its prefix and its suffix. Consequently,
\begin{align*}
|\mathcal{L}_{n+m}(\Sigma)|\leq |\mathcal{L}_n(\Sigma)|\,|\mathcal{L}_m(\Sigma)|.
\end{align*}
Taking logarithms gives
\begin{align*}
a_{n+m}\leq a_n+a_m.
\end{align*}
Thus $(a_n)_{n\in\mathbb{N}}$ is subadditive.
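As a sanity check, assume for illustration only that $A=\{0,1\}$ and that $\Sigma$ is the golden mean shift (no two consecutive $1$s); this example is not part of the theorem. Listing words gives $\mathcal{L}_2(\Sigma)=\{00,01,10\}$ and $\mathcal{L}_3(\Sigma)=\{000,001,010,100,101\}$, and a direct count gives $|\mathcal{L}_5(\Sigma)|=13$, so
\begin{align*}
|\mathcal{L}_{5}(\Sigma)|=13\leq 15=|\mathcal{L}_2(\Sigma)|\,|\mathcal{L}_3(\Sigma)|.
\end{align*}
The inequality is strict because the prefix-suffix map need not be surjective: the pairs $(01,100)$ and $(01,101)$ would concatenate to $01100$ and $01101$, both of which contain the forbidden block $11$.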
Let
\begin{align*}
\alpha:=\inf_{k\in\mathbb{N}}\frac{a_k}{k}.
\end{align*}
Since $\alpha\leq a_n/n$ for every $n$, it suffices to prove that $\limsup_{n\to\infty} a_n/n\leq \alpha$. Fix $k\in\mathbb{N}$. For each $n\in\mathbb{N}$, write $n=Q_n k+R_n$ with quotient $Q_n\in\mathbb{Z}_{\geq 0}$ and remainder $R_n\in\{0,\dots,k-1\}$. Applying subadditivity repeatedly gives
\begin{align*}
a_n\leq Q_n a_k+a_{R_n}
\end{align*}
when $R_n\geq 1$, and gives $a_n\leq Q_n a_k$ when $R_n=0$. Define
\begin{align*}
M_k:=\max\{a_r:1\leq r\leq k-1\},
\end{align*}
with $M_1:=0$. Then
\begin{align*}
\frac{a_n}{n}\leq \frac{Q_n a_k}{n}+\frac{M_k}{n}.
\end{align*}
Since $Q_n=(n-R_n)/k$ with $0\leq R_n<k$, we have $Q_n/n\to 1/k$ and $M_k/n\to 0$ as $n\to\infty$. Taking the limit superior therefore yields
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\leq \frac{a_k}{k}.
\end{align*}
Taking the infimum over $k$ gives
\begin{align*}
\limsup_{n\to\infty}\frac{a_n}{n}\leq \alpha.
\end{align*}
Since $\alpha\leq \liminf_{n\to\infty} a_n/n$, the limit exists and equals $\alpha$:
\begin{align*}
\lim_{n\to\infty}\frac{1}{n}\log |\mathcal{L}_n(\Sigma)|=\alpha.
\end{align*}
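The following numerical sketch illustrates the argument under two assumptions that are not in the text: $\Sigma$ is the golden mean shift over $A=\{0,1\}$ (no two consecutive $1$s), and $\log$ is the natural logarithm. Direct counting gives $|\mathcal{L}_n(\Sigma)|=2,3,5,8,13$ for $n=1,\dots,5$, so
\begin{align*}
\frac{a_1}{1}\approx 0.693,\qquad \frac{a_2}{2}\approx 0.549,\qquad \frac{a_3}{3}\approx 0.536,\qquad \frac{a_4}{4}\approx 0.520,\qquad \frac{a_5}{5}\approx 0.513.
\end{align*}
With $k=2$ and $n=5$ the division step reads $5=2\cdot 2+1$, so $Q_5=2$ and $R_5=1$, and the subadditive bound becomes
\begin{align*}
a_5=\log 13\leq 2a_2+a_1=\log 18.
\end{align*}
In this example the quotients $a_n/n$ stay above $\alpha=\log\frac{1+\sqrt{5}}{2}\approx 0.481$ and approach it as $n$ grows.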
[/step]
[step:Compare separated and spanning entropy with word growth]
Let
\begin{align*}
L:=\lim_{n\to\infty}\frac{1}{n}\log |\mathcal{L}_n(\Sigma)|.
\end{align*}
From the separated-set estimate, for every $0<\varepsilon<1$ and every $n\in\mathbb{N}$,
\begin{align*}
\frac{1}{n}\log s_n(\varepsilon)\geq \frac{1}{n}\log |\mathcal{L}_n(\Sigma)|.
\end{align*}
Hence
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log s_n(\varepsilon)\geq L.
\end{align*}
For the spanning estimate, fix $0<\varepsilon<1$ and choose $q\in\mathbb{N}$ with $2^{-q}<\varepsilon$. Then
\begin{align*}
\frac{1}{n}\log \rho_n(\varepsilon)\leq \frac{1}{n}\log |\mathcal{L}_{n+q}(\Sigma)|.
\end{align*}
Since
\begin{align*}
\frac{1}{n}\log |\mathcal{L}_{n+q}(\Sigma)|=\frac{n+q}{n}\cdot \frac{1}{n+q}\log |\mathcal{L}_{n+q}(\Sigma)|,
\end{align*}
and since $(n+q)/n\to 1$ for fixed $q$ while $\frac{1}{n+q}\log |\mathcal{L}_{n+q}(\Sigma)|\to L$, letting $n\to\infty$ gives
\begin{align*}
\limsup_{n\to\infty}\frac{1}{n}\log \rho_n(\varepsilon)\leq L.
\end{align*}
Since $A$ is finite, $A^{\mathbb{Z}_{\geq 0}}$ is compact in the product topology, and the closed shift-invariant subset $\Sigma$ is compact. The restricted shift map $\sigma:\Sigma\to\Sigma$ is continuous because it is the restriction of the continuous product shift. Thus $(\Sigma,d,\sigma)$ is a compact metric dynamical system. By the separated-spanning characterization theorem for topological entropy of compact metric dynamical systems, the entropy can be computed either from maximal separated sets or from minimal spanning sets; in the present notation this gives
\begin{align*}
h_{\mathrm{top}}(\sigma|_\Sigma)=\lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{1}{n}\log s_n(\varepsilon)=\lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{1}{n}\log \rho_n(\varepsilon).
\end{align*}
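For every $0<\varepsilon<1$, the separated-set estimate gives $\limsup_{n\to\infty}\frac{1}{n}\log s_n(\varepsilon)\geq L$ and the spanning-set estimate gives $\limsup_{n\to\infty}\frac{1}{n}\log \rho_n(\varepsilon)\leq L$. Letting $\varepsilon\downarrow 0$ in both bounds, the characterization above yields $L\leq h_{\mathrm{top}}(\sigma|_\Sigma)\leq L$, that is,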
\begin{align*}
h_{\mathrm{top}}(\sigma|_\Sigma)=L.
\end{align*}
This is exactly
\begin{align*}
h_{\mathrm{top}}(\sigma|_\Sigma)=\lim_{n\to\infty}\frac{1}{n}\log |\mathcal L_n(\Sigma)|.
\end{align*}
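As a worked instance of the formula, assume for illustration only that $A=\{0,1\}$ and that $\Sigma$ is the golden mean shift (no two consecutive $1$s); this example is not part of the theorem. An admissible word of length $n\geq 3$ either ends in $0$, preceded by any admissible word of length $n-1$, or ends in $01$, preceded by any admissible word of length $n-2$. Hence
\begin{align*}
|\mathcal{L}_n(\Sigma)|=|\mathcal{L}_{n-1}(\Sigma)|+|\mathcal{L}_{n-2}(\Sigma)|,\qquad |\mathcal{L}_1(\Sigma)|=2,\quad |\mathcal{L}_2(\Sigma)|=3,
\end{align*}
so $|\mathcal{L}_n(\Sigma)|=F_{n+2}$, where $(F_m)$ is the Fibonacci sequence with $F_1=F_2=1$. Writing $\varphi=\frac{1+\sqrt{5}}{2}$ and $\psi=\frac{1-\sqrt{5}}{2}$, Binet's formula $F_m=(\varphi^m-\psi^m)/\sqrt{5}$ together with $|\psi/\varphi|<1$ gives
\begin{align*}
\frac{1}{n}\log |\mathcal{L}_n(\Sigma)|=\frac{n+2}{n}\log\varphi+\frac{1}{n}\log\frac{1-(\psi/\varphi)^{n+2}}{\sqrt{5}}\longrightarrow \log\varphi .
\end{align*}
Under these assumptions the theorem gives $h_{\mathrm{top}}(\sigma|_\Sigma)=\log\frac{1+\sqrt{5}}{2}$.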
[/step]