[guided]The idea is to "zoom in" at finer and finer scales. At scale $1/m$, total boundedness gives a finite cover of $A$ by balls of radius $1/m$; since the sequence has infinitely many terms and there are only finitely many balls, some ball must contain infinitely many of the terms. We pass to the subsequence lying in that single ball, then repeat at scale $1/(m+1)$.
Formally, assume $A$ is totally bounded. Let $\{a_k\}_{k=1}^\infty$ be any sequence in $A$. We recursively define a nested family of infinite subsequences: set $\{a_k^{(0)}\}_{k=1}^\infty := \{a_k\}_{k=1}^\infty$.
At stage $m \ge 1$, since $A$ is totally bounded, there exists a finite set of centers $\{x_1, \ldots, x_{N_m}\} \subset M$ (depending on $m$) with $A \subset \bigcup_{i=1}^{N_m} B(x_i, 1/m)$. Every term of the sequence $\{a_k^{(m-1)}\}_{k=1}^\infty$ lies in $A$, hence in at least one of the $N_m$ balls. Since there are infinitely many indices $k$ and only $N_m$ balls, the pigeonhole principle guarantees that at least one ball, call it $B(x_{i_m}, 1/m)$, contains $a_k^{(m-1)}$ for infinitely many $k$. We define $\{a_k^{(m)}\}_{k=1}^\infty$ as the subsequence of $\{a_k^{(m-1)}\}_{k=1}^\infty$ consisting of precisely those terms that lie in $B(x_{i_m}, 1/m)$, kept in their original order.
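To make a single stage concrete, here is a small illustration under assumptions that are not part of the proof: take $M = \mathbb{R}$ with the usual metric, $A = (0,1)$, and the hypothetical sequence with $a_k = \tfrac18$ for odd $k$ and $a_k = \tfrac78$ for even $k$. At stage $1$ the single ball $B(\tfrac12, 1)$ already covers $A$, so $\{a_k^{(1)}\}$ is the whole sequence. At stage $2$, one possible choice of centers is $\tfrac14$ and $\tfrac34$:
\begin{align*}
A = (0,1) &\subset B(\tfrac14, \tfrac12) \cup B(\tfrac34, \tfrac12) = (-\tfrac14, \tfrac34) \cup (\tfrac14, \tfrac54), \\
|\tfrac18 - \tfrac14| &= \tfrac18 < \tfrac12, \qquad |\tfrac18 - \tfrac34| = \tfrac58 \ge \tfrac12, \\
|\tfrac78 - \tfrac34| &= \tfrac18 < \tfrac12, \qquad |\tfrac78 - \tfrac14| = \tfrac58 \ge \tfrac12.
\end{align*}
Each of the two balls contains infinitely many terms, so either may be chosen. Choosing $B(\tfrac14, \tfrac12)$ keeps exactly the odd-indexed terms, and $\{a_k^{(2)}\}$ is the constant sequence $\tfrac18, \tfrac18, \ldots$.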
Two properties are immediate for every $m \ge 1$. First, $\{a_k^{(m)}\}_{k=1}^\infty$ is a subsequence of $\{a_k^{(m-1)}\}_{k=1}^\infty$ and hence, by induction, of the original sequence. Second, for any indices $j, k$, both $a_j^{(m)}$ and $a_k^{(m)}$ lie in $B(x_{i_m}, 1/m)$, so the triangle inequality gives
\begin{align*}
d(a_j^{(m)}, a_k^{(m)}) \le d(a_j^{(m)}, x_{i_m}) + d(x_{i_m}, a_k^{(m)}) < \frac{1}{m} + \frac{1}{m} = \frac{2}{m}.
\end{align*}
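One consequence worth recording, sketched here as a short derivation: because the subsequences are nested, the estimate at stage $m$ also controls every later stage. For $n, n' \ge m$ and any indices $j, k$, the terms $a_j^{(n)}$ and $a_k^{(n')}$ both occur in $\{a_k^{(m)}\}_{k=1}^\infty$ and therefore both lie in $B(x_{i_m}, 1/m)$, so
\begin{align*}
d(a_j^{(n)}, a_k^{(n')}) < \frac{2}{m} \qquad \text{for all } n, n' \ge m,
\end{align*}
and the right-hand side tends to $0$ as $m \to \infty$.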
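Why do we need this nested construction rather than a single application of the pigeonhole principle? Because a single application at scale $\varepsilon$ only guarantees that the terms of the resulting subsequence are within $2\varepsilon$ of each other, not that they form a Cauchy sequence. For instance, in $\mathbb{R}$ the sequence $0, 1, 0, 1, \ldots$ lies entirely in the ball $B(1/2, 1)$, so its terms are within $2$ of each other, yet it is not Cauchy. We need the bound on the mutual distances to shrink to zero as the scale refines, which requires carrying out the construction for every $m$.[/guided]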