[proofplan]
We first prove the result for finite products by induction, then extend to countable products. For the finite case, we construct an $\varepsilon$-net for $M_1 \times \cdots \times M_N$ from the Cartesian product of individual $\varepsilon$-nets. For the countable product, we split the metric $d(x, y) = \sum_{k=1}^\infty 2^{-k} \min(d_k(x_k, y_k), 1)$ into a finite head (controlled by total boundedness of finitely many factors) and an infinite tail (controlled by the geometric decay of the weights $2^{-k}$).
[/proofplan]
[step:Establish the finite product case by constructing Cartesian $\varepsilon$-nets]
We first treat the finite product $M = M_1 \times \cdots \times M_N$ equipped with the metric
\begin{align*}
d(x, y) = \sum_{k=1}^N 2^{-k} \min(d_k(x_k, y_k), 1).
\end{align*}
Fix $\varepsilon > 0$. For each $k \in \{1, \ldots, N\}$, the space $(M_k, d_k)$ is totally bounded. Choose $\delta_k > 0$ such that $2^{-k} \min(\delta_k, 1) < \varepsilon / N$ --- for instance, $\delta_k = \varepsilon \cdot 2^k / N$ works when $\varepsilon \cdot 2^k / N < 1$, and any $\delta_k > 0$ works otherwise since $2^{-k} \min(\delta_k, 1) \le 2^{-k} \le 2^{-1} < \varepsilon/N$ when $\varepsilon > N/2$. More precisely, set $\delta_k = \min(\varepsilon \cdot 2^k / N, \, 1)$. By total boundedness of $M_k$, there exists a finite set $F_k \subset M_k$ with $M_k \subset \bigcup_{a \in F_k} B_{d_k}(a, \delta_k)$.
Define $F = F_1 \times F_2 \times \cdots \times F_N \subset M$, which is finite with $|F| = \prod_{k=1}^N |F_k|$. For any $x = (x_1, \ldots, x_N) \in M$, choose $a_k \in F_k$ with $d_k(x_k, a_k) < \delta_k$ for each $k$. Set $a = (a_1, \ldots, a_N) \in F$. Then
\begin{align*}
d(x, a) = \sum_{k=1}^N 2^{-k} \min(d_k(x_k, a_k), 1) < \sum_{k=1}^N 2^{-k} \min(\delta_k, 1) \le \sum_{k=1}^N \frac{\varepsilon}{N} = \varepsilon.
\end{align*}
Therefore $F$ is a finite $\varepsilon$-net for $M$, and the finite product is totally bounded.
[guided]
The idea for a finite product is natural: approximate each coordinate separately. The only subtlety is distributing the $\varepsilon$-budget across the $N$ coordinates, accounting for the weights $2^{-k}$ in the product metric.
Fix $\varepsilon > 0$. For each factor $M_k$, we need $2^{-k} \min(d_k(x_k, a_k), 1) < \varepsilon / N$. Since $2^{-k} \min(\delta_k, 1) < \varepsilon/N$ is achieved by choosing $\delta_k$ so that $\min(\delta_k, 1) < \varepsilon \cdot 2^k / N$, we set $\delta_k = \min(\varepsilon \cdot 2^k / N, 1)$. Total boundedness of $M_k$ then provides a finite $\delta_k$-net $F_k$.
The Cartesian product $F = \prod_{k=1}^N F_k$ is finite (a product of finitely many finite sets). For any $x \in M$, choose the nearest net point in each coordinate to get $a \in F$, and sum the coordinate contributions:
\begin{align*}
d(x, a) = \sum_{k=1}^N 2^{-k} \min(d_k(x_k, a_k), 1) < \sum_{k=1}^N \frac{\varepsilon}{N} = \varepsilon.
\end{align*}
Why does this argument not extend directly to countably many factors? Because the Cartesian product $\prod_{k=1}^\infty F_k$ is infinite (even uncountable in general) when infinitely many $F_k$ have more than one element. We need a different strategy for the tail.
[/guided]
[/step]
[step:Handle the tail of the series using the geometric decay of the weights]
Now consider the countable product $M = \prod_{k=1}^\infty M_k$ with $d(x, y) = \sum_{k=1}^\infty 2^{-k} \min(d_k(x_k, y_k), 1)$.
Fix $\varepsilon > 0$. Since the series $\sum_{k=1}^\infty 2^{-k} = 1$ converges, we can choose $K \in \mathbb{N}$ large enough that
\begin{align*}
\sum_{k=K+1}^\infty 2^{-k} = 2^{-K} < \frac{\varepsilon}{2}.
\end{align*}
For any $x, y \in M$, regardless of the coordinates $x_k, y_k$ for $k > K$, the tail contribution satisfies
\begin{align*}
\sum_{k=K+1}^\infty 2^{-k} \min(d_k(x_k, y_k), 1) \le \sum_{k=K+1}^\infty 2^{-k} = 2^{-K} < \frac{\varepsilon}{2}.
\end{align*}
Therefore, the tail is uniformly small.
[guided]
The countable product metric is a weighted sum with geometrically decaying weights $2^{-k}$. The total weight is $\sum_{k=1}^\infty 2^{-k} = 1 < \infty$. This convergence is what makes the countable product tractable: beyond some index $K$, the remaining coordinates contribute at most $2^{-K}$ to the distance, regardless of how far apart the points are in those coordinates.
Choose $K$ with $2^{-K} < \varepsilon/2$. Since $\min(d_k(x_k, y_k), 1) \le 1$, the tail is bounded by
\begin{align*}
\sum_{k=K+1}^\infty 2^{-k} \min(d_k(x_k, y_k), 1) \le \sum_{k=K+1}^\infty 2^{-k} = 2^{-K} < \frac{\varepsilon}{2}.
\end{align*}
This bound is independent of $x$ and $y$ --- it holds for all points in $M$. The remaining $\varepsilon/2$ budget is spent on approximating the first $K$ coordinates.
[/guided]
[/step]
[step:Cover the first $K$ coordinates using the finite product result and combine]
By the finite product case (proved above), the finite product $M_1 \times \cdots \times M_K$ equipped with the metric $d'(x', y') = \sum_{k=1}^K 2^{-k} \min(d_k(x_k, y_k), 1)$ is totally bounded. There exists a finite set $\{a^{(1)}, \ldots, a^{(L)}\} \subset M_1 \times \cdots \times M_K$ such that for every $(x_1, \ldots, x_K) \in M_1 \times \cdots \times M_K$, there exists $j$ with
\begin{align*}
\sum_{k=1}^K 2^{-k} \min(d_k(x_k, a^{(j)}_k), 1) < \frac{\varepsilon}{2}.
\end{align*}
For each $j \in \{1, \ldots, L\}$, define $\tilde{a}^{(j)} \in M$ by setting the first $K$ coordinates to $a^{(j)}_1, \ldots, a^{(j)}_K$ and choosing the remaining coordinates $\tilde{a}^{(j)}_k$ for $k > K$ arbitrarily (any fixed element of $M_k$ suffices, since the tail contribution is uniformly controlled).
For any $x \in M$, choose $j$ such that the first $K$ coordinates of $x$ are within $\varepsilon/2$ of $a^{(j)}$ in the metric $d'$. Then
\begin{align*}
d(x, \tilde{a}^{(j)}) &= \sum_{k=1}^K 2^{-k} \min(d_k(x_k, a^{(j)}_k), 1) + \sum_{k=K+1}^\infty 2^{-k} \min(d_k(x_k, \tilde{a}^{(j)}_k), 1) \\
&< \frac{\varepsilon}{2} + 2^{-K} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{align*}
Therefore $\{\tilde{a}^{(1)}, \ldots, \tilde{a}^{(L)}\}$ is a finite $\varepsilon$-net for $M$, and $M$ is totally bounded.
[guided]
We combine the two ingredients: the finite product covers the "head" (first $K$ coordinates), and the geometric decay handles the "tail" (coordinates $k > K$).
From the previous step, we have $K$ with $2^{-K} < \varepsilon/2$. The finite product $M_1 \times \cdots \times M_K$ is totally bounded (by the finite product result), so there exists a finite $\varepsilon/2$-net $\{a^{(1)}, \ldots, a^{(L)}\}$ for it under the metric $d'$.
We "embed" each net point into the full product $M$ by extending it to all coordinates. For $k > K$, the choice of coordinate does not matter: pick any $c_k \in M_k$ (which exists since each $M_k$ is non-empty --- a totally bounded space is non-empty). Define $\tilde{a}^{(j)} = (a^{(j)}_1, \ldots, a^{(j)}_K, c_{K+1}, c_{K+2}, \ldots) \in M$.
For any $x = (x_1, x_2, \ldots) \in M$, the head-tail decomposition gives
\begin{align*}
d(x, \tilde{a}^{(j)}) &= \underbrace{\sum_{k=1}^K 2^{-k} \min(d_k(x_k, a^{(j)}_k), 1)}_{\text{head: } < \varepsilon/2} + \underbrace{\sum_{k=K+1}^\infty 2^{-k} \min(d_k(x_k, c_k), 1)}_{\text{tail: } \le 2^{-K} < \varepsilon/2},
\end{align*}
where $j$ is chosen so that the head is less than $\varepsilon/2$. The total is less than $\varepsilon$.
The set $\{\tilde{a}^{(1)}, \ldots, \tilde{a}^{(L)}\}$ is a finite $\varepsilon$-net for $M$. Since $\varepsilon > 0$ was arbitrary, $M$ is totally bounded. The argument also applies to any product metric of the form $d(x, y) = \sum_{k=1}^\infty w_k \min(d_k(x_k, y_k), 1)$ with $\sum w_k < \infty$ and $w_k > 0$, since the only property used is the convergence of the weight series.
[/guided]
[/step]