[proofplan]
We prove the two inequalities $\mathcal{P}(A, 2\varepsilon) \le \mathcal{N}(A, \varepsilon) \le \mathcal{P}(A, \varepsilon)$ separately. For the upper bound $\mathcal{N}(A, \varepsilon) \le \mathcal{P}(A, \varepsilon)$, we show that a maximal $\varepsilon$-separated subset of $A$ is automatically an $\varepsilon$-net. For the lower bound $\mathcal{P}(A, 2\varepsilon) \le \mathcal{N}(A, \varepsilon)$, we show that any $\varepsilon$-covering must assign distinct balls to distinct points of a $2\varepsilon$-separated set, since no single $\varepsilon$-ball can contain two such points.
[/proofplan]
[step:Upper bound: a maximal $\varepsilon$-separated set is an $\varepsilon$-net]
We prove $\mathcal{N}(A, \varepsilon) \le \mathcal{P}(A, \varepsilon)$.
Let $S = \{s_1, \ldots, s_P\} \subset A$ be a maximal $\varepsilon$-separated subset of $A$, meaning $d(s_i, s_j) \ge \varepsilon$ for all $i \neq j$, and no point of $A$ can be added to $S$ while preserving the $\varepsilon$-separation property. Such a maximal set exists: start with any point of $A$ and greedily add points at distance $\ge \varepsilon$ from all previously selected points; the process terminates because $|S| \le \mathcal{P}(A, \varepsilon) < \infty$ (if $\mathcal{P}(A, \varepsilon) = \infty$, the inequality $\mathcal{N}(A, \varepsilon) \le \mathcal{P}(A, \varepsilon)$ holds vacuously).
We claim $S$ is an $\varepsilon$-net for $A$: for any $a \in A$, there exists $s_i \in S$ with $d(a, s_i) < \varepsilon$. Indeed, if $d(a, s_i) \ge \varepsilon$ for all $i$, then $S \cup \{a\}$ would be an $\varepsilon$-separated subset of $A$ strictly larger than $S$, contradicting maximality.
Therefore $A \subset \bigcup_{i=1}^P B(s_i, \varepsilon)$, and $\mathcal{N}(A, \varepsilon) \le P$. Since $S$ was an arbitrary maximal $\varepsilon$-separated set with $|S| = P$, and a maximal $\varepsilon$-separated set achieves $|S| = \mathcal{P}(A, \varepsilon)$, we conclude $\mathcal{N}(A, \varepsilon) \le \mathcal{P}(A, \varepsilon)$.
[guided]
The upper bound rests on a beautifully simple observation: the greedy process used to build a maximal separated set simultaneously produces a covering.
An **$\varepsilon$-separated** subset $S \subset A$ is one where $d(s, s') \ge \varepsilon$ for all distinct $s, s' \in S$. A **maximal** $\varepsilon$-separated set is one that cannot be enlarged: no point $a \in A \setminus S$ satisfies $d(a, s) \ge \varepsilon$ for all $s \in S$. Equivalently, for every $a \in A$, some $s \in S$ satisfies $d(a, s) < \varepsilon$.
But "for every $a \in A$, some $s \in S$ satisfies $d(a, s) < \varepsilon$" is precisely the statement that $S$ is an $\varepsilon$-net for $A$ (with centres in $A$ itself). So $A \subset \bigcup_{s \in S} B(s, \varepsilon)$, and $\mathcal{N}(A, \varepsilon) \le |S| = \mathcal{P}(A, \varepsilon)$.
Why is maximality essential? A non-maximal $\varepsilon$-separated set might "miss" parts of $A$: some points of $A$ could be at distance $\ge \varepsilon$ from every point of $S$. Maximality precisely excludes this possibility.
This argument also provides a practical algorithm for constructing $\varepsilon$-nets: run the greedy separated-set construction until no more points can be added. The resulting set is both a maximal $\varepsilon$-separated set and an $\varepsilon$-net, and its cardinality is at most $\mathcal{P}(A, \varepsilon)$.
[/guided]
[/step]
[step:Lower bound: an $\varepsilon$-covering must distinguish $2\varepsilon$-separated points]
We prove $\mathcal{P}(A, 2\varepsilon) \le \mathcal{N}(A, \varepsilon)$.
Let $\{x_1, \ldots, x_N\} \subset M$ be a finite $\varepsilon$-net for $A$, so $A \subset \bigcup_{i=1}^N B(x_i, \varepsilon)$ and $N \ge \mathcal{N}(A, \varepsilon)$. Let $S = \{s_1, \ldots, s_P\} \subset A$ be a $2\varepsilon$-separated subset, so $d(s_j, s_k) \ge 2\varepsilon$ for all $j \neq k$.
For each $s_j \in S$, since $s_j \in A \subset \bigcup_{i=1}^N B(x_i, \varepsilon)$, there exists $i(j) \in \{1, \ldots, N\}$ with $d(s_j, x_{i(j)}) < \varepsilon$. We claim the map $j \mapsto i(j)$ is injective. Suppose $i(j) = i(k)$ for $j \neq k$. Then
\begin{align*}
d(s_j, s_k) \le d(s_j, x_{i(j)}) + d(x_{i(j)}, s_k) = d(s_j, x_{i(j)}) + d(x_{i(k)}, s_k) < \varepsilon + \varepsilon = 2\varepsilon,
\end{align*}
contradicting $d(s_j, s_k) \ge 2\varepsilon$. Therefore the map is injective, and $P = |S| \le N$.
Since this holds for every $\varepsilon$-net of size $N$ and every $2\varepsilon$-separated set of size $P$, taking $N = \mathcal{N}(A, \varepsilon)$ (a minimum-size $\varepsilon$-net) and $P = \mathcal{P}(A, 2\varepsilon)$ (a maximum-size $2\varepsilon$-separated set) gives $\mathcal{P}(A, 2\varepsilon) \le \mathcal{N}(A, \varepsilon)$.
[guided]
The lower bound says that covering numbers cannot be too small relative to packing numbers: any covering must use enough balls to "distinguish" well-separated points.
Let $\{x_1, \ldots, x_N\}$ be an $\varepsilon$-net for $A$ (with $N = \mathcal{N}(A, \varepsilon)$), and let $S = \{s_1, \ldots, s_P\}$ be a $2\varepsilon$-separated subset of $A$ (with $P = \mathcal{P}(A, 2\varepsilon)$). Each $s_j$ lies in some ball $B(x_{i(j)}, \varepsilon)$.
The key claim is that no ball $B(x_i, \varepsilon)$ can contain two distinct points of $S$. If $s_j, s_k \in B(x_i, \varepsilon)$ with $j \neq k$, the triangle inequality gives
\begin{align*}
d(s_j, s_k) \le d(s_j, x_i) + d(x_i, s_k) < \varepsilon + \varepsilon = 2\varepsilon,
\end{align*}
contradicting the $2\varepsilon$-separation of $S$. Therefore the assignment $j \mapsto i(j)$ is injective: each of the $P$ separated points lands in a distinct ball of the covering. An injective map from $\{1, \ldots, P\}$ to $\{1, \ldots, N\}$ requires $P \le N$.
The factor of $2$ is geometrically natural: two points at distance $\ge 2\varepsilon$ cannot both be within $\varepsilon$ of a common centre. The duality between packing and covering is a manifestation of this simple geometric fact: packing measures how many "non-overlapping" balls of radius $\varepsilon$ fit inside $A$, while covering measures how many balls of radius $\varepsilon$ are needed to contain $A$. The factor-of-$2$ loss in the radius is the price of moving between the "inside" and "outside" viewpoints.
[/guided]
[/step]