[guided]Let $m\in\mathbb N$, and let $C_1,\dots,C_m\in\mathcal C$ satisfy $d_Q(C_i,C_j)>\varepsilon$ for all distinct indices $i,j\in\{1,\dots,m\}$. The distance $d_Q$ in the theorem is the $L^2(Q)$ distance between indicator functions, $d_Q(C,D)=\|\mathbf 1_C-\mathbf 1_D\|_{L^2(Q)}$. Squaring it turns the problem into one about the symmetric-difference pseudometric
\begin{align*}
\rho_Q:\mathcal C\times\mathcal C\to[0,1],\qquad \rho_Q(C,D):=Q(C\triangle D).
\end{align*}
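Indeed, $|\mathbf 1_C-\mathbf 1_D|=\mathbf 1_{C\triangle D}$ takes only the values $0$ and $1$, so $d_Q(C,D)^2=\int\mathbf 1_{C\triangle D}\,dQ=Q(C\triangle D)=\rho_Q(C,D)$ for all $C,D\in\mathcal C$. In particular,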
\begin{align*}
\rho_Q(C_i,C_j)=d_Q(C_i,C_j)^2\ge \varepsilon^2
\end{align*}
for all distinct $i,j$.
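The identity $d_Q(C,D)^2=Q(C\triangle D)$ can be checked mechanically on a finite probability space. The sketch below assumes a hypothetical six-point space $S=\{0,\dots,5\}$ with hand-picked point masses and a hypothetical family of subsets; the helper names are illustrative only. It uses exact rational arithmetic so the comparison is not affected by rounding, and it also checks the triangle inequality for $\rho_Q$, which follows from $C\triangle E\subseteq(C\triangle D)\cup(D\triangle E)$.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite probability space: S = {0,...,5} with point masses q.
S = range(6)
q = [Fraction(1, 12), Fraction(1, 6), Fraction(1, 4),
     Fraction(1, 12), Fraction(1, 6), Fraction(1, 4)]
assert sum(q) == 1  # Q is a probability measure

def indicator(C):
    """Indicator function 1_C on S, as a list of 0/1 values."""
    return [1 if s in C else 0 for s in S]

def d_Q_squared(C, D):
    """Squared L^2(Q) distance between the indicators of C and D."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(q, indicator(C), indicator(D)))

def rho_Q(C, D):
    """Q-measure of the symmetric difference of C and D."""
    return sum(q[s] for s in set(C) ^ set(D))

# A small hypothetical family of subsets of S.
family = [frozenset(), frozenset({0, 1}), frozenset({1, 2, 3}),
          frozenset({2, 4, 5}), frozenset(S)]

for C, D in combinations(family, 2):
    # |1_C - 1_D| is 0/1-valued, so squaring it changes nothing pointwise.
    assert d_Q_squared(C, D) == rho_Q(C, D)

for C, D, E in product(family, repeat=3):
    # Triangle inequality for the symmetric-difference pseudometric.
    assert rho_Q(C, E) <= rho_Q(C, D) + rho_Q(D, E)

print("d_Q^2 == rho_Q on all pairs; triangle inequality holds")
```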
The external ingredient is Pollard's packing estimate for VC classes. Its hypotheses are that the class has finite VC dimension, that the measure is a probability measure on the underlying measurable space, and that the separation parameter $\alpha$ satisfies $0<\alpha<1$. All three hold here: $V(\mathcal C)=v<\infty$, $Q$ is a probability measure on $(S,\mathcal A)$, and every $\alpha$ with $0<\alpha<\varepsilon^2$ also satisfies $0<\alpha<1$, because $0<\varepsilon<1$ implies $\varepsilon^2<1$. The conclusion of Pollard's estimate is that every finite family in $\mathcal C$ whose pairwise symmetric differences have $Q$-measure strictly larger than $\alpha$ has cardinality at most
\begin{align*}
2(v+1)(4e)^v\alpha^{-v}.
\end{align*}
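The mechanism behind the estimate is a reduction to finite traces: a random sample $x_1,\dots,x_n$ drawn from $Q$ turns separation in $Q$-measure into separation of the traces $C\cap\{x_1,\dots,x_n\}$ on the sample, and the Sauer--Shelah lemma [citetheorem:1969] bounds the number of distinct traces of a class of VC dimension $v$ on $n$ points by $\sum_{k=0}^{v}\binom nk$.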
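The counting half of this mechanism can be seen on a concrete VC class. The sketch below assumes, purely for illustration, the class of closed intervals on the real line, which has VC dimension $2$: it enumerates every trace of an interval on an $n$-point set and compares the count with $\sum_{k=0}^{2}\binom nk$ and with $2^n$. The helper names are hypothetical, and the sampling half of the argument is not modelled.

```python
from math import comb

def interval_traces(points):
    """All traces {x in points : a <= x <= b} of closed intervals [a, b].

    Intervals on the line are an illustrative VC class of dimension 2.
    """
    xs = sorted(points)
    traces = {frozenset()}  # an interval can miss every point
    # Every non-empty trace is a run xs[i..j] of consecutive sorted points,
    # realised by the interval [xs[i], xs[j]].
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            traces.add(frozenset(xs[i:j + 1]))
    return traces

def sauer_shelah_bound(n, v):
    """Sauer--Shelah bound on the number of traces: sum_{k=0}^{v} binom(n, k)."""
    return sum(comb(n, k) for k in range(v + 1))

for n in range(1, 9):
    pts = [0.5 * i for i in range(n)]
    count = len(interval_traces(pts))
    assert count <= sauer_shelah_bound(n, 2)
    # Columns: n, number of traces, Sauer--Shelah bound, all subsets 2^n.
    print(n, count, sauer_shelah_bound(n, 2), 2 ** n)
```

For intervals the bound is attained exactly, since there are $1+n(n+1)/2=\binom n0+\binom n1+\binom n2$ traces; the point is that this grows polynomially in $n$, while $2^n$ does not.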
Now fix $\alpha$ with $0<\alpha<\varepsilon^2$. Since $Q(C_i\triangle C_j)\ge\varepsilon^2>\alpha$ for all distinct $i,j$, the strict separation hypothesis in Pollard's estimate is satisfied. Therefore
\begin{align*}
m\le 2(v+1)(4e)^v\alpha^{-v}.
\end{align*}
This inequality holds for every $0<\alpha<\varepsilon^2$. Since $m$ does not depend on $\alpha$ and $\alpha\mapsto\alpha^{-v}$ is continuous on $(0,\infty)$, letting $\alpha\uparrow\varepsilon^2$ gives
\begin{align*}
m\le 2(v+1)(4e)^v\varepsilon^{-2v}.
\end{align*}[/guided]