[proofplan]
We bound the uniform empirical error in outer expectation and then convert the bound to outer probability. Symmetrisation reduces the empirical error to a Rademacher process indexed by the same class of sets. Conditional on the realised sample, only the finitely many traces of $\mathcal C$ on the observed points matter; Sauer-Shelah bounds their number polynomially in $n$, and Massart's finite-class lemma bounds the conditional Rademacher supremum. The resulting deterministic upper bound tends to zero, and Markov's inequality gives the desired convergence.
[/proofplan]
[step:Define the outer error and symmetrise it]
For $n\in\mathbb N$, define the possibly nonmeasurable extended real map
\begin{align*}
Z_n:\Omega&\to[0,\infty]
\end{align*}
\begin{align*}
\omega&\mapsto \sup_{C\in\mathcal C}|P_n(C)(\omega)-P(C)|.
\end{align*}
If $\mathcal C=\varnothing$, then the convention in the theorem statement gives $Z_n=0$ for every $n\in\mathbb N$, so both outer-probability and ordinary-probability conclusions follow immediately. Hence, for the rest of the proof, assume $\mathcal C\neq\varnothing$.
Let $\mathbb E^*$ denote outer expectation with respect to $\mathbb P$. Since $\mathbb 1_C:S\to\mathbb R$ is measurable and bounded for every $C\in\mathcal C$, every member of the indicator class
\begin{align*}
\mathcal F_{\mathcal C}:=\{\mathbb 1_C:S\to\mathbb R:C\in\mathcal C\}
\end{align*}
is $P$-integrable. Applying the [[Symmetrization Inequality for Empirical Processes](/theorems/9851)][citetheorem:9823] to $\mathcal F_{\mathcal C}$ gives
\begin{align*}
\mathbb E^*[Z_n]\le 2\,\mathbb E^*\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(X_i)\right|\right],
\end{align*}
where $\varepsilon_1,\dots,\varepsilon_n$ are independent Rademacher random variables, independent of $X_1,\dots,X_n$.
[guided]
The object
\begin{align*}
Z_n(\omega)=\sup_{C\in\mathcal C}|P_n(C)(\omega)-P(C)|
\end{align*}
may fail to be $\mathcal F$-measurable because $\mathcal C$ need not be countable. This is why we use outer expectation $\mathbb E^*$ and later outer probability $\mathbb P^*$.
The symmetrisation theorem applies to the function class
\begin{align*}
\mathcal F_{\mathcal C}:=\{\mathbb 1_C:S\to\mathbb R:C\in\mathcal C\}.
\end{align*}
For each $C\in\mathcal C$, the function $\mathbb 1_C$ is measurable because $C\in\mathcal A$, and it is $P$-integrable because
\begin{align*}
\int_S |\mathbb 1_C(x)|\,dP(x)=P(C)\le 1.
\end{align*}
Thus the hypotheses of the [Symmetrization Inequality for Empirical Processes][citetheorem:9823] are satisfied for this bounded indicator class, with outer expectation used for possibly nonmeasurable suprema. Therefore
\begin{align*}
\mathbb E^*[Z_n]\le 2\,\mathbb E^*\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(X_i)\right|\right].
\end{align*}
This replaces the unknown centering $P(C)$ by random signs. The advantage is that, once the sample points are fixed, the class $\mathcal C$ contributes only through its finitely many membership patterns on those sample points.
[/guided]
[/step]
[step:Replace the class by its traces on the realised sample]
Fix $n\in\mathbb N$ and fix a deterministic vector $x=(x_1,\dots,x_n)\in S^n$. Define the trace vector set
\begin{align*}
T(x):=\{(\mathbb 1_C(x_1),\dots,\mathbb 1_C(x_n)):C\in\mathcal C\}\subset\{0,1\}^n.
\end{align*}
Then, conditional on $X_1=x_1,\dots,X_n=x_n$,
\begin{align*}
\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(x_i)\right|
=
\frac{1}{n}\sup_{t\in T(x)}\left|\sum_{i=1}^{n}\varepsilon_i t_i\right|.
\end{align*}
Let $D(x):=\{x_1,\dots,x_n\}\subset S$ be the set of distinct observed points. The map $C\mapsto C\cap D(x)$ determines the vector $(\mathbb 1_C(x_1),\dots,\mathbb 1_C(x_n))$, so
\begin{align*}
|T(x)|\le |\{C\cap D(x):C\in\mathcal C\}|\le \Pi_{\mathcal C}(n),
\end{align*}
where $\Pi_{\mathcal C}(n)$ is the growth function of $\mathcal C$.
[/step]
[step:Bound the number of traces by Sauer-Shelah]
Set $v:=V(\mathcal C)$. By hypothesis, $v<\infty$. The [Sauer-Shelah Lemma][citetheorem:1969] gives, for every $n\in\mathbb N$,
\begin{align*}
\Pi_{\mathcal C}(n)\le \sum_{j=0}^{v}{n\choose j}.
\end{align*}
If $v=0$, this gives $\Pi_{\mathcal C}(n)\le 1$ for every $n$. If $v\ge 1$, then
\begin{align*}
\Pi_{\mathcal C}(n)\le \sum_{j=0}^{v}{n\choose j}\le (n+1)^v
\end{align*}
for every $n\in\mathbb N$, since each [binomial coefficient](/page/Binomial%20Coefficient) is at most $n^j$ and the sum has $v+1$ terms, which is bounded by $(n+1)^v$ for $v\ge 1$.
[/step]
[step:Apply Massart's finite-class lemma conditionally]
For the fixed sample vector $x\in S^n$, let $\mathbb E_{\varepsilon}$ denote expectation with respect to the product distribution of the independent Rademacher variables $\varepsilon_1,\dots,\varepsilon_n$, with $x$ held fixed. Define
\begin{align*}
T_{\pm}(x):=T(x)\cup\{-t:t\in T(x)\}\subset\mathbb R^n.
\end{align*}
Then
\begin{align*}
\sup_{t\in T(x)}\left|\sum_{i=1}^{n}\varepsilon_i t_i\right|
=
\sup_{u\in T_{\pm}(x)}\sum_{i=1}^{n}\varepsilon_i u_i.
\end{align*}
Every $u\in T_{\pm}(x)$ satisfies $|u|\le\sqrt n$, and
\begin{align*}
|T_{\pm}(x)|\le 2|T(x)|.
\end{align*}
Applying [[Massart Finite-Class Lemma](/theorems/9826)][citetheorem:9826] conditionally on the sample yields
\begin{align*}
\mathbb E_{\varepsilon}\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(x_i)\right|\right]
\le
\sqrt{\frac{2\log(2|T(x)|)}{n}}.
\end{align*}
Using the trace bound from the preceding steps, we obtain the deterministic estimate
\begin{align*}
\mathbb E_{\varepsilon}\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(x_i)\right|\right]
\le
\sqrt{\frac{2\log(2\Pi_{\mathcal C}(n))}{n}}.
\end{align*}
The right-hand side is independent of the fixed vector $x$. Therefore this pointwise conditional bound controls the outer expectation over the joint law of $(X_1,\dots,X_n,\varepsilon_1,\dots,\varepsilon_n)$:
\begin{align*}
\mathbb E^*\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(X_i)\right|\right]
\le
\sqrt{\frac{2\log(2\Pi_{\mathcal C}(n))}{n}}.
\end{align*}
[guided]
After the sample points $x_1,\dots,x_n$ are fixed, the random quantity depends on $C$ only through the vector
\begin{align*}
(\mathbb 1_C(x_1),\dots,\mathbb 1_C(x_n)).
\end{align*}
Thus the index set is the finite subset
\begin{align*}
T(x)=\{(\mathbb 1_C(x_1),\dots,\mathbb 1_C(x_n)):C\in\mathcal C\}\subset\{0,1\}^n.
\end{align*}
We want to bound a supremum of absolute values. Massart's lemma is stated for a supremum of linear Rademacher sums without absolute values, so we symmetrise the finite set itself by defining
\begin{align*}
T_{\pm}(x):=T(x)\cup\{-t:t\in T(x)\}.
\end{align*}
Then
\begin{align*}
\sup_{t\in T(x)}\left|\sum_{i=1}^{n}\varepsilon_i t_i\right|
=
\sup_{u\in T_{\pm}(x)}\sum_{i=1}^{n}\varepsilon_i u_i.
\end{align*}
The hypotheses of [Massart Finite-Class Lemma][citetheorem:9826] are now verified. The set $T_{\pm}(x)$ is finite. For every $u\in T_{\pm}(x)$, each coordinate $u_i$ belongs to $\{-1,0,1\}$, hence
\begin{align*}
|u|^2=\sum_{i=1}^{n}u_i^2\le n,
\end{align*}
so $|u|\le\sqrt n$. Massart's lemma gives
\begin{align*}
\mathbb E_{\varepsilon}\left[\sup_{u\in T_{\pm}(x)}\sum_{i=1}^{n}\varepsilon_i u_i\right]
\le
\sqrt n\,\sqrt{2\log |T_{\pm}(x)|}.
\end{align*}
Dividing by $n$ and using $|T_{\pm}(x)|\le 2|T(x)|$ gives
\begin{align*}
\mathbb E_{\varepsilon}\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(x_i)\right|\right]
\le
\sqrt{\frac{2\log(2|T(x)|)}{n}}.
\end{align*}
Finally, repeated sample points do not increase the number of possible membership vectors: the vector is determined by the trace $C\cap\{x_1,\dots,x_n\}$ on the distinct observed points. Therefore $|T(x)|\le\Pi_{\mathcal C}(n)$, and hence
\begin{align*}
\mathbb E_{\varepsilon}\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(x_i)\right|\right]
\le
\sqrt{\frac{2\log(2\Pi_{\mathcal C}(n))}{n}}.
\end{align*}
Because this upper bound is deterministic and holds for every fixed sample vector $x\in S^n$, it also bounds the outer expectation after substituting $x_i=X_i$ and averaging over the sample together with the Rademacher signs:
\begin{align*}
\mathbb E^*\left[\sup_{C\in\mathcal C}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\mathbb 1_C(X_i)\right|\right]
\le
\sqrt{\frac{2\log(2\Pi_{\mathcal C}(n))}{n}}.
\end{align*}
[/guided]
[/step]
[step:Show that the outer expectation tends to zero]
Combining the symmetrisation estimate with the conditional finite-class bound gives
\begin{align*}
\mathbb E^*[Z_n]\le 2\sqrt{\frac{2\log(2\Pi_{\mathcal C}(n))}{n}}.
\end{align*}
If $v=0$, then $\Pi_{\mathcal C}(n)\le 1$, and therefore
\begin{align*}
\mathbb E^*[Z_n]\le 2\sqrt{\frac{2\log 2}{n}}\to 0.
\end{align*}
If $v\ge 1$, then $\Pi_{\mathcal C}(n)\le (n+1)^v$, and hence
\begin{align*}
\mathbb E^*[Z_n]\le 2\sqrt{\frac{2\log(2(n+1)^v)}{n}}.
\end{align*}
Since
\begin{align*}
\frac{\log(2(n+1)^v)}{n}=\frac{\log 2+v\log(n+1)}{n}\to 0,
\end{align*}
we have
\begin{align*}
\mathbb E^*[Z_n]\to 0.
\end{align*}
[/step]
[step:Convert the expectation bound into outer probability convergence]
Fix $\eta>0$. By Markov's inequality for outer expectation applied to the nonnegative map $Z_n$, we have
\begin{align*}
\mathbb P^*(Z_n>\eta)\le \frac{\mathbb E^*[Z_n]}{\eta}.
\end{align*}
Since $\mathbb E^*[Z_n]\to 0$, it follows that
\begin{align*}
\mathbb P^*\left(\sup_{C\in\mathcal C}|P_n(C)-P(C)|>\eta\right)\to 0.
\end{align*}
This proves convergence in outer probability.
[/step]
[step:Recover ordinary convergence in probability under measurability]
Assume now that, for every $n\in\mathbb N$, the map $Z_n:\Omega\to[0,\infty]$ is $\mathcal F$-measurable. Then outer probability agrees with ordinary probability on the event $\{Z_n>\eta\}$ for every $\eta>0$, because this event belongs to $\mathcal F$. Therefore
\begin{align*}
\mathbb P(Z_n>\eta)=\mathbb P^*(Z_n>\eta)\to 0
\end{align*}
for every $\eta>0$. This is exactly
\begin{align*}
\sup_{C\in\mathcal C}|P_n(C)-P(C)|\xrightarrow{\mathbb P}0.
\end{align*}
[/step]