[proofplan]
We prove the estimate by viewing the discrepancies as the image of the finitely supported sequence $(a_n)$ under the residue-class discrepancy operator. The only analytic input is Selberg's residue-ring form of the large sieve, stated in the exact form needed below: for an interval of $N$ consecutive integers and moduli $q \le Q$, this operator has squared norm at most $N - 1 + Q^2$. After defining the residue sums and subtracting the reduced-residue average for each modulus, the desired inequality is exactly this operator norm bound applied to the given sequence.
[/proofplan]
[step:Define the residue-class discrepancy operator]
For each integer $q$ with $1 \le q \le Q$, let $R_q := (\mathbb{Z}/q\mathbb{Z})^\times$ denote the reduced residue classes modulo $q$. For each $a \in R_q$, define the residue sum
\begin{align*}
A(q,a) := \sum_{M < n \le M+N,\ n \equiv a \bmod q} a_n.
\end{align*}
Define the reduced-residue average
\begin{align*}
\overline{A}(q) := \frac{1}{\varphi(q)} \sum_{b \in R_q} A(q,b),
\end{align*}
and define the discrepancy
\begin{align*}
E(q,a) := A(q,a) - \overline{A}(q).
\end{align*}
Thus the left-hand side of the claimed estimate is the squared $\ell^2$ norm of the array $(E(q,a))_{q \le Q,\, a \in R_q}$.
[/step]
[step:Prove the required operator bound by duality]
Let $\mathcal{I} := \{n \in \mathbb{Z} : M < n \le M+N\}$ denote the interval of indices. For each $q \le Q$, let $B_q: \mathbb{C}^{R_q} \to \mathbb{C}^{R_q}$ be the [orthogonal projection](/theorems/437), with respect to the standard [inner product](/page/Inner%20Product) on $\mathbb{C}^{R_q}$, onto the subspace of vectors whose coordinate sum is $0$. The array $(E(q,a))_{a \in R_q}$ is obtained by applying $B_q$ to the array $(A(q,a))_{a \in R_q}$.
By Hilbert-space duality, it is enough to prove that for every finitely supported family $\beta = (\beta_{q,a})_{q \le Q,\ a \in R_q}$ satisfying
\begin{align*}
\sum_{a \in R_q} \beta_{q,a} = 0
\end{align*}
for each $q \le Q$, one has
\begin{align*}
\sum_{n \in \mathcal{I}} \left|\sum_{q \le Q} \beta_{q,n \bmod q}\mathbb{1}_{\gcd(n,q)=1}\right|^2 \le (N - 1 + Q^2) \sum_{q \le Q} \sum_{a \in R_q} |\beta_{q,a}|^2.
\end{align*}
Here $\mathbb{1}_{\gcd(n,q)=1}$ is the indicator of the condition $\gcd(n,q)=1$, and when this indicator is $0$ the corresponding term is interpreted as $0$. The displayed estimate is exactly the Selberg residue-class dual large-sieve bound assumed in the theorem statement.
Applying this dual bound to the adjoint of the discrepancy operator gives
\begin{align*}
\sum_{q \le Q} \sum_{a \in R_q} |E(q,a)|^2 \le (N - 1 + Q^2) \sum_{n \in \mathcal{I}} |a_n|^2.
\end{align*}
Since $\mathcal{I} = \{n \in \mathbb{Z} : M < n \le M+N\}$, this is
\begin{align*}
\sum_{q \le Q} \sum_{a \in R_q} |E(q,a)|^2 \le (N - 1 + Q^2) \sum_{M < n \le M+N} |a_n|^2.
\end{align*}
[guided]
The main point is to convert the stated Selberg residue-class dual bound into the desired mean-square discrepancy estimate. Let $\mathcal{I} := \{n \in \mathbb{Z} : M < n \le M+N\}$ be the finite interval supporting the sequence. For a fixed modulus $q$, the operation $A(q,a) \mapsto E(q,a)$ subtracts the average over $R_q$; in finite-dimensional Hilbert-space language this is the orthogonal projection $B_q: \mathbb{C}^{R_q} \to \mathbb{C}^{R_q}$ onto the hyperplane of vectors with coordinate sum $0$.
We now pass to the dual operator. It is enough to test against families $\beta = (\beta_{q,a})_{q \le Q,\ a \in R_q}$ with zero coordinate sum for each fixed $q$, because the range of each projection $B_q$ is exactly that zero-sum hyperplane. The adjoint expression is
\begin{align*}
n \mapsto \sum_{q \le Q} \beta_{q,n \bmod q}\mathbb{1}_{\gcd(n,q)=1}.
\end{align*}
Thus duality reduces the desired estimate to
\begin{align*}
\sum_{n \in \mathcal{I}} \left|\sum_{q \le Q} \beta_{q,n \bmod q}\mathbb{1}_{\gcd(n,q)=1}\right|^2 \le (N - 1 + Q^2) \sum_{q \le Q} \sum_{a \in R_q} |\beta_{q,a}|^2.
\end{align*}
This is exactly the Selberg residue-class dual large-sieve estimate assumed in the theorem statement. The symbol $\mathbb{1}_{\gcd(n,q)=1}$ is the indicator of the coprimality condition, so terms with $\gcd(n,q)>1$ contribute $0$ and no residue class $n\bmod q$ in $R_q$ needs to be chosen for them.
Having applied the assumed dual estimate, Hilbert-space duality gives the corresponding primal estimate for the discrepancy operator:
\begin{align*}
\sum_{q \le Q} \sum_{a \in R_q} |E(q,a)|^2 \le (N - 1 + Q^2) \sum_{n \in \mathcal{I}} |a_n|^2.
\end{align*}
Finally, substituting the definition of $\mathcal{I}$ gives
\begin{align*}
\sum_{q \le Q} \sum_{a \in R_q} |E(q,a)|^2 \le (N - 1 + Q^2) \sum_{M < n \le M+N} |a_n|^2.
\end{align*}
[/guided]
[/step]
[step:Translate reduced residue notation back to the stated sum]
The notation $a \in R_q$ is equivalent to summing over residue classes $a \bmod q$ with $\gcd(a,q)=1$. Therefore
\begin{align*}
\sum_{q \le Q} \sum_{a \in R_q} |E(q,a)|^2 = \sum_{q \le Q}\sum_{a \bmod q,\ \gcd(a,q)=1}|E(q,a)|^2.
\end{align*}
Substituting this identity into the preceding estimate proves
\begin{align*}
\sum_{q \le Q}\sum_{a \bmod q,\ \gcd(a,q)=1}|E(q,a)|^2 \le (N - 1 + Q^2)\sum_{M < n \le M+N}|a_n|^2.
\end{align*}
This is the claimed inequality.
[/step]