[proofplan]
We decompose the exponential sum into residue classes modulo $q$. On reduced residue classes, the Siegel-Walfisz [prime number theorem](/theorems/1692) in arithmetic progressions, namely [citetheorem:9080], gives a uniform approximation to the cumulative von Mangoldt sums; partial summation, namely [citetheorem:9048], transfers this approximation to the weight $e(\beta n)$. The sum over reduced residues is then evaluated by the elementary Ramanujan sum identity $\sum_{\gcd(r,q)=1}e(ar/q)=\mu(q)$, while non-reduced residue classes contribute only powers of primes dividing $q$. All losses from $q$, $L$, and partial summation are powers of $\log N$, and are absorbed into the exponential error after decreasing the constant in the exponential and increasing the threshold $N_A$.
[/proofplan]
[step:Choose uniform constants for the arithmetic progression input]
Fix $A>0$. We shall use [citetheorem:9080] with parameter $B:=2A+4$. Thus there exist constants $C_0=C_0(A)>0$, $c_0=c_0(A)>0$, and $x_0=x_0(A)\ge 3$ such that, whenever $x\ge x_0$, $Q\le(\log x)^B$, and $r$ is a reduced residue class modulo $Q$, one has
\begin{align*}
\left|\sum_{\substack{1\le n\le x, n\equiv r\pmod Q}}\Lambda(n)-\frac{x}{\varphi(Q)}\right|\le C_0 x\exp(-c_0\sqrt{\log x}).
\end{align*}
We increase the final threshold $N_A$ so that $N_A\ge x_0^2$, $\log N_A\ge 1$, and, for every $N\ge N_A$, every $t\in[N^{1/2},N]$ satisfies
\begin{align*}
(\log N)^A\le(\log t)^B.
\end{align*}
This is possible because $\log t\ge \frac{1}{2}\log N$ on $[N^{1/2},N]$ and $B>A$.
[/step]
[step:Separate the sum into reduced and non-reduced residue classes]
Let $N\ge N_A$, let $a,q,L,\beta,\alpha$ satisfy the hypotheses, and define $f:[1,N]\to\mathbb C$ by
\begin{align*}
f(t):=e(\beta t).
\end{align*}
Then $f\in C^1([1,N];\mathbb C)$ and
\begin{align*}
f'(t)=2\pi i\beta e(\beta t).
\end{align*}
Writing each integer $n$ by its residue class modulo $q$, we have
\begin{align*}
S_N(a/q+\beta)=\sum_{r\bmod q}e(ar/q)\sum_{\substack{1\le n\le N, n\equiv r\pmod q}}\Lambda(n)e(\beta n).
\end{align*}
Let $S_{\mathrm{red}}$ denote the sub-sum over residue classes $r\bmod q$ with $\gcd(r,q)=1$, and let $S_{\mathrm{bad}}$ denote the complementary sub-sum over classes with $\gcd(r,q)>1$.
If $\Lambda(n)\ne0$ and $\gcd(n,q)>1$, then $n=p^j$ for some prime $p\mid q$ and some $j\in\mathbb N$. Therefore
\begin{align*}
|S_{\mathrm{bad}}|\le\sum_{p\mid q}\sum_{\substack{j\ge1, p^j\le N}}\log p.
\end{align*}
For each prime $p\mid q$, the inner sum is at most $\log N$. Since $q\le(\log N)^A$, the number of prime divisors of $q$ is at most $\log q/\log 2\le A\log\log N/\log 2$. Hence
\begin{align*}
|S_{\mathrm{bad}}|\le \frac{A}{\log 2}(\log N)(\log\log N).
\end{align*}
After increasing $N_A$ and decreasing the eventual exponential constant if necessary, this is at most $N\exp(-(c_0/4)\sqrt{\log N})$ for all $N\ge N_A$.
[/step]
[step:Apply partial summation on each reduced residue class]
Fix a reduced residue class $r\bmod q$. Define $A_r:[0,N]\to\mathbb R$ by
\begin{align*}
A_r(t):=\sum_{\substack{1\le n\le t, n\equiv r\pmod q}}\Lambda(n).
\end{align*}
For $1\le t<N^{1/2}$ we use the elementary bound $\Lambda(n)\le\log n\le\log t$ for every integer $n$ with $2\le n\le t$, while $\Lambda(1)=0$. Hence
\begin{align*}
|A_r(t)|\le\sum_{1\le n\le t}\Lambda(n)\le t\log t\le N^{1/2}\log N.
\end{align*}
For $N^{1/2}\le t\le N$, the choice of $N_A$ gives $q\le(\log t)^B$, so the arithmetic progression estimate gives
\begin{align*}
A_r(t)=\frac{t}{\varphi(q)}+E_r(t)
\end{align*}
with
\begin{align*}
|E_r(t)|\le C_0 t\exp(-c_0\sqrt{\log t}).
\end{align*}
In particular, after increasing $N_A$ and enlarging $C_1=C_1(A)>0$, we have the uniform bound
\begin{align*}
|E_r(t)|\le C_1 N\exp(-(c_0/2)\sqrt{\log N})
\end{align*}
for every $1\le t\le N$, where for $t<N^{1/2}$ the left-hand side is interpreted as $A_r(t)-t/\varphi(q)$.
By [citetheorem:9048] applied to the sequence $(a_n)_{1\le n\le N}$ defined by $a_n=\Lambda(n)$ when $n\equiv r\pmod q$ and by $a_n=0$ otherwise, and to the function $f$, we obtain
\begin{align*}
\sum_{\substack{1\le n\le N, n\equiv r\pmod q}}\Lambda(n)e(\beta n)=A_r(N)f(N)-\int_1^N A_r(t)f'(t)\,d\mathcal L^1(t).
\end{align*}
Substituting $A_r(t)=t/\varphi(q)+E_r(t)$ gives
\begin{align*}
\sum_{\substack{1\le n\le N, n\equiv r\pmod q}}\Lambda(n)e(\beta n)=\frac{1}{\varphi(q)}\left(Nf(N)-\int_1^N t f'(t)\,d\mathcal L^1(t)\right)+R_r,
\end{align*}
where
\begin{align*}
R_r:=E_r(N)f(N)-\int_1^N E_r(t)f'(t)\,d\mathcal L^1(t).
\end{align*}
Since $|f(N)|=1$ and $|f'(t)|=2\pi|\beta|$, the hypotheses on $L$, $q$, and $\beta$ give
\begin{align*}
|\beta|\le \frac{L}{qN}\le \frac{(\log N)^A}{N}.
\end{align*}
Therefore
\begin{align*}
|R_r|\le C_1 N\exp(-(c_0/2)\sqrt{\log N})\left(1+2\pi(\log N)^A\right).
\end{align*}
After decreasing $c_0/2$ to $c_2=c_2(A)>0$ and increasing $N_A$, this becomes
\begin{align*}
|R_r|\le C_2 N\exp(-c_2\sqrt{\log N})
\end{align*}
with $C_2=C_2(A)>0$.
[guided]
The purpose of this step is to turn the unweighted arithmetic progression estimate into a weighted one. The weight is $f(t)=e(\beta t)$, and the condition $|\beta|\le L/(qN)$ says that this weight varies slowly on the interval $[1,N]$ up to a logarithmic factor.
For a fixed reduced residue class $r\bmod q$, define the counting function
\begin{align*}
A_r(t):=\sum_{\substack{1\le n\le t, n\equiv r\pmod q}}\Lambda(n).
\end{align*}
The theorem [citetheorem:9080] applies uniformly on the range $N^{1/2}\le t\le N$: indeed $q\le(\log N)^A$, and the threshold $N_A$ was chosen so that $(\log N)^A\le(\log t)^B$ throughout this range. Hence
\begin{align*}
A_r(t)=\frac{t}{\varphi(q)}+E_r(t)
\end{align*}
with
\begin{align*}
|E_r(t)|\le C_0t\exp(-c_0\sqrt{\log t})
\end{align*}
for $N^{1/2}\le t\le N$. For $1\le t<N^{1/2}$ we do not use the [prime number theorem](/theorems/1742). Instead, $\Lambda(1)=0$ and $\Lambda(n)\le\log n\le\log t$ for every integer $n$ with $2\le n\le t$, so
\begin{align*}
|A_r(t)|\le\sum_{1\le n\le t}\Lambda(n)\le t\log t\le N^{1/2}\log N.
\end{align*}
This is much smaller than $N\exp(-c\sqrt{\log N})$ for a smaller positive constant $c$ and all sufficiently large $N$. Thus, after adjusting constants depending only on $A$, we may use one uniform bound
\begin{align*}
|E_r(t)|\le C_1N\exp(-(c_0/2)\sqrt{\log N})
\end{align*}
for all $1\le t\le N$.
Now apply partial summation, [citetheorem:9048], to the sequence
\begin{align*}
a_n:=\Lambda(n)\mathbb{1}_{n\equiv r\pmod q}
\end{align*}
and the $C^1$ function $f:[1,N]\to\mathbb C$ given by $f(t)=e(\beta t)$. Its derivative is
\begin{align*}
f'(t)=2\pi i\beta e(\beta t).
\end{align*}
Partial summation gives
\begin{align*}
\sum_{\substack{1\le n\le N, n\equiv r\pmod q}}\Lambda(n)e(\beta n)=A_r(N)f(N)-\int_1^N A_r(t)f'(t)\,d\mathcal L^1(t).
\end{align*}
Substituting $A_r(t)=t/\varphi(q)+E_r(t)$ separates the main term and the error:
\begin{align*}
\sum_{\substack{1\le n\le N, n\equiv r\pmod q}}\Lambda(n)e(\beta n)=\frac{1}{\varphi(q)}\left(Nf(N)-\int_1^N t f'(t)\,d\mathcal L^1(t)\right)+R_r,
\end{align*}
where
\begin{align*}
R_r:=E_r(N)f(N)-\int_1^N E_r(t)f'(t)\,d\mathcal L^1(t).
\end{align*}
The error is controlled because $|f(N)|=1$ and $|f'(t)|=2\pi|\beta|$. Since $|\beta|N\le(\log N)^A$, the integral contributes only a logarithmic loss:
\begin{align*}
|R_r|\le C_1N\exp(-(c_0/2)\sqrt{\log N})\left(1+2\pi(\log N)^A\right).
\end{align*}
A power of $\log N$ is absorbed into $\exp(c\sqrt{\log N})$ after reducing the exponential constant. Thus there are constants $C_2=C_2(A)>0$ and $c_2=c_2(A)>0$ such that
\begin{align*}
|R_r|\le C_2N\exp(-c_2\sqrt{\log N}).
\end{align*}
[/guided]
[/step]
[step:Replace the continuous partial summation main term by the discrete exponential sum]
Define
\begin{align*}
M_\beta:=Nf(N)-\int_1^N t f'(t)\,d\mathcal L^1(t).
\end{align*}
Applying [citetheorem:9048] to the constant sequence $b_n=1$ gives
\begin{align*}
\sum_{1\le n\le N}e(\beta n)=Nf(N)-\int_1^N \lfloor t\rfloor f'(t)\,d\mathcal L^1(t).
\end{align*}
Therefore
\begin{align*}
M_\beta-\sum_{1\le n\le N}e(\beta n)=\int_1^N(\lfloor t\rfloor-t)f'(t)\,d\mathcal L^1(t).
\end{align*}
Since $|\lfloor t\rfloor-t|\le1$ and $|f'(t)|=2\pi|\beta|$, we get
\begin{align*}
\left|M_\beta-\sum_{1\le n\le N}e(\beta n)\right|\le 2\pi|\beta|N\le 2\pi(\log N)^A.
\end{align*}
After enlarging constants and increasing $N_A$, this error is at most $N\exp(-c_2\sqrt{\log N})$ for all $N\ge N_A$.
[/step]
[step:Evaluate the reduced residue sum by the Ramanujan identity]
We prove the elementary identity
\begin{align*}
\sum_{\substack{r\bmod q, \gcd(r, q)=1}}e(ar/q)=\mu(q)
\end{align*}
under the hypothesis $\gcd(a,q)=1$.
[claim:Ramanujan sum at a reduced frequency]
If $q\in\mathbb N$ and $\gcd(a,q)=1$, then
\begin{align*}
\sum_{\substack{r\bmod q, \gcd(r, q)=1}}e(ar/q)=\mu(q).
\end{align*}
[/claim]
[proof]
Define the Ramanujan sum map $c_q:\mathbb Z\to\mathbb C$ by
\begin{align*}
c_q(a'):=\sum_{\substack{r\bmod q, \gcd(r, q)=1}}e(a'r/q)
\end{align*}
for each $a'\in\mathbb Z$.
Define $\delta_q:\mathbb Z/q\mathbb Z\to\{0,1\}$ by setting $\delta_q(r)=1$ when $\gcd(r,q)=1$ and $\delta_q(r)=0$ otherwise. Using the identity
\begin{align*}
\delta_q(r)=\sum_{d\mid \gcd(r,q)}\mu(d),
\end{align*}
we obtain
\begin{align*}
c_q(a)=\sum_{r\bmod q}e(ar/q)\sum_{d\mid \gcd(r,q)}\mu(d).
\end{align*}
Interchanging the finite sums gives
\begin{align*}
c_q(a)=\sum_{d\mid q}\mu(d)\sum_{\substack{r\bmod q, d\mid r}}e(ar/q).
\end{align*}
For a fixed divisor $d\mid q$, write $r=dm$ with $m\bmod q/d$. Then
\begin{align*}
\sum_{\substack{r\bmod q, d\mid r}}e(ar/q)=\sum_{m\bmod q/d}e(am/(q/d)).
\end{align*}
This finite geometric sum equals $q/d$ if $q/d\mid a$, and equals $0$ otherwise. Since $\gcd(a,q)=1$, the condition $q/d\mid a$ holds only when $d=q$. Hence
\begin{align*}
c_q(a)=\mu(q).
\end{align*}
[/proof]
Summing the reduced-class formula from the previous step over all reduced $r\bmod q$ gives
\begin{align*}
S_{\mathrm{red}}=\frac{\mu(q)}{\varphi(q)}M_\beta+\sum_{\substack{r\bmod q, \gcd(r, q)=1}}e(ar/q)R_r.
\end{align*}
Using $\varphi(q)\le q\le(\log N)^A$ and the bound for $R_r$, we obtain
\begin{align*}
\left|\sum_{\substack{r\bmod q, \gcd(r, q)=1}}e(ar/q)R_r\right|\le q C_2N\exp(-c_2\sqrt{\log N}).
\end{align*}
After reducing the exponential constant once more and increasing $N_A$, this is bounded by
\begin{align*}
C_3N\exp(-c_3\sqrt{\log N})
\end{align*}
for constants $C_3=C_3(A)>0$ and $c_3=c_3(A)>0$.
[/step]
[step:Combine the reduced and non-reduced estimates]
From the previous steps,
\begin{align*}
S_N(a/q+\beta)=\frac{\mu(q)}{\varphi(q)}M_\beta+O_A\left(N\exp(-c_3\sqrt{\log N})\right).
\end{align*}
The replacement estimate for $M_\beta$ gives
\begin{align*}
\frac{\mu(q)}{\varphi(q)}M_\beta=\frac{\mu(q)}{\varphi(q)}\sum_{1\le n\le N}e(\beta n)+O_A\left(N\exp(-c_3\sqrt{\log N})\right),
\end{align*}
because $|\mu(q)/\varphi(q)|\le1$. The non-reduced residue class contribution is also $O_A(N\exp(-c_3\sqrt{\log N}))$. Therefore, after renaming constants as $C_A>0$ and $c_A>0$, we obtain
\begin{align*}
\left|S_N(a/q+\beta)-\frac{\mu(q)}{\varphi(q)}\sum_{1\le n\le N}e(\beta n)\right|\le C_A N\exp(-c_A\sqrt{\log N}).
\end{align*}
All constants introduced depend only on $A$, and every threshold enlargement was made only in terms of $A$. This proves the asserted uniform estimate.
[/step]