[step:Approximate each residue-class sum by one $q$-th of the integral]
We prove the following elementary estimate: for every $r\in\{1,\dots,q\}$,
\begin{align*}
T_r=q^{-1}V_k(\beta;P)+O_k(1+P^k|\beta|),
\end{align*}
where the implied constant is independent of $r$, $q$, $P$, and $\beta$.
For $r\in\{1,\dots,q\}$, define the counting function
\begin{align*}
A_r:[0,P]\to\mathbb R,\qquad t\mapsto \#\{x\in\mathbb Z:1\le x\le t,\ x\equiv r\pmod q\}.
\end{align*}
Also define the discrepancy function
\begin{align*}
D_r:[0,P]\to\mathbb R,\qquad t\mapsto A_r(t)-\frac{t}{q}.
\end{align*}
For every $t\in[0,P]$, the set $\{1,\dots,\lfloor t\rfloor\}$ contains either $\lfloor t/q\rfloor$ or $\lfloor t/q\rfloor+1$ integers in the residue class $r\pmod q$. Since $t/q-1<\lfloor t/q\rfloor\le t/q$, this gives $-1<D_r(t)\le 1$, and in particular
\begin{align*}
|D_r(t)|\le 2.
\end{align*}
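As a quick illustration of these definitions (the values $q=3$ and $r=2$ are chosen here only as an example), the class $2\pmod 3$ consists of $2,5,8,\dots$, and
\begin{align*}
A_2(7.5)&=\#\{2,5\}=2=\lfloor 7.5/3\rfloor, & D_2(7.5)&=2-\tfrac{7.5}{3}=-\tfrac12,\\
A_2(8)&=\#\{2,5,8\}=3=\lfloor 8/3\rfloor+1, & D_2(8)&=3-\tfrac{8}{3}=\tfrac13.
\end{align*}
Both cases of the count occur, and both discrepancies lie within the stated bound.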
Define
\begin{align*}
F:[0,P]\to\mathbb C,\qquad t\mapsto e(\beta t^k).
\end{align*}
Then $F\in C^1([0,P];\mathbb C)$ and
\begin{align*}
F'(t)=2\pi i k\beta t^{k-1}e(\beta t^k).
\end{align*}
By the Riemann--Stieltjes form of summation by parts, applied to $F$ and the step function $A_r$ (which has a unit jump at each integer $x\in[1,P]$ with $x\equiv r\pmod q$ and no other jumps),
\begin{align*}
T_r=\int_{[0,P]}F(t)\,dA_r(t).
\end{align*}
Since $A_r(t)=t/q+D_r(t)$, this gives
\begin{align*}
T_r=q^{-1}\int_0^P F(t)\,d\mathcal L^1(t)+\int_{[0,P]}F(t)\,dD_r(t).
\end{align*}
Integrating by parts in the Riemann--Stieltjes integral, which is permitted because $F$ is continuously differentiable and $D_r$ has bounded variation on $[0,P]$, gives
\begin{align*}
\int_{[0,P]}F(t)\,dD_r(t)=F(P)D_r(P)-F(0)D_r(0)-\int_0^P D_r(t)F'(t)\,d\mathcal L^1(t).
\end{align*}
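The formula can be checked by hand in the simplest case, taken here purely as an illustration: $q=r=1$ and $P=1$. Then $A_1(t)=\lfloor t\rfloor$ has a single unit jump at $t=1$, so $D_1(t)=-t$ for $0\le t<1$ and $D_1(0)=D_1(1)=0$. The left-hand side is
\begin{align*}
\int_{[0,1]}F(t)\,dD_1(t)=F(1)-\int_0^1 F(t)\,d\mathcal L^1(t),
\end{align*}
while the right-hand side is
\begin{align*}
0-0+\int_0^1 tF'(t)\,d\mathcal L^1(t)=F(1)-\int_0^1 F(t)\,d\mathcal L^1(t)
\end{align*}
by ordinary integration by parts, so the two sides agree.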
Since $|F(t)|=1$ and $|D_r(t)|\le 2$,
\begin{align*}
\left|\int_{[0,P]}F(t)\,dD_r(t)\right|\le 4+2\int_0^P |F'(t)|\,d\mathcal L^1(t).
\end{align*}
Using the displayed formula for $F'$,
\begin{align*}
\int_0^P |F'(t)|\,d\mathcal L^1(t)=2\pi k|\beta|\int_0^P t^{k-1}\,d\mathcal L^1(t)=2\pi |\beta|P^k.
\end{align*}
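Combining the last two displays with the decomposition of $T_r$, in which the integral $\int_0^P F(t)\,d\mathcal L^1(t)$ is $V_k(\beta;P)$, we obtain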
\begin{align*}
\left|T_r-q^{-1}V_k(\beta;P)\right|\le 4+4\pi P^k|\beta|.
\end{align*}
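This proves the claimed estimate. Since $4+4\pi P^k|\beta|\le 4\pi(1+P^k|\beta|)$, the implied constant can in fact be taken to be the absolute constant $4\pi$, so in particular it depends at most on $k$.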
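Two sanity checks may help, offered only as illustrations. Both assume, as the displays above suggest but this step does not restate, that $T_r=\sum_{1\le x\le P,\ x\equiv r\,(\mathrm{mod}\ q)}e(\beta x^k)$ and $V_k(\beta;P)=\int_0^P e(\beta t^k)\,d\mathcal L^1(t)$. First, for $\beta=0$ we have $F\equiv 1$, so $T_r=A_r(P)$ and $V_k(0;P)=P$, whence
\begin{align*}
T_r-q^{-1}V_k(0;P)=A_r(P)-\frac{P}{q}=D_r(P),
\end{align*}
which is bounded in absolute value by $2$, consistent with the bound $4$ that the estimate gives at $\beta=0$. Second, for the sample values $k=2$, $P=100$, $\beta=10^{-4}$ we have $P^k|\beta|=1$, and the estimate reads
\begin{align*}
\left|T_r-q^{-1}V_2(10^{-4};100)\right|\le 4+4\pi<17
\end{align*}
for every $q$ and every $r\in\{1,\dots,q\}$.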
[/step]