[proofplan]
We split the Weyl sum according to residue classes modulo $q$. On each residue class, the rational phase $e(a x^k/q)$ is constant, and summing these constants over the $q$ classes produces the complete sum. The remaining slowly varying factor $e(\beta x^k)$ is summed over each residue class and compared with $q^{-1}$ times its integral over $[0,P]$ by an elementary summation-by-parts estimate for lattice points in one residue class. Summing the resulting uniform error over the $q$ residue classes gives the stated approximation.
[/proofplan]
[step:Split the Weyl sum into residue classes modulo $q$]
For each $r\in\{1,\dots,q\}$, define the residue-class sum
\begin{align*}
T_r:=\sum_{\substack{1\le x\le P\\ x\equiv r\pmod q}} e(\beta x^k),
\end{align*}
where the sum is over integers $x$. Since $x\equiv r\pmod q$ implies $x^k\equiv r^k\pmod q$, and since $a\in\mathbb Z$, we have
\begin{align*}
e\left(\frac{a x^k}{q}\right)=e\left(\frac{a r^k}{q}\right)
\end{align*}
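for every integer $x$ with $x\equiv r\pmod q$. Since $\alpha=a/q+\beta$, each summand factors as $e(\alpha x^k)=e(a r^k/q)\,e(\beta x^k)$ on this residue class, and grouping the integers $1\le x\le P$ by residue class gives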
\begin{align*}
S_k(\alpha;P)=\sum_{r=1}^{q} e\left(\frac{a r^k}{q}\right)T_r.
\end{align*}
[guided]
The point of introducing residue classes is that the rational phase only depends on $x$ modulo $q$. For each $r\in\{1,\dots,q\}$, define
\begin{align*}
T_r:=\sum_{\substack{1\le x\le P\\ x\equiv r\pmod q}} e(\beta x^k),
\end{align*}
with the summation variable $x$ restricted to integers. If $x\equiv r\pmod q$, then $q$ divides $x-r$, hence $q$ divides $x^k-r^k$ because $X^k-Y^k$ is divisible by $X-Y$ in $\mathbb Z[X,Y]$. Thus
\begin{align*}
\frac{a x^k}{q}-\frac{a r^k}{q}\in\mathbb Z.
\end{align*}
The function $e(t)=\exp(2\pi i t)$ is $1$-periodic, so this integer difference gives
\begin{align*}
e\left(\frac{a x^k}{q}\right)=e\left(\frac{a r^k}{q}\right).
\end{align*}
Using $\alpha=a/q+\beta$, we may write
\begin{align*}
e(\alpha x^k)=e\left(\frac{a x^k}{q}\right)e(\beta x^k)=e\left(\frac{a r^k}{q}\right)e(\beta x^k)
\end{align*}
on the residue class $x\equiv r\pmod q$. Summing first over each residue class and then over $r=1,\dots,q$ gives
\begin{align*}
S_k(\alpha;P)=\sum_{r=1}^{q} e\left(\frac{a r^k}{q}\right)T_r.
\end{align*}
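As an illustration, take the hypothetical values $k=2$, $q=3$, $a=1$ (chosen only for this example, not taken from the statement). The residues $r=1,2,3$ have squares $1,4,9$, and $e(1/3)=e(4/3)$ while $e(9/3)=1$, so the decomposition reads
\begin{align*}
S_2\left(\tfrac13+\beta;P\right)=e\left(\tfrac13\right)\left(T_1+T_2\right)+T_3.
\end{align*}
Only the three sums $T_1,T_2,T_3$ still depend on $\beta$ and $P$; the rational phase has been reduced to fixed coefficients.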
[/guided]
[/step]
[step:Approximate each residue-class sum by one $q$-th of the integral]
We prove the following elementary estimate: for every $r\in\{1,\dots,q\}$,
\begin{align*}
T_r=q^{-1}V_k(\beta;P)+O_k(1+P^k|\beta|),
\end{align*}
with a constant independent of $r$, $q$, $P$, and $\beta$.
For $r\in\{1,\dots,q\}$, define the counting function
\begin{align*}
A_r:[0,P]\to\mathbb R,\qquad t\mapsto \#\{x\in\mathbb Z:1\le x\le t,\ x\equiv r\pmod q\}.
\end{align*}
Also define the discrepancy function
\begin{align*}
D_r:[0,P]\to\mathbb R,\qquad t\mapsto A_r(t)-\frac{t}{q}.
\end{align*}
For every $t\in[0,P]$, the set $\{1,\dots,\lfloor t\rfloor\}$ contains either $\lfloor t/q\rfloor$ or $\lfloor t/q\rfloor+1$ integers in the residue class $r\pmod q$. Since $\lfloor t/q\rfloor\le t/q<\lfloor t/q\rfloor+1$, this gives $-1<D_r(t)\le 1$, and in particular
\begin{align*}
|D_r(t)|\le 2.
\end{align*}
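As a quick check of this bound, take the illustrative values $q=3$, $r=2$, and any $P\ge 7$ (chosen only for this example). The integers counted by $A_2$ are $2,5,8,\dots$, so
\begin{align*}
A_2(1.9)&=0, & D_2(1.9)&=-\tfrac{1.9}{3},\\
A_2(2)&=1, & D_2(2)&=1-\tfrac23=\tfrac13,\\
A_2(7)&=2, & D_2(7)&=2-\tfrac73=-\tfrac13.
\end{align*}
In each case $|D_2(t)|<1$, which is consistent with the stated bound.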
Define
\begin{align*}
F:[0,P]\to\mathbb C,\qquad t\mapsto e(\beta t^k).
\end{align*}
Then $F\in C^1([0,P];\mathbb C)$ and
\begin{align*}
F'(t)=2\pi i k\beta t^{k-1}e(\beta t^k).
\end{align*}
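Since $A_r$ is a step function with $A_r(0)=0$ that jumps by $1$ exactly at the integers $x\in[1,P]$ with $x\equiv r\pmod q$, the sum $T_r$ is the Riemann--Stieltjes integral of $F$ against $A_r$: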
\begin{align*}
T_r=\int_{[0,P]}F(t)\,dA_r(t).
\end{align*}
Since $A_r(t)=t/q+D_r(t)$, this gives
\begin{align*}
T_r=q^{-1}\int_0^P F(t)\,d\mathcal L^1(t)+\int_{[0,P]}F(t)\,dD_r(t).
\end{align*}
Integrating by parts in the Riemann--Stieltjes integral, which is permitted because $F\in C^1$ and $D_r$ has bounded variation, gives
\begin{align*}
\int_{[0,P]}F(t)\,dD_r(t)=F(P)D_r(P)-F(0)D_r(0)-\int_0^P D_r(t)F'(t)\,d\mathcal L^1(t).
\end{align*}
Since $|F(t)|=1$ and $|D_r(t)|\le 2$,
\begin{align*}
\left|\int_{[0,P]}F(t)\,dD_r(t)\right|\le 4+2\int_0^P |F'(t)|\,d\mathcal L^1(t).
\end{align*}
Using the displayed formula for $F'$,
\begin{align*}
\int_0^P |F'(t)|\,d\mathcal L^1(t)=2\pi k|\beta|\int_0^P t^{k-1}\,d\mathcal L^1(t)=2\pi |\beta|P^k.
\end{align*}
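Since $\int_0^P F(t)\,d\mathcal L^1(t)=V_k(\beta;P)$ by the definition of $V_k$, combining the last three displays with the decomposition of $T_r$ gives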
\begin{align*}
\left|T_r-q^{-1}V_k(\beta;P)\right|\le 4+4\pi P^k|\beta|.
\end{align*}
This proves the claimed estimate. Since $4+4\pi P^k|\beta|\le 4\pi(1+P^k|\beta|)$, the implied constant may be taken to be $4\pi$, which in particular depends at most on $k$.
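As a sanity check, consider the degenerate case $\beta=0$, assuming (as the argument above does) that $V_k(\beta;P)=\int_0^P e(\beta t^k)\,d\mathcal L^1(t)$. Then $F\equiv 1$, so $T_r=A_r(P)$ and $V_k(0;P)=P$, and the estimate reduces to
\begin{align*}
\left|T_r-q^{-1}V_k(0;P)\right|=\left|A_r(P)-\frac{P}{q}\right|=|D_r(P)|\le 2\le 4,
\end{align*}
which agrees with the bound $4+4\pi P^k|\beta|$ at $\beta=0$.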
[/step]
[step:Collect the complete sum and sum the residue-class errors]
For each $r\in\{1,\dots,q\}$, write
\begin{align*}
T_r=q^{-1}V_k(\beta;P)+E_r,
\end{align*}
where the previous step gives
\begin{align*}
|E_r|\le C_k(1+P^k|\beta|)
\end{align*}
for a constant $C_k>0$ depending only on $k$. Substituting this into the residue-class decomposition yields
\begin{align*}
S_k(\alpha;P)=q^{-1}V_k(\beta;P)\sum_{r=1}^{q}e\left(\frac{a r^k}{q}\right)+\sum_{r=1}^{q}e\left(\frac{a r^k}{q}\right)E_r.
\end{align*}
By the definition of the complete sum $C_k(q,a)$ (not to be confused with the constant $C_k$), the first term is
\begin{align*}
q^{-1}C_k(q,a)V_k(\beta;P).
\end{align*}
For the error term, $|e(a r^k/q)|=1$ for every $r$, so the triangle inequality gives
\begin{align*}
\left|\sum_{r=1}^{q}e\left(\frac{a r^k}{q}\right)E_r\right|\le \sum_{r=1}^{q}|E_r|\le C_k q(1+P^k|\beta|).
\end{align*}
Therefore
\begin{align*}
S_k(\alpha;P)=q^{-1}C_k(q,a)V_k(\beta;P)+O_k\left(q(1+P^k|\beta|)\right).
\end{align*}
The bound is uniform in the stated range because the constant $C_k$ comes only from the discrepancy bound $|D_r(t)|\le 2$ and from the integral of $|F'|$ for $F(t)=e(\beta t^k)$. The constants in both estimates are independent of $a$, $q$, $r$, $P$, and $\beta$.
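For orientation, consider the simplest special case $q=1$, $a=0$, assuming the statement admits these values. Then $\alpha=\beta$ and the complete sum has the single term $e(0)=1$, so the approximation becomes
\begin{align*}
S_k(\beta;P)=V_k(\beta;P)+O_k\left(1+P^k|\beta|\right),
\end{align*}
that is, a direct comparison of the Weyl sum with the corresponding integral, with no arithmetic factor.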
[/step]