[proofplan]
We repeatedly apply Weyl differencing until the original degree-$k$ phase has been replaced by a family of linear phases. The leading coefficient of each final linear phase is $k!\alpha h_1\cdots h_{k-1}$, so the remaining task is to average the elementary bound for a linear exponential sum over all shift products. The rational approximation $\alpha=a/q+O(q^{-2})$ and a divisor-counting estimate control this average with a loss $N^\varepsilon$. Taking the $2^{k-1}$-st root of the resulting estimate gives the exponent $2^{1-k}$.
[/proofplan]
[step:Record the two elementary estimates used after differencing]
Define the distance-to-nearest-integer map
\begin{align*}
\|\cdot\|_{\mathbb R/\mathbb Z}:\mathbb R\to [0,1/2]
\end{align*}
by
\begin{align*}
\|t\|_{\mathbb R/\mathbb Z}:=\inf_{m\in\mathbb Z}|t-m|.
\end{align*}
We write $A\lesssim_{k,\varepsilon}B$ to mean that there exists a constant $C=C(k,\varepsilon)>0$ such that $A\le C B$; similarly, $A\lesssim_k B$ and $A\lesssim_c B$ mean that the implied constant depends only on the displayed parameters.
We shall use the following elementary linear exponential sum estimate: for every $M\in\mathbb N$, every $L\in\mathbb Z$, and every $\theta\in\mathbb R$,
\begin{align*}
\left|\sum_{1\le n\le M}e(\theta n+L)\right|\le \min\left(M,\frac{1}{2\|\theta\|_{\mathbb R/\mathbb Z}}\right),
\end{align*}
with the convention that the right-hand side is $M$ when $\|\theta\|_{\mathbb R/\mathbb Z}=0$.
Indeed, if $\|\theta\|_{\mathbb R/\mathbb Z}=0$, then the triangle inequality gives the bound $M$. Otherwise the finite geometric series identity gives
\begin{align*}
\sum_{1\le n\le M}e(\theta n+L)=e(L+\theta)\frac{1-e(M\theta)}{1-e(\theta)}.
\end{align*}
Since $|1-e(M\theta)|\le 2$ and $|1-e(\theta)|=2|\sin(\pi\theta)|\ge 4\|\theta\|_{\mathbb R/\mathbb Z}$, the claimed estimate follows.
We also use the following standard reciprocal-spacing estimate, whose proof is a residue-class decomposition together with the divisor bound for products of $k-1$ integers. For every fixed $k\ge 2$ and every $\varepsilon>0$, there is a constant $C_1=C_1(k,\varepsilon)>0$ such that the following holds. If $c\in\mathbb N$ satisfies $1\le c\le k!$, if $\beta\in\mathbb R$, if $a,q\in\mathbb Z$ satisfy $q\ge 1$, $\gcd(a,q)=1$, and
\begin{align*}
\left|\beta-\frac{ca}{q}\right|\le \frac{c}{q^2},
\end{align*}
then
\begin{align*}
\sum_{1\le h_1,\dots,h_{k-1}\le N}\min\left(N,\frac{1}{\|\beta h_1\cdots h_{k-1}\|_{\mathbb R/\mathbb Z}}\right)\le C_1N^{k+\varepsilon}\left(\frac{1}{q}+\frac{1}{N}+\frac{q}{N^k}\right).
\end{align*}
To justify this estimate, reduce the fraction $ca/q$ to $b/r$ with $\gcd(b,r)=1$. Then $r\mid q$, $q/r\le c$, and therefore
\begin{align*}
\frac{1}{r}\le \frac{c}{q},\qquad r\le q.
\end{align*}
For each integer $m\ge 1$, let $d_{k-1}(m;N)$ denote the number of tuples $(h_1,\dots,h_{k-1})\in\{1,\dots,N\}^{k-1}$ with product $m$. The standard divisor estimate gives, for every $\varepsilon>0$,
\begin{align*}
d_{k-1}(m;N)\le C_2m^{\varepsilon/k}
\end{align*}
with $C_2=C_2(k,\varepsilon)>0$. Since $m\le N^{k-1}$, this implies
\begin{align*}
d_{k-1}(m;N)\le C_2N^\varepsilon.
\end{align*}
Hence it is enough to bound
\begin{align*}
\sum_{1\le m\le N^{k-1}}\min\left(N,\frac{1}{\|\beta m\|_{\mathbb R/\mathbb Z}}\right).
\end{align*}
The usual spacing lemma for a number within $O(r^{-2})$ of a reduced fraction of denominator $r$ gives
\begin{align*}
\sum_{1\le m\le X}\min\left(Y,\frac{1}{\|\beta m\|_{\mathbb R/\mathbb Z}}\right)\lesssim_c \frac{XY}{r}+(X+r)\log(2r)
\end{align*}
for $X,Y\ge 1$. Applying this with $X=N^{k-1}$ and $Y=N$, and absorbing $\log(2r)$ into $N^{\varepsilon/2}$ after increasing the constant, gives
\begin{align*}
\sum_{1\le m\le N^{k-1}}\min\left(N,\frac{1}{\|\beta m\|_{\mathbb R/\mathbb Z}}\right)\lesssim_{k,\varepsilon} N^{\varepsilon/2}\left(\frac{N^k}{q}+N^{k-1}+q\right).
\end{align*}
Using the divisor estimate with exponent $\varepsilon/2$ instead of $\varepsilon$, the product of the two losses is $N^\varepsilon$. Multiplying by the divisor-counting factor gives the asserted reciprocal-spacing estimate. The spacing lemma and divisor bound are standard prerequisites not yet resolved to theorem IDs in the wiki.
[/step]
[step:Iterate Weyl differencing down to linear phases]
For $r\in\{0,1,\dots,k-1\}$ and $h=(h_1,\dots,h_r)\in\mathbb Z^r$, define the iterated finite difference
\begin{align*}
\Delta_hP:\mathbb Z\to\mathbb R
\end{align*}
recursively by $\Delta_{\varnothing}P=P$ and
\begin{align*}
\Delta_{(h_1,\dots,h_r)}P(n):=\Delta_{(h_1,\dots,h_{r-1})}P(n+h_r)-\Delta_{(h_1,\dots,h_{r-1})}P(n)
\end{align*}
for $r\ge 1$.
By applying the [[Weyl Differencing Lemma](/theorems/9052)][citetheorem:9052] successively with shift parameter $H:=N$ at each stage, and absorbing the harmless constants depending only on $k$ into the notation $\lesssim_k$, we obtain
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|^{2^{k-1}}\lesssim_k N^{2^{k-1}-k}\sum_{1\le h_1,\dots,h_{k-1}\le N}\left|\sum_{n\in I_h}e(\Delta_hP(n))\right|+N^{2^{k-1}-1}.
\end{align*}
Here $h=(h_1,\dots,h_{k-1})$, and $I_h\subset\mathbb Z$ is an integer interval depending on $h$ whose cardinality satisfies $|I_h|\le N$.
[guided]
The point of Weyl differencing is to replace one difficult polynomial sum by many easier sums with lower-degree phases. We define the finite differences carefully because the coefficient of the final linear phase is the quantity that will interact with the rational approximation to $\alpha$.
For $r\in\{0,1,\dots,k-1\}$ and $h=(h_1,\dots,h_r)\in\mathbb Z^r$, define
\begin{align*}
\Delta_hP:\mathbb Z\to\mathbb R
\end{align*}
by $\Delta_{\varnothing}P=P$, and for $r\ge 1$ by
\begin{align*}
\Delta_{(h_1,\dots,h_r)}P(n):=\Delta_{(h_1,\dots,h_{r-1})}P(n+h_r)-\Delta_{(h_1,\dots,h_{r-1})}P(n).
\end{align*}
Thus each differencing step compares the previous phase at two points separated by a shift.
We now apply the [Weyl Differencing Lemma][citetheorem:9052] repeatedly. At each stage the summation is over an integer interval of cardinality at most $N$, the phase is a real-valued function on the relevant integer set because each $\Delta_hP$ maps $\mathbb Z$ to $\mathbb R$, and we choose a shift parameter $H$ comparable to the current interval length. These are exactly the data required by the lemma: an integer length, an integer shift range, a real-valued phase, and the additive character $e(t)=\exp(2\pi i t)$. At the first application it bounds the square of the original sum by an average of correlation sums with phase $P(n+h_1)-P(n)$. Applying the same lemma to each of those sums produces phases with two finite differences. Continuing $k-1$ times gives phases $\Delta_hP$ with $h=(h_1,\dots,h_{k-1})$. The integer intervals shrink near the endpoints after each shift, so we denote the final interval by $I_h\subset\mathbb Z$; its length is at most $N$ because it is always contained in an interval obtained from the original summation range.
The repeated Cauchy-Schwarz inequalities in the differencing lemma produce the power $2^{k-1}$ on the left. The normalization factors give
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|^{2^{k-1}}\lesssim_k N^{2^{k-1}-k}\sum_{1\le h_1,\dots,h_{k-1}\le N}\left|\sum_{n\in I_h}e(\Delta_hP(n))\right|+N^{2^{k-1}-1}.
\end{align*}
The final term $N^{2^{k-1}-1}$ accounts for endpoint and zero-shift contributions. It is harmless because it is dominated by the final bound after taking the $2^{k-1}$-st root.
[/guided]
[/step]
[step:Identify the leading coefficient of each final linear phase]
Fix $h=(h_1,\dots,h_{k-1})\in\{1,\dots,N\}^{k-1}$. Since $P$ has degree $k$ and leading coefficient $\alpha$, the [[Leading Coefficient of an Iterated Finite Difference](/theorems/9053)][citetheorem:9053] gives
\begin{align*}
\Delta_hP(n)=k!\alpha h_1\cdots h_{k-1}n+C_h
\end{align*}
for all $n\in\mathbb Z$, where $C_h\in\mathbb R$ is a constant depending on $h$ and on the coefficients of $P$.
Since $I_h$ is an integer interval of length at most $N$, translating its summation index reduces the sum to one over $\{1,\dots,M\}$ for some $M\le N$, and the translation only changes the constant term in the phase. Therefore the linear exponential sum estimate with
\begin{align*}
\theta_h:=k!\alpha h_1\cdots h_{k-1}
\end{align*}
gives
\begin{align*}
\left|\sum_{n\in I_h}e(\Delta_hP(n))\right|\le \min\left(N,\frac{1}{2\|\theta_h\|_{\mathbb R/\mathbb Z}}\right).
\end{align*}
Increasing constants depending only on $k$, this implies
\begin{align*}
\left|\sum_{n\in I_h}e(\Delta_hP(n))\right|\lesssim_k \min\left(N,\frac{1}{\|k!\alpha h_1\cdots h_{k-1}\|_{\mathbb R/\mathbb Z}}\right).
\end{align*}
[/step]
[step:Average the final linear bounds using the rational approximation]
Set
\begin{align*}
A:=\frac{1}{q}+\frac{1}{N}+\frac{q}{N^k}.
\end{align*}
The hypothesis gives
\begin{align*}
\left|k!\alpha-\frac{k!a}{q}\right|\le \frac{k!}{q^2}.
\end{align*}
Applying the reciprocal-spacing estimate from the first step with $\beta=k!\alpha$ and $c=k!$, we get
\begin{align*}
\sum_{1\le h_1,\dots,h_{k-1}\le N}\min\left(N,\frac{1}{\|k!\alpha h_1\cdots h_{k-1}\|_{\mathbb R/\mathbb Z}}\right)\lesssim_{k,\varepsilon} N^{k+\varepsilon}A.
\end{align*}
Combining this with the differencing estimate yields
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|^{2^{k-1}}\lesssim_{k,\varepsilon} N^{2^{k-1}-k}N^{k+\varepsilon}A+N^{2^{k-1}-1}.
\end{align*}
Since $A\ge N^{-1}$, the endpoint term satisfies
\begin{align*}
N^{2^{k-1}-1}\le N^{2^{k-1}}A.
\end{align*}
Thus, after increasing the implicit constant,
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|^{2^{k-1}}\lesssim_{k,\varepsilon} N^{2^{k-1}+\varepsilon}A.
\end{align*}
[/step]
[step:Take the final root and absorb the harmless epsilon loss]
Taking the $2^{k-1}$-st root gives
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|\lesssim_{k,\varepsilon} N^{1+\varepsilon/2^{k-1}}A^{2^{1-k}}.
\end{align*}
Since the statement allows an arbitrary positive $\varepsilon$, replace $\varepsilon/2^{k-1}$ by $\varepsilon$ by applying the preceding estimate with $2^{k-1}\varepsilon$ in place of $\varepsilon$. Recalling the definition of $A$, we obtain
\begin{align*}
\left|\sum_{1\le n\le N}e(P(n))\right|\lesssim_{k,\varepsilon} N^{1+\varepsilon}\left(\frac{1}{q}+\frac{1}{N}+\frac{q}{N^k}\right)^{2^{1-k}}.
\end{align*}
This is the asserted Weyl inequality.
[/step]