[step:Bound the resulting linear sums by distance to the nearest integer]
Let $D:=k!$. For $t\in\mathbb R$, define the distance-to-integer map
\begin{align*}
\|\cdot\|_{\mathbb R/\mathbb Z}:\mathbb R\to\left[0,\tfrac12\right],\qquad t\mapsto \operatorname{dist}(t,\mathbb Z)=\inf_{m\in\mathbb Z}|t-m|
\end{align*}.
If $H(h_1,\dots,h_{k-1})=0$, the corresponding inner sum has absolute value at most $N\le X$. There are $\lesssim_k X^{k-2}$ such tuples, so the total contribution of these tuples is $\lesssim_k X^{k-1}$.
Assume now that $H(h_1,\dots,h_{k-1})\ne 0$. There exists $c(h_1,\dots,h_{k-1})\in\mathbb R$ such that, for all $n\in\mathbb Z$,
\begin{align*}
(\delta_{h_{k-1}}\cdots\delta_{h_1}P)(n)=D\alpha_kH(h_1,\dots,h_{k-1})n+c(h_1,\dots,h_{k-1}).
\end{align*}
For every $\theta\in\mathbb R$, every $\gamma\in\mathbb R$, and every integer interval $J\subset\mathbb Z$ with $\#J\le X$, the finite geometric-series identity gives
\begin{align*}
\left|\sum_{n\in J}e(\theta n+\gamma)\right|\le \min\left\{X,\frac{1}{2\|\theta\|_{\mathbb R/\mathbb Z}}\right\},
\end{align*}
with the convention that the second term is $+\infty$ when $\theta\in\mathbb Z$.
Using this with $\theta=D\alpha_kH(h_1,\dots,h_{k-1})$, we obtain
\begin{align*}
|S|^{2^{k-1}}\lesssim_k X^{2^{k-1}-k}\left(X^{k-1}+\sum_{1\le |r|\le X^{k-1}}d_{k-1}(|r|)\min\left\{X,\|D r\alpha_k\|_{\mathbb R/\mathbb Z}^{-1}\right\}\right),
\end{align*}
where the $(k-1)$-fold divisor function is the map
\begin{align*}
d_{k-1}:\mathbb N\to\mathbb N,\qquad m\mapsto \#\{(u_1,\dots,u_{k-1})\in\mathbb N^{k-1}:u_1\cdots u_{k-1}=m\}
\end{align*}.
We use the elementary divisor bound: for each $\eta>0$ there is a constant $C_{k,\eta}>0$ such that
\begin{align*}
d_{k-1}(m)\le C_{k,\eta}m^\eta
\end{align*}
for every $m\in\mathbb N$. Indeed, this follows by induction on $k-1$ from the estimate $\#\{d\in\mathbb N:d\mid m\}\le 2m^{\eta/2}$ for all sufficiently large $m$, with the finitely many smaller values absorbed into $C_{k,\eta}$. Since $1\le r\le X^{k-1}$ implies $r^\eta\le X^{(k-1)\eta}$, choosing $\eta>0$ so that $(k-1)\eta\le\varepsilon$ absorbs the divisor factor into $X^\varepsilon$. Hence
\begin{align*}
|S|^{2^{k-1}}\lesssim_{k,\varepsilon} X^{2^{k-1}-k+\varepsilon}\left(X^{k-1}+\sum_{1\le r\le X^{k-1}}\min\left\{X,\|D r\alpha_k\|_{\mathbb R/\mathbb Z}^{-1}\right\}\right).
\end{align*}
[/step]