[guided]The action equation contains the oscillatory term $-\varepsilon\partial_\theta R$. Integrating this term directly over $0\le t\le C/\varepsilon$ does not give a small bound: the prefactor $\varepsilon$ is cancelled by the length of the time interval, which is of order $\varepsilon^{-1}$. We must therefore exploit the rapid rotation in $\theta$.
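To quantify the obstruction, suppose $M:=\sup|\partial_\theta R|$ is finite on the region under consideration (the symbol $M$ is introduced here only for illustration). With the integrand evaluated along the solution, the crude estimate reads
\begin{align*}
\left|\int_0^{C/\varepsilon}\varepsilon\,\partial_\theta R\,d\mathcal{L}^1(t)\right|\le \varepsilon\cdot\frac{C}{\varepsilon}\cdot M=CM,
\end{align*}
which is of order one and does not tend to zero as $\varepsilon\to0$.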
The fast rotation is controlled by
\begin{align*}
\omega(I,s):=\partial_I K(I,\mu(s)).
\end{align*}
The non-separatrix hypothesis is encoded in the bound $|\omega(I,s)|\ge\omega_0>0$ on $J_1\times[0,C]$. This is where the proof would fail near a separatrix: there the period can become unbounded, equivalently the frequency can approach zero, and the division by $\omega$ below would no longer be uniformly controlled.
We separate $R$ into its average and oscillatory parts. Define
\begin{align*}
\overline R(I,s):=\frac{1}{2\pi}\int_0^{2\pi}R(I,\theta,s)\,d\mathcal{L}^1(\theta).
\end{align*}
The function $R(I,\theta,s)-\overline R(I,s)$ has zero average over $\theta$. Therefore it has a periodic primitive in $\theta$. Because $\omega(I,s)$ is bounded away from zero, we can define a smooth periodic function
\begin{align*}
S:J_1\times\mathbb{T}\times[0,C]\to\mathbb{R}
\end{align*}
with zero angle average by requiring
\begin{align*}
\omega(I,s)\,\partial_\theta S(I,\theta,s)=R(I,\theta,s)-\overline R(I,s).
\end{align*}
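As an illustrative sketch of how this requirement can be solved, assume $R$ is smooth enough in $\theta$ to be expanded in a Fourier series with coefficients $\widehat R_k(I,s)$ (this notation is an assumption made here, not taken from the text above):
\begin{align*}
\widehat R_k(I,s)&:=\frac{1}{2\pi}\int_0^{2\pi}R(I,\theta,s)\,e^{-ik\theta}\,d\mathcal{L}^1(\theta),\qquad \widehat R_0=\overline R,\\
S(I,\theta,s)&=\sum_{k\ne0}\frac{\widehat R_k(I,s)}{ik\,\omega(I,s)}\,e^{ik\theta}.
\end{align*}
Differentiating term by term gives $\omega\,\partial_\theta S=\sum_{k\ne0}\widehat R_k e^{ik\theta}=R-\overline R$. Each coefficient of $S$ is bounded by $|\widehat R_k|/(|k|\,\omega_0)$, which is exactly where the lower bound on $|\omega|$ enters. The $k=0$ coefficient of $S$ is not determined by the equation and is set to zero here.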
The zero-average normalization removes the additive constant ambiguity. Smoothness of $S$ follows from the smoothness of $R$, $K$, and $\mu$, together with the uniform lower bound on $|\omega|$.[/guided]