[proofplan]
The [representation formula](/theorems/39) reduces the approximation estimate to bounding an average of $r$-th finite differences. The only auxiliary ingredient is the elementary scaling inequality for moduli of smoothness: the supremum norm of a difference with step $t$ is at most $(1+n|t|)^r$ times the modulus at scale $1/n$. We prove this scaling inequality directly from a factorization of translation operators, integrate it against the total variation measure $|\mu_n|$, apply the moment estimate, and then take the supremum over $x$.
[/proofplan]
[step:Prove the scaling estimate for finite differences]
For each $h\in\mathbb{T}$, define the translation operator $\tau_h:C(\mathbb{T})\to C(\mathbb{T})$ by
\begin{align*}
(\tau_h f)(x):=f(x+h)
\end{align*}
for every $f\in C(\mathbb{T})$ and $x\in\mathbb{T}$. Then $\Delta_h^r=(\tau_h-I)^r$, where $I:C(\mathbb{T})\to C(\mathbb{T})$ is the identity operator.
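Fix $n\in\mathbb{N}$, $f\in C(\mathbb{T})$, and $t\in\mathbb{T}$, and let $\theta\in[-\pi,\pi)$ be the representative of $t$, so that $|\theta|=|t|$. If $t=0$, then $\Delta_t^r f=0$, so the desired bound $\|\Delta_t^r f\|_\infty\leq (1+n|t|)^r\omega_r(f,1/n)$ is immediate. Suppose $t\neq 0$. Define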
\begin{align*}
m(t,n):=\lceil n|t|\rceil \in \mathbb{N}
\end{align*}
and let $h\in\mathbb{T}$ be represented by $\theta/m(t,n)$. Then $m(t,n)h=t$ in $\mathbb{T}$ and
\begin{align*}
|h|=\frac{|t|}{m(t,n)}\leq \frac{1}{n}.
\end{align*}
Define the [bounded linear operator](/page/Bounded%20Linear%20Operator) $S_{t,n}:C(\mathbb{T})\to C(\mathbb{T})$ by
\begin{align*}
S_{t,n}:=\sum_{k=0}^{m(t,n)-1}\tau_{kh}.
\end{align*}
Since
\begin{align*}
\tau_t-I=\tau_{m(t,n)h}-I=(\tau_h-I)S_{t,n},
\end{align*}
and since $S_{t,n}$ commutes with $\tau_h-I$ because all translations commute, we obtain
\begin{align*}
\Delta_t^r f=(\tau_t-I)^r f=S_{t,n}^r\Delta_h^r f.
\end{align*}
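The factorization used here can be checked in one line, because the product telescopes (a routine verification, written out only for illustration, using $\tau_h\tau_{kh}=\tau_{(k+1)h}$):
\begin{align*}
(\tau_h-I)S_{t,n}=\sum_{k=0}^{m(t,n)-1}\tau_{(k+1)h}-\sum_{k=0}^{m(t,n)-1}\tau_{kh}=\tau_{m(t,n)h}-I.
\end{align*}
For instance, in the illustrative case $m(t,n)=3$ this reads $(\tau_h-I)(I+\tau_h+\tau_{2h})=\tau_{3h}-I$.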
Each translation $\tau_{kh}$ is an isometry on $C(\mathbb{T})$ with the supremum norm, so by the triangle inequality $\|S_{t,n}g\|_\infty\leq m(t,n)\|g\|_\infty$ for every $g\in C(\mathbb{T})$. Applying this bound $r$ times in succession, starting with $g=S_{t,n}^{r-1}\Delta_h^r f$, gives
\begin{align*}
\|\Delta_t^r f\|_\infty\leq m(t,n)^r\|\Delta_h^r f\|_\infty.
\end{align*}
Because $|h|\leq 1/n$, the definition of $\omega_r$ gives
\begin{align*}
\|\Delta_h^r f\|_\infty\leq \omega_r\left(f,\frac{1}{n}\right).
\end{align*}
Finally, since $m(t,n)=\lceil n|t|\rceil\leq 1+n|t|$, we have
\begin{align*}
\|\Delta_t^r f\|_\infty\leq (1+n|t|)^r\omega_r\left(f,\frac{1}{n}\right).
\end{align*}
[guided]
We need to compare a difference at the possibly large step $t$ with differences at the small scale $1/n$. The mechanism is purely algebraic: express a large translation as an integer iterate of a smaller translation.
For $h\in\mathbb{T}$, define the translation operator $\tau_h:C(\mathbb{T})\to C(\mathbb{T})$ by
\begin{align*}
(\tau_h f)(x):=f(x+h)
\end{align*}
for every $f\in C(\mathbb{T})$ and $x\in\mathbb{T}$. With this notation the $r$-th difference is
\begin{align*}
\Delta_h^r=(\tau_h-I)^r,
\end{align*}
where $I:C(\mathbb{T})\to C(\mathbb{T})$ is the identity operator.
Fix $t\in\mathbb{T}$, and choose the representative $\theta\in[-\pi,\pi)$ satisfying $|\theta|=|t|$. If $t=0$, then all translated values in the finite difference are the same, so $\Delta_0^r f=0$. Thus assume $t\neq 0$. Define
\begin{align*}
m(t,n):=\lceil n|t|\rceil.
\end{align*}
This is a positive integer. Let $h\in\mathbb{T}$ be the class represented by $\theta/m(t,n)$. Then adding $h$ to itself $m(t,n)$ times gives $t$ in $\mathbb{T}$, and the size of $h$ is small:
\begin{align*}
|h|=\frac{|t|}{m(t,n)}\leq \frac{1}{n}.
\end{align*}
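As a purely illustrative numerical instance (the values $n=4$ and $\theta=0.6$ are chosen here for concreteness and do not come from the theorem), the construction gives
\begin{align*}
n|t|=2.4,\qquad m(t,n)=\lceil 2.4\rceil=3,\qquad |h|=\frac{0.6}{3}=0.2\leq 0.25=\frac{1}{n}.
\end{align*}
In this instance $m(t,n)=3\leq 3.4=1+n|t|$, so rounding up costs at most one additional step.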
The algebraic factorization
\begin{align*}
a^m-1=(a-1)(1+a+\cdots+a^{m-1})
\end{align*}
applied to the operator $a=\tau_h$, for which $a^k=\tau_{kh}$, gives
\begin{align*}
\tau_t-I=\tau_{m(t,n)h}-I=(\tau_h-I)\sum_{k=0}^{m(t,n)-1}\tau_{kh}.
\end{align*}
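Because all translation operators commute, the two factors on the right commute with each other, so raising this identity to the $r$-th power allows us to collect the powers of each factor separately. Therefore
\begin{align*}
\Delta_t^r f=(\tau_t-I)^r f=\left(\sum_{k=0}^{m(t,n)-1}\tau_{kh}\right)^r(\tau_h-I)^r f=\left(\sum_{k=0}^{m(t,n)-1}\tau_{kh}\right)^r\Delta_h^r f.
\end{align*}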
Each $\tau_{kh}$ preserves the supremum norm, since translation is a bijection of $\mathbb{T}$ onto itself. Hence the operator norm of $\sum_{k=0}^{m(t,n)-1}\tau_{kh}$ on $C(\mathbb{T})$ is at most $m(t,n)$, and the operator norm of its $r$-th power is at most $m(t,n)^r$. Consequently,
\begin{align*}
\|\Delta_t^r f\|_\infty\leq m(t,n)^r\|\Delta_h^r f\|_\infty.
\end{align*}
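As an illustrative sanity check, consider the hypothetical case $r=2$ and $m(t,n)=2$, so that $t=2h$. Since $(I+\tau_h)^2=I+2\tau_h+\tau_{2h}$, the factorization reads
\begin{align*}
\Delta_{2h}^2 f(x)=f(x+4h)-2f(x+2h)+f(x)=\Delta_h^2 f(x)+2\Delta_h^2 f(x+h)+\Delta_h^2 f(x+2h),
\end{align*}
which can be confirmed by expanding $\Delta_h^2 f(y)=f(y+2h)-2f(y+h)+f(y)$ at $y=x$, $y=x+h$, and $y=x+2h$. Each of the three terms on the right is bounded by the corresponding multiple of $\|\Delta_h^2 f\|_\infty$, and the weights sum to $4=2^2$, in agreement with the general bound.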
Since $|h|\leq 1/n$, the defining supremum in the modulus of smoothness includes this particular $h$, so
\begin{align*}
\|\Delta_h^r f\|_\infty\leq \omega_r\left(f,\frac{1}{n}\right).
\end{align*}
Finally $\lceil n|t|\rceil\leq 1+n|t|$, and therefore
\begin{align*}
\|\Delta_t^r f\|_\infty\leq (1+n|t|)^r\omega_r\left(f,\frac{1}{n}\right).
\end{align*}
This is the scaling estimate needed to convert the representation formula into a uniform approximation bound.
[/guided]
[/step]
[step:Integrate the scaling estimate against the representation measure]
Fix $n\in\mathbb{N}$ and $f\in C(\mathbb{T})$. By the assumed representation formula, for every $x\in\mathbb{T}$,
\begin{align*}
f(x)-(f*K_n)(x)=\int_{\mathbb{T}}\Delta_t^r f(x)\,d\mu_n(t).
\end{align*}
Taking absolute values and using the inequality $\left|\int_{\mathbb{T}}g\,d\mu_n\right|\leq\int_{\mathbb{T}}|g|\,d|\mu_n|$ for the total variation measure $|\mu_n|$, we get
\begin{align*}
|f(x)-(f*K_n)(x)|\leq \int_{\mathbb{T}}|\Delta_t^r f(x)|\,d|\mu_n|(t).
\end{align*}
For each $t\in\mathbb{T}$, the pointwise bound $|\Delta_t^r f(x)|\leq \|\Delta_t^r f\|_\infty$ and the scaling estimate from the previous step give
\begin{align*}
|\Delta_t^r f(x)|\leq (1+n|t|)^r\omega_r\left(f,\frac{1}{n}\right).
\end{align*}
Therefore
\begin{align*}
|f(x)-(f*K_n)(x)|\leq \omega_r\left(f,\frac{1}{n}\right)\int_{\mathbb{T}}(1+n|t|)^r\,d|\mu_n|(t).
\end{align*}
The assumed moment estimate for $\mu_n$ bounds the integral on the right by $A_r$, so
\begin{align*}
|f(x)-(f*K_n)(x)|\leq A_r\,\omega_r\left(f,\frac{1}{n}\right).
\end{align*}
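One way a moment estimate of this type might be verified in practice is through individual moments. The following is only a sketch, under the assumption (not part of the statement above) that there are constants $c_0,\dots,c_r$, introduced here for illustration, with $\int_{\mathbb{T}}|t|^j\,d|\mu_n|(t)\leq c_j n^{-j}$ for $0\leq j\leq r$. The binomial theorem then gives
\begin{align*}
\int_{\mathbb{T}}(1+n|t|)^r\,d|\mu_n|(t)=\sum_{j=0}^{r}\binom{r}{j}n^j\int_{\mathbb{T}}|t|^j\,d|\mu_n|(t)\leq \sum_{j=0}^{r}\binom{r}{j}c_j,
\end{align*}
so under that assumption the integral is bounded by a quantity that does not depend on $n$.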
[/step]
[step:Take the supremum and choose the constant]
The preceding estimate holds for every $x\in\mathbb{T}$. Taking the supremum over $x$ gives
\begin{align*}
\|f-f*K_n\|_\infty\leq A_r\,\omega_r\left(f,\frac{1}{n}\right).
\end{align*}
Thus the conclusion holds with
\begin{align*}
C_r:=A_r.
\end{align*}
This constant is exactly the constant $A_r$ from the moment estimate for the fixed order $r$, so the proof is complete.
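To illustrate how the bound is typically read, suppose (as an additional assumption, not part of the theorem) that $\omega_r(f,\delta)\leq M\delta^{\alpha}$ for some constants $M,\alpha>0$ and all $\delta>0$. Substituting $\delta=1/n$ into the estimate above gives
\begin{align*}
\|f-f*K_n\|_\infty\leq C_r\,M\,n^{-\alpha}.
\end{align*}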
[/step]