[proofplan]
We first reduce the problem to the reference interval $[0,1]$ by an affine change of variables and check that both the best approximation error and the $r$-th modulus of smoothness are invariant under this normalization. Let $\mathcal{P}_{r-1}$ denote the [vector space](/page/Vector%20Space) of real polynomials of degree at most $r-1$, restricted to the interval under discussion. On $[0,1]$, we prove the needed estimate by contradiction: the interpolation projector normalizes away the polynomial kernel, while compactness for anchored functions with vanishing $r$-th differences forces a limiting polynomial with too many zeros. Scaling then returns the estimate on the original interval.
[/proofplan]
[step:Reduce the estimate to the reference interval $[0,1]$]
If $I=\{a\}$ is a singleton, then the constant polynomial $p(x)=f(a)$ satisfies $f-p=0$ on $I$, so $E_{r-1}(f;I)=0$ and the desired estimate follows. We may therefore assume that $I = [a,b] \subset \mathbb{R}$ with $a < b$, and define the affine map
\begin{align*}
T: [0,1] &\to I
\end{align*}
by $T(y) = a + (b-a)y$. Define
\begin{align*}
g: [0,1] &\to \mathbb{R}
\end{align*}
by $g(y) = f(T(y))$. For a compact interval $J \subset \mathbb{R}$ and a [continuous function](/page/Continuous%20Function) $v \in C(J)$, define the best uniform approximation error by
\begin{align*}
E_{r-1}(v;J) := \inf_{p \in \mathcal{P}_{r-1}} \|v-p\|_{C(J)}.
\end{align*}
For any function $v$ on an interval, any $h \geq 0$, and any point $x$ for which $x,x+h,\dots,x+rh$ lie in the interval, define the $r$-th forward finite difference by
\begin{align*}
\Delta_h^r v(x) := \sum_{k=0}^r (-1)^{r-k}\binom{r}{k}v(x+kh).
\end{align*}
For $t \geq 0$, define the $r$-th modulus of smoothness on $J$ by
\begin{align*}
\omega_r(v,t;J) := \sup\{|\Delta_h^r v(x)| : 0 \leq h \leq t,\ x,x+h,\dots,x+rh \in J\}.
\end{align*}
For every polynomial $p \in \mathcal{P}_{r-1}$ on $I$, the function $q := p \circ T$ belongs to $\mathcal{P}_{r-1}$ on $[0,1]$. Conversely, every $q \in \mathcal{P}_{r-1}$ on $[0,1]$ has the form $q = p \circ T$ for a unique $p \in \mathcal{P}_{r-1}$ on $I$, namely the polynomial map $p:I\to\mathbb{R}$ defined by
\begin{align*}
p(x) := q\left(\frac{x-a}{b-a}\right).
\end{align*}
Therefore
\begin{align*}
E_{r-1}(f;I) = E_{r-1}(g;[0,1]).
\end{align*}
Next let $0 \leq \eta \leq 1$ and let $y,y+\eta,\dots,y+r\eta \in [0,1]$. Set $x := T(y)$ and $h := (b-a)\eta$. Then $x,x+h,\dots,x+rh \in I$, and
\begin{align*}
\Delta_\eta^r g(y) = \Delta_h^r f(x).
\end{align*}
Conversely every admissible $r$-th finite difference of $f$ on $I$ arises in this way. Hence
\begin{align*}
\omega_r(g,1;[0,1]) = \omega_r(f,|I|;I).
\end{align*}
It is therefore enough to prove that there is a constant $A_r > 0$ such that
\begin{align*}
E_{r-1}(g;[0,1]) \leq A_r \omega_r(g,1;[0,1])
\end{align*}
for every $g \in C([0,1])$.
[guided]
The affine normalization removes the interval length from the problem. We define the map
\begin{align*}
T: [0,1] &\to I
\end{align*}
by $T(y) = a+(b-a)y$, and we pull $f$ back to
\begin{align*}
g: [0,1] &\to \mathbb{R}
\end{align*}
by $g(y)=f(T(y))$. We also fix the approximation and finite-difference notation. For a compact interval $J \subset \mathbb{R}$ and $v \in C(J)$, the best approximation error is
\begin{align*}
E_{r-1}(v;J) := \inf_{p \in \mathcal{P}_{r-1}} \|v-p\|_{C(J)}.
\end{align*}
For any function $v$ on an interval, any $h \geq 0$, and any admissible point $x$, set
\begin{align*}
\Delta_h^r v(x) := \sum_{k=0}^r (-1)^{r-k}\binom{r}{k}v(x+kh).
\end{align*}
For $t \geq 0$, the modulus of smoothness is
\begin{align*}
\omega_r(v,t;J) := \sup\{|\Delta_h^r v(x)| : 0 \leq h \leq t,\ x,x+h,\dots,x+rh \in J\}.
\end{align*}
This is the supremum of the absolute values of all admissible $r$-th differences with step at most $t$.
The first quantity to check is the best polynomial approximation error. Composition with $T$ sends every polynomial of degree at most $r-1$ on $I$ to a polynomial of degree at most $r-1$ on $[0,1]$. Composition with $T^{-1}$ gives the inverse correspondence. Since $T$ is a bijection from $[0,1]$ onto $I$, the uniform errors agree:
\begin{align*}
\sup_{x \in I}|f(x)-p(x)| = \sup_{y \in [0,1]} |g(y)-p(T(y))|.
\end{align*}
Taking the infimum over all $p \in \mathcal{P}_{r-1}$ gives
\begin{align*}
E_{r-1}(f;I)=E_{r-1}(g;[0,1]).
\end{align*}
Now we check the finite differences. If $y,y+\eta,\dots,y+r\eta \in [0,1]$ and $h=(b-a)\eta$, then
\begin{align*}
T(y+k\eta)=T(y)+kh
\end{align*}
for each $k \in \{0,\dots,r\}$. Therefore
\begin{align*}
\Delta_\eta^r g(y)
=
\sum_{k=0}^r (-1)^{r-k}\binom{r}{k}g(y+k\eta)
=
\sum_{k=0}^r (-1)^{r-k}\binom{r}{k}f(T(y)+kh)
=
\Delta_h^r f(T(y)).
\end{align*}
The admissibility condition is also preserved: $y,y+\eta,\dots,y+r\eta \in [0,1]$ exactly corresponds to $T(y),T(y)+h,\dots,T(y)+rh \in I$. Hence
\begin{align*}
\omega_r(g,1;[0,1])=\omega_r(f,|I|;I).
\end{align*}
Thus the whole theorem reduces to proving the reference interval estimate on $[0,1]$.
[/guided]
[/step]
[step:Identify the kernel of the finite-difference seminorm using Fréchet's equation]
Let $u \in C([0,1])$ satisfy
\begin{align*}
\omega_r(u,1;[0,1]) = 0.
\end{align*}
Then $\Delta_h^r u(x)=0$ for every $h \geq 0$ and every $x \in [0,1]$ such that $x,x+h,\dots,x+rh \in [0,1]$.
We use the following precise local form of Fréchet's finite-difference theorem. If $J \subset \mathbb{R}$ is an interval and $w:J \to \mathbb{R}$ is continuous with $\Delta_h^r w(x)=0$ whenever $x,x+h,\dots,x+rh \in J$, then $w$ agrees on $J$ with a polynomial of degree at most $r-1$. Its hypotheses are exactly satisfied with $J=[0,1]$ and $w=u$, because $u$ is continuous and the displayed vanishing holds for every admissible equal-step $r$-th difference. Hence $u \in \mathcal{P}_{r-1}$.
Conversely, if $u \in \mathcal{P}_{r-1}$, then each application of the forward difference operator lowers polynomial degree by one. Therefore $\Delta_h^r u(x)=0$ for every admissible pair $(x,h)$, and so
\begin{align*}
\omega_r(u,1;[0,1])=0.
\end{align*}
Thus
\begin{align*}
\omega_r(u,1;[0,1])=0
\end{align*}
if and only if $u \in \mathcal{P}_{r-1}$.
[guided]
The point of this step is to identify exactly which functions are invisible to the $r$-th finite-difference seminorm. Suppose first that $u \in C([0,1])$ and
\begin{align*}
\omega_r(u,1;[0,1]) = 0.
\end{align*}
By the definition of the modulus, this means that every admissible equal-step finite difference vanishes:
\begin{align*}
\Delta_h^r u(x)=0
\end{align*}
whenever $h \geq 0$ and $x,x+h,\dots,x+rh \in [0,1]$.
We now apply the local form of Fréchet's finite-difference theorem stated in the exact proof. The result requires an interval $J \subset \mathbb{R}$, a continuous function $w:J \to \mathbb{R}$, and the identity $\Delta_h^r w(x)=0$ for every admissible equal-step $r$-th difference in $J$. These hypotheses hold with $J=[0,1]$ and $w=u$: continuity is assumed, and the vanishing of all admissible differences is exactly the conclusion of $\omega_r(u,1;[0,1])=0$. Therefore $u$ agrees on $[0,1]$ with a polynomial of degree at most $r-1$, so $u \in \mathcal{P}_{r-1}$.
The converse is algebraic. If $u \in \mathcal{P}_{r-1}$, then applying one forward difference lowers the degree by one, and after $r$ applications the result is the zero polynomial. Hence every admissible $r$-th difference of $u$ is zero, so
\begin{align*}
\omega_r(u,1;[0,1])=0.
\end{align*}
This proves that the kernel of the seminorm $u \mapsto \omega_r(u,1;[0,1])$ is precisely $\mathcal{P}_{r-1}$.
[/guided]
[/step]
[step:Construct a bounded projector from sampled values]
Choose distinct points $t_0,\dots,t_{r-1} \in [0,1]$; for instance, when $r\geq 2$ take
\begin{align*}
t_j:=\frac{j}{r-1}
\end{align*}
for $j\in\{0,\dots,r-1\}$, and when $r=1$ take $t_0=0$. Let $L_j \in \mathcal{P}_{r-1}$ denote the Lagrange polynomial with $L_j(t_i)=\delta_{ij}$. Define the interpolation operator
\begin{align*}
\Pi_r:C([0,1]) \to \mathcal{P}_{r-1}, \qquad \Pi_r u := \sum_{j=0}^{r-1} u(t_j)L_j.
\end{align*}
Since each $L_j$ is continuous on $[0,1]$, the constant
\begin{align*}
B_r := 1+\sum_{j=0}^{r-1}\|L_j\|_{C([0,1])}
\end{align*}
is finite and depends only on $r$. For every $u\in C([0,1])$,
\begin{align*}
\|u-\Pi_r u\|_{C([0,1])}\leq B_r\|u\|_{C([0,1])}.
\end{align*}
Moreover $\Pi_r p=p$ for every $p\in\mathcal{P}_{r-1}$.
[/step]
[step:Prove the reference interval estimate by compactness and anchoring]
We prove that there is a constant $A_r>0$ such that every $u\in C([0,1])$ satisfies
\begin{align*}
E_{r-1}(u;[0,1])\leq A_r\omega_r(u,1;[0,1]).
\end{align*}
Assume the contrary. Then for each $n\in\mathbb{N}$ there is $u_n\in C([0,1])$ such that
\begin{align*}
E_{r-1}(u_n;[0,1])>n\omega_r(u_n,1;[0,1]).
\end{align*}
After replacing $u_n$ by $u_n/E_{r-1}(u_n;[0,1])$, we may assume
\begin{align*}
E_{r-1}(u_n;[0,1])=1
\end{align*}
and
\begin{align*}
\omega_r(u_n,1;[0,1])<\frac{1}{n}.
\end{align*}
Define $v_n:=u_n-\Pi_r u_n$. Since $\Pi_r p=p$ for every $p\in\mathcal{P}_{r-1}$, subtracting the polynomial $\Pi_r u_n$ does not change the best approximation error or any $r$-th finite difference. Hence
\begin{align*}
E_{r-1}(v_n;[0,1])=1
\end{align*}
and
\begin{align*}
\omega_r(v_n,1;[0,1])<\frac{1}{n}.
\end{align*}
Also $v_n(t_j)=0$ for each $j\in\{0,\dots,r-1\}$. Choosing $p_n\in\mathcal{P}_{r-1}$ with $\|u_n-p_n\|_{C([0,1])}\leq 2$, and using $\Pi_r p_n=p_n$, gives
\begin{align*}
\|v_n\|_{C([0,1])}=\|(I-\Pi_r)(u_n-p_n)\|_{C([0,1])}\leq B_r\|u_n-p_n\|_{C([0,1])}\leq 2B_r.
\end{align*}
[claim:Anchored functions with vanishing $r$-th modulus converge uniformly to zero]
Let $(w_n)_{n=1}^{\infty}\subset C([0,1])$ be uniformly bounded, suppose $w_n(t_j)=0$ for each $j\in\{0,\dots,r-1\}$, and suppose
\begin{align*}
\omega_r(w_n,1;[0,1])\to 0.
\end{align*}
Then $w_n\to 0$ uniformly on $[0,1]$.
[/claim]
[proof]
Let $M>0$ be such that $\|w_n\|_{C([0,1])}\leq M$ for every $n\in\mathbb{N}$. We use the following explicit Marchaud finite-difference estimate on $[0,1]$: for each $r\in\mathbb{N}$ there are constants $K_r,L_r>0$, depending only on $r$, such that every $w\in C([0,1])$ and every $0<\delta\leq 1$ satisfy
\begin{align*}
\omega_1(w,\delta;[0,1])\leq K_r\delta\|w\|_{C([0,1])}+L_r\omega_r(w,1;[0,1]).
\end{align*}
Here $\omega_1$ is the first modulus of smoothness defined by the same formula with $r=1$. Applying this estimate to $w=w_n$ gives, for every $0<\delta\leq1$,
\begin{align*}
\omega_1(w_n,\delta;[0,1])\leq K_rM\delta+L_r\omega_r(w_n,1;[0,1]).
\end{align*}
Since $\omega_r(w_n,1;[0,1])\to0$, the family $(w_n)$ is equicontinuous: given $\varepsilon>0$, choose $\delta>0$ with $K_rM\delta<\varepsilon/2$, and then choose $N\in\mathbb{N}$ such that $L_r\omega_r(w_n,1;[0,1])<\varepsilon/2$ for all $n\geq N$; the finitely many functions $w_1,\dots,w_{N-1}$ are uniformly continuous on $[0,1]$, so decreasing $\delta$ if necessary gives the same bound for them.
The sequence $(w_n)$ is uniformly bounded by $M$ and equicontinuous on the compact [metric space](/page/Metric%20Space) $[0,1]$. The sequential [Arzelà-Ascoli theorem](/theorems/885) therefore gives that every subsequence of $(w_n)$ contains a uniformly convergent further subsequence. Take an arbitrary subsequence $(w_{n_k})_{k=1}^{\infty}$ and choose a further subsequence $(w_{n_{k_\ell}})_{\ell=1}^{\infty}$ converging uniformly to some $w\in C([0,1])$.
For every admissible pair $(x,h)$, the finite-difference functional $z\mapsto \Delta_h^r z(x)$ is continuous with respect to the uniform norm, since it is a finite linear combination of point evaluations. Hence
\begin{align*}
\Delta_h^r w(x)=\lim_{\ell\to\infty}\Delta_h^r w_{n_{k_\ell}}(x)=0.
\end{align*}
By the kernel identification from the previous step, $w\in\mathcal{P}_{r-1}$. [Uniform convergence](/page/Uniform%20Convergence) also preserves the anchored values, hence $w(t_j)=0$ for each $j\in\{0,\dots,r-1\}$. Since the nodes $t_0,\dots,t_{r-1}$ are distinct, a polynomial of degree at most $r-1$ with these $r$ zeros is the zero polynomial. Therefore $w=0$ on $[0,1]$.
Every subsequence has a further subsequence converging uniformly to $0$. If the full sequence did not converge uniformly to $0$, there would be an $\varepsilon>0$ and a subsequence with $\|w_{n_k}\|_{C([0,1])}\geq\varepsilon$ for all $k$, contradicting the further subsequence convergence to $0$. Thus $w_n\to0$ uniformly.
[/proof]
Applying the claim to $(v_n)$ gives $\|v_n\|_{C([0,1])}\to 0$, and therefore
\begin{align*}
1=E_{r-1}(v_n;[0,1])\leq \|v_n\|_{C([0,1])}\to 0,
\end{align*}
a contradiction. Thus the reference interval estimate holds. The constant $A_r$ is supplied by this contradiction argument; its dependence is only on the order $r$, since the compactness step uses only the Marchaud constants $K_r,L_r$, the fixed interpolation nodes chosen above, and the uniform bound obtained from $B_r$.
[guided]
We now prove the fixed-interval estimate rather than citing it. Suppose no constant $A_r$ exists. Then for each $n\in\mathbb{N}$ we can choose $u_n\in C([0,1])$ such that
\begin{align*}
E_{r-1}(u_n;[0,1])>n\omega_r(u_n,1;[0,1]).
\end{align*}
The error $E_{r-1}(u_n;[0,1])$ is nonzero, because otherwise the displayed strict inequality would force $0>n\omega_r(u_n,1;[0,1])$, impossible. Dividing by this positive number, we may assume
\begin{align*}
E_{r-1}(u_n;[0,1])=1
\end{align*}
and
\begin{align*}
\omega_r(u_n,1;[0,1])<\frac{1}{n}.
\end{align*}
The polynomial part is invisible to both sides of the inequality, so we remove it in a controlled way. Define
\begin{align*}
v_n:=u_n-\Pi_r u_n.
\end{align*}
Since $\Pi_r u_n\in\mathcal{P}_{r-1}$ and adding or subtracting a polynomial of degree at most $r-1$ merely translates the approximating polynomial, the best approximation error is unchanged:
\begin{align*}
E_{r-1}(v_n;[0,1])=E_{r-1}(u_n;[0,1])=1.
\end{align*}
The same subtraction does not change the $r$-th finite differences, because every $r$-th finite difference of a polynomial in $\mathcal{P}_{r-1}$ is zero. Hence
\begin{align*}
\omega_r(v_n,1;[0,1])=\omega_r(u_n,1;[0,1])<\frac{1}{n}.
\end{align*}
The interpolation property also gives anchoring:
\begin{align*}
v_n(t_j)=0
\end{align*}
for each $j\in\{0,\dots,r-1\}$.
We still need a uniform bound. Since $E_{r-1}(u_n;[0,1])=1$, the definition of the infimum gives a polynomial $p_n\in\mathcal{P}_{r-1}$ such that
\begin{align*}
\|u_n-p_n\|_{C([0,1])}\leq 2.
\end{align*}
Because $\Pi_r p_n=p_n$, we have
\begin{align*}
v_n=(I-\Pi_r)(u_n-p_n).
\end{align*}
The projector estimate from the previous step therefore gives
\begin{align*}
\|v_n\|_{C([0,1])}\leq B_r\|u_n-p_n\|_{C([0,1])}\leq 2B_r.
\end{align*}
We now prove the compactness conclusion needed here. The sequence $(v_n)$ is uniformly bounded by $2B_r$, is anchored by $v_n(t_j)=0$ for each $j\in\{0,\dots,r-1\}$, and satisfies $\omega_r(v_n,1;[0,1])\to0$. The explicit Marchaud estimate used in the claim says that there are constants $K_r,L_r>0$, depending only on $r$, such that every $w\in C([0,1])$ and every $0<\delta\leq1$ satisfy
\begin{align*}
\omega_1(w,\delta;[0,1])\leq K_r\delta\|w\|_{C([0,1])}+L_r\omega_r(w,1;[0,1]).
\end{align*}
Applying this to $w=v_n$ gives
\begin{align*}
\omega_1(v_n,\delta;[0,1])\leq 2K_rB_r\delta+L_r\omega_r(v_n,1;[0,1]).
\end{align*}
Given $\varepsilon>0$, choose $\delta>0$ with $2K_rB_r\delta<\varepsilon/2$ and then choose $N\in\mathbb{N}$ such that $L_r\omega_r(v_n,1;[0,1])<\varepsilon/2$ for all $n\geq N$. The finitely many functions $v_1,\dots,v_{N-1}$ are uniformly continuous on $[0,1]$, so decreasing $\delta$ if necessary gives equicontinuity for the entire sequence. Since the sequence is also uniformly bounded, the sequential Arzelà-Ascoli theorem implies that every subsequence of $(v_n)$ contains a uniformly convergent further subsequence.
Let $(v_{n_{k_\ell}})$ be such a uniformly convergent further subsequence, and let $v\in C([0,1])$ be its uniform limit. For every admissible pair $(x,h)$, the map $z\mapsto\Delta_h^r z(x)$ is continuous in the uniform norm, because it is a finite linear combination of point evaluations. Hence
\begin{align*}
\Delta_h^r v(x)=\lim_{\ell\to\infty}\Delta_h^r v_{n_{k_\ell}}(x)=0.
\end{align*}
The kernel identification from the previous step then gives $v\in\mathcal{P}_{r-1}$. Also, uniform convergence preserves point values, so
\begin{align*}
v(t_j)=\lim_{\ell\to\infty}v_{n_{k_\ell}}(t_j)=0.
\end{align*}
This holds for every $j\in\{0,\dots,r-1\}$. A polynomial of degree at most $r-1$ with the $r$ distinct zeros $t_0,\dots,t_{r-1}$ is identically zero. Thus every subsequence of $(v_n)$ has a further subsequence converging uniformly to $0$.
This subsequential statement forces the whole sequence to converge uniformly to $0$: otherwise some $\varepsilon>0$ and some subsequence would satisfy $\|v_{n_k}\|_{C([0,1])}\geq\varepsilon$, contradicting the existence of a further subsequence converging uniformly to $0$. Hence
\begin{align*}
\|v_n\|_{C([0,1])}\to 0.
\end{align*}
But the zero polynomial is an admissible approximant, so
\begin{align*}
1=E_{r-1}(v_n;[0,1])\leq \|v_n\|_{C([0,1])}\to 0,
\end{align*}
which is impossible. The contradiction proves the reference interval estimate.
[/guided]
[/step]
[step:Scale the reference interval estimate back to $I$]
Applying the reference estimate to $g=f\circ T$ gives
\begin{align*}
E_{r-1}(g;[0,1]) \leq A_r \omega_r(g,1;[0,1]).
\end{align*}
Using the equalities from the affine reduction step,
\begin{align*}
E_{r-1}(f;I) \leq A_r \omega_r(f,|I|;I).
\end{align*}
Thus the theorem holds with
\begin{align*}
C_r := A_r.
\end{align*}
The constant depends only on $r$, because the reference interval argument depends only on the order of the finite difference and not on $I$ or $f$.
[/step]