[proofplan]
We first reduce the theorem from an arbitrary finite closed interval $[a,b]$ to the interval $[0,1]$ by an affine change of variables, treating the degenerate case $a=b$ separately. On $[0,1]$, we approximate the transported function by its Bernstein polynomials. The Bernstein error is written as a weighted average, split into indices near $t$ and far from $t$, with [uniform continuity](/page/Uniform%20Continuity) controlling the near part and the binomial variance identity controlling the far part uniformly in $t$.
[/proofplan]
[step:Dispose of the degenerate interval case]
If $a=b$, define the polynomial
\begin{align*}
p: \mathbb{R} &\to \mathbb{R} \\
x &\mapsto f(a).
\end{align*}
Then for the only point $x \in [a,b]$, namely $x=a$, we have $|f(x)-p(x)|=0 \leq \varepsilon$. Hence the theorem holds when $a=b$.
[/step]
[step:Transport the problem to the unit interval]
Assume now that $a<b$. Define the affine map
\begin{align*}
A: [0,1] &\to [a,b] \\
t &\mapsto a+(b-a)t,
\end{align*}
and define
\begin{align*}
g: [0,1] &\to \mathbb{R} \\
t &\mapsto f(A(t)).
\end{align*}
Since $A$ is [continuous](/page/Continuity) and $f \in C[a,b]$, the composition $g=f\circ A$ belongs to $C[0,1]$. It is enough to prove that for every $\varepsilon>0$ there is a polynomial $q:\mathbb{R}\to\mathbb{R}$ such that $|g(t)-q(t)|\leq\varepsilon$ for all $t\in[0,1]$, because then
\begin{align*}
p: \mathbb{R} &\to \mathbb{R} \\
x &\mapsto q\left(\frac{x-a}{b-a}\right)
\end{align*}
is a polynomial and, for every $x\in[a,b]$, the point $t=(x-a)/(b-a)$ lies in $[0,1]$, so
\begin{align*}
|f(x)-p(x)|=|g(t)-q(t)|\leq \varepsilon.
\end{align*}
[/step]
[step:Define the Bernstein polynomial and express its error as a weighted average]
Fix $\varepsilon>0$. For each $n\in\mathbb{N}$, define the polynomial
\begin{align*}
B_n: \mathbb{R} &\to \mathbb{R} \\
t &\mapsto \sum_{k=0}^{n} g\left(\frac{k}{n}\right)\binom{n}{k}t^k(1-t)^{n-k}.
\end{align*}
For $t\in[0,1]$, define the Bernstein weight
\begin{align*}
w_{n,k}(t):=\binom{n}{k}t^k(1-t)^{n-k}
\end{align*}
for each integer $k$ with $0\leq k\leq n$. By the [binomial theorem](/theorems/750),
\begin{align*}
\sum_{k=0}^{n} w_{n,k}(t)
=
\sum_{k=0}^{n}\binom{n}{k}t^k(1-t)^{n-k}
=
(t+(1-t))^n
=
1.
\end{align*}
Therefore, for every $t\in[0,1]$,
\begin{align*}
B_n(t)-g(t)
&=
\sum_{k=0}^{n}\left[g\left(\frac{k}{n}\right)-g(t)\right]w_{n,k}(t).
\end{align*}
[/step]
[step:Estimate the far weights by the binomial variance identity]
For every $n\in\mathbb{N}$ and every $t\in[0,1]$,
\begin{align*}
\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)
=
\frac{t(1-t)}{n}
\leq
\frac{1}{4n}.
\end{align*}
Indeed, differentiating the identity $(s+1)^n=\sum_{k=0}^{n}\binom{n}{k}s^k$ after multiplying by suitable powers of $t$ and $1-t$ gives
\begin{align*}
\sum_{k=0}^{n}k\,w_{n,k}(t)&=nt,\\
\sum_{k=0}^{n}k(k-1)\,w_{n,k}(t)&=n(n-1)t^2.
\end{align*}
Hence
\begin{align*}
\sum_{k=0}^{n}k^2w_{n,k}(t)
&=
n(n-1)t^2+nt,
\end{align*}
and subtracting the square of the first moment gives
\begin{align*}
\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)
&=
\frac{1}{n^2}\sum_{k=0}^{n}k^2w_{n,k}(t)-2t\frac{1}{n}\sum_{k=0}^{n}k w_{n,k}(t)+t^2\sum_{k=0}^{n}w_{n,k}(t)\\
&=
\frac{t(1-t)}{n}.
\end{align*}
Since $0\leq t\leq 1$, we have $t(1-t)\leq 1/4$.
[guided]
The purpose of this step is to quantify how much Bernstein weight can sit far from the point $t$. The weights $w_{n,k}(t)$ behave like the probability weights of a binomial random variable divided by $n$, but we only need the elementary finite-sum identities.
For each $n\in\mathbb{N}$, each $t\in[0,1]$, and each integer $k$ with $0\leq k\leq n$, the weight is
\begin{align*}
w_{n,k}(t)=\binom{n}{k}t^k(1-t)^{n-k}.
\end{align*}
The binomial theorem gives
\begin{align*}
\sum_{k=0}^{n}w_{n,k}(t)=1.
\end{align*}
To compute the second central moment, we use the standard binomial derivative identities. First,
\begin{align*}
\sum_{k=0}^{n}k\,w_{n,k}(t)=nt.
\end{align*}
Second,
\begin{align*}
\sum_{k=0}^{n}k(k-1)\,w_{n,k}(t)=n(n-1)t^2.
\end{align*}
Adding the first identity to the second gives the second moment:
\begin{align*}
\sum_{k=0}^{n}k^2w_{n,k}(t)
=
n(n-1)t^2+nt.
\end{align*}
Now expand the square in the central moment:
\begin{align*}
\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)
&=
\frac{1}{n^2}\sum_{k=0}^{n}k^2w_{n,k}(t)-2t\frac{1}{n}\sum_{k=0}^{n}k w_{n,k}(t)+t^2\sum_{k=0}^{n}w_{n,k}(t)\\
&=
\frac{n(n-1)t^2+nt}{n^2}-2t^2+t^2\\
&=
\frac{t(1-t)}{n}.
\end{align*}
Finally, the elementary inequality $t(1-t)\leq 1/4$ for $t\in[0,1]$ gives
\begin{align*}
\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)
\leq
\frac{1}{4n}.
\end{align*}
This is the variance estimate that will force the far part of the Bernstein average to vanish uniformly in $t$.
[/guided]
[/step]
[step:Split the Bernstein error into near and far indices]
Since $g\in C[0,1]$ and $[0,1]$ is compact, the [Heine-Cantor Theorem](/theorems/280) implies that $g$ is uniformly continuous. Apply uniform continuity with tolerance $\varepsilon/2$: there exists $\delta>0$ such that for all $s,t\in[0,1]$,
\begin{align*}
|s-t|<\delta
\quad\Longrightarrow\quad
|g(s)-g(t)|<\frac{\varepsilon}{2}.
\end{align*}
Define
\begin{align*}
M:=\max_{t\in[0,1]}|g(t)|.
\end{align*}
The maximum exists by the [Extreme Value Theorem for compact spaces](/theorems/304), because $g$ is continuous on the [compact](/page/Compact%20Space) interval $[0,1]$.
For fixed $n\in\mathbb{N}$ and $t\in[0,1]$, split the indices into the near set
\begin{align*}
N_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|<\delta\right\}
\end{align*}
and the far set
\begin{align*}
F_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|\geq\delta\right\}.
\end{align*}
Using the weighted-average formula for the error, the triangle inequality, the near estimate, and $|g(k/n)-g(t)|\leq 2M$, we obtain
\begin{align*}
|B_n(t)-g(t)|
&\leq
\sum_{k=0}^{n}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t)\\
&=
\sum_{k\in N_{n,t}}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t)
+
\sum_{k\in F_{n,t}}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t)\\
&\leq
\frac{\varepsilon}{2}\sum_{k\in N_{n,t}}w_{n,k}(t)
+
2M\sum_{k\in F_{n,t}}w_{n,k}(t)\\
&\leq
\frac{\varepsilon}{2}
+
2M\sum_{k\in F_{n,t}}w_{n,k}(t).
\end{align*}
For $k\in F_{n,t}$, we have $\delta^2\leq (k/n-t)^2$, so
\begin{align*}
\sum_{k\in F_{n,t}}w_{n,k}(t)
&\leq
\frac{1}{\delta^2}\sum_{k\in F_{n,t}}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)\\
&\leq
\frac{1}{\delta^2}\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)\\
&\leq
\frac{1}{4n\delta^2}.
\end{align*}
Therefore, for every $t\in[0,1]$,
\begin{align*}
|B_n(t)-g(t)|
\leq
\frac{\varepsilon}{2}+\frac{M}{2n\delta^2}.
\end{align*}
[guided]
We now turn the variance estimate into an approximation estimate. The obstacle is that uniform continuity controls $|g(k/n)-g(t)|$ only when $k/n$ is close to $t$. Therefore we split the finite sum into indices close to $t$ and indices far from $t$.
Because $g\in C[0,1]$ and $[0,1]$ is compact, the Heine-[Cantor Theorem](/theorems/759) gives uniform continuity. Applying it with tolerance $\varepsilon/2$, choose $\delta>0$ such that for all $s,t\in[0,1]$,
\begin{align*}
|s-t|<\delta
\quad\Longrightarrow\quad
|g(s)-g(t)|<\frac{\varepsilon}{2}.
\end{align*}
Also define
\begin{align*}
M:=\max_{t\in[0,1]}|g(t)|.
\end{align*}
This maximum exists by the [Extreme Value Theorem for compact spaces](/theorems/304), since $g$ is continuous on the compact interval $[0,1]$.
Fix $n\in\mathbb{N}$ and $t\in[0,1]$. Define the near indices by
\begin{align*}
N_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|<\delta\right\},
\end{align*}
and define the far indices by
\begin{align*}
F_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|\geq\delta\right\}.
\end{align*}
The Bernstein error is
\begin{align*}
B_n(t)-g(t)
=
\sum_{k=0}^{n}\left[g\left(\frac{k}{n}\right)-g(t)\right]w_{n,k}(t).
\end{align*}
Taking absolute values and applying the triangle inequality gives
\begin{align*}
|B_n(t)-g(t)|
&\leq
\sum_{k=0}^{n}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t).
\end{align*}
On the near set $N_{n,t}$, the definition of $\delta$ gives
\begin{align*}
\left|g\left(\frac{k}{n}\right)-g(t)\right|<\frac{\varepsilon}{2}.
\end{align*}
On the far set $F_{n,t}$, we use the uniform bound $|g|\leq M$, which gives
\begin{align*}
\left|g\left(\frac{k}{n}\right)-g(t)\right|
\leq
\left|g\left(\frac{k}{n}\right)\right|+|g(t)|
\leq
2M.
\end{align*}
Thus
\begin{align*}
|B_n(t)-g(t)|
&\leq
\frac{\varepsilon}{2}\sum_{k\in N_{n,t}}w_{n,k}(t)
+
2M\sum_{k\in F_{n,t}}w_{n,k}(t)\\
&\leq
\frac{\varepsilon}{2}
+
2M\sum_{k\in F_{n,t}}w_{n,k}(t),
\end{align*}
because the weights are non-negative and their total sum is $1$.
It remains to bound the far weight. If $k\in F_{n,t}$, then
\begin{align*}
\delta^2\leq \left(\frac{k}{n}-t\right)^2.
\end{align*}
Multiplying by the non-negative weight $w_{n,k}(t)$ and summing over $F_{n,t}$ gives
\begin{align*}
\sum_{k\in F_{n,t}}w_{n,k}(t)
&\leq
\frac{1}{\delta^2}\sum_{k\in F_{n,t}}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)\\
&\leq
\frac{1}{\delta^2}\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t).
\end{align*}
The variance estimate from the previous step gives
\begin{align*}
\sum_{k\in F_{n,t}}w_{n,k}(t)
\leq
\frac{1}{4n\delta^2}.
\end{align*}
Substituting this into the error estimate yields
\begin{align*}
|B_n(t)-g(t)|
\leq
\frac{\varepsilon}{2}+\frac{M}{2n\delta^2}.
\end{align*}
This estimate is uniform in $t\in[0,1]$, which is the essential point needed for uniform approximation.
[/guided]
[/step]
[step:Choose a Bernstein polynomial that achieves the required accuracy]
Choose $n\in\mathbb{N}$ large enough that
\begin{align*}
\frac{M}{2n\delta^2}\leq \frac{\varepsilon}{2}.
\end{align*}
Then the previous estimate gives, for every $t\in[0,1]$,
\begin{align*}
|B_n(t)-g(t)|\leq \varepsilon.
\end{align*}
Thus $q:=B_n$ is a polynomial satisfying $|g(t)-q(t)|\leq\varepsilon$ for all $t\in[0,1]$. By the affine reduction step, the polynomial
\begin{align*}
p: \mathbb{R} &\to \mathbb{R} \\
x &\mapsto q\left(\frac{x-a}{b-a}\right)
\end{align*}
satisfies $|f(x)-p(x)|\leq\varepsilon$ for all $x\in[a,b]$. This proves the theorem.
[/step]