[proofplan]
The converse is proved directly from the zeroth and first binomial moment identities, which show that Bernstein operators reproduce constants and the identity function. For the forward implication, we introduce the rescaled error operators $A_n f := n(B_n f-f)$ and use the weak [Voronovskaya formula](/theorems/6880): against compactly supported smooth test functions in $(0,1)$, $A_n$ converges to the distributional operator $\frac{1}{2}x(1-x)\partial_x^2$. Since the hypothesis says $A_n f \to 0$ uniformly, the weak limit forces $\partial_x^2 f=0$ in distributions on $(0,1)$ after division by the positive smooth function $x(1-x)$. A [continuous function](/page/Continuous%20Function) whose second [distributional derivative](/page/Distributional%20Derivative) vanishes is affine on $(0,1)$, and continuity extends the affine identity to $[0,1]$.
[/proofplan]
[step:Verify that Bernstein operators reproduce affine functions]
Let $\mathbf{1}: [0,1] \to \mathbb{R}$ denote the constant function $\mathbf{1}(x)=1$, and let $\operatorname{id}: [0,1] \to \mathbb{R}$ denote the identity function $\operatorname{id}(x)=x$. For every $x \in [0,1]$, the [binomial theorem](/theorems/750) gives
\begin{align*}
(B_n \mathbf{1})(x)=\sum_{k=0}^{n}\binom{n}{k}x^k(1-x)^{n-k}=1.
\end{align*}
Also, using the first moment of the binomial distribution with parameters $n$ and $x$,
\begin{align*}
(B_n \operatorname{id})(x)=\sum_{k=0}^{n}\frac{k}{n}\binom{n}{k}x^k(1-x)^{n-k}=x.
\end{align*}
Therefore, if $f=a\operatorname{id}+b\mathbf{1}$ for some $a,b \in \mathbb{R}$, then linearity of $B_n$ gives
\begin{align*}
B_n f=aB_n\operatorname{id}+bB_n\mathbf{1}=a\operatorname{id}+b\mathbf{1}=f
\end{align*}
for every $n \in \mathbb{N}$.
[/step]
[step:Convert the saturation hypothesis into vanishing weak generator limits]
Assume now that $f \in C([0,1])$ satisfies
\begin{align*}
\|B_n f-f\|_{C([0,1])}=o\left(\frac{1}{n}\right).
\end{align*}
Define the rescaled Bernstein error operator
\begin{align*}
A_n: C([0,1]) &\to C([0,1])
\end{align*}
by
\begin{align*}
A_n g=n(B_n g-g)
\end{align*}
for every $g \in C([0,1])$. The hypothesis is exactly
\begin{align*}
\|A_n f\|_{C([0,1])}\to 0.
\end{align*}
Let $\mathcal{L}^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure), let $C_c^\infty((0,1))$ denote the space of smooth real-valued functions with compact support in $(0,1)$, and let $\mathcal{D}'((0,1))$ denote the space of distributions on $(0,1)$, namely the linear functionals on $C_c^\infty((0,1))$. Hence, for every [test function](/page/Test%20Function) $\varphi \in C_c^\infty((0,1))$,
\begin{align*}
\int_0^1 A_n f(x)\varphi(x)\,d\mathcal{L}^1(x)\to 0,
\end{align*}
because
\begin{align*}
\left|\int_0^1 A_n f(x)\varphi(x)\,d\mathcal{L}^1(x)\right|
\leq \|A_n f\|_{C([0,1])}\int_0^1 |\varphi(x)|\,d\mathcal{L}^1(x).
\end{align*}
[guided]
The hypothesis is a uniform statement about the size of $B_n f-f$. The natural object for saturation is not $B_n f-f$ itself, but the rescaled error
\begin{align*}
A_n f=n(B_n f-f).
\end{align*}
Indeed, saying $\|B_n f-f\|_{C([0,1])}=o(n^{-1})$ is equivalent to saying
\begin{align*}
\|A_n f\|_{C([0,1])}\to 0.
\end{align*}
We now test this [uniform convergence](/page/Uniform%20Convergence) against an arbitrary compactly supported smooth function. Let $\varphi \in C_c^\infty((0,1))$. Since $\varphi$ is continuous with compact support in $[0,1]$, it is integrable with respect to one-dimensional Lebesgue measure $\mathcal{L}^1$. Therefore
\begin{align*}
\left|\int_0^1 A_n f(x)\varphi(x)\,d\mathcal{L}^1(x)\right|
\leq \|A_n f\|_{C([0,1])}\int_0^1 |\varphi(x)|\,d\mathcal{L}^1(x).
\end{align*}
The first factor tends to $0$ by the saturation hypothesis, while the second factor is fixed and finite. Hence
\begin{align*}
\int_0^1 A_n f(x)\varphi(x)\,d\mathcal{L}^1(x)\to 0.
\end{align*}
This is the bridge from uniform approximation to a distributional conclusion.
[/guided]
[/step]
[step:Identify the weak limit of the rescaled Bernstein errors]
We use the weak Voronovskaya formula for Bernstein operators, and record the exact form needed here. For every $g \in C([0,1])$ and every $\varphi \in C_c^\infty((0,1))$,
\begin{align*}
\lim_{n\to\infty}\int_0^1 A_n g(x)\varphi(x)\,d\mathcal{L}^1(x)=\frac{1}{2}\int_0^1 g(x)\frac{d^2}{dx^2}\left(x(1-x)\varphi(x)\right)\,d\mathcal{L}^1(x).
\end{align*}
This is the weak form of Voronovskaya's theorem for Bernstein operators. In the present application its hypotheses are satisfied because $f \in C([0,1])$, $\varphi \in C_c^\infty((0,1))$, and all integrals are finite over the compact interval $[0,1]$ with respect to $\mathcal{L}^1$.
Applying this formula to $g=f$ and combining it with the vanishing limit from the previous step gives
\begin{align*}
\int_0^1 f(x)\frac{d^2}{dx^2}\left(x(1-x)\varphi(x)\right)\,d\mathcal{L}^1(x)=0
\end{align*}
for every $\varphi \in C_c^\infty((0,1))$.
[guided]
The rescaled errors $A_n f=n(B_n f-f)$ have a weak limit even when $f$ is only continuous. The precise tool is the weak Voronovskaya formula for Bernstein operators: if $g \in C([0,1])$ and $\varphi \in C_c^\infty((0,1))$, then
\begin{align*}
\lim_{n\to\infty}\int_0^1 A_n g(x)\varphi(x)\,d\mathcal{L}^1(x)=\frac{1}{2}\int_0^1 g(x)\frac{d^2}{dx^2}\left(x(1-x)\varphi(x)\right)\,d\mathcal{L}^1(x).
\end{align*}
We verify the hypotheses in our situation. The theorem requires a continuous input function on $[0,1]$, and this is exactly the standing assumption $f \in C([0,1])$. It also requires a compactly supported smooth test function in the open interval, and $\varphi \in C_c^\infty((0,1))$ was chosen arbitrarily. Since $\varphi$ has compact support in $(0,1)$, the function $x \mapsto x(1-x)\varphi(x)$ also belongs to $C_c^\infty((0,1))$, so its second ordinary derivative is continuous and compactly supported in $(0,1)$. Thus the integral on the right-hand side is finite because $f$ is continuous on the compact interval $[0,1]$.
Applying the weak Voronovskaya formula with $g=f$ yields
\begin{align*}
\lim_{n\to\infty}\int_0^1 A_n f(x)\varphi(x)\,d\mathcal{L}^1(x)=\frac{1}{2}\int_0^1 f(x)\frac{d^2}{dx^2}\left(x(1-x)\varphi(x)\right)\,d\mathcal{L}^1(x).
\end{align*}
The previous step proved, from the saturation hypothesis, that the left-hand side is $0$. Therefore
\begin{align*}
\int_0^1 f(x)\frac{d^2}{dx^2}\left(x(1-x)\varphi(x)\right)\,d\mathcal{L}^1(x)=0.
\end{align*}
This identity is the point where the approximation estimate becomes a differential constraint on $f$.
[/guided]
[/step]
[step:Translate the weak limit into the distributional equation $D^2 f=0$]
Let $\psi: (0,1) \to \mathbb{R}$ be the smooth function
\begin{align*}
\psi(x)=x(1-x).
\end{align*}
The previous step says that
\begin{align*}
\int_0^1 f(x)(\psi\varphi)''(x)\,d\mathcal{L}^1(x)=0
\end{align*}
for every $\varphi \in C_c^\infty((0,1))$. Let $\partial_x^2 f \in \mathcal{D}'((0,1))$ denote the second distributional derivative of the [regular distribution](/page/Regular%20Distribution) induced by $f$, defined by
\begin{align*}
(\partial_x^2 f)(\eta)=\int_0^1 f(x)\eta''(x)\,d\mathcal{L}^1(x)
\end{align*}
for every $\eta \in C_c^\infty((0,1))$. By this definition, the displayed identity is
\begin{align*}
(\partial_x^2 f)(\psi\varphi)=0
\end{align*}
for every $\varphi \in C_c^\infty((0,1))$.
Since $\psi(x)>0$ for every $x \in (0,1)$, multiplication by $\psi$ maps $C_c^\infty((0,1))$ bijectively onto itself: for any $\eta \in C_c^\infty((0,1))$, the function $\varphi=\eta/\psi$ also belongs to $C_c^\infty((0,1))$. Hence
\begin{align*}
(\partial_x^2 f)(\eta)=0
\end{align*}
for every $\eta \in C_c^\infty((0,1))$. Therefore
\begin{align*}
\partial_x^2 f=0
\end{align*}
in $\mathcal{D}'((0,1))$.
[/step]
[step:Conclude that the continuous function is affine]
Because $\partial_x^2 f=0$ in $\mathcal{D}'((0,1))$, the first distributional derivative $\partial_x f$ has zero distributional derivative on $(0,1)$. Therefore $\partial_x f$ is a constant distribution: there exists $a \in \mathbb{R}$ such that
\begin{align*}
\partial_x f=a
\end{align*}
in $\mathcal{D}'((0,1))$. It follows that the distributional derivative of $f-a\operatorname{id}$ is zero, so there exists $b \in \mathbb{R}$ such that
\begin{align*}
f(x)=ax+b
\end{align*}
for every $x \in (0,1)$.
Finally, $f$ is continuous on $[0,1]$ and the affine function $x \mapsto ax+b$ is continuous on $[0,1]$. Since they agree on the [dense subset](/page/Dense%20Subset) $(0,1)$ of $[0,1]$, they agree at $0$ and $1$ as well. Therefore $f$ is affine on $[0,1]$.
[guided]
We have shown that $\partial_x^2 f=0$ as a distribution on $(0,1)$. This means that the distribution $\partial_x f$ has distributional derivative equal to zero. A distribution on an interval whose derivative is zero is constant, so there is a real number $a \in \mathbb{R}$ such that
\begin{align*}
\partial_x f=a
\end{align*}
in $\mathcal{D}'((0,1))$.
Now subtract the affine function with slope $a$. Let $h: (0,1) \to \mathbb{R}$ be the continuous function defined by $h(x)=f(x)-ax$. Its associated regular distribution has distributional derivative
\begin{align*}
\partial_x h=\partial_x f-a=0
\end{align*}
in $\mathcal{D}'((0,1))$. A continuous function whose distributional derivative is zero on an interval is constant on that interval. Hence there exists $b \in \mathbb{R}$ such that
\begin{align*}
h(x)=b
\end{align*}
for every $x \in (0,1)$. Replacing $h$ by its definition gives
\begin{align*}
f(x)=ax+b
\end{align*}
for every $x \in (0,1)$.
It remains only to include the endpoints. The original function $f$ is continuous on $[0,1]$, and the affine function $x \mapsto ax+b$ is continuous on $[0,1]$. Since the two continuous functions agree on the dense subset $(0,1)$, their values at the endpoint limits must agree as well. Therefore $f(0)=b$ and $f(1)=a+b$, so $f$ is affine on all of $[0,1]$.
[/guided]
[/step]