How well can polynomials approximate continuous functions? The question seems straightforward — after all, Taylor's theorem tells us that a smooth function is well-approximated by its Taylor polynomials near any given point. But Taylor approximation is fundamentally *local*: the polynomial is tuned to a single point, and as one moves away from that point, the approximation can deteriorate catastrophically. For a function like $f \in C([0,1])$ that is merely continuous — not differentiable, let alone analytic — Taylor's theorem says nothing at all.
The Weierstrass Approximation Theorem resolves this by guaranteeing that *every* continuous function on a closed bounded interval can be approximated *uniformly* by polynomials to any desired accuracy. The word "uniformly" is decisive: the polynomial approximation is simultaneously good at every point of the interval, not just near a chosen center. This transforms polynomials from a local tool into a global one.
[example: Taylor Failure]
Consider the function $f \in C([-1,1])$ defined by $f(x) = |x|$. This function is continuous on $[-1,1]$ but not differentiable at $x = 0$, so it has no Taylor series centered at the origin. If we instead expand around a point $x_0 > 0$, then $f$ agrees with the analytic function $x \mapsto x$ near $x_0$, and the Taylor series is simply $x$ — but this representation fails for $x < 0$ where $f(x) = -x$. The non-analyticity at $x = 0$ is an impassable barrier for any Taylor-based approach: no single power series centered at any point can represent $|x|$ on all of $[-1,1]$.
Yet the Weierstrass theorem guarantees the existence of a sequence of polynomials $p_n \in C([-1,1])$ with
\begin{align*}
\sup_{x \in [-1,1]} |f(x) - p_n(x)| \to 0 \quad \text{as } n \to \infty.
\end{align*}
The polynomials achieving this approximation have nothing to do with Taylor expansions. They are constructed by entirely different means, such as the Bernstein polynomials, defined for a continuous function $g$ on $[0,1]$ by
\begin{align*}
B_n(g; t) := \sum_{k=0}^{n} g\!\left(\frac{k}{n}\right) \binom{n}{k} t^k (1-t)^{n-k},
\end{align*}
which average the values of $g$ using the binomial distribution rather than matching derivatives at a point. To treat $f$ on $[-1,1]$, one applies this to $g(t) := f(2t - 1)$ and sets $p_n(x) := B_n\!\left(g; \tfrac{x+1}{2}\right)$.
[/example]
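The example can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper names `bernstein`, `approx_abs`, and `sup_error` are illustrative, and the supremum is only estimated on a finite grid. Because Bernstein polynomials live on $[0,1]$, the sketch assumes the affine change of variables $t = (x+1)/2$ to carry $[-1,1]$ onto $[0,1]$.

```python
from math import comb


def bernstein(g, n, t):
    """Evaluate the n-th Bernstein polynomial of g (a function on [0, 1]) at t."""
    return sum(
        g(k / n) * comb(n, k) * t**k * (1 - t) ** (n - k) for k in range(n + 1)
    )


def approx_abs(n, x):
    """Polynomial approximation of |x| on [-1, 1] via the substitution t = (x + 1) / 2."""
    g = lambda t: abs(2 * t - 1)  # |x| rewritten in the variable t
    return bernstein(g, n, (x + 1) / 2)


def sup_error(n, grid_size=401):
    """Estimate sup |f - p_n| on [-1, 1] using a uniform grid (an approximation)."""
    xs = [-1 + 2 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(abs(x) - approx_abs(n, x)) for x in xs)


for n in (4, 16, 64, 256):
    print(n, sup_error(n))
```

The printed errors shrink as $n$ grows, but slowly: the worst error sits at the corner $x = 0$ and decays only on the order of $n^{-1/2}$, which is the price of approximating a non-smooth function.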
The implications of this theorem reach far beyond polynomial approximation. It establishes that the polynomials form a *dense* subset of $C([a,b])$ under the supremum norm, a fact that underpins spectral methods in numerical analysis, the theory of moments, and the functional-analytic study of Banach algebras. The Stone--Weierstrass theorem generalizes this to characterize exactly which subalgebras of $C(K)$ containing the constant functions are dense when $K$ is a compact Hausdorff space, replacing the specific role of polynomials with an abstract algebraic condition: *separation of points*.
## Definition
The natural setting for the Weierstrass theorem is the [Banach space](/page/Banach%20Space) of continuous functions on a compact interval, equipped with the supremum norm. The question of "how well can we approximate" is made precise by measuring distance in this norm.
[definition: Uniform Approximation by Polynomials]
Let $[a,b] \subset \mathbb{R}$ be a closed bounded interval and let $C([a,b])$ denote the Banach space of continuous functions $f: [a,b] \to \mathbb{R}$ equipped with the supremum norm
\begin{align*}
\|f\|_\infty := \sup_{x \in [a,b]} |f(x)|.
\end{align*}
A function $f \in C([a,b])$ is **uniformly approximable by polynomials** if for every $\varepsilon > 0$ there exists a polynomial $p: [a,b] \to \mathbb{R}$ such that
\begin{align*}
\|f - p\|_\infty < \varepsilon.
\end{align*}
Equivalently, the set of polynomial functions is **dense** in $C([a,b])$ with respect to the supremum norm: every $f \in C([a,b])$ is the uniform limit of a sequence of polynomials.
[/definition]
The supremum norm is the correct metric for this theory. Weaker notions of convergence — pointwise convergence, or convergence in $L^p$ — would yield different and often easier problems. Pointwise approximation by polynomials, for instance, is possible even for some discontinuous functions. The power of the Weierstrass theorem lies in the uniformity of the approximation, which preserves continuity, integrals, and many qualitative features of the target function.
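A standard worked example, added here as an illustration, shows the gap between the two modes of convergence. On $[0,1]$ the monomials $p_n(x) = x^n$ converge pointwise to the discontinuous function $g$ with $g(x) = 0$ for $x < 1$ and $g(1) = 1$, yet
\begin{align*}
\sup_{x \in [0,1]} |p_n(x) - g(x)| = \sup_{0 \le x < 1} x^n = 1 \quad \text{for every } n,
\end{align*}
so the convergence is not uniform. It could not be: a uniform limit of continuous functions is continuous.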
[remark: Why Closed Bounded Intervals]
The restriction to closed bounded intervals is essential on two counts. *Boundedness* rules out domains like $\mathbb{R}$ or $[0, \infty)$: every non-constant polynomial is unbounded there, so even a bounded continuous function such as $\sin x$ cannot be uniformly approximated by polynomials. *Closedness*, together with boundedness, makes $[a,b]$ compact, so every continuous function on it is bounded and uniformly continuous, and the supremum norm is finite. On an open interval like $(0,1)$, continuous functions need not be bounded, and the supremum norm may be infinite; even bounded functions such as $\sin(1/x)$ cannot be approximated, because a uniform limit of polynomials on $(0,1)$ extends continuously to $[0,1]$. The failure of the theorem on non-compact domains is examined in detail below.
[/remark]
## The Classical Weierstrass Theorem and Bernstein's Construction
The most natural proof of the Weierstrass theorem is also the most constructive: Bernstein's 1912 proof, which builds an explicit sequence of approximating polynomials using probabilistic averaging. Before stating the theorem, we describe the construction and explain why it works.
### Bernstein Polynomials
The challenge in constructing polynomial approximations to a continuous function $f \in C([0,1])$ is that polynomials are infinitely smooth and analytic, while $f$ may have corners, cusps, or other non-smooth features. Any construction based on matching derivatives (as in Taylor's theorem) will fail for non-smooth $f$. Bernstein's idea circumvents this entirely: instead of matching local differential data, the $n$-th Bernstein polynomial evaluates $f$ at the grid points $k/n$ and forms a weighted average using the binomial probability weights.
[definition: Bernstein Polynomial]
Let $f \in C([0,1])$. The **$n$-th Bernstein polynomial** of $f$ is the polynomial $B_n(f; \cdot): [0,1] \to \mathbb{R}$ defined by
\begin{align*}
B_n(f; x) := \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}.
\end{align*}
Equivalently, if $S_n$ denotes the sum of $n$ independent Bernoulli random variables each with parameter $x \in [0,1]$, then
\begin{align*}
B_n(f; x) = \mathbb{E}\!\left[f\!\left(\frac{S_n}{n}\right)\right].
\end{align*}
[/definition]
The probabilistic interpretation is the key to understanding why Bernstein polynomials converge. By the [Weak Law of Large Numbers](/page/Law%20of%20Large%20Numbers), the sample mean $S_n/n$ converges in probability to $x$ as $n \to \infty$. Since $f$ is continuous, $f(S_n/n)$ should be close to $f(x)$ for large $n$, and taking expectations preserves this closeness. The role of [uniform continuity](/page/Uniform%20Continuity) — guaranteed by the compactness of $[0,1]$ — is to make this convergence uniform in $x$.
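The identity $B_n(f; x) = \mathbb{E}[f(S_n/n)]$ can be illustrated by simulation. The sketch below is an assumption-laden check rather than part of the proof: the function names, the test function $f(t) = |t - 1/2|$, the point $x = 0.3$, the trial count, and the fixed seed are all illustrative choices. It compares the exact Bernstein sum with a Monte Carlo average of $f(S_n/n)$ over simulated Bernoulli trials.

```python
import random
from math import comb


def bernstein(f, n, x):
    """Exact value of the n-th Bernstein polynomial of f at x in [0, 1]."""
    return sum(
        f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1)
    )


def monte_carlo_bernstein(f, n, x, trials=20000, seed=0):
    """Estimate E[f(S_n / n)], where S_n counts successes in n Bernoulli(x) trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(trials):
        s_n = sum(rng.random() < x for _ in range(n))  # one draw of S_n
        total += f(s_n / n)
    return total / trials


f = lambda t: abs(t - 0.5)  # continuous on [0, 1], not differentiable at 1/2
x = 0.3
for n in (10, 50):
    print(n, bernstein(f, n, x), monte_carlo_bernstein(f, n, x), f(x))
```

The exact sum and the simulated average agree up to sampling noise, and both approach $f(x)$ as $n$ increases, in line with the law-of-large-numbers heuristic.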
The following identities, which are direct consequences of the binomial theorem, are essential for the quantitative analysis.
[quotetheorem:1214]
The third identity reveals that the binomial weights $\binom{n}{k} x^k(1-x)^{n-k}$ concentrate around $k/n \approx x$ with variance $x(1-x)/n$. As $n$ grows, the weights become sharply peaked near $k/n = x$, so the Bernstein polynomial $B_n(f;x)$ increasingly samples $f$ near $x$ itself.
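The concentration of the weights can be seen numerically. The quoted theorem is not reproduced in this text, so the sketch below assumes the identities in question are the standard moment identities for the binomial weights: total mass $1$, mean $x$, and variance $x(1-x)/n$. The helper names `binomial_weights`, `weight_moments`, and `mass_near`, and the values $x = 0.3$ and $\delta = 0.1$, are illustrative.

```python
from math import comb


def binomial_weights(n, x):
    """Weights C(n, k) x^k (1 - x)^(n - k) for k = 0, ..., n."""
    return [comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1)]


def weight_moments(n, x):
    """Return (total mass, mean of k/n, variance of k/n about x) under the weights."""
    w = binomial_weights(n, x)
    total = sum(w)
    mean = sum((k / n) * wk for k, wk in enumerate(w))
    variance = sum((k / n - x) ** 2 * wk for k, wk in enumerate(w))
    return total, mean, variance


def mass_near(n, x, delta):
    """Total weight carried by grid points k/n with |k/n - x| < delta."""
    w = binomial_weights(n, x)
    return sum(wk for k, wk in enumerate(w) if abs(k / n - x) < delta)


x = 0.3
for n in (10, 100, 400):
    total, mean, variance = weight_moments(n, x)
    print(n, total, mean, variance, x * (1 - x) / n, mass_near(n, x, 0.1))
```

The computed variance matches $x(1-x)/n$ up to rounding, and the fraction of weight within $\delta$ of $x$ tends to $1$ as $n$ grows, which is the concentration the convergence argument relies on.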