How well can polynomials approximate continuous functions? The question seems straightforward — after all, Taylor's theorem tells us that a smooth function is well-approximated by its Taylor polynomials near any given point. But Taylor approximation is fundamentally *local*: the polynomial is tuned to a single point, and as one moves away from that point, the approximation can deteriorate catastrophically. For a function like $f \in C([0,1])$ that is merely continuous — not differentiable, let alone analytic — Taylor's theorem says nothing at all.
The Weierstrass Approximation Theorem resolves this by guaranteeing that *every* continuous function on a closed bounded interval can be approximated *uniformly* by polynomials to any desired accuracy. The word "uniformly" is decisive: the polynomial approximation is simultaneously good at every point of the interval, not just near a chosen center. This transforms polynomials from a local tool into a global one.
[example: Taylor Failure]
Consider the function $f \in C([-1,1])$ defined by $f(x) = |x|$. This function is continuous on $[-1,1]$ but not differentiable at $x = 0$, so it has no Taylor series centered at the origin. If we instead expand around a point $x_0 > 0$, then $f$ agrees with the analytic function $x \mapsto x$ near $x_0$, and the Taylor series is simply $x$ — but this representation fails for $x < 0$ where $f(x) = -x$. The non-analyticity at $x = 0$ is an impassable barrier for any Taylor-based approach: no single power series centered at any point can represent $|x|$ on all of $[-1,1]$.
Yet the Weierstrass theorem guarantees the existence of a sequence of polynomials $p_n \in C([-1,1])$ with
\begin{align*}
\sup_{x \in [-1,1]} |f(x) - p_n(x)| \to 0 \quad \text{as } n \to \infty.
\end{align*}
The polynomials achieving this approximation have nothing to do with Taylor expansions — they are constructed by entirely different means, such as the Bernstein polynomials
\begin{align*}
B_n(f; x) := \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right) \binom{n}{k} x^k (1-x)^{n-k},
\end{align*}
which average the values of $f$ using the binomial distribution rather than matching derivatives at a point.
[/example]
The implications of this theorem reach far beyond polynomial approximation. It establishes that the polynomials form a *dense* subset of $C([a,b])$ under the supremum norm — a fact that underpins spectral methods in numerical analysis, the theory of moments, and the functional-analytic study of Banach algebras. The Stone--Weierstrass theorem generalizes this to characterize exactly which subalgebras of $C(K)$ are dense when $K$ is a compact Hausdorff space, replacing the specific role of polynomials with an abstract algebraic condition: *separation of points*.
## Definition
The natural setting for the Weierstrass theorem is the [Banach space](/page/Banach%20Space) of continuous functions on a compact interval, equipped with the supremum norm. The question of "how well can we approximate" is made precise by measuring distance in this norm.
[definition: Uniform Approximation by Polynomials]
Let $[a,b] \subset \mathbb{R}$ be a closed bounded interval and let $C([a,b])$ denote the Banach space of continuous functions $f: [a,b] \to \mathbb{R}$ equipped with the supremum norm
\begin{align*}
\|f\|_\infty := \sup_{x \in [a,b]} |f(x)|.
\end{align*}
A function $f \in C([a,b])$ is **uniformly approximable by polynomials** if for every $\varepsilon > 0$ there exists a polynomial $p: [a,b] \to \mathbb{R}$ such that
\begin{align*}
\|f - p\|_\infty < \varepsilon.
\end{align*}
Equivalently, the set of polynomial functions is **dense** in $C([a,b])$ with respect to the supremum norm: every $f \in C([a,b])$ is the uniform limit of a sequence of polynomials.
[/definition]
The supremum norm is the correct metric for this theory. Weaker notions of convergence — pointwise convergence, or convergence in $L^p$ — would yield different and often easier problems. Pointwise approximation by polynomials, for instance, is possible even for some discontinuous functions. The power of the Weierstrass theorem lies in the uniformity of the approximation, which preserves continuity, integrals, and many qualitative features of the target function.
[remark: Why Closed Bounded Intervals]
The restriction to closed bounded intervals is essential on two counts. *Boundedness* rules out domains like $\mathbb{R}$ or $[0, \infty)$, where polynomial approximation fails for functions that grow or oscillate at infinity. *Closedness* ensures that $C([a,b])$ is a Banach space (the supremum norm is a genuine norm, and every Cauchy sequence converges). On an open interval like $(0,1)$, continuous functions need not be bounded, and the supremum norm may be infinite. The failure of the theorem on non-compact domains is examined in detail below.
[/remark]
## The Classical Weierstrass Theorem and Bernstein's Construction
The most natural proof of the Weierstrass theorem is also the most constructive: Bernstein's 1912 proof, which builds an explicit sequence of approximating polynomials using probabilistic averaging. Before stating the theorem, we describe the construction and explain why it works.
### Bernstein Polynomials
The challenge in constructing polynomial approximations to a continuous function $f \in C([0,1])$ is that polynomials are infinitely smooth and analytic, while $f$ may have corners, cusps, or other non-smooth features. Any construction based on matching derivatives (as in Taylor's theorem) will fail for non-smooth $f$. Bernstein's idea circumvents this entirely: instead of matching local differential data, the $n$-th Bernstein polynomial evaluates $f$ at the grid points $k/n$ and forms a weighted average using the binomial probability weights.
[definition: Bernstein Polynomial]
Let $f \in C([0,1])$. The **$n$-th Bernstein polynomial** of $f$ is the polynomial $B_n(f; \cdot): [0,1] \to \mathbb{R}$ defined by
\begin{align*}
B_n(f; x) := \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}.
\end{align*}
Equivalently, if $S_n$ denotes the sum of $n$ independent Bernoulli random variables each with parameter $x \in [0,1]$, then
\begin{align*}
B_n(f; x) = \mathbb{E}\!\left[f\!\left(\frac{S_n}{n}\right)\right].
\end{align*}
[/definition]
The probabilistic interpretation is the key to understanding why Bernstein polynomials converge. By the [Weak Law of Large Numbers](/page/Law%20of%20Large%20Numbers), the sample mean $S_n/n$ converges in probability to $x$ as $n \to \infty$. Since $f$ is continuous, $f(S_n/n)$ should be close to $f(x)$ for large $n$, and taking expectations preserves this closeness. The role of [uniform continuity](/page/Uniform%20Continuity) — guaranteed by the compactness of $[0,1]$ — is to make this convergence uniform in $x$.
The following identities, which are direct consequences of the binomial theorem, are essential for the quantitative analysis.
[quotetheorem:1214]
The third identity reveals that the binomial weights $\binom{n}{k} x^k(1-x)^{n-k}$ concentrate around $k/n \approx x$ with variance $x(1-x)/n$. As $n$ grows, the weights become sharply peaked near $k/n = x$, so the Bernstein polynomial $B_n(f;x)$ increasingly samples $f$ near $x$ itself.
We can now state the classical theorem.
[quotetheorem:480]
Several aspects of this theorem deserve careful attention.
**No rate of convergence is asserted.** The theorem guarantees existence of approximating polynomials but says nothing about how large their degree must be relative to $\varepsilon$. For a general $f \in C([a,b])$, the convergence of the Bernstein polynomials can be very slow — as slow as $O(1/\sqrt{n})$ — and the rate depends on the modulus of continuity of $f$. For Lipschitz functions, the Bernstein polynomials converge at rate $O(1/n)$; for $C^2$ functions, the rate improves to $O(1/n)$ as well but with a sharper constant. Jackson's theorems provide the precise relationship between smoothness and approximation rate.
**The interval must be compact.** On unbounded domains, the theorem fails completely: the function $f(x) = \sin(x)$ on $\mathbb{R}$ cannot be uniformly approximated by polynomials, because any nonzero polynomial $p$ satisfies $|p(x)| \to \infty$ as $|x| \to \infty$ while $|\sin(x)| \leq 1$. The failure on non-compact domains is analyzed in detail below.
**The approximating polynomial is not unique.** For any given $\varepsilon$, there are infinitely many polynomials within $\varepsilon$ of $f$. The Bernstein polynomials provide one specific, canonical sequence, but other constructions (Landau kernels, Fejér sums applied after a change of variable) yield different sequences. The question of *best* polynomial approximation — finding the polynomial of degree $n$ that minimizes $\|f - p\|_\infty$ — is the subject of Chebyshev approximation theory.
### A Quantitative Estimate via Bernstein Polynomials
The proof of the Weierstrass theorem via Bernstein polynomials naturally produces a quantitative bound in terms of the modulus of continuity.
[definition: Modulus of Continuity]
Let $f \in C([a,b])$. The **modulus of continuity** of $f$ is the function $\omega_f: [0, \infty) \to [0, \infty)$ defined by
\begin{align*}
\omega_f(\delta) := \sup \left\{ |f(x) - f(y)| : x, y \in [a,b], \; |x - y| \leq \delta \right\}.
\end{align*}
The function $f$ is uniformly continuous on $[a,b]$ if and only if $\omega_f(\delta) \to 0$ as $\delta \to 0$.
[/definition]
Since every continuous function on a [compact space](/page/Compact%20Space) is [uniformly continuous](/page/Uniform%20Continuity), we always have $\omega_f(\delta) \to 0$ as $\delta \to 0$ for $f \in C([a,b])$. The modulus of continuity measures *how fast* this convergence occurs and thereby controls the rate of polynomial approximation.
[quotetheorem:1215]
This estimate makes the role of uniform continuity transparent: the speed at which $\omega_f(1/\sqrt{n})$ tends to zero determines the convergence rate. For a Lipschitz function with $\omega_f(\delta) \leq L\delta$, the bound gives $O(1/\sqrt{n})$, which is not optimal — sharper analysis using the second-order modulus of continuity yields $O(1/n)$. For a function that is merely continuous, $\omega_f$ can decay arbitrarily slowly, and correspondingly, the Bernstein approximation can converge arbitrarily slowly.
[example: Bernstein Polynomials of Absolute Value]
Consider $f(x) = |2x - 1|$ on $[0,1]$, which has a corner at $x = 1/2$. The Bernstein polynomial of degree $n$ is
\begin{align*}
B_n(f; x) = \sum_{k=0}^{n} \left|\frac{2k}{n} - 1\right| \binom{n}{k} x^k(1-x)^{n-k}.
\end{align*}
At $x = 1/2$, the value $f(1/2) = 0$, while
\begin{align*}
B_n(f; 1/2) &= \frac{1}{2^n} \sum_{k=0}^{n} \left|\frac{2k}{n} - 1\right| \binom{n}{k}.
\end{align*}
For even $n$, by symmetry of the binomial coefficients, this reduces to
\begin{align*}
B_n(f; 1/2) &= \frac{2}{2^n} \sum_{k=n/2+1}^{n} \left(\frac{2k}{n} - 1\right) \binom{n}{k} = \frac{2}{2^n} \cdot \frac{2}{n} \sum_{k=n/2+1}^{n} \left(k - \frac{n}{2}\right) \binom{n}{k}.
\end{align*}
Using the identity $\sum_{k=n/2+1}^{n}(k - n/2)\binom{n}{k} = \frac{n}{2} \cdot \frac{1}{2^n}\binom{n}{n/2} \cdot 2^{n-1}$ (from properties of the binomial distribution), one obtains
\begin{align*}
B_n(f; 1/2) = \frac{1}{2^n} \binom{n}{n/2} \sim \sqrt{\frac{2}{\pi n}}
\end{align*}
by Stirling's approximation. Since $f(1/2) = 0$, the pointwise error at $x = 1/2$ decays like $1/\sqrt{n}$, consistent with the estimate $\omega_f(1/\sqrt{n}) = O(1/\sqrt{n})$ for the Lipschitz function $f$. This demonstrates that the $O(1/\sqrt{n})$ rate from the Bernstein estimate is *attained* for functions with corners — it is sharp for the class of Lipschitz functions.
[/example]
## Density in the Supremum Norm and the Lattice Structure of $C([a,b])$
The Weierstrass theorem has a natural functional-analytic reformulation: the polynomials are dense in the Banach space $(C([a,b]), \|\cdot\|_\infty)$. This raises an immediate question — *what other families of functions are dense in $C([a,b])$?* The answer depends on algebraic and order-theoretic properties of the approximating family, and leads to the Stone--Weierstrass theorem.
Before reaching that generalization, it is worth understanding *why* density in the supremum norm is a much stronger statement than density in other norms. A sequence of polynomials $p_n \to f$ uniformly means that the graphs of $p_n$ converge to the graph of $f$ as sets in $\mathbb{R}^2$ — the approximation is geometric, not just measure-theoretic. In contrast, density of polynomials in $L^p([a,b])$ (which follows from Weierstrass since uniform convergence implies $L^p$ convergence on bounded domains) does not preclude the approximants from having large pointwise errors on small sets.
### Consequences for Separability
The Weierstrass theorem implies that $C([a,b])$ is a **separable** Banach space: it contains a countable dense subset. To see this, note that the polynomials with rational coefficients form a countable set, and every polynomial can be uniformly approximated by polynomials with rational coefficients (by perturbing each coefficient). Since every $f \in C([a,b])$ can be uniformly approximated by polynomials, and every polynomial can be approximated by rational-coefficient polynomials, the rational-coefficient polynomials are dense in $C([a,b])$.
This separability has profound consequences. It guarantees that the dual space $C([a,b])^*$ — which by the Riesz Representation Theorem consists of signed Radon measures on $[a,b]$ — is weak-* metrizable on bounded sets. It also ensures that bounded sequences in $C([a,b])$ have separable ranges, which is essential for many arguments in functional analysis.
### The Moment Problem Connection
A different perspective on density comes from the theory of moments. A signed Borel measure $\mu$ on $[a,b]$ is uniquely determined by its moments
\begin{align*}
m_k := \int_a^b x^k \, d\mu(x), \quad k = 0, 1, 2, \ldots
\end{align*}
if and only if the polynomials are dense in $L^1(|\mu|)$, where $|\mu|$ is the total variation measure. The Weierstrass theorem provides the stronger statement that polynomials are dense in $C([a,b])$ — which, combined with the fact that every continuous linear functional on $C([a,b])$ is represented by a measure (Riesz), yields the following.
[quotetheorem:1216]
[example: Moment Uniqueness]
Suppose $\mu$ is a signed Borel measure on $[0,1]$ satisfying
\begin{align*}
\int_0^1 x^k \, d\mu(x) = 0 \quad \text{for all } k \geq 0.
\end{align*}
By linearity, $\int_0^1 p(x)\, d\mu(x) = 0$ for every polynomial $p$. Now let $g \in C([0,1])$ be arbitrary. By the Weierstrass theorem, there exists a sequence of polynomials $p_n$ with $\|g - p_n\|_\infty \to 0$. Then
\begin{align*}
\left|\int_0^1 g \, d\mu\right| = \left|\int_0^1 (g - p_n) \, d\mu\right| \leq \|g - p_n\|_\infty \cdot |\mu|([0,1]) \to 0.
\end{align*}
Since $\int_0^1 g \, d\mu = 0$ for every $g \in C([0,1])$, the measure $\mu$ is zero (by the Riesz representation theorem, which identifies the dual of $C([0,1])$ with signed Radon measures). This argument fails for the moment problem on $[0,\infty)$ — there exist distinct measures on $[0,\infty)$ with identical moments, because polynomials are *not* dense in the relevant function space on unbounded domains.
[/example]
## The Stone--Weierstrass Theorem
### From Polynomials to Subalgebras
The classical Weierstrass theorem concerns a single, specific family of functions — the polynomials. But what *property* of the polynomials makes them dense? The polynomials form an algebra (closed under addition and multiplication), they contain the constants, and they *separate points* (the polynomial $p(x) = x$ takes different values at different points). The Stone--Weierstrass theorem identifies these as the *only* properties that matter: any subalgebra of $C(K)$ with these three properties is dense, regardless of the specific nature of its elements.
This abstraction has enormous reach. It immediately implies the density of trigonometric polynomials in $C([a,b])$ (the trigonometric functions form a point-separating algebra), the density of piecewise linear functions, and — on compact subsets of $\mathbb{R}^n$ — the density of multivariate polynomials.
[definition: Point-Separating Subalgebra]
Let $K$ be a [compact](/page/Compact%20Space) Hausdorff space and let $\mathcal{A} \subset C(K; \mathbb{R})$ be a subset of the space of continuous real-valued functions on $K$. The set $\mathcal{A}$ is a **subalgebra** of $C(K; \mathbb{R})$ if it is closed under addition, scalar multiplication, and pointwise multiplication:
\begin{align*}
f, g \in \mathcal{A}, \; \alpha \in \mathbb{R} \quad &\Longrightarrow \quad \alpha f + g \in \mathcal{A}, \\
f, g \in \mathcal{A} \quad &\Longrightarrow \quad f \cdot g \in \mathcal{A}.
\end{align*}
The subalgebra $\mathcal{A}$ **separates points** if for every pair of distinct points $x, y \in K$ with $x \neq y$, there exists $f \in \mathcal{A}$ such that $f(x) \neq f(y)$.
The subalgebra $\mathcal{A}$ **vanishes at no point** if for every $x \in K$, there exists $f \in \mathcal{A}$ with $f(x) \neq 0$.
[/definition]
The condition "vanishes at no point" is automatically satisfied if $\mathcal{A}$ contains a nonzero constant function. For subalgebras that do not contain constants, this condition is genuinely needed — the subalgebra $\{f \in C([0,1]) : f(0) = 0\}$ separates points but vanishes at $0$, and it is *not* dense in $C([0,1])$ (it cannot approximate the constant function $1$).
[quotetheorem:1217]
The power of this theorem lies in its three-pronged structure. Each hypothesis plays a distinct and indispensable role, and removing any one of them causes the conclusion to fail.
**Separation of points cannot be dropped.** If $\mathcal{A}$ does not separate points, then there exist $x \neq y$ in $K$ such that $f(x) = f(y)$ for all $f \in \mathcal{A}$. Every function in the closure $\overline{\mathcal{A}}$ then also satisfies $f(x) = f(y)$, so $\overline{\mathcal{A}}$ cannot contain any $g \in C(K; \mathbb{R})$ with $g(x) \neq g(y)$. For a concrete example, the constant functions form a subalgebra of $C([0,1])$ that vanishes at no point but does not separate points, and its closure is just the constants — far from all of $C([0,1])$.
**The "vanishes at no point" condition cannot be dropped.** Consider $K = [0,1]$ and $\mathcal{A} = \{p \in C([0,1]) : p \text{ is a polynomial with } p(0) = 0\}$. This is a subalgebra (the product of two polynomials vanishing at $0$ again vanishes at $0$), and it separates points (the polynomial $p(x) = x$ does so). But $\overline{\mathcal{A}} = \{f \in C([0,1]) : f(0) = 0\} \neq C([0,1])$.
**Compactness of $K$ cannot be dropped.** On a non-compact, locally compact space like $\mathbb{R}$, the analogous statement is false in its naive form. The appropriate generalization uses the space $C_0(X)$ of continuous functions vanishing at infinity and requires the subalgebra to separate points and vanish at no point of $X$.
**The algebra structure is essential.** A subspace of $C(K)$ that separates points and contains constants but is not closed under multiplication need not be dense. For instance, the linear span of $\{1, x\}$ separates points on $[0,1]$ and contains constants, but its closure consists only of affine functions. The multiplicative closure — that is, the full polynomial algebra — is needed.
### The Complex Version and the Role of Conjugation
For complex-valued functions, the Stone--Weierstrass theorem requires an additional hypothesis.
[quotetheorem:1218]
The self-adjointness hypothesis is not an artifact of the proof — it is genuinely necessary. The subalgebra of $C(\overline{B}(0,1); \mathbb{C})$ consisting of functions that are holomorphic on the open unit disk $B(0,1) \subset \mathbb{C}$ and continuous on $\overline{B}(0,1)$ (the *disk algebra*) separates points, contains constants, but is not self-adjoint: the function $z \mapsto \bar{z}$ is not holomorphic. The disk algebra is a proper closed subalgebra of $C(\overline{B}(0,1); \mathbb{C})$ — it cannot approximate every continuous function, because uniform limits of holomorphic functions are holomorphic (by Morera's theorem), and not every continuous function on $\overline{B}(0,1)$ is holomorphic.
## Trigonometric Approximation
One of the most important applications of the Stone--Weierstrass theorem is the density of trigonometric polynomials in the space of continuous periodic functions. This result is the foundation of Fourier analysis on compact groups, and its proof via Stone--Weierstrass is far more transparent than direct arguments using Fejér kernels.
### Periodic Functions as Functions on the Circle
The difficulty in applying the Stone--Weierstrass theorem to periodic functions is that the domain $\mathbb{R}$ is not compact. The resolution is to recognize that a continuous function $f: \mathbb{R} \to \mathbb{R}$ with period $2\pi$ is equivalent to a continuous function on the compact space $\mathbb{T} := \mathbb{R}/(2\pi\mathbb{Z})$, the unit circle. The space $C(\mathbb{T}; \mathbb{R})$ is a Banach space under the supremum norm, and the Stone--Weierstrass theorem applies directly.
[definition: Trigonometric Polynomial]
A **trigonometric polynomial** of degree at most $n$ is a function $T: \mathbb{T} \to \mathbb{R}$ (equivalently, a $2\pi$-periodic function on $\mathbb{R}$) of the form
\begin{align*}
T(\theta) = a_0 + \sum_{k=1}^{n} \left(a_k \cos(k\theta) + b_k \sin(k\theta)\right),
\end{align*}
where $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n \in \mathbb{R}$.
[/definition]
[quotetheorem:1219]
This follows immediately from the Stone--Weierstrass theorem. The trigonometric polynomials form a subalgebra of $C(\mathbb{T}; \mathbb{R})$ (using the product-to-sum identities $\cos \alpha \cos \beta = \frac{1}{2}[\cos(\alpha - \beta) + \cos(\alpha + \beta)]$ and similar formulas, the product of two trigonometric polynomials is again a trigonometric polynomial). They contain the constants (take $a_0 = 1$, all other coefficients zero). And they separate points: if $\theta_1 \neq \theta_2$ in $\mathbb{T}$, then $\cos(\theta_1) \neq \cos(\theta_2)$ or $\sin(\theta_1) \neq \sin(\theta_2)$ (otherwise $e^{i\theta_1} = e^{i\theta_2}$, contradicting $\theta_1 \neq \theta_2$ in $\mathbb{T}$). All hypotheses of Stone--Weierstrass are satisfied.
This density result has a crucial consequence for [$L^p$ spaces](/page/L%5Ep%20Spaces): since uniform convergence on a space of finite measure implies $L^p$ convergence, trigonometric polynomials are also dense in $L^p(\mathbb{T})$ for all $1 \leq p < \infty$. This is the starting point for the $L^p$ theory of Fourier series.
## Failure on Non-Compact Domains
The Weierstrass theorem is genuinely a theorem about compact domains, and understanding *why* it fails for non-compact domains clarifies the role of compactness in the theory.
### Growth at Infinity
The most basic obstruction is growth: any nonconstant polynomial is unbounded on $\mathbb{R}$, while many continuous functions (such as bounded functions) are not.
[example: Failure on the Real Line]
Let $f: \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \sin(x)$. Suppose for contradiction that there exists a polynomial $p$ with
\begin{align*}
\sup_{x \in \mathbb{R}} |f(x) - p(x)| < \frac{1}{2}.
\end{align*}
Then $|p(x)| \leq |f(x)| + 1/2 \leq 3/2$ for all $x \in \mathbb{R}$. But a bounded polynomial on $\mathbb{R}$ must be constant (a polynomial of degree $\geq 1$ satisfies $|p(x)| \to \infty$ as $|x| \to \infty$). So $p$ is constant, say $p(x) = c$. Then $|\sin(x) - c| < 1/2$ for all $x$, which is impossible since $\sin$ takes values in $[-1, 1]$ and oscillates — no constant $c$ can stay within $1/2$ of both $\sin(0) = 0$ and $\sin(\pi/2) = 1$.
More generally, *no* bounded continuous function on $\mathbb{R}$ other than a constant can be uniformly approximated by polynomials on $\mathbb{R}$.
[/example]
### Failure on Half-Lines and the Role of Compactness
Even on $[0, \infty)$, where the domain is closed, polynomial approximation fails for bounded functions by the same growth argument. But there is a subtler failure: on non-compact domains, continuous functions can exhibit behavior at infinity that polynomials cannot replicate.
[example: Failure on a Half-Line]
Define $f: [0, \infty) \to \mathbb{R}$ by $f(x) = e^{-x}$. This function is continuous and bounded, with $f(x) \to 0$ as $x \to \infty$. If a polynomial $p$ of degree $d \geq 1$ satisfies $\|f - p\|_\infty < 1/2$ on $[0, \infty)$, then $|p(x)| < |f(x)| + 1/2 \leq 3/2$ for all $x \geq 0$. Since $p$ has degree $\geq 1$, $|p(x)| \to \infty$ as $x \to \infty$, contradicting the bound. If $p$ is a constant $c$, then $|e^{-x} - c| < 1/2$ for all $x \geq 0$ requires both $|1 - c| < 1/2$ (at $x = 0$) and $|c| < 1/2$ (as $x \to \infty$), giving $c > 1/2$ and $c < 1/2$ simultaneously — a contradiction.
Thus $e^{-x}$ cannot be uniformly approximated by polynomials on $[0, \infty)$, despite being infinitely smooth and even real-analytic.
[/example]
The essential point is that compactness serves a dual role in the Weierstrass theorem. First, it guarantees that continuous functions are bounded and uniformly continuous, which are needed for the approximation argument. Second, it prevents the "escape to infinity" that makes polynomial approximation impossible on unbounded domains. The Stone--Weierstrass theorem makes the role of compactness fully explicit: it is a theorem about $C(K)$ for *compact* $K$.
## The Muntz--Szasz Theorem
The Weierstrass theorem states that the *full* sequence of monomials $\{1, x, x^2, x^3, \ldots\}$ generates a dense subspace of $C([0,1])$. A natural refinement asks: which *subsequences* of monomials suffice? Can we omit some powers and still approximate every continuous function?
[quotetheorem:1220]
This theorem draws a sharp boundary through the space of monomial sequences. The classical case $\lambda_k = k$ satisfies $\sum 1/k = \infty$ (the harmonic series diverges), recovering the Weierstrass theorem. The even powers $\lambda_k = 2k$ also satisfy $\sum 1/(2k) = \infty$, so $\{1, x^2, x^4, \ldots\}$ spans a dense subspace of $C([0,1])$ — this is geometrically reasonable, since every continuous function on $[0,1]$ can be expressed as a function of $x^2$ after reparametrization.
On the other hand, the sequence $\lambda_k = k^2$ gives $\sum 1/k^2 = \pi^2/6 < \infty$, so the span of $\{1, x, x^4, x^9, x^{16}, \ldots\}$ is *not* dense in $C([0,1])$. There exist continuous functions on $[0,1]$ that cannot be uniformly approximated by linear combinations of $\{1, x^{k^2}\}$. The theorem does not tell us explicitly what these functions look like, but their existence is guaranteed.
[example: A Lacunary Sequence]
Consider the sequence $\lambda_k = 2^k$, so we ask whether $\operatorname{span}\{1, x^2, x^4, x^8, x^{16}, \ldots\}$ is dense in $C([0,1])$. The divergence condition requires
\begin{align*}
\sum_{k=1}^{\infty} \frac{1}{2^k} = 1 < \infty.
\end{align*}
Since the series converges, the Muntz--Szasz theorem asserts that the span is *not* dense. Despite having infinitely many monomials, the powers grow so fast that the sequence is too sparse to capture all continuous functions. In contrast, the sequence $\lambda_k = k \log(k+1)$ for $k \geq 1$ satisfies
\begin{align*}
\sum_{k=1}^{\infty} \frac{1}{k \log(k+1)} = \infty
\end{align*}
(by comparison with the integral $\int_1^\infty \frac{dx}{x \log(x+1)}$, which diverges), so the span of $\{1, x^{\lambda_k}\}$ is dense despite the exponents growing faster than linearly.
[/example]
## Standard Techniques in Approximation Theory
Working with the Weierstrass theorem and its generalizations requires a toolkit of standard arguments. The following techniques appear repeatedly in the theory and its applications.
### Reduction to $[0,1]$ by Affine Change of Variable
Many statements of the Weierstrass theorem are given for $[0,1]$, but the result holds on any closed bounded interval $[a,b]$. The passage between the two is a standard reduction. Given $f \in C([a,b])$, define $g \in C([0,1])$ by
\begin{align*}
g(t) := f\!\left(a + (b-a)t\right).
\end{align*}
If $p_n$ is a sequence of polynomials with $\|g - p_n\|_\infty \to 0$ on $[0,1]$, then $q_n(x) := p_n\!\left(\frac{x-a}{b-a}\right)$ is a sequence of polynomials with $\|f - q_n\|_\infty \to 0$ on $[a,b]$. The composition of a polynomial with an affine map is again a polynomial, so this reduction is legitimate.
### Approximation on Compact Subsets of $\mathbb{R}^n$
The multivariate Weierstrass theorem — that polynomials in $n$ variables are dense in $C(K)$ for any compact $K \subset \mathbb{R}^n$ — follows from the Stone--Weierstrass theorem. The algebra of polynomials in $n$ variables separates points (the coordinate functions $x_i$ do so), contains constants, and is closed under multiplication. Every hypothesis of Stone--Weierstrass is satisfied with $\mathcal{A}$ equal to the polynomial algebra on $K$.
### The Lattice Version: Approximation via Max and Min
A useful variant of the Stone--Weierstrass theorem replaces the algebra condition with a *lattice* condition.
[quotetheorem:1221]
This version is sometimes easier to apply than the algebra version because checking the lattice property (closure under $\max$ and $\min$) can be simpler than checking closure under multiplication. It is the key tool in proving that piecewise linear functions are dense in $C([a,b])$: the set of continuous piecewise linear functions is a lattice (the max and min of two piecewise linear functions are piecewise linear), and it separates points in the strong sense above.
### Extending from Real to Complex via Self-Adjointness
When working with complex-valued functions, the standard technique for applying Stone--Weierstrass is to verify self-adjointness explicitly. Given a subalgebra $\mathcal{A} \subset C(K; \mathbb{C})$, one checks that $\overline{f} \in \mathcal{A}$ whenever $f \in \mathcal{A}$. For the trigonometric polynomials $\sum c_k e^{ik\theta}$ on the circle, self-adjointness holds because $\overline{e^{ik\theta}} = e^{-ik\theta}$, which is again in the algebra. For the disk algebra (holomorphic functions on the closed disk), self-adjointness fails, and this failure is the precise reason the algebra is not dense.
An equivalent approach is to decompose $f = \operatorname{Re}(f) + i \operatorname{Im}(f)$ and apply the real Stone--Weierstrass theorem to each part separately, using the fact that $\operatorname{Re}(f) = (f + \bar{f})/2$ and $\operatorname{Im}(f) = (f - \bar{f})/(2i)$ belong to the self-adjoint part of $\mathcal{A}$.
## References
- Weierstrass, K., "Uber die analytische Darstellbarkeit sogenannter willkurlicher Functionen einer reellen Veranderlichen," *Sitzungsberichte der Koniglich Preussischen Akademie der Wissenschaften zu Berlin* (1885).
- Bernstein, S. N., "Demonstration du theoreme de Weierstrass fondee sur le calcul des probabilites," *Communications de la Societe Mathematique de Kharkow* (1912).
- Stone, M. H., "The Generalized Weierstrass Approximation Theorem," *Mathematics Magazine* (1948).
- Muntz, Ch. H., "Uber den Approximationssatz von Weierstrass," *Mathematische Abhandlungen H. A. Schwarz gewidmet* (1914).
- Rudin, W., *Principles of Mathematical Analysis*, 3rd ed. (1976).
- Rudin, W., *Real and Complex Analysis*, 3rd ed. (1987).
- Cheney, E. W., *Introduction to Approximation Theory*, 2nd ed. (1982).