Weierstrass Approximation Theorem — Statement & Proof

Weierstrass Approximation Theorem (Theorem # 480)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We first reduce the theorem from an arbitrary finite closed interval $[a,b]$ to the interval $[0,1]$ by an affine change of variables, treating the degenerate case $a=b$ separately. On $[0,1]$, we approximate the transported function by its Bernstein polynomials. The Bernstein error is written as a weighted average, split into indices near $t$ and far from $t$, with [uniform continuity](/page/Uniform%20Continuity) controlling the near part and the binomial variance identity controlling the far part uniformly in $t$. [/proofplan] [step:Dispose of the degenerate interval case] If $a=b$, define the polynomial \begin{align*} p: \mathbb{R} &\to \mathbb{R} \\ x &\mapsto f(a). \end{align*} Then for the only point $x \in [a,b]$, namely $x=a$, we have $|f(x)-p(x)|=0 \leq \varepsilon$. Hence the theorem holds when $a=b$. [/step] [step:Transport the problem to the unit interval] Assume now that $a<b$. Define the affine map \begin{align*} A: [0,1] &\to [a,b] \\ t &\mapsto a+(b-a)t, \end{align*} and define \begin{align*} g: [0,1] &\to \mathbb{R} \\ t &\mapsto f(A(t)). \end{align*} Since $A$ is [continuous](/page/Continuity) and $f \in C[a,b]$, the composition $g=f\circ A$ belongs to $C[0,1]$. It is enough to prove that for every $\varepsilon>0$ there is a polynomial $q:\mathbb{R}\to\mathbb{R}$ such that $|g(t)-q(t)|\leq\varepsilon$ for all $t\in[0,1]$, because then \begin{align*} p: \mathbb{R} &\to \mathbb{R} \\ x &\mapsto q\left(\frac{x-a}{b-a}\right) \end{align*} is a polynomial and, for every $x\in[a,b]$, the point $t=(x-a)/(b-a)$ lies in $[0,1]$, so \begin{align*} |f(x)-p(x)|=|g(t)-q(t)|\leq \varepsilon. \end{align*} [/step] [step:Define the Bernstein polynomial and express its error as a weighted average] Fix $\varepsilon>0$. For each $n\in\mathbb{N}$, define the polynomial \begin{align*} B_n: \mathbb{R} &\to \mathbb{R} \\ t &\mapsto \sum_{k=0}^{n} g\left(\frac{k}{n}\right)\binom{n}{k}t^k(1-t)^{n-k}. \end{align*} For $t\in[0,1]$, define the Bernstein weight \begin{align*} w_{n,k}(t):=\binom{n}{k}t^k(1-t)^{n-k} \end{align*} for each integer $k$ with $0\leq k\leq n$. By the [binomial theorem](/theorems/750), \begin{align*} \sum_{k=0}^{n} w_{n,k}(t) = \sum_{k=0}^{n}\binom{n}{k}t^k(1-t)^{n-k} = (t+(1-t))^n = 1. \end{align*} Therefore, for every $t\in[0,1]$, \begin{align*} B_n(t)-g(t) &= \sum_{k=0}^{n}\left[g\left(\frac{k}{n}\right)-g(t)\right]w_{n,k}(t). \end{align*} [/step] [step:Estimate the far weights by the binomial variance identity] For every $n\in\mathbb{N}$ and every $t\in[0,1]$, \begin{align*} \sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t) = \frac{t(1-t)}{n} \leq \frac{1}{4n}. \end{align*} Indeed, differentiating the identity $(s+1)^n=\sum_{k=0}^{n}\binom{n}{k}s^k$ after multiplying by suitable powers of $t$ and $1-t$ gives \begin{align*} \sum_{k=0}^{n}k\,w_{n,k}(t)&=nt,\\ \sum_{k=0}^{n}k(k-1)\,w_{n,k}(t)&=n(n-1)t^2. \end{align*} Hence \begin{align*} \sum_{k=0}^{n}k^2w_{n,k}(t) &= n(n-1)t^2+nt, \end{align*} and subtracting the square of the first moment gives \begin{align*} \sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t) &= \frac{1}{n^2}\sum_{k=0}^{n}k^2w_{n,k}(t)-2t\frac{1}{n}\sum_{k=0}^{n}k w_{n,k}(t)+t^2\sum_{k=0}^{n}w_{n,k}(t)\\ &= \frac{t(1-t)}{n}. \end{align*} Since $0\leq t\leq 1$, we have $t(1-t)\leq 1/4$. [guided] The purpose of this step is to quantify how much Bernstein weight can sit far from the point $t$. The weights $w_{n,k}(t)$ behave like the probability weights of a binomial random variable divided by $n$, but we only need the elementary finite-sum identities. For each $n\in\mathbb{N}$, each $t\in[0,1]$, and each integer $k$ with $0\leq k\leq n$, the weight is \begin{align*} w_{n,k}(t)=\binom{n}{k}t^k(1-t)^{n-k}. \end{align*} The binomial theorem gives \begin{align*} \sum_{k=0}^{n}w_{n,k}(t)=1. \end{align*} To compute the second central moment, we use the standard binomial derivative identities. First, \begin{align*} \sum_{k=0}^{n}k\,w_{n,k}(t)=nt. \end{align*} Second, \begin{align*} \sum_{k=0}^{n}k(k-1)\,w_{n,k}(t)=n(n-1)t^2. \end{align*} Adding the first identity to the second gives the second moment: \begin{align*} \sum_{k=0}^{n}k^2w_{n,k}(t) = n(n-1)t^2+nt. \end{align*} Now expand the square in the central moment: \begin{align*} \sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t) &= \frac{1}{n^2}\sum_{k=0}^{n}k^2w_{n,k}(t)-2t\frac{1}{n}\sum_{k=0}^{n}k w_{n,k}(t)+t^2\sum_{k=0}^{n}w_{n,k}(t)\\ &= \frac{n(n-1)t^2+nt}{n^2}-2t^2+t^2\\ &= \frac{t(1-t)}{n}. \end{align*} Finally, the elementary inequality $t(1-t)\leq 1/4$ for $t\in[0,1]$ gives \begin{align*} \sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t) \leq \frac{1}{4n}. \end{align*} This is the variance estimate that will force the far part of the Bernstein average to vanish uniformly in $t$. [/guided] [/step] [step:Split the Bernstein error into near and far indices] Since $g\in C[0,1]$ and $[0,1]$ is compact, the [Heine-Cantor Theorem](/theorems/280) implies that $g$ is uniformly continuous. Apply uniform continuity with tolerance $\varepsilon/2$: there exists $\delta>0$ such that for all $s,t\in[0,1]$, \begin{align*} |s-t|<\delta \quad\Longrightarrow\quad |g(s)-g(t)|<\frac{\varepsilon}{2}. \end{align*} Define \begin{align*} M:=\max_{t\in[0,1]}|g(t)|. \end{align*} The maximum exists by the [Extreme Value Theorem for compact spaces](/theorems/304), because $g$ is continuous on the [compact](/page/Compact%20Space) interval $[0,1]$. For fixed $n\in\mathbb{N}$ and $t\in[0,1]$, split the indices into the near set \begin{align*} N_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|<\delta\right\} \end{align*} and the far set \begin{align*} F_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|\geq\delta\right\}. \end{align*} Using the weighted-average formula for the error, the triangle inequality, the near estimate, and $|g(k/n)-g(t)|\leq 2M$, we obtain \begin{align*} |B_n(t)-g(t)| &\leq \sum_{k=0}^{n}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t)\\ &= \sum_{k\in N_{n,t}}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t) + \sum_{k\in F_{n,t}}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t)\\ &\leq \frac{\varepsilon}{2}\sum_{k\in N_{n,t}}w_{n,k}(t) + 2M\sum_{k\in F_{n,t}}w_{n,k}(t)\\ &\leq \frac{\varepsilon}{2} + 2M\sum_{k\in F_{n,t}}w_{n,k}(t). \end{align*} For $k\in F_{n,t}$, we have $\delta^2\leq (k/n-t)^2$, so \begin{align*} \sum_{k\in F_{n,t}}w_{n,k}(t) &\leq \frac{1}{\delta^2}\sum_{k\in F_{n,t}}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)\\ &\leq \frac{1}{\delta^2}\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)\\ &\leq \frac{1}{4n\delta^2}. \end{align*} Therefore, for every $t\in[0,1]$, \begin{align*} |B_n(t)-g(t)| \leq \frac{\varepsilon}{2}+\frac{M}{2n\delta^2}. \end{align*} [guided] We now turn the variance estimate into an approximation estimate. The obstacle is that uniform continuity controls $|g(k/n)-g(t)|$ only when $k/n$ is close to $t$. Therefore we split the finite sum into indices close to $t$ and indices far from $t$. Because $g\in C[0,1]$ and $[0,1]$ is compact, the Heine-[Cantor Theorem](/theorems/759) gives uniform continuity. Applying it with tolerance $\varepsilon/2$, choose $\delta>0$ such that for all $s,t\in[0,1]$, \begin{align*} |s-t|<\delta \quad\Longrightarrow\quad |g(s)-g(t)|<\frac{\varepsilon}{2}. \end{align*} Also define \begin{align*} M:=\max_{t\in[0,1]}|g(t)|. \end{align*} This maximum exists by the [Extreme Value Theorem for compact spaces](/theorems/304), since $g$ is continuous on the compact interval $[0,1]$. Fix $n\in\mathbb{N}$ and $t\in[0,1]$. Define the near indices by \begin{align*} N_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|<\delta\right\}, \end{align*} and define the far indices by \begin{align*} F_{n,t}:=\left\{k\in\{0,\dots,n\}:\left|\frac{k}{n}-t\right|\geq\delta\right\}. \end{align*} The Bernstein error is \begin{align*} B_n(t)-g(t) = \sum_{k=0}^{n}\left[g\left(\frac{k}{n}\right)-g(t)\right]w_{n,k}(t). \end{align*} Taking absolute values and applying the triangle inequality gives \begin{align*} |B_n(t)-g(t)| &\leq \sum_{k=0}^{n}\left|g\left(\frac{k}{n}\right)-g(t)\right|w_{n,k}(t). \end{align*} On the near set $N_{n,t}$, the definition of $\delta$ gives \begin{align*} \left|g\left(\frac{k}{n}\right)-g(t)\right|<\frac{\varepsilon}{2}. \end{align*} On the far set $F_{n,t}$, we use the uniform bound $|g|\leq M$, which gives \begin{align*} \left|g\left(\frac{k}{n}\right)-g(t)\right| \leq \left|g\left(\frac{k}{n}\right)\right|+|g(t)| \leq 2M. \end{align*} Thus \begin{align*} |B_n(t)-g(t)| &\leq \frac{\varepsilon}{2}\sum_{k\in N_{n,t}}w_{n,k}(t) + 2M\sum_{k\in F_{n,t}}w_{n,k}(t)\\ &\leq \frac{\varepsilon}{2} + 2M\sum_{k\in F_{n,t}}w_{n,k}(t), \end{align*} because the weights are non-negative and their total sum is $1$. It remains to bound the far weight. If $k\in F_{n,t}$, then \begin{align*} \delta^2\leq \left(\frac{k}{n}-t\right)^2. \end{align*} Multiplying by the non-negative weight $w_{n,k}(t)$ and summing over $F_{n,t}$ gives \begin{align*} \sum_{k\in F_{n,t}}w_{n,k}(t) &\leq \frac{1}{\delta^2}\sum_{k\in F_{n,t}}\left(\frac{k}{n}-t\right)^2w_{n,k}(t)\\ &\leq \frac{1}{\delta^2}\sum_{k=0}^{n}\left(\frac{k}{n}-t\right)^2w_{n,k}(t). \end{align*} The variance estimate from the previous step gives \begin{align*} \sum_{k\in F_{n,t}}w_{n,k}(t) \leq \frac{1}{4n\delta^2}. \end{align*} Substituting this into the error estimate yields \begin{align*} |B_n(t)-g(t)| \leq \frac{\varepsilon}{2}+\frac{M}{2n\delta^2}. \end{align*} This estimate is uniform in $t\in[0,1]$, which is the essential point needed for uniform approximation. [/guided] [/step] [step:Choose a Bernstein polynomial that achieves the required accuracy] Choose $n\in\mathbb{N}$ large enough that \begin{align*} \frac{M}{2n\delta^2}\leq \frac{\varepsilon}{2}. \end{align*} Then the previous estimate gives, for every $t\in[0,1]$, \begin{align*} |B_n(t)-g(t)|\leq \varepsilon. \end{align*} Thus $q:=B_n$ is a polynomial satisfying $|g(t)-q(t)|\leq\varepsilon$ for all $t\in[0,1]$. By the affine reduction step, the polynomial \begin{align*} p: \mathbb{R} &\to \mathbb{R} \\ x &\mapsto q\left(\frac{x-a}{b-a}\right) \end{align*} satisfies $|f(x)-p(x)|\leq\varepsilon$ for all $x\in[a,b]$. This proves the theorem. [/step]

Prerequisites (0/4 completed)

Prerequisites Graph

Interactive dependency map showing how this theorem builds on foundational concepts

Loading dependency graph...

Theorems

Extreme Value Theorem for Compact Spaces

Definitions & Concepts

Explore Further

Continuity Definition Uniform Continuity Definition Compact Space Definition Extreme Value Theorem for Compact Spaces Theorem #304 Dual of a Quotient Space Analysis Riesz Representation Theorem Functional Analysis Equivalent Characterisations of Compactness in Metric Spaces Topology Termwise Differentiation of Power Series Sequences and Series of Functions Dense Subset Criterion for Weak Convergence Analysis Hilbert Inversion Formula On [-1,1] Complex Analysis BV Embedding Into L Star Geometric Measure Theory Kolmogorov Zero One Law Measure Theory Analysis Area

What brings you to Androma?

Start with a route through the knowledge graph.

Weierstrass Approximation Theorem (Theorem # 480)

Discussion

Proof

Prerequisites (0/4 completed)

Prerequisites Graph

Explore Further

Sign in to Androma

Check your inbox

One last step

Weierstrass Approximation Theorem (Theorem # 480)

Discussion

Proof

Prerequisites (0/4 completed)

Prerequisites Graph

Explore Further