[proofplan]
We show that the uniform closure $\overline{\mathcal{A}}$ is a closed subalgebra of $C(K; \mathbb{R})$ that is also a lattice (closed under $\max$ and $\min$). The lattice property follows from approximating the absolute value $|t|$ uniformly on $[-M, M]$ by polynomials (via the [Weierstrass Approximation Theorem](/theorems/480)) and composing with elements of $\overline{\mathcal{A}}$. Once the lattice structure is established, we use it to approximate any $f \in C(K; \mathbb{R})$ from below and above by elements of $\overline{\mathcal{A}}$: for each pair of points, the separation and non-vanishing hypotheses produce an element matching $f$ at those points, and the lattice operations assemble these into a global approximation.
[/proofplan]
[step:Show that $\overline{\mathcal{A}}$ is a closed subalgebra that contains $|g|$ for every $g \in \overline{\mathcal{A}}$]
Since $\mathcal{A}$ is a subalgebra of $C(K; \mathbb{R})$, its uniform closure $\overline{\mathcal{A}}$ is also a subalgebra: if $g_m \to g$ and $h_m \to h$ uniformly with $g_m, h_m \in \mathcal{A}$, then $g_m + h_m \to g + h$, $\alpha g_m \to \alpha g$, and $g_m h_m \to g h$ uniformly (the last using $\|g_m h_m - gh\|_\infty \leq \|g_m\|_\infty \|h_m - h\|_\infty + \|h\|_\infty \|g_m - g\|_\infty$, where $\|g_m\|_\infty$ is bounded since convergent sequences are bounded). Moreover, $\overline{\mathcal{A}}$ inherits the separation and non-vanishing properties from $\mathcal{A}$, since $\mathcal{A} \subset \overline{\mathcal{A}}$.
We now show $g \in \overline{\mathcal{A}}$ implies $|g| \in \overline{\mathcal{A}}$. Let $M := \|g\|_\infty$. If $M = 0$, then $g = 0$ and $|g| = 0 \in \overline{\mathcal{A}}$. Assume $M > 0$. By the [Weierstrass Approximation Theorem](/theorems/480) applied to the continuous function
\begin{align*}
\varphi: [-1, 1] &\to \mathbb{R} \\
s &\mapsto |s|,
\end{align*}
for each $\varepsilon > 0$ there exists a polynomial $p_\varepsilon$ with $\|p_\varepsilon - \varphi\|_{C([-1,1])} < \varepsilon$. Define $q_\varepsilon := p_\varepsilon - p_\varepsilon(0)$, so that $q_\varepsilon(0) = 0$. Since $\varphi(0) = 0$, we have $|p_\varepsilon(0)| = |p_\varepsilon(0) - \varphi(0)| < \varepsilon$, and therefore $\|q_\varepsilon - \varphi\|_{C([-1,1])} \leq \|p_\varepsilon - \varphi\|_{C([-1,1])} + |p_\varepsilon(0)| < 2\varepsilon$. Since $q_\varepsilon(0) = 0$, the polynomial $q_\varepsilon$ has no constant term: $q_\varepsilon(s) = c_1 s + c_2 s^2 + \cdots + c_d s^d$.
Now consider the function $g/M \in \overline{\mathcal{A}}$ (since $\overline{\mathcal{A}}$ is closed under scalar multiplication), which maps $K$ into $[-1, 1]$. Since $\overline{\mathcal{A}}$ is a subalgebra, the powers $(g/M)^j \in \overline{\mathcal{A}}$ for all $j \geq 1$, and the linear combination
\begin{align*}
q_\varepsilon(g/M) = c_1 (g/M) + c_2 (g/M)^2 + \cdots + c_d (g/M)^d \in \overline{\mathcal{A}}.
\end{align*}
For every $x \in K$, we have $g(x)/M \in [-1,1]$, so
\begin{align*}
|q_\varepsilon(g(x)/M) - |g(x)/M|| = |q_\varepsilon(g(x)/M) - \varphi(g(x)/M)| < 2\varepsilon.
\end{align*}
Therefore $\|M \cdot q_\varepsilon(g/M) - |g|\|_\infty < 2M\varepsilon$. Since $M \cdot q_\varepsilon(g/M) \in \overline{\mathcal{A}}$ and $\varepsilon > 0$ is arbitrary, we conclude $|g| \in \overline{\mathcal{A}}$.
[guided]
The central idea is that the absolute value function can be uniformly approximated by polynomials on any bounded interval, and substituting an algebra element into a polynomial produces another algebra element.
Why do we subtract $p_\varepsilon(0)$? Because we need the approximating polynomial to have no constant term. If $q_\varepsilon$ had a constant term $c_0$, then $q_\varepsilon(g/M) = c_0 + c_1(g/M) + \cdots$, and we would need the constant function $c_0$ to belong to $\overline{\mathcal{A}}$. While the non-vanishing hypothesis will let us produce constants later, at this stage we avoid the issue entirely by using a polynomial with $q_\varepsilon(0) = 0$.
The cost of subtracting $p_\varepsilon(0)$ is a factor of $2$ in the approximation error: $\|q_\varepsilon - \varphi\| \leq \|p_\varepsilon - \varphi\| + |p_\varepsilon(0)| < \varepsilon + \varepsilon = 2\varepsilon$. This is harmless since $\varepsilon$ is arbitrary.
[/guided]
[/step]
[step:Deduce that $\overline{\mathcal{A}}$ is a lattice]
For any $g, h \in \overline{\mathcal{A}}$, the identities
\begin{align*}
\max(g, h) &= \frac{g + h + |g - h|}{2}, \\
\min(g, h) &= \frac{g + h - |g - h|}{2}
\end{align*}
express $\max$ and $\min$ in terms of addition, scalar multiplication, and the absolute value. Since $g - h \in \overline{\mathcal{A}}$ (subalgebra) and $|g - h| \in \overline{\mathcal{A}}$ (previous step), both $\max(g, h)$ and $\min(g, h)$ belong to $\overline{\mathcal{A}}$.
[/step]
[step:For each pair of points, construct an element of $\overline{\mathcal{A}}$ matching $f$ at both]
Let $f \in C(K; \mathbb{R})$ and let $x, y \in K$ with $x \neq y$. Since $\mathcal{A}$ separates points, there exists $\psi \in \mathcal{A}$ with $\psi(x) \neq \psi(y)$. Since $\mathcal{A}$ vanishes at no point, there exist $\varphi_1, \varphi_2 \in \mathcal{A}$ with $\varphi_1(x) \neq 0$ and $\varphi_2(y) \neq 0$. Then $\varphi_1^2 + \varphi_2^2 \in \mathcal{A}$ is strictly positive at both $x$ and $y$, and the three functions $\varphi_1^2 + \varphi_2^2$, $\psi$, and $1$ (which we do not assume belongs to $\mathcal{A}$) can be combined to produce an element matching $f$ at both points.
Concretely, consider the three functions $u_1 := \psi^2$, $u_2 := \psi$, and $u_3 := \varphi_1^2 + \varphi_2^2$, all in $\mathcal{A}$. The $2 \times 3$ system
\begin{align*}
\alpha \, u_1(x) + \beta \, u_2(x) + \gamma \, u_3(x) &= f(x), \\
\alpha \, u_1(y) + \beta \, u_2(y) + \gamma \, u_3(y) &= f(y)
\end{align*}
has a solution $(\alpha, \beta, \gamma) \in \mathbb{R}^3$ because the matrix
\begin{align*}
\begin{pmatrix} u_1(x) & u_2(x) & u_3(x) \\ u_1(y) & u_2(y) & u_3(y) \end{pmatrix}
\end{align*}
has rank $2$: the columns corresponding to $u_2$ and $u_3$ give a $2 \times 2$ submatrix with determinant $\psi(x)(\varphi_1(y)^2 + \varphi_2(y)^2) - \psi(y)(\varphi_1(x)^2 + \varphi_2(x)^2)$. If this determinant is zero, replace $u_3$ with $u_1 = \psi^2$ and use the pair $(u_1, u_2)$: the $2 \times 2$ determinant is $\psi(x)^2 \psi(y) - \psi(y)^2 \psi(x) = \psi(x)\psi(y)(\psi(x) - \psi(y)) \neq 0$ whenever $\psi(x)\psi(y) \neq 0$. If $\psi(x) = 0$ (say), replace $\psi$ with $\psi + c\varphi_1^2$ for a small $c > 0$; the separation property $\psi(x) \neq \psi(y)$ is preserved for $c$ small enough, and the new function is non-zero at $x$.
In any case, we obtain $g_{xy} := \alpha u_1 + \beta u_2 + \gamma u_3 \in \mathcal{A}$ with $g_{xy}(x) = f(x)$ and $g_{xy}(y) = f(y)$.
[guided]
The goal is to find, for every pair $(x,y)$ with $x \neq y$, a function $g_{xy} \in \mathcal{A}$ that interpolates $f$ at both points. The hypotheses of separation and non-vanishing are both needed.
**Why separation alone is not enough:** If $\mathcal{A}$ separates $x$ and $y$ but every function in $\mathcal{A}$ vanishes at $x$, then no element of $\mathcal{A}$ can match a value $f(x) \neq 0$.
**Why non-vanishing alone is not enough:** If $\mathcal{A}$ does not separate $x$ and $y$, every function in $\mathcal{A}$ takes the same value at both points, so we cannot independently prescribe $f(x) \neq f(y)$.
The construction works by choosing enough linearly independent elements of $\mathcal{A}$ evaluated at $x$ and $y$ to span $\mathbb{R}^2$. The separation hypothesis provides a function $\psi$ with $\psi(x) \neq \psi(y)$, giving one independent direction. The non-vanishing hypothesis provides functions non-zero at $x$ and $y$, giving the second independent direction.
Once the $2 \times 3$ system has rank $2$, the underdetermined linear system $\alpha u_1(x_0) + \beta u_2(x_0) + \gamma u_3(x_0) = f(x_0)$ for $x_0 \in \{x, y\}$ has (infinitely many) solutions.
[/guided]
[/step]
[step:For fixed $x$, use $\min$ over finitely many $y$-values to approximate $f$ from above near $x$]
Fix $x \in K$ and $\varepsilon > 0$. For each $y \in K$, the function $g_{xy} \in \mathcal{A} \subset \overline{\mathcal{A}}$ satisfies $g_{xy}(x) = f(x)$ and $g_{xy}(y) = f(y)$. Since $g_{xy} - f$ is continuous and vanishes at $y$, the set
\begin{align*}
V_y := \{z \in K : g_{xy}(z) < f(z) + \varepsilon\}
\end{align*}
is open (as the preimage of $(-\infty, \varepsilon)$ under the continuous function $g_{xy} - f$) and contains $y$. As $y$ ranges over $K$, the sets $\{V_y\}_{y \in K}$ form an open cover of $K$. Since $K$ is compact, there exist finitely many points $y_1, \ldots, y_m \in K$ with $K = V_{y_1} \cup \cdots \cup V_{y_m}$.
Define
\begin{align*}
h_x := \min(g_{x y_1}, g_{x y_2}, \ldots, g_{x y_m}) \in \overline{\mathcal{A}},
\end{align*}
using the lattice property of $\overline{\mathcal{A}}$ established above (applying $\min$ iteratively to finitely many elements). Then:
- $h_x(x) = f(x)$, since $g_{xy_j}(x) = f(x)$ for all $j$ and $\min$ of equal values is that value.
- $h_x(z) < f(z) + \varepsilon$ for all $z \in K$, since every $z$ belongs to some $V_{y_j}$, where $g_{xy_j}(z) < f(z) + \varepsilon$, and $h_x \leq g_{xy_j}$.
[/step]
[step:Use $\max$ over finitely many $x$-values to approximate $f$ globally within $\varepsilon$]
For each $x \in K$, the function $h_x \in \overline{\mathcal{A}}$ satisfies $h_x(x) = f(x)$ and $h_x < f + \varepsilon$ on $K$. Since $h_x - f$ is continuous and vanishes at $x$, the set
\begin{align*}
U_x := \{z \in K : h_x(z) > f(z) - \varepsilon\}
\end{align*}
is open and contains $x$. The sets $\{U_x\}_{x \in K}$ cover $K$, so by compactness there exist $x_1, \ldots, x_\ell \in K$ with $K = U_{x_1} \cup \cdots \cup U_{x_\ell}$.
Define
\begin{align*}
h := \max(h_{x_1}, h_{x_2}, \ldots, h_{x_\ell}) \in \overline{\mathcal{A}}.
\end{align*}
Then for every $z \in K$:
- **Lower bound:** There exists $i$ with $z \in U_{x_i}$, so $h(z) \geq h_{x_i}(z) > f(z) - \varepsilon$.
- **Upper bound:** For each $i$, $h_{x_i}(z) < f(z) + \varepsilon$ (from the previous step), so $h(z) = \max_i h_{x_i}(z) < f(z) + \varepsilon$.
Combining: $|h(z) - f(z)| < \varepsilon$ for all $z \in K$, so $\|h - f\|_\infty < \varepsilon$.
Since $h \in \overline{\mathcal{A}}$ and $\varepsilon > 0$ was arbitrary, $f \in \overline{\mathcal{A}}$. As $f \in C(K; \mathbb{R})$ was arbitrary, $\overline{\mathcal{A}} = C(K; \mathbb{R})$.
[guided]
This is the "lattice approximation" argument, which reduces density to two applications of compactness. The structure is:
1. **Fix $x$, vary $y$:** For each $y$, the function $g_{xy}$ agrees with $f$ at both $x$ and $y$. Taking the minimum over finitely many $y$-values (chosen by compactness) produces $h_x$ that agrees with $f$ at $x$ and is at most $f + \varepsilon$ everywhere. The minimum pulls the function down wherever any $g_{xy_j}$ is close to $f$.
2. **Vary $x$:** Each $h_x$ satisfies $h_x(x) = f(x)$, so $h_x$ is close to $f$ near $x$. Taking the maximum over finitely many $x$-values (chosen by compactness) produces $h$ that is at least $f - \varepsilon$ everywhere. The maximum pushes the function up wherever some $h_{x_i}$ is close to $f$.
The interplay of $\min$ (to get the upper bound) and $\max$ (to get the lower bound) sandwiches $h$ within $\varepsilon$ of $f$ globally.
[/guided]
[/step]