Let $K \subset \mathbb{R}^p$ be a compact set, and let $\psi: \mathbb{R} \to \mathbb{R}$ be a continuous activation function that is not a polynomial. For any continuous function $f: K \to \mathbb{R}$ and any $\varepsilon > 0$, there exist $m \in \mathbb{N}$, weight matrices $\beta^{(1)} \in \mathbb{R}^{m \times p}$, $\beta^{(2)} \in \mathbb{R}^{1 \times m}$, and bias vectors $\mu^{(1)} \in \mathbb{R}^m$, $\mu^{(2)} \in \mathbb{R}$ such that the two-layer network
\begin{align*}
h(x) = \beta^{(2)} \psi(\beta^{(1)} x + \mu^{(1)}) + \mu^{(2)}
\end{align*}
satisfies $\sup_{x \in K} |h(x) - f(x)| < \varepsilon$, where $\psi$ is applied componentwise to the vector $\beta^{(1)} x + \mu^{(1)} \in \mathbb{R}^m$.
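
The statement is an existence result and does not say how to find the parameters. As a minimal sketch of one special case, assume $p = 1$, $K = [a, b]$, and $\psi(z) = \max(z, 0)$ (ReLU, which is continuous and not a polynomial). Taking every entry of $\beta^{(1)}$ equal to $1$, $\mu^{(1)}_i = -t_i$ for equally spaced knots $t_i$, $\beta^{(2)}$ equal to the successive slope changes, and $\mu^{(2)} = f(a)$ makes $h$ the piecewise linear interpolant of $f$, whose sup error shrinks as $m$ grows. The function names `build_relu_network`, `evaluate`, and `sup_error` are hypothetical, chosen for this illustration, and the construction is not the general proof.

```python
import math


def relu(z):
    """ReLU activation: continuous and not a polynomial."""
    return max(z, 0.0)


def build_relu_network(f, a, b, m):
    """Return (beta1, mu1, beta2, mu2) so that
    h(x) = sum_i beta2[i] * relu(beta1[i] * x + mu1[i]) + mu2
    is the piecewise linear interpolant of f on m equal subintervals of [a, b].
    Assumption: this explicit choice is specific to p = 1 and ReLU."""
    knots = [a + (b - a) * i / m for i in range(m + 1)]
    values = [f(t) for t in knots]
    # Slope of the interpolant on each subinterval [knots[i], knots[i+1]].
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(m)]
    beta1 = [1.0] * m                      # hidden weights
    mu1 = [-knots[i] for i in range(m)]    # hidden biases: unit i switches on at knots[i]
    # Output weights: first slope, then the change in slope at each interior knot.
    beta2 = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, m)]
    mu2 = values[0]                        # output bias: f(a)
    return beta1, mu1, beta2, mu2


def evaluate(params, x):
    """Evaluate h(x) for scalar x."""
    beta1, mu1, beta2, mu2 = params
    return sum(c * relu(w * x + b) for w, b, c in zip(beta1, mu1, beta2)) + mu2


def sup_error(f, params, a, b, grid=2001):
    """Estimate sup_{x in [a, b]} |h(x) - f(x)| on a finite grid
    (a lower bound on the true supremum)."""
    xs = [a + (b - a) * i / (grid - 1) for i in range(grid)]
    return max(abs(evaluate(params, x) - f(x)) for x in xs)


if __name__ == "__main__":
    target = lambda x: math.sin(2.0 * math.pi * x)
    for m in (4, 16, 64):
        params = build_relu_network(target, 0.0, 1.0, m)
        print(m, sup_error(target, params, 0.0, 1.0))
```

For a twice continuously differentiable $f$, the interpolation error of this construction is at most $\frac{(b-a)^2}{8 m^2} \sup_K |f''|$, so any $\varepsilon > 0$ can be met by choosing $m$ large enough. For a merely continuous $f$, the same construction still converges by uniform continuity on the compact interval, only without an explicit rate.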