[proofplan]
The case $E=0$ is immediate because then $f=p$. For $E>0$, the active set is nonempty by compactness, and the signed active evaluation functionals describe exactly the first-order behaviour of the uniform norm at the error $e$ along directions in $X$. If the zero functional lies in their convex hull, evaluating the convex combination on an arbitrary perturbation shows that the error cannot decrease at some active point, so the maximal error cannot decrease either. Conversely, if zero is not in the convex hull, a finite-dimensional separation argument produces a direction in $X$ that strictly decreases the error at every active point, and compactness then makes the decrease uniform on all of $K$.
[/proofplan]
[step:Handle the exact approximation case]
Assume $E=0$. Then $\|f-p\|_\infty=0$, hence $f=p$ in $C(K)$. Therefore, for every $q \in X$,
\begin{align*}
\|f-p\|_\infty = 0 \leq \|f-q\|_\infty.
\end{align*}
Thus $p$ is a best uniform approximation to $f$ from $X$.
[/step]
[step:Define the active signed evaluation set]
Assume $E>0$. Since $e \in C(K)$ and $K$ is compact, the function $|e|: K \to \mathbb{R}$ attains its maximum. Hence the active set
\begin{align*}
A := \{t \in K : |e(t)| = E\}
\end{align*}
is nonempty and compact.
For each $t \in A$, define the signed active evaluation functional $\ell_t: X \to \mathbb{R}$ by
\begin{align*}
\ell_t(x) := \operatorname{sgn}(e(t))x(t).
\end{align*}
Let
\begin{align*}
C := \operatorname{conv}\{\ell_t : t \in A\} \subset X^*
\end{align*}
denote their convex hull.
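For illustration only, consider a hypothetical instance that is not part of the theorem: $K=[-1,1]$, $f(t)=t^2$, $X=\operatorname{span}\{1,t\}$, and $p=\tfrac12$. Under these assumptions the objects defined above can be computed by hand:
\begin{align*}
e(t) &= t^2-\tfrac12, & E &= \max_{t\in[-1,1]}\left|t^2-\tfrac12\right| = \tfrac12, & A &= \{-1,0,1\},\\
\ell_{-1}(x) &= x(-1), & \ell_0(x) &= -x(0), & \ell_1(x) &= x(1).
\end{align*}
The signs are $+1$ at $t=\pm 1$ and $-1$ at $t=0$, since $e(\pm 1)=\tfrac12$ and $e(0)=-\tfrac12$.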
[/step]
[step:Use a convex balance to prove optimality]
Assume $0 \in C$. Then there exist points $t_1,\dots,t_m \in A$ and coefficients $\alpha_1,\dots,\alpha_m \in [0,1]$ such that
\begin{align*}
\sum_{j=1}^{m}\alpha_j = 1
\end{align*}
and
\begin{align*}
\sum_{j=1}^{m}\alpha_j \ell_{t_j} = 0
\end{align*}
as an element of $X^*$.
Let $q \in X$ be arbitrary and define the perturbation $h := q-p \in X$. Evaluating the zero functional on $h$ gives
\begin{align*}
\sum_{j=1}^{m}\alpha_j \operatorname{sgn}(e(t_j))h(t_j) = 0.
\end{align*}
Therefore at least one index $j_0 \in \{1,\dots,m\}$ satisfies
\begin{align*}
\operatorname{sgn}(e(t_{j_0}))h(t_{j_0}) \leq 0.
\end{align*}
Since $h=q-p$, the error of $q$ at $t_{j_0}$ is $f(t_{j_0})-q(t_{j_0})=e(t_{j_0})-h(t_{j_0})$, so
\begin{align*}
|f(t_{j_0})-q(t_{j_0})| = |e(t_{j_0})-h(t_{j_0})|.
\end{align*}
Since $t_{j_0} \in A$ and $E>0$, we have $e(t_{j_0})=\operatorname{sgn}(e(t_{j_0}))E$. Multiplying inside the absolute value by the sign $\operatorname{sgn}(e(t_{j_0}))$, whose absolute value is $1$, gives
\begin{align*}
|e(t_{j_0})-h(t_{j_0})| = |E-\operatorname{sgn}(e(t_{j_0}))h(t_{j_0})|.
\end{align*}
Because $\operatorname{sgn}(e(t_{j_0}))h(t_{j_0}) \leq 0$, this implies
\begin{align*}
|f(t_{j_0})-q(t_{j_0})| \geq E.
\end{align*}
Hence
\begin{align*}
\|f-q\|_\infty \geq |f(t_{j_0})-q(t_{j_0})| \geq E = \|f-p\|_\infty.
\end{align*}
Because $q \in X$ was arbitrary, $p$ is a best uniform approximation to $f$ from $X$.
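As a sanity check under illustrative assumptions (the choices $K=[-1,1]$, $f(t)=t^2$, $X=\operatorname{span}\{1,t\}$, $p=\tfrac12$ are hypothetical and not taken from the theorem), one has $e(t)=t^2-\tfrac12$, $E=\tfrac12$, and $A=\{-1,0,1\}$ with signs $+1,-1,+1$. For an arbitrary $x(t)=a+bt$ in $X$, the weights $\tfrac14,\tfrac12,\tfrac14$ give
\begin{align*}
\tfrac14\ell_{-1}(x)+\tfrac12\ell_0(x)+\tfrac14\ell_1(x) = \tfrac14(a-b)-\tfrac12 a+\tfrac14(a+b) = 0,
\end{align*}
so $0 \in C$ in this instance, and the argument above shows that the constant $\tfrac12$ is a best uniform approximation to $t^2$ on $[-1,1]$ from $X$.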
[guided]
Recall the objects in this direction. We are assuming $E>0$, where $E=\|e\|_\infty$ and $e=f-p$. The active set is
\begin{align*}
A := \{t \in K : |e(t)|=E\},
\end{align*}
and for each $t \in A$ the signed active evaluation functional is the map $\ell_t: X \to \mathbb{R}$ defined by
\begin{align*}
\ell_t(x) := \operatorname{sgn}(e(t))x(t).
\end{align*}
Assume that the zero functional is a convex combination of these signed evaluations. This means there are active points $t_1,\dots,t_m \in A$ and nonnegative coefficients $\alpha_1,\dots,\alpha_m$ with total mass $1$ such that
\begin{align*}
\sum_{j=1}^{m}\alpha_j \operatorname{sgn}(e(t_j))x(t_j) = 0
\end{align*}
for every $x \in X$.
We want to prove that no competing element $q \in X$ can improve on $p$. Let
\begin{align*}
h := q-p \in X.
\end{align*}
Substituting this particular element of $X$ into the convex-balance identity gives
\begin{align*}
\sum_{j=1}^{m}\alpha_j \operatorname{sgn}(e(t_j))h(t_j) = 0.
\end{align*}
If every term with positive coefficient were strictly positive, then the weighted sum would be strictly positive. Therefore at least one active point $t_{j_0}$ satisfies
\begin{align*}
\operatorname{sgn}(e(t_{j_0}))h(t_{j_0}) \leq 0.
\end{align*}
At this point the sign is exactly what matters. Since $t_{j_0}$ is active, $|e(t_{j_0})|=E$, so
\begin{align*}
e(t_{j_0})=\operatorname{sgn}(e(t_{j_0}))E.
\end{align*}
The new error at $t_{j_0}$ is
\begin{align*}
f(t_{j_0})-q(t_{j_0}) = e(t_{j_0})-h(t_{j_0}).
\end{align*}
Multiplying inside the absolute value by the sign $\operatorname{sgn}(e(t_{j_0}))$, whose absolute value is $1$, gives
\begin{align*}
|f(t_{j_0})-q(t_{j_0})|
= |E-\operatorname{sgn}(e(t_{j_0}))h(t_{j_0})|.
\end{align*}
Because $\operatorname{sgn}(e(t_{j_0}))h(t_{j_0}) \leq 0$, the last quantity is at least $E$. Hence
\begin{align*}
\|f-q\|_\infty \geq |f(t_{j_0})-q(t_{j_0})| \geq E = \|f-p\|_\infty.
\end{align*}
Since this holds for every $q \in X$, the element $p$ is a best uniform approximation.
[/guided]
[/step]
[step:Separate the origin from the convex hull when the balance fails]
Assume $0 \notin C$. Since $X$ is finite-dimensional, so is $X^*$. We first check that $C$ is compact and convex. Convexity is part of the definition of the convex hull.
For compactness, note that $|e(t)|=E>0$ on $A$, so $\operatorname{sgn}(e(t))=e(t)/E$ there. Hence, for each fixed $x \in X$, the function $t \mapsto \ell_t(x)=e(t)x(t)/E$ is continuous on $A$. Since a map into the finite-dimensional space $X^*$ is continuous as soon as its values on a basis of $X$ depend continuously on $t$, the map $t \mapsto \ell_t$ from $A$ to $X^*$ is continuous. Set $d := \dim X^*$.
We use the following elementary finite-dimensional reduction. Suppose a point of $C$ is written as a convex combination $\sum_{i=1}^{m}\alpha_i\ell_{t_i}$ with all $\alpha_i>0$ and $m>d+1$. Then the points $\ell_{t_1},\dots,\ell_{t_m}$ are affinely dependent: there are real numbers $\beta_1,\dots,\beta_m$, not all zero, with $\sum_{i=1}^{m}\beta_i=0$ and $\sum_{i=1}^{m}\beta_i\ell_{t_i}=0$. Since the $\beta_i$ sum to zero and are not all zero, some $\beta_i$ is positive. Let $s := \min\{\alpha_i/\beta_i : \beta_i>0\}$, which is the largest nonnegative multiple for which every adjusted coefficient $\alpha_i-s\beta_i$ remains nonnegative. At least one adjusted coefficient is then zero, while the coefficient sum and the represented point are unchanged. Repeating this reduction shows that every element of $C$ is a convex combination of at most $d+1$ elements of $\{\ell_t:t\in A\}$. Hence $C$ is the continuous image of the compact set $A^{d+1} \times \Delta_d$, where
\begin{align*}
\Delta_d := \{(\alpha_0,\dots,\alpha_d) \in [0,1]^{d+1} : \alpha_0+\cdots+\alpha_d=1\},
\end{align*}
under the map sending $(t_0,\dots,t_d,\alpha_0,\dots,\alpha_d)$ to $\sum_{i=0}^{d}\alpha_i\ell_{t_i}$. Therefore $C$ is compact.
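The reduction can be seen on a small hypothetical example in $\mathbb{R}^2$ (so $d=2$), chosen only for illustration. Take $v_1=(0,0)$, $v_2=(1,0)$, $v_3=(0,1)$, $v_4=(1,1)$ with coefficients $\alpha=(\tfrac14,\tfrac14,\tfrac14,\tfrac14)$, representing the point $(\tfrac12,\tfrac12)$. The affine dependence $v_1-v_2-v_3+v_4=0$ has coefficient vector $\beta=(1,-1,-1,1)$, whose entries sum to zero, and
\begin{align*}
\alpha-s\beta = \left(\tfrac14-s,\ \tfrac14+s,\ \tfrac14+s,\ \tfrac14-s\right), \qquad s\in\left[0,\tfrac14\right].
\end{align*}
At the largest admissible value $s=\tfrac14$ the coefficients become $(0,\tfrac12,\tfrac12,0)$, which still sum to $1$ and still represent $\tfrac12 v_2+\tfrac12 v_3=(\tfrac12,\tfrac12)$, now using no more than $d+1=3$ points.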
We next prove the finite-dimensional separation fact needed below directly, so that no external separation theorem is invoked.
[claim:Strict separation from a compact convex set]
Let $V$ be a finite-dimensional real [normed vector space](/page/Normed%20Vector%20Space), and let $D \subset V$ be nonempty, compact, and convex. If $0 \notin D$, then there exists a linear functional $\Lambda: V \to \mathbb{R}$ such that
\begin{align*}
\Lambda(y) > 0
\end{align*}
for every $y \in D$.
[/claim]
[proof]
Fix a Euclidean [inner product](/page/Inner%20Product) $\langle \cdot,\cdot\rangle$ on $V$, and let $|y|_0 := \sqrt{\langle y,y\rangle}$ denote its induced Euclidean norm. Since $V$ is finite-dimensional, this norm induces the same topology as the given norm on $V$. Since $D$ is compact and $0 \notin D$, the [continuous function](/page/Continuous%20Function) $y \mapsto |y|_0$ attains a positive minimum on $D$. Choose $y_0 \in D$ with
\begin{align*}
|y_0|_0 = \min_{y \in D}|y|_0 > 0.
\end{align*}
For every $y \in D$ and every $s \in [0,1]$, convexity gives
\begin{align*}
y_0+s(y-y_0) \in D.
\end{align*}
The function $\varphi_{y}: [0,1] \to \mathbb{R}$ defined by
\begin{align*}
\varphi_y(s) := |y_0+s(y-y_0)|_0^2
\end{align*}
has a minimum at $s=0$. Therefore its right derivative at $0$ is nonnegative:
\begin{align*}
2\langle y_0, y-y_0\rangle \geq 0.
\end{align*}
Hence
\begin{align*}
\langle y_0,y\rangle \geq |y_0|_0^2 > 0
\end{align*}
for every $y \in D$. Defining $\Lambda: V \to \mathbb{R}$ by
\begin{align*}
\Lambda(y) := \langle y_0,y\rangle
\end{align*}
gives the required linear functional.
[/proof]
Apply the claim with $V=X^*$ and $D=C$. We obtain a linear functional $\Lambda: X^* \to \mathbb{R}$ such that
\begin{align*}
\Lambda(\ell) > 0
\end{align*}
for every $\ell \in C$. We next realize $\Lambda$ as evaluation at an element of $X$ by writing out the finite-dimensional dual-basis argument. Choose a basis $x_1,\dots,x_r$ of $X$, and let $\varepsilon_1,\dots,\varepsilon_r$ be the [dual basis](/theorems/414) of $X^*$, so every $\ell \in X^*$ has the expansion $\ell=\sum_{i=1}^{r}\ell(x_i)\varepsilon_i$. Define
\begin{align*}
h := \sum_{i=1}^{r}\Lambda(\varepsilon_i)x_i \in X.
\end{align*}
Then, for every $\ell \in X^*$,
\begin{align*}
\ell(h)=\sum_{i=1}^{r}\Lambda(\varepsilon_i)\ell(x_i)=\Lambda\left(\sum_{i=1}^{r}\ell(x_i)\varepsilon_i\right)=\Lambda(\ell).
\end{align*}
In particular, since $\ell_t \in C$ for every $t \in A$,
\begin{align*}
\operatorname{sgn}(e(t))h(t)=\ell_t(h)=\Lambda(\ell_t)>0
\end{align*}
for every $t \in A$.
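To make the construction concrete, here is a sketch under illustrative assumptions that are not part of the theorem: $K=[-1,1]$, $f(t)=t^2$, $X=\operatorname{span}\{1,t\}$, and the candidate $p=0$. Then $e(t)=t^2$, $E=1$, $A=\{-1,1\}$, both signs are $+1$, and $\ell_{\pm 1}(x)=x(\pm 1)$. Take the basis $x_1=1$, $x_2=t$, identify $\ell \in X^*$ with the coordinates $(\ell(x_1),\ell(x_2))$, and use the standard inner product in these coordinates (one possible choice of Euclidean inner product). Then
\begin{align*}
\ell_{-1} &= (1,-1), & \ell_1 &= (1,1), & C &= \{(1,c) : c\in[-1,1]\}, & y_0 &= (1,0).
\end{align*}
Hence $\Lambda(\ell)=\langle y_0,\ell\rangle=\ell(x_1)$, so $\Lambda(\varepsilon_1)=1$, $\Lambda(\varepsilon_2)=0$, and $h=1\cdot x_1+0\cdot x_2$ is the constant function $1$, which indeed satisfies $\operatorname{sgn}(e(t))h(t)=1>0$ at both active points.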
[guided]
The purpose of this step is to turn the failure of convex balance into a direction that improves the error at every active point. We are working in the finite-dimensional real [vector space](/page/Vector%20Space) $X^*$. Since $0 \notin C$ and $C$ is compact and convex, we first justify that a strict separating functional exists rather than quoting a separation theorem.
The compactness of $C$ follows from a finite-dimensional reduction. On $A$ we have $|e(t)|=E>0$, so $\operatorname{sgn}(e(t))=e(t)/E$ is continuous there. The map $A \to X^*$, $t \mapsto \ell_t$, is therefore continuous, because $t \mapsto \operatorname{sgn}(e(t))x(t)$ is continuous on $A$ for each fixed $x \in X$, and all norms on the finite-dimensional space $X^*$ induce the same topology. Let $d := \dim X^*$. If a convex combination in $X^*$ uses more than $d+1$ points with positive coefficients, those points are affinely dependent. Choosing a nonzero affine dependence and subtracting the largest nonnegative multiple that keeps all coefficients nonnegative removes at least one coefficient, preserves the represented point, and keeps the coefficient sum equal to $1$. Repeating this reduction shows that every element of $C$ is represented using at most $d+1$ points. Therefore $C$ is the continuous image of the compact set $A^{d+1} \times \Delta_d$, where
\begin{align*}
\Delta_d := \{(\alpha_0,\dots,\alpha_d) \in [0,1]^{d+1} : \alpha_0+\cdots+\alpha_d=1\},
\end{align*}
so $C$ is compact.
We now prove strict separation directly. Let $V$ be a finite-dimensional real normed vector space and let $D \subset V$ be nonempty, compact, convex, and disjoint from $0$. Choose a Euclidean inner product $\langle \cdot,\cdot\rangle$ on $V$, and write $|y|_0 := \sqrt{\langle y,y\rangle}$. Compactness gives a point $y_0 \in D$ minimizing $|y|_0$, and since $0 \notin D$ this minimum is positive:
\begin{align*}
|y_0|_0 = \min_{y \in D}|y|_0 > 0.
\end{align*}
For each $y \in D$ and $s \in [0,1]$, convexity gives $y_0+s(y-y_0) \in D$. Hence the function $s \mapsto |y_0+s(y-y_0)|_0^2$ has a minimum at $s=0$, so its right derivative at $0$ is nonnegative:
\begin{align*}
2\langle y_0,y-y_0\rangle \geq 0.
\end{align*}
Thus
\begin{align*}
\langle y_0,y\rangle \geq |y_0|_0^2 > 0
\end{align*}
for every $y \in D$. The functional $\Lambda: V \to \mathbb{R}$ defined by $\Lambda(y):=\langle y_0,y\rangle$ strictly separates $D$ from $0$.
Applying this result with $V=X^*$ and $D=C$ gives a linear functional $\Lambda: X^* \to \mathbb{R}$ such that $\Lambda(\ell)>0$ for every $\ell \in C$. We still need to convert this functional on $X^*$ into an actual perturbation direction in $X$. Choose a basis $x_1,\dots,x_r$ of $X$, and let $\varepsilon_1,\dots,\varepsilon_r$ be the dual basis of $X^*$. Define
\begin{align*}
h := \sum_{i=1}^{r}\Lambda(\varepsilon_i)x_i \in X.
\end{align*}
For every $\ell \in X^*$, the dual-basis expansion $\ell=\sum_{i=1}^{r}\ell(x_i)\varepsilon_i$ gives
\begin{align*}
\ell(h)=\sum_{i=1}^{r}\Lambda(\varepsilon_i)\ell(x_i)=\Lambda\left(\sum_{i=1}^{r}\ell(x_i)\varepsilon_i\right)=\Lambda(\ell).
\end{align*}
Taking $\ell=\ell_t$ for $t \in A$ yields
\begin{align*}
\operatorname{sgn}(e(t))h(t)=\ell_t(h)=\Lambda(\ell_t)>0.
\end{align*}
This is the desired direction: at every active point, moving from $p$ in the direction $h$ reduces the absolute error $|e(t)|$ for all sufficiently small positive step sizes.
[/guided]
[/step]
[step:Turn strict positivity on the active set into a uniform improvement]
From the previous step, define $g: A \to \mathbb{R}$ by
\begin{align*}
g(t) := \operatorname{sgn}(e(t))h(t).
\end{align*}
The function $g$ is continuous on $A$ and strictly positive on the compact set $A$. Hence it has a positive minimum. Define
\begin{align*}
\eta := \min_{t \in A}\operatorname{sgn}(e(t))h(t) > 0.
\end{align*}
We next choose a small error gap on which the same sign improvement persists. Since $E>0$ and $e$ is continuous, the set
\begin{align*}
U := \{t \in K : |e(t)|>E/2\}
\end{align*}
is an open neighbourhood of $A$ in $K$. On $U$ the function $e$ does not vanish, so $\operatorname{sgn}(e(t))=e(t)/|e(t)|$ and $t \mapsto \operatorname{sgn}(e(t))h(t)$ are continuous on $U$. We claim that there exists
\begin{align*}
\delta \in \left(0,\min\left\{\frac{E}{2},\frac{\eta}{2}\right\}\right)
\end{align*}
such that
\begin{align*}
|e(t)|>E-\delta \implies \operatorname{sgn}(e(t))h(t)\geq \frac{\eta}{2}
\end{align*}
for every $t \in K$. Indeed, set $\delta_n := \frac{1}{n+1}\min\{E/2,\eta/2\}$ for each positive integer $n$. If no such $\delta$ existed, then for each $n$ there would be $t_n \in K$ with $|e(t_n)|>E-\delta_n$ and $\operatorname{sgn}(e(t_n))h(t_n)<\eta/2$. Since $\delta_n<E/2$, every $t_n$ lies in $U$. By compactness of $K$, a subsequence of $(t_n)$ converges to some $t_* \in K$, and continuity of $e$ together with $E-\delta_n<|e(t_n)|\leq E$ gives $|e(t_*)|=E$, that is, $t_* \in A \subset U$. Continuity on $U$ then gives $\operatorname{sgn}(e(t_*))h(t_*)\leq\eta/2<\eta$, contradicting the definition of $\eta$.
Choose $\lambda>0$ such that
\begin{align*}
\lambda\|h\|_\infty < \delta.
\end{align*}
Let
\begin{align*}
q_\lambda := p+\lambda h \in X.
\end{align*}
We prove $\|f-q_\lambda\|_\infty<E$.
If $t \in K$ and $|e(t)|\leq E-\delta$, then the triangle inequality gives
\begin{align*}
|f(t)-q_\lambda(t)|=|e(t)-\lambda h(t)|\leq |e(t)|+\lambda|h(t)|.
\end{align*}
Using $|e(t)|\leq E-\delta$ and $|h(t)|\leq \|h\|_\infty$, we obtain
\begin{align*}
|f(t)-q_\lambda(t)|\leq E-\delta+\lambda\|h\|_\infty<E.
\end{align*}
If $t \in K$ and $|e(t)|>E-\delta$, then the choice of $\delta$ gives
\begin{align*}
\operatorname{sgn}(e(t))h(t)\geq\frac{\eta}{2}>0.
\end{align*}
Also
\begin{align*}
\lambda|h(t)|\leq\lambda\|h\|_\infty<\delta,
\end{align*}
so
\begin{align*}
|e(t)|-\lambda\operatorname{sgn}(e(t))h(t)\geq |e(t)|-\lambda|h(t)|>E-2\delta\geq 0.
\end{align*}
Since this quantity is nonnegative and $|\operatorname{sgn}(e(t))|=1$, multiplying inside the absolute value by $\operatorname{sgn}(e(t))$ and then dropping the absolute value gives
\begin{align*}
|f(t)-q_\lambda(t)|=|e(t)-\lambda h(t)|=|e(t)|-\lambda\operatorname{sgn}(e(t))h(t).
\end{align*}
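Using $\operatorname{sgn}(e(t))h(t)\geq \eta/2$ and $|e(t)|\leq E$ gives
\begin{align*}
|f(t)-q_\lambda(t)|\leq |e(t)|-\lambda\frac{\eta}{2}\leq E-\lambda\frac{\eta}{2}<E.
\end{align*}
The bounds in both cases do not depend on $t$, so $\|f-q_\lambda\|_\infty\leq\max\{E-\delta+\lambda\|h\|_\infty,\,E-\lambda\eta/2\}<E=\|f-p\|_\infty$. Thus $p$ is not a best uniform approximation.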
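As an illustration with hypothetical data (not from the theorem), take $K=[-1,1]$, $f(t)=t^2$, $X=\operatorname{span}\{1,t\}$, $p=0$, and the direction $h=1$, so that $E=1$, $A=\{-1,1\}$, and $\eta=1$. Since $\operatorname{sgn}(e(t))h(t)=1$ whenever $e(t)>0$, the value $\delta=\tfrac14$ is admissible, and $\lambda=\tfrac15$ satisfies $\lambda\|h\|_\infty<\delta$. Then
\begin{align*}
\|f-q_\lambda\|_\infty = \max_{t\in[-1,1]}\left|t^2-\tfrac15\right| = \max\left\{\tfrac15,\tfrac45\right\} = \tfrac45 < 1 = \|f-p\|_\infty,
\end{align*}
consistent with the estimate $|f(t)-q_\lambda(t)|\leq |e(t)|-\lambda\eta/2\leq\tfrac{9}{10}$ at points where $|e(t)|>E-\delta$.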
[/step]
[step:Conclude the equivalence]
We have shown that if
\begin{align*}
0 \in \operatorname{conv}\{\operatorname{sgn}(e(t))\delta_t|_X : t \in A\},
\end{align*}
then $p$ is a best uniform approximation to $f$ from $X$. Conversely, if the zero functional is not in this convex hull, the separation argument produces $q_\lambda \in X$ with
\begin{align*}
\|f-q_\lambda\|_\infty < \|f-p\|_\infty,
\end{align*}
so $p$ is not best. Therefore, for $E>0$, $p$ is a best uniform approximation if and only if the zero functional lies in the stated convex hull. Together with the exact approximation case $E=0$, this proves the theorem.
[/step]