[proofplan]
Write $e:=f-p$. The sufficiency direction is the direct zero-counting argument: any strictly better approximant forces $q-p$ to alternate sign across the $n+1$ extremal points, hence to have at least $n$ distinct zeros, contradicting the Haar property unless $q=p$. For necessity, we prove the standard Haar improvement alternative: either the active error set contains an alternating chain of length $n+1$, or there is a direction $h\in X$ whose sign agrees with $e$ on the whole active set. A small perturbation $p+\varepsilon h$ then lowers the maximum error on the active set while compactness and continuity keep the inactive set below the old maximum, contradicting best approximation.
[/proofplan]
[step:Show that an alternating extremal chain forces best approximation]
Assume that there exist points $a \leq x_0 < \cdots < x_n \leq b$ and a sign $\sigma \in \{-1,1\}$ such that
\begin{align*}
e(x_i)=\sigma(-1)^iE
\end{align*}
for every $i\in\{0,\dots,n\}$.
Suppose, toward a contradiction, that there exists $q\in X$ such that
\begin{align*}
\|f-q\|_{C([a,b])}<E.
\end{align*}
Define $r\in X$ by $r:=q-p$. At each $x_i$,
\begin{align*}
|e(x_i)-r(x_i)|=|f(x_i)-q(x_i)|<E.
\end{align*}
If $e(x_i)=E$, then this inequality implies $r(x_i)>0$. If $e(x_i)=-E$, then it implies $r(x_i)<0$. Hence
\begin{align*}
\sigma(-1)^i r(x_i)>0
\end{align*}
for every $i\in\{0,\dots,n\}$.
Therefore $r(x_i)$ and $r(x_{i+1})$ have opposite signs for each $i\in\{0,\dots,n-1\}$. Since $r$ is continuous on $[a,b]$, for each such $i$ there exists a point $z_i\in(x_i,x_{i+1})$ such that $r(z_i)=0$. The intervals $(x_i,x_{i+1})$ are pairwise disjoint, so the zeros $z_0,\dots,z_{n-1}$ are distinct. Thus $r$ has at least $n$ distinct zeros in $[a,b]$.
Because $X$ is a Haar space of dimension $n$, every nonzero element of $X$ has at most $n-1$ distinct zeros. Hence $r=0$, so $q=p$. But then
\begin{align*}
\|f-q\|_{C([a,b])}=\|f-p\|_{C([a,b])}=E,
\end{align*}
contradicting $\|f-q\|_{C([a,b])}<E$. Therefore no strictly better $q$ exists, and $p$ is a best approximation.
[/step]
[step:Prove the Haar improvement alternative on the active error set]
Define the active set $A\subset [a,b]$ by
\begin{align*}
A:=\{x\in[a,b]: |e(x)|=E\}.
\end{align*}
Since $e$ is continuous and $[a,b]$ is compact, $A$ is a nonempty compact subset of $[a,b]$. Define the sign function $s:A\to\{-1,1\}$ by
\begin{align*}
s(x):=\operatorname{sgn}(e(x)).
\end{align*}
This is well-defined on $A$ because $|e(x)|=E$. If $E=0$, then $f=p\in X$, and the theorem is immediate: any $n+1$ distinct points in $[a,b]$ give $e(x_i)=0=\sigma(-1)^iE$. Thus assume $E>0$.
We prove the following alternative.
[claim:Haar improvement alternative]
Either $A$ contains points $x_0<\cdots<x_n$ such that $s(x_i)=\sigma(-1)^i$ for some $\sigma\in\{-1,1\}$, or there exists $h\in X$ such that
\begin{align*}
s(x)h(x)>0
\end{align*}
for every $x\in A$.
[/claim]
[proof]
Choose a basis $u_1,\dots,u_n$ of $X$. Define the evaluation map
\begin{align*}
U:[a,b]\to\mathbb{R}^n
\end{align*}
by
\begin{align*}
U(x):=(u_1(x),\dots,u_n(x)).
\end{align*}
The Haar property implies that for every choice of distinct points $t_1<\cdots<t_n$ in $[a,b]$, the matrix $(u_j(t_i))_{i,j=1}^n$ is invertible. Indeed, if it were singular, there would be coefficients $c_1,\dots,c_n$, not all zero, such that $h:=\sum_{j=1}^n c_j u_j$ vanished at all $t_1,\dots,t_n$, contradicting the Haar property.
Define the compact set $K\subset\mathbb{R}^n$ by
\begin{align*}
K:=\{s(x)U(x):x\in A\}.
\end{align*}
A function $h=\sum_{j=1}^n c_j u_j\in X$ satisfies $s(x)h(x)>0$ on $A$ exactly when the vector $c:=(c_1,\dots,c_n)\in\mathbb{R}^n$ satisfies
\begin{align*}
c\cdot v>0
\end{align*}
for every $v\in K$.
If $0\notin \operatorname{conv}(K)$, the compact convex set $\operatorname{conv}(K)$ and the singleton $\{0\}$ are disjoint. Since $\operatorname{conv}(K)$ is compact, choose $w\in\operatorname{conv}(K)$ with minimal Euclidean norm. Then $w\neq0$. For every $v\in\operatorname{conv}(K)$ and every $t\in[0,1]$, minimality of $w$ gives
\begin{align*}
|w|^2\leq |(1-t)w+tv|^2.
\end{align*}
Expanding the right-hand side and dividing by $t>0$ gives
\begin{align*}
0\leq 2w\cdot(v-w)+t|v-w|^2.
\end{align*}
Letting $t\downarrow0$ yields $w\cdot v\geq |w|^2>0$. In particular $w\cdot v>0$ for every $v\in K$. Therefore $h:=\sum_{j=1}^n w_j u_j$ satisfies $s(x)h(x)>0$ for every $x\in A$.
It remains to show that if $0\in\operatorname{conv}(K)$, then $A$ contains an alternating chain of length $n+1$. By Carathéodory's finite-dimensional convexity theorem, there exist points $y_0<\cdots<y_m$ in $A$ and coefficients $\lambda_0,\dots,\lambda_m>0$, with $m\leq n$ and $\sum_{r=0}^m\lambda_r=1$, such that
\begin{align*}
\sum_{r=0}^m \lambda_r s(y_r)U(y_r)=0.
\end{align*}
Choose such a representation with the smallest possible number of support points.
The vectors $U(y_0),\dots,U(y_m)$ are linearly independent whenever $m\leq n-1$. To prove this, extend the distinct points $y_0,\dots,y_m$ to $n$ distinct points $t_1<\cdots<t_n$ in $[a,b]$. The Haar determinant invertibility proved above says that the $n\times n$ matrix $(u_j(t_i))_{i,j=1}^n$ is invertible, so its rows are linearly independent. The rows corresponding to $y_0,\dots,y_m$ are a subset of these rows, hence $U(y_0),\dots,U(y_m)$ are linearly independent. Therefore the displayed positive dependence cannot have $m\leq n-1$. Hence $m=n$.
We now prove that the signs $s(y_0),\dots,s(y_n)$ alternate. Suppose instead that $s(y_j)=s(y_{j+1})$ for some $j\in\{0,\dots,n-1\}$. Let $c_r:=\lambda_r s(y_r)$, so
\begin{align*}
\sum_{r=0}^n c_rU(y_r)=0.
\end{align*}
Because every $n$ of the vectors $U(y_0),\dots,U(y_n)$ are linearly independent, this dependence is unique up to multiplication by a nonzero scalar.
For a Chebyshev system, the coefficients of the unique dependence among the $n+1$ ordered evaluation vectors have alternating signs. To verify this directly, define the ordered Haar determinant map on the ordered configuration set by
\begin{align*}
D(t_1,\dots,t_n):=\det(u_j(t_i))_{i,j=1}^n.
\end{align*}
The set $\{(t_1,\dots,t_n)\in[a,b]^n:t_1<\cdots<t_n\}$ is connected, and $D$ is continuous and never zero by the Haar determinant invertibility proved above. Hence $D$ has one fixed sign on this ordered configuration set, after possibly replacing $u_1$ by $-u_1$. Delete the $r$-th vector and expand the determinant of the resulting $n\times n$ matrix. Cramer's cofactor formula gives
\begin{align*}
\operatorname{sgn}(c_r)=\tau(-1)^r
\end{align*}
for one fixed sign $\tau\in\{-1,1\}$ and every $r\in\{0,\dots,n\}$. Since $\lambda_r>0$, this means
\begin{align*}
s(y_r)=\operatorname{sgn}(c_r)=\tau(-1)^r.
\end{align*}
Thus the signs alternate, contradicting $s(y_j)=s(y_{j+1})$. Hence $y_0<\cdots<y_n$ is the required alternating chain.
[/proof]
[guided]
The point of this alternative is to turn the geometric failure of alternation into an analytic direction that improves the approximation. Choose a basis $u_1,\dots,u_n$ of $X$. Define the evaluation map
\begin{align*}
U:[a,b]\to\mathbb{R}^n
\end{align*}
by
\begin{align*}
U(x):=(u_1(x),\dots,u_n(x)).
\end{align*}
The first structural fact we need is that evaluations at $n$ distinct points are independent. If $t_1<\cdots<t_n$ are points in $[a,b]$ and the matrix $(u_j(t_i))_{i,j=1}^n$ were singular, then there would be coefficients $c_1,\dots,c_n$, not all zero, such that the function $h:=\sum_{j=1}^n c_j u_j\in X$ vanished at every point $t_1,\dots,t_n$. This would contradict the Haar property, because a nonzero element of an $n$-dimensional Haar space has at most $n-1$ distinct zeros. Hence every such Haar evaluation matrix is invertible.
Now encode each active point $x\in A$ by the signed evaluation vector $s(x)U(x)\in\mathbb{R}^n$, and define
\begin{align*}
K:=\{s(x)U(x):x\in A\}.
\end{align*}
The set $K$ is compact because $A$ is compact and $x\mapsto s(x)U(x)$ is continuous on $A$: here $s(x)=e(x)/E$ on $A$ since $E>0$. A direction $h=\sum_{j=1}^n c_j u_j\in X$ satisfies
\begin{align*}
s(x)h(x)>0
\end{align*}
for every $x\in A$ exactly when the coefficient vector $c:=(c_1,\dots,c_n)\in\mathbb{R}^n$ satisfies
\begin{align*}
c\cdot v>0
\end{align*}
for every $v\in K$.
If $0\notin\operatorname{conv}(K)$, then the compact convex set $\operatorname{conv}(K)$ is disjoint from the singleton $\{0\}$. Choose $w\in\operatorname{conv}(K)$ with minimal Euclidean norm. Since $0\notin\operatorname{conv}(K)$, we have $w\neq0$. For every $v\in\operatorname{conv}(K)$ and every $t\in[0,1]$, the point $(1-t)w+tv$ belongs to $\operatorname{conv}(K)$, so minimality of $w$ gives
\begin{align*}
|w|^2\leq |(1-t)w+tv|^2.
\end{align*}
Expanding the square and subtracting $|w|^2$ gives
\begin{align*}
0\leq 2t w\cdot(v-w)+t^2|v-w|^2.
\end{align*}
For $t>0$, division by $t$ gives
\begin{align*}
0\leq 2w\cdot(v-w)+t|v-w|^2.
\end{align*}
Letting $t\downarrow0$ yields
\begin{align*}
w\cdot v\geq |w|^2>0.
\end{align*}
Taking $v=s(x)U(x)$, define $h\in X$ by
\begin{align*}
h:=\sum_{j=1}^n w_j u_j.
\end{align*}
Then $s(x)h(x)=w\cdot s(x)U(x)>0$ for every $x\in A$, which is the desired improvement direction.
It remains to treat the case $0\in\operatorname{conv}(K)$. By Carathéodory's finite-dimensional convexity theorem, there exist points $y_0<\cdots<y_m$ in $A$ and coefficients $\lambda_0,\dots,\lambda_m>0$, with $m\leq n$ and $\sum_{r=0}^m\lambda_r=1$, such that
\begin{align*}
\sum_{r=0}^m \lambda_r s(y_r)U(y_r)=0.
\end{align*}
Choose such a representation with the smallest possible number of support points. We claim that $m=n$. If $m\leq n-1$, extend the distinct points $y_0,\dots,y_m$ to $n$ distinct points $t_1<\cdots<t_n$ in $[a,b]$. The matrix $(u_j(t_i))_{i,j=1}^n$ is invertible by the Haar determinant argument above, so its rows are linearly independent. The rows $U(y_0),\dots,U(y_m)$ are a subset of those rows, hence they are linearly independent. This contradicts the displayed positive dependence. Therefore $m=n$.
We now prove that the signs $s(y_0),\dots,s(y_n)$ alternate. Let $c_r:=\lambda_r s(y_r)$, so
\begin{align*}
\sum_{r=0}^n c_rU(y_r)=0.
\end{align*}
Every $n$ of the vectors $U(y_0),\dots,U(y_n)$ are linearly independent by the same extension-and-invertibility argument, so this dependence is unique up to multiplication by a nonzero scalar. To compute its signs, define the ordered Haar determinant map by
\begin{align*}
D(t_1,\dots,t_n):=\det(u_j(t_i))_{i,j=1}^n.
\end{align*}
The ordered configuration set $\{(t_1,\dots,t_n)\in[a,b]^n:t_1<\cdots<t_n\}$ is connected, and $D$ is continuous and never zero on it. Hence $D$ has one fixed sign there, after possibly replacing $u_1$ by $-u_1$. Cramer's cofactor formula for the unique dependence among the $n+1$ ordered vectors gives
\begin{align*}
\operatorname{sgn}(c_r)=\tau(-1)^r
\end{align*}
for one fixed sign $\tau\in\{-1,1\}$ and every $r\in\{0,\dots,n\}$. Since each $\lambda_r>0$, the sign of $c_r=\lambda_r s(y_r)$ is exactly $s(y_r)$. Therefore
\begin{align*}
s(y_r)=\tau(-1)^r
\end{align*}
for every $r\in\{0,\dots,n\}$. Thus $y_0<\cdots<y_n$ is an alternating chain in $A$.
[/guided]
[/step]
[step:Use the improvement alternative to prove necessity]
Assume that $p$ is a best approximation to $f$ from $X$. If $E=0$, then $f=p$, and any strictly increasing choice of $n+1$ points in $[a,b]$ satisfies
\begin{align*}
f(x_i)-p(x_i)=0=\sigma(-1)^iE.
\end{align*}
Thus assume $E>0$.
Apply the Haar improvement alternative to the active set $A=\{x\in[a,b]:|e(x)|=E\}$. If $A$ contains an alternating chain of length $n+1$, then the desired conclusion follows. Suppose instead that there exists $h\in X$ such that
\begin{align*}
s(x)h(x)>0
\end{align*}
for every $x\in A$.
Since $E>0$ and $|e(x)|=E$ on $A$, we have $s(x)=e(x)/E$ for every $x\in A$. Hence the function $x\mapsto s(x)h(x)$ is continuous on $A$. Since $A$ is compact, define
\begin{align*}
\alpha:=\min_{x\in A}s(x)h(x)>0.
\end{align*}
Let
\begin{align*}
M:=\|h\|_{C([a,b])}.
\end{align*}
If $M=0$, then $h=0$, contradicting $\alpha>0$, so $M>0$.
Choose $\varepsilon>0$ such that
\begin{align*}
0<\varepsilon<\frac{2E\alpha}{M^2}.
\end{align*}
For $x\in A$, we have $e(x)=s(x)E$, and therefore
\begin{align*}
|e(x)-\varepsilon h(x)|^2=E^2-2\varepsilon E s(x)h(x)+\varepsilon^2 h(x)^2.
\end{align*}
Using $s(x)h(x)\geq\alpha$ and $|h(x)|\leq M$, we get
\begin{align*}
|e(x)-\varepsilon h(x)|^2\leq E^2-2\varepsilon E\alpha+\varepsilon^2M^2<E^2.
\end{align*}
Thus
\begin{align*}
|e(x)-\varepsilon h(x)|<E
\end{align*}
for every $x\in A$.
It remains to control points away from $A$. Because $A$ is compact and the function $x\mapsto |e(x)|$ is continuous, for every open neighbourhood $V$ of $A$ in $[a,b]$ with compact complement $[a,b]\setminus V$, the maximum of $|e|$ on $[a,b]\setminus V$ is strictly smaller than $E$. Choose such a neighbourhood $V$ small enough that the strict inequality on $A$ above persists on $V$ by continuity of $e-\varepsilon h$. Then there exists $\eta>0$ such that
\begin{align*}
|e(x)-\varepsilon h(x)|\leq E-\eta
\end{align*}
for every $x\in V$.
On the compact set $[a,b]\setminus V$, define
\begin{align*}
\beta:=E-\max_{x\in [a,b]\setminus V}|e(x)|>0.
\end{align*}
By decreasing $\varepsilon$ if necessary while keeping $\varepsilon>0$, we may also require
\begin{align*}
\varepsilon M<\beta.
\end{align*}
Then for $x\in [a,b]\setminus V$,
\begin{align*}
|e(x)-\varepsilon h(x)|\leq |e(x)|+\varepsilon |h(x)|\leq E-\beta+\varepsilon M<E.
\end{align*}
Combining the estimates on $V$ and on $[a,b]\setminus V$ gives
\begin{align*}
\|f-(p+\varepsilon h)\|_{C([a,b])}=\|e-\varepsilon h\|_{C([a,b])}<E.
\end{align*}
Since $p+\varepsilon h\in X$, this contradicts the assumption that $p$ is a best approximation. Therefore the improvement direction cannot exist, and the Haar improvement alternative forces an alternating chain $x_0<\cdots<x_n$ in $A$. For those points,
\begin{align*}
e(x_i)=s(x_i)E=\sigma(-1)^iE,
\end{align*}
which is the desired conclusion.
[/step]
[step:Conclude the equivalence]
The first step proves that the existence of an alternating extremal chain implies that $p$ is a best approximation from $X$. The preceding step proves that every best approximation must possess such a chain. Hence $p$ is a best uniform approximation to $f$ from the Haar space $X$ if and only if there exist $a\leq x_0<\cdots<x_n\leq b$ and $\sigma\in\{-1,1\}$ such that
\begin{align*}
f(x_i)-p(x_i)=\sigma(-1)^i\|f-p\|_{C([a,b])}
\end{align*}
for every $i\in\{0,\dots,n\}$.
[/step]