[proofplan]
Let $a_j \in \mathbb{R}^m$ denote the $j$-th column of $A$, and consider the finitely generated cone $K=\{\sum_{j=1}^n x_j a_j : x_j \geq 0\}$. First we prove that $K$ is a closed convex cone, so that if $b\notin K$ we may separate $b$ from $K$ by a hyperplane. The conic nature of $K$ forces the separating functional to be nonnegative on all of $K$, and testing it on the generators $a_j$ gives $A^\top y \geq 0$. Finally, the two alternatives are incompatible by taking the Euclidean [inner product](/page/Inner%20Product) with $y$.
[/proofplan]
[step:Build the finitely generated cone determined by the columns of $A$]
For each $j \in \{1,\dots,n\}$, let $a_j \in \mathbb{R}^m$ denote the $j$-th column of $A$. Define the column cone
\begin{align*}
K := \left\{\sum_{j=1}^n x_j a_j : x=(x_1,\dots,x_n)\in \mathbb{R}^n \text{ and } x_j \geq 0 \text{ for every } j\right\} \subset \mathbb{R}^m.
\end{align*}
Equivalently, $K=\{Ax : x \in \mathbb{R}^n,\ x_j\geq 0 \text{ for every } j\}$. Thus the first alternative holds exactly when $b \in K$.
The set $K$ is a convex cone. Indeed, if $z=\sum_{j=1}^n x_j a_j$ and $w=\sum_{j=1}^n u_j a_j$ with $x_j,u_j\geq 0$, then for $\lambda,\mu\geq 0$,
\begin{align*}
\lambda z+\mu w=\sum_{j=1}^n (\lambda x_j+\mu u_j)a_j,
\end{align*}
and each coefficient $\lambda x_j+\mu u_j$ is nonnegative.
[/step]
[step:Prove that the column cone is closed]
We prove that $K$ is closed in the Euclidean topology of $\mathbb{R}^m$. For a subset $I\subset \{1,\dots,n\}$, define
\begin{align*}
K_I:=\left\{\sum_{i\in I}\lambda_i a_i : \lambda_i\geq 0 \text{ for every } i\in I\right\}.
\end{align*}
When the family $(a_i)_{i\in I}$ is linearly independent, let $V_I:=\operatorname{span}\{a_i:i\in I\}$ and define the [linear map](/page/Linear%20Map)
\begin{align*}
L_I:\mathbb{R}^I\to V_I,\quad (\lambda_i)_{i\in I}\mapsto \sum_{i\in I}\lambda_i a_i.
\end{align*}
Because $(a_i)_{i\in I}$ is a basis of $V_I$, the map $L_I$ is a linear isomorphism. Therefore $L_I^{-1}:V_I\to \mathbb{R}^I$ is continuous, and
\begin{align*}
K_I=L_I\left([0,\infty)^I\right)
\end{align*}
is closed in $V_I$. Since $V_I$ is a finite-dimensional linear subspace of $\mathbb{R}^m$, it is closed in $\mathbb{R}^m$, so $K_I$ is closed in $\mathbb{R}^m$.
It remains to see that $K$ is the union of these $K_I$ over linearly independent subfamilies. Take any $z\in K$, and choose a representation
\begin{align*}
z=\sum_{j\in J} x_j a_j
\end{align*}
where $J\subset \{1,\dots,n\}$, $x_j>0$ for every $j\in J$, and $J$ has minimal cardinality among all such positive-support representations of $z$. If $(a_j)_{j\in J}$ were linearly dependent, there would exist [real numbers](/page/Real%20Numbers) $c_j$, not all zero, such that
\begin{align*}
\sum_{j\in J} c_j a_j=0.
\end{align*}
Replacing $(c_j)_{j\in J}$ by its negative if necessary, assume that $c_j>0$ for at least one $j\in J$. Define
\begin{align*}
\alpha:=\min\left\{\frac{x_j}{c_j}:j\in J \text{ and } c_j>0\right\}.
\end{align*}
Then $\alpha>0$, and the coefficients $x_j-\alpha c_j$ are nonnegative for every $j\in J$, with at least one of them equal to $0$. Moreover,
\begin{align*}
z=\sum_{j\in J}x_j a_j-\alpha\sum_{j\in J}c_j a_j=\sum_{j\in J}(x_j-\alpha c_j)a_j.
\end{align*}
This gives a representation of $z$ with strictly smaller positive support, contradicting the minimality of $J$. Hence $(a_j)_{j\in J}$ is linearly independent.
Let $\mathcal{I}$ denote the finite collection of all subsets $I\subset\{1,\dots,n\}$ for which $(a_i)_{i\in I}$ is linearly independent. Thus
\begin{align*}
K=\bigcup_{I\in\mathcal{I}} K_I.
\end{align*}
The collection $\mathcal{I}$ is finite, and each set in the union is closed. Therefore $K$ is closed.
[guided]
The point of this step is that separation theorems require a closed convex set, so we must not simply assume that the image of the nonnegative orthant under $A$ is closed. We prove closedness using finite generation.
For every subset $I\subset \{1,\dots,n\}$, define
\begin{align*}
K_I:=\left\{\sum_{i\in I}\lambda_i a_i : \lambda_i\geq 0 \text{ for every } i\in I\right\}.
\end{align*}
If the generators $(a_i)_{i\in I}$ are linearly independent, then they form a basis for their span $V_I:=\operatorname{span}\{a_i:i\in I\}$. Define
\begin{align*}
L_I:\mathbb{R}^I\to V_I,\quad (\lambda_i)_{i\in I}\mapsto \sum_{i\in I}\lambda_i a_i.
\end{align*}
This map is a linear isomorphism because the generators are a basis of $V_I$. In finite-dimensional normed spaces, a linear isomorphism has a continuous inverse. Since the orthant $[0,\infty)^I$ is closed in $\mathbb{R}^I$, its image under $L_I$ is closed in $V_I$:
\begin{align*}
K_I=L_I\left([0,\infty)^I\right).
\end{align*}
Also, $V_I$ is a finite-dimensional linear subspace of $\mathbb{R}^m$, hence closed in $\mathbb{R}^m$. Therefore $K_I$ is closed as a subset of $\mathbb{R}^m$.
Now we explain why every point of $K$ actually lies in one of these closed cones generated by a linearly independent subfamily. Take $z\in K$. By definition, $z$ has at least one representation as a nonnegative linear combination of the columns:
\begin{align*}
z=\sum_{j=1}^n x_j a_j,\qquad x_j\geq 0.
\end{align*}
Discard every zero coefficient, and among all remaining positive-support representations choose one with the smallest possible support:
\begin{align*}
z=\sum_{j\in J}x_j a_j,\qquad x_j>0 \text{ for every } j\in J.
\end{align*}
We claim that $(a_j)_{j\in J}$ is linearly independent. Suppose not. Then there are real numbers $c_j$, not all zero, such that
\begin{align*}
\sum_{j\in J}c_j a_j=0.
\end{align*}
After replacing every $c_j$ by $-c_j$ if needed, at least one coefficient $c_j$ is positive. Define
\begin{align*}
\alpha:=\min\left\{\frac{x_j}{c_j}:j\in J \text{ and } c_j>0\right\}.
\end{align*}
Because each $x_j>0$ and the minimum is taken over a nonempty finite set, $\alpha>0$. For indices with $c_j>0$, the definition of $\alpha$ gives $x_j-\alpha c_j\geq 0$. For indices with $c_j\leq 0$, we have $x_j-\alpha c_j\geq x_j>0$. Hence every new coefficient is nonnegative, and at least one coefficient becomes zero by the definition of the minimum. Since the dependence relation sums to zero, subtracting $\alpha$ times it does not change $z$:
\begin{align*}
z=\sum_{j\in J}x_j a_j-\alpha\sum_{j\in J}c_j a_j=\sum_{j\in J}(x_j-\alpha c_j)a_j.
\end{align*}
This is a nonnegative representation of $z$ with fewer positive coefficients, contradicting the minimal choice of $J$. Therefore the columns in $J$ are linearly independent.
We have shown that each $z\in K$ belongs to $K_I$ for some linearly independent index set $I$. Let $\mathcal{I}$ denote the finite collection of all subsets $I\subset\{1,\dots,n\}$ for which $(a_i)_{i\in I}$ is linearly independent. Hence
\begin{align*}
K=\bigcup_{I\in\mathcal{I}} K_I.
\end{align*}
There are only finitely many subsets $I\subset\{1,\dots,n\}$, so $\mathcal{I}$ is finite and this is a finite union of closed subsets of $\mathbb{R}^m$. Therefore $K$ is closed.
[/guided]
[/step]
[step:Separate $b$ from the cone when $b$ is not feasible]
If $b\in K$, then by the definition of $K$ there exists $x\in\mathbb{R}^n$ with $x_j\geq 0$ for every $j$ and $Ax=b$, so the first alternative holds.
Assume now that $b\notin K$. Since $K\subset\mathbb{R}^m$ is nonempty, closed, and convex, and $b\notin K$, the finite-dimensional strong separating hyperplane theorem applies (citing a result not yet in the wiki: Strong Separating Hyperplane Theorem). Hence there exist a nonzero vector $y\in\mathbb{R}^m$ and a real number $\alpha\in\mathbb{R}$ such that
\begin{align*}
b\cdot y<\alpha\leq z\cdot y
\end{align*}
for every $z\in K$.
Because $0\in K$, substituting $z=0$ gives $\alpha\leq 0$. We now prove that $z\cdot y\geq 0$ for every $z\in K$. If there were $z_0\in K$ with $z_0\cdot y<0$, then, since $K$ is a cone, $t z_0\in K$ for every $t\geq 0$. The separation inequality would give
\begin{align*}
\alpha\leq (t z_0)\cdot y=t(z_0\cdot y)
\end{align*}
for every $t\geq 0$, which is impossible for sufficiently large $t$ because $z_0\cdot y<0$. Therefore
\begin{align*}
z\cdot y\geq 0
\end{align*}
for every $z\in K$. Since $b\cdot y<\alpha\leq 0$, we also have $b\cdot y<0$.
[/step]
[step:Test the separating vector on the columns of $A$]
For each $j\in\{1,\dots,n\}$, the column $a_j$ belongs to $K$, because
\begin{align*}
a_j=Ae_j,
\end{align*}
where $e_j\in\mathbb{R}^n$ is the $j$-th standard basis vector and has nonnegative components. From the previous step, $a_j\cdot y\geq 0$ for every $j$.
The $j$-th component of $A^\top y$ is
\begin{align*}
(A^\top y)_j=a_j\cdot y.
\end{align*}
Therefore $(A^\top y)_j\geq 0$ for every $j\in\{1,\dots,n\}$. Together with $b\cdot y<0$, this proves the second alternative.
[/step]
[step:Show that the two alternatives cannot hold simultaneously]
Assume, for contradiction, that both alternatives hold. Then there exist $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$ such that $Ax=b$, $x_j\geq 0$ for every $j$, $(A^\top y)_j\geq 0$ for every $j$, and $b\cdot y<0$.
Using $Ax=b$ and the compatibility of transpose with the Euclidean inner product, we obtain
\begin{align*}
b\cdot y=(Ax)\cdot y=x\cdot (A^\top y)=\sum_{j=1}^n x_j(A^\top y)_j.
\end{align*}
Every summand $x_j(A^\top y)_j$ is nonnegative, so
\begin{align*}
b\cdot y\geq 0.
\end{align*}
This contradicts $b\cdot y<0$. Hence the alternatives are mutually exclusive.
The previous steps show that if $b\in K$ then the first alternative holds, while if $b\notin K$ then the second alternative holds. Since the two alternatives cannot both hold, exactly one of them holds.
[/step]