[proofplan]
We prove both implications inside ZF. First, if every set can be well-ordered, then the least element of each member of a family of nonempty sets gives a choice function. Conversely, assuming the [Axiom of Choice](/page/Axiom%20of%20Choice), we choose elements from a fixed set $X$ by transfinite recursion until no elements remain; [Hartogs' theorem](/theorems/4822) prevents this process from continuing injectively through an ordinal too large to inject into $X$. The stage at which each element is chosen then defines a well-ordering of $X$.
[/proofplan]
[step:Use a well-order of the union to choose least elements]
Assume the [Well-Ordering Theorem](/theorems/1227). Let $A$ be a set such that every $a \in A$ is nonempty, and define
\begin{align*}
U := \bigcup A.
\end{align*}
By the Well-Ordering Theorem, there is a binary relation $<_U$ on $U$ such that $(U, <_U)$ is well-ordered.
For each $a \in A$, the set $a$ is a nonempty subset of $U$, so $a$ has a unique $<_U$-least element. Define
\begin{align*}
f: A &\to U
\end{align*}
by declaring $f(a)$ to be the unique $<_U$-least element of $a$. Then $f(a) \in a$ for every $a \in A$, so $f$ is a choice function for $A$. Therefore the Axiom of Choice holds.
[guided]
Assume the Well-Ordering Theorem. We must prove the Axiom of Choice, so let $A$ be an arbitrary set whose elements are all nonempty sets. Define the union
\begin{align*}
U := \bigcup A.
\end{align*}
Every element $a \in A$ is a subset of $U$ by the definition of union.
The Well-Ordering Theorem applies to the set $U$, so there exists a binary relation $<_U$ on $U$ such that $(U, <_U)$ is well-ordered. The point of well-ordering $U$ is that it gives a uniform rule for choosing one element from each nonempty subset of $U$: take the least one.
Fix $a \in A$. Since $a$ is nonempty and $a \subset U$, the defining property of a well-order gives a $<_U$-least element of $a$. This element is unique because a strict linear order cannot have two distinct least elements in the same subset. Hence we may define a function
\begin{align*}
f: A &\to U
\end{align*}
by assigning to each $a \in A$ the unique $<_U$-least element of $a$.
By construction, $f(a) \in a$ for every $a \in A$. Thus $f$ is a choice function for the arbitrary family $A$ of nonempty sets. Therefore the Axiom of Choice holds.
[/guided]
[/step]
[step:Use choice to select from every nonempty subset of a fixed set]
Assume the Axiom of Choice, and let $X$ be a set. If $X = \varnothing$, then the empty relation well-orders $X$, so assume $X \neq \varnothing$.
Apply the Axiom of Choice to the set $\mathcal{P}(X) \setminus \{\varnothing\}$ of all nonempty subsets of $X$. Since
\begin{align*}
\bigcup\bigl(\mathcal{P}(X) \setminus \{\varnothing\}\bigr) = X,
\end{align*}
there exists a function
\begin{align*}
c: \mathcal{P}(X) \setminus \{\varnothing\} &\to X
\end{align*}
such that $c(Y) \in Y$ for every nonempty subset $Y \subset X$.
Let $\theta$ be an ordinal which does not inject into $X$, whose existence is guaranteed by [Hartogs' Theorem](/theorems/???). Define the marker
\begin{align*}
\ast := \{X\}.
\end{align*}
By the Axiom of Foundation, $\ast \notin X$, since $\ast \in X$ would imply $X \in \ast \in X$.
[guided]
Assume the Axiom of Choice, and fix an arbitrary set $X$. We want to build a well-ordering of $X$. If $X = \varnothing$, then there is nothing to choose: the empty relation is a well-ordering of the empty set. Hence assume $X \neq \varnothing$.
The Axiom of Choice gives a choice function for any set of nonempty sets. We apply it to the particular family
\begin{align*}
\mathcal{P}(X) \setminus \{\varnothing\},
\end{align*}
whose members are exactly the nonempty subsets of $X$. Its union is $X$, because every element of $X$ lies in the singleton subset containing it, and every member of every subset of $X$ lies in $X$. Therefore
\begin{align*}
\bigcup\bigl(\mathcal{P}(X) \setminus \{\varnothing\}\bigr) = X.
\end{align*}
The Axiom of Choice yields a function
\begin{align*}
c: \mathcal{P}(X) \setminus \{\varnothing\} &\to X
\end{align*}
such that $c(Y) \in Y$ for every nonempty subset $Y \subset X$.
This function $c$ is the rule that will choose the next unused element of $X$. To ensure that the recursive construction cannot continue forever, we use [Hartogs' Theorem](/theorems/???): there exists an ordinal $\theta$ which admits no injection into $X$.
We also need a value to record that the construction has stopped. Define
\begin{align*}
\ast := \{X\}.
\end{align*}
The marker $\ast$ is not an element of $X$. Indeed, if $\ast \in X$, then $X \in \ast$ by the definition of $\ast$, so $X \in \ast \in X$, contradicting the Axiom of Foundation. Thus $\ast$ is distinguishable from every element of $X$.
[/guided]
[/step]
[step:Recursively choose new elements until the set is exhausted]
For each ordinal $\alpha \in \theta$, let $\mathcal{H}_\alpha$ denote the set of all functions $h: \alpha \to X \cup \{\ast\}$. For $h \in \mathcal{H}_\alpha$, define the remaining set determined by $h$ by
\begin{align*}
S_{\alpha,h} := X \setminus \{h(\beta) : \beta \in \alpha \text{ and } h(\beta) \in X\}.
\end{align*}
Define the recursion operator at stage $\alpha$ by
\begin{align*}
G_\alpha: \mathcal{H}_\alpha &\to X \cup \{\ast\}\\
h &\mapsto
\begin{cases}
c(S_{\alpha,h}), & S_{\alpha,h} \neq \varnothing,\\
\ast, & S_{\alpha,h} = \varnothing.
\end{cases}
\end{align*}
This map is well-defined because $S_{\alpha,h}$ is a subset of $X$, and when it is nonempty the choice function $c$ is defined on it. By the [Transfinite Recursion Theorem](/theorems/???), applied to the ordinal $\theta$ and the stage rules $G_\alpha$, there is a unique function
\begin{align*}
y: \theta &\to X \cup \{\ast\}
\end{align*}
such that $y(\alpha) = G_\alpha(y|_\alpha)$ for every $\alpha \in \theta$.
For each ordinal $\alpha \in \theta$, define the remaining set before stage $\alpha$ by
\begin{align*}
R_\alpha := X \setminus \{y(\beta) : \beta \in \alpha \text{ and } y(\beta) \in X\}.
\end{align*}
Since $R_\alpha = S_{\alpha,y|_\alpha}$, the recursive equation gives
\begin{align*}
y(\alpha) :=
\begin{cases}
c(R_\alpha), & R_\alpha \neq \varnothing,\\
\ast, & R_\alpha = \varnothing.
\end{cases}
\end{align*}
There must exist some $\gamma \in \theta$ such that $R_\gamma = \varnothing$. If not, then $R_\alpha \neq \varnothing$ for every $\alpha \in \theta$, so $y(\alpha) = c(R_\alpha) \in X$ for every $\alpha \in \theta$. Moreover, if $\beta \in \alpha$, then $y(\beta)$ belongs to the set of previously chosen elements at stage $\alpha$, while $y(\alpha) \in R_\alpha$, so $y(\alpha) \neq y(\beta)$. Hence $y: \theta \to X$ would be injective, contradicting the choice of $\theta$.
[guided]
We now formalize the idea of repeatedly choosing a new element of $X$. The recursive value at stage $\alpha$ must be determined only by the earlier values, that is, by a function whose domain is $\alpha$. For each ordinal $\alpha \in \theta$, let $\mathcal{H}_\alpha$ denote the set of all functions $h: \alpha \to X \cup \{\ast\}$. For $h \in \mathcal{H}_\alpha$, define
\begin{align*}
S_{\alpha,h} := X \setminus \{h(\beta) : \beta \in \alpha \text{ and } h(\beta) \in X\}.
\end{align*}
This set is exactly the part of $X$ that remains after removing the earlier values recorded by $h$.
Define the stage rule
\begin{align*}
G_\alpha: \mathcal{H}_\alpha &\to X \cup \{\ast\}\\
h &\mapsto
\begin{cases}
c(S_{\alpha,h}), & S_{\alpha,h} \neq \varnothing,\\
\ast, & S_{\alpha,h} = \varnothing.
\end{cases}
\end{align*}
The map $G_\alpha$ is well-defined: $S_{\alpha,h} \subset X$, and if $S_{\alpha,h}$ is nonempty, then $S_{\alpha,h}$ lies in $\mathcal{P}(X) \setminus \{\varnothing\}$, so $c(S_{\alpha,h}) \in S_{\alpha,h} \subset X$; if $S_{\alpha,h}$ is empty, the value is the marker $\ast \in X \cup \{\ast\}$.
By the [Transfinite Recursion Theorem](/theorems/???), applied to the ordinal $\theta$ and the stage rules $G_\alpha$, there is a unique function
\begin{align*}
y: \theta &\to X \cup \{\ast\}
\end{align*}
such that $y(\alpha) = G_\alpha(y|_\alpha)$ for every $\alpha \in \theta$.
At stage $\alpha \in \theta$, define the remaining subset of $X$ before stage $\alpha$ by
\begin{align*}
R_\alpha := X \setminus \{y(\beta) : \beta \in \alpha \text{ and } y(\beta) \in X\}.
\end{align*}
Because $R_\alpha = S_{\alpha,y|_\alpha}$, the recursive equation says
\begin{align*}
y(\alpha) :=
\begin{cases}
c(R_\alpha), & R_\alpha \neq \varnothing,\\
\ast, & R_\alpha = \varnothing.
\end{cases}
\end{align*}
If $R_\alpha$ is nonempty, then $c(R_\alpha)$ is defined and lies in $R_\alpha$. If $R_\alpha$ is empty, the construction has exhausted $X$, and the value is the marker $\ast$.
We prove that exhaustion occurs before stage $\theta$. Suppose, toward a contradiction, that $R_\alpha \neq \varnothing$ for every $\alpha \in \theta$. Then the recursive definition gives $y(\alpha) = c(R_\alpha) \in X$ for every $\alpha \in \theta$.
We next check injectivity. Let $\beta, \alpha \in \theta$ with $\beta \in \alpha$. At stage $\alpha$, the value $y(\beta)$ has already been chosen, so $y(\beta)$ lies in the set removed from $X$ when forming $R_\alpha$. Since $y(\alpha) \in R_\alpha$, it follows that $y(\alpha) \neq y(\beta)$. Thus distinct ordinals below $\theta$ are sent to distinct elements of $X$, and $y: \theta \to X$ is injective.
This contradicts the defining property of $\theta$: no injection from $\theta$ into $X$ exists. Therefore there is some $\gamma \in \theta$ such that $R_\gamma = \varnothing$.
[/guided]
[/step]
[step:Order each element by the stage at which it is chosen]
Let $\gamma \in \theta$ be the least ordinal such that $R_\gamma = \varnothing$. For every $\alpha \in \gamma$, minimality gives $R_\alpha \neq \varnothing$, so $y(\alpha) = c(R_\alpha) \in X$. The restricted function
\begin{align*}
y|_\gamma: \gamma &\to X
\end{align*}
is injective: if $\beta,\alpha \in \gamma$ and $\beta \in \alpha$, then $y(\beta) \in X$ and $y(\beta)$ belongs to the set removed from $X$ in the definition of $R_\alpha$, while $y(\alpha) \in R_\alpha$; hence $y(\alpha) \neq y(\beta)$. Since any two distinct ordinals in $\gamma$ are comparable by membership, this proves injectivity. Since $R_\gamma = \varnothing$, every element of $X$ appears among the values $y(\alpha)$ with $\alpha \in \gamma$, so $y|_\gamma$ is surjective. Hence $y|_\gamma$ is a bijection.
Define the stage map
\begin{align*}
\tau: X &\to \gamma
\end{align*}
to be the inverse of $y|_\gamma$. Define a binary relation $<_X$ on $X$ by
\begin{align*}
x <_X z \quad \Longleftrightarrow \quad \tau(x) \in \tau(z).
\end{align*}
Since ordinal membership well-orders $\gamma$, the pullback relation $<_X$ linearly orders $X$.
[guided]
Let $\gamma \in \theta$ be the least ordinal such that $R_\gamma = \varnothing$. Such a least ordinal exists because ordinals are well-ordered by membership and we have already proved that at least one exhaustion stage exists.
For every $\alpha \in \gamma$, the minimality of $\gamma$ gives $R_\alpha \neq \varnothing$. Hence
\begin{align*}
y(\alpha) = c(R_\alpha) \in X.
\end{align*}
We now prove injectivity on $\gamma$ directly. Let $\beta,\alpha \in \gamma$ with $\beta \in \alpha$. Since $\beta \in \gamma$, the preceding paragraph gives $y(\beta) \in X$. In the definition of $R_\alpha$, every earlier value lying in $X$ is removed, so $y(\beta) \notin R_\alpha$. On the other hand, $y(\alpha) = c(R_\alpha) \in R_\alpha$. Therefore $y(\alpha) \neq y(\beta)$. Because membership linearly orders the ordinal $\gamma$, any two distinct elements of $\gamma$ are related in one of the two directions, and this proves that
\begin{align*}
y|_\gamma: \gamma &\to X
\end{align*}
is injective.
It is also surjective. The statement $R_\gamma = \varnothing$ says exactly that no element of $X$ remains after removing the elements $y(\alpha)$ for $\alpha \in \gamma$. Therefore every $x \in X$ is equal to $y(\alpha)$ for some $\alpha \in \gamma$. Hence $y|_\gamma$ is a bijection from $\gamma$ onto $X$.
Define the inverse bijection
\begin{align*}
\tau: X &\to \gamma
\end{align*}
by letting $\tau(x)$ be the unique stage $\alpha \in \gamma$ such that $y(\alpha) = x$. Now transfer the ordinal order on $\gamma$ to $X$ by declaring
\begin{align*}
x <_X z \quad \Longleftrightarrow \quad \tau(x) \in \tau(z).
\end{align*}
Because membership well-orders every ordinal, this pullback relation is a linear order on $X$.
[/guided]
[/step]
[step:Verify that the pulled-back order is a well-order]
Let $B \subset X$ be nonempty. Then
\begin{align*}
\tau[B] := \{\tau(b) : b \in B\}
\end{align*}
is a nonempty subset of the ordinal $\gamma$. Hence $\tau[B]$ has a least element $\alpha_0$ under ordinal membership. Let $b_0 \in B$ be the unique element satisfying $\tau(b_0) = \alpha_0$.
For any $b \in B$ with $b \neq b_0$, we have $\tau(b) \neq \alpha_0$, and the minimality of $\alpha_0$ in $\tau[B]$ gives $\alpha_0 \in \tau(b)$. Therefore $b_0 <_X b$. Thus every nonempty subset of $X$ has a $<_X$-least element, so $(X, <_X)$ is well-ordered. Since $X$ was arbitrary, the Well-Ordering Theorem holds.
[/step]