[proofplan]
We pass from pairwise comparisons to ranks in the pooled sample. Under the continuous null, the set of ranks occupied by the $X$-sample is a uniformly chosen $m$-element subset of $\{1,\dots,m+n\}$, so the rank sum is the sum of a simple random sample drawn without replacement from $\{1,\dots,m+n\}$. Its exact mean and variance give the exact moments of $U_{m,n}$ through the affine identity between $U_{m,n}$ and the rank sum. The normal limit follows from the finite-population [central limit theorem](/theorems/521) for simple random sampling without replacement, and the same affine identity transfers the limit to the Mann-Whitney statistic.
[/proofplan]
[step:Represent the Mann-Whitney statistic by the pooled rank sum]
Fix $m,n \in \mathbb{N}$ and set $N := m+n$. Let
\begin{align*}
Z_1,\dots,Z_N
\end{align*}
denote the pooled sample, where $Z_i := X_i$ for $1 \leq i \leq m$ and $Z_{m+j} := Y_j$ for $1 \leq j \leq n$. Since the common distribution is continuous, ties occur with probability zero. On the event of no ties, define the rank map
\begin{align*}
R: \{1,\dots,N\} &\to \{1,\dots,N\}
\end{align*}
by declaring $R(a)$ to be the rank of $Z_a$ among $Z_1,\dots,Z_N$ in increasing order.
Define the $X$-rank sum
\begin{align*}
S_X := \sum_{i=1}^{m} R(i).
\end{align*}
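For each $i \in \{1,\dots,m\}$, the rank $R(i)$ counts $X_i$ itself, the other $X$-observations below $X_i$, and the $Y$-observations below $X_i$. The first two contributions together give the rank of $X_i$ within the $X$-sample alone; over $i=1,\dots,m$ these within-sample ranks form a permutation of $1,\dots,m$ and therefore sum to $m(m+1)/2$. Hence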
\begin{align*}
S_X
&= \frac{m(m+1)}{2}
+ \sum_{i=1}^{m}\sum_{j=1}^{n} \mathbb{1}_{\{Y_j < X_i\}}.
\end{align*}
Since ties have probability zero,
\begin{align*}
\mathbb{1}_{\{Y_j < X_i\}} = 1-\mathbb{1}_{\{X_i < Y_j\}}
\end{align*}
almost surely for every pair $(i,j)$. Therefore
\begin{align*}
S_X
&= \frac{m(m+1)}{2} + mn - U_{m,n}(X,Y),
\end{align*}
or equivalently
\begin{align*}
U_{m,n}(X,Y)
= mn + \frac{m(m+1)}{2} - S_X.
\end{align*}
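As a quick sanity check of this identity, consider the following illustrative data (values chosen here for the example, not part of the original argument): $m=2$, $n=3$, $X=(1.5,\,4.0)$ and $Y=(2.0,\,3.0,\,5.0)$. The pooled order is $X_1<Y_1<Y_2<X_2<Y_3$, so $R(1)=1$ and $R(2)=4$. Assuming, as the identity above implies, that $U_{m,n}(X,Y)$ counts the pairs with $X_i<Y_j$, a direct count gives
\begin{align*}
S_X &= 1+4 = 5, \\
U_{2,3}(X,Y) &= \underbrace{3}_{X_1<Y_1,Y_2,Y_3} + \underbrace{1}_{X_2<Y_3} = 4, \\
mn+\frac{m(m+1)}{2}-S_X &= 6+3-5 = 4,
\end{align*}
in agreement with the affine identity.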
[/step]
[step:Identify the rank sum as a simple random sample sum]
Because $Z_1,\dots,Z_N$ are independent and identically distributed with a continuous common distribution, every ordering of the labels $\{1,\dots,N\}$ is equally likely. Consequently the random set
\begin{align*}
A_X := \{R(1),\dots,R(m)\}
\end{align*}
is uniformly distributed over all $m$-element subsets of $\{1,\dots,N\}$. Thus $S_X$ has the same distribution as
\begin{align*}
T_{m,N} := \sum_{r \in A} r,
\end{align*}
where $A$ is a uniformly chosen $m$-element subset of $\{1,\dots,N\}$.
[guided]
The purpose of introducing ranks is to remove the unknown continuous distribution. Since the pooled variables are independent and identically distributed and ties have probability zero, the only information relevant to ranks is the random ordering of the labels
\begin{align*}
1,\dots,m,m+1,\dots,N.
\end{align*}
Each of the $N!$ strict orderings of these labels has the same probability, because the variables are exchangeable.
Define
\begin{align*}
A_X := \{R(1),\dots,R(m)\}.
\end{align*}
This is the set of ranks occupied by the observations labelled as $X$-observations. For any fixed $m$-element subset $A_0 \subset \{1,\dots,N\}$, the number of label orderings for which $A_X=A_0$ is $m!n!$: the $m$ labels $1,\dots,m$ may be arranged among the ranks in $A_0$ in $m!$ ways, and the $n$ labels $m+1,\dots,N$ may be arranged among the complementary ranks in $n!$ ways. Since this number is independent of $A_0$, each $m$-element subset has the same probability. Therefore $A_X$ is uniformly distributed over all $m$-element subsets of $\{1,\dots,N\}$.
It follows that
\begin{align*}
S_X = \sum_{i=1}^{m} R(i)
\end{align*}
has the same distribution as
\begin{align*}
T_{m,N}:=\sum_{r \in A} r,
\end{align*}
where $A$ is a uniformly chosen $m$-element subset of $\{1,\dots,N\}$. This is exactly the model of a simple random sample without replacement from the finite population $\{1,\dots,N\}$.
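For a concrete illustration (with sizes chosen here purely as an example), take $m=n=2$, so that $N=4$. There are $4!=24$ label orderings and $\binom{4}{2}=6$ possible rank sets, each arising from $2!\,2!=4$ orderings and hence having probability $4/24=1/6$. The corresponding values of the rank sum are
\begin{align*}
\{1,2\}\mapsto 3,\quad
\{1,3\}\mapsto 4,\quad
\{1,4\}\mapsto 5,\quad
\{2,3\}\mapsto 5,\quad
\{2,4\}\mapsto 6,\quad
\{3,4\}\mapsto 7,
\end{align*}
so in this small case $T_{2,4}$ takes the values $3,4,5,6,7$ with probabilities $\tfrac16,\tfrac16,\tfrac26,\tfrac16,\tfrac16$, whatever the underlying continuous distribution.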
[/guided]
[/step]
[step:Compute the exact mean of the rank sum and of $U_{m,n}$]
For each $r \in \{1,\dots,N\}$, define the indicator [random variable](/page/Random%20Variable)
\begin{align*}
I_r := \mathbb{1}_{\{r \in A\}}.
\end{align*}
Since $A$ is uniformly distributed over all $m$-element subsets of $\{1,\dots,N\}$,
\begin{align*}
\mathbb{P}(r \in A)=\frac{m}{N}.
\end{align*}
Writing $T_{m,N}=\sum_{r=1}^{N} r\,I_r$ and applying linearity of expectation to this finite sum, we get
\begin{align*}
\mathbb{E}[S_X] = \mathbb{E}[T_{m,N}] = \sum_{r=1}^{N} r\,\mathbb{E}[I_r] = \frac{m}{N}\sum_{r=1}^{N} r = \frac{m}{N}\cdot \frac{N(N+1)}{2} = \frac{m(N+1)}{2}.
\end{align*}
Using $N=m+n$ and the affine identity for $U_{m,n}$,
\begin{align*}
\mathbb{E}[U_{m,n}(X,Y)] = mn+\frac{m(m+1)}{2}-\frac{m(N+1)}{2} = mn+\frac{m(m+1)}{2}-\frac{m(m+n+1)}{2} = \frac{mn}{2}.
\end{align*}
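As a small check of both formulas (sizes chosen here only for illustration), take $m=n=2$, so $N=4$. The six equally likely $2$-element subsets of $\{1,2,3,4\}$ have sums $3,4,5,5,6,7$, and therefore
\begin{align*}
\mathbb{E}[T_{2,4}] &= \frac{3+4+5+5+6+7}{6} = 5 = \frac{m(N+1)}{2}, \\
\mathbb{E}[U_{2,2}(X,Y)] &= mn+\frac{m(m+1)}{2}-5 = 4+3-5 = 2 = \frac{mn}{2}.
\end{align*}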
[/step]
[step:Compute the exact variance of the rank sum and of $U_{m,n}$]
The finite population $\{1,\dots,N\}$, from which the uniformly chosen $m$-element subset $A$ is drawn, has finite-population mean and finite-population variance
\begin{align*}
\mu_N := \frac{1}{N}\sum_{r=1}^{N} r = \frac{N+1}{2}
\end{align*}
and
\begin{align*}
\sigma_N^2
:= \frac{1}{N}\sum_{r=1}^{N}(r-\mu_N)^2
= \frac{N^2-1}{12}.
\end{align*}
The variance formula for a simple random sample sum without replacement gives
\begin{align*}
\operatorname{Var}(S_X)
= \operatorname{Var}(T_{m,N})
= \frac{m(N-m)}{N-1}\sigma_N^2.
\end{align*}
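For readers who want to see where this formula comes from, one standard derivation can be sketched with the indicators $I_r := \mathbb{1}_{\{r \in A\}}$, so that $T_{m,N}=\sum_{r=1}^{N} r\,I_r$. Since $\mathbb{P}(r \in A)=m/N$ and $\mathbb{P}(r,s \in A)=\frac{m(m-1)}{N(N-1)}$ for $r \neq s$,
\begin{align*}
\operatorname{Var}(I_r) &= \frac{m}{N}\Bigl(1-\frac{m}{N}\Bigr) = \frac{m(N-m)}{N^2}, \\
\operatorname{Cov}(I_r,I_s) &= \frac{m(m-1)}{N(N-1)}-\frac{m^2}{N^2} = -\frac{m(N-m)}{N^2(N-1)} \qquad (r \neq s).
\end{align*}
Using $\sum_{r \neq s} rs = \bigl(\sum_{r} r\bigr)^2-\sum_{r} r^2$ and $N\sum_{r} r^2-\bigl(\sum_{r} r\bigr)^2 = N^2\sigma_N^2$, this gives
\begin{align*}
\operatorname{Var}(T_{m,N})
&= \sum_{r=1}^{N} r^2\operatorname{Var}(I_r)+\sum_{r \neq s} rs\,\operatorname{Cov}(I_r,I_s) \\
&= \frac{m(N-m)}{N^2(N-1)}\Bigl[N\sum_{r=1}^{N} r^2-\Bigl(\sum_{r=1}^{N} r\Bigr)^2\Bigr]
= \frac{m(N-m)}{N-1}\sigma_N^2.
\end{align*}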
Substituting $N-m=n$ and $\sigma_N^2=(N^2-1)/12$ yields
\begin{align*}
\operatorname{Var}(S_X)
&= \frac{mn}{N-1}\cdot \frac{N^2-1}{12}
= \frac{mn(N+1)}{12}.
\end{align*}
The identity $U_{m,n}(X,Y)=mn+m(m+1)/2-S_X$ expresses $U_{m,n}(X,Y)$ as an affine function of $S_X$ with slope $-1$, and neither adding a constant nor multiplying by $-1$ changes the variance. Therefore
\begin{align*}
\operatorname{Var}(U_{m,n}(X,Y))
= \operatorname{Var}(S_X)
= \frac{mn(m+n+1)}{12}.
\end{align*}
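As a numerical check (with sizes chosen here only for illustration), take $m=n=2$, so $N=4$. The six equally likely $2$-element subsets of $\{1,2,3,4\}$ have sums $3,4,5,5,6,7$ with mean $5$, so
\begin{align*}
\operatorname{Var}(T_{2,4}) &= \frac{(-2)^2+(-1)^2+0^2+0^2+1^2+2^2}{6} = \frac{10}{6} = \frac{5}{3}, \\
\frac{mn(m+n+1)}{12} &= \frac{2\cdot 2\cdot 5}{12} = \frac{5}{3},
\end{align*}
which matches the closed form.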
[/step]
[step:Apply the finite-population central limit theorem to the centered rank sum]
Let
\begin{align*}
\sigma_{m,n}^2 := \frac{mn(N+1)}{12}.
\end{align*}
Since $m/N \to \lambda \in (0,1)$ as $N \to \infty$, both sample sizes satisfy $m \to \infty$ and $N-m=n \to \infty$, and the sampling fraction $m/N$ stays bounded away from $0$ and $1$. The finite-population [central limit theorem](/theorems/1848) for simple random sampling without replacement applied to the population $\{1,\dots,N\}$ gives
\begin{align*}
\frac{S_X-\mathbb{E}[S_X]}{\sigma_{m,n}}
\xrightarrow{d} \mathcal{N}(0,1).
\end{align*}
Here the theorem applies because the centered population values
\begin{align*}
r-\frac{N+1}{2}, \qquad 1 \leq r \leq N,
\end{align*}
have variance $(N^2-1)/12$: their maximal squared deviation is of order $N^2$, while their total sum of squared deviations, $N(N^2-1)/12$, is of order $N^3$. Hence, under $m/N \to \lambda \in (0,1)$, the usual Lindeberg condition for sampling without replacement is satisfied. The result invoked here, the finite-population central limit theorem for simple random sampling without replacement, is not yet linked in the wiki.
[guided]
We now need a central limit theorem for the rank sum
\begin{align*}
S_X=\sum_{r\in A} r,
\end{align*}
where $A$ is a uniformly chosen $m$-element subset of $\{1,\dots,N\}$. This is not a sum of independent terms, because the ranks are sampled without replacement. The appropriate substitute for the central limit theorem for independent summands is the finite-population central limit theorem for simple random sampling without replacement.
Define the normalizing variance
\begin{align*}
\sigma_{m,n}^2 := \frac{mn(N+1)}{12}.
\end{align*}
From the previous step, this is exactly $\operatorname{Var}(S_X)$. The finite population is
\begin{align*}
\{1,\dots,N\},
\end{align*}
with finite-population mean
\begin{align*}
\mu_N=\frac{N+1}{2}
\end{align*}
and finite-population variance
\begin{align*}
\sigma_N^2=\frac{N^2-1}{12}.
\end{align*}
The centered population values are
\begin{align*}
a_{r,N}:=r-\mu_N, \qquad 1 \leq r \leq N.
\end{align*}
Their largest absolute size is $(N-1)/2 \leq N$, while their total squared size is
\begin{align*}
\sum_{r=1}^{N} a_{r,N}^2
= N\sigma_N^2
= \frac{N(N^2-1)}{12}.
\end{align*}
Hence no single population value contributes a non-negligible proportion of the total squared mass:
\begin{align*}
\frac{\max_{1\leq r\leq N} a_{r,N}^2}{\sum_{r=1}^{N} a_{r,N}^2}
\leq \frac{N^2}{N(N^2-1)/12}
= \frac{12N}{N^2-1}
\to 0.
\end{align*}
Together with $m/N \to \lambda \in (0,1)$, this is the standard Lindeberg-type hypothesis in the finite-population central limit theorem for sampling without replacement. Therefore that theorem gives
\begin{align*}
\frac{S_X-\mathbb{E}[S_X]}{\sigma_{m,n}}
\xrightarrow{d} \mathcal{N}(0,1).
\end{align*}
This is precisely the normal approximation for the rank-sum statistic. The role of the condition $\lambda \in (0,1)$ is to ensure that both the sampled and unsampled portions of the finite population remain macroscopically large; if one sample size were negligible compared with $N$, the variance scale and limiting argument would be different.
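To get a feel for how quickly the negligibility ratio decays, it can be computed exactly; the value $N=100$ below is chosen here only for illustration. Since $\max_{r}|a_{r,N}|=(N-1)/2$,
\begin{align*}
\frac{\max_{1\leq r\leq N} a_{r,N}^2}{\sum_{r=1}^{N} a_{r,N}^2}
= \frac{(N-1)^2/4}{N(N^2-1)/12}
= \frac{3(N-1)}{N(N+1)},
\qquad
\left.\frac{3(N-1)}{N(N+1)}\right|_{N=100} = \frac{297}{10100} \approx 0.029.
\end{align*}
The ratio is of order $1/N$, so already for moderate pooled sample sizes no single rank dominates the total squared mass.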
[/guided]
[/step]
[step:Transfer the normal limit from the rank sum to the Mann-Whitney statistic]
From the affine relation
\begin{align*}
U_{m,n}(X,Y)=mn+\frac{m(m+1)}{2}-S_X
\end{align*}
and the identity
\begin{align*}
\mathbb{E}[U_{m,n}(X,Y)]
= mn+\frac{m(m+1)}{2}-\mathbb{E}[S_X],
\end{align*}
we obtain
\begin{align*}
U_{m,n}(X,Y)-\mathbb{E}[U_{m,n}(X,Y)]
= -(S_X-\mathbb{E}[S_X]).
\end{align*}
Since
\begin{align*}
\operatorname{Var}(U_{m,n}(X,Y))
= \operatorname{Var}(S_X)
= \sigma_{m,n}^2,
\end{align*}
the rank-sum limit gives
\begin{align*}
\frac{U_{m,n}(X,Y)-mn/2}{\sqrt{mn(m+n+1)/12}}
=
-\frac{S_X-\mathbb{E}[S_X]}{\sigma_{m,n}}
\xrightarrow{d} \mathcal{N}(0,1),
\end{align*}
because the standard normal distribution is symmetric about $0$. This proves the asserted mean, variance, and asymptotic normality.
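As an illustration of how the limit is used in practice, suppose (hypothetically; these numbers are not from the original text) that $m=n=10$ and the observed value is $U_{10,10}=23$. Then
\begin{align*}
\mathbb{E}[U_{10,10}] = \frac{10\cdot 10}{2} = 50,
\qquad
\operatorname{Var}(U_{10,10}) = \frac{10\cdot 10\cdot 21}{12} = 175,
\qquad
\frac{23-50}{\sqrt{175}} \approx -2.04.
\end{align*}
Under the normal approximation, the standardized value $-2.04$ lies just outside the interval $[-1.96,\,1.96]$, so a two-sided test at the $5\%$ level would reject the null in this hypothetical case. For sample sizes this small the approximation is only rough, and the exact distribution of the rank sum may be preferable.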
[/step]