[proofplan]
The proof separates the deterministic displacement of the statistic from its random fluctuation. First we compute the local expansion of $\mathbb{P}(X<Y_\theta)$ at $\theta=0$ and identify its derivative as the square-density integral
\begin{align*}
\int_{\mathbb{R}} f(x)^2\,d\mathcal{L}^1(x).
\end{align*}
Multiplying this probability shift by the product of the two sample sizes and dividing by the usual null standard deviation gives the displayed noncentrality parameter. The deterministic expansion uses absolute continuity and the assumption $f\in L^2(\mathbb{R})$. The theorem's explicit Hájek projection hypothesis supplies both the uniform projection approximation and the projection [central limit theorem](/theorems/521) along the local alternatives, so the centered random part is asymptotically standard normal.
[/proofplan]
[step:Express the expectation of the Mann-Whitney statistic as a shifted comparison probability]
For each $N$, let $(\Omega_N,\mathcal{F}_N,\mathbb{P}_N)$ denote the probability space carrying the triangular-array sample at level $N$. Let $m_N,n_N\in\mathbb{N}$ denote the sample sizes, so that $N=m_N+n_N$. Let $X_{N,1},\dots,X_{N,m_N}:\Omega_N\to\mathbb{R}$ denote independent real-valued random variables with distribution function $F$, and let $Y_{N,1},\dots,Y_{N,n_N}:\Omega_N\to\mathbb{R}$ denote independent real-valued random variables with distribution function $y\mapsto F(y-\theta_N)$, independent of the $X$-sample. Let $\mathcal{L}^1$ denote one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $\mathbb{R}$, and let $\mathcal{L}^2$ denote two-dimensional Lebesgue measure on $\mathbb{R}^2$. For the one-pair comparison calculation, fix an auxiliary probability space $(\Omega,\mathcal{F},\mathbb{P})$ carrying independent real-valued variables with the stated one-dimensional laws. For $\theta\in\mathbb{R}$, let $Y_\theta:\Omega\to\mathbb{R}$ denote a real-valued [random variable](/page/Random%20Variable) with distribution function $y\mapsto F(y-\theta)$, and let $X:\Omega\to\mathbb{R}$ denote a real-valued random variable with distribution function $F$, independent of $Y_\theta$. The density of $Y_\theta$ with respect to $\mathcal{L}^1$ is the measurable map $f_\theta:\mathbb{R}\to[0,\infty)$ defined by $f_\theta(y)=f(y-\theta)$. Since the laws of $X$ and $Y_\theta$ are absolutely continuous with respect to $\mathcal{L}^1$, their product law is absolutely continuous with respect to $\mathcal{L}^2$. The diagonal $D:=\{(x,y)\in\mathbb{R}^2:x=y\}$ satisfies $\mathcal{L}^2(D)=0$, so
\begin{align*}
\mathbb{P}(X=Y_\theta)=0.
\end{align*}
Since $U_{m_N,n_N}$ counts the cross-sample pairs $(i,j)$ with $X_{N,i}<Y_{N,j}$, linearity of expectation and the identical distribution of all cross-sample pairs give
\begin{align*}
\mathbb{E}[U_{m_N,n_N}]=\sum_{i=1}^{m_N}\sum_{j=1}^{n_N}\mathbb{P}(X_{N,i}<Y_{N,j})=m_Nn_N\,p(\theta_N),
\end{align*}
where the comparison probability $p:\mathbb{R}\to[0,1]$ is defined by
\begin{align*}
p(\theta):=\mathbb{P}(X<Y_\theta)=\int_{\mathbb{R}}F(y)f(y-\theta)\,d\mathcal{L}^1(y).
\end{align*}
Using the substitution $x=y-\theta$, so that $y=x+\theta$ and $d\mathcal{L}^1(y)=d\mathcal{L}^1(x)$, this becomes
\begin{align*}
p(\theta)=\int_{\mathbb{R}}F(x+\theta)f(x)\,d\mathcal{L}^1(x).
\end{align*}
At $\theta=0$, the variables $X$ and $Y_0$ are independent and identically distributed with continuous distribution function $F$. Hence
\begin{align*}
\mathbb{P}(X<Y_0)=\mathbb{P}(Y_0<X).
\end{align*}
Since $\mathbb{P}(X=Y_0)=0$, the two strict-ordering events have probabilities summing to $1$, and therefore
\begin{align*}
p(0)=\mathbb{P}(X<Y_0)=\frac{1}{2}.
\end{align*}
[guided]
For each $N$, let $(\Omega_N,\mathcal{F}_N,\mathbb{P}_N)$ be the probability space carrying the triangular-array sample at level $N$. The sample sizes are denoted by $m_N$ and $n_N$, and they satisfy $N=m_N+n_N$. The random variables $X_{N,1},\dots,X_{N,m_N}:\Omega_N\to\mathbb{R}$ have distribution function $F$, while $Y_{N,1},\dots,Y_{N,n_N}:\Omega_N\to\mathbb{R}$ have distribution function $y\mapsto F(y-\theta_N)$; the two samples are independent. For the one-pair comparison calculation, we use an auxiliary probability space $(\Omega,\mathcal{F},\mathbb{P})$ carrying independent real-valued variables with the required one-dimensional laws. The statistic $U_{m_N,n_N}$ counts ordered cross-sample pairs for which the $X$ observation is smaller than the $Y$ observation. Because every pair has the same distribution, linearity of expectation reduces the mean of the whole statistic to one comparison probability.
Let $\mathcal{L}^1$ denote one-dimensional Lebesgue measure on $\mathbb{R}$, and let $\mathcal{L}^2$ denote two-dimensional Lebesgue measure on $\mathbb{R}^2$. For a shift parameter $\theta\in\mathbb{R}$, define $Y_\theta:\Omega\to\mathbb{R}$ to have distribution function $y\mapsto F(y-\theta)$. Since $F$ is absolutely continuous with density $f$, the shifted density is the measurable map $f_\theta:\mathbb{R}\to[0,\infty)$ defined by $f_\theta(y)=f(y-\theta)$. Let $X:\Omega\to\mathbb{R}$ have distribution function $F$, independently of $Y_\theta$. The laws of $X$ and $Y_\theta$ are absolutely continuous, so their product law is absolutely continuous with respect to $\mathcal{L}^2$. Because the diagonal $D:=\{(x,y)\in\mathbb{R}^2:x=y\}$ has $\mathcal{L}^2(D)=0$, ties have probability zero, and the strict comparison $X<Y_\theta$ has no boundary correction.
Define $p:\mathbb{R}\to[0,1]$ by $p(\theta)=\mathbb{P}(X<Y_\theta)$.
Conditioning on the value of $Y_\theta$ and using its density gives
\begin{align*}
p(\theta)
=\int_{\mathbb{R}}\mathbb{P}(X<y)\,f_\theta(y)\,d\mathcal{L}^1(y)
=\int_{\mathbb{R}}F(y)f(y-\theta)\,d\mathcal{L}^1(y).
\end{align*}
Now apply the translation substitution $x=y-\theta$. The one-dimensional [Lebesgue measure is translation invariant](/theorems/4911), so $d\mathcal{L}^1(y)=d\mathcal{L}^1(x)$, and the domain $\mathbb{R}$ is unchanged under translation. Hence
\begin{align*}
p(\theta)=\int_{\mathbb{R}}F(x+\theta)f(x)\,d\mathcal{L}^1(x).
\end{align*}
Finally, by linearity of expectation and the identity $\mathbb{E}[\mathbb{1}_A]=\mathbb{P}(A)$ for an event $A$,
\begin{align*}
\mathbb{E}[U_{m_N,n_N}]=\sum_{i=1}^{m_N}\sum_{j=1}^{n_N}\mathbb{E}[\mathbb{1}_{\{X_{N,i}<Y_{N,j}\}}]=\sum_{i=1}^{m_N}\sum_{j=1}^{n_N}\mathbb{P}(X_{N,i}<Y_{N,j})=m_Nn_N\,p(\theta_N).
\end{align*}
At the null shift $\theta=0$, the two variables are independent and identically distributed with continuous distribution function $F$, so symmetry gives
\begin{align*}
\mathbb{P}(X<Y_0)=\mathbb{P}(Y_0<X).
\end{align*}
Since ties have probability zero, these two probabilities sum to $1$, and therefore
\begin{align*}
p(0)=\frac{1}{2}.
\end{align*}
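As an illustrative check that is not part of the proof, suppose $F=\Phi$ is the standard normal distribution function; this choice of $F$ is an assumption made only for this example. Then $Y_\theta$ has the same law as $Y_0+\theta$, and $X-Y_0$ is centered normal with variance $2$, so
\begin{align*}
p(\theta)=\mathbb{P}(X-Y_0<\theta)=\Phi\!\left(\frac{\theta}{\sqrt{2}}\right),
\qquad
p(0)=\Phi(0)=\frac{1}{2}.
\end{align*}
In this special case the mean formula reads $\mathbb{E}[U_{m_N,n_N}]=m_Nn_N\,\Phi(\theta_N/\sqrt{2})$, which agrees with the general identity $\mathbb{E}[U_{m_N,n_N}]=m_Nn_N\,p(\theta_N)$.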
[/guided]
[/step]
[step:Differentiate the shifted comparison probability at the null]
For $\theta>0$, define the translation-average operator $A_\theta:L^2(\mathbb{R})\to L^2(\mathbb{R})$ by
\begin{align*}
(A_\theta g)(x):=\frac{1}{\theta}\int_{(0,\theta)} g(x+s)\,d\mathcal{L}^1(s).
\end{align*}
For $\theta<0$, define $A_\theta:L^2(\mathbb{R})\to L^2(\mathbb{R})$ by
\begin{align*}
(A_\theta g)(x):=\frac{1}{|\theta|}\int_{(\theta,0)} g(x+s)\,d\mathcal{L}^1(s).
\end{align*}
Since $F$ is absolutely continuous with derivative $f$ $\mathcal{L}^1$-a.e., the [Fundamental Theorem of Calculus](/theorems/632) for absolutely continuous functions gives, for $\mathcal{L}^1$-a.e. $x\in\mathbb{R}$,
\begin{align*}
\frac{F(x+\theta)-F(x)}{\theta}=(A_\theta f)(x)
\end{align*}
for both signs of $\theta$. Indeed, when $\theta<0$, the identity follows from $F(x+\theta)-F(x)=-\int_{(\theta,0)} f(x+s)\,d\mathcal{L}^1(s)$ after dividing by $\theta=-|\theta|$.
For each fixed $g\in L^2(\mathbb{R})$, the map $s\mapsto g(\cdot+s)$ from the relevant finite interval into $L^2(\mathbb{R})$ is strongly measurable because translations are strongly continuous on $L^2(\mathbb{R})$. To justify this standard fact here, approximate $g$ in $L^2(\mathbb{R})$ by a [simple function](/page/Simple%20Function) $q=\sum_{k=1}^K a_k\mathbb{1}_{E_k}$, where $K\in\mathbb{N}$, $a_k\in\mathbb{R}$, and each $E_k\subset\mathbb{R}$ is a finite union of bounded intervals; such approximations follow from truncation and regularity of one-dimensional Lebesgue measure. For such $q$, translation continuity follows from $\mathcal{L}^1((E_k+s)\triangle E_k)\to0$ as $s\to0$ for each finite union of bounded intervals $E_k$. The triangle inequality and translation-invariance of the $L^2(\mathbb{R})$ norm then give $\|g(\cdot+s)-g\|_{L^2(\mathbb{R})}\to0$ as $s\to0$. Hence the displayed integrals define Bochner averages in $L^2(\mathbb{R})$.
Applying [Jensen's inequality](/theorems/9) to the convex function $t\mapsto t^2$ and to the normalized measure $|\theta|^{-1}\mathcal{L}^1$ on the relevant averaging interval gives, for $\mathcal{L}^1$-a.e. $x\in\mathbb{R}$, the pointwise estimate $|(A_\theta f)(x)|^2\leq|\theta|^{-1}\int|f(x+s)|^2\,d\mathcal{L}^1(s)$, where the integral is taken over the same averaging interval. Integrating this estimate over $x\in\mathbb{R}$, using [Fubini's Theorem](/theorems/2961) in its nonnegative-integrand form, and noting that each inner $x$-integral equals $\|f\|_{L^2(\mathbb{R})}^2$ by translation invariance yields
\begin{align*}
\|A_\theta f\|_{L^2(\mathbb{R})}\leq \|f\|_{L^2(\mathbb{R})}.
\end{align*}
Moreover, the norm inequality for Bochner integrals, applied to the average $A_\theta f-f=|\theta|^{-1}\int_{I_\theta}\bigl(f(\cdot+s)-f\bigr)\,d\mathcal{L}^1(s)$, gives
\begin{align*}
\|A_\theta f-f\|_{L^2(\mathbb{R})}
\leq \frac{1}{|\theta|}\int_{I_\theta}\|f(\cdot+s)-f\|_{L^2(\mathbb{R})}\,d\mathcal{L}^1(s),
\end{align*}
where $I_\theta=(0,\theta)$ if $\theta>0$ and $I_\theta=(\theta,0)$ if $\theta<0$. By the strong continuity of translations, $\sup_{s\in I_\theta}\|f(\cdot+s)-f\|_{L^2(\mathbb{R})}\to0$ as $\theta\to0$; the right-hand side is at most this supremum, so $A_\theta f\to f$ in $L^2(\mathbb{R})$.
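Subtracting the integral representations of $p(\theta)$ and $p(0)$, dividing by $\theta\neq0$, and inserting the a.e. identity above expresses the difference quotient as an $L^2$ [inner product](/page/Inner%20Product) with the fixed function $f\in L^2(\mathbb{R})$: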
\begin{align*}
\frac{p(\theta)-p(0)}{\theta}=\int_{\mathbb{R}}(A_\theta f)(x)f(x)\,d\mathcal{L}^1(x).
\end{align*}
This pairing converges to $\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x)$ as $\theta\to0$ by the Cauchy-Schwarz inequality in $L^2(\mathbb{R})$, which gives
\begin{align*}
\left|\int_{\mathbb{R}}(A_\theta f)(x)f(x)\,d\mathcal{L}^1(x)-\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x)\right|
\leq \|A_\theta f-f\|_{L^2(\mathbb{R})}\|f\|_{L^2(\mathbb{R})},
\end{align*}
and the right-hand side tends to $0$ because $A_\theta f\to f$ in $L^2(\mathbb{R})$.
Therefore
\begin{align*}
p(\theta)=\frac{1}{2}+\theta\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x)+o(\theta)
\end{align*}
as $\theta\to 0$.
[guided]
The point of this step is to differentiate $p(\theta)$ without assuming that $f$ has a pointwise derivative. We already rewrote the comparison probability as
\begin{align*}
p(\theta)=\int_{\mathbb{R}}F(x+\theta)f(x)\,d\mathcal{L}^1(x).
\end{align*}
Thus the derivative at $0$ is controlled by the difference quotient of $F$, paired against the fixed $L^2$ function $f$.
For $\theta>0$, define $A_\theta:L^2(\mathbb{R})\to L^2(\mathbb{R})$ by
\begin{align*}
(A_\theta g)(x):=\frac{1}{\theta}\int_{(0,\theta)}g(x+s)\,d\mathcal{L}^1(s).
\end{align*}
For $\theta<0$, define $A_\theta:L^2(\mathbb{R})\to L^2(\mathbb{R})$ by
\begin{align*}
(A_\theta g)(x):=\frac{1}{|\theta|}\int_{(\theta,0)}g(x+s)\,d\mathcal{L}^1(s).
\end{align*}
This split is necessary because Lebesgue integration over an interval is not oriented. The sign is instead carried by the numerator $F(x+\theta)-F(x)$. Since $F$ is absolutely continuous and has derivative $f$ $\mathcal{L}^1$-a.e., the [Fundamental Theorem of Calculus](/theorems/632) for absolutely continuous functions gives, for $\mathcal{L}^1$-a.e. $x$,
\begin{align*}
\frac{F(x+\theta)-F(x)}{\theta}=(A_\theta f)(x)
\end{align*}
for both $\theta>0$ and $\theta<0$.
The operators $A_\theta$ are Bochner averages of translations in $L^2(\mathbb{R})$. Indeed, for each fixed $g\in L^2(\mathbb{R})$, the map $s\mapsto g(\cdot+s)$ from the averaging interval into $L^2(\mathbb{R})$ is strongly measurable because translations are strongly continuous on $L^2(\mathbb{R})$. We verify that continuity directly. Given $g\in L^2(\mathbb{R})$ and $\varepsilon>0$, choose a simple function $q=\sum_{k=1}^K a_k\mathbb{1}_{E_k}$, where $K\in\mathbb{N}$, $a_k\in\mathbb{R}$, and each $E_k\subset\mathbb{R}$ is a finite union of bounded intervals, such that $\|g-q\|_{L^2(\mathbb{R})}<\varepsilon$. This approximation is obtained by truncating $g$, approximating the truncated function by simple functions, and using regularity of $\mathcal{L}^1$ to replace the measurable level sets by finite unions of bounded intervals. For each such set $E_k$, the symmetric difference satisfies $\mathcal{L}^1((E_k+s)\triangle E_k)\to0$ as $s\to0$, so $\|q(\cdot+s)-q\|_{L^2(\mathbb{R})}\to0$. Translation-invariance gives $\|g(\cdot+s)-q(\cdot+s)\|_{L^2(\mathbb{R})}=\|g-q\|_{L^2(\mathbb{R})}$, and the triangle inequality then gives $\limsup_{s\to0}\|g(\cdot+s)-g\|_{L^2(\mathbb{R})}\leq2\varepsilon$; since $\varepsilon>0$ was arbitrary, $\|g(\cdot+s)-g\|_{L^2(\mathbb{R})}\to0$. Applying [Jensen's inequality](/theorems/9) to the convex function $t\mapsto t^2$ and to the probability measure $|\theta|^{-1}\mathcal{L}^1$ on the relevant interval gives, for $\mathcal{L}^1$-a.e. $x$, the pointwise estimate $|(A_\theta f)(x)|^2\leq|\theta|^{-1}\int|f(x+s)|^2\,d\mathcal{L}^1(s)$, where the integral is taken over the same averaging interval. We then integrate in $x$ and use [Fubini's Theorem](/theorems/2961) in its nonnegative-integrand form, whose nonnegativity hypothesis is satisfied by the squared integrand, to interchange the $x$- and $s$-integrals; by translation invariance each inner $x$-integral equals $\|f\|_{L^2(\mathbb{R})}^2$. This gives
\begin{align*}
\|A_\theta f\|_{L^2(\mathbb{R})}\leq \|f\|_{L^2(\mathbb{R})}.
\end{align*}
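Moreover, the norm inequality for Bochner integrals, applied to the average $A_\theta f-f=|\theta|^{-1}\int_{I_\theta}\bigl(f(\cdot+s)-f\bigr)\,d\mathcal{L}^1(s)$, gives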
\begin{align*}
\|A_\theta f-f\|_{L^2(\mathbb{R})}
\leq \frac{1}{|\theta|}\int_{I_\theta}\|f(\cdot+s)-f\|_{L^2(\mathbb{R})}\,d\mathcal{L}^1(s),
\end{align*}
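where $I_\theta=(0,\theta)$ if $\theta>0$ and $I_\theta=(\theta,0)$ if $\theta<0$. By the strong continuity of translations, $\sup_{s\in I_\theta}\|f(\cdot+s)-f\|_{L^2(\mathbb{R})}\to0$ as $\theta\to0$; the right-hand side is at most this supremum, so $A_\theta f\to f$ in $L^2(\mathbb{R})$.
Subtracting the representations of $p(\theta)$ and $p(0)$, dividing by $\theta\neq0$, and using the a.e. identity for the difference quotient of $F$ gives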
\begin{align*}
\frac{p(\theta)-p(0)}{\theta}=\int_{\mathbb{R}}(A_\theta f)(x)f(x)\,d\mathcal{L}^1(x).
\end{align*}
Because $A_\theta f\to f$ in $L^2(\mathbb{R})$ and $f\in L^2(\mathbb{R})$, the $L^2(\mathbb{R})$ inner-product estimate gives
\begin{align*}
\left|\int_{\mathbb{R}}(A_\theta f)(x)f(x)\,d\mathcal{L}^1(x)-\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x)\right|
\leq \|A_\theta f-f\|_{L^2(\mathbb{R})}\|f\|_{L^2(\mathbb{R})}\to0.
\end{align*}
Since $p(0)=1/2$, this proves the two-sided expansion
\begin{align*}
p(\theta)=\frac{1}{2}+\theta\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x)+o(\theta)
\end{align*}
as $\theta\to 0$.
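The following worked example is only a sketch under an assumption that is not in the theorem: take $f=\mathbb{1}_{(0,1)}$, the uniform density on $(0,1)$, which lies in $L^2(\mathbb{R})$ but is discontinuous. Then $F(x)=\min\{1,\max\{0,x\}\}$, and for $0\leq\theta\leq1$ a direct computation gives
\begin{align*}
p(\theta)
=\int_{(0,1-\theta)}(x+\theta)\,d\mathcal{L}^1(x)+\int_{(1-\theta,1)}1\,d\mathcal{L}^1(x)
=\frac{1-\theta^2}{2}+\theta
=\frac{1}{2}+\theta-\frac{\theta^2}{2},
\end{align*}
while for $-1\leq\theta\leq0$,
\begin{align*}
p(\theta)=\int_{(-\theta,1)}(x+\theta)\,d\mathcal{L}^1(x)=\frac{(1+\theta)^2}{2}=\frac{1}{2}+\theta+\frac{\theta^2}{2}.
\end{align*}
Hence $p(\theta)=\tfrac12+\theta-\tfrac12\theta|\theta|$ for $|\theta|\leq1$. Since $\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x)=1$ here, the remainder is $-\tfrac12\theta|\theta|=o(\theta)$, in agreement with the two-sided expansion even though $f$ has jumps.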
[/guided]
[/step]
[step:Compute the deterministic mean shift after standardization]
Set
\begin{align*}
I_f:=\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x).
\end{align*}
The local shift is
\begin{align*}
\theta_N=\frac{h}{\sqrt{N}}.
\end{align*}
Define the remainder function $r:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
r(\theta):=p(\theta)-\frac{1}{2}-\theta I_f.
\end{align*}
The previous step says $r(\theta)/\theta\to0$ as $\theta\to0$ with $\theta\neq0$, and also $r(0)=0$. With this displayed choice of $\theta_N$, we have
\begin{align*}
\mathbb{E}[U_{m_N,n_N}]-\frac{m_Nn_N}{2}
=m_Nn_N\theta_N I_f+m_Nn_N r(\theta_N).
\end{align*}
Divide by
\begin{align*}
\sigma_N:=\sqrt{\frac{m_Nn_N(N+1)}{12}}.
\end{align*}
Substituting $\theta_N=h/\sqrt{N}$ and the definition of $\sigma_N$ gives the exact identity
\begin{align*}
\frac{m_Nn_N\theta_N}{\sigma_N}=h\sqrt{12}\sqrt{\frac{m_Nn_N}{N(N+1)}}.
\end{align*}
Here $N=m_N+n_N$ is the total sample size, which enters through the factor $N+1$ in the null variance. Since
\begin{align*}
\frac{m_N}{N}\to\lambda, \qquad \frac{n_N}{N}\to1-\lambda,
\end{align*}
this quantity satisfies
\begin{align*}
\frac{m_Nn_N\theta_N}{\sigma_N}\to h\sqrt{12\lambda(1-\lambda)}.
\end{align*}
The remainder term vanishes after standardization. If $h=0$, then $\theta_N=0$ for every $N$ and $r(\theta_N)=0$. If $h\neq0$, then $\theta_N\neq0$ and $\theta_N\to0$, so $\varepsilon_N:=r(\theta_N)/\theta_N\to0$ and $r(\theta_N)=\theta_N\varepsilon_N$; then
\begin{align*}
\frac{m_Nn_N r(\theta_N)}{\sigma_N}
=h\sqrt{12}\sqrt{\frac{m_Nn_N}{N(N+1)}}\,\varepsilon_N\to0.
\end{align*}
Thus
\begin{align*}
\frac{\mathbb{E}[U_{m_N,n_N}]-m_Nn_N/2}{\sigma_N}
\to h\sqrt{12\lambda(1-\lambda)}\,I_f.
\end{align*}
[guided]
Define
\begin{align*}
I_f:=\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x).
\end{align*}
The previous step proved that the comparison probability has the first-order expansion
\begin{align*}
p(\theta)=\frac{1}{2}+\theta I_f+o(\theta)
\end{align*}
as $\theta\to0$. The local alternative uses
\begin{align*}
\theta_N=\frac{h}{\sqrt{N}}.
\end{align*}
Define $r:\mathbb{R}\to\mathbb{R}$ by
\begin{align*}
r(\theta):=p(\theta)-\frac{1}{2}-\theta I_f.
\end{align*}
Then $r(\theta)/\theta\to0$ as $\theta\to0$ with $\theta\neq0$, and $r(0)=0$. Therefore
\begin{align*}
\mathbb{E}[U_{m_N,n_N}]-\frac{m_Nn_N}{2}=m_Nn_N\theta_N I_f+m_Nn_Nr(\theta_N).
\end{align*}
This formulation also covers the case $h=0$, where $\theta_N=0$ and the remainder is exactly $0$.
The null standard deviation is the quantity
\begin{align*}
\sigma_N:=\sqrt{\frac{m_Nn_N(N+1)}{12}}.
\end{align*}
Dividing the leading deterministic term by $\sigma_N$ gives
\begin{align*}
\frac{m_Nn_N\theta_N}{\sigma_N}=h\sqrt{12}\sqrt{\frac{m_Nn_N}{N(N+1)}}.
\end{align*}
The assumptions
\begin{align*}
\frac{m_N}{N}\to\lambda, \qquad \frac{n_N}{N}\to1-\lambda,
\end{align*}
together with $N=m_N+n_N$, imply
\begin{align*}
\frac{m_Nn_N\theta_N}{\sigma_N}\to h\sqrt{12\lambda(1-\lambda)}.
\end{align*}
It remains to check that the standardized remainder contributes nothing. If $h=0$, then $\theta_N=0$ and $r(\theta_N)=0$. If $h\neq0$, write $r(\theta_N)=\theta_N\varepsilon_N$ with $\varepsilon_N\to0$. Then
\begin{align*}
\frac{m_Nn_N r(\theta_N)}{\sigma_N}
=h\sqrt{12}\sqrt{\frac{m_Nn_N}{N(N+1)}}\,\varepsilon_N\to0.
\end{align*}
Multiplying the leading term by $I_f$ gives the deterministic standardized mean shift:
\begin{align*}
\frac{\mathbb{E}[U_{m_N,n_N}]-m_Nn_N/2}{\sigma_N}
\to h\sqrt{12\lambda(1-\lambda)}\,I_f.
\end{align*}
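To see the scaling concretely, consider the balanced design $m_N=n_N=N/2$ with $N$ even, so that $\lambda=1/2$; this design is an assumption made only for this illustration. Then
\begin{align*}
\frac{m_Nn_N\theta_N}{\sigma_N}
=h\sqrt{12}\sqrt{\frac{N}{4(N+1)}}
=h\sqrt{\frac{3N}{N+1}}
\to h\sqrt{3}
=h\sqrt{12\cdot\tfrac12\cdot\tfrac12}.
\end{align*}
For instance, at $N=100$ the factor $\sqrt{3N/(N+1)}=\sqrt{300/101}\approx1.723$ is already close to its limit $\sqrt{3}\approx1.732$.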
[/guided]
[/step]
[step:Use the assumed Hájek projection to identify the limiting fluctuation]
Define the centered standardized fluctuation
\begin{align*}
Z_N:=
\frac{U_{m_N,n_N}-\mathbb{E}[U_{m_N,n_N}]}{\sigma_N}.
\end{align*}
Let $H_N:\Omega_N\to\mathbb{R}$ denote the centered standardized linear Hájek projection appearing in the theorem statement. The theorem's contiguity-transfer hypothesis is used at this point: along the local alternative sequence
\begin{align*}
\theta_N=\frac{h}{\sqrt{N}},
\end{align*}
it gives the alternative-measure projection approximation $Z_N-H_N=o_{\mathbb{P}}(1)$ and transfers the projection [central limit theorem](/theorems/1848) to give $H_N\xrightarrow{d}\mathcal{N}(0,1)$ under the shifted product measures. Thus the projection central limit theorem is an explicit part of the theorem's probabilistic hypothesis, with contiguity specifying why the null projection asymptotics remain valid along the local alternatives. Since $Z_N=H_N+(Z_N-H_N)$, [Slutsky's Lemma](/theorems/1850), applied to the distributionally convergent sequence $H_N$ and the probability-convergent error $Z_N-H_N$, gives
\begin{align*}
Z_N\xrightarrow{d}\mathcal{N}(0,1).
\end{align*}
[guided]
The random part of the statistic is isolated by defining
\begin{align*}
Z_N:=
\frac{U_{m_N,n_N}-\mathbb{E}[U_{m_N,n_N}]}{\sigma_N}.
\end{align*}
The theorem assumes a Hájek projection approximation along the local alternative sequence
\begin{align*}
\theta_N=\frac{h}{\sqrt{N}}.
\end{align*}
More precisely, let $H_N:\Omega_N\to\mathbb{R}$ be the centered standardized linear Hájek projection specified in the theorem statement. The shifted product measures are contiguous to the null product measures, and the theorem assumes that this contiguity transfers the null projection asymptotics to the local alternatives. Therefore, under the shifted product measures, the hypothesis gives
\begin{align*}
Z_N-H_N\xrightarrow{\mathbb{P}}0,
\qquad
H_N\xrightarrow{d}\mathcal{N}(0,1).
\end{align*}
This is the exact probabilistic input needed here: contiguity is not being used as a standalone central limit theorem, but as the mechanism included in the hypothesis for carrying the Hájek projection approximation and projection limit law to the local alternatives.
Since $Z_N=H_N+(Z_N-H_N)$, [Slutsky's Lemma](/theorems/1850) applies to the distributionally convergent sequence $H_N$ and the error term $Z_N-H_N$ converging to $0$ in probability. This use is justified because both random variables are real-valued and are defined under the same shifted product measure for each $N$. Hence
\begin{align*}
Z_N\xrightarrow{d}\mathcal{N}(0,1).
\end{align*}
[/guided]
[/step]
[step:Combine the deterministic and random parts]
Decompose the statistic as
\begin{align*}
\frac{U_{m_N,n_N}-m_Nn_N/2}{\sigma_N}
=
\frac{U_{m_N,n_N}-\mathbb{E}[U_{m_N,n_N}]}{\sigma_N}
+
\frac{\mathbb{E}[U_{m_N,n_N}]-m_Nn_N/2}{\sigma_N}.
\end{align*}
The first term converges in distribution to $\mathcal{N}(0,1)$ by the previous step, and the second term converges to the deterministic constant
\begin{align*}
h\sqrt{12\lambda(1-\lambda)}\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x).
\end{align*}
A second application of [Slutsky's Lemma](/theorems/1850), now with a deterministic real sequence converging to the displayed constant, therefore gives
\begin{align*}
\frac{U_{m_N,n_N}-m_Nn_N/2}{\sqrt{m_Nn_N(N+1)/12}}
\xrightarrow{d}
\mathcal{N}\left(
h\sqrt{12\lambda(1-\lambda)}\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x),
1
\right).
\end{align*}
This is the claimed local asymptotic mean shift and completes the proof.
[guided]
We now combine the deterministic centering correction with the random fluctuation. The exact algebraic decomposition is
\begin{align*}
\frac{U_{m_N,n_N}-m_Nn_N/2}{\sigma_N}
=
\frac{U_{m_N,n_N}-\mathbb{E}[U_{m_N,n_N}]}{\sigma_N}
+
\frac{\mathbb{E}[U_{m_N,n_N}]-m_Nn_N/2}{\sigma_N}.
\end{align*}
The first term is $Z_N$, and the previous step proved
\begin{align*}
Z_N\xrightarrow{d}\mathcal{N}(0,1).
\end{align*}
The second term is deterministic, and the deterministic-scaling step proved that it converges to
\begin{align*}
h\sqrt{12\lambda(1-\lambda)}\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x).
\end{align*}
[Slutsky's Lemma](/theorems/1850) applies to the sum of the distributionally convergent random term $Z_N$ and the deterministic real sequence converging to the displayed constant. Therefore
\begin{align*}
\frac{U_{m_N,n_N}-m_Nn_N/2}{\sqrt{m_Nn_N(N+1)/12}}
\xrightarrow{d}
\mathcal{N}\left(
h\sqrt{12\lambda(1-\lambda)}\int_{\mathbb{R}}f(x)^2\,d\mathcal{L}^1(x),
1
\right).
\end{align*}
This is precisely the asserted asymptotic normal law with variance $1$ and the displayed local mean shift.
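As a hedged illustration outside the proof, assume that $f=\varphi$ is the standard normal density and that the theorem's hypotheses hold for it. Then
\begin{align*}
\int_{\mathbb{R}}\varphi(x)^2\,d\mathcal{L}^1(x)
=\frac{1}{2\pi}\int_{\mathbb{R}}e^{-x^2}\,d\mathcal{L}^1(x)
=\frac{\sqrt{\pi}}{2\pi}
=\frac{1}{2\sqrt{\pi}},
\end{align*}
so the limiting mean becomes
\begin{align*}
h\sqrt{12\lambda(1-\lambda)}\cdot\frac{1}{2\sqrt{\pi}}
=h\sqrt{\frac{3\lambda(1-\lambda)}{\pi}}.
\end{align*}
For the balanced case $\lambda=1/2$ this equals $\tfrac{h}{2}\sqrt{3/\pi}\approx0.489\,h$.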
[/guided]
[/step]