[proofplan]
The moment assumptions upgrade the two marginal narrow convergences to convergence in the Wasserstein distance $W_p$. The convergence of $W_p(\mu_N,\nu_N)$ then follows from the triangle inequality for the metric $W_p$ on $\mathcal{P}_p(X)$. For the second assertion, the narrow limit of a narrowly convergent subsequence of optimal plans has marginals $\mu$ and $\nu$ because the coordinate projections are continuous. Since the cost $d^p$ is nonnegative and continuous, the transport cost $\pi \mapsto \int d^p \, d\pi$ is lower semicontinuous under narrow convergence, so the limiting plan has cost at most $W_p(\mu,\nu)^p$; the definition of $W_p$ gives the reverse inequality.
[/proofplan]
[step:Upgrade the marginal convergence to $W_p$ convergence]
By the characterization of Wasserstein convergence by narrow convergence plus convergence of $p$-th moments, applied in the Polish [metric space](/page/Metric%20Space) $(X,d)$ with reference point $x_0$, the hypotheses give $W_p(\mu_N,\mu) \to 0$ and $W_p(\nu_N,\nu) \to 0$. Indeed, this characterization applies because $\mu_N,\mu,\nu_N,\nu \in \mathcal{P}_p(X)$, the sequences $(\mu_N)_{N \in \mathbb{N}}$ and $(\nu_N)_{N \in \mathbb{N}}$ converge narrowly to $\mu$ and $\nu$, respectively, and the stated $p$-moment convergences with respect to the declared reference point $x_0 \in X$ are exactly the required moment hypotheses.
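To see why the moment hypothesis cannot be dropped, consider the following illustration, which is not part of the original argument and assumes $X=\mathbb{R}$ with $d(x,y)=|x-y|$ and $x_0=0$. Set $\mu_N=(1-\tfrac{1}{N})\delta_0+\tfrac{1}{N}\delta_{N^{1/p}}$ and $\mu=\delta_0$. For every bounded continuous $\varphi:\mathbb{R}\to\mathbb{R}$ we have $\int \varphi \, d\mu_N=(1-\tfrac{1}{N})\varphi(0)+\tfrac{1}{N}\varphi(N^{1/p}) \to \varphi(0)$, so $\mu_N \to \mu$ narrowly. The $p$-th moments, however, do not converge to the $p$-th moment of $\mu$:
\begin{align*}
\int_{\mathbb{R}} |x|^p \, d\mu_N(x) = \frac{1}{N}\cdot N = 1 \quad \text{for every } N, \qquad \int_{\mathbb{R}} |x|^p \, d\mu(x) = 0
\end{align*}
Because the only coupling of $\mu_N$ with the Dirac mass $\delta_0$ is the product measure $\mu_N \otimes \delta_0$, the distance can be computed directly:
\begin{align*}
W_p(\mu_N,\mu)^p = \int_{\mathbb{R}} |x-0|^p \, d\mu_N(x) = 1
\end{align*}
Hence $W_p(\mu_N,\mu)=1$ for every $N$, and narrow convergence alone does not give $W_p(\mu_N,\mu)\to 0$ in this example.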
[/step]
[step:Use the triangle inequality to prove convergence of the transport distances]
Since $W_p$ is a metric on $\mathcal{P}_p(X)$, applying its triangle inequality twice gives $W_p(\mu_N,\nu_N) \leq W_p(\mu_N,\mu) + W_p(\mu,\nu) + W_p(\nu,\nu_N)$. Applying it twice more in the form $W_p(\mu,\nu) \leq W_p(\mu,\mu_N) + W_p(\mu_N,\nu_N) + W_p(\nu_N,\nu)$ and using the symmetry of $W_p$, we obtain $\left|W_p(\mu_N,\nu_N)-W_p(\mu,\nu)\right| \leq W_p(\mu_N,\mu)+W_p(\nu_N,\nu)$. The right-hand side tends to $0$ by the previous step, hence $W_p(\mu_N,\nu_N) \to W_p(\mu,\nu)$.
[guided]
The goal is to compare the transport distance between the moving endpoints, $W_p(\mu_N,\nu_N)$, with the limiting distance $W_p(\mu,\nu)$. The natural tool is the triangle inequality for the metric $W_p$ on $\mathcal{P}_p(X)$.
First, we insert $\mu$ and $\nu$ between the two varying measures. Applying the triangle inequality twice gives
\begin{align*}
W_p(\mu_N,\nu_N) \leq W_p(\mu_N,\mu) + W_p(\mu,\nu) + W_p(\nu,\nu_N)
\end{align*}
Subtracting the finite quantity $W_p(\mu,\nu)$ and using the symmetry $W_p(\nu,\nu_N)=W_p(\nu_N,\nu)$ gives the upper bound $W_p(\mu_N,\nu_N)-W_p(\mu,\nu) \leq W_p(\mu_N,\mu)+W_p(\nu_N,\nu)$.
For the opposite direction, we insert $\mu_N$ and $\nu_N$ between the limiting measures:
\begin{align*}
W_p(\mu,\nu) \leq W_p(\mu,\mu_N) + W_p(\mu_N,\nu_N) + W_p(\nu_N,\nu)
\end{align*}
Using symmetry of the metric, $W_p(\mu,\mu_N)=W_p(\mu_N,\mu)$ and $W_p(\nu_N,\nu)=W_p(\nu,\nu_N)$. Rearranging yields
\begin{align*}
W_p(\mu,\nu)-W_p(\mu_N,\nu_N) \leq W_p(\mu_N,\mu)+W_p(\nu_N,\nu)
\end{align*}
Combining the two one-sided estimates gives
\begin{align*}
\left|W_p(\mu_N,\nu_N)-W_p(\mu,\nu)\right| \leq W_p(\mu_N,\mu)+W_p(\nu_N,\nu)
\end{align*}
The previous step proved that both terms on the right tend to $0$. Therefore the absolute value tends to $0$, which is exactly
\begin{align*}
W_p(\mu_N,\nu_N) \to W_p(\mu,\nu)
\end{align*}
[/guided]
[/step]
[step:Pass the marginal constraints to the narrow limit of optimal plans]
Let $p_1:X \times X \to X$ and $p_2:X \times X \to X$ denote the coordinate projection maps, defined by $p_1(x,y)=x$ and $p_2(x,y)=y$. Since each $\pi_{N_j}$ belongs to $\Pi(\mu_{N_j},\nu_{N_j})$, its marginals satisfy $(p_1)_\#\pi_{N_j}=\mu_{N_j}$ and $(p_2)_\#\pi_{N_j}=\nu_{N_j}$. Fix a bounded [continuous function](/page/Continuous%20Function) $\varphi:X \to \mathbb{R}$. The composition $\varphi \circ p_1:X \times X \to \mathbb{R}$ is bounded and continuous, so narrow convergence $\pi_{N_j}\to \pi$ implies
\begin{align*}
\int_{X \times X} \varphi(p_1(x,y)) \, d\pi_{N_j}(x,y) \to \int_{X \times X} \varphi(p_1(x,y)) \, d\pi(x,y)
\end{align*}
By the definition of the pushforward measure and the marginal identity $(p_1)_\#\pi_{N_j}=\mu_{N_j}$, this is exactly
\begin{align*}
\int_X \varphi(x) \, d\mu_{N_j}(x) \to \int_X \varphi(x) \, d(p_1)_\#\pi(x)
\end{align*}
Since $\mu_{N_j}\to \mu$ narrowly as well, the left-hand side also converges to
\begin{align*}
\int_X \varphi(x) \, d\mu(x)
\end{align*}
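By uniqueness of limits, $\int_X \varphi \, d(p_1)_\#\pi = \int_X \varphi \, d\mu$ for every bounded continuous $\varphi$. Since a Borel probability measure on a metric space is determined by its integrals against bounded continuous functions, $(p_1)_\#\pi=\mu$. Repeating the same argument with $p_2$ and the narrow convergence $\nu_{N_j}\to\nu$ gives $(p_2)_\#\pi=\nu$. Hence $\pi \in \Pi(\mu,\nu)$.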
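For reference, the change of variables used to pass from the integral over $X \times X$ to the integral over $X$ can be written out in one chain. It relies only on the definition of the pushforward and on the marginal identity for $\pi_{N_j}$:
\begin{align*}
\int_{X \times X} \varphi(p_1(x,y)) \, d\pi_{N_j}(x,y) = \int_X \varphi(x) \, d\big((p_1)_\#\pi_{N_j}\big)(x) = \int_X \varphi(x) \, d\mu_{N_j}(x)
\end{align*}
The first equality, with $\pi$ in place of $\pi_{N_j}$, is what rewrites the limit as an integral against $(p_1)_\#\pi$.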
[/step]
[step:Use lower semicontinuity of the cost to prove optimality of the limiting plan]
Define the cost function $c:X \times X \to [0,\infty)$ by $c(x,y)=d(x,y)^p$. The metric $d$ is continuous on $X \times X$, hence $c$ is continuous and nonnegative. By lower semicontinuity of integrals of nonnegative lower semicontinuous functions under narrow convergence, applied on the Polish space $X \times X$ to the narrowly convergent sequence $(\pi_{N_j})_{j \in \mathbb{N}}$ and to the function $c$, we obtain
\begin{align*}
\int_{X \times X} d(x,y)^p \, d\pi(x,y) \leq \liminf_{j \to \infty} \int_{X \times X} d(x,y)^p \, d\pi_{N_j}(x,y)
\end{align*}
Since $\pi_{N_j}$ is optimal for the cost $d^p$ between $\mu_{N_j}$ and $\nu_{N_j}$, we have
\begin{align*}
\int_{X \times X} d(x,y)^p \, d\pi_{N_j}(x,y)=W_p(\mu_{N_j},\nu_{N_j})^p
\end{align*}
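The convergence of the transport distances proved in the second step, restricted to the subsequence $(N_j)_{j \in \mathbb{N}}$ and combined with the continuity of $t \mapsto t^p$ on $[0,\infty)$, gives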
\begin{align*}
W_p(\mu_{N_j},\nu_{N_j})^p \to W_p(\mu,\nu)^p
\end{align*}
Combining the lower semicontinuity estimate with the last two displays gives
\begin{align*}
\int_{X \times X} d(x,y)^p \, d\pi(x,y) \leq W_p(\mu,\nu)^p
\end{align*}
On the other hand, the previous step proved $\pi \in \Pi(\mu,\nu)$, so by the definition of $W_p$ as the infimum over all couplings,
\begin{align*}
W_p(\mu,\nu)^p \leq \int_{X \times X} d(x,y)^p \, d\pi(x,y)
\end{align*}
The two inequalities imply
\begin{align*}
\int_{X \times X} d(x,y)^p \, d\pi(x,y)=W_p(\mu,\nu)^p
\end{align*}
Thus $\pi$ is an optimal transport plan from $\mu$ to $\nu$ for the cost $d^p$.
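As a sanity check, offered as an illustration rather than as part of the proof, suppose $\mu_N=\delta_{x_N}$ and $\nu_N=\delta_{y_N}$ with $x_N \to x$ and $y_N \to y$ in $X$; the points $x_N,y_N,x,y$ are assumptions introduced only for this example. The hypotheses hold because $\delta_{x_N}\to\delta_x$ narrowly and $d(x_N,x_0)^p \to d(x,x_0)^p$ by continuity of $d$, and similarly for $\nu_N$. The only coupling of two Dirac masses is the Dirac mass at the pair of points, so $\pi_N=\delta_{(x_N,y_N)}$ is the optimal plan and
\begin{align*}
W_p(\mu_N,\nu_N)^p = \int_{X \times X} d(u,v)^p \, d\pi_N(u,v) = d(x_N,y_N)^p \to d(x,y)^p = W_p(\delta_x,\delta_y)^p
\end{align*}
The plans $\pi_N$ converge narrowly to $\delta_{(x,y)}$, which is the unique coupling of $\delta_x$ and $\delta_y$ and therefore optimal, in agreement with both assertions.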
[/step]