[proofplan]
We prove the two implications separately. If $W_p(\mu_n,\mu) \to 0$, choose couplings with vanishing $p$-transport cost; these couplings force bounded continuous test functions to have asymptotically equal expectations, and the [reverse triangle inequality](/theorems/2300) transfers the same vanishing cost to the radial variables $d(\cdot,x_0)$. Conversely, [weak convergence](/page/Weak%20Convergence) plus convergence of the $p$-th moments gives uniform integrability of the moment functions. The Skorokhod representation theorem then realizes the weak convergence almost surely, and Vitali convergence converts almost sure convergence into convergence in $L^p$, producing admissible couplings with vanishing cost.
[/proofplan]
[step:Choose couplings with vanishing transport cost]
Assume first that $W_p(\mu_n,\mu) \to 0$. For each $n \in \mathbb{N}$, choose a coupling $\pi_n \in \Pi(\mu_n,\mu)$ such that
\begin{align*}
\int_{X \times X} d(x,y)^p\, d\pi_n(x,y) \le W_p(\mu_n,\mu)^p + \frac{1}{n}.
\end{align*}
Then
\begin{align*}
\int_{X \times X} d(x,y)^p\, d\pi_n(x,y) \to 0.
\end{align*}
Here $\Pi(\mu_n,\mu)$ denotes the set of probability measures on $X \times X$ whose first marginal is $\mu_n$ and whose second marginal is $\mu$.
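As a minimal illustration (an assumed example, not part of the proof), take $X = \mathbb{R}$ with $d(x,y) = |x-y|$, $\mu_n = \delta_{1/n}$, and $\mu = \delta_0$. Because $\mu$ is a Dirac mass, the only element of $\Pi(\mu_n,\mu)$ is the product measure $\delta_{1/n} \otimes \delta_0 = \delta_{(1/n,0)}$, so
\begin{align*}
W_p(\mu_n,\mu)^p = \int_{\mathbb{R} \times \mathbb{R}} |x-y|^p\, d\delta_{(1/n,0)}(x,y) = n^{-p} \to 0.
\end{align*}
In this case $\pi_n = \delta_{(1/n,0)}$ is optimal, so it satisfies the selection inequality above without using the slack $1/n$.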
[/step]
[step:Show that vanishing transport cost implies weak convergence]
Let $f: X \to \mathbb{R}$ be a bounded [continuous function](/page/Continuous%20Function), and define $M_f := \sup_{z \in X} |f(z)|$. We prove
\begin{align*}
\int_X f(x)\, d\mu_n(x) \to \int_X f(y)\, d\mu(y).
\end{align*}
Fix $\varepsilon > 0$. Since $\mu$ is a probability measure on the Polish space $X$, it is tight. Choose a compact set $K \subset X$ such that
\begin{align*}
\mu(X \setminus K) < \varepsilon.
\end{align*}
Because $f$ is continuous at each point of $K$, for every $y \in K$ there exists $\rho_y > 0$ such that $d(z,y) < \rho_y$ implies $|f(z)-f(y)| < \varepsilon/2$. The balls $B(y,\rho_y/2)$ with $y \in K$ cover $K$, so compactness gives points $y_1,\dots,y_m \in K$ such that
\begin{align*}
K \subset \bigcup_{j=1}^{m} B(y_j,\rho_{y_j}/2).
\end{align*}
Define $\delta := \frac{1}{2}\min_{1 \le j \le m}\rho_{y_j} > 0$. If $y \in K$ and $x \in X$ satisfy $d(x,y) < \delta$, choose $j$ with $d(y,y_j) < \rho_{y_j}/2$. Then $d(x,y_j) \le d(x,y)+d(y,y_j) < \delta+\rho_{y_j}/2 \le \rho_{y_j}$ and $d(y,y_j) < \rho_{y_j}$, hence
\begin{align*}
|f(x)-f(y)| \le |f(x)-f(y_j)|+|f(y_j)-f(y)| < \varepsilon.
\end{align*}
Define the measurable set
\begin{align*}
A_\delta := \{(x,y) \in X \times X : d(x,y) < \delta\}.
\end{align*}
Markov's inequality applied to the non-negative function $(x,y) \mapsto d(x,y)^p$ gives
\begin{align*}
\pi_n((X \times X) \setminus A_\delta) \le \delta^{-p}\int_{X \times X} d(x,y)^p\, d\pi_n(x,y).
\end{align*}
Moreover, since the first marginal of $\pi_n$ is $\mu_n$ and the second marginal is $\mu$, both integrals of $f$ can be written as integrals against $\pi_n$, so
\begin{align*}
\left|\int_X f(x)\, d\mu_n(x)-\int_X f(y)\, d\mu(y)\right| \le \int_{X \times X} |f(x)-f(y)|\, d\pi_n(x,y).
\end{align*}
Splitting the integral over $(X \times K) \cap A_\delta$, $X \times (X \setminus K)$, and $(X \times K) \setminus A_\delta$, and using $\pi_n(X \times (X \setminus K)) = \mu(X \setminus K)$ together with the Markov bound, we get
\begin{align*}
\int_{X \times X} |f(x)-f(y)|\, d\pi_n(x,y) \le \varepsilon + 2M_f\,\mu(X \setminus K) + 2M_f\,\delta^{-p}\int_{X \times X} d(x,y)^p\, d\pi_n(x,y).
\end{align*}
Taking $\limsup_{n \to \infty}$ and using the vanishing transport cost gives
\begin{align*}
\limsup_{n \to \infty}\left|\int_X f(x)\, d\mu_n(x)-\int_X f(y)\, d\mu(y)\right| \le \varepsilon + 2M_f\varepsilon.
\end{align*}
Since $\varepsilon > 0$ was arbitrary, the desired convergence holds for every bounded continuous $f: X \to \mathbb{R}$. Hence $\mu_n$ converges weakly to $\mu$.
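For Lipschitz test functions the compactness argument can be bypassed. The following sketch assumes, in addition to the hypotheses above, that $f$ is bounded with $|f(x)-f(y)| \le L_f\, d(x,y)$ for a constant $L_f \ge 0$ (a symbol introduced only for this illustration) and that $p \ge 1$:
\begin{align*}
\left|\int_X f(x)\, d\mu_n(x)-\int_X f(y)\, d\mu(y)\right| \le L_f\int_{X \times X} d(x,y)\, d\pi_n(x,y) \le L_f\left(\int_{X \times X} d(x,y)^p\, d\pi_n(x,y)\right)^{1/p} \to 0.
\end{align*}
The second inequality is Jensen's inequality for the probability measure $\pi_n$. The general argument above is still needed because a bounded continuous function need not be Lipschitz.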
[/step]
[step:Transfer the vanishing cost to convergence of the moments]
Define the radial function
\begin{align*}
r: X \to [0,\infty), \qquad r(x) := d(x,x_0).
\end{align*}
For all $x,y \in X$, the reverse triangle inequality gives
\begin{align*}
|r(x)-r(y)| \le d(x,y).
\end{align*}
Hence, for the couplings $\pi_n$ chosen above,
\begin{align*}
\int_{X \times X} |r(x)-r(y)|^p\, d\pi_n(x,y) \le \int_{X \times X} d(x,y)^p\, d\pi_n(x,y) \to 0.
\end{align*}
Define [measurable functions](/page/Measurable%20Functions) $a_n,b_n: X \times X \to [0,\infty)$ by $a_n(x,y) := r(x)$ and $b_n(x,y) := r(y)$. On the finite [measure space](/page/Measure%20Space) $(X \times X,\mathcal{B}(X \times X),\pi_n)$, the $p$-norm of a measurable function $g: X \times X \to \mathbb{R}$ is
\begin{align*}
\|g\|_{p,\pi_n} := \left(\int_{X \times X} |g(x,y)|^p\, d\pi_n(x,y)\right)^{1/p}.
\end{align*}
Minkowski's inequality implies the reverse triangle inequality $\bigl|\|a_n\|_{p,\pi_n}-\|b_n\|_{p,\pi_n}\bigr| \le \|a_n-b_n\|_{p,\pi_n}$ for this norm, and therefore
\begin{align*}
\left|\left(\int_{X \times X} r(x)^p\, d\pi_n(x,y)\right)^{1/p}-\left(\int_{X \times X} r(y)^p\, d\pi_n(x,y)\right)^{1/p}\right| \le \left(\int_{X \times X} |r(x)-r(y)|^p\, d\pi_n(x,y)\right)^{1/p}.
\end{align*}
Using the marginal identities for $\pi_n$, this becomes
\begin{align*}
\left|\left(\int_X d(x,x_0)^p\, d\mu_n(x)\right)^{1/p}-\left(\int_X d(y,x_0)^p\, d\mu(y)\right)^{1/p}\right| \to 0.
\end{align*}
Both quantities are finite because $\mu_n,\mu \in \mathcal{P}_p(X)$, and the second one does not depend on $n$. Since $t \mapsto t^p$ is continuous on $[0,\infty)$, their $p$-th powers converge as well, and
\begin{align*}
\int_X d(x,x_0)^p\, d\mu_n(x) \to \int_X d(x,x_0)^p\, d\mu(x).
\end{align*}
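The passage from the $p$-th roots to the $p$-th powers can be made quantitative. As a sketch, write $s_n := \bigl(\int_X d(x,x_0)^p\, d\mu_n(x)\bigr)^{1/p}$ and $s := \bigl(\int_X d(x,x_0)^p\, d\mu(x)\bigr)^{1/p}$ (notation introduced only here) and assume $p \ge 1$. The mean value theorem applied to $t \mapsto t^p$ gives
\begin{align*}
|s_n^p-s^p| \le p\,\max\{s_n,s\}^{p-1}\,|s_n-s|.
\end{align*}
Since $s_n \to s$, the factor $\max\{s_n,s\}^{p-1}$ stays bounded, and the right-hand side tends to $0$.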
[/step]
[step:Derive uniform integrability from weak convergence and moment convergence]
Conversely, assume that $\mu_n$ converges weakly to $\mu$ and that
\begin{align*}
\int_X r(x)^p\, d\mu_n(x) \to \int_X r(x)^p\, d\mu(x),
\end{align*}
where $r: X \to [0,\infty)$ is the function $r(x) := d(x,x_0)$. We prove that the family of non-negative functions $r^p$ under the measures $\mu_n$ is uniformly integrable, in the following precise sense:
\begin{align*}
\lim_{R \to \infty}\sup_{n \in \mathbb{N}}\int_{\{x \in X : r(x)^p > R\}} r(x)^p\, d\mu_n(x) = 0.
\end{align*}
For $R > 0$, define the bounded continuous truncation
\begin{align*}
\varphi_R: X \to [0,R], \qquad \varphi_R(x) := \min\{r(x)^p,R\}.
\end{align*}
Since $\varphi_R$ is bounded and continuous, weak convergence gives
\begin{align*}
\int_X \varphi_R(x)\, d\mu_n(x) \to \int_X \varphi_R(x)\, d\mu(x)
\end{align*}
for each fixed $R > 0$. Subtracting this from the assumed convergence of the $p$-th moments gives
\begin{align*}
\lim_{n \to \infty}\int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu_n(x) = \int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu(x).
\end{align*}
As $R \to \infty$, the [dominated convergence theorem](/theorems/4) applied with respect to the finite measure $\mu$ gives
\begin{align*}
\int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu(x) \to 0.
\end{align*}
Indeed, $0 \le r^p-\varphi_R \le r^p$, the function $r^p$ is $\mu$-integrable, and $r^p-\varphi_R \to 0$ pointwise.
For every $R > 0$,
\begin{align*}
r(x)^p\mathbb{1}_{\{r(x)^p>2R\}} \le 2\bigl(r(x)^p-\varphi_R(x)\bigr).
\end{align*}
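Integrating this inequality against $\mu_n$, taking $\limsup_{n \to \infty}$, and then letting $R \to \infty$ shows that the tails are uniformly small for all sufficiently large $n$. The finitely many remaining measures each have vanishing tails because each belongs to $\mathcal{P}_p(X)$, which proves the desired uniform integrability.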
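The moment hypothesis cannot be dropped. As an assumed example (not taken from the argument above), let $X = \mathbb{R}$, $x_0 = 0$, $p \ge 1$, $\mu = \delta_0$, and $\mu_n := (1-\tfrac{1}{n})\delta_0 + \tfrac{1}{n}\delta_n$. For bounded continuous $f$ one has $\int f\, d\mu_n = (1-\tfrac{1}{n})f(0) + \tfrac{1}{n}f(n) \to f(0)$, so $\mu_n$ converges weakly to $\mu$. However,
\begin{align*}
\int_{\mathbb{R}} |x|^p\, d\mu_n(x) = n^{p-1} \ge 1, \qquad \int_{\{|x|^p > R\}} |x|^p\, d\mu_n(x) = n^{p-1} \ge 1 \quad \text{whenever } n^p > R.
\end{align*}
The moments do not converge to $\int_{\mathbb{R}} |x|^p\, d\mu(x) = 0$, the tails are not uniformly small, and since $\delta_0$ admits only the product coupling, $W_p(\mu_n,\mu)^p = n^{p-1}$ does not tend to $0$.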
[guided]
For the converse implication, assume that $\mu_n$ converges weakly to $\mu$ and that the $p$-th radial moments about $x_0$ converge. The point of this step is to turn convergence of these numerical moments into a tail estimate that is uniform in $n$. Define
\begin{align*}
r: X \to [0,\infty), \qquad r(x) := d(x,x_0).
\end{align*}
The function $r^p$ may be unbounded, so weak convergence cannot be applied directly to it. We therefore truncate it. For each $R > 0$, define
\begin{align*}
\varphi_R: X \to [0,R], \qquad \varphi_R(x) := \min\{r(x)^p,R\}.
\end{align*}
This function is bounded and continuous because $r$ is continuous and $t \mapsto \min\{t^p,R\}$ is continuous on $[0,\infty)$. Therefore weak convergence gives
\begin{align*}
\int_X \varphi_R(x)\, d\mu_n(x) \to \int_X \varphi_R(x)\, d\mu(x).
\end{align*}
Now compare the unbounded moment with its bounded truncation. Since by hypothesis
\begin{align*}
\int_X r(x)^p\, d\mu_n(x) \to \int_X r(x)^p\, d\mu(x),
\end{align*}
subtracting the convergence of the truncated integrals gives
\begin{align*}
\lim_{n \to \infty}\int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu_n(x) = \int_X \bigl(r(x)^p-\varphi_R(x)\bigr)\, d\mu(x).
\end{align*}
The right-hand side tends to $0$ as $R \to \infty$ because $0 \le r^p-\varphi_R \le r^p$, the function $r^p$ is $\mu$-integrable, and $r^p-\varphi_R \to 0$ pointwise. This is the dominated convergence theorem on $(X,\mathcal{B}(X),\mu)$ with integrable dominator $r^p$; the bounded convergence theorem does not apply here, because $r^p-\varphi_R$ need not be bounded.
Finally, the tail $\{r^p>2R\}$ is controlled by the truncation error. If $r(x)^p>2R$, then $\varphi_R(x)=R$ and
\begin{align*}
r(x)^p \le 2\bigl(r(x)^p-R\bigr) = 2\bigl(r(x)^p-\varphi_R(x)\bigr).
\end{align*}
If $r(x)^p \le 2R$, then the left-hand side with the indicator is $0$. Thus for every $x \in X$,
\begin{align*}
r(x)^p\mathbb{1}_{\{r(x)^p>2R\}} \le 2\bigl(r(x)^p-\varphi_R(x)\bigr).
\end{align*}
Taking $\limsup_{n \to \infty}$ of the corresponding integrals and then sending $R \to \infty$ shows that all sufficiently large indices have uniformly small tails. The finitely many remaining indices also have small tails for large $R$, because each fixed measure $\mu_1,\dots,\mu_N$ lies in $\mathcal{P}_p(X)$. Hence
\begin{align*}
\lim_{R \to \infty}\sup_{n \in \mathbb{N}}\int_{\{x \in X : r(x)^p > R\}} r(x)^p\, d\mu_n(x) = 0.
\end{align*}
[/guided]
[/step]
[step:Realize the weak convergence almost surely]
By the Skorokhod representation theorem for probability measures on Polish spaces, there exist a probability space $(\Omega,\mathcal{F},\mathbb{P})$, measurable maps
\begin{align*}
Y_n: \Omega \to X
\end{align*}
for $n \in \mathbb{N}$, and a measurable map
\begin{align*}
Y: \Omega \to X
\end{align*}
such that the law of $Y_n$ is $\mu_n$, the law of $Y$ is $\mu$, and
\begin{align*}
Y_n \to Y \quad \mathbb{P}\text{-a.s.}
\end{align*}
Define real-valued random variables
\begin{align*}
Z_n: \Omega \to [0,\infty), \qquad Z_n(\omega) := d(Y_n(\omega),Y(\omega))^p.
\end{align*}
The almost sure convergence $Y_n \to Y$ implies
\begin{align*}
Z_n \to 0 \quad \mathbb{P}\text{-a.s.}
\end{align*}
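On the real line the representation can be written down explicitly. The following sketch assumes $X = \mathbb{R}$, takes $\Omega := (0,1)$ with $\mathbb{P}$ the Lebesgue measure, and uses the quantile functions $F_n^{-1}$ and $F^{-1}$ of $\mu_n$ and $\mu$ (notation introduced only for this illustration). It is one standard construction, not necessarily the one behind the general theorem:
\begin{align*}
Y_n(\omega) := F_n^{-1}(\omega) := \inf\{t \in \mathbb{R} : \mu_n((-\infty,t]) \ge \omega\}, \qquad Y(\omega) := F^{-1}(\omega) := \inf\{t \in \mathbb{R} : \mu((-\infty,t]) \ge \omega\}.
\end{align*}
Then $Y_n$ has law $\mu_n$ and $Y$ has law $\mu$. Weak convergence gives $F_n^{-1}(\omega) \to F^{-1}(\omega)$ at every continuity point $\omega$ of $F^{-1}$, and a monotone function has at most countably many discontinuities, so $Y_n \to Y$ $\mathbb{P}$-a.s.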
[/step]
[step:Upgrade almost sure convergence to convergence of transport costs]
For every $\omega \in \Omega$, the triangle inequality together with the convexity of $t \mapsto t^p$ on $[0,\infty)$ (valid for $p \ge 1$) gives
\begin{align*}
d(Y_n(\omega),Y(\omega))^p \le 2^{p-1}\bigl(d(Y_n(\omega),x_0)^p+d(Y(\omega),x_0)^p\bigr).
\end{align*}
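The constant $2^{p-1}$ comes from convexity. As a short derivation, assuming $p \ge 1$ and writing $a := d(Y_n(\omega),x_0)$ and $b := d(Y(\omega),x_0)$ (symbols introduced only here), convexity of $t \mapsto t^p$ on $[0,\infty)$ gives
\begin{align*}
d(Y_n(\omega),Y(\omega))^p \le (a+b)^p = 2^p\left(\frac{a+b}{2}\right)^p \le 2^p \cdot \frac{a^p+b^p}{2} = 2^{p-1}\bigl(a^p+b^p\bigr).
\end{align*}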
The family $\{d(Y_n,x_0)^p : n \in \mathbb{N}\}$ is uniformly integrable by the previous step, since the law of $Y_n$ is $\mu_n$. The single integrable [random variable](/page/Random%20Variable) $d(Y,x_0)^p$ is uniformly integrable as a one-element family, since $\mu \in \mathcal{P}_p(X)$. To justify the closure property used here, let $A_n := d(Y_n,x_0)^p$ and $B := d(Y,x_0)^p$. For every $M > 0$,
\begin{align*}
(A_n+B)\mathbb{1}_{\{A_n+B>M\}} \le A_n\mathbb{1}_{\{A_n>M/2\}} + B\mathbb{1}_{\{B>M/2\}} + A_n\mathbb{1}_{\{B>M/2\}} + B\mathbb{1}_{\{A_n>M/2\}}.
\end{align*}
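The first two terms have uniformly small expectations as $M \to \infty$ by uniform integrability of $\{A_n\}$ and integrability of $B$. For the mixed terms, fix $L > 0$ and split according to whether the leading factor exceeds $L$:
\begin{align*}
\mathbb{E}\bigl[A_n\mathbb{1}_{\{B>M/2\}}\bigr] &\le \mathbb{E}\bigl[A_n\mathbb{1}_{\{A_n>L\}}\bigr] + L\,\mathbb{P}(B>M/2), \\
\mathbb{E}\bigl[B\mathbb{1}_{\{A_n>M/2\}}\bigr] &\le \mathbb{E}\bigl[B\mathbb{1}_{\{B>L\}}\bigr] + L\,\mathbb{P}(A_n>M/2).
\end{align*}
By Markov's inequality, $\mathbb{P}(B>M/2) \le 2\,\mathbb{E}[B]/M$ and $\mathbb{P}(A_n>M/2) \le 2\sup_{k}\mathbb{E}[A_k]/M$, where $\sup_{k}\mathbb{E}[A_k] < \infty$ because the moments $\mathbb{E}[A_k] = \int_X d(x,x_0)^p\, d\mu_k(x)$ converge. Choosing $L$ large first and then $M$ large makes both bounds small uniformly in $n$, which is the standard finite-sum closure of uniform integrability. Multiplication by the fixed scalar $2^{p-1}$ preserves uniform integrability. Applying these closure properties to the preceding pointwise bound shows that the family $\{Z_n : n \in \mathbb{N}\}$ is uniformly integrable.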
Let $\mathbb{E}[G] := \int_\Omega G(\omega)\, d\mathbb{P}(\omega)$ denote the expectation of a non-negative or integrable random variable $G: \Omega \to \mathbb{R}$. By Vitali's convergence theorem, the almost sure convergence $Z_n \to 0$ together with uniform integrability implies
\begin{align*}
\mathbb{E}[Z_n] \to 0.
\end{align*}
Equivalently,
\begin{align*}
\mathbb{E}[d(Y_n,Y)^p] \to 0.
\end{align*}
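In the present non-negative setting, the conclusion of Vitali's theorem can be checked by hand. As a sketch, fix a truncation level $M > 0$ (introduced only here) and use the pointwise bound $Z_n \le Z_n\mathbb{1}_{\{Z_n>M\}} + \min\{Z_n,M\}$:
\begin{align*}
\limsup_{n \to \infty}\mathbb{E}[Z_n] \le \sup_{n \in \mathbb{N}}\mathbb{E}\bigl[Z_n\mathbb{1}_{\{Z_n>M\}}\bigr] + \lim_{n \to \infty}\mathbb{E}[\min\{Z_n,M\}] = \sup_{n \in \mathbb{N}}\mathbb{E}\bigl[Z_n\mathbb{1}_{\{Z_n>M\}}\bigr].
\end{align*}
The limit of $\mathbb{E}[\min\{Z_n,M\}]$ is $0$ by the bounded convergence theorem, since $0 \le \min\{Z_n,M\} \le M$ and $\min\{Z_n,M\} \to 0$ almost surely. Letting $M \to \infty$ and using uniform integrability of $\{Z_n\}$ makes the remaining supremum arbitrarily small.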
[/step]
[step:Use the joint laws as admissible couplings]
For each $n \in \mathbb{N}$, define $\gamma_n$ to be the joint law of $(Y_n,Y)$ on $X \times X$. Then $\gamma_n \in \Pi(\mu_n,\mu)$, because the first marginal of $\gamma_n$ is the law of $Y_n$, namely $\mu_n$, and the second marginal of $\gamma_n$ is the law of $Y$, namely $\mu$. Hence, by the definition of $W_p$,
\begin{align*}
W_p(\mu_n,\mu)^p \le \int_{X \times X} d(x,y)^p\, d\gamma_n(x,y).
\end{align*}
Since $\gamma_n$ is the joint law of $(Y_n,Y)$, the right-hand side equals
\begin{align*}
\mathbb{E}[d(Y_n,Y)^p].
\end{align*}
The previous step shows that this expectation tends to $0$, so
\begin{align*}
W_p(\mu_n,\mu)^p \to 0.
\end{align*}
Because $W_p(\mu_n,\mu) \ge 0$, we conclude that
\begin{align*}
W_p(\mu_n,\mu) \to 0.
\end{align*}
This proves the converse implication and completes the proof.
[/step]