[guided]The purpose of this step is to prove that $W_1(\nu,\mu) \le W_2(\nu,\mu)$, so that any upper bound on the quadratic transport cost $W_2$ is also an upper bound on the linear transport cost $W_1$. We prove the inequality for each fixed coupling first, and only then take infima.
Let $\Pi(\nu,\mu)$ be the set of Borel probability measures $\pi$ on $E \times E$ with first marginal $\nu$ and second marginal $\mu$. For a fixed coupling $\pi \in \Pi(\nu,\mu)$, define the map $D_\pi: E \times E \to [0,\infty)$ by $D_\pi(x,y)=d(x,y)$, where $d$ is the metric on the [metric space](/page/Metric%20Space) $E$. This function is Borel measurable because $d$ is continuous on $E \times E$.
Suppose first that the quadratic transportation cost under $\pi$ is finite:
\begin{align*}
\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) <+\infty.
\end{align*}
We want to compare the first moment of $D_\pi$ with its second moment. Define
\begin{align*}
m_\pi := \int_{E \times E} D_\pi(x,y) \, d\pi(x,y).
\end{align*}
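The quantity $m_\pi$ is finite: since $0 \le D_\pi \le 1 + D_\pi^2$ pointwise and $\pi$ is a probability measure, $m_\pi \le 1 + \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) < +\infty$. The key elementary estimate is the non-negativity of the variance. Since $(D_\pi-m_\pi)^2$ is non-negative, we have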
\begin{align*}
0 \le \int_{E \times E} (D_\pi(x,y)-m_\pi)^2 \, d\pi(x,y).
\end{align*}
Expanding the square and integrating term by term is justified because each of the three resulting terms is $\pi$-integrable under the finite second-moment assumption. Because $\pi(E\times E)=1$ and $m_\pi$ is constant, this gives
\begin{align*}
0 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) - 2m_\pi \int_{E \times E} D_\pi(x,y) \, d\pi(x,y) + m_\pi^2.
\end{align*}
By the definition of $m_\pi$, the middle integral equals $m_\pi$, so the middle term is $-2m_\pi^2$ and the inequality becomes
\begin{align*}
0 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) - m_\pi^2.
\end{align*}
Thus
\begin{align*}
\left(\int_{E \times E} D_\pi(x,y) \, d\pi(x,y)\right)^2 \le \int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y).
\end{align*}
Both sides are non-negative, so taking square roots gives
\begin{align*}
\int_{E \times E} D_\pi(x,y) \, d\pi(x,y) \le \left(\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y)\right)^{1/2}.
\end{align*}
If the quadratic cost is infinite, the right-hand side is $+\infty$ and the same displayed inequality holds trivially in the extended real sense.
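The per-coupling inequality can be checked by hand on a small example. The choices $E=\mathbb{R}$ with $d(x,y)=|x-y|$, $\nu=\delta_0$, and $\mu=\tfrac12(\delta_0+\delta_2)$ are illustrative assumptions and not part of the argument. Because $\nu$ is a Dirac mass, the only coupling is $\pi=\delta_0\otimes\mu$, under which $D_\pi$ takes the values $0$ and $2$ with probability $\tfrac12$ each:
\begin{align*}
m_\pi &= \tfrac12\cdot 0 + \tfrac12\cdot 2 = 1, \\
\int_{E \times E} D_\pi(x,y)^2 \, d\pi(x,y) &= \tfrac12\cdot 0 + \tfrac12\cdot 4 = 2, \\
\int_{E \times E} (D_\pi(x,y)-m_\pi)^2 \, d\pi(x,y) &= \tfrac12(0-1)^2 + \tfrac12(2-1)^2 = 1 = 2 - 1^2.
\end{align*}
In this example the inequality reads $1 \le \sqrt{2}$, and the gap $2-1^2$ between the second moment and the squared first moment is exactly the variance of $D_\pi$.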
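Now take the infimum over all couplings $\pi \in \Pi(\nu,\mu)$. By definition, $W_1(\nu,\mu)$ is the infimum of the left-hand side, so for every coupling it is at most the left-hand side and hence at most the right-hand side; this remains true after taking the infimum of the right-hand side over $\pi$. The infimum of the square roots of the quadratic costs equals the square root of their infimum, which is $W_2(\nu,\mu)$ by definition, because the square-root map is increasing and continuous on $[0,\infty]$. Therefore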
\begin{align*}
W_1(\nu,\mu) \le W_2(\nu,\mu).
\end{align*}[/guided]