[proofplan]
The argument is the elementary bounded-diameter form of Marton's transportation inequality. First we compare $W_1$ with total variation, using a coupling that realizes the total variation distance together with the pointwise estimate $d(x,y)\le D\mathbb{1}_{\{x\ne y\}}$. Then Pinsker's inequality, with the convention $\|\nu-\rho\|_{\mathrm{TV}}=\sup_{A\in\mathcal B(X)}|\nu(A)-\rho(A)|$, converts total variation into relative entropy. Squaring the coupling estimate and combining it with Pinsker's inequality gives the constant $D^2/2$.
[/proofplan]
[step:Reduce to the finite entropy case]
If $H(\nu\mid\rho)=+\infty$, then the right-hand side is $+\infty$, while $W_1(\nu,\rho)\in[0,D]$: since $d\le D$ on $X\times X$, even the product coupling $\nu\otimes\rho$ has transport cost at most $D$. Hence the desired inequality holds in this case.
Assume from now on that $H(\nu\mid\rho)<+\infty$. Then $\nu\ll\rho$ by the definition of relative entropy.
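When $X$ is finite, the dichotomy in this step can be checked directly: $H(\nu\mid\rho)=\sum_x\nu(x)\log\bigl(\nu(x)/\rho(x)\bigr)$ is finite exactly when $\nu(x)>0$ forces $\rho(x)>0$. The Python sketch below assumes a finite state space, probability vectors given as lists, and the natural logarithm; the function name `relative_entropy` is a hypothetical helper, not part of the original argument.

```python
import math


def relative_entropy(nu, rho):
    """H(nu | rho) = sum_x nu(x) log(nu(x) / rho(x)) on a finite set (natural log).

    Returns math.inf when nu is not absolutely continuous w.r.t. rho,
    i.e. when nu(x) > 0 for some x with rho(x) == 0.
    """
    total = 0.0
    for p, q in zip(nu, rho):
        if p == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if q == 0.0:
            return math.inf  # nu is not << rho
        total += p * math.log(p / q)
    return total


print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))  # inf
```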
[/step]
[step:Declare the total variation convention and choose a maximal coupling]
Define the total variation distance between $\nu$ and $\rho$ by
\begin{align*}
\|\nu-\rho\|_{\mathrm{TV}}:=\sup_{A\in\mathcal B(X)}|\nu(A)-\rho(A)|.
\end{align*}
Let $\Delta\subset X\times X$ denote the diagonal,
\begin{align*}
\Delta:=\{(x,y)\in X\times X:x=y\}.
\end{align*}
Since $X$ is a separable metric space, the Borel $\sigma$-algebra of the product space satisfies $\mathcal B(X\times X)=\mathcal B(X)\otimes\mathcal B(X)$. Because $d:X\times X\to[0,\infty)$ is continuous and $\Delta=d^{-1}(\{0\})$, the set $\Delta$ is closed in $X\times X$; hence $\Delta$ and its complement $(X\times X)\setminus\Delta$ belong to $\mathcal B(X)\otimes\mathcal B(X)$. Let $\mathbb{1}_{(X\times X)\setminus\Delta}:X\times X\to\{0,1\}$ denote the indicator function of this measurable complement.
Let $\Pi(\nu,\rho)$ denote the set of Borel probability measures on $(X\times X,\mathcal B(X)\otimes\mathcal B(X))$ with first marginal $\nu$ and second marginal $\rho$. By the coupling characterization of total variation, there exists $\pi\in\Pi(\nu,\rho)$ such that
\begin{align*}
\pi\bigl((X\times X)\setminus\Delta\bigr)=\|\nu-\rho\|_{\mathrm{TV}}.
\end{align*}
Equivalently, the two coordinates disagree with $\pi$-probability exactly $\|\nu-\rho\|_{\mathrm{TV}}$.
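On a finite set a maximal coupling can be written down explicitly (one standard construction; the argument above only needs existence): keep the common mass $\min(\nu(x),\rho(x))$ on the diagonal and couple the two excess parts independently. Because the excesses of $\nu$ and $\rho$ live on disjoint sets, the independent part puts no mass on $\Delta$, so the off-diagonal mass equals $\|\nu-\rho\|_{\mathrm{TV}}$. The Python sketch below assumes a finite $X$ indexed by $0,\dots,n-1$; `total_variation` and `maximal_coupling` are hypothetical helper names.

```python
def total_variation(nu, rho):
    """sup_A |nu(A) - rho(A)| on a finite set, i.e. half the l1 distance."""
    return 0.5 * sum(abs(p - q) for p, q in zip(nu, rho))


def maximal_coupling(nu, rho):
    """Coupling pi (an n x n matrix) of nu and rho with pi(x != y) = TV."""
    n = len(nu)
    common = [min(p, q) for p, q in zip(nu, rho)]
    excess_nu = [p - c for p, c in zip(nu, common)]    # nonzero only where nu > rho
    excess_rho = [q - c for q, c in zip(rho, common)]  # nonzero only where rho > nu
    mass = sum(excess_nu)  # equals ||nu - rho||_TV up to rounding
    pi = [[0.0] * n for _ in range(n)]
    for x in range(n):
        pi[x][x] = common[x]  # shared mass stays on the diagonal
    if mass > 0.0:
        for x in range(n):
            for y in range(n):
                # independent coupling of the excesses; their supports are
                # disjoint, so this part adds nothing on the diagonal
                pi[x][y] += excess_nu[x] * excess_rho[y] / mass
    return pi


nu = [0.5, 0.3, 0.2]
rho = [0.2, 0.3, 0.5]
pi = maximal_coupling(nu, rho)
off_diagonal = sum(pi[x][y] for x in range(3) for y in range(3) if x != y)
print(off_diagonal, total_variation(nu, rho))
```

Both printed numbers agree up to floating-point rounding, as the coupling characterization predicts.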
[guided]
We need a coupling that turns total variation into a probability of disagreement. Let $\Pi(\nu,\rho)$ denote the set of Borel probability measures $\pi$ on $X\times X$ whose first marginal is $\nu$ and whose second marginal is $\rho$. The total variation convention used in this proof is
\begin{align*}
\|\nu-\rho\|_{\mathrm{TV}}:=\sup_{A\in\mathcal B(X)}|\nu(A)-\rho(A)|.
\end{align*}
This is the convention without a factor $2$, namely $\|\nu-\rho\|_{\mathrm{TV}}=\frac12|\nu-\rho|(X)$, and it is the one for which Pinsker's inequality later carries the factor $1/2$.
Define the diagonal set
\begin{align*}
\Delta:=\{(x,y)\in X\times X:x=y\}.
\end{align*}
Since $X$ is a separable metric space, the Borel $\sigma$-algebra of the product topology satisfies $\mathcal B(X\times X)=\mathcal B(X)\otimes\mathcal B(X)$. Because $d:X\times X\to[0,\infty)$ is continuous and $\Delta=d^{-1}(\{0\})$, the set $\Delta$ is closed in $X\times X$. Therefore $\Delta$ and its complement $(X\times X)\setminus\Delta$ are measurable with respect to $\mathcal B(X)\otimes\mathcal B(X)$. Let $\mathbb{1}_{(X\times X)\setminus\Delta}:X\times X\to\{0,1\}$ denote the indicator function of this measurable complement.
The coupling characterization of total variation says that, under the convention above, there is a coupling $\pi\in\Pi(\nu,\rho)$ such that
\begin{align*}
\pi\bigl((X\times X)\setminus\Delta\bigr)=\|\nu-\rho\|_{\mathrm{TV}}.
\end{align*}
This coupling is often called a maximal coupling: it makes the two coupled coordinates equal with the largest possible probability, namely $1-\|\nu-\rho\|_{\mathrm{TV}}$. We use it because the diameter estimate bounds the transport cost by the probability that the two coordinates fail to coincide.
[/guided]
[/step]
[step:Bound the Wasserstein distance by total variation using the diameter]
Let $\pi\in\Pi(\nu,\rho)$ be the coupling chosen above. Since $X$ is a separable metric space, $\mathcal B(X\times X)=\mathcal B(X)\otimes\mathcal B(X)$; hence the continuous function $d:X\times X\to[0,\infty)$ is $\mathcal B(X)\otimes\mathcal B(X)$-measurable. By the definition of the $1$-Wasserstein distance as an infimum over couplings,
\begin{align*}
W_1(\nu,\rho)\le \int_{X\times X} d(x,y)\,d\pi(x,y).
\end{align*}
For every $(x,y)\in X\times X$, the diameter assumption gives
\begin{align*}
d(x,y)\le D\mathbb{1}_{(X\times X)\setminus\Delta}(x,y),
\end{align*}
because $d(x,x)=0$ on $\Delta$ and $d(x,y)\le D$ off $\Delta$. Integrating this pointwise inequality with respect to $\pi$ gives
\begin{align*}
\int_{X\times X} d(x,y)\,d\pi(x,y)\le D\int_{X\times X}\mathbb{1}_{(X\times X)\setminus\Delta}(x,y)\,d\pi(x,y).
\end{align*}
Combining the last two displays with the defining property of $\pi$, we obtain
\begin{align*}
W_1(\nu,\rho)\le D\pi\bigl((X\times X)\setminus\Delta\bigr)=D\|\nu-\rho\|_{\mathrm{TV}}.
\end{align*}
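A concrete instance of $W_1\le D\|\nu-\rho\|_{\mathrm{TV}}$: take $X$ to be finitely many points of the real line with $d(x,y)=|x-y|$, so that $D$ is the distance between the extreme points and $W_1$ has the closed form $\int_{\mathbb R}|F_\nu-F_\rho|$ in terms of the distribution functions. The Python sketch assumes increasing support points; `w1_on_line` and `total_variation` are hypothetical helper names.

```python
def total_variation(nu, rho):
    """sup_A |nu(A) - rho(A)| on a finite set, i.e. half the l1 distance."""
    return 0.5 * sum(abs(p - q) for p, q in zip(nu, rho))


def w1_on_line(points, nu, rho):
    """Exact W_1 for weights nu, rho on increasing real points, metric |x - y|.

    Uses W_1 = integral of |F_nu - F_rho|, where the CDFs F_nu, F_rho are
    piecewise constant between consecutive support points.
    """
    w, cdf_nu, cdf_rho = 0.0, 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_nu += nu[i]
        cdf_rho += rho[i]
        w += abs(cdf_nu - cdf_rho) * (points[i + 1] - points[i])
    return w


points = [0.0, 1.0, 3.0]  # increasing; diameter D = 3
nu = [0.5, 0.3, 0.2]
rho = [0.2, 0.3, 0.5]
diam = points[-1] - points[0]
print(w1_on_line(points, nu, rho), diam * total_variation(nu, rho))
```

In this example both printed numbers are $0.9$ up to rounding: the excess mass $0.3$ of $\nu$ sits at $0$ and the excess of $\rho$ sits at $3$, a distance $D$ apart, so the bound is attained.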
[/step]
[step:Apply Pinsker's inequality and combine the estimates]
Pinsker's inequality, with the total variation convention
\begin{align*}
\|\nu-\rho\|_{\mathrm{TV}}=\sup_{A\in\mathcal B(X)}|\nu(A)-\rho(A)|,
\end{align*}
states that, with $H$ defined using the natural logarithm,
\begin{align*}
\|\nu-\rho\|_{\mathrm{TV}}^2\le \frac{1}{2}H(\nu\mid\rho).
\end{align*}
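A numerical sanity check of Pinsker's inequality with this convention, on a finite set and with the natural logarithm. The sketch samples random pairs of strictly positive probability vectors and records the smallest value of $\frac12H(\nu\mid\rho)-\|\nu-\rho\|_{\mathrm{TV}}^2$; it also evaluates the two-point family $\nu=(\frac12+\varepsilon,\frac12-\varepsilon)$, $\rho=(\frac12,\frac12)$, for which $\|\nu-\rho\|_{\mathrm{TV}}=\varepsilon$ and $H(\nu\mid\rho)=2\varepsilon^2+O(\varepsilon^4)$. The sample size, seed, and helper names are illustrative assumptions, not part of the proof.

```python
import math
import random


def total_variation(nu, rho):
    """sup_A |nu(A) - rho(A)| on a finite set, i.e. half the l1 distance."""
    return 0.5 * sum(abs(p - q) for p, q in zip(nu, rho))


def relative_entropy(nu, rho):
    """Natural-log relative entropy on a finite set (inf if nu is not << rho)."""
    total = 0.0
    for p, q in zip(nu, rho):
        if p > 0.0:
            if q == 0.0:
                return math.inf
            total += p * math.log(p / q)
    return total


def random_distribution(n, rng):
    """Hypothetical helper: a random strictly positive probability vector."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    s = sum(weights)
    return [w / s for w in weights]


rng = random.Random(0)
min_slack = math.inf
for _ in range(10_000):
    nu = random_distribution(5, rng)
    rho = random_distribution(5, rng)
    # Pinsker: TV^2 <= H / 2, so this slack should never be negative
    slack = 0.5 * relative_entropy(nu, rho) - total_variation(nu, rho) ** 2
    min_slack = min(min_slack, slack)

# Two-point family: TV = e and H ~ 2 e^2, so TV^2 / H tends to 1/2
e = 1e-3
nu_e, rho_e = [0.5 + e, 0.5 - e], [0.5, 0.5]
ratio = total_variation(nu_e, rho_e) ** 2 / relative_entropy(nu_e, rho_e)
print(min_slack >= -1e-12, ratio)
```

The first printed value should be `True`, and the ratio comes out close to $1/2$, consistent with the constant $1/2$ in Pinsker's inequality being sharp.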
Both sides of the coupling estimate $W_1(\nu,\rho)\le D\|\nu-\rho\|_{\mathrm{TV}}$ are nonnegative, so squaring it gives
\begin{align*}
W_1(\nu,\rho)^2\le D^2\|\nu-\rho\|_{\mathrm{TV}}^2.
\end{align*}
Combining this with Pinsker's inequality yields
\begin{align*}
W_1(\nu,\rho)^2\le \frac{D^2}{2}H(\nu\mid\rho).
\end{align*}
This is the desired transportation-entropy inequality.
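Putting the two estimates together, the sketch below checks $W_1(\nu,\rho)^2\le\frac{D^2}{2}H(\nu\mid\rho)$ on a finite subset of the real line with $d(x,y)=|x-y|$, again with the natural logarithm. It also evaluates two points at distance $D$ with $\nu=(\frac12+\varepsilon,\frac12-\varepsilon)$ and $\rho=(\frac12,\frac12)$: there $W_1=D\varepsilon$ and $H=2\varepsilon^2+O(\varepsilon^4)$, so the ratio of the two sides tends to $1$. The support points, seed, and helper names are hypothetical choices for illustration.

```python
import math
import random


def relative_entropy(nu, rho):
    """Natural-log relative entropy on a finite set (inf if nu is not << rho)."""
    total = 0.0
    for p, q in zip(nu, rho):
        if p > 0.0:
            if q == 0.0:
                return math.inf
            total += p * math.log(p / q)
    return total


def w1_on_line(points, nu, rho):
    """Exact W_1 on increasing real points via W_1 = integral of |F_nu - F_rho|."""
    w, cdf_nu, cdf_rho = 0.0, 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_nu += nu[i]
        cdf_rho += rho[i]
        w += abs(cdf_nu - cdf_rho) * (points[i + 1] - points[i])
    return w


def transport_entropy_slack(points, nu, rho):
    """D^2/2 * H(nu | rho) - W_1(nu, rho)^2, which the inequality says is >= 0."""
    diam = points[-1] - points[0]
    return diam ** 2 / 2 * relative_entropy(nu, rho) - w1_on_line(points, nu, rho) ** 2


def random_distribution(n, rng):
    """Hypothetical helper: a random strictly positive probability vector."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    s = sum(weights)
    return [w / s for w in weights]


rng = random.Random(1)
points = [0.0, 0.4, 1.5, 2.0]  # illustrative support, diameter D = 2
min_slack = min(
    transport_entropy_slack(points, random_distribution(4, rng), random_distribution(4, rng))
    for _ in range(10_000)
)

# Two points at distance D: W_1 = D e and H ~ 2 e^2, so the ratio tends to 1
e, D = 1e-3, 2.0
nu_e, rho_e = [0.5 + e, 0.5 - e], [0.5, 0.5]
ratio = w1_on_line([0.0, D], nu_e, rho_e) ** 2 / (D ** 2 / 2 * relative_entropy(nu_e, rho_e))
print(min_slack >= -1e-12, ratio)
```

A ratio close to $1$ indicates that, for general spaces of diameter $D$, the constant $D^2/2$ cannot be improved.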
[/step]