[proofplan]
We prove the inequality first when $H(\nu\mid\rho)<\infty$, since the infinite-entropy case is immediate. The key input is the sequential disintegration of $\nu$ along the coordinates and the entropy chain rule relative to the product reference measure $\rho$. For each history of previous coordinates, we transport the conditional law of the next coordinate to $\rho_i$ by an almost optimal coordinate coupling, and then glue these coordinate kernels into a coupling of $\nu$ with the full product measure $\rho$. The additivity of the cost $d_p^p=\sum_i d_i^p$ converts the coordinate estimates into the desired product estimate, and the maximum of the constants $C_i$ controls the sum.
[/proofplan]
[step:Reduce to finite relative entropy]
Let $\nu\in\mathcal P(X)$. If $H(\nu\mid\rho)=+\infty$, then the asserted inequality reads
\begin{align*}
W_{d_p,p}(\nu,\rho)^p\le+\infty,
\end{align*}
which holds by definition of the extended real order. Hence assume from now on that
\begin{align*}
H(\nu\mid\rho)<+\infty.
\end{align*}
In particular, $\nu\ll\rho$.
For $i\in\{1,\dots,n\}$, define
\begin{align*}
X_{<i}:=\prod_{j=1}^{i-1}X_j,
\end{align*}
with the convention that $X_{<1}$ is a one-point measurable space. Let $\nu_{<i}$ denote the marginal of $\nu$ on $X_{<i}$. By the disintegration theorem for probability measures on Polish spaces, applied successively to the coordinate projections, there exist probability kernels $K_i:X_{<i}\times\mathcal B(X_i)\to[0,1]$ such that, writing $K_i(z,\cdot)=:\nu_{i,z}$ for $z\in X_{<i}$, the measure $\nu$ has the sequential representation
\begin{align*}
d\nu(x_1,\dots,x_n)=d\nu_{1,*}(x_1)\prod_{i=2}^n d\nu_{i,(x_1,\dots,x_{i-1})}(x_i),
\end{align*}
where $\nu_{1,*}$ is the first marginal of $\nu$.
Since $\rho=\rho_1\otimes\cdots\otimes\rho_n$ is a product measure and $\nu\ll\rho$, the corresponding conditional laws satisfy
\begin{align*}
\nu_{i,z}\ll\rho_i
\end{align*}
for $\nu_{<i}$-almost every $z\in X_{<i}$ and every $i\in\{1,\dots,n\}$.
[/step]
[step:Decompose the entropy into conditional coordinate entropies]
By the chain rule for relative entropy on standard Borel spaces, applied to the sequential disintegration above and to the product disintegration of $\rho$, one has
\begin{align*}
H(\nu\mid\rho)=\sum_{i=1}^n\int_{X_{<i}} H(\nu_{i,z}\mid\rho_i)\,d\nu_{<i}(z),
\end{align*}
where for $i=1$ the integral over $X_{<1}$ means the single value $H(\nu_{1,*}\mid\rho_1)$.
[guided]
The point of this step is to isolate the entropy cost paid in each coordinate. The reference measure is a product, so after fixing a history $z=(x_1,\dots,x_{i-1})\in X_{<i}$, the reference conditional law for the $i$-th coordinate is always $\rho_i$, independent of $z$.
For each $i\in\{1,\dots,n\}$, let $X_{<i}:=\prod_{j=1}^{i-1}X_j$, with $X_{<1}$ a one-point measurable space, let $\nu_{<i}$ be the marginal of $\nu$ on $X_{<i}$, and let $K_i:X_{<i}\times\mathcal B(X_i)\to[0,1]$ be a sequential disintegration kernel for $\nu$. Write $K_i(z,\cdot)=\nu_{i,z}$. We use the chain rule for relative entropy on standard Borel spaces. This theorem applies because every $X_i$ is Polish, hence each Borel space $(X_i,\mathcal B(X_i))$ is standard Borel, and finite products of standard Borel spaces are standard Borel. The theorem states that if a probability measure is disintegrated sequentially into conditional kernels and the reference measure is disintegrated in the same coordinate order, then the relative entropy is the sum of the relative entropies of the conditional laws, integrated over the preceding-coordinate marginals.
Here the reference disintegration is especially simple:
\begin{align*}
d\rho(x_1,\dots,x_n)=\prod_{i=1}^n d\rho_i(x_i).
\end{align*}
Thus the conditional reference law of $x_i$ given $(x_1,\dots,x_{i-1})$ is $\rho_i$. Therefore the entropy chain rule gives
\begin{align*}
H(\nu\mid\rho)=\sum_{i=1}^n\int_{X_{<i}} H(\nu_{i,z}\mid\rho_i)\,d\nu_{<i}(z).
\end{align*}
The finite-entropy assumption ensures that the right-hand side is finite and that the conditional entropies are finite for $\nu_{<i}$-almost every history $z$.
[/guided]
[/step]
[step:Choose measurable almost optimal coordinate couplings]
Fix $\varepsilon>0$. For each $i\in\{1,\dots,n\}$ and for $\nu_{<i}$-almost every $z\in X_{<i}$, the coordinate transport entropy assumption gives
\begin{align*}
W_{d_i,p}(\nu_{i,z},\rho_i)^p\le C_i H(\nu_{i,z}\mid\rho_i).
\end{align*}
Because $X_i$ is Polish, the space $\mathcal P(X_i\times X_i)$ with the narrow Borel structure is standard Borel. Define $J_i:\mathcal P(X_i\times X_i)\to[0,+\infty]$ by
\begin{align*}
J_i(\pi):=\int_{X_i\times X_i}d_i(x_i,y_i)^p\,d\pi((x_i,y_i)).
\end{align*}
The cost $(x_i,y_i)\mapsto d_i(x_i,y_i)^p$ is nonnegative and continuous, possibly unbounded. Hence $J_i$ is lower semicontinuous, and therefore Borel measurable, for narrow convergence: it is the increasing supremum of the bounded continuous truncation functionals obtained from $d_i^p\wedge m$, $m\in\mathbb N$. The marginal maps from $\mathcal P(X_i\times X_i)$ to $\mathcal P(X_i)$ are Borel, and the kernel map $z\mapsto\nu_{i,z}$ is measurable. Consequently the set of pairs $(z,\pi)$ such that $\pi\in\Pi(\nu_{i,z},\rho_i)$ and $J_i(\pi)\le W_{d_i,p}(\nu_{i,z},\rho_i)^p+\varepsilon/n$ is a measurable graph with nonempty sections. Hence the measurable selection theorem for probability kernels on Polish spaces gives, for each $i$, a probability kernel $\kappa_i:X_{<i}\times\mathcal B(X_i\times X_i)\to[0,1]$ such that, for $\nu_{<i}$-almost every $z\in X_{<i}$, the measure $\kappa_i(z,\cdot)$ belongs to $\Pi(\nu_{i,z},\rho_i)$ and satisfies
\begin{align*}
\int_{X_i\times X_i}d_i(x_i,y_i)^p\,d\kappa_i(z,(x_i,y_i))\le W_{d_i,p}(\nu_{i,z},\rho_i)^p+\frac{\varepsilon}{n}.
\end{align*}
Combining these two inequalities yields, for $\nu_{<i}$-almost every $z$,
\begin{align*}
\int_{X_i\times X_i}d_i(x_i,y_i)^p\,d\kappa_i(z,(x_i,y_i))\le C_i H(\nu_{i,z}\mid\rho_i)+\frac{\varepsilon}{n}.
\end{align*}
[/step]
[step:Glue the coordinate couplings into a product coupling]
Define a probability measure $\Gamma_\varepsilon$ on $X\times X$ by the following iterated kernel construction: sample $(x_1,y_1)$ according to $\kappa_1$, and for each $i\ge2$, after the history $x_{<i}=(x_1,\dots,x_{i-1})$ has been sampled, sample $(x_i,y_i)$ according to $\kappa_i(x_{<i},\cdot)$. Equivalently, for every bounded Borel map $\Phi:X\times X\to\mathbb R$,
\begin{align*}
\int_{X\times X}\Phi(x,y)\,d\Gamma_\varepsilon(x,y)
\end{align*}
is defined by the corresponding iterated integral against the kernels $\kappa_1,\kappa_2,\dots,\kappa_n$.
The first marginal of $\Gamma_\varepsilon$ is $\nu$, because each $\kappa_i(z,\cdot)$ has first marginal $\nu_{i,z}$ and the first-coordinate kernels reproduce the sequential disintegration of $\nu$. The second marginal of $\Gamma_\varepsilon$ is $\rho$, because each $\kappa_i(z,\cdot)$ has second marginal $\rho_i$, and this second marginal does not depend on the history $z$. Hence the $y_i$-coordinates are generated by the product kernels $\rho_i$, so the second marginal is $\rho_1\otimes\cdots\otimes\rho_n=\rho$. Thus
\begin{align*}
\Gamma_\varepsilon\in\Pi(\nu,\rho).
\end{align*}
[/step]
[step:Estimate the product cost by the conditional entropies]
Since $d_p(x,y)^p=\sum_{i=1}^n d_i(x_i,y_i)^p$ for every $x,y\in X$, Tonelli's theorem for the nonnegative Borel cost gives
\begin{align*}
\int_{X\times X}d_p(x,y)^p\,d\Gamma_\varepsilon((x,y))=\sum_{i=1}^n\int_{X\times X}d_i(x_i,y_i)^p\,d\Gamma_\varepsilon((x,y)).
\end{align*}
By the construction of $\Gamma_\varepsilon$, the $i$-th summand is
\begin{align*}
\int_{X_{<i}}\int_{X_i\times X_i}d_i(x_i,y_i)^p\,d\kappa_i(z,(x_i,y_i))\,d\nu_{<i}(z).
\end{align*}
Therefore
\begin{align*}
\int_{X\times X}d_p(x,y)^p\,d\Gamma_\varepsilon((x,y))\le\sum_{i=1}^n C_i\int_{X_{<i}}H(\nu_{i,z}\mid\rho_i)\,d\nu_{<i}(z)+\varepsilon.
\end{align*}
Let
\begin{align*}
C_{\max}:=\max_{1\le i\le n}C_i.
\end{align*}
Then
\begin{align*}
\int_{X\times X}d_p(x,y)^p\,d\Gamma_\varepsilon((x,y))\le C_{\max}\sum_{i=1}^n\int_{X_{<i}}H(\nu_{i,z}\mid\rho_i)\,d\nu_{<i}(z)+\varepsilon.
\end{align*}
Using the entropy chain rule from the previous step, this becomes
\begin{align*}
\int_{X\times X}d_p(x,y)^p\,d\Gamma_\varepsilon((x,y))\le C_{\max}H(\nu\mid\rho)+\varepsilon.
\end{align*}
[/step]
[step:Take the infimum over couplings and let the error vanish]
Since $\Gamma_\varepsilon\in\Pi(\nu,\rho)$, the definition of $W_{d_p,p}$ gives
\begin{align*}
W_{d_p,p}(\nu,\rho)^p
\le
\int_{X\times X}d_p(x,y)^p\,d\Gamma_\varepsilon(x,y).
\end{align*}
Combining this with the previous estimate yields
\begin{align*}
W_{d_p,p}(\nu,\rho)^p
\le
C_{\max}H(\nu\mid\rho)+\varepsilon.
\end{align*}
Because $\varepsilon>0$ was arbitrary, letting $\varepsilon\downarrow0$ gives
\begin{align*}
W_{d_p,p}(\nu,\rho)^p
\le
C_{\max}H(\nu\mid\rho).
\end{align*}
This is exactly the claimed tensorized transport entropy inequality.
[/step]