[guided]We first make the setup used throughout the proof explicit. If $H(\nu\mid\mu)=\infty$, the asserted inequality is vacuous because the right-hand side is $+\infty$, so assume $H(\nu\mid\mu)<\infty$; then $\nu\ll\mu$. For $i\in\{1,\dots,n\}$, define
\begin{align*}
X_{<i}:=X_1\times\cdots\times X_{i-1}
\end{align*}
and
\begin{align*}
\mathcal{X}_{<i}:=\mathcal{X}_1\otimes\cdots\otimes\mathcal{X}_{i-1}.
\end{align*}
For $i=1$, use the convention that $X_{<1}$ is a singleton with its one-point $\sigma$-algebra. Since the coordinate spaces are standard Borel, the [disintegration of measures](/theorems/971) applies to the projection onto the previous coordinates. Thus, for each $i$, choose a regular conditional probability kernel
\begin{align*}
\nu_i(\,\cdot\mid x_{<i}):\mathcal{X}_i\to[0,1]
\end{align*}
from $(X_{<i},\mathcal{X}_{<i})$ to $(X_i,\mathcal{X}_i)$ giving the conditional law of the $i$-th coordinate under $\nu$ given $x_{<i}=(x_1,\dots,x_{i-1})$.
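As an illustration of what these kernels encode, consider the case $n=2$; the sets $A_1\in\mathcal{X}_1$ and $A_2\in\mathcal{X}_2$ below are introduced only for this example. Since $X_{<1}$ is a singleton, $\nu_1$ is simply the first marginal of $\nu$, and the defining property of the disintegration reads
\begin{align*}
\nu(A_1\times A_2)=\int_{A_1}\nu_2(A_2\mid x_1)\,\nu_1(dx_1).
\end{align*}
For general $n$, the same factorization is iterated one coordinate at a time, so that $\nu$ is recovered by sampling $x_1$ from $\nu_1$, then $x_2$ from $\nu_2(\,\cdot\mid x_1)$, and so on.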
The quantity we need to control is the probability that the two coupled coordinates differ. For two probability measures on a single coordinate space $(X_i,\mathcal{X}_i)$, the smallest mismatch probability achievable by any coupling is their total variation distance. Thus, before constructing the full product coupling, we need a way to bound total variation by relative entropy.
Let $(E,\mathcal{E})$ be any measurable space, and let $\rho$ and $\pi$ be probability measures on it. Define
\begin{align*}
\|\rho-\pi\|_{\mathrm{TV}} := \sup_{A\in\mathcal{E}} |\rho(A)-\pi(A)|.
\end{align*}
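To make the definition concrete, here is a small example on a two-point space; the parameters $p,q\in[0,1]$ are assumptions of the example only. Take $E=\{0,1\}$ with $\rho(\{1\})=p$ and $\pi(\{1\})=q$. The only measurable sets are $\emptyset$, $\{0\}$, $\{1\}$, and $E$, and the corresponding values of $|\rho(A)-\pi(A)|$ are $0$, $|p-q|$, $|p-q|$, and $0$, so
\begin{align*}
\|\rho-\pi\|_{\mathrm{TV}}=|p-q|.
\end{align*}
In particular, with this normalization the total variation distance always lies in $[0,1]$.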
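We claim that $\|\rho-\pi\|_{\mathrm{TV}}\leq\sqrt{\tfrac{1}{2}H(\rho\mid\pi)}$. If $\rho\not\ll\pi$, then $H(\rho\mid\pi)=\infty$, so the bound is immediate. Assume $\rho\ll\pi$, and let $f:E\to[0,\infty)$ be a version of the Radon-Nikodym derivative $f=d\rho/d\pi$. Then, with the convention $0\log 0=0$,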
\begin{align*}
H(\rho\mid\pi)=\int_E f\log f\,d\pi.
\end{align*}
Because both $\rho$ and $\pi$ are probability measures, $\int_E f\,d\pi=1$ and $\int_E 1\,d\pi=1$, so
\begin{align*}
\int_E(f-1)\,d\pi=0.
\end{align*}
Therefore
\begin{align*}
H(\rho\mid\pi)=\int_E (f\log f-f+1)\,d\pi.
\end{align*}
The elementary inequality
\begin{align*}
t\log t-t+1 \geq \frac{3(t-1)^2}{2(t+2)}
\end{align*}
holds for every $t\geq0$, with the convention $0\log 0=0$. Applying it pointwise to $t=f(z)$ and integrating with respect to $\pi$ gives
\begin{align*}
H(\rho\mid\pi)\geq \int_E \frac{3(f-1)^2}{2(f+2)}\,d\pi.
\end{align*}
Now we convert this integral lower bound into a total variation bound. The identity
\begin{align*}
|f-1|=\frac{\sqrt{3}\,|f-1|}{\sqrt{2(f+2)}}\cdot\sqrt{\frac{2(f+2)}{3}}
\end{align*}
lets us apply the [Cauchy-Schwarz inequality](/theorems/432) in $L^2(E,\mathcal{E},\pi)$:
\begin{align*}
\left(\int_E |f-1|\,d\pi\right)^2 \leq \left(\int_E \frac{3(f-1)^2}{2(f+2)}\,d\pi\right)\left(\int_E \frac{2(f+2)}{3}\,d\pi\right).
\end{align*}
Since $\int_E f\,d\pi=1$, we have $\int_E \frac{2(f+2)}{3}\,d\pi=\frac{2(1+2)}{3}=2$, and we obtain
\begin{align*}
\left(\int_E |f-1|\,d\pi\right)^2 \leq 2H(\rho\mid\pi).
\end{align*}
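Since $\int_E(f-1)\,d\pi=0$, the positive and negative parts of $f-1$ have equal $\pi$-integrals, and the supremum defining the total variation distance is attained at $A=\{f>1\}$. Therefore
\begin{align*}
\|\rho-\pi\|_{\mathrm{TV}}=\int_E (f-1)^+\,d\pi=\frac{1}{2}\int_E |f-1|\,d\pi.
\end{align*}
Combining this with the previous display, we conclude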
\begin{align*}
\|\rho-\pi\|_{\mathrm{TV}} \leq \sqrt{\frac{1}{2}H(\rho\mid\pi)}.
\end{align*}
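As a sanity check on the constant, consider a two-point example; the specific measures are chosen only for illustration, and $\log$ is taken to be the natural logarithm. Let $E=\{0,1\}$, $\rho(\{1\})=\tfrac{3}{4}$, and $\pi(\{1\})=\tfrac{1}{2}$, so that $f(1)=\tfrac{3}{2}$ and $f(0)=\tfrac{1}{2}$. Then
\begin{align*}
\|\rho-\pi\|_{\mathrm{TV}}&=\tfrac{1}{2}\left(\tfrac{1}{2}\left|\tfrac{3}{2}-1\right|+\tfrac{1}{2}\left|\tfrac{1}{2}-1\right|\right)=\tfrac{1}{4},\\
H(\rho\mid\pi)&=\tfrac{3}{4}\log\tfrac{3}{2}+\tfrac{1}{4}\log\tfrac{1}{2}\approx 0.1308,\\
\sqrt{\tfrac{1}{2}H(\rho\mid\pi)}&\approx 0.2557.
\end{align*}
The bound $0.25\leq 0.2557$ holds with little room to spare, which suggests that the constant $\tfrac{1}{2}$ cannot be improved by much for measures that are close to each other.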
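Applying this bound with $\rho=\nu_i(\,\cdot\mid x_{<i})$ and $\pi=\mu_i$, both of which are probability measures on $(X_i,\mathcal{X}_i)$, yields, for each coordinate $i$ and every conditioning value $x_{<i}\in X_{<i}$ (with the right-hand side possibly equal to $+\infty$),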
\begin{align*}
\|\nu_i(\,\cdot\mid x_{<i})-\mu_i\|_{\mathrm{TV}}
\leq
\sqrt{\frac{1}{2}H(\nu_i(\,\cdot\mid x_{<i})\mid\mu_i)}.
\end{align*}[/guided]