[guided]We want to compare an arbitrary finite partition $\mathcal R$ of $X\times Y$ with a rectangle partition. This approximation uses only the product $\sigma$-algebra: measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, and finite rectangle algebras approximate product-measurable sets in measure.
Let $\rho:=\mu\otimes\nu$ and let $\mathcal R=\{R_1,\dots,R_m\}$ be a finite measurable partition of $X\times Y$. Fix $\varepsilon>0$. If $m=1$, then $\mathcal R=\{X\times Y\}$ up to null sets, and $H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=0$ for any finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$. Hence the claim is immediate in that case, so we assume $m\ge 2$. We first justify the approximation fact used here. Let $\mathscr A$ be the class of all sets $E\in\mathcal B_X\otimes\mathcal B_Y$ such that, for every $\delta>0$, there is a set $F$ in a finite algebra generated by measurable rectangles with
\begin{align*}
\rho(E\triangle F)<\delta.
\end{align*}
Why is $\mathscr A$ a $\sigma$-algebra? Complements are immediate because symmetric difference commutes with complement. For countable disjoint unions, let $(E_j)_{j\in\mathbb N}$ be pairwise disjoint sets in $\mathscr A$, define $E:=\bigcup_{j=1}^\infty E_j$, and fix $\delta>0$. Since $\rho$ is finite, continuity from below gives $N\in\mathbb N$ such that
\begin{align*}
\rho\left(E\setminus\bigcup_{j=1}^N E_j\right)<\frac{\delta}{2}.
\end{align*}
For $1\le j\le N$, choose $F_j$ in a finite rectangle algebra with
\begin{align*}
\rho(E_j\triangle F_j)<\frac{\delta}{2N}.
\end{align*}
Then $F:=\bigcup_{j=1}^N F_j$ lies in the finite algebra generated by all rectangles used for $F_1,\dots,F_N$, and
\begin{align*}
\rho(E\triangle F)\le \rho\left(E\setminus\bigcup_{j=1}^N E_j\right)+\sum_{j=1}^N\rho(E_j\triangle F_j)<\delta.
\end{align*}
Thus $\mathscr A$ is closed under countable disjoint unions, hence is a $\sigma$-algebra. Every measurable rectangle $B\times C$ belongs to $\mathscr A$, because it is already in the finite algebra generated by that rectangle. Since measurable rectangles generate $\mathcal B_X\otimes\mathcal B_Y$, we get $\mathscr A=\mathcal B_X\otimes\mathcal B_Y$.
Now apply this density result to each atom $R_i$ with tolerance $\eta/m^2$, where $\eta>0$ will be chosen later. Taking the common refinement of the finitely many rectangle algebras obtained for $R_1,\dots,R_m$, we may choose sets $A_1,\dots,A_m$ belonging to one finite rectangle algebra such that
\begin{align*}
\sum_{i=1}^m \rho(R_i\triangle A_i)<\frac{\eta}{m}.
\end{align*}
A finite rectangle algebra is generated by finitely many sets in $X$ and finitely many sets in $Y$. Refining those generators into atoms gives finite measurable partitions $\mathcal P$ of $X$ and $\mathcal Q$ of $Y$ such that each $A_i$ is a union of atoms of $\mathcal P\boxtimes\mathcal Q$.
The sets $A_i$ need not themselves form a partition. We turn them into one without leaving the same rectangle algebra. Define
\begin{align*}
C_1:=A_1.
\end{align*}
For $2\le i\le m-1$, define
\begin{align*}
C_i:=A_i\setminus\bigcup_{\ell=1}^{i-1}A_\ell.
\end{align*}
Finally define
\begin{align*}
C_m:=(X\times Y)\setminus\bigcup_{\ell=1}^{m-1}C_\ell.
\end{align*}
Then $\mathcal C=\{C_1,\dots,C_m\}$ is a finite measurable partition, and every $C_i$ is still a union of atoms of $\mathcal P\boxtimes\mathcal Q$. Hence $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$. We also keep explicit control of the approximation error. For $1\le i\le m-1$, because $C_i\subset A_i$,
\begin{align*}
R_i\triangle C_i\subset (R_i\triangle A_i)\cup(A_i\setminus C_i).
\end{align*}
The only points removed from $A_i$ are those already lying in some earlier $A_\ell$. If $z\in A_i\cap A_\ell$ with $\ell<i$, then $z$ cannot lie in both disjoint atoms $R_i$ and $R_\ell$, so $z\in(A_i\triangle R_i)\cup(A_\ell\triangle R_\ell)$. Thus
\begin{align*}
\sum_{i=1}^{m-1}\rho(R_i\triangle C_i)\le m\sum_{i=1}^m\rho(R_i\triangle A_i)<\eta.
\end{align*}
Finally, since both $\mathcal R$ and $\mathcal C$ are partitions, any point in $R_m\triangle C_m$ must already lie in one of the symmetric differences $R_i\triangle C_i$ for $1\le i\le m-1$. Therefore
\begin{align*}
\sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta.
\end{align*}
Now we convert closeness in measure into small conditional entropy. Define the label maps
\begin{align*}
L_{\mathcal R}:X\times Y\to\{1,\dots,m\}
\end{align*}
and
\begin{align*}
L_{\mathcal C}:X\times Y\to\{1,\dots,m\}
\end{align*}
by $L_{\mathcal R}(z)=i$ for $z\in R_i$ and $L_{\mathcal C}(z)=i$ for $z\in C_i$. Let
\begin{align*}
E:=\{z\in X\times Y: L_{\mathcal R}(z)\ne L_{\mathcal C}(z)\}.
\end{align*}
If the two labels disagree at $z$, then $z$ belongs to at least one symmetric difference $R_i\triangle C_i$, so
\begin{align*}
\rho(E)\le \sum_{i=1}^m\rho(R_i\triangle C_i)<2\eta.
\end{align*}
Define the error-indicator map $I:X\times Y\to\{0,1\}$ by $I(z)=1$ for $z\in E$ and $I(z)=0$ for $z\notin E$. Given the $\mathcal C$-label and $I$, the $\mathcal R$-label is determined on $E^c$ and has at most $m$ possible values on $E$. Expanding the joint entropy of the finite labels $(L_{\mathcal R},L_{\mathcal C},I)$ by $H(A,B)=H(B)+H(A\mid B)$ gives the elementary bound
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C)\le H_\rho(\{E,E^c\})+\rho(E)\log m.
\end{align*}
Writing out the two-atom entropy gives
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C)\le -\rho(E)\log\rho(E)-(1-\rho(E))\log(1-\rho(E))+\rho(E)\log m.
\end{align*}
The right-hand side tends to $0$ as $\rho(E)\to 0$. Choose $\eta>0$ small enough that this upper bound is below $\varepsilon$. Since $\mathcal P\boxtimes\mathcal Q$ refines $\mathcal C$, there is a finite partition $\mathcal D$ such that $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$. The chain rule gives
\begin{align*}
H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal D\mid\mathcal C)+H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)
\end{align*}
and also
\begin{align*}
H_\rho(\mathcal R\vee\mathcal D\mid\mathcal C)=H_\rho(\mathcal R\mid\mathcal C)+H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C).
\end{align*}
Why does this imply the desired direction of monotonicity? The finite-sum formula for conditional entropy says that, on each atom of $\mathcal C$, splitting further according to $\mathcal R$ cannot increase the average uncertainty of the $\mathcal D$-label; this is exactly the log-sum inequality. Hence
\begin{align*}
H_\rho(\mathcal D\mid\mathcal R\vee\mathcal C)\le H_\rho(\mathcal D\mid\mathcal C).
\end{align*}
Comparing the two chain-rule identities gives
\begin{align*}
H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C).
\end{align*}
Since $\mathcal P\boxtimes\mathcal Q=\mathcal C\vee\mathcal D$, we conclude
\begin{align*}
H_\rho(\mathcal R\mid\mathcal P\boxtimes\mathcal Q)=H_\rho(\mathcal R\mid\mathcal C\vee\mathcal D)\le H_\rho(\mathcal R\mid\mathcal C)<\varepsilon.
\end{align*}
This proves the desired approximation in conditional entropy.[/guided]