[proofplan]
We dominate both measures by $\lambda=\mu+\nu$ and split their Radon-Nikodym densities into a common part and two residual parts. The common part will be used to force equality of the two coordinates, while the residual parts will be shown to be mutually singular, so they cannot contribute equality. Finally we construct the coupling as a mixture of the common branch and the residual branch, compute its marginals, and identify the common mass with $1-\|\mu-\nu\|_{\mathrm{TV}}$.
[/proofplan]
[step:Split the two measures into a common part and residual parts]
Define the finite measure $\lambda$ on $(E,\mathcal E)$ by
\begin{align*}
\lambda(A):=\mu(A)+\nu(A)\quad\text{for every }A\in\mathcal E.
\end{align*}
Since $\mu\ll\lambda$ and $\nu\ll\lambda$, the [Radon-Nikodym theorem](/theorems/1247) for finite measures gives $\mathcal E$-measurable extended non-negative densities for $\mu$ and $\nu$ with respect to $\lambda$. Because both densities have finite integral with respect to the finite measure $\lambda$, they are finite $\lambda$-a.e.; replacing them by $0$ on the $\lambda$-null set where either density is infinite, we obtain $\mathcal E$-[measurable functions](/page/Measurable%20Functions)
\begin{align*}
f,g:(E,\mathcal E)\to([0,\infty),\mathcal B([0,\infty)))
\end{align*}
such that, for every $A\in\mathcal E$,
\begin{align*}
\mu(A)=\int_A f\,d\lambda(x)
\end{align*}
and
\begin{align*}
\nu(A)=\int_A g\,d\lambda(x).
\end{align*}
Define the measurable function $h:E\to[0,\infty)$ by $h(x):=\min\{f(x),g(x)\}$, and define the finite measure $\alpha$ on $(E,\mathcal E)$ by
\begin{align*}
\alpha(A):=\int_A h\,d\lambda(x)\quad\text{for every }A\in\mathcal E.
\end{align*}
Because $h\le f$ and $h\le g$, the set functions
\begin{align*}
\beta:=\mu-\alpha
\end{align*}
and
\begin{align*}
\gamma:=\nu-\alpha
\end{align*}
are finite positive measures. Let
\begin{align*}
m:=\alpha(E).
\end{align*}
Then $0\le m\le 1$, and
\begin{align*}
\beta(E)=\gamma(E)=1-m.
\end{align*}
[guided]
The first goal is to isolate exactly the mass that $\mu$ and $\nu$ share. To compare them pointwise, we choose one measure that dominates both. Define
\begin{align*}
\lambda(A):=\mu(A)+\nu(A)\quad\text{for every }A\in\mathcal E.
\end{align*}
If $\lambda(A)=0$, then $\mu(A)=0$ and $\nu(A)=0$, so $\mu\ll\lambda$ and $\nu\ll\lambda$. Therefore the [Radon-Nikodym theorem](/theorems/2640) for finite measures applies and gives extended non-negative $\mathcal E$-measurable densities for $\mu$ and $\nu$ with respect to $\lambda$. Since
\begin{align*}
\int_E f\,d\lambda(x)=\mu(E)=1
\end{align*}
and
\begin{align*}
\int_E g\,d\lambda(x)=\nu(E)=1,
\end{align*}
each density is finite $\lambda$-a.e. Replacing both densities by $0$ on the $\lambda$-null set where either one is infinite does not change any integral over a set in $\mathcal E$. Thus we may and do take $\mathcal E$-measurable functions
\begin{align*}
f,g:(E,\mathcal E)\to([0,\infty),\mathcal B([0,\infty)))
\end{align*}
such that
\begin{align*}
\mu(A)=\int_A f\,d\lambda(x)
\end{align*}
and
\begin{align*}
\nu(A)=\int_A g\,d\lambda(x)
\end{align*}
for every measurable set $A\in\mathcal E$.
The common part should have, at each point, exactly the smaller of the two densities. Define $h:E\to[0,\infty)$ by $h(x):=\min\{f(x),g(x)\}$. Since $f$ and $g$ are measurable, $h$ is measurable. Now define the finite measure $\alpha$ by
\begin{align*}
\alpha(A):=\int_A h\,d\lambda(x)\quad\text{for every }A\in\mathcal E.
\end{align*}
The inequalities $h\le f$ and $h\le g$ imply $\alpha(A)\le\mu(A)$ and $\alpha(A)\le\nu(A)$ for every $A\in\mathcal E$. Hence the residual set functions
\begin{align*}
\beta:=\mu-\alpha
\end{align*}
and
\begin{align*}
\gamma:=\nu-\alpha
\end{align*}
are positive finite measures.
Finally set
\begin{align*}
m:=\alpha(E).
\end{align*}
Since $\alpha\le\mu$ and $\mu(E)=1$, we have $0\le m\le 1$. Also,
\begin{align*}
\beta(E)=\mu(E)-\alpha(E)=1-m
\end{align*}
and
\begin{align*}
\gamma(E)=\nu(E)-\alpha(E)=1-m.
\end{align*}
Thus the two residual measures have the same total mass, so after normalization they can be used as the two marginals on the non-equality branch.
[/guided]
[/step]
[step:Show that the residual measures are mutually singular]
Define the measurable set
\begin{align*}
B:=\{x\in E:f(x)>g(x)\}.
\end{align*}
For every $A\in\mathcal E$,
\begin{align*}
\beta(A)=\int_A (f-h)\,d\lambda(x)=\int_A (f-g)^+\,d\lambda(x)
\end{align*}
and
\begin{align*}
\gamma(A)=\int_A (g-h)\,d\lambda(x)=\int_A (g-f)^+\,d\lambda(x).
\end{align*}
On $E\setminus B$ we have $(f-g)^+=0$, hence $\beta(E\setminus B)=0$. On $B$ we have $(g-f)^+=0$, hence $\gamma(B)=0$. Therefore $\beta$ and $\gamma$ are mutually singular.
[/step]
[step:Identify the common mass with one minus total variation distance]
Let $u:E\to[0,\infty)$ and $v:E\to[0,\infty)$ be the measurable functions defined by $u(x):=(f(x)-g(x))^+$ and $v(x):=(g(x)-f(x))^+$. Since
\begin{align*}
f-g=u-v
\end{align*}
pointwise and $\mu(E)=\nu(E)=1$, we have
\begin{align*}
0=\mu(E)-\nu(E)=\int_E (f-g)\,d\lambda(x)=\int_E u\,d\lambda(x)-\int_E v\,d\lambda(x).
\end{align*}
Hence
\begin{align*}
\int_E u\,d\lambda(x)=\int_E v\,d\lambda(x).
\end{align*}
For every $A\in\mathcal E$,
\begin{align*}
\mu(A)-\nu(A)=\int_A (f-g)\,d\lambda(x)\le \int_A u\,d\lambda(x)\le \int_E u\,d\lambda(x)
\end{align*}
and
\begin{align*}
\nu(A)-\mu(A)=\int_A (g-f)\,d\lambda(x)\le \int_A v\,d\lambda(x)\le \int_E v\,d\lambda(x)=\int_E u\,d\lambda(x).
\end{align*}
Therefore
\begin{align*}
|\mu(A)-\nu(A)|\le \int_E u\,d\lambda(x)
\end{align*}
for every $A\in\mathcal E$. Taking $A=B=\{f>g\}$ gives
\begin{align*}
\mu(B)-\nu(B)=\int_B u\,d\lambda(x)=\int_E u\,d\lambda(x),
\end{align*}
because $u=0$ on $E\setminus B$. Thus
\begin{align*}
\|\mu-\nu\|_{\mathrm{TV}}=\int_E (f-g)^+\,d\lambda(x).
\end{align*}
Since
\begin{align*}
(f-g)^+ + \min\{f,g\}=f
\end{align*}
pointwise and $\int_E f\,d\lambda(x)=\mu(E)=1$, we obtain
\begin{align*}
\|\mu-\nu\|_{\mathrm{TV}}+m=1.
\end{align*}
Thus
\begin{align*}
m=1-\|\mu-\nu\|_{\mathrm{TV}}.
\end{align*}
[/step]
[step:Construct the coupling in the identical-measure case]
If $m=1$, then $\alpha(E)=\mu(E)=\nu(E)=1$ and $\alpha\le\mu,\nu$, hence $\alpha=\mu=\nu$. Take the complete probability space $(\Omega,\mathcal F,\mathbb P):=(E,\mathcal E^\mu,\mu^\mu)$, where $\mathcal E^\mu$ is the $\mu$-completion of $\mathcal E$ and $\mu^\mu$ is the completed measure. Define $X,Y:\Omega\to E$ by
\begin{align*}
X(\omega):=\omega\quad\text{and}\quad Y(\omega):=\omega.
\end{align*}
Then $X$ and $Y$ both have law $\mu=\nu$, and $\mathbb P(X\neq Y)=0$. Since $m=1$, the preceding step gives $\|\mu-\nu\|_{\mathrm{TV}}=0$, so this coupling is maximal.
[/step]
[step:Construct the mixture coupling when residual mass is present]
Assume $m<1$, and define the probability measures $\rho$ and $\sigma$ on $(E,\mathcal E)$ by
\begin{align*}
\rho(A):=\frac{\beta(A)}{1-m}\quad\text{and}\quad \sigma(A):=\frac{\gamma(A)}{1-m}
\end{align*}
for every $A\in\mathcal E$.
If $m>0$, let $\overline{\alpha}$ be the probability measure $\alpha/m$ on $(E,\mathcal E)$. Define the measurable space
\begin{align*}
\Omega_0:=\{0\}\times E
\end{align*}
with measure $m\,\overline{\alpha}$. Define maps $X_0,Y_0:\Omega_0\to E$ by
\begin{align*}
X_0(0,z):=z\quad\text{and}\quad Y_0(0,z):=z.
\end{align*}
If $m=0$, omit this branch.
Define the residual branch
\begin{align*}
\Omega_1:=\{1\}\times E\times E
\end{align*}
with measure $(1-m)(\rho\otimes\sigma)$ on $\{1\}\times\mathcal E\otimes\mathcal E$. Define maps $X_1,Y_1:\Omega_1\to E$ by
\begin{align*}
X_1(1,x,y):=x\quad\text{and}\quad Y_1(1,x,y):=y.
\end{align*}
Let $I:=\{0,1\}$ if $m>0$ and $I:=\{1\}$ if $m=0$. Define the branch space $\Omega^\ast$ by
\begin{align*}
\Omega^\ast:=\bigl(\{0\}\times E\bigr)\cup\bigl(\{1\}\times E\times E\bigr)
\end{align*}
when $m>0$, and define $\Omega^\ast:=\{1\}\times E\times E$ when $m=0$. Let $\mathcal F^\ast$ be the disjoint-union $\sigma$-algebra: a set $C\subset\Omega^\ast$ lies in $\mathcal F^\ast$ exactly when $C\cap(\{0\}\times E)$ belongs to $\{0\}\times\mathcal E$ if $m>0$, and $C\cap(\{1\}\times E\times E)$ belongs to $\{1\}\times\mathcal E\otimes\mathcal E$. Define the probability measure $\mathbb P^\ast$ on $(\Omega^\ast,\mathcal F^\ast)$ by summing the branch measures $m\overline{\alpha}$ on $\{0\}\times E$ when $m>0$ and $(1-m)(\rho\otimes\sigma)$ on $\{1\}\times E\times E$. Let $(\Omega,\mathcal F,\mathbb P)$ be the completion of $(\Omega^\ast,\mathcal F^\ast,\mathbb P^\ast)$.
Define maps $X^\ast,Y^\ast:\Omega^\ast\to E$ as follows: on $\Omega_0$ when $m>0$, set $X^\ast:=X_0$ and $Y^\ast:=Y_0$; on $\Omega_1$, set $X^\ast:=X_1$ and $Y^\ast:=Y_1$. These maps are $\mathcal F^\ast/\mathcal E$-measurable because their restrictions to each branch are coordinate projections. Let $X,Y:\Omega\to E$ be the same pointwise maps regarded on the completed domain. Since completion enlarges only the domain $\sigma$-algebra, $X$ and $Y$ remain $\mathcal F/\mathcal E$-measurable.
For every $A\in\mathcal E$,
\begin{align*}
\mathbb P(X\in A)=\alpha(A)+\beta(A)=\mu(A)
\end{align*}
and
\begin{align*}
\mathbb P(Y\in A)=\alpha(A)+\gamma(A)=\nu(A).
\end{align*}
Hence $(X,Y)$ is a coupling of $\mu$ and $\nu$.
[guided]
When $m<1$, there is a genuine residual part. We normalize the residual measures by defining probability measures $\rho$ and $\sigma$ through
\begin{align*}
\rho(A):=\frac{\beta(A)}{1-m}\quad\text{and}\quad \sigma(A):=\frac{\gamma(A)}{1-m}
\end{align*}
for every $A\in\mathcal E$. These are probability measures because $\beta(E)=\gamma(E)=1-m$.
The coupling is built by a two-branch experiment. On the common branch, which has probability $m$, we draw one point and assign it to both coordinates. If $m>0$, define $\overline{\alpha}:=\alpha/m$ and put
\begin{align*}
\Omega_0:=\{0\}\times E.
\end{align*}
On this branch the measure is $m\,\overline{\alpha}$. The coordinate maps are $X_0,Y_0:\Omega_0\to E$, defined by
\begin{align*}
X_0(0,z):=z\quad\text{and}\quad Y_0(0,z):=z.
\end{align*}
Thus equality is forced on the whole common branch.
On the residual branch, which has probability $1-m$, we use the product measure of the normalized residual laws. Define
\begin{align*}
\Omega_1:=\{1\}\times E\times E
\end{align*}
with measure $(1-m)(\rho\otimes\sigma)$. Define maps $X_1,Y_1:\Omega_1\to E$ by
\begin{align*}
X_1(1,x,y):=x\quad\text{and}\quad Y_1(1,x,y):=y.
\end{align*}
The product measure is used only to obtain the required marginals; the crucial point is that $\rho$ and $\sigma$ are mutually singular because $\beta$ and $\gamma$ are.
Now take the measurable disjoint union of the two branches. Its $\sigma$-algebra consists of the sets whose intersections with the branch spaces are measurable in the corresponding branch $\sigma$-algebras, and its probability measure is obtained by summing the weighted branch measures. Complete this probability space and call the completion $(\Omega,\mathcal F,\mathbb P)$. The branchwise coordinate maps are measurable before completion because their restrictions are coordinate projections, and they remain measurable after completion because the domain $\sigma$-algebra has only been enlarged. Define $X$ and $Y$ by using the branchwise maps just described.
The completion is useful because equality events on arbitrary measurable spaces can be awkward at the product-space level, while null subsets become measurable after completion. We will prove in the next step that the residual equality event is contained in a measurable null set; this is the point where no countable-generation hypothesis on $\mathcal E$ is needed.
We check the first marginal. For any $A\in\mathcal E$, the common branch contributes $\alpha(A)$, because its weighted law is $m(\alpha/m)=\alpha$. The residual branch contributes
\begin{align*}
(1-m)\rho(A)=\beta(A).
\end{align*}
Therefore
\begin{align*}
\mathbb P(X\in A)=\alpha(A)+\beta(A)=\mu(A).
\end{align*}
The same computation for the second coordinate gives
\begin{align*}
\mathbb P(Y\in A)=\alpha(A)+\gamma(A)=\nu(A).
\end{align*}
Thus $X$ has law $\mu$ and $Y$ has law $\nu$, so $(X,Y)$ is a coupling.
[/guided]
[/step]
[step:Prove that equality occurs exactly on the common branch up to a null set]
Let $B=\{f>g\}$ be the measurable set from the singularity step. Since $\beta(E\setminus B)=0$ and $\gamma(B)=0$, the normalized residual measures satisfy
\begin{align*}
\rho(E\setminus B)=0
\end{align*}
and
\begin{align*}
\sigma(B)=0.
\end{align*}
On the residual branch, the set-theoretic equality event $\{X=Y\}$ is contained in
\begin{align*}
\{X\in E\setminus B\}\cup\{Y\in B\}.
\end{align*}
The containing set is measurable because $X$ and $Y$ are measurable and $B\in\mathcal E$, and it has probability
\begin{align*}
(1-m)\rho(E\setminus B)+(1-m)\sigma(B)=0.
\end{align*}
Since $(\Omega,\mathcal F,\mathbb P)$ is complete, every subset of this measurable null set belongs to $\mathcal F$. Hence the residual equality event is measurable and has probability zero. The equality event on the common branch is the whole common branch, which is measurable and has probability $m$. Therefore $\{X=Y\}\in\mathcal F$ and
\begin{align*}
\mathbb P(X=Y)=m.
\end{align*}
Using $m=1-\|\mu-\nu\|_{\mathrm{TV}}$, we obtain
\begin{align*}
\mathbb P(X\neq Y)=1-m=\|\mu-\nu\|_{\mathrm{TV}}.
\end{align*}
Thus the constructed coupling is maximal.
[/step]