[proofplan]
The proof is a direct comparison of probabilities of measurable sets. For a fixed measurable set $A$, the indicators $\mathbb{1}_A(X)$ and $\mathbb{1}_A(Y)$ can differ only on the event $\{X\ne Y\}$, and their difference has absolute value at most $1$. Taking the supremum over $A$ gives the total variation estimate. The Markov-chain estimate then follows by applying the first estimate to the coupled pair $(X_k,Y_k)$ and using faithfulness to identify $\{X_k\ne Y_k\}$ as a subset of the event that the two chains have not yet coupled by time $k$.
[/proofplan]
[step:Bound the discrepancy on a fixed measurable set by the disagreement event]
Fix $A\in\mathcal E$. Let $\mathbb{1}_A:E\to\{0,1\}$ denote the indicator map of $A$. The bounded real-valued [random variable](/page/Random%20Variable) $Z_A:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ defined by $Z_A:=\mathbb{1}_A(X)-\mathbb{1}_A(Y)$ has expectation $\mathbb E[Z_A]=\int_\Omega Z_A\,d\mathbb P$. Since $X$ and $Y$ have laws $\mu$ and $\nu$,
\begin{align*}
\mu(A)-\nu(A)=\mathbb E[\mathbb{1}_A(X)-\mathbb{1}_A(Y)].
\end{align*}
The event $\{X=Y\}$ is measurable because $\{X=Y\}=(X,Y)^{-1}(\Delta_E)$ and $\Delta_E\in\mathcal E\otimes\mathcal E$. On $\{X=Y\}$ the two indicators agree, while on $\{X\ne Y\}$ their difference has absolute value at most $1$. Therefore,
\begin{align*}
|\mathbb{1}_A(X)-\mathbb{1}_A(Y)|\leq \mathbb{1}_{\{X\ne Y\}}.
\end{align*}
Taking absolute values in the expectation and using monotonicity of expectation gives
\begin{align*}
|\mu(A)-\nu(A)|\leq \mathbb E[\mathbb{1}_{\{X\ne Y\}}]=\mathbb P(X\ne Y).
\end{align*}
[guided]
Fix a measurable set $A\in\mathcal E$. The total variation distance is obtained by taking the largest possible difference between $\mu(A)$ and $\nu(A)$, so we first estimate this difference for one arbitrary $A$. Let $\mathbb{1}_A:E\to\{0,1\}$ be the indicator map of $A$. Because $\mu=\mathbb P\circ X^{-1}$ and $\nu=\mathbb P\circ Y^{-1}$, the probabilities of $A$ under the two laws are
\begin{align*}
\mu(A)=\mathbb P(X\in A)
\end{align*}
and
\begin{align*}
\nu(A)=\mathbb P(Y\in A).
\end{align*}
Equivalently,
\begin{align*}
\mu(A)-\nu(A)=\mathbb E[\mathbb{1}_A(X)-\mathbb{1}_A(Y)].
\end{align*}
The point of introducing the indicators is that they agree whenever the two random variables take the same value. The event $\{X=Y\}$ is measurable: indeed, the map $(X,Y):(\Omega,\mathcal F)\to(E\times E,\mathcal E\otimes\mathcal E)$ is measurable, and
\begin{align*}
\{X=Y\}=(X,Y)^{-1}(\Delta_E).
\end{align*}
Since $\Delta_E\in\mathcal E\otimes\mathcal E$, this preimage belongs to $\mathcal F$.
Now consider an outcome $\omega\in\Omega$. If $X(\omega)=Y(\omega)$, then $\mathbb{1}_A(X(\omega))=\mathbb{1}_A(Y(\omega))$, so the difference is $0$. If $X(\omega)\ne Y(\omega)$, the two indicators are each either $0$ or $1$, so their difference has absolute value at most $1$. Thus, pointwise on $\Omega$,
\begin{align*}
|\mathbb{1}_A(X)-\mathbb{1}_A(Y)|\leq \mathbb{1}_{\{X\ne Y\}}.
\end{align*}
Taking absolute values in the expectation and using the elementary inequality $|\mathbb E[Z]|\leq \mathbb E[|Z|]$ for the bounded real-valued random variable $Z:=\mathbb{1}_A(X)-\mathbb{1}_A(Y)$, we obtain
\begin{align*}
|\mu(A)-\nu(A)|\leq \mathbb E[|\mathbb{1}_A(X)-\mathbb{1}_A(Y)|].
\end{align*}
Using the pointwise bound and monotonicity of expectation,
\begin{align*}
|\mu(A)-\nu(A)|\leq \mathbb E[\mathbb{1}_{\{X\ne Y\}}]=\mathbb P(X\ne Y).
\end{align*}
This is the desired estimate for the fixed set $A$.
[/guided]
[/step]
[step:Take the supremum over measurable sets to obtain total variation]
The preceding step proves that for every $A\in\mathcal E$,
\begin{align*}
|\mu(A)-\nu(A)|\leq \mathbb P(X\ne Y).
\end{align*}
Taking the supremum over all $A\in\mathcal E$ and using the stated convention for total variation gives
\begin{align*}
\|\mu-\nu\|_{\mathrm{TV}}=\sup_{A\in\mathcal E}|\mu(A)-\nu(A)|\leq \mathbb P(X\ne Y).
\end{align*}
[/step]
[step:Apply the coupling bound to the two coordinate chains at time $k$]
Fix $x,y\in E$ and $k\geq 0$. Under $\mathbb P_{x,y}$, the random variables $X_k:(\Omega_{x,y},\mathcal F_{x,y})\to(E,\mathcal E)$ and $Y_k:(\Omega_{x,y},\mathcal F_{x,y})\to(E,\mathcal E)$ have laws $P^k(x,\cdot)$ and $P^k(y,\cdot)$, respectively, because the coupling has $P$-Markov coordinate chains started from $x$ and $y$. Applying the first part of the theorem to $X_k$ and $Y_k$ under $\mathbb P_{x,y}$ gives
\begin{align*}
\|P^k(x,\cdot)-P^k(y,\cdot)\|_{\mathrm{TV}}\leq \mathbb P_{x,y}(X_k\ne Y_k).
\end{align*}
[/step]
[step:Use faithfulness to replace disagreement at time $k$ by noncoupling by time $k$]
By definition,
\begin{align*}
\tau=\inf\{m\geq 0:X_m=Y_m\}.
\end{align*}
Faithfulness says that once the chains meet, they remain together: if $\tau\leq k$, then $X_k=Y_k$. Equivalently,
\begin{align*}
\{X_k\ne Y_k\}\subseteq \{\tau>k\}.
\end{align*}
Therefore,
\begin{align*}
\mathbb P_{x,y}(X_k\ne Y_k)\leq \mathbb P_{x,y}(\tau>k).
\end{align*}
Combining this with the estimate from the previous step yields
\begin{align*}
\|P^k(x,\cdot)-P^k(y,\cdot)\|_{\mathrm{TV}}\leq \mathbb P_{x,y}(\tau>k).
\end{align*}
This is the desired Markov-chain coupling inequality.
[/step]