Coupling Inequality for Total Variation Distance

Coupling Inequality for Total Variation Distance (Theorem # 7238)

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] The proof is a direct comparison of probabilities of measurable sets. For a fixed measurable set $A$, the indicators $\mathbb{1}_A(X)$ and $\mathbb{1}_A(Y)$ can differ only on the event $\{X\ne Y\}$, and their difference has absolute value at most $1$. Taking the supremum over $A$ gives the total variation estimate. The Markov-chain estimate then follows by applying the first estimate to the coupled pair $(X_k,Y_k)$ and using faithfulness to identify $\{X_k\ne Y_k\}$ as a subset of the event that the two chains have not yet coupled by time $k$. [/proofplan] [step:Bound the discrepancy on a fixed measurable set by the disagreement event] Fix $A\in\mathcal E$. Let $\mathbb{1}_A:E\to\{0,1\}$ denote the indicator map of $A$. The bounded real-valued [random variable](/page/Random%20Variable) $Z_A:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ defined by $Z_A:=\mathbb{1}_A(X)-\mathbb{1}_A(Y)$ has expectation $\mathbb E[Z_A]=\int_\Omega Z_A\,d\mathbb P$. Since $X$ and $Y$ have laws $\mu$ and $\nu$, \begin{align*} \mu(A)-\nu(A)=\mathbb E[\mathbb{1}_A(X)-\mathbb{1}_A(Y)]. \end{align*} The event $\{X=Y\}$ is measurable because $\{X=Y\}=(X,Y)^{-1}(\Delta_E)$ and $\Delta_E\in\mathcal E\otimes\mathcal E$. On $\{X=Y\}$ the two indicators agree, while on $\{X\ne Y\}$ their difference has absolute value at most $1$. Therefore, \begin{align*} |\mathbb{1}_A(X)-\mathbb{1}_A(Y)|\leq \mathbb{1}_{\{X\ne Y\}}. \end{align*} Taking absolute values in the expectation and using monotonicity of expectation gives \begin{align*} |\mu(A)-\nu(A)|\leq \mathbb E[\mathbb{1}_{\{X\ne Y\}}]=\mathbb P(X\ne Y). \end{align*} [guided] Fix a measurable set $A\in\mathcal E$. The total variation distance is obtained by taking the largest possible difference between $\mu(A)$ and $\nu(A)$, so we first estimate this difference for one arbitrary $A$. Let $\mathbb{1}_A:E\to\{0,1\}$ be the indicator map of $A$. Because $\mu=\mathbb P\circ X^{-1}$ and $\nu=\mathbb P\circ Y^{-1}$, the probabilities of $A$ under the two laws are \begin{align*} \mu(A)=\mathbb P(X\in A) \end{align*} and \begin{align*} \nu(A)=\mathbb P(Y\in A). \end{align*} Equivalently, \begin{align*} \mu(A)-\nu(A)=\mathbb E[\mathbb{1}_A(X)-\mathbb{1}_A(Y)]. \end{align*} The point of introducing the indicators is that they agree whenever the two random variables take the same value. The event $\{X=Y\}$ is measurable: indeed, the map $(X,Y):(\Omega,\mathcal F)\to(E\times E,\mathcal E\otimes\mathcal E)$ is measurable, and \begin{align*} \{X=Y\}=(X,Y)^{-1}(\Delta_E). \end{align*} Since $\Delta_E\in\mathcal E\otimes\mathcal E$, this preimage belongs to $\mathcal F$. Now consider an outcome $\omega\in\Omega$. If $X(\omega)=Y(\omega)$, then $\mathbb{1}_A(X(\omega))=\mathbb{1}_A(Y(\omega))$, so the difference is $0$. If $X(\omega)\ne Y(\omega)$, the two indicators are each either $0$ or $1$, so their difference has absolute value at most $1$. Thus, pointwise on $\Omega$, \begin{align*} |\mathbb{1}_A(X)-\mathbb{1}_A(Y)|\leq \mathbb{1}_{\{X\ne Y\}}. \end{align*} Taking absolute values in the expectation and using the elementary inequality $|\mathbb E[Z]|\leq \mathbb E[|Z|]$ for the bounded real-valued random variable $Z:=\mathbb{1}_A(X)-\mathbb{1}_A(Y)$, we obtain \begin{align*} |\mu(A)-\nu(A)|\leq \mathbb E[|\mathbb{1}_A(X)-\mathbb{1}_A(Y)|]. \end{align*} Using the pointwise bound and monotonicity of expectation, \begin{align*} |\mu(A)-\nu(A)|\leq \mathbb E[\mathbb{1}_{\{X\ne Y\}}]=\mathbb P(X\ne Y). \end{align*} This is the desired estimate for the fixed set $A$. [/guided] [/step] [step:Take the supremum over measurable sets to obtain total variation] The preceding step proves that for every $A\in\mathcal E$, \begin{align*} |\mu(A)-\nu(A)|\leq \mathbb P(X\ne Y). \end{align*} Taking the supremum over all $A\in\mathcal E$ and using the stated convention for total variation gives \begin{align*} \|\mu-\nu\|_{\mathrm{TV}}=\sup_{A\in\mathcal E}|\mu(A)-\nu(A)|\leq \mathbb P(X\ne Y). \end{align*} [/step] [step:Apply the coupling bound to the two coordinate chains at time $k$] Fix $x,y\in E$ and $k\geq 0$. Under $\mathbb P_{x,y}$, the random variables $X_k:(\Omega_{x,y},\mathcal F_{x,y})\to(E,\mathcal E)$ and $Y_k:(\Omega_{x,y},\mathcal F_{x,y})\to(E,\mathcal E)$ have laws $P^k(x,\cdot)$ and $P^k(y,\cdot)$, respectively, because the coupling has $P$-Markov coordinate chains started from $x$ and $y$. Applying the first part of the theorem to $X_k$ and $Y_k$ under $\mathbb P_{x,y}$ gives \begin{align*} \|P^k(x,\cdot)-P^k(y,\cdot)\|_{\mathrm{TV}}\leq \mathbb P_{x,y}(X_k\ne Y_k). \end{align*} [/step] [step:Use faithfulness to replace disagreement at time $k$ by noncoupling by time $k$] By definition, \begin{align*} \tau=\inf\{m\geq 0:X_m=Y_m\}. \end{align*} Faithfulness says that once the chains meet, they remain together: if $\tau\leq k$, then $X_k=Y_k$. Equivalently, \begin{align*} \{X_k\ne Y_k\}\subseteq \{\tau>k\}. \end{align*} Therefore, \begin{align*} \mathbb P_{x,y}(X_k\ne Y_k)\leq \mathbb P_{x,y}(\tau>k). \end{align*} Combining this with the estimate from the previous step yields \begin{align*} \|P^k(x,\cdot)-P^k(y,\cdot)\|_{\mathrm{TV}}\leq \mathbb P_{x,y}(\tau>k). \end{align*} This is the desired Markov-chain coupling inequality. [/step]

Explore Further

Transitivity of Polynomial-Time Many-One Reductions applied Deterministic Separation Principle applied Weierstrass Theorem for Proper Lower Semicontinuous Coercive Functions applied Closure of Deterministic Space Under Complement applied Existence and Compactness of Minimizers for Level-Bounded Convex Programs applied Finite-Dimensional Spectral Decomposition applied Rotated Second-Order Cone Reformulation of a Convex Quadratic Bound applied Moreau Decomposition Theorem for Closed Convex Cones applied

What brings you to Androma?

Start with a route through the knowledge graph.