Conditional Exchangeability Identification Formula

Conditional Exchangeability Identification Formula (Theorem # 9657)

Theorem

Proof

[proofplan] We prove the formula at the level of conditional kernels rather than separately for each measurable set. First we choose regular conditional distributions for $Y_x$ given $L$, for $X$ given $L$, and for the joint events $\{Y\in A,X=x\}$, $A\in\mathcal S$, given $L$. Consistency identifies the joint events $\{Y\in A,X=x\}$ and $\{Y_x\in A,X=x\}$, while conditional exchangeability factors the conditional joint law of $(Y_x,X)$ given $L$. Positivity then permits division by the conditional treatment probability on a set of full $\mathbb P_L$-measure, and integrating over $\mathbb P_L$ gives the asserted identification formula. [/proofplan]

[step:Choose conditional kernels and fix a common measurable version] Let $\mu:=\mathbb P_L$ denote the law of $L$ on $(E,\mathcal E)$. Since $E$ and $S$ are standard Borel spaces, regular conditional distributions exist. Choose a probability kernel \begin{align*} N_x:E\times\mathcal S\to[0,1] \end{align*} such that, for every $A\in\mathcal S$, \begin{align*} N_x(L,A)=\mathbb P(Y_x\in A\mid L) \quad \mathbb P\text{-a.s.} \end{align*} Choose a [measurable function](/page/Measurable%20Function) \begin{align*} q_x:E\to[0,1] \end{align*} such that \begin{align*} q_x(L)=\mathbb P(X=x\mid L) \quad \mathbb P\text{-a.s.} \end{align*} Finally, choose a sub-probability kernel \begin{align*} H_x:E\times\mathcal S\to[0,1] \end{align*} such that, for every $A\in\mathcal S$, \begin{align*} H_x(L,A)=\mathbb P(Y\in A,\ X=x\mid L) \quad \mathbb P\text{-a.s.} \end{align*}

Since $H_x(L,S)=\mathbb P(X=x\mid L)=q_x(L)$ $\mathbb P$-a.s., the set $\{l\in E:H_x(l,S)\neq q_x(l)\}$ is $\mu$-null. After modifying $H_x$ on this set, we may assume that $H_x(l,\cdot)$ is a finite measure with $H_x(l,S)=q_x(l)$ for every $l\in E$. Because $S$ is standard Borel, its $\sigma$-algebra $\mathcal S$ is generated by a countable class that contains $S$ and is closed under finite intersections. Every identity between kernels below may therefore first be verified on that countable class outside a single $\mu$-null set and then extended to all $A\in\mathcal S$ by the [monotone class theorem](/theorems/4925). 
We use this convention once and for all, so the exceptional null sets do not depend on $A$. [/step]

[step:Use consistency to replace observed outcomes by potential outcomes on the treated stratum] For $A\in\mathcal S$, define the events \begin{align*} B_A:=\{\omega\in\Omega:Y(\omega)\in A,\ X(\omega)=x\} \end{align*} and \begin{align*} C_A:=\{\omega\in\Omega:Y_x(\omega)\in A,\ X(\omega)=x\}. \end{align*} Both events belong to $\mathcal F$ because $Y$, $Y_x$, and $X$ are measurable. By consistency, $Y=Y_x$ $\mathbb P$-a.s. on $\{X=x\}$, hence $\mathbb P(B_A\triangle C_A)=0$. Therefore their conditional probabilities given $L$ agree: \begin{align*} \mathbb P(B_A\mid L)=\mathbb P(C_A\mid L) \quad \mathbb P\text{-a.s.} \end{align*} Since $\mathbb P(B_A\mid L)=H_x(L,A)$ $\mathbb P$-a.s. by the choice of $H_x$, the common-version convention from the previous step gives, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$, \begin{align*} H_x(l,A)=\mathbb P(Y_x\in A,\ X=x\mid L=l). \end{align*}

[guided] The role of consistency is exactly to connect the observed outcome $Y$ to the counterfactual outcome $Y_x$ inside the stratum in which the observed treatment is $x$. For a measurable outcome set $A\in\mathcal S$, consider \begin{align*} B_A:=\{\omega\in\Omega:Y(\omega)\in A,\ X(\omega)=x\} \end{align*} and \begin{align*} C_A:=\{\omega\in\Omega:Y_x(\omega)\in A,\ X(\omega)=x\}. \end{align*} These are measurable events because $Y:(\Omega,\mathcal F)\to(S,\mathcal S)$, $Y_x:(\Omega,\mathcal F)\to(S,\mathcal S)$, and $X:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})$ are measurable maps. Consistency says that $Y=Y_x$ except on a $\mathbb P$-null subset of $\{X=x\}$. Hence the two events $B_A$ and $C_A$ can differ only on that null subset. In symbols, \begin{align*} \mathbb P(B_A\triangle C_A)=0. \end{align*} [Conditional expectation](/page/Conditional%20Expectation) respects almost sure equality of indicators, so \begin{align*} \mathbb P(B_A\mid L)=\mathbb P(C_A\mid L) \quad \mathbb P\text{-a.s.} \end{align*} The left-hand side is represented by the kernel $H_x(L,A)$ by definition of $H_x$. Therefore, after the common-version adjustment using the countable determining class for $\mathcal S$, we have, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$, \begin{align*} H_x(l,A)=\mathbb P(Y_x\in A,\ X=x\mid L=l). \end{align*} This is the point where the observational law first becomes connected to the potential-outcome law. [/guided] [/step]

[step:Factor the conditional joint law using conditional exchangeability] Conditional exchangeability states that $(Y_0,Y_1)$ and $X$ are conditionally independent given $L$. Since $Y_x$ is a measurable coordinate function of the pair $(Y_0,Y_1)$, it follows that $Y_x$ and $X$ are conditionally independent given $L$. Thus, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$, \begin{align*} \mathbb P(Y_x\in A,\ X=x\mid L=l)=N_x(l,A)q_x(l). \end{align*} Combining this factorization with the previous step gives, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$, \begin{align*} H_x(l,A)=N_x(l,A)q_x(l). \end{align*} [/step]

[step:Define the conditional outcome kernel on the positive treatment support] Fix a probability measure $\nu_0$ on $(S,\mathcal S)$. Define \begin{align*} K_x:E\times\mathcal S\to[0,1] \end{align*} by \begin{align*} K_x(l,A)=\frac{H_x(l,A)}{q_x(l)} \quad \text{if } q_x(l)>0 \end{align*} and \begin{align*} K_x(l,A)=\nu_0(A) \quad \text{if } q_x(l)=0. \end{align*} For each $A\in\mathcal S$, the map $l\mapsto K_x(l,A)$ is $\mathcal E$-measurable because $H_x(\cdot,A)$ and $q_x$ are measurable and $\{q_x>0\}\in\mathcal E$. For each $l$ with $q_x(l)>0$, $K_x(l,\cdot)$ is a probability measure since $H_x(l,\cdot)$ is a finite measure and $H_x(l,S)=q_x(l)$. For $l$ with $q_x(l)=0$, $K_x(l,\cdot)=\nu_0$ is a probability measure. 
Hence $K_x$ is a probability kernel from $E$ to $S$. Moreover, for every $A\in\mathcal S$ and every $l\in E$, \begin{align*} H_x(l,A)=K_x(l,A)q_x(l). \end{align*} If $q_x(l)>0$, this is the definition of $K_x$. If $q_x(l)=0$, both sides vanish because $0\le H_x(l,A)\le H_x(l,S)=q_x(l)=0$. This is exactly the disintegration identity for a regular conditional distribution of $Y$ given the stratum $(X=x,L=l)$, because $H_x(l,A)$ is the [conditional probability](/page/Conditional%20Probability) of $\{Y\in A,X=x\}$ given $L=l$ and $q_x(l)$ is the conditional probability of $\{X=x\}$ given $L=l$. [/step]

[step:Divide by positivity and integrate over the covariate law] By positivity, $q_x(l)>0$ for $\mu$-a.e. $l\in E$. Intersecting this set with the full $\mu$-measure set on which the exchangeability factorization $H_x=N_xq_x$ holds, we obtain a set $E_0\in\mathcal E$ with $\mu(E_0)=1$. On $E_0$, for every $A\in\mathcal S$, the identities \begin{align*} H_x(l,A)=N_x(l,A)q_x(l) \end{align*} and \begin{align*} H_x(l,A)=K_x(l,A)q_x(l) \end{align*} both hold with $q_x(l)>0$, so we may divide by $q_x(l)$. Therefore, for every $l\in E_0$ and every $A\in\mathcal S$, \begin{align*} K_x(l,A)=N_x(l,A). \end{align*} Since $N_x$ is a regular conditional distribution of $Y_x$ given $L$, the [law of total probability](/theorems/1113) for regular conditional distributions gives \begin{align*} \mathbb P(Y_x\in A)=\mathbb E[N_x(L,A)]=\int_E N_x(l,A)\,d\mu(l). \end{align*} Substituting $K_x(l,A)=N_x(l,A)$ $\mu$-a.e. yields \begin{align*} \mathbb P(Y_x\in A)=\int_E K_x(l,A)\,d\mu(l). \end{align*} Since $\mu=\mathbb P_L$, this is \begin{align*} \mathbb P(Y_x\in A)=\int_E K_x(l,A)\,d\mathbb P_L(l). \end{align*} Finally, by the disintegration identity $H_x=K_xq_x$ and because $q_x(L)>0$ $\mathbb P$-a.s., $K_x(L,A)$ is a version of $\mathbb P(Y\in A\mid X=x,L)$. The last integral is therefore equivalently \begin{align*} \int_E K_x(l,A)\,d\mathbb P_L(l)=\mathbb E[K_x(L,A)]=\mathbb E[\mathbb P(Y\in A\mid X=x,L)]. \end{align*} This proves the asserted identification formula for every $A\in\mathcal S$. [/step]
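The identification formula proved above can be checked numerically on a small discrete model, where every integral becomes a finite sum. The sketch below is a hypothetical example, not part of the theorem. It assumes that $L$ takes three values, that treatment and outcomes are binary, and that all probabilities are arbitrary illustrative choices. The code builds the exact joint law of $(L,X,Y_0,Y_1)$ with conditional exchangeability, consistency, and positivity imposed by construction. It then compares $\mathbb P(Y_x=1)$ with $\sum_l K_x(l,\{1\})\,\mathbb P_L(l)$, which it computes from the observed triple $(L,X,Y)$ alone. The function names (`joint`, `observed_y`, `counterfactual_prob`, `g_formula`, `naive`) and the default fallback `nu0` are illustrative. Exact rational arithmetic from Python's `fractions` module is used so that the two sides can be compared with `==`.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical discrete model (illustrative numbers, not from the theorem):
# covariate L in {0, 1, 2}, binary treatment X, binary potential outcomes (Y_0, Y_1).
p_L = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}     # law mu = P_L
p_treat = {0: F(1, 5), 1: F(1, 2), 2: F(4, 5)}  # P(X = 1 | L = l); all in (0, 1): positivity
m = {                                           # m[x][l] = P(Y_x = 1 | L = l)
    0: {0: F(1, 10), 1: F(3, 10), 2: F(1, 2)},
    1: {0: F(2, 5), 1: F(3, 5), 2: F(9, 10)},
}


def bern(p, v):
    """P(V = v) for V ~ Bernoulli(p)."""
    return p if v == 1 else 1 - p


def joint():
    """Exact law of (L, X, Y_0, Y_1) as a dict {(l, x, y0, y1): probability}.

    X is drawn independently of (Y_0, Y_1) given L, so conditional exchangeability
    holds by construction. Y_0 and Y_1 are also independent given L (a simplification).
    """
    return {
        (l, x, y0, y1): p_L[l] * bern(p_treat[l], x) * bern(m[0][l], y0) * bern(m[1][l], y1)
        for l, x, y0, y1 in product(p_L, (0, 1), (0, 1), (0, 1))
    }


def observed_y(x, y0, y1):
    """Consistency: the observed outcome is Y = Y_X."""
    return y1 if x == 1 else y0


def counterfactual_prob(law, x):
    """P(Y_x = 1), read off the potential outcomes directly."""
    return sum(p for (l, xx, y0, y1), p in law.items() if (y0, y1)[x] == 1)


def g_formula(law, x, nu0=F(1, 2)):
    """sum_l K_x(l, {1}) P_L(l), using only the observed triple (L, X, Y)."""
    total = F(0)
    for l in p_L:
        cell = {k: p for k, p in law.items() if k[0] == l}
        q = sum(p for k, p in cell.items() if k[1] == x) / p_L[l]    # q_x(l)
        h = sum(p for k, p in cell.items()
                if k[1] == x and observed_y(*k[1:]) == 1) / p_L[l]  # H_x(l, {1})
        k_x = h / q if q > 0 else nu0                                # K_x(l, {1}); nu_0 fallback
        total += k_x * p_L[l]
    return total


def naive(law, x):
    """P(Y = 1 | X = x) without adjusting for L."""
    num = sum(p for k, p in law.items() if k[1] == x and observed_y(*k[1:]) == 1)
    den = sum(p for k, p in law.items() if k[1] == x)
    return num / den


if __name__ == "__main__":
    law = joint()
    for x in (0, 1):
        print(x, counterfactual_prob(law, x), g_formula(law, x), naive(law, x))
    # prints:
    # 0 6/25 6/25 21/118
    # 1 14/25 14/25 137/205
```

With these numbers, the unadjusted $\mathbb P(Y=1\mid X=x)$ differs from $\mathbb P(Y_x=1)$ because the treatment probability varies with $L$. The standardized sum recovers $\mathbb P(Y_x=1)$ exactly. If an entry of `p_treat` is set to 0, positivity fails for the treated arm, and the returned value then depends on the arbitrary fallback `nu0`. This mirrors the role of $\nu_0$ in the proof.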
