Point-Treatment Adjustment Formula under Conditional Exchangeability

Point-Treatment Adjustment Formula under Conditional Exchangeability (Theorem # 9671)

Proof

[proofplan] We first prove the distributional identity for a fixed Borel set $B$ by comparing three conditional probabilities on the stratum $\{A=a\}$. Conditional exchangeability identifies the conditional law of $Y_a$ given $(A=a,L=l)$ with the conditional law of $Y_a$ given $L=l$, while consistency identifies the conditional law of $Y_a$ with that of $Y$ on the same stratum. Positivity is used to pass from identities holding under the joint law of $(A,L)$ to identities holding for $\mu_L$-almost every $l$ at the point $A=a$. The expectation formula follows by the same argument with conditional expectations in place of conditional probabilities. [/proofplan]

[step:Fix regular conditional versions and the positive treatment atom] Fix $B\in\mathcal B(\mathbb R)$. Define the indicator random variables \begin{align*} I_a:\Omega\to\{0,1\},\qquad I_a(\omega)=\mathbb 1_{\{Y_a\in B\}}(\omega), \end{align*} and \begin{align*} I:\Omega\to\{0,1\},\qquad I(\omega)=\mathbb 1_{\{Y\in B\}}(\omega). \end{align*} Let $p_a:\mathcal L\to[0,1]$ be a measurable version of $\mathbb P(A=a\mid L=l)$ satisfying $p_a(l)>0$ for $\mu_L$-almost every $l$. Because $\mathcal T$, $\mathcal L$, and $\mathbb R$ are standard Borel spaces, regular conditional laws exist. Choose measurable kernels \begin{align*} K_a:\mathcal L\times\mathcal B(\mathbb R)\to[0,1] \end{align*} and \begin{align*} H_a:(\mathcal T\times\mathcal L)\times\mathcal B(\mathbb R)\to[0,1] \end{align*} such that \begin{align*} K_a(l,B)=\mathbb P(Y_a\in B\mid L=l) \end{align*} for $\mu_L$-almost every $l$, and \begin{align*} H_a(t,l,B)=\mathbb P(Y_a\in B\mid A=t,L=l) \end{align*} for almost every $(t,l)$ with respect to the joint law of $(A,L)$. Also let \begin{align*} H:(\mathcal T\times\mathcal L)\times\mathcal B(\mathbb R)\to[0,1] \end{align*} be an arbitrary regular conditional version of the law of $Y$ given $(A,L)$.
[guided] We fix the Borel set $B$ because the distributional formula is asserted for each $B$ separately. The two random variables \begin{align*} I_a:\Omega\to\{0,1\},\qquad I_a(\omega)=\mathbb 1_{\{Y_a\in B\}}(\omega), \end{align*} and \begin{align*} I:\Omega\to\{0,1\},\qquad I(\omega)=\mathbb 1_{\{Y\in B\}}(\omega) \end{align*} are measurable because $Y_a$ and $Y$ are measurable maps into $(\mathbb R,\mathcal B(\mathbb R))$ and $B$ is Borel. The positivity hypothesis gives a measurable version \begin{align*} p_a:\mathcal L\to[0,1] \end{align*} of $\mathbb P(A=a\mid L=l)$ with $p_a(l)>0$ for $\mu_L$-almost every $l$. This function is the [conditional probability](/page/Conditional%20Probability) mass of the point treatment value $a$ inside the covariate stratum $L=l$. Since $\mathcal T$, $\mathcal L$, and $\mathbb R$ are standard Borel spaces, regular conditional laws exist. We choose \begin{align*} K_a:\mathcal L\times\mathcal B(\mathbb R)\to[0,1] \end{align*} so that $K_a(l,B)$ is a version of $\mathbb P(Y_a\in B\mid L=l)$, and choose \begin{align*} H_a:(\mathcal T\times\mathcal L)\times\mathcal B(\mathbb R)\to[0,1] \end{align*} so that $H_a(t,l,B)$ is a version of $\mathbb P(Y_a\in B\mid A=t,L=l)$. Finally, $H$ denotes an arbitrary regular conditional version of the law of $Y$ given $(A,L)$, as in the theorem. The point of naming these kernels is that version issues occur precisely at the point $(a,l)$, and the proof must show that positivity makes the kernels at that point identifiable for $\mu_L$-almost every $l$. [/guided] [/step]

[step:Use conditional exchangeability to identify the potential-outcome kernel on the stratum $A=a$] We claim that \begin{align*} H_a(a,l,B)=K_a(l,B) \end{align*} for $\mu_L$-almost every $l$. Let $D\in\mathcal G$. Since $H_a$ is a regular conditional version of the law of $Y_a$ given $(A,L)$, and since $\mathbb P(A=a,L\in D)=\int_D p_a(l)\,d\mu_L(l)$, \begin{align*} \mathbb E[I_a\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]=\int_D H_a(a,l,B)p_a(l)\,d\mu_L(l). \end{align*} Conditional exchangeability $Y_a\perp\!\!\!\perp A\mid L$ gives \begin{align*} \mathbb E[I_a\mathbb 1_{\{A=a\}}\mid L]=\mathbb E[I_a\mid L]\mathbb P(A=a\mid L) \end{align*} a.s. Multiplying by $\mathbb 1_{\{L\in D\}}$ and taking expectations, we obtain \begin{align*} \mathbb E[I_a\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]=\int_D K_a(l,B)p_a(l)\,d\mu_L(l). \end{align*} Since this holds for every $D\in\mathcal G$, \begin{align*} H_a(a,l,B)p_a(l)=K_a(l,B)p_a(l) \end{align*} for $\mu_L$-almost every $l$. Positivity gives $p_a(l)>0$ for $\mu_L$-almost every $l$, so cancellation yields \begin{align*} H_a(a,l,B)=K_a(l,B) \end{align*} for $\mu_L$-almost every $l$. [/step]

[step:Use consistency to replace the potential outcome by the observed outcome on $A=a$] We claim that \begin{align*} H(a,l,B)=H_a(a,l,B) \end{align*} for $\mu_L$-almost every $l$. Let $D\in\mathcal G$. Consistency gives $I=I_a$ a.s. on $\{A=a\}$, hence \begin{align*} \mathbb E[I\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]=\mathbb E[I_a\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]. \end{align*} Using the regular conditional kernels $H$ and $H_a$, this becomes \begin{align*} \int_D H(a,l,B)p_a(l)\,d\mu_L(l)=\int_D H_a(a,l,B)p_a(l)\,d\mu_L(l). \end{align*} Since $D\in\mathcal G$ was arbitrary, \begin{align*} H(a,l,B)p_a(l)=H_a(a,l,B)p_a(l) \end{align*} for $\mu_L$-almost every $l$. Positivity again permits cancellation of $p_a(l)$, and therefore \begin{align*} H(a,l,B)=H_a(a,l,B) \end{align*} for $\mu_L$-almost every $l$. [/step]

[step:Integrate over the marginal law of $L$ to obtain the distributional formula] By the tower property for [conditional expectation](/page/Conditional%20Expectation), \begin{align*} \mathbb P(Y_a\in B)=\mathbb E[I_a]=\int_{\mathcal L}K_a(l,B)\,d\mu_L(l). \end{align*} The preceding two steps give \begin{align*} K_a(l,B)=H(a,l,B) \end{align*} for $\mu_L$-almost every $l$. Substitution into the preceding integral gives \begin{align*} \mathbb P(Y_a\in B)=\int_{\mathcal L}H(a,l,B)\,d\mu_L(l). \end{align*} Since $H(a,l,B)$ is the chosen regular conditional version of $\mathbb P(Y\in B\mid A=a,L=l)$, this is exactly \begin{align*} \mathbb P(Y_a\in B)=\int_{\mathcal L}\mathbb P(Y\in B\mid A=a,L=l)\,d\mu_L(l). \end{align*} The argument used only that $H$ was a regular conditional version of the law of $Y$ given $(A,L)$. Thus every such version gives the same integral value under the stated positivity assumption. [/step]

[step:Repeat the argument with integrable outcomes to identify the mean] Assume now that $Y$ and $Y_a$ are integrable. Let \begin{align*} m_a:\mathcal L\to\mathbb R \end{align*} be a version of $\mathbb E[Y_a\mid L=l]$, let \begin{align*} r_a:\mathcal T\times\mathcal L\to\mathbb R \end{align*} be a version of $\mathbb E[Y_a\mid A=t,L=l]$, and let \begin{align*} r:\mathcal T\times\mathcal L\to\mathbb R \end{align*} be a version of $\mathbb E[Y\mid A=t,L=l]$. Conditional exchangeability gives \begin{align*} \mathbb E[Y_a\mathbb 1_{\{A=a\}}\mid L]=\mathbb E[Y_a\mid L]\mathbb P(A=a\mid L) \end{align*} a.s. Therefore, for every $D\in\mathcal G$, \begin{align*} \int_D r_a(a,l)p_a(l)\,d\mu_L(l)=\int_D m_a(l)p_a(l)\,d\mu_L(l). \end{align*} By positivity, \begin{align*} r_a(a,l)=m_a(l) \end{align*} for $\mu_L$-almost every $l$. Consistency gives $Y=Y_a$ a.s. on $\{A=a\}$, and hence, for every $D\in\mathcal G$, \begin{align*} \int_D r(a,l)p_a(l)\,d\mu_L(l)=\int_D r_a(a,l)p_a(l)\,d\mu_L(l). \end{align*} Again positivity yields \begin{align*} r(a,l)=r_a(a,l) \end{align*} for $\mu_L$-almost every $l$. Combining the two identities gives \begin{align*} r(a,l)=m_a(l) \end{align*} for $\mu_L$-almost every $l$. Finally, the tower property gives \begin{align*} \mathbb E[Y_a]=\int_{\mathcal L}m_a(l)\,d\mu_L(l). \end{align*} Substituting $m_a(l)=r(a,l)$ $\mu_L$-almost everywhere gives \begin{align*} \mathbb E[Y_a]=\int_{\mathcal L}r(a,l)\,d\mu_L(l). \end{align*} Equivalently, \begin{align*} \mathbb E[Y_a]=\mathbb E[\mathbb E[Y\mid A=a,L]], \end{align*} where the inner conditional expectation is read as $r(a,L)$. This completes the proof. [/step]
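
The mean formula can be checked numerically on a finite example. The following is a minimal sketch, not part of the proof: it assumes a hypothetical model with binary $L$, $A$, and $Y_a$, and every probability in it is an illustrative choice. The observed joint law of $(L,A,Y)$ is built using conditional exchangeability and consistency, and the adjusted mean $\int r(a,l)\,d\mu_L(l)$ is then computed from that observed law alone. The names `observed_joint`, `adjusted_mean`, `naive_mean`, and `true_mean` are introduced here for illustration only.

```python
from itertools import product

# Hypothetical toy model; every number below is an illustrative assumption.
P_L = {0: 0.6, 1: 0.4}            # mu_L, the marginal law of L
P_A1_GIVEN_L = {0: 0.2, 1: 0.7}   # P(A = 1 | L = l); strictly inside (0, 1), so positivity holds
Q = {0: {0: 0.1, 1: 0.5},         # Q[a][l] = P(Y_a = 1 | L = l)
     1: {0: 0.3, 1: 0.8}}


def observed_joint():
    """Joint law of the observed (L, A, Y).

    Uses exchangeability (Y_a independent of A given L) and consistency
    (Y = Y_a on {A = a}), so P(Y = y | A = a, L = l) = P(Y_a = y | L = l).
    """
    joint = {}
    for l, a, y in product((0, 1), repeat=3):
        p_a = P_A1_GIVEN_L[l] if a == 1 else 1.0 - P_A1_GIVEN_L[l]
        p_y = Q[a][l] if y == 1 else 1.0 - Q[a][l]
        joint[(l, a, y)] = P_L[l] * p_a * p_y
    return joint


def adjusted_mean(joint, a):
    """Sum over l of E[Y | A = a, L = l] * P(L = l), from the observed law only."""
    total = 0.0
    for l in (0, 1):
        p_al = joint[(l, a, 0)] + joint[(l, a, 1)]   # P(A = a, L = l)
        if p_al <= 0.0:
            raise ValueError("positivity fails in stratum L = %d" % l)
        r = joint[(l, a, 1)] / p_al                  # r(a, l) = E[Y | A = a, L = l]
        mu_l = sum(joint[(l, t, y)] for t in (0, 1) for y in (0, 1))
        total += r * mu_l
    return total


def naive_mean(joint, a):
    """Unadjusted E[Y | A = a], which ignores the confounder L."""
    p_a = sum(joint[(l, a, y)] for l in (0, 1) for y in (0, 1))
    return sum(joint[(l, a, 1)] for l in (0, 1)) / p_a


def true_mean(a):
    """E[Y_a] computed directly from the potential-outcome model."""
    return sum(Q[a][l] * P_L[l] for l in (0, 1))


if __name__ == "__main__":
    joint = observed_joint()
    for a in (0, 1):
        print(a, true_mean(a), adjusted_mean(joint, a), naive_mean(joint, a))
```

In this toy model the adjusted mean agrees with $\mathbb E[Y_a]$ up to floating-point error for both treatment values, whereas the unadjusted conditional mean $\mathbb E[Y\mid A=a]$ does not, because $L$ affects both the treatment probability and the outcome.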
