Identification of the Average Treatment Effect Under Strong Ignorability (Theorem # 9659)

Proof

[proofplan]
We first make the conditional mean on each treatment stratum precise as a $\sigma(X)$-measurable [random variable](/page/Random%20Variable). Consistency replaces the observed outcome $Y$ by the corresponding potential outcome $Y(t)$ on the event $\{T=t\}$. Conditional exchangeability then removes the treatment indicator from the conditional mean, and positivity lets us cancel the conditional treatment probability. Applying this for $t=1$ and $t=0$, subtracting, and using the defining property of [conditional expectation](/page/Conditional%20Expectation) gives the average treatment effect.
[/proofplan]

[step:Define the conditional treatment probabilities and potential-outcome regressions]
For each $t\in\{0,1\}$, define the conditional treatment probability
\begin{align*}
\pi_t:=\mathbb P(T=t\mid\sigma(X)).
\end{align*}
Thus $\pi_t:\Omega\to[0,1]$ is a $\sigma(X)$-measurable random variable satisfying
\begin{align*}
\int_B \pi_t\,d\mathbb P = \mathbb P(B\cap\{T=t\})
\end{align*}
for every $B\in\sigma(X)$. By positivity, $\pi_t>0$ $\mathbb P$-a.s. for each $t\in\{0,1\}$.

For each $t\in\{0,1\}$, define
\begin{align*}
m_t:=\mathbb E[Y(t)\mid\sigma(X)].
\end{align*}
Since $Y(t)\in L^1(\Omega,\mathcal F,\mathbb P)$, the conditional expectation $m_t$ exists, is integrable, and is $\sigma(X)$-measurable.
[/step]

[step:Show that the observed stratum mean equals the potential-outcome regression]
Fix $t\in\{0,1\}$. We prove that $m_t$ is a valid version of $\mathbb E[Y\mid T=t,X]$ in the sense of the statement.

Let $B\in\sigma(X)$. By the defining property of conditional expectation and the fact that $\mathbb 1_B\pi_t$ is bounded and $\sigma(X)$-measurable,
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P = \int_\Omega \mathbb 1_B\pi_tY(t)\,d\mathbb P.
\end{align*}
Conditional independence of $Y(t)$ and $T$ given $\sigma(X)$ implies
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)] = \pi_t m_t.
\end{align*}
Therefore,
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P = \int_B \mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]\,d\mathbb P = \int_{B\cap\{T=t\}}Y(t)\,d\mathbb P.
\end{align*}
By consistency, $Y=Y(t)$ $\mathbb P$-a.s. on $\{T=t\}$, so
\begin{align*}
\int_{B\cap\{T=t\}}Y(t)\,d\mathbb P = \int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
Combining these identities gives
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P = \int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
Thus $m_t$ satisfies the defining identity for $\mathbb E[Y\mid T=t,X]$.

[guided]
Fix $t\in\{0,1\}$. The goal is to identify the observed regression in the treatment stratum $T=t$. Because conditioning on the event $\{T=t\}$ together with $X$ can be undefined on covariate strata where treatment has zero probability, the statement defines $\mathbb E[Y\mid T=t,X]$ through the weighted identity
\begin{align*}
\int_B r_t\pi_t\,d\mathbb P = \int_{B\cap\{T=t\}}Y\,d\mathbb P
\end{align*}
for every $B\in\sigma(X)$, where $\pi_t=\mathbb P(T=t\mid\sigma(X))$. Positivity gives $\pi_t>0$ $\mathbb P$-a.s., so this identity determines the $\sigma(X)$-measurable regression uniquely up to $\mathbb P$-a.s. equality.

We claim that the potential-outcome regression
\begin{align*}
m_t:=\mathbb E[Y(t)\mid\sigma(X)]
\end{align*}
is such an $r_t$. Let $B\in\sigma(X)$. Since $m_t$ is the conditional expectation of $Y(t)$ given $\sigma(X)$, and since $\mathbb 1_B\pi_t$ is a bounded $\sigma(X)$-measurable random variable, the defining property of conditional expectation gives
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P = \int_\Omega \mathbb 1_B\pi_tY(t)\,d\mathbb P.
\end{align*}

Now we use conditional ignorability. Strong ignorability says that $Y(t)$ and the treatment indicator $T$ are conditionally independent given $\sigma(X)$. Hence the conditional expectation of the product $\mathbb 1_{\{T=t\}}Y(t)$ factors into the product of the conditional expectations:
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)] = \mathbb E[\mathbb 1_{\{T=t\}}\mid\sigma(X)]\,\mathbb E[Y(t)\mid\sigma(X)].
\end{align*}
By the definitions of $\pi_t$ and $m_t$, this becomes
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)] = \pi_t m_t.
\end{align*}
Integrating this identity over $B\in\sigma(X)$ and using the defining property of conditional expectation yields
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P = \int_B \mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]\,d\mathbb P = \int_{B\cap\{T=t\}}Y(t)\,d\mathbb P.
\end{align*}

Finally, consistency converts the potential outcome into the observed outcome on the treatment stratum. Since $Y=Y(t)$ $\mathbb P$-a.s. on $\{T=t\}$, we have
\begin{align*}
\int_{B\cap\{T=t\}}Y(t)\,d\mathbb P = \int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
Therefore
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P = \int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
This is exactly the defining identity for $\mathbb E[Y\mid T=t,X]$, so the observed stratum mean equals $\mathbb E[Y(t)\mid\sigma(X)]$.
[/guided]
[/step]

[step:Apply the stratum identity for treated and control units]
From the previous step, for $t=1$,
\begin{align*}
\mathbb E[Y\mid T=1,X] = \mathbb E[Y(1)\mid\sigma(X)]
\end{align*}
up to $\mathbb P$-a.s. equality. For $t=0$,
\begin{align*}
\mathbb E[Y\mid T=0,X] = \mathbb E[Y(0)\mid\sigma(X)]
\end{align*}
up to $\mathbb P$-a.s. equality. Since $Y(1)$ and $Y(0)$ are integrable, both conditional expectations are integrable, and linearity of conditional expectation gives
\begin{align*}
\mathbb E[Y(1)\mid\sigma(X)]-\mathbb E[Y(0)\mid\sigma(X)] = \mathbb E[Y(1)-Y(0)\mid\sigma(X)].
\end{align*}
Therefore
\begin{align*}
\mathbb E[Y\mid T=1,X]-\mathbb E[Y\mid T=0,X] = \mathbb E[Y(1)-Y(0)\mid\sigma(X)]
\end{align*}
$\mathbb P$-a.s.
[/step]

[step:Take expectations to recover the average treatment effect]
Taking expectations in the almost-sure identity from the previous step and using the defining property of conditional expectation with the set $\Omega\in\sigma(X)$, we obtain
\begin{align*}
\mathbb E\big[\mathbb E[Y\mid T=1,X]-\mathbb E[Y\mid T=0,X]\big] = \mathbb E\big[\mathbb E[Y(1)-Y(0)\mid\sigma(X)]\big] = \mathbb E[Y(1)-Y(0)].
\end{align*}
By definition,
\begin{align*}
\operatorname{ATE}:=\mathbb E[Y(1)-Y(0)].
\end{align*}
Hence
\begin{align*}
\operatorname{ATE} = \mathbb E\big[\mathbb E[Y\mid T=1,X]-\mathbb E[Y\mid T=0,X]\big].
\end{align*}
This is the claimed identification formula.
[/step]
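As a sanity check on the identification formula, the following minimal sketch evaluates both sides exactly on a small finite probability space. Everything in it is a hypothetical illustration rather than part of the theorem: the binary covariate `x`, the independent noise `u`, the treatment probabilities in `p_t1_given_x`, and the outcome function `potential_outcome` are all assumed values. Treatment depends on `x` only, so conditional independence of $Y(t)$ and $T$ given $X$ holds by construction, and both treatment probabilities are strictly positive in each stratum.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model; every number below is an illustrative assumption.
p_x = {0: F(1, 2), 1: F(1, 2)}            # P(X = x)
p_u = {0: F(1, 2), 1: F(1, 2)}            # P(U = u), noise independent of (X, T)
p_t1_given_x = {0: F(1, 5), 1: F(4, 5)}   # P(T = 1 | X = x), strictly inside (0, 1)


def potential_outcome(t, x, u):
    """Y(t) as a function of (x, u); it never looks at the realized treatment."""
    return F(1 + 2 * x + u) if t == 1 else F(x + u)


def joint():
    """Yield (probability, x, t, y1, y0, y_obs) for every sample point."""
    for x, u, t in product(p_x, p_u, (0, 1)):
        p_t = p_t1_given_x[x] if t == 1 else 1 - p_t1_given_x[x]
        y1 = potential_outcome(1, x, u)
        y0 = potential_outcome(0, x, u)
        y_obs = y1 if t == 1 else y0      # consistency: Y = Y(T)
        yield p_x[x] * p_u[u] * p_t, x, t, y1, y0, y_obs


def stratum_mean(t, x=None):
    """E[Y | T = t, X = x] from observables only; x=None conditions on T alone."""
    rows = [(p, y) for p, xx, tt, _, _, y in joint()
            if tt == t and (x is None or xx == x)]
    mass = sum(p for p, _ in rows)        # positive by positivity
    return sum(p * y for p, y in rows) / mass


# Left-hand side: ATE = E[Y(1) - Y(0)], which uses the unobservable pair (Y(1), Y(0)).
true_ate = sum(p * (y1 - y0) for p, _, _, y1, y0, _ in joint())

# Right-hand side: E[ E[Y | T=1, X] - E[Y | T=0, X] ], which uses observables only.
identified = sum(p_x[x] * (stratum_mean(1, x) - stratum_mean(0, x)) for x in p_x)

# Unadjusted contrast E[Y | T=1] - E[Y | T=0], shown for comparison.
naive = stratum_mean(1) - stratum_mean(0)

print("true ATE:  ", true_ate)
print("identified:", identified)
print("naive:     ", naive)
```

Because the arithmetic uses exact fractions, `identified` and `true_ate` agree exactly in this toy model, while the unadjusted contrast `naive` differs because treated units are concentrated in the stratum `x = 1`. The agreement illustrates the formula on one assumed model; it does not replace the proof.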
