Nonidentifiability of the Average Treatment Effect Under Consistency Alone (Theorem # 9653)

Theorem

Proof

[proofplan] We construct two explicit models on the same four-point [probability space](/page/Probability%20Space). In the first model, the two potential outcomes are identical, so the average treatment effect is zero, while the observed pair $(A,Y)$ is uniformly distributed on $\{0,1\}^2$. In the second model, we choose four potential-outcome types that, after applying consistency, produce the same uniform observed law, but whose average treatment effect is positive. Therefore the observational law of $(A,Y)$ does not determine the causal parameter under consistency alone. [/proofplan]

[step:Build a common finite probability space for both models] Let
\begin{align*}
\Omega:=\{\omega_1,\omega_2,\omega_3,\omega_4\}.
\end{align*}
Let $\mathcal F:=2^\Omega$ be the power-set $\sigma$-algebra, and define the probability measure $\mathbb P:\mathcal F\to[0,1]$ by
\begin{align*}
\mathbb P(\{\omega_i\})=\frac{1}{4}
\end{align*}
for each $i\in\{1,2,3,4\}$. Since $\mathcal F=2^\Omega$, the preimage of every subset of $\{0,1\}$ under any function $\Omega\to\{0,1\}$ lies in $\mathcal F$, so every such function is measurable when the codomain carries its power-set $\sigma$-algebra $2^{\{0,1\}}$. Thus any treatment, potential-outcome, or observed-outcome function specified below is automatically a [random variable](/page/Random%20Variable) of the required type. [/step]

[step:Construct the first model with zero treatment effect] Define
\begin{align*}
A_1:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by setting $A_1(\omega_1)=A_1(\omega_2)=0$ and $A_1(\omega_3)=A_1(\omega_4)=1$. Define
\begin{align*}
U:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by setting $U(\omega_1)=U(\omega_3)=0$ and $U(\omega_2)=U(\omega_4)=1$. Define the potential outcomes of the first model, which we call $M_1$, by
\begin{align*}
Y_1(0):=U \quad\text{and}\quad Y_1(1):=U.
\end{align*}
Define the observed outcome
\begin{align*}
Y_1:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by the consistency rule
\begin{align*}
Y_1(\omega):=Y_1(A_1(\omega))(\omega).
\end{align*}
Because $Y_1(0)=Y_1(1)=U$, the observed outcome equals $U$ whichever treatment is received, so $Y_1=U$. Hence the observed pairs are
\begin{align*}
(A_1(\omega_1),Y_1(\omega_1))=(0,0),\quad (A_1(\omega_2),Y_1(\omega_2))=(0,1),\quad (A_1(\omega_3),Y_1(\omega_3))=(1,0),\quad (A_1(\omega_4),Y_1(\omega_4))=(1,1).
\end{align*}
Also $Y_1(1)-Y_1(0)=0$ pointwise on $\Omega$, so
\begin{align*}
\mathbb E_{M_1}[Y(1)-Y(0)]=0.
\end{align*}
[/step]

[step:Construct the second model with a positive treatment effect] Define
\begin{align*}
A_2:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by setting $A_2(\omega_1)=A_2(\omega_2)=0$ and $A_2(\omega_3)=A_2(\omega_4)=1$. Define the potential outcomes $Y_2(0),Y_2(1):(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})$ of the second model, which we call $M_2$, by the following values:
\begin{align*}
(Y_2(0)(\omega_1),Y_2(1)(\omega_1))&=(0,1),\\
(Y_2(0)(\omega_2),Y_2(1)(\omega_2))&=(1,1),\\
(Y_2(0)(\omega_3),Y_2(1)(\omega_3))&=(0,0),\\
(Y_2(0)(\omega_4),Y_2(1)(\omega_4))&=(0,1).
\end{align*}
Define the observed outcome
\begin{align*}
Y_2:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by
\begin{align*}
Y_2(\omega):=Y_2(A_2(\omega))(\omega).
\end{align*}
This is exactly the consistency condition for $M_2$.

[guided] The second model is designed so that consistency reveals only the potential outcome under the treatment actually received; the other (counterfactual) potential outcome never enters the observed data. We first fix the treatment assignment by setting $A_2(\omega_1)=A_2(\omega_2)=0$ and $A_2(\omega_3)=A_2(\omega_4)=1$. Then we assign the potential-outcome pairs by
\begin{align*}
(Y_2(0)(\omega_1),Y_2(1)(\omega_1))&=(0,1),\\
(Y_2(0)(\omega_2),Y_2(1)(\omega_2))&=(1,1),\\
(Y_2(0)(\omega_3),Y_2(1)(\omega_3))&=(0,0),\\
(Y_2(0)(\omega_4),Y_2(1)(\omega_4))&=(0,1).
\end{align*}
Each of these functions is measurable because $\mathcal F=2^\Omega$.
The observed outcome is not chosen independently after this assignment; consistency forces it to be the value of the potential outcome under the treatment actually received. Thus we define
\begin{align*}
Y_2(\omega):=Y_2(A_2(\omega))(\omega).
\end{align*}
For example, since $A_2(\omega_1)=0$, the observed outcome at $\omega_1$ is $Y_2(0)(\omega_1)=0$; since $A_2(\omega_4)=1$, the observed outcome at $\omega_4$ is $Y_2(1)(\omega_4)=1$. This pointwise rule is precisely the consistency condition required in the model class $\mathcal M$. [/guided]

The observed pairs in $M_2$ are therefore
\begin{align*}
(A_2(\omega_1),Y_2(\omega_1))=(0,0),\quad (A_2(\omega_2),Y_2(\omega_2))=(0,1),\quad (A_2(\omega_3),Y_2(\omega_3))=(1,0),\quad (A_2(\omega_4),Y_2(\omega_4))=(1,1).
\end{align*}
The average treatment effect in $M_2$ is
\begin{align*}
\mathbb E_{M_2}[Y(1)-Y(0)]=\frac{1}{4}\bigl((1-0)+(1-1)+(0-0)+(1-0)\bigr)=\frac{1}{2}.
\end{align*}
[/step]

[step:Compare the observed laws and the causal parameters] For $j\in\{1,2\}$, let $\mu_j$ denote the joint law of $(A_j,Y_j)$ on $\{0,1\}^2$, defined by
\begin{align*}
\mu_j(B):=\mathbb P\bigl(\{\omega\in\Omega:(A_j(\omega),Y_j(\omega))\in B\}\bigr)
\end{align*}
for each subset $B\subseteq\{0,1\}^2$. In both models, each of the four points $(0,0),(0,1),(1,0),(1,1)$ is attained at exactly one atom of $\Omega$, and every atom has probability $1/4$. Hence, for every $(a,y)\in\{0,1\}^2$,
\begin{align*}
\mu_1(\{(a,y)\})=\frac{1}{4}=\mu_2(\{(a,y)\}).
\end{align*}
Since $\{0,1\}^2$ is finite, every subset is a finite disjoint union of singletons, so by additivity equality on singletons implies $\mu_1=\mu_2$ on all subsets of $\{0,1\}^2$. Thus the two models have the same observational law of $(A,Y)$. However, the causal parameters differ:
\begin{align*}
\mathbb E_{M_1}[Y(1)-Y(0)]=0 \quad\text{and}\quad \mathbb E_{M_2}[Y(1)-Y(0)]=\frac{1}{2}.
\end{align*}
Therefore the observational law of $(A,Y)$ does not determine the average treatment effect within the class $\mathcal M$. This proves the claimed nonidentifiability under consistency alone. [/step]
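
Because both models live on four atoms, the construction can be checked mechanically. The following is a minimal Python sketch, assuming $\omega_1,\dots,\omega_4$ are encoded as list indices 0 to 3; the helper names `observed_law` and `ate` are hypothetical and introduced only for illustration. It derives the observed outcome from the potential outcomes by the consistency rule, then compares the two observed laws and the two average treatment effects using exact rational arithmetic from the standard-library `fractions` module.

```python
from fractions import Fraction

# Atoms omega_1..omega_4 are encoded as indices 0..3; each has probability 1/4.
P = [Fraction(1, 4)] * 4


def observed_law(A, Y0, Y1):
    """Joint law of (A, Y) on {0,1}^2, where the observed outcome Y is
    produced by the consistency rule Y(w) = Y(A(w))(w)."""
    law = {(a, y): Fraction(0) for a in (0, 1) for y in (0, 1)}
    for i, p in enumerate(P):
        y_obs = Y1[i] if A[i] == 1 else Y0[i]  # consistency
        law[(A[i], y_obs)] += p
    return law


def ate(Y0, Y1):
    """Average treatment effect E[Y(1) - Y(0)] from the potential outcomes."""
    return sum(p * (Y1[i] - Y0[i]) for i, p in enumerate(P))


# Model M_1: A_1 as in the proof and Y_1(0) = Y_1(1) = U.
A_1 = [0, 0, 1, 1]
U = [0, 1, 0, 1]
law_1, ate_1 = observed_law(A_1, U, U), ate(U, U)

# Model M_2: same assignment; potential-outcome pairs (0,1), (1,1), (0,0), (0,1).
A_2 = [0, 0, 1, 1]
Y2_0 = [0, 1, 0, 0]
Y2_1 = [1, 1, 0, 1]
law_2, ate_2 = observed_law(A_2, Y2_0, Y2_1), ate(Y2_0, Y2_1)

print(law_1 == law_2)  # True: identical observational laws
print(ate_1, ate_2)    # 0 1/2: different causal parameters
```

Editing a counterfactual entry, for instance `Y2_1[0]` (that is, $Y_2(1)(\omega_1)$, which is never observed because $A_2(\omega_1)=0$), changes `ate_2` but leaves `law_2` unchanged, which is the nonidentifiability phenomenon in miniature.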

Definitions & Concepts

Random Variable
