[proofplan]
We construct two explicit models on the same four-point [probability space](/page/Probability%20Space). In the first model, the two potential outcomes are identical, so the average treatment effect is zero, while the observed pair $(A,Y)$ is uniformly distributed on $\{0,1\}^2$. In the second model, we assign a potential-outcome pair to each of the four atoms so that consistency yields the same uniform observed law, but the average treatment effect equals $1/2$. Therefore the observational law of $(A,Y)$ does not determine the causal parameter under consistency alone.
[/proofplan]
[step:Build a common finite probability space for both models]
Let
\begin{align*}
\Omega:=\{\omega_1,\omega_2,\omega_3,\omega_4\}.
\end{align*}
Let $\mathcal F:=2^\Omega$ be the power-set $\sigma$-algebra, and define the probability measure $\mathbb P:\mathcal F\to[0,1]$ by
\begin{align*}
\mathbb P(\{\omega_i\})=\frac{1}{4}
\end{align*}
for each $i\in\{1,2,3,4\}$. Since $\mathcal F=2^\Omega$ contains every subset of $\Omega$, every function from $\Omega$ to $\{0,1\}$ is measurable, whichever $\sigma$-algebra is placed on $\{0,1\}$; we use the power set $2^{\{0,1\}}$ throughout. Thus any treatment, potential-outcome, or observed-outcome function specified below is automatically a [random variable](/page/Random%20Variable) of the required type.
[/step]
[step:Construct the first model with zero treatment effect]
Define
\begin{align*}
A_1:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by setting $A_1(\omega_1)=A_1(\omega_2)=0$ and $A_1(\omega_3)=A_1(\omega_4)=1$. Define
\begin{align*}
U:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by setting $U(\omega_1)=U(\omega_3)=0$ and $U(\omega_2)=U(\omega_4)=1$. Define the potential outcomes of the first model by
\begin{align*}
Y_1(0):=U\quad\text{and}\quad Y_1(1):=U.
\end{align*}
Define the observed outcome
\begin{align*}
Y_1:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by the consistency rule
\begin{align*}
Y_1(\omega):=Y_1(A_1(\omega))(\omega).
\end{align*}
Because $Y_1(0)=Y_1(1)=U$, the right-hand side equals $U(\omega)$ whichever value $A_1(\omega)$ takes, so $Y_1=U$. Hence the observed pairs are
\begin{align*}
(A_1(\omega_1),Y_1(\omega_1))=(0,0),\quad (A_1(\omega_2),Y_1(\omega_2))=(0,1),\quad (A_1(\omega_3),Y_1(\omega_3))=(1,0),\quad (A_1(\omega_4),Y_1(\omega_4))=(1,1).
\end{align*}
Also $Y_1(1)-Y_1(0)=0$ pointwise on $\Omega$, so the average treatment effect in the first model, which we denote by $M_1$, is
\begin{align*}
\mathbb E_{M_1}[Y(1)-Y(0)]=0.
\end{align*}
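The construction of the first model can be checked mechanically. The following is a minimal sketch, not part of the proof: the atom labels `"w1"` to `"w4"` and the dictionary names are illustrative assumptions standing in for $\omega_1,\dots,\omega_4$, $A_1$, $U$, and the potential outcomes.

```python
from fractions import Fraction

# Atoms of Omega, each with probability 1/4 (labels are illustrative).
OMEGA = ["w1", "w2", "w3", "w4"]
P = {w: Fraction(1, 4) for w in OMEGA}

# Treatment A_1 and the variable U, with the values given in the text.
A1 = {"w1": 0, "w2": 0, "w3": 1, "w4": 1}
U = {"w1": 0, "w2": 1, "w3": 0, "w4": 1}

# Potential outcomes of the first model: Y_1(0) = Y_1(1) = U.
Y1_POT = {0: dict(U), 1: dict(U)}

# Consistency: the observed outcome is the potential outcome
# under the treatment actually received.
Y1_OBS = {w: Y1_POT[A1[w]][w] for w in OMEGA}

observed_pairs_1 = [(A1[w], Y1_OBS[w]) for w in OMEGA]
ate_1 = sum(P[w] * (Y1_POT[1][w] - Y1_POT[0][w]) for w in OMEGA)

print(observed_pairs_1)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ate_1)             # 0
```

Exact rational arithmetic via `Fraction` avoids any floating-point ambiguity when the result is compared with $0$.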
[/step]
[step:Construct the second model with a positive treatment effect]
Define
\begin{align*}
A_2:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by setting $A_2(\omega_1)=A_2(\omega_2)=0$ and $A_2(\omega_3)=A_2(\omega_4)=1$. Define the potential outcomes $Y_2(0),Y_2(1):(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})$ by the following values:
\begin{align*}
(Y_2(0)(\omega_1),Y_2(1)(\omega_1))&=(0,1),\\
(Y_2(0)(\omega_2),Y_2(1)(\omega_2))&=(1,1),\\
(Y_2(0)(\omega_3),Y_2(1)(\omega_3))&=(0,0),\\
(Y_2(0)(\omega_4),Y_2(1)(\omega_4))&=(0,1).
\end{align*}
Define the observed outcome
\begin{align*}
Y_2:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})
\end{align*}
by
\begin{align*}
Y_2(\omega):=Y_2(A_2(\omega))(\omega).
\end{align*}
This is exactly the consistency condition for the second model, which we denote by $M_2$.
[guided]
The second model is designed so that the observed outcome hides some of the potential-outcome information. We first fix the treatment assignment by setting $A_2(\omega_1)=A_2(\omega_2)=0$ and $A_2(\omega_3)=A_2(\omega_4)=1$. Then we assign the potential-outcome pairs by
\begin{align*}
(Y_2(0)(\omega_1),Y_2(1)(\omega_1))&=(0,1),\\
(Y_2(0)(\omega_2),Y_2(1)(\omega_2))&=(1,1),\\
(Y_2(0)(\omega_3),Y_2(1)(\omega_3))&=(0,0),\\
(Y_2(0)(\omega_4),Y_2(1)(\omega_4))&=(0,1).
\end{align*}
Each of these functions is measurable because $\mathcal F=2^\Omega$. The observed outcome is not chosen independently after this assignment; consistency forces it to be the value of the potential outcome under the treatment actually received. Thus we define
\begin{align*}
Y_2(\omega):=Y_2(A_2(\omega))(\omega).
\end{align*}
For example, since $A_2(\omega_1)=0$, the observed outcome at $\omega_1$ is $Y_2(0)(\omega_1)=0$; since $A_2(\omega_4)=1$, the observed outcome at $\omega_4$ is $Y_2(1)(\omega_4)=1$. This pointwise rule is precisely the consistency condition required in the model class $\mathcal M$.
[/guided]
The observed pairs in $M_2$ are therefore
\begin{align*}
(A_2(\omega_1),Y_2(\omega_1))=(0,0),\quad (A_2(\omega_2),Y_2(\omega_2))=(0,1),\quad (A_2(\omega_3),Y_2(\omega_3))=(1,0),\quad (A_2(\omega_4),Y_2(\omega_4))=(1,1).
\end{align*}
The average treatment effect in $M_2$ is
\begin{align*}
\mathbb E_{M_2}[Y(1)-Y(0)]=\frac{1}{4}\bigl((1-0)+(1-1)+(0-0)+(1-0)\bigr)=\frac{1}{2}.
\end{align*}
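To make the role of consistency in the second model concrete, here is a small sketch under the same illustrative conventions (atom labels `"w1"` to `"w4"`, hypothetical dictionary names). It derives the observed outcome from the potential-outcome pairs, records the counterfactual value that is never observed at each atom, and recomputes the average treatment effect.

```python
from fractions import Fraction

# Atoms of Omega with the uniform measure (labels are illustrative).
OMEGA = ["w1", "w2", "w3", "w4"]
P = {w: Fraction(1, 4) for w in OMEGA}

# Treatment A_2 and the pairs (Y_2(0), Y_2(1)) at each atom, as in the text.
A2 = {"w1": 0, "w2": 0, "w3": 1, "w4": 1}
Y2_PAIRS = {"w1": (0, 1), "w2": (1, 1), "w3": (0, 0), "w4": (0, 1)}

# Consistency: pick the component indexed by the received treatment.
Y2_OBS = {w: Y2_PAIRS[w][A2[w]] for w in OMEGA}

# The other component is the counterfactual, which the data never reveal.
Y2_HIDDEN = {w: Y2_PAIRS[w][1 - A2[w]] for w in OMEGA}

observed_pairs_2 = [(A2[w], Y2_OBS[w]) for w in OMEGA]
ate_2 = sum(P[w] * (y1 - y0) for w, (y0, y1) in Y2_PAIRS.items())

print(observed_pairs_2)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ate_2)             # 1/2
```

The positive effect comes entirely from the pairs at $\omega_1$ and $\omega_4$; at $\omega_1$ the value $Y_2(1)(\omega_1)=1$ is among the hidden components, which is why the observed data cannot rule it out.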
[/step]
[step:Compare the observed laws and the causal parameters]
For $j\in\{1,2\}$, let $\mu_j$ denote the joint law of $(A_j,Y_j)$ on $\{0,1\}^2$, defined by
\begin{align*}
\mu_j(B):=\mathbb P\bigl(\{\omega\in\Omega:(A_j(\omega),Y_j(\omega))\in B\}\bigr)
\end{align*}
for each subset $B\subset\{0,1\}^2$. In both models, each of the four points $(0,0),(0,1),(1,0),(1,1)$ is attained at exactly one atom of $\Omega$, and every atom has probability $1/4$. Hence, for every $(a,y)\in\{0,1\}^2$,
\begin{align*}
\mu_1(\{(a,y)\})=\frac{1}{4}=\mu_2(\{(a,y)\}).
\end{align*}
Since $\{0,1\}^2$ is finite and each $\mu_j$ is additive, equality on singletons implies $\mu_1=\mu_2$ on all subsets of $\{0,1\}^2$. Thus the two models have the same observational law of $(A,Y)$.
However, the causal parameters are different:
\begin{align*}
\mathbb E_{M_1}[Y(1)-Y(0)]=0
\end{align*}
and
\begin{align*}
\mathbb E_{M_2}[Y(1)-Y(0)]=\frac{1}{2}.
\end{align*}
Therefore the observational law of $(A,Y)$ does not determine the average treatment effect within the class $\mathcal M$. This proves the claimed nonidentifiability under consistency alone.
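The whole comparison can be replayed numerically. The sketch below is an illustration under stated assumptions rather than part of the argument: the helper names `observed_law` and `average_treatment_effect` and the atom labels are hypothetical, and each model is encoded as a treatment map together with a pair $(Y(0),Y(1))$ per atom.

```python
from collections import defaultdict
from fractions import Fraction

# Common probability space: four atoms, each with mass 1/4.
OMEGA = ("w1", "w2", "w3", "w4")
P = {w: Fraction(1, 4) for w in OMEGA}


def observed_law(treatment, potential):
    """Joint law of (A, Y), where Y is obtained by consistency."""
    law = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for w in OMEGA:
        a = treatment[w]
        y = potential[w][a]  # potential outcome under received treatment
        law[(a, y)] += P[w]
    return dict(law)


def average_treatment_effect(potential):
    """E[Y(1) - Y(0)] under the uniform measure on OMEGA."""
    return sum(P[w] * (potential[w][1] - potential[w][0]) for w in OMEGA)


# Both models share the same treatment assignment.
A = {"w1": 0, "w2": 0, "w3": 1, "w4": 1}

# Model 1: Y(0) = Y(1) = U. Model 2: the pairs listed in the text.
POT_1 = {"w1": (0, 0), "w2": (1, 1), "w3": (0, 0), "w4": (1, 1)}
POT_2 = {"w1": (0, 1), "w2": (1, 1), "w3": (0, 0), "w4": (0, 1)}

law_1, law_2 = observed_law(A, POT_1), observed_law(A, POT_2)
ate_1 = average_treatment_effect(POT_1)
ate_2 = average_treatment_effect(POT_2)

print(law_1 == law_2)  # True
print(ate_1, ate_2)    # 0 1/2
```

Any quantity computed from `law_1` alone must return the same value on `law_2`, so no such quantity can equal the average treatment effect in both models.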
[/step]