[proofplan]
Fix the treatment value $x$ and compare three laws: the marginal law of $Y_x$, the conditional law of $Y_x$ among units with $X=x$, and the conditional law of the observed outcome $Y$ among those units. Marginal exchangeability shows that the first two laws coincide. Consistency shows that the last two coincide, because $Y=Y_x$ on the event $\{X=x\}$. The expectation identity then follows by applying the law identity to bounded truncations of the outcome and passing to the limit, which is justified by the integrability of $Y_x$.
[/proofplan]
[step:Use exchangeability to condition on the observed treatment group]
Fix $x\in\{0,1\}$ such that $\mathbb P(X=x)>0$, and fix a Borel set $A\in\mathcal B(\mathbb R)$. Define the event
\begin{align*}
E_A:=\{\omega\in\Omega:Y_x(\omega)\in A\}.
\end{align*}
Since $Y_x$ is $\mathcal F$-measurable and $A$ is Borel, $E_A\in\mathcal F$. Marginal exchangeability says that $Y_x$ is independent of $X$, hence $E_A$ is independent of the event $\{X=x\}$. Therefore
\begin{align*}
\mathbb P(E_A\cap\{X=x\})=\mathbb P(E_A)\mathbb P(X=x).
\end{align*}
Since $\mathbb P(X=x)>0$, division by $\mathbb P(X=x)$ gives
\begin{align*}
\mathbb P(Y_x\in A\mid X=x)=\mathbb P(Y_x\in A).
\end{align*}
[guided]
We first isolate exactly what marginal exchangeability gives. Fix $x\in\{0,1\}$ with $\mathbb P(X=x)>0$, and let $A\in\mathcal B(\mathbb R)$. Define
\begin{align*}
E_A:=\{\omega\in\Omega:Y_x(\omega)\in A\}.
\end{align*}
This is an event because $Y_x:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ is measurable and $A$ is Borel.
Marginal exchangeability means that the potential outcome $Y_x$ is independent of the treatment $X$. Equivalently, every event determined by $Y_x$ is independent of every event determined by $X$. The event $E_A$ is determined by $Y_x$, and the event $\{X=x\}$ is determined by $X$, so
\begin{align*}
\mathbb P(E_A\cap\{X=x\})=\mathbb P(E_A)\mathbb P(X=x).
\end{align*}
The positivity hypothesis $\mathbb P(X=x)>0$ is exactly what permits conditioning on the group with observed treatment value $x$. Dividing by $\mathbb P(X=x)$ yields
\begin{align*}
\mathbb P(Y_x\in A\mid X=x)=\frac{\mathbb P(E_A\cap\{X=x\})}{\mathbb P(X=x)}=\mathbb P(E_A)=\mathbb P(Y_x\in A).
\end{align*}
Thus exchangeability identifies the marginal law of the potential outcome with its conditional law inside the observed treatment group.
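As a small numerical illustration (the numbers are hypothetical and chosen only for this example), suppose $Y_1$ takes values in $\{0,1\}$ with $\mathbb P(Y_1=1)=\tfrac{3}{10}$, that $\mathbb P(X=1)=\tfrac12$, and that $Y_1$ is independent of $X$. Taking $x=1$ and $A=\{1\}$, the computation above reads
\begin{align*}
\mathbb P(E_A\cap\{X=1\})=\tfrac{3}{10}\cdot\tfrac12=\tfrac{3}{20},\qquad
\mathbb P(Y_1\in A\mid X=1)=\frac{3/20}{1/2}=\tfrac{3}{10}=\mathbb P(Y_1\in A).
\end{align*}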
[/guided]
[/step]
[step:Use consistency to replace the potential outcome by the observed outcome]
Define the event
\begin{align*}
F_A:=\{\omega\in\Omega:Y(\omega)\in A\}.
\end{align*}
By consistency, $Y=Y_x$ on $\{X=x\}$. Hence
\begin{align*}
E_A\cap\{X=x\}=F_A\cap\{X=x\}.
\end{align*}
Dividing the probabilities of these equal events by $\mathbb P(X=x)$ gives
\begin{align*}
\mathbb P(Y_x\in A\mid X=x)=\mathbb P(Y\in A\mid X=x).
\end{align*}
Combining this identity with the previous step gives
\begin{align*}
\mathbb P(Y_x\in A)=\mathbb P(Y\in A\mid X=x).
\end{align*}
[guided]
We now use consistency to connect the potential outcome to the observed outcome. Define the observed-outcome event
\begin{align*}
F_A:=\{\omega\in\Omega:Y(\omega)\in A\}.
\end{align*}
This is an event because $Y:(\Omega,\mathcal F)\to(\mathbb R,\mathcal B(\mathbb R))$ is measurable and $A\in\mathcal B(\mathbb R)$.
Consistency says that, on the event $\{X=x\}$, the observed outcome is exactly the potential outcome under treatment value $x$. Therefore, for every $\omega\in\Omega$ with $X(\omega)=x$, we have $Y_x(\omega)\in A$ if and only if $Y(\omega)\in A$. Hence
\begin{align*}
E_A\cap\{X=x\}=F_A\cap\{X=x\}.
\end{align*}
Because $\mathbb P(X=x)>0$, conditional probabilities given $X=x$ are obtained by dividing by $\mathbb P(X=x)$. Thus
\begin{align*}
\mathbb P(Y_x\in A\mid X=x)=\frac{\mathbb P(E_A\cap\{X=x\})}{\mathbb P(X=x)}=\frac{\mathbb P(F_A\cap\{X=x\})}{\mathbb P(X=x)}=\mathbb P(Y\in A\mid X=x).
\end{align*}
The preceding step proved $\mathbb P(Y_x\in A\mid X=x)=\mathbb P(Y_x\in A)$. Combining the two equalities gives
\begin{align*}
\mathbb P(Y_x\in A)=\mathbb P(Y\in A\mid X=x).
\end{align*}
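The following four-point example is a hypothetical sanity check; every value in it is an assumption made for illustration. Let $\Omega=\{\omega_1,\omega_2,\omega_3,\omega_4\}$ carry the uniform probability, let $X$ take the values $(1,1,0,0)$ and $Y_1$ the values $(1,0,1,0)$ at $(\omega_1,\omega_2,\omega_3,\omega_4)$, let $Y_0\equiv0$, and let $Y=Y_1$ on $\{X=1\}$ and $Y=Y_0$ on $\{X=0\}$, so that $Y$ takes the values $(1,0,0,0)$. Then $Y_1$ is independent of $X$, since $\mathbb P(Y_1=1,X=1)=\tfrac14=\tfrac12\cdot\tfrac12$. With $x=1$ and $A=\{1\}$,
\begin{align*}
E_A=\{\omega_1,\omega_3\},\qquad F_A=\{\omega_1\},\qquad E_A\cap\{X=1\}=\{\omega_1\}=F_A\cap\{X=1\},
\end{align*}
and therefore
\begin{align*}
\mathbb P(Y_1\in A)=\tfrac12=\frac{1/4}{1/2}=\mathbb P(Y\in A\mid X=1).
\end{align*}
In this example the unconditional probability is $\mathbb P(Y\in A)=\tfrac14$, so the events $E_A$ and $F_A$ agree only after intersecting with $\{X=1\}$, and the conditioning cannot be dropped.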
[/guided]
[/step]
[step:Pass from equality of laws to equality of expectations by truncation]
Assume $Y_x\in L^1(\Omega,\mathcal F,\mathbb P)$. Define the [conditional probability measure](/theorems/4972)
\begin{align*}
\mathbb P_x^Y(B):=\mathbb P(B\mid X=x)=\frac{\mathbb P(B\cap\{X=x\})}{\mathbb P(X=x)}
\end{align*}
for every $B\in\mathcal F$. The law identity already proved says that, under $\mathbb P$, the [random variable](/page/Random%20Variable) $Y_x$ has the same distribution as $Y$ under $\mathbb P_x^Y$. Consequently, if $g:\mathbb R\to\mathbb R$ is bounded and Borel, then
\begin{align*}
\mathbb E[g(Y_x)]=\mathbb E[g(Y)\mid X=x].
\end{align*}
Indeed, this follows first for indicator functions $g=\mathbb 1_B$ with $B\in\mathcal B(\mathbb R)$ from the equality of laws, then for non-negative simple Borel functions by linearity. If $g$ is bounded and non-negative, choose finite-valued Borel simple functions $s_n:\mathbb R\to[0,\infty)$ with $\|s_n-g\|_\infty\to0$. Since $|\mathbb E[g(Y_x)]-\mathbb E[s_n(Y_x)]|\le\|s_n-g\|_\infty$ and $|\mathbb E[g(Y)\mid X=x]-\mathbb E[s_n(Y)\mid X=x]|\le\|s_n-g\|_\infty$, the identity for $s_n$ passes to $g$ in the limit. Applying this to the positive and negative parts $g^+$ and $g^-$ gives the identity for every bounded Borel $g:\mathbb R\to\mathbb R$.
For each $n\in\mathbb N$, define the bounded Borel truncation map
\begin{align*}
\tau_n:\mathbb R&\to\mathbb R
\end{align*}
by
\begin{align*}
\tau_n(t):=\max\{-n,\min\{t,n\}\}.
\end{align*}
Equality of distributions gives
\begin{align*}
\mathbb E[\tau_n(Y_x)]=\mathbb E[\tau_n(Y)\mid X=x].
\end{align*}
Also define
\begin{align*}
\sigma_n:\mathbb R&\to[0,\infty)
\end{align*}
by
\begin{align*}
\sigma_n(t):=\min\{|t|,n\}.
\end{align*}
Again by equality of distributions,
\begin{align*}
\mathbb E[\sigma_n(Y)\mid X=x]=\mathbb E[\sigma_n(Y_x)].
\end{align*}
Since $\sigma_n(Y_x)\uparrow |Y_x|$, the [monotone convergence theorem](/theorems/509) gives
\begin{align*}
\lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty.
\end{align*}
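Since also $\sigma_n(Y)\uparrow|Y|$, the monotone convergence theorem under $\mathbb P_x^Y$, combined with the two preceding displays, gives
\begin{align*}
\mathbb E[|Y|\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y)\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty.
\end{align*}
Therefore $Y$ is integrable under the [conditional probability](/page/Conditional%20Probability) measure $\mathbb P_x^Y$.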
Since $|\tau_n(Y_x)|\le |Y_x|$ and $\tau_n(Y_x)\to Y_x$ pointwise, the [dominated convergence theorem](/theorems/4) gives
\begin{align*}
\lim_{n\to\infty}\mathbb E[\tau_n(Y_x)]=\mathbb E[Y_x].
\end{align*}
Since $|\tau_n(Y)|\le |Y|$ and $\tau_n(Y)\to Y$ $\mathbb P_x^Y$-a.s., the [dominated convergence theorem](/theorems/7529) under $\mathbb P_x^Y$ gives
\begin{align*}
\lim_{n\to\infty}\mathbb E[\tau_n(Y)\mid X=x]=\mathbb E[Y\mid X=x].
\end{align*}
Passing to the limit in
\begin{align*}
\mathbb E[\tau_n(Y_x)]=\mathbb E[\tau_n(Y)\mid X=x]
\end{align*}
therefore yields
\begin{align*}
\mathbb E[Y_x]=\mathbb E[Y\mid X=x].
\end{align*}
This proves the distributional and expectation identification formulas.
[guided]
We have already proved equality of the two laws:
\begin{align*}
\mathbb P(Y_x\in A)=\mathbb P(Y\in A\mid X=x)
\end{align*}
for every $A\in\mathcal B(\mathbb R)$. To turn this law identity into an expectation identity, define the conditional probability measure $\mathbb P_x^Y$ on $(\Omega,\mathcal F)$ by
\begin{align*}
\mathbb P_x^Y(B):=\mathbb P(B\mid X=x)=\frac{\mathbb P(B\cap\{X=x\})}{\mathbb P(X=x)}
\end{align*}
for $B\in\mathcal F$. The positivity assumption $\mathbb P(X=x)>0$ makes this definition valid.
Under $\mathbb P$, the random variable $Y_x$ has the same distribution as $Y$ under $\mathbb P_x^Y$. Therefore, for every bounded Borel map $g:\mathbb R\to\mathbb R$,
\begin{align*}
\mathbb E[g(Y_x)]=\mathbb E[g(Y)\mid X=x].
\end{align*}
The justification is the standard approximation argument by simple functions: the identity holds for indicators of Borel sets by equality of laws, hence for non-negative simple Borel functions by linearity, then for bounded non-negative Borel functions by uniform approximation with finite-valued Borel simple functions, and finally for bounded signed Borel functions by applying the result to positive and negative parts.
For $n\in\mathbb N$, define the bounded Borel truncation map $\tau_n:\mathbb R\to\mathbb R$ by
\begin{align*}
\tau_n(t):=\max\{-n,\min\{t,n\}\}.
\end{align*}
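It may help to write the truncation out piecewise; this is only a restatement of the definition:
\begin{align*}
\tau_n(t)=\begin{cases}-n,&t<-n,\\ t,&-n\le t\le n,\\ n,&t>n.\end{cases}
\end{align*}
In particular $|\tau_n(t)|=\min\{|t|,n\}\le|t|$, and $\tau_n(t)=t$ as soon as $n\ge|t|$. For instance, at the illustrative value $t=\tfrac52$ one gets $\tau_1(t)=1$, $\tau_2(t)=2$, and $\tau_n(t)=\tfrac52$ for every $n\ge3$.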
Applying the bounded-Borel identity with $g=\tau_n$ gives
\begin{align*}
\mathbb E[\tau_n(Y_x)]=\mathbb E[\tau_n(Y)\mid X=x].
\end{align*}
To verify conditional integrability of $Y$, define $\sigma_n:\mathbb R\to[0,\infty)$ by
\begin{align*}
\sigma_n(t):=\min\{|t|,n\}.
\end{align*}
Again the bounded-Borel identity gives
\begin{align*}
\mathbb E[\sigma_n(Y)\mid X=x]=\mathbb E[\sigma_n(Y_x)].
\end{align*}
Since $\sigma_n(Y_x)\uparrow |Y_x|$ pointwise and $Y_x\in L^1(\Omega,\mathcal F,\mathbb P)$, the monotone convergence theorem gives
\begin{align*}
\lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty.
\end{align*}
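Since $\sigma_n(Y)\uparrow|Y|$ pointwise as well, the monotone convergence theorem under $\mathbb P_x^Y$ gives $\mathbb E[|Y|\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y)\mid X=x]=\lim_{n\to\infty}\mathbb E[\sigma_n(Y_x)]=\mathbb E[|Y_x|]<\infty$. Hence $Y$ is integrable under $\mathbb P_x^Y$, which is exactly conditional integrability on $\{X=x\}$.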
Now $|\tau_n(Y_x)|\le |Y_x|$ and $\tau_n(Y_x)\to Y_x$ pointwise, so the dominated convergence theorem gives
\begin{align*}
\lim_{n\to\infty}\mathbb E[\tau_n(Y_x)]=\mathbb E[Y_x].
\end{align*}
Similarly, $|\tau_n(Y)|\le |Y|$ and $\tau_n(Y)\to Y$ $\mathbb P_x^Y$-almost surely, while $Y$ is integrable under $\mathbb P_x^Y$. Applying the dominated convergence theorem under $\mathbb P_x^Y$ gives
\begin{align*}
\lim_{n\to\infty}\mathbb E[\tau_n(Y)\mid X=x]=\mathbb E[Y\mid X=x].
\end{align*}
Passing to the limit in the truncation identity therefore yields
\begin{align*}
\mathbb E[Y_x]=\mathbb E[Y\mid X=x].
\end{align*}
[/guided]
[/step]