[proofplan]
We prove the formula at the level of conditional kernels, not separately for each measurable set. First we choose a regular conditional distribution for $Y_x$ given $L$, a version of the conditional probability of $\{X=x\}$ given $L$, and a finite kernel representing the conditional probabilities of the joint events $\{Y\in A,X=x\}$ given $L$. Conditional exchangeability factors the conditional joint law of $(Y_x,X)$ given $L$, while consistency identifies the joint events $\{Y\in A,X=x\}$ and $\{Y_x\in A,X=x\}$ up to a $\mathbb P$-null set. Positivity then permits division by the conditional treatment probability for $\mathbb P_L$-almost every covariate value, and integrating over $\mathbb P_L$ gives the asserted identification formula.
[/proofplan]
[step:Choose conditional kernels and fix a common measurable version]
Let $\mu:=\mathbb P_L$ denote the law of $L$ on $(E,\mathcal E)$. Since $E$ and $S$ are standard Borel spaces, regular conditional distributions exist. Choose a probability kernel
\begin{align*}
N_x:E\times\mathcal S\to[0,1]
\end{align*}
such that, for every $A\in\mathcal S$,
\begin{align*}
N_x(L,A)=\mathbb P(Y_x\in A\mid L) \quad \mathbb P\text{-a.s.}
\end{align*}
Choose a [measurable function](/page/Measurable%20Function)
\begin{align*}
q_x:E\to[0,1]
\end{align*}
such that
\begin{align*}
q_x(L)=\mathbb P(X=x\mid L) \quad \mathbb P\text{-a.s.}
\end{align*}
Finally, choose a finite kernel
\begin{align*}
H_x:E\times\mathcal S\to[0,1]
\end{align*}
such that, for every $A\in\mathcal S$,
\begin{align*}
H_x(L,A)=\mathbb P(Y\in A,\ X=x\mid L) \quad \mathbb P\text{-a.s.}
\end{align*}
After modifying $H_x$ on a $\mu$-null set, we may assume that for every $l\in E$ the set function $H_x(l,\cdot)$ is a finite measure with total mass $H_x(l,S)=q_x(l)$.
Because $S$ is standard Borel, its $\sigma$-algebra $\mathcal S$ has a countable determining class closed under finite intersections. All kernel identities below may therefore be first imposed on that countable class outside one $\mu$-null set and then extended to all $A\in\mathcal S$ by the [monotone class theorem](/theorems/4925). We use this convention once and for all, so the exceptional null sets do not depend on $A$.
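The normalization $H_x(l,S)=q_x(l)$ can be checked directly from the defining properties of $H_x$ and $q_x$. The following short verification uses only the definitions in this step and the fact that $\{Y\in S\}=\Omega$:
\begin{align*}
H_x(L,S)=\mathbb P(Y\in S,\ X=x\mid L)=\mathbb P(X=x\mid L)=q_x(L) \quad \mathbb P\text{-a.s.}
\end{align*}
Hence the set of $l\in E$ on which $H_x(l,S)\neq q_x(l)$ is $\mu$-null, which is why a modification of $H_x$ on a $\mu$-null set suffices to make the identity hold everywhere.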
[/step]
[step:Use consistency to replace observed outcomes by potential outcomes on the treated stratum]
For $A\in\mathcal S$, define the events
\begin{align*}
B_A:=\{\omega\in\Omega:Y(\omega)\in A,\ X(\omega)=x\}
\end{align*}
and
\begin{align*}
C_A:=\{\omega\in\Omega:Y_x(\omega)\in A,\ X(\omega)=x\}.
\end{align*}
Both events belong to $\mathcal F$ because $Y$, $Y_x$, and $X$ are measurable. By consistency, $Y=Y_x$ $\mathbb P$-a.s. on $\{X=x\}$, hence $\mathbb P(B_A\triangle C_A)=0$. Therefore their conditional probabilities given $L$ agree:
\begin{align*}
\mathbb P(B_A\mid L)=\mathbb P(C_A\mid L) \quad \mathbb P\text{-a.s.}
\end{align*}
Using the kernel $H_x$ and the common-version convention from the previous step, this gives, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$,
\begin{align*}
H_x(l,A)=\mathbb P(Y_x\in A,\ X=x\mid L=l).
\end{align*}
[guided]
The role of consistency is exactly to connect the observed outcome $Y$ to the counterfactual outcome $Y_x$ inside the stratum in which the observed treatment is $x$. For a measurable outcome set $A\in\mathcal S$, consider
\begin{align*}
B_A:=\{\omega\in\Omega:Y(\omega)\in A,\ X(\omega)=x\}
\end{align*}
and
\begin{align*}
C_A:=\{\omega\in\Omega:Y_x(\omega)\in A,\ X(\omega)=x\}.
\end{align*}
These are measurable events because $Y:(\Omega,\mathcal F)\to(S,\mathcal S)$, $Y_x:(\Omega,\mathcal F)\to(S,\mathcal S)$, and $X:(\Omega,\mathcal F)\to(\{0,1\},2^{\{0,1\}})$ are measurable maps.
Consistency says that $Y=Y_x$ except on a $\mathbb P$-null subset of $\{X=x\}$. Hence the two events $B_A$ and $C_A$ can differ only on that null subset. In symbols,
\begin{align*}
\mathbb P(B_A\triangle C_A)=0.
\end{align*}
[Conditional expectation](/page/Conditional%20Expectation) respects almost sure equality of indicators, so
\begin{align*}
\mathbb P(B_A\mid L)=\mathbb P(C_A\mid L) \quad \mathbb P\text{-a.s.}
\end{align*}
The left-hand side is represented by the kernel $H_x(L,A)$ by definition of $H_x$. Therefore, after the common version adjustment using the countable determining class for $\mathcal S$, we have for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$,
\begin{align*}
H_x(l,A)=\mathbb P(Y_x\in A,\ X=x\mid L=l).
\end{align*}
This is the point where the observational law first becomes connected to the potential-outcome law.
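The passage from $\mathbb P(B_A\triangle C_A)=0$ to the almost sure equality of the conditional probabilities can be spelled out as follows. This is one standard way to see it, using only the conditional Jensen inequality, the tower property, and the identity $|\mathbf 1_{B_A}-\mathbf 1_{C_A}|=\mathbf 1_{B_A\triangle C_A}$:
\begin{align*}
\mathbb E\bigl|\mathbb P(B_A\mid L)-\mathbb P(C_A\mid L)\bigr|
&=\mathbb E\bigl|\mathbb E[\mathbf 1_{B_A}-\mathbf 1_{C_A}\mid L]\bigr| \\
&\le \mathbb E\bigl[\mathbb E[\,|\mathbf 1_{B_A}-\mathbf 1_{C_A}|\mid L]\bigr] \\
&=\mathbb E[\mathbf 1_{B_A\triangle C_A}]=\mathbb P(B_A\triangle C_A)=0.
\end{align*}
A nonnegative random variable with zero expectation vanishes $\mathbb P$-a.s., so $\mathbb P(B_A\mid L)=\mathbb P(C_A\mid L)$ $\mathbb P$-a.s.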
[/guided]
[/step]
[step:Factor the conditional joint law using conditional exchangeability]
Conditional exchangeability states that $(Y_0,Y_1)$ and $X$ are conditionally independent given $L$. Since $Y_x$ is a measurable coordinate function of the pair $(Y_0,Y_1)$, it follows that $Y_x$ and $X$ are conditionally independent given $L$. Thus, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$,
\begin{align*}
\mathbb P(Y_x\in A,\ X=x\mid L=l)=N_x(l,A)q_x(l).
\end{align*}
Combining this factorization with the previous step gives, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$,
\begin{align*}
H_x(l,A)=N_x(l,A)q_x(l).
\end{align*}
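The factorization can be written out in terms of indicators. The following is a sketch for a fixed $A\in\mathcal S$, stated at the level of the random variable $L$, so each equality holds $\mathbb P$-a.s.:
\begin{align*}
\mathbb P(Y_x\in A,\ X=x\mid L)
&=\mathbb E[\mathbf 1_{\{Y_x\in A\}}\mathbf 1_{\{X=x\}}\mid L] \\
&=\mathbb E[\mathbf 1_{\{Y_x\in A\}}\mid L]\,\mathbb E[\mathbf 1_{\{X=x\}}\mid L] \\
&=N_x(L,A)\,q_x(L).
\end{align*}
The middle equality is the conditional independence of $Y_x$ and $X$ given $L$, and the last equality uses the defining properties of $N_x$ and $q_x$. Passing from $L$ to $\mu$-a.e. $l$, simultaneously for all $A\in\mathcal S$, uses the common-version convention fixed in the first step.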
[/step]
[step:Define the conditional outcome kernel on the positive treatment support]
Fix a probability measure $\nu_0$ on $(S,\mathcal S)$. Define
\begin{align*}
K_x:E\times\mathcal S\to[0,1]
\end{align*}
by
\begin{align*}
K_x(l,A)=\frac{H_x(l,A)}{q_x(l)} \quad \text{if } q_x(l)>0
\end{align*}
and
\begin{align*}
K_x(l,A)=\nu_0(A) \quad \text{if } q_x(l)=0.
\end{align*}
For each $A\in\mathcal S$, the map $l\mapsto K_x(l,A)$ is $\mathcal E$-measurable because $H_x(\cdot,A)$ and $q_x$ are measurable and the set $\{l\in E:q_x(l)>0\}$ belongs to $\mathcal E$. For each $l$ with $q_x(l)>0$, $K_x(l,\cdot)$ is a probability measure since $H_x(l,\cdot)$ is a finite measure and $H_x(l,S)=q_x(l)$. For $l$ with $q_x(l)=0$, $K_x(l,\cdot)=\nu_0$ is a probability measure. Hence $K_x$ is a probability kernel from $E$ to $S$.
Moreover, for every $A\in\mathcal S$,
\begin{align*}
H_x(l,A)=K_x(l,A)q_x(l)
\end{align*}
for every $l\in E$. Indeed, if $q_x(l)>0$ this is the definition of $K_x$, and if $q_x(l)=0$ then $0\le H_x(l,A)\le H_x(l,S)=q_x(l)=0$, so both sides vanish. This is exactly the disintegration identity for a regular conditional distribution of $Y$ given the stratum $(X=x,L=l)$, because $H_x(l,A)$ is the [conditional probability](/page/Conditional%20Probability) of $\{Y\in A,X=x\}$ given $L=l$ and $q_x(l)$ is the conditional probability of $\{X=x\}$ given $L=l$.
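As a purely illustrative example with hypothetical numbers (not taken from the text), let $S=\{0,1\}$ and suppose that at some covariate value $l$ we have $H_x(l,\{1\})=0.12$, $H_x(l,\{0\})=0.28$, and therefore $q_x(l)=H_x(l,S)=0.40$. The definition of $K_x$ then gives
\begin{align*}
K_x(l,\{1\})=\frac{0.12}{0.40}=0.3,
\qquad
K_x(l,\{0\})=\frac{0.28}{0.40}=0.7,
\qquad
K_x(l,S)=0.3+0.7=1.
\end{align*}
So $K_x(l,\cdot)$ is the joint sub-probability $H_x(l,\cdot)$ renormalized to total mass one, which is the usual elementary formula for conditioning on $\{X=x\}$ within the stratum $L=l$.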
[/step]
[step:Divide by positivity and integrate over the covariate law]
By positivity, $q_x(l)>0$ for $\mu$-a.e. $l\in E$. On this full $\mu$-measure set, the identities
\begin{align*}
H_x(l,A)=N_x(l,A)q_x(l)
\end{align*}
and
\begin{align*}
H_x(l,A)=K_x(l,A)q_x(l)
\end{align*}
allow division by $q_x(l)$. Therefore, for $\mu$-a.e. $l\in E$ and every $A\in\mathcal S$,
\begin{align*}
K_x(l,A)=N_x(l,A).
\end{align*}
Since $N_x$ is a regular conditional distribution of $Y_x$ given $L$, the [law of total probability](/theorems/1113) for regular conditional distributions gives
\begin{align*}
\mathbb P(Y_x\in A)=\int_E N_x(l,A)\,d\mu(l).
\end{align*}
Substituting $K_x(l,A)=N_x(l,A)$ $\mu$-a.e. yields
\begin{align*}
\mathbb P(Y_x\in A)=\int_E K_x(l,A)\,d\mu(l).
\end{align*}
Since $\mu=\mathbb P_L$, this is
\begin{align*}
\mathbb P(Y_x\in A)=\int_E K_x(l,A)\,d\mathbb P_L(l).
\end{align*}
Finally, because $K_x(L,A)$ is a version of $\mathbb P(Y\in A\mid X=x,L)$ by the disintegration identity $H_x(l,A)=K_x(l,A)q_x(l)$ of the previous step, the last integral is equivalently
\begin{align*}
\int_E K_x(l,A)\,d\mathbb P_L(l)=\mathbb E[K_x(L,A)]=\mathbb E[\mathbb P(Y\in A\mid X=x,L)].
\end{align*}
This proves the asserted identification formula for every $A\in\mathcal S$.
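The whole argument can be traced in a small finite example. All numbers below are hypothetical and chosen only for illustration; the example assumes that consistency and conditional exchangeability hold. Take $x=1$, $E=S=\{0,1\}$, $\mu(\{0\})=\mu(\{1\})=\tfrac12$, and
\begin{align*}
N_1(0,\{1\})=\tfrac15,\quad N_1(1,\{1\})=\tfrac35,\quad q_1(0)=\tfrac12,\quad q_1(1)=\tfrac14.
\end{align*}
The factorization $H_1(l,A)=N_1(l,A)q_1(l)$ and the definition of $K_1$ give
\begin{align*}
H_1(0,\{1\})&=\tfrac15\cdot\tfrac12=\tfrac1{10}, & K_1(0,\{1\})&=\tfrac{1/10}{1/2}=\tfrac15, \\
H_1(1,\{1\})&=\tfrac35\cdot\tfrac14=\tfrac3{20}, & K_1(1,\{1\})&=\tfrac{3/20}{1/4}=\tfrac35.
\end{align*}
Integrating over the covariate law recovers the potential-outcome probability:
\begin{align*}
\int_E K_1(l,\{1\})\,d\mu(l)=\tfrac12\cdot\tfrac15+\tfrac12\cdot\tfrac35=\tfrac25=\int_E N_1(l,\{1\})\,d\mu(l)=\mathbb P(Y_1=1).
\end{align*}
By contrast, the unadjusted conditional probability in this example is
\begin{align*}
\mathbb P(Y=1\mid X=1)=\frac{\tfrac12\cdot\tfrac1{10}+\tfrac12\cdot\tfrac3{20}}{\tfrac12\cdot\tfrac12+\tfrac12\cdot\tfrac14}=\frac{1/8}{3/8}=\tfrac13\neq\tfrac25,
\end{align*}
which illustrates why the integral is taken against $\mathbb P_L$ and not against the conditional law of $L$ given $X=1$.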
[/step]