[proofplan]
We first prove the distributional identity for a fixed Borel set $B$ by comparing three conditional probabilities on the stratum $\{A=a\}$. Conditional exchangeability identifies the conditional law of $Y_a$ given $(A=a,L=l)$ with the conditional law of $Y_a$ given $L=l$, while consistency identifies the conditional law of $Y_a$ with that of $Y$ on the same stratum. Positivity is used to pass from identities holding under the joint law of $(A,L)$ to identities holding for $\mu_L$-almost every $l$ at the point $A=a$. The expectation formula follows by the same argument with conditional expectations instead of indicators.
[/proofplan]
[step:Fix regular conditional versions and the positive treatment atom]
Fix $B\in\mathcal B(\mathbb R)$. Define the indicator random variables
\begin{align*}
I_a:\Omega\to\{0,1\},\qquad I_a(\omega)=\mathbb 1_{\{Y_a\in B\}}(\omega),
\end{align*}
and
\begin{align*}
I:\Omega\to\{0,1\},\qquad I(\omega)=\mathbb 1_{\{Y\in B\}}(\omega).
\end{align*}
Let $p_a:\mathcal L\to[0,1]$ be a measurable version of $\mathbb P(A=a\mid L=l)$ satisfying $p_a(l)>0$ for $\mu_L$-almost every $l$.
Because $\mathcal T$, $\mathcal L$, and $\mathbb R$ are standard Borel spaces, regular conditional laws exist. Choose measurable kernels
\begin{align*}
K_a:\mathcal L\times\mathcal B(\mathbb R)\to[0,1]
\end{align*}
and
\begin{align*}
H_a:(\mathcal T\times\mathcal L)\times\mathcal B(\mathbb R)\to[0,1]
\end{align*}
such that
\begin{align*}
K_a(l,B)=\mathbb P(Y_a\in B\mid L=l)
\end{align*}
for $\mu_L$-almost every $l$, and
\begin{align*}
H_a(t,l,B)=\mathbb P(Y_a\in B\mid A=t,L=l)
\end{align*}
for almost every $(t,l)$ with respect to the joint law of $(A,L)$. Also let
\begin{align*}
H:(\mathcal T\times\mathcal L)\times\mathcal B(\mathbb R)\to[0,1]
\end{align*}
be an arbitrary regular conditional version of the law of $Y$ given $(A,L)$.
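When $\mathcal T$ and $\mathcal L$ are finite, a regular conditional version given $(A,L)$ reduces to a table of ratios $\mathbb P(Y\in B,A=t,L=l)/\mathbb P(A=t,L=l)$, and the table is determined only on cells of positive mass. The Python sketch below assumes this finite setting; the function `conditional_kernel` and the toy joint law are hypothetical illustrations, not objects from the theorem. Cells of zero mass are reported as `None`, since any value in $[0,1]$ gives an equally valid version there.

```python
from fractions import Fraction as F


def conditional_kernel(joint, cond, event):
    """Finite-space version of c -> P(event | cond = c) (hypothetical helper).

    joint: dict mapping sample points w to their probabilities
    cond:  function giving the value of the conditioning variable at w
    event: function returning True when w lies in the event {Y in B}
    Cells of zero mass map to None: any value in [0, 1] is a valid version there.
    """
    mass, hit = {}, {}
    for w, p in joint.items():
        c = cond(w)
        mass[c] = mass.get(c, F(0)) + p
        if event(w):
            hit[c] = hit.get(c, F(0)) + p
    return {c: (hit.get(c, F(0)) / m if m > 0 else None) for c, m in mass.items()}


# Hypothetical toy joint law of (A, L, Y), keyed by (t, l, y);
# the cell {A = 1, L = 0} has probability zero.
joint = {
    (0, 0, 0): F(1, 4), (0, 0, 1): F(1, 4),
    (1, 0, 1): F(0),
    (0, 1, 1): F(1, 8), (1, 1, 0): F(1, 8), (1, 1, 1): F(1, 4),
}
# H[(t, l)] plays the role of H(t, l, B) with B = {1}.
H = conditional_kernel(joint, cond=lambda w: (w[0], w[1]), event=lambda w: w[2] == 1)
print(H)
```

Here `H[(0, 0)]` is $1/2$, `H[(1, 1)]` is $2/3$, and `H[(1, 0)]` is `None` because the cell $\{A=1,L=0\}$ is null.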
[guided]
We fix the Borel set $B$ because the distributional formula is asserted separately for each Borel set $B$. The two random variables
\begin{align*}
I_a:\Omega\to\{0,1\},\qquad I_a(\omega)=\mathbb 1_{\{Y_a\in B\}}(\omega),
\end{align*}
and
\begin{align*}
I:\Omega\to\{0,1\},\qquad I(\omega)=\mathbb 1_{\{Y\in B\}}(\omega)
\end{align*}
are measurable because $Y_a$ and $Y$ are measurable maps into $(\mathbb R,\mathcal B(\mathbb R))$ and $B$ is Borel.
The positivity hypothesis gives a measurable version
\begin{align*}
p_a:\mathcal L\to[0,1]
\end{align*}
of $\mathbb P(A=a\mid L=l)$ with $p_a(l)>0$ for $\mu_L$-almost every $l$. This function is the conditional probability mass of the point treatment value $a$ inside the covariate stratum $L=l$.
Since $Y_a$ and $Y$ take values in the standard Borel space $\mathbb R$, regular conditional laws of these variables given $L$ or given $(A,L)$ exist. We choose
\begin{align*}
K_a:\mathcal L\times\mathcal B(\mathbb R)\to[0,1]
\end{align*}
so that $K_a(l,B)$ is a version of $\mathbb P(Y_a\in B\mid L=l)$, and choose
\begin{align*}
H_a:(\mathcal T\times\mathcal L)\times\mathcal B(\mathbb R)\to[0,1]
\end{align*}
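so that $H_a(t,l,B)$ is a version of $\mathbb P(Y_a\in B\mid A=t,L=l)$. Finally, $H$ denotes the arbitrary regular conditional version of the law of $Y$ given $(A,L)$ appearing in the theorem. Naming these kernels matters because conditional laws are determined only up to null sets: the values of $H_a$ and $H$ on the slice $\{a\}\times\mathcal L$ are a priori arbitrary on any set of covariate values where $A=a$ has zero conditional probability. The proof must show that positivity makes the values at $(a,l)$ well determined for $\mu_L$-almost every $l$.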
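The role of positivity can be seen in a small example, chosen here purely for illustration and not part of the theorem. Take $\mathcal T=\mathcal L=\{0,1\}$, let $L$ be uniform on $\{0,1\}$, set $A=L$, and take $a=1$ and $B=\{1\}$, so that $p_1(0)=0$ and positivity fails. Because $\{A=1,L=0\}$ is a null event, redefining $H(1,0,\cdot)$ as either $\delta_0$ or $\delta_1$ still produces a regular conditional version of the law of $Y$ given $(A,L)$, while $H(1,1,B)$ is pinned down because $\mathbb P(A=1,L=1)=\tfrac12>0$. The candidate right-hand side then satisfies
\begin{align*}
\int_{\mathcal L}H(1,l,B)\,d\mu_L(l)=\tfrac12 H(1,0,B)+\tfrac12 H(1,1,B)\in\bigl\{\tfrac12 H(1,1,B),\ \tfrac12+\tfrac12 H(1,1,B)\bigr\},
\end{align*}
and the two versions give different values. Without positivity, the integral is therefore not determined by the law of $(A,L,Y)$.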
[/guided]
[/step]
[step:Use conditional exchangeability to identify the potential-outcome kernel on the stratum $A=a$]
We claim that
\begin{align*}
H_a(a,l,B)=K_a(l,B)
\end{align*}
for $\mu_L$-almost every $l$.
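Let $D\in\mathcal G$. Since $H_a$ is a regular conditional version of the law of $Y_a$ given $(A,L)$, conditioning first on $(A,L)$ and then on $L$ gives
\begin{align*}
\mathbb E[I_a\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]
&=\mathbb E[H_a(a,L,B)\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]\\
&=\mathbb E[H_a(a,L,B)\,p_a(L)\,\mathbb 1_{\{L\in D\}}]\\
&=\int_D H_a(a,l,B)p_a(l)\,d\mu_L(l).
\end{align*}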
Conditional exchangeability $Y_a\perp\!\!\!\perp A\mid L$ gives
\begin{align*}
\mathbb E[I_a\mathbb 1_{\{A=a\}}\mid L]=\mathbb E[I_a\mid L]\mathbb P(A=a\mid L)
\end{align*}
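almost surely. Multiplying by $\mathbb 1_{\{L\in D\}}$ and taking expectations, with $K_a(L,B)$ and $p_a(L)$ as versions of the two conditional probabilities on the right, gives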
\begin{align*}
\mathbb E[I_a\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]=\int_D K_a(l,B)p_a(l)\,d\mu_L(l).
\end{align*}
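Both displays compute the same expectation, so the bounded measurable functions $l\mapsto H_a(a,l,B)p_a(l)$ and $l\mapsto K_a(l,B)p_a(l)$ have equal integrals over every $D\in\mathcal G$. Hence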
\begin{align*}
H_a(a,l,B)p_a(l)=K_a(l,B)p_a(l)
\end{align*}
for $\mu_L$-almost every $l$. Positivity gives $p_a(l)>0$ for $\mu_L$-almost every $l$, so cancellation yields
\begin{align*}
H_a(a,l,B)=K_a(l,B)
\end{align*}
for $\mu_L$-almost every $l$.
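The cancellation argument can be checked on a finite model, where conditional probabilities are elementary ratios. The Python sketch below assumes a hypothetical joint law (the names `mu_L`, `prop`, `q`, `points`, and `prob`, and all numbers, are illustrative choices) in which $A$ is drawn independently of $(Y_0,Y_1)$ given $L$ with $0<\mathbb P(A=1\mid L=l)<1$, and compares $K_a(l,B)$ with $H_a(a,l,B)$ for $a=1$ and $B=\{1\}$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model (illustrative numbers): L, A in {0, 1},
# potential outcomes (Y0, Y1) in {0, 1}^2.
mu_L = {0: F(2, 5), 1: F(3, 5)}      # marginal law of L
prop = {0: F(1, 4), 1: F(2, 3)}      # P(A = 1 | L = l), strictly inside (0, 1)
q = {                                # joint law of (Y0, Y1) given L = l
    0: {(0, 0): F(1, 2), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(1, 5)},
    1: {(0, 0): F(1, 10), (0, 1): F(1, 2), (1, 0): F(1, 10), (1, 1): F(3, 10)},
}


def points():
    """Yield (prob, l, t, y0, y1); A is drawn independently of (Y0, Y1) given L."""
    for l, t in product((0, 1), (0, 1)):
        p_t = prop[l] if t == 1 else 1 - prop[l]
        for (y0, y1), p_y in q[l].items():
            yield mu_L[l] * p_t * p_y, l, t, y0, y1


def prob(event, given):
    """Elementary conditional probability P(event | given) on the finite space."""
    num = sum(p for p, *w in points() if given(*w) and event(*w))
    den = sum(p for p, *w in points() if given(*w))
    return num / den


a = 1  # treatment value of interest; B = {1}


def in_B(l, t, y0, y1):
    """Indicator of the event {Y_a in B}."""
    return (y1 if a == 1 else y0) == 1


K = {lv: prob(in_B, lambda l, t, y0, y1: l == lv) for lv in (0, 1)}               # K_a(l, B)
H_a = {lv: prob(in_B, lambda l, t, y0, y1: l == lv and t == a) for lv in (0, 1)}  # H_a(a, l, B)
print(K == H_a)
```

Both dictionaries equal $\{0: 3/10,\ 1: 4/5\}$; conditioning additionally on $A=a$ does not change the law of $Y_a$ within a stratum, which is the content of the identity just proved.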
[/step]
[step:Use consistency to replace the potential outcome by the observed outcome on $A=a$]
We claim that
\begin{align*}
H(a,l,B)=H_a(a,l,B)
\end{align*}
for $\mu_L$-almost every $l$.
Let $D\in\mathcal G$. Consistency gives $I=I_a$ a.s. on $\{A=a\}$, hence
\begin{align*}
\mathbb E[I\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}]=\mathbb E[I_a\mathbb 1_{\{A=a\}}\mathbb 1_{\{L\in D\}}].
\end{align*}
Using the regular conditional kernels $H$ and $H_a$, this becomes
\begin{align*}
\int_D H(a,l,B)p_a(l)\,d\mu_L(l)=\int_D H_a(a,l,B)p_a(l)\,d\mu_L(l).
\end{align*}
Since $D\in\mathcal G$ was arbitrary,
\begin{align*}
H(a,l,B)p_a(l)=H_a(a,l,B)p_a(l)
\end{align*}
for $\mu_L$-almost every $l$. Positivity again permits cancellation of $p_a(l)$, and therefore
\begin{align*}
H(a,l,B)=H_a(a,l,B)
\end{align*}
for $\mu_L$-almost every $l$.
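The same kind of finite check illustrates consistency. In the hypothetical model below (names and numbers are illustrative assumptions), the observed outcome is built as $Y=Y_A$, so on $\{A=1\}$ the events $\{Y\in B\}$ and $\{Y_1\in B\}$ coincide and the two kernels agree; on $\{A=0\}$ the observed outcome is $Y_0$, and the kernels need not agree.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model (illustrative numbers), now with an observed outcome.
mu_L = {0: F(2, 5), 1: F(3, 5)}      # marginal law of L
prop = {0: F(1, 4), 1: F(2, 3)}      # P(A = 1 | L = l)
q = {                                # joint law of (Y0, Y1) given L = l
    0: {(0, 0): F(1, 2), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(1, 5)},
    1: {(0, 0): F(1, 10), (0, 1): F(1, 2), (1, 0): F(1, 10), (1, 1): F(3, 10)},
}


def points():
    """Yield (prob, l, t, y0, y1, y) with y = Y_t, so consistency holds by construction."""
    for l, t in product((0, 1), (0, 1)):
        p_t = prop[l] if t == 1 else 1 - prop[l]
        for (y0, y1), p_y in q[l].items():
            yield mu_L[l] * p_t * p_y, l, t, y0, y1, (y1 if t == 1 else y0)


def prob(event, given):
    """Elementary conditional probability P(event | given) on the finite space."""
    num = sum(p for p, *w in points() if given(*w) and event(*w))
    den = sum(p for p, *w in points() if given(*w))
    return num / den


def observed_in_B(l, t, y0, y1, y):
    return y == 1                    # event {Y in B}, B = {1}


def potential_in_B(l, t, y0, y1, y):
    return y1 == 1                   # event {Y_1 in B}


def stratum(arm, lv):
    """Conditioning event {A = arm, L = lv}."""
    return lambda l, t, y0, y1, y: t == arm and l == lv


for lv in (0, 1):
    # On {A = 1} the two events coincide, so H(1, l, B) = H_1(1, l, B).
    assert prob(observed_in_B, stratum(1, lv)) == prob(potential_in_B, stratum(1, lv))
    # On {A = 0} the observed outcome is Y_0, so the kernels need not agree.
    print(lv, prob(observed_in_B, stratum(0, lv)), prob(potential_in_B, stratum(0, lv)))
```

On the stratum $A=0$ the printed pairs are $(2/5,\,3/10)$ for $l=0$ and $(2/5,\,4/5)$ for $l=1$, which shows that consistency links $Y$ to $Y_a$ only where $A=a$.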
[/step]
[step:Integrate over the marginal law of $L$ to obtain the distributional formula]
By the tower property for conditional expectation,
\begin{align*}
\mathbb P(Y_a\in B)=\mathbb E[I_a]=\int_{\mathcal L}K_a(l,B)\,d\mu_L(l).
\end{align*}
The preceding two steps give
\begin{align*}
K_a(l,B)=H(a,l,B)
\end{align*}
for $\mu_L$-almost every $l$. Substitution into the preceding integral gives
\begin{align*}
\mathbb P(Y_a\in B)=\int_{\mathcal L}H(a,l,B)\,d\mu_L(l).
\end{align*}
Since $H(a,l,B)$ is the chosen regular conditional version of $\mathbb P(Y\in B\mid A=a,L=l)$, this is exactly
\begin{align*}
\mathbb P(Y_a\in B)=\int_{\mathcal L}\mathbb P(Y\in B\mid A=a,L=l)\,d\mu_L(l).
\end{align*}
The argument used only that $H$ was a regular conditional version of the law of $Y$ given $(A,L)$. Hence, under positivity, every such version yields the same value of the integral.
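As a numerical sanity check of the distributional formula, the Python sketch below evaluates both sides on a hypothetical finite model (names and numbers are illustrative assumptions) in which exchangeability, consistency, and positivity hold by construction and $L$ affects both treatment and outcome. It also computes the unadjusted quantity $\mathbb P(Y\in B\mid A=a)$ for contrast.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model (illustrative numbers); L confounds A and the outcomes.
mu_L = {0: F(2, 5), 1: F(3, 5)}      # marginal law of L
prop = {0: F(1, 4), 1: F(2, 3)}      # P(A = 1 | L = l), strictly inside (0, 1)
q = {                                # joint law of (Y0, Y1) given L = l
    0: {(0, 0): F(1, 2), (0, 1): F(1, 10), (1, 0): F(1, 5), (1, 1): F(1, 5)},
    1: {(0, 0): F(1, 10), (0, 1): F(1, 2), (1, 0): F(1, 10), (1, 1): F(3, 10)},
}


def points():
    """Yield (prob, l, t, y0, y1, y): exchangeability and consistency hold by construction."""
    for l, t in product((0, 1), (0, 1)):
        p_t = prop[l] if t == 1 else 1 - prop[l]
        for (y0, y1), p_y in q[l].items():
            yield mu_L[l] * p_t * p_y, l, t, y0, y1, (y1 if t == 1 else y0)


def prob(event, given=lambda *w: True):
    """Elementary conditional probability P(event | given) on the finite space."""
    num = sum(p for p, *w in points() if given(*w) and event(*w))
    den = sum(p for p, *w in points() if given(*w))
    return num / den


a = 1                                # treatment value; B = {1}
# Left-hand side P(Y_a in B), read off the potential outcomes directly.
lhs = prob(lambda l, t, y0, y1, y: y1 == 1)
# Right-hand side: integrate H(a, l, B) = P(Y in B | A = a, L = l) against mu_L.
H = {lv: prob(lambda l, t, y0, y1, y: y == 1, lambda l, t, y0, y1, y: t == a and l == lv)
     for lv in (0, 1)}
rhs = sum(H[lv] * mu_L[lv] for lv in (0, 1))
# Unadjusted P(Y in B | A = a), which ignores the confounder L.
naive = prob(lambda l, t, y0, y1, y: y == 1, lambda l, t, y0, y1, y: t == a)
print(lhs, rhs, naive)
```

With these numbers `lhs` and `rhs` both equal $3/5$, while `naive` equals $7/10$; in this example it is the averaging over $\mu_L$, rather than conditioning on $A=a$ alone, that recovers $\mathbb P(Y_a\in B)$.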
[/step]
[step:Repeat the argument with integrable outcomes to identify the mean]
Assume now that $Y$ and $Y_a$ are integrable. Let
\begin{align*}
m_a:\mathcal L\to\mathbb R
\end{align*}
be a version of $\mathbb E[Y_a\mid L=l]$, let
\begin{align*}
r_a:\mathcal T\times\mathcal L\to\mathbb R
\end{align*}
be a version of $\mathbb E[Y_a\mid A=t,L=l]$, and let
\begin{align*}
r:\mathcal T\times\mathcal L\to\mathbb R
\end{align*}
be a version of $\mathbb E[Y\mid A=t,L=l]$.
Conditional exchangeability gives
\begin{align*}
\mathbb E[Y_a\mathbb 1_{\{A=a\}}\mid L]=\mathbb E[Y_a\mid L]\mathbb P(A=a\mid L)
\end{align*}
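almost surely. Multiplying by $\mathbb 1_{\{L\in D\}}$, taking expectations, and evaluating the left-hand side by conditioning first on $(A,L)$ as in the distributional case gives, for every $D\in\mathcal G$,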
\begin{align*}
\int_D r_a(a,l)p_a(l)\,d\mu_L(l)=\int_D m_a(l)p_a(l)\,d\mu_L(l).
\end{align*}
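Both integrands are $\mu_L$-integrable because $Y_a$ is integrable, so they agree $\mu_L$-almost everywhere, and cancelling $p_a(l)>0$ by positivity gives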
\begin{align*}
r_a(a,l)=m_a(l)
\end{align*}
for $\mu_L$-almost every $l$.
Consistency gives $Y=Y_a$ a.s. on $\{A=a\}$, and hence, for every $D\in\mathcal G$,
\begin{align*}
\int_D r(a,l)p_a(l)\,d\mu_L(l)=\int_D r_a(a,l)p_a(l)\,d\mu_L(l).
\end{align*}
Again positivity yields
\begin{align*}
r(a,l)=r_a(a,l)
\end{align*}
for $\mu_L$-almost every $l$. Combining the two identities gives
\begin{align*}
r(a,l)=m_a(l)
\end{align*}
for $\mu_L$-almost every $l$.
Finally, the tower property gives
\begin{align*}
\mathbb E[Y_a]=\int_{\mathcal L}m_a(l)\,d\mu_L(l).
\end{align*}
Substituting $m_a(l)=r(a,l)$ $\mu_L$-almost everywhere gives
\begin{align*}
\mathbb E[Y_a]=\int_{\mathcal L}r(a,l)\,d\mu_L(l).
\end{align*}
Equivalently,
\begin{align*}
\mathbb E[Y_a]=\mathbb E[\mathbb E[Y\mid A=a,L]].
\end{align*}
This completes the proof.
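The mean formula can be checked in the same way. The Python sketch below assumes a hypothetical finite model with integer-valued potential outcomes (names and numbers are illustrative) and compares $\mathbb E[Y_1]$, the integral $\int_{\mathcal L}r(1,l)\,d\mu_L(l)$, and the unadjusted mean $\mathbb E[Y\mid A=1]$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model with integer-valued potential outcomes (illustrative numbers).
mu_L = {0: F(2, 5), 1: F(3, 5)}      # marginal law of L
prop = {0: F(1, 4), 1: F(2, 3)}      # P(A = 1 | L = l), strictly inside (0, 1)
q = {                                # joint law of (Y0, Y1) given L = l
    0: {(1, 3): F(1, 2), (0, 5): F(1, 2)},
    1: {(2, 2): F(1, 4), (4, 7): F(3, 4)},
}


def points():
    """Yield (prob, l, t, y0, y1, y): exchangeability and consistency hold by construction."""
    for l, t in product((0, 1), (0, 1)):
        p_t = prop[l] if t == 1 else 1 - prop[l]
        for (y0, y1), p_y in q[l].items():
            yield mu_L[l] * p_t * p_y, l, t, y0, y1, (y1 if t == 1 else y0)


def expect(f, given=lambda *w: True):
    """Elementary conditional expectation E[f | given] on the finite space."""
    num = sum(p * f(*w) for p, *w in points() if given(*w))
    den = sum(p for p, *w in points() if given(*w))
    return num / den


mean_Y1 = expect(lambda l, t, y0, y1, y: y1)                 # E[Y_1]
r = {lv: expect(lambda l, t, y0, y1, y: y, lambda l, t, y0, y1, y: t == 1 and l == lv)
     for lv in (0, 1)}                                        # r(1, l) = E[Y | A = 1, L = l]
g_formula = sum(r[lv] * mu_L[lv] for lv in (0, 1))            # integral of r(1, l) d mu_L(l)
naive = expect(lambda l, t, y0, y1, y: y, lambda l, t, y0, y1, y: t == 1)  # E[Y | A = 1]
print(mean_Y1, g_formula, naive)
```

Here `mean_Y1` and `g_formula` both equal $101/20$, while the unadjusted mean `naive` equals $27/5$.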
[/step]