[proofplan]
We first make the conditional mean on each treatment stratum precise as a $\sigma(X)$-measurable [random variable](/page/Random%20Variable). Consistency replaces the observed outcome $Y$ by the corresponding potential outcome $Y(t)$ on the event $\{T=t\}$. Conditional exchangeability then removes the treatment indicator from the conditional mean, and positivity lets us cancel the conditional treatment probability. Applying this for $t=1$ and $t=0$, subtracting, and using the defining property of [conditional expectation](/page/Conditional%20Expectation) gives the average treatment effect.
[/proofplan]
[step:Define the conditional treatment probabilities and potential-outcome regressions]
For each $t\in\{0,1\}$, define the conditional treatment probability
\begin{align*}
\pi_t:=\mathbb P(T=t\mid\sigma(X)).
\end{align*}
Thus $\pi_t:\Omega\to[0,1]$ is a $\sigma(X)$-measurable random variable satisfying
\begin{align*}
\int_B \pi_t\,d\mathbb P
=
\mathbb P(B\cap\{T=t\})
\end{align*}
for every $B\in\sigma(X)$. By positivity, $\pi_t>0$ $\mathbb P$-a.s. for each $t\in\{0,1\}$.
For each $t\in\{0,1\}$, define
\begin{align*}
m_t:=\mathbb E[Y(t)\mid\sigma(X)].
\end{align*}
Since $Y(t)\in L^1(\Omega,\mathcal F,\mathbb P)$, the conditional expectation $m_t$ exists, is integrable, and is $\sigma(X)$-measurable.
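The definitions of $\pi_t$ and $m_t$ can be made concrete on a finite probability space, where conditioning on $\sigma(X)$ reduces to dividing by $\mathbb P(X=x)$. The following is a minimal sketch, not part of the proof: the toy distributions `P_X`, `P_T1_GIVEN_X`, `P_Y_GIVEN_X` and the helper names are illustrative assumptions. It uses exact rational arithmetic to check the defining identity of $\pi_t$ for every $B\in\sigma(X)$.

```python
from fractions import Fraction as F

# Hypothetical toy model: every number below is an illustrative assumption.
P_X = {0: F(2, 5), 1: F(3, 5)}            # law of the covariate X
P_T1_GIVEN_X = {0: F(1, 4), 1: F(2, 3)}   # P(T=1 | X=x), strictly inside (0, 1)
P_Y_GIVEN_X = {                            # law of (Y(0), Y(1)) given X=x
    0: {(0, 1): F(1, 2), (1, 3): F(1, 2)},
    1: {(2, 2): F(1, 3), (1, 4): F(2, 3)},
}


def build_space():
    """Map each outcome (x, t, y0, y1) to its probability.

    T and (Y(0), Y(1)) are drawn independently given X, so conditional
    exchangeability holds by construction.
    """
    space = {}
    for x, px in P_X.items():
        for t in (0, 1):
            pt = P_T1_GIVEN_X[x] if t == 1 else 1 - P_T1_GIVEN_X[x]
            for (y0, y1), py in P_Y_GIVEN_X[x].items():
                space[(x, t, y0, y1)] = px * pt * py
    return space


def pi(space, t, x):
    """Value of pi_t on {X=x}: P(T=t, X=x) / P(X=x)."""
    joint = sum(p for (xx, tt, _, _), p in space.items() if xx == x and tt == t)
    marginal = sum(p for (xx, _, _, _), p in space.items() if xx == x)
    return joint / marginal


def m(space, t, x):
    """Value of m_t on {X=x}: E[Y(t) 1{X=x}] / P(X=x)."""
    total = sum(p * (y0, y1)[t] for (xx, _, y0, y1), p in space.items() if xx == x)
    marginal = sum(p for (xx, _, _, _), p in space.items() if xx == x)
    return total / marginal


SPACE = build_space()
# sigma(X) consists of the events {X in S} for S a subset of {0, 1}.
SIGMA_X = [set(), {0}, {1}, {0, 1}]


def pi_identity_holds(t):
    """Check that the integral of pi_t over B equals P(B and {T=t}) for all B."""
    for S in SIGMA_X:
        lhs = sum(pi(SPACE, t, x) * p for (x, _, _, _), p in SPACE.items() if x in S)
        rhs = sum(p for (x, tt, _, _), p in SPACE.items() if x in S and tt == t)
        if lhs != rhs:
            return False
    return True


print(all(pi_identity_holds(t) for t in (0, 1)))
print({(t, x): m(SPACE, t, x) for t in (0, 1) for x in (0, 1)})
```

Because every conditional treatment probability in this toy model lies strictly between 0 and 1, positivity holds and `pi` never divides by a zero stratum.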
[/step]
[step:Show that the observed stratum mean equals the potential-outcome regression]
Fix $t\in\{0,1\}$. We prove that $m_t$ is a valid version of $\mathbb E[Y\mid T=t,X]$ in the sense of the statement.
Let $B\in\sigma(X)$. By the defining property of conditional expectation and the fact that $\mathbb 1_B\pi_t$ is bounded and $\sigma(X)$-measurable,
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P
=
\int_\Omega \mathbb 1_B\pi_tY(t)\,d\mathbb P.
\end{align*}
Since $\mathbb 1_{\{T=t\}}$ is bounded and $Y(t)$ is integrable, conditional independence of $Y(t)$ and $T$ given $\sigma(X)$ (conditional exchangeability) implies
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]
=
\pi_t m_t.
\end{align*}
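Integrating over $B$ and applying the defining property of conditional expectation to $\mathbb 1_{\{T=t\}}Y(t)$ gives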
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P
=
\int_B \mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y(t)\,d\mathbb P.
\end{align*}
By consistency, $Y=Y(t)$ $\mathbb P$-a.s. on $\{T=t\}$, so
\begin{align*}
\int_{B\cap\{T=t\}}Y(t)\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
Combining these identities gives
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
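Thus $m_t$ satisfies the defining identity for $\mathbb E[Y\mid T=t,X]$. Since $\pi_t>0$ $\mathbb P$-a.s., this identity determines a $\sigma(X)$-measurable regression uniquely up to $\mathbb P$-a.s. equality, so $\mathbb E[Y\mid T=t,X]=m_t$ $\mathbb P$-a.s.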
[guided]
Fix $t\in\{0,1\}$. The goal is to identify the observed regression in the treatment stratum $\{T=t\}$. Because conditioning on the event $\{T=t\}$ together with $X$ can be undefined on covariate strata where treatment level $t$ has conditional probability zero, the statement defines $\mathbb E[Y\mid T=t,X]$ through the weighted identity
\begin{align*}
\int_B r_t\pi_t\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y\,d\mathbb P
\end{align*}
for every $B\in\sigma(X)$, where $\pi_t=\mathbb P(T=t\mid\sigma(X))$. Positivity gives $\pi_t>0$ $\mathbb P$-a.s., so this identity determines the $\sigma(X)$-measurable regression uniquely up to $\mathbb P$-a.s. equality.
We claim that the potential-outcome regression
\begin{align*}
m_t:=\mathbb E[Y(t)\mid\sigma(X)]
\end{align*}
is such an $r_t$. Let $B\in\sigma(X)$. Since $m_t$ is the conditional expectation of $Y(t)$ given $\sigma(X)$, and since $\mathbb 1_B\pi_t$ is a bounded $\sigma(X)$-measurable random variable, the defining property of conditional expectation gives
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P
=
\int_\Omega \mathbb 1_B\pi_tY(t)\,d\mathbb P.
\end{align*}
Now we use conditional exchangeability (also called conditional ignorability): $Y(t)$ and the treatment indicator $T$ are conditionally independent given $\sigma(X)$. Since $\mathbb 1_{\{T=t\}}$ is bounded and $Y(t)$ is integrable, the conditional expectation of the product $\mathbb 1_{\{T=t\}}Y(t)$ factors into the product of the conditional expectations:
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]
=
\mathbb E[\mathbb 1_{\{T=t\}}\mid\sigma(X)]\,\mathbb E[Y(t)\mid\sigma(X)].
\end{align*}
By the definitions of $\pi_t$ and $m_t$, this becomes
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]
=
\pi_t m_t.
\end{align*}
Integrating this identity over $B\in\sigma(X)$ and using the defining property of conditional expectation yields
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P
=
\int_B \mathbb E[\mathbb 1_{\{T=t\}}Y(t)\mid\sigma(X)]\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y(t)\,d\mathbb P.
\end{align*}
Finally, consistency converts the potential outcome into the observed outcome on the treatment stratum. Since $Y=Y(t)$ $\mathbb P$-a.s. on $\{T=t\}$, we have
\begin{align*}
\int_{B\cap\{T=t\}}Y(t)\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
Therefore
\begin{align*}
\int_B m_t\pi_t\,d\mathbb P
=
\int_{B\cap\{T=t\}}Y\,d\mathbb P.
\end{align*}
This is exactly the defining identity for $\mathbb E[Y\mid T=t,X]$, so the observed stratum mean equals $\mathbb E[Y(t)\mid\sigma(X)]$.
[/guided]
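The weighted identity can be checked exactly on a small finite model. The sketch below is illustrative only: the distributions and the helper names (`expect`, `observed_y`, `weighted_identity_holds`) are assumptions, not part of the statement. Treatment and potential outcomes are sampled independently given $X$, the observed outcome is set to $Y=Y(T)$, and both sides of the identity are compared for each $B\in\sigma(X)$.

```python
from fractions import Fraction as F

# Hypothetical toy model: every number below is an illustrative assumption.
P_X = {0: F(2, 5), 1: F(3, 5)}            # law of X
P_T1_GIVEN_X = {0: F(1, 4), 1: F(2, 3)}   # P(T=1 | X=x), positivity holds
P_Y_GIVEN_X = {                            # law of (Y(0), Y(1)) given X=x
    0: {(0, 1): F(1, 2), (1, 3): F(1, 2)},
    1: {(2, 2): F(1, 3), (1, 4): F(2, 3)},
}

# Outcomes are tuples (x, t, y0, y1); T is independent of (Y(0), Y(1)) given X.
SPACE = {}
for x, px in P_X.items():
    for t in (0, 1):
        pt = P_T1_GIVEN_X[x] if t == 1 else 1 - P_T1_GIVEN_X[x]
        for (y0, y1), py in P_Y_GIVEN_X[x].items():
            SPACE[(x, t, y0, y1)] = px * pt * py

SIGMA_X = [set(), {0}, {1}, {0, 1}]  # events {X in S}


def expect(f):
    """Expectation of f(x, t, y0, y1) under the toy model."""
    return sum(p * f(x, t, y0, y1) for (x, t, y0, y1), p in SPACE.items())


def observed_y(t, y0, y1):
    """Consistency: the observed outcome is Y = Y(T)."""
    return y1 if t == 1 else y0


def pi(t, x):
    """pi_t on {X=x}."""
    joint = expect(lambda xx, tt, y0, y1: int(xx == x and tt == t))
    return joint / expect(lambda xx, tt, y0, y1: int(xx == x))


def m(t, x):
    """m_t = E[Y(t) | X] evaluated on {X=x}."""
    total = expect(lambda xx, tt, y0, y1: (y0, y1)[t] * int(xx == x))
    return total / expect(lambda xx, tt, y0, y1: int(xx == x))


def weighted_identity_holds(t):
    """Compare the integral of m_t * pi_t over B with that of Y over B and {T=t}."""
    for S in SIGMA_X:
        lhs = expect(lambda x, tt, y0, y1: m(t, x) * pi(t, x) * int(x in S))
        rhs = expect(
            lambda x, tt, y0, y1: observed_y(tt, y0, y1) * int(x in S and tt == t)
        )
        if lhs != rhs:
            return False
    return True


print(all(weighted_identity_holds(t) for t in (0, 1)))
```

The check uses only the potential outcome $Y(t)$ on the left and only the observed outcome $Y$ on the right, mirroring how exchangeability and consistency enter the argument.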
[/step]
[step:Apply the stratum identity for treated and control units]
From the previous step, for $t=1$,
\begin{align*}
\mathbb E[Y\mid T=1,X]
=
\mathbb E[Y(1)\mid\sigma(X)]
\end{align*}
up to $\mathbb P$-a.s. equality. For $t=0$,
\begin{align*}
\mathbb E[Y\mid T=0,X]
=
\mathbb E[Y(0)\mid\sigma(X)]
\end{align*}
up to $\mathbb P$-a.s. equality. Since $Y(1)$ and $Y(0)$ are integrable, both conditional expectations are integrable, and linearity of conditional expectation gives
\begin{align*}
\mathbb E[Y(1)\mid\sigma(X)]-\mathbb E[Y(0)\mid\sigma(X)]
=
\mathbb E[Y(1)-Y(0)\mid\sigma(X)].
\end{align*}
Therefore
\begin{align*}
\mathbb E[Y\mid T=1,X]-\mathbb E[Y\mid T=0,X]
=
\mathbb E[Y(1)-Y(0)\mid\sigma(X)]
\end{align*}
$\mathbb P$-a.s.
[/step]
[step:Take expectations to recover the average treatment effect]
Taking expectations in the almost-sure identity from the previous step and using the defining property of conditional expectation with the set $\Omega\in\sigma(X)$, we obtain
\begin{align*}
\mathbb E\big[\mathbb E[Y\mid T=1,X]-\mathbb E[Y\mid T=0,X]\big]
=
\mathbb E\big[\mathbb E[Y(1)-Y(0)\mid\sigma(X)]\big]
=
\mathbb E[Y(1)-Y(0)].
\end{align*}
By definition,
\begin{align*}
\operatorname{ATE}:=\mathbb E[Y(1)-Y(0)].
\end{align*}
Hence
\begin{align*}
\operatorname{ATE}
=
\mathbb E\big[\mathbb E[Y\mid T=1,X]-\mathbb E[Y\mid T=0,X]\big].
\end{align*}
This is the claimed identification formula.
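As a sanity check on the identification formula, the following sketch computes both sides exactly on a finite toy model. The distributions and function names are illustrative assumptions. The right-hand side is computed from the law of the observed triple $(X,T,Y)$ alone, while the left-hand side uses the potential outcomes directly.

```python
from fractions import Fraction as F

# Hypothetical toy model: every number below is an illustrative assumption.
P_X = {0: F(2, 5), 1: F(3, 5)}            # law of X
P_T1_GIVEN_X = {0: F(1, 4), 1: F(2, 3)}   # P(T=1 | X=x), positivity holds
P_Y_GIVEN_X = {                            # law of (Y(0), Y(1)) given X=x
    0: {(0, 1): F(1, 2), (1, 3): F(1, 2)},
    1: {(2, 2): F(1, 3), (1, 4): F(2, 3)},
}

# Full (unobservable) space of outcomes (x, t, y0, y1).
SPACE = {}
for x, px in P_X.items():
    for t in (0, 1):
        pt = P_T1_GIVEN_X[x] if t == 1 else 1 - P_T1_GIVEN_X[x]
        for (y0, y1), py in P_Y_GIVEN_X[x].items():
            SPACE[(x, t, y0, y1)] = px * pt * py


def observed_law(space):
    """Law of the observed triple (X, T, Y), with Y = Y(T) by consistency."""
    law = {}
    for (x, t, y0, y1), p in space.items():
        y = y1 if t == 1 else y0
        law[(x, t, y)] = law.get((x, t, y), 0) + p
    return law


def stratum_mean(law, t, x):
    """E[Y | T=t, X=x]; well defined because P(T=t, X=x) > 0 here."""
    num = sum(p * y for (xx, tt, y), p in law.items() if xx == x and tt == t)
    den = sum(p for (xx, tt, _), p in law.items() if xx == x and tt == t)
    return num / den


def identified_ate(law):
    """E[ E[Y | T=1, X] - E[Y | T=0, X] ], using observed data only."""
    total = 0
    for x in sorted({xx for (xx, _, _) in law}):
        px = sum(p for (xx, _, _), p in law.items() if xx == x)
        total += px * (stratum_mean(law, 1, x) - stratum_mean(law, 0, x))
    return total


def true_ate(space):
    """E[Y(1) - Y(0)], using the potential outcomes directly."""
    return sum(p * (y1 - y0) for (_, _, y0, y1), p in space.items())


OBSERVED = observed_law(SPACE)
print(identified_ate(OBSERVED) == true_ate(SPACE))
```

Exact `Fraction` arithmetic is used so that the two sides can be compared with `==` rather than up to floating-point tolerance.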
[/step]