[proofplan]
We first pass from strong ignorability given the full covariate $X$ to strong ignorability given the scalar propensity score $e(X)$ using the [Rosenbaum-Rubin propensity score theorem](/theorems/9660). Positivity ensures that the treated and control conditional means given the propensity score are defined on the relevant strata. For each treatment value $t$, conditional ignorability removes the conditioning on $T=t$ from the potential-outcome mean, and consistency replaces the potential outcome by the observed outcome on the event $\{T=t\}$. Averaging the resulting conditional contrast over the distribution of $e(X)$ gives the average treatment effect by the tower property.
[/proofplan]
[step:Introduce the propensity-score sigma-algebra and record positivity on its strata]
For any [random variable](/page/Random%20Variable) or finite tuple of random variables, write $\sigma(\cdot)$ for the sub-$\sigma$-algebra of $\mathcal F$ that it generates. Define the propensity-score random variable
\begin{align*}
S:(\Omega,\mathcal F)\to([0,1],\mathcal B([0,1]))
\end{align*}
by $S=e(X)$. Let $\mathcal G:=\sigma(S)$ be the sub-$\sigma$-algebra generated by $S$.
Since $e(X)=\mathbb P(T=1\mid X)$ $\mathbb P$-a.s. and positivity gives $0<\mathbb P(T=1\mid X)<1$ $\mathbb P$-a.s., we have
\begin{align*}
0<S<1 \quad \mathbb P\text{-a.s.}
\end{align*}
Moreover, $\mathcal G\subseteq\sigma(X)$ because $S=e(X)$ is a measurable function of $X$. Applying the tower property for [conditional expectation](/page/Conditional%20Expectation) to $\mathcal G\subseteq\sigma(X)$, and then using that $S$ is $\mathcal G$-measurable, we obtain
\begin{align*}
\mathbb P(T=1\mid \mathcal G)=\mathbb E[\mathbb 1_{\{T=1\}}\mid \mathcal G]=\mathbb E[\mathbb E[\mathbb 1_{\{T=1\}}\mid X]\mid \mathcal G]=\mathbb E[S\mid \mathcal G]=S
\end{align*}
$\mathbb P$-a.s. Hence
\begin{align*}
\mathbb P(T=0\mid \mathcal G)=1-S
\end{align*}
$\mathbb P$-a.s. Therefore $\mathbb P(T=t\mid \mathcal G)>0$ $\mathbb P$-a.s. for both $t\in\{0,1\}$.
[/step]
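As a sanity check on this step, the following Python sketch computes $\mathbb P(T=1\mid S)$ exactly on a small finite probability space. The model is a hypothetical assumption chosen only for illustration: $X$ is uniform on $\{0,1,2\}$, and the propensity score `e` takes the value $1/2$ at both $X=0$ and $X=1$, so $\mathcal G=\sigma(S)$ is strictly coarser than $\sigma(X)$. The check confirms $\mathbb P(T=1\mid S=s)=s$ and $0<s<1$ on each stratum.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical finite model (illustrative assumption): X uniform on {0, 1, 2}.
p_x = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
# Hypothetical propensity score e(x) = P(T=1 | X=x); X=0 and X=1 share the score 1/2,
# so sigma(S) is strictly coarser than sigma(X).
e = {0: F(1, 2), 1: F(1, 2), 2: F(1, 4)}

def treated_given_score(p_x, e):
    """Return {s: P(T=1 | S=s)} for S = e(X), using P(T=1, X=x) = p_x[x] * e[x]."""
    mass = defaultdict(F)     # P(S = s)
    treated = defaultdict(F)  # P(T = 1, S = s)
    for x, px in p_x.items():
        mass[e[x]] += px
        treated[e[x]] += px * e[x]
    return {s: treated[s] / mass[s] for s in mass}

cond = treated_given_score(p_x, e)
for s, q in sorted(cond.items()):
    # In this model both printed probabilities are strictly positive, as positivity requires.
    print(f"S={s}: P(T=1|S)={q}, P(T=0|S)={1 - q}")
```

Grouping $X=0$ and $X=1$ into one stratum is what makes the tower-property step nontrivial here: the conditional treatment probability given $S$ is computed from a mixture of covariate values, yet it still equals the common score.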
[step:Use Rosenbaum-Rubin to obtain conditional ignorability given $e(X)$]
By the [Rosenbaum-Rubin Propensity Score Theorem][citetheorem:9660], strong ignorability of $T$ given $X$ and the definition of the propensity score imply
\begin{align*}
(Y(1),Y(0))\perp\!\!\!\perp T\mid \mathcal G.
\end{align*}
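In particular, for each $t\in\{0,1\}$, the integrable random variable $Y(t)$ is conditionally independent of $T$ given $\mathcal G$, so $\mathbb E[Y(t)\mid \sigma(T,S)]=\mathbb E[Y(t)\mid \mathcal G]$ $\mathbb P$-a.s. Since $\mathbb P(T=t\mid\mathcal G)>0$ $\mathbb P$-a.s. by the preceding step, the conditional mean of $Y(t)$ on the treatment stratum $\{T=t\}$ is well defined for almost every value of $S$, and restricting the previous identity to $\{T=t\}$ gives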
\begin{align*}
\mathbb E[Y(t)\mid T=t,S]=\mathbb E[Y(t)\mid S]
\end{align*}
$\mathbb P$-a.s.[/step]
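The conclusion of this step can be checked on a hypothetical finite model in which strong ignorability given $X$ holds by construction. In this sketch (all choices are illustrative assumptions), $T$ is drawn with probability `e[x]` independently of an auxiliary noise variable $U$, and the potential outcomes `y_pot` are made-up functions of $(X,U)$. The code verifies that the joint law of $(Y(t),T)$ factorizes given $S$, and that $\mathbb E[Y(t)\mid T=t,S=s]=\mathbb E[Y(t)\mid S=s]$ on each stratum.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model (illustrative assumption): X uniform on {0, 1, 2},
# noise U ~ Bernoulli(1/2), and T ~ Bernoulli(e(X)) drawn independently of U
# given X, so (Y(1), Y(0)) is independent of T given X by construction.
p_x = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
e = {0: F(1, 2), 1: F(1, 2), 2: F(1, 4)}
omega = {
    (x, u, t): p_x[x] * F(1, 2) * (e[x] if t == 1 else 1 - e[x])
    for x, u, t in product(p_x, (0, 1), (0, 1))
}

def y_pot(t, x, u):
    """Hypothetical potential outcomes Y(0) = x + u and Y(1) = 2x + u + 1."""
    return 2 * x + u + 1 if t == 1 else x + u

def factorizes_given_score(t):
    """Check P(Y(t)=y, T=a | S=s) = P(Y(t)=y | S=s) P(T=a | S=s), cross-multiplied by P(S=s)^2."""
    joint, p_sy, p_sa, p_s = (defaultdict(F) for _ in range(4))
    for (x, u, a), p in omega.items():
        s, y = e[x], y_pot(t, x, u)
        joint[s, y, a] += p
        p_sy[s, y] += p
        p_sa[s, a] += p
        p_s[s] += p
    return all(
        joint[s, y, a] * p_s[s] == p_sy[s, y] * p_sa[s, a]
        for (s, y) in list(p_sy) for a in (0, 1)
    )

def cond_mean_pot(t, event):
    """E[Y(t) | event] on the finite space; event must have positive probability."""
    mass = sum(p for w, p in omega.items() if event(*w))
    return sum(p * y_pot(t, w[0], w[1]) for w, p in omega.items() if event(*w)) / mass

means = {}
for t in (0, 1):
    for s in sorted(set(e.values())):
        stratum = cond_mean_pot(t, lambda x, u, a: e[x] == s and a == t)  # E[Y(t) | T=t, S=s]
        score_only = cond_mean_pot(t, lambda x, u, a: e[x] == s)          # E[Y(t) | S=s]
        means[t, s] = (stratum, score_only)
        print(f"t={t}, S={s}: {stratum} vs {score_only}")
print("factorizes:", factorizes_given_score(0), factorizes_given_score(1))
```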
[guided]The purpose of the propensity score is to replace adjustment for the full covariate $X$ by adjustment for the one-dimensional score $S=e(X)$. The result that justifies this replacement is the [Rosenbaum-Rubin Propensity Score Theorem][citetheorem:9660]. Its hypotheses match the present setting: $T$ is binary, $e(X)$ is a propensity score because $e(X)=\mathbb P(T=1\mid X)$ $\mathbb P$-a.s., and strong ignorability given $X$ is assumed. Therefore the theorem gives
\begin{align*}
(Y(1),Y(0))\perp\!\!\!\perp T\mid \sigma(S).
\end{align*}
Positivity also passes to the propensity score. Since $e(X)=\mathbb P(T=1\mid X)$ $\mathbb P$-a.s. and $S=e(X)$, the assumed positivity gives
\begin{align*}
0<S<1 \quad \mathbb P\text{-a.s.}
\end{align*}
Moreover, because $S$ is $\sigma(S)$-measurable and $\sigma(S)\subseteq\sigma(X)$, the tower property gives
\begin{align*}
\mathbb P(T=1\mid \sigma(S))=\mathbb E[\mathbb 1_{\{T=1\}}\mid \sigma(S)]=\mathbb E[\mathbb E[\mathbb 1_{\{T=1\}}\mid X]\mid \sigma(S)]=\mathbb E[S\mid \sigma(S)]=S
\end{align*}
$\mathbb P$-a.s. Therefore $\mathbb P(T=0\mid\sigma(S))=1-S$ $\mathbb P$-a.s., and $\mathbb P(T=t\mid\sigma(S))>0$ $\mathbb P$-a.s. for both $t\in\{0,1\}$.
Fix $t\in\{0,1\}$. Since $Y(t)$ is one coordinate of the tuple $(Y(1),Y(0))$, conditional independence of the tuple implies
\begin{align*}
Y(t)\perp\!\!\!\perp T\mid \sigma(S).
\end{align*}
This conditional independence says that, after conditioning on $S$, learning whether $T=t$ does not change the conditional law of $Y(t)$; in particular, $\mathbb E[Y(t)\mid\sigma(T,S)]=\mathbb E[Y(t)\mid\sigma(S)]$ $\mathbb P$-a.s. These conditional expectations exist because $Y(t)\in L^1(\Omega,\mathcal F,\mathbb P)$. Because we have just proved $\mathbb P(T=t\mid\sigma(S))>0$ $\mathbb P$-a.s., conditioning further on the event $\{T=t\}$ is legitimate on almost every propensity-score stratum. Hence
\begin{align*}
\mathbb E[Y(t)\mid T=t,S]=\mathbb E[Y(t)\mid S]
\end{align*}
$\mathbb P$-a.s.
This is the exact point where ignorability is used: it equates the propensity-score-specific mean $\mathbb E[Y(t)\mid S]$, which involves potential outcomes that are unobserved for subjects with $T\neq t$, with the mean of $Y(t)$ among subjects actually assigned treatment $t$, which consistency then identifies from observed data.[/guided]
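To see why ignorability cannot be dropped at this point, consider a hypothetical counterexample that lies outside the theorem's hypotheses: $X$ is constant, so $S=e(X)=\mathbb P(T=1)$, while an unobserved variable $U$ affects both treatment and outcome. Positivity still holds. The exact computation below, with illustrative treatment probabilities `p_treat_given_u` and potential outcomes `y_pot` chosen for this sketch, shows that $\mathbb E[Y(1)\mid T=1,S]$ then differs from $\mathbb E[Y(1)\mid S]$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model that VIOLATES ignorability (illustrative assumption):
# X is constant, U ~ Bernoulli(1/2) is unobserved, and treatment is more
# likely when U = 1, so T is not independent of (Y(1), Y(0)) given S.
p_treat_given_u = {0: F(1, 4), 1: F(3, 4)}

def y_pot(t, u):
    """Hypothetical potential outcomes Y(0) = u and Y(1) = u + 1 (true ATE = 1)."""
    return u + t

# Outcomes omega = (u, t) with their probabilities.
omega = {
    (u, t): F(1, 2) * (p_treat_given_u[u] if t == 1 else 1 - p_treat_given_u[u])
    for u, t in product((0, 1), (0, 1))
}

def mean(f, event=lambda u, t: True):
    """E[f | event] on the finite space; event must have positive probability."""
    mass = sum(p for w, p in omega.items() if event(*w))
    return sum(p * f(*w) for w, p in omega.items() if event(*w)) / mass

score = mean(lambda u, t: t)                                        # S = P(T = 1), constant
stratum_mean = mean(lambda u, t: y_pot(1, u), lambda u, t: t == 1)  # E[Y(1) | T=1, S]
population_mean = mean(lambda u, t: y_pot(1, u))                   # E[Y(1) | S]
true_ate = mean(lambda u, t: y_pot(1, u) - y_pot(0, u))
observed_contrast = (mean(lambda u, t: y_pot(t, u), lambda u, t: t == 1)
                     - mean(lambda u, t: y_pot(t, u), lambda u, t: t == 0))
print(score, stratum_mean, population_mean, true_ate, observed_contrast)
```

Here the score is $1/2$, the stratum mean is $7/4$ while $\mathbb E[Y(1)\mid S]=3/2$, and the observed contrast $3/2$ overstates the true effect $1$.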
[step:Use consistency to replace potential outcomes by observed outcomes within treatment strata]
Fix $t\in\{0,1\}$. Consistency gives
\begin{align*}
Y=Y(t) \quad \mathbb P\text{-a.s. on } \{T=t\}.
\end{align*}
Let $H$ be a bounded $\sigma(T,S)$-measurable real-valued random variable. Since $\mathbb 1_{\{T=t\}}H$ is $\sigma(T,S)$-measurable and vanishes outside $\{T=t\}$, consistency implies
\begin{align*}
\mathbb E[\mathbb 1_{\{T=t\}}HY]=\mathbb E[\mathbb 1_{\{T=t\}}HY(t)].
\end{align*}
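Since $Y$ and $Y(t)$ are integrable and $H$ ranges over all bounded $\sigma(T,S)$-measurable random variables, this is the defining property of conditional expectation given $\sigma(T,S)$. Because $\mathbb 1_{\{T=t\}}$ is $\sigma(T,S)$-measurable, it yields
\begin{align*}
\mathbb 1_{\{T=t\}}\mathbb E[Y\mid T,S]=\mathbb 1_{\{T=t\}}\mathbb E[Y(t)\mid T,S]
\end{align*}
$\mathbb P$-a.s. By positivity, the event $\{T=t\}$ has positive conditional probability on almost every propensity-score stratum, so this identity determines the stratum conditional means and gives
\begin{align*}
\mathbb E[Y\mid T=t,S]=\mathbb E[Y(t)\mid T=t,S]
\end{align*}
$\mathbb P$-a.s. Combining this identity with conditional ignorability from the preceding step yields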
\begin{align*}
\mathbb E[Y\mid T=t,S]=\mathbb E[Y(t)\mid S]
\end{align*}
$\mathbb P$-a.s. for each $t\in\{0,1\}$.
[/step]
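On a hypothetical finite model (an illustrative assumption, not part of the source), the chain of equalities in this step can be checked with exact arithmetic. The observed outcome `y_obs` is defined through consistency as $Y=Y(T)$, and the made-up potential outcomes `y_pot` depend on $X$ and an auxiliary noise $U$. For every stratum the code compares $\mathbb E[Y\mid T=t,S=s]$, $\mathbb E[Y(t)\mid T=t,S=s]$, and $\mathbb E[Y(t)\mid S=s]$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model (illustrative assumption): X uniform on {0, 1, 2},
# noise U ~ Bernoulli(1/2), and T ~ Bernoulli(e(X)) independent of U given X.
p_x = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
e = {0: F(1, 2), 1: F(1, 2), 2: F(1, 4)}
omega = {
    (x, u, t): p_x[x] * F(1, 2) * (e[x] if t == 1 else 1 - e[x])
    for x, u, t in product(p_x, (0, 1), (0, 1))
}

def y_pot(t, x, u):
    """Hypothetical potential outcomes Y(0) = x + u and Y(1) = 2x + u + 1."""
    return 2 * x + u + 1 if t == 1 else x + u

def y_obs(x, u, t):
    """Consistency: the observed outcome equals the potential outcome for the realized T."""
    return y_pot(t, x, u)

def cond_mean(f, event):
    """E[f | event] on the finite space; event must have positive probability."""
    mass = sum(p for w, p in omega.items() if event(*w))
    return sum(p * f(*w) for w, p in omega.items() if event(*w)) / mass

chain = {}
for s in sorted(set(e.values())):
    for t in (0, 1):
        def in_stratum(x, u, a):
            """Indicator of the event {T = t, S = s} for this loop iteration."""
            return e[x] == s and a == t

        def pot(x, u, a):
            """The potential outcome Y(t) for the fixed t of this loop iteration."""
            return y_pot(t, x, u)

        obs = cond_mean(y_obs, in_stratum)                    # E[Y | T=t, S=s]
        pot_stratum = cond_mean(pot, in_stratum)              # E[Y(t) | T=t, S=s]
        pot_all = cond_mean(pot, lambda x, u, a: e[x] == s)   # E[Y(t) | S=s]
        chain[s, t] = (obs, pot_stratum, pot_all)
        print(f"S={s}, t={t}:", obs, pot_stratum, pot_all)
```

The first equality in each printed row is consistency; the second is the conditional ignorability established in the preceding step.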
[step:Average the propensity-score-specific contrast to obtain the ATE]
Applying the previous step with $t=1$ and $t=0$ gives
\begin{align*}
\mathbb E[Y\mid T=1,S]-\mathbb E[Y\mid T=0,S]=\mathbb E[Y(1)\mid S]-\mathbb E[Y(0)\mid S]
\end{align*}
$\mathbb P$-a.s. By linearity of conditional expectation for integrable random variables,
\begin{align*}
\mathbb E[Y(1)\mid S]-\mathbb E[Y(0)\mid S]=\mathbb E[Y(1)-Y(0)\mid S]
\end{align*}
$\mathbb P$-a.s. Taking expectations and using the tower property gives
\begin{align*}
\mathbb E\big[\mathbb E[Y\mid T=1,S]-\mathbb E[Y\mid T=0,S]\big]=\mathbb E[\mathbb E[Y(1)-Y(0)\mid S]]=\mathbb E[Y(1)-Y(0)].
\end{align*}
By the definition of $\operatorname{ATE}$, the right-hand side is $\operatorname{ATE}$. Since $S=e(X)$, this is precisely
\begin{align*}
\operatorname{ATE}
= \mathbb E\big[\mathbb E[Y\mid T=1,e(X)]-\mathbb E[Y\mid T=0,e(X)]\big].
\end{align*}
[/step]
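The identification formula can be illustrated end to end with exact arithmetic. The Python sketch below uses a hypothetical finite model that satisfies strong ignorability, positivity, and consistency: covariate $X$ uniform on $\{0,1,2\}$, noise $U$, treatment drawn with probability `e[x]`, and made-up potential outcomes `y_pot`, all of which are assumptions of this example. It compares the true $\operatorname{ATE}=\mathbb E[Y(1)-Y(0)]$ with the propensity-score formula, which uses only $(Y,T,S)$, and with the unadjusted difference in means.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical finite model satisfying the hypotheses (illustrative assumption):
# X uniform on {0, 1, 2}, U ~ Bernoulli(1/2) independent noise,
# T ~ Bernoulli(e(X)) independent of U given X, with 0 < e(X) < 1.
p_x = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
e = {0: F(1, 2), 1: F(1, 2), 2: F(1, 4)}
omega = {
    (x, u, t): p_x[x] * F(1, 2) * (e[x] if t == 1 else 1 - e[x])
    for x, u, t in product(p_x, (0, 1), (0, 1))
}

def y_pot(t, x, u):
    """Hypothetical potential outcomes Y(0) = x + u and Y(1) = 2x + u + 1."""
    return 2 * x + u + 1 if t == 1 else x + u

def observed(x, u, t):
    """Consistency: Y = Y(T)."""
    return y_pot(t, x, u)

def expect(f, event=lambda x, u, t: True):
    """E[f | event] on the finite space; event must have positive probability."""
    mass = sum(p for w, p in omega.items() if event(*w))
    return sum(p * f(*w) for w, p in omega.items() if event(*w)) / mass

# True ATE = E[Y(1) - Y(0)], computed from the potential outcomes.
true_ate = expect(lambda x, u, t: y_pot(1, x, u) - y_pot(0, x, u))

# Propensity-score formula: average over S = e(X) of the within-stratum contrast,
# which uses only the observed data (Y, T, S).
ps_ate = F(0)
for s in set(e.values()):
    p_s = sum(p for (x, _, _), p in omega.items() if e[x] == s)
    mu1 = expect(observed, lambda x, u, t: e[x] == s and t == 1)
    mu0 = expect(observed, lambda x, u, t: e[x] == s and t == 0)
    ps_ate += p_s * (mu1 - mu0)

# Unadjusted difference in means, for comparison (confounded by X in this model).
naive = expect(observed, lambda x, u, t: t == 1) - expect(observed, lambda x, u, t: t == 0)
print(true_ate, ps_ate, naive)
```

In this model the true effect and the propensity-score formula both equal $2$, while the unadjusted contrast is $51/35$, because $X$ shifts both the treatment probability and the outcomes.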