[proofplan]
We first partition the event $\{Y_a\in B\}$ according to the countable values of the counterfactual mediator $M_a$. Composition turns $Y_a$ into $Y_m$ on the stratum $\{M_a=m\}$, and cross-world exchangeability factors the resulting joint probability into a mediator term and an outcome term. The mediator term is identified by treatment-mediator exchangeability and mediator consistency. The outcome term is expanded over the observed treatment and then identified inside each treatment stratum by mediator-outcome exchangeability and outcome consistency.
[/proofplan]
[step:Decompose the counterfactual outcome by the mediator value under treatment $a$]
For each $m\in\mathcal M$, define the event
\begin{align*}
E_m:=\{M_a=m\}\in\mathcal F.
\end{align*}
The nested potential outcome $Y_{M_a}:(\Omega,\mathcal F)\to(S,\mathcal S)$ is the [random variable](/page/Random%20Variable) defined by $Y_{M_a}(\omega)=Y_{M_a(\omega)}(\omega)$.
The events $(E_m)_{m\in\mathcal M}$ are pairwise disjoint and their union is $\Omega$. Since $Y_a=Y_{M_a}$ a.s. by composition, and since $Y_{M_a}=Y_m$ on $E_m$, countable additivity gives
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M}\mathbb P(Y_m\in B,E_m).
\end{align*}
[guided]
The first task is to expose the mediator pathway. For every mediator value $m\in\mathcal M$, define
\begin{align*}
E_m:=\{M_a=m\}.
\end{align*}
Because $M_a:\Omega\to\mathcal M$ is a random variable and $\mathcal M$ is countable with the discrete $\sigma$-algebra, each $E_m$ is an event in $\mathcal F$. The family $(E_m)_{m\in\mathcal M}$ is pairwise disjoint, and every $\omega\in\Omega$ belongs to exactly one of these events.
Composition says that the treatment-level potential outcome $Y_a$ is obtained by setting the mediator to whatever value it would take under treatment $a$:
\begin{align*}
Y_a=Y_{M_a}
\end{align*}
almost surely. On the event $E_m=\{M_a=m\}$, the nested potential outcome $Y_{M_a}$ is therefore $Y_m$. Hence, up to a null set,
\begin{align*}
\{Y_a\in B\}\cap E_m
=
\{Y_m\in B\}\cap E_m.
\end{align*}
Using the countable disjoint decomposition of $\Omega$ by the events $E_m$, we obtain
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M}\mathbb P(Y_a\in B,E_m).
\end{align*}
Replacing $Y_a$ by $Y_m$ on $E_m$ gives
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M}\mathbb P(Y_m\in B,E_m).
\end{align*}
This is the point where the composition assumption is used: it converts the total-effect counterfactual into mediator-specific counterfactuals.
[/guided]
[/step]
[step:Factor the mediator-specific joint probabilities using cross-world exchangeability]
Fix $m\in\mathcal M$. Since $E_m=\{M_a=m\}$ and $Y_m\perp\!\!\!\perp M_a$, the events $\{Y_m\in B\}$ and $E_m$ are independent. Therefore
\begin{align*}
\mathbb P(Y_m\in B,E_m)
=
\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
Substituting into the preceding decomposition yields
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M}\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
[guided]
Fix a mediator value $m\in\mathcal M$. The event paired with $\{Y_m\in B\}$ is
\begin{align*}
E_m=\{M_a=m\}.
\end{align*}
The cross-world front-door exchangeability assumption says that the random variable $Y_m$ is independent of the random variable $M_a$. Therefore every event measurable with respect to $Y_m$ is independent of every event measurable with respect to $M_a$. In particular, $\{Y_m\in B\}$ is independent of $E_m$, so
\begin{align*}
\mathbb P(Y_m\in B,E_m)
=
\mathbb P(Y_m\in B)\mathbb P(E_m).
\end{align*}
Since $E_m=\{M_a=m\}$, this becomes
\begin{align*}
\mathbb P(Y_m\in B,E_m)
=
\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
Substituting this factorization into the decomposition from the previous step gives
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M}\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
This is the step where the cross-world assumption separates the mediator distribution from the mediator-specific outcome distribution.
[/guided]
[/step]
[step:Identify the counterfactual mediator law from observed mediator data]
Fix $m\in\mathcal M$. Since $M_a\perp\!\!\!\perp A$ and $\mathbb P(A=a)>0$,
\begin{align*}
\mathbb P(M_a=m)
=
\mathbb P(M_a=m\mid A=a).
\end{align*}
Mediator consistency gives $M=M_a$ a.s. on $\{A=a\}$, so
\begin{align*}
\mathbb P(M_a=m\mid A=a)
=
\mathbb P(M=m\mid A=a).
\end{align*}
Thus
\begin{align*}
\mathbb P(M_a=m)
=
\mathbb P(M=m\mid A=a).
\end{align*}
[guided]
Fix $m\in\mathcal M$. Treatment-mediator exchangeability states that $M_a$ is independent of $A$. Because the theorem fixes $a\in\mathcal A$ with $\mathbb P(A=a)>0$, the [conditional probability](/page/Conditional%20Probability) given $A=a$ is defined, and independence gives
\begin{align*}
\mathbb P(M_a=m)
=
\mathbb P(M_a=m\mid A=a).
\end{align*}
Mediator consistency now converts the counterfactual mediator into the observed mediator on the treatment stratum $\{A=a\}$. On this event, $M=M_a$ almost surely, so the two events $\{M_a=m\}\cap\{A=a\}$ and $\{M=m\}\cap\{A=a\}$ differ by a null set. Dividing their equal probabilities by $\mathbb P(A=a)>0$ gives
\begin{align*}
\mathbb P(M_a=m\mid A=a)
=
\mathbb P(M=m\mid A=a).
\end{align*}
Combining the two equalities yields
\begin{align*}
\mathbb P(M_a=m)
=
\mathbb P(M=m\mid A=a).
\end{align*}
[/guided]
[/step]
[step:Expand the mediator intervention outcome law over treatment strata]
Fix $m\in\mathcal M$. Since the events $\{A=a'\}$ for $a'\in\mathcal A$ partition $\Omega$, countable additivity gives
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A}\mathbb P(Y_m\in B,A=a').
\end{align*}
For each $a'\in\mathcal A$ with $\mathbb P(A=a')>0$,
\begin{align*}
\mathbb P(Y_m\in B,A=a')
=
\mathbb P(Y_m\in B\mid A=a')\mathbb P(A=a').
\end{align*}
If $\mathbb P(A=a')=0$, the corresponding joint probability is $0$, and the same summand may be taken as $0$. Hence
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A}\mathbb P(Y_m\in B\mid A=a')\mathbb P(A=a').
\end{align*}
[/step]
[step:Identify each treatment-stratum outcome term from observed outcomes]
Now assume $\mathbb P(M=m\mid A=a)>0$. Fix $a'\in\mathcal A$ with $\mathbb P(A=a')>0$. Positivity gives $\mathbb P(M=m,A=a')>0$, so the conditional probability given $\{M=m,A=a'\}$ is defined. By mediator-outcome exchangeability within treatment strata,
\begin{align*}
\mathbb P(Y_m\in B\mid A=a')
=
\mathbb P(Y_m\in B\mid M=m,A=a').
\end{align*}
Outcome consistency gives $Y=Y_m$ a.s. on $\{M=m\}$, and therefore also on $\{M=m,A=a'\}$. Hence
\begin{align*}
\mathbb P(Y_m\in B\mid M=m,A=a')
=
\mathbb P(Y\in B\mid M=m,A=a').
\end{align*}
Consequently, for every $m\in\mathcal M$ with $\mathbb P(M=m\mid A=a)>0$,
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A:\,\mathbb P(A=a')>0}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
Mediator values with $\mathbb P(M=m\mid A=a)=0$ will be omitted from the final outer sum, so no conditional probability on unsupported mediator-outcome strata is needed.
[guided]
We now identify the remaining counterfactual quantity $\mathbb P(Y_m\in B)$. The reason for conditioning on the observed treatment $A$ is that the front-door exchangeability assumption for $Y_m$ is stated within treatment strata.
Fix $m\in\mathcal M$. Since $A:\Omega\to\mathcal A$ is countable-valued, the events $\{A=a'\}$, indexed by $a'\in\mathcal A$, form a countable measurable partition of $\Omega$. Therefore
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A}\mathbb P(Y_m\in B,A=a').
\end{align*}
For a treatment value $a'$ with $\mathbb P(A=a')>0$, the definition of conditional probability gives
\begin{align*}
\mathbb P(Y_m\in B,A=a')
=
\mathbb P(Y_m\in B\mid A=a')\mathbb P(A=a').
\end{align*}
If $\mathbb P(A=a')=0$, then $\mathbb P(Y_m\in B,A=a')=0$, so the corresponding term contributes zero. Thus
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A}\mathbb P(Y_m\in B\mid A=a')\mathbb P(A=a').
\end{align*}
Now assume $\mathbb P(M=m\mid A=a)>0$, since only such mediator values appear in the final outer sum. Take a treatment value $a'$ with $\mathbb P(A=a')>0$. The positivity assumption gives
\begin{align*}
\mathbb P(M=m,A=a')>0.
\end{align*}
This guarantees that the conditional law given both $M=m$ and $A=a'$ is defined. Mediator-outcome exchangeability within treatment strata says that, conditional on $A=a'$, the potential outcome $Y_m$ has the same conditional law after additionally conditioning on $M=m$. Therefore
\begin{align*}
\mathbb P(Y_m\in B\mid A=a')
=
\mathbb P(Y_m\in B\mid M=m,A=a').
\end{align*}
Finally, outcome consistency converts the potential outcome into the observed outcome in the stratum where the observed mediator actually equals $m$. On the event $\{M=m\}$, we have $Y=Y_m$ almost surely. Since $\{M=m,A=a'\}\subseteq\{M=m\}$, this gives
\begin{align*}
\mathbb P(Y_m\in B\mid M=m,A=a')
=
\mathbb P(Y\in B\mid M=m,A=a').
\end{align*}
Substituting this identified conditional probability into the expansion over the positive-probability treatment strata yields
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A:\,\mathbb P(A=a')>0}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
If $\mathbb P(M=m\mid A=a)=0$, then this mediator value is omitted from the final outer sum, so no conditional probability on unsupported refined strata is required.
[/guided]
[/step]
[step:Substitute the identified mediator and outcome components]
From the factorization step and the omission of zero outer coefficients,
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M:\,\mathbb P(M=m\mid A=a)>0}\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
Using the mediator identification
\begin{align*}
\mathbb P(M_a=m)
=
\mathbb P(M=m\mid A=a)
\end{align*}
and the outcome identification, valid for $m$ with $\mathbb P(M=m\mid A=a)>0$,
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A:\,\mathbb P(A=a')>0}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a'),
\end{align*}
we obtain
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M:\,\mathbb P(M=m\mid A=a)>0}\mathbb P(M=m\mid A=a)
\sum_{a'\in\mathcal A:\,\mathbb P(A=a')>0}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
This is the asserted front-door formula under the stated conditional exchangeability, consistency, composition, and positivity assumptions.
[guided]
We start from the factorized counterfactual expression
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M}\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
The mediator-identification step gives
\begin{align*}
\mathbb P(M_a=m)
=
\mathbb P(M=m\mid A=a).
\end{align*}
Therefore all mediator values with $\mathbb P(M=m\mid A=a)=0$ contribute zero to the sum and may be omitted. For each remaining mediator value, the outcome-identification step gives
\begin{align*}
\mathbb P(Y_m\in B)
=
\sum_{a'\in\mathcal A:\,\mathbb P(A=a')>0}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
Substituting these two identified components into the factorized expression yields
\begin{align*}
\mathbb P(Y_a\in B)
=
\sum_{m\in\mathcal M:\,\mathbb P(M=m\mid A=a)>0}\mathbb P(M=m\mid A=a)
\sum_{a'\in\mathcal A:\,\mathbb P(A=a')>0}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
The restrictions on the sums are exactly the support conditions needed for the conditional probabilities in the displayed expression to be defined.
[/guided]
[/step]