[proofplan]
The proof decomposes the effect of setting $A=a$ through the finite mediator $M$, then identifies the two factors in that decomposition from the observational law. The front-door graphical assumptions first reduce the conditional outcome under $do(A=a)$ and mediator value $m$ to the mediator intervention law $Y_m$, then identify the law of $M_a$ with the observed conditional law of $M$ given $A=a$, and finally identify the law of $Y_m$ by adjusting over $A$. Substituting these two finite identifications gives the displayed front-door formula, with the positivity assumptions ensuring that every [conditional probability](/page/Conditional%20Probability) used is defined.
[/proofplan]
[step:Decompose the intervention on $A$ through the mediator values]
Fix $B\in\mathcal E$. Since $S_M$ is finite, the events $\{M_a=m\}$ for $m\in S_M$ form a finite measurable partition of $\Omega$. By finite additivity of $\mathbb P$ and conditional probability on the atoms $\{M_a=m\}$, we have
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{m\in S_M}\mathbb P(Y_a\in B\mid M_a=m)\mathbb P(M_a=m),
\end{align*}
where terms with $\mathbb P(M_a=m)=0$ may be assigned arbitrary conditional values because they are multiplied by zero.
The first front-door graphical hypothesis says that every directed path from $A$ to $Y$ passes through $M$. Under the do-calculus compatibility assumed in the statement, Pearl's rule for deleting actions, [citetheorem:9680], applies after the mediator has been fixed: in the graph where incoming arrows to $A$ are removed and outgoing arrows from $M$ are deleted for the conditioning exchange, $Y$ is separated from the action on $A$ once $M=m$ is fixed. Hence, for every $m\in S_M$ with $\mathbb P(M_a=m)>0$,
\begin{align*}
\mathbb P(Y_a\in B\mid M_a=m)=\mathbb P(Y_m\in B).
\end{align*}
Substituting this do-calculus identification into the partition formula gives
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{m\in S_M}\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
[guided]
We first isolate the only channel through which the intervention $do(A=a)$ can affect $Y$: the mediator $M$. Because $S_M$ is finite, the events $\{M_a=m\}$, indexed by $m\in S_M$, form a finite partition. Hence finite additivity and the elementary [law of total probability](/theorems/1113) give
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{m\in S_M}\mathbb P(Y_a\in B\mid M_a=m)\mathbb P(M_a=m).
\end{align*}
If $\mathbb P(M_a=m)=0$, the corresponding product is zero, so the conditional probability on that null event does not affect the sum.
Now we use the causal content of the first front-door condition. Since every directed path from $A$ to $Y$ passes through $M$, once the mediator is fixed at $m$, the remaining action on $A$ has no directed route by which it can affect $Y$. The point is not merely a pointwise equality of counterfactual variables; what we need is an equality of the relevant conditional interventional laws. The do-calculus compatibility hypothesis supplies this through Pearl's rule for deleting actions, [citetheorem:9680]: after the mediator value $m$ is fixed, the graph obtained by the corresponding surgery separates $Y$ from the action on $A$. Therefore, whenever $\mathbb P(M_a=m)>0$,
\begin{align*}
\mathbb P(Y_a\in B\mid M_a=m)=\mathbb P(Y_m\in B).
\end{align*}
Substituting this identified conditional law into the total-probability decomposition yields
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{m\in S_M}\mathbb P(Y_m\in B)\mathbb P(M_a=m).
\end{align*}
This is the key structural decomposition: the effect of setting $A=a$ is expressed as a mixture over mediator values.
[/guided]
[/step]
[step:Identify the interventional mediator law from the observed conditional law]
The second front-door graphical hypothesis says that there is no unblocked back-door path from $A$ to $M$. By the action-exchange step justified by [Pearl's Rule Two](/theorems/9679), [citetheorem:9679], applied to the intervention $do(A=a)$ and the event $\{M=m\}$, the graph-surgery distribution of $M_a$ agrees with the observed conditional distribution of $M$ given $A=a$. Since $\mathbb P(A=a)>0$, this conditional probability is defined, and for every $m\in S_M$,
\begin{align*}
\mathbb P(M_a=m)=\mathbb P(M=m\mid A=a).
\end{align*}
Hence the previous decomposition becomes
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{m\in S_M}\mathbb P(Y_m\in B)\mathbb P(M=m\mid A=a).
\end{align*}
Terms with $\mathbb P(M=m\mid A=a)=0$ vanish, so the sum may be restricted to those $m\in S_M$ satisfying $\mathbb P(M=m\mid A=a)>0$.
[/step]
[step:Identify the outcome law under mediator intervention by adjusting over $A$]
Fix $m\in S_M$ such that $\mathbb P(M=m\mid A=a)>0$. Since $S_A$ is finite, the events $\{A=a'\}$ for $a'\in S_A$ form a finite measurable partition of $\Omega$. Therefore
\begin{align*}
\mathbb P(Y_m\in B)=\sum_{a'\in S_A}\mathbb P(Y_m\in B\mid A=a')\mathbb P(A=a'),
\end{align*}
with zero-probability atoms contributing zero.
The third front-door graphical hypothesis says that conditioning on $A$ blocks every back-door path from $M$ to $Y$. By the action-exchange step justified by Pearl's Rule Two, [citetheorem:9679], applied conditionally on $A=a'$, the intervention $do(M=m)$ may be exchanged with conditioning on $M=m$ for the event $\{Y\in B\}$ whenever the relevant conditional law is defined. Thus, for every $a'\in S_A$ with $\mathbb P(A=a')>0$ and $\mathbb P(M=m,A=a')>0$,
\begin{align*}
\mathbb P(Y_m\in B\mid A=a')=\mathbb P(Y\in B\mid M=m,A=a').
\end{align*}
By the stated positivity assumption, $\mathbb P(M=m,A=a')>0$ for every $a'\in S_A$ with $\mathbb P(A=a')>0$. Therefore
\begin{align*}
\mathbb P(Y_m\in B)=\sum_{\substack{a'\in S_A, \mathbb P(A=a')>0}}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
[guided]
We now identify the remaining interventional factor $\mathbb P(Y_m\in B)$. The idea is to average over the observed treatment $A$, because the front-door criterion says that $A$ blocks the back-door paths from $M$ to $Y$.
Since $S_A$ is finite, the events $\{A=a'\}$ form a finite partition. The law of total probability gives
\begin{align*}
\mathbb P(Y_m\in B)=\sum_{a'\in S_A}\mathbb P(Y_m\in B\mid A=a')\mathbb P(A=a').
\end{align*}
Terms with $\mathbb P(A=a')=0$ contribute zero, so only treatment values with positive observational probability matter.
For such an $a'$, we want to replace the intervention $do(M=m)$ by ordinary conditioning on $M=m$. The third front-door condition supplies exactly the needed graphical separation: after conditioning on $A$, all back-door paths from $M$ to $Y$ are blocked. Under the graph-surgery compatibility assumption, Pearl's action-exchange rule, [citetheorem:9679], therefore gives
\begin{align*}
\mathbb P(Y_m\in B\mid A=a')=\mathbb P(Y\in B\mid M=m,A=a').
\end{align*}
This equality is meaningful only when the conditional probability on the right is defined. The theorem's positivity assumption ensures this: for the fixed $m$ with $\mathbb P(M=m\mid A=a)>0$ and every $a'\in S_A$ with $\mathbb P(A=a')>0$, one has
\begin{align*}
\mathbb P(M=m,A=a')>0.
\end{align*}
Thus every conditional probability $\mathbb P(Y\in B\mid M=m,A=a')$ appearing in the finite sum is well-defined. We conclude that
\begin{align*}
\mathbb P(Y_m\in B)=\sum_{\substack{a'\in S_A, \mathbb P(A=a')>0}}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
[/guided]
[/step]
[step:Substitute the two identified factors into the mediator decomposition]
From the mediator decomposition and the identification of the mediator law,
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{\substack{m\in S_M, \mathbb P(M=m\mid A=a)>0}}\mathbb P(Y_m\in B)\mathbb P(M=m\mid A=a).
\end{align*}
Substituting the identified expression for $\mathbb P(Y_m\in B)$ from the previous step gives
\begin{align*}
\mathbb P(Y_a\in B)=\sum_{\substack{m\in S_M, \mathbb P(M=m\mid A=a)>0}}\mathbb P(M=m\mid A=a)\sum_{\substack{a'\in S_A, \mathbb P(A=a')>0}}\mathbb P(Y\in B\mid M=m,A=a')\mathbb P(A=a').
\end{align*}
This is the asserted front-door identification formula for the arbitrary event $B\in\mathcal E$, so the proof is complete.
[/step]